My audience skews left; that is, the people who read my blog are more liberal, on average, than the general population. For example, if I surveyed my readers and asked where they place themselves on a scale from liberal to conservative, the results might look like this:
To be clear, I have not done a survey and this is fake data, but if it were real, we would conclude that my audience is more liberal, on average, than the general population. So in the normal use of the word skew, we might say that this distribution “skews to the left”.
But according to statisticians, that would be wrong, because within the field of statistics, skew has been given a technical meaning that is contrary to its normal use. Here’s how Wikipedia explains the technical definition:
positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed, right-tailed, or skewed to the right, despite the fact that the curve itself appears to be skewed or leaning to the left; right instead refers to the right tail being drawn out and, often, the mean being skewed to the right of a typical center of the data. A right-skewed distribution usually appears as a left-leaning curve.https://en.wikipedia.org/wiki/Skewness
By this definition, we would say that the distribution of political alignment in my audience is “skewed to the right”. It is regrettable that the term was defined this way, because it’s very confusing.
Recently I ran a Twitter poll to see what people think skew means. Here are the results:
Interpreting these results is almost paradoxical: the first two responses are almost equally common, which proves that the third response is correct. If the statistically-literate people who follow me on Twitter don’t agree about what skew means, we have to treat it as ambiguous unless specified.
The comments suggest I’m not the only one who thinks the technical definition is contrary to intuition.
- This has always been confusing for me, since the shape of a right-skewed distribution looks like it’s “leaning” to the left…
- I learnt it as B, but there’s always this moment when I consciously have to avoid thinking it’s A.
- This is one of those things where once I learned B was right, I hated it so much that I never forgot it.
It gets worse
If you think the definition of skew is bad, let’s talk about bias. In the context of statistics, bias is “a systematic tendency which causes differences between results and fact”. In particular, sampling bias is bias caused by a non-representative sampling process.
In my imaginary survey, the mean of the sample is less than the actual mean in the population, so we could say that my sample is biased to the left. Which means that the distribution is technically biased to the left and skewed to the right. Which is particularly confusing because in natural use, bias and skew mean the same thing.
So 20th century statisticians took two English words that are (nearly) synonyms, and gave them technical definitions that can be polar opposites. The result is 100 years of confusion.
For early statisticians, it seems like creating confusing vocabulary was a hobby. In addition to bias and skew, here’s a partial list of English words that are either synonyms or closely related, which have been given technical meanings that are opposites or subtly different.
- accuracy and precision
- probability and likelihood
- efficacy and effectiveness
- sensitivity and specificity
- confidence and credibility
And don’t get me started on “significance”.
If you got this far, it seems like you are part of my audience, so if you want to answer a one-question survey about your political alignment, follow this link. Thank you!
My poll and this article were prompted by this excellent video about the Central Limit Theorem:
Around the 7:52 mark, a distribution that leans left is described as “skewed towards the left”. In statistics jargon, that’s technically incorrect, but in this context I think it’s is likely to be understood as intended.