On Correlation & Causation
06 January, 2022 - 5 min read
Two things happening at the same time is correlation while one causing the other to change is causation.
The earth has been warming over the years, and the number of pirates has declined over the same period. The two trends are correlated, but the warming is not the cause of the decline.
Correlation does not imply causation.
Correlation is when two or more variables change together, whether or not one of them causes the others to change. Causation is when two or more variables change and one of those variables is responsible for the other change or changes.
Don't draw causal conclusions just because two things are correlated. Take time to find and understand the hidden factors.
Correlation and causation are often misunderstood and used interchangeably. Both are statistical terms that must be understood to draw correct conclusions. Failing to distinguish them leads to illogical inferences.
Correlation shows how strongly a pair of variables are linearly related and change together. It does not tell us why the relationship exists, only that it exists.
News media often report such relationships without explaining why they exist. Blindly consuming this information can have negative consequences. It is up to us to uncover the underlying variables, find more information, and check whether the variables are truly correlated or not.
The correlation coefficient measures how strongly or weakly variables are correlated. It ranges from -1 to 1: a value of 1 is a perfect positive linear relationship, -1 is a perfect negative one, and 0 means no linear relationship.
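To make the coefficient concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The helper name pearson and the small data sets are made up for illustration. The last series is worth a look: it is completely determined by x, yet its coefficient is zero, because the coefficient only measures linear relationships.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]   # perfectly linear, increasing
falling = [-3 * x for x in xs]     # perfectly linear, decreasing
curved = [x ** 2 for x in xs]      # depends entirely on x, but not linearly

r_rising = pearson(xs, rising)     # 1.0
r_falling = pearson(xs, falling)   # -1.0
r_curved = pearson(xs, curved)     # 0.0
print(r_rising, r_falling, r_curved)
```

A coefficient near zero therefore rules out a linear relationship, not every relationship.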
Causation goes a step further than correlation: it indicates that one variable causes the other to change, or vice versa. This is called cause and effect. There are cases in which identifying a cause is difficult, but a good study of causal relationships relies on randomized controlled experiments, which minimize bias. The precision of the outcomes is indicated by their confidence intervals.
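As a rough illustration of a randomized controlled experiment and its confidence interval, the sketch below simulates one. Everything in it is an assumption made for the example: the outcome function, the TRUE_EFFECT value, the sample size, and the use of a normal approximation (z = 1.96) for the 95% interval.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

TRUE_EFFECT = 2.0  # assumed treatment effect, chosen for this simulation

def outcome(treated):
    # Each subject has a hidden baseline that also affects the outcome.
    baseline = random.gauss(10.0, 1.0)
    return baseline + (TRUE_EFFECT if treated else 0.0)

# Random assignment: a coin flip decides who is treated, so hidden
# factors are balanced across the two groups on average.
treated, control = [], []
for _ in range(2000):
    if random.random() < 0.5:
        treated.append(outcome(True))
    else:
        control.append(outcome(False))

estimate = statistics.mean(treated) - statistics.mean(control)
# Standard error of a difference between two independent means.
std_err = math.sqrt(
    statistics.variance(treated) / len(treated)
    + statistics.variance(control) / len(control)
)
# Approximate 95% confidence interval (normal approximation).
low, high = estimate - 1.96 * std_err, estimate + 1.96 * std_err
print(f"estimated effect: {estimate:.2f}, 95% CI: [{low:.2f}, {high:.2f}]")
```

Because the coin flip, and nothing else, decides who gets the treatment, a difference between the groups can be attributed to the treatment itself. The width of the interval shows how precise that estimate is.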
Correlation does not imply causation
"Correlation does not imply causation" is a common phrase in the field of statistics. It refers to the inability to infer a cause-and-effect relationship between two variables on the basis of an observed correlation alone. In the technical terms of logic, "implies" means "is a sufficient condition for". Statisticians often express this as "causation is not certain".
Where there is causation, there is correlation. Correlation is often used as evidence for causation because it is a necessary condition, but it is not a sufficient one.
We do not have knowledge of a thing until we have grasped its why, that is to say, its cause. — Aristotle
Causal analysis is the field of experimental design and statistics pertaining to establishing cause and effect. For any two correlated events, A and B, their possible relationships include:
- A causes B (direct causation)
- B causes A (reverse causation)
- A and B are both caused by C (the common-causal variable)
- A causes B and B causes A (bidirectional or cyclic causation)
- There is no connection between A and B; the correlation is a coincidence.
Thus no conclusion about the existence or the direction of a cause-and-effect relationship can be drawn solely from the fact that A and B are correlated. Determining whether there is an actual cause-and-effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large share of the variance is explained.
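The coincidence case is easy to underestimate. The sketch below draws many pairs of short, completely independent random series and keeps the strongest correlation it finds. The series length, the number of pairs, and the pearson helper are all assumptions made for this illustration.

```python
import math
import random

random.seed(7)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# 200 pairs of series drawn independently: no pair has any real connection.
best = 0.0
for _ in range(200):
    a = [random.gauss(0, 1) for _ in range(10)]
    b = [random.gauss(0, 1) for _ in range(10)]
    best = max(best, abs(pearson(a, b)))

print(f"strongest correlation found by chance: {best:.2f}")
```

Search through enough unrelated variables and a strong-looking correlation will turn up sooner or later, which is one reason a single observed correlation calls for further investigation.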
Examples of illogical conclusions
I. Reverse Causation — B causes A where cause and effect are reversed
- Example 1: The faster windmills are observed to rotate, the more wind is observed. This does not imply that wind is caused by windmills. It is the other way around: wind doesn't need windmills, while windmills need wind to rotate. Wind can be observed where there are no windmills.
- Example 2: Children who watch a lot of TV are the most violent. The tempting conclusion is that TV makes children more violent. It could just as easily be the other way round: violent children like watching more TV than less violent ones.
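The windmill example can be sketched as a toy simulation. The model is invented for illustration: the rotor speed is assumed to be three times the wind speed plus some noise, and the "motor test" is a hypothetical intervention in which we set the rotor speed ourselves. The correlation coefficient is symmetric, so it cannot tell us which way the cause runs, but the intervention can.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Assumed toy model: wind drives the rotor, plus a little measurement noise.
wind = [random.uniform(0, 20) for _ in range(500)]
rotation = [3.0 * w + random.gauss(0, 2) for w in wind]

# Swapping the arguments gives the same number: no direction is encoded.
r_observed = pearson(wind, rotation)
r_swapped = pearson(rotation, wind)

# Intervention: spin the rotor with a motor at speeds we choose ourselves.
# If rotation caused wind, the wind would follow. In this model it does not.
forced_rotation = [random.uniform(0, 60) for _ in range(500)]
wind_during_test = [random.uniform(0, 20) for _ in range(500)]
r_intervened = pearson(forced_rotation, wind_during_test)

print(f"observed: {r_observed:.2f}, swapped: {r_swapped:.2f}")
print(f"after forcing the rotor: {r_intervened:.2f}")
```

The observed correlation is strong in both directions, but once the rotor speed is set by hand the link to the wind disappears, which points to wind as the cause.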
II. Third Causation — C causes both A and B where it asserts that A causes B when, in reality, A and B are both caused by C
- Example 1: Sleeping with shoes on is strongly correlated with waking up with a headache. It is premature to conclude that sleeping with shoes on causes headaches. A more logical explanation is that both are caused by a third factor, going to bed drunk, which gives rise to the correlation.
- Example 2: As ice cream sales increase, the rate of drowning deaths increases sharply. Therefore, the flawed argument goes, ice cream consumption causes drowning. This fails to consider that people engage in water activities such as swimming more in hot weather than in cold weather, and ice cream also happens to be consumed more during the summer months. The increased drowning deaths are not due to ice cream.
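The ice cream example can be simulated in a few lines. The numbers are invented (arbitrary units, not real data): temperature drives both ice cream sales and drownings, and the two never influence each other. The sketch compares the raw correlation with the partial correlation, a standard way to remove the linear influence of a third variable. The helper names pearson and partial_corr are assumptions made for the example.

```python
import math
import random

random.seed(3)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(a, b, c):
    """Correlation of a and b after removing the linear influence of c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Assumed toy model: temperature (C) drives both A and B independently.
temperature = [random.uniform(0, 35) for _ in range(1000)]
ice_cream = [5.0 * t + random.gauss(0, 20) for t in temperature]
drownings = [0.3 * t + random.gauss(0, 2) for t in temperature]

r_raw = pearson(ice_cream, drownings)
r_controlled = partial_corr(ice_cream, drownings, temperature)

print(f"raw correlation: {r_raw:.2f}")
print(f"after controlling for temperature: {r_controlled:.2f}")
```

The raw correlation is strong even though neither variable touches the other in the model. Once temperature is accounted for, almost nothing is left. This only works when the hidden factor has been identified and measured, which is the hard part in practice.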
One of the first things taught in introductory statistics textbooks is that correlation is not causation. It is also one of the first things forgotten. — Thomas Sowell