# On the Central Limit Theorem

28 January, 2022 - 9 min read

A large, properly drawn sample will resemble the population from which it is drawn.

## Definition

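The central limit theorem (CLT) states that, under fairly general conditions (for example, independent and identically distributed variables with finite variance), when you add up many independent random variables, their suitably normalized sum tends toward a normal distribution, even if the individual variables are not normally distributed themselves.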

In other words, given a large enough sample from a population, the sample mean will be representative of the population mean. More precisely, if you draw many samples of the same size, the distribution of their means (the sampling distribution) will be approximately normal, centered on the population mean, with a variance equal to the population variance divided by the sample size. As the sample size grows, this distribution becomes both more normal and narrower.

A sample size of 30 to 40 units is commonly treated as large enough to apply the central limit theorem, although this is a rule of thumb rather than a guarantee.
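The statements above about the sample mean can be checked with a short simulation. This is a minimal sketch, assuming an exponential population with rate 1 (so its mean and standard deviation are both 1) and samples of 40 units; `draw_exponential` and `sample_means` are hypothetical helper names, not part of any library.

```python
import math
import random
import statistics


def draw_exponential(rng):
    """One draw from the assumed population: exponential with rate 1 (mean 1, sd 1)."""
    return rng.expovariate(1.0)


def sample_means(draw, n, num_samples, rng):
    """Means of `num_samples` independent samples, each of size `n`."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(num_samples)]


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 40
means = sample_means(draw_exponential, n, num_samples=5000, rng=rng)

# The sample means centre on the population mean (1.0) ...
print(f"mean of sample means: {statistics.fmean(means):.3f}")
# ... and their spread shrinks like sigma / sqrt(n).
print(f"sd of sample means:   {statistics.stdev(means):.3f}")
print(f"sigma / sqrt(n):      {1 / math.sqrt(n):.3f}")
```

Changing `n` to 10 or 160 should roughly double or halve the spread of the sample means, since it scales with one over the square root of the sample size.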

## Deep analysis

The central limit theorem (CLT) is considered one of the most powerful theorems in all of statistics and probability. The question is why this mathematical pattern shows up so often. The answer is that normal distributions emerge from a process of averaging. The three major characteristics of the central limit theorem are:

1. It lets you generalize conclusions about the entire population from the results of a sample.
2. A sample size of 30 or more is usually enough for the theorem to take effect.
3. The distribution of the sample mean approaches a normal distribution even when the underlying population is not normally distributed (provided its variance is finite).

If the central limit theorem applies, calculating the mean of each of many equally sized samples and then building a distribution from those means should produce an approximately normal result.
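The procedure just described can be sketched in a few lines: draw many samples, average each one, and measure the shape of the resulting averages. This sketch assumes an exponential population with mean 1, whose skewness is 2; the skewness of the sample means should fall toward 0 (the value for a normal distribution) as the sample size grows. `skewness` is a helper written for this illustration.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(7)  # fixed seed for reproducibility
skews = {}
for n in (1, 5, 30):
    # Build the distribution of sample means: 10,000 samples of size n each.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(10_000)]
    skews[n] = skewness(means)
    print(f"n={n:>2}: skewness of sample means = {skews[n]:.2f}")
```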

Using this theorem, you can generalize conclusions about an entire population from results found by analyzing a sample of it. It is powerful because it lets you make reasonable assumptions about a population regardless of the shape of its distribution, as long as that distribution has a finite variance.

The theorem has many applications, including hypothesis testing, confidence intervals, and estimation. Normal distributions show up everywhere: human height is approximately normal, intelligence as measured by IQ is approximately normal, and measurement errors tend to be normally distributed.

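To see the central limit theorem in action, consider rolling dice. The results of a single die are uniformly distributed, and rolling it more times does not change that. What tends toward a normal distribution is the sum (or average) of several rolls: the more rolls you add together in each sample, the closer the distribution of those totals comes to a bell curve. Generally, averaging around 30 to 40 rolls per sample, and repeating that many times, should produce a relatively normal distribution of the averages.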
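The dice example can be made concrete by computing the exact distribution of the sum of several dice. This sketch assumes fair six-sided dice; `dice_sum_distribution` and `max_gap_to_normal` are illustrative helpers, and the "gap" is the largest difference between the exact probabilities and a normal density with the same mean and standard deviation.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def dice_sum_distribution(k):
    """Exact probability of each possible total when rolling k fair six-sided dice."""
    dist = {0: 1.0}
    for _ in range(k):
        nxt = defaultdict(float)
        for total, p in dist.items():
            for face in range(1, 7):
                nxt[total + face] += p / 6
        dist = dict(nxt)
    return dist


def max_gap_to_normal(k):
    """Largest gap between the exact probabilities and a matching normal density."""
    mean = 3.5 * k               # each die has mean 3.5
    sd = math.sqrt(k * 35 / 12)  # each die has variance 35/12
    normal = NormalDist(mean, sd)
    return max(abs(p - normal.pdf(total)) for total, p in dice_sum_distribution(k).items())


for k in (1, 2, 10):
    print(f"{k:>2} dice: max gap to normal = {max_gap_to_normal(k):.4f}")
```

The gap shrinks as more dice are added, which is the central limit theorem at work on the total, not on the individual rolls.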

Normal distributions are also known as bell curves or Gaussian distributions.

The central limit theorem also shows that the sampling distribution of the mean is centered on the population mean: the average of your sample means will approach the population mean. This is why the central limit theorem can be used to predict the characteristics of a population fairly accurately.

The main practical limitation is that the sample must be large enough before the theorem can be applied. A size of 30 to 40 units is the usual rule of thumb, but strongly skewed populations may need larger samples.
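One way to see why 30 to 40 is only a rule of thumb: for independent, identically distributed observations with variance σ², third cumulant κ₃, and population skewness γ₁ = κ₃/σ³, the skewness of the sum (and therefore of the sample mean, since rescaling does not change skewness) shrinks with the square root of n. The worked numbers below assume an exponential population (γ₁ = 2) and a log-normal population with σ = 1 (γ₁ ≈ 6.18); both are illustrative choices.

$$
\operatorname{skew}(\bar{X}_n) = \frac{n\,\kappa_3}{(n\sigma^2)^{3/2}} = \frac{\gamma_1}{\sqrt{n}},
\qquad
\text{exponential: } \frac{2}{\sqrt{30}} \approx 0.37,
\qquad
\text{log-normal } (\sigma = 1)\text{: } \frac{6.18}{\sqrt{30}} \approx 1.13
$$

At n = 30 the mean of the exponential population is already fairly symmetric, while the log-normal one is still clearly skewed and needs a larger sample before a normal approximation is trustworthy.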

### History

The central limit theorem has seen many iterations over the course of time, with the first general version of the theorem dating back to 1810. The modern form of this theorem wasn’t precisely stated until around 1920. Once the central limit theorem was established, it formed a bridge between classical and modern probability theory.

The Dutch mathematician Henk Tijms on the central limit theorem:

The central limit theorem has an interesting history. The first version of this theorem was postulated by the French-born mathematician Abraham de Moivre who, in a remarkable article published in 1733, used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This finding was far ahead of its time, and was nearly forgotten until the famous French mathematician Pierre-Simon Laplace rescued it from obscurity in his monumental work Théorie analytique des probabilités, which was published in 1812. Laplace expanded De Moivre's finding by approximating the binomial distribution with the normal distribution. But as with De Moivre, Laplace's finding received little attention in his own time. It was not until the nineteenth century was at an end that the importance of the central limit theorem was discerned, when, in 1901, Russian mathematician Aleksandr Lyapunov defined it in general terms and proved precisely how it worked mathematically. Nowadays, the central limit theorem is considered to be the unofficial sovereign of probability theory.

### Charles Wheelan

Charles Wheelan explains the basic concepts of the central limit theorem in his book Naked Statistics, an introduction to Statistics 101:

Statistics don't lie, but the data behind them can because they can be faulty, misleading, or downright false.

At times, statistics seems like magic. We are able to draw sweeping and powerful conclusions from relatively little data. Somehow we can gain meaningful insight into a presidential election by calling a mere one thousand American voters. We can test a hundred chicken breasts for salmonella at a poultry processing plant and conclude from that sample alone that the entire plant is safe or unsafe. Where does this extraordinary power to generalize come from? Much of it comes from the central limit theorem.
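Wheelan's polling example can be made concrete. By the central limit theorem, a sample proportion is approximately normal with standard error sqrt(p(1 − p)/n), which gives the familiar margin of error. This sketch assumes a simple random sample and a worst-case proportion of 0.5; `margin_of_error` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat, n, level=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


for n in (1_000, 4_000):
    print(f"n={n}: +/- {margin_of_error(0.5, n):.3f}")
```

With 1,000 respondents the margin is about 3 percentage points, and quadrupling the sample only halves it, because the standard error shrinks with the square root of the sample size.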

One of the most common thresholds that researchers use for rejecting a null hypothesis is 5 percent, which is often written in decimal form: .05. This probability is known as significance level, and it represents the upper bound for the likelihood of observing some pattern of data if the null hypothesis were true.

Obviously rejecting the null hypothesis at the .01 level (meaning that there is less than a 1 in 100 chance of observing a result in this range if the null hypothesis were true) carries more statistical heft than rejecting the null hypothesis at the .1 level (meaning that there is less than a 1 in 10 chance of observing a result in this range if the null hypothesis were true).
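Under the central limit theorem, a sample mean can be turned into a z-score and compared with these thresholds. This is a sketch of a two-sided one-sample z-test, assuming a known population standard deviation; the numbers (null mean 100, sigma 15, a sample of 36 with mean 105) are hypothetical.

```python
import math


def z_test_p_value(sample_mean, null_mean, sigma, n):
    """Two-sided p-value for a one-sample z-test (normal approximation from the CLT)."""
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


p = z_test_p_value(sample_mean=105, null_mean=100, sigma=15, n=36)
print(f"p = {p:.4f}")
for alpha in (0.10, 0.05, 0.01):
    print(f"reject at {alpha}: {p < alpha}")
```

Here z = 2.0, so the result clears the .1 and .05 thresholds but not the stricter .01 one.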

When you go to the doctor to get tested for some disease, the null hypothesis is that you do not have that disease. If the lab results can be used to reject the null hypothesis, then you are said to test positive. And if you test positive but are not really sick, then it’s a false positive.

Doctors and patients are willing to tolerate a fair number of Type 1 errors (false positives) in order to avoid the possibility of a Type 2 error (missing a cancer diagnosis).
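The trade-off between the two error types can be quantified for the same kind of z-test. Lowering the significance level reduces false positives (the Type 1 rate equals alpha by construction) but raises the Type 2 rate, the chance of missing a real effect. This sketch assumes a true difference of 5 units, sigma 15, and n = 36; all values are hypothetical, and `type2_error_rate` is an illustrative helper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def type2_error_rate(alpha, effect, sigma, n):
    """Chance that a two-sided z-test misses a true mean shift of `effect` (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / n ** 0.5)  # true shift measured in standard errors
    power = STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
    return 1 - power


for alpha in (0.10, 0.05, 0.01):
    beta = type2_error_rate(alpha, effect=5, sigma=15, n=36)
    print(f"Type 1 rate = {alpha:.2f}  Type 2 rate = {beta:.2f}")
```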

Some classrooms had answer sheets on which the number of wrong-to-right erasures were twenty to fifty standard deviations above the state norm. (To put this in perspective, remember that most observations in a distribution typically fall within two standard deviations of the mean.) So how likely was it that Atlanta students happened to erase massive numbers of wrong answers and replace them with correct answers just a matter of chance? The official who analyzed the data described the probability of the Atlanta pattern occurring without cheating as roughly equal to the chance of having 70,000 people show up for a football game at the Georgia Dome who all happen to be over seven feet tall. Could it happen? Yes. Is it likely? Not so much.
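The "two standard deviations" rule and the rarity of a 20-standard-deviation result both follow from the tail of the normal distribution. This sketch uses `math.erfc` from Python's standard library; it only shows how extreme such a z-score is under a normal model and does not reproduce the analysis of the Atlanta data.

```python
import math


def upper_tail(k):
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


within_two = 1 - 2 * upper_tail(2)
print(f"P(|Z| < 2) = {within_two:.4f}")  # roughly 95% of observations
print(f"P(Z > 20)  = {upper_tail(20):.1e}")
```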

### Education

Students can apply the central limit theorem to make observations about social circumstances, group activities, and their own academic success. Using the central limit theorem, you can determine what outcomes are attainable for you compared to your peers.

For example, if a majority of your classmates are failing Algebra, and you aren’t, there is a chance that a curve will be applied to make the grading system more balanced. The teacher may also add in additional variables, such as extra credit assignments and pop quizzes, to offer the other students more opportunities to pass the class.

Through the lens of the central limit theorem, those additional assignments add new variables to the grading formula, and averaging over more graded components tends to pull the grade distribution toward a bell shape, allowing some students to improve their grades. However, if you perform poorly on those assignments, your high grade could be pulled down toward the class mean.

### Business

Businesses can use the central limit theorem to make observations about the market, the business itself, and more. For example, by surveying a sample of customers, business leaders can estimate what their target audience likes, what it doesn’t like, and how to reach it effectively. That’s only one way the central limit theorem can be applied to business. The same reasoning is often applied to the investment results of various stocks, although stock returns tend to have fatter tails than a normal distribution predicts.

### Limitation

The normal distribution is a user-friendly mental model for interpreting statistical metrics such as the mean and standard deviation. However, it can be a misleading model.

One limitation is that analyses often assume the underlying population is itself normally distributed, which might not be true given the countless variables at play. A second is that the average is heavily skewed by outliers.
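The sensitivity of the average to outliers is easy to demonstrate. The salaries below (in thousands) are made-up illustrative numbers: adding a single extreme value moves the mean a long way while the median barely changes.

```python
import statistics

salaries = [40, 42, 45, 47, 50]   # hypothetical values, in thousands
with_outlier = salaries + [1000]  # one extreme value added

for label, data in (("without outlier", salaries), ("with outlier", with_outlier)):
    print(f"{label}: mean = {statistics.fmean(data):.1f}, median = {statistics.median(data):.1f}")
```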

R. C. Geary wrote in his 1947 paper “Testing for Normality”:

Normality is a myth; there never was, and never will be, a normal distribution.

The central limit theorem (CLT) says that for non-normal data, the distribution of the sample means is approximately normal, no matter what the distribution of the original data looks like, as long as the sample size is large enough (usually at least 30), all samples have the same size, and the data have a finite variance.

In his book The Black Swan, Nassim Nicholas Taleb argues that events in the extreme tails occur more often than a normal distribution predicts, so relying on it gives us too much confidence that rare “black swan” events are statistically impossible.

A theory is like medicine or government: often useless, sometimes necessary, always self-serving and, on occasion, lethal. It needs to be used with care, moderation and close adult supervision.
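A standard way to see the danger of fat tails (an illustration chosen here, not an example from Taleb's book) is the Cauchy distribution, which has no finite variance, so the central limit theorem does not apply to it. In this sketch the spread of sample means from a normal population shrinks as the sample size grows, while for Cauchy data it does not shrink at all. The helper names, seed, and sample sizes are illustrative choices.

```python
import math
import random
import statistics


def draw_normal(rng):
    """Standard normal draw (finite variance, so the CLT applies)."""
    return rng.gauss(0.0, 1.0)


def draw_cauchy(rng):
    """Standard Cauchy draw via the inverse CDF; its variance is infinite."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(draw, n, reps, rng):
    """Interquartile range of `reps` sample means, each from a sample of size n."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


rng = random.Random(3)
results = {}
for name, draw in (("normal", draw_normal), ("cauchy", draw_cauchy)):
    for n in (10, 400):
        results[name, n] = iqr_of_sample_means(draw, n, reps=1000, rng=rng)
        print(f"{name:>6}, n={n:>3}: IQR of sample means = {results[name, n]:.3f}")
```

The mean of n standard Cauchy draws is itself standard Cauchy, so its spread stays the same however large the sample becomes.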

### Diving deeper

To dive deeper into the theorem, check out the following paper and tool: