Central Limit Theorem

What is the Central Limit Theorem?

Central Limit Theorem (CLT): For large enough n, the sampling distribution of $\bar{x}$ is approximately normal, regardless of the population's shape (as long as the population has a finite mean and standard deviation)

This is remarkable! Population can be skewed, uniform, bimodal, anything → sampling distribution still approximately normal

Formal Statement

If:

Take random samples of size n
From ANY population with mean μ and standard deviation σ
n is "sufficiently large"

Then: $\bar{x}$ is approximately distributed as (here the second parameter of N is the standard deviation, not the variance):

$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

How Large is "Large Enough"?

Rule of thumb:

Population roughly normal → n ≥ 10 okay
Population moderately skewed → n ≥ 30
Population heavily skewed or has outliers → n ≥ 40, often more

Key: More skewed population → need larger n

In practice: n = 30 often cited as general threshold

Why CLT Matters

Problem: Real populations rarely normal

Solution: CLT lets us use normal distribution for inference anyway (if n large enough)

Applications:

Confidence intervals for means
Hypothesis tests for means
Control charts in quality control

Power: Works for ANY population distribution with a finite mean and standard deviation!

Example: Uniform Population

Population: Uniform on [0, 10]

μ = 5
σ = 10/√12 ≈ 2.89
Shape: Rectangular (not normal!)

Sample n = 30:

$\bar{x} \sim N\left(5, \frac{2.89}{\sqrt{30}}\right) \approx N(5, 0.528)$

Even though population uniform, $\bar{x}$ approximately normal!
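The claim above can be checked with a short simulation. This is a minimal sketch using only Python's standard library; the function name `simulate_sample_means`, the seed, and the number of repetitions (20,000) are illustrative assumptions, not part of the original example.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from Uniform[0, 10]; return the sample means."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [
        statistics.fmean([rng.uniform(0, 10) for _ in range(n)])
        for _ in range(reps)
    ]

means = simulate_sample_means(n=30, reps=20_000)
center = statistics.fmean(means)   # should be close to mu = 5
spread = statistics.stdev(means)   # should be close to sigma / sqrt(n) = 0.528

print(f"mean of sample means: {center:.3f} (theory: 5)")
print(f"SD of sample means:   {spread:.3f} (theory: 0.528)")
```

A histogram of `means` would look bell-shaped even though each individual draw comes from a flat distribution.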

Visualizing CLT

Population: Right-skewed (e.g., salaries)

Sampling distributions for different n:

n = 2: Still skewed
n = 5: Less skewed
n = 10: Nearly symmetric
n = 30: Very close to normal

Pattern: As n increases, sampling distribution becomes more normal
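One way to see this pattern numerically is to track the skewness of the simulated sample means as n grows. The sketch below uses an exponential population as a stand-in for right-skewed data such as salaries (an assumption; the notes do not name a specific distribution), and the helper names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def mean_skewness_by_n(sizes, reps=10_000, seed=1):
    """For each n, simulate `reps` sample means and return their skewness."""
    rng = random.Random(seed)
    result = {}
    for n in sizes:
        means = [
            statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)
        ]
        result[n] = skewness(means)
    return result

skew = mean_skewness_by_n([2, 5, 10, 30])
for n, g in skew.items():
    print(f"n = {n:>2}: skewness of sample means = {g:.2f}")
```

The skewness shrinks toward 0 as n increases, which is the numerical counterpart of the histograms becoming more symmetric.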

CLT for Proportions

Also applies to sample proportions!

If: np ≥ 10 and n(1-p) ≥ 10

Then: $\hat{p}$ approximately normal:

$\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$

This is why the binomial distribution can be approximated by a normal distribution for large n!
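The condition check and the standard deviation formula can be bundled into a small helper. This is a sketch only; the function name `phat_normal_approx` and the example values of p and n are assumptions chosen for illustration.

```python
import math

def phat_normal_approx(p, n):
    """Return (mean, sd) of the normal approximation for p-hat,
    or None when np >= 10 and n(1 - p) >= 10 are not both satisfied."""
    if n * p < 10 or n * (1 - p) < 10:
        return None
    return p, math.sqrt(p * (1 - p) / n)

ok = phat_normal_approx(p=0.3, n=100)       # np = 30, n(1 - p) = 70
too_few = phat_normal_approx(p=0.02, n=100) # np = 2, condition fails

print(ok)       # mean 0.3, sd about 0.0458
print(too_few)  # None
```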

Using CLT in Practice

Example: Battery life μ = 50 hours, σ = 8 hours (unknown distribution). Sample n = 40.

P($\bar{x}$ > 52) = ?

By CLT (n = 40 ≥ 30):

$\bar{x} \sim N(50, 8/\sqrt{40}) = N(50, 1.265)$

$z = \frac{52-50}{1.265} \approx 1.58$

P(Z > 1.58) ≈ 0.057
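The same calculation can be reproduced without a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 8, 40       # values from the battery example
se = sigma / sqrt(n)           # standard deviation of x-bar, about 1.265
z = (52 - mu) / se             # about 1.58
p_upper = 1 - NormalDist().cdf(z)  # upper-tail area of the standard normal

print(f"SE = {se:.3f}, z = {z:.2f}, P(x-bar > 52) = {p_upper:.3f}")
```

Because the code keeps full precision in z instead of rounding to 1.58, the result can differ from the table value in the fourth decimal place.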

CLT vs Normal Population

If population already normal:

Sampling distribution of $\bar{x}$ is exactly normal for any n
Don't need CLT

If population not normal:

Need CLT to justify normal approximation
Require "large enough" n

Standard Error from CLT

Population σ usually unknown

Use sample standard deviation s:

$SE = \frac{s}{\sqrt{n}}$

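For large n, s ≈ σ, so approximately:

$\bar{x} \sim N\left(\mu, \frac{s}{\sqrt{n}}\right)$

This substitution is the basis for t-procedures! Because s varies from sample to sample, the standardized statistic $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ is modeled with a t distribution with n − 1 degrees of freedom, which approaches the standard normal as n grows.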
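Computing the standard error from data takes two lines. In the sketch below the `sample` values are made up for illustration, and the sample is deliberately tiny so the arithmetic can be checked by hand; a real CLT-based analysis would need a larger n.

```python
import math
import statistics

# Hypothetical measurements (not from the notes); mean is 50 by construction
sample = [48, 52, 50, 47, 53, 49, 51, 50]

s = statistics.stdev(sample)        # sample SD, divides by n - 1
se = s / math.sqrt(len(sample))     # standard error of the mean

print(f"s = {s:.3f}, SE = {se:.3f}")
```

Here the squared deviations sum to 28, so s = sqrt(28/7) = 2 and SE = 2/√8 ≈ 0.707.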

Implications of CLT

1. Large samples good: Overcome non-normality

2. Can make inferences: Use normal probabilities for $\bar{x}$

3. Justifies methods: Confidence intervals, hypothesis tests work even if population not normal

4. Sample size planning: Can determine n needed for desired precision
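For point 4, one common planning approach solves $z^* \sigma / \sqrt{n} \le E$ for n, where E is the desired margin of error. The sketch below assumes a planning value for σ is available and that a CLT-based interval $\bar{x} \pm z^* \sigma/\sqrt{n}$ will be used; the function name and the example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin (sigma is a planning guess)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z_star * sigma / margin) ** 2)

n_needed = sample_size_for_margin(sigma=8, margin=2)
n_tighter = sample_size_for_margin(sigma=8, margin=1)

print(n_needed)   # 62
print(n_tighter)  # 246
```

Halving the margin of error roughly quadruples the required sample size, a direct consequence of the √n in the denominator.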

CLT Limitations

Doesn't apply if:

Sample not random
Population has infinite variance (e.g., Cauchy distribution; rare in practice)
n too small relative to skewness

Doesn't fix:

Bias in sampling
Non-random samples
Measurement errors

Remember: CLT is about shape of sampling distribution, not about making bad samples good!

Historical Context

Discovered: 18th century
Abraham de Moivre: Normal approximation to the binomial (1733)
Pierre-Simon Laplace: Extended the binomial result (1812)
Lindeberg-Lévy: General version (1920s)

One of most important theorems in statistics!

Sum vs Mean

CLT applies to both:

Sum: $S = \sum X_i$
$S \sim N(n\mu, \sigma\sqrt{n})$

Mean: $\bar{X} = S/n$
$\bar{X} \sim N(\mu, \sigma/\sqrt{n})$

Relationship: $\bar{X} = S/n$, so dividing the sum's mean and standard deviation by n gives the mean's
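Because the mean is just the sum rescaled, a probability question can be asked either way and gives the same answer. The sketch below reuses the battery numbers (μ = 50, σ = 8, n = 40) and compares P($\bar{X}$ > 52) with P(S > 2080); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 8, 40

mean_dist = NormalDist(mu, sigma / sqrt(n))     # X-bar: N(50, 1.265)
sum_dist = NormalDist(n * mu, sigma * sqrt(n))  # S: N(2000, 50.6)

p_mean = 1 - mean_dist.cdf(52)       # P(X-bar > 52)
p_sum = 1 - sum_dist.cdf(52 * n)     # P(S > 2080), the same event

print(f"P(mean > 52) = {p_mean:.4f}")
print(f"P(sum > 2080) = {p_sum:.4f}")
```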

Checking Conditions

Before using CLT:

Random sample? (Independence)
10% condition? (If sampling without replacement, n should be at most 10% of the population)
Large enough n? (Depends on population shape)

If all met → Proceed with normal approximation
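The checklist can be written as a small helper that reports each condition separately. This is a hypothetical sketch: the function name, the default threshold `min_n=30`, and the example population sizes are assumptions, and whether the sample is random has to be judged by the analyst and passed in.

```python
def check_clt_conditions(n, random_sample, population_size=None, min_n=30):
    """Return a dict mapping each checklist condition to True/False.

    population_size=None means sampling with replacement (or an effectively
    infinite population), so the 10% condition is treated as met.
    min_n=30 is only the general rule of thumb; raise it for heavy skew.
    """
    ten_percent_ok = population_size is None or n <= 0.10 * population_size
    return {
        "random_sample": bool(random_sample),
        "ten_percent": ten_percent_ok,
        "large_enough_n": n >= min_n,
    }

checks = check_clt_conditions(n=40, random_sample=True, population_size=5000)
small_pop = check_clt_conditions(n=40, random_sample=True, population_size=200)

print(checks, "-> proceed" if all(checks.values()) else "-> do not rely on CLT")
print(small_pop)  # 10% condition fails: 40 is more than 10% of 200
```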

Common Mistakes

❌ Using CLT with small n from skewed population
❌ Thinking CLT makes population normal (it doesn't!)
❌ Forgetting to divide σ by √n when finding the standard deviation of $\bar{x}$
❌ Applying CLT when sampling not random

Quick Reference

CLT: For large n, $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ regardless of population shape

Conditions:

Random sample
n ≥ 30 (general rule)
10% condition if sampling without replacement (n at most 10% of the population)

Power: Works for ANY population distribution with a finite mean and standard deviation!

Remember: CLT is the foundation for much of statistical inference. It's why we can use normal-based methods even when populations aren't normal!
