Central Limit Theorem

Distribution of sample means


What is the Central Limit Theorem?

Central Limit Theorem (CLT): For large enough n, the sampling distribution of $\bar{x}$ is approximately normal, regardless of the population's shape (as long as the population has a finite standard deviation)

This is remarkable! The population can be skewed, uniform, bimodal, or almost any other shape, and the sampling distribution of $\bar{x}$ is still approximately normal

Formal Statement

If:

  • Take random samples of size n
  • From ANY population with mean μ and finite standard deviation σ
  • n is "sufficiently large"

Then: $\bar{x}$ is approximately distributed as:

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

Here the second argument of N is the standard deviation of $\bar{x}$, σ/√n, called the standard error.
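A quick simulation can make the formal statement concrete. The sketch below assumes an exponential population with rate 1 (so μ = 1 and σ = 1, and the shape is strongly right-skewed), draws many samples of size n = 30 with Python's standard `random` module, and compares the mean and standard deviation of the resulting $\bar{x}$ values with μ and σ/√n. The helper name `simulate_sample_means` and the choice of population are illustrative assumptions, not part of the theorem.

```python
import math
import random
import statistics

def simulate_sample_means(draw, n, reps, seed=0):
    """Draw `reps` samples of size n using draw(rng) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1 (strongly right-skewed).
mu, sigma, n = 1.0, 1.0, 30
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n=n, reps=20_000)

# Compare the simulated sampling distribution with the CLT's mean and standard error.
print(f"mean of x-bar: {statistics.fmean(means):.3f}  (CLT: {mu})")
print(f"SD of x-bar:   {statistics.pstdev(means):.3f}  (CLT: {sigma / math.sqrt(n):.3f})")
```

The two printed pairs should agree to roughly two decimal places; the exact digits depend on the seed.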

How Large is "Large Enough"?

Rule of thumb:

  • Population roughly normal → n ≥ 10 okay
  • Population moderately skewed → n ≥ 30
  • Population heavily skewed or has outliers → n ≥ 40+

Key: The more skewed the population, the larger n must be

In practice: n ≥ 30 is often cited as a general threshold

Why CLT Matters

Problem: Real populations are rarely normal

Solution: The CLT lets us use the normal distribution for inference anyway (if n is large enough)

Applications:

  • Confidence intervals for means
  • Hypothesis tests for means
  • Control charts in quality control

Power: Works for ANY population distribution with finite variance!

Example: Uniform Population

Population: Uniform on [0, 10]

  • μ = 5
  • σ = 10/√12 ≈ 2.89
  • Shape: Rectangular (not normal!)

Sample n = 30:

$$\bar{x} \sim N\left(5, \frac{2.89}{\sqrt{30}}\right) \approx N(5, 0.528)$$

Even though the population is uniform, $\bar{x}$ is approximately normal!
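The numbers in this example can be checked directly. The sketch below, a minimal illustration using only the Python standard library, computes μ, σ, and the standard error from the Uniform[0, 10] formulas, then simulates 10,000 samples of size 30 and counts how often $\bar{x}$ lands within 1.96 standard errors of μ (about 95% if the normal approximation holds). Computing with unrounded σ gives an SE of about 0.527; the 0.528 in the example comes from rounding σ to 2.89 first.

```python
import math
import random
import statistics

# Uniform[0, 10] population: mu = (a + b) / 2, sigma = (b - a) / sqrt(12).
a, b, n = 0.0, 10.0, 30
mu = (a + b) / 2
sigma = (b - a) / math.sqrt(12)
se = sigma / math.sqrt(n)
print(f"mu = {mu}, sigma = {sigma:.2f}, SE = {se:.3f}")

# Simulation check: means of many size-30 samples from Uniform[0, 10].
rng = random.Random(1)
means = [statistics.fmean(rng.uniform(a, b) for _ in range(n)) for _ in range(10_000)]
print(f"simulated: mean = {statistics.fmean(means):.3f}, SD = {statistics.pstdev(means):.3f}")

# If x-bar is approximately normal, about 95% of means fall within mu +/- 1.96 * SE.
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)
print(f"fraction within 1.96 SE: {inside:.3f}")
```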

Visualizing CLT

Population: Right-skewed (e.g., salaries)

Sampling distributions for different n:

  • n = 2: Still skewed
  • n = 5: Less skewed
  • n = 10: Nearly symmetric
  • n = 30: Very close to normal

Pattern: As n increases, sampling distribution becomes more normal
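The pattern can be reproduced numerically by tracking the skewness of $\bar{x}$ as n grows. This sketch assumes an exponential population (population skewness 2) as a stand-in for a right-skewed variable like salaries; for that population the skewness of $\bar{x}$ is known to be 2/√n, so it should come out near 1.41, 0.89, 0.63, and 0.37 for n = 2, 5, 10, 30. A value near 0 indicates a symmetric, more normal-looking distribution.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: average cubed z-score (population-style formula)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

# Assumed right-skewed population: exponential with rate 1 (population skewness = 2).
rng = random.Random(42)
results = {}
for n in (2, 5, 10, 30):
    # 20,000 sample means for each sample size n.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(20_000)]
    results[n] = skewness(means)
    print(f"n = {n:2d}: skewness of x-bar = {results[n]:.2f}")
```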

CLT for Proportions

Also applies to sample proportions!

If: np ≥ 10 and n(1-p) ≥ 10

Then: $\hat{p}$ is approximately normal:

$$\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$$

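This is why the binomial distribution is approximately normal for large n: a binomial count is the sum of n independent 0/1 trials, and $\hat{p}$ is their mean!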
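The success/failure check and the standard error of $\hat{p}$ can be wrapped in a small function. This is a sketch; the function name `phat_normal_approx` and the numbers p = 0.3, n = 100 are hypothetical, chosen only to illustrate the formula.

```python
import math

def phat_normal_approx(p, n):
    """Return (mean, SE) of p-hat, or raise if np >= 10 and n(1-p) >= 10 do not both hold."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError(f"np = {n * p:g} or n(1-p) = {n * (1 - p):g} is below 10")
    return p, math.sqrt(p * (1 - p) / n)

# Hypothetical numbers for illustration: p = 0.3, n = 100 (np = 30, n(1-p) = 70).
mean, se = phat_normal_approx(0.3, 100)
print(f"p-hat ~ N({mean}, {se:.4f})")
```

With p = 0.05 and n = 100, np = 5 < 10, so the function raises instead of returning an approximation.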

Using CLT in Practice

Example: Battery life μ = 50 hours, σ = 8 hours (unknown distribution). Sample n = 40.

$P(\bar{x} > 52) = ?$

By CLT (n = 40 ≥ 30):

$$\bar{x} \sim N(50, 8/\sqrt{40}) \approx N(50, 1.265)$$

$$z = \frac{52-50}{1.265} \approx 1.58$$

P(Z > 1.58) ≈ 0.057
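The same calculation in Python, as a sketch using `statistics.NormalDist` from the standard library (Python 3.8+) for the normal CDF:

```python
import math
from statistics import NormalDist

mu, sigma, n = 50, 8, 40          # battery example: hours, hours, sample size
se = sigma / math.sqrt(n)         # standard error of x-bar, about 1.265
z = (52 - mu) / se                # about 1.58
p = 1 - NormalDist().cdf(z)       # upper-tail probability P(Z > z)
print(f"SE = {se:.3f}, z = {z:.2f}, P(x-bar > 52) = {p:.3f}")

# Same answer without standardizing: use the approximate sampling distribution directly.
p_direct = 1 - NormalDist(mu, se).cdf(52)
```

Standardizing to z and evaluating N(μ, SE) directly give the same tail probability, about 0.057; the unrounded z is about 1.581.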

CLT vs Normal Population

If population already normal:

  • Sampling distribution of $\bar{x}$ is exactly normal for any n
  • Don't need CLT

If population not normal:

  • Need CLT to justify normal approximation
  • Requires "large enough" n

Standard Error from CLT

Population σ is usually unknown

Use the sample standard deviation s instead:

$$SE = \frac{s}{\sqrt{n}}$$

For large n, s ≈ σ, so approximately:

$$\bar{x} \sim N\left(\mu, \frac{s}{\sqrt{n}}\right)$$

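This is the basis for t-procedures, which replace the normal with a t distribution to account for the extra uncertainty from estimating σ with s!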
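A minimal sketch of the standard-error estimate. The lifetimes below are made-up illustrative data, not from the source; `statistics.stdev` uses the n − 1 denominator, which is the usual definition of s. With only 10 values, s is a rough stand-in for σ, which is exactly the situation t-procedures are designed for.

```python
import math
import statistics

def standard_error(sample):
    """Estimate the SE of x-bar as s / sqrt(n), with s the sample SD (n - 1 denominator)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical battery lifetimes in hours (made-up data for illustration only).
lifetimes = [48.2, 55.1, 41.7, 60.3, 52.8, 47.5, 50.9, 44.6, 58.0, 49.4]
xbar = statistics.fmean(lifetimes)
se = standard_error(lifetimes)
print(f"x-bar = {xbar:.2f} hours, SE = {se:.2f} hours")
```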

Implications of CLT

1. Large samples help: They overcome non-normality in the population

2. Can make inferences: Use normal probabilities for $\bar{x}$

3. Justifies methods: Confidence intervals and hypothesis tests work even if the population is not normal

4. Sample size planning: Can determine n needed for desired precision
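For sample-size planning, one common approach (assuming σ is known or has a reasonable estimate) solves z*·σ/√n ≤ E for n, giving n = ⌈(z*σ/E)²⌉, where E is the desired margin of error. The function below is a hypothetical helper sketching that formula; it uses σ = 8 hours from the battery example and a 1-hour margin at 95% confidence.

```python
import math
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin (assumes sigma is known or well estimated)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z_star * sigma / margin) ** 2)

# Battery example's sigma = 8 hours; target margin of error 1 hour at 95% confidence.
n_needed = sample_size_for_margin(sigma=8, margin=1)
print(n_needed)
```

This yields 246. The CLT caveat still applies: if the formula returns a small n for a heavily skewed population, the normal approximation behind it may not hold.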

CLT Limitations

Doesn't apply if:

  • Sample not random
  • Population has infinite variance (rare in practice; e.g., the Cauchy distribution)
  • n too small relative to skewness

Doesn't fix:

  • Bias in sampling
  • Non-random samples
  • Measurement errors

Remember: The CLT is about the shape of the sampling distribution, not about making bad samples good!

Historical Context

Discovered: 18th century (Abraham de Moivre, 1733, for coin tossing)
Pierre-Simon Laplace: Extended de Moivre's result to the general binomial (1812)
Lindeberg-Lévy: General version (1920s)

One of the most important theorems in statistics!

Sum vs Mean

CLT applies to both:

Sum: $S = \sum X_i$
$$S \sim N(n\mu, \sigma\sqrt{n})$$

Mean: $\bar{X} = S/n$
$$\bar{X} \sim N(\mu, \sigma/\sqrt{n})$$

Relationship: Since $\bar{X} = S/n$, dividing the sum's mean nμ and standard deviation σ√n by n gives the mean's μ and σ/√n
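The relationship can be checked with the battery example (μ = 50, σ = 8, n = 40). In this sketch, the event "total of 40 lifetimes exceeds 40 × 52 = 2080 hours" is the same as "$\bar{x}$ exceeds 52 hours", so the two approximate distributions should give identical probabilities. `statistics.NormalDist` takes a mean and a standard deviation, matching the N(mean, SD) notation used here.

```python
import math
from statistics import NormalDist

mu, sigma, n = 50, 8, 40   # battery example

# Approximate sampling distributions from the CLT (second argument is the SD).
sum_dist = NormalDist(n * mu, sigma * math.sqrt(n))    # S ~ N(n*mu, sigma*sqrt(n))
mean_dist = NormalDist(mu, sigma / math.sqrt(n))       # X-bar ~ N(mu, sigma/sqrt(n))

# "Total of 40 lifetimes exceeds 2080 hours" is the same event as "x-bar exceeds 52 hours".
p_sum = 1 - sum_dist.cdf(n * 52)
p_mean = 1 - mean_dist.cdf(52)
print(f"P(S > {n * 52}) = {p_sum:.4f}, P(x-bar > 52) = {p_mean:.4f}")
```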

Checking Conditions

Before using CLT:

  1. Random sample? (Independence)
  2. 10% condition? (If sampling without replacement, n ≤ 10% of the population)
  3. Large enough n? (Depends on population shape)

If all met → Proceed with normal approximation
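The checklist can be sketched as a function. `check_clt_conditions` is a hypothetical helper, not a library API; whether a sample is random cannot be computed from the data, so it is passed in as a flag, and `min_n` defaults to the n ≥ 30 rule of thumb. It returns the list of failed checks, so an empty list means all three conditions are met.

```python
def check_clt_conditions(n, is_random_sample, population_size=None,
                         with_replacement=False, min_n=30):
    """Return a list of failed conditions (an empty list means all checks passed).

    min_n = 30 is the general rule of thumb; raise it for heavily skewed
    populations or outliers, lower it for roughly normal ones.
    """
    problems = []
    if not is_random_sample:
        problems.append("sample is not random")
    # 10% condition only matters when sampling without replacement from a finite population.
    if not with_replacement and population_size is not None and n > 0.10 * population_size:
        problems.append("10% condition fails: n exceeds 10% of the population")
    if n < min_n:
        problems.append(f"n = {n} is below the threshold of {min_n}")
    return problems

print(check_clt_conditions(n=40, is_random_sample=True, population_size=10_000))
print(check_clt_conditions(n=40, is_random_sample=True, population_size=300))
```

The first call returns an empty list; the second flags the 10% condition because 40 exceeds 10% of a population of 300.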

Common Mistakes

❌ Using CLT with small n from skewed population
❌ Thinking CLT makes population normal (it doesn't!)
❌ Forgetting to divide σ by √n
❌ Applying CLT when sampling not random

Quick Reference

CLT: For large n, $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ approximately, regardless of population shape

Conditions:

  • Random sample
  • n ≥ 30 (general rule)
  • 10% condition if without replacement

Power: Works for ANY population distribution with finite variance!

Remember: CLT is the foundation for much of statistical inference. It's why we can use normal-based methods even when populations aren't normal!
