Central Limit Theorem
Distribution of sample means
What is the Central Limit Theorem?
Central Limit Theorem (CLT): For large enough n, the sampling distribution of x̄ is approximately normal, regardless of population shape
This is remarkable! Population can be skewed, uniform, bimodal, anything → sampling distribution still approximately normal
Formal Statement
If:
- Take random samples of size n
- From ANY population with mean μ and standard deviation σ
- n is "sufficiently large"
Then: x̄ is approximately distributed as N(μ, σ/√n)
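The claim is easy to check by simulation. The sketch below (not from the original notes; the exponential population, seed, and sample sizes are illustrative choices) draws many samples of size n = 30 from a clearly non-normal population and confirms that the sample means center on μ with spread close to σ/√n:

```python
import random
import statistics

random.seed(0)  # illustrative seed for reproducibility

# Skewed population (illustrative choice): exponential with mean mu = 2.
# For an exponential distribution, sigma = mu = 2 -- and it is far from normal.
mu = sigma = 2.0
n, reps = 30, 20_000

# Draw many samples of size n and record each sample mean
means = [statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
         for _ in range(reps)]

# CLT prediction: center near mu = 2, spread near sigma/sqrt(n) = 0.365
print(statistics.fmean(means))
print(statistics.stdev(means))
```

A histogram of `means` would look bell-shaped even though the population itself is strongly right-skewed.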
How Large is "Large Enough"?
Rule of thumb:
- Population roughly normal → n ≥ 10 okay
- Population moderately skewed → n ≥ 30
- Population heavily skewed or has outliers → n ≥ 40+
Key: More skewed population → need larger n
In practice: n = 30 often cited as general threshold
Why CLT Matters
Problem: Real populations rarely normal
Solution: CLT lets us use normal distribution for inference anyway (if n large enough)
Applications:
- Confidence intervals for means
- Hypothesis tests for means
- Control charts in quality control
Power: Works for ANY population distribution!
Example: Uniform Population
Population: Uniform on [0, 10]
- μ = 5
- σ = 10/√12 ≈ 2.89
- Shape: Rectangular (not normal!)
Sample n = 30: x̄ ≈ N(5, 2.89/√30) = N(5, 0.53)
Even though the population is uniform, x̄ is approximately normal!
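A quick simulation of this uniform example (a sketch, with an illustrative seed and repetition count) verifies both the center and the spread predicted above:

```python
import math
import random
import statistics

random.seed(1)  # illustrative seed
n, reps = 30, 20_000

# Uniform[0, 10] population: mu = 5, sigma = 10/sqrt(12) ~ 2.89
means = [statistics.fmean(random.uniform(0, 10) for _ in range(n))
         for _ in range(reps)]

se_theory = (10 / math.sqrt(12)) / math.sqrt(n)   # ~ 0.527

print(statistics.fmean(means))   # close to 5
print(statistics.stdev(means))   # close to se_theory
```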
Visualizing CLT
Population: Right-skewed (e.g., salaries)
Sampling distributions for different n:
- n = 2: Still skewed
- n = 5: Less skewed
- n = 10: Nearly symmetric
- n = 30: Very close to normal
Pattern: As n increases, sampling distribution becomes more normal
CLT for Proportions
Also applies to sample proportions!
If: np ≥ 10 and n(1-p) ≥ 10
Then: p̂ is approximately normal: p̂ ≈ N(p, √(p(1−p)/n))
This is why binomial → normal for large n!
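The proportion version can be checked the same way. This sketch (the values p = 0.3 and n = 100 are illustrative, not from the notes) simulates many sample proportions and compares their spread to √(p(1−p)/n):

```python
import math
import random
import statistics

random.seed(2)  # illustrative seed
p, n, reps = 0.3, 100, 20_000

# Success/failure condition: np = 30 >= 10 and n(1-p) = 70 >= 10

# Simulate many sample proportions p-hat
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

se_theory = math.sqrt(p * (1 - p) / n)   # ~ 0.0458
print(statistics.fmean(phats))           # close to p
print(statistics.stdev(phats))           # close to se_theory
```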
Using CLT in Practice
Example: Battery life μ = 50 hours, σ = 8 hours (unknown distribution). Sample n = 40.
P(x̄ > 52) = ?
By CLT (n = 40 ≥ 30): x̄ ≈ N(50, 8/√40) = N(50, 1.265)
z = (52 − 50)/1.265 ≈ 1.58
P(Z > 1.58) ≈ 0.057
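The battery-life calculation can be done directly in Python. This sketch uses `math.erfc` to get the normal upper-tail probability, avoiding any table lookup:

```python
import math

mu, sigma, n, cutoff = 50, 8, 40, 52

se = sigma / math.sqrt(n)          # 8 / sqrt(40) ~ 1.265
z = (cutoff - mu) / se             # ~ 1.58

# Standard normal upper tail: P(Z > z) = 0.5 * erfc(z / sqrt(2))
p_tail = 0.5 * math.erfc(z / math.sqrt(2))
print(round(p_tail, 3))            # ~ 0.057
```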
CLT vs Normal Population
If population already normal:
- Sampling distribution of x̄ is exactly normal for any n
- Don't need CLT
If population not normal:
- Need CLT to justify normal approximation
- Require "large enough" n
Standard Error from CLT
Population σ usually unknown
Use sample standard deviation s: SE(x̄) = s/√n
For large n, s ≈ σ, so s/√n ≈ σ/√n
This is basis for t-procedures!
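In code, estimating the standard error from a sample is one line. This sketch reuses the battery setting from above as a hypothetical example (true σ = 8 is used only to generate the data and would be unknown in practice):

```python
import math
import random
import statistics

random.seed(3)  # illustrative seed

# Hypothetical sample of n = 40 battery lifetimes; the true sigma = 8
# is used only to simulate data and is treated as unknown afterward.
sample = [random.gauss(50, 8) for _ in range(40)]

s = statistics.stdev(sample)           # sample standard deviation
se_hat = s / math.sqrt(len(sample))    # estimated standard error of x-bar
print(se_hat)                          # near 8/sqrt(40) ~ 1.265
```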
Implications of CLT
1. Large samples good: Overcome non-normality
2. Can make inferences: Use normal probabilities for x̄
3. Justifies methods: Confidence intervals, hypothesis tests work even if population not normal
4. Sample size planning: Can determine n needed for desired precision
CLT Limitations
Doesn't apply if:
- Sample not random
- Population has infinite variance (rare in practice)
- n too small relative to skewness
Doesn't fix:
- Bias in sampling
- Non-random samples
- Measurement errors
Remember: CLT is about shape of sampling distribution, not about making bad samples good!
Historical Context
Discovered: 18th–19th centuries
Abraham de Moivre: Binomial case (1733); Pierre-Simon Laplace generalized it (1810)
Lindeberg and Lévy: Modern general version (1920s)
One of most important theorems in statistics!
Sum vs Mean
CLT applies to both:
Sum: ΣXᵢ ≈ N(nμ, σ√n)
Mean: x̄ ≈ N(μ, σ/√n)
Relationship: x̄ = (ΣXᵢ)/n, so their properties are related
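Both forms can be verified in one simulation. This sketch (parameters μ = 3, σ = 1.5, n = 25 are illustrative) computes sums and means from the same samples and checks each against its predicted normal parameters:

```python
import random
import statistics

random.seed(4)  # illustrative seed
mu, sigma, n, reps = 3.0, 1.5, 25, 20_000

samples = [[random.gauss(mu, sigma) for _ in range(n)] for _ in range(reps)]
sums = [sum(s) for s in samples]
means = [sum(s) / n for s in samples]

# Theory: sum  ~ N(n*mu, sigma*sqrt(n)) = N(75, 7.5)
#         mean ~ N(mu, sigma/sqrt(n))   = N(3, 0.3)
print(statistics.fmean(sums), statistics.stdev(sums))
print(statistics.fmean(means), statistics.stdev(means))
```

Note the sum's spread grows with n (σ√n) while the mean's spread shrinks (σ/√n); dividing by n rescales both center and spread.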
Checking Conditions
Before using CLT:
- Random sample? (Independence)
- 10% condition? (If sampling without replacement)
- Large enough n? (Depends on population shape)
If all met → Proceed with normal approximation
Common Mistakes
❌ Using CLT with small n from skewed population
❌ Thinking CLT makes population normal (it doesn't!)
❌ Forgetting to divide by √n
❌ Applying CLT when sampling not random
Quick Reference
CLT: For large n, x̄ ≈ N(μ, σ/√n), regardless of population shape
Conditions:
- Random sample
- n ≥ 30 (general rule)
- 10% condition if without replacement
Power: Works for ANY population distribution!
Remember: CLT is the foundation for much of statistical inference. It's why we can use normal-based methods even when populations aren't normal!