Central Limit Theorem
Distribution of sample means
What is the Central Limit Theorem?
Central Limit Theorem (CLT): For large enough n, the sampling distribution of the sample mean x̄ is approximately normal, regardless of population shape
This is remarkable! Population can be skewed, uniform, bimodal, anything → sampling distribution still approximately normal
Formal Statement
If:
- Take random samples of size n
- From ANY population with mean μ and standard deviation σ
- n is "sufficiently large"
Then: x̄ is approximately distributed as:
x̄ ~ Normal(μ, σ/√n), that is, mean μₓ̄ = μ and standard deviation σₓ̄ = σ/√n
How Large is "Large Enough"?
Rule of thumb:
- Population roughly normal → n ≥ 10 okay
- Population moderately skewed → n ≥ 30
- Population heavily skewed or has outliers → n ≥ 40+
Key: More skewed population → need larger n
In practice: n = 30 often cited as general threshold
Why CLT Matters
Problem: Real populations rarely normal
Solution: CLT lets us use normal distribution for inference anyway (if n large enough)
Applications:
- Confidence intervals for means
- Hypothesis tests for means
- Control charts in quality control
Power: Works for ANY population shape, as long as the variance is finite!
Example: Uniform Population
Population: Uniform on [0, 10]
- μ = 5
- σ = 10/√12 ≈ 2.89
- Shape: Rectangular (not normal!)
Sample n = 30:
x̄ ~ Normal(5, 2.89/√30 ≈ 0.53) approximately
Even though the population is uniform, x̄ is approximately normal!
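To see this numerically, here is a minimal simulation sketch using only the Python standard library. The function name simulate_sample_means, the seed, and the 20,000 repetitions are illustrative choices, not part of the example above.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from Uniform[0, 10]; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.uniform(0, 10) for _ in range(n))
        for _ in range(num_samples)
    ]

means = simulate_sample_means(n=30, num_samples=20_000)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_se = (10 / sqrt(12)) / sqrt(30)  # sigma / sqrt(n), about 0.527

# Share of simulated sample means within one standard error of mu = 5
within_one_se = sum(abs(m - 5) <= predicted_se for m in means) / len(means)

print(f"mean of sample means: {mean_of_means:.3f} (CLT predicts 5)")
print(f"SD of sample means:   {sd_of_means:.3f} (CLT predicts {predicted_se:.3f})")
print(f"share within 1 SE:    {within_one_se:.3f} (normal curve gives about 0.683)")
```

If the normal approximation is good, each simulated value should land close to the figure in parentheses.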
Visualizing CLT
Population: Right-skewed (e.g., salaries)
Sampling distributions for different n:
- n = 2: Still skewed
- n = 5: Less skewed
- n = 10: Nearly symmetric
- n = 30: Very close to normal
Pattern: As n increases, sampling distribution becomes more normal
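The pattern can be checked with a short simulation. This sketch assumes an exponential population as a stand-in for a right-skewed variable such as salaries (an assumption; no actual salary distribution is given here) and measures the skewness of the simulated sample means for each n. The helper names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

def sample_mean_skewness(n, num_samples=20_000, seed=1):
    """Skewness of the simulated sampling distribution of the mean for sample size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))  # assumed population
        for _ in range(num_samples)
    ]
    return skewness(means)

skew_by_n = {n: sample_mean_skewness(n) for n in (2, 5, 10, 30)}
for n, sk in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {sk:.2f}")
```

For independent draws, the skewness of x̄ equals the population skewness divided by √n, so the printed values should fall steadily toward 0 as n grows.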
CLT for Proportions
Also applies to sample proportions!
If: np ≥ 10 and n(1-p) ≥ 10
Then: p̂ is approximately normal:
p̂ ~ Normal(p, √(p(1-p)/n))
This is why binomial → normal for large n!
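As a rough check of the approximation, the sketch below compares an exact binomial probability with the normal approximation for p̂. The values n = 100 and p = 0.3 and the cutoff 0.35 are hypothetical, chosen only so that both conditions hold.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 100, 0.3  # hypothetical values: np = 30 and n(1-p) = 70 are both >= 10

# Normal approximation for the sample proportion p-hat
p_hat_dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# P(p-hat <= 0.35) is the same event as "at most 35 successes out of 100"
exact = binomial_cdf(35, n, p)
approx = p_hat_dist.cdf(0.35)

print(f"exact binomial:       {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```

The two numbers should be close but not identical; the gap shrinks as n grows.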
Using CLT in Practice
Example: Battery life μ = 50 hours, σ = 8 hours (unknown distribution). Sample n = 40.
P(x̄ > 52) = ?
By CLT (n = 40 ≥ 30): x̄ ~ Normal(50, 8/√40 ≈ 1.265) approximately
z = (52 - 50)/1.265 ≈ 1.58
P(x̄ > 52) = P(Z > 1.58) ≈ 0.057
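The same calculation can be sketched in Python with statistics.NormalDist from the standard library. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 8, 40

se = sigma / sqrt(n)   # standard error of the sample mean
z = (52 - mu) / se     # z-score of a sample mean of 52 hours

# Upper-tail probability under the CLT normal approximation
p_greater = 1 - NormalDist(mu, se).cdf(52)

print(f"SE = {se:.3f}, z = {z:.2f}, P(x-bar > 52) = {p_greater:.3f}")
```

Software keeps z unrounded, so the result can differ from a table lookup in the third or fourth decimal place.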
CLT vs Normal Population
If population already normal:
- Sampling distribution of x̄ is exactly normal for any n
- Don't need CLT
If population not normal:
- Need CLT to justify normal approximation
- Require "large enough" n
Standard Error from CLT
Population σ usually unknown
Use sample standard deviation s:
SE(x̄) = s/√n
For large n, s ≈ σ, so:
x̄ ~ Normal(μ, s/√n) approximately
This is basis for t-procedures!
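A small sketch of estimating the standard error from data. The population here is an assumption made for illustration: a mildly skewed gamma distribution with mean 50 and SD 8, sampled with n = 400 so that s is a stable estimate of σ.

```python
import random
import statistics
from math import sqrt

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: gamma with shape 39.0625 and scale 1.28,
# so mean = shape * scale = 50 and variance = shape * scale**2 = 64 (SD 8).
n = 400
sample = [rng.gammavariate(39.0625, 1.28) for _ in range(n)]

s = statistics.stdev(sample)   # sample standard deviation (n - 1 in the denominator)
estimated_se = s / sqrt(n)     # what we can compute from data alone
true_se = 8 / sqrt(n)          # only known because we chose the population

print(f"s = {s:.2f} (sigma = 8)")
print(f"estimated SE = {estimated_se:.3f}, true SE = {true_se:.3f}")
```

In real data only s and the estimated SE are available; the comparison with σ is possible here only because the population was simulated.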
Implications of CLT
1. Large samples good: Overcome non-normality
2. Can make inferences: Use normal probabilities for x̄
3. Justifies methods: Confidence intervals, hypothesis tests work even if population not normal
4. Sample size planning: Can determine n needed for desired precision
CLT Limitations
Doesn't apply if:
- Sample not random
- Population has infinite variance (very rare)
- n too small relative to skewness
Doesn't fix:
- Bias in sampling
- Non-random samples
- Measurement errors
Remember: CLT is about shape of sampling distribution, not about making bad samples good!
Historical Context
Discovered: 18th century
Pierre-Simon Laplace: Proved for binomial (1812)
Lindeberg-Lévy: General version (1920s)
One of most important theorems in statistics!
Sum vs Mean
CLT applies to both:
Sum: ΣX ~ Normal(nμ, σ√n) approximately
Mean: x̄ ~ Normal(μ, σ/√n) approximately
Relationship: x̄ = ΣX/n, so properties related
Checking Conditions
Before using CLT:
- Random sample? (Independence)
- 10% condition? (If sampling without replacement)
- Large enough n? (Depends on population shape)
If all met → Proceed with normal approximation
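The checklist can be sketched as a small helper. The function names, argument names, and the default threshold min_n=30 (the rule of thumb quoted earlier) are all assumptions for illustration; randomness in particular cannot be verified by code and is simply passed in as a flag.

```python
def check_clt_conditions(n, population_size=None, is_random=True,
                         population_normal=False, min_n=30):
    """Return pass/fail flags for the three checks listed above.

    population_size=None treats the population as effectively infinite
    (or sampling as with replacement), so the 10% condition passes.
    min_n=30 is only a rule of thumb; raise it for heavily skewed populations.
    """
    return {
        "random": is_random,
        "ten_percent": population_size is None or n <= 0.10 * population_size,
        "large_enough": population_normal or n >= min_n,
    }

def clt_ok(**kwargs):
    """True only if every condition passes."""
    return all(check_clt_conditions(**kwargs).values())

# Hypothetical scenarios
print(check_clt_conditions(n=50))                          # all pass
print(check_clt_conditions(n=100, population_size=200))    # 10% condition fails
print(check_clt_conditions(n=20, population_normal=True))  # normal population, any n
```

Returning the individual flags rather than a single True/False makes it clear which condition failed.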
Common Mistakes
❌ Using CLT with small n from skewed population
❌ Thinking CLT makes population normal (it doesn't!)
❌ Forgetting to divide by √n
❌ Applying CLT when sampling not random
Quick Reference
CLT: For large n, x̄ ~ Normal(μ, σ/√n) approximately, regardless of population shape
Conditions:
- Random sample
- n ≥ 30 (general rule)
- 10% condition if without replacement
Power: Works for ANY population shape, as long as the variance is finite!
Remember: CLT is the foundation for much of statistical inference. It's why we can use normal-based methods even when populations aren't normal!
📚 Practice Problems
Problem 1 (easy)
❓ Question:
State the Central Limit Theorem and explain what conditions must be met for it to apply.
💡 Solution
Step 1: State the Central Limit Theorem For a random sample of size n from a population with mean μ and standard deviation σ:
As n increases, the sampling distribution of x̄ (sample mean) approaches a normal distribution with:
- Mean: μₓ̄ = μ
- Standard deviation: σₓ̄ = σ/√n
This happens REGARDLESS of the shape of the population distribution!
Step 2: Conditions that must be met
1. RANDOMNESS:
- Sample must be randomly selected
- Each member has equal chance of selection
- Ensures sample is representative
2. INDEPENDENCE:
- Observations must be independent
- 10% condition: n ≤ 0.10N (sample ≤ 10% of population)
- If sampling without replacement from small population, need correction
3. SAMPLE SIZE:
- Larger n → more normal
- Rule of thumb: n ≥ 30 usually sufficient
- If population is normal: works for ANY n
- If population is skewed: need larger n (maybe 40+)
- If population has extreme outliers: need even larger n
Step 3: Why randomness matters Non-random samples:
- Convenience sample → biased
- Voluntary response → biased
- Cannot apply CLT to biased samples
Random selection ensures:
- Sample represents population
- Results can be generalized
Step 4: Why independence matters Independence violated when:
- Sampling without replacement from small population (>10%)
- Clustered sampling (family members, same class)
- Time series data (measurements over time)
Effects of dependence:
- Standard formulas don't apply
- Need special methods
Step 5: Why sample size matters Small n:
- x̄ distribution looks like population distribution
- If population is skewed, x̄ is skewed
- If population is bimodal, x̄ is bimodal
Large n:
- CLT "kicks in"
- x̄ distribution becomes approximately normal
- Averaging smooths out population shape
Step 6: Examples of CLT conditions check
VALID: n = 50 from random digit table
✓ Random
✓ Independent (infinite population)
✓ n = 50 ≥ 30
INVALID: n = 100 from class of 200 students without replacement
✓ Could be random
✗ Not independent (100 > 0.10 × 200 = 20)
✓ n = 100 ≥ 30
Conclusion: 10% condition violated
VALID: n = 20 from normal population
✓ Random (assumed)
✓ Independent (assumed)
✓ Population is normal (works for any n)
Answer: CENTRAL LIMIT THEOREM: The sampling distribution of x̄ approaches Normal(μ, σ/√n) as n increases, regardless of population shape.
CONDITIONS:
- Random sample from population
- Independent observations (10% condition: n ≤ 0.10N)
- Large enough sample (n ≥ 30, or any n if population is normal)
All three conditions must be met to apply CLT and use normal probability calculations.
Problem 2 (easy)
❓ Question:
A population has a right-skewed distribution with μ = 25 and σ = 8. For samples of size n = 64, describe the sampling distribution of x̄ and calculate P(24 < x̄ < 26).
💡 Solution
Step 1: Check CLT conditions Population: Right-skewed (not normal) Sample size: n = 64
Is n ≥ 30? Yes, 64 ≥ 30 ✓ By CLT: x̄ is approximately normally distributed
Step 2: Find parameters of sampling distribution Mean: μₓ̄ = μ = 25
Standard deviation (standard error): σₓ̄ = σ/√n = 8/√64 = 8/8 = 1
Step 3: Describe the sampling distribution x̄ ~ Normal(μ = 25, σ = 1) approximately
Key points:
- Shape: approximately normal (due to CLT)
- Center: μₓ̄ = 25 (same as population)
- Spread: σₓ̄ = 1 (much less than population σ = 8)
Even though population is right-skewed, x̄ is approximately normal!
Step 4: Calculate P(24 < x̄ < 26) Standardize to z-scores:
z₁ = (24 - 25)/1 = -1/1 = -1 z₂ = (26 - 25)/1 = 1/1 = 1
P(24 < x̄ < 26) = P(-1 < Z < 1)
Step 5: Use empirical rule or table From empirical rule: About 68% of normal distribution is within 1 SD of mean
P(-1 < Z < 1) ≈ 0.68
More precisely from table: P(Z < 1) = 0.8413 P(Z < -1) = 0.1587 P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826
Step 6: Interpret the result About 68.3% of all samples of size 64 will have sample means between 24 and 26.
This range is μ ± 1σₓ̄ = 25 ± 1 Very common for x̄ to fall in this range!
Step 7: Compare to individual values For individual value X from population:
- Can't use normal (population is skewed)
- Can't easily find P(24 < X < 26)
- Would need actual population distribution
For sample mean x̄:
- CAN use normal (CLT applies)
- Easy to calculate probabilities
- CLT is powerful!
Step 8: Effect of sample size If we used n = 16 instead: σₓ̄ = 8/√16 = 2 P(24 < x̄ < 26) = P(-0.5 < Z < 0.5) ≈ 0.38 Less likely to be close to μ with smaller sample
If we used n = 256 instead: σₓ̄ = 8/√256 = 0.5 P(24 < x̄ < 26) = P(-2 < Z < 2) ≈ 0.95 More likely to be close to μ with larger sample
Answer: Sampling distribution: x̄ ~ Normal(μ = 25, σ = 1) approximately
Despite the right-skewed population, the large sample size (n = 64) allows CLT to apply, making x̄ approximately normal.
P(24 < x̄ < 26) ≈ 0.683 or 68.3%
About 68% of samples will have means within 1 unit of the population mean.
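The probabilities in this problem, including the sample-size comparison in Step 8, can be reproduced with a short sketch. The helper name prob_mean_between is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < x-bar < hi) under the CLT normal approximation."""
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(hi) - xbar.cdf(lo)

p_16 = prob_mean_between(24, 26, mu=25, sigma=8, n=16)
p_64 = prob_mean_between(24, 26, mu=25, sigma=8, n=64)
p_256 = prob_mean_between(24, 26, mu=25, sigma=8, n=256)

for n, p in ((16, p_16), (64, p_64), (256, p_256)):
    print(f"n = {n:>3}: SE = {8 / sqrt(n):.2f}, P(24 < x-bar < 26) = {p:.4f}")
```

The same interval captures more of the sampling distribution as n grows, because the standard error shrinks like 1/√n.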
Problem 3 (medium)
❓ Question:
The weights of carry-on luggage at an airport are heavily right-skewed with μ = 18 lbs and σ = 6 lbs. A flight has 100 passengers. What is the probability that the average luggage weight for these 100 passengers exceeds 19 lbs?
💡 Solution
Step 1: Set up the problem Population (luggage weights):
- Heavily right-skewed
- μ = 18 lbs
- σ = 6 lbs
Sample:
- n = 100 passengers
- Find: P(x̄ > 19)
Step 2: Check CLT conditions Random: Assume passengers are representative sample ✓ Independent: 100 passengers << all passengers (10% rule) ✓ Sample size: n = 100 ≥ 30, even with heavy skew ✓
CLT applies!
Step 3: Find sampling distribution parameters μₓ̄ = μ = 18 lbs
σₓ̄ = σ/√n = 6/√100 = 6/10 = 0.6 lbs
x̄ ~ Normal(18, 0.6) approximately
Step 4: Calculate P(x̄ > 19) Standardize: z = (19 - 18)/0.6 = 1/0.6 = 5/3 ≈ 1.67
P(x̄ > 19) = P(Z > 1.67)
Step 5: Look up probability From standard normal table: P(Z < 1.67) ≈ 0.9525
Therefore: P(Z > 1.67) = 1 - 0.9525 = 0.0475
Step 6: Interpret Only about 4.75% chance (less than 5%) that average luggage weight exceeds 19 lbs.
Even though population is heavily skewed:
- Individual bags vary a lot (σ = 6)
- Average of 100 bags is much more stable (σₓ̄ = 0.6)
- Large sample "averages out" the skewness
Step 7: Why this matters for airlines Airline might set weight limit based on average:
- If average > 19 lbs, might have safety concerns
- Probability < 5% means this is rare
- Can plan accordingly
Individual approach would be harder:
- Individual weights range widely
- Many over 19 lbs (population is skewed right)
- But average is more predictable!
Step 8: Compare to individual luggage For one random bag: P(X > 19) = ?
Can't easily calculate - population is skewed, not normal. But probably much higher than 4.75%! Maybe 30-40% of bags exceed 19 lbs.
But average of 100 bags rarely exceeds 19 lbs.
Step 9: Check reasonableness 19 lbs is 1 lb above mean In terms of SE: 19 = 18 + 1.67(0.6) = 18 + 1.67σₓ̄ About 1.67 SE above mean Should be fairly unlikely ✓
Answer: P(x̄ > 19) ≈ 0.048 or 4.8%
There's only about a 4.8% chance that the average luggage weight for 100 passengers exceeds 19 lbs. The Central Limit Theorem allows us to treat the sample mean as approximately normal despite the heavily skewed population, and the large sample size (n = 100) makes the sample mean much less variable than individual values.
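One way to sanity-check the normal approximation is to simulate flights from a concrete skewed population. The problem does not specify the population's form, so this sketch assumes one purely for illustration: a 12 lb minimum plus an exponential amount with mean 6 lb, which has mean 18 and SD 6 and a long right tail. The seed and the 10,000 simulated flights are also arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU, SIGMA, N = 18, 6, 100

# CLT normal approximation for P(x-bar > 19)
clt_prob = 1 - NormalDist(MU, SIGMA / sqrt(N)).cdf(19)

rng = random.Random(42)  # fixed seed so the run is repeatable

def one_flight_mean():
    """Average bag weight for one simulated flight of N passengers.

    Assumed population: 12 + Exponential(mean 6), so mean 18 and SD 6.
    """
    return fmean(12 + rng.expovariate(1 / 6) for _ in range(N))

num_flights = 10_000
sim_prob = sum(one_flight_mean() > 19 for _ in range(num_flights)) / num_flights

print(f"CLT approximation: {clt_prob:.4f}")
print(f"simulation:        {sim_prob:.4f}")
```

Some gap between the two numbers is expected: with a heavily skewed population, the sampling distribution at n = 100 is only approximately normal, and the upper tail is where the leftover skew shows.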
Problem 4 (medium)
❓ Question:
A factory produces batteries with lifetimes that have μ = 500 hours and σ = 100 hours. Quality control tests samples of 50 batteries. What is the probability that a sample mean is more than 25 hours away from the true mean (in either direction)?
💡 Solution
Step 1: Translate the question "More than 25 hours away from true mean" means: Either x̄ < 475 or x̄ > 525
Find: P(|x̄ - μ| > 25) = P(x̄ < 475) + P(x̄ > 525)
Step 2: Set up sampling distribution μ = 500 hours σ = 100 hours n = 50
Check CLT: n = 50 ≥ 30 ✓
Step 3: Find sampling distribution parameters μₓ̄ = μ = 500
σₓ̄ = σ/√n = 100/√50 = 100/7.07 ≈ 14.14 hours
x̄ ~ Normal(500, 14.14) approximately
Step 4: Use symmetry By symmetry of normal distribution: P(x̄ < 475) = P(x̄ > 525)
So: P(x̄ < 475 or x̄ > 525) = 2 × P(x̄ > 525)
Step 5: Calculate P(x̄ > 525) Standardize: z = (525 - 500)/14.14 = 25/14.14 ≈ 1.77
P(x̄ > 525) = P(Z > 1.77)
Step 6: Look up probability From table: P(Z < 1.77) ≈ 0.9616
Therefore: P(Z > 1.77) = 1 - 0.9616 = 0.0384
Step 7: Find total probability P(more than 25 away) = 2 × 0.0384 = 0.0768 ≈ 0.077
Step 8: Interpret About 7.7% of samples will have means more than 25 hours from the true mean.
This means:
- 92.3% of samples within 25 hours of μ = 500
- Quality control can use this to set thresholds
- If x̄ is more than 25 away, might indicate problem
Step 9: Express in terms of standard errors 25 hours = 1.77 × 14.14 ≈ 1.77 SE
So we're asking: P(more than 1.77 SE from mean)
Compare with the 68-95-99.7 rule:
- Within 1 SE: ≈68%
- Within 2 SE: ≈95%
- Within 1.77 SE: ≈92.3% (from Step 7; falls between the two, as expected)
Outside 1.77 SE: ≈7.7% ✓
Step 10: Decision rule for quality control Factory might use rule: "Flag sample if x̄ < 475 or x̄ > 525"
False alarm rate: 7.7% About 1 in 13 good samples will be flagged Reasonable tradeoff for quality control
Answer: P(|x̄ - μ| > 25) ≈ 0.077 or 7.7%
There's about a 7.7% probability that a sample mean will be more than 25 hours away from the true mean of 500 hours. This represents being more than 1.77 standard errors from the mean. Quality control can use this threshold to identify unusual samples that might indicate production problems.
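The two-sided calculation, and the reverse question of what margin gives a chosen false alarm rate, can be sketched as follows. The helper names are hypothetical, and the 5% rate in the second call is an illustrative choice, not part of the problem.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def prob_outside(margin, sigma, n):
    """P(|x-bar - mu| > margin) under the normal approximation."""
    z = margin / (sigma / sqrt(n))
    return 2 * (1 - STD_NORMAL.cdf(z))

def margin_for_rate(rate, sigma, n):
    """Margin whose two-sided tail probability equals `rate`."""
    z = STD_NORMAL.inv_cdf(1 - rate / 2)
    return z * sigma / sqrt(n)

p_outside = prob_outside(25, sigma=100, n=50)
margin_5pct = margin_for_rate(0.05, sigma=100, n=50)

print(f"P(more than 25 hours from mu) = {p_outside:.4f}")
print(f"margin for a 5% false alarm rate = {margin_5pct:.2f} hours")
```

Widening the flagging threshold lowers the false alarm rate, at the cost of being slower to notice a real shift in the process.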
Problem 5 (hard)
❓ Question:
An elevator has a maximum safe weight of 2000 lbs. If adult weights are normally distributed with μ = 180 lbs and σ = 30 lbs, what is the probability that 10 randomly selected adults will exceed the elevator's limit? What about 12 adults?
💡 Solution
Step 1: Understand what we're finding For n adults, total weight = n × x̄ Want: P(total weight > 2000) Equivalently: P(n × x̄ > 2000) Or: P(x̄ > 2000/n)
Step 2: Set up for n = 10 Maximum average weight: 2000/10 = 200 lbs per person
Find: P(x̄ > 200) when n = 10
Step 3: Sampling distribution for n = 10 Population is normal, so x̄ is normal for ANY n (don't need CLT!)
μₓ̄ = μ = 180 lbs
σₓ̄ = σ/√n = 30/√10 = 30/3.16 ≈ 9.49 lbs
x̄ ~ Normal(180, 9.49)
Step 4: Calculate P(x̄ > 200) for n = 10 Standardize: z = (200 - 180)/9.49 = 20/9.49 ≈ 2.11
P(x̄ > 200) = P(Z > 2.11)
From table: P(Z < 2.11) ≈ 0.9826
P(Z > 2.11) = 1 - 0.9826 = 0.0174
Step 5: Interpret n = 10 result About 1.74% chance that 10 adults exceed 2000 lbs Fairly safe - less than 2% risk
Step 6: Set up for n = 12 Maximum average weight: 2000/12 ≈ 166.67 lbs per person
Find: P(x̄ > 166.67) when n = 12
Step 7: Sampling distribution for n = 12 μₓ̄ = 180 lbs
σₓ̄ = σ/√n = 30/√12 = 30/3.46 ≈ 8.66 lbs
x̄ ~ Normal(180, 8.66)
Step 8: Calculate P(x̄ > 166.67) for n = 12 Standardize: z = (166.67 - 180)/8.66 = -13.33/8.66 ≈ -1.54
P(x̄ > 166.67) = P(Z > -1.54)
From table: P(Z < -1.54) ≈ 0.0618
P(Z > -1.54) = 1 - 0.0618 = 0.9382
Step 9: Interpret n = 12 result About 93.8% chance that 12 adults exceed 2000 lbs! Very risky - almost certain to exceed limit
Step 10: Why such a big difference? n = 10: Need average > 200 lbs (20 lbs above μ) = 2.11 SE above mean Unlikely!
n = 12: Need average > 166.67 lbs (13.33 lbs below μ)
= 1.54 SE below mean
Very likely!
Step 11: Find maximum safe capacity At what n does P(exceed) = 0.05 (5% risk)?
Need: P(x̄ > 2000/n) = 0.05 P(Z > z) = 0.05 means z = 1.645
(2000/n - 180)/(30/√n) = 1.645
Solving: 2000/n = 180 + 1.645(30/√n) 2000 = 180n + 49.35√n
Approximately n ≈ 10.2
Check: n = 10 gives about 1.7% risk, but n = 11 already gives about 42% (z = (181.82 - 180)/9.05 ≈ 0.20)
So maximum safe capacity is about 10 adults for 5% risk level.
Answer: n = 10: P(exceed 2000 lbs) ≈ 0.017 or 1.7% n = 12: P(exceed 2000 lbs) ≈ 0.938 or 93.8%
With 10 adults, there's only about 1.7% chance of exceeding the limit (relatively safe). With 12 adults, there's about 93.8% chance of exceeding the limit (very dangerous!). The maximum average weight needed drops from 200 lbs (n=10) to 166.67 lbs (n=12), and 166.67 is well below the population mean of 180, making it very likely to exceed.
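An equivalent route works with the total weight directly: for independent normal weights, the sum of n weights is Normal(nμ, σ√n). The sketch below uses that form; the function name prob_overload is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_overload(n, limit=2000, mu=180, sigma=30):
    """P(total weight of n adults > limit), using Sum ~ Normal(n*mu, sigma*sqrt(n))."""
    total = NormalDist(n * mu, sigma * sqrt(n))
    return 1 - total.cdf(limit)

p_10 = prob_overload(10)
p_12 = prob_overload(12)

for n in range(9, 13):
    print(f"n = {n:>2}: P(total > 2000 lbs) = {prob_overload(n):.4f}")
```

The result matches the x̄ approach because total > 2000 is the same event as x̄ > 2000/n. The printout also shows how sharply the risk climbs once n × 180 approaches the limit.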