Confidence Intervals for Means

Estimating population means using t-distributions

Why t-Distribution?

Problem: Population σ usually unknown

Solution: Use sample standard deviation s, but this adds uncertainty

Result: Use t-distribution instead of normal

t-distribution:

  • Similar to normal (symmetric, bell-shaped)
  • Heavier tails (accounts for extra uncertainty from using s)
  • Depends on degrees of freedom (df = n - 1)
  • As df increases, approaches normal
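Here is a quick numeric check of the first and last bullets. The sketch below codes the textbook t density and compares it with the standard normal density from Python's `statistics.NormalDist`. The helper name `t_pdf` is our own, not a library function. At a tail point such as x = 3, the t density is larger for small df and moves toward the normal value as df grows.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)),
    # computed on the log scale so large df does not overflow.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

z = NormalDist()  # standard normal: mean 0, SD 1
for df in (2, 5, 30, 1000):
    # Density at the tail point x = 3: larger for small df = heavier tails
    print(f"df={df:>4}: t density at 3 = {t_pdf(3, df):.5f}")
print(f"normal:    density at 3 = {z.pdf(3):.5f}")
```

At x = 0 the order reverses because the t peak is lower than the normal peak. That is the other half of the "heavier tails" picture: probability moves out of the center and into the tails.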

One-Sample t-Interval for Mean

Formula:

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

Where:

  • x̄ = sample mean
  • s = sample standard deviation
  • n = sample size
  • t* = critical value from t-distribution with df = n - 1

Conditions for t-Interval

Random: Random sample
Normal: Population approximately normal OR n ≥ 30 (CLT)
Independent: n < 10% of population (if sampling without replacement)

For normality:

  • If n < 15: Data must be very close to normal (check with plot)
  • If 15 ≤ n < 30: Data should be roughly symmetric, no outliers
  • If n ≥ 30: Can proceed unless severe outliers or extreme skew

Finding t* Critical Value

Calculator: invT(area to left, df)

Example: 95% CI with n = 20 (df = 19)

  • Area to left = (1 + 0.95)/2 = 0.975
  • invT(0.975, 19) ≈ 2.093

Table: Look up df and confidence level
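If no calculator or table is available, invT can be approximated numerically. This is a minimal sketch under our own assumptions, not the algorithm a TI-83/84 uses. `t_cdf` integrates the t density from 0 to |x| with Simpson's rule and uses symmetry about 0. `inv_t` then bisects for the value whose area to the left matches the request. All three function names are hypothetical helpers, and the search assumes |t*| < 100.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3          # area under the density from 0 to |x|
    return 0.5 + half_area if x >= 0 else 0.5 - half_area

def inv_t(area_left, df, lo=-100.0, hi=100.0):
    """t* with the given area to its left, found by bisection (assumes lo < t* < hi)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area_left:
            lo = mid                   # t* lies to the right of mid
        else:
            hi = mid                   # t* lies at or to the left of mid
    return (lo + hi) / 2

# 95% CI with n = 20: area to the left = (1 + 0.95) / 2 = 0.975, df = 19
print(round(inv_t(0.975, 19), 3))
```

This should print 2.093, matching invT(0.975, 19). For routine work, a statistics library's t quantile function is the better choice. This version only shows what invT computes.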

Example 1: Simple t-Interval

Test scores: n = 25, x̄ = 78, s = 12

95% CI:

Conditions:

  • Random: Assume ✓
  • Normal: n = 25 < 30, so assume the score distribution is roughly symmetric with no outliers ✓
  • Independent: 25 < 10% of all students (i.e., more than 250 students) ✓

Calculate:

  • df = 25 - 1 = 24
  • t* = 2.064 (from table/calculator)
  • SE = 12/√25 = 2.4

$$CI = 78 \pm 2.064(2.4) = 78 \pm 4.95$$

$$(73.05,\ 82.95)$$

Interpretation: We are 95% confident the true mean score is between 73.05 and 82.95.
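The one-sample formula translates directly into code. This sketch takes summary statistics and a t* that you supply from a table or invT. It does not compute t* itself, and `t_interval` is a hypothetical helper name. The usage line reproduces Example 1.

```python
import math

def t_interval(xbar, s, n, t_star):
    """One-sample t-interval xbar ± t* · s/√n from summary statistics.

    t_star must come from the t-distribution with df = n - 1
    (table, calculator invT, or software); it is not computed here.
    """
    se = s / math.sqrt(n)          # standard error of the sample mean
    me = t_star * se               # margin of error
    return xbar - me, xbar + me

# Example 1: n = 25, x̄ = 78, s = 12, 95% confidence, df = 24 → t* = 2.064
low, high = t_interval(78, 12, 25, 2.064)
print(f"({low:.2f}, {high:.2f})")
```

This prints (73.05, 82.95). Passing t* in explicitly keeps the df choice (df = n - 1) visible rather than hidden inside the function.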

t vs z

Use z when:

  • Known population σ (rare!)
  • Working with proportions

Use t when:

  • Unknown σ, using sample s (almost always for means!)

Key difference: t has heavier tails → wider intervals (more conservative)

Sample Size for Desired ME

Challenge: ME depends on s, which we don't know in advance

Approach:

  1. Estimate s from pilot study or similar data
  2. Use a conservative t*: n (and therefore df) is not yet known, so pick a t* at least as large as the final value
  3. Calculate n
  4. Round up

Formula:

$$n = \left(\frac{t^* s}{m}\right)^2$$

where m is the desired margin of error (ME).
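The sample-size formula can be written as a function. You choose all three inputs: the pilot estimate of s, the desired margin m, and the conservative t*. The numbers in the usage line are hypothetical (95% confidence assumed), and `sample_size_for_me` is our own name.

```python
import math

def sample_size_for_me(t_star, s_estimate, m):
    """Smallest n with t* · s/√n ≤ m, i.e. n = (t* · s / m)^2 rounded up.

    s_estimate comes from a pilot study or similar data; t_star should be
    conservative (at least as large as the t* for the final df).
    """
    # round() first so float noise such as 100.00000000000001 does not bump n up by 1
    return math.ceil(round((t_star * s_estimate / m) ** 2, 9))

# Hypothetical inputs: pilot s ≈ 12, desired ME m = 3, conservative 95% t* = 2.1
print(sample_size_for_me(2.1, 12, 3))
```

With these inputs, n = 71. At df = 70 the 95% t* is about 1.994, which is below the 2.1 used here. The achieved ME should therefore land a little under 3, provided the real s is close to the pilot estimate.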

Example 2: Checking Normality

Small sample (n = 12):

  • MUST check for approximate normality
  • Use dotplot, boxplot, or normal probability plot
  • Look for: symmetric shape, no outliers, no severe skew

If the data are skewed or have outliers and n is small, t-procedures are NOT appropriate
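Plots are the main tool here. The "no outliers" part of the check can also be backed up with the 1.5 × IQR rule that boxplots use to flag points. The sketch below assumes Python's `statistics.quantiles` with its default 'exclusive' method for the quartiles. Calculators and textbooks may compute quartiles slightly differently, so they can disagree about borderline points. The n = 12 data are hypothetical, and the check does not replace looking at the plot for skewness.

```python
import statistics

def iqr_outliers(data):
    """Return the values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR] (the boxplot outlier rule)."""
    # statistics.quantiles defaults to the 'exclusive' method; other tools
    # (calculators, textbooks) may place the quartiles slightly differently.
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical small sample, n = 12
sample = [4.1, 4.5, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.1, 6.4, 11.8]
print(iqr_outliers(sample))
```

Here 11.8 is flagged. With n = 12, that single point is enough to make a t-interval inappropriate unless it can be explained.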

Two-Sample t-Interval

Comparing two means:

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

df: Use calculator (complex formula) or conservative: min(n₁-1, n₂-1)

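Conditions: Each sample meets the Random, Normal, and Independent conditions, and the two samples are independent of each other

Interpretation: If the interval contains 0, a difference of 0 is plausible, so the data do not give convincing evidence that the two population means differ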

Paired Data

When data naturally paired:

  • Before/after on same subjects
  • Twins, matched pairs

Analyze differences:

  1. Calculate difference for each pair: d = x₁ - x₂
  2. One-sample t-interval on differences

$$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}$$

Where n = number of pairs, df = n - 1
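A paired analysis is a one-sample interval on the differences. The sketch computes d = x₁ - x₂ for each pair (here x₁ = before, x₂ = after), then d̄, s_d, and d̄ ± t* · s_d/√n. The function name and the four before/after readings are hypothetical. t* = 3.182 is the 95% value for df = 3.

```python
import math
import statistics

def paired_t_interval(before, after, t_star):
    """One-sample t-interval on the differences d = before - after.

    t_star must come from the t-distribution with df = (number of pairs) - 1.
    """
    if len(before) != len(after):
        raise ValueError("paired data need the same number of observations")
    diffs = [b - a for b, a in zip(before, after)]   # one difference per pair
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)                    # SD of the differences
    me = t_star * s_d / math.sqrt(n)
    return d_bar, s_d, (d_bar - me, d_bar + me)

# Hypothetical readings for 4 subjects (df = 3, 95% → t* = 3.182)
print(paired_t_interval([140, 150, 135, 142], [130, 145, 130, 136], 3.182))
```

Treating these as two independent samples instead would ignore the pairing and usually give a much wider interval, which is why confusing paired with two-sample data is a real mistake.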

Example 3: Paired Data

Blood pressure before/after medication (n = 15 patients):

  • d̄ = 8.2 (average decrease)
  • s_d = 5.1

90% CI for mean decrease:

  • df = 14
  • t* = 1.761
  • SE = 5.1/√15 ≈ 1.317

$$CI = 8.2 \pm 1.761(1.317) = 8.2 \pm 2.32$$

$$(5.88,\ 10.52)$$

Interpretation: We are 90% confident medication reduces blood pressure by 5.88 to 10.52 points on average.

Interpreting Confidence Level

Same as for proportions:

95% means if we repeated sampling many times, about 95% of intervals would contain true μ

NOT: "95% of data in interval" or "95% chance μ in interval"
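The repeated-sampling meaning of "95% confident" can be checked by simulation. This sketch assumes a normal population with hypothetical μ = 50 and σ = 10. It draws many samples of size n = 10 and builds x̄ ± crit · s/√n for each one, then reports the fraction of intervals that capture μ. With t* = 2.262 (df = 9, 95%) the fraction should be near 0.95. Swapping in z* = 1.96 shows why z* is the wrong critical value when s replaces σ.

```python
import math
import random
import statistics

def simulate_coverage(crit, mu=50.0, sigma=10.0, n=10, reps=10_000, seed=1):
    """Fraction of intervals x̄ ± crit · s/√n (one per simulated sample) that contain mu.

    Samples come from a normal population with the hypothetical mu and sigma.
    """
    rng = random.Random(seed)          # fixed seed so results are reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        me = crit * statistics.stdev(sample) / math.sqrt(n)
        if xbar - me <= mu <= xbar + me:
            hits += 1
    return hits / reps

print(simulate_coverage(2.262))  # t* for df = 9: close to 0.95
print(simulate_coverage(1.96))   # z* used with s: noticeably below 0.95
```

Each individual interval either contains μ or it does not. The 95% describes the long-run capture rate of the method.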

Effect of Sample Size

Larger n:

  • Smaller SE (dividing by √n)
  • More df → smaller t* (approaches z*)
  • Result: Narrower CI (more precise)

Trade-off: Cost and time of collecting larger sample

Robustness of t-Procedures

t-procedures fairly robust to violations of normality if:

  • n reasonably large (≥ 30)
  • No extreme outliers

Less robust for:

  • Small samples with skewness
  • Extreme outliers (affect both x̄ and s)

Calculator Commands (TI-83/84)

STAT → TESTS → 8:TInterval

Enter:

  • Data or Stats
  • If Stats: x̄, s, n
  • C-Level
  • Calculate

For two-sample: 0:2-SampTInt

Common Mistakes

❌ Using z* instead of t*
❌ Using t* from wrong df
❌ Not checking normality with small samples
❌ Confusing paired with two-sample
❌ Misinterpreting confidence level

Quick Reference

Formula: $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ with df = n - 1

Conditions: Random, approximately normal (or n ≥ 30), independent

Use t (not z) when σ unknown, using s

Paired data: Analyze differences with one-sample t

Remember: t-distribution accounts for extra uncertainty from estimating σ with s. Always check conditions, especially normality for small samples!

📚 Practice Problems

Problem 1 (easy)

Question:

A random sample of 25 students has a mean study time of 18.5 hours per week with a standard deviation of 4.2 hours. Construct a 95% confidence interval for the mean study time. Assume the population is approximately normal.

Solution:

Step 1: Identify given information.

  • n = 25
  • x̄ = 18.5 hours
  • s = 4.2 hours
  • Confidence level = 95%

Step 2: Check conditions.

  • RANDOM: Random sample ✓
  • NORMAL: Population approximately normal (given) ✓ Since n = 25 < 30, we need this assumption.
  • INDEPENDENT: Assume n ≤ 0.10N ✓

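Step 3: Use the t-distribution (not z). We use t because:

  • σ is unknown (we only have s)
  • The normality assumption justifies a t-procedure with a small n, but it does not make z appropriate; z requires a known σ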

Degrees of freedom: df = n - 1 = 24

Step 4: Find the t* critical value. From a t-table with df = 24 and 95% confidence: t* = 2.064

Step 5: Calculate the standard error. SE = s/√n = 4.2/√25 = 4.2/5 = 0.84

Step 6: Calculate the margin of error. ME = t* × SE = 2.064 × 0.84 ≈ 1.73

Step 7: Construct the confidence interval. CI = x̄ ± ME = 18.5 ± 1.73 = (16.77, 20.23) ≈ (16.8, 20.2)

Step 8: Interpret. We are 95% confident that the true mean study time for all students is between 16.8 and 20.2 hours per week.

Answer: 95% CI: (16.8, 20.2) hours

We use the t-distribution because the population standard deviation is unknown.

Problem 2 (easy)

Question:

Why do we use the t-distribution instead of the z-distribution for confidence intervals for means?

Solution:

Step 1: Understand the key difference.

  • Z-distribution: Used when σ (population SD) is KNOWN
  • T-distribution: Used when σ is UNKNOWN and we use s (sample SD) instead

Step 2: Why σ is usually unknown. In practice:

  • Rarely know true population standard deviation
  • If we knew σ, we'd probably know μ too!
  • Almost always must estimate from sample

Step 3: What using s instead of σ does. Using s adds extra variability:

  • s varies from sample to sample
  • s is random, σ is fixed
  • More uncertainty → wider intervals

Step 4: The t-distribution accounts for this. It has:

  • Heavier tails than normal
  • More probability in extremes
  • Depends on sample size (df = n-1)

This compensates for extra uncertainty from estimating σ

Step 5: Compare z and t. For 95% confidence:

  • z* = 1.96 (always)
  • t* depends on df:
    • df = 5: t* = 2.571 (much larger!)
    • df = 10: t* = 2.228
    • df = 20: t* = 2.086
    • df = 30: t* = 2.042
    • df = ∞: t* → 1.96 (approaches z)

Step 6: As n increases. Small n:

  • s is an unreliable estimate of σ
  • Need a large t* for extra safety
  • Wide intervals

Large n:

  • s becomes a good estimate of σ
  • t* approaches z*
  • T-distribution → Normal

Step 7: When to use which? USE Z:

  • σ known (rare!)
  • Proportions (different formula)

USE T:

  • σ unknown (almost always!)
  • Any sample size, provided the Normal condition holds (population approximately normal, or n ≥ 30)
  • Means with sample SD

A large sample (n ≥ 30) does not switch a mean interval from t to z; it only relaxes the Normal condition for t.

Answer: Use t-distribution when σ is unknown and we must estimate it with s. The t-distribution has heavier tails to account for the extra uncertainty from estimating σ. As sample size increases, t approaches the normal distribution.

Problem 3 (medium)

Question:

A researcher measures reaction times (in seconds) for 40 subjects: x̄ = 0.38s, s = 0.12s. Construct a 99% confidence interval for the mean reaction time.

Solution:

Step 1: Given information.

  • n = 40
  • x̄ = 0.38 seconds
  • s = 0.12 seconds
  • Confidence level = 99%

Step 2: Check conditions.

  • RANDOM: Assume random sample ✓
  • NORMAL: n = 40 ≥ 30, so the CLT applies ✓
  • INDEPENDENT: Assume 40 ≤ 0.10N ✓

Step 3: Find the t* critical value. df = n - 1 = 39, 99% confidence

From t-table: t* ≈ 2.708 (or use calculator/software)

Step 4: Calculate SE. SE = s/√n = 0.12/√40 = 0.12/6.325 ≈ 0.0190

Step 5: Calculate ME. ME = t* × SE = 2.708 × 0.0190 ≈ 0.0514

Step 6: Construct the CI. CI = 0.38 ± 0.0514 = (0.329, 0.431) ≈ (0.33, 0.43) seconds

Step 7: Interpret. We are 99% confident that the true mean reaction time is between 0.33 and 0.43 seconds.

Answer: 99% CI: (0.33, 0.43) seconds

Problem 4 (medium)

Question:

Compare 90%, 95%, and 99% confidence intervals for the same data: n = 36, x̄ = 50, s = 12. What happens to interval width as confidence level increases?

Solution:

Step 1: Set up. n = 36, x̄ = 50, s = 12, df = 35

Step 2: Calculate SE (same for all). SE = s/√n = 12/√36 = 12/6 = 2

Step 3: Find t* values.

  • 90% CI: t* ≈ 1.690
  • 95% CI: t* ≈ 2.030
  • 99% CI: t* ≈ 2.724

Step 4: Calculate MEs.

  • ME₉₀ = 1.690 × 2 = 3.38
  • ME₉₅ = 2.030 × 2 = 4.06
  • ME₉₉ = 2.724 × 2 = 5.45

Step 5: Construct intervals.

  • 90% CI: 50 ± 3.38 = (46.62, 53.38), width = 6.76
  • 95% CI: 50 ± 4.06 = (45.94, 54.06), width = 8.12
  • 99% CI: 50 ± 5.45 = (44.55, 55.45), width = 10.90

Step 6: Compare widths.

  • 90% → 95%: width increases by 20%
  • 95% → 99%: width increases by 34%
  • 90% → 99%: width increases by 61%

Higher confidence = wider interval!

Step 7: The tradeoff. Higher confidence level:

  • More confident that the interval captures μ
  • Less precise (wider interval)

Lower confidence level:

  • More precise (narrower interval)
  • Less confident that the interval captures μ

For a fixed sample size, we cannot have both high confidence AND high precision!

Answer: 90% CI: (46.6, 53.4); 95% CI: (45.9, 54.1); 99% CI: (44.6, 55.4)

As confidence increases from 90% to 99%, interval width increases by 61%. This is the precision-confidence tradeoff.
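The Problem 4 comparison can be recomputed in a few lines. The sketch uses the t* values from Step 3, so the results inherit their three-decimal rounding. `pct_increase` is a hypothetical helper name.

```python
# Problem 4: n = 36, x̄ = 50, s = 12, df = 35; t* values from Step 3
se = 12 / 36 ** 0.5                          # = 2.0, shared by all three intervals
t_stars = {90: 1.690, 95: 2.030, 99: 2.724}
widths = {level: 2 * t * se for level, t in t_stars.items()}

for level, w in widths.items():
    print(f"{level}% CI: ({50 - w / 2:.2f}, {50 + w / 2:.2f}), width = {w:.2f}")

def pct_increase(a, b):
    """Percent increase in interval width going from confidence level a to level b."""
    return 100 * (widths[b] / widths[a] - 1)

print(round(pct_increase(90, 95)), round(pct_increase(95, 99)), round(pct_increase(90, 99)))
```

SE = 2 is shared by all three intervals, so it cancels in every ratio. The percent increases depend only on the t* values.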

Problem 5 (hard)

Question:

A 95% CI for mean weight is (150, 170) lbs based on n = 25. If we want to cut the margin of error in half with the same confidence level, what sample size is needed?

Solution:

Step 1: Find the current ME. CI = (150, 170), so width = 170 - 150 = 20 and ME = width/2 = 10 lbs

Step 2: Find the new ME. ME_new = 10/2 = 5 lbs

Step 3: Understand the ME formula. ME = t* × (s/√n)

For the same confidence level, approximately the same s, and t* treated as roughly constant: ME ∝ 1/√n

Step 4: Set up a proportion. ME₁/ME₂ = √(n₂/n₁)

10/5 = √(n₂/25) → 2 = √(n₂/25) → 4 = n₂/25 → n₂ = 100

Step 5: Why quadruple? To halve ME, must quadruple n:

  • ME ∝ 1/√n
  • Half the ME → √n must double
  • If √n doubles, n must quadruple

General rule:

  • To reduce ME by factor k → multiply n by k²
  • To halve ME (k=2) → multiply n by 4
  • To cut ME to one third (k = 3) → multiply n by 9

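Step 6: Verify. Original: ME = t*·s/√25 = t*·s/5. New: ME = t*·s/√100 = t*·s/10

Ratio: (t*·s/5)/(t*·s/10) = 10/5 = 2 ✓

ME is indeed halved! (Strictly, t* also shrinks a little as df grows from 24 to 99, so the new ME comes out slightly below half.)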

Answer: n = 100

Need to quadruple the sample size from 25 to 100 to halve the margin of error. This is because ME ∝ 1/√n, so halving ME requires quadrupling n.
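Here is a short check of the scaling rule, together with the Step 3 assumption that t* stays roughly constant. The sketch uses the 95% table values t* = 2.064 (df = 24) and t* ≈ 1.984 (df = 99). `scaled_n` is a hypothetical helper, and s is assumed unchanged so that it cancels in the ratio.

```python
import math

def scaled_n(n_old, me_old, me_new):
    """Sample size for a new ME when ME ∝ 1/√n: n_old · (ME_old / ME_new)^2, rounded up."""
    # round() first so float noise such as 225.00000000000006 does not bump n up by 1
    return math.ceil(round(n_old * (me_old / me_new) ** 2, 9))

n_new = scaled_n(25, 10, 5)   # 25 · 2² = 100

# Old/new ME ratio with s cancelled, but t* allowed to change with df
ratio = (2.064 / math.sqrt(25)) / (1.984 / math.sqrt(n_new))
print(n_new, round(ratio, 3))
```

The ratio comes out slightly above 2, so n = 100 halves the ME with a little to spare.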