Type I and Type II Errors

Understanding testing errors and power

The Four Possible Outcomes

| Decision \ Reality | H₀ True | H₀ False |
|--------------------|---------|----------|
| Fail to reject H₀ | ✓ Correct | Type II Error |
| Reject H₀ | Type I Error | ✓ Correct |

Type I Error: Reject H₀ when it's actually true (false positive)

Type II Error: Fail to reject H₀ when it's actually false (false negative)

Type I Error (α)

Definition: Rejecting true null hypothesis

Probability: α (significance level)

Example: Medical test

  • H₀: Patient healthy
  • Type I: Diagnose disease when patient is healthy

Consequences: False alarm, unnecessary treatment, wasted resources

Control: Set α before testing (0.05, 0.01, etc.)

Type II Error (β)

Definition: Failing to reject false null hypothesis

Probability: β (depends on true parameter value, sample size, α)

Example: Medical test

  • H₀: Patient healthy
  • Type II: Miss disease in sick patient

Consequences: Miss real effect, fail to treat, potential harm

Control: Increase sample size, increase α (trade-off!)

Power

Power: Probability of correctly rejecting false H₀

Power = 1 − β

Higher power = better test (more likely to detect real effect)

Factors increasing power:

  1. Larger sample size (n)
  2. Larger effect size (further from H₀)
  3. Less variability (smaller σ)
  4. Higher α (but increases Type I risk)

Example: Coin Testing

Test if coin is fair:

  • H₀: p = 0.5 (fair)
  • Hₐ: p ≠ 0.5 (biased)
  • Flip 20 times, α = 0.05

Type I Error:

  • Coin actually fair (p = 0.5)
  • Get unusual result (like 15 heads)
  • Reject H₀ (conclude biased)
  • Error: Called fair coin biased

Type II Error:

  • Coin actually biased (say p = 0.7)
  • Get result that looks reasonable for fair coin (like 11 heads)
  • Fail to reject H₀
  • Error: Failed to detect biased coin
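Both error rates for this coin test can be estimated by simulation. The sketch below (plain Python; the helper names are illustrative) uses the rejection region "heads ≤ 5 or heads ≥ 15", which is the closest achievable two-sided region to α = 0.05 for n = 20 (its exact Type I rate is about 0.041):

```python
import random

def reject(heads):
    # Two-sided rejection region for n = 20, alpha = 0.05:
    # reject H0 (p = 0.5) if heads <= 5 or heads >= 15
    return heads <= 5 or heads >= 15

def rejection_rate(p, n_flips=20, trials=100_000, seed=1):
    # Estimate P(reject H0) when the coin's true heads probability is p
    rng = random.Random(seed)
    rejections = sum(
        reject(sum(rng.random() < p for _ in range(n_flips)))
        for _ in range(trials)
    )
    return rejections / trials

type1_rate = rejection_rate(p=0.5)       # near 0.041: fair coin called biased
type2_rate = 1 - rejection_rate(p=0.7)   # near 0.58: biased coin missed
```

Note how large the Type II rate is here: with only 20 flips, a coin with p = 0.7 escapes detection more than half the time.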

Calculating Type I Error Probability

Type I Error probability = α (by design)

Example: If α = 0.05, P(Type I Error) = 0.05

Interpretation: If H₀ is true, we will (wrongly) reject it 5% of the time

Calculating Power (Advanced)

Requires:

  • Specific alternative value
  • Sample size
  • Variability
  • α

Example: Test H₀: μ = 100 vs Hₐ: μ > 100

  • α = 0.05, n = 25, σ = 15
  • True μ = 106

Power calculation:

  1. Find critical value for rejection
  2. Find probability of exceeding it when μ = 106
  3. This is the power

Typically use software for exact power calculations
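The three steps above can be carried out directly for this one-sided z-test using the standard library's `statistics.NormalDist`; the function name is illustrative:

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0."""
    se = sigma / sqrt(n)
    # Step 1: critical value for the sample mean under H0
    crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Step 2: probability of exceeding it when the true mean is mu_true
    return 1 - NormalDist(mu_true, se).cdf(crit)

power = power_one_sided_z(mu0=100, mu_true=106, sigma=15, n=25)
# power is about 0.64 for this example
```

So even with a true mean 6 points above H₀, this test misses the effect about 36% of the time (β ≈ 0.36).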

Trade-offs

Decreasing α (stricter):

  • ↓ Type I Error risk
  • ↑ Type II Error risk
  • ↓ Power

Increasing α:

  • ↑ Type I Error risk
  • ↓ Type II Error risk
  • ↑ Power

Can't minimize both simultaneously with fixed n!

Solution: Increase n (decreases both error types)
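One way to see this with the coin example: an exact binomial calculation (sketched below; helper names are illustrative) shows β shrinking sharply as n grows, while the Type I rate stays at or below α = 0.05 throughout:

```python
from math import comb

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def two_sided_beta(n, p_true, alpha=0.05):
    # Largest cutoff c so that rejecting when X <= c or X >= n - c
    # keeps the Type I rate at or below alpha (symmetric under p = 0.5)
    c = 0
    while 2 * binom_cdf(c + 1, n, 0.5) <= alpha:
        c += 1
    # beta = P(c < X < n - c) when the true heads probability is p_true
    return binom_cdf(n - c - 1, n, p_true) - binom_cdf(c, n, p_true)

beta_20 = two_sided_beta(20, 0.7)    # about 0.58 with 20 flips
beta_100 = two_sided_beta(100, 0.7)  # under 0.05 with 100 flips
```

Going from 20 to 100 flips drops β from roughly 0.58 to under 0.05 without loosening α.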

Choosing α

Common practice: α = 0.05

More conservative (α = 0.01): When Type I Error very costly

  • Example: Approving new drug (don't want false positive)

Less conservative (α = 0.10): When Type II Error very costly

  • Example: Screening test (don't want to miss cases)

Balance: Consider consequences of each error type

Real-World Examples

Criminal Trial:

  • H₀: Defendant innocent
  • Type I: Convict innocent person (false conviction)
  • Type II: Acquit guilty person (false acquittal)
  • System prioritizes avoiding Type I (innocent until proven guilty)

Medical Screening:

  • H₀: Patient disease-free
  • Type I: False positive (unnecessary worry, follow-up tests)
  • Type II: False negative (miss disease, delayed treatment)
  • Balance depends on disease severity

Quality Control:

  • H₀: Process working properly
  • Type I: Stop working process (wasted time, money)
  • Type II: Miss defective process (bad products shipped)

Relationship Between Errors

For fixed n:

  • Lowering α → higher β (inverse relationship)
  • Can't have both low α and low β

Increasing n:

  • Can lower both α and β
  • Only way to improve both

Increasing effect size:

  • β decreases (easier to detect large effects)
  • α unchanged (still set by us)

Power Analysis for Sample Size

Before study: Determine n needed for desired power

Typical goal: Power = 0.80 (80% chance of detecting effect)

Requires specifying:

  • Minimum important effect size
  • Desired α
  • Estimated variability
  • Desired power

Software: G*Power, R, online calculators
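For a one-sided z-test, the required n has a standard closed form, n = ((z_α + z_β) · σ / δ)², where δ is the minimum important effect size. A minimal sketch (function name illustrative):

```python
from math import ceil
from statistics import NormalDist

def required_n(effect, sigma, alpha=0.05, power=0.80):
    """Sample size for a one-sided z-test to detect a mean shift of
    `effect` with the desired power (closed-form approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # critical z for alpha
    z_b = NormalDist().inv_cdf(power)      # z for the target power
    return ceil(((z_a + z_b) * sigma / effect) ** 2)

n = required_n(effect=6, sigma=15, alpha=0.05, power=0.80)
# n = 39 for the mu = 100 vs mu = 106 example above
```

Compare with the earlier example: n = 25 gave power of only about 0.64; reaching 0.80 requires 39 observations.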

Common Misconceptions

❌ "P-value is probability of Type I Error"

  • No! α is P(Type I Error), fixed before the test
  • The P-value is P(data at least this extreme | H₀ true), computed from the data

❌ "Can eliminate both error types"

  • No! Trade-off exists (for fixed n)

❌ "Type II Error is 1 - α"

  • No! β depends on the true parameter value, n, and σ; it is not determined by α alone

❌ "High power means H₀ is false"

  • No! Power is property of test, not evidence about H₀

Practical Advice

Before study:

  1. Consider consequences of each error type
  2. Choose α appropriately
  3. Do power analysis to determine n

After study:

  1. Report P-value (not just "significant" or "not")
  2. Consider practical significance, not just statistical
  3. Recognize limitations (Type II error possible if fail to reject)

Quick Reference

Type I Error (α):

  • Reject true H₀
  • P(Type I) = α
  • False positive

Type II Error (β):

  • Fail to reject false H₀
  • P(Type II) = β
  • False negative

Power = 1 - β:

  • Probability of detecting real effect
  • Increase with: larger n, larger effect, smaller σ, larger α

Trade-off:

  • Can't minimize both errors with fixed n
  • Increase n to reduce both

Remember: All hypothesis tests risk errors. Understanding and balancing these risks is key to good statistical practice!
