Perform chi-square tests for goodness of fit, homogeneity, and independence.
Chi-square tests work with categorical data (counts in categories).
A candy company claims their candy bags contain equal proportions of red, blue, green, and yellow candies. A bag contains 30 red, 25 blue, 20 green, and 25 yellow candies. What type of chi-square test should be used, and what are the expected counts?
Test type: Chi-square goodness-of-fit test (Testing if observed distribution matches a claimed distribution)
Total candies = 30 + 25 + 20 + 25 = 100
If proportions are equal, each color should be 25% of total: Expected count for each color = 100 × 0.25 = 25
Expected counts: • Red: 25 • Blue: 25 • Green: 25 • Yellow: 25
All expected counts ≥ 5, so conditions are met.
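The setup above can be sketched in a few lines of Python. This is a minimal illustration using only built-in functions; the variable names are assumptions chosen for readability, and the last step goes slightly beyond the worked answer by also computing the test statistic.

```python
# Observed candy counts from the example above.
observed = {"red": 30, "blue": 25, "green": 20, "yellow": 25}

total = sum(observed.values())  # total number of candies in the bag

# Under the claim of equal proportions, every color gets the same share.
expected = {color: total / len(observed) for color in observed}

# Large counts condition: every expected count should be at least 5.
conditions_met = all(e >= 5 for e in expected.values())

# Test statistic: sum of (O - E)^2 / E over the categories.
chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1  # categories minus one

print(expected)
print(conditions_met)
print(chi_sq, df)
```

Only red and green differ from their expected count of 25, so only those two categories contribute to the statistic.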
χ² = Σ[(O - E)² / E]
Where O = observed count, E = expected count
Goodness-of-fit test
Question: Does the sample match a hypothesized distribution?
Example: Is a die fair? (each face should appear in 1/6 of rolls)
Degrees of freedom: df = k - 1 (where k = number of categories)
Test of independence
Question: Are two categorical variables independent?
Example: Are smoking status and lung cancer independent?
Degrees of freedom: df = (r - 1)(c - 1) (r rows and c columns)
Test of homogeneity
Question: Do multiple populations have the same distribution?
Example: Do males and females have the same opinion distribution?
Degrees of freedom: df = (r - 1)(c - 1) (same formula as independence)
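The degrees-of-freedom rules for the three tests reduce to two small formulas. As a hedged sketch (the function names here are hypothetical, not part of any library), they could be written as:

```python
def df_goodness_of_fit(num_categories: int) -> int:
    """Goodness of fit: number of categories minus one."""
    return num_categories - 1


def df_two_way(num_rows: int, num_cols: int) -> int:
    """Independence and homogeneity: (rows - 1) times (columns - 1)."""
    return (num_rows - 1) * (num_cols - 1)


print(df_goodness_of_fit(6))  # a die has 6 faces
print(df_two_way(2, 2))       # a 2 x 2 two-way table
print(df_two_way(3, 3))       # a 3 x 3 two-way table
```

Row and column counts exclude the "Total" row and column of a two-way table.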
Data on 400 students (exercise habit vs. GPA):
| | Low GPA | High GPA | Total |
|---|---|---|---|
| Exercise | 40 | 160 | 200 |
| No exercise | 100 | 100 | 200 |
| Total | 140 | 260 | 400 |
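Expected counts (if independent), using (row total × column total) / grand total:
• Exercise & Low GPA: (200 × 140)/400 = 70
• Exercise & High GPA: (200 × 260)/400 = 130
• No exercise & Low GPA: (200 × 140)/400 = 70
• No exercise & High GPA: (200 × 260)/400 = 130
χ² = 900/70 + 900/130 + 900/70 + 900/130 ≈ 39.56; df = (2-1)(2-1) = 1; p-value < 0.001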
Conclusion: Exercise and GPA are NOT independent.
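To make the expected-count step concrete, here is a minimal Python sketch that rebuilds the expected table from the row and column totals and then sums the (O - E)²/E terms. It uses only built-in functions, and the variable names are illustrative assumptions.

```python
# Observed counts from the table: rows = exercise habit, columns = GPA.
observed = [
    [40, 160],   # Exercise:    low GPA, high GPA
    [100, 100],  # No exercise: low GPA, high GPA
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = (row total x column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum (O - E)^2 / E over every cell of the table.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(chi_sq, 2), df)
```

A statistic this large with df = 1 is far into the upper tail, which is consistent with the conclusion that exercise and GPA are not independent.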
If the expected count condition fails (any expected count is below 5), don't use chi-square!
❌ Using observed counts instead of expected counts in the denominator of the formula
❌ Not checking the expected count condition
❌ Mixing up df formulas
❌ Using chi-square with continuous data
Always create a table. Show calculation of at least two expected counts. Name the test: "chi-square test of independence" (or goodness-of-fit/homogeneity). Check all conditions before concluding.
A researcher surveys 200 people about their exercise habits and stress levels. The results are shown below. Calculate the chi-square test statistic.
| | Low Stress | High Stress |
|---|---|---|
| Exercise | 60 | 40 |
| No Exercise | 30 | 70 |
Step 1: Calculate expected counts
Row totals: Exercise = 100, No Exercise = 100
Column totals: Low Stress = 90, High Stress = 110
Grand total = 200
Expected = (row total × column total) / grand total
Expected counts:
• Exercise & Low: (100 × 90)/200 = 45
• Exercise & High: (100 × 110)/200 = 55
• No Exercise & Low: (100 × 90)/200 = 45
• No Exercise & High: (100 × 110)/200 = 55
Step 2: Calculate χ²
χ² = Σ[(Observed - Expected)² / Expected]
χ² = (60-45)²/45 + (40-55)²/55 + (30-45)²/45 + (70-55)²/55
χ² = 225/45 + 225/55 + 225/45 + 225/55
χ² ≈ 5 + 4.09 + 5 + 4.09
χ² ≈ 18.18
df = (rows - 1)(columns - 1) = (2-1)(2-1) = 1
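It can help to see how much each cell contributes to the total. The following is a small sketch under the assumption that the table is stored as a dictionary keyed by (row, column) labels; the layout and names are illustrative only.

```python
# Observed counts keyed by (exercise habit, stress level).
observed = {
    ("Exercise", "Low Stress"): 60,
    ("Exercise", "High Stress"): 40,
    ("No Exercise", "Low Stress"): 30,
    ("No Exercise", "High Stress"): 70,
}

# Marginal totals, accumulated from the cells.
row_totals, col_totals = {}, {}
for (row, col), count in observed.items():
    row_totals[row] = row_totals.get(row, 0) + count
    col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(observed.values())

# Contribution of each cell: (O - E)^2 / E, with E taken from the margins.
contributions = {}
for (row, col), o in observed.items():
    e = row_totals[row] * col_totals[col] / grand_total
    contributions[(row, col)] = (o - e) ** 2 / e

for cell, value in contributions.items():
    print(cell, round(value, 2))

chi_sq = sum(contributions.values())
print("chi-square:", round(chi_sq, 2))
```

Every cell here deviates from its expected count by 15, so the Low Stress cells, which have the smaller expected count, contribute more.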
A die is rolled 120 times with the following results: 1(15), 2(18), 3(22), 4(25), 5(20), 6(20). Test at α = 0.05 if the die is fair.
H₀: The die is fair (all outcomes equally likely)
Hₐ: The die is not fair
Expected count for fair die: 120/6 = 20 for each outcome
χ² = Σ[(O - E)² / E]
χ² = (15-20)²/20 + (18-20)²/20 + (22-20)²/20 + (25-20)²/20 + (20-20)²/20 + (20-20)²/20
χ² = 25/20 + 4/20 + 4/20 + 25/20 + 0/20 + 0/20
χ² = 1.25 + 0.2 + 0.2 + 1.25 + 0 + 0
χ² = 2.9
df = 6 - 1 = 5
P-value: P(χ² > 2.9) ≈ 0.715
Decision: Since p-value (0.715) > α (0.05), fail to reject H₀
Conclusion: There is insufficient evidence to conclude the die is unfair.
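The whole die test can be reproduced with the Python standard library. The sketch below assumes a hypothetical helper, `chi2_sf`, that computes the upper-tail chi-square probability for whole-number degrees of freedom from known closed-form sums; on the exam a calculator or table would supply this value instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with positive integer df."""
    half = x / 2
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        return math.exp(-half) * sum(
            half ** j / math.factorial(j) for j in range(df // 2)
        )
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    series = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, df // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


observed = [15, 18, 22, 25, 20, 20]  # counts for faces 1 through 6
n_faces = len(observed)
expected = [sum(observed) / n_faces] * n_faces  # fair die: equal counts

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = n_faces - 1
p_value = chi2_sf(chi_sq, df)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(chi_sq, 2), df, round(p_value, 3), decision)
```

The decision rule is the same comparison made above: reject H₀ only when the p-value falls below α.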
A study examines the relationship between smoking status and lung disease in 500 people:
| | Disease | No Disease |
|---|---|---|
| Smoker | 80 | 120 |
| Non-smoker | 20 | 280 |
Perform a chi-square test at α = 0.01 to determine if smoking and lung disease are independent.
H₀: Smoking status and lung disease are independent
Hₐ: Smoking status and lung disease are associated
Step 1: Calculate expected counts
Row totals: Smoker = 200, Non-smoker = 300
Column totals: Disease = 100, No Disease = 400
Total = 500
Expected counts:
• Smoker & Disease: (200×100)/500 = 40
• Smoker & No Disease: (200×400)/500 = 160
• Non-smoker & Disease: (300×100)/500 = 60
• Non-smoker & No Disease: (300×400)/500 = 240
Step 2: Calculate χ²
χ² = (80-40)²/40 + (120-160)²/160 + (20-60)²/60 + (280-240)²/240
χ² = 1600/40 + 1600/160 + 1600/60 + 1600/240
χ² ≈ 40 + 10 + 26.67 + 6.67
χ² ≈ 83.33
df = (2-1)(2-1) = 1
P-value: P(χ² > 83.33) < 0.0001
Decision: Reject H₀
Conclusion: There is very strong evidence (p < 0.01) that smoking status and lung disease are associated.
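As a check on this example, the sketch below wraps the two-way calculation in a hypothetical function, `chi_square_two_way`, and gets the p-value from the standard library. It relies on the fact that with df = 1 a chi-square variable is the square of a standard normal, so the upper tail equals `math.erfc(sqrt(x / 2))`; that shortcut applies to df = 1 only.

```python
import math


def chi_square_two_way(table):
    """Return (statistic, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


observed = [
    [80, 120],  # Smoker:     disease, no disease
    [20, 280],  # Non-smoker: disease, no disease
]
chi_sq, df = chi_square_two_way(observed)

# Upper-tail probability, valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi_sq / 2))

alpha = 0.01
reject = p_value < alpha
print(round(chi_sq, 2), df, p_value, reject)
```

The statistic comes out at roughly 83.3, and the p-value is many orders of magnitude below 0.01.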
A school surveys students from three grades about their favorite subject. Results:
Grade 9: Math(40), Science(30), English(30)
Grade 10: Math(35), Science(35), English(30)
Grade 11: Math(25), Science(45), English(30)
Test if the distribution of favorite subject is the same across grades at α = 0.05.
H₀: Distribution of favorite subject is the same across grades
Hₐ: Distribution differs by grade
Step 1: Set up table

| | Math | Science | English | Total |
|---|---|---|---|---|
| Grade 9 | 40 | 30 | 30 | 100 |
| Grade 10 | 35 | 35 | 30 | 100 |
| Grade 11 | 25 | 45 | 30 | 100 |
| Total | 100 | 110 | 90 | 300 |
Step 2: Calculate expected counts E = (row total × column total) / grand total
For each cell:
Grade 9 & Math: (100×100)/300 = 33.33
Grade 9 & Science: (100×110)/300 = 36.67
Grade 9 & English: (100×90)/300 = 30
Because every grade has a row total of 100, the expected counts for Grade 10 and Grade 11 are the same: 33.33, 36.67, and 30.
Step 3: Calculate χ²
χ² = (40-33.33)²/33.33 + (30-36.67)²/36.67 + ... (all 9 cells)
χ² ≈ 1.33 + 1.21 + 0 + 0.08 + 0.08 + 0 + 2.08 + 1.89 + 0
χ² ≈ 6.68 (summing the unrounded terms)
df = (3-1)(3-1) = 4
P-value: P(χ² > 6.68) ≈ 0.154
Decision: Fail to reject H₀
Conclusion: There is insufficient evidence that the distribution of favorite subject differs across grades.
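Homogeneity can also be read as "every grade follows the pooled distribution." The sketch below takes that view: it computes the pooled share of each subject and scales it by each grade's total to get expected counts. The closed-form p-value used here, exp(-x/2)(1 + x/2), holds for df = 4 specifically; the names are illustrative assumptions.

```python
import math

subjects = ["Math", "Science", "English"]
observed = {
    "Grade 9": [40, 30, 30],
    "Grade 10": [35, 35, 30],
    "Grade 11": [25, 45, 30],
}

grand_total = sum(sum(counts) for counts in observed.values())

# Pooled share of each subject across all grades (the distribution under H0).
pooled = [sum(col) / grand_total for col in zip(*observed.values())]

chi_sq = 0.0
for grade, counts in observed.items():
    n = sum(counts)  # number of students surveyed in this grade
    for o, share in zip(counts, pooled):
        e = n * share  # expected count if this grade matches the pooled shares
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(subjects) - 1)

# Upper-tail probability for a chi-square variable, valid for df = 4 only.
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(round(chi_sq, 2), df, round(p_value, 3))
```

Multiplying a grade's total by the pooled share gives the same number as (row total × column total) / grand total, so this matches the expected counts in Step 2.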