Chi-Square Tests for Independence and Homogeneity

Part 1: Chi-Square Goodness-of-Fit

📊 Chi-Square Goodness-of-Fit Test

Part 1 of 7 — Testing Categorical Distributions

When to Use Chi-Square Goodness-of-Fit

Use when you want to test whether observed frequencies match expected frequencies for a categorical variable.

Example: A die is rolled 60 times. Do the results suggest it is fair?

Outcome	1	2	3	4	5	6
Observed	8	12	7	15	9	9
Expected	10	10	10	10	10	10

The Chi-Square Statistic

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

where $O_i$ = observed count and $E_i$ = expected count.

For the die example: $\chi^2 = \frac{(8-10)^2}{10} + \frac{(12-10)^2}{10} + \cdots = \frac{4+4+9+25+1+1}{10} = 4.4$
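The same arithmetic can be checked with a short Python sketch. The function name `chi_square_statistic` is illustrative (it is not from any particular library); it simply adds up $(O_i - E_i)^2/E_i$ across the categories.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die example: 60 rolls, and a fair die gives E = 60 / 6 = 10 for every face
observed = [8, 12, 7, 15, 9, 9]
expected = [60 / 6] * 6
print(round(chi_square_statistic(observed, expected), 3))  # 4.4
```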

Hypotheses

$H_0$ : The true (population) distribution matches the hypothesized distribution
$H_a$ : The true distribution does NOT match the hypothesized distribution

Conditions

Random sample or random assignment
Expected counts ≥ 5 for all categories
Independence: observations are independent (when sampling without replacement, check $n < 10\%$ of the population)

🔑 Chi-square tests are always right-tailed — larger $\chi^2$ values provide more evidence against $H_0$ .
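Because the test is right-tailed, the p-value is the area to the right of the observed $\chi^2$. As a minimal sketch, assuming whole-number $df$ (which covers every test in this lesson), that area can be computed with only Python's standard library: start from the closed forms for $df = 1$ and $df = 2$ and step up two degrees of freedom at a time. The name `chi2_sf` is hypothetical; in practice a calculator's $\chi^2$cdf or a statistics package does the same job.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square distribution with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("this sketch only handles positive whole-number df")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # df = 2: the right-tail area is exp(-x/2)
        k, tail = 2, math.exp(-half)
    else:
        # df = 1: the right-tail area is erfc(sqrt(x/2))
        k, tail = 1, math.erfc(math.sqrt(half))
    # Each step adds one term and raises the degrees of freedom by 2
    while k < df:
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail


# Die example: chi-square = 4.4 with df = 6 - 1 = 5
print(round(chi2_sf(4.4, 5), 2))  # 0.49
```

Larger $\chi^2$ values give smaller right-tail areas, which is exactly why larger values are stronger evidence against $H_0$. As a sanity check, `chi2_sf(3.841, 1)` returns about 0.05, matching the $df = 1$ row of the $\chi^2$ table in Part 5.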

Chi-Square Calculation 🧮

A bag should contain equal numbers of 4 colors. From 80 candies: Red=24, Blue=18, Green=22, Yellow=16.

1) Expected count for each color = 80/4 = ?

2) Compute $\chi^2 = \frac{(24-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(16-20)^2}{20}$

Part 2: Chi-Square Test for Independence

📊 Chi-Square Test for Independence

Part 2 of 7 — Are Two Categorical Variables Related?

Topics in This Part

Section
📐 When to Use It
📊 Two-Way Tables & Expected Counts
🧮 The Test Statistic
📝 Full Worked Example

🔑 Key Concept: The chi-square test for independence uses data from one sample to determine whether two categorical variables are associated (related) or independent.

When to Use the Test for Independence

Feature	Detail
Data source	One sample or one group of subjects
Variables	Two categorical variables measured on each subject
$H_0$	The two variables are independent (no association)

Part 3: Chi-Square Test for Homogeneity

📊 Chi-Square Test for Homogeneity

Part 3 of 7 — Comparing Distributions Across Populations

Topics in This Part

Section
📐 Independence vs. Homogeneity
📊 Setting Up the Test
🧮 Worked Example
📝 AP Exam Distinction

🔑 Key Concept: The test for homogeneity uses data from two or more independent samples (or treatment groups) to determine whether the distribution of a single categorical variable is the same across all populations.

Independence vs. Homogeneity

Feature	Independence	Homogeneity
Samples	One sample	Two or more independent samples
Variables	Two categorical variables	One categorical variable across groups
$H_0$	The two variables are independent	The distribution of the variable is the same for all groups

Part 4: Conditions and Degrees of Freedom

📊 Conditions and Degrees of Freedom

Part 4 of 7 — When Can You Use the $\chi^2$ Test?

Topics in This Part

Section
✅ Three Conditions
📐 Degrees of Freedom for Each Test
⚠️ What to Do When Conditions Fail
📝 AP Exam Condition-Checking

🔑 Key Concept: All three chi-square tests (GoF, Independence, Homogeneity) require the same three conditions: Random, 10%, and Large Counts.

The Three Conditions

Condition	Requirement	AP Language
Random	Data from random sample or randomized experiment

Part 5: Interpreting Results

📊 Interpreting Results

Part 5 of 7 — Reading $\chi^2$ Output and Drawing Conclusions

Topics in This Part

Section
📐 Interpreting the $\chi^2$ Statistic
📊 Using the $\chi^2$ Table

Part 6: Problem-Solving Workshop

📊 Problem-Solving Workshop

Part 6 of 7 — Full AP Free-Response Practice

Topics in This Part

Section
📝 GoF Worked Example
📝 Independence Worked Example
⚠️ Common AP Mistakes

🔑 Key Concept: Chi-square FRQs follow the same 4-step framework: Hypotheses, Conditions, Calculate, Conclude. Practice writing each step clearly.

Worked Example 1: Goodness-of-Fit

Problem: A company claims its candy mix is 30% red, 20% blue, 20% green, 15% yellow, 15% orange. A random sample of 200 candies yields:

Color	Red	Blue	Green	Yellow	Orange
Observed	75	35	32	28	30
Expected	60	40	40	30	30

Part 7: Review & Applications

📊 Review & Applications

Part 7 of 7 — Comprehensive Chi-Square Review

Topics in This Part

Section
📋 All Three Tests Side-by-Side
📐 Formula & Condition Summary
📝 Mixed Practice

🔑 Key Concept: This review covers all three chi-square tests. Know when to use each, how to check conditions, and how to write full AP-quality solutions.

Three Chi-Square Tests Compared

Feature	Goodness of Fit	Independence	Homogeneity
Samples	One	One	Two or more
Variables	One categorical	Two categorical	One categorical
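$H_0$	The population distribution matches the hypothesized distribution	The two variables are independent	The distribution of the variable is the same for all groups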

3) For the four-color candy bag in Part 1: degrees of freedom = $k - 1$ = ?

Example: Survey 300 students and record both grade level (freshman, sophomore, junior, senior) and preferred lunch (pizza, salad, sandwich). Is there an association between grade and lunch preference?

Expected Counts

For each cell in a two-way table:

$\boxed{E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}}$

This gives the count you would expect if the two variables were truly independent.
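A minimal Python sketch of this formula, assuming the table is stored as a list of rows of counts (the function name `expected_counts` is illustrative). It is checked on the gender/opinion data used in this part's worked example.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows = Male, Female; columns = Favor, Oppose
observed = [[60, 40], [45, 55]]
print(expected_counts(observed))  # [[52.5, 47.5], [52.5, 47.5]]
```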

Worked Example

Data: A random sample of 200 adults:

	Favor	Oppose	Total
Male	60	40	100
Female	45	55	100
Total	105	95	200

$H_0$ : Gender and opinion are independent.
$H_a$ : Gender and opinion are associated.

Expected counts:

	Favor	Oppose
Male	$\frac{100 \times 105}{200} = 52.5$	$\frac{100 \times 95}{200} = 47.5$
Female	$\frac{100 \times 105}{200} = 52.5$	$\frac{100 \times 95}{200} = 47.5$

$\chi^2$ calculation:

$\chi^2 = \frac{(60-52.5)^2}{52.5} + \frac{(40-47.5)^2}{47.5} + \frac{(45-52.5)^2}{52.5} + \frac{(55-47.5)^2}{47.5}$

$= \frac{56.25}{52.5} + \frac{56.25}{47.5} + \frac{56.25}{52.5} + \frac{56.25}{47.5}$

$\approx 1.0714 + 1.1842 + 1.0714 + 1.1842 \approx 4.511$

$df = (r-1)(c-1) = (2-1)(2-1) = 1$

Using technology with $df = 1$ : $p \approx 0.034$ (the table only brackets it: $3.841 < 4.511 < 5.024$ , so $0.025 < p < 0.05$ ).

Conclusion: Since $p = 0.034 < 0.05$ , we reject $H_0$ . There is convincing evidence of an association between gender and opinion on this issue.
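The full calculation can be reproduced in a short Python sketch. The helper names are hypothetical, and the p-value comes from a closed-form right-tail area for whole-number $df$ rather than from a table, which is why it can report $p \approx 0.034$ instead of just a bracket. Carrying full precision, the statistic is about 4.511.

```python
import math


def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi2_sf(x, df):
    """Right-tail area of a chi-square distribution (positive integer df only)."""
    half = x / 2
    if df % 2 == 0:
        k, tail = 2, math.exp(-half)
    else:
        k, tail = 1, math.erfc(math.sqrt(half))
    while k < df:
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail


def chi_square_two_way(table):
    """Return (chi-square statistic, df, p-value) for a two-way table of counts."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Rows = Male, Female; columns = Favor, Oppose
stat, df, p = chi_square_two_way([[60, 40], [45, 55]])
print(round(stat, 3), df, round(p, 3))  # 4.511 1 0.034
```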

Expected Count Practice 🧮

A two-way table has row totals of 80 and 120, column totals of 90 and 110, and a grand total of 200.

1) Expected count for the top-left cell (row 1, column 1)?

2) Expected count for the bottom-right cell (row 2, column 2)?

3) Degrees of freedom for this $2 \times 2$ table?

⚠️ AP Exam: The calculations (expected counts, $\chi^2$ , and $df$ ) are identical for both tests. The difference is in the study design, the hypotheses, and the context. Read the problem carefully to determine which test is appropriate.

Hypotheses for Homogeneity

$H_0: \text{The distribution of [variable] is the same for all [groups].}$ $H_a: \text{The distribution of [variable] is NOT the same for all [groups].}$

Worked Example

Problem: Two schools were surveyed about favorite subject. Is the distribution of preferences the same?

	Math	English	Science	Total
School A	45	30	25	100
School B	35	40	25	100
Total	80	70	50	200

$H_0$ : The distribution of favorite subject is the same for School A and School B.
$H_a$ : The distribution of favorite subject is NOT the same for School A and School B.

Expected counts: (each row total = 100, grand total = 200)

	Math	English	Science
School A	$\frac{100 \times 80}{200} = 40$	$\frac{100 \times 70}{200} = 35$	$\frac{100 \times 50}{200} = 25$
School B	$40$	$35$	$25$

$\chi^2 = \frac{(45-40)^2}{40} + \frac{(30-35)^2}{35} + \frac{(25-25)^2}{25} + \frac{(35-40)^2}{40} + \frac{(40-35)^2}{35} + \frac{(25-25)^2}{25}$

$= 0.625 + 0.7143 + 0 + 0.625 + 0.7143 + 0 \approx 2.679$

$df = (2-1)(3-1) = 2$

Using technology with $df = 2$ : $p \approx 0.262$ (the table only shows $p > 0.10$ , since $2.679 < 4.605$ ).

Conclusion: Since $p = 0.262 > 0.05$ , we fail to reject $H_0$ . There is not convincing evidence that the distribution of favorite subject differs between the two schools.
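Because the arithmetic matches the test for independence, the same kind of sketch applies unchanged; only the hypotheses and the interpretation differ. As before, the function names are illustrative and the p-value comes from a closed-form right-tail area for whole-number $df$. Carrying full precision, the statistic is about 2.679.

```python
import math


def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi2_sf(x, df):
    """Right-tail area of a chi-square distribution (positive integer df only)."""
    half = x / 2
    if df % 2 == 0:
        k, tail = 2, math.exp(-half)
    else:
        k, tail = 1, math.erfc(math.sqrt(half))
    while k < df:
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail


def chi_square_two_way(table):
    """Return (chi-square statistic, df, p-value); same math for homogeneity."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Rows = School A, School B; columns = Math, English, Science
stat, df, p = chi_square_two_way([[45, 30, 25], [35, 40, 25]])
print(round(stat, 3), df, round(p, 3))  # 2.679 2 0.262
```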

Homogeneity Calculation 🧮

Three brands of cereal are compared on sugar level (Low, Medium, High). Samples: Brand A: 50, Brand B: 60, Brand C: 40. Grand total: 150. Column totals: Low = 60, Medium = 50, High = 40.

1) Expected count for Brand A, Low sugar?

2) Expected count for Brand C, High sugar?

3) $df$ for this $3 \times 3$ table?

⚠️ Critical AP Detail: For chi-square, the Large Counts condition uses expected counts, NOT observed counts. This is different from the Large Counts condition for proportions ( $n\hat{p} \geq 10$ ).

Degrees of Freedom Summary

Test	$df$ Formula	Example
Goodness of Fit	$k - 1$ (where $k$ = number of categories)	6 sides of a die → $df = 5$
Independence	$(r-1)(c-1)$	$3 \times 4$ table → $df = 6$
Homogeneity	$(r-1)(c-1)$	$2 \times 3$ table → $df = 2$
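The two $df$ rules in the table can be written as tiny Python helpers (the names are illustrative), checked against the table's own examples.

```python
def df_goodness_of_fit(k):
    """Goodness of fit: k categories give k - 1 degrees of freedom."""
    return k - 1


def df_two_way(rows, cols):
    """Independence or homogeneity: an r x c table gives (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


print(df_goodness_of_fit(6))  # 5  (die with 6 sides)
print(df_two_way(3, 4))       # 6  (3 x 4 table)
print(df_two_way(2, 3))       # 2  (2 x 3 table)
```

Note that the two-way rule is the same for independence and homogeneity, which is one reason the two tests share all of their calculations.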

Why Degrees of Freedom Matter

The $\chi^2$ distribution changes shape with $df$ :

$df$	Shape
$1$	Strongly right-skewed
$5$	Moderately right-skewed
$15+$	More symmetric

Higher $df$ shifts the distribution to the right and increases the mean ( $\mu = df$ ).
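One way to see these shape facts is by simulation. The sketch below relies on a standard result not stated in this lesson: a $\chi^2$ variable with $df$ degrees of freedom can be generated as the sum of $df$ squared standard normal values. The function name, draw count, and seed are arbitrary choices. The simulated mean should sit near $df$, and the median falls below the mean (right skew), with the gap shrinking relative to $df$ as $df$ grows.

```python
import random
import statistics


def simulate_chi_square(df, n_draws=20000, seed=1):
    """Draw chi-square values as sums of df squared standard normal values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]


for df in (1, 5, 15):
    draws = simulate_chi_square(df)
    mean = statistics.mean(draws)
    median = statistics.median(draws)
    # Right skew shows up as median < mean; the gap shrinks relative to df as df grows
    print(f"df={df}: mean={mean:.2f}, median={median:.2f}")
```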

What If Conditions Fail?

Condition	If It Fails
Random	Results may not generalize — state the limitation
10%	Observations may not be independent, so the $\chi^2$ model and its p-value may be unreliable
Large Counts	Combine categories or use Fisher's exact test (not on AP exam)

🔑 AP Tip: On the AP exam, if an expected count is below 5, you should note this but may still be asked to proceed with the test. State the concern and continue.

Degrees of Freedom Practice 🧮

1) GoF test with 8 categories: $df =$

2) Independence test with a $5 \times 3$ table: $df =$

3) Homogeneity test comparing 4 groups on a categorical variable with 3 levels: Table is $4 \times 3$ . $df =$

🔑 Key Concept: A large $\chi^2$ value means the observed data differ substantially from what is expected under $H_0$ . The p-value tells you how surprising your $\chi^2$ would be if $H_0$ were true.

What the $\chi^2$ Value Tells You

$\chi^2$ Value	Interpretation
Near 0	Observed counts closely match expected → little evidence against $H_0$
Moderate	Some discrepancy → may or may not be significant
Large	Big differences → strong evidence against $H_0$

The p-value makes this precise: it gives the probability of getting a $\chi^2$ value as large as or larger than yours, assuming $H_0$ is true.
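This definition can also be checked by simulation. The sketch below is an illustration, not an AP-required method: it assumes $H_0$ is true for the die example in Part 1, rolls a fair die 60 times over and over, and records how often the simulated $\chi^2$ is at least as large as the observed 4.4. The function name `simulated_p_value` and the settings `n_sims=10000` and `seed=0` are arbitrary choices.

```python
import random


def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulated_p_value(observed, n_sims=10000, seed=0):
    """Estimate P(chi-square >= observed value) when every category is equally likely."""
    n, k = sum(observed), len(observed)
    expected = [n / k] * k
    observed_stat = chi_square_statistic(observed, expected)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        counts = [0] * k
        for face in rng.choices(range(k), k=n):  # n rolls of a fair k-sided die
            counts[face] += 1
        # A tiny tolerance guards against floating-point ties with the observed value
        if chi_square_statistic(counts, expected) >= observed_stat - 1e-9:
            hits += 1
    return hits / n_sims


print(simulated_p_value([8, 12, 7, 15, 9, 9]))
```

The estimate should come out close to 0.49, near the right-tail area of the $\chi^2$ distribution with $df = 5$, since every expected count (10) is comfortably above 5.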

Reading the $\chi^2$ Table

The table gives right-tail areas for the $\chi^2$ distribution:

$df$	$\alpha = 0.10$	$\alpha = 0.05$	$\alpha = 0.025$	$\alpha = 0.01$
1	2.706	3.841	5.024	6.635
2	4.605	5.991	7.378	9.210
3	6.251	7.815	9.348	11.345
4	7.779	9.488	11.143	13.277
5	9.236	11.070	12.833	15.086

How to use: If $\chi^2 = 8.5$ with $df = 3$ : $7.815 < 8.5 < 9.348$ , so $0.025 < p < 0.05$ .
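The table lookup can be sketched in Python as well. This stores only the partial table shown above (so it supports $df = 1$ to $5$ only), and the name `p_value_bracket` is illustrative. It walks across the row until the statistic falls below a critical value, then reports the bracketing tail areas.

```python
# Partial chi-square table from this lesson: right-tail areas and critical values
TAIL_AREAS = [0.10, 0.05, 0.025, 0.01]
CRITICAL = {
    1: [2.706, 3.841, 5.024, 6.635],
    2: [4.605, 5.991, 7.378, 9.210],
    3: [6.251, 7.815, 9.348, 11.345],
    4: [7.779, 9.488, 11.143, 13.277],
    5: [9.236, 11.070, 12.833, 15.086],
}


def p_value_bracket(stat, df):
    """Return (lower, upper) bounds on the p-value implied by the partial table."""
    upper = 1.0
    for area, crit in zip(TAIL_AREAS, CRITICAL[df]):
        if stat < crit:
            return (area, upper)
        upper = area  # stat is at or beyond this critical value, so p <= area
    return (0.0, upper)


print(p_value_bracket(8.5, 3))  # (0.025, 0.05)
```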

AP Conclusion Template

If $p \leq \alpha$ : "Since the p-value ( $p = \text{value}$ ) is less than $\alpha = 0.05$ , we reject $H_0$ . There is convincing evidence that [state $H_a$ in context]."

If $p > \alpha$ : "Since the p-value ( $p = \text{value}$ ) is greater than $\alpha = 0.05$ , we fail to reject $H_0$ . There is not convincing evidence that [state $H_a$ in context]."

⚠️ Never say "accept $H_0$ " — say "fail to reject $H_0$ ."

Follow-Up: Which Cells Drive the Result?

After rejecting $H_0$ , examine individual cell contributions $(O_i - E_i)^2/E_i$ :

Contribution	Interpretation
Large	This category/cell is a major source of the discrepancy
Small	This category/cell fits the model well

Also note the direction: Is $O > E$ (more than expected) or $O < E$ (fewer than expected)?
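A short Python sketch of this follow-up, applied to the gender/opinion example from Part 2 (the function name and cell labels are illustrative). It lists each cell's contribution, largest first, together with the direction of the discrepancy.

```python
def cell_contributions(labels, observed, expected):
    """Return (label, contribution, direction) tuples, largest contribution first."""
    results = []
    for label, o, e in zip(labels, observed, expected):
        contribution = (o - e) ** 2 / e
        if o > e:
            direction = "more than expected"
        elif o < e:
            direction = "fewer than expected"
        else:
            direction = "as expected"
        results.append((label, contribution, direction))
    # Python's sort is stable, so equal contributions keep their original order
    return sorted(results, key=lambda item: item[1], reverse=True)


labels = ["Male/Favor", "Male/Oppose", "Female/Favor", "Female/Oppose"]
observed = [60, 40, 45, 55]
expected = [52.5, 47.5, 52.5, 47.5]
for label, contribution, direction in cell_contributions(labels, observed, expected):
    print(f"{label}: {contribution:.3f} ({direction})")
```

Here every cell contributes a similar amount (about 1.07 to 1.18), so no single cell dominates: fewer men and more women opposed than independence would predict, with the reverse pattern for "Favor".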

Using the $\chi^2$ Table 🧮

Use the partial table above.

1) $\chi^2 = 6.0$ , $df = 2$ . Is $p$ less than or greater than 0.05? Enter "less" or "greater".

2) $\chi^2 = 10.5$ , $df = 5$ . The p-value is between which two table values? Enter the larger $\alpha$ boundary (e.g., "0.10").

3) For $df = 1$ , what $\chi^2$ value gives $p = 0.05$ ?

Step 1 — Hypotheses: $H_0$ : The distribution of colors matches the company claim. $H_a$ : The distribution of colors does not match the company claim.

Step 2 — Conditions:

Random: Random sample stated ✓
10%: $200 < 10\%$ of all candies produced ✓
Large Counts: All expected counts $\geq 5$ (smallest is 30) ✓

Step 3 — Calculate: $\chi^2 = \frac{(75-60)^2}{60} + \frac{(35-40)^2}{40} + \frac{(32-40)^2}{40} + \frac{(28-30)^2}{30} + \frac{(30-30)^2}{30}$

$= \frac{225}{60} + \frac{25}{40} + \frac{64}{40} + \frac{4}{30} + 0 = 3.75 + 0.625 + 1.60 + 0.133 + 0$

$\chi^2 = 6.108, \quad df = 5-1 = 4$

From the table with $df = 4$ : $6.108 < 7.779$ , so $p > 0.10$ . (A fuller table places $p$ between $0.10$ and $0.25$ ; technology gives $p \approx 0.191$ .)

Step 4 — Conclude: Since the p-value is greater than $\alpha = 0.05$ , we fail to reject $H_0$ . There is not convincing evidence that the distribution of candy colors differs from the company claim.

🔍 Follow-up: The red category had the largest contribution (3.75), suggesting there may be more red candies than claimed.
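A Python sketch of this goodness-of-fit calculation, building the expected counts as $E_i = n \times p_i$ from the company's claimed proportions. The function names are hypothetical, and the p-value uses a closed-form right-tail area for whole-number $df$ rather than a table.

```python
import math


def chi2_sf(x, df):
    """Right-tail area of a chi-square distribution (positive integer df only)."""
    half = x / 2
    if df % 2 == 0:
        k, tail = 2, math.exp(-half)
    else:
        k, tail = 1, math.erfc(math.sqrt(half))
    while k < df:
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail


def goodness_of_fit(observed, proportions):
    """Return (expected counts, chi-square, df, p-value) for a GoF test."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, stat, df, chi2_sf(stat, df)


observed = [75, 35, 32, 28, 30]            # red, blue, green, yellow, orange
claimed = [0.30, 0.20, 0.20, 0.15, 0.15]  # company's claimed proportions
expected, stat, df, p = goodness_of_fit(observed, claimed)
print([round(e, 1) for e in expected])    # [60.0, 40.0, 40.0, 30.0, 30.0]
print(round(stat, 3), df, round(p, 3))    # 6.108 4 0.191
```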

Worked Example 2: Test for Independence

Problem: A random sample of 400 adults records education level and exercise frequency:

	≤ 3 days/week	> 3 days/week	Total
No degree	120	80	200
College degree	70	130	200
Total	190	210	400

Step 1 — Hypotheses: $H_0$ : Education level and exercise frequency are independent. $H_a$ : Education level and exercise frequency are associated.

Step 2 — Conditions:

Random: Random sample stated ✓
10%: $400 < 10\%$ of all adults ✓
Large Counts: Expected counts: $E_{11} = \frac{200 \times 190}{400} = 95$ , $E_{12} = 105$ , $E_{21} = 95$ , $E_{22} = 105$ . All $\geq 5$ ✓

Step 3 — Calculate: $\chi^2 = \frac{(120-95)^2}{95} + \frac{(80-105)^2}{105} + \frac{(70-95)^2}{95} + \frac{(130-105)^2}{105}$

$= \frac{625}{95} + \frac{625}{105} + \frac{625}{95} + \frac{625}{105} = 6.579 + 5.952 + 6.579 + 5.952 = 25.06$

$df = (2-1)(2-1) = 1$ . From the table: $\chi^2 = 25.06 > 6.635$ so $p < 0.01$ .

Step 4 — Conclude: Since the p-value is less than $\alpha = 0.05$ (in fact less than 0.01), we reject $H_0$ . There is convincing evidence of an association between education level and exercise frequency.
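The calculation and the Large Counts check can be sketched together in Python (variable and function names are illustrative). The decision compares the statistic with the $df = 1$ critical value 6.635 from the table in Part 5, mirroring the table-based reasoning above.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


# Rows: no degree, college degree; columns: <= 3 days/week, > 3 days/week
observed = [[120, 80], [70, 130]]
expected = expected_counts(observed)

# Large Counts must be checked on the EXPECTED counts, not the observed ones
large_counts_ok = all(e >= 5 for row in expected for e in row)

stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_001 = 6.635  # table value for df = 1 at a right-tail area of 0.01

print(expected)                                             # [[95.0, 105.0], [95.0, 105.0]]
print(large_counts_ok, round(stat, 2), df)                  # True 25.06 1
print("p < 0.01" if stat > critical_001 else "p >= 0.01")  # p < 0.01
```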

⚠️ Common AP Mistakes on Chi-Square FRQs

Mistake	Fix
Using observed counts for the Large Counts check	Must use expected counts
Forgetting $df$	Always state $df$ with the formula
Not stating hypotheses in context	" $H_0$ : Color distribution matches claim" not just " $H_0$ : fit"
Saying "accept $H_0$ "	Say "fail to reject $H_0$ "
Confusing independence, homogeneity, and GoF	Read the study design carefully
Not showing the $\chi^2$ formula with substitution	Show $\sum (O-E)^2/E$ with at least some terms
Claiming causation from an independence test	Association ≠ causation (unless randomized experiment)

Quick Calculations 🧮

A GoF test: categories A, B, C with observed = 30, 25, 45 and expected = $100/3 \approx 33.33$ for each category (total = 100, equal proportions). Use the exact value $100/3$ in your calculations.

1) Contribution from category A: $(O-E)^2/E$ (round to 2 decimal places)

2) Contribution from category C: $(O-E)^2/E$ (round to 2 decimal places)

3) $df$ for this test?

Universal Formula

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

Universal Conditions

Condition	Requirement
Random	Random sample or randomized experiment
10%	$n < 10\%$ of population
Large Counts	All expected counts $\geq 5$

Expected Counts

Test	How to Calculate $E$
GoF	$E_i = n \times p_i$ (hypothesized proportion)
Independence/Homogeneity	$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$

Decision Guide: Which Test?

Question	Answer
Does the data fit a specific model?	GoF
Are two variables related (one sample)?	Independence
Same distribution across groups (multiple samples)?	Homogeneity

Key AP Reminders

$\chi^2$ is always $\geq 0$ , and the test is always right-tailed
Large Counts uses expected counts, not observed
Never say "accept $H_0$ "
Association ≠ causation (unless randomized experiment)
Show the formula with substitution on FRQs
State $df$ explicitly

Mixed Practice 🧮

1) GoF test, 4 categories, $n = 100$ , equal proportions. Expected count per category?

2) Independence test, $3 \times 5$ table. $df =$

3) $\chi^2 = 7.5$ , $df = 2$ . From the table ( $\alpha = 0.05$ : 5.991; $\alpha = 0.025$ : 7.378). Is $p$ less than 0.025? Enter "yes" or "no".
