Chi-Square Tests

Chi-Square Goodness of Fit Test

Purpose: Test whether the observed frequencies of one categorical variable match an expected (hypothesized) distribution

Example: Die rolled 60 times. Are outcomes equally likely?

Hypotheses:

H₀: Distribution matches expected (die is fair)
Hₐ: Distribution doesn't match expected (die is biased)

Test Statistic:

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Where:

O = observed count
E = expected count
Sum over all categories

df = number of categories - 1
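The statistic and its degrees of freedom can be computed directly from two lists of counts. The following is a minimal Python sketch rather than a library routine; the function name `chi_square_statistic` is an illustrative assumption, and the counts are those of the die example in this section.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die example: 60 rolls, so a fair die expects 60 / 6 = 10 rolls per face.
observed = [8, 12, 9, 11, 15, 5]
expected = [sum(observed) / len(observed)] * len(observed)

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1  # number of categories - 1

print(f"chi-square = {chi_sq:.1f}, df = {df}")  # chi-square = 6.0, df = 5
```

The function takes expected counts, not expected proportions, so a non-uniform hypothesis would first be converted with `n * p` for each category.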

Example 1: Goodness of Fit

Roll die 60 times:

| Outcome  | 1  | 2  | 3  | 4  | 5  | 6  |
|----------|----|----|----|----|----|----|
| Observed | 8  | 12 | 9  | 11 | 15 | 5  |
| Expected | 10 | 10 | 10 | 10 | 10 | 10 |

STATE:

H₀: Die is fair (all outcomes equally likely)
Hₐ: Die is not fair
α = 0.05

PLAN:

Chi-square goodness of fit
Conditions: Assume random, independent rolls; all expected counts = 10 ≥ 5 ✓

DO:

$\chi^2 = \frac{(8-10)^2}{10} + \frac{(12-10)^2}{10} + ... + \frac{(5-10)^2}{10}$

$= \frac{4}{10} + \frac{4}{10} + \frac{1}{10} + \frac{1}{10} + \frac{25}{10} + \frac{25}{10} = 6.0$

df = 6 - 1 = 5

P-value = P(χ² ≥ 6.0) ≈ 0.306 (calculator: chi2cdf(6.0, 99999, 5))

CONCLUDE: P-value = 0.306 > 0.05, so we fail to reject H₀. There is not enough evidence that the die is biased.
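The P-value for the die example can be cross-checked without a calculator. The sketch below uses only the Python standard library; `chi2_upper_tail` is a hypothetical helper (not a library function) that evaluates the upper-tail area with a series for the regularized incomplete gamma function. A statistics package would normally do this step.

```python
import math


def chi2_upper_tail(x, df):
    """Approximate P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Regularized lower incomplete gamma function:
    # P(a, z) = z^a * e^(-z) / Gamma(a + 1) * sum_{n >= 0} z^n / ((a+1)...(a+n))
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return 1.0 - lower


observed = [8, 12, 9, 11, 15, 5]
expected = [10] * 6

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_upper_tail(chi_sq, df)
alpha = 0.05

print(f"chi-square = {chi_sq:.1f}, df = {df}, P-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The series converges quickly for the moderate χ² values that appear in examples like this one; it is not tuned for extreme tails.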

Conditions for Goodness of Fit

Random sample
All expected counts ≥ 5
Independent observations

If any expected count < 5: Combine categories if it makes sense in context

Chi-Square Distribution

Properties:

Never negative (sum of squared differences divided by positive expected counts)
Right-skewed
Shape depends on df
As df increases, becomes more symmetric and approaches a normal shape

P-value: Always upper tail (larger χ² = worse fit)
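These properties can be seen in a small simulation. The sketch below relies on a fact not stated above: a chi-square variable with df degrees of freedom is a sum of df squared independent standard normal values. The helper names, the seed, and the number of draws are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_chi_square(df, n_draws, rng):
    """Draw chi-square values as sums of df squared standard normals."""
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


def sample_skewness(values):
    """Mean cubed standardized deviation (positive means right-skewed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for df in (3, 30):
    draws = simulate_chi_square(df, 20_000, rng)
    results[df] = {
        "min": min(draws),
        "mean": statistics.fmean(draws),
        "median": statistics.median(draws),
        "skewness": sample_skewness(draws),
    }
    print(df, {k: round(v, 2) for k, v in results[df].items()})
```

In each run the minimum is never below zero and the mean sits above the median (right skew), and the skewness for df = 30 is much smaller than for df = 3, which is the drift toward a normal shape.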

Chi-Square Test of Independence

Purpose: Test if two categorical variables are independent

Setup: Two-way table (contingency table)

Hypotheses:

H₀: Variables are independent
Hₐ: Variables are associated (dependent)

Expected Counts for Independence

For each cell:

$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$

If H₀ is true (independence): E is the count we'd expect in that cell, because every row would then have the same column proportions as the table as a whole
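The expected-count formula applies to every cell at once, which makes it easy to automate. A minimal sketch, assuming the table is stored as a list of rows without the totals; `expected_counts` is an illustrative name, and the counts are those of the gender and favorite sport example in this section.

```python
def expected_counts(table):
    """Expected counts under independence for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total)(column total) / (grand total) for each cell
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female. Columns: Baseball, Basketball, Soccer.
observed = [
    [30, 40, 30],
    [20, 30, 50],
]

expected = expected_counts(observed)
for row in expected:
    print(row)
# [25.0, 35.0, 40.0]
# [25.0, 35.0, 40.0]
```

Both rows get the same expected counts here only because the two row totals are equal (100 each).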

Example 2: Test of Independence

Relationship between gender and favorite sport (200 students):

|        | Baseball | Basketball | Soccer | Total |
|--------|----------|------------|--------|-------|
| Male   | 30       | 40         | 30     | 100   |
| Female | 20       | 30         | 50     | 100   |
| Total  | 50       | 70         | 80     | 200   |

Expected for Male/Baseball:

$E = \frac{100 \times 50}{200} = 25$

All expected counts:

|        | Baseball | Basketball | Soccer |
|--------|----------|------------|--------|
| Male   | 25       | 35         | 40     |
| Female | 25       | 35         | 40     |

STATE:

H₀: Gender and sport preference are independent
Hₐ: Gender and sport preference are associated
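α = 0.05

PLAN:

Chi-square test of independence
Conditions: Assume a random sample of students; all expected counts ≥ 5 ✓ (smallest is 25)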

DO:

$\chi^2 = \frac{(30-25)^2}{25} + \frac{(40-35)^2}{35} + ... + \frac{(50-40)^2}{40}$

$= 1 + 0.714 + 2.5 + 1 + 0.714 + 2.5 = 8.43$

df = (rows - 1)(columns - 1) = (2-1)(3-1) = 2

P-value = P(χ² ≥ 8.43) ≈ 0.015

CONCLUDE: P-value = 0.015 < 0.05, so we reject H₀. There is convincing evidence of an association between gender and sport preference.
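The whole calculation for this example fits in a few lines of Python. This sketch uses a shortcut that holds only when df = 2: the upper-tail area of the chi-square distribution is then exactly e^(−χ²/2). For any other df a table, calculator, or statistics library is needed.

```python
import math

# Rows: Male, Female. Columns: Baseball, Basketball, Soccer.
observed = [
    [30, 40, 30],
    [20, 30, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
if df != 2:
    raise ValueError("the closed form below is valid only for df = 2")

# Upper-tail area for df = 2 only: P(chi-square >= x) = exp(-x / 2)
p_value = math.exp(-chi_sq / 2)
alpha = 0.05

print(f"chi-square = {chi_sq:.2f}, df = {df}, P-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```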

Degrees of Freedom

Goodness of fit: df = k - 1 (k = number of categories)

Test of independence: df = (r - 1)(c - 1)

r = number of rows
c = number of columns

Conditions for Test of Independence

Random sample
All expected counts ≥ 5
Independent observations

Check expected counts, not observed!
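The difference between checking observed and expected counts is easy to show in code. In this sketch, `small_expected_cells` is a hypothetical helper and both tables are made-up counts chosen only to illustrate the condition; they are not data from this section.

```python
def small_expected_cells(table, minimum=5):
    """Return (row, column, expected) for each cell whose expected count is below minimum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / grand_total
            if e < minimum:
                flagged.append((i, j, e))
    return flagged


# Made-up table 1: one OBSERVED count (2) is below 5, but the EXPECTED
# counts are 7, 13, 7, 13, so the condition is met.
table_ok = [
    [2, 18],
    [12, 8],
]
print(small_expected_cells(table_ok))  # []

# Made-up table 2: two expected counts fall below 5, so the condition fails.
table_bad = [
    [3, 7],
    [2, 28],
]
print(small_expected_cells(table_bad))  # [(0, 0, 1.25), (1, 0, 3.75)]
```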

Chi-Square vs Other Tests

Use chi-square when:

Categorical variables (not quantitative)
Comparing distributions
Testing independence

Use t-test when:

Quantitative variable
Comparing means

Use z-test for proportions when:

Single proportion or comparing two proportions
Binary outcome (special case of categorical)

Interpreting Results

Large χ²:

Observed far from expected
Evidence against H₀

Small χ²:

Observed close to expected
Consistent with H₀

Always base the decision on the P-value, not on the size of χ² alone (how large is "large" depends on df)!

Calculator Commands (TI-83/84)

Goodness of fit:

Enter observed counts in L1 and expected counts in L2
STAT → TESTS → D:χ²GOF-Test
Set Observed: L1, Expected: L2, df = k - 1

Test of independence:

Enter observed counts in matrix [A]
STAT → TESTS → C:χ²-Test
Calculator computes expected counts and stores them in the Expected matrix (default [B])

P-value: chi2cdf(χ², 99999, df), i.e. lower bound = test statistic, upper bound = a very large number, then df

Relationship Between Variables

If reject H₀ in test of independence:

Variables are associated
But doesn't tell us HOW they're related
Examine cell contributions and patterns

Cell contribution: (O - E)²/E for that cell

Large contribution → cell differs most from expected
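Ranking the cell contributions shows where an association comes from. A minimal sketch using the gender and favorite sport counts from Example 2; the labels and the dictionary layout are illustrative choices.

```python
rows = ["Male", "Female"]
cols = ["Baseball", "Basketball", "Soccer"]
observed = [
    [30, 40, 30],
    [20, 30, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

contributions = {}
direction = {}
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        contributions[(rows[i], cols[j])] = (o - e) ** 2 / e
        direction[(rows[i], cols[j])] = "more" if o > e else "fewer"

# Largest contributions first
ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
for cell, value in ranked:
    print(f"{cell[0]:6s} {cell[1]:10s} {value:.3f}  ({direction[cell]} than expected)")
```

The two Soccer cells contribute 2.5 each, more than any other cell: fewer males and more females chose soccer than independence would predict.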

Chi-Square for Homogeneity

Purpose: Test whether the distribution of one categorical variable is the same across several populations

Setup: Same as independence (two-way table)
Difference: Conceptual (comparing populations vs testing independence)
Calculation: Identical to test of independence

Example: Do three schools have same distribution of favorite colors?

Common Mistakes

❌ Using chi-square for quantitative data
❌ Checking observed instead of expected counts
❌ Wrong df formula
❌ Two-tail P-value (always use upper tail!)
❌ Confusing goodness of fit with independence

Quick Reference

Goodness of Fit:

Tests if observed matches expected distribution
df = k - 1

Test of Independence:

Tests if two categorical variables are independent
df = (r - 1)(c - 1)
Expected: (row total × column total) / grand total

Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$

Conditions: Random, all expected ≥ 5, independent observations

Remember: Chi-square tests work with counts/frequencies of categorical variables. A large χ² (relative to df) means poor fit or evidence of an association. Always check expected counts!
