Chi-Square Tests

Goodness of fit and independence tests

Chi-Square Goodness of Fit Test

Purpose: Test if observed frequencies match expected distribution

Example: Die rolled 60 times. Are outcomes equally likely?

Hypotheses:

  • H₀: Distribution matches expected (die is fair)
  • Hₐ: Distribution doesn't match expected (die is biased)

Test Statistic:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

  • O = observed count
  • E = expected count
  • Sum over all categories

df = number of categories - 1

Example 1: Goodness of Fit

Roll die 60 times:

| Outcome  | 1  | 2  | 3  | 4  | 5  | 6  |
|----------|----|----|----|----|----|----|
| Observed | 8  | 12 | 9  | 11 | 15 | 5  |
| Expected | 10 | 10 | 10 | 10 | 10 | 10 |

STATE:

  • H₀: Die is fair (all outcomes equally likely)
  • Hₐ: Die is not fair
  • α = 0.05

PLAN:

  • Chi-square goodness of fit
  • Conditions: All expected ≥ 5 ✓

DO:

$$\chi^2 = \frac{(8-10)^2}{10} + \frac{(12-10)^2}{10} + \cdots + \frac{(5-10)^2}{10}$$

$$= \frac{4}{10} + \frac{4}{10} + \frac{1}{10} + \frac{1}{10} + \frac{25}{10} + \frac{25}{10} = 6.0$$

df = 6 - 1 = 5

P-value = P(χ² ≥ 6.0) ≈ 0.306 (from chi2cdf)

CONCLUDE: P-value = 0.306 > 0.05, so we fail to reject H₀. There is not convincing evidence that the die is biased.
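
The arithmetic in Example 1 can be reproduced with a short Python sketch that uses only the standard library. The helper names `chi_square_statistic` and `chi_square_upper_tail` are illustrative and not part of these notes. The upper-tail probability is computed from the standard recurrence for the regularized incomplete gamma function, which stands in here for the calculator's χ²cdf command.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), for x >= 0 and integer df >= 1.

    Uses Q(a + 1, t) = Q(a, t) + t^a * e^(-t) / Gamma(a + 1) with t = x / 2,
    starting from Q(1/2, t) = erfc(sqrt(t)) for odd df, or Q(1, t) = e^(-t) for even df.
    """
    t = x / 2
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(t))
    else:
        a, q = 1.0, math.exp(-t)
    while a < df / 2:
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1
    return q

observed = [8, 12, 9, 11, 15, 5]   # die counts from Example 1
expected = [60 / 6] * 6            # fair die: 60 rolls / 6 faces = 10 each

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1
p_value = chi_square_upper_tail(chi2, df)

print(f"chi-square = {chi2:.1f}, df = {df}, P-value = {p_value:.3f}")
# chi-square = 6.0, df = 5, P-value = 0.306
```

Because the P-value exceeds α = 0.05, the sketch reaches the same decision as the worked example: fail to reject H₀.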

Conditions for Goodness of Fit

  1. Random sample
  2. All expected counts ≥ 5
  3. Independent observations

If any expected count is < 5: Combine categories where that makes sense

Chi-Square Distribution

Properties:

  • Never negative (sum of squared differences divided by positive expected counts)
  • Right-skewed
  • Shape depends on df
  • As df increases, approaches normal

P-value: Always upper tail (larger χ² = worse fit)
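
To make the "shape depends on df" property concrete, the sketch below tabulates three standard facts about the chi-square distribution that these notes do not state: its mean is df, its standard deviation is √(2·df), and its skewness is √(8/df). The skewness shrinking toward 0 as df grows is one way to see the approach to a normal shape. The function name `chi_square_shape` is illustrative.

```python
import math

def chi_square_shape(df):
    """Return (mean, standard deviation, skewness) of a chi-square distribution."""
    return df, math.sqrt(2 * df), math.sqrt(8 / df)

for df in (1, 2, 5, 20, 100):
    mean, sd, skew = chi_square_shape(df)
    print(f"df = {df:>3}: mean = {mean:>3}, sd = {sd:5.2f}, skewness = {skew:4.2f}")
```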

Chi-Square Test of Independence

Purpose: Test if two categorical variables are independent

Setup: Two-way table (contingency table)

Hypotheses:

  • H₀: Variables are independent
  • Hₐ: Variables are associated (dependent)

Expected Counts for Independence

For each cell:

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

If independent: every row should split across the columns in the same proportions as the whole sample, so expected count = row total × (column total / grand total)

Example 2: Test of Independence

Relationship between gender and favorite sport (200 students):

|        | Baseball | Basketball | Soccer | Total |
|--------|----------|------------|--------|-------|
| Male   | 30       | 40         | 30     | 100   |
| Female | 20       | 30         | 50     | 100   |
| Total  | 50       | 70         | 80     | 200   |

Expected for Male/Baseball:

$$E = \frac{100 \times 50}{200} = 25$$

All expected counts:

|        | Baseball | Basketball | Soccer |
|--------|----------|------------|--------|
| Male   | 25       | 35         | 40     |
| Female | 25       | 35         | 40     |
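
The expected-count formula can be applied to every cell at once. The following is a minimal Python sketch, with `expected_counts` as an illustrative helper name, that rebuilds the table of expected counts from the observed counts alone.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed counts from Example 2
# rows: Male, Female; columns: Baseball, Basketball, Soccer
observed = [[30, 40, 30],
            [20, 30, 50]]

expected = expected_counts(observed)
print(expected)  # [[25.0, 35.0, 40.0], [25.0, 35.0, 40.0]]
```

Both rows have the same expected counts here only because the two row totals are equal (100 each).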

STATE:

  • H₀: Gender and sport preference are independent
  • Hₐ: Gender and sport preference are associated
  • α = 0.05

PLAN:

  • Chi-square test of independence
  • Conditions: All expected counts ≥ 5 (smallest is 25) ✓

DO:

$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(40-35)^2}{35} + \cdots + \frac{(50-40)^2}{40}$$

$$= 1 + 0.714 + 2.5 + 1 + 0.714 + 2.5 = 8.43$$

df = (rows - 1)(columns - 1) = (2-1)(3-1) = 2

P-value = P(χ² ≥ 8.43) ≈ 0.015

CONCLUDE: P-value = 0.015 < 0.05, so we reject H₀. There is statistically significant evidence of an association between gender and sport preference.
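
The DO step of Example 2 can be checked with the Python sketch below. The observed and expected counts are copied from the tables in the example. The P-value line relies on a shortcut that holds only when df = 2, where the upper-tail probability is exactly e^(−χ²/2); for other df, use the calculator command or a general chi-square routine.

```python
import math

# rows: Male, Female; columns: Baseball, Basketball, Soccer
observed = [[30, 40, 30], [20, 30, 50]]
expected = [[25, 35, 40], [25, 35, 40]]

# Sum (O - E)^2 / E over all six cells
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# df = (rows - 1)(columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: closed form valid only because df == 2
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.3f}")
# chi-square = 8.43, df = 2, P-value = 0.015
```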

Degrees of Freedom

Goodness of fit: df = k - 1 (k = number of categories)

Test of independence: df = (r - 1)(c - 1)

  • r = number of rows
  • c = number of columns

Conditions for Test of Independence

  1. Random sample
  2. All expected counts ≥ 5
  3. Independent observations

Check expected counts, not observed!

Chi-Square vs Other Tests

Use chi-square when:

  • Categorical variables (not quantitative)
  • Comparing distributions
  • Testing independence

Use t-test when:

  • Quantitative variable
  • Comparing means

Use z-test for proportions when:

  • Single proportion or comparing two proportions
  • Binary outcome (special case of categorical)

Interpreting Results

Large χ²:

  • Observed far from expected
  • Evidence against H₀

Small χ²:

  • Observed close to expected
  • Consistent with H₀

Always use the P-value for the decision! How large χ² must be to count as evidence depends on df.

Calculator Commands (TI-83/84)

Goodness of fit:

  • Enter observed in list
  • STAT → TESTS → D:χ²GOF-Test (TI-84 models; the TI-83 does not include this test)
  • Enter expected counts

Test of independence:

  • Enter observed in matrix
  • STAT → TESTS → C:χ²-Test
  • Calculator computes expected

P-value: chi2cdf(χ², 99999, df), where the arguments are (lower bound, upper bound, df) and 99999 stands in for +∞

Relationship Between Variables

If reject H₀ in test of independence:

  • Variables are associated
  • But the test doesn't tell us HOW they're related
  • Examine cell contributions and patterns

Cell contribution: (O - E)²/E for that cell

  • Large contribution → cell differs most from expected
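
As an illustration of this follow-up analysis, the Python sketch below lists the contribution of each cell in Example 2 and the direction of its difference. The counts are copied from that example; the variable names are illustrative.

```python
rows = ["Male", "Female"]
cols = ["Baseball", "Basketball", "Soccer"]
observed = [[30, 40, 30], [20, 30, 50]]
expected = [[25, 35, 40], [25, 35, 40]]

contributions = {}
for i, row in enumerate(rows):
    for j, col in enumerate(cols):
        o, e = observed[i][j], expected[i][j]
        contributions[(row, col)] = (o - e) ** 2 / e
        direction = "more" if o > e else "fewer" if o < e else "same"
        print(f"{row:<6} {col:<10} contribution = {contributions[(row, col)]:.3f} "
              f"({direction} than expected)")

# Cells with the largest contribution to chi-square
largest_value = max(contributions.values())
top_cells = [cell for cell, value in contributions.items() if value == largest_value]
print(top_cells)  # [('Male', 'Soccer'), ('Female', 'Soccer')]
```

The two Soccer cells contribute 2.5 each, or 5.0 of the total 8.43: fewer males and more females chose soccer than independence would predict.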

Chi-Square for Homogeneity

Tests whether the distribution of one categorical variable is the same across multiple populations

Setup: Same as independence (two-way table)
Difference: Conceptual (comparing populations vs testing independence)
Calculation: Identical to test of independence

Example: Do three schools have same distribution of favorite colors?

Common Mistakes

❌ Using chi-square for quantitative data
❌ Checking observed instead of expected counts
❌ Wrong df formula
❌ Two-tail P-value (always use upper tail!)
❌ Confusing goodness of fit with independence

Quick Reference

Goodness of Fit:

  • Tests if observed matches expected distribution
  • df = k - 1

Test of Independence:

  • Tests if two categorical variables are independent
  • df = (r - 1)(c - 1)
  • Expected: (row total × column total) / grand total

Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$

Conditions: Random, all expected ≥ 5, independent observations

Remember: Chi-square tests work with counts/frequencies of categorical variables. Large χ² = poor fit or strong association. Always check expected counts!
