Chi-Square Tests
Goodness of fit and independence tests
Chi-Square Goodness of Fit Test
Purpose: Test if observed frequencies match expected distribution
Example: Die rolled 60 times. Are outcomes equally likely?
Hypotheses:
- H₀: Distribution matches expected (die is fair)
- Hₐ: Distribution doesn't match expected (die is biased)
Test Statistic:
χ² = Σ (O - E)²/E
Where:
- O = observed count
- E = expected count
- Sum over all categories
df = number of categories - 1
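The statistic and df rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function name `chi_square_statistic` and the three-category counts are hypothetical, chosen only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustration only).
observed = [18, 22, 20]
expected = [20, 20, 20]

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1  # number of categories - 1

print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

Each category adds its own (O - E)²/E term, so one category that is far from its expected count can dominate the total.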
Example 1: Goodness of Fit
Roll die 60 times:
| Outcome  | 1  | 2  | 3  | 4  | 5  | 6  |
|----------|----|----|----|----|----|----|
| Observed | 8  | 12 | 9  | 11 | 15 | 5  |
| Expected | 10 | 10 | 10 | 10 | 10 | 10 |
STATE:
- H₀: Die is fair (all outcomes equally likely)
- Hₐ: Die is not fair
- α = 0.05
PLAN:
- Chi-square goodness of fit
- Conditions: All expected ≥ 5 ✓
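DO:
χ² = (8-10)²/10 + (12-10)²/10 + (9-10)²/10 + (11-10)²/10 + (15-10)²/10 + (5-10)²/10 = (4 + 4 + 1 + 1 + 25 + 25)/10 = 6.0
df = 6 - 1 = 5
P-value = P(χ² ≥ 6.0) ≈ 0.306 (from chi2cdf(6.0, 99999, 5))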
CONCLUDE: P-value = 0.306 > 0.05, so we fail to reject H₀. There is not convincing evidence that the die is biased.
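The die example can be checked numerically. The sketch below uses only Python's standard library. The P-value line relies on a closed-form expression for the upper-tail area that holds for df = 5 specifically; that formula is a standard result brought in for this check, not something stated in the example itself.

```python
import math

observed = [8, 12, 9, 11, 15, 5]    # counts for faces 1..6 from the table
n = sum(observed)                    # 60 rolls
expected = [n / 6] * 6               # fair die: 10 per face

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Upper-tail area, valid for df = 5 only:
#   P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3)
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2) * (1 + chi_sq / 3))

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi-square = {chi_sq:.1f}, df = {df}, P-value = {p_value:.3f}: {decision}")
```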
Conditions for Goodness of Fit
- Random sample
- All expected counts ≥ 5
- Independent observations
If any expected count < 5: Combine categories when that makes sense in context
Chi-Square Distribution
Properties:
- Never negative (sum of squared differences divided by positive expected counts)
- Right-skewed
- Shape depends on df
- As df increases, becomes more symmetric and approaches a normal shape
P-value: Always upper tail (larger χ² = worse fit)
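These properties can be seen in a small simulation. The sketch assumes one standard fact not stated above: a chi-square variable with df degrees of freedom can be generated as the sum of df squared independent standard normal draws. The function name `simulate_chi_square`, the seed, and the number of draws are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_chi_square(df, n_draws, rng):
    """Draw n_draws values, each the sum of df squared standard normals."""
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]


rng = random.Random(0)  # fixed seed so the run is repeatable
summary = {}
for df in (2, 5, 20):
    draws = simulate_chi_square(df, 20_000, rng)
    summary[df] = (min(draws), statistics.mean(draws), statistics.median(draws))
    low, mean, median = summary[df]
    # A mean above the median is a sign of right skew.
    print(f"df={df:2d}: min={low:.3f}, mean={mean:.2f}, median={median:.2f}")
```

In each run the minimum stays at or above zero and the mean sits above the median. The gap between them shrinks relative to the mean as df grows, which matches the distribution becoming more symmetric.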
Chi-Square Test of Independence
Purpose: Test if two categorical variables are independent
Setup: Two-way table (contingency table)
Hypotheses:
- H₀: Variables are independent
- Hₐ: Variables are associated (dependent)
Expected Counts for Independence
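For each cell:
Expected count = (row total × column total) / grand total
If the variables are independent, this is the count we'd expect in the cell given only the row and column totals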
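The expected-count rule, (row total × column total) / grand total, can be applied to every cell of a two-way table at once. The sketch below is one possible way to do that; the function name `expected_counts` and the 2×2 table are hypothetical and serve only as an illustration.

```python
def expected_counts(table):
    """Return expected counts under independence for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Each cell: (row total * column total) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x2 table of observed counts (illustration only).
observed = [[10, 20],
            [30, 40]]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

The expected table always keeps the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.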
Example 2: Test of Independence
Relationship between gender and favorite sport (200 students):
|        | Baseball | Basketball | Soccer | Total |
|--------|----------|------------|--------|-------|
| Male   | 30       | 40         | 30     | 100   |
| Female | 20       | 30         | 50     | 100   |
| Total  | 50       | 70         | 80     | 200   |
Expected for Male/Baseball:
(100 × 50) / 200 = 25
All expected counts:
|        | Baseball | Basketball | Soccer |
|--------|----------|------------|--------|
| Male   | 25       | 35         | 40     |
| Female | 25       | 35         | 40     |
STATE:
- H₀: Gender and sport preference are independent
- Hₐ: Gender and sport preference are associated
- α = 0.05
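DO:
χ² = (30-25)²/25 + (40-35)²/35 + (30-40)²/40 + (20-25)²/25 + (30-35)²/35 + (50-40)²/40 ≈ 1.00 + 0.71 + 2.50 + 1.00 + 0.71 + 2.50 ≈ 8.43
df = (rows - 1)(columns - 1) = (2-1)(3-1) = 2
P-value = P(χ² ≥ 8.43) ≈ 0.015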
CONCLUDE: P-value = 0.015 < 0.05, so we reject H₀. There is convincing evidence of an association between gender and sport preference.
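The gender and sport example can be reproduced from the observed table alone. This is a standard-library sketch, not the only way to run the test. The P-value line uses the closed form exp(-χ²/2), a standard result that holds only when df = 2 and is brought in here as an assumption.

```python
import math

observed = [[30, 40, 30],   # Male:   Baseball, Basketball, Soccer
            [20, 30, 50]]   # Female: Baseball, Basketball, Soccer

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail area, valid for df = 2 only: P(X >= x) = exp(-x/2)
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, df = {df}, P-value = {p_value:.3f}")
```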
Degrees of Freedom
Goodness of fit: df = k - 1 (k = number of categories)
Test of independence: df = (r - 1)(c - 1)
- r = number of rows
- c = number of columns
Conditions for Test of Independence
- Random sample
- All expected counts ≥ 5
- Independent observations
Check expected counts, not observed!
Chi-Square vs Other Tests
Use chi-square when:
- Categorical variables (not quantitative)
- Comparing distributions
- Testing independence
Use t-test when:
- Quantitative variable
- Comparing means
Use z-test for proportions when:
- Single proportion or comparing two proportions
- Binary outcome (special case of categorical)
Interpreting Results
Large χ²:
- Observed far from expected
- Evidence against H₀
Small χ²:
- Observed close to expected
- Consistent with H₀
Always use P-value for decision!
Calculator Commands (TI-83/84)
Goodness of fit:
- Enter observed in list
- STAT → TESTS → D:χ²GOF-Test
- Enter expected counts
Test of independence:
- Enter observed in matrix
- STAT → TESTS → C:χ²-Test
- Calculator computes expected
P-value: chi2cdf(χ², 99999, df)
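For readers without a calculator, the same upper-tail area can be approximated in Python. The sketch below is a stand-in for the calculator command, not a reproduction of how the calculator computes it. It uses standard closed-form expressions for integer df; the function name `chi_square_upper_tail` is hypothetical.

```python
import math


def chi_square_upper_tail(x, df):
    """Return P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_k x^k / (1*3*...*(2k+1)),
    # with k running from 0 to (df - 3)/2 (empty sum when df = 1).
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2 * x / math.pi) * math.exp(-half) * total)


# The two worked examples from this page:
print(f"{chi_square_upper_tail(6.0, 5):.3f}")    # goodness of fit, df = 5
print(f"{chi_square_upper_tail(8.43, 2):.3f}")   # independence, df = 2
```

The function takes the test statistic and df and returns the area to its right, which is the P-value for every chi-square test on this page.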
Relationship Between Variables
If we reject H₀ in a test of independence:
- Variables are associated
- But the test doesn't tell us HOW they're related
- Examine cell contributions and patterns
Cell contribution: (O - E)²/E for that cell
- Large contribution → cell differs most from expected
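As an illustration, the cell contributions for the gender and sport table from Example 2 can be listed from largest to smallest. This is a minimal sketch; the variable names are arbitrary, and the observed and expected counts are copied from that example.

```python
rows = ["Male", "Female"]
cols = ["Baseball", "Basketball", "Soccer"]
observed = [[30, 40, 30], [20, 30, 50]]
expected = [[25, 35, 40], [25, 35, 40]]

# Contribution of each cell: (O - E)^2 / E
contributions = {
    (r, c): (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i, r in enumerate(rows)
    for j, c in enumerate(cols)
}

ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
for (r, c), value in ranked:
    print(f"{r:6s} {c:10s} {value:.2f}")
```

Here the two Soccer cells contribute the most (2.5 each): fewer males and more females chose soccer than independence would predict.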
Chi-Square for Homogeneity
Test whether the distribution of a categorical variable is the same across several populations
Setup: Same as independence (two-way table)
Difference: Conceptual (comparing populations vs testing independence)
Calculation: Identical to test of independence
Example: Do three schools have same distribution of favorite colors?
Common Mistakes
❌ Using chi-square for quantitative data
❌ Checking observed instead of expected counts
❌ Wrong df formula
❌ Two-tail P-value (always use upper tail!)
❌ Confusing goodness of fit with independence
Quick Reference
Goodness of Fit:
- Tests if observed matches expected distribution
- df = k - 1
Test of Independence:
- Tests if two categorical variables are independent
- df = (r - 1)(c - 1)
- Expected: (row total × column total) / grand total
Test Statistic: χ² = Σ (O - E)²/E
Conditions: Random, all expected ≥ 5, independent observations
Remember: Chi-square tests work with counts/frequencies of categorical variables. Large χ² = poor fit or strong association. Always check expected counts!