Chi-square test for independence (one sample, two categorical variables) and for homogeneity (multiple independent samples), including expected counts, degrees of freedom, and conditions.
Chi-square tests are used to analyze the relationship between two categorical variables. Two common scenarios are tested: independence (one sample, two categorical variables) and homogeneity (multiple independent samples, one categorical variable each).
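Both tests use the same test statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ is the observed count in a cell, $E$ is the expected count in that cell under $H_0$, and the sum runs over every cell of the table.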
When $H_0$ is true and the conditions are met, the test statistic approximately follows a chi-square distribution whose degrees of freedom depend on the dimensions of the table.
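As a minimal sketch of the statistic, the function below sums the squared differences between observed and expected counts, each divided by the expected count. The function name `chi_square_statistic` and the counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over matching cells of two flat lists of counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2x2 table flattened row by row (illustrative numbers only).
observed = [12, 8, 8, 12]
expected = [10, 10, 10, 10]
stat = chi_square_statistic(observed, expected)
print(round(stat, 3))
```

Each of the four cells contributes $(\pm 2)^2 / 10 = 0.4$, so larger gaps between observed and expected counts push the statistic up.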
Purpose: Test whether two categorical variables in a single population are independent.
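Hypotheses:
$H_0$: The two variables are independent (there is no association).
$H_a$: The two variables are not independent (there is an association).
Expected Cell Counts (under $H_0$ of independence):
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
Degrees of Freedom: $df = (r - 1)(c - 1)$, where $r$ = number of rows and $c$ = number of columns in the contingency table.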
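The expected counts and degrees of freedom can be sketched in a few lines. The helper names `expected_counts` and `degrees_of_freedom` and the 2x3 table are hypothetical; the table is assumed to be a list of rows of observed counts without the margin totals.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a two-way table of counts."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2x3 table of observed counts (rows x columns).
table = [[10, 20, 30],
         [20, 20, 20]]
expected = expected_counts(table)
df = degrees_of_freedom(table)
print(expected)
print(df)
```

Because both rows here have the same total (60), the two rows of expected counts come out identical, which is what independence predicts.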
Purpose: Test whether the distribution of a categorical variable is the same across multiple independent groups (samples).
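Hypotheses:
$H_0$: The distribution of the categorical variable is the same in every group.
$H_a$: The distribution differs for at least one group.
Expected Cell Counts (under $H_0$ of homogeneity):
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
Note: The formula is identical to the independence test, but the interpretation differs. Under $H_0$ every group shares the pooled category proportions, so each expected count is the group size times the pooled proportion for that category.
Degrees of Freedom: $df = (r - 1)(c - 1)$, where $r$ = number of categories in the variable and $c$ = number of groups/samples.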
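To illustrate why the homogeneity formula matches the independence one, the sketch below computes expected counts two ways: as group size times pooled category proportion, and as row total times column total over grand total. The counts are hypothetical, with rows assumed to be categories and columns assumed to be independent groups.

```python
import math

# Hypothetical counts: rows are categories, columns are independent groups.
table = [[30, 50],
         [20, 100]]

group_sizes = [sum(col) for col in zip(*table)]   # one size per group
category_totals = [sum(row) for row in table]
grand_total = sum(group_sizes)

# Pooled proportion of each category, ignoring group membership.
pooled = [t / grand_total for t in category_totals]

# Homogeneity reading: group size times pooled category proportion.
expected_pooled = [[n * p for n in group_sizes] for p in pooled]

# Independence-style formula: row total * column total / grand total.
expected_margins = [[t * n / grand_total for n in group_sizes]
                    for t in category_totals]

same = all(math.isclose(a, b)
           for row_a, row_b in zip(expected_pooled, expected_margins)
           for a, b in zip(row_a, row_b))
print(expected_pooled, same)
```

The two computations agree cell by cell; only the story behind the numbers changes.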
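| Condition | Requirement |
|---|---|
| Random sampling | Data come from a random sample (independence) or from independent random samples or randomly assigned groups (homogeneity). |
| Independence | Observations are independent of one another, and each individual is counted in exactly one cell. |
| Large Counts | All expected cell counts must be $\geq 5$ (some texts allow all $\geq 1$ if no more than 20% of cells are $< 5$). |
| Sample Size | Large enough overall to ensure reliable results; in practice this is what the Large Counts condition checks. |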
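The Large Counts condition is easy to check once expected counts are available. The sketch below assumes the commonly used thresholds (every expected count at least 5, or the relaxed rule of every count at least 1 with no more than 20% of cells below 5); the function name `large_counts_check` and the expected counts are hypothetical.

```python
def large_counts_check(expected):
    """Summarize how a table of expected counts fares against the usual rules."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return {
        "all_at_least_5": all(e >= 5 for e in cells),
        # Relaxed rule: every cell >= 1 and at most 20% of cells below 5.
        "relaxed_rule_ok": min(cells) >= 1 and share_below_5 <= 0.20,
        "smallest": min(cells),
    }


# Hypothetical expected counts for a 2x3 table.
result = large_counts_check([[4.2, 10.8, 15.0],
                             [9.8, 25.2, 35.0]])
print(result)
```

Here one cell of six falls below 5, so the strict rule fails while the relaxed rule passes. Note that the check applies to expected counts, not observed counts.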
A study of 200 college students asks about Gender (M/F) and Exercise Habit (Regular/Irregular). The data:
| | Regular | Irregular | Total |
|---|---|---|---|
| Male | 40 | 30 | 70 |
| Female | 60 | 70 | 130 |
| Total | 100 | 100 | 200 |
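Test $H_0$: Gender and Exercise are independent, at $\alpha = 0.05$.
Calculate expected counts:
$$E_{\text{Male, Regular}} = \frac{70 \times 100}{200} = 35, \quad E_{\text{Male, Irregular}} = \frac{70 \times 100}{200} = 35$$
$$E_{\text{Female, Regular}} = \frac{130 \times 100}{200} = 65, \quad E_{\text{Female, Irregular}} = \frac{130 \times 100}{200} = 65$$
All expected counts $\geq 5$ ✓
Calculate $\chi^2$:
$$\chi^2 = \frac{(40 - 35)^2}{35} + \frac{(30 - 35)^2}{35} + \frac{(60 - 65)^2}{65} + \frac{(70 - 65)^2}{65} \approx 0.714 + 0.714 + 0.385 + 0.385 \approx 2.198$$
Degrees of freedom: $df = (2 - 1)(2 - 1) = 1$
p-value: For $\chi^2 \approx 2.198$ with $df = 1$, $p \approx 0.138$.
Since $p \approx 0.138 > \alpha = 0.05$, fail to reject $H_0$. There is no significant evidence that gender and exercise habit are associated.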
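The gender and exercise example can be reproduced with the standard library alone. This is a sketch rather than a general-purpose routine: the p-value line relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so it is valid for $df = 1$ only. The variable names are illustrative.

```python
import math

observed = [[40, 30],   # Male:   Regular, Irregular
            [60, 70]]   # Female: Regular, Irregular

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# With df = 1 the chi-square variable is a squared standard normal, so the
# upper-tail probability is erfc(sqrt(chi2 / 2)). This shortcut is df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 3), round(p_value, 3))
```

The p-value comes out well above conventional significance levels such as 0.05, which is consistent with failing to reject the null hypothesis.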
Compare the distribution of Political Party (Democrat/Republican/Independent) across three Age Groups (18–30, 31–50, 50+). The samples total 580 people:
| Party | 18–30 | 31–50 | 50+ | Total |
|---|---|---|---|---|
| Dem. | 80 | 90 | 70 | 240 |
| Rep. | 50 | 100 | 110 | 260 |
| Ind. | 20 | 30 | 30 | 80 |
| Total | 150 | 220 | 210 | 580 |
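Test $H_0$: Political party distribution is the same across age groups, at $\alpha = 0.05$.
Calculate expected counts (sample), using the column totals 150, 220, 210 and grand total 580 obtained by summing the cells:
$$E_{\text{Dem., 18–30}} = \frac{240 \times 150}{580} \approx 62.07$$
All expected counts $\geq 5$ (the smallest is about 20.69) ✓
Calculate $\chi^2$ (simplified; full calculation omitted):
$$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 15.65$$
Degrees of freedom: $df = (3 - 1)(3 - 1) = 4$
p-value: For $\chi^2 \approx 15.65$ with $df = 4$, $p \approx 0.0035$.
Since $p \approx 0.0035 < \alpha = 0.05$, reject $H_0$. There is significant evidence that the political party distribution differs across age groups.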
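The full calculation for the party-by-age table can be sketched as follows. The margins are recomputed from the nine cell counts rather than typed in, so the totals are whatever those cells sum to. The helper `chi2_sf_even_df` is a hypothetical name; it uses a closed form for the chi-square upper tail that holds only when the degrees of freedom are even, which is the case here.

```python
import math

# Observed counts from the table: rows = party, columns = age group.
observed = [[80, 90, 70],     # Dem.
            [50, 100, 110],   # Rep.
            [20, 30, 30]]     # Ind.

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail probability for even df via the closed-form Poisson sum."""
    if df % 2 != 0:
        raise ValueError("closed form used here needs an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


p_value = chi2_sf_even_df(chi2, df)
print(n, df, round(chi2, 2), round(p_value, 4))
```

Most of the statistic comes from the 18–30 and 50+ columns of the Dem. and Rep. rows, where observed and expected counts are furthest apart.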
⚠️ Using Observed Instead of Expected Counts: The chi-square statistic compares observed counts to expected counts under $H_0$, and each squared difference is divided by the expected count, not the observed count. Always compute expected counts carefully; a mistake here invalidates the entire test.
⚠️ Ignoring the Large Counts Condition: If expected counts are too small (typically $< 5$), the chi-square distribution is not a good approximation. Combine categories or use Fisher's exact test if appropriate.
⚠️ Confusing Independence and Homogeneity: Both use the same statistic and formula, but test different hypotheses. Independence: one sample, two variables. Homogeneity: multiple samples, one variable.
💡 TI-84 / TI-Nspire: For a two-way table, use the $\chi^2$ Test; the $\chi^2$ Goodness-of-Fit test is for a single categorical variable. Enter the observed counts in a matrix, then calculate. The calculator computes the expected counts, $\chi^2$, df, and the p-value automatically. (Some calculators use "Chi2 Test" for homogeneity/independence and "Chi2 GOF" for goodness-of-fit.)
In a contingency table with row totals (100, 150) and column totals (80, 110, 60), find the expected count for the cell in row 1, column 2.
Grand total: $100 + 150 = 250$
Expected count: $E_{1,2} = \frac{100 \times 110}{250} = 44$
Answer: Expected count = 44.
A contingency table for Color (Red/Blue) and Size (Small/Large) has observed counts: (30, 20, 25, 40), listed as Small Red, Small Blue, Large Red, Large Blue. Row totals: (50, 65), Column totals: (55, 60). Compute $\chi^2$ and determine whether to reject $H_0$ at $\alpha = 0.05$.
A test of homogeneity compares Preference (Yes/No) across four Groups (A, B, C, D). Given the observed $\chi^2$ statistic, with $df = (2 - 1)(4 - 1) = 3$ and $\alpha = 0.05$, find the critical value and determine the conclusion.
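Organize:
| | Red | Blue | Total |
|---|---|---|---|
| Small | 30 | 20 | 50 |
| Large | 25 | 40 | 65 |
| Total | 55 | 60 | 115 |
Expected counts:
$$E_{\text{Small, Red}} = \frac{50 \times 55}{115} \approx 23.91, \quad E_{\text{Small, Blue}} = \frac{50 \times 60}{115} \approx 26.09$$
$$E_{\text{Large, Red}} = \frac{65 \times 55}{115} \approx 31.09, \quad E_{\text{Large, Blue}} = \frac{65 \times 60}{115} \approx 33.91$$
All $\geq 5$ ✓
Chi-square:
$$\chi^2 = \frac{(30 - 23.91)^2}{23.91} + \frac{(20 - 26.09)^2}{26.09} + \frac{(25 - 31.09)^2}{31.09} + \frac{(40 - 33.91)^2}{33.91} \approx 1.549 + 1.420 + 1.192 + 1.093 \approx 5.254$$
Degrees of freedom: $df = (2 - 1)(2 - 1) = 1$
p-value: $p = P(\chi^2_1 \geq 5.254) \approx 0.022$
Since $p \approx 0.022 < \alpha = 0.05$, reject $H_0$. There is significant evidence that Size and Color are associated.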
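For a 2x2 table the statistic can be cross-checked with the algebraically equivalent shortcut $\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$. The sketch below applies it to the four observed counts of this problem, with the margins recomputed from those cells; the function name `chi2_two_by_two` is hypothetical, and the p-value line is valid for $df = 1$ only.

```python
import math


def chi2_two_by_two(a, b, c, d):
    """Chi-square for the 2x2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Small: Red 30, Blue 20; Large: Red 25, Blue 40.
chi2 = chi2_two_by_two(30, 20, 25, 40)

# df = 1, so the upper tail is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 3), round(p_value, 3))
```

The shortcut avoids computing the four expected counts, which makes it a convenient check on hand arithmetic, though it does not replace verifying the Large Counts condition.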
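Critical value from chi-square table: For $df = 3$ and $\alpha = 0.05$, the critical value is $7.815$.
Decision: Since the observed $\chi^2$ exceeds $7.815$, we reject $H_0$.
Alternatively, using the p-value: $p = P(\chi^2_3 \geq \chi^2_{\text{observed}}) < 0.05$.
Since $p < \alpha$, we reject $H_0$.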
Conclusion: There is significant evidence that the distribution of preferences (Yes/No) differs across the four groups. At least one group has a different preference distribution than the others.
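A table lookup for the critical value can be reproduced numerically. The sketch below assumes $df = 3$ and $\alpha = 0.05$, uses the closed-form upper-tail probability for 3 degrees of freedom, and finds the cutoff by bisection. The function names are hypothetical, and the closed form does not carry over to other degrees of freedom.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with df = 3."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


def critical_value_df3(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Bisection: the tail probability decreases as x grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_df3(mid) > alpha:
            lo = mid   # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05   # assumed significance level
critical = critical_value_df3(alpha)
print(round(critical, 3))
```

Any observed statistic larger than this cutoff has a p-value below the chosen significance level, which is why the critical-value and p-value approaches always give the same decision.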