Interpret r² as the proportion of variability explained by the regression model.
Coefficient of determination (\(r^2\)): proportion of variation in response variable y explained by linear relationship with x
Formula: \(r^2 = (r)^2\)
where r = correlation coefficient between x and y
Range: 0 ≤ \(r^2\) ≤ 1 (always non-negative)
A regression has correlation r = 0.8. Calculate and interpret R².
Step 1: Calculate R². Formula: R² = r²
R² = (0.8)² = 0.64
Step 2: Express as a percentage. R² = 0.64 = 64%
Step 3: Interpret. "64% of the variability in y is explained by the linear relationship with x."
The remaining 36% is unexplained variation (random error, other factors).
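Step 4: Implications. R² = 0.64 suggests a fairly strong linear relationship (|r| = 0.8), although more than a third of the variability in y is not accounted for by x.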
Answer: R² = 0.64 or 64%. This means 64% of the variation in y is explained by the linear relationship with x.
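The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch; the function name `r_squared_from_r` is an illustrative choice, not part of any library, and it assumes a simple linear regression with one explanatory variable.

```python
def r_squared_from_r(r: float) -> float:
    """Square the correlation to get the coefficient of determination.

    Assumes simple linear regression (one explanatory variable).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    return r * r


r = 0.8
r2 = r_squared_from_r(r)

# Report the explained and unexplained shares as percentages.
print(f"R^2 = {r2:.2f} ({r2:.0%} explained, {1 - r2:.0%} unexplained)")
```

The range check mirrors the fact that a correlation outside [-1, 1] signals an input error rather than a real result.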
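Total variation in y: measured by the sum of squared deviations from the mean, \(SST = \sum (y_i - \bar{y})^2\)
Variation explained by regression: measured by the sum of squared deviations of predicted values from the mean, \(SSR = \sum (\hat{y}_i - \bar{y})^2\)
Variation not explained (residual): measured by the sum of squared residuals, \(SSE = \sum (y_i - \hat{y}_i)^2\)
Decomposition: \(SST = SSR + SSE\), so \(r^2 = SSR/SST = 1 - SSE/SST\)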
Scenario: 5 students, hours studied (x) vs. exam score (y)
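| Hours | Score | \(\hat{y}\) | \(y - \bar{y}\) | \(\hat{y} - \bar{y}\) | \(y - \hat{y}\) |
|---|---|---|---|---|---|
| 1 | 55 | 55.4 | -18 | -17.6 | -0.4 |
| 2 | 65 | 64.2 | -8 | -8.8 | 0.8 |
| 3 | 72 | 73.0 | -1 | 0.0 | -1.0 |
| 4 | 83 | 81.8 | 10 | 8.8 | 1.2 |
| 5 | 90 | 90.6 | 17 | 17.6 | -0.6 |
Mean score \(\bar{y} = 73\). Least-squares regression line: \(\hat{y} = 46.6 + 8.8x\), so \(r ≈ 0.998\)
Total variation: \(\sum (y - \bar{y})^2 = (-18)^2 + (-8)^2 + (-1)^2 + 10^2 + 17^2 = 778\)
Explained variation: \(\sum (\hat{y} - \bar{y})^2 = 774.4\), nearly equal to the total variation because \(r\) is near 1. Unexplained variation: \(\sum (y - \hat{y})^2 = 3.6\), and \(774.4 + 3.6 = 778\).
Calculation: \(r^2 = 774.4 / 778 ≈ 0.995\), which matches \((0.998)^2\) up to rounding
Interpretation: about 99.5% of the variation in exam scores is explained by the linear relationship with hours studied; the remaining 0.5% is due to other factors (test difficulty, student ability, etc.).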
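The decomposition for the five-student scenario can be reproduced from the raw (hours, score) pairs. The sketch below fits the least-squares line directly from the data instead of relying on rounded values, so its results may differ slightly from hand-rounded figures. The variable names are illustrative choices.

```python
hours = [1, 2, 3, 4, 5]
scores = [55, 65, 72, 83, 90]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Least-squares slope and intercept for y-hat = intercept + slope * x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * x for x in hours]

# Total, explained, and unexplained variation.
sst = sum((y - y_bar) ** 2 for y in scores)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(scores, predicted))

r_squared = ssr / sst

print(f"line: y-hat = {intercept:.1f} + {slope:.1f}x")
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"r^2 = {r_squared:.3f}")
```

Because the line is fitted by least squares, SST equals SSR + SSE up to floating-point error. That identity does not hold for an arbitrary line drawn through the data.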
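Note: \(r^2\) grows quadratically in r, so when |r| is above 0.5 a small increase in |r| causes a larger increase in \(r^2\) (for example, raising r from 0.8 to 0.9 raises \(r^2\) from 0.64 to 0.81)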
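Tabulating a few values makes the effect of squaring easier to see. This short sketch is illustrative only; the particular values of r are arbitrary choices.

```python
# Pair each correlation with its square, rounded to two decimals.
r_values = (0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
table = [(r, round(r * r, 2)) for r in r_values]

for r, r2 in table:
    print(f"r = {r:.1f}  ->  r^2 = {r2:.2f}")
```

Each step of 0.1 in r produces a bigger jump in r² than the step before it, which is why correlations that look close can correspond to noticeably different shares of explained variation.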
Use \(r^2\) when:
Caution: a high \(r^2\) does not prove causation; establishing causation still requires a well-designed experiment
When asked to interpret \(r^2\):
Template: "[r² as a percent]% of the variation in [y-variable] is explained by the linear regression model with [x-variable]. The remaining [100 minus that percent]% is due to other factors."
Example response: "\(r^2 = 0.84\) means that 84% of the variation in exam scores can be explained by the linear relationship with hours studied. The remaining 16% of variation is attributable to other factors such as prior knowledge, test difficulty, or sleep quality."
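As a study aid, the template can be turned into a small helper that fills in the blanks. This is a hypothetical sketch: the function name `interpret_r_squared` and its parameters are invented for illustration, and the wording is one acceptable phrasing, not an official rubric.

```python
def interpret_r_squared(r_squared: float, response: str, explanatory: str) -> str:
    """Fill in the interpretation template for a given r^2 (as a proportion)."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r^2 must lie between 0 and 1")
    explained = round(r_squared * 100, 1)       # percent explained
    unexplained = round(100 - explained, 1)     # percent left over
    return (
        f"{explained:g}% of the variation in {response} is explained by the "
        f"linear relationship with {explanatory}. The remaining "
        f"{unexplained:g}% is due to other factors."
    )


sentence = interpret_r_squared(0.84, "exam scores", "hours studied")
print(sentence)
```

Note that the input is the proportion (0.84), and the conversion to a percentage happens inside the function, which avoids the common slip of writing "0.84%".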
On a calculator: \(r^2\), the coefficient of determination, is displayed when you fit a linear regression (alongside the slope, intercept, and r).
Model A has R² = 0.85, Model B has R² = 0.45. Which is better for predictions?
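Step 1: Compare R² values. Model A: R² = 0.85 = 85% explained. Model B: R² = 0.45 = 45% explained.
Step 2: Model A interpretation. 85% of the variation in y is explained, leaving 15% unexplained.
Step 3: Model B interpretation. 45% of the variation in y is explained, leaving 55% unexplained.
Step 4: Conclusion. Model A is BETTER because it explains a larger share of the variation, so its predictions tend to fall closer to the observed values (assuming both models predict the same response variable).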
Answer: Model A is better. It explains 85% of variation versus only 45% for Model B, meaning more accurate predictions.
A regression has R² = 0.49. What is the correlation r? Can you determine the sign?
Step 1: Calculate |r|. R² = r², so 0.49 = r² and r = ±√0.49 = ±0.7
So |r| = 0.7
Step 2: Determine sign. From R² ALONE, cannot determine sign!
Both r = +0.7 and r = -0.7 give R² = 0.49
Step 3: How to find sign. Need additional information: the sign of the slope or the direction of the scatterplot (r always has the same sign as the slope).
Step 4: Why R² loses sign. R² = r² means squaring eliminates the sign: (+0.7)² = 0.49 and (-0.7)² = 0.49
Answer: |r| = 0.7, but CANNOT determine sign from R² alone. Need slope sign or scatterplot to determine if r = +0.7 or -0.7.
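The recovery of r from R² can be sketched in Python once the slope's sign is known. The function name `correlation_from_r_squared` and the example slope values are hypothetical; only the sign of the slope matters, not its size.

```python
import math


def correlation_from_r_squared(r_squared: float, slope: float) -> float:
    """Recover r from R^2, taking the sign from the regression slope.

    Assumes simple linear regression, where r and the slope share a sign.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    magnitude = math.sqrt(r_squared)        # |r|
    return math.copysign(magnitude, slope)  # attach the slope's sign


# Hypothetical slopes: one increasing trend, one decreasing trend.
r_up = correlation_from_r_squared(0.49, slope=2.5)
r_down = correlation_from_r_squared(0.49, slope=-2.5)
print(r_up, r_down)
```

Without the `slope` argument there would be no way to choose between the two roots, which is exactly the ambiguity described above.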
Explain why R² must be between 0 and 1.
Step 1: R² definition. R² = r² = (correlation)²
Step 2: Why R² ≥ 0. Any real number squared is non-negative: r² ≥ 0
Step 3: Why R² ≤ 1. Correlation is bounded: -1 ≤ r ≤ 1
Squaring keeps the magnitude at or below 1: 0 ≤ r² ≤ 1
Step 4: Interpretation. R² = 0: no linear relationship (0% explained). R² = 1: perfect linear relationship (100% explained).
You cannot explain less than 0% or more than 100%!
Step 5: If you see R² = 1.5 or R² = -0.3: CALCULATION ERROR! Recheck your work.
Answer: R² must be 0 ≤ R² ≤ 1 because it equals r² (always non-negative) and correlation is bounded by -1 ≤ r ≤ 1. Cannot explain less than 0% or more than 100% of variation.
A model has SST = 500 and SSE = 125. Calculate and interpret R².
Step 1: Understand sums of squares. SST = Total Sum of Squares = total variation; SSE = Sum of Squared Errors = unexplained variation; SSR = Regression Sum of Squares = explained variation.
Relationship: SST = SSR + SSE
Step 2: Calculate SSR. SSR = SST - SSE = 500 - 125 = 375
Step 3: Calculate R². Formula: R² = SSR/SST
R² = 375/500 = 0.75
Alternative: R² = 1 - SSE/SST = 1 - 125/500 = 1 - 0.25 = 0.75 ✓
Step 4: Interpret. R² = 0.75 = 75%
"75% of the total variation in y is explained by the regression model."
Explained: 375/500 = 75%. Unexplained: 125/500 = 25%.
Answer: R² = 0.75 or 75%. The model explains 375 out of 500 total units of variation, leaving 125 units (25%) unexplained.
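Both routes to R² in this example can be checked with a short function. This is a minimal sketch; `r_squared_from_sums` is an illustrative name, and the input checks assume a least-squares fit, where SSE cannot exceed SST.

```python
def r_squared_from_sums(sst: float, sse: float) -> float:
    """Compute R^2 = 1 - SSE/SST for a least-squares regression."""
    if sst <= 0:
        raise ValueError("SST must be positive")
    if not 0 <= sse <= sst:
        raise ValueError("SSE must lie between 0 and SST")
    return 1 - sse / sst


sst, sse = 500, 125
ssr = sst - sse                        # explained variation

r2 = r_squared_from_sums(sst, sse)     # route 1: 1 - SSE/SST
r2_alt = ssr / sst                     # route 2: SSR/SST

print(f"SSR = {ssr}, R^2 = {r2:.2f} ({r2:.0%} explained, {1 - r2:.0%} unexplained)")
```

The two routes agree because SST = SSR + SSE, so a mismatch between them is a quick signal that one of the sums was miscalculated.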