Analyze residual plots to assess the fit of a regression model.
Residual: the difference between the observed (actual) value and the predicted value, residual = y - ŷ
For the regression line ŷ = 10 + 2x, calculate the residual for the point (5, 25).
Step 1: Identify the actual value. Point (5, 25): x = 5, y = 25 (actual)
Step 2: Calculate the predicted value. ŷ = 10 + 2(5) = 10 + 10 = 20
Step 3: Calculate the residual. Residual = y - ŷ = 25 - 20 = 5
Step 4: Interpret. The residual is POSITIVE (+5), meaning the actual value lies 5 units above the regression line, so the model underpredicts this point.
Answer: Residual = 5 (point is above the line)
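The same arithmetic can be written as a tiny Python function. This is only an illustrative sketch: `residual` and its parameters are hypothetical names, not part of any library.

```python
def residual(x: float, y: float, intercept: float, slope: float) -> float:
    """Residual = observed y minus predicted y-hat for the line y-hat = intercept + slope * x."""
    y_hat = intercept + slope * x  # predicted value on the regression line
    return y - y_hat               # positive: point lies above the line


# Worked example: y-hat = 10 + 2x at the point (5, 25)
e = residual(5, 25, intercept=10, slope=2)
print(e)  # 5
```

A negative result would mean the point lies below the line.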
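Interpretation: a positive residual means the point lies above the regression line (the model underpredicts); a negative residual means it lies below the line (the model overpredicts); a residual of 0 means the point lies exactly on the line.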
A residual plot graphs residuals on y-axis vs. predicted values (or x-values) on x-axis.
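A minimal sketch of how the coordinates of a residual plot could be computed, assuming a least-squares line fitted with the usual formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The helper names `least_squares` and `residual_plot_points` and the sample data are made up for illustration. The code returns (predicted value, residual) pairs rather than drawing them, so any plotting tool can produce the picture.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_plot_points(xs, ys):
    """Pairs (predicted value, residual): the x- and y-coordinates of a residual plot."""
    a, b = least_squares(xs, ys)
    points = []
    for x, y in zip(xs, ys):
        y_hat = a + b * x
        points.append((y_hat, y - y_hat))
    return points


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
for y_hat, e in residual_plot_points(xs, ys):
    print(f"predicted = {y_hat:6.2f}   residual = {e:+.2f}")
```

Plotting the second value of each pair against the first gives the residual plot; a horizontal reference line at residual = 0 makes patterns easier to see.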
Data: Height vs. Weight (n = 30 students)
Regression line: ŷ = … (predicted weight from height)
One student: height = 70 in, actual weight = 240 lbs
Predicted weight: ŷ = … lbs
Residual = 240 - ŷ = … lbs (negative: the actual weight is below the prediction)
If the residual plot shows random scatter, a linear model fits well.
Residual plots help check conditions 1, 3, and 4.
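Common mistakes:
✗ Confusing residuals with errors: a residual (y - ŷ) is measured from the fitted line, while an error is the deviation from the true, unknown population line
✗ Ignoring patterns; thinking a small curve is "close enough"
✗ Not checking residuals before making predictions
✗ Using residuals to make predictions; residuals are a diagnostic and should center on 0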
When describing a residual plot, write: "The residual plot shows random scatter with no pattern, so a linear model is appropriate." Or: "The residual plot shows a curved pattern, indicating that the relationship is nonlinear."
A residual plot shows points scattered randomly around zero with no pattern. What does this indicate?
Step 1: Understand what random scatter means. Good residual plot characteristics:
✓ Points scattered RANDOMLY
✓ No curved, U-shaped, or other patterns
✓ Roughly equal spread at all x values
✓ Centered around residual = 0
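Step 2: What this indicates. The linear model is APPROPRIATE: the relationship is linear, the variance is roughly constant, and there are no systematic errors.
Step 3: What to do
✓ Proceed with predictions (within the range of the observed x values)
✓ Use confidence intervals and tests, provided the other conditions (such as independence and roughly normal residuals) also hold
✓ Treat the linear model as supported by the residual plot
Answer: Random scatter indicates that the linear model is APPROPRIATE: the relationship is consistent with a linear form, the variance is roughly constant, and there are no systematic errors.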
A residual plot shows a curved (U-shaped) pattern. What does this suggest and what should you do?
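Step 1: Identify the problem. A U-shaped or curved residual plot means the linear model is INAPPROPRIATE: the relationship is actually nonlinear (curved).
Step 2: Why this is a problem. A straight line systematically overpredicts in some ranges of x and underpredicts in others, so its predictions are biased in a predictable direction.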
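Step 3: Solutions. Option 1: Transform the data (for example, take the log or square root of x or y) so the relationship becomes linear.
Option 2: Use a nonlinear model (for example, a quadratic or exponential model).
Step 4: Check the new model. After the adjustment, the residual plot should show random scatter.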
Answer: Curved residuals indicate a NONLINEAR relationship. Transform the variables (log, square root) or use a nonlinear model, then recheck the residuals after the adjustment.
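The U-shaped pattern and the effect of a transformation can be reproduced with a small made-up data set in which y = x² exactly. This Python sketch (hypothetical helper names, standard library only) fits a least-squares line to y versus x, then fits √y versus x, which is one of the transformations mentioned above.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys):
    """Residuals y - y-hat from the least-squares line of ys on xs."""
    a, b = least_squares(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]  # made-up data with a curved (quadratic) relationship

# Linear fit of y on x: positive at both ends, negative in the middle (U shape)
print(residuals(xs, ys))  # [2.0, -1.0, -2.0, -1.0, 2.0]

# Square-root transformation of y, then refit: the pattern disappears
sqrt_ys = [math.sqrt(y) for y in ys]
print(residuals(xs, sqrt_ys))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Real data will not straighten out this perfectly; the point is that the transformed fit should leave residuals with no systematic shape.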
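For the points (1, 3), (2, 5), (3, 6) and the line ŷ = 2 + 1.5x, check whether the residuals sum to zero.
Step 1: Calculate predicted values
Point 1: ŷ₁ = 2 + 1.5(1) = 3.5
Point 2: ŷ₂ = 2 + 1.5(2) = 5
Point 3: ŷ₃ = 2 + 1.5(3) = 6.5
Step 2: Calculate residuals (residual = y - ŷ)
Point 1: e₁ = 3 - 3.5 = -0.5
Point 2: e₂ = 5 - 5 = 0
Point 3: e₃ = 6 - 6.5 = -0.5
Step 3: Sum the residuals. Σ(residuals) = -0.5 + 0 + (-0.5) = -1.0
This is not a rounding error: ŷ = 2 + 1.5x is not the least-squares line for these points. The least-squares slope is b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² = 3/2 = 1.5, and the intercept is a = ȳ - b·x̄ = 14/3 - 1.5(2) = 5/3 ≈ 1.67. With ŷ = 5/3 + 1.5x, the residuals are -1/6, 1/3, -1/6, which sum to exactly 0.
Step 4: Why residuals sum to zero. Mathematical property: for a least-squares line with an intercept, Σ(y - ŷ) = 0 ALWAYS, because the line passes through (x̄, ȳ).
Answer: For ŷ = 2 + 1.5x the residuals sum to -1.0, not 0, because that line is not the least-squares line. For the true least-squares line (ŷ ≈ 1.67 + 1.5x), the residuals sum exactly to zero.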
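To check this arithmetic without any rounding, the Python sketch below uses `fractions.Fraction` for exact values. It computes the residuals for the given line and for the least-squares line built from b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄; the `residuals` helper is a hypothetical name.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [3, 5, 6]


def residuals(a, b):
    """Exact residuals y - (a + b*x) for the line y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# The line given in the exercise: y-hat = 2 + 1.5x
given = residuals(Fraction(2), Fraction(3, 2))
print([str(e) for e in given], sum(given))  # ['-1/2', '0', '-1/2'] -1

# The least-squares line: b = Sxy / Sxx, a = y_bar - b * x_bar
n = len(xs)
x_bar = Fraction(sum(xs), n)
y_bar = Fraction(sum(ys), n)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar
lsrl = residuals(a, b)
print(a, b, [str(e) for e in lsrl], sum(lsrl))  # 5/3 3/2 ['-1/6', '1/3', '-1/6'] 0
```

Exact fractions make it clear that the -1 comes from using the wrong intercept, not from rounding.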
A residual plot shows increasing spread (fan shape) as x increases. What does this violate and what are the implications?
Step 1: Identify the violation. Fan-shaped residuals violate CONSTANT VARIANCE (homoscedasticity).
The spread increases with x (heteroscedasticity).
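Fan shape is usually judged by eye, but a rough numeric check can make the idea concrete. The sketch below is a hypothetical heuristic, not a standard test: it sorts the residuals by x and divides the standard deviation of the residuals in the upper half of x by that of the lower half. A ratio well above 1 is consistent with spread that grows with x; no formal cutoff is implied, and the sample residuals are made up.

```python
import statistics


def spread_ratio(xs, resids):
    """Hypothetical heuristic: SD of residuals in the upper half of x divided by
    SD of residuals in the lower half (the middle point is skipped when n is odd)."""
    pairs = sorted(zip(xs, resids))       # order (x, residual) pairs by x
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]    # residuals for the smallest x values
    high = [e for _, e in pairs[-half:]]  # residuals for the largest x values
    return statistics.stdev(high) / statistics.stdev(low)


xs = list(range(1, 11))
fan = [(-1) ** i * 0.5 * x for i, x in enumerate(xs)]  # made-up residuals that grow with x
even = [(-1) ** i * 1.0 for i in range(len(xs))]       # made-up residuals of constant size
print(round(spread_ratio(xs, fan), 2), round(spread_ratio(xs, even), 2))  # 2.44 1.0
```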
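Step 2: Implications for predictions. Predictions are less precise where the spread is large (large x) than where it is small; a single residual standard deviation s understates the uncertainty for large x and overstates it for small x.
Step 3: Implications for inference. The usual standard errors for the slope and intercept are unreliable, so confidence intervals and significance tests based on them can be misleading.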
Note: Estimates (slope, intercept) are still unbiased, but uncertainty measures are wrong.
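Step 4: Solutions. Transform y (for example, log y or √y), use weighted least squares, or use robust standard errors.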
Answer: Violates CONSTANT VARIANCE assumption. Standard errors and confidence intervals unreliable. Solutions: transform y, use weighted least squares, or robust standard errors.
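Weighted least squares, one of the solutions in Step 4, gives less weight to observations in the wide part of the fan. Below is a minimal Python sketch for a straight line, assuming the variance of y is proportional to x² so that each weight is 1/x². That variance model, the function name `weighted_least_squares`, and the simulated data are all assumptions for illustration; robust standard errors are not shown.

```python
import random


def weighted_least_squares(xs, ys, weights):
    """Return (intercept, slope) minimizing sum(w * (y - a - b*x)**2)."""
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum  # weighted mean of x
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum  # weighted mean of y
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # weighted fit passes through (x_bar, y_bar)
    return intercept, slope


random.seed(0)
xs = [x / 2 for x in range(2, 41)]  # x from 1.0 to 20.0
# Simulated data: true line y = 1 + 2x, noise SD proportional to x (fan shape)
ys = [1 + 2 * x + random.gauss(0, 0.3 * x) for x in xs]
weights = [1 / x ** 2 for x in xs]  # assumes Var(y | x) is proportional to x**2

a, b = weighted_least_squares(xs, ys, weights)
print(f"weighted fit: y-hat = {a:.2f} + {b:.2f}x")
a0, b0 = weighted_least_squares(xs, ys, [1.0] * len(xs))  # equal weights
print(f"ordinary fit: y-hat = {a0:.2f} + {b0:.2f}x")
```

With equal weights the function reduces to the ordinary least-squares formulas, which is why the second call serves as the unweighted comparison.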