Residuals and Residual Plots
Assessing model fit
What are Residuals?
Residual: Signed vertical distance from point to regression line, residual = $y - \hat{y}$ (observed - predicted)
Positive residual: Point above line (prediction too low)
Negative residual: Point below line (prediction too high)
Sum of residuals = 0 (always, for least-squares line)
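The definitions above can be checked numerically. The following is a minimal Python sketch on made-up data; `fit_line` is a hypothetical helper written for this illustration (not a library function) that computes the least-squares line from the usual formulas.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustration data (not from any real study)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed - predicted

for x, r in zip(xs, residuals):
    position = "above" if r > 0 else "below"
    print(f"x={x}: residual={r:+.2f} (point {position} line)")

# For a least-squares line with an intercept, this is zero up to rounding
print(f"sum of residuals = {sum(residuals):.10f}")
```

Positive residuals mark points above the line and negative residuals mark points below it; the total cancels out.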
Residual Plot
Residual Plot: Scatterplot with x on horizontal axis, residuals on vertical axis
Purpose:
- Check if linear model appropriate
- Identify patterns suggesting problems
- Detect outliers
Ideal Residual Plot
Good residual plot shows:
- Random scatter around horizontal line at 0
- No clear pattern (no curve, fan shape, etc.)
- Constant variability across x-values
- No outliers (points far from 0)
Interpretation: Linear model is appropriate
Patterns Indicating Problems
Curved pattern:
- Linear model inappropriate
- Relationship is nonlinear
- Solution: Transform variables or use nonlinear model
Fan shape (increasing or decreasing spread):
- Non-constant variance (heteroscedasticity)
- Predictions less reliable for some x-values
- Solution: Transform variables
Outliers:
- Points far from horizontal band
- Check for errors or unusual cases
- Consider impact on regression line
Example 1: Good Residual Plot
Random scatter around 0, no pattern
Residuals randomly scattered above and below 0 with roughly constant spread.
Conclusion: Linear model appropriate
Example 2: Curved Residual Pattern
Residuals show U-shape or inverted U
Residuals form curved pattern (like parabola or inverted parabola).
Conclusion: Relationship is nonlinear, linear model inappropriate
Action: Try a quadratic model or a transformation
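To see where the U-shape comes from, here is a minimal Python sketch that fits a straight line to data that is actually quadratic. The data and the `fit_line` helper are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data following y = x^2 exactly (a nonlinear relationship)
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Show the sign of each residual from left to right
signs = "".join("+" if r > 0 else "-" if r < 0 else "0" for r in residuals)
print("residuals:", residuals)
print("signs:    ", signs)
```

The residuals are positive at both ends and negative in the middle, which is the U-shape a residual plot would show.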
Example 3: Fan Shape
Spread increases (or decreases) with x
Residuals spread out more (or less) as x increases, forming fan or cone shape.
Conclusion: Non-constant variance
Action: May need transformation (e.g., log)
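A fan shape can also be spotted numerically. The sketch below is a crude check, not a formal test: it compares the average size of the residuals for small x against large x. The data are made up so that the scatter grows with x, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: roughly y = 2x, with scatter that grows as x grows
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.2, 3.6, 6.6, 7.2, 11.0, 10.8, 15.4, 14.4]

intercept, slope = fit_line(xs, ys)
abs_residuals = [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]

half = len(xs) // 2  # xs is already sorted, so this splits low x from high x
spread_low = sum(abs_residuals[:half]) / half
spread_high = sum(abs_residuals[half:]) / (len(xs) - half)

print(f"mean |residual| for small x: {spread_low:.2f}")
print(f"mean |residual| for large x: {spread_high:.2f}")
```

A much larger value for large x than for small x suggests non-constant variance; the residual plot should still be examined directly.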
Standard Deviation of Residuals
Formula: $s = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
Measures: Typical prediction error
Interpretation: "Typical distance of points from regression line is about s [y-units]."
Smaller s → better predictions (points closer to line)
Note: Denominator is n-2 (loses 2 df for slope and intercept)
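As a worked illustration, the sketch below computes s as the square root of the sum of squared residuals divided by n - 2. The data are made up, and `residual_sd` is a hypothetical helper name.

```python
import math


def residual_sd(xs, ys):
    """Standard deviation of residuals for the least-squares line (n - 2 df)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))  # n - 2: slope and intercept were estimated


# Made-up illustration data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

s = residual_sd(xs, ys)
print(f"s = {s:.3f} (in y-units)")
```

Here the sum of squared residuals is 0.107 with n = 5, so s = sqrt(0.107 / 3), roughly 0.19 y-units.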
Using s for Predictions
Rough prediction interval: $\hat{y} \pm 2s$
Interpretation: About 95% of actual y-values fall within 2s of the predicted value (assuming roughly normal residuals)
Example: $\hat{y}$ = 150 pounds, s = 10 pounds
Prediction interval ≈ 150 ± 20 = (130, 170) pounds
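The arithmetic in this example can be written as a small function. This is only the rough "predicted value plus or minus 2s" rule from these notes, not an exact prediction interval; `rough_prediction_interval` is a hypothetical name.

```python
def rough_prediction_interval(y_hat, s):
    """Return (low, high) for the rough interval y_hat +/- 2s."""
    return (y_hat - 2 * s, y_hat + 2 * s)


# Values from the example: predicted weight 150 pounds, s = 10 pounds
low, high = rough_prediction_interval(150, 10)
print(f"({low}, {high}) pounds")  # prints "(130, 170) pounds"
```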
Outliers in Residual Plot
Outlier: Residual far from 0
Investigate:
- Data entry error?
- Unusual case?
- Measurement error?
Impact:
- Can affect regression line
- May indicate different subgroup
- Consider separate analysis with/without outlier
Checking Conditions for Regression
Use residual plot to check:
1. Linearity: Random scatter (no curve)
2. Equal variance: Constant spread across x-values
3. Independence: (Can't check from plot alone, depends on data collection)
4. Normality: (Check with histogram or normal probability plot of residuals)
Acronym: LINE (Linearity, Independence, Normality, Equal variance)
Histogram of Residuals
Purpose: Check if residuals approximately normal
Look for:
- Roughly symmetric
- Bell-shaped
- No severe outliers
Note: Normality less critical for large samples (CLT)
Normal Probability Plot of Residuals
Purpose: Check normality of residuals
Good plot:
- Points follow straight line
- Little deviation from line
Bad plot:
- Strong curvature
- Many points far from line
Interpretation: If roughly linear, normality assumption reasonable
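The points of a normal probability plot can be computed by hand. The sketch below pairs each sorted residual with a standard normal quantile, then measures how close the points are to a straight line using a correlation. The plotting positions (i - 0.5)/n are one common convention and are an assumption here; software may use slightly different ones. The residuals are made up.

```python
import math
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair sorted residuals with standard normal quantiles."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, ordered))


def correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    a_bar = sum(a for a, _ in pairs) / n
    b_bar = sum(b for _, b in pairs) / n
    sab = sum((a - a_bar) * (b - b_bar) for a, b in pairs)
    saa = sum((a - a_bar) ** 2 for a, _ in pairs)
    sbb = sum((b - b_bar) ** 2 for _, b in pairs)
    return sab / math.sqrt(saa * sbb)


# Made-up residuals for illustration
residuals = [0.06, -0.13, 0.18, -0.21, 0.10]

points = normal_plot_points(residuals)
for z, r in points:
    print(f"normal quantile {z:+.2f}  residual {r:+.2f}")

r_plot = correlation(points)
print(f"correlation of plot points: {r_plot:.3f}")
```

A correlation close to 1 means the points lie near a straight line, which supports the normality assumption. With very few residuals this check is weak, so treat it as a rough guide.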
Influential Points
Identified in residual plot:
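- Large residual AND far from $\bar{x}$ in x-direction
- Caution: a point far from $\bar{x}$ can pull the line toward itself, so its residual may look small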
Test influence:
- Calculate regression with point
- Calculate regression without point
- Compare: Big change? Point is influential
Action: Report both analyses, investigate why point is unusual
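The with/without comparison can be sketched in Python as follows. The data are made up so that the last point lies far from the other x-values, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: the first five points follow y = 2x; the last is far out in x
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 6, 8, 10, 10]

# Regression with the suspicious point
intercept_with, slope_with = fit_line(xs, ys)

# Regression without it (drop the last point)
intercept_without, slope_without = fit_line(xs[:-1], ys[:-1])

print(f"with point:    y_hat = {intercept_with:.2f} + {slope_with:.2f}x")
print(f"without point: y_hat = {intercept_without:.2f} + {slope_without:.2f}x")
print(f"change in slope: {slope_with - slope_without:+.2f}")
```

The slope drops from 2 to about 0.46 when the point is included, so the point is influential and both fits should be reported.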
Comparing Models
Use residual plots to compare different models:
Model 1 (linear): Residuals show pattern
Model 2 (quadratic): Residuals random scatter
Conclusion: Model 2 better (quadratic fits better than linear)
Also compare: Standard deviation of residuals (s)
- Smaller s = better predictions
Calculator Methods
TI-83/84:
Get residuals:
- Run LinReg (stores residuals automatically in RESID list)
- 2nd STAT (LIST) → RESID
Plot residuals:
- STAT PLOT → Plot1
- Type: Scatterplot
- Xlist: L1, Ylist: RESID
- ZOOM → 9:ZoomStat
Common Mistakes
❌ Not checking residual plot (just looking at r²)
❌ Using linear model when residuals show curve
❌ Ignoring fan shape in residuals
❌ Not investigating outliers
❌ Confusing residuals with errors
Residuals vs Errors
Residual: Observed - Predicted ($y - \hat{y}$)
- Calculated from sample
- Can compute
Error: Observed - True (y - E(y))
- Theoretical (unknown)
- Can't compute (don't know true relationship)
Residuals estimate errors
Transformations
If residual plot shows problems:
For curvature:
- Try log(y), √y, or x²
- Re-fit model with transformed variable
- Check new residual plot
For fan shape:
- Try log(y) transformation
- Stabilizes variance
Goal: Residuals with no pattern and constant spread
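The re-fit-and-check loop can be illustrated with a minimal Python sketch. It assumes made-up data that grow exponentially, so that log(y) is exactly linear in x; real data will not straighten out this cleanly. `fit_line` and `line_residuals` are hypothetical helpers.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def line_residuals(xs, ys):
    """Residuals (observed - predicted) from the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up exponential data: y = 2 * e^(0.5x)
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [2 * math.exp(0.5 * x) for x in xs]

# Fit 1: straight line on the raw data (residuals show a curve)
raw_residuals = line_residuals(xs, ys)

# Fit 2: straight line on log(y) (natural log), then re-check the residuals
log_ys = [math.log(y) for y in ys]
log_residuals = line_residuals(xs, log_ys)

print("raw residuals:", [round(r, 2) for r in raw_residuals])
print("log residuals:", [round(r, 2) for r in log_residuals])
```

The raw residuals are positive at the ends and negative in the middle (curved pattern), while the residuals after the log transformation are essentially zero with no pattern.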
Quick Reference
Residual: $y - \hat{y}$ (observed - predicted)
Good residual plot:
- Random scatter around 0
- No pattern
- Constant spread
s: Typical prediction error
Check conditions: LINE (Linearity, Independence, Normality, Equal variance)
Problems to look for:
- Curved pattern → nonlinear
- Fan shape → non-constant variance
- Outliers → investigate
Remember: Always examine residual plot! It reveals whether linear model is appropriate and highlights potential problems. Don't rely on correlation alone!