Residuals and Residual Plots

Assessing model fit

What are Residuals?

Residual: Signed vertical distance from a point to the regression line

$\text{residual} = y - \hat{y} = \text{observed} - \text{predicted}$

Positive residual: Point above line (prediction too low)
Negative residual: Point below line (prediction too high)

Sum of residuals = 0 (always, for least-squares line)
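
To make the definition concrete, here is a minimal sketch that fits a least-squares line by hand and computes each residual. The data values and the helper name `fit_line` are made up for illustration; the point is only that residuals are observed minus predicted and that they sum to (numerically) zero.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up illustrative data (an assumption, not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]

# residual = observed - predicted
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, r in zip(xs, residuals):
    side = "above" if r > 0 else "below"
    print(f"x={x}: residual={r:+.2f} (point {side} the line)")

# Up to floating-point rounding, this is 0 for a least-squares line.
print(f"sum of residuals = {sum(residuals):.10f}")
```

For this data the fitted line is approximately ŷ = 0.05 + 1.99x, and the positive and negative residuals cancel.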

Residual Plot

Residual Plot: Scatterplot with x on horizontal axis, residuals on vertical axis

Purpose:

  • Check if linear model appropriate
  • Identify patterns suggesting problems
  • Detect outliers

Ideal Residual Plot

Good residual plot shows:

  1. Random scatter around horizontal line at 0
  2. No clear pattern (no curve, fan shape, etc.)
  3. Constant variability across x-values
  4. No outliers (points far from 0)

Interpretation: Linear model is appropriate

Patterns Indicating Problems

Curved pattern:

  • Linear model inappropriate
  • Relationship is nonlinear
  • Solution: Transform variables or use nonlinear model

Fan shape (increasing or decreasing spread):

  • Non-constant variance (heteroscedasticity)
  • Predictions less reliable for some x-values
  • Solution: Transform variables

Outliers:

  • Points far from horizontal band
  • Check for errors or unusual cases
  • Consider impact on regression line

Example 1: Good Residual Plot

Random scatter around 0, no pattern

Residuals randomly scattered above and below 0 with roughly constant spread.

Conclusion: Linear model appropriate

Example 2: Curved Residual Pattern

Residuals show U-shape or inverted U

Residuals form curved pattern (like parabola or inverted parabola).

Conclusion: Relationship is nonlinear, linear model inappropriate

Action: Try quadratic or other transformation
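
A small sketch of how a curved residual pattern arises: the made-up data below follow y = x² exactly, and a straight line is fitted anyway. The helper `fit_line` is a hypothetical name introduced for this illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data that is truly quadratic: y = x^2.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Positive at both ends, negative in the middle: a U-shape.
print([round(r, 6) for r in residuals])
```

The residuals come out as 5, 0, -3, -4, -3, 0, 5: positive at the ends and negative in the middle, which is the U-shape described above.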

Example 3: Fan Shape

Spread increases (or decreases) with x

Residuals spread out more (or less) as x increases, forming fan or cone shape.

Conclusion: Non-constant variance

Action: May need transformation (e.g., log)
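
One rough numerical check for a fan shape, offered as an informal heuristic rather than a formal test, is to compare the typical residual size for small x against large x. The data below are constructed (an assumption for illustration) so that the scatter grows with x; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: trend 2x plus a wobble whose size grows with x.
xs = list(range(1, 11))
ys = [2 * x + (-1) ** x * 0.2 * x for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# xs is already sorted, so split residuals into low-x and high-x halves.
half = len(xs) // 2
spread_low = sum(abs(r) for r in residuals[:half]) / half
spread_high = sum(abs(r) for r in residuals[half:]) / (len(xs) - half)

print(f"mean |residual| for small x: {spread_low:.2f}")
print(f"mean |residual| for large x: {spread_high:.2f}")
```

Here the mean absolute residual in the high-x half is more than twice that in the low-x half, the numerical counterpart of a fan opening to the right. A residual plot remains the primary check.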

Standard Deviation of Residuals

Measures: Typical prediction error

$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}}$

Interpretation: "Typical distance of points from regression line is about s [y-units]."

Smaller s → better predictions (points closer to line)

Note: Denominator is n-2 (loses 2 df for slope and intercept)
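
The formula for s can be sketched directly in code. The data and the function name `residual_std` below are assumptions made for illustration.

```python
import math


def residual_std(xs, ys):
    """Standard deviation of residuals: sqrt(SSE / (n - 2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals (observed - predicted)^2.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - 2: two parameters (slope, intercept) were estimated.
    return math.sqrt(sse / (n - 2))


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

s = residual_std(xs, ys)
print(f"s = {s:.3f} (in the units of y)")
```

For this data the squared residuals sum to 0.107, so s = √(0.107 / 3) ≈ 0.189.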

Using s for Predictions

Rough prediction interval:

$\hat{y} \pm 2s$

Interpretation: About 95% of observed values fall within 2s of the predicted value (a rough rule, assuming roughly normal residuals with constant spread)

Example: $\hat{y}$ = 150 pounds, s = 10 pounds

Prediction interval ≈ 150 ± 20 = (130, 170) pounds

Outliers in Residual Plot

Outlier: Residual far from 0

Investigate:

  • Data entry error?
  • Unusual case?
  • Measurement error?

Impact:

  • Can affect regression line
  • May indicate different subgroup
  • Consider separate analysis with/without outlier

Checking Conditions for Regression

Use residual plot to check:

1. Linearity: Random scatter (no curve)

2. Equal variance: Constant spread across x-values

3. Independence: (Can't check from plot alone, depends on data collection)

4. Normality: (Check with histogram or normal probability plot of residuals)

Acronym: LINE (Linearity, Independence, Normality, Equal variance)

Histogram of Residuals

Purpose: Check if residuals approximately normal

Look for:

  • Roughly symmetric
  • Bell-shaped
  • No severe outliers

Note: Normality less critical for large samples (CLT)

Normal Probability Plot of Residuals

Purpose: Check normality of residuals

Good plot:

  • Points follow straight line
  • Little deviation from line

Bad plot:

  • Strong curvature
  • Many points far from line

Interpretation: If roughly linear, normality assumption reasonable
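
A normal probability plot pairs each sorted residual with the z-score expected at that rank. The sketch below builds those pairs and summarizes straightness with a correlation. It assumes the plotting position (i - 0.5)/n, which is one common convention among several, and uses made-up residuals; `normal_scores` and `correlation` are hypothetical helper names.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Expected z-scores for n ordered values, using (i - 0.5) / n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Made-up residuals for illustration.
residuals = [0.5, -1.3, 1.9, -0.1, 0.9, -2.1, 0.2, -0.8, 1.2, -0.4]

ordered = sorted(residuals)
scores = normal_scores(len(ordered))

# Each (score, residual) pair is one point on the normal probability plot.
for z, r in zip(scores, ordered):
    print(f"z={z:+.3f}  residual={r:+.1f}")

r_straightness = correlation(scores, ordered)
print(f"correlation of plot points = {r_straightness:.3f}")
```

A correlation close to 1 means the points lie near a straight line. There is no universal cutoff, so treat the number as a companion to the plot, not a replacement.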

Influential Points

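How to spot them:

  • Far from $\bar{x}$ in the x-direction (high leverage), especially with a large residual
  • A high-leverage point can also pull the line toward itself and show a small residual, so check the scatterplot too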

Test influence:

  1. Calculate regression with point
  2. Calculate regression without point
  3. Compare: Big change? Point is influential

Action: Report both analyses, investigate why point is unusual
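
The with/without comparison can be sketched as follows. The data are made up: five points lie exactly on y = 2x, and one extra point sits far to the right and below that trend. `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: the last point (15, 12) is far from the other x-values.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 6, 8, 10, 12]

# 1. Regression with the suspect point.
intercept_with, slope_with = fit_line(xs, ys)

# 2. Regression without it.
intercept_without, slope_without = fit_line(xs[:-1], ys[:-1])

# 3. Compare the two fits.
print(f"with point:    y-hat = {intercept_with:.2f} + {slope_with:.2f}x")
print(f"without point: y-hat = {intercept_without:.2f} + {slope_without:.2f}x")

# Residuals from the fit that includes the suspect point.
residuals_with = [y - (intercept_with + slope_with * x) for x, y in zip(xs, ys)]
print([round(r, 2) for r in residuals_with])
```

Removing the single point changes the slope from about 0.62 to 2, so the point is influential. In this example its own residual (about -1.15) is not the largest in the plot, which is why comparing the two fits is more reliable than looking at residual size alone.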

Comparing Models

Use residual plots to compare different models:

Model 1 (linear): Residuals show pattern
Model 2 (quadratic): Residuals random scatter

Conclusion: Model 2 better (quadratic fits better than linear)

Also compare: Standard deviation of residuals (s)

  • Smaller s = better predictions

Calculator Methods

TI-83/84:

Get residuals:

  1. Run LinReg (stores residuals automatically in RESID list)
  2. 2nd STAT (LIST) → RESID

Plot residuals:

  1. STAT PLOT → Plot1
  2. Type: Scatterplot
  3. Xlist: L1, Ylist: RESID
  4. ZOOM → 9:ZoomStat

Common Mistakes

❌ Not checking residual plot (just looking at r²)
❌ Using linear model when residuals show curve
❌ Ignoring fan shape in residuals
❌ Not investigating outliers
❌ Confusing residuals with errors

Residuals vs Errors

Residual: Observed - Predicted ($y - \hat{y}$)

  • Calculated from sample
  • Can compute

Error: Observed - True (y - E(y))

  • Theoretical (unknown)
  • Can't compute (don't know true relationship)

Residuals estimate errors
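
The distinction is easiest to see in a simulation, where the true line is known by construction. In the sketch below the true intercept, true slope, noise level, and random seed are all arbitrary assumptions; with real data the errors could not be computed.

```python
import random


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


random.seed(0)  # fixed seed so the run is repeatable

# Known only because this is a simulation.
TRUE_INTERCEPT, TRUE_SLOPE = 3.0, 2.0

xs = list(range(1, 21))
ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + random.gauss(0, 1) for x in xs]

# Errors: observed minus the TRUE mean response.
errors = [y - (TRUE_INTERCEPT + TRUE_SLOPE * x) for x, y in zip(xs, ys)]

# Residuals: observed minus the FITTED value from the sample.
intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"fitted line: y-hat = {intercept:.2f} + {slope:.2f}x")
print(f"sum of errors    = {sum(errors):+.4f}")
print(f"sum of residuals = {sum(residuals):+.4f}")
```

The residuals sum to zero by construction of the least-squares fit, while the errors generally do not. The two lists are close but not identical because the fitted line is only an estimate of the true line.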

Transformations

If residual plot shows problems:

For curvature:

  • Try log(y), √y, or x²
  • Re-fit model with transformed variable
  • Check new residual plot

For fan shape:

  • Try log(y) transformation
  • Stabilizes variance

Goal: Residuals with no pattern and constant spread
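
The transform, re-fit, re-check loop can be sketched like this. The data are made up to follow y = 2^x exactly, so log(y) is exactly linear in x. Real data would leave some scatter after transforming. `fit_line` and `residuals_for` are hypothetical helpers.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals_for(xs, ys):
    """Fit a line to (xs, ys) and return the residuals."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up exponential data: y = 2^x.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [2 ** x for x in xs]

# Original scale: residuals are positive, then negative, then positive.
raw_residuals = residuals_for(xs, ys)
print([round(r, 2) for r in raw_residuals])

# Transform y, re-fit, and check the new residuals.
log_ys = [math.log(y) for y in ys]
log_residuals = residuals_for(xs, log_ys)
print([round(r, 6) for r in log_residuals])
```

On the original scale the residuals trace a clear curve. After the log transformation they are zero up to rounding, so the straight-line model fits the transformed data.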

Quick Reference

Residual: $y - \hat{y}$

Good residual plot:

  • Random scatter around 0
  • No pattern
  • Constant spread

s: Typical prediction error

Check conditions: LINE (Linearity, Independence, Normality, Equal variance)

Problems to look for:

  • Curved pattern → nonlinear
  • Fan shape → non-constant variance
  • Outliers → investigate

Remember: Always examine residual plot! It reveals whether linear model is appropriate and highlights potential problems. Don't rely on correlation alone!
