Residuals and Residual Plots

Analyze residual plots to assess the fit of a regression model.

What Are Residuals?

$$\text{residual} = y - \hat{y} = \text{observed} - \text{predicted}$$

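A residual measures the vertical distance from an observed point to the regression line. A positive residual means the point lies above the line (the model underpredicts $y$); a negative residual means it lies below the line (the model overpredicts $y$).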
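To see the definition in action, here is a minimal sketch in plain Python (no libraries) that fits the least-squares line $\hat{y} = a + bx$ and then computes each residual as observed minus predicted. The data values and the function names `fit_lsrl` and `residuals` are hypothetical, chosen only for illustration.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares regression line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # The LSRL passes through (x_bar, y_bar), so a = y_bar - b * x_bar
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys, a, b):
    """Residual = observed - predicted = y - (a + b*x) for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_lsrl(xs, ys)
res = residuals(xs, ys, a, b)
print(f"y-hat = {a:.2f} + {b:.2f}x")   # y-hat = 2.20 + 0.60x
print([round(e, 2) for e in res])      # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

The point at $x = 3$ has the largest residual, $1.0$: it sits 1 unit above the line, so the model underpredicts it. The point at $x = 1$ has residual $-0.8$, so the line overpredicts it by 0.8.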

Residual Plots

A residual plot graphs the residuals (vertical axis) against either the explanatory variable $x$ or the predicted values $\hat{y}$ (horizontal axis).
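The sketch below (plain Python; the data and the function name `residual_plot_points` are hypothetical) builds the coordinates a residual plot would display, with the residual on the vertical axis and either $x$ or $\hat{y}$ on the horizontal axis.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx  # slope
    return y_bar - b * x_bar, b  # the LSRL passes through (x_bar, y_bar)


def residual_plot_points(xs, ys, against="x"):
    """Return (horizontal, residual) pairs for a residual plot.

    against="x"      -> horizontal axis is the explanatory variable x
    against="fitted" -> horizontal axis is the predicted value y-hat
    """
    if against not in ("x", "fitted"):
        raise ValueError('against must be "x" or "fitted"')
    a, b = fit_lsrl(xs, ys)
    points = []
    for x, y in zip(xs, ys):
        y_hat = a + b * x
        horizontal = x if against == "x" else y_hat
        points.append((horizontal, y - y_hat))  # residual = y - y-hat
    return points


# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
by_x = residual_plot_points(xs, ys)
by_fitted = residual_plot_points(xs, ys, against="fitted")
for h, e in by_x:
    print(f"x = {h}: residual = {e:+.2f}")
```

With a plotting library such as matplotlib, these pairs could be passed to `plt.scatter`, with `plt.axhline(0)` drawing the reference line at zero. In simple linear regression $\hat{y}$ is a linear function of $x$, so both choices of horizontal axis show the same pattern, only rescaled (and mirrored left to right when the slope is negative).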

Interpreting Residual Plots

Good Fit (Linear Model Appropriate)

  • Points scattered randomly around the horizontal line at residual $= 0$
  • No obvious pattern
  • Roughly equal spread throughout

Curved Pattern

  • Indicates the relationship is not linear
  • A curved model (quadratic, exponential, etc.) may be more appropriate
  • Consider transforming the data
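Here is a small illustration of a curved pattern, using hypothetical data generated exactly from $y = x^2$. A straight-line fit leaves residuals that are positive at both ends and negative in the middle (a U shape). Refitting $y$ against the transformed variable $x^2$ removes the pattern. The function names are illustrative, and real data would not give residuals of exactly zero after transforming.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx  # slope
    return y_bar - b * x_bar, b  # the LSRL passes through (x_bar, y_bar)


def residuals(xs, ys):
    """Fit the LSRL to (xs, ys) and return y - y-hat for each point."""
    a, b = fit_lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data generated exactly from y = x^2 (a curved relationship)
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

# Straight-line fit: residuals are + at both ends and - in the middle
print(residuals(xs, ys))      # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]

# Transform the explanatory variable to x^2 and refit: the pattern disappears
xs_sq = [x ** 2 for x in xs]
print(residuals(xs_sq, ys))   # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```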

Fan Shape (Heteroscedasticity)

  • Spread of residuals increases (or decreases) as $x$ increases
  • Indicates non-constant variability
  • May need a transformation
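A fan shape is usually judged by eye, but a rough numerical check can help. The sketch below sorts the points by $x$ and compares the root-mean-square residual in the lower half with that in the upper half. This half-versus-half comparison is an informal heuristic chosen for illustration, not a formal test for non-constant variance; the data and the function name `spread_by_half` are hypothetical.

```python
import math


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx  # slope
    return y_bar - b * x_bar, b  # the LSRL passes through (x_bar, y_bar)


def rms(values):
    """Root-mean-square size of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def spread_by_half(xs, ys):
    """Return (lower-half spread, upper-half spread) of the residuals, split by x."""
    a, b = fit_lsrl(xs, ys)
    # Sort by x so the two halves cover the low and high x-values
    pairs = sorted((x, y - (a + b * x)) for x, y in zip(xs, ys))
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[half:]]
    return rms(lower), rms(upper)


# Hypothetical data: the scatter around y = 2x grows as x grows
xs = list(range(1, 9))
ys = [2 * x + (-1) ** x * 0.5 * x for x in xs]

low, high = spread_by_half(xs, ys)
print(f"lower-half spread {low:.2f}, upper-half spread {high:.2f}")
```

Here the upper-half spread is well over twice the lower-half spread, matching a fan that opens to the right. With only a few points per half the comparison is noisy, so the residual plot itself remains the main check.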

Outliers in Residuals

  • Points with large residuals (far from 0) are regression outliers
  • These may indicate unusual observations worth investigating
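One common rule of thumb (an assumption here, not something this lesson states) flags a point when its residual is more than about 2 standard deviations of the residuals away from 0, using the standard deviation of the residuals $s$ covered later in this lesson. The hypothetical sketch below applies that cutoff; the function name `flag_large_residuals` and its cutoff parameter `k` are illustrative choices.

```python
import math


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx  # slope
    return y_bar - b * x_bar, b  # the LSRL passes through (x_bar, y_bar)


def flag_large_residuals(xs, ys, k=2.0):
    """Return (x, y, residual) for points whose |residual| exceeds k * s.

    k = 2 is a rule of thumb assumed for illustration, not a fixed standard.
    """
    a, b = fit_lsrl(xs, ys)
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    # s = sqrt(sum of squared residuals / (n - 2))
    s = math.sqrt(sum(e * e for e in res) / (len(xs) - 2))
    return [(x, y, e) for x, y, e in zip(xs, ys, res) if abs(e) > k * s]


# Hypothetical data on the line y = 2x + 1, except one unusual point at x = 5
xs = list(range(1, 10))
ys = [2 * x + 1 for x in xs]
ys[4] = 21  # the line y = 2x + 1 would give 11 here

flagged = flag_large_residuals(xs, ys)
print(flagged)
```

Only the point at $x = 5$ is flagged. Because a large residual also inflates $s$, a single extreme point can partly hide itself, so a flagged list is a starting point for investigation, not a verdict.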

Using Residual Plots to Assess Models

| Residual Plot Pattern | Assessment |
|----------------------|------------|
| Random scatter | Linear model is appropriate ✅ |
| Curved pattern | Need a nonlinear model ❌ |
| Fan/funnel shape | Non-constant variance ❌ |
| Clusters | Possibly missing a variable |

Standard Deviation of Residuals ($s$)

$$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

Interpretation: "The actual [y-values] typically differ from the values predicted by the least-squares regression line (LSRL) by about $s$ [units]."

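We divide by $n - 2$ rather than $n$ because two parameters, the intercept $a$ and the slope $b$ of $\hat{y} = a + bx$, were estimated from the data, which uses up two degrees of freedom.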
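As a quick sketch of the formula (plain Python; the data and the function name `residual_sd` are hypothetical), the code below fits the LSRL, sums the squared residuals, and divides by $n - 2$ before taking the square root.

```python
import math


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx  # slope
    return y_bar - b * x_bar, b  # the LSRL passes through (x_bar, y_bar)


def residual_sd(xs, ys):
    """Standard deviation of the residuals: sqrt(sum((y - y-hat)^2) / (n - 2))."""
    n = len(xs)
    a, b = fit_lsrl(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - 2 because two parameters (a and b) were estimated
    return math.sqrt(sse / (n - 2))


# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = residual_sd(xs, ys)
print(f"s = {s:.3f}")  # s = 0.894
```

For these numbers the sum of squared residuals is $2.4$, so $s = \sqrt{2.4 / 3} \approx 0.894$: the actual $y$-values typically differ from the LSRL predictions by about 0.89 units.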

Key Properties of Residuals

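  1. The mean of the residuals from the LSRL is always 0: $\bar{e} = 0$ (equivalently, the residuals sum to 0).
  2. The residuals have zero correlation with $x$, so they show no linear relationship with $x$.
  3. The LSRL is the line that minimizes the sum of squared residuals, $\sum (y_i - \hat{y}_i)^2$.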
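These properties can be checked numerically. The sketch below (plain Python, hypothetical data) fits the LSRL, confirms that the residuals average to 0 and that $\sum (x_i - \bar{x})\, e_i = 0$ (which, since $\bar{e} = 0$, is what zero correlation with $x$ means), and shows that nudging the intercept or slope away from the LSRL values increases the sum of squared residuals. The checks use a small tolerance because floating-point arithmetic is not exact.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx  # slope
    return y_bar - b * x_bar, b  # the LSRL passes through (x_bar, y_bar)


def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_lsrl(xs, ys)
res = [y - (a + b * x) for x, y in zip(xs, ys)]
x_bar = sum(xs) / len(xs)

# Property 1: the residuals average to 0
mean_res = sum(res) / len(res)
# Property 2: zero correlation with x, i.e. sum((x - x_bar) * e) = 0
cross = sum((x - x_bar) * e for x, e in zip(xs, res))
# Property 3: moving the intercept or slope away from the LSRL raises the SSE
best = sse(xs, ys, a, b)
nudged = [sse(xs, ys, a + da, b + db)
          for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]]

print(abs(mean_res) < 1e-9)           # True
print(abs(cross) < 1e-9)              # True
print(all(v > best for v in nudged))  # True
```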

AP Tip: On the AP exam, when asked "Is a linear model appropriate?", always refer to the residual plot (not the scatterplot or $r$). A residual plot showing random scatter indicates that a linear model is appropriate.
