Least-Squares Regression
Finding the line of best fit
Regression Line
Purpose: Find best-fit line through scatterplot
Equation: $\hat{y} = a + bx$
Where:
- $\hat{y}$ = predicted value of y
- b = slope
- a = y-intercept
- x = value of explanatory variable
Least-Squares Criterion
Least-squares regression line: Line minimizing sum of squared residuals
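Residual: Difference between observed and predicted values, $y - \hat{y}$
Least-squares minimizes: $\sum (y_i - \hat{y}_i)^2$
Why square? Squaring keeps positive and negative residuals from canceling, and it penalizes large misses more heavily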
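A minimal sketch of the criterion: compute the sum of squared residuals for a few candidate lines and compare. The five height/weight pairs are hypothetical, chosen so that their least-squares line works out to ŷ = -122 + 4x; they are not the data behind the worked example in these notes. The function name is also an assumption.

```python
# Hypothetical heights (inches) and weights (pounds); NOT the notes' original data.
heights = [64, 66, 68, 70, 72]
weights = [135, 140, 152, 156, 167]

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed - predicted)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

sse_least_squares = sum_squared_residuals(-122, 4, heights, weights)
sse_steeper = sum_squared_residuals(-190, 5, heights, weights)  # steeper line
sse_shifted = sum_squared_residuals(-120, 4, heights, weights)  # same slope, moved up

print(sse_least_squares, sse_steeper, sse_shifted)  # 14 54 34
```

For this data, any other choice of a and b gives a sum larger than 14.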
Formulas for Slope and Intercept
Slope: $b = r \dfrac{s_y}{s_x}$
Where:
- r = correlation
- s_y = standard deviation of y
- s_x = standard deviation of x
y-intercept: $a = \bar{y} - b\bar{x}$
Key insight: Line always passes through $(\bar{x}, \bar{y})$
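The two formulas, b = r · s_y / s_x and a = ȳ - b · x̄, can be sketched in a few lines using only the standard library. The data are hypothetical (not the notes' original five points), and `least_squares_line` is an illustrative name.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b, r) using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Sample correlation coefficient r
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b, r

# Hypothetical heights (inches) and weights (pounds)
heights = [64, 66, 68, 70, 72]
weights = [135, 140, 152, 156, 167]

a, b, r = least_squares_line(heights, weights)
print(a, b, r)  # a is approximately -122, b approximately 4

# The line passes through (x_bar, y_bar), up to rounding error
through_means = abs((a + b * mean(heights)) - mean(weights)) < 1e-9
```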
Example: Finding Regression Line
Data: Height (x) and weight (y) of 5 people
,
,
Slope: $b = r \dfrac{s_y}{s_x} = 4$
Intercept: $a = \bar{y} - b\bar{x} = -122$
Equation: $\hat{y} = -122 + 4x$
Interpretation: For each inch increase in height, predicted weight increases by 4 pounds.
Interpreting Slope
Slope b = change in predicted y ($\hat{y}$) per one-unit increase in x
Template: "For each [1 unit] increase in [x], predicted [y] [increases/decreases] by [|b|] [y-units]."
Example: b = 4 in height/weight
"For each 1-inch increase in height, predicted weight increases by 4 pounds."
Negative slope: "decreases by..."
Interpreting y-Intercept
y-intercept a = predicted y when x = 0
Often meaningless!
- Height = 0 → weight = -122 pounds? Nonsense!
Only interpret if x = 0 is meaningful and within data range
Example where meaningful:
- y = test score, x = hours studied
- a = predicted score with 0 hours studying
Making Predictions
Substitute x into equation: $\hat{y} = a + bx$
Example: Predict weight for height = 70 inches: $\hat{y} = -122 + 4(70) = 158$ pounds
Caution: Extrapolation (predicting outside data range) is risky!
Extrapolation
Interpolation: Predict within range of data ✓
Extrapolation: Predict outside range of data ⚠
Problem with extrapolation:
- Relationship may not continue
- May become nonlinear
- Other factors may matter
Example: Predicting weight for height = 100 inches
- Well outside typical range
- Relationship might not hold
- Prediction unreliable
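One way to guard against accidental extrapolation is to return a flag alongside each prediction. This is only a sketch: the line ŷ = -122 + 4x is the one from the height/weight example, while the observed range of 64 to 72 inches and the function name `predict` are assumptions made for illustration.

```python
def predict(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x.

    x_min and x_max are the smallest and largest x-values in the data.
    """
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation

# Assumed data range: heights from 64 to 72 inches (hypothetical)
print(predict(-122, 4, 70, 64, 72))   # (158, False): interpolation
print(predict(-122, 4, 100, 64, 72))  # (278, True): extrapolation, treat with caution
```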
Calculator Method
TI-83/84:
- Enter data in L1 (x) and L2 (y)
- STAT → CALC → 8:LinReg(a+bx)
- Read a, b, r, r² (r and r² are displayed only when diagnostics are on: CATALOG → DiagnosticOn)
Result shows:
- y = a + bx
- r (correlation)
- r² (coefficient of determination)
Properties of Regression Line
1. Passes through $(\bar{x}, \bar{y})$
2. Sum of residuals = 0
- Positive and negative balance out
3. Unique (only one least-squares line)
4. Sensitive to outliers
- One outlier can drastically change line
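Properties 1 and 2 can be checked numerically. In this sketch the slope is computed as Sxy / Sxx, which is algebraically the same as r · s_y / s_x; the data and the helper name `fit_line` are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares (a, b), using b = Sxy / Sxx (equivalent to r * s_y / s_x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical heights (inches) and weights (pounds)
heights = [64, 66, 68, 70, 72]
weights = [135, 140, 152, 156, 167]

a, b = fit_line(heights, weights)
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]

# Property 1: the line passes through (x_bar, y_bar)
passes_through_means = abs((a + b * mean(heights)) - mean(weights)) < 1e-9
# Property 2: the residuals sum to zero (up to rounding error)
residuals_sum_to_zero = abs(sum(residuals)) < 1e-9
```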
Residuals
Residual = observed - predicted = $y - \hat{y}$
Positive residual: Point above line (underestimate)
Negative residual: Point below line (overestimate)
Zero residual: Point on line (exact prediction)
Example: Actual weight = 160, predicted = 158
- Residual = 160 - 158 = 2 pounds
- Underestimated by 2 pounds
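The sign convention can be sketched as a pair of small helpers; the function names and the wording of the labels are illustrative assumptions.

```python
def residual(observed, predicted):
    """Residual = observed - predicted."""
    return observed - predicted

def describe(res):
    """Say where the point sits relative to the line."""
    if res > 0:
        return "above line (line underestimates)"
    if res < 0:
        return "below line (line overestimates)"
    return "on line (exact prediction)"

res = residual(160, 158)
print(res, describe(res))  # 2 above line (line underestimates)
```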
Influential Points
Influential point: Removing it substantially changes regression line
Usually:
- Outliers in x-direction (far from $\bar{x}$)
- Have high leverage (pull line toward them)
Not all outliers are influential!
- Outlier in y-direction but near $\bar{x}$ → less influential
Always identify and investigate influential points
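A simple way to investigate a suspect point is to fit the line with and without it. The sketch below adds one extra point to a hypothetical data set in two ways: an outlier in the y-direction at x = x̄, and an outlier in the x-direction. All data and the helper name `fit_line` are assumptions for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares (a, b), using b = Sxy / Sxx (equivalent to r * s_y / s_x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical heights (inches) and weights (pounds)
heights = [64, 66, 68, 70, 72]
weights = [135, 140, 152, 156, 167]

base = fit_line(heights, weights)                            # slope 4, intercept -122
# y-direction outlier at x = 68 (the mean height): slope unchanged, line shifts up
with_y_outlier = fit_line(heights + [68], weights + [180])   # slope 4, intercept -117
# x-direction outlier at x = 80: high leverage, slope drops sharply
with_x_outlier = fit_line(heights + [80], weights + [150])   # slope 1, intercept 80

print(base, with_y_outlier, with_x_outlier)
```

In this example the x-direction outlier changes the slope from 4 to 1, while the y-direction outlier near x̄ only shifts the intercept.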
Regression Toward the Mean
Phenomenon: Extreme x-values tend to predict less extreme y-values
Why? |r| < 1 (not a perfect linear relationship)
Example: Very tall parents tend to have children who are shorter than they are (still tall, but less extreme)
Slope formula explains: $b = r \dfrac{s_y}{s_x}$
- Since |r| < 1, the predicted y is fewer standard deviations from $\bar{y}$ than x is from $\bar{x}$
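The step from the slope formula to this conclusion can be written out by substituting a = ȳ - b·x̄ and b = r·s_y/s_x into ŷ = a + bx. This is a short derivation sketch using the symbols already defined in these notes:

```latex
\begin{aligned}
\hat{y} &= a + bx = (\bar{y} - b\bar{x}) + bx = \bar{y} + b\,(x - \bar{x}) \\
\hat{y} - \bar{y} &= r\,\frac{s_y}{s_x}\,(x - \bar{x}) \\
\frac{\hat{y} - \bar{y}}{s_y} &= r \cdot \frac{x - \bar{x}}{s_x}
\end{aligned}
```

For instance, with an assumed r = 0.5, an x-value 2 standard deviations above x̄ predicts a y-value only 1 standard deviation above ȳ.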
Switching x and y
Regression NOT symmetric!
Different lines:
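- Regression of y on x: $\hat{y} = a + bx$, with $b = r \dfrac{s_y}{s_x}$
- Regression of x on y: $\hat{x} = a' + b'y$, with $b' = r \dfrac{s_x}{s_y}$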
These are NOT equivalent!
Use: Predict y from x → use y on x line
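To see that the two lines differ, compare the slope for predicting x from y with the reciprocal of the slope for predicting y from x. The data and the helper name `slope` are hypothetical; the slope is computed as Sxy / Sxx, which equals r · s_y / s_x.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope for predicting ys from xs (Sxy / Sxx)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical heights (inches) and weights (pounds)
heights = [64, 66, 68, 70, 72]
weights = [135, 140, 152, 156, 167]

b_y_on_x = slope(heights, weights)   # pounds per inch
b_x_on_y = slope(weights, heights)   # inches per pound
naive_inverse = 1 / b_y_on_x         # what simply solving y = a + bx for x would give

# The product of the two slopes equals r^2, so they are reciprocals only when |r| = 1
r_squared = b_y_on_x * b_x_on_y

print(b_y_on_x, b_x_on_y, naive_inverse, r_squared)
```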
Common Mistakes
❌ Interpreting y-intercept when x = 0 meaningless
❌ Extrapolating beyond data range
❌ Confusing slope units
❌ Thinking regression proves causation
❌ Using regression when relationship nonlinear
Causation Reminder
Regression line can be used for prediction
Does NOT prove causation!
Strong relationship ≠ cause-and-effect
Need: Controlled experiment to establish causation
Quick Reference
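Equation: $\hat{y} = a + bx$
Slope: $b = r \dfrac{s_y}{s_x}$
Intercept: $a = \bar{y} - b\bar{x}$
Line passes through: $(\bar{x}, \bar{y})$
Residual: $y - \hat{y}$
Least-squares minimizes: $\sum (y_i - \hat{y}_i)^2$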
Remember: Regression gives best prediction line but doesn't prove causation. Beware extrapolation! Always check for influential points.