Purpose: predict response variable y from explanatory variable x
Goal: minimize vertical distances (residuals) from points to line
Form: \(\hat{y} = a + bx\)
where:
\(\hat{y}\) = predicted y value (not actual y)
📚 Practice Problems
Problem 1 (medium)
❓ Question:
A study measures hours studied (x) and test scores (y) for 5 students: (2,65), (3,70), (4,75), (5,80), (6,85). Given x̄ = 4, ȳ = 75, calculate the least-squares regression line.
Find and interpret the least-squares regression line (LSRL) and make predictions.
What course covers Least-Squares Regression?
Least-Squares Regression is part of the AP Statistics course on Study Mondo, specifically in the Unit 2: Exploring Two-Variable Data section. You can explore the full course for more related topics and practice resources.
a = y-intercept
b = slope
Finding the Regression Line
Slope: \(b = r \cdot \frac{s_y}{s_x}\)
where r = correlation, \(s_x\) = std dev of x, \(s_y\) = std dev of y
Y-intercept: \(a = \bar{y} - b\bar{x}\)
Key fact: regression line always passes through \((\bar{x}, \bar{y})\)
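The slope and intercept formulas above can be sketched in Python. This is a minimal illustration, not a required method: the function name `least_squares_line` is made up for this example, the sample data are the five (hours, score) points from Practice Problem 1, and the standard deviations use the sample (n − 1) convention.

```python
import math

def least_squares_line(xs, ys):
    """Return (a, b, r) for the line y_hat = a + b*x (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Correlation: sum of products of z-scores, divided by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x        # slope: b = r * s_y / s_x
    a = y_bar - b * x_bar    # intercept: a = y_bar - b * x_bar
    return a, b, r

# Data from Practice Problem 1: hours studied (x) and test scores (y).
hours = [2, 3, 4, 5, 6]
scores = [65, 70, 75, 80, 85]
a, b, r = least_squares_line(hours, scores)
print(f"score_hat = {a:.2f} + {b:.2f} * hours (r = {r:.3f})")
```

Because these five points lie exactly on a line, the sketch should report a slope of 5 and an intercept of 55, and plugging in x̄ = 4 returns ȳ = 75, which matches the key fact that the line passes through \((\bar{x}, \bar{y})\).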
Interpreting Slope and Intercept
Slope (b): predicted change in y for each 1-unit increase in x
Positive slope: as x increases, predicted y increases
Negative slope: as x increases, predicted y decreases
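The slope interpretation follows directly from the form \(\hat{y} = a + bx\). Increasing x by one unit changes the prediction by exactly b, whatever the starting value of x:

```latex
\[
\hat{y}(x+1) - \hat{y}(x) = \bigl[a + b(x+1)\bigr] - \bigl[a + bx\bigr] = b
\]
```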
Prediction: If student studies 7 hours:
\(\hat{\text{Score}} = 59.9 + 4.95(7) = 59.9 + 34.65 = 94.55 \approx 94.6\) points
(This is interpolation; 7 hours is within data range 2–8)
Residuals and Residual Plots
Residual: actual y − predicted y = \(y - \hat{y}\)
Residual plot: scatterplot of residuals vs. x values
If residuals randomly scattered around 0 → linear model is appropriate
If residuals show pattern (curved, increasing) → linear model inadequate
Sum of residuals: always 0 (up to rounding) for a least-squares line with an intercept
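A short sketch of the residual calculation, under stated assumptions: the five data points below are made-up numbers chosen only for illustration (they do not come from this page), and `fit_line` is a hypothetical helper. It computes the slope as \(\sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2\), which is algebraically the same as \(b = r \cdot \frac{s_y}{s_x}\).

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Made-up illustration data, NOT taken from the study guide.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)

# Residual = actual y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.2f}")
print(f"sum of residuals = {sum(residuals):.10f}")
```

Plotting `residuals` against `xs` would give the residual plot described above. The printed sum should be zero apart from floating-point rounding.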
Common Mistakes
Swapping x and y: regression of y on x ≠ regression of x on y
Over-interpreting intercept: a is the predicted y at x = 0 (e.g., a ≈ 60), which may have no real meaning if x = 0 is outside the data range
Extrapolating recklessly: don't predict far outside data range
Confusing \(\hat{y}\) with y: \(\hat{y}\) is prediction, not actual value
Ignoring residuals: always check residual plot to validate linear assumption
AP Exam Tip
FRQ response format:
Show work: state formula for slope \(b = r \cdot \frac{s_y}{s_x}\) and intercept \(a = \bar{y} - b\bar{x}\)
Write regression equation: \(\hat{\text{y-variable}} = a + b \cdot \text{x-variable}\)
Interpret slope in context: "For each additional [x-unit], predicted [y-variable] increases by [slope value] [y-units]."
Caveat on predictions: "This prediction assumes the linear relationship continues in this range" or note if extrapolating
Example: "\(\hat{\text{Score}} = 59.9 + 4.95 \cdot \text{Hours}\). For each additional hour studied, we predict the exam score increases by 4.95 points."
Problem 2 (medium)
❓ Question:
For data with Σx = 50, Σy = 120, Σx² = 350, Σxy = 720, n = 10, find the least-squares regression line.
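One possible way to work from summary sums like these is the computational form of the slope, \(b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}\), which is algebraically equivalent to \(b = r \cdot \frac{s_y}{s_x}\). The sketch below applies it to the sums given in this question. The function name `line_from_sums` is an assumption made for this example.

```python
def line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares intercept and slope from summary sums (hypothetical helper)."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    x_bar = sum_x / n
    y_bar = sum_y / n
    a = y_bar - b * x_bar
    return a, b

# Sums stated in the question: n = 10, Σx = 50, Σy = 120, Σx² = 350, Σxy = 720.
a, b = line_from_sums(n=10, sum_x=50, sum_y=120, sum_x2=350, sum_xy=720)
print(f"y_hat = {a:.2f} + {b:.2f}x")
```

By hand: b = (7200 − 6000) / (3500 − 2500) = 1.2 and a = 12 − 1.2(5) = 6, so the sketch should agree with ŷ = 6 + 1.2x.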
A regression of fuel efficiency (y, mpg) on car weight (x, in 1000s of lbs) gives ŷ = 45 - 5.2x. Interpret the slope and predict mpg for a 3,500 lb car.
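For the car-weight question above, the main trap is units: x is measured in thousands of pounds, so a 3,500 lb car corresponds to x = 3.5. A minimal sketch of the arithmetic follows. The function name `predict_mpg` is hypothetical.

```python
def predict_mpg(weight_lbs):
    """Predicted mpg from the stated line y_hat = 45 - 5.2x, x in 1000s of lbs."""
    weight_thousands = weight_lbs / 1000  # convert lbs to the model's units
    return 45 - 5.2 * weight_thousands

prediction = predict_mpg(3500)
print(f"Predicted fuel efficiency: {prediction:.1f} mpg")

# Slope check: 1,000 lbs more weight changes the prediction by the slope.
change = predict_mpg(4500) - predict_mpg(3500)
print(f"Change per additional 1,000 lbs: {change:.1f} mpg")
```

By hand, 45 − 5.2(3.5) = 45 − 18.2 = 26.8 mpg. The slope says that each additional 1,000 lbs of weight lowers predicted fuel efficiency by 5.2 mpg.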
Interpretation: Each 1-unit increase in x predicts an 11.25-unit increase in y.
Answer: ŷ = 71.25 + 11.25x
Problem 5 (medium)
❓ Question:
The regression of ice cream sales (y, $) on temperature (x, °F) is ŷ = -2 + 0.8x. Is it appropriate to predict sales when temp = 0°F? Explain.
💡 Show Solution
Step 1: Make the prediction
ŷ = -2 + 0.8(0) = -2
This predicts -$2 in sales, which is IMPOSSIBLE!
Step 2: Identify the problem
This is EXTRAPOLATION - predicting outside the data range.
Issues:
Temperature of 0°F likely outside original data range
Linear relationship may not hold at extremes
Model gives nonsensical result (negative sales)
Y-intercept is just a mathematical constant, not meaningful here
Step 3: Proper approach
Should only use regression for INTERPOLATION (within data range).
If data collected at 60-100°F, only predict in that range.
Answer: NO - This is inappropriate extrapolation resulting in an impossible prediction. Only use regression within the range of observed x-values.
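The "only predict within the data range" rule can be built into code as a guard. The sketch below is one way to do that, assuming the hypothetical 60–100°F range mentioned in Step 3 (the problem itself does not state the actual range). The function name `predict_sales` is made up for this example.

```python
def predict_sales(temp_f, min_temp=60.0, max_temp=100.0):
    """Predict sales ($) from y_hat = -2 + 0.8x, refusing to extrapolate.

    min_temp and max_temp are ASSUMED bounds of the observed temperatures.
    """
    if not (min_temp <= temp_f <= max_temp):
        raise ValueError(
            f"{temp_f}°F is outside the observed range "
            f"{min_temp}-{max_temp}°F; prediction would be extrapolation"
        )
    return -2 + 0.8 * temp_f

print(predict_sales(80))  # inside the assumed range, so a prediction is returned

try:
    predict_sales(0)      # outside the assumed range
except ValueError as err:
    print("Refused:", err)
```

Raising an error rather than returning -2 makes the extrapolation visible instead of silently producing an impossible negative sales figure.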
This page includes 5 practice problems with detailed solutions. Each problem includes a step-by-step explanation to help you understand the approach.