Scatterplots and Line of Best Fit
Analyze scatterplots, determine lines of best fit, interpret slope and intercepts in context, and make predictions using linear and nonlinear models.
Scatterplots and Line of Best Fit on the SAT
What Is a Scatterplot?
A scatterplot displays the relationship between two quantitative variables as points on a coordinate plane.
- x-axis: independent (explanatory) variable
- y-axis: dependent (response) variable
Types of Association
| Pattern | Description | Example |
|---|---|---|
| Positive | As x increases, y increases | Height vs. weight |
| Negative | As x increases, y decreases | Temperature vs. hot chocolate sales |
| None | No clear pattern | Shoe size vs. GPA |
Strength and Form of Association
- Strong: Points cluster tightly around a line/curve
- Weak: Points are widely scattered
- Nonlinear: Points follow a curve, not a line
Line of Best Fit (Regression Line)
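The line of best fit is the straight line that best approximates the data. For a regression line, "best" usually means least squares: the line that minimizes the sum of the squared vertical distances from the points to the line. It is often written as y = mx + b, where:
- m (slope): The predicted change in y for each 1-unit increase in x
- b (y-intercept): The predicted value of y when x = 0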
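A minimal sketch of how a least-squares line could be computed, assuming the standard formulas slope m = Sxy / Sxx and intercept b = (mean of y) − m · (mean of x). The function name `fit_line` and the hours-vs-score data are hypothetical; on the SAT you usually read the line from a graph or the problem statement rather than computing it.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of products of x- and y-deviations from their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sxx: sum of squared x-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")
    m = sxy / sxx
    b = mean_y - m * mean_x  # the least-squares line passes through (mean_x, mean_y)
    return m, b


# Hypothetical data: hours studied (x) and test score (y)
hours = [1, 2, 3, 4, 5]
scores = [70, 73, 74, 78, 80]
m, b = fit_line(hours, scores)
print(f"y = {m}x + {b}")  # y = 2.5x + 67.5
```

Once m and b are known, a prediction is just m * x + b for the x-value of interest.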
Interpreting Slope in Context
"For each additional [unit of x], [the y variable] is predicted to [increase/decrease] by [slope] [units of y]."
Example: If a line with slope 2.5 models the relationship between hours studied (x) and test score (y): "For each additional hour of study, the test score is predicted to increase by 2.5 points."
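The "per 1-unit increase" reading of slope can be checked numerically. A minimal sketch, assuming the slope of 2.5 from the example and a hypothetical intercept of 67.5 (the example does not state one); `predicted_score` is an illustrative name.

```python
SLOPE = 2.5       # points per additional hour (from the example)
INTERCEPT = 67.5  # hypothetical: the example gives no intercept


def predicted_score(hours):
    """Predicted test score from the line y = SLOPE * x + INTERCEPT."""
    return SLOPE * hours + INTERCEPT


# One extra hour changes the prediction by the slope, from any starting point.
for h in [0, 2, 5.5]:
    print(h, "->", h + 1, ":", predicted_score(h + 1) - predicted_score(h))

# Three extra hours change it by 3 * slope (7.5), not by the slope alone.
print(predicted_score(5) - predicted_score(2))
```

Changing the intercept shifts every prediction equally, so it never affects these differences; that is why the slope alone answers "per unit" questions.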
Interpreting the y-Intercept
"When [x variable] is 0, the predicted [y variable] is [y-intercept]."
Note: The y-intercept may not always make practical sense (e.g., "0 hours of study" may be unrealistic or outside the range of the data).
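Residuals
A residual is the difference between an actual data value and the value predicted by the line: residual = actual − predicted.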
- Positive residual: Actual > Predicted (point is ABOVE the line)
- Negative residual: Actual < Predicted (point is BELOW the line)
- Zero residual: Point is exactly ON the line
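The three residual cases above can be expressed as a small function. A minimal sketch; the names `residual` and `position_relative_to_line` are hypothetical, and the sample values 88 (actual) and 85 (predicted) come from Problem 3 below.

```python
def residual(actual, predicted):
    """Residual = actual value minus predicted value."""
    return actual - predicted


def position_relative_to_line(actual, predicted):
    """Describe where a data point sits relative to the line of best fit."""
    r = residual(actual, predicted)
    if r > 0:
        return "above"  # actual > predicted
    if r < 0:
        return "below"  # actual < predicted
    return "on"         # actual == predicted


print(residual(88, 85), position_relative_to_line(88, 85))  # 3 above
```

With decimal (floating-point) data, an exact zero residual is rare, so "on the line" is mostly a textbook case.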
Residual Plots
A good model has residuals that are randomly scattered around zero. A pattern in residuals suggests the model is not a good fit.
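To see what a "pattern in residuals" looks like, here is a minimal sketch that fits a straight line to hypothetical curved data (y = x²) and lists the residuals. The data and the helper `fit_line` are illustrative assumptions; the fit uses the standard least-squares formulas.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Hypothetical curved data: y = x^2
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]
m, b = fit_line(xs, ys)  # best line here is y = 4x - 2
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]

# Sign pattern: positive at the ends, negative in the middle (a U shape)
signs = "".join("+" if r > 0 else "-" if r < 0 else "0" for r in residuals)
print(signs)  # +---+
```

The residuals are not randomly scattered: they form a U shape, which signals that a curve would model this data better than a line.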
Correlation Coefficient (r)
| r value | Meaning |
|---|---|
| r = 1 | Perfect positive linear relationship |
| r = −1 | Perfect negative linear relationship |
| r = 0 | No linear relationship |
| r close to 1 or −1 | Strong linear relationship |
| r close to 0 | Weak or no linear relationship |
Remember:
- r only measures LINEAR relationships
- r does NOT indicate causation
- Outliers can significantly affect r
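A minimal sketch of the Pearson correlation coefficient (the quantity usually meant by r), written out by hand to illustrate the three reminders above. The function name and all data sets are hypothetical examples.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0  (perfect positive)
print(correlation([1, 2, 3], [6, 4, 2]))  # -1.0 (perfect negative)

# A perfect curve can still give r = 0, because r only measures LINEAR trends.
xs = [-2, -1, 0, 1, 2]
print(correlation(xs, [x ** 2 for x in xs]))  # 0.0

# One outlier (30 instead of 10) pulls r well below 1.
print(correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 30]))
```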
Making Predictions
Interpolation vs. Extrapolation
- Interpolation: Predicting within the range of the data → generally reliable
- Extrapolation: Predicting beyond the data range → less reliable (may be inaccurate)
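Deciding between interpolation and extrapolation only requires the observed range of x. A minimal sketch with hypothetical x-values and function names; the distance outside the range is a rough guide to how much trust to lose, not a fixed cutoff.

```python
def prediction_kind(x_new, xs):
    """Classify a prediction at x_new as interpolation or extrapolation."""
    lo, hi = min(xs), max(xs)
    return "interpolation" if lo <= x_new <= hi else "extrapolation"


def distance_outside(x_new, xs):
    """How far x_new lies outside the observed x-range (0 if inside)."""
    lo, hi = min(xs), max(xs)
    if x_new < lo:
        return lo - x_new
    if x_new > hi:
        return x_new - hi
    return 0


observed_x = [2, 4, 5, 7, 10]  # hypothetical data range: 2 to 10
for x in [6, 12, 30, 0]:
    print(x, prediction_kind(x, observed_x), distance_outside(x, observed_x))
```

Here x = 6 is interpolation, while 12, 30, and 0 are extrapolation, with 30 the farthest outside the data.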
SAT Question Types
Type 1: Describe the Association
"Which best describes the relationship?" → identify positive/negative, strong/weak, linear/nonlinear
Type 2: Interpret Slope or y-Intercept
"In context, what does the slope represent?" → rate of change per unit
Type 3: Find a Residual
Given a point and the line equation, calculate residual = actual − predicted.
Type 4: Make a Prediction
Use the line equation to predict y for a given x-value.
Type 5: Identify an Outlier
The point farthest from the line of best fit (the largest residual in absolute value, whether above or below the line).
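Finding the outlier amounts to taking the point with the largest absolute residual. A minimal sketch, assuming a hypothetical line y = 2x + 1 and made-up points; `find_outlier` is an illustrative name.

```python
def find_outlier(points, slope, intercept):
    """Return the (x, y) point with the largest absolute residual."""
    def abs_residual(point):
        x, y = point
        return abs(y - (slope * x + intercept))
    return max(points, key=abs_residual)


# Hypothetical line y = 2x + 1 and data points
points = [(1, 3), (2, 5.5), (3, 4), (4, 9)]
print(find_outlier(points, 2, 1))  # (3, 4): residual -3 beats +0.5
```

Using the absolute value matters: the point 3 units below the line is farther away than the point 0.5 units above it.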
Common SAT Mistakes
- Claiming causation from a scatterplot: scatterplots show ASSOCIATION, not causation
- Extrapolating too far beyond the data range
- Misinterpreting slope: it's the change per 1-unit increase in x, not the total change
- Confusing positive and negative residuals: positive = above the line
- Ignoring the context: always interpret slope and intercept in terms of the actual variables
Practice Problems
Problem 1 (easy)
Question:
A scatterplot shows the relationship between hours of sleep (x) and alertness score (y). The points trend upward from left to right and cluster closely around a line. How would you describe this association?
Solution:
Direction: Points trend upward → positive association
Strength: Points cluster closely → strong association
Form: Points follow a line → linear association
Answer: Strong, positive, linear association.
In context: As hours of sleep increase, alertness scores tend to increase.
Problem 2 (medium)
Question:
The line of best fit for a scatterplot relating years of experience (x) to salary in thousands of dollars (y) is y = 3.2x + 32. Interpret the slope in context.
Solution:
The slope is 3.2.
Interpretation: For each additional year of experience, the predicted salary increases by $3,200 (3.2 thousand dollars).
Template: "For each 1-unit increase in [x-variable], the [y-variable] is predicted to [increase/decrease] by [slope] [units]."
Note: The y-intercept of 32 means a person with 0 years of experience has a predicted salary of $32,000.
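A minimal sketch that turns this model (slope 3.2 and y-intercept 32, in thousands of dollars) into predictions. The 10-year input is a hypothetical example, not part of the problem, and `predicted_salary_thousands` is an illustrative name.

```python
SLOPE = 3.2     # thousand dollars per additional year of experience
INTERCEPT = 32  # predicted salary (thousands) at 0 years of experience


def predicted_salary_thousands(years):
    """Predicted salary in thousands of dollars from y = 3.2x + 32."""
    return SLOPE * years + INTERCEPT


# Hypothetical input: 10 years of experience
salary = predicted_salary_thousands(10)
print(f"${salary * 1000:,.0f}")  # $64,000

# One extra year adds about 3.2 thousand dollars; floating-point arithmetic
# may show a tiny rounding error in the last digits.
print(predicted_salary_thousands(11) - predicted_salary_thousands(10))
```

Remember that y is in thousands, so multiplying by 1,000 converts the prediction to dollars.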
Problem 3 (medium)
Question:
The line of best fit is . A data point has coordinates . What is the residual for this point?
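Solution:
Step 1: Find the predicted value by substituting the point's x-value into the line equation: predicted y = 85.
Step 2: Calculate the residual: residual = actual − predicted = 88 − 85 = 3.
Answer: The residual is 3.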
Interpretation: The actual value (88) is 3 units ABOVE the predicted value (85), so this point lies above the line of best fit.
Problem 4 (hard)
Question:
A researcher collects data and finds a strong positive correlation coefficient r between ice cream sales and drowning incidents. Can the researcher conclude that eating ice cream causes drowning?
Solution:
Answer: No!
Explanation: A strong positive correlation shows that ice cream sales and drowning incidents are associated: they tend to increase together. However, correlation does NOT prove causation.
What's really happening: There is a confounding variable, hot weather. When it's hot:
- More people buy ice cream
- More people go swimming → more drownings
The heat is the common cause. Ice cream doesn't cause drowning.
SAT Rule: Only a randomized controlled experiment can establish causation. Observational studies can only show association.
Problem 5 (expert)
Question:
A line of best fit is for data where ranges from to . Which of the following predictions is most reliable?
A) Predicting when B) Predicting when C) Predicting when D) Predicting when
Solution:
Key concept: Interpolation vs. Extrapolation
The data ranges from to .
A) This x-value is WITHIN the data range → interpolation → most reliable ✓
B) Beyond the range → extrapolation → less reliable ✗
C) Far beyond the range → extrapolation → unreliable ✗
D) Below the range → extrapolation → unreliable ✗
Answer: A
SAT Tip: Predictions within the data range (interpolation) are more trustworthy than predictions outside it (extrapolation). The farther outside the range, the less reliable the prediction.