A scatterplot displays the relationship between two quantitative variables as points on a coordinate plane.
x-axis: independent (explanatory) variable
y-axis: dependent (response) variable
Types of Association
Pattern
Description
Example
Positive
📚 Practice Problems
Problem 1 (easy)
❓ Question:
A scatterplot shows the relationship between hours of sleep (x) and alertness score (y). The points trend upward from left to right and cluster closely around a line. How would you describe this association?
💡 Solution:
Direction: Points trend upward → positive association
Analyze scatterplots, determine lines of best fit, interpret slope and intercepts in context, and make predictions using linear and nonlinear models.
What course covers Scatterplots and Line of Best Fit?
Scatterplots and Line of Best Fit is part of the SAT Prep course on Study Mondo, specifically in the Problem Solving and Data Analysis section. You can explore the full course for more related topics and practice resources.
Positive: As x increases, y increases (example: height vs. weight)
Negative: As x increases, y decreases (example: temperature vs. hot chocolate sales)
None: No clear pattern (example: shoe size vs. GPA)
Strength of Association
Strong: Points cluster tightly around a line/curve
Weak: Points are widely scattered
Nonlinear: Points follow a curve, not a line
Line of Best Fit (Regression Line)
The line of best fit is the straight line that best approximates the data.
y=mx+b
m (slope): The predicted change in y for each 1-unit increase in x
b (y-intercept): The predicted value of y when x=0
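As a rough illustration of where m and b come from, the sketch below fits a line by ordinary least squares. That is one common way to define "best fit"; these notes do not say which method is used, so treat it as an assumption. The data points and the function name fit_line are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data: hours studied (x) and test score (y).
hours = [0, 1, 2, 3, 4]
scores = [10, 13, 14, 18, 20]
m, b = fit_line(hours, scores)
print(f"y = {m}x + {b}")
```

For this made-up data the fitted line works out to y = 2.5x + 10, even though none of the points except the middle one needs to sit exactly on it.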
Interpreting Slope in Context
"For each additional [unit of x], [the y variable] is predicted to [increase/decrease] by [slope] [units of y]."
Example: If y=2.5x+10 models the relationship between hours studied (x) and test score (y):
"For each additional hour of study, the test score is predicted to increase by 2.5 points."
Interpreting the y-Intercept
"When [x variable] is 0, the predicted [y variable] is [b]."
Note: The y-intercept may not always make practical sense (e.g., "0 hours of study" may be unrealistic).
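One way to check an interpretation of slope and y-intercept is to evaluate the model directly. The sketch below uses the example line y=2.5x+10 from above; the function name predict is an illustrative choice, not something these notes define.

```python
def predict(x, m=2.5, b=10):
    """Predicted test score for x hours of study, using y = 2.5x + 10."""
    return m * x + b

# Slope: the change in the prediction for one extra hour of study.
change_per_hour = predict(5) - predict(4)
# y-intercept: the prediction when x = 0.
score_at_zero = predict(0)
print(change_per_hour, score_at_zero)
```

The difference between predictions one hour apart equals the slope (2.5 points), and the prediction at 0 hours equals the y-intercept (10 points).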
Residuals
Residual=Actual value−Predicted value
Positive residual: Actual > Predicted (point is ABOVE the line)
Negative residual: Actual < Predicted (point is BELOW the line)
Zero residual: Point is exactly ON the line
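The sign rules above can be sketched in a few lines of Python. The numbers come from Practice Problem 3 on this page (line y=−0.5x+100, point (30, 88)); the function names residual and position are illustrative.

```python
def residual(actual, predicted):
    """Residual = actual value - predicted value."""
    return actual - predicted

def position(res):
    """Describe where the point sits relative to the line."""
    if res > 0:
        return "above the line"
    if res < 0:
        return "below the line"
    return "on the line"

# Line and point from Practice Problem 3: y = -0.5x + 100, point (30, 88).
predicted = -0.5 * 30 + 100
res = residual(88, predicted)
print(res, position(res))
```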
Residual Plots
A good model has residuals that are randomly scattered around zero. A pattern in residuals suggests the model is not a good fit.
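To see what a "pattern in residuals" can look like, the sketch below fits a straight line to data that actually follow a curve. The data are hypothetical, and least squares is an assumed fitting method.

```python
# Hypothetical curved data: y = x squared.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

# Fit a least-squares line (an assumed fitting method) to the curved data.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
b = mean_y - m * mean_x

# Residual = actual - predicted, listed in order of increasing x.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print(residuals)
```

The residuals start positive, dip negative in the middle, and end positive again. That U shape, rather than random scatter around zero, is the kind of pattern that suggests a linear model is not a good fit.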
Correlation Coefficient (r)
r value and meaning:
r=1: Perfect positive linear relationship
r=−1: Perfect negative linear relationship
r=0: No linear relationship
|r| close to 1: Strong linear relationship
|r| close to 0: Weak linear relationship
Remember:
r only measures LINEAR relationships
r does NOT indicate causation
Outliers can significantly affect r
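The first reminder can be checked numerically. The sketch below computes r with the standard Pearson formula (the SAT does not ask for this calculation by hand) on two hypothetical data sets: one exactly on a line, and one that follows a curve.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on the line y = 2x + 1.
r_line = correlation([1, 2, 3, 4], [3, 5, 7, 9])
# Hypothetical U-shaped data (y = x squared): clearly related, but not linearly.
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(r_line, r_curve)
```

The points on a line give r = 1, while the U-shaped points give r = 0 even though y is completely determined by x. A value of r near 0 therefore rules out a linear relationship, not every relationship.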
Making Predictions
Interpolation vs. Extrapolation
Interpolation: Predicting within the range of data → generally reliable
Extrapolation: Predicting beyond the data range → less reliable (may be inaccurate)
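The distinction comes down to a range check, sketched below. The data range (x from 5 to 50) and the x values are taken from Practice Problem 5 on this page; the function name classify_prediction is illustrative.

```python
def classify_prediction(x, x_min, x_max):
    """Label a prediction by whether x lies inside the observed data range."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"

# Data range from Practice Problem 5: x runs from 5 to 50.
for x in (30, 80, 100, -5):
    print(x, classify_prediction(x, 5, 50))
```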
SAT Question Types
Type 1: Describe the Association
"Which best describes the relationship?" → positive/negative, strong/weak, linear/nonlinear
Type 2: Interpret Slope or y-Intercept
"In context, what does the slope represent?" → rate of change per unit
Type 3: Find a Residual
Given a point and the line equation, calculate residual = actual − predicted.
Type 4: Make a Prediction
Use the line equation to predict y for a given x value.
Type 5: Identify an Outlier
The point farthest from the line of best fit, measured vertically (the residual with the largest absolute value, whether positive or negative).
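For Type 5 questions, the search for the farthest point can be sketched as follows. The line y = 2x + 1, the data points, and the function name largest_residual_point are all hypothetical.

```python
def largest_residual_point(points, m, b):
    """Return the (x, y) point with the largest absolute residual."""
    return max(points, key=lambda p: abs(p[1] - (m * p[0] + b)))

# Hypothetical line y = 2x + 1 and hypothetical data points.
points = [(1, 3), (2, 5.5), (3, 12), (4, 9)]
outlier = largest_residual_point(points, m=2, b=1)
print(outlier)
```

Here the residuals are 0, 0.5, 5, and 0, so the point (3, 12) stands out. Taking the absolute value matters: a point far below the line is just as much an outlier as one far above it.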
Common SAT Mistakes
Claiming causation from a scatterplot — scatterplots show ASSOCIATION, not causation
Extrapolating too far beyond the data range
Misinterpreting slope — it's per unit change, not total change
Confusing positive and negative residuals — positive = above the line
Ignoring the context — always interpret slope and intercept in terms of the actual variables
Strength: Points cluster closely → strong association
Form: Follows a line → linear association
Answer: Strong, positive, linear association.
In context: As hours of sleep increase, alertness scores tend to increase.
Problem 2 (medium)
❓ Question:
The line of best fit for a scatterplot relating years of experience (x) to salary in thousands (y) is y=3.2x+32. Interpret the slope in context.
💡 Solution:
The slope is 3.2.
Interpretation: For each additional year of experience, the predicted salary increases by $3,200 (3.2 thousand dollars).
Template: "For each 1-unit increase in [x-variable], the [y-variable] is predicted to [increase/decrease] by [slope] [units]."
Note: The y-intercept of 32 means a person with 0 years of experience has a predicted salary of $32,000.
Problem 3 (medium)
❓ Question:
The line of best fit is y=−0.5x+100. A data point has coordinates (30,88). What is the residual for this point?
💡 Solution:
Step 1: Find the predicted value at x=30:
ŷ = −0.5(30) + 100 = −15 + 100 = 85
Problem 4 (hard)
❓ Question:
A researcher collects data and finds a correlation coefficient of r=0.85 between ice cream sales and drowning incidents. Can the researcher conclude that eating ice cream causes drowning?
💡 Solution:
Answer: No!
Explanation: A strong correlation (r=0.85) shows that ice cream sales and drowning incidents are associated — they tend to increase together. However, correlation does NOT prove causation.
What's really happening: There is a confounding variable — hot weather. When it's hot:
More people buy ice cream
More people go swimming → more drownings
The heat is the common cause. Ice cream doesn't cause drowning.
SAT Rule: Only a randomized controlled experiment can establish causation. Observational studies can only show association.
Problem 5 (expert)
❓ Question:
A line of best fit is y=1.8x+22 for data where x ranges from 5 to 50. Which of the following predictions is most reliable?
A) Predicting y when x=30
B) Predicting y when x=80
C) Predicting y when x=100
D) Predicting y when x=−5
💡 Solution:
Key concept: Interpolation vs. Extrapolation
The data ranges from x=5 to x=50.
A) x=30: Within the data range → interpolation → most reliable ✓
B) x=80: Beyond the range → extrapolation → less reliable ✗
C) x=100: Far beyond the range → extrapolation → unreliable ✗
D) x=−5: Below the range → extrapolation → unreliable ✗
Predicted value: ŷ = −15 + 100 = 85
Step 2: Calculate the residual:
Residual=Actual−Predicted=88−85=3
Answer: The residual is +3.
Interpretation: The actual value (88) is 3 units ABOVE the predicted value (85), so this point lies above the line of best fit.
In short: A) x=30 is interpolation; B) x=80, C) x=100, and D) x=−5 are extrapolation.
Answer: A
SAT Tip: Predictions within the data range (interpolation) are more trustworthy than predictions outside it (extrapolation). The farther outside the range, the less reliable the prediction.