Scatterplots and Line of Best Fit

Analyze scatterplots, determine lines of best fit, interpret slope and intercepts in context, and make predictions using linear and nonlinear models.

Scatterplots and Line of Best Fit on the SAT

What Is a Scatterplot?

A scatterplot displays the relationship between two quantitative variables as points on a coordinate plane.

  • $x$-axis: independent (explanatory) variable
  • $y$-axis: dependent (response) variable

Types of Association

| Pattern | Description | Example |
|---|---|---|
| Positive | As $x$ increases, $y$ increases | Height vs. weight |
| Negative | As $x$ increases, $y$ decreases | Temperature vs. hot chocolate sales |
| None | No clear pattern | Shoe size vs. GPA |

Strength and Form of Association

  • Strong: Points cluster tightly around a line/curve
  • Weak: Points are widely scattered
  • Nonlinear: Points follow a curve, not a line

Line of Best Fit (Regression Line)

The line of best fit is the straight line that best approximates the data.

$$y = mx + b$$

  • $m$ (slope): The predicted change in $y$ for each 1-unit increase in $x$
  • $b$ ($y$-intercept): The predicted value of $y$ when $x = 0$
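SAT questions supply the line of best fit, so you will not compute it by hand. As a rough illustration of where the slope and intercept come from, the sketch below uses ordinary least squares, one common way to fit such a line. The function name `fit_line` and the data are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (sum of x-deviation * y-deviation) / (sum of squared x-deviations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data: hours studied (x) and test score (y).
hours = [2, 4, 6, 8]
scores = [16, 19, 24, 31]

m, b = fit_line(hours, scores)
print(f"y = {m}x + {b}")  # y = 2.5x + 10.0
```

None of the four points lies exactly on the fitted line; the line balances the points above it against the points below it.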

Interpreting Slope in Context

"For each additional [unit of $x$], [the $y$ variable] is predicted to [increase/decrease] by [slope] [units of $y$]."

Example: If $y = 2.5x + 10$ models the relationship between hours studied ($x$) and test score ($y$): "For each additional hour of study, the test score is predicted to increase by 2.5 points."
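To see why the slope is a per-unit change, compare the model's predictions at $x$ hours and at $x + 1$ hours. The notation $\hat{y}(x)$ for the predicted score is introduced here only for this check.

```latex
\begin{aligned}
\hat{y}(x + 1) - \hat{y}(x) &= \bigl[2.5(x + 1) + 10\bigr] - \bigl[2.5x + 10\bigr] \\
                            &= 2.5x + 2.5 + 10 - 2.5x - 10 \\
                            &= 2.5
\end{aligned}
```

The difference does not depend on $x$, which is why the same 2.5-point increase applies to every additional hour.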

Interpreting the $y$-Intercept

"When [$x$ variable] is 0, the predicted [$y$ variable] is [$b$]."

Note: The $y$-intercept may not always make practical sense (e.g., "0 hours of study" may be unrealistic).


Residuals

$$\text{Residual} = \text{Actual value} - \text{Predicted value}$$

  • Positive residual: Actual > Predicted (point is ABOVE the line)
  • Negative residual: Actual < Predicted (point is BELOW the line)
  • Zero residual: Point is exactly ON the line
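The sign rules above can be checked with a short sketch. The model $y = 2.5x + 10$ and the three data points are assumptions chosen for illustration, and `residual` and `position` are hypothetical helper names.

```python
def residual(m, b, x, y_actual):
    """Actual value minus the value predicted by y = m*x + b."""
    y_predicted = m * x + b
    return y_actual - y_predicted


def position(res):
    """Describe where a point sits relative to the line."""
    if res > 0:
        return "above the line"
    if res < 0:
        return "below the line"
    return "on the line"


# Assumed model and made-up data points for illustration.
m, b = 2.5, 10
points = [(2, 16), (4, 19), (6, 25)]

for x, y in points:
    res = residual(m, b, x, y)
    print(f"({x}, {y}): residual = {res}, {position(res)}")
# (2, 16): residual = 1.0, above the line
# (4, 19): residual = -1.0, below the line
# (6, 25): residual = 0.0, on the line
```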

Residual Plots

A good model has residuals that are randomly scattered around zero. A pattern in residuals suggests the model is not a good fit.
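As a sketch of what a pattern in the residuals looks like, the example below fits a least-squares line to made-up data that actually follows a curve. The helper `fit_line` is a hypothetical name, and least squares is assumed as the fitting method.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Made-up data that follows the curve y = x**2, not a line.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]

m, b = fit_line(xs, ys)  # best-fitting line is y = 4x - 2
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. That U shape is a pattern rather than random scatter around zero, which suggests a linear model is the wrong choice for this data.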


Correlation Coefficient ($r$)

| $r$ value | Meaning |
|---|---|
| $r = 1$ | Perfect positive linear relationship |
| $r = -1$ | Perfect negative linear relationship |
| $r = 0$ | No linear relationship |
| $\lvert r \rvert$ close to 1 | Strong linear relationship |
| $\lvert r \rvert$ close to 0 | Weak or no linear relationship |

Remember:

  • $r$ only measures LINEAR relationships
  • $r$ does NOT indicate causation
  • Outliers can significantly affect $r$
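The SAT does not ask you to compute the correlation coefficient, but a small sketch can make two of the reminders above concrete. It uses the standard Pearson formula; the function name `correlation` and all data are made up for illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Points exactly on a rising line: r = 1.
print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0

# A perfect U-shaped curve (y = x**2): a clear relationship, but not a linear one.
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0

# One outlier replaces the last point of a perfect line.
r_clean = correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
r_outlier = correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, -5])
print(r_clean, round(r_outlier, 2))  # 1.0 -0.45
```

In the last case a single outlier is enough to turn a perfect positive correlation into a weak negative one.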

Making Predictions

Interpolation vs. Extrapolation

  • Interpolation: Predicting within the range of data → generally reliable
  • Extrapolation: Predicting beyond the data range → less reliable (may be inaccurate)
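A minimal sketch of the distinction, assuming a model $y = 2.5x + 10$ fit to made-up data in which $x$ runs from 2 to 8 hours. The function name `predict_with_label` is hypothetical.

```python
def predict_with_label(m, b, x, x_min, x_max):
    """Return the predicted y and whether x lies inside the observed range."""
    y_hat = m * x + b
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind


# Assumed model y = 2.5x + 10, fit to made-up data with x from 2 to 8 hours.
m, b = 2.5, 10
x_min, x_max = 2, 8

print(predict_with_label(m, b, 5, x_min, x_max))   # (22.5, 'interpolation')
print(predict_with_label(m, b, 40, x_min, x_max))  # (110.0, 'extrapolation')
```

If the test were scored out of 100, the second prediction would be impossible. The line describes the trend only where data was actually collected.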

SAT Question Types

Type 1: Describe the Association

"Which best describes the relationship?" → positive/negative, strong/weak, linear/nonlinear

Type 2: Interpret Slope or $y$-Intercept

"In context, what does the slope represent?" → rate of change per unit

Type 3: Find a Residual

Given a point and the line equation, calculate residual = actual − predicted.

Type 4: Make a Prediction

Use the line equation to predict $y$ for a given $x$ value.

Type 5: Identify an Outlier

The point farthest from the line of best fit, that is, the point whose residual is largest in absolute value.
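On the SAT you usually spot this point by eye. As a sketch of the same idea in code, the example below treats "farthest from the line" as the largest vertical distance from the line; the model, the data, and the name `find_outlier` are assumptions for illustration.

```python
def find_outlier(m, b, points):
    """Return the point with the largest absolute residual from y = m*x + b."""
    return max(points, key=lambda p: abs(p[1] - (m * p[0] + b)))


# Assumed model and made-up data; (5, 35) sits far above the line.
m, b = 2.5, 10
points = [(2, 16), (4, 19), (5, 35), (6, 24), (8, 31)]

print(find_outlier(m, b, points))  # (5, 35)
```

The absolute value matters: a point far below the line has a large negative residual and is just as much an outlier as one far above it.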


Common SAT Mistakes

  1. Claiming causation from a scatterplot: scatterplots show ASSOCIATION, not causation
  2. Extrapolating too far beyond the data range
  3. Misinterpreting slope: it is a per-unit change, not a total change
  4. Confusing positive and negative residuals: positive means the point is above the line
  5. Ignoring the context: always interpret slope and intercept in terms of the actual variables

Practice Problems

Problem 1 (easy)

Question:

A scatterplot shows the relationship between hours of sleep ($x$) and alertness score ($y$). The points trend upward from left to right and cluster closely around a line. How would you describe this association?

Solution:

Direction: Points trend upward → positive association

Strength: Points cluster closely → strong association

Form: Follows a line → linear association

Answer: Strong, positive, linear association.

In context: As hours of sleep increase, alertness scores tend to increase.

Problem 2 (medium)

Question:

The line of best fit for a scatterplot relating years of experience ($x$) to salary in thousands of dollars ($y$) is $y = 3.2x + 32$. Interpret the slope in context.

Solution:

The slope is 3.2.

Interpretation: For each additional year of experience, the predicted salary increases by $3,200 (3.2 thousand dollars).

Template: "For each 1-unit increase in [x-variable], the [y-variable] is predicted to [increase/decrease] by [slope] [units]."

Note: The $y$-intercept of 32 means a person with 0 years of experience has a predicted salary of $32,000.

Problem 3 (medium)

Question:

The line of best fit is $y = -0.5x + 100$. A data point has coordinates $(30, 88)$. What is the residual for this point?

Solution:

Step 1: Find the predicted value at $x = 30$: $\hat{y} = -0.5(30) + 100 = -15 + 100 = 85$

Step 2: Calculate the residual: $\text{Residual} = \text{Actual} - \text{Predicted} = 88 - 85 = 3$

Answer: The residual is $+3$.

Interpretation: The actual value (88) is 3 units ABOVE the predicted value (85), so this point lies above the line of best fit.

Problem 4 (hard)

Question:

A researcher collects data and finds a correlation coefficient of $r = 0.85$ between ice cream sales and drowning incidents. Can the researcher conclude that eating ice cream causes drowning?

Solution:

Answer: No!

Explanation: A strong correlation ($r = 0.85$) shows that ice cream sales and drowning incidents are associated: they tend to increase together. However, correlation does NOT prove causation.

What's really happening: There is a confounding variable, hot weather. When it's hot:

  • More people buy ice cream
  • More people go swimming → more drownings

The heat is the common cause. Ice cream doesn't cause drowning.

SAT Rule: Only a randomized controlled experiment can establish causation. Observational studies can only show association.

Problem 5 (expert)

Question:

A line of best fit is $y = 1.8x + 22$ for data where $x$ ranges from 5 to 50. Which of the following predictions is most reliable?

  • A) Predicting $y$ when $x = 30$
  • B) Predicting $y$ when $x = 80$
  • C) Predicting $y$ when $x = 100$
  • D) Predicting $y$ when $x = -5$

Solution:

Key concept: Interpolation vs. Extrapolation

The data ranges from $x = 5$ to $x = 50$.

  • A) $x = 30$: This is WITHIN the data range → interpolation → most reliable ✓
  • B) $x = 80$: Beyond the range → extrapolation → less reliable ✗
  • C) $x = 100$: Far beyond the range → extrapolation → unreliable ✗
  • D) $x = -5$: Below the range → extrapolation → unreliable ✗

Answer: A

SAT Tip: Predictions within the data range (interpolation) are more trustworthy than predictions outside it (extrapolation). The farther outside the range, the less reliable the prediction.