Scatterplots and Line of Best Fit

Interpret scatterplots, correlation, and trend lines

Scatterplots and Line of Best Fit (SAT Math)

What is a Scatterplot?

Graph showing relationship between two variables

Each point represents:

  • x-coordinate: one variable
  • y-coordinate: another variable

Example: Height vs. Weight

  • Each point = one person
  • x = height
  • y = weight

Types of Correlation

Positive Correlation

As x increases, y increases

Pattern: Points slope upward (↗)

Examples:

  • Study time vs. test scores
  • Temperature vs. ice cream sales
  • Height vs. shoe size

Negative Correlation

As x increases, y decreases

Pattern: Points slope downward (↘)

Examples:

  • Speed vs. travel time
  • Price vs. quantity demanded
  • Outdoor temperature vs. heating costs

No Correlation

No clear pattern

Pattern: Points scattered randomly

Examples:

  • Shoe size vs. test scores
  • Height vs. favorite color

Strength of Correlation

Strong Correlation

Points cluster tightly around a line

  • Clear pattern
  • Easy to predict

Weak Correlation

Points loosely follow pattern

  • General trend but lots of variation
  • Harder to predict

Perfect Correlation

All points exactly on a line

  • Rare in real data
  • r = 1 (positive) or r = -1 (negative)

Line of Best Fit (Trend Line)

What Is It?

Line that best represents the data trend

Also called:

  • Regression line
  • Trend line
  • Best-fit line

Equation Form

Usually written as: y = mx + b

Where:

  • m = slope (rate of change)
  • b = y-intercept (starting value)

Or: ŷ = ax + b (ŷ = predicted value)
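SAT questions hand you the equation, but it helps to see where one comes from. The usual regression line is the ordinary least-squares line, which makes the sum of squared vertical distances from the points to the line as small as possible. A minimal Python sketch, assuming plain lists of numbers; `fit_line` and the study-time data are hypothetical, for illustration only:

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    m = sxy / sxx
    b = y_bar - m * x_bar  # the line always passes through (x_bar, y_bar)
    return m, b

# Hypothetical data: hours studied (x) vs. test score (y)
hours = [1, 2, 3, 4, 5]
scores = [62, 66, 71, 74, 79]
m, b = fit_line(hours, scores)
print(f"y = {m:.2f}x + {b:.2f}")  # y = 4.20x + 57.80
```

Python 3.10 and later also ship `statistics.linear_regression`, which returns the same slope and intercept.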

Interpreting Slope

Slope (m) Meaning

Positive slope (m > 0):

  • Positive correlation
  • For every 1 unit increase in x, y increases by m

Example: y = 2x + 10 (x = hours of study, y = test score)

  • For every 1 hour of study, the predicted score increases by 2 points

Negative slope (m < 0):

  • Negative correlation
  • For every 1 unit increase in x, y decreases by |m|

Example: y = -3x + 100 (x = speed in mph, y = travel time in minutes)

  • For every 1 mph faster, the predicted travel time decreases by 3 minutes
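One way to see why the slope is the "change per unit" is to compute the prediction at x and at x + 1 and subtract. A minimal Python sketch using the travel-time line y = -3x + 100 from the example above; `predict` is a hypothetical helper name:

```python
def predict(m, b, x):
    """Predicted y on the line y = m*x + b."""
    return m * x + b

# Line from the example: x = speed (mph), y = travel time (minutes)
m, b = -3, 100
for speed in (10, 11, 12):
    change = predict(m, b, speed + 1) - predict(m, b, speed)
    # every line reports a change of -3, which is exactly m
    print(f"{speed} -> {speed + 1} mph: change in travel time = {change} minutes")
```

The same subtraction with y = 2x + 10 gives +2 every time: the m·x terms differ by exactly m, and b cancels.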

Interpreting Y-Intercept

Y-Intercept (b) Meaning

Value of y when x = 0

Example: y = 5x + 20 (x = hours of study, y = test score)

  • When study time = 0, predicted score = 20

Watch out: Sometimes x = 0 doesn't make sense!

  • If x = year (like 2020), y-intercept is for year 0 (not useful!)

Making Predictions

Interpolation

Predicting within the data range

Generally reliable

Example: Data from x = 10 to x = 50

  • Predicting at x = 30 → interpolation ✓

Extrapolation

Predicting outside the data range

Less reliable - pattern may not continue!

Example: Data from x = 10 to x = 50

  • Predicting at x = 100 → extrapolation ⚠️
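The interpolation/extrapolation check is just a range test on x. A minimal Python sketch, where `prediction_type` is a hypothetical helper and the data range matches the examples above (x = 10 to x = 50):

```python
def prediction_type(xs, x_new):
    """Label a prediction at x_new relative to the observed x-range."""
    lo, hi = min(xs), max(xs)
    return "interpolation" if lo <= x_new <= hi else "extrapolation"

data_x = [10, 20, 30, 40, 50]        # observed x values
print(prediction_type(data_x, 30))   # interpolation
print(prediction_type(data_x, 100))  # extrapolation
```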

Outliers

What is an Outlier?

Point far from the general pattern

Effects:

  • Can significantly affect line of best fit
  • May indicate error or special case

On SAT:

  • Questions may ask about outliers
  • "Which point doesn't fit the pattern?"
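A quick way to find the point "farthest from the trend" is to fit a line and pick the point with the largest absolute residual (actual minus predicted). The Python sketch below uses hypothetical helpers `fit_line` (ordinary least squares) and `farthest_point` with made-up data, then refits without that point so you can compare the two lines. Largest residual is one simple heuristic, not a formal outlier test; on the SAT you judge it by eye.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept (m, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

def farthest_point(xs, ys):
    """Index of the point with the largest absolute residual from the fitted line."""
    m, b = fit_line(xs, ys)
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    return max(range(len(xs)), key=lambda i: abs(residuals[i]))

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 20, 10, 12]  # hypothetical: (4, 20) breaks the y = 2x pattern
i = farthest_point(xs, ys)
print("farthest point:", (xs[i], ys[i]))
print("fit with it:   ", fit_line(xs, ys))
print("fit without it:", fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]))
```

With the outlier the slope is about 2.34 and the intercept 0.8; without it the line is y = 2x (up to rounding), which shows how much a single point can pull the line of best fit.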

Correlation vs. Causation

CRITICAL DISTINCTION!

Correlation: Variables are related

Causation: One variable CAUSES a change in the other

Correlation ≠ Causation!

Example:

  • Ice cream sales and drowning deaths are correlated
  • But ice cream doesn't CAUSE drowning!
  • Both are driven by a third factor (hot weather!)

SAT Trap: Don't assume correlation means causation!

Correlation Coefficient (r)

What is r?

Number measuring the strength and direction of a linear correlation

Range: -1 ≤ r ≤ 1

  • r = 1: Perfect positive correlation
  • r = 0.8: Strong positive correlation
  • r = 0.5: Moderate positive correlation
  • r = 0: No linear correlation
  • r = -0.5: Moderate negative correlation
  • r = -0.8: Strong negative correlation
  • r = -1: Perfect negative correlation

Interpreting r

Sign (+ or -): Direction

  • Positive = positive correlation
  • Negative = negative correlation

Magnitude (how close |r| is to 1): Strength

  • Close to 1 or -1 = strong
  • Close to 0 = weak or no linear correlation
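For reference, r is the Pearson correlation coefficient. The SAT won't ask you to compute it, but a short Python sketch shows how the sign and size come out of the data. `correlation` and the data are hypothetical; Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson r.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y never varies")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [3, 5, 7, 9, 11]))  # 1.0  (all points on an upward line)
print(correlation(xs, [10, 8, 6, 4, 2]))  # -1.0 (all points on a downward line)
print(correlation(xs, [2, 5, 3, 6, 4]))   # 0.5  (loose upward trend: moderate)
```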

Residuals

What is a Residual?

Difference between actual value and predicted value

Formula: Residual = Actual - Predicted

  • Positive residual: Point above line (actual > predicted)
  • Negative residual: Point below line (actual < predicted)
  • Zero residual: Point exactly on line
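Computing residuals takes one subtraction per point. A minimal Python sketch, assuming the line y = 2x + 10 and three made-up points; `residuals` is a hypothetical helper name:

```python
def residuals(xs, ys, m, b):
    """Residual = actual - predicted, for the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

# Hypothetical points checked against y = 2x + 10 (predicted: 12, 14, 16)
xs = [1, 2, 3]
ys = [13, 14, 15]
for x, r in zip(xs, residuals(xs, ys, 2, 10)):
    where = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x = {x}: residual {r:+} ({where} the line)")
```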

Residual Plots

Graph of residuals (usually plotted against x)

  • Random scatter around 0: Good fit
  • Clear pattern (such as a curve): Poor fit (need a different model)
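To see what a "clear pattern" in residuals looks like, fit a straight line to data that is actually curved. This Python sketch uses hypothetical points on y = x² and a hypothetical least-squares helper `fit_line`; the residuals come out positive at both ends and negative in the middle, a U-shape that signals a line is the wrong model.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept (m, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]        # curved (quadratic) data
m, b = fit_line(xs, ys)          # best straight line: y = 4x - 2
res = [y - (m * x + b) for x, y in zip(xs, ys)]
print(res)  # [2.0, -1.0, -2.0, -1.0, 2.0]: a U-shape, not random
```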

SAT Question Types

Type 1: Interpret Slope

"What does the slope represent?"

Answer: Rate of change, change in y per unit change in x

Type 2: Use Equation to Predict

"According to the line, what is y when x = 10?"

Plug in: y = m(10) + b

Type 3: Identify Correlation

"Which best describes the relationship?"

Look at: Direction and strength of pattern

Type 4: Find Outlier

"Which point is farthest from the trend?"

Look for: Point that doesn't fit pattern

Type 5: Correlation vs. Causation

"Does x cause y?"

Remember: Correlation doesn't prove causation!

SAT Strategies

Read the Axes!

Always check what variables are being plotted

Look at the Pattern

Upward slope = positive, downward = negative

Use the Equation

Plug in values - don't try to eyeball!

Check Units

Slope units = (y units) per (x unit)
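The units follow directly from rise over run: the rise carries y's units and the run carries x's. A small Python sketch with two hypothetical points read from a time-vs-distance trend line (x in hours, y in miles); `slope` is a hypothetical helper:

```python
def slope(p1, p2):
    """Rise over run between two points given as (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)

# Hypothetical points: (hours, miles)
m = slope((1, 55), (3, 165))
print(f"slope = {m} miles per hour")  # slope = 55.0 miles per hour
```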

Remember Real-World Context

Does the answer make sense?

Common SAT Patterns

Temperature and Sales

Often positive correlation

  • Hot temperature → more cold drinks sold

Time and Distance

Positive correlation for travel

  • More time → more distance covered

Price and Demand

Negative correlation

  • Higher price → lower demand

Practice and Performance

Positive correlation

  • More practice → better performance

SAT Tips

  • Positive correlation: Both increase together (upward slope ↗)
  • Negative correlation: One increases, other decreases (downward slope ↘)
  • No correlation: Random scatter, no pattern
  • Strong correlation: Points cluster tightly around line
  • Weak correlation: Points loosely follow pattern
  • Slope (m): Rate of change (rise/run)
  • Y-intercept (b): Value when x = 0
  • Outlier: Point far from pattern
  • Interpolation: Predicting within data range (reliable)
  • Extrapolation: Predicting outside data range (less reliable)
  • Correlation ≠ Causation: Related doesn't mean one causes other!
  • Use the equation: Plug in values to predict
  • Read axes carefully: Know what x and y represent
  • Context matters: Does answer make real-world sense?
  • r close to 1 or -1: Strong correlation
  • r close to 0: Weak or no correlation

📚 Practice Problems

Problem 1 (Easy)

Question:

A scatterplot shows the relationship between hours studied (x-axis) and test scores (y-axis). The points show an upward trend from left to right. This indicates:

A) Negative correlation
B) Positive correlation
C) No correlation
D) Causation


Solution:

Pattern: Upward trend (↗)

Meaning: As x increases, y increases

This is positive correlation!

Check choices:

  • A) Negative → downward slope ✗
  • B) Positive → upward slope ✓
  • C) No correlation → random scatter ✗
  • D) Causation → correlation doesn't prove causation ✗

Answer: B

Why not D? The scatterplot shows correlation, but it doesn't prove that studying CAUSES higher scores (it likely does, but the graph alone can't prove it!)

SAT Tip: Upward slope = positive correlation; Downward slope = negative correlation!

Problem 2 (Medium)

Question:

A line of best fit has equation y = 3x + 15, where x represents hours worked and y represents earnings in dollars. What does the slope represent?

A) Total earnings
B) Earnings when hours = 0
C) Dollars earned per hour
D) Total hours worked


Solution:

Equation: y = 3x + 15

Slope = 3

Slope meaning: Change in y per unit change in x

In context:

  • x = hours worked
  • y = earnings (dollars)
  • Slope = change in dollars per hour

Slope = 3 means earning $3 per hour

Check choices:

  • A) Total earnings → that's y, not slope ✗
  • B) Earnings when hours = 0 → that's y-intercept (15) ✗
  • C) Dollars per hour → YES! ✓
  • D) Total hours → that's x ✗

Answer: C

Note: Y-intercept of 15 might represent a base payment or starting amount.

SAT Tip: Slope = rate of change = (y units) per (x unit)!

Problem 3 (Hard)

Question:

A scatterplot shows the relationship between age of a car (years) and its value (thousands of dollars). The line of best fit is y = -2x + 30. According to the model, what is the predicted value of a 12-year-old car?

A) $6,000
B) $8,000
C) $54,000
D) $66,000


Solution:

Given equation: y = -2x + 30

Variables:

  • x = age (years)
  • y = value (thousands of dollars)

Find: Value when x = 12

Plug in x = 12:

y = -2(12) + 30
y = -24 + 30
y = 6

But y is in THOUSANDS of dollars!

y = 6 thousand = $6,000

Answer: A) $6,000

Check reasonableness:

  • Negative slope (-2) makes sense: car loses value as it ages ✓
  • Starting value (y-intercept) = 30 thousand = $30,000 (new car) ✓
  • Loses $2,000 per year ✓
  • After 12 years: 30 - 24 = 6 thousand ✓
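To double-check both the arithmetic and the unit conversion, here is a short Python sketch; `predict` is a hypothetical helper, and the factor of 1,000 comes from y being measured in thousands of dollars:

```python
def predict(m, b, x):
    """Predicted y on the line y = m*x + b."""
    return m * x + b

m, b = -2, 30                             # from y = -2x + 30
age = 12                                  # years
value_thousands = predict(m, b, age)      # model output, in thousands of dollars
value_dollars = value_thousands * 1_000   # convert to dollars
print(value_thousands, value_dollars)     # 6 6000
```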

SAT Tip: Watch the UNITS! "Thousands of dollars" means multiply by 1,000!