Scatter Plots and Correlations

Analyzing scatter plots and identifying correlations

What is a Scatter Plot?

A scatter plot is a graph showing the relationship between two variables.

Each point represents one data pair (x, y).

Purpose: Identify patterns and correlations between variables

Example data: Study hours vs. test scores for 5 students:

  • (1, 65): 1 hour study, score 65
  • (2, 70): 2 hours study, score 70
  • (3, 80): 3 hours study, score 80
  • (4, 85): 4 hours study, score 85
  • (5, 95): 5 hours study, score 95

Plot these points to see the relationship!

Creating a Scatter Plot

Steps:

  1. Draw axes (x-axis = independent variable, y-axis = dependent variable)
  2. Label axes with variable names and units
  3. Choose an appropriate scale
  4. Plot each ordered pair as a point
  5. Title the graph

Example: Height vs. Shoe Size

Data:

  • Height 60 inches, Size 7
  • Height 64 inches, Size 8
  • Height 68 inches, Size 9
  • Height 72 inches, Size 10

Plot each point: (60, 7), (64, 8), (68, 9), (72, 10)
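The plotting steps can be sketched in a few lines of Python. This is only an illustration: the helper name text_scatter and the tick spacing are assumptions made for this sketch, and it draws a rough text grid rather than a real graph.

```python
# Height (inches) vs. shoe size, taken from the example above.
points = [(60, 7), (64, 8), (68, 9), (72, 10)]

def text_scatter(points, x_step, y_step):
    """Return a rough text scatter plot as a list of strings (top row first)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Choose the scale: evenly spaced ticks covering the data range.
    x_ticks = list(range(min(xs), max(xs) + 1, x_step))
    y_ticks = list(range(max(ys), min(ys) - 1, -y_step))
    marked = set(points)
    rows = []
    for y in y_ticks:
        # Plot each ordered pair: "*" where a data point sits, "." elsewhere.
        cells = ["*" if (x, y) in marked else "." for x in x_ticks]
        rows.append(f"{y:>3} | " + "  ".join(cells))
    # Bottom row: the x-axis tick labels.
    rows.append("      " + " ".join(f"{x:>2}" for x in x_ticks))
    return rows

plot_rows = text_scatter(points, x_step=4, y_step=1)
print("Shoe size (y) vs. height in inches (x)")  # title naming both axes
print("\n".join(plot_rows))
```

The stars climb from the lower left to the upper right, which is the upward trend the next section calls a positive correlation.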

Types of Correlation

Positive Correlation:

  • As x increases, y increases
  • Points trend upward from left to right
  • Slope is positive

Example: Study time vs. test score. More studying → higher scores.

Negative Correlation:

  • As x increases, y decreases
  • Points trend downward from left to right
  • Slope is negative

Example: Speed vs. travel time over a fixed distance. Higher speed → less time.

No Correlation:

  • No clear pattern
  • Points scattered randomly
  • No relationship between variables

Example: Adult shoe size vs. math score. No connection!
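One way to check the direction of a trend numerically is to look at whether x and y tend to sit on the same side of their averages. The sketch below is a simplified illustration, not a standard classroom procedure: the function name direction is made up here, and the trip numbers are hypothetical values for a 120-mile journey.

```python
def direction(points):
    """Classify the trend by the sign of the summed products of deviations."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Positive when x and y tend to be above (or below) their means together.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if co_movement > 0:
        return "positive"
    if co_movement < 0:
        return "negative"
    return "none"

# Study hours vs. test scores, from the lesson.
study = [(1, 65), (2, 70), (3, 80), (4, 85), (5, 95)]
# Hypothetical data for a 120-mile trip: speed (mph) vs. travel time (hours).
trip = [(30, 4.0), (40, 3.0), (60, 2.0), (80, 1.5)]

print(direction(study))  # positive
print(direction(trip))   # negative
```

The sign only gives the direction. Real data almost never sums to exactly zero, so judging "no correlation" still means looking at how tightly the points follow a line.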

Strength of Correlation

Strong Correlation:

  • Points close to a line
  • Clear, tight pattern
  • Strong relationship

Moderate Correlation:

  • Points somewhat close to a line
  • Pattern visible but not tight
  • Moderate relationship

Weak Correlation:

  • Points loosely scattered
  • Vague pattern
  • Weak relationship

Measuring strength: How close points are to forming a line

Correlation vs. Causation

Correlation: Two variables are related (they change together)

Causation: One variable CAUSES the other to change

KEY POINT: Correlation does NOT prove causation!

Example 1: Ice cream sales and drowning deaths

Correlation: Positive (both increase in summer)

Causation? NO! Hot weather drives both; neither causes the other.

Example 2: Study time and test scores

Correlation: Positive

Causation? Likely YES! Studying helps scores.

Example 3: Shoe size and reading ability in children

Correlation: Positive (both increase with age)

Causation? NO! Age drives both.

Critical thinking: Always ask "Could there be another factor?"

Line of Best Fit (Trend Line)

A line that best represents the data pattern.

Purpose: Make predictions from data

Characteristics:

  • Passes through or near most points
  • Roughly equal numbers of points above and below the line (balanced)
  • Keeps the overall distance from the points to the line as small as possible

Drawing by hand:

  1. Identify the pattern (positive/negative)
  2. Draw a line through the "middle" of points
  3. Balance points above and below

Example: Study hours vs. scores

Points: (1, 65), (2, 70), (3, 80), (4, 85), (5, 95)

Line approximately: y = 8x + 57
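A hand-drawn line can be compared with a computed one. The sketch below uses the least-squares method, which the lesson does not name but which is the usual formal version of "minimizing distance"; the function name least_squares_line is chosen for this sketch.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

study = [(1, 65), (2, 70), (3, 80), (4, 85), (5, 95)]
slope, intercept = least_squares_line(study)
print(f"y = {slope}x + {intercept}")  # y = 7.5x + 56.5

# Residual = actual y minus predicted y; positive means the point is above the line.
residuals = [y - (slope * x + intercept) for x, y in study]
print(residuals)  # [1.0, -1.5, 1.0, -1.5, 1.0]
```

The computed line, y = 7.5x + 56.5, is close to the hand estimate y = 8x + 57, and the residuals show the balance: three points sit slightly above the line and two slightly below.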

Making Predictions

Use the line of best fit to predict values.

Interpolation: Predict within data range

Example: Using the line y = 8x + 57, predict the score for 2.5 hours of study:

y = 8(2.5) + 57 = 20 + 57 = 77

Expected score: 77

Extrapolation: Predict outside data range

Example: Predict the score for 10 hours of study:

y = 8(10) + 57 = 137

Warning: Extrapolation is less reliable! (If the test is out of 100, a score of 137 is impossible.)
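The two kinds of prediction can be told apart by checking whether the input falls inside the observed x-values. This is a minimal sketch using the lesson's line y = 8x + 57; the function names predict and prediction_kind are made up for illustration.

```python
def predict(hours, slope=8, intercept=57):
    """Predicted score from the lesson's line of best fit, y = 8x + 57."""
    return slope * hours + intercept

def prediction_kind(hours, observed_hours):
    """Label a prediction by whether the input lies inside the observed data range."""
    lo, hi = min(observed_hours), max(observed_hours)
    return "interpolation" if lo <= hours <= hi else "extrapolation"

observed_hours = [1, 2, 3, 4, 5]  # x-values of the original data
for hours in (2.5, 10):
    print(hours, predict(hours), prediction_kind(hours, observed_hours))
```

For 2.5 hours the sketch reports 77.0 as an interpolation; for 10 hours it reports 137 as an extrapolation, which is the cue to treat the number with suspicion.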

Outliers in Scatter Plots

Outlier: Point far from the general pattern

Example: Study hours vs. scores

Most points follow the pattern, but one student, (4, 50), studied 4 hours and scored only 50.

This is an outlier!

Reasons for outliers:

  • Measurement error
  • Unusual circumstance (student was sick)
  • Special case
  • Data entry mistake

Effect on correlation:

  • Outliers can weaken correlation
  • May affect line of best fit
  • Consider removing only if justified (for example, a confirmed data entry mistake)
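To see how much a single outlier can matter, the sketch below computes the correlation coefficient r for the five study-hours points with and without the outlier (4, 50). It assumes the outlier is added as a sixth student, and pearson_r is a hand-written helper using the standard formula for Pearson's r.

```python
from math import sqrt

def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

study = [(1, 65), (2, 70), (3, 80), (4, 85), (5, 95)]
with_outlier = study + [(4, 50)]  # assumption: the outlier is a sixth student

r_clean = pearson_r(study)
r_outlier = pearson_r(with_outlier)
print(round(r_clean, 2))    # about 0.99
print(round(r_outlier, 2))  # about 0.43
```

One unusual point drops the correlation from strong to weak here. With only a handful of data points, each one carries a lot of weight.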

Linear vs. Nonlinear Patterns

Linear pattern: Points form roughly a straight line

Nonlinear patterns:

Quadratic: Points form a curve (parabola). Example: height of a thrown ball vs. time.

Exponential: Points show rapid increase/decrease. Example: bacterial growth vs. time.

No pattern: Random scatter. Example: phone number vs. height.

For Algebra 1: Focus on linear patterns!

Correlation Coefficient (r)

Measures strength and direction of linear correlation

Range: -1 to +1

  • r = +1: Perfect positive correlation (all points on an upward line)
  • r = 0: No linear correlation
  • r = -1: Perfect negative correlation (all points on a downward line)

Interpreting r (a common rule of thumb; exact cutoffs vary by textbook):

  • 0.8 to 1.0 or -0.8 to -1.0: Strong
  • 0.5 to 0.8 or -0.5 to -0.8: Moderate
  • 0 to 0.5 or 0 to -0.5: Weak

Examples:

  • r = 0.92 → Strong positive correlation
  • r = -0.73 → Moderate negative correlation
  • r = 0.15 → Weak positive correlation
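The interpretation table can be written as a small function. The ranges above overlap at 0.5 and 0.8, so this sketch assumes a boundary value belongs to the stronger category; that choice, and the name describe_r, are assumptions made for illustration.

```python
def describe_r(r):
    """Turn a correlation coefficient into a strength-and-direction label."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)  # strength depends only on the distance from 0
    if size >= 0.8:      # assumption: exactly 0.8 counts as strong
        strength = "strong"
    elif size >= 0.5:    # assumption: exactly 0.5 counts as moderate
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for r in (0.92, -0.73, 0.15):
    print(r, "->", describe_r(r))
```

The sign of r gives the direction and its absolute value gives the strength, so r = -0.73 and r = 0.73 are equally strong.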

Real-World Examples

Example 1: Temperature vs. Ice Cream Sales

Data shows positive correlation: Higher temperature → more sales

Makes sense! (causation likely)

Example 2: Car Age vs. Value

Negative correlation: Older car → lower value

Clear causation!

Example 3: Height vs. Arm Span

Strong positive correlation: Taller people have longer arms

Biological relationship!

Example 4: TV Hours vs. GPA

Negative correlation: More TV → lower GPA

Correlation? Yes. Causation? Maybe! (could be other factors)

Analyzing Scatter Plots

Questions to ask:

  1. Is there a correlation? (positive/negative/none)
  2. How strong is the correlation? (strong/moderate/weak)
  3. Are there outliers?
  4. Is the pattern linear or nonlinear?
  5. Could there be causation?
  6. Are there lurking variables?

Lurking variable: Hidden factor affecting both variables

Example: Study time and scores both affected by student motivation

Creating Scatter Plots from Data Tables

Example: Hours of sleep vs. energy level (1-10 scale)

Data:

  • Student A: 5 hours, energy 4
  • Student B: 7 hours, energy 7
  • Student C: 6 hours, energy 5
  • Student D: 8 hours, energy 8
  • Student E: 4 hours, energy 3

Points: (5, 4), (7, 7), (6, 5), (8, 8), (4, 3)

Pattern: Positive correlation. More sleep → more energy.
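Turning a table into ordered pairs is mechanical, and sorting the pairs by x often makes the trend easier to see before plotting. A minimal sketch follows; the dictionary layout is just one possible way to store the table.

```python
# Data table from the example: student -> (hours of sleep, energy level 1-10).
table = {
    "A": (5, 4),
    "B": (7, 7),
    "C": (6, 5),
    "D": (8, 8),
    "E": (4, 3),
}

# Each row becomes one ordered pair (x, y); sorting orders the pairs by x.
points = sorted(table.values())
print(points)  # [(4, 3), (5, 4), (6, 5), (7, 7), (8, 8)]

# In this small data set, energy rises every time sleep rises.
energies = [energy for _, energy in points]
always_rising = all(a < b for a, b in zip(energies, energies[1:]))
print(always_rising)  # True
```

Real data is rarely this tidy. A positive correlation only needs the overall trend to rise, not every single step.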

Using Technology

Graphing calculators:

  • Enter data in lists
  • Create scatter plot
  • Calculate correlation coefficient
  • Find line of best fit equation

Example: TI-84

  1. STAT → Edit → Enter data in L1 and L2
  2. 2nd → STAT PLOT → Turn on Plot1
  3. ZOOM → ZoomStat
  4. STAT → CALC → LinReg(ax+b) (calculates the equation; r is displayed only if diagnostics are turned on)

Spreadsheet software:

  • Excel, Google Sheets
  • Create chart → Scatter plot
  • Add trendline
  • Display equation and R² value

Common Mistakes to Avoid

  1. Confusing correlation with causation: Just because two things correlate doesn't mean one causes the other!

  2. Ignoring outliers: Outliers can significantly affect the line of best fit.

  3. Extrapolating too far: Predictions far outside the data range are unreliable.

  4. Wrong variable on wrong axis: The independent variable (x) should be on the horizontal axis.

  5. Poor scale choices: The scale should show the pattern clearly without distortion.

  6. Not labeling axes: Always label with variable names and units!

Interpreting Slope in Context

The slope of the line of best fit has real meaning!

Example: Study hours vs. test scores

Line: y = 8x + 57

Slope = 8 means: Each additional hour of study is associated with a predicted score about 8 points higher.

Example: Temperature vs. ice cream sales

Line: y = 12x + 50

Slope = 12 means: Each one-degree increase in temperature is associated with about 12 more predicted sales.

Y-Intercept in Context

Example: y = 8x + 57 (study hours vs. score)

Y-intercept = 57: Predicted score with 0 hours study

Does this make sense? Maybe! (represents prior knowledge)

Example: y = 12x + 50 (temperature vs. sales)

Y-intercept = 50: Predicted sales at 0°F

May not make sense! (shop might be closed)

Lesson: Interpret intercept carefully in context!
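Both readings can be checked by plugging numbers into the two example lines: the intercept is the prediction at x = 0, and the slope is the change in the prediction when x goes up by 1. The function names below are made up for this sketch.

```python
def study_line(hours):
    """Predicted test score: y = 8x + 57."""
    return 8 * hours + 57

def sales_line(temp_f):
    """Predicted ice cream sales: y = 12x + 50."""
    return 12 * temp_f + 50

# Y-intercept: the prediction when x is 0.
print(study_line(0))   # 57
print(sales_line(0))   # 50

# Slope: how much the prediction changes when x increases by 1.
print(study_line(4) - study_line(3))    # 8
print(sales_line(71) - sales_line(70))  # 12
```

The arithmetic always works, but whether the result means anything depends on context: 0 hours of study is plausible, while 0°F may lie far outside the temperatures the data covered.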

Quick Reference

Positive correlation: Both variables increase together

Negative correlation: As one increases, the other decreases

No correlation: No clear relationship

Strong: Points close to line

Weak: Points scattered loosely

Outlier: Point far from pattern

Line of best fit: Line representing overall trend

Interpolation: Predict within data range

Extrapolation: Predict outside data range

Correlation ≠ Causation!

Practice Tips

  • Always label axes with variable names and units
  • Start with independent variable on x-axis
  • Look for overall pattern before drawing line
  • Balance points above and below the line
  • Identify and note outliers
  • Consider whether correlation makes sense
  • Don't assume causation without evidence
  • Practice reading scatter plots from different contexts
  • Use technology to check your work
  • Think critically about lurking variables
  • Apply to real-world situations
  • Understand the difference between interpolation and extrapolation
  • Remember: Scatter plots are about relationships!

Scatter plots are powerful tools for visualizing relationships between variables. Master this skill and you'll be able to analyze data in science, social studies, business, and everyday life!

Practice Problems

Problem 1 (easy)

Question:

Describe the correlation shown: As study time increases, test scores increase.

Solution:

Step 1: Identify the relationship between variables:

  • Variable 1 (x): Study time
  • Variable 2 (y): Test scores
  • Relationship: As x increases, y increases

Step 2: Recall correlation types:

  • Positive correlation: Both variables increase together
  • Negative correlation: As one increases, the other decreases
  • No correlation: No clear relationship

Step 3: Classify this relationship: Since both study time AND test scores increase together, this is a positive correlation.

Answer: Positive correlation

Problem 2 (easy)

Question:

A scatter plot shows hours studied vs. test scores. As hours increase, scores increase. What type of correlation is this?

Solution:

When one variable increases and the other also increases, we have a positive correlation.

The points would slope upward from left to right.

Answer: Positive correlation

Problem 3 (easy)

Question:

A scatter plot shows points that form a downward pattern from left to right. What type of correlation is this?

Solution:

Step 1: Visualize the pattern: Points going downward from left to right means:

  • As x increases (moving right), y decreases (moving down)

Step 2: Recall correlation types:

  • Positive: upward pattern (both increase)
  • Negative: downward pattern (one increases, other decreases)
  • No correlation: random scatter

Step 3: Classify: A downward pattern indicates a negative correlation.

Example: As outdoor temperature increases, heating costs decrease.

Answer: Negative correlation

Problem 4 (medium)

Question:

A trend line for temperature (°F) vs. ice cream sales has the equation y = 50x - 1000. Predict sales when the temperature is 80°F.

Solution:

Substitute x = 80 into the equation:

y = 50(80) - 1000
y = 4000 - 1000
y = 3000

Answer: Predicted sales: $3,000

Problem 5 (medium)

Question:

Points on a scatter plot are tightly clustered around a line. Is this a strong or weak correlation?

Solution:

Step 1: Understand correlation strength:

  • Strong correlation: Points are close to forming a line (tight cluster)
  • Weak correlation: Points are scattered, loosely following a pattern
  • No correlation: Points show no pattern at all

Step 2: Analyze the given information: "Tightly clustered around a line" means the points are very close together, following the linear pattern closely.

Step 3: Determine strength: When points are tightly clustered, the correlation is strong.

Note: A correlation can be:

  • Strong positive (tight cluster, upward)
  • Strong negative (tight cluster, downward)
  • Weak positive (loose scatter, upward trend)
  • Weak negative (loose scatter, downward trend)

Answer: Strong correlation

Problem 6 (medium)

Question:

A study finds that ice cream sales and drowning incidents both increase in summer. Does this mean ice cream causes drowning?

Solution:

Step 1: Identify what we observe:

  • Ice cream sales increase in summer
  • Drowning incidents increase in summer
  • Both variables are correlated (both increase together)

Step 2: Apply the principle: Correlation ≠ Causation. Just because two variables are correlated does NOT mean one causes the other.

Step 3: Find the lurking variable: The real cause is a third variable: warm weather/summer

  • Warm weather → more people buy ice cream
  • Warm weather → more people swim → more drowning incidents

Step 4: Conclusion: Ice cream sales and drowning are correlated, but ice cream does NOT cause drowning. Both are caused by a third factor (summer/warm weather).

This is a classic example of correlation without causation.

Answer: No, correlation does not imply causation. Both are caused by warm weather.

Problem 7 (medium)

Question:

Describe the correlation: As outdoor temperature increases, heating bills decrease.

Solution:

One variable (temperature) increases while the other (heating bills) decreases.

This is a negative correlation.

The relationship makes sense: warmer weather means less heating needed!

Answer: Negative correlation

Problem 8 (hard)

Question:

A line of best fit for a scatter plot has equation y = 2.5x + 10, where x is hours studied and y is test score. Predict the test score for someone who studies 6 hours.

Solution:

Step 1: Understand what we're doing: We're using the line of best fit equation to make a prediction (interpolation, since 6 hours is likely within our data range).

Step 2: Identify the given information:

  • Equation: y = 2.5x + 10
  • x (hours studied) = 6
  • y (test score) = ?

Step 3: Substitute x = 6 into the equation:

y = 2.5(6) + 10

Step 4: Calculate:

y = 15 + 10
y = 25

Step 5: Interpret the result: According to the line of best fit, a student who studies 6 hours is predicted to score 25 points on the test.

Note: This is a prediction based on the trend; actual scores may vary.

Answer: Predicted test score is 25 points