Scatter Plots and Correlations
Analyzing scatter plots and identifying correlations
What is a Scatter Plot?
A scatter plot is a graph showing the relationship between two variables.
Each point represents one data pair (x, y).
Purpose: Identify patterns and correlations between variables
Example data: Study hours vs. test scores for 5 students:
- (1, 65): 1 hour study, score 65
- (2, 70): 2 hours study, score 70
- (3, 80): 3 hours study, score 80
- (4, 85): 4 hours study, score 85
- (5, 95): 5 hours study, score 95
Plot these points to see the relationship!
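To see the shape of this data without graph paper, here is a minimal text-only sketch in Python. The function name `ascii_scatter` and the 5-point vertical step are illustrative assumptions, and the approach only works when every score lands exactly on a grid row, as it does for this data.

```python
# Study hours (x) and test scores (y) from the example above.
points = [(1, 65), (2, 70), (3, 80), (4, 85), (5, 95)]

def ascii_scatter(points, y_step=5):
    """Return text rows, highest y first, with '*' wherever a data point falls."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    y = max(ys)
    while y >= min(ys):
        # One cell per whole-number x value between the smallest and largest x.
        cells = ["*" if (x, y) in points else "." for x in range(min(xs), max(xs) + 1)]
        rows.append(f"{y:>3} | " + " ".join(cells))
        y -= y_step
    return rows

for row in ascii_scatter(points):
    print(row)
```

The stars climb from the lower left to the upper right, which is the upward trend the lesson describes next as a positive correlation.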
Creating a Scatter Plot
Steps:
- Draw axes (x-axis = independent variable, y-axis = dependent variable)
- Label axes with variable names and units
- Choose appropriate scale
- Plot each ordered pair as a point
- Title the graph
Example: Height vs. Shoe Size
Data:
- Height 60 inches, Size 7
- Height 64 inches, Size 8
- Height 68 inches, Size 9
- Height 72 inches, Size 10
Plot each point: (60, 7), (64, 8), (68, 9), (72, 10)
Types of Correlation
Positive Correlation:
- As x increases, y increases
- Points trend upward from left to right
- Slope is positive
Example: Study time vs. test score
More studying → higher scores
Negative Correlation:
- As x increases, y decreases
- Points trend downward from left to right
- Slope is negative
Example: Speed vs. travel time
Higher speed → less time
No Correlation:
- No clear pattern
- Points scattered randomly
- No apparent relationship between variables
Example: Shoe size vs. math score
No connection!
Strength of Correlation
Strong Correlation:
- Points close to a line
- Clear, tight pattern
- Strong relationship
Moderate Correlation:
- Points somewhat close to a line
- Pattern visible but not tight
- Moderate relationship
Weak Correlation:
- Points loosely scattered
- Vague pattern
- Weak relationship
Measuring strength: Judge how closely the points cluster around a line
Correlation vs. Causation
Correlation: Two variables are related (they change together)
Causation: One variable CAUSES the other to change
KEY POINT: Correlation does NOT prove causation!
Example 1: Ice cream sales and drowning deaths
Correlation: Both increase in summer
Causation? NO! Heat causes both, not each other
Example 2: Study time and test scores
Correlation: Positive
Causation? Likely YES! Studying helps scores
Example 3: Shoe size and reading ability in children
Correlation: Positive (both increase with age)
Causation? NO! Age causes both
Critical thinking: Always ask "Could there be another factor?"
Line of Best Fit (Trend Line)
A line that best represents the data pattern.
Purpose: Make predictions from data
Characteristics:
- Passes through or near most points
- Roughly equal numbers of points above and below the line (balanced)
- Keeps the overall distance between the line and the points as small as possible
Drawing by hand:
- Identify the pattern (positive/negative)
- Draw a line through the "middle" of points
- Balance points above and below
Example: Study hours vs. scores
Points: (1, 65), (2, 70), (3, 80), (4, 85), (5, 95)
Line approximately: y = 8x + 57
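A hand-drawn line can be checked against a computed one. The sketch below uses the least-squares method, which is one common definition of "best fit" and an assumption here, since the lesson draws its line by eye. The function name `least_squares_line` is illustrative.

```python
# Study hours (x) and test scores (y) from the example above.
points = [(1, 65), (2, 70), (3, 80), (4, 85), (5, 95)]

def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(points)
print(slope, intercept)  # 7.5 56.5
```

The computed line y = 7.5x + 56.5 rounds to the lesson's y = 8x + 57, so the hand-drawn estimate is reasonable.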
Making Predictions
Use the line of best fit to predict values.
Interpolation: Predict within data range
Example: If the line is y = 8x + 57
Predict score for 2.5 hours study:
y = 8(2.5) + 57 = 20 + 57 = 77
Expected score: 77
Extrapolation: Predict outside data range
Example: Predict score for 10 hours study: y = 8(10) + 57 = 137
Warning: Extrapolation is less reliable! (You can't score 137!)
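The two kinds of prediction can be told apart by checking whether x falls inside the range of the original data. This is a small sketch using the lesson's line y = 8x + 57 and its data range of 1 to 5 hours. The function name `predict` and its parameters are illustrative.

```python
def predict(x, slope=8, intercept=57, x_min=1, x_max=5):
    """Return the predicted y and whether the prediction is inside the data range."""
    y = slope * x + intercept
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

print(predict(2.5))  # (77.0, 'interpolation')
print(predict(10))   # (137, 'extrapolation')
```

A label of "extrapolation" is a signal to question the result, as with the impossible score of 137.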
Outliers in Scatter Plots
Outlier: Point far from the general pattern
Example: Study hours vs. scores
Most points follow pattern, but one student: (4, 50) - studied 4 hours but scored only 50
This is an outlier!
Reasons for outliers:
- Measurement error
- Unusual circumstance (student was sick)
- Special case
- Data entry mistake
Effect on correlation:
- Outliers can weaken correlation
- May affect line of best fit
- Consider removing if justified
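To see how much one outlier can weaken a correlation, the sketch below computes the correlation coefficient r (covered later in this lesson) with and without the point (4, 50). It assumes the other points are the five study-hours pairs used earlier, which the outlier example does not state explicitly. The function name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(points):
    """Correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

# Assumed data: the five study-hours points from earlier in the lesson.
pattern = [(1, 65), (2, 70), (3, 80), (4, 85), (5, 95)]
outlier = (4, 50)  # studied 4 hours but scored only 50

r_without = pearson_r(pattern)
r_with = pearson_r(pattern + [outlier])
print(round(r_without, 2), round(r_with, 2))
```

Under these assumptions r falls from about 0.99 to about 0.43, so a single unusual point turns a strong correlation into a weak one.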
Linear vs. Nonlinear Patterns
Linear pattern: Points form roughly a straight line
Nonlinear patterns:
Quadratic: Points form a curve (parabola)
Example: Height of thrown ball vs. time
Exponential: Points show rapid increase/decrease
Example: Bacterial growth vs. time
No pattern: Random scatter
Example: Phone number vs. height
For Algebra 1: Focus on linear patterns!
Correlation Coefficient (r)
Measures strength and direction of linear correlation
Range: -1 to +1
r = +1: Perfect positive correlation (all points on upward line)
r = 0: No linear correlation
r = -1: Perfect negative correlation (all points on downward line)
Interpreting r:
- 0.8 to 1.0 or -0.8 to -1.0: Strong
- 0.5 to 0.8 or -0.5 to -0.8: Moderate
- 0 to 0.5 or 0 to -0.5: Weak
Example:
r = 0.92 → Strong positive correlation
r = -0.73 → Moderate negative correlation
r = 0.15 → Weak positive correlation
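The interpretation ranges above can be written as a small lookup. In this sketch the function name `describe_r` is illustrative, and because the listed ranges share the boundary values 0.5 and 0.8, it assumes a boundary value belongs to the stronger category.

```python
def describe_r(r):
    """Describe a correlation coefficient using the lesson's strength ranges."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    # Assumption: 0.8 counts as strong and 0.5 counts as moderate.
    if size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for r in (0.92, -0.73, 0.15):
    print(r, "->", describe_r(r))
```

The sign of r gives the direction and its distance from 0 gives the strength, so the two parts are handled separately.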
Real-World Examples
Example 1: Temperature vs. Ice Cream Sales
Data shows positive correlation: Higher temperature → more sales
Makes sense! (causation likely)
Example 2: Car Age vs. Value
Negative correlation: Older car → lower value
Clear causation!
Example 3: Height vs. Arm Span
Strong positive correlation: Taller people have longer arms
Biological relationship!
Example 4: TV Hours vs. GPA
Negative correlation: More TV → lower GPA
Correlation? Yes. Causation? Maybe! (could be other factors)
Analyzing Scatter Plots
Questions to ask:
- Is there a correlation? (positive/negative/none)
- How strong is the correlation? (strong/moderate/weak)
- Are there outliers?
- Is the pattern linear or nonlinear?
- Could there be causation?
- Are there lurking variables?
Lurking variable: Hidden factor affecting both variables
Example: Study time and scores both affected by student motivation
Creating Scatter Plots from Data Tables
Example: Hours of sleep vs. energy level (1-10 scale)
Data:
- Student A: 5 hours, energy 4
- Student B: 7 hours, energy 7
- Student C: 6 hours, energy 5
- Student D: 8 hours, energy 8
- Student E: 4 hours, energy 3
Points: (5, 4), (7, 7), (6, 5), (8, 8), (4, 3)
Pattern: Positive correlation
More sleep → more energy
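Turning a data table into ordered pairs can be sketched in a few lines of Python. The dictionary layout and the variable names are illustrative. The final check, that every step to the right also goes up, is a stricter test than positive correlation; this small data set happens to pass it, but most real data with a positive correlation would not.

```python
# Data table from the example: student -> (hours of sleep, energy level).
table = {"A": (5, 4), "B": (7, 7), "C": (6, 5), "D": (8, 8), "E": (4, 3)}

# Each row becomes one ordered pair (x, y); sorting by x makes the trend easy to read.
points = sorted(table.values())

# Compare each point with the next one to the right: does energy always rise?
rises = all(y2 > y1 for (_, y1), (_, y2) in zip(points, points[1:]))

print(points)  # [(4, 3), (5, 4), (6, 5), (7, 7), (8, 8)]
print(rises)   # True
```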
Using Technology
Graphing calculators:
- Enter data in lists
- Create scatter plot
- Calculate correlation coefficient
- Find line of best fit equation
Example: TI-84
- STAT → Edit → Enter data in L1 and L2
- 2nd → STAT PLOT → Turn on Plot1
- ZOOM → ZoomStat
- STAT → CALC → LinReg(ax+b) (gives the equation; r appears only when diagnostics are turned on)
Spreadsheet software:
- Excel, Google Sheets
- Create chart → Scatter plot
- Add trendline
- Display equation and R² value
Common Mistakes to Avoid
- Confusing correlation with causation: Just because two things correlate doesn't mean one causes the other!
- Ignoring outliers: Outliers can significantly affect the line of best fit
- Extrapolating too far: Predictions far outside the data range are unreliable
- Wrong variable on wrong axis: The independent variable (x) should be on the horizontal axis
- Poor scale choices: The scale should show the pattern clearly without distortion
- Not labeling axes: Always label with variable names and units!
Interpreting Slope in Context
The slope of the line of best fit has real meaning!
Example: Study hours vs. test scores
Line: y = 8x + 57
Slope = 8 means: Each additional hour of study raises the predicted score by about 8 points
Example: Temperature vs. ice cream sales
Line: y = 12x + 50
Slope = 12 means: Each one-degree increase in temperature raises predicted sales by about 12
Y-Intercept in Context
Example: y = 8x + 57 (study hours vs. score)
Y-intercept = 57: Predicted score with 0 hours study
Does this make sense? Maybe! (represents prior knowledge)
Example: y = 12x + 50 (temperature vs. sales)
Y-intercept = 50: Predicted sales at 0°F
May not make sense! (shop might be closed)
Lesson: Interpret intercept carefully in context!
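Both interpretations can be confirmed by evaluating the line directly. This short sketch uses the lesson's study-hours line y = 8x + 57; the function name `predicted_score` is illustrative.

```python
def predicted_score(hours):
    """Predicted test score from the lesson's line y = 8x + 57."""
    return 8 * hours + 57

# Slope: the change in the prediction for one extra hour, from any starting point.
print(predicted_score(3) - predicted_score(2))  # 8
print(predicted_score(5) - predicted_score(4))  # 8

# Y-intercept: the prediction when x = 0.
print(predicted_score(0))  # 57
```

The difference is 8 wherever it is measured, which is what makes the slope a single "per hour" rate for the whole line.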
Quick Reference
Positive correlation: Both variables increase together
Negative correlation: One increases, other decreases
No correlation: No relationship
Strong: Points close to line
Weak: Points scattered loosely
Outlier: Point far from pattern
Line of best fit: Line representing overall trend
Interpolation: Predict within data range
Extrapolation: Predict outside data range
Correlation ≠ Causation!
Practice Tips
- Always label axes with variable names and units
- Start with independent variable on x-axis
- Look for overall pattern before drawing line
- Balance points above and below the line
- Identify and note outliers
- Consider whether correlation makes sense
- Don't assume causation without evidence
- Practice reading scatter plots from different contexts
- Use technology to check your work
- Think critically about lurking variables
- Apply to real-world situations
- Understand the difference between interpolation and extrapolation
- Remember: Scatter plots are about relationships!
Scatter plots are powerful tools for visualizing relationships between variables. Master this skill and you'll be able to analyze data in science, social studies, business, and everyday life!
Practice Problems
Problem 1 (easy)
Question:
Describe the correlation shown: As study time increases, test scores increase.
Solution:
Step 1: Identify the relationship between variables:
- Variable 1 (x): Study time
- Variable 2 (y): Test scores
- Relationship: As x increases, y increases
Step 2: Recall correlation types:
- Positive correlation: Both variables increase together
- Negative correlation: As one increases, the other decreases
- No correlation: No clear relationship
Step 3: Classify this relationship: Since both study time AND test scores increase together, this is a positive correlation.
Answer: Positive correlation
Problem 2 (easy)
Question:
A scatter plot shows hours studied vs. test scores. As hours increase, scores increase. What type of correlation is this?
Solution:
When one variable increases and the other also increases, we have a positive correlation.
The points would slope upward from left to right.
Answer: Positive correlation
Problem 3 (easy)
Question:
A scatter plot shows points that form a downward pattern from left to right. What type of correlation is this?
Solution:
Step 1: Visualize the pattern: Points going downward from left to right means:
- As x increases (moving right), y decreases (moving down)
Step 2: Recall correlation types:
- Positive: upward pattern (both increase)
- Negative: downward pattern (one increases, other decreases)
- No correlation: random scatter
Step 3: Classify: A downward pattern indicates a negative correlation.
Example: As temperature decreases, heating costs increase.
Answer: Negative correlation
Problem 4 (medium)
Question:
A trend line for temperature (°F) vs. ice cream sales has the equation . Predict sales when temperature is 80°F.
Solution:
Substitute into the equation:
Answer: Predicted sales: $3,000
Problem 5 (medium)
Question:
Points on a scatter plot are tightly clustered around a line. Is this a strong or weak correlation?
Solution:
Step 1: Understand correlation strength:
- Strong correlation: Points are close to forming a line (tight cluster)
- Weak correlation: Points are scattered, loosely following a pattern
- No correlation: Points show no pattern at all
Step 2: Analyze the given information: "Tightly clustered around a line" means the points are very close together, following the linear pattern closely.
Step 3: Determine strength: When points are tightly clustered, the correlation is strong.
Note: A correlation can be:
- Strong positive (tight cluster, upward)
- Strong negative (tight cluster, downward)
- Weak positive (loose scatter, upward trend)
- Weak negative (loose scatter, downward trend)
Answer: Strong correlation
Problem 6 (medium)
Question:
A study finds that ice cream sales and drowning incidents both increase in summer. Does this mean ice cream causes drowning?
Solution:
Step 1: Identify what we observe:
- Ice cream sales increase in summer
- Drowning incidents increase in summer
- Both variables are correlated (both increase together)
Step 2: Apply the principle: Correlation ≠ Causation
Just because two variables are correlated does NOT mean one causes the other.
Step 3: Find the lurking variable: The real cause is a third variable: warm weather/summer
- Warm weather → more people buy ice cream
- Warm weather → more people swim → more drowning incidents
Step 4: Conclusion: Ice cream sales and drowning are correlated, but ice cream does NOT cause drowning. Both are caused by a third factor (summer/warm weather).
This is a classic example of correlation without causation.
Answer: No, correlation does not imply causation. Both are caused by warm weather.
Problem 7 (medium)
Question:
Describe the correlation: As outdoor temperature increases, heating bills decrease.
Solution:
One variable (temperature) increases while the other (heating bills) decreases.
This is a negative correlation.
The relationship makes sense: warmer weather means less heating needed!
Answer: Negative correlation
Problem 8 (hard)
Question:
A line of best fit for a scatter plot has equation y = 2.5x + 10, where x is hours studied and y is test score. Predict the test score for someone who studies 6 hours.
Solution:
Step 1: Understand what we're doing: We're using the line of best fit equation to make a prediction (interpolation, since 6 hours is likely within our data range).
Step 2: Identify the given information:
- Equation: y = 2.5x + 10
- x (hours studied) = 6
- y (test score) = ?
Step 3: Substitute x = 6 into the equation:
y = 2.5(6) + 10
Step 4: Calculate:
y = 15 + 10
y = 25
Step 5: Interpret the result: According to the line of best fit, a student who studies 6 hours is predicted to score 25 points on the test.
Note: This is a prediction based on the trend; actual scores may vary.
Answer: Predicted test score is 25 points