Scatter Plots and Correlation

Create scatterplots and calculate the correlation coefficient r to describe linear relationships.

Scatterplots

A scatterplot displays the relationship between two quantitative variables. Each point represents one individual.

  • Explanatory variable ($x$): plotted on the horizontal axis
  • Response variable ($y$): plotted on the vertical axis

Describing Scatterplots (DOFS)

  1. Direction: Positive, negative, or no association
  2. Outliers: Unusual points
  3. Form: Linear, curved, clusters
  4. Strength: Weak, moderate, strong

Correlation Coefficient ($r$)

The correlation $r$ measures the strength and direction of a linear relationship.

$$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
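The formula can be read as "average the products of the z-scores, dividing by $n - 1$". A minimal Python sketch of that calculation follows; the function name `correlation` and the small `hours`/`scores` data set are made up for illustration, and $\bar{x}$, $\bar{y}$, $s_x$, $s_y$ are taken to be the sample means and sample standard deviations.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r computed as the sum of z-score products over n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Multiply each pair of z-scores, add them up, divide by n - 1
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = correlation(hours, scores)
print(round(r, 3))  # 0.775
```

For this data, $r = 6 / \sqrt{60} \approx 0.775$, a positive association.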

Properties of $r$

  1. $-1 \leq r \leq 1$
  2. $r > 0$: positive association
  3. $r < 0$: negative association
  4. $|r|$ close to 1: strong linear relationship
  5. $|r|$ close to 0: weak or no linear relationship
  6. $r$ has no units (dimensionless)
  7. $r$ is not affected by changes in units (adding a constant to, or multiplying by a positive constant, all values of a variable)
  8. $r$ is the same regardless of which variable is $x$ or $y$
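Two of these properties are easy to check numerically. The sketch below uses a hypothetical helper `pearson_r` (an algebraically equivalent form of the z-score formula, since the $n - 1$ factors cancel) and invented temperature and sales figures. It converts Celsius to Fahrenheit, swaps the roles of the two variables, and also shows that multiplying by a negative constant, which is not a typical unit change, flips the sign of $r$.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r via sums of squared deviations (equivalent to the z-score form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: daily temperature (Celsius) and drinks sold
temps_c = [10, 15, 20, 25, 30]
sales = [12, 18, 21, 30, 33]

r_original = pearson_r(temps_c, sales)

# Change of units: Celsius -> Fahrenheit (multiply by 1.8, add 32)
temps_f = [1.8 * c + 32 for c in temps_c]
r_fahrenheit = pearson_r(temps_f, sales)

# Swap which variable is x and which is y
r_swapped = pearson_r(sales, temps_c)

# Multiplying by a negative constant reverses the direction
r_negated = pearson_r([-c for c in temps_c], sales)

print(r_original, r_fahrenheit, r_swapped, r_negated)
```

Up to floating-point rounding, the first three values agree and the fourth is their negative.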

Interpreting $r$

| $\lvert r \rvert$ | Strength |
|-------|----------|
| 0.8 – 1.0 | Strong |
| 0.5 – 0.8 | Moderate |
| 0.0 – 0.5 | Weak |
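As a rough sketch, the table can be turned into a lookup on $|r|$. The function name `strength_label` is hypothetical, and because the table's ranges share their endpoints, the choice to treat exactly 0.8 as "Strong" and exactly 0.5 as "Moderate" is an assumption rather than something the guideline specifies.

```python
def strength_label(r):
    """Map a correlation r to a rough strength label based on |r|."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # strength ignores the sign; direction is described separately
    if size >= 0.8:      # assumption: boundary value 0.8 counts as Strong
        return "Strong"
    if size >= 0.5:      # assumption: boundary value 0.5 counts as Moderate
        return "Moderate"
    return "Weak"

print(strength_label(0.9))    # Strong
print(strength_label(-0.65))  # Moderate
print(strength_label(0.2))    # Weak
```

These cutoffs are guidelines only; the scatterplot should still be checked before describing a relationship.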

Cautions About Correlation

  1. Correlation ≠ Causation: Association does not imply cause-and-effect
  2. $r$ only measures linear relationships (a curved pattern may have $r \approx 0$)
  3. $r$ is sensitive to outliers
  4. $r$ should only be used for quantitative variables
  5. Always look at the scatterplot; don't rely on $r$ alone
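The second caution can be illustrated with a perfectly curved pattern. In this sketch (the helper `pearson_r` and the data are illustrative assumptions), every point lies exactly on the parabola $y = x^2$, so $y$ is completely determined by $x$, yet the correlation is zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r via sums of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Points on a parabola, symmetric about x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(r_curved)  # 0.0
```

A value of $r$ near 0 therefore rules out a linear relationship, not a relationship of any kind.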

Influential Points

An influential point substantially changes the regression line or correlation when removed.

  • Points with extreme $x$-values are often influential
  • Outliers may or may not be influential
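A small numerical sketch shows how much one point with an extreme $x$-value can change the correlation. The data and the helper `pearson_r` are invented for illustration: five points with almost no linear pattern, plus one far-away point at (20, 20).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r via sums of squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Five points with a weak pattern (hypothetical data)
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_without = pearson_r(xs, ys)

# Add a single point with an extreme x-value
r_with = pearson_r(xs + [20], ys + [20])

print(round(r_without, 2))  # 0.14
print(round(r_with, 2))     # 0.97
```

Removing the one extreme point drops $r$ from about 0.97 to about 0.14, which is why that point would be called influential.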

AP Tip: Always plot the data before calculating $r$. The correlation coefficient can be misleading without seeing the actual pattern (recall Anscombe's quartet).
