Statistics and Data Interpretation

Analyze data using mean, median, mode, standard deviation, and data displays.


Statistics and Data Interpretation on the SAT

Central Tendency: Mean, Median, Mode

Mean (Average)

$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$

Key property: The mean is sensitive to outliers.

Median

The middle value when data is arranged in order.

  • Odd number of values: the middle one
  • Even number of values: average of the two middle ones

Key property: The median is resistant to outliers.

Mode

The most frequently occurring value. A data set can have no mode, one mode, or multiple modes.
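The three measures above can be checked with a short sketch using Python's standard `statistics` module. The ages are the ones from the first practice problem in this lesson; the second list and all variable names are made up for illustration.

```python
from statistics import mean, median, multimode

# Ages from the first practice problem in this lesson.
ages = [14, 16, 15, 14, 17]

ages_mean = mean(ages)        # sum of the values divided by how many there are
ages_median = median(ages)    # middle value after sorting: 14, 14, 15, 16, 17
ages_modes = multimode(ages)  # list of every value tied for most frequent

# Hypothetical data with two modes, to show that a set can have more than one.
two_modes = multimode([1, 1, 2, 2, 3])

print(ages_mean, ages_median, ages_modes, two_modes)
```

Note that `multimode` returns every value when nothing repeats, so a "no mode" data set has to be recognized by checking whether any value occurs more than once.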


When to Use Each Measure

| Situation | Best Measure | Why |
|---|---|---|
| Symmetric data, no outliers | Mean | Representative of all values |
| Skewed data or outliers | Median | Not pulled by extremes |
| Categorical data | Mode | Most common category |

SAT Tip: The SAT often asks "which measure best represents the data" — choose median when there are outliers.
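As a rough illustration of why the median is the safer choice when outliers are present, the sketch below adds one extreme value (100) to a small data set and compares both measures before and after. The five starting values are the ones used in the mean example later in this lesson; the variable names are only illustrative.

```python
from statistics import mean, median

original = [12, 15, 18, 18, 22]
with_outlier = original + [100]  # one extreme value added

mean_before, mean_after = mean(original), mean(with_outlier)
median_before, median_after = median(original), median(with_outlier)

# The mean moves a long way toward the outlier; the median does not move.
print(mean_before, round(mean_after, 1))
print(median_before, median_after)
```

Here the mean jumps from 17 to about 30.8, while the median stays at 18.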


Spread: Range, Standard Deviation, IQR

Range

$$\text{Range} = \text{Maximum} - \text{Minimum}$$

Interquartile Range (IQR)

$$\text{IQR} = Q_3 - Q_1$$

Measures the spread of the middle 50% of the data.

Standard Deviation

Measures how spread out data is from the mean.

  • Small SD: Data is clustered near the mean
  • Large SD: Data is spread out from the mean

You do NOT need to calculate standard deviation on the SAT — just understand what it means!
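Although the calculation itself is not tested, seeing it once can make the idea concrete. The sketch below compares two made-up data sets that share the same mean (50) but differ in spread. It assumes the population standard deviation (`statistics.pstdev`); the sample version (`statistics.stdev`) gives slightly larger numbers but the same comparison.

```python
from statistics import mean, pstdev

def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

# Hypothetical data sets, both with mean 50.
clustered = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

for name, data in [("clustered", clustered), ("spread_out", spread_out)]:
    # Same mean, very different range and standard deviation.
    print(name, mean(data), value_range(data), round(pstdev(data), 2))
```

The set whose values sit farther from the mean has both the larger range and the larger standard deviation.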


Box Plots (Box-and-Whisker)

A box plot displays the 5-number summary:

  1. Minimum
  2. $Q_1$ (25th percentile)
  3. Median (50th percentile)
  4. $Q_3$ (75th percentile)
  5. Maximum

The box spans from $Q_1$ to $Q_3$; the line inside is the median.
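A minimal sketch of computing the 5-number summary follows. It assumes the "median of each half" convention for quartiles (the middle value is left out of both halves when the count is odd); textbooks and software use several slightly different quartile conventions, so other tools may report different $Q_1$ and $Q_3$ values for the same data. The function name is hypothetical, and the data is Set A from the last practice problem in this lesson.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention.

    Needs at least two values so that both halves are non-empty.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values before the median position
    upper = data[half + (n % 2):]  # values after it (skips the middle value when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

summary = five_number_summary([2, 3, 4, 5, 5, 5, 6, 7, 8, 15])
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1  # width of the box in the box plot

print(summary, iqr)
```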


Reading Tables and Graphs

Histograms

  • $x$-axis: intervals (bins)
  • $y$-axis: frequency
  • To find the total number of data points: add all bar heights
  • To find the median: add bar heights from the left until the running total reaches the middle position; the median lies in that bin
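The counting procedure for a histogram median can be sketched as follows. The bin labels and frequencies are invented for illustration, and the function name is hypothetical. When the total count is even, this sketch looks only at the lower of the two middle positions.

```python
def median_bin(bins, frequencies):
    """Return the label of the bin that contains the median."""
    total = sum(frequencies)  # total number of data points = sum of bar heights
    if total <= 0:
        raise ValueError("frequencies must sum to a positive count")
    # 1-based position of the median; for an even total this is the
    # lower of the two middle positions.
    middle = (total + 1) // 2
    running = 0
    for label, freq in zip(bins, frequencies):
        running += freq  # count from the left
        if running >= middle:
            return label
    raise ValueError("bins and frequencies must have the same length")

# Hypothetical histogram of test scores.
bins = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [2, 5, 9, 6, 3]

print(sum(frequencies), median_bin(bins, frequencies))
```

With 25 data points the median is the 13th value. The running totals are 2, 7, then 16, so the median falls in the 70-79 bin.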

Scatterplots

  • Look for positive, negative, or no association
  • A line of best fit (regression line) approximates the trend
  • Correlation coefficient $r$: close to +1 (strong positive), close to -1 (strong negative), close to 0 (little or no linear association)
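The SAT does not ask for $r$ to be computed by hand, but a small sketch may help connect the number to the picture. It uses the standard Pearson formula, which this lesson does not state; the data points and names are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scatterplot data.
hours = [1, 2, 3, 4, 5]
rising = [52, 60, 63, 71, 80]   # tends to increase with hours
falling = [80, 71, 63, 60, 52]  # tends to decrease with hours

print(round(pearson_r(hours, rising), 2), round(pearson_r(hours, falling), 2))
```

The first pair gives a value near +1 and the second a value near -1, matching the positive and negative associations described above.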

Dot Plots

  • Each dot = one data point
  • Easy to find mode, range, and count

Margin of Error and Confidence Intervals

$$\text{Confidence Interval} = \text{Estimate} \pm \text{Margin of Error}$$

Example: A survey estimates 45% support with a margin of error of 3%. This means the true value is likely between 42% and 48%.

To decrease margin of error: Increase sample size.
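The sketch below reproduces the interval from the example and gives a rough sense of how sample size affects the margin. The `approx_margin` function is an assumption brought in for illustration: it uses the common normal-approximation formula for a proportion at roughly 95% confidence ($z \approx 1.96$) with a simple random sample, which this lesson does not cover and the SAT does not require.

```python
from math import sqrt

def interval(estimate, margin):
    """Confidence interval = estimate plus or minus margin of error."""
    return estimate - margin, estimate + margin

def approx_margin(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p with sample size n.

    Assumed formula (not from this lesson): z * sqrt(p * (1 - p) / n).
    """
    return z * sqrt(p * (1 - p) / n)

low, high = interval(45, 3)  # the survey example: 45% with a 3% margin
print(low, high)

# Larger samples give smaller margins under the assumed formula.
for n in (250, 1000, 4000):
    print(n, round(100 * approx_margin(0.45, n), 1))
```

Under this formula, quadrupling the sample size cuts the margin of error in half.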


SAT Question Types

Type 1: Calculate Mean/Median

"Find the mean of 12, 15, 18, 18, 22" → $\frac{85}{5} = 17$

Type 2: Effect of Adding/Removing Values

"If the value 100 is added to the set above, how does it affect the mean vs. the median?" The mean rises from 17 to about 30.8 (it is sensitive to outliers), while the median stays at 18.

Type 3: Interpret Graphs

Read values from bar charts, histograms, line graphs. Pay attention to axis labels and scales!

Type 4: Compare Distributions

"Set A has mean 50 and SD 5. Set B has mean 50 and SD 10. Which is more spread out?" Set B (larger SD).


Common SAT Mistakes

  1. Confusing mean and median — the SAT specifically tests whether you know which is affected by outliers
  2. Misreading graph scales — check if axes start at 0
  3. Confusing correlation with causation — correlation does NOT prove one variable causes another
  4. Forgetting to order data before finding the median
  5. Misinterpreting standard deviation — it's about spread, not about the mean itself

📚 Practice Problems

Problem 1 (easy)

Question:

The ages of 5 students are: 14, 16, 15, 14, 17. What is the median age?

Solution:

Step 1: Arrange in order: 14, 14, 15, 16, 17

Step 2: Find the middle value. With 5 values, the median is the 3rd value.

Answer: Median = 15

Note: Don't forget to sort first! The original order doesn't matter for median.

Problem 3 (medium)

Question:

A data set has a mean of 72 and contains 8 values. If a 9th value of 90 is added, what is the new mean?

Solution:

Step 1: Find the current sum.

$$\text{Mean} = \frac{\text{Sum}}{n} \implies \text{Sum} = \text{Mean} \times n = 72 \times 8 = 576$$

Step 2: Add the new value.

$$\text{New Sum} = 576 + 90 = 666$$

Step 3: Calculate the new mean.

$$\text{New Mean} = \frac{666}{9} = 74$$

Answer: New mean = 74

SAT Tip: To find the sum from a mean, multiply: Sum = Mean × Count.
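The three steps can be folded into one small helper, sketched below. The function name is hypothetical; the numbers are the ones from this problem.

```python
def mean_after_adding(old_mean, old_count, new_value):
    """New mean after one value is added to a data set with a known mean."""
    old_sum = old_mean * old_count  # Sum = Mean x Count
    return (old_sum + new_value) / (old_count + 1)

new_mean = mean_after_adding(72, 8, 90)
print(new_mean)
```

The same helper shows why a new value above the old mean pulls the mean up and a value below it pulls the mean down.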

Problem 5 (medium)

Question:

A class of 20 students has a mean test score of 78. A class of 30 students has a mean test score of 84. What is the combined mean for all 50 students?

Solution:

Step 1: Find each class's total points.

Class 1: $78 \times 20 = 1{,}560$

Class 2: $84 \times 30 = 2{,}520$

Step 2: Find the combined mean.

$$\text{Combined Mean} = \frac{1{,}560 + 2{,}520}{20 + 30} = \frac{4{,}080}{50} = 81.6$$

Answer: 81.6

Key: You cannot just average the two means (that would give 81). The combined mean is a weighted average because the groups have different sizes.
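A short sketch of the weighted-average idea, using the class data from this problem. The function name and the `(mean, count)` pair layout are illustrative choices.

```python
def combined_mean(groups):
    """Combined mean of several groups, given (mean, count) pairs."""
    total = sum(m * n for m, n in groups)  # each group's sum is mean x count
    count = sum(n for _, n in groups)
    return total / count

combined = combined_mean([(78, 20), (84, 30)])
naive = (78 + 84) / 2  # ignores the different class sizes

print(combined, naive)
```

The combined mean lands closer to 84 than the naive average does because the larger class carries more weight.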

Problem 7 (hard)

Question:

A survey of 500 residents found that 62% support a new park, with a margin of error of 4%. Which of the following is a valid conclusion?

A) Exactly 62% of all residents support the park.

B) Between 58% and 66% of all residents likely support the park.

C) At least 58% of all residents definitely support the park.

D) The survey is unreliable because of the margin of error.

Solution:

Analysis of each option:

A) "Exactly 62%": No. The 62% is an estimate from a sample, not an exact figure for all residents. ✗

B) "Between 58% and 66% likely support": Yes. The confidence interval is $62\% \pm 4\% = [58\%, 66\%]$. "Likely" is the right word because the interval gives plausible values for the true percentage, not a guarantee. ✓

C) "At least 58% definitely": No. "Definitely" is too strong; there is a small chance the true value lies outside the interval. ✗

D) "Survey is unreliable": No. Every sample survey has a margin of error; that alone does not make it unreliable. ✗

Answer: B

SAT Tip: Confidence intervals give a RANGE of plausible values, not a guarantee. Watch for words like "definitely" or "exactly" — they're usually wrong.

Problem 9 (expert)

Question:

Two data sets each have 10 values. Set A: {2, 3, 4, 5, 5, 5, 6, 7, 8, 15}. Set B: {4, 4, 5, 5, 5, 5, 6, 6, 7, 8}. Which set has the greater standard deviation, and which measure of center (mean or median) would differ more between the sets?

Solution:

Step 1: Compare the spreads.

Set A has values ranging from 2 to 15, with the outlier 15 pulling the data wide. Set B has values from 4 to 8, tightly clustered.

Set A has the greater standard deviation because it is more spread out, especially due to the outlier 15.

Step 2: Compare the means.

Set A mean: $\frac{2+3+4+5+5+5+6+7+8+15}{10} = \frac{60}{10} = 6.0$

Set B mean: $\frac{4+4+5+5+5+5+6+6+7+8}{10} = \frac{55}{10} = 5.5$

Step 3: Compare the medians.

Set A median: average of the 5th and 6th values = $\frac{5+5}{2} = 5$

Set B median: average of the 5th and 6th values = $\frac{5+5}{2} = 5$

Step 4: The means differ by 0.5, but the medians are identical. So the mean differs more between the sets.

Answer: Set A has the greater standard deviation. The mean differs more between the sets because it is affected by the outlier (15) in Set A, while the median is resistant to outliers.
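The solution compares the spreads by inspection. For readers who want a numerical check, the sketch below computes the standard deviations and both measures of center for the two sets. It assumes the population standard deviation (`statistics.pstdev`); the sample version would give slightly different numbers but the same ordering.

```python
from statistics import mean, median, pstdev

set_a = [2, 3, 4, 5, 5, 5, 6, 7, 8, 15]
set_b = [4, 4, 5, 5, 5, 5, 6, 6, 7, 8]

sd_a, sd_b = pstdev(set_a), pstdev(set_b)
mean_gap = abs(mean(set_a) - mean(set_b))        # difference between the means
median_gap = abs(median(set_a) - median(set_b))  # difference between the medians

print(round(sd_a, 2), round(sd_b, 2))
print(mean_gap, median_gap)
```

Set A's standard deviation comes out to roughly 3.4 against roughly 1.2 for Set B, and the means differ by 0.5 while the medians do not differ at all.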
