Measures of Spread

Range, IQR, standard deviation, and variance

Introduction

While measures of center tell us the "typical" value, measures of spread (also called measures of variability or dispersion) tell us how spread out or variable the data is. Two datasets can have the same mean but very different spreads!

Example:

  • Class A scores: 70, 72, 73, 74, 75 (Mean = 72.8, very consistent)
  • Class B scores: 50, 60, 73, 80, 100 (Mean = 72.6, highly variable)

Both classes have similar means, but Class B has much more spread!
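The comparison can be checked with a few lines of Python. This is a minimal sketch using only the standard `statistics` module; the helper name `center_and_spread` is illustrative, and "max - min" is used here simply as the most basic way to put a number on spread.

```python
import statistics

class_a = [70, 72, 73, 74, 75]
class_b = [50, 60, 73, 80, 100]

def center_and_spread(scores):
    """Return (mean, max - min) for a list of scores."""
    return statistics.mean(scores), max(scores) - min(scores)

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    mean, spread = center_and_spread(scores)
    print(f"{name}: mean = {mean:.1f}, max - min = {spread}")
```

The means differ by only 0.2 points, while the gap between the highest and lowest score is ten times larger in Class B.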

Range

Definition

Range: The difference between the maximum and minimum values

Formula: $\text{Range} = \text{Max} - \text{Min}$

Calculating Range

Example 1: Test scores: 68, 75, 82, 91, 88

  • Max = 91
  • Min = 68
  • Range = 91 - 68 = 23 points

Example 2: Temperatures (°F): 45, 52, 58, 51, 62, 48

  • Max = 62
  • Min = 45
  • Range = 62 - 45 = 17°F
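Both examples can be reproduced with a small helper. The function name `value_range` is an illustrative choice (it avoids shadowing Python's built-in `range`), not something defined in this lesson.

```python
def value_range(values):
    """Return max - min for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)

test_scores = [68, 75, 82, 91, 88]
temperatures_f = [45, 52, 58, 51, 62, 48]

print(value_range(test_scores))     # 23
print(value_range(temperatures_f))  # 17
```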

Properties of Range

Advantages: ✓ Very easy to calculate and understand
✓ Gives sense of total spread
✓ Useful for quick assessment

Disadvantages: ❌ Only uses two values (ignores all others)
❌ Extremely sensitive to outliers
❌ Doesn't tell us about distribution between min and max
❌ Tends to increase with sample size (larger samples are more likely to include extreme values)

Example of outlier sensitivity:

Without outlier: 10, 12, 13, 14, 15
Range = 15 - 10 = 5

With outlier: 10, 12, 13, 14, 15, 50
Range = 50 - 10 = 40

One outlier dramatically changed the range!

When to Use Range

Appropriate for:

  • Quick, rough sense of spread
  • Knowing the extreme values matters
  • Quality control (acceptable range of values)

Not appropriate when:

  • Outliers present
  • Need precise measure of variability
  • Comparing datasets of different sizes

Interquartile Range (IQR)

Definition

IQR: The range of the middle 50% of data

Formula: $IQR = Q3 - Q1$

Where:

  • Q1 = First quartile (25th percentile)
  • Q3 = Third quartile (75th percentile)

Finding Quartiles and IQR

Step 1: Order data from smallest to largest

Step 2: Find median (Q2)

Step 3: Find median of lower half = Q1

Step 4: Find median of upper half = Q3

Step 5: Calculate IQR = Q3 - Q1

Example

Data: 12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40

Step 1: Already ordered

Step 2: Median (Q2) = 22 (middle value, n=11)

Step 3: Lower half: 12, 15, 17, 19, 20
Q1 = 17 (median of lower half)

Step 4: Upper half: 25, 28, 30, 35, 40
Q3 = 30 (median of upper half)

Step 5: IQR = 30 - 17 = 13

Interpretation: The middle 50% of data spans 13 units
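The five steps can be sketched in Python as follows. This assumes the median-of-halves convention used in the example, where the median is left out of both halves when the number of values is odd; the function name is illustrative.

```python
import statistics

def quartiles_median_of_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    When the number of values is odd, the median itself is
    excluded from both the lower and the upper half.
    """
    ordered = sorted(data)               # Step 1: order the data
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    q2 = statistics.median(ordered)      # Step 2: median
    half = n // 2
    lower = ordered[:half]               # values below the median position
    upper = ordered[n - half:]           # values above the median position
    q1 = statistics.median(lower)        # Step 3: median of lower half
    q3 = statistics.median(upper)        # Step 4: median of upper half
    return q1, q2, q3

data = [12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40]
q1, q2, q3 = quartiles_median_of_halves(data)
iqr = q3 - q1                            # Step 5
print(q1, q2, q3, iqr)                   # 17 22 30 13
```

Software packages may use other quartile conventions (for example, interpolating between neighboring values), so they can report slightly different quartiles for the same data.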

Properties of IQR

Advantages: ✓ Resistant to outliers (uses middle 50% only)
✓ More stable than range
✓ Useful with skewed data
✓ Basis for outlier detection (1.5 × IQR rule)

Disadvantages: ❌ Ignores 50% of data (lowest 25%, highest 25%)
❌ Less mathematically useful than standard deviation
❌ Harder to calculate than range

Using IQR to Identify Outliers

1.5 × IQR Rule:

Lower fence: $Q1 - 1.5 \times IQR$
Upper fence: $Q3 + 1.5 \times IQR$

Outliers: Values below lower fence or above upper fence

Example (from previous):

  • Q1 = 17, Q3 = 30, IQR = 13
  • Lower fence = 17 - 1.5(13) = 17 - 19.5 = -2.5
  • Upper fence = 30 + 1.5(13) = 30 + 19.5 = 49.5
  • Any values < -2.5 or > 49.5 are outliers
  • Every value in this dataset (12 to 40) lies inside the fences, so it has no outliers
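The fence calculation can be written as a short sketch. The helper names `iqr_fences` and `find_outliers` are illustrative, and the quartiles are passed in directly (Q1 = 17, Q3 = 30 from the example) rather than recomputed.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5 x IQR fences."""
    lower, upper = iqr_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

data = [12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40]
lower, upper = iqr_fences(17, 30)
print(lower, upper)                  # -2.5 49.5
print(find_outliers(data, 17, 30))   # []
```

The empty list confirms that none of the eleven values is flagged by the rule.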

When to Use IQR

Appropriate when: ✓ Distribution is skewed
✓ Outliers are present
✓ Want resistant measure
✓ Describing boxplots

Paired with: Median (both resistant measures)

Variance and Standard Deviation

Why We Need Them

Range and IQR don't use all data values. Variance and standard deviation measure average distance from the mean using ALL data points.

Variance ($s^2$)

Definition: Average squared deviation from the mean

Formula (sample variance): $s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$

Steps to calculate:

  1. Find mean ($\bar{x}$)
  2. Find each deviation: $(x_i - \bar{x})$
  3. Square each deviation: $(x_i - \bar{x})^2$
  4. Sum squared deviations: $\sum(x_i - \bar{x})^2$
  5. Divide by $n-1$

Note: We divide by $n-1$ (not $n$) for sample variance. This is called Bessel's correction; it makes the sample variance an unbiased estimate of the population variance.
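The effect of the divisor is easy to see numerically. In Python's standard `statistics` module, `pvariance` divides by $n$ and `variance` divides by $n-1$; the five values below are an illustrative sample whose squared deviations sum to 40.

```python
import statistics

sample = [10, 12, 14, 16, 18]   # squared deviations from the mean sum to 40

divide_by_n = statistics.pvariance(sample)          # 40 / 5
divide_by_n_minus_1 = statistics.variance(sample)   # 40 / 4

print("dividing by n:    ", divide_by_n)
print("dividing by n - 1:", divide_by_n_minus_1)
```

Dividing by $n-1$ always gives the larger of the two results, and the gap shrinks as the sample size grows.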

Standard Deviation ($s$)

Definition: Square root of variance

Formula (sample standard deviation): $s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$

Why take square root?

  • Variance is in squared units (points², dollars²)
  • Standard deviation returns to original units (points, dollars)
  • More interpretable!

Example Calculation

Data: 10, 12, 14, 16, 18

Step 1: Find mean $\bar{x} = \frac{10+12+14+16+18}{5} = \frac{70}{5} = 14$

Step 2: Find deviations and square them

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---------|---------------------|------------------------|
| 10 | -4 | 16 |
| 12 | -2 | 4 |
| 14 | 0 | 0 |
| 16 | 2 | 4 |
| 18 | 4 | 16 |

Step 3: Sum squared deviations $\sum(x_i - \bar{x})^2 = 16 + 4 + 0 + 4 + 16 = 40$

Step 4: Calculate variance $s^2 = \frac{40}{5-1} = \frac{40}{4} = 10$

Step 5: Calculate standard deviation $s = \sqrt{10} \approx 3.16$

Interpretation: A typical value lies roughly 3.16 units from the mean.
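The same calculation, written out step by step in Python. This is a sketch that mirrors the hand computation; the variable names are illustrative, and in practice `statistics.variance` and `statistics.stdev` give the same results directly.

```python
import math

data = [10, 12, 14, 16, 18]
n = len(data)

mean = sum(data) / n                       # Step 1: 70 / 5 = 14
deviations = [x - mean for x in data]      # Step 2: -4, -2, 0, 2, 4
squared = [d ** 2 for d in deviations]     #         16, 4, 0, 4, 16
sum_of_squares = sum(squared)              # Step 3: 40
variance = sum_of_squares / (n - 1)        # Step 4: 40 / 4 = 10
sd = math.sqrt(variance)                   # Step 5: about 3.16

print(f"mean = {mean}, variance = {variance}, SD = {sd:.2f}")
```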

Properties of Standard Deviation

Interpretation:

  • Typical distance from mean
  • Larger SD = more spread out
  • Smaller SD = more clustered around mean
  • SD = 0 only when all values are identical

Properties:

  • Always ≥ 0
  • Same units as original data
  • Sensitive to outliers (because we square deviations)
  • Used in many statistical procedures

Empirical Rule (for roughly normal distributions):

  • About 68% of data within 1 SD of mean
  • About 95% of data within 2 SD of mean
  • About 99.7% of data within 3 SD of mean
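These percentages come from the normal distribution itself. As a rough check, the sketch below uses `statistics.NormalDist` (Python 3.8+) to compute the share of a standard normal distribution that lies within k standard deviations of the mean; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k SDs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```

Real data that is only roughly normal will match these shares approximately, not exactly.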

When to Use Standard Deviation

Appropriate when: ✓ Distribution is roughly symmetric
✓ No extreme outliers
✓ Want to use all data
✓ Need for statistical inference
✓ Describing normal distributions

Paired with: Mean (both use all data, both sensitive to outliers)

Not appropriate when: ❌ Distribution is heavily skewed
❌ Outliers present
❌ Want resistant measure

Choosing the Right Measure

Decision Framework

Distribution Shape:

Symmetric, no outliers:

  • Center: Mean
  • Spread: Standard deviation
  • "The mean is [value] with a standard deviation of [value]"

Skewed or outliers present:

  • Center: Median
  • Spread: IQR
  • "The median is [value] with an IQR of [value]"

Comparison Table

| Measure | Resistant? | Uses All Data? | Units |
|----------------------|------------|----------------|-----------------|
| Range | No | No (only 2) | Original |
| IQR | Yes | No (middle 50%)| Original |
| Variance | No | Yes | Squared |
| Standard Deviation | No | Yes | Original |

Effect of Transformations

Adding/Subtracting a Constant

Adding $c$ to every value:

  • Range: Unchanged
  • IQR: Unchanged
  • SD: Unchanged

Example: Shift every test score up by adding 50 points

  • Original SD = 5 points
  • New SD = 5 points
  • Every score moved by the same amount, so the spread (and its units) didn't change!

Multiplying/Dividing by a Constant

Multiplying every value by $c$:

  • Range: Multiplied by $|c|$
  • IQR: Multiplied by $|c|$
  • SD: Multiplied by $|c|$
  • Variance: Multiplied by $c^2$

Example: Convert heights from inches to centimeters (multiply by 2.54)

  • Original SD = 3 inches
  • New SD = 3 × 2.54 = 7.62 cm
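Both transformation rules can be checked numerically. The heights below are hypothetical values chosen only for illustration; the sketch shifts them by a constant, rescales them from inches to centimeters, and multiplies them by a negative constant to show why the rule uses $|c|$.

```python
import math
import statistics

heights_in = [60, 63, 66, 69, 72]            # hypothetical heights in inches

shifted = [h + 50 for h in heights_in]       # add a constant
scaled = [h * 2.54 for h in heights_in]      # inches -> centimeters
negated = [h * -2 for h in heights_in]       # multiply by a negative constant

sd = statistics.stdev(heights_in)
var = statistics.variance(heights_in)

print("SD after adding 50:   ", statistics.stdev(shifted))
print("SD after x 2.54:      ", statistics.stdev(scaled))
print("SD after x (-2):      ", statistics.stdev(negated))
print("Variance after x 2.54:", statistics.variance(scaled))
```

Comparing each printed value with `sd`, `2.54 * sd`, `2 * sd`, and `2.54 ** 2 * var` confirms the rules listed in this section.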

Coefficient of Variation

Definition

Coefficient of Variation (CV): Ratio of standard deviation to mean

Formula: $CV = \frac{s}{\bar{x}} \times 100\%$

Purpose

Compare variability across different units or scales

Example:

  • Heights: Mean = 66 inches, SD = 3 inches
    CV = (3/66) × 100% = 4.5%

  • Weights: Mean = 150 lbs, SD = 20 lbs
    CV = (20/150) × 100% = 13.3%

Weights are more variable relative to their mean than heights!
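The two CV values can be reproduced with a one-line formula. The function name `coefficient_of_variation` is illustrative; the guard for a zero mean is an added assumption, since the ratio is undefined in that case.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage: (sd / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

heights_cv = coefficient_of_variation(3, 66)     # inches
weights_cv = coefficient_of_variation(20, 150)   # pounds

print(f"Heights CV: {heights_cv:.1f}%")   # 4.5%
print(f"Weights CV: {weights_cv:.1f}%")   # 13.3%
```

Because the units cancel in the ratio, the two percentages can be compared directly even though one dataset is in inches and the other in pounds.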

When to Use CV

✓ Comparing datasets with different units
✓ Comparing datasets with very different means
✓ Wanting relative (not absolute) measure of spread

Common Mistakes

Using SD with skewed data
Use IQR instead!

Forgetting units
Range, IQR, SD all have units!

Confusing variance and SD
Variance is squared units, SD is original units

Dividing by $n$ instead of $n-1$
Sample SD uses $n-1$ (degrees of freedom)

Reporting spread without center
Always report both!

Comparing SDs of very different datasets
Consider CV for fair comparison

Quick Reference

Range:

  • Formula: $\text{Max} - \text{Min}$
  • When: Quick assessment
  • Property: Sensitive to outliers

IQR:

  • Formula: $Q3 - Q1$
  • When: Skewed data, outliers
  • Property: Resistant

Standard Deviation:

  • Formula: $s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$
  • When: Symmetric, no outliers
  • Property: Uses all data

Choosing:

  • Symmetric → Mean & SD
  • Skewed → Median & IQR

Outlier Rule:

  • Outliers beyond $Q1 - 1.5 \times IQR$ or $Q3 + 1.5 \times IQR$

Remember: Spread is just as important as center! Two datasets can have the same mean but completely different spreads. Always report both center AND spread when describing data!
