Measures of Spread

Introduction

While measures of center tell us the "typical" value, measures of spread (also called measures of variability or dispersion) tell us how spread out or variable the data is. Two datasets can have the same mean but very different spreads!

Example:

Class A scores: 70, 72, 73, 74, 75 (Mean = 72.8, very consistent)
Class B scores: 50, 60, 73, 80, 100 (Mean = 72.6, highly variable)

Both classes have similar means, but Class B has much more spread!
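As a quick check of the comparison above, the following minimal Python sketch computes the mean of each class alongside the gap between its highest and lowest score. The variable names are illustrative, not part of the lesson.

```python
from statistics import mean

class_a = [70, 72, 73, 74, 75]
class_b = [50, 60, 73, 80, 100]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    gap = max(scores) - min(scores)  # distance between highest and lowest score
    print(f"{name}: mean = {mean(scores):.1f}, highest - lowest = {gap}")
```

The means differ by only 0.2 points, while the gap between the extremes is ten times larger for Class B.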

Range

Definition

Range: The difference between the maximum and minimum values

Formula: $Range = Max - Min$

Calculating Range

Example 1: Test scores: 68, 75, 82, 91, 88

Max = 91
Min = 68
Range = 91 - 68 = 23 points

Example 2: Temperatures (°F): 45, 52, 58, 51, 62, 48

Max = 62
Min = 45
Range = 62 - 45 = 17°F
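The two examples above can be reproduced with a short Python sketch. The helper name `value_range` is an assumption made for illustration; it is not a standard library function.

```python
def value_range(values):
    """Return max - min for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)

test_scores = [68, 75, 82, 91, 88]
temperatures_f = [45, 52, 58, 51, 62, 48]

print(value_range(test_scores))     # 23
print(value_range(temperatures_f))  # 17
```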

Properties of Range

Advantages: ✓ Very easy to calculate and understand
✓ Gives sense of total spread
✓ Useful for quick assessment

Disadvantages: ❌ Only uses two values (ignores all others)
❌ Extremely sensitive to outliers
❌ Doesn't tell us about distribution between min and max
❌ Tends to increase with sample size (larger samples are more likely to include extreme values)

Example of outlier sensitivity:

Without outlier: 10, 12, 13, 14, 15
Range = 15 - 10 = 5

With outlier: 10, 12, 13, 14, 15, 50
Range = 50 - 10 = 40

One outlier dramatically changed the range!

When to Use Range

Appropriate for:

Quick, rough sense of spread
Situations where the extreme values themselves matter
Quality control (acceptable range of values)

Not appropriate when:

Outliers present
Need precise measure of variability
Comparing datasets of different sizes

Interquartile Range (IQR)

Definition

IQR: The range of the middle 50% of data

Formula: $IQR = Q3 - Q1$

Where:

Q1 = First quartile (25th percentile)
Q3 = Third quartile (75th percentile)

Finding Quartiles and IQR

Step 1: Order data from smallest to largest

Step 2: Find median (Q2)

Step 3: Find median of lower half = Q1 (when n is odd, leave the overall median out of both halves)

Step 4: Find median of upper half = Q3

Step 5: Calculate IQR = Q3 - Q1

Example

Data: 12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40

Step 1: Already ordered

Step 2: Median (Q2) = 22 (middle value, n=11)

Step 3: Lower half: 12, 15, 17, 19, 20
Q1 = 17 (median of lower half)

Step 4: Upper half: 25, 28, 30, 35, 40
Q3 = 30 (median of upper half)

Step 5: IQR = 30 - 17 = 13

Interpretation: The middle 50% of data spans 13 units
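The five steps can be sketched in Python as follows. This is one possible implementation of the median-of-halves method used in the example; the function name `quartiles` is hypothetical. Statistical software often uses other quartile conventions (for instance, interpolating between values), so results from a library may differ slightly from hand calculation.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    When n is odd, the overall median is left out of both halves,
    matching the worked example in the text.
    """
    data = sorted(values)              # Step 1: order the data
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]                # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]  # values above it
    return median(lower), median(data), median(upper)

data = [12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                          # Step 5: IQR = Q3 - Q1
print(q1, q2, q3, iqr)                 # 17 22 30 13
```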

Properties of IQR

Advantages: ✓ Resistant to outliers (uses middle 50% only)
✓ More stable than range
✓ Useful with skewed data
✓ Basis for outlier detection (1.5 × IQR rule)

Disadvantages: ❌ Ignores 50% of data (lowest 25%, highest 25%)
❌ Less mathematically useful than standard deviation
❌ Harder to calculate than range

Using IQR to Identify Outliers

1.5 × IQR Rule:

Lower fence: $Q1 - 1.5 \times IQR$
Upper fence: $Q3 + 1.5 \times IQR$

Outliers: Values below lower fence or above upper fence

Example (from previous):

Q1 = 17, Q3 = 30, IQR = 13
Lower fence = 17 - 1.5(13) = 17 - 19.5 = -2.5
Upper fence = 30 + 1.5(13) = 30 + 19.5 = 49.5
Any values < -2.5 or > 49.5 are outliers. The data run from 12 to 40, so this dataset has no outliers.
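A minimal Python sketch of the fence calculation, using the quartiles from the example. The helper name `outlier_fences` and its parameter `k` are assumptions made for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40]
lower, upper = outlier_fences(17, 30)

# Keep only the values that fall outside the fences.
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper)  # -2.5 49.5
print(outliers)      # []
```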

When to Use IQR

Appropriate when: ✓ Distribution is skewed
✓ Outliers are present
✓ Want resistant measure
✓ Describing boxplots

Paired with: Median (both resistant measures)

Variance and Standard Deviation

Why We Need Them

Range and IQR don't use all data values. Variance and standard deviation measure average distance from the mean using ALL data points.

Variance ( $s^2$ )

Definition: Roughly the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by $n-1$ rather than $n$)

Formula (sample variance): $s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}$

Steps to calculate:

Find mean ( $\bar{x}$ )
Find each deviation: $(x_i - \bar{x})$
Square each deviation: $(x_i - \bar{x})^2$
Sum squared deviations: $\sum(x_i - \bar{x})^2$
Divide by $n-1$

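Note: We divide by $n-1$ (not $n$ ) for sample variance. This is called Bessel's correction. Because deviations are measured from the sample mean rather than the true population mean, dividing by $n$ tends to underestimate the population variance; dividing by $n-1$ gives an unbiased estimate.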
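The calculation steps, and the difference between dividing by $n-1$ and by $n$, can be sketched in Python. The dataset below is hypothetical and the function name `sample_variance` is illustrative; `statistics.variance` and `statistics.pvariance` are the standard library's sample and population versions.

```python
from statistics import pvariance, variance

def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n                            # find the mean
    squared_devs = [(x - x_bar) ** 2 for x in values]  # deviations, squared
    return sum(squared_devs) / (n - 1)                 # sum, divide by n - 1

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

print(f"{sample_variance(values):.3f}")  # 4.571 (divides by n - 1)
print(f"{pvariance(values):.3f}")        # 4.000 (divides by n)
```

The hand-written function should agree with `statistics.variance(values)`.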

Standard Deviation ( $s$ )

Definition: Square root of variance

Formula (sample standard deviation): $s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$

Why take square root?

Variance is in squared units (points², dollars²)
Standard deviation returns to original units (points, dollars)
More interpretable!

Example Calculation

Data: 10, 12, 14, 16, 18

Step 1: Find mean $\bar{x} = \frac{10+12+14+16+18}{5} = \frac{70}{5} = 14$

Step 2: Find deviations and square them

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|-------|-----------------|---------------------|
| 10 | -4 | 16 |
| 12 | -2 | 4 |
| 14 | 0 | 0 |
| 16 | 2 | 4 |
| 18 | 4 | 16 |

Step 3: Sum squared deviations $\sum(x_i - \bar{x})^2 = 16 + 4 + 0 + 4 + 16 = 40$

Step 4: Calculate variance $s^2 = \frac{40}{5-1} = \frac{40}{4} = 10$

Step 5: Calculate standard deviation $s = \sqrt{10} \approx 3.16$

Interpretation: Values typically lie about 3.16 units from the mean. (This is a typical distance, not the literal average of the absolute deviations, which here is 2.4.)
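The worked example can be verified step by step in Python. This is a sketch that mirrors the hand calculation and then compares it with `statistics.stdev`; the intermediate variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

data = [10, 12, 14, 16, 18]

x_bar = mean(data)                               # Step 1: mean is 14
squared_devs = [(x - x_bar) ** 2 for x in data]  # Step 2: 16, 4, 0, 4, 16
sum_sq = sum(squared_devs)                       # Step 3: 40
s_squared = sum_sq / (len(data) - 1)             # Step 4: 40 / 4 = 10
s = sqrt(s_squared)                              # Step 5: about 3.16

print(f"variance = {s_squared:.2f}, SD = {s:.2f}")  # variance = 10.00, SD = 3.16
print(f"library SD = {stdev(data):.2f}")            # library SD = 3.16
```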

Properties of Standard Deviation

Interpretation:

Typical distance from mean
Larger SD = more spread out
Smaller SD = more clustered around mean
SD = 0 only when all values are identical

Properties:

Always ≥ 0
Same units as original data
Sensitive to outliers (because we square deviations)
Used in many statistical procedures

Empirical Rule (for roughly normal distributions):

About 68% of data within 1 SD of mean
About 95% of data within 2 SD of mean
About 99.7% of data within 3 SD of mean
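The Empirical Rule can be checked informally by simulation. The sketch below draws a large sample from a normal distribution and counts the share of values within 1, 2, and 3 standard deviations of the mean. The mean of 100, SD of 15, sample size, and seed are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(100, 15) for _ in range(100_000)]  # simulated normal data

x_bar = mean(sample)
s = stdev(sample)

def share_within(k):
    """Fraction of the sample lying within k standard deviations of the mean."""
    return sum(abs(x - x_bar) <= k * s for x in sample) / len(sample)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.3f}")
```

The printed shares should land close to 0.68, 0.95, and 0.997, though the exact digits depend on the random draw.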

When to Use Standard Deviation

Appropriate when: ✓ Distribution is roughly symmetric
✓ No extreme outliers
✓ Want to use all data
✓ Needed for statistical inference
✓ Describing normal distributions

Paired with: Mean (both use all data, both sensitive to outliers)

Not appropriate when: ❌ Distribution is heavily skewed
❌ Outliers present
❌ Want resistant measure

Choosing the Right Measure

Decision Framework

Distribution Shape:

Symmetric, no outliers:

Center: Mean
Spread: Standard deviation
"The mean is [value] with a standard deviation of [value]"

Skewed or outliers present:

Center: Median
Spread: IQR
"The median is [value] with an IQR of [value]"

Comparison Table

| Measure | Resistant? | Uses All Data? | Units |
|--------------------|------------|-----------------|----------|
| Range | No | No (only 2) | Original |
| IQR | Yes | No (middle 50%) | Original |
| Variance | No | Yes | Squared |
| Standard Deviation | No | Yes | Original |

Effect of Transformations

Adding/Subtracting a Constant

Adding $c$ to every value:

Range: Unchanged
IQR: Unchanged
SD: Unchanged

Example: Curve test scores by adding 5 points to every score

Original SD = 5 points
New SD = 5 points
Every score (and so the mean) shifts up by 5, but the spread doesn't change. The units stay the same too.
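A short Python sketch shows why shifting leaves spread alone: every deviation from the mean is unchanged because the mean shifts by the same constant. The scores and the shift amount below are hypothetical.

```python
from math import isclose
from statistics import stdev

scores = [62, 70, 75, 81, 90]  # hypothetical test scores
shift = 5                      # hypothetical constant added to every score
shifted = [x + shift for x in scores]

range_before = max(scores) - min(scores)
range_after = max(shifted) - min(shifted)

print(range_before, range_after)                 # 28 28
print(isclose(stdev(scores), stdev(shifted)))    # True
```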

Multiplying/Dividing by a Constant

Multiplying every value by $c$ :

Range: Multiplied by $|c|$
IQR: Multiplied by $|c|$
SD: Multiplied by $|c|$
Variance: Multiplied by $c^2$

Example: Convert heights from inches to centimeters (multiply by 2.54)

Original SD = 3 inches
New SD = 3 × 2.54 = 7.62 cm
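The scaling rules can be checked the same way. In this sketch the heights are hypothetical; only the conversion factor 2.54 comes from the example above.

```python
from math import isclose
from statistics import stdev, variance

inches = [60, 63, 66, 69, 72]  # hypothetical heights
c = 2.54                       # inches to centimeters
cm = [x * c for x in inches]

# Range and SD scale by |c|; variance scales by c squared.
range_scales = isclose(max(cm) - min(cm), abs(c) * (max(inches) - min(inches)))
sd_scales = isclose(stdev(cm), abs(c) * stdev(inches))
var_scales = isclose(variance(cm), c ** 2 * variance(inches))

print(range_scales, sd_scales, var_scales)  # True True True
```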

Coefficient of Variation

Definition

Coefficient of Variation (CV): Ratio of standard deviation to mean

Formula: $CV = \frac{s}{\bar{x}} \times 100\%$

Purpose

Compare variability across different units or scales

Example:

Heights: Mean = 66 inches, SD = 3 inches
CV = (3/66) × 100% = 4.5%
Weights: Mean = 150 lbs, SD = 20 lbs
CV = (20/150) × 100% = 13.3%

Weights are more variable relative to their mean than heights!
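The comparison above can be sketched in Python. The function name `coefficient_of_variation` is an assumption made for illustration; the guard against a zero mean is there because the ratio is undefined in that case.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage: (sd / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

heights_cv = coefficient_of_variation(3, 66)    # inches
weights_cv = coefficient_of_variation(20, 150)  # pounds

print(f"heights: {heights_cv:.1f}%")  # heights: 4.5%
print(f"weights: {weights_cv:.1f}%")  # weights: 13.3%
```

Because the units cancel in the ratio, the two percentages can be compared directly even though one dataset is in inches and the other in pounds.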

When to Use CV

✓ Comparing datasets with different units
✓ Comparing datasets with very different means
✓ Wanting relative (not absolute) measure of spread

Common Mistakes

❌ Using SD with skewed data
Use IQR instead!

❌ Forgetting units
Range, IQR, SD all have units!

❌ Confusing variance and SD
Variance is squared units, SD is original units

❌ Dividing by $n$ instead of $n-1$
Sample SD uses $n-1$ (degrees of freedom)

❌ Reporting spread without center
Always report both!

❌ Comparing SDs of very different datasets
Consider CV for fair comparison

Quick Reference

Range:

Formula: $Max - Min$
When: Quick assessment
Property: Sensitive to outliers

IQR:

Formula: $Q3 - Q1$
When: Skewed data, outliers
Property: Resistant

Standard Deviation:

Formula: $s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$
When: Symmetric, no outliers
Property: Uses all data

Choosing:

Symmetric → Mean & SD
Skewed → Median & IQR

Outlier Rule:

Outliers beyond $Q1 - 1.5 \times IQR$ or $Q3 + 1.5 \times IQR$

Remember: Spread is just as important as center! Two datasets can have the same mean but completely different spreads. Always report both center AND spread when describing data!
