Measures of Spread
Range, IQR, standard deviation, and variance
Introduction
While measures of center tell us the "typical" value, measures of spread (also called measures of variability or dispersion) tell us how spread out or variable the data is. Two datasets can have the same mean but very different spreads!
Example:
- Class A scores: 70, 72, 73, 74, 75 (Mean = 72.8, very consistent)
- Class B scores: 50, 60, 73, 80, 100 (Mean = 72.6, highly variable)
Both classes have similar means, but Class B has much more spread!
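The comparison above can be checked directly. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the range (maximum minus minimum) stands in as the simplest measure of spread.

```python
import statistics

class_a = [70, 72, 73, 74, 75]
class_b = [50, 60, 73, 80, 100]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    mean = statistics.mean(scores)
    spread = max(scores) - min(scores)  # range: maximum minus minimum
    print(f"{name}: mean = {mean:.1f}, range = {spread}")
# Class A: mean = 72.8, range = 5
# Class B: mean = 72.6, range = 50
```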
Range
Definition
Range: The difference between the maximum and minimum values
Formula: $\text{Range} = \text{Max} - \text{Min}$
Calculating Range
Example 1: Test scores: 68, 75, 82, 91, 88
- Max = 91
- Min = 68
- Range = 91 - 68 = 23 points
Example 2: Temperatures (°F): 45, 52, 58, 51, 62, 48
- Max = 62
- Min = 45
- Range = 62 - 45 = 17°F
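The two examples above can be reproduced with a small helper. This is a sketch only; the function name `data_range` is hypothetical (chosen to avoid shadowing Python's built-in `range`).

```python
def data_range(values):
    """Return maximum minus minimum for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)

test_scores = [68, 75, 82, 91, 88]
temperatures_f = [45, 52, 58, 51, 62, 48]

print(data_range(test_scores))     # 23
print(data_range(temperatures_f))  # 17
```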
Properties of Range
Advantages:
✓ Very easy to calculate and understand
✓ Gives sense of total spread
✓ Useful for quick assessment
Disadvantages:
✗ Only uses two values (ignores all others)
✗ Extremely sensitive to outliers
✗ Doesn't tell us about distribution between min and max
✗ Tends to increase with sample size (adding values can widen the range but never shrink it)
Example of outlier sensitivity:
Without outlier: 10, 12, 13, 14, 15
Range = 15 - 10 = 5
With outlier: 10, 12, 13, 14, 15, 50
Range = 50 - 10 = 40
One outlier dramatically changed the range!
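To see both the sample-size effect and the outlier effect at once, the sketch below tracks the range as the values from the example are added one at a time. The helper name `running_ranges` is hypothetical.

```python
def running_ranges(values):
    """Range of the first 1, 2, 3, ... values, in the order given."""
    ranges = []
    lowest = highest = values[0]
    for v in values:
        lowest = min(lowest, v)
        highest = max(highest, v)
        ranges.append(highest - lowest)
    return ranges

print(running_ranges([10, 12, 13, 14, 15, 50]))  # [0, 2, 3, 4, 5, 40]
```

The range never decreases as values are added, and the single value 50 accounts for the jump from 5 to 40.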
When to Use Range
Appropriate for:
- Quick, rough sense of spread
- Knowing the extreme values matters
- Quality control (acceptable range of values)
Not appropriate when:
- Outliers present
- Need precise measure of variability
- Comparing datasets of different sizes
Interquartile Range (IQR)
Definition
IQR: The range of the middle 50% of data
Formula: $\text{IQR} = Q_3 - Q_1$
Where:
- Q1 = First quartile (25th percentile)
- Q3 = Third quartile (75th percentile)
Finding Quartiles and IQR
Step 1: Order data from smallest to largest
Step 2: Find median (Q2)
Step 3: Find median of lower half = Q1
Step 4: Find median of upper half = Q3
Step 5: Calculate IQR = Q3 - Q1
Example
Data: 12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40
Step 1: Already ordered
Step 2: Median (Q2) = 22 (middle value, n=11)
Step 3: Lower half: 12, 15, 17, 19, 20
Q1 = 17 (median of lower half)
Step 4: Upper half: 25, 28, 30, 35, 40
Q3 = 30 (median of upper half)
Step 5: IQR = 30 - 17 = 13
Interpretation: The middle 50% of data spans 13 units
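Steps 1 to 5 can be sketched in Python as follows. The function name `quartiles_by_halves` is hypothetical, and it implements the median-of-halves method from this lesson (for an odd number of values, the overall median is left out of both halves). Other software may use different quartile conventions and return slightly different values. The sketch assumes at least two data values.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)                               # Step 1: order the data
    n = len(data)
    half = n // 2
    lower = data[:half]                                 # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]   # values above the median
    return (
        statistics.median(lower),   # Step 3: Q1
        statistics.median(data),    # Step 2: Q2
        statistics.median(upper),   # Step 4: Q3
    )

data = [12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40]
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3, q3 - q1)  # 17 22 30 13
```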
Properties of IQR
Advantages:
✓ Resistant to outliers (uses middle 50% only)
✓ More stable than range
✓ Useful with skewed data
✓ Basis for outlier detection (1.5 × IQR rule)
Disadvantages:
✗ Ignores 50% of data (lowest 25%, highest 25%)
✗ Less mathematically useful than standard deviation
✗ Harder to calculate than range
Using IQR to Identify Outliers
1.5 × IQR Rule:
Lower fence: $Q_1 - 1.5 \times \text{IQR}$
Upper fence: $Q_3 + 1.5 \times \text{IQR}$
Outliers: Values below lower fence or above upper fence
Example (from previous):
- Q1 = 17, Q3 = 30, IQR = 13
- Lower fence = 17 - 1.5(13) = 17 - 19.5 = -2.5
- Upper fence = 30 + 1.5(13) = 30 + 19.5 = 49.5
- Any values < -2.5 or > 49.5 are outliers; the data run from 12 to 40, so this dataset has none
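The fence calculation can be sketched as below. The function names `iqr_fences` and `find_outliers` are hypothetical, and the quartiles are passed in by hand (Q1 = 17, Q3 = 30 from the example) rather than recomputed.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Values strictly below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

data = [12, 15, 17, 19, 20, 22, 25, 28, 30, 35, 40]
lower, upper = iqr_fences(17, 30)
print(lower, upper)                 # -2.5 49.5
print(find_outliers(data, 17, 30))  # []
```

A hypothetical value such as 55 would lie above the upper fence of 49.5 and be flagged.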
When to Use IQR
Appropriate when:
✓ Distribution is skewed
✓ Outliers are present
✓ Want resistant measure
✓ Describing boxplots
Paired with: Median (both resistant measures)
Variance and Standard Deviation
Why We Need Them
Range and IQR don't use all data values. Variance and standard deviation measure average distance from the mean using ALL data points.
Variance ($s^2$)
Definition: Average squared deviation from the mean
Formula (sample variance): $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Steps to calculate:
- Find mean ($\bar{x}$)
- Find each deviation: $x_i - \bar{x}$
- Square each deviation: $(x_i - \bar{x})^2$
- Sum squared deviations: $\sum (x_i - \bar{x})^2$
- Divide by $n - 1$
Note: We divide by $n - 1$ (not $n$) for sample variance. This is called Bessel's correction and gives an unbiased estimate of the population variance.
Standard Deviation ($s$)
Definition: Square root of variance
Formula (sample standard deviation): $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Why take square root?
- Variance is in squared units (points², dollars²)
- Standard deviation returns to original units (points, dollars)
- More interpretable!
Example Calculation
Data: 10, 12, 14, 16, 18
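Step 1: Find mean
$\bar{x} = \frac{10 + 12 + 14 + 16 + 18}{5} = \frac{70}{5} = 14$
Step 2: Find deviations and square them
| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---------|---------------------|------------------------|
| 10 | -4 | 16 |
| 12 | -2 | 4 |
| 14 | 0 | 0 |
| 16 | 2 | 4 |
| 18 | 4 | 16 |
Step 3: Sum squared deviations
$\sum (x_i - \bar{x})^2 = 16 + 4 + 0 + 4 + 16 = 40$
Step 4: Calculate variance
$s^2 = \frac{40}{5 - 1} = \frac{40}{4} = 10$
Step 5: Calculate standard deviation
$s = \sqrt{10} \approx 3.16$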
Interpretation: On average, values deviate from the mean by about 3.16 units.
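The same five steps can be followed in code. This is a sketch under the lesson's definitions (sample variance with $n - 1$ in the denominator); the final two lines cross-check the hand calculation against the standard library's `statistics.variance` and `statistics.stdev`.

```python
import math
import statistics

data = [10, 12, 14, 16, 18]
n = len(data)

mean = sum(data) / n                            # Step 1: 14.0
squared_devs = [(x - mean) ** 2 for x in data]  # Step 2: 16, 4, 0, 4, 16
total = sum(squared_devs)                       # Step 3: 40.0
variance = total / (n - 1)                      # Step 4: 10.0
std_dev = math.sqrt(variance)                   # Step 5: about 3.16

print(mean, total, variance, round(std_dev, 2))  # 14.0 40.0 10.0 3.16

# Cross-check against the standard library's sample statistics.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))
```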
Properties of Standard Deviation
Interpretation:
- Typical distance from mean
- Larger SD = more spread out
- Smaller SD = more clustered around mean
- SD = 0 only when all values are identical
Properties:
- Always ≥ 0
- Same units as original data
- Sensitive to outliers (because we square deviations)
- Used in many statistical procedures
Empirical Rule (for roughly normal distributions):
- About 68% of data within 1 SD of mean
- About 95% of data within 2 SD of mean
- About 99.7% of data within 3 SD of mean
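The three percentages come from the normal distribution itself. As a quick check, the sketch below uses `statistics.NormalDist` (available in Python 3.8 and later) to compute the share of a normal distribution lying within 1, 2, and 3 standard deviations of the mean. Real data that are only roughly normal will match these figures only approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```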
When to Use Standard Deviation
Appropriate when:
✓ Distribution is roughly symmetric
✓ No extreme outliers
✓ Want to use all data
✓ Need for statistical inference
✓ Describing normal distributions
Paired with: Mean (both use all data, both sensitive to outliers)
Not appropriate when:
✗ Distribution is heavily skewed
✗ Outliers present
✗ Want resistant measure
Choosing the Right Measure
Decision Framework
Distribution Shape:
Symmetric, no outliers:
- Center: Mean
- Spread: Standard deviation
- "The mean is [value] with a standard deviation of [value]"
Skewed or outliers present:
- Center: Median
- Spread: IQR
- "The median is [value] with an IQR of [value]"
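One way to turn this framework into code is sketched below. It rests on a loud assumption: "outliers present" is judged only by the 1.5 × IQR rule (with median-of-halves quartiles), and skewness is not tested at all, so a plot of the data is still needed in practice. The function name `describe` is hypothetical.

```python
import statistics

def describe(values):
    """Report mean & SD if no 1.5 x IQR outliers are found, else median & IQR.

    Assumption: outliers are the only trigger; skewness is not checked here.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    has_outliers = any(v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr for v in data)
    if has_outliers:
        return f"The median is {statistics.median(data)} with an IQR of {iqr}"
    mean, sd = statistics.mean(data), statistics.stdev(data)
    return f"The mean is {mean:.2f} with a standard deviation of {sd:.2f}"

print(describe([10, 12, 13, 14, 15]))      # no outliers: reports mean and SD
print(describe([10, 12, 13, 14, 15, 50]))  # 50 is flagged: reports median and IQR
```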
Comparison Table
| Measure | Resistant? | Uses All Data? | Units |
|----------------------|------------|----------------|-----------------|
| Range | No | No (only 2) | Original |
| IQR | Yes | No (middle 50%) | Original |
| Variance | No | Yes | Squared |
| Standard Deviation | No | Yes | Original |
Effect of Transformations
Adding/Subtracting a Constant
Adding a constant $c$ to every value:
- Range: Unchanged
- IQR: Unchanged
- SD: Unchanged
Example: Shift every test score up by 50 points
- Original SD = 5 points
- New SD = 5 points
- Every value moved by the same amount, so the spread didn't change!
Multiplying/Dividing by a Constant
Multiplying every value by a constant $c$:
- Range: Multiplied by $|c|$
- IQR: Multiplied by $|c|$
- SD: Multiplied by $|c|$
- Variance: Multiplied by $c^2$
Example: Convert heights from inches to centimeters (multiply by 2.54)
- Original SD = 3 inches
- New SD = 3 × 2.54 = 7.62 cm
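Both transformation rules can be verified numerically. The sketch below uses a small hypothetical list of heights (the values are made up for illustration) and checks that shifting leaves the spread alone while rescaling multiplies it.

```python
import math
import statistics

heights_in = [62, 65, 66, 68, 71]        # hypothetical heights in inches

shifted = [h + 50 for h in heights_in]   # add a constant to every value
scaled = [h * 2.54 for h in heights_in]  # multiply every value (inches to cm)

sd = statistics.stdev(heights_in)
var = statistics.variance(heights_in)
rng = max(heights_in) - min(heights_in)

# Adding a constant: spread is unchanged.
assert math.isclose(statistics.stdev(shifted), sd)
assert max(shifted) - min(shifted) == rng

# Multiplying by 2.54: range and SD scale by 2.54, variance by 2.54 squared.
assert math.isclose(max(scaled) - min(scaled), 2.54 * rng)
assert math.isclose(statistics.stdev(scaled), 2.54 * sd)
assert math.isclose(statistics.variance(scaled), 2.54 ** 2 * var)
print("all transformation checks passed")
```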
Coefficient of Variation
Definition
Coefficient of Variation (CV): Ratio of standard deviation to mean
Formula: $\text{CV} = \frac{s}{\bar{x}} \times 100\%$
Purpose
Compare variability across different units or scales
Example:
- Heights: Mean = 66 inches, SD = 3 inches
  CV = (3/66) × 100% ≈ 4.5%
- Weights: Mean = 150 lbs, SD = 20 lbs
  CV = (20/150) × 100% ≈ 13.3%
Weights are more variable relative to their mean than heights!
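The comparison above can be reproduced with a short helper. This is a sketch; the function name `coefficient_of_variation` is hypothetical, and it guards against a zero mean, where the ratio is undefined.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

print(f"Heights: {coefficient_of_variation(3, 66):.1f}%")    # Heights: 4.5%
print(f"Weights: {coefficient_of_variation(20, 150):.1f}%")  # Weights: 13.3%
```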
When to Use CV
✓ Comparing datasets with different units
✓ Comparing datasets with very different means
✓ Wanting relative (not absolute) measure of spread
Common Mistakes
✗ Using SD with skewed data
Use IQR instead!
✗ Forgetting units
Range, IQR, SD all have units!
✗ Confusing variance and SD
Variance is squared units, SD is original units
✗ Dividing by $n$ instead of $n - 1$
Sample SD uses $n - 1$ (degrees of freedom)
✗ Reporting spread without center
Always report both!
✗ Comparing SDs of very different datasets
Consider CV for fair comparison
Quick Reference
Range:
- Formula: $\text{Max} - \text{Min}$
- When: Quick assessment
- Property: Sensitive to outliers
IQR:
- Formula: $Q_3 - Q_1$
- When: Skewed data, outliers
- Property: Resistant
Standard Deviation:
- Formula: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
- When: Symmetric, no outliers
- Property: Uses all data
Choosing:
- Symmetric → Mean & SD
- Skewed → Median & IQR
Outlier Rule:
- Outliers beyond $Q_1 - 1.5 \times \text{IQR}$ or $Q_3 + 1.5 \times \text{IQR}$
Remember: Spread is just as important as center! Two datasets can have the same mean but completely different spreads. Always report both center AND spread when describing data!
Practice Problems
Problem 1 (easy)
Question:
Calculate the range for this dataset: 45, 52, 48, 61, 55, 49, 58
Solution:
Step 1: Identify minimum and maximum
Data: 45, 52, 48, 61, 55, 49, 58
Minimum value = 45
Maximum value = 61
Step 2: Calculate range
Range = Maximum - Minimum
Range = 61 - 45
Range = 16
Step 3: Interpret
- The data spans 16 units
- This is the difference between the highest and lowest values
- Simple measure of spread, but affected by outliers
Answer: Range = 16
Problem 2 (easy)
Question:
Given this five-number summary: Min=20, Q1=35, Median=50, Q3=65, Max=90. Calculate the IQR and range.
Solution:
Step 1: Calculate IQR
IQR = Q3 - Q1
IQR = 65 - 35
IQR = 30
Step 2: Calculate Range
Range = Max - Min
Range = 90 - 20
Range = 70
Step 3: Compare the two measures
- IQR = 30 (middle 50% of data spans 30 units)
- Range = 70 (all data spans 70 units)
Step 4: Interpret
- IQR is resistant to outliers (only uses middle 50%)
- Range is sensitive to outliers (uses extremes)
- IQR is better for skewed data
Answer: IQR = 30, Range = 70
Problem 3 (medium)
Question:
Two classes took the same test. Both have a mean of 75. Class A has a standard deviation of 5, and Class B has a standard deviation of 15. What does this tell you about the two classes?
Solution:
Step 1: Understand standard deviation
- SD measures the typical distance from the mean
- Higher SD = more spread out
- Lower SD = more clustered around mean
Step 2: Analyze Class A (SD = 5)
- Scores tightly clustered around mean of 75
- Most students scored close to 75
- Typical deviation from mean: about 5 points
- Likely range: roughly 65-85 (most within 2 SD)
- Very consistent performance
Step 3: Analyze Class B (SD = 15)
- Scores widely spread around mean of 75
- More variability in performance
- Typical deviation from mean: about 15 points
- Likely range: roughly 45-105 (most within 2 SD; the upper end is capped by the test's maximum score)
- Very inconsistent performance
Step 4: Compare the classes
- Class A: Homogeneous, similar ability levels, consistent
- Class B: Heterogeneous, mixed ability levels, varied
Possible explanations for Class B:
- Some students very prepared, others not
- Wider range of abilities
- Some students may have guessed more
- More diverse backgrounds/preparation
Step 5: Teaching implications
- Class A: Whole-class instruction may work well
- Class B: May need differentiated instruction
Answer: Class A (SD=5) has students performing very similarly, with most scores close to 75. Class B (SD=15) has much more variability: some students did very well, others poorly. Both classes average the same, but Class B is much more spread out.
Problem 4 (medium)
Question:
Calculate the standard deviation for this small dataset: 2, 4, 6, 8, 10
Solution:
Step 1: Calculate the mean
Mean = (2 + 4 + 6 + 8 + 10) / 5 = 30 / 5 = 6
Step 2: Calculate deviations from mean
| Value | Deviation from mean |
|-------|---------------------|
| 2 | 2 - 6 = -4 |
| 4 | 4 - 6 = -2 |
| 6 | 6 - 6 = 0 |
| 8 | 8 - 6 = 2 |
| 10 | 10 - 6 = 4 |
Step 3: Square the deviations
(-4)² = 16
(-2)² = 4
(0)² = 0
(2)² = 4
(4)² = 16
Step 4: Find average of squared deviations (variance)
For sample: divide by (n - 1) = 4
Variance = (16 + 4 + 0 + 4 + 16) / 4 = 40 / 4 = 10
Step 5: Take square root (standard deviation)
SD = √10 ≈ 3.16
Step 6: Interpret
- On average, values deviate about 3.16 units from the mean of 6
- Makes sense: values are 2, 4, 6, 8, 10 (spread from -4 to +4)
Note: We used (n-1) because this is sample data. For a population, we'd use n.
Answer: s ≈ 3.16
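To see how much the choice between $n - 1$ and $n$ matters for this dataset, the sketch below compares the standard library's sample and population functions (`statistics.stdev` and `statistics.pstdev`).

```python
import statistics

data = [2, 4, 6, 8, 10]

sample_sd = statistics.stdev(data)       # divides by n - 1 = 4
population_sd = statistics.pstdev(data)  # divides by n = 5

print(round(sample_sd, 2), round(population_sd, 2))  # 3.16 2.83
```

With only five values the gap is noticeable; it shrinks as the sample size grows.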
Problem 5 (hard)
Question:
Compare and contrast range, IQR, and standard deviation as measures of spread. When should you use each?
Solution:
RANGE:
Definition: Maximum - Minimum
Advantages:
- Very easy to calculate
- Easy to understand
- Shows total spread
Disadvantages:
- Uses only 2 values (ignores all others)
- Extremely sensitive to outliers
- Doesn't show where data is concentrated
When to use:
- Quick rough measure
- When outliers aren't a concern
- Small datasets
INTERQUARTILE RANGE (IQR):
Definition: Q3 - Q1 (middle 50% spread)
Advantages:
- Resistant to outliers
- Shows spread of middle 50%
- Good with skewed data
- Used to identify outliers
Disadvantages:
- Ignores outer 50% of data
- Doesn't use all information
- Less precise than SD
When to use:
- Skewed distributions
- Data with outliers
- Paired with median
- Five-number summary
STANDARD DEVIATION (SD):
Definition: √[Σ(x - x̄)² / (n-1)], the typical distance from the mean
Advantages:
- Uses ALL data values
- Mathematically precise
- Best for normal distributions
- Standard in statistics
- Used in inference
Disadvantages:
- Not resistant to outliers
- Hard to calculate by hand
- Less intuitive
- Assumes interval data
When to use:
- Symmetric distributions
- Normal distributions
- No major outliers
- Paired with mean
- Statistical inference
SUMMARY TABLE:
| Question | Range | IQR | SD |
|-----------------------|-------|--------|-----|
| Resistant? | No | Yes | No |
| Uses all data? | No | No | Yes |
| Easy to calculate? | Yes | Medium | No |
| Best for skewed data? | No | Yes | No |
| Best for normal data? | No | No | Yes |
PAIRING:
- Mean + SD (symmetric data, no outliers)
- Median + IQR (skewed data, outliers present)
Answer: Use range for quick estimates. Use IQR for skewed data or outliers (resistant). Use SD for normal distributions (uses all data, best statistical properties). Match with mean (SD) or median (IQR).