Measures of Center
Mean, median, and mode
Introduction
Measures of center describe the "typical" or "middle" value in a dataset. They help us answer: "What is a representative value?" The three main measures — mean, median, and mode — each have different properties and appropriate uses.
The Mean
Definition
Mean (\(\bar{x}\)): The arithmetic average
Formula:
\[ \bar{x} = \frac{\sum x}{n} \]
Where:
- \(\sum x\) = sum of all values
- \(n\) = number of observations
Calculating the Mean
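Example 1: Test scores: 85, 90, 78, 92, 88
\[ \bar{x} = \frac{85 + 90 + 78 + 92 + 88}{5} = \frac{433}{5} = 86.6 \]
Mean test score = 86.6 points
Example 2: Heights (in inches): 64, 67, 65, 70, 64
\[ \bar{x} = \frac{64 + 67 + 65 + 70 + 64}{5} = \frac{330}{5} = 66 \]
Mean height = 66 inches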
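The two examples can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and `statistics.mean` from the standard library is used only to confirm the hand calculation.

```python
from statistics import mean

test_scores = [85, 90, 78, 92, 88]
heights_in = [64, 67, 65, 70, 64]

# Mean = sum of all values / number of observations
manual_mean = sum(test_scores) / len(test_scores)

print(manual_mean)        # 86.6
print(mean(test_scores))  # 86.6
print(mean(heights_in))   # 66
```

Computing the mean by hand and with the library function gives the same result, which is a quick way to catch arithmetic slips.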
Properties of the Mean
Uses all data:
- Every value contributes
- Changing any value changes the mean
- The deviations from the mean always sum to 0
Balance point:
- If the data were placed on a number line with equal weights, the mean is the point where the line would balance
- Sum of distances below the mean = sum of distances above the mean
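The zero-sum and balance-point properties can be verified numerically. The sketch below reuses the heights from Example 2 (chosen because their mean is a whole number); the variable names are illustrative.

```python
heights_in = [64, 67, 65, 70, 64]
x_bar = sum(heights_in) / len(heights_in)  # 66.0

# Deviation of each value from the mean: -2, 1, -1, 4, -2
deviations = [x - x_bar for x in heights_in]
total_deviation = sum(deviations)

# Total distance of the values below the mean vs. above the mean
distance_below = sum(x_bar - x for x in heights_in if x < x_bar)
distance_above = sum(x - x_bar for x in heights_in if x > x_bar)

print(total_deviation)                 # 0.0
print(distance_below, distance_above)  # 5.0 5.0
```

With data whose mean is not exactly representable in floating point, the total deviation may come out as a tiny nonzero number, so a comparison such as `math.isclose` with an absolute tolerance is safer than testing for exact equality.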
Sensitive to outliers:
- Extreme values pull mean toward them
- One very high/low value can change mean substantially
Example showing outlier effect:
Without outlier: 10, 12, 11, 13, 12
\[ \bar{x} = \frac{58}{5} = 11.6 \]
With outlier: 10, 12, 11, 13, 12, 100
\[ \bar{x} = \frac{158}{6} \approx 26.3 \]
The outlier (100) dramatically increased the mean from 11.6 to about 26.3!
When to Use the Mean
Appropriate when:
✓ Distribution is roughly symmetric
✓ No extreme outliers
✓ Need to use all data values
✓ Want mathematical properties (use in further calculations)
Not appropriate when:
❌ Distribution is heavily skewed
❌ Outliers present
❌ Want resistant measure
❌ Data is ordinal (ranked) only
The Median
Definition
Median: The middle value when data is ordered
- 50th percentile
- Splits data in half
- Half values below, half above
Finding the Median
Step 1: Order data from smallest to largest
Step 2: Find middle position
If \(n\) is odd: Median = middle value
Position = \(\frac{n + 1}{2}\)
If \(n\) is even: Median = average of two middle values
Positions = \(\frac{n}{2}\) and \(\frac{n}{2} + 1\)
Examples
Example 1 (odd n): Scores: 78, 85, 90, 82, 88
Step 1: Order: 78, 82, 85, 88, 90
Step 2: \(n = 5\) (odd), position = \(\frac{5 + 1}{2} = 3\)
Median = 85 (the 3rd value)
Example 2 (even n): Scores: 78, 85, 90, 82, 88, 92
Step 1: Order: 78, 82, 85, 88, 90, 92
Step 2: \(n = 6\) (even), positions = 3 and 4
Step 3: Values are 85 and 88
Median = \(\frac{85 + 88}{2} = 86.5\)
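The ordering and position steps translate directly into code. The function below is a minimal sketch; `median_by_steps` is a hypothetical name, and the standard library's `statistics.median` appears only as a cross-check.

```python
from statistics import median


def median_by_steps(values):
    """Median found by sorting, then picking the middle position(s)."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)  # Step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [78, 85, 90, 82, 88]
even_scores = [78, 85, 90, 82, 88, 92]

print(median_by_steps(odd_scores))   # 85
print(median_by_steps(even_scores))  # 86.5
```

Sorting inside the function means the caller cannot forget the ordering step, which is the most common mistake when finding a median by hand.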
Properties of the Median
Resistant to outliers:
- Position-based, not value-based
- Extreme values don't affect it much
- More stable measure for skewed data
Example:
Data: 10, 12, 11, 13, 12 → ordered: 10, 11, 12, 12, 13 → Median = 12
With outlier: 10, 12, 11, 13, 12, 100 → ordered: 10, 11, 12, 12, 13, 100 → Median = (12 + 12) / 2 = 12
The outlier didn't change the median!
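To see the contrast with the mean side by side, the sketch below computes both measures for the same data with and without the outlier. It relies only on `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

data = [10, 12, 11, 13, 12]
with_outlier = data + [100]

# The mean moves a lot when 100 is added
print(round(mean(data), 1))          # 11.6
print(round(mean(with_outlier), 1))  # 26.3

# The median stays at 12
print(median(data))                  # 12
print(median(with_outlier))          # 12.0
```

The median here is completely unchanged because the two middle values after adding the outlier are both 12. In other datasets the median may shift slightly, but far less than the mean.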
50-50 split:
- Half the data ≤ median
- Half the data ≥ median
- Useful for understanding data distribution
Not affected by exact values:
- Only needs order and middle position
- Works well for ordinal data (rankings)
When to Use the Median
Appropriate when:
✓ Distribution is skewed
✓ Outliers are present
✓ Want resistant measure
✓ Data is ordinal (ordered categories)
✓ Interested in "typical" individual
Examples where median is better:
- Income (right-skewed, few very high earners)
- Home prices (right-skewed, few very expensive homes)
- Reaction times (right-skewed, occasional very slow responses)
The Mode
Definition
Mode: The most frequently occurring value
- Can have one mode (unimodal)
- Can have multiple modes (bimodal, multimodal)
- Can have no mode (all values occur once)
Finding the Mode
Count frequency of each value, identify most common
Example 1: Scores: 85, 90, 85, 92, 88, 85
- 85 appears 3 times
- 90, 92, 88 each appear once
- Mode = 85
Example 2: Scores: 85, 90, 85, 92, 90, 88
- 85 appears twice
- 90 appears twice
- Modes = 85 and 90 (bimodal)
Example 3: Scores: 85, 90, 92, 88, 82
- All values appear once
- No mode
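The frequency-counting procedure can be sketched with `collections.Counter`. The helper name `modes` is hypothetical, and it follows this lesson's convention that a dataset in which every value occurs once has no mode. The car-color list is made-up illustration data.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent value(s); empty if none repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value occurs once: treated here as "no mode"
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([85, 90, 85, 92, 88, 85]))  # [85]
print(modes([85, 90, 85, 92, 90, 88]))  # [85, 90]
print(modes([85, 90, 92, 88, 82]))      # []
print(modes(["white", "black", "white", "silver"]))  # ['white']
```

Conventions differ between tools: Python's built-in `statistics.multimode`, for example, returns every value when all of them occur once, rather than an empty list.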
When to Use the Mode
Appropriate when:
✓ Categorical data
✓ Want most common value
✓ Describing bimodal distributions
Examples:
- "The most common car color is white" (mode of categorical data)
- "The distribution is bimodal with peaks at 65 and 72" (describing shape)
Not very useful for:
❌ Continuous numerical data (values rarely repeat)
❌ Summarizing center of distribution
Comparing Mean and Median
Relationship to Distribution Shape
Symmetric distribution: Mean ≈ Median
Both measures give similar values; either can be used
Right-skewed distribution: Mean > Median
Mean pulled right by high values in tail
Median more representative of "typical" value
Left-skewed distribution: Mean < Median
Mean pulled left by low values in tail
Median more representative of "typical" value
Visual Representation
Symmetric: Mean and median at same location (center of distribution)
Right-skewed: Mean to the right of median (toward tail)
Left-skewed: Mean to the left of median (toward tail)
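A small numerical illustration of these three cases follows. The datasets are made up for this sketch (they do not come from the lesson), and the left-skewed set is simply the mirror image of the right-skewed one.

```python
from statistics import mean, median

# Hypothetical datasets chosen to show each shape
symmetric = [1, 2, 3, 3, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 4, 13]      # long tail toward high values
left_skewed = [1, 10, 11, 11, 12, 12, 13]  # long tail toward low values

for name, values in [
    ("symmetric", symmetric),
    ("right-skewed", right_skewed),
    ("left-skewed", left_skewed),
]:
    print(name, "mean =", mean(values), "median =", median(values))
# symmetric mean = 3 median = 3
# right-skewed mean = 4 median = 3
# left-skewed mean = 10 median = 11
```

In each skewed case the mean sits on the tail side of the median, matching the pattern described above. The pattern is a useful rule of thumb for typical unimodal data rather than a guarantee for every possible dataset.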
Choosing Between Mean and Median
Use Mean when:
- Distribution is symmetric
- No outliers or extreme skewness
- Want to use all data
- Need for further calculations (variance, hypothesis tests)
Use Median when:
- Distribution is skewed
- Outliers are present
- Want resistant measure
- Ordinal data
- Interested in "typical" individual rather than arithmetic average
Real-world example: Income
Town income data:
- Median income: 45,000 dollars
- Mean income: 75,000 dollars
Mean is much higher because a few very wealthy residents pull it up. The median of 45,000 dollars better represents the "typical" resident's income.
Weighted Mean
Definition
Weighted Mean: When values have different importance or frequency
Formula:
\[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \]
Where:
- \(w_i\) = weight for each value
- \(x_i\) = data value
Example: Course Grade
Your course grade is calculated as:
- Tests: 60% of grade (weight = 0.60)
- Homework: 25% of grade (weight = 0.25)
- Final: 15% of grade (weight = 0.15)
Scores:
- Test average: 85
- Homework average: 92
- Final exam: 78
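Weighted mean:
\[ \bar{x}_w = \frac{0.60(85) + 0.25(92) + 0.15(78)}{0.60 + 0.25 + 0.15} = \frac{51 + 23 + 11.7}{1} = 85.7 \]
Course grade = 85.7%
Note: You cannot just average 85, 92, and 78 (that would give 85.0) because they have different weights!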
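The course-grade calculation can be sketched as a small function. `weighted_mean` is a hypothetical helper name; it divides by the sum of the weights, so it also works when the weights do not add up to 1.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [85, 92, 78]          # tests, homework, final
weights = [0.60, 0.25, 0.15]

course_grade = weighted_mean(scores, weights)
simple_average = sum(scores) / len(scores)

print(round(course_grade, 1))  # 85.7
print(simple_average)          # 85.0
```

The unweighted average (85.0) differs from the weighted grade (85.7) because tests count for more than the final exam in this grading scheme.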
Trimmed Mean
Definition
Trimmed Mean: Mean calculated after removing extreme values
Common: 5% trimmed mean (remove lowest 5% and highest 5%)
Purpose
- More resistant than regular mean
- Still uses most of data
- Compromise between mean and median
Example
Data (ordered): 10, 12, 13, 14, 15, 16, 17, 18, 19, 100
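Regular mean: \(\frac{234}{10} = 23.4\) (affected by outlier 100)
10% trimmed mean: Remove lowest 10% (10) and highest 10% (100)
\[ \frac{12 + 13 + 14 + 15 + 16 + 17 + 18 + 19}{8} = \frac{124}{8} = 15.5 \]
Trimmed mean (15.5) is more representative than regular mean (23.4)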
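A trimmed mean is straightforward to sketch in code. The function name `trimmed_mean` and its `proportion` parameter are illustrative choices. When the number of values to drop is not a whole number, this version rounds down, which is one common convention but not the only one.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    if not values:
        raise ValueError("trimmed mean of an empty dataset is undefined")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # values to drop from EACH end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 100]

print(sum(data) / len(data))     # 23.4
print(trimmed_mean(data, 0.10))  # 15.5
```

With 10 values and a 10% trim, exactly one value is removed from each end, which reproduces the hand calculation.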
Common Mistakes
❌ Using mean with skewed data
Use median instead!
❌ Forgetting to order data for median
Always sort first!
❌ Reporting mode for continuous data
Usually not meaningful when values don't repeat
❌ Not specifying units
Always include units (inches, dollars, points, etc.)
❌ Confusing which measure to use
Consider shape and outliers
❌ Calculating mean of percentages
May need weighted mean if groups are different sizes
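The last mistake is easy to demonstrate with made-up numbers. Suppose, hypothetically, that a class of 20 students has a 90% pass rate and a class of 80 students has a 60% pass rate. Averaging the two percentages ignores the difference in class size.

```python
# Hypothetical data: pass rates and sizes of two classes
pass_rates = [0.90, 0.60]
class_sizes = [20, 80]

# Naive approach: average the percentages directly
naive_rate = sum(pass_rates) / len(pass_rates)

# Weighted approach: weight each rate by its group size
students_passing = sum(r * n for r, n in zip(pass_rates, class_sizes))
overall_rate = students_passing / sum(class_sizes)

print(round(naive_rate, 2))    # 0.75
print(round(overall_rate, 2))  # 0.66
```

The overall pass rate is 66%, not 75%, because the larger class contributes four times as many students. The two results agree only when the groups are the same size.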
Quick Reference
Mean:
- Formula: \(\bar{x} = \frac{\sum x}{n}\)
- When: Symmetric, no outliers
- Property: Uses all data, sensitive to extremes
- Symbol: \(\bar{x}\) (sample), \(\mu\) (population)
Median:
- Method: Middle value when ordered
- When: Skewed, outliers present
- Property: Resistant, 50-50 split
- Symbol: M or \(\tilde{x}\)
Mode:
- Method: Most frequent value
- When: Categorical data, describe shape
- Property: Can have multiple or none
Relationship to shape:
- Symmetric: Mean ≈ Median
- Right-skewed: Mean > Median
- Left-skewed: Mean < Median
Remember: The best measure of center depends on the distribution's shape and the presence of outliers. When in doubt, report both mean and median!
📚 Practice Problems
Problem 1 (easy)
❓ Question:
Calculate the mean and median for this dataset: 8, 12, 15, 15, 18, 20, 22
💡 Solution
Step 1: Calculate the mean
Mean = sum of all values / number of values
Sum = 8 + 12 + 15 + 15 + 18 + 20 + 22 = 110
Number of values (n) = 7
Mean = 110 / 7 ≈ 15.71
Step 2: Calculate the median
Data is already in order: 8, 12, 15, 15, 18, 20, 22
n = 7 (odd number)
Median position = (n + 1) / 2 = (7 + 1) / 2 = 4, so the median is the 4th value
Median = 15
Step 3: Verify
Count: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th
Values: 8, 12, 15, [15], 18, 20, 22 (the bracketed 4th value is the median)
Answer: Mean ≈ 15.71, Median = 15
Problem 2 (hard)
❓ Question:
A dataset has a mean of 50 and a median of 50. If you add a new value of 100 to the dataset, will the mean or median change more? Explain your reasoning.
💡 Solution
Step 1: Understand the initial condition
Mean = 50, Median = 50
This is consistent with a roughly symmetric distribution
Data is balanced around 50
Step 2: Analyze effect on MEAN
The mean uses ALL values in its calculation
New mean = (sum of old values + 100) / (n + 1) = (50n + 100) / (n + 1) = 50 + 50 / (n + 1)
Adding 100 (which is 50 above the current mean):
- Pulls the mean UP
- Amount depends on sample size: the increase is 50 / (n + 1)
- But definitely increases
If n = 9 (10 values total after adding 100):
- Old sum = 9 × 50 = 450
- New sum = 450 + 100 = 550
- New mean = 550 / 10 = 55
- Change: +5 points
Step 3: Analyze effect on MEDIAN
The median only depends on MIDDLE position(s)
Adding one value:
- Changes sample size from n to n+1
- May shift which value(s) are in middle
- But only by one position
If n was odd (say 9): old median was the 5th value
If n is now even (10): new median is the average of the 5th and 6th values
The value 100 goes to the end and, except in very small datasets, does not become a middle value
Median shifts only slightly (maybe to 50.5 or 51, depending on how close the next larger value is to 50)
Step 4: Compare magnitude of changes
Mean: Increases by 50 / (n + 1) (we calculated +5 for n = 9)
Median: Increases by no more than the distance from 50 to the next larger value in the ordered data, which is small when values cluster near the center
The mean is SENSITIVE to extreme values
The median is RESISTANT to extreme values
Answer: For typical data, where values cluster near the center, the MEAN will change more. It's sensitive to all values, especially outliers. Adding 100 (far above 50) pulls the mean up by 50 / (n + 1). The median is resistant: it only depends on middle positions, so adding one extreme value usually has minimal effect. (In a very small or widely spread dataset, such as 0, 50, 100, the median can move more than the mean.)
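The reasoning can be checked on a concrete dataset. The nine values below are hypothetical, chosen only so that the mean and median are both exactly 50, as in the problem statement.

```python
from statistics import mean, median

# Hypothetical dataset with mean 50 and median 50 (n = 9)
before = [30, 40, 45, 48, 50, 52, 55, 60, 70]
after = before + [100]

mean_change = mean(after) - mean(before)
median_change = median(after) - median(before)

print(mean(before), median(before))  # 50 50
print(mean(after), median(after))    # 55 51.0
print(mean_change, median_change)    # 5 1.0
```

The mean rises by 5, which matches 50 / (n + 1) with n = 9, while the median rises by only 1 because the next value above 50 in this dataset is 52.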
Problem 3 (medium)
❓ Question:
Five students scored: 85, 90, 88, 92, and 95 on a test. A sixth student who was absent takes the test and scores 40. How does this affect the mean and median?
💡 Solution
Step 1: Calculate original statistics
Original data: 85, 88, 90, 92, 95 (ordered)
n = 5
Original mean = (85 + 88 + 90 + 92 + 95) / 5 = 450 / 5 = 90
Original median = 3rd value = 90
Step 2: Add the new score
New data: 40, 85, 88, 90, 92, 95 (ordered)
n = 6
New mean = (40 + 85 + 88 + 90 + 92 + 95) / 6 = 490 / 6 ≈ 81.67
New median = average of 3rd and 4th values = (88 + 90) / 2 = 89
Step 3: Calculate changes
Mean: 90 → 81.67
Change = -8.33 points (decreased by 9.3%)
Median: 90 → 89
Change = -1 point (decreased by 1.1%)
Step 4: Explain the difference
Mean is NOT RESISTANT: Affected greatly by outliers
The score of 40 is much lower than the others, pulling the mean down significantly
Median is RESISTANT: Only depends on middle values
Adding one value only shifts the middle position slightly
Answer:
Mean dropped from 90 to 81.67 (decrease of 8.33)
Median dropped from 90 to 89 (decrease of 1)
The mean was much more affected by the outlier than the median.
Problem 4 (medium)
❓ Question:
A company has 10 employees with salaries: $30k, $32k, $35k, $35k, $38k, $40k, $42k, $45k, $48k, $300k. Which measure of center (mean or median) better represents the "typical" employee salary? Explain.
💡 Solution
Step 1: Calculate both measures
Data: 30, 32, 35, 35, 38, 40, 42, 45, 48, 300 (in thousands of dollars)
n = 10
Mean = (30 + 32 + 35 + 35 + 38 + 40 + 42 + 45 + 48 + 300) / 10 = 645 / 10 = $64.5k
Median = average of 5th and 6th values = (38 + 40) / 2 = $39k
Step 2: Compare to actual data
9 employees make $30k to $48k (most around $35k to $45k)
1 employee (CEO) makes $300k
Mean ($64.5k): Higher than what 9 of the 10 employees make
Median ($39k): Right in the middle of what most employees make
Step 3: Determine which is better
The mean is heavily influenced by the CEO's salary
$64.5k does not represent a typical employee
The median is resistant to the outlier
$39k represents the middle of employee salaries
Half make more, half make less
Step 4: Make recommendation
Median is better here because:
- Data is strongly skewed right (one extreme value)
- Mean is misleading (inflated by CEO)
- Median represents actual middle of employee salaries
- If asked "what's a typical salary?", $39k is more accurate
Answer: MEDIAN ($39k) is the better measure. The mean ($64.5k) is inflated by the CEO's $300k salary. With skewed data and outliers, median is the better measure of center.
Problem 5 (hard)
❓ Question:
For what type of distributions should you use the mean vs. median as the measure of center? Provide examples.
💡 Solution
USE THE MEAN when:
- Distribution is symmetric
  - Mean and median will be approximately equal
  - Mean uses all data points (more information)
  - Example: Heights, test scores (when roughly normal)
- No outliers or extreme values
  - Mean won't be distorted
  - All values contribute equally
  - Example: Temperatures in summer months
- You want a measure that uses all data
  - Mean incorporates every value
  - More sensitive to changes
  - Example: Quality control where all measurements matter
- Normal distribution
  - Mean is the best measure
  - Optimal statistical properties
  - Example: IQ scores, measurement errors
USE THE MEDIAN when:
- Distribution is skewed
  - Median is much less affected by skew
  - Better represents "typical" value
  - Example: Income (right-skewed), home prices
- Outliers are present
  - Median is resistant/robust
  - Not influenced by extreme values
  - Example: Salaries with CEO, test scores with one failure
- Ordinal data
  - When data is ranked/ordered but differences aren't equal
  - Can find middle rank
  - Example: Satisfaction ratings (1-5 scale)
- Open-ended distributions
  - When highest/lowest values are unknown
  - Example: Income ">$200k", age "65+"
SUMMARY TABLE:
Symmetric, no outliers → Use MEAN
Skewed or outliers → Use MEDIAN
Want all data used → Use MEAN
Want resistant measure → Use MEDIAN
Answer: Use mean for symmetric distributions without outliers (normal data). Use median for skewed distributions or data with outliers (income, housing prices). Median is resistant; mean uses all data.