Box Plots

Create and interpret box-and-whisker plots

What is a Box Plot?

A box plot (also called box-and-whisker plot) is a visual way to display the distribution of data using five key numbers.

Purpose:

  • Show spread of data
  • Identify center of data
  • Spot outliers
  • Compare multiple data sets

Visual: A box with lines (whiskers) extending from each side

The Five-Number Summary

Box plots are based on five key values:

1. Minimum: Smallest value

2. Q1 (First Quartile): 25th percentile

3. Median (Q2): 50th percentile (middle value)

4. Q3 (Third Quartile): 75th percentile

5. Maximum: Largest value

Example data: 2, 4, 6, 8, 10, 12, 14, 16, 18

Minimum: 2
Q1: 5 (median of the lower half 2, 4, 6, 8; about 25% of data below this)
Median: 10 (middle value)
Q3: 15 (median of the upper half 12, 14, 16, 18; about 75% of data below this)
Maximum: 18

Finding the Five-Number Summary

Step 1: Order the data (smallest to largest)

Step 2: Find the median (Q2)

  • If odd number of values: middle value
  • If even number of values: average of two middle values

Step 3: Find Q1

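  • Median of the lower half (values below Q2; when n is odd, leave the median itself out)

Step 4: Find Q3

  • Median of the upper half (values above Q2; when n is odd, leave the median itself out)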

Step 5: Find minimum and maximum

  • Smallest and largest values

Example 1: 3, 7, 8, 10, 12, 15, 18, 20, 21

Already ordered, n = 9 (odd)

Median (Q2): 5th value = 12

Lower half: 3, 7, 8, 10
Q1: Average of 7 and 8 = 7.5

Upper half: 15, 18, 20, 21
Q3: Average of 18 and 20 = 19

Five-number summary: Min: 3, Q1: 7.5, Median: 12, Q3: 19, Max: 21

Example 2: 5, 8, 10, 12, 15, 18

n = 6 (even)

Median: Average of 10 and 12 = 11

Lower half: 5, 8, 10
Q1: 8

Upper half: 12, 15, 18
Q3: 15

Five-number summary: Min: 5, Q1: 8, Median: 11, Q3: 15, Max: 18
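The five steps above can be sketched in Python. This is a minimal sketch, assuming the median-of-halves convention used in Examples 1 and 2 (when n is odd, the median is left out of both halves) and at least two data values. The function name `five_number_summary` is illustrative and not taken from any library.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two values; with an odd count the median itself
    is excluded from both halves.
    """
    data = sorted(values)                  # Step 1: order the data
    n = len(data)
    half = n // 2
    lower = data[:half]                    # values below the median position
    upper = data[half + 1:] if n % 2 else data[half:]  # skip the median when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([3, 7, 8, 10, 12, 15, 18, 20, 21]))  # (3, 7.5, 12, 19.0, 21)
print(five_number_summary([5, 8, 10, 12, 15, 18]))             # (5, 8, 11.0, 15, 18)
```

Other quartile conventions exist, so a calculator or spreadsheet may report slightly different Q1 and Q3 values for the same data.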

Drawing a Box Plot

Step 1: Draw a number line with appropriate scale

Step 2: Mark the five-number summary above the line

Step 3: Draw a box from Q1 to Q3

Step 4: Draw a vertical line at the median inside the box

Step 5: Draw whiskers from box to min and max

Example: Five-number summary: 2, 5, 8, 12, 16

  • Number line from 0 to 20
  • Box from 5 to 12
  • Line at 8 inside box
  • Left whisker from 5 to 2
  • Right whisker from 12 to 16
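To make the drawing steps concrete, here is a rough text-only sketch that places the box, median line, and whiskers at character positions along a number line. It assumes integer values that fit between `lo` and `hi`. The function name `ascii_box_plot` and the symbols it uses are made up for illustration.

```python
def ascii_box_plot(minimum, q1, med, q3, maximum, lo=0, hi=20):
    """Return a two-line text box plot: the plot row, then a number line.

    Assumes all five values are integers with lo <= minimum and maximum <= hi.
    """
    row = [" "] * (hi - lo + 1)
    for x in range(minimum, maximum + 1):
        row[x - lo] = "-"          # whiskers span min..max (the box overwrites the middle)
    for x in range(q1, q3 + 1):
        row[x - lo] = "="          # box from Q1 to Q3
    row[q1 - lo] = "["             # left edge of the box at Q1
    row[q3 - lo] = "]"             # right edge of the box at Q3
    row[med - lo] = "|"            # median line inside the box
    # Number line with a tick mark at every multiple of 5
    axis = "".join("+" if x % 5 == 0 else "-" for x in range(lo, hi + 1))
    return "".join(row).rstrip() + "\n" + axis

print(ascii_box_plot(2, 5, 8, 12, 16))
# Output:
#   ---[==|===]----
# +----+----+----+----+
```

Each character stands for one unit, so the tick marks fall at 0, 5, 10, 15, and 20.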

Parts of a Box Plot

The Box:

  • Left edge: Q1
  • Right edge: Q3
  • Line inside: Median
  • Width of box: Interquartile Range (IQR)

The Whiskers:

  • Left whisker: From Q1 to minimum
  • Right whisker: From Q3 to maximum
  • Show range of lower and upper 25% of data

Important: 50% of data is inside the box!

Interquartile Range (IQR)

IQR = Q3 - Q1

Meaning: Middle 50% of data spread

Example: Q1 = 6, Q3 = 14

IQR = 14 - 6 = 8

Use: Measure of spread (variation)

Larger IQR = more spread out data
Smaller IQR = more concentrated data

Reading Information from Box Plots

1. Center (Median): Where is the line inside the box?

2. Spread (Range and IQR): How far do whiskers extend? How wide is the box?

3. Symmetry: Is median in center of box? Are whiskers equal length?

4. Skewness:

  • If right whisker longer → right-skewed (positive skew)
  • If left whisker longer → left-skewed (negative skew)

Example: Box plot with:

  • Longer right whisker
  • Median closer to Q1

This is right-skewed (tail to the right), with most data on the lower end

Outliers in Box Plots

Outlier: Value unusually far from the rest

Rule: A value is an outlier if:

  • Less than Q1 - 1.5(IQR), OR
  • Greater than Q3 + 1.5(IQR)

Example: Q1 = 8, Q3 = 16, IQR = 8

Lower boundary: 8 - 1.5(8) = 8 - 12 = -4
Upper boundary: 16 + 1.5(8) = 16 + 12 = 28

Any value below -4 or above 28 is an outlier
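The 1.5 × IQR rule translates directly into a short calculation. The sketch below uses hypothetical helper names (`outlier_fences`, `is_outlier`) and keeps the multiplier as a parameter `k`, since 1.5 is a convention rather than a fixed law.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower boundary, upper boundary) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """True if value lies strictly outside the boundaries."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper

print(outlier_fences(8, 16))   # (-4.0, 28.0)
print(is_outlier(30, 8, 16))   # True
print(is_outlier(28, 8, 16))   # False (exactly on the boundary is not an outlier)
```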

Displaying outliers:

  • Mark with individual points (dots or asterisks)
  • Draw whiskers to last non-outlier value

Example data: 5, 7, 9, 11, 13, 15, 40

40 is an outlier: Q1 = 7, Q3 = 15, IQR = 8, so the upper boundary is 15 + 1.5(8) = 27, and 40 > 27

  • Draw whisker to 15 (last non-outlier)
  • Mark 40 as separate point

Modified Box Plot

Standard box plot: Whiskers extend to min and max

Modified box plot: Whiskers extend to last non-outlier

  • Outliers shown as individual points
  • More accurate representation when outliers present

Use modified when: Data contains outliers
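For a modified box plot, the whisker ends and the outlier points can be worked out as below. This is a sketch under the same assumptions as the hand method in this guide (median-of-halves quartiles, 1.5 × IQR rule). The name `modified_box_plot_parts` is illustrative only.

```python
from statistics import median

def modified_box_plot_parts(values, k=1.5):
    """Return whisker ends (last non-outliers) and the list of outliers."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "whisker_low": inside[0],     # smallest value that is not an outlier
        "whisker_high": inside[-1],   # largest value that is not an outlier
        "outliers": outliers,         # plotted as individual points
    }

print(modified_box_plot_parts([5, 7, 9, 11, 13, 15, 40]))
# {'whisker_low': 5, 'whisker_high': 15, 'outliers': [40]}
```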

Comparing Box Plots

Multiple box plots on same scale

Can compare:

1. Centers: Which median is higher?

2. Spreads: Which IQR is larger? Which range is larger?

3. Symmetry: Which is more symmetric?

4. Outliers: Which has outliers?

Example: Compare test scores for two classes

Class A: Median = 75, IQR = 10
Class B: Median = 80, IQR = 20

Analysis:

  • Class B has the higher median (its typical score is higher)
  • Class A has the smaller IQR (more consistent)
  • Class B is more variable (its middle 50% of scores spans 20 points, versus 10 for Class A)

Advantages of Box Plots

1. Show five-number summary visually

2. Easy to compare multiple groups

3. Clearly identify outliers

4. Show skewness

5. Good for large data sets

6. Compact display

Disadvantages of Box Plots

1. Don't show individual values (except outliers)

2. Don't show frequency (how many at each value)

3. Don't show gaps in data

4. Can hide multiple modes (bimodal data)

5. Arbitrary outlier rule (1.5 IQR is convention)

Better for: Overall distribution and comparison
Not as good for: Detailed frequency information

Creating Box Plot from Frequency Table

Example:

Value | Frequency
------|----------
10    | 2
15    | 3
20    | 4
25    | 2
30    | 1

Step 1: List all values in order
10, 10, 15, 15, 15, 20, 20, 20, 20, 25, 25, 30

Step 2: Find five-number summary
n = 12
Min: 10
Q1: 15 (median of first 6: average of 15 and 15)
Median: Average of 6th and 7th = (20 + 20)/2 = 20
Q3: 22.5 (median of last 6: average of 20 and 25)
Max: 30

Step 3: Draw box plot using these values
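Expanding a frequency table by hand is easy to miscount, so the sketch below does it programmatically. It assumes the table is given as a value-to-frequency mapping and uses the median-of-halves method from this guide. The helper names are hypothetical.

```python
from statistics import median

def expand_frequency_table(freq):
    """Turn {value: frequency} into a sorted list with each value repeated."""
    return [value for value, count in sorted(freq.items()) for _ in range(count)]

def five_number_summary(data):
    """(min, Q1, median, Q3, max) for already-sorted data, median-of-halves method."""
    n = len(data)
    half = n // 2
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(data[:half]), median(data), median(upper), data[-1])

table = {10: 2, 15: 3, 20: 4, 25: 2, 30: 1}
data = expand_frequency_table(table)
print(data)                       # [10, 10, 15, 15, 15, 20, 20, 20, 20, 25, 25, 30]
print(five_number_summary(data))  # (10, 15.0, 20.0, 22.5, 30)
```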

Percentiles and Box Plots

Box plot divides data into four parts (quartiles):

  • 0% to 25%: Below Q1 (left whisker)
  • 25% to 50%: Q1 to Median (left half of box)
  • 50% to 75%: Median to Q3 (right half of box)
  • 75% to 100%: Above Q3 (right whisker)

Each section contains 25% of the data!

Example: If there are 20 data points:

  • 5 values below Q1
  • 5 values from Q1 to median
  • 5 values from median to Q3
  • 5 values above Q3

Skewness from Box Plots

Symmetric:

  • Median in center of box
  • Equal whisker lengths
  • Data evenly distributed

Right-skewed (positively skewed):

  • Right whisker longer than left
  • Median closer to Q1
  • Tail extends to the right
  • Example: Income data (few very high earners)

Left-skewed (negatively skewed):

  • Left whisker longer than right
  • Median closer to Q3
  • Tail extends to the left
  • Example: Test scores (few very low scores)
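The whisker comparison can be turned into a rough rule of thumb. The sketch below is only a heuristic: the function name `describe_skew` and the tolerance (whisker lengths within 10% of the range count as "roughly equal") are assumptions chosen for illustration, not a standard definition of skewness.

```python
def describe_skew(minimum, q1, med, q3, maximum, tolerance=0.1):
    """Classify skew by comparing whisker lengths.

    `tolerance` is an arbitrary illustrative cutoff: a difference of at most
    tolerance * range is treated as 'roughly symmetric'.
    """
    left_whisker = q1 - minimum
    right_whisker = maximum - q3
    difference = right_whisker - left_whisker
    if abs(difference) <= tolerance * (maximum - minimum):
        return "roughly symmetric"
    return "right-skewed" if difference > 0 else "left-skewed"

print(describe_skew(5, 12, 18, 25, 40))    # right-skewed
print(describe_skew(10, 30, 36, 40, 45))   # left-skewed
print(describe_skew(2, 5, 8, 12, 16))      # roughly symmetric
```

The position of the median inside the box gives a second clue, which this sketch ignores for simplicity.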

Real-World Applications

1. Comparing groups:

  • Test scores across different classes
  • Salaries across different companies
  • Heights across different age groups

2. Quality control:

  • Identify defective products (outliers)
  • Monitor consistency (IQR)

3. Scientific data:

  • Compare experimental results
  • Analyze measurement variation

4. Sports statistics:

  • Compare player performance
  • Analyze team statistics

5. Business:

  • Sales data across regions
  • Customer satisfaction scores

Example Problem: Complete Analysis

Data: Daily temperatures (°F) for two weeks
68, 70, 72, 74, 75, 76, 78, 80, 81, 82, 83, 85, 88, 90

Find five-number summary:

Min: 68
Q1: 74 (median of the lower half 68, 70, 72, 74, 75, 76, 78)
Median: 79 (average of 78 and 80)
Q3: 83 (median of the upper half 80, 81, 82, 83, 85, 88, 90)
Max: 90

Find IQR: IQR = 83 - 74 = 9

Check for outliers:
Lower boundary: 74 - 1.5(9) = 74 - 13.5 = 60.5
Upper boundary: 83 + 1.5(9) = 83 + 13.5 = 96.5

No outliers (all data between 60.5 and 96.5)

Describe distribution:

  • Roughly symmetric (right whisker of 7 only slightly longer than left whisker of 6; median slightly closer to Q3)
  • No outliers
  • IQR of 9 shows moderate variation
  • Median of 79 is the typical temperature
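As a cross-check on the hand calculation, the whole analysis can be run in a few lines. This sketch assumes the median-of-halves quartile method and the 1.5 × IQR rule used throughout this guide. The function name `analyze` is illustrative.

```python
from statistics import median

def analyze(values, k=1.5):
    """Five-number summary, IQR, outlier boundaries, and outliers for a data set."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {
        "summary": (data[0], q1, median(data), q3, data[-1]),
        "iqr": iqr,
        "boundaries": (lower, upper),
        "outliers": [x for x in data if x < lower or x > upper],
    }

temps = [68, 70, 72, 74, 75, 76, 78, 80, 81, 82, 83, 85, 88, 90]
report = analyze(temps)
print(report["summary"])     # (68, 74, 79.0, 83, 90)
print(report["iqr"])         # 9
print(report["boundaries"])  # (60.5, 96.5)
print(report["outliers"])    # []
```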

Double Box Plots

Two box plots on same scale for comparison

Example: Boys vs. Girls test scores

Boys: Min 60, Q1 70, Med 78, Q3 85, Max 92
Girls: Min 65, Q1 75, Med 82, Q3 88, Max 95

Draw both on same number line (vertically stacked)

Compare:

  • Girls have higher median (82 vs 78)
  • Girls have a slightly smaller IQR (13 vs 15), so their scores are slightly more consistent
  • Girls have higher minimum and maximum
  • Overall, girls performed better
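The same comparison can be made mechanically from the two five-number summaries. In the sketch below, the dictionary layout and the names `iqr` and `compare` are assumptions for illustration. If two groups tie, the one listed first is reported.

```python
def iqr(summary):
    """Interquartile range of a five-number summary stored as a dict."""
    return summary["q3"] - summary["q1"]

def compare(groups):
    """Report which group has the higher median and which has the smaller IQR."""
    return {
        "higher_median": max(groups, key=lambda name: groups[name]["median"]),
        "smaller_iqr": min(groups, key=lambda name: iqr(groups[name])),
    }

groups = {
    "Boys":  {"min": 60, "q1": 70, "median": 78, "q3": 85, "max": 92},
    "Girls": {"min": 65, "q1": 75, "median": 82, "q3": 88, "max": 95},
}
result = compare(groups)
print(iqr(groups["Boys"]), iqr(groups["Girls"]))  # 15 13
print(result)  # {'higher_median': 'Girls', 'smaller_iqr': 'Girls'}
```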

Common Mistakes to Avoid

  1. Not ordering data first: Must arrange in order before finding quartiles!

  2. Confusing median and mean: Box plot uses median, not mean

  3. Wrong quartile calculation: Different methods exist, be consistent

  4. Misidentifying outliers: Use 1.5 IQR rule correctly

  5. Drawing to scale incorrectly: Number line must be evenly spaced

  6. Forgetting to label: Always label number line and title graph

  7. Misreading whiskers: Whiskers go to actual min/max (or last non-outlier)

Box Plot vs Other Displays

Box Plot vs Histogram:

  • Box plot: Shows five-number summary, quartiles
  • Histogram: Shows frequency, shape of distribution

Box Plot vs Dot Plot:

  • Box plot: Summary, good for large data
  • Dot plot: Individual values, good for small data

Box Plot vs Stem-and-Leaf:

  • Box plot: Visual summary
  • Stem-and-leaf: Preserves actual values

Use box plot when: Comparing groups, showing quartiles, large data sets

Technology for Box Plots

Graphing calculators:

  • TI-84: STAT PLOT (2nd, then Y=) → choose the Modified Box Plot type
  • Enter data in lists
  • Adjust window
  • TRACE to see five-number summary

Software:

  • Excel: Insert → Chart → Box and Whisker
  • Google Sheets: Similar feature
  • Online tools: Many free box plot generators

Advantages: Quick, accurate, can handle large data sets
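Software may not use the same quartile convention as the hand method in this guide, so results can differ slightly. As a hedged illustration, Python's standard `statistics.quantiles` function (Python 3.8 and later) offers two conventions. On the nine-value data set below, the "exclusive" convention happens to match the median-of-halves hand calculation, while the "inclusive" convention does not.

```python
from statistics import quantiles

data = [3, 7, 8, 10, 12, 15, 18, 20, 21]

# n=4 asks for the three cut points that split the data into quarters
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)  # [7.5, 12.0, 19.0]
print(inclusive)  # [8.0, 12.0, 18.0]
```

When checking hand work against a calculator or spreadsheet, a small mismatch in Q1 or Q3 usually points to a different convention rather than an arithmetic error.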

Quick Reference

Five-Number Summary: Min, Q1, Median, Q3, Max

IQR: Q3 - Q1 (middle 50% spread)

Outlier Rule: Below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)

Box: From Q1 to Q3 (contains middle 50%)

Whiskers: From box to min and max (or last non-outlier)

Median line: Inside box

Skewness:

  • Right-skewed: Right whisker longer
  • Left-skewed: Left whisker longer
  • Symmetric: Whiskers roughly equal

Practice Tips

  • Always order data first
  • Practice finding quartiles with odd and even data sets
  • Draw to scale carefully
  • Label all parts clearly
  • Check for outliers using 1.5 IQR rule
  • Compare multiple box plots for practice
  • Understand what each part represents
  • Relate to percentiles (25%, 50%, 75%)
  • Practice reading and creating box plots
  • Connect to real-world contexts
  • Use technology to verify hand calculations
  • Remember: 50% of data is in the box!
  • Practice identifying skewness
  • Work with both standard and modified box plots

Box plots are powerful tools for understanding data distribution and making comparisons. Master this skill and you'll have a valuable technique for analyzing data in statistics, science, and many other fields!

📚 Practice Problems

Problem 1 (easy)

Question:

Find the five-number summary for: 3, 7, 8, 12, 13, 15, 18, 21, 23

Solution:

Step 1: Arrange data in order (already done): 3, 7, 8, 12, 13, 15, 18, 21, 23

Step 2: Find the minimum and maximum:
Minimum = 3
Maximum = 23

Step 3: Find the median (Q2): There are 9 values, so the median is the 5th value.
Median (Q2) = 13

Step 4: Find Q1 (median of lower half):
Lower half: 3, 7, 8, 12
Q1 = (7 + 8)/2 = 7.5

Step 5: Find Q3 (median of upper half):
Upper half: 15, 18, 21, 23
Q3 = (18 + 21)/2 = 19.5

Five-number summary: Min = 3, Q1 = 7.5, Q2 = 13, Q3 = 19.5, Max = 23

Problem 2 (easy)

Question:

Calculate the interquartile range (IQR) for a data set with Q1 = 12 and Q3 = 28.

Solution:

Step 1: Recall the IQR formula: IQR = Q3 - Q1

Step 2: Substitute the values: IQR = 28 - 12

Step 3: Calculate: IQR = 16

Step 4: Interpret: The IQR is 16, which means the middle 50% of the data spans 16 units. This measures the spread of the middle half of the data.

Answer: IQR = 16

Problem 3 (medium)

Question:

For a data set with Q1 = 20, Q3 = 35, determine if a value of 60 is an outlier.

Solution:

Step 1: Calculate the IQR: IQR = Q3 - Q1 = 35 - 20 = 15

Step 2: Calculate the outlier boundaries using the 1.5 × IQR rule:
Lower boundary = Q1 - 1.5(IQR) = 20 - 1.5(15) = 20 - 22.5 = -2.5
Upper boundary = Q3 + 1.5(IQR) = 35 + 1.5(15) = 35 + 22.5 = 57.5

Step 3: Check if 60 is outside these boundaries: 60 > 57.5, so 60 is above the upper boundary.

Step 4: Conclusion: Yes, 60 is an outlier because it exceeds the upper boundary.

Any value below -2.5 or above 57.5 would be considered an outlier.

Answer: Yes, 60 is an outlier

Problem 4 (medium)

Question:

A box plot shows Min = 5, Q1 = 12, Q2 = 18, Q3 = 25, Max = 40. Describe the distribution.

Solution:

Step 1: Calculate the IQR: IQR = Q3 - Q1 = 25 - 12 = 13

Step 2: Compare distances from median to quartiles:
Distance from Q2 to Q1: 18 - 12 = 6
Distance from Q2 to Q3: 25 - 18 = 7
These are roughly equal (6 ≈ 7)

Step 3: Compare whisker lengths:
Lower whisker (Q1 to Min): 12 - 5 = 7
Upper whisker (Q3 to Max): 40 - 25 = 15
The upper whisker is about twice as long.

Step 4: Determine skewness: The box is nearly symmetric about the median, but the upper whisker is much longer than the lower whisker, so the distribution is right-skewed (positively skewed).

Step 5: Additional observations:

  • The box (IQR = 13) shows where the middle 50% of data lies
  • Range = 40 - 5 = 35
  • No outliers: the boundaries are 12 - 1.5(13) = -7.5 and 25 + 1.5(13) = 44.5, and both Min and Max fall inside them

Answer: The middle 50% of the data (12 to 25) is roughly symmetric about the median, but the longer upper whisker indicates a right skew.

Problem 5 (hard)

Question:

Create a box plot for: 2, 4, 6, 7, 9, 10, 12, 15, 18, 20, 24. Identify any outliers.

Solution:

Step 1: Data is already in order. Find five-number summary:
Min = 2
Q1 = 6 (median of lower half: 2, 4, 6, 7, 9)
Q2 = 10 (median of all: 6th value)
Q3 = 18 (median of upper half: 12, 15, 18, 20, 24)
Max = 24

Step 2: Calculate IQR:
IQR = Q3 - Q1 = 18 - 6 = 12

Step 3: Calculate outlier boundaries:
Lower: Q1 - 1.5(IQR) = 6 - 1.5(12) = 6 - 18 = -12
Upper: Q3 + 1.5(IQR) = 18 + 1.5(12) = 18 + 18 = 36

Step 4: Check for outliers: All values (2, 4, 6, 7, 9, 10, 12, 15, 18, 20, 24) are between -12 and 36. No outliers exist.

Step 5: Draw the box plot:

  • Draw a number line from 0 to 25
  • Draw a box from Q1 (6) to Q3 (18)
  • Draw a vertical line at the median Q2 (10) inside the box
  • Draw a whisker from the box to Min (2)
  • Draw a whisker from the box to Max (24)

Answer: Five-number summary: 2, 6, 10, 18, 24. No outliers.