Displaying Distributions with Graphs
Histograms, dot plots, stem plots, and boxplots
Introduction
"A picture is worth a thousand words" — especially in statistics! Graphs help us visualize data distributions, identify patterns, spot outliers, and communicate findings effectively. Choosing the right graph type depends on your data type and what you want to show.
Graphs for Categorical Data
Bar Graph (Bar Chart)
Purpose: Compare frequencies or percentages across categories
Structure:
- Categorical variable on x-axis
- Frequency or percentage on y-axis
- Bars have gaps between them (not touching)
- Heights represent frequencies
When to use:
- Categorical data
- Comparing categories
- Showing frequencies or percentages
Example: Favorite ice cream flavors among students
- Chocolate: 45 students
- Vanilla: 32 students
- Strawberry: 18 students
- Other: 15 students
Key features:
- Bars can be ordered (by frequency) or kept in natural order
- Easy to compare categories visually
- Clear and simple
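As a rough illustration of the bar graph above, the sketch below prints a text-only bar chart of the ice cream counts and adds each category's percentage of the total. The function name `text_bar_chart` and the `scale` parameter (students per `#` mark) are illustrative choices, not part of the lesson.

```python
# Counts taken from the ice cream example above.
flavor_counts = {"Chocolate": 45, "Vanilla": 32, "Strawberry": 18, "Other": 15}


def text_bar_chart(counts, scale=1):
    """Return one line per category: label, a bar of '#' marks, count, percent.

    `scale` is the number of observations represented by one '#'
    (a hypothetical parameter used only to keep the bars short).
    """
    total = sum(counts.values())
    width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        bar = "#" * (count // scale)
        percent = 100 * count / total
        lines.append(f"{label:<{width}} | {bar} {count} ({percent:.1f}%)")
    return lines


for line in text_bar_chart(flavor_counts, scale=3):
    print(line)
```

Each bar sits on its own line with space around it, mirroring the gaps between bars in a drawn bar graph.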
Pie Chart
Purpose: Show parts of a whole
Structure:
- Circle divided into slices
- Each slice represents a category
- Slice size proportional to percentage
When to use:
- Want to show proportions
- Have relatively few categories (3-6 ideal)
- Emphasizing "part of whole" relationship
Example: Student transportation methods
- Bus: 40%
- Car: 30%
- Walk: 20%
- Bike: 10%
Advantages:
- Shows proportions clearly
- Visually appealing
- Good for presentations
Disadvantages:
- Hard to compare similar-sized slices
- Difficult with many categories
- Can be misleading with 3D effects
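To make "slice size proportional to percentage" concrete, here is a minimal sketch that converts the transportation percentages above into central angles for each slice. The helper name `slice_angles` is an assumption made for illustration.

```python
# Percentages from the student transportation example above.
shares = {"Bus": 40, "Car": 30, "Walk": 20, "Bike": 10}


def slice_angles(percentages):
    """Convert percentages that sum to 100 into pie-slice angles in degrees."""
    total = sum(percentages.values())
    if abs(total - 100) > 1e-9:
        raise ValueError("percentages must sum to 100 for a pie chart")
    return {label: 360 * pct / 100 for label, pct in percentages.items()}


for label, angle in slice_angles(shares).items():
    print(f"{label}: {angle:.0f} degrees")
```

The check on the total reflects the "part of whole" requirement: if the categories do not account for 100% of the data, a pie chart is not an appropriate display.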
Segmented Bar Chart
Purpose: Compare distributions across multiple groups
Structure:
- Bars divided into segments
- Each segment represents a category
- Can show counts or percentages
When to use:
- Comparing categorical distributions across groups
- Two categorical variables
- Want to see both totals and breakdowns
Example: Transportation method by grade level
- Each grade has a bar
- Bars divided by transportation type
- Can compare both across and within grades
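When groups differ in size, segmented bars are usually easier to compare as percentages within each group. The sketch below shows that conversion. The counts are hypothetical, invented only to illustrate the arithmetic (the lesson gives no counts by grade), and `segment_percentages` is an illustrative name.

```python
# HYPOTHETICAL counts, made up purely for illustration.
counts_by_grade = {
    "Grade 9": {"Bus": 50, "Car": 10, "Walk": 30, "Bike": 10},
    "Grade 12": {"Bus": 20, "Car": 50, "Walk": 20, "Bike": 10},
}


def segment_percentages(table):
    """For each group, convert category counts to percentages of that group."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {cat: 100 * n / total for cat, n in counts.items()}
    return result


for group, segments in segment_percentages(counts_by_grade).items():
    parts = ", ".join(f"{cat} {pct:.0f}%" for cat, pct in segments.items())
    print(f"{group}: {parts}")
```

Every bar then has the same total height (100%), so differences in segment size reflect differences in distribution rather than in group size.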
Graphs for Quantitative Data
Dotplot
Purpose: Display individual values for small to moderate datasets
Structure:
- Number line showing possible values
- Dot for each observation
- Dots stack when values repeat
When to use:
- Small datasets (n < 50)
- Want to see individual values
- Looking for clusters, gaps, outliers
Example: Test scores: 75, 80, 80, 82, 85, 85, 85, 90, 95
- Stack three dots above 85
- Stack two dots above 80
- Single dots for 75, 82, 90, 95
Advantages:
- Shows every data point
- Easy to create
- Good for small datasets
Disadvantages:
- Impractical for large datasets
- Can become cluttered
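The stacking rule can be sketched in a few lines. This version draws the dotplot sideways (one row per distinct value, one `*` per observation) using the test scores from the example above. `text_dotplot` is an illustrative name, and unlike a drawn dotplot this simplified version skips values that never occur, so gaps are not shown to scale.

```python
from collections import Counter

# Test scores from the dotplot example above.
scores = [75, 80, 80, 82, 85, 85, 85, 90, 95]


def text_dotplot(values):
    """Return lines like '85 | ***', one per distinct value in increasing order."""
    counts = Counter(values)
    return [f"{value} | {'*' * counts[value]}" for value in sorted(counts)]


for line in text_dotplot(scores):
    print(line)
```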
Stemplot (Stem-and-Leaf Plot)
Purpose: Display data while retaining actual values
Structure:
- Split each value into "stem" (leading digit(s)) and "leaf" (trailing digit)
- Stems listed vertically
- Leaves listed horizontally
When to use:
- Small to moderate datasets
- Want to preserve actual data values
- Quick hand-drawn analysis
Example: Test scores: 67, 72, 75, 78, 81, 83, 85, 85, 92
Stem Leaf
6 7
7 2 5 8
8 1 3 5 5
9 2
Key: 7 | 2 represents 72
Back-to-back stemplot: Compare two distributions
- Shared stems in middle
- One dataset's leaves on left
- Other dataset's leaves on right
Advantages:
- Retains actual values
- Shows distribution shape
- Can reconstruct original data
Disadvantages:
- Tedious for large datasets
- Choice of stems affects appearance
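The stem/leaf split can be sketched as integer division by 10. The function below is a minimal version that assumes non-negative whole numbers with the ones digit as the leaf; the name `stemplot` is illustrative. Stems with no leaves are still printed so that gaps in the data stay visible.

```python
from collections import defaultdict


def stemplot(values):
    """Build stem-and-leaf rows for non-negative integers (leaf = ones digit)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include every stem from smallest to largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


# Test scores from the stemplot example above.
scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]
for line in stemplot(scores):
    print(line)
print("Key: 7 | 2 represents 72")
```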
Histogram
Purpose: Display the distribution of a quantitative variable by grouping values into intervals (bins)
Structure:
- Quantitative variable on x-axis (divided into bins)
- Frequency or relative frequency on y-axis
- Bars touching (continuous data)
- Bar height = frequency in that interval
When to use:
- Large datasets
- Continuous or discrete quantitative data
- Want to see distribution shape
Example: Heights of students (in inches)
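- 60 to <62: 5 students
- 62 to <64: 12 students
- 64 to <66: 23 students
- 66 to <68: 18 students
- 68 to <70: 8 students
(Each bin includes its lower boundary but not its upper boundary, so a height of exactly 62 is counted once, in the second bin.)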
Important considerations:
Bin width:
- Too narrow → choppy, hard to see pattern
- Too wide → lose detail, miss features
- Experiment to find appropriate width
Number of bins:
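- Common rules of thumb: about √n bins, or about 1 + log2(n) bins (Sturges' rule), where n is the number of observations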
- 5-20 bins usually works well
- More data → can use more bins
Advantages:
- Shows distribution shape clearly
- Handles large datasets
- Identifies outliers, gaps, clusters
Disadvantages:
- Appearance depends on bin choices
- Loses individual data values
- Can mislead if bins chosen poorly
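Counting observations per bin is the core step in building a histogram. The sketch below assumes equal-width bins that include their left edge but not their right edge, which is one common convention. `bin_counts` and its parameters are illustrative names. The data are the test scores from the stemplot example, since the heights example gives only counts.

```python
import math


def bin_counts(values, start, width, n_bins):
    """Count values in left-closed bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = math.floor((value - start) / width)
        if not 0 <= index < n_bins:
            raise ValueError(f"{value} falls outside the bins")
        counts[index] += 1
    return counts


# Test scores from the stemplot example above.
scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]
counts = bin_counts(scores, start=60, width=10, n_bins=4)
for i, count in enumerate(counts):
    low = 60 + 10 * i
    print(f"{low} to <{low + 10}: {'#' * count} ({count})")
```

Rerunning with a different `width` is a quick way to see how bin choice changes the apparent shape.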
Boxplot (Box-and-Whisker Plot)
Purpose: Display five-number summary and identify outliers
Structure:
- Box from Q1 to Q3 (contains middle 50%)
- Line at median inside box
- Whiskers extend to min and max (excluding outliers)
- Outliers plotted individually
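Five-number summary:
- Minimum
- Q1 (first quartile, 25th percentile)
- Median (50th percentile)
- Q3 (third quartile, 75th percentile)
- Maximum
Note: The five-number summary uses the actual minimum and maximum. Only the whiskers of a modified boxplot stop at the most extreme values that are not outliers.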
Outlier definition (1.5 × IQR rule):
- Below: Q1 - 1.5 × IQR
- Above: Q3 + 1.5 × IQR
- Where IQR = Q3 - Q1
When to use:
- Comparing multiple distributions
- Identifying outliers
- Showing spread and center
- Large datasets
Modified boxplot:
- Whiskers extend to the most extreme values that lie within 1.5 × IQR of the quartiles
- Outliers plotted as individual points
- More informative than regular boxplot
Side-by-side boxplots:
- Compare distributions across groups
- Same scale for all boxes
- Easy to see differences in center, spread, shape
Advantages:
- Compact display
- Shows spread clearly
- Easy to compare groups
- Identifies outliers automatically
Disadvantages:
- Doesn't show distribution shape well
- Can hide bimodality or other features
- Less detail than histogram
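The quantities behind a boxplot can be computed directly from the data. The sketch below assumes the "median of each half" convention for quartiles (the middle value is left out of both halves when n is odd). Software packages often use other quartile conventions, so results may differ slightly. All function names are illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]  # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


def find_outliers(values):
    """Return values beyond the 1.5 x IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Test scores from the stemplot example earlier in this lesson.
scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]
print(five_number_summary(scores))
print(find_outliers(scores))
```

For these scores the fences fall well outside the data, so a modified boxplot would have no individually plotted points and its whiskers would reach the actual minimum and maximum.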
Cumulative Frequency Plot (Ogive)
Purpose: Show cumulative frequencies or percentages
Structure:
- Data values on x-axis
- Cumulative frequency/percentage on y-axis
- Line connects points
- Always increasing (or flat)
When to use:
- Want to find percentiles
- Show how data accumulates
- Identify median and quartiles
Uses:
- Read off percentiles directly
- See what percentage falls below a value
- Identify quartile locations
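The points on an ogive are running totals. As a sketch, the code below accumulates the counts from the heights histogram example and converts them to cumulative percentages at each bin's upper boundary. `cumulative_percentages` is an illustrative name.

```python
from itertools import accumulate

# Upper bin boundaries (inches) and counts from the heights histogram example.
upper_bounds = [62, 64, 66, 68, 70]
counts = [5, 12, 23, 18, 8]


def cumulative_percentages(counts):
    """Return the cumulative percentage reached at the end of each bin."""
    total = sum(counts)
    return [100 * running / total for running in accumulate(counts)]


for bound, pct in zip(upper_bounds, cumulative_percentages(counts)):
    print(f"below {bound} in: {pct:.1f}%")
```

Reading the output like an ogive: the cumulative percentage passes 50% between 64 and 66 inches, so the median height lies in that interval.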
Describing Distributions (SOCS)
When analyzing any graph, describe using SOCS:
S - Shape
Symmetric: Balanced around center (mirror image)
- Normal (bell-shaped)
- Uniform (flat, rectangular)
Skewed:
- Right-skewed (positive): Tail extends to the right; the mean is usually greater than the median
- Left-skewed (negative): Tail extends to the left; the mean is usually less than the median
Modality:
- Unimodal: One peak
- Bimodal: Two peaks
- Multimodal: Multiple peaks
- Uniform: No peaks
O - Outliers
Outliers: Observations unusually far from bulk of data
Identify:
- Visual inspection (far from others)
- 1.5 × IQR rule (for boxplots)
- More than 2-3 standard deviations from mean
Report:
- Note presence
- Give values if possible
- Consider causes (error? legitimate?)
C - Center
Typical value: Where data tends to cluster
Measures:
- Median (middle value)
- Mean (average)
- Mode (most common)
In description: "The center is around [value]" or "The median is [value]"
S - Spread
Variability: How spread out data is
Measures:
- Range (max - min)
- IQR (Q3 - Q1)
- Standard deviation
In description: "Values range from [min] to [max]" or "Most values fall between [Q1] and [Q3]"
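The center and spread measures listed above are all available in Python's standard `statistics` module. The sketch below applies them to the test scores from the dotplot example. The `describe` helper is an illustrative name, and `stdev` here is the sample standard deviation.

```python
import statistics

# Test scores from the dotplot example earlier in this lesson.
scores = [75, 80, 80, 82, 85, 85, 85, 90, 95]


def describe(values):
    """Return common measures of center and spread for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


for name, value in describe(scores).items():
    print(f"{name}: {value:.2f}")
```

Here the mean (about 84.1) and the median (85) are close, which is consistent with a roughly symmetric distribution. Shape and outliers still need to be judged from a graph.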
Choosing the Right Graph
Decision Guide
Categorical data:
- Few categories, show proportions → Pie chart
- Compare categories → Bar graph
- Compare across groups → Segmented bar chart
Quantitative data:
- Small dataset (roughly n < 50) → Dotplot or stemplot
- Show distribution shape → Histogram
- Compare groups → Side-by-side boxplots
- Identify outliers → Boxplot
- Find percentiles → Cumulative frequency plot
Common Mistakes to Avoid
❌ Pie charts for quantitative data
❌ 3D or decorative effects (distort perception)
❌ Inconsistent scales when comparing
❌ Too many/too few bins in histograms
❌ Bar graph with touching bars (that's a histogram!)
❌ Missing labels on axes
❌ No scale on axes
Best Practices
✓ Label axes clearly with variable names and units
✓ Include title describing what graph shows
✓ Use consistent scales when comparing
✓ Choose appropriate graph type for data
✓ Make it readable (not too small, cluttered)
✓ Describe using SOCS in analysis
✓ Note any outliers or unusual features
Quick Reference
Graph Selection:
- Categorical: Bar graph or pie chart
- Small quantitative: Dotplot or stemplot
- Large quantitative: Histogram or boxplot
- Comparisons: Side-by-side boxplots or segmented bar charts
- Percentiles: Cumulative frequency plot
SOCS Description:
- Shape: symmetric, skewed (left/right), unimodal/bimodal
- Outliers: identify and report
- Center: median, mean
- Spread: range, IQR, standard deviation
Remember: The best graph clearly communicates the story in your data. When in doubt, try multiple types and choose the one that reveals the most!
📚 Practice Problems
Problem 1 (easy)
Question:
What type of graph would be most appropriate for displaying each of the following?
a) The distribution of test scores (0-100) for a class
b) The number of students in each major at a university
c) The relationship between study hours and exam scores
Solution:
Step 1: Match data type to graph type
a) Test scores (0-100): Quantitative, continuous
Best choice: HISTOGRAM
- Shows distribution shape
- Can see center, spread, outliers
Alternative: Boxplot, or dotplot (for small datasets)
b) Number of students in each major: Categorical
Best choice: BAR GRAPH
- Each major is a category
- Height shows frequency/count
- Bars should NOT touch (categorical)
c) Study hours vs. exam scores: Two quantitative variables
Best choice: SCATTERPLOT
- Shows relationship between two quantitative variables
- Each point represents one student
- Can assess correlation
Answer:
a) Histogram
b) Bar graph
c) Scatterplot
Problem 2 (easy)
Question:
Given this data on ages of 20 people: 18, 19, 19, 20, 20, 20, 21, 21, 22, 22, 23, 23, 24, 25, 26, 27, 30, 35, 40, 55. Create a stemplot (stem-and-leaf plot) for this data.
Solution:
Step 1: Organize by stems (tens place)
- Stem = tens digit
- Leaf = ones digit
Step 2: List all data points by stem
- Stem 1: 18, 19, 19
- Stem 2: 20, 20, 20, 21, 21, 22, 22, 23, 23, 24, 25, 26, 27
- Stem 3: 30, 35
- Stem 4: 40
- Stem 5: 55
Step 3: Create the stemplot with key
Stem-and-Leaf Plot:
1 | 8 9 9
2 | 0 0 0 1 1 2 2 3 3 4 5 6 7
3 | 0 5
4 | 0
5 | 5
Key: 1 | 8 = 18 years old
Step 4: Observations
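- Most people are in their 20s (13 of the 20 values)
- Overall shape is right-skewed, with a long tail toward older ages
- 40 and 55 stand apart from the rest and are possible outliers
- Values become sparse above 30: 35, 40, and 55 are increasingly far apart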
Answer: See stemplot above
Problem 3 (medium)
Question:
The following histogram shows test scores. Describe the shape, center, and spread of the distribution. [Histogram with bins: 50-59(2), 60-69(5), 70-79(12), 80-89(8), 90-99(3)]
Solution:
Step 1: Determine the shape
Look at the overall pattern:
- Peak at 70-79 (most frequent)
- Decreases on both sides of peak
- Roughly symmetric, slight left skew
- One mode (unimodal)
Shape: Roughly symmetric, unimodal, slightly skewed left
Step 2: Estimate the center
- Peak bin: 70-79
- Most data in the 70-89 range
- Approximate mean/median: around 75-77
Step 3: Describe the spread
- Range: 50 to 99 (approximately 50 points)
- Most data spans about 30-40 points (60-90)
- Variability: Moderate spread
Step 4: Look for unusual features
- Small tail on left (50s and 60s)
- Very few extreme scores
- No major outliers
- Gap in very low scores (no scores below 50)
Answer:
- Shape: Unimodal, roughly symmetric with slight left skew
- Center: Around 75-77
- Spread: Scores range from the 50s to the 90s, with most between 60 and 90
- Unusual features: Small left tail, no scores below 50
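The estimate of the center can be checked with a grouped-data calculation. The sketch below assumes every score in a bin sits at the bin's midpoint, which is only an approximation because the raw scores are unknown. The names `grouped_mean` and `median_bin` are illustrative.

```python
# (low, high, count) for each bin of the histogram in this problem.
bins = [(50, 59, 2), (60, 69, 5), (70, 79, 12), (80, 89, 8), (90, 99, 3)]


def grouped_mean(bins):
    """Approximate the mean by treating each value as its bin's midpoint."""
    total = sum(count for _, _, count in bins)
    weighted = sum((low + high) / 2 * count for low, high, count in bins)
    return weighted / total


def median_bin(bins):
    """Return the (low, high) bin where the cumulative count first reaches n/2."""
    total = sum(count for _, _, count in bins)
    running = 0
    for low, high, count in bins:
        running += count
        if running >= total / 2:
            return (low, high)


print(f"approximate mean: {grouped_mean(bins):.1f}")
print(f"median falls in bin: {median_bin(bins)}")
```

Both results land in the 70-79 bin, in line with the "around 75-77" estimate in the solution.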
Problem 4 (medium)
Question:
Compare using histograms vs. boxplots. What are the advantages and disadvantages of each for displaying distributions?
Solution:
HISTOGRAMS:
Advantages:
- Show the shape of distribution clearly
- Can see multiple modes (bimodal, multimodal)
- Display frequency/count information
- Show gaps in data
- Can see actual data density
Disadvantages:
- Appearance depends on bin width choice
- Harder to compare multiple distributions
- Don't show specific summary statistics
- Take more space for multiple groups
BOXPLOTS:
Advantages:
- Show 5-number summary clearly (min, Q1, median, Q3, max)
- Excellent for comparing multiple distributions side-by-side
- Clearly identify outliers
- Compact representation
- Good for large datasets
Disadvantages:
- Don't show the shape as clearly
- Can't see multiple modes
- Hide detailed distribution features
- Don't show sample size
- Can't see gaps in data
WHEN TO USE EACH:
Use Histogram when:
- Need to see detailed shape
- Checking for normality
- Looking for multiple modes
- Single distribution to display
Use Boxplot when:
- Comparing multiple groups
- Quick summary needed
- Identifying outliers is priority
- Limited space available
Answer: Histograms show shape better; boxplots better for comparisons and outlier detection. Choice depends on analysis goals.
Problem 5 (hard)
Question:
Create a boxplot for this five-number summary: Min=12, Q1=18, Median=23, Q3=29, Max=45. Then identify if there are any outliers using the 1.5×IQR rule.
Solution:
Step 1: Calculate IQR
IQR = Q3 - Q1 = 29 - 18 = 11
Step 2: Calculate outlier boundaries
Lower fence = Q1 - 1.5×IQR = 18 - 1.5(11) = 18 - 16.5 = 1.5
Upper fence = Q3 + 1.5×IQR = 29 + 1.5(11) = 29 + 16.5 = 45.5
Step 3: Identify outliers
Any value < 1.5 or > 45.5 is an outlier
Check our values:
- Min = 12: Is 12 < 1.5? No → Not an outlier
- Max = 45: Is 45 > 45.5? No → Not an outlier
Step 4: Draw the boxplot
No outliers, so whiskers extend to the actual min and max
Boxplot:
|------[=====|======]----------------|
12     18    23     29               45
- Box: From Q1 (18) to Q3 (29)
- Line in box: Median (23)
- Left whisker: To Min (12)
- Right whisker: To Max (45)
Step 5: Observations
- Median closer to Q1 than Q3 (slight right skew)
- Right whisker longer than left (confirms right skew)
- No outliers
Answer: No outliers. Boxplot shows slight right skew with all data within fences.
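The arithmetic in this solution can be double-checked with a short script. This is a minimal sketch working only from the given five-number summary. `fences` and `whisker_lengths` are illustrative helper names.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def whisker_lengths(minimum, q1, q3, maximum):
    """Return (left, right) whisker lengths when there are no outliers."""
    return q1 - minimum, maximum - q3


# Five-number summary given in the problem.
minimum, q1, med, q3, maximum = 12, 18, 23, 29, 45

low, high = fences(q1, q3)
has_outliers = minimum < low or maximum > high
print(f"fences: {low} to {high}; outliers present: {has_outliers}")
print(f"whiskers (left, right): {whisker_lengths(minimum, q1, q3, maximum)}")
print(f"median to Q1: {med - q1}, Q3 to median: {q3 - med}")
```

The right whisker (16) is much longer than the left (6), and the median is slightly closer to Q1 than to Q3, which matches the right-skew observation above.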