Displaying Distributions with Graphs
Histograms, dot plots, stem plots, and boxplots
Introduction
"A picture is worth a thousand words" — especially in statistics! Graphs help us visualize data distributions, identify patterns, spot outliers, and communicate findings effectively. Choosing the right graph type depends on your data type and what you want to show.
Graphs for Categorical Data
Bar Graph (Bar Chart)
Purpose: Compare frequencies or percentages across categories
Structure:
- Categorical variable on x-axis
- Frequency or percentage on y-axis
- Bars have gaps between them (not touching)
- Heights represent frequencies
When to use:
- Categorical data
- Comparing categories
- Showing frequencies or percentages
Example: Favorite ice cream flavors among students
- Chocolate: 45 students
- Vanilla: 32 students
- Strawberry: 18 students
- Other: 15 students
Key features:
- Bars can be ordered (by frequency) or kept in natural order
- Easy to compare categories visually
- Clear and simple
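The bar heights for the ice cream example can be worked out with a short Python sketch. The function name `bar_rows` and the text rendering (one `#` per `scale` students) are illustrative choices, not part of the example itself.

```python
# Counts from the ice cream example above.
counts = {"Chocolate": 45, "Vanilla": 32, "Strawberry": 18, "Other": 15}

def bar_rows(counts, scale=1):
    """Return one text row per category: label, bar, count, percentage."""
    total = sum(counts.values())
    rows = []
    # Order the bars by frequency, largest first.
    for label, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        percent = 100 * count / total
        bar = "#" * round(count / scale)  # one '#' per `scale` students
        rows.append(f"{label:<12}{bar} {count} ({percent:.1f}%)")
    return rows

for row in bar_rows(counts, scale=5):
    print(row)
```

Each row shows both the count and the percentage, so the same sketch covers either choice of y-axis.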
Pie Chart
Purpose: Show parts of a whole
Structure:
- Circle divided into slices
- Each slice represents a category
- Slice size proportional to percentage
When to use:
- Want to show proportions
- Have relatively few categories (3-6 ideal)
- Emphasizing "part of whole" relationship
Example: Student transportation methods
- Bus: 40%
- Car: 30%
- Walk: 20%
- Bike: 10%
Advantages:
- Shows proportions clearly
- Visually appealing
- Good for presentations
Disadvantages:
- Hard to compare similar-sized slices
- Difficult with many categories
- Can be misleading with 3D effects
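Since slice size is proportional to percentage, each slice's central angle is its share of 360 degrees. A minimal sketch using the transportation example (the helper name `slice_angles` is hypothetical):

```python
# Percentages from the transportation example above.
shares = {"Bus": 40, "Car": 30, "Walk": 20, "Bike": 10}

def slice_angles(shares):
    """Convert each category's share into a central angle in degrees."""
    total = sum(shares.values())
    return {label: 360 * value / total for label, value in shares.items()}

angles = slice_angles(shares)
for label, angle in angles.items():
    print(f"{label}: {angle:.0f} degrees")
```

Dividing by the total means the function also accepts raw counts, as long as the categories together make up the whole.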
Segmented Bar Chart
Purpose: Compare distributions across multiple groups
Structure:
- Bars divided into segments
- Each segment represents a category
- Can show counts or percentages
When to use:
- Comparing categorical distributions across groups
- Two categorical variables
- Want to see both totals and breakdowns
Example: Transportation method by grade level
- Each grade has a bar
- Bars divided by transportation type
- Can compare both across and within grades
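When group sizes differ, percentage segments are usually easier to compare than raw counts. The sketch below converts counts to within-group percentages. The counts are hypothetical, invented only to illustrate the calculation, and the name `segment_percentages` is an assumption.

```python
# Hypothetical counts, invented for illustration (not data from this guide).
counts_by_grade = {
    "Grade 9": {"Bus": 50, "Car": 10, "Walk": 30, "Bike": 10},
    "Grade 12": {"Bus": 20, "Car": 50, "Walk": 20, "Bike": 10},
}

def segment_percentages(table):
    """For each group, convert category counts to percentages of that group's total."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {cat: 100 * n / total for cat, n in counts.items()}
    return result

segments = segment_percentages(counts_by_grade)
for group, parts in segments.items():
    summary = ", ".join(f"{cat} {pct:.0f}%" for cat, pct in parts.items())
    print(f"{group}: {summary}")
```

Each group's segments sum to 100%, so every bar in the chart has the same height and only the breakdown differs.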
Graphs for Quantitative Data
Dotplot
Purpose: Display individual values for small to moderate datasets
Structure:
- Number line showing possible values
- Dot for each observation
- Dots stack when values repeat
When to use:
- Small datasets (n < 50)
- Want to see individual values
- Looking for clusters, gaps, outliers
Example: Test scores: 75, 80, 80, 82, 85, 85, 85, 90, 95
- Stack three dots above 85
- Stack two dots above 80
- Single dots for 75, 82, 90, 95
Advantages:
- Shows every data point
- Easy to create
- Good for small datasets
Disadvantages:
- Impractical for large datasets
- Can become cluttered
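The stacking in the test score example can be sketched as a text dotplot. This is a simplified illustration: the axis lists only the values that occur, so gaps between values are not drawn to scale as they would be on a true number line. The name `dotplot_rows` is hypothetical.

```python
from collections import Counter

# Test scores from the dotplot example above.
scores = [75, 80, 80, 82, 85, 85, 85, 90, 95]

def dotplot_rows(values):
    """Return text rows, top row first, with one 'o' per observation."""
    counts = Counter(values)
    axis = sorted(counts)            # distinct values, left to right
    tallest = max(counts.values())   # height of the tallest stack
    rows = []
    for level in range(tallest, 0, -1):
        cells = ["o" if counts[v] >= level else " " for v in axis]
        rows.append("  ".join(f"{c:^3}" for c in cells).rstrip())
    rows.append("  ".join(f"{v:^3}" for v in axis))  # axis labels
    return rows

for row in dotplot_rows(scores):
    print(row)
```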
Stemplot (Stem-and-Leaf Plot)
Purpose: Display data while retaining actual values
Structure:
- Split each value into "stem" (leading digit(s)) and "leaf" (trailing digit)
- Stems listed vertically
- Leaves listed horizontally
When to use:
- Small to moderate datasets
- Want to preserve actual data values
- Quick hand-drawn analysis
Example: Test scores: 67, 72, 75, 78, 81, 83, 85, 85, 92
Stem | Leaf
6 | 7
7 | 2 5 8
8 | 1 3 5 5
9 | 2
Key: 7 | 2 represents 72
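The stemplot above can be rebuilt with a few lines of Python. This sketch assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the ones digit. The function name `stemplot` is illustrative.

```python
from collections import defaultdict

# Test scores from the stemplot example above.
scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]

def stemplot(values):
    """Return stemplot lines, splitting each value into tens (stem) and ones (leaf)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    first, last = min(leaves), max(leaves)
    lines = []
    # List every stem in the range so that empty stems appear as gaps.
    for stem in range(first, last + 1):
        row = " ".join(str(leaf) for leaf in leaves[stem])
        lines.append(f"{stem} | {row}".rstrip())
    return lines

for line in stemplot(scores):
    print(line)
```

Because the leaves are sorted within each stem, the original data can be read back directly from the plot.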
Back-to-back stemplot: Compare two distributions
- Shared stems in middle
- One dataset's leaves on left
- Other dataset's leaves on right
Advantages:
- Retains actual values
- Shows distribution shape
- Can reconstruct original data
Disadvantages:
- Tedious for large datasets
- Choice of stems affects appearance
Histogram
Purpose: Display the distribution of a quantitative variable
Structure:
- Quantitative variable on x-axis (divided into bins)
- Frequency or relative frequency on y-axis
- Bars touching (continuous data)
- Bar height = frequency in that interval
When to use:
- Large datasets
- Continuous or discrete quantitative data
- Want to see distribution shape
Example: Heights of students (in inches)
- 60-62: 5 students
- 62-64: 12 students
- 64-66: 23 students
- 66-68: 18 students
- 68-70: 8 students
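The intervals in this example share endpoints (62 ends one bin and starts the next), so a convention is needed for boundary values. The sketch below assumes each bin includes its lower endpoint and excludes its upper endpoint, which is a common convention but not stated in the example. The helper names are hypothetical.

```python
# Bin counts from the heights example; bins are assumed to be [low, high).
bins = [((60, 62), 5), ((62, 64), 12), ((64, 66), 23), ((66, 68), 18), ((68, 70), 8)]

def bin_for(height, start=60, width=2):
    """Return the (low, high) bin containing height, treating bins as [low, high)."""
    low = start + width * ((height - start) // width)
    return (low, low + width)

def relative_frequencies(bins):
    """Convert bin counts to proportions of the total."""
    total = sum(count for _, count in bins)
    return [count / total for _, count in bins]

rel = relative_frequencies(bins)
for ((low, high), count), share in zip(bins, rel):
    print(f"{low}-{high}: {'#' * count} {count} ({share:.1%})")

print(bin_for(62))  # a height of exactly 62 goes in the 62-64 bin
```

Plotting relative frequencies instead of counts changes the y-axis scale but not the shape of the histogram.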
Important considerations:
Bin width:
- Too narrow → choppy, hard to see pattern
- Too wide → lose detail, miss features
- Experiment to find appropriate width
Number of bins:
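- General rule: about $\sqrt{n}$ bins (square-root rule) or $1 + \log_2 n$ bins (Sturges' rule), where $n$ is the number of observations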
- 5-20 bins usually works well
- More data → can use more bins
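Two widely used rules of thumb for a starting bin count are the square-root rule and Sturges' rule. They are sketched here as an assumption about which rules apply; treat the results as a first guess and adjust by eye. The function names are hypothetical.

```python
import math

def sqrt_rule(n):
    """Square-root rule: about sqrt(n) bins, rounded up."""
    return math.ceil(math.sqrt(n))

def sturges_rule(n):
    """Sturges' rule: about 1 + log2(n) bins, rounded up."""
    return math.ceil(1 + math.log2(n))

n = 66  # total number of students in the heights example (5 + 12 + 23 + 18 + 8)
print("square-root rule:", sqrt_rule(n))
print("Sturges' rule:", sturges_rule(n))
```

Both suggestions fall inside the 5-20 range, and both grow as the dataset grows.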
Advantages:
- Shows distribution shape clearly
- Handles large datasets
- Identifies outliers, gaps, clusters
Disadvantages:
- Appearance depends on bin choices
- Loses individual data values
- Can mislead if bins chosen poorly
Boxplot (Box-and-Whisker Plot)
Purpose: Display five-number summary and identify outliers
Structure:
- Box from Q1 to Q3 (contains middle 50%)
- Line at median inside box
- Whiskers extend to min and max (excluding outliers)
- Outliers plotted individually
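Five-number summary:
- Minimum (smallest observation; in a modified boxplot the whisker stops at the smallest non-outlier)
- Q1 (first quartile, 25th percentile)
- Median (50th percentile)
- Q3 (third quartile, 75th percentile)
- Maximum (largest observation; in a modified boxplot the whisker stops at the largest non-outlier)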
Outlier definition:
- Below: $Q_1 - 1.5 \times \text{IQR}$
- Above: $Q_3 + 1.5 \times \text{IQR}$
- Where $\text{IQR} = Q_3 - Q_1$
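The five-number summary and the 1.5 × IQR outlier rule can be sketched in Python. This version takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd. Other quartile conventions exist and can give slightly different values. The function names are hypothetical, and the data are the test scores from the stemplot example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle value when the number of observations is odd.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def outlier_fences(q1, q3):
    """Return the lower and upper cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Return the observations that fall outside the fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]
print(five_number_summary(scores))
print(outlier_fences(73.5, 85))
print(find_outliers(scores))
```

For these scores every value lies inside the fences, so the boxplot would have no individually plotted points.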
When to use:
- Comparing multiple distributions
- Identifying outliers
- Showing spread and center
- Large datasets
Modified boxplot:
- Whiskers extend to the most extreme values within 1.5 × IQR of the quartiles
- Outliers plotted as individual points
- More informative than regular boxplot
Side-by-side boxplots:
- Compare distributions across groups
- Same scale for all boxes
- Easy to see differences in center, spread, shape
Advantages:
- Compact display
- Shows spread clearly
- Easy to compare groups
- Identifies outliers automatically
Disadvantages:
- Doesn't show distribution shape well
- Can hide bimodality or other features
- Less detail than histogram
Cumulative Frequency Plot (Ogive)
Purpose: Show cumulative frequencies or percentages
Structure:
- Data values on x-axis
- Cumulative frequency/percentage on y-axis
- Line connects points
- Always increasing (or flat)
When to use:
- Want to find percentiles
- Show how data accumulates
- Identify median and quartiles
Uses:
- Read off percentiles directly
- See what percentage falls below a value
- Identify quartile locations
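The points of a cumulative frequency plot are running totals of the bin counts, plotted at each bin's upper boundary. The sketch below reuses the counts from the heights example in the Histogram section. The helper names are hypothetical, and `percentile_bin` only locates the bin that contains a percentile rather than interpolating within it.

```python
from itertools import accumulate

# Upper bin boundaries and counts from the heights histogram example.
upper_bounds = [62, 64, 66, 68, 70]
counts = [5, 12, 23, 18, 8]

def cumulative_percentages(counts):
    """Return the cumulative percentage reached at the end of each bin."""
    running = list(accumulate(counts))
    total = running[-1]
    return [100 * c / total for c in running]

def percentile_bin(upper_bounds, cum_pct, p):
    """Return the upper boundary of the first bin whose cumulative % reaches p."""
    for bound, pct in zip(upper_bounds, cum_pct):
        if pct >= p:
            return bound
    return upper_bounds[-1]

cum_pct = cumulative_percentages(counts)
for bound, pct in zip(upper_bounds, cum_pct):
    print(f"below {bound} in: {pct:.1f}%")

print("median lies in the bin ending at", percentile_bin(upper_bounds, cum_pct, 50))
```

The cumulative percentages never decrease and end at 100%, which matches the "always increasing (or flat)" shape of the plot.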
Describing Distributions (SOCS)
When analyzing any graph, describe using SOCS:
S - Shape
Symmetric: Balanced around center (mirror image)
- Normal (bell-shaped)
- Uniform (flat, rectangular)
Skewed:
- Right-skewed (positive): Tail extends to right; mean is typically greater than median
- Left-skewed (negative): Tail extends to left; mean is typically less than median
Modality:
- Unimodal: One peak
- Bimodal: Two peaks
- Multimodal: Multiple peaks
- Uniform: No peaks
O - Outliers
Outliers: Observations unusually far from bulk of data
Identify:
- Visual inspection (far from others)
- 1.5 × IQR rule (for boxplots)
- More than 2-3 standard deviations from mean
Report:
- Note presence
- Give values if possible
- Consider causes (error? legitimate?)
C - Center
Typical value: Where data tends to cluster
Measures:
- Median (middle value)
- Mean (average)
- Mode (most common)
In description: "The center is around [value]" or "The median is [value]"
S - Spread
Variability: How spread out data is
Measures:
- Range (max - min)
- IQR (Q3 - Q1)
- Standard deviation
In description: "Values range from [min] to [max]" or "Most values fall between [Q1] and [Q3]"
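The center and spread measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch using the test scores from the stemplot example. The function name is hypothetical, quartiles use the median-of-halves method, and `stdev` is the sample standard deviation.

```python
from statistics import mean, median, stdev

# Test scores from the stemplot example.
scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]

def center_and_spread(values):
    """Return common measures of center and spread for a list of numbers."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    # Skip the middle value when the number of observations is odd.
    q3 = median(data[half + 1:] if len(data) % 2 else data[half:])
    return {
        "mean": mean(data),
        "median": median(data),
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "stdev": stdev(data),  # sample standard deviation (divides by n - 1)
    }

summary = center_and_spread(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the mean comes out a little below the median, which hints at a slightly longer tail on the left, though a graph is still the best way to judge shape.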
Choosing the Right Graph
Decision Guide
Categorical data:
- Few categories, show proportions → Pie chart
- Compare categories → Bar graph
- Compare across groups → Segmented bar chart
Quantitative data:
- Small dataset (roughly n < 50) → Dotplot or stemplot
- Show distribution shape → Histogram
- Compare groups → Side-by-side boxplots
- Identify outliers → Boxplot
- Find percentiles → Cumulative frequency plot
Common Mistakes to Avoid
❌ Pie charts for quantitative data
❌ 3D or decorative effects (distort perception)
❌ Inconsistent scales when comparing
❌ Too many/too few bins in histograms
❌ Bar graph with touching bars (that's a histogram!)
❌ Missing labels on axes
❌ No scale on axes
Best Practices
✓ Label axes clearly with variable names and units
✓ Include title describing what graph shows
✓ Use consistent scales when comparing
✓ Choose appropriate graph type for data
✓ Make it readable (not too small, cluttered)
✓ Describe using SOCS in analysis
✓ Note any outliers or unusual features
Quick Reference
Graph Selection:
- Categorical: Bar graph or pie chart
- Small quantitative: Dotplot or stemplot
- Large quantitative: Histogram or boxplot
- Comparisons: Side-by-side boxplots or segmented bar charts
- Percentiles: Cumulative frequency plot
SOCS Description:
- Shape: symmetric, skewed (left/right), unimodal/bimodal
- Outliers: identify and report
- Center: median, mean
- Spread: range, IQR, standard deviation
Remember: The best graph clearly communicates the story in your data. When in doubt, try multiple types and choose the one that reveals the most!