Displaying Distributions with Graphs

Introduction

"A picture is worth a thousand words" — especially in statistics! Graphs help us visualize data distributions, identify patterns, spot outliers, and communicate findings effectively. Choosing the right graph type depends on your data type and what you want to show.

Graphs for Categorical Data

Bar Graph (Bar Chart)

Purpose: Compare frequencies or percentages across categories

Structure:

Categorical variable on x-axis
Frequency or percentage on y-axis
Bars have gaps between them (not touching)
Bar heights represent frequencies or percentages

When to use:

Categorical data
Comparing categories
Showing frequencies or percentages

Example: Favorite ice cream flavors among students

Chocolate: 45 students
Vanilla: 32 students
Strawberry: 18 students
Other: 15 students

Key features:

Bars can be ordered (by frequency) or kept in natural order
Easy to compare categories visually
Clear and simple
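As a minimal sketch, the ice cream counts above can be turned into percentages and a rough text bar graph. The variable names and the scale of one `#` per 5 students are illustrative choices, not part of the original example.

```python
# Counts taken from the ice cream example above.
counts = {"Chocolate": 45, "Vanilla": 32, "Strawberry": 18, "Other": 15}

total = sum(counts.values())

# Percentage of all students choosing each flavor, rounded to one decimal place.
percentages = {flavor: round(100 * n / total, 1) for flavor, n in counts.items()}

# Order the categories from most to least frequent.
ordered = sorted(counts, key=counts.get, reverse=True)

# Rough text bar graph: one '#' per 5 students (an arbitrary display scale).
for flavor in ordered:
    bar = "#" * (counts[flavor] // 5)
    print(f"{flavor:<11}{bar} {counts[flavor]} ({percentages[flavor]}%)")
```

Either the counts or the percentages can go on the y-axis; the relative bar heights are the same.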

Pie Chart

Purpose: Show parts of a whole

Structure:

Circle divided into slices
Each slice represents a category
Slice size proportional to percentage

When to use:

Want to show proportions
Have relatively few categories (3-6 ideal)
Emphasizing "part of whole" relationship

Example: Student transportation methods

Bus: 40%
Car: 30%
Walk: 20%
Bike: 10%
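Because slice size is proportional to percentage, each slice's central angle is its share of 360 degrees. A small sketch using the transportation percentages above (the variable names are illustrative):

```python
# Percentages from the transportation example above.
shares = {"Bus": 40, "Car": 30, "Walk": 20, "Bike": 10}

# A pie chart only makes sense when the parts add up to the whole.
assert sum(shares.values()) == 100

# Central angle of each slice, in degrees.
angles = {mode: 360 * pct / 100 for mode, pct in shares.items()}

for mode, angle in angles.items():
    print(f"{mode:<5}{shares[mode]:>3}%  {angle:.0f} degrees")
```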

Advantages:

Shows proportions clearly
Visually appealing
Good for presentations

Disadvantages:

Hard to compare similar-sized slices
Difficult with many categories
Can be misleading with 3D effects

Segmented Bar Chart

Purpose: Compare distributions across multiple groups

Structure:

Bars divided into segments
Each segment represents a category
Can show counts or percentages

When to use:

Comparing categorical distributions across groups
Two categorical variables
Want to see both totals and breakdowns

Example: Transportation method by grade level

Each grade has a bar
Bars divided by transportation type
Can compare both across and within grades
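To compare groups of different sizes, each bar is usually scaled to 100% so the segments show within-group percentages. The sketch below assumes made-up counts for two grades; the numbers are hypothetical and are not data from the example above.

```python
# HYPOTHETICAL counts (invented for illustration): students by grade and
# transportation method.
counts = {
    "Grade 9": {"Bus": 50, "Car": 20, "Walk": 20, "Bike": 10},
    "Grade 12": {"Bus": 20, "Car": 60, "Walk": 15, "Bike": 5},
}

def segment_percentages(group_counts):
    """Return each category's share of its own group's total, in percent."""
    total = sum(group_counts.values())
    return {category: 100 * n / total for category, n in group_counts.items()}

# One set of segments per bar (one bar per grade).
segments = {grade: segment_percentages(c) for grade, c in counts.items()}

for grade, parts in segments.items():
    row = ", ".join(f"{category} {pct:.0f}%" for category, pct in parts.items())
    print(f"{grade}: {row}")
```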

Graphs for Quantitative Data

Dotplot

Purpose: Display individual values for small to moderate datasets

Structure:

Number line showing possible values
Dot for each observation
Dots stack when values repeat

When to use:

Small datasets (n < 50)
Want to see individual values
Looking for clusters, gaps, outliers

Example: Test scores: 75, 80, 80, 82, 85, 85, 85, 90, 95

Stack three dots above 85
Stack two dots above 80
Single dots for 75, 82, 90, 95
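The stacking described above amounts to counting how often each value occurs. A minimal sketch that prints the dotplot turned on its side, with one `o` per observation:

```python
from collections import Counter

# Test scores from the example above.
scores = [75, 80, 80, 82, 85, 85, 85, 90, 95]

# Number of dots to stack above each value.
stacks = Counter(scores)

# Print one row per distinct value (a dotplot rotated 90 degrees).
for value in sorted(stacks):
    print(f"{value:>3} | {'o' * stacks[value]}")
```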

Advantages:

Shows every data point
Easy to create
Good for small datasets

Disadvantages:

Impractical for large datasets
Can become cluttered

Stemplot (Stem-and-Leaf Plot)

Purpose: Display data while retaining actual values

Structure:

Split each value into "stem" (leading digit(s)) and "leaf" (trailing digit)
Stems listed vertically
Leaves listed horizontally

When to use:

Small to moderate datasets
Want to preserve actual data values
Quick hand-drawn analysis

Example: Test scores: 67, 72, 75, 78, 81, 83, 85, 85, 92

Stem  Leaf
6     7
7     2 5 8
8     1 3 5 5
9     2

Key: 7 | 2 represents 72
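The same split into stems and leaves can be sketched in a few lines. This version assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the ones digit; the function name `stemplot` is illustrative.

```python
from collections import defaultdict

def stemplot(values):
    """Group sorted values by stem (tens digit), collecting leaves (ones digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return dict(rows)

# Test scores from the example above.
scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]
rows = stemplot(scores)

# List every stem from smallest to largest, including empty ones,
# so that gaps in the data stay visible.
for stem in range(min(rows), max(rows) + 1):
    leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
    print(f"{stem} | {leaves}")
```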

Back-to-back stemplot: Compare two distributions

Shared stems in middle
One dataset's leaves on left
Other dataset's leaves on right

Advantages:

Retains actual values
Shows distribution shape
Can reconstruct original data

Disadvantages:

Tedious for large datasets
Choice of stems affects appearance

Histogram

Purpose: Display the distribution of quantitative data, especially continuous data

Structure:

Quantitative variable on x-axis (divided into bins)
Frequency or relative frequency on y-axis
Bars touching (continuous data)
Bar height = frequency in that interval

When to use:

Large datasets
Continuous or discrete quantitative data
Want to see distribution shape

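Example: Heights of students (in inches), where each interval includes its lower endpoint but not its upper endpoint (so a height of exactly 62 falls in 62-64)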

60-62: 5 students
62-64: 12 students
64-66: 23 students
66-68: 18 students
68-70: 8 students
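Since neighboring intervals share an endpoint, a boundary value must be assigned to exactly one bin. The sketch below assumes the common convention that each bin includes its left edge but not its right edge. The heights are hypothetical raw values, because the example above gives only the bin counts.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count values per bin; each bin includes its left edge, not its right."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if not edges[0] <= v < edges[-1]:
            raise ValueError(f"{v} is outside the binned range")
        # bisect_right finds the first edge strictly greater than v.
        counts[bisect_right(edges, v) - 1] += 1
    return counts

edges = [60, 62, 64, 66, 68, 70]

# HYPOTHETICAL heights in inches, chosen to include boundary values.
heights = [60.5, 61.0, 62.0, 63.5, 64.0, 64.0, 65.9, 66.0, 69.9]

print(bin_counts(heights, edges))
```

Here 62.0 is counted in 62-64 and 66.0 in 66-68, never in the bin to the left.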

Important considerations:

Bin width:

Too narrow → choppy, hard to see pattern
Too wide → lose detail, miss features
Experiment to find appropriate width

Number of bins:

Rules of thumb: about $\sqrt{n}$ bins (square-root rule) or $\lceil \log_2(n) \rceil + 1$ bins (Sturges' rule)
5-20 bins usually works well
More data → can use more bins
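Both rules of thumb are easy to evaluate. The sketch below rounds up to a whole number of bins, which is one common convention (an assumption here, since the rules only give a rough starting point), and applies them to the 66 students in the heights example.

```python
import math

def sqrt_rule(n):
    """Square-root rule: about sqrt(n) bins, rounded up."""
    return math.ceil(math.sqrt(n))

def sturges_rule(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(n)) + 1

# Total students in the heights example: 5 + 12 + 23 + 18 + 8.
n = 5 + 12 + 23 + 18 + 8

print(n, sqrt_rule(n), sturges_rule(n))
```

Both suggestions fall inside the 5-20 range; treat them as starting points and adjust after looking at the plot.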

Advantages:

Shows distribution shape clearly
Handles large datasets
Identifies outliers, gaps, clusters

Disadvantages:

Appearance depends on bin choices
Loses individual data values
Can mislead if bins chosen poorly

Boxplot (Box-and-Whisker Plot)

Purpose: Display five-number summary and identify outliers

Structure:

Box from Q1 to Q3 (contains middle 50%)
Line at median inside box
Whiskers extend to the smallest and largest values that are not outliers
Outliers plotted individually

Five-number summary:

Minimum (smallest observation, even if it is an outlier)
Q1 (first quartile, 25th percentile)
Median (50th percentile)
Q3 (third quartile, 75th percentile)
Maximum (largest observation, even if it is an outlier)

Outlier definition:

Below: $Q1 - 1.5 \times IQR$
Above: $Q3 + 1.5 \times IQR$
Where $IQR = Q3 - Q1$
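The 1.5 × IQR rule can be sketched as follows. Quartile conventions differ between textbooks and software; this version assumes the median-of-halves method (for odd n, the middle value is left out of both halves). It reports the actual minimum and maximum in the five-number summary and gives the whisker ends separately. The scores are the stemplot test scores plus one hypothetical low score of 40, added only to produce an outlier.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For odd n, the middle value is left out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

def boxplot_summary(values):
    """Five-number summary, outlier fences, whisker ends, and outliers."""
    data = sorted(values)
    q1, med, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "five_number": (data[0], q1, med, q3, data[-1]),
        "fences": (low_fence, high_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Stemplot scores plus a HYPOTHETICAL score of 40 (not in the original data).
scores = [40, 67, 72, 75, 78, 81, 83, 85, 85, 92]
summary = boxplot_summary(scores)
print(summary)
```

With these values the lower fence is 52.5, so 40 is flagged and the lower whisker stops at 67.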

When to use:

Comparing multiple distributions
Identifying outliers
Showing spread and center
Large datasets

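Modified boxplot:

Whiskers extend to the most extreme values within 1.5 × IQR of the quartiles (Q1 and Q3)
Outliers plotted as individual points
More informative than a basic boxplot, whose whiskers run all the way to the minimum and maximum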

Side-by-side boxplots:

Compare distributions across groups
Same scale for all boxes
Easy to see differences in center, spread, shape

Advantages:

Compact display
Shows spread clearly
Easy to compare groups
Identifies outliers automatically

Disadvantages:

Doesn't show distribution shape well
Can hide bimodality or other features
Less detail than histogram

Cumulative Frequency Plot (Ogive)

Purpose: Show cumulative frequencies or percentages

Structure:

Data values on x-axis
Cumulative frequency/percentage on y-axis
Line connects points
Always increasing (or flat)

When to use:

Want to find percentiles
Show how data accumulates
Identify median and quartiles

Uses:

Read off percentiles directly
See what percentage falls below a value
Identify quartile locations
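As a rough sketch, an ogive can be built from the heights histogram counts by accumulating them at each bin's upper edge. Reading a percentile off the plot then corresponds to linear interpolation between neighboring points, which assumes values are spread evenly within each bin. The function name `value_at_percentile` is illustrative.

```python
from itertools import accumulate

# Bin edges and counts from the heights histogram example.
edges = [60, 62, 64, 66, 68, 70]
counts = [5, 12, 23, 18, 8]

# Cumulative count reached at each bin's upper edge.
cumulative = list(accumulate(counts))
total = cumulative[-1]
cum_percent = [round(100 * c / total, 1) for c in cumulative]

def value_at_percentile(p, edges, cumulative):
    """Estimate the value at percentile p by interpolating along the ogive."""
    target = p / 100 * cumulative[-1]
    # The ogive starts at (lowest edge, 0) and ends at (highest edge, total).
    points = list(zip(edges, [0] + cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            if c1 == c0:  # empty bin: no slope to interpolate along
                return x0
            return x0 + (x1 - x0) * (target - c0) / (c1 - c0)
    raise ValueError("p must be between 0 and 100")

print(cumulative, cum_percent)
print(round(value_at_percentile(50, edges, cumulative), 2))
```

The estimated median, about 65.4 inches, lies in the 64-66 bin, where the cumulative count passes half of the 66 students.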

Describing Distributions (SOCS)

When analyzing any graph, describe using SOCS:

S - Shape

Symmetric: Balanced around center (mirror image)

Normal (bell-shaped)
Uniform (flat, rectangular)

Skewed:

Right-skewed (positive): Tail extends to right, mean usually > median
Left-skewed (negative): Tail extends to left, mean usually < median
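The mean is pulled toward the long tail while the median resists it, which is why comparing the two gives a quick hint about skew. The sketch below uses hypothetical data and treats the comparison as a rule of thumb only; it does not replace looking at the graph.

```python
from statistics import mean, median

def skew_hint(values):
    """Suggest a skew direction by comparing mean and median (rule of thumb)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "roughly symmetric"

# HYPOTHETICAL data: one large value creates a long right tail.
right_tailed = [20, 22, 25, 25, 28, 30, 32, 90]

# Mirror image of the same data, giving a long left tail.
left_tailed = [100 - x for x in right_tailed]

print(mean(right_tailed), median(right_tailed), skew_hint(right_tailed))
print(mean(left_tailed), median(left_tailed), skew_hint(left_tailed))
```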

Modality:

Unimodal: One peak
Bimodal: Two peaks
Multimodal: Multiple peaks
Uniform: No peaks

O - Outliers

Outliers: Observations unusually far from bulk of data

Identify:

Visual inspection (far from others)
1.5 × IQR rule (for boxplots)
More than 2 to 3 standard deviations from the mean (a rule of thumb, best suited to roughly symmetric data)
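The standard-deviation check can be sketched as below. The threshold is a judgment call, and the sample standard deviation is used here as an assumption. The data are the stemplot test scores plus one hypothetical low score of 40.

```python
from statistics import mean, stdev

def flag_by_sd(values, threshold):
    """Return values more than `threshold` sample SDs away from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

# Stemplot scores plus a HYPOTHETICAL score of 40 (not in the original data).
scores = [40, 67, 72, 75, 78, 81, 83, 85, 85, 92]

print(flag_by_sd(scores, 2))
print(flag_by_sd(scores, 3))
```

The score of 40 sits about 2.5 standard deviations below the mean, so it is flagged at a threshold of 2 but not at 3. An extreme value also inflates the standard deviation itself, which is one reason the 1.5 × IQR rule is often preferred.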

Report:

Note presence
Give values if possible
Consider causes (error? legitimate?)

C - Center

Typical value: Where data tends to cluster

Measures:

Median (middle value)
Mean (average)
Mode (most common)

In description: "The center is around [value]" or "The median is [value]"

S - Spread

Variability: How spread out data is

Measures:

Range (max - min)
IQR (Q3 - Q1)
Standard deviation

In description: "Values range from [min] to [max]" or "Most values fall between [Q1] and [Q3]"
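The center and spread measures listed above can be computed with Python's standard library. This sketch uses the stemplot test scores and assumes the median-of-halves convention for quartiles; other conventions can give slightly different values of Q1 and Q3.

```python
from statistics import mean, median, mode, stdev

# Test scores from the stemplot example.
scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]
data = sorted(scores)

# Quartiles by median of halves (middle value excluded when n is odd).
half = len(data) // 2
q1 = median(data[:half])
q3 = median(data[half + len(data) % 2:])

center = {"mean": mean(data), "median": median(data), "mode": mode(data)}
spread = {"range": data[-1] - data[0], "iqr": q3 - q1, "sd": stdev(data)}

print(center)
print(spread)
```

A description might read: "The median score is 81, and the middle half of the scores falls between 73.5 and 85."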

Choosing the Right Graph

Decision Guide

Categorical data:

Few categories, show proportions → Pie chart
Compare categories → Bar graph
Compare across groups → Segmented bar chart

Quantitative data:

Small dataset (n < 50) → Dotplot or stemplot
Show distribution shape → Histogram
Compare groups → Side-by-side boxplots
Identify outliers → Boxplot
Find percentiles → Cumulative frequency plot

Common Mistakes to Avoid

❌ Pie charts for quantitative data
❌ 3D or decorative effects (distort perception)
❌ Inconsistent scales when comparing
❌ Too many/too few bins in histograms
❌ Bar graph with touching bars (that's a histogram!)
❌ Missing labels on axes
❌ No scale on axes

Best Practices

✓ Label axes clearly with variable names and units
✓ Include title describing what graph shows
✓ Use consistent scales when comparing
✓ Choose appropriate graph type for data
✓ Make it readable (not too small, cluttered)
✓ Describe using SOCS in analysis
✓ Note any outliers or unusual features

Quick Reference

Graph Selection:

Categorical: Bar graph or pie chart
Small quantitative: Dotplot or stemplot
Large quantitative: Histogram or boxplot
Comparisons: Side-by-side boxplots or segmented bar charts
Percentiles: Cumulative frequency plot

SOCS Description:

Shape: symmetric, skewed (left/right), unimodal/bimodal
Outliers: identify and report
Center: median, mean
Spread: range, IQR, standard deviation

Remember: The best graph clearly communicates the story in your data. When in doubt, try multiple types and choose the one that reveals the most!
