Displaying Distributions with Graphs

Histograms, dot plots, stem plots, and boxplots


Introduction

"A picture is worth a thousand words" — especially in statistics! Graphs help us visualize data distributions, identify patterns, spot outliers, and communicate findings effectively. Choosing the right graph type depends on your data type and what you want to show.

Graphs for Categorical Data

Bar Graph (Bar Chart)

Purpose: Compare frequencies or percentages across categories

Structure:

  • Categorical variable on x-axis
  • Frequency or percentage on y-axis
  • Bars have gaps between them (not touching)
  • Heights represent frequencies

When to use:

  • Categorical data
  • Comparing categories
  • Showing frequencies or percentages

Example: Favorite ice cream flavors among students

  • Chocolate: 45 students
  • Vanilla: 32 students
  • Strawberry: 18 students
  • Other: 15 students

Key features:

  • Bars can be ordered (by frequency) or kept in natural order
  • Easy to compare categories visually
  • Clear and simple
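As a rough illustration of ordering bars by frequency, the Python sketch below converts the ice cream counts to percentages and prints a text-only bar graph, tallest bar first. The function names `relative_frequencies` and `text_bar_graph` are hypothetical, and the `scale` of five students per `#` is an arbitrary choice; a plotting library would normally draw the real chart.

```python
# Counts from the ice cream example above
flavors = {"Chocolate": 45, "Vanilla": 32, "Strawberry": 18, "Other": 15}

def relative_frequencies(counts):
    """Convert category counts to percentages of the total."""
    total = sum(counts.values())
    return {cat: 100 * c / total for cat, c in counts.items()}

def text_bar_graph(counts, scale=1):
    """Return one text bar per category, ordered from most to least frequent.

    `scale` is the number of observations represented by one '#'.
    """
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    width = max(len(cat) for cat in counts)  # pad labels so bars line up
    return [f"{cat:<{width}} | {'#' * (c // scale)} {c}" for cat, c in ordered]

pct = relative_frequencies(flavors)
for line in text_bar_graph(flavors, scale=5):
    print(line)
print({cat: round(p, 1) for cat, p in pct.items()})
```

Sorting by count puts the tallest bar first; for categories with a natural order (such as grade levels), you would skip the sort and keep that order instead.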

Pie Chart

Purpose: Show parts of a whole

Structure:

  • Circle divided into slices
  • Each slice represents a category
  • Slice size proportional to percentage

When to use:

  • Want to show proportions
  • Have relatively few categories (3-6 ideal)
  • Emphasizing "part of whole" relationship

Example: Student transportation methods

  • Bus: 40%
  • Car: 30%
  • Walk: 20%
  • Bike: 10%
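A pie chart's slice sizes follow directly from the percentages: each slice's central angle is its share of the full 360°. A minimal sketch for the transportation example (the helper name `slice_angles` is hypothetical):

```python
# Percentages from the transportation example above
transport = {"Bus": 40, "Car": 30, "Walk": 20, "Bike": 10}

def slice_angles(percentages):
    """Central angle of each slice = (percent / 100) * 360 degrees."""
    return {cat: pct * 360 / 100 for cat, pct in percentages.items()}

angles = slice_angles(transport)
print(angles)  # {'Bus': 144.0, 'Car': 108.0, 'Walk': 72.0, 'Bike': 36.0}
```

The angles add up to 360° only when the percentages add up to 100%, which is a quick check that the categories really are parts of one whole.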

Advantages:

  • Shows proportions clearly
  • Visually appealing
  • Good for presentations

Disadvantages:

  • Hard to compare similar-sized slices
  • Difficult with many categories
  • Can be misleading with 3D effects

Segmented Bar Chart

Purpose: Compare distributions across multiple groups

Structure:

  • Bars divided into segments
  • Each segment represents a category
  • Can show counts or percentages (percentages, with each bar totaling 100%, make groups of different sizes easier to compare)

When to use:

  • Comparing categorical distributions across groups
  • Two categorical variables
  • Want to see both totals and breakdowns

Example: Transportation method by grade level

  • Each grade has a bar
  • Bars divided by transportation type
  • Can compare both across and within grades
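The example gives no actual counts, so the sketch below uses made-up counts for two grades (hypothetical data) to show the arithmetic behind a percentage-based segmented bar chart: each group's counts are divided by that group's own total, so every bar reaches 100%. The name `segment_percentages` is illustrative.

```python
# Hypothetical counts (not from the text): transportation method by grade
counts = {
    "Grade 9":  {"Bus": 30, "Car": 10, "Walk": 8, "Bike": 2},
    "Grade 12": {"Bus": 12, "Car": 24, "Walk": 10, "Bike": 4},
}

def segment_percentages(table):
    """Convert each group's counts to percentages of that group's total."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {cat: 100 * n / total for cat, n in row.items()}
    return result

for grade, segments in segment_percentages(counts).items():
    print(grade, {cat: f"{p:.0f}%" for cat, p in segments.items()})
```

With these invented numbers, 60% of grade 9 students ride the bus versus 24% of grade 12 students, a difference that is easiest to see when both bars are scaled to 100%.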

Graphs for Quantitative Data

Dotplot

Purpose: Display individual values for small to moderate datasets

Structure:

  • Number line showing possible values
  • Dot for each observation
  • Dots stack when values repeat

When to use:

  • Small datasets (n < 50)
  • Want to see individual values
  • Looking for clusters, gaps, outliers

Example: Test scores: 75, 80, 80, 82, 85, 85, 85, 90, 95

  • Stack three dots above 85
  • Stack two dots above 80
  • Single dots for 75, 82, 90, 95
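A dotplot is easy to mimic in plain text. This sketch (the `text_dotplot` name is hypothetical, and it assumes whole-number data) turns the number line sideways: one row per integer from the minimum to the maximum, with one `o` per observation, so gaps such as the empty values between 85 and 90 stay visible.

```python
from collections import Counter

scores = [75, 80, 80, 82, 85, 85, 85, 90, 95]  # test scores from the example

def text_dotplot(values):
    """Sideways dotplot for whole-number data.

    One row per integer from min to max (the number line); each 'o' marks
    one observation at that value. Rows without dots keep gaps visible.
    """
    counts = Counter(values)  # missing values count as 0
    return [f"{x:>3} | {'o' * counts[x]}".rstrip()
            for x in range(min(values), max(values) + 1)]

for row in text_dotplot(scores):
    print(row)
```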

Advantages:

  • Shows every data point
  • Easy to create
  • Good for small datasets

Disadvantages:

  • Impractical for large datasets
  • Can become cluttered

Stemplot (Stem-and-Leaf Plot)

Purpose: Display data while retaining actual values

Structure:

  • Split each value into "stem" (leading digit(s)) and "leaf" (trailing digit)
  • Stems listed vertically
  • Leaves listed horizontally

When to use:

  • Small to moderate datasets
  • Want to preserve actual data values
  • Quick hand-drawn analysis

Example: Test scores: 67, 72, 75, 78, 81, 83, 85, 85, 92

Stem  Leaf
6     7
7     2 5 8
8     1 3 5 5
9     2

Key: 7 | 2 represents 72
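The stem/leaf split is just integer division by 10. A minimal sketch, assuming non-negative whole numbers with the tens digit(s) as the stem (the `stemplot` helper is hypothetical):

```python
from collections import defaultdict

scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]

def stemplot(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Keep empty stems between the smallest and largest so gaps stay visible
    return {stem: leaves[stem] for stem in range(min(leaves), max(leaves) + 1)}

for stem, leaf_list in stemplot(scores).items():
    print(f"{stem:>2} | {' '.join(map(str, leaf_list))}")
```

Listing empty stems (for example, a stem of 6 with no leaves between data in the 50s and 70s) matters because a skipped stem would hide a gap in the distribution.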

Back-to-back stemplot: Compare two distributions

  • Shared stems in middle
  • One dataset's leaves on left
  • Other dataset's leaves on right
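A back-to-back version only needs the same grouping done twice. In this sketch the first dataset is the test-score example above and the second is invented for illustration; the helper names are hypothetical, and stems are again assumed to be tens digits.

```python
from collections import defaultdict

def leaves_by_stem(values):
    """Group the last digit (leaf) of each whole number under its stem."""
    groups = defaultdict(list)
    for v in values:
        groups[v // 10].append(v % 10)
    return groups

def back_to_back(left, right):
    """Return rows 'left leaves | stem | right leaves'.

    Left leaves are written in decreasing order so that, like the right
    side, they increase moving outward from the shared stem column.
    """
    lg, rg = leaves_by_stem(left), leaves_by_stem(right)
    all_stems = set(lg) | set(rg)
    stems = list(range(min(all_stems), max(all_stems) + 1))
    left_txt = [" ".join(str(d) for d in sorted(lg[s], reverse=True)) for s in stems]
    right_txt = [" ".join(str(d) for d in sorted(rg[s])) for s in stems]
    width = max(len(t) for t in left_txt)  # right-align the left side
    return [f"{lp:>{width}} | {s} | {rp}".rstrip()
            for lp, s, rp in zip(left_txt, stems, right_txt)]

class_a = [67, 72, 75, 78, 81, 83, 85, 85, 92]       # scores from the text
class_b = [58, 64, 69, 71, 74, 80, 88, 90, 95, 97]   # hypothetical second class
for row in back_to_back(class_a, class_b):
    print(row)
```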

Advantages:

  • Retains actual values
  • Shows distribution shape
  • Can reconstruct original data

Disadvantages:

  • Tedious for large datasets
  • Choice of stems affects appearance

Histogram

Purpose: Display the distribution of quantitative data

Structure:

  • Quantitative variable on x-axis (divided into bins)
  • Frequency or relative frequency on y-axis
  • Bars touch (adjacent bins share boundaries)
  • Bar height = frequency in that interval

When to use:

  • Large datasets
  • Continuous or discrete quantitative data
  • Want to see distribution shape

Example: Heights of students (in inches)

  • 60-62: 5 students
  • 62-64: 12 students
  • 64-66: 23 students
  • 66-68: 18 students
  • 68-70: 8 students
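With raw data instead of pre-counted intervals, each value must first be assigned to a bin. Intervals like 60-62 and 62-64 share an endpoint, so a rule is needed for values on a boundary; the sketch below assumes the common convention of half-open bins, where 62 goes into the 62-64 bin. The heights and the `bin_counts` name are hypothetical.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    A value exactly on a boundary (e.g. 62) is counted in the bin it starts.
    Values outside [start, start + n_bins*width) are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the bin containing v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical raw heights in inches (not the class from the text)
heights = [60.5, 61, 62, 63.5, 64, 64, 65.9, 66, 67.2, 69.9]
print(bin_counts(heights, start=60, width=2, n_bins=5))  # [2, 2, 3, 2, 1]
```

The bar heights of the histogram are exactly these counts (or the counts divided by the total, for relative frequency).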

Important considerations:

Bin width:

  • Too narrow → choppy, hard to see pattern
  • Too wide → lose detail, miss features
  • Experiment to find appropriate width

Number of bins:

  • Common rules of thumb: about $\sqrt{n}$ bins (square-root rule) or $\log_2(n) + 1$ bins (Sturges' rule), where n is the number of observations
  • 5-20 bins usually work well
  • More data → can use more bins
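Both rules of thumb are one-liners. The sketch below rounds up to a whole number of bins, which is one common choice (rounding conventions vary). For the 66 students in the heights example (5 + 12 + 23 + 18 + 8), the two rules suggest 9 and 8 bins.

```python
import math

def sqrt_rule(n):
    """Square-root rule: about sqrt(n) bins, rounded up."""
    return math.ceil(math.sqrt(n))

def sturges_rule(n):
    """Sturges' rule: about log2(n) + 1 bins, rounded up."""
    return math.ceil(math.log2(n) + 1)

n = 66  # total students in the heights example
print(sqrt_rule(n), sturges_rule(n))  # 9 8
```

For larger datasets the two rules diverge: with n = 1000 they give 32 and 11 bins, so they are starting points to experiment from rather than fixed answers.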

Advantages:

  • Shows distribution shape clearly
  • Handles large datasets
  • Identifies outliers, gaps, clusters

Disadvantages:

  • Appearance depends on bin choices
  • Loses individual data values
  • Can mislead if bins chosen poorly

Boxplot (Box-and-Whisker Plot)

Purpose: Display five-number summary and identify outliers

Structure:

  • Box from Q1 to Q3 (contains middle 50%)
  • Line at median inside box
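  • Whiskers extend from the box to the minimum and maximum (in a modified boxplot, only to the most extreme values that are not outliers)
  • In a modified boxplot, outliers are plotted individually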

Five-number summary:

  1. Minimum (smallest value, including any outliers)
  2. Q1 (first quartile, 25th percentile)
  3. Median (50th percentile)
  4. Q3 (third quartile, 75th percentile)
  5. Maximum (largest value, including any outliers)

Outlier definition:

  • Low outlier: any value below $Q1 - 1.5 \times IQR$
  • High outlier: any value above $Q3 + 1.5 \times IQR$
  • Where $IQR = Q3 - Q1$
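The sketch below computes the five-number summary and the 1.5 × IQR fences. It assumes the "median of halves" quartile method used in many introductory courses (when n is odd, the median is left out of both halves); other software may use different quartile conventions and report slightly different Q1 and Q3. The minimum and maximum here are the actual extremes of the data. Function names are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """(min, Q1, median, Q3, max) using the median-of-halves quartile method."""
    xs = sorted(values)
    n = len(xs)
    lower = xs[: n // 2]          # values below the median position
    upper = xs[(n + 1) // 2 :]    # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outlier_fences(values):
    """Return (lower fence, upper fence, outliers) using the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [v for v in values if v < low or v > high]

scores = [67, 72, 75, 78, 81, 83, 85, 85, 92]  # stemplot data above
print(five_number_summary(scores))  # (67, 73.5, 81, 85.0, 92)
print(outlier_fences(scores))       # (56.25, 102.25, [])
```

For the stemplot scores no value lies outside the fences, so there are no outliers. Adding a hypothetical score of 40 changes the quartiles to Q1 = 72 and Q3 = 85, giving fences of 52.5 and 104.5, and 40 is then flagged as a low outlier.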

When to use:

  • Comparing multiple distributions
  • Identifying outliers
  • Showing spread and center
  • Large datasets

Modified boxplot:

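  • Whiskers stop at the most extreme values within 1.5 × IQR of the quartiles
  • Values beyond that are plotted as individual points (outliers)
  • More informative than a standard boxplot, whose whiskers run all the way to the minimum and maximum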
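Once the fences are known, the whisker ends of a modified boxplot are simply the most extreme observations inside them. This sketch uses a hypothetical dataset (the stemplot scores plus an extra score of 40), whose fences are 52.5 and 104.5 under the median-of-halves quartile method; `whisker_ends` is a hypothetical name.

```python
def whisker_ends(values, low_fence, high_fence):
    """Modified-boxplot whiskers: the most extreme values still inside the fences.

    Anything outside the fences would be drawn as a separate point.
    """
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)

# Hypothetical: stemplot scores plus a score of 40 (Q1 = 72, Q3 = 85)
scores = [40, 67, 72, 75, 78, 81, 83, 85, 85, 92]
print(whisker_ends(scores, 52.5, 104.5))  # (67, 92)
```

Here the lower whisker stops at 67 rather than 40, and 40 appears as its own point, which is what makes the outlier visible.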

Side-by-side boxplots:

  • Compare distributions across groups
  • Same scale for all boxes
  • Easy to see differences in center, spread, shape

Advantages:

  • Compact display
  • Shows spread clearly
  • Easy to compare groups
  • Modified boxplots flag outliers automatically

Disadvantages:

  • Doesn't show distribution shape well
  • Can hide bimodality or other features
  • Less detail than histogram

Cumulative Frequency Plot (Ogive)

Purpose: Show cumulative frequencies or percentages

Structure:

  • Data values on x-axis
  • Cumulative frequency/percentage on y-axis
  • Points plotted at the upper boundary of each class, connected by line segments
  • Never decreases (rises or stays flat)

When to use:

  • Want to find percentiles
  • Show how data accumulates
  • Identify median and quartiles

Uses:

  • Read off percentiles directly
  • See what percentage falls below a value
  • Identify quartile locations
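Reading a percentile off an ogive amounts to linear interpolation between the plotted points. The sketch below builds the cumulative percentages for the heights example and estimates the median. It assumes the bins are 60-62 through 68-70 as listed, that each point sits at the upper edge of its bin (with 0% at the first edge), and that values are spread evenly within each bin. `ogive_percentile` is a hypothetical name.

```python
from itertools import accumulate

# Bin edges and counts from the heights example (bins 60-62, ..., 68-70)
edges = [60, 62, 64, 66, 68, 70]
counts = [5, 12, 23, 18, 8]

cumulative = list(accumulate(counts))             # running totals
total = cumulative[-1]
cum_pct = [100 * c / total for c in cumulative]   # cumulative percentages

def ogive_percentile(p):
    """Estimate the value below which p percent of the data fall.

    Ogive points are (upper bin edge, cumulative %), starting at (60, 0%);
    between points we interpolate linearly.
    """
    ys = [0.0] + cum_pct
    if p <= 0:
        return edges[0]
    for i in range(1, len(edges)):
        if p <= ys[i]:
            frac = (p - ys[i - 1]) / (ys[i] - ys[i - 1])
            return edges[i - 1] + frac * (edges[i] - edges[i - 1])
    return edges[-1]

print(cumulative)                       # [5, 17, 40, 58, 66]
print(round(ogive_percentile(50), 2))   # 65.39
```

The estimated median, about 65.4 inches, falls in the 64-66 bin, which is where the cumulative percentage first passes 50%. Calling `ogive_percentile(25)` or `ogive_percentile(75)` estimates Q1 and Q3 the same way.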

Describing Distributions (SOCS)

When analyzing any graph, describe using SOCS:

S - Shape

Symmetric: Balanced around center (mirror image)

  • Normal (bell-shaped)
  • Uniform (flat, rectangular)

Skewed:

  • Right-skewed (positive): Tail extends to the right; the mean is usually greater than the median
  • Left-skewed (negative): Tail extends to the left; the mean is usually less than the median
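The mean-versus-median comparison can be checked numerically, with the caveat that it is only a rule of thumb: a mean noticeably above the median hints at a right tail, and one below hints at a left tail. The data below are hypothetical and `skew_hint` is an illustrative name.

```python
from statistics import mean, median

def skew_hint(values):
    """Rule-of-thumb skew check based on where the mean sits relative to the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "mean > median: possibly right-skewed"
    if m < md:
        return "mean < median: possibly left-skewed"
    return "mean = median: no skew indicated by this check"

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical, one large value
left_tail = [1, 7, 8, 8, 9, 9, 9, 10]    # hypothetical, one small value
print(mean(right_tail), median(right_tail), skew_hint(right_tail))
print(mean(left_tail), median(left_tail), skew_hint(left_tail))
```

The single extreme value pulls the mean toward the tail while the median barely moves, which is why the comparison works as a hint; always confirm with the graph itself.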

Modality:

  • Unimodal: One peak
  • Bimodal: Two peaks
  • Multimodal: Multiple peaks
  • Uniform: No distinct peak (roughly equal frequencies)

O - Outliers

Outliers: Observations unusually far from bulk of data

Identify:

  • Visual inspection (far from others)
  • 1.5 × IQR rule (for boxplots)
  • More than 2-3 standard deviations from mean
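The standard-deviation criterion is straightforward to apply. This sketch uses the sample standard deviation from Python's `statistics` module on made-up data; the cutoff `k` stands for the 2-3 standard deviations mentioned above, and `sd_outliers` is a hypothetical name.

```python
from statistics import mean, stdev

def sd_outliers(values, k=2):
    """Flag values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

# Hypothetical data with one unusual value
data = [10, 11, 12, 11, 10, 12, 11, 30]
print(sd_outliers(data, k=2))  # [30]
print(sd_outliers(data, k=3))  # []
```

The unusual value inflates both the mean and the standard deviation, so 30 is flagged with k = 2 but not with k = 3. Quartile-based rules such as 1.5 × IQR are less affected by this because quartiles resist extreme values.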

Report:

  • Note presence
  • Give values if possible
  • Consider causes (error? legitimate?)

C - Center

Typical value: Where data tends to cluster

Measures:

  • Median (middle value)
  • Mean (average)
  • Mode (most common)

In description: "The center is around [value]" or "The median is [value]"

S - Spread

Variability: How spread out data is

Measures:

  • Range (max - min)
  • IQR (Q3 - Q1)
  • Standard deviation

In description: "Values range from [min] to [max]" or "Most values fall between [Q1] and [Q3]"
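Python's standard `statistics` module can compute every center and spread measure listed above. The sketch applies it to the dotplot test scores. It relies on `statistics.quantiles` with its default "exclusive" method for the quartiles, which can differ slightly from hand methods on some datasets (for this data it gives Q1 = 80 and Q3 = 87.5, matching the median-of-halves method). `center_and_spread` is a hypothetical name.

```python
from statistics import mean, median, mode, stdev, quantiles

scores = [75, 80, 80, 82, 85, 85, 85, 90, 95]  # dotplot data above

def center_and_spread(values):
    """Center (mean, median, mode) and spread (range, IQR, sample SD)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
        "SD": stdev(values),
    }

summary = center_and_spread(scores)
print({name: round(v, 2) for name, v in summary.items()})
```

A description built from these numbers might read: "The median score is 85, and most values fall between 80 and 87.5."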

Choosing the Right Graph

Decision Guide

Categorical data:

  • Few categories, show proportions → Pie chart
  • Compare categories → Bar graph
  • Compare across groups → Segmented bar chart

Quantitative data:

  • Small dataset (roughly n < 50) → Dotplot or stemplot
  • Show distribution shape → Histogram
  • Compare groups → Side-by-side boxplots
  • Identify outliers → Modified boxplot
  • Find percentiles → Cumulative frequency plot

Common Mistakes to Avoid

  • Pie charts for quantitative data
  • 3D or decorative effects (they distort perception)
  • Inconsistent scales when comparing
  • Too many or too few bins in histograms
  • Bar graph with touching bars (that's a histogram!)
  • Missing labels on axes
  • No scale on axes

Best Practices

  • Label axes clearly with variable names and units
  • Include a title describing what the graph shows
  • Use consistent scales when comparing
  • Choose an appropriate graph type for the data
  • Make it readable (not too small or cluttered)
  • Describe the distribution using SOCS in your analysis
  • Note any outliers or unusual features

Quick Reference

Graph Selection:

  • Categorical: Bar graph or pie chart
  • Small quantitative: Dotplot or stemplot
  • Large quantitative: Histogram or boxplot
  • Comparisons: Side-by-side boxplots or segmented bar charts
  • Percentiles: Cumulative frequency plot

SOCS Description:

  • Shape: symmetric, skewed (left/right), unimodal/bimodal
  • Outliers: identify and report
  • Center: median, mean
  • Spread: range, IQR, standard deviation

Remember: The best graph clearly communicates the story in your data. When in doubt, try multiple types and choose the one that reveals the most!
