Measures of Spread - Complete Interactive Lesson
Part 1: Exploratory Data Analysis Overview
Exploratory Data Analysis
Part 1 of 7: EDA Overview
What Is EDA?
Exploratory Data Analysis (EDA) is the process of using graphs and summary statistics to understand the key features of a dataset.
The Four Features (SOCS)
When describing a distribution, always mention:
| Feature | What to Look For |
|---|---|
| Shape | Symmetric, skewed left, skewed right, bimodal, uniform |
| Outliers | Unusual values far from the pattern |
| Center | Mean, median |
| Spread | Range, IQR, standard deviation |
Types of Data
| Type | Examples |
|---|---|
| Categorical | Gender, color, yes/no |
| Quantitative | Height, test scores, income |
Graphs for Categorical vs. Quantitative
- Categorical: bar chart, pie chart
- Quantitative: histogram, stemplot, boxplot, dotplot
Concept Check
Data Classification
Classify each as categorical (C) or quantitative (Q):
1) Zip code
2) Temperature in degrees Fahrenheit
3) Number of siblings
Part 2: Graphical Displays
Graphical Displays
Part 2 of 7: Graphs for Quantitative Data
Histograms
- Bars represent frequency (or relative frequency) for intervals
- No gaps between bars (unlike bar charts)
- Show shape, center, spread, and outliers
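As a rough illustration of how histogram bars are built, the sketch below counts how many values fall into each interval and prints one mark per value. The scores, the bin width of 10, and the helper name `bin_counts` are all made up for this example, and it assumes integer data; it is a text approximation, not a plotting routine.

```python
def bin_counts(values, width):
    """Count integer values per interval [start, start + width), keeping empty intervals."""
    lo = (min(values) // width) * width
    hi = (max(values) // width) * width
    # Pre-fill every interval so that an empty bin still shows up as a zero-height bar.
    counts = {start: 0 for start in range(lo, hi + width, width)}
    for v in values:
        counts[(v // width) * width] += 1
    return counts

# Hypothetical test scores, chosen only for illustration.
scores = [52, 58, 61, 67, 68, 70, 73, 74, 75, 79, 81, 84, 88, 93]

for start, freq in bin_counts(scores, 10).items():
    # Adjacent intervals share an edge, which is why histogram bars touch.
    print(f"[{start}, {start + 10}) | {'#' * freq}  ({freq})")
```

Keeping empty intervals matters: a bin with zero values is a real gap in the data, not a gap between bars.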
Stemplots (Stem-and-Leaf Plots)
- Each value is split into a "stem" and a "leaf"
- Good for small datasets (preserves individual values)
- Back-to-back stemplots compare two groups
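The stem-and-leaf split can be sketched in a few lines. This is a minimal illustration that assumes non-negative integers, with the tens digit (and above) as the stem and the ones digit as the leaf; the function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative integers (stem = tens, leaf = ones digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems stay visible as gaps.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical data, chosen only for illustration.
data = [45, 52, 58, 61, 67, 67, 73, 75, 98]
print("\n".join(stemplot(data)))
```

Because every leaf is an actual digit from the data, the individual values can be read back off the plot, which is what makes stemplots suited to small datasets.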
Dotplots
- Each value represented by a dot above a number line
- Best for small datasets
- Easy to see clusters, gaps, and outliers
Comparative Displays
To compare distributions, use:
- Side-by-side boxplots
- Back-to-back stemplots
- Overlapping or stacked histograms
Always compare shape, outliers, center, AND spread when comparing distributions.
Concept Check
Graph Selection
Choose the best graph type for each:
1) Comparing test score distributions of two classes (histogram/boxplot/stemplot)
2) Showing individual values of 20 measurements (histogram/dotplot/boxplot)
3) Displaying the distribution of 500 exam scores (histogram/dotplot/stemplot)
Part 3: Measures of Center
Measures of Center
Part 3 of 7: Mean, Median, Mode
Mean ($\bar{x}$)
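As a small illustration of the three measures named in this part, the sketch below uses Python's standard `statistics` module. The data values are made up for the example.

```python
import statistics

# Hypothetical values, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # middle value; with an even count, the average of the two middle values
mode = statistics.mode(data)      # the most frequent value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Here the mean (30 / 6) is larger than the median (halfway between 3 and 5) because the value 10 pulls the mean upward.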
Part 4: Measures of Spread
Measures of Spread
Part 4 of 7: Range, IQR, Standard Deviation
Range
Range = maximum − minimum. Simple, but not resistant to outliers.
Interquartile Range (IQR)
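The three measures named in this part can be sketched as follows. Two assumptions are worth flagging: quartiles are computed here as the medians of the lower and upper halves of the sorted data (excluding the overall median when the count is odd), which is a common textbook convention that software packages do not all share, and the standard deviation is the sample version that divides by n − 1. The helper name `quartiles` and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves of the sorted data.

    Assumes at least two values; the overall median is excluded when the count is odd.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[-half:]
    return statistics.median(lower), statistics.median(upper)

# Hypothetical values, chosen only for illustration.
data = [4, 7, 8, 10, 12, 15, 21]

data_range = max(data) - min(data)  # range = maximum - minimum
q1, q3 = quartiles(data)
iqr = q3 - q1                       # IQR = Q3 - Q1, the spread of the middle half
sd = statistics.stdev(data)         # sample standard deviation (divides by n - 1)

print("range:", data_range)
print("Q1, Q3, IQR:", q1, q3, iqr)
print("standard deviation:", round(sd, 3))
```

The range uses only the two most extreme values, while the IQR ignores the lowest and highest quarters of the data entirely, which is why the IQR is the resistant choice.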
Part 5: Outliers and Shape
Shape and Outliers
Part 5 of 7: Describing Distributions Completely
Shapes of Distributions
| Shape | Description | Example |
|---|---|---|
| Symmetric | Mirror image, roughly equal tails | Test scores |
| Skewed right | Long right tail | Income, home prices |
| Skewed left | Long left tail | Age at retirement |
| Bimodal | Two peaks | Heights of men & women combined |
| Uniform | All values equally likely | Rolling a die |
The Effect of Outliers
Outliers affect:
- Mean (pulled toward outlier): NOT resistant
- Standard deviation (increases): NOT resistant
- Range (increases): NOT resistant
Outliers do NOT significantly affect:
- Median: resistant
- IQR: resistant
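The difference between resistant and non-resistant measures is easy to see numerically. The sketch below summarizes a small made-up dataset twice, once as is and once with its largest value replaced by an extreme one. The helper `summarize` is hypothetical, and the quartiles come from `statistics.quantiles` with its default "exclusive" method, which may differ slightly from the convention used in a given textbook.

```python
import statistics

def summarize(values):
    """Return center and spread measures for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

# Hypothetical data, chosen only for illustration.
base = [10, 12, 13, 14, 15, 16, 18]
with_outlier = base[:-1] + [60]  # same data, but the largest value becomes 60

before = summarize(base)
after = summarize(with_outlier)

for name in before:
    print(f"{name:>6}: {before[name]:.2f} -> {after[name]:.2f}")
```

For this dataset the median and IQR do not move at all, while the mean, standard deviation, and range all increase.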
Part 6: Problem-Solving Workshop
Problem-Solving Workshop
Part 6 of 7: AP-Style Practice
AP Exam Framework for EDA
When describing a distribution, ALWAYS address:
- Shape (is it symmetric? skewed? bimodal?)
- Outliers (are there any? use the 1.5 × IQR rule)
- Center (give an approximate value and name the statistic)
- Spread (report the appropriate measure)
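The 1.5 × IQR rule mentioned above flags any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. A minimal sketch, taking the quartiles as given inputs; the function names, the data, and the quartile values (Q1 = 19, Q3 = 27, the medians of the lower and upper halves of this made-up dataset) are assumptions for illustration.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical data; Q1 and Q3 are the medians of its lower and upper halves.
values = [5, 18, 20, 22, 24, 26, 28, 45]
q1, q3 = 19, 27

print("fences:", outlier_fences(q1, q3))
print("outliers:", find_outliers(values, q1, q3))
```

With an IQR of 8, the fences sit 12 units beyond each quartile, at 7 and 39, so 5 and 45 are flagged.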
Template Answer
"The distribution of [variable] is [shape] with [center measure] approximately [value] and [spread measure] approximately [value]. [There are / are no] outliers."
Comparing Distributions
Always compare both distributions on ALL four features. Use comparative language: "higher," "wider," "more skewed."
Concept Check
SOCS Description
Test scores: Min=45, , Med=75, , Max=98. Roughly symmetric, no outliers.
Part 7: Mixed Review
Mixed Review
Part 7 of 7: Comprehensive Review
Quick Reference
| Measure | Resistant? | Best for |
|---|---|---|
| Mean | No | Symmetric data |
| Median | Yes | Skewed data |
| Std Dev | No | Symmetric data |
| IQR | Yes | Skewed data |
| Range | No | Quick summary |
EDA Checklist
- Identify variable type (categorical vs. quantitative)
- Choose appropriate graph
- Describe shape (symmetric, skewed L/R, bimodal, uniform)
- Check for outliers (1.5 × IQR rule)
- Report center (mean or median)
- Report spread (SD or IQR)
- Use context (variable names, units)
Concept Check
Final Challenge
Data: 3, 5, 7, 8, 9, 10, 12, 14, 50