A class of 20 students averages 75 on a test. A class of 30 students averages 85. What is the combined average?
| Step | Work |
|---|---|
| Sum for class 1 | 20 × 75 = 1500 |
| Sum for class 2 | 30 × 85 = 2550 |
| Combined mean | (1500 + 2550) / (20 + 30) = 4050 / 50 = 81 |

Note: The combined average is NOT simply (75+85)/2=80. The larger class pulls the average closer to its mean.
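The same calculation can be sketched in Python. The function name `combined_mean` and the `(count, mean)` pair layout are illustrative choices, not part of the lesson:

```python
def combined_mean(groups):
    """Weighted mean of several groups given as (count, mean) pairs."""
    total_sum = sum(count * mean for count, mean in groups)  # rebuild each group's sum
    total_count = sum(count for count, _ in groups)
    return total_sum / total_count

# Class 1: 20 students averaging 75; class 2: 30 students averaging 85
print(combined_mean([(20, 75), (30, 85)]))  # 81.0
```

Swapping the class sizes would pull the result toward 75 instead, which is the point of the note above.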
Median from a Frequency Table
| Score | Frequency |
|---|---|
| 70 | 3 |
| 80 | 5 |
| 90 | 4 |
| 100 | 2 |

14 values total → the median is the average of the 7th and 8th values. Counting up: positions 1–3 are 70 and positions 4–8 are 80. Both the 7th and 8th values are 80, so median = 80.
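One way to check this kind of answer is to expand the frequency table into a full list and take the median directly. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative:

```python
from statistics import median

# Frequency table from the example: score -> how many times it occurs
freq = {70: 3, 80: 5, 90: 4, 100: 2}

# Expand into a sorted list: 70, 70, 70, 80, 80, ...
values = [score for score, count in sorted(freq.items()) for _ in range(count)]

print(len(values))     # 14
print(median(values))  # 80.0 (average of the 7th and 8th values)
```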
$\bar{x}_{\text{combined}} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$
Outlier effect
Mean shifts toward outlier; median barely moves
On the SAT, "average" always means mean unless otherwise specified
The weighted average is always closer to the mean of the group with more data points
{10,30,50,70,90}
Both have mean 50, but B has a much larger standard deviation.
Effect of Transformations
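| Transformation | Mean | SD |
|---|---|---|
| Add k to all values | Mean + k | Same |
| Multiply all by k | Mean × k | SD × $\lvert k \rvert$ |

Adding a constant shifts all data equally → spread doesn't change.
Multiplying by k stretches the data (or compresses it when $\lvert k \rvert < 1$) → spread scales by $\lvert k \rvert$.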
Worked Example 1: Comparing Spread Visually
Two dot plots both have mean 50:
Plot A: values at 48, 49, 50, 51, 52
Plot B: values at 30, 40, 50, 60, 70
| Measure | Plot A | Plot B |
|---|---|---|
| Range | 4 | 40 |
| SD | Low | High |

Plot B has values much farther from the mean → larger SD.
Worked Example 2: Transformation Chain
A dataset has mean 40 and SD 6. Every value is tripled, then 5 is subtracted. Find the new mean and SD.
| Transformation | Mean | SD |
|---|---|---|
| Original | 40 | 6 |
| Multiply by 3 | 40 × 3 = 120 | 6 × 3 = 18 |
| Subtract 5 | 120 − 5 = 115 | 18 (unchanged) |

Adding/subtracting does NOT change SD. Multiplying DOES.
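The rule can be sketched in Python for a transformation of the form new value = a × old value + b. The helper `transform_stats` and the small data set below are made up for illustration; the data set is only there to confirm the rule numerically on real values:

```python
import math
from statistics import mean, pstdev

def transform_stats(mu, sd, a, b):
    """Mean and SD after replacing every value x with a*x + b."""
    return a * mu + b, abs(a) * sd  # b shifts the mean only; |a| scales the SD

new_mean, new_sd = transform_stats(40, 6, a=3, b=-5)
print(new_mean, new_sd)  # 115 18

# Numerical check on a hypothetical data set with mean 40
data = [31, 37, 40, 43, 49]
moved = [3 * x - 5 for x in data]
mean_matches = math.isclose(mean(moved), 3 * mean(data) - 5)
sd_matches = math.isclose(pstdev(moved), 3 * pstdev(data))
print(mean_matches, sd_matches)  # True True
```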
Percentiles and Quartiles
| Measure | Meaning |
|---|---|
| Q1 (25th percentile) | 25% of data below |
| Q2 (median, 50th percentile) | 50% of data below |
| Q3 (75th percentile) | 75% of data below |

A value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3, where IQR = Q3 − Q1.
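A minimal sketch of the 1.5 × IQR rule, assuming the quartiles are already known (the quartile values 20 and 30 and both function names are hypothetical, chosen only for illustration):

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    low, high = outlier_fences(q1, q3)
    return value < low or value > high

# Hypothetical quartiles: Q1 = 20, Q3 = 30, so IQR = 10
print(outlier_fences(20, 30))  # (5.0, 45.0)
print(is_outlier(50, 20, 30))  # True
print(is_outlier(44, 20, 30))  # False
```

Taking the quartiles as inputs sidesteps the fact that different textbooks and tools compute quartiles slightly differently.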
Key Takeaways: Part 2

| Concept | Key Rule |
|---|---|
| SD measures | Spread / distance from mean |
| Add constant k | Mean + k; SD unchanged |
| Multiply by k | Mean × k; **SD × $\lvert k \rvert$** |
| SD = 0 | All values identical |
| IQR | Q3 − Q1; spread of middle 50% |
| Outlier threshold | Below Q1 − 1.5(IQR) or above Q3 + 1.5(IQR) |

SD is ALWAYS ≥ 0 (it can never be negative)
On the SAT, you compare SDs visually: more clustered = lower SD
y=ax+b
a (slope) = predicted change in y for each 1-unit increase in x
b (y-intercept) = predicted y when x=0
Residuals
Residual = Actual − Predicted
Positive residual: point is above the line
Negative residual: point is below the line
SAT Strategy
"According to the line of best fit..." → plug into the equation and calculate.
"The slope of the line means..." → interpret as "for each additional [x-unit], the [y-quantity] increases/decreases by [slope]."
Worked Example 1: Interpreting Slope in Context
Regression: $\hat{y} = 0.85x + 12$, where x = hours practiced per week and y = free-throw percentage.

| Component | Value | Interpretation |
|---|---|---|
| Slope | 0.85 | For each additional hour of practice, free-throw % is predicted to increase by 0.85 percentage points |
| y-intercept | 12 | A player who practices 0 hours is predicted to have a 12% free-throw rate |

SAT phrasing: "The slope means that for each additional hour of weekly practice, the predicted free-throw percentage increases by 0.85 percentage points."
Worked Example 2: Residual Analysis
A student studies 8 hours and scores 92. The regression line predicts $\hat{y} = 3(8) + 65 = 89$.

| Step | Work |
|---|---|
| Residual | 92 − 89 = 3 |
| Meaning | Student scored 3 points above prediction |
| On graph | Point is 3 units above the line |

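The prediction and residual steps can be sketched as two small functions. The names `predict` and `residual` are illustrative; the slope 3 and intercept 65 come from the example:

```python
def predict(x, slope, intercept):
    """Value on the line of best fit at x."""
    return slope * x + intercept

def residual(actual, predicted):
    """Positive when the point lies above the line, negative when below."""
    return actual - predicted

predicted = predict(8, slope=3, intercept=65)
print(predicted)                # 89
print(residual(92, predicted))  # 3
```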
Correlation vs. Causation
| Statement | Valid? |
|---|---|
| "Hours of study is correlated with higher grades" | ✅ (describes relationship) |
| "Studying more hours causes higher grades" | ⚠️ Only valid if from a controlled experiment |
| "The data proves studying improves grades" | ❌ "Proves" is too strong for any study |

Choosing a Model
If a residual plot shows a clear curve, a linear model is NOT the best fit → try quadratic or exponential.
Key Takeaways: Part 3

| Concept | Key Rule |
|---|---|
| Slope | Predicted change in y per 1-unit increase in x |
| y-intercept | Predicted y when x = 0 |
| Residual | Actual − Predicted |
| Positive residual | Point above the line |
| Negative residual | Point below the line |
| r close to ±1 | Strong linear correlation |
| r close to 0 | Weak or no linear correlation |

| SAT Wording | Correct Response |
|---|---|
| "What does the slope represent?" | "For each additional [x-unit], [y] is predicted to change by [slope]" |
| "Does this prove causation?" | Only if randomized experiment |
| "Is the linear model appropriate?" | Check the residual plot for patterns |

40/150 ≈ 26.7%
Conditional frequency: Of those who prefer cats, what percent are female? 40/70 ≈ 57.1%
SAT Trap ⚠️
"What fraction of cat owners are female?" → denominator = cat total = 70 → 40/70 ≈ 57.1%
"What fraction of females own cats?" → denominator = female total = 70 → 40/70 ≈ 57.1%
In this example they happen to give the same answer because both totals are 70, but in general they're different! The trick is identifying the correct denominator (row total, column total, or grand total).
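To make the denominator choice concrete, here is a small sketch using the counts from this section's pet-preference table (male: 30 cat, 50 dog; female: 40 cat, 30 dog). The nested-dictionary layout and variable names are assumptions made for illustration:

```python
# Counts from the pet-preference table: rows are gender, columns are pet
table = {
    "male": {"cat": 30, "dog": 50},
    "female": {"cat": 40, "dog": 30},
}

grand_total = sum(sum(row.values()) for row in table.values())  # 150
cat_total = sum(row["cat"] for row in table.values())           # column total: 70
female_total = sum(table["female"].values())                    # row total: 70

female_and_cat = table["female"]["cat"]

joint = female_and_cat / grand_total              # "of all respondents"
female_given_cat = female_and_cat / cat_total     # "of those who prefer cats"
cat_given_female = female_and_cat / female_total  # "of the females"

print(round(joint * 100, 1))             # 26.7
print(round(female_given_cat * 100, 1))  # 57.1
print(round(cat_given_female * 100, 1))  # 57.1
```

The numerator is 40 in all three cases; only the denominator changes with the wording of the question.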
Worked Example 1: Filling in a Two-Way Table
120 students: 55 play sports, 80 are in clubs. 30 do both. Complete the table.

| Step | Work |
|---|---|
| Sports AND clubs | 30 |
| Sports only | 55 − 30 = 25 |
| Clubs only | 80 − 30 = 50 |
| Neither | 120 − 25 − 30 − 50 = 15 |

| | In Club | Not in Club | Total |
|---|---|---|---|
| Sports | 30 | 25 | 55 |
| No Sports | 50 | 15 | 65 |
| Total | 80 | 40 | 120 |

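The subtraction steps for filling in the four cells can be sketched as one function. The name `fill_two_way` and the dictionary keys are illustrative:

```python
def fill_two_way(total, a_total, b_total, both):
    """Split a population into the four cells of a 2x2 table."""
    a_only = a_total - both
    b_only = b_total - both
    neither = total - a_only - b_only - both
    return {"both": both, "a_only": a_only, "b_only": b_only, "neither": neither}

# 120 students, 55 in sports (A), 80 in clubs (B), 30 in both
cells = fill_two_way(total=120, a_total=55, b_total=80, both=30)
print(cells)  # {'both': 30, 'a_only': 25, 'b_only': 50, 'neither': 15}
```

A quick sanity check is that the four cells add back up to the grand total.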
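Worked Example 2: Conditional vs. Joint
From the table above:

| Question | Type | Calculation |
|---|---|---|
| P(sports AND club) | Joint | 30/120 = 25% |
| $P(\text{club} \mid \text{sports})$ | Conditional | 30/55 ≈ 54.5% |
| $P(\text{sports} \mid \text{club})$ | Conditional | 30/80 = 37.5% |
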
| | Cat | Dog |
|---|---|---|
| Male | 30 | 50 |
| Female | 40 | 30 |

Independence in Two-Way Tables
Two events A and B are independent if $P(A \mid B) = P(A)$.
From the pet table: P(cat) = 70/150 ≈ 46.7%. But $P(\text{cat} \mid \text{male})$ = 30/80 = 37.5%. Since 37.5% ≠ 46.7%, gender and pet preference are not independent.
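The comparison of P(A | B) with P(A) can be sketched with exact fractions so that rounding does not blur the result. The function name and argument names are illustrative, and the second call uses made-up counts to show an independent case:

```python
from fractions import Fraction

def is_independent(joint_count, a_total, b_total, grand_total):
    """True when P(A | B) equals P(A) exactly for the given counts."""
    p_a = Fraction(a_total, grand_total)
    p_a_given_b = Fraction(joint_count, b_total)
    return p_a_given_b == p_a

# Pet table: A = prefers cats (70 of 150), B = male (80), A and B = 30
print(is_independent(30, 70, 80, 150))  # False

# Hypothetical counts where the two proportions match (20/100 = 10/50)
print(is_independent(10, 20, 50, 100))  # True
```

With real survey data the proportions rarely match exactly, so on the SAT the check is usually whether they are clearly different.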
Worked Example 3: Relative Frequency Table
Convert a two-way table to relative frequencies (divide every cell by the grand total):

| | Cat | Dog | Total |
|---|---|---|---|
| Male | 30/150 = 20% | 50/150 ≈ 33.3% | 53.3% |
| Female | 40/150 ≈ 26.7% | 30/150 = 20% | 46.7% |

Common SAT Denominator Guide
| Question asks | Denominator |
|---|---|
| "Of all survey respondents..." | Grand total |
| "Of those who prefer cats..." | Column total for cats |
| "Of the males surveyed..." | Row total for males |
| "What proportion of the total..." | Grand total |

Key Takeaways: Part 4

| Question Type | Denominator | Example |
|---|---|---|
| "Of all..." | Grand total | P(male and dog) = 50/150 |
| "Of [group]..." | Group total | $P(\text{cat} \mid \text{male}) = 30/80$ |
| Joint probability | Grand total | $P(A \cap B)$ |
| Conditional probability | Condition total | $P(A \mid B)$ |

| Skill | How-to |
|---|---|
| Fill in table | Use row/column totals to find missing cells |
| Check independence | Compare $P(A \mid B)$ with $P(A)$ |
| Convert to relative freq. | Divide each cell by grand total |

The SAT's #1 trap in two-way tables is using the wrong denominator
Always re-read the question to identify the "of" phrase: that gives you the denominator
$P(\text{not } A) = 1 - P(A)$
"AND" (Intersection)
If events are independent: $P(A \text{ and } B) = P(A) \times P(B)$
Example: Coin flip AND die roll: $P(\text{heads and } 6) = (1/2)(1/6) = 1/12$
"OR" (Union)
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
If events are mutually exclusive (can't happen together): $P(A \text{ or } B) = P(A) + P(B)$
Conditional Probability
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$
"Probability of A given B" → restrict your attention to only the B outcomes.
Worked Example 1: "At Least One" with Complement
A coin is flipped 3 times. What is the probability of getting at least one head?

| Approach | Work |
|---|---|
| Direct | Count all cases with ≥ 1 head... tedious |
| Complement | P(at least 1 head) = 1 − P(no heads) |
| Calculate | P(all tails) = $(1/2)^3$ = 1/8 |
| Answer | 1 − 1/8 = 7/8 |

Rule: "At least one" = 1 − P(none). Always use the complement!
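Both approaches can be compared in a short sketch: the "tedious" direct count done by brute-force enumeration, and the complement shortcut. The variable names are illustrative, and a fair coin is assumed:

```python
from fractions import Fraction
from itertools import product

# Direct approach: list all 2^3 = 8 outcomes and count those with a head
outcomes = list(product("HT", repeat=3))
with_head = [o for o in outcomes if "H" in o]
direct = Fraction(len(with_head), len(outcomes))

# Complement approach: 1 - P(all tails)
complement = 1 - Fraction(1, 2) ** 3

print(direct, complement)  # 7/8 7/8
```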
Worked Example 2: "OR" with Overlap
In a class of 30: 18 play soccer, 12 play basketball, 5 play both. What is P(soccer OR basketball)?
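Applying the OR rule P(A or B) = P(A) + P(B) − P(A and B) to these counts can be sketched as follows. The helper name `p_or` is illustrative, and working with counts over a common total is just one convenient way to apply the rule:

```python
from fractions import Fraction

def p_or(n_a, n_b, n_both, total):
    """P(A or B) from counts, subtracting the overlap so it is not counted twice."""
    return Fraction(n_a + n_b - n_both, total)

# 18 play soccer, 12 play basketball, 5 play both, 30 students in all
print(p_or(18, 12, 5, 30))  # 5/6
```

Without subtracting the 5 students who play both, the count would be 30 out of 30, which overstates the answer.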
Of students who passed math, half also passed science.
Key Takeaways: Part 5

| Rule | Formula | When to Use |
|---|---|---|
| Basic probability | P(A) = favorable/total | Always |
| Complement | P(not A) = 1 − P(A) | "At least one," "none" |
| AND (independent) | P(A) × P(B) | Events don't affect each other |
| OR | P(A) + P(B) − P(A ∩ B) | "Either... or..." |
| Conditional | $P(A \mid B) = P(A \cap B) / P(B)$ | "A given B" |
| Without replacement | Adjust denominator after each draw | Drawing from a bag/deck |

"At least one" → use complement (much faster)
With replacement: probabilities stay the same. Without: they change
On the SAT, conditional probability often comes from two-way tables
Worked Example 1: Is the Conclusion Valid?
A researcher gives Vitamin C to 50 volunteers and a placebo to 50 others (randomly assigned). The Vitamin C group had fewer colds. Conclusion: "Vitamin C reduces colds."

| Check | Answer |
|---|---|
| Study type | Experiment (random assignment) |
| Random assignment? | Yes |
| Can conclude causation? | Yes, this is valid |

Worked Example 2: Why Only Association?
A study surveys 1,000 adults and finds that coffee drinkers exercise more. Conclusion: "Coffee causes people to exercise."

| Check | Answer |
|---|---|
| Study type | Observational (no intervention) |
| Confound? | Maybe: health-conscious people both drink coffee and exercise |
| Valid conclusion? | "Coffee consumption is associated with more exercise", NOT "causes" |

Margin of Error
When the SAT says "95% confidence interval is 52% ± 3%":
Plausible range: 49% to 55%
Larger sample → smaller margin of error
The margin does NOT mean 3% of people changed their mind
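Reading an "estimate ± margin" statement can be sketched in a few lines. Both function names are illustrative, and the majority check simply asks whether the whole interval sits above 50%:

```python
def confidence_interval(estimate, margin):
    """Plausible range implied by 'estimate ± margin' (both in percent)."""
    return estimate - margin, estimate + margin

def shows_majority(estimate, margin):
    """True only when even the low end of the interval exceeds 50%."""
    low, _ = confidence_interval(estimate, margin)
    return low > 50

print(confidence_interval(52, 3))  # (49, 55)
print(shows_majority(52, 3))       # False: the interval dips below 50%
print(shows_majority(58, 4))       # True: the low end is 54%
```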
Generalizability vs. Causation: Two Separate Questions

| Feature | Allows |
|---|---|
| Random sampling from population | Generalize results to the population |
| Random assignment to treatments | Conclude cause and effect |
| Both | Generalize + causation |
| Neither | Only describes the sample |

SAT Conclusion Wording Guide

| Study Design | Valid Conclusion Wording |
|---|---|
| Random sample, observational | "People who [X] tend to [Y]" |
| Random sample + random assignment | "[X] causes [Y] in the population" |
| Convenience sample, observational | "Among these participants, [X] is associated with [Y]" |

Worked Example 3: Margin of Error
A poll of 400 voters: 58% ± 4% support a candidate.

| Question | Answer |
|---|---|
| Confidence interval | 54% to 62% |
| Can we say majority support? | Yes, even the low end (54%) exceeds 50% |

If interval were to ?
Key Takeaways: Part 6

| Concept | Key Rule |
|---|---|
| Causation | Only from randomized experiments |
| Generalization | Only from random sampling |
| Association | Observational studies can show this |
| Confounding variable | Third variable explaining a correlation |
| Margin of error | Confidence interval = estimate ± margin |
| Larger sample | Smaller margin of error |

| Bias Type | Example |
|---|---|
| Selection | Surveying only library users |
| Response | Leading questions |
| Voluntary response | Online opt-in polls |

On the SAT, the wrong answer often claims causation from an observational study: always check!
Common SAT Question Types
Calculate the mean/median from given data
Interpret a slope or y-intercept in context
Read a two-way table for conditional probability
Evaluate whether a study conclusion is valid
Compare standard deviations visually
Worked Example 1: Multi-Concept Problem
A survey of 200 students found a regression equation $\hat{y} = 1.5x + 50$ relating weekly study hours (x) to test scores (y). The mean study time was 10 hours with SD 3.

| Question | Work |
|---|---|
| Predicted score at x = 10 | 1.5(10) + 50 = 65 |
| Predicted score at x = 16 | 1.5(16) + 50 = 74 |
| If scores are multiplied by 2, what is the new SD of study hours? | Still 3: the SD of x is unchanged (the transformation was on y, not x) |

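Worked Example 2: Two-Way Table + Probability
In a class: 15 males passed, 5 males failed, 20 females passed, 10 females failed.

| | Pass | Fail | Total |
|---|---|---|---|
| Male | 15 | 5 | 20 |
| Female | 20 | 10 | 30 |
| Total | 35 | 15 | 50 |

| Question | Answer |
|---|---|
| P(pass) | 35/50 = 70% |
| $P(\text{pass} \mid \text{male})$ | 15/20 = 75% |
| $P(\text{male} \mid \text{pass})$ | 15/35 ≈ 42.9% |
| Are gender & passing independent? | No: $P(\text{pass}) = 70\% \neq P(\text{pass} \mid \text{male}) = 75\%$ |
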
SAT Data & Statistics Cheat Sheet
| Topic | Key Formula / Concept |
|---|---|
| Mean | $\bar{x} = \dfrac{\sum x}{n}$; Sum = Mean × Count |
| Median | Middle value; sort first |
| SD | Spread from mean; add → same; multiply → changes |
| Slope | Predicted Δy per unit Δx |
| Residual | Actual − Predicted |
| Complement | P(not A) = 1 − P(A) |
| OR (with overlap) | P(A) + P(B) − P(A ∩ B) |
| AND (independent) | P(A) × P(B) |
| Conditional | $P(A \mid B) = P(A \cap B) / P(B)$ |
| Causation | Only from randomized experiments |
| Generalization | Only from random sampling |

Common Mistakes to Avoid
| Mistake | Fix |
|---|---|
| Using wrong denominator in two-way table | Re-read "of [group]" to find denominator |
| Saying correlation = causation | Use "associated with" unless random assignment |
| Forgetting to subtract overlap in P(A or B) | Always check: can A and B happen together? |
| Confusing "all values" with "the mean" | Adding 10 to every value ≠ adding 10 to just the mean |
| Ignoring "without replacement" | After removing one item, total decreases by 1 |

Key Takeaways: Part 7

| Part | Core Skill |
|---|---|
| 1 | Mean, median, mode, and how outliers affect them |
| 2 | Standard deviation: comparing spread, effect of transformations |