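Independent groups (and $n < 10\%$ of the population for each)
Large counts: $n_1\hat{p}$, $n_1(1-\hat{p})$, $n_2\hat{p}$, and $n_2(1-\hat{p})$ are all $\geq 10$, where $\hat{p}$ is the pooled proportion
Under $H_0$, we assume $p_1 = p_2$, so we use the pooled proportion $\hat{p}$ for the standard error.
Two-Proportion Test
Two-Proportion Calculations
Treatment group: 45 successes out of 150. Control group: 30 successes out of 150.
1) $\hat{p}_1 = 45/150 = ?$ (Express as a decimal)
2) $\hat{p}_2 = 30/150 = ?$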
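The pooled calculation can be sketched in Python. This is a minimal illustration, assuming the counts from the practice problem (45 of 150 and 30 of 150); the function name `two_proportion_z` is hypothetical and not part of the course material.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2 (illustrative helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 the two groups share one proportion, so pool the successes.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return p1_hat, p2_hat, p_pooled, se, z

# Counts taken from the practice problem above.
p1_hat, p2_hat, p_pooled, se, z = two_proportion_z(45, 150, 30, 150)
print(round(p1_hat, 3), round(p2_hat, 3), round(p_pooled, 3), round(se, 4), round(z, 3))
```

The standard error uses the pooled proportion rather than the two separate sample proportions because the test is carried out under the assumption that $H_0$ is true.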
Part 2: Two-Sample T-Test for Means
Part 2 of 7: Comparing Two Independent Groups
Topics in This Part
Section
Hypotheses for $\mu_1 - \mu_2$
The Two-Sample t-Statistic
Part 3: Paired T-Test
Part 3 of 7: When Data Come in Pairs
Topics in This Part
Section
Paired vs. Two-Sample Designs
The Paired t-Test Procedure
Conditions for Paired Data
Worked Example
Key Concept: A paired t-test is a one-sample t-test performed on the differences within each pair.
When to Use a Paired Test
Use a paired t-test when:
Part 4: Confidence Intervals for Differences
Part 4 of 7: Estimating How Much Two Groups Differ
Topics in This Part
Section
Two-Sample CI for $\mu_1 - \mu_2$
Paired CI for $\mu_d$
Part 5: Power and Sample Size
Part 5 of 7: Detecting Real Differences
Topics in This Part
Section
What Is Power?
Type I and Type II Errors
Factors Affecting Power
Sample Size Considerations
Key Concept: Power is the probability of correctly rejecting $H_0$ when $H_0$ is actually false. Higher power means a better ability to detect a real effect.
Part 6: Problem-Solving Workshop
Part 6 of 7: Complete Worked Examples
Worked Example 1: Two-Sample T-Test
Problem
A fitness company wants to compare two training programs. 45 volunteers are randomly assigned: 22 to Program A and 23 to Program B. After 8 weeks, weight loss (in pounds) is recorded:
| Statistic | Program A | Program B |
|---|---|---|
| $n$ | 22 | 23 |
| $\bar{x}$ | 12.4 | 9.7 |
Part 7: Review & Applications
Part 7 of 7: Comprehensive Review
Complete Formula Reference
Two-Sample T-Test for Means
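| Component | Formula |
|---|---|
| Standard Error | $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ |

Two-Proportion z-Test
| Component | Formula |
|---|---|
| Hypotheses | $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 \neq 0$ (or $> 0$ or $< 0$) |
| Test statistic | $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ |
| Pooled proportion | $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ |
| Large counts | $n_1\hat{p}$, $n_1(1-\hat{p})$, $n_2\hat{p}$, $n_2(1-\hat{p})$ all $\geq 10$ |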
2) $\hat{p}_2 = 30/150 = ?$
3) Pooled $\hat{p} = (45+30)/(150+150) = ?$
Conditions
Worked Example
Key Concept: The two-sample t-test compares means from two independent groups. The key question: "Is the difference in sample means large enough to conclude the population means differ?"
The Setup
Two independent random samples:
Group 1: $n_1$ observations, sample mean $\bar{x}_1$, sample SD $s_1$
Group 2: $n_2$ observations, sample mean $\bar{x}_2$, sample SD $s_2$
Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Degrees of freedom: Use the calculator/technology value (Welch's approximation) or the conservative value $df = \min(n_1 - 1, n_2 - 1)$.
AP Tip: The AP formula sheet provides this formula. On the AP exam, use the conservative df or the calculator df; both are acceptable.
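To see how the two degrees-of-freedom choices compare, here is a small sketch. It assumes the Welch-Satterthwaite approximation, $df = \frac{(v_1 + v_2)^2}{v_1^2/(n_1 - 1) + v_2^2/(n_2 - 1)}$ with $v_i = s_i^2/n_i$, which is what calculators typically report. The summary statistics and function names below are hypothetical and chosen only for illustration.

```python
def conservative_df(n1, n2):
    """Conservative df: the smaller of the two one-sample df values."""
    return min(n1 - 1, n2 - 1)

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate df from sample SDs and sizes."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical summary statistics (not from the lesson).
s1, n1 = 5.0, 12
s2, n2 = 9.0, 20

df_conservative = conservative_df(n1, n2)
df_welch = welch_df(s1, n1, s2, n2)
print(df_conservative, round(df_welch, 2))
```

The Welch value always falls between the conservative df and $n_1 + n_2 - 2$, so the conservative choice gives a slightly larger P-value and a slightly wider interval.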
Conditions
| Condition | Check |
|---|---|
| Random | Both samples are random (or randomly assigned in an experiment) |
| Independent | The two groups are independent of each other |
| Normal/Large Sample | Both populations are Normal OR both $n_1 \geq 30$ and $n_2 \geq 30$ |
| 10% Rule | $n_1 < 10\%$ of population 1 AND $n_2 < 10\%$ of population 2 |
Worked Example
Study: Does a new teaching method improve test scores?
Method A (traditional): $n_1 = 35$, $\bar{x}_1 = 74.2$, $s_1 = 8.5$
Method B (new): $n_2 = 38$, $\bar{x}_2 = 79.6$, $s_2 = 7.9$
Step 1 - Hypotheses: $H_0: \mu_B - \mu_A = 0$ (no difference in mean scores)
$H_a: \mu_B - \mu_A > 0$ (new method produces higher mean scores)
Step 2 - Conditions:
Random: Students randomly assigned to methods ✓
Independent: Two separate groups ✓
Normal: $n_1 = 35 \geq 30$ and $n_2 = 38 \geq 30$ ✓
10%: Each group is less than 10% of all students ✓
Step 3 - Test statistic: $t = \frac{79.6 - 74.2}{\sqrt{\frac{7.9^2}{38} + \frac{8.5^2}{35}}} = \frac{5.4}{\sqrt{1.642 + 2.064}} = \frac{5.4}{\sqrt{3.706}} = \frac{5.4}{1.925} = 2.805$
df $= \min(34, 37) = 34$ (conservative) or use calculator df
Step 4 - Conclusion: $P \approx 0.004$ (one-sided). Since $P = 0.004 < 0.05 = \alpha$, we reject $H_0$. There is convincing evidence that the new teaching method produces higher mean test scores than the traditional method.
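The arithmetic in Step 3 can be checked with a short script. This is a sketch using the summary statistics from the teaching-method example; the helper name `two_sample_t` is hypothetical.

```python
import math

def two_sample_t(xbar_1, s_1, n_1, xbar_2, s_2, n_2):
    """Unpooled t statistic for H0: mu_1 - mu_2 = 0, from summary statistics."""
    var_term_1 = s_1 ** 2 / n_1
    var_term_2 = s_2 ** 2 / n_2
    se = math.sqrt(var_term_1 + var_term_2)
    return (xbar_1 - xbar_2) / se, se

# Method B (new) is listed first so the difference is B - A, matching Ha.
t_stat, se = two_sample_t(79.6, 7.9, 38, 74.2, 8.5, 35)
df_conservative = min(38 - 1, 35 - 1)
print(round(se, 3), round(t_stat, 3), df_conservative)
```

Putting Method B first keeps the sign of $t$ consistent with the alternative hypothesis $\mu_B - \mu_A > 0$.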
Two-Sample t-Test Concepts
Computing the t-Statistic
Group 1: $\bar{x}_1 = 50$, $s_1 = 10$, $n_1 = 25$
Group 2: $\bar{x}_2 = 44$, $s_2 = 12$, $n_2 = 30$
1) $\bar{x}_1 - \bar{x}_2 = ?$
2) $s_1^2/n_1 + s_2^2/n_2 = ?$ (compute to one decimal)
3) Conservative df $= \min(n_1 - 1, n_2 - 1) = ?$
Decisions and Interpretations
Exit Quiz: Two-Sample t-Test
Use a paired t-test when:
The same subjects are measured twice (before/after)
Subjects are matched in pairs (e.g., twins, partners)
Each observation in one group has a natural partner in the other
Use a two-sample t-test when:
The two groups are independent (different subjects, no pairing)
The Procedure
Step 1: Compute the differences: $d_i = x_{1i} - x_{2i}$ for each pair
Step 2: Treat the differences as a single sample and perform a one-sample t-test: $t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$ with df $= n - 1$, where $n$ is the number of pairs
AP Tip: Define d clearly: "d = After - Before" or "d = Treatment - Control." The direction matters for one-sided tests.
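The two steps can be sketched on raw data as follows. The scores here are hypothetical illustration data (five made-up before/after pairs), and the variable names are assumptions rather than anything defined in the lesson.

```python
import math
from statistics import mean, stdev

# Hypothetical before/after scores for five subjects (illustration only).
before = [520, 480, 550, 500, 610]
after = [560, 510, 540, 530, 640]

# Step 1: one difference per pair, with d = After - Before.
diffs = [a - b for a, b in zip(after, before)]

# Step 2: one-sample t-test on the differences against mu_d = 0.
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)            # sample SD of the differences
se = s_d / math.sqrt(n)
t_stat = (d_bar - 0) / se
df = n - 1
print(diffs, d_bar, round(s_d, 3), round(t_stat, 3), df)
```

Only the list of differences enters the test; the individual before and after columns are not used once `diffs` has been computed.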
Conditions
| Condition | Check |
|---|---|
| Random | Pairs are randomly selected or treatments randomly assigned |
| Independent | Individual pairs are independent of each other (10% condition on pairs) |
| Normal | The differences ($d$) are approximately Normal (check histogram/QQ plot of $d$) |
Worked Example
Study: Does a tutoring program improve SAT scores?
12 students take the SAT, receive tutoring, then retake it.
| Student | Before | After | d = After - Before |
|---|---|---|---|
| 1 | 520 | 560 | 40 |
| 2 | 480 | 510 | 30 |
| 3 | 550 | 540 | -10 |
| ... | ... | ... | ... |
| Summary | | | $\bar{d} = 28.5$, $s_d = 22.3$ |
Step 1 - Hypotheses: $H_0: \mu_d = 0$ (tutoring has no effect on mean SAT scores)
$H_a: \mu_d > 0$ (tutoring increases mean SAT scores)
where d = After - Before
Step 2 - Conditions:
Random: Students randomly selected ✓
Independent: 12 students < 10% of all SAT takers ✓
Normal: Histogram of differences shows no strong skew ✓
Step 3 - Test statistic: $t = \frac{28.5 - 0}{22.3/\sqrt{12}} = \frac{28.5}{6.437} = 4.43$
df $= 12 - 1 = 11$
Step 4 - Conclusion: $P < 0.001$ (one-sided). Since $P < 0.001 < 0.05$, we reject $H_0$. There is convincing evidence that the tutoring program increases mean SAT scores.
Paired vs. Two-Sample: Why It Matters
| Feature | Two-Sample | Paired |
|---|---|---|
| Data structure | Two independent groups | Pairs of related observations |
| Parameter | $\mu_1 - \mu_2$ | $\mu_d$ |
| Advantage | Simpler design | Controls for variability between subjects |
| df | Complex (Welch) | $n_{\text{pairs}} - 1$ |
Key Insight: Pairing reduces variability by eliminating subject-to-subject differences, making it easier to detect a treatment effect.
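A quick numerical comparison makes the effect of pairing visible. The sketch below uses hypothetical before/after scores in which subjects differ a lot from one another but change by similar amounts. Running the two-sample formula on paired data is the wrong procedure; it is done here only to show how much larger its standard error is.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical paired scores (illustration only).
before = [520, 480, 550, 500, 610]
after = [560, 510, 540, 530, 640]
n = len(before)

# Paired analysis: the SE is based on the spread of the differences.
diffs = [a - b for a, b in zip(after, before)]
paired_se = stdev(diffs) / math.sqrt(n)
paired_t = mean(diffs) / paired_se

# Two-sample analysis (incorrect for this design): the SE also absorbs
# the subject-to-subject spread in each column.
two_sample_se = math.sqrt(variance(after) / n + variance(before) / n)
two_sample_t = (mean(after) - mean(before)) / two_sample_se

print(round(paired_se, 2), round(paired_t, 2))
print(round(two_sample_se, 2), round(two_sample_t, 2))
```

Both analyses have the same numerator (a mean change of 24 points), so the difference in $t$ comes entirely from the standard error.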
Paired t-Test Concepts
Computing the Paired t
$n = 16$ pairs, $\bar{d} = 5.0$, $s_d = 8.0$
1) SE of $\bar{d} = s_d/\sqrt{n} = ?$
2) $t = \bar{d}/SE = ?$
3) df $= ?$
Paired or Two-Sample?
Exit Quiz: Paired t-Test
Interpretation Templates
Connection to Hypothesis Tests
Key Concept: A CI for the difference gives a range of plausible values for how much two population means differ (or for the mean difference within pairs).
Two-Sample CI for $\mu_1 - \mu_2$
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Same conditions as the two-sample t-test (Random, Independent, Normal/Large, 10%)
df: use calculator (Welch) or conservative $\min(n_1 - 1, n_2 - 1)$
Paired CI for $\mu_d$
$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}$
Same conditions as the paired t-test (Random, Independent pairs, Normal differences)
df $= n - 1$ (where $n$ = number of pairs)
Interpretation Templates
Two-Sample:
"We are [C]% confident that the true difference in mean [context] between [group 1] and [group 2] is between [lower] and [upper] [units]."
Paired:
"We are [C]% confident that the true mean difference in [context] is between [lower] and [upper] [units]."
CI ↔ Test Connection
| CI contains 0? | Two-sided test conclusion at $\alpha = 1 - C$ |
|---|---|
| Yes | Fail to reject $H_0: \mu_1 = \mu_2$ (or $\mu_d = 0$) |
| No | Reject $H_0$ |
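The rule in this table is mechanical enough to write as a tiny function. The sketch below assumes a two-sided test whose significance level matches the interval's confidence level; the function names and the two example intervals are hypothetical.

```python
def alpha_for_confidence(confidence_level):
    """Significance level of the two-sided test that matches a C-level CI."""
    return 1 - confidence_level

def two_sided_decision(lower, upper, null_value=0.0):
    """Reject H0 exactly when the null value falls outside the interval."""
    if lower <= null_value <= upper:
        return "fail to reject H0"
    return "reject H0"

alpha = alpha_for_confidence(0.95)
decision_a = two_sided_decision(1.2, 9.8)    # hypothetical interval, excludes 0
decision_b = two_sided_decision(-2.5, 6.1)   # hypothetical interval, contains 0
print(round(alpha, 2), decision_a, decision_b)
```

The correspondence is exact only for two-sided alternatives; a one-sided test at level $\alpha$ lines up with a confidence level of $1 - 2\alpha$ instead.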
Worked Example: Two-Sample CI
Group A (old drug): $n_1 = 40$, $\bar{x}_1 = 82$, $s_1 = 12$
Group B (new drug): $n_2 = 45$, $\bar{x}_2 = 88$, $s_2 = 10$
95% CI for $\mu_B - \mu_A$:
$\bar{x}_B - \bar{x}_A = 88 - 82 = 6$
SE $= \sqrt{\frac{10^2}{45} + \frac{12^2}{40}} = \sqrt{2.222 + 3.600} = \sqrt{5.822} = 2.413$
Conservative df $= \min(39, 44) = 39$, $t^* \approx 2.023$
$6 \pm 2.023(2.413) = 6 \pm 4.881 = (1.119, 10.881)$
Interpretation: "We are 95% confident that the true difference in mean recovery scores (new - old) is between 1.1 and 10.9 points. Since 0 is not in the interval, there is evidence the new drug produces higher mean scores."
Worked Example: Paired CI
15 patients measured before and after treatment.
$\bar{d} = 8.3$ (After - Before), $s_d = 6.2$
95% CI for $\mu_d$: df $= 14$, $t^* = 2.145$
$8.3 \pm 2.145 \cdot \frac{6.2}{\sqrt{15}} = 8.3 \pm 3.434 = (4.866, 11.734)$
Interpretation: "We are 95% confident that the true mean change in [outcome] after treatment is between 4.9 and 11.7 units."
AP Tip: Always define d (e.g., After - Before) and include context and units.
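Both worked intervals can be reproduced with a few lines of Python. This sketch takes the critical value $t^*$ as an input (2.023 and 2.145, as quoted in the examples) rather than computing it; the function names are hypothetical.

```python
import math

def two_sample_ci(xbar_1, s_1, n_1, xbar_2, s_2, n_2, t_star):
    """CI for mu_1 - mu_2 from summary statistics and a supplied t*."""
    diff = xbar_1 - xbar_2
    margin = t_star * math.sqrt(s_1 ** 2 / n_1 + s_2 ** 2 / n_2)
    return diff - margin, diff + margin

def paired_ci(d_bar, s_d, n_pairs, t_star):
    """CI for mu_d from the mean and SD of the paired differences."""
    margin = t_star * s_d / math.sqrt(n_pairs)
    return d_bar - margin, d_bar + margin

# New drug (B) first, so the interval estimates mu_B - mu_A.
ts_low, ts_high = two_sample_ci(88, 10, 45, 82, 12, 40, t_star=2.023)
pd_low, pd_high = paired_ci(8.3, 6.2, 15, t_star=2.145)
print(round(ts_low, 3), round(ts_high, 3))
print(round(pd_low, 3), round(pd_high, 3))
```

Passing `t_star` explicitly mirrors the table-lookup workflow: the conservative df determines which row of the t table supplies the critical value.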
CI for Differences Concepts
Building CIs
Two-Sample: $\bar{x}_1 - \bar{x}_2 = 10$, SE $= 4$, $t^* = 2.0$
1) Margin of error $= ?$
2) Lower bound $= ?$
Paired: $\bar{d} = -3$, $s_d = 6$, ,
3) SE of $\bar{d} = ?$
Interpretation Decisions
Exit Quiz: CIs for Differences
Error Types
| Decision | $H_0$ True | $H_0$ False |
|---|---|---|
| Reject $H_0$ | Type I Error ($\alpha$) | Correct! (Power) |
| Fail to Reject | Correct! | Type II Error ($\beta$) |
$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})$
Type I Error ($\alpha$)
Rejecting $H_0$ when it is true (false positive)
Probability $= \alpha$ (the significance level)
Example: Concluding a drug works when it actually does not
Type II Error ($\beta$)
Failing to reject $H_0$ when it is false (false negative)
Probability $= \beta$
Example: Failing to conclude that a drug works when it actually does
Factors That Increase Power
| Factor | Direction | Effect on Power |
|---|---|---|
| Sample size ($n$) | ↑ | Power ↑ |
| Significance level ($\alpha$) | ↑ | Power ↑ |
| True effect size ($\lvert \mu_1 - \mu_2 \rvert$) | ↑ | Power ↑ |
| Population variability ($\sigma$) | ↓ | Power ↑ |
AP Tip: You will NOT be asked to calculate power on the AP exam, but you MUST understand conceptually how each factor affects power.
Intuition for Each Factor
Larger $n$: More data → smaller SE → easier to detect a difference
$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, so larger $n$ → smaller SE → larger $|t|$
Larger $\alpha$: Easier rejection threshold → more likely to reject (but more risk of Type I error)
Larger effect: A bigger real difference is easier to detect than a tiny one
Smaller $\sigma$: Less noise → the signal (difference) stands out more clearly
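These four effects can be seen numerically with a rough power calculation. The sketch below is not the t-test power itself: it assumes a one-sided two-sample z-test with a known common $\sigma$ and equal group sizes, so that power is $1 - \Phi(z_\alpha - \delta/SE)$ with $SE = \sigma\sqrt{2/n}$. All input values and the function name `approx_power` are hypothetical.

```python
from statistics import NormalDist

def approx_power(effect, sigma, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample z-test (normal approximation).

    Assumes equal group sizes and a known common SD; illustration only.
    """
    se = sigma * (2 / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect / se)

# Hypothetical baseline: true difference 5, sigma 10, 30 per group.
base = approx_power(effect=5, sigma=10, n_per_group=30)
more_n = approx_power(effect=5, sigma=10, n_per_group=60)
more_alpha = approx_power(effect=5, sigma=10, n_per_group=30, alpha=0.10)
more_effect = approx_power(effect=8, sigma=10, n_per_group=30)
less_sigma = approx_power(effect=5, sigma=7, n_per_group=30)
no_effect = approx_power(effect=0, sigma=10, n_per_group=30)

for label, value in [("baseline", base), ("larger n", more_n),
                     ("larger alpha", more_alpha), ("larger effect", more_effect),
                     ("smaller sigma", less_sigma), ("no effect", no_effect)]:
    print(f"{label:14s} {value:.3f}")
```

When the true effect is zero, the rejection probability equals $\alpha$, which is the Type I error rate rather than power.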
The Power-$\alpha$ Tradeoff
$\alpha \downarrow \;\Rightarrow\; \text{Power} \downarrow \;\Rightarrow\; \beta \uparrow$
Decreasing $\alpha$ (e.g., from 0.05 to 0.01) reduces Type I error but increases Type II error (reduces power). The only way to reduce BOTH errors simultaneously is to increase sample size.
Sample Size Planning
Before collecting data, researchers choose n to achieve desired power (typically 80% or higher):
Specify the smallest meaningful effect size
Estimate population variability ($\sigma$)
Choose $\alpha$ (usually 0.05)
Use a power table or software to find the required n
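As a rough stand-in for the power table or software, the planning steps can be sketched with a normal approximation. The formula used here, $n = 2\left(\frac{\sigma (z_{1-\alpha} + z_{\text{power}})}{\delta}\right)^2$ per group, assumes a one-sided two-sample z-test with equal group sizes and a known common $\sigma$; real software based on the t distribution will usually return a slightly larger $n$. The input values and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size (one-sided z-test, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # rejection cutoff
    z_power = NormalDist().inv_cdf(power)       # quantile for the target power
    return math.ceil(2 * (sigma * (z_alpha + z_power) / effect) ** 2)

# Hypothetical planning inputs: smallest meaningful effect 5, sigma 10.
n_80 = n_per_group(effect=5, sigma=10, alpha=0.05, power=0.80)
n_90 = n_per_group(effect=5, sigma=10, alpha=0.05, power=0.90)
print(n_80, n_90)
```

Raising the target power from 80% to 90% with everything else fixed increases the required sample size, which is the cost side of the tradeoff.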
Key Insight: Larger samples are always better for power, but they cost more. Sample size planning balances statistical needs with practical constraints.
Power Concepts
Error and Power Calculations
1) If $\beta = 0.15$, what is the power? (give as decimal)
2) If $\alpha = 0.01$, what is the probability of a Type I error?
3) A test has power $= 0.80$. What is $\beta$?
Power Factors
Exit Quiz: Power & Sample Size
$s$: 3.8 (Program A), 4.1 (Program B)
Test whether Program A produces greater average weight loss at $\alpha = 0.05$.
Step 1: State Hypotheses
$H_0: \mu_A - \mu_B = 0$
$H_a: \mu_A - \mu_B > 0$
Where $\mu_A$ = true mean weight loss for Program A, $\mu_B$ = true mean weight loss for Program B.
Step 2: Check Conditions
✓ Random: Volunteers randomly assigned to groups (experiment)
✓ Normal: $n_A = 22$ and $n_B = 23$ are both below 30, so this condition rests on the data showing no strong skewness or outliers (none is mentioned)
✓ Independent: Groups are independent; each person is in only one program. Both groups are less than 10% of all potential participants.
AP-Style Conclusion: There is convincing evidence that the true mean weight loss for Program A is greater than the true mean weight loss for Program B.
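The calculation behind this conclusion can be sketched as follows, using the summary statistics from the problem (means 12.4 and 9.7, SDs 3.8 and 4.1, sizes 22 and 23) and the conservative df. The one-sided P-value is obtained by numerically integrating the t density with Simpson's rule, which is a stand-in for a calculator's t-cdf function; the helper names are hypothetical.

```python
import math

def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0 via Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the worked example (Program A minus Program B).
se = math.sqrt(3.8 ** 2 / 22 + 4.1 ** 2 / 23)
t_stat = (12.4 - 9.7) / se
df = min(22 - 1, 23 - 1)          # conservative df
p_value = upper_tail(t_stat, df)
print(round(se, 3), round(t_stat, 3), df, round(p_value, 4))
```

With the conservative df the P-value falls below $\alpha = 0.05$, which is consistent with the conclusion stated above; a calculator's Welch df would give a somewhat smaller P-value.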
Worked Example 2: Paired T-Test
Problem
A researcher tests whether a meditation app reduces stress. 30 participants rate their stress (1-100) before and after 4 weeks of daily use:
| Statistic | Before | After | Differences (Before - After) |
|---|---|---|---|
| $n$ | 30 | 30 | 30 |
| $\bar{x}$ | 68.2 | 59.5 | $\bar{x}_d = 8.7$ |
| $s$ | - | - | $s_d = 11.3$ |
Test whether the app reduces stress at $\alpha = 0.05$.
Step 1: State Hypotheses
$H_0: \mu_d = 0$
$H_a: \mu_d > 0$
Where $\mu_d$ = true mean difference in stress scores (Before - After) for all users of this app.
Step 2: Check Conditions
✓ Random: 30 participants randomly selected (or assume representative)
✓ Normal: $n = 30 \geq 30$ (CLT applies for differences)
✓ Independent: The differences for different participants are independent of one another; $30 < 10\%$ of all potential users
Key: We check conditions on the DIFFERENCES, not the individual scores.
Step 3: Calculate
$SE = \frac{s_d}{\sqrt{n}} = \frac{11.3}{\sqrt{30}} = \frac{11.3}{5.477} \approx 2.063$
$t = \frac{\bar{x}_d - 0}{SE} = \frac{8.7}{2.063} \approx 4.217$
$df = n - 1 = 29$, P-value $< 0.001$
Step 4: Conclude
Since $P < 0.001 < 0.05 = \alpha$, we reject $H_0$.
AP-Style Conclusion: There is convincing evidence that the meditation app reduces mean stress scores.
Common AP Mistakes
| Mistake | Why It Costs Points |
|---|---|
| Using a two-sample test when data are paired | Wrong procedure → wrong test statistic → wrong conclusion |
| Not defining $\mu_d$ clearly | "Mean difference" must include direction (A - B) and context |
| Skipping conditions | Automatic deduction on free-response |
| No context in conclusion | Must reference the specific variables and setting |
| Saying "accept $H_0$" | Always say "fail to reject $H_0$" |
| Not identifying data as paired | Look for: same subjects, before/after, matched pairs |
Workshop Practice
Computation Practice
1) Two groups: $\bar{x}_1 = 45$, $s_1 = 6$, $n_1 = 36$; $\bar{x}_2 = 41$, $s_2 = 8$, $n_2 = 36$. Calculate SE (round to 2 decimal places).
2) Using the values above, calculate the t-statistic (round to 2 decimal places).
3) Paired data: $\bar{x}_d = 3.2$, $s_d$, . Calculate the t-statistic (round to 2 decimal places).
Random: Both samples come from random processes (or random assignment)
Normal: $n_1 \geq 30$ and $n_2 \geq 30$, or no strong skewness/outliers
Independent: Samples independent of each other; each $< 10\%$ of its population
Paired Tests/CIs
Random: Random sample of pairs (or the order of treatments is randomly determined)
Normal: $n \geq 30$ differences, or differences show no strong skewness/outliers
Independent: Individual pairs are independent; $n < 10\%$ of all pairs in population
Interpretation Templates
Hypothesis Test Conclusion
"Since P-value = ___ is [less/greater] than $\alpha$ = ___, we [reject/fail to reject] $H_0$. There [is/is not] convincing evidence that [context: what the alternative hypothesis claims]."
Confidence Interval Interpretation
"We are ___% confident that the true difference in means [context: $\mu_1 - \mu_2$ in words] is between ___ and ___."
Confidence Interval and Significance
If 0 is NOT in the CI → Reject $H_0$ (at the corresponding $\alpha$)
If 0 IS in the CI → Fail to reject $H_0$
Key Concepts from Every Part
| Part | Topic | Key Idea |
|---|---|---|
| 1 | Introduction | Two-sample vs. paired designs |
| 2 | Two-Sample T-Test | Tests for differences between independent groups |
| 3 | Paired T-Test | Tests for differences within matched pairs |
| 4 | Confidence Intervals | Estimate the true difference with a range |
| 5 | Power & Sample Size | Power $= 1 - \beta$; larger $n$ → more power |
| 6 | Problem-Solving | Complete 4-step process with real data |
Power Quick Review
$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})$
Power increases when: $n$ ↑, $\alpha$ ↑, effect size ↑, $\sigma$ ↓
Final Tip: The AP exam tests your ability to (1) choose the right test, (2) check conditions, (3) calculate correctly, and (4) interpret in context. Practice the full 4-step process until it becomes automatic.
Comprehensive Review
Formula Application
1) Two-sample data: $\bar{x}_1 = 78$, $\bar{x}_2 = 72$, $SE = 2.5$. Calculate $t$ (round to 1 decimal place).