Paired Data
Analyzing matched pairs
Paired Data and Matched Pairs
What is Paired Data?
Paired data: Two measurements on the same subject, or one measurement on each member of a matched pair
Examples:
- Before/after measurements on same people
- Twins (one gets treatment A, other gets treatment B)
- Matched subjects (similar age, gender, etc.)
- Same subjects under two conditions
Key: Natural pairing creates dependence
Why Pair?
Reduces variability by controlling for subject-to-subject differences
Example: Blood pressure
- People naturally have different BP
- Before/after on the same person eliminates person-to-person variation
- More sensitive to treatment effect
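Pairing is powerful! When the two measurements in each pair are strongly related, a paired design can detect smaller effects than two independent samples with the same number of measurements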
Paired vs Two-Sample
Paired:
- Same subjects (or matched pairs)
- Analyze differences
- Use one-sample t-test on differences
Two-sample:
- Different subjects in each group
- Independent samples
- Use two-sample t-test
You MUST identify which design you have before analyzing!
Paired t-Test Procedure
1. Calculate differences: d = measurement₁ - measurement₂ for each pair
2. Hypotheses about mean difference:
- H₀: μ_d = 0 (no mean difference)
- Hₐ: μ_d ≠ 0 (or μ_d > 0 or μ_d < 0)
3. Use a one-sample t-test on the differences:
$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$
Where:
- d̄ = mean of differences
- s_d = standard deviation of differences
- n = number of pairs (not total observations!)
- df = n - 1
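The procedure above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function name `paired_t_statistic` is hypothetical, and the data are hypothetical too (the first two pairs are the rows shown in Example 1, the last two are invented). The point it shows is that the paired t statistic is just a one-sample t statistic on the differences, with n = number of pairs.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Hypothetical helper: paired t statistic as a one-sample t on differences.

    Differences are taken as first - second for each pair.
    Returns (d_bar, s_d, t, df).
    """
    if len(first) != len(second):
        raise ValueError("paired data must have the same number of observations")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)                        # n = number of pairs, not total observations
    d_bar = statistics.mean(diffs)        # mean of differences
    s_d = statistics.stdev(diffs)         # sample SD of differences (divides by n - 1)
    t = (d_bar - 0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


# Hypothetical data: the first two pairs match the rows in Example 1,
# the last two pairs are invented for illustration.
before = [145, 152, 138, 160]
after = [138, 145, 135, 150]
d_bar, s_d, t, df = paired_t_statistic(before, after)
print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, t = {t:.2f}, df = {df}")
```

Note that `df` comes out as 3 for four pairs (eight measurements), which is exactly the "pairs, not total observations" rule.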
Conditions for Paired t-Test
- Random: Pairs come from a random sample, or treatments are randomly assigned within pairs
- Normal: Differences approximately normal OR n ≥ 30
- Independent: Pairs independent of each other
Note: Measurements within a pair are dependent (that's the point!), but the pairs themselves must be independent of one another
Example 1: Before/After
Blood pressure before and after medication (10 patients):
| Patient | Before | After | Difference (Before - After) |
|---------|--------|-------|-----------------------------|
| 1 | 145 | 138 | 7 |
| 2 | 152 | 145 | 7 |
| ... | ... | ... | ... |
d̄ = 8.5, s_d = 4.2, n = 10
STATE:
- μ_d = true mean reduction in BP
- H₀: μ_d = 0
- Hₐ: μ_d > 0
- α = 0.05
PLAN:
- Paired t-test
- Random: Assume ✓
- Normal: n = 10 < 30, so check a plot of the differences (assume it shows no strong skewness or outliers) ✓
- Independent: Patients independent ✓
DO:
$$t = \frac{8.5 - 0}{4.2 / \sqrt{10}} \approx 6.40$$
df = 9
P-value = P(t ≥ 6.40) < 0.001
CONCLUDE: P-value < 0.05, so reject H₀. There is convincing evidence that the medication reduces blood pressure on average.
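The DO step for Example 1 can be checked from the summary statistics alone. The sketch below is a stdlib-only assumption of how one might do it: the helper names (`t_pdf`, `upper_tail`, `paired_t_from_summary`) are hypothetical, and the one-sided P-value is obtained by numerically integrating the Student's t density with Simpson's rule rather than by a library call. It reproduces t ≈ 6.40 with df = 9.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, steps=2000):
    """P(T >= t): 0.5 minus the integral of the density from 0 to t.

    The integral uses Simpson's rule, so `steps` must be even.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def paired_t_from_summary(d_bar, s_d, n):
    """One-sample t statistic on the differences, with df = n - 1."""
    return (d_bar - 0) / (s_d / math.sqrt(n)), n - 1


# Example 1: d_bar = 8.5, s_d = 4.2, n = 10 pairs, Ha: mu_d > 0
t, df = paired_t_from_summary(8.5, 4.2, 10)
p = upper_tail(t, df)
print(f"t = {t:.2f}, df = {df}, one-sided P-value = {p:.2e}")
```

For a "less than" alternative, the P-value would be `1 - upper_tail(t, df)`; for a two-sided test, `2 * upper_tail(abs(t), df)`.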
Example 2: Matched Pairs
Twins study - Math scores (twin₁ gets tutoring, twin₂ doesn't):
n = 15 twin pairs
d̄ = 5.2 (tutored - control)
s_d = 6.8
Test if tutoring helps:
STATE:
- μ_d = true mean difference (tutored - control)
- H₀: μ_d = 0
- Hₐ: μ_d > 0
- α = 0.05
DO:
$$t = \frac{5.2 - 0}{6.8 / \sqrt{15}} \approx 2.96$$
df = 14
P-value = P(t ≥ 2.96) ≈ 0.005
CONCLUDE: P-value < 0.05, so reject H₀. There is convincing evidence that tutoring increases mean math scores.
Direction of Differences
The subtraction order matters: choose one order and apply it to every pair!
Common choices:
- Before - After (positive means decrease)
- After - Before (positive means increase)
- Treatment - Control (positive means the treatment value is higher)
Be consistent and interpret accordingly!
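A quick way to see why the order only affects interpretation: reversing the subtraction flips the sign of d̄ and of t but leaves their size unchanged. The sketch below assumes hypothetical blood pressure readings and a hypothetical helper `mean_diff_and_t`.

```python
import math
import statistics


def mean_diff_and_t(x, y):
    """Return (d_bar, t) for differences d = x - y (one-sample t on d)."""
    d = [a - b for a, b in zip(x, y)]
    d_bar = statistics.mean(d)
    t = d_bar / (statistics.stdev(d) / math.sqrt(len(d)))
    return d_bar, t


before = [145, 152, 138, 160]   # hypothetical BP readings
after = [138, 145, 135, 150]

forward = mean_diff_and_t(before, after)   # Before - After: positive means a decrease
reverse = mean_diff_and_t(after, before)   # After - Before: negative means a decrease
print(forward)
print(reverse)
```

With Before - After, a test of Hₐ: μ_d > 0 asks whether BP decreased; with After - Before, the same question becomes Hₐ: μ_d < 0.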
Advantages of Pairing
1. Controls for confounding variables
- Each subject is own control
- Eliminates between-subject variation
2. Increases power
- Reduced variability → easier to detect effects
- Can use smaller sample size
3. More efficient
- Typically needs fewer total subjects than two independent samples to reach the same power
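To illustrate the power advantage, the sketch below simulates hypothetical before/after data in which subjects differ a lot from one another (baseline SD 15) but each subject's two readings differ only by small noise (SD 2) plus an assumed true effect of 3 units. All of these numbers are assumptions chosen for illustration. Both analyses use the same numerator (the mean difference equals the difference in means), but the paired standard error removes the subject-to-subject variation, so the paired t statistic is much larger. This is a single simulated run, not a formal power calculation.

```python
import math
import random
import statistics

random.seed(1)
n = 20
# Hypothetical subjects: large person-to-person spread in baseline BP,
# small measurement noise, and a true reduction of 3 units after treatment.
baseline = [random.gauss(140, 15) for _ in range(n)]
before = [b + random.gauss(0, 2) for b in baseline]
after = [b - 3 + random.gauss(0, 2) for b in baseline]

# Paired analysis: standard error of the mean difference
d = [x - y for x, y in zip(before, after)]
se_paired = statistics.stdev(d) / math.sqrt(n)

# Two-sample analysis that ignores the pairing: unpooled standard error
se_two_sample = math.sqrt(statistics.variance(before) / n
                          + statistics.variance(after) / n)

d_bar = statistics.mean(d)   # same as mean(before) - mean(after)
print(f"paired:     t = {d_bar / se_paired:.2f}  (SE = {se_paired:.2f})")
print(f"two-sample: t = {d_bar / se_two_sample:.2f}  (SE = {se_two_sample:.2f})")
```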
When NOT to Pair
Don't pair if:
- No natural pairing exists
- Pairing is artificial or forced
- Want to generalize to unpaired populations
Pairing must be meaningful and appropriate!
Paired CI
Confidence interval for mean difference:
$$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}, \quad df = n - 1$$
Interpretation: Range of plausible values for true mean difference
Example: Earlier BP study
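90% CI (df = 9, t* = 1.833): $8.5 \pm 1.833 \cdot \frac{4.2}{\sqrt{10}} \approx 8.5 \pm 2.43 = (6.07, 10.93)$
We're 90% confident the true mean BP reduction is between 6.07 and 10.93 mmHg.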
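The interval arithmetic can be sketched as a small function. Because the Python standard library has no inverse t distribution, this hedged sketch takes t* as an input read from a t table (1.833 for 90% confidence with df = 9); the function name `paired_ci` is hypothetical.

```python
import math


def paired_ci(d_bar, s_d, n, t_star):
    """d_bar ± t* · s_d / sqrt(n); t_star must come from a t table with df = n - 1."""
    margin = t_star * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# BP study: d_bar = 8.5, s_d = 4.2, n = 10 pairs; t* = 1.833 (90%, df = 9)
low, high = paired_ci(8.5, 4.2, 10, 1.833)
print(f"90% CI: ({low:.2f}, {high:.2f})")   # prints 90% CI: (6.07, 10.93)
```

Rounding s_d/√n to 1.33 before multiplying shifts the endpoints by about 0.01, which is why hand calculations can differ slightly in the last digit.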
Checking Normality of Differences
Important: Check normality of DIFFERENCES, not original data
Methods:
- Dotplot of differences
- Boxplot of differences
- Normal probability plot of differences
For small n: The differences should show no strong skewness or outliers
For large n (≥30): The CLT applies to the mean of the differences
Common Mistakes
❌ Using two-sample t-test on paired data (loses power!)
❌ Using paired test on independent samples
❌ Counting total observations instead of pairs for df
❌ Not checking normality of differences
❌ Inconsistent subtraction order
Identifying Paired Data
Ask yourself:
- Are there two measurements per subject?
- Is there natural pairing/matching?
- Would it make sense to calculate differences?
If yes → Paired data
If no → Independent samples
Calculator Commands (TI-83/84)
Method 1: Enter differences directly
- Calculate differences, enter in list
- STAT → TESTS → 2:T-Test
- Use difference list
Method 2: Let the calculator compute the differences
- Enter both measurements in separate lists (e.g., L1 and L2)
- Store the differences in a third list (L3 = L1 - L2)
- STAT → TESTS → 2:T-Test, using L3
Real-World Applications
Medical: Before/after treatment
Education: Pre-test/post-test
Psychology: Same subjects under different conditions
Agriculture: Adjacent plots (control for soil variation)
Marketing: Same consumers rating two products
Quick Reference
Key idea: Analyze differences, not separate groups
Test statistic: $t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$, df = n - 1
n = number of pairs (not total measurements)
Conditions: Random pairs, differences normal (or n ≥ 30), pairs independent
Remember: Pairing is powerful! Use it when available. Analyze differences with one-sample t-test. Don't use two-sample test on paired data!
📚 Practice Problems
Problem 1 (easy)
❓ Question:
A researcher wants to test if a new study technique improves test scores. She records the scores of 10 students before and after using the technique. Why should she use a paired t-test rather than a two-sample t-test?
💡 Solution:
She should use a paired t-test because the same students are measured twice (before and after), creating natural pairs. This violates the independence assumption required for two-sample t-tests. The paired design is more powerful because it controls for individual student differences in baseline ability.
Key considerations:
- Each student serves as their own control
- Focus is on the difference within each pair
- Reduces variability by eliminating between-student differences
Problem 2 (medium)
❓ Question:
Ten married couples were asked to rate their happiness on a scale from 1 to 10. The differences (husband - wife) in ratings were: 2, -1, 0, 3, -2, 1, 0, 2, -1, 1. Construct a 95% confidence interval for the mean difference in happiness ratings.
💡 Solution:
Step 1: Calculate statistics from differences d̄ = (2 + (-1) + 0 + 3 + (-2) + 1 + 0 + 2 + (-1) + 1) / 10 = 0.5
Step 2: Calculate standard deviation sd = √[Σ(di - d̄)² / (n-1)] = √[22.5 / 9] = √2.5 ≈ 1.58
Step 3: Find t* for df = 9, 95% confidence t* = 2.262
Step 4: Calculate confidence interval CI = d̄ ± t*(sd/√n) CI = 0.5 ± 2.262(1.58/√10) CI = 0.5 ± 1.13 CI = (-0.63, 1.63)
Conclusion: We are 95% confident that the true mean difference in happiness ratings (husband - wife) is between -0.63 and 1.63 points. Because this interval contains 0, the data do not provide convincing evidence of a difference in mean happiness between husbands and wives.
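The interval can be double-checked directly from the raw differences with the standard library. As in the hand solution, t* = 2.262 is taken from a t table (95% confidence, df = 9), since the standard library has no inverse t function. Note that `statistics.stdev` divides by n - 1, matching the sd formula in Step 2.

```python
import math
import statistics

diffs = [2, -1, 0, 3, -2, 1, 0, 2, -1, 1]   # husband - wife
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)               # sample SD, divides by n - 1
t_star = 2.262                              # t table: 95% confidence, df = 9
margin = t_star * s_d / math.sqrt(n)
print(f"d_bar = {d_bar}, s_d = {s_d:.2f}, "
      f"CI = ({d_bar - margin:.2f}, {d_bar + margin:.2f})")
# prints d_bar = 0.5, s_d = 1.58, CI = (-0.63, 1.63)
```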
Problem 3 (medium)
❓ Question:
A coach wants to know if a new training program improves 100m sprint times. He records the times of 8 runners before and after the program. The mean difference (before - after) is 0.3 seconds with a standard deviation of 0.4 seconds. Test at α = 0.05 if the program improves times.
💡 Solution:
H₀: μd = 0 (no improvement) Hₐ: μd > 0 (improvement, before > after)
Test statistic: t = (d̄ - 0) / (sd/√n) t = (0.3 - 0) / (0.4/√8) t = 0.3 / 0.141 t ≈ 2.12
df = n - 1 = 7
P-value (one-tailed): P(t > 2.12) ≈ 0.036
Decision: Since p-value (0.036) < α (0.05), reject H₀
Conclusion: There is sufficient evidence at the 5% significance level to conclude that the training program improves 100m sprint times.
Problem 4 (hard)
❓ Question:
A pharmaceutical company tests a new medication on 15 patients with high blood pressure. Each patient's blood pressure is measured before treatment and after 3 months. The differences (before - after) have a mean of 8 mmHg and standard deviation of 6 mmHg. Can we conclude at α = 0.01 that the medication lowers blood pressure?
💡 Solution:
H₀: μd = 0 (no change) Hₐ: μd > 0 (blood pressure decreases)
Test statistic: t = (d̄ - 0) / (sd/√n) t = (8 - 0) / (6/√15) t = 8 / 1.549 t ≈ 5.16
df = 14
P-value (one-tailed): P(t > 5.16) < 0.0001
Decision: Since p-value < 0.01, reject H₀
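Conclusion: There is very strong evidence (p < 0.01) that the medication lowers blood pressure. The large t-statistic (5.16) shows the result is highly statistically significant; whether a mean reduction of 8 mmHg is clinically meaningful is a separate judgment based on the size of the effect in its medical context, not on the t-statistic.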
Problem 5 (hard)
❓ Question:
A nutritionist studies whether eating breakfast affects students' performance on a math test. She has 20 students take a test after skipping breakfast and another test after eating breakfast (order randomized). Why is this a paired design? What are the advantages and potential concerns?
💡 Solution:
Why it's paired: Each student takes both tests (no breakfast and with breakfast), creating natural pairs. We analyze the difference in scores for each student.
Advantages: • Controls for individual differences in math ability • More powerful than independent samples design • Requires fewer subjects (20 vs 40 for independent groups) • Each student serves as their own control
Potential concerns:
- Practice effect: Students might do better on the second test regardless of breakfast. Solution: Randomize which condition comes first.
- Carryover effect: Effects from the first test might influence the second test. Solution: Allow sufficient time between tests.
- Different test difficulty: If the tests aren't equivalent, this confounds the results. Solution: Use equivalent forms or counterbalance test versions.
- Learning between tests: Students might study between tests. Solution: Control the time between tests and avoid giving feedback.