Paired Data

Analyzing matched pairs

Paired Data and Matched Pairs

What is Paired Data?

Paired data: Two measurements on same subject or matched subjects

Examples:

  • Before/after measurements on same people
  • Twins (one gets treatment A, other gets treatment B)
  • Matched subjects (similar age, gender, etc.)
  • Same subjects under two conditions

Key: Natural pairing creates dependence

Why Pair?

Reduces variability by controlling for subject-to-subject differences

Example: Blood pressure

  • People naturally have different BP
  • Before/after on same person: eliminates person-to-person variation
  • More sensitive to treatment effect

Pairing is powerful! Can detect smaller effects than two independent samples
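
The variance reduction can be stated with a standard identity. Here X₁ and X₂ denote the two measurements in a pair (symbols introduced only for this illustration), and d = X₁ - X₂:

```latex
\operatorname{Var}(d) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) - 2\,\operatorname{Cov}(X_1, X_2)
```

When the two measurements in a pair are positively correlated, as before/after readings on the same person usually are, the covariance term is positive. Var(d) is then smaller than the sum of the two variances that an independent-samples comparison has to work with.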

Paired vs Two-Sample

Paired:

  • Same subjects (or matched pairs)
  • Analyze differences
  • Use one-sample t-test on differences

Two-sample:

  • Different subjects in each group
  • Independent samples
  • Use two-sample t-test

You MUST identify which design you have before analyzing!
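
To see why the choice matters, here is a minimal sketch that runs both analyses on the same numbers. The six before/after readings are hypothetical (they are not data from this lesson), and the two-sample version uses the unpooled standard error.

```python
import math
import statistics

# Hypothetical readings for six patients (not the data from this lesson).
before = [145, 152, 138, 160, 149, 155]
after = [138, 145, 133, 151, 143, 146]
n = len(before)

# Paired analysis: reduce each pair to one difference, then use its spread.
diffs = [b - a for b, a in zip(before, after)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = statistics.mean(diffs) / se_paired

# Two-sample (unpooled) analysis: treats the two columns as unrelated groups.
se_two_sample = math.sqrt(
    statistics.variance(before) / n + statistics.variance(after) / n
)
t_two_sample = (statistics.mean(before) - statistics.mean(after)) / se_two_sample

print(f"paired:     SE = {se_paired:.2f}, t = {t_paired:.2f}")
print(f"two-sample: SE = {se_two_sample:.2f}, t = {t_two_sample:.2f}")
```

Both analyses see the same mean difference, but with these made-up numbers the paired standard error is much smaller, so the paired t statistic is much larger. The two-sample version is shown only for contrast; it is the wrong tool when the data really are paired.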

Paired t-Test Procedure

1. Calculate differences: d = measurement₁ - measurement₂ for each pair

2. Hypotheses about mean difference:

  • H₀: μ_d = 0 (no mean difference)
  • Hₐ: μ_d ≠ 0 (or μ_d > 0 or μ_d < 0)

3. Use one-sample t-test on differences:

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$$

Where:

  • $\bar{d}$ = mean of differences
  • s_d = standard deviation of differences
  • n = number of pairs (not total observations!)
  • df = n - 1
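
The three steps can be sketched in a few lines of Python using only the standard library. The function name `paired_t` and the sample scores are hypothetical, chosen for illustration.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, SD of differences, t statistic, df)."""
    if len(first) != len(second):
        raise ValueError("each pair needs exactly two measurements")
    # Step 1: one difference per pair, always in the same order.
    diffs = [x1 - x2 for x1, x2 in zip(first, second)]
    n = len(diffs)  # number of pairs, not total observations
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation
    # Step 3: one-sample t statistic for H0: mu_d = 0.
    t = (d_bar - 0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


# Hypothetical scores for five subjects under two conditions.
d_bar, s_d, t, df = paired_t([12, 15, 11, 14, 13], [10, 12, 11, 11, 12])
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}, df = {df}")
```

Step 2, stating the hypotheses, is not a computation; it determines which tail (or tails) of the t distribution the P-value comes from.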

Conditions for Paired t-Test

  • Random: Pairs randomly selected
  • Normal: Differences approximately normal OR n ≥ 30
  • Independent: Pairs independent of each other

Note: Measurements within a pair are dependent (that's the point!), but the pairs themselves must be independent of one another

Example 1: Before/After

Blood pressure before and after medication (10 patients):

| Patient | Before | After | Difference (Before - After) |
|---------|--------|-------|-----------------------------|
| 1 | 145 | 138 | 7 |
| 2 | 152 | 145 | 7 |
| ... | ... | ... | ... |

$\bar{d} = 8.5$, s_d = 4.2, n = 10

STATE:

  • μ_d = true mean reduction in BP
  • H₀: μ_d = 0
  • Hₐ: μ_d > 0
  • α = 0.05

PLAN:

  • Paired t-test
  • Random: Assume ✓
  • Normal: n = 10, check plot of differences (assume ok) ✓
  • Independent: Patients independent ✓

DO:

$$t = \frac{8.5 - 0}{4.2/\sqrt{10}} = \frac{8.5}{1.33} \approx 6.39$$

df = 9

P-value = P(t ≥ 6.39) < 0.001

CONCLUDE: P-value < 0.05, so reject H₀. There is significant evidence that mean blood pressure is lower after the medication.
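
The DO step can be checked from the summary statistics alone. In this small sketch the helper name `t_from_summary` is hypothetical. Because it does not round the standard error to 1.33 first, it gives about 6.40 rather than 6.39. The conclusion is unaffected.

```python
import math


def t_from_summary(d_bar, s_d, n):
    """t statistic for H0: mu_d = 0 from summary statistics of the differences."""
    standard_error = s_d / math.sqrt(n)
    return d_bar / standard_error


# Summary statistics from the blood pressure example.
t = t_from_summary(8.5, 4.2, 10)
print(f"t = {t:.2f}, df = {10 - 1}")
```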

Example 2: Matched Pairs

Twins study - Math scores (twin₁ gets tutoring, twin₂ doesn't):

n = 15 twin pairs
$\bar{d}$ = 5.2 (tutored - control)
s_d = 6.8

Test if tutoring helps:

STATE:

  • μ_d = true mean difference (tutored - control)
  • H₀: μ_d = 0
  • Hₐ: μ_d > 0
  • α = 0.05

DO:

$$t = \frac{5.2 - 0}{6.8/\sqrt{15}} = \frac{5.2}{1.76} \approx 2.95$$

df = 14

P-value ≈ 0.005

CONCLUDE: Reject H₀. Significant evidence tutoring increases scores.
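
In practice the P-value comes from a calculator or statistics package, but it can be approximated by hand as a check. The sketch below integrates the Student's t density numerically with Simpson's rule; the function names `t_pdf` and `upper_tail` are hypothetical. It uses the unrounded standard error, so t comes out near 2.96 rather than 2.95.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric about 0, so the area to the right of 0 is 0.5.
    return 0.5 - area_0_to_t


# Summary statistics from the twins example: d_bar = 5.2, s_d = 6.8, n = 15.
t = 5.2 / (6.8 / math.sqrt(15))
p_value = upper_tail(t, df=14)
print(f"t = {t:.2f}, one-sided P-value = {p_value:.4f}")
```

The result should land close to the 0.005 quoted above, well under α = 0.05.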

Direction of Differences

Consistent subtraction order matters!

Common choices:

  • Before - After (positive means decrease)
  • After - Before (positive means increase)
  • Treatment - Control (positive means treatment better)

Be consistent and interpret accordingly!

Advantages of Pairing

1. Controls for confounding variables

  • Each subject is own control
  • Eliminates between-subject variation

2. Increases power

  • Reduced variability → easier to detect effects
  • Can use smaller sample size

3. More efficient

  • Need fewer total subjects than two independent samples

When NOT to Pair

Don't pair if:

  • No natural pairing exists
  • Pairing is artificial or forced
  • Want to generalize to unpaired populations

Pairing must be meaningful and appropriate!

Paired CI

Confidence interval for mean difference:

$$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}$$

Interpretation: Range of plausible values for true mean difference

Example: Earlier BP study

90% CI: $8.5 \pm 1.833(1.33) = 8.5 \pm 2.44 = (6.06, 10.94)$

We're 90% confident mean BP reduction is between 6.06 and 10.94 points.
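
The interval can be reproduced from the summary statistics. In this sketch the helper name `paired_ci` is hypothetical, and the critical value t* must be supplied from a t table or calculator. Without rounding the standard error to 1.33, the endpoints shift by about 0.01, to roughly (6.07, 10.93).

```python
import math


def paired_ci(d_bar, s_d, n, t_star):
    """Confidence interval for mu_d; t_star is the critical value for df = n - 1."""
    margin = t_star * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# t* = 1.833 is the 90% critical value for df = 9, as used in the example.
low, high = paired_ci(8.5, 4.2, 10, t_star=1.833)
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

Since the whole interval lies above 0, it agrees with the test's conclusion that the mean reduction is positive.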

Checking Normality of Differences

Important: Check normality of DIFFERENCES, not original data

Methods:

  • Dotplot of differences
  • Boxplot of differences
  • Normal probability plot of differences

For small n: Must be close to normal
For large n (≥30): CLT applies to differences
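
A normal probability plot pairs each sorted difference with the standard-normal quantile it would be expected to match. The sketch below computes those coordinates with the standard library. The function name, the ten differences, and the plotting positions (i - 0.5)/n are all assumptions made for illustration; other plotting-position conventions exist.

```python
from statistics import NormalDist


def normal_plot_points(diffs):
    """Pair each sorted difference with its theoretical standard-normal quantile."""
    n = len(diffs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention among several.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(diffs)))


# Hypothetical differences for ten pairs.
points = normal_plot_points([7, 7, 5, 9, 6, 9, 12, 3, 8, 10])
for z, d in points:
    print(f"{z:6.2f}  {d}")
```

If the differences are roughly normal, plotting the second column against the first gives points that fall close to a straight line. Strong curvature or an isolated point far from the line is a warning sign when n is small.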

Common Mistakes

❌ Using two-sample t-test on paired data (loses power!)
❌ Using paired test on independent samples
❌ Counting total observations instead of pairs for df
❌ Not checking normality of differences
❌ Inconsistent subtraction order

Identifying Paired Data

Ask yourself:

  1. Are there two measurements per subject?
  2. Is there natural pairing/matching?
  3. Would it make sense to calculate differences?

If yes → Paired data
If no → Independent samples

Calculator Commands (TI-83/84)

Method 1: Enter differences directly

  • Calculate differences, enter in list
  • STAT → TESTS → 2:T-Test
  • Use difference list

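Method 2: Let the calculator compute the differences

  • Enter both measurements in separate lists (e.g., L1 and L2)
  • On the home screen, store the differences in a third list: L1 - L2 → L3
  • STAT → TESTS → 2:T-Test, with Inpt: Data and List: L3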

Real-World Applications

Medical: Before/after treatment
Education: Pre-test/post-test
Psychology: Same subjects under different conditions
Agriculture: Adjacent plots (control for soil variation)
Marketing: Same consumers rating two products

Quick Reference

Key idea: Analyze differences, not separate groups

Test statistic: $t = \frac{\bar{d}}{s_d/\sqrt{n}}$, df = n - 1

n = number of pairs (not total measurements)

Conditions: Random pairs, differences normal (or n ≥ 30), pairs independent

Remember: Pairing is powerful! Use it when available. Analyze differences with one-sample t-test. Don't use two-sample test on paired data!
