Paired Data and Matched Pairs

What is Paired Data?

Paired data: Two measurements on the same subject, or one measurement on each of two matched subjects

Examples:

Before/after measurements on same people
Twins (one gets treatment A, other gets treatment B)
Matched subjects (similar age, gender, etc.)
Same subjects under two conditions

Key: Pairing makes the two measurements within each pair dependent

Why Pair?

Reduces variability by controlling for subject-to-subject differences

Example: Blood pressure

People naturally have different BP
Before/after on same person: eliminates person-to-person variation
More sensitive to treatment effect

Pairing is powerful! When the two measurements in a pair are positively correlated, it can detect smaller effects than two independent samples of the same size

Paired vs Two-Sample

Paired:

Same subjects (or matched pairs)
Analyze differences
Use one-sample t-test on differences

Two-sample:

Different subjects in each group
Independent samples
Use two-sample t-test

MUST identify which before analyzing!
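To see why the choice matters, the sketch below analyzes one small data set both ways. The readings are made up for illustration (they are not from any study in these notes), and the two-sample statistic uses the unpooled (Welch) standard error as one reasonable choice.

```python
import math
import statistics

# Hypothetical before/after readings for 5 subjects (invented for illustration).
before = [150, 132, 165, 141, 158]
after = [144, 127, 158, 137, 151]
n = len(before)

# Paired analysis: one-sample t statistic on the within-subject differences.
diffs = [b - a for b, a in zip(before, after)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Two-sample analysis: (wrongly) treats the two columns as independent groups,
# so the large subject-to-subject spread stays in the standard error.
se_two = math.sqrt(statistics.variance(before) / n + statistics.variance(after) / n)
t_two = (statistics.mean(before) - statistics.mean(after)) / se_two

print(f"paired t = {t_paired:.2f}, two-sample t = {t_two:.2f}")
```

Both analyses see the same mean change, but only the paired one removes the person-to-person variation, so its t statistic is far larger for these data.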

Paired t-Test Procedure

1. Calculate differences: d = measurement₁ - measurement₂ for each pair

2. Hypotheses about mean difference:

H₀: μ_d = 0 (no mean difference)
Hₐ: μ_d ≠ 0 (or μ_d > 0 or μ_d < 0)

3. Use one-sample t-test on differences:

$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$

Where:

$\bar{d}$ = mean of differences
s_d = standard deviation of differences
n = number of pairs (not total observations!)
df = n - 1
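The three steps above can be sketched in a few lines of Python using only the standard library. The function name `paired_t` and the sample scores are illustrative assumptions, not part of any particular package.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for H0: mu_d = 0, where d = first - second for each pair."""
    if len(first) != len(second):
        raise ValueError("each pair needs exactly two measurements")
    diffs = [x - y for x, y in zip(first, second)]   # step 1: differences
    n = len(diffs)                                   # number of pairs, not observations
    d_bar = statistics.mean(diffs)                   # mean of differences
    s_d = statistics.stdev(diffs)                    # sample SD of differences
    t = (d_bar - 0) / (s_d / math.sqrt(n))           # step 3: one-sample t on differences
    return t, n - 1

# Hypothetical scores for 4 pairs (invented for illustration).
t, df = paired_t([12, 15, 11, 14], [10, 14, 8, 13])
print(f"t = {t:.3f}, df = {df}")
```

Note that df comes from the 4 pairs, not the 8 individual measurements.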

Conditions for Paired t-Test

Random: Pairs randomly selected, or treatments randomly assigned within each pair
Normal: Differences approximately normal OR n ≥ 30
Independent: Pairs independent of each other

Note: Measurements within pair are dependent (that's the point!), but pairs themselves must be independent

Example 1: Before/After

Blood pressure before and after medication (10 patients):

| Patient | Before | After | Difference (Before - After) |
|---------|--------|-------|-----------------------------|
| 1 | 145 | 138 | 7 |
| 2 | 152 | 145 | 7 |
| ... | ... | ... | ... |

$\bar{d} = 8.5$, s_d = 4.2, n = 10

STATE:

μ_d = true mean reduction in BP
H₀: μ_d = 0
Hₐ: μ_d > 0
α = 0.05

PLAN:

Paired t-test
Random: Assume ✓
Normal: n = 10, check plot of differences (assume ok) ✓
Independent: Patients independent ✓

DO:

$t = \frac{8.5 - 0}{4.2/\sqrt{10}} = \frac{8.5}{1.328} \approx 6.40$

df = 9

P-value = P(t ≥ 6.40) < 0.001

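CONCLUDE: P-value < 0.05, so reject H₀. There is convincing evidence that the true mean reduction in blood pressure is greater than 0, i.e., that blood pressure is lower after the medication.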
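The DO step can be checked numerically. The sketch below uses only the Python standard library: it recomputes t from the summary statistics and approximates the one-sided P-value by integrating the t density with Simpson's rule. The helper names `t_pdf` and `upper_tail_p` are illustrative; a statistics package or calculator would normally supply this tail probability directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even; the area from 0 to t is subtracted from 0.5.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the blood pressure example.
d_bar, s_d, n = 8.5, 4.2, 10
t = d_bar / (s_d / math.sqrt(n))
p = upper_tail_p(t, n - 1)
print(f"t = {t:.2f}, df = {n - 1}, one-sided P-value = {p:.5f}")
```

The same two helpers work for any one-sided paired test with a positive t statistic; for a two-sided alternative, the tail probability would be doubled.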

Example 2: Matched Pairs

Twins study - Math scores (twin₁ gets tutoring, twin₂ doesn't):

n = 15 twin pairs
$\bar{d}$ = 5.2 (tutored - control)
s_d = 6.8

Test if tutoring helps:

STATE:

μ_d = true mean difference (tutored - control)
H₀: μ_d = 0
Hₐ: μ_d > 0
α = 0.05

DO:

$t = \frac{5.2 - 0}{6.8/\sqrt{15}} = \frac{5.2}{1.756} \approx 2.96$

df = 14

P-value ≈ 0.005

CONCLUDE: Reject H₀. Significant evidence tutoring increases scores.

Direction of Differences

Consistent subtraction order matters!

Common choices:

Before - After (positive means decrease)
After - Before (positive means increase)
Treatment - Control (positive means treatment better)

Be consistent and interpret accordingly!

Advantages of Pairing

1. Controls for confounding variables

Each subject is own control
Eliminates between-subject variation

2. Increases power

Reduced variability → easier to detect effects
Can use smaller sample size

3. More efficient

Need fewer total subjects than two independent samples

When NOT to Pair

Don't pair if:

No natural pairing exists
Pairing is artificial or forced
Want to generalize to unpaired populations

Pairing must be meaningful and appropriate!

Paired CI

Confidence interval for mean difference:

$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}$

Interpretation: Range of plausible values for true mean difference

Example: Earlier BP study

90% CI (t* = 1.833 for df = 9): $8.5 \pm 1.833(1.328) = 8.5 \pm 2.43 = (6.07, 10.93)$

We're 90% confident that the true mean BP reduction is between 6.07 and 10.93 points.
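The interval arithmetic can be reproduced with a short sketch. The function name `paired_ci` is illustrative, and the critical value t* = 1.833 is taken from the example above as an input rather than computed.

```python
import math

def paired_ci(d_bar, s_d, n, t_star):
    """Confidence interval for the mean difference: d_bar +/- t* * s_d / sqrt(n)."""
    margin = t_star * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Blood pressure example: d_bar = 8.5, s_d = 4.2, n = 10 pairs, 90% confidence.
low, high = paired_ci(8.5, 4.2, 10, t_star=1.833)
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

Small differences in the last digit can appear if the standard error is rounded before it is multiplied by t*.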

Checking Normality of Differences

Important: Check normality of DIFFERENCES, not original data

Methods:

Dotplot of differences
Boxplot of differences
Normal probability plot of differences

For small n: Must be close to normal
For large n (≥30): CLT applies to differences
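A normal probability plot can be built by hand from the differences. The sketch below pairs each sorted difference with a standard normal quantile, using the plotting position (i - 0.5)/n, which is one common convention among several. The data are hypothetical and the function name `normal_plot_points` is illustrative.

```python
import statistics

def normal_plot_points(diffs):
    """Pair each sorted difference with the standard normal quantile at (i - 0.5) / n."""
    n = len(diffs)
    std_normal = statistics.NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), d)
        for i, d in enumerate(sorted(diffs), start=1)
    ]

# Hypothetical differences for 10 pairs (invented for illustration).
diffs = [4, -1, 6, 3, 8, 2, 5, 0, 7, 3]
points = normal_plot_points(diffs)
for z, d in points:
    print(f"{z:6.2f}  {d}")
```

If the printed (quantile, difference) points fall roughly along a straight line when plotted, the Normal condition for the differences is plausible.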

Common Mistakes

❌ Using two-sample t-test on paired data (ignores the dependence within pairs and usually loses power!)
❌ Using paired test on independent samples
❌ Counting total observations instead of pairs for df
❌ Not checking normality of differences
❌ Inconsistent subtraction order

Identifying Paired Data

Ask yourself:

Are there two measurements per subject?
Is there natural pairing/matching?
Would it make sense to calculate differences?

If yes → Paired data
If no → Independent samples

Calculator Commands (TI-83/84)

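Method 1: Enter differences directly

Calculate differences by hand, enter in one list (e.g., L1)
STAT → TESTS → 2:T-Test
Inpt: Data, μ₀ = 0, List = difference list

Method 2: Let the calculator compute the differences

Enter both measurements in separate lists (e.g., L1 and L2)
On the home screen, store L1 - L2 in a third list (L1 - L2 → L3)
STAT → TESTS → 2:T-Test with Inpt: Data, μ₀ = 0, List = L3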

Real-World Applications

Medical: Before/after treatment
Education: Pre-test/post-test
Psychology: Same subjects under different conditions
Agriculture: Adjacent plots (control for soil variation)
Marketing: Same consumers rating two products

Quick Reference

Key idea: Analyze differences, not separate groups

Test statistic: $t = \frac{\bar{d}}{s_d/\sqrt{n}}$, df = n - 1

n = number of pairs (not total measurements)

Conditions: Random pairs, differences normal (or n ≥ 30), pairs independent

Remember: Pairing is powerful! Use it when available. Analyze differences with one-sample t-test. Don't use two-sample test on paired data!
