Perform inference for the slope of a regression line using t-tests and confidence intervals.
Null hypothesis: H₀: β₁ = 0 (no linear relationship)
Test statistic: t = (b₁ - 0) / SE(b₁), with df = n - 2
A regression of test scores (y) on study hours (x) gives slope b₁ = 5.2 with SE = 1.3, n = 20. Construct a 95% confidence interval for the true slope β₁.
Step 1: Identify given information
Slope: b₁ = 5.2
Standard error: SE = 1.3
Sample size: n = 20
Confidence level: 95%
Step 2: Find degrees of freedom
df = n - 2 = 20 - 2 = 18
(Use n - 2 for regression, not n - 1)
Step 3: Find t* critical value
From t-table with df = 18, 95% confidence: t* = 2.101
Step 4: Calculate margin of error
ME = t* × SE = 2.101 × 1.3 ≈ 2.73
Step 5: Construct confidence interval
CI = b₁ ± ME = 5.2 ± 2.73 = (2.47, 7.93)
Step 6: Interpret "We are 95% confident that for each additional hour studied, the true mean increase in test score is between 2.47 and 7.93 points."
Note: Since 0 is NOT in the interval, there is significant evidence of a positive relationship (can reject H₀: β₁ = 0).
Answer: 95% CI: (2.47, 7.93) points per hour
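The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration: the function name `slope_confidence_interval` is made up for this example, and t* = 2.101 is typed in from the t-table rather than computed.

```python
def slope_confidence_interval(b1, se, t_star):
    """Return (lower, upper) for the interval b1 ± t* × SE."""
    margin = t_star * se
    return b1 - margin, b1 + margin

# Values from the worked example; t* is read from a t-table with df = 18.
n = 20
df = n - 2
lower, upper = slope_confidence_interval(b1=5.2, se=1.3, t_star=2.101)
print(f"df = {df}, 95% CI: ({lower:.2f}, {upper:.2f})")
# prints: df = 18, 95% CI: (2.47, 7.93)
```

Changing `t_star` to the table value for another confidence level gives the corresponding interval.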
Avoid these 3 frequent errors
Where:
Interpretation:
Never skip: Always state all five conditions and explain how you checked each.
Data: Study hours vs. Exam score (n = 20 students)
Regression:
, so
; (two-tailed, )
Since the p-value is less than α, reject H₀.
Conclusion: There is significant evidence of a linear relationship between hours studied and exam score.
Example:
Interpretation: We are 95% confident the true slope is between 1.89 and 6.51 points per hour.
❌ Not checking LINER conditions (major point deduction)
❌ Using without stating it
❌ Confusing test for slope with correlation significance (similar but different)
❌ Forgetting df = n - 2 (not n - 1)
State all five conditions and HOW you checked each (e.g., "Residual plot shows random scatter, supporting linearity"). Name the test: "t-test for the slope." Report the test statistic, degrees of freedom, and p-value (or critical value). Always conclude in context.
Test H₀: β₁ = 0 vs Hₐ: β₁ ≠ 0 given b₁ = 3.5, SE = 1.2, n = 25, α = 0.05.
Step 1: Set up hypotheses H₀: β₁ = 0 (no relationship) Hₐ: β₁ ≠ 0 (relationship exists)
Two-tailed test, α = 0.05
Step 2: Check conditions
LINEAR: Assume scatterplot is linear ✓
INDEPENDENT: Assume random sample, n < 10% of population ✓
NORMAL: Residuals approximately normal ✓
EQUAL VARIANCE: Residual plot shows constant spread ✓
RANDOM: Random sample ✓
(LINE conditions, plus Random, for regression inference)
Step 3: Calculate test statistic
df = n - 2 = 25 - 2 = 23
t = (b₁ - 0)/SE = 3.5/1.2 ≈ 2.917
Step 4: Find p-value From t-table with df = 23, two-tailed: t = 2.917 is between t = 2.807 (p = 0.01) and t = 3.767 (p = 0.001)
So: 0.001 < p-value < 0.01
More precisely: p-value ≈ 0.0077
Step 5: Make decision p-value (0.0077) < α (0.05) REJECT H₀
Step 6: Conclusion in context "There is significant evidence (p = 0.008) that a linear relationship exists between x and y. The slope is significantly different from zero."
Answer: t = 2.92, p-value ≈ 0.008. Reject H₀. Significant evidence of linear relationship.
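For readers without a t-table at hand, the test statistic and two-sided p-value from this example can be approximated with the Python standard library alone. The sketch below integrates the t density numerically with Simpson's rule; the helper names `t_pdf` and `two_sided_p_value` are invented for this illustration, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), via Simpson's rule on the density from 0 to |t|."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and |t|
    return 1 - central_area

# Values from the worked example
b1, se, n = 3.5, 1.2, 25
df = n - 2
t_stat = (b1 - 0) / se
p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
```

The printed p-value should fall between 0.001 and 0.01, in line with the table bounds in Step 4.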
What are the conditions (LINE) for inference in regression? Explain each briefly.
The LINE conditions for regression inference:
L - LINEAR
Relationship between x and y is linear
Check: Scatterplot should show linear pattern
Residual plot should show no curve
I - INDEPENDENT
Observations are independent
Check: Random sampling
n < 10% of population (if sampling without replacement)
No time series or repeated measures
N - NORMAL
Residuals are approximately normally distributed
Check: Histogram or normal probability plot of residuals
Not critical if n is large (n ≥ 30)
Just need no strong skewness or outliers
E - EQUAL VARIANCE (also called homoscedasticity)
Variability of y is constant for all x
Check: Residual plot shows roughly equal vertical spread
No fan shape or other pattern in spread
Why these matter:
If violations:
Answer: LINE = Linear relationship, Independent observations, Normal residuals, Equal variance. Check using scatterplot, residual plot, and normal probability plot.
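Most of these checks are made on the residuals, so it may help to see how residuals come out of a fitted line. The sketch below uses a tiny made-up dataset (not from any example on this page) and a hypothetical helper `fit_line`. It also computes the standard error of the slope with the usual textbook formula SE(b₁) = s / √Σ(x - x̄)², where s = √(Σ residual² / (n - 2)); that formula is not stated on this page and is included as an assumption of the sketch.

```python
import math

def fit_line(x, y):
    """Least-squares fit; returns intercept, slope, residuals, SE of slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation divides by n - 2 (two estimated parameters)
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    se_b1 = s / math.sqrt(sxx)
    return b0, b1, residuals, se_b1

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, residuals, se_b1 = fit_line(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SE(b1) = {se_b1:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
```

Plotting `residuals` against `x` gives the residual plot used for the Linear and Equal variance checks; a histogram or normal probability plot of `residuals` is used for the Normal check.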
Computer output shows: b₁ = 2.4, SE(b₁) = 0.8, t = 3.0, p = 0.006, n = 22. Interpret the p-value in context.
Step 1: Identify the test Testing: H₀: β₁ = 0 (no relationship) Against: Hₐ: β₁ ≠ 0 (relationship exists)
Given: p-value = 0.006
Step 2: What p-value means statistically The probability of observing a slope as extreme as 2.4 (or more extreme) IF the true slope is actually 0.
Step 3: Interpret in context "If there were truly no linear relationship between x and y (β₁ = 0), the probability of obtaining a sample slope of 2.4 or more extreme (in either direction) is 0.006, or 0.6%."
Step 4: Practical interpretation This is very unlikely (less than 1% chance)!
Therefore: Strong evidence AGAINST H₀ The relationship is statistically significant.
Step 5: Decision at α = 0.05 Since p-value (0.006) < α (0.05): REJECT H₀
Conclusion: "There is strong evidence of a linear relationship between x and y. The slope is significantly different from zero (p = 0.006)."
Step 6: What this does NOT mean
✗ Does not mean slope is definitely 2.4
✗ Does not mean x causes y
✗ Does not mean model fits well (could still have problems)
✓ Only means: slope significantly different from zero
Answer: If true slope were 0, probability of getting b₁ = 2.4 or more extreme is only 0.006. This provides strong evidence the slope is not zero - there is a significant linear relationship.
Why do we use t-distribution with df = n-2 for regression inference instead of df = n-1?
Step 1: Compare to one-sample t-test One-sample t-test: df = n - 1
Regression: df = n - 2
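Step 2: What we're estimating
In regression, we estimate two parameters: the intercept β₀ and the slope β₁.
Both use up degrees of freedom!
Step 3: Degrees of freedom explained
Start with n observations, then subtract 1 for estimating β₀ and 1 for estimating β₁:
df = n - 2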
Step 4: Why it matters
Smaller df → larger t* critical values → wider CIs
Example: n = 10, 95% confidence
One-sample t (df = 9): t* = 2.262
Regression (df = 8): t* = 2.306
The regression CI is slightly wider (more uncertainty).
Step 5: As n increases
For large n, the difference is minimal: at n = 100, t* ≈ 1.984 for both df = 99 and df = 98.
Step 6: General pattern Degrees of freedom = n - (number of parameters estimated)
Answer: We estimate TWO parameters (β₀ and β₁), so we lose 2 degrees of freedom, giving df = n - 2. This accounts for the extra uncertainty from estimating both intercept and slope.
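To see how much the choice between n - 1 and n - 2 matters at different sample sizes, the 95% critical values can be computed side by side. This is a rough standard-library sketch, not a production routine: `t_pdf`, `central_area`, and `t_critical` are names made up for this example, the density is integrated with Simpson's rule, and t* is found by bisection on the assumption that it lies between 0 and 20 (true for the cases shown).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def central_area(t, df, steps=1000):
    """Area under the t density between -t and t (Simpson's rule)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (h / 3) * total

def t_critical(confidence, df):
    """Bisection for t* such that the central area equals `confidence`."""
    lo, hi = 0.0, 20.0  # assumed bracket; fine for df >= 3 at 95%
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (10, 30, 100):
    one_sample = t_critical(0.95, n - 1)
    regression = t_critical(0.95, n - 2)
    print(f"n = {n}: df = n-1 gives t* = {one_sample:.3f}, "
          f"df = n-2 gives t* = {regression:.3f}")
```

The gap between the two critical values shrinks as n grows, which is the pattern described in Step 5.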