Inference for Regression
Confidence intervals and tests for slope
Beyond Description
So far: Described relationship in sample data
Now: Make inferences about population relationship
- Confidence interval for slope
- Hypothesis test for slope
- Prediction intervals
Conditions for Inference (LINE)
L - Linear relationship: Check scatterplot
I - Independent observations: Random sample; when sampling without replacement, n < 10% of the population size N
N - Normal distribution of residuals: Check histogram/normal plot of residuals
E - Equal variance: Check residual plot (constant spread)
Must check all before inference!
Slope as Parameter
Sample: b = slope from data
Population: β (beta) = true slope in population
Question: Is there really a relationship in the population, or did we just see a pattern in the sample by chance?
Hypothesis Test for Slope
Hypotheses:
- H₀: β = 0 (no linear relationship)
- Hₐ: β ≠ 0 (linear relationship exists)
If β = 0: x has no linear predictive value for y (knowing x does not help predict y with a line)
Test statistic:
t = (b - 0) / SE_b = b / SE_b
df = n - 2
SE_b (standard error of slope): Provided by calculator/computer
Example 1: Test for Slope
Height (x) and weight (y), n = 25:
b = 4, SE_b = 1.2
STATE:
- β = true slope
- H₀: β = 0
- Hₐ: β ≠ 0
- α = 0.05
PLAN:
- t-test for slope
- Conditions: LINE all checked ✓
DO:
t = b / SE_b = 4 / 1.2 ≈ 3.33
df = 25 - 2 = 23
P-value = 2 × P(t > 3.33) ≈ 0.003
CONCLUDE: Because the P-value is less than α = 0.05, reject H₀. There is convincing evidence of a linear relationship between height and weight in the population.
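The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch, not a replacement for calculator or software output: the helper names `t_pdf` and `t_upper_tail` are made up for illustration, and the tail area is approximated with Simpson's rule so that only the Python standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # the t density is symmetric about 0

# Values from Example 1
b, se_b, n = 4.0, 1.2, 25
df = n - 2
t_stat = b / se_b
p_value = 2 * t_upper_tail(t_stat, df)  # two-sided test

print(f"t = {t_stat:.2f}, df = {df}, P-value = {p_value:.4f}")
```

Statistical software reports the same quantities directly; the point here is only to show where t and the two-sided P-value come from.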
Confidence Interval for Slope
Formula:
b ± t* × SE_b
df = n - 2
Interpretation: "We are C% confident the true slope is between [L] and [U]."
Meaning: For each unit increase in x, y changes by between L and U units (on average in population)
Example 2: CI for Slope
Same data: b = 4, SE_b = 1.2, n = 25
95% CI:
df = 23, t* ≈ 2.069
4 ± 2.069(1.2) = 4 ± 2.48 = (1.52, 6.48)
Interpretation: "We are 95% confident that for each additional inch of height, weight increases by between 1.52 and 6.48 pounds on average."
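The interval in Example 2 can be reproduced in a few lines. This sketch takes the critical value t* ≈ 2.069 from the example rather than computing it; the variable names are illustrative only.

```python
# Values from Example 2
b = 4.0          # sample slope
se_b = 1.2       # standard error of the slope
t_star = 2.069   # critical value for 95% confidence, df = 23 (from the example)

margin = t_star * se_b
lower, upper = b - margin, b + margin

print(f"95% CI for slope: ({lower:.2f}, {upper:.2f})")
```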
Relationship Between Test and CI
For two-sided test at α:
Check if (1-α) CI contains 0:
- If 0 in CI → fail to reject H₀
- If 0 not in CI → reject H₀
Example: 95% CI is (1.52, 6.48)
- Doesn't contain 0
- Reject H₀: β = 0 at α = 0.05
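The decision rule above amounts to a single comparison. A minimal sketch, with a hypothetical function name `reject_null_from_ci`, assuming the confidence level of the interval matches the significance level of the two-sided test:

```python
def reject_null_from_ci(lower, upper, null_value=0.0):
    """Two-sided test decision: reject H0 when the null value lies outside the CI."""
    return not (lower <= null_value <= upper)

# 95% CI from the example, used for a test at alpha = 0.05
print(reject_null_from_ci(1.52, 6.48))   # 0 is outside the interval
# A made-up interval that contains 0, for contrast
print(reject_null_from_ci(-0.50, 2.00))
```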
Prediction Interval
Different from confidence interval!
Confidence interval: For mean response
Prediction interval: For individual response
Prediction interval is wider (more uncertainty predicting individual)
Formula (approximate):
ŷ ± t* × s
Where s = standard deviation of residuals
More precise formula accounts for:
- Distance of x from x̄ (farther = wider interval)
- Sample size
Example 3: Prediction Interval
Predict weight for height = 70:
ŷ = 158, s = 10, n = 25
95% prediction interval (rough):
df = 23, t* ≈ 2.069
158 ± 2.069(10) ≈ 158 ± 21 = (137, 179)
Interpretation: "We are 95% confident that an individual who is 70 inches tall will weigh between 137 and 179 pounds."
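The rough interval in Example 3 can be checked numerically. This sketch assumes the approximate formula ŷ ± t* × s with t* ≈ 2.069 for df = 23; it ignores the extra terms the more precise formula adds, so the true interval would be slightly wider.

```python
# Values from Example 3
y_hat = 158.0    # predicted weight at height = 70
s = 10.0         # standard deviation of residuals
t_star = 2.069   # critical value for 95% confidence, df = 25 - 2 = 23

margin = t_star * s
lower, upper = y_hat - margin, y_hat + margin

print(f"Rough 95% prediction interval: ({lower:.0f}, {upper:.0f})")
```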
Standard Error of Slope
Formula:
SE_b = s / (s_x × √(n - 1))
Where s = standard deviation of residuals and s_x = standard deviation of the x-values
Factors making SE_b smaller:
- Smaller s (points closer to line)
- Larger sample size n
- More spread in x-values
Smaller SE_b → narrower CI → more precise estimate
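To see where SE_b comes from, the sketch below computes it from raw data using the usual textbook form SE_b = s / (s_x × √(n - 1)). The data set is made up for illustration, and `slope_standard_error` is a hypothetical helper, not a library function. The second call stretches the x-values to twice their spread while keeping the same residuals, which cuts SE_b in half.

```python
import math

def slope_standard_error(x, y):
    """Return (b, s, SE_b) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # SD of residuals
    s_x = math.sqrt(sxx / (n - 1))     # SD of the x-values
    se_b = s / (s_x * math.sqrt(n - 1))
    return b, s, se_b

# Hypothetical data: heights (inches) and weights (pounds)
heights = [60, 62, 64, 66, 68, 70, 72, 74]
weights = [115, 120, 131, 138, 150, 152, 165, 171]

b, s, se_b = slope_standard_error(heights, weights)

# Same y-values, x-values spread twice as far apart (still hypothetical)
wide_x = [2 * h - 67 for h in heights]
_, _, se_wide = slope_standard_error(wide_x, weights)

print(f"b = {b:.3f}, s = {s:.3f}, SE_b = {se_b:.4f}")
print(f"SE_b with doubled x-spread = {se_wide:.4f}")
```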
Checking Conditions
Linearity:
- Scatterplot roughly linear
- Residual plot shows no curve
Independence:
- Random sample
- No time trends
- Each observation independent
Normality:
- Histogram of residuals roughly normal
- Normal probability plot roughly linear
- Less critical for large n (CLT)
Equal Variance:
- Residual plot shows constant spread
- No fan shape
What if Conditions Not Met?
Nonlinear: Transform variables or use nonlinear methods
Not normal (small n): Be cautious with inference
Not equal variance: Consider transformation or weighted regression
Not independent: Use time series or other methods
Don't ignore violations! Inference may be invalid
Prediction vs Confidence Interval
Confidence Interval for Mean Response:
- "Average y for all individuals with x = x₀"
- Narrower
- Use: Policy decisions, understanding average effect
Prediction Interval for Individual:
- "Single y value for one individual with x = x₀"
- Wider (includes individual variability)
- Use: Predicting specific outcome
Always wider: At the same x₀ and confidence level, the prediction interval is wider than the confidence interval for the mean response
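The difference in width can be made concrete with the standard formulas for the two standard errors at x = x₀. In this sketch, s = 10, n = 25 and t* ≈ 2.069 come from the earlier examples, while the mean height `x_bar = 68` and standard deviation `s_x = 3` are assumed values chosen only for illustration.

```python
import math

s, n, t_star = 10.0, 25, 2.069   # from the earlier examples
x_bar, s_x = 68.0, 3.0           # assumed for illustration, not from the examples
x0 = 70.0                        # height at which we predict

# Shared term: grows as x0 moves away from x_bar
distance_term = (x0 - x_bar) ** 2 / ((n - 1) * s_x ** 2)

# Standard error for the mean response at x0
se_mean = s * math.sqrt(1 / n + distance_term)
# Standard error for an individual response at x0 (extra "1 +" is individual variability)
se_pred = s * math.sqrt(1 + 1 / n + distance_term)

mean_margin = t_star * se_mean
pred_margin = t_star * se_pred

print(f"CI margin for mean response:   ±{mean_margin:.1f}")
print(f"PI margin for individual:      ±{pred_margin:.1f}")
```

The only difference between the two is the extra s² inside the square root, which is why the prediction interval stays wide no matter how large the sample is.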
Multiple Regression Preview
So far: One explanatory variable
Multiple regression: Several explanatory variables
Can test each slope: Does this variable help predict y (controlling for others)?
Beyond AP Stats, but important to know it exists
Common Mistakes
❌ Not checking LINE conditions
❌ Using normal instead of t-distribution
❌ Confusing prediction and confidence intervals
❌ Using df = n instead of n - 2
❌ Making inference when conditions violated
Practical Significance
Statistical significance (P < 0.05) doesn't mean practical importance
Example: Slope = 0.01, P = 0.001
- Statistically significant
- But is 0.01 change per unit practically meaningful?
Consider:
- Effect size (magnitude of slope)
- Context
- Practical implications
Quick Reference
Test for slope: t = b / SE_b, df = n - 2
CI for slope: b ± t* × SE_b, df = n - 2
Conditions: LINE (Linear, Independent, Normal, Equal variance)
Prediction interval: Wider than confidence interval
0 in CI for slope? → No significant linear relationship (fail to reject H₀: β = 0)
Remember: Check LINE conditions before inference! Inference lets us extend conclusions beyond our sample to the broader population, but only if conditions are met.