Inference for Regression

Confidence intervals and tests for slope

Beyond Description

So far: Described relationship in sample data

Now: Make inferences about population relationship

  • Confidence interval for slope
  • Hypothesis test for slope
  • Prediction intervals

Conditions for Inference (LINE)

L - Linear relationship: Check scatterplot

I - Independent observations: Random sample; when sampling without replacement, n < 10% of the population size N

N - Normal distribution of residuals: Check histogram/normal plot of residuals

E - Equal variance: Check residual plot (constant spread)

Must check all before inference!

Slope as Parameter

Sample: b = slope from data

Population: β (beta) = true slope in population

Question: Is there really a relationship in the population, or did we just see a pattern by chance?

Hypothesis Test for Slope

Hypotheses:

  • H₀: β = 0 (no linear relationship)
  • Hₐ: β ≠ 0 (linear relationship exists)

If β = 0: knowing x gives no linear information for predicting y

Test statistic:

$$t = \frac{b - 0}{SE_b}$$

df = n - 2

SE_b (standard error of slope): Provided by calculator/computer

Example 1: Test for Slope

Height (x) and weight (y), n = 25:

b = 4, SE_b = 1.2

STATE:

  • β = true slope
  • H₀: β = 0
  • Hₐ: β ≠ 0
  • α = 0.05

PLAN:

  • t-test for slope
  • Conditions: LINE all checked ✓

DO:

$$t = \frac{4 - 0}{1.2} \approx 3.33$$

df = 25 - 2 = 23

P-value = 2 × P(t > 3.33) ≈ 0.003

CONCLUDE: P-value ≈ 0.003 < α = 0.05, so reject H₀. There is convincing evidence of a linear relationship between height and weight in the population.
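
Sketch in Python: the DO step above can be reproduced with the standard library alone. This is a minimal sketch under stated assumptions, not how a calculator computes it. The helper names `t_pdf` and `t_upper_tail` are invented for illustration, and the tail area is approximated numerically with Simpson's rule.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even. Half of the distribution lies above 0, so the
    upper tail is 0.5 minus the area between 0 and t.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Values from Example 1
b, se_b, n = 4, 1.2, 25

t_stat = (b - 0) / se_b                      # test statistic under H0: beta = 0
df = n - 2                                   # degrees of freedom for slope inference
p_value = 2 * t_upper_tail(abs(t_stat), df)  # two-sided P-value

print(f"t = {t_stat:.2f}, df = {df}, P-value = {p_value:.4f}")
```

The printed P-value should agree with the ≈ 0.003 found above. If SciPy is installed, `2 * scipy.stats.t.sf(abs(t_stat), df)` is the usual shortcut for the same two-sided tail area.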

Confidence Interval for Slope

Formula:

$$b \pm t^* SE_b$$

df = n - 2

Interpretation: "We are C% confident the true slope is between [L] and [U]."

Meaning: For each unit increase in x, y changes by between L and U units (on average in population)

Example 2: CI for Slope

Same data: b = 4, SE_b = 1.2, n = 25

95% CI:

df = 23, t* ≈ 2.069

$$CI = 4 \pm 2.069(1.2) = 4 \pm 2.48 = (1.52, 6.48)$$

Interpretation: "We are 95% confident that for each additional inch of height, weight increases by between 1.52 and 6.48 pounds on average."
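
Sketch in Python: the interval arithmetic above, written as a small function. The name `slope_ci` is invented for illustration, and the critical value t* = 2.069 is taken from the example (a table lookup for df = 23) rather than computed.

```python
def slope_ci(b, se_b, t_star):
    """Confidence interval for the slope: b +/- t* x SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Values from Example 2; t_star is the table value for df = 23, 95% confidence
lower, upper = slope_ci(b=4, se_b=1.2, t_star=2.069)
print(f"95% CI for slope: ({lower:.2f}, {upper:.2f})")  # 95% CI for slope: (1.52, 6.48)
```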

Relationship Between Test and CI

For two-sided test at α:

Check if (1-α) CI contains 0:

  • If 0 in CI → fail to reject H₀
  • If 0 not in CI → reject H₀

Example: 95% CI is (1.52, 6.48)

  • Doesn't contain 0
  • Reject H₀: β = 0 at α = 0.05
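
Sketch in Python: the decision rule above as a one-line check. The function name `rejects_zero_slope` is invented for illustration. It assumes a two-sided test at α paired with a (1 - α) confidence interval.

```python
def rejects_zero_slope(ci):
    """Return True if the interval excludes 0, i.e. reject H0: beta = 0."""
    lower, upper = ci
    return not (lower <= 0 <= upper)

print(rejects_zero_slope((1.52, 6.48)))   # True: 0 is outside the interval, reject H0
print(rejects_zero_slope((-0.50, 2.00)))  # False: 0 is inside (made-up interval), fail to reject H0
```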

Prediction Interval

Different from confidence interval!

Confidence interval: For mean response
Prediction interval: For individual response

Prediction interval is wider (more uncertainty predicting individual)

Formula (approximate):

$$\hat{y} \pm t^* s$$

Where s = standard deviation of residuals

More precise formula accounts for:

  • Distance of x from $\bar{x}$ (farther = wider interval)
  • Sample size
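
For reference, the standard textbook form of the more precise prediction interval in simple linear regression is shown below. The symbol $x_0$ (the x-value at which the prediction is made) is notation introduced here, not taken from the notes above.

```latex
\hat{y}_0 \pm t^* \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x - \bar{x})^2}}
```

The $\frac{1}{n}$ term shrinks as the sample grows, and the last term grows as $x_0$ moves away from $\bar{x}$. When both are small, the square root is close to 1 and the approximate formula is recovered.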

Example 3: Prediction Interval

Predict weight for height = 70:

$\hat{y}$ = 158, s = 10, n = 25

95% prediction interval (rough):

$$158 \pm 2.069(10) = 158 \pm 20.69 = (137.31, 178.69)$$

Interpretation: "We are 95% confident that an individual who is 70 inches tall will weigh between about 137 and 179 pounds."
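
Sketch in Python: the rough interval above as a function. The name `rough_prediction_interval` is invented for illustration. It implements only the approximate formula (predicted value ± t* × s), so it ignores sample size and the distance of x from the mean.

```python
def rough_prediction_interval(y_hat, s, t_star):
    """Approximate prediction interval: y_hat +/- t* x s."""
    margin = t_star * s
    return y_hat - margin, y_hat + margin

# Values from Example 3; t_star is the table value for df = 23, 95% confidence
lower, upper = rough_prediction_interval(y_hat=158, s=10, t_star=2.069)
print(f"Rough 95% prediction interval: ({lower:.2f}, {upper:.2f})")
```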

Standard Error of Slope

Formula:

$$SE_b = \frac{s}{\sqrt{\sum(x - \bar{x})^2}}$$

Where s = standard deviation of residuals

Factors making SE_b smaller:

  1. Smaller s (points closer to line)
  2. Larger sample size n
  3. More spread in x-values

Smaller SE_b → narrower CI → more precise estimate
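
Sketch in Python: computing SE_b from raw data with the formula above. The five data points are made up purely for illustration, and the function name `slope_standard_error` is invented. The residual standard deviation s uses n - 2 in the denominator, matching df = n - 2.

```python
import math

def slope_standard_error(xs, ys):
    """Return (b, s, SE_b) for the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                      # spread in x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                                                # sample slope
    a = y_bar - b * x_bar                                        # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))    # sum of squared residuals
    s = math.sqrt(sse / (n - 2))                                 # std. deviation of residuals
    return b, s, s / math.sqrt(sxx)

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b, s, se_b = slope_standard_error(xs, ys)
print(f"b = {b:.3f}, s = {s:.3f}, SE_b = {se_b:.4f}")
```

The denominator grows when the x-values are more spread out or when more points are added, which is why factors 2 and 3 both shrink SE_b.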

Checking Conditions

Linearity:

  • Scatterplot roughly linear
  • Residual plot shows no curve

Independence:

  • Random sample
  • No time trends
  • Each observation independent

Normality:

  • Histogram of residuals roughly normal
  • Normal probability plot roughly linear
  • Less critical for large n (CLT)

Equal Variance:

  • Residual plot shows constant spread
  • No fan shape
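
Sketch in Python: computing the residuals that the linearity, normality, and equal-variance checks all rely on. The data are made up for illustration, and the function name `residuals` is invented. Comparing residual spread in the lower and upper halves of x is only an informal numeric companion to the residual plot, not a formal test.

```python
import statistics

def residuals(xs, ys):
    """Residuals (observed y minus predicted y) from the least-squares line."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, already sorted by x
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 4.1, 5.9, 8.2, 9.7, 12.4, 13.5, 16.6]

res = residuals(xs, ys)
half = len(res) // 2
low_spread = statistics.stdev(res[:half])    # spread of residuals for smaller x
high_spread = statistics.stdev(res[half:])   # spread of residuals for larger x

print("Residuals:", [round(r, 2) for r in res])
print(f"Spread (low x) = {low_spread:.2f}, spread (high x) = {high_spread:.2f}")
```

A much larger spread in one half than the other hints at a fan shape. The residual plot itself remains the primary check.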

What if Conditions Not Met?

Nonlinear: Transform variables or use nonlinear methods

Not normal (small n): Be cautious with inference

Not equal variance: Consider transformation or weighted regression

Not independent: Use time series or other methods

Don't ignore violations! Inference may be invalid

Prediction vs Confidence Interval

Confidence Interval for Mean Response:

  • "Average y for all individuals with x = x₀"
  • Narrower
  • Use: Policy decisions, understanding average effect

Prediction Interval for Individual:

  • "Single y value for one individual with x = x₀"
  • Wider (includes individual variability)
  • Use: Predicting specific outcome

Always wider: At the same x₀ and confidence level, the prediction interval is wider than the confidence interval for the mean response
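
Sketch in Python: comparing the two margins of error at the same x₀, using the standard simple-regression formulas. The predicted value 158, s = 10, n = 25, and t* = 2.069 come from the examples in these notes. The values for the mean of x (68) and the sum of squared deviations of x (200) are hypothetical, chosen only so the formulas can be evaluated.

```python
import math

def mean_response_margin(s, n, x0, x_bar, sxx, t_star):
    """Margin of error of the confidence interval for the mean response at x0."""
    return t_star * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

def prediction_margin(s, n, x0, x_bar, sxx, t_star):
    """Margin of error of the prediction interval for one individual at x0."""
    return t_star * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

y_hat, s, n, t_star = 158, 10, 25, 2.069   # from the examples in these notes
x0 = 70
x_bar, sxx = 68, 200                        # hypothetical values, for illustration only

ci_margin = mean_response_margin(s, n, x0, x_bar, sxx, t_star)
pi_margin = prediction_margin(s, n, x0, x_bar, sxx, t_star)

print(f"CI for mean response: {y_hat} ± {ci_margin:.2f}")
print(f"Prediction interval:  {y_hat} ± {pi_margin:.2f}")
```

The extra 1 under the square root is the individual variability around the line, so the prediction margin exceeds the mean-response margin for any data set.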

Multiple Regression Preview

So far: One explanatory variable

Multiple regression: Several explanatory variables

$$\hat{y} = a + b_1x_1 + b_2x_2 + \dots + b_kx_k$$

Can test each slope: Does this variable help predict y (controlling for others)?

Beyond AP Stats but important to know exists

Common Mistakes

❌ Not checking LINE conditions
❌ Using normal instead of t-distribution
❌ Confusing prediction and confidence intervals
❌ Using df = n instead of n - 2
❌ Making inference when conditions violated

Practical Significance

Statistical significance (P < 0.05) doesn't mean practical importance

Example: Slope = 0.01, P = 0.001

  • Statistically significant
  • But is 0.01 change per unit practically meaningful?

Consider:

  • Effect size (magnitude of slope)
  • Context
  • Practical implications

Quick Reference

Test for slope: $t = \frac{b}{SE_b}$, df = n - 2

CI for slope: $b \pm t^* SE_b$

Conditions: LINE (Linear, Independent, Normal, Equal variance)

Prediction interval: Wider than confidence interval

0 in CI for slope? → Fail to reject H₀: no convincing evidence of a linear relationship

Remember: Check LINE conditions before inference! Inference lets us extend conclusions beyond our sample to the broader population, but only if conditions are met.
