Inference for Regression

Inference for Regression - Complete Interactive Lesson

Part 1: Regression Model Assumptions

📐 Inference for Linear Regression

Part 1 of 7 — Regression Model Assumptions

The Population Regression Model

$y = \beta_0 + \beta_1 x + \epsilon$

where $\epsilon \sim N(0, \sigma)$ — errors are normally distributed with constant spread.

Symbol	Meaning
$\beta_0$	Population $y$ -intercept
$\beta_1$	Population slope
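To see what the population model claims, here is a minimal simulation sketch: each response is the line value plus a Normal error with mean 0 and standard deviation $\sigma$. The parameter values and the helper name `simulate_responses` are hypothetical, chosen only for illustration.

```python
import random

def simulate_responses(beta0, beta1, sigma, xs, seed=0):
    """Draw one y per x from y = beta0 + beta1*x + eps, with eps ~ N(0, sigma)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]

# Hypothetical population parameters (not from the lesson).
beta0, beta1, sigma = 10.0, 2.0, 3.0
xs = [i % 20 for i in range(10_000)]
ys = simulate_responses(beta0, beta1, sigma, xs)

# Recover the errors and summarize them: the mean should be near 0 and the
# standard deviation near sigma, at every x (constant spread).
errors = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
mean_error = sum(errors) / len(errors)
sd_error = (sum((e - mean_error) ** 2 for e in errors) / (len(errors) - 1)) ** 0.5

print(f"mean of errors = {mean_error:.3f}")
print(f"SD of errors   = {sd_error:.3f}")
```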

Conditions for Inference (LINE)

Condition	Check
Linear	Scatterplot and residual plot show no pattern
Independent	Observations are independent ( $n < 10\%$ of population)
Normal	Residuals are approximately normal (histogram or Q-Q plot)
Equal variance	Residual plot shows constant spread (no fanning)

🔑 The residual plot is the most important diagnostic tool. Look for random scatter around zero.
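The residuals behind that plot can be computed as in the sketch below, which fits a least-squares line to a small made-up data set. The data values and the helper names `fit_line` and `residuals` are assumptions for illustration, not part of the lesson.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b): intercept and slope of the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys, a, b):
    """Observed y minus predicted y for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)

# Least-squares residuals sum to (numerically) zero, so the plot is centered
# on 0; what you inspect is whether any pattern or fanning remains.
print(f"a = {a:.3f}, b = {b:.3f}")
print("residuals:", [round(r, 3) for r in res])
```

Plotting `res` against `xs` gives the residual plot used for the Linear and Equal variance checks.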

Part 2: T-Test for Slope

📊 T-Test for Slope

Part 2 of 7 — Is There a Linear Relationship?

Topics in This Part

Section
🎯 Hypotheses for the Slope
📐 The $t$ -Statistic for $b$
✅ Conditions for Inference
📝 Worked Example

🔑 Key Concept: The $t$ -test for slope tests whether the true population slope $\beta$ is zero (no linear relationship) or nonzero.

Part 3: Confidence Interval for Slope

📊 Confidence Interval for Slope

Part 3 of 7 — Estimating the True Slope

Topics in This Part

Section
📐 CI Formula for $\beta$
📝 Interpreting the CI
🔗 Connection to the $t$ -Test
🧮 Worked Example

🔑 Key Concept: A confidence interval for $\beta$ gives a range of plausible values for the true population slope.

The Formula

Part 4: Computer Output Interpretation

📊 Computer Output Interpretation

Part 4 of 7 — Reading Regression Output Like a Pro

Topics in This Part

Section
📋 Standard Regression Table Layout
🔍 Identifying $b$ , $\text{SE}_b$ , $t$ , and $P$

Part 5: Prediction Intervals

📊 Prediction Intervals

Part 5 of 7 — Predicting Individual Values vs. Mean Responses

Topics in This Part

Section
🎯 Confidence Interval for Mean Response
📐 Prediction Interval for Individual Response
🔍 Why Prediction Intervals Are Wider
⚠️ Limitations of Predictions

🔑 Key Concept: A prediction interval for a single future observation is always wider than a confidence interval for the mean response at the same $x$ -value.

Two Types of Intervals at a Given $x^*$

1. Confidence Interval for Mean Response

Part 6: Problem-Solving Workshop

📊 Problem-Solving Workshop

Part 6 of 7 — Full Inference for Regression Problems

Workshop Goals

Skill
📝 State hypotheses for slope tests
✅ Check LINE conditions
📐 Compute $t$ -statistics from output
📊 Build CIs for $\beta$ from output
🎯 Write AP-quality conclusions

🔑 AP Tip: Inference for regression is one of the most commonly tested topics on the AP exam. Master the 4-step process: hypotheses → conditions → mechanics → conclusion.

Worked Example 1 — Chirps and Temperature

A biology student records cricket chirps per minute ( $x$ ) and outdoor temperature ( $y$ , °F) for 15 observations.

Part 7: Review & Applications

📊 Review & Applications

Part 7 of 7 — Comprehensive Inference for Regression Review

Complete Formula Reference

Concept	Formula
Population model	$y = \alpha + \beta x + \varepsilon$ , $\varepsilon \sim N(0, \sigma)$

The Linear Regression Model

The population model is:

$y = \alpha + \beta x + \varepsilon$

where $\varepsilon \sim N(0, \sigma)$ (errors are independent and Normally distributed with constant variance).

$\beta$ = true population slope
$b$ = sample slope (our estimate of $\beta$ )
$\text{SE}_b$ = standard error of the slope

Hypotheses

$H_0: \beta = 0 \quad \text{(no linear relationship)}$
$H_a: \beta \neq 0 \quad \text{(or } \beta > 0 \text{ or } \beta < 0\text{)}$

⚠️ AP Tip: Most AP problems use the two-sided alternative $\beta \neq 0$ . One-sided tests are less common but do appear.

The Test Statistic

$\boxed{t = \frac{b - 0}{\text{SE}_b} = \frac{b}{\text{SE}_b}}$

with $\text{df} = n - 2$ (two parameters, $\alpha$ and $\beta$ , are estimated by $a$ and $b$ ).

Conditions (LINE)

Letter	Condition	How to Check
L	Linear relationship	Scatterplot and residual plot show no curve
I	Independent observations	Random sample or randomized experiment; $n < 10\%$ of population when sampling without replacement
N	Normal errors	Histogram or Q-Q plot of residuals is approximately symmetric with no strong skew
E	Equal variance	Residual plot shows constant spread (no fan shape)

Worked Example

A researcher studies 20 pine trees. $x$ = diameter (inches), $y$ = height (feet).

Predictor	Coef	SE Coef	T	P
Constant	$24.0$	$5.1$	$4.71$	$< 0.001$
Diameter	$2.35$	$0.42$	$5.60$	$< 0.001$

Step 1 — Hypotheses: $H_0: \beta = 0$ (no linear relationship between diameter and height) $H_a: \beta \neq 0$ (there is a linear relationship)

Step 2 — Conditions:

L: Residual plot shows random scatter ✓
I: Trees randomly selected; $20 < 10\%$ of all pine trees ✓
N: Histogram of residuals approximately Normal ✓
E: No fan shape in residual plot ✓

Step 3 — Test statistic: $t = b / \text{SE}_b = 2.35 / 0.42 = 5.60$ , df $= 20 - 2 = 18$

Step 4 — P-value: $P < 0.001$ (from the table or computer output)

Step 5 — Conclusion: "Since $P < 0.001 < \alpha = 0.05$ , we reject $H_0$ . There is convincing evidence of a linear relationship between tree diameter and tree height."

🔑 AP Tip: Always state "convincing evidence" (not "proof") and reference the context.
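The mechanics of the pine-tree example can be sketched as follows. The function name `slope_t_statistic` is hypothetical, and the critical value $t^* = 2.101$ (95%, df $= 18$ ) is supplied from a $t$ -table rather than computed.

```python
def slope_t_statistic(b, se_b, n):
    """Return (t, df) for testing H0: beta = 0."""
    t = (b - 0) / se_b
    df = n - 2
    return t, df

# Values from the pine-tree computer output.
t, df = slope_t_statistic(b=2.35, se_b=0.42, n=20)

# Two-sided critical value at alpha = 0.05 for df = 18, read from a t-table.
t_star = 2.101
reject_h0 = abs(t) > t_star

print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

Comparing $|t|$ with $t^*$ is equivalent to comparing the two-sided $P$ -value with $\alpha$ , so this agrees with the conclusion reached from $P < 0.001$ .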

Calculating the $t$ -Statistic 🧮

1) $b = 3.6$ , $\text{SE}_b = 1.2$ . What is $t$ ?

2) $n = 30$ data points. What are the degrees of freedom?

3) $b = -0.45$ , $\text{SE}_b = 0.15$ . What is $|t|$ ?

$b \pm t^* \cdot \text{SE}_b$

$b$ = sample slope
$t^*$ = critical value from a $t$ -distribution with df $= n - 2$
$\text{SE}_b$ = standard error of the slope (from computer output)

Interpretation Template

"We are [C]% confident that the true slope of the linear relationship between [x context] and [y context] is between [lower] and [upper]."

Example: 95% CI for $\beta$ : $(1.52, 3.18)$ , $x$ = diameter (in), $y$ = height (ft).

✅ "We are 95% confident that the true increase in height per additional inch of diameter is between 1.52 and 3.18 feet."

Connection to Hypothesis Testing

CI contains 0?	Conclusion at $\alpha = 1 - C$
Yes	Fail to reject $H_0: \beta = 0$
No	Reject $H_0: \beta = 0$

If the confidence interval does not contain 0, there is evidence of a linear relationship.

Conditions

Same LINE conditions as the $t$ -test:

Linear relationship
Independent observations
Normal residuals
Equal variance of residuals

Worked Example

Study: $n = 15$ students. $x$ = hours using phone/day, $y$ = GPA.

Computer output: $b = -0.12$ , $\text{SE}_b = 0.04$

Build a 95% CI: df $= 15 - 2 = 13$ , so $t^* = 2.160$ .

$-0.12 \pm 2.160(0.04) = -0.12 \pm 0.0864$

$(-0.2064, -0.0336)$

Interpretation: "We are 95% confident that the true slope of the relationship between daily phone use and GPA is between $-0.206$ and $-0.034$ . For each additional hour of daily phone use, GPA is predicted to decrease by between 0.034 and 0.206 points."

Connection to test: Since 0 is NOT in the interval, we would reject $H_0: \beta = 0$ at $\alpha = 0.05$ .

⚠️ AP Tip: You can read $b$ and $\text{SE}_b$ directly from computer output. The AP formula sheet provides the CI formula.
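As a quick check of the phone-use example, the interval arithmetic can be sketched in a few lines. The function name `slope_confidence_interval` is hypothetical, and $t^* = 2.160$ is taken from a $t$ -table (95%, df $= 13$ ) rather than computed.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Values from the phone-use example.
lower, upper = slope_confidence_interval(b=-0.12, se_b=0.04, t_star=2.160)

# If 0 is outside the interval, the two-sided test at alpha = 0.05 rejects H0.
contains_zero = lower <= 0 <= upper

print(f"95% CI: ({lower:.4f}, {upper:.4f}); contains 0: {contains_zero}")
```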

Building a CI 🧮

$b = 4.5$ , $\text{SE}_b = 1.5$ , $t^* = 2.101$ (95%, df $= 18$ )

1) Margin of error $= t^* \times \text{SE}_b =$

2) Lower bound of the CI $=$

3) Upper bound of the CI $=$

🔑 Key Concept: The AP exam typically provides computer output for regression inference questions. You must know where to find each number and what it means.

Standard Computer Output Table

Predictor	Coef	SE Coef	T	P
Constant	$a$	$\text{SE}_a$	$t_a$	$P_a$
$x$ -variable	$b$	$\text{SE}_b$	$t_b$	$P_b$

Below the table: $S = s_e \quad R\text{-}sq = r^2 \quad R\text{-}sq(\text{adj}) = r^2_{\text{adj}}$

What Each Value Means

Location	Symbol	Meaning
Coef (Constant row)	$a$	$y$ -intercept of LSRL
Coef ( $x$ row)	$b$	Slope of LSRL
SE Coef ( $x$ row)	$\text{SE}_b$	Standard error of the slope
T ( $x$ row)	$t$	$t$ -statistic $= b / \text{SE}_b$
P ( $x$ row)	$P$	$P$ -value for $H_0: \beta = 0$ (two-sided)
S	$s_e$	Standard deviation of residuals (typical prediction error)
R-sq	$r^2$	Proportion of variability explained

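⚠️ Important: The $P$ -value in the table tests $H_0: \beta = 0$ vs. $H_a: \beta \neq 0$ (two-sided). For a one-sided test, divide by 2, provided the sign of $b$ matches the direction of $H_a$ ; if it does not, the one-sided $P$ -value is $1 - P/2$ .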
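The one-sided conversion can be sketched as a small helper. The function name `one_sided_p` and the example numbers are hypothetical; the rule it encodes is that halving is only valid when the sample slope points in the direction of $H_a$ .

```python
def one_sided_p(two_sided_p, b, alternative):
    """Convert the table's two-sided P-value for the slope to a one-sided one.

    alternative: "greater" for Ha: beta > 0, "less" for Ha: beta < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Does the sign of the sample slope agree with the alternative?
    agrees = (b > 0) if alternative == "greater" else (b < 0)
    half = two_sided_p / 2
    return half if agrees else 1 - half

# Hypothetical output values: positive sample slope, two-sided P = 0.078.
p_greater = one_sided_p(0.078, b=0.035, alternative="greater")
p_less = one_sided_p(0.078, b=0.035, alternative="less")

print(f"Ha: beta > 0 -> P = {p_greater:.3f}")
print(f"Ha: beta < 0 -> P = {p_less:.3f}")
```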

Worked Example — Reading Output

Predictor	Coef	SE Coef	T	P
Constant	$15.8$	$3.2$	$4.94$	$0.000$
StudyHours	$2.45$	$0.38$	$6.45$	$0.000$

$S = 4.12 \quad R\text{-}sq = 76.3\% \quad R\text{-}sq(\text{adj}) = 74.8\%$

From this output:

LSRL: $\hat{y} = 15.8 + 2.45x$
Slope: For each additional study hour, predicted score increases by 2.45 points
$t = 2.45 / 0.38 = 6.45$ ✓ (matches output)
$P < 0.001$ → reject $H_0: \beta = 0$ (strong evidence of a linear relationship)
$R^2 = 76.3\%$ → 76.3% of variability in scores is explained by study hours
$S = 4.12$ → typical prediction error is about 4.12 points
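Reading the table can also be sketched in code. The snippet below assumes the output is available as tab-separated text (an assumption about the format; real software output varies) and uses a hypothetical helper `parse_regression_table` to pull values from the correct row.

```python
OUTPUT = """Predictor\tCoef\tSE Coef\tT\tP
Constant\t15.8\t3.2\t4.94\t0.000
StudyHours\t2.45\t0.38\t6.45\t0.000"""

def parse_regression_table(text):
    """Map each predictor name to a dict of its numeric columns."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")[1:]          # Coef, SE Coef, T, P
    rows = {}
    for line in lines[1:]:
        name, *values = line.split("\t")
        rows[name] = dict(zip(header, map(float, values)))
    return rows

table = parse_regression_table(OUTPUT)
slope_row = table["StudyHours"]                # the x-variable row, not Constant

a = table["Constant"]["Coef"]
b = slope_row["Coef"]
t_check = b / slope_row["SE Coef"]             # should match the T column

print(f"LSRL: y-hat = {a} + {b}x")
print(f"t recomputed = {t_check:.2f}, t reported = {slope_row['T']}")
```

Taking the SE from the slope row rather than the Constant row is the step most often missed when reading output by eye.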

Building a CI from Output

Using the same output with $n = 22$ :

df $= 22 - 2 = 20$ , $t^* = 2.086$ (95%)
CI: $2.45 \pm 2.086(0.38) = 2.45 \pm 0.793 = (1.657, 3.243)$

🔑 AP Tip: Verify: the $P$ -value in the table is for the two-sided test. The CI and test should agree — if 0 is not in the CI, the $P$ -value should be $< \alpha$ .

Common Mistakes

Mistake	Correction
Using the SE Coef from the Constant row for the slope test	Use the SE Coef from the $x$ -variable row
Confusing $S$ with SE Coef	$S$ = residual SD; SE Coef = SD of the slope estimate
Not checking if the $P$ -value is one- or two-sided	Default output is two-sided; halve it for a one-sided test

Extracting Values from Output 🧮

Predictor	Coef	SE Coef	T	P
Constant	$8.2$	$2.1$	$3.90$	$0.001$
Rainfall	$1.75$	$0.25$	$?$	$0.000$

$S = 2.80 \quad R\text{-}sq = 83.0\%$ , $n = 20$

1) What is the $t$ -statistic for the slope?

2) What is the LSRL equation? (Write the slope value only)

3) Degrees of freedom $=$

1. Confidence Interval for Mean Response

Estimates the average $y$ -value for all individuals with $x = x^*$
Variability comes only from estimating the line (uncertainty in $a$ and $b$ )

2. Prediction Interval for Individual Response $y_{\text{new}}$

Predicts a single new $y$ -value when $x = x^*$
Variability comes from estimating the line AND the natural scatter of individuals around the line

Formulas (Conceptual)

Both center on $\hat{y} = a + bx^*$ , but the standard errors differ:

$\text{CI for mean: } \hat{y} \pm t^* \cdot \text{SE}_{\hat{\mu}}$

$\text{PI for individual: } \hat{y} \pm t^* \cdot \text{SE}_{\text{pred}}$

where $\text{SE}_{\text{pred}} > \text{SE}_{\hat{\mu}}$ because:

$\text{SE}_{\text{pred}}^2 = \text{SE}_{\hat{\mu}}^2 + S^2$

The extra $S^2$ accounts for the individual-to-individual scatter.

⚠️ AP Note: The formulas for these SEs are not on the AP formula sheet. You should understand the concept — why prediction intervals are wider — but you will not be asked to compute them by hand on the AP exam.
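To see how much the extra $S^2$ matters, here is a minimal numeric sketch of the relationship $\text{SE}_{\text{pred}}^2 = \text{SE}_{\hat{\mu}}^2 + S^2$ . All input values are hypothetical, and $t^*$ is treated as a table lookup.

```python
import math

def se_prediction(se_mean, s):
    """SE for predicting one new y: sqrt(SE_mean^2 + S^2)."""
    return math.sqrt(se_mean ** 2 + s ** 2)

# Hypothetical values chosen only for illustration.
se_mean = 1.2    # SE of the estimated mean response at x*
s = 4.0          # residual standard deviation S
t_star = 2.086   # 95% critical value for df = 20, read from a t-table

se_pred = se_prediction(se_mean, s)
ci_margin = t_star * se_mean      # margin of error, CI for the mean
pi_margin = t_star * se_pred      # margin of error, prediction interval

print(f"SE_pred = {se_pred:.3f}")
print(f"CI margin = {ci_margin:.2f}, PI margin = {pi_margin:.2f}")
```

Because $\text{SE}_{\text{pred}}$ can never fall below $S$ , the prediction interval stays wide even when the line itself is estimated precisely.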

Visual Intuition

$\underbrace{\text{Narrow band}}_{\text{CI for mean}} \quad \subset \quad \underbrace{\text{Wide band}}_{\text{Prediction interval}}$

Both intervals are narrowest near $\bar{x}$ and widen as $x^*$ moves away from $\bar{x}$ .

Why Wider at Extreme $x$ ?

The further $x^*$ is from $\bar{x}$ :

More uncertainty in where the true line is → wider CI for mean
Same extra scatter for individuals → prediction interval grows similarly
At the extremes of the data, both intervals are widest

This is related to the concept of extrapolation — predicting outside the data range is unreliable because both intervals become very wide.
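The widening can be illustrated numerically. The lesson does not state a formula for $\text{SE}_{\hat{\mu}}$ , so the sketch below assumes the standard simple-regression expression $S\sqrt{1/n + (x^* - \bar{x})^2 / \sum (x_i - \bar{x})^2}$ ; the data and the function name `se_mean_response` are hypothetical.

```python
import math

def se_mean_response(x_star, xs, s):
    """Standard-formula SE of the estimated mean response at x_star."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)

# Hypothetical x-values (mean 8) and residual SD.
xs = [2, 4, 6, 8, 10, 12, 14]
s = 3.0

# Move x* from the mean, to the edge of the data, to beyond the data.
for x_star in (8, 11, 14, 20):
    se_mu = se_mean_response(x_star, xs, s)
    se_pred = math.sqrt(se_mu ** 2 + s ** 2)
    print(f"x* = {x_star:>2}: SE_mean = {se_mu:.3f}, SE_pred = {se_pred:.3f}")
```

Both standard errors are smallest at $x^* = \bar{x}$ and grow with the distance from $\bar{x}$ , which is why the last value (outside the data range) gives the widest intervals.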

Summary Comparison

Feature	CI for Mean Response	Prediction Interval
Estimates	$\mu_{y \mid x^*}$ (average)	$y_{\text{new}}$ (individual)
Width	Narrower	Wider
Extra source of variability	No	Yes ( $S^2$ )
Narrowest at	$\bar{x}$	$\bar{x}$
As $n \to \infty$	Shrinks to 0	Shrinks to $\pm z^* \cdot S$

🔑 Key Insight: Even with infinite data, a prediction interval never shrinks to zero width because individual variability ( $S$ ) always remains.

Conceptual Calculations 🧮

$\hat{y} = 50$ at $x^* = 10$ . The CI for the mean is $(47, 53)$ .

1) What is the margin of error of the CI for the mean?

2) Would the prediction interval at $x^* = 10$ be narrower or wider? (narrower/wider)

3) If $x^* = \bar{x}$ , the intervals are at their ___ width. (narrowest/widest)

Computer output:

Predictor	Coef	SE Coef	T	P
Constant	$25.2$	$10.3$	$2.45$	$0.030$
Chirps	$3.29$	$0.57$	$5.77$	$< 0.001$

$S = 3.83 \quad R\text{-}sq = 71.9\%$

Step 1 — Hypotheses: $H_0: \beta = 0$ (no linear relationship between chirp rate and temperature) $H_a: \beta \neq 0$ (there is a linear relationship)

Step 2 — Conditions (LINE):

L: Scatterplot shows a linear pattern; residual plot shows random scatter ✓
I: Observations taken on different days; 15 < 10% of all possible days ✓
N: Histogram of residuals is approximately Normal ✓
E: Residual plot shows constant spread ✓

Step 3 — Mechanics: $t = b/\text{SE}_b = 3.29/0.57 = 5.77$ , df $= 15 - 2 = 13$ , $P < 0.001$

Step 4 — Conclusion: "Since $P < 0.001 < \alpha = 0.05$ , we reject $H_0$ . There is convincing evidence of a linear relationship between cricket chirps per minute and outdoor temperature."

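95% CI for slope: df $= 13$ , $t^* = 2.160$

$3.29 \pm 2.160(0.57) = 3.29 \pm 1.231 = (2.059, 4.521)$

"We are 95% confident that the slope of the true regression line relating temperature to chirp rate is between 2.06 and 4.52; that is, each additional chirp per minute is associated with an increase in mean temperature of between 2.06°F and 4.52°F."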

Worked Example 2 — Fertilizer and Yield

An agronomist tests 25 plots. $x$ = fertilizer (kg/hectare), $y$ = crop yield (tons/hectare).

Predictor	Coef	SE Coef	T	P
Constant	$2.1$	$0.8$	$2.63$	$0.015$
Fertilizer	$0.035$	$0.019$	$1.84$	$0.078$

$S = 0.45 \quad R\text{-}sq = 12.8\%$

Analysis at $\alpha = 0.05$ :

$t = 0.035/0.019 = 1.84$ , df $= 23$
$P = 0.078 > 0.05$
Conclusion: "We fail to reject $H_0$ . There is not convincing evidence of a linear relationship between fertilizer amount and crop yield."
$R^2 = 12.8\%$ — fertilizer explains very little of the variability in yield

95% CI: $t^* = 2.069$ (df $= 23$ )

$0.035 \pm 2.069(0.019) = 0.035 \pm 0.039 = (-0.004, 0.074)$

The interval contains 0, consistent with failing to reject $H_0$ .
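The agreement between the test and the interval in the fertilizer example can be checked with a short sketch. Since the standard library has no $t$ -distribution, the two-sided $P$ -value is approximated here by integrating the $t$ density with Simpson's rule; the helper names `t_pdf` and `two_sided_p` are hypothetical, and statistical software would normally do this step.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3              # approximates P(0 < T < |t|)
    return 1 - 2 * area

# Values from the fertilizer output; t* = 2.069 is the table value for df = 23.
b, se_b, n = 0.035, 0.019, 25
t = b / se_b
p = two_sided_p(t, df=n - 2)

t_star = 2.069
lower, upper = b - t_star * se_b, b + t_star * se_b

reject = p < 0.05
ci_excludes_zero = not (lower <= 0 <= upper)

print(f"t = {t:.2f}, P = {p:.3f}, reject H0: {reject}")
print(f"95% CI = ({lower:.3f}, {upper:.3f}), excludes 0: {ci_excludes_zero}")
```

The two decisions match: the two-sided test at $\alpha = 0.05$ rejects exactly when the 95% interval excludes 0.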

Common AP Mistakes

Mistake	Fix
Writing $H_0: b = 0$	Use $\beta$ (population slope), not $b$ (sample)
Skipping conditions	Must check all four LINE conditions
"We accept $H_0$ "	Say "fail to reject $H_0$ "
Using Constant SE to test slope	Use the SE Coef from the $x$ -variable row
No context in conclusion	Name the variables — not just "reject $H_0$ "

Practice from Output 🧮

Predictor	Coef	SE Coef	T	P
Constant	$12.0$	$4.0$	$3.00$	$0.007$
Altitude	$-0.006$	$0.002$	$?$	$?$

$n = 32$

1) What is $t$ for the slope?

2) What is df?

3) 95% CI margin of error if $t^* = 2.042$ : $t^* \times \text{SE}_b =$

LINE Conditions Summary

Condition	Check With	Look For
Linear	Scatterplot & residual plot	No curves
Independent	Study design	Random sample; $n < 10\%$ of population
Normal	Histogram/QQ of residuals	Approximate symmetry
Equal variance	Residual plot	Constant spread (no fan)

Interpretation Templates (AP Exam Ready)

$t$ -Test Conclusion (Reject): "Since $P = [\text{value}] < \alpha = [\text{level}]$ , we reject $H_0$ . There is convincing evidence of a linear relationship between [x in context] and [y in context]."

$t$ -Test Conclusion (Fail to Reject): "Since $P = [\text{value}] > \alpha = [\text{level}]$ , we fail to reject $H_0$ . There is not convincing evidence of a linear relationship between [x in context] and [y in context]."

CI for Slope: "We are [C]% confident that the true slope is between [lower] and [upper]. For each additional [unit of x], [y in context] changes by between [lower] and [upper] [units of y]."
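The two test templates can be filled in mechanically, as in this sketch. The function name `slope_test_conclusion` is hypothetical; the wording follows the templates, and the example values come from the fertilizer output plus one made-up rejecting case.

```python
def slope_test_conclusion(p_value, alpha, x_context, y_context):
    """Fill in the reject / fail-to-reject template for a slope t-test."""
    if p_value < alpha:
        return (
            f"Since P = {p_value} < alpha = {alpha}, we reject H0. "
            f"There is convincing evidence of a linear relationship "
            f"between {x_context} and {y_context}."
        )
    # Reached when P >= alpha; the template's ">" wording is used.
    return (
        f"Since P = {p_value} > alpha = {alpha}, we fail to reject H0. "
        f"There is not convincing evidence of a linear relationship "
        f"between {x_context} and {y_context}."
    )

# Fertilizer example (P = 0.078) and a hypothetical rejecting case.
fail_text = slope_test_conclusion(0.078, 0.05, "fertilizer amount", "crop yield")
reject_text = slope_test_conclusion(0.003, 0.05, "study hours", "exam score")

print(fail_text)
print(reject_text)
```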

Key Concept Connections

Topic	Connection
$t$ -test and CI	Both use $b$ , $\text{SE}_b$ , df $= n - 2$ , and LINE conditions
CI contains 0 ↔ test result	CI contains 0 = fail to reject; CI excludes 0 = reject
One-sided vs. two-sided	Two-sided $P$ from output; halve for one-sided (same direction as $b$ )
$r$ , $r^2$ , and $t$	Testing $\beta = 0$ is equivalent to testing $\rho = 0$ ; same $t$ and $P$ -value
Prediction intervals	Wider than CI for mean because of individual scatter
Computer output	All needed values ( $b$ , SE, $t$ , $P$ , $S$ , $R^2$ ) come from the output table

Decision Flowchart

Read computer output → identify $b$ , $\text{SE}_b$ , $t$ , $P$ , $S$ , $R^2$
↓
State $H_0: \beta = 0$ and $H_a$
↓
Check LINE conditions
↓
Report $t = b/\text{SE}_b$ , df $= n - 2$ , $P$ -value
↓
Compare $P$ to $\alpha$ → reject or fail to reject
↓
State conclusion in context

🔑 AP Exam Strategy: Inference for regression appears on the AP exam nearly every year, often as a full free-response question. The 4-step process is your blueprint for full credit.

Mixed Review Calculations 🧮

Predictor	Coef	SE Coef	T	P
Constant	$50.0$	$8.0$	$6.25$	$0.000$
Hours	$-2.5$	$0.5$	$?$	$?$

$n = 27$ , $R\text{-}sq = 48.0\%$

1) $t$ -statistic for the slope $=$

2) df $=$

3) 95% CI lower bound if $t^* = 2.060$ : $b - t^* \cdot \text{SE}_b =$
