Hypothesis Testing Framework

Part 1: Null & Alternative Hypotheses

📐 Null & Alternative Hypotheses

Part 1 of 7 — Setting Up the Test

What Is Hypothesis Testing?

Hypothesis testing is a formal procedure for using sample data to decide between two competing claims about a population parameter.

Component	Symbol	Description
Null hypothesis	$H_0$	No effect / no difference — the status quo
Alternative hypothesis	$H_a$	There IS an effect / difference — the research claim

🔑 Key Idea: We assume $H_0$ is true and look for evidence against it. We NEVER prove $H_0$ true — we either reject it or fail to reject it.

Writing Hypotheses

Hypotheses are always about population parameters ( $\mu$ , $p$ ), never about sample statistics ( $\bar{x}$ , $\hat{p}$ ).

For means:

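Type	$H_0$	$H_a$	When to Use
Two-tailed	$H_0: \mu = \mu_0$	$H_a: \mu \neq \mu_0$	The claim says "different from"
Right-tailed	$H_0: \mu = \mu_0$	$H_a: \mu > \mu_0$	The claim says "greater than"
Left-tailed	$H_0: \mu = \mu_0$	$H_a: \mu < \mu_0$	The claim says "less than"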

For proportions:

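Type	$H_0$	$H_a$
Two-tailed	$H_0: p = p_0$	$H_a: p \neq p_0$
Right-tailed	$H_0: p = p_0$	$H_a: p > p_0$
Left-tailed	$H_0: p = p_0$	$H_a: p < p_0$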

⚠️ Important: $H_0$ always contains the equals sign ($=$). The alternative contains $\neq$, $>$, or $<$.

Worked Example

Claim: "Students at this school score higher than the national average of 75."

Parameter: $\mu$ = true mean score of students at this school
$H_0: \mu = 75$ (no difference from national average)
$H_a: \mu > 75$ (school average is higher)

This is a right-tailed test because the claim is "higher than."

Significance Level ( $\alpha$ )

Before testing, we choose a significance level $\alpha$ (usually 0.05):

$\alpha$	Meaning
0.05	Reject $H_0$ if results at least as extreme as the observed data would occur less than 5% of the time when $H_0$ is true

🔑 AP Tip: Unless told otherwise, assume $\alpha = 0.05$ on the AP exam.

Hypothesis Identification 🧮

For each claim, identify the null value ( $\mu_0$ ):

1) Claim: $\mu > 75$ . $H_0: \mu =$ ?

Part 2: Test Statistics

📊 Test Statistics

Part 2 of 7 — Measuring the Evidence

What Is a Test Statistic?

A test statistic measures how far the sample result falls from the null hypothesis value, expressed in standard-error units.

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

Part 3: P-Values

🔢 P-Values

Part 3 of 7 — How Surprising Is the Evidence?

What Is a P-Value?

The P-value is the probability of obtaining a test statistic as extreme as (or more extreme than) the one observed, assuming $H_0$ is true.

🔑 Plain English: "If nothing special is happening ( $H_0$ is true), how likely is it that we'd see data this extreme just by chance?"
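A quick simulation makes this plain-English reading concrete. The sketch below assumes the SAT setting from the worked example later in this lesson ($\mu_0 = 500$, $n = 36$) plus a normal population with $\sigma = 60$, an assumption added only for illustration. It draws many samples from a world where $H_0$ is true and counts how often the $t$-statistic is at least as large as an observed $t = 2.0$. The function name `simulated_p_value` is hypothetical.

```python
import math
import random

def simulated_p_value(t_obs: float, mu_0: float = 500.0, sigma: float = 60.0,
                      n: int = 36, reps: int = 20_000, seed: int = 42) -> float:
    """Estimate the right-tailed P-value P(T >= t_obs) by simulating under H0.

    Assumes (for illustration only) a normal population with mean mu_0 and
    standard deviation sigma, so H0 is true in every simulated sample.
    """
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(reps):
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t = (x_bar - mu_0) / (s / math.sqrt(n))
        if t >= t_obs:  # "as extreme as, or more extreme than" for a right-tailed test
            as_extreme += 1
    return as_extreme / reps

print(simulated_p_value(2.0))
```

Runs like this typically land near 0.027, the right-tailed P-value the lesson computes with tcdf. The estimate shifts slightly from seed to seed because it is based on random samples.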

Part 4: Type I & Type II Errors

📈 Type I & Type II Errors

Part 4 of 7 — Making the Wrong Decision

Decision Table

Every hypothesis test has four possible outcomes:

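	$H_0$ is actually true	$H_0$ is actually false
Reject $H_0$	❌ Type I Error ($\alpha$)	✅ Correct Decision (Power)
Fail to reject $H_0$	✅ Correct Decision	❌ Type II Error ($\beta$)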

Part 5: One-Sample t-Test

🧮 One-Sample t-Test

Part 5 of 7 — The Complete Procedure

When to Use a One-Sample t-Test

Use a one-sample $t$ -test when:

You have one quantitative variable
You want to test a claim about the population mean $\mu$
The population standard deviation $\sigma$ is unknown (use $s$ instead)

Conditions (CHECK EVERY TIME)

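Condition	What to Check	How to Verify
Random	Data from a random sample or randomized experiment	Stated in problem
Independence	$n < 10\%$ of the population (10% condition)	$N \geq 10n$
Normal/Large Sample	Population is approximately normal OR $n \geq 30$	Check a dotplot/histogram for skew; if $n \geq 30$, the CLT applies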

Part 6: Problem-Solving Workshop

🛠️ Problem-Solving Workshop

Part 6 of 7 — Putting It All Together

Worked Example 1: Cereal Box Weights

A cereal company advertises 16 oz boxes. A consumer group suspects the boxes are underfilled. They weigh a random sample of 40 boxes and find $\bar{x} = 15.7$ oz, $s = 0.8$ oz. Test at $\alpha = 0.05$ .

Part 7: Review & Applications

🏆 Review & Applications

Part 7 of 7 — Complete Reference Guide

Formula Reference

Formula	Expression	Purpose
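Standard Error	$SE = \dfrac{s}{\sqrt{n}}$	Typical distance between $\bar{x}$ and $\mu$ due to sampling variability
Test Statistic	$t = \dfrac{\bar{x} - \mu_0}{SE}$	Number of standard errors between $\bar{x}$ and $\mu_0$
Degrees of Freedom	$df = n - 1$	Selects the $t$-distribution used for the P-value
Power	$1 - \beta$	Probability of correctly rejecting a false $H_0$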

2) Claim: $\mu < 50$ . $H_0: \mu =$ ?

3) Claim: $\mu \neq 100$ . $H_0: \mu =$ ?

Symbol	Meaning
$\bar{x}$	Sample mean
$\mu_0$	Null hypothesis value
$s$	Sample standard deviation
$n$	Sample size
$s / \sqrt{n}$	Standard error of $\bar{x}$

🔑 Interpretation: $t$ tells you how many standard errors the sample mean is from the null value. Larger $|t|$ → stronger evidence against $H_0$ .

Standard Error (SE)

The standard error measures the typical distance between $\bar{x}$ and $\mu$ due to sampling variability:

$SE = \frac{s}{\sqrt{n}}$

Factor	Effect on SE
Larger $s$ (more variability)	SE increases
Larger $n$ (bigger sample)	SE decreases

🔑 Key Insight: Quadrupling the sample size halves the standard error (because $\sqrt{4} = 2$ ).
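A quick numerical check of the quadrupling rule, with an illustrative $s = 60$ (any positive value behaves the same way); `standard_error` is a hypothetical helper name:

```python
import math

def standard_error(s: float, n: int) -> float:
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s = 60  # illustrative sample standard deviation
for n in (9, 36, 144):  # each sample size is 4 times the previous one
    print(f"n = {n:>3}: SE = {standard_error(s, n)}")
```

Each fourfold increase in $n$ halves the standard error: 20.0, then 10.0, then 5.0.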

Degrees of Freedom

For a one-sample $t$-test: $df = n - 1$

The degrees of freedom determine which $t$ -distribution to use for finding the P-value. More degrees of freedom → the $t$ -distribution looks more like a normal distribution.
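To see the $t$-distribution approach the normal, the sketch below computes the right-tail area $P(T \geq 2)$ for several degrees of freedom and compares it with the standard normal tail. Python's standard library has no $t$-distribution, so `t_upper_tail` is a hypothetical helper that integrates the $t$ density numerically with Simpson's rule. In practice a statistics library or a calculator's tcdf would do this job.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) via Simpson's rule on [0, |t|] plus symmetry of T about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # P(T >= |t|)
    return upper if t >= 0 else 1 - upper

def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for df in (5, 35, 200):
    print(f"df = {df:>3}: P(T >= 2) = {t_upper_tail(2.0, df):.4f}")
print(f"normal:   P(Z >= 2) = {normal_upper_tail(2.0):.4f}")
```

The $t$ tail areas shrink toward the normal value as $df$ grows, which is why small samples need a larger $|t|$ to reach the same P-value.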

Worked Example

A school claims its average SAT math score is 500. A random sample of 36 students gives $\bar{x} = 520$ and $s = 60$.

Step 1 — Standard Error:

$SE = \frac{s}{\sqrt{n}} = \frac{60}{\sqrt{36}} = \frac{60}{6} = 10$

Step 2 — Test Statistic:

$t = \frac{\bar{x} - \mu_0}{SE} = \frac{520 - 500}{10} = \frac{20}{10} = 2.0$

Step 3 — Degrees of Freedom:

$df = n - 1 = 36 - 1 = 35$

Interpretation: The sample mean is 2.0 standard errors above the null value. This is moderate-to-strong evidence against $H_0$ .
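The three steps map directly onto code. A minimal Python sketch that reproduces the numbers above; the function name `one_sample_t_steps` is hypothetical:

```python
import math

def one_sample_t_steps(x_bar: float, mu_0: float, s: float, n: int):
    """Return (standard error, t-statistic, degrees of freedom) for a one-sample t-test."""
    se = s / math.sqrt(n)        # Step 1: standard error
    t = (x_bar - mu_0) / se      # Step 2: test statistic
    df = n - 1                   # Step 3: degrees of freedom
    return se, t, df

se, t, df = one_sample_t_steps(x_bar=520, mu_0=500, s=60, n=36)
print(f"SE = {se}, t = {t}, df = {df}")  # SE = 10.0, t = 2.0, df = 35
```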

How Large Is "Large Enough"?

$|t|$ Value	Rough Guide
$< 1$	Weak evidence against $H_0$
$1$ to $2$	Moderate evidence
$> 2$	Strong evidence
$> 3$	Very strong evidence

⚠️ Caution: These are rough guidelines. Always compute the P-value for a precise conclusion.

Computing Test Statistics 🧮

1) $s = 14$ , $n = 49$ . What is the standard error?

2) $\bar{x} = 82$ , $\mu_0 = 75$ , $SE = 2$ . What is the $t$ -statistic?

3) $n = 26$ . What are the degrees of freedom?

Decision Rule

Comparison	Decision	Conclusion
$P < \alpha$	Reject $H_0$	Result is statistically significant
$P \geq \alpha$	Fail to reject $H_0$	Result is NOT statistically significant

⚠️ Never say "accept $H_0$ ." We either reject or fail to reject.

Interpreting P-Values

P-value Range	Strength of Evidence Against $H_0$
$P > 0.10$	Weak or no evidence
$0.05 < P \leq 0.10$	Moderate evidence
$0.01 < P \leq 0.05$	Strong evidence
$P \leq 0.01$	Very strong evidence
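The decision rule and the evidence scale can be encoded as two small functions. The cutoffs are this lesson's rough guide rather than a universal standard, and the function names are hypothetical:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Decision rule: reject H0 only when P < alpha (P == alpha fails to reject)."""
    return "Reject H0" if p_value < alpha else "Fail to reject H0"

def evidence_strength(p_value: float) -> str:
    """Rough strength-of-evidence label, following the lesson's P-value bands."""
    if p_value <= 0.01:
        return "Very strong evidence"
    if p_value <= 0.05:
        return "Strong evidence"
    if p_value <= 0.10:
        return "Moderate evidence"
    return "Weak or no evidence"

for p in (0.004, 0.027, 0.08, 0.30):
    print(p, decide(p), evidence_strength(p), sep=" | ")
```

Keeping the decision separate from the evidence label mirrors the AP conclusion format: the decision compares P to $\alpha$, while the label only describes how strong the evidence is.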

One-Tailed vs Two-Tailed P-Values

Test Type	P-value Calculation
Right-tailed ($H_a: \mu > \mu_0$)	$P = P(t \geq t_{obs})$
Left-tailed ($H_a: \mu < \mu_0$)	$P = P(t \leq t_{obs})$
Two-tailed ($H_a: \mu \neq \mu_0$)	$P = 2 \cdot P(t \geq |t_{obs}|)$

🔑 Two-tailed tests double the one-tail probability because evidence in either direction counts.
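The three P-value rules can be sketched in Python with a numerical stand-in for the calculator's tcdf. This assumes that a Simpson's-rule integral of the $t$ density is accurate enough for teaching purposes. `t_upper_tail` and `p_value` are hypothetical names, and the `alternative` labels simply mirror the ones SciPy's `ttest_1samp` uses.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) via Simpson's rule on [0, |t|] plus symmetry of T about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # P(T >= |t|)
    return upper if t >= 0 else 1 - upper

def p_value(t_obs: float, df: int, alternative: str) -> float:
    """P-value of a one-sample t-test for the given alternative hypothesis."""
    if alternative == "greater":      # Ha: mu > mu_0, P = P(T >= t_obs)
        return t_upper_tail(t_obs, df)
    if alternative == "less":         # Ha: mu < mu_0, P = P(T <= t_obs)
        return 1 - t_upper_tail(t_obs, df)
    if alternative == "two-sided":    # Ha: mu != mu_0, P = 2 * P(T >= |t_obs|)
        return 2 * t_upper_tail(abs(t_obs), df)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(p_value(2.0, 35, "greater"), 4))
print(round(p_value(2.0, 35, "two-sided"), 4))
print(round(p_value(-2.5, 24, "less"), 4))
```

The first and last values agree with the lesson's tcdf results (about 0.027 and 0.0098). The two-sided value is exactly double the right-tailed one.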

Worked Example

$\bar{x} = 520$, $\mu_0 = 500$, $SE = 10$, $\alpha = 0.05$, right-tailed test.

Step 1 — Test statistic: $t = \frac{520 - 500}{10} = 2.0$

Step 2 — P-value (using a calculator): $P = \text{tcdf}(2.0, 10^{99}, 35) \approx 0.027$

Step 3 — Decision: $0.027 < 0.05 \Rightarrow \text{Reject } H_0$

Step 4 — Conclusion in context: "There is convincing evidence ( $t = 2.0$ , $P = 0.027$ ) that the true mean SAT math score at this school is greater than 500."

Writing AP Conclusions

Always include four elements:

Decision — Reject or fail to reject $H_0$
Evidence — Cite $t$ -statistic and P-value
Context — Refer to the specific problem
Direction — "Greater than," "less than," or "different from"

🔑 AP Tip: "Fail to reject" does NOT mean the null is proven true — only that we lack sufficient evidence.

P-Value Decisions 🧮

1) The most common significance level $\alpha$ is:

2) $P = 0.03$ vs $P = 0.08$ : which is more significant? (enter the P-value)

3) $P = 0.12$ , $\alpha = 0.05$ . Do we reject? (enter "yes" or "no")

Type I Error (False Positive)

Definition: Rejecting $H_0$ when it is actually true.

Probability = $\alpha$ (the significance level)
You conclude there IS an effect when there really isn't one

Real-world example: A medical test says the patient has a disease, but they are actually healthy.

🔑 Key Connection: Choosing $\alpha = 0.05$ means that, when $H_0$ is true, you accept a 5% chance of rejecting it anyway (a Type I error).

Type II Error (False Negative)

Definition: Failing to reject $H_0$ when it is actually false.

Probability = $\beta$
You conclude there is NO effect when there really IS one

Real-world example: A medical test says the patient is healthy, but they actually have the disease.

Power

Definition: The probability of correctly rejecting a false $H_0$.

$\text{Power} = 1 - \beta$

Factor	Effect on Power
Increase $n$ (sample size)	Power increases
Increase $\alpha$	Power increases (but more Type I risk)
Larger true effect size	Power increases
Decrease variability ( $s$ )	Power increases

🔑 AP Tip: Power is typically considered adequate when it is at least 0.80 (80%).

The $\alpha$ – $\beta$ Tradeoff

Action	Type I Risk ( $\alpha$ )	Type II Risk ( $\beta$ )	Power ( $1-\beta$ )
Lower $\alpha$ (e.g., 0.01)	Decreases ✅	Increases ❌	Decreases ❌
Raise $\alpha$ (e.g., 0.10)	Increases ❌	Decreases ✅	Increases ✅
Increase $n$	No change	Decreases ✅	Increases ✅

🔑 Changing $\alpha$ alone trades one error risk for the other; increasing the sample size is the practical way to reduce BOTH at once.
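A small simulation makes the tradeoff concrete. The sketch below reuses the SAT setting from the test-statistics worked example ($\mu_0 = 500$, right-tailed) under extra assumptions: a normal population with $\sigma = 60$, and critical values $t^*$ taken from a standard $t$-table (1.690, 2.438 and 1.306 for $df = 35$ at $\alpha = 0.05, 0.01, 0.10$; 1.656 for $df = 143$ at $\alpha = 0.05$). Rejecting when $t \geq t^*$ is equivalent, up to the rounding of $t^*$, to rejecting when $P < \alpha$. The function name `rejection_rate` is hypothetical.

```python
import math
import random

def rejection_rate(true_mu: float, n: int, t_crit: float, mu_0: float = 500.0,
                   sigma: float = 60.0, reps: int = 4_000, seed: int = 7) -> float:
    """Share of simulated right-tailed t-tests (H0: mu = mu_0) that reject H0.

    Samples come from an assumed normal population with mean true_mu and sd sigma.
    With true_mu == mu_0 the result estimates the Type I error rate (alpha);
    with true_mu > mu_0 it estimates the power, 1 - beta.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t = (x_bar - mu_0) / (s / math.sqrt(n))
        if t >= t_crit:  # roughly equivalent to P < alpha, since t* is rounded
            rejections += 1
    return rejections / reps

# (description, true mean, sample size, critical value t* from a t-table)
scenarios = [
    ("H0 true,  alpha = 0.05, n = 36  -> Type I rate", 500, 36, 1.690),
    ("mu = 520, alpha = 0.01, n = 36  -> power", 520, 36, 2.438),
    ("mu = 520, alpha = 0.05, n = 36  -> power", 520, 36, 1.690),
    ("mu = 520, alpha = 0.10, n = 36  -> power", 520, 36, 1.306),
    ("mu = 520, alpha = 0.05, n = 144 -> power", 520, 144, 1.656),
]
for description, mu, n, t_crit in scenarios:
    print(f"{description}: {rejection_rate(mu, n, t_crit):.3f}")
```

Typical runs show a Type I rate near 0.05 and power that rises as $\alpha$ is raised from 0.01 to 0.10. With $\alpha$ held at 0.05, power climbs close to 1 when $n$ grows from 36 to 144.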

Worked Example

A jury trial: $H_0$: The defendant is innocent. $H_a$: The defendant is guilty.

Outcome	Error Type	Consequence
Convict an innocent person	Type I	Wrongful conviction
Acquit a guilty person	Type II	Criminal goes free

The justice system sets a very low $\alpha$ ("beyond reasonable doubt") because Type I errors have severe consequences.

Error Probabilities 🧮

1) Rejecting a true $H_0$ is a Type ___ error.

2) If $\alpha = 0.05$ , the probability of a Type I error is:

3) If $\beta = 0.20$ , the power is:

⚠️ AP Tip: You MUST state and verify all three conditions to earn full credit on the free response.

The Four-Step Process

Step 1 — STATE:

Define the parameter: "Let $\mu$ = the true mean ..."
Write hypotheses: $H_0: \mu = \mu_0$ vs $H_a: \mu > \mu_0$, $\mu < \mu_0$, or $\mu \neq \mu_0$

Step 2 — PLAN:

Name the test: "One-sample $t$ -test"
Check all three conditions

Step 3 — DO:

Compute $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \quad df = n - 1$
Find the P-value using the $t$-distribution with $df = n-1$

Step 4 — CONCLUDE:

Compare P-value to $\alpha$
State conclusion in context

Worked Example

A manufacturer claims their batteries last 500 hours. A random sample of 25 batteries gives $\bar{x} = 490$, $s = 20$. Test at $\alpha = 0.05$.

STATE: $\mu$ = true mean battery life (hours)
$H_0: \mu = 500$ , $H_a: \mu < 500$ (left-tailed — suspect batteries last LESS)

PLAN: One-sample $t$-test
✅ Random: stated "random sample"
✅ Independent: 25 < 10% of all batteries produced
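✅ Normal: $n = 25 < 30$, so the large-sample (CLT) rule does not apply; we assume the population of battery lives is approximately normal (no strong skew or outliers are mentioned)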

DO: $SE = \frac{s}{\sqrt{n}} = \frac{20}{\sqrt{25}} = \frac{20}{5} = 4$

$t = \frac{490 - 500}{4} = \frac{-10}{4} = -2.5$

$df = 25 - 1 = 24$

$P = \text{tcdf}(-10^{99}, -2.5, 24) \approx 0.0098$

CONCLUDE: Since $P = 0.0098 < \alpha = 0.05$ , we reject $H_0$ . There is convincing evidence that the true mean battery life is less than 500 hours.
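The DO and CONCLUDE steps can be bundled into one function working from summary statistics. This sketch is self-contained, so it repeats a numerical $t$ tail helper (a Simpson's-rule stand-in for the calculator's tcdf). The function and dictionary key names are hypothetical. It deliberately does not check the conditions, which remain a judgment made in the PLAN step.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) via Simpson's rule on [0, |t|] plus symmetry of T about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # P(T >= |t|)
    return upper if t >= 0 else 1 - upper

def one_sample_t_test(x_bar: float, s: float, n: int, mu_0: float,
                      alternative: str, alpha: float = 0.05) -> dict:
    """DO and CONCLUDE steps of a one-sample t-test from summary statistics."""
    se = s / math.sqrt(n)        # standard error
    t = (x_bar - mu_0) / se      # test statistic
    df = n - 1                   # degrees of freedom
    if alternative == "greater":
        p = t_upper_tail(t, df)
    elif alternative == "less":
        p = 1 - t_upper_tail(t, df)
    elif alternative == "two-sided":
        p = 2 * t_upper_tail(abs(t), df)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    decision = "Reject H0" if p < alpha else "Fail to reject H0"
    return {"SE": se, "t": t, "df": df, "P": p, "decision": decision}

# Battery example: H0: mu = 500 vs Ha: mu < 500
result = one_sample_t_test(x_bar=490, s=20, n=25, mu_0=500, alternative="less")
for key, value in result.items():
    print(f"{key}: {value}")
```

The output should match the hand calculation above ($SE = 4$, $t = -2.5$, $df = 24$, $P \approx 0.0098$, reject $H_0$). The same call with `alternative="two-sided"` doubles the one-tail area automatically.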

t-Test vs z-Test

Feature	z-Test	t-Test
$\sigma$ known?	Yes	No (use $s$)
Distribution	Standard normal	$t$ with $df = n-1$
AP Exam usage	Rare for means ($\sigma$ is seldom known); standard for proportions	Very common (means)

Computing a t-Test 🧮

$n = 36$ , $\bar{x} = 52$ , $s = 6$ , $\mu_0 = 50$ :

1) $SE = s/\sqrt{n} =$ ?

2) $t = (\bar{x} - \mu_0)/SE =$ ?

3) $df =$ ?

STATE: $\mu$ = true mean weight of cereal boxes (oz)
$H_0: \mu = 16$ vs $H_a: \mu < 16$ (left-tailed — suspect underfilling)

PLAN: One-sample $t$-test
✅ Random: "random sample" stated
✅ Independent: 40 boxes $< 10\%$ of all boxes produced
✅ Normal/Large: $n = 40 \geq 30$ → CLT applies

DO: $SE = \frac{0.8}{\sqrt{40}} = \frac{0.8}{6.325} \approx 0.1265$

$t = \frac{15.7 - 16}{0.1265} = \frac{-0.3}{0.1265} \approx -2.372$

$df = 40 - 1 = 39$

$P = \text{tcdf}(-10^{99}, -2.372, 39) \approx 0.0114$

CONCLUDE: Since $P = 0.0114 < \alpha = 0.05$ , we reject $H_0$ . There is convincing evidence that the true mean weight of cereal boxes is less than 16 oz. The consumer group's suspicion is supported.

Worked Example 2: Study Hours

A college claims students study an average of 15 hours per week. A professor surveys a random sample of 50 students and finds $\bar{x} = 13.2$ hours, $s = 5.1$ hours. Test at $\alpha = 0.05$ whether the true mean differs from 15.

STATE: $\mu$ = true mean weekly study hours for students at this college
$H_0: \mu = 15$ vs $H_a: \mu \neq 15$ (two-tailed — "differs from")

PLAN: One-sample $t$-test
✅ Random: "random sample" stated
✅ Independent: $50 < 10\%$ of all college students
✅ Normal/Large: $n = 50 \geq 30$ → CLT applies

DO: $SE = \frac{5.1}{\sqrt{50}} = \frac{5.1}{7.071} \approx 0.7212$

$t = \frac{13.2 - 15}{0.7212} = \frac{-1.8}{0.7212} \approx -2.495$

$df = 50 - 1 = 49$

$P = 2 \times \text{tcdf}(-10^{99}, -2.495, 49) \approx 2 \times 0.0080 = 0.0160$

CONCLUDE: Since $P = 0.016 < \alpha = 0.05$ , we reject $H_0$ . There is convincing evidence that the true mean weekly study hours for students at this college differs from 15 hours.

Common AP Mistakes to Avoid

Mistake	Correction
Not stating hypotheses in terms of $\mu$	Always use population parameters
Skipping conditions	Must check all three explicitly
Saying "accept $H_0$ "	Say "fail to reject $H_0$ "
Conclusion without context	"There is (not) convincing evidence that [real-world statement]"
Using $\bar{x}$ or $\hat{p}$ in hypotheses	Always use $\mu$ or $p$
Forgetting to double P for two-tailed	Two-tailed: $P = 2 \times \text{one-tail area}$

Workshop Calculations 🧮

$n = 25$ , $\bar{x} = 84$ , $s = 10$ , $\mu_0 = 80$ :

1) $SE = s/\sqrt{n} =$ ?

2) $t = (\bar{x} - \mu_0)/SE =$ ?

3) $df =$ ?

Hypothesis Test Decision Guide

Question	Answer
"Is it greater than?"	Right-tailed: $H_a: \mu > \mu_0$
"Is it less than?"	Left-tailed: $H_a: \mu < \mu_0$
"Is it different from?"	Two-tailed: $H_a: \mu \neq \mu_0$

Decision Rule

$\text{If } P < \alpha \Rightarrow \text{Reject } H_0 \qquad \text{If } P \geq \alpha \Rightarrow \text{Fail to reject } H_0$

Error Summary

	$H_0$ True	$H_0$ False
Reject $H_0$	Type I ( $\alpha$ )	✅ Correct (Power = $1-\beta$ )
Fail to reject $H_0$	✅ Correct	Type II ( $\beta$ )

Conditions Checklist

Condition	Check
Random	SRS or randomized experiment
Independent	$n < 10\%$ of population
Normal/Large	$n \geq 30$ (CLT) or data approximately normal
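The checklist can be written as a simple function. It is only a sketch: the population size and the yes/no judgments about random sampling and approximate normality are inputs the user must supply (hypothetical parameters), because no code can verify how the data were collected.

```python
def check_conditions(n: int, population_size: int, random_sample: bool,
                     looks_normal: bool) -> dict:
    """Pass/fail flags for the three one-sample t-test conditions.

    random_sample and looks_normal are judgments supplied by the user (from the
    problem statement or a dotplot/histogram); the function only combines them.
    """
    return {
        "Random": random_sample,
        "Independent (10% condition)": 10 * n <= population_size,  # N >= 10n
        "Normal/Large": n >= 30 or looks_normal,                    # CLT or normal data
    }

# Cereal example: random sample of 40 boxes. The population size is a made-up figure.
print(check_conditions(n=40, population_size=1_000_000,
                       random_sample=True, looks_normal=False))
```

For the battery example ($n = 25$), the Normal/Large check passes only if `looks_normal=True`, matching the assumption stated in that worked example.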

AP Four-Step Process

STATE — Define parameter; write $H_0$ and $H_a$
PLAN — Name the test; check all three conditions
DO — Compute SE, $t$ , $df$ , P-value
CONCLUDE — Decision + evidence + context

Power Factors

To Increase Power	Do This
Increase $n$	More data → smaller SE → easier to detect effects
Increase $\alpha$	More willing to reject (but more Type I risk)
Larger effect size	Bigger gap between the true $\mu$ and $\mu_0$ → easier to detect
Decrease $s$	Less variability → more precise estimates

Common Mistakes on the AP Exam

Mistake	Why It's Wrong
"Accept $H_0$ "	We can only "fail to reject" — never prove $H_0$
Using $\bar{x}$ in hypotheses	Hypotheses use $\mu$ (parameter), not $\bar{x}$ (statistic)
No context in conclusion	Must relate conclusion to the specific problem
Forgetting to check conditions	All three required for full credit
Confusing statistical and practical significance	Small P doesn't mean the effect matters in practice

Quick Calculations 🧮

1) $n = 30$ . $df =$ ?

2) $s = 12$ , $n = 36$ . $SE =$ ?

3) $\bar{x} = 48$, $\mu_0 = 50$, $SE = 2$. $t =$ ?

Reject $H_0$	❌ Type I Error ( $\alpha$ )	✅ Correct Decision (Power)
Fail to reject $H_0$	✅ Correct Decision	❌ Type II Error ( $\beta$ )

Random	Data from a random sample or randomized experiment	Stated in problem
Independence	$n < 10\%$ of the population (10% condition)	$N \geq 10n$
Normal/Large Sample	Population is approximately normal OR $n \geq 30$	Check dotplot/histogram for skew; if $n \geq 30$ , CLT applies
