Hypothesis Testing Framework
Null and alternative hypotheses, significance level
What is Hypothesis Testing?
Hypothesis Test: Formal procedure to decide between two competing claims about a population parameter
Two hypotheses:
- Null hypothesis (H₀): Status quo, no effect, no difference
- Alternative hypothesis (Hₐ or H₁): What we're trying to show
Goal: Determine if data provides sufficient evidence to reject H₀ in favor of Hₐ
Setting Up Hypotheses
H₀: Always includes equality (=, ≤, ≥)
Hₐ: Can be:
- Two-sided: μ ≠ μ₀ (different from)
- Right-sided: μ > μ₀ (greater than)
- Left-sided: μ < μ₀ (less than)
Examples:
Claim: Mean height > 68 inches
- H₀: μ = 68 or μ ≤ 68
- Hₐ: μ > 68
Claim: Proportion ≠ 0.5
- H₀: p = 0.5
- Hₐ: p ≠ 0.5
The Four-Step Process
Step 1: STATE
- Parameter of interest
- Hypotheses (H₀ and Hₐ)
- Significance level α
Step 2: PLAN
- Choose appropriate test
- Check conditions
Step 3: DO
- Calculate test statistic
- Find P-value
Step 4: CONCLUDE
- Compare P-value to α
- State conclusion in context
Test Statistic
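General form: test statistic = (statistic − parameter) / (standard error of statistic)
For means (t-test): t = (x̄ − μ₀) / (s/√n)
For proportions (z-test): z = (p̂ − p₀) / √(p₀(1 − p₀)/n)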
Measures: How many standard errors the statistic is from hypothesized parameter
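As a minimal sketch, the two test statistics, t = (x̄ − μ₀)/(s/√n) for a mean and z = (p̂ − p₀)/√(p₀(1 − p₀)/n) for a proportion, can be computed as below. The function names are illustrative, not from any library; the inputs are the numbers used in the worked example and in Practice Problem 3 of these notes.

```python
import math

def t_statistic(sample_mean, mu_0, sample_sd, n):
    """One-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu_0) / standard_error

def z_statistic_proportion(p_hat, p_0, n):
    """One-proportion z statistic; the standard error uses the hypothesized p_0."""
    standard_error = math.sqrt(p_0 * (1 - p_0) / n)
    return (p_hat - p_0) / standard_error

# Worked example: n = 30, x_bar = 78, s = 10, H0: mu = 75
t = t_statistic(sample_mean=78, mu_0=75, sample_sd=10, n=30)
# Practice Problem 3: 68 of 200 customers, H0: p = 0.40
z = z_statistic_proportion(p_hat=68 / 200, p_0=0.40, n=200)
print(round(t, 2), round(z, 2))
```

Both values are in units of standard errors, which is why they can be compared against a t or standard normal distribution.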
P-Value
P-value: Probability of getting results as extreme or more extreme than observed, assuming H₀ is true
Interpretation:
- Small P-value → data inconsistent with H₀ → evidence against H₀
- Large P-value → data consistent with H₀ → insufficient evidence against H₀
Finding P-value:
- Two-sided: P(|test statistic| ≥ |observed|)
- Right-sided: P(test statistic ≥ observed)
- Left-sided: P(test statistic ≤ observed)
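The three tail rules above can be sketched for a z statistic using Python's standard library. This is an illustration only: the function name and the labels "greater" and "less" are assumptions, and z = 1.96 is a hypothetical observed value, not one from these notes.

```python
from statistics import NormalDist

def p_value_from_z(z_observed, alternative):
    """P-value for a z statistic under a standard normal null distribution.

    alternative: "two-sided", "greater" (right-sided), or "less" (left-sided).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        return 1 - std_normal.cdf(z_observed)
    if alternative == "less":
        return std_normal.cdf(z_observed)
    if alternative == "two-sided":
        # Both tails: probability of being at least |z| away from 0.
        return 2 * (1 - std_normal.cdf(abs(z_observed)))
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("two-sided", "greater", "less"):
    print(alt, round(p_value_from_z(1.96, alt), 3))
```

The same observed statistic gives a different P-value under each alternative, which is why the direction of Hₐ has to be fixed before looking at the data.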
Significance Level (α)
α: Threshold for rejecting H₀
Common values: 0.05, 0.01, 0.10
Decision rule:
- If P-value ≤ α → Reject H₀
- If P-value > α → Fail to reject H₀
Note: "Fail to reject" ≠ "accept" H₀ (lack of evidence against ≠ evidence for)
Example: Complete Test
Claim: Mean score exceeds 75. Sample: n = 30, x̄ = 78, s = 10
STATE:
- Parameter: μ = true mean score
- H₀: μ = 75
- Hₐ: μ > 75
- α = 0.05
PLAN:
- One-sample t-test
- Conditions: Random ✓, n = 30 ≥ 30 ✓, n < 10%N ✓
DO:
- t = (x̄ − μ₀)/(s/√n) = (78 − 75)/(10/√30) ≈ 1.64
- df = 29, P-value ≈ 0.056 (from tcdf)
CONCLUDE: P-value = 0.056 > 0.05, fail to reject H₀. Insufficient evidence that mean exceeds 75.
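The example above can be reproduced without a calculator. Python's standard library has no t-distribution CDF, so this sketch integrates the t density numerically with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_obs, df, steps=2000):
    """P(T >= t_obs) via Simpson's rule on [0, |t_obs|]; steps must be even."""
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    upper = 0.5 - area_0_to_b  # the density is symmetric about 0
    return upper if t_obs >= 0 else 1 - upper

# STATE: H0: mu = 75, Ha: mu > 75, alpha = 0.05
n, x_bar, s, mu_0, alpha = 30, 78, 10, 75, 0.05
# DO: test statistic and right-sided P-value
t = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = t_upper_tail(t, df=n - 1)
# CONCLUDE: compare the P-value to alpha
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"t = {t:.2f}, df = {n - 1}, P-value = {p_value:.3f}, {decision}")
```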
One-Sided vs Two-Sided Tests
Two-sided: Looking for any difference
- Hₐ: μ ≠ μ₀
- P-value = 2 × P(t ≥ |observed|)
One-sided: Looking for specific direction
- Hₐ: μ > μ₀ or μ < μ₀
- P-value = P(t ≥ observed) or P(t ≤ observed)
Choose before seeing data! One-sided only if direction specified in advance
Statistical Significance
Statistically significant: P-value ≤ α
Interpretation: Result unlikely to occur by chance alone if H₀ true
NOT the same as practically significant!
- Can have statistically significant but tiny effect
- Large sample can detect trivial differences
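To see how sample size alone can make a trivial difference statistically significant, here is a small sketch with made-up numbers: an observed mean of 100.2 against a hypothesized mean of 100, with a population standard deviation of 10 assumed known so that a z-test applies. None of these values come from the notes.

```python
import math
from statistics import NormalDist

def two_sided_p(sample_mean, mu_0, sigma, n):
    """Two-sided z-test P-value, assuming the population sd sigma is known."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same 0.2-point difference, increasingly large (hypothetical) samples.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(100.2, 100, 10, n))
```

The difference of 0.2 never changes, yet the P-value shrinks as n grows, because the standard error shrinks with √n. Whether a 0.2-point difference matters is a separate, practical question.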
Relationship to Confidence Intervals
For two-sided test at α = 0.05:
Equivalent to checking whether the (1 − α) confidence interval contains the hypothesized value μ₀
- If μ₀ in 95% CI → P-value > 0.05
- If μ₀ not in 95% CI → P-value ≤ 0.05
CI gives more information: Range of plausible values, not just yes/no
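The link between a two-sided test and a confidence interval can be checked numerically. This sketch uses a z procedure with the population standard deviation assumed known (σ = 10), which is a simplification, so the test and the interval use the same standard error. The function name and the second hypothesized value (74) are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_and_interval(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Return (two-sided P-value, (1 - alpha) z-interval, whether mu_0 is inside)."""
    se = sigma / math.sqrt(n)
    z = (sample_mean - mu_0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    interval = (sample_mean - z_star * se, sample_mean + z_star * se)
    mu_0_inside = interval[0] <= mu_0 <= interval[1]
    return p_value, interval, mu_0_inside

for mu_0 in (75, 74):
    p_value, (low, high), inside = z_test_and_interval(78, mu_0, sigma=10, n=30)
    print(f"mu_0 = {mu_0}: P-value = {p_value:.3f}, "
          f"95% CI = ({low:.2f}, {high:.2f}), inside = {inside}")
```

In both cases the P-value exceeds 0.05 exactly when μ₀ falls inside the 95% interval, and the interval additionally shows which values of μ remain plausible.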
Common Misconceptions
❌ "P-value is probability H₀ is true"
- No! It's P(data at least this extreme | H₀), not P(H₀ | data)
❌ "Fail to reject H₀ means H₀ is true"
- No! Just insufficient evidence against it
❌ "Significant means important"
- No! Statistically significant ≠ practically important
❌ "P-value is probability of error"
- No! The Type I error rate (rejecting H₀ when H₀ is true) is α, which is fixed before the data are collected
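A short simulation can make the last point concrete: when H₀ is actually true, a test at level α rejects in roughly a fraction α of repeated samples, whatever P-value any single sample happens to produce. The setup below (standard normal data, two-sided z-test with σ = 1 known, 4000 trials, fixed seed) is an assumption chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def type_one_error_rate(alpha=0.05, n=30, trials=4000, seed=0):
    """Fraction of two-sided z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Data generated with mu = 0, so H0: mu = 0 is true in every trial.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) / (1 / math.sqrt(n))
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value <= alpha:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(rate)
```

The printed rate should land near 0.05, which is the long-run Type I error rate, not a statement about any one P-value.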
Writing Conclusions
✓ Good: "We have sufficient evidence to conclude the mean exceeds 75."
✓ Good: "There is insufficient evidence that the proportion differs from 0.5."
✗ Bad: "We prove the mean is 75."
✗ Bad: "We accept H₀."
✗ Bad: "The probability H₀ is true is 0.056."
Quick Reference
Hypotheses:
- H₀: includes =
- Hₐ: what we're testing for
Test statistic: (statistic - parameter) / SE
P-value: P(as extreme | H₀ true)
Decision:
- P ≤ α: Reject H₀
- P > α: Fail to reject H₀
Remember: Hypothesis testing is about evidence, not proof. Small P-value = strong evidence against H₀, but never proves Hₐ!
📚 Practice Problems
Problem 1 (easy)
❓ Question:
A manufacturer claims their batteries last an average of 500 hours. You suspect they last less than claimed. Set up appropriate hypotheses to test this claim.
💡 Solution:
Step 1: Identify the claim Manufacturer claims: μ = 500 hours
Step 2: Identify what we suspect We suspect: μ < 500 hours (Batteries last LESS than claimed)
Step 3: Set up null hypothesis H₀ H₀: μ = 500
The null hypothesis:
- Assumes the claim is true
- "Status quo" or "no effect"
- Equality statement
- What we're testing against
Step 4: Set up alternative hypothesis Hₐ Hₐ: μ < 500
The alternative hypothesis:
- What we suspect/want to show
- Research hypothesis
- What we need evidence for
- Inequality statement
Step 5: Determine test type This is a ONE-TAILED (left-tailed) test
Why?
- Hₐ: μ < 500 (less than)
- Only interested in one direction
- Looking for evidence batteries last LESS
- Not testing if they last MORE
Step 6: Why this setup? Burden of proof on us:
- Manufacturer claims 500 hours
- We must provide evidence against claim
- Start assuming claim is true (H₀)
- Collect data to see if claim is unreasonable
Step 7: Connection to significance Will collect sample data and calculate x̄ and s.
If x̄ is MUCH less than 500 (relative to its standard error):
- Evidence against H₀
- Might reject H₀
If x̄ is close to 500:
- Insufficient evidence against H₀
- Fail to reject H₀
- Can't conclude batteries last less
Answer:
- H₀: μ = 500 hours (null hypothesis: claim is true)
- Hₐ: μ < 500 hours (alternative: batteries last less than claimed)
This is a one-tailed (left-tailed) test because we're only testing if the mean is less than 500, not different from 500.
Problem 2 (easy)
❓ Question:
Explain the difference between null and alternative hypotheses. Why do we set them up this way?
💡 Solution:
Step 1: Null Hypothesis (H₀) Definition: Statement of "no effect" or "no difference"
- Assumes status quo
- Contains equality (=, ≤, ≥)
- What we test against
- Presumed true until evidence says otherwise
Examples:
- μ = 50 (parameter equals specific value)
- μ₁ = μ₂ (two means are equal)
- p = 0.5 (proportion equals 0.5)
Step 2: Alternative Hypothesis (Hₐ or H₁) Definition: Statement of what we want to show
- Research hypothesis
- What we suspect is true
- Contains inequality (<, >, ≠)
- Needs evidence to support
Examples:
- μ < 50 (one-tailed)
- μ > 50 (one-tailed)
- μ ≠ 50 (two-tailed)
- μ₁ > μ₂ (one group higher)
Step 3: Why this setup? (Legal analogy) Like a trial:
H₀ = "Defendant is innocent"
- Presumed true (innocent until proven guilty)
- Status quo
Hₐ = "Defendant is guilty"
- What prosecutor wants to show
- Needs strong evidence
We don't prove innocence! We either:
- Find enough evidence for guilty (reject H₀)
- Don't find enough evidence (fail to reject H₀)
Step 4: Types of alternative hypotheses
TWO-TAILED (≠): Hₐ: μ ≠ 50
- Parameter is different (either direction)
- Don't know which way
- Testing for ANY difference
ONE-TAILED, RIGHT (>): Hₐ: μ > 50
- Parameter is greater
- Specific direction
- Only interested in increase
ONE-TAILED, LEFT (<): Hₐ: μ < 50
- Parameter is less
- Specific direction
- Only interested in decrease
Step 5: How they work together Must be:
- Complementary (cover all possibilities)
- Mutually exclusive (can't both be true)
Examples:
- H₀: μ = 50 and Hₐ: μ ≠ 50 ✓
- H₀: μ ≥ 50 and Hₐ: μ < 50 ✓
- H₀: μ ≤ 50 and Hₐ: μ > 50 ✓
Step 6: Burden of proof Null hypothesis:
- Assumed true
- Skeptical position
- "Nothing is happening"
Alternative hypothesis:
- Must provide evidence
- Burden of proof on us
- Need convincing data
Step 7: Decision framework After collecting data:
If evidence is strong (p-value small): → Reject H₀ → Support Hₐ → "Significant" result
If evidence is weak (p-value large): → Fail to reject H₀ → Don't support Hₐ → "Not significant"
Step 8: Why can't we "accept" H₀? We NEVER "accept" or "prove" H₀
Why?
- Absence of evidence ≠ evidence of absence
- Maybe we just didn't have enough data
- Maybe our sample wasn't sensitive enough
- Just means: insufficient evidence against H₀
Say "fail to reject H₀" not "accept H₀"
Answer: NULL HYPOTHESIS (H₀): Statement of no effect or no difference, assumed true, contains equality. Represents status quo.
ALTERNATIVE HYPOTHESIS (Hₐ): What we want to show, needs evidence, contains inequality. Represents research question.
We set them up this way to put burden of proof on the researcher - must provide convincing evidence to overturn the assumed status quo. Like "innocent until proven guilty" in law.
Problem 3 (medium)
❓ Question:
A company claims 40% of customers prefer their product. You survey 200 customers and find 68 prefer it. Test at α = 0.05 level if the true proportion differs from 40%.
💡 Solution:
Step 1: Set up hypotheses H₀: p = 0.40 (claim is true) Hₐ: p ≠ 0.40 (proportion differs)
This is TWO-TAILED (≠)
Step 2: Check conditions n = 200, p₀ = 0.40
- Random: Assume random survey ✓
- Normal: np₀ = 200(0.40) = 80 ≥ 10 ✓ and n(1 − p₀) = 200(0.60) = 120 ≥ 10 ✓
- Independent: 200 ≤ 0.10N (assume) ✓
Step 3: Calculate sample proportion p̂ = 68/200 = 0.34
Step 4: Calculate test statistic z = (p̂ - p₀)/√(p₀(1-p₀)/n) = (0.34 - 0.40)/√(0.40(0.60)/200) = -0.06/√(0.24/200) = -0.06/√0.0012 = -0.06/0.0346 ≈ -1.73
Step 5: Find p-value (two-tailed) From z-table: P(Z < -1.73) ≈ 0.0418
Two-tailed p-value: p-value = 2 × 0.0418 = 0.0836
Step 6: Compare to α p-value = 0.0836 α = 0.05
Is 0.0836 < 0.05? NO
Step 7: Make decision Since p-value > α: FAIL TO REJECT H₀
Step 8: State conclusion At the α = 0.05 significance level, there is insufficient evidence to conclude that the true proportion differs from 40%.
The observed 34% could reasonably occur by chance if the true proportion is 40%.
Step 9: Interpret p-value p-value = 0.0836 means:
If true proportion really is 40%, there's an 8.36% chance of getting a sample proportion as extreme as 34% (or more extreme) just by random chance.
Since this is > 5%, not unusual enough to reject claim.
Answer:
- Test statistic: z = -1.73
- P-value: 0.084
- Decision: Fail to reject H₀ at α = 0.05
- Conclusion: Insufficient evidence that proportion differs from 40%
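The calculation in this problem can be checked with a few lines of standard-library Python. Because the code keeps the unrounded z (about -1.732) rather than the table entry for -1.73, it gives a P-value of about 0.083 instead of 0.0836. The decision is unchanged.

```python
import math
from statistics import NormalDist

n, successes, p_0, alpha = 200, 68, 0.40, 0.05

p_hat = successes / n
# The standard error uses the hypothesized p_0, not p_hat.
z = (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)
# Two-tailed: double the area beyond |z| in one tail.
p_value = 2 * NormalDist().cdf(-abs(z))
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, P-value = {p_value:.4f}, {decision}")
```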
Problem 4 (medium)
❓ Question:
What is a p-value? Interpret a p-value of 0.032 in the context of testing H₀: μ = 100 vs Hₐ: μ > 100.
💡 Solution:
Step 1: Define p-value P-value: Probability of getting results as extreme as (or more extreme than) what we observed, ASSUMING H₀ IS TRUE.
In symbols: p-value = P(getting our data or more extreme | H₀ is true)
Step 2: What "extreme" means Depends on Hₐ:
For Hₐ: μ > 100 (right-tailed): "Extreme" = values ≥ observed
For Hₐ: μ < 100 (left-tailed): "Extreme" = values ≤ observed
For Hₐ: μ ≠ 100 (two-tailed): "Extreme" = values in both tails
Step 3: Interpret p-value = 0.032 Context: H₀: μ = 100, Hₐ: μ > 100
Interpretation: "If the true mean really is 100, there is a 3.2% chance of getting a sample mean as large as (or larger than) what we observed, just by random sampling variability."
Step 4: What this tells us p = 0.032 = 3.2% is fairly small
Means:
- Our result is somewhat unusual under H₀
- Would rarely happen if H₀ true
- Evidence against H₀
- Sample mean is higher than expected
Step 5: Making a decision Compare to significance level α
If α = 0.05: p = 0.032 < 0.05 → REJECT H₀ → Statistically significant → Evidence that μ > 100
If α = 0.01: p = 0.032 > 0.01 → FAIL TO REJECT H₀ → Not significant at 0.01 level → Insufficient evidence
Step 6: Common misconceptions P-value is NOT: ✗ Probability that H₀ is true ✗ Probability that Hₐ is true ✗ Probability results are due to chance ✗ Probability of making an error
P-value IS: ✓ Probability of data at least this extreme, given H₀ ✓ How surprising data is under H₀ ✓ Measure of evidence against H₀
Step 7: The logic Small p-value (like 0.032): → Data unlikely if H₀ true → Either: a) H₀ is true and we got unlucky, OR b) H₀ is false → More reasonable to conclude H₀ is false → Reject H₀
Large p-value (like 0.50): → Data common if H₀ true → Consistent with H₀ → No reason to doubt H₀ → Fail to reject H₀
Step 8: Strength of evidence P-value scale (rough guideline):
- p > 0.10: Little/no evidence against H₀
- p = 0.05 to 0.10: Weak evidence against H₀
- p = 0.01 to 0.05: Moderate evidence against H₀
- p = 0.001 to 0.01: Strong evidence against H₀
- p < 0.001: Very strong evidence against H₀
Our p = 0.032: Moderate evidence against H₀
Step 9: Full interpretation for our problem p-value = 0.032
"Assuming the true mean is 100, there is only a 3.2% probability of obtaining a sample mean as large as (or larger than) what we observed. Since this probability is small (less than our significance level of 0.05), we have sufficient evidence to reject the null hypothesis and conclude that the true mean is greater than 100."
Answer: A p-value is the probability of getting results as extreme as what we observed, assuming H₀ is true.
P-value = 0.032 means: If μ really equals 100, there's only a 3.2% chance of getting a sample mean as large as (or larger than) ours. This is fairly unlikely, providing moderate evidence against H₀. At α = 0.05, we would reject H₀ and conclude μ > 100.
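The "if μ really equals 100" reading can be illustrated by simulation: generate many sample means in a world where H₀ is true and count how often they are at least as large as the observed one. The problem gives no sample size or standard deviation, so σ = 15 and n = 36 below are hypothetical, and the observed mean is back-calculated so that its right-tailed P-value is 0.032.

```python
import math
import random
from statistics import NormalDist

mu_0, sigma, n = 100, 15, 36          # sigma and n are hypothetical
se = sigma / math.sqrt(n)             # standard error of the sample mean
# Observed mean chosen so that the right-tailed P-value equals 0.032.
x_bar_obs = mu_0 + NormalDist().inv_cdf(1 - 0.032) * se

rng = random.Random(1)
trials = 20_000
# Draw sample means from their sampling distribution under H0: N(mu_0, se).
as_extreme = sum(rng.gauss(mu_0, se) >= x_bar_obs for _ in range(trials))
simulated_p = as_extreme / trials
print(round(x_bar_obs, 2), simulated_p)
```

The simulated fraction should come out close to 0.032. It describes how often such data arise when H₀ is true, not how likely H₀ is to be true.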
Problem 5 (hard)
❓ Question:
A researcher finds p = 0.048 when testing H₀: μ₁ = μ₂ vs Hₐ: μ₁ ≠ μ₂. At α = 0.05, what decision is made? What if α = 0.01? Explain the relationship between p-value, α, and the decision.
💡 Solution:
Step 1: The decision rule General rule:
- If p-value ≤ α → REJECT H₀
- If p-value > α → FAIL TO REJECT H₀
The significance level α is our cutoff!
Step 2: Decision at α = 0.05 p-value = 0.048 α = 0.05
Is 0.048 < 0.05? YES
Decision: REJECT H₀
Conclusion: At the 0.05 significance level, there IS sufficient evidence that the means differ (μ₁ ≠ μ₂).
Step 3: Decision at α = 0.01 p-value = 0.048 α = 0.01
Is 0.048 < 0.01? NO
Decision: FAIL TO REJECT H₀
Conclusion: At the 0.01 significance level, there is NOT sufficient evidence that the means differ.
Step 4: Why different decisions? α = significance level = "how much evidence we require"
α = 0.05 (5%):
- Willing to accept more risk
- Less stringent standard
- Easier to reject H₀
α = 0.01 (1%):
- Want stronger evidence
- More stringent standard
- Harder to reject H₀
Our p = 0.048 (4.8%):
- Strong enough for 5% standard ✓
- Not strong enough for 1% standard ✗
Step 5: Understanding α α represents:
- Maximum acceptable error rate
- How rare results must be to reject H₀
- Probability of Type I error (rejecting true H₀)
Common values:
- α = 0.05 (most common)
- α = 0.01 (more conservative)
- α = 0.10 (less conservative)
Step 6: The relationship Think of α as a threshold:
p-value = strength of evidence against H₀ α = required strength to reject H₀
If p-value ≤ α: Evidence is strong enough → reject H₀
If p-value > α: Evidence not strong enough → fail to reject H₀
Step 7: Borderline case p = 0.048 is borderline!
- Just barely significant at 0.05
- Not significant at 0.01
Shows importance of:
- Choosing α BEFORE seeing data
- Not treating 0.05 as magic cutoff
- Reporting actual p-value
Better to report: "p = 0.048" than just "significant" Lets reader judge strength of evidence
Step 8: Comparing across significance levels Same data, different standards:
- At α = 0.10: 0.048 < 0.10 → Reject H₀ ✓
- At α = 0.05: 0.048 < 0.05 → Reject H₀ ✓
- At α = 0.01: 0.048 > 0.01 → Fail to reject ✗
This doesn't mean results are contradictory! Just means: evidence moderate, not overwhelming
Step 9: Practical interpretation p = 0.048 means:
- About 4.8% chance of data at least this extreme if H₀ true
- Moderate evidence against H₀
- Results fairly unlikely under H₀
- Suggests a real difference, but not certain
Should we be confident?
- Depends on context
- Depends on consequences of error
- Consider practical significance too
Step 10: Choosing α before vs after seeing data CORRECT approach:
- Choose α before collecting data
- Collect data
- Calculate p-value
- Compare to α
- Make decision
INCORRECT approach:
- Collect data
- Calculate p-value
- Choose α to get desired result
This is p-hacking!
Answer: AT α = 0.05: REJECT H₀ (p = 0.048 < 0.05) Sufficient evidence that means differ.
AT α = 0.01: FAIL TO REJECT H₀ (p = 0.048 > 0.01) Insufficient evidence at this stricter standard.
RELATIONSHIP: α is the threshold for the decision. If p-value ≤ α, evidence is strong enough to reject H₀. The same data can lead to different decisions depending on how stringent our evidence requirement (α) is. Always choose α before seeing data!
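The comparison in this problem reduces to one rule applied at several thresholds. A minimal sketch, with `decide` as an illustrative helper name:

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject H0 when the P-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = 0.048
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

The P-value is a property of the data. Only the threshold changes between the three lines of output, which is why α has to be fixed in advance.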