Biostatistics for the MCAT - Complete Interactive Lesson
Part 1: Descriptive Statistics & Data Distributions
Biostatistics Fundamentals
Part 1 of 4 — Descriptive Statistics & Data Distributions
Types of Data
| Type | Definition | Examples |
|---|---|---|
| Continuous | Can take any value in a range | Height, weight, temperature, time |
| Discrete | Can only take specific values | Number of cells, number of mutations |
| Nominal | Categorical, no order | Blood type (A, B, AB, O) |
| Ordinal | Categorical, with order | Stage of cancer (I, II, III, IV) |
Measures of Central Tendency
| Measure | Definition | When to Use |
|---|---|---|
| Mean | Sum of values ÷ count | Normal distribution; sensitive to outliers |
| Median | Middle value | Skewed data; resistant to outliers |
| Mode | Most frequent value | Categorical data |
Example: A drug trial shows patient recovery times: 5, 6, 7, 8, 100 days.
- Mean = 25.2 days (affected by outlier)
- Median = 7 days (better representation)
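The two figures above can be checked with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the lesson.

```python
from statistics import mean, median

# Recovery times (days) from the drug trial example
recovery_days = [5, 6, 7, 8, 100]

trial_mean = mean(recovery_days)      # pulled upward by the 100-day outlier
trial_median = median(recovery_days)  # middle value of the sorted data

print(f"mean = {trial_mean}, median = {trial_median}")
```

Replacing the 100 with, say, 9 changes the mean a great deal but leaves the median at 7, which is why the median is described as resistant to outliers.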
Measures of Spread
| Measure | Formula | Interpretation |
|---|---|---|
| Range | Max − Min | Spread across all data |
| Variance |
68-95-99.7 Rule (Normal Distribution):
- 68% of data within 1 SD of mean
- 95% within 2 SD
- 99.7% within 3 SD
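As a rough check on where these percentages come from, the sketch below evaluates the fraction of a normal distribution within k standard deviations of the mean using the error function. The helper name `fraction_within` is an assumption made for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k) * 100:.1f}%")
```

The exact values are about 68.3%, 95.4%, and 99.7%, so the rule is a rounded approximation that holds only for normally distributed data.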
Key Takeaways — Part 1
- Central Tendency: Use median for skewed data; mean for symmetric distributions
- Spread: SD is most useful on MCAT; interpret via 68-95-99.7 rule
- Outliers: Robust stats (median, IQR) better than mean ± SD when outliers present
- Log scales: Many biomedical values (viral loads, enzyme concentrations) are approximately log-normally distributed; log-transform them before summarizing
Worked Examples — Descriptive Statistics
<details> <summary><b>Example 1: Choose mean vs median with an outlier</b></summary>

Data: 4, 5, 5, 6, 40
- Mean = 60/5 = 12.
- Median = 5.
- Outlier (40) inflates the mean.
Best central tendency: median.
</details>
<details> <summary><b>Example 2: Use the 68-95-99.7 rule</b></summary>

Mean = 70, SD = 5. Estimate the range containing about 95% of values.
- 95% is roughly mean ± 2 SD.
- 70 ± 10 gives 60 to 80.
Approximate 95% interval: 60 to 80.
</details>
<details> <summary><b>Example 3: Interpret standard deviation practically</b></summary>

Two test forms have the same mean score (80). Form A has SD 3; Form B has SD 12.
- Same mean means same average performance.
- Lower SD means scores cluster more tightly.
- Higher SD means performance is more variable.
Conclusion: Form A is more consistent across students.
</details>

Part 2: Hypothesis Testing & p-values
Part 2 of 4 — Hypothesis Testing & p-values
Hypothesis Types
| Hypothesis | Definition | Example |
|---|---|---|
| Null (H₀) | No effect or difference | The drug has no effect on blood pressure |
| Alternative (H₁) | There is an effect | The drug lowers blood pressure |
One-tailed vs Two-tailed:
- One-tailed: Predicts a direction (Drug lowers BP) → the entire α sits in one tail
- Two-tailed: No predicted direction (Drug changes BP) → α is split between the two tails (α/2 each)
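The difference between the two tests can be made concrete with a small sketch. It assumes a z-test (a normally distributed test statistic) and a hypothetical statistic of z = 1.8 in the predicted direction; neither assumption comes from the lesson.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.8  # hypothetical test statistic, in the predicted direction

p_one_tailed = upper_tail(z)           # only one tail counts as "extreme"
p_two_tailed = 2 * upper_tail(abs(z))  # both tails count as "extreme"

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```

With these assumed numbers the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, so the same data can lead to different conclusions depending on which test was specified in advance.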
Type I & II Errors
| Error | What Happens | Probability |
|---|---|---|
| Type I | Reject H₀ when it's true (False positive) | α (significance level) |
Part 3: Confidence Intervals & Effect Size
Part 3 of 4 — Confidence Intervals & Effect Size
Confidence Intervals (CI)
A CI gives a range where the true parameter likely lies (unlike a single p-value).
95% CI = Sample mean ± 1.96 × SE
(where SE = SD / √n)
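Interpretation: "We are 95% confident the true population mean falls within this range." More precisely, about 95% of intervals built this way from repeated samples would contain the true mean.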
| CI Width | What it means |
|---|---|
| Narrow CI | More precise estimate (larger sample size or less variability) |
| Wide CI | Less precise estimate (smaller sample size or more variability) |
| CI for a difference excludes 0 | Statistically significant difference |
| CI for a difference includes 0 | Not statistically significant |
Example: Study finds mean blood pressure reduction of 10 mmHg (95% CI: 5–15 mmHg).
- Interpretation: The true reduction plausibly lies between 5 and 15 mmHg
- Since the CI doesn't include 0, the effect is statistically significant
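The formula 95% CI = mean ± 1.96 × SE can be sketched as follows. The lesson's example does not report an SD or sample size, so the values SD = 20 mmHg and n = 64 are hypothetical, chosen only because they give an interval close to 5–15 mmHg.

```python
import math

def ci95(sample_mean, sd, n):
    """95% confidence interval for a mean, using the normal approximation."""
    se = sd / math.sqrt(n)  # standard error of the mean
    margin = 1.96 * se
    return sample_mean - margin, sample_mean + margin

# Mean reduction of 10 mmHg; SD and n are assumed values for illustration
low, high = ci95(10.0, 20.0, 64)
excludes_zero = not (low <= 0 <= high)

print(f"95% CI: {low:.1f} to {high:.1f} mmHg, excludes 0: {excludes_zero}")
```

Quadrupling n halves the standard error, which is why larger samples give narrower intervals.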
Effect Size
Effect size quantifies the magnitude of a difference and, unlike a p-value, does not shrink or grow simply because the sample size changes.
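One widely used effect size for comparing two group means is Cohen's d: the difference in means divided by the pooled standard deviation. The sketch below uses that standard definition; the function name and both data sets are made up for illustration.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical outcome scores for two small groups
treated = [12, 14, 15, 11, 13]
control = [9, 10, 12, 8, 11]

d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f}")
```

Because d is expressed in standard deviation units, it can be compared across studies that used different measurement scales.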
| Measure | What it shows | Range |
|---|---|---|
| Cohen's d |
Part 4: Correlation, Causation & Study Design
Part 4 of 4 — Correlation vs Causation & Study Design Implications
Correlation Coefficient (r)
Measures the strength and direction of the linear relationship between two variables.
r = -1 → Perfect negative correlation
r = 0 → No linear correlation
r = +1 → Perfect positive correlation
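A minimal sketch of how r is computed (Pearson's formula: covariance divided by the product of the standard deviations) shows all three landmark values. The data sets are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

dose = [1, 2, 3, 4, 5]

r_positive = pearson_r(dose, [2, 4, 6, 8, 10])  # rises in a straight line
r_negative = pearson_r(dose, [10, 8, 6, 4, 2])  # falls in a straight line
r_zero = pearson_r(dose, [4, 1, 0, 1, 4])       # U-shaped, not linear

print(r_positive, r_negative, r_zero)
```

The third data set follows a perfect U-shaped pattern yet gives r = 0, a reminder that r captures only linear association.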
| r Value | Interpretation |
|---|---|
| ±0.0–0.3 | Weak correlation |
| ±0.3–0.7 | Moderate correlation |
| ±0.7–1.0 | Strong correlation |
Critical: r close to ±1 does NOT prove causation!
Correlation ≠ Causation
Three mechanisms for correlation:
- Causation: X → Y (aspirin → reduced heart attack risk)
- Reverse Causation: Y → X (poor health status may lead to depression, rather than depression causing poor health)
- Confounding Variable: Z → both X and Y (smoking → both yellow teeth AND lung cancer)
Example: Ice cream sales correlate with drowning deaths.
- Confounder: Summer heat drives both (neither causes the other)
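The ice cream example can be imitated with simulated data. In this sketch every number is invented: temperature drives both variables, and neither affects the other. Correlating the residuals left after a straight-line fit on temperature is one simple way to adjust for the confounder; it is shown here as an illustration, not as the lesson's method.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )

def residuals(ys, xs):
    """What remains of ys after removing a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated confounder (daily temperature) and two outcomes that depend only on it
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r_raw = pearson_r(ice_cream, drownings)
r_adjusted = pearson_r(
    residuals(ice_cream, temperature), residuals(drownings, temperature)
)

print(f"raw r = {r_raw:.2f}, r after adjusting for temperature = {r_adjusted:.2f}")
```

The raw correlation is strong even though neither variable influences the other, and it largely disappears once temperature is accounted for.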
Study Design & Confounders
| Design | Controls Confounders? |
|---|---|