Transformations for Linearity
Linearizing nonlinear relationships
Transformations to Achieve Linearity
Why Transform?
Problem: Many relationships are nonlinear
Solution: Transform one or both variables to make relationship linear
Benefits:
- Can use linear regression tools
- Easier interpretation
- Better predictions
When to Transform
Indicators that a transformation is needed:
- Scatterplot shows curve (not line)
- Residual plot shows pattern (not random)
- Low r² despite clear relationship
Don't transform if:
- Relationship already linear
- Residual plot looks good
Common Transformations
For y:
- log(y): Exponential growth/decay
- √y: Moderate curve
- 1/y: Inverse relationship
For x:
- log(x): Logarithmic curve
- x²: Quadratic relationship
- √x: Moderate curve
Both:
- log(y) vs log(x): Power relationship
Exponential Model
Original relationship: y = ab^x
Curved scatterplot, exponential growth/decay
Transform: Take log of y
Becomes linear: log(y) = log(a) + x·log(b)
Regression: log(y) on x gives a linear relationship
Example: Population growth, compound interest, radioactive decay
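As a rough illustration of the exponential model, the sketch below regresses log(y) on x and then undoes the logs to recover a and b. The data are made up to follow y = 3·2^x exactly, and the helper name `fit_line` is an assumption for this example, not part of any course material.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up, noise-free data that follow y = 3 * 2^x (an assumption for illustration).
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Regress log10(y) on x: log(y) = log(a) + x * log(b).
slope, intercept = fit_line(xs, [math.log10(y) for y in ys])

# Undo the logs to recover the constants of the exponential model.
a_hat = 10 ** intercept  # estimate of a
b_hat = 10 ** slope      # estimate of b, the growth factor per unit of x
print(round(a_hat, 3), round(b_hat, 3))
```

With real, noisy data the recovered values would only approximate the true constants.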
Example 1: Exponential Transformation
Bacteria population over time:
Original data shows exponential growth (curved)
Transform: Calculate log(population) for each time
New scatterplot: log(population) vs time is linear!
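Regression: predicted log(population) = a + b(time)
Back-transform for predictions: predicted population = 10^(a + b(time)) when using log base 10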
Power Model
Original relationship: y = ax^p
Curved relationship
Transform: Take log of both
Becomes linear: log(y) = log(a) + p·log(x)
Regression: log(y) on log(x) gives a linear relationship
Example: Area vs radius, metabolic rate vs body mass
Example 2: Power Transformation
Planet orbital period vs distance from sun:
Both variables on logarithmic scale → linear!
Regression: predicted log(period) = a + b·log(distance)
Slope b ≈ 1.5 (Kepler's third law: T² ∝ d³, so T ∝ d^1.5)
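The planet example can be checked numerically. The sketch below uses approximate textbook values for six planets (semi-major axis in astronomical units, period in Earth years); the rounding of those values and the helper `fit_line` are assumptions made for this illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Approximate values: Mercury, Venus, Earth, Mars, Jupiter, Saturn.
distance_au = [0.387, 0.723, 1.000, 1.524, 5.203, 9.537]
period_years = [0.241, 0.615, 1.000, 1.881, 11.862, 29.457]

# Take logs of BOTH variables, then fit a straight line.
log_d = [math.log10(d) for d in distance_au]
log_t = [math.log10(t) for t in period_years]
slope, intercept = fit_line(log_d, log_t)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The slope should land very close to 1.5, and the intercept close to 0 because of the units chosen (Earth has distance 1 and period 1).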
Square Root and Squaring
√y transformation:
- Moderate upward curve
- Spread-increasing pattern
x² transformation:
- Quadratic relationship (parabola)
- Works only when the data lie on one side of the parabola's vertex
Example: Free-fall distance (d) vs time (t)
d = ½gt² suggests regressing d on t²
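A small sketch of the free-fall idea: square the predictor, then fit a line. The measurements are hypothetical and generated without noise from d = 4.9t² (g = 9.8 m/s²), and `fit_line` is an illustrative helper rather than a standard function.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical noise-free measurements: time in seconds, distance in meters.
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
distances = [4.9 * t ** 2 for t in times]

# Regress d on the transformed predictor t^2 instead of on t.
t_squared = [t ** 2 for t in times]
slope, intercept = fit_line(t_squared, distances)

# Since d = (g/2) * t^2, the slope estimates g/2.
g_estimate = 2 * slope
print(round(slope, 3), round(intercept, 3), round(g_estimate, 3))
```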
Choosing the Right Transformation
Trial and error approach:
- Try transformation
- Make scatterplot of transformed data
- Check residual plot
- Check r²
- If not linear, try different transformation
Guided approach:
- Exponential pattern → log(y)
- Power relationship → log-log
- Quadratic → x²
- Fan shape in residuals → log(y)
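The trial-and-error approach can be sketched as a loop over candidate transformations. The data here are invented to follow y = (2 + 0.5x)² exactly, so √y is the transformation that should win; the function name `r_squared` and the candidate labels are assumptions for this example. Treat r² only as a first screen, since the residual plot still has to be checked.

```python
import math

def r_squared(xs, ys):
    """Squared correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Invented curved data: y = (2 + 0.5x)^2, so sqrt(y) is exactly linear in x.
xs = list(range(1, 11))
ys = [(2 + 0.5 * x) ** 2 for x in xs]

# Each candidate maps to the (transformed x, transformed y) pair to be plotted.
candidates = {
    "y vs x": (xs, ys),
    "log(y) vs x": (xs, [math.log10(y) for y in ys]),
    "sqrt(y) vs x": (xs, [math.sqrt(y) for y in ys]),
    "log(y) vs log(x)": ([math.log10(x) for x in xs],
                         [math.log10(y) for y in ys]),
}

results = {name: r_squared(tx, ty) for name, (tx, ty) in candidates.items()}
best = max(results, key=results.get)

for name, value in results.items():
    print(f"{name}: r^2 = {value:.4f}")
print("Most linear candidate:", best)
```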
Interpreting Transformed Models
Log(y) on x:
Slope interpretation: "For each unit increase in x, y is multiplied by 10^b" (or e^b if natural log was used)
Example: Slope = 0.301 in log(population) vs time (log base 10)
"Each year, population multiplies by 10^0.301 ≈ 2"
(Population doubles each year)
Log(y) on log(x):
Slope interpretation: "A 1% increase in x is associated with approximately b% increase in y"
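Both slope interpretations reduce to short calculations. The sketch below assumes base-10 logs for the first case and uses a hypothetical log-log slope of b = 1.5 for the second.

```python
# Log(y) on x with base-10 logs: each unit increase in x multiplies y by 10^slope.
slope = 0.301
factor = 10 ** slope
print(f"Multiplier per unit of x: {factor:.3f}")

# Log(y) on log(x): a 1% increase in x multiplies y by 1.01^b.
b = 1.5  # hypothetical slope, chosen only for illustration
pct_change = (1.01 ** b - 1) * 100
print(f"A 1% increase in x changes y by about {pct_change:.2f}%")
```

The second result is close to, but not exactly, b percent, which is why the interpretation says "approximately".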
Back-Transformation
After fitting model on transformed data:
Make predictions on transformed scale, then back-transform
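Example: Model is predicted log(y) = a + bx (log base 10)
For x = 10: predicted log(y) = a + 10b, so ŷ = 10^(a + 10b)
Don't fit a line to the untransformed data and transform its predictions afterward; fit on the transformed scale, then back-transform.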
Checking the Transformation
Good transformation produces:
- Linear scatterplot
- Random residual plot
- Higher r²
- Roughly constant spread
Compare before/after:
- Original r² vs transformed r²
- Original residual plot vs transformed residual plot
Multiple Transformations
Sometimes try several:
Example: Comparing transformations for curved data
- log(y) vs x: r² = 0.85
- √y vs x: r² = 0.92
- y vs x²: r² = 0.78
Choose: √y vs x (highest r²), provided its residual plot also shows random scatter
Common Patterns and Transformations
| Pattern | Try |
|---------|-----|
| Exponential growth/decay | log(y) |
| Power relationship | log(y) and log(x) |
| Quadratic (parabola) | x² |
| Moderate upward curve | √y or √x |
| Spread increases with y | log(y) |
Residual Plot After Transformation
Must check! Transformation successful if:
- No pattern in residuals
- Random scatter around 0
- Constant spread
If still see pattern: Try different transformation
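One way to see what "pattern in residuals" means is to compare residuals before and after transforming. This sketch uses made-up data following y = 2^x exactly; the helpers `fit_line` and `residuals` are illustrative names, and with real data you would look at a plot rather than a string of signs.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys):
    """Observed minus predicted values for the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up exponential data (an assumption for illustration).
xs = list(range(9))
ys = [2 ** x for x in xs]

raw_resid = residuals(xs, ys)                               # y vs x
log_resid = residuals(xs, [math.log10(y) for y in ys])      # log(y) vs x

# A run of +, then -, then + is the curved pattern that signals a poor linear fit.
signs = "".join("+" if r > 0 else "-" for r in raw_resid)
print("Signs of residuals for y vs x:", signs)
print("Largest |residual| for log(y) vs x:", max(abs(r) for r in log_resid))
```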
Linearizable vs Non-linearizable
Linearizable: Can be made linear with transformation
- Exponential: y = ab^x
- Power: y = ax^p
- Quadratic with no linear term: y = a + cx² (regress y on x²)
Non-linearizable: Cannot be easily linearized
- Some periodic functions
- Complex curves
- May need nonlinear regression
Common Mistakes
❌ Not checking residual plot after transformation
❌ Back-transforming incorrectly
❌ Transforming when already linear
❌ Misinterpreting slope of transformed model
❌ Relying on r² alone to compare models before and after transforming y (the response variable is different, so let the residual plot decide)
Practical Considerations
Pros of transformation:
- Use simple linear methods
- Often theoretically motivated
- Can improve predictions
Cons of transformation:
- Harder to interpret
- Must back-transform for predictions
- Not all relationships linearizable
Alternative: Modern nonlinear regression (beyond AP Stats)
Example 3: Complete Transformation
Original: y vs x is curved (r² = 0.40, residuals show pattern)
Transform: Use log(y)
New: log(y) vs x is linear (r² = 0.95, random residuals)
Equation: predicted log(y) = a + bx
Interpretation: "Each unit increase in x multiplies y by 10^b" (e^b if natural log was used)
For prediction at x = 10: ŷ = 10^(a + 10b)
Quick Reference
Exponential (y = ab^x): Use log(y) vs x
Power (y = ax^p): Use log(y) vs log(x)
Quadratic: Use y vs x²
Goal: Linear scatterplot, random residuals, high r²
Check: Always examine residual plot of transformed data
Interpret carefully: Slopes mean different things after transformation
Remember: Transform to fix nonlinearity, but always check if transformation worked! Linear models are powerful when applied to appropriately transformed data.
📚 Practice Problems
Problem 1 (medium)
❓ Question:
A scatterplot of x vs y shows a curved exponential pattern. The residual plot for ŷ = a + bx is curved. Try plotting log(y) vs x. What pattern should you see if this transformation works?
💡 Solution
Step 1: Understand the original problem
- Scatterplot shows exponential curve (y = ae^(bx))
- Linear model residuals are curved
- Need to linearize the relationship
Step 2: Why try log(y) vs x?
Exponential relationship: y = ae^(bx)
Take the natural log of both sides: ln(y) = ln(a) + bx
This is LINEAR in x! (With log base 10 the relationship is still linear; the slope becomes b·log(e).)
Step 3: What to look for after transformation
If the log transformation is appropriate:
✓ Scatterplot of log(y) vs x should be LINEAR
✓ Residual plot should show RANDOM scatter
✓ No curved pattern in residuals
Step 4: How to check
- Create new variable: y' = log(y)
- Plot y' vs x (should be linear)
- Fit regression: ŷ' = b₀ + b₁x
- Check residual plot (should be random)
Step 5: Interpretation
After transformation:
- Can use linear regression on log(y) vs x
- To predict y when natural log was used: ŷ = e^(b₀ + b₁x)
- Or: ŷ = e^(b₀) × e^(b₁x)
Answer: After log transformation, the plot of log(y) vs x should show a LINEAR pattern, and residuals should be randomly scattered with no curve.
Problem 2 (hard)
❓ Question:
Data shows a power relationship: y = ax^b. What transformation will linearize this relationship?
💡 Solution
Step 1: Identify the relationship
Power model: y = ax^b (Example: area = πr², where b = 2)
Step 2: Apply log transformation to BOTH variables
Take log of both sides:
log(y) = log(a × x^b)
log(y) = log(a) + log(x^b)
log(y) = log(a) + b·log(x)
Step 3: Recognize linear form
Let: Y = log(y), X = log(x), A = log(a)
Then: Y = A + bX
This is LINEAR!
Step 4: How to transform
- Create Y = log(y)
- Create X = log(x)
- Plot Y vs X (should be linear)
- Fit regression: Ŷ = b₀ + b₁X
Step 5: Interpret coefficients
After regression:
- b₁ = power (exponent b)
- b₀ = log(a), so a = e^(b₀) (natural log) or a = 10^(b₀) (log base 10)
To predict original y:
ŷ = e^(b₀) × x^(b₁) [if using natural log]
ŷ = 10^(b₀) × x^(b₁) [if using log base 10]
Example: If Ŷ = 2 + 1.5X (using log base 10)
Then ŷ = 10² × x^1.5 = 100x^1.5
Answer: Take log of BOTH variables. Plot log(y) vs log(x), which linearizes power relationships.
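The conversion from a fitted log-log line back to a power model can be sketched as follows, using the example coefficients b₀ = 2 and b₁ = 1.5 with base-10 logs. The function name `power_model_from_loglog` and the test value x = 4 are hypothetical.

```python
def power_model_from_loglog(b0, b1, base=10.0):
    """Return (a, p) for y = a * x^p, given the fitted line log(y) = b0 + b1*log(x)."""
    a = base ** b0  # undo the log on the intercept
    p = b1          # the slope is already the exponent
    return a, p

a, p = power_model_from_loglog(2, 1.5)

# Predict y at a hypothetical x = 4 using the back-transformed model.
prediction = a * 4 ** p
print(a, p, prediction)
```

Pass `base=math.e` (after `import math`) if the regression used natural logs instead.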
Problem 3 (hard)
❓ Question:
After fitting y vs x, the residual plot fans out (variance increases). You try log(y) vs x and get a better residual plot. Why does this help?
💡 Solution
Step 1: Identify the original problem
Fan-shaped residuals mean:
- Variance increases with x
- Violates constant variance assumption
- Often occurs when y grows exponentially
Step 2: Why log(y) helps with variance
When y is exponential or multiplicative:
- Larger y values have larger variability
- Standard deviation is proportional to the mean
- log transformation STABILIZES variance
Mathematical reason: If the variance of y is proportional to the square of its mean μ:
Var(y) ∝ μ²
Then: Var(log(y)) ≈ Var(y)/μ² ≈ constant (delta method, a first-order Taylor approximation)
Step 3: Additional benefit
Log transformation often:
✓ Linearizes exponential relationships
✓ Stabilizes variance (fixes fan shape)
✓ Makes distribution more symmetric
✓ Reduces impact of outliers
Step 4: When to use log transformation
Use log(y) when you see:
- Exponential growth pattern
- Fan-shaped residuals
- Right-skewed distribution
- Multiplicative relationships
- Variance increases with mean
Step 5: Check after transformation
After using log(y):
- Residual plot should show equal spread
- No fan shape
- Random scatter
- Valid for inference
Answer: Log transformation stabilizes variance. When variance increases with mean (fan shape), log(y) typically has constant variance, fixing the heteroscedasticity problem.
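A small simulation can make the variance-stabilizing effect concrete. It assumes multiplicative noise, y = (mean level) × e^(noise), and deliberately reuses the same seeded noise draws for both groups so the comparison is exact; the mean levels 10 and 1000 and the noise size 0.2 are arbitrary choices for illustration.

```python
import math
import random
import statistics

# Seeded noise so the run is reproducible; reused for both groups on purpose.
rng = random.Random(0)
noise = [rng.gauss(0, 0.2) for _ in range(500)]

def simulate(mean_level):
    """Values with multiplicative error around a given level."""
    return [mean_level * math.exp(e) for e in noise]

small = simulate(10)
large = simulate(1000)

# On the original scale, spread grows in proportion to the level (fan shape).
sd_raw_ratio = statistics.stdev(large) / statistics.stdev(small)

# On the log scale, the spread is the same at both levels.
sd_log_small = statistics.stdev([math.log(y) for y in small])
sd_log_large = statistics.stdev([math.log(y) for y in large])

print(f"SD ratio on original scale: {sd_raw_ratio:.1f}")
print(f"SD of ln(y): {sd_log_small:.4f} vs {sd_log_large:.4f}")
```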
Problem 4 (medium)
❓ Question:
You fit log(y) = 2 + 0.5x using natural log. Predict y when x = 10.
💡 Solution
Step 1: Understand the model
Fitted equation: log(y) = 2 + 0.5x
This uses NATURAL LOG (ln)
Step 2: Predict log(y) for x = 10
log(y) = 2 + 0.5(10)
log(y) = 2 + 5
log(y) = 7
Step 3: Back-transform to get y
Since we used natural log (ln): ln(y) = 7
To solve for y, use exponential: y = e^7
Step 4: Calculate
y = e^7 ≈ 1,096.63
Step 5: Interpretation
"When x = 10, y is predicted to be approximately 1,097."
Important notes:
- Must back-transform using e^(predicted value)
- If using log₁₀, would use 10^(predicted value)
- Always specify which log was used!
Alternative form:
Original model: y = e^(2 + 0.5x) = e² × e^(0.5x) ≈ 7.39 × e^(0.5x)
When x = 10: y ≈ 7.39 × e^5 ≈ 1,097
Answer: y = e^7 ≈ 1,097
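The back-transformation in this problem can be written as a two-step function: predict on the log scale, then exponentiate with the matching base. The function name `predict_y` and its default arguments are assumptions chosen to mirror the fitted equation ln(y) = 2 + 0.5x.

```python
import math

def predict_y(x, b0=2.0, b1=0.5, base=math.e):
    """Predict y from a model fitted as log_base(y) = b0 + b1 * x."""
    log_y_hat = b0 + b1 * x   # step 1: prediction on the transformed scale
    return base ** log_y_hat  # step 2: back-transform with the same base

y_hat = predict_y(10)
print(f"Predicted y at x = 10: {y_hat:.2f}")

# If the same coefficients had come from a base-10 fit, the answer would differ a lot.
y_hat_base10 = predict_y(10, base=10.0)
print(f"Same coefficients with log base 10: {y_hat_base10:.0f}")
```

The gap between the two results shows why the base of the log must always be stated.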
Problem 5 (hard)
❓ Question:
A residual plot shows both curvature AND fan shape. What transformations might you try?
💡 Solution
Step 1: Identify TWO problems
- Curvature → nonlinear relationship
- Fan shape → non-constant variance
Need transformation that fixes BOTH!
Step 2: Try log(y) vs x
Often works for:
- Exponential relationships (fixes curve)
- Multiplicative error (fixes fan)
- Right-skewed data
Check result:
✓ Should be linear
✓ Should have constant variance
Step 3: If log(y) doesn't work completely
Try other transformations:
- √y vs x (square root)
- 1/y vs x (reciprocal)
- log(y) vs log(x) (both sides)
Step 4: Systematic approach
- Try log(y) vs x first (most common)
- Check residual plot
- If still curved, try log-log or other
- If variance still not constant, try different transformation
Step 5: Decision guide
Pattern → Try transformation:
- Exponential curve + fan → log(y) vs x
- Power relationship → log(y) vs log(x)
- Moderate curve → √y vs x
- Strong right skew → log(y)
Step 6: After transformation
Must verify:
✓ Scatterplot is linear
✓ Residuals randomly scattered
✓ Constant variance (no fan)
✓ Approximately normal residuals
Answer: Try log(y) vs x first, as it often fixes both curvature (exponential) and fan shape (non-constant variance). Check residual plot; if issues remain, try other transformations like √y or log-log.