Transformations to Achieve Linearity

Why Transform?

Problem: Many relationships are nonlinear

Solution: Transform one or both variables to make relationship linear

Benefits:

Can use linear regression tools
Easier interpretation
Better predictions

When to Transform

Signs that a transformation is needed:

Scatterplot shows curve (not line)
Residual plot shows pattern (not random)
Low r² despite clear relationship

Don't transform if:

Relationship already linear
Residual plot looks good

Common Transformations

For y:

log(y): Exponential growth/decay
√y: Moderate curve
1/y: Inverse relationship

For x:

log(x): Logarithmic curve
x²: Quadratic relationship
√x: Moderate curve

Both:

log(y) vs log(x): Power relationship

Exponential Model

Original relationship: $y = ab^x$

Curved scatterplot, exponential growth/decay

Transform: Take log of y (log means base 10 throughout these notes, which is why back-transformation uses powers of 10)

Becomes linear: $\log(y) = \log(a) + x\log(b)$

Regression: log(y) on x gives linear relationship

Examples: Population growth, compound interest, radioactive decay

Example 1: Exponential Transformation

Bacteria population over time:

Original data shows exponential growth (curved)

Transform: Calculate log(population) for each time

New scatterplot: log(population) vs time is linear!

Regression: $\log(\hat{y}) = 2 + 0.3x$

Back-transform for predictions:

$\hat{y} = 10^{2 + 0.3x}$
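The fit in Example 1 can be sketched in a few lines of Python. This is only an illustration: the data are made up (generated from $y = 100 \cdot 10^{0.3x}$ so the answer is known in advance), and `fit_line` is a hypothetical helper written here, not a library function.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data, not real measurements: y = 100 * 10**(0.3 * x).
hours = [0, 1, 2, 3, 4, 5]
population = [100 * 10 ** (0.3 * t) for t in hours]

# Step 1: transform y. Step 2: fit a line to (x, log10(y)).
log_population = [math.log10(p) for p in population]
intercept, slope = fit_line(hours, log_population)

# Recover the constants of y = a * b**x from the fitted line.
a = 10 ** intercept
b = 10 ** slope
print(f"log10(y-hat) = {intercept:.3f} + {slope:.3f} x")
print(f"y-hat = {a:.1f} * {b:.3f}**x")
```

Because the data were generated without noise, the fitted intercept and slope match the regression equation quoted in the example; real data would scatter around the line.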

Power Model

Original relationship: $y = ax^p$

Curved relationship

Transform: Take log of both

Becomes linear: $\log(y) = \log(a) + p\log(x)$

Regression: log(y) on log(x) gives linear relationship

Examples: Area vs radius, metabolic rate vs body mass

Example 2: Power Transformation

Planet orbital period vs distance from sun:

Both variables on logarithmic scale → linear!

Regression: $\log(\text{period}) = a + b\log(\text{distance})$

Slope b ≈ 1.5 (Kepler's third law: $\text{period} \propto \text{distance}^{1.5}$)
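As a rough check of the slope in Example 2, the log-log fit can be run on approximate published values for six planets (distance as semi-major axis in astronomical units, period in Earth years). The numbers are rounded and `fit_line` is a hypothetical helper defined here, so treat this as a sketch rather than a precise analysis.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Approximate values: (distance in AU, period in years).
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}

# Take log of both variables, then fit log(period) on log(distance).
log_distance = [math.log10(d) for d, _ in planets.values()]
log_period = [math.log10(p) for _, p in planets.values()]
intercept, slope = fit_line(log_distance, log_period)

print(f"log(period) = {intercept:.3f} + {slope:.3f} log(distance)")
```

The fitted slope is the estimated power in the power model, so a value close to 1.5 is what Kepler's third law predicts.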

Square Root and Squaring

√y transformation:

Moderate upward curve
Spread-increasing pattern

x² transformation:

Quadratic relationship (parabola)
Works only when the data lie on one side of the vertex (for example, x ≥ 0)

Example: Free-fall distance (d) vs time (t)

$d = \frac{1}{2}gt^2$ suggests regress d on t²
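The free-fall example can be sketched as follows. The distances are generated from the formula with an assumed g = 9.8 m/s², so they are illustrative rather than measured, and `fit_line` is a hypothetical helper defined in the block.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

G = 9.8  # m/s^2, assumed value used only to generate illustrative data
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]          # seconds
distances = [0.5 * G * t ** 2 for t in times]   # metres

# Transform x: regress d on t squared instead of on t.
t_squared = [t ** 2 for t in times]
intercept, slope = fit_line(t_squared, distances)

# Since d = (g/2) * t^2, the slope estimates g/2.
estimated_g = 2 * slope
print(f"d-hat = {intercept:.3f} + {slope:.3f} t^2, so g is about {estimated_g:.2f}")
```

Here the transformation is applied to the explanatory variable, so predictions come out directly in the original units of d and no back-transformation is needed.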

Choosing the Right Transformation

Trial and error approach:

Try transformation
Make scatterplot of transformed data
Check residual plot
Check r²
If not linear, try different transformation

Guided approach:

Exponential pattern → log(y)
Power relationship → log-log
Quadratic → x²
Fan shape in residuals → log(y)

Interpreting Transformed Models

Log(y) on x:

Slope interpretation: "For each unit increase in x, y is multiplied by $10^b$", where b is the slope and log is base 10

Example: Slope = 0.301 in log(population) vs time

"Each year, population multiplies by $10^{0.301} \approx 2$"

(Population doubles each year)

Log(y) on log(x):

Slope interpretation: "A 1% increase in x is associated with approximately b% increase in y"
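Both slope interpretations can be checked with a little arithmetic. The sketch below uses the slope 0.301 from the example above and, as an assumed illustration, a log-log slope of 1.5.

```python
# log(y) on x, base-10 logs: each unit increase in x multiplies y by 10**b.
b_exponential = 0.301
factor_per_unit = 10 ** b_exponential
print(f"multiplier per unit of x: {factor_per_unit:.3f}")

# log(y) on log(x): a 1% increase in x multiplies y by 1.01**b.
b_power = 1.5  # assumed slope, for illustration only
percent_change_in_y = (1.01 ** b_power - 1) * 100
print(f"a 1% increase in x changes y by about {percent_change_in_y:.2f}%")
```

The second result is close to, but not exactly, b percent, which is why the interpretation is worded "approximately".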

Back-Transformation

After fitting model on transformed data:

Make predictions on transformed scale, then back-transform

Example: Model is $\log(\hat{y}) = 2 + 0.3x$

For x = 10:

$\log(\hat{y}) = 2 + 0.3(10) = 5$

$\hat{y} = 10^5 = 100{,}000$

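The order matters: fit the line to the transformed data, predict on the log scale, then back-transform. The value 5 above is log(ŷ), not ŷ, and transforming predictions from a line fitted to the original data does not give the same result.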
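The two-step prediction above can be written as a small function. This is a minimal sketch: the function names are hypothetical, and the natural-log variant is an added assumption for readers whose software uses ln instead of log base 10.

```python
import math

def predict_from_log10_model(intercept, slope, x):
    """Predict y from a model fitted as log10(y) = intercept + slope * x."""
    log_prediction = intercept + slope * x  # prediction on the log scale
    return 10 ** log_prediction             # back-transform: undo log10

def predict_from_ln_model(intercept, slope, x):
    """Same idea when the model was fitted with natural logs (assumed case)."""
    return math.exp(intercept + slope * x)  # back-transform: undo ln

# The worked example: log(y-hat) = 2 + 0.3x at x = 10.
y_hat = predict_from_log10_model(2, 0.3, 10)
print(f"predicted y at x = 10: {y_hat:,.0f}")
```

Whichever base is used for the transformation must also be used for the back-transformation; mixing them gives the wrong prediction.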

Checking the Transformation

Good transformation produces:

Linear scatterplot
Random residual plot
Higher r²
Roughly constant spread

Compare before/after:

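Original r² vs transformed r² (only a rough guide when y itself was transformed, since the response variable is no longer the same)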
Original residual plot vs transformed residual plot

Multiple Transformations

Sometimes try several:

Example: Comparing transformations for curved data

log(y) vs x: r² = 0.85
√y vs x: r² = 0.92
y vs x²: r² = 0.78

Choose: √y vs x (highest r²), provided its residual plot also shows random scatter
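A comparison like the one above can be sketched by computing r² for each candidate. The data here are made up (generated from $y = (2 + 0.5x)^2$, so the square-root transformation is exactly linear), the r² values will not match the illustrative numbers in the example, and `r_squared` is a hypothetical helper.

```python
import math

def r_squared(xs, ys):
    """r^2 for the least-squares line of ys on xs (squared correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Made-up data, not real measurements: y = (2 + 0.5x)^2.
xs = list(range(1, 11))
ys = [(2 + 0.5 * x) ** 2 for x in xs]

# Each candidate is a (transformed x, transformed y) pair.
candidates = {
    "log(y) vs x": (xs, [math.log10(y) for y in ys]),
    "sqrt(y) vs x": (xs, [math.sqrt(y) for y in ys]),
    "y vs x^2": ([x ** 2 for x in xs], ys),
}

r2 = {name: r_squared(tx, ty) for name, (tx, ty) in candidates.items()}
best = max(r2, key=r2.get)

for name, value in r2.items():
    print(f"{name}: r^2 = {value:.4f}")
print(f"highest r^2: {best}")
```

Since the candidates do not all share the same response variable, the ranking by r² is a screening step; the residual plot of the chosen model still has to be checked.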

Common Patterns and Transformations

| Pattern | Try |
|---------|-----|
| Exponential growth/decay | log(y) |
| Power relationship | log(y) and log(x) |
| Quadratic (parabola) | x² |
| Moderate upward curve | √y or √x |
| Spread increases with y | log(y) |

Residual Plot After Transformation

Must check! Transformation successful if:

No pattern in residuals
Random scatter around 0
Constant spread

If a pattern remains: try a different transformation
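The residual check can be sketched numerically as well as graphically. The example below uses made-up exponential data ($y = 5 \cdot 2^x$) and a hypothetical `residuals` helper; it compares residuals from a line fitted to the raw data with residuals from a line fitted to log(y).

```python
import math

def residuals(xs, ys):
    """Residuals (observed minus fitted) from the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up data, not real measurements: y = 5 * 2**x.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [5 * 2 ** x for x in xs]

raw_res = residuals(xs, ys)                              # line fitted to y vs x
log_res = residuals(xs, [math.log10(y) for y in ys])     # line fitted to log(y) vs x

print("raw residuals:", [round(r, 1) for r in raw_res])
print("log residuals:", [round(r, 6) for r in log_res])
```

The raw residuals are positive at both ends and negative in the middle, the curved pattern that signals a poor linear fit. After the log transformation the residuals are essentially zero here because the data contain no noise; with real data the goal is random scatter around 0 rather than exact zeros.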

Linearizable vs Non-linearizable

Linearizable: Can be made linear with transformation

Exponential: y = ab^x
Power: y = ax^p
Quadratic: y = a + cx² (regress y on x²; a full quadratic with a bx term needs both x and x² as predictors)

Non-linearizable: Cannot be easily linearized

Some periodic functions
Complex curves
May need nonlinear regression

Common Mistakes

❌ Not checking residual plot after transformation
❌ Back-transforming incorrectly
❌ Transforming when already linear
❌ Misinterpreting slope of transformed model
❌ Treating r² before and after as directly comparable when y was transformed (different y variable, so let the residual plot decide)

Practical Considerations

Pros of transformation:

Use simple linear methods
Often theoretically motivated
Can improve predictions

Cons of transformation:

Harder to interpret
Must back-transform for predictions
Not all relationships linearizable

Alternative: Modern nonlinear regression (beyond AP Stats)

Example 3: Complete Transformation

Original: y vs x is curved (r² = 0.40, residuals show pattern)

Transform: Use log(y)

New: log(y) vs x is linear (r² = 0.95, random residuals)

Equation: $\log(\hat{y}) = 1.5 + 0.2x$

Interpretation: "Each unit increase in x multiplies y by $10^{0.2} \approx 1.58$"

For prediction at x = 10:

$\log(\hat{y}) = 1.5 + 0.2(10) = 3.5$

$\hat{y} = 10^{3.5} \approx 3162$

Quick Reference

Exponential (y = ab^x): Use log(y) vs x

Power (y = ax^p): Use log(y) vs log(x)

Quadratic: Use y vs x²

Goal: Linear scatterplot, random residuals, high r²

Check: Always examine residual plot of transformed data

Interpret carefully: Slopes mean different things after transformation

Remember: Transform to fix nonlinearity, but always check if transformation worked! Linear models are powerful when applied to appropriately transformed data.
