Bias in Sampling and Surveys
Types of bias and how to minimize them
What is Bias?
Bias: A systematic tendency to overestimate or underestimate a population parameter.
Key point: Bias ≠ random error. Bias is a consistent deviation in one direction; random error varies from sample to sample and averages out.
Unbiased method: On average, gives the correct answer
Biased method: Systematically off; a larger sample does not reduce the error
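The contrast between bias and random error can be illustrated with a small simulation. This is only a sketch under assumed values: a hypothetical population with mean 50 and standard deviation 10, and a hypothetical offset `shift` standing in for a biased method. None of these numbers come from the notes.

```python
import random
import statistics

TRUE_MEAN = 50.0  # assumed population mean (illustrative only)

def simulate_estimates(n, shift, trials=200, seed=0):
    """Draw `trials` samples of size n; return (center, spread) of the sample means.

    `shift` is a hypothetical systematic offset added to every measurement;
    shift=0 models an unbiased method.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(TRUE_MEAN, 10.0) + shift for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.pstdev(means)

for n in (10, 1000):
    for shift in (0.0, 5.0):
        center, spread = simulate_estimates(n, shift)
        print(f"n={n:5d} shift={shift}: center={center:.2f}, spread={spread:.2f}")
```

Under these assumptions the spread of the estimates shrinks as n grows, while the biased method stays about 5 units off the true mean at every sample size.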
Types of Sampling Bias
1. Selection Bias
Definition: Some members of the population are systematically more or less likely to be selected than others.
Causes:
- Non-random sampling method
- Convenience sampling
- Judgment/purposive sampling
Examples:
- Survey only people at shopping mall (excludes non-shoppers)
- Online poll (excludes those without internet)
- Call only landlines (excludes cell-phone-only households)
Result: Sample not representative of population
Solution: Use random sampling methods
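As a rough illustration of the difference between a convenience sample and a simple random sample, the sketch below draws both from a hypothetical frame of 1,000 member IDs. The frame, the ID format, and the helper name `simple_random_sample` are assumptions made for this example.

```python
import random

def simple_random_sample(frame, k, seed=None):
    """Return k distinct units, each unit in the frame equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(list(frame), k)

# Hypothetical frame of 1,000 member IDs (illustrative only).
frame = [f"member-{i:04d}" for i in range(1000)]

convenience = frame[:50]                        # first 50 on the list: not random
srs = simple_random_sample(frame, 50, seed=42)  # every member has the same chance
```

Passing a seed makes the draw reproducible, which helps when the selection procedure needs to be documented or audited.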
2. Undercoverage
Definition: Some groups in population left out of sampling frame.
Sampling frame: List from which sample is drawn
Examples:
- Phone directory excludes unlisted numbers
- Email list excludes those without email
- Voter registration list excludes unregistered voters
Result: Missing groups lead to biased estimates
Solution: Use complete, up-to-date sampling frame that covers entire population
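The effect of an incomplete frame can be sketched with made-up numbers. The population below is hypothetical: 7,000 listed and 3,000 unlisted households, with assumed support rates of 40% and 70%. Even a properly random sample is biased when it is drawn from a frame that omits the unlisted group.

```python
import random

# Hypothetical population (all figures are assumptions for illustration).
listed = [True] * 2800 + [False] * 4200    # 40% support among listed households
unlisted = [True] * 2100 + [False] * 900   # 70% support among unlisted households
population = listed + unlisted

def proportion(values):
    """Share of True values in a list."""
    return sum(values) / len(values)

rng = random.Random(1)
true_p = proportion(population)                        # true support in the population
from_full = proportion(rng.sample(population, 2000))   # frame covers everyone
from_listed = proportion(rng.sample(listed, 2000))     # frame omits unlisted households

print(f"true={true_p:.3f} full frame={from_full:.3f} listed only={from_listed:.3f}")
```

In this setup the estimate from the listed-only frame settles near 40% rather than the true 49%, no matter how carefully the sample is drawn from that frame.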
3. Voluntary Response Bias
Definition: The sample consists of individuals who choose for themselves whether to participate, rather than being selected by the researcher.
Characteristics:
- Self-selection
- Those with strong opinions more likely to respond
- Usually overrepresents extreme views
Examples:
- Online polls where anyone can vote
- Call-in surveys
- Mail-back questionnaires (without follow-up)
- Social media polls
Result: Respondents not representative (tend to have stronger, more extreme opinions)
Solution: Use probability sampling where researcher selects participants
4. Nonresponse Bias
Definition: Selected individuals don't respond, and non-respondents differ from respondents.
Types:
- Unit nonresponse: Entire survey not completed
- Item nonresponse: Specific questions skipped
Examples:
- Mail survey with 20% response rate
- Phone survey where people don't answer
- Web survey where people start but don't finish
Result: If non-respondents differ systematically from respondents, estimates are biased
Solutions:
- Follow up with non-respondents
- Make survey convenient/appealing
- Keep it short
- Offer incentives (if appropriate)
- Compare respondent characteristics to population
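Comparing respondents to the population can be as simple as lining up group shares. The sketch below assumes hypothetical age-group data for respondents and assumed known population shares (for example, from a census); the function name `share_gaps` is made up for this example.

```python
from collections import Counter

def share_gaps(respondent_groups, population_shares):
    """Return respondent share minus population share for each group."""
    counts = Counter(respondent_groups)
    total = len(respondent_groups)
    return {
        group: round(counts[group] / total - share, 3)
        for group, share in population_shares.items()
    }

# Hypothetical data: age group of each respondent and assumed population shares.
respondents = ["18-34"] * 10 + ["35-64"] * 30 + ["65+"] * 60
population_shares = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

gaps = share_gaps(respondents, population_shares)
print(gaps)
```

Large gaps suggest that nonresponse bias is plausible. Small gaps on the characteristics checked do not prove its absence, since respondents may still differ on what the survey measures.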
Response Bias
Definition: Responses are systematically inaccurate because of how a question is asked or how respondents answer it.
1. Question Wording Bias
Loaded/leading questions suggest a particular answer:
- "Don't you agree that...?"
- "Like most Americans, do you support...?"
Emotionally charged language:
- "Should innocent babies be protected?" vs "Should abortion be legal?"
Solution: Use neutral, clear language
2. Question Order Bias
Earlier questions influence later responses
Example:
- Q1: "How satisfied are you with the president?"
- Q2: "How satisfied are you with the economy?"
Feelings about the president raised by Q1 may color answers to Q2
Solution: Randomize question order or carefully consider order effects
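One possible way to randomize question order is to shuffle a copy of the question list separately for each respondent. This is a minimal sketch: the function name, the `respondent_id` parameter, and the seeding scheme are assumptions, not a prescribed procedure.

```python
import random

QUESTIONS = [
    "How satisfied are you with the president?",
    "How satisfied are you with the economy?",
]

def question_order(questions, respondent_id, base_seed=2024):
    """Return a shuffled copy of the questions, reproducible per respondent."""
    rng = random.Random(base_seed * 1_000_003 + respondent_id)
    order = list(questions)  # copy so the master list is left untouched
    rng.shuffle(order)
    return order

for respondent_id in range(3):
    print(respondent_id, question_order(QUESTIONS, respondent_id))
```

Recording the order each respondent saw makes it possible to check afterwards whether answers differ by order.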
3. Response Option Bias
Limited or unbalanced options can bias results
Example:
- Only offering "Yes" or "No" when "Unsure" is valid
- 4 positive options, 1 negative option
Solution: Offer balanced, complete response options including "no opinion" when appropriate
4. Social Desirability Bias
Respondents give socially acceptable answers rather than truthful ones
Examples:
- Overreporting voting, recycling, charitable donations
- Underreporting illegal behavior, prejudice, embarrassing habits
Solutions:
- Anonymous surveys
- Neutral wording
- Indirect questioning
- Validation against records when possible
5. Interviewer Bias
Interviewer characteristics or behavior influence responses
Examples:
- Gender, race, age of interviewer affects responses to sensitive topics
- Interviewer tone, body language suggests preferred answer
- Recording errors
Solutions:
- Standardize interviewer training
- Use self-administered surveys when possible
- Monitor interviewer performance
6. Recall Bias
Inaccurate memory of past events
Examples:
- "How many times did you exercise last month?" (people forget)
- "What did you eat for lunch 3 days ago?"
Solution: Ask about recent, specific time periods; verify with records when possible
Other Survey Issues
1. Overcoverage
Sampling frame includes units not in target population
Example: List includes deceased people, duplicates, or out-of-scope units
Solution: Clean and update sampling frame regularly
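Cleaning a frame might look like the sketch below, which drops duplicate IDs and out-of-scope records. The record fields (`id`, `deceased`, `resident`) and the scope rule are hypothetical and would depend on the actual target population.

```python
def clean_frame(records, in_scope):
    """Drop duplicate IDs and out-of-scope records, keeping first occurrences."""
    seen = set()
    cleaned = []
    for record in records:
        if record["id"] in seen or not in_scope(record):
            continue
        seen.add(record["id"])
        cleaned.append(record)
    return cleaned

# Hypothetical frame with a duplicate, a deceased person, and a non-resident.
frame = [
    {"id": 1, "deceased": False, "resident": True},
    {"id": 1, "deceased": False, "resident": True},   # duplicate entry
    {"id": 2, "deceased": True,  "resident": True},   # deceased
    {"id": 3, "deceased": False, "resident": False},  # outside target population
    {"id": 4, "deceased": False, "resident": True},
]
cleaned = clean_frame(frame, lambda r: r["resident"] and not r["deceased"])
```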
2. Measurement Error
Inaccurate measurements of response variable
Causes:
- Poor question design
- Respondent misunderstanding
- Recording errors
- Equipment problems
Solution: Pilot test survey, train data collectors, use validated measures
3. Processing Error
Errors in data entry, coding, or analysis
Solution: Double-check data entry, use data validation, verify calculations
Reducing Bias: Best Practices
Sampling:
✓ Use probability sampling (random selection)
✓ Ensure complete, accurate sampling frame
✓ Maximize response rate
✓ Follow up with non-respondents
✓ Compare respondent characteristics to population
Survey Design:
✓ Use clear, neutral question wording
✓ Avoid leading or loaded questions
✓ Offer balanced, complete response options
✓ Consider question order effects
✓ Pilot test before full implementation
Data Collection:
✓ Train interviewers/data collectors
✓ Standardize procedures
✓ Consider anonymity for sensitive topics
✓ Verify data accuracy
✓ Document procedures
Impact of Bias
Key insight: A large sample does not fix bias!
- A small unbiased sample is generally more trustworthy than a large biased one
- Bias is systematic, so it does not average out as the sample grows
- After-the-fact statistical adjustments (such as weighting) cannot reliably remove bias
Example: 1936 Literary Digest poll
- Mailed 10 million ballots (huge sample!)
- Predicted Landon would beat Roosevelt
- Roosevelt won in landslide
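- Problem: Undercoverage and nonresponse bias (names came from sources such as telephone directories and car registration lists, which skewed toward wealthier households during the Depression; only about 24% of ballots were returned)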
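A quick calculation shows why sample size was not the problem. The sketch below uses the figures from the example (10 million ballots, 24% returned) and assumes, purely for illustration, that the returns could be treated as a simple random sample with p = 0.5, which gives the widest margin of error for a proportion.

```python
import math

mailed = 10_000_000          # ballots mailed (figure from the example)
response_rate = 0.24         # share returned (figure from the example)
n = mailed * response_rate   # about 2.4 million returned ballots

# Assumption: treat the returns as a simple random sample with p = 0.5.
margin = 1.96 * math.sqrt(0.5 * 0.5 / n)
print(f"n = {n:,.0f}, 95% margin of error = {margin * 100:.2f} percentage points")
```

Under that assumption the random sampling error would be well under a tenth of a percentage point, so an error large enough to pick the wrong winner has to come from bias rather than chance.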
Identifying Bias in Studies
When evaluating a study, ask:
- How were participants selected? (Random? Convenient?)
- What's the sampling frame? (Complete? Current?)
- What's the response rate? (High? Low?)
- How are questions worded? (Neutral? Leading?)
- Who conducted the survey? (Potential conflicts of interest?)
- How were data collected? (Method may introduce bias)
Quick Reference
Selection Bias: Non-random sampling
Undercoverage: Incomplete sampling frame
Voluntary Response: Self-selection
Nonresponse: Non-respondents differ from respondents (a risk that grows with a low response rate)
Question Wording: Leading/loaded questions
Social Desirability: Giving "acceptable" answers
Interviewer Bias: Interviewer influences responses
Recall Bias: Inaccurate memory
Key Principle: Use random selection, neutral questions, high response rate, careful measurement
Remember: No amount of sophisticated analysis can fix a biased sample. Preventing bias through good design is essential. When evaluating studies, always look for potential sources of bias before trusting the conclusions!