Sampling Methods
Simple random, stratified, cluster, and systematic sampling
Why Sample?
Sampling allows us to study a subset of a population to make inferences about the whole population. It's practical, economical, and often the only feasible approach.
Population: All individuals/items of interest
Sample: Subset selected for study
Goal: Use sample statistics to estimate population parameters
Simple Random Sample (SRS)
Definition: Every individual has the same probability of being selected, and every possible group of n individuals has the same probability of being the sample chosen.
How to obtain:
- Assign number to each population member
- Use random number generator or table
- Select corresponding individuals
Example: Select 50 students from 500 by randomly generating 50 different numbers between 1 and 500.
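A minimal Python sketch of this example, assuming the population is simply labeled 1 to 500. The helper name `simple_random_sample` is hypothetical. `random.sample` draws without replacement, so no label can repeat and every set of 50 labels is equally likely.

```python
import random


def simple_random_sample(population_size, n, seed=None):
    """Hypothetical helper: return n distinct labels from 1..population_size,
    chosen so that every set of n labels is equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so there are no repeats
    return sorted(rng.sample(range(1, population_size + 1), n))


chosen = simple_random_sample(500, 50, seed=1)
print(len(chosen), chosen[:5])
```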
Advantages: Unbiased, every member equally likely
Disadvantages: Requires complete population list, may not represent subgroups well
Stratified Random Sampling
Method:
- Divide population into homogeneous groups (strata)
- Take SRS from each stratum
- Combine samples
When to use: Want guaranteed representation from each subgroup
Example: School has 40% freshmen, 30% sophomores, 20% juniors, 10% seniors. For sample of 100, randomly select 40 freshmen, 30 sophomores, 20 juniors, 10 seniors.
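The sample sizes in this example come from proportional allocation: each stratum gets n × (stratum size / population size). A short sketch follows. `proportional_allocation` is a hypothetical helper, and the largest-remainder rule for fractional shares is an assumption not stated in the example (here every share is already a whole number).

```python
def proportional_allocation(stratum_sizes, n):
    """Hypothetical helper: split a total sample of n across strata in
    proportion to their sizes. Shares are rounded down first; any leftover
    units go to the strata with the largest fractional remainders (an
    assumed rounding rule)."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(x) for name, x in exact.items()}  # round down
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Percentages from the example; only the proportions matter
school = {"freshmen": 40, "sophomores": 30, "juniors": 20, "seniors": 10}
print(proportional_allocation(school, 100))
# → {'freshmen': 40, 'sophomores': 30, 'juniors': 20, 'seniors': 10}
```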
Advantages: Ensures all strata are represented, gives more precise estimates when strata differ from one another, allows comparisons between groups
Disadvantages: Requires knowledge of strata, more complex
Cluster Sampling
Method:
- Divide population into clusters (heterogeneous groups)
- Randomly select some clusters
- Survey ALL members in selected clusters
When to use: Population geographically spread, no complete list available
Example: Select 5 random schools, survey all students in those 5 schools.
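A sketch of this one-stage cluster sample. The roster of 20 schools with 300 students each is invented purely for illustration, and `cluster_sample` is a hypothetical helper name. Note that the random step picks schools, not students: everyone in a chosen school is surveyed.

```python
import random


def cluster_sample(clusters, k, seed=None):
    """Hypothetical helper: pick k whole clusters at random (an SRS of
    cluster names) and return every member of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)
    return {name: clusters[name] for name in chosen}


# Hypothetical roster: 20 schools, each with 300 student IDs
schools = {
    f"school_{i:02d}": [f"s{i:02d}_{j:03d}" for j in range(1, 301)]
    for i in range(1, 21)
}
picked = cluster_sample(schools, 5, seed=42)
surveyed = [student for roster in picked.values() for student in roster]
print(sorted(picked), len(surveyed))  # 5 school names, 1500 students
```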
Key difference from stratified: In stratified sampling, you take a random sample from every group; in cluster sampling, you randomly select some whole groups and survey everyone in them.
Advantages: Practical, economical, reduces travel costs
Disadvantages: Usually less precise than an SRS of the same size; works well only when each cluster is a "mini-population" that resembles the whole
Systematic Sampling
Method:
- Calculate k = N/n (population size / sample size), rounding down if it is not a whole number
- Randomly select starting point (1 to k)
- Select every kth individual
Example: From 1000 students, want 100, so k = 1000/100 = 10. If the random start is 7, select students 7, 17, 27, 37, ..., 997.
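The procedure can be sketched in Python. `systematic_sample` is a hypothetical name, and the optional `start` argument exists only so the example above (start at 7) can be reproduced; normally the start is drawn at random.

```python
import random


def systematic_sample(population, n, start=None, seed=None):
    """Hypothetical helper: select every k-th member after a random start,
    with k = N // n. `start` is a 1-based position in 1..k; if omitted,
    it is chosen at random."""
    k = len(population) // n  # sampling interval, rounded down
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both ends
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # 1-based positions start, start + k, ..., start + (n - 1) * k
    return [population[pos - 1] for pos in range(start, start + n * k, k)]


students = list(range(1, 1001))  # students labeled 1..1000
picked = systematic_sample(students, 100, start=7)
print(picked[:4], picked[-1], len(picked))  # [7, 17, 27, 37] 997 100
```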
Advantages: Easy to implement, spreads sample across population
Disadvantages: Problems if list has hidden patterns or cycles
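To see why hidden cycles matter, here is a contrived roster (an assumption for illustration only) listed in teams of 10 with the manager first. With k = 10, every possible start yields either all managers or no managers at all.

```python
# Hypothetical roster: 100 teams of 10, each team listed manager first
roster = ["manager" if i % 10 == 0 else "worker" for i in range(1000)]
k = 1000 // 100  # interval for a systematic sample of 100

# Count managers picked for every possible random start 1..k
managers_by_start = {
    start: roster[start - 1::k].count("manager") for start in range(1, k + 1)
}
print(managers_by_start)
# → {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0}
```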
Comparing Methods
Use SRS when: Simplest approach, have complete list
Use Stratified when: Subgroups matter, want comparisons
Use Cluster when: Geographic spread, practical constraints
Use Systematic when: Have ordered list, want efficiency
Sampling Bias
Selection Bias: Some individuals more likely to be selected
Voluntary Response: Individuals choose whether to respond, so those with strong opinions are overrepresented
Undercoverage: Some groups excluded from sampling frame
Nonresponse: Selected individuals don't participate
Avoid bias: Use random selection, ensure complete sampling frame, maximize response rate
Key Principles
✓ Randomization reduces bias
✓ Larger samples generally better (but quality > quantity)
✓ Representative samples crucial for valid inference
✓ Response rate matters (low response = nonresponse bias)
Remember: Good sampling is the foundation of statistical inference. A biased sample, no matter how large, leads to invalid conclusions!
📚 Practice Problems
Problem 1 (easy)
❓ Question:
A principal wants to survey 50 students from a high school of 500 students. Describe how to select a simple random sample (SRS).
💡 Solution:
Step 1: Understand Simple Random Sample (SRS) requirements
- Every student must have an equal probability of being selected
- Every group of 50 students must have an equal probability of being the sample
Step 2: Assign numbers to all students
- Number all 500 students from 001 to 500
- Use student ID numbers or assign numbers sequentially
Step 3: Use a random selection method
Option A: Random number generator
- Generate 50 random numbers between 1 and 500
- No repeats allowed
- Select students with those numbers
Option B: Random number table
- Pick starting point randomly
- Read 3-digit numbers
- Skip 000, numbers above 500, and repeats
- Continue until 50 students selected
Option C: Names in hat (physical)
- Not practical for 500, but conceptually valid
- Mix thoroughly, draw 50
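Option B can be sketched in code. The digit string in the usage line is made up for illustration and is not a row from a published random-number table; `select_from_digits` is a hypothetical helper.

```python
def select_from_digits(digits, population_size, n):
    """Hypothetical helper: read random digits in 3-digit chunks and keep
    labels 001..population_size, skipping 000, out-of-range labels, and
    repeats. Returns fewer than n labels if the digits run out."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop spaces
    chosen = []
    for i in range(0, len(digits) - 2, 3):
        label = int(digits[i:i + 3])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen


# Made-up digits for illustration (not from a published table)
print(select_from_digits("19283 74650 12000 49919 27775 01033", 500, 5))
# → [192, 465, 12, 499, 33]
```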
Step 4: Verify randomness
- Each student has probability 50/500 = 1/10 of being selected
- No systematic pattern in selection
- No human judgment involved
Answer: Number all 500 students from 001-500. Use a random number generator or table to select 50 unique numbers between 1 and 500. Survey the students corresponding to those numbers.
Problem 2 (easy)
❓ Question:
Explain why this is NOT a simple random sample: "To survey students, the principal stands at the main entrance and surveys the first 50 students who arrive at school."
💡 Solution:
Step 1: Identify the sampling method used
- This is a CONVENIENCE sample
- The principal selects students who are easy to reach, based on who arrives first
Step 2: Check SRS requirements
- For an SRS, every student must have an equal probability of selection
- For an SRS, selection must be random
Step 3: Identify problems with this method
Problem 1: Unequal probabilities
- Students who arrive early: HIGH probability of selection
- Students who arrive late: ZERO probability
- Not all students have equal chance
Problem 2: Systematic bias
- Early arrivers may be different from late arrivers
- Might be more studious, live closer, take the bus, etc.
- Different characteristics than general population
Problem 3: Not random
- Order of arrival determines selection
- Predictable pattern
- Could manipulate by arriving early/late
Step 4: Potential biases introduced
Early arrivers might:
- Be more organized/responsible
- Have different transportation
- Live closer to school
- Have different family situations
- Be more/less involved in activities
Results won't represent all students
Answer: This is NOT a simple random sample because not all students have equal probability of selection - only early arrivers can be chosen. It's a convenience sample that likely introduces bias, as early-arriving students may differ systematically from the general student population.
Problem 3 (medium)
❓ Question:
A university has 4,000 freshmen, 3,000 sophomores, 2,000 juniors, and 1,000 seniors. Design a stratified random sample of 200 students that maintains class proportions.
💡 Solution:
Step 1: Calculate the total population
Total = 4,000 + 3,000 + 2,000 + 1,000 = 10,000 students
Step 2: Find the proportion of each class
- Freshmen: 4,000/10,000 = 0.40 = 40%
- Sophomores: 3,000/10,000 = 0.30 = 30%
- Juniors: 2,000/10,000 = 0.20 = 20%
- Seniors: 1,000/10,000 = 0.10 = 10%
Step 3: Apply the proportions to the sample size (200 students)
- Freshmen: 200 × 0.40 = 80 students
- Sophomores: 200 × 0.30 = 60 students
- Juniors: 200 × 0.20 = 40 students
- Seniors: 200 × 0.10 = 20 students
Step 4: Verify
- 80 + 60 + 40 + 20 = 200 ✓
- 80/200 = 40% ✓, 60/200 = 30% ✓, 40/200 = 20% ✓, 20/200 = 10% ✓
Step 5: Select within each stratum
From each class, take a simple random sample:
- Randomly select 80 from 4,000 freshmen
- Randomly select 60 from 3,000 sophomores
- Randomly select 40 from 2,000 juniors
- Randomly select 20 from 1,000 seniors
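A sketch of Steps 3 and 5 together, assuming each class roster is represented by a block of hypothetical student ID numbers (1 to 4,000 for freshmen, and so on). The allocation is computed as 200 × class size / 10,000, which reproduces Step 3; then an SRS is drawn within each class and the results are combined.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical rosters: each class is a block of student ID numbers
strata = {
    "freshmen": list(range(1, 4001)),
    "sophomores": list(range(4001, 7001)),
    "juniors": list(range(7001, 9001)),
    "seniors": list(range(9001, 10001)),
}
N = sum(len(ids) for ids in strata.values())  # 10,000 students
n = 200

# Proportional allocation (Step 3): 200 x class size / 10,000
allocation = {name: n * len(ids) // N for name, ids in strata.items()}

# SRS within each class (Step 5), then combine
sample = {name: rng.sample(strata[name], size) for name, size in allocation.items()}
combined = [sid for ids in sample.values() for sid in ids]

print(allocation)     # {'freshmen': 80, 'sophomores': 60, 'juniors': 40, 'seniors': 20}
print(len(combined))  # 200
```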
Answer: Select 80 freshmen, 60 sophomores, 40 juniors, and 20 seniors using simple random sampling within each class. This maintains the 40%-30%-20%-10% class distribution.
Problem 4 (medium)
❓ Question:
A researcher wants to study student satisfaction across a large university with 30 dorms. She randomly selects 5 dorms and surveys ALL students in those 5 dorms. What sampling method is this? What are the advantages and potential problems?
💡 Solution:
Step 1: Identify the sampling method
This is CLUSTER SAMPLING
- Population divided into groups (clusters = dorms)
- Randomly select SOME clusters (5 dorms)
- Survey ALL individuals in selected clusters
Step 2: Advantages of cluster sampling
Advantage 1: Cost-effective
- Only need to visit 5 dorms, not 30
- Reduced travel time and expense
- Easier to administer
Advantage 2: Practical
- Complete list of students only needed for selected dorms
- Don't need a list of all students initially
- Can focus resources on selected areas
Advantage 3: Logistically simple
- Survey whole dorms at once
- Can hold dorm-wide meetings
- Easier coordination
Step 3: Potential problems
Problem 1: Clusters may not be representative
- Each dorm might have unique characteristics
- Honors dorm, freshman dorm, quiet dorm, party dorm
- Selected dorms might not represent all 30
Problem 2: Students within dorms are similar
- Dorm culture affects all residents
- Same facilities, RAs, rules
- Reduces variability (less information than an SRS)
Problem 3: Increased sampling error
- Generally less precise than an SRS of the same size
- Need a larger sample for the same precision
- Between-cluster variability matters
Problem 4: Risk of unlucky selection
- Could randomly select 5 unusual dorms
- With only 5 clusters, the risk is high
- Should select more clusters if possible
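The precision point can be illustrated with a small simulation. Everything about the population here is an assumption: 30 hypothetical dorms of 100 students, where each dorm has its own average satisfaction score. The sketch compares how much the estimated mean varies across repeated samples for an SRS of 500 students versus 5 whole dorms (also 500 students).

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical population: 30 dorms x 100 students. Each dorm gets its own
# average satisfaction score (dorm culture), and students vary around it.
dorms = []
for _ in range(30):
    dorm_mean = rng.gauss(6.0, 1.0)
    dorms.append([rng.gauss(dorm_mean, 1.0) for _ in range(100)])
everyone = [score for dorm in dorms for score in dorm]


def srs_mean(n):
    """Mean satisfaction from an SRS of n students."""
    return statistics.fmean(rng.sample(everyone, n))


def cluster_mean(k):
    """Mean satisfaction from k randomly chosen dorms, surveying everyone in them."""
    chosen = rng.sample(dorms, k)
    return statistics.fmean([s for dorm in chosen for s in dorm])


reps = 2000
srs_sd = statistics.stdev([srs_mean(500) for _ in range(reps)])
cluster_sd = statistics.stdev([cluster_mean(5) for _ in range(reps)])
print(f"SD of estimate, SRS of 500 students: {srs_sd:.3f}")
print(f"SD of estimate, 5 whole dorms:       {cluster_sd:.3f}")
```

Because students in the same dorm share that dorm's average, five dorms carry much less independent information than 500 separately chosen students, so the cluster estimate should show the larger standard deviation. With real data, the size of the gap depends on how different the dorms actually are.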
Step 4: When cluster sampling is best
Good when:
- Clusters are heterogeneous (mixed) internally
- Clusters are similar to each other
- Cost/logistics are major concerns
Bad when:
- Clusters are very different from each other
- Students within cluster are very similar
- High precision needed
Answer: Cluster sampling. Advantages: cost-effective, practical, easy logistics. Problems: dorms may differ systematically (honors vs. freshman dorm), students within dorms are similar (less variability), potentially higher sampling error than SRS. Best when cost matters more than precision.
Problem 5 (hard)
❓ Question:
Compare stratified random sampling and cluster sampling. When should you use each? Give examples where each would be preferred.
💡 Solution:
STRATIFIED RANDOM SAMPLING:
How it works:
- Divide population into homogeneous groups (strata)
- Take a random sample from EACH stratum
- Combine samples
Key: Sample from ALL groups, but not everyone in each group
Example strata: grade levels, income brackets, regions
CLUSTER SAMPLING:
How it works:
- Divide population into groups (clusters)
- Randomly select SOME clusters
- Survey ALL members (or a random sample of members) within the selected clusters
Key: Use only SOME groups, but everyone in selected groups
Example clusters: schools, city blocks, dorms
COMPARISON TABLE:
- Sample from all groups? Stratified: YES (every stratum). Cluster: NO (only selected clusters).
- Survey everyone in a selected group? Stratified: NO (random sample). Cluster: YES (all members).
- Within-group similarity: Stratified: HIGH (homogeneous strata). Cluster: LOW (heterogeneous clusters).
- Between-group differences: Stratified: HIGH (different strata). Cluster: LOW (similar clusters).
- Precision: Stratified: HIGHER (ensures representation). Cluster: LOWER (risk of unrepresentative clusters).
- Cost: Stratified: HIGHER (must visit all strata). Cluster: LOWER (visit only selected clusters).
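The Precision row can be checked with a simulation in the spirit of the rich/poor-area income example in this problem. The two-area city, its sizes, and its income distributions are all invented for illustration. The sketch compares the spread of the estimated mean income under an SRS of 200 and under a proportional stratified sample of 200.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical city (incomes in $1,000s): 8,000 residents of lower-income
# areas and 2,000 residents of higher-income areas.
strata = {
    "lower": [rng.gauss(35, 8) for _ in range(8000)],
    "higher": [rng.gauss(120, 25) for _ in range(2000)],
}
population = strata["lower"] + strata["higher"]
N, n = len(population), 200


def srs_estimate():
    """Mean income from an SRS of n residents."""
    return statistics.fmean(rng.sample(population, n))


def stratified_estimate():
    """Proportional stratified estimate: n_h = n * N_h / N residents from each
    stratum, with each stratum mean weighted by N_h / N."""
    estimate = 0.0
    for values in strata.values():
        n_h = n * len(values) // N
        estimate += len(values) / N * statistics.fmean(rng.sample(values, n_h))
    return estimate


reps = 2000
srs_sd = statistics.stdev([srs_estimate() for _ in range(reps)])
strat_sd = statistics.stdev([stratified_estimate() for _ in range(reps)])
print(f"SD of estimate, SRS:        {srs_sd:.2f}")
print(f"SD of estimate, stratified: {strat_sd:.2f}")
```

Stratifying removes the luck of how many higher-income residents land in the sample, which is where most of the SRS variability comes from when the strata differ this much.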
WHEN TO USE STRATIFIED:
- Subgroups are important. Example: testing a drug on different age groups, to ensure all ages are represented.
- Groups differ substantially. Example: an income study in a city with rich and poor areas, to get proportional representation.
- Precision is a priority. Example: political polling, which needs accurate estimates for each demographic.
- You have a good frame for every stratum. Example: an employee survey using department lists, where each group is easy to access.
WHEN TO USE CLUSTER:
- No natural strata. Example: households on city blocks; blocks are similar to one another, but households within a block vary.
- Cost/logistics are a major concern. Example: a door-to-door health survey, where it is cheaper to survey whole neighborhoods.
- A complete list is unavailable. Example: all residents of a city; you can list neighborhoods, but not every person.
- Groups are internally diverse. Example: schools in a district, where each school has a mix of students and represents the population well.
REAL EXAMPLES:
Stratified:
- Poll likely voters by party affiliation (Dem, Rep, Ind)
- Medical study ensuring males and females both represented
- University survey with proportional freshmen, soph, junior, senior
Cluster:
- WHO selecting villages in a developing country for a vaccination study
- Household surveys that randomly select city blocks and interview every household on them
- Agricultural study selecting random farms, testing all plots on each
Answer: Use stratified when groups differ and you want precise estimates ensuring all groups represented (costs more). Use cluster when groups are similar and cost/logistics matter more (less precise). Stratified samples from all groups; cluster samples all from selected groups.