Types of Data and Sampling
Categorical vs quantitative data, sampling methods
Introduction
Statistics is the science of collecting, organizing, analyzing, and interpreting data. Understanding the different types of data and proper sampling methods is fundamental to conducting valid statistical analyses.
Types of Data
Categorical vs. Quantitative
Categorical (Qualitative) Data:
- Describes characteristics or qualities
- Places individuals into categories
- Arithmetic on them is not meaningful, even when categories are coded as numbers
Examples:
- Eye color (blue, brown, green)
- Political party (Democrat, Republican, Independent)
- Type of car (sedan, SUV, truck)
- Opinion rating (agree, neutral, disagree)
Quantitative (Numerical) Data:
- Consists of numerical measurements or counts
- Can be added, averaged, or otherwise manipulated mathematically
Examples:
- Height (68 inches, 72 inches)
- Test score (85, 92, 78)
- Number of siblings (0, 1, 2, 3)
- Temperature (72°F, 85°F)
Discrete vs. Continuous
Within quantitative data, we distinguish:
Discrete Data:
- Countable values
- Usually whole numbers
- Often from counting
Examples:
- Number of students in a class (25, 30, 18)
- Number of cars owned (0, 1, 2, 3)
- Number of errors on a test (2, 5, 0)
Continuous Data:
- Can take any value in an interval
- Usually from measuring
- Infinitely many possible values between any two points
Examples:
- Height (5.7 feet, 5.75 feet, 5.752 feet...)
- Weight (142.3 lbs, 142.35 lbs...)
- Time (3.2 seconds, 3.25 seconds...)
Levels of Measurement
Understanding the level of measurement helps determine appropriate statistical analyses.
Nominal
Characteristics:
- Categories with no inherent order
- Most basic level
- Can only count frequencies
Examples:
- Blood type (A, B, AB, O)
- Gender (male, female, non-binary)
- Favorite color (red, blue, green)
Valid operations: Count, mode
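For nominal data, frequencies and the mode are the only meaningful summaries. A minimal sketch using Python's standard library, with a made-up list of blood types (the data are hypothetical):

```python
from collections import Counter
from statistics import multimode

# Hypothetical sample of blood types (nominal: no order, no arithmetic)
blood_types = ["O", "A", "O", "B", "A", "O", "AB"]

# Frequency of each category, most common first
counts = Counter(blood_types)
for category, freq in counts.most_common():
    print(category, freq)

# Mode: the most frequent category (multimode returns every tied category)
print(multimode(blood_types))  # ['O']
```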
Ordinal
Characteristics:
- Categories with meaningful order
- Differences between ranks not necessarily equal
- Cannot measure exact distance between values
Examples:
- Class rank (1st, 2nd, 3rd)
- Letter grades (A, B, C, D, F)
- Satisfaction rating (very satisfied, satisfied, neutral, dissatisfied)
Valid operations: Count, mode, median
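The median works for ordinal data because it only needs the order, not the distances. A minimal sketch, assuming the satisfaction scale above runs from "dissatisfied" (lowest) to "very satisfied" (highest); `median_category` is a hypothetical helper and the responses are made up:

```python
from statistics import median_low

# Ordered scale from lowest to highest satisfaction (assumed ordering)
SCALE = ["dissatisfied", "neutral", "satisfied", "very satisfied"]

def median_category(responses, scale=SCALE):
    """Median of ordinal responses, returned as a category on the scale."""
    ranks = [scale.index(r) for r in responses]  # position in the order
    # median_low always returns an actual rank, never a midpoint between two
    return scale[median_low(ranks)]

responses = ["satisfied", "neutral", "very satisfied", "satisfied", "dissatisfied"]
print(median_category(responses))  # satisfied
```

Averaging the ranks instead would treat the gaps between neighboring categories as equal, which ordinal data does not guarantee.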
Interval
Characteristics:
- Numerical scale with equal intervals
- No true zero point
- Zero doesn't mean "absence of"
Examples:
- Temperature in Celsius or Fahrenheit (0°F doesn't mean "no temperature")
- IQ scores
- Calendar years (the starting point of the calendar is arbitrary)
Valid operations: Count, mode, median, mean, addition/subtraction
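Because the zero point of an interval scale is arbitrary, ratios change when the unit changes, while means and differences stay consistent. A minimal sketch using two illustrative temperatures (40°F and 80°F):

```python
import math
from statistics import mean

def f_to_c(f):
    """Convert Fahrenheit to Celsius (both interval scales)."""
    return (f - 32) * 5 / 9

temps_f = [40.0, 80.0]
temps_c = [f_to_c(t) for t in temps_f]

# Ratios depend on the arbitrary zero: "twice as hot" in Fahrenheit is about 6x in Celsius
print(temps_f[1] / temps_f[0])   # 2.0
print(temps_c[1] / temps_c[0])   # approximately 6

# The mean converts consistently between the two scales
print(math.isclose(f_to_c(mean(temps_f)), mean(temps_c)))  # True
```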
Ratio
Characteristics:
- Numerical scale with equal intervals
- Has true zero point
- Zero means complete absence
- Can form ratios (twice as much, half as big)
Examples:
- Height (0 inches = no height)
- Weight (0 lbs = no weight)
- Age (0 years = newborn)
- Income (0 dollars = no money)
Valid operations: All of the above, plus multiplication and division (ratios)
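With a true zero, a statement like "twice as heavy" stays true in any unit. A minimal sketch converting two illustrative weights from pounds to kilograms (0.45359237 kg per pound is the exact international definition):

```python
import math

KG_PER_LB = 0.45359237  # exact definition of the international pound

def lb_to_kg(lb):
    return lb * KG_PER_LB

light, heavy = 100.0, 200.0
# 0 lb and 0 kg both mean "no weight", so the ratio survives the change of units
print(heavy / light)                        # 2.0
print(lb_to_kg(heavy) / lb_to_kg(light))    # 2.0 (up to floating-point rounding)
```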
Populations vs. Samples
Population
Definition: The entire group of individuals or items we want to study
Characteristics:
- Complete collection
- Often too large or expensive to study completely
- Denoted by N for size
Examples:
- All students in the United States
- All adults registered to vote in California
- Every car manufactured by Toyota in 2024
Parameters: Numerical characteristics of populations
- Population mean: μ (mu)
- Population standard deviation: σ (sigma)
- Population proportion: p
Sample
Definition: A subset of the population, selected for study
Characteristics:
- Ideally representative of the population
- Practical and economical to study
- Denoted by n for size
Examples:
- 500 randomly selected U.S. students
- 1,000 California voters surveyed
- 100 Toyota cars tested from 2024 production
Statistics: Numerical characteristics of samples
- Sample mean: x̄ (x-bar)
- Sample standard deviation: s
- Sample proportion: p̂ (p-hat)
Key relationship: We use statistics from samples to make inferences about parameters of populations.
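The parameter/statistic distinction can be made concrete with a simulation. A minimal sketch, assuming a simulated (not real) population of 10,000 heights drawn from a normal distribution with mean 68 inches and standard deviation 3; the cutoff of 70 inches for the proportion is arbitrary:

```python
import random
from statistics import mean, pstdev, stdev

rng = random.Random(2024)

# Hypothetical population of N = 10,000 heights in inches (simulated, not real data)
population = [rng.gauss(68, 3) for _ in range(10_000)]

# Parameters: computed from the whole population
mu = mean(population)
sigma = pstdev(population)                      # divides by N
p = sum(h > 70 for h in population) / len(population)

# Statistics: computed from an SRS of n = 100
sample = rng.sample(population, 100)
x_bar = mean(sample)
s = stdev(sample)                               # divides by n - 1
p_hat = sum(h > 70 for h in sample) / len(sample)

print(f"mu = {mu:.2f}   x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}   s = {s:.2f}")
print(f"p = {p:.3f}   p_hat = {p_hat:.3f}")
```

In practice only the sample is observed; the simulation simply makes the usually unknown parameters visible for comparison.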
Sampling Methods
Random Sampling
Simple Random Sample (SRS):
- Every individual has equal chance of selection
- Every possible group of size n has an equal chance of being the sample
- "Gold standard" of sampling
How to obtain:
- Assign numbers to all population members
- Use random number generator
- Select corresponding individuals
Example: Put all 500 student names in a hat, mix thoroughly, draw 50 names
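A random number generator plays the role of the hat. A minimal sketch, assuming a hypothetical numbered frame of 500 students; `simple_random_sample` is an illustrative helper, and Python's `random.sample` draws without replacement so that every subset of size n is equally likely:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n without replacement from a numbered sampling frame."""
    rng = random.Random(seed)
    # random.sample makes every subset of size n equally likely
    return rng.sample(frame, n)

# Hypothetical sampling frame: 500 students, numbered 1 to 500
students = [f"student_{i:03d}" for i in range(1, 501)]
chosen = simple_random_sample(students, 50, seed=42)
print(chosen[:5])
```

Fixing the seed only makes the draw reproducible; leave it as `None` for a fresh sample each run.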
Advantages:
- Unbiased
- Simple to understand
- Known probability of selection
Disadvantages:
- Requires complete list of population
- By chance, may under- or over-represent small subgroups
- Can be impractical for large populations
Stratified Random Sample
Method:
- Divide population into homogeneous groups (strata)
- Take SRS from each stratum
- Combine samples
Example: Divide school by grade level (9th, 10th, 11th, 12th), randomly sample 25 students from each grade
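The grade-level example amounts to an independent SRS inside each stratum. A minimal sketch, assuming a hypothetical school with 200 students per grade; `stratified_sample` is an illustrative helper that takes the same number from every stratum, as in the example:

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an independent SRS of n_per_stratum from each stratum, then combine."""
    rng = random.Random(seed)
    sample = []
    for stratum, members in strata.items():
        picked = rng.sample(members, n_per_stratum)
        sample.extend((stratum, m) for m in picked)
    return sample

# Hypothetical school with 200 students in each grade
school = {
    grade: [f"{grade}-{i:03d}" for i in range(200)]
    for grade in ("9th", "10th", "11th", "12th")
}
sample = stratified_sample(school, 25, seed=7)
print(len(sample))  # 100
```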
When to use:
- Want to ensure representation of subgroups
- Strata are internally similar but different from each other
- Interested in comparing groups
Advantages:
- Guarantees representation from each stratum
- More precise estimates than an SRS of the same size, when strata are internally similar
- Can compare strata
Disadvantages:
- Requires knowledge of population characteristics
- More complex than SRS
Cluster Sample
Method:
- Divide population into groups (clusters)
- Randomly select some clusters
- Study ALL individuals in selected clusters
Example: Divide city into neighborhoods (clusters), randomly select 5 neighborhoods, survey all households in those 5
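In cluster sampling the randomness is applied to whole groups, and every member of a chosen group is included. A minimal sketch, assuming a hypothetical city of 20 neighborhoods with varying numbers of households; `cluster_sample` is an illustrative helper:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical city: 20 neighborhoods with different numbers of households
city = {
    f"neighborhood_{k:02d}": [f"house_{k:02d}_{h}" for h in range(30 + 5 * k)]
    for k in range(1, 21)
}
selected = cluster_sample(city, 5, seed=3)
households = [h for members in selected.values() for h in members]
print(sorted(selected), len(households))
```

Only the list of neighborhoods is needed up front; the households are enumerated only inside the selected neighborhoods.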
When to use:
- No complete population list available
- Geographically dispersed population
- Cost-effective approach needed
Advantages:
- Practical and economical
- No need for complete population list
- Reduces travel/contact costs
Disadvantages:
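- Usually less precise than an SRS of the same size
- Works well only when each cluster is heterogeneous (like a mini-population); internally similar clusters make estimates less reliable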
Systematic Sample
Method:
- Select every kth individual from the list
- Random starting point
- k = N / n (population size / sample size)
Example: From 1000 students, select every 10th student (random start between 1 and 10), giving a sample of 100
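The 1,000-student example can be sketched in a few lines. This assumes a hypothetical numbered list and that N is a multiple of n (as it is here); `systematic_sample` is an illustrative helper:

```python
import random

def systematic_sample(frame, n, seed=None):
    """Every kth member of an ordered frame, starting at a random point among the first k.

    Assumes len(frame) is a multiple of n, as in the 1,000-student example.
    """
    k = len(frame) // n                 # sampling interval k = N / n
    rng = random.Random(seed)
    start = rng.randrange(k)            # random start: positions 0..k-1 (students 1..k)
    return frame[start::k]

students = list(range(1, 1001))         # hypothetical numbered list, N = 1000
sample = systematic_sample(students, 100, seed=5)
print(sample[:3], len(sample))
```

Once the start is drawn, the whole sample is fixed, which is why there are only k possible samples.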
When to use:
- Have organized list
- Want easy implementation
- No periodic pattern in the list that could line up with the interval
Advantages:
- Simple to implement
- Spreads sample across population
- Often as good as SRS
Disadvantages:
- Problems if list has hidden patterns
- Not an SRS: only k different samples are possible, so not every group of size n can be chosen
Sampling Bias
Types of Bias
Selection Bias:
- Some individuals more likely to be selected
- Sample not representative of population
Example: Surveying only people in a shopping mall (excludes those who don't shop there)
Voluntary Response Bias:
- Individuals choose to participate
- Often those with strong opinions respond
Example: Online poll where anyone can vote (those who care most will participate)
Undercoverage:
- Some groups systematically excluded
- Sampling frame incomplete
Example: Phone survey excludes those without phones
Nonresponse Bias:
- Selected individuals don't respond
- Respondents differ from non-respondents
Example: Survey with a 20% response rate, where the 80% who did not respond may differ systematically from those who did
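A simulation shows how a low response rate turns into bias when willingness to respond is linked to the answer. The numbers here are assumptions for illustration only: 30% of a hypothetical population support a proposal, supporters respond 40% of the time and non-supporters 10%. The expected response rate is 0.3 × 0.4 + 0.7 × 0.1 = 0.19, and supporters make up 0.12 / 0.19 ≈ 0.63 of the respondents, so the estimate roughly doubles the true 30%.

```python
import random

rng = random.Random(11)

# Hypothetical population: 30% support a proposal (True), 70% do not
population = [rng.random() < 0.30 for _ in range(100_000)]

# Assumed response rates: supporters reply more often than non-supporters
RESPONSE_RATE = {True: 0.40, False: 0.10}
respondents = [v for v in population if rng.random() < RESPONSE_RATE[v]]

p_true = sum(population) / len(population)
response_rate = len(respondents) / len(population)
p_hat = sum(respondents) / len(respondents)

print(f"response rate = {response_rate:.0%}")
print(f"true p = {p_true:.2f}, estimate from respondents = {p_hat:.2f}")
```

A larger initial sample would not fix this: the bias comes from who responds, not from how many.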
Best Practices
For Valid Sampling:
✓ Use random selection when possible
✓ Define population clearly
✓ Ensure sampling frame matches population
✓ Minimize nonresponse
✓ Watch for sources of bias
✓ Use stratification when subgroups matter
✓ Make sample size adequate for precision needed
Common Mistakes to Avoid:
❌ Convenience sampling (just because it's easy)
❌ Voluntary response (self-selection bias)
❌ Assuming bigger is always better (quality > quantity)
❌ Ignoring nonresponse
❌ Using outdated sampling frame
Quick Reference
Data Type Decision Tree:
- Is it numerical? → Quantitative (otherwise Categorical)
- Quantitative and countable? → Discrete (otherwise Continuous)
- Categorical with a meaningful order? → Ordinal (otherwise Nominal)
- Quantitative with a true zero? → Ratio (otherwise Interval)
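The level-of-measurement questions can be written as a small function. A minimal sketch; `level_of_measurement` is a hypothetical helper, and the categorical branch (ordered → ordinal, otherwise nominal) follows the Nominal and Ordinal definitions given earlier:

```python
def level_of_measurement(is_numerical, is_ordered=False, has_true_zero=False):
    """Classify a variable's level of measurement (hypothetical helper)."""
    if not is_numerical:
        # Categorical: order is what separates ordinal from nominal
        return "ordinal" if is_ordered else "nominal"
    # Quantitative: a true zero is what makes ratios meaningful
    return "ratio" if has_true_zero else "interval"

print(level_of_measurement(False))                        # nominal  (blood type)
print(level_of_measurement(False, is_ordered=True))       # ordinal  (letter grades)
print(level_of_measurement(True))                         # interval (temperature in °F)
print(level_of_measurement(True, has_true_zero=True))     # ratio    (weight)
```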
Sampling Method Selection:
- Want simplicity and have complete list → SRS
- Need to ensure subgroup representation → Stratified
- Population spread out geographically → Cluster
- Have organized list, want efficiency → Systematic
Remember: Good sampling is the foundation of valid statistical inference. A biased sample, no matter how large, leads to invalid conclusions!