Ch 4 Stratified Random Sampling (STS)

- DEFN: A stratified random sample is obtained by separating the population units into non-overlapping groups, called strata, and then selecting a random sample from each stratum

Procedure

- Divide sampling frame into mutually exclusive and exhaustive strata
  - Assign each SU to one and only one stratum
- Select a random sample from each stratum
  - Select random sample from stratum 1
  - Select random sample from stratum 2

[Figure: population divided into Stratum 1 through Stratum H, indexed h = 1, 2, . . ., H]

Ag example

- Divide 3078 counties into 4 strata corresponding to regions of the country
  - Northeast (h = 1)
  - North central (h = 2)
  - South (h = 3)
  - West (h = 4)
- Select an SRS from each stratum
  - In this example, stratum sample size is proportional to stratum population size
  - The total sample of 300 is 9.75% of the 3078 counties
  - Each stratum sample size is 9.75% of the stratum population

Procedure 2

- Need to have a stratum value for each SU in the frame
  - Minimum set of variables in sampling frame: SU id, stratum assignment

Procedure 3

- Each stratum sample is selected independently of others
  - New set of random numbers for each stratum
  - Basis for deriving properties of estimators
- Design within a stratum
  - For Ch 4, we will assume an SRS is selected within each stratum
  - Can use any probability design within a stratum
  - Sample designs do not need to be the same across strata
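
The selection procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming an SRS without replacement in every stratum; the function name, the frame layout (a list of `(su_id, stratum)` pairs) and the stratum labels are hypothetical.

```python
import random

def select_stratified_sample(frame, n_by_stratum, seed=None):
    """Draw an independent SRS (without replacement) from each stratum.

    frame: list of (su_id, stratum) pairs, the minimum frame content
    n_by_stratum: dict mapping stratum label -> stratum sample size n_h
    """
    rng = random.Random(seed)
    # Group SU ids by their stratum assignment.
    strata = {}
    for su_id, stratum in frame:
        strata.setdefault(stratum, []).append(su_id)
    # Each call to rng.sample consumes fresh random numbers, so the
    # stratum samples are selected independently of one another.
    return {h: sorted(rng.sample(ids, n_by_stratum[h]))
            for h, ids in strata.items()}

# Hypothetical frame: 10 SUs in stratum "A", 6 SUs in stratum "B".
frame = [(i, "A") for i in range(10)] + [(i, "B") for i in range(10, 16)]
sample = select_stratified_sample(frame, {"A": 3, "B": 2}, seed=1)
```

Swapping `rng.sample` for another probability design in some strata would leave the rest of the sketch unchanged, which mirrors the point that designs need not match across strata.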

Uses for STS

- To improve representativeness of sample
  - In SRS, can get ANY combination of n elements in the sample
  - In SYS, we severely restricted the set to k possible samples
    - Can get bad samples
    - Less likely to get unbalanced samples if frame is sorted using a variable correlated with Y

Uses for STS 2

- To improve representativeness of sample (continued)
  - In STS, we also exclude samples
    - Explicitly choose strata to restrict possible samples
  - Improve chance of getting representative samples if we use strata to encourage spread across variation in population

Uses for STS 3

- To improve precision of estimates for population parameters
  - Achieved by creating strata so that
    - variation WITHIN stratum is small
    - variation AMONG strata is large
  - Uses same principle as blocking in experimental design
  - Improve precision of estimate for population parameter by obtaining precise estimates within each stratum

Uses for STS 4

- To study specific subpopulations
  - Define strata to be subpopulations of interest
  - Examples
    - Male v. female
    - Racial/ethnic minorities
    - Geographic regions
    - Population density (rural v. urban)
    - College classification
  - Can establish sample size within each stratum to achieve desired precision level for estimates of subpopulations

Uses for STS 5

- To assist in implementing operational aspects of survey
  - May wish to apply different sampling and data collection procedures for different groups
  - Agricultural surveys (sample designs)
    - Large farms in one stratum are selected using a list frame
    - Smaller farms belong to a second stratum, and are selected using an area sample
  - Survey of employers (data collection methods)
    - Large firms: mail survey, because information is too voluminous to get over the phone
    - Small firms: telephone survey

Estimation strategy

- Objective: estimate population total
- Obtain estimates for each stratum
  - Estimate stratum population total
    - Use SRS estimator for stratum total
  - Estimate variance of estimator in each stratum
    - Use SRS estimator for variance of estimated stratum total
- Pool estimates across strata
  - Sum stratum total estimates and variance estimates across strata
  - Variance formula justified by independence of samples across strata
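
The pooling strategy above can be sketched as follows, assuming an SRS without replacement in each stratum and at least two sampled units per stratum. The function name and the toy data are hypothetical.

```python
from statistics import mean, variance

def sts_total(samples, N):
    """Estimate the population total and its variance from a stratified SRS.

    samples: dict stratum -> list of observed y values (n_h >= 2 in each)
    N: dict stratum -> stratum population size N_h
    """
    t_hat = 0.0
    v_hat = 0.0
    for h, y in samples.items():
        n_h = len(y)
        t_hat += N[h] * mean(y)                       # SRS estimate of stratum total
        fpc = 1 - n_h / N[h]                          # finite population correction
        v_hat += N[h] ** 2 * fpc * variance(y) / n_h  # SRS variance of that estimate
    # Stratum variances simply add because the samples are independent.
    return t_hat, v_hat

# Hypothetical data: two strata.
samples = {"A": [2.0, 4.0, 6.0], "B": [10.0, 14.0]}
N = {"A": 30, "B": 10}
t_hat, v_hat = sts_total(samples, N)
se = v_hat ** 0.5
```

Dividing `t_hat` by the population size (here 40) gives the corresponding estimate of the population mean.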

Ag example 5

- Estimated total farm acres in US

Ag example 7

- Estimated variance for estimated total farm acres in US

Ag example 8

- Compare with SRS estimates

Estimation strategy 2

- Objective: estimate population mean
- Divide estimated total by population size
- OR equivalently,
  - Obtain estimates for each stratum
    - Estimate stratum mean with stratum sample mean
  - Pool estimates across strata
    - Use weighted average of stratum sample means with weights proportional to stratum sizes $N_h$

Ag example 9

- Estimated mean farm acres / county

Ag example 10

- Estimate variance of estimated mean farm acres / county

Notation

- Index set for stratum $h = 1, 2, \ldots, H$
  - $U_h = \{1, 2, \ldots, N_h\}$
  - $N_h$ = number of OUs in stratum h in the population
- Partition sample of size n across strata
  - $n_h$ = number of sample units from stratum h (fixed)
  - $\mathcal{S}_h$ = index set for sample belonging to stratum h

Notation 2

- Population sizes
  - $N_h$ = number of OUs in stratum h in the population
  - $N = N_1 + N_2 + \cdots + N_H$
- Partition sample of size n across strata
  - $n_h$ = number of sample units from stratum h
  - $n = n_1 + n_2 + \cdots + n_H$
  - The stratum sample sizes are fixed
    - In domain estimation, they are random
- For now, we will assume that the sampling unit (SU) is an observation unit (OU)

Notation 3

- Response variable
  - $Y_{hj}$ = characteristic of interest for OU j in stratum h
- Population and stratum totals

Notation 4

- Population and stratum means

Notation 5

- Population stratum variance

Notation 6

- SRS estimators for stratum parameters
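
One conventional way to write the quantities listed under Notation 3 to 6 is given below. The symbols $t$, $\bar{Y}_U$, $\bar{y}_h$ and $s_h^2$ are assumptions borrowed from common textbook usage and may differ from the symbols used in lecture. $\mathcal{S}_h$ denotes the sample index set for stratum h, written this way to keep it distinct from the stratum standard deviation $S_h$.

```latex
\begin{align*}
t_h &= \sum_{j \in U_h} Y_{hj}, &
t &= \sum_{h=1}^{H} t_h \\
\bar{Y}_{hU} &= \frac{t_h}{N_h}, &
\bar{Y}_U &= \frac{t}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{Y}_{hU} \\
S_h^2 &= \frac{1}{N_h - 1} \sum_{j \in U_h} \left(Y_{hj} - \bar{Y}_{hU}\right)^2 \\
\bar{y}_h &= \frac{1}{n_h} \sum_{j \in \mathcal{S}_h} Y_{hj}, &
\hat{t}_h &= N_h\,\bar{y}_h \\
s_h^2 &= \frac{1}{n_h - 1} \sum_{j \in \mathcal{S}_h} \left(Y_{hj} - \bar{y}_h\right)^2
\end{align*}
```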

STS estimators

- For population total

STS estimators 2

- For population mean

STS estimators 3

- For population proportion
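
Under the assumption of an SRS within each stratum, the three STS estimators and their estimated variances take the standard forms below. The subscript "str" and the symbols $\bar{y}_h$, $s_h^2$ and $\hat{p}_h$ (stratum sample mean, sample variance and sample proportion) are notational assumptions, not necessarily those used in lecture.

```latex
\begin{align*}
\hat{t}_{str} &= \sum_{h=1}^{H} N_h\,\bar{y}_h, &
\hat{V}(\hat{t}_{str}) &= \sum_{h=1}^{H} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h} \\
\bar{y}_{str} &= \frac{\hat{t}_{str}}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{y}_h, &
\hat{V}(\bar{y}_{str}) &= \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h} \\
\hat{p}_{str} &= \sum_{h=1}^{H} \frac{N_h}{N}\,\hat{p}_h, &
\hat{V}(\hat{p}_{str}) &= \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2 \left(1 - \frac{n_h}{N_h}\right) \frac{\hat{p}_h (1 - \hat{p}_h)}{n_h - 1}
\end{align*}
```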

Properties

- STS estimators are unbiased
  - Each estimate of stratum population mean or total is unbiased (from SRS)

Properties 2

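- Inclusion probability for SU j in stratum h
  - Definition in words: the probability that SU j in stratum h is selected into the sample
  - Formula: $\pi_{hj} = n_h / N_h$ (SRS within stratum)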

Properties 3

- In general, for any stratification scheme, STS will provide a more precise estimate of the population parameters (mean, total, proportion) than SRS
  - For example
- Confidence intervals
  - Same form (using $z_{\alpha/2}$)
  - Different CLT

Sampling weights

- Note that the STS estimator of a total weights each sampled value by $N_h / n_h$
- Sampling weight for SU j in stratum h: $w_{hj} = N_h / n_h$
- A sampling weight is a measure of the number of units in the population represented by SU j in stratum h
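
A small numerical sketch of the weight interpretation, assuming an SRS within each stratum so that the weight is $N_h / n_h$. The design and the data values are hypothetical.

```python
def sampling_weights(N, n):
    """Return w_h = N_h / n_h, the weight shared by every sampled SU in stratum h."""
    return {h: N[h] / n[h] for h in N}

# Hypothetical design with two strata.
N = {"A": 30, "B": 10}
n = {"A": 3, "B": 2}
w = sampling_weights(N, n)

# Hypothetical sample data: (stratum, y) for each sampled SU.
data = [("A", 2.0), ("A", 4.0), ("A", 6.0), ("B", 10.0), ("B", 14.0)]

sum_of_weights = sum(w[h] for h, _ in data)  # adds up to the population size N
t_hat = sum(w[h] * y for h, y in data)       # weighted sum = STS estimate of the total
```

Here each sampled SU in stratum A stands for 10 population units and each in stratum B for 5, so the weights sum to the 40 units in the population.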

Example

- Note: weights for each OU within a stratum are the same

Example 2

- Dataset from study

Sampling weights 2

- For STS estimators presented in Ch 4, sampling weight is the inverse inclusion probability: $w_{hj} = 1 / \pi_{hj} = N_h / n_h$

Defining strata

- Depends on purpose of stratification
  - Improved representativeness
  - Improved precision
  - Subpopulation estimates
  - Implementing operational aspects
- If possible, use factors related to variation in characteristic of interest, Y
  - Geography, political boundaries, population density
  - Gender, ethnicity/race, ISU classification
  - Size or type of business
- Remember
  - Stratum variable must be available for all OUs

Allocation strategies

- Want to sample n units from the population
- An allocation rule defines how n will be spread across the H strata and thus defines values for $n_h$
- Overview for estimating population parameters
  - Optimal allocation, with Neyman and proportional allocation as special cases

Allocation strategies 2

- Focus is on estimating parameter for entire population
  - We'll look at subpopulations later
- Factors affecting allocation rule
  - Number of OUs in stratum
  - Data collection costs within strata
  - Within-stratum variance

Proportional allocation

- Stratum sample size allocated in proportion to population size within stratum
- Allocation rule: $n_h = n N_h / N$
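
As a sketch, proportional allocation with rounding to whole units might look like the following. The stratum sizes are assumed values for the four regions of the Ag example, chosen to sum to the 3078 counties stated earlier; the notes themselves do not list them.

```python
def proportional_allocation(N, n):
    """Allocate n across strata in proportion to N_h, rounded to integers."""
    total = sum(N.values())
    return {h: round(n * N_h / total) for h, N_h in N.items()}

# Assumed counties per region (illustrative; only the total 3078 is given).
N = {"NE": 220, "NC": 1054, "S": 1382, "W": 422}
n_h = proportional_allocation(N, 300)
allocated = sum(n_h.values())
```

Rounding each stratum separately does not guarantee that the stratum sizes add back to n in general, so the sum is worth checking before fielding the design.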

Ag example 11

Proportional allocation 2

- Proportional allocation rule implies
  - Sampling fraction for stratum h is constant across strata: $n_h / N_h = n / N$
  - Inclusion probability is constant for all SUs in population: $\pi_{hj} = n / N$
  - Sampling weight for each unit is constant: $w_{hj} = N / n$

Proportional allocation 3

- STS with proportional allocation leads to a self-weighting sample
- What is a self-weighting sample?
  - If $w_{hj}$ has the same value for every OU in the sample, a sample is said to be self-weighting
  - Since each weight is the same, each sample unit represents the same number of units in the population
- For self-weighting samples, estimator for population mean reduces to the sample mean
- Estimator for variance does NOT necessarily reduce to SRS estimator for variance of the sample mean

Proportional allocation 4

- Check to see that an STS with proportional allocation generates a self-weighting sample
  - Is the sample weight $w_{hj}$ the same for each OU?
  - Is the estimator for the population mean equal to the sample mean?
  - What happens to the variance of the estimator?

Ag example 12

- Even though we have used proportional allocation, rounding in setting sample sizes can lead to unequal (but approximately equal) weights
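
The effect of rounding on the weights can be checked directly. The stratum population and sample sizes below are assumed values that sum to 3078 and 300 respectively; they are illustrative and not taken from these notes.

```python
# Assumed stratum population sizes and rounded proportional sample sizes.
N = {"NE": 220, "NC": 1054, "S": 1382, "W": 422}
n = {"NE": 21, "NC": 103, "S": 135, "W": 41}

# Sampling weight per stratum, w_h = N_h / n_h.
weights = {h: N[h] / n[h] for h in N}

# Weight that an exactly self-weighting sample would have, N / n.
target = sum(N.values()) / sum(n.values())

# Largest departure of any stratum weight from the self-weighting value.
max_gap = max(abs(w - target) for w in weights.values())
```

With these values every stratum weight lands within about 0.25 of N / n = 10.26, which is the "approximately equal" behavior described above.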

Neyman allocation

- Suppose within-stratum variances $S_h^2$ vary across strata
- Stratum sample size allocated in proportion to
  - Population size within stratum $N_h$
  - Population standard deviation within stratum $S_h$
- Allocation rule: $n_h = n \, N_h S_h / \sum_{k=1}^{H} N_k S_k$
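
A minimal sketch of Neyman allocation, with hypothetical stratum sizes and standard deviations (in practice $S_h$ must be guessed from prior data or a pilot study). Results are left unrounded.

```python
def neyman_allocation(N, S, n):
    """Allocate n with n_h proportional to N_h * S_h (unrounded)."""
    products = {h: N[h] * S[h] for h in N}
    total = sum(products.values())
    return {h: n * products[h] / total for h in N}

# Hypothetical strata: a large, homogeneous one and a small, variable one.
N = {"low": 400, "high": 100}
S = {"low": 5.0, "high": 20.0}
alloc = neyman_allocation(N, S, 60)
```

Proportional allocation would split the same n = 60 as 48 and 12; Neyman allocation gives 30 and 30 because the smaller stratum is four times as variable.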

Caribou survey example

Optimal allocation

- Suppose data collection costs $c_h$ vary across strata
- Let C = total budget
  - $c_0$ = fixed costs (office rental, field manager)
  - $c_h$ = cost per SU in stratum h (interviewer time, travel cost)
- Express budget constraint as $C = c_0 + \sum_{h=1}^{H} c_h n_h$ and determine $n_h$

Optimal allocation 2

- Assume general case: stratum population sizes, stratum variances, and stratum data collection costs vary across strata
- Sample size is allocated to strata in proportion to
  - Stratum population size $N_h$
  - Stratum standard deviation $S_h$
  - Inverse square root of stratum data collection costs, $1 / \sqrt{c_h}$
- Allocation rule: $n_h = n \, \dfrac{N_h S_h / \sqrt{c_h}}{\sum_{k=1}^{H} N_k S_k / \sqrt{c_k}}$

Optimal allocation 3

- Obtain this formula by finding $n_h$ such that the variance of the estimator is minimized given cost constraints
- The optimal stratum allocation will generate the smallest variance of the estimator for a given stratification and cost constraint
- Sample size for stratum h ($n_h$) is larger in strata where one or more of the following conditions exist
  - Stratum size $N_h$ is large
  - Stratum variance $S_h^2$ is large
  - Stratum per-unit data collection costs $c_h$ are small

Welfare example

- Objective
  - Estimate fraction of welfare participant households in NE Iowa that have access to a reliable vehicle for work
- Sample design
  - Frame: welfare participant list
  - Stratum 1: Phone
    - $N_1 = 4500$ households, $p_1 = 0.85$, $c_1 = 100$
  - Stratum 2: No phone
    - $N_2 = 500$ households, $p_2 = 0.50$, $c_2 = 300$
  - Sample size n = 500

Welfare example 2

- Optimal allocation with phone strata
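
The allocation for the phone strata can be sketched from the design values on the previous slide. One assumption is added here: the stratum standard deviation for a proportion is approximated by $\sqrt{p_h (1 - p_h)}$. The function and label names are hypothetical.

```python
from math import sqrt

def optimal_allocation(N, S, c, n):
    """Allocate n with n_h proportional to N_h * S_h / sqrt(c_h) (unrounded)."""
    score = {h: N[h] * S[h] / sqrt(c[h]) for h in N}
    total = sum(score.values())
    return {h: n * score[h] / total for h in N}

N = {"phone": 4500, "no_phone": 500}
p = {"phone": 0.85, "no_phone": 0.50}
c = {"phone": 100, "no_phone": 300}

# Assumption: S_h is approximated by sqrt(p_h * (1 - p_h)) for a proportion.
S = {h: sqrt(p[h] * (1 - p[h])) for h in p}

alloc = optimal_allocation(N, S, c, 500)
variable_cost = sum(c[h] * alloc[h] for h in alloc)  # excludes fixed cost c0
```

Under these assumptions roughly 459 of the 500 interviews go to the phone stratum and 41 to the no-phone stratum: the no-phone stratum is more variable but also smaller and three times as costly per unit.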

Optimal allocation 4

- Proportional and Neyman allocation are special cases of optimal allocation
- Neyman allocation
  - Data collection costs per sample unit $c_h$ are approximately constant across strata
    - Example: telephone survey of US residents with regional strata
  - $c_h$ term cancels out of optimal allocation formula

Optimal allocation 5

- Proportional allocation
  - Data collection costs per sample unit $c_h$ are approximately constant across strata
  - Within-stratum variances are approximately constant across strata
    - Example: Y = number of persons per household is relatively constant across regions
  - $c_h$ and $S_h$ terms drop out of allocation formula

Subpopulation allocation

- Suppose main interest is in estimating stratum parameters
  - Subpopulation (stratum) mean, total, proportion
- Define strata to be subpopulations
  - Estimate stratum population parameters
- Allocation rules derived from independent SRS within each stratum (subpopulation)
  - Equal allocation for equal stratum costs, variances
  - SRS sample size formulas when stratum variances change across strata

Subpopulation allocation 2

- Equal allocation
  - Assume
    - Desired precision levels for each subpopulation (stratum) are constant across strata
    - Stratum costs, stratum variances equal across strata
    - Stratum FPCs near 1
  - Allocation rule is to divide n equally across the H strata (subpopulations): $n_h = n / H$
  - If $N_h$ vary much, equal allocation will lead to less precise estimates of parameters for full population

Welfare example 3

- Suppose we wanted to estimate proportion of welfare households that have access to a car for households in each of three subpopulations in NE Iowa
  - Metropolitan county
  - Counties adjacent to metropolitan county
  - Counties not adjacent to metro county

Welfare example 4

- Equal allocation with population density strata

Subpopulation allocation 3

- More complex settings: if $S_h$ vary across strata, can use SRS formulas for determining stratum sample sizes, e.g., for stratum mean
  - Result is $n_h = z_{\alpha/2}^2 S_h^2 / e_h^2$ (ignoring the stratum FPC)
- May get sample sizes ($n_h$) that are too large or small relative to budget
  - Relax margin of error $e_h$ and/or confidence level $100(1-\alpha)\%$
  - Recalibrate stratum sample sizes to get desired sample size

Welfare example 5

- 95% CI, e = 0.10 for all population density strata
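
A sketch of the per-stratum calculation for a proportion, using the SRS sample size formula without the FPC. The stratum proportions are not given here, so the conservative value p = 0.5 is an assumption made purely for illustration.

```python
from math import ceil
from statistics import NormalDist

def srs_sample_size_for_proportion(p, e, conf=0.95):
    """SRS sample size for margin of error e on a proportion, ignoring the FPC."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

# Assumption: p = 0.5 (worst case) in every population density stratum.
n_h = srs_sample_size_for_proportion(p=0.5, e=0.10)
n_total = 3 * n_h  # three subpopulations, same requirement in each
```

With the worst-case p, each stratum needs 97 households, or 291 across the three density strata; a smaller assumed variance in any stratum would lower its requirement.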

Compromise allocations

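- Equal allocation: $n_h = n / H$
- Proportional allocation: $n_h = n N_h / N$
- Square root allocation: $n_h$ proportional to $\sqrt{N_h}$

[Figure: $n_h$ plotted against $N_h$ under each allocation]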

Square root allocation

- More SUs to small strata than proportional allocation
- Fewer SUs to small strata than equal allocation
- Variance for subpopulation estimates is smaller than under proportional allocation
- Variance for whole population estimates is smaller than under equal allocation

Compromise allocations 2

- May want to set
  - Minimum number of SUs in a stratum
  - Cap on max number of SUs in a stratum
- Rule
  - $n_h = \min n_h$ for $N_h < A$
  - $n_h = \max n_h$ for $N_h > B$
  - Apply rule in between A and B
    - Square root
    - Proportional

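[Figure: $n_h$ versus $N_h$, held at min $n_h$ for $N_h < A$ and at max $n_h$ for $N_h > B$, with the square root or proportional rule applied between A and B]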
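
One way to realize the min/max rule is to compute the base allocation and then clamp it; the thresholds A and B are then implied by where the base rule crosses the floor and the cap. This sketch assumes square root allocation as the base rule, and the stratum sizes and bounds are hypothetical.

```python
from math import sqrt

def clamped_sqrt_allocation(N, n, n_min, n_max):
    """Square root allocation, with each n_h clamped to [n_min, n_max]."""
    total = sum(sqrt(N_h) for N_h in N.values())
    return {h: min(n_max, max(n_min, round(n * sqrt(N_h) / total)))
            for h, N_h in N.items()}

# Hypothetical strata of very different sizes.
N = {"small": 25, "medium": 400, "large": 10000}
alloc = clamped_sqrt_allocation(N, n=125, n_min=10, n_max=60)
```

Before clamping the rule gives 5, 20 and 100; after clamping it gives 10, 20 and 60. Clamping changes the total sample size, so n usually has to be redistributed or the budget revisited afterwards.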

Welfare example 6

- Comparing equal, proportional and square root allocation
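
The three rules can be compared side by side with a small helper. The stratum labels follow the three population density subpopulations, but the stratum sizes and the total n are hypothetical values chosen so the arithmetic is easy to follow.

```python
from math import sqrt

def allocate(N, n, rule):
    """Allocate n across strata under 'equal', 'proportional' or 'sqrt' (unrounded)."""
    size = {"equal": lambda N_h: 1.0,
            "proportional": lambda N_h: float(N_h),
            "sqrt": lambda N_h: sqrt(N_h)}[rule]
    total = sum(size(N_h) for N_h in N.values())
    return {h: n * size(N_h) / total for h, N_h in N.items()}

# Hypothetical numbers of welfare households per density stratum.
N = {"metro": 3600, "adjacent": 900, "nonadjacent": 400}
results = {rule: allocate(N, 330, rule)
           for rule in ("equal", "proportional", "sqrt")}
for rule, alloc in results.items():
    print(rule, {h: round(v, 1) for h, v in alloc.items()})
```

For the smallest stratum the square root rule falls between the other two: more units than proportional allocation gives it, fewer than equal allocation.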

Other allocations

- Certainty stratum is used to guarantee inclusion in sample
  - Census (sample all) the units in a stratum
  - For certainty stratum h
    - Allocation: $n_h = N_h$
    - Inclusion probability: $\pi_{hj} = 1$
- Ad hoc allocations
  - The sample allocation does not have to follow any of the rules mentioned so far
  - However, you should determine the stratum allocation in relation to analysis objectives and operational constraints

Welfare example 7

- Ad hoc allocation

Determining sample size n

- Determine allocation using rule expressed in terms of relative sample size $n_h / n$
- Rewrite variance of the estimated total as a function of relative sample sizes (ignoring stratum FPCs)
- Sample size calculation based on margin of error e for population total

Determining sample size n 2

- Rewrite variance of the estimated mean as a function of relative sample sizes (ignoring stratum FPCs)
- Sample size calculation based on margin of error e for population mean
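
A sketch of the calculation for the mean. Ignoring the FPCs, the variance of the stratified mean is $\nu / n$ with $\nu = \sum_h (N_h / N)^2 S_h^2 / (n_h / n)$, so a margin of error e requires $n = z_{\alpha/2}^2 \nu / e^2$. The symbol $\nu$, the function name and the input values are assumptions made for illustration.

```python
from statistics import NormalDist

def total_sample_size_for_mean(N, S, rel, e, conf=0.95):
    """Total n for margin of error e on the stratified mean, ignoring FPCs.

    N: stratum sizes N_h; S: stratum standard deviations S_h
    rel: relative sample sizes n_h / n (should sum to 1)
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    N_total = sum(N.values())
    nu = sum((N[h] / N_total) ** 2 * S[h] ** 2 / rel[h] for h in N)
    return z ** 2 * nu / e ** 2

# Hypothetical design: two equal strata, equal allocation, S_h = 0.5.
N = {"a": 500, "b": 500}
S = {"a": 0.5, "b": 0.5}
rel = {"a": 0.5, "b": 0.5}
n_required = total_sample_size_for_mean(N, S, rel, e=0.10)
```

For a population total with margin of error e, the same function applies with e divided by N, since the total is N times the mean. The result should be rounded up and then split across strata with the chosen relative sizes.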

Welfare example 8

- Relative sample size for equal allocation: $n_h / n = 1 / H$
- Value of $\nu$
- For 95% CI with e = 0.1

STS Summary

- Choose stratification scheme
  - Scheme depends on objectives, operational constraints
  - Must know stratum identifier for each SU in the frame
- Set a design for each stratum
  - Design for each stratum: SRS, SYS, ...
- Determine n and $n_h$
- Select sample independently within each stratum
- Pool stratum estimates to get estimates of population parameters

