Two-Stage Cluster Sampling from Equal Clusters (PowerPoint presentation)

Provided by: barbara177

Two-Stage Cluster Sampling from Equal Clusters
Basic ideas
Two-stage sampling is a natural extension of
one-stage cluster sampling when a hierarchical
frame is used.
Two-stage sampling improves design efficiency
when the within-cluster (PSU) variance is small:
there is no reason to observe every element in a
cluster when the elements in the cluster are alike.

In the first stage, m clusters are randomly
sampled from the M clusters, and in the second
stage, $\bar n$ units are randomly sampled from the
$\bar N$ units in each selected cluster.
Simple two-stage cluster sampling serves as a
basic model for multi-stage sampling designs.

Number of possible samples
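The total number of possible samples is

$$\binom{M}{m}\binom{\bar N}{\bar n}^{m}$$

For example, there are $\binom{3}{2}\binom{3}{2}^{2} = 3 \times 9 = 27$
possible samples in a two-stage sample with m = 2
and $\bar n$ = 2 drawn from a population with M = 3
and $\bar N$ = 3.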
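A minimal Python sketch of the counting rule, assuming simple random sampling without replacement at both stages; `num_samples` is a hypothetical helper name. It multiplies the $\binom{M}{m}$ first-stage choices by $\binom{\bar N}{\bar n}$ independent choices in each of the m selected clusters.

```python
from math import comb  # Python 3.8+


def num_samples(M: int, m: int, N_bar: int, n_bar: int) -> int:
    """Count distinct two-stage samples: choose m of M clusters, then
    n_bar of the N_bar units independently within each chosen cluster."""
    return comb(M, m) * comb(N_bar, n_bar) ** m


print(num_samples(3, 2, 3, 2))  # 3 * 3**2 = 27
```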

Unbiased estimators
Under the simple two-stage sampling design
described above, unbiased estimates can be
obtained by applying the theory of simple random
sampling at each of the two stages.
$\bar x = \frac{1}{m\bar n}\sum_{i=1}^{m}\sum_{j=1}^{\bar n} x_{ij}$ (mean per unit in the sample) is an
unbiased estimate of $\bar X$ (mean per unit in the population).
$\bar N \bar x$ is an unbiased estimate of
$\bar N \bar X$ (mean per cluster in the population).
$x = M\bar N\bar x$ is an unbiased estimate of
X (population total).
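The three estimators differ only by constant multipliers of $\bar x$, as this Python sketch shows. The helper name `two_stage_estimates` and the sample values are hypothetical; equal subsample sizes are assumed.

```python
def two_stage_estimates(sample, M, N_bar):
    """Return (mean per unit, mean per cluster, population total) estimates.

    sample: list of m subsamples, each a list of n_bar observed values
    (equal subsample sizes, as in the design above).
    """
    m = len(sample)
    n_bar = len(sample[0])
    x_bar = sum(sum(s) for s in sample) / (m * n_bar)  # mean per unit
    return x_bar, N_bar * x_bar, M * N_bar * x_bar


# Hypothetical sample: 2 clusters, 2 units each, from M = 3 clusters of N_bar = 3 units
print(two_stage_estimates([[1, 6], [3, 9]], M=3, N_bar=3))
```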

Sampling variance
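The true sampling variance of $\bar x$ (mean per unit)
can be obtained by

$$\mathrm{Var}(\bar x) = \frac{M-m}{M}\cdot\frac{S_1^2}{m} + \frac{\bar N-\bar n}{\bar N}\cdot\frac{S_2^2}{m\bar n},$$

where $S_1^2 = \frac{1}{M-1}\sum_{i=1}^{M}(\bar X_i-\bar X)^2$ is the variance among clusters (first-stage
units) and $S_2^2 = \frac{1}{M(\bar N-1)}\sum_{i=1}^{M}\sum_{j=1}^{\bar N}(X_{ij}-\bar X_i)^2$ is the variance among
second-stage units within clusters ($X_{ij}$ is the value of unit j in cluster i and $\bar X_i$ is the mean of cluster i).

The sampling variance of $\bar N\bar x$ (mean per cluster)
is obtained by multiplying $\mathrm{Var}(\bar x)$ by $\bar N^2$.
The sampling variance of x (population total)
is obtained by multiplying $\mathrm{Var}(\bar x)$ by $(M\bar N)^2$.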
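A sketch of the true-variance formula $\mathrm{Var}(\bar x) = \frac{M-m}{M}\frac{S_1^2}{m} + \frac{\bar N-\bar n}{\bar N}\frac{S_2^2}{m\bar n}$, with $S_1^2$ computed among cluster means (divisor M - 1) and $S_2^2$ pooled within clusters (divisor $M(\bar N-1)$), followed by the $\bar N^2$ and $(M\bar N)^2$ scalings. The function name is hypothetical, and the illustrative population is the three-cluster one used in the example later in these notes.

```python
from statistics import mean, variance


def true_variances(pop, m, n_bar):
    """Exact sampling variances under simple two-stage sampling from equal clusters.

    pop: list of M clusters, each a list of N_bar unit values.
    Returns (Var of mean per unit, Var of mean per cluster, Var of total).
    """
    M, N_bar = len(pop), len(pop[0])
    S1_sq = variance([mean(c) for c in pop])  # among cluster means, divisor M - 1
    S2_sq = mean([variance(c) for c in pop])  # pooled within clusters, divisor M(N_bar - 1)
    v = (M - m) / M * S1_sq / m + (N_bar - n_bar) / N_bar * S2_sq / (m * n_bar)
    return v, N_bar ** 2 * v, (M * N_bar) ** 2 * v


pop = [[1, 6, 7], [2, 5, 8], [3, 4, 9]]  # small illustrative population
print(true_variances(pop, m=2, n_bar=2))
```

For this population the three variances are 91/108 (about 0.843), 91/12, and 68.25.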

Estimator of sampling variance
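An unbiased estimator of $\mathrm{Var}(\bar x)$ can be
obtained by substituting the following quantities
for $S_1^2$ and $S_2^2$:

$$\hat S_1^2 = s_1^2 - \frac{\bar N-\bar n}{\bar N}\cdot\frac{s_2^2}{\bar n}, \qquad \hat S_2^2 = s_2^2,$$

where

$$s_1^2 = \frac{1}{m-1}\sum_{i=1}^{m}(\bar x_i-\bar x)^2, \qquad s_2^2 = \frac{1}{m(\bar n-1)}\sum_{i=1}^{m}\sum_{j=1}^{\bar n}(x_{ij}-\bar x_i)^2.$$

Note that $S_1^2$ cannot be estimated directly
from $s_1^2$ alone, because $E(s_1^2) = S_1^2 + \frac{\bar N-\bar n}{\bar N}\cdot\frac{S_2^2}{\bar n}$
also contains within-cluster variation, whereas $S_2^2$ can be
estimated directly from $s_2^2$.

Then we get the estimator of sampling variance

$$v(\bar x) = \frac{M-m}{M}\cdot\frac{s_1^2}{m} + \frac{m}{M}\cdot\frac{\bar N-\bar n}{\bar N}\cdot\frac{s_2^2}{m\bar n}.$$

Variance estimators for $\bar N\bar x$ and x can be
obtained by multiplying the above estimator by $\bar N^2$
and $(M\bar N)^2$, respectively.
Note that the contribution of the second
term will be very small as long as $m/M$
is a small fraction.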
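The two-term estimator $v(\bar x) = \frac{M-m}{M}\frac{s_1^2}{m} + \frac{m}{M}\frac{\bar N-\bar n}{\bar N}\frac{s_2^2}{m\bar n}$ can be computed from a single sample as below. This is a sketch: `var_estimate` is a hypothetical name, and it assumes at least 2 sampled clusters (for $s_1^2$) and at least 2 units per subsample (for $s_2^2$).

```python
from statistics import mean, variance


def var_estimate(sample, M, N_bar):
    """Two-term estimator of Var(x_bar) under simple two-stage sampling.

    sample: list of m >= 2 subsamples, each holding n_bar >= 2 values.
    Returns (total, first term, second term).
    """
    m, n_bar = len(sample), len(sample[0])
    f1, f2 = m / M, n_bar / N_bar                # first- and second-stage sampling fractions
    s1_sq = variance([mean(s) for s in sample])  # among sampled cluster means
    s2_sq = mean([variance(s) for s in sample])  # pooled within sampled clusters
    first = (1 - f1) * s1_sq / m
    second = f1 * (1 - f2) * s2_sq / (m * n_bar)
    return first + second, first, second


# Hypothetical sample: 2 of M = 3 clusters, 2 of N_bar = 3 units in each
total, first, second = var_estimate([[1, 6], [3, 9]], M=3, N_bar=3)
print(f"v(x_bar) = {total:.4f} = {first:.4f} + {second:.4f}")
```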

The above variance formula for the two-stage design
can easily be extended to a simple three-stage
design by adding a third term, but the
contribution of the third term will be even
smaller than that of the second term.
The formula given in Box 10.2 (page 281) is the
ultimate cluster approximation, which is based
on the first term of the above estimator, with
the substitution of

Example to verify the above formulas
Consider a two-stage sample with m = 2 and $\bar n$ = 2
taken from a population with M = 3 and $\bar N$ = 3.
Suppose the population consists of 3 clusters
with the following values:

Cluster     Values     Total   Mean
Cluster 1   1, 6, 7    14      4.67
Cluster 2   2, 5, 8    15      5.00
Cluster 3   3, 4, 9    16      5.33

There are 27 possible samples.

Sample estimates for these possible samples
are shown in the attachment along with various
expected values calculated from the possible
samples.
This example demonstrates how the above
formulas work and how the ultimate cluster
approximation given in the text (Box 10.2)
compares with the exact formula.
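The 27-sample check can be reproduced by brute force. This Python sketch enumerates every equally likely sample with exact `Fraction` arithmetic, computes the exact variance of $\bar x$, and compares the expected values of three variance estimators: the two-term estimator, its first term $\frac{M-m}{M}\frac{s_1^2}{m}$ alone, and the first term without the fpc, $s_1^2/m$. Which of these corresponds to the Box 10.2 formula is not assumed here; all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Population from the example: M = 3 clusters of N_bar = 3 elements
POP = [[1, 6, 7], [2, 5, 8], [3, 4, 9]]
M, N_bar = len(POP), len(POP[0])
m, n_bar = 2, 2
f1, f2 = Fraction(m, M), Fraction(n_bar, N_bar)


def mean(v):
    return sum(v, Fraction(0)) / len(v)


def var(v):
    """Variance with divisor len(v) - 1, computed exactly."""
    mu = mean(v)
    return sum((x - mu) ** 2 for x in v) / (len(v) - 1)


def all_samples():
    """Yield all equally likely two-stage samples (lists of m subsamples)."""
    for clusters in combinations(range(M), m):
        choices = [combinations(POP[i], n_bar) for i in clusters]
        for picked in product(*choices):
            yield [list(s) for s in picked]


def estimates(sample):
    """Return x_bar and three variance estimates for one sample."""
    sub_means = [mean(s) for s in sample]
    x_bar = mean(sub_means)
    s1_sq = var(sub_means)
    s2_sq = mean([var(s) for s in sample])
    first = (1 - f1) * s1_sq / m
    two_term = first + f1 * (1 - f2) * s2_sq / (m * n_bar)
    no_fpc = s1_sq / m
    return x_bar, two_term, first, no_fpc


samples = list(all_samples())
results = [estimates(s) for s in samples]
X_bar = mean([x for c in POP for x in c])

# Exact Var(x_bar): average squared deviation over the equally likely samples
true_var = mean([(r[0] - X_bar) ** 2 for r in results])

# The same quantity from the formula with population S1^2 and S2^2
S1_sq = var([mean(c) for c in POP])
S2_sq = mean([var(c) for c in POP])
formula_var = (1 - f1) * S1_sq / m + (1 - f2) * S2_sq / (m * n_bar)

print("samples:", len(samples))
print("E(x_bar):", mean([r[0] for r in results]), "X_bar:", X_bar)
print("Var(x_bar) by enumeration:", true_var, "by formula:", formula_var)
for name, k in [("two-term", 1), ("first term only", 2), ("first term, no fpc", 3)]:
    print(f"E[{name}]:", mean([r[k] for r in results]))
```

It reports 27 samples, $E(\bar x) = 5 = \bar X$, and a variance of 91/108 (about 0.843) both by enumeration and by formula. The two-term estimator averages exactly 91/108; the first term alone averages 95/324, a serious underestimate when $m/M = 2/3$; and $s_1^2/m$ averages 95/108, a slight overestimate.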

Example for applying the estimators
A set of 20,000 medical records is stored in
400 file drawers, each containing 50 records.
In drawing a two-stage sample, 5 records are
randomly selected from each of 80 randomly
selected drawers (M = 400, m = 80, $\bar N$ = 50, and
$\bar n$ = 5). For one variable, X, we obtained 15.2,
9050, and 805.
Using the two-term estimator,

Using the ultimate cluster approximation in
the text,
Using the first term only,
Using the first term, ignoring fpc,

The sampling variance can be estimated quite
satisfactorily while ignoring the second-stage units.
The ultimate cluster approximation appears to
be a good compromise.
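For the medical-records design (M = 400, m = 80, $\bar N$ = 50, $\bar n$ = 5), a short sketch shows the multipliers that each estimator applies to $s_1^2$ and $s_2^2$. It uses only the design constants, not the sample values; `term_coefficients` is a hypothetical helper.

```python
def term_coefficients(M, m, N_bar, n_bar):
    """Multipliers applied to s1^2 and s2^2 by the variance estimators of x_bar."""
    f1, f2 = m / M, n_bar / N_bar
    two_term_s1 = (1 - f1) / m                 # first term (same as "first term only")
    two_term_s2 = f1 * (1 - f2) / (m * n_bar)  # second term
    no_fpc_s1 = 1 / m                          # first term, ignoring the fpc
    return two_term_s1, two_term_s2, no_fpc_s1


a, b, c = term_coefficients(M=400, m=80, N_bar=50, n_bar=5)
print(f"s1^2 multiplier: {a:.5f}, s2^2 multiplier: {b:.5f}, no-fpc s1^2 multiplier: {c:.5f}")
```

The $s_2^2$ multiplier (0.00045) is under 5% of the $s_1^2$ multiplier (0.01), so the second term adds little unless $s_2^2$ is much larger than $s_1^2$; dropping the fpc raises the first-term multiplier from 0.01 to 0.0125.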

The case of proportion in two-stage sampling
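The unbiased estimate of the population proportion
is $p = \frac{1}{m}\sum_{i=1}^{m} p_i$,
where $p_i$ is the proportion of the $\bar n$ sampled elements in the i-th sampled cluster that have the attribute.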
Estimator of sampling variance

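where $s_1^2 = \frac{1}{m-1}\sum_{i=1}^{m}(p_i - p)^2$ and $s_2^2 = \frac{\bar n}{m(\bar n-1)}\sum_{i=1}^{m}p_i q_i$, with $q_i = 1 - p_i$.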
Example for the case of proportion
A large store handles about 20,000 accounts receivable
per month. A 2% sample ($\bar n$ = 400) was verified
every other month for the last 4 years (M = 48 and
m = 24). The numbers of accounts found to be in
error per month were as follows:
0, 0, 1, 1, 2, 4, 4, 5, 5, 5, 5, 6, 6, 6,
7, 7, 8, 9, 9, 10, 10, 13, 14, 17 (sum = 154)

Applying the above formula, $p = \frac{154}{24 \times 400} = 0.016$, or 1.6%.
95% confidence interval: (1.21%, 2.00%)
95% confidence interval: (1.14%, 2.06%), using the textbook
formula (see STATA output)
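The point estimate and one of the intervals can be checked in Python. This sketch treats each sampled month as a cluster, uses the ultimate-cluster variance $s_1^2/m$ with no finite population correction, and takes the t quantile with m - 1 = 23 degrees of freedom (2.0687, hard-coded from a t table to avoid a SciPy dependency). Under those assumptions it gives p = 1.60% and (1.14%, 2.06%), matching the textbook/STATA interval; that the STATA interval was produced exactly this way is an inference, not something stated on the slide.

```python
from math import sqrt
from statistics import mean, variance

errors = [0, 0, 1, 1, 2, 4, 4, 5, 5, 5, 5, 6, 6, 6,
          7, 7, 8, 9, 9, 10, 10, 13, 14, 17]  # errors found in each sampled month
n_bar = 400                                   # accounts verified per sampled month
m = len(errors)                               # 24 sampled months

p_i = [x / n_bar for x in errors]             # error proportion in each sampled month
p = mean(p_i)                                 # equals 154 / (24 * 400)

# Ultimate-cluster variance without fpc: s1^2 / m, from the variance among the p_i
se = sqrt(variance(p_i) / m)
t_23 = 2.0687                                 # t quantile, 0.975 level, 23 df (table value)
low, high = p - t_23 * se, p + t_23 * se
print(f"p = {100 * p:.2f}%, 95% CI = ({100 * low:.2f}%, {100 * high:.2f}%)")
```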

Optimal subsample size
One of the key questions in designing a
two-stage cluster sample is how to determine the
sample size in the second stage.
The optimal choice of $\bar n$ depends on the
within-cluster variance and the relative costs of
surveying at the two stages.

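Considering a cost function of

$$C = C_1 m + C_2 m\bar n$$

(where $C_1$ is the cost per cluster and $C_2$ the cost per element)
and a sampling variance of

$$V = \mathrm{Var}(\bar x) = \frac{S_1^2 - S_2^2/\bar N}{m} + \frac{S_2^2}{m\bar n} - \frac{S_1^2}{M},$$

the variance is minimized for fixed C (and C is minimized
for fixed V) when the following condition is satisfied
(by using the Cauchy-Schwarz inequality):

$$\bar n_{opt} = \frac{S_2}{\sqrt{S_1^2 - S_2^2/\bar N}}\sqrt{\frac{C_1}{C_2}} \approx \sqrt{\frac{C_1}{C_2}\cdot\frac{1-\delta}{\delta}},$$

where δ is the intraclass correlation.
This can be calculated from sample data by
substituting δ with $\hat\delta$
calculated from sample data as shown below.
This suggests choosing a larger $\bar n$ when
the intraclass correlation is small (large
within-cluster variability) and when the unit
cost per cluster is large relative to the unit cost
per element.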

The intraclass correlation δ can be calculated
by formula (10.10) on page 298. The same
formula can be applied to sample data by
substituting the population variance components
with their sample estimates.
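The optimal subsample size can be wrapped in two small functions: one using Cochran's variance-component form $\bar n_{opt} = \frac{S_2}{\sqrt{S_1^2 - S_2^2/\bar N}}\sqrt{C_1/C_2}$, and one using the approximation $\sqrt{\frac{C_1}{C_2}\cdot\frac{1-\delta}{\delta}}$. Both names are hypothetical. The sketch caps the result at $\bar N$ and returns $\bar N$ when $S_1^2 - S_2^2/\bar N \le 0$, since the variance-cost product then keeps falling as $\bar n$ grows. The two forms agree exactly when δ is taken as $\frac{S_1^2 - S_2^2/\bar N}{S_1^2 - S_2^2/\bar N + S_2^2}$; formula (10.10) in the text may define δ slightly differently.

```python
from math import sqrt


def n_opt_components(S1_sq, S2_sq, N_bar, C1, C2):
    """Cochran-style optimal subsample size from variance components.

    C1 is the cost per cluster, C2 the cost per element.
    """
    between = S1_sq - S2_sq / N_bar
    if between <= 0:
        # The variance-cost product then decreases in n_bar: take whole clusters
        return N_bar
    return min(sqrt(S2_sq / between) * sqrt(C1 / C2), N_bar)


def n_opt_icc(delta, C1, C2):
    """Approximate optimal subsample size from the intraclass correlation (0 < delta < 1)."""
    return sqrt(C1 / C2 * (1 - delta) / delta)


# Hypothetical variance components, for illustration only
S1_sq, S2_sq, N_bar = 2.0, 1.0, 1000
between = S1_sq - S2_sq / N_bar
delta = between / (between + S2_sq)
print(n_opt_components(S1_sq, S2_sq, N_bar, C1=4, C2=1), n_opt_icc(delta, C1=4, C2=1))
```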

Example for optimal subsample size consideration
Let us calculate the optimal $\bar n$
for the above example of store accounts, assuming
$C_1$ = 100 and $C_2$ = 10.
This suggests that a 0.225% sample ($\bar n$ = 45) will be
sufficient, instead of the 2% sample ($\bar n$ = 400).
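A sketch of the store calculation, assuming the variance-component form $\bar n_{opt} = \frac{S_2}{\sqrt{S_1^2 - S_2^2/\bar N}}\sqrt{C_1/C_2}$ with sample estimates plugged in: $s_2^2$ is the pooled within-month variance of the 0/1 error indicator, and $S_1^2$ is estimated by $s_1^2 - \frac{\bar N-\bar n}{\bar N}\frac{s_2^2}{\bar n}$. The text computes δ through formula (10.10) instead, so intermediate values may differ slightly, but this route also lands on about 45 accounts.

```python
from math import sqrt
from statistics import mean, variance

errors = [0, 0, 1, 1, 2, 4, 4, 5, 5, 5, 5, 6, 6, 6,
          7, 7, 8, 9, 9, 10, 10, 13, 14, 17]
n_bar, N_bar = 400, 20_000  # accounts verified / accounts per month
C1, C2 = 100, 10            # cost per month (cluster) and per account (element)

p_i = [x / n_bar for x in errors]
s1_sq = variance(p_i)  # among sampled months
# Sample variance of a 0/1 variable within a month is n_bar/(n_bar-1) * p(1-p); pool over months
s2_sq = mean([n_bar / (n_bar - 1) * p * (1 - p) for p in p_i])

S1_sq_hat = s1_sq - (1 - n_bar / N_bar) * s2_sq / n_bar  # unbiased estimate of S1^2
between = S1_sq_hat - s2_sq / N_bar
n_opt = sqrt(s2_sq / between) * sqrt(C1 / C2)
print(f"n_opt = {round(n_opt)} accounts per month ({100 * round(n_opt) / N_bar:.3f}% sample)")
```

The unrounded value is about 44.5, which rounds to the 45 accounts (0.225% of 20,000) quoted above.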

Notes on definition of between-cluster variation
The following notes will help you relate the
textbook definition to the conventions you are
familiar with and understand the different
definitions used in other books.
In analysis of variance, the mean square
between groups is defined as

$$MSB = \frac{\bar n}{m-1}\sum_{i=1}^{m}(\bar x_i - \bar x)^2$$

(for equal-size groups or clusters)
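A small sketch connecting the ANOVA quantities to the sampling notation, assuming equal group sizes: MSB equals $\bar n s_1^2$ (with $s_1^2$ the variance among group means), and the within-group mean square equals the pooled $s_2^2$. The sample values and function name are hypothetical.

```python
from statistics import mean, variance


def anova_mean_squares(sample):
    """Between- and within-group mean squares for equal-size groups."""
    m, n_bar = len(sample), len(sample[0])
    means = [mean(g) for g in sample]
    grand = mean(means)
    msb = n_bar * sum((xb - grand) ** 2 for xb in means) / (m - 1)
    msw = sum((x - xb) ** 2 for g, xb in zip(sample, means) for x in g) / (m * (n_bar - 1))
    return msb, msw


sample = [[1, 6], [3, 9]]  # hypothetical sample: m = 2 groups of n_bar = 2
msb, msw = anova_mean_squares(sample)
# MSB is n_bar times s1^2; MSW equals the pooled within-group variance s2^2
print(msb, 2 * variance([mean(g) for g in sample]))
print(msw, mean([variance(g) for g in sample]))
```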