Two-Stage Cluster Sampling from Equal Clusters

Provided by: barbara177
1
  • Two-Stage Cluster Sampling from Equal Clusters 
  • Basic ideas  
  • A two-stage sampling is a natural extension of
    one-stage cluster sampling when a hierarchical
    frame is used.
  • Use of two-stage sampling will improve design
    efficiency when the within-cluster (PSU)
    variance is small: there is no reason to observe
    all elements in a cluster when the elements in
    the cluster are alike.

2
  • In the first stage, m clusters can be randomly
    sampled from M clusters, and in the second stage
    $\bar{n}$ units can be randomly sampled from the
    $\bar{N}$ units in each selected cluster.
  • Simple two-stage cluster sampling serves as a
    basic model for multi-stage sampling designs.

3
  • Number of possible samples
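  • The total number of possible such samples will
    be
  • $\binom{M}{m}\binom{\bar{N}}{\bar{n}}^{m}$,
    where $\bar{N}$ is the number of units per cluster
    and $\bar{n}$ is the number of units sampled from
    each selected cluster.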

4
  • For example, there are
    $\binom{3}{2}\binom{3}{2}^{2} = 3 \times 9 = 27$
    possible samples in a two-stage sample of
    $m = 2$ and $\bar{n} = 2$ from a population of
    $M = 3$ and $\bar{N} = 3$.
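The count of possible samples can be checked with a few lines of Python. This is a minimal sketch: the function name `count_two_stage_samples` and its parameter names are illustrative choices, not part of the slides.

```python
from math import comb

def count_two_stage_samples(M, m, N_bar, n_bar):
    """Number of distinct two-stage samples: choose m of M clusters,
    then n_bar of N_bar units independently within each chosen cluster."""
    return comb(M, m) * comb(N_bar, n_bar) ** m

# Small population from the slide: M = 3, m = 2, N_bar = 3, n_bar = 2
small = count_two_stage_samples(M=3, m=2, N_bar=3, n_bar=2)
print(small)  # 27
```

The second-stage factor is raised to the power m because the subsample is drawn separately in each of the m selected clusters.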

5
  • Unbiased estimators
  • Under the simple two-stage sampling design
    described above, unbiased estimates can be
    obtained by applying the theory of simple random
    sampling in two stages.
  • $\bar{\bar{x}}$ (mean per unit in the sample) is
    an unbiased estimate of $\bar{\bar{X}}$ (mean per
    unit in the population).
  • $\bar{N}\bar{\bar{x}}$ is an unbiased estimate of
    $\bar{X}$ (mean per cluster in the population).
  • $M\bar{N}\bar{\bar{x}}$ is an unbiased estimate of
    $X$ (population total).

6
  • Sampling variance
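  • True sampling variance of $\bar{\bar{x}}$ (mean
    per unit) can be obtained by
    $V(\bar{\bar{x}}) = \left(1 - \frac{m}{M}\right)\frac{S_1^2}{m} + \left(1 - \frac{\bar{n}}{\bar{N}}\right)\frac{S_2^2}{m\bar{n}}$,
  • where $S_1^2 = \frac{1}{M-1}\sum_{i=1}^{M}(\bar{X}_i - \bar{\bar{X}})^2$
    is the variance among clusters (first
    stage units), computed from the cluster means per
    unit $\bar{X}_i$, and
    $S_2^2 = \frac{1}{M(\bar{N}-1)}\sum_{i=1}^{M}\sum_{j=1}^{\bar{N}}(X_{ij} - \bar{X}_i)^2$
    is the variance among
    second-stage units within clusters.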

7
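  • Sampling variance for $\bar{N}\bar{\bar{x}}$ (mean per cluster)
    can be obtained by multiplying the variance
    $V(\bar{\bar{x}})$ of the mean per unit by $\bar{N}^2$.
  • Sampling variance for $M\bar{N}\bar{\bar{x}}$ (population total)
    can be obtained by multiplying $V(\bar{\bar{x}})$
    by $M^2\bar{N}^2$.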

8
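  • Estimator of sampling variance
  • The unbiased estimator of $V(\bar{\bar{x}})$ can be
    obtained by substituting $S_1^2$ and $S_2^2$
    with the following quantities:
    $\hat{S}_1^2 = s_1^2 - \left(1 - \frac{\bar{n}}{\bar{N}}\right)\frac{s_2^2}{\bar{n}}$
    and $\hat{S}_2^2 = s_2^2$

9
Where
    $s_1^2 = \frac{1}{m-1}\sum_{i=1}^{m}(\bar{x}_i - \bar{\bar{x}})^2$ and
    $s_2^2 = \frac{1}{m(\bar{n}-1)}\sum_{i=1}^{m}\sum_{j=1}^{\bar{n}}(x_{ij} - \bar{x}_i)^2$
  • Note that $S_1^2$ cannot be directly estimated
    from $s_1^2$ alone, because the variance among
    sampled cluster means also contains second-stage
    sampling error, whereas $S_2^2$ can be estimated
    directly from $s_2^2$.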

10
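  • Then we get the estimator of sampling variance
    $v(\bar{\bar{x}}) = \left(1 - \frac{m}{M}\right)\frac{s_1^2}{m} + \frac{m}{M}\left(1 - \frac{\bar{n}}{\bar{N}}\right)\frac{s_2^2}{m\bar{n}}$,
    where $s_1^2$ is the sample variance among the
    sampled cluster means and $s_2^2$ is the pooled
    sample variance within the sampled clusters.
  • Variance estimators for the mean per cluster and
    the population total can be obtained by
    multiplying the above estimator by $\bar{N}^2$ and
    $M^2\bar{N}^2$, respectively.
  • Note that the contribution from the second
    term will be very small as
    long as $m/M$ is a small fraction.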
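The two-term estimator can be sketched in Python as below. This assumes the standard unbiased form for equal clusters, in which the first term carries the first-stage fpc and the second term carries the factor m/M. The function name, argument layout and the tiny sample are hypothetical, and exact fractions are used so the arithmetic can be checked by hand (integer data assumed).

```python
from fractions import Fraction

def two_stage_variance_estimate(sample, M, N_bar):
    """Two-term estimate of the sampling variance of the mean per unit.

    sample: list of m sampled clusters, each a list of n_bar integer values
    M:      number of clusters in the population
    N_bar:  number of units per cluster in the population
    Returns (mean per unit, first term, second term).
    """
    m = len(sample)
    n_bar = len(sample[0])
    cluster_means = [Fraction(sum(c), n_bar) for c in sample]
    grand_mean = sum(cluster_means) / m

    # s1^2: variance among the sampled cluster means (divisor m - 1)
    s1_sq = sum((cm - grand_mean) ** 2 for cm in cluster_means) / (m - 1)
    # s2^2: pooled variance within sampled clusters (divisor m * (n_bar - 1))
    s2_sq = sum(
        (x - cm) ** 2 for c, cm in zip(sample, cluster_means) for x in c
    ) / (m * (n_bar - 1))

    f1 = Fraction(m, M)          # first-stage sampling fraction
    f2 = Fraction(n_bar, N_bar)  # second-stage sampling fraction
    first_term = (1 - f1) * s1_sq / m
    second_term = f1 * (1 - f2) * s2_sq / (m * n_bar)
    return grand_mean, first_term, second_term

# Hypothetical sample: m = 2 clusters of n_bar = 2 units, from M = 3, N_bar = 3
mean, term1, term2 = two_stage_variance_estimate([[1, 6], [2, 8]], M=3, N_bar=3)
print(mean, term1, term2, term1 + term2)
```

Returning the two terms separately makes it easy to see how much the second-stage term contributes for a given m/M.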

11
  • The above variance formula for two-stage design
    can easily be extended to a simple three-stage
    design by adding the third term. But
    contribution from the third term will be even
    smaller than that of the second term.
  • The formula given in Box 10.2 (page 281) is the
    ultimate cluster approximation, which is based
    on the first term of the above estimator, with
    the substitution of

12
  • Example to verify the above formulas
  • Consider a two-stage sample of $m = 2$ and
    $\bar{n} = 2$ taken from a population of $M = 3$
    and $\bar{N} = 3$. Suppose the population consists
    of 3 clusters with the following values:

    Cluster   Values    Total   Mean
    1         1, 6, 7   14      4.67
    2         2, 5, 8   15      5.00
    3         3, 4, 9   16      5.33

13
  • There are 27 possible samples

14
  • Sample estimates for these possible samples
    are shown in the attachment along with various
    expected values calculated from the possible
    samples.
  •  
  • This example demonstrates how the above
    formulas work and how the ultimate cluster
    approximation given in the text (Box 10.2)
    compares with the exact formula.
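Since the attachment is not reproduced in this transcript, the check can be sketched by brute force in Python. The code enumerates all 27 samples from the three-cluster population, then compares the variance of the sample mean over those samples with the two-term formula. It assumes the standard definitions of the between- and within-cluster variances (divisors M - 1 and M(N̄ - 1)) and the standard unbiased two-term estimator. The last quantity, the first term with the fpc ignored, is included only for comparison and is not claimed to be the Box 10.2 formula.

```python
from fractions import Fraction
from itertools import combinations, product

clusters = [[1, 6, 7], [2, 5, 8], [3, 4, 9]]  # population from the example
M, N_bar = 3, 3                                # clusters, units per cluster
m, n_bar = 2, 2                                # sampled clusters, units per cluster
f1, f2 = Fraction(m, M), Fraction(n_bar, N_bar)

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def estimates(sample):
    """Return (mean per unit, two-term variance estimate, s1^2/m without fpc)."""
    cluster_means = [mean(c) for c in sample]
    x_bar = mean(cluster_means)
    s1_sq = sum((cm - x_bar) ** 2 for cm in cluster_means) / (m - 1)
    s2_sq = sum((x - cm) ** 2 for c, cm in zip(sample, cluster_means)
                for x in c) / (m * (n_bar - 1))
    two_term = (1 - f1) * s1_sq / m + f1 * (1 - f2) * s2_sq / (m * n_bar)
    return x_bar, two_term, s1_sq / m

# Every pair of clusters, and every pair of units within each chosen cluster
samples = [
    subsamples
    for chosen in combinations(clusters, m)
    for subsamples in product(*(combinations(c, n_bar) for c in chosen))
]
results = [estimates(s) for s in samples]

expected_mean = mean(r[0] for r in results)
true_variance = mean((r[0] - expected_mean) ** 2 for r in results)
expected_two_term = mean(r[1] for r in results)
expected_no_fpc = mean(r[2] for r in results)

# Variance formula evaluated from population quantities
pop_means = [mean(c) for c in clusters]
pop_mean = mean(pop_means)
S1_sq = sum((cm - pop_mean) ** 2 for cm in pop_means) / (M - 1)
S2_sq = sum((x - cm) ** 2 for c, cm in zip(clusters, pop_means)
            for x in c) / (M * (N_bar - 1))
formula_variance = (1 - f1) * S1_sq / m + (1 - f2) * S2_sq / (m * n_bar)

print(len(samples), expected_mean)           # 27 5
print(true_variance, formula_variance)       # 91/108 91/108
print(expected_two_term, expected_no_fpc)    # 91/108 95/108
```

In this small population the two-term estimator averages exactly to the true variance, while the first term without the fpc averages slightly higher (95/108 against 91/108).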
  •  

15
  • Example for applying the estimators
  •  A set of 20,000 medical records is stored in
    400 file drawers, each containing 50 records.
    In drawing a two-stage sample, 5 records are
    randomly selected from each of 80 randomly
    selected drawers ($M = 400$, $m = 80$, $\bar{N} = 50$, and
    $\bar{n} = 5$). For one variable X, we obtained 15.2,
    9050 and 805.
  •  
  • Using the two-term estimator,

16
  • Using the ultimate cluster approximation in
    the text,
  • Using the first term only,
  • Using the first term, ignoring fpc,

17
  • Sampling variance can be estimated quite
    satisfactorily, ignoring the second stage units.
  • The ultimate cluster approximation appears to
    be a good compromise.

18
  • The case of proportion in two-stage sampling
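  • The unbiased estimate of the population proportion
    is $p = \frac{1}{m}\sum_{i=1}^{m} p_i$,
  • where $p_i = a_i/\bar{n}$ is the sample proportion
    in the $i$-th sampled cluster and $a_i$ is the
    number of sampled units in that cluster that have
    the attribute.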
  • Estimator of sampling variance

19
Where and
20
Example for the case of proportion: A large store
handles about 20,000 accounts receivable
per month. A 2% sample ($\bar{n} = 400$) was verified
every other month for the last 4 years ($M = 48$ and
$m = 24$). The number of accounts found to be in
error per month was as follows: 0, 0,
1, 1, 2, 4, 4, 5, 5, 5, 5, 6, 6, 6,
7, 7, 8, 9, 9, 10, 10, 13, 14, 17 (sum = 154).

$p = 154/(24 \times 400) = 0.016$, or 1.6%, applying the above formula.
21
95% confidence interval: (1.21%, 2.00%)
95% confidence interval: (1.14%, 2.06%), using the textbook
formula (see STATA output)
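The point estimate and the interval attributed to the textbook formula can be reproduced with the short Python sketch below. It rests on assumptions not stated in the transcript: the 24 months are treated as ultimate clusters, the fpc is ignored, and the interval uses a Student's t quantile with m - 1 = 23 degrees of freedom (the hard-coded table value 2.0687). Variable names are illustrative.

```python
from math import sqrt

# Accounts found in error in each of the 24 sampled months (from the slide)
errors = [0, 0, 1, 1, 2, 4, 4, 5, 5, 5, 5, 6, 6, 6,
          7, 7, 8, 9, 9, 10, 10, 13, 14, 17]
n_bar = 400          # accounts verified per sampled month
m = len(errors)      # number of sampled months

p_i = [a / n_bar for a in errors]   # monthly error proportions
p = sum(p_i) / m                    # overall proportion, 154 / 9600

# Assumption: variance among monthly proportions only, no fpc
s1_sq = sum((pi - p) ** 2 for pi in p_i) / (m - 1)
se = sqrt(s1_sq / m)

T_975_DF23 = 2.0687  # 97.5th percentile of Student's t with 23 df (table value)
lower = p - T_975_DF23 * se
upper = p + T_975_DF23 * se
print(f"p = {100 * p:.2f}%, 95% CI = ({100 * lower:.2f}%, {100 * upper:.2f}%)")
```

Under these assumptions the sketch gives p = 1.60% and an interval of (1.14%, 2.06%), which agrees with the second interval quoted on the slide.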
22
  • Optimal subsample size
  •   One of the key questions in designing a
    two-stage cluster sample is how to determine the
    sample size in the second stage.
  • The optimal choice of $\bar{n}$ depends on the
    within-cluster variance and the relative costs of
    the survey at both stages.

23
  • Considering a cost function of
    and sampling variance of
    variance would be minimized for fixed
    C (C would be minimized for fixed V) when the
    following condition is satisfied (by using the
    Cauchy-Schwarz inequality)

24
  • This can be calculated from sample data by
    substituting with
    calculated from sample data as shown below.
  •  
  • This suggests choosing a larger $\bar{n}$ when
    the intraclass correlation is small (large
    within-cluster variability) and when the unit
    cost for a cluster is large relative to the unit
    cost for an element.
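The direction of this trade-off can be illustrated with a small Python sketch. It uses one common textbook form of the result, which may differ in notation from the formula on the slide. It assumes a cost function of (cost per cluster) x m + (cost per element) x m x n̄ and a variance proportional to [1 + (n̄ - 1)ρ]/(m n̄), with the fpc ignored. Under those assumptions the optimum is the square root of (C1/C2)(1 - ρ)/ρ. The intraclass correlation value used below is hypothetical and chosen only for illustration.

```python
from math import sqrt

def optimal_subsample_size(c1, c2, rho):
    """n_bar minimizing cost x variance, assuming cost = c1*m + c2*m*n_bar and
    variance proportional to (rho + (1 - rho) / n_bar) / m (fpc ignored)."""
    return sqrt((c1 / c2) * (1 - rho) / rho)

def cost_variance_product(n_bar, c1, c2, rho):
    """Cost times variance, up to a constant factor; m cancels out."""
    return (c1 + c2 * n_bar) * (rho + (1 - rho) / n_bar)

c1, c2 = 100, 10   # unit costs per cluster and per element
rho = 0.05         # hypothetical intraclass correlation, for illustration only

n_opt = optimal_subsample_size(c1, c2, rho)
# Brute-force check over integer subsample sizes
best_integer = min(range(1, 201),
                   key=lambda n: cost_variance_product(n, c1, c2, rho))
print(round(n_opt, 2), best_integer)
```

Lowering `rho` or raising `c1` relative to `c2` increases the returned value, which is the pattern the slide describes.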
  •  

25
  • The intraclass correlation can be calculated
    by the formula (10.10) on page 298. The same
    formula can be used for sample data by
    substituting
  • with
    .
  •  


26
Example for optimal subsample size
consideration: Let us calculate the optimal $\bar{n}$
for the above example of store accounts, assuming
$C_1 = 100$ and $C_2 = 10$.
27
This suggests that a 0.225% sample ($\bar{n} = 45$) will be
sufficient, instead of the 2% sample ($\bar{n} = 400$).
28
  • Notes on definition of between-cluster variation
  • The following notes will help you relate the
    textbook definition to the convention you are
    familiar with and understand the different
    definitions used in other books.
  •  
  • In analysis of variance, the mean square
    between groups is defined as

(for equal size group or cluster)
29
  • The definition of between cluster variation
    used in the text is

  • Between group sum of squares can be obtained by

or
30
  • Cochran defines between cluster variation as
  • If we use the definition used in the text the
    formula on the first page (last column) of
    handout should be changed to