Sampling strategies for the estimation of long form census variables

1
Sampling strategies for the estimation of long
form census variables
  • Francesco Borrelli (Istat)
  • Giancarlo Carbonetti (Istat)
  • Luana De Felici (Istat)
  • Claudia De Vitiis (Istat)
  • Francesca Inglese (Istat)
  • Fabrizio Solari (Istat)
  • SCORUS Conference, Darmstadt, October 17-19, 2007

2
Introduction /1
  • The Italian National Institute of Statistics
    (Istat) has recently started a study project to
    evaluate alternative census methods, in order to
    improve the efficiency of survey operations and
    reduce the statistical burden on respondents.
  • This work relates specifically to the design of
    the 2011 Population Census in Italy.

3
Introduction /2
  • One of the innovations concerns the possibility
    of dividing the overall set of census variables
    into two subsets: the first containing the
    demographic census variables and the second the
    remaining variables (educational level,
    employment status, commuting).
  • Traditional census methods will then be used
    only for the first set of variables, while Istat
    will carry out a sample survey for the second set
    of variables in municipalities with more than
    10,000 inhabitants.
  • In the smaller municipalities, census forms
    covering all the variables (long form) will be
    submitted in the traditional way.
  • In the other municipalities, a reduced census
    form (short form) will be submitted to all
    households, and the long form only to a sample of
    them.

4
Aim of the study
  • The study considers several sampling strategies
    for the submission of the long form.
  • A simulation study has been carried out to
    evaluate the efficiency of these sampling
    strategies.
  • The proposed sampling designs must be consistent
    with the census framework and able to produce
    statistical information for small sub-regions
    with a good level of accuracy.

5
Alternative strategies
  • The sampling strategies under analysis are:
  • a) Sampling design of households
    • the sample is drawn from administrative
      registers.
  • b) Area frame sampling
    • the sample consists of enumeration areas
      (clusters of households) selected from a
      digital georeferenced database.

6
Some aspects of the sampling design for the long
form
  • Domains: sub-municipality areas (sub-areas)
  • Target variables: cross-classification of
    educational level, employment status and
    commuting with demographic variables
  • Sampling units: households or enumeration areas,
    depending on the strategy
  • Sampling rate: the long form is submitted to
    about 1/3 of the households in each municipality
  • Estimator: final weights are computed from the
    sampling weights by means of a calibration
    process, so that the weighted sample reproduces
    known population totals and is therefore more
    representative

7
Sampling design of households
  • Simple random design: sampling without
    replacement, with equal selection probability for
    each household in the register (CCSFAM).
  • Stratified design: sampling without replacement,
    with equal selection probability for each
    household within each stratum, using either
    • stratification by household size (STRNCOMP), or
    • stratification by age of the head of household
      (STRETACAP).
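The two household designs can be sketched in Python as below. This is a minimal illustration, not the SAS code used in the study. The function names `srswor` and `stratified_srswor`, the dictionary representation of register records and the `size` field are all hypothetical. The stratified version also assumes the same sampling fraction (about 1/3) in every stratum, that is, proportional allocation; the slides do not state the allocation.

```python
import random
from collections import defaultdict

def srswor(units, fraction, rng):
    """Simple random sampling without replacement (CCSFAM-like design).

    Every unit has the same inclusion probability n/N, with n = round(fraction * N).
    """
    n = round(fraction * len(units))
    return rng.sample(units, n)

def stratified_srswor(units, stratum_of, fraction, rng):
    """Stratified SRSWOR with the same sampling fraction in every stratum
    (assumed proportional allocation), e.g. strata by household size."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    sample = []
    for members in strata.values():
        n_h = round(fraction * len(members))  # stratum sample size
        sample.extend(rng.sample(members, n_h))
    return sample

# Hypothetical register of 3,000 households with sizes 1..5.
rng = random.Random(2011)
register = [{"id": i, "size": 1 + i % 5} for i in range(3000)]
s_simple = srswor(register, 1 / 3, rng)
# Hypothetical STRNCOMP-like strata: sizes 1, 2, 3 and 4 or more.
s_strat = stratified_srswor(register, lambda h: min(h["size"], 4), 1 / 3, rng)
print(len(s_simple), len(s_strat))
```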

8
Area frame sampling design
  • Simple random design: sampling without
    replacement, with equal selection probability for
    each enumeration area (CCSSEZ).
  • Stratified design: sampling without replacement,
    with equal selection probability for each
    enumeration area within each stratum; the strata
    are formed from the area frame sorted by
    population size, either
    • into three strata of approximately equal
      population size (but unequal numbers of
      enumeration areas) (STRSPOP), or
    • into three strata with approximately equal
      numbers of enumeration areas (but substantially
      unequal population sizes) (STRSSEZ).
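The sketch below shows one possible way to form the two kinds of strata from the area frame sorted by population size. The function `split_into_strata` is hypothetical, and so is its midpoint rule for the equal-population split. The presentation does not say how the cut points were actually chosen.

```python
def split_into_strata(ea_populations, n_strata=3, by="population"):
    """Assign enumeration areas (EAs), sorted by population size, to strata.

    by="population": strata with roughly equal total population (STRSPOP-like).
    by="count":      strata with roughly equal numbers of EAs (STRSSEZ-like).
    Returns a dict mapping EA index -> stratum label 0..n_strata-1.
    """
    order = sorted(range(len(ea_populations)), key=lambda i: ea_populations[i])
    total = sum(ea_populations)
    n = len(order)
    labels = {}
    cumulative = 0
    for rank, i in enumerate(order):
        if by == "population":
            # share of population accumulated up to the midpoint of this EA
            share = (cumulative + ea_populations[i] / 2) / total
            h = min(int(share * n_strata), n_strata - 1)
        elif by == "count":
            h = min(rank * n_strata // n, n_strata - 1)
        else:
            raise ValueError("by must be 'population' or 'count'")
        labels[i] = h
        cumulative += ea_populations[i]
    return labels

# Hypothetical frame of 30 EAs with populations 1..30.
pops = list(range(1, 31))
strspop = split_into_strata(pops, by="population")
strssez = split_into_strata(pops, by="count")
```

With equal-population strata, the stratum of small EAs contains many more areas than the stratum of large ones. With equal-count strata, the population totals of the strata diverge instead.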

9
Main aspects of the simulation study
  • Source of data: 2001 population census data
  • Variables under study: cross-classification of
    educational level, employment status and
    commuting with sex (90 dichotomous variables in
    total)
  • Municipalities: Milano, Perugia and Aosta, chosen
    to represent large, medium and small
    municipalities respectively
  • Calibration constraints: cross-classification of
    sex and age, and cross-classification of sex and
    civil status (40 variables in total)
  • Software: Genesees v3.0, developed at Istat, was
    used to compute the final weights in the
    calibration process.

10
Simulation algorithm
  • The computational algorithm, implemented in SAS
    code, consists of the following steps (for each
    municipality and each alternative sampling
    design):
  • 1) selection of a sample (of households or
    enumeration areas);
  • 2) computation of the final weights;
  • 3) calculation of the estimated relative
    frequency p for each dichotomous target
    variable;
  • 4) repetition of steps 1), 2) and 3) for a fixed
    number of replications (1,000 sampling
    replications);
  • 5) computation, for each dichotomous target
    variable, of the mean and standard error of the
    estimates over the simulated sampling
    distribution.
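The replication loop can be sketched as follows for a household design. Three simplifications keep the example short. Step 2 uses the plain design weight N/n instead of calibrated weights. The population is a 0/1 matrix with one column per dichotomous target variable. The function name `simulate` is hypothetical; the study itself used SAS and Genesees.

```python
import numpy as np

def simulate(Y, fraction, n_reps=1000, seed=0):
    """Monte Carlo evaluation of an SRSWOR design for dichotomous targets.

    Y: (N, J) 0/1 matrix, one column per dichotomous target variable.
    Returns (mean, se) of the estimated proportions over the replications.
    Simplification: design weights N/n stand in for calibrated weights.
    """
    rng = np.random.default_rng(seed)
    N, J = Y.shape
    n = round(fraction * N)
    estimates = np.empty((n_reps, J))
    for r in range(n_reps):                            # step 4: replications
        idx = rng.choice(N, size=n, replace=False)     # step 1: draw a sample
        w = np.full(n, N / n)                          # step 2: weights (no calibration)
        estimates[r] = w @ Y[idx] / N                  # step 3: estimated proportions
    # step 5: mean and standard error over the simulated sampling distribution
    return estimates.mean(axis=0), estimates.std(axis=0, ddof=1)
```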

11
Evaluation criterion: the coefficient of variation
  • To compare the sampling strategies, we used the
    coefficient of variation (cv) as the evaluation
    criterion. The cv is the ratio of the standard
    error of an estimate to its expected value, and
    it measures the accuracy of the sampling
    estimates.
  • We determined the distribution of the empirical
    cvs for all 90 dichotomous target variables.
    After grouping the variables into classes
    according to the value of p, we studied the
    distribution of the cvs within each class.
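In formulas, cv(p̂) = sqrt(V(p̂)) / E(p̂). Over the simulated replications it can be estimated by the standard error of the estimates divided by their mean. The sketch below computes this and the median cv within classes of p. The function names and the class limits in the example are hypothetical, because the presentation does not list the limits it used.

```python
import numpy as np

def empirical_cv(mean, se):
    """Empirical coefficient of variation of each estimator: se / mean."""
    return np.asarray(se, dtype=float) / np.asarray(mean, dtype=float)

def median_cv_by_class(p, cv, bounds):
    """Median cv within classes of the proportion p.

    bounds: increasing class limits; e.g. [0.01, 0.05, 0.2] defines the classes
    p < 0.01, 0.01 <= p < 0.05, 0.05 <= p < 0.2 and p >= 0.2.
    Returns one median per class (nan for an empty class).
    """
    p = np.asarray(p, dtype=float)
    cv = np.asarray(cv, dtype=float)
    cls = np.digitize(p, bounds)  # class index 0..len(bounds)
    return [float(np.median(cv[cls == c])) if np.any(cls == c) else float("nan")
            for c in range(len(bounds) + 1)]
```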

12
Main results /1
  • The median cv decreases for increasing values
    of p
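The simple case of a sample proportion helps explain this pattern. Under simple random sampling without replacement, and ignoring calibration, V(p̂) = (N - n)/(N - 1) · p(1 - p)/n. The cv is therefore sqrt((N - n)/(N - 1) · (1 - p)/(n p)), which decreases as p increases and as the sample size n grows. The sketch evaluates this formula for a hypothetical municipality of N = 3,000 households sampled at a rate of about 1/3. The function name `cv_srswor` is hypothetical.

```python
import math

def cv_srswor(p, n, N):
    """Theoretical cv of a sample proportion under SRS without replacement.

    V(p_hat) = (N - n) / (N - 1) * p * (1 - p) / n, so
    cv = sqrt(V) / p = sqrt((N - n) / (N - 1) * (1 - p) / (n * p)).
    """
    return math.sqrt((N - n) / (N - 1) * (1 - p) / (n * p))

# Hypothetical municipality: N = 3000 households, sampling fraction about 1/3.
N = 3000
n = N // 3
for p in (0.01, 0.05, 0.2, 0.5):
    print(f"p = {p:.2f}  cv = {cv_srswor(p, n, N):.3f}")
```

The dependence on n also suggests why estimates based on larger samples, such as municipality totals compared with sub-area totals, tend to have smaller cvs.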

13
Scatter plot of cv and p for each sub-area.
CCSSEZ design. City of Perugia.
14
Main results /1
  • Due to the cluster effect (households within the
    same enumeration area tend to be similar),
    sampling designs of households result in smaller
    sampling errors than sampling designs of
    enumeration areas.
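The cluster effect is often summarized by Kish's approximation of the design effect, deff = 1 + (m - 1)ρ. Here m is the average number of households per enumeration area and ρ is the intra-cluster correlation of the target variable. The sketch is illustrative only: the function names and the values m = 80 and ρ = 0.05 are hypothetical and not taken from the study.

```python
def kish_deff(m, rho):
    """Kish's approximation of the design effect of cluster sampling
    with (roughly) equal clusters: deff = 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho

def effective_sample_size(n, m, rho):
    """Size of a simple random sample with the same variance as a cluster sample of n units."""
    return n / kish_deff(m, rho)

# Hypothetical values: 80 households per enumeration area, rho = 0.05.
print(round(kish_deff(80, 0.05), 2), round(effective_sample_size(1000, 80, 0.05)))
```

With these hypothetical values, deff is about 4.95. A sample of 1,000 households drawn as whole enumeration areas would then carry roughly the information of about 200 households drawn at random.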

15
Distribution of median cv for classes of p for
all the alternative sampling strategies
(estimation at sub-area level). City of Milano.
16
Distribution of median cv for classes of p for
all the alternative sampling strategies
(estimation at municipality level). City of
Perugia.
17
Main results /1
  • Stratifying the households does not seem to
    reduce sampling errors significantly, at either
    municipality or sub-municipality level.
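A standard result helps interpret this finding. Under proportional allocation, stratification reduces the variance of an estimated proportion only through the between-stratum term Σ W_h (p_h - p)². Strata whose proportions p_h barely differ therefore yield almost no gain. This may happen when household size or the age of the head of household is only weakly related to the target variable. The sketch below assumes proportional allocation and a large population, and `variance_ratio_proportional` is a hypothetical name.

```python
def variance_ratio_proportional(weights, props):
    """V(stratified, proportional allocation) / V(SRS) for a proportion,
    same sample size, large-population approximation.

    weights: stratum population shares W_h (summing to 1)
    props:   stratum proportions p_h of the dichotomous target variable
    Based on p(1 - p) = sum_h W_h p_h (1 - p_h) + sum_h W_h (p_h - p)^2.
    """
    p = sum(w * ph for w, ph in zip(weights, props))
    within = sum(w * ph * (1 - ph) for w, ph in zip(weights, props))
    return within / (p * (1 - p))

# Hypothetical strata with similar proportions: the ratio stays close to 1.
print(variance_ratio_proportional([0.3, 0.4, 0.3], [0.18, 0.20, 0.22]))
```

With these hypothetical figures the ratio is about 0.9985, a variance reduction of well under 1%.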

18
Distribution of cv (min, median and max) for
classes of p for all the household sampling
designs (estimation at sub-area level). City of
Milano.
19
Distribution of cv (min, median and max) for
classes of p for all the household sampling
designs (estimation at municipality level). City
of Milano.
20
Main results /1
  • As far as the stratification of the enumeration
    areas is concerned, there are no clear
    differences among the sampling strategies for
    sub-area level estimation. For municipality level
    estimation, stratification yields smaller cv
    values only when p > 5%.

21
Distribution of cv (min, median and max) for
classes of p for all the area frame sampling
designs (estimation at municipality level). City
of Aosta.
22
Main results /2
  • Considering the same sampling design for all the
    municipalities, we can observe that
  • the median cvs for sub-area level estimation do
    not differ substantially from one municipality to
    another

23
Comparison of distribution of median cv for
classes of p for estimation at sub-area level
referred to the city of Milano, Perugia and
Aosta. Sampling design CCSFAM and CCSSEZ.
24
Main results /2
  • Considering the same sampling design for all the
    municipalities, we can observe that
  • the median cvs for municipality level estimation
    differ markedly, with the smallest values, as
    expected, in the largest municipality (with a
    sampling fraction of about 1/3 everywhere, it has
    the largest sample)

25
Distribution of cv (min, median and max) for
classes of p for estimation at municipality
level. Sampling design CCSFAM. Milano, Perugia
and Aosta.
26
Comparison of distribution of median cv for
classes of p for estimation at municipality level
referred to the city of Milano, Perugia and
Aosta. Sampling design CCSFAM and CCSSEZ.
27
Main results /2
  • Considering the same sampling design for all the
    municipalities, we can observe that
  • municipality level estimates have smaller cv
    values than sub-area level estimates

28
Median cv for three classes of p for estimation
at sub-area level and municipality level for each
sampling design (Milano, Perugia and Aosta).
29
Main results /2
  • Considering the same sampling design for all the
    municipalities, we can observe that
  • the larger the municipality's population, the
    bigger the reduction in cv values when moving
    from sub-area level to municipality level
    estimation

30
Comparison of median cv for two classes of p
between estimation at sub-area level and
municipality level for each sampling design
(Milano, Perugia and Aosta).
31
Main results /2
  • Considering the same sampling design for all the
    municipalities, we can observe that
  • for sub-area level estimation, cv values are
    small for large sub-areas (more than 10,000
    inhabitants) or for sub-areas with a large number
    of enumeration areas (more than 50)

32
Distribution of median cv for four classes of p
and three classes of sub-areas (according to
population size) for all the sampling designs.
City of Milano.
33
Distribution of median cv for four classes of p
and three classes of sub-areas (according to
number of enumeration areas) for all the sampling
designs. City of Milano.
34
Comparison of the distributions of median cv for
three classes of sub-areas (according to
population size). CCSFAM sampling design. City of
Milano.
35
Comparison of the distributions of median cv for
three classes of sub-areas (according to number
of enumeration areas). CCSFAM sampling design.
City of Milano.
36
Conclusions
  • The results seem to support the use of sampling
    techniques to implement the long form strategy in
    the population census.
  • Sampling of households produces more efficient
    estimates than sampling of enumeration areas.
    Area frame sampling, however, remains a
    practicable solution that produces estimates of
    reliable quality.
  • The household information available in
    administrative registers (household size, age of
    the head of household) does not allow efficient
    stratification.
  • For area frame sampling, the best stratification
    criterion seems to be dividing the enumeration
    areas into three strata of approximately equal
    population size.
  • More accurate estimates are observed for the
    largest sub-areas (in terms of population size or
    number of enumeration areas).

37
Final remark
The comparative analysis shows that household
sampling and area frame sampling could be combined
in a composite sampling strategy for the long form
approach in the 2011 population census in Italy:
where reliable administrative registers are not
available, area frame sampling seems to be a good
alternative. For any questions, please contact
Giancarlo Carbonetti at carbonet_at_istat.it