Title: Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design
Mark Delorey, F. Jay Breidt, Colorado State University
Abstract: In aquatic resources, a two-stage
sampling design can be employed to make the best
use of what are often limited time and financial
resources. Even with the ability to focus such
resources, it is often the case that the sample
sizes are not sufficiently large to make
model-free inferences. The presence of auxiliary
information for the regions of interest suggests
employing a model in our inferences. Breidt,
Claeskens, and Opsomer (2003) propose
incorporating this auxiliary information through
a class of model-assisted estimators based on
penalized spline regression in single-stage
sampling. Zheng and Little (2003) also use
penalized spline regression in a model-based
approach for finite population estimation in a
two-stage sample. In a survey context, weights
computed from a set of auxiliary information are
often applied to many study variables. With this
approach, model-assisted estimators should fare
better than model-based estimators. We compare
the two through a series of simulations.
- Two-Stage Sampling
- The population of elements $U = \{1, \ldots, k, \ldots, N\}$ is partitioned into clusters, or primary sampling units (PSUs), $U_1, \ldots, U_i, \ldots, U_{N_I}$. So $U = \bigcup_{i=1}^{N_I} U_i$ and $N = \sum_{i=1}^{N_I} N_i$, where $N_i$ is the number of elements, or secondary sampling units (SSUs), in $U_i$.
- First stage: a sample of clusters, $s_I$, is selected according to a design $p_I(\cdot)$ with inclusion probabilities $\pi_{Ii}$ and $\pi_{Iij}$.
- $\pi_{Ii}$ and $\pi_{Iij}$ are the first- and second-order inclusion probabilities, respectively.
- Second stage: for every $i \in s_I$, a sample $s_i$ is drawn from $U_i$ according to the design $p_i(\cdot \mid s_I)$.
- The second-stage design is typically required to be invariant and independent of the first stage.
- Two-Stage Sampling with Aquatic Resources
- Time and expense constraints may make two-stage sampling more efficient.
- Auxiliary information may be available on different scales.
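The two-stage selection described under Two-Stage Sampling can be sketched in a few lines. This is only an illustration under stated assumptions: simple random sampling without replacement is used at both stages, and the function name `draw_two_stage_sample` and its arguments are hypothetical, not taken from the presentation. Because the second-stage draw inside a cluster uses only that cluster's size, it is invariant and independent of the first stage.

```python
import numpy as np

def draw_two_stage_sample(cluster_sizes, n_psu, n_ssu, rng):
    """Return {cluster index: sorted array of sampled element indices}.

    Assumption: simple random sampling without replacement at both stages.
    """
    cluster_sizes = np.asarray(cluster_sizes)
    # First stage: select n_psu clusters (PSUs) out of all clusters.
    s_I = rng.choice(len(cluster_sizes), size=n_psu, replace=False)
    sample = {}
    for i in sorted(int(i) for i in s_I):
        size_i = int(cluster_sizes[i])
        # Second stage: select elements (SSUs) inside cluster i only.
        m = min(n_ssu, size_i)
        sample[i] = np.sort(rng.choice(size_i, size=m, replace=False))
    return sample

rng = np.random.default_rng(0)
sizes = rng.integers(50, 401, size=20)  # 20 clusters with 50..400 elements each
sample = draw_two_stage_sample(sizes, n_psu=5, n_ssu=10, rng=rng)
```

Each key of `sample` is a selected cluster and each value holds the within-cluster positions of the selected elements.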
- The Estimators (for population totals)
- Horvitz-Thompson (HT) estimator
- Model-assisted estimator, built from the PSU totals predicted by the model
- Model-based estimator, built from the $i$th cluster mean predicted by the model
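For orientation, the three estimators of a population total can be written in their usual textbook forms. This is a sketch under assumptions, not necessarily the exact expressions used in the presentation: the symbols $\hat{t}_i$ (estimated PSU total), $\pi_{k|i}$ (conditional second-stage inclusion probability), $\hat{\mu}_i$ (model-predicted PSU total), $\hat{\bar{y}}_i$ (model-predicted cluster mean), and $n_i$ (number of sampled elements in cluster $i$) are introduced here for illustration.

```latex
\begin{align*}
\hat{t}_{HT} &= \sum_{i \in s_I} \frac{\hat{t}_i}{\pi_{Ii}},
  \qquad \hat{t}_i = \sum_{k \in s_i} \frac{y_k}{\pi_{k|i}}, \\
\hat{t}_{MA} &= \sum_{i=1}^{N_I} \hat{\mu}_i
  + \sum_{i \in s_I} \frac{\hat{t}_i - \hat{\mu}_i}{\pi_{Ii}}, \\
\hat{t}_{MB} &= \sum_{i \in s_I} \Bigl\{ \sum_{k \in s_i} y_k
  + (N_i - n_i)\,\hat{\bar{y}}_i \Bigr\}
  + \sum_{i \notin s_I} N_i\,\hat{\bar{y}}_i .
\end{align*}
```

In the model-assisted form the model predictions are corrected by design-weighted residuals, whereas the model-based form relies on the predictions alone for every unobserved element.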
- Notes on the Models and Model Parameters
- 3 different models used
- Linear
- Penalized spline with random effect for PSU
- Penalized spline with no random effect for PSU
- In a survey context, such as those found in environmental monitoring, it is often desirable to obtain a single set of survey weights that can be used to predict any study variable. To accommodate this:
- The smoothing parameter for the spline is selected by fixing the degrees of freedom for the smooth rather than by using a data-driven approach.
- The variance component for the PSU effect is computed for the linear model, and the resulting covariance matrix and corresponding survey weights are applied to samples from other data sets.
- In this kind of survey context, model-assisted estimators have good efficiency properties and should be superior to model-based estimators, which rely on correct specification of variance components.
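Choosing the smoothing parameter by fixing the degrees of freedom can be illustrated as follows. This is a minimal sketch under assumptions: it uses a truncated-linear spline basis with an unweighted ridge-type penalty on the knot coefficients, ignores the survey weights and the PSU random effect, and the function names, knot placement, and target of 5 degrees of freedom are all hypothetical. The degrees of freedom are taken as the trace of the smoother matrix, which decreases as the penalty grows, so a bisection on the log scale finds the matching penalty.

```python
import numpy as np

def spline_basis(x, knots):
    """Truncated-linear basis: [1, x, (x - k_1)_+, ..., (x - k_K)_+]."""
    x = np.asarray(x, dtype=float)
    trunc = np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, trunc])

def smoother_df(C, lam):
    """Trace of the smoother matrix C (C'C + lam D)^{-1} C'."""
    p = C.shape[1]
    D = np.diag([0.0, 0.0] + [1.0] * (p - 2))  # penalize knot terms only
    G = C.T @ C
    return np.trace(np.linalg.solve(G + lam * D, G))

def lambda_for_df(C, target_df, lo=1e-8, hi=1e8, iters=200):
    """Bisection on the log scale; df is decreasing in lam."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if smoother_df(C, mid) > target_df:
            lo = mid  # too flexible: increase the penalty
        else:
            hi = mid  # too smooth: decrease the penalty
    return np.sqrt(lo * hi)

x = np.linspace(0.0, 1.0, 100)
knots = np.quantile(x, np.linspace(0.0, 1.0, 12)[1:-1])  # 10 interior knots
C = spline_basis(x, knots)
lam = lambda_for_df(C, target_df=5.0)
```

Because `lam` depends only on the auxiliary variable and not on any response, the same penalty (and hence the same weights) can be reused across study variables, which is the motivation given above.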
- Case A: Cluster-Level Auxiliaries (our focus)
- The auxiliary information is available for all clusters in the population.
- This leads to regression modeling of quantities associated with the clusters, such as cluster totals.
- Cluster quantities can be computed for all clusters.
- Population quantities can be computed from cluster estimates.
- Example: a lake represents a cluster; the auxiliary information is elevation.
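To make Case A concrete, the sketch below fits a working model for cluster totals on the sampled clusters, predicts a total for every cluster in the population, and adds a design-weighted residual correction. It is an illustration under assumptions: a simple linear working model fitted by weighted least squares stands in for the models discussed above, and the function name `model_assisted_total` and its arguments are hypothetical.

```python
import numpy as np

def model_assisted_total(x_all, sample_idx, t_hat, pi_I):
    """Model-assisted estimate of a population total (linear working model).

    x_all      : auxiliary value for every cluster in the population
    sample_idx : indices of the clusters in the first-stage sample
    t_hat      : estimated totals for the sampled clusters
    pi_I       : first-stage inclusion probabilities of the sampled clusters
    """
    x_all = np.asarray(x_all, dtype=float)
    t_hat = np.asarray(t_hat, dtype=float)
    w = 1.0 / np.asarray(pi_I, dtype=float)      # design weights
    X_all = np.column_stack([np.ones_like(x_all), x_all])
    X_s = X_all[sample_idx]
    # Weighted least squares fit on the sampled clusters.
    XtW = X_s.T * w
    beta = np.linalg.solve(XtW @ X_s, XtW @ t_hat)
    pred_all = X_all @ beta                      # predicted total, every cluster
    resid = t_hat - pred_all[sample_idx]
    # Sum of predictions plus design-weighted residuals.
    return pred_all.sum() + np.sum(w * resid)

x_all = np.arange(1.0, 11.0)          # 10 clusters
true_totals = 3.0 + 2.0 * x_all       # totals exactly linear in x
idx = np.array([0, 3, 7])
estimate = model_assisted_total(x_all, idx, true_totals[idx], [0.2, 0.3, 0.5])
```

When the cluster totals are exactly linear in the auxiliary variable, as in this toy input, the residuals vanish and the estimate reproduces the true population total.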
- Generating Responses
- 500 PSUs; the number of SSUs per cluster is drawn from Uniform(50, 400).
- $\mu_{PSU} = m(x_I) + \varepsilon$, where $m(\cdot)$ is one of the eight functions below and $\varepsilon \sim N(0, \sigma^2 I)$.
- We use first-order inclusion probabilities proportional to size (pps).
- Auxiliary data are often proportional to the size of the cluster.
- Response of interest: $y_{ij} = \mu_i + \varepsilon_{ij}$, where $y_{ij}$ is the $j$th element in the $i$th cluster and the $\varepsilon_{ij}$ are iid $N(0, \sigma^2)$.
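The response-generating steps above can be sketched as a small simulation. Several pieces are assumptions made only for illustration: the auxiliary variable is drawn from Uniform(0, 1), the mean function `m` passed in is a placeholder rather than one of the eight functions used in the study, the two standard deviations `sd_psu` and `sd_ssu` are arbitrary, the first-stage sample size `n_I` is arbitrary, and the size measure for the pps probabilities is taken to be the number of SSUs in the cluster.

```python
import numpy as np

def generate_population(m, n_psu=500, n_I=50, sd_psu=1.0, sd_ssu=1.0, seed=0):
    """Generate one two-stage population (all defaults are illustrative)."""
    rng = np.random.default_rng(seed)
    # Number of SSUs per cluster: integers 50..400 inclusive.
    N_i = rng.integers(50, 401, size=n_psu)
    # Cluster-level auxiliary variable (assumed Uniform(0, 1)).
    x = rng.uniform(0.0, 1.0, size=n_psu)
    # Cluster means: mean function plus PSU-level noise.
    mu = m(x) + rng.normal(0.0, sd_psu, size=n_psu)
    # First-order inclusion probabilities proportional to cluster size.
    pi_I = n_I * N_i / N_i.sum()
    # Element responses: cluster mean plus element-level noise.
    y = [mu_i + rng.normal(0.0, sd_ssu, size=n) for mu_i, n in zip(mu, N_i)]
    return {"N_i": N_i, "x": x, "mu": mu, "pi_I": pi_I, "y": y}

# Placeholder mean function, not one of the study's eight functions.
pop = generate_population(lambda x: 1.0 + 2.0 * x)
```

With these defaults the largest inclusion probability stays well below 1, so the pps probabilities are valid without any truncation.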
- Comments on Simulation Results
- 500 samples were drawn from each of the populations.
- Estimator labels: H-T, Horvitz-Thompson estimator; M-A lin, model-assisted estimator using a linear model; M-B pmmra, model-based estimator using a penalized spline and including a random effect for PSU; M-A pmm, model-assisted estimator using a penalized spline with no random effect for PSU.
- Each point represents the ratio $\mathrm{MSE}_{\text{estimator}} / \mathrm{MSE}_{\text{model-assisted estimator with random effect for PSU}}$.
- Vertical black bars represent approximate 95% confidence intervals.
- The model-assisted estimator with a random effect for PSU is as efficient as or more efficient than the model-based estimator; we do not appear to lose efficiency (with respect to MSE) by using model-assisted nonparametric methods.
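The MSE ratio plotted for each estimator can be computed from the simulated estimates as in the sketch below. The function names and the toy numbers are hypothetical, and the confidence intervals are not reproduced here because the method used to obtain them is not stated.

```python
import numpy as np

def mse(estimates, true_total):
    """Monte Carlo mean squared error of a set of simulated estimates."""
    e = np.asarray(estimates, dtype=float)
    return np.mean((e - true_total) ** 2)

def mse_ratio(estimates, reference_estimates, true_total):
    """MSE of an estimator relative to the reference estimator."""
    return mse(estimates, true_total) / mse(reference_estimates, true_total)

# Toy example: two replicates per estimator, true total 10.
ratio = mse_ratio([9.0, 11.0], [8.0, 12.0], 10.0)
```

A ratio above 1 means the estimator in the numerator has a larger MSE than the reference estimator in the denominator.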
- Case B: Complete Element-Level Auxiliaries
- The auxiliary information is available for all elements in the population.
- This leads to regression modeling of quantities associated with the elements.
- Cluster and population quantities can then be computed from element estimates and observations.
- Example: an EMAP hexagon is the cluster; a lake is the element; the auxiliary information is elevation.
- Case C: Limited Element-Level Auxiliaries
- The auxiliary information is available for all elements in selected clusters only.
- This leads to regression modeling of quantities associated with the elements.
- Regression estimators can be used for cluster-level quantities only for the clusters selected in the first-stage sample.
- Example: aerial photography of selected sites (clusters); for each point (element) in a site, we have percent forested, urban, and industrial.
- Case D: Limited Cluster-Level Auxiliaries
- The auxiliary information is available for all clusters in the first-stage sample.
- Not a very interesting case.
- A design-based estimator can be used for population quantities.
- In some cases, good estimators for population quantities are not available.
- Example: the cluster is a lake; the auxiliary information is a measure of size, which is not available until the site is visited.
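The design-based estimator mentioned for Case D can be illustrated with a two-stage Horvitz-Thompson total, which needs only the sampled responses and the inclusion probabilities. This is a sketch under assumptions: the function name `ht_two_stage_total` and the dictionary-based inputs are hypothetical, and the second-stage probabilities are taken to be conditional on the cluster having been selected.

```python
import numpy as np

def ht_two_stage_total(y_by_cluster, pi_I, pi_within):
    """Two-stage Horvitz-Thompson estimate of a population total.

    y_by_cluster : {cluster: sampled responses in that cluster}
    pi_I         : {cluster: first-stage inclusion probability}
    pi_within    : {cluster: conditional second-stage inclusion probabilities}
    """
    total = 0.0
    for i, y in y_by_cluster.items():
        y = np.asarray(y, dtype=float)
        p = np.asarray(pi_within[i], dtype=float)
        t_i_hat = np.sum(y / p)        # estimated total of cluster i
        total += t_i_hat / pi_I[i]     # expand to the population
    return total

y = {0: [1.0, 2.0], 1: [3.0]}
pi_I = {0: 0.5, 1: 0.25}
pi_within = {0: [0.5, 0.5], 1: [1.0]}
ht_total = ht_two_stage_total(y, pi_I, pi_within)
```

In the toy input, cluster 0 contributes (1/0.5 + 2/0.5)/0.5 = 12 and cluster 1 contributes (3/1)/0.25 = 12, giving an estimated total of 24.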