Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design

Mark Delorey, F. Jay Breidt, Colorado State University

Abstract
In aquatic resources, a two-stage
sampling design can be employed to make the best
use of what are often limited time and financial
resources. Even with the ability to focus such
resources, it is often the case that the sample
sizes are not sufficiently large to make
model-free inferences. The presence of auxiliary
information for the regions of interest suggests
employing a model in our inferences. Breidt,
Claeskens, and Opsomer (2003) propose
incorporating this auxiliary information through
a class of model-assisted estimators based on
penalized spline regression in single-stage
sampling. Zheng and Little (2003) also use
penalized spline regression in a model-based
approach for finite population estimation in a
two-stage sample. In a survey context, weights
computed from a set of auxiliary information are
often applied to many study variables. With this
approach, model-assisted estimators should fare
better than model-based estimators. We compare
the two through a series of simulations.

Two-Stage Sampling
The population of elements $U = \{1, \ldots, k, \ldots, N\}$ is
partitioned into clusters or primary sampling
units (PSUs), $U_1, \ldots, U_i, \ldots, U_{N_I}$. So $U = \bigcup_{i=1}^{N_I} U_i$ and $N = \sum_{i=1}^{N_I} N_i$, where $N_i$
is the number of elements or secondary sampling
units (SSUs) in $U_i$.
First stage: A sample of clusters, $s_I$, is
selected based on a design $p_I(\cdot)$ with inclusion
probabilities $\pi_{Ii}$ and $\pi_{Iij}$.
$\pi_{Ii}$ and $\pi_{Iij}$ are the first- and second-order
inclusion probabilities, respectively.
Second stage: For every $i \in s_I$, a sample $s_i$ is
drawn from $U_i$ based on the design $p_i(\cdot \mid s_I)$.
Typically require second-stage design to be
invariant and independent of the first stage.
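The two stages described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: simple random sampling without replacement at both stages, and hypothetical names (`two_stage_srs`, `n_psu`, `m_ssu`) that do not come from the presentation. Each within-cluster draw depends only on the cluster itself, which is what invariance and independence of the second stage ask for.

```python
import numpy as np

def two_stage_srs(cluster_sizes, n_psu, m_ssu, rng):
    """Two-stage sample: SRS of clusters, then SRS of elements within each."""
    n_clusters = len(cluster_sizes)
    # Stage 1: simple random sample of PSUs without replacement.
    s_I = rng.choice(n_clusters, size=n_psu, replace=False)
    pi_I = n_psu / n_clusters  # first-order inclusion probability under SRS
    sample = {}
    for i in s_I:
        N_i = int(cluster_sizes[i])
        m_i = min(m_ssu, N_i)
        # Stage 2: SRS within cluster i. The draw uses only N_i, not the
        # rest of s_I (invariance), and is made separately per cluster.
        s_i = rng.choice(N_i, size=m_i, replace=False)
        sample[int(i)] = {
            "elements": s_i,          # indices of sampled SSUs in cluster i
            "pi_I": pi_I,             # P(cluster i in first-stage sample)
            "pi_cond": m_i / N_i,     # P(element in s_i | cluster i selected)
        }
    return sample

rng = np.random.default_rng(0)
sizes = rng.integers(50, 401, size=500)   # hypothetical cluster sizes
sample = two_stage_srs(sizes, n_psu=20, m_ssu=10, rng=rng)
```

The overall inclusion probability of an element is the product `pi_I * pi_cond`, which is the quantity a Horvitz-Thompson estimator would invert.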

Two-Stage Sampling with Aquatic Resources
Time and expense constraints may make two-stage sampling more efficient
Auxiliary information may be available on different scales

The Estimators (for population totals)
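Horvitz-Thompson (HT): [formula lost in extraction]
Model-assisted: [formula lost in extraction], where [symbol lost] is the PSU total predicted by the model
Model-based: [formula lost in extraction], where [symbol lost] is the ith cluster mean predicted by the model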
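For orientation, the three estimators are often written in the following textbook forms for two-stage designs. These are not recovered from the slide and may differ from the authors' exact expressions. All notation here is assumed: $s_I$ is the first-stage sample of clusters and $s_i$ the sample within cluster $i$, $U_I$ is the set of all clusters, $\pi_{Ii}$ is the first-stage inclusion probability, $\pi_{k|i}$ is the conditional second-stage inclusion probability, $\hat t_i$ is the design-based estimate of the cluster total, $\hat t_i^{\,m}$ is the cluster total predicted by the model, $\hat\mu_i$ is the model-predicted cluster mean, and $n_i$ is the number of sampled elements in cluster $i$.

```latex
% Horvitz-Thompson: inverse-probability weighting at both stages
\hat t_{HT} = \sum_{i \in s_I} \frac{\hat t_i}{\pi_{Ii}},
\qquad
\hat t_i = \sum_{k \in s_i} \frac{y_k}{\pi_{k|i}}

% Model-assisted (difference form): predictions for every cluster,
% plus a design-weighted correction from the sampled residuals
\hat t_{MA} = \sum_{i \in U_I} \hat t_i^{\,m}
            + \sum_{i \in s_I} \frac{\hat t_i - \hat t_i^{\,m}}{\pi_{Ii}}

% Model-based (prediction form): observed values plus predictions
% for every unobserved element
\hat t_{MB} = \sum_{i \in s_I} \Big[ \sum_{k \in s_i} y_k + (N_i - n_i)\,\hat\mu_i \Big]
            + \sum_{i \in U_I \setminus s_I} N_i\,\hat\mu_i
```

The model-assisted form stays approximately design-unbiased even when the model is wrong, because the residual term corrects the predictions; the model-based form has no such correction and leans entirely on the model.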

Notes on the Models and Model Parameters
3 different models used
Linear
Penalized spline with random effect for PSU
Penalized spline with no random effect for PSU
In a survey context, such as those found in
environmental monitoring, it is often desirable
to obtain a single set of survey weights that can
be used to predict any study variable. To
accommodate this:
Smoothing parameter for spline is selected by
fixing the degrees of freedom for the smooth
rather than using a data-driven approach
Variance component for PSU effect is computed for
the linear model, and the resulting covariance matrix
and corresponding survey weights are applied to
samples from other data sets
In this kind of survey context, model-assisted
estimators have good efficiency properties and
should be superior to model-based estimators
which rely on correct specification of variance
components
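Fixing the degrees of freedom, rather than tuning the smoothing parameter to each response, can be illustrated as follows. This is a minimal sketch, not the authors' implementation: it assumes a linear truncated-power spline basis, a ridge-type penalty on the knot coefficients only, and hypothetical helper names (`spline_basis`, `degrees_of_freedom`, `lambda_for_df`). The degrees of freedom are taken as the trace of the smoother matrix, which depends on the auxiliary variable but not on any study variable.

```python
import numpy as np

def spline_basis(x, knots):
    # Linear truncated-power basis: [1, x, (x - k_1)+, ..., (x - k_K)+]
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def degrees_of_freedom(X, lam, n_unpenalized=2):
    # df = trace of the smoother matrix X (X'X + lam * D)^{-1} X',
    # computed as trace((X'X + lam * D)^{-1} X'X).
    D = np.eye(X.shape[1])
    D[:n_unpenalized, :n_unpenalized] = 0.0  # intercept and slope unpenalized
    XtX = X.T @ X
    return np.trace(np.linalg.solve(XtX + lam * D, XtX))

def lambda_for_df(X, target_df, lo=1e-8, hi=1e8, iters=200):
    # df decreases monotonically as lam grows, so bisect on the log scale.
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if degrees_of_freedom(X, mid) > target_df:
            lo = mid   # too wiggly: need more penalty
        else:
            hi = mid   # too smooth: need less penalty
    return np.sqrt(lo * hi)

x = np.linspace(0.0, 1.0, 200)          # hypothetical auxiliary values
knots = np.linspace(0.0, 1.0, 17)[1:-1]  # 15 interior knots
X = spline_basis(x, knots)
lam = lambda_for_df(X, target_df=4.0)
```

Because `lam` is found from `X` alone, the same penalty, and hence the same set of weights, can be reused for every study variable.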

Case A: Cluster-Level Auxiliaries (Our Focus)
The auxiliary information is available for all
clusters in the population
Leads to regression modeling of quantities
associated with the clusters, such as cluster
totals
Cluster quantities can be computed for all
clusters
Population quantities can be computed from
cluster estimates
Example: Lake represents a cluster; auxiliary
information is elevation
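When the auxiliary is known for every cluster, a model-assisted estimate of the population total can be sketched as below. This is an illustration under assumptions, not the presentation's estimator: it uses a linear working model fitted by design-weighted least squares in place of the penalized spline, and the function and argument names are hypothetical.

```python
import numpy as np

def model_assisted_total(x_all, s_I, t_hat, pi_I):
    """Difference-type model-assisted estimator with a linear working model.

    x_all : auxiliary value for every cluster in the population
    s_I   : indices of the sampled clusters
    t_hat : estimated totals of the sampled clusters (same order as s_I)
    pi_I  : first-order inclusion probabilities for all clusters
    """
    s_I = np.asarray(s_I)
    w = 1.0 / pi_I[s_I]                      # design weights
    X_s = np.column_stack([np.ones(len(s_I)), x_all[s_I]])
    W = np.diag(w)
    # Design-weighted least squares fit of cluster totals on the auxiliary.
    beta = np.linalg.solve(X_s.T @ W @ X_s, X_s.T @ W @ t_hat)
    X_all = np.column_stack([np.ones(len(x_all)), x_all])
    pred = X_all @ beta                      # predicted total for every cluster
    # Predictions summed over the population, plus weighted sample residuals.
    return pred.sum() + np.sum(w * (t_hat - pred[s_I]))

# Toy check: if cluster totals are exactly linear in x, the estimator
# reproduces the true population total from any sample.
x_all = np.arange(1.0, 11.0)
true_totals = 3.0 + 2.0 * x_all
s_I = np.array([0, 3, 7])
pi_I = np.full(10, 0.3)
estimate = model_assisted_total(x_all, s_I, true_totals[s_I], pi_I)
```

In the lake example, `x_all` would hold the elevation of every lake and `t_hat` the estimated totals for the visited lakes.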

Generating Responses
500 PSUs; the number of SSUs per cluster is
Uniform(50, 400)
$\mu_{PSU} = m(\pi_I) + \eta$, where $m(\cdot)$ is one of the eight
functions below and $\eta \sim N(0, \sigma^2 I)$
We use first-order inclusion probabilities
proportional to size (pps)
Auxiliary data is often proportional to size of
cluster
Response of interest: $y_{ij} = \mu_i + \epsilon_{ij}$, where $y_{ij}$ is
the jth element in the ith cluster and $\epsilon_{ij}$ iid
$N(0, \sigma^2)$
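The data-generating steps above can be sketched as follows. Several pieces are assumptions rather than facts from the slide: the symbols in the transcript are garbled, so the argument of the mean function is read here as the vector of pps inclusion probabilities; the first-stage sample size (`n_first_stage`), the two standard deviations, and the example mean function are hypothetical; and "Uniform(50, 400)" is taken to mean integers from 50 to 400 inclusive.

```python
import numpy as np

def generate_population(m, n_psu=500, n_first_stage=50,
                        sigma_psu=0.1, sigma_ssu=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Cluster sizes: integers 50..400 inclusive (assumed reading).
    N_i = rng.integers(50, 401, size=n_psu)
    # First-order inclusion probabilities proportional to cluster size.
    pi_I = n_first_stage * N_i / N_i.sum()
    # PSU-level means: mean function of the auxiliary plus normal noise.
    mu = m(pi_I) + rng.normal(0.0, sigma_psu, size=n_psu)
    # Element-level responses: PSU mean plus iid normal noise.
    y = [mu_i + rng.normal(0.0, sigma_ssu, size=n) for mu_i, n in zip(mu, N_i)]
    return N_i, pi_I, mu, y

# Hypothetical linear mean function; the eight functions used in the
# presentation are not reproduced here.
N_i, pi_I, mu, y = generate_population(lambda p: 1.0 + 5.0 * p)
```

With these defaults every inclusion probability stays well below 1, so the pps design is feasible without capping.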

Comments on Simulation Results
500 samples from each of the populations were
drawn
H-T: Horvitz-Thompson estimator
M-A lin: Model-assisted estimator using a linear model
M-B pmmra: Model-based estimator using a penalized spline and including a random effect for PSU
M-A pmm: Model-assisted estimator using a penalized spline with no random effect for PSU
Point represents MSE(Estimator) / MSE(Model-assisted estimator with random effect for PSU)
Vertical black bars represent approximate 95% confidence intervals
Model-assisted estimator with random effect for
PSU is as efficient as or more efficient than the
model-based estimator; we do not appear to lose
efficiency (with respect to MSE) by using
model-assisted nonparametric methods
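The plotted quantity, a ratio of mean squared errors over repeated samples, can be computed along these lines. This is a minimal sketch with hypothetical function names; it assumes the true population total is known, as it is in a simulation, and that each estimator has been evaluated on the same set of replicate samples.

```python
import numpy as np

def mse(estimates, true_total):
    # Mean squared error of replicate estimates around the known total.
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - true_total) ** 2)

def mse_ratio(estimates, reference_estimates, true_total):
    # Ratio > 1 means the estimator is less efficient than the reference.
    return mse(estimates, true_total) / mse(reference_estimates, true_total)

# Toy numbers only, to show the calculation.
ratio = mse_ratio([8.0, 12.0], [9.0, 11.0], true_total=10.0)
```

Using the model-assisted estimator with a random effect for PSU as the reference puts all estimators on a common scale, so values near or above 1 read as "no better than the reference".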

Case B: Complete Element-Level Auxiliaries
The auxiliary information is available for all
elements in the population
Leads to regression modeling of quantities
associated with the elements
Cluster and population quantities can then be
computed from element estimates and observations
Example: EMAP hexagon is cluster; lake is
element; auxiliary information is elevation

Case C: Limited Element-Level Auxiliaries
The auxiliary information is available for all
elements in selected clusters only
Leads to regression modeling of quantities
associated with the elements
Regression estimators can be used for
cluster-level quantities only for the clusters
selected in the first-stage sample
Example: Aerial photography of selected sites
(clusters); for each point (element) in site, we
have percent forested, urban, industrial

Case D: Limited Cluster-Level Auxiliaries
The auxiliary information is available for all
clusters in the first-stage sample
Not a very interesting case
Design-based estimator can be used for population
quantities
In some cases, good estimators for population
quantities are not available
Example: Cluster is lake; auxiliary information
is measure of size, which is not available until
site is visited