Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design (Mark Delorey, F. Jay Breidt) - PowerPoint PPT Presentation

Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design
Mark Delorey, F. Jay Breidt, Colorado State University
Abstract: In aquatic resource surveys, a two-stage
sampling design can be employed to make the best
use of what are often limited time and financial
resources. Even with the ability to focus such
resources, it is often the case that the sample
sizes are not sufficiently large to make
model-free inferences. The presence of auxiliary
information for the regions of interest suggests
employing a model in our inferences. Breidt,
Claeskens, and Opsomer (2003) propose
incorporating this auxiliary information through
a class of model-assisted estimators based on
penalized spline regression in single-stage
sampling. Zheng and Little (2003) also use
penalized spline regression in a model-based
approach for finite population estimation in a
two-stage sample. In a survey context, weights
computed from a set of auxiliary information are
often applied to many study variables. With this
approach, model-assisted estimators should fare
better than model-based estimators. We compare
the two through a series of simulations.
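  • Two-Stage Sampling
  • The population of elements $U = \{1, \dots, k, \dots, N\}$ is
    partitioned into clusters or primary sampling
    units (PSUs), $U_1, \dots, U_i, \dots, U_{N_I}$, with $N_I$ the number of PSUs. So
    $U = \bigcup_{i=1}^{N_I} U_i$ and $N = \sum_{i=1}^{N_I} N_i$, where $N_i$
    is the number of elements or secondary sampling
    units (SSUs) in $U_i$.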
  • First stage: A sample of clusters, $s_I$, is
    selected based on a design $p_I(\cdot)$ with inclusion
    probabilities $\pi_{Ii}$ and $\pi_{Iij}$.
  • $\pi_{Ii}$ and $\pi_{Iij}$ are the first- and second-order
    inclusion probabilities, respectively.
  • Second stage: For every $i \in s_I$, a sample $s_i$ is
    drawn from $U_i$ based on the design $p_i(\cdot \mid s_I)$.
  • Typically, the second-stage design is required to be
    invariant and independent of the first stage.
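The two stages can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions that are not in the slides: both stages use simple random sampling without replacement (so every PSU has the same first-order inclusion probability), and the function name `draw_two_stage_sample` and the sample sizes are hypothetical. The second stage uses the same rule in every selected PSU and does not look at which other PSUs were drawn, which is what invariance and independence ask for.

```python
import numpy as np

def draw_two_stage_sample(cluster_sizes, n_psu, n_ssu, rng):
    """Stage 1: SRS of PSUs. Stage 2: SRS of SSUs inside each selected PSU.

    Assumption (not from the slides): equal-probability sampling at both stages.
    Returns a dict {psu index: array of sampled SSU indices} and the
    first-order PSU inclusion probability n_psu / (number of PSUs).
    """
    cluster_sizes = np.asarray(cluster_sizes)
    n_clusters = cluster_sizes.size
    # First stage: choose which clusters enter the sample.
    s_I = np.sort(rng.choice(n_clusters, size=n_psu, replace=False))
    pi_I = n_psu / n_clusters
    # Second stage: same design in every selected cluster.
    sample = {}
    for i in s_I:
        size_i = int(cluster_sizes[i])
        take = min(n_ssu, size_i)
        sample[int(i)] = np.sort(rng.choice(size_i, size=take, replace=False))
    return sample, pi_I

rng = np.random.default_rng(0)
sizes = rng.integers(50, 401, size=500)   # hypothetical cluster sizes
sample, pi_I = draw_two_stage_sample(sizes, n_psu=25, n_ssu=10, rng=rng)
```

A probability-proportional-to-size first stage would replace the `rng.choice` call for `s_I` and make `pi_I` a vector rather than a scalar.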
  • Two-Stage Sampling with Aquatic Resources
  • Time and expense constraints may make two-stage
    sampling more efficient
  • Auxiliary information may be available on
    different scales
  • The Estimators (for population totals)
  • Horvitz-Thompson (HT)
  • Model-assisted, built on the PSU total
    predicted by the model
  • Model-based, built on the ith cluster mean
    predicted by the model
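The formulas for these estimators did not survive in the transcript, so the sketch below uses standard textbook forms as an assumption, not the authors' exact expressions. The Horvitz-Thompson total weights each estimated PSU total by the inverse of its first-stage inclusion probability; the model-assisted total is written in difference form, as the sum of model-predicted PSU totals over all PSUs plus a design-weighted sum of residuals over the sampled PSUs. The function names and toy numbers are hypothetical, and the within-PSU expansion assumes simple random sampling at the second stage.

```python
import numpy as np

def estimated_psu_total(y_sampled, N_i):
    """Expand a within-PSU sample mean to a PSU total (assumes SRS within the PSU)."""
    return N_i * float(np.mean(y_sampled))

def ht_total(t_hat, pi_I):
    """Horvitz-Thompson total: sum over sampled PSUs of t_hat_i / pi_Ii."""
    return float(np.sum(np.asarray(t_hat) / np.asarray(pi_I)))

def model_assisted_total(t_hat, t_pred_sample, pi_I, t_pred_all):
    """Difference-form total: predictions for all PSUs plus weighted residuals."""
    resid = np.asarray(t_hat) - np.asarray(t_pred_sample)
    return float(np.sum(t_pred_all) + np.sum(resid / np.asarray(pi_I)))

# Toy example: 4 PSUs in the population, the first 2 sampled.
t_hat = np.array([estimated_psu_total([1.0, 3.0], 5),
                  estimated_psu_total([4.0], 5)])
pi_I = np.array([0.5, 0.25])
t_pred_all = np.array([12.0, 18.0, 30.0, 40.0])   # hypothetical model predictions
t_pred_sample = t_pred_all[:2]

ht = ht_total(t_hat, pi_I)
ma = model_assisted_total(t_hat, t_pred_sample, pi_I, t_pred_all)
```

If the model predicts poorly, the residual term still corrects the estimate through the design weights, which is why the model-assisted form does not depend on the model being correctly specified.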
  • Notes on the Models and Model Parameters
  • 3 different models used
  • Linear
  • Penalized spline with random effect for PSU
  • Penalized spline with no random effect for PSU
  • In a survey context, such as those found in
    environmental monitoring, it is often desirable
    to obtain a single set of survey weights that can
    be used to predict any study variable. To
    accommodate this:
  • Smoothing parameter for the spline is selected by
    fixing the degrees of freedom for the smooth
    rather than using a data-driven approach
  • Variance component for the PSU effect is computed for
    the linear model, and the resulting covariance matrix
    and corresponding survey weights are applied to
    samples from other data sets
  • In this kind of survey context, model-assisted
    estimators have good efficiency properties and
    should be superior to model-based estimators,
    which rely on correct specification of variance
    components
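Fixing the degrees of freedom means choosing the smoothing parameter so that the trace of the smoother matrix equals a preset value, which depends only on the auxiliary variable and not on any response. The sketch below is one way to do that, under assumptions not stated in the slides: a truncated-linear spline basis, knots at sample quantiles, a ridge penalty on the knot coefficients only, and a bisection search on the log scale. All names (`spline_design`, `smoother_df`, `lambda_for_df`) and the target of 5 degrees of freedom are hypothetical.

```python
import numpy as np

def spline_design(x, knots):
    """Design matrix [1, x, (x - k_1)_+, ..., (x - k_K)_+]."""
    x = np.asarray(x, dtype=float)
    trunc = np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, trunc])

def smoother_df(C, D, lam):
    """Degrees of freedom: trace of C (C'C + lam D)^{-1} C'."""
    CtC = C.T @ C
    return float(np.trace(np.linalg.solve(CtC + lam * D, CtC)))

def lambda_for_df(C, D, target_df, lo=1e-8, hi=1e8, iters=200):
    """Bisection on log(lambda); df decreases as lambda grows."""
    log_lo, log_hi = np.log(lo), np.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        if smoother_df(C, D, np.exp(mid)) > target_df:
            log_lo = mid      # too wiggly: increase the penalty
        else:
            log_hi = mid      # too smooth: decrease the penalty
    return float(np.exp(0.5 * (log_lo + log_hi)))

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))               # hypothetical auxiliary values
knots = np.quantile(x, np.linspace(0.0, 1.0, 17)[1:-1])  # 15 interior knots
C = spline_design(x, knots)
D = np.diag([0.0, 0.0] + [1.0] * len(knots))          # penalize knot terms only
lam = lambda_for_df(C, D, target_df=5.0)
```

Because `lam` never looks at a response vector, the same smoother, and hence the same set of weights, can be reused for every study variable.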
  • Case A: Cluster-Level Auxiliaries (our focus)
  • The auxiliary information is available for all
    clusters in the population
  • Leads to regression modeling of quantities
    associated with the clusters, such as cluster
    totals
  • Cluster quantities can be computed for all
    clusters
  • Population quantities can be computed from
    cluster estimates
  • Example: a lake represents a cluster; the auxiliary
    information is elevation
  • Generating Responses
  • 500 PSUs; the number of SSUs per cluster is drawn from
    Uniform(50, 400)
  • $\mu_{PSU} = m(\pi_I) + \epsilon$, where $m(\cdot)$ is one of the eight
    functions below and $\epsilon \sim N(0, \sigma^2 I)$
  • We use first-order inclusion probabilities
    proportional to size (pps)
  • Auxiliary data is often proportional to size of
    cluster
  • Response of interest: $y_{ij} = \mu_i + \epsilon_{ij}$, where $y_{ij}$ is
    the jth element in the ith cluster and $\epsilon_{ij}$ iid
    $N(0, \sigma^2)$
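A population of this kind can be simulated as follows. This is a sketch under stated assumptions: the eight mean functions are not in the transcript, so `linear_mean` is a hypothetical stand-in; the cluster sizes are taken to be discrete uniform on 50 to 400 inclusive; the first-stage sample size of 50 PSUs and both standard deviations are made-up values; and the PSU-level means are generated from the size-proportional first-order inclusion probabilities.

```python
import numpy as np

def generate_population(m, n_clusters=500, n_psu_sample=50,
                        sigma_psu=1.0, sigma_elem=1.0, seed=0):
    """Simulate cluster sizes, pps inclusion probabilities, PSU means, responses.

    m             : mean function applied to the inclusion probabilities
    n_psu_sample  : hypothetical first-stage sample size used to scale pi_I
    """
    rng = np.random.default_rng(seed)
    # Number of SSUs per cluster: integers from 50 to 400 inclusive.
    N_i = rng.integers(50, 401, size=n_clusters)
    # First-order inclusion probabilities proportional to cluster size.
    pi_I = n_psu_sample * N_i / N_i.sum()
    # PSU-level mean: smooth function of pi_I plus normal noise.
    mu = m(pi_I) + rng.normal(0.0, sigma_psu, size=n_clusters)
    # Element-level response: PSU mean plus iid normal noise.
    y = [mu_i + rng.normal(0.0, sigma_elem, size=int(n))
         for mu_i, n in zip(mu, N_i)]
    return N_i, pi_I, mu, y

def linear_mean(p):
    """Hypothetical mean function; the poster's eight functions are not shown."""
    return 1.0 + 10.0 * p

N_i, pi_I, mu, y = generate_population(linear_mean)
true_total = float(sum(v.sum() for v in y))   # target of the estimators
```

Swapping `linear_mean` for a nonlinear function gives a population on which a penalized spline would be expected to outperform the linear working model.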
  • Comments on Simulation Results
  • 500 samples from each of the populations were
    drawn
  • H-T: Horvitz-Thompson estimator
  • M-A lin: Model-assisted estimator using a linear
    model
  • M-B pmmra: Model-based estimator using a
    penalized spline and including a random effect
    for PSU
  • M-A pmm: Model-assisted estimator using
    a penalized spline with no random effect for PSU
  • Each point represents the ratio MSE(estimator) / MSE(model-assisted
    estimator with random effect for PSU)
  • Vertical black bars represent approximate 95%
    confidence intervals
  • Model-assisted estimator with random effect for
    PSU is as efficient as or more efficient than the
    model-based estimator; we do not appear to lose
    efficiency (with respect to MSE) by using
    model-assisted nonparametric methods
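The comparison plotted on the poster appears to be a ratio of Monte Carlo mean squared errors, with one estimator serving as the reference. A minimal sketch of that arithmetic, assuming each estimator's values over repeated samples are stored in an array and the true population total is known from the simulation (the function names and toy numbers are hypothetical):

```python
import numpy as np

def mse(estimates, true_total):
    """Monte Carlo MSE: average squared error over repeated samples."""
    err = np.asarray(estimates, dtype=float) - true_total
    return float(np.mean(err ** 2))

def mse_ratio(estimates, reference_estimates, true_total):
    """MSE of an estimator divided by the MSE of the reference estimator."""
    return mse(estimates, true_total) / mse(reference_estimates, true_total)

# Toy example with true total 10: errors (-1, 1, 0, 2) versus (0, 1, -1, 0).
est = [9.0, 11.0, 10.0, 12.0]
ref = [10.0, 11.0, 9.0, 10.0]
ratio = mse_ratio(est, ref, true_total=10.0)
```

A ratio above 1 means the estimator in the numerator is less efficient than the reference; a ratio near 1 means the two are comparable.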
  • Case B: Complete Element-Level Auxiliaries
  • The auxiliary information is available for all
    elements in the population
  • Leads to regression modeling of quantities
    associated with the elements
  • Cluster and population quantities can then be
    computed from element estimates and observations
  • Example: an EMAP hexagon is a cluster; a lake is an
    element; the auxiliary information is elevation
  • Case C: Limited Element-Level Auxiliaries
  • The auxiliary information is available for all
    elements in selected clusters only
  • Leads to regression modeling of quantities
    associated with the elements
  • Regression estimators can be used for
    cluster-level quantities only for the clusters
    selected in the first-stage sample
  • Example: aerial photography of selected sites
    (clusters); for each point (element) in a site, we
    have percent forested, urban, industrial
  • Case D: Limited Cluster-Level Auxiliaries
  • The auxiliary information is available for all
    clusters in the first-stage sample
  • Not a very interesting case
  • Design-based estimator can be used for population
    quantities
  • In some cases, good estimators for population
    quantities are not available
  • Example: the cluster is a lake; the auxiliary information
    is a measure of size that is not available until
    the site is visited