Title: Applications of Nonparametric Survey Regression Estimation in Aquatic Resources F. Jay Breidt, Siobhan Everson-Stewart, Alicia Johnson, Jean D. Opsomer
Application to Northeastern Lakes
Nonparametric Model-Assisted Survey Regression
Estimation: F. Jay Breidt, Jean D. Opsomer
- Population and Study Design
  - EMAP surveyed lakes in the northeastern United States from 1991-1996
  - Aquatic resource of interest is over 20,000 lakes in 8 states
  - 330 individual lakes were visited, each from one to six times
  - Many measurements were taken on each lake, including several lake chemistry levels
  - Acid neutralizing capacity (ANC) is a measure of a lake's ability to buffer itself
- Auxiliary Information
  - For every lake in the region of interest, auxiliary information included spatial location, elevation, and ecoregion
  - Use spatial location for illustration
  - Easy to extend semiparametrically with parametric terms for elevation and ecoregion
- Model-Assisted Estimation
  - Auxiliary Information
    - Use auxiliary information available for the entire aquatic resource of interest in addition to the sample data
    - Example: spatial location of every lake in the population is known for EPA's Environmental Monitoring and Assessment Program (EMAP) Northeastern Lakes study
  - General Form of the Model-Assisted Estimator
    - Estimate population total as sum of model-based predictions for all population elements, plus a design-bias adjustment
  - Classical Parametric Survey Regression Estimator
    - Model-based predictions come from regressing the sample response on the auxiliary variable
- A Nonparametric Approach
  - Motivation for Nonparametric Methods
    - Regression estimator is inefficient if true relationship between the response and the auxiliary information is not linear
    - Breidt and Opsomer (2000) replaced parametric regression by nonparametric regression
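The general form of the model-assisted estimator can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy population, and the simple random sample with equal inclusion probabilities are all hypothetical. The estimated total is the sum of predictions over every population element plus the inverse-probability-weighted sum of sample residuals (the design-bias adjustment). Here the predictions come from a survey-weighted straight-line fit, standing in for the classical parametric regression estimator.

```python
import numpy as np

def model_assisted_total(x_pop, x_s, y_s, pi_s, predict):
    """Sum of predictions over the population plus the design-weighted
    sum of sample residuals (the design-bias adjustment)."""
    m_pop = predict(x_pop)   # predictions for all population elements
    m_s = predict(x_s)       # predictions for the sampled elements
    return m_pop.sum() + np.sum((y_s - m_s) / pi_s)

def fit_weighted_line(x_s, y_s, pi_s):
    """Weighted least squares line with weights 1/pi_k (an assumed choice)."""
    X = np.column_stack([np.ones_like(x_s), x_s])
    w = 1.0 / pi_s
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y_s))
    return lambda x: beta[0] + beta[1] * np.asarray(x, dtype=float)

# Hypothetical toy population: the auxiliary x is known for every element.
rng = np.random.default_rng(0)
N, n = 1000, 100
x_pop = rng.uniform(0.0, 1.0, N)
y_pop = 2.0 + 3.0 * x_pop + rng.normal(0.0, 0.1, N)

# Simple random sample without replacement, so pi_k = n / N for all k.
idx = rng.choice(N, size=n, replace=False)
x_s, y_s = x_pop[idx], y_pop[idx]
pi_s = np.full(n, n / N)

predict = fit_weighted_line(x_s, y_s, pi_s)
t_hat = model_assisted_total(x_pop, x_s, y_s, pi_s, predict)
print(t_hat, y_pop.sum())
```

If `predict` returns zero everywhere, the same function reduces to the Horvitz-Thompson estimator, which is one way to see the residual term as a design-bias correction. Swapping `fit_weighted_line` for a nonparametric smoother gives the nonparametric version without changing `model_assisted_total`.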
Map of lake population and lakes included in the
EMAP Northeastern Lakes survey.
For more information, see Everson-Stewart (2003),
Nonparametric survey regression estimation in
two-stage spatial sampling, unpublished master's
project, Colorado State University, available at
http://www.stat.colostate.edu/starmap/everson-stewart.report.pdf.
Extension to CDF Estimation: Colorado State MS
Project, Alicia Johnson
- Findings
  - For both CDF estimation and estimation of the median:
    - Compared nonparametric regression estimator to Horvitz-Thompson and parametric estimators
    - Nonparametric regression estimator performed well, in terms of mean square error, especially when the parametric model was misspecified
    - Model-assisted approaches had lower relative bias than model-based approaches
Objectives: Extend nonparametric regression
estimation to finite population cumulative
distribution function (CDF) estimation and
compare to parametric techniques.
Illustration of local linear regression. Curves
at the bottom of the graph are kernel weights.
The solid lines show the local weighted least
squares fit at the points of interest. The dotted
line is the kernel smooth.
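The local weighted least squares fit described in this caption can be sketched as follows. This is an illustration only: the poster does not state which kernel or bandwidth was used, so the Epanechnikov kernel, the bandwidth `h`, and the function names are assumptions. At each point of interest `x0`, a straight line is fitted by least squares with kernel weights centered at `x0`, and the fitted intercept is the value of the smooth there.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice of kernel)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear(x0, x, y, h):
    """Local linear fit at x0: kernel-weighted least squares of y on
    (1, x - x0); the intercept is the fitted value at x0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = epanechnikov((x - x0) / h)            # kernel weights around x0
    X = np.column_stack([np.ones_like(x), x - x0])
    A = X.T @ (k[:, None] * X)
    b = X.T @ (k * y)
    return np.linalg.solve(A, b)[0]

# A straight line is reproduced by the local linear fit (up to rounding).
x_grid = np.linspace(0.0, 1.0, 51)
y_line = 1.0 + 2.0 * x_grid
fit_line = local_linear(0.3, x_grid, y_line, h=0.2)

# Smoothing a noisy curve at a few points of interest.
rng = np.random.default_rng(0)
y_noisy = np.sin(2.0 * np.pi * x_grid) + rng.normal(0.0, 0.2, x_grid.size)
smooth = [local_linear(x0, x_grid, y_noisy, h=0.2) for x0 in (0.25, 0.5, 0.75)]
print(fit_line, smooth)
```

The solve step fails if fewer than two distinct x values fall inside the kernel window, so the bandwidth must be wide enough for the data at hand.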
Cumulative distribution function of ANC based on
local planar regression (LPR) smooth on spatial
location, with 95% pointwise confidence
intervals. For comparison, design-based
empirical CDF and confidence bounds are also
shown.
- Approach
  - Replaced response variable by indicator: 1 for y ≤ t, 0 otherwise
  - Smoothed indicator versus auxiliary, x
  - Generated seven populations with various mean functions and variance terms
  - Performed simulation study to compare nonparametric regression CDF estimator to standard CDF estimators
    - for estimation of CDF at median
    - for estimation of median
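The approach above can be sketched in Python for a single value t. This is a rough illustration under stated assumptions, not the code used in the project: the kernel, the bandwidth, the toy population, and all function names are hypothetical. The response is replaced by the indicator of y ≤ t, the indicator is smoothed against x with a survey-weighted local linear fit, and the model-assisted form (predictions summed over the population plus weighted sample residuals) is divided by the population size.

```python
import numpy as np

def local_linear_smooth(x_eval, x_s, z_s, w_s, h):
    """Local linear smooth of z on x at each point of x_eval, using
    Epanechnikov kernel weights (assumed) times the survey weights w_s."""
    out = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        u = (x_s - x0) / h
        k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) * w_s
        X = np.column_stack([np.ones_like(x_s), x_s - x0])
        A = X.T @ (k[:, None] * X)
        b = X.T @ (k * z_s)
        # lstsq returns a least-squares solution even if A is singular
        out[j] = np.linalg.lstsq(A, b, rcond=None)[0][0]
    return out

def cdf_model_assisted(t, x_pop, x_s, y_s, pi_s, h):
    """Model-assisted estimate of the finite population CDF at t."""
    z_s = (y_s <= t).astype(float)     # indicator replaces the response
    w_s = 1.0 / pi_s
    m_pop = local_linear_smooth(x_pop, x_s, z_s, w_s, h)
    m_s = local_linear_smooth(x_s, x_s, z_s, w_s, h)
    return (m_pop.sum() + np.sum((z_s - m_s) * w_s)) / len(x_pop)

# Hypothetical population with a nonlinear mean function.
rng = np.random.default_rng(1)
N, n = 1000, 100
x_pop = rng.uniform(0.0, 1.0, N)
y_pop = np.sin(2.0 * np.pi * x_pop) + rng.normal(0.0, 0.3, N)
idx = rng.choice(N, size=n, replace=False)   # simple random sample
x_s, y_s = x_pop[idx], y_pop[idx]
pi_s = np.full(n, n / N)

t_med = float(np.median(y_pop))              # true CDF at t_med is 0.5
F_hat = cdf_model_assisted(t_med, x_pop, x_s, y_s, pi_s, h=0.2)
F_ht = np.sum((y_s <= t_med) / pi_s) / N     # design-based comparison
print(F_hat, F_ht)
```

An estimate of this form is not guaranteed to lie in [0, 1] or to be monotone in t, so a median obtained by inverting it may need extra handling in practice.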
Illustration of the model mean and
standard deviation bounds (left) and the CDF
(right) for one of seven generated populations.
Relative biases and mean square error ratios
(relative to model-assisted local linear, LLR)
for DB (design-based Horvitz-Thompson), CD0 and
CD1 (parametric model-based using ratio and
regression models), RKM0 and RKM1 (parametric
model-assisted using ratio and regression
models), and LLRB (local linear model-based)
CI for Proportion of Acidic Lakes with National
Surface Waters Survey Estimate
For more information, see Johnson, A. (2003),
Estimating Distribution Functions from Survey
Data, unpublished master's project, Colorado State
University, available at
http://www.stat.colostate.edu/starmap/johnsonaa.report.pdf.
This research is funded by U.S. EPA Science To
Achieve Results (STAR) Program Cooperative
Agreements CR 829095 and CR 829096.
The research described in this poster has been
funded by the U.S. Environmental Protection
Agency through STAR Cooperative Agreements
CR-829095 awarded by the U.S. Environmental
Protection Agency (EPA) to Colorado State
University and CR-829096 awarded to Oregon State
University. The poster has not been subjected
to the Agency's review and therefore does not
necessarily reflect the views of the Agency, and
no official endorsement should be inferred.