Title: Model- vs. design-based sampling and variance estimation on continuous domains
1 Model- vs. design-based sampling and variance estimation on continuous domains
- Cynthia Cooper
- OSU Statistics
- September 11, 2004
2 Introduction
- Research on model- and design-based sampling and estimation on continuous domains
- Compare ...
- Basis of inference of each
- Sampling concepts
- Interpretation of variance
- Variance estimation
3 Duality in Environmental Monitoring
- Design-based Estimates
- Status and trend
- No model of underlying stochastic process
- Defensible
- Probability sample
- Avoid selection bias
- Control sample process variance
- Model-based predictions
- Stochastic behavior of response
- Forecasting/prediction conditional on the
observed data
4 General Outline
- Introduction
- Summary comparison of approaches
- Summary characterization of variance estimators
- Proposed model-assisted variance estimator
- Simulation methods
- Design-based context results
- Model-based (kriging) results
- Conclusion
5 Comparison of approaches - Design-based
- Probability samples yield unbiased estimates
- Basis for long-run frequency properties
- Design-induced randomness gives rise to sample-process variance
- Basic linear estimator scales up sample responses to extrapolate to the population
- Inclusion probabilities
- Examples
- EPA EMAP
- ODFW Monitoring Plan Augmented Rotating Panel
- USFS Forest Inventory and Analysis
6 Comparison of approaches - Design-based
- Inclusion probability
- Element-wise: sum of the probabilities of all samples that include the ith element, $\pi_i$
- Pair-wise: sum of the probabilities of all samples that include both the ith and jth elements, $\pi_{ij}$
- For continuous domains
- Inclusion probability densities (IPD) (Cordy
(1993))
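To make the element-wise and pair-wise definitions concrete, here is a minimal Python sketch that enumerates every possible sample of a small, hypothetical finite design and sums the probabilities of the samples that contain each element or pair. The function name and the toy design (simple random sampling of 2 from 4 units) are illustrative assumptions; on a continuous domain the IPDs of Cordy (1993) are densities, not sums over enumerated samples.

```python
from itertools import combinations
from fractions import Fraction

def inclusion_probabilities(design):
    """design maps each possible sample (a tuple of unit labels) to its probability.
    Returns (pi, pi2): first-order pi_i and second-order pi_ij (keyed by i < j)."""
    pi, pi2 = {}, {}
    for sample, p in design.items():
        # Element-wise: add p for every unit in this sample.
        for i in sample:
            pi[i] = pi.get(i, 0) + p
        # Pair-wise: add p for every unordered pair in this sample.
        for i, j in combinations(sorted(sample), 2):
            pi2[(i, j)] = pi2.get((i, j), 0) + p
    return pi, pi2

# Hypothetical design: simple random sampling without replacement, n = 2 of N = 4.
units = [1, 2, 3, 4]
samples = list(combinations(units, 2))
srs = {s: Fraction(1, len(samples)) for s in samples}
pi, pi2 = inclusion_probabilities(srs)
```

For this design every $\pi_i$ equals $n/N = 1/2$ and every $\pi_{ij}$ equals $n(n-1)/(N(N-1)) = 1/6$; the $\pi_i$ also sum to the sample size.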
7 Comparison of approaches - Model-based
- Response generated by a stochastic process
- Likelihood-based approaches to estimating model parameters
- BLUP
- Conditional on values observed in sample
- Examples
- Mining surveys
- Soil and hydrology surveys
8 Variance estimators - Design-based
- Quantifies variability induced by the sampling process
- Variance of linear estimators
- Scale up square and cross-product terms by the inverse marginal and pair-wise inclusion probability densities (IPDs)
- For continuous domains
- Congruent-tessellation stratified samples with one observation per stratum
- Require a randomized grid origin to achieve non-zero cross-product terms ($\pi_{ij}-\pi_i\pi_j$) (Stevens (1997))
9 Variance estimators - Design-based
- Horvitz-Thompson (HT)
- Can be negative
- Especially for samples with a point pair in close proximity
- Requires a randomly located tessellation grid
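A minimal sketch of the finite-population Horvitz-Thompson variance estimator, written directly from the textbook formula with square terms $(1-\pi_i)y_i^2/\pi_i^2$ and cross-product terms $(\pi_{ij}-\pi_i\pi_j)y_iy_j/(\pi_{ij}\pi_i\pi_j)$. The function names and the simple-random-sampling check are assumptions for illustration; the continuous-domain version replaces these probabilities with IPDs.

```python
from itertools import combinations

def ht_total(y, pi):
    """Horvitz-Thompson estimate of a total: sum of y_i / pi_i over sampled units."""
    return sum(yi / p for yi, p in zip(y, pi))

def ht_variance(y, pi, pi2):
    """Horvitz-Thompson variance estimator for the HT total.

    y, pi : responses and first-order inclusion probabilities of the sampled units
    pi2   : matrix with pi2[i][j] = joint inclusion probability of sampled units i, j
    """
    n = len(y)
    # Square terms: (1 - pi_i) / pi_i^2 * y_i^2
    v = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(n))
    # Cross-product terms over i != j. When these are negative and large,
    # the estimate can fall below zero.
    for i, j in combinations(range(n), 2):
        w = (pi2[i][j] - pi[i] * pi[j]) / (pi2[i][j] * pi[i] * pi[j])
        v += 2 * w * y[i] * y[j]  # factor 2: each unordered pair occurs twice when i != j
    return v

# Check with simple random sampling without replacement (N = 20, n = 5),
# where the HT estimator reduces to N^2 (1 - n/N) s^2 / n.
N, n = 20, 5
y = [3.0, 5.0, 4.0, 8.0, 6.0]
pi = [n / N] * n
pi2 = [[n * (n - 1) / (N * (N - 1))] * n for _ in range(n)]
total_hat = ht_total(y, pi)      # 104.0
v_hat = ht_variance(y, pi, pi2)  # approximately 222.0
```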
10 Variance estimators - Design-based
- Yates-Grundy (YG)
- Assumes a fixed effective sample size
- Point pairs in close proximity can destabilize the estimator (Stevens (2003))
- Requires a randomly located tessellation grid
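The Yates-Grundy form can be sketched the same way. The weight $(\pi_i\pi_j-\pi_{ij})/\pi_{ij}$ shows one way close point pairs can destabilize it: if a pair is rarely sampled together, $\pi_{ij}$ is near zero and that single term dominates. The names, inputs, and the artificially small $\pi_{ij}$ below are hypothetical and chosen only to show the effect.

```python
from itertools import combinations

def yg_variance(y, pi, pi2):
    """Yates-Grundy (Sen-Yates-Grundy) variance estimator of the HT total,
    valid for fixed-size designs. Sums over unordered pairs i < j."""
    v = 0.0
    for i, j in combinations(range(len(y)), 2):
        # Weight (pi_i pi_j - pi_ij) / pi_ij grows without bound as pi_ij -> 0.
        w = (pi[i] * pi[j] - pi2[i][j]) / pi2[i][j]
        v += w * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v

# Simple random sampling without replacement, N = 20, n = 5.
N, n = 20, 5
y = [3.0, 5.0, 4.0, 8.0, 6.0]
pi = [n / N] * n
pij = n * (n - 1) / (N * (N - 1))
pi2 = [[pij] * n for _ in range(n)]
v_srs = yg_variance(y, pi, pi2)  # approximately 222.0 = N^2 (1 - n/N) s^2 / n

# Hypothetical: make units 0 and 3 very unlikely to be sampled together.
pi2_close = [row[:] for row in pi2]
pi2_close[0][3] = pi2_close[3][0] = 1e-6
v_close = yg_variance(y, pi, pi2_close)  # dominated by that single pair
```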
11 Variance estimators - Model-based
- Estimating the MSPE of the BLUP
- Involves variances and covariances associated with the square and cross-product terms of the error
- Assume a form of covariance function that describes the rate of decay of covariance
- Exponential
- Spherical
- Must result in a positive-definite covariance matrix
- Incremental stationarity
- $\tfrac{1}{2}E[(z(s_i)-z(s_0))^2] = \gamma(s_i - s_0) = \gamma(h)$
- Typically, as $h$ increases, $E[(z(s_i)-z(s_0))^2]$ increases
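The covariance models and the incremental-stationarity relation can be sketched as below, assuming second-order stationarity so that $\gamma(h) = C(0) - C(h)$. The parameterizations are common textbook forms and are assumptions here; with the exponential form, $r$ acts as a scale, and the covariance drops to about 5% of the sill at $h = 3r$.

```python
import math

def exp_cov(h, sill, rng):
    """Exponential covariance C(h) = sill * exp(-h / rng)."""
    return sill * math.exp(-h / rng)

def sph_cov(h, sill, rng):
    """Spherical covariance: reaches exactly zero at h = rng."""
    if h >= rng:
        return 0.0
    t = h / rng
    return sill * (1.0 - 1.5 * t + 0.5 * t ** 3)

def semivariogram(cov, h, sill, rng):
    """gamma(h) = C(0) - C(h): half the expected squared increment
    for a second-order stationary field."""
    return cov(0.0, sill, rng) - cov(h, sill, rng)

# Sill 4, range 2 (the 4exp(-h/2) example from the methods slides).
lags = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
gam_exp = [semivariogram(exp_cov, h, 4.0, 2.0) for h in lags]
gam_sph = [semivariogram(sph_cov, h, 4.0, 2.0) for h in lags]
```

Both semivariograms start at 0 and rise toward the sill as $h$ grows; the spherical one reaches it exactly at $h = r$.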
12 Variance estimators - Model-based
- Variance
- Quantifies the stochastic variability of the expected value of the response
- Vanishes as $s_i - s_0 \to 0$
- Mean-square prediction error (MSPE)
- a.k.a. MSE
- Variance + bias$^2$
- Sample process variability of BLUP
- Weighted averages vary less
- Varies more as sample range increases relative to
resolution
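To illustrate the MSPE of the BLUP, this sketch computes the ordinary-kriging predictor and its MSPE, $C(0) - \lambda^\top c_0 - m$, where $m$ is the Lagrange multiplier for the unbiasedness constraint. It assumes an exponential covariance with sill 1 and range 1, no nugget, and known parameters; a small pure-Python solver keeps it self-contained. It shows the MSPE vanishing at a sampled location.

```python
import math

def exp_cov(h, sill=1.0, rng=1.0):
    """Exponential covariance sill * exp(-h / rng); sill 1 and range 1 assumed."""
    return sill * math.exp(-h / rng)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(sites, values, s0, cov=exp_cov):
    """Ordinary-kriging prediction at s0 and its MSPE.

    Solves [C 1; 1^T 0] [lam; m] = [c0; 1], so the weights lam sum to 1,
    then MSPE = C(0) - lam^T c0 - m.
    """
    n = len(sites)
    a = [[cov(math.dist(p, q)) for q in sites] + [1.0] for p in sites]
    a.append([1.0] * n + [0.0])
    c0 = [cov(math.dist(p, s0)) for p in sites]
    sol = solve(a, c0 + [1.0])
    lam, mult = sol[:n], sol[n]
    pred = sum(w * z for w, z in zip(lam, values))
    mspe = cov(0.0) - sum(w * c for w, c in zip(lam, c0)) - mult
    return pred, mspe

sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 0.5, 1.5]
pred_c, mspe_c = ordinary_kriging(sites, values, (0.5, 0.5))  # centre of the square
pred_s, mspe_s = ordinary_kriging(sites, values, (1.0, 0.0))  # at a sampled site
```

At the centre the four weights are equal by symmetry, so the prediction is the sample mean 1.25 and the MSPE is positive but below the sill; at the sampled site the predictor reproduces the observed value and the MSPE is zero up to rounding.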
13 Proposed model-assisted variance ($V_{MA}$)
Use the covariance structure of the response to model variability due to the sampling process
- Predict variance within a stratum
- Variance is reduced by the mean covariance (assuming positively correlated elements)
- Similar to error variance computations (Ripley (1981))
- Within-stratum variance estimated as
- Sill reduced by within-stratum average covariance
- Linear estimator variance estimated as sum of
squared coefficients times within-stratum variance
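A minimal sketch of the model-assisted idea as described on this slide, under stated assumptions: square strata, an exponential covariance with known sill and range (rather than REML estimates), and the average within-stratum covariance approximated on an $m \times m$ midpoint grid. Names such as `v_ma` are hypothetical; this is not the author's implementation.

```python
import math

def avg_within_cov(cov, side, m=16):
    """Average covariance between two points placed independently and uniformly
    in a square stratum of the given side, approximated on an m x m midpoint
    grid (m**4 pairs, including m**2 zero-distance pairs)."""
    step = side / m
    pts = [((i + 0.5) * step, (j + 0.5) * step) for i in range(m) for j in range(m)]
    total = sum(cov(math.dist(p, q)) for p in pts for q in pts)
    return total / len(pts) ** 2

def v_ma(coefs, cov, side, m=16):
    """Model-assisted variance of a linear estimator sum(a_i * z(s_i)):
    within-stratum variance = sill - average within-stratum covariance,
    then V_MA = sum(a_i**2) * within-stratum variance."""
    within = cov(0.0) - avg_within_cov(cov, side, m)
    return sum(a * a for a in coefs) * within

def cov_exp(h):
    # Sill 4, range 2, treated as known here instead of REML estimates.
    return 4.0 * math.exp(-h / 2.0)

n, side = 100, 2.0
area = side * side
coefs = [area] * n  # HT total with IPD 1/area: z_hat = sum(area * z(s_i))
v = v_ma(coefs, cov_exp, side)
```

With one observation per stratum and IPD $1/A$ for stratum area $A$, each HT coefficient is $A$, so $V_{MA} = nA^2(\text{sill} - \bar C_{\text{within}})$. Larger strata have a smaller average covariance and therefore a larger within-stratum variance.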
14 Precursors of and precedence for modeling covariance
- Cochran (1946)
- Finite population
- Serial correlation with discrete lags
- Bellhouse (1977)
- Continued extension of Cochran's work to finite populations ordered in two dimensions
- Small-area estimation model-assisted approaches
- J.N.K. Rao (2003)
15 Methods part 1
- Random field (background) generated in R
- M. Schlather's GaussRF() from the R package RandomFields
- Exponential covariance structure $b\exp(-h/r)$
- e.g., $4\exp(-h/2)$
- $h$ is distance; $b$ and $r$ are the "sill" and "range" parameters
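GaussRF() is R code; as a hedged Python stand-in, the sketch below draws one realization of a zero-mean Gaussian field with covariance $b\exp(-h/r)$ by Cholesky-factoring the covariance matrix. This exact method costs $O(n^3)$ and only suits small grids; the 10 × 10 grid and seed are assumptions, and nothing here reflects how RandomFields itself simulates.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive-definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def simulate_grf(coords, sill, rng, seed=None):
    """One realization of a zero-mean Gaussian random field with exponential
    covariance sill * exp(-h / rng) at the given coordinates."""
    n = len(coords)
    c = [[sill * math.exp(-math.dist(coords[i], coords[j]) / rng) for j in range(n)]
         for i in range(n)]
    L = cholesky(c)
    gen = random.Random(seed)
    z = [gen.gauss(0.0, 1.0) for _ in range(n)]  # independent standard normals
    # Correlate them: field = L z has covariance L L^T = c.
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

coords = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]  # hypothetical grid
field = simulate_grf(coords, sill=4.0, rng=2.0, seed=1)
```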
16 Methods part 1a
- Repeat 1000 times per realization
- Stratified sample
- $n = 100$, one observation per stratum, stratum size 2 × 2
- Simple square-grid tessellation
- Randomized origin
- Constant origin
- REML estimate of covariance parameters $(b, r)$
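One way to draw the stratified sample with either a constant or a randomized origin is sketched below. The 20 × 20 domain (10 × 10 strata of size 2 × 2, giving $n = 100$) and the toroidal wrap used to keep shifted cells inside the domain are assumptions of this sketch, not details from the slides.

```python
import random

def stratified_sample(n_side, cell, randomize_origin, seed=None):
    """One uniform point per cell of an n_side x n_side square-grid tessellation
    of [0, n_side*cell)^2. With randomize_origin=True the whole grid is shifted
    by a uniform random offset and coordinates are wrapped back into the domain
    (toroidal convention, assumed for this sketch)."""
    gen = random.Random(seed)
    size = n_side * cell
    if randomize_origin:
        ox, oy = gen.uniform(0, cell), gen.uniform(0, cell)
    else:
        ox, oy = 0.0, 0.0
    pts = []
    for i in range(n_side):
        for j in range(n_side):
            x = (ox + (i + gen.random()) * cell) % size
            y = (oy + (j + gen.random()) * cell) % size
            pts.append((x, y))
    return pts

pts_fixed = stratified_sample(10, 2.0, randomize_origin=False, seed=42)
pts_rand = stratified_sample(10, 2.0, randomize_origin=True, seed=42)
```

With a constant origin, two points can never share a cell, so pairs in the same cell have $\pi_{ij} = 0$ and pairs in different cells are selected independently; shifting the origin at random is what makes $\pi_{ij}$ depend on the distance between points.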
17 Methods part 2
- Repeat 1000 times per realization (continued)
- For the design-based context
- Estimate the total ($\hat{z}$)
- HT estimator for a continuous domain
- Compute $V_{HT}$, $V_{YG}$, and $V_{MA}$
- Compare the estimated variances with the empirical variance ($V_{\hat{z}}$)
- For the model-based context example (kriging)
- Randomly selected $z_0$ at a fixed location over 1000 trials
- Obtain $\hat{z}$, $V_{OK}$, and $V_{MA}$
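The design-based loop can be sketched as follows: a hypothetical smooth surface stands in for one fixed realization, so all variability comes from the sampling design. With one point per stratum of area $A$, the IPD is $1/A$ and the HT total is $A\sum_i z(s_i)$. The domain size, the wrap-around convention, and the surface are assumptions.

```python
import math
import random
import statistics

def surface(x, y):
    # Hypothetical smooth response standing in for one fixed realization of the field.
    return 10 + 3 * math.sin(x / 3) + 2 * math.cos(y / 4)

def ht_total_stratified(z, cell_area):
    # One point per stratum of area A gives IPD pi(s) = 1/A,
    # so the HT total sum z(s_i) / pi(s_i) equals A * sum z(s_i).
    return cell_area * sum(z)

def replicate_totals(reps, n_side=10, cell=2.0, seed=0):
    gen = random.Random(seed)
    size = n_side * cell
    totals = []
    for _ in range(reps):
        ox, oy = gen.uniform(0, cell), gen.uniform(0, cell)  # randomized origin
        z = []
        for i in range(n_side):
            for j in range(n_side):
                x = (ox + (i + gen.random()) * cell) % size
                y = (oy + (j + gen.random()) * cell) % size
                z.append(surface(x, y))
        totals.append(ht_total_stratified(z, cell * cell))
    return totals

totals = replicate_totals(1000)
v_emp = statistics.variance(totals)  # empirical variance of the estimated total
```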
18 (No transcript)
19 Results - Design-based application
Empirical median relative error: compares the estimated variances with the empirical variance of the estimate of the total ($V_{\hat{z}}$) (stratified sample with randomized origin)
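The slides do not spell out the relative-error formula; assuming it is $(\hat V - V_{\hat z})/V_{\hat z}$ for each trial's variance estimate, summarized by the median over trials, it can be sketched as:

```python
import statistics

def median_relative_error(estimates, v_emp):
    """Median over trials of (V_hat - V_emp) / V_emp, where V_emp is the
    empirical variance of the estimated total across the same trials."""
    return statistics.median((v - v_emp) / v_emp for v in estimates)
```

A negative median indicates that the variance estimator tends to understate the empirical variance; a positive one, that it overstates it.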
20 Results - Design-based application
Exponential covariance with range 2 and sill 4
[Figure: three panels (Yates-Grundy variance, Horvitz-Thompson variance, model-assisted variance), each plotted against observed $V_{\hat{z}}$, with average and median marked]
21 Results - Design-based application
Ratios of empirical standard deviations (stratified sample with randomized origin)
22 Results - Model-based application
Exponential covariance with range 1 and sill 1 (stratified sample with randomized origin)
[Figure: two panels (kriging variance (MSPE), model-assisted variance), each plotted against observed $V_{\hat{z}}$, with average marked]
23 Conclusion - Model-assisted approach
- Small-area precedence
- Application to systematic and one-observation-per-stratum samples
- Effective alternative to direct variance estimators for continuous-domain, randomized-origin tessellation stratified samples
- Empirical results: less bias, better efficiency
- Doesn't require a randomly located tessellation grid on the continuous domain to obtain non-zero $\pi_{ij}$
24 Acknowledgements
- Thanks to
- Don Stevens
- Committee members
- OSU Statistics Faculty
- UW QERM Faculty
25 The research described in this presentation has
been funded by the U.S. Environmental Protection
Agency through the STAR Cooperative Agreement
CR82-9096-01 National Research Program on
Design-Based/Model-Assisted Survey Methodology
for Aquatic Resources at Oregon State
University. It has not been subjected to the
Agency's review and therefore does not
necessarily reflect the views of the Agency, and
no official endorsement should be inferred.