Some Extensions of the Generalized Random Tessellation Stratified (GRTS) Sampling Methodology

1
Some Extensions of the Generalized Random
Tessellation Stratified (GRTS) Sampling
Methodology

Don L. Stevens, Jr., OSU
Anthony R. Olsen, USEPA, WED

2
This presentation was developed under STAR
Research Assistance Agreement No. CR82-9096-01
awarded by the U.S. Environmental Protection
Agency to Oregon State University. It has not
been formally reviewed by EPA. The views
expressed in this document are solely those of
the author and EPA does not endorse any products
or commercial services mentioned in this
presentation.
3
Generalized Random-tessellation Stratified (GRTS)
Design

Design is based on a random function that maps
the unit square into the unit interval.
The random function is constructed so that it is
1-1 (one-to-one) and preserves some 2-dimensional proximity
relationships in the 1-dimensional image.
Accommodates variable sample point density,
sample augmentation, and spatially-structured
temporal samples.
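The variable sample point density mentioned above can be illustrated with a minimal sketch. It assumes the population units have already been placed in a 1-d order (for example by a quadrant-recursive map) and that each unit has a target inclusion probability. Laying those probabilities end to end on a line and taking a systematic sample with a random start gives each unit exactly its target probability, while spreading the sample along the order. The function `systematic_select` and its arguments are hypothetical, not taken from the presentation.

```python
import random

def systematic_select(order, pi, rng):
    """Systematic sample along a 1-d ordering with unequal probabilities.

    order: unit ids, already arranged in the 1-d (e.g. GRTS-style) order.
    pi:    dict of unit id -> inclusion probability, each in (0, 1].
    Unit u occupies an interval of length pi[u] on the line; a unit is
    selected when one of the points start, start + 1, start + 2, ... lands
    in its interval, so its inclusion probability is exactly pi[u].
    """
    n_expected = sum(pi[u] for u in order)
    start = rng.random()  # uniform random start in [0, 1)
    targets = [start + k for k in range(int(n_expected) + 1)]
    sample, cum, t = [], 0.0, 0
    for u in order:
        cum += pi[u]  # right end of unit u's interval
        # pi[u] <= 1 and targets are 1 apart, so at most one target hits u
        while t < len(targets) and targets[t] < cum:
            sample.append(u)
            t += 1
    return sample

# 20 units with equal probability 0.25 -> sample size 5
units = list(range(20))
sample = systematic_select(units, {u: 0.25 for u in units}, random.Random(1))
print(sample)  # one unit from each consecutive block of four
```

Because neighbours in the 1-d order are (mostly) neighbours in space, spreading the sample along the line spreads it over the study region; raising a unit's probability lengthens its interval and so raises the local sample density.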

4
(No Transcript)
5
Quadrant Recursive Function

Uses a spatial address based on recursive
partitioning to order points in 2-space
Split a quadrant into similar sub-quadrants
Split each sub-quadrant into sub-sub-quadrants
And so on, ad infinitum
Same idea also works in n-space

6
Quadrant Recursive Functions

Address follows splitting order
Number the sub-quadrants
First digit identifies sub-quadrant
Second digit identifies sub-sub-quadrant
And so on
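The addressing scheme above can be sketched in a few lines. This sketch assumes points already rescaled into the unit square [0, 1) x [0, 1) and fixes one numbering of the sub-quadrants (0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right); that fixed numbering gives the familiar Z-order (Morton) curve, one member of the quadrant-recursive family. The GRTS design is based on a random function, so a faithful version would also randomize the numbering, which this sketch does not do. The function names are hypothetical.

```python
def quadrant_address(x, y, depth):
    """Base-4 address digits of the point (x, y), with 0 <= x, y < 1.

    Assumed digit convention: 0 = lower-left, 1 = lower-right,
    2 = upper-left, 3 = upper-right sub-quadrant.
    """
    digits = []
    for _ in range(depth):
        x, y = 2 * x, 2 * y
        bx, by = int(x), int(y)   # 0/1: which half along each axis
        digits.append(2 * by + bx)
        x, y = x - bx, y - by     # rescale the chosen sub-quadrant to [0, 1)
    return digits

def to_unit_interval(digits):
    """Read the address as the base-4 fraction 0.d1 d2 d3 ... in [0, 1)."""
    return sum(d / 4 ** (k + 1) for k, d in enumerate(digits))

addr = quadrant_address(0.6, 0.3, 2)
print(addr, to_unit_interval(addr))  # [1, 2] 0.375
```

The first digit names the sub-quadrant and the second the sub-sub-quadrant, so points that share a long address prefix lie in the same small cell and map to nearby values on the unit interval.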

7
(No Transcript)
8
Applications of Recursive Partitioning

Infer inclusion probability for non-probability
sample
Balance a sample over ancillary non-spatial
variables
n-dimensional bump-hunting

9
Northeast Lakes Studies

EMAP Northeast Lakes Pilot
Probability sample of all lakes in the
northeastern US
Secchi transparency evaluated (among many
responses)
Great American Dip-in Lakes
5,000 participants in various lake monitoring
programs (nation-wide)
Volunteers were asked to evaluate Secchi
transparency in their lakes between 7/1/95 and
7/9/95 (and again in 1996)
Ancillary variable
Lake size is known for the frame
S. A. Peterson, N. S. Urquhart, and E. B.
Welch, Environmental Science and Technology 33:
1559-1565 (1999).

10
Northeast Lake Population: 21,277 lakes
11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
Cumulative Distribution of Secchi Depth: EMAP and
Dip-In Lakes
15
Prior Approaches

Overton, Young, and Overton (1993)
Use frame information to divide the population into
homogeneous subgroups
Define pseudo-probabilities by treating the
convenience sample as a stratified random sample
E.g., if subgroup i contains Ni elements, and the
convenience sample has ni units in the
subgroup, then take πi = ni / Ni
Similar to the re-weighting used in
post-stratification
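The pseudo-probability rule can be written out directly. A minimal sketch, assuming each frame unit and each convenience-sample unit carries a subgroup label; each sampled unit's response is then divided by its subgroup's πi, as in a Horvitz-Thompson-style expansion. The function names and the toy lake data are illustrative only. Note that a subgroup with no sampled units gets πi = 0 and contributes nothing, so that part of the population total is simply not represented.

```python
from collections import Counter

def pseudo_probabilities(frame_groups, sample_groups):
    """pi_i = n_i / N_i for every subgroup i found in the frame."""
    N = Counter(frame_groups)   # N_i: frame units per subgroup
    n = Counter(sample_groups)  # n_i: convenience-sample units per subgroup
    return {g: n[g] / N[g] for g in N}

def expansion_total(y, groups, pi):
    """Weighted total: each sampled unit's y divided by its pseudo-probability."""
    return sum(yi / pi[g] for yi, g in zip(y, groups))

# Toy frame: 80 small lakes and 20 large lakes (illustrative data only)
frame = ["small"] * 80 + ["large"] * 20
groups = ["small"] * 4 + ["large"] * 2
y = [1.0] * 4 + [10.0] * 2
pi = pseudo_probabilities(frame, groups)
print(pi)  # {'small': 0.05, 'large': 0.1}
print(expansion_total(y, groups, pi))
```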

16
Prior Approaches

Brus and de Gruijter (2000)
Used spatial interpolation to create frame
information
Treated the interpolated values as an auxiliary
variable and used a regression estimator to improve
the estimate of the target variable.
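As a rough illustration of this second approach, here is the textbook simple regression estimator of a population mean. It assumes the auxiliary value x (e.g., an interpolated value) is known for every population unit and treats the sample as a simple random sample. This is a generic sketch, not necessarily the exact estimator Brus and de Gruijter used; the function name is hypothetical.

```python
def regression_mean(y_sample, x_sample, x_population):
    """Simple regression estimator: ybar + b * (Xbar - xbar)."""
    n = len(y_sample)
    xbar = sum(x_sample) / n
    ybar = sum(y_sample) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(x_sample, y_sample))
    sxx = sum((x - xbar) ** 2 for x in x_sample)
    b = sxy / sxx                                 # least-squares slope
    x_pop_mean = sum(x_population) / len(x_population)
    return ybar + b * (x_pop_mean - xbar)         # multiply by N for a total

x_pop = list(range(10))
est = regression_mean([3, 5, 15], [1, 2, 7], x_pop)
print(est)  # about 10: here y = 2x + 1 exactly, so the true mean is recovered
```

The correction term b(Xbar - xbar) shifts the sample mean toward what the auxiliary variable says the population looks like, which is why a well-correlated auxiliary variable improves precision.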

17
Problem Background: Selection Functions

Selection function: describes how individuals in
a population are selected to produce a second
population
Developed for the study of natural selection and
resource utilization
Population 1 is the initial population, prior to
being subjected to some stress; Population 2 is
the survivors
Population 1 is the available resource, e.g.,
food supply; Population 2 is the consumed resource.

18
Selection Function
w(x) is a selection function for densities f1
and f2 if
19
Selection Function

Manly and McDonald (1992) suggested estimating the
selection function by
where fi are kernel density estimators.
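The estimator's formula did not survive in this transcript. A common way to estimate a selection function, assumed here, is the ratio of the two kernel density estimates, ŵ(x) = f̂2(x) / f̂1(x), which exceeds 1 where population 2 is over-represented relative to population 1. A minimal pure-Python sketch with a Gaussian kernel and a fixed, hand-picked bandwidth h (both assumptions); the function names are hypothetical.

```python
import math

def gaussian_kde(data, h):
    """Gaussian kernel density estimate with bandwidth h."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)

def selection_function(available, selected, h):
    """Estimate w(x) = f2(x) / f1(x) from two samples of a 1-d variable."""
    f1 = gaussian_kde(available, h)   # population 1 (e.g., the frame)
    f2 = gaussian_kde(selected, h)    # population 2 (e.g., the sample)
    return lambda x: f2(x) / f1(x)

w = selection_function(available=list(range(10)), selected=[7, 8, 9], h=1.0)
print(w(1.0) < 1 < w(8.0))  # True: large values are over-selected
```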

20
Inclusion Function

Informally, the (probability) inclusion function
π(s) is the probability that the unit indexed by
s is included in the sample
s may be a discrete index or a continuous
variable, e.g., xy-coordinates
The Horvitz-Thompson estimator is unbiased for the
population total
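The unbiasedness claim can be checked by brute force on a tiny population. This sketch assumes a Poisson sampling design (each unit included independently with its own probability π(s)), chosen only because every possible sample and its probability are easy to enumerate. Averaging the Horvitz-Thompson estimator, the sum of y(s)/π(s) over sampled units, across all 2^N samples returns the true total. The function names are hypothetical.

```python
from itertools import product

def ht_total(sample, y, pi):
    """Horvitz-Thompson estimator: sum of y / pi over the sampled units."""
    return sum(y[i] / pi[i] for i in sample)

def expected_ht_poisson(y, pi):
    """Exact expectation of the HT estimator under Poisson sampling."""
    expectation = 0.0
    for z in product([0, 1], repeat=len(y)):   # z[i] = 1 if unit i is sampled
        p = 1.0
        for zi, pii in zip(z, pi):
            p *= pii if zi else 1 - pii        # probability of this exact sample
        sample = [i for i, zi in enumerate(z) if zi]
        expectation += p * ht_total(sample, y, pi)
    return expectation

y = [3.0, 1.0, 4.0, 1.0, 5.0]    # true total 14
pi = [0.5, 0.2, 0.9, 0.3, 0.6]
print(round(expected_ht_poisson(y, pi), 10))  # 14.0
```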

21
We can view the inclusion function as a version
of the selection function, scaled so that
22
Apply the selection function concept using the
auxiliary variable lake area. Fit densities using
S-Plus.
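For a finite frame, one natural reading of the scaling that turns selection-function values into an inclusion function (assumed here, since the slide's equation is missing) is to make the inclusion probabilities sum to the sample size n; for any design, the sum of the πi equals the expected sample size. A minimal sketch; the function name is hypothetical.

```python
def inclusion_from_selection(w, n):
    """Scale selection-function values w_i so that the pi_i sum to n."""
    total = sum(w)
    pi = [n * wi / total for wi in w]
    if max(pi) > 1:
        # a probability cannot exceed 1; such units would need special handling
        raise ValueError("scaled inclusion probability exceeds 1")
    return pi

print(inclusion_from_selection([1, 2, 1, 4], n=2))  # [0.25, 0.5, 0.25, 1.0]
```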
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
The only other ancillary information available for the
population is location: the (x,y) coordinates
of each lake in the population. Can we do
something similar to a spatial selection
function by comparing the spatial density of the sample
and the population?
27
Try this using a spatial density
representation: first map 2-d space to
1-d, then estimate a kernel density. Use
a quadrant recursive function for the 2-d to 1-d
map.
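The two steps can be combined into a spatial analogue of the selection function. This minimal sketch assumes coordinates already rescaled to [0, 1). It computes each point's position on the line by interleaving the bits of its grid cell (a Z-order / Morton key, one quadrant-recursive map, not the randomized GRTS map), then compares Gaussian kernel densities of the sample and the population along that line. The bandwidth, depth, and function names are illustrative choices.

```python
import math

def line_position(x, y, depth=16):
    """Quadrant-recursive (Morton) position in [0, 1) of (x, y) in [0, 1)^2."""
    ix, iy = int(x * 2 ** depth), int(y * 2 ** depth)
    key = 0
    for b in range(depth - 1, -1, -1):        # coarsest split first
        digit = 2 * ((iy >> b) & 1) + ((ix >> b) & 1)
        key = 4 * key + digit                 # append one base-4 digit
    return key / 4 ** depth

def kde(points, h):
    """Gaussian kernel density estimate on the line, bandwidth h."""
    c = 1.0 / (len(points) * h * math.sqrt(2 * math.pi))
    return lambda t: c * sum(math.exp(-0.5 * ((t - p) / h) ** 2) for p in points)

def spatial_selection(population_xy, sample_xy, h=0.02):
    """Ratio of sample to population density along the 1-d image."""
    f_pop = kde([line_position(x, y) for x, y in population_xy], h)
    f_smp = kde([line_position(x, y) for x, y in sample_xy], h)
    return lambda t: f_smp(t) / f_pop(t)

print(line_position(0.6, 0.3, depth=2))  # 0.375
```

A ratio well above 1 along some stretch of the line flags a region where the sample is denser than the population, i.e. where units appear to have been over-selected.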
28
(No Transcript)
29
(No Transcript)
30
The dominant feature of the 1-d density is the periodic
valleys (corresponding to flats in the cdf). To
some extent, this is an artifact of the map to
the unit square.
31
Northeast Lake Population
32
We can eliminate the flats and scrunch up the
cdf.
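One way to remove the flats (an assumption; the slides do not say how it was done) is to push every position on the line through the population's empirical cdf. Stretches of the line that contain no lakes then collapse to single points, and the population itself becomes roughly uniform on [0, 1]. A minimal sketch using the standard `bisect` module; the function name is hypothetical.

```python
from bisect import bisect_right

def squeeze(population_t, values):
    """Map 1-d positions through the population empirical cdf F(t)."""
    pop = sorted(population_t)
    N = len(pop)
    return [bisect_right(pop, v) / N for v in values]   # F(v) = #{pop <= v} / N

pop_t = [0.1, 0.2, 0.7, 0.8]   # toy positions with an empty gap from 0.2 to 0.7
print(squeeze(pop_t, [0.1, 0.2, 0.5, 0.7, 0.8]))  # [0.25, 0.5, 0.5, 0.75, 1.0]
```

On this squeezed scale any remaining departure of the sample from uniformity reflects selection rather than the shape of the study region.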
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
A quadrant recursive function can also be used to
map ℝ³ into ℝ¹, so we can use the same technique to
get a 1-dimensional (octant recursive?)
representation of (x,y,z).
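The same recursion generalizes to any dimension d: each split of a cell produces 2^d sub-cells, so an address digit is a number from 0 to 2^d - 1 (an octant label when d = 3). A minimal sketch, assuming coordinates already rescaled to [0, 1); with d = 2 it numbers quadrants 0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right. The function names are hypothetical.

```python
def orthant_address(point, depth):
    """Base-2**d address digits of a point in the unit d-cube [0, 1)^d."""
    coords = list(point)
    digits = []
    for _ in range(depth):
        digit = 0
        for j in range(len(coords)):
            coords[j] *= 2
            bit = int(coords[j])      # which half along axis j
            coords[j] -= bit          # rescale to the chosen half
            digit |= bit << j         # axis j sets bit j of the digit
        digits.append(digit)
    return digits

def to_unit_interval(digits, d):
    """Read the digits as a base-2**d fraction in [0, 1)."""
    base = 2 ** d
    return sum(dig / base ** (k + 1) for k, dig in enumerate(digits))

addr = orthant_address((0.9, 0.1, 0.6), depth=1)
print(addr, to_unit_interval(addr, 3))  # [5] 0.625
```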
39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
Balancing Sample over Ancillary Variables

Basic idea
Want to estimate YT, the total of Y
Have a known variable X (paired with Y)
Believe X is (highly) correlated with Y
Balanced sample over X should (might?) give a
more precise estimate of YT
Usual approach is to stratify on X
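The stratification idea can be checked with a small simulation. This sketch assumes a simple model of its own (Y standard normal, X = aY + (1 - a)·noise with a = 0.7, N = 1000, n = 50), forms n equal-size strata by sorting on X, and takes one unit per stratum. It then compares the Monte Carlo variance of the estimated total with simple random sampling. All settings and function names are illustrative, not the presentation's.

```python
import random
import statistics

def srs_total(y, n, rng):
    """Expansion estimate N * ybar from a simple random sample."""
    return len(y) * sum(rng.sample(y, n)) / n

def stratified_total(y_sorted_by_x, n, rng):
    """One unit from each of n equal-size strata formed by sorting on X.

    Assumes N is a multiple of n, so every stratum has m = N // n units.
    """
    m = len(y_sorted_by_x) // n
    return sum(m * y_sorted_by_x[h * m + rng.randrange(m)] for h in range(n))

rng = random.Random(2)
N, n, a = 1000, 50, 0.7
y = [rng.gauss(0, 1) for _ in range(N)]
x = [a * yi + (1 - a) * rng.gauss(0, 1) for yi in y]
y_by_x = [yi for _, yi in sorted(zip(x, y))]   # population ordered by X

reps = 2000
v_srs = statistics.variance([srs_total(y, n, rng) for _ in range(reps)])
v_str = statistics.variance([stratified_total(y_by_x, n, rng) for _ in range(reps)])
print(f"variance ratio SRS / stratified: {v_srs / v_str:.1f}")
```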

44
GRTS Multivariate Balanced Sampling: Example

Suppose X1, ..., X4 are known, that is, their values
are known for all population units.
For example, the Xi may be remotely-sensed
variables, e.g., a greenness index.
Further, we believe the Xi are correlated with
the target response Y

45
Xi = aY + (1-a)N(0,1)
46
(No Transcript)
47
Balanced Sample Simulation

5000 samples of size 50
Scenario 1: balanced over X1, X2, and X3
Scenario 2: balanced over X2, X3, and X4
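A rough sketch of how a sample balanced over several X's might be drawn with recursive partitioning (an assumption about the mechanics; the slides give only the scenarios): rank-scale each X to [0, 1), order the population by the recursive-partition (Morton-style) key of the point (X1, X2, X3), and take an equal-probability systematic sample along that order. The data follow the model Xi = aY + (1 - a)N(0, 1) with an assumed a = 0.7 and standard normal Y; population size, repetitions, seed, and function names are illustrative, and the printed ratio is not the presentation's result.

```python
import random
import statistics

def orthant_key(point, depth=10):
    """Recursive-partition key of a point in [0, 1)^d (one base-2**d digit per level)."""
    coords, key = list(point), 0
    for _ in range(depth):
        digit = 0
        for j in range(len(coords)):
            coords[j] *= 2
            bit = int(coords[j])
            coords[j] -= bit
            digit |= bit << j
        key = (key << len(coords)) | digit
    return key

def rank_scale(values):
    """Replace each value by rank / N, so the variable lies in [0, 1)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    scaled = [0.0] * len(values)
    for rank, i in enumerate(order):
        scaled[i] = rank / len(values)
    return scaled

def systematic_total(y_in_order, n, rng):
    """Equal-probability systematic sample (random start) along the order."""
    N = len(y_in_order)
    step = N / n
    start = rng.random() * step
    # min() guards against floating-point rounding up to index N
    picks = [y_in_order[min(N - 1, int(start + k * step))] for k in range(n)]
    return N * sum(picks) / n

rng = random.Random(7)
N, n, a = 1000, 50, 0.7
y = [rng.gauss(0, 1) for _ in range(N)]
xs = [[a * yi + (1 - a) * rng.gauss(0, 1) for yi in y] for _ in range(4)]

# Scenario 1: balance over X1, X2, X3
points = zip(*(rank_scale(x) for x in xs[:3]))
keys = [orthant_key(p) for p in points]
y_in_order = [yi for _, yi in sorted(zip(keys, y))]

reps = 1000
v_srs = statistics.variance([N * sum(rng.sample(y, n)) / n for _ in range(reps)])
v_bal = statistics.variance([systematic_total(y_in_order, n, rng) for _ in range(reps)])
print(f"variance ratio SRS / balanced: {v_srs / v_bal:.1f}")
```

Scenario 2 would use xs[1:4] in place of xs[:3]; the ordering step is what spreads the sample evenly over the joint distribution of the chosen X's.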

48
Variance Ratios: SRS vs. Balanced Sample
49
Pattern Recognition in n-space

Use recursive partition maps of ℝⁿ to ℝ¹ to
identify structure
Example: compare densities of (Y, X1, X2), (Y,
X2, X3), and (Y, X3, X4)
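A crude version of this bump-hunting idea, assuming all variables are already rescaled to [0, 1): at partition depth k, each recursive cell is identified by the integer part of 2^k times each coordinate, so counting points per cell and listing the fullest cells points to regions of high density. Running the count separately on (Y, X1, X2), (Y, X2, X3), and (Y, X3, X4) is one way such a comparison could be made. The function name and toy data are hypothetical.

```python
from collections import Counter

def densest_cells(points, depth, top=3):
    """Count points in each depth-level recursive cell of [0, 1)^d.

    A cell at depth k is labeled by (floor(2**k * v1), ..., floor(2**k * vd)),
    which is exactly the sub-cube reached after k rounds of halving each axis.
    """
    scale = 2 ** depth
    counts = Counter(tuple(int(v * scale) for v in p) for p in points)
    return counts.most_common(top)

# Toy data: a tight cluster near (0.1, 0.1, 0.1) plus a few scattered points
pts = [(0.1 + 0.01 * i, 0.1, 0.12) for i in range(10)]
pts += [(0.9, 0.2, 0.5), (0.4, 0.7, 0.3), (0.6, 0.6, 0.9)]
print(densest_cells(pts, depth=2, top=1))  # [((0, 0, 0), 10)]
```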

50
(No Transcript)
51
(No Transcript)
52
Conclusions

The selection function estimate of the inclusion
function did remove some bias, but is still missing
a factor
Balancing by recursive partitioning shows
promise for incorporating ancillary information
into the sample
RP may also lead to a method for locating regions
of high density in n-space
