Title: Some Extensions of the Generalized Random Tessellation Stratified (GRTS) Sampling Methodology
1Some Extensions of the Generalized Random
Tessellation Stratified (GRTS) Sampling
Methodology
- Don L. Stevens, Jr., OSU
- Anthony R. Olsen, USEPA, WED
2This presentation was developed under STAR
Research Assistance Agreement No. CR82-9096-01
awarded by the U.S. Environmental Protection
Agency to Oregon State University. It has not
been formally reviewed by EPA. The views
expressed in this document are solely those of
the author and EPA does not endorse any products
or commercial services mentioned in this
presentation.
3Generalized Random-tessellation Stratified (GRTS)
Design
- Design is based on a random function that maps
the unit square into the unit interval.
- The random function is constructed so that it is
1-1 and preserves some 2-dimensional proximity
relationships in the 1-dimensional image.
- Accommodates variable sample point density,
sample augmentation, and spatially-structured
temporal samples.
5Quadrant Recursive Function
- Uses a spatial address based on recursive
partitioning to order points in 2-space
- Split a quadrant into similar sub-quadrants
- Split each sub-quadrant into sub-sub-quadrants
- And so on, ad infinitum
- Same idea also works in n-space
6Quadrant Recursive Functions
- Address follows splitting order
- Number the sub-quadrants
- First digit identifies sub-quadrant
- Second digit identifies sub-sub-quadrant
- And so on
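The addressing scheme above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `quadrant_address` and `address_to_unit_interval` and the numbering of the sub-quadrants (0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right) are assumptions made here. The sketch also uses a fixed numbering, whereas the GRTS map is a random function.

```python
def quadrant_address(x, y, levels=8):
    """Return the base-4 address digits of a point in the unit square.

    The sub-quadrant numbering (2 * upper_half + right_half) is an
    assumption of this sketch, not taken from the slides.
    """
    if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
        raise ValueError("point must lie in [0, 1) x [0, 1)")
    digits = []
    for _ in range(levels):
        x *= 2.0
        y *= 2.0
        ix, iy = int(x), int(y)      # 0 or 1: which half along each axis
        digits.append(2 * iy + ix)   # digit for this level of splitting
        x -= ix                      # zoom into the chosen sub-quadrant
        y -= iy
    return digits


def address_to_unit_interval(digits):
    """Read the digits as the base-4 fraction 0.d1 d2 d3 ... in [0, 1)."""
    value = 0.0
    scale = 0.25
    for d in digits:
        value += d * scale
        scale *= 0.25
    return value


if __name__ == "__main__":
    digits = quadrant_address(0.75, 0.25, levels=2)
    print(digits, address_to_unit_interval(digits))
```

Points that share their leading digits fall in the same sub-quadrant, which is how some 2-dimensional proximity carries over to the 1-dimensional image.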
8Applications of Recursive Partitioning
- Infer inclusion probability for non-probability
sample
- Balance a sample over ancillary non-spatial
variables
- n-dimensional bump-hunting
9Northeast Lakes Studies
- EMAP Northeast Lakes Pilot
- Probability sample of all lakes in the
northeastern US
- Secchi transparency evaluated (among many
responses)
- Great American Dip-in Lakes
- 5,000 participants in various lake monitoring
programs (nation-wide)
- Volunteers were asked to evaluate Secchi
transparency in their lakes between 7/1/95 and
7/9/95 (and again in 1996)
- Ancillary variable
- Lake size is known for the frame
- S. A. Peterson, N. S. Urquhart, and E. B.
Welsh, Environmental Science and Technology 33:
1559-1565 (1999)
10Northeast Lake Population: 21,277 lakes
14Cumulative Distribution of Secchi Depth: EMAP and
Dip-in Lakes
15Prior Approaches
- Overton, Young & Overton (1993)
- Use frame information to divide population into
homogeneous subgroups
- Define pseudo-probabilities by treating
convenience sample as stratified random
- E.g., if subgroup i contains Ni elements, and
the convenience sample has ni units in the
subgroup, then take pi = ni / Ni
- Similar to re-weighting used in
post-stratification
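As a small illustration of the pseudo-probability idea, the sketch below computes pi = ni / Ni for each subgroup. The function name `pseudo_probabilities` and the subgroup labels in the usage are hypothetical; how the homogeneous subgroups are formed from frame information is outside this sketch.

```python
from collections import Counter


def pseudo_probabilities(frame_groups, sample_groups):
    """Return {subgroup: n_i / N_i}, treating the sample as stratified random.

    frame_groups  -- subgroup label of every unit in the frame
    sample_groups -- subgroup label of every unit in the convenience sample
    """
    frame_counts = Counter(frame_groups)     # N_i
    sample_counts = Counter(sample_groups)   # n_i
    unknown = set(sample_counts) - set(frame_counts)
    if unknown:
        raise ValueError(f"sample subgroups missing from frame: {sorted(unknown)}")
    return {g: sample_counts[g] / frame_counts[g] for g in frame_counts}


if __name__ == "__main__":
    # Made-up frame: 6 small lakes and 4 large lakes.
    frame = ["small"] * 6 + ["large"] * 4
    sample = ["small", "large", "large"]
    print(pseudo_probabilities(frame, sample))
```

A subgroup with no sampled units gets a pseudo-probability of zero, which signals that the convenience sample says nothing about that part of the population.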
16Prior Approaches
- Brus & De Gruijter (2000)
- Used spatial interpolation to create frame
information
- Treated the interpolated values as an auxiliary
variable; used regression estimator to improve
estimate of target variable.
17Problem Background: Selection Functions
- Selection function: Describes how individuals in
a population are selected to produce a second
population
- Developed for study of natural selection and
resource utilization
- Population 1 is initial population, prior to
being subjected to some stress; Population 2 is
the survivors
- Population 1 is the available resource, e.g.,
food supply; Population 2 is consumed resource.
18Selection Function
w(x) is a selection function for densities f1
and f2 if
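The condition itself does not appear in this transcript. One common way to state it, assumed here rather than taken from the slide, is that f2 is f1 reweighted by w and renormalized:

```latex
f_2(x) \;=\; \frac{w(x)\, f_1(x)}{\int w(u)\, f_1(u)\, du}
```

Under this form w is only determined up to a positive constant, since multiplying w by a constant leaves f2 unchanged.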
19Selection Function
- Manley & MacDonald (1992) suggested estimating
selection function by
- where fi are kernel density estimators.
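A natural reading of this estimator, assumed here because the formula is not in the transcript, is the ratio of the two kernel density estimates, f2 over f1. The sketch below uses a hand-written Gaussian kernel with a single fixed bandwidth; the function names and the bandwidth choice are illustrative only (the slides mention fitting densities in S-Plus).

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x."""
    if bandwidth <= 0.0 or not data:
        raise ValueError("need data and a positive bandwidth")
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data
        )

    return density


def selection_function_estimate(population1, population2, bandwidth):
    """Return x -> f2_hat(x) / f1_hat(x), the assumed ratio estimator."""
    f1 = gaussian_kde(population1, bandwidth)
    f2 = gaussian_kde(population2, bandwidth)

    def w_hat(x):
        denominator = f1(x)
        if denominator == 0.0:
            raise ZeroDivisionError("population 1 has no estimated density here")
        return f2(x) / denominator

    return w_hat


if __name__ == "__main__":
    # Made-up values: population 2 over-represents the larger x.
    w_hat = selection_function_estimate([0, 1, 2, 3, 4], [2, 3, 4], bandwidth=1.0)
    print(w_hat(0.0), w_hat(4.0))
```

The ratio is unstable where the population 1 density is close to zero, so in practice it would be evaluated only over the range where population 1 is well represented.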
20Inclusion Function
- Informally, the (probability) inclusion function
p(s) is the probability that the unit indexed by
s is included in the sample
- s may be a discrete index or a continuous
variable, e.g., xy-coordinates
- Horvitz-Thompson estimator is unbiased for
population total
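For a discrete index, the Horvitz-Thompson estimate of the total is the sum over sampled units of y(s) / p(s). A minimal sketch follows; the function name and the numbers in the usage are made up for illustration.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y(s) / p(s) over the sample."""
    if len(values) != len(inclusion_probs):
        raise ValueError("need one inclusion probability per sampled value")
    total = 0.0
    for y, p in zip(values, inclusion_probs):
        if not 0.0 < p <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / p   # each sampled unit stands in for 1/p population units
    return total


if __name__ == "__main__":
    # Two sampled units with responses 2.0 and 4.0.
    print(horvitz_thompson_total([2.0, 4.0], [0.5, 0.25]))
```

Unbiasedness depends on p(s) being the true inclusion probability and positive for every unit in the population, which is the point of trying to infer p(s) for a non-probability sample.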
21We can view the inclusion function as a version
of the selection function, scaled so that
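The scaling condition is not in the transcript. A standard choice, assumed here, uses the fact that for a sample of fixed size n the inclusion probabilities sum to n over the population U. Writing x(s) for the auxiliary value of unit s (notation introduced for this sketch):

```latex
p(s) \;=\; c\, w\bigl(x(s)\bigr),
\qquad
\sum_{s \in U} p(s) \;=\; n
\quad\Longrightarrow\quad
c \;=\; \frac{n}{\sum_{s \in U} w\bigl(x(s)\bigr)}
```

For a continuous index the sum would be replaced by an integral over the domain.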
22Apply the selection function concept using the
auxiliary variable lake area. Fit densities using
S-Plus.
26Only other ancillary information available for
population is location: the (x,y) coordinates
of each lake in the population. Can we do
something similar to a spatial selection
function by comparing spatial density of sample
and population?
27Try this, using a spatial density
representation by first mapping 2-d space to
1-d, and then estimating a kernel density. Use
a quadrant recursive function for the 2-d to 1-d
map
30Dominant feature of 1-d density is the periodic
valleys (corresponding to flats in the cdf). To
some extent, this is an artifact of the map to
the unit square.
31Northeast Lake Population
32We can eliminate the flats, and scrunch up the
cdf
38A quadrant recursive function can also be used to
map R3 into R1, so we can use the same technique to
get a 1-dimensional (octant recursive?)
representation of (x,y,z)
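The same splitting rule extends to any number of dimensions: each level halves every axis, giving 2^k sub-cells and one base-2^k digit. The sketch below is a generic version under assumptions made here (the function names, the bit ordering within a digit, and a fixed rather than randomized numbering).

```python
def recursive_address(point, levels=8):
    """Return base-2**k address digits of a point in the unit k-cube."""
    coords = list(point)
    if any(not 0.0 <= c < 1.0 for c in coords):
        raise ValueError("every coordinate must lie in [0, 1)")
    digits = []
    for _ in range(levels):
        digit = 0
        for axis in range(len(coords)):
            coords[axis] *= 2.0
            bit = int(coords[axis])     # upper (1) or lower (0) half on this axis
            coords[axis] -= bit         # zoom into the chosen half
            digit |= bit << axis        # axis 0 is the lowest bit (an assumption)
        digits.append(digit)
    return digits


def to_unit_interval(digits, dimension):
    """Read the digits as a base-2**dimension fraction in [0, 1)."""
    base = 2 ** dimension
    value = 0.0
    scale = 1.0 / base
    for d in digits:
        value += d * scale
        scale /= base
    return value


if __name__ == "__main__":
    digits = recursive_address((0.75, 0.25, 0.5), levels=2)
    print(digits, to_unit_interval(digits, 3))
```

With three coordinates each digit runs from 0 to 7, one value per octant, which is the sense in which the map could be called octant recursive.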
43Balancing Sample over Ancillary Variables
- Basic idea
- Want to estimate YT, the total of Y
- Have a known variable X (paired with Y)
- Believe X is (highly) correlated with Y
- Balanced sample over X should (might?) give a
more precise estimate of YT
- Usual approach is to stratify on X
44GRTS Multivariate Balanced Sampling Example
- Suppose X1, ..., X4 known, that is, their values
are known for all population units.
- For example, the Xi may be remotely-sensed
variables, e.g., a greenness index.
- Further, we believe the Xi are correlated with
the target response Y
45Xi = aY + (1-a)N(0,1)
47Balanced Sample Simulation
- 5000 samples of size 50
- Scenario 1: Balanced over X1, X2, and X3
- Scenario 2: Balanced over X2, X3, and X4
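A rough sketch of how such a simulation could be set up is given below. It is not the authors' code, and several pieces are assumptions: the population size, the coefficient values, letting the coefficient differ by variable, and using a systematic sample along the recursive-partition ordering as a simple stand-in for GRTS selection. The ratio returned is the SRS variance divided by the balanced-sample variance for the sample mean of Y.

```python
import random
import statistics


def make_population(size, a_values, rng):
    """Draw Y ~ N(0,1) and X_i = a_i*Y + (1 - a_i)*N(0,1) for each unit."""
    population = []
    for _ in range(size):
        y = rng.gauss(0.0, 1.0)
        xs = [a * y + (1.0 - a) * rng.gauss(0.0, 1.0) for a in a_values]
        population.append((y, xs))
    return population


def rescale(column):
    """Map a column linearly into [0, 1) so it can be addressed."""
    lo, hi = min(column), max(column)
    span = (hi - lo) * (1.0 + 1e-9)   # keeps the maximum strictly below 1
    return [(v - lo) / span for v in column]


def address(coords, levels=10):
    """Interleave the leading binary digits of the coordinates into one integer."""
    coords = list(coords)
    value = 0
    for _ in range(levels):
        for axis in range(len(coords)):
            coords[axis] *= 2.0
            bit = int(coords[axis])
            coords[axis] -= bit
            value = (value << 1) | bit
    return value


def balanced_order(population, axes):
    """Order unit indices by the address of the chosen X variables."""
    columns = [rescale([xs[j] for _, xs in population]) for j in axes]
    keys = [address([col[i] for col in columns]) for i in range(len(population))]
    return sorted(range(len(population)), key=keys.__getitem__)


def systematic_sample(order, n, rng):
    """Equal-probability systematic sample along the ordering, random start."""
    step = len(order) / n
    start = rng.random() * step
    last = len(order) - 1
    return [order[min(int(start + k * step), last)] for k in range(n)]


def variance_ratio(population, axes, n=50, reps=5000, seed=1):
    """Variance of the SRS mean divided by variance of the balanced mean."""
    rng = random.Random(seed)
    ys = [y for y, _ in population]
    order = balanced_order(population, axes)
    srs_means, bal_means = [], []
    for _ in range(reps):
        srs = rng.sample(range(len(ys)), n)
        bal = systematic_sample(order, n, rng)
        srs_means.append(sum(ys[i] for i in srs) / n)
        bal_means.append(sum(ys[i] for i in bal) / n)
    return statistics.pvariance(srs_means) / statistics.pvariance(bal_means)


if __name__ == "__main__":
    # Hypothetical population size and coefficients.
    pop = make_population(1000, (0.9, 0.8, 0.7, 0.6), random.Random(0))
    print("Scenario 1:", variance_ratio(pop, axes=(0, 1, 2)))
    print("Scenario 2:", variance_ratio(pop, axes=(1, 2, 3)))
```

A ratio above 1 means the balanced sample gave the more precise estimate in that run. Because the systematic draw here has only a limited number of distinct samples, the numbers are illustrative rather than a reproduction of the results on the slides.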
48Variance Ratios: SRS vs Balanced Sample
49Pattern Recognition in n-space
- Use recursive partition maps of Rn to R1 to
identify structure
- Example: Compare densities of (Y, X1, X2), (Y,
X2, X3), and (Y, X3, X4)
52Conclusions
- Selection function estimate of the inclusion
function did remove some bias, but still missing
a factor
- Recursive partitioning balancing does have
promise for incorporating ancillary information
into sample
- RP may also lead to a method for locating regions
of high density in n-space