Some Extensions of the Generalized Random Tessellation Stratified GRTS Sampling Methodology - PowerPoint PPT Presentation


Slides: 53
Provided by: Stev217

Transcript and Presenter's Notes



1
Some Extensions of the Generalized Random
Tessellation Stratified (GRTS) Sampling
Methodology
  • Don L. Stevens, Jr., OSU
  • Anthony R. Olsen, USEPA, WED

2
This presentation was developed under STAR
Research Assistance Agreement No. CR82-9096-01
awarded by the U.S. Environmental Protection
Agency to Oregon State University. It has not
been formally reviewed by EPA. The views
expressed in this document are solely those of
the author and EPA does not endorse any products
or commercial services mentioned in this
presentation.
3
Generalized Random-tessellation Stratified (GRTS)
Design
  • Design is based on a random function that maps
    the unit square into the unit interval.
  • The random function is constructed so that it is
    1-1 (one-to-one) and preserves some 2-dimensional proximity
    relationships in the 1-dimensional image.
  • Accommodates variable sample point density,
    sample augmentation, and spatially-structured
    temporal samples.
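A simplified sketch of these ideas, assuming points already rescaled to the unit square. Each cell at each level gets its own random relabeling of its four sub-quadrants (one way to build a random map that keeps nearby points close on the line). Units are then ordered by their 1-d image, and a systematic sample with one random start is laid along the cumulated inclusion probabilities, which is how variable point density enters. The function names and the `levels` cutoff are hypothetical; this is not the authors' implementation (the R package spsurvey provides a full GRTS implementation).

```python
import random

def random_qr_images(points, levels, rng):
    """Map points in [0,1)^2 into [0,1) with a randomized quadrant-recursive address.

    Every cell draws its own random relabeling of its four sub-quadrants, so
    the map is random, yet points sharing an address prefix (same cell) stay
    close together on the line. Points in the same finest cell tie.
    """
    perms = {}  # (level, cell_x, cell_y) -> random order of labels 0..3
    images = []
    for x, y in points:
        value, scale, cx, cy = 0.0, 1.0, 0, 0
        for level in range(levels):
            bx = int(x * 2 ** (level + 1)) % 2  # left/right half of current cell
            by = int(y * 2 ** (level + 1)) % 2  # bottom/top half of current cell
            key = (level, cx, cy)
            if key not in perms:
                perms[key] = rng.sample(range(4), 4)
            scale /= 4
            value += perms[key][2 * by + bx] * scale
            cx, cy = 2 * cx + bx, 2 * cy + by
        images.append(value)
    return images

def systematic_sample(images, probs, rng):
    """Order units along the line and draw a systematic sample.

    Each unit occupies an interval of length probs[i] (each <= 1, summing to
    the sample size n); one random start plus unit steps picks n units.
    """
    order = sorted(range(len(images)), key=lambda i: images[i])
    n = round(sum(probs))
    targets = [rng.random() + k for k in range(n)]
    chosen, cum, t = [], 0.0, 0
    for i in order:
        lo, cum = cum, cum + probs[i]
        while t < n and lo <= targets[t] < cum:
            chosen.append(i)
            t += 1
    return chosen

rng = random.Random(7)
pts = [(rng.random(), rng.random()) for _ in range(100)]
imgs = random_qr_images(pts, levels=5, rng=rng)
sample = systematic_sample(imgs, [0.1] * 100, rng)  # equal probabilities, n = 10
print(sorted(sample))
```

Because consecutive units on the line tend to be spatial neighbours, one pick per unit-length step spreads the sample over space.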

4
(No Transcript)
5
Quadrant Recursive Function
  • Uses a spatial address based on recursive
    partitioning to order points in 2-space
  • Split a quadrant into similar sub-quadrants
  • Split each sub-quadrant into sub-sub-quadrants
  • And so on, ad infinitum
  • Same idea also works in n-space

6
Quadrant Recursive Functions
  • Address follows splitting order
  • Number the sub-quadrants
  • First digit identifies sub-quadrant
  • Second digit identifies sub-sub-quadrant
  • And so on
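A minimal sketch of the addressing rule, for a point in the unit cube of any dimension d. The particular numbering of sub-cells used here (bit j of each digit says lower or upper half along axis j, i.e. bit interleaving, often called Morton or Z-order) is an assumption; the figures on the slides may number the sub-quadrants differently. The same code gives quadrant digits (0..3) in 2-d and octant digits (0..7) in 3-d.

```python
def qr_address(point, levels):
    """Quadrant-recursive address of a point in [0,1)^d, one digit per split.

    Each digit (base 2**d) names the sub-cell chosen at that split: 0..3
    (quadrants) in 2-d, 0..7 (octants) in 3-d. Bit j of a digit says whether
    the point lies in the lower (0) or upper (1) half along axis j.
    """
    digits = []
    for level in range(1, levels + 1):
        digit = 0
        for j, coord in enumerate(point):
            digit |= (int(coord * 2 ** level) % 2) << j
        digits.append(digit)
    return digits

def address_to_unit(digits, d):
    """Read an address as a base-(2**d) fraction, i.e. a point of [0,1)."""
    base = 2 ** d
    return sum(digit / base ** (k + 1) for k, digit in enumerate(digits))

print(qr_address((0.3, 0.8), levels=2))       # [2, 3]
print(address_to_unit([2, 3], d=2))           # 0.6875
print(qr_address((0.9, 0.1, 0.6), levels=1))  # [5] (an octant digit)
```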

7
(No Transcript)
8
Applications of Recursive Partitioning
  • Infer inclusion probabilities for a non-probability
    sample
  • Balance a sample over ancillary non-spatial
    variables
  • n-dimensional bump-hunting

9
Northeast Lakes Studies
  • EMAP Northeast Lakes Pilot
  • Probability sample of all lakes in the
    northeastern US
  • Secchi transparency evaluated (among many
    responses)
  • Great American Dip-in Lakes
  • 5,000 participants in various lake monitoring
    programs (nation-wide)
  • Volunteers were asked to evaluate Secchi
    transparency in their lakes between 7/1/95 and
    7/9/95 (and again in 1996)
  • Ancillary variable
  • Lake size is known for the frame
  • S. A. Peterson, N. S. Urquhart, and E. B.
    Welch, Environmental Science and Technology 33:
    1559-1565 (1999)

10
Northeast Lake Population: 21,277 lakes
11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
Cumulative Distribution of Secchi Depth: EMAP and
Dip-In Lakes
15
Prior Approaches
  • Overton, Young, and Overton (1993)
  • Use frame information to divide population into
    homogeneous subgroups
  • Define pseudo-probabilities by treating
    convenience sample as stratified random
  • E.g., if subgroup i contains N_i elements and the
    convenience sample has n_i units in the
    subgroup, then take $\pi_i = n_i / N_i$
  • Similar to re-weighting used in
    post-stratification
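A small sketch of the pseudo-probability calculation. The subgroup labels, counts, and values are hypothetical; weighting each sampled unit by 1/π_i mirrors the post-stratification analogy above.

```python
from collections import Counter

def pseudo_probabilities(frame_groups, sample_groups):
    """pi_i = n_i / N_i: treat the convenience sample as stratified random."""
    N = Counter(frame_groups)   # frame units per subgroup
    n = Counter(sample_groups)  # convenience-sample units per subgroup
    return {g: n[g] / N[g] for g in N}

def weighted_total(values, groups, pi):
    """Estimate the population total by weighting each unit by 1 / pi_i."""
    return sum(y / pi[g] for y, g in zip(values, groups))

frame = ["small"] * 80 + ["large"] * 20
sample_groups = ["small"] * 4 + ["large"] * 6
pi = pseudo_probabilities(frame, sample_groups)
print(pi)  # {'small': 0.05, 'large': 0.3}
total = weighted_total([1.0] * 4 + [2.0] * 6, sample_groups, pi)
print(total)
```

Subgroups that the convenience sample misses get π_i = 0, which is exactly where this approach cannot recover the population.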

16
Prior Approaches
  • Brus and De Gruijter (2000)
  • Used spatial interpolation to create frame
    information
  • Treated the interpolated values as an auxiliary
    variable and used a regression estimator to improve
    the estimate of the target variable.
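For reference, a sketch of the textbook simple regression estimator of a population mean, assuming an equal-probability sample and a known population mean of the auxiliary variable (here, the interpolated values). Brus and De Gruijter's exact estimator may differ; the function name is illustrative.

```python
from statistics import mean

def regression_estimate_mean(y_sample, x_sample, x_population_mean):
    """Simple linear regression estimator: y_bar + b * (X_bar - x_bar)."""
    xb, yb = mean(x_sample), mean(y_sample)
    sxy = sum((x - xb) * (y - yb) for x, y in zip(x_sample, y_sample))
    sxx = sum((x - xb) ** 2 for x in x_sample)
    b = sxy / sxx  # least-squares slope of y on x in the sample
    return yb + b * (x_population_mean - xb)

print(regression_estimate_mean([3.0, 5.0, 7.0], [1.0, 2.0, 3.0], 4.0))  # 9.0
```

The correction term b(X̄ - x̄) shifts the sample mean of y by however much the sample's auxiliary mean misses the known population value.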

17
Problem Background: Selection Functions
  • Selection function: describes how individuals in
    a population are selected to produce a second
    population
  • Developed for the study of natural selection and
    resource utilization
  • Population 1 is the initial population, prior to
    being subjected to some stress; Population 2 is
    the survivors
  • Population 1 is the available resource, e.g.,
    food supply; Population 2 is the consumed resource

18
Selection Function
w(x) is a selection function for densities f1
and f2 if
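The defining equation did not survive in the transcript. A standard way to state the condition (an assumption here, following the usual weighted-distribution form) is that the density of population 2 is the density of population 1 reweighted by w and renormalized:

```latex
f_2(x) = \frac{w(x)\, f_1(x)}{\int w(u)\, f_1(u)\, du},
\qquad\text{so that}\qquad
w(x) \propto \frac{f_2(x)}{f_1(x)}.
```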
19
Selection Function
  • Manley and MacDonald (1992) suggested estimating the
    selection function by
  • where the f_i are kernel density estimators.
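The estimator's formula is not in the transcript; this sketch assumes it is the ratio of the two kernel density estimates, w-hat(x) = f2-hat(x) / f1-hat(x), for a one-dimensional variable. The Gaussian kernel, the single fixed bandwidth, and the example values are assumptions, and the ratio becomes unstable wherever f1-hat is close to zero.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return density

def selection_function_estimate(pop1, pop2, bandwidth):
    """Estimate w(x) = f2(x) / f1(x) by a ratio of kernel density estimates."""
    f1 = gaussian_kde(pop1, bandwidth)  # e.g. available resource / frame
    f2 = gaussian_kde(pop2, bandwidth)  # e.g. consumed resource / sample
    return lambda x: f2(x) / f1(x)

frame_values = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]  # hypothetical data
sample_values = [2.5, 3.0, 3.5, 4.0]
w = selection_function_estimate(frame_values, sample_values, bandwidth=0.5)
print(w(3.5) > w(1.0))  # True: larger values were "selected" more often
```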

20
Inclusion Function
  • Informally, the (probability) inclusion function
    π(s) is the probability that the unit indexed by
    s is included in the sample
  • s may be a discrete index or a continuous
    variable, e.g., xy-coordinates
  • The Horvitz-Thompson estimator is unbiased for the
    population total
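The Horvitz-Thompson estimator of the total is the sum over sampled units of y(s)/π(s). A small sketch, with a brute-force check under simple random sampling without replacement (where every π = n/N) that the average over all possible samples equals the true total; the toy population values are made up.

```python
from itertools import combinations

def horvitz_thompson_total(y, pi):
    """Estimate the population total as the sum of y_i / pi_i over the sample."""
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yi / p for yi, p in zip(y, pi))

population = [3.0, 5.0, 8.0, 4.0]
n = 2
pi = n / len(population)  # SRS without replacement: every unit has pi = n/N
estimates = [horvitz_thompson_total([population[i] for i in s], [pi] * n)
             for s in combinations(range(len(population)), n)]
print(sum(estimates) / len(estimates), sum(population))  # 20.0 20.0
```

The check works because each unit appears in exactly the fraction π of all possible samples, so the 1/π weights cancel on average.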

21
We can view the inclusion function as a version
of the selection function, scaled so that
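The scaling condition is also missing from the transcript. One natural choice (assumed here) rescales the selection function w so that the inclusion probabilities of the N population units sum to the sample size n, as they must for a fixed-size design:

```latex
\pi(x_i) = \frac{n\, w(x_i)}{\sum_{j=1}^{N} w(x_j)},
\qquad
\sum_{i=1}^{N} \pi(x_i) = n.
```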
22
Apply the selection function concept using the
auxiliary variable lake area. Fit densities using
S-Plus.
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
The only other ancillary information available for
the population is location: the (x, y) coordinates
of each lake in the population. Can we do
something similar, i.e., build a spatial selection
function by comparing the spatial densities of the
sample and the population?
27
Try this, using a spatial density
representation: first map 2-d space to
1-d, and then estimate a kernel density. Use
a quadrant recursive function for the 2-d to 1-d
map.
28
(No Transcript)
29
(No Transcript)
30
The dominant feature of the 1-d density is the periodic
valleys (corresponding to flats in the cdf). To
some extent, this is an artifact of the map to
the unit square.
31
Northeast Lake Population
32
We can eliminate the flats and scrunch up the
cdf.
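One simple way to eliminate the flats, offered as an assumption since the slides do not show the authors' procedure: fix a partition level, keep only the cells that contain at least one population unit, and renumber those cells consecutively so that empty cells no longer occupy any length on the 1-d axis. The input `cell_ids` (each unit's cell index at that level) is hypothetical.

```python
def remove_flats(cell_ids):
    """Map each unit to the midpoint of its cell on a compressed [0,1) axis.

    Only occupied cells are kept, renumbered 0..m-1 in address order, so the
    population cdf has no flat stretches left by empty cells.
    """
    occupied = sorted(set(cell_ids))
    rank = {cell: k for k, cell in enumerate(occupied)}
    m = len(occupied)
    return [(rank[cell] + 0.5) / m for cell in cell_ids]

compressed = remove_flats([0, 0, 5, 9, 9, 9])  # cells 1-4 and 6-8 are empty
print(compressed)
```

Address order is preserved, so neighbouring occupied cells remain neighbours; only the gaps between them disappear.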
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
A quadrant recursive function can also be used to
map R^3 into R^1, so we can use the same technique to
get a 1-dimensional (octant recursive?)
representation of (x, y, z).
39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
Balancing a Sample over Ancillary Variables
  • Basic idea
  • Want to estimate Y_T, the total of Y
  • Have a known variable X (paired with Y)
  • Believe X is (highly) correlated with Y
  • Balanced sample over X should (might?) give a
    more precise estimate of YT
  • Usual approach is to stratify on X

44
GRTS Multivariate Balanced Sampling: Example
  • Suppose X1, …, X4 are known, that is, their values
    are known for all population units.
  • For example, the Xi may be remotely sensed
    variables, e.g., a greenness index.
  • Further, we believe the Xi are correlated with
    the target response Y

45
$X_i = a\,Y + (1-a)\,N(0,1)$
46
(No Transcript)
47
Balanced Sample Simulation
  • 5000 samples of size 50
  • Scenario 1: balanced over X1, X2, and X3
  • Scenario 2: balanced over X2, X3, and X4
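A rough, scaled-down version of this kind of simulation, under stated assumptions: Y is standard normal, one shared a is used for X1, X2, X3 (the slide does not say whether a differs by variable), and the population size N = 1000 is invented. "Balanced" here means ordering units along a non-randomized, bit-interleaved recursive-partition curve through the rank-transformed X's, then taking a systematic sample with one random start. The numbers it produces are illustrative only and are not the presentation's results.

```python
import random
from statistics import mean, pvariance

def morton_key(u, levels=8):
    """Interleave the bits of coordinates u (each in [0,1)) into one integer,
    i.e. a quadrant/octant-recursive address read as a number."""
    key = 0
    for level in range(1, levels + 1):
        for coord in u:
            key = (key << 1) | (int(coord * 2 ** level) % 2)
    return key

def variance_ratio(a=0.8, N=1000, n=50, reps=500, seed=1):
    """Var(SRS mean) / Var(balanced-sample mean) for a simulated population."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(N)]
    # X_i = a*Y + (1 - a)*N(0,1) for three ancillary variables
    xs = [[a * yi + (1 - a) * rng.gauss(0, 1) for yi in y] for _ in range(3)]
    # Rank-transform each X to [0,1) so every axis has the same scale.
    ranks = []
    for x in xs:
        r = [0.0] * N
        for k, i in enumerate(sorted(range(N), key=x.__getitem__)):
            r[i] = k / N
        ranks.append(r)
    # Order units along the recursive-partition curve in (X1, X2, X3) space.
    order = sorted(range(N), key=lambda i: morton_key([rk[i] for rk in ranks]))
    step = N / n
    srs_means, bal_means = [], []
    for _ in range(reps):
        srs_means.append(mean(rng.sample(y, n)))
        start = rng.random() * step  # systematic sample along the curve
        bal_means.append(mean(y[order[int(start + k * step)]] for k in range(n)))
    return pvariance(srs_means) / pvariance(bal_means)

print(round(variance_ratio(), 1))
```

A ratio above 1 means the balanced design gave a more precise estimate than simple random sampling; lowering a weakens the X-Y correlation and should shrink that advantage.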

48
Variance Ratios: SRS vs. Balanced Sample
49
Pattern Recognition in n-space
  • Use recursive partition maps of R^n to R^1 to
    identify structure
  • Example: compare densities of (Y, X1, X2), (Y,
    X2, X3), and (Y, X3, X4)

50
(No Transcript)
51
(No Transcript)
52
Conclusions
  • The selection function estimate of the inclusion
    function did remove some bias, but is still missing
    a factor
  • Recursive partitioning (RP) balancing does show
    promise for incorporating ancillary information
    into the sample
  • RP may also lead to a method for locating regions
    of high density in n-space