Title: The Generalized Random Tessellation Stratified Sampling Design for Selecting Spatially-Balanced Samples
1The Generalized Random Tessellation Stratified
Sampling Design for Selecting Spatially-Balanced
Samples
- Don L. Stevens, Jr.
- Department of Statistics
- Oregon State University
Monitoring Science and Technology
Symposium, September 20-24, 2004, Denver, Colorado
2This presentation was developed under STAR
Research Assistance Agreement No. CR82-9096-01
Program on Designs and Models for Aquatic
Resource Surveys awarded by the U.S.
Environmental Protection Agency to Oregon State
University. It has not been subjected to the
Agency's review and therefore does not
necessarily reflect the views of the Agency, and
no official endorsement should be inferred
3Historical Context
- GRTS design evolved from EMAP work on global
tessellations in the early 1990s
- Scott Overton, Denis White, and Jon Kimerling
developed EMAP's triangular grid / hexagonal
tessellation
4Historical Context
- EMAP began with a triangular grid / hexagonal
tessellation
- Expected to intensify grid as needed
- Triangular grid has several advantages
- More compact than square grid
- More subdivision factors
- Became clear that basic concept did not have
enough flexibility to accommodate the
characteristics of environmental resource sampling
5Environmental Resource Populations
- Point-like
- Finite population of discrete units, e.g., small-
to medium-sized lakes
- Linear
- Width is very small relative to length, e.g.,
streams or riparian vegetation belts
- Extensive
- Covers large area in a more or less continuous
and connected fashion, e.g., a large estuary
6Environmental Resource Populations
- Tobler's First Law of Geography: Things that are
close together in space tend to have more similar
properties than things that are far apart.
- OR
- Spatial correlation functions tend to decrease
with distance
7Sampling Environmental Resource Populations
- Environmental Resource Populations exist in a
spatial matrix
- Population elements close to one another tend to
be more similar than widely separated elements
- Good sampling designs tend to spread out the
sample points more or less regularly
- Simple random sampling tends to exhibit uneven
spatial patterns
8Simple random sample of a domain with 3
subdomains
(Figure: subdomains A, B, C; values shown: 28, 28, 15)
9Sampling Environmental Resource Populations
- Patterned response (gradients, patches, periodic
responses)
- Variable inclusion probability
- 0, 1, and 2 dimensional populations (points,
lines, areas)
- Pattern in population occurrence (density)
- Unreliable frame material
- Temporal panels often needed
10Environmental Resource Populations
- Ecological importance, environmental stressor
levels, scientific interest, and political
importance are not uniform over the extent of the
resource
11Desirable Properties of Environmental Resource
Samples
- (1) Accommodate varying spatial sample intensity
- (2) Spread the sample points evenly and regularly
over the domain, subject to (1)
- (3) Allow augmentation of the sample
after-the-fact, while maintaining (2)
12Desirable Properties of Environmental Resource
Samples
- (4) Accommodate varying population spatial
density for finite and linear populations, subject
to (1) and (2).
- (2) and (4) ⇒ Sample spatial pattern should
reflect the (finite or linear) population spatial
pattern
13Sampling Environmental Resource Populations
- Systematic sample has substantial disadvantages
- Well known problems with periodic response
- Less well recognized problem: patch-like response
14(Figure: subdomains A, B, C; values shown: 26, 24, 15)
15(Figure: subdomains A, B, C; values shown: 32, 20, 16)
16Sampling Environmental Resource Populations
- Systematic sample has substantial disadvantages
- Well known problems with periodic response
- Less well recognized problem: patch-like response
- Difficult to apply to finite populations, e.g.,
lakes
- Limited flexibility to change sample point
density
- Difficult to accommodate variable inclusion
probability or sample adjustment for frame errors
17Sample point intensity can be changed using
nested grids
(Figure: subdomains A, B, C; values shown: 26, 88, 15)
18RANDOM-TESSELLATION STRATIFIED (RTS) DESIGN
- Compromise between systematic sampling and SRS that
resolves periodic/patchy response
- Cover the population domain with a grid
- Randomly located
- Regular (square or triangular)
- Spacing chosen to give required spatial
resolution
- Tile the domain with equal-sized regular polygons
containing the grid points
- Select one sample point at random from each
tessellation polygon
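The RTS steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the domain is taken to be the unit square, the grid is square, and the function name `rts_sample` and its parameters are hypothetical rather than part of any published implementation.

```python
import random

def rts_sample(n_side, rng=None):
    """Sketch of a random-tessellation stratified sample on the unit square.

    A square grid with spacing 1/n_side is shifted by a random offset and
    one point is drawn uniformly from each grid cell. Points that land
    outside the unit square are dropped.
    """
    rng = rng or random.Random()
    h = 1.0 / n_side
    # Random location of the grid: offset in each direction.
    ox, oy = rng.uniform(0, h), rng.uniform(0, h)
    points = []
    # Start at -1 so the shifted grid still covers the whole square.
    for i in range(-1, n_side):
        for j in range(-1, n_side):
            x = ox + (i + rng.random()) * h  # uniform within cell (i, j)
            y = oy + (j + rng.random()) * h
            if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:
                points.append((x, y))
    return points

sample = rts_sample(8, random.Random(1))
print(len(sample))  # close to 64, varies with the offset
```

Because each point is random within its cell, two points in adjacent cells can fall close together, but only with low probability.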
19RANDOM-TESSELLATION STRATIFIED (RTS) DESIGN
- Solves some of systematic sample problems
- Non-zero pairwise inclusion probability
- Alignment with geographic features of population
- Lets points get close together with low
probability
21RTS DESIGN
- Does not resolve systematic sample difficulties
with
- variable probability
- finite and linear populations
- pattern in population occurrence (density)
- unreliable frame material
- Limited ability to change density
22Generalized Random-Tessellation Stratified (GRTS)
Design
- Conceptual structure
- Population indexed by points contained within a
region R
- Have inclusion probability p(s) defined on R
- Select a sample by picking points
- Finite: points represent units
- p(s) is the usual inclusion probability
- Linear: points are on the lines
- p(s) is a density: sample points per unit length
- Extensive: points are in the region's area
- p(s) is a density: sample points per unit area
23GRTS Design Mechanics
- Map R into first quadrant of unit square, add a
random offset
- Subdivide unit square into small grid cells
- At least small enough so that total inclusion
probability for a cell (expected number of
samples in the cell) is less than 1
- Total inclusion probability for cell is sum or
integral of p(s) over the extent of the cell
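For a finite population, the cell-size rule above can be illustrated as follows. This is a sketch under assumptions: units are given as hypothetical `(x, y, p)` triples already mapped into the unit square, the random offset is omitted, and the grid is refined by halving the cell side until every cell total is below 1. The function names are illustrative only.

```python
from collections import defaultdict

def cell_totals(units, level):
    """Sum inclusion probabilities per cell of a 2**level x 2**level grid."""
    k = 2 ** level
    totals = defaultdict(float)
    for x, y, p in units:
        col = min(int(x * k), k - 1)  # clamp x == 1.0 into the last column
        row = min(int(y * k), k - 1)
        totals[(col, row)] += p
    return totals

def finest_level_needed(units, max_level=20):
    """Smallest subdivision level at which every cell total is below 1."""
    for level in range(max_level + 1):
        if max(cell_totals(units, level).values()) < 1.0:
            return level
    raise ValueError("no level up to max_level satisfies the rule")

# Four units, each with inclusion probability 0.5 (expected sample size 2).
units = [(0.1, 0.1, 0.5), (0.2, 0.2, 0.5), (0.8, 0.8, 0.5), (0.9, 0.1, 0.5)]
print(finest_level_needed(units))  # 3
```

In this example the two units near the origin share a cell until the grid reaches 8 x 8, which is why three levels of subdivision are needed.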
24Population region image
25Population region image with random offset
26GRTS Design Mechanics
- Order the cells so that some 2-dimensional
proximity relationships are preserved
- Can't preserve everything, because a 1-1, onto,
continuous map from unit square to unit interval
is impossible
- Can get 1-1, onto, measurable, which is good
enough
- GRTS uses a quadrant-recursive function, similar
to the space-filling curve developed by Giuseppe
Peano in 1890.
27Assign each cell an address corresponding to the
order of subdivision. The address of the shaded
quadrant is 0.213. Order the cells following the
address order.
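Quadrant addressing can be sketched as repeated halving of the unit square. The labelling of the four quadrants used here (digit = 2 * upper-half flag + right-half flag) is an assumption made for illustration; the slide's figure may number the quadrants differently, and `quadrant_address` is a hypothetical name.

```python
def quadrant_address(x, y, levels):
    """Base-4 address digits of the cell containing (x, y), 0 <= x, y < 1."""
    digits = []
    for _ in range(levels):
        x, y = 2 * x, 2 * y
        xb, yb = int(x), int(y)      # 0 = left/lower half, 1 = right/upper half
        digits.append(2 * yb + xb)   # assumed quadrant labelling
        x, y = x - xb, y - yb        # rescale to the chosen sub-quadrant
    return digits

print(quadrant_address(0.625, 0.25, 3))  # [1, 2, 1], i.e. address 0.121

# Ordering cells by address keeps all cells of one quadrant together.
cells = [(0.1, 0.1), (0.9, 0.9), (0.2, 0.1), (0.6, 0.1)]
print(sorted(cells, key=lambda c: quadrant_address(c[0], c[1], 3)))
```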
28GRTS Design Mechanics
- If we carry the process to the limit, letting the
grid cell size → 0, the result is a quadrant-
recursive function, that is, a function that maps
the unit square onto the unit interval such that
the image of every quadrant is an interval.
- Apply a restricted randomization that preserves
quadrant recursiveness
29HIERARCHICAL RANDOMIZATION
- Each cell address is a base-4 fraction, that is,
$t = 0.t_1 t_2 t_3 \ldots$, where each digit $t_i$ is either
0, 1, 2, or 3. A function $h_p$ is a hierarchical
permutation if
$$h_p(t) = 0.p_1(t_1)\, p_2(t_2; t_1)\, p_3(t_3; t_1, t_2) \ldots$$
where $p_n(\cdot\,; t_1, t_2, \ldots, t_{n-1})$ is a possibly
distinct permutation of {0, 1, 2, 3} for each unique
combination of digits $t_1, t_2, \ldots, t_{n-1}$.
30HIERARCHICAL RANDOMIZATION
- If the permutations that define hp() are chosen
at random and independently from the set of all
possible permutations, we call hp() a
hierarchical randomization function, and the
process of applying hp() hierarchical
randomization.
- Compose the basic quadrant-recursive (q-r) map with a
hierarchical randomization function
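One way to picture hierarchical randomization is to draw a fresh random permutation of 0..3 for every distinct prefix of address digits and apply it to the next digit. The sketch below does this lazily with a dictionary keyed by prefix; the function name and the lazy-cache design are illustrative assumptions, not the author's implementation.

```python
import random

def make_hierarchical_randomization(rng=None):
    """Return a function hp(digits) that permutes base-4 address digits.

    The permutation applied to digit n depends on the preceding digits
    t1..t(n-1), so cells that share a quadrant keep a shared prefix.
    """
    rng = rng or random.Random()
    perms = {}  # prefix of original digits -> permutation of [0, 1, 2, 3]

    def hp(digits):
        out = []
        for n, d in enumerate(digits):
            prefix = tuple(digits[:n])
            if prefix not in perms:
                perm = [0, 1, 2, 3]
                rng.shuffle(perm)      # independent random permutation
                perms[prefix] = perm
            out.append(perms[prefix][d])
        return out

    return hp

hp = make_hierarchical_randomization(random.Random(7))
print(hp([2, 1, 3]), hp([2, 1, 0]))  # same first two digits in both images
```

Because the first output digit depends only on the first input digit, and so on down the levels, the image of every quadrant is still a block of consecutive addresses.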
32GRTS Design Mechanics
- The result is a random order of the small grid
cells such that
- All grid cells in the same quadrant have
consecutive order positions
- But will be randomly ordered within those
positions
- This holds for all quadrant levels
- This induces a random ordering of population
elements
33GRTS Design Mechanics
- Assign each grid cell a length equal to its total
inclusion probability
- String the lengths in the random order
- Result is a line with length equal to target
sample size
- Take systematic sample along line (random start
in the unit interval, then unit steps)
- Map back to population using inverse random q-r
function
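The selection step can be sketched as a systematic sample along the line of cell lengths. This sketch assumes the cells arrive already in their randomized order as hypothetical `(cell_id, length)` pairs, with each length below 1; choosing the actual point or unit inside a selected cell is omitted.

```python
import random

def systematic_along_line(ordered_cells, rng=None):
    """Pick cells hit by the points u, u+1, u+2, ... along the line.

    ordered_cells: list of (cell_id, total_inclusion_probability), in the
    hierarchically randomized order. Each length is assumed to be < 1, so
    no cell can be selected twice.
    """
    rng = rng or random.Random()
    next_point = rng.random()  # random start in the unit interval
    selected = []
    cum = 0.0                  # right end of the current cell's segment
    for cell_id, length in ordered_cells:
        cum += length
        while next_point < cum:
            selected.append(cell_id)
            next_point += 1.0  # unit step to the next sample point
    return selected

# Twelve cells of length 0.25: the line has length 3, so 3 cells are chosen.
cells = [(i, 0.25) for i in range(12)]
print(systematic_along_line(cells, random.Random(3)))
```

A cell's chance of being hit equals its length, which is its total inclusion probability.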
34GRTS Design Mechanics
- Points will be in hierarchical random order
- Re-ordering into reverse hierarchical order gives
some very useful features to the sample
35Reverse Hierarchical Order
- Illustrate for 2 levels of addressing
First 16 addresses as base-4 fractions:
00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
36Reverse Hierarchical Order
- Illustrate for 2 levels of addressing
First 16 addresses as base-4 fractions:
00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
Reversed digits:
00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33
37Reverse Hierarchical Order
- Illustrate for 2 levels of addressing
First 16 addresses as base-4 numbers:
00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
Reversed digits:
00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33
Reversed digits as base-10 numbers:
0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15
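The digit reversal in this illustration is easy to reproduce. The helper below is a minimal sketch (its name is hypothetical) that reverses the base-4 digits of a position index and returns the result as an ordinary integer.

```python
def reverse_hierarchical_rank(index, levels):
    """Reverse the base-4 digits of `index`, written with `levels` digits."""
    rank = 0
    for _ in range(levels):
        index, digit = divmod(index, 4)  # peel off the last base-4 digit
        rank = rank * 4 + digit          # append it to the front-to-back result
    return rank

ranks = [reverse_hierarchical_rank(i, 2) for i in range(16)]
print(ranks)  # [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

Sorting sample points by this rank places consecutive points in different top-level quadrants, which is what spreads any run of consecutive points across the domain.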
38SPATIAL PROPERTIES OF REVERSE HIERARCHICAL
ORDERED GRTS SAMPLE
- The complete sample is nearly regular, capturing
much of the potential efficiency of a systematic
sample without the potential flaws
- Any subsample consisting of a consecutive
subsequence is almost as regular as the full
sample; in particular, the initial subsequence
$\{s_1, s_2, \ldots, s_k\}$ is a spatially well-balanced sample.
- Any consecutive sequence subsample, restricted to
the accessible domain, is a spatially
well-balanced sample of the accessible domain.
110Inclusion probability density surface
Region is (0,1)x(0,0.8)
112SPATIAL PROPERTIES OF REVERSE HIERARCHICAL
ORDERED GRTS SAMPLE
- Assess spatial balance by variance of size of
Voronoi polygons, compared to SRS of the
same size.
- Voronoi polygons for a set of points: the ith
polygon is the collection of points
in the domain that are closer to s_i than to any
other s_j in the set.
- Estimate variance by 1000 replications of a
sample of size 256 in unit square
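The Voronoi-area criterion can be approximated without a computational-geometry library by assigning lattice nodes to their nearest sample point. The sketch below is illustrative only: the lattice resolution, the function name, and the use of a perfectly regular 4 x 4 layout as the comparison (rather than a GRTS sample) are all assumptions made to keep the example short.

```python
import random
from statistics import pvariance

def voronoi_area_variance(points, grid=64):
    """Variance of approximate Voronoi polygon areas in the unit square.

    Each node of a grid x grid lattice is assigned to its nearest point;
    a polygon's area is estimated by its share of the lattice nodes.
    """
    counts = [0] * len(points)
    for a in range(grid):
        for b in range(grid):
            gx, gy = (a + 0.5) / grid, (b + 0.5) / grid
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - gx) ** 2 + (points[i][1] - gy) ** 2,
            )
            counts[nearest] += 1
    areas = [c / grid ** 2 for c in counts]
    return pvariance(areas)

regular = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
rng = random.Random(0)
srs = [(rng.random(), rng.random()) for _ in range(16)]
print(voronoi_area_variance(regular), voronoi_area_variance(srs))
```

A perfectly regular layout gives equal polygons and zero variance; a simple random sample gives polygons of uneven size, so a smaller variance indicates better spatial balance.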
114SPATIAL PROPERTIES OF REVERSE HIERARCHICAL
ORDERED GRTS SAMPLE
- Compare regularity as points are added one at a
time, following reverse hierarchical order, under
4 scenarios
- Complete, continuous domain
- Domains with holes excluding 20%, modeling
non-response/access refusal
- 20 randomly-located square holes, constant size
- 20 randomly-located square holes, increasing
linearly in size
- 10 randomly-located square holes, increasing
exponentially in size
11820 point GRTS Sample
119Four 20-point GRTS Panels
120Five 20-point GRTS Panels
121Five 20-point GRTS Panels Special Study Area
122Finite Population Example
123Equi-probable GRTS Sample
124GRTS Sample: Probability inversely proportional
to population density
125Equi-probable
Inverse density