Title: Simulation of spatially correlated discrete random variables
1Simulation of spatially correlated discrete
random variables Dan Dalthorp and Lisa
Madsen Department of Statistics Oregon State
University dalthorp_at_science.oregonstate.edu lmadse
n_at_science.oregonstate.edu
Outline I. Generating one pair of correlated
discrete random variables. (a)
Lognormal-Poisson hierarchy (b) Overlapping
sums II. Generating a vector of correlated
discrete random variables by overlapping
sums III. Examples
2Introduction
Generate Y1, Y2 where
Y1, Y2 have specified means
variances
and correlation rY ? 0 Y1, Y2
are count r.v.'s i.e., y 0, 1, 2, ...
Distributions of Y1, Y2 are unimodal,
Poisson-like If s2 lt m, then both s2 and m are
small
3Lognormal-Poisson MethodFor Generating Y1 and Y2
Generate correlated normal RVs Z1, Z2
Transform to lognormals Xi exp(Zi)
Generate conditionally independent Yi
Poisson(Xi)
Y1 and Y2 resemble negative binomial RVs.
4Obtaining the Right Moments
To get with corr(Y1, Y2)
rY,
generate lognormals X1, X2 with
This requires normals Z1, Z2 with
and
5Constraints on Moments of Y1, Y2 with
Lognormal-Poisson Method
6Upper Bound for CorrelationLognormal Poisson
7Upper Bound for CorrelationLognormal Poisson
8Overlapping Sums Method For Generating Y1 and Y2
Generate independent, discrete RVs X1, X2,
X Let Y1 X X1 Y2 X
X2
Holgate (1964) Correlated Poissons
We are not concerned with the exact distribution
of Y1 and Y2, but we require them to be
ecologically plausible.
9Obtaining the Right Moments
To get with corr(Y1, Y2)
rY,
Generate independent X1, X2, X with
and
10Choose distributions for Xs based on relationship
between variance and mean
If , use X Negative
binomial(mX, sX2)
If , use X Poisson(mX)
If , use X
Bernoulli(mX)
If and
, use , where
BBernoulli(p), and PPoisson(l), with
and
then X cannot be simulatedby any method.
If
11Constraints on Moments of Y1, Y2 with
Overlapping Sums Method
No constraints on means of Yi, but we require
?
? Relationship between and
ecologically plausible
12Upper Bound for CorrelationOverlapping Sums
13Comparing Methods
14Comparing Methods
15A quick example Simulate Y1 and Y2 with
and r 0.2
Step 1 Find variances and means of X's Y1
X X1 Y2 X X2 where X, X1, and X2 are
independent count random variables with ...
Variances
Means
16Step 2 Define distributions for X's
X Bernoulli(0.0921) since by design X1
Negative binomial with m 0.928 and s2
1.172 X2 Bernoulli(p) Poisson(l) with p
0.05 and l 0.0079
Y1 X X1 Y2 X X2
Step 3 Simulate
17Generalizing to n gt 2
where X is a vector of independent r.v.s, and T
is a matrix of 0s and 1s
1. Park Shin (1998) algorithm gives variances
for X's
Find n ? m matrix T consisting of 0s and 1s and
m-vector such that and
2. Linear programming gives reasonable means for
X's
Find m-vector that solves subject
to constraints (i) Mi gt 0 for all i
and (ii) when s i2 ? 0.25
3. Generate independent X's with the appropriate
distributions and multiply by T
18Park Shin (1998) algorithm gives variances of
X's
E.g., Suppose
for the common component of Y3 and Y4
19(No Transcript)
20(No Transcript)
21(No Transcript)
22(No Transcript)
23(No Transcript)
24(No Transcript)
25(No Transcript)
26(No Transcript)
27Grub population density as a function of several
covariates
28Are the conditions for multiple regression met?
1. Non-normal response variable
2. Variance not constant
3. Observations not independent
2.0
0.2
1.5
0.1
Correlation of Residuals
Variance of Residuals
1.0
0.0
0.5
180
0
60
120
1st
2nd
3rd
4th
Lag distance (feet)
Fitted Values (quartiles)
29Generalized linear model (Fisher 1935 Dempster
1971 Berk 1972 Nelder and Wedderburn 1972)
A. Accommodates response variables with
distribution in exponential family
(including normal, binomial, Poisson, gamma,
exponential, chi-squared, etc.) B. Allows for
non-constant variance
with quasi-likelihood estimation (Wedderburn,
1974)
A. Accommodates response variables that are not
in an exponential family (including negative
binomial, unspecified distributions) B. Requires
only that the variance of the response variable
be expressed as a function of the mean
adapted for spatially dependent observations
(Liang and Zeger 1986 McCullagh amd Nelder
1989 Albert and McShane 1995 Gotway and Stroup
1997 Dalthorp 2004)
A. Accounts for spatial autocorrelation in the
residuals B. The statistical theory for the model
is not well-developed
30Example Japanese beetle grub population density
vs. soil organic matter
6
Means
Variances
Correlations
0.2
4
s2
Correlation
0.1
Grubs per soil sample
2
0.0
0
0
60
120
180
3
4
5
6
7
8
9
Lag distance (feet)
Organic matter content ()
Means (via GLM)
Variances (via TPL)
Correlations (via spherical model)
31The simulation
1000 reps with n 143
Xs are independent, count-valued random
variables -- variances from Park Shins
algorithm -- means from linear programming
PROBLEM No solution found! Choice between
one of the following i. One Y mean off-target
but no impossible X r.v.'s Need Y with m
0.141 Can only do m 0.151 ii. One impossible
X r.v. ( ) We need r.v. with m
0.0385, s2 0.0272 Can do Bernoulli m
0.0385, s2 0.0370 Consequences? Var(Y16)
0.139 vs. target of 0.129
32Results for 1000 simulation runs
3720 X's consisting of -- Negative binomial
1580 -- Bernoulli 2099 -- Bernoulli
Poisson 40 -- Impossible 1
(simulated s2 slightly larger than target)
Means
Variances
Simulated variance
Simulated mean
Target mean
Target variance
33Correlations
0.25
0.2
0.15
Correlation
0.1
0.05
0
-0.05
0
50
100
150
200
250
300
350
400
450
Lag distance
34Example Diamond back moth dispersal
0.1
Correlation
Release point
Correlation
0
Traps
Lag Distance
Means
Variances
Variance
Mean
35The simulation
1000 reps with n 114
Xs are independent negative binomials --
variances from Park Shins algorithm -- means
from linear programming T is a matrix of zeros
and ones that defines the common components of
the Ys
Results
Means
Variances
s2
s2
m
36Correlation Simulated vs. target
0.25
0.2
0.15
Correlation
0.1
0.05
0
-0.05
0
5
10
15
20
25
30
35
40
Lag distance
Circles are averages for 1000 sims
37Example Weed counts (Chenopodium polyspermum)
vs. soil magnesium
Weed counts and soil Mg in random quadrats in a
field ...
Variances
Means
s2
38Correlation
Infeasible correlations
Highest possible correlation between Yi , Yj is
With 49 pairs of points in the weed data, target
ri,j is too high.
39Summary Correlated count r.v.'s can be
simulated by overlapping sums of independent
negative binomials, Bernoullis, and Poissons
The simulated r.v.'s are very close to negative
binomial where m lt s2 and very close to Bernoulli
Poisson where m gt s2 Negative correlations
and strong positive correlations between r.v.s
with very different variances are not attainable,
but ... The method can accommodate a wide
variety of ecologically important scenarios that
the hierarchical lognormal-Poisson model balks
at, including -- underdispersed count
r.v.'s -- moderately strong correlations where
m1 ? m2 and s12 ? s22