Simulation of spatially correlated discrete random variables - PowerPoint PPT Presentation

1 / 39

About This Presentation

Title:

Simulation of spatially correlated discrete random variables

Description:

II. Generating a vector of correlated discrete random ... Example: Diamond back moth dispersal. Release point. Traps. Means. Variances. Mean. Variance ... – PowerPoint PPT presentation

Number of Views:48

Avg rating:3.0/5.0

Slides: 40

Provided by: dalt8

Category:

more less

Transcript and Presenter's Notes

Title: Simulation of spatially correlated discrete random variables

1
Simulation of spatially correlated discrete
random variables Dan Dalthorp and Lisa
Madsen Department of Statistics Oregon State
University dalthorp_at_science.oregonstate.edu lmadse
n_at_science.oregonstate.edu
Outline I. Generating one pair of correlated
discrete random variables. (a)
Lognormal-Poisson hierarchy (b) Overlapping
sums II. Generating a vector of correlated
discrete random variables by overlapping
sums III. Examples
2
Introduction
Generate Y1, Y2 where
Y1, Y2 have specified means
variances

and correlation rY ? 0 Y1, Y2
are count r.v.'s i.e., y 0, 1, 2, ...
Distributions of Y1, Y2 are unimodal,
Poisson-like If s2 lt m, then both s2 and m are
small
3
Lognormal-Poisson MethodFor Generating Y1 and Y2

Generate correlated normal RVs Z1, Z2
Transform to lognormals Xi exp(Zi)
Generate conditionally independent Yi
Poisson(Xi)
Y1 and Y2 resemble negative binomial RVs.
4
Obtaining the Right Moments
To get with corr(Y1, Y2)
rY,
generate lognormals X1, X2 with
This requires normals Z1, Z2 with
and
5
Constraints on Moments of Y1, Y2 with
Lognormal-Poisson Method

6
Upper Bound for CorrelationLognormal Poisson
7
Upper Bound for CorrelationLognormal Poisson
8
Overlapping Sums Method For Generating Y1 and Y2
Generate independent, discrete RVs X1, X2,
X Let Y1 X X1 Y2 X
X2
Holgate (1964) Correlated Poissons
We are not concerned with the exact distribution
of Y1 and Y2, but we require them to be
ecologically plausible.
9
Obtaining the Right Moments
To get with corr(Y1, Y2)
rY,
Generate independent X1, X2, X with
and
10
Choose distributions for Xs based on relationship
between variance and mean
If , use X Negative
binomial(mX, sX2)
If , use X Poisson(mX)
If , use X
Bernoulli(mX)
If and
, use , where
BBernoulli(p), and PPoisson(l), with
and
then X cannot be simulatedby any method.
If
11
Constraints on Moments of Y1, Y2 with
Overlapping Sums Method
No constraints on means of Yi, but we require
?
? Relationship between and
ecologically plausible

12
Upper Bound for CorrelationOverlapping Sums
13
Comparing Methods
14
Comparing Methods
15
A quick example Simulate Y1 and Y2 with
and r 0.2
Step 1 Find variances and means of X's Y1
X X1 Y2 X X2 where X, X1, and X2 are
independent count random variables with ...
Variances
Means
16
Step 2 Define distributions for X's
X Bernoulli(0.0921) since by design X1
Negative binomial with m 0.928 and s2
1.172 X2 Bernoulli(p) Poisson(l) with p
0.05 and l 0.0079
Y1 X X1 Y2 X X2
Step 3 Simulate
17
Generalizing to n gt 2
where X is a vector of independent r.v.s, and T
is a matrix of 0s and 1s
1. Park Shin (1998) algorithm gives variances
for X's
Find n ? m matrix T consisting of 0s and 1s and
m-vector such that and
2. Linear programming gives reasonable means for
X's
Find m-vector that solves subject
to constraints (i) Mi gt 0 for all i
and (ii) when s i2 ? 0.25
3. Generate independent X's with the appropriate
distributions and multiply by T
18
Park Shin (1998) algorithm gives variances of
X's
E.g., Suppose
for the common component of Y3 and Y4
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
Grub population density as a function of several
covariates
28
Are the conditions for multiple regression met?
1. Non-normal response variable
2. Variance not constant
3. Observations not independent
2.0
0.2
1.5
0.1
Correlation of Residuals
Variance of Residuals
1.0
0.0
0.5
180
0
60
120
1st
2nd
3rd
4th
Lag distance (feet)
Fitted Values (quartiles)
29
Generalized linear model (Fisher 1935 Dempster
1971 Berk 1972 Nelder and Wedderburn 1972)
A. Accommodates response variables with
distribution in exponential family
(including normal, binomial, Poisson, gamma,
exponential, chi-squared, etc.) B. Allows for
non-constant variance
with quasi-likelihood estimation (Wedderburn,
1974)
A. Accommodates response variables that are not
in an exponential family (including negative
binomial, unspecified distributions) B. Requires
only that the variance of the response variable
be expressed as a function of the mean
adapted for spatially dependent observations
(Liang and Zeger 1986 McCullagh amd Nelder
1989 Albert and McShane 1995 Gotway and Stroup
1997 Dalthorp 2004)
A. Accounts for spatial autocorrelation in the
residuals B. The statistical theory for the model
is not well-developed
30
Example Japanese beetle grub population density
vs. soil organic matter
6

Means
Variances
Correlations
0.2
4

s2
Correlation
0.1

Grubs per soil sample
2

0.0
0

0
60
120
180
3
4
5
6
7
8
9
Lag distance (feet)
Organic matter content ()
Means (via GLM)
Variances (via TPL)
Correlations (via spherical model)
31
The simulation
1000 reps with n 143
Xs are independent, count-valued random
variables -- variances from Park Shins
algorithm -- means from linear programming
PROBLEM No solution found! Choice between
one of the following i. One Y mean off-target
but no impossible X r.v.'s Need Y with m
0.141 Can only do m 0.151 ii. One impossible
X r.v. ( ) We need r.v. with m
0.0385, s2 0.0272 Can do Bernoulli m
0.0385, s2 0.0370 Consequences? Var(Y16)
0.139 vs. target of 0.129
32
Results for 1000 simulation runs
3720 X's consisting of -- Negative binomial
1580 -- Bernoulli 2099 -- Bernoulli
Poisson 40 -- Impossible 1
(simulated s2 slightly larger than target)
Means
Variances
Simulated variance
Simulated mean
Target mean
Target variance
33
Correlations
0.25
0.2
0.15
Correlation
0.1
0.05
0
-0.05
0
50
100
150
200
250
300
350
400
450
Lag distance
34
Example Diamond back moth dispersal
0.1
Correlation
Release point
Correlation
0
Traps
Lag Distance
Means
Variances
Variance
Mean
35
The simulation
1000 reps with n 114
Xs are independent negative binomials --
variances from Park Shins algorithm -- means
from linear programming T is a matrix of zeros
and ones that defines the common components of
the Ys
Results
Means
Variances
s2
s2
m
36
Correlation Simulated vs. target
0.25
0.2
0.15
Correlation
0.1
0.05
0
-0.05
0
5
10
15
20
25
30
35
40
Lag distance
Circles are averages for 1000 sims
37
Example Weed counts (Chenopodium polyspermum)
vs. soil magnesium
Weed counts and soil Mg in random quadrats in a
field ...
Variances
Means
s2
38
Correlation
Infeasible correlations
Highest possible correlation between Yi , Yj is
With 49 pairs of points in the weed data, target
ri,j is too high.
39
Summary Correlated count r.v.'s can be
simulated by overlapping sums of independent
negative binomials, Bernoullis, and Poissons
The simulated r.v.'s are very close to negative
binomial where m lt s2 and very close to Bernoulli
Poisson where m gt s2 Negative correlations
and strong positive correlations between r.v.s
with very different variances are not attainable,
but ... The method can accommodate a wide
variety of ecologically important scenarios that
the hierarchical lognormal-Poisson model balks
at, including -- underdispersed count
r.v.'s -- moderately strong correlations where
m1 ? m2 and s12 ? s22

Write a Comment

User Comments (0)