Title: Modelling non-stationarity in space and time for air quality data Peter Guttorp University of Washington peter@stat.washington.edu
1Modelling non-stationarity in space and time for
air quality dataPeter GuttorpUniversity of
Washingtonpeter_at_stat.washington.edu
NRCSE
2Outline
- Lecture 1 Geostatistical tools
- Gaussian predictions
- Kriging and its neighbours
- The need for refinement
- Lecture 2 Nonstationary covariance estimation
- The deformation approach
- Other nonstationary models
- Extensions to space-time
- Lecture 3 Putting it all together
- Estimating trends
- Prediction of air quality surfaces
- Model assessment
3Research goals in air quality modeling
- Create exposure fields for health effects
modeling - Assess deterministic air quality models
- Interpret environmental standards
- Enhance understanding of complex systems
4The geostatistical setup
- Gaussian process
- m(s)EZ(s) Var Z(s) lt 8
- Z is strictly stationary if
- Z is weakly stationary if
- Z is isotropic if weakly stationary and
5The problem
- Given observations at n sites Z(s1),...,Z(sn)
- estimate
- Z(s0) (the process at an unobserved site)
- or (a weighted average of the
process)
6A Gaussian formula
7Simple kriging
- Let X (Z(s1),...,Z(Sn))T, Y Z(s0), so
- mXm1n, mYm,
- SXXC(si-sj), SYYC(0), and SYXC(si-s0).
- Thus
- This is the best linear unbiased predictor for
known m and C (simple kriging). - Variants ordinary kriging (unknown m)
- universal kriging (mAb for some covariate
A) - Still optimal for known C.
- Prediction error is given by
8The (semi)variogram
- Intrinsic stationarity
- Weaker assumption (C(0) need not exist)
- Kriging can be expressed in terms of variogram
9Estimation of covariance functions
- Method of moments square of all pairwise
differences, smoothed over lag bins - Problem Not necessarily a valid variogram
10Least squares
- Minimize
- Alternatives
- fourth root transformation
- weighting by 1/g2
- generalized least squares
11Fitted variogram
12Kriging surface
13Kriging standard error
14A better combination
15Maximum likelihood
- ZNn(m,S) S a r(si-sjq) a V(q)
- Maximize
- and maximizes the profile likelihood
16A peculiar ml fit
17Some more fits
18All together now...
19Effect of estimating covariance structure
- Standard geostatistical practice is to take the
covariance as known. When it is estimated,
optimality criteria are no longer valid, and
plug-in estimates of variability are biased
downwards. - (Zimmerman and Cressie, 1992)
- A Bayesian prediction analysis takes proper
account of all sources of uncertainty (Le and
Zidek, 1992)
20Violation of isotropy
21General setup
- Z(x,t) m(x,t) n(x)1/2E(x,t) e(x,t)
- trend smooth error
- We shall assume that m is known or constant
- t 1,...,T indexes temporal replications
- E is L2-continuous, mean 0, variance 1,
independent of the error e - C(x,y) Cor(E(x,t),E(y,t))
- D(x,y) Var(E(x,t)-E(y,t)) (dispersion)
22Geometric anisotropy
- Recall that if we have an isotropic
covariance (circular isocorrelation curves). - If for a linear transformation A, we have
geometric anisotropy (elliptical isocorrelation
curves). - General nonstationary correlation structures are
typically locally geometrically anisotropic.
23The deformation idea
- In the geometric anisotropic case, write
- where f(x) Ax. This suggests using a general
nonlinear transformation . Usually d2 or 3. - G-plane D-space
- We do not want f to fold.
24Implementation
- Consider observations at sites x1, ...,xn. Let
be the empirical covariance between sites xi and
xj. Minimize - where J(f) is a penalty for non-smooth
transformations, such as the bending energy -
25SARMAP
- An ozone monitoring exercise in California,
summer of 1990, collected data on some 130 sites.
26Transformation
- This is for hr. 16 in the afternoon
27Thin-plate splines
Linear part
28A Bayesian implementation
- Likelihood
- Prior
- Linear part
- fix two points in the G-D mapping
- put a (proper) prior on the remaining two
parameters - Posterior computed using Metropolis-Hastings
29California ozone
30Posterior samples
31Other applications
- Point process deformation (Jensen Nielsen,
Bernoulli, 2000) - Deformation of brain images (Worseley et al.,
1999)
32Isotropic covariances on the sphere
Isotropic covariances on a sphere are of the
form where p and q are directions, gpq the angle
between them, and Pi the Legendre
polynomials. Example ai(2i1)ri
33A class of global transformations
- Iteration between simple parametric deformation
of latitude (with parameters changing with
longitude) and similar deformations of longitude
(changing smoothly with latitude). - (Das, 2000)
34Three iterations
35Global temperature
- Global Historical Climatology Network 7280
stations with at least 10 years of data. Subset
with 839 stations with data 1950-1991 selected.
36Isotropic correlations
37Deformation
38Assessing uncertainty
39Gaussian moving averages
- Higdon (1998), Swall (2000)
- Let x be a Brownian motion without drift,
and . This is a Gaussian process with
correlogram - Account for nonstationarity by letting the kernel
b vary with location
40Kernel averaging
- Fuentes (2000) Introduce orthogonal local
stationary processes Zk(s), k1,...,K, defined on
disjoint subregions Sk and construct - where wk(s) is a weight function related to
dist(s,Sk). Then - A continuous version has
-
41Simplifying assumptions in space-time models
- Temporal stationarity
- seasonality
- decadal oscillations
- Spatial stationarity
- orographic effects
- meteorological forcing
- Separability
- C(t,s)C1(t)C2(s)
42SARMAP revisited
- Spatial correlation structure depends on hour of
the day (non-separable)
43Brunos seasonal nonseparability
- Nonseparability generated by seasonally changing
spatial term - Z1 large-scale feature
- Z2 separable field of local features
- (Bruno, 2004)
44A non-separable class of stationary space-time
covariance functions
- Cressie Huang (1999)
- Fourier domain
- Gneiting (2001) f is completely monotone if
(-1)n f (n) 0 for all n. Bernsteins theorem
for some non-decreasing
F. - Combine a completely monotone function and a
function y with completely monotone derivative
into a space-time covariance
45A particular case
a1/2,g1/2
a1/2,g1
a1,g1/2
a1,g1
46Uses for surface estimation
- Compliance
- exposure assessment
- measurement
- Trend
- Model assessment
- comparing (deterministic) model to data
- approximating model output
- Health effects modeling
47Health effects
- Personal exposure (ambient and non-ambient)
- Ambient exposure
- outdoor time
- infiltration
- Outdoor concentration model for individual i at
time t
48Seattle health effects study
2 years, 26 10-day sessions A total of 167
subjects 56 COPD subjects 40 CHD subjects 38
healthy subjects (over 65 years old,
non-smokers) 33 asthmatic kids A total of 108
residences 55 private homes 23 private
apartments 30 group homes
49Ogawa sampler
PUF
HPEM
pDR
50T/RH logger
CO2 monitor
Ogawa sampler
CAT
Nephelometer
HI
Quiet Pump Box
51(No Transcript)
52PM2.5 measurements
53Where do the subjects spend their time?
- Asthmatic kids
- 66 at home
- 21 indoors away from home
- 4 in transit
- 6 outdoors
- Healthy (CHD, COPD) adults
- 83 (86,88) at home
- 8 (7,6) indoors away from home
- 4 (4,3) in transit
- 3 (2,2) outdoors
54Panel results
- Asthmatic children not on anti-inflammatory
medication - decrease in lung function related to indoor and
to outdoor PM2.5, not to personal exposure - Adults with CV or COPD
- increase in blood pressure and heart rate related
to indoor and personal PM2.5
55(No Transcript)
56(No Transcript)
57Trend model
- where Vik are covariates, such as population
density, proximity to roads, local topography,
etc. - where the fj are smoothed versions of temporal
singular vectors (EOFs) of the TxN data matrix. - We will set m1(si) m0(si) for now.
58SVD computation
59EOF 1
60EOF 2
61EOF 3
62(No Transcript)
63(No Transcript)
64(No Transcript)
65Kriging of m0
66Kriging of r2
67Quality of trend fits
68Observed vs. predicted
69Observed vs. predicted, cont.
70Conclusions
- Good prediction of
- day-to-day variability
- seasonal shape of mean
- Not so good prediction of
- long-term mean
- Need to try to estimate
71Other difficulties
- Missing data
- Multivariate data
- Heterogenous (in space and time) geostatistical
tools - Different sampling intervals (particularly a PM
problem)
72Southern California PM2.5 data