Title: Let It Rain: Modeling Multivariate Rain Time Series Using Hidden Markov Models
1. Let It Rain: Modeling Multivariate Rain Time Series Using Hidden Markov Models
- Sergey Kirshner
- Donald Bren School of
- Information and Computer Sciences
- UC Irvine
March 2, 2006
2. Acknowledgements
Padhraic Smyth (UCI)
Andy Robertson (IRI)
DOE (DE-FG02-02ER63413)
3. http://iri.columbia.edu/climate/forecast/net_asmt/2006/feb2006/MAM06_World_pcp.html
4. What to Do with Rainfall Data? Description
historical rainfall data + general circulation model (GCM) outputs → model
5. What to Do with Rainfall Data? Downscaling
historical rainfall data + general circulation model (GCM) outputs → model → predicted data
6. What to Do with Rainfall Data? Simulation
historical rainfall data + general circulation model (GCM) outputs → model → predicted data → crop modeling, water management
7. Snapshot of the Data
8. Modeling Precipitation Occurrence
Northeast Brazil, 1975-2002 (except 1976, 78, 84, and 86): 24 seasons (N), 90 days (T), 10 stations (M)
9. ... and Amounts
10. Annual Precipitation Probability
11. Spatial Correlation
12. Spell Run-Length Distributions
Dry spells are in blue; wet spells are in red.
13. Important Data Characteristics
- Correlation
- Spatial dependence
- Temporal structure
- Run-length distributions
- Persistence
- First-order dependence
- Variability of individual series
- Interannual variability important for climate studies
14. Missing Data
Missing data mask (black) for 41 stations (y-axis) in India for May 1 - Oct 31, 1973. 29% of the data is missing, with stations 13, 14, 16, 24, 26, 30, 36, 38, and 40 missing more than 45% of the data for that station.
15. A Bit of Notation
- Vector time series R = (R_1, R_2, ..., R_T)
- Vector observation of R at time t: R_t = (R_t^1, R_t^2, ..., R_t^M), one component per station
16. Weather Generator
- Does not take spatial correlation into account
17. Rain Generating Process
18. Hidden Markov Model (HMM)
- Discrete weather states S_t (K states)
- Evolution of the weather state: transition probability P(S_t | S_{t-1})
- Rainfall generation in weather state i: emission probability P(R_t | S_t = i) (see the sampling sketch below)
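To make the generative story concrete, here is a minimal sampling sketch in Python. The function name `sample_hmm` and the `emit` callback are illustrative, not part of the talk's software; any per-state rainfall sampler can be plugged in.

```python
import numpy as np

def sample_hmm(pi, A, emit, T, seed=0):
    """Sample a length-T sequence from the rain HMM: draw the hidden
    weather-state chain from pi and A, then rainfall from each
    state's emission distribution via emit(state, rng)."""
    rng = np.random.default_rng(seed)
    K = len(pi)
    s = rng.choice(K, p=pi)           # initial weather state
    states, rain = [], []
    for _ in range(T):
        states.append(s)
        rain.append(emit(s, rng))     # R_t ~ P(R_t | S_t = s)
        s = rng.choice(K, p=A[s])     # S_{t+1} ~ P(. | S_t = s)
    return states, rain
```

For rainfall occurrence with conditionally independent stations, `emit` could return `rng.random(M) < p[s]`, where `p[s]` holds hypothetical per-station wet probabilities for state s.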
19. Hidden Markov Model (HMM)
(Graphical model: a Markov chain of hidden states S_1 → S_2 → ... → S_T, with each state S_t emitting the observation R_t.)
20. Basic Operations with HMMs
- Probability of weather states given observed data (inference): Forward-Backward (sketch below)
- Model parameter estimation given the data: Baum-Welch (EM)
- Most likely sequence of weather states given the data: Viterbi
[Rabiner 89]
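A minimal scaled Forward-Backward sketch in Python (NumPy), assuming the emission likelihoods have already been evaluated into a T x K table. All names are illustrative; this is not the talk's toolbox code.

```python
import numpy as np

def forward_backward(pi, A, B):
    """Posterior state probabilities for an HMM.

    pi : (K,)   initial state distribution
    A  : (K, K) transition matrix, A[i, j] = P(S_t = j | S_{t-1} = i)
    B  : (T, K) emission likelihoods, B[t, i] = P(R_t | S_t = i)
    Returns gamma (T, K) with gamma[t, i] = P(S_t = i | R_1..R_T).
    """
    T, K = B.shape
    alpha = np.zeros((T, K))          # scaled forward messages
    beta = np.zeros((T, K))           # scaled backward messages
    scale = np.zeros(T)

    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / scale[t + 1]

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
```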
21. States for 4-state HMM
[Robertson, Kirshner, and Smyth 04]
22. Weather State Evolution
[Robertson, Kirshner, and Smyth 04]
23. Generalizations to HMMs: Auto-regressive HMM (AR-HMM)
- Explicitly models temporal first-order dependence of rainfall: the emission for R_t also conditions on R_{t-1}
24. Generalizations to HMMs: Non-homogeneous HMM (NHMM)
- Incorporates atmospheric variables
- Allows non-stationary and oscillatory behavior
[Hughes and Guttorp 94; Bengio and Frasconi 95]
25. Parameter Estimation
- Find Θ maximizing P(r | Θ) (ML) or P(Θ | r) (MAP)
- Cannot be done in closed form
- EM (Baum-Welch for HMMs)
  - E-step: compute the posterior state probabilities P(S_t | r, Θ) with Forward-Backward; calculate the expected complete-data log-likelihood
  - M-step: maximize the expected complete-data log-likelihood over Θ
    - Can be split into maximization of emission and transition parameters (transition update sketched below)
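As a sketch of the M-step split mentioned above, here is the transition-parameter update from expected transition counts, assuming the Forward-Backward sketch earlier is modified to also return its scaled alpha and beta arrays (illustrative code, not the talk's implementation).

```python
import numpy as np

def update_transitions(alpha, beta, A, B):
    """One Baum-Welch M-step for the transition matrix: accumulate
    posterior transition probabilities xi_t(i, j) and renormalize.
    alpha, beta are the scaled messages; B[t, i] = P(R_t | S_t = i)."""
    T, K = B.shape
    counts = np.zeros((K, K))
    for t in range(T - 1):
        xi = alpha[t][:, None] * A * (B[t + 1] * beta[t + 1])[None, :]
        counts += xi / xi.sum()        # normalize: posterior over (i, j)
    return counts / counts.sum(axis=1, keepdims=True)
```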
26. Modeling Approaches
- Use HMMs
  - Transition probabilities for temporal dependence
  - Emissions (hidden state distributions) for spatial or multivariate dependence (and additional temporal dependence)
- Emphasis on categorical-valued data
- Transitions and emissions can be specified separately
  - Covers the cross-product of models
27. Modeling Approaches (contd)
- Use HMMs
- Possible emission distributions
  - Conditional independence
  - Chow-Liu trees [Chow and Liu 68], conditional Chow-Liu forests [Kirshner et al 04]
  - Markov Random Fields
    - Maximum entropy models [e.g., Jelinek 98], Boltzmann machines [e.g., Hinton and Sejnowski 86], thin junction trees [Bach and Jordan 02]
  - Belief Networks
    - Sigmoidal belief networks [Neal 92]
- Possible transition distributions
  - Non-homogeneous mixture (mixture of experts [Jordan and Jacobs 94])
  - Stationary transition matrix
  - Non-homogeneous transition matrix [Hughes and Guttorp 94; Meila and Jordan 96; Bengio and Frasconi 95]
28. HMM-CI
[e.g., Zucchini and Guttorp 91; Hughes and Guttorp 94]
29. Why Use HMM-CI?
- Simple and efficient
- O(TKM) for inference and for parameter estimation
- Small number of free parameters
- Can handle missing data
- Can be used to model amounts
30. HMM-CI for Amounts
- Types of mixture components (an example density follows below)
  - Gamma [Bellone 01]
  - Exponentials [Robertson et al 06]
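As an illustration of such a component, a mixture-of-exponentials density for positive rainfall amounts might look like the following sketch. The actual parameterization in [Robertson et al 06] may differ; in particular, a full amounts model also needs a point mass at zero for dry days.

```python
import numpy as np

def mixture_exp_pdf(r, weights, rates):
    """Density of a mixture of exponentials at amounts r > 0:
    sum_j weights[j] * rates[j] * exp(-rates[j] * r)."""
    r = np.asarray(r, dtype=float)[..., None]
    w, lam = np.asarray(weights), np.asarray(rates)
    return (w * lam * np.exp(-lam * r)).sum(axis=-1)
```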
31. Why Not HMM-CI?
- Does not match spatial correlations or persistence well
- Models spatial correlation implicitly through hidden states
- May require large K to model regions with a moderate number of stations
32. HMM-Autologistic
[Hughes, Guttorp, and Charles 99]
33. What about HMM-Autologistic?
- Sure!
  - Models spatial correlations very well
  - Can use sampling or approximate schemes to compute the normalization constant and to update parameters
- Not so sure
  - Complexity of exact computation is exponential in M
  - What about temporal dependence?
  - May have too many free parameters if not constrained
  - Does not handle missing values (or handles them very slowly)
34. Neither Here nor There
- HMM-CI: efficient but too simplistic
- HMM-Autologistic: more capable but computationally more cumbersome
- Want something in between
  - Computationally tractable
  - Emission spatial dependence
  - Additional temporal dependence
  - Missing values
35. Bayesian Networks and Trees
- Tree-structured distributions
  - Chow-Liu trees (spatial dependence) [Chow and Liu 68]
    - With HMMs [Kirshner et al 04]
  - Conditional Chow-Liu forests (spatial and temporal dependence) [Kirshner et al 04]
- Markov (undirected) and Bayesian (directed) networks
  - MaxEnt (logistic)
  - Conditional MaxEnt
  - Sigmoidal belief networks [Neal 92]
  - Would need to estimate both the parameters and the structure
36. Chow-Liu Trees
- Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]
- Maximizing the log-likelihood is equivalent to solving a maximum spanning tree (MST) problem
- Can find both the tree structure and the parameters in one swoop!
- Finding the MST is quadratic in the number of nodes [Kruskal 59]
- Edge weights are pairwise mutual information values, which quantify pairwise dependence (see the sketch below)
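A compact sketch of the construction for binary occurrence data: estimate pairwise mutual information from the data, then grow a maximum spanning tree over it (Prim-style here for brevity; function names are illustrative, not the paper's code).

```python
import numpy as np
from itertools import combinations

def chow_liu_tree(X):
    """Chow-Liu tree for binary data X (T samples x M variables).
    Returns (parent, child) edges of the maximum spanning tree
    under pairwise mutual information weights."""
    T, M = X.shape
    mi = np.zeros((M, M))
    # Pairwise MI from empirical 2x2 joint distributions.
    for a, b in combinations(range(M), 2):
        joint = np.zeros((2, 2))
        for u in (0, 1):
            for v in (0, 1):
                joint[u, v] = np.mean((X[:, a] == u) & (X[:, b] == v))
        pa, pb = joint.sum(axis=1), joint.sum(axis=0)
        nz = joint > 0
        mi[a, b] = mi[b, a] = np.sum(
            joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz]))
    # Maximum spanning tree, grown greedily from node 0.
    in_tree, edges = {0}, []
    while len(in_tree) < M:
        best = max(((i, j) for i in in_tree for j in range(M)
                    if j not in in_tree), key=lambda e: mi[e])
        edges.append(best)
        in_tree.add(best[1])
    return edges
```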
37. Learning Chow-Liu Trees
(Example pairwise mutual information values from the slide: 0.3126, 0.0229, 0.0172, 0.0230, 0.0183, 0.2603.)
38. Chow-Liu Trees
- Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]
- Properties
  - Efficient: O(TM²B²)
  - Optimal
  - Can handle missing data
- Mixture of trees [Meila and Jordan 00]
  - More expressive than trees yet with a simple estimation procedure
- HMMs with trees [Kirshner et al 04]
39. HMM-Chow-Liu
[Kirshner et al 04]
40. Tree-structured Emissions for Amounts
(Diagram: for hidden state S_t, a tree-structured emission over per-station variables O_t^1..O_t^4 with associated amounts R_t^1..R_t^4.)
41. Improving on Chow-Liu Trees
- Tree edges with low MI add little to the approximation.
- Observations from the previous time point can be more relevant than those from the current one.
- Idea: build the Chow-Liu tree allowing it to include variables from both the current and the previous time point.
42. Conditional Chow-Liu Forests
- Extension of Chow-Liu trees to conditional distributions
- Approximation of a conditional multivariate distribution with a tree-structured distribution
- Uses MI to build maximum spanning (directed) trees (a forest); see the sketch below
- Variables of two consecutive time points serve as nodes
  - All nodes corresponding to the earlier time point are considered connected before the tree construction
- Same asymptotic complexity as Chow-Liu trees
- Optimal (within the class of structures)
[Kirshner et al 04]
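Relative to the Chow-Liu sketch above, the main structural change is the initialization: all time-(t-1) nodes start out "in the tree", so the greedy pass only adds edges that reach time-t nodes. A minimal sketch under that assumption; the published algorithm also prunes to a forest and directs the edges.

```python
def conditional_chow_liu(mi, M):
    """Greedy maximum-spanning construction over 2M nodes, where
    indices 0..M-1 are the time t-1 variables (pre-connected) and
    M..2M-1 are the time t variables; mi is a (2M, 2M) MI matrix."""
    in_tree = set(range(M))            # earlier time point: connected
    edges = []
    while len(in_tree) < 2 * M:
        best = max(((i, j) for i in in_tree
                    for j in range(M, 2 * M) if j not in in_tree),
                   key=lambda e: mi[e])
        edges.append(best)
        in_tree.add(best[1])
    return edges
```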
43. Example of CCL-Forest Learning
(Example pairwise mutual information values from the slide: 0.3126, 0.0229, 0.0230, 0.1207, 0.1253, 0.0623, 0.1392, 0.1700, 0.0559, 0.0033, 0.0030, 0.0625.)
44. HMM-Conditional-Chow-Liu
(Diagram: HMM whose emissions are conditional Chow-Liu forests linking consecutive time points.)
[Kirshner et al 04]
45. Beyond Trees
- Can learn more complex structure
  - Optimality not guaranteed [Chickering 96; Srebro 03]
  - Structure and parameters may have to be learned in separate computations
  - Computationally expensive
- Independence model matches all univariate marginals
- Chow-Liu trees match all univariate and some bivariate marginals
- Unconstrained Bayesian or Markov networks
  - May have too few data points for the number of parameters
  - Even 3rd-order cliques may have zero probability mass
46. Log-linear or Logistic
(Diagram: example Markov network over variables a, b, c, d.)
47. Maximum Entropy Method
- Given
  - Target distribution (empirical)
  - Set of features and corresponding constraints
- Example: a feature is 1 when it rains at both station 1 and station 2
  - Corresponding constraint
  - Interpretation: the proportion of time it rains simultaneously at stations 1 and 2 is the same for the historical data and according to the learned distribution
- Want to satisfy all of the constraints
[e.g., Jelinek 98]
48. MaxEnt Method (contd)
- Maximize the entropy of p subject to the constraints corresponding to the features
- Exponential form (see the math block below)
- The distribution of this form satisfying all of the constraints for the features maximizes the log-likelihood of the data! [e.g., Della Pietra et al 97]
- Such a solution is unique (the likelihood is concave)
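In symbols, a standard reconstruction of the slide's elided formulas, with f_k the features and λ_k their weights:

```latex
\max_{p}\; H(p)
\quad \text{s.t.} \quad
\mathbb{E}_{p}[f_k] = \mathbb{E}_{\tilde p}[f_k],\; k = 1,\dots,K
\;\;\Longrightarrow\;\;
p_{\lambda}(x) = \frac{1}{Z(\lambda)}
\exp\!\Big(\sum_{k} \lambda_k f_k(x)\Big)
```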
49. HMM-Autologistic
[Hughes, Guttorp, and Charles 99]
50. Conditional Log-linear Distribution
(Diagram: example conditional Markov network over variables a, b, c, d, e.)
51. Conditional MaxEnt Method
- Extension of the MaxEnt distribution to conditional distributions
- Target distribution
- Set of features and corresponding constraints
- Maximize conditional entropy subject to constraints
[e.g., Lafferty et al 01]
52. Learning Parameters of MaxEnt Models
- Assume the set of features is given
- Only the free parameters (one per feature) need to be learned
- Cannot be done in closed form
- Iterative algorithms: IS, GIS, IIS, conjugate gradients [Brown 59; Darroch and Ratcliff 72; Berger et al 96; Della Pietra et al 97; Goodman 02]
- Require computation of model expectations E_p[f_k] (or similar) per iteration
  - Exact computation is exponential in the size of the largest clique in the Markov network and proportional to the size of the data
  - Needs computation of the junction tree and requires message passing [e.g., Bach and Jordan 02]
- Needs a potentially large number of iterations
- Want to reduce computation (see the sketch below)
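A minimal sketch of one such iterative scheme: plain gradient ascent on the log-likelihood for a small binary model (a simplification, not any of the cited algorithms verbatim). The exact expectation enumerates all 2^M states, which illustrates the exponential cost noted above.

```python
import numpy as np
from itertools import product

def fit_maxent(features, target, M, lr=0.5, iters=200):
    """Gradient ascent for a MaxEnt model over binary vectors of
    length M. features: list of f_k(x) -> {0, 1}; target: empirical
    expectations E_ptilde[f_k]. Gradient is target - E_p[f_k]."""
    states = [np.array(s) for s in product([0, 1], repeat=M)]
    lam = np.zeros(len(features))
    F = np.array([[f(x) for f in features] for x in states])  # |X| x K
    for _ in range(iters):
        logp = F @ lam
        p = np.exp(logp - logp.max())
        p /= p.sum()                       # exact normalization Z
        model = p @ F                      # E_p[f_k], exponential in M
        lam += lr * (np.asarray(target) - model)
    return lam
```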
53. Sigmoidal Belief Network
(Diagram: example sigmoidal belief network over variables a, b, c, d.)
[Neal 92]
54. Product of Univariate Conditional Maximum Entropy Models
- Approximate the target distribution as a product of univariate conditional MaxEnt distributions (PUC-MaxEnt); see the sketch below
- Parameters for each factor can be learned separately
- Requires summation over only a single modeled variable at a time, not over the largest clique
- No message passing required
- Intuition: a Bayesian network with factors modeled as conditional univariate MaxEnt distributions
  - Sigmoidal belief networks [Neal 92]
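A sketch of the factorized estimation: each variable gets its own univariate conditional MaxEnt (i.e., logistic) factor given its parents in the network, and each factor is fit independently. Function and parameter names are illustrative, not the talk's implementation.

```python
import numpy as np

def fit_puc_maxent(X, parents, lr=0.1, iters=500):
    """Fit a PUC-MaxEnt-style model: each binary variable m gets a
    univariate logistic factor P(x_m | parents(m)).

    X       : (T, M) binary data
    parents : dict mapping variable -> list of parent variables
    Each factor is fit separately by gradient ascent on its own
    conditional log-likelihood."""
    T, M = X.shape
    params = {}
    for m in range(M):
        pa = parents.get(m, [])
        Z = X[:, pa] if pa else np.zeros((T, 0))
        w, b = np.zeros(len(pa)), 0.0
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))   # sigmoid
            g = X[:, m] - p                           # LL gradient
            w += lr * (Z.T @ g) / T
            b += lr * g.mean()
        params[m] = (w, b)
    return params
```

For example, `fit_puc_maxent(X, {1: [0], 2: [0, 1]})` would fit a small chain-like network over three stations.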
55. Structure Learning
- Number of possible structures is super-exponential in the number of variables
- Finding the optimal solution is NP-hard [Chickering 96]
- Need to search over possible structures
- Search (sketched below)
  - Structure modification in the outer loop
  - Parameter estimation in the inner loop
- Restricting to bivariate interactions
- Edge induction
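The outer/inner loop can be sketched generically: the `score` callback hides the inner-loop parameter estimation (e.g., refitting the PUC-MaxEnt factors), and the outer loop greedily inducts one edge at a time. Purely illustrative; the talk's actual search procedure may differ.

```python
def greedy_edge_search(candidates, score):
    """Greedy structure search: repeatedly add the candidate edge
    whose inclusion most improves score(structure); parameter
    estimation happens inside the score callback (the inner loop)."""
    structure = []
    best = score(structure)
    while True:
        gains = [(score(structure + [e]), e)
                 for e in candidates if e not in structure]
        if not gains:
            break
        top, edge = max(gains)
        if top <= best:
            break                      # no remaining edge helps
        best = top
        structure.append(edge)
    return structure
```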
56. HMM-PUC-MaxEnt
(Diagram: HMM where each hidden state S_t has a PUC-MaxEnt emission, a directed network over the station variables R_t^1..R_t^4, with a different network per state.)
57. AR-HMM-PUC-MaxEnt
(Diagram: as in slide 56, with additional edges from the previous day's variables R_{t-1}^1..R_{t-1}^4 into the current day's R_t^1..R_t^4.)
58. Experimental Setup
- Data
  - Australia
    - 15 seasons, 184 days each, 30 stations
  - Queensland
    - 40 seasons, 197 days each, 11 stations
- Measuring predictive performance
  - Choose K (number of states)
  - Leave-n-out cross-validation
- Evaluation metrics
  - Log-likelihood
  - Error for prediction of a single entry given the rest
  - Difference in spatial correlation
  - Difference in persistence
59. Southwestern Australia
1978-1992, May-October: 15 seasons, 184 days, 30 stations
60. Scaled out-of-sample log-likelihood (SW Australia)
61. Out-of-sample predictive error (SW Australia)
62. Examples of Weather States (HMM-CI)
63. Examples of Weather States (HMM-CL)
64. Examples of Weather States (HMM-PUC-MaxEnt)
65. Queensland (Northeastern Australia)
1958-1998, October-April: 40 seasons, 197 days, 11 stations
66. Correlation and Persistence of Queensland Data
67. Scaled out-of-sample log-likelihood (Queensland)
68. Out-of-sample correlation difference (Queensland)
69. Out-of-sample persistence difference (Queensland)
70. Summary
- Important and interesting application
  - Lots of data available, lots of problems to be solved
- Use tree-structured distributions
  - Can find parameters and structure at the same time
- If trees are not sufficient, prepare cycle servers
  - Learning complexity jumps once loops are introduced
71. Contributions
- New models for multi-site rainfall occurrence and amounts
  - Conditional Chow-Liu forest model for multivariate data [Kirshner, Smyth, and Robertson, UAI-2004]
  - HMM with Chow-Liu and conditional Chow-Liu trees for modeling multivariate time series [Kirshner, Smyth, and Robertson, UAI-2004]
  - HMM with Product-of-Univariate-Conditional MaxEnt distributions (PUC-MaxEnt) [Kirshner 2005]
  - HMM with mixtures of exponentials [Robertson et al, in press]
  - HMM with tree-structured mixtures
72. Software
- (M)ulti(V)ariate (N)onhomogeneous (H)idden (M)arkov (M)odels Toolbox
- Free software for multivariate time series modeling with an HMM as the backbone
- Large selection of implemented emission distributions
http://www.datalab.uci.edu/software/mvnhmm
73. Future Work
- Rainfall
  - Filling in missing data
  - Modeling large regions
    - Factorized state space
  - Using satellite data
    - OLR fields
  - Subseasonal predictions
    - Selecting good input variables
  - Other models for amounts
- Machine Learning
  - Learning the structure of the distribution from data
  - Modeling in the presence of missing data
  - Loops in HMM-Conditional-Chow-Liu and log-linear models
  - Factorized state-space models
  - Continuous hidden-state models
  - Modeling of multivariate real-valued non-Gaussian distributions
74. Correlation for 4-state HMM-CI
[Robertson et al 04]
75. Persistence for 4-state HMM-CI
[Robertson et al 04]
76. Inference for NHMMs
- Inference: calculating the posterior state probabilities given the observations and the atmospheric inputs
- Forward-Backward: recursively compute the forward and backward probabilities
77. Forecasting Precipitation
- Can we use this model for forecasting?
- Same predicted expected values, no variability
- Need additional information about the seasons to be forecasted
78. HMM-CI: Is It Sufficient?
- Simple yet effective
- Few parameters
- Implicit marginal spatial dependency through the hidden states
- Requires a large number of hidden states
- Points to exploration of dependency models
79. Limitations of Chow-Liu Structures