Let It Rain: Modeling Multivariate Rain Time Series Using Hidden Markov Models

Transcript and Presenter's Notes
1
Let It Rain: Modeling Multivariate Rain Time
Series Using Hidden Markov Models
  • Sergey Kirshner
  • Donald Bren School of
  • Information and Computer Sciences
  • UC Irvine

March 2, 2006
2
Acknowledgements
Padhraic Smyth (UCI)
Andy Robertson (IRI)
DOE (DE-FG02-02ER63413)
3
http://iri.columbia.edu/climate/forecast/net_asmt/2006/feb2006/MAM06_World_pcp.html
4
What to Do with Rainfall Data?
Description
[Diagram: historical rainfall data and general circulation model (GCM) outputs feed a model, which produces a description of the data]
5
What to Do with Rainfall Data?
Downscaling
[Diagram: historical rainfall data and GCM outputs feed a model, which produces predicted data]
6
What to Do with Rainfall Data?
Simulation
[Diagram: historical rainfall data and GCM outputs feed a model, which produces predicted data for crop modeling and water management]
7
Snapshot of the Data
8
Modeling Precipitation Occurrence
Northeast Brazil, 1975-2002 (except 1976, '78, '84,
and '86): 24 seasons (N), 90 days (T), 10 stations
(M)
9
and Amounts
10
Annual Precipitation Probability
11
Spatial Correlation
12
Spell Run Length Distributions
Dry spells are in blue; wet spells are in red.
13
Important Data Characteristics
  • Correlation
  • Spatial dependence
  • Temporal structure
  • Run-length distributions
  • Persistence
  • First order dependence
  • Variability of individual series
  • Interannual variability (important for climate
    studies)

14
Missing Data
Missing data mask (black) for 41 stations
(y-axis) in India for May 1 - Oct 31, 1973. 29%
of the data is missing, with stations 13, 14, 16,
24, 26, 30, 36, 38, and 40 missing more than 45%
of the data for that station.
15
A Bit of Notation
  • Vector time series R = (R_1, ..., R_T)
  • Vector observation of R at time t:
    R_t = (R_t^1, ..., R_t^M), one component per station

[Figure: the T x M array of observations R_t^m, with columns R_1 through R_T]
16
Weather Generator
  • Does not take spatial correlation into account

17
Rain Generating Process
18
Hidden Markov Model (HMM)
  • Discrete weather states S (K states)
  • Evolution of the weather state: transition
    probability P(S_t | S_{t-1})
  • Rainfall generation in weather state i: emission
    probability P(R_t | S_t = i)

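To make the two ingredients concrete, here is a minimal generative sketch (illustrative only, not the talk's toolbox code); A, pi0, and p_rain are placeholder parameters for a 2-state, 3-station occurrence model:

```python
import numpy as np

rng = np.random.default_rng(0)

K, M, T = 2, 3, 5            # weather states, stations, days
pi0 = np.array([0.6, 0.4])   # initial state distribution
A = np.array([[0.8, 0.2],    # transition probabilities P(S_t | S_{t-1})
              [0.3, 0.7]])
p_rain = np.array([[0.1, 0.2, 0.15],   # emission: P(rain at station m | state i)
                   [0.7, 0.8, 0.75]])

s = rng.choice(K, p=pi0)
for t in range(T):
    r = (rng.random(M) < p_rain[s]).astype(int)  # sample one 0/1 value per station
    print(f"day {t}: state {s}, rain {r}")
    s = rng.choice(K, p=A[s])                    # evolve the hidden weather state
```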
19
Hidden Markov Model (HMM)
[Graphical model: Markov chain of hidden states S_1 → S_2 → ... → S_T, each state S_t emitting the observed rainfall vector R_t]
20
Basic Operations with HMMs
  • Probability of weather states given observed data
    (inference): Forward-Backward
  • Model parameter estimation given the data:
    Baum-Welch (EM)
  • Most likely sequence of weather states given the
    data: Viterbi

[Rabiner 89]
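As a hedged illustration of the inference operation, a minimal scaled forward pass computing the data log-likelihood for the occurrence HMM sketched above (conditional-independence emissions; the names are mine, not the toolbox's):

```python
import numpy as np

def log_likelihood(r, pi0, A, p_rain):
    """Scaled forward pass: log P(r_1..r_T) for a T x M matrix r of 0/1 values."""
    T, K = r.shape[0], len(pi0)
    # b[t, i] = P(r_t | S_t = i): product over stations (conditional independence)
    b = np.array([[np.prod(np.where(r[t] == 1, p_rain[i], 1 - p_rain[i]))
                   for i in range(K)] for t in range(T)])
    alpha = pi0 * b[0]
    ll = np.log(alpha.sum()); alpha /= alpha.sum()   # rescale to avoid underflow
    for t in range(1, T):
        alpha = (alpha @ A) * b[t]
        ll += np.log(alpha.sum()); alpha /= alpha.sum()
    return ll
```

The Baum-Welch and Viterbi recursions reuse the same emission table b and (rescaled) alpha quantities.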
21
States for 4-state HMM
[Robertson, Kirshner, and Smyth 04]
22
Weather State Evolution
[Robertson, Kirshner, and Smyth 04]
23
Generalizations to HMMs: Auto-regressive HMM
(AR-HMM)
  • Explicitly models temporal first-order dependence
    of rainfall

24
Generalizations to HMMs: Non-homogeneous HMM
(NHMM)
  • Incorporates atmospheric variables
  • Allows non-stationary and oscillatory behavior

[Hughes and Guttorp 94; Bengio and Frasconi 95]
25
Parameter Estimation
  • Find Q maximizing P(r | Q) (ML) or P(Q | r) (MAP)
  • Cannot be done in closed form
  • EM (Baum-Welch for HMMs)
  • E-step: compute the posterior P(S | r, Q_old) via
    Forward-Backward, then calculate the expected
    sufficient statistics
  • M-step: maximize the expected complete-data
    log-likelihood with respect to Q
  • Can be split into maximization of emission and
    transition parameters (see the sketch below)

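Assuming the E-step posteriors are already computed (the usual gamma[t, i] = P(S_t = i | r) and xi[t, i, j] = P(S_t = i, S_{t+1} = j | r); the names are my convention), the M-step for the occurrence model has this closed form:

```python
import numpy as np

def m_step(r, gamma, xi):
    """Closed-form M-step updates from E-step posteriors (occurrence model).
    r: T x M 0/1 matrix, gamma: T x K, xi: (T-1) x K x K."""
    pi0 = gamma[0] / gamma[0].sum()
    A = xi.sum(axis=0)
    A /= A.sum(axis=1, keepdims=True)        # normalize each transition row
    # emission update: expected fraction of rainy days under each state
    p_rain = (gamma.T @ r) / gamma.sum(axis=0)[:, None]
    return pi0, A, p_rain
```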
26
Modeling Approaches
  • Use HMMs
  • Transition probabilities for temporal dependence
  • Emissions (hidden state distributions) for
    spatial or multivariate dependence (and
    additional temporal dependence)
  • Emphasis on categorical valued data
  • Transitions and emissions can be specified
    separately
  • Covers the cross-product of transition and
    emission models

27
Modeling Approaches (contd)
  • Use HMMs
  • Possible emission distributions
  • Conditional independence
  • Chow-Liu trees [Chow and Liu 68], conditional
    Chow-Liu forests [Kirshner et al 04]
  • Markov Random Fields
  • Maximum entropy models [e.g., Jelinek 98],
    Boltzmann machines [e.g., Hinton and Sejnowski
    86], thin junction trees [Bach and Jordan 02]
  • Belief Networks
  • Sigmoidal belief networks [Neal 92]
  • Possible transition distributions
  • Non-homogeneous mixture (mixture of experts
    [Jordan and Jacobs 94])
  • Stationary transition matrix
  • Non-homogeneous transition matrix [Hughes and
    Guttorp 94; Meila and Jordan 96; Bengio and
    Frasconi 95]

28
HMM-CI
[e.g., Zucchini and Guttorp 91; Hughes and
Guttorp 94]
29
Why Use HMM-CI?
  • Simple and efficient
  • O(TKM) for inference and for parameter estimation
  • Small number of free parameters
  • Can handle missing data
  • Can be used to model amounts

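One reason the conditional-independence emission handles missing data gracefully: a missing station simply drops out of the per-state product. A minimal sketch (missing entries coded as NaN, which is my convention):

```python
import numpy as np

def emission_prob(r_t, p_rain_i):
    """P(observed part of r_t | state i) under conditional independence."""
    obs = ~np.isnan(r_t)                 # missing stations are marginalized out
    p = np.where(r_t[obs] == 1, p_rain_i[obs], 1 - p_rain_i[obs])
    return p.prod()
```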
30
HMM-CI for Amounts
  • Types of mixture components
  • Gamma [Bellone 01]
  • Exponentials [Robertson et al 06]

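A sketch of what one station's per-state amounts density looks like in this style: a point mass at zero for dry days plus a mixture of exponentials for wet-day amounts (the weights and rates below are illustrative placeholders):

```python
import numpy as np

def amounts_density(r, p_dry, weights, rates):
    """Per-state, per-station density: point mass at 0 (dry) plus an
    exponential mixture for wet-day amounts r > 0."""
    if r == 0.0:
        return p_dry
    wet = sum(w * lam * np.exp(-lam * r) for w, lam in zip(weights, rates))
    return (1 - p_dry) * wet

# e.g., amounts_density(3.2, p_dry=0.6, weights=[0.7, 0.3], rates=[1.0, 0.1])
```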
31
Why Not HMM-CI?
  • Does not match spatial correlations or persistence
    well
  • Models spatial correlation implicitly through the
    hidden states
  • May require a large K to model regions with a
    moderate number of stations

32
HMM-Autologistic
[Hughes, Guttorp, and Charles 99]
33
What about HMM-Autologistic?
  • Sure!
  • Models spatial correlations very well
  • Can use sampling or approximate schemes to
    compute normalization constant and to update
    parameters
  • Not so sure
  • Complexity of exact computation is exponential in
    M
  • What about temporal dependence?
  • May have too many free parameters if not
    constrained
  • Does not handle missing values (or handles them
    only very slowly)

34
Neither Here nor There
  • HMM-CI: efficient but too simplistic
  • HMM-Autologistic: more capable but computationally
    more cumbersome
  • Want something in between
  • Computationally tractable
  • Spatial dependence in the emissions
  • Additional temporal dependence
  • Handles missing values

35
Bayesian Networks and Trees
  • Tree-structured distributions
  • Chow-Liu trees (spatial dependence) [Chow and
    Liu 68]
  • With HMMs [Kirshner et al 04]
  • Conditional Chow-Liu forests (spatial and
    temporal dependence) [Kirshner et al 04]
  • Markov (undirected) and Bayesian (directed)
    networks
  • MaxEnt (logistic)
  • Conditional MaxEnt
  • Sigmoidal belief networks [Neal 92]
  • Would need to estimate both the parameters and
    the structure

36
Chow-Liu Trees
  • Approximation of a joint distribution with a
    tree-structured distribution [Chow and Liu 68]
  • Maximizing log-likelihood ⟺ solving a maximum
    spanning tree (MST) problem
  • Can find both the tree structure and the
    parameters in one swoop!
  • Finding the MST is quadratic in the number of
    nodes [Kruskal 59]
  • Edge weights are pairwise mutual information
    values, a measure of pairwise dependence

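A minimal sketch of the procedure for binary data, assuming nothing beyond numpy: estimate pairwise mutual information, then greedily keep the maximum-weight spanning tree (Kruskal with union-find):

```python
import numpy as np
from itertools import combinations

def mutual_info(x, y):
    """Empirical mutual information between two 0/1 vectors."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pab = np.mean((x == a) & (y == b))
            if pab > 0:
                mi += pab * np.log(pab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu(data):
    """data: N x M matrix of 0/1 values; returns tree edges (i, j, MI)."""
    M = data.shape[1]
    edges = sorted(((mutual_info(data[:, i], data[:, j]), i, j)
                    for i, j in combinations(range(M), 2)), reverse=True)
    parent = list(range(M))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]; u = parent[u]
        return u
    tree = []
    for w, i, j in edges:                # add highest-MI edges that avoid cycles
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree
```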
37
Learning Chow-Liu Trees
[Figure: worked example; the pairwise mutual information values are 0.3126, 0.0229, 0.0172, 0.0230, 0.0183, 0.2603]
38
Chow-Liu Trees
  • Approximation of a joint distribution with a
    tree-structured distribution [Chow and Liu 68]
  • Properties
  • Efficient: O(TM²B²), for B values per variable
  • Optimal
  • Can handle missing data
  • Mixture of trees [Meila and Jordan 00]
  • More expressive than trees, yet with a simple
    estimation procedure
  • HMMs with trees [Kirshner et al 04]

39
HMM-Chow-Liu
[Kirshner et al 04]
40
Tree-structured Emissions for Amounts
[Figure: tree-structured emission given the hidden state S_t^1, coupling the variables O_t^1..O_t^4 with the amount variables R_t^1..R_t^4]
41
Improving on Chow-Liu Trees
  • Tree edges with low MI add little to the
    approximation.
  • Observations from the previous time point can be
    more relevant than from the current one.
  • Idea: build a Chow-Liu tree, allowing it to
    include variables from the current and the
    previous time point.

42
Conditional Chow-Liu Forests
  • Extension of Chow-Liu trees to conditional
    distributions
  • Approximation of conditional multivariate
    distribution with a tree-structured distribution
  • Uses MI to build maximum spanning (directed)
    trees (forest)
  • Variables of two consecutive time points as nodes
  • All nodes corresponding to the earlier time point
    considered connected before the tree construction
  • Same asymptotic complexity as Chow-Liu trees
  • Optimal (within the class of structures)

[Kirshner et al 04]
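The one step that differs from plain Chow-Liu: before the greedy spanning pass, all time-(t-1) nodes are placed in a single pre-connected component, so no edges are added within the past and each current variable attaches to its best past-or-present neighbor. A sketch under that reading (the node numbering is my convention):

```python
def conditional_chow_liu(prev, curr):
    """prev, curr: N x M matrices of 0/1 values for times t-1 and t.
    Nodes 0..M-1 are past variables, M..2M-1 present; the past starts
    connected. Assumes mutual_info() from the Chow-Liu sketch above."""
    M = curr.shape[1]
    cols = [prev[:, m] for m in range(M)] + [curr[:, m] for m in range(M)]
    edges = sorted(((mutual_info(cols[i], cols[j]), i, j)
                    for j in range(M, 2 * M) for i in range(j)), reverse=True)
    parent = [0] * M + list(range(M, 2 * M))   # all past nodes in one component
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]; u = parent[u]
        return u
    forest = []
    for w, i, j in edges:                      # only edges touching the present
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.append((i, j, w))
    return forest
```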
43
Example of CCL-Forest Learning
[Figure: worked example; the pairwise mutual information values are 0.3126, 0.0229, 0.0230, 0.1207, 0.1253, 0.0623, 0.1392, 0.1700, 0.0559, 0.0033, 0.0030, 0.0625]
44
HMM-Conditional-Chow-Liu

[Graphical model: HMM whose emissions are conditional Chow-Liu forests spanning consecutive time points; hidden states S_1, S_2, S_3 shown]

[Kirshner et al 04]
45
Beyond Trees
  • Can learn more complex structure
  • Optimality not guaranteed [Chickering 96; Srebro
    03]
  • Structure and parameters may have to be learned
    in separate computations
  • Computationally expensive
  • Independence model matches all univariate
    marginals
  • Chow-Liu trees match all univariate and some
    bivariate marginals
  • Unconstrained Bayesian or Markov Networks
  • May have too few data points for the number of
    parameters
  • Even 3rd order cliques may have zero probability
    mass

46
Log-linear or Logistic
[Figure: example log-linear (Markov) network over four variables a, b, c, d]
47
Maximum Entropy Method
  • Given
  • Target distribution (empirical)
  • Set of features and corresponding constraints
  • Example: a feature that is 1 when it rains at
    both stations 1 and 2
  • Corresponding constraint: the feature's expected
    value under the model equals its empirical value
  • Interpretation
  • Proportion of time it rains simultaneously at
    stations 1 and 2 is the same for the historical
    data and for the learned distribution
  • Want to satisfy all of the constraints
[e.g., Jelinek 98]
48
MaxEnt Method (contd)
  • Maximize the entropy of the model distribution
    subject to the constraints corresponding to the
    features
  • Exponential form: P(r) ∝ exp(Σ_i λ_i f_i(r))
  • The exponential-form distribution satisfying all
    of the constraints for the feature set maximizes
    the log-likelihood of the data!!! [e.g., Della
    Pietra et al 97]
  • Such a solution is unique (the likelihood is
    concave)

49
HMM-Autologistic
[Hughes, Guttorp, and Charles 99]
50
Conditional Log-linear Distribution
[Figure: example conditional log-linear network over variables a, b, c, d, e]
51
Conditional MaxEnt Method
  • Extension of the MaxEnt distribution to
    conditional distributions
  • Target distribution is conditional
  • Set of features and corresponding constraints
  • Maximize conditional entropy subject to the
    constraints

[e.g., Lafferty et al 01]
52
Learning parameters of MaxEnt models
  • Assume the set of features is given
  • Only the free parameters (feature weights) remain
    to be learned
  • Cannot be done in closed form
  • Iterative algorithms: IS, GIS, IIS, conjugate
    gradients [Brown 59; Darroch and Ratcliff 72;
    Berger et al 96; Della Pietra et al 97; Goodman
    02]
  • Require computation of the expected feature
    counts (or similar) per iteration
  • Exact computation is exponential in the size of
    the largest clique in the Markov network and
    proportional to the size of the data
  • Needs computation of the junction tree and
    requires message passing [e.g., Bach and Jordan
    02]
  • Needs a potentially large number of iterations
  • Want to reduce computation (see the sketch below)

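For intuition only, a tiny sketch that fits feature weights by plain gradient ascent, computing the partition function by brute-force enumeration; the exact sum over all 2^M configurations is precisely what becomes infeasible as M grows (an illustrative toy, not one of the cited algorithms):

```python
import numpy as np
from itertools import product

def fit_maxent(data, features, steps=500, lr=0.5):
    """data: N x M 0/1 matrix; features: list of functions r -> 0 or 1."""
    M = data.shape[1]
    target = np.array([np.mean([f(r) for r in data]) for f in features])
    configs = np.array(list(product([0, 1], repeat=M)))   # all 2^M states
    F = np.array([[f(r) for f in features] for r in configs])
    lam = np.zeros(len(features))
    for _ in range(steps):
        logp = F @ lam
        p = np.exp(logp - logp.max()); p /= p.sum()       # exact normalization
        lam += lr * (target - p @ F)   # gradient of the log-likelihood
    return lam

# e.g., fit_maxent(data, [lambda r: r[0] * r[1]]) matches the empirical
# probability that stations 1 and 2 are wet on the same day
```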
53
Sigmoidal Belief Network
[Figure: example sigmoidal belief network over four variables a, b, c, d]

[Neal 92]
54
Product of Univariate Conditional Maximum Entropy
Models
  • Approximate the target distribution as a product
    of univariate conditional MaxEnt distributions
    (PUC-MaxEnt)
  • Parameters for each factor can be learned
    separately
  • Requires summation over only a single modeled
    variable at a time, not the largest clique
  • No message passing required
  • Intuition: a Bayesian network with factors
    modeled as conditional univariate MaxEnt
    distributions
  • Sigmoidal belief networks [Neal 92]

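A sketch of a single PUC-MaxEnt factor under this reading: each station's occurrence is a univariate logistic (MaxEnt) model of its parent variables in the directed graph, the joint is the product of such factors, and each factor is fit independently (illustrative code, not the talk's implementation):

```python
import numpy as np

def fit_factor(y, X, steps=200, lr=0.1):
    """Univariate conditional MaxEnt factor: logistic P(y = 1 | parents X).
    y: length-N 0/1 vector, X: N x (num parents) 0/1 matrix."""
    Xb = np.hstack([np.ones((len(y), 1)), X])    # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w += lr * Xb.T @ (y - p) / len(y)        # log-likelihood gradient
    return w
```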
55
Structure Learning
  • Number of possible structures is super-exponential
    in the number of variables
  • Finding the optimal solution is NP-hard
    [Chickering 96]
  • Need to search over possible structures
  • Search
  • Structure modification in the outer loop
  • Parameter estimation in the inner loop
  • Restricting to bivariate interactions
  • Edge induction

56
HMM-PUC-MaxEnt
[Graphical model: HMM with PUC-MaxEnt emissions; under each hidden state S_1, S_2, S_3, a directed network over the station variables R_t^1..R_t^4]
57
AR-HMM-PUC-MaxEnt
[Graphical model: AR-HMM with PUC-MaxEnt emissions; the directed network over R_t^1..R_t^4 also receives edges from the previous day's variables R_{t-1}^1..R_{t-1}^4, with hidden states S_1, S_2, S_3]
58
Experimental Setup
  • Data
  • Australia
  • 15 seasons, 184 days each, 30 stations
  • Queensland
  • 40 seasons, 197 days each, 11 stations
  • Measuring predictive performance
  • Choose K (number of states)
  • Leave-n-out cross-validation
  • Evaluation metrics
  • Log-likelihood
  • Error for prediction of a single entry given the
    rest
  • Difference in spatial correlation
  • Difference in persistence

59
Southwestern Australia
1978-1992, May-October: 15 seasons, 184 days, 30
stations
60
Scaled out-of-sample log-likelihood (SW Australia)
61
Out-of-sample predictive error (SW Australia)
62
Examples of Weather States (HMM-CI)
63
Examples of Weather States (HMM-CL)
64
Examples of Weather States (HMM-PUC-MaxEnt)
65
Queensland (Northeastern Australia)
1958-1998, October-April: 40 seasons, 197 days, 11
stations
66
Correlation and Persistence of Queensland Data
67
Scaled out-of-sample log-likelihood (Queensland)
68
Out-of-sample correlation difference (Queensland)
69
Out-of-sample persistence difference (Queensland)
70
Summary
  • Important and interesting application
  • Lots of data available, lots of problems to be
    solved
  • Use tree-structured distributions
  • Can find parameters and structure at the same
    time
  • If trees are not sufficient, prepare cycle
    servers
  • Learning complexity jumps once loops are
    introduced

71
Contributions
  • New models for multi-site rainfall occurrence and
    amounts
  • Conditional Chow-Liu forest model for
    multivariate data [Kirshner, Smyth, and
    Robertson, UAI-2004]
  • HMM with Chow-Liu and conditional Chow-Liu trees
    for modeling multivariate time series [Kirshner,
    Smyth, and Robertson, UAI-2004]
  • HMM with Product-of-Univariate-Conditional MaxEnt
    distributions (PUC-MaxEnt) [Kirshner 2005]
  • HMM with mixtures of exponentials [Robertson et
    al, in press]
  • HMM with tree-structured mixtures

72
Software
  • (M)ulti(V)ariate (N)onhomogeneous (H)idden
    (M)arkov (M)odels Toolbox
  • Free software for multivariate time series
    modeling with HMM as a backbone
  • Large selection of implemented emission
    distributions

http://www.datalab.uci.edu/software/mvnhmm
73
Future Work
  • Rainfall
  • Filling in missing data
  • Modeling large regions
  • Factorized state space
  • Using satellite data
  • OLR fields
  • Subseasonal predictions
  • Selecting good input variables
  • Other models for amounts
  • Machine Learning
  • Learning structure of the distribution from data
  • Modeling in the presence of missing data
  • Loops in HMM-Conditional-Chow-Liu and log-linear
    models
  • Factorized state-space models
  • Continuous hidden-state models
  • Modeling of multivariate real-valued non-Gaussian
    distributions

74
Correlation for 4-state HMM-CI
[Robertson et al 04]
75
Persistence for 4-state HMM-CI
[Robertson et al 04]
76
Inference for NHMMs
  • Inference: calculating the posterior state
    probabilities given the observations and inputs
  • Forward-Backward: recursively compute the forward
    and backward probabilities

77
Forecasting Precipitation
  • Can we use this model for forecasting?
  • Same predicted expected values, no variability
  • Need additional information about the seasons to
    be forecasted

78
HMM-CI: Is It Sufficient?
  • Simple yet effective
  • Few parameters
  • Implicit marginal spatial dependence through the
    hidden states
  • Requires a large number of hidden states
  • Points to exploration of dependency models

79
Limitations of Chow-Liu Structures