Learning Stable Multivariate Baseline Models for Outbreak Detection presentation

Title: Learning Stable Multivariate Baseline Models for Outbreak Detection

1
Learning Stable Multivariate Baseline Models for
Outbreak Detection

Sajid M. Siddiqi, Byron Boots, Geoffrey J.
Gordon, Artur W. Dubrawski
The Auton Lab, School of Computer Science
Carnegie Mellon University

presented by Robin Sabhnani from the Auton Lab
This work was partly funded by NSF grant
IIS-0325581 and CDC award R01-PH000028
2
Motivation

Lots of health-related data available
Much of this data is temporal
Many data sources are also multivariate

3
Motivation

When detecting anomalies, the crucial information
could be hidden in the dynamics of the data as
well as the interaction between different data
streams
Our goal: learn good models for simulating
baseline data, for use in training algorithms as
well as detecting anomalies
Linear Dynamical Systems are a good choice

4
Outline

Linear Dynamical Systems
Learning Stable Models
Experimental Setup
Results
Conclusion

5
Linear Dynamical Systems (LDS)
[Figure: graphical model of an LDS. Hidden variables (low-dimensional): X_1, X_2, ..., X_t, X_{t+1}. Observed data (high-dimensional): Y_1, Y_2, ..., Y_t, Y_{t+1}.]
6
Linear Dynamical Systems (LDS)
[Figure: graphical model of an LDS. Hidden variables (low-dimensional): X_1, X_2, ..., X_t, X_{t+1}. Observed data (high-dimensional): Y_1, Y_2, ..., Y_t, Y_{t+1}.]
7
Linear Dynamical Systems (LDS)
[Figure: graphical model of an LDS. Hidden variables (low-dimensional): X_1, X_2, ..., X_t, X_{t+1}. Observed data (high-dimensional): Y_1, Y_2, ..., Y_t, Y_{t+1}.]

Dynamics matrix A models temporal evolution

8
Linear Dynamical Systems (LDS)
[Figure: graphical model of an LDS. Hidden variables (low-dimensional): X_1, X_2, ..., X_t, X_{t+1}. Observed data (high-dimensional): Y_1, Y_2, ..., Y_t, Y_{t+1}.]

Dynamics matrix A models temporal evolution
Multivariate Gaussian noise terms v_t, w_t model
interaction between streams
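
As a rough illustration of the model on this slide, an LDS is commonly written as x_{t+1} = A x_t + w_t, y_t = C x_t + v_t. The observation matrix C, the noise covariances Q and R, and the assignment of w_t to the state and v_t to the observations are assumptions here; the slides only name A, v_t, and w_t. A minimal NumPy sketch of sampling from such a model, with made-up parameter values:

```python
import numpy as np

def simulate_lds(A, C, Q, R, x0, T, rng):
    """Sample T steps from x_{t+1} = A x_t + w_t, y_t = C x_t + v_t.

    w_t ~ N(0, Q) is state noise and v_t ~ N(0, R) is observation noise.
    C, Q, R and x0 are assumed names; the slides only name A, v_t, w_t.
    """
    n, m = A.shape[0], C.shape[0]
    X = np.zeros((T, n))
    Y = np.zeros((T, m))
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        X[t] = x
        Y[t] = C @ x + rng.multivariate_normal(np.zeros(m), R)
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return X, Y

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # illustrative stable dynamics
C = rng.standard_normal((5, 2))           # 2 hidden dims -> 5 observed streams
Q = 0.01 * np.eye(2)                      # state noise covariance
R = 0.1 * np.eye(5)                       # observation noise covariance
X, Y = simulate_lds(A, C, Q, R, x0=[1.0, 1.0], T=60, rng=rng)
```

Off-diagonal entries in R (or Q) would be one way to express correlated noise across streams; the identity-scaled covariances above are only placeholders.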

9
Linear Dynamical Systems

The Good
Linear Dynamical Systems (aka State-Space models,
aka Kalman Filters) are a generalization of ARMA
models and can represent a wide range of time
series
LDS parameters can be learned from data
The Bad
LDSs learned from data are often unstable
Simulation from an unstable LDS degenerates

10
Stability

Stability of an LDS depends on its dynamics
matrix A
Let λ_1, ..., λ_n be the eigenvalues of A in
decreasing order of magnitude
A is stable if |λ_1| < 1
Constraining λ_1 during learning is hard
We devise an iterative optimization method that
beats previous approaches in efficiency and
accuracy
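
The stability criterion can be checked numerically. The sketch below is not the constraint generation method itself. It only tests |λ_1| < 1 for two illustrative matrices (values chosen for this example) and shows how a noise-free simulation behaves in each case:

```python
import numpy as np

def spectral_radius(A):
    """Largest eigenvalue magnitude |lambda_1| of a square matrix A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def is_stable(A):
    """Stability criterion from the slide: |lambda_1| < 1."""
    return spectral_radius(A) < 1.0

A_stable = np.array([[0.9, 0.1], [0.0, 0.8]])     # eigenvalues 0.9 and 0.8
A_unstable = np.array([[1.05, 0.0], [0.0, 0.5]])  # eigenvalues 1.05 and 0.5

# Noise-free simulation x_{t+1} = A x_t from the same starting state.
x_s = x_u = np.array([1.0, 1.0])
for _ in range(200):
    x_s, x_u = A_stable @ x_s, A_unstable @ x_u

print(is_stable(A_stable), np.linalg.norm(x_s))    # state norm decays toward 0
print(is_stable(A_unstable), np.linalg.norm(x_u))  # state norm keeps growing
```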

A Constraint Generation Approach to Learning
Stable Linear Dynamical Systems, S. Siddiqi, B.
Boots, G. Gordon, NIPS 2007
11
Stability

Learning stable LDS models allows us to
Compress large temporal multivariate datasets
Generate realistic data sequences
Predict the future given some data
Deviations from predicted data indicate anomalies
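
One way the last point might be operationalized is sketched below. The thresholding rule (baseline mean plus k standard deviations of the residual norm), the function name, and the synthetic data are all assumptions made for illustration; the slides do not specify a detection rule.

```python
import numpy as np

def flag_anomalies(Y_obs, Y_pred, baseline_len, k=3.0):
    """Flag time steps whose prediction residual is unusually large.

    The threshold (baseline mean + k * baseline std of the residual norm)
    is an illustrative rule, not one taken from the slides.
    """
    resid = np.linalg.norm(Y_obs - Y_pred, axis=1)
    base = resid[:baseline_len]
    threshold = base.mean() + k * base.std()
    return resid > threshold

rng = np.random.default_rng(1)
Y_pred = np.zeros((50, 3))                   # stand-in for model predictions
Y_obs = 0.1 * rng.standard_normal((50, 3))   # synthetic baseline observations
Y_obs[40] += 5.0                             # injected spike at t = 40
flags = flag_anomalies(Y_obs, Y_pred, baseline_len=30)
```

In practice Y_pred would come from the learned LDS rather than a constant, and the threshold would need tuning against a tolerable false-alarm rate.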

12
Experimental Setup

Data
OTC drug sales data for 22 categories in 29
Pittsburgh zip codes over 60 days
Track all zip codes for the cough/cold category
(multi-zipcode data)
Track all drug categories for the city of Pittsburgh
(multi-drug data)
Experiments
Learn an LDS model using the first 15 days, and
Simulate a sequence (qualitative task)
Reconstruct state sequence (quantitative task)
Predict future occurrences (quantitative task)
Algorithms
Constraint Generation (our method),
LB-1 (state of the art stability algorithm),
Least Squares (naïve, no stability guarantees)

Subspace identification with guaranteed
stability using constrained optimization, S. Lacy
and D. Bernstein, ACC 2002
13
Data Simulations

Instability causes Least-Squares simulations to
diverge
Constraint Generation yields most realistic
simulations that are also stable

14
State Reconstruction

Obtained by computing the residual Σ_t ||A x_t - x_{t+1}||^2,
where x_t are the estimated states
Least squares has the best score by definition,
since it is learned by regression on x_t → x_{t+1}, but
at the cost of instability

Squared error            Multi-drug data   Multi-zip data
Constraint Generation    57,338            26,171
LB-1                     60,669            26,431
Least Squares            56,203            16,918

Stable methods trade off reconstruction error vs.
stability
Constraint Generation learns the most accurate
models that are also stable
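
The least-squares baseline and the reconstruction residual from this slide can be sketched as follows. The state sequence here is a synthetic stand-in (the slides' states come from the OTC data), and the function names are hypothetical.

```python
import numpy as np

def fit_dynamics_least_squares(X):
    """Estimate A by regressing x_{t+1} on x_t (no stability guarantee).

    X has shape (T, n), one estimated state per row.
    """
    X0, X1 = X[:-1], X[1:]
    # Solve X0 @ A.T ~= X1 in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(X0, X1, rcond=None)
    return A_T.T

def reconstruction_error(A, X):
    """Residual sum_t ||A x_t - x_{t+1}||^2 over the state sequence X."""
    return float(np.sum((X[:-1] @ A.T - X[1:]) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((15, 3)).cumsum(axis=0)  # synthetic stand-in for state estimates
A_ls = fit_dynamics_least_squares(X)
A_other = 0.9 * A_ls                              # any other candidate dynamics matrix

# Least squares minimizes this residual, so no other matrix can score lower.
print(reconstruction_error(A_ls, X) <= reconstruction_error(A_other, X))
```

This is why the stable methods in the table cannot beat least squares on this metric: any constraint on A can only raise the residual or leave it unchanged.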

15
Prediction (preliminary results)

Average prediction error obtained by tracking
(filtering) up to time t, then simulating up to
time t' and calculating the sum of squared error,
and averaging this over all t and t' > t

Avg. squared error (std dev)   Multi-drug data   Multi-zip data
Constraint Generation          59,845 (310)      45,465 (317)
LB-1                           53,494 (364)      44,677 (266)
Least Squares                  79,649 (648)      n/a

Stable methods yield superior results to least
squares
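
A sketch of how such a prediction error might be computed is given below. It makes several assumptions that the slides do not confirm: a standard Kalman filter is used for tracking, the forward simulation is the noise-free expected trajectory rather than a random sample, and C, Q, R, x0, P0 and the synthetic data are illustrative names and values.

```python
import numpy as np

def kalman_filter(Y, A, C, Q, R, x0, P0):
    """Return filtered state means x_{t|t} for observations Y of shape (T, m)."""
    x, P = x0, P0
    means = []
    for y in Y:
        # Measurement update with observation y_t.
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (y - C @ x)
        P = P - K @ C @ P
        means.append(x)
        # Time update to the prior for t + 1.
        x = A @ x
        P = A @ P @ A.T + Q
    return np.array(means)

def avg_prediction_error(Y, A, C, Q, R, x0, P0):
    """Average squared error of predicting y_{t'} from data up to t, over all t < t'."""
    means = kalman_filter(Y, A, C, Q, R, x0, P0)
    T = len(Y)
    errors = []
    for t in range(T - 1):
        x = means[t]
        for t_prime in range(t + 1, T):
            x = A @ x  # noise-free roll-forward (expected state)
            errors.append(np.sum((C @ x - Y[t_prime]) ** 2))
    return float(np.mean(errors))

# Synthetic data from an illustrative stable model.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = rng.standard_normal((4, 2))
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(4)
x, Y = np.array([1.0, 1.0]), []
for _ in range(30):
    Y.append(C @ x + rng.multivariate_normal(np.zeros(4), R))
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
Y = np.array(Y)
err = avg_prediction_error(Y, A, C, Q, R, x0=np.zeros(2), P0=np.eye(2))
```

With an unstable A the roll-forward term grows with t' - t, which is one plausible reading of why least squares scores worst in the table.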

16
Conclusion

Linear Dynamical Systems effective at modeling
multivariate time series data
Stability crucial for accurate performance
Superior performance of stable methods in
baseline generation and prediction on OTC data
Constraint Generation learns a more accurate
model with more realistic simulations, most
efficiently. Further work needed on prediction
accuracy metric.

A Constraint Generation Approach to Learning
Stable Linear Dynamical Systems, S. Siddiqi, B.
Boots, G. Gordon, NIPS 2007
17
Thank You! Questions?

further questions to siddiqi_at_cs.cmu.edu
