Forecast, Detect, Intervene: Anomaly Detection for Time Series. - PowerPoint PPT Presentation

Slides: 43
Provided by: Yah946
1
Forecast, Detect, Intervene: Anomaly Detection
for Time Series.
  • Deepak Agarwal
  • Yahoo! Research

2
Outline
  • Approach
  • Forecast
  • Detect
  • Intervene
  • Monitoring multiple series
  • Multiple testing, a Bayesian solution.
  • Application

3
Issues
  • y_t: a univariate, regularly spaced time series,
    to be monitored prospectively for anomalies, novel
    events, and surprises.
  • E.g., query volume, hang-ups, ER admissions.
  • Goal: a semi-automated statistical approach
  • Forecast accurately: a good baseline model.
  • Detect deviations from the baseline
  • sensitivity/specificity/timeliness
  • Baseline model adaptive: learn changes
    automatically
  • Important in applications: better forecasts
    → fewer false positives

4
Approach
  • Three components (West and Harrison, 1976)
  • Forecast: a Bayesian version of the Kalman filter
  • Detection: a new sequential algorithm
  • Intervention: correct the baseline model

5
Forecasting
6
Kalman filter
  • Observation equation: conditional distribution of
    the data given the parameters
  • State equation: evolution of the parameters
    (states) through time
  • Posterior of the states, predictive distribution
  • Estimated online by a recursive algorithm

7
OBSERVATION EQUATION
8
STATE EQUATION: What assumptions are appropriate
for the Truth?
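The equations on these two slides were images and did not survive. For reference, the standard dynamic linear model form (West and Harrison notation) is sketched below, using x_t for the state and G_t for the evolution matrix as on the next slide; the regression vector F_t and the noise variances V_t, W_t are assumed symbols, not recovered from the slides.

```latex
\begin{aligned}
\text{Observation: } & y_t = F_t^{\top} x_t + v_t, & v_t &\sim N(0, V_t),\\
\text{State: }       & x_t = G_t\, x_{t-1} + w_t, & w_t &\sim N(0, W_t).
\end{aligned}
```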
9
More general models
[Figure: state-space diagram linking observations Y_{t-1}, Y_t to states X_{t-1}, X_t through the evolution matrix G_t]
10
Lemma: Bayes updating for the Gaussian model
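The lemma itself was not preserved. The standard conjugate normal result it most likely refers to, stated here for a scalar state as an assumption, is:

```latex
x \sim N(m, C),\quad y \mid x \sim N(x, V)
\;\Longrightarrow\;
y \sim N(m,\; C + V),\qquad
x \mid y \sim N\!\left(m + A\,(y - m),\; A\,V\right),\quad A = \frac{C}{C + V}.
```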
11
Kalman Filter update at time t
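The update slide's content was lost, so here is a minimal sketch of one Kalman filter step for the simplest case, a local-level (random walk plus noise) model with known variances V and W. The model choice, the class name `LocalLevelFilter`, and its fields are assumptions for illustration; the step is the scalar version of the Gaussian lemma above applied to the prior m, C + W.

```python
from dataclasses import dataclass


@dataclass
class LocalLevelFilter:
    """Kalman filter for an assumed local-level model:
    y_t = mu_t + v_t,       v_t ~ N(0, V)
    mu_t = mu_{t-1} + w_t,  w_t ~ N(0, W)
    """
    m: float  # posterior mean of the level after the last update
    C: float  # posterior variance of the level after the last update
    V: float  # observation variance, treated as known here
    W: float  # evolution variance, treated as known here

    def forecast(self) -> tuple[float, float]:
        """One-step-ahead predictive mean f_t and variance Q_t."""
        R = self.C + self.W  # prior variance of mu_t given data up to t-1
        return self.m, R + self.V

    def update(self, y: float) -> tuple[float, float, float]:
        """Absorb observation y; return (f_t, Q_t, forecast error e_t)."""
        R = self.C + self.W
        f, Q = self.forecast()
        e = y - f
        A = R / Q                  # Kalman gain (adaptive coefficient)
        self.m = self.m + A * e    # posterior mean moves toward y
        self.C = R * self.V / Q    # posterior variance, equals R - A**2 * Q
        return f, Q, e
```

The standardized forecast error e / sqrt(Q) returned by each step is the natural quantity to hand to a detector.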
12
Estimating Variance components
13
Detection
14
An existing method
15
(No Transcript)
16
Pitfalls of GS
  • What if the predictive distribution is not Gaussian?
  • E.g., mixtures of Gaussians, Poisson, etc.
  • Bayes factor: the alternative must be specified explicitly
  • Large number of unspecified parameters
  • Requires an explicit model for each alternative

17
Our approach
  • Normal scores derived from p-values
  • Good for continuous data, approximately good for
    discrete data, especially for large means.
  • A sequential procedure with far fewer tuning
    parameters.
  • Our method has more power; we sacrifice some
    timeliness.
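A minimal sketch of turning predictive p-values into normal scores. The Gaussian case is exact; for the discrete case it assumes a Poisson predictive and uses the mid-p value P(Y < y) + 0.5 P(Y = y), which is one common convention and not necessarily the one used in the talk. All function names are hypothetical.

```python
from math import exp, lgamma, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, maps p-values to z-scores
EPS = 1e-12                # keeps p strictly inside (0, 1) for inv_cdf


def to_normal_score(p: float) -> float:
    """Map a predictive p-value P(Y <= y) onto the N(0, 1) scale."""
    return STD_NORMAL.inv_cdf(min(max(p, EPS), 1.0 - EPS))


def gaussian_score(y: float, mean: float, var: float) -> float:
    """Continuous case: exact, reduces to (y - mean) / sd."""
    return to_normal_score(NormalDist(mean, var ** 0.5).cdf(y))


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for Y ~ Poisson(lam), computed on the log scale; lam > 0."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def poisson_score(y: int, lam: float) -> float:
    """Discrete case: mid-p value P(Y < y) + 0.5 * P(Y = y) (assumed convention)."""
    below = sum(poisson_pmf(k, lam) for k in range(y))
    return to_normal_score(below + 0.5 * poisson_pmf(y, lam))
```

For large Poisson means the score behaves much like a Gaussian z-score, which matches the slide's remark that the approximation is best for large means.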

18
Sequential detection procedure
  • At time t, we are in one of three regions:
  • Acceptance region (A): the null model is true;
    the system is behaving as expected, with no
    anomalies. Start a new run.
  • Rejection region (R): the null model is not
    true; an anomaly is generated, which is reported
    to the user, and/or the forecasting model is
    reset. Start a new run.
  • Continue (C): there is not enough data to reach a
    decision; keep accumulating evidence by taking
    another sample.
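The slides' own sequential algorithm was not preserved. To illustrate the three-region A/R/C logic, the sketch below swaps in Wald's sequential probability ratio test on normal scores, testing H0: z ~ N(0, 1) against a one-sided shift H1: z ~ N(delta, 1) and restarting the run after every decision. The function name and the values of `delta`, `alpha`, and `beta` are hypothetical.

```python
from math import log


def sprt_monitor(scores, delta=2.0, alpha=0.01, beta=0.10):
    """Wald SPRT on normal scores, restarted after every decision.

    Returns one label per time step: "A" (accept H0), "R" (reject H0,
    flag an anomaly), or "C" (continue sampling).
    """
    upper = log((1 - beta) / alpha)  # crossing it rejects H0
    lower = log(beta / (1 - alpha))  # crossing it accepts H0
    llr = 0.0                        # cumulative log-likelihood ratio of the current run
    labels = []
    for z in scores:
        llr += delta * z - delta ** 2 / 2  # log N(z; delta, 1) - log N(z; 0, 1)
        if llr >= upper:
            labels.append("R")
            llr = 0.0  # start a new run
        elif llr <= lower:
            labels.append("A")
            llr = 0.0  # start a new run
        else:
            labels.append("C")
    return labels
```

For example, `sprt_monitor([0.0, 0.0, 4.0])` continues, then accepts, then rejects on the large score. Detecting downward shifts would need a second, mirrored test.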

19
Detecting outliers and mean shifts
20
Detecting variance shifts
21
Gradual changes, autocorrelated errors.
22
 r    P      t
 3   .84   14.91
 4   .87    4.40
 5   .96    3.77
 6   .98    3.98
 7   .99    3.66
 8   .98    0.10
 9   .98    0.10
10   .85    0.09
11   .85    0.09
23
The sequential algorithm at time t
24
Blue: ours; red: Gargallo and Salvador (GS)
25
(No Transcript)
26
Intervention
27
Intervention to adjust the baseline.
  • Outlier → a tail (rare) event has occurred
  • Ignore the points → tail too short, more false
    positives
  • Use the points → elongated tail, more false
    negatives
  • A robust solution: ignore the points but elongate
    the tail
  • Retain the same prior mean, increase the prior
    variance.
  • The system adapts, re-initializing the monitor.
  • Use the above for mean shifts and variance
    increases.
  • Variance decrease: the system is stable, make the
    prior tight.
  • Slow changes: the system is under-adaptive, make
    the prior vague.
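A minimal sketch of the intervention rules above for a scalar level: the prior mean is kept and only the prior variance is widened or tightened. "Ignoring the points" means the caller simply skips the filter update for the flagged observation. The names `LevelPrior` and `intervene`, the event labels, and the `inflate`/`shrink` factors are hypothetical choices, not values from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelPrior:
    mean: float  # prior mean of the level for the next time step
    var: float   # prior variance of the level


def intervene(prior: LevelPrior, event: str,
              inflate: float = 10.0, shrink: float = 0.5) -> LevelPrior:
    """Adjust the prior after a detection; the factors are illustrative."""
    if event in ("outlier", "mean_shift", "variance_increase", "slow_change"):
        # Keep the mean, elongate the tail so the model can adapt quickly.
        return LevelPrior(prior.mean, prior.var * inflate)
    if event == "variance_decrease":
        # The system looks stable: tighten the prior.
        return LevelPrior(prior.mean, prior.var * shrink)
    return prior  # no intervention
```

Widening only the variance lets the next few observations dominate the posterior without committing to a guess about the new level.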

28
Intervention strategy
29
No intervention, m1
30
Strong intervention, m3
31
Example: blue is the data, yellow is the forecast.
32
Multiple testing
33
Multiple testing A Bayesian Approach.
  • Monitoring a large number of independent streams
  • Testing multiple hypotheses at each time point
  • Need correction for multiple testing.
  • Main idea:
  • Derive an empirical null based on the observed
    deviations
  • Present the analyst with interesting cases,
    adjusting for global characteristics of the system.
  • We use a Bayesian approach to derive shrinkage
    estimates of the deviations
  • The shrunk deviations automatically build in a
    penalty for conducting multiple tests.
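The exact Bayesian procedure is not recoverable from the transcript. As a simpler stand-in, the sketch below uses a single-component empirical-Bayes normal model: each standardized deviation z_i ~ N(theta_i, 1) with theta_i ~ N(0, tau2), where tau2 is estimated from all streams by the method of moments. The model and the function name are assumptions; it shows only how pooling across streams produces a built-in penalty.

```python
from statistics import fmean


def shrink_deviations(z: list[float]) -> list[float]:
    """Empirical-Bayes shrinkage of standardized deviations z_i.

    Assumed model: z_i | theta_i ~ N(theta_i, 1), theta_i ~ N(0, tau2).
    Marginally z_i ~ N(0, 1 + tau2), so mean(z_i**2) - 1 estimates tau2.
    """
    tau2 = max(0.0, fmean([v * v for v in z]) - 1.0)
    factor = tau2 / (1.0 + tau2)  # posterior mean of theta_i is factor * z_i
    return [factor * v for v in z]
```

Adding many pure-noise streams pulls the estimate of tau2 toward zero, so the same raw deviation is shrunk more; a fixed cutoff on the shrunk values therefore acts as a stricter cutoff on raw deviations as the number of streams grows.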

34
Bayesian procedure.
35
Experiment comparing the multiple-testing procedure versus the
naïve procedure (thresholding raw standardized
residuals)
  • Simulate K noise points from N(0,1) (K = 500, 1000, ...)
    and 100 signal points from [2, 11] ∪ [-11, -2].
  • Adjust the threshold of the Bayesian residuals to match
    the sensitivity of the naïve procedure.
  • Compute the false discovery rate (FDR) for both
    procedures.
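A minimal sketch of the simulation setup and of the naïve arm of the comparison. It assumes the signal points are drawn uniformly from [2, 11] with a random sign; the function names and the fixed seed are hypothetical, and the Bayesian arm is omitted because its exact form is not in the transcript.

```python
import random


def simulate(k_noise: int, n_signal: int = 100, seed: int = 0):
    """K N(0,1) noise points plus signals from [2, 11] U [-11, -2] (assumed uniform)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(k_noise)]
    signal = [rng.choice((-1, 1)) * rng.uniform(2.0, 11.0) for _ in range(n_signal)]
    is_signal = [False] * k_noise + [True] * n_signal
    return noise + signal, is_signal


def naive_flags(values, threshold: float):
    """Naive procedure: flag any raw standardized residual beyond the threshold."""
    return [abs(v) > threshold for v in values]


def false_discovery_proportion(flags, is_signal) -> float:
    """False discoveries / discoveries (0 when nothing is flagged)."""
    discoveries = sum(flags)
    false_hits = sum(f and not s for f, s in zip(flags, is_signal))
    return false_hits / discoveries if discoveries else 0.0
```

Averaging `false_discovery_proportion` over replications (100 in the talk) estimates the FDR for a given K.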

36
FDR of the naïve and Bayesian procedures. The
Bayesian method improves as the number of time
series increases.
Calculations are based on 100 replications. The
differences are statistically significant.
37
Application
38
Motivating Application (bio-surveillance).
  • Goal: find leading indicators of social
    disruption events in China before they are
    reported in the mainstream media.
  • Approach: monitor the occurrence of a set of
    predefined patterns on a collection of Chinese
    websites (mainly news sites, government sites, and
    portals similar to Yahoo!, located in eastern
    China).
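A minimal sketch of how pattern occurrences could be turned into daily count series, one per (site, pattern) pair, which can then be fed to the forecaster and detector. The input format, the pattern names, and the English keywords (stand-ins for the Chinese patterns) are all hypothetical; plain substring matching is a simplification (for example, "flu" also matches "influence").

```python
from collections import Counter

# Hypothetical pattern set: English stand-ins for the monitored Chinese patterns.
PATTERNS = {
    "respiratory": ["influenza", "flu", "pneumonia", "meningitis"],
    "emergency": ["emergency", "disaster", "crisis prevention", "quarantine"],
}


def daily_pattern_counts(pages, patterns=PATTERNS) -> Counter:
    """pages: iterable of (day, site, text) tuples (assumed input format).

    Returns a Counter keyed by (site, pattern_name, day) holding the number
    of pages that matched at least one keyword of that pattern.
    """
    counts = Counter()
    for day, site, text in pages:
        lowered = text.lower()
        for name, keywords in patterns.items():
            if any(k in lowered for k in keywords):
                counts[(site, name, day)] += 1
    return counts
```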

39
English translation of some Chinese patterns
being monitored
40
Notations and transformation.
41
Dotted solid lines: days when reports appeared in the
mainstream media. Dotted gray lines: days when our
system found spikes related to reports that
appeared later.
42
Rough validation using actual media reports.
  • July 24th: a mystery illness kills 17 people in
    China; we noticed several spikes on July 17th and
    18th alerting us to this.
  • Sept 29th and Dec 7th: on Sept 29th, news
    reports of China drawing up emergency plans to
    fight bird flu and prevent it from spreading to
    humans. On Dec 7th, a confirmed case of bird flu
    in humans was reported.
  • We reported several spikes on Sept 12th and 14th,
    and Nov 2nd, 7th, 11th, and 16th, mostly for the
    pattern "influenza, flu, pneumonia, meningitis". On
    Nov 21st, four big spikes on bf3.syd.com.cn for the
    patterns:
    • influenza, flu, pneumonia, meningitis
    • emergency, disaster, crisis prevention and
      quarantine.

43
Questions?