Transcript and Presenter's Notes

Title: High Performance Discovery from Time Series Streams


1
High Performance Discovery from Time Series Streams
Dennis Shasha, joint work with Yunyue Zhu
yunyue@cs.nyu.edu, shasha@cs.nyu.edu
Courant Institute, New York University
2
Overall Outline
  • Data mining, both classical and activist
  • Algorithmic tools for time series
  • Surprise.

3
Goal of this work
  • Time series are important in so many applications:
    biology, medicine, finance, music, physics, ...
  • A few fundamental operations occur all the time:
    burst detection, correlation, pattern matching.
  • Do them fast to make data exploration faster,
    real time, and more fun.
  • Extend functionality for music and science.

4
StatStream (VLDB 2002) Example
  • Stock price streams
  • The New York Stock Exchange (NYSE)
  • 50,000 securities (streams); 100,000 ticks (trade and quote)
  • Pairs Trading, a.k.a. Correlation Trading
  • Query: which pairs of stocks were correlated with a value of over 0.9 for the last three hours?

XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC.
5
Online Detection of High Correlation
  • Given tens of thousands of high-speed time series data streams, detect high-value correlations, both synchronized and time-lagged, over sliding windows in real time.
  • Real time:
  • high update frequency of the data stream
  • fixed response time, online

8
StatStream Algorithm
  • Naive algorithm (sketched below)
  • N: number of streams
  • w: size of sliding window
  • space O(N) and time O(N^2 w) vs. space O(N^2) and time O(N^2)
  • Suppose that the streams are updated every second.
  • With a Pentium 4 PC, the exact computing method can only monitor 700 streams with a delay of 2 minutes.
  • Our approach
  • Use the Discrete Fourier Transform to approximate correlation
  • Use a grid structure to filter out unlikely pairs
  • Our approach can monitor 10,000 streams with a delay of 2 minutes.
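
To make the naive cost concrete, here is a minimal Python sketch of the baseline: every pair of streams is re-correlated over the last w points on each update, which is where the O(N^2 w) time comes from. The stream count, window size, and toy random-walk data are illustrative, not the NYSE setting from the slides.

import numpy as np

def naive_correlated_pairs(streams, w, threshold=0.9):
    """Naive baseline: recompute Pearson correlation over the last w
    points for every pair of streams (O(N^2 * w) per update)."""
    recent = streams[:, -w:]                      # last w points of each stream
    pairs = []
    n = recent.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            r = np.corrcoef(recent[i], recent[j])[0, 1]
            if r > threshold:
                pairs.append((i, j, r))
    return pairs

# Example: 5 random-walk streams, window of 60 "seconds" (toy data)
rng = np.random.default_rng(0)
data = np.cumsum(rng.normal(size=(5, 300)), axis=1)
print(naive_correlated_pairs(data, w=60))
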

9
StatStream Stream synoptic data structure
  • Three-level time interval hierarchy
  • Time point, basic window, sliding window
  • Basic window (the key to our technique)
  • The computation for basic window i must finish by the end of basic window i+1
  • The basic window time is the system response time.
  • Digests: sum and first few DFT coefficients, kept per basic window and per sliding window (see the sketch below)
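
A minimal Python sketch of the digest bookkeeping described above: time points fill a basic window, each completed basic window is reduced to a small digest (sums and a few DFT coefficients), and the sliding window keeps only the most recent digests. Class and parameter names (StreamDigests, b, n_coefs) are illustrative, not taken from the StatStream code.

import numpy as np
from collections import deque

class StreamDigests:
    """Three-level hierarchy sketch: time points fill a basic window;
    each full basic window is reduced to a digest; a sliding window
    keeps the digests of its most recent basic windows."""

    def __init__(self, b=30, n_coefs=4, sliding_basic_windows=6):
        self.b = b
        self.n_coefs = n_coefs
        self.buffer = []                                     # current basic window (raw points)
        self.digests = deque(maxlen=sliding_basic_windows)   # sliding-window digests

    def append(self, value):
        self.buffer.append(value)
        if len(self.buffer) == self.b:                       # basic window complete
            x = np.asarray(self.buffer)
            digest = {
                "sum": x.sum(),
                "sum_sq": (x * x).sum(),
                "dft": np.fft.fft(x)[: self.n_coefs],        # first n DFT coefficients
            }
            self.digests.append(digest)
            self.buffer = []

s = StreamDigests()
for v in np.sin(np.arange(300) / 10.0):
    s.append(v)
print(len(s.digests), s.digests[-1]["sum"])
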

14
Synchronized Correlation Uses Basic Windows
  • Inner-product of aligned basic windows

(Figure: streams x and y, each divided into basic windows within a sliding window)

  • The inner-product within a sliding window is the sum of the inner-products in all the basic windows in the sliding window (a quick numeric check follows).
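
A quick numeric check of that decomposition, assuming the sliding window is an exact multiple of the basic window size:

import numpy as np

# The sliding-window inner product equals the sum of the inner products
# of the aligned basic windows.
rng = np.random.default_rng(1)
b, k = 30, 6                       # basic window size, number of basic windows
x = rng.normal(size=b * k)
y = rng.normal(size=b * k)

whole = np.dot(x, y)
by_basic_windows = sum(np.dot(x[i*b:(i+1)*b], y[i*b:(i+1)*b]) for i in range(k))
assert np.isclose(whole, by_basic_windows)
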

18
Approximate Synchronized Correlation
  • Approximate with an orthogonal function family (e.g. DFT)
  • Inner product of the time series ≈ inner product of the digests
  • The time and space complexity is reduced from O(b) to O(n).
  • b: size of basic window
  • n: size of the digests (n << b)
  • e.g. 120 time points reduce to 4 digests (a sketch follows)
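
A hedged sketch of the digest approximation: by Parseval's theorem and the conjugate symmetry of the DFT of real data, the inner product of two basic windows can be approximated from their first n DFT coefficients alone. The signal shapes and the choice b = 120, n = 4 are just the example sizes from the slide.

import numpy as np

def digest(x, n):
    """First n DFT coefficients of a basic window (the digest)."""
    return np.fft.fft(x)[:n]

def approx_inner_product(dx, dy, b):
    """Approximate dot(x, y) from digests alone, using Parseval's theorem
    and conjugate symmetry of the DFT of real signals:
      dot(x, y) ~ (X0*conj(Y0) + 2*Re(sum_{k>=1} Xk*conj(Yk))) / b
    """
    terms = dx * np.conj(dy)
    return (terms[0].real + 2.0 * terms[1:].real.sum()) / b

rng = np.random.default_rng(2)
b, n = 120, 4                       # e.g. 120 time points reduced to 4 digests
t = np.arange(b)
x = np.sin(t / 8.0) + 0.1 * rng.normal(size=b)
y = np.sin(t / 8.0 + 0.3) + 0.1 * rng.normal(size=b)

print(np.dot(x, y), approx_inner_product(digest(x, n), digest(y, n), b))
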
19
Approximate Lagged Correlation
  • Inner-product with unaligned windows
  • The time complexity is reduced from O(b) to O(n^2), as opposed to O(n) for synchronized correlation. Reason: cross terms for different frequencies are non-zero in the lagged case.

20
Grid Structure (to avoid checking all pairs)
  • The DFT coefficients yield a vector.
  • High correlation ⇒ closeness in the vector space
  • We can use a grid structure and look in the neighborhood; this returns a superset of the highly correlated pairs (see the sketch below).
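
A small Python sketch of the grid idea, assuming the digest vectors live in a low-dimensional Euclidean space: each vector is hashed to a grid cell, and only pairs in the same or adjacent cells become candidates. The cell size and data are illustrative; this is not the StatStream implementation.

import numpy as np
from collections import defaultdict
from itertools import product

def grid_candidates(vectors, cell_size):
    """Hash each digest vector to a grid cell; pairs in the same or
    neighboring cells form a superset of the truly close pairs."""
    cells = defaultdict(list)
    for idx, v in enumerate(vectors):
        cells[tuple(np.floor(v / cell_size).astype(int))].append(idx)

    candidates = set()
    for cell, members in cells.items():
        # look at this cell and all neighboring cells
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(neighbor, ()):
                    if i < j:
                        candidates.add((i, j))
    return candidates

rng = np.random.default_rng(3)
vecs = rng.normal(size=(100, 4))        # 4-dimensional digest vectors (toy data)
print(len(grid_candidates(vecs, cell_size=1.0)), "candidate pairs out of 4950")
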

21
Empirical Study: Speed
Our algorithm is parallelizable.
22
Empirical Study: Precision
  • Approximation errors
  • A larger digest size, a larger sliding window, and a smaller basic window give better approximation
  • The approximation errors are small for the stock data.

23
Sketches: Random Projection
  • Correlation between time series of stock returns
  • Since most stock price time series are close to random walks, their return time series are close to white noise
  • DFT/DWT can't capture approximately white noise series because there is no clear trend (too many frequency components).
  • Solution: sketches (a form of random landmark)
  • Sketch pool: a matrix of random variables drawn from a stable distribution
  • Sketches: the random projection of all time series to lower dimensions by multiplication with the same matrix
  • The Euclidean distance (correlation) between time series is approximated by the distance between their sketches, with a probabilistic guarantee (a sketch of the idea follows).
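
A minimal sketch of the random-projection idea: a single Gaussian (2-stable) matrix projects every series to a short vector, and Euclidean distances are approximately preserved. The sketch size k and the toy white-noise series are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(4)
length, k = 256, 32                 # time series length, sketch size (illustrative)

# Pool of random variables from a 2-stable (Gaussian) distribution;
# every stream is projected with the same matrix.
R = rng.normal(size=(k, length)) / np.sqrt(k)

def sketch(x):
    return R @ x

# White-noise-like return series, where a few DFT coefficients capture little.
x = rng.normal(size=length)
y = rng.normal(size=length)

true_dist = np.linalg.norm(x - y)
sketch_dist = np.linalg.norm(sketch(x) - sketch(y))
print(true_dist, sketch_dist)       # close with high probability
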

24
Burst Detection
25
Burst Detection Applications
  • Discovering intervals with unusually large numbers of events.
  • In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise. It might last milliseconds or days.
  • In telecommunications, if the number of packets lost within a certain time period exceeds some threshold, it might indicate a network anomaly. The exact duration is unknown.
  • In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).

26
Bursts across different window sizes in Gamma Rays
  • Challenge: to discover not only the time of the burst, but also its duration.

27
Elastic Burst Detection: Problem Statement
  • Problem: given a time series of positive numbers x1, x2, ..., xn, and a threshold function f(w), w = 1, 2, ..., n, find the subsequences of any size such that their sums are above the thresholds:
  • all 0 < w < n, 0 < m < n-w, such that xm + xm+1 + ... + xm+w-1 ≥ f(w)
  • Brute force search: O(n^2) time (a reference implementation follows)
  • Our shifted wavelet tree (SWT): O(n + k) time
  • k is the size of the output, i.e. the number of windows with bursts
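
A brute-force reference implementation of the problem statement, in Python; the toy data and the linear threshold function f(w) = 3w are made up for illustration.

def elastic_bursts_brute_force(x, f):
    """Report every window [m, m+w) whose sum reaches the threshold f(w).
    O(n^2) reference implementation of the problem statement above."""
    n = len(x)
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)
    bursts = []
    for w in range(1, n + 1):
        for m in range(0, n - w + 1):
            s = prefix[m + w] - prefix[m]
            if s >= f(w):
                bursts.append((m, w, s))
    return bursts

# Toy example: background of 1's with a short burst, threshold 3 per point.
data = [1, 1, 1, 9, 8, 1, 1, 1]
print(elastic_bursts_brute_force(data, f=lambda w: 3 * w))
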

28
Burst Detection: Data Structure and Algorithm
  • Define the threshold for a node of size 2^k to be the threshold for a window of size 1 + 2^(k-1) (a simplified sketch follows)
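
The following Python sketch conveys the filter-then-verify idea with half-overlapping windows of size 2^k: a node alarms when its sum reaches the smallest threshold among the window sizes it is responsible for, and only alarming nodes are searched in detail. The level bookkeeping here is a simplified variant chosen to avoid false negatives, not a transcription of the SWT paper.

import numpy as np

def swt_bursts(x, f):
    """Shifted-window filter in the spirit of the SWT: at level k, windows
    of size 2**k overlap by half, so any subsequence of length
    w <= 2**(k-1) + 1 lies inside some level-k window."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    window_sum = lambda m, w: prefix[m + w] - prefix[m]

    bursts = set()
    lo = 1                                        # smallest size this level must handle
    k = 1
    while lo <= n:
        size = 2 ** k                             # node size at level k
        hi = min(2 ** (k - 1) + 1, n)             # largest size guaranteed contained
        node_threshold = min(f(w) for w in range(lo, hi + 1))
        for start in range(0, n, 2 ** (k - 1)):   # half-overlapping nodes
            end = min(start + size, n)
            if window_sum(start, end - start) >= node_threshold:
                # detailed check inside the alarming node only
                for w in range(lo, hi + 1):
                    for m in range(start, end - w + 1):
                        if window_sum(m, w) >= f(w):
                            bursts.add((m, w))
        lo, k = hi + 1, k + 1
    return sorted(bursts)

data = [1, 1, 1, 9, 8, 1, 1, 1]
print(swt_bursts(data, f=lambda w: 3 * w))
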

29
Burst Detection Example
30
Burst Detection Example
(Figure: example showing a true alarm and a false alarm)
31
False Alarms (requires work, but no errors)
32
Empirical Study: Gamma Ray Burst
33
Extension to other aggregates
  • SWT can be used for any aggregate that is monotonic
  • SUM, COUNT and MAX are monotonically increasing
  • the alarm condition is aggregate ≥ threshold
  • MIN is monotonically decreasing
  • the alarm condition is aggregate ≤ threshold
  • Spread: MAX - MIN
  • Application in finance
  • Stocks with bursts of trading or quote (bid/ask) volume (Hammer!)
  • Stock prices with high spread

34
Empirical Study: Stock Price Spread Burst
35
Extension to high dimensions

36
Elastic Burst in two dimensions
  • Population Distribution in the US

37
How to find the threshold for Elastic Burst?
  • Suppose that the moving sum of a time series is a random variable from a normal distribution.
  • Let the number of bursts in the time series within sliding window size w be So(w) and its expectation be Se(w).
  • Se(w) can be computed from the historical data.
  • Given a threshold probability p, we set the burst threshold f(w) for window size w such that Pr[So(w) ≥ f(w)] ≤ p.

38
Find threshold for Elastic Bursts
  • F(x) is the normal CDF, which is symmetric around 0, so F^-1(p) = -F^-1(1-p)
  • Therefore the threshold f(w) for each window size can be obtained from the inverse CDF F^-1(p) (a sketch follows)

(Figure: the normal CDF F(x), with probability p and quantile F^-1(p) marked)
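
A hedged sketch of the threshold computation: fit a normal model to the historical moving sums for each window size and take the (1-p) quantile as f(w). The use of Python's NormalDist, the Poisson background data, and the value of p are assumptions for illustration.

import numpy as np
from statistics import NormalDist

def burst_thresholds(history, window_sizes, p=0.001):
    """For each window size w, model the historical moving sums as normal
    and set f(w) so that Pr[S(w) >= f(w)] is about p:
      f(w) = mu_w + sigma_w * Phi^-1(1 - p)."""
    history = np.asarray(history, dtype=float)
    thresholds = {}
    for w in window_sizes:
        sums = np.convolve(history, np.ones(w), mode="valid")  # all moving sums of size w
        mu, sigma = sums.mean(), sums.std()
        thresholds[w] = NormalDist(mu, sigma).inv_cdf(1.0 - p)
    return thresholds

rng = np.random.default_rng(5)
background = rng.poisson(4, size=10_000)          # e.g. background event counts per tick
print(burst_thresholds(background, window_sizes=[1, 2, 4, 8, 16]))
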
39
Summary
  • Able to detect bursts of many different durations
    in essentially linear time.
  • Can be used both for time series and for spatial
    searching.
  • Can specify thresholds either with absolute
    numbers or with probability of hit.
  • Algorithm is simple to implement and has low
    constants (code is available).
  • OK, it's embarrassingly simple.

40
With a Little Help From My Warped Correlation
  • Karen's humming: Match
  • Dennis's humming: Match
  • "What would you do if I sang out of tune?"
  • Yunyue's humming: Match

41
Related Work in Query by Humming
  • Traditional method: string matching
    (Ghias et al. 95, McNab et al. 97, Uitdenbogerd and Zobel 99)
  • Music represented by a string of pitch directions: U, D, S (a degenerate form of intervals)
  • The hum query is segmented into discrete notes, then into a string of pitch directions
  • Edit distance between the hum query and the music score
  • Problem
  • Very hard to segment the hum query
  • Partial solution: users are asked to hum articulately
  • New method: matching directly from audio (Mazzoni and Dannenberg 00)
  • Problem
  • Slowed down by DTW

42
Time Series Representation of Query
  • An example hum query: segment this!
  • Note segmentation is hard!

43
How to deal with poor hum queries?
  • No absolute pitch
  • Solution: the average pitch is subtracted
  • Incorrect tempo
  • Solution: uniform time warping
  • Inaccurate pitch intervals
  • Solution: return the k-nearest neighbors
  • Local timing variations
  • Solution: dynamic time warping

44
Dynamic Time Warping
  • Euclidean distance: sum of point-by-point distances
  • DTW distance: allows stretching or squeezing the time axis locally (a sketch follows)
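
A standard O(mn) dynamic-programming implementation of DTW, for reference; the warped sine example is made up to show DTW tolerating local timing differences that inflate the Euclidean distance.

import numpy as np

def dtw_distance(x, y):
    """Classic dynamic time warping with squared point distances:
    the time axis may be stretched or squeezed locally."""
    nx, ny = len(x), len(y)
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[nx, ny])

# A reference melody and a locally warped version of it: DTW stays small
# while the Euclidean distance between the equal-length series is larger.
t = np.linspace(0, 2 * np.pi, 60)
reference = np.sin(t)
query = np.sin(t + 0.4 * np.sin(t))               # locally warped version
print(np.linalg.norm(reference - query), dtw_distance(reference, query))
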

45
Envelope Transform using Piecewise Aggregate Approximation (PAA), Keogh VLDB 02
46
Envelope Transform using Piecewise Aggregate Approximation (PAA)
  • Advantage of tighter envelopes
  • Still no false negatives, and fewer false positives (a sketch of the PAA envelope follows)
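
A sketch of the envelope idea in the style of Keogh's LB_Keogh/LB_PAA bounds: build the warping envelope of one series, compress the envelope and the query with PAA, and obtain a cheap lower bound on the band-constrained DTW distance. Function names, the warping window r, and the segment count are illustrative.

import numpy as np

def envelope(y, r):
    """Warping envelope of y: U[i], L[i] are the max/min of y within r steps."""
    n = len(y)
    U = np.array([y[max(0, i - r): i + r + 1].max() for i in range(n)])
    L = np.array([y[max(0, i - r): i + r + 1].min() for i in range(n)])
    return U, L

def paa(x, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    return np.asarray(x).reshape(segments, -1).mean(axis=1)

def lb_paa(x, y, r, segments):
    """Keogh-style lower bound on the r-constrained DTW(x, y) using a PAA
    of x against a per-segment max/min of y's envelope: tighter envelopes
    keep the no-false-negative property while pruning more candidates."""
    U, L = envelope(y, r)
    seg = len(x) // segments
    U_hat = U.reshape(segments, -1).max(axis=1)
    L_hat = L.reshape(segments, -1).min(axis=1)
    x_hat = paa(x, segments)
    above = np.clip(x_hat - U_hat, 0, None)
    below = np.clip(L_hat - x_hat, 0, None)
    return np.sqrt(seg * np.sum(above ** 2 + below ** 2))

t = np.linspace(0, 2 * np.pi, 64)
x, y = np.sin(t), np.sin(t + 0.3)
print(lb_paa(x, y, r=5, segments=8))   # never exceeds the r-constrained DTW distance
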

47
Container Invariant Envelope Transform
  • Container-invariant: a transformation T for envelopes such that ...
  • Theorem: if a transformation is container-invariant and lower-bounding, then the distance between a transformed time series x and the transformed envelope of y lower-bounds their DTW distance.

48
The Vision
  • The ability to match time series quickly may open up entire new application areas, e.g. fast reaction to external events, music by humming, and so on.
  • Main problems: accuracy, excessive specification.
  • Reference (advert): High Performance Discovery in Time Series (Springer, 2004)