BRAID: Stream Mining through Group Lag Correlations


1
BRAID: Stream Mining through Group Lag Correlations
  • Yasushi Sakurai, Spiros Papadimitriou, Christos Faloutsos
  • SIGMOD 2005

2
Introduction
  • Lag correlations
  • For example
  • Higher amounts of fluoride in water → fewer
    dental cavities some years later
  • Goal
  • Monitor multiple numerical streams and determine
    which pairs are correlated with a lag, and the
    value of that lag

3
Introduction
  • Given k numerical sequences X_1, ..., X_k, report all
    pairs X_i and X_j such that X_i follows X_j with lag l

4
Introduction
5
Introduction
  • The paper proposes BRAID, which handles data streams
    of semi-infinite length
  • Any-time processing, and fast
  • Nimble
  • Accurate
  • Small resource consumption

6
Proposed method
  • Data stream X = {x_1, ..., x_t, ..., x_n}, where x_n is
    the most recent value
  • R(0): correlation of X and Y, which have the same
    length n, at zero lag
  • Correlation coefficient ρ

7
Proposed method
  • For lag l, consider only the common part of X and the
    shifted Y, i.e. n - l time ticks

8
Proposed method
9
Proposed method
  • R(l): correlation coefficient when X is delayed by l
  • Score at lag l
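The definition of R(l) can be sketched directly in code. This is a minimal illustration rather than the authors' implementation: it assumes that x_t is paired with y_(t-l), so that X follows Y, and it assumes that the score is the absolute value of R(l). The function names are hypothetical.

```python
import math

def lag_correlation(x, y, lag):
    """Correlation coefficient over the n - lag overlapping time ticks.

    Assumed convention: x_t is paired with y_(t - lag).
    """
    n = len(x)
    if len(y) != n or not 0 <= lag < n - 1:
        raise ValueError("need equal lengths and 0 <= lag < n - 1")
    xs = x[lag:]       # x_(l+1), ..., x_n
    ys = y[:n - lag]   # y_1, ..., y_(n-l)
    k = n - lag
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant window counts as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def score(x, y, lag):
    """Assumed definition of the score: the absolute value of R(lag)."""
    return abs(lag_correlation(x, y, lag))
```

If X is an exact copy of Y delayed by three ticks, `lag_correlation(x, y, 3)` compares identical windows, so the coefficient is 1 up to rounding.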

10
Proposed method
  • For large lags l ≈ n, the original and shifted time
    sequences overlap in too few time ticks for R(l) to
    be meaningful
  • Restrict the maximum lag m to n/2

11
Proposed method
  • Naive solution
  • At time n, access all values of X and Y and compute
    R(l) for every lag l = 0, 1, ...
  • Choose the earliest maximum score above the threshold
    r, or report that there is no lag correlation
  • The proposed solution is based on three major steps
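A sketch of the naive solution, under stated assumptions: the score is taken to be |R(l)|, the maximum lag is m = n/2 as on the previous slide, and "earliest maximum" is read as the first local maximum of the score that exceeds the threshold. The names `pearson`, `naive_lag` and `threshold` are hypothetical.

```python
import math

def pearson(a, b):
    """Correlation coefficient of two equal-length windows."""
    k = len(a)
    mean_a, mean_b = sum(a) / k, sum(b) / k
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant window counts as uncorrelated
    return cov / math.sqrt(var_a * var_b)

def naive_lag(x, y, threshold):
    """Return the earliest lag whose score is a local maximum above
    the threshold, or None if there is no such lag."""
    n = len(x)
    if len(y) != n or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    m = n // 2  # maximum lag
    # Every lag touches O(n) values, so the whole scan is O(n^2).
    scores = [abs(pearson(x[l:], y[:n - l])) for l in range(m + 1)]
    for l, s in enumerate(scores):
        left = scores[l - 1] if l > 0 else -1.0
        right = scores[l + 1] if l < m else -1.0
        if s > threshold and s >= left and s >= right:
            return l
    return None
```

The scan recomputes every window from scratch at each time tick, which is the cost that the sufficient statistics and the sparse lag probing are meant to avoid.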

12
Proposed method
  • Sufficient statistics are needed so that R can be
    computed easily
  • Sx(l,n): sum of X over a window of length n
  • Sxx(l,n): sum of squares of X over a window of length n
  • Sxy(l): sum of the products of X and Y shifted by lag l
    (their inner product)

13
Proposed method
  • R(l) is then obtained from the sufficient statistics
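The slide's formula did not survive in the transcript. One way to write the correlation coefficient in terms of such sums is given here; it is the standard algebraic identity for the Pearson coefficient, with all sums taken over the n - l overlapping ticks. The index convention (x_t paired with y_{t-l}) and the shortened notation S_x, S_y are assumptions and may differ from the paper's.

```latex
\[
R(l) \;=\;
\frac{S_{xy}(l) - \dfrac{S_x\,S_y}{n-l}}
     {\sqrt{S_{xx} - \dfrac{S_x^{2}}{n-l}}\;
      \sqrt{S_{yy} - \dfrac{S_y^{2}}{n-l}}},
\qquad
\begin{aligned}
S_x &= \sum_{t=l+1}^{n} x_t, &
S_{xx} &= \sum_{t=l+1}^{n} x_t^{2}, \\
S_y &= \sum_{t=1}^{n-l} y_t, &
S_{yy} &= \sum_{t=1}^{n-l} y_t^{2}, \\
S_{xy}(l) &= \sum_{t=l+1}^{n} x_t\, y_{t-l}.
\end{aligned}
\]
```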

14
Proposed method
  • R(l) can be estimated at any point in time; only five
    sufficient statistics need to be tracked
  • Computing the cross-correlation function between two
    sequences still takes linear time
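The incremental bookkeeping for a single fixed lag might look as follows. This is a sketch, not the paper's code: the class name `LagStats` is hypothetical, the five statistics are assumed to be the sums of x, y, x squared, y squared and x times shifted y, and the buffer of the last `lag + 1` values of Y is an implementation choice made here so that y_(n-l) is available when x_n arrives.

```python
import math
from collections import deque

class LagStats:
    """Tracks five running sums for one fixed lag."""

    def __init__(self, lag):
        self.lag = lag
        self.buf = deque(maxlen=lag + 1)  # most recent lag + 1 values of y
        self.k = 0                        # number of overlapping ticks so far
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x_new, y_new):
        """Consume one new value from each stream in O(1) time."""
        self.buf.append(y_new)
        if len(self.buf) <= self.lag:
            return  # not enough history yet: no overlap at this lag
        y_old = self.buf[0]  # y_(n - lag), the partner of x_n
        self.k += 1
        self.sx += x_new
        self.sy += y_old
        self.sxx += x_new * x_new
        self.syy += y_old * y_old
        self.sxy += x_new * y_old

    def correlation(self):
        """R(lag) computed from the running sums alone."""
        if self.k < 2:
            return 0.0
        var_x = self.sxx - self.sx ** 2 / self.k
        var_y = self.syy - self.sy ** 2 / self.k
        if var_x <= 0 or var_y <= 0:
            return 0.0  # assumption: a constant window counts as uncorrelated
        return (self.sxy - self.sx * self.sy / self.k) / math.sqrt(var_x * var_y)
```

Each update costs constant time for one lag, but tracking every lag from 0 to m this way still costs O(n) work per tick and needs a buffer as long as the largest lag.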

15
Proposed method
  • Propose to keep track of only a geometric progression
    of the lag values: l = 0, 1, 2, ..., 2^i, ...
  • Only O(log n) numbers to keep track of, instead of the
    O(n) that the naive solution requires
  • The space required still grows linearly with the
    length n
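The geometric progression of lags can be enumerated with a few lines. The function name `geometric_lags` is hypothetical.

```python
def geometric_lags(max_lag):
    """Lags 0, 1, 2, 4, ..., 2^i that do not exceed max_lag."""
    lags = [0]
    lag = 1
    while lag <= max_lag:
        lags.append(lag)
        lag *= 2
    return lags
```

For a maximum lag of 16 this yields 0, 1, 2, 4, 8, 16, so the number of tracked lags grows with log n rather than n.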

16
Proposed method
  • To compute R(l) at any time, a sliding window of size
    l must be kept; with m = n/2 this needs O(n) space
  • Instead of operating only on the original time
    sequences, also compute smoothed versions by
    averaging over non-overlapping windows

17
Proposed method
  • Window sizes are powers of 2
  • X: original time sequence
  • A_x^h: smoothed version of X with windows of length 2^h
  • A_x^0 is the original sequence, A_x^1 consists of n/2
    ticks, etc.
  • The sufficient statistics of A_x^h need to be updated
    only every 2^h time ticks
  • At time n, O(log n) levels are needed; the sufficient
    statistics are computed for each level
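The smoothing levels can be illustrated with a batch computation. This is only a sketch: a streaming version would update each level incrementally, and dropping an incomplete trailing window is a choice made here. The names `smooth` and `all_levels` are hypothetical.

```python
def smooth(x, h):
    """Level-h smoothed sequence: means over non-overlapping windows
    of length 2^h. An incomplete trailing window is dropped."""
    w = 2 ** h
    return [sum(x[i:i + w]) / w for i in range(0, len(x) - w + 1, w)]

def all_levels(x):
    """Smoothed sequences for every level whose window fits in x."""
    levels = []
    h = 0
    while 2 ** h <= len(x):
        levels.append(smooth(x, h))
        h += 1
    return levels
```

For a sequence of 8 values this produces 4 levels with 8, 4, 2 and 1 ticks, so a new value appears at level h only once every 2^h time ticks.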

18
(No Transcript)
19
Proposed method
  • In contrast with small lags, the larger lags are
    probed only sparsely
  • Use a cubic spline to interpolate the missing
    correlation coefficients
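A self-contained sketch of cubic-spline interpolation between probed lags. The transcript does not say which boundary conditions the authors use; natural boundary conditions (zero second derivative at both ends) are an assumption here, and the function name is hypothetical.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function that interpolates the points (xs, ys).

    xs must be strictly increasing. Natural boundary conditions are
    assumed: the second derivative is zero at both end knots.
    """
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two knots and matching lengths")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives M[1..n-1],
    # solved by forward elimination and back substitution.
    cp = [0.0] * (n + 1)
    dp = [0.0] * (n + 1)
    for i in range(1, n):
        d = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        denom = 2.0 * (h[i - 1] + h[i]) - h[i - 1] * cp[i - 1]
        cp[i] = h[i] / denom
        dp[i] = (d - h[i - 1] * dp[i - 1]) / denom
    M = [0.0] * (n + 1)
    for i in range(n - 1, 0, -1):
        M[i] = dp[i] - cp[i] * M[i + 1]

    def evaluate(x):
        # Locate the interval [xs[i], xs[i + 1]] that contains x.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        left, right = xs[i + 1] - x, x - xs[i]
        return ((M[i] * left ** 3 + M[i + 1] * right ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6.0) * left
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * right)

    return evaluate
```

With the probed lags 0, 1, 2, 4, 8, 16 as knots and the corresponding correlation estimates as values, the returned function can be evaluated at an unprobed lag such as 3 or 12.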

20
Proposed method
  • A_x^h(t): window average at time tick t for level h
  • A_x^0(t) = x_t

21
Proposed method
  • Sufficient statistics

22
(No Transcript)
23
Enhanced BRAID
  • For two sequences of length 2^20, this requires about
    5 log 2^20 = 5 × 20 = 100 floating-point numbers,
    about 800 bytes
  • When more memory is available, propose a solution
    that probes more lags but still uses O(log n) space
  • Use a mix of arithmetic and geometric probing
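The memory estimate can be checked with a few lines of arithmetic. The 8 bytes per number is an assumption (double precision) that matches the slide's total of about 800 bytes.

```python
import math

n = 2 ** 20                    # sequence length
levels = int(math.log2(n))     # one set of statistics per level
numbers = 5 * levels           # five sufficient statistics per level
bytes_needed = numbers * 8     # assumption: 8-byte floating-point numbers
```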

24
Enhanced BRAID
  • BRAID uses only one window at each smoothing level
  • Propose using b > 1 windows instead, e.g. b = 4
  • The earlier algorithm is the special case b = 1, with
    the exception that the bottom level has 2b coefficients
  • While computing R(l), use a mixture of geometric and
    arithmetic progressions of lags
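One plausible reading of the mixed probing scheme is sketched here. The exact lag set is not given in the transcript, so the rule used is an assumption: the bottom level probes 2b consecutive lags, and each higher level h probes b lags spaced 2^h apart, starting at b·2^h. The function name is hypothetical.

```python
def enhanced_lags(b, max_lag):
    """Lags probed under the assumed rule, up to max_lag."""
    # Bottom level: 2b consecutive lags (arithmetic progression, step 1).
    lags = list(range(min(2 * b, max_lag + 1)))
    h = 1
    # Higher levels: the starting lag doubles per level (geometric),
    # and within a level the lags advance in steps of 2^h (arithmetic).
    while b * 2 ** h <= max_lag:
        for j in range(b):
            lag = (b + j) * 2 ** h
            if lag <= max_lag:
                lags.append(lag)
        h += 1
    return lags
```

Under this rule, b = 4 with a maximum lag of 32 gives 0 to 7, then 8, 10, 12, 14, then 16, 20, 24, 28, then 32, while b = 1 reduces to the purely geometric set 0, 1, 2, 4, 8, 16.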

25
Enhanced BRAID
  • Example of enhanced BRAID with b = 4
  • With b = 1, the enhanced algorithm reduces to the
    earlier algorithm

26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
Conclusion
  • Proposed BRAID to detect lag correlations on
    streaming data
  • At any time
  • Low resource consumption
  • High accuracy

32
Thank you very much