# BRAID: Stream Mining through Group Lag Correlations



Transcript and Presenter's Notes


1
BRAID: Stream Mining through Group Lag Correlations
• Yasushi Sakurai, Spiros Papadimitriou, Christos Faloutsos
• SIGMOD 2005

2
Introduction
• Lag correlations
• For example:
• Higher amounts of fluoride in water → fewer dental cavities some years later
• Goal
• Monitor multiple numerical streams and determine which pairs are correlated with a lag, and the value of that lag

3
Introduction
• Given k numerical sequences $X_1, \ldots, X_k$, report all pairs $X_i$ and $X_j$ such that $X_i$ follows $X_j$ with lag $l$

4
Introduction
5
Introduction
• This paper proposes BRAID, which handles data streams of semi-infinite length
• Any-time processing, and fast
• Nimble
• Accurate
• Small resource consumption

6
Proposed method
• Data stream $X = \{x_1, \ldots, x_t, \ldots, x_n\}$, where $x_n$ is the most recent value
• $R(0)$: correlation of $X$ and $Y$, which have the same length $n$ and zero lag
• $\rho$: correlation coefficient
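
The formula on this slide did not survive in the transcript. As a minimal sketch, assuming the coefficient meant here is the standard Pearson correlation coefficient, the zero-lag case could be computed as follows. The function name `correlation` and the choice to return 0.0 for a constant sequence are assumptions made for illustration.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences (zero lag)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant sequence as uncorrelated
    return cov / math.sqrt(var_x * var_y)
```

Two sequences that rise and fall together give a value close to 1, and sequences that move in opposite directions give a value close to -1.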

7
Proposed method
• For lag $l$, consider only the common part of $X$ and the shifted $Y$, which spans $n - l$ time ticks

8
Proposed method
9
Proposed method
• $R(l)$: correlation coefficient when $X$ is delayed by $l$
• Score at lag l
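
The formulas for R(l) and the score are missing from the transcript. The sketch below assumes that "X is delayed by l" means x[t] is paired with y[t - l] over the n - l common ticks, and that the score is the absolute value of R(l). Both are assumptions, and the function names are illustrative.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant sequences count as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def lag_correlation(x, y, lag):
    """R(lag): correlate x[t] with y[t - lag] over the n - lag common ticks."""
    n = len(x)
    if len(y) != n or lag < 0 or n - lag < 2:
        raise ValueError("need equal lengths and at least two overlapping ticks")
    return correlation(x[lag:], y[:n - lag])


def score(x, y, lag):
    """Assumed score: the absolute value of R(lag)."""
    return abs(lag_correlation(x, y, lag))
```

If X is an exact copy of Y delayed by three ticks, `score(x, y, 3)` comes out at approximately 1.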

10
Proposed method
• $R(l)$: for large values of the lag $l$ (close to $n$), the original and shifted time sequences have too few overlapping time ticks
• Restrict the maximum lag $m$ to be $n/2$

11
Proposed method
• Naive solution
• At time $n$, access all values of $X$ and $Y$ and compute $R(l)$ for every lag $l = 0, 1, \ldots$
• Choose the earliest maximum of the score that is above the threshold r, or report no lag
• The solution is based on three major steps
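
The naive solution above can be sketched as a full scan over all lags up to n/2. This sketch reads "earliest max score" as the earliest local maximum of |R(l)| that exceeds the threshold, which is an interpretation rather than something the transcript states. `demo_streams` is a hypothetical helper that only generates synthetic test data.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def naive_lag(x, y, threshold):
    """Scan every lag 0..n/2 and return the earliest local maximum of the
    score above `threshold`, or None if there is no such lag."""
    n = len(x)
    max_lag = n // 2
    # Score at lag l: |correlation| of x[t] with y[t - l] on the common part.
    scores = [abs(correlation(x[l:], y[:n - l])) for l in range(max_lag + 1)]
    for l, s in enumerate(scores):
        left = scores[l - 1] if l > 0 else float("-inf")
        right = scores[l + 1] if l < max_lag else float("-inf")
        if s > threshold and s >= left and s >= right:
            return l
    return None


def demo_streams(n, lag, seed):
    """Hypothetical helper: random y, and x equal to y delayed by `lag` ticks."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.0] * lag + y[:n - lag]
    return x, y
```

Each call touches all n values for each of the n/2 lags, which is the cost the rest of the deck sets out to reduce.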

12
Proposed method
• Need some sufficient statistics so that $R$ can be computed easily
• $S_x(l,n)$: sum of $X$ over a window of length $n$
• $S_{xx}(l,n)$: sum of squares of $X$ over a window of length $n$
• $S_{xy}(l)$: inner product of $X$ and the shifted $Y$

13
Proposed method
• R(l) is obtained
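
The slide's formula is not in the transcript. A minimal sketch, assuming the usual identity that expresses a correlation coefficient through sums, sums of squares, and an inner product over the n - l overlapping ticks, is shown here. The function names and the argument order are illustrative.

```python
import math


def stats_for_lag(x, y, lag):
    """Collect the count and the five sums over the common part of x and shifted y."""
    xs = x[lag:]            # x_{l+1} .. x_n
    ys = y[:len(y) - lag]   # y_1 .. y_{n-l}
    return (
        len(xs),
        sum(xs),                              # Sx
        sum(ys),                              # Sy
        sum(a * a for a in xs),               # Sxx
        sum(b * b for b in ys),               # Syy
        sum(a * b for a, b in zip(xs, ys)),   # Sxy
    )


def r_from_stats(count, sx, sy, sxx, syy, sxy):
    """Correlation coefficient computed only from the sufficient statistics."""
    cov = sxy - sx * sy / count
    var_x = sxx - sx * sx / count
    var_y = syy - sy * sy / count
    if var_x <= 0 or var_y <= 0:
        return 0.0  # assumption: degenerate (constant) window
    return cov / math.sqrt(var_x * var_y)
```

For example, with `y = [2, 1, 4, 3, 6, 5]` and `x = [0, 3, 2, 5, 4, 7]`, each x value from the second tick on equals the previous y value plus one, so `r_from_stats(*stats_for_lag(x, y, 1))` evaluates to 1 up to rounding.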

14
Proposed method
• $R(l)$ can be estimated at any point in time; we only need to keep track of five sufficient statistics
• It still needs linear time to compute the
cross-correlation function between two sequences
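
To illustrate the any-time property for a single fixed lag, here is a hedged sketch of an incremental tracker. The class name `LagTracker` and its buffer of the last lag + 1 values of Y are assumptions for illustration, not the paper's data structure.

```python
import math
from collections import deque


class LagTracker:
    """Maintains the five sums for one fixed lag as new (x, y) pairs arrive."""

    def __init__(self, lag):
        self.lag = lag
        self.recent_y = deque(maxlen=lag + 1)  # holds y_{t-lag} .. y_t
        self.count = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x_t, y_t):
        self.recent_y.append(y_t)
        if len(self.recent_y) < self.lag + 1:
            return  # no y value is old enough to pair with x_t yet
        y_old = self.recent_y[0]  # the y value from `lag` ticks ago
        self.count += 1
        self.sx += x_t
        self.sy += y_old
        self.sxx += x_t * x_t
        self.syy += y_old * y_old
        self.sxy += x_t * y_old

    def correlation(self):
        """Current estimate of R(lag); 0.0 while there is too little data."""
        if self.count < 2:
            return 0.0
        cov = self.sxy - self.sx * self.sy / self.count
        var_x = self.sxx - self.sx ** 2 / self.count
        var_y = self.syy - self.sy ** 2 / self.count
        if var_x <= 0 or var_y <= 0:
            return 0.0
        return cov / math.sqrt(var_x * var_y)
```

Each update costs constant time for one lag, but covering every lag up to n/2 this way still means linear work per time tick and a buffer whose size grows with the lag.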

15
Proposed method
• Propose to keep track of only a geometric progression of the lag values: $l = 0, 1, 2, \ldots, 2^i, \ldots$
• Only $O(\log n)$ numbers to keep track of, instead of the $O(n)$ that the naive solution requires
• However, the space required still grows linearly with the length $n$
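
A small sketch of the geometric progression of lags, assuming the maximum lag is n/2 as stated earlier in the deck. The function name is illustrative.

```python
def geometric_lags(n):
    """Lags 0, 1, 2, 4, ..., 2^i that do not exceed the maximum lag n/2."""
    max_lag = n // 2
    lags = [0]
    step = 1
    while step <= max_lag:
        lags.append(step)
        step *= 2
    return lags
```

For a sequence of length 64 this gives `[0, 1, 2, 4, 8, 16, 32]`, and for length 2^20 it gives 21 lags instead of more than half a million.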

16
Proposed method
• To compute $R(l)$ at any time, we must keep a sliding window of size $l$; with maximum lag $m = n/2$ this needs $O(n)$ space
• Instead of operating only on the original time sequences, also compute their smoothed versions by averaging over non-overlapping windows

17
Proposed method
• Window sizes are powers of 2
• $X$: original time sequence
• $A_x^h$: smoothed version of $X$ with windows of length $2^h$
• $A_x^0$ is the original sequence, $A_x^1$ consists of $n/2$ ticks, etc.
• The sufficient statistics of $A_x^h$ need to be computed only every $2^h$ time ticks
• At time $n$, $O(\log n)$ levels are needed; for each level, compute the sufficient statistics
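
The smoothing levels can be sketched as averages over non-overlapping windows of length 2^h. This batch version is for illustration only and assumes that any trailing values that do not fill a whole window are ignored. A streaming implementation would instead update each level once every 2^h ticks.

```python
def smooth(x, level):
    """Level-h smoothed sequence: means of non-overlapping windows of length 2^h."""
    window = 2 ** level
    usable = len(x) - len(x) % window  # drop an incomplete trailing window
    return [sum(x[i:i + window]) / window for i in range(0, usable, window)]


def all_levels(x):
    """Smoothed versions for every level that still has at least one window."""
    levels = []
    h = 0
    while 2 ** h <= len(x):
        levels.append(smooth(x, h))
        h += 1
    return levels
```

Level 0 reproduces the original sequence, level 1 has n/2 values, and a sequence of length n yields about log2(n) + 1 levels.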

18
(No Transcript)
19
Proposed method
• In contrast to the small lags, the larger lags are probed sparsely
• Use a cubic spline to interpolate the missing correlation coefficients
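
The transcript does not say which kind of cubic spline is used. As one possible sketch, a natural cubic spline (second derivative zero at both ends, an assumption) can be written in plain Python and evaluated at lags that were not probed. The probed values in the example are made up for illustration and are not results from the paper.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function that interpolates (xs, ys) with a natural cubic spline."""
    n = len(xs)
    if n < 3 or len(ys) != n:
        raise ValueError("need at least three knots with matching values")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the second derivatives at the interior knots.
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n - 1)]
    for k in range(1, n - 2):          # forward elimination
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    inner = [0.0] * (n - 2)
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(n - 4, -1, -1):     # back substitution
        inner[k] = (rhs[k] - h[k + 1] * inner[k + 1]) / diag[k]
    m = [0.0] + inner + [0.0]          # natural boundary conditions

    def evaluate(x):
        # Locate the interval; points outside the knots use the end segments.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        a, b = xs[i + 1] - x, x - xs[i]
        return (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i]) \
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a \
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b

    return evaluate


# Illustrative values only (not from the paper): coefficients at probed lags.
probed_lags = [0, 1, 2, 4, 8, 16]
probed_r = [0.10, 0.25, 0.45, 0.80, 0.60, 0.05]
r_at = natural_cubic_spline(probed_lags, probed_r)
estimate_at_lag_6 = r_at(6)  # a lag between the probes at 4 and 8
```

The spline passes exactly through the probed coefficients, so interpolation only affects the lags in between.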

20
Proposed method
• $A_x^h(t)$: window average at time tick $t$ for level $h$
• $A_x^0(t) = x_t$

21
Proposed method
• Sufficient statistics

22
(No Transcript)
23
Enhanced BRAID
• For two sequences of length $2^{20}$, BRAID requires about $5 \log_2 2^{20} = 5 \times 20 = 100$ floating-point numbers, about 800 bytes
• When more memory is available, propose a solution that probes more lags but still uses $O(\log n)$ space
• Use a mix of arithmetic and geometric probing

24
Enhanced BRAID
• BRAID uses only one window at each smoothing level
• Propose using $b > 1$ windows instead, e.g. $b = 4$
• The algorithm described before corresponds to $b = 1$, with the exception that the bottom level has $2b$ coefficients
• When computing $R(l)$, use a mixture of geometric and arithmetic progressions of lags
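
The exact probing pattern is shown only on slides without a transcript. The sketch below is one plausible reading, stated as an assumption: the bottom level probes the 2b lags 0..2b-1 arithmetically, and each higher level h probes b lags spaced 2^h apart, starting at b * 2^h. The function name and the `max_lag` parameter are illustrative.

```python
def probed_lags(b, max_lag):
    """Assumed mixed arithmetic/geometric probing with b windows per level."""
    # Bottom level: 2b consecutive lags (arithmetic progression with step 1).
    lags = list(range(min(2 * b, max_lag + 1)))
    level = 1
    while True:
        step = 2 ** level  # spacing doubles at each level (geometric part)
        added = False
        for j in range(b, 2 * b):  # b lags per level (arithmetic part)
            lag = j * step
            if lag > max_lag:
                break
            lags.append(lag)
            added = True
        if not added:
            break
        level += 1
    return lags
```

Under this reading, b = 4 probes 0 through 7, then 8, 10, 12, 14, then 16, 20, 24, 28, and so on, while b = 1 falls back to the purely geometric lags 0, 1, 2, 4, 8, and so on.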

25
Enhanced BRAID
• Example of enhanced BRAID with $b = 4$
• With $b = 1$, the enhanced algorithm is equal to the algorithm described before

26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
Conclusion
• Proposed BRAID to detect lag correlations on streaming data
• At any time
• Low resource consumption
• High accuracy

32
Thank you very much