We have seen the Classic Time Series Data Mining Tasks

Transcript and Presenter's Notes
1
We have seen the Classic Time Series Data Mining Tasks
Clustering
Classification
Query by Content
2
Let's look at some problems which are more interesting:
Novelty Detection

Rule Discovery
Motif Discovery
[Figure: an example rule, annotated with support s = 0.5 and confidence c = 0.3]
3
Novelty Detection
Also known as: fault detection, interestingness detection, anomaly detection, surprisingness detection.

4
Note that this problem should not be confused with the relatively simple problem of outlier detection. Remember Hawkins' famous definition of an outlier:
"... an outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated from a different mechanism..."
Thanks Doug, the check is in the mail. We are not interested in finding individually surprising datapoints; we are interested in finding surprising patterns.
Douglas M. Hawkins
5
Lots of good folks have worked on this and closely related problems. It is referred to as the detection of Aberrant Behavior [1], Novelties [2], Anomalies [3], Faults [4], Surprises [5], Deviants [6], Temporal Change [7], and Outliers [8].
  1. Brutlag, Kotsakis et al.
  2. Dasgupta et al., Borisyuk et al.
  3. Whitehead et al., Decoste
  4. Yairi et al.
  5. Shahabi, Chakrabarti
  6. Jagadish et al.
  7. Blockeel et al., Fawcett et al.
  8. Hawkins.

6
Arrr... what be wrong with current approaches?

The blue time series at the top is a normal healthy human electrocardiogram with an artificial flatline added. The sequence in red at the bottom indicates how surprising local subsections of the time series are under the measure introduced in Shahabi et al.
7
Simple Approaches I
Limit Checking
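To make the idea concrete, here is a minimal Python sketch of limit checking: every point that falls outside fixed lower/upper limits is flagged. The thresholds and data are illustrative, not from the talk.

import_free = True  # no imports needed for this sketch

def limit_check(series, lower, upper):
    """Return the indices of points outside the fixed limits."""
    return [i for i, v in enumerate(series) if v < lower or v > upper]

# Example: a flat signal with one spike.
signal = [0.1, 0.2, 0.1, 9.0, 0.2]
print(limit_check(signal, lower=-1.0, upper=1.0))  # [3]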
8
Simple Approaches II
Discrepancy Checking
[Figure: a time series with its discrepancies flagged; y-axis -10 to 35, x-axis 0 to 1000]
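Discrepancy checking compares each observation against a prediction from a model of normal behavior, and flags large residuals. A minimal Python sketch, assuming a simple moving-average predictor (our choice; the slide does not specify the model):

import statistics

def discrepancy_check(series, window=10, threshold=3.0):
    """Flag points whose deviation from a moving-average prediction
    exceeds `threshold` standard deviations of the recent history."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        predicted = statistics.mean(history)
        spread = statistics.stdev(history) or 1e-9  # guard against zero spread
        if abs(series[i] - predicted) / spread > threshold:
            flagged.append(i)
    return flagged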
9
Our Solution
It is based on the following intuition: a pattern is surprising if its frequency of occurrence is greatly different from that which we expected, given previous experience.
This is a nice intuition, but useless unless we can more formally define it, and calculate it efficiently.
10
Note that unlike all previous attempts to solve this problem, our notion of the surprisingness of a pattern is not tied exclusively to its shape. Instead it depends on the difference between the shape's expected frequency and its observed frequency. For example, consider the familiar "head and shoulders" pattern shown below...
The existence of this pattern in a stock market time series should not be considered surprising, since such patterns are known to occur (even if only by chance). However, if it occurred ten times this year, as opposed to occurring an average of twice a year in previous years, our measure of surprise will flag the shape as being surprising. Cool eh? The pattern would also be surprising if its frequency of occurrence is less than expected. Once again our definition would flag such patterns.
11
We call our algorithm Tarzan!
Tarzan is not an acronym. It is a pun on the fact that the heart of the algorithm relies on comparing two suffix trees, tree to tree!
Homer, I hate to be a fuddy-duddy, but could you put on some pants?
12
We begin by defining some terms. Professor Frink?
Definition 1: A time series pattern P, extracted from database X, is surprising relative to a database R if the probability of its occurrence is greatly different from that expected by chance, assuming that R and X are created by the same underlying process.
13
Definition 1: A time series pattern P, extracted from database X, is surprising relative to a database R if the probability of its occurrence is greatly different from that expected by chance, assuming that R and X are created by the same underlying process.
But you can never know the probability of a pattern you have never seen! And probability isn't even defined for real-valued time series!
14
We need to discretize the time series into
symbolic strings
UUFDUUFDDUUD
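As a rough illustration, one plausible way to produce such a string is to compare consecutive points and emit U (up), D (down) or F (flat) depending on the step; reading the slide's string as U/D/F symbols is our assumption, as is the flatness tolerance.

def discretize(series, flat_tol=0.01):
    """Map each step of the series to U (up), D (down) or F (flat)."""
    symbols = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        if delta > flat_tol:
            symbols.append('U')
        elif delta < -flat_tol:
            symbols.append('D')
        else:
            symbols.append('F')
    return ''.join(symbols)

print(discretize([0, 1, 2, 2, 1]))  # 'UUFD'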
15
I've prepared a little math background...
  • We use Σ to denote a nonempty alphabet of symbols.
  • A string over Σ is an ordered sequence of symbols from the alphabet.
  • Given a string x, the number of symbols in x defines the length |x| of x. We assume |x| = n.
  • Let us decompose a text x as x = uvw, where u, v, and w are strings over Σ.
  • Strings u, v, and w are substrings, or words, of x.
  • u is called a prefix of x.
  • w is called a suffix of x.

16
  • We write xi, 1 ≤ i ≤ |x|, to indicate the ith symbol in x.
  • We use xi,j as shorthand for the substring xi xi+1 ... xj, where 1 ≤ i ≤ j ≤ n, with the convention that xi,i = xi.
  • Substrings of the form x1,j correspond to the prefixes of x.
  • Substrings of the form xi,n correspond to the suffixes of x.
  • We say that a string y has an occurrence at position i of a text x if y1 = xi, y2 = xi+1, ..., ym = xi+m-1, where m = |y|.
  • For any substring y of x, we denote by fx(y) the number of occurrences of y in x.

17
If x = principalskinner, then Σ = {a, c, e, i, k, l, n, p, r, s} and |x| = 16.
skin is a substring of x, prin is a prefix of x, and ner is a suffix of x.
If y = in, then fx(y) = 2. If y = pal, then fx(y) = 1.
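Note that fx(y) counts possibly overlapping occurrences, which a naive string search may miss. A tiny Python sketch reproducing the example:

def f(x, y):
    """Number of (possibly overlapping) occurrences of y in x."""
    return sum(1 for i in range(len(x) - len(y) + 1) if x[i:i + len(y)] == y)

x = 'principalskinner'
print(f(x, 'in'))   # 2
print(f(x, 'pal'))  # 1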
18
We consider a string generated by a stationary Markov chain of order M ≥ 1 on the finite alphabet Σ. Let x = x1 x2 ... xn be an observation of the random process, and y = y1 y2 ... ym an arbitrary but fixed pattern over Σ with m < n.
The stationary Markov chain is completely determined by its transition matrix Π = (p(y1,M, c)), where the transition probabilities p(y1,M, c) = P(Xi+1 = c | Xi-M+1,i = y1,M), with y1, ..., yM, c ∈ Σ and M ≤ i ≤ n-1. The vector of the stationary probabilities π of a stationary Markov chain with transition matrix Π is defined as the solution of π = πΠ.
We now introduce the random variable that describes the occurrences of the word y. We define Zi, 1 ≤ i ≤ n-m+1, to be 1 if y occurs in x starting at position i, and 0 otherwise. We set Zy = Z1 + Z2 + ... + Z(n-m+1), so that Zy is the random variable for the total number of occurrences fx(y).
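As a concrete illustration of the transition probabilities, here is a small Python sketch that estimates a first-order (M = 1) transition matrix from an observed string by maximum-likelihood counting. The estimator and example string are our choices, not the slides'.

from collections import Counter, defaultdict

def transition_probs(x):
    """Estimate first-order transition probabilities p(a, c)
    from an observed string x by maximum likelihood."""
    pair_counts = Counter(zip(x, x[1:]))  # counts of consecutive symbol pairs
    from_counts = Counter(x[:-1])         # counts of each "from" symbol
    probs = defaultdict(dict)
    for (a, c), n in pair_counts.items():
        probs[a][c] = n / from_counts[a]
    return probs

print(transition_probs('UUFDUUFDDUUD'))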
19
We can now estimate the probability of seeing a
pattern, even if we have never seen it before!
John Smith, Cen Lee, John Jones, Mike Lee, John Smith, Bill Chu, Selina Chu, Bill Smith, David Gu, Peggy Gu, John Lee, Susan Chu
E(John Smith) = 2/12. E(Selina Chu) = 1/12. E(John Chu) = 0?
E(John Chu) ≈ (4/12) × (3/12), since f(John) = 4/12 and f(Chu) = 3/12.
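The slide's arithmetic as a tiny Python sketch: the expected relative frequency of the unseen pair is approximated by the product of the component frequencies, an independence-style assumption.

names = ["John Smith", "Cen Lee", "John Jones", "Mike Lee",
         "John Smith", "Bill Chu", "Selina Chu", "Bill Smith",
         "David Gu", "Peggy Gu", "John Lee", "Susan Chu"]

n = len(names)                                                 # 12
f_john = sum(1 for x in names if x.startswith("John")) / n     # 4/12
f_chu = sum(1 for x in names if x.endswith("Chu")) / n         # 3/12
print(f_john * f_chu)   # E("John Chu") ≈ (4/12) * (3/12)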
20
suffix_tree Preprocess(string r, string x)
    let Tr = Suffix_Tree(r)
    let Tx = Suffix_Tree(x)
    Annotate_f(w)(Tr); Annotate_f(w)(Tx)
    visit Tx in breadth-first traversal; for each node u do
        let w = L(u), m = |w|
        if w occurs in Tr then
            let Ê(w) = α fr(w)
        else
            find the largest 1 < l < m-1 such that [formula lost from slide] holds, using the suffix tree Tr
            if such an l exists then let Ê(w) = [formula lost from slide]
            else let Ê(w) = [formula lost from slide]
        let z(w) = fx(w) - Ê(w)
        store z(w) in the node u
    return Tx
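A toy Python sketch of the overall z-score idea, using plain dictionaries of substring counts in place of annotated suffix trees. It handles only the case where the word occurs in the reference string (the fallback Markov estimate for unseen words is elided, as on the slide), and the scaling constant alpha is one natural choice, not necessarily the paper's.

from collections import Counter

def substring_counts(s, m):
    """Count all (overlapping) substrings of length m in s."""
    return Counter(s[i:i + m] for i in range(len(s) - m + 1))

def z_scores(r, x, m):
    """z(w) = fx(w) - E(w), with E(w) the reference count scaled by
    alpha = (|x|-m+1)/(|r|-m+1); words never seen in r are skipped."""
    fr, fx = substring_counts(r, m), substring_counts(x, m)
    alpha = (len(x) - m + 1) / (len(r) - m + 1)
    return {w: fx[w] - alpha * fr[w] for w in fx if w in fr}

print(z_scores('UUFDUUFDDUUD', 'UUFDUUUUUUFD', 3))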
21
(No Transcript)
22
Experimental Evaluation
Sensitive and selective, just like me!
  • We would like to demonstrate two features of our proposed approach:
  • Sensitivity (high true positive rate): the algorithm can find truly surprising patterns in a time series.
  • Selectivity (low false positive rate): the algorithm will not find spurious "surprising" patterns in a time series.

23
We compare our work to two obvious rivals:
  • TSA-tree: a wavelet-based approach by Shahabi et al.
  • The authors suggest a method to find both trends and surprises in large time series datasets. They achieve this using a wavelet-based tree structure that can represent the data at different scales.
  • They define surprise in time series as sudden changes in the original time series data, which are captured by local maxima of the absolute values of the wavelet detail coefficients.
  • IMM: an immunology-inspired approach by Dasgupta et al.
  • This work is inspired by the negative selection mechanism of the immune system, which discriminates between self and non-self.
  • In this case self is the model of the time series learned from the reference dataset, and non-self is any observed pattern in the new dataset that does not conform to the model within some tolerance.

24
Experiment 1
I created an anomaly by halving the period of the sine wave in the green section here.
Training data R
Test data X
We constructed a reference dataset R by creating a sine wave with 800 datapoints and adding some Gaussian noise. We then built a test dataset X using the same parameters as the reference set; however, we also inserted an artificial anomaly by halving the period of the sine wave in the green section.
25
The training data is just a noisy sine wave.
The test data has an anomaly: a subsection where the period was halved.
The IMM algorithm failed to find the anomaly.
But Tarzan detects the anomaly!
So did the TSA wavelet tree!
26
Experiment 2
We consider a dataset that contains the power demand for a Dutch research facility for the entire year of 1997. The data is sampled as 15-minute averages, and thus contains 35,040 points.
Demand for Power? Excellent!
[Figure: the first three weeks of the power demand dataset; y-axis 0 to 2500, x-axis 0 to 2000]
The first 3 weeks of the power demand dataset. Note the repeating pattern of a strong peak for each of the five weekdays, followed by relatively quiet weekends.
27
We used the period from Monday January 6th to Sunday March 23rd as reference data. This time period is devoid of national holidays. We tested on the remainder of the year. We will just show the 3 most surprising subsequences found by each algorithm. For each of the 3 approaches we show the entire week (beginning Monday) in which the 3 largest values of surprise fell. Both TSA-tree and IMM returned sequences that appear to be normal workweeks; however, Tarzan returned 3 sequences that correspond to the weeks that contain national holidays in the Netherlands. In particular, from top to bottom: the week spanning both December 25th and 26th, and the weeks containing Wednesday April 30th (Koninginnedag, Queen's Day) and May 19th (Whit Monday).
Mmm.. anomalous..
28
Experiment 3
The previous experiments demonstrate the ability of Tarzan to find surprising patterns; however, we also need to consider Tarzan's selectivity. If even a small fraction of the patterns flagged by our approach are false alarms, then, as we attempt to scale to massive datasets, we can expect to be overwhelmed by innumerable spurious "surprising" patterns.
Shut up Flanders!! In designing an experiment to show selectivity, we are faced with the problem of finding a database guaranteed to be free of surprising patterns. Because using a real dataset for this task would always be open to subjective post-hoc explanations of results, we will conduct the experiment on random walk data.
Och aye! By definition, random walk data can contain any possible pattern. In fact, as the size of a random walk dataset goes to infinity, we should expect to see every pattern repeated an infinite number of times. We can exploit this property to test our wee algorithm.
29
If we train Tarzan on another short random walk dataset, we should expect that the test data would be found surprising, since the chance that similar patterns exist in the short training database is very small. However, as we increase the size of the training data, the surprisingness of the test data should decrease, since it is more likely that similar data was encountered. To restate, the intuition is this: the more experience our algorithm has of seeing random walk data, the less surprising our particular section of random walk X_RW should appear.
30
Future Work, or Tarzan's Revenge!
Although we see Tarzan's ability to find surprising patterns without user intervention as a great advantage, we intend to investigate the possibility of incorporating user feedback and domain-based constraints.
We have concentrated solely on the intricacies of finding the surprising patterns, without addressing the many meta-questions that arise. For example, the possible asymmetric costs of false alarms and false dismissals, and the actionability of discovered knowledge. We intend to address these issues in future work.
31
Motif Discovery
[Figure: the Winding dataset (the angular speed of reel 2); x-axis 0 to 2500]
Informally, motifs are recurring patterns.
32
Motif Discovery
To find these 3 motifs by brute force would require about 6,250,000 calls to the Euclidean distance function: the dataset yields roughly 2,500 candidate subsequences, and an all-pairs comparison costs on the order of 2,500² distance computations.
33
Why Find Motifs?
  • Mining association rules in time series requires the discovery of motifs. These are referred to as primitive shapes and frequent patterns.
  • Several time series classification algorithms work by constructing typical prototypes of each class. These prototypes may be considered motifs.
  • Many time series anomaly/interestingness detection algorithms essentially consist of modeling normal behavior with a set of typical shapes (which we see as motifs), and detecting future patterns that are dissimilar to all typical shapes.
  • In robotics, Oates et al. have introduced a method to allow an autonomous agent to generalize from a set of qualitatively different experiences gleaned from sensors. We see these experiences as motifs.
  • In medical data mining, Caraca-Valente and Lopez-Chavarrias have introduced a method for characterizing a physiotherapy patient's recovery based on the discovery of similar patterns. Once again, we see these similar patterns as motifs.
34


[Figure: Space Shuttle STS-57 telemetry (inertial sensor); a time series T with a subsequence C and its trivial matches marked; x-axis 0 to 1000]
Definition 1. Match: Given a positive real number R (called range) and a time series T containing a subsequence C beginning at position p and a subsequence M beginning at q, if D(C, M) ≤ R, then M is called a matching subsequence of C.
Definition 2. Trivial Match: Given a time series T, containing a subsequence C beginning at position p and a matching subsequence M beginning at q, we say that M is a trivial match to C if either p = q, or there does not exist a subsequence M′ beginning at q′ such that D(C, M′) > R, and either q < q′ < p or p < q′ < q.
Definition 3. K-Motif(n, R): Given a time series T, a subsequence length n and a range R, the most significant motif in T (hereafter called the 1-Motif(n, R)) is the subsequence C1 that has the highest count of non-trivial matches (ties are broken by choosing the motif whose matches have the lower variance). The Kth most significant motif in T (hereafter called the K-Motif(n, R)) is the subsequence CK that has the highest count of non-trivial matches, and satisfies D(CK, Ci) > 2R for all 1 ≤ i < K.
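A brute-force Python sketch of 1-Motif(n, R) that follows the definitions directly; for simplicity it treats any match whose start is within n positions of C as trivial, a coarser rule than Definition 2 (our simplification, chosen to keep the code short):

import math

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def one_motif(T, n, R):
    """Return (start index, match count) of the subsequence with the
    most non-trivial matches; brute force, O(m^2) distance calls."""
    m = len(T) - n + 1
    best, best_count = None, -1
    for p in range(m):
        C = T[p:p + n]
        count = sum(1 for q in range(m)
                    if abs(q - p) >= n and dist(C, T[q:q + n]) <= R)
        if count > best_count:
            best, best_count = p, count
    return best, best_count

This quadratic cost is exactly why the next slides turn to random projection.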
35
OK, we can define motifs, but how do we find them?
The obvious brute-force search algorithm is just too slow. Our algorithm is based on a hot idea from bioinformatics: random projection.
J. Buhler and M. Tompa. Finding motifs using random projections. In RECOMB '01, 2001.
36
A simple worked example of our motif discovery
algorithm
The next 4 slides

[Figure: a time series T (m = 1000) with a motif C1 of length n = 16. Each subsequence of T is discretized into a word of length w = 4 over the alphabet {a, b, c} (a = 3), forming a matrix S of words: row 1 = acba, row 2 = bcab, ..., row 58 = acca, ..., row 985 = bccc]
Assume that we have a time series T of length 1,000, and a motif of length 16, which occurs twice, at time T1 and time T58.
37
A mask {1,2} was randomly chosen, so the values in columns {1,2} were used to project the matrix into buckets.
Collisions are recorded by incrementing the appropriate location in the collision matrix.
38
Once again, collisions are recorded by incrementing the appropriate location in the collision matrix.
This time a mask {2,4} was randomly chosen, so the values in columns {2,4} were used to project the matrix into buckets.
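A Python sketch of a single projection iteration: each word is hashed on a randomly chosen mask of columns, and every pair of words landing in the same bucket has its collision count incremented. The function and toy data are illustrative; the real algorithm repeats this over many random masks and accumulates a sparse collision matrix.

import random
from collections import defaultdict

def project_once(words, w, mask_size=2):
    """One random-projection iteration over discretized words."""
    mask = sorted(random.sample(range(w), mask_size))
    buckets = defaultdict(list)
    for i, word in enumerate(words):
        key = ''.join(word[j] for j in mask)  # keep only masked columns
        buckets[key].append(i)
    collisions = defaultdict(int)
    for ids in buckets.values():
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                collisions[(ids[a], ids[b])] += 1
    return mask, collisions

words = ['acba', 'bcab', 'acca', 'bccc']  # rows of the matrix S above
print(project_once(words, w=4))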
39
We can calculate the expected values in the collision matrix, assuming there are NO patterns.
Suppose E(k, a, w, d, t) = 2.
[Figure: the observed collision matrix, indexed by subsequence position (1, 2, ..., 58, ..., 985); most cells hold small counts (0-3), but the cell for positions (58, 1) holds 27 collisions, far above the expected value]
40
A Simple Experiment
Let's embed two motifs into a random walk time series, and see if we can recover them.
[Figure: a random walk time series containing four subsequences labeled A, B, C and D; each extracted subsequence is plotted over 0 to 120]
41
Planted Motifs
[Figure: the two planted motifs, with the subsequences labeled A, B, C and D overlaid]
42
Real Motifs
[Figure: the motifs actually discovered by the algorithm, each plotted over 0 to 120]
43
Some Examples of Real Motifs

Astrophysics (Photon Count)
[Figure: a recurring motif discovered in an astrophysics photon-count time series]
44
How Fast can we find Motifs?
45
Future Work on Motif Detection
  • A more detailed theoretical analysis.
  • Significance testing.
  • Discovery of motifs in multidimensional time series.
  • Discovery of motifs under different distance measures, such as Dynamic Time Warping.

46
Rule Finding in Time Series
Support = 9.1, Confidence = 68.1
Das, G., Lin, K., Mannila, H., Renganathan, G. & Smyth, P. (1998). Rule Discovery from Time Series. In Proceedings of the 4th Int'l Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27-31, pp. 16-22.
47
Papers Based on Rule Discovery from Time Series
  • Mori, T. & Uehara, K. (2001). Extraction of Primitive Motion and Discovery of Association Rules from Human Motion.
  • Cotofrei, P. & Stoffel, K. (2002). Classification Rules + Time = Temporal Rules.
  • Fu, T. C., Chung, F. L., Ng, V. & Luk, R. (2001). Pattern Discovery from Stock Time Series Using Self-Organizing Maps.
  • Harms, S. K., Deogun, J. & Tadesse, T. (2002). Discovering Sequential Association Rules with Constraints and Time Lags in Multiple Sequences.
  • Hetland, M. L. & Sætrom, P. (2002). Temporal Rule Discovery Using Genetic Programming and Specialized Hardware.
  • Jin, X., Lu, Y. & Shi, C. (2002). Distribution Discovery: Local Analysis of Temporal Rules.
  • Yairi, T., Kato, Y. & Hori, K. (2001). Fault Detection by Mining Association Rules in House-keeping Data.
  • Tino, P., Schittenkopf, C. & Dorffner, G. (2000). Temporal Pattern Recognition in Noisy Non-stationary Time Series Based on Quantization into Symbolic Streams.

and many more
48
All these people are fooling themselves! They are not finding rules in time series, and it is easy to prove this!
49
A Simple Experiment...
"If a stock rises then falls greatly, followed by a smaller rise, then we can expect to see, within 20 time units, a pattern of rapid decrease followed by a leveling out."
The punch line is...
50
If the top data miners in the world can fool themselves into finding patterns that don't exist, what does that say about the field?
51
Conclusions Part I
  • We have reviewed the state of the art in classic time series data mining (classification, clustering and indexing).
  • We have seen that the choice of representation and similarity measure is critical to the success of any data mining project.

52
Conclusions Part II
  • We have seen some cutting-edge time series data mining tasks (novelty detection, motif discovery, rule finding).
  • In every case, we have seen initial promising results, but further work remains to be done.

53
Questions?
All datasets and code used in this talk can be
found at www.cs.ucr.edu/eamonn/TSDMA/index.html