Sequential Patterns - PowerPoint PPT Presentation

About This Presentation

Title:

Sequential Patterns

Slides: 31

Provided by: edeg4

Transcript and Presenter's Notes

1
Sequential Patterns
Process Mining

Current State of Research
Edgar de Graaf
LIACS

2
Mining Sequential Patterns

Sequential Patterns
Sequence Databases
AprioriAll
PrefixSpan
Gap Constraints

3
Sequential Patterns

<(a,b)(c)(a,b,d)>
<a1, a2, a3>
<(3)(4,5)(8)> is contained in <(7)(3,8)(9)(4,5,6)(8)>
<(3)(4,5)(8)> is not contained in <(7)(3,8)(9)(4)(5,6)(8)>
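
The containment test used in the two examples above can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name is_contained and the representation of a sequence as a Python list of sets are assumptions made here.

```python
def is_contained(pattern, sequence):
    """Return True if `pattern` is contained in `sequence`.

    Both arguments are lists of item sets (assumed representation).
    Every pattern element must be a subset of a distinct sequence
    element, and the elements must be matched in the same order.
    """
    pos = 0  # next position in `sequence` that may still be used
    for element in pattern:
        # Greedily take the earliest sequence element that covers `element`.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False  # ran out of sequence elements
        pos += 1
    return True


pattern = [{3}, {4, 5}, {8}]
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]))    # True
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}]))  # False
```

In the second sequence, 4 and 5 sit in different elements, so no single element covers (4,5).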

4
Sequential databases
The Database with sequences
5
Sequential databases
<(3)(4,5)(8)>
Support count: 0
A generated candidate pattern
6
Sequential databases
<(3)(4,5)(8)>
Support count: 0 → 1
7
Sequential databases
Support count: 1
<(3)(4,5)(8)>
Not contained → not counted
8
Sequential databases
Support count: 1 → 2 → 3 → 4 → 5 (increased for each sequence marked "Contained")
IF minimal support = 50% THEN <(3)(4,5)(8)> is frequent
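
The counting procedure of the preceding slides can be sketched roughly as below. The database shown here is hypothetical (the slide's own database is not part of this transcript), and the names is_contained, support, and min_support are assumptions for illustration.

```python
def is_contained(pattern, sequence):
    # Each pattern element must be a subset of a later, distinct sequence
    # element; the shared iterator enforces the left-to-right order.
    elements = iter(sequence)
    return all(any(p <= s for s in elements) for p in pattern)


def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(is_contained(pattern, seq) for seq in database)


# Hypothetical sequence database, one list of item sets per sequence.
database = [
    [{7}, {3, 8}, {9}, {4, 5, 6}, {8}],
    [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}],
    [{3}, {4, 5}, {8}],
    [{1}, {2}],
]

pattern = [{3}, {4, 5}, {8}]
min_support = 0.5  # relative threshold, i.e. 50%

count = support(pattern, database)
is_frequent = count / len(database) >= min_support
print(count, is_frequent)  # 2 True
```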
9
Lifting order (1)

Notation by examples
<A,B,C>: an ordered list of sets (a sequence)
Every set A, B and C is unordered. E.g. A =
(x,y,z) = (y,z,x) = (z,y,x)
x,y,z is an extension: we ignore the order when
counting frequency

10
Lifting order (2)

<(t1)(t2)(t3)(t4)> and
<(t1)(t3)(t2)(t4)> are frequent
⇒
<(t1)(t3,t2)(t4)> is frequent
This says that t3 and t2 frequently occur in between t1 and
t4, in either order

11
Lifting Order (3)

<(t1)(t2)(t3)(t4)> and
<(t1)(t3)(t2)(t4)> are infrequent
Suppose <(t1)(t3,t2)(t4)> is frequent
This says that t3 and t2 often occur in between t1 and t4
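
A small sketch of the situation on this slide: neither ordering reaches the threshold on its own, but the pattern that ignores the order of t2 and t3 does. The traces, the threshold of 3, and the function names are hypothetical and only serve as illustration.

```python
from itertools import permutations


def occurs_in_order(tasks, trace):
    """True if `tasks` occur in `trace` in the given order (gaps allowed)."""
    remaining = iter(trace)
    return all(task in remaining for task in tasks)


def support_unordered(first, middle, last, traces):
    """Count traces with `first`, then the `middle` tasks in any order, then `last`."""
    return sum(
        any(occurs_in_order([first, *perm, last], trace)
            for perm in permutations(middle))
        for trace in traces
    )


# Hypothetical event traces (one task per event).
traces = [
    ["t1", "t2", "t3", "t4"],
    ["t1", "t3", "t2", "t4"],
    ["t1", "t2", "t3", "t4"],
    ["t1", "t3", "t2", "t4"],
    ["t1", "t4"],
]

ordered_a = sum(occurs_in_order(["t1", "t2", "t3", "t4"], t) for t in traces)
ordered_b = sum(occurs_in_order(["t1", "t3", "t2", "t4"], t) for t in traces)
unordered = support_unordered("t1", ["t2", "t3"], "t4", traces)
print(ordered_a, ordered_b, unordered)  # 2 2 4
```

With a minimal support count of 3, both ordered patterns are infrequent while the order-free pattern is frequent.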

12
Existing Algorithms

AprioriAll: the first algorithm, based on the
anti-monotone principle
PrefixSpan: currently the fastest algorithm
around; it uses projected databases

13
AprioriAll (1)

AprioriAll(DB, min_sup)
  L1 = frequent sequences of size 1
  k = 2
  while (Lk-1 is not empty)
    Ck = candidateGeneration(Lk-1, k)
    Ck = candidatePruning(Ck, k)
    Lk = supportBasedPruning(Ck)
    k++

14
AprioriAll (2)

candidateGeneration(Lk-1, k)
  Ck = ø
  for each a in Lk-1
    for each b in Lk-1
      if (for all n, 1 ≤ n ≤ k-2: an = bn)
        add to Ck the sequences
          a1 ... ak-2, ak-1, bk-1 and
          a1 ... ak-2, bk-1, ak-1
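
The two AprioriAll slides can be put together in a short runnable sketch. It is simplified to sequences of single items (tuples of items instead of item sets), min_sup is taken as an absolute count, and all Python names as well as the example database are assumptions made here, not the presenter's implementation.

```python
from itertools import combinations


def occurs_in(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in that order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def candidate_generation(prev_level):
    """Join sequences that agree on everything except their last item."""
    candidates = set()
    for a in prev_level:
        for b in prev_level:
            if a[:-1] == b[:-1]:
                candidates.add(a + b[-1:])  # a1 ... ak-2, ak-1, bk-1
                candidates.add(b + a[-1:])  # a1 ... ak-2, bk-1, ak-1
    return candidates


def candidate_pruning(candidates, prev_level):
    """Anti-monotone pruning: every (k-1)-subsequence must be frequent."""
    return {c for c in candidates
            if all(sub in prev_level for sub in combinations(c, len(c) - 1))}


def support_based_pruning(candidates, db, min_sup):
    """Keep candidates contained in at least `min_sup` database sequences."""
    return {c for c in candidates
            if sum(occurs_in(c, seq) for seq in db) >= min_sup}


def apriori_all(db, min_sup):
    items = {item for seq in db for item in seq}
    level = support_based_pruning({(i,) for i in items}, db, min_sup)
    frequent = set(level)
    while level:
        candidates = candidate_pruning(candidate_generation(level), level)
        level = support_based_pruning(candidates, db, min_sup)
        frequent |= level
    return frequent


# Hypothetical database of single-item sequences.
db = [list("abc"), list("acb"), list("abc"), list("bc")]
result = apriori_all(db, min_sup=2)
print(sorted("".join(p) for p in result))
# ['a', 'ab', 'abc', 'ac', 'b', 'bc', 'c']
```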

15
PrefixSpan (1)

Assume that the prefix is <(a,b)(c)>
Scan the projected database to find every frequent
item x such that
<(a,b)(c,x)> is frequent or
<(a,b)(c)(x)> is frequent
Append x to the prefix and output the pattern
Now call recursively, e.g. PrefixSpan(<(a,b)(c,x)>, newProjDB)

16
PrefixSpan (2)

A projected DB only stores the postfix
E.g. if the prefix is <(a,b)> then we store <(a,b,x)>
as <(_, x)>
New projected DB = old projected DB sequences
without the prefix

17
PrefixSpan (3)

Faster than AprioriAll
No non-existing candidates are generated
Support is tested on a shrinking projected DB
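
The recursion over projected databases can be sketched as below. To keep it short, the sketch is restricted to sequences of single items, so only the extension <...(c)(x)> is covered and the item-set extension <...(c,x)> is left out; min_sup is an absolute count, and the function name prefix_span and the example database are assumptions made here.

```python
from collections import Counter


def prefix_span(prefix, projected_db, min_sup, output):
    """Grow `prefix` by every item that is frequent in `projected_db`.

    `projected_db` holds only the postfixes of the sequences that
    contain `prefix`. Found patterns are appended to `output` together
    with their support count.
    """
    # Count in how many postfixes each item occurs.
    counts = Counter()
    for postfix in projected_db:
        counts.update(set(postfix))

    for item, count in counts.items():
        if count < min_sup:
            continue
        pattern = prefix + (item,)
        output.append((pattern, count))
        # New projected DB: what remains after the first occurrence of `item`.
        new_db = [p[p.index(item) + 1:] for p in projected_db if item in p]
        prefix_span(pattern, new_db, min_sup, output)
    return output


# Hypothetical database of single-item sequences.
db = [list("abc"), list("acb"), list("abc"), list("bc")]
patterns = dict(prefix_span((), db, min_sup=2, output=[]))
print(sorted(("".join(p), n) for p, n in patterns.items()))
# [('a', 3), ('ab', 3), ('abc', 2), ('ac', 3), ('b', 4), ('bc', 3), ('c', 4)]
```

Only items that actually occur in the projected database are ever counted, and each recursive call works on shorter postfixes than the call before it.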

18
Gap Constraint

Simple idea: a maximal distance between the item sets
of a sequence
E.g. for the sequence <(a)(c)(d)(e)>, the pattern <(a)(e)> and a maximal gap of
1, this sequence is not counted
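
A sketch of containment under a gap constraint. Here the gap is assumed to be the number of sequence elements skipped between two consecutively matched elements; the slide does not define it precisely, so this definition and the name contains_with_max_gap are assumptions. Backtracking is used because the earliest match is not always the right one once gaps are limited.

```python
def contains_with_max_gap(pattern, sequence, max_gap):
    """True if `pattern` is contained in `sequence` with at most `max_gap`
    skipped elements between consecutive matches (assumed definition)."""

    def match(p_idx, start, end):
        # Try to match pattern[p_idx] at a position in [start, end).
        if p_idx == len(pattern):
            return True
        for pos in range(start, min(end, len(sequence))):
            if pattern[p_idx] <= sequence[pos]:
                # The next element may skip at most `max_gap` positions.
                if match(p_idx + 1, pos + 1, pos + 2 + max_gap):
                    return True
        return False

    # The first pattern element may match anywhere in the sequence.
    return match(0, 0, len(sequence))


sequence = [{"a"}, {"c"}, {"d"}, {"e"}]
pattern = [{"a"}, {"e"}]
print(contains_with_max_gap(pattern, sequence, max_gap=1))  # False
print(contains_with_max_gap(pattern, sequence, max_gap=2))  # True
```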

19
Process Mining

What is process mining?
Using D/F tables and graphs
Genetic Algorithms
Problem areas
Using sequential patterns

20
What is process mining? (1)

The ordering of events is known, e.g. <(task A)(task B)(task C)>
Process mining constructs a Petri net

[Figure: Petri net with the labels pay, ready, claim, register, to_be_evaluated, send_letter]
Source: Workflow Management by W. van der Aalst and K. van Hee (1997)
21
What is process mining? (2)