Sequential Patterns

1
Sequential Patterns and Process Mining
  • Current State of Research
  • Edgar de Graaf
  • LIACS

2
Mining Sequential Patterns
  • Sequential Patterns
  • Sequence Databases
  • AprioriAll
  • PrefixSpan
  • Gap Constraints

3
Sequential Patterns
  • <(a,b)(c)(a,b,d)>
  • <a1, a2, a3>
  • <(3)(4,5)(8)> is contained in <(7)(3,8)(9)(4,5,6)(8)>
  • <(3)(4,5)(8)> is not contained in <(7)(3,8)(9)(4)(5,6)(8)>
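The two containment examples above can be checked with a short Python sketch. It assumes that a sequence is represented as a list of Python sets and that a pattern element matches when it is a subset of a later sequence element; the function name `is_contained` is made up for this illustration.

```python
def is_contained(pattern, sequence):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both arguments are lists of item sets. Each pattern element must be a
    subset of its own sequence element, and the matches must keep the order.
    """
    pos = 0  # next element of `sequence` that may still be matched
    for element in pattern:
        # Greedily take the earliest sequence element that covers `element`.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # one sequence element can match only one pattern element
    return True


pattern = [{3}, {4, 5}, {8}]
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]))     # True
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}]))  # False
```

The second sequence fails because 4 and 5 sit in different item sets, so no single element covers (4,5).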

4
Sequential databases
The Database with sequences
5
Sequential databases
  • A generated candidate pattern: <(3)(4,5)(8)>
  • Support count: 0
6
Sequential databases
  • <(3)(4,5)(8)>
  • Support count: 0 → 1
7
Sequential databases
  • <(3)(4,5)(8)>
  • Support count: 1
  • Not contained → not counted
8
Sequential databases
  • Every database sequence that contains <(3)(4,5)(8)> is marked "Contained"
  • Support count: 1 → 2 → 3 → 4 → 5
  • IF minimal support is 50% THEN <(3)(4,5)(8)> is frequent
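A minimal sketch of the support counting walked through on slides 5 to 8. The database below is hypothetical (the one on the slides is not part of this transcript), and the minimal support is assumed to be a fraction of the database size.

```python
def contains(pattern, sequence):
    """True if each pattern element is a subset of a distinct sequence
    element, in order (greedy left-to-right matching)."""
    it = iter(sequence)
    # `any(...)` consumes `it` up to the first match, so order is preserved.
    return all(any(element <= candidate for candidate in it) for element in pattern)


def support_count(pattern, database):
    """Number of database sequences that contain `pattern`."""
    return sum(contains(pattern, sequence) for sequence in database)


# Hypothetical database, not the one shown on the slides.
database = [
    [{7}, {3, 8}, {9}, {4, 5, 6}, {8}],
    [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}],
    [{3}, {4, 5}, {8}],
    [{1}, {2}],
]
pattern = [{3}, {4, 5}, {8}]
min_support = 0.5  # assumption: a fraction of the database size

count = support_count(pattern, database)
is_frequent = count / len(database) >= min_support
print(count, is_frequent)  # 2 True
```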
9
Lifting order (1)
  • Notation by examples
  • <A,B,C>: an ordered list of sets (a sequence)
  • Every set A, B and C is unordered, e.g. A = (x,y,z) = (y,z,x) = (z,y,x)
  • x,y,z is an extension: we ignore the order when counting frequency

10
Lifting order (2)
  • <(t1)(t2)(t3)(t4)> and
  • <(t1)(t3)(t2)(t4)> are frequent
  • ⇒
  • <(t1)(t3,t2)(t4)> is frequent
  • This says that t3 and t2 frequently occur in between t1 and t4, in either order

11
Lifting order (3)
  • <(t1)(t2)(t3)(t4)> and
  • <(t1)(t3)(t2)(t4)> are infrequent
  • Suppose <(t1) t3,t2 (t4)> is frequent when the order of t3 and t2 is ignored
  • This says that t3 and t2 often occur in between t1 and t4
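The situation on this slide can be illustrated with a small sketch. It assumes one particular reading of lifting order: a trace supports the order-free pattern if it supports at least one ordering of the unordered items. The traces, the helper names and the threshold are hypothetical.

```python
from itertools import permutations


def is_subsequence(pattern, trace):
    """True if the items of `pattern` occur in `trace` in the same order."""
    it = iter(trace)
    return all(item in it for item in pattern)


def support(pattern, traces):
    return sum(is_subsequence(pattern, trace) for trace in traces)


def support_ignoring_order(prefix, unordered, suffix, traces):
    """Count traces matching prefix, then `unordered` in any order, then suffix."""
    orderings = [prefix + list(p) + suffix for p in permutations(unordered)]
    return sum(any(is_subsequence(o, trace) for o in orderings) for trace in traces)


traces = [
    ["t1", "t2", "t3", "t4"],
    ["t1", "t3", "t2", "t4"],
    ["t1", "t2", "t3", "t4"],
    ["t1", "t3", "t2", "t4"],
]
print(support(["t1", "t2", "t3", "t4"], traces))                          # 2
print(support(["t1", "t3", "t2", "t4"], traces))                          # 2
print(support_ignoring_order(["t1"], ["t2", "t3"], ["t4"], traces))       # 4
```

With a minimal support count of 3, both fixed orderings are infrequent while the order-free pattern is frequent.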

12
Existing Algorithms
  • AprioriAll: the first algorithm, based on the anti-monotone principle
  • PrefixSpan: currently the fastest algorithm around; it uses projected
    databases

13
AprioriAll (1)
  • AprioriAll(DB, min_sup)
  • L1 = frequent sequences of size 1
  • k = 2
  • while (Lk-1 is not empty)
  •     Ck = candidateGeneration(Lk-1, k)
  •     Ck = candidatePruning(Ck, k)
  •     Lk = supportBasedPruning(Ck)
  •     k++

14
AprioriAll (2)
  • candidateGeneration(Lk-1, k)
  • Ck = ø
  • for each a in Lk-1
  •     for each b in Lk-1
  •         if (for all n, 1 ≤ n ≤ k-2: an = bn)
  •             add to Ck the sequences
  •             <a1 … ak-2, ak-1, bk-1> and
  •             <a1 … ak-2, bk-1, ak-1>
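The pseudocode of the two AprioriAll slides can be sketched in Python as follows. This is a simplified illustration, not the original implementation: it assumes every sequence element is a single item (so a sequence is a tuple of items), and that `min_sup` is an absolute count. The function names mirror the pseudocode; the example database is hypothetical.

```python
def is_subsequence(pattern, sequence):
    it = iter(sequence)
    return all(item in it for item in pattern)


def candidate_generation(prev_level):
    """Join every pair a, b that agrees on the first k-2 items."""
    candidates = set()
    for a in prev_level:
        for b in prev_level:
            if a[:-1] == b[:-1]:
                candidates.add(a + b[-1:])  # a1..ak-2, ak-1, bk-1
                candidates.add(b + a[-1:])  # a1..ak-2, bk-1, ak-1
    return candidates


def candidate_pruning(candidates, prev_level):
    """Anti-monotone pruning: drop a candidate if dropping any one item
    gives a sequence that was not frequent at the previous level."""
    return {c for c in candidates
            if all(c[:i] + c[i + 1:] in prev_level for i in range(len(c)))}


def support_based_pruning(candidates, database, min_sup):
    return {c for c in candidates
            if sum(is_subsequence(c, s) for s in database) >= min_sup}


def apriori_all(database, min_sup):
    items = {(item,) for sequence in database for item in sequence}
    level = support_based_pruning(items, database, min_sup)  # L1
    frequent = set(level)
    while level:  # while Lk-1 is not empty
        candidates = candidate_generation(level)
        candidates = candidate_pruning(candidates, level)
        level = support_based_pruning(candidates, database, min_sup)
        frequent |= level
    return frequent


database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
print(sorted(apriori_all(database, min_sup=3)))
# [('a',), ('a', 'c'), ('b',), ('b', 'c'), ('c',)]
```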

15
PrefixSpan (1)
  • Assume that the prefix is <(a,b)(c)>
  • Scan the projected database to find every frequent item x such that
  • <(a,b)(c,x)> is frequent or
  • <(a,b)(c)(x)> is frequent
  • Append x to the prefix and output the pattern
  • Now call recursively, e.g. PrefixSpan(<(a,b)(c,x)>, newProjDB)

16
PrefixSpan (2)
  • A projected DB only stores the postfix
  • E.g. if the prefix is <(a,b)> then we store <(a,b,x)> as <(_, x)>
  • New projected DB = the sequences of the old projected DB without the
    prefix
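A minimal sketch of the recursion and the projected database. It is a simplification under stated assumptions: every sequence element is a single item, so only the <…(c)(x)> style of extension is covered and the itemset extension <…(c,x)> is left out. The function name, the tuple representation and the example database are hypothetical.

```python
def prefix_span(prefix, projected_db, min_sup, output):
    """Grow `prefix` by one frequent item at a time.

    `projected_db` holds only the postfixes that remain after `prefix`.
    """
    # Count in how many postfixes each item occurs.
    counts = {}
    for postfix in projected_db:
        for item in set(postfix):
            counts[item] = counts.get(item, 0) + 1

    for item, count in sorted(counts.items()):
        if count < min_sup:
            continue
        pattern = prefix + (item,)
        output.append((pattern, count))
        # New projected DB: what is left after the first occurrence of `item`.
        new_db = [p[p.index(item) + 1:] for p in projected_db if item in p]
        prefix_span(pattern, new_db, min_sup, output)
    return output


database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
patterns = prefix_span((), database, 3, [])
print(patterns)
# [(('a',), 3), (('a', 'c'), 3), (('b',), 3), (('b', 'c'), 3), (('c',), 4)]
```

Each recursive call scans a smaller database than its caller, and only items that really occur in the postfixes are ever tried.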

17
PrefixSpan (3)
  • Faster than AprioriAll
  • No non-existing candidates
  • Testing on a shrinking projected DB

18
Gap Constraint
  • Simple idea: a maximal distance between the item sets of a sequence that
    match the pattern
  • E.g. for sequence <(a)(c)(d)(e)>, pattern <(a)(e)> and gap 1, the
    sequence is not counted
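The gap constraint can be sketched as below. The slide does not define how the distance is measured; this sketch assumes it is the difference between the positions of two consecutive matched item sets, so gap 1 means directly adjacent. The function name is made up, and the pattern is assumed to be non-empty.

```python
def contained_with_max_gap(pattern, sequence, max_gap):
    """True if `pattern` occurs in `sequence` with at most `max_gap`
    positions between consecutive matched elements."""
    # Positions where the part of the pattern matched so far can end.
    ends = [i for i, element in enumerate(sequence) if pattern[0] <= element]
    for wanted in pattern[1:]:
        ends = [j for j, element in enumerate(sequence)
                if wanted <= element and any(0 < j - i <= max_gap for i in ends)]
    return bool(ends)


sequence = [{"a"}, {"c"}, {"d"}, {"e"}]
print(contained_with_max_gap([{"a"}, {"e"}], sequence, max_gap=1))  # False
print(contained_with_max_gap([{"a"}, {"e"}], sequence, max_gap=3))  # True
print(contained_with_max_gap([{"a"}, {"c"}], sequence, max_gap=1))  # True
```

All possible end positions are tracked instead of only the earliest one, because with a gap limit the earliest match is not always the one that can be extended.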

19
Process Mining
  • What is process mining?
  • Using D/F tables and graphs
  • Genetic Algorithms
  • Problem areas
  • Using sequential patterns

20
What is process mining? (1)
  • The ordering of events is known, e.g. <(task A)(task B)(task C)>
  • Process mining constructs a Petri net
Figure: example Petri net (labels: pay, ready, claim, register, to_be_evaluated, send_letter)
Source: Workflow Management by W. van der Aalst and K. van Hee (1997)
21
What is process mining? (2)
  • Usability of process mining
  • Given the audit trails, what is the workflow network?
  • Does the mined workflow network match the original design? (Delta
    analysis)
  • Is the mined workflow network better than the original design?
    (Performance analysis)

22
Using D/F tables and graphs (1)
  • A D/F (dependency/frequency) table is built for every task
  • Intuition: if A is often followed by B, then the probability of A
    causing B increases
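The counts behind such a table can be sketched as follows. The exact columns of a D/F table are not given in these slides, so the two counters below (task frequency and direct succession) are an assumption, as are the function name and the example audit trails.

```python
from collections import Counter


def df_counts(traces):
    """Return (frequency per task, count of 'b directly follows a')."""
    frequency = Counter()
    follows = Counter()  # follows[(a, b)] = number of times b directly follows a
    for trace in traces:
        frequency.update(trace)
        follows.update(zip(trace, trace[1:]))
    return frequency, follows


traces = [["A", "B", "C"], ["A", "C", "B"], ["A", "B", "C"]]
frequency, follows = df_counts(traces)
print(frequency["A"], follows[("A", "B")], follows[("B", "A")])  # 3 2 0
```

Here B follows A directly in two of the three trails and A never follows B, which is the kind of evidence the intuition above relies on.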

23
Using D/F tables and graphs (2)
  • A D/F graph is constructed
  • IF the values A→B, A>B and B<A pass their thresholds (N, s and s
    respectively) THEN add a connection from A to B
  • More complicated rules deal with recursion and short loops

24
Using D/F tables and graphs (3)
  • D/F Graph example

25
Using D/F tables and graphs (4)
  • AND/OR-splits
  • OR if neither C>B nor B>C is higher than the threshold
  • AND if both are higher than the threshold
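The split rule above can be written as a small function. It assumes that B>C and C>B are direct-succession counts kept in a dictionary; the function name, the counts and the threshold value are hypothetical, and the "undecided" case is an addition for inputs the rule does not cover.

```python
def split_type(follows, b, c, threshold):
    """Classify the split in front of tasks b and c from succession counts."""
    b_then_c = follows.get((b, c), 0)
    c_then_b = follows.get((c, b), 0)
    if b_then_c > threshold and c_then_b > threshold:
        return "AND"  # both orders are common: b and c run in parallel
    if b_then_c <= threshold and c_then_b <= threshold:
        return "OR"   # b and c hardly ever follow each other: a choice
    return "undecided"  # only one order is common; the rule says nothing


print(split_type({("B", "C"): 12, ("C", "B"): 9}, "B", "C", threshold=5))  # AND
print(split_type({}, "B", "C", threshold=5))                               # OR
print(split_type({("B", "C"): 12}, "B", "C", threshold=5))                 # undecided
```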

Figure: tasks A, B and C
26
Genetic Algorithms (1)
  1. Create an initial population of workflows
  2. Calculate their fitness using audit trails
  3. Create a child
  4. Mutate the child
  5. Repeat 3 to 4 to create the new population
  6. Go to 2
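The loop above can be sketched as follows. Everything beyond the loop itself is an assumption made for illustration: a workflow is encoded as a set of directed edges between tasks, fitness is the fraction of direct successions in the audit trails covered by the model minus a small penalty per edge, crossover is uniform, mutation toggles one edge, and the best individual is carried over unchanged.

```python
import random


def fitness(model, trails):
    """Covered fraction of direct successions, minus 0.01 per edge."""
    steps = [pair for trail in trails for pair in zip(trail, trail[1:])]
    covered = sum(pair in model for pair in steps)
    return covered / len(steps) - 0.01 * len(model)


def crossover(mother, father, rng):
    # Uniform crossover: each edge of either parent is kept with probability 0.5.
    return frozenset(e for e in sorted(mother | father) if rng.random() < 0.5)


def mutate(model, all_edges, rng):
    # Toggle one randomly chosen edge.
    return model ^ {rng.choice(all_edges)}


def evolve(trails, generations=50, size=20, seed=0):
    rng = random.Random(seed)
    tasks = sorted({task for trail in trails for task in trail})
    all_edges = [(a, b) for a in tasks for b in tasks if a != b]
    # 1. Create an initial population of random workflows.
    population = [frozenset(e for e in all_edges if rng.random() < 0.5)
                  for _ in range(size)]
    for _ in range(generations):
        # 2. Calculate fitness using the audit trails.
        ranked = sorted(population, key=lambda m: fitness(m, trails), reverse=True)
        parents = ranked[: size // 2]
        new_population = [ranked[0]]  # assumption: keep the best one as is
        # 3-5. Create and mutate children until the new population is full.
        while len(new_population) < size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            new_population.append(mutate(child, all_edges, rng))
        population = new_population  # 6. Go to 2.
    return max(population, key=lambda m: fitness(m, trails))


trails = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "C"]]
best = evolve(trails)
print(sorted(best), round(fitness(best, trails), 2))
```

An edge set cannot express splits, joins or duplicate tasks, so a real chromosome would need a richer structure.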

27
Genetic Algorithms (2)
  • Advantages:
  •     Can deal with duplicate tasks and non-free choice
  • Disadvantages:
  •     The structure of the chromosome
  •     How do we measure fitness?
  •     How do we do cross-over and mutation?

28
Problem Areas (1)
  • Hidden tasks
  • Duplicate tasks: when tasks have the same name

Figure: tasks B and C
29
Problem Areas (2)
  • Mining non-free-choice

Figure: tasks A, D, C, B and E
30
Problem Areas (3)
  • Mining Loops
  • ABCDBCD
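One simple way to flag loop candidates in a trace such as ABCDBCD is to list the tasks that occur more than once. This is only a hypothetical first step, not a method taken from the slides, and it cannot tell a loop apart from duplicate tasks that share a name.

```python
from collections import Counter


def repeated_tasks(trace):
    """Tasks that occur more than once, in order of first appearance."""
    counts = Counter(trace)
    return [task for task in dict.fromkeys(trace) if counts[task] > 1]


print(repeated_tasks("ABCDBCD"))  # ['B', 'C', 'D']
```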

Figure: tasks D, A, B and C
31
Problem Areas (4)
  • Delta analysis: how do we compare two models?
  • Other problems: time, dealing with noise and incompleteness

32
Using sequential patterns
  • Mining loops?
  • Fitness measure in a GA?
  • Use in delta analysis?
  • Generate the important frequent subsequences to
    help the designer

33
Further research in sequences
  • How about gaps between items in different item
    sets?
  • What type of frequent subsequences to use in
    fitness?
  • Lifting order: is it useful in workflow generation?
  • Further research into lifting order

34
The End
  • Thank you for your attention
  • Edgar de Graaf
  • edegraaf_at_liacs.nl