Mining Sequential Patterns - PowerPoint PPT Presentation

Description:

Requires set of attributes deciding each tuple's class. Call this the class ... Joe, 10000, {knife, orange juice, beer} Sarah, 10001, {knife, milk, beer} ... – PowerPoint PPT presentation

Slides: 30
Provided by: jeremy58
Learn more at: http://www.cs.uvm.edu
Transcript and Presenter's Notes


1
Mining Sequential Patterns
  • Authors: Rakesh Agrawal and Ramakrishnan Srikant.
  • Presenter: Jeremy Dalmer.

2
Introduction
  • What is a sequential pattern?
  • Answers to final exam questions.

3
What is a sequential pattern?
  • Requires a set of attributes that decides each tuple's
    class. Call this the class set.
  • Example: items-purchased; class set: customer-id.
  • Tuples are sorted into classes.

4
  • Requires a set of attributes used for ordering
    tuples. Call this the order set.
  • Example: items-purchased; order set: transaction-time.
  • Tuples within each class are sorted according to an
    order defined over the codomain of the order set.
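
The grouping and ordering described on slides 3 and 4 can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `build_sequences` is hypothetical, and the rows are made up for the example (only the two earliest transactions echo the presentation's description; the later ones are assumptions).

```python
from collections import defaultdict

# Hypothetical rows of items-purchased: (customer_id, transaction_time, items).
# The second transaction of each customer is an assumption made for illustration.
rows = [
    ("Sarah", 10003, {"ink", "Band-Aids"}),
    ("Joe", 10000, {"knife", "orange juice", "beer"}),
    ("Joe", 10002, {"knife", "Band-Aids"}),
    ("Sarah", 10001, {"knife", "milk", "beer"}),
]

def build_sequences(rows):
    """Group tuples by the class set, then sort each class by the order set."""
    classes = defaultdict(list)
    for customer_id, transaction_time, items in rows:
        classes[customer_id].append((transaction_time, frozenset(items)))
    return {
        customer: [items for _, items in sorted(txns, key=lambda t: t[0])]
        for customer, txns in classes.items()
    }

sequences = build_sequences(rows)
```

Each class (customer) ends up with one time-ordered list of itemsets, which is the unit that later steps count.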

5
  • Specifying a value for each attribute in (class
    set ∪ order set) must specify at most one
    tuple. (class set ∪ order set forms a primary
    key.)
  • Support and confidence now measure classes, not
    tuples.
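
To make "support measures classes, not tuples" concrete, here is a small sketch under stated assumptions: `contains` and `support` are hypothetical helper names, and the customer sequences are illustrative data rather than the presentation's actual table.

```python
# Hypothetical time-ordered customer sequences (one list of itemsets per class).
sequences = {
    "Joe": [{"knife", "orange juice", "beer"}, {"knife", "Band-Aids"}],
    "Sarah": [{"knife", "milk", "beer"}, {"ink", "Band-Aids"}],
}

def contains(sequence, pattern):
    """True if the pattern's itemsets occur, in order, in distinct transactions."""
    pos = 0
    for wanted in pattern:
        # Advance to the next transaction that includes every item in `wanted`.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(sequences, pattern):
    """Fraction of classes (customers), not tuples, whose sequence contains the pattern."""
    return sum(contains(s, pattern) for s in sequences.values()) / len(sequences)

both = support(sequences, [{"knife", "beer"}, {"Band-Aids"}])
only_joe = support(sequences, [{"knife"}, {"knife"}])
```

A customer who matches a pattern several times still counts once, because the denominator is the number of classes.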

6
  • Example: items-purchased; order set: transaction-time;
    class set: customer-id. (Table not recovered; surviving
    fragments: orange juice, beer; beer; ink, Band-Aids.)

7
  • Classes: Joe, Sarah

8
  • Ordering within classes according to order set

9
  • A large sequence (support 100%) is [sequence not recovered]
    (intuitive).
  • Its subsequences [not recovered] are also large sequences.

10
  • Example: order set: year, month; class set: [not recovered].
    (Table not recovered; surviving fragments: goldfish,
    lobster; lobster; tiger.)

11
  • Classes

12
  • Ordering within classes (class!) according to the
    order set

13
  • Intuition suggests a large sequence [not recovered],
    but this is not considered any larger than [not recovered]
    and [not recovered], because there is only one class.

14
  • One more point about the previous example.
  • Having recorded [sequence containing monkey; rest not
    recovered] as a large sequence, why record its
    subsequences?
  • [Two subsequences, not recovered], though large
    sequences, are not informative.
  • Maximal sequence.

15
Final exam questions
  • Root of each algorithm:
    (1) Group into classes and order.
    (2) Find all large itemsets.
    (3) For each tuple, drop everything except a record of
        the large itemsets contained in that tuple.
    (4) Find all large sequences (of large itemsets).
    (5) Discard large sequences that are not maximal.

16
  • Consider a previous example.

17
  • Large itemsets (min-sup 100%): {knife}, {beer},
    {knife, beer}, {Band-Aids}.
  • Map {knife} to 1, {beer} to 2, {knife, beer} to
    3, and {Band-Aids} to 4.
  • Transform tuples to ((1 2 3) (1 4)) and ((1 2 3) (4)).
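
The itemset and transformation steps can be reproduced with a brute-force sketch. The function names are hypothetical, the enumeration is exhaustive rather than Apriori-style, and the second transaction of each customer is assumed data chosen so that the result matches the slide.

```python
from itertools import combinations

# Hypothetical customer sequences; the later transactions are assumptions.
sequences = {
    "Joe": [{"knife", "orange juice", "beer"}, {"knife", "Band-Aids"}],
    "Sarah": [{"knife", "milk", "beer"}, {"ink", "Band-Aids"}],
}

def large_itemsets(sequences, min_sup):
    """Brute force: an itemset is large if enough customers have a transaction containing it."""
    supporters = {}
    for customer, txns in sequences.items():
        for txn in txns:
            for size in range(1, len(txn) + 1):
                for combo in combinations(sorted(txn), size):
                    supporters.setdefault(frozenset(combo), set()).add(customer)
    needed = min_sup * len(sequences)
    return {s for s, who in supporters.items() if len(who) >= needed}

def transform(sequences, ids):
    """Replace each transaction by the ids of the large itemsets it contains."""
    out = {}
    for customer, txns in sequences.items():
        mapped = [sorted(i for s, i in ids.items() if s <= txn) for txn in txns]
        out[customer] = [m for m in mapped if m]  # drop transactions with no large itemset
    return out

# Numbering copied from the slide.
ids = {
    frozenset({"knife"}): 1,
    frozenset({"beer"}): 2,
    frozenset({"knife", "beer"}): 3,
    frozenset({"Band-Aids"}): 4,
}

litemsets = large_itemsets(sequences, min_sup=1.0)
transformed = transform(sequences, ids)
```

Items such as orange juice, milk, and ink disappear in the transformed data because no large itemset contains them.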

18
  • Large sequences (actually with 100% support)
    are ((1)), ((2)), ((3)), ((4)), ((1) (4)), ((2)
    (4)), and ((3) (4)).
  • But since ((3) (4)) implies all the others, only
    ((3) (4)) is a maximal large sequence.
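
The maximality check can be sketched as below. Containment has to be tested on the underlying items, since ((1) (4)) is contained in ((3) (4)) only because itemset 1 is a subset of itemset 3. The helper names are hypothetical, and each sequence is written as a tuple of itemset ids.

```python
# Itemset numbering from the slide.
ITEMSETS = {1: {"knife"}, 2: {"beer"}, 3: {"knife", "beer"}, 4: {"Band-Aids"}}

# The seven large sequences, each a tuple of itemset ids.
large = [(1,), (2,), (3,), (4,), (1, 4), (2, 4), (3, 4)]

def is_subsequence(a, b):
    """True if each itemset of a is a subset of a distinct, later-or-equal itemset of b, in order."""
    pos = 0
    for wanted in a:
        while pos < len(b) and not ITEMSETS[wanted] <= ITEMSETS[b[pos]]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True

def maximal(seqs):
    """Keep only sequences not contained in any other sequence of the collection."""
    return [s for s in seqs if not any(s != t and is_subsequence(s, t) for t in seqs)]

result = maximal(large)
```

Here `result` holds the single sequence `(3, 4)`, in line with the slide's conclusion.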

19
  • Potentially large vs. definitely large (candidate
    sequences vs. large sequences).
  • Potentially large: no counting, but many.
  • Definitely large: counting, but few.
  • Algorithms are similar to Apriori, but with sequences
    of large itemsets instead of large sets of items.
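
One way to picture how candidate (potentially large) sequences come from the large sequences of the previous pass is an Apriori-style join followed by a prune. The sketch below is an assumption about the general shape of that step, not a transcription of the paper's procedure; `generate_candidates` is a hypothetical name and the input is made-up data.

```python
def generate_candidates(large_prev):
    """Build length-k candidates from large sequences of length k-1.

    Join: pair two distinct sequences that agree on everything but the last
    element, and append the second one's last element to the first.
    Prune: drop any candidate that has a (k-1)-subsequence which is not large.
    """
    large_prev = set(large_prev)
    joined = {
        p + (q[-1],)
        for p in large_prev
        for q in large_prev
        if p != q and p[:-1] == q[:-1]
    }

    def all_subsequences_large(candidate):
        return all(
            candidate[:i] + candidate[i + 1:] in large_prev
            for i in range(len(candidate))
        )

    return sorted(c for c in joined if all_subsequences_large(c))

# Hypothetical large 3-sequences over itemset ids.
large_3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
candidates_4 = generate_candidates(large_3)
```

No database scan happens here, which is why candidates are cheap to produce but numerous; only counting against the data turns a candidate into a definitely large sequence.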

20
  • AprioriAll: counts every large sequence,
    including those that are not maximal.
  • AprioriSome: generates every candidate sequence,
    but skips counting some large sequences (forward
    phase). Then discards candidates that are not maximal
    and counts the remaining large sequences (backward
    phase).

21
  • AprioriAll scans the database more, taking more
    time.
  • AprioriSome keeps more potentially large
    sequences in memory, degenerating to AprioriAll
    when requests for memory fail.

22
  • There were two types of algorithms presented to
    find sequential patterns, CountSome and CountAll.
    What was the main difference between the two
    algorithms?

23
  • CountAll (AprioriAll) is careful with respect to
    minimum support, careless with respect to
    maximality.
  • CountSome (AprioriSome) is careful
    with respect to maximality, careless with respect
    to minimum support.

24
  • What was the greatest hardware concern regarding
    the algorithms contained in the paper?

25
  • Main memory capacity. When there is little main
    memory, or many potentially large sequences, the
    benefits of AprioriSome vanish.

26
  • How did the two best sequence mining algorithms
    (AprioriAll and AprioriSome) perform compared
    with each other? Take into consideration memory,
    speed, and usefulness of the data.

27
  • Memory: In terms of main memory usage, AprioriAll
    is better. In terms of secondary storage access,
    AprioriSome is better.

28
  • Speed: With sufficient memory, as minimum support
    decreases, the difference between AprioriAll and
    AprioriSome increases. (AprioriSome is
    better.) More large sequences that are not maximal
    are generated.

29
  • Usefulness of the data: For the problem of
    finding maximal large sequences, the answer is
    "precisely the same." However, AprioriAll finds
    all large sequences, while AprioriSome discards
    some large sequences that aren't maximal.
    AprioriAll, then, generates more useful
    data. The user may want to know the ratio of the
    number of people who bought the first k + 1 items
    in a sequence to the number of people who bought
    the first k items.
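
The ratio mentioned above can only be computed when the shorter prefix was counted as well. The sketch below assumes the intended ratio is the number of customers supporting the first k + 1 elements divided by the number supporting the first k; `prefix_ratios` is a hypothetical helper and the counts are invented for illustration.

```python
def prefix_ratios(pattern, counts):
    """For k = 1 .. len(pattern) - 1, return count(first k+1) / count(first k).

    A step yields None when either prefix was never counted.
    """
    ratios = []
    for k in range(1, len(pattern)):
        shorter = counts.get(pattern[:k])
        longer = counts.get(pattern[:k + 1])
        if shorter is None or longer is None or shorter == 0:
            ratios.append(None)
        else:
            ratios.append(longer / shorter)
    return ratios

# Hypothetical customer counts per sequence of itemset ids.
all_counts = {(3,): 40, (3, 4): 10}   # every large sequence was counted
maximal_only = {(3, 4): 10}           # only the maximal sequence was counted

with_all = prefix_ratios((3, 4), all_counts)
with_maximal = prefix_ratios((3, 4), maximal_only)
```

With counts for every large sequence the ratio is available (0.25 in this made-up case); with counts for maximal sequences only, the step comes back as `None`.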