Discovering Calendar-based Temporal Association Rules

1
Discovering Calendar-based Temporal Association
Rules
TIME '01, 8th International Symposium on Temporal
Representation and Reasoning
  • SHOU Yu Tao
  • May 21st, 2003

2
Outline of the Presentation
  • Background
  • Temporal Association Rule Mining w.r.t Precise
    Match
  • Temporal Association Rule Mining w.r.t Fuzzy
    Match
  • Experiments
  • Conclusions
  • References
  • Q&A

3
Background
  • Temporal association rule: an association rule
    along with its temporal interval
  • E.g., turkey → pumpkin pie is a temporal
    association rule along with the temporal interval
    "within the week before Thanksgiving".
  • Why be interested in temporal association rule
    mining?
  • We may discover different association rules
    for different time intervals. Some
    association rules may hold during some intervals
    but not during others.
  • → This may lead to useful information.

4
Calendar Schema
  • Relational schema R = (fn : Dn, fn-1 : Dn-1, ...,
    f1 : D1) together with a constraint valid
  • fi: a calendar unit name such as year, month, etc.
  • Di: a finite subset of the positive integers.
  • The constraint valid: a Boolean function on
    Dn × ... × D1 specifying which combinations of
    values in Dn × ... × D1 are valid.

5
Calendar Schema
  • E.g., a calendar schema (year : {1995, 1996, ...,
    2002}, month : {1, 2, ..., 12}, day : {1, 2, ...,
    31}) with the constraint valid that evaluates to
    True only if the combination gives a valid date.
  • e.g., is not valid
  • Simply stated, a calendar schema is determined by
    a hierarchy of calendar concepts
  • e.g. (year, month, day)
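
The (year, month, day) schema and its valid constraint can be sketched in a few lines. This is only an illustration: the names SCHEMA, valid and basic_time_intervals are hypothetical, and the standard library's date constructor stands in for the constraint.

```python
from datetime import date
from itertools import product

# Hypothetical encoding of the schema: calendar unit name -> finite domain.
SCHEMA = {
    "year": range(1995, 2003),   # 1995..2002
    "month": range(1, 13),       # 1..12
    "day": range(1, 32),         # 1..31
}

def valid(year, month, day):
    """Return True only if the combination is a real calendar date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

def basic_time_intervals(schema=SCHEMA):
    """Yield every valid combination of domain values, one per day."""
    for combo in product(*schema.values()):
        if valid(*combo):
            yield combo

print(valid(1996, 2, 29))  # True (1996 is a leap year)
print(valid(1995, 2, 29))  # False
```

Filtering the Cartesian product through valid keeps only real dates, which is the role the constraint plays in the schema definition.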

6
Calendar Pattern
  • Defines a set of time intervals based on the
    calendar schema
  • E.g., (2000, *, 16) is a calendar pattern based on
    the calendar schema (year, month, day),
    corresponding to the time intervals consisting of
    all the 16th days of all months in year 2000
  • Time intervals or periodic cycles can be easily
    described by calendar patterns with appropriate
    calendar schemas.
  • E.g., the periodic cycle "every seven days" can
    be expressed by a calendar pattern (*, d), where
    1 ≤ d ≤ 7, on the calendar schema (week, day),
    depending on which day the cycle starts
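
A minimal sketch of how a calendar pattern selects time intervals, assuming patterns and basic time intervals are plain tuples and the wild-card symbol is written "*". The helper names covers and star_patterns_covering are hypothetical.

```python
WILDCARD = "*"

def covers(pattern, interval):
    """True if every non-wildcard field of the pattern equals the interval's field."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, interval))

def star_patterns_covering(interval):
    """All patterns with at least one wildcard that cover a basic time interval."""
    n = len(interval)
    patterns = []
    for mask in range(1, 2 ** n):  # each set bit turns one field into a wildcard
        patterns.append(
            tuple(WILDCARD if (mask >> i) & 1 else interval[i] for i in range(n))
        )
    return patterns

# "All the 16th days of all months in year 2000" on (year, month, day):
print(covers((2000, WILDCARD, 16), (2000, 5, 16)))  # True
print(covers((2000, WILDCARD, 16), (2001, 5, 16)))  # False
print(len(star_patterns_covering((1995, 2, 1))))    # 7
```

With three calendar units there are 2^3 - 1 = 7 patterns with at least one wildcard covering any given day.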

7
Problem Formulation
  • Given a calendar schema R, a set T of timestamped
    transactions and a match ratio (optional), we
    want to discover all interesting association
    rules w.r.t.
  • Precise Match
  • Fuzzy Match
  • Assumption: we are not interested in
    association rules that hold only during basic
    time intervals. Such rules do not reveal
    much interesting information in terms of time.
  • E.g., if the calendar schema is (year, month,
    day), we are not going to look for association
    rules that hold only during a single day.
  • Basic time interval: a calendar pattern with
    no wild-card symbol

8
Problem Formulation
  • Temporal Association Rule w.r.t. Precise Match
  • Given a calendar schema R and a set T of
    timestamped transactions, a temporal association
    rule (r, e) holds if and only if the association
    rule r holds for each basic time interval
    covered by the star calendar pattern e.
  • Star calendar pattern: a calendar pattern with
    at least one wild-card symbol
  • E.g., given the calendar schema (year, month,
    Thursday), we may have a temporal association
    rule (turkey → pumpkin pie, (*, 11, 4)) that holds
    w.r.t. precise match. The rule means that the
    association rule turkey → pumpkin pie holds on
    all Thanksgiving days, i.e., the 4th Thursday
    in November of every year.

9
Problem Formulation
  • Temporal Association Rule w.r.t. Fuzzy Match
  • Given a calendar schema R, a set T of timestamped
    transactions and a match ratio m, a temporal
    association rule (r, e) holds if and only if the
    association rule r holds for at least 100m% of
    the basic time intervals covered by the star
    calendar pattern e.
  • E.g., given the calendar schema (year, month,
    Thursday) and match ratio m = 0.8, we may have a
    temporal association rule (turkey → pumpkin pie,
    (*, 11, 4)) that holds w.r.t. fuzzy match. This
    means that the association rule turkey → pumpkin
    pie holds on at least 80% of Thanksgiving days.
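
The difference between the precise and fuzzy match definitions can be shown with a small sketch. It assumes we already know, for each basic time interval covered by a pattern, whether the association rule holds there. The function names and the sample data are hypothetical.

```python
def holds_precise(holds_per_interval):
    """Precise match: the rule must hold in every covered basic time interval."""
    return all(holds_per_interval)

def holds_fuzzy(holds_per_interval, m):
    """Fuzzy match: the rule must hold in at least a fraction m of covered intervals."""
    total = len(holds_per_interval)
    if total == 0:
        return False  # assumption: a pattern covering no interval supports nothing
    return sum(holds_per_interval) / total >= m

# Made-up data: does turkey -> pumpkin pie hold on each of 5 Thanksgiving days?
thanksgivings = [True, True, False, True, True]
print(holds_precise(thanksgivings))     # False
print(holds_fuzzy(thanksgivings, 0.8))  # True
```

One failing interval is enough to reject the rule under precise match, while fuzzy match with m = 0.8 tolerates it because 4 of 5 intervals still qualify.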

10
Temporal Association Rule Mining
  • Two sub-problems:
  • Finding all large itemsets for all the star
    calendar patterns on the given calendar schema
    (based on Apriori [AS94]). This is the crux of
    the discovery of temporal association rules.
  • Generating temporal association rules from the
    large itemsets and their calendar patterns. This
    is the same as the traditional association rule
    generation approach [AS94].

11
Outline of the Algorithm (for both precise and
fuzzy match)
  • Critical step: with fewer candidate large
    itemsets, less time is needed for phase II.
  • The same as the traditional approach
  • The same here

12
Phase III for Precise Match
  • After the basic time interval e0 is processed in
    pass k, the large k-itemsets for all the calendar
    patterns e that cover e0 are updated as follows:
  • If Lk(e) is updated for the first time
    (i.e., Lk(e) = NULL), let Lk(e) = Lk(e0)
  • Else Lk(e) = Lk(e) ∩ Lk(e0)
  • E.g.
  • Given calendar patterns (1995, *, 1) and (*, 2, *)
    with L2(1995, *, 1) = {AB, DE} and L2(*, 2, *) =
    {AB, BC, DE}, suppose that after processing basic
    time interval (1995, 2, 1) we get L2 = {AC, BC, DE}
  • → L2(1995, *, 1) = {DE}
  • → L2(*, 2, *) = {BC, DE}
  • So after all the basic time intervals are
    processed, the set of large k-itemsets for each
    calendar pattern is known.
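
The update rule for precise match amounts to a running set intersection. The sketch below replays the slide's example. The dictionary layout and the name update_precise are assumptions made for illustration, with "*" standing for the wild-card symbol.

```python
def update_precise(large, pattern, large_e0):
    """Fold Lk(e0) into Lk(pattern); the first update simply copies Lk(e0)."""
    if pattern not in large:        # corresponds to Lk(e) = NULL
        large[pattern] = set(large_e0)
    else:
        large[pattern] &= set(large_e0)

# State before the basic time interval (1995, 2, 1) is processed.
large = {
    (1995, "*", 1): {"AB", "DE"},
    ("*", 2, "*"): {"AB", "BC", "DE"},
}
l2_e0 = {"AC", "BC", "DE"}          # L2 found for (1995, 2, 1)

for e in [(1995, "*", 1), ("*", 2, "*")]:
    update_precise(large, e, l2_e0)

print(sorted(large[(1995, "*", 1)]))  # ['DE']
print(sorted(large[("*", 2, "*")]))   # ['BC', 'DE']
```

An itemset survives for a pattern only if it is large in every basic time interval processed so far, which is what precise match requires.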

13
Phase III for Fuzzy Match
  • Associate a counter c_update with each candidate
    large itemset for each star calendar pattern.
  • Counters are initially set to 1
  • When Lk(e0) is used to update Lk(e) in phase III,
    the counters of the itemsets in Lk(e) that are
    also in Lk(e0) are incremented by 1
  • Suppose there are N basic time intervals in total
    covered by e and this is the nth update to Lk(e).
    An itemset cannot be large for e if its counter
    c_update does not satisfy c_update + (N - n) ≥ mN

14
Phase III for Fuzzy Match
  • Example
  • Calendar schema R = (week, day)
  • Fuzzy match ratio m = 0.8
  • Consider a star calendar pattern e and suppose
    there are only 5 basic time intervals covered
    by e (N = 5)
  • This is the 3rd time that L2(e) is updated
    (n = 3)
  • So we only keep the itemsets with c_update ≥ mN
    - (N - n) = 2
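
A rough sketch of the counter-based update for fuzzy match, using the numbers from this example (N = 5, n = 3, m = 0.8). The function name and the counter values are made up for illustration. It also assumes that an itemset seen for the first time enters with its counter at 1, which is one reading of "counters are initially set to 1".

```python
def update_fuzzy(counters, large_e0, n, N, m):
    """Apply the n-th update of Lk(e) with Lk(e0), then prune hopeless itemsets."""
    for itemset in large_e0:
        # Existing itemsets are incremented; new ones start at 1 (assumption).
        counters[itemset] = counters.get(itemset, 0) + 1
    # Keep an itemset only if c_update + (N - n) >= m * N is still satisfiable.
    for itemset in [i for i, c in counters.items() if c + (N - n) < m * N]:
        del counters[itemset]

# Hypothetical counters after two updates of L2(e).
counters = {"AB": 2, "AC": 1, "AD": 1}
update_fuzzy(counters, {"AB", "AC", "BC"}, n=3, N=5, m=0.8)
print(sorted(counters.items()))  # [('AB', 3), ('AC', 2)]
```

AD and BC end the third update with a count of 1. Even if both appeared in the two remaining intervals they would reach only 3 of the 4 required, so they are dropped early.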

15
Candidate Generation (Phase I)
  • Direct-Apriori: a naïve approach to generating
    candidate itemsets is to treat each basic time
    interval individually and directly apply
    Apriori's candidate generation approach.
  • Applies to both precise match and fuzzy match

16
Candidate Generation for Precise Match: Temporal
AprioriGen
  • Since we are not interested in the large itemsets
    for basic time intervals, if a candidate in Ck(e0)
    cannot be large for any of the star calendar
    patterns that cover the basic time interval e0,
    simply ignore it.
  • So, we can generate the candidates Ck (k > 1) as
    follows

17
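Candidate Generation for Precise Match: Temporal
AprioriGen
  • Example
  • Consider the calendar schema R =
    (week : {1, ..., 5}, day : {1, ..., 7}). Let e0 be
    the basic time interval being processed and e1,
    e2 the two 1-star patterns that cover it. Suppose
    we already have
    L2(e0) = {AB, AC, AD, AE, BC, BD, CD, CE},
    L2(e1) = {AB, AC, AD, BC, BD, CE},
    L2(e2) = {AB, AC, AD, BD, CD}.
  • Using temporal aprioriGen: C3(e1) = {ABC, ABD},
    C3(e2) = {ABD, ACD}
  • C3(e0) = C3(e1) ∪ C3(e2) = {ABC, ABD, ACD}
  • Using Direct-Apriori:
  • C3(e0) = {ABC, ABD, ACD, ACE, BCD}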
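
The example can be replayed with a small sketch of Apriori-style candidate generation. It takes the first L2 set listed as belonging to the basic time interval and the other two as belonging to the two 1-star patterns covering it. That reading, and the names apriori_gen, names, l2_e0, l2_e1 and l2_e2, are assumptions. The join used here (any two itemsets whose union has k items, followed by the subset check) is a simplification that yields the same candidates as the prefix-based join of [AS94].

```python
from itertools import combinations

def apriori_gen(large_prev):
    """Join (k-1)-itemsets into k-itemsets whose (k-1)-subsets are all large."""
    large_prev = {frozenset(s) for s in large_prev}
    if not large_prev:
        return set()
    k = len(next(iter(large_prev))) + 1
    candidates = set()
    for a, b in combinations(large_prev, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in large_prev for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates

def names(itemsets):
    """Render itemsets as sorted strings such as 'ABD' for easy comparison."""
    return sorted("".join(sorted(s)) for s in itemsets)

l2_e0 = "AB AC AD AE BC BD CD CE".split()   # basic time interval
l2_e1 = "AB AC AD BC BD CE".split()         # first covering 1-star pattern
l2_e2 = "AB AC AD BD CD".split()            # second covering 1-star pattern

temporal = apriori_gen(l2_e1) | apriori_gen(l2_e2)
direct = apriori_gen(l2_e0)
print(names(temporal))  # ['ABC', 'ABD', 'ACD']
print(names(direct))    # ['ABC', 'ABD', 'ACD', 'ACE', 'BCD']
```

Temporal aprioriGen produces three candidates where Direct-Apriori produces five. ACE and BCD are skipped because neither can be large for any star pattern covering the interval.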

18
Candidate Generation for Precise Match:
Horizontal Pruning
  • If an itemset l in Ck(e0) does not appear in any
    of the tentative Lk(e1), where e1 is a 1-star
    pattern that covers e0, then l cannot be large
    for any star pattern e that covers e0.
    Therefore, we drop l from Ck(e0)

19
Candidate Generation for Precise Match:
Horizontal Pruning
  • Example
  • Suppose that when the basic time interval e0 is
    being processed, the 1-star patterns e1 and e2
    that cover it already have
  • L3(e1) = {ABD}
  • L3(e2) = {ABD, ACD}.
  • We get C3(e0) = {ABC, ABD, ACD} after using
    temporal aprioriGen; we can further prune it by
  • C3(e0) = C3(e0) ∩ (L3(e1) ∪ L3(e2))
  • = {ABD, ACD}
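
Horizontal pruning for precise match reduces to one intersection. The sketch below replays this example, assuming every 1-star pattern covering the basic time interval already has a tentative large-itemset set. The function name horizontal_prune is hypothetical.

```python
def horizontal_prune(candidates_e0, tentative_large_1star):
    """Keep only candidates found in at least one covering 1-star pattern's tentative Lk."""
    allowed = set().union(*tentative_large_1star)
    return set(candidates_e0) & allowed

c3_e0 = {"ABC", "ABD", "ACD"}               # from temporal aprioriGen
l3_one_star = [{"ABD"}, {"ABD", "ACD"}]     # tentative L3 of the two 1-star patterns

print(sorted(horizontal_prune(c3_e0, l3_one_star)))  # ['ABD', 'ACD']
```

ABC is dropped before any support counting because no covering 1-star pattern still considers it large.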

20
Candidate Generation for Fuzzy Match: Temporal
AprioriGen
  • Temporal AprioriGen for precise match cannot be
    directly applied to the fuzzy match
    problem, because an itemset may be large for a
    star calendar pattern e even if it is not large
    for any 1-star pattern covered by e.
  • For example, consider a schema R = (week, day)
    and fuzzy match ratio m = 0.8. We can see and
    is large and is not large

21
Candidate Generation for Fuzzy Match: Temporal
AprioriGen
  • Change the temporal aprioriGen to apply to fuzzy
    match as follows:
  • Change the blue underlined part to Lk-1(e) when
    memory is the critical resource.

22
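Candidate Generation for Fuzzy Match: Temporal
AprioriGen
  • Example
  • Suppose that, for the basic time interval e0 and
    the star patterns e1, e2, e3 covering it, we
    already have
  • L2(e0) = {AB, AC, AD, AE, BD, CD, CE}
  • L2(e1) = {AB, AC, AD, BC, BD, CE}
  • L2(e2) = {AB, AC, AD, BD, CD}
  • L2(e3) = {AB, AD, BD, CD, AC, AE}
  • LT = L2(e0) ∩ L2(e1) = {AB, AC, AD, BD, CE}
  • C3(e1) = aprioriGen(LT) = {ABD}
  • Similarly, we can get C3(e2) = {ABD, ACD} and
    C3(e3) = {ABD, ACE}
  • → C3(e0) = C3(e1) ∪ C3(e2) ∪ C3(e3)
    = {ABD, ACD, ACE}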

23
Candidate Generation for Fuzzy Match: Horizontal
Pruning
  • The pruning idea is to discard the candidate
    itemsets that cannot be large for calendar
    pattern e even if they are large for basic time
    interval e0.

24
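Candidate Generation for Fuzzy Match: Horizontal
Pruning
  • Example
  • Suppose that, for the basic time interval e0 and
    the star patterns e1, e2, e3 covering it, we
    already have
  • C3(e0) = {ABD, ACD, ACE}
  • L3(e1), L3(e2), L3(e3) have been
    updated once.
  • C3(e1) = {ABD, ABE}, C3(e2) = {ABD, ACD},
    C3(e3) = {ABD}
  • Then C3(e0) can be pruned as
  • C3(e0) = C3(e0) ∩ (C3(e1) ∪ C3(e2) ∪ C3(e3))
  • = {ABD, ACD}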

25
Experiments
  • Real Data set
  • Data file consists of homepage request records,
    each of which contains attribute values
    describing the request and the person who sent
    the request.
  • Data file records are from Jan 30 to Mar 31, 2000
  • Calendar schema used: R = (week, day, timeofday),
    where timeofday contains
    (0am-8am), daytime (8am-4pm), evening (4pm-12pm)
  • Data set contains 777,480 transactions, 23.4
    items per transaction on average.

26
Experiments
  • Real data set result

27
Experiments
  • Synthetic Data Set
  • Extend the data generator proposed in [AS94] to
    incorporate temporal features.

28
Experiments
  • Synthetic Data Set Result

29
Conclusions
  • Developed a new representation mechanism for
    temporal association rules on the basis of
    calendars and identified two classes of
    interesting temporal association rules: w.r.t.
    precise match and w.r.t. fuzzy match.
  • The representation requires less prior knowledge,
    and the resulting time intervals are easier to
    understand
  • Extended the Apriori algorithm and developed two
    optimization techniques to discover both classes
    of temporal association rules
  • Experiments show that the optimization techniques
    are effective

30
Possible Future Work
  • The approach requires that, for a calendar schema
    (fn, fn-1, ..., f1), each calendar unit of fi is
    uniquely contained in a unit of fi+1, where
    0 < i < n
  • E.g., (year, month, week) is NOT allowed because
    a week may not be contained in a unique month
  • Consider temporal patterns in other data mining
    problems such as clustering, etc.

31
References
  • Y. Li, P. Ning, X. S. Wang, and S. Jajodia.
    Discovering calendar-based temporal association
    rules. In the Eighth International Symposium on
    Temporal Representation and Reasoning (TIME '01)
  • [AS94] R. Agrawal and R. Srikant. Fast algorithms
    for mining association rules in large databases.
    VLDB '94
  • S. Ramaswamy, S. Mahajan and A. Silberschatz. On
    the discovery of interesting patterns in
    association rules. VLDB '98

32
Questions and Answers
  • Any Questions?
