Association Rules Mining with SQL - PowerPoint PPT Presentation

1
Association Rules Mining with SQL
  • Kirsten Nelson
  • Deepen Manek
  • November 24, 2003

2
Organization of Presentation
  • Overview: Data Mining and RDBMS
  • Loosely-coupled data and programs
  • Tightly-coupled data and programs
  • Architectural approaches
  • Methods of writing efficient SQL
  • Candidate generation, pruning, support counting
  • K-way join, SubQuery, GatherJoin, Vertical,
    Hybrid
  • Integrating taxonomies
  • Mining sequential patterns

3
Early data mining applications
  • Most early mining systems were developed largely
    on file systems, with specialized data structures
    and buffer management strategies devised for each
  • All data was read into memory before beginning
    computation
  • This limits the amount of data that can be mined

4
Advantage of SQL and RDBMS
  • Make use of database indexing and query
    processing capabilities
  • More than a decade spent on making these systems
    robust, portable, scalable, and concurrent
  • Exploit underlying SQL parallelization
  • For long-running algorithms, use checkpointing
    and space management

5
Organization of Presentation
  • Overview: Data Mining and RDBMS
  • Loosely-coupled data and programs
  • Tightly-coupled data and programs
  • Architectural approaches
  • Methods of writing efficient SQL
  • Candidate generation, pruning, support counting
  • K-way join, SubQuery, GatherJoin, Vertical,
    Hybrid
  • Integrating taxonomies
  • Mining sequential patterns

6
Use of Database in Data Mining
  • Loose coupling of application and data
  • How would you write an Apriori program?
  • Use SQL statements in an application
  • Use a cursor interface to read through records
    sequentially for each pass
  • Still two major performance problems
  • Copying of record from database to memory
  • Process context switching for each record
    retrieved
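The loose-coupling pattern above can be sketched in Python, with SQLite standing in for the RDBMS; the `sales` table and its contents are illustrative, not from the papers. The point of the sketch is the shape of the loop: every row crosses the DBMS/application boundary through the cursor, which is exactly the copying and context-switching cost the slide names.

```python
import sqlite3

# Loose coupling: the mining loop runs in the application and pulls
# every (tid, item) row through a cursor, one scan per Apriori pass.
# Table name and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("create table sales (tid integer, item text)")
conn.executemany("insert into sales values (?, ?)",
                 [(1, "A"), (1, "B"), (2, "B"), (2, "C"), (3, "A"), (3, "B")])

minsup = 2
support = {}
# One sequential scan: each fetched row is copied out of the database
# and handed to the application, the overhead noted above.
for tid, item in conn.execute("select tid, item from sales order by tid"):
    support[item] = support.get(item, 0) + 1

F1 = {item for item, cnt in support.items() if cnt >= minsup}
print(F1)  # frequent 1-itemsets: {'A', 'B'}
```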

7
Organization of Presentation
  • Overview: Data Mining and RDBMS
  • Loosely-coupled data and programs
  • Tightly-coupled data and programs
  • Architectural approaches
  • Methods of writing efficient SQL
  • Candidate generation, pruning, support counting
  • K-way join, SubQuery, GatherJoin, Vertical,
    Hybrid
  • Integrating taxonomies
  • Mining sequential patterns

8
Tightly-coupled applications
  • Push computations into the database system to
    avoid performance degradation
  • Take advantage of user-defined functions (UDFs)
  • Does not require changes to database software
  • Two types of UDFs we will use
  • Ones that are executed only a few times,
    regardless of the number of rows
  • Ones that are executed once for each selected row

9
Tight-coupling using UDFs
  • Procedure TightlyCoupledApriori()
  • begin
  •   exec sql connect to database;
  •   exec sql select allocSpace() into :blob from onerecord;
  •   exec sql select * from sales where GenL1(:blob, TID, ITEMID) = 1;
  •   notDone := true;

10
Tight-coupling using UDFs
  • while notDone do
  •   exec sql select aprioriGen(:blob)
  •     into :blob from onerecord;
  •   exec sql select *
  •     from sales
  •     where itemCount(:blob, TID, ITEMID) = 1;
  •   exec sql select GenLk(:blob) into :notDone from onerecord;

11
Tight-coupling using UDFs
  • exec sql select getResult(:blob) into :resultBlob from onerecord;
  • exec sql select deallocSpace(:blob) from onerecord;
  • compute Answer using resultBlob
  • end

12
Organization of Presentation
  • Overview: Data Mining and RDBMS
  • Loosely-coupled data and programs
  • Tightly-coupled data and programs
  • Architectural approaches
  • Methods of writing efficient SQL
  • Candidate generation, pruning, support counting
  • K-way join, SubQuery, GatherJoin, Vertical,
    Hybrid
  • Integrating taxonomies
  • Mining sequential patterns

13
Methodology
  • Comparison done with association rule mining on IBM DB2
  • Only consider generation of frequent itemsets using the Apriori algorithm
  • Five alternatives considered
  • Loose-coupling through the SQL cursor interface, as described earlier
  • UDF tight-coupling, as described earlier
  • Stored-procedure to encapsulate the mining algorithm
  • Cache-mine: cache the data and mine on the fly
  • SQL implementations to force processing in the database
  • Consider two classes of SQL implementations
  • SQL-92: four different implementations
  • SQL-OR (with object-relational extensions): six implementations

14
Architectural Options
  • Stored procedure
  • Apriori algorithm encapsulated as a stored procedure
  • Implication: runs in the same address space as the DBMS
  • Mined results stored back into the DBMS
  • Cache-mine
  • Variation of stored-procedure
  • Read entire data once from DBMS, temporarily cache data in a lookaside buffer on a local disk
  • Cached data is discarded when execution completes
  • Disadvantage: requires additional disk space for caching
  • Uses Intelligent Miner's space option

15
Organization of Presentation
  • Overview: Data Mining and RDBMS
  • Loosely-coupled data and programs
  • Tightly-coupled data and programs
  • Architectural approaches
  • Methods of writing efficient SQL
  • Candidate generation, pruning, support counting
  • K-way join, SubQuery, GatherJoin, Vertical,
    Hybrid
  • Integrating taxonomies
  • Mining sequential patterns

16
Terminology
  • Use the following terminology
  • T: transaction table of (tid, item) pairs
  • Data is normally sorted by transaction id
  • Ck: candidate k-itemsets
  • Obtained by joining and pruning the frequent itemsets from the previous iteration
  • Fk: frequent itemsets of length k
  • Obtained from Ck and T

17
Candidate Generation in SQL: join step
  • Generate Ck from Fk-1 by joining Fk-1 with itself
  • insert into Ck select I1.item1, ..., I1.itemk-1, I2.itemk-1
  • from Fk-1 I1, Fk-1 I2
  • where I1.item1 = I2.item1 and ... and
  •   I1.itemk-2 = I2.itemk-2 and
  •   I1.itemk-1 < I2.itemk-1

18
Candidate Generation Example
  • F3 is {1,2,3}, {1,2,4}, {1,3,4}, {1,3,5}, {2,3,4}
  • C4 is {1,2,3,4}, {1,3,4,5}

Table F3 (I1)
Table F3 (I2)
item1 item2 item3
1 2 3
1 2 4
1 3 4
1 3 5
2 3 4
item1 item2 item3
1 2 3
1 2 4
1 3 4
1 3 5
2 3 4
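The join step on slide 18 can be checked directly; this is a sketch against SQLite rather than DB2, using the slide's F3 with numeric items:

```python
import sqlite3

# Self-join of F3 per the slide-17 query for k = 4: equal on the first
# k-2 items, '<' on the last, producing C4 before any pruning.
conn = sqlite3.connect(":memory:")
conn.execute("create table F3 (item1, item2, item3)")
conn.executemany("insert into F3 values (?,?,?)",
                 [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)])

C4 = conn.execute("""
    select I1.item1, I1.item2, I1.item3, I2.item3
    from F3 I1, F3 I2
    where I1.item1 = I2.item1 and I1.item2 = I2.item2
      and I1.item3 < I2.item3
""").fetchall()
print(sorted(C4))  # [(1, 2, 3, 4), (1, 3, 4, 5)]
```

The two candidates come from joining {1,2,3} with {1,2,4} and {1,3,4} with {1,3,5}, matching the slide.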
19
Pruning
  • Modify the candidate generation algorithm to ensure all k subsets of length (k-1) of each itemset in Ck are in Fk-1
  • Do a k-way join, skipping itemn-2 when joining with the nth table (2 < n ≤ k)
  • Create a primary index (item1, ..., itemk-1) on Fk-1 to efficiently process the k-way join
  • For k = 4, this becomes
  • insert into C4 select I1.item1, I1.item2, I1.item3, I2.item3
  • from F3 I1, F3 I2, F3 I3, F3 I4
  • where I1.item1 = I2.item1 and I1.item2 = I2.item2 and I1.item3 < I2.item3 and
  •   I1.item2 = I3.item1 and I1.item3 = I3.item2 and I2.item3 = I3.item3 and
  •   I1.item1 = I4.item1 and I1.item3 = I4.item2 and I2.item3 = I4.item3
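The full pruning join can be run the same way (again a SQLite sketch, not the DB2 setup from the papers). With the slide's F3, the extra joins against I3 and I4 check the remaining 3-subsets, so {1,3,4,5} is dropped because {3,4,5} is not in F3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table F3 (item1, item2, item3)")
conn.executemany("insert into F3 values (?,?,?)",
                 [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)])

# 4-way join: I1/I2 generate the candidate, I3 checks the subset that
# skips item1, I4 checks the subset that skips item2.
C4 = conn.execute("""
    select I1.item1, I1.item2, I1.item3, I2.item3
    from F3 I1, F3 I2, F3 I3, F3 I4
    where I1.item1 = I2.item1 and I1.item2 = I2.item2 and I1.item3 < I2.item3
      and I1.item2 = I3.item1 and I1.item3 = I3.item2 and I2.item3 = I3.item3
      and I1.item1 = I4.item1 and I1.item3 = I4.item2 and I2.item3 = I4.item3
""").fetchall()
print(C4)  # [(1, 2, 3, 4)] -- {1,3,4,5} pruned
```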

20
Pruning Example
  • Evaluate the join with I3 using the previous example
  • C4 is {1,2,3,4}

Table F3 (I1)
Table F3 (I2)
Table F3 (I3)
item1 item2 item3
1 2 3
1 2 4
1 3 4
1 3 5
2 3 4
item1 item2 item3
1 2 3
1 2 4
1 3 4
1 3 5
2 3 4
item1 item2 item3
1 2 3
1 2 4
1 3 4
1 3 5
2 3 4
21
Support counting using SQL
  • Two different approaches
  • Use the SQL-92 standard
  • Use standard SQL syntax such as joins and
    subqueries to find support of itemsets
  • Use object-relational extensions of SQL (SQL-OR)
  • User Defined Functions (UDFs) and table functions
  • Binary Large Objects (BLOBs)

22
Support Counting using SQL-92
  • 4 different methods, two of which are detailed in the
    papers
  • K-way Joins
  • SubQuery
  • Other methods not discussed because of
    unacceptable performance
  • 3-way join
  • 2 Group-Bys

23
SQL-92 K-way join
  • Obtain Fk by joining Ck with the table T of (tid, item) pairs
  • Perform a group by on the itemset
  • insert into Fk select item1, ..., itemk, count(*)
  • from Ck, T t1, ..., T tk
  • where t1.item = Ck.item1 and ... and
  •   tk.item = Ck.itemk and
  •   t1.tid = t2.tid and ... and
  •   tk-1.tid = tk.tid
  • group by item1, ..., itemk
  • having count(*) > minsup

24
K-way join example
  • C3 = {B,C,E} and the minimum support required is 2
  • Insert into F3: (B,C,E,2)
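The "previous example from class" is not reproduced in this transcript; the sketch below assumes the classic four-transaction Apriori dataset (tids 10-40), runs the K-way join in SQLite, and treats the minimum support of 2 inclusively so that (B,C,E,2) qualifies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T (tid integer, item text)")
conn.executemany("insert into T values (?,?)",
    [(10, "A"), (10, "C"), (10, "D"), (20, "B"), (20, "C"), (20, "E"),
     (30, "A"), (30, "B"), (30, "C"), (30, "E"), (40, "B"), (40, "E")])
conn.execute("create table C3 (item1 text, item2 text, item3 text)")
conn.execute("insert into C3 values ('B','C','E')")

# One T alias per candidate position; equal tids stitch the k rows of a
# transaction back together, then the group-by counts transactions.
F3 = conn.execute("""
    select item1, item2, item3, count(*)
    from C3, T t1, T t2, T t3
    where t1.item = C3.item1 and t2.item = C3.item2 and t3.item = C3.item3
      and t1.tid = t2.tid and t2.tid = t3.tid
    group by item1, item2, item3
    having count(*) >= 2
""").fetchall()
print(F3)  # [('B', 'C', 'E', 2)] -- {B,C,E} occurs in tids 20 and 30
```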

25
K-way join Pass-2 optimization
  • When calculating C2, no pruning is required after we join F1 with itself
  • Don't calculate and materialize C2; instead replace C2 in the K-way join with the join of F1 with itself
  • insert into F2 select I1.item1, I2.item1, count(*)
  • from F1 I1, F1 I2, T t1, T t2
  • where I1.item1 < I2.item1 and
  •   t1.item = I1.item1 and t2.item = I2.item1 and
  •   t1.tid = t2.tid
  • group by I1.item1, I2.item1
  • having count(*) > minsup

26
SQL-92 SubQuery based
  • Split support counting into a cascade of k subqueries
  • The nth subquery Qn finds all tids that match the distinct itemsets formed by the first n items of Ck
  • insert into Fk select item1, ..., itemk, count(*)
  • from (Subquery Qk) t
  • group by item1, item2, ..., itemk
  • having count(*) > minsup
  • Subquery Qn (for any n between 1 and k):
  • select item1, ..., itemn, tid
  • from T tn, (Subquery Qn-1) as rn-1,
  •   (select distinct item1, ..., itemn from Ck) as dn
  • where rn-1.item1 = dn.item1 and ... and rn-1.itemn-1 = dn.itemn-1
  •   and rn-1.tid = tn.tid and tn.item = dn.itemn

27
Example of SubQuery based
  • Using the previous example from class
  • C3 = {B,C,E}, minimum support 2
  • Q0: there is no subquery Q0
  • Q1 in this case becomes
  • select item1, tid
  • from T t1,
  •   (select distinct item1 from C3) as d1
  • where t1.item = d1.item1

28
Example of SubQuery based (cont'd)
  • Q2 becomes
  • select item1, item2, tid
  • from T t2, (Subquery Q1) as r1,
  •   (select distinct item1, item2 from C3) as d2
  • where r1.item1 = d2.item1 and r1.tid = t2.tid and t2.item = d2.item2

29
Example of SubQuery based (cont'd)
  • Q3 becomes
  • select item1, item2, item3, tid
  • from T t3, (Subquery Q2) as r2,
  •   (select distinct item1, item2, item3 from C3) as d3
  • where r2.item1 = d3.item1 and r2.item2 = d3.item2 and
  •   r2.tid = t3.tid and t3.item = d3.item3

30
Example of SubQuery based (cont'd)
  • Output of Q3 is
  • The insert statement becomes
  • insert into F3 select item1, item2, item3, count(*)
  • from (Subquery Q3) t
  • group by item1, item2, item3 having count(*) > minsup
  • Insert the row (B,C,E,2)
  • For Q2, the pass-2 optimization can be used
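The whole cascade Q1 to Q3 can be written as chained CTEs; this is a SQLite sketch on the same assumed four-transaction dataset. Column references are qualified (d1.item1 and so on) because SQLite rejects the ambiguous shorthand used on the slides, and the minimum support of 2 is again treated inclusively:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T (tid integer, item text)")
conn.executemany("insert into T values (?,?)",
    [(10, "A"), (10, "C"), (10, "D"), (20, "B"), (20, "C"), (20, "E"),
     (30, "A"), (30, "B"), (30, "C"), (30, "E"), (40, "B"), (40, "E")])
conn.execute("create table C3 (item1 text, item2 text, item3 text)")
conn.execute("insert into C3 values ('B','C','E')")

# Each Qn narrows the tids matching the first n items of a candidate.
F3 = conn.execute("""
    with Q1 as (
      select d1.item1, t1.tid from T t1,
           (select distinct item1 from C3) as d1
      where t1.item = d1.item1),
    Q2 as (
      select d2.item1, d2.item2, t2.tid from T t2, Q1 r1,
           (select distinct item1, item2 from C3) as d2
      where r1.item1 = d2.item1 and r1.tid = t2.tid and t2.item = d2.item2),
    Q3 as (
      select d3.item1, d3.item2, d3.item3, t3.tid from T t3, Q2 r2,
           (select distinct item1, item2, item3 from C3) as d3
      where r2.item1 = d3.item1 and r2.item2 = d3.item2
        and r2.tid = t3.tid and t3.item = d3.item3)
    select item1, item2, item3, count(*) from Q3
    group by item1, item2, item3 having count(*) >= 2
""").fetchall()
print(F3)  # [('B', 'C', 'E', 2)]
```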

31
Performance Comparisons of SQL-92 approaches
  • Used Version 5 of DB2 UDB on an RS/6000 Model 140
  • 200 MHz CPU, 256 MB main memory, 9 GB of disk space, transfer rate of 8 MB/sec
  • Used 4 different datasets based on real-world data
  • Built the following indexes, which are not included in any cost calculations
  • Composite index (item1, ..., itemk) on Ck
  • k different indices on Ck, one on each of the k items
  • (item, tid) and (tid, item) indexes on the data table T

32
Performance Comparisons of SQL-92 approaches
  • Best performance among the SQL-92 approaches was obtained by SubQuery
  • Even SubQuery was only comparable to loose-coupling in some cases, failing to complete in others
  • For Dataset-C with a support of 2%, SubQuery outperforms loose-coupling, but when the support is decreased to 1%, SubQuery takes 10 times as long to complete
  • Lower support increases the size of Ck and Fk at each step, causing the join to process more rows

33
Support Counting using SQL with
object-relational extensions
  • 6 different methods, four of which are detailed in
    the papers
  • GatherJoin
  • GatherCount
  • GatherPrune
  • Vertical
  • Other methods not discussed because of
    unacceptable performance
  • Horizontal
  • SBF

34
SQL Object-Relational Extension GatherJoin
  • Generates all possible k-item combinations of items contained in a transaction and joins them with Ck
  • An index is created on all items of Ck
  • Uses the following table functions
  • Gather: outputs records (tid, item-list), with item-list being a BLOB or VARCHAR containing all items associated with the tid
  • Comb-K: returns all k-item combinations from the transaction
  • Output has k attributes T_itm1, ..., T_itmk

35
GatherJoin
  • insert into Fk select item1, ..., itemk, count(*)
  • from Ck,
  •   (select t2.T_itm1, ..., t2.T_itmk from T,
  •     table(Gather(T.tid, T.item)) as t1,
  •     table(Comb-K(t1.tid, t1.item-list)) as t2)
  • where t2.T_itm1 = Ck.item1 and ... and
  •   t2.T_itmk = Ck.itemk
  • group by Ck.item1, ..., Ck.itemk
  • having count(*) > minsup

36
Example of GatherJoin
  • t1 (output from Gather) looks like
  • t2 (generated by Comb-K from t1) will be joined with C3 to obtain F3
  • 1 row from tid 10
  • 1 row from tid 20
  • 4 rows from tid 30
  • Insert (B,C,E,2)
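A Python stand-in for the Gather and Comb-K table functions makes the row counts on this slide concrete; the dataset is again the assumed four-transaction example, and `itertools.combinations` plays the role of Comb-K:

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("create table T (tid integer, item text)")
conn.executemany("insert into T values (?,?)",
    [(10, "A"), (10, "C"), (10, "D"), (20, "B"), (20, "C"), (20, "E"),
     (30, "A"), (30, "B"), (30, "C"), (30, "E"), (40, "B"), (40, "E")])

# Gather: collapse T into tid -> sorted item list (the item-list BLOB).
gathered = {}
for tid, item in conn.execute("select tid, item from T order by tid, item"):
    gathered.setdefault(tid, []).append(item)

# Comb-K for k = 3: tids 10 and 20 each yield 1 combination, tid 30
# yields 4, and tid 40 (only 2 items) yields none, as on the slide.
k = 3
t2 = [(tid, combo) for tid, items in gathered.items()
      for combo in combinations(items, k)]

# The join with C3 and group-by reduce to counting each combination.
counts = {}
for tid, combo in t2:
    counts[combo] = counts.get(combo, 0) + 1
print(counts[("B", "C", "E")])  # 2 -> insert (B,C,E,2)
```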

37
GatherJoin Pass 2 optimization
  • When calculating C2, no pruning is required after we join F1 with itself
  • Don't calculate and materialize C2; instead join with F1 before the table function
  • Gather is only passed frequent 1-itemset rows
  • insert into F2 select t2.T_itm1, t2.T_itm2, count(*)
  • from F1 I1,
  •   (select t2.T_itm1, t2.T_itm2 from T,
  •     table(Gather(T.tid, T.item)) as t1,
  •     table(Comb-K(t1.tid, t1.item-list)) as t2
  •   where T.item = I1.item1)
  • group by t2.T_itm1, t2.T_itm2
  • having count(*) > minsup

38
Variations of GatherJoin - GatherCount
  • Perform the GROUP BY inside the table function
    Comb-K for pass 2 optimization
  • Output of the table function Comb-K
  • Not the candidate frequent itemsets (Ck)
  • But the actual frequent itemsets (Fk) along with
    the corresponding support
  • Use a 2-dimensional array to store possible
    frequent itemsets in Comb-K
  • May lead to excessive memory use

39
Variations of GatherJoin - GatherPrune
  • Push the join with Ck into the table function Comb-K
  • Ck is converted into a BLOB and passed as an argument to the table function
  • The BLOB must be passed for each invocation of Comb-K, i.e. once per row of table T

40
SQL Object-Relational Extension Vertical
  • For each item, create a BLOB containing the tids of the transactions the item belongs to
  • Use function Gather to generate (item, tid-list) pairs, storing the results in table TidTable
  • Tid-lists are all in the same sorted order
  • Use function Intersect to compare two different tid-lists and extract common values
  • Pass-2 optimization can be used for Vertical
  • Similar to the K-way join method
  • Upcoming example does not show the optimization

41
Vertical
  • insert into Fk select item1, ..., itemk, count(tid-list) as cnt
  • from (Subquery Qk) t where cnt > minsup
  • Subquery Qn (for any n between 2 and k):
  • select item1, ..., itemn,
  •   Intersect(rn-1.tid-list, tn.tid-list) as tid-list
  • from TidTable tn, (Subquery Qn-1) as rn-1,
  •   (select distinct item1, ..., itemn from Ck) as dn
  • where rn-1.item1 = dn.item1 and ... and
  •   rn-1.itemn-1 = dn.itemn-1 and
  •   tn.item = dn.itemn
  • Subquery Q1: (select * from TidTable)

42
Example of Vertical
  • Using the previous example from class
  • C3 = {B,C,E}, minimum support 2
  • Q1 is simply TidTable

43
Example of Vertical (cont'd)
  • Q2 becomes
  • select item1, item2, Intersect(r1.tid-list, t2.tid-list) as tid-list
  • from TidTable t2, (Subquery Q1) as r1,
  •   (select distinct item1, item2 from C3) as d2
  • where r1.item1 = d2.item1 and t2.item = d2.item2

44
Example of Vertical (cont'd)
  • Q3 becomes
  • select item1, item2, item3, Intersect(r2.tid-list, t3.tid-list) as tid-list
  • from TidTable t3, (Subquery Q2) as r2,
  •   (select distinct item1, item2, item3 from C3) as d3
  • where r2.item1 = d3.item1 and r2.item2 = d3.item2 and
  •   t3.item = d3.item3
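The essence of Vertical, intersecting tid-lists level by level, can be sketched in pure Python with sets standing in for the BLOB tid-lists and set intersection standing in for the Intersect UDF; the TidTable contents are derived from the assumed four-transaction example:

```python
# TidTable for the assumed example: item -> set of tids (the BLOB).
tidlist = {
    "A": {10, 30}, "B": {20, 30, 40},
    "C": {10, 20, 30}, "E": {20, 30, 40},
}
C3 = [("B", "C", "E")]
minsup = 2  # treated inclusively here

F3 = []
for i1, i2, i3 in C3:
    # Q2: intersect the first two items' tid-lists ...
    r2 = tidlist[i1] & tidlist[i2]
    # Q3: ... then intersect with the third item's tid-list;
    # the support is just the length of the final tid-list.
    r3 = r2 & tidlist[i3]
    if len(r3) >= minsup:
        F3.append((i1, i2, i3, len(r3)))
print(F3)  # [('B', 'C', 'E', 2)]
```

Because each intersection shrinks the tid-list, most of the work happens once per candidate rather than once per data row, which is why Vertical wins on the later passes.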

45
Performance Comparisons using SQL-OR
46
Performance Comparisons using SQL-OR
47
Performance comparison of SQL object-relational
approaches
  • Vertical has the best overall performance, sometimes an order of magnitude better than the other 3 approaches
  • The majority of its time is spent transforming the data into (item, tid-list) pairs
  • Vertical spends too much time on the second pass
  • Pass-2 optimization has a huge impact on the performance of GatherJoin
  • For Dataset-B with a support of 0.1%, the running time for pass 2 went from 5.2 hours to 10 minutes
  • Comb-K in GatherJoin generates a large number of potential frequent itemsets that must be processed

48
Hybrid approach
  • Previous charts and algorithm analysis show
  • Vertical spends too much time on pass 2 compared to the other algorithms, especially when the support is decreased
  • GatherJoin degrades when the number of frequent items per transaction increases
  • To improve performance, use a hybrid algorithm
  • Use Vertical for most cases
  • When the size of the candidate itemset is too large, GatherJoin is a good option if the number of frequent items per transaction (Nf) is not too large
  • When Nf is large, GatherCount may be the only good option

49
Architecture Comparisons
  • Compare five alternatives
  • Loose-Coupling, Stored-procedure
  • Basically the same except for the address space the program runs in
  • Because of the limited difference in performance, the following charts focus solely on stored procedure
  • Cache-Mine
  • UDF tight-coupling
  • Best SQL approach (Hybrid)

50
Performance Comparisons of Architectures
51
Performance Comparisons of Architectures (cont'd)
52
Performance Comparisons of Architectures (cont'd)
  • Cache-Mine has the best or close to the best performance in all cases
  • Factor of 0.8 to 2 times faster than the SQL approach
  • Stored procedure is the worst
  • The difference between Cache-Mine and stored procedure is directly related to the number of passes through the data
  • Passes increase when the support goes down
  • May need to make multiple passes if all candidates cannot fit in memory
  • UDF time per pass decreases 30-50% compared to stored procedure because of tighter coupling with the DB

53
Performance Comparisons of Architectures (cont'd)
  • The SQL approach comes in second in performance to Cache-Mine
  • Somewhat better than Cache-Mine for high support values
  • 1.8 to 3 times better than the Stored-procedure/loose-coupling approach, getting better as the support value decreases
  • The cost of converting to the Vertical format is less than the cost of converting to binary format in Cache-Mine
  • For the second pass through the data, the SQL approach takes much more time than Cache-Mine, particularly when we decrease the support

54
Organization of Presentation
  • Overview: Data Mining and RDBMS
  • Loosely-coupled data and programs
  • Tightly-coupled data and programs
  • Architectural approaches
  • Methods of writing efficient SQL
  • Candidate generation, pruning, support counting
  • K-way join, SubQuery, GatherJoin, Vertical,
    Hybrid
  • Integrating taxonomies
  • Mining sequential patterns

55
Taxonomies - example
Taxonomy tree: Beverages splits into Soft Drinks and Alcoholic Drinks; Soft Drinks into Pepsi and Coke; Alcoholic Drinks into Beer; Snacks into Pretzels and Chocolate Bar.

Parent            Child
Beverages         Soft Drinks
Beverages         Alcoholic Drinks
Soft Drinks       Pepsi
Soft Drinks       Coke
Alcoholic Drinks  Beer
Snacks            Pretzels
Snacks            Chocolate Bar

Example rule: Soft Drinks → Pretzels with 30% confidence, 2% support
56
Taxonomy augmentation
  • Algorithms similar to previous slides
  • Requires two additions to algorithm
  • Pruning itemsets containing an item and its
    ancestor
  • Pre-computing the ancestors for each item
  • Will also consider support counting

57
Pruning items and ancestors
  • In the second pass we will join F1 with F1 to give C2
  • This will give, for example
  • (beverages, pepsi)
  • (snacks, coke)
  • (pretzels, chocolate bar)
  • But (beverages, pepsi) is redundant!

58
Pruning items and ancestors
  • The following modification to the SQL statement eliminates such redundant combinations from being selected
  • insert into C2 (select I1.item1, I2.item1 from F1 I1, F1 I2
  •   where I1.item1 < I2.item1) except
  •   (select ancestor, descendant from Ancestor union
  •   select descendant, ancestor from Ancestor)
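This can be sketched in SQLite with a tiny F1 drawn from the taxonomy example; note that SQLite does not allow a parenthesized compound select, so the union of the two Ancestor projections is expressed as two chained excepts, which is set-equivalent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table F1 (item1 text)")
conn.executemany("insert into F1 values (?)",
                 [("beverages",), ("pepsi",), ("pretzels",)])
conn.execute("create table Ancestor (ancestor text, descendant text)")
conn.execute("insert into Ancestor values ('beverages','pepsi')")

# A except B except C evaluates left to right, i.e. A minus (B union C),
# removing both orientations of every ancestor/descendant pair.
C2 = conn.execute("""
    select I1.item1, I2.item1 from F1 I1, F1 I2
    where I1.item1 < I2.item1
    except select ancestor, descendant from Ancestor
    except select descendant, ancestor from Ancestor
""").fetchall()
print(sorted(C2))  # (beverages, pepsi) is gone
```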

59
Pre-computing ancestors
  • An Ancestor table is created
  • Format: (ancestor, descendant)
  • Use the transitive closure operation
  • insert into Ancestor with R-Tax (ancestor, descendant) as
  •   (select parent, child from Tax union all
  •   select p.ancestor, c.child from R-Tax p, Tax c
  •   where p.descendant = c.parent)
  • select ancestor, descendant from R-Tax
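Run against the taxonomy from the example slide in SQLite, the closure query looks like this; SQLite spells the recursive CTE `with recursive`, and the CTE is renamed RTax because a hyphenated identifier would need quoting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Tax (parent text, child text)")
conn.executemany("insert into Tax values (?,?)", [
    ("Beverages", "Soft Drinks"), ("Beverages", "Alcoholic Drinks"),
    ("Soft Drinks", "Pepsi"), ("Soft Drinks", "Coke"),
    ("Alcoholic Drinks", "Beer"),
    ("Snacks", "Pretzels"), ("Snacks", "Chocolate Bar")])

# Base case: direct parent/child edges; recursive case: extend each
# known ancestor path by one more Tax edge.
anc = conn.execute("""
    with recursive RTax (ancestor, descendant) as (
        select parent, child from Tax
        union all
        select p.ancestor, c.child from RTax p, Tax c
        where p.descendant = c.parent)
    select ancestor, descendant from RTax
""").fetchall()
print(("Beverages", "Pepsi") in anc)  # True: a derived, non-direct pair
```

The closure adds Beverages/Pepsi, Beverages/Coke, and Beverages/Beer to the seven direct pairs, for ten rows in total.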

60
Support Counting
  • Extensions to handle taxonomies
  • Straightforward, but non-trivial
  • Need an extended transaction table T*
  • For example, if we have {coke, pretzels}
  • We also add {soft drinks, pretzels}, {beverages, pretzels}, {coke, snacks}, {soft drinks, snacks}, {beverages, snacks}

61
Extended transaction table
  • Can be obtained by the following SQL
  • Query to generate T*:
  • select item, tid from T union
  • select distinct A.ancestor as item, T.tid
  • from T, Ancestor A
  • where A.descendant = T.item
  • The select distinct clause gets rid of items with a common ancestor, e.g. we don't want (beverages, tid) being added twice from pepsi and coke
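A SQLite sketch of the extended-table query, with a small transaction table invented for illustration and the Ancestor rows taken from the taxonomy example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T (tid integer, item text)")
conn.executemany("insert into T values (?,?)",
                 [(1, "Pepsi"), (1, "Pretzels"), (2, "Coke")])
conn.execute("create table Ancestor (ancestor text, descendant text)")
conn.executemany("insert into Ancestor values (?,?)", [
    ("Soft Drinks", "Pepsi"), ("Beverages", "Pepsi"),
    ("Soft Drinks", "Coke"), ("Beverages", "Coke"),
    ("Snacks", "Pretzels")])

# Original rows plus one row per (ancestor, tid); union deduplicates.
Tstar = conn.execute("""
    select item, tid from T
    union
    select distinct A.ancestor as item, T.tid
    from T, Ancestor A
    where A.descendant = T.item
""").fetchall()
# tid 1 gains Soft Drinks, Beverages, Snacks; tid 2 gains two ancestors.
print(len(Tstar))  # 8 rows: 3 original + 5 ancestor rows
```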

62
Pipelining of Query
  • No need to actually build T*
  • Make the following modification to the SQL

insert into Fk with T*(tid, item) as (Query for T*)
select item1, ..., itemk, count(*)
from Ck, T* t1, ..., T* tk
where t1.item = Ck.item1 and ... and tk.item = Ck.itemk
  and t1.tid = t2.tid and ... and tk-1.tid = tk.tid
group by item1, ..., itemk
having count(*) > minsup
63
Organization of Presentation
  • Overview: Data Mining and RDBMS
  • Loosely-coupled data and programs
  • Tightly-coupled data and programs
  • Architectural approaches
  • Methods of writing efficient SQL
  • Candidate generation, pruning, support counting
  • K-way join, SubQuery, GatherJoin, Vertical,
    Hybrid
  • Integrating taxonomies
  • Mining sequential patterns

64
Sequential patterns
  • Similar to papers covered on Nov 17
  • Input is sequences of transactions
  • E.g. ((computer,modem),(printer))
  • Similar to association rules, but dealing with
    sequences as opposed to sets
  • Can also specify maximum and minimum time gaps,
    as well as sliding time windows
  • Max-gap, min-gap, window-size

65
Input and output formats
  • Input has three columns
  • Sequence identifier (sid)
  • Transaction time (time)
  • Item identifier (item)
  • Output format is a collection of frequent
    sequences, in a fixed-width table
  • (item1, eno1, ..., itemk, enok, len)
  • For smaller lengths, extra column values are set
    to NULL

66
GSP algorithm
  • Similar to the algorithms shown earlier
  • Each Ck has item and eno (element number) columns but no len column, since every candidate has a fixed length of k
  • Candidates are generated in two steps
  • Join: join Fk-1 with itself
  • Sequence s1 joins with s2 if the subsequence obtained by dropping the first item of s1 is the same as the one obtained by dropping the last item of s2
  • When generating C2, we need to generate sequences where both of the items appear as a single element as well as two separate elements
  • Prune:
  • All candidate sequences that have a non-frequent contiguous (k-1)-subsequence are deleted

67
GSP Join SQL
  • insert into Ck
  • select I1.item1, I1.eno1, ..., I1.itemk-1, I1.enok-1,
  •   I2.itemk-1, I1.enok-1 + I2.enok-1 - I2.enok-2
  • from Fk-1 I1, Fk-1 I2
  • where I1.item2 = I2.item1 and ... and I1.itemk-1 = I2.itemk-2 and
  •   I1.eno3 - I1.eno2 = I2.eno2 - I2.eno1 and ... and
  •   I1.enok-1 - I1.enok-2 = I2.enok-2 - I2.enok-3
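The same join condition, dropping s1's first item against s2's last with element gaps compared relatively, can be sketched in pure Python; sequences here are tuples of (item, eno) pairs, and the helper names are invented for this sketch:

```python
# Normalize a sequence's element numbers so two subsequences can be
# compared by their relative gaps, as the eno arithmetic above does.
def drop_first(seq):
    rest = seq[1:]
    base = rest[0][1]
    return tuple((item, eno - base) for item, eno in rest)

def drop_last(seq):
    base = seq[0][1]
    return tuple((item, eno - base) for item, eno in seq[:-1])

def gsp_join(F):
    C = []
    for s1 in F:
        for s2 in F:
            if drop_first(s1) == drop_last(s2):
                # New element number extends s1 by s2's final gap.
                new_eno = s1[-1][1] + (s2[-1][1] - s2[-2][1])
                C.append(s1 + ((s2[-1][0], new_eno),))
    return C

# <(A)(B)(C)> joined with <(B)(C)(D)> yields <(A)(B)(C)(D)>.
F3 = [(("A", 1), ("B", 2), ("C", 3)), (("B", 1), ("C", 2), ("D", 3))]
C4 = gsp_join(F3)
print(C4)  # [(('A', 1), ('B', 2), ('C', 3), ('D', 4))]
```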

68
GSP Prune SQL
  • Write as a k-way join, similar to before
  • There are at most k contiguous subsequences of
    length (k-1) for which Fk-1 needs to be checked
    for membership
  • Note that all (k-1) subsequences may not be
    contiguous because of the max-gap constraint
    between consecutive elements.

69
GSP Support Counting
  • In each pass, we use the candidate table Ck and
    the input data-sequences table D to count the
    support
  • K-way join
  • We use select distinct before the group by to
    ensure that only distinct data-sequences are
    counted
  • We have additional predicates between sequence
    numbers to handle the special time elements

70
GSP Support Counting SQL
  • (Ck.enoj = Ck.enoi and abs(dj.time - di.time) <= window-size) or
  • (Ck.enoj = Ck.enoi + 1 and dj.time - di.time <= max-gap and
  •   dj.time - di.time > min-gap) or
  • (Ck.enoj > Ck.enoi + 1)
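The predicate reads more easily as a small Python function; the function name and argument names are invented for this sketch, with eno_i/eno_j the element numbers of two candidate items and t_i/t_j the times of the transactions they matched:

```python
def time_ok(eno_i, eno_j, t_i, t_j, window_size, max_gap, min_gap):
    if eno_j == eno_i:
        # Same element: the items must fall within one sliding window.
        return abs(t_j - t_i) <= window_size
    if eno_j == eno_i + 1:
        # Consecutive elements: the gap must lie in (min-gap, max-gap].
        return min_gap < t_j - t_i <= max_gap
    # Non-adjacent elements: no direct constraint in this clause.
    return eno_j > eno_i + 1

# Consecutive elements 15 time units apart, with min-gap 3, max-gap 20.
print(time_ok(1, 2, 10, 25, window_size=5, max_gap=20, min_gap=3))  # True
```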

71
References
  • Developing Tightly-Coupled Data Mining Applications on a Relational Database System
  • Rakesh Agrawal, Kyuseok Shim, 1996
  • Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications
  • Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal, 1998
  • Refers to 1) above
  • Mining Generalized Association Rules and Sequential Patterns Using SQL Queries
  • Shiby Thomas, Sunita Sarawagi, 1998
  • Refers to 1) and 2) above