ITL-Mine: Mining Frequent Itemsets More Efficiently

1
ITL-Mine: Mining Frequent Itemsets More Efficiently
  • Raj P. Gopalan, Yudho Giri Sucahyo
  • School of Computing
  • Curtin University of Technology
  • Bentley, Western Australia 6102
  • {raj, sucahyoy}@computing.edu.au

2
Introduction
  • Association rule mining finds interesting
    relationships among items in a data set.
  • Two steps:
    1. Find the frequent itemsets.
    2. Use the result of Step 1 to generate
       association rules.
  • Step 1 is computationally very expensive.
  • It has therefore been the focus of significant
    research effort.
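As a rough illustration of Step 2, the Python sketch below assumes Step 1 has already produced a dictionary mapping each frequent itemset to its support count. It splits every frequent itemset X into rules A ⇒ X − A and keeps those whose confidence reaches a threshold. The function name `generate_rules`, the items a to e and their counts are hypothetical, not taken from the presentation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]

def generate_rules(freq: Dict[FrozenSet[str], int], min_conf: float) -> List[Rule]:
    """Step 2 sketch: derive confident rules A => X - A from frequent itemsets X."""
    rules: List[Rule] = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left and right side
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                a = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so freq[a] exists.
                conf = count / freq[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules

# Hypothetical Step 1 output (support counts, minimum support 2).
freq = {frozenset(k): v for k, v in {
    "a": 2, "b": 3, "c": 3, "e": 3,
    "ac": 2, "bc": 2, "be": 3, "ce": 2, "bce": 2}.items()}

for lhs, rhs, conf in generate_rules(freq, min_conf=0.9):
    print(sorted(lhs), "=>", sorted(rhs), f"{conf:.0%}")
```

Generating rules only needs the stored counts, with no further pass over the data, which is one reason the cost concentrates in Step 1.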

3
Finding Frequent Itemsets
  • Two general approaches:
    • Candidate generation-and-test
      • Apriori and its variants
    • Pattern growth
      • FP-Growth, H-Mine
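To make "candidate generation-and-test" concrete, here is a sketch of the classic Apriori candidate-generation step: join frequent (k−1)-itemsets that share their first k−2 items, then prune any candidate that has an infrequent (k−1)-subset. Only the generation half is shown; the test half would count each candidate's support in a further database pass. The function name and sample itemsets are hypothetical.

```python
from itertools import combinations
from typing import Set, Tuple

def apriori_gen(prev: Set[Tuple[str, ...]]) -> Set[Tuple[str, ...]]:
    """Candidate k-itemsets from frequent (k-1)-itemsets stored as sorted tuples."""
    candidates: Set[Tuple[str, ...]] = set()
    for p in prev:
        for q in prev:
            # Join: identical first k-2 items, and p's last item precedes q's.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune: every (k-1)-subset of the candidate must be frequent.
                if all(s in prev for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates

# Hypothetical frequent 2-itemsets.
frequent_2 = {("a", "c"), ("b", "c"), ("b", "e"), ("c", "e")}
print(apriori_gen(frequent_2))  # {('b', 'c', 'e')}
```

Pattern-growth methods such as FP-Growth and H-Mine avoid this explicit candidate set.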

4
Contributions
  • Present a new data structure called
    Item-TransLink (ITL).
  • Propose a more efficient algorithm (called
    ITL-Mine) based on the pattern growth approach.
  • Compare its performance with the Apriori and
    H-Mine algorithms.

5
Association Rules
  • Given a database of transactions containing
    various items, association rules are statements
    of the form
  • A ⇒ B (10%, 80%)
  • 80% of the transactions that purchase A also
    purchase B (the confidence), and 10% of all
    transactions contain both of them (the support).
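The two percentages can be checked with a small Python sketch. It assumes a hypothetical 40-transaction database built so that 5 transactions contain A and 4 of those also contain B; support is the fraction of all transactions containing A ∪ B, and confidence is the fraction of A-transactions that also contain B.

```python
from typing import FrozenSet, List

def count(db: List[FrozenSet[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def support(db: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    return count(db, itemset) / len(db)

def confidence(db: List[FrozenSet[str]], a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Fraction of transactions containing A that also contain B."""
    n_a = count(db, a)
    return count(db, a | b) / n_a if n_a else 0.0

# Hypothetical database: 4 x {A,B}, 1 x {A,C}, 10 x {B,C}, 25 x {C}.
db = [frozenset(t) for t in ["AB"] * 4 + ["AC"] + ["BC"] * 10 + ["C"] * 25]
A, B = frozenset("A"), frozenset("B")
print(f"A => B ({support(db, A | B):.0%}, {confidence(db, A, B):.0%})")
# A => B (10%, 80%)
```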

6
Binary Representation of Transactions
[Figure: Sample database]

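The slide's own sample database is not reproduced in this transcript. As a stand-in, the sketch below converts a hypothetical four-transaction database into the usual binary form: one row per transaction, one column per item, and 1 where the transaction contains the item. Summing a column gives that item's support count. The names `to_binary`, `transactions` and `items` are illustrative only.

```python
from typing import List

def to_binary(transactions: List[List[str]], items: List[str]) -> List[List[int]]:
    """One 0/1 row per transaction; column j is 1 if items[j] occurs in it."""
    col = {item: j for j, item in enumerate(items)}
    matrix: List[List[int]] = []
    for t in transactions:
        row = [0] * len(items)
        for item in t:
            row[col[item]] = 1
        matrix.append(row)
    return matrix

# Hypothetical sample database.
transactions = [["a", "c", "d"], ["b", "c", "e"], ["a", "b", "c", "e"], ["b", "e"]]
items = ["a", "b", "c", "d", "e"]
matrix = to_binary(transactions, items)
for tid, row in enumerate(matrix, start=1):
    print(tid, row)
print("support counts:", [sum(col) for col in zip(*matrix)])  # [2, 3, 3, 1, 3]
```
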
7
ITL Data Structure
  • Based on these observations:
    • Item identifiers may be mapped to a range of
      integers.
    • Transaction identifiers can be ignored, provided
      the items of each transaction are linked
      together.
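A minimal sketch of both observations, assuming a hypothetical database keyed by transaction id: each distinct item identifier is assigned the next integer in 0..n−1 (here in order of first appearance, which is an arbitrary choice), and the transaction id is dropped because a transaction's position in the output list identifies it. In this sketch the items of a transaction are "linked" simply by being stored together in one sorted list.

```python
from typing import Dict, List, Tuple

def encode(db: Dict[str, List[str]]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Map item identifiers to 0..n-1 and drop transaction identifiers."""
    item_ids: Dict[str, int] = {}
    rows: List[List[int]] = []
    for items in db.values():  # the transaction id (dict key) is not kept
        for item in items:
            item_ids.setdefault(item, len(item_ids))  # next free integer
        # Row position now stands in for the transaction id; items kept sorted.
        rows.append(sorted(item_ids[item] for item in items))
    return item_ids, rows

# Hypothetical transactions with string item identifiers.
db = {"T100": ["bread", "milk"],
      "T200": ["milk", "beer", "eggs"],
      "T300": ["eggs", "bread"]}
item_ids, rows = encode(db)
print(item_ids)  # {'bread': 0, 'milk': 1, 'beer': 2, 'eggs': 3}
print(rows)      # [[0, 1], [1, 2, 3], [0, 3]]
```

Dense integer ids let per-item data live in a plain array indexed by item, rather than in a lookup keyed by arbitrary identifiers.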

8
ITL Data Structure
  • ItemTable
    • Every item, with its support and a link to its
      first occurrence in TransLink.
  • TransLink
    • Every transaction in the database, with its
      items in sorted order.
    • Each item has a link to the next occurrence of
      the same item.
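The sketch below follows the two components as described on this slide, under assumptions of its own: items are already mapped to 0..n−1, a link is a (row, column) pair into TransLink, and both structures are built in one pass over the transactions. The class and function names are hypothetical, and the paper's actual memory layout may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pos = Tuple[int, int]  # (row index in TransLink, column index within that row)

@dataclass
class ItemEntry:                    # one ItemTable entry
    support: int = 0
    first: Optional[Pos] = None     # first occurrence of the item in TransLink

@dataclass
class Cell:                         # one item occurrence in a TransLink row
    item: int
    next_pos: Optional[Pos] = None  # next occurrence of the same item, if any

def build_itl(transactions: List[List[int]], n_items: int):
    """Build ItemTable and TransLink in a single pass over the transactions."""
    item_table = [ItemEntry() for _ in range(n_items)]
    trans_link: List[List[Cell]] = []
    last: List[Optional[Pos]] = [None] * n_items  # latest occurrence per item
    for r, t in enumerate(transactions):
        row = [Cell(item) for item in sorted(set(t))]  # items in sorted order
        trans_link.append(row)
        for c, cell in enumerate(row):
            entry = item_table[cell.item]
            entry.support += 1
            prev = last[cell.item]
            if prev is None:
                entry.first = (r, c)            # ItemTable points at first occurrence
            else:
                trans_link[prev[0]][prev[1]].next_pos = (r, c)  # chain occurrences
            last[cell.item] = (r, c)
    return item_table, trans_link

def rows_containing(item: int, item_table, trans_link) -> List[int]:
    """Follow the item's chain of links from ItemTable through TransLink."""
    rows: List[int] = []
    pos = item_table[item].first
    while pos is not None:
        rows.append(pos[0])
        pos = trans_link[pos[0]][pos[1]].next_pos
    return rows

# Hypothetical database with items already mapped to 0..4.
db = [[0, 2, 3], [1, 2, 4], [0, 1, 2, 4], [1, 4]]
item_table, trans_link = build_itl(db, n_items=5)
print([e.support for e in item_table])              # [2, 3, 3, 1, 3]
print(rows_containing(4, item_table, trans_link))   # [1, 2, 3]
```

Following an item's chain visits only the transactions that contain it, without scanning the rest of the database.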

9
ITL-MINE Algorithm
  • Three steps:
    1. Construct ItemTable and TransLink from the
       transaction database.
    2. Prune any item below minimum support.
    3. Mine frequent itemsets of 2 or more items.
  • Algorithm details are given in the paper.
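The algorithm details are in the paper and are not reproduced here, so the following is only a simplified stand-in that mirrors the three steps. Assumption: instead of ItemTable and TransLink it keeps, for each item, the set of rows that contain it (the information an item's chain of TransLink links provides), and step 3 uses a generic depth-first tid-set intersection in the style of Eclat. It is not the authors' ITL-Mine procedure, and the name `mine_frequent_itemsets` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

def mine_frequent_itemsets(transactions: List[List[int]],
                           min_support: int) -> Dict[FrozenSet[int], int]:
    # Step 1 (stand-in): one pass records the rows that contain each item.
    tids: Dict[int, Set[int]] = defaultdict(set)
    for row, t in enumerate(transactions):
        for item in t:
            tids[item].add(row)
    # Step 2: prune items whose support count is below min_support.
    frequent = sorted(i for i, rows in tids.items() if len(rows) >= min_support)
    result: Dict[FrozenSet[int], int] = {}

    # Step 3: extend itemsets depth-first; the rows of an extended itemset are
    # the intersection of the prefix's rows with the new item's rows.
    def grow(prefix: List[int], prefix_rows: Set[int], start: int) -> None:
        for k in range(start, len(frequent)):
            item = frequent[k]
            rows = prefix_rows & tids[item]
            if len(rows) >= min_support:
                itemset = prefix + [item]
                result[frozenset(itemset)] = len(rows)  # only 2+ items recorded
                grow(itemset, rows, k + 1)

    for k, item in enumerate(frequent):
        grow([item], tids[item], k + 1)
    return result

# Hypothetical database with items mapped to 0..4, minimum support count 2.
db = [[0, 2, 3], [1, 2, 4], [0, 1, 2, 4], [1, 4]]
for itemset, n in sorted((sorted(s), n) for s, n in mine_frequent_itemsets(db, 2).items()):
    print(itemset, n)
```

On this data the sketch reports {0,2}, {1,2}, {1,4}, {2,4} and {1,2,4}; item 3 is pruned in step 2 because it occurs only once.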

10
Example

11
Performance Study
  • CHESS: 3196 transactions, max 37 items/transaction
  • MUSHROOM: 8124 transactions, max 23
    items/transaction

12
Performance Study
  • BMS-Web-View1: 59602 transactions, max 267
    items/transaction
  • T25I10D10K: 10000 transactions, max 25
    items/transaction

13
Comparing with H-Mine
  • ITL-Mine traverses the database only once;
    H-Mine traverses it twice.
  • ITL remains unchanged during mining, whereas the
    H-struct in H-Mine is continually re-adjusted.
  • ITL-Mine builds a TempList and uses
    tid-intersection; H-Mine builds a series of
    header tables linked to the H-struct.
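The slide says ITL-Mine uses tid-intersection but not how the intersection is computed. One common way, assumed here rather than taken from the paper, is a linear two-pointer merge of two ascending tid-lists; the length of the result is the support count of the combined itemset. The function name `intersect_sorted` is hypothetical.

```python
from typing import List

def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Intersect two ascending tid-lists in O(len(a) + len(b)) time."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])  # tid present in both lists
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1            # advance the list with the smaller tid
        else:
            j += 1
    return out

# Hypothetical tid-lists of two items X and Y.
common = intersect_sorted([0, 1, 2], [1, 2, 3])
print(common, "support of {X, Y} =", len(common))  # [1, 2] support of {X, Y} = 2
```

Because both inputs stay sorted, the output is sorted too and can be intersected again when the itemset is extended further.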

14
Conclusion
  • A generic data structure (ITL) and a new
    algorithm (ITL-Mine) were presented.
  • ITL-Mine outperforms Apriori and H-Mine on the
    data sets tested (CHESS, MUSHROOM,
    BMS-Web-View1, T25I10D10K).

15
Further Work
  • Extend ITL-Mine to very large databases.
  • Integrate constraints into ITL-Mine.