The AIM-F Algorithm review

1
The AIM-F Algorithm review
Presented by Sagi Shporer
2
Frequent Itemset Problem
  • Let I = {i1, i2, ..., im} be a set of items
  • Let T ⊆ I be a transaction
  • Let D be a dataset of n transactions.
  • Task: Find all X ⊆ I s.t. support(X) ≥ minsupport
    (i.e. there are at least minsupport transactions
    for which X ⊆ T).
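
The definition above can be illustrated with a minimal brute-force sketch. The dataset `D`, the threshold value, and the function names `support` and `frequent_itemsets` are hypothetical choices for illustration, not taken from the slides, and this exhaustive search is not the AIM-F algorithm itself.

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions T for which itemset is a subset of T."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, minsupport):
    """Brute force: test every non-empty X within I against minsupport."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = support(frozenset(combo), transactions)
            if count >= minsupport:
                result[frozenset(combo)] = count
    return result

# Hypothetical toy dataset D with n = 4 transactions (not from the slides).
D = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
FI = frequent_itemsets(D, minsupport=2)
```

With this toy data only a, b, c, d and ab reach the threshold. The search visits all 2^m - 1 candidate itemsets, which is why practical miners prune the search space.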

3
Example Frequent Itemsets
What itemsets are frequent itemsets (FI)?
a, b, c, d, e,
ab, ac, ad, bc, bd, be, cd, ce, de,
abc, abd, acd, bcd, cde,
abcd
4
Previous research work
  • Candidate set generate-and-test approach
    • Apriori, VLDB 94, R. Agrawal.
  • Sampling technique
    • H. Toivonen
  • Adaptive Support
    • SLPMiner, ICDM 2002, M. Seno, G. Karypis
  • Data transform
    • FP-tree, SIGMOD 2000, J. Han.

5
General
  • Goal: Mining Frequent Itemsets
  • Main features:
    • DFS generate-and-test
    • Compressed vertical database
    • Diffsets
    • PEP (Parent Equivalence Pruning)
    • Dynamic reordering
    • Vector projection
    • Optimized Initialization
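
As a rough illustration of the first two features, the sketch below runs a depth-first generate-and-test search over a plain (uncompressed) vertical layout. It deliberately leaves out PEP, dynamic reordering, diffsets and the other optimizations listed above. The names `vertical` and `dfs_mine` and the toy dataset are assumptions made for this example, not part of the presentation.

```python
def vertical(transactions):
    """Build the vertical layout: item -> set of TIDs containing it."""
    tids = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)
    return tids

def dfs_mine(transactions, minsupport):
    """Depth-first generate-and-test over the itemset enumeration tree."""
    tids = vertical(transactions)
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, item in enumerate(candidates):
            new_tids = prefix_tids & tids[item]      # generate
            if len(new_tids) >= minsupport:          # test
                itemset = prefix + (item,)
                frequent[itemset] = len(new_tids)
                # Only items after the current one extend this branch,
                # so each itemset is visited at most once.
                extend(itemset, new_tids, candidates[i + 1:])

    all_tids = set(range(len(transactions)))
    extend((), all_tids, sorted(tids))
    return frequent

# Hypothetical toy dataset (not from the slides).
D = [set("abc"), set("abd"), set("ab"), set("cd")]
FI = dfs_mine(D, minsupport=2)
```

An infrequent itemset is never extended, because none of its supersets can be frequent.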

6
Enumeration tree
7
Pruning - PEP
8
An Example (Illustration only)
abcd
9
Diffsets
  • Let t(P) be the set of transactions (TIDs)
    supporting P.
  • Define diffset d(PX) = t(P) \ t(X)
  • Then support(PX) = support(P) - |d(PX)|
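
A small worked example of this identity, using Python sets as TID sets. The TID values and the helper name `diffset` are hypothetical and chosen only for illustration.

```python
def diffset(t_P, t_X):
    """d(PX) = t(P) \\ t(X): TIDs that support P but not X."""
    return t_P - t_X

# Hypothetical TID sets (not from the slides).
t_P = {1, 2, 3, 4, 5, 6}
t_X = {1, 2, 3, 4, 7}

d_PX = diffset(t_P, t_X)
support_P = len(t_P)
support_PX = support_P - len(d_PX)

# Cross-check against the direct definition support(PX) = |t(P) ∩ t(X)|.
direct_support_PX = len(t_P & t_X)
```

Here d(PX) holds two TIDs while t(PX) holds four, so the diffset is the smaller of the two representations.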

10
Diffsets
  • How to calculate support(PXY) using d(PX) and
    d(PY)?
  • support(PXY) = support(PX) - |d(PXY)|
  • d(PXY) = d(PY) \ d(PX)
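
The recursion can be checked on a small example. The TID sets below are hypothetical. The sketch computes d(PXY) from the two sibling diffsets only and then compares the result with the direct intersection.

```python
# Hypothetical TID sets (not from the slides).
t_P = {1, 2, 3, 4, 5, 6}
t_X = {1, 2, 3, 4, 7}
t_Y = {1, 2, 5, 8}

# Diffsets of the two siblings PX and PY.
d_PX = t_P - t_X
d_PY = t_P - t_Y

# d(PXY) is derived from the diffsets alone, without touching t(X) or t(Y).
d_PXY = d_PY - d_PX

support_PX = len(t_P) - len(d_PX)
support_PXY = support_PX - len(d_PXY)

# Cross-checks against the direct definitions.
t_PX = t_P & t_X
direct_d_PXY = t_PX - t_Y
direct_support_PXY = len(t_P & t_X & t_Y)
```

The identity holds because a TID in d(PY) that is missing from d(PX) supports P and X but not Y, which is exactly t(PX) \ t(Y).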

11
Example
[Figure: diagram of the TID sets t(P), t(X) and t(Y) and the derived sets d(PX), d(PY), d(PXY) and t(PXY)]
12
Contributions
  • Dynamic use of various itemset mining
    optimizations (specifically diffsets).
  • Use of a compressed vertical bit vector with
    diffsets.

13
Dynamic Optimization Usage
  • Every optimization has strengths and weaknesses.
  • Optimizations should be used only when they give
    some benefit.

14
Dynamic Optimization Usage Cont.
  • Diffsets: Start using diffsets only when |d(PX)| <
    |t(PX)|
  • Optimized Initialization: Use only for sparse
    datasets (when the number of 1s reaches a
    threshold)
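
The diffset switching rule can be sketched as follows. The function name `extend` and the TID values are hypothetical. The sketch computes both representations for clarity, which a real implementation would presumably avoid.

```python
def extend(t_P, t_X):
    """Return (kind, tids, support) for PX, keeping the smaller representation.

    Illustration only: both sets are materialized here so they can be compared.
    """
    t_PX = t_P & t_X
    d_PX = t_P - t_X
    support = len(t_PX)
    if len(d_PX) < len(t_PX):
        return ("diffset", d_PX, support)
    return ("tidset", t_PX, support)

# Hypothetical TID sets (not from the slides).
t_P = {1, 2, 3, 4, 5, 6}

# Dense case: X occurs in almost every transaction of P, so the diffset is small.
dense = extend(t_P, {1, 2, 3, 4, 5, 9})

# Sparse case: X occurs in few transactions of P, so the tid-set is small.
sparse = extend(t_P, {1, 9})
```

In the dense case the diffset holds a single TID against five in the tid-set, and in the sparse case the situation is reversed.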

15
Compressed Bit Vector
  • Sparse Vertical Bit Vector: Hold only the needed
    cells in the vertical bit vector
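
One possible reading of "hold only the needed cells" is sketched below. It assumes the bit vector is split into fixed-width words and that only the non-zero words are stored, keyed by their index. The slides do not give the actual layout, so the word width `WORD_BITS`, the dictionary representation, and all function names are assumptions made for this example.

```python
WORD_BITS = 32  # assumed word width, not stated in the slides

def compress(tids):
    """Map word index -> non-zero word; all-zero words are never stored."""
    words = {}
    for tid in tids:
        idx, bit = divmod(tid, WORD_BITS)
        words[idx] = words.get(idx, 0) | (1 << bit)
    return words

def intersect(a, b):
    """Bitwise AND of two compressed vectors, dropping words that become zero."""
    out = {}
    for idx in a.keys() & b.keys():
        w = a[idx] & b[idx]
        if w:
            out[idx] = w
    return out

def difference(a, b):
    """a AND NOT b: a diffset computed directly on the compressed form."""
    out = {}
    for idx, w in a.items():
        w &= ~b.get(idx, 0)
        if w:
            out[idx] = w
    return out

def count(v):
    """Number of set bits, i.e. the number of TIDs represented."""
    return sum(bin(w).count("1") for w in v.values())

# Hypothetical TID sets (not from the slides).
v_P = compress({0, 1, 40, 1000})
v_X = compress({1, 40, 70})

v_PX = intersect(v_P, v_X)       # t(PX)
d_PX = difference(v_P, v_X)      # d(PX) = t(P) \ t(X)
support_PX = count(v_P) - count(d_PX)
```

A plain bit vector would need 32 words to reach TID 1000, while the compressed form of t(P) stores only three.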

16
Compressed Bit Vector Cont.
  • Use of diffsets directly from the compressed form
  • Faster than tid-list for dense datasets.
  • Competitive with tid-list for sparse datasets

17
Optimization Contributions
18
Questions / Comments
  • THANK YOU !