Applications of BFS and DFS: the Apriori and FPGrowth Algorithms
1
Applications of BFS and DFS: the Apriori and
FPGrowth Algorithms
Modified from Slides of Stanford CS345A and UIUC
CS412
  • Jianlin Feng
  • School of Software
  • SUN YAT-SEN UNIVERSITY

2
The Market-Basket Model
  • A large set of items,
    • e.g., things sold in a supermarket.
  • A large set of baskets, each of which is a small set of the items,
    • e.g., the things one customer buys on one day.

3
Applications (1)
  • Items = products;
  • Baskets = sets of products someone bought in one trip to the store.
  • Example application: given that many people buy beer and diapers together,
    • run a sale on diapers; raise the price of beer.
  • Only useful if many buy diapers & beer.

4
Applications (2)
  • Items = words;
  • Baskets = Web pages.
  • Unusual words appearing together in a large number of documents may indicate an interesting relationship.
    • e.g., Brad and Angelina.

5
Support
  • Support for itemset I = the number of baskets containing all items in I.
  • Sometimes given as a percentage.
  • Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.

6
Example: Frequent Itemsets
  • Items = {milk, coke, pepsi, beer, juice}.
  • Support threshold = 3 baskets.
  • B1 = {m, c, b}      B2 = {m, p, j}
  • B3 = {m, b}         B4 = {c, j}
  • B5 = {m, p, b}      B6 = {m, c, b, j}
  • B7 = {c, b, j}      B8 = {b, c}
  • Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {c, j}.
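These counts are easy to verify by brute force. A minimal Python sketch (my own illustration, not part of the original slides) that counts support for every itemset of size 1 and 2 over the eight baskets above:

    from itertools import combinations

    baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
               {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
    s = 3  # support threshold

    def support(itemset):
        # number of baskets containing all items of the itemset
        return sum(itemset <= b for b in baskets)

    items = sorted(set().union(*baskets))
    for k in (1, 2):
        frequent = [c for c in combinations(items, k) if support(set(c)) >= s]
        print(k, frequent)
    # size 1: b, c, j, m    size 2: (b, c), (b, m), (c, j)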

7
Association Rules
  • If-then rules about the contents of baskets.
  • {i1, i2, ..., ik} → j means: if a basket contains all of i1, ..., ik, then it is likely to contain j.
  • The confidence of this association rule is the probability of j given i1, ..., ik.

8
Example: Confidence
  • B1 = {m, c, b}      B2 = {m, p, j}
  • B3 = {m, b}         B4 = {c, j}
  • B5 = {m, p, b}      B6 = {m, c, b, j}
  • B7 = {c, b, j}      B8 = {b, c}
  • An association rule: {m, b} → c.
  • Confidence = 2/4 = 50%.
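The 2/4 comes from the four baskets that contain both m and b (B1, B3, B5, B6), of which two (B1, B6) also contain c. A quick Python check (an illustration, not from the slides):

    baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
               {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
    left = {'m', 'b'}
    support_left = sum(left <= b for b in baskets)          # 4 baskets contain m and b
    support_both = sum(left | {'c'} <= b for b in baskets)  # 2 of those also contain c
    print(support_both / support_left)                      # 0.5, i.e., 50%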


9
Finding Association Rules
  • Question: find all association rules with support ≥ s and confidence ≥ c.
  • Note: the support of an association rule is the support of the set of items on the left.
  • Hard part: finding the frequent itemsets.
  • Note: if {i1, i2, ..., ik} → j has high support and confidence, then both {i1, i2, ..., ik} and {i1, i2, ..., ik, j} will be frequent.

10
A-Priori Algorithm (1)
  • Key idea: monotonicity.
  • If a set of items appears at least s times, so does every subset of it.
  • Contrapositive for pairs: if item i does not appear in s baskets, then no pair including i can appear in s baskets.

11
A-Priori Algorithm (2)
  • Pass 1: Read baskets and count in main memory the occurrences of each item.
  • Requires memory proportional only to the number of items.
  • Items that appear at least s times are the frequent items.

12
A-Priori Algorithm (3)
  • Pass 2: Read baskets again and count in main memory only those pairs both of whose items were found to be frequent in Pass 1.
  • Requires memory proportional to the square of the number of frequent items (for the counts), plus a list of the frequent items (so you know what must be counted). A two-pass sketch in code follows below.
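Together, the two passes find all frequent pairs while holding only item counts, then pair-of-frequent-items counts, in memory. A minimal Python sketch (the name apriori_pairs is my own, not from the slides):

    from collections import Counter
    from itertools import combinations

    def apriori_pairs(baskets, s):
        # Pass 1: count occurrences of each item in main memory
        item_counts = Counter(i for b in baskets for i in b)
        frequent_items = {i for i, c in item_counts.items() if c >= s}
        # Pass 2: count only pairs whose items are both frequent
        pair_counts = Counter()
        for b in baskets:
            kept = sorted(i for i in b if i in frequent_items)
            pair_counts.update(combinations(kept, 2))
        return {p for p, c in pair_counts.items() if c >= s}

    baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
               {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
    print(apriori_pairs(baskets, 3))   # {('b','c'), ('b','m'), ('c','j')}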

13
Picture of A-Priori

[Figure: main-memory layout. Pass 1 keeps the item counts; Pass 2 keeps the list of frequent items plus the counts of pairs of frequent items.]
14
Frequent Triples, Etc.
  • For each k, we construct two sets of k-sets (sets of size k):
  • Ck = candidate k-sets: those that might be frequent (support ≥ s), based on information from the pass for k-1.
  • Lk = the set of truly frequent k-sets.

15
[Figure: the level-wise pipeline C1 → Filter → L1 → Construct → C2 → Filter → L2 → Construct → C3. The first pass turns C1 into L1; the second pass turns C2 into L2.]
16
The Apriori Algorithm (Pseudo-Code)
  • Ck: candidate itemsets of size k
  • Lk: frequent itemsets of size k
  • L1 = {frequent items};
  • for (k = 1; Lk != ∅; k++) do begin
  •     Ck+1 = candidates generated from Lk;
  •     for each transaction t in the database do
  •         increment the count of all candidates in Ck+1 that are contained in t;
  •     Lk+1 = candidates in Ck+1 with min_support;
  • end
  • return ∪k Lk;
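The pseudo-code translates almost line for line into Python. The sketch below is my own rendering; for brevity, a simple union-based join stands in for the self-join described on the next slide:

    def apriori(db, min_support):
        # L1: frequent single items
        Lk = {frozenset([i]) for i in set().union(*db)
              if sum(i in t for t in db) >= min_support}
        all_frequent = set(Lk)
        k = 1
        while Lk:                    # loop until L_k is empty
            # C(k+1): candidates generated from L_k
            Ck1 = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
            # keep candidates contained in at least min_support transactions
            Lk = {c for c in Ck1 if sum(c <= t for t in db) >= min_support}
            all_frequent |= Lk
            k += 1
        return all_frequent          # the union of all L_k

    db = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
          {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
    print(sorted(map(sorted, apriori(db, 3))))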

17
Implementation of Apriori
  • How to generate candidates?
  • Step 1: self-join Lk with itself.
  • Step 2: pruning.
  • Example of candidate generation (a code sketch follows below):
  • L3 = {abc, abd, acd, ace, bcd}
  • Self-joining: L3 ⋈ L3
    • abcd from abc and abd
    • acde from acd and ace
  • Pruning:
    • acde is removed because ade is not in L3
  • C4 = {abcd}
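A Python sketch of the two steps (gen_candidates is my own name; the join here is a simplified union join rather than the textbook join on the first k-1 items, so it also produces extras such as abce, which the pruning step then removes):

    from itertools import combinations

    def gen_candidates(Lk, k):
        Lk = {frozenset(x) for x in Lk}
        # Step 1 (self-join): union pairs of k-sets that share k-1 items
        joined = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Step 2 (pruning): every k-subset of a candidate must itself be in L_k
        return {c for c in joined
                if all(frozenset(sub) in Lk for sub in combinations(c, k))}

    L3 = ['abc', 'abd', 'acd', 'ace', 'bcd']
    C4 = gen_candidates(L3, 3)
    print({''.join(sorted(c)) for c in C4})   # {'abcd'}; acde pruned, ade not in L3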

18
Depth-First Search: the Frequent Pattern Growth Approach
  • Bottlenecks of the Apriori approach:
    • Breadth-first (i.e., level-wise) search.
    • Candidate generation and test: often generates a huge number of candidates.
  • The FPGrowth approach (J. Han, J. Pei, and Y. Yin, SIGMOD '00):
    • Depth-first search.
    • Avoids explicit candidate generation.
  • Major philosophy: grow long patterns from short ones, using only local frequent items (a sketch of the projection step follows this list):
    • "abc" is a frequent pattern.
    • Get all transactions having "abc", i.e., project the DB on abc: DB|abc.
    • If "d" is a local frequent item in DB|abc, then "abcd" is a frequent pattern.
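A small Python sketch of this projection idea (function and variable names are mine, for illustration only): project the database on a pattern, then count the local frequent items in the projected transactions.

    from collections import Counter

    def local_frequent(db, pattern, min_support):
        pattern = set(pattern)
        # DB|pattern: transactions containing the pattern, with the pattern removed
        projected = [t - pattern for t in db if pattern <= t]
        counts = Counter(i for t in projected for i in t)
        return {i for i, c in counts.items() if c >= min_support}

    # Using the transaction DB from the next slide:
    db = [set('facdgimp'), set('abcflmo'), set('bfhjow'),
          set('bcksp'), set('afcelpmn')]
    print(local_frequent(db, 'fca', 3))   # {'m'}: so {f, c, a, m} is frequent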

19
Construct FP-tree from a Transaction Database
TID   Items bought                (Ordered) frequent items
100   f, a, c, d, g, i, m, p      f, c, a, m, p
200   a, b, c, f, l, m, o         f, c, a, b, m
300   b, f, h, j, o, w            f, b
400   b, c, k, s, p               c, b, p
500   a, f, c, e, l, p, m, n      f, c, a, m, p

min_support = 3
  1. Scan the DB once; find the frequent 1-itemsets (single-item patterns).
  2. Sort the frequent items in descending frequency order; this gives the f-list.
  3. Scan the DB again and construct the FP-tree (a construction sketch follows below).

F-list = f-c-a-b-m-p
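The three steps map directly onto a short construction routine. A minimal sketch using the DB from the table above (class and function names are my own; it omits the header table and node links a full FP-Growth implementation would also maintain):

    from collections import Counter

    class FPNode:
        def __init__(self, item):
            self.item, self.count, self.children = item, 0, {}

    def build_fp_tree(transactions, min_support=3):
        # Step 1: one scan to find the frequent single items
        counts = Counter(i for t in transactions for i in t)
        freq = {i for i, c in counts.items() if c >= min_support}
        # Step 2: f-list = frequent items in descending frequency order
        flist = sorted(freq, key=lambda i: -counts[i])
        rank = {item: r for r, item in enumerate(flist)}
        # Step 3: second scan; insert each transaction's ordered frequent items
        root = FPNode(None)
        for t in transactions:
            node = root
            for item in sorted((i for i in t if i in freq), key=rank.get):
                node = node.children.setdefault(item, FPNode(item))
                node.count += 1
        return root, flist

    db = [set('facdgimp'), set('abcflmo'), set('bfhjow'),
          set('bcksp'), set('afcelpmn')]
    root, flist = build_fp_tree(db)
    print('-'.join(flist))   # f-c-a-b-m-p (ties in frequency broken arbitrarily)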