Title: Applications of BFS and DFS: the Apriori and FPGrowth Algorithms
1. Applications of BFS and DFS: the Apriori and FPGrowth Algorithms
Modified from slides of Stanford CS345A and UIUC CS412
- Jianlin Feng
- School of Software
- SUN YAT-SEN UNIVERSITY
2. The Market-Basket Model
- A large set of items,
- e.g., things sold in a supermarket.
- A large set of baskets,
- each of which is a small set of the items,
- e.g., the things one customer buys on one day.
3. Applications (1)
- Items = products.
- Baskets = sets of products someone bought in one trip to the store.
- Example application: given that many people buy beer and diapers together,
- run a sale on diapers and raise the price of beer.
- Only useful if many buy diapers and beer.
4. Applications (2)
- Items = words.
- Baskets = Web pages.
- Unusual words appearing together in a large number of documents
- may indicate an interesting relationship.
- e.g., "Brad" and "Angelina"
5. Support
- Support for itemset I = the number of baskets containing all items in I.
- Sometimes given as a percentage.
- Given a support threshold s,
- sets of items that appear in at least s baskets are called frequent itemsets.
6. Example: Frequent Itemsets
- Items = {milk, coke, pepsi, beer, juice}.
- Support threshold = 3 baskets.
- B1 = {m, c, b}    B2 = {m, p, j}
- B3 = {m, b}       B4 = {c, j}
- B5 = {m, p, b}    B6 = {m, c, b, j}
- B7 = {c, b, j}    B8 = {b, c}
- Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {c, j}.
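The support counts above can be checked mechanically. A minimal Python sketch (brute-force enumeration, fine at this tiny scale; the helper name `support` is ours, not from the slides):

```python
from itertools import combinations

# The eight baskets from the slide (support threshold s = 3).
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"},
    {"m", "b"},      {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"},
    {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets containing every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

# Brute-force enumeration of all frequent itemsets.
items = sorted(set().union(*baskets))
frequent = [
    set(combo)
    for k in range(1, len(items) + 1)
    for combo in combinations(items, k)
    if support(set(combo), baskets) >= 3
]
# frequent == [{m}, {c}, {b}, {j}, {m, b}, {c, b}, {c, j}] (as sets)
```

Brute force examines every subset, which is exactly what A-Priori (below) avoids on realistic data.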
7. Association Rules
- If-then rules about the contents of baskets.
- {i1, i2, ..., ik} → j means: if a basket contains all of i1, ..., ik, then it is likely to contain j.
- Confidence of this association rule is the probability of j given i1, ..., ik.
8. Example: Confidence
- B1 = {m, c, b}    B2 = {m, p, j}
- B3 = {m, b}       B4 = {c, j}
- B5 = {m, p, b}    B6 = {m, c, b, j}
- B7 = {c, b, j}    B8 = {b, c}
- An association rule: {m, b} → c.
- Confidence = 2/4 = 50%.
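The confidence computation can be sketched in a few lines of Python over the same eight baskets (the helper name `confidence` is ours, not from the slides):

```python
# The eight baskets from the slide.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"},
    {"m", "b"},      {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"},
    {"c", "b", "j"}, {"b", "c"},
]

def confidence(lhs, rhs, baskets):
    """Fraction of baskets containing `lhs` that also contain `rhs`."""
    lhs_count = sum(1 for b in baskets if lhs <= b)
    both_count = sum(1 for b in baskets if (lhs | rhs) <= b)
    return both_count / lhs_count

# {m, b} -> c: {m, b} appears in B1, B3, B5, B6; c is also present in B1, B6.
conf = confidence({"m", "b"}, {"c"}, baskets)  # 2/4 = 0.5
```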
9. Finding Association Rules
- Question: find all association rules with support ≥ s and confidence ≥ c.
- Note: the support of an association rule is the support of the set of items on the left.
- Hard part: finding the frequent itemsets.
- Note: if {i1, i2, ..., ik} → j has high support and confidence, then both {i1, i2, ..., ik} and {i1, i2, ..., ik, j} will be frequent.
10. A-Priori Algorithm (1)
- Key idea: monotonicity.
- If a set of items appears at least s times, so does every subset.
- Contrapositive for pairs: if item i does not appear in s baskets, then no pair including i can appear in s baskets.
11. A-Priori Algorithm (2)
- Pass 1: read baskets and count in main memory the occurrences of each item.
- Requires only memory proportional to the number of items.
- Items that appear at least s times are the frequent items.
12. A-Priori Algorithm (3)
- Pass 2: read baskets again and count in main memory only those pairs both of whose items were found in Pass 1 to be frequent.
- Requires memory proportional to the square of the number of frequent items (for counts), plus a list of the frequent items (so you know what must be counted).
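The two passes can be sketched as follows; the function name `apriori_pairs` is ours, and a real system would stream the basket file from disk on each pass rather than hold it in a list:

```python
from collections import Counter
from itertools import combinations

def apriori_pairs(baskets, s):
    """Two passes over the baskets; only counters live in main memory."""
    # Pass 1: count occurrences of each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent_items = {i for i, c in item_counts.items() if c >= s}

    # Pass 2: count only those pairs whose two members are both frequent.
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(frequent_items & basket), 2):
            pair_counts[pair] += 1
    return {pair for pair, c in pair_counts.items() if c >= s}
```

On the example of slide 6 with s = 3, this returns the pairs {m, b}, {c, b}, {c, j} (as sorted tuples).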
13. Picture of A-Priori
[Diagram: Pass 1 holds the item counts in main memory; Pass 2 holds the list of frequent items plus the counts of pairs of frequent items.]
14. Frequent Triples, Etc.
- For each k, we construct two sets of k-sets (sets of size k):
- Ck = candidate k-sets: those that might be frequent sets (support ≥ s), based on information from the pass for k − 1.
- Lk = the set of truly frequent k-sets.
15. [Diagram: C1 --filter--> L1 --construct--> C2 --filter--> L2 --construct--> C3; the first pass filters C1 into L1, the second pass filters C2 into L2, and so on.]
16. The Apriori Algorithm (Pseudo-Code)
- Ck: candidate itemsets of size k
- Lk: frequent itemsets of size k
- L1 = {frequent items};
- for (k = 1; Lk != ∅; k++) do begin
-   Ck+1 = candidates generated from Lk;
-   for each transaction t in database do
-     increment the count of all candidates in Ck+1 that are contained in t;
-   Lk+1 = candidates in Ck+1 with min_support;
- end
- return ∪k Lk;
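The pseudo-code above translates directly into a runnable sketch, under the assumption that the transaction database fits in memory (all names are ours):

```python
from collections import Counter
from itertools import combinations

def apriori(transactions, min_support):
    """Return every frequent itemset (as a frozenset), level by level."""
    transactions = [frozenset(t) for t in transactions]
    # L1: frequent single items.
    item_counts = Counter(i for t in transactions for i in t)
    Lk = {frozenset([i]) for i, c in item_counts.items() if c >= min_support}
    frequent, k = set(Lk), 1
    while Lk:
        # Generate C(k+1) from Lk: union pairs of frequent k-sets into
        # (k+1)-sets, keeping only those all of whose k-subsets are
        # frequent (monotonicity).
        Ck1 = {
            a | b
            for a in Lk for b in Lk
            if len(a | b) == k + 1
            and all(frozenset(sub) in Lk for sub in combinations(a | b, k))
        }
        # One pass over the database counts every surviving candidate.
        counts = Counter(c for t in transactions for c in Ck1 if c <= t)
        Lk = {c for c, n in counts.items() if n >= min_support}
        frequent |= Lk
        k += 1
    return frequent
```

On the slide-6 baskets with min_support = 3 this yields the seven itemsets listed there.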
17. Implementation of Apriori
- How to generate candidates?
- Step 1: self-joining Lk
- Step 2: pruning
- Example of candidate generation:
- L3 = {abc, abd, acd, ace, bcd}
- Self-joining: L3 * L3
- abcd from abc and abd
- acde from acd and ace
- Pruning:
- acde is removed because ade is not in L3
- C4 = {abcd}
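The self-join and prune steps can be sketched as follows, representing each k-set as a sorted tuple (the function name is ours):

```python
from itertools import combinations

def generate_candidates(Lk):
    """Self-join Lk with itself, then prune by the Apriori property."""
    k = len(next(iter(Lk)))
    # Step 1: self-join. Two sorted k-tuples sharing their first k-1
    # items merge into one (k+1)-tuple.
    joined = set()
    for a in Lk:
        for b in Lk:
            if a[:k-1] == b[:k-1] and a[k-1] < b[k-1]:
                joined.add(a + (b[k-1],))
    # Step 2: prune any candidate that has an infrequent k-subset.
    return {c for c in joined if all(sub in Lk for sub in combinations(c, k))}

L3 = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")}
C4 = generate_candidates(L3)
# Self-join yields abcd and acde; pruning removes acde since ade is
# not in L3, so C4 == {("a", "b", "c", "d")}.
```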
18. Depth-First Search: the Frequent Pattern Growth Approach
- Bottlenecks of the Apriori approach:
- Breadth-first (i.e., level-wise) search
- Candidate generation and test
- Often generates a huge number of candidates
- The FPGrowth approach (J. Han, J. Pei, and Y. Yin, SIGMOD '00):
- Depth-first search
- Avoids explicit candidate generation
- Major philosophy: grow long patterns from short ones using local frequent items only.
- "abc" is a frequent pattern.
- Get all transactions containing abc, i.e., project the DB on abc: DB|abc.
- "d" is a local frequent item in DB|abc ⇒ abcd is a frequent pattern.
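The projection idea (not the FP-tree data structure itself) can be illustrated in plain Python; the helper names are ours:

```python
from collections import Counter

def project(db, pattern):
    """DB|pattern: the transactions that contain every item of `pattern`."""
    return [t for t in db if pattern <= t]

def local_frequent(db, pattern, s):
    """Items (outside `pattern`) frequent within the projected DB|pattern."""
    counts = Counter(
        i for t in project(db, pattern) for i in t if i not in pattern
    )
    return {i for i, c in counts.items() if c >= s}

# With the five-transaction DB of the next slide and s = 3, projecting
# on {f, c, a} leaves three transactions, in which only m is locally
# frequent, so {f, c, a, m} is a frequent pattern.
```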
19. Construct FP-tree from a Transaction Database

TID   Items bought                (Ordered) frequent items
100   f, a, c, d, g, i, m, p      f, c, a, m, p
200   a, b, c, f, l, m, o         f, c, a, b, m
300   b, f, h, j, o, w            f, b
400   b, c, k, s, p               c, b, p
500   a, f, c, e, l, p, m, n      f, c, a, m, p

min_support = 3

- Scan the DB once; find frequent 1-itemsets (single-item patterns).
- Sort frequent items in frequency-descending order: the f-list.
- Scan the DB again; construct the FP-tree.

F-list: f-c-a-b-m-p
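The two scans can be sketched as follows; ties in frequency may be broken arbitrarily, and we fix the slide's order f-c-a-b-m-p:

```python
from collections import Counter

db = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o", "w"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
min_support = 3

# First scan: count items; those with support >= 3 are frequent.
counts = Counter(i for t in db for i in t)
assert {i for i, c in counts.items() if c >= min_support} == set("fcabmp")

# The slide's f-list (frequency-descending, ties broken arbitrarily).
flist = ["f", "c", "a", "b", "m", "p"]

# Second scan: keep only frequent items, ordered by the f-list; these
# ordered transactions become the paths inserted into the FP-tree.
ordered = [[i for i in flist if i in t] for t in db]
```

The resulting `ordered` lists reproduce the table's third column, e.g. transaction 100 becomes [f, c, a, m, p] and transaction 400 becomes [c, b, p].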