Association Rule Mining III - PowerPoint PPT Presentation

Provided by: WeiW8

Title: Association Rule Mining III


1
Association Rule Mining III
  • COMP 790-90 Seminar
  • BCB 713 Module
  • Spring 2009

2
Mining Various Kinds of Rules or Regularities
  • Multi-level, quantitative association rules,
    correlation and causality, ratio rules,
    sequential patterns, emerging patterns, temporal
    associations, partial periodicity
  • Classification, clustering, iceberg cubes, etc.

3
Multiple-level Association Rules
  • Items often form hierarchy
  • Flexible support settings: items at the lower
    level are expected to have lower support.
  • Transaction database can be encoded based on
    dimensions and levels
  • Explore shared multi-level mining

4
Multi-dimensional Association Rules
  • Single-dimensional rules
  • buys(X, milk) ⇒ buys(X, bread)
  • MD rules: ≥ 2 dimensions or predicates
  • Inter-dimension assoc. rules (no repeated
    predicates)
  • age(X,19-25) ∧ occupation(X,student) ⇒
    buys(X,coke)
  • Hybrid-dimension assoc. rules (repeated
    predicates)
  • age(X,19-25) ∧ buys(X, popcorn) ⇒ buys(X,
    coke)
  • Categorical attributes: finite number of possible
    values, no order among values
  • Quantitative attributes: numeric, implicit order
    among values
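
A minimal sketch of how support and confidence could be computed for an inter-dimension rule and a hybrid-dimension rule. The customer records and the helper name `support_confidence` are hypothetical, invented for illustration; they are not data from the slides.

```python
# Hypothetical customer records (not from the slides).
customers = [
    {"age": "19-25", "occupation": "student", "buys": {"coke", "popcorn"}},
    {"age": "19-25", "occupation": "student", "buys": {"coke"}},
    {"age": "19-25", "occupation": "clerk", "buys": {"popcorn"}},
    {"age": "26-35", "occupation": "student", "buys": {"coke"}},
    {"age": "19-25", "occupation": "student", "buys": {"milk"}},
]

def support_confidence(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    Both arguments are predicates that take one row and return a bool.
    """
    n_antecedent = sum(1 for r in rows if antecedent(r))
    n_both = sum(1 for r in rows if antecedent(r) and consequent(r))
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Inter-dimension rule: every predicate (age, occupation, buys) appears once.
inter = support_confidence(
    customers,
    lambda r: r["age"] == "19-25" and r["occupation"] == "student",
    lambda r: "coke" in r["buys"],
)

# Hybrid-dimension rule: the predicate buys appears on both sides.
hybrid = support_confidence(
    customers,
    lambda r: r["age"] == "19-25" and "popcorn" in r["buys"],
    lambda r: "coke" in r["buys"],
)

print("inter-dimension :", inter)
print("hybrid-dimension:", hybrid)
```

Passing the rule sides as predicates keeps the same helper usable for categorical attributes and for already-discretized quantitative ones.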

5
Quantitative/Weighted Association Rules
  • Numeric attributes are dynamically discretized
    to maximize the confidence or compactness of
    the rules
  • 2-D quantitative association rules: Aquan1 ∧
    Aquan2 ⇒ Acat
  • Cluster adjacent association rules to form
    general rules using a 2-D grid
  • Example: age(X,33-34) ∧ income(X,30K - 50K) ⇒
    buys(X,high resolution TV)
  [Figure: 2-D grid over Age and Income]

6
Mining Distance-based Association Rules
  • Binning methods do not capture semantics of
    interval data
  • Distance-based partitioning
  • Density/number of points in an interval
  • Closeness of points in an interval
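
To illustrate why fixed bins can miss the semantics of interval data, here is a small sketch that contrasts equal-width binning with a simple gap-based partition. The gap rule (start a new interval when the distance to the previous value exceeds `max_gap`) is only one possible stand-in for distance-based partitioning, and the price values are hypothetical.

```python
from itertools import groupby

def equal_width_bins(values, width):
    """Group values by the fixed-width interval [k*width, (k+1)*width) they fall in."""
    ordered = sorted(values)
    return [list(group) for _, group in groupby(ordered, key=lambda v: v // width)]

def distance_based_intervals(values, max_gap):
    """Start a new interval whenever the gap to the previous value exceeds max_gap."""
    intervals = []
    for v in sorted(values):
        if intervals and v - intervals[-1][-1] <= max_gap:
            intervals[-1].append(v)   # close to the previous point: same interval
        else:
            intervals.append([v])     # far from the previous point: new interval
    return intervals

prices = [7, 19, 22, 50, 51, 53]      # hypothetical values

by_width = equal_width_bins(prices, width=10)
by_distance = distance_based_intervals(prices, max_gap=5)

print("equal width   :", by_width)
print("distance based:", by_distance)
```

With these values the equal-width bins separate 19 from 22 because they straddle a bin boundary, while the gap-based partition keeps them together.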

7
Constraint-based Frequent-pattern Mining
  • Why constraint-based mining?
  • Anti-monotonicity, monotonicity, and succinctness
  • Mining frequent patterns with convertible
    constraints

8
Constraint-based Data Mining
  • Find all the patterns in a database autonomously?
  • The patterns could be too many but not focused!
  • Data mining should be interactive
  • User directs what is to be mined
  • Constraint-based mining
  • User flexibility: provides constraints on what is
    to be mined
  • System optimization: pushes constraints for
    efficient mining

9
Constraints in Data Mining
  • Knowledge type constraint
  • classification, association, etc.
  • Data constraint: using SQL-like queries
  • find product pairs sold together in stores in New
    York
  • Dimension/level constraint
  • in relevance to region, price, brand, customer
    category
  • Rule (or pattern) constraint
  • small sales (price < 10) trigger big sales (sum
    > 200)
  • Interestingness constraint
  • strong rules: support and confidence

10
Constrained Frequent Pattern Mining: Query Optimization
  • Mining frequent patterns with constraint C
  • Sound: only find patterns satisfying the
    constraint C
  • Complete: find all patterns satisfying the
    constraint C
  • A naïve solution
  • Constraint test as a post-processing
  • More efficient approaches
  • Analyze the properties of constraints
    comprehensively
  • Push constraints as deeply as possible inside the
    frequent pattern mining
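
The difference between post-processing and pushing a constraint can be sketched on frequent itemsets. The transactions, prices, budget constraint, and function names below are all hypothetical. The sketch assumes the constraint is anti-monotone, which holds for a price-sum budget only when prices are non-negative.

```python
from itertools import combinations

# Hypothetical transactions and prices, invented for this sketch.
transactions = [
    {"pen", "ink", "paper"},
    {"pen", "ink", "laptop"},
    {"pen", "paper"},
    {"ink", "paper", "laptop"},
    {"pen", "ink", "paper", "laptop"},
]
price = {"pen": 2, "ink": 5, "paper": 4, "laptop": 900}

def within_budget(itemset, budget=10):
    """Anti-monotone for non-negative prices: once over budget, every superset is too."""
    return sum(price[i] for i in itemset) <= budget

def apriori(db, min_sup, keep=lambda itemset: True):
    """Level-wise frequent itemset mining.

    `keep` is checked before a candidate's support is counted, so it must be
    anti-monotone for the result to stay complete. Returns the frequent
    itemsets satisfying `keep` and the number of support counts performed.
    """
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    found, counted = [], 0
    while level:
        size = len(level[0])
        survivors = set()
        for cand in level:
            if not keep(cand):
                continue                      # pruned without touching the database
            counted += 1
            if sum(1 for t in db if cand <= t) >= min_sup:
                survivors.add(cand)
        found.extend(survivors)
        # Join survivors into candidates one item larger; keep those whose
        # subsets of the current size all survived.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == size + 1}
        level = [c for c in joined
                 if all(frozenset(s) in survivors for s in combinations(c, size))]
    return found, counted

# Naive solution: mine everything, then test the constraint as post-processing.
all_patterns, counted_naive = apriori(transactions, min_sup=2)
post_filtered = {p for p in all_patterns if within_budget(p)}

# Pushed: test the constraint inside the mining loop.
pushed, counted_pushed = apriori(transactions, min_sup=2, keep=within_budget)

print(set(pushed) == post_filtered, counted_naive, counted_pushed)
```

Both runs return the same patterns, so the pushed version remains sound and complete, but it counts support for fewer candidates because anything containing the expensive item is discarded before the database is scanned.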

11
A General Picture of Constraints
12
Classification of Constraints
  • Monotone
  • Anti-monotone
  • Strongly convertible
  • Succinct
  • Convertible anti-monotone
  • Convertible monotone
  • Inconvertible

13
Sequential Pattern Mining
  • Why sequential pattern mining?
  • GSP algorithm
  • FreeSpan and PrefixSpan
  • Border Collapsing
  • Constraints and extensions

14
Sequence Databases and Sequential Pattern Analysis
  • (Temporal) order is important in many situations
  • Time-series databases and sequence databases
  • Frequent patterns → (frequent) sequential
    patterns
  • Applications of sequential pattern mining
  • Customer shopping sequences
  • First buy a computer, then a CD-ROM, and then a
    digital camera, within 3 months.
  • Medical treatment, natural disasters (e.g.,
    earthquakes), science and engineering processes,
    stocks and markets, telephone calling patterns,
    Weblog click streams, DNA sequences and gene
    structures

15
What Is Sequential Pattern Mining?
  • Given a set of sequences, find the complete set
    of frequent subsequences

A sequence: <(ef)(ab)(df)cb>
A sequence database
An element may contain a set of items. Items
within an element are unordered and we list them
alphabetically.
<a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>
Given support threshold min_sup = 2, <(ab)c> is a
sequential pattern
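
The subsequence test and support count can be sketched as follows. The slide's database table did not survive in this transcript, so the four sequences below are an assumption: they are the running example commonly used to present PrefixSpan, and they agree with the sequences and counts quoted on these slides. The helper names and the single-letter item format are also assumptions.

```python
def parse(text):
    """Parse 'a(abc)(ac)d(cf)' into a list of frozensets (single-letter items assumed)."""
    seq, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            seq.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            seq.append(frozenset(text[i]))
            i += 1
    return seq

def is_subsequence(pattern, sequence):
    """Greedy test: match each pattern element to the earliest later element containing it."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1          # the next pattern element must match strictly later
    return True

def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(1 for s in database if is_subsequence(pattern, s))

# Assumed sequence database (see the note above).
database = [parse(s) for s in (
    "a(abc)(ac)d(cf)",
    "(ad)c(bc)(ae)",
    "(ef)(ab)(df)cb",
    "eg(af)cbc",
)]

print(is_subsequence(parse("a(bc)dc"), database[0]))
print(support(parse("(ab)c"), database))
```

Under that assumed database, `<(ab)c>` is contained in the first and third sequences, which meets min_sup = 2.
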
16
Challenges on Sequential Pattern Mining
  • A huge number of possible sequential patterns are
    hidden in databases
  • A mining algorithm should
  • Find the complete set of patterns satisfying the
    minimum support (frequency) threshold
  • Be highly efficient, scalable, involving only a
    small number of database scans
  • Be able to incorporate various kinds of
    user-specific constraints

17
A Basic Property of Sequential Patterns: Apriori
  • A basic property: Apriori (Agrawal & Srikant '94)
  • If a sequence S is not frequent
  • Then none of the super-sequences of S is frequent
  • E.g., <hb> is infrequent → so are <hab> and <(ah)b>

Given support threshold min_sup = 2
18
Basic Algorithm: Breadth First Search (GSP)
  • L = 1
  • While (Result_L != NULL)
  • Candidate Generate
  • Prune
  • Test
  • L = L + 1
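
A compact sketch of the generate, prune, test loop. It is not GSP as published: candidates are produced by extending each frequent pattern with one frequent item rather than by GSP's join, and the loop tests the current level before generating the next one. The five-sequence database is an assumption (the slide's table is missing from the transcript), chosen because it agrees with the facts stated on the neighbouring slides: eight items a to h, and `<hb>` infrequent at min_sup = 2.

```python
def parse(text):
    """Parse '(bd)cb(ac)' into a tuple of frozensets (single-letter items assumed)."""
    seq, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            seq.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            seq.append(frozenset(text[i]))
            i += 1
    return tuple(seq)

def is_subsequence(pattern, sequence):
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(pattern, database):
    return sum(1 for s in database if is_subsequence(pattern, s))

def drop_one(pattern):
    """Yield every subsequence obtained by deleting exactly one item."""
    for k, element in enumerate(pattern):
        for item in element:
            smaller = element - {item}
            yield pattern[:k] + ((smaller,) if smaller else ()) + pattern[k + 1:]

def level_wise(database, min_sup):
    items = sorted({i for s in database for e in s for i in e})
    candidates = [(frozenset([i]),) for i in items]
    result, stats, freq_items = {}, [], None
    while candidates:
        # Test: count support for every candidate of the current length.
        counts = {c: support(c, database) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        stats.append((len(candidates), len(frequent)))
        result.update(frequent)
        if freq_items is None:
            freq_items = sorted(i for c in frequent for i in c[0])
        # Candidate generate: append an item as a new element, or add it to the last element.
        generated = set()
        for p in frequent:
            for i in freq_items:
                generated.add(p + (frozenset([i]),))
                if i > max(p[-1]):
                    generated.add(p[:-1] + (p[-1] | {i},))
        # Prune (Apriori): every subsequence one item shorter must be frequent.
        candidates = [c for c in generated if all(s in frequent for s in drop_one(c))]
    return result, stats

# Assumed database (see the note above).
database = [parse(s) for s in (
    "(bd)cb(ac)", "(bf)(ce)b(fg)", "(ah)(bf)abf", "(be)(ce)d", "a(bd)bcb(ade)",
)]
patterns, stats = level_wise(database, min_sup=2)
print(stats[:2])   # (candidates, frequent patterns) for lengths 1 and 2
```

`stats` records how many candidates were tested and how many survived at each length, which makes the effect of pruning visible level by level.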

19
Finding Length-1 Sequential Patterns
  • Initial candidates: all singleton sequences
  • <a>, <b>, <c>, <d>, <e>, <f>, <g>, <h>
  • Scan database once, count support for candidates

min_sup = 2
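
The single scan that counts length-1 candidates might look like the sketch below. The database is an assumption, since the slide's table is not in the transcript; it was picked to be consistent with the eight candidates listed above.

```python
from collections import Counter

def parse(text):
    """Parse '(bd)cb(ac)' into a list of frozensets (single-letter items assumed)."""
    seq, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            seq.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            seq.append(frozenset(text[i]))
            i += 1
    return seq

# Assumed database (see the note above).
database = [parse(s) for s in (
    "(bd)cb(ac)", "(bf)(ce)b(fg)", "(ah)(bf)abf", "(be)(ce)d", "a(bd)bcb(ade)",
)]
min_sup = 2

support = Counter()
for sequence in database:
    # A sequence adds at most 1 to an item's support, however often the item recurs in it.
    support.update({item for element in sequence for item in element})

length1_patterns = sorted(item for item, n in support.items() if n >= min_sup)

print(dict(sorted(support.items())))
print(length1_patterns)
```

Collecting the items of each sequence into a set before counting is what makes this a support count rather than an occurrence count.
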
20
The Mining Process
min_sup = 2
21
Generating Length-2 Candidates
51 length-2 candidates
Without the Apriori property, 8*8 + 8*7/2 = 92 candidates
Apriori prunes 44.57% of the candidates
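
The candidate counts can be reproduced by enumeration. The sketch assumes six frequent items (a to f) out of eight items in total (a to h), which is what the figures 51 and 92 imply; the function name is hypothetical.

```python
from itertools import combinations, product

def length2_candidates(items):
    """All length-2 candidates over `items`.

    <xy>   : two elements, ordered, x may equal y  -> n * n candidates
    <(xy)> : one element with two distinct items   -> n * (n - 1) / 2 candidates
    """
    two_elements = [f"<{x}{y}>" for x, y in product(items, repeat=2)]
    one_element = [f"<({x}{y})>" for x, y in combinations(items, 2)]
    return two_elements + one_element

with_apriori = length2_candidates("abcdef")       # built from frequent items only
without_apriori = length2_candidates("abcdefgh")  # built from every item

pruned_percent = round(
    100 * (len(without_apriori) - len(with_apriori)) / len(without_apriori), 2
)
print(len(with_apriori), len(without_apriori), pruned_percent)
```
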
22
Generating Length-4 Candidates
23
Pattern Growth (prefixSpan)
  • Prefix and Suffix (Projection)
  • <a>, <aa>, <a(ab)> and <a(abc)> are prefixes of
    sequence <a(abc)(ac)d(cf)>
  • Given sequence <a(abc)(ac)d(cf)>
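
A small sketch of the prefix test and the matching suffix for the sequence named above. It assumes the usual PrefixSpan convention that items left over in the last matched element must come alphabetically after the matched ones, and it writes that leftover part as `(_...)`. The function names are hypothetical.

```python
def parse(text):
    """Parse 'a(abc)(ac)d(cf)' into a list of frozensets (single-letter items assumed)."""
    seq, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            seq.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            seq.append(frozenset(text[i]))
            i += 1
    return seq

def show(elements, partial=frozenset()):
    """Render elements as text; `partial` is the leftover of a split element."""
    parts = ["(_" + "".join(sorted(partial)) + ")"] if partial else []
    for e in elements:
        items = "".join(sorted(e))
        parts.append(items if len(e) == 1 else "(" + items + ")")
    return "".join(parts)

def is_prefix(beta, alpha):
    m = len(beta)
    if m == 0 or m > len(alpha) or beta[:m - 1] != alpha[:m - 1]:
        return False
    last, full = beta[-1], alpha[m - 1]
    # Leftover items of the split element must sort after every matched item.
    return last <= full and all(x > max(last) for x in full - last)

def suffix(beta, alpha):
    """Suffix of alpha with respect to prefix beta, rendered as text."""
    if not is_prefix(beta, alpha):
        raise ValueError("beta is not a prefix of alpha")
    m = len(beta)
    return show(alpha[m:], partial=alpha[m - 1] - beta[-1])

alpha = parse("a(abc)(ac)d(cf)")
for text in ("a", "aa", "a(ab)", "a(abc)"):
    print(f"<{text}>  ->  <{suffix(parse(text), alpha)}>")
print(is_prefix(parse("ab"), alpha))
```

Under this convention `<ab>` is a subsequence of the sequence but not a prefix of it, because `a` in `(abc)` sorts before `b`.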

24
Example
An Example (min_sup = 2)
25
PrefixSpan (the example to be continued)
Step 1: Find length-1 sequential patterns
<a>:4, <b>:4, <c>:4, <d>:3, <e>:3, <f>:3
(notation: <pattern>:support)
Step 2: Divide the search space into six
subsets according to the six prefixes
Step 3: Find subsets of sequential patterns
by constructing the corresponding projected
databases and mining each
recursively.
26
Example to be continued
27
Example
  • Find sequential patterns having prefix <a>
  • Scan sequence database S once. Sequences in S
    containing <a> are projected w.r.t. <a> to form
    the <a>-projected database.
  • Scan <a>-projected database once, get six
    length-2 sequential patterns having prefix <a>
  • <a>:2, <b>:4, <(_b)>:2, <c>:4, <d>:2, <f>:2
  • <aa>:2, <ab>:4, <(ab)>:2, <ac>:4, <ad>:2,
    <af>:2
  • Recursively, all sequential patterns having
    prefix <a> can be further partitioned into 6
    subsets. Construct respective projected databases
    and mine each.
  • e.g. <aa>-projected database has two
    sequences
  • <(_bc)(ac)d(cf)> and <(_e)>.
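
The two projections mentioned above can be sketched as follows. The four-sequence database is an assumption, since the example table is missing from the transcript; it is the running example commonly used for PrefixSpan and it reproduces the two `<aa>`-projected sequences quoted on the slide. The sketch only handles growing a prefix by appending an item as a new element, which is enough for `<a>` and `<aa>`.

```python
def parse(text):
    """Parse 'a(abc)(ac)d(cf)' into a list of frozensets (single-letter items assumed)."""
    seq, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            seq.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            seq.append(frozenset(text[i]))
            i += 1
    return seq

def show(partial, elements):
    parts = ["(_" + "".join(sorted(partial)) + ")"] if partial else []
    for e in elements:
        items = "".join(sorted(e))
        parts.append(items if len(e) == 1 else "(" + items + ")")
    return "".join(parts)

def project(item, elements):
    """Suffix after the first full element containing `item`, or None if absent."""
    for j, element in enumerate(elements):
        if item in element:
            partial = frozenset(x for x in element if x > item)
            return partial, elements[j + 1:]
    return None

def project_db(item, db):
    """db holds (partial, elements) pairs. The partial element is skipped because
    an item appended as a new element can only match a later, complete element."""
    projected = []
    for _, elements in db:
        hit = project(item, elements)
        if hit is not None and (hit[0] or hit[1]):   # drop empty suffixes
            projected.append(hit)
    return projected

# Assumed database (see the note above).
database = [(frozenset(), parse(s)) for s in (
    "a(abc)(ac)d(cf)", "(ad)c(bc)(ae)", "(ef)(ab)(df)cb", "eg(af)cbc",
)]

a_projected = project_db("a", database)
aa_projected = project_db("a", a_projected)

a_text = [show(p, e) for p, e in a_projected]
aa_text = [show(p, e) for p, e in aa_projected]
print(a_text)
print(aa_text)
```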

28
PrefixSpan Algorithm
Main idea: use frequent prefixes to divide the
search space and to project sequence databases;
only search the relevant sequences.
PrefixSpan(α, i, S|α)
  • Scan S|α once, find the set of frequent items b
    such that
  • b can be assembled to the last element of α to
    form a sequential pattern, or
  • <b> can be appended to α to form a sequential
    pattern.
  • For each frequent item b, append it to α to
    form a sequential pattern α', and output α'
  • For each α', construct the α'-projected database
    S|α', and call PrefixSpan(α', i+1, S|α').
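
A minimal recursive sketch of the procedure above. It departs from the slide in two labeled ways: each projected database is kept as (sequence index, position) pointers instead of copied suffixes, and the length argument i is dropped because it equals the number of items in the prefix. The database is the assumed four-sequence running example, which reproduces the counts quoted on the earlier slides.

```python
from collections import defaultdict

def parse(text):
    """Parse 'a(abc)(ac)d(cf)' into a tuple of frozensets (single-letter items assumed)."""
    seq, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            seq.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            seq.append(frozenset(text[i]))
            i += 1
    return tuple(seq)

def show(pattern):
    return "".join(
        "".join(sorted(e)) if len(e) == 1 else "(" + "".join(sorted(e)) + ")"
        for e in pattern
    )

def prefixspan(database, min_sup):
    results = {}

    def mine(prefix, projected):
        # projected: (sequence index, index of the element matching the prefix's last element)
        last = prefix[-1] if prefix else frozenset()
        appended = defaultdict(dict)    # item -> {sequence index: earliest position}
        assembled = defaultdict(dict)
        for sid, pos in projected:
            seq = database[sid]
            # <b> appended to the prefix: b occurs in a strictly later element.
            for j in range(pos + 1, len(seq)):
                for item in seq[j]:
                    appended[item].setdefault(sid, j)
            # b assembled into the last element: some element from pos onward
            # contains the whole last element plus a later-sorting item b.
            if prefix:
                for j in range(pos, len(seq)):
                    if last <= seq[j]:
                        for item in seq[j]:
                            if item > max(last):
                                assembled[item].setdefault(sid, j)
        grown = [(prefix[:-1] + (last | {b},), occ) for b, occ in assembled.items()]
        grown += [(prefix + (frozenset([b]),), occ) for b, occ in appended.items()]
        for pattern, occ in grown:
            if len(occ) >= min_sup:                   # frequent in the projected database
                results[show(pattern)] = len(occ)     # output the grown pattern
                mine(pattern, list(occ.items()))      # recurse on its projected database

    mine((), [(sid, -1) for sid in range(len(database))])
    return results

# Assumed database (see the note above).
database = [parse(s) for s in (
    "a(abc)(ac)d(cf)", "(ad)c(bc)(ae)", "(ef)(ab)(df)cb", "eg(af)cbc",
)]
found = prefixspan(database, min_sup=2)
print({p: n for p, n in found.items() if len(p) == 1})
print({p: found[p] for p in ("aa", "ab", "(ab)", "ac", "ad", "af")})
```

Keeping pointers rather than copied suffixes avoids rebuilding sequences at every level, at the cost of rescanning from the stored position in each recursive call.
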