Manuel Brandozzi - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Scalable Data Mining with Model Constraints
by Minos Garofalakis and Rajeev Rastogi (Bell Labs)
  • Presented by Manuel Brandozzi

2
Introduction
  • Users are interested in specific patterns only
  • Traditional approaches incur a disproportionate
    cost
  • They produce an overwhelming volume of rules or
    patterns

3
Overcoming These Drawbacks
  • The authors introduce scalable constraint-based
    algorithms
  • Construct decision trees with accuracy and size
    constraints
  • Extract sequential patterns with regular
    expression constraints

4
Decision Tree Construction
  • Traditional algorithms prevent overfitting by applying the
    MDL (Minimum Description Length) principle
  • The MDL principle is based on the cost of encoding:
      • Structure of the tree
      • Attribute and split value
      • Classes of data records in leaves

5
Pushing Constraints into Tree Building
  • Traditional algorithms enforce user constraints
    after a full decision tree is built
  • The new algorithm periodically enforces the
    constraints while building the tree

6
How Does It Work?
  • Compute cost U of cheapest subtree of size at
    most k of the partial tree Tp
  • Compute lower bounds on cost of subtrees rooted
    at the nodes of Tp
  • Nodes whose lower bound is greater than U may be
    safely pruned
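
The pruning rule on this slide can be sketched in a few lines of Python. This is only an illustration of the comparison against U, not the authors' algorithm: the helper `prune_nodes`, the node names, and all cost values are hypothetical.

```python
def prune_nodes(lower_bounds, upper_bound):
    """Split nodes into (kept, pruned) using the rule from the slide.

    lower_bounds: dict mapping a node name to a lower bound on the cost
                  of subtrees rooted at that node (made-up values here).
    upper_bound:  U, the cost of the cheapest size-constrained subtree
                  of the partial tree found so far.
    """
    kept, pruned = [], []
    for node, bound in lower_bounds.items():
        # Per the slide, a node whose lower bound exceeds U is safe to prune.
        if bound > upper_bound:
            pruned.append(node)
        else:
            kept.append(node)
    return kept, pruned


# Hypothetical lower bounds for three nodes of a partial tree.
bounds = {"n1": 12.0, "n2": 30.5, "n3": 18.0}
kept, pruned = prune_nodes(bounds, upper_bound=20.0)
print("kept:", kept, "pruned:", pruned)
```

A node whose bound equals U is kept here, because the slide only calls for pruning when the bound is strictly greater than U.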

7
Sequential Pattern Discovery with RE Constraints
  • RE constraint: a regular expression over the
    alphabet of sequence elements
  • An RE constraint can be expressed as a
    deterministic finite automaton
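
As a small illustration of the second bullet, the sketch below hand-codes a deterministic automaton for the regular expression `a (b | c)* d`. The expression, the state numbering, and the function name `accepts` are assumptions chosen for this example; they do not come from the slides.

```python
# Transition table of a deterministic automaton for: a (b | c)* d
# State 0 is the start state and state 2 is the only accept state.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 1,
    (1, "d"): 2,
}
START = 0
ACCEPT = {2}


def accepts(sequence):
    """Return True if the sequence of elements satisfies the RE constraint."""
    state = START
    for element in sequence:
        key = (state, element)
        if key not in TRANSITIONS:
            # No transition defined: the sequence cannot match.
            return False
        state = TRANSITIONS[key]
    return state in ACCEPT


print(accepts(["a", "b", "c", "d"]))
print(accepts(["a", "b"]))
```

The first call follows the path 0, 1, 1, 1, 2 and ends in the accept state; the second stops in state 1, so the sequence does not satisfy the constraint.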

8
SPIRIT Family of Algorithms
  • Get frequent sequences that satisfy constraint C
  • Work in passes
  • They relax C by using a weaker constraint C' and
    use C' in the candidate generation and pruning
    phases

9
SPIRIT Family of Algorithms (cont.)
  • The goal is to restrict the number of candidate
    k-sequences at each step
  • Constraint-based pruning
  • Support-based pruning
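
The pass structure and the two kinds of pruning can be sketched as below. This is a deliberately simplified level-wise miner, not any of the SPIRIT algorithms: it grows patterns by appending one element, so it assumes the weaker constraint (`relaxed_ok`) also holds for every non-empty prefix of a sequence that satisfies it. All function names and the toy database are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(element in it for element in pattern)


def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database)


def mine(database, alphabet, min_support, relaxed_ok, full_ok):
    """Level-wise mining with a relaxed constraint pushed into each pass."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    frequent, level = [], [()]
    while level:
        # Candidate generation with constraint-based pruning:
        # only extensions that satisfy the weaker constraint survive.
        candidates = [p + (x,) for p in level for x in alphabet
                      if relaxed_ok(p + (x,))]
        # Support-based pruning: drop infrequent candidates.
        level = [c for c in candidates
                 if support(c, database) >= min_support]
        frequent.extend(level)
    # The original constraint is enforced on the final result.
    return [p for p in frequent if full_ok(p)]


# Toy data. Full constraint: starts with "a" and ends with "d".
# Weaker constraint used during the passes: starts with "a".
database = [("a", "b", "d"), ("a", "d"), ("b", "a", "d"), ("a", "b")]
result = mine(
    database,
    alphabet="abd",
    min_support=3,
    relaxed_ok=lambda p: p[0] == "a",
    full_ok=lambda p: p[0] == "a" and p[-1] == "d",
)
print(result)
```

In this toy run only one candidate is generated in the first pass, because the weaker constraint rejects every 1-sequence that does not start with "a"; support counting then removes the infrequent 2-sequences.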

10
Type of Constraints and Strategies
  • Anti-monotone: push C all the way inside the
    computation
  • Non anti-monotone: use an anti-monotone
    relaxation of C
  • A non-trivial relaxation may not exist
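
To make the distinction concrete: a constraint is anti-monotone if every subsequence of a sequence that satisfies it also satisfies it. The brute-force check below searches a few sample sequences for a counterexample. It can refute anti-monotonicity but cannot prove it, and the two example constraints are assumptions chosen for illustration.

```python
from itertools import combinations


def subsequences(seq):
    """Yield all non-empty proper subsequences of seq, order preserved."""
    for r in range(1, len(seq)):
        for idx in combinations(range(len(seq)), r):
            yield tuple(seq[i] for i in idx)


def find_counterexample(constraint, samples):
    """Return (sequence, subsequence) violating anti-monotonicity, or None."""
    for s in samples:
        if constraint(s):
            for sub in subsequences(s):
                if not constraint(sub):
                    return s, sub
    return None


samples = [("a", "b", "d"), ("a", "d")]

# A length bound is anti-monotone: shortening a sequence never breaks it.
max_length = lambda s: len(s) <= 3

# An RE-style constraint ("starts with a, ends with d") is not.
starts_a_ends_d = lambda s: s[0] == "a" and s[-1] == "d"

print(find_counterexample(max_length, samples))
print(find_counterexample(starts_a_ends_d, samples))
```

The second call reports that ("a", "b", "d") satisfies the constraint while its subsequence ("a",) does not, which is why such a constraint cannot simply be pushed all the way inside the computation.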

11
Pushing Non Anti-Monotone Constraints
  • The strength of the non anti-monotone relaxation
    of C impacts both constraint-based and
    support-based pruning

12
The SPIRIT Family
  • SPIRIT(N): all elements appear in R
  • SPIRIT(L): legal w.r.t. some state of A_R
  • SPIRIT(V): valid w.r.t. some state of A_R
  • SPIRIT(R): valid, i.e. C' is R
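
The four relaxations can be compared on a small automaton. The sketch assumes that "legal w.r.t. a state" means every transition is defined when the sequence is read from that state, and that "valid w.r.t. a state" means legal and ending in an accept state; the slides do not spell these definitions out. The automaton (for `a (b | c)* d`) and all function names are hypothetical.

```python
# Automaton for the example RE  a (b | c)* d  (an assumption for this sketch).
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 1, (1, "d"): 2}
STATES = {0, 1, 2}
START = 0
ACCEPT = {2}
ALPHABET_OF_R = {"a", "b", "c", "d"}


def run(state, sequence):
    """Follow the sequence from state; return the end state or None."""
    for element in sequence:
        state = TRANSITIONS.get((state, element))
        if state is None:
            return None
    return state


def is_legal(sequence, state):
    return run(state, sequence) is not None


def is_valid(sequence, state):
    return run(state, sequence) in ACCEPT


def relaxation_n(sequence):   # every element appears in R
    return all(e in ALPHABET_OF_R for e in sequence)


def relaxation_l(sequence):   # legal w.r.t. some state
    return any(is_legal(sequence, s) for s in STATES)


def relaxation_v(sequence):   # valid w.r.t. some state
    return any(is_valid(sequence, s) for s in STATES)


def relaxation_r(sequence):   # valid w.r.t. the start state, i.e. R itself
    return is_valid(sequence, START)


for seq in [("b", "c"), ("c", "d"), ("a", "d"), ("d", "a")]:
    print(seq, relaxation_n(seq), relaxation_l(seq),
          relaxation_v(seq), relaxation_r(seq))
```

Each relaxation in the list is at least as strict as the one before it: ("b", "c") passes only the first two checks, ("c", "d") passes the first three, and only ("a", "d") satisfies R itself.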

13
Experimental results
  • There is no clear winner between the different
    algorithms
  • SPIRIT(R) best for highly selective constraints
  • SPIRIT(V) best overall

14
Conclusion
  • Summarizes previous work of the authors on
    constrained DT and AR
  • Additional implementation details may be found
    there
  • Experimental results verify their hypothesis