1
Optimal rule discovery and applications
  • Dr Jiuyong (John) Li
  • Dept of Mathematics and Computing
  • The University of Southern Queensland
  • Toowoomba, Australia

2
Outline
  • Introduction
  • Optimal rule discovery
  • Robust rule based classification
  • Mining risk patterns in medical data
  • Summary

3
Rules
  • Strong implications
  • If outlook is sunny, and humidity is normal, then
    play tennis.
  • Advantages
  • Straightforward and expressive
  • Human understandable
  • Rule based classification systems are competitive
    with many other systems, such as neural networks,
    nearest neighbour classifiers, and Bayesian
    classifiers.

4
Rule types 1
  • Traditional classification rules
  • Decision tree based, e.g. C4.5rules (Quinlan
    1993); covering algorithm based, e.g. AQ15
    (Michalski, Mozetic, Hong & Lavrac 1986) and CN2
    (Clark & Niblett 1989)
  • Efficient
  • Heuristic search, so they may miss many quality
    rules

5
Data
6
A decision tree
7
Decision rules
  • If outlook is sunny and humidity is high, then do
    not play tennis.
  • If outlook is sunny and humidity is normal, then
    play tennis.
  • If outlook is overcast, then play tennis.
  • If outlook is rain and wind is strong, then do
    not play tennis.
  • If outlook is rain and wind is weak, then play
    tennis.

8
Rule types 2
  • Association rules
  • Complete search
  • Too many rules
  • Bottle-neck problem (combinatorial explosion)
  • Search is pruned by anti-monotone properties
  • Apriori (Agrawal & Srikant 1994) and FP-growth
    (Han, Pei & Yin 2000) are based on the
    anti-monotone property of support
  • Many variants
  • Non-redundant association rules (Zaki 2004)
  • Based on the anti-monotone property of closure

9
Association rules 1
  • Items: attribute-value pairs
  • e.g. (outlook, sunny), (humidity, normal)
  • Patterns: sets of attribute-value pairs
  • e.g. {(outlook, sunny), (humidity, normal)}
  • Implications: pattern -> class
  • {(outlook, sunny), (humidity, normal)} -> play
  • Support: fraction of records containing both the
    pattern and the class in the data set
  • Confidence:
  • support of (pattern, class) / support of pattern
  • Support = 2/14 ≈ 0.14, confidence = 2/2 = 100%
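
A minimal sketch of the two measures in Python (the three toy records below are a hypothetical fragment of the weather data, not the full 14-record table, so the printed numbers differ from the 2/14 above):

```python
# Support and confidence over attribute-value records.
from typing import FrozenSet, List, Tuple

Item = Tuple[str, str]  # an (attribute, value) pair

def support(pattern: FrozenSet[Item], records: List[dict]) -> float:
    """Fraction of records containing every item of the pattern."""
    hits = sum(1 for r in records if all(r.get(a) == v for a, v in pattern))
    return hits / len(records)

def confidence(pattern: FrozenSet[Item], cls: Item, records: List[dict]) -> float:
    """supp(pattern plus class) / supp(pattern)."""
    return support(pattern | {cls}, records) / support(pattern, records)

records = [
    {"outlook": "sunny", "humidity": "normal", "play": "yes"},
    {"outlook": "sunny", "humidity": "high", "play": "no"},
    {"outlook": "overcast", "humidity": "high", "play": "yes"},
]
p = frozenset([("outlook", "sunny"), ("humidity", "normal")])
print(support(p, records))                      # 1/3
print(confidence(p, ("play", "yes"), records))  # 1.0
```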

10
Association rules 2
  • Association rules
  • Implications whose support and confidence exceed
    the user-specified minimum support and minimum
    confidence
  • Frequent patterns (rules)
  • Support > minimum support
  • Super (sub) patterns (rules)
  • {(outlook, sunny), (humidity, normal)} is a super
    pattern of {(outlook, sunny)}

11
Association rules 3
  • Anti-monotone property of support
  • If a pattern (rule) is infrequent, all of its
    super patterns (rules) are infrequent
  • Complete search space
  • |A1| × |A2| × … × |Am| > 2^m
  • Practically infeasible to enumerate
  • Association rule mining
  • The anti-monotone property of support makes
    association rule mining feasible
  • The minimum support cannot be too small
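
A level-wise search sketch in the style of Apriori, assuming the support() helper from the sketch above; the anti-monotone property keeps the search feasible because no superset of an infrequent pattern is ever generated:

```python
# Apriori-style level-wise search with anti-monotone pruning.
from itertools import combinations

def apriori(records, items, min_support):
    frequent = []
    level = [frozenset([i]) for i in items]  # candidate 1-patterns
    while level:
        level = [p for p in level if support(p, records) >= min_support]
        frequent.extend(level)
        seen = set(level)
        # Join frequent k-patterns into (k+1)-candidates, keeping a
        # candidate only if all of its k-sub-patterns are frequent.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if all(frozenset(s) in seen
                        for s in combinations(c, len(c) - 1))]
    return frequent
```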

12
Why optimal rules
  • Optimal rules
  • Complete
  • Defined by various interestingness criteria
  • Reduce the number of rules
  • New anti-monotone property that supports the
    efficient search
  • Work well with low minimum support
  • Wide applications
  • Robust classification
  • Medical data mining
  • Related work
  • Constraint-based association rule mining
    (Bayardo, Agrawal & Gunopulos 2000)
  • Mining the most interesting rules (Bayardo &
    Agrawal 1999)

13
Various interestingness criteria
  • Many interestingness criteria have been proposed
    as substitutes for confidence
  • Such as lift (interest or strength), gain,
    added-value, Klosgen, conviction, p-s, Laplace,
    cosine, certainty factor, Jaccard, and many
    others (Tan, Kumar & Srivastava 2004)
  • Confidence (or another interestingness criterion)
    plays no part in pruning the search space
  • Confidence is used to form rules only after the
    major computational task has finished.

14
Uninteresting rules
  • Some rules do not carry useful information
  • If outlook is overcast, then play tennis.
    (support = 4/14, confidence = 100%)
  • If outlook is overcast and temperature is hot,
    then play tennis. (support = 2/14, confidence =
    100%)
  • The latter rule is redundant
  • Redundant rules are not optimal, and some
    non-redundant rules are not optimal either.

15
Optimal rules 1
  • General and specific relationships
  • Given two rules P -> c and Q -> c where P ⊂ Q,
    we say that the latter is more specific than the
    former and the former is more general than the
    latter.
  • The optimal rule set
  • A rule set is optimal with respect to an
    interestingness metric if it contains all rules
    except those with no greater interestingness than
    one of their more general rules.

16
Optimal rules 2
  • An association rule set
  • a -> z (conf = 80%), ab -> z (conf = 70%),
    abc -> z (conf = 70%), b -> z (conf = 60%)
  • An optimal rule set
  • a -> z (conf = 80%), b -> z (conf = 60%)
  • A non-redundant association rule set
  • a -> z (conf = 80%), ab -> z (conf = 70%),
    b -> z (conf = 60%)
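
A minimal sketch of the optimality filter on this example: a rule is dropped when some more general rule with the same consequent is at least as interesting (confidence here):

```python
# Keep only rules not dominated by a more general rule.
def optimal_rules(rules):
    """rules: list of (antecedent frozenset, consequent, confidence)."""
    keep = []
    for ant, cons, conf in rules:
        dominated = any(a < ant and c == cons and cf >= conf
                        for a, c, cf in rules)
        if not dominated:
            keep.append((ant, cons, conf))
    return keep

rules = [(frozenset("a"), "z", 0.8), (frozenset("ab"), "z", 0.7),
         (frozenset("abc"), "z", 0.7), (frozenset("b"), "z", 0.6)]
print(optimal_rules(rules))  # only a -> z and b -> z survive
```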

17
Main results 1
  • Anti-monotonic property
  • If supp(PXc) = supp(Pc), then rule PX -> c
    and all its more specific rules will not occur in
    an optimal rule set defined by confidence, odds
    ratio, lift (interest or strength), gain,
    added-value, Klosgen, conviction, p-s (or
    leverage), Laplace, cosine, certainty factor or
    Jaccard.
  • The relationship with the non-redundant rule set
  • An optimal rule set is a subset of a
    non-redundant rule set.

18
An illustration
19
Main results 2
  • Closure property
  • If supp(P) = supp(PX), then rule PX -> c for any
    c and all its more specific rules do not occur in
    an optimal rule set defined by confidence, odds
    ratio, lift (interest or strength), gain,
    added-value, Klosgen, conviction, p-s (or
    leverage), Laplace, cosine, certainty factor or
    Jaccard.
  • Termination property
  • If supp(P¬c) = 0 (i.e. rule P -> c has 100%
    confidence), then all more specific rules of
    P -> c do not occur in an optimal rule set
    defined by confidence, odds ratio, lift (interest
    or strength), gain, added-value, Klosgen,
    conviction, p-s (or leverage), Laplace, cosine,
    certainty factor or Jaccard.
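
A sketch of the three pruning tests, assuming the support() helper from the first sketch; representing the negated class ¬c as an ordinary item is an assumption of this sketch. Each test returning True means the node PX and all of its extensions can be skipped:

```python
def anti_monotone_prune(P, X, c, records):
    # supp(PXc) = supp(Pc): X contributes nothing towards c.
    return support(P | X | {c}, records) == support(P | {c}, records)

def closure_prune(P, X, records):
    # supp(P) = supp(PX): every record containing P also contains X.
    return support(P, records) == support(P | X, records)

def termination_prune(P, not_c, records):
    # supp(P not_c) = 0: rule P -> c is already 100% confident.
    return support(P | {not_c}, records) == 0
```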

20
More illustrations
21
More illustrations
22
Data
23
Patterns searched by exhaustive search
  • 1-patterns: 3 + 3 + 2 + 2 = 10
  • 2-patterns: 3 × (3 + 2 + 2) + 3 × (2 + 2) +
    2 × 2 = 37
  • 3-patterns: 3 × 3 × 2 + 3 × 3 × 2 + 3 × 2 × 2 +
    3 × 2 × 2 = 60
  • 4-patterns: 3 × 3 × 2 × 2 = 36
  • Total: 143
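
A few lines of Python confirm these counts from the four attribute domain sizes (3, 3, 2, 2):

```python
# Each k-pattern picks one value from each of k distinct attributes,
# so the grand total is prod(|Ai| + 1) - 1.
from itertools import combinations
from math import prod

sizes = [3, 3, 2, 2]
for k in range(1, len(sizes) + 1):
    print(f"{k}-patterns:", sum(prod(c) for c in combinations(sizes, k)))
print("total:", prod(s + 1 for s in sizes) - 1)  # 10 + 37 + 60 + 36 = 143
```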

24
Patterns searched by association rule discovery
(103)
25
Patterns searched by optimal rule discovery (42)
26
Experimental results 1
27
Experimental results 2
28
Experimental results 3
29
Experimental results 4
30
Conclusions
  • Rules defined by various interestingness criteria
    can be discovered in the optimal rule discovery
    framework, i.e. they satisfy the same
    anti-monotone property.
  • Optimal rule discovery is an efficient approach.
    It is significantly more efficient than
    association rule discovery and more efficient
    than non-redundant rule discovery.

31
More details
  • J. Li, On Optimal Rule Discovery, IEEE
    Transactions on Knowledge and Data Engineering,
    18(4), 2006.
  • J. Li, H. Shen and R. Topor, Mining the optimal
    class association rule set, Knowledge-Based
    Systems, 15(7), 2002, 399-405, Elsevier Science.

32
Data
33
Why robust 1
34
Why robust 2
  • If outlook is sunny and humidity is high, then do
    not play tennis.
  • If outlook is sunny and humidity is normal, then
    play tennis.
  • If outlook is overcast, then play tennis.
  • If outlook is rain and wind is strong, then do
    not play tennis.
  • If outlook is rain and wind is weak, then play
    tennis.

35
Some additional rules are useful
  • If humidity is normal and wind is weak, then play
    tennis.
  • If temperature is cool and wind is weak, then
    play tennis.
  • If temperature is mild and humidity is normal,
    then play tennis.
  • If humidity is normal, then play tennis.

36
Motivations
  • Those additional useful rules are not found by
    decision trees.
  • An association rule set includes too many rules,
    and even an optimal rule set includes too many
    rules.
  • For example, on the mushroom data set
  • Association rules: 99,126
  • Optimal rules: 1,691
  • C4.5rules: 16
  • How to choose a reasonable rule set for data with
    missing values?

37
Robust prediction problem 1
  • Problem
  • Making predictions on test data that are less
    complete than the training data.
  • Practical implication
  • Training data, typically selected historical
    data, is more controllable.
  • Test data, arriving in the future, is less
    controllable.

38
Robust prediction problem 2
  • General methods for handling missing values
    pre-process the data by substituting missing
    values with estimates, e.g. nearest neighbour
    substitution (Batista & Monard 2003).
  • A "treatment" approach
  • The proposed method does not estimate or
    substitute any missing values, but builds a model
    that tolerates a certain number of missing values
    in the test data.
  • An "immunisation" approach

39
Definitions 1
  • Ordered rule based classifiers
  • Rules are organised in a sequence, usually in
    descending order of accuracy, and only the first
    matching rule makes a prediction. For example,
    C4.5rules (Quinlan 1993) and CBA (Liu, Hsu & Ma
    1998).
  • Predictive rule
  • Let T be a record in data set D and R a rule set
    for D. A rule r in R is predictive for T w.r.t. R
    if r covers T. If two rules cover T, we choose
    the one with the greater accuracy; if they have
    the same accuracy, the one with the higher
    support; and if they have the same support, the
    one with the shorter antecedent.
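
A sketch of this tie-breaking order; the Rule container is hypothetical, not the API of any cited system:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: frozenset  # set of (attribute, value) items
    consequent: str
    accuracy: float
    support: float

def covers(rule, record):
    return all(record.get(a) == v for a, v in rule.antecedent)

def predictive_rule(record, rules):
    """Highest accuracy, then highest support, then shortest antecedent."""
    matching = [r for r in rules if covers(r, record)]
    if not matching:
        return None
    return max(matching,
               key=lambda r: (r.accuracy, r.support, -len(r.antecedent)))
```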

40
Definitions 2
  • Robustness
  • Let D be a data set, and R1 and R2 be two rule
    sets for D. R2 is at least as robust as R1 if,
    for every incomplete version of D, predictions
    made by R2 are at least as accurate as those made
    by R1.
  • K-incomplete data set
  • Let D be a data set with n attributes, and k > 0.
    The k-incomplete data set Dk of D contains the
    records of D with up to k attribute values
    removed.
  • K-optimal rule set
  • A k-optimal rule set contains the set of all
    predictive rules on the k-incomplete data set.
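
A sketch of the k-incomplete construction under the reading above: every version of a record with up to k attribute values removed (missing values are represented by deleting dictionary keys; the class label would be kept intact in practice):

```python
from itertools import combinations

def k_incomplete(records, k):
    out = []
    for r in records:
        for j in range(k + 1):  # remove 0, 1, ..., k values
            for drop in combinations(sorted(r), j):
                out.append({a: v for a, v in r.items() if a not in drop})
    return out
```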

41
Major results
  • The optimal rule set is the most robust rule set
    with the smallest rule set size.
  • A (k+1)-optimal rule set is at least as robust
    as a k-optimal rule set.
  • A (k+1)-optimal rule set is a super rule set of
    a k-optimal rule set.

42
An illustrative example
  • When attribute a is missing
  • the min-optimal rule set does not work
  • the 1-optimal rule set still works

43
Experiment design
  • Use 10-fold cross validation.
  • Randomly add missing values to the test data,
    controlled by a parameter l (on average each
    record has l missing values).
  • Repeat 10 × 10 times for each data set.
  • Experiment on 28 data sets from the UCI ML
    repository.
  • Compare with benchmark classifiers: C4.5rules
    and CBA.
  • Compare with missing-value handling methods:
    most common value substitution and k-nearest
    neighbour substitution.
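
A sketch of the missing-value injection step; the exact randomisation scheme used in the experiments is an assumption here, chosen so that removals average l per record:

```python
import random

def add_missing(records, l, seed=0):
    rng = random.Random(seed)
    masked = []
    for r in records:
        n = min(len(r), rng.randint(0, 2 * l))  # uniform, mean l
        drop = rng.sample(sorted(r), n)
        masked.append({a: v for a, v in r.items() if a not in drop})
    return masked
```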

44
Experimental results 1
45
Experimental results 2
46
Experimental results 3
47
Experimental results 4
48
Main conclusions
  • Optimal classifiers are more robust than some
    benchmark rule based classifiers, such as
    C4.5rules and CBA: they make more accurate
    predictions on test data with missing values
    than C4.5rules and CBA do.
  • Building optimal classifiers is better than some
    missing-value handling methods, such as k-nearest
    neighbour substitution and most common value
    substitution.

49
More details
  • J. Li, Robust Rule-Based Prediction: A Redundant
    Rule Approach, IEEE Transactions on Knowledge and
    Data Engineering, 18(8), 2006.
  • H. Hu and J. Li, Using association rules to make
    rule-based classifiers robust, Proceedings of the
    Sixteenth Australasian Database Conference (ADC),
    2005, 47-52, Australian Computer Society.
  • J. Li, R. Topor and H. Shen, Construct robust
    rule sets for classification, Proceedings of the
    Eighth ACM SIGKDD International Conference on
    Knowledge Discovery and Data Mining (KDD), 2002,
    Edmonton, Canada, 564-569, ACM Press.

50
Risk patterns 1
  • Out of 200 smokers, 3% suffer lung cancer
  • Out of 800 non-smokers, 0.5% suffer lung cancer
  • Smokers are 6 times more at risk of lung cancer
    than non-smokers

51
Risk patterns 2
Relative risk
A concept that has been widely used in
epidemiological research.
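
The slide's formula image is lost; a standard reconstruction, in the pattern notation used earlier with a as the abnormal class, is:

```latex
\mathrm{RR}(P \to a) =
  \frac{\operatorname{supp}(Pa)/\operatorname{supp}(P)}
       {\operatorname{supp}(\neg P\, a)/\operatorname{supp}(\neg P)}
```

On the smoking example: RR = 3% / 0.5% = 6.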
52
Problems
  • The relative risk metric is not consistent with
    accuracy, so a normal classification system does
    not work well.
  • The data set is normally very skewed, so the
    global support of association rule mining is not
    suitable.
  • Patterns may contain many conditions, which
    causes combinatorial explosion.

53
A solution
  • Replace (global) support with local support,
    i.e. support within the abnormal class
  • The task can then be characterised as an optimal
    rule discovery problem
  • Both local support and relative risk satisfy
    anti-monotone properties
  • If a pattern is not frequent, neither are its
    super patterns
  • If supp(Pxa) = supp(Pa), then pattern Px and
    all its super patterns do not occur in the
    optimal risk pattern set.
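
A sketch of the two tests, assuming the support() helper from the first sketch; a is the abnormal-class item, and measuring local support within that class follows the reading above:

```python
def local_support(P, a, records):
    return support(P | {a}, records) / support(frozenset([a]), records)

def risk_prune(P, x, a, records, min_local_supp):
    Px = P | {x}
    infrequent = local_support(Px, a, records) < min_local_supp
    # supp(Pxa) = supp(Pa): adding x cannot increase relative risk.
    no_gain = support(Px | {a}, records) == support(P | {a}, records)
    return infrequent or no_gain
```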

54
A real world case study 1
  • This method has been applied to a real world
    project of detecting adverse drug reactions
  • The project was sponsored by the Australian
    Commonwealth Department of Health and Ageing
  • The data set used is a linked data set of
    hospital, pharmaceutical and medical service data
  • To determine how ACE inhibitor usage is
    associated with Angioedema.

55
A real world case study 2
56
A real world case study 3
  • Pattern 1 (RR = 3.99)
  • Gender = Female
  • Hospital circulatory flag = Yes
  • Usage of drugs in category Various = Yes
  • Pattern 2 (RR = 3.82)
  • Age > 60
  • Usage of drugs in category Genito-urinary system
    and sex hormones = Yes
  • Usage of drugs in category Systemic hormonal
    preparations = Yes
  • Pattern 3 (RR = 3.41)
  • Usage of drugs in category Genito-urinary system
    and sex hormones = Yes
  • Usage of drugs in category General
    anti-infectives for systemic use = Yes
  • Usage of drugs in category Nervous system = No

57
A real world case study 4
58
A real world case study 5
59
A real world case study 6
60
Conclusions
  • Optimal rule discovery is an efficient approach
    for discovering risk patterns in large, skewed
    medical data sets.
  • More details
  • J. Li, A. Fu, H. He, J. Chen, H. Jin, D.
    McAullay, G. Williams, R. Sparks and C. Kelman,
    Mining risk patterns in medical data, Proceedings
    of the Eleventh ACM SIGKDD International
    Conference on Knowledge Discovery in Data Mining
    (KDD'05), 2005, 770-775, Chicago, ACM Press, New
    York.

61
Summary
  • Optimal rule discovery is an efficient approach
    for discovering various optimal rules
  • Optimal classifiers are more robust than some
    benchmark rule based classifiers, such as
    C4.5rules and CBA
  • Optimal rule discovery is efficient in
    discovering risk patterns in large, skewed
    medical data sets

62
Acknowledgements
  • Collaborators
  • Hong Shen, Rodney Topor, Hong Hu, Ada Fu,
    Hongxing He, Jie Chen, Huidong Jin, Graham
    Williams, and others
  • Internal reviewers
  • Tony Roberts, Ron House, and Xiaodi Huang
  • Australian Research Council grant, P0559090
  • USQ Early Career Researcher Program grant,
    4710/1000479

63
Thank you
  • Questions

My papers and software tools are available
from http://www.sci.usq.edu.au/staff/jiuyong