Transcript and Presenter's Notes

Title: Rule Induction


1
Rule Induction
ACAI 05 ADVANCED COURSE ON KNOWLEDGE DISCOVERY
  • Nada Lavrac
  • Department of Knowledge Technologies
  • Jožef Stefan Institute
  • Ljubljana, Slovenia

2
Talk outline
  • Predictive vs. Descriptive DM
  • Predictive rule induction
  • Classification vs. estimation
  • Classification rule induction
  • Heuristics and rule quality evaluation
  • Descriptive rule induction
  • Predictive vs. Descriptive DM summary


3
Types of DM tasks
  • Predictive DM
  • Classification (learning of rulesets, decision
    trees, ...)
  • Prediction and estimation (regression)
  • Predictive relational DM (RDM, ILP)
  • Descriptive DM
  • description and summarization
  • dependency analysis (association rule learning)
  • discovery of properties and constraints
  • segmentation (clustering)
  • subgroup discovery
  • Text, Web and image analysis



4
Predictive vs. descriptive induction
  • Predictive induction: inducing classifiers for
    solving classification and prediction tasks
  • Classification rule learning, Decision tree
    learning, ...
  • Bayesian classifier, ANN, SVM, ...
  • Data analysis through hypothesis generation and
    testing
  • Descriptive induction: discovering interesting
    regularities in the data, uncovering patterns,
    ... for solving KDD tasks
  • Symbolic clustering, Association rule learning,
    Subgroup discovery, ...
  • Exploratory data analysis

5
Predictive vs. descriptive induction: a rule
learning perspective
  • Predictive induction: induces rulesets acting as
    classifiers for solving classification and
    prediction tasks
  • Descriptive induction: discovers individual rules
    describing interesting regularities in the data
  • Therefore: different goals, different heuristics,
    different evaluation criteria

6
Supervised vs. unsupervised learning: a rule
learning perspective
  • Supervised learning: rules are induced from
    labeled instances (training examples with class
    assignment) - usually used in predictive
    induction
  • Unsupervised learning: rules are induced from
    unlabeled instances (training examples with no
    class assignment) - usually used in descriptive
    induction
  • Exception: subgroup discovery
  • Discovers individual rules describing
    interesting regularities in the data, induced from
    labeled examples

7
Subgroups vs. classifiers
  • Classifiers
  • Classification rules aim at pure subgroups
  • A set of rules forms a domain model
  • Subgroups
  • Rules describing subgroups aim at a significantly
    higher proportion of positives
  • Each rule is an independent chunk of knowledge
  • Link: SD can be viewed as a form of
    cost-sensitive classification

8
Talk outline
  • Predictive vs. Descriptive DM
  • Predictive rule induction
  • Classification vs. estimation
  • Classification rule induction
  • Heuristics and rule quality evaluation
  • Descriptive rule induction
  • Predictive vs. Descriptive DM summary


9
Predictive DM - Classification
  • data are objects, characterized by attributes;
    objects belong to different classes (discrete
    labels)
  • given objects described by attribute values,
    induce a model to predict the different classes
  • decision trees, if-then rules, ...

10
Illustrative example: Contact lenses data
11
Decision tree for contact lenses recommendation
12
Illustrative example: Customer data
13
Induced decision trees
(Tree diagram: root split on Income at 102000; further splits on Age at 58,
Gender, and Age at 49; leaves predict yes/no.)
14
Predictive DM - Estimation
  • often referred to as regression
  • data are objects, characterized by attributes
    (discrete or continuous); the classes of objects
    are continuous (numeric)
  • given objects described by attribute values,
    induce a model to predict the numeric class value
  • regression trees, linear and logistic regression,
    ANN, kNN, ...

15
Illustrative example: Customer data
16
Customer data regression tree
(Regression tree: root split on Income at 108000, then on Age at 42.5;
leaf predictions 12000, 16500 and 26700.)
17
Predicting algal biomass regression tree
(Regression tree: root split on Month (Jan.-June vs. July-Dec.); internal
splits on Ptot (at 9.34, 9.1 and 5.9) and Si (at 10.1 and 2.13); leaf
predictions given as mean ± standard deviation: 2.34±1.65, 4.32±2.07,
1.28±1.08, 2.08±0.71, 2.97±1.09, 0.70±0.34, 1.15±0.21.)
18
Talk outline
  • Predictive vs. Descriptive DM
  • Predictive rule induction
  • Classification vs. estimation
  • Classification rule induction
  • Heuristics and rule quality evaluation
  • Descriptive rule induction
  • Predictive vs. Descriptive DM summary


19
Ruleset representation
  • Rule base is a disjunctive set of conjunctive
    rules
  • Standard form of rules: IF Condition THEN Class
  • Class IF Conditions
  • Class ← Conditions
  • Examples:
  • IF Outlook=Sunny ∧ Humidity=Normal THEN
    PlayTennis=Yes
  • IF Outlook=Overcast THEN PlayTennis=Yes
  • IF Outlook=Rain ∧ Wind=Weak THEN PlayTennis=Yes
  • Form of CN2 rules: IF Conditions THEN MajClass
    [ClassDistr]
  • Rule base: {R1, R2, R3, ..., DefaultRule}

20
Classification Rule Learning
  • Rule set representation
  • Two rule learning approaches
  • Learn decision tree, convert to rules
  • Learn set/list of rules
  • Learning an unordered set of rules
  • Learning an ordered list of rules
  • Heuristics, overfitting, pruning

21
Decision tree vs. rule learning: splitting vs.
covering

  • Splitting (ID3, C4.5, J48, See5)
  • Covering (AQ, CN2)





22
PlayTennis: Training examples
23
PlayTennis: Using a decision tree for
classification
(Decision tree: Outlook=Sunny → test Humidity (High: No, Normal: Yes);
Outlook=Overcast → Yes; Outlook=Rain → test Wind (Strong: No, Weak: Yes).)
Is Saturday morning OK for playing tennis?
Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong
PlayTennis = No, because Outlook=Sunny ∧ Humidity=High
24
PlayTennis: Converting a tree to rules
  • IF Outlook=Sunny ∧ Humidity=Normal THEN
    PlayTennis=Yes
  • IF Outlook=Overcast THEN PlayTennis=Yes
  • IF Outlook=Rain ∧ Wind=Weak THEN PlayTennis=Yes
  • IF Outlook=Sunny ∧ Humidity=High THEN
    PlayTennis=No
  • IF Outlook=Rain ∧ Wind=Strong THEN PlayTennis=No

25
Contact lens classification rules
  • tear production=reduced => lenses=NONE
    [S=0, H=0, N=12]
  • tear production=normal ∧ astigmatism=no =>
    lenses=SOFT [S=5, H=0, N=1]
  • tear production=normal ∧ astigmatism=yes ∧
    spect. pre.=myope => lenses=HARD [S=0, H=3, N=2]
  • tear production=normal ∧ astigmatism=yes ∧
    spect. pre.=hypermetrope => lenses=NONE
    [S=0, H=1, N=2]
  • DEFAULT lenses=NONE

26
Unordered rulesets
  • a rule Class IF Conditions is learned by first
    determining Class and then Conditions
  • NB: ordered sequence of classes C1, ..., Cn in
    RuleSet
  • But unordered (independent) execution of rules
    when classifying a new instance: all rules are
    tried, the predictions of those covering the
    example are collected, and voting is used to
    obtain the final classification
  • if no rule fires, the DefaultClass (majority
    class in E) is assigned
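To make the voting-based execution concrete, here is a minimal Python sketch (illustrative only, not CN2 code) that classifies an instance with an unordered rule set; rules are represented as hypothetical (condition, class distribution) pairs and the class counts are illustrative.

    # Minimal sketch: classify an instance with an unordered rule set by voting.
    # A rule is (condition, class_distribution); condition is a dict of attribute=value
    # tests, class_distribution counts the training examples of each class it covers.

    def covers(condition, instance):
        """A rule fires if every attribute=value test holds for the instance."""
        return all(instance.get(att) == val for att, val in condition.items())

    def classify_unordered(rules, instance, default_class):
        """Collect the class distributions of all firing rules and vote."""
        votes = {}
        for condition, class_dist in rules:
            if covers(condition, instance):
                for cls, count in class_dist.items():
                    votes[cls] = votes.get(cls, 0) + count
        if not votes:                      # no rule fires -> default (majority) class
            return default_class
        return max(votes, key=votes.get)

    # PlayTennis-style rules with illustrative class counts:
    rules = [({"Outlook": "Sunny", "Humidity": "Normal"}, {"Yes": 2}),
             ({"Outlook": "Overcast"}, {"Yes": 4}),
             ({"Outlook": "Rain", "Wind": "Weak"}, {"Yes": 3})]
    print(classify_unordered(rules, {"Outlook": "Overcast", "Wind": "Strong"}, "No"))  # Yes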

27
Contact lens decision list
  • Ordered (order-dependent) rules
  • IF tear production=reduced THEN lenses=NONE
  • ELSE /* tear production=normal */
  • IF astigmatism=no THEN lenses=SOFT
  • ELSE /* astigmatism=yes */
  • IF spect. pre.=myope THEN lenses=HARD
  • ELSE /* spect. pre.=hypermetrope */
  • lenses=NONE

28
Ordered set of rules: if-then-else decision lists
  • a rule Class IF Conditions is learned by first
    determining Conditions and then Class
  • Notice: mixed sequence of classes C1, ..., Cn in
    RuleBase
  • But ordered execution when classifying a new
    instance: rules are tried sequentially and the
    first rule that fires (covers the example) is
    used for classification
  • Decision list {R1, R2, R3, ..., D}: rules Ri are
    interpreted as if-then-else rules
  • If no rule fires, the DefaultClass (majority
    class in E_cur) is assigned

29
Original covering algorithm (AQ, Michalski
1969, 1986)
  • Basic covering algorithm:
  • for each class Ci do
  • Ei = Pi ∪ Ni (Pi positive, Ni negative examples)
  • RuleBase(Ci) = empty
  • repeat {learn-set-of-rules}
  • learn-one-rule: R covering some positive examples
    and no negatives
  • add R to RuleBase(Ci)
  • delete from Pi all positive examples covered by R
  • until Pi = empty
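The covering loop above can be sketched in a few lines of Python (an illustrative sketch, assuming a hypothetical learn_one_rule search such as the one discussed on the following slides and a covers test):

    # Sketch of the basic covering algorithm for one class: repeatedly learn a rule
    # covering some remaining positives (and no negatives), then remove the covered positives.

    def covering(positives, negatives, learn_one_rule, covers):
        """learn_one_rule(P, N) -> rule; covers(rule, example) -> bool."""
        rule_base, remaining = [], list(positives)
        while remaining:
            rule = learn_one_rule(remaining, negatives)
            newly_covered = [e for e in remaining if covers(rule, e)]
            if not newly_covered:            # safety guard: no progress, stop
                break
            rule_base.append(rule)
            remaining = [e for e in remaining if not covers(rule, e)]
        return rule_base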







30
Learning an unordered set of rules (CN2, Clark and
Niblett)
  • RuleBase = empty
  • for each class Ci do
  • Ei = Pi ∪ Ni, RuleSet(Ci) = empty
  • repeat {learn-set-of-rules}
  • R = (Class = Ci IF Conditions), Conditions = true
  • repeat {learn-one-rule} R = (Class = Ci IF
    Conditions AND Cond) (general-to-specific beam
    search for the best R)
  • until stopping criterion is satisfied (no
    negatives covered
    or Performance(R) < ThresholdR)
  • add R to RuleSet(Ci)
  • delete from Pi all positive examples covered by R
  • until stopping criterion is satisfied (all
    positives covered or Performance(RuleSet(Ci)) <
    ThresholdRS)
  • RuleBase = RuleBase ∪ RuleSet(Ci)

31
Learn-one-rule: greedy vs. beam search
  • learn-one-rule by greedy general-to-specific
    search: at each step select the best descendant,
    with no backtracking
  • beam search: maintain a list of k best candidates;
    at each step, descendants (specializations) of
    each of these k candidates are generated, and the
    resulting set is again reduced to the k best
    candidates
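A simplified sketch of learn-one-rule by general-to-specific beam search (illustrative; it assumes attribute=value conditions and uses relative frequency of the target class as the quality heuristic, whereas real learners use the heuristics discussed later):

    # Learn-one-rule by beam search: keep the k best rule bodies, specialise each by one
    # condition per step, and remember the best body seen overall.

    def covers(body, example):
        return all(example.get(a) == v for a, v in body.items())

    def relative_freq(body, examples, target):
        """Heuristic: relative frequency of the target class among covered examples."""
        covered = [e for e in examples if covers(body, e)]
        return (sum(e["class"] == target for e in covered) / len(covered)) if covered else 0.0

    def learn_one_rule(examples, attributes, target, beam_width=3, max_conds=3):
        """attributes: dict mapping attribute name -> list of possible values."""
        beam, best = [dict()], dict()               # most general rule: body 'true'
        for _ in range(max_conds):
            candidates = []
            for body in beam:
                for att, values in attributes.items():
                    if att in body:
                        continue
                    for val in values:              # specialise by adding one condition
                        candidates.append(dict(body, **{att: val}))
            if not candidates:
                break
            candidates.sort(key=lambda b: relative_freq(b, examples, target), reverse=True)
            beam = candidates[:beam_width]          # reduce to the k best candidates
            if relative_freq(beam[0], examples, target) > relative_freq(best, examples, target):
                best = beam[0]
        return best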

32
Illustrative example: Contact lenses data
33
Learn-one-rule as heuristic search
Lenses=hard IF true [S=5, H=4, N=15]
  candidate specializations:
  • Lenses=hard IF Astigmatism=no [S=5, H=0, N=7]
  • Lenses=hard IF Astigmatism=yes [S=0, H=4, N=8]
  • Lenses=hard IF Tear prod.=reduced [S=0, H=0, N=12]
  • Lenses=hard IF Tear prod.=normal [S=5, H=4, N=3]
  • ...
  specializations of Lenses=hard IF Tear prod.=normal:
  • Lenses=hard IF Tear prod.=normal AND
    Spect. pre.=myope [S=2, H=3, N=1]
  • Lenses=hard IF Tear prod.=normal AND
    Spect. pre.=hypermetrope [S=3, H=1, N=2]
  • Lenses=hard IF Tear prod.=normal AND
    Astigmatism=yes [S=0, H=4, N=2]
  • Lenses=hard IF Tear prod.=normal AND
    Astigmatism=no [S=5, H=0, N=1]
34
Rule learning summary
  • Hypothesis construction: find a set of n rules
  • usually simplified by n separate rule
    constructions
  • Rule construction: find a pair (Class, Cond)
  • select rule head (class) and construct rule body,
    or
  • construct rule body and assign rule head (in
    ordered algorithms)
  • Body construction: find a set of m features
  • usually simplified by adding one feature at a time
    to the rule body

35
Talk outline
  • Predictive vs. Descriptive DM
  • Predictive rule induction
  • Classification vs. estimation
  • Classification rule induction
  • Heuristics and rule quality evaluation
  • Descriptive rule induction
  • Predictive vs. Descriptive DM summary


36
Evaluating rules and rulesets
  • Predictive evaluation measures: maximizing
    accuracy, minimizing Error = 1 - Accuracy,
    avoiding overfitting
  • Estimating accuracy: percentage of correct
    classifications
  • on the training set
  • on unseen / testing instances
  • cross-validation, leave-one-out, ...
  • Other measures: comprehensibility (size),
    information content (information score),
    significance, ...
  • Other measures of rule interestingness for
    descriptive induction

37
n-fold cross-validation
  • A method for accuracy estimation of classifiers
  • Partition the set D into n disjoint, almost
    equally-sized folds Ti, where ∪i Ti = D
  • for i = 1, ..., n do
  • form a training set out of n-1 folds: Di = D \ Ti
  • induce classifier Hi from the examples in Di
  • use fold Ti for testing the accuracy of Hi
  • Estimate the accuracy of the classifier by
    averaging the accuracies over the n folds Ti
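A short sketch of the loop described above (illustrative; induce and accuracy stand for any classifier learner and any evaluation function):

    import random

    def cross_validation(data, n, induce, accuracy, seed=0):
        """Estimate accuracy by n-fold cross-validation.
        data: list of examples; induce(train) -> classifier; accuracy(clf, test) -> float."""
        examples = list(data)
        random.Random(seed).shuffle(examples)
        folds = [examples[i::n] for i in range(n)]       # n disjoint, nearly equal folds Ti
        scores = []
        for i in range(n):
            test = folds[i]                              # held-out fold Ti
            train = [e for j, f in enumerate(folds) if j != i for e in f]   # Di = D \ Ti
            scores.append(accuracy(induce(train), test))
        return sum(scores) / n                           # average accuracy over the n folds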

38
(Illustration over four slides: the data set D is partitioned into three
folds T1, T2, T3; in each iteration the classifier is trained on
Di = D \ Ti and tested on the held-out fold Ti.)
42
Overfitting and accuracy
  • Typical relation between hypothesis size and
    accuracy
  • Question: how to prune optimally?

43
Overfitting
  • Consider the error of hypothesis h over
  • the training data T: Error_T(h)
  • the entire distribution D of data: Error_D(h)
  • Hypothesis h ∈ H overfits training data T if
    there is an alternative hypothesis h' ∈ H such
    that
  • Error_T(h) < Error_T(h'), and
  • Error_D(h) > Error_D(h')
  • Prune a hypothesis (decision tree, ruleset) to
    avoid overfitting T

44
Avoiding overfitting
  • Decision trees
  • Pre-pruning (forward pruning): stop growing the
    tree, e.g., when a data split is not statistically
    significant or too few examples remain in a split
  • Post-pruning: grow the full tree, then post-prune
  • Rulesets
  • Pre-pruning (forward pruning): stop growing the
    rule, e.g., when too few examples are covered by
    the rule
  • Post-pruning: construct the full ruleset, then
    prune
45
Rule post-pruning (Quinlan 1993)
  • Very frequently used method, e.g., in C4.5
  • Procedure
  • grow a full tree (allowing overfitting)
  • convert the tree to an equivalent set of rules
  • prune each rule independently of others
  • sort final rules into a desired sequence for use

46
Performance metrics
  • Rule evaluation measures aimed at avoiding
    overfitting
  • Heuristics for guiding the search
  • Heuristics for stopping the search
  • Confusion matrix / contingency table for the
    evaluation of individual rules and of rulesets
  • Area under the ROC curve (employing the
    confusion matrix information)

47
Learn-one-rule: PlayTennis training examples
48
Learn-one-rule as search: PlayTennis example
PlayTennis = yes IF true
  candidate specializations:
  • PlayTennis = yes IF Wind=weak
  • PlayTennis = yes IF Wind=strong
  • PlayTennis = yes IF Humidity=high
  • PlayTennis = yes IF Humidity=normal
  • ...
  specializations of PlayTennis = yes IF Humidity=normal:
  • PlayTennis = yes IF Humidity=normal AND Wind=weak
  • PlayTennis = yes IF Humidity=normal AND Wind=strong
  • PlayTennis = yes IF Humidity=normal AND Outlook=rain
  • PlayTennis = yes IF Humidity=normal AND Outlook=sunny
49
Learn-one-rule as heuristic search: PlayTennis
example
PlayTennis = yes IF true [9+, 5-] (14)
  candidate specializations:
  • PlayTennis = yes IF Wind=weak [6+, 2-] (8)
  • PlayTennis = yes IF Wind=strong [3+, 3-] (6)
  • PlayTennis = yes IF Humidity=high [3+, 4-] (7)
  • PlayTennis = yes IF Humidity=normal [6+, 1-] (7)
  • ...
  specializations of PlayTennis = yes IF Humidity=normal:
  • PlayTennis = yes IF Humidity=normal AND Wind=weak
  • PlayTennis = yes IF Humidity=normal AND Wind=strong
  • PlayTennis = yes IF Humidity=normal AND Outlook=rain
  • PlayTennis = yes IF Humidity=normal AND
    Outlook=sunny [2+, 0-] (2)
50
Heuristics for learn-one-rule: PlayTennis
example
  • PlayTennis = yes [9+, 5-] (14)
  • PlayTennis = yes ← Wind=weak [6+, 2-] (8)
    ← Wind=strong [3+, 3-] (6)
    ← Humidity=normal [6+, 1-] (7)
  • PlayTennis = yes ← Humidity=normal ∧
    Outlook=sunny [2+, 0-] (2)
  • Estimating accuracy with a probability:
  • A(Ci ← Cond) = p(Ci | Cond)
  • Estimating probability with relative frequency:
    covered positive examples / all covered examples
  • [6+, 1-] (7): 6/7 = 0.86; [2+, 0-] (2): 2/2 = 1

51
Probability estimates
  • Relative frequency of covered positive examples:
    p(Class|Cond) = n(Class.Cond) / n(Cond)
  • problems with small samples
  • Laplace estimate:
    p(Class|Cond) = (n(Class.Cond) + 1) / (n(Cond) + k)
  • assumes a uniform prior distribution over the k
    classes
  • m-estimate:
    p(Class|Cond) = (n(Class.Cond) + m·pa(Class)) /
    (n(Cond) + m)
  • special case pa(Class) = 1/k, m = k gives the
    Laplace estimate
  • takes into account prior probabilities pa(Class)
    instead of the uniform distribution
  • independent of the number of classes k
  • m is domain dependent (more noise, larger m)
52
Learn-one-rule: search heuristics
  • Assume two classes (+, -) and learn rules for the
    + class (Cl). Search for specializations R' of one
    rule R = Cl ← Cond from RuleBase.
  • Expected classification accuracy: A(R) =
    p(Cl|Cond)
  • Informativity (information needed to specify that
    an example covered by Cond belongs to Cl): I(R) =
    -log2 p(Cl|Cond)
  • Accuracy gain (increase in expected accuracy):
  • AG(R',R) = p(Cl|Cond') - p(Cl|Cond)
  • Information gain (decrease in the information
    needed):
  • IG(R',R) = log2 p(Cl|Cond') - log2 p(Cl|Cond)
  • Weighted measures favoring more general rules:
    WAG, WIG
  • WAG(R',R) =
    p(Cond')/p(Cond) · (p(Cl|Cond') - p(Cl|Cond))
  • Weighted relative accuracy trades off coverage
    and relative accuracy: WRAcc(R) = p(Cond) ·
    (p(Cl|Cond) - pa(Cl))

53
What is high accuracy?
  • Rule accuracy should be traded off against the
    default accuracy of the rule Cl ← true
  • 68% accuracy is OK if there are 20% examples of
    that class in the training set, but bad if there
    are 80%
  • Relative accuracy:
  • RAcc(Cl ← Cond) = p(Cl | Cond) - p(Cl)

54
Weighted relative accuracy
  • If a rule covers a single example, its accuracy
    is either 0% or 100%
  • maximizing relative accuracy tends to produce
    many overly specific rules
  • Weighted relative accuracy:
  • WRAcc(Cl ← Cond) =
    p(Cond) · (p(Cl | Cond) - p(Cl))
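Computed from the counts of a single rule (n covered examples, n_pos of them from the target class, out of N examples with N_pos positives), the trade-off is easy to see; a minimal sketch using the PlayTennis counts from the earlier slides as illustration:

    def relative_accuracy(n_pos, n, N_pos, N):
        """RAcc(Cl <- Cond) = p(Cl|Cond) - p(Cl)."""
        return n_pos / n - N_pos / N

    def weighted_relative_accuracy(n_pos, n, N_pos, N):
        """WRAcc(Cl <- Cond) = p(Cond) * (p(Cl|Cond) - p(Cl))."""
        return (n / N) * (n_pos / n - N_pos / N)

    # A rule covering a single positive example has high relative accuracy but tiny WRAcc:
    print(relative_accuracy(1, 1, 9, 14))             # 1 - 9/14          ≈ 0.357
    print(weighted_relative_accuracy(1, 1, 9, 14))    # (1/14) * 0.357    ≈ 0.026
    print(weighted_relative_accuracy(6, 7, 9, 14))    # (7/14) * (6/7 - 9/14) ≈ 0.107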

55
Weighted relative accuracy
  • WRAcc is a fundamental rule evaluation measure
  • WRAcc can be used if you want to assess both
    accuracy and significance
  • WRAcc can be used if you want to compare rules
    with different heads and bodies - appropriate
    measure for use in descriptive induction, e.g.,
    association rule learning

56
Talk outline
  • Predictive vs. Descriptive DM
  • Predictive rule induction
  • Classification vs. estimation
  • Classification rule induction
  • Heuristics and rule quality evaluation
  • Descriptive rule induction
  • Subgroup discovery
  • Association rule learning
  • Predictive vs. Descriptive DM summary


57
Descriptive DM
  • Often used for preliminary data analysis
  • The user gets a feel for the data and its structure
  • Aims at deriving descriptions of characteristics
    of the data
  • Visualization and descriptive statistical
    techniques can be used

58
Descriptive DM
  • Description
  • Data description and summarization: describe
    elementary and aggregated data characteristics
    (statistics, ...)
  • Dependency analysis:
  • describe associations, dependencies, ...
  • discovery of properties and constraints
  • Segmentation
  • Clustering: separate objects into subsets
    according to distance and/or similarity
    (clustering, SOM, visualization, ...)
  • Subgroup discovery: find unusual subgroups that
    are significantly different from the majority
    (deviation detection w.r.t. the overall class
    distribution)

59
Subgroup Discovery
  • Given: a population of individuals and a property
    of individuals that we are interested in
  • Find: population subgroups that are statistically
    most interesting, e.g., that are as large as
    possible and have the most unusual statistical
    (distributional) characteristics w.r.t. the
    property of interest

60
Subgroup interestingness
  • Interestingness criteria
  • As large as possible
  • Class distribution as different as possible from
    the distribution in the entire data set
  • Significant
  • Surprising to the user
  • Non-redundant
  • Simple
  • Useful - actionable

61
Classification Rule Learning for Subgroup
Discovery: Deficiencies
  • Only the first few rules induced by the covering
    algorithm have sufficient support (coverage)
  • Subsequent rules are induced from smaller and
    strongly biased example subsets (positive examples
    not covered by previously induced rules), which
    hinders their ability to detect population
    subgroups
  • Ordered rules are induced and interpreted
    sequentially as an if-then-else decision list

62
CN2-SD: Adapting CN2 Rule Learning to Subgroup
Discovery
  • Weighted covering algorithm
  • Weighted relative accuracy (WRAcc) search
    heuristics, with added example weights
  • Probabilistic classification
  • Evaluation with different interestingness measures

63
CN2-SD: CN2 Adaptations
  • General-to-specific search (beam search) for the
    best rules
  • Rule quality measure:
  • CN2: Laplace: Acc(Class ← Cond) =
    p(Class|Cond) = (nc + 1) / (nrule + k)
  • CN2-SD: Weighted Relative Accuracy:
  • WRAcc(Class ← Cond) =
    p(Cond) · (p(Class|Cond) - p(Class))
  • Weighted covering approach (example weights)
  • Significance testing (likelihood ratio statistic)
  • Output: unordered rule sets (probabilistic
    classification)

64
CN2-SD: Weighted Covering
  • Standard covering approach:
  • covered examples are deleted from the current
    training set
  • Weighted covering approach:
  • weights are assigned to examples
  • covered positive examples are re-weighted: in
    all covering loop iterations, store a count i of
    how many times (i.e., by how many of the rules
    induced so far) a positive example has been
    covered: w(e,i), with w(e,0) = 1







65
CN2-SD: Weighted Covering
  • Additive weights: w(e,i) = 1/(i+1)
  • w(e,i) is the weight of a positive example e that
    has been covered i times
  • Multiplicative weights: w(e,i) = γ^i, 0 < γ < 1
  • note: γ = 1 → the same (first) rule is found
    again and again;
    γ = 0 → behaves like standard CN2
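The two weighting schemes can be written down directly; a small sketch of how the weights would be recomputed inside the weighted covering loop (illustrative, not the original CN2-SD implementation):

    def additive_weight(i):
        """w(e, i) = 1 / (i + 1): weight of a positive example covered i times so far."""
        return 1.0 / (i + 1)

    def multiplicative_weight(i, gamma=0.7):
        """w(e, i) = gamma ** i, 0 < gamma < 1 (gamma=1 keeps refinding the same rule;
        gamma=0 gives weight 0 to covered examples, i.e. behaves like standard covering)."""
        return gamma ** i

    # Instead of deleting covered positives, increment their cover count and re-weight them:
    def update_weights(cover_count, covered_ids, weight_fn=additive_weight):
        for e in covered_ids:
            cover_count[e] = cover_count.get(e, 0) + 1
        return {e: weight_fn(i) for e, i in cover_count.items()}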







66
CN2-SD: Weighted WRAcc Search Heuristic
  • Weighted relative accuracy (WRAcc) search
    heuristic, with added example weights
  • WRAcc(Cl ← Cond) = p(Cond) · (p(Cl|Cond) - p(Cl))
  • increased coverage, decreased number of rules,
    approximately equal accuracy (PKDD-2000)

67
CN2-SD: Weighted WRAcc Search Heuristic
  • In the WRAcc computation, probabilities are
    estimated with relative frequencies, adapted to
    example weights:
  • WRAcc(Cl ← Cond) = p(Cond) · (p(Cl|Cond) - p(Cl)) =
    n(Cond)/N · (n(Cl.Cond)/n(Cond) - n(Cl)/N)
  • N : sum of the weights of all examples
  • n(Cond) : sum of the weights of all covered
    examples
  • n(Cl.Cond) : sum of the weights of all correctly
    covered examples

68
Probabilistic classification
  • Unlike the ordered case of standard CN2 where
    rules are interpreted in an IF-THEN-ELSE fashion,
    in the unordered case and in CN2-SD all rules are
    tried and all rules which fire are collected
  • If a clash occurs, a probabilistic method is used
    to resolve the clash

69
Probabilistic classification
  • A simplified example (class distributions given
    as [bird, elephant] counts):
  • class=bird ← legs=2 ∧ feathers=yes [13, 0]
  • class=elephant ← size=large ∧ flies=no [2, 10]
  • class=bird ← beak=yes [20, 0]
  • summed distribution: [35, 10]
  • Two-legged, feathered, large,
    non-flying animal with a beak?
  • → bird!
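The clash above is resolved by summing the class distributions of all firing rules; a minimal sketch reproducing the numbers from this slide:

    # Sum the class distributions of the three rules that fire for the query animal.
    fired_distributions = [
        {"bird": 13, "elephant": 0},    # class=bird     <- legs=2 and feathers=yes
        {"bird": 2,  "elephant": 10},   # class=elephant <- size=large and flies=no
        {"bird": 20, "elephant": 0},    # class=bird     <- beak=yes
    ]
    totals = {}
    for dist in fired_distributions:
        for cls, count in dist.items():
            totals[cls] = totals.get(cls, 0) + count
    print(totals)                        # {'bird': 35, 'elephant': 10}
    print(max(totals, key=totals.get))   # bird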

70
Talk outline
  • Predictive vs. Descriptive DM
  • Predictive rule induction
  • Classification vs. estimation
  • Classification rule induction
  • Heuristics and rule quality evaluation
  • Descriptive rule induction
  • Subgroup discovery
  • Association rule learning
  • Predictive vs. Descriptive DM summary


71
Association Rule Learning
  • Rules: X => Y, i.e., if X then Y
  • X, Y: itemsets (conjunctions of items), where
    items/features are binary-valued attributes
  • (Table: transactions t1, t2, ... represented as
    binary vectors over items i1, i2, ..., i50,
    e.g., t1 = 1 1 1 ... 0)
  • Example: market basket analysis
  • peanuts ∧ chips => beer ∧ coke (0.05, 0.65)
  • Support: Sup(X,Y) = |XY| / |D| = p(XY)
  • Confidence: Conf(X,Y) = |XY| / |X| =
    Sup(X,Y) / Sup(X) = p(XY)/p(X) = p(Y|X)
72
Association Rule Learning
  • Given a set of transactions D
  • Find all association rules that hold on the set
    of transactions and have support > MinSup and
    confidence > MinConf
  • Procedure:
  • find all large itemsets Z with Sup(Z) > MinSup
  • split every large itemset Z into X and Y (Z = XY),
    compute Conf(X,Y) = Sup(X,Y)/Sup(X); if
    Conf(X,Y) > MinConf then output X => Y
    (Sup(X,Y) > MinSup, as XY is large)
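The rule-generation step of this procedure can be sketched as follows (illustrative; it assumes the large itemsets have already been found by the level-wise Apriori search):

    from itertools import combinations

    def support(itemset, transactions):
        return sum(itemset <= t for t in transactions) / len(transactions)

    def rules_from_large_itemsets(large_itemsets, transactions, min_conf):
        """Split every large itemset Z into X and Y = Z - X; keep rules with Conf(X,Y) >= MinConf.
        Sup(X,Y) >= MinSup holds automatically because Z is large."""
        rules = []
        for Z in map(frozenset, large_itemsets):
            for r in range(1, len(Z)):
                for X in map(frozenset, combinations(Z, r)):
                    Y = Z - X
                    conf = support(Z, transactions) / support(X, transactions)
                    if conf >= min_conf:
                        rules.append((X, Y, conf))     # rule X => Y
        return rules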

73
Induced association rules
  • Age ? 52 ∧ BigSpender = no =>
    Gender = male
  • Age ? 52 ∧ BigSpender = no =>
    Gender = male ∧ Income ? 73250
  • Gender = male ∧ Age ? 52 ∧ Income ? 73250 =>
    BigSpender = no
  • ...

74
Association Rule Learning for Classification:
APRIORI-C
  • Simplified APRIORI-C:
  • Discretise numeric attributes; for each discrete
    attribute with N values, create N items
  • Run APRIORI
  • Collect rules whose right-hand side consists of a
    single target item, representing a value of the
    target attribute

75
Association Rule Learning for Classification:
APRIORI-C
  • Improvements:
  • Creating rules Class ← Conditions during search
  • Pruning of irrelevant items and itemsets
  • Pre-processing: feature subset selection
  • Post-processing: rule subset selection

76
Association Rule Learning for Subgroup Discovery:
Advantages
  • May be used to create rules of the form
  • Class ← Conditions
  • Each rule is an independent chunk of knowledge,
    with
  • high support and coverage (p(Class.Cond) >
    MinSup, p(Cond) > MinSup)
  • high confidence: p(Class|Cond) > MinConf
  • all interesting rules found (complete search)
  • Building small and easy-to-understand classifiers
  • Appropriate for unbalanced class distributions

77
Association Rule Learning for Subgroup Discovery:
APRIORI-SD
  • Further improvements:
  • Create a set of rules Class ← Conditions with
    APRIORI-C; advantage: an exhaustive set of rules
    above the MinConf and MinSup thresholds
  • Order the set of induced rules by decreasing
    WRAcc
  • Post-processing: rule subset selection by a
    weighted covering approach
  • Take the best rule w.r.t. WRAcc
  • Decrease the weights of covered examples
  • Reorder the remaining rules and repeat until a
    stopping criterion is satisfied
  • significance threshold
  • WRAcc threshold

78
Talk outline
  • Predictive vs. Descriptive DM
  • Predictive rule induction
  • Classification vs. estimation
  • Classification rule induction
  • Heuristics and rule quality evaluation
  • Descriptive rule induction
  • Predictive vs. Descriptive DM summary


79
Predictive vs. descriptive induction: Summary
  • Predictive induction: induces rulesets acting as
    classifiers for solving classification and
    prediction tasks
  • Rules are induced from labeled instances
  • Descriptive induction: discovers individual rules
    describing interesting regularities in the data
  • Rules are induced from unlabeled instances
  • Exception: subgroup discovery
  • Discovers individual rules describing
    interesting regularities in data induced from
    labeled examples

80
Rule induction: Literature
  • P. Flach and N. Lavrac: Rule Induction,
    chapter in the book Intelligent Data Analysis,
    Springer, edited by M. Berthold and D. Hand
  • See the references to other sources in this book
    chapter