Diversity in Random Subspacing Ensembles


1
Diversity in Random Subspacing Ensembles
DaWaK 2004, Zaragoza, Spain, September 1-3, 2004
  • Alexey Tsymbal, Padraig Cunningham
    Department of Computer Science, Trinity College Dublin, Ireland
  • Mykola Pechenizkiy
    Department of Computer Science, University of Jyväskylä, Finland

2
Contents
  • Introduction: the task of classification
  • Introduction to ensembles
  • Ensemble feature selection and random subspacing
  • Integration methods for ensembles
  • Measures of diversity in classification ensembles
  • Experimental results: correlation between the
    diversities and improvement due to ensembles
  • Conclusions and future work

3
The task of classification
J classes, n training observations, p features
[Figure: a new instance and the training set feed a CLASSIFICATION procedure, which outputs the class membership of the new instance]
  • Examples: prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, heart attack prediction, etc.
4
What is ensemble learning?
  • Ensemble learning refers to a collection of
    methods that learn a target function by training
    a number of individual learners and combining
    their predictions

5
Ensemble learning
[Figure: schematic of ensemble learning — several base classifiers are trained on the data and their predictions are combined into a single ensemble prediction]
6
Why ensemble learning?
  • Accuracy: a more reliable mapping can be
    obtained by combining the output of multiple
    experts
  • Efficiency: a complex problem can be decomposed
    into multiple sub-problems that are easier to
    understand and solve (divide-and-conquer
    approach). Mixture of experts, ensemble feature
    selection.
  • There is no single model that works for all
    pattern recognition problems! (no free lunch
    theorem)

"To solve really hard problems, we'll have to use
several different representations. It is time
to stop arguing over which type of
pattern-classification technique is best.
Instead we should work at a higher level of
organization and discover how to build managerial
systems to exploit the different virtues and
evade the different limitations of each of these
ways of comparing things." (Minsky, 1991)
7
When ensemble learning?
  • When you can build base classifiers that are more
    accurate than chance (accuracy), and, more
    importantly,
  • that are as independent from each other as
    possible (diversity)

8
Why do ensembles work? 1/2
  • The desired target function may not be
    implementable with individual classifiers, but
    may be approximated by ensemble averaging
  • Assume you want to build a decision boundary
    with decision trees
  • The decision boundaries of decision trees are
    hyperplanes parallel to the coordinate axes, as
    in the figures
  • By averaging a large number of such staircases,
    the diagonal decision boundary can be
    approximated with arbitrarily good accuracy

[Figures a and b: two plots of Class 1 / Class 2 regions whose true diagonal boundary is approximated by axis-parallel "staircase" decision boundaries]
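The averaging argument can be sketched in a few lines of Python (an illustration of my own, not from the slides). Each base classifier is a single axis-parallel split at a random threshold in [0, 1]; individually each draws a crude staircase, but the majority vote over many of them recovers the diagonal boundary x1 + x2 > 1, since the expected vote fraction at (x, y) is (x + y) / 2.

```python
import random

random.seed(0)

# Each base "classifier" is one axis-parallel split: predict class 1
# when the chosen coordinate exceeds a random threshold in [0, 1].
def make_stump():
    axis = random.randrange(2)
    t = random.random()
    return lambda p, axis=axis, t=t: 1 if p[axis] > t else 0

ensemble = [make_stump() for _ in range(5000)]

def majority_vote(point):
    votes = sum(clf(point) for clf in ensemble)
    return 1 if votes > len(ensemble) / 2 else 0

# Expected vote fraction at (x, y) is (x + y) / 2, so the majority
# vote approximates the diagonal boundary x + y > 1.
print(majority_vote((0.9, 0.8)))  # well above the diagonal -> 1
print(majority_vote((0.2, 0.3)))  # well below the diagonal -> 0
```

No individual stump can represent the diagonal; only the aggregate can, which is the point of the slide.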
9
Why do ensembles work? 2/2
  • Theoretical results by Hansen & Salamon (1990)
  • If we can assume that classifiers are independent
    in their predictions and their accuracy is above
    50%, we can push ensemble accuracy arbitrarily
    high by combining more classifiers
  • Key assumption
  • classifiers are independent in their predictions
  • not a very reasonable assumption in practice
  • more realistically: for data points where
    classifiers predict with above-50% accuracy, we
    can push accuracy arbitrarily high (some data
    points are just too difficult)
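The Hansen & Salamon result follows from the binomial distribution; a small sketch (my own illustration) computes the majority-vote accuracy of n independent classifiers, each correct with probability p:

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a majority of n independent classifiers,
    each correct with probability p, votes for the right class."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# For p > 0.5, accuracy climbs toward 1 as the ensemble grows.
for n in (1, 11, 101):
    print(n, round(majority_accuracy(0.6, n), 3))
```

With p = 0.5 the ensemble gains nothing, and with p < 0.5 it gets worse — hence the "better than chance" requirement on base classifiers.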

10
How to make an effective ensemble?
  • Two basic decisions when designing ensembles
  • How to generate the base classifiers?
  • How to integrate them?

11
Methods for generating the base classifiers
  • Subsampling the training examples
  • Manipulating the input features
  • Manipulating the output targets
  • Modifying the learning parameters of the
    classifier
  • Using heterogeneous models (not often used)

12
Ensemble Feature Selection and RSM
  • How to prepare inputs for the generation of the
    base classifiers ?
  • Sample the training set
  • Manipulate input features
  • Manipulate output target (class values)
  • Goal of traditional feature selection: find and
    remove features that are unhelpful or destructive
    to learning, making one feature subset for a
    single classifier
  • Goal of ensemble feature selection: find and
    remove features that are unhelpful or destructive
    to learning, making different feature subsets for
    a number of classifiers
  • find feature subsets that will promote
    disagreement between the classifiers
  • Random Subspace Method (RSM)
  • Accuracy of ensemble members is compensated for
    by their diversity
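A minimal sketch of random subspacing (an illustration; the function names are my own, not the authors' code): each ensemble member is assigned a random subset of the features and would be trained only on that projected view of the data.

```python
import random

random.seed(1)

def random_subspaces(n_features, n_members, subspace_size):
    """Draw one random feature subset per ensemble member (RSM)."""
    return [sorted(random.sample(range(n_features), subspace_size))
            for _ in range(n_members)]

def project(instance, subset):
    """Restrict an instance to the member's feature subset."""
    return [instance[i] for i in subset]

subsets = random_subspaces(n_features=10, n_members=5, subspace_size=5)
x = list(range(10))  # a toy instance with 10 feature values
for s in subsets:
    print(s, project(x, s))
```

Because the subsets differ, the members see different views of the data and disagree more often, which is exactly the diversity RSM trades accuracy for.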

13
Integration of classifiers
[Figure: taxonomy of integration methods]
  • Integration splits into Selection and
    Combination, each either Static or Dynamic
  • Selection: Static Selection (CVM), Dynamic
    Selection (DS)
  • Combination: Weighted Voting (WV, static),
    Dynamic Voting (DV), Dynamic Voting with
    Selection (DVS)
  • Motivation for the Dynamic Integration
  • The main assumption is that each classifier is
    the best in some sub-areas of the whole data set,
    where its local error is comparatively less than
    the corresponding errors of the other classifiers.

14
The space model motivation for dynamic
integration
  • Information about classifiers errors on training
    instances can be used for learning just as
    original instances are used for learning.

15
Dynamic integration of classifiers an example
  • 3 base classifiers
  • 2 features X1 and X2

[Figure: feature space (X1, X2) around a test point P; its nearest neighbours NN1, NN2, NN3 lie at distances d1, d2, d3 within a radius d_max, each annotated with the error vector of the three base classifiers, e.g. (0 0 0), (1 1 0), (0.3 0.6 0)]
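The figure's idea can be sketched as a toy Dynamic Selection (DS) procedure (the training records and helper names below are hypothetical, chosen only to illustrate the mechanism): each base classifier's recorded errors on the k nearest training neighbours of the test point estimate its local error, and the locally best classifier is selected.

```python
import math

# Hypothetical training log: for each training instance we store its
# features and each base classifier's error (0 = correct, 1 = wrong).
train = [
    ((0.1, 0.2), (0, 1, 0)),
    ((0.2, 0.1), (0, 1, 1)),
    ((0.8, 0.9), (1, 0, 0)),
    ((0.9, 0.8), (1, 0, 1)),
]

def dynamic_selection(x, k=2):
    """Pick the classifier with the lowest total error among the
    k nearest training neighbours of x (Dynamic Selection, DS)."""
    nearest = sorted(train, key=lambda t: math.dist(x, t[0]))[:k]
    n_clf = len(train[0][1])
    local_err = [sum(errs[j] for _, errs in nearest)
                 for j in range(n_clf)]
    return local_err.index(min(local_err))

print(dynamic_selection((0.15, 0.15)))  # classifier 0 is locally best -> 0
print(dynamic_selection((0.85, 0.85)))  # classifier 1 is locally best -> 1
```

DV and DVS reuse the same local-error estimates, weighting the classifiers' votes rather than picking a single winner.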
16
Ensembles the need for diversity
  • Overall error depends on average error of
    ensemble members
  • Increasing ambiguity decreases overall error
  • Provided it does not result in an increase in
    average error
  • (Krogh and Vedelsby, 1995)
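The Krogh & Vedelsby decomposition can be checked numerically (a sketch with made-up predictions): for a weighted-average regression ensemble, the ensemble's squared error equals the weighted average member error minus the ambiguity, so raising ambiguity without raising average error must lower overall error.

```python
# Krogh & Vedelsby (1995): for f_bar = sum_i w_i f_i with convex w,
#   (f_bar - y)^2 = sum_i w_i (f_i - y)^2 - sum_i w_i (f_i - f_bar)^2
# i.e. ensemble error = average member error - ambiguity.
preds = [1.0, 2.0, 4.0]   # member predictions f_i (made up)
w = [1/3, 1/3, 1/3]       # convex weights
y = 2.5                   # true target

f_bar = sum(wi * fi for wi, fi in zip(w, preds))
ens_err = (f_bar - y) ** 2
avg_err = sum(wi * (fi - y) ** 2 for wi, fi in zip(w, preds))
ambiguity = sum(wi * (fi - f_bar) ** 2 for wi, fi in zip(w, preds))

print(round(ens_err, 6), round(avg_err - ambiguity, 6))  # identical
```

Since the ambiguity term is a weighted variance, it is always non-negative: the ensemble is never worse than the average of its members.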

17
Measuring ensemble diversity 1/4
  • In regression, the ensemble ambiguity is
    measured as the weighted average of the squared
    differences between the predictions of the base
    networks and the ensemble prediction
  • The case of classification: 1) plain
    disagreement, and 2) fail/non-fail disagreement
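Assuming the usual definitions (a sketch of my own, with made-up predictions): plain disagreement is the fraction of instances on which two classifiers output different labels, and fail/non-fail disagreement is the fraction on which exactly one of the two is wrong.

```python
def plain_disagreement(pred_i, pred_j):
    """Fraction of instances on which the two classifiers disagree."""
    return sum(a != b for a, b in zip(pred_i, pred_j)) / len(pred_i)

def fail_disagreement(pred_i, pred_j, y):
    """Fraction of instances where exactly one of the two is wrong."""
    return sum((a == t) != (b == t)
               for a, b, t in zip(pred_i, pred_j, y)) / len(y)

y      = [0, 0, 1, 1, 1]   # true labels (made up)
pred_i = [0, 1, 1, 1, 0]
pred_j = [0, 0, 1, 0, 0]
print(plain_disagreement(pred_i, pred_j))   # -> 0.4
print(fail_disagreement(pred_i, pred_j, y)) # -> 0.4
```

Plain disagreement needs no labels, while fail/non-fail disagreement uses the "oracle" outputs (correct/incorrect), as do the pairwise measures on the next slides.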

18
Measuring ensemble diversity 2/4
  • The case of classification: 3) double fault, and
    4) Q statistic
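Both measures are usually defined over the 2x2 table of oracle outputs for a classifier pair; a sketch under that assumption (made-up predictions, my own function names):

```python
def pairwise_counts(pred_i, pred_j, y):
    """2x2 oracle counts: n[a][b] = number of instances where i is a
    and j is b (1 = correct, 0 = wrong)."""
    n = [[0, 0], [0, 0]]
    for a, b, t in zip(pred_i, pred_j, y):
        n[int(a == t)][int(b == t)] += 1
    return n

def double_fault(pred_i, pred_j, y):
    # Fraction of instances both classifiers get wrong.
    n = pairwise_counts(pred_i, pred_j, y)
    return n[0][0] / len(y)

def q_statistic(pred_i, pred_j, y):
    # Q = (N11*N00 - N10*N01) / (N11*N00 + N10*N01), in [-1, 1].
    n = pairwise_counts(pred_i, pred_j, y)
    num = n[1][1] * n[0][0] - n[1][0] * n[0][1]
    den = n[1][1] * n[0][0] + n[1][0] * n[0][1]
    return num / den if den else 0.0

y      = [0, 0, 1, 1, 1]
pred_i = [0, 1, 1, 1, 0]
pred_j = [0, 0, 1, 0, 0]
print(double_fault(pred_i, pred_j, y))  # -> 0.2
print(round(q_statistic(pred_i, pred_j, y), 3))
```

Q is negative for classifiers that tend to err on different instances (high diversity) and positive when their errors coincide.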

19
Measuring ensemble diversity 3/4
  • The case of classification: 5) correlation
    coefficient, and 6) kappa statistic
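These two pairwise measures can also be written over the oracle counts; the formulas below follow the common oracle-output formulation, which may differ in detail from the slides' own equations (made-up predictions):

```python
from math import sqrt

def oracle_counts(pred_i, pred_j, y):
    # n11: both correct, n00: both wrong, n10/n01: exactly one correct
    n11 = n10 = n01 = n00 = 0
    for a, b, t in zip(pred_i, pred_j, y):
        ca, cb = a == t, b == t
        if ca and cb: n11 += 1
        elif ca:      n10 += 1
        elif cb:      n01 += 1
        else:         n00 += 1
    return n11, n10, n01, n00

def correlation(pred_i, pred_j, y):
    n11, n10, n01, n00 = oracle_counts(pred_i, pred_j, y)
    den = sqrt((n11+n10) * (n01+n00) * (n11+n01) * (n10+n00))
    return (n11*n00 - n01*n10) / den if den else 0.0

def kappa(pred_i, pred_j, y):
    n11, n10, n01, n00 = oracle_counts(pred_i, pred_j, y)
    num = 2 * (n11*n00 - n01*n10)
    den = (n11+n10) * (n01+n00) + (n11+n01) * (n10+n00)
    return num / den if den else 0.0

y      = [0, 0, 1, 1, 1]
pred_i = [0, 1, 1, 1, 0]
pred_j = [0, 0, 1, 0, 0]
print(round(correlation(pred_i, pred_j, y), 3))
print(round(kappa(pred_i, pred_j, y), 3))
```

Both share the numerator pattern N11*N00 - N01*N10 with Q but normalize differently; for all of them, lower values mean higher diversity.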

20
Measuring ensemble diversity 4/4
  • The case of classification: 7) entropy, and 8)
    variance
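Unlike the previous measures, entropy and variance are non-pairwise: they are computed per instance over the whole ensemble's vote distribution. One common entropy variant (which may differ in detail from the slides' formula) averages the entropy of the class-vote distribution over instances:

```python
from math import log

def vote_entropy(all_preds):
    """Average per-instance entropy of the class-vote distribution.
    all_preds[i][k] is the prediction of classifier i on instance k."""
    n_clf, n_inst = len(all_preds), len(all_preds[0])
    total = 0.0
    for k in range(n_inst):
        votes = {}
        for i in range(n_clf):
            c = all_preds[i][k]
            votes[c] = votes.get(c, 0) + 1
        total += -sum((v / n_clf) * log(v / n_clf)
                      for v in votes.values())
    return total / n_inst

preds = [          # 3 classifiers x 4 instances (made up)
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
print(round(vote_entropy(preds), 3))
```

The measure is zero when all members agree on every instance and grows as the votes spread over more classes.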

21
Experimental investigations
  • 21 datasets from the UCI Machine Learning
    Repository
  • 70 test runs of Monte-Carlo cross-validation
  • 70/30 train/test set division
  • 5 different ensemble sizes: 5, 10, 25, 50, and
    100
  • 6 integration methods: Static Selection (SS),
    Simple Voting (V), Weighted Voting (WV), Dynamic
    Selection (DS), Dynamic Voting (DV), and Dynamic
    Voting with Selection (DVS)
  • the test environment of the previously developed
    ESF_SBC algorithm (Ensemble Feature Selection
    with Simple Bayesian Classification) was used
  • the objective was to measure the correlations
    between the diversity measures and the
    improvement due to ensembles
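The evaluation protocol above amounts to repeated random 70/30 splitting (Monte-Carlo cross-validation); a minimal sketch, not the authors' test environment:

```python
import random

random.seed(42)

def monte_carlo_cv(data, n_runs=70, train_frac=0.7):
    """Yield (train, test) splits by repeated random division,
    e.g. 70 runs of a 70/30 split as in the experiments above."""
    n_train = int(len(data) * train_frac)
    for _ in range(n_runs):
        shuffled = random.sample(data, len(data))
        yield shuffled[:n_train], shuffled[n_train:]

data = list(range(100))  # stand-in for a dataset's instance indices
for train, test in monte_carlo_cv(data, n_runs=3):
    print(len(train), len(test))  # -> 70 30
```

Unlike k-fold cross-validation, the test sets of different runs may overlap, which is why many runs (here 70) are used to stabilize the estimates.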

22
Experimental results 1/2
Fig. 1. The correlations for the eight
diversities and five ensemble sizes averaged over
the data sets and integration methods
23
Experimental results 2/2
Fig. 2. The correlations for the eight
diversities and six integration methods averaged
over the data sets and ensemble sizes
24
Conclusions
  • we have considered 8 ensemble diversity metrics,
    6 of which are pairwise measures
  • to check the goodness of each measure of
    diversity, we calculated its correlation with the
    improvement in the classification accuracy due to
    ensembles
  • the best correlations were shown by div_plain,
    div_dis, div_ent, and div_amb
  • surprisingly, div_DF and div_Q had the worst
    average correlation
  • the correlations changed with the change of the
    integration method, showing that the integration
    methods make different use of diversity
  • the best correlations were shown with dynamic
    integration, and DV in particular
  • the correlations decreased almost linearly with
    the increase in the ensemble size
  • other contexts, such as different ensemble
    generation strategies and integration methods,
    can be tried in the future

25
Contact info
  • Alexey Tsymbal, Padraig Cunningham
  • Department of Computer Science
  • Trinity College Dublin, Ireland
  • Alexey.Tsymbal@cs.tcd.ie, Padraig.Cunningham@cs.tcd.ie
  • Mykola Pechenizkiy
  • Department of Computer Science
  • University of Jyväskylä, Finland
  • mpechen@cs.jyu.fi