Sequential Genetic Search for Ensemble Feature Selection (PowerPoint transcript)
1
Sequential Genetic Search for Ensemble Feature Selection
IJCAI'2005, Edinburgh, Scotland, August 1-5, 2005
  • Alexey Tsymbal, Padraig Cunningham
  • Department of Computer Science, Trinity College Dublin, Ireland
  • Mykola Pechenizkiy, Department of Computer Science, University of Jyväskylä, Finland

2
Contents
  • Introduction
  • Classification and Ensemble Classification
  • Ensemble Feature Selection: strategies; sequential genetic search
  • Our GAS-SEFS strategy: Genetic Algorithm-based Sequential Search for Ensemble Feature Selection
  • Experiment design
  • Experimental results
  • Conclusions and future work

3
The Task of Classification
Given J classes, n training observations, and p features, a classifier learned from the training set assigns a class membership to each new instance.
Examples: prognosis of recurrence of breast cancer, diagnosis of thyroid diseases, antibiotic resistance prediction.
4
Ensemble classification
How to prepare inputs for generation of the base
classifiers?
5
Ensemble classification
How to combine the predictions of the base
classifiers?
6
Ensemble feature selection
  • How to prepare inputs for generation of the base classifiers?
  • sampling the training set
  • manipulation of input features
  • manipulation of output targets (class values)
  • Goal of traditional feature selection:
  • find and remove features that are unhelpful or misleading to learning (making one feature subset for a single classifier)
  • Goal of ensemble feature selection:
  • find and remove features that are unhelpful or destructive to learning, making different feature subsets for a number of classifiers
  • find feature subsets that will promote diversity (disagreement) between classifiers

7
Search in EFS
Search space: 2^Features feature subsets for each of the base classifiers. Search strategies include:
  • Ensemble Forward Sequential Selection (EFSS)
  • Ensemble Backward Sequential Selection (EBSS)
  • Hill-Climbing (HC)
  • Random Subspace Method (RSM)
  • Genetic Ensemble Feature Selection (GEFS)

Fitness function: combines each base classifier's accuracy with its diversity, weighted by a coefficient alpha (the alpha values actually used are discussed in the additional slides)
8
Measuring Diversity
The fail/non-fail disagreement measure: the percentage of test instances for which the classifiers make different predictions but for which one of them is correct.
  • The kappa statistic
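As a concrete reading of the fail/non-fail disagreement measure defined above, here is a minimal pairwise sketch (the function name is ours):

```python
def fail_nonfail_disagreement(pred_a, pred_b, y_true):
    """Fail/non-fail disagreement between two classifiers: the fraction
    of test instances on which exactly one of the two is correct
    (which implies their predictions differ)."""
    diff = sum((a == y) != (b == y) for a, b, y in zip(pred_a, pred_b, y_true))
    return diff / len(y_true)
```

For an ensemble, the measure is typically averaged over all pairs of base classifiers.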

9
Random Subspace Method
  • RSM itself is a simple but effective technique for EFS
  • the lack of accuracy in the ensemble members is compensated for by their diversity
  • it does not suffer from the curse of dimensionality
  • RSM is used as a base in other EFS strategies, including Genetic Ensemble Feature Selection:
  • generation of initial feature subsets using RSM
  • a number of refining passes on each feature set while there is improvement in fitness
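The RSM initialisation step can be sketched as follows (an illustration, not the authors' code; the per-feature inclusion probability of 0.5 is our assumption):

```python
import random

def random_subspaces(n_features, n_classifiers, rng=random):
    """Draw one random feature mask per base classifier: each feature
    is included independently with probability 0.5.  Empty and full
    subsets are redrawn (full feature sets are disallowed, as noted
    on the GA/GAS-SEFS peculiarities slide)."""
    masks = []
    while len(masks) < n_classifiers:
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if any(mask) and not all(mask):
            masks.append(mask)
    return masks
```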

10
Genetic Ensemble Feature Selection
  • Genetic search is an important direction in FS research
  • GA is an effective global optimization technique
  • GA for EFS:
  • [Kuncheva, 1993] ensemble accuracy instead of accuracies of base classifiers
  • the fitness function is biased towards a particular integration method
  • preventive measures to avoid overfitting
  • alternating use of individual accuracy and diversity
  • overfitting of an individual is more desirable than overfitting of the ensemble
  • [Opitz, 1999] explicitly used diversity in the fitness function
  • RSM for the initial population
  • new candidates by crossover and mutation
  • roulette-wheel selection (p proportional to fitness)
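Roulette-wheel selection, as named in the last bullet, can be sketched like this (an illustrative implementation, not the paper's code):

```python
import random

def roulette_wheel(population, fitnesses, rng=random):
    """Roulette-wheel selection: pick one individual with probability
    proportional to its fitness (non-negative fitnesses assumed)."""
    total = sum(fitnesses)
    r = rng.random() * total
    acc = 0.0
    for individual, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return individual
    return population[-1]  # guard against floating-point round-off
```

Note that the GAS-SEFS variant described later selects individuals for crossover proportionally to log(1+fitness) rather than raw fitness.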

11
Genetic Ensemble Feature Selection
12
Basic Idea behind GA for EFS
[Diagram: RSM initialises the population; one GA process evolves the current population (scored with diversity) into a new population (scored by fitness), and each generation yields a whole ensemble BC1 … BC_EnsembleSize.]
13
Basic Idea behind GAS-SEFS
[Diagram: RSM initialises the population for each genetic process GA_i+1; the current population is scored by accuracy and the new population by fitness (accuracy plus diversity against the already selected BC1 … BC_i), and each process contributes one new base classifier BC_i+1 to the ensemble.]
14
GAS-SEFS 1 of 2
  • GAS-SEFS (Genetic Algorithm-based Sequential
    Search for Ensemble Feature Selection)
  • instead of maintaining a set of feature subsets in each generation as GA does, it applies a series of genetic processes, one for each base classifier, sequentially
  • after each genetic process, one base classifier is selected into the ensemble
  • GAS-SEFS uses the same fitness function, but
  • diversity is calculated with respect to the base classifiers already formed by previous genetic processes
  • in the first genetic process, only accuracy is used
  • GAS-SEFS uses the same genetic operators as GA
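The sequential structure described above can be sketched as a loop (our reading of the slide; `init_population`, `run_ga`, and `pick_best` are hypothetical callables standing in for RSM initialisation, one genetic process, and final selection, not names from the paper):

```python
def gas_sefs(ensemble_size, init_population, run_ga, pick_best):
    """GAS-SEFS shape: one full genetic process per base classifier;
    each process sees the members already chosen, so its fitness can
    reward diversity against them (the first process uses accuracy
    only, since the ensemble is still empty)."""
    ensemble = []
    for _ in range(ensemble_size):
        population = init_population()             # RSM initialisation
        population = run_ga(population, ensemble)  # accuracy (+ diversity)
        ensemble.append(pick_best(population, ensemble))
    return ensemble
```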

15
GAS-SEFS 2 of 2
  • GA and GAS-SEFS peculiarities:
  • full feature sets are not allowed in RSM initialisation
  • the crossover operator may not produce a full feature subset
  • individuals for crossover are selected randomly, proportional to log(1+fitness) instead of just fitness
  • the generation of children identical to their parents is prohibited
  • to provide better diversity in the length of feature subsets, two different mutation operators are used:
  • Mutate1_0 deletes features randomly with a given probability
  • Mutate0_1 adds features randomly with a given probability
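The two mutation operators can be sketched on a bit-mask representation (the operator names follow the slide; the rest is our illustration):

```python
import random

def mutate_1_0(mask, p, rng=random):
    """Mutate1_0: delete each selected feature with probability p."""
    return [bit and not (rng.random() < p) for bit in mask]

def mutate_0_1(mask, p, rng=random):
    """Mutate0_1: add each unselected feature with probability p."""
    return [bit or (rng.random() < p) for bit in mask]
```

Applying both operators to different copies of an individual lets subsets both shrink and grow, which is what provides diversity in subset length.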

16
Computational complexity
The complexity of GA-based search does not depend on the number of features:
GAS-SEFS ~ S · S' · N_gen and GA ~ S' · N_gen,
where S is the number of base classifiers, S' is the number of individuals (feature subsets) evaluated in one generation, and N_gen is the number of generations.
EFSS and EBSS ~ S · N' · N,
where S is the number of base classifiers, N is the total number of features, and N' is the number of features included or deleted on average in an FSS or BSS search.
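A quick sanity check of these counts, using the settings reported on the experimental-design slide (40 candidate subsets per generation, 10 generations, ensemble size 10), reproduces the 400 vs. 4000 evaluated subsets quoted there:

```python
# S', N_gen and S from the experimental settings
subsets_per_generation = 40   # S'
generations = 10              # N_gen
ensemble_size = 10            # S

ga_subsets = subsets_per_generation * generations  # one genetic process
gas_sefs_subsets = ensemble_size * ga_subsets      # one process per member

print(ga_subsets, gas_sefs_subsets)  # 400 4000
```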
17
Integration of classifiers
Weighted Voting (WV)
Dynamic Selection (DS)
Static Selection (CVM)
Dynamic Voting with Selection (DVS)
Motivation for dynamic integration: each classifier is best in some sub-areas of the whole data set, where its local error is comparatively lower than the corresponding errors of the other classifiers.
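Of the listed integration methods, Weighted Voting is the simplest to illustrate. A minimal sketch, under our assumption that the weights come from validation accuracy (the slide does not specify the weighting):

```python
def weighted_vote(predictions, weights):
    """Weighted Voting (WV): each base classifier's vote is weighted
    (e.g. by its validation accuracy); the class with the largest
    total weight wins."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)
```

The dynamic methods (DS, DVS) differ in that weights or selection are recomputed per test instance from local errors.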
18
Experimental Design
  • Parameter settings for GA and GAS-SEFS:
  • mutation rate: 50%
  • population size: 10
  • search length of 40 feature subsets/individuals per generation:
  • 20 are offspring of the current population of 10 classifiers generated by crossover,
  • 20 are mutated offspring (10 with each mutation operator)
  • 10 generations of individuals were produced:
  • 400 (GA) and 4000 (GAS-SEFS) feature subsets in total
  • To evaluate GA and GAS-SEFS:
  • 5 integration methods
  • Simple Bayes as base classifier
  • stratified random sampling with 60%/20%/20% of instances in the training/validation/test sets
  • 70 test runs on each of 21 UCI data sets for each strategy and diversity measure

19
GA vs GAS-SEFS on two groups of datasets
Ensemble accuracies for GA and GAS-SEFS (DVS, F/N-F disagreement) on two groups of data sets, (1) < 9 and (2) > 9 features, with four ensemble sizes (x-axis: ensemble size)
20
GA vs GAS-SEFS for Five Integration Methods
Ensemble size: 10
  • Ensemble accuracies for five integration methods
    on Tic-Tac-Toe

21
Conclusions and Future Work
  • Diversity in an ensemble of classifiers is very important
  • We have considered two genetic search strategies for EFS
  • The new strategy, GAS-SEFS, employs a series of genetic search processes,
  • one for each base classifier
  • GAS-SEFS results in better ensembles with greater accuracy,
  • especially for data sets with relatively larger numbers of features
  • one reason: each of the core GA processes leads to significant overfitting of a corresponding ensemble member
  • GAS-SEFS is significantly more time-consuming than GA:
  • GAS-SEFS ~ ensemble_size x GA
  • [Oliveira et al., 2003] obtained better results for single FSS based on Pareto-front dominating solutions
  • adaptation of this technique to EFS is an interesting topic for further research

22
Thank you!
  • Alexey Tsymbal, Padraig Cunningham
  • Department of Computer Science, Trinity College Dublin, Ireland
  • Alexey.Tsymbal@cs.tcd.ie, Padraig.Cunningham@cs.tcd.ie

Mykola Pechenizkiy, Department of Computer Science and Information Systems, University of Jyväskylä, Finland, mpechen@cs.jyu.fi
23
Additional Slides
24
References
  • [Kuncheva, 1993] Ludmila I. Kuncheva. Genetic algorithm for feature selection for parallel classifiers. Information Processing Letters 46: 163-168, 1993.
  • [Kuncheva and Jain, 2000] Ludmila I. Kuncheva and Lakhmi C. Jain. Designing classifier fusion systems by genetic algorithms. IEEE Transactions on Evolutionary Computation 4(4): 327-336, 2000.
  • [Oliveira et al., 2003] Luiz S. Oliveira, Robert Sabourin, Flavio Bortolozzi, and Ching Y. Suen. A methodology for feature selection using multi-objective genetic algorithms for handwritten digit string recognition. Pattern Recognition and Artificial Intelligence 17(6): 903-930, 2003.
  • [Opitz, 1999] David Opitz. Feature selection for ensembles. In Proceedings of the 16th National Conference on Artificial Intelligence, pages 379-384. AAAI Press, 1999.

25
GAS-SEFS Algorithm
26
Other interesting findings
  • alpha values:
  • were different for different data sets,
  • for both GA and GAS-SEFS, alpha for the dynamic integration methods is bigger than for the static ones (2.2 vs 0.8 on average),
  • GAS-SEFS needs slightly higher values of alpha than GA (1.8 vs 1.5 on average):
  • GAS-SEFS always starts with a classifier based on accuracy only, and the subsequent classifiers need more diversity than accuracy
  • The number of selected features falls as the ensemble size grows,
  • this is especially clear for GAS-SEFS, as the base classifiers need more diversity
  • Integration methods (for both GA and GAS-SEFS):
  • the static methods, SS and WV, and the dynamic DS start to overfit the validation set after only 5 generations and show lower accuracies,
  • accuracies of DV and DVS continue to grow up to 10 generations

27
Paper Summary
  • A new strategy for genetic ensemble feature selection, GAS-SEFS, is introduced
  • In contrast with the previously considered algorithm (GA), it is sequential: a series of genetic processes, one for each base classifier
  • More time-consuming, but with better accuracy
  • Each base classifier has a considerable level of overfitting with GAS-SEFS, but the ensemble accuracy grows
  • Experimental comparisons demonstrate clear superiority on 21 UCI data sets, especially for data sets with many features (group 1 vs group 2)

28
Simple Bayes as Base Classifier
  • Bayes theorem: P(C|X) = P(X|C) P(C) / P(X)
  • Naïve assumption (attribute independence): P(x1,…,xk|C) = P(x1|C) · … · P(xk|C)
  • If the i-th attribute is categorical, P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
  • If the i-th attribute is continuous, P(xi|C) is estimated through a Gaussian density function
  • Computationally easy in both cases
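A minimal sketch of the categorical case above (relative-frequency estimates only; the Gaussian branch for continuous attributes is omitted, and all function names are ours):

```python
from collections import Counter, defaultdict

def train_simple_bayes(X, y):
    """Fit class priors P(C) and the per-attribute value tallies used
    to estimate P(xi|C) by relative frequency."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (class, attr index) -> value tally
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            value_counts[(c, i)][v] += 1
    return class_counts, value_counts, len(y)

def predict_simple_bayes(model, row):
    """argmax_C P(C) * prod_i P(xi|C); P(X) is constant and dropped."""
    class_counts, value_counts, n = model
    best_class, best_score = None, -1.0
    for c, count in class_counts.items():
        score = count / n  # P(C)
        for i, v in enumerate(row):
            score *= value_counts[(c, i)][v] / count  # relative frequency
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```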

29
Datasets characteristics
Data set  Instances  Classes  Categ. features  Num. features
Balance 625 3 0 4
Breast Cancer 286 2 9 0
Car 1728 4 6 0
Diabetes 768 2 0 8
Glass Recognition 214 6 0 9
Heart Disease 270 2 0 13
Ionosphere 351 2 0 34
Iris Plants 150 3 0 4
LED 300 10 7 0
LED17 300 10 24 0
Liver Disorders 345 2 0 6
Lymphography 148 4 15 3
MONK-1 432 2 6 0
MONK-2 432 2 6 0
MONK-3 432 2 6 0
Soybean 47 4 0 35
Thyroid 215 3 0 5
Tic-Tac-Toe 958 2 9 0
Vehicle 846 4 0 18
Voting 435 2 16 0
Zoo 101 7 16 0
30
GA vs GAS-SEFS for Five Integration Methods
  • Ensemble accuracies for GA (left) and GAS-SEFS (right) for five integration methods and four ensemble sizes (x-axis: ensemble size) on Tic-Tac-Toe