Title: Feature Selection for Ensembles: A Hierarchical Multi-Objective Genetic Algorithm Approach
1. Feature Selection for Ensembles: A Hierarchical Multi-Objective Genetic Algorithm Approach
7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland, August 4, 2003.
- L. S. Oliveira, R. Sabourin, F. Bortolozzi, and C. Y. Suen
- École de Technologie Supérieure (ETS)
- Centre for Pattern Recognition and Machine Intelligence (CENPARMI)
- Pontifícia Universidade Católica do Paraná (PUCPR)
2. Introduction
- Ensembles of classifiers have been widely used to:
  - Reduce model uncertainty.
  - Improve generalization performance.
- A good ensemble consists of:
  - Good individual classifiers...
  - ...that make errors on different parts of the feature space.
3. Methods for Ensembles
- Classical methods:
  - Bagging [Breiman 96], boosting [Freund 97].
  - Random subspace [Ho 98]: varies the subsets of features used by each member.
- The literature has shown that varying the subset of features used by each member of the ensemble helps to promote diversity.
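The random-subspace idea above can be sketched as follows. This is a minimal illustration, not the paper's setup: the feature count, subspace size, and number of members are invented values.

```python
import random

def random_subspaces(n_features, subspace_size, n_members, seed=0):
    """Draw one random feature subset per ensemble member,
    the core idea of Ho's random subspace method."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_members)]

# Illustrative sizes: 10 members, each trained on 40 of 132 features.
subsets = random_subspaces(n_features=132, subspace_size=40, n_members=10)
```

Each member is then trained only on its own subset, so the members see different views of the feature space.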
4. Methods for Ensembles
- GA-based methods (single GA):
  - Vary the subsets of features by performing feature selection.
  - Usually produce only one ensemble [Opitz 99].
  - Must combine multiple objective functions into one global function.
5. The Proposed Method
- Based on a hierarchical multi-objective GA (MOGA).
  - 1st level: feature selection [IJPRAI 02].
    - Finds a set of good (diverse) classifiers.
  - 2nd level: combines those classifiers.
    - Maximizes the generalization power of the ensemble and a measure of diversity.
    - Produces a set of ensembles rather than a single one.
6. (No transcript: figure-only slide.)
7. Finding Ensembles
- Goal: combine the classifiers produced in the previous level to provide a set of powerful ensembles.
- Each gene of the chromosome stands for a classifier of the Pareto front generated during feature selection.
- If a chromosome has all bits set, all classifiers of the Pareto front are included in the ensemble.
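The encoding described above can be sketched as a binary chromosome over the Pareto front. The classifier labels below are placeholders for illustration, not names from the paper.

```python
def decode_ensemble(chromosome, pareto_classifiers):
    """Each bit of the chromosome selects the corresponding
    classifier from the 1st-level Pareto front."""
    return [clf for bit, clf in zip(chromosome, pareto_classifiers) if bit == 1]

pareto = ["clf_A", "clf_B", "clf_C", "clf_D"]
print(decode_ensemble([1, 0, 1, 1], pareto))  # ['clf_A', 'clf_C', 'clf_D']
print(decode_ensemble([1, 1, 1, 1], pareto))  # all Pareto classifiers included
```

The 2nd-level GA then searches over such bit strings, evaluating each decoded ensemble against the objective functions.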
8. Types of Classifiers in the Pareto Front
9. Summary of the 1st-Level Classifiers
10. 2nd-Level Population
(Diagram: chromosomes 1, 2, ..., n of the 2nd-level population, each defined over the 1st-level Pareto front.)
11. Objective Functions
- Goal: find the most diverse set of classifiers that yields good generalization.
  - Maximization of the recognition rate of the ensemble.
  - Maximization of a measure of diversity (overlap, entropy, ambiguity, etc.).
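In a multi-objective GA, candidates scored on these two objectives are compared by Pareto dominance rather than by a single scalar fitness. A minimal sketch of that comparison (the numeric objective pairs are invented for illustration):

```python
def dominates(a, b):
    """Pareto dominance for maximization: a dominates b if it is
    no worse in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# (recognition rate, diversity) pairs for two candidate ensembles
print(dominates((0.95, 0.30), (0.94, 0.25)))  # True: better in both objectives
print(dominates((0.95, 0.20), (0.94, 0.25)))  # False: the objectives trade off
```

Candidates that no other candidate dominates form the Pareto front, which is why the method can return a whole set of ensembles instead of a single best one.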
12. Combination of Classifiers
- Necessary in order to compute the generalization power of the ensemble.
- Averaging: the literature has shown it to be a simple and effective scheme for combining the predictions of neural networks.
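The averaging scheme can be sketched as below: soft outputs of the members are averaged per class, and the ensemble predicts the arg-max class. The two-member, two-sample outputs are invented for illustration.

```python
import numpy as np

def average_combiner(member_outputs):
    """Combine soft classifier outputs by simple averaging,
    then predict the class with the highest mean score."""
    mean_out = np.mean(member_outputs, axis=0)  # (n_samples, n_classes)
    return mean_out.argmax(axis=1)

outs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # member 1: outputs for 2 samples, 2 classes
    [[0.6, 0.4], [0.4, 0.6]],  # member 2
])
print(average_combiner(outs))  # [0 1]
```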
13. Ambiguity
- If the classifiers all implement the same function, the ambiguity will be small.
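One common formalization of ambiguity (Krogh and Vedelsby's) is the mean squared deviation of each member's output from the ensemble average, which is exactly zero when all members implement the same function. A minimal numeric sketch, with invented outputs:

```python
import numpy as np

def ambiguity(member_outputs):
    """Mean squared deviation of each member from the ensemble average.
    Zero when all classifiers produce identical outputs."""
    mean_out = np.mean(member_outputs, axis=0)
    return np.mean((member_outputs - mean_out) ** 2)

identical = np.array([[0.7, 0.3], [0.7, 0.3]])  # two members, same function
diverse   = np.array([[0.9, 0.1], [0.5, 0.5]])  # two members that disagree
print(ambiguity(identical))  # 0.0
print(ambiguity(diverse))    # ~0.04
```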
14. Ensembles Produced by the 2nd Level
(Plots for the three feature sets: Distances, Edge Maps, Concavities.)
15. Concavities
(Plot: varying the size of the data.)
16. Distances
17. Edge Maps
18. Performance of the Ensembles
- Same performance when working at zero-rejection level.
- Compelling improvements when working at low error rates (real systems).
19. Concavities
(Plot: error rate (%) vs. number of features.)
20Distances
89
80
2.08
91
86
1.96
Number of Features
21Edge Maps
114
108
101
3.25
2.95
90
116
113
106
Number of Features
22. Improvements at Low Error Rates
(Panels: Concavities, Distances, Edge Maps, Combination.)
23. Concavities
(Plot: improvement at low error rates; values 0.36 and 1.66 shown.)
24. Distances
(Plot: improvement at low error rates; values 2.48 and 4.11 shown.)
25. Edge Maps
(Plot: improvement at low error rates; values 4 and 8 shown.)
26. Combination of the Three Ensembles
27. Some Errors
(Examples of misclassified samples, including mislabeling.)
28. Future Work
- Different feature sets (i.e., feed the 2nd-level MOGA with different Pareto fronts) to build an ensemble.
- Other measures of diversity.
- Unsupervised feature selection in the first level.
29. Thanks!
30. (No transcript: figure-only slide.)
31. Different Database Sizes
(Results for database sizes of 5k and 50k samples.)