Title: Search Strategies for Ensemble Feature Selection in Medical Diagnostics
1. Search Strategies for Ensemble Feature Selection in Medical Diagnostics
- Alexey Tsymbal, Pádraig Cunningham, Department of Computer Science, Trinity College Dublin, Ireland
- Mykola Pechenizkiy, Seppo Puuronen, Department of Computer Science, University of Jyväskylä, Finland
2. Contents
- Introduction: the task of classification
- Classification of acute abdominal pain
- Ensembles of classifiers and feature selection; ensemble feature selection
- Simple Bayes ensembles: pro et contra
- HC, EFSS, EBSS, and GEFS strategies
- Experimental results: appendicitis
- Conclusions and future work
3. The task of classification
- Given: J classes, n training observations, p instance attributes
- Task: using the training set, determine the class membership of a new instance to be classified
- Examples: prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, heart attack prediction, etc.
4Classification of acute abdominal pain
- 3 large datasets with cases of acute abdominal
pain (AAP) 1254, 2286, and 4020 instances, and
18 parameters (features) from history-taking and
clinical examination - the task of separating acute appendicitis
- the second most important cause of abdominal
surgeries - AAP I from 6 surgical departments in Germany,
AAP II from 14 centers in Germany, and AAP III
from 16 centers in Central and Eastern Europe - the 18 features are standardized by the World
Organization of Gastroenterology (OMGE)
Features 1 Sex 2 Age 3 Progress of pain 4
Duration of pain 5 Type of pain 6 Severity of
pain 7 Location of pain at present 8 Location of
pain at onset 9 Previous similar complaints 10
Previous abdominal operation 11 Distended
abdomen 12 Tenderness 13 Severity of
tenderness 14 Movement of abdominal wall 15
Rigidity 16 Rectal tenderness 17 Rebound
tenderness 18 Leukocytes
The data sets for research were kindly provided
by the Laboratory for System Design, Faculty
of Electrical Engineering and Computer Science,
University of Maribor, Slovenia, and the
Theoretical Surgery Unit, Department of General
and Trauma Surgery, Heinrich-Heine University,
Düsseldorf, Germany
5. Ensemble classification
6. Ensemble feature selection
- How to prepare inputs for generation of the base classifiers?
  - Sampling the training set
  - Manipulation of input features
  - Manipulation of output targets (class values)
- Goal of traditional feature selection: find and remove features that are unhelpful or destructive to learning, producing one feature subset for a single classifier
- Goal of ensemble feature selection: find and remove features that are unhelpful or destructive to learning, producing different feature subsets for a number of classifiers, and find feature subsets that promote disagreement between the classifiers
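The disagreement the slide aims for can be quantified directly. A minimal sketch (the function name and example values are illustrative, not from the talk):

```python
# Minimal sketch: pairwise disagreement (diversity) between two base
# classifiers, measured as the fraction of test instances on which
# their predictions differ.

def disagreement(preds_a, preds_b):
    """Fraction of instances where the two classifiers disagree."""
    assert len(preds_a) == len(preds_b)
    diffs = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return diffs / len(preds_a)

# Two classifiers trained on different feature subsets typically err
# on different instances, yielding non-zero diversity:
print(disagreement([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```

Feature subsets that raise this pairwise measure across the ensemble are exactly what ensemble feature selection searches for.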
7. Simple Bayesian classification
- Bayes theorem: P(C|X) = P(X|C) P(C) / P(X)
- Naïve assumption of attribute independence: P(x1,…,xk|C) = P(x1|C) ··· P(xk|C)
- If the i-th attribute is categorical, P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
- If the i-th attribute is continuous, P(xi|C) is estimated through a Gaussian density function
- Computationally easy in both cases
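For the continuous case, the estimation above can be sketched in a few lines. This is an illustrative implementation of Gaussian naive Bayes, not the code used in the talk:

```python
import math
from collections import defaultdict

# Gaussian naive Bayes: P(C|X) ∝ P(C) * Π_i P(x_i|C), with each
# P(x_i|C) estimated by a per-class Gaussian density.

def fit(X, y):
    """Estimate class priors, and per-attribute means/variances per class."""
    model = {}
    by_class = defaultdict(list)
    for xs, c in zip(X, y):
        by_class[c].append(xs)
    for c, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # a small floor keeps every variance strictly positive
        vars_ = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                 for col, m in zip(zip(*rows), means)]
        model[c] = (n / len(y), means, vars_)
    return model

def predict(model, xs):
    """Pick the class maximizing the log-posterior."""
    best_c, best_lp = None, -math.inf
    for c, (prior, means, vars_) in model.items():
        lp = math.log(prior)
        for x, m, v in zip(xs, means, vars_):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if lp > best_lp:
            best_c, best_lp = c, lp
    return best_c

model = fit([[1.0], [1.2], [5.0], [5.2]], [0, 0, 1, 1])
print(predict(model, [1.1]))  # 0
```

Fitting reduces to counting and averaging, which is why the slide calls it computationally easy.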
8. Bayesian ensembles: pro et contra
CONTRA
- naïve feature-independence assumption
- extremely stable algorithm, so diverse base classifiers are hard to obtain
PRO
- simplicity, speed, interpretability
- SB is optimal even when the independence assumption is violated (in theory and experiments)
- can be effectively used with boosting (bias reduction)
- feature selection reduces the error of the naïve assumption
9. Integration of classifiers
- Integration divides into selection and combination, each either static or dynamic:
  - Static selection (CVM)
  - Dynamic Selection (DS)
  - Voting-type combination: Weighted Voting (WV), Dynamic Voting (DV), Dynamic Voting with Selection (DVS)
  - Meta-type combination
Motivation for the dynamic integration: the main assumption is that each classifier is best in some sub-areas of the whole data set, where its local error is comparatively less than the corresponding errors of the other classifiers.
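The simplest combination scheme above, weighted voting, can be sketched as follows. The weights are assumed here to be validation accuracies; dynamic integration would instead derive them from local, instance-specific errors. Names and values are illustrative:

```python
from collections import defaultdict

# Static weighted voting (WV): each base classifier casts a vote for a
# class, weighted e.g. by its validation accuracy; the class with the
# highest total weight wins.

def weighted_vote(predictions, weights):
    """predictions[i] is classifier i's class vote; weights[i] its weight."""
    score = defaultdict(float)
    for p, w in zip(predictions, weights):
        score[p] += w
    return max(score, key=score.get)

# Three weaker voters for class 1 outweigh one stronger voter for class 0:
print(weighted_vote([0, 1, 1, 1], [0.9, 0.4, 0.4, 0.4]))  # 1
```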
10. Search in EFS
- Search space: 2^NumOfFeatures × NumOfClassifiers = 2^18 × 25 = 6,553,600
- 4 search strategies to heuristically explore the search space:
  - Hill-Climbing (HC) (CBMS 2002)
  - Ensemble Forward Sequential Selection (EFSS)
  - Ensemble Backward Sequential Selection (EBSS)
  - Genetic Ensemble Feature Selection (GEFS)
11. Hill-Climbing (HC) strategy (CBMS 2002)
- Generation of initial feature subsets using the random subspace method (RSM)
- A number of refining passes on each feature subset while there is improvement in fitness
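The two steps above can be sketched for a single base classifier's subset. The fitness function here is a toy stand-in (an assumption, not the talk's fitness, which combines accuracy and diversity):

```python
import random

# Hill-climbing over one feature subset: random subspace initialization,
# then refining passes that flip each feature's membership and keep the
# flip only if fitness improves.

def hill_climb(n_features, fitness, rng):
    # RSM initialization: each feature included with probability 0.5
    subset = [rng.random() < 0.5 for _ in range(n_features)]
    best = fitness(subset)
    improved = True
    while improved:                        # refining passes
        improved = False
        for i in range(n_features):
            subset[i] = not subset[i]      # tentatively flip feature i
            score = fitness(subset)
            if score > best:
                best = score               # keep the flip
                improved = True
            else:
                subset[i] = not subset[i]  # undo the flip
    return subset, best

# Toy fitness: reward inclusion of "relevant" features 0, 3 and 5
relevant = {0, 3, 5}
fit = lambda s: sum(1 if i in relevant else -1
                    for i, inc in enumerate(s) if inc)
subset, score = hill_climb(8, fit, random.Random(42))
print(score)  # 3: exactly the relevant features are kept
```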
12. Ensemble Forward Sequential Selection (EFSS)
- forward selection of features for each base classifier, starting from the empty subset
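Forward sequential selection for one base classifier can be sketched as below; in EFSS this greedy loop runs once per base classifier, each time with a fitness that also rewards diversity. The fitness shown is an illustrative assumption:

```python
# Forward sequential selection: start from the empty subset and greedily
# add the single feature that most improves fitness, stopping when no
# addition helps.

def forward_select(n_features, fitness):
    subset = set()
    best = fitness(subset)
    while True:
        candidates = [(fitness(subset | {i}), i)
                      for i in range(n_features) if i not in subset]
        if not candidates:
            break
        score, i = max(candidates)
        if score <= best:        # no remaining feature improves fitness
            break
        subset.add(i)
        best = score
    return subset, best

relevant = {1, 4}                # toy target: features 1 and 4 are useful
fit = lambda s: len(s & relevant) - 0.1 * len(s - relevant)
subset, score = forward_select(6, fit)
print(sorted(subset))  # [1, 4]
```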
13. Ensemble Backward Sequential Selection (EBSS)
- backward elimination of features for each base classifier, starting from the full feature set
14. Genetic Ensemble Feature Selection (GEFS)
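The genetic search can be sketched as follows: individuals are bit-strings (one bit per feature), evolved by fitness-based selection, one-point crossover and bit-flip mutation. Population size, generation count, mutation rate and the fitness function are all illustrative assumptions, not the talk's settings:

```python
import random

# Genetic search over feature subsets encoded as boolean bit-strings.

def gefs(n_features, fitness, pop_size=20, n_gen=30, p_mut=0.05, seed=1):
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]           # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per bit
            child = [not bit if rng.random() < p_mut else bit
                     for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: reward "relevant" features, penalize the rest
relevant = {0, 2, 7}
fit = lambda s: sum(1 if i in relevant else -1
                    for i, inc in enumerate(s) if inc)
best = gefs(10, fit)
```

Keeping the parents each generation makes the search elitist, so the best subset found never degrades across generations.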
15Computational complexity
EFSS and EBSS where S is the number of base
classifiers, N is the total number of features,
and N is the number of features included or
deleted on average in an FSS or BSS search.
Example EFSS 251831350 (and not 6 553
600!) HC where Npasses is the average number of
passes through the feature subsets in HC until
there is some improvement. GEFS where S is
the number of individuals (feature subsets) in
one generation, and Ngen is the number of
generations.
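The gap between exhaustive and greedy search on this slide is easy to verify numerically:

```python
# Numbers from the slide: S = 25 base classifiers, N = 18 features,
# N_bar = 3 features added on average in an FSS search.
S, N, N_bar = 25, 18, 3

exhaustive = 2 ** N * S    # full search space of (subset, classifier) pairs
efss = S * N * N_bar       # greedy EFSS fitness evaluations

print(exhaustive)  # 6553600
print(efss)        # 1350
```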
16. An example: EFSS on AAP III, alpha = 4
- C1: f2 (age), f7 (location of pain at present)
- C2: f6 (severity of pain), f13 (severity of tenderness)
- C3: f6 (severity of pain), f13 (severity of tenderness)
- C4: f9 (previous similar complaints), f14 (movement of abdominal wall)
- C5: f3 (progress of pain), f15 (rigidity)
- C6: f2 (age), f16 (rectal tenderness)
- C7: f1 (sex), f12 (tenderness)
- C8: f4 (duration of pain), f18 (leukocytes)
- C9: f4 (duration of pain)
- C10: f11 (distended abdomen)
17. Experiments with the AAP data sets
- HC, EFSS, EBSS, and GEFS strategies
- integration: three DI variations (DS, DV, and DVS), weighted voting (WV), and static selection (SS)
- 30 test runs of Monte-Carlo cross-validation
- collected characteristics: classification accuracy, sensitivity, specificity, relative number of features in the base classifiers, total ensemble diversity, and ensemble coverage
- the test environment is implemented within the MLC++ framework (the Machine Learning library in C++)
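The evaluation protocol above can be sketched as follows: repeat random train/test splits and average the collected measures. Sensitivity and specificity are shown for a binary task (1 = appendicitis); the trivial threshold "classifier" is a placeholder assumption standing in for a trained ensemble:

```python
import random

# Monte-Carlo cross-validation: R random train/test splits, averaging
# sensitivity and specificity over the runs.

def mc_cv(data, labels, train_frac=0.7, runs=30, seed=0):
    rng = random.Random(seed)
    sens, spec = [], []
    for _ in range(runs):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        test = idx[cut:]
        # placeholder "classifier": predicts 1 when the single feature > 0.5
        preds = {i: int(data[i][0] > 0.5) for i in test}
        tp = sum(preds[i] == 1 and labels[i] == 1 for i in test)
        tn = sum(preds[i] == 0 and labels[i] == 0 for i in test)
        fn = sum(preds[i] == 0 and labels[i] == 1 for i in test)
        fp = sum(preds[i] == 1 and labels[i] == 0 for i in test)
        sens.append(tp / (tp + fn) if tp + fn else 1.0)
        spec.append(tn / (tn + fp) if tn + fp else 1.0)
    return sum(sens) / runs, sum(spec) / runs

# Perfectly separable toy data: both averages come out 1.0
data = [[0.1]] * 50 + [[0.9]] * 50
labels = [0] * 50 + [1] * 50
print(mc_cv(data, labels))  # (1.0, 1.0)
```

Unlike k-fold cross-validation, the random splits may overlap, which is why a fixed number of runs (30 here, as on the slide) is specified explicitly.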
18. Experimental results
19. Feature importance table (EFSS, alpha = 0)
20. Conclusions
- 4 new strategies proposed and analyzed
- EFSS is the best strategy (only for this domain!)
- the best previously published specificity and sensitivity were achieved
- only 7 features in each classifier on average (less than 3)
- importance of the features was analyzed
21. Future work
- collaboration with medical experts is needed to analyze the results obtained
- other medical domains, especially those including many features with complex inter-feature dependencies
- other search strategies (beam search, simulated annealing, etc.)
- better tuning of the GEFS parameters
22. Contact info
- Alexey Tsymbal
- Dept. of Computer Science, Trinity College Dublin, Ireland
- Alexey.Tsymbal_at_cs.tcd.ie