Title: Feature Selection and Weighting using Genetic Algorithm for Off-line Character Recognition Systems
Feature Selection and Weighting using Genetic Algorithm for Off-line Character Recognition Systems
The University of British Columbia
Department of Electrical and Computer Engineering
Presented by Faten Hussein
Outline
- Introduction / Problem Definition
- Motivation / Objectives
- System Overview
- Results
- Conclusions
Introduction
Off-line Character Recognition System
Processing pipeline: text document → scanning → pre-processing → feature extraction → classification → post-processing → classified text.
Applications:
- Address readers
- Bank cheque readers
- Reading data entered in forms (e.g., tax forms)
- Detecting forged signatures
Introduction
For a typical handwritten recognition task:
- There are many variants of character (symbol) shape and size.
- Different writers have different writing styles.
- Even the same writer's style can vary.
- Thus, an unlimited number of variations exists for a single character.
Introduction
Variations in handwritten digits extracted from zip codes.
[Figure: sample digit images annotated with loop (L) and end-point (E) counts, e.g. L0/E3, L1/E1, L2/E0]
To overcome this diversity, a large number of features must be added.
Examples of the features we used: moment invariants, number of loops, number of end points, centroid, area, circularity, and so on; a toy sketch of a few of these follows.
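Below is a minimal, illustrative sketch (not the system's actual extraction code) of how three of these features, area, centroid, and circularity, might be computed from a binary digit image with NumPy; moment invariants and loop/end-point counts require more machinery.

```python
import numpy as np

def simple_shape_features(img):
    """Toy versions of three of the listed features for a binary
    character image (1 = ink, 0 = background)."""
    ink = img.astype(bool)
    ys, xs = np.nonzero(ink)
    area = xs.size                          # number of ink pixels
    if area == 0:
        return {"area": 0, "centroid": (0.0, 0.0), "circularity": 0.0}
    centroid = (xs.mean(), ys.mean())       # centre of mass (x, y)

    # Crude perimeter: ink pixels with at least one background 4-neighbour.
    padded = np.pad(ink, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = np.count_nonzero(ink & ~interior)

    # Circularity = 4*pi*area / perimeter**2 (1.0 for a perfect disc).
    circularity = 4 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"area": area, "centroid": centroid, "circularity": circularity}
```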
Problem
Dilemma: should we add more features to a character recognition system?
In favour:
- Accommodates the variations in symbols
- Hope of increased classification accuracy
Against:
- Increases run time and memory for classification
- Might add redundant/irrelevant features, which decrease accuracy
- An ad hoc process that depends on experience and trial and error
Feature Selection
Solution: Feature Selection (FS)
Definition: select a relevant subset of features from a larger set of features while maintaining or enhancing accuracy.
Advantages:
- Removes irrelevant and redundant features
- Total of 40 features → reduced to 16
- 7 Hu moments → only the first three kept
- Area removed → redundant given circularity
- Maintains/enhances classification accuracy: 70% recognition rate using all 40 features → 75% after FS using only 16 features
- Faster classification and lower memory requirements
Feature Selection/Weighting
Feature Selection (FS) vs. Feature Weighting (FW):
- FS is the special case: binary weights (0 for an irrelevant/redundant feature, 1 for a relevant one), giving 2^N candidate feature subsets for N features.
- FW is the general case: real-valued (variable) weights reflecting each feature's relevance, giving w^N candidate weight vectors when w weight values are allowed (see the sketch after this list).
- Assigning weights (binary or real-valued) to features therefore needs a search algorithm to find the set of weights that yields the best classification accuracy (an optimization problem).
- A genetic algorithm is a good search method for such optimization problems.
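A quick back-of-the-envelope computation shows why exhaustive search over these combinations is infeasible and a heuristic such as a GA is needed (N = 40 is taken from the previous slide; the choice of w values is illustrative):

```python
# Search-space size for N features: FS assigns each feature one of
# 2 values (0/1); FW with w allowed weight values assigns one of w,
# so exhaustive search faces 2**N vs. w**N candidate weight vectors.
N = 40                      # total number of extracted features
for w in (2, 5, 10):        # w = 2 is plain feature selection
    print(f"w = {w:2d}: {float(w ** N):.3e} combinations")
```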
Genetic Feature Selection/Weighting
Why use a GA for FS/FW?
- It has been proven a powerful search method for the FS problem.
- It requires no derivative information or other extra knowledge; only the objective function (the classifier's error rate) is needed to evaluate the quality of a feature subset (see the sketch below).
- It searches a population of solutions in parallel, so it can provide a number of potential solutions, not just one.
- It is resistant to becoming trapped in local minima.
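The sketch below shows one way such a GA could be organized (the operator choices and parameters here are illustrative assumptions, not the talk's actual settings): chromosomes are 0/1 feature masks, and the only problem knowledge is a fitness callback such as one minus the classifier's error rate.

```python
import random

def ga_feature_search(n_features, fitness, pop_size=30, generations=50,
                      p_cross=0.8, p_mut=0.02):
    """Minimal GA sketch: evolves 0/1 feature masks; `fitness` is the
    only link to the problem (e.g. 1 - classifier error rate)."""
    pop = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            p1 = max(random.sample(pop, 3), key=fitness)
            p2 = max(random.sample(pop, 3), key=fitness)
            c1, c2 = p1[:], p2[:]
            if random.random() < p_cross:            # one-point crossover
                cut = random.randrange(1, n_features)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for c in (c1, c2):                       # bit-flip mutation
                for i in range(n_features):
                    if random.random() < p_mut:
                        c[i] ^= 1
                nxt.append(c)
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)        # keep the best seen
    return best
```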
Objectives and Motivations
Build a genetic feature selection/weighting system, apply it to a character recognition problem, and investigate the following issues:
- Study the effect of varying the number of weight values on the number of selected features (FS often eliminates more features than FW, but by how much?).
- Compare the performance of genetic feature selection and weighting in the presence of irrelevant and redundant features (not studied before).
- Compare the performance of genetic feature selection and weighting for regular cases (test the hypothesis that FW should give better, or at least the same, results as FS).
- Evaluate the performance of the better method (GFS or GFW) in terms of optimality and time complexity (study the feasibility of genetic search with respect to optimality and time).
Methodology
- The recognition problem is to classify isolated handwritten digits.
- Used a k-nearest-neighbor classifier (k = 1).
- Used a genetic algorithm as the search method.
- Applied genetic feature selection and weighting in the wrapper approach, i.e., the fitness function is the classifier's error rate (see the sketch below).
- Used two phases during each program run: a training/testing phase and a validation phase.
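A sketch of what such a wrapper fitness could look like (illustrative, not the thesis code; array names are hypothetical): the candidate weight vector produced by the GA scales each feature in the 1-NN distance, so a weight of 0 removes the feature entirely.

```python
import numpy as np

def weighted_1nn_error(weights, X_train, y_train, X_val, y_val):
    """Error rate of a 1-NN classifier whose distance scales each
    feature by its GA-assigned weight (weight 0 drops the feature)."""
    w = np.asarray(weights, dtype=float)
    errors = 0
    for x, y in zip(X_val, y_val):
        d = ((X_train - x) ** 2 * w).sum(axis=1)   # weighted squared distance
        errors += int(y_train[int(d.argmin())] != y)
    return errors / len(y_val)

# Wrapper fitness for the GA (hypothetical array names):
# fitness = lambda mask: 1.0 - weighted_1nn_error(mask, X_tr, y_tr, X_te, y_te)
```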
System Overview
Data flow through the system:
- Input: isolated handwritten digit images
- Pre-processing module → clean images
- Feature extraction module → all N extracted features
- Feature selection/weighting module (GA) → candidate feature subset
- Evaluation module (KNN classifier) → assessment of the feature subset, fed back to the GA (training/testing evaluation)
- Output: best feature subset (M < N), checked in a final validation pass
Results (Comparison 1)
Effect of varying weight values on the number of selected features:
- As the number of weight values increases, the probability of a feature having weight value 0 (POZ) decreases, so the number of eliminated features decreases (see the sketch below).
- GFS eliminates more features (and thus selects fewer) than GFW because of its smaller number of weight values (0/1), and it does so without compromising classification accuracy.
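To illustrate the POZ argument, assume for simplicity (an assumption made here for illustration, not stated in the talk) that each of the w allowed weight values is equally likely; then POZ = 1/w and the expected number of eliminated features is N/w, which shrinks as w grows:

```python
N = 40                      # total features, as in the earlier slides
for w in (2, 3, 5, 10):     # w = 2 corresponds to GFS (binary weights)
    poz = 1.0 / w           # P(weight = 0) under the uniformity assumption
    print(f"w = {w:2d}: POZ = {poz:.2f}, "
          f"expected eliminated features = {N * poz:.1f}")
```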
Results (Comparison 2)
Performance of genetic feature selection/weighting in the presence of irrelevant features:
- The performance of the 1-NN classifier degrades rapidly as the number of irrelevant features increases.
- As the number of irrelevant features increases, FS outperforms all FW settings in both classification accuracy and feature elimination.
Results (Comparison 3)
Performance of genetic feature selection/weighting in the presence of redundant features:
- The classification accuracy of 1-NN does not suffer much from added redundant features, but they do increase the problem size.
- As the number of redundant features increases, FS achieves slightly better classification accuracy than all FW settings, and significantly outperforms FW in feature elimination.
Results (Comparison 4)
Performance of genetic feature selection/weighting for regular cases (not necessarily containing irrelevant/redundant features):
- FW achieves better training accuracies than FS, but FS generalizes better (higher accuracies on unseen validation samples).
- FW over-fits the training samples.
Results (Evaluation 1)
Convergence of GFS to an optimal or near-optimal set of features:

Number of features   Best exhaustive (class. rate, %)   Best GA (class. rate, %)   Average GA (5 runs, %)
8                    74                                 74                         74
10                   75.2                               75.2                       75.2
12                   77.2                               77.2                       77.04
14                   79                                 79                         78.56
16                   79.2                               79                         78.28
18                   79.4                               79.4                       78.92

- GFS was able to return optimal or near-optimal values (those reached by exhaustive search).
- The worst average value obtained by GFS is less than 1% away from the optimal value.
Results (Evaluation 2)
Convergence of GFS to an optimal or near-optimal set of features within an acceptable number of generations:
- The time needed for GFS is bounded below by a linear-fit curve and above by an exponential-fit curve.
- Using GFS for high-dimensional problems requires parallel processing.
Conclusions
- GFS is superior to GFW in feature reduction, without compromising classification accuracy.
- In the presence of irrelevant features, GFS is better than GFW in both feature reduction and classification accuracy.
- In the presence of redundant features, GFS is also preferred over GFW, owing to its greater feature-reduction ability.
- For regular databases, it is advisable to use at most 2 or 3 weight values to avoid over-fitting.
- GFS is a reliable method for finding optimal or near-optimal solutions, but needs parallel processing for large problem sizes.
Questions?