Title: A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns
Slide 1: A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns
M.A.Sc. Candidate
Supervisors: Dr. M. Kamel, Dr. M. M. A. Salama
Slide 2: Highlights
Proteomic Pattern Analysis for Prostate Cancer Detection
Significance Test-Based Feature Selection (STFS)
- STFS can be used for any supervised pattern recognition problem
- Very good performance has been obtained on several benchmark datasets, especially those with a large number of features
- Sensitivity 97.1%, specificity 96.8%
- Suggestion of mistaken labelling by prostatic biopsy
Slide 3: Outline of Part I
Significance Test-Based Feature Selection (STFS) for Supervised Pattern Recognition
- Introduction
- Methodology
- Experimental Results on Benchmark Datasets
- Comparison with MIFS
Slide 4: Introduction
Problems with features:
- Large number
- Irrelevance
- Noise
- Correlation
These problems increase computational complexity and reduce the recognition rate.
Slide 5: Mutual Information Feature Selection (MIFS)
- One of the most important heuristic feature selection methods; it can be very useful in any classification system.
- But estimation of the mutual information is difficult with:
  - a large number of features and a large number of classes
  - continuous data
Slide 6: Problems with Feature Selection Methods
Two key issues:
- Computational complexity
- Deficiency in optimality
Slide 7: Proposed Method
Criterion of feature selection:
  significance of feature = significant difference × independence
where the significant difference measures pattern separability on individual candidate features, and the independence measures the noncorrelation between a candidate feature and the already-selected features.
Slide 8: Measurement of Pattern Separability of Individual Features
The statistical test of significant difference is chosen by data type and number of classes:

Data type                                          Two classes         More than two classes
Continuous, normal distribution                    t-test              ANOVA
Continuous, non-normal distribution, or rank data  Mann-Whitney test   Kruskal-Wallis test
Categorical                                        Chi-square test     Chi-square test
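A minimal Python sketch of this test-selection logic, using SciPy; the function name, the dtype argument, and scoring separability as 1 − p-value are illustrative assumptions rather than the thesis's exact formulation.

```python
import numpy as np
from scipy import stats

def significant_difference(feature, labels, dtype="normal"):
    """Score the pattern separability of one feature: larger = more separable."""
    groups = [feature[labels == c] for c in np.unique(labels)]
    if dtype == "normal":                        # continuous, normal distribution
        if len(groups) == 2:
            _, p = stats.ttest_ind(*groups)      # t-test for two classes
        else:
            _, p = stats.f_oneway(*groups)       # ANOVA for more than two classes
    elif dtype == "nonnormal":                   # continuous non-normal or rank data
        if len(groups) == 2:
            _, p = stats.mannwhitneyu(*groups)   # Mann-Whitney test
        else:
            _, p = stats.kruskal(*groups)        # Kruskal-Wallis test
    else:                                        # categorical data: chi-square test
        rows, cols = np.unique(feature), np.unique(labels)
        table = np.array([[np.sum((feature == r) & (labels == c))
                           for c in cols] for r in rows])
        _, p, _, _ = stats.chi2_contingency(table)
    return 1.0 - p                               # small p-value -> high significance
```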
Slide 9: Independence
The independence measure is chosen by data type:
- Continuous data with normal distribution: Pearson correlation
- Continuous data with non-normal distribution or rank data: Spearman rank correlation
- Categorical data: Pearson contingency coefficient
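A companion sketch for these independence measures, under the same caveats; mapping a correlation r to an independence level of 1 − |r| is an assumption.

```python
import numpy as np
from scipy import stats

def independence(f_cand, f_sel, dtype="normal"):
    """Independence level between a candidate and a selected feature (0..1)."""
    if dtype == "normal":                        # Pearson correlation
        r, _ = stats.pearsonr(f_cand, f_sel)
    elif dtype == "nonnormal":                   # Spearman rank correlation
        r, _ = stats.spearmanr(f_cand, f_sel)
    else:                                        # Pearson contingency coefficient
        rows, cols = np.unique(f_cand), np.unique(f_sel)
        table = np.array([[np.sum((f_cand == a) & (f_sel == b))
                           for b in cols] for a in rows])
        chi2, _, _, _ = stats.chi2_contingency(table)
        r = np.sqrt(chi2 / (chi2 + f_cand.size))  # C = sqrt(chi2 / (chi2 + n))
    return 1.0 - abs(r)                          # uncorrelated features score near 1
```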
Slide 10: Selection Procedure
- MSDI: Maximum Significant Difference and Independence algorithm
- MIC: Monotonically Increasing Curve strategy
Slide 11: Maximum Significant Difference and Independence (MSDI) Algorithm
1. Compute the significant difference (sd) of every initial feature.
2. Select the feature with the maximum sd as the first feature.
3. Compute the independence level (ind) between every candidate feature and the already-selected feature(s).
4. Select the feature with the maximum feature significance (sf = sd × ind) as the next feature; repeat from step 3 until enough features are selected.
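The four steps can be sketched as a greedy loop, reusing the hypothetical significant_difference() and independence() helpers from the earlier sketches; taking the minimum independence over the already-selected features is an assumption (a mean would also fit the slide's description).

```python
import numpy as np

def msdi(X, y, n_select, dtype="normal"):
    """Greedy MSDI selection on X (samples x features): maximize sf = sd * ind."""
    n_features = X.shape[1]
    sd = np.array([significant_difference(X[:, j], y, dtype)
                   for j in range(n_features)])          # step 1
    selected = [int(np.argmax(sd))]                      # step 2: max sd first
    while len(selected) < n_select:
        candidates = [j for j in range(n_features) if j not in selected]
        sf = []
        for j in candidates:
            # step 3: independence against every already-selected feature
            ind = min(independence(X[:, j], X[:, s], dtype) for s in selected)
            sf.append(sd[j] * ind)                       # sf = sd * ind
        selected.append(candidates[int(np.argmax(sf))])  # step 4: max sf next
    return selected
```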
Slide 12: Monotonically Increasing Curve (MIC) Strategy
1. Plot the performance curve (rate of recognition vs. number of features) of the feature subset selected by MSDI.
2. Delete the features that do not contribute to the increase of the recognition rate.
3. Repeat until the curve is monotonically increasing.
[Figure: performance curve, recognition rate (0.4-1.0) vs. number of features (0-30)]
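A sketch of the MIC pruning loop; evaluate(subset) is a placeholder (not from the slides) that trains a classifier on the given feature subset and returns its recognition rate.

```python
def mic(ranked_features, evaluate):
    """Prune MSDI-ordered features until the performance curve is monotonic."""
    features = list(ranked_features)
    while True:
        # recognition rate after adding each feature in turn
        rates = [evaluate(features[:k + 1]) for k in range(len(features))]
        # positions where adding a feature lowered the recognition rate
        drops = [k for k in range(1, len(rates)) if rates[k] < rates[k - 1]]
        if not drops:                 # curve is monotonically increasing: done
            return features
        del features[drops[0]]        # delete the non-contributing feature
```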
Slide 13: Example I - Handwritten Digit Recognition
- 32-by-32 bitmaps are divided into 8 × 8 = 64 blocks (each 4 × 4 pixels)
- The on pixels in each block are counted
- Thus an 8 × 8 matrix is generated, i.e., 64 features
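This block-counting extraction is straightforward to express with a numpy reshape; the function name is illustrative.

```python
import numpy as np

def bitmap_to_features(bitmap):
    """bitmap: (32, 32) array of 0/1 pixels -> (64,) feature vector."""
    blocks = bitmap.reshape(8, 4, 8, 4)   # an 8x8 grid of 4x4 blocks
    counts = blocks.sum(axis=(1, 3))      # count the on pixels in each block
    return counts.ravel()                 # 64 features, each in 0..16
```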
Slide 14: Performance Curve
MSDI: Maximum Significant Difference and Independence; MIFS: Mutual Information Feature Selector
[Figure: recognition rate (0.4-1.0) vs. number of features (0-60) for MSDI, Battiti's MIFS (for which β needs to be determined), and random ranking]
Slide 15: Computational Complexity
Selecting 15 features from the 64-feature original set:
- MSDI: 24 seconds
- Battiti's MIFS: 1110 seconds (5 values of β searched in the range 0-1)
Slide 16: Example II - Handwritten Digit Recognition
The 649 features are distributed over the following six feature sets:
- 76 Fourier coefficients of the character shapes
- 216 profile correlations
- 64 Karhunen-Loève coefficients
- 240 pixel averages in 2 × 3 windows
- 47 Zernike moments
- 6 morphological features
Slide 17: Performance Curve
MSDI: Maximum Significant Difference and Independence; MIC: Monotonically Increasing Curve
[Figure: recognition rate (0-1.0) vs. number of features (0-50) for MSDI]
Slide 18: Comparison with MIFS
MSDI: Maximum Significant Difference and Independence; MIFS: Mutual Information Feature Selector
- MSDI is much better with a large number of features
- MIFS is better with a small number of features
[Figure: recognition rate (0.4-1.0) vs. number of features (0-50) for MSDI, MIFS (β = 0.2), and MIFS (β = 0.5)]
Slide 19: Summary of Comparing MSDI with MIFS
- MSDI is much more computationally efficient:
  - MIFS needs to estimate the pdfs
  - the computationally efficient criterion (Battiti's MIFS) still needs β to be determined
  - MSDI involves only simple statistical calculations
- MSDI can select a more nearly optimal feature subset from a large number of features, because it is based on the relevant statistical models
- MIFS is more suitable for a small volume of data and a small feature subset
Slide 20: Outline of Part II
Mass Spectrometry-Based Proteomic Pattern Analysis for Detection of Prostate Cancer
- Problem Statement
- Methods
  - Feature selection
  - Classification
  - Optimization
- Results and Discussion
Slide 21: Problem Statement
Each mass spectrum has 15154 points (features):
- Very large number of features
- Electronic and chemical noise
- Biological variability of human disease
- Little knowledge of the proteomic mass spectrum
Slide 22: The Proteomic Pattern Analysis System
STFS: Significance Test-Based Feature Selection; PNN: Probabilistic Neural Network; RBFNN: Radial Basis Function Neural Network
1. Training dataset (> 10^4 initial features)
2. Most significant features selected by STFS
3. Optimization of the size of the feature subset and the parameters of the classifier by minimizing the ROC distance
4. RBFNN / PNN learning
5. Trained neural classifier → mature classifier
Slide 23: Feature Selection - STFS
Significance of feature = significant difference × independence, where the significant difference is measured by Student's t-test and the independence by the Pearson correlation; features are selected by the MSDI algorithm and the subset is refined by the MIC strategy.
STFS: Significance Test-Based Feature Selection; MSDI: Maximum Significant Difference and Independence algorithm; MIC: Monotonically Increasing Curve strategy
Slide 24: Classification - PNN / RBFNN
- PNN is a standard structure with four layers
- RBFNN is a modified four-layer structure
[Figure: four-layer network diagram with inputs x1...xn, two pattern pools (Pool 1, Pool 2), summation units S1 and S2, and outputs y(1) and y(2)]
PNN: Probabilistic Neural Network; RBFNN: Radial Basis Function Neural Network
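A minimal numpy sketch of a PNN of the four-layer form in the figure: input layer, a Gaussian pattern unit per training sample, a summation pool per class, and an output decision. Class and method names are illustrative, and the RBFNN variant with its pattern decision weight is not shown.

```python
import numpy as np

class PNN:
    """Probabilistic neural network: input, pattern, summation, output layers."""
    def __init__(self, sigma=1.0):
        self.sigma = sigma                       # Gaussian spread

    def fit(self, X, y):                         # training just stores the samples
        self.X, self.y = np.asarray(X), np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, X_new):
        out = []
        for x in np.asarray(X_new):
            # pattern layer: a Gaussian kernel centred at every training sample
            k = np.exp(-np.sum((self.X - x) ** 2, axis=1) / (2 * self.sigma ** 2))
            # summation layer: average activation of each class pool
            scores = [k[self.y == c].mean() for c in self.classes]
            out.append(self.classes[int(np.argmax(scores))])  # output decision
        return np.array(out)
```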
Slide 25: Optimization - ROC Distance
Minimizing the ROC distance optimizes:
- the feature subset size m
- the Gaussian spread σ
- the RBFNN pattern decision weight
[Figure: ROC curve, true positive rate (sensitivity) vs. false positive rate (1 - specificity); d_ROC is the distance from the ideal corner to the operating point, with legs a and b]
ROC: Receiver Operating Characteristic
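A sketch of the ROC-distance objective, assuming d_ROC is the Euclidean distance from the ideal corner (false positive rate 0, true positive rate 1) to the classifier's operating point, so the legs a and b in the figure correspond to 1 − sensitivity and 1 − specificity; the grid-search usage and evaluate() are hypothetical.

```python
import numpy as np

def roc_distance(sensitivity, specificity):
    a = 1.0 - sensitivity     # vertical leg: missed cancers
    b = 1.0 - specificity     # horizontal leg: false alarms
    return np.hypot(a, b)     # d_ROC = sqrt(a^2 + b^2); 0 for a perfect classifier

# Usage sketch: grid search over feature-subset size m and Gaussian spread sigma,
# where evaluate(m, sigma) returns (sensitivity, specificity) on validation data:
# best = min(((m, s) for m in range(5, 50) for s in (0.1, 0.5, 1.0)),
#            key=lambda p: roc_distance(*evaluate(*p)))
```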
Slide 26: Results - Sensitivity and Specificity

Method              Sensitivity   Specificity
Our results         97.1%         96.8%
Petricoin (2002)    94.7%         75.9%
DRE                 55-68%        6-33%
PSA                 29-80%        --
Slide 27: Pattern Distribution
[Figure: distribution of classifier outputs for the two classes, with the cut-point marked]
Slide 28: Possible Causes of the Unrecognizable Samples
- The algorithm of the classifier is not able to recognize all the samples
- The proteomics is not able to provide enough information
- The prostatic biopsies mistakenly label the cancer
Slide 29: Possibility of Mistaken Diagnosis by Prostatic Biopsy
- Biopsy has limited sensitivity and specificity
- The proteomic classifier has very high sensitivity and specificity, correlated with biopsy
- The results of the proteomic classifier are not exactly the same as those of biopsy
- All unrecognizable samples are outliers
[Figure: pattern distribution around the cut-point, partitioned into true non-cancer, false non-cancer, false cancer, and true cancer]
Slide 30: Why Is the Accuracy of Biopsy Limited?
Limited sensitivity (to detect cancer): biopsy cannot reach all areas of the prostate, and the small sample volume will never exactly represent the entire organ; sensitivity is 83.3% for sextant biopsies.
Limited specificity (to detect non-cancer): biopsy may detect low-volume tumours (clinically insignificant prostate cancer) that may not threaten a man's future health; specificity is 97.3% if only tumours with volume > 2 cc are considered as cancer.
Slide 31: Summary (1)
Significance Test-Based Feature Selection (STFS)
- STFS selects features by maximum significant difference and independence (MSDI); it aims to determine the minimum possible feature subset that achieves the maximum recognition rate
- Feature significance (the selection criterion) is estimated from the optimal statistical models in accordance with the properties of the data
- Advantages:
  - computationally efficient
  - optimality
Slide 32: Summary (2)
Proteomic Pattern Analysis for Detection of Prostate Cancer
- The system consists of three parts: feature selection by STFS, classification by PNN/RBFNN, and optimization and evaluation by minimum ROC distance
- With sensitivity 97.1% and specificity 96.8%, it would be an asset for detecting prostate cancer early and accurately, and for preventing a large number of aging men from undergoing unnecessary prostatic biopsies
- The suggestion of mistaken labelling by prostatic biopsy through pattern analysis may lead to a novel direction in the diagnostic research of prostate cancer
Slide 33: Thanks for your time