A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns



1
A Significance Test-Based Feature Selection
Method for the Detection of Prostate Cancer from
Proteomic Patterns
  • Qianren (Tim) Xu

M.A.Sc. Candidate
Supervisors: Dr. M. Kamel, Dr. M. M. A. Salama
2
Highlight
Significance Test-Based Feature Selection (STFS)
  • STFS can be used for any supervised pattern
    recognition problem
  • Very good performance has been obtained on
    several benchmark datasets, especially those with
    a large number of features
Proteomic Pattern Analysis for Prostate Cancer
Detection
  • Sensitivity 97.1%, specificity 96.8%
  • Pattern analysis suggests that some samples may
    have been mislabeled by prostatic biopsy

3
Outline of Part I
Significance Test-Based Feature Selection (STFS)
on Supervised Pattern Recognition
  • Introduction
  • Methodology
  • Experiment Results on Benchmark Datasets
  • Comparison with MIFS

4
Introduction
Problems with Features
  • Large number
  • Irrelevant
  • Noise
  • Correlation

Consequences: increased computational complexity and reduced recognition rate
5
Mutual Information Feature Selection
  • One of the most important heuristic feature
    selection methods; it can be very useful in any
    classification system
  • However, the mutual information is difficult to
    estimate for a large number of features and
    classes, and for continuous data
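For reference, the selection criterion of Battiti's MIFS, as it is commonly stated, can be written as below. The notation is an assumption and does not come from the slides: C is the class variable, f a candidate feature, S the set of already-selected features, and I(·;·) the mutual information. β is the parameter that the later slides say must be determined.

```latex
f^{*} = \arg\max_{f \notin S} \Big[ I(C; f) - \beta \sum_{s \in S} I(f; s) \Big]
```

Every mutual-information term requires an estimate of the underlying probability distributions, which is the source of the estimation difficulty noted on this slide.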

6
Problems with Feature Selection Methods
Two key issues
  • Computational complexity
  • Lack of optimality

7
Proposed Method
Criterion of Feature Selection
Significance of feature = Significant difference × Independence
  • Significant difference: pattern separability on
    individual candidate features
  • Independence: noncorrelation between the candidate
    feature and the already-selected features
8
Measurement of Pattern Separability of Individual
Features
Statistically Significant Difference
  • Continuous data with normal distribution:
    t-test (two classes), ANOVA (more than two classes)
  • Continuous data with non-normal distribution or
    rank data: Mann-Whitney test (two classes),
    Kruskal-Wallis test (more than two classes)
  • Categorical data: Chi-square test
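A minimal sketch of the normal-distribution case above, using only the Python standard library. It computes the pooled-variance Student t statistic for two classes and the one-way ANOVA F statistic for any number of classes. The function names are hypothetical, and the slides do not say whether the test statistic or the p-value serves as the significant difference, so treating a larger statistic as "more separable" is an assumption.

```python
from statistics import mean

def student_t(a, b):
    """Pooled-variance (Student) t statistic for one feature in two classes."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def anova_f(groups):
    """One-way ANOVA F statistic for one feature in two or more classes."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Made-up values of one feature in two classes.
class_a = [1, 2, 3, 4]
class_b = [3, 4, 5, 6]
t_value = student_t(class_a, class_b)
f_value = anova_f([class_a, class_b])
```

With exactly two classes the F statistic equals the square of the t statistic, so the two rows of the table agree where they overlap.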
9
Independence
  • Continuous data with normal distribution: Pearson
    correlation
  • Continuous data with non-normal distribution or
    rank data: Spearman rank correlation
  • Categorical data: Pearson contingency coefficient
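The two correlation measures for continuous data can be sketched in plain Python as follows. The slides do not state how a correlation is turned into an independence level; the mapping `1 - |r|` used here is an assumption, and all function names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def independence(r):
    """Assumed mapping from a correlation to an independence level."""
    return 1.0 - abs(r)
```

For a monotonic but nonlinear pair such as x = [1, 2, 3, 4] and y = [1, 4, 9, 16], the Spearman correlation is 1 while the Pearson correlation is slightly below 1, which is why the rank-based measure is paired with non-normal or rank data.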
10
Selecting Procedure
MSDI: Maximum Significant Difference and
Independence Algorithm
MIC: Monotonically Increasing Curve Strategy
11
Maximum Significant Difference and Independence
(MSDI) Algorithm
  1. Compute the significant difference (sd) of every
    initial feature
  2. Select the feature with maximum sd as the first
    feature
  3. Compute the independence level (ind) between
    every candidate feature and the already-selected
    feature(s)
  4. Select the feature with maximum feature
    significance (sf = sd × ind) as the new feature
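The steps above can be sketched as a greedy loop. This is an illustration under stated assumptions, not the author's implementation: the sd scores and the feature-to-feature correlations are taken as precomputed inputs, the independence of a candidate from a set of selected features is assumed to be the smallest `1 - |correlation|` over that set, and the name `msdi_select` is hypothetical.

```python
def msdi_select(sd, corr, n_select):
    """Greedy selection by feature significance sf = sd * ind.

    sd:       list of significant-difference scores, one per feature
    corr:     square matrix (list of lists) of feature correlations
    n_select: number of features to select (at most len(sd))
    """
    # Steps 1-2: the first feature is the one with maximum sd.
    selected = [max(range(len(sd)), key=lambda j: sd[j])]
    while len(selected) < n_select:
        best, best_sf = None, float("-inf")
        for j in range(len(sd)):
            if j in selected:
                continue
            # Step 3 (assumption): worst-case independence from the selected set.
            ind = min(1.0 - abs(corr[j][s]) for s in selected)
            # Step 4: feature significance.
            sf = sd[j] * ind
            if sf > best_sf:
                best, best_sf = j, sf
        selected.append(best)
    return selected

# Made-up example: feature 1 is nearly redundant with feature 0.
sd = [5.0, 4.8, 3.0]
corr = [[1.0, 0.95, 0.1],
        [0.95, 1.0, 0.2],
        [0.1, 0.2, 1.0]]
order = msdi_select(sd, corr, 3)
```

In this toy case feature 2 is chosen before feature 1 even though its sd is lower, because feature 1 adds little beyond the already-selected feature 0.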
12
Monotonically Increasing Curve (MIC) Strategy
  1. Start from the feature subset selected by MSDI
  2. Plot the performance curve
  3. Delete the features that make no good
    contribution to increasing the recognition rate
  4. Repeat until the curve is monotonically increasing
[Figure: performance curve, rate of recognition (0.4 to 1) versus number of features (0 to 30)]
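One possible reading of the MIC strategy is a single forward pass over the MSDI-ordered features that keeps a feature only if adding it raises the recognition rate, so the resulting curve increases by construction. The slide does not spell out the exact deletion rule, so this is a sketch under that assumption; `mic_prune` and the `evaluate` callback (which would train and test a classifier on a feature subset) are hypothetical names.

```python
def mic_prune(ordered_features, evaluate):
    """Keep only the features that strictly increase the recognition rate.

    ordered_features: features in the order produced by MSDI
    evaluate:         function mapping a feature list to a recognition rate
    """
    kept = []
    best_rate = float("-inf")
    for feature in ordered_features:
        rate = evaluate(kept + [feature])
        if rate > best_rate:
            kept.append(feature)
            best_rate = rate
    return kept

# Toy stand-in for a classifier: each feature adds a fixed, made-up gain.
gain = {"a": 0.5, "b": 0.0, "c": 0.2, "d": -0.1}
kept = mic_prune(["a", "b", "c", "d"], lambda subset: sum(gain[f] for f in subset))
```

Here "b" and "d" are dropped because they do not improve the rate reached so far.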
13
Example I: Handwritten Digit Recognition
  • 32-by-32 bitmaps are divided into 8 x 8 = 64 blocks
  • The pixels in each block are counted
  • Thus an 8 x 8 matrix is generated, that is, 64 features
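The feature extraction described above can be sketched in a few lines. The sketch assumes non-overlapping blocks of 4 x 4 pixels (32 / 8 = 4), which the slide implies but does not state, and a bitmap given as a list of 32 rows of 0/1 values; `block_counts` is a hypothetical name.

```python
def block_counts(bitmap, block=4):
    """Count the 'on' pixels in each non-overlapping block of a square bitmap.

    Returns the counts row by row as a flat list (64 values for 32 x 32 input).
    """
    n = len(bitmap) // block
    return [
        sum(bitmap[r][c]
            for r in range(i * block, (i + 1) * block)
            for c in range(j * block, (j + 1) * block))
        for i in range(n)
        for j in range(n)
    ]

# A bitmap with every pixel on gives 64 features, each equal to 16.
all_on = [[1] * 32 for _ in range(32)]
features = block_counts(all_on)
```

Each feature is therefore an integer between 0 and 16.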

14
Performance Curve
MSDI: Maximum Significant Difference and
Independence; MIFS: Mutual Information Feature
Selector
[Figure: rate of recognition (0.4 to 1) versus number of features (0 to 60); labelled curves include Battiti's MIFS and random ranking]
Note: for Battiti's MIFS, β needs to be determined.
15
Computational Complexity
Selecting 15 features from the original set of 64
features
  • MSDI: 24 seconds
  • Battiti's MIFS: 1110 seconds (5 values of β are
    searched in the range 0-1)
16
Example II: Handwritten Digit Recognition
The 649 features are distributed over the
following six feature sets:
  • 76 Fourier coefficients of the character shapes,
  • 216 profile correlations,
  • 64 Karhunen-Loève coefficients,
  • 240 pixel averages in 2 x 3 windows,
  • 47 Zernike moments,
  • 6 morphological features.

17
Performance Curve
MSDI: Maximum Significant Difference and
Independence; MIC: Monotonically Increasing Curve
[Figure: rate of recognition (0 to 1) versus number of features (0 to 50); labelled curve: MSDI]
18
Comparison with MIFS
MSDI: Maximum Significant Difference and
Independence; MIFS: Mutual Information Feature
Selector
[Figure: rate of recognition (0.4 to 1) versus number of features (0 to 50) for MSDI, MIFS (β = 0.2), and MIFS (β = 0.5)]
  • MSDI is much better with a large number of features
  • MIFS is better with a small number of features
19
Summary of Comparing MSDI with MIFS
  • MSDI is much more computationally efficient
  • MIFS needs to calculate the pdfs
  • The computationally efficient criterion (Battiti's
    MIFS) still needs β to be determined
  • MSDI involves only simple statistical
    calculations
  • MSDI can select a more optimal feature subset from
    a large number of features, because it is based on
    relevant statistical models
  • MIFS is more suitable for small volumes of data and
    small feature subsets

20
Outline of Part II
Mass Spectrometry-Based Proteomic Pattern
Analysis for Detection of Prostate Cancer
  • Problem Statement
  • Methods
  • Feature selection
  • Classification
  • Optimization
  • Results and Discussion

21
Problem Statement
15154 points (features)
  1. Very large number of features
  2. Electronic and chemical noise
  3. Biological variability of human disease
  4. Little prior knowledge of the proteomic mass spectrum

22
The System of Proteomic Pattern Analysis
STFS: Significance Test-Based Feature
Selection; PNN: Probabilistic Neural
Network; RBFNN: Radial Basis Function Neural
Network
  1. Training dataset (initial features > 10^4)
  2. Most significant features selected by STFS
  3. Optimization of the size of the feature subset and
    the parameters of the classifier by minimizing the ROC
    distance
  4. RBFNN / PNN learning
  5. Trained neural classifier
  6. Mature classifier
23
Feature Selection: STFS
Significance of feature = Significant difference × Independence
  • Significant difference: Student's t-test
  • Independence: Pearson correlation
Selecting procedure: MSDI and MIC
STFS: Significance Test-Based Feature
Selection; MSDI: Maximum Significant Difference
and Independence Algorithm; MIC: Monotonically
Increasing Curve Strategy
24
Classification: PNN / RBFNN
  • PNN is a standard structure with four layers
  • RBFNN is a modified four-layer structure
[Figure: network diagram; labels include inputs x1, x2, ..., xn, Pool 1 and Pool 2, S1 and S2, and outputs y(1), y(2), y, and yd]
PNN: Probabilistic Neural Network; RBFNN:
Radial Basis Function Neural Network
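As a rough illustration of the four-layer PNN idea, the textbook form can be sketched as follows: an input layer, a pattern layer with one Gaussian kernel per training sample, a summation layer that averages the kernel responses per class, and a decision layer that picks the largest average. This is the generic textbook rule, not necessarily the exact network on the slide, and it does not cover the modified RBFNN. The function name and the toy data are hypothetical.

```python
import math

def pnn_classify(x, train_x, train_y, spread):
    """Classify x with a basic probabilistic neural network rule.

    train_x: list of training vectors; train_y: their class labels
    spread:  Gaussian spread (kernel width) shared by all pattern units
    """
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        # Pattern layer: Gaussian response of one training sample.
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        response = math.exp(-dist2 / (2 * spread ** 2))
        sums[yi] = sums.get(yi, 0.0) + response
        counts[yi] = counts.get(yi, 0) + 1
    # Summation layer: average response per class; decision layer: argmax.
    return max(sums, key=lambda label: sums[label] / counts[label])

# Made-up two-dimensional training data.
train_x = [[0, 0], [0, 1], [5, 5], [5, 6]]
train_y = ["non-cancer", "non-cancer", "cancer", "cancer"]
label = pnn_classify([0.2, 0.3], train_x, train_y, spread=1.0)
```

The spread controls how far the influence of each training sample reaches, which is why it appears among the quantities tuned later in the deck.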
25
Optimization: ROC Distance
Minimizing the ROC distance to optimize:
  • Number of features in the subset, m
  • Gaussian spread σ
  • RBFNN pattern decision weight
[Figure: ROC plane, true positive rate (sensitivity) versus false positive rate (1 - specificity), both from 0 to 1, with the ROC distance dROC and its components a and b marked]
ROC: Receiver Operating Characteristic
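The slide does not give a formula for the ROC distance. A common definition, assumed here, is the Euclidean distance from the classifier's operating point to the ideal corner of the ROC plane (false positive rate 0, true positive rate 1). The sketch below uses that definition to choose among candidate settings; the function names and the candidate numbers are made up for illustration.

```python
import math

def roc_distance(sensitivity, specificity):
    """Distance from the ROC point to the ideal corner (FPR = 0, TPR = 1)."""
    false_positive_rate = 1.0 - specificity
    miss_rate = 1.0 - sensitivity
    return math.hypot(false_positive_rate, miss_rate)

def best_setting(candidates):
    """Return the candidate with the smallest ROC distance.

    candidates: list of (label, sensitivity, specificity) tuples, for
    example one per combination of subset size and Gaussian spread.
    """
    return min(candidates, key=lambda c: roc_distance(c[1], c[2]))

# Hypothetical validation results for three subset sizes.
candidates = [("m=10", 0.90, 0.95), ("m=20", 0.96, 0.97), ("m=30", 0.97, 0.90)]
chosen = best_setting(candidates)
```

Minimizing this distance balances sensitivity against specificity instead of maximizing either one alone.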
26
Results: Sensitivity and Specificity
                    Sensitivity (%)   Specificity (%)
Our results         97.1              96.8
Petricoin (2002)    94.7              75.9
DRE                 55-68             6-33
PSA                 29-80             --
27
Pattern Distribution
[Figure: pattern distribution with the cut-point marked]
28
Possible Causes of the Unrecognizable Samples
  1. The classifier algorithm is not able to
    recognize all the samples
  2. Proteomics is not able to provide enough
    information
  3. Prostatic biopsies mistakenly label some samples

29
Possibility of Mistaken Diagnosis by Prostatic
Biopsy
  • Biopsy has limited sensitivity and specificity
  • The proteomic classifier has very high sensitivity
    and specificity correlated with biopsy
  • The results of the proteomic classifier are not
    exactly the same as those of biopsy
  • All unrecognizable samples are outliers

[Figure: pattern distribution with the cut-point marked and regions labelled true non-cancer, false non-cancer, false cancer, and true cancer]
30
Why Is the Accuracy of Biopsy Limited?
  • Limited sensitivity (for detecting cancer): biopsy
    cannot reach all areas of the prostate, and
    the small sample volume will never exactly
    represent the entire organ. 83.3% for sextant
    biopsies.
  • Limited specificity (for detecting
    non-cancer): biopsy may detect low-volume tumours
    (clinically insignificant prostate cancer) that
    may not threaten a man's future health. 97.3% if
    a cutoff volume > 2 cc is considered as cancer.
31
Summary (1)
Significance Test-Based Feature Selection (STFS)
  • STFS selects features by maximum significant
    difference and independence (MSDI); it aims to
    determine the minimum possible feature subset that
    achieves the maximum recognition rate
  • Feature significance (the selection criterion) is
    estimated based on the optimal statistical models
    in accordance with the properties of the data
  • Advantages: computational efficiency and
    optimality

32
Summary (2)
Proteomic Pattern Analysis for Detection of
Prostate Cancer
  • The system consists of three parts: feature
    selection by STFS, classification by PNN/RBFNN,
    and optimization and evaluation by minimum ROC
    distance
  • Sensitivity 97.1%, specificity 96.8%: it would be
    an asset for detecting prostate cancer early and
    accurately, and for preventing a large number of
    aging men from undergoing unnecessary prostatic
    biopsies
  • The suggestion of mislabeling by prostatic biopsy
    through pattern analysis may lead to a novel
    direction in the diagnostic research of prostate
    cancer

33
Thanks for your time
  • Questions?