Prediction model building and feature selection with SVM in breast cancer diagnosis

Provided by: Tul56
1
Prediction model building and feature selection
with SVM in breast cancer diagnosis
  • Cheng-Lung Huang, Hung-Chang Liao, Mu-Chen Chen

Expert Systems with Applications 2008
2
Introduction
  • Breast cancer is a serious problem for the young
    women of Taiwan.
  • Almost 64.1% of women with breast cancer are
    diagnosed before the age of 50, and 29.3% of women
    with breast cancer are diagnosed before the age
    of 40.
  • However, the causes are still unknown.

3
Introduction
  • A study (Ziegler et al., 1993) shows that
    fibroadenoma shares some risk factors with breast
    cancer.
  • HSV-1 (herpes simplex virus type 1)
  • EBV (Epstein-Barr virus)
  • CMV (cytomegalovirus)
  • HPV (human papillomavirus)
  • HHV-8 (human herpesvirus-8)

4
Introduction
  • DNA viruses are closely related to human cancers
    and are considered high-risk factors.
  • To examine the relationship between DNA viruses
    and breast tumors, this paper uses support vector
    machines (SVM) to find the pertinent
    bioinformatics.

5
Two Important Challenges
  • When using SVM, two problems arise:
  • How to choose the optimal input feature subset
    for SVM.
  • How to set the best kernel parameters.
  • These two problems are crucial because the
    feature subset choice influences the appropriate
    kernel parameters and vice versa.

6
Feature Selection
  • Feature selection is an important issue in
    building classification systems.
  • It is advantageous to limit the number of input
    features in a classifier in order to obtain a
    model that predicts well and is less
    computationally intensive.
  • This study uses F-score calculation to select
    input features.

7
F-Score
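For reference, a widely used definition of the F-score of feature i in a two-class problem is given below. It is assumed here, since the transcript does not reproduce the slide's formula. The symbols are illustrative: \bar{x}_i, \bar{x}_i^{(+)} and \bar{x}_i^{(-)} are the means of feature i over all, positive and negative samples, and n_+ and n_- are the class sizes.

```latex
F(i) = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}
            {\dfrac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2
           + \dfrac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2}
```

The numerator measures how far the class means lie from the overall mean, and the denominator is the sum of the within-class sample variances, so a larger F(i) suggests a more discriminative feature.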
8
F-Score Algorithm
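A minimal sketch of F-score feature ranking follows. It assumes the common two-class definition (between-class separation of the means divided by the sum of within-class sample variances). The function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Sample variances (divisor n - 1). A feature that is constant within
    # both classes gives a zero denominator and needs separate handling.
    within = variance(pos) + variance(neg)
    return between / within


def rank_features(samples, labels):
    """Return (feature_index, F-score) pairs, most discriminative first."""
    scores = []
    for i in range(len(samples[0])):
        pos = [row[i] for row, y in zip(samples, labels) if y == 1]
        neg = [row[i] for row, y in zip(samples, labels) if y == 0]
        scores.append((i, f_score(pos, neg)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
samples = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
labels = [0, 0, 1, 1]
ranking = rank_features(samples, labels)
```

Feature subsets can then be formed by taking the top-ranked features in order, which matches the idea of comparing subsets of increasing size.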
9
Parameters Optimization
  • To design an SVM, one must choose a kernel
    function, set the kernel parameters, and determine
    a soft margin constant C.
  • Grid search is one way to find the best C and
    gamma when using the RBF kernel function.
  • This study uses grid search to find the best SVM
    model parameters.

10
Grid-Search Algorithm
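A minimal sketch of a grid search over (C, gamma) follows. The exponential ranges are an assumption (a range often suggested for RBF kernels, not stated in the slides), and `toy_score` is a hypothetical placeholder. In practice the score would be the cross-validation accuracy of an RBF-kernel SVM trained with the given C and gamma.

```python
from itertools import product


def exponential_grid(low_exp, high_exp, step):
    """Candidate values 2**low_exp, 2**(low_exp + step), ..., up to 2**high_exp."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1, step)]


def grid_search(score, c_values, gamma_values):
    """Evaluate score(C, gamma) at every grid point and return the best (score, C, gamma)."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        s = score(c, gamma)
        if best is None or s > best[0]:
            best = (s, c, gamma)
    return best


def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validation accuracy, peaking at C=8, gamma=0.5."""
    return -((c - 8.0) ** 2) - (gamma - 0.5) ** 2


c_values = exponential_grid(-5, 15, 2)       # 2^-5, 2^-3, ..., 2^15 (assumed range)
gamma_values = exponential_grid(-15, 3, 2)   # 2^-15, 2^-13, ..., 2^3 (assumed range)
best_score, best_c, best_gamma = grid_search(toy_score, c_values, gamma_values)
```

Because every grid point is evaluated independently, the search is easy to parallelize, at the cost of one full cross-validation run per (C, gamma) pair.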
11
Data collection
  • The data set consists of 80 data points (tissue
    samples):
  • 52 specimens of non-familial invasive ductal
    breast cancer.
  • 28 mammary fibroadenomas.
  • (From Chung-Shan Medical University Hospital)

12
Data partition
  • The data set is randomly partitioned into
    training and independent testing sets via
    stratified 5-fold cross-validation.
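A minimal sketch of a stratified 5-fold partition follows, using the class sizes from the data collection slide (52 cancer, 28 fibroadenoma). The label coding (1 for cancer, 0 for fibroadenoma), the seed, and the function name are assumptions for illustration, and the study's exact partitioning procedure may differ.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that each keep roughly the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


labels = [1] * 52 + [0] * 28   # assumed coding: 1 = cancer, 0 = fibroadenoma
folds = stratified_folds(labels)

# Each fold serves once as the independent test set; the rest form the training set.
splits = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    splits.append((train_idx, test_idx))
```

Stratifying keeps about 10 to 11 cancer samples and 5 to 6 fibroadenoma samples in every test fold, so no fold is dominated by one class.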

13
SVM-based parameter optimization and feature
selection
14
The relative feature importance with F-score
15
The relative importance of DNA virus based on the
F-score
16
The five feature subsets based on the F-score
17
Overall training and testing accuracy for each
feature subset
18
Type I and type II errors
  • Type I error (the "false positive"): the error
    of rejecting the null hypothesis given that it is
    actually true.
  • Type II error (the "false negative"): the error
    of failing to reject the null hypothesis given
    that the alternative hypothesis is actually true.
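As a sketch of how these two error types are counted for a classifier, the snippet below computes type I and type II error rates from true and predicted labels. Treating label 1 as the positive class is an assumption, and the function name and toy labels are hypothetical.

```python
def error_rates(y_true, y_pred, positive=1):
    """Return type I (false positive) and type II (false negative) rates plus accuracy."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1          # predicted positive, actually negative
        elif truth == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1
    return {
        "type_I": fp / (fp + tn),     # share of true negatives flagged as positive
        "type_II": fn / (fn + tp),    # share of true positives that were missed
        "accuracy": (tp + tn) / len(y_true),
    }


# Toy labels (hypothetical): one false negative and one false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
rates = error_rates(y_true, y_pred)
```

Reporting both rates alongside overall accuracy shows whether a model's mistakes fall mainly on one class.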

19
Detailed testing accuracy for feature subsets of
size 2 and 3
20
Linear discriminant analysis (LDA)
  • Originally developed in 1936 by R.A. Fisher,
    discriminant analysis is a classic method of
    classification.
  • Discriminant analysis can be used only for
    classification.
  • Linear discriminant analysis finds a linear
    transformation ("discriminant function") of the
    two predictors, X and Y, that yields a new set of
    transformed values that provides a more accurate
    discrimination than either predictor alone.
  • Transformed Target = C1*X + C2*Y
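A minimal sketch of how the coefficients C1 and C2 of a two-predictor discriminant function can be obtained follows. It uses Fisher's criterion with the pooled within-class scatter matrix, which is an assumption about the variant meant here. The function name and toy points are hypothetical.

```python
def fisher_lda_2d(class_a, class_b):
    """Return (c1, c2) so that c1*X + c2*Y separates two classes of (X, Y) points."""

    def centroid(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    ma, mb = centroid(class_a), centroid(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy          # must be nonzero for the inverse to exist
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Coefficients = inverse(scatter) times (difference of class means).
    c1 = (syy * dx - sxy * dy) / det
    c2 = (-sxy * dx + sxx * dy) / det
    return c1, c2


# Toy points (hypothetical), two predictors per sample.
class_a = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.0), (5.0, 1.0)]
class_b = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0), (2.0, 0.0)]
c1, c2 = fisher_lda_2d(class_a, class_b)
scores_a = [c1 * x + c2 * y for x, y in class_a]
scores_b = [c1 * x + c2 * y for x, y in class_b]
```

On this toy data every transformed value of the first class lies above every transformed value of the second, so a single threshold on the transformed target separates them.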

21
The P-level of each attribute for LDA
Selection criterion: P-level value < 0.05
22
Training and testing accuracy for LDA
23
Comparison summary between SVM and LDA
24
Conclusion
  • The aims were to find the correlation between
    DNA viruses and breast tumors and to achieve a
    high classification accuracy.
  • The F-score is adopted to find the important
    features.
  • A grid search approach is used to search for the
    optimal SVM parameters.
  • The results revealed that the SVM-based model has
    good performance in diagnosing breast cancer
    according to our data set.

25
Conclusion
  • The present study's results also show that the
    attribute subsets {HSV-1, HHV-8} and {HSV-1,
    HHV-8, CMV} achieve identical high accuracy, with
    an average overall hit rate of 86%.
  • This study suggests that simultaneously
    considering HSV-1 and HHV-8 is feasible; however,
    considering only HHV-8 or only HSV-1 is less
    accurate.

26
Future Work
  • A practical obstacle of SVM-based classification
    models (as with neural networks) is their
    black-box nature.
  • A possible solution for this issue is the use of
    SVM rule extraction techniques or the use of a
    hybrid SVM model combined with other, more
    interpretable models.

27
  • Thank You