Support Vector Machine: Introduction and Applications
1
Support Vector Machine: Introduction and
Applications
  • Jung-Ying Wang 12/2/2006

2
Outline
  • Basic concept of SVM
  • SVC formulations
  • Kernel function
  • Model selection (tuning SVM hyperparameters)
  • SVM applications

3
Introduction
  • Learning
  • supervised learning (classification)
  • unsupervised learning (clustering)
  • Data classification
  • training
  • testing

4
Basic Concept of SVM
  • Consider the linearly separable case
  • Training data: two classes

5
(No Transcript)
6
Decision Function
  • f(x) > 0 → class 1
  • f(x) < 0 → class 2
  • How to find good w and b?
  • There are many possible choices of (w, b); a small sketch of evaluating f follows
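
As a minimal sketch (not part of the original slides), evaluating a linear decision function and thresholding it at zero looks like this; the weight vector, bias, and test point are arbitrary example values.

    import numpy as np

    def predict_class(w, b, x):
        """Linear decision function f(x) = w.x + b; its sign picks the class."""
        f = np.dot(w, x) + b
        return 1 if f > 0 else 2   # f(x) > 0 -> class 1, f(x) < 0 -> class 2

    w = np.array([1.0, -2.0])          # example weight vector
    b = 0.5                            # example bias
    print(predict_class(w, b, np.array([3.0, 1.0])))   # -> 1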

7
Support Vector Machines
  • a promising technique for data classification
  • statistical learning theory: maximize the distance
    (margin) between the two classes
  • linear separating hyperplane

8
  • Maximal margin: the distance between the two classes

9
Questions?
  • 1. Linearly non-separable case
  • 2. How to solve for w and b?
  • 3. Is this (w, b) good?
  • 4. Multi-class case

10
Method to Handle the Non-separable (Nonlinear) Case
  • map the input data into a higher-dimensional
    feature space

11
Example
12
  • Find a linear separating hyperplane

13
  • Questions
  • 1. How to choose φ?
  • 2. Is it really better? Yes.
  • Sometimes, even in a high-dimensional space, the data
    may still not be separable.
  • ⇒ allow training error

14
example
  • non-linear curves in the input space correspond to a
    linear hyperplane in a high-dimensional (feature) space

15
SVC formulations (the soft-margin hyperplane)
  • Expectation: if the data are separable, all slack variables
    can be zero (the standard form is written out below)
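
For reference, the standard soft-margin primal is usually written as

    \min_{w,\,b,\,\xi} \;\; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i
    \text{subject to} \;\; y_i \big( w^{T} \phi(x_i) + b \big) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, l,

where C > 0 trades off the margin width against the total training error, and all \xi_i = 0 is feasible whenever the mapped data are separable.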

16
If f is convex, x is optimal (KKT conditions)
17
How to solve an optimization problem with
constraints? Use Lagrange multipliers
  • Given an optimization problem

18
Why is the dual better than the primal?
  • Consider the following primal problem
  • (P) variables: w has the dimensionality of φ(x) (a very
    large number), b is a scalar, ξ has l components
  • (D) variables: l
  • Derive its dual.

19
Derive the Dual: the primal Lagrangian for the problem is
formed, and the corresponding dual is found by
differentiating with respect to w, ξ, and b.
20
Resubstituting the relations obtained into the primal gives
the following adaptation of the dual objective function;
maximizing this objective over the multipliers α is
equivalent to maximizing the standard dual form written out below.
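
For reference (the slide's own equations were not captured in this transcript), the resulting dual takes the standard form

    \max_{\alpha} \;\; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j\, y_i y_j\, \phi(x_i)^{T} \phi(x_j)
    \text{subject to} \;\; 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0,

so the objective depends on \phi only through the inner products \phi(x_i)^{T}\phi(x_j), which is exactly what a kernel function supplies.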
21
(No Transcript)
22
  • The primal and dual problems have the same KKT
    conditions
  • Primal: the number of variables is very large (a shortcoming)
  • Dual: l variables
  • High-dimensional inner products φ(x_i)ᵀφ(x_j)
  • Reduce the computational time
  • For special φ, this inner product can be efficiently
    calculated.

23
Kernel function
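
The slide body was a figure; as an illustrative sketch (function names and parameter values below are just examples), the two kernels most often used with SVMs can be evaluated directly, so the mapping φ never has to be formed explicitly:

    import numpy as np

    def rbf_kernel(x, z, gamma=0.5):
        """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def poly_kernel(x, z, degree=2, coef0=1.0):
        """Polynomial kernel K(x, z) = (x.z + coef0)^degree."""
        return (np.dot(x, z) + coef0) ** degree

    x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
    print(rbf_kernel(x, z), poly_kernel(x, z))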
24
(No Transcript)
25
Model selection (Tuning SVM hyperparameters)
  • Cross-validation can avoid overfitting
  • Example: 10-fold cross-validation; the l data points are
    split into 10 groups, and each time 9 groups are used as
    training data and 1 group as test data.
  • LOO (leave-one-out)
  • cross-validation with l groups; each time (l-1) points
    are used for training and 1 for testing.

26
Model Selection
  • The most commonly used method for model selection is
    the grid method (a sketch follows)
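
A minimal sketch of the grid method, assuming an RBF-kernel SVM and the scikit-learn library (neither is prescribed by the slide; the data set and grid ranges are placeholders): each (C, γ) pair on a log-scaled grid is scored by cross-validation and the best pair is kept.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)           # stand-in data set
    best = (None, None, 0.0)
    for C in [2.0 ** k for k in range(-5, 16, 2)]:        # coarse log-scaled grid
        for gamma in [2.0 ** k for k in range(-15, 4, 2)]:
            acc = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=10).mean()
            if acc > best[2]:
                best = (C, gamma, acc)
    print("best C=%g gamma=%g CV accuracy=%.3f" % best)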

27
Model Selection of SVMs Using GA Approach
  • Peng-Wei Chen, Jung-Ying Wang  and Hahn-Ming
    Lee  2004 IJCNN International Joint Conference
    on Neural Networks, 26 - 29 July 2004.
  • Abstract: A new automatic search methodology for
    model selection of support vector machines, based
    on a GA-based tuning algorithm, is proposed to
    search for adequate hyperparameters of SVMs.

28
Model Selection of SVMs Using GA Approach
  • Procedure GA-based Model Selection Algorithm
  • Begin
  • Read in dataset
  • Initialize hyperparameters
  • While (not termination condition) do
  • Train SVMs
  • Estimate the generalization error
  • Create hyperparameters by the tuning algorithm
  • End while
  • Output the best hyperparameters
  • End

29
Experiment Setup
  • The initial population is selected at random and
    the chromosome consists of one string of bits
    with fixed length 20.
  • Each bit can have the value 0 or 1.
  • The first 10 bits encode the integer value of C,
    and the remaining 10 bits encode the decimal value of σ.
  • A suggested population size of N = 20 is used
  • A crossover rate of 0.8 and a mutation rate of 1/20 =
    0.05 are chosen (see the decoding sketch below)
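
A small sketch of how such a 20-bit chromosome might be decoded, assuming the first 10 bits give an integer C and the last 10 bits give a fractional σ; the exact encoding used in the paper is not spelled out on the slide, so this is only an illustration.

    import random

    def decode(chromosome):
        """chromosome: list of 20 bits -> (C, sigma)."""
        c_bits, s_bits = chromosome[:10], chromosome[10:]
        C = sum(bit << i for i, bit in enumerate(reversed(c_bits)))               # integer in [0, 1023]
        sigma = sum(bit << i for i, bit in enumerate(reversed(s_bits))) / 1023.0  # decimal in [0, 1]
        return C, sigma

    random.seed(0)
    chromosome = [random.randint(0, 1) for _ in range(20)]
    print(decode(chromosome))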

30
SVM Application: Breast Cancer Diagnosis
Software: WEKA
31
Coding for Weka
  • @relation breast_training
  • @attribute a1 real
  • @attribute a2 real
  • @attribute a3 real
  • @attribute a4 real
  • @attribute a5 real
  • @attribute a6 real
  • @attribute a7 real
  • @attribute a8 real
  • @attribute a9 real
  • @attribute class {2,4}

32
Coding for Weka
  • @data
  • 5,1,1,1,2,1,3,1,1,2
  • 5,4,4,5,7,10,3,2,1,2
  • 3,1,1,1,2,2,3,1,1,2
  • 6,8,8,1,3,4,3,7,1,2
  • 8,10,10,7,10,10,7,3,8,4
  • 8,10,5,3,8,4,4,10,3,4
  • 10,3,5,4,3,7,3,5,3,4
  • 6,10,10,10,10,10,8,10,10,4
  • 1,1,1,1,2,10,3,1,1,2
  • 2,1,2,1,2,1,3,1,1,2
  • 2,1,1,1,2,1,1,1,5,2

33
Running Results Using the Weka 3.3.6 Predictor
Support Vector Machines (in Weka, the Sequential
Minimal Optimization, SMO, algorithm)
Weka SMO result for the 400 training data
34
Weka SMO result for 283 test data
35
Software and Model Selection
  • software: LIBSVM
  • mapping function: the Radial Basis Function (RBF) kernel
  • find the best penalty parameter C and kernel parameter g
  • use cross-validation to do the model selection

36
LIBSVM Model Selection Using the Grid Method
  • -c 1000 -g 10: 3-fold accuracy 69.8389
  • -c 1000 -g 1000: 3-fold accuracy 69.8389
  • -c 1 -g 0.002: 3-fold accuracy 97.0717 (winner)
  • -c 1 -g 0.004: 3-fold accuracy 96.9253
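
For reference, a hedged sketch of reproducing such a run with LIBSVM's Python interface; the file name is hypothetical, and with the '-v 3' option svm_train returns the 3-fold cross-validation accuracy instead of a model.

    # classic LIBSVM ships this as python/svmutil.py; the pip package uses `from libsvm.svmutil import ...`
    from svmutil import svm_read_problem, svm_train

    y, x = svm_read_problem('breast_training.txt')        # hypothetical file in LIBSVM format
    for c, g in [(1000, 10), (1000, 1000), (1, 0.002), (1, 0.004)]:
        acc = svm_train(y, x, '-c %g -g %g -v 3 -q' % (c, g))
        print('C=%g  gamma=%g  3-fold accuracy=%.4f' % (c, g, acc))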
37
Coding for LIBSVM
2 1:2 2:3 3:1 4:1 5:5 6:1 7:1 8:1 9:1
2 1:3 2:2 3:2 4:3 5:2 6:3 7:3 8:1 9:1
4 1:10 2:10 3:10 4:7 5:10 6:10 7:8 8:2 9:1
2 1:4 2:3 3:3 4:1 5:2 6:1 7:3 8:3 9:1
2 1:5 2:1 3:3 4:1 5:2 6:1 7:2 8:1 9:1
2 1:3 2:1 3:1 4:1 5:2 6:1 7:1 8:1 9:1
4 1:9 2:10 3:10 4:10 5:10 6:10 7:10 8:10 9:1
2 1:5 2:3 3:6 4:1 5:2 6:1 7:1 8:1 9:1
4 1:8 2:7 3:8 4:2 5:4 6:2 7:5 8:10 9:1
38
Summary
39
Summary
40
Multi-class SVM
  • one-against-all method
  • k SVM models (k = the number of classes)
  • the ith SVM is trained with all examples in the ith
    class as positive and all others as negative
  • one-against-one method
  • k(k-1)/2 classifiers, each trained on data from two
    classes (a voting sketch follows)
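
A compact sketch of the one-against-one strategy with majority voting; it is illustrative only, building the k(k-1)/2 pairwise classifiers explicitly on top of scikit-learn binary SVMs, which is not how the original experiments were run.

    from itertools import combinations
    import numpy as np
    from sklearn.svm import SVC

    def one_vs_one(X, y, X_test):
        """Train k(k-1)/2 binary SVMs and predict by majority voting."""
        classes = np.unique(y)
        votes = np.zeros((len(X_test), len(classes)), dtype=int)
        index = {c: i for i, c in enumerate(classes)}
        for a, b in combinations(classes, 2):
            mask = (y == a) | (y == b)                  # keep only the two classes
            model = SVC(kernel='rbf').fit(X[mask], y[mask])
            for row, pred in enumerate(model.predict(X_test)):
                votes[row, index[pred]] += 1
        return classes[votes.argmax(axis=1)]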

41
SVM Application in Bioinformatics
  • SVM-Cabins: Prediction of Solvent Accessibility
    Using Accumulation Cutoff Set and Support Vector
    Machine
  • Prediction of protein secondary structure
  • SVM application in protein fold assignment

42
Solvent Accessibility
  • Water molecules can touch residues at the surface of a
    protein
  • Prediction of the solvent-accessible surface area
    helps us understand the complete tertiary
    structure of proteins

43
Motivation
  • Traditionally, ASA prediction has been treated as a
    binary or multi-class classification problem, but the
    arbitrary choice of cutoff thresholds is a problem and
    a drawback when developing a prediction system.
  • To overcome this, many statistical, regression, and
    machine learning methods have recently been proposed
    to predict the real values of solvent accessibility.
  • We want to propose a novel method for real-value
    prediction of solvent accessibility

44
Related Methods
  • Statistical information (Wang et al., 2004)
  • Multiple linear regression (Wang et al., 2005)
  • Neural networks (Ahmad et al., 2003; Garg et al., 2005)
  • Neural-network-based regression (Adamczak et al., 2004)
  • Support vector regression (Yuan and Huang, 2004)

45
Data Sets
  • Rost and Sander data set (RS126)
  • Cuff and Barton data set (CB502)

46
Evolutionary Information
  • We use BLASTP to generate multiple sequence
    alignments of the proteins
  • An expectation value (E-value) of 0.01 is used, and the
    non-redundant protein sequence database (NCBI nr
    database) is searched.
  • The alignments are represented as profiles, or
    position-specific substitution matrices (PSSM)

47
Coding Scheme
  • A moving window of 13 neighboring residues is used, and
    each position of the window has 22 possible values
  • The data obtained from the PSSM, which include the 20
    amino acid substitution scores, indel (insertion and
    deletion), and entropy, are directly used as input to
    our algorithm.
  • The prediction is made for the central residue in the
    window frame (a coding sketch follows)
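
A minimal sketch of the window coding, assuming the per-residue profile is already an L x 22 matrix (20 substitution scores plus indel and entropy); positions that fall outside the chain are zero-padded here, which is an assumption rather than the paper's stated choice.

    import numpy as np

    def window_features(profile, center, width=13):
        """profile: (L, 22) array; returns the flattened width*22 feature
        vector for the residue at `center`, zero-padding past the chain ends."""
        L, d = profile.shape
        half = width // 2
        rows = [profile[p] if 0 <= p < L else np.zeros(d)
                for p in range(center - half, center + half + 1)]
        return np.concatenate(rows)

    profile = np.random.rand(30, 22)          # toy profile for a 30-residue chain
    print(window_features(profile, 0).shape)  # (286,) = 13 * 22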

48
Intuitive Idea
  • Use a multi-class classifier to assign a real
    value to the test datum

49
Two Problems
  • The performance of SVM is poor when the number
    of labeled positive data is small.
  • This is mainly because the optimal hyperplane of the
    SVM may be biased when the positive data are much
    fewer than the negative data.
  • Three traditional approaches to solve this
  • Crisp set problem

50
Algorithm to Transfer N Binary-class SVM Models
to Real Values of Solvent Accessibility
  • We construct 13 accumulation cabins from the two
    end-points of the ASA real-value range.
  • They are [0, 0], [0, 5], [0, 10], [0, 20], [0, 30],
    [0, 40], [0, 50], [50, 100], [60, 100], [70, 100],
    [80, 100], [90, 100], and [100, 100]

51
SVM Model Selection
52
Accuracy for Each Binary-class SVM Model
53
The Main Output Vector Patterns (over 97%) for the
13 Binary-class SVM Models
54
Algorithm to Assign the Prediction Result
55
An Example
  • For example, consider the vector 1110000111111
  • Four binary-class SVM models predict it as belonging
    to the positive class, namely [0, 20], [0, 30],
    [0, 40], and [0, 50]
  • We can infer that this datum must be inside the
    cabin range [0, 20].
  • We can get the same result by taking the
    intersection of the above four contiguous
    positive cabin ranges

56
An Example
  • The test datum should not be included inside the
    nine cabin ranges [0, 0], [0, 5], [0, 10],
    [50, 100], [60, 100], [70, 100], [80, 100],
    [90, 100], and [100, 100]
  • So we can further infer that the datum should be
    inside the cabin range [10, 20]
  • We can also use the set difference between the
    cabin range [0, 20] and the above nine negative
    cabin ranges to get the result [10, 20].
  • Finally, we use the middle point, 15, of the cabin
    range [10, 20] as our real-valued ASA prediction
    (see the sketch below).
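
A sketch of the interval reasoning in this example; the cabin list follows slide 50, and the function simply takes the positive and negative cabins as inputs (how they are read off the 13-bit vector is not restated here, so that step is left out).

    def assign_asa(positive, negative):
        """Intersect the positive cabins, trim away negative cabins that
        overlap an end of the interval, and return the midpoint as the ASA."""
        lo, hi = 0.0, 100.0
        for a, b in positive:                  # intersection of positive ranges
            lo, hi = max(lo, a), min(hi, b)
        for a, b in negative:                  # trim ends covered by negative ranges
            if a <= lo <= b:
                lo = max(lo, b)
            if a <= hi <= b:
                hi = min(hi, a)
        return (lo + hi) / 2.0

    positive = [(0, 20), (0, 30), (0, 40), (0, 50)]
    negative = [(0, 0), (0, 5), (0, 10), (50, 100), (60, 100), (70, 100),
                (80, 100), (90, 100), (100, 100)]
    print(assign_asa(positive, negative))      # 15.0, as in the worked example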

57
Algorithm to Assign the Prediction Result
  • Auto-correction of one-bit errors
  • Because each binary-class SVM has at least 77%
    accuracy, when a 1 appears inside the contiguous
    block of 0s we have the confidence to correct it.
  • About 1.5% of the test data have vector patterns
    belonging to the case of a one-bit error inside
    two contiguous 0s.

58
Algorithm to Assign the Prediction Result
  • For the last 1.5% of test data patterns, we could use
    our previous methods, including a look-up table or
    multiple linear regression, to assign their real
    values of ASA.
  • But here we use the simplest way to assign the ASA
    values: using the ASA of individual residues, we
    evaluate an average ASA in our experimental data set
    and then assign this average to a new residue as the
    prediction.
  • For example, the average ASA for Alanine (A)
    residues in the Barton502 data set was 22.8,
    which could then be assigned to all Alanine
    residues in the last 1.5% of test data.

59
Validation Method
  • Seven-fold cross validation of results was
    carried out for RS126 data set.
  • Five-fold cross validation of results was carried
    out for CB502 data set.

60
Assessment of Prediction Performance
  • Mean absolute error (MAE)
  • Correlation coefficient between the predicted and
    experimental values of ASA
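
The two measures (shown only as formulas on the slide) are commonly defined as

    \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \mathrm{ASA}^{\text{pred}}_i - \mathrm{ASA}^{\text{exp}}_i \right|

and the Pearson correlation coefficient

    r = \frac{\sum_i (p_i - \bar{p})(e_i - \bar{e})}{\sqrt{\sum_i (p_i - \bar{p})^2 \, \sum_i (e_i - \bar{e})^2}},

where p_i and e_i are the predicted and experimental ASA values of residue i.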

61
Results and Discussion
62
Results and Discussion
63
Results and Discussion
Table 2. Variation in prediction error for
different ranges of ASA
64
Table 3. Mean absolute error for different amino
acid types (all values are in percentage scale)
65
Table 4. Effect of Protein Length on Mean
Absolute Error ( number of protein chains)
66
Table 5. Comparison with other real value
prediction methods
67
Introduction to Secondary Structure
  • The prediction of protein secondary structure is
    an important step to determine structural
    properties of proteins.
  • The secondary structure consists of local folding
    regularities maintained by hydrogen bonds and is
    traditionally subdivided into three classes:
    alpha-helices, beta-sheets, and coil.

68
(No Transcript)
69
The Secondary Structure Prediction Task
70
Coding Example: Protein Secondary Structure
Prediction
  • given an amino-acid sequence
  • predict a secondary-structure state
  • (α, β, coil) for each residue in the sequence
  • coding: consider a moving window of n
    (typically 13-21) neighboring residues
  • FGWYALVLAMFFYOYQEKSVMKKGD

71
Methods
  • statistical information (Figureau et al., 2003;
    Yan et al., 2004)
  • neural networks (Qian and Sejnowski, 1988; Rost
    and Sander, 1993; Pollastri et al., 2002; Cai et
    al., 2003; Kaur and Raghava, 2004; Wood and
    Hirst, 2004; Lin et al., 2005)
  • nearest-neighbor algorithms
  • hidden Markov models
  • support vector machines (Hua and Sun, 2001;
    Hyunsoo and Haesun, 2003; Ward et al., 2003; Guo
    et al., 2004)

72
Milestone
  • In 1988, neural networks first achieved about 62%
    accuracy (Qian and Sejnowski, 1988; Holley and
    Karplus, 1989).
  • In 1993, using evolutionary information, a neural
    network system improved the prediction accuracy to
    over 70% (Rost and Sander, 1993).
  • Recently, there have been approaches using neural
    networks (e.g. Baldi et al., 1999; Petersen et al.,
    2000; Pollastri and McLysaght, 2005) that achieve
    even higher accuracy (> 78%).

73
Benchmark (Data Set Used in Protein Secondary
Structure)
  • Rost and Sander data set (Rost and Sander, 1993)
    (referred to as RS126)
  • Note that the RS126 data set consists of 25,184
    data points in three classes, where 47% are coil,
    32% are helix, and 21% are strand.
  • Cuff and Barton data set (Cuff and Barton, 1999)
    (referred to as CB513)
  • The prediction accuracy is verified by 7-fold
    cross-validation.

74
Secondary Structure Assignment
  • According to the DSSP (Dictionary of Secondary
    Structures of Proteins) algorithm (Kabsch and
    Sander, 1983), which distinguishes eight
    secondary structure classes
  • We converted the eight types into three classes
    in the following way: H (α-helix), I (π-helix),
    and G (3₁₀-helix) as helix; E (extended strand)
    as β-strand; and all others as coil. A sketch of
    this mapping follows.
  • Different conversion methods influence the
    prediction accuracy to some extent, as discussed
    by Cuff and Barton (Cuff and Barton, 1999).
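
A direct sketch of the conversion rule just described (the letters H/E/C stand for the three classes):

    def dssp_to_three_state(dssp_string):
        """Map the eight DSSP classes to three: H/I/G -> helix (H),
        E -> strand (E), everything else -> coil (C)."""
        return "".join("H" if s in "HIG" else "E" if s == "E" else "C"
                       for s in dssp_string)

    print(dssp_to_three_state("HHHHGGTT EEEEE SS"))   # -> HHHHHHCCCEEEEECCC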

75
Assessment of Prediction Accuracy
  • Overall three-state accuracy Q3 (Qian and
    Sejnowski, 1988; Rost and Sander, 1993). Q3 is
    calculated by the formula below.
  • N is the total number of residues in the test
    data sets, and q_s is the number of residues of
    secondary structure type s that are predicted
    correctly.
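
Written out, with q_H, q_E, and q_C the numbers of correctly predicted helix, strand, and coil residues:

    Q_3 = 100 \times \frac{q_H + q_E + q_C}{N}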

76
Assessment of Prediction Accuracy
  • A more complicated measure of accuracy is the
    Matthews correlation coefficient (MCC),
    introduced in (Matthews, 1975); it is computed per
    class as shown below.
  • TP_i, TN_i, FP_i, and FN_i are the numbers of true
    positives, true negatives, false positives, and
    false negatives for class i, respectively. Clearly,
    a higher MCC is better.
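
    \mathrm{MCC}_i = \frac{TP_i \cdot TN_i - FP_i \cdot FN_i}{\sqrt{(TP_i + FP_i)(TP_i + FN_i)(TN_i + FP_i)(TN_i + FN_i)}}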

77
Support Vector Machines Predictor
  • Using the software LIBSVM (Chang and Lin, 2005)
    as our SVM predictor
  • Using the RBF kernel for all experiments
  • Choosing optimal parameters for the support vector
    machines: the pair C = 10 and γ = 0.01 achieves
    the best prediction rate

78
Coding Scheme
79
Coding Scheme for Support Vector Machines
  • We use BLASTP to generate the alignments
    (profiles) of the proteins in our database
  • An expectation value (E-value) of 10.0 is used, and
    the non-redundant protein sequence database (NCBI nr
    database) is searched.
  • The profile data obtained from BLASTPGP are
    normalized to [0, 1] and then used as inputs to our
    SVM predictor.

80
Example PSI-BLAST output header: "Last position-specific
scoring matrix computed, weighted observed percentages
rounded down, information per position, and relative
weight of gapless real matches to pseudocounts."
[PSSM excerpt for the first 12 query residues (R, T, D, C,
Y, G, N, V, N, R, I, D): each row lists the 20 substitution
scores over the columns A R N D C Q E G H I L K M F P S T W
Y V, the 20 weighted observed percentages, the information
per position, and the relative weight.]
81
Results
  • Results for the ROST126 protein set
  • (Using the seven-fold cross validation)

82
Results
  • Results for the CB513 protein set
  • (Using the seven-fold cross validation)

83
SVM Application in Protein Fold Assignment
  • "Fine-grained Protein Fold Assignment by Support
    Vector Machines using generalized n-peptide
    Coding Schemes and jury voting from multiple
    parameter sets",
  • Chin-Sheng Yu, Jung-Ying Wang, Jin-Moon Young,
    P.-C. Lyu, Chih-Jen Lin, Jenn-Kang Hwang,
  • Proteins: Structure, Function, and Genetics, 50,
    531-536 (2003).

84
Data Sets
  • The Ding and Dubchak data set, which consists of 386
    proteins from the 27 most populated SCOP folds, in
    which protein pairs have sequence identity below 35%
    for aligned subsequences longer than 80 residues.
  • These 27 protein folds cover most major
    structural classes, and each has at least 7
    proteins.

85
Coding Scheme
  • We denote the coding scheme by X if all 20 amino
    acids are used
  • X′ when the amino acids are classified into four
    groups: charged, polar, aromatic, and nonpolar
  • X, if predicted secondary structures are used
  • We assign the symbol X the values D, T, Q, and P,
    denoting the distributions of dipeptides,
    3-peptides, and 4-peptides, respectively (a
    dipeptide-composition sketch follows this list).
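
As an illustration of the simplest of these schemes, the dipeptide distribution D over the full 20-letter alphabet X can be built as a 400-dimensional frequency vector; the helper below is only a sketch, not the paper's exact procedure.

    from itertools import product

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def dipeptide_composition(sequence):
        """Return the 400-dimensional relative-frequency vector of all
        amino-acid pairs occurring at consecutive positions."""
        pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
        counts = dict.fromkeys(pairs, 0)
        for i in range(len(sequence) - 1):
            if sequence[i:i + 2] in counts:
                counts[sequence[i:i + 2]] += 1
        total = max(len(sequence) - 1, 1)
        return [counts[p] / total for p in pairs]

    print(len(dipeptide_composition("FGWYALVLAMFFY")))   # 400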

86
Methods
87
Results
88
Results
89
Results
90
Results
91
Structure Example: Jury SVM Predictor
92
Structure Example: SVM Combiner
93
  • Thanks!