Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines


1
Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines
Blaise Gassend, Charles W. O'Donnell, William Thies, Andrew Lee, Marten van Dijk, and Srinivas Devadas
  • Computer Science and Artificial Intelligence
    Laboratory
  • Massachusetts Institute of Technology
  • Workshop on Pattern Recognition in Bioinformatics,
    August 20, 2006

2
Protein Structure Prediction
  • Classical problem: given sequence, predict
    structure
  • High-level approaches
  • 1. Energy-minimization (ab initio) techniques
  • - Elegant, but often lack correct parameters
  • 2. Homology-based techniques
  • - Useful, but hard to predict new proteins

[Figure labels: Sequence, Structure]
Our approach: use energy minimization, but learn
parameters from existing proteins
3
Our Framework (Training)
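[Figure: training loop]
  • Protein Data Bank supplies the amino-acid sequence
    and the correct structure
  • Prediction algorithm uses the energy parameters to
    produce a predicted structure
  • If the predicted structure is correct: done!
  • If it is incorrect: the learning algorithm updates
    the energy parameters
  • Constraints: energy(incorrect) > energy(correct)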
4
Our Framework (Testing)
[Figure: the amino-acid sequence and the energy parameters feed the prediction algorithm, which outputs the predicted structure]
5
Initial Focus: Secondary Structure
  • Classify each residue as alpha helix, beta
    strand, or coil
  • In this paper, restrict to all-alpha proteins
  • Applications
  • Informing tertiary structure predictors
  • Identification of homologous proteins
  • Identification of active sites (coils)

6-12
Secondary Structure Predictors
[Figure, built up over slides 6-12: predictors grouped by input (sequence only vs. alignment) and by method (statistical methods, neural networks, SVMs, HMMs). Size labels shown on the figure: 302 params; 1400-2900 parameters; 680 MB of support vectors; 471 parameters.]
  • Exploits biochemical models
  • Offers biological insight

13
Our Framework Applied to Helix Prediction
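[Figure: the training loop specialized to alpha helices]
  • Protein Data Bank supplies amino-acid sequences with
    their correct alpha-helix structure
    (e.g., sequence MNIFEMLRIDEGL, structure HHHHHHHHH)
  • Prediction algorithm: Hidden Markov Model
  • Learning algorithm: Support Vector Machines
  • If the predicted structure is correct: done!
  • If it is incorrect: the learning algorithm updates
    the energy parameters
  • Constraints: energy(incorrect) > energy(correct)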
14-19
Energy Parameters
302 Total
  • Example
  • Sequence: MNIFELRIDEGL
  • Structure: HHHHHH (over residues FELRID)
  • Energy = $H_F + H_E + H_L + H_R + H_I + H_D$ (Helix)
  • $+\, N_{M,-3} + N_{N,-2} + N_{I,-1} + N_{F,0} + N_{E,1} + N_{L,2} + N_{R,3}$ (N-cap)
  • $+\, C_{L,-3} + C_{R,-2} + C_{I,-1} + C_{D,0} + C_{E,1} + C_{G,2} + C_{L,3}$ (C-cap)
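
The bookkeeping in the example above can be sketched in a few lines of Python. This is only an illustration of how the helix, N-cap, and C-cap terms line up with the sequence. The function names `energy_terms` and `energy`, the use of `-` for non-helical residues, the string form of the parameter names, and the truncation of cap windows at the sequence ends are all assumptions made here, not details from the slides.

```python
def energy_terms(sequence, structure, cap_window=3):
    """Return the names of the energy parameters used by one labeling.

    `structure` uses "H" for helical residues and "-" for everything else
    (an assumed encoding). Each helix contributes one H_x term per residue,
    plus N-cap and C-cap terms for offsets -cap_window..+cap_window around
    its first and last residue.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    terms = []
    n = len(sequence)
    i = 0
    while i < n:
        if structure[i] != "H":
            i += 1
            continue
        start = i
        while i < n and structure[i] == "H":
            i += 1
        end = i - 1  # index of the last helical residue
        # One helix term per residue inside the helix.
        terms += [f"H_{sequence[k]}" for k in range(start, end + 1)]
        # Cap terms, indexed by residue type and offset from the cap position.
        for name, center in (("N", start), ("C", end)):
            for offset in range(-cap_window, cap_window + 1):
                k = center + offset
                if 0 <= k < n:  # assumed: windows are cut off at the ends
                    terms.append(f"{name}_{sequence[k]},{offset}")
    return terms


def energy(sequence, structure, params):
    """Sum the parameter values of all terms; unknown terms count as 0."""
    return sum(params.get(t, 0.0) for t in energy_terms(sequence, structure))


terms = energy_terms("MNIFELRIDEGL", "---HHHHHH---")
print(terms)
```

For the slide's example this lists six helix terms, seven N-cap terms starting at `N_M,-3`, and seven C-cap terms ending at `C_L,3`.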

20-39
Learning the Parameters
[Figure, animated over slides 20-39: feature space with axes #A (# of Alanines in Helices) and #G (# of Glycines in Helices). Points mark the legal structures, the correct structure, the predicted structure, and a structure already predicted; the vector w and a separating hyperplane are drawn on top.]
  • Energy(structure) = $H_A \cdot \#A + H_G \cdot \#G = w \cdot (\#A, \#G)$
  • where w represents the energy parameters $(H_A, H_G)$
  • Highest energy in direction of energy parameters w
  • Steps shown in the animation
  • 1. Predict structure
  • 2. Refine parameters
  • 3. Predict structure
  • 4. Refine parameters
  • 5. Predict structure
  • 6. Refine parameters
  • 7. Predict structure (structure already predicted)
  • 8. Terminate
  • Details in paper
  • - How to converge faster
  • - Early termination condition
  • Tsochantaridis et al., ICML02
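
The predict/refine loop from these slides can be imitated on a toy problem. The sketch below is not the authors' method: the refine step uses a margin-perceptron update as a stand-in for the SVM solve, each structure is represented only by its feature vector (#A, #G), and the names `energy`, `predict`, `refine`, and `learn` are hypothetical. It assumes prediction picks the minimal-energy legal structure and that each wrongly predicted structure adds the constraint energy(incorrect) - energy(correct) >= 1.

```python
def energy(w, phi):
    """Energy of a structure with feature vector phi under parameters w."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def predict(w, legal):
    """Minimal-energy legal structure; ties broken by the feature tuple."""
    return min(legal, key=lambda phi: (energy(w, phi), phi))


def refine(w, working_set, correct, lr=0.1, max_passes=1000):
    """Stand-in for the SVM step: push w until every stored wrong
    prediction has energy at least 1 above the correct structure."""
    w = list(w)
    for _ in range(max_passes):
        violated = False
        for phi in working_set:
            diff = [a - b for a, b in zip(phi, correct)]
            if energy(w, diff) < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
                violated = True
        if not violated:
            break
    return w


def learn(legal, correct, max_rounds=50):
    """Alternate predict and refine until the prediction is correct
    or repeats a structure that was already predicted."""
    w = [0.0] * len(correct)
    working_set = []
    for _ in range(max_rounds):
        predicted = predict(w, legal)
        if predicted == correct or predicted in working_set:
            break  # terminate: nothing new to learn from
        working_set.append(predicted)
        w = refine(w, working_set, correct)
    return w, working_set


# Toy feature vectors (#A in helices, #G in helices); values are made up.
legal = [(0, 0), (1, 3), (2, 2), (3, 1), (4, 0)]
w, constraints = learn(legal, correct=(4, 0))
print(w, constraints, predict(w, legal))
```

Only structures that were actually mispredicted end up in the working set, which is the demand-driven aspect of the process: constraints are generated as needed rather than for every legal structure.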
40
Experimental Methodology
  • Data set: 300 non-homologous all-alpha proteins
  • From EVA's sequence-unique subset of the PDB,
    July 2005
  • Only consider alpha helices (H symbol in DSSP)
  • Randomly split into 150 training, 150 test
    proteins

41
Results
  • Comparison to others
  • Best HMM method to date that does not utilize
    alignment info
  • Offers 3.5% (Qα) and 0.2% (SOVα) improvement over
    previous best
  • Lags behind neural networks, e.g., Porter (overall
    SOV 76.6%)
  • However, we could likely gain 6-8% from alignment
    profiles
  • Caveats
  • Moving beyond all-alpha proteins, we could suffer
    3%
  • By considering $3_{10}$ helices, we could decrease 2%

Cited on slide: Nguyen02, Rost93, Jones99
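
For readers unfamiliar with the Qα figure quoted above, here is a minimal sketch of a per-residue accuracy. It assumes Qα is the percentage of residues whose helix/non-helix label matches the reference labeling; that reading, the function name `q_alpha`, and the `H`/`-` encoding are assumptions made here. SOV, which scores segment overlap, is not covered.

```python
def q_alpha(predicted, observed):
    """Percentage of residues whose helix/non-helix state matches.

    Any symbol other than "H" is treated as non-helix (an assumption).
    """
    if len(predicted) != len(observed):
        raise ValueError("labelings must have the same length")
    if not observed:
        raise ValueError("labelings must be non-empty")
    matches = sum((p == "H") == (o == "H") for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


print(q_alpha("--HHHH--", "-HHHHH--"))
```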
42
Conclusions
  • Represents first step toward learning biophysical
    parameters for energy minimization techniques
  • Iterative, demand-driven learning process using
    SVMs
  • Promising results on alpha-helix prediction
  • 77.6%: among best Qα for methods without alignment
    info
  • Future work: super-secondary structure
  • Will predict full contact maps rather than
    3-state labels
  • For beta sheets, replace HMMs by multi-tape
    grammars

http://protein.csail.mit.edu/
43
Extra Slides
44
Prediction Algorithm
  • Parameters represent energetic benefit of a given
    feature in a protein structure
  • Features are fixed, chosen by designer
  • Example features
  • Number of prolines in an alpha helix
  • Number of coils shorter than 2 residues
  • $\mathrm{Energy}(\text{structure}) = \sum_{\text{features} \in \text{structure}} \mathrm{Energy}(\text{feature})$
  • Minimal-energy structure found with dynamic programming
  • Idea: consider all structures, exploiting
    overlapping problems
  • Implemented as HMM using Viterbi algorithm

[Figure label: Structure with Minimal Energy]
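
A heavily simplified version of this dynamic program can be written as a two-state Viterbi pass. The sketch below is an illustration under stated assumptions, not the model from the talk: it has only a helix state `H` and a non-helix state `-`, a per-residue helix energy, and a single `start_penalty` charged whenever a helix begins (standing in for the cap terms). The function name and parameters are hypothetical.

```python
def min_energy_labeling(sequence, helix_energy, start_penalty=0.0):
    """Return (labels, energy) of the minimal-energy helix/non-helix labeling.

    helix_energy maps a residue letter to the energy of placing it in a
    helix (missing letters count as 0); non-helical residues cost 0.
    """
    if not sequence:
        return "", 0.0
    states = "-H"

    def emit(state, aa):
        return helix_energy.get(aa, 0.0) if state == "H" else 0.0

    def trans(prev, cur):
        return start_penalty if (prev == "-" and cur == "H") else 0.0

    # best[s]: minimal energy of any labeling of the prefix ending in state s
    best = {"-": 0.0, "H": start_penalty + emit("H", sequence[0])}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for aa in sequence[1:]:
        new_best, pointers = {}, {}
        for cur in states:
            prev = min(states, key=lambda p: best[p] + trans(p, cur))
            new_best[cur] = best[prev] + trans(prev, cur) + emit(cur, aa)
            pointers[cur] = prev
        best = new_best
        back.append(pointers)

    # Trace back from the cheaper final state.
    state = min(states, key=lambda s: best[s])
    total = best[state]
    labels = [state]
    for pointers in reversed(back):
        state = pointers[state]
        labels.append(state)
    return "".join(reversed(labels)), total


# Made-up energies: alanine favors helices, glycine disfavors them.
print(min_energy_labeling("GAAAG", {"A": -1.0, "G": 2.0}, start_penalty=1.5))
```

The table `best` is what makes the search tractable: it reuses the optimal energy of each prefix instead of enumerating every labeling, which is the "overlapping problems" idea on the slide.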
45
Learning Algorithm
  • Constraints have form
  • For all incorrectly predicted structures $S_i$,
  • in future selection of the parameters w:
  • $\mathrm{Energy}_w(S_i) > \mathrm{Energy}_w(\text{correct structure})$
  • Constraints are linear in the energy parameters
  • If feasible, could solve with linear programming
  • In general, solve with Support Vector Machines
    (SVMs)
  • $\mathrm{Energy}(S_i) - \mathrm{Energy}(\text{correct structure}) \ge 1 - \xi_i$, with $\xi_i \ge 0$
  • Find parameters w minimizing $\tfrac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n} \xi_i$
  • Provides general solution using soft-margin
    criterion
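
To see why the constraints are linear in w, it may help to write the energy as a dot product. The notation $\Phi(S)$ for the vector of feature counts of a structure and $S^{*}$ for the correct structure is introduced here for illustration and does not appear on the slides.

```latex
% Energy is a sum over features, so it is linear in the parameters w:
\mathrm{Energy}_w(S) \;=\; \sum_{f \in S} w_f \;=\; w \cdot \Phi(S)

% Each constraint therefore compares two dot products with the same w:
\mathrm{Energy}_w(S_i) - \mathrm{Energy}_w(S^{*})
  \;=\; w \cdot \bigl(\Phi(S_i) - \Phi(S^{*})\bigr) \;\ge\; 1 - \xi_i,
  \qquad \xi_i \ge 0

% Soft-margin problem over the n constraints collected so far:
\min_{w,\,\xi} \;\; \tfrac{1}{2}\,\lVert w \rVert^{2} \;+\; \frac{C}{n} \sum_{i=1}^{n} \xi_i
```

With all $\xi_i = 0$ the constraints are hard and form a linear feasibility problem; the slack variables let the optimization proceed when no w satisfies every constraint.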