Protein Secondary Structure Prediction C606 Paper presentation

Provided by: duanes

1
Protein Secondary Structure Prediction
C606 Paper presentation
2
The Papers
  • Protein Secondary Structure Prediction based on
    Position Specific Scoring Matrices
  • David T. Jones
  • J. Mol. Biol. (1999)
  • (PSIPRED)
  • A Two-stage Classifier for Protein β-turn
    Prediction using Support Vector Machines
  • Chiu et al.
  • Unpublished (2006)

3
Protein Structure Review
  • What you should already know
  • 8 protein states
  • HBEGITSC
  • Reduce to 3 states
  • G, H → H
  • E, B → E
  • All others → C
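The reduction above is a plain lookup table. A minimal sketch, assuming the eight states are written with the one-letter codes listed on the slide (the function name is illustrative, not from either paper):

```python
# 8-state codes mapped to 3 states, following the reduction on the slide.
EIGHT_TO_THREE = {
    "H": "H", "G": "H",                      # helix types
    "E": "E", "B": "E",                      # strand and bridge
    "I": "C", "T": "C", "S": "C", "C": "C",  # all others become coil
}

def reduce_states(states: str) -> str:
    """Collapse an 8-state string to 3 states; unknown symbols become coil."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)

three_state = reduce_states("HBEGITSC")
print(three_state)
```

Treating unknown symbols as coil is a choice made here for robustness, not something the slide specifies.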

4
Why Predict Structure?
  • Structure implies function
  • Functional sites are often
  • Drug targets
  • Protein interaction sites
  • Thus, learning about structure helps us to
    understand life itself

5
PSIPRED Method
  • Create a PSIBLAST database
  • Extract non-redundant proteins from DB
  • Filter to remove
  • Repetitive regions
  • Transmembrane regions

6
PSIPRED Method
  • Run PSIBLAST using our DB
  • 3 iterations
  • Extract PSSM
  • l × 20 matrix, where l is the sequence length
  • Each matrix entry Mij represents the likelihood
    of amino acid type j appearing at position i
  • Scale to [0, 1]
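The slide does not say which scaling function is used. A minimal sketch, assuming the standard logistic function is applied to every raw PSSM score (the toy matrix is made up for illustration):

```python
import math

def scale_pssm(pssm):
    """Map each raw PSSM score into the interval (0, 1) with the logistic function."""
    return [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]

# Toy 2 x 3 excerpt; a real PSSM has l rows and 20 columns.
raw = [[-3, 0, 5], [2, -1, 0]]
scaled = scale_pssm(raw)
```

A raw score of 0 maps to 0.5, strongly negative scores approach 0, and strongly positive scores approach 1.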

7
PSIPRED Method
  • Entries of the PSSM become features
  • Take a window of 15 aa
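A sketch of how a 15-residue window could be flattened into one feature vector. The extra value per position, used here to flag window positions that fall off either end of the chain, is an assumption; it would be consistent with the 315 inputs quoted for the first network.

```python
def window_features(scaled_pssm, center, width=15):
    """Flatten the PSSM rows in a window around `center` into one vector.

    Window positions outside the chain are filled with zeros and marked
    with a 1 in an extra last value (an assumption, see the note above).
    """
    n_cols = len(scaled_pssm[0])
    half = width // 2
    features = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(scaled_pssm):
            features.extend(list(scaled_pssm[pos]) + [0.0])
        else:
            features.extend([0.0] * n_cols + [1.0])
    return features

toy = [[0.5] * 20 for _ in range(30)]   # 30 residues, 20 scaled scores each
vec = window_features(toy, center=0)
print(len(vec))                         # 15 positions x 21 values
```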

8
PSIPRED Method
  • Neural network
  • Inspired by biology
  • Highly interconnected elements
  • Learn by example to associate certain inputs with
    certain outputs
  • Hidden layers encode this relation

9
PSIPRED Method
  • 2 Neural networks
  • 1st network
  • 315 inputs (15 × 21) (why 21?)
  • 75 hidden units
  • 3 outputs
  • 2nd network
  • 60 inputs (15 × 4)
  • 60 hidden units
  • 3 outputs
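To make the layer sizes concrete, here is a shape-only sketch of the two networks. The weights are random and untrained, and the sigmoid activation is an assumption; only the unit counts come from the slide.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights plus one bias per output unit (untrained, for shape only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, inputs):
    """Sigmoid of the weighted sum for each unit; the last weight is the bias."""
    out = []
    for weights in layer:
        total = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
        out.append(1.0 / (1.0 + math.exp(-total)))
    return out

def run(net, inputs):
    """Pass the inputs through every layer of the network in turn."""
    for layer in net:
        inputs = forward(layer, inputs)
    return inputs

rng = random.Random(0)
net1 = [make_layer(315, 75, rng), make_layer(75, 3, rng)]   # 315 -> 75 -> 3
net2 = [make_layer(60, 60, rng), make_layer(60, 3, rng)]    # 60 -> 60 -> 3

scores1 = run(net1, [0.5] * 315)   # three state scores for one residue
scores2 = run(net2, [0.3] * 60)    # second pass over a window of first-network outputs
```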

10
PSIPRED Evaluation
  • Cross-validation for secondary structure (SS) is tricky
  • If proteins in the test and training folds are too
    similar, results will be artificially high
  • 3 partitions, 187 proteins
  • No similarity across partitions
  • Sequence
  • Fold
  • CATH (structure classification DB)
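One way to keep similar proteins out of different partitions is to assign whole similarity groups at once. A minimal sketch, assuming each protein already carries a group label (the labels and protein names below are hypothetical, and this is not the procedure from the paper):

```python
from collections import defaultdict

def partition_by_group(protein_to_group, n_parts=3):
    """Assign whole groups to partitions so that no group is split across them."""
    groups = defaultdict(list)
    for protein, group in protein_to_group.items():
        groups[group].append(protein)
    parts = [[] for _ in range(n_parts)]
    # Largest groups first, each placed into the currently smallest partition.
    for members in sorted(groups.values(), key=len, reverse=True):
        min(parts, key=len).extend(members)
    return parts

labels = {"p1": "fam_a", "p2": "fam_a", "p3": "fam_b",
          "p4": "fam_c", "p5": "fam_c", "p6": "fam_d"}
parts = partition_by_group(labels)
```

Training on two partitions and testing on the third then never compares a test protein against a member of its own group.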

11
PSIPRED Results
  • Cross validation
  • Q3 = 76%
  • SOV = 73.5%
  • CASP3 Q3
  • PSIPRED: 76.3%
  • JPRED: 72.4% (includes PHD)
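Q3 is the percentage of residues whose predicted three-state label matches the observed one. A minimal sketch on made-up strings:

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

score = q3("HHHECCCC", "HHHHCCEC")   # 6 of 8 residues agree
print(score)
```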

12
Predicting β-turns
  • β-turns are a type of coil
  • Frequent
  • Important in β-hairpins and β-sheets
  • Often in exposed regions
  • Most likely to interact with other proteins

13
SVM
  • Support Vector Machines
  • Again, learn to associate input with output
  • Represent input as points in n-dimensional space
  • Define a maximum margin hyperplane to separate
    two kinds of input in space
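The "maximum margin" idea can be illustrated by comparing two candidate hyperplanes on toy data. This sketch only measures margins; it does not train an SVM, and the points and hyperplanes are invented for illustration.

```python
import math

def decision(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin(w, b, points):
    """Smallest signed distance from any labeled point to the hyperplane.

    The value is negative if at least one point lies on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(label * decision(w, b, x) / norm for x, label in points)

points = [((2.0, 2.0), +1), ((3.0, 3.0), +1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
m_diag = margin((1.0, 1.0), -2.0, points)   # candidate hyperplane x + y = 2
m_vert = margin((1.0, 0.0), -1.0, points)   # candidate hyperplane x = 1
```

Both hyperplanes separate the two classes, but the diagonal one leaves a wider gap, so a maximum margin classifier would prefer it.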

14
β-turn Methods
  • Input
  • Primary sequence
  • PSSM
  • HYPROSPII output
  • Combines PSIPRED and PROSP
  • PROSP uses DB of small structure segments
  • Sliding window
  • w1 = 15
  • w2 = 9

15
β-turn Methods
  • 2 levels of SVMs

16
β-turn Methods
  • 1st level

(Figure: first-level diagram, labels H/E, SVM_TREE1, SVM_TREE2, SVM_TREE3)
17
β-turn Methods
  • 2nd level
  • Combine results from 1st level
  • SVM_MAX_D
  • Pick the most confident answer
  • SVM_VOTE
  • Majority-rule vote
  • If classified as Coil proceed to ?/? SVM
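The two ways of combining first-level results can be sketched as follows. How the paper measures confidence is not stated on the slide; this sketch assumes each first-level classifier returns a label and a decision value, and treats a larger absolute value as more confident. The function names and example values are hypothetical.

```python
from collections import Counter

def svm_max_d(decisions):
    """Pick the label whose classifier reports the largest absolute decision value."""
    label, _ = max(decisions, key=lambda pair: abs(pair[1]))
    return label

def svm_vote(decisions):
    """Pick the label predicted by the most classifiers (ties go to the first seen)."""
    return Counter(label for label, _ in decisions).most_common(1)[0][0]

# Hypothetical (label, decision value) pairs from three first-level classifiers.
first_level = [("C", 0.4), ("C", 0.7), ("H", 1.9)]
by_confidence = svm_max_d(first_level)
by_vote = svm_vote(first_level)
```

In this example the two schemes disagree: one very confident classifier wins under the first, while the two-to-one majority wins under the second.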

18
β-turn Methods
  • ?/? SVM
  • Takes
  • output of SVM_VOTE/SVM_MAX_D
  • PSSM
  • Primary sequence
  • HYPROSPII (SSE)

19
β-turn Results
  • 10-fold cross-validation, 426 proteins
  • SVM_VOTE > SVM_MAX_D
  • SVM_TREE3 performs poorly, but has the best precision
  • Reluctant to classify a residue as coil, but usually
    right when it does
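The pattern of poor overall performance with high precision is easy to reproduce on toy data. A minimal sketch with made-up state strings (not data from the paper):

```python
def precision_recall(predicted: str, observed: str, label: str = "C"):
    """Precision and recall for one class label over aligned state strings."""
    tp = sum(p == label and o == label for p, o in zip(predicted, observed))
    n_pred = predicted.count(label)
    n_obs = observed.count(label)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_obs if n_obs else 0.0
    return precision, recall

observed = "CCCCCCHHEE"
reluctant = "CCHHHEHHEE"   # calls coil rarely, but correctly each time
precision, recall = precision_recall(reluctant, observed)
```

Here every coil call is correct, so precision is perfect, but only two of the six coil residues are found, so recall is low.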

20
β-turn Results
  • Q3 = 79.25% (±1.1)
  • Current best is BetaTurn 1.1
  • Q3 = 77.3%
  • No standard deviation reported

21
Questions?