Rapid Discrimination of Decoy Protein Structures Using Multiple Energy Functions


Title: Rapid Discrimination of Decoy Protein Structures Using Multiple Energy Functions


1
Rapid Discrimination of Decoy Protein Structures
Using Multiple Energy Functions
  • Beating back bogus biomolecules since 2004.

2
Machine Learning vs. Bioinformatics vs.
Structural Biology in a battle to the finish!
3
Designing a Potential
  • Preserve hydrophobic character
  • Use standard threading function (Bryant & Lawrence)
  • Preserve hydrogen bonding
  • Calculate Hbond geometry and strength
  • Preserve van der Waals character
  • Count bad contacts/distances
  • Preserve good stereochemistry
  • Ramachandran allowed/disallowed regions
  • Preserve globular shape/size
  • Limit radius of gyration
  • Preserve secondary structure
  • Compare predicted 2° structure to observed 2° structure

4
Energy Functions
  • Threading Energy
  • Bryant & Lawrence
  • Ramachandran Score
  • Secondary Structure Score

S_i = score for residue i according to the Ramachandran plot:
core = 0, allowed = 1, generally allowed = 2, disallowed = 3
5
Energy Functions
  • Radius of Gyration Score

D_ij = Cα distance between residue i and centroid
P_radgyr = n^0.33 × 2.4
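The slide's formula image did not survive, so the following is only a minimal sketch of the quantities it names. It assumes the radius of gyration is the root-mean-square Cα distance from the centroid and that the predicted radius is 2.4 times n to the power 0.33; the function names are hypothetical.

```python
import math

def radius_of_gyration(ca_coords):
    """Root-mean-square distance of the C-alpha atoms from their centroid."""
    n = len(ca_coords)
    centroid = [sum(c[k] for c in ca_coords) / n for k in range(3)]
    sq_sum = sum(
        sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in ca_coords
    )
    return math.sqrt(sq_sum / n)

def predicted_radius(n_residues, scale=2.4, exponent=0.33):
    # Assumption: the slide's constants are read as scale * n ** exponent.
    return scale * n_residues ** exponent

def rg_deviation(ca_coords):
    """Actual minus predicted radius of gyration (signed)."""
    return radius_of_gyration(ca_coords) - predicted_radius(len(ca_coords))
```

The signed deviation and its absolute value correspond to the "(Actual - Theoretical) Radius of Gyration" inputs listed under Other Scores.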
6
Energy Functions
  • Hydrogen Bond Score

E_i = hydrogen bond energy for residue i with a hydrogen bond
N_h = number of residues involved in hydrogen and disulfide bonds
7
Energy Functions
  • Bump Score






8
Other Scores
  • Average Hbond Energy
  • Hbonds (and various functions thereof)
  • Hbonds/SS
  • (Actual - Theoretical) Radius of Gyration (and
    absolute value)
  • Secondary Structure Errors
  • Beta, Helix, Coil
  • Statistical chi functions

9
Other Inputs
  • Secondary structure as calculated by PsiPred
    based only on the protein sequence
  • Actual secondary structure as computed by VADAR

10
Development Data Sources
  • Real structures: Richardson lab Top500 set, as
    assessed by their All-Atom Contact method
  • http://kinemage.biochem.duke.edu/databases/top500.php
  • Decoy structures: 76,200 decoys of 41 structures
    derived from the Rosetta process
  • (Baker Lab, from Tsai et al., 2001)

11
Upcoming Data Sources
  • PDB Select 25 set
  • Trent's NMR structures
  • Structures generated by Homodeller for BacMap
  • Homodeller forced to do bad threading

12
Initial Idea
  • Try to predict RMSD from real structure based on
    some combination of the scoring functions (using
    MINER and other methods)
  • Modest success: R = 0.7, Spearman = 0.7
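The two figures quoted are a linear correlation (R) and a rank correlation (Spearman) between predicted and actual RMSD. A minimal sketch of how both could be computed, in plain Python with hypothetical function names and no connection to the MINER tool itself:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Spearman only asks whether the predictions order the structures correctly, so it can be high even when the relationship to RMSD is not linear.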

13
Simpler Task
  • Just distinguish real from decoy structures.
  • Unclear which scores are important, so use a
    data-driven method.

14
C4.5 (Quinlan)
  • Induces a decision tree from data
  • Splits the data set at each branch based on
    information-theoretic criteria
  • Very general
  • Handles discrete and numeric attributes equally
    well.

15
Sample Decision Tree
chi4n <= 0.502986
|   beta_perc <= 0.1875
|   |   hbond_erg <= -1.301656
|   |   |   helix_perc <= 0.675
|   |   |   |   rofg <= 0.455691: no (10.0)
|   |   |   |   rofg > 0.455691: yes (16.0/1.0)
|   |   |   helix_perc > 0.675
|   |   |   |   bump <= 0.000773
|   |   |   |   |   hbond_erg <= -1.448668
|   |   |   |   |   |   ss <= 0.740741: yes (13.0/2.0)
|   |   |   |   |   |   ss > 0.740741: no (2.0)

16
Should I Go To This Talk?
Information = Σ (prob) × (-log2(prob)) = 0.811 bits
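The 0.811-bit figure is the entropy of the class labels before any split. The slide's table did not survive, so the sketch below assumes 8 examples split 6 to 2 between the two answers, which reproduces the quoted value; `entropy_bits` is a hypothetical helper name.

```python
import math

def entropy_bits(counts):
    """Shannon entropy, in bits, of a class distribution given as counts."""
    total = sum(counts)
    return sum((c / total) * -math.log2(c / total) for c in counts if c > 0)

# Assumed class counts: 6 examples of one answer, 2 of the other.
print(round(entropy_bits([6, 2]), 3))  # 0.811
```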
17
Split on Interesting
Info = 0.721 bits
Info = 0.918 bits
Total Info = 6/8 × 0.721 + 2/8 × 0.918 = 0.771
18
Split on Food
Info = 0 bits
Info = 0.918 bits
Total Info = 3/8 × 0.918 = 0.344
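The two candidate splits can be compared by the weighted information remaining after each one; the split that leaves less is preferred. A small sketch using the per-branch figures from the slides, where the 5-example size of the zero-information Food branch is inferred from the 3/8 weight on the other branch and `split_info` is a hypothetical name:

```python
def split_info(branches):
    """Weighted information after a split.

    branches: list of (branch size, information in bits within that branch).
    """
    total = sum(size for size, _ in branches)
    return sum(size / total * info for size, info in branches)

before = 0.811  # information before splitting, from the earlier slide

interesting = split_info([(6, 0.721), (2, 0.918)])
food = split_info([(5, 0.0), (3, 0.918)])  # 5 is inferred, not stated

# Information gain is the reduction relative to the unsplit data.
gain_interesting = before - interesting
gain_food = before - food
best = "Food" if gain_food > gain_interesting else "Interesting"
```

With these figures, splitting on Food removes far more uncertainty than splitting on Interesting, so a tree built on information gain would put Food at this branch.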
19
Real/Decoy Decision Tree
  • 127 branches, 64 leaves (equivalently 64 rules)
  • Real structures (446): 384 right, 62 wrong
  • Decoys (63546): 63537 right, 9 wrong
  • Given a real structure, the tree will identify it
    correctly 86% of the time.
  • Given a decoy, there's an extremely good chance
    (99.986%) it will be identified as such.
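The percentages follow directly from the counts on this slide. The sketch below recomputes them and adds two figures the slide does not state but which follow from the same four numbers: overall accuracy, and how often a "real" verdict is correct.

```python
real_right, real_wrong = 384, 62
decoy_right, decoy_wrong = 63537, 9

# Fraction of real structures recognised as real.
real_rate = real_right / (real_right + real_wrong)

# Fraction of decoys recognised as decoys.
decoy_rate = decoy_right / (decoy_right + decoy_wrong)

# Derived here, not stated on the slide.
total = real_right + real_wrong + decoy_right + decoy_wrong
accuracy = (real_right + decoy_right) / total
# Structures labelled "real" are the 384 true reals plus the 9 missed decoys.
real_precision = real_right / (real_right + decoy_wrong)

print(f"{real_rate:.1%} {decoy_rate:.3%} {accuracy:.2%} {real_precision:.1%}")
```

Because decoys outnumber real structures by more than 100 to 1, overall accuracy is dominated by the decoy class, which is why the per-class rates are the more informative numbers.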

20
The Most Effective Rule
  • If abs(actual radius of gyration - theoretical
    radius of gyration) <= 2.698442, AND
  • (Fraction of Residues Predicted As Beta and
    Evaluated as Helix) = 0, AND
  • Chi Score4 > 0.502986, AND
  • (Fraction of Residues Predicted As Beta and
    Evaluated as Beta) <= 0.468750, THEN
  • It's a decoy
  • Correct in 49862 cases, wrong in 36.
  • A real finding, or a bias in how Rosetta
    generates decoys?
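Written as a predicate, the rule is a conjunction of four threshold tests. This is a sketch only: the parameter names are hypothetical, and the comparisons assume the usual C4.5 convention of "less than or equal" on one side of a split and "greater than" on the other.

```python
def most_effective_rule(rg_actual, rg_theoretical,
                        frac_pred_beta_eval_helix,
                        chi_score4,
                        frac_pred_beta_eval_beta):
    """Return True when the rule fires, i.e. the structure is called a decoy."""
    return (
        abs(rg_actual - rg_theoretical) <= 2.698442
        and frac_pred_beta_eval_helix == 0
        and chi_score4 > 0.502986
        and frac_pred_beta_eval_beta <= 0.468750
    )

# Fraction of the rule's firings that were correct, from the slide's counts.
rule_precision = 49862 / (49862 + 36)
```

A rule that fires on roughly 50,000 of the decoys with so few errors is what prompts the question of whether it captures structural quality or a systematic signature of the decoy generator.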

21
Chi Scoring Functions
  • Another decision tree
  • Inputs
  • amino acid type
  • Phi
  • Psi
  • Next AA
  • Next chi
  • Preceding AA
  • Preceding Chi
  • Coil or Not
  • (Chi2 angle)

22
Chi Results
  • 1296 branches, 917 leaves
  • w/ chi2, correctly predicts 80% of chi angles
  • w/o chi2, correctly predicts 73% of chi angles

23
Going Forward
  • Use a lot more datasets and different
    combinations thereof.
  • Try predicting RMSD within classes (0-2 Å,
    2-4 Å, etc.)
  • Examine incorrectly classified structures more
    carefully
  • Test with contrived structures (e.g. a real
    structure with a polar residue in the core)
  • Different machine learning methods.

24
Acknowledgements
  • Joey Cruz
  • The Proteome Analysts (Roman Eisner, Alona Fyshe,
    Brandon Pearcey, Brett Poulin)
  • Warren Ward
  • Dr. Wishart
  • Haiyan Zhang