Title: Practical Aspects of Structure Prediction Michael Feig, Michigan State University
1 Practical Aspects of Structure Prediction
Michael Feig, Michigan State University
Theory and Computation in Molecular Biological Physics
Center for Theoretical Biological Physics Research Workshop and Summer School
August 9-20, 2004, La Jolla, CA
2 What is protein structure prediction?
Amino Acid Sequence
3D Model of Biologically Active Conformation
Structural Analogy
Empirical Knowledge
Physical Theory
Experimental Constraints
3 Why structure prediction?
- Structural information leads to protein function
- Protein structures allow rational drug design
- Difficulties in experimental structure determination
- Can be fully automated
4 Protein Structure Elements
5 Secondary Structure Prediction
Helical? Extended? Random coil? Disordered?
Accuracy
PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
JPRED: http://www.compbio.dundee.ac.uk/www-jpred/
SAM-T99: http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html
6 Tertiary Structure Prediction
How do secondary structure elements fold?
Prediction through homology, analogy, ab initio
7 Quaternary Structure Prediction
Domain organization? Oligomers? Complexes?
Prediction requires modeling of protein-protein interactions
8 Sequence Homology
Alignment of E. coli thioredoxin (1THO) with human thioredoxin (1AUC):

E. coli  SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVA
Human    MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL-

E. coli  KLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA
Human    EVDVDDCQDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV
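As a rough illustration of how the similarity in this alignment can be quantified, the sketch below computes percent identity over gap-free columns. The two rows are copied from the slide; the names `ROW_1`, `ROW_2`, and `percent_identity`, and the choice to ignore gapped columns, are assumptions made for this example.

```python
# Aligned rows copied from the slide; '-' marks an alignment gap.
ROW_1 = (
    "SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVA"
    "KLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA"
)
ROW_2 = (
    "MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL-"
    "EVDVDDCQDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV"
)


def percent_identity(a: str, b: str) -> float:
    """Percent of gap-free aligned columns that hold identical residues."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither row has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    print(f"{percent_identity(ROW_1, ROW_2):.1f}% identity over gap-free columns")
```

Other conventions (for example, dividing by the length of the shorter sequence) give slightly different numbers, so the denominator should always be stated alongside the value.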
9 Comparative Modeling
Assumption: proteins with similar sequences have similar structures
E. coli thioredoxin (1THO)
Human thioredoxin (1AUC)
10 Structural Templates from Homology
- Challenges
- Correct alignment
- Loop modeling
- Side chain rebuilding

PGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA
QDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV
11 Accuracy of Predictions by Homology
automatic predictions by SWISS-MODEL web service
12 Prediction through Fold Recognition
Assumption: proteins with similar secondary structure share a fold
1N91
1JRM
13 Structural Templates from Fold Recognition
- Challenges
- Good template
- Correct alignment
- Fragment modeling
- Refinement
14 Ab initio Structure Prediction
Theoretical models provide energetic description
Sampling generates protein-like conformations
Scoring identifies native-like structure with lowest free energy
SEALGDTIVKNA
15 Conformational Sampling
- Needs to generate protein-like conformations
- Needs to include near-native structures
- Needs to be computationally efficient
- Suitable methods
-> Low resolution on/off-lattice models (SICHO)
-> Assembly from short protein fragments (ROSETTA)
-> Torsional space sampling of all-atom models
16 Reduced Protein Representations
all-atom
Cα and side chains
lattice
17 SICHO Lattice Model
Monte Carlo simulations
-> Attempt move
-> Compute ΔE
-> Accept with probability p
Simulated annealing
Constant temperature
Replica exchange sampling
MONSSTER program
Kolinski & Skolnick, Proteins 32, 475 (1998)
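The attempt/compute/accept loop can be sketched as follows. The slide does not spell out p, so the standard Metropolis criterion, p = min(1, exp(-ΔE/T)), is assumed here. The one-dimensional `toy_energy`, the step size, and the geometric cooling schedule are hypothetical stand-ins and are not the SICHO model or the MONSSTER program.

```python
import math
import random


def acceptance_probability(delta_e: float, temperature: float) -> float:
    """Metropolis criterion (assumed): downhill always, uphill with exp(-dE/T)."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / temperature)


def toy_energy(x: float) -> float:
    """Hypothetical 1D double-well energy standing in for a lattice energy."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(steps: int = 5000, t_start: float = 2.0,
                        t_end: float = 0.05, seed: int = 0):
    """Return (best_x, best_energy) found while cooling from t_start to t_end."""
    rng = random.Random(seed)
    x = 2.0
    energy = toy_energy(x)
    best_x, best_energy = x, energy
    for i in range(steps):
        # Geometric cooling schedule (an assumption for this sketch).
        temperature = t_start * (t_end / t_start) ** (i / (steps - 1))
        x_new = x + rng.uniform(-0.5, 0.5)          # attempt move
        delta_e = toy_energy(x_new) - energy        # compute dE
        if rng.random() < acceptance_probability(delta_e, temperature):
            x, energy = x_new, energy + delta_e     # accept
            if energy < best_energy:
                best_x, best_energy = x, energy
    return best_x, best_energy


if __name__ == "__main__":
    print(simulated_annealing())
```

Holding `temperature` fixed instead of cooling turns the same loop into the constant-temperature variant listed on the slide.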
18 SICHO Energy Function: Knowledge-Based Terms
- Side chain burial propensity
- follows Kyte-Doolittle scale
- Centrosymmetric bias
- $r_g = 2.2\, N_{res}^{0.38}$
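To make the centrosymmetric bias concrete, the sketch below evaluates the target radius of gyration, read from the slide as r_g = 2.2 N_res^0.38, and compares it with the radius of gyration of a set of coordinates. The function names are illustrative, the unit (Å) is an assumption, and how the bias enters the actual SICHO energy is not shown on the slide.

```python
import math


def expected_rg(n_res: int) -> float:
    """Target radius of gyration from the slide's relation (unit assumed: angstrom)."""
    return 2.2 * n_res ** 0.38


def radius_of_gyration(coords) -> float:
    """Root-mean-square distance of points (x, y, z) from their centroid."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
             for p in coords)
    return math.sqrt(sq / n)


if __name__ == "__main__":
    # Hypothetical fully extended 10-residue chain with 3.8 spacing along x.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    print(f"target  r_g: {expected_rg(len(chain)):.2f}")
    print(f"current r_g: {radius_of_gyration(chain):.2f}")
```

A bias term could then penalize the difference between the two values, which would favor compact, globular conformations.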
19 SICHO Energy Function: PMF-based Statistical Terms
- Short range interactions
- PMF based on statistical analysis of PDB
helix
extended
20 Conformational Sampling with SICHO
Protein A
21 Scoring Functions
- Knowledge-based/statistical
- derived from known protein structures
- limited by training data
- usually fast -> useful as filters
- Force field based
- model physical energy landscape
- more robust and transferable
- often expensive -> useful for final scoring
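The division of labor described above (a fast score as a filter, an expensive score for the final ranking) can be sketched as a two-stage ranking. The function `two_stage_ranking`, both toy scoring functions, and the convention that lower scores are better are assumptions for illustration only.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def two_stage_ranking(candidates: Sequence[T],
                      fast_score: Callable[[T], float],
                      slow_score: Callable[[T], float],
                      keep: int) -> List[T]:
    """Filter with the cheap score, then rank survivors with the costly one.

    Lower scores are treated as better (an assumption of this sketch).
    """
    shortlisted = sorted(candidates, key=fast_score)[:keep]
    return sorted(shortlisted, key=slow_score)


if __name__ == "__main__":
    # Hypothetical one-number "conformations" and toy scoring functions.
    conformations = [0.0, 0.5, 1.0, 1.5, 3.0]
    cheap = lambda x: abs(x - 1.0)          # stand-in for a statistical score
    costly = lambda x: (x - 1.3) ** 2       # stand-in for a force field score
    print(two_stage_ranking(conformations, cheap, costly, keep=3))
```

The expensive function is evaluated only `keep` times, which is the point of filtering first.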
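22 Desirable Scoring Function
23 MMPB(GB)/SA Scoring Function
$\Delta G_{MM} = \Delta U_{bonded} + \Delta U_{vdW} + \Delta U_{elec}$
- Solute-solvent interactions
$\Delta G_{solv} = \Delta G_{PB/GB} + \Delta G_{hyd} + \Delta G_{solv,vdW}$
- Total relative free energy
$\Delta G_{MMPB/SA} = \Delta G_{MM} + \Delta G_{solv} - T\Delta S_{protein}$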
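The decomposition above is a sum of terms, which the sketch below mirrors. The signs are an assumption (the operators are not legible on the slide): all energy terms are added and the entropy term T·ΔS is subtracted, as is usual for MM-PB/SA-style estimates. The class name, field names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnergyTerms:
    """Per-conformation energy terms (arbitrary but consistent units)."""
    u_bonded: float     # bonded molecular mechanics energy
    u_vdw: float        # van der Waals energy
    u_elec: float       # electrostatic energy
    g_pb_gb: float      # electrostatic solvation (PB or GB)
    g_hyd: float        # hydrophobic solvation term
    g_solv_vdw: float   # solute-solvent van der Waals term
    t_ds: float         # temperature times solute entropy

    def g_mm(self) -> float:
        return self.u_bonded + self.u_vdw + self.u_elec

    def g_solv(self) -> float:
        return self.g_pb_gb + self.g_hyd + self.g_solv_vdw

    def g_total(self) -> float:
        # Entropy enters with a minus sign (assumed convention).
        return self.g_mm() + self.g_solv() - self.t_ds


if __name__ == "__main__":
    terms = EnergyTerms(10.0, -20.0, -30.0, -40.0, 5.0, -3.0, 12.0)
    print(terms.g_mm(), terms.g_solv(), terms.g_total())
```

Only differences between conformations of the same protein are meaningful here, which is why the slide calls the total a relative free energy.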
24 Scoring of Lattice Conformations: Protein A
25 Performance of Force Field Based Scoring: Evaluation of CASP4 Predictions
M. Feig, C. L. Brooks III, Proteins 2002, 49, 232-245
26 Global Distance Test
GDT(r): How many overlapping continuous residues are within r Å?
$GDT_{TS} = (GDT(8) + GDT(4) + GDT(2) + GDT(1))/4$
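A simplified version of this score can be sketched from per-residue distances between model and native after a single superposition. This uses the common definition that averages the four cutoffs (1, 2, 4, 8 Å) and reports percentages. A full GDT calculation searches over many superpositions to maximize each count, which this sketch does not attempt; the function names are illustrative.

```python
from typing import Sequence


def gdt_percent(distances: Sequence[float], cutoff: float) -> float:
    """Percent of residues whose model-to-native distance is within cutoff."""
    if not distances:
        raise ValueError("need at least one residue distance")
    return 100.0 * sum(d <= cutoff for d in distances) / len(distances)


def gdt_ts(distances: Sequence[float]) -> float:
    """Average of the percentages at 1, 2, 4 and 8 angstrom cutoffs."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    return sum(gdt_percent(distances, c) for c in cutoffs) / len(cutoffs)


if __name__ == "__main__":
    # Hypothetical per-residue deviations (angstrom) for a 5-residue model.
    deviations = [0.5, 1.5, 3.0, 6.0, 10.0]
    print(gdt_ts(deviations))
```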
27 GDT vs. RMSD
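One commonly cited difference between the two measures is their sensitivity to a few badly placed residues. The sketch below uses hypothetical per-residue deviations (not data from the talk) to show that a short misplaced tail raises RMSD sharply while the fraction of residues within a cutoff changes little.

```python
import math
from typing import Sequence


def rmsd(deviations: Sequence[float]) -> float:
    """Root-mean-square of per-residue deviations after superposition."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def fraction_within(deviations: Sequence[float], cutoff: float) -> float:
    """Fraction of residues within the cutoff (a GDT-like count)."""
    return sum(d <= cutoff for d in deviations) / len(deviations)


if __name__ == "__main__":
    uniform = [1.0] * 50                 # every residue off by 1 angstrom
    with_tail = [1.0] * 45 + [20.0] * 5  # same core, 5 residues far off
    for name, dev in (("uniform", uniform), ("with tail", with_tail)):
        print(f"{name}: RMSD {rmsd(dev):.2f}, "
              f"within 2 A {fraction_within(dev, 2.0):.2f}")
```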
28 Ab initio Structure Prediction Protocol
Secondary Structure, HM/FR Templates
Lattice Model Sampling: Replica Exchange Monte Carlo Sampling
-> 20000 @ 5-25 Å RMSD
Reconstruction
All-Atom Clustering, Scoring: MMGB/SA
-> 10-100 @ 5-10 Å RMSD
Refinement: Replica Exchange Molecular Dynamics
-> 1-5 @ 2-4 Å RMSD
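Both the lattice sampling and the refinement stage rely on replica exchange. The slides do not give the exchange rule, so the sketch below assumes the standard criterion for swapping two neighboring replicas, p = min(1, exp[(β_i - β_j)(E_i - E_j)]) with β = 1/(k_B T). The function names and the reduced units (k_B = 1) are assumptions.

```python
import math
import random
from typing import List


def swap_probability(e_i: float, e_j: float, t_i: float, t_j: float,
                     k_b: float = 1.0) -> float:
    """Standard replica exchange acceptance for swapping replicas i and j."""
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    if exponent >= 0.0:
        return 1.0
    return math.exp(exponent)


def exchange_sweep(energies: List[float], temperatures: List[float],
                   rng: random.Random) -> List[float]:
    """Attempt swaps between neighboring temperatures; return new energies.

    energies[k] is the energy of the conformation currently at temperatures[k].
    """
    energies = list(energies)
    for k in range(len(energies) - 1):
        p = swap_probability(energies[k], energies[k + 1],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


if __name__ == "__main__":
    rng = random.Random(0)
    print(exchange_sweep([10.0, 5.0, 7.0], [1.0, 2.0, 4.0], rng))
```

In a full simulation each replica would run Monte Carlo or molecular dynamics between sweeps, and the conformations would be swapped together with their energies.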
29 Sampling & Scoring: Ab initio predictions of DNase fragmentation factor (1KOY)
30 Sampling, Scoring, Clustering: Ab initio predictions of DNase fragmentation factor (1KOY)
31 Ab initio Predictions: DNase fragmentation factor
Best-scoring prediction: 7.4 Å RMSD
NMR structure: 1KOY
32 Ab initio Sampling in Template-based Structure Prediction
- Template provides known protein structure
- Ab initio sampling of unknown fragments in the context of template
33 Template Restraints Near Flexible Part
[Figure: restraint potential, with labeled values 0.0, 0.1, 0.2, 0.4, 0.7, 1.0]
34 Loop Sampling
35 Sampling with Restraints
- Secondary structure bias
- Secondary structure prediction
- NMR shift data
- Distance restraints
- Experimental restraints
- Side chain contacts from analogous structures
- Template restraints
- Homologous/Analogous Structures (PDB)
36 Structure Prediction Restrained by Sparse Experimental Data
- Secondary structure information
- NMR chemical shifts, circular dichroism
- Distance restraints (atom-atom, residue-residue)
- NMR NOEs, EPR, cross-linking, Trp fluorescence
- Relative orientation of structural elements
- NMR dipolar coupling
- Surface residue distribution
- Antibody epitope scanning
- Molecular shape envelope
- SAXS, electron microscopy
37 Structure Refinement
?
predicted
native (NMR)
38 Sampling Towards the Native Basin
[Figure: energy E of states versus q, with the native basin marked]
39 Energy Landscape Towards Native: T0167
native
40 Multi-Scale Sampling
Low Resolution
All-Atom
41 Multi-Scale Sampling Scheme
Low Resolution Sampling
All-Atom Reconstruction
All-Atom Energy Evaluation
Metropolis
reject
Save Conformation
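The scheme above can be read as a Metropolis loop in which moves are proposed at low resolution but accepted or rejected on the all-atom energy. The sketch below follows that reading. Every function in it (`low_res_move`, `reconstruct_all_atom`, `all_atom_energy`) is a hypothetical one-dimensional stand-in, and the Metropolis form exp(-ΔE/T) is assumed rather than taken from the slide.

```python
import math
import random


def low_res_move(x: float, rng: random.Random) -> float:
    """Hypothetical low-resolution move: small random displacement."""
    return x + rng.uniform(-0.3, 0.3)


def reconstruct_all_atom(x: float):
    """Hypothetical reconstruction: coarse coordinate plus fine 'detail'."""
    return (x, 0.1 * math.sin(5.0 * x))


def all_atom_energy(model) -> float:
    """Hypothetical all-atom energy of a reconstructed model."""
    coarse, detail = model
    return (coarse - 1.0) ** 2 + detail


def multiscale_sampling(steps: int = 2000, temperature: float = 0.2,
                        seed: int = 1):
    """Return the list of saved (coordinate, energy) pairs."""
    rng = random.Random(seed)
    x = 3.0
    energy = all_atom_energy(reconstruct_all_atom(x))
    saved = [(x, energy)]
    for _ in range(steps):
        x_new = low_res_move(x, rng)              # low resolution sampling
        model = reconstruct_all_atom(x_new)       # all-atom reconstruction
        e_new = all_atom_energy(model)            # all-atom energy evaluation
        delta = e_new - energy
        # Metropolis test on the all-atom energy difference.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x, energy = x_new, e_new
            saved.append((x, energy))             # save conformation
        # Otherwise the move is rejected and the old state is kept.
    return saved


if __name__ == "__main__":
    trajectory = multiscale_sampling()
    print(len(trajectory), min(e for _, e in trajectory))
```

The design choice illustrated here is that the cheap representation drives the moves, while the expensive energy function decides which conformations are kept.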
42 Refined Structure Prediction Protocol
Homology Template
Low-Resolution Sampling: long simulated annealing/replica exchange runs
All-Atom Scoring: clustering, minimization, MMGB/SA
5-10 Å RMSD
Low-Resolution Sampling: very short simulated annealing runs
All-Atom Scoring: clustering, 1-10 ps MD, MMGB/SA average
2-4 Å RMSD
>20 cycles
All-Atom Replica Exchange Sampling: >1 ns per replica
1-2 Å RMSD
Experimental-like native structure?
43 Structure Prediction Summary