Title: Stochastic Roadmap Simulation: An efficient representation and algorithm for analyzing molecular motion
1 Stochastic Roadmap Simulation: An efficient representation and algorithm for analyzing molecular motion
- Mehmet Serkan Apaydın
- May 27th, 2004
2 Molecular motion is an essential process of
life
3 Computing pfold, the best order parameter in protein folding, is expensive using classical simulation techniques
[Figure: HIV integrase, with the folded set and the unfolded set marked (Du et al. 98)]
4 Stochastic Roadmap Simulation (SRS)
Develop efficient computational representations and algorithms to study molecular motion pathways for protein folding and ligand-protein binding
5 Contributions
- New computational framework for studying molecular motion
- Transition probabilities
- Correspondence to Monte Carlo
- First step analysis
- Extension to non-uniform sampling
- Computation of ensemble properties
- protein folding: pfold parameter
- comparison with Monte Carlo
- Quantitative predictions of experimental values
- ligand-protein binding: escape time
- Qualitative predictions about the role of amino acids in the active site of a protein
- Application to distinguish the catalytic site from a set of potential binding sites
6 Outline
- Background
- Stochastic Roadmap Simulation
- Applications
- Protein folding
- Ligand-protein binding
- Extension of basic framework
- Quantitative prediction of experimental results
on protein folding
7 Proteins and their structure
- Macromolecule
- Building block of life.
8 Ligand-Protein Binding
9 Simulating molecular motion
- Monte Carlo (MC) or Molecular Dynamics
http://folding.stanford.edu
10 Molecular Representations
- Atomistic model
- Linkage model
- Internal parameter representation (bond angles, lengths, torsional angles)
- Each secondary structure element as a vector (Lotan 04)
11 Analogy with Robotics
[Figure: labels X0 and X3]
12 Molecular Energetics
- $E = E_S + E_\Theta + E_{S\text{-}B} + E_{Tor} + E_{vdW} + E_{dipole}$ (cs273)
- Force fields
- Go models
- Hydrophobic-Polar models
13 MC simulation
14 MC simulation
15 Problems with Monte Carlo Simulation
16 A path planning technique: Probabilistic Roadmaps (PRM) (Kavraki et al. 96)
[Figure: roadmap built during preprocessing in configuration space; labels: edge, C-obstacle]
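The preprocessing step named on this slide can be sketched as follows. This is a minimal illustration, not the planner of Kavraki et al.: it assumes a 2D unit-square configuration space, one circular C-obstacle, and a fixed connection radius, and the names `is_free`, `segment_free`, and `build_prm` are hypothetical.

```python
import math
import random

def is_free(p, obstacles):
    """A point is free if it lies outside every circular obstacle (cx, cy, r)."""
    return all(math.hypot(p[0] - cx, p[1] - cy) > r for cx, cy, r in obstacles)

def segment_free(p, q, obstacles, steps=20):
    """Approximate edge check: test evenly spaced points along the segment p-q."""
    for t in range(steps + 1):
        x = p[0] + (q[0] - p[0]) * t / steps
        y = p[1] + (q[1] - p[1]) * t / steps
        if not is_free((x, y), obstacles):
            return False
    return True

def build_prm(n_nodes, radius, obstacles, seed=0):
    """Preprocessing: sample free configurations, then link nearby pairs."""
    rng = random.Random(seed)
    nodes = []
    while len(nodes) < n_nodes:
        p = (rng.random(), rng.random())
        if is_free(p, obstacles):
            nodes.append(p)
    edges = {i: set() for i in range(n_nodes)}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            close = math.dist(nodes[i], nodes[j]) <= radius
            if close and segment_free(nodes[i], nodes[j], obstacles):
                edges[i].add(j)
                edges[j].add(i)
    return nodes, edges

obstacles = [(0.5, 0.5, 0.2)]  # one hypothetical circular C-obstacle
nodes, edges = build_prm(100, 0.2, obstacles)
```

The all-pairs neighbor search is quadratic in the number of nodes; it keeps the sketch short and would normally be replaced by a spatial index.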
17 Application of PRM to molecular motion
- Study of ligand-protein binding
- Probabilistic roadmaps with edges weighted by energetic plausibility
- Search for the minimum weight paths
Singh, Latombe, Brutlag, 99
18 Application of PRM to molecular motion
- Study of ligand-protein binding
- Probabilistic roadmaps with edges weighted by energetic plausibility
- Search for the minimum weight paths
- Extensions to protein folding
Song and Amato, 01; Apaydin et al., 01
Singh, Latombe, Brutlag, 99
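Searching a weighted roadmap for a minimum weight path can be sketched with Dijkstra's algorithm. The slides do not give the weight formula used in the cited work, so the weights below are hypothetical placeholders where a larger value stands for an energetically less plausible move; `min_weight_path` is an illustrative name.

```python
import heapq

def min_weight_path(graph, start, goal):
    """Dijkstra's algorithm on a dict {node: {neighbor: weight}} with weights >= 0."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == goal:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]

# Hypothetical roadmap with made-up edge weights.
roadmap = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.5, "D": 5.0},
    "C": {"A": 4.0, "B": 1.5, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
cost, path = min_weight_path(roadmap, "A", "D")
```

A search of this kind returns a single best path, which is the limitation the next slide's pathway count speaks to.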
19 How many pathways are there in a roadmap?
Number of Self-Avoiding Walks on a 2D Grid
n x n grids: 1, 2, 12, 184, 8512, 1262816, 575780564, 789360053252, 3266598486981642,
(10x10) 41044208702632496804,
(11x11) 1568758030464750013214100,
(12x12) 182413291514248049241470885236
n/m   2     3     4      5       6
2     2
3     4     12
4     8     38    184
5     16    125   976    8512
6     32    414   5382   79384   1262816
http://mathworld.wolfram.com/Self-AvoidingWalk.html
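The small entries of the table can be reproduced by brute force. The sketch below assumes the counts refer to self-avoiding paths between opposite corners of an n x m grid of points, which is the reading that matches the listed values; `count_corner_paths` is a hypothetical helper name.

```python
def count_corner_paths(rows, cols):
    """Count self-avoiding paths between opposite corners of a rows x cols grid of points."""
    goal = (rows - 1, cols - 1)
    visited = [[False] * cols for _ in range(rows)]

    def walk(r, c):
        if (r, c) == goal:
            return 1
        visited[r][c] = True
        total = 0
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                total += walk(nr, nc)
        visited[r][c] = False  # backtrack so other paths may reuse this point
        return total

    return walk(0, 0)

for n in range(2, 6):
    print(n, count_corner_paths(n, n))
```

Exhaustive enumeration of this kind is only practical for the first few grid sizes, which is the point of the slide: the number of pathways grows far too fast to list them one by one.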
20 Outline
- Background
- Stochastic Roadmap Simulation
- Applications
- Protein folding
- Ligand-protein binding
- Future work
21 New Idea: Stochastic Conformational Roadmaps
Capture the stochastic nature of molecular motion by assigning probabilities to edges
[Figure: edge from node v_i to node v_j labeled with the transition probability P_ij]
Apaydin et al., RECOMB 02, WAFR 02. Collaborators: C. Guestrin, D. Hsu
22 Edge probabilities
Follow the Metropolis criterion
[Figure: definition of the edge probability P_ij]
- Correspond to probabilities in Monte Carlo simulation.
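The slide's own formula for P_ij did not survive extraction, so the following is only a sketch of one standard way to turn the Metropolis criterion into roadmap transition probabilities. It assumes each neighbor is proposed with the same probability 1/d, where d is the largest node degree, and that the leftover probability becomes the self-transition P_ii; the function name and the toy energies are hypothetical.

```python
import math

def metropolis_probabilities(energies, neighbors, kT=1.0):
    """Build a row-stochastic table {i: {j: P_ij}} from node energies and adjacency lists.

    Assumption: uniform proposal 1/d with d the maximum degree, Metropolis
    acceptance min(1, exp(-dE/kT)), and P_ii absorbing the rejected moves.
    """
    d = max(len(nbrs) for nbrs in neighbors.values())
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energies[j] - energies[i]
            accept = 1.0 if dE <= 0 else math.exp(-dE / kT)
            row[j] = accept / d
        row[i] = 1.0 - sum(row.values())  # self-transition
        P[i] = row
    return P

# Toy roadmap: three nodes on a line with made-up energies.
energies = {0: 0.0, 1: 1.0, 2: 0.5}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
P = metropolis_probabilities(energies, neighbors)
```

Using the same proposal probability at every node keeps the proposal symmetric, which is what makes the resulting chain satisfy detailed balance with respect to the Boltzmann weights.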
23 Relationship to MC simulation
[Figure: roadmap with an edge labeled P_ij]
- Each path on the graph is a path of an MC simulation
- Roadmap represents many MC simulation paths simultaneously
- Stochastic Roadmap Simulation and Monte Carlo Simulation converge to the same distribution π (the Boltzmann distribution).
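A short argument for why a Metropolis-style chain has the Boltzmann distribution as its stationary distribution, written here as π. It assumes a symmetric proposal probability q_ij = q_ji between neighboring nodes, which is an assumption of this sketch rather than something stated on the slide.

```latex
\pi_i = \frac{e^{-E_i/k_B T}}{Z}, \qquad
P_{ij} = q_{ij}\,\min\!\left(1,\; e^{-(E_j - E_i)/k_B T}\right), \qquad q_{ij} = q_{ji}

\text{For } E_j \ge E_i:\quad
\pi_i P_{ij}
  = \frac{e^{-E_i/k_B T}}{Z}\, q_{ij}\, e^{-(E_j - E_i)/k_B T}
  = \frac{e^{-E_j/k_B T}}{Z}\, q_{ji}
  = \pi_j P_{ji}

\sum_i \pi_i P_{ij} = \pi_j \sum_i P_{ji} = \pi_j
```

The case E_j < E_i follows by swapping the roles of i and j. The last line sums the detailed balance identity over i and shows that π is left unchanged by one step of the chain.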
24 Using SRS to compute ensemble properties
Treat the roadmap as a Markov chain and use First-Step Analysis
25 Application of SRS to protein folding
Probability of Folding: pfold
[Figure: HIV integrase, with the folded set and the unfolded set marked (Du et al. 98)]
26 First-Step Analysis
- One linear equation per node
- Solution gives pfold for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system
[Figure: node i with self-transition P_ii and edges P_ij, P_ik, P_il, P_im to neighbors j, k, l, m]
Let $f_i = \mathrm{pfold}(i)$. After one step: $f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$
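The linear system above can be sketched in code as follows. This is a minimal illustration on a hypothetical 5-node chain, solved with Gauss-Seidel sweeps rather than whatever sparse solver the SRS software uses; `pfold_first_step` and the transition table are assumptions made for the example.

```python
def pfold_first_step(P, folded, unfolded, tol=1e-12, max_sweeps=100000):
    """Solve f_i = sum_j P[i][j] * f_j with f = 1 on folded and f = 0 on unfolded nodes.

    P is a dict {i: {j: P_ij}} whose rows sum to 1 (P[i][i] may be present).
    Assumes no node outside the folded and unfolded sets has P_ii = 1.
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    free = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_sweeps):
        change = 0.0
        for i in free:
            stay = P[i].get(i, 0.0)
            flow = sum(p * f[j] for j, p in P[i].items() if j != i)
            new = flow / (1.0 - stay)  # P_ii * f_i moved to the left-hand side
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f

# Hypothetical chain: node 0 is the unfolded set, node 4 is the folded set.
P = {
    0: {0: 1.0},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.25, 3: 0.5, 4: 0.25},
    4: {4: 1.0},
}
f = pfold_first_step(P, folded={4}, unfolded={0})
```

One solve returns pfold for every node at once; on this symmetric chain the values are 0.25, 0.5, and 0.75 for nodes 1, 2, and 3.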
27 In Contrast
- Computing pfold with MC simulation requires:
- Performing many MC simulation runs
- Counting the number of times the folded set F is attained first
- Repeating this for every conformation of interest
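For comparison, the simulation-based estimate described on this slide can be sketched as below. The chain, the run count, and the name `pfold_monte_carlo` are hypothetical; the sketch only illustrates the counting procedure for a single starting conformation.

```python
import random

def pfold_monte_carlo(P, start, folded, unfolded, n_runs=20000, seed=0):
    """Estimate pfold(start) as the fraction of runs that reach the folded set first."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        node = start
        while node not in folded and node not in unfolded:
            targets = list(P[node])
            weights = [P[node][t] for t in targets]
            node = rng.choices(targets, weights=weights)[0]
        if node in folded:
            hits += 1
    return hits / n_runs

# Hypothetical chain: node 0 is the unfolded set, node 4 is the folded set.
P = {
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.25, 3: 0.5, 4: 0.25},
}
estimate = pfold_monte_carlo(P, start=2, folded={4}, unfolded={0})
```

The estimate carries sampling noise that shrinks only with the square root of the number of runs, and the whole batch must be repeated for each additional conformation.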
28 Comparison: SRS vs. MC (on synthetic landscape)
29 Computational Tests on two real proteins
- 1HDD (Engrailed homeodomain)
- 3 α helices
- 12 DOF
- 1ROP (repressor of primer)
- 2 α helices
- 6 DOF
H-P energy model with steric clash exclusion (Sun et al., 95)
30 Differences in pfold values obtained by SRS and MC for 1ROP and 1HDD
31 pfold on a real protein: β hairpin
- Immunoglobulin binding protein (Protein G)
- Last 16 amino acids
- Cα-based representation
- Go model based energy
- 42 DOFs
Zhou and Karplus, 99
32 Comparison between SRS and MC for β hairpin
33 Computation Times (β hairpin)
Monte Carlo (30 simulations):
- Over 10^7 energy computations
- 10 hours of computer time
- 1 conformation
Roadmap:
- 50,000 energy computations
- 23 seconds of computer time
- 2000 conformations
6 orders of magnitude speedup!
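The speedup figure can be checked from the numbers on this slide by comparing computer time per conformation; this is simple arithmetic on the reported values, not an additional measurement.

```latex
\frac{t_{\mathrm{MC}}}{t_{\mathrm{SRS}}}
  = \frac{(10\ \mathrm{h} \times 3600\ \mathrm{s/h}) \,/\, 1\ \text{conformation}}
         {23\ \mathrm{s} \,/\, 2000\ \text{conformations}}
  = \frac{36000\ \mathrm{s}}{0.0115\ \mathrm{s}}
  \approx 3.1 \times 10^{6}
```

A per-conformation ratio of about 3 x 10^6 is consistent with the stated six orders of magnitude.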
35 Application of SRS to Ligand-Protein Interactions
- Distinguishing the catalytic site: among several potential binding sites, which one is the catalytic site?
- Studying the effect of catalytic amino acids upon binding/unbinding
Apaydin et al., ECCB 02. Collaborators: C. Guestrin, C. Varma
36 Funnels of attraction and escape time from a funnel
- Potential binding sites
- Funnel: energy gradient around a site that guides the ligand to that site.
- Defined as all ligand conformations within 10 Å rmsd of the site.
- Camacho and Vajda 01
- Computation of escape time from funnels of attraction around potential binding sites
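The funnel membership test described above can be sketched as a plain rmsd check. This assumes ligand atoms are given in the protein's coordinate frame in a fixed order and that no superposition is applied before comparing; the function names and the coordinates are hypothetical.

```python
import math

def rmsd(conf_a, conf_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(conf_a) != len(conf_b) or not conf_a:
        raise ValueError("conformations must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(conf_a, conf_b))
    return math.sqrt(total / len(conf_a))

def in_funnel(ligand_conf, site_conf, cutoff=10.0):
    """True if the ligand conformation lies within `cutoff` angstroms rmsd of the site."""
    return rmsd(ligand_conf, site_conf) <= cutoff

# Hypothetical three-atom ligand, coordinates in angstroms.
site = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
near = [(x + 6.0, y, z) for x, y, z in site]   # rigid shift of 6 angstroms
far = [(x, y + 12.0, z) for x, y, z in site]   # rigid shift of 12 angstroms
```

With these inputs `near` falls inside the 10 Å funnel and `far` falls outside it, since a rigid shift by a distance s gives an rmsd of exactly s.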
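37 Computing Escape Time with Roadmap
[Figure: node i inside the funnel of attraction, with self-transition P_ii and edges P_ij, P_ik, P_il, P_im to neighbors j, k, l, m; a node outside the funnel has escape time 0]
$t_i = 1 + P_{ii} t_i + P_{ij} t_j + P_{ik} t_k + P_{il} t_l + P_{im} t_m$
(escape time is measured as the number of steps of stochastic simulation)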
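The escape-time equations can be solved the same way as the pfold system. The sketch below uses Gauss-Seidel sweeps on a hypothetical 5-node chain whose middle three nodes form the funnel; `escape_times` and the transition table are illustrative assumptions, and escape time is counted in simulation steps as on the slide.

```python
def escape_times(P, funnel, tol=1e-12, max_sweeps=100000):
    """Solve t_i = 1 + sum_j P[i][j] * t_j for funnel nodes, with t = 0 outside the funnel.

    P is a dict {i: {j: P_ij}} with one row per funnel node; rows sum to 1.
    Assumes no funnel node has P_ii = 1.
    """
    t = {i: 0.0 for i in funnel}
    for _ in range(max_sweeps):
        change = 0.0
        for i in funnel:
            stay = P[i].get(i, 0.0)
            # Nodes outside the funnel are missing from t and contribute 0.
            flow = sum(p * t.get(j, 0.0) for j, p in P[i].items() if j != i)
            new = (1.0 + flow) / (1.0 - stay)
            change = max(change, abs(new - t[i]))
            t[i] = new
        if change < tol:
            break
    return t

# Hypothetical chain 0-1-2-3-4: nodes 1, 2, 3 are in the funnel, 0 and 4 are outside.
P = {
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
}
t = escape_times(P, funnel=[1, 2, 3])
```

For this symmetric walk the expected escape times are 3, 4, and 3 steps for nodes 1, 2, and 3, so the node deepest in the funnel takes longest to leave.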
38 Results on lactate dehydrogenase
39 Results on lactate dehydrogenase
[Figure labels: Loop, NADH, ASP-195, ALA-193, ASP-166, ARG-169; atom labels CH3, C, O]
40 Results on lactate dehydrogenase
[Figure label: GLY-245]
42 A non-uniform sampling strategy: sampling local minima and saddles of the landscape
Henkelman, Jonsson 99
43 Adding critical points to the roadmap yields pfold values of the same quality with fewer nodes
45 Using pfold to make quantitative predictions
- Connecting theory with experiment
- Rates
- Φ values
- Transition State computation using
- Energy barriers considering monotonic pathways
- Pfold considering all pathways
Fersht 99
Garbuzynskiy, Finkelstein, Galzitskaya 04
Collaborators: TH Chiang, D. Hsu (N.U. Singapore)
46 Φ Value Results: using pfold is better for 3 (out of 5) proteins
Correlation to experiment:
Protein                               Garbuzynskiy et al., 04   With pfold
B1 IgG-binding domain of protein G    0.74                      0.78
Src SH3 domain                        0.63                      0.65
SH3 domain of α-spectrin              0.81                      0.78
Sso7d                                 0.58                      0.28
CI2                                   0.35                      0.51
47 Computing rates with pfold results in better correlation with experiment
[Figure: experimental rate and computed rate, log(kf) per protein, in two panels (using pfold; Garbuzynskiy et al., 04), with correlations 0.67 and 0.83]
48 Contributions
- New computational framework for studying molecular motion
- Transition probabilities
- Correspondence to Monte Carlo
- First step analysis
- Extension to non-uniform sampling
- Computation of ensemble properties
- protein folding: pfold parameter
- comparison with Monte Carlo
- Quantitative predictions of experimental values
- ligand-protein binding: escape time
- Qualitative predictions about the role of amino acids in the active site of a protein
- Application to distinguish the catalytic site from a set of potential binding sites
49 Future work
- Non-uniform sampling on high-dimensional examples
- Computing and reducing the error in the computed parameters
- Estimating the number of nodes needed
- Exploring larger systems and pushing the experiment
50 SRS code available! Visit http://robotics.stanford.edu/apaydin/software.html
51 Acknowledgements
- My advisors
- Prof. Latombe, Prof. Brutlag
- Prof. Van Roy
- Prof. McCluskey
- My committee
- Prof. Motwani, Prof. Vuckovic
- Coauthors
- D. Hsu, C. Guestrin, S. Kasif, A. Singh, C. Varma
- Collaborators
- TH Chiang, J. Greenberg, S. Ieong, F. Schwarzer, R. Singh, A. Tellez
- Faculty
- Prof. Altman, Prof. Baldwin, Prof. Guibas, Prof. Pande
- Prof. Kavraki (Rice)
- Prof. Zell (Tuebingen)
- Prof. Snoeyink (UNC)
- Funding
- David L. Cheriton Stanford Graduate Fellowship
- NSF Biogeometry grant
- Stanford's Bio-X program
- Resources
- Bio-X SGI Supercomputer, Bio-X PC computer cluster
- Colleagues
- N. Batada, A. Ben-Hur, S. Bennett, E. Boas, T. Bretl, J. Brown, F. Buron, L. Chong, A. Collins, S. Elmer, P. Fong, A. Garg, S. Gokturk, H. Gonzales-Banos, K. Hauser, G. Henkelman, P. Isto, G. Jayachandran, J. Kuffner, S. Larson, M. Liang, B. Naughton, X. Liu, I. Lotan, H. Mandyam, N. Mitra, S. Mitra, A. Nguyen, YM Rhee, D. Russel, M. Saha, G. Sanchez-Ante, S. Saxonov, S. Schmidler, J. Shapiro, J. Shin, P. Shirvani, M. Shirts, C. Snow, C. Yu, B. Zagrovic, A. Zomorodian
- Staff
- I. Contreras, P. Cook, J. Engelson, K. Hedjasi, J. McCormick, H. Nguyen, N. Riewerts, D. Shankle
- Friends and family