Author: latombe. Created: 7/23/2003. Company: Stanford University.

Title: Molecular Motion Pathways: Computation of Ensemble Properties with Probabilistic Roadmaps

1
Molecular Motion Pathways: Computation of
Ensemble Properties with Probabilistic Roadmaps

A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), AAAI Press, Menlo Park, CA, pp. 252-261, 1999.
N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures. J. Comp. Biology, 10(2):239-255, 2003.
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Comp. Biology, 10(3-4):257-281, 2003.
N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build Better Markovian State Models: Predicting the Folding Rate and Mechanism of a Tryptophan Zipper Beta Hairpin. J. Chemical Physics, 121(1):415-425, 2004.
J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. Geometric Algorithms for the Conformational Analysis of Long Protein Loops. J. Comp. Chemistry, 25:956-967, 2004.

2
Molecular motion is an essential process of life
Mad cow disease is caused by misfolding
Drug molecules act by binding to proteins
3
So, studying molecular motion is of critical
importance in molecular biology
However, few tools are available

Computer simulation
Monte Carlo simulation
Molecular Dynamics

4
Two Major Drawbacks of MD and MC Simulation

Each simulation run yields a single pathway,
while molecules tend to move along many different
pathways
→ Interest in ensemble properties

5
Example of Ensemble Property: Probability of
Folding pfold
Measures kinetic distance to folded state
[Du, Pande, Grosberg, Tanaka, and Shakhnovich.
On the Transition Coordinate for Protein Folding.
Journal of Chemical Physics (1998)]
Unfolded state
Folded state
6
Other Examples of Ensemble Properties

Folding
Order of formation of SSEs
Folding rate / Mean first passage time
Key intermediates
Binding
Average time to escape from active site
Average energy barrier

7
Two Major Drawbacks of MD and MC Simulation

Each simulation run yields a single pathway,
while molecules tend to move along many different
pathways
Each simulation run tends to waste much time in
local minima

8
→ Roadmap-Based Representation

Compact representation of many motion pathways
Coarse resolution relative to MC and MD
simulation
Efficient algorithms for analyzing multiple
pathways

9
Roadmaps for Robot Motion Planning
10
Initial Work: A.P. Singh, J.C. Latombe, and D.L.
Brutlag. A Motion Planning Approach to Flexible
Ligand Binding. Proc. 7th ISMB, pp. 252-261, 1999

Study of ligand-protein binding
The ligand is a small flexible molecule, but the
protein is assumed rigid
A fixed coordinate system P is attached to the
protein and a moving coordinate system L is
defined using three bonded atoms in the ligand
A conformation of the ligand is defined by the
position and orientation of L relative to P and
the torsional angles of the ligand

11
Roadmap Construction (Node Generation)

The nodes of the roadmap are generated by
sampling conformations of the ligand uniformly at
random in the parameter space (around the
protein)
The energy E at each sampled conformation is
computed
$E = E_{\text{interaction}} + E_{\text{internal}}$
$E_{\text{interaction}}$ = electrostatic + van der Waals potential
$E_{\text{internal}} = \sum_{\text{non-bonded pairs of atoms}}$ (electrostatic + van der Waals)

12
Roadmap Construction (Node Generation)

The nodes of the roadmap are generated by
sampling conformations of the ligand uniformly at
random in the parameter space (around the
protein)
The energy E at each sampled conformation is
computed
$E = E_{\text{interaction}} + E_{\text{internal}}$
$E_{\text{interaction}}$ = electrostatic + van der Waals potential
$E_{\text{internal}} = \sum_{\text{non-bonded pairs of atoms}}$ (electrostatic + van der Waals)
A sampled conformation is retained as a node of
the roadmap with probability:
$0$ if $E > E_{\max}$
$(E_{\max} - E) / (E_{\max} - E_{\min})$ if $E_{\min} \le E \le E_{\max}$
$1$ if $E < E_{\min}$
→ Denser distribution of nodes in low-energy
regions of conformational space
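
The retention rule on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `retention_probability` and `sample_nodes` are hypothetical, the caller supplies its own `sample_conformation` and `energy` functions, and `e_min < e_max` is assumed.

```python
import random

def retention_probability(e, e_min, e_max):
    """Probability of keeping a sampled conformation with energy e (assumes e_min < e_max)."""
    if e > e_max:
        return 0.0
    if e < e_min:
        return 1.0
    # Linear ramp between the two thresholds.
    return (e_max - e) / (e_max - e_min)

def sample_nodes(sample_conformation, energy, e_min, e_max, n_samples, rng=None):
    """Draw n_samples conformations; keep each one with its energy-based probability."""
    rng = rng or random.Random()
    nodes = []
    for _ in range(n_samples):
        q = sample_conformation(rng)  # hypothetical caller-supplied sampler
        if rng.random() < retention_probability(energy(q), e_min, e_max):
            nodes.append(q)
    return nodes
```

Because low-energy samples are kept more often, the retained nodes end up denser in low-energy regions, which is the effect the slide describes.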

13
Roadmap Construction (Edge Generation)

Each node is connected to its closest neighbors
by straight edges
Each edge is discretized so that between q_i and
q_{i+1} no atom moves by more than some ε (1 Å)
If any E(q_i) > Emax, then the edge is rejected
14
Roadmap Construction (Edge Generation)

Any two nodes closer than some threshold
distance are connected by a straight edge
Each edge is discretized so that between q_i and
q_{i+1} no atom moves by more than some ε (1 Å)
If all E(q_i) ≤ Emax, then the edge is retained
and is assigned two weights w(q→q') and w(q'→q),
defined (formula not captured in this transcript) from the
probability that the ligand moves from q_i to
q_{i+1} when it is constrained to move along the
edge
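
The discretization and energy test for an edge can be sketched as follows. This is only an illustration under assumptions: conformations are treated as plain coordinate lists that can be interpolated linearly (angle wrap-around is ignored), and `max_atom_displacement` and `energy` are hypothetical caller-supplied functions. The edge weights are not computed because their formula is not in the transcript.

```python
def interpolate(qa, qb, t):
    """Point at fraction t along the straight segment from qa to qb."""
    return [a + t * (b - a) for a, b in zip(qa, qb)]

def discretize_edge(qa, qb, max_atom_displacement, eps=1.0):
    """Split the edge into equal steps, doubling their number until
    no consecutive pair moves any atom by more than eps."""
    n = 1
    while True:
        pts = [interpolate(qa, qb, i / n) for i in range(n + 1)]
        if all(max_atom_displacement(pts[i], pts[i + 1]) <= eps for i in range(n)):
            return pts
        n *= 2

def edge_is_valid(points, energy, e_max):
    """Retain the edge only if every intermediate conformation satisfies E <= e_max."""
    return all(energy(q) <= e_max for q in points)
```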

15
Querying the Roadmap

For a given goal node qg (e.g., binding
conformation), Dijkstra's single-source
algorithm computes the lowest-weight paths from
qg to each node (in either direction) in O(N
log N) time, where N = number of nodes
Various quantities can then be easily computed
in O(N) time, e.g., average weights of all
paths entering qg and of all paths leaving qg
(~ binding and dissociation rates Kon and Koff)

Protein: Lactate dehydrogenase. Ligand: Oxamate (7
degrees of freedom)
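
A minimal sketch of this query, assuming the roadmap is stored as a directed adjacency dictionary with non-negative weights (the data layout and function names are illustrative, not from the paper). Paths leaving qg come from running Dijkstra on the graph, and paths entering qg from running it on the reversed graph. With a binary heap the cost is O((N + M) log N) for M edges, which matches the slide's O(N log N) when each node has a bounded number of neighbors.

```python
import heapq

def dijkstra(adj, source):
    """Lowest path weight from source to every reachable node.
    adj maps node -> list of (neighbor, weight) with weight >= 0."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse_graph(adj):
    """Flip every directed edge, keeping its weight."""
    rev = {u: [] for u in adj}
    for u, edges in adj.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev

def average_path_weight(dist, goal):
    """Mean lowest path weight over all nodes other than the goal."""
    others = [d for node, d in dist.items() if node != goal]
    return sum(others) / len(others)
```

For example, `average_path_weight(dijkstra(reverse_graph(adj), qg), qg)` gives the average weight of the lowest-weight paths entering `qg`.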
16
Experiments on 3 Complexes

PDB ID: 1ldm
Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues)
Ligand: Oxamate (6 atoms, 7 dofs)
PDB ID: 4ts1
Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues)
Ligand: L-leucyl-hydroxylamine (13 atoms, 9 dofs)
PDB ID: 1stp
Receptor: Streptavidin (901 atoms, 121 residues)
Ligand: Biotin (16 atoms, 11 dofs)

17
Computation of Potential Binding Conformations

Sample many (several 1000s) ligand
conformations at random around the protein
Repeat several times
Select lowest-energy conformations that are
close to protein surface
Resample around them
Retain k (10) lowest-energy conformations
whose centers of mass are at least 5Å apart
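
The final retention step can be read as a greedy selection: walk through the conformations from lowest to highest energy and keep one only if its center of mass is at least 5 Å from every conformation already kept. The sketch below assumes that greedy reading and a simple `(energy, center_of_mass)` tuple layout; both are illustrative choices rather than details given on the slide.

```python
import math

def select_binding_candidates(conformations, k=10, min_separation=5.0):
    """conformations: list of (energy, (x, y, z)) with coordinates in angstroms.
    Returns up to k lowest-energy entries whose centers are pairwise
    at least min_separation apart."""
    selected = []
    for energy, center in sorted(conformations, key=lambda c: c[0]):
        if all(math.dist(center, c) >= min_separation for _, c in selected):
            selected.append((energy, center))
            if len(selected) == k:
                break
    return selected
```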

lactate dehydrogenase
18
Results for 1ldm

Some potential binding sites have slightly lower
energy than the active site → Energy is not a
discriminating factor
Average path weights (energetic difficulty) to
enter and leave binding site are significantly
greater for the active site → Indicates that the
active site is surrounded by an energy barrier
that traps the ligand

19
(No Transcript)
20
Application of Roadmaps to Protein Folding:
N.M. Amato, K.A. Dill, and G. Song. Using Motion
Planning to Map Protein Folding Landscapes and
Analyze Folding Kinetics of Known Native
Structures. J. Comp. Biology, 10(2):239-255, 2003

Known native state
Degrees of freedom: φ-ψ angles
Energy: van der Waals, hydrogen bonds,
hydrophobic effect
New idea: Sampling strategy
Application: Finding order of SSE formation

21
Sampling Strategy (Node Generation)

High dimensionality → non-uniform sampling
Conformations are sampled using a Gaussian
distribution around the native state
Conformations are sorted into bins by number of
native contacts (pairs of Cα atoms that are
close in the native structure)
Sampling ends when all bins have a minimum number
of conformations → good coverage of
conformational space

22
Application: Order of Formation of Secondary
Structures

The lowest-weight path is extracted from each
denatured conformation to the folded one
The order of formation of SSEs is computed along
each path
The formation order that appears the most often
over all paths is considered the SSE formation
order of the protein

23
Method

The contact matrix showing the time step when
each native contact appears is built

24
Protein CI2 (1 α + 4 β)
25
Protein CI2 (1 α + 4 β)
26
Method

The contact matrix showing the time step when
each native contact appears is built
The time step at which a structure appears is
approximated as the average of the appearance
time steps of its contacts
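
The averaging rule, together with the "most frequent order over all paths" rule from the earlier slide, can be sketched as below. The dictionary layout, the contact labels, and the function names are assumptions made for illustration.

```python
from collections import Counter

def sse_formation_times(contact_times, sse_contacts):
    """contact_times: native contact -> time step at which it first appears on a path.
    sse_contacts: structure name -> list of its native contacts.
    A structure's formation time is the mean appearance time of its contacts."""
    return {
        name: sum(contact_times[c] for c in contacts) / len(contacts)
        for name, contacts in sse_contacts.items()
    }

def formation_order(times):
    """Structure names sorted by increasing formation time."""
    return tuple(sorted(times, key=times.get))

def most_common_order(orders):
    """Formation order that appears most often over all paths."""
    return Counter(orders).most_common(1)[0][0]
```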

27
α forms at time step 122 (II)
β3 and β4 come together at 187 (V)
β2 and β3 come together at 210 (IV)
β1 and β4 come together at 214 (I)
α and β4 come together at 214 (III)
Protein CI2 (1 α + 4 β)
28
Method

The contact matrix showing the time step when
each native contact appears is built
The time step at which a structure appears is
approximated as the average of the appearance
time steps of its contacts

29
Comparison with Experimental Data
30
Stochastic Roadmaps: M.S. Apaydin, D.L. Brutlag,
C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma.
Stochastic Roadmap Simulation: An Efficient
Representation and Algorithm for Analyzing
Molecular Motion. J. Comp. Biol.,
10(3-4):257-281, 2003

New idea: Capture the stochastic nature of
molecular motion by assigning probabilities to
edges

31
Edge probabilities
Follow Metropolis criteria
Self-transition probability
Roadmap nodes are sampled uniformly at random,
and the energy profile along edges is not considered
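
The slide's formulas did not survive in the transcript, so the sketch below uses one common Metropolis-style form as an assumption: a move from node i to one of its N_i neighbors j is proposed with probability 1/N_i and accepted with probability min(1, exp(-ΔE/kT)), and the remaining probability mass becomes the self-transition. The function name and data layout are hypothetical.

```python
import math

def transition_probabilities(neighbors, energy, kT=1.0):
    """neighbors: node -> list of adjacent nodes; energy: node -> energy value.
    Returns P as a dict of rows, P[i][j], with each row summing to 1."""
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energy[j] - energy[i]
            # Assumed Metropolis acceptance: always accept downhill moves.
            accept = math.exp(-dE / kT) if dE > 0 else 1.0
            row[j] = accept / len(nbrs)
        # Probability not spent on moves stays at node i.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P
```

Only node energies enter this computation, which is consistent with the slide's remark that the energy profile along edges is not considered.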
32
Stochastic Roadmap Simulation
V
Pij
33
Roadmap as Markov Chain
j
Pij
i

Transition probability Pij depends only on i and
j

34
Example 1: Probability of Folding pfold
Unfolded state
Folded state
35
First-Step Analysis

One linear equation per node
Solution gives pfold for all nodes
No explicit simulation run
All pathways are taken into account
Sparse linear system

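(Figure: node i with self-transition probability P_ii and neighbors j, k, l, m reached with probabilities P_ij, P_ik, P_il, P_im)
Let $f_i = p_{\text{fold}}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$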
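
The one-equation-per-node system can be solved without any simulation run. The sketch below is a plain-Python Gauss-Seidel iteration chosen for brevity (a sparse direct solver would be the usual choice for large roadmaps); the function name and the dictionary representation of P are assumptions. It fixes f = 1 on folded nodes and f = 0 on unfolded nodes, and assumes every other node has P_ii < 1.

```python
def pfold(P, folded, unfolded, tol=1e-12, max_iter=100000):
    """P: node -> {next_node: probability}, rows summing to 1, every node a key of P.
    Solves f_i = sum_j P_ij f_j with f = 1 on folded and f = 0 on unfolded nodes."""
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    interior = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_iter):
        delta = 0.0
        for i in interior:
            p_self = P[i].get(i, 0.0)
            rest = sum(p * f[j] for j, p in P[i].items() if j != i)
            # Move the self-transition term to the left-hand side.
            new = rest / (1.0 - p_self)
            delta = max(delta, abs(new - f[i]))
            f[i] = new
        if delta < tol:
            break
    return f
```

On a four-node chain U, a, b, F with a symmetric walk, this gives pfold(a) = 1/3 and pfold(b) = 2/3, with or without self-transitions.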
36
Number of Self-Avoiding Walks on a 2D Grid
1, 2, 12, 184, 8512, 1262816, 575780564,
789360053252, 3266598486981642,
(10x10) 41044208702632496804,
(11x11) 1568758030464750013214100,
(12x12) 182413291514248049241470885236 > 10^28
http://mathworld.wolfram.com/Self-AvoidingWalk.html
37
In contrast

Computing pfold with MC simulation requires
For every conformation q of interest
Perform many MC simulation runs from q
Count number of times F is attained first
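
For contrast, the simulation-based estimate can be sketched as repeated random runs from a single start node. To keep the example self-contained it walks on a small Markov chain rather than on a molecular energy landscape, so it only illustrates the counting procedure; the function name and the dictionary layout of P are assumptions.

```python
import random

def pfold_monte_carlo(P, start, folded, unfolded, n_runs=1000, rng=None):
    """Estimate pfold(start) as the fraction of runs that reach a folded
    node before an unfolded one. P: node -> {next_node: probability}."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_runs):
        state = start
        while state not in folded and state not in unfolded:
            targets = list(P[state])
            weights = [P[state][t] for t in targets]
            state = rng.choices(targets, weights=weights)[0]
        if state in folded:
            hits += 1
    return hits / n_runs
```

Each call yields the value for one conformation only, and its accuracy improves slowly with the number of runs, which is the cost the slide contrasts with the roadmap approach.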

38
Computational Tests

1ROP (repressor of primer)
2 α helices
6 DOF

1HDD (Engrailed homeodomain)
3 α helices
12 DOF

H-P energy model with steric clash exclusion
[Sun et al., 95]
39
Correlation with MC Approach
1ROP
40
pfold for β hairpin
Immunoglobulin binding protein (Protein G)
Last 16 amino acids
Cα based representation
Go model energy function
42 DOFs
[Zhou and Karplus, 99]
41
Comparison between SRS and MC for β hairpin
for 100 conformations
42
Computation Times (β hairpin)
Monte Carlo (30 simulations):
Over 10^7 energy computations
10 hours of computer time
1 conformation
Roadmap:
50,000 energy computations
23 seconds of computer time
2000 conformations
6 orders of magnitude speedup!
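
The speedup figure can be checked from the numbers on this slide by comparing computer time per conformation. The variable names are illustrative only.

```python
import math

# Monte Carlo: 10 hours of computer time for 1 conformation.
mc_seconds_per_conformation = 10 * 3600 / 1

# Roadmap: 23 seconds of computer time for 2000 conformations.
roadmap_seconds_per_conformation = 23 / 2000

speedup = mc_seconds_per_conformation / roadmap_seconds_per_conformation
orders_of_magnitude = math.log10(speedup)
```

The ratio comes out at roughly 3 million, that is, between six and seven orders of magnitude per conformation.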
43
Example 2: Ligand-Protein Interaction
Computation of escape time from funnels of
attraction around potential binding sites
Funnel of attraction: ball of 10 Å rmsd
around bound state [Camacho and Vajda, 01]
44
Computation Through Simulation [Sept, Elcock and
McCammon 99]
10K to 30K independent simulations
45
Computing Escape Time with Roadmap
$t_i = 1 + P_{ii} t_i + P_{ij} t_j + P_{ik} t_k + P_{il} t_l + P_{im} t_m$
with $t = 0$ at nodes outside the funnel
(escape time is measured as number of
steps of stochastic simulation)
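
A minimal sketch of the escape-time computation, assuming the funnel is given as a set of roadmap nodes and every node outside it has escape time 0. The solver is a simple Gauss-Seidel iteration in plain Python, and the names are hypothetical.

```python
def escape_times(P, funnel, tol=1e-12, max_iter=100000):
    """P: node -> {next_node: probability}. funnel: set of nodes inside the funnel.
    Solves t_i = 1 + sum_j P_ij t_j for i in the funnel, with t = 0 outside.
    Assumes P_ii < 1 for every funnel node."""
    t = {i: 0.0 for i in funnel}
    for _ in range(max_iter):
        delta = 0.0
        for i in funnel:
            p_self = P[i].get(i, 0.0)
            # Neighbors outside the funnel contribute 0.
            rest = sum(p * t[j] for j, p in P[i].items() if j != i and j in funnel)
            new = (1.0 + rest) / (1.0 - p_self)
            delta = max(delta, abs(new - t[i]))
            t[i] = new
        if delta < tol:
            break
    return t
```

As with pfold, one linear solve yields the expected number of steps for every node in the funnel at once.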
46
Distinguishing Active Site

Given several potential binding sites, which one
is the active one?

Energy: electrostatic + van der Waals + solvation
free energy terms
47
Complexes Studied
Ligand          Protein   Random nodes   DOFs
oxamate         1ldm      8000           7
Streptavidin    1stp      8000           11
Hydroxylamine   4ts1      8000           9
COT             1cjw      8000           21
THK             1aid      8000           14
IPM             1ao5      8000           10
PTI             3tpi      8000           13
48
Distinction Using Escape Time
Escape times (number of steps)
Protein   Bound state   Best potential binding site
1stp      3.4E9         1.1E7
4ts1      3.8E10        1.8E6
3tpi      1.3E11        5.9E5
1ldm      8.1E5         3.4E6
1cjw      5.4E8         4.2E6
1aid      9.7E5         1.6E8
1ao5      6.6E7         5.7E6
Legend: Able to distinguish catalytic site / Not able
49
Using Path Sampling to Construct Roadmaps: N.
Singhal, C.D. Snow, and V.S. Pande. Using Path
Sampling to Build Better Markovian State Models:
Predicting the Folding Rate and Mechanism of a
Tryptophan Zipper Beta Hairpin. J. Chemical
Physics, 121(1):415-425, 2004

New idea:
Paths computed with Molecular Dynamics
simulation techniques are used to create the
nodes of the roadmap
→ More pertinent/better distributed nodes
→ Edges are labeled with the time needed to
traverse them

50
Sampling Nodes from Computed Paths (Path Shooting)
F
U
51
Sampling Nodes from Computed Paths (Path Shooting)
F
U
52
Node Merging

If two nodes are closer than some ε, they
are merged into one, and merging rules are applied
to update edge probabilities and times

53
Node Merging

If two nodes are closer than some ε, they
are merged into one, and merging rules are applied
to update edge probabilities and times

→ Approximately uniform distribution of nodes
over the reachable subset of conformational space
54
Application: Computation of MFPT

Mean First Passage Time (MFPT): the average time at which a
protein first reaches its folded state
First-Step Analysis yields
$\mathrm{MFPT}(i) = \sum_j P_{ij} \times (t_{ij} + \mathrm{MFPT}(j))$
$\mathrm{MFPT}(i) = 0$ if $i \in F$
Assuming first-order kinetics, the probability
that a protein has folded by time t is
$P(t) = 1 - e^{-rt}$
where r is the folding rate
MFPT = 1/r
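
The first-step equations for the mean first passage time differ from the earlier ones only in that each edge carries its own traversal time. The sketch below assumes dictionaries `P[i][j]` and `T[i][j]` for edge probabilities and times (hypothetical names) and again uses a plain Gauss-Seidel iteration; folded nodes have MFPT 0.

```python
def mean_first_passage_times(P, T, folded, tol=1e-12, max_iter=100000):
    """P[i][j]: transition probability; T[i][j]: time needed to traverse edge i -> j.
    Solves MFPT(i) = sum_j P_ij * (t_ij + MFPT(j)), with MFPT = 0 on folded nodes.
    Assumes P_ii < 1 for every non-folded node."""
    m = {i: 0.0 for i in P}
    interior = [i for i in P if i not in folded]
    for _ in range(max_iter):
        delta = 0.0
        for i in interior:
            p_self = P[i].get(i, 0.0)
            expected_step_time = sum(p * T[i][j] for j, p in P[i].items())
            rest = sum(p * m[j] for j, p in P[i].items() if j != i)
            new = (expected_step_time + rest) / (1.0 - p_self)
            delta = max(delta, abs(new - m[i]))
            m[i] = new
        if delta < tol:
            break
    return m

def folding_rate(mfpt):
    """Under first-order kinetics, the rate is the reciprocal of the MFPT."""
    return 1.0 / mfpt
```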

55
Computational Test

12-residue tryptophan zipper beta hairpin (TZ2)
Folding@Home used to generate trajectories (fully
atomistic simulation) ranging from 10 to 450 ns
1750 trajectories (14 reaching folded state)
→ 22,400-node roadmap
MFPT 2-9 μs, which is similar to experimental
measurements (from fluorescence and IR)

56
Conformational Analysis of Protein Loops: J.
Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran.
Geometric Algorithms for the Conformational
Analysis of Long Protein Loops. J. Comp.
Chemistry, 25:956-967, 2004

New idea:
Explore the clash-free subset of the
conformational space of a loop by building a
tree-shaped roadmap
Kinematic model: φ-ψ angles on the backbone + χi
torsional angles in side-chains

Amylosucrase (AS)
- Only enzyme in its family that acts on
sucrose substrate
- The 17-residue loop (named loop 7) between
Gly433 and Gly449 is believed to play a pivotal role

58
Roadmap Construction

A tree-shaped roadmap is created from a start
conformation qstart
At each step of the roadmap construction, a
conformation qrand of the loop is picked at
random, and a new roadmap node is created by
iteratively pulling toward it the existing node
that is closest to qrand
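
The expansion step can be sketched as below. This is a simplified illustration: conformations are plain coordinate tuples in a Euclidean space, `is_valid` is a hypothetical caller-supplied test standing in for the clash and loop-closure checks, and the step size is arbitrary.

```python
import math
import random

def extend_tree(tree, q_rand, is_valid, step=0.1):
    """tree: dict node -> parent. Pulls the node nearest to q_rand toward it in
    small steps, stopping when q_rand is reached or the next step is invalid.
    Adds and returns the new node, or returns None if no progress was made."""
    q_near = min(tree, key=lambda q: math.dist(q, q_rand))
    q = q_near
    while True:
        d = math.dist(q, q_rand)
        if d <= step:
            q_next, done = q_rand, True
        else:
            q_next = tuple(a + (step / d) * (b - a) for a, b in zip(q, q_rand))
            done = False
        if not is_valid(q_next):
            break
        q = q_next
        if done:
            break
    if q == q_near:
        return None
    tree[q] = q_near
    return q

def build_tree(q_start, sample, is_valid, n_iters, step=0.1, rng=None):
    """Grow a tree-shaped roadmap from q_start using n_iters random targets."""
    rng = rng or random.Random()
    tree = {q_start: None}
    for _ in range(n_iters):
        extend_tree(tree, sample(rng), is_valid, step)
    return tree
```

Storing each node's parent makes it possible to trace any reached conformation back to the start conformation.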

59
Roadmap Construction
C
Cfree
Cclosed
qstart
60
Roadmap Construction
C
Cfree
Cclosed
qstart
61
Roadmap Construction
C
Cfree
Cclosed
qstart
62
Roadmap Construction
C
Cfree
Cclosed
qstart
Stops when one can't get closer to qrand or a
clash is detected
63
Computational Results

Surprisingly, loop 7 can't move much
Main bottleneck is residue Asp231

Positions of the Cα atom of the middle residue
(Ser441)
64
Computational Results

Surprisingly, loop 7 can't move much
Main bottleneck is residue Asp231

65
Computational Results

If residue Asp231 is removed, then loop 7's
mobility increases dramatically. The Cα atom of
Ser441 can be displaced by more than 9 Å from its
crystallographic position

66
Conclusion

Probabilistic roadmaps are a recent but
promising tool for exploring conformational space
and computing ensemble properties of molecular
pathways
Current/future research:
Better sampling strategies able to handle more
complex molecular models (protein-protein
binding)
More work to include time information in
roadmaps
More thorough experimental validation to compare
computed and measured quantitative properties