Molecular Motion Pathways: Computation of Ensemble Properties with Probabilistic Roadmaps - PowerPoint PPT Presentation

Description:

Title: Slide 1 Author: latombe Created Date: 7/23/2003 11:41:08 PM Document presentation format: On-screen Show Company: stanford university Other titles – PowerPoint PPT presentation

Slides: 66
Provided by: lat63
Learn more at: http://web.stanford.edu

Transcript and Presenter's Notes


1
Molecular Motion Pathways: Computation of
Ensemble Properties with Probabilistic Roadmaps
  1. A.P. Singh, J.C. Latombe, and D.L. Brutlag. A
    Motion Planning Approach to Flexible Ligand
    Binding. Proc. 7th Int. Conf. on Intelligent
    Systems for Molecular Biology (ISMB), AAAI Press,
    Menlo Park, CA, pp. 252-261, 1999.
  2. N.M. Amato, K.A. Dill, and G. Song. Using Motion
    Planning to Map Protein Folding Landscapes and
    Analyze Folding Kinetics of Known Native
    Structures. J. Comp. Biology, 10(2):239-255,
    2003.
  3. M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu,
    J.C. Latombe, and C. Varma. Stochastic Roadmap
    Simulation: An Efficient Representation and
    Algorithm for Analyzing Molecular Motion. J.
    Comp. Biology, 10(3-4):257-281, 2003.
  4. N. Singhal, C.D. Snow, and V.S. Pande. Using Path
    Sampling to Build Better Markovian State Models:
    Predicting the Folding Rate and Mechanism of a
    Tryptophan Zipper Beta Hairpin. J. Chemical
    Physics, 121(1):415-425, 2004.
  5. J. Cortés, T. Siméon, M. Renaud-Siméon, and V.
    Tran. Geometric Algorithms for the Conformational
    Analysis of Long Protein Loops. J. Comp.
    Chemistry, 25:956-967, 2004.

2
Molecular motion is an essential process of life
Mad cow disease is caused by misfolding
Drug molecules act by binding to proteins
3
So, studying molecular motion is of critical
importance in molecular biology
However, few tools are available
  • Computer simulation
  • Monte Carlo simulation
  • Molecular Dynamics

4
Two Major Drawbacks of MD and MC Simulation
  • Each simulation run yields a single pathway,
    while molecules tend to move along many different
    pathways
  • → Interest in ensemble properties

5
Example of Ensemble Property: Probability of
Folding (pfold)
Measure kinetic distance to folded state
Du, Pande, Grosberg, Tanaka, and Shakhnovich.
On the Transition Coordinate for Protein Folding.
Journal of Chemical Physics (1998).
Unfolded state
Folded state
6
Other Examples of Ensemble Properties
  • Folding
  • Order of formation of SSEs
  • Folding rate / Mean first passage time
  • Key intermediates
  • Binding
  • Average time to escape from active site
  • Average energy barrier

7
Two Major Drawbacks of MD and MC Simulation
  1. Each simulation run yields a single pathway,
    while molecules tend to move along many different
    pathways
  2. Each simulation run tends to waste much time in
    local minima

8
→ Roadmap-Based Representation
  • Compact representation of many motion pathways
  • Coarse resolution relative to MC and MD
    simulation
  • Efficient algorithms for analyzing multiple
    pathways

9
Roadmaps for Robot Motion Planning
10
Initial Work: A.P. Singh, J.C. Latombe, and D.L.
Brutlag. A Motion Planning Approach to Flexible
Ligand Binding. Proc. 7th ISMB, pp. 252-261, 1999
  • Study of ligand-protein binding
  • The ligand is a small flexible molecule, but the
    protein is assumed rigid
  • A fixed coordinate system P is attached to the
    protein and a moving coordinate system L is
    defined using three bonded atoms in the ligand
  • A conformation of the ligand is defined by the
    position and orientation of L relative to P and
    the torsional angles of the ligand

11
Roadmap Construction (Node Generation)
  • The nodes of the roadmap are generated by
    sampling conformations of the ligand uniformly at
    random in the parameter space (around the
    protein)
  • The energy E at each sampled conformation is
    computed
  • $E = E_{\text{interaction}} + E_{\text{internal}}$, where
    $E_{\text{interaction}}$ = electrostatic + van der Waals potential and
    $E_{\text{internal}} = \sum_{\text{non-bonded pairs of atoms}}$ (electrostatic + van der Waals)

12
Roadmap Construction (Node Generation)
  • The nodes of the roadmap are generated by
    sampling conformations of the ligand uniformly at
    random in the parameter space (around the
    protein)
  • The energy E at each sampled conformation is
    computed
  • $E = E_{\text{interaction}} + E_{\text{internal}}$, where
    $E_{\text{interaction}}$ = electrostatic + van der Waals potential and
    $E_{\text{internal}} = \sum_{\text{non-bonded pairs of atoms}}$ (electrostatic + van der Waals)
  • A sampled conformation is retained as a node of
    the roadmap with probability $P$, where
    $P = 0$ if $E > E_{\max}$,
    $P = (E_{\max} - E) / (E_{\max} - E_{\min})$ if $E_{\min} \le E \le E_{\max}$,
    $P = 1$ if $E < E_{\min}$
  • → Denser distribution of nodes in low-energy
    regions of conformational space
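
The retention rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `retention_probability` and `filter_nodes` are hypothetical, and the energies are assumed to have been computed elsewhere.

```python
import random


def retention_probability(e, e_min, e_max):
    """Probability of keeping a sampled conformation with energy e."""
    if e > e_max:
        return 0.0
    if e < e_min:
        return 1.0
    # Linear ramp from 1 at e_min down to 0 at e_max.
    return (e_max - e) / (e_max - e_min)


def filter_nodes(energies, e_min, e_max, rng=None):
    """Return the indices of the sampled conformations kept as roadmap nodes."""
    rng = rng or random.Random(0)
    return [i for i, e in enumerate(energies)
            if rng.random() < retention_probability(e, e_min, e_max)]
```

Low-energy samples are always kept and high-energy samples are always dropped, which is what produces the denser node distribution in low-energy regions.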

13
Roadmap Construction (Edge Generation)
  • Each node is connected to its closest neighbors
    by straight edges
  • Each edge is discretized so that between $q_i$ and
    $q_{i+1}$ no atom moves by more than some $\varepsilon$ (1 Å)
  • If any $E(q_i) > E_{\max}$, then the edge is rejected

14
Roadmap Construction (Edge Generation)
  • Any two nodes closer than some threshold
    distance are connected by a straight edge
  • Each edge is discretized so that between $q_i$ and
    $q_{i+1}$ no atom moves by more than some $\varepsilon$ (1 Å)
  • If all $E(q_i) \le E_{\max}$, then the edge is retained
    and is assigned two weights $w(q \to q')$ and $w(q' \to q)$
  • The weights (formula shown as an image on the slide) are
    defined in terms of the probability that the ligand moves
    from $q_i$ to $q_{i+1}$ when it is constrained to move
    along the edge
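
A minimal sketch of the edge test follows, under simplifying assumptions: a conformation is a plain list of parameters, the hypothetical `energy` callable stands in for the real energy function, and a fixed `n_steps` replaces the per-atom displacement bound used to choose the discretization.

```python
def interpolate(q_a, q_b, n_steps):
    """Evenly spaced conformations on the straight edge from q_a to q_b."""
    return [[a + (b - a) * k / n_steps for a, b in zip(q_a, q_b)]
            for k in range(n_steps + 1)]


def edge_is_valid(q_a, q_b, energy, e_max, n_steps):
    """Keep the edge only if no intermediate conformation exceeds e_max."""
    return all(energy(q) <= e_max for q in interpolate(q_a, q_b, n_steps))
```

A real implementation would pick the number of steps so that no atom moves by more than the chosen bound between consecutive conformations.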

15
Querying the Roadmap
  • For a given goal node $q_g$ (e.g., binding
    conformation), Dijkstra's single-source
    algorithm computes the lowest-weight paths from
    $q_g$ to each node (in either direction) in
    $O(N \log N)$ time, where $N$ = number of nodes
  • Various quantities can then be easily computed
    in $O(N)$ time, e.g., average weights of all
    paths entering $q_g$ and of all paths leaving $q_g$
    (binding and dissociation rates $K_{on}$ and $K_{off}$)
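
The query step can be illustrated with a standard heap-based Dijkstra. The graph encoding (a dict mapping each node to a list of `(neighbor, weight)` pairs) and the helper names are assumptions made for this sketch. Paths leaving the goal use the graph as given, and paths entering it use the reversed graph.

```python
import heapq


def lowest_weight_paths(graph, source):
    """Dijkstra: lowest path weight from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reverse_graph(graph):
    """Flip every directed edge, keeping its weight."""
    rev = {}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev


def average_weight(dist, source):
    """Average lowest path weight over all nodes other than the source."""
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others)
```

Calling `lowest_weight_paths(reverse_graph(graph), goal)` gives the weights of the best paths entering the goal, because a path from the goal in the reversed graph is a path to the goal in the original one.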

Protein: lactate dehydrogenase. Ligand: oxamate (7
degrees of freedom)
16
Experiments on 3 Complexes
  • PDB ID: 1ldm
  • Receptor: Lactate Dehydrogenase (2386 atoms, 309
    residues)
  • Ligand: Oxamate (6 atoms, 7 dofs)
  • PDB ID: 4ts1
  • Receptor: Mutant of tyrosyl-transfer-RNA
    synthetase (2423 atoms, 319 residues)
  • Ligand: L-leucyl-hydroxylamine (13 atoms, 9
    dofs)
  • PDB ID: 1stp
  • Receptor: Streptavidin (901 atoms, 121 residues)
  • Ligand: Biotin (16 atoms, 11 dofs)

17
Computation of Potential Binding Conformations
  • Sample many (several thousand) ligand
    conformations at random around the protein
  • Repeat several times
  • Select lowest-energy conformations that are
    close to the protein surface
  • Resample around them
  • Retain k (10) lowest-energy conformations
    whose centers of mass are at least 5 Å apart

lactate dehydrogenase
18
Results for 1ldm
  • Some potential binding sites have slightly lower
    energy than the active site → Energy is not a
    discriminating factor
  • Average path weights (energetic difficulty) to
    enter and leave binding site are significantly
    greater for the active site → Indicates that the
    active site is surrounded by an energy barrier
    that traps the ligand

19
(No Transcript)
20
Application of Roadmaps to Protein Folding:
N.M. Amato, K.A. Dill, and G. Song. Using Motion
Planning to Map Protein Folding Landscapes and
Analyze Folding Kinetics of Known Native
Structures. J. Comp. Biology, 10(2):239-255, 2003
  • Known native state
  • Degrees of freedom: φ-ψ angles
  • Energy: van der Waals, hydrogen bonds,
    hydrophobic effect
  • New idea: sampling strategy
  • Application: finding order of SSE formation

21
Sampling Strategy (Node Generation)
  • High dimensionality → non-uniform sampling
  • Conformations are sampled using Gaussian
    distribution around native state
  • Conformations are sorted into bins by number of
    native contacts (pairs of Cα atoms that are
    close together in the native structure)
  • Sampling ends when all bins have a minimum number
    of conformations → good coverage of
    conformational space
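
The stopping rule of this sampling strategy can be sketched as below. Everything here is an illustrative assumption: `sample` stands in for the Gaussian sampler around the native state, `count_native_contacts` for the contact counter, and the bin layout (equal-width bins) is not taken from the slides.

```python
def sample_until_bins_full(sample, count_native_contacts, n_bins,
                           bin_width, min_per_bin, max_samples=100000):
    """Draw conformations until every native-contact bin is populated."""
    bins = [[] for _ in range(n_bins)]
    for _ in range(max_samples):
        q = sample()
        # Conformations with many contacts fall into the last bin.
        b = min(count_native_contacts(q) // bin_width, n_bins - 1)
        bins[b].append(q)
        if all(len(members) >= min_per_bin for members in bins):
            break
    return bins
```

The `max_samples` cap is a safeguard added for the sketch, so the loop ends even if some bin is never reached.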

22
Application: Order of Formation of Secondary
Structures
  • The lowest-weight path is extracted from each
    denatured conformation to the folded one
  • The order of formation of SSEs is computed along
    each path
  • The formation order that appears the most often
    over all paths is considered the SSE formation
    order of the protein

23
Method
  1. The contact matrix showing the time step when
    each native contact appears is built

24
Protein CI2 (1 α + 4 β)
25
Protein CI2 (1 α + 4 β)
26
Method
  1. The contact matrix showing the time step when
    each native contact appears is built
  2. The time step at which a structure appears is
    approximated as the average of the appearance
    time steps of its contacts
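
The two steps of the method can be illustrated as follows. This is a sketch with hypothetical names: `contact_step` plays the role of the contact matrix (first time step at which each native contact appears along a path), and `structure_contacts` lists which contacts belong to each structure.

```python
def structure_formation_steps(contact_step, structure_contacts):
    """Average appearance time step of the contacts of each structure."""
    return {name: sum(contact_step[c] for c in contacts) / len(contacts)
            for name, contacts in structure_contacts.items()}


def formation_order(steps):
    """Structure names sorted by their approximate formation time step."""
    return sorted(steps, key=steps.get)
```

Applying `formation_order` to every extracted path and keeping the most frequent result would give the overall SSE formation order described earlier.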

27
α forms at time step 122 (II)
β3 and β4 come together at 187 (V)
β2 and β3 come together at 210 (IV)
β1 and β4 come together at 214 (I)
α and β4 come together at 214 (III)
Protein CI2 (1 α + 4 β)
28
Method
  1. The contact matrix showing the time step when
    each native contact appears is built
  2. The time step at which a structure appears is
    approximated as the average of the appearance
    time steps of its contacts

29
Comparison with Experimental Data
30
Stochastic Roadmaps: M.S. Apaydin, D.L. Brutlag,
C. Guestrin, D. Hsu, J.C. Latombe and C. Varma.
Stochastic Roadmap Simulation: An Efficient
Representation and Algorithm for Analyzing
Molecular Motion. J. Comp. Biol.,
10(3-4):257-281, 2003
  • New idea: capture the stochastic nature of
    molecular motion by assigning probabilities to
    edges

31
Edge probabilities
Follow Metropolis criteria
Self-transition probability
vj
Roadmap nodes are sampled uniformly at random
and energy profile along edges is not considered
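
The slide's own formulas are not part of the transcript, so the sketch below uses one common form of the Metropolis criterion as an assumption: a move from node i to one of its N_i neighbors j is proposed with probability 1/N_i and accepted with probability min(1, exp(-(E_j - E_i)/kT)), and the leftover probability becomes the self-transition. The function name and data layout are hypothetical.

```python
import math


def transition_probabilities(energies, neighbors, kT=1.0):
    """Row-stochastic transition probabilities P[i][j] on the roadmap."""
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energies[j] - energies[i]
            # Downhill moves are always accepted; uphill moves are damped.
            accept = 1.0 if dE <= 0 else math.exp(-dE / kT)
            row[j] = accept / len(nbrs)
        # Self-transition keeps each row summing to 1.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P
```

Only node energies enter the computation, consistent with the remark that the energy profile along edges is not considered.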
32
Stochastic Roadmap Simulation
V
Pij
33
Roadmap as Markov Chain
j
Pij
i
  • Transition probability Pij depends only on i and
    j

34
Example 1: Probability of Folding (pfold)
Unfolded state
Folded state
35
First-Step Analysis
  • One linear equation per node
  • Solution gives pfold for all nodes
  • No explicit simulation run
  • All pathways are taken into account
  • Sparse linear system

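(Figure: node i with self-transition probability P_ii and
transition probabilities P_ij, P_ik, P_il, P_im to its neighbors j, k, l, m)
Let $f_i = \text{pfold}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$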
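
As an illustration of first-step analysis, the sketch below solves the one-equation-per-node system by repeated sweeps (Gauss-Seidel iteration), with pfold fixed to 1 on folded nodes and 0 on unfolded nodes. The iterative solver and the dict-of-dicts encoding of the sparse matrix are choices made for this sketch. Any sparse linear solver could be used instead.

```python
def pfold(P, folded, unfolded, sweeps=10000, tol=1e-12):
    """Solve f_i = sum_j P_ij f_j with f = 1 on folded, f = 0 on unfolded.

    P is a sparse row-stochastic matrix: dict i -> dict j -> P_ij.
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    for _ in range(sweeps):
        change = 0.0
        for i in P:
            if i in folded or i in unfolded:
                continue  # boundary values stay fixed
            new = sum(p * f[j] for j, p in P[i].items())
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f
```

For a four-node chain U, a, b, F with equal probabilities of stepping left or right, this returns pfold values of 1/3 for a and 2/3 for b without running a single simulation.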
36
Number of Self-Avoiding Walks on a 2D Grid
1, 2, 12, 184, 8512, 1262816, 575780564,
789360053252, 3266598486981642,
(10x10) 41044208702632496804,
(11x11) 1568758030464750013214100,
(12x12) 182413291514248049241470885236 > 10^28
http://mathworld.wolfram.com/Self-AvoidingWalk.html
37
In contrast
  • Computing pfold with MC simulation requires,
    for every conformation q of interest:
  • Performing many MC simulation runs from q
  • Counting the number of times F is attained first

38
Computational Tests
  • 1ROP (repressor of primer)
  • 2 α-helices
  • 6 DOF
  • 1HDD (Engrailed homeodomain)
  • 3 α-helices
  • 12 DOF

H-P energy model with steric clash exclusion
[Sun et al., 95]
39
Correlation with MC Approach
1ROP
40
pfold for β-hairpin
Immunoglobulin binding protein (Protein G)
Last 16 amino acids
Cα-based representation
Gō model energy function
42 DOFs
[Zhou and Karplus, 99]
41
Comparison between SRS and MC for β-hairpin
for 100 conformations
42
Computation Times (β-hairpin)
Monte Carlo (30 simulations)
Over 10^7 energy computations
10 hours of computer time
1 conformation
Roadmap
50,000 energy computations
23 seconds of computer time
2000 conformations
6 orders of magnitude speedup!
43
Example 2: Ligand-Protein Interaction
Computation of escape time from funnels of
attraction around potential binding sites
Funnel of attraction: ball of 10 Å RMSD
around bound state [Camacho and Vajda, 01]
44
Computation Through Simulation [Sept, Elcock and
McCammon, 99]
10K to 30K independent simulations
45
Computing Escape Time with Roadmap
$t_i = 1 + P_{ii} t_i + P_{ij} t_j + P_{ik} t_k + P_{il} t_l + P_{im} t_m$
(escape time is measured as number of steps of
stochastic simulation; $t = 0$ for nodes outside the funnel)
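
The escape-time equations have the same shape as the pfold equations and can be solved the same way. In the sketch below, `funnel` is the set of nodes inside the funnel of attraction, nodes outside it keep an escape time of 0, and every funnel node is assumed to have a self-transition probability strictly below 1. The function name and the iterative solver are choices made for this sketch.

```python
def escape_times(P, funnel, sweeps=100000, tol=1e-10):
    """Solve t_i = 1 + sum_j P_ij t_j on funnel nodes, t = 0 elsewhere.

    P is a sparse row-stochastic matrix: dict i -> dict j -> P_ij.
    """
    t = {i: 0.0 for i in P}
    for _ in range(sweeps):
        change = 0.0
        for i in funnel:
            p_ii = P[i].get(i, 0.0)
            rest = sum(p * t[j] for j, p in P[i].items() if j != i)
            # Move the self-transition term to the left-hand side.
            new = (1.0 + rest) / (1.0 - p_ii)
            change = max(change, abs(new - t[i]))
            t[i] = new
        if change < tol:
            break
    return t
```

A single funnel node that leaves with probability 0.5 at each step gets an escape time of 2 steps, the mean of the corresponding geometric distribution.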
46
Distinguishing Active Site
  • Given several potential binding sites, which one
    is the active one?

Energy: electrostatic + van der Waals + solvation
free energy terms
47
Complexes Studied
Ligand          Protein   Random nodes   DOFs
oxamate         1ldm      8000           7
Streptavidin    1stp      8000           11
Hydroxylamine   4ts1      8000           9
COT             1cjw      8000           21
THK             1aid      8000           14
IPM             1ao5      8000           10
PTI             3tpi      8000           13
48
Distinction Using Escape Time
Escape time (steps)
Protein   Bound state   Best potential binding site
1stp      3.4E9         1.1E7
4ts1      3.8E10        1.8E6
3tpi      1.3E11        5.9E5
1ldm      8.1E5         3.4E6
1cjw      5.4E8         4.2E6
1aid      9.7E5         1.6E8
1ao5      6.6E7         5.7E6
Able to distinguish catalytic site
Not able
49
Using Path Sampling to Construct Roadmaps: N.
Singhal, C.D. Snow, and V.S. Pande. Using Path
Sampling to Build Better Markovian State Models:
Predicting the Folding Rate and Mechanism of a
Tryptophan Zipper Beta Hairpin. J. Chemical
Physics, 121(1):415-425, 2004
  • New idea
  • Paths computed with Molecular Dynamics
    simulation techniques are used to create the
    nodes of the roadmap → More pertinent/better
    distributed nodes
  • → Edges are labeled with the time needed to
    traverse them

50
Sampling Nodes from Computed Paths (Path Shooting)
F
U
51
Sampling Nodes from Computed Paths (Path Shooting)
F
U
52
Node Merging
  • If two nodes are closer than some ε, they
    are merged into one and merging rules are applied
    to update edge probabilities and times

53
Node Merging
  • If two nodes are closer than some ε, they
    are merged into one and merging rules are applied
    to update edge probabilities and times

→ Approximately uniform distribution of nodes
over the reachable subset of conformational space
54
Application: Computation of MFPT
  • Mean First Passage Time: the average time when a
    protein first reaches its folded state
  • First-Step Analysis yields
  • $\text{MFPT}(i) = \sum_j P_{ij} \times (t_{ij} + \text{MFPT}(j))$
  • $\text{MFPT}(i) = 0$ if $i \in F$
  • Assuming first-order kinetics, the probability
    that a protein folds at time t is a function of t
    and the folding rate r (formula shown as an image
    on the slide)
  • MFPT = 1/r
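
The MFPT recurrence can be sketched with the same kind of iterative sweep, now with a traversal time on each edge. The names `P`, `T` and `folded` and the dict-of-dicts encoding are assumptions of this sketch, and convergence assumes the folded set is reachable from every node.

```python
def mean_first_passage_times(P, T, folded, sweeps=100000, tol=1e-10):
    """Solve MFPT(i) = sum_j P_ij * (t_ij + MFPT(j)), with MFPT = 0 on folded.

    P[i][j] is the transition probability and T[i][j] the edge time.
    """
    m = {i: 0.0 for i in P}
    for _ in range(sweeps):
        change = 0.0
        for i in P:
            if i in folded:
                continue  # already folded: passage time is zero
            new = sum(p * (T[i][j] + m[j]) for j, p in P[i].items())
            change = max(change, abs(new - m[i]))
            m[i] = new
        if change < tol:
            break
    return m
```

With the relation MFPT = 1/r, the folding rate for a given start node is the reciprocal of the returned value, in the inverse of whatever time unit `T` uses.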

55
Computational Test
  • 12-residue tryptophan zipper beta hairpin (TZ2)
  • Folding@Home used to generate trajectories (fully
    atomistic simulation) ranging from 10 to 450 ns
  • 1750 trajectories (14 reaching folded state)
  • → 22,400-node roadmap
  • MFPT: 2-9 μs, which is similar to experimental
    measurements (from fluorescence and IR)

56
Conformational Analysis of Protein Loops: J.
Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran.
Geometric Algorithms for the Conformational
Analysis of Long Protein Loops. J. Comp.
Chemistry, 25:956-967, 2004
  • New idea
  • Explore the clash-free subset of the
    conformational space of a loop, by building a
    tree-shaped roadmap
  • Kinematic model: φ-ψ angles on the backbone + χi
    torsional angles in side-chains

57
  • Amylosucrase (AS)
  • Only enzyme in its family that acts on
    sucrose substrate
  • The 17-residue loop (named loop 7) between
    Gly433 and Gly449 is believed to play a
    pivotal role

58
Roadmap Construction
  • A tree-shaped roadmap is created from a start
    conformation qstart
  • At each step of the roadmap construction, a
    conformation qrand of the loop is picked at
    random, and a new roadmap node is created by
    iteratively pulling toward it the existing node
    that is closest to qrand
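
One expansion step of such a tree can be sketched as follows. This is a generic illustration in the style of tree-based planners, not the authors' implementation: conformations are plain tuples, distance is Euclidean, and the hypothetical `is_valid` callable stands in for the clash and loop-closure tests.

```python
import math


def extend_tree(nodes, parents, q_rand, is_valid, step=0.1, max_iters=100):
    """Pull the node nearest to q_rand toward it; add where the pull stops."""
    near = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], q_rand))
    q = nodes[near]
    for _ in range(max_iters):
        d = math.dist(q, q_rand)
        if d < 1e-9:
            break  # reached q_rand
        s = min(step, d)
        q_next = tuple(a + s * (b - a) / d for a, b in zip(q, q_rand))
        if not is_valid(q_next):
            break  # a constraint would be violated: stop here
        q = q_next
    if q == nodes[near]:
        return None  # no progress was possible
    nodes.append(q)
    parents.append(near)
    return len(nodes) - 1
```

Repeating this step with freshly sampled `q_rand` values grows the tree outward from the start conformation, with `parents` recording the tree structure.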

59
Roadmap Construction
C
Cfree
Cclosed
qstart
60
Roadmap Construction
C
Cfree
Cclosed
qstart
61
Roadmap Construction
C
Cfree
Cclosed
qstart
62
Roadmap Construction
C
Cfree
Cclosed
qstart
Stops when one can't get closer to qrand or a
clash is detected
63
Computational Results
  • Surprisingly, loop 7 can't move much
  • Main bottleneck is residue Asp231

Positions of the Cα atom of middle residue
(Ser441)
64
Computational Results
  • Surprisingly, loop 7 can't move much
  • Main bottleneck is residue Asp231

65
Computational Results
  • If residue Asp231 is removed, then loop 7's
    mobility increases dramatically. The Cα atom of
    Ser441 can be displaced by more than 9 Å from its
    crystallographic position

66
Conclusion
  • Probabilistic roadmaps are a recent but
    promising tool for exploring conformational space
    and computing ensemble properties of molecular
    pathways
  • Current/future research:
  • Better sampling strategies able to handle more
    complex molecular models (protein-protein
    binding)
  • More work to include time information in
    roadmaps
  • More thorough experimental validation to compare
    computed and measured quantitative properties