Title: Robotics Algorithms for the Study of Protein Structure and Motion
1Robotics Algorithms for the Study of Protein
Structure and Motion
- Jean-Claude Latombe
- Computer Science DepartmentStanford University
Based on Itay Lotans PhD
2Folded (native) state
3Folded State
- Loops connect ? helices and ? strands
4Protein Sequence Structure
amino-acid (residue)
5f-y Kinematic Linkage Model
? Conformational space
6Molecule ? Robot
7Why Studying Proteins?
- They perform many vital functions, e.g.
- catalysis of reactions
- storage of energy
- transmission of signals
- building blocks of muscles
- They are linked to key biological problems that
raise major computational challenges - mostly due to their large sizes (100s to several
1000s of atoms), many degrees of kinematic
freedom, and their huge number (millions)
8Two problems
- Structure determination from electron density
maps - Inverse kinematics techniques
- Itay Lotan, Henry van den Bedem, Ashley Deacon
(Joint Center for Structural Genomics) - Energy maintenance during Monte Carlo simulation
- Distance computation techniques
- Itay Lotan, Fabian Schwarzer, and Danny
Halperin (Tel Aviv University)
9Structure Determination X-Ray Crystallography
10Software
- Software systems RESOLVE, TEXTAL, ARP/wARP, MAID
- 1.0Å lt d lt 2.3Å 90 completeness
- 2.3Å d lt 3.0Å 67 completeness (varies
widely)1
1.0Å
3.0Å
JCSG 43 of data sets ? 2.3Å
- ? Manually completing a model
- Labor intensive, time consuming
- Existing tools are highly interactive
? Model completion is high-throughput bottleneck
1Badger (2003) Acta Cryst. D59
11The Completion Problem
- Input
- Electron-density map
- Partial structure
- Two anchor residues
- Amino-acid sequence of missing fragment
(typically 4 15 residues long) - Output
- Ranked conformations Q of fragment that
- Respect the closure constraint
- Maximize target function T(Q) measuring fit with
electron-density map - No atomic clashes
Partial structure(folded)
12Two-Stage IK Method
- Candidate generations? Closed fragments
- Candidate refinement? Optimize fit with EDM
13Stage 1 Candidate Generation
- Generate a random conformation of fragment (only
one end attached to anchor) - Close fragment (i.e., bring other end to second
anchor) using Cyclic Coordinate Descent (CCD)
(Wang Chen 91, Canutescu Dunbrack 03)
14Closure Distance
A.A. Canutescu and R.L. Dunbrack Jr.Cyclic
coordinate descent A robotics algorithm for
protein loop closure. Prot. Sci. 12963972,
2003.
Compute bias toward
avoiding steric clashes
15Exact Inverse Kinematics
- Repeat for each conformation of a closed
fragment - Pick 3 amino-acids at random (3 pairs of f-y
angles) - Apply exact IK solver to generate all IK
solutions Coutsias et al, 2004
16TM0813
GLU-83
GLY-96
17Stage 2 Candidate Refinement
- Target function T (Q) measuring quality of the
fit with the EDM - Minimize T while retaining closure
- Closed conformations lie on a self-motion
manifold of lower dimension
Null space
1-D manifold
18Closure and Null Space
- dX J dQ, where J is the 6?n Jacobian matrix (n
gt 6) - Null space dQ J dQ 0 has dim n 6
- N orthonormal basis of null space
- dQ NNT ?T(Q)
X
19 Computation of N
SVD of J
S6?6
dX
U6?6
VT6?n
dQ
20Refinement Procedure
- Repeat until minimum of T is reached
- Compute J and N at current Q
- Compute ?T at current Q(analytical expression of
?T linear-time recursive computation Abe et
al., Comput. Chem., 1984) - Move by small increment along dQ NNT ?T
( Monte Carlo / simulated annealing protocol to
deal with local minima)
21TM0813
GLU-83
GLY-96
22Tests 1 Artificial Gaps
- TM1621 (234 residues) and TM0423 (376 residues),
SCOP classification a/b - Complete structures (gold standard) resolved with
EDM at 1.6Å resolution - Compute EDM at 2, 2.5, and 2.8Å resolution
- Remove fragments and rebuild
23TM1621 103 Fragments from TM1621 at 2.5Å
Short Fragments 100 lt 1.0Å aaRMSD
Long Fragments 12 96 lt 1.0Å aaRMSD 15 88 lt
1.0Å aaRMSD
Produced by H. van den Bedem
24Example TM0423
PDB 1KQ3, 376 res. 2.0Å resolution 12 residue
gap Best 0.3Å aaRMSD
25Tests 2 True Gaps
- Structure computed by RESOLVE
- Gaps completed independently (gold standard)
- Example TM1742 (271 residues)
- 2.4Å resolution 5 gaps left by RESOLVE
Length Top scorer
4 0.22Å
5 0.78Å
5 0.36Å
7 0.72Å
10 0.43Å
Produced by H. van den Bedem
26TM1621
- Green manually completed conformation
- Cyan conformation computed by stage 1
- Magenta conformation computed by stage 2
- The aaRMSD improved by 2.4Å to 0.31Å
27Current/Future Work
- Software actively being used at the JCSG
- What about multi-modal loops?
28- TM0755 data at 1.8Å
- 8-residue fragment crystallized in 2
conformations - Overlapping density Difficult to interpret
manually
Algorithm successfully identified and built both
conformations
29Current/Future Work
- Software actively being used at the JCSG
- What about multi-modal loops?
- Fuzziness in EDM can then be exploited
- Use EDM to infer probability measure over the
conformation space of the loop
30Amylosucrase
J. Cortés, T. Siméon, M. Renaud-Siméon, and V.
Tran. J. Comp. Chemistry, 25956-967, 2004
31Energy maintenance during Monte Carlo simulation
- joint work with Itay Lotan, Fabian Schwarzer, and
Dan Halperin11 Computer Science Department, Tel
Aviv University
32Monte Carlo Simulation (MCS)
- Random walk through conformation space
- At each attempted step
- Perturb current conformation at random
- Accept step with probability
- The conformations generated by an arbitrarily
long MCS are Boltzman distributed, i.e., - conformations in V
33Monte Carlo Simulation (MCS)
- Used to
- sample meaningful distributions of conformations
- generate energetically plausible motion pathways
- A simulation run may consist of millions of steps
? energy must be evaluated a large number of
times - Problem How to maintain energy efficiently?
34Energy Function
- E S bonded terms S
non-bonded terms
S solvation terms - Bonded terms - O(n)
- Non-bonded terms - E.g., Van der Waals and
electrostatic- Depend on distances between pairs
of atoms - O(n2) ? Expensive to compute - Solvation terms- May require computing molecular
surface
35Non-Bonded Terms
- Energy terms go to 0 when distance increases ?
Cutoff distance (6 - 12Å) - vdW forces prevent atoms from bunching up ?
Only O(n) interacting pairs
HalperinOvermars 98
Problem How to find interacting pairswithout
enumerating all atom pairs?
36Grid Method
- Subdivide 3-space into cubic cells
- Compute cell that contains each atom center
- Represent grid as hashtable
37Grid Method
- T(n) time to build grid
- O(1) time to find interactive pairs for each atom
- T(n) to find all interactive pairs of atoms
HalperinOvermars, 98 - Asymptotically optimal in worst-case
38Can we do better on average?
- Few DOFs are changed at each MC step
simulationof 100,000 attempted steps
39Can we do better on average?
- Few DOFs are changed at each MC step
- Proteins are long chain kinematics
Long sub-chains stay rigid at each step ? Many
interacting pairs of atoms are unchanged ? Many
partial energy sums remain constant
Problem How to find new interacting pairs and
retrieve unchanged partial sums?
40Two New Data Structures
- ChainTree ? Fast detection of interacting atom
pairs -
- EnergyTree ? Retrieval of unchanged partial
energy sums
41ChainTree(Twofold Hierarchy BVs Transforms)
links
42ChainTree(Twofold Hierarchy BVs Transforms)
joints
43Updating the ChainTree
- Update path to root
- Recompute transforms that shortcut the DOF
change - Recompute BVs that contain the DOF change
- O(k log2(2n/k)) work for k changes
44Finding Interacting Pairs
??
45Finding Interacting Pairs
46Finding Interacting Pairs
- Do not search inside rigid sub-chains (unmarked
nodes)
47Finding Interacting Pairs
- Do not search inside rigid sub-chains (unmarked
nodes) - Do not test two nodes with no marked node between
them - ? New interacting pairs
48EnergyTree
E(N,N)
E(K.L)
E(M,M)
E(L,L)
E(J,L)
49EnergyTree
E(N,N)
E(K.L)
E(M,M)
E(L,L)
E(J,L)
50Complexity
- n total number of DOFs
- k number of DOF changes at each MCS step
- k ltlt n
- Complexity of
- updating ChainTree O(k log2(2n/k))
- finding interacting pairs O(n4/3) but performs
much better in practice!!!
51Experimental Setup
- Energy function
- Van der Waals
- Electrostatic
- Attraction between native contacts
- Cutoff at 12Å
- 300,000 steps MCS with Grid and ChainTree
- Steps are the same with both methods
- Early rejection for large vdW terms
52Results 1-DOF change
12.5
7.8
speedup
5.8
3.5
amino acids
53Results 5-DOF change
5.9
speedup
4.5
3.4
2.2
54Two-Pass ChainTree (ChainTree)
1st pass small cutoff distance to detect steric
clashes 2nd pass normal cutoff distance
Tests around native state
gt5
55Interaction with Solvent
- Implicit solvent model solvent as continuous
medium, interface is solvent-accessible surface
E. Eyal, D. Halperin. Dynamic Maintenance of
Molecular Surfaces under Conformational Changes.
http//www.give.nl/movie/publications/telaviv/EH0
4.pdf
56Summary
- Inverse kinematics techniques ? Improve structure
determination from fuzzy electron density maps - Collision detection techniques ? Speedup energy
maintenance during Monte Carlo simulation
57About Computational Biology
- Computational Biology is more than mimicking
nature (e.g., performing Molecular Dynamic
simulation) - One of its goals is to achieve algorithmic
efficiency by exploiting properties of molecules,
e.g. - Atoms cannot bunch up together
- Forces have relatively short ranges
- Proteins are long kinematic chains