Loading...

PPT – An improved hybrid Monte Carlo method for conformational sampling of proteins PowerPoint presentation | free to download - id: 595d5b-ZTVkZ

The Adobe Flash plugin is needed to view this content

An improved hybrid Monte Carlo method for

conformational sampling of proteins

Jesús A. Izaguirre and Scott Hampton Department

of Computer Science and Engineering University of

Notre Dame March 5, 2003 This work is

partially supported by two NSF grants (CAREER and

BIOCOMPLEXITY) and two grants from University of

Notre Dame

Overview

2. Methods for sampling (MD, HMC)

1. Motivation sampling conformational space of

proteins

3. Evaluation of new Shadow HMC

4. Future applications

Protein The Machinery of Life

NH2-Val-His-Leu-Thr-Pro-Glu-Glu- Lys-Ser-Ala-Val-T

hr-Ala-Leu-Trp- Gly-Lys-Val-Asn-Val-Asp-Glu-Val- G

ly-Gly-Glu-..

Protein Structure

Why protein folding?

- Huge gap sequence data and 3D structure data
- EMBL/GENBANK, DNA (nucleotide) sequences 15

million sequence, 15,000 million base pairs - SWISSPROT, protein sequences 120,000 entries
- PDB, 3D protein structures 20,000 entries
- Bridging the gap through prediction
- Aim of structural genomics
- Structurally characterize most of the protein

sequences by an efficient combination of

experiment and prediction, Baker and Sali (2001) - Thermodynamics hypothesis Native state is at

the global free energy minimum Anfinsen

(1973)

Questions related to folding I

- Long time kinetics dynamics of folding
- only statistical correctness possible
- ensemble dynamics
- e.g., folding_at_home
- Short time kinetics
- strong correctness possible
- e.g., transport properties, diffusion coefficients

Questions related to folding II

- Sampling
- Compute equilibrium averages by visiting all

(most) of important conformations - Examples
- Equilibrium distribution of solvent molecules in

vacancies - Free energies
- Characteristic conformations (misfolded and

folded states)

Overview

2. Methods for sampling (MD, HMC)

1. Motivation sampling conformational space of

proteins

3. Evaluation of new Shadow HMC

4. Future applications

Classical molecular dynamics

- Newtons equations of motion
- Atoms
- Molecules
- CHARMM force field (Chemistry at Harvard

Molecular Mechanics)

Bonds, angles and torsions

What is a Forcefield?

In molecular dynamics a molecule is described as

a series of charged points (atoms) linked by

springs (bonds).

To describe the time evolution of bond lengths,

bond angles and torsions, also the nonbond van

der Waals and elecrostatic interactions between

atoms, one uses a forcefield. The forcefield is a

collection of equations and associated constants

designed to reproduce molecular geometry and

selected properties of tested structures.

Energy Terms Described in the CHARMm forcefield

Bond

Angle

Dihedral

Improper

Energy Functions

Ubond oscillations about the equilibrium bond

length Uangle oscillations of 3 atoms about an

equilibrium angle Udihedral torsional rotation

of 4 atoms about a central bond Unonbond

non-bonded energy terms (electrostatics and

Lennard-Jones)

Molecular Dynamics what does it mean?

MD change in conformation over time using a

forcefield

Energy

Energy supplied to the minimized system at the

start of the simulation

Conformation impossible to access through MD

Conformational change

MD, MC, and HMC in sampling

- Molecular Dynamics takes long steps in phase

space, but it may get trapped - Monte Carlo makes a random walk (short steps), it

may escape minima due to randomness - Can we combine these two methods?

MC

MD

HMC

Hybrid Monte Carlo

- We can sample from a distribution with density

p(x) by simulating a Markov chain with the

following transitions - From the current state, x, a candidate state x

is drawn from a proposal distribution S(x,x).

The proposed state is accepted with

prob. min1,(p(x) S(x,x)) / (p(x) S(x,x)) - If the proposal distribution is symmetric,

S(x,x)) S(x,x)), then the acceptance prob.

only depends on p(x) / p(x)

Hybrid Monte Carlo II

- Proposal functions must be reversible
- if x s(x), then x s(x)
- Proposal functions must preserve volume
- Jacobian must have absolute value one
- Valid proposal x -x
- Invalid proposals
- x 1 / x (Jacobian not 1)
- x x 5 (not reversible)

Hybrid Monte Carlo III

- Hamiltonian dynamics preserve volume in phase

space - Hamiltonian dynamics conserve the Hamiltonian

H(q,p) - Reversible symplectic integrators for Hamiltonian

systems preserve volume in phase space - Conservation of the Hamiltonian depends on the

accuracy of the integrator - Hybrid Monte Carlo Use reversible symplectic

integrator for MD to generate the next proposal

in MC

HMC Algorithm

- Perform the following steps
- 1. Draw random values for the momenta p from

normal distribution use given positions q - 2. Perform cyclelength steps of MD, using a

symplectic reversible integrator with timestep

?t, generating (q,p) - 3. Compute change in total energy
- ?H H(q,p) - H(q,p)
- 4. Accept new state based on exp(-? ?H )

Hybrid Monte Carlo IV

- Advantages of HMC
- HMC can propose and accept distant points in

phase space, provided the accuracy of the MD

integrator is high enough - HMC can move in a biased way, rather than in a

random walk (distance k vs sqrt(k)) - HMC can quickly change the probability density

Hybrid Monte Carlo V

- As the number of atoms increases, the total error

in the H(q,p) increases. The error is related to

the time step used in MD - Analysis of N replicas of multivariate Gaussian

distributions shows that HMC takes O(N5/4 ) with

time step ?t O(N-1/4) Kennedy Pendleton, 91

System size N Max ?t

66 0.5

423 0.25

868 0.1

5143 0.05

Hybrid Monte Carlo VI

- The key problem in scaling is the accuracy of the

MD integrator - More accurate methods could help scaling
- Creutz and Gocksch 89 proposed higher order

symplectic methods for HMC - In MD, however, these methods are more expensive

than the scaling gain. They need more force

evaluations per step

Overview

2. Methods for sampling (MD, HMC)

1. Motivation sampling conformational space of

proteins

3. Evaluation of new Shadow HMC

4. Future applications

Improved HMC

- Symplectic integrators conserve exactly (within

roundoff error) a modified Hamiltonian that for

short MD simulations (such as in HMC) stays close

to the true Hamiltonian Sanz-Serna Calvo 94 - Our idea is to use highly accurate approximations

to the modified Hamiltonian in order to improve

the scaling of HMC

Shadow Hamiltonian

- Work by Skeel and Hardy, 2001, shows how to

compute an arbitrarily accurate approximation to

the modified Hamiltonian, called the Shadow

Hamiltonian - Hamiltonian H1/2pTM-1p U(q)
- Modified Hamiltonian HM H O(?t p)
- Shadow Hamiltonian SH2p HM O(?t 2p)
- Arbitrary accuracy
- Easy to compute
- Stable energy graph
- Example, SH4 H f( qn-1, qn-2, pn-1, pn-2

,ßn-1 ,ßn-2)

- See comparison of SHADOW and ENERGY

Shadow HMC

- Replace total energy H with shadow energy
- ?SH2m SH2m (q,p) SH2m (q,p)
- Nearly linear scalability of sampling rate
- Computational cost SHMC, N(11/2m), where m is

accuracy order of integrator - Extra storage (m copies of q and p)
- Moderate overhead (25 for small proteins)

Example Shadow Hamiltonian

ProtoMol a framework for MD

Matthey, et al, ACM Tran. Math. Software (TOMS),

submitted

Modular design of ProtoMol (Prototyping Molecular

dynamics). Available at http//www.cse.nd.edu/lcl

s/protomol

SHMC implementation

- Shadow Hamiltonian requires propagation of ß
- Can work for any integrator

Systems tested

Sampling Metric 1

- Generate a plot of dihedral angle vs. energy for

each angle - Find local maxima
- Label bins between maxima
- For each dihedral angle, print the label of the

energy bin that it is currently in

Sampling Metric 2

- Round each dihedral angle to the nearest degree
- Print label according to degree

Acceptance Rates

More Acceptance Rates

Sampling rate for decalanine (dt 2 fs)

Sampling rate for 2mlt

Sampling rate comparison

- Cost per conformation is total simulation time

divided by number of new conformations discovered

(2mlt, dt 0.5 fs) - HMC 122 s/conformation
- SHMC 16 s/conformation
- HMC discovered 270 conformations in 33000 seconds
- SHMC discovered 2340 conformations in 38000

seconds

Conclusions

- SHMC has a much higher acceptance rate,

particularly as system size and timestep increase - SHMC discovers new conformations more quickly
- SHMC requires extra storage and moderate

overhead. - SHMC works best at relatively large timesteps

Future work

- Multiscale problems for rugged energy surface
- Multiple time stepping algorithms plus

constraining - Temperature tempering and multicanonical ensemble
- Potential smoothing
- System size
- Parallel Multigrid O(N) electrostatics
- Applications
- Free energy estimation for drug design
- Folding and metastable conformations
- Average estimation

Acknowledgments

- Dr. Thierry Matthey, lead developer of ProtoMol,

University of Bergen, Norway - Graduate students Qun Ma, Alice Ko, Yao Wang,

Trevor Cickovski - Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr.

Christoph Schutte for valuable discussions

Multiple time stepping

- Fast/slow force splitting
- Bonded fast
- Small periods
- Long range nonbonded slow
- Large characteristic time
- Evaluate slow forces less frequently
- Fast forces cheap
- Slow force evaluation expensive

The Impulse integrator

- Grubmüller,Heller, Windemuth and Schulten,

1991 Tuckerman, Berne and Martyna, 1992 - The impulse train

Fast impulses, ?t

Time, t

Slow impulses, ?t

How far apart can we stretch the impulse train?

Stretching slow impulses

- ?t 100 fs if accuracy does not degenerate
- 1/10 of the characteristic time
- MaIz, SIAM J. Multiscale

Modeling and Simulation, 2003 (submitted) - Resonances (let ? be the shortest period)
- Natural ?t n ?, n 1, 2, 3,
- Numerical
- Linear ?t ?/2
- Nonlinear ?t ?/3
- MaIS_a, SIAM J. on Sci. Comp. (SISC), 2002 (in

press) - MaIS_b, 2003 ACM Symp. App. Comp. (SAC03), 2002

(in press)

Impulse far from being multiscale!

3rd order nonlinear resonance of Impulse

MaIS_a, SISC, 2002 MaIS_b, ACM SAC03, 2002

Fig. 1 Left flexible water system. Right

Energy drift from 500ps MD simulation of flexible

water at room temperature revealing 31 and 41

nonlinear resonance (3.3363 and 2.4 fs)

Overview

- Introduction
- Molecular dynamics (MD) in action
- Classical MD
- Protein folding
- Nonlinear instabilities of Impulse integrator
- Approximate MD integrators
- Targeted MOLLY
- MUSICO
- Applications
- Summary
- Future work
- Acknowledgements

Objective statement

- Design multiscale integrators that are not

limited by nonlinear and linear instabilities - Allowing larger time steps
- Better sequential performance
- Better scaling

Overview

- Introduction
- Molecular dynamics (MD) in action
- Classical MD
- Protein folding
- Nonlinear instabilities of Impulse integrator
- Approximate MD integrators
- Targeted MOLLY
- MUSICO
- Applications
- Summary
- Future work
- Acknowledgements

Targeted MOLLY (TM)

MaIz, 2003 ACM Symp. App. Comp. (SAC03), 2002

(in press) MaIz, SIAM J. Multiscale Modeling and

Simulation, 2003 (submitted)

- TM MOLLY targeted Langevin coupling
- Mollified Impulse (MOLLY) to overcome linear

instabilities - Izaguirre, Reich and Skeel, 1999
- Stochasticity to stabilize MOLLY
- Izaguirre, Catarello, et al, 2001

Mollified Impulse (MOLLY)

- MOLLY (mollified Impulse)
- Slow potential at time averaged positions, A(x)
- Averaging using only fastest forces
- Mollified slow force Ax(x) F(A(x))
- Equilibrium and B-spline
- B-spline MOLLY
- Averaging over certain time interval
- Needs analytical Hessians
- Step sizes up to 6 fs (50100 speedup)

Introducing stochasticity

- Langevin stabilization of MOLLY (LM)
- Izaguirre, Catarello, et al, 2001
- 12 fs for flexible waters with correct dynamics
- Dissipative particle dynamics (DPD)
- Pagonabarraga, Hagen and Frenkel, 1998
- Pair-wise Langevin force on particles
- Time reversible if self-consistent

Targeted Langevin coupling

- Targeted at trouble-making pairs
- Bonds, angles
- Hydrogen bonds
- Stabilizing MOLLY
- Slow forces evaluated much less frequently
- Recovering correct dynamics
- Coupling coefficient small

TM main results

- 16 fs for flexible waters
- Correct dynamics
- Self-diffusion coefficient, D.
- leapfrog w/ 1fs, D 3.69/-0.01
- TM w/ (16 fs, 2fs), D 3.68/-0.01
- Correct structure
- Radial distribution function (r.d.f.)

TM correct r.d.f.

Fig. 4. Radial distribution function of O-H

(left) and H-H (right) in flexible waters.

TM discussion

- Much larger time steps (100 fs)
- Better force splitting
- Zhou, Harder, etc. 2001
- More stable MOLLY integrators
- Conformational transition rate
- folding_at_home ensemble dynamics
- Celestial systems
- Graphics rendering in gaming applications

Overview

- Introduction
- Molecular dynamics (MD) in action
- Classical MD
- Protein folding
- Nonlinear instabilities of Impulse integrator
- Approximate MD integrators
- Targeted MOLLY
- MUSICO
- Applications
- Summary
- Future work
- Acknowledgements

Lengthening time scales SRP

1 msec simulation of an enzyme using stochastic

reaction path (SRP) method. Courtesy of R. Elber

at Cornell Univ.. SRP requires initial and final

configuration.

What do we learn from SRP?

- Cons final configuration not always available
- Pros large time steps/long simulation
- Motions whose char. time is less than the step

size filtered out - Essential (interesting) motions remain

What are other approaches for filtering out fast

motions?

Lengthening time scales MUSICO

- Ideas
- Splitting into nearly linear and nonlinear parts
- Implicit integration of linear part with

constraining of internal d.o.f. - Explicit treatment of highly nonlinear part
- Optional pairwise stochasticity for stability
- MUSICO (in progress)
- Multi-Scale Implicit-explicit Constrained

molecular dynamics

MUSICO the anticipations

- Larger step sizes ps, ns
- Implicit methods generally more stable

Schlick, et al (1997) - Greater scalability
- Hessian computations mostly local
- Low in communication
- Real speedup
- Linear parts cheap
- 100 fold speedup sequentially?

MUSICO the implicit part

- Implicit equations for nearly linear part
- Newton-Raphson method
- Require Hessians
- Krylov subspace method

Overview

- Introduction
- Molecular dynamics (MD) in action
- Classical MD
- Protein folding
- Nonlinear instabilities of Impulse integrator
- Approximate MD integrators
- Targeted MOLLY
- MUSICO
- Applications
- Summary
- Future work
- Acknowledgements

Applications of approximate MD

- Combining w/ other algorithms
- Steered/interactive MD
- Monte Carlo (MC)
- Escaping from local minima
- Quicker exploring conformational space
- Coarsening
- Large system and/or long simulations
- Protein simulations Folding/Unfolding/Equilibrium

- Drug design
- Other

Approx. MD in drug design

- Designing drugs that regulate proteins
- Examples drugs that bind to
- Enzymes
- Receptors
- Ion channels and transporter systems
- Evaluation of binding affinity using approximate

MD - Very long simulations and better sampling

Approx. MD in drug design (cont.)

ProtoMol a framework for MD

Matthey, et al, ACM Tran. Math. Software (TOMS),

submitted

Modular design of ProtoMol (Prototyping Molecular

dynamics). Available at http//www.cse.nd.edu/lcl

s/protomol

Overview

- Introduction
- Molecular dynamics (MD) in action
- Classical MD
- Protein folding
- Nonlinear instabilities of Impulse integrator
- Approximate MD integrators
- Targeted MOLLY
- MUSICO
- Applications
- Summary
- Future work
- Acknowledgements

Summary

- Introduction
- Nonlinear instabilities (original)
- New algorithms
- Targeted MOLLY (original)
- Larger time steps and speedup
- Correct dynamics and structures
- MUSICO (original, in progress)
- Applications of approx. MD (in progress)

Overview

- Introduction
- Molecular dynamics (MD) in action
- Classical MD
- Protein folding
- Nonlinear instabilities of Impulse integrator
- Approximate MD integrators
- Targeted MOLLY
- MUSICO
- Applications
- Summary
- Future work
- Acknowledgements

Future work

- Short term
- MUSICO / Krylov solver
- Simulation of small proteins in vacuum
- Mid term
- Optimization of MUSICO / Krylov solver
- Simulation of large proteins in water
- Binding affinity in drug design
- Long term
- Distributed protein folding and drug design
- Infrastructure for heterogeneous distributed

computing similar to folding_at_home

Future work (cont.)

- Funding opportunities
- NSF
- NIH
- DOE
- Pharmaceutical industry

Key references

- 1 Izaguirre, Ma, et al. Overcoming

instabilities in Verlet-I/r-RESPA with the

mollified impulse method. Vol. 24 of Lecture

Notes in Comput. Sci. Eng., pages 146-174,

Springer-Verlag, Berlin, New York, 2002 - 2 Ma, Izaguirre and Skeel. Verlet-I/r-RESPA/Impu

lse is limited by nonlinear instability. SIAM J.

Scientific Computing, 2002 (in press). - 3 Ma and Izaguirre. Targeted mollified impulse

method for molecular dynamics. SIAM J. Multiscale

Modeling and Simulation, submitted - 4 Matthey, Cickovski, Hampton, Ko, Ma, Slabach

and Izaguirre. PROTOMOL, an object-oriented

framework for prototyping novel applications of

molecular dynamics. ACM Tran. Math. Software

(TOMS), submitted - 5 Ma, Izaguirre and Skeel. Nonlinear

instability in multiple time stepping molecular

dynamics. 2003 ACM Symposium on Applied Computing

(SAC03). Melborne, Florida. 2002 (in press) - 6 Ma and Izaguirre. Long time step molecular

dynamics using targeted Langevin Stabilization.

Accepted by the 2003 ACM SAC03. Melborne,

Florida. 2002 (in press)

Overview

- Introduction
- Molecular dynamics (MD) in action
- Classical MD
- Protein folding
- Nonlinear instabilities of Impulse integrator
- Approximate MD integrators
- Targeted MOLLY
- MUSICO
- Applications
- Summary
- Future work
- Acknowledgements

Acknowledgements

- People
- Dr. Jesus Izaguirre
- Dr. Robert Skeel, Univ. of Illinois at

Urbana-Champaign - Dr. Thierry Matthey, University of Bergen, Norway

- Resources
- Hydra and BOB clusters at ND
- Norwegian Supercomputing Center, Bergen, Norway
- Funding
- NSF BIOCOMPLEXITY-IBN-0083653, and
- NSF CAREER Award ACI-0135195

THE END. THANKS!

Key references

- 1 J. A. Izaguirre, Q. Ma, T. Matthey, et al.

Overcoming instabilities in Verlet-I/r-RESPA with

the mollified impulse method. In T. Schlick and

H. H. Gan, editors, Proceedings of the 3rd

International Workshop on Algorithms for

Macromolecular Modeling, Vol. 24 of Lecture Notes

in Computational Science and Engineering, pages

146-174, Springer-Verlag, Berlin, New York, 2002 - 2 Q. Ma, J. A. Izaguirre, and R. D. Skeel.

Verlet-I/r-RESPA/Impulse is limited by nonlinear

instability. Accepted by the SIAM Journal on

Scientific Computing, 2002. Available at

http//www.nd.edu/qma1/publication_h.html. - 3 Q. Ma and J. A. Izaguirre. Targeted mollified

impulse method for molecular dynamics. Submitted

to the SIAM Journal on Multiscale Modeling and

Simulation, 2003. - 4 T. Matthey, T. Cickovski, S. Hampton, A. Ko,

Q. Ma, T. Slabach and J. Izaguirre. PROTOMOL, an

object-oriented framework for prototyping novel

applications of molecular dynamics. Submitted to

the ACM Transactions on Mathematical Software

(TOMS), 2003. - 5 Q. Ma, J. A. Izaguirre, and R. D. Skeel.

Nonlinear instability in multiple time stepping

molecular dynamics. Accepted by the 2003 ACM

Symposium on Applied Computing (SAC03).

Melborne, Florida. March 2003 - 6 Q. Ma and J. A. Izaguirre. Long time step

molecular dynamics using targeted Langevin

Stabilization. Accepted by the 2003 ACM Symposium

on Applied Computing (SAC03). Melborne,

Florida. March 2003 - 7 M. Zhang and R. D. Skeel. Cheap implicit

symplectic integrators. Appl. Num. Math.,

25297-302, 1997

Other references

- 8 J. A. Izaguirre, Justin M. Wozniak, Daniel P.

Catarello, and Robert D. Skeel. Langevin

Stabilization of Molecular Dynamics, J. Chem.

Phys., 114(5)2090-2098, Feb. 1, 2001. - 9 T. Matthey and J. A. Izaguirre, ProtoMol A

Molecular Dynamics Framework with Incremental

Parallelization, in Proc. of the Tenth SIAM Conf.

on Parallel Processing for Scientific Computing,

2001. - 10 H. Grubmuller, H. Heller, A. Windemuth, and

K. Schulten, Generalized Verlet algorithm for

efficient molecular dynamics simulations with

long range interactions, Molecular Simulations 6

(1991), 121-142. - 11 M. Tuckerman, B. J. Berne, and G. J.

Martyna, Reversible multiple time scale molecular

dynamics, J. Chem. Phys 97 (1992), no. 3,

1990-2001 - 12 J. A. Izaguirre, S. Reich, and R. D. Skeel.

Longer time steps for molecular dynamics. J.

Chem. Phys., 110(19)98539864, May 15, 1999. - 13 L. Kale, R. Skeel, M. Bhandarkar, R.

Brunner, A. Gursoy, N. Krawetz, J. Phillips, A.

Shinozaki, K. Varadarajan, and K. Schulten.

NAMD2 Greater scalability for parallel molecular

dynamics. J. Comp. Phys., 151283312, 1999. - 14 R. D. Skeel. Integration schemes for

molecular dynamics and related applications. In

M. Ainsworth, J. Levesley, and M. Marletta,

editors, The Graduate Students Guide to

Numerical Analysis, SSCM, pages 119-176.

Springer-Verlag, Berlin, 1999 - 15 R. Zhou, , E. Harder, H. Xu, and B. J.

Berne. Efficient multiple time step method for

use with ewald and partical mesh ewald for large

biomolecular systems. J. Chem. Phys.,

115(5)23482358, August 1 2001.

Other references (cont.)

- 16 E. Barth and T. Schlick. Extrapolation

versus impulse in multiple-time-stepping schemes

Linear analysis and applications to Newtonian and

Langevin dynamics. J. Chem. Phys., 1997. - 17 I. Pagonabarraga, M. H. J. Hagen and D.

Frenkel. Self-consistent dissipative particle

dynamics algorithm. Europhys Lett., 42 (4), pp.

377-382 (1998). - 18 G. Besold, I. Vattulainen, M. Kartunnen, and

J. M. Polson. Towards better integrators for

dissipative particle dynamics simulations.

Physical Review E, 62(6)R7611R7614, Dec. 2000. - 19 R. D. Groot and P. B. Warren. Dissipative

particle dynamics Bridging the gap between

atomistic and mesoscopic simulation. J. Chem.

Phys., 107(11)44234435, Sep 15 1997. - 20 I. Pagonabarraga and D. Frenkel. Dissipative

particle dynamics for interacting systems. J.

Chem. Phys., 115(11)50155026, September 15

2001. - 21 Y. Duan and P. A. Kollman. Pathways to a

protein folding intermediate observed in a

1-microsecond simulation in aqueous solution.

Science, 282740-744, 1998. - 22 M. Levitt. The molecular dynamics of

hydrogen bonds in bovine pancreatic tripsin

inhibitor protein, Nature, 294, 379-380, 1981 - 23 M. P. Allen and D. J. Tildesley, Computer

Simulation of Liquids, Oxford University Press,

New York, 1987. - 24 E. Hairer, C. Lubich and G. Wanner.

Geometric numerical integration

structure-preserving algorithms for ordinary

differential equations. Springer, 2002 - 25 C. B. Anfinsen. Principles that govern the

folding of protein chains. Science, 181, 223-230,

1973

Flood of data genes and proteins

http//www.ncbi.nlm.nih.gov/Genbank/genbankstats.h

tml

Flood of data (cont.)

http//us.expasy.org/sprot/relnotes/relstat.html

Flood of data (cont.)

PDB Content Growth

http//www.rcsb.org/pdb/holdings.html

Flood of data (cont.)

http//us.expasy.org/sprot/relnotes/relstat.html

Pseudo-code of an MD simulation

- MD Simulation
- (1) Pre-processing Construct initial

configuration of x0, v0 and F0 - (2) loop 1 to number of steps
- (a) Update velocities (by a half step)
- (b) Update positions (by a full step)
- (c) Evaluate forces on each atom
- (d) Update velocities (by a half step)
- (3) Post-processing

Algorithm 1 Pseudo-code of an MD simulation.

Why protein folding?

- Huge gap sequence data and 3D structure data
- EMBL/GENBANK, DNA (nucleotide) sequences 15

million sequence, 15,000 million base pairs - SWISSPROT, protein sequences 120,000 entries
- PDB, 3D protein structures 20,000 entries
- Bridging the gap through prediction
- Aim of structural genomics
- Structurally characterize most of the protein

sequences by an efficient combination of

experiment and prediction, Baker and Sali (2001) - Thermodynamics hypothesis Native state is at

the global free energy minimum Anfinsen

(1973)

Folding challenging time scale gap

- Longest fully solvated
- Duan and Kollman, 1998
- 1 ?s single trajectory/100 days Cray T3E
- IBMs Blue Gene project 1999 -- 2005
- Petaflop (1015 flops per second) computer for

folding proteins by 2005

Stömer/Verlet/Leapfrog

- Discretization of Eq. (1) as
- Stömer (astronomy, its higher order variants,

1907, aurora borealis), Verlet (molecular

dynamics, 1967), Leapfrog (partial differential

equations) - An explicit one-step method
- Second order accurate, symplectic, and time

reversible

Verlet-I/r-RESPA/Impulse

- Grubmüller,Heller, Windemuth and Schulten,

1991 Tuckerman, Berne and Martyna, 1992 - The state-of-the-art MTS integrator
- Splitting via switching functions
- 2nd order accurate, time reversible

Algorithm 1. Half step discretization of Impulse

integrator

Nonlinear resonance of Impulse

Ma, Izaguirre and Skeel (2003)

- Approach
- Analytical Stability conditions using only

model parameters for a simple nonlinear problem - Numerical Long simulations differing only in

outer time steps correlation between step size

and instability - Results energy growth occurs unless
- longest ?t lt 1/3 shortest period.
- Unconditionally unstable 3rd order resonance
- Flexible waters outer time step less than

33.3fs - Proteins w/ SHAKE time step less than 45fs

Nonlinear instability analytical

- Approach
- 1-D nonlinear model problem, in the neighborhood

of stable equilibrium - MTS splitting of potential
- Analyze the reversible symplectic map
- Express stability condition in terms of problem

parameters - Result
- 3rd order resonance stable only if equality met
- 4th order resonance stable only if inequality

met - Impulse unstable at 3rd order resonance in

practice

Nonlinear analytical (cont.)

- Main result. Let
- 1. (3rd order) Map stable at equilibrium if

and unstable if - Impulse is unstable in practice.
- 2. (4th order) Map stable if

and unstable if - May be stable at the 4th order resonance.

Nonlinear numerical (cont.)

Fig. 2 Left flexible melittin protein (PDB

entry 2mlt). Right Energy drift from 10ns MD

simulation of flexible 2mlt at room temperature

revealing 31 nonlinear resonance (at 3, 3.27 and

3.78 fs).

Nonlinear numerical (cont.)

Energy drift from 500ps SHAKE-constrained MD

simulation of explicitly solvated 2mlt at room

temperature revealing combined 41 and 31

nonlinear resonance.

B-spline MOLLY (cont.)

TM background (cont.)

- Dissipative particle dynamics
- Coarsening
- Pair-wise random/dissipative force on elements

(thus momentum preserving) - Time reversible if using self-consistent scheme

(self-consistent dissipative leapfrog,

SCD-leapfrog)

Self-consistent dissipative leapfrog

Pagonabarraga, Hagen and Frenkel, 1998

Algorithm 2. One step of the self-consistent

dissipative leapfrog discretization

Binding effectiveness computation

- Structure based drug design
- First, study molecular details of a function or

condition - Then, find a protein target for a disease
- Third, design the drug that binds to the protein

target - Determine protein structure (experiments,

prediction) - Search drug databases screening
- Effectiveness of binding using molecular dynamics

Small proteins

Villin head piece protein

Beta hairpin protein

http//www.stanford.edu/group/pandegroup/Cosm/phas

e2.html

Targeted MOLLY discretization

- One step of Targeted MOLLY
- (1) Update velocities w/ mollified slow forces

(by a half long-step) - (2) Propagate positions and velocities w/ fast

forces and pairwise targeted Langevin coupling

(by a full long-step) - (3) Do a time-averaging of positions using

fastest forces and evaluate the mollified

slow force - (4) Update velocities w/ mollified slow forces

(by a half long-step)

Algorithm 1 Pseudo-code of one step of Targeted

MOLLY

TM one step of discretization

Algorithm 2. One step of the Targeted MOLLY (TM)

discretization

TM canonical ensemble sampling

Fig. 3 Phase diagram (left) and velocity

distribution (right) of the FPU problem Hairer,

C. Lubich and G. Wanner 2002 using TM with

Langevin coupling coefficient of 0.01 showing TM

recovers canonical ensemble after 2E07 steps with

initial conditions x(0) 0, v(0) 1.

MD in action coalescence

Coalescence of water droplets w/ surfactants in

gas phase. http//www.cse.nd.edu/lcls/protomol

ProtoMol object oriented framework for

MD Matthey and Izaguirre (2001), Matthey, et al

(2003)

Kale, Skeel, et al (1999)

http//www.cse.nd.edu/lcls/protomol

ProtoMol (cont.)

Approx. MD ion channel gating

KcsA potassium channel, courtesy of the

Theoretical and Computational Biophysics Group

at UIUC, http//www.ks.uiuc.edu/Research/smd_imd/k

csa/

MD Verlet Method

Energy function used to determine the force on

each atom

Newtons equation represents a set of N second

order differential equations which are solved

numerically at discrete time steps to determine

the trajectory of each atom.

Advantage of the Verlet Method requires only

one force evaluation per timestep

What is a Forcefield?

In molecular dynamics a molecule is described as

a series of charged points (atoms) linked by

springs (bonds).

To describe the time evolution of bond lengths,

bond angles and torsions, also the nonbond van

der Waals and elecrostatic interactions between

atoms, one uses a forcefield. The forcefield is a

collection of equations and associated constants

designed to reproduce molecular geometry and

selected properties of tested structures.

Energy Terms Described in the CHARMm forcefield

Bond

Angle

Dihedral

Improper

Energy Functions

Ubond oscillations about the equilibrium bond

length Uangle oscillations of 3 atoms about an

equilibrium angle Udihedral torsional rotation

of 4 atoms about a central bond Unonbond

non-bonded energy terms (electrostatics and

Lenard-Jones)

Multigrid I

Multigrid II

Results (10-4 rPE)

Simulation Results for Melittin

- PME requires about 3 of the CPU time (17 days 20

hours) when measured against Ewald - MG in pbc requires only about 1
- MG is about 66 faster than PME