An improved hybrid Monte Carlo method for conformational sampling of proteins - PowerPoint PPT Presentation

1 / 40
About This Presentation
Title:

An improved hybrid Monte Carlo method for conformational sampling of proteins

Description:

... combination of experiment and prediction,' Baker and Sali (2001) ... Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr. Christoph Schutte for valuable discussions ... – PowerPoint PPT presentation

Number of Views:86
Avg rating:3.0/5.0
Slides: 41
Provided by: JesusIz8
Category:

less

Transcript and Presenter's Notes

Title: An improved hybrid Monte Carlo method for conformational sampling of proteins


1
An improved hybrid Monte Carlo method for
conformational sampling of proteins
Jesús A. Izaguirre and Scott Hampton Department
of Computer Science and EngineeringUniversity of
Notre Dame March 5, 2003 This work is
partially supported by two NSF grants (CAREER and
BIOCOMPLEXITY) and two grants from University of
Notre Dame
2
Overview
2. Methods for sampling (MD, HMC)
1. Motivation sampling conformational space of
proteins
3. Evaluation of new Shadow HMC
4. Future applications
3
Protein The Machinery of Life
NH2-Val-His-Leu-Thr-Pro-Glu-Glu- Lys-Ser-Ala-Val-T
hr-Ala-Leu-Trp- Gly-Lys-Val-Asn-Val-Asp-Glu-Val- G
ly-Gly-Glu-..
4
Protein Structure
5
Why protein folding?
  • Huge gap sequence data and 3D structure data
  • EMBL/GENBANK, DNA (nucleotide) sequences 15
    million sequence, 15,000 million base pairs
  • SWISSPROT, protein sequences120,000 entries
  • PDB, 3D protein structures20,000 entries
  • Bridging the gap through prediction
  • Aim of structural genomics
  • Structurally characterize most of the protein
    sequences by an efficient combination of
    experiment and prediction, Baker and Sali (2001)
  • Thermodynamics hypothesis Native state is at
    the global free energy minimum Anfinsen
    (1973)

6
Questions related to folding I
  • Long time kinetics dynamics of folding
  • only statistical correctness possible
  • ensemble dynamics
  • e.g., folding_at_home
  • Short time kinetics
  • strong correctness possible
  • e.g., transport properties, diffusion coefficients

7
Questions related to folding II
  • Sampling
  • Compute equilibrium averages by visiting all
    (most) of important conformations
  • Examples
  • Equilibrium distribution of solvent molecules in
    vacancies
  • Free energies
  • Characteristic conformations (misfolded and
    folded states)

8
Overview
2. Methods for sampling (MD, HMC)
1. Motivation sampling conformational space of
proteins
3. Evaluation of new Shadow HMC
4. Future applications
9
Classical molecular dynamics
  • Newtons equations of motion
  • Atoms
  • Molecules
  • CHARMM force field(Chemistry at Harvard
    Molecular Mechanics)

Bonds, angles and torsions
10
What is a Forcefield?
In molecular dynamics a molecule is described as
a series of charged points (atoms) linked by
springs (bonds).
The forcefield is a collection of equations and
associated constants designed to reproduce
molecular geometry and selected properties of
tested structures.
11
Energy Terms Described in the CHARMm forcefield
Bond
Angle
Dihedral
Improper
12
Energy Functions
Ubond oscillations about the equilibrium bond
length Uangle oscillations of 3 atoms about an
equilibrium angle Udihedral torsional rotation
of 4 atoms about a central bond Unonbond
non-bonded energy terms (electrostatics and
Lennard-Jones)
13
Molecular Dynamics what does it mean?
MD change in conformation over time using a
forcefield
Energy
Energy supplied to the minimized system at the
start of the simulation
Conformation impossible to access through MD
Conformational change
14
MD, MC, and HMC in sampling
  • Molecular Dynamics takes long steps in phase
    space, but it may get trapped
  • Monte Carlo makes a random walk (short steps), it
    may escape minima due to randomness
  • Can we combine these two methods?

MC
MD
HMC
15
Hybrid Monte Carlo
  • We can sample from a distribution with density
    p(x) by simulating a Markov chain with the
    following transitions
  • From the current state, x, a candidate state x
    is drawn from a proposal distribution S(x,x).
    The proposed state is accepted with
    prob.min1,(p(x) S(x,x)) / (p(x) S(x,x))
  • If the proposal distribution is symmetric,
    S(x,x)) S(x,x)), then the acceptance prob.
    only depends on p(x) / p(x)

16
Hybrid Monte Carlo II
  • Proposal functions must be reversible
  • if x s(x), then x s(x)
  • Proposal functions must preserve volume
  • Jacobian must have absolute value one
  • Valid proposal x -x
  • Invalid proposals
  • x 1 / x (Jacobian not 1)
  • x x 5 (not reversible)

17
Hybrid Monte Carlo III
  • Hamiltonian dynamics preserve volume in phase
    space
  • Hamiltonian dynamics conserve the Hamiltonian
    H(q,p)
  • Reversible symplectic integrators for Hamiltonian
    systems preserve volume in phase space
  • Conservation of the Hamiltonian depends on the
    accuracy of the integrator
  • Hybrid Monte Carlo Use reversible symplectic
    integrator for MD to generate the next proposal
    in MC

18
HMC Algorithm
  • Perform the following steps
  • 1. Draw random values for the momenta p from
    normal distribution use given positions q
  • 2. Perform cyclelength steps of MD, using a
    symplectic reversible integrator with timestep
    ?t, generating (q,p)
  • 3. Compute change in total energy
  • ?H H(q,p) - H(q,p)
  • 4. Accept new state based on exp(-? ?H )

19
Hybrid Monte Carlo IV
  • Advantages of HMC
  • HMC can propose and accept distant points in
    phase space, provided the accuracy of the MD
    integrator is high enough
  • HMC can move in a biased way, rather than in a
    random walk (distance k vs sqrt(k))
  • HMC can quickly change the probability density

20
Hybrid Monte Carlo V
  • As the number of atoms increases, the total error
    in the H(q,p) increases. The error is related to
    the time step used in MD
  • Analysis of N replicas of multivariate Gaussian
    distributions shows that HMC takes O(N5/4 ) with
    time step ?t O(N-1/4) Kennedy Pendleton, 91

System size N Max ?t
66 0.5
423 0.25
868 0.1
5143 0.05
21
Hybrid Monte Carlo VI
  • The key problem in scaling is the accuracy of the
    MD integrator
  • More accurate methods could help scaling
  • Creutz and Gocksch 89 proposed higher order
    symplectic methods for HMC
  • In MD, however, these methods are more expensive
    than the scaling gain. They need more force
    evaluations per step

22
Overview
2. Methods for sampling (MD, HMC)
1. Motivation sampling conformational space of
proteins
3. Evaluation of new Shadow HMC
4. Future applications
23
Improved HMC
  • Symplectic integrators conserve exactly (within
    roundoff error) a modified Hamiltonian that for
    short MD simulations (such as in HMC) stays close
    to the true Hamiltonian Sanz-Serna Calvo 94
  • Our idea is to use highly accurate approximations
    to the modified Hamiltonian in order to improve
    the scaling of HMC

24
Shadow Hamiltonian
  • Work by Skeel and Hardy, 2001, shows how to
    compute an arbitrarily accurate approximation to
    the modified Hamiltonian, called the Shadow
    Hamiltonian
  • Hamiltonian H1/2pTM-1p U(q)
  • Modified Hamiltonian HM H O(?t p)
  • Shadow Hamiltonian SH2p HM O(?t 2p)
  • Arbitrary accuracy
  • Easy to compute
  • Stable energy graph
  • Example, SH4 H f( qn-1, qn-2, pn-1, pn-2
    ,ßn-1 ,ßn-2)

25
  • See comparison of SHADOW and ENERGY

26
Shadow HMC
  • Replace total energy H with shadow energy
  • ?SH2m SH2m (q,p) SH2m (q,p)
  • Nearly linear scalability of sampling rate
  • Computational cost SHMC, N(11/2m), where m is
    accuracy order of integrator
  • Extra storage (m copies of q and p)
  • Moderate overhead (25 for small proteins)

27
Example Shadow Hamiltonian
28
ProtoMol a framework for MD
Matthey, et al, ACM Tran. Math. Software (TOMS),
submitted
Modular design of ProtoMol (Prototyping Molecular
dynamics). Available at http//www.cse.nd.edu/lcl
s/protomol
29
SHMC implementation
  • Shadow Hamiltonian requires propagation of ß
  • Can work for any integrator

30
Systems tested
31
Sampling Metric 1
  • Generate a plot of dihedral angle vs. energy for
    each angle
  • Find local maxima
  • Label bins between maxima
  • For each dihedral angle, print the label of the
    energy bin that it is currently in

32
Sampling Metric 2
  • Round each dihedral angle to the nearest degree
  • Print label according to degree

33
Acceptance Rates
34
More Acceptance Rates
35
Sampling rate for decalanine (dt 2 fs)
36
Sampling rate for 2mlt
37
Sampling rate comparison
  • Cost per conformation is total simulation time
    divided by number of new conformations discovered
    (2mlt, dt 0.5 fs)
  • HMC 122 s/conformation
  • SHMC 16 s/conformation
  • HMC discovered 270 conformations in 33000 seconds
  • SHMC discovered 2340 conformations in 38000
    seconds

38
Conclusions
  • SHMC has a much higher acceptance rate,
    particularly as system size and timestep increase
  • SHMC discovers new conformations more quickly
  • SHMC requires extra storage and moderate
    overhead.
  • SHMC works best at relatively large timesteps

39
Future work
  • Multiscale problems for rugged energy surface
  • Multiple time stepping algorithms plus
    constraining
  • Temperature tempering and multicanonical ensemble
  • Potential smoothing
  • System size
  • Parallel Multigrid O(N) electrostatics
  • Applications
  • Free energy estimation for drug design
  • Folding and metastable conformations
  • Average estimation

40
Acknowledgments
  • Dr. Thierry Matthey, co-developer of ProtoMol,
    University of Bergen, Norway
  • Graduate students Qun Ma, Alice Ko, Yao Wang,
    Trevor Cickovski
  • Students in CSE 598K, Computational Biology,
    Spring 2002
  • Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr.
    Christoph Schutte for valuable discussions
  • Dr. Radford Neals presentation Markov Chain
    Sampling Using Hamiltonian Dynamics
    (http//www.cs.utoronto.ca )
  • Dr. Klaus Schultens presentation An
    introduction to molecular dynamics simulations
    (http//www.ks.uiuc.edu )
Write a Comment
User Comments (0)
About PowerShow.com