Folding@Home and Genome@home: Protein folding and design with distributed computing

Slides: 31

Provided by: stefan143

Transcript and Presenter's Notes

Title: Folding@Home and Genome@home: Protein folding and design with distributed computing

1
Folding@Home and Genome@home: Protein folding and design with distributed computing
Stefan Larson
Pande Group
Dept. of Chemistry and Biophysics Program
Stanford University
2
Credits

Pande Group
Dr. Vijay Pande
Folding@home
Siraj Khaliq
Young Min Rhee
Michael Shirts
Chris Snow
Eric Sorin
Bojan Zagrovic
Sidney Elmer
Genome@home
Stefan Larson
Vishal Vaidyanathan
Amit Garg
Guha Jayachandran

Collaborators
Adam Beberg (Mithral)
Dr. Jed Pitera (IBM)
Dr. Bill Swope (IBM)
Dr. Jay Ponder (Wash U)
Folding@home users
Dr. John Desjarlais (Xencor)
Jeremy England (Harvard)
Genome@home users

3
Molecular simulations in computational biology
4
Common challenges of Computational Biology

Problems related to folding
Structure prediction
Binding
Protein-protein interaction
Issues
Models
Force fields (e.g. Charmm, Amber)
Lots of parameters, constrained by experiment
good enough?
Sampling
Can simulate 1 ns (10^-9 sec) in a day
Need to sample 10^4 to 10^6 ns!
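
To make the size of the sampling gap concrete, here is a minimal arithmetic sketch. It assumes the slide's rough figure of 1 ns of simulated time per day on a single processor; the function name and the year length are illustrative choices, not part of the original talk.

```python
# Assumption from the slide: about 1 ns of simulated time per day of computing.
NS_PER_DAY = 1.0
DAYS_PER_YEAR = 365.25


def years_to_simulate(target_ns, ns_per_day=NS_PER_DAY):
    """Wall-clock years one processor needs to simulate target_ns nanoseconds."""
    return target_ns / ns_per_day / DAYS_PER_YEAR


# The slide's sampling targets: 10^4 ns and 10^6 ns.
for target_ns in (1e4, 1e6):
    years = years_to_simulate(target_ns)
    print(f"{target_ns:.0e} ns needs about {years:,.0f} years on one processor")
```

At this rate a single long trajectory would take decades to millennia, which is the motivation for spreading the work over many processors.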

5
Why simulate?

Physics → chemistry → biology
Start from the laws of physics and chemistry,
explain the properties of biomolecules
Experiments less detailed
Spectroscopies, FRET, NMR, etc.
Crystals are static
Simulations very detailed
Femtosecond time resolution
Angstrom spatial resolution
Much like having thousands of completely detailed
single molecule experiments

6
Goals

Can we characterize folding computationally?
Accurate rates
Detailed mechanisms
Can we design proteins?
Specific stable structure
Retention of function

7
Challenges of simulation
Sampling (tractability)
Analysis (insight)
Models (force fields)
8
Simulating protein folding
9
The Challenges of Protein Folding Simulation

How can we overcome the long timescales?
Fastest proteins fold in 10s to 100s of µs
Simulations orders of magnitude shorter
Are force fields good enough?
Would we reach the native state (w/o NS info)?
Would we quantitatively predict folding rates,
ΔG, etc. under experimental conditions (30 °C)?
Can we use simulation to learn about folding?
By what mechanism do they fold?
Do we agree with any folding theories?

10
Protein folding as a paradigm for other hard
problems

Protein folding dynamics
How do proteins fold?
Protein design
Predict sequence given structure
Ligand binding
How strongly do ligands bind
Rational drug design
Allosteric motions
Protein structure prediction
Predict structure given sequence

[Figure, above: HIV protease dimer with bound ligand]
11
Relevant timescales
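[Figure: logarithmic time axis running from 10^-15 s (femto) through 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), and 10^-3 (milli) to 10^0 s (seconds). Processes marked along the axis, from fastest to slowest: bond vibration, isomerisation, water dynamics, helix formation, fastest folders, typical folders, slow folders. Simulation markers: MD step, long MD run, where we need to be, where we'd love to be.]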

16 order of magnitude range
Femtosecond timesteps
Need to simulate micro to milliseconds
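
The gap between femtosecond timesteps and micro- to millisecond folding times can be counted directly in integration steps. This is a small sketch that assumes a 1 fs timestep by default (the slide only says "femtosecond"); the constant and function names are illustrative.

```python
# Durations expressed in femtoseconds so the arithmetic stays in exact integers.
MICROSECOND_FS = 10**9
MILLISECOND_FS = 10**12


def md_steps(sim_time_fs, timestep_fs=1):
    """Number of MD integration steps needed to cover sim_time_fs."""
    return sim_time_fs // timestep_fs


print(f"1 microsecond: {md_steps(MICROSECOND_FS):.0e} steps")
print(f"1 millisecond: {md_steps(MILLISECOND_FS):.0e} steps")
```

Every step requires a full force evaluation, so a single trajectory of this length is out of reach for one machine.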

12
Traditional parallel MD: Few, long trajectories

Divide the force calculations between processors
Spatial decomposition for work division
Requires fast communication

T3E supercomputer
IBM Blue Gene
Duan and Kollman, Science (1998)
Problem: we need WAY more time than is
available at current supercomputer centers
13
Our method: Many, short trajectories

Advantages of exponential kinetics
Number that fold in time t:
M f(t) = M [1 - exp(-kt)] ≈ M k t for small kt
M = 10,000 procs, k = 1/(10,000 ns), t = 20 ns/proc
expect M k t = 20 simulations to fold
Computationally economical
Doesn't waste resources on communication
Natural for large, heterogeneous clusters
Important for folding
Heterogeneity of paths, statistics
ergodicity
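
The exponential-kinetics estimate above can be checked numerically. The sketch below compares the exact expectation M[1 - exp(-kt)] with the small-kt approximation Mkt, using the slide's own numbers (M = 10,000 processors, k = 1/(10,000 ns), t = 20 ns each); the function name is illustrative.

```python
import math


def expected_folded(num_trajectories, rate_per_ns, time_ns):
    """Expected number of trajectories that fold within time_ns.

    Assumes single-exponential (two-state) kinetics with rate rate_per_ns.
    Returns (exact, linear): M[1 - exp(-kt)] and its small-kt limit Mkt.
    """
    kt = rate_per_ns * time_ns
    exact = num_trajectories * (1.0 - math.exp(-kt))
    linear = num_trajectories * kt
    return exact, linear


exact, linear = expected_folded(10_000, 1.0 / 10_000, 20.0)
print(f"exact: {exact:.2f}, linear approximation: {linear:.2f}")
```

With kt = 0.002 the two values agree to within about 0.1%, so roughly 20 of the 10,000 short trajectories are expected to fold even though each is 500 times shorter than the mean folding time.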

14
http://folding.stanford.edu
15
Distributed computing
The server sends and receives the work units
(essentially just protein structures and
sequences). It verifies, collates and stores the
returned data, completes initial analyses, and
computes user statistics for the website.
The client uses the spare CPU cycles on a user's
computer to run the simulation algorithm on the
assigned structure. Results are automatically
returned and exchanged for a new work unit on a
daily basis.
[Figure labels: home, lab/office, anywhere]
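
The server/client exchange described on this slide can be sketched as a simple work-unit loop. Everything here is hypothetical: the class, method, and function names are invented for illustration, the "simulation" is a stand-in, and the real Folding@home and Genome@home protocol is not documented in this talk.

```python
from collections import deque


class WorkServer:
    """Hypothetical server: hands out work units, verifies and stores results."""

    def __init__(self, structures):
        self.pending = deque(enumerate(structures))  # (unit_id, structure) pairs
        self.results = {}       # unit_id -> returned result
        self.user_credit = {}   # user -> number of completed work units

    def next_work_unit(self):
        """Return the next (unit_id, structure) pair, or None when done."""
        return self.pending.popleft() if self.pending else None

    def submit(self, user, unit_id, result):
        """Verify a returned result, store it, and update user statistics."""
        if not isinstance(result, float):
            raise ValueError("malformed result")
        self.results[unit_id] = result
        self.user_credit[user] = self.user_credit.get(user, 0) + 1


def run_client(server, user, simulate):
    """Hypothetical client: fetch a unit, simulate, return it, repeat."""
    while True:
        unit = server.next_work_unit()
        if unit is None:
            break
        unit_id, structure = unit
        server.submit(user, unit_id, simulate(structure))


def toy_simulate(structure):
    """Stand-in for the real simulation: returns a dummy score."""
    return float(len(structure))


server = WorkServer(["AAAA", "GGGGGG", "LLL"])
run_client(server, "volunteer_1", toy_simulate)
print(server.results, server.user_credit)
```

Because each work unit is independent, clients never need to talk to each other, which is what makes the scheme suitable for slow, heterogeneous home connections.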
16
Worldwide distributed computing
17
Protein folding results
18
What to fold? Fastest folders
[Figure: log-scale chart of simulation cost (axis labeled "Nanoseconds, CPU-days", ticks from 1 to 10^5, with a second scale in CPU years) for the alpha helix, beta hairpin, PPA, BBA5, and villin.]
19
Rates predicted vs experiment
Experiments:
villin: Raleigh et al., SUNY Stony Brook
BBAW: Gruebele et al., UIUC
beta hairpin: Eaton et al., NIH
alpha helix: Eaton et al., NIH
PPA: Gruebele et al., UIUC
[Figure: log-log plot of predicted folding time (nanoseconds) against experimental measurement (nanoseconds), both axes spanning 1 to 100,000 ns, with points labeled villin, BBAW, beta hairpin, alpha helix, and PPA.]
20
Mechanism: How did these proteins fold?

Form secondary structure first
Form helices and hairpins
Hierarchical, decrease in entropy
Collapse first
Hydrophobically driven
Need to remove water to form hydrogen bonds
Form rough native shape first
Need to find the right topology first
Then pack side chains

21
What have we learned?

Can tackle sampling today
Forcefields sufficient?
? Folding to the native state
? folding rate prediction
Role of water
Explicit solvent not crucial to rate
determination?
Compare to explicit solvent simulation
Universal mechanism of folding?
Maybe no universal mechanism: all proteins could
be different?

22
Looking to the future

Folding as a test of force fields
Explicit solvents
Other forcefields (CHARMM, Amber, OPLSAA-2000)
Other comparisons to experiments
Simulating the folding of larger proteins
Examination of more biologically interesting
folding/conformational changes

23
Protein design
24
Exploring sequence space: large scale protein design
Stanford University: Stefan Larson, Amit Garg, Guha Jayachandran, Dr. Vijay Pande
Harvard University: Jeremy England
Xencor, Inc.: Dr. John Desjarlais
gah.stanford.edu
25
Why design proteins?

Understand folding
If we understand, then we can build it
Build novel structures
Design of novel folds (Harbury)
Structure prediction
Large scale libraries
Mimic evolution to understand it

26
Utility of large sequence libraries

Directed evolution
constrain and guide mutagenesis steps
enrich starting material in structured
sequences.
Homology modeling
broader sequence database for finding homologues
generate sequence profiles for alignments, etc.
Drug design
In silico screening of peptide and
peptide-mimetic ligands to reduce lead libraries
for drug design.

27
Computational exploration of sequence space

Approach
Detailed all-atom protein representations
Standard molecular mechanics force-fields
Generate large sequence libraries
Apply results to relevant biomedical questions

Challenges
modeling backbone flexibility
generating sequence diversity
large scale iteration of design process

28
Sequence prediction algorithm
Wollacott AM, Desjarlais JR. Virtual interaction profiles of proteins. J Mol Biol. 2001, 313(2):317-42.
Raha K, Wollacott AM, Italia MJ, Desjarlais JR. Prediction of amino acid sequence from structure. Protein Sci. 2000, 9(6):1106-19.
Johnson EC, Lazar GA, Desjarlais JR, Handel TM. Solution structure and dynamics of a designed hydrophobic core variant of ubiquitin. Structure Fold Des. 1999, 7(8):967-76.
Desjarlais JR, Handel TM. Side-chain and backbone flexibility in protein core design. J Mol Biol. 1999, 290(1):305-18.
Lazar GA, Desjarlais JR, Handel TM. De novo design of the hydrophobic core of ubiquitin. Protein Sci. 1997, 6(6):1167-78.
Desjarlais JR, Handel TM. De novo design of the hydrophobic cores of proteins. Protein Sci. 1995, 4(10):2006-18.

Energy function
Amber/OPLS parameters
implicit solvation
Sampling
genetic algorithm
structure-dependent rotamer space
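
To illustrate the general idea of sampling sequences with a genetic algorithm against an energy function, here is a toy sketch. It is not the algorithm from the papers listed above: the energy function is an invented hydrophobic/polar burial score rather than Amber/OPLS with implicit solvation, rotamers are ignored entirely, and all names and parameters are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")  # toy classification, for illustration only


def toy_energy(sequence, buried):
    """Toy score (lower is better): -1 for each hydrophobic residue at a
    buried position and each polar residue at an exposed position."""
    return -sum((res in HYDROPHOBIC) == is_buried
                for res, is_buried in zip(sequence, buried))


def mutate(sequence, rng, rate=0.1):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else res
                   for res in sequence)


def crossover(a, b, rng):
    """Single-point crossover between two parent sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def design(buried, pop_size=30, generations=40, seed=0):
    """Evolve sequences for a burial pattern; returns (best, energy history)."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(AMINO_ACIDS) for _ in buried)
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda s: toy_energy(s, buried))
        history.append(toy_energy(population[0], buried))
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = [mutate(crossover(rng.choice(parents),
                                     rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    population.sort(key=lambda s: toy_energy(s, buried))
    history.append(toy_energy(population[0], buried))
    return population[0], history


buried = [True, True, False, False, True, False, True, True, False, False]
best, history = design(buried)
print(best, history[0], history[-1])
```

Keeping the better half of each generation unchanged means the best energy can never get worse from one generation to the next; a production design code would replace the toy score with a physical energy function and search over side-chain rotamers as well.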

29
Structural ensembles
Increased sequence diversity
Decreased identity to native sequence
30
Large scale sequence generation
Diversity study
Total structures: 253
Total backbone variants: 25,300
Total time of data collection: 62 days
Processors available: 3,000
Total sequences generated: 188,725
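
A few derived figures follow directly from the totals on this slide. The sketch below only divides the reported numbers; the per-processor value assumes all 3,000 processors contributed over the whole collection period, which the slide does not state.

```python
# Totals as reported on the slide.
structures = 253
backbone_variants = 25_300
collection_days = 62
processors = 3_000
sequences = 188_725

variants_per_structure = backbone_variants / structures
sequences_per_day = sequences / collection_days
sequences_per_variant = sequences / backbone_variants
# Assumes every processor was available throughout (not stated in the talk).
sequences_per_processor = sequences / processors

print(f"backbone variants per structure: {variants_per_structure:.0f}")
print(f"sequences per day: {sequences_per_day:.0f}")
print(f"sequences per backbone variant: {sequences_per_variant:.2f}")
print(f"sequences per processor: {sequences_per_processor:.1f}")
```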
31
Sequence quality
32
Designability
33
New directions