Folding@Home and Genome@home: Protein folding and design with distributed computing

1
Folding@Home and Genome@home: Protein folding and design with distributed computing
Stefan Larson
Pande Group, Dept. of Chemistry and Biophysics Program
Stanford University
2
Credits
  • Pande Group
    • Dr. Vijay Pande
  • Folding@home
    • Siraj Khaliq
    • Young Min Rhee
    • Michael Shirts
    • Chris Snow
    • Eric Sorin
    • Bojan Zagrovic
    • Sidney Elmer
  • Genome@home
    • Stefan Larson
    • Vishal Vaidyanathan
    • Amit Garg
    • Guha Jayachandran
  • Collaborators
    • Adam Beberg (Mithral)
    • Dr. Jed Pitera (IBM)
    • Dr. Bill Swope (IBM)
    • Dr. Jay Ponder (Wash U)
    • Folding@home users
    • Dr. John Desjarlais (Xencor)
    • Jeremy England (Harvard)
    • Genome@home users

3
Molecular simulations in computational biology
4
Common challenges of Computational Biology
  • Problems related to folding
  • Structure prediction
  • Binding
  • Protein-protein interaction
  • Issues
  • Models
  • Force fields (e.g. CHARMM, Amber)
  • Lots of parameters, constrained by experiment:
    good enough?
  • Sampling
  • Can simulate 1 ns (10^-9 sec) in a day
  • Need to sample 10^4 to 10^6 ns!
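
To make the sampling gap concrete, here is a back-of-the-envelope sketch in Python. It assumes the rate quoted above (1 ns of simulated time per day) applies to a single processor; the function name is illustrative and not part of any Folding@home code.

```python
NS_PER_DAY = 1.0  # simulated nanoseconds per day, as quoted on the slide


def days_to_simulate(target_ns, ns_per_day=NS_PER_DAY):
    """Wall-clock days one processor needs to simulate target_ns nanoseconds."""
    return target_ns / ns_per_day


for target_ns in (1e4, 1e6):
    days = days_to_simulate(target_ns)
    years = days / 365.25
    print(f"{target_ns:.0e} ns: {days:,.0f} days, about {years:,.0f} years")
```

On these assumptions a single processor would need roughly 27 to 2,700 years, which is the motivation for spreading the work over many machines.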

5
Why simulate?
  • Physics → chemistry → biology
  • Start from the laws of physics and chemistry,
    explain the properties of biomolecules
  • Experiments less detailed
  • Spectroscopies, FRET, NMR, etc.
  • Crystals are static
  • Simulations very detailed
  • Femtosecond time resolution
  • Angstrom spatial resolution
  • Much like having thousands of completely detailed
    single molecule experiments

6
Goals
  • Can we characterize folding computationally?
  • Accurate rates
  • Detailed mechanisms
  • Can we design proteins?
  • Specific stable structure
  • Retention of function

7
Challenges of simulation
Sampling (tractability)
Analysis (insight)
Models (force fields)
8
Simulating protein folding
9
The Challenges of Protein Folding Simulation
  • How can we overcome the long timescales?
  • Fastest proteins fold in 10s to 100s of μs
  • Simulations orders of magnitude shorter
  • Are force fields good enough?
  • Would we reach the native state (without native-state information)?
  • Would we quantitatively predict folding rates,
    ΔG, etc. under experimental conditions (30 °C)?
  • Can we use simulation to learn about folding?
  • By what mechanism do they fold?
  • Do we agree with any folding theories?

10
Protein folding as a paradigm for other hard
problems
  • Protein folding dynamics
  • How do proteins fold?
  • Protein design
  • Predict sequence given structure
  • Ligand binding
  • How strongly do ligands bind
  • Rational drug design
  • Allosteric motions
  • Protein structure prediction
  • Predict structure given sequence

[Figure above: HIV protease dimer with bound ligand]
11
Relevant timescales
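[Figure: logarithmic time axis marked 10^-15 (femto), 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), 10^-3 (milli), and 10^0 seconds. Events shown along the axis: bond vibration, isomerisation, water dynamics, helix formation, fastest folders, typical folders, slow folders. Markers: "MD step", "long MD run", "where we need to be", "where we'd love to be".]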
  • Range of 16 orders of magnitude
  • Femtosecond timesteps
  • Need to simulate micro to milliseconds
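
As a rough illustration of what this means in integration steps, the sketch below counts the MD steps needed to cover a target time. The 1 fs default timestep is an assumption (the slide only says "femtosecond timesteps"), and the names are illustrative.

```python
FS_PER_US = 10**9    # femtoseconds in one microsecond
FS_PER_MS = 10**12   # femtoseconds in one millisecond


def md_steps(target_fs, timestep_fs=1):
    """Number of integration steps needed to cover target_fs femtoseconds."""
    return target_fs // timestep_fs


print(md_steps(FS_PER_US))  # steps for one microsecond at an assumed 1 fs step
print(md_steps(FS_PER_MS))  # steps for one millisecond at an assumed 1 fs step
```

Working in integer femtoseconds avoids floating-point rounding when the ratios span many orders of magnitude.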

12
Traditional parallel MD: few, long trajectories
  • Divide the force calculations between processors
  • Spatial decomposition for work division
  • Requires fast communication

T3E supercomputer
IBM Blue Gene
Duan and Kollman, Science (1998)
Problem: we need WAY more time than is
available at current supercomputer centers
13
Our method: many, short trajectories
  • Advantages of exponential kinetics
  • Number that fold in time t
  • M f(t) = M [1 - exp(-kt)] ≈ M k t for small kt
  • M = 10,000 procs, k = 1/(10,000 ns), t = 20 ns/proc:
    expect M k t = 20 simulations to fold
  • Computationally economical
  • Doesn't waste resources on communication
  • Natural for large, heterogeneous clusters
  • Important for folding
  • Heterogeneity of paths, statistics
  • ergodicity
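
The expected count quoted above can be checked numerically. This is a minimal sketch assuming single-exponential (two-state) folding kinetics, as the slide does; the function names are illustrative.

```python
import math


def expected_folded(m, k_per_ns, t_ns):
    """Expected number of M independent trajectories that fold by time t."""
    return m * (1.0 - math.exp(-k_per_ns * t_ns))


def expected_folded_linear(m, k_per_ns, t_ns):
    """Small-kt approximation, M * k * t."""
    return m * k_per_ns * t_ns


M = 10_000       # processors, one trajectory each
K = 1 / 10_000   # folding rate in 1/ns (folding time 10,000 ns)
T = 20           # nanoseconds simulated per processor

print(expected_folded(M, K, T))         # exact exponential expression
print(expected_folded_linear(M, K, T))  # linear approximation
```

With kt = 0.002 the exact and linear values differ by only about 0.1%, so the approximation on the slide is safe for these numbers.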

14
http://folding.stanford.edu
15
Distributed computing
The server sends and receives the work units
(essentially just protein structures and
sequences). It verifies, collates and stores the
returned data, completes initial analyses, and
computes user statistics for the website.
The client uses the spare CPU cycles on a user's
computer to run the simulation algorithm on the
assigned structure. Results are automatically
returned and exchanged for a new work unit on a
daily basis.
[Figure: clients at home, in the lab or office, anywhere]
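
The exchange described above can be sketched as a toy, in-memory model. Everything here is hypothetical: the class and function names, the result check, and the placeholder simulation are illustrative assumptions, not the actual Folding@home or Genome@home protocol, and no networking is modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkUnit:
    unit_id: int
    structure: str  # placeholder for a protein structure
    sequence: str   # placeholder for its sequence


class Server:
    """Hands out work units, stores results, and tallies user statistics."""

    def __init__(self, units):
        self.pending = deque(units)
        self.results = {}     # unit_id -> returned data
        self.user_stats = {}  # user -> number of completed units

    def assign(self):
        """Return the next work unit, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None

    def exchange(self, user, unit, result):
        """Accept a result and hand back a new work unit in return."""
        if result is None:  # stand-in for real verification
            raise ValueError("empty result")
        self.results[unit.unit_id] = result
        self.user_stats[user] = self.user_stats.get(user, 0) + 1
        return self.assign()


def run_simulation(unit):
    """Placeholder for the simulation a client runs on spare CPU cycles."""
    return f"trajectory for {unit.sequence}"


def client_loop(server, user):
    """Keep trading finished results for new work until none is left."""
    unit = server.assign()
    while unit is not None:
        unit = server.exchange(user, unit, run_simulation(unit))


server = Server(WorkUnit(i, f"structure-{i}", "ACDE") for i in range(3))
client_loop(server, "alice")
print(server.user_stats)
```

The point of the design is that a client only ever talks to the server, never to other clients, which is why slow or unreliable home machines can take part.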
16
Worldwide distributed computing
17
Protein folding results
18
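What to fold? Fastest folders
[Figure: simulation cost for each target (alpha helix, beta hairpin, PPA, BBA5, villin). One logarithmic axis, labelled "Nanoseconds, CPU-days", runs from 1 to 10^5; a second axis, labelled "CPU years", is marked at 1, 10, and 60.]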
19
Rates predicted vs experiment
Experiments:
  • villin: Raleigh, et al., SUNY Stony Brook
  • BBAW: Gruebele, et al., UIUC
  • beta hairpin: Eaton, et al., NIH
  • alpha helix: Eaton, et al., NIH
  • PPA: Gruebele, et al., UIUC
[Figure: log-log plot of predicted folding time (nanoseconds) against experimental measurement (nanoseconds), both axes running from 1 to 100,000, with points for villin, BBAW, beta hairpin, alpha helix, and PPA.]
20
Mechanism: how did these proteins fold?
  • Form secondary structure first
  • Form helices and hairpins
  • Hierarchical, decrease in entropy
  • Collapse first
  • Hydrophobically driven
  • Need to remove water to form hydrogen bonds
  • Form rough native shape first
  • Need to find the right topology first
  • Then pack side chains

21
What have we learned?
  • Can tackle sampling today
  • Forcefields sufficient?
  • ? Folding to the native state
  • ? folding rate prediction
  • Role of water
  • Explicit solvent not crucial to rate
    determination?
  • Compare to explicit solvent simulation
  • Universal mechanism of folding?
  • Maybe no universal mechanism: all proteins could
    be different?

22
Looking to the future
  • Folding as a test of force fields
  • Explicit solvents
  • Other forcefields (CHARMM, Amber, OPLSAA-2000)
  • Other comparisons to experiments
  • Simulating the folding of larger proteins
  • Examination of more biologically interesting
    folding/conformational changes

23
Protein design
24
Exploring sequence space: large-scale protein
design
Stanford University: Stefan Larson, Amit Garg, Guha Jayachandran, Dr. Vijay Pande
Harvard University: Jeremy England
Xencor, Inc.: Dr. John Desjarlais
gah.stanford.edu
25
Why design proteins?
  • Understand folding
  • If we understand, then we can build it
  • Build novel structures
  • Design of novel folds (Harbury)
  • Structure prediction
  • Large scale libraries
  • Mimic evolution to understand it

26
Utility of large sequence libraries
  • Directed evolution
  • constrain and guide mutagenesis steps
  • enrich starting material in structured
    sequences.
  • Homology modeling
  • broader sequence database for finding homologues
  • generate sequence profiles for alignments, etc.
  • Drug design
  • In silico screening of peptide and
    peptide-mimetic ligands to reduce lead libraries
    for drug design.

27
Computational exploration of sequence space
  • Approach
  • Detailed all-atom protein representations
  • Standard molecular mechanics force-fields
  • Generate large sequence libraries
  • Apply results to relevant biomedical questions
  • Challenges
  • modeling backbone flexibility
  • generating sequence diversity
  • large scale iteration of design process

28
Sequence prediction algorithm
Wollacott AM, Desjarlais JR. Virtual interaction profiles of proteins. J Mol Biol. 2001, 313(2):317-42.
Raha K, Wollacott AM, Italia MJ, Desjarlais JR. Prediction of amino acid sequence from structure. Protein Sci. 2000, 9(6):1106-19.
Johnson EC, Lazar GA, Desjarlais JR, Handel TM. Solution structure and dynamics of a designed hydrophobic core variant of ubiquitin. Structure Fold Des. 1999, 7(8):967-76.
Desjarlais JR, Handel TM. Side-chain and backbone flexibility in protein core design. J Mol Biol. 1999, 290(1):305-18.
Lazar GA, Desjarlais JR, Handel TM. De novo design of the hydrophobic core of ubiquitin. Protein Sci. 1997, 6(6):1167-78.
Desjarlais JR, Handel TM. De novo design of the hydrophobic cores of proteins. Protein Sci. 1995, 4(10):2006-18.
  • Energy function
  • Amber/OPLS parameters
  • implicit solvation
  • Sampling
  • genetic algorithm
  • structure-dependent rotamer space
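
To show the general shape of genetic-algorithm sampling over sequences, here is a toy sketch. It is not the algorithm from the papers cited above: the energy function is a made-up stand-in (it rewards hydrophobic residues at buried positions and polar ones at exposed positions) rather than Amber/OPLS parameters with implicit solvation, rotamers are not modeled at all, and every name and parameter value is an assumption for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")  # rough, illustrative classification


def toy_energy(seq, buried):
    """Hypothetical score, lower is better: -1 per position whose residue
    type (hydrophobic or not) matches its burial state."""
    return -sum((aa in HYDROPHOBIC) == b for aa, b in zip(seq, buried))


def mutate(seq, rng, rate=0.1):
    """Replace each residue with a random one with probability rate."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def design(buried, pop_size=50, generations=100, seed=0):
    """Evolve sequences toward low toy_energy; the best half always survives."""
    rng = random.Random(seed)
    n = len(buried)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(n))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: toy_energy(s, buried))
        parents = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return min(pop, key=lambda s: toy_energy(s, buried))


# Hypothetical burial pattern for a 12-residue backbone.
BURIED = [True, False, True, True, False, False,
          True, False, True, False, False, True]
best = design(BURIED)
print(best, toy_energy(best, BURIED))
```

Because many different sequences score equally well under such a function, repeated runs with different seeds return different sequences, which is the property a large-scale library generator relies on.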

29
Structural ensembles
Increased sequence diversity
Decreased identity to native sequence
30
Large scale sequence generation
Diversity study
Total structures: 253
Total backbone variants: 25,300
Total time of data collection: 62 days
Processors available: 3,000
Total sequences generated: 188,725
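
A few ratios follow directly from these totals. The sketch below assumes all 3,000 processors were available for the full 62 days, which the slide does not state, so the processor-day figure is an upper bound and the throughput a lower bound.

```python
structures = 253
backbone_variants = 25_300
days = 62
processors = 3_000
sequences = 188_725

variants_per_structure = backbone_variants // structures
sequences_per_variant = sequences / backbone_variants

# Upper bound: assumes every processor ran for the whole collection period.
processor_days = days * processors
sequences_per_processor_day = sequences / processor_days

print(variants_per_structure)                 # backbone variants per structure
print(round(sequences_per_variant, 2))        # sequences per backbone variant
print(processor_days)                         # processor-days (upper bound)
print(round(sequences_per_processor_day, 2))  # sequences per processor-day
```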
31
Sequence quality
32
Designability
33
New directions
  • Ongoing work
  • Characterization of sequence space
  • Natural sequence diversity (SH3)
  • Homology modeling database
  • SH3 peptide ligand design
  • Experimental validation of designed sequences
  • Hybrid approaches to protein design
  • Design of peptide-mimetic ligands
  • Design of functional proteins
  • New design algorithms and parameter sets