An Optimization Approach to Protein Structure Prediction

Learn more at: https://www.mcs.anl.gov
1
An Optimization Approach to Protein Structure
Prediction
  • Richard Byrd, Betty Eskow, Robert Schnabel, Brett Bader,
    Lianjun Jiang (University of Colorado)
  • Teresa Head-Gordon (Univ. of California, Berkeley)
  • Silvia Crivelli (Lawrence Berkeley Laboratory)

2
Problem Definition
Predict the 3-dimensional shape, or native
state, of a protein given its sequence of
constituent amino acids.
Approach
Assuming the native state of a protein
corresponds to its minimum free energy state, use
a global optimization method to find the minimum
energy configuration of the target protein.
3
Importance of Protein Folding
  • 3-dimensional structure is useful in molecular drug
    design.
  • Laboratory experiments are expensive
      • X-ray crystallography
      • NMR
  • Genome projects are providing sequences for many
    proteins whose structures will need to be
    determined.

4
Protein Structures
Proteins consist of a long chain of amino acids
(e.g. Pro, Gly, Leu, Ser), called the primary structure.
The constituent amino acids may encourage
hydrogen bonding and form regular structures,
called secondary structures (α-helix, β-sheet).
The secondary structures fold together to form a
compact 3-dimensional, or tertiary, structure.
5
Chemistry of Proteins
(Figure labels: side chain, amino acid, backbone, H-bond)
Hydrogen bonds strongly influence a protein's
shape. They largely occur in secondary
structures and help hold the protein together.
6
Computational Approaches to Protein Structure
Prediction
  • Comparative Modeling
      • Compares and aligns the target to a known protein
        sequence of amino acids
  • Fold Recognition
      • Searches for the best-fitting fold template from
        a library of known protein folds
  • New Fold Methods
      • Not based on knowledge of complete protein
        sequences or folds
      • e.g. energy minimization

7
Global Optimization Problem
The 3-dimensional structure of the protein found
in nature is believed to minimize potential
energy: $\min_x V(x)$, where $x$ = atom coordinates.
Challenges
8
Amber Energy Function
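$$
V(x) = \sum_{\text{bonds}} c_l (b - b_0)^2
     + \sum_{\text{bond angles}} c_a (\theta - \theta_0)^2
     + \sum_{\text{dihedral angles}} c_d \left[ 1 + \cos(n\omega + \gamma) \right]
     + \sum_{\text{charged pairs}} \frac{q_i q_j}{\varepsilon\, r_{ij}}
     + \sum_{\text{nonbonded pairs}} c_w\, \varphi(r_{ij})
$$
where $b$ = bond length, $\theta$ = bond angle, $\omega$ = dihedral angle
(with phase offset $\gamma$), $r_{ij}$ = distance between atoms $i$ and $j$
(with charges $q_i$, $q_j$ and dielectric constant $\varepsilon$), and
$\varphi$ = Lennard-Jones potential.
Internal coordinates are determined using bonds,
bond angles and dihedral angles.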
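
As a rough illustration of how an energy of this shape can be evaluated, the sketch below sums the five kinds of terms for a toy system. It assumes a Coulomb form for the charged-pair term, a standard 12-6 form for the Lennard-Jones term, and a phase offset `gamma` in the dihedral term; every function name and parameter value is hypothetical and not taken from the slides or from the Amber software.

```python
import math

def bond_term(b, b0, c_l):
    """Harmonic bond-stretch term c_l * (b - b0)^2."""
    return c_l * (b - b0) ** 2

def angle_term(theta, theta0, c_a):
    """Harmonic bond-angle term c_a * (theta - theta0)^2 (radians)."""
    return c_a * (theta - theta0) ** 2

def dihedral_term(omega, n, gamma, c_d):
    """Periodic dihedral term c_d * [1 + cos(n*omega + gamma)] (assumed form)."""
    return c_d * (1.0 + math.cos(n * omega + gamma))

def coulomb_term(q_i, q_j, r, eps=1.0):
    """Charged-pair term q_i*q_j / (eps*r) (assumed Coulomb form)."""
    return q_i * q_j / (eps * r)

def lennard_jones(r, sigma, well_depth):
    """12-6 Lennard-Jones potential (assumed form for the nonbonded term)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * well_depth * (sr6 * sr6 - sr6)

def total_energy(bonds, angles, dihedrals, charged_pairs, nonbonded_pairs):
    """Sum all terms; each argument is a list of parameter tuples."""
    return (sum(bond_term(*t) for t in bonds)
            + sum(angle_term(*t) for t in angles)
            + sum(dihedral_term(*t) for t in dihedrals)
            + sum(coulomb_term(*t) for t in charged_pairs)
            + sum(lennard_jones(*t) for t in nonbonded_pairs))

# Toy input with made-up numbers, one term of each kind.
example = total_energy(
    bonds=[(1.5, 1.0, 2.0)],
    angles=[(2.0, 2.0, 5.0)],
    dihedrals=[(0.0, 3, 0.0, 1.0)],
    charged_pairs=[(1.0, -1.0, 2.0)],
    nonbonded_pairs=[(2 ** (1 / 6), 1.0, 1.0)],
)
```

In a real force field the parameters depend on atom types; here they are passed in directly to keep the sketch short.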
9
Additional energy terms to model protein
behavior in an aqueous environment
  • $E_{\text{solvation}}$: formulated from simulations of pairs of
    hydrophobic molecules in water
  • Advantages of this model
      • Provides a stabilizing force for forming
        hydrophobic cores.
      • Well-defined model of the hydrophobic effect of
        small hydrophobic groups in water.
      • Computationally tractable and differentiable.

Here $i$, $j$ are aliphatic carbons, and $M$ Gaussians with
position ($c_k$), depth ($h_k$) and width ($w_k$) describe
2 minima: (1) molecules in contact and
(2) molecules separated by a distance of 1 water
molecule.
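
The slide's formula did not survive in this transcript, so the following is only a guess at its general shape: a pairwise sum of M Gaussians in the carbon-carbon distance. The exponent form, the function names, and the two parameter triples (position, depth, width) are all assumptions chosen for illustration, not the authors' values.

```python
import math
from itertools import combinations

# Hypothetical (c_k, h_k, w_k) triples: a contact minimum and a
# solvent-separated minimum. The numbers are placeholders.
GAUSSIANS = [
    (4.0, -1.0, 0.5),
    (7.0, -0.4, 0.7),
]

def pair_solvation(r, gaussians=GAUSSIANS):
    """Sum of Gaussians h_k * exp(-((r - c_k) / w_k)^2) for one distance r."""
    return sum(h * math.exp(-((r - c) / w) ** 2) for c, h, w in gaussians)

def solvation_energy(carbons, gaussians=GAUSSIANS):
    """Sum the pair term over all pairs of aliphatic-carbon coordinates."""
    return sum(pair_solvation(math.dist(a, b), gaussians)
               for a, b in combinations(carbons, 2))
```

Because each Gaussian is smooth, a term like this is differentiable everywhere, which matches the tractability point made on the slide.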
10
Global Optimization Approaches
  • Deterministic methods
      • Branch and bound, interval methods
      • Very reliable, deterministic guarantees
      • Too expensive for more than 20-50 variables
  • Stochastic methods
      • Random steps or sampling
      • Probabilistic guarantees
      • Practical for < 300 variables
  • Heuristic search
      • e.g. simulated annealing, tabu search, genetic
        algorithms
      • Effective on some very large problems
      • No practical guarantees

11
A Stochastic-Perturbation Global Optimization
Approach
  • Generate and maintain a pool of candidates
    (configurations), as in genetic algorithms.
  • Solve the full-dimensional problem as a series of
    small-dimensional ones.
  • Use protein database information to bias toward
    likely substructures.

12
Algorithm Phases
Given the amino acid sequence of a protein, find
the 3-dimensional structure likely to be found in
nature.
Simplify the problem by utilizing domain-specific
knowledge.
  • Phase 1: Generate initial population
  • Phase 2: Global optimization
13
Phase 1: Create Initial Population
  • Submit amino acid sequence to server
      • EFIAIYDYKAETEEDLTIKKGEKLEIIEKEGDWWKAKAIGSGEIGY
        IPANYIAAAE
  • Use server predictions to determine the
    location of α-helices, β-strands, and coils
      • CCCCHHHHHHEEEEEEEEEEEECCEEEEEEEEEEEHHHHHHHHCCC
        HHHHHHCCCC
  • Use the ProteinShop visualization tool to form
    configurations with secondary structure
      • Assign ideal values to the dihedral angles
        in the sequence according to the predictions.
      • Manipulate β-strands to form β-sheets.
  • Perform energy minimizations
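
The step of assigning ideal dihedral values from the predicted string can be sketched as follows. The (φ, ψ) values used are common textbook ideals for an α-helix and an anti-parallel β-strand, not values stated on the slides; the function names are hypothetical, and coil residues are simply left unassigned.

```python
from itertools import groupby

# Textbook ideal backbone angles in degrees (assumed, not from the slides).
IDEAL_PHI_PSI = {
    "H": (-57.0, -47.0),    # alpha-helix
    "E": (-139.0, 135.0),   # anti-parallel beta-strand
}

def initial_dihedrals(prediction):
    """Return one (phi, psi) pair per residue, or None for coil ('C')."""
    return [IDEAL_PHI_PSI.get(code) for code in prediction]

def segments(prediction):
    """Split the prediction into (code, start_index, length) runs."""
    runs, start = [], 0
    for code, group in groupby(prediction):
        length = len(list(group))
        runs.append((code, start, length))
        start += length
    return runs

# Prediction string from the slide, with its two lines joined.
PREDICTION = ("CCCCHHHHHHEEEEEEEEEEEECCEEEEEEEEEEEHHHHHHHHCCC"
              "HHHHHHCCCC")
angles = initial_dihedrals(PREDICTION)
runs = segments(PREDICTION)
```

The run list is what a tool would need in order to treat each helix or strand as a unit when strands are later arranged into sheets.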

14
Phase 2: Improve Local Minima
Select a protein and a subset of dihedral angles
  • Uses a combination of breadth-first and
    depth-first searches from initial pool
  • Dihedral angles act as internal coordinates and
    reduce the number of variables, speeding up each
    optimization run

Small-scale global optimization
Full-dimensional local optimization
iterate
Cluster minima and test stopping criteria
15
Small Scale Global Optimization in Phase 2
  • Minimize energy over 5-20 torsion angles
  • Use a stochastic global optimization algorithm
    based on sampling, sample pruning and local
    minimization (Rinnooy Kan et al.).
  • From the best start points, do local minimizations
    using quasi-Newton
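
A minimal sketch of the sample, prune, and locally minimize pattern over a small subset of angles is given below. It is a simplification: pruning here just keeps the lowest-energy samples, whereas the cited method uses its own criteria, and a derivative-free compass search stands in for the quasi-Newton solver. The toy energy and all names are hypothetical.

```python
import math
import random

def local_minimize(f, x, step=0.5, tol=1e-6):
    """Compass search: a simple stand-in for a quasi-Newton local solver."""
    x, fx = list(x), f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] += delta
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step *= 0.5
    return x, fx

def small_scale_global(energy, config, subset, n_samples=200, n_keep=5, seed=0):
    """Optimize only the angles in `subset`; the other angles stay fixed."""
    rng = random.Random(seed)

    def restricted(values):
        full = list(config)
        for idx, v in zip(subset, values):
            full[idx] = v
        return energy(full)

    # Sampling: random angle values for the chosen subset.
    samples = [[rng.uniform(-math.pi, math.pi) for _ in subset]
               for _ in range(n_samples)]
    # Pruning (simplified): keep the lowest-energy samples, plus the current
    # values so the result is never worse than the starting configuration.
    starts = sorted(samples, key=restricted)[:n_keep]
    starts.append([config[i] for i in subset])
    # Local minimization from each surviving start point.
    best_x, best_e = min((local_minimize(restricted, s) for s in starts),
                         key=lambda pair: pair[1])
    result = list(config)
    for idx, v in zip(subset, best_x):
        result[idx] = v
    return result, best_e

def toy_energy(angles):
    """Made-up multi-minimum energy with its global minimum at angle 1.0."""
    return sum((1 - math.cos(a - 1.0)) + 0.3 * (1 - math.cos(3 * (a - 1.0)))
               for a in angles)

start = [0.0] * 6
improved, e = small_scale_global(toy_energy, start, subset=[1, 4])
```

Restricting the search to a handful of angles keeps the sampling affordable, which is the point of solving the full problem as a series of small ones.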

16
Full-scale local minimizations
  • Starting from the best points found by the small-scale
    global optimization, do full-dimensional local minimizations.
  • Because of the problem size, we use limited-memory
    quasi-Newton.
  • The best local minimizers are added to the pool.
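
For readers unfamiliar with limited-memory quasi-Newton methods, here is a compact sketch of L-BFGS, one common scheme of that kind. The slides do not say which variant or which line search was used, so the two-loop recursion with a backtracking (Armijo) line search below is an illustrative assumption, and the quadratic test function is made up.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lbfgs(f, grad, x0, m=5, max_iter=200, tol=1e-8):
    """Minimize f with L-BFGS, storing only the last m update pairs."""
    x, g = list(x0), grad(x0)
    history = []  # (s, y, rho) triples, oldest first
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        # Two-loop recursion: apply the implicit inverse Hessian to g.
        q, alphas = list(g), []
        for s, y, rho in reversed(history):
            a = rho * dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        if history:
            s, y, _ = history[-1]
            scale = dot(s, y) / dot(y, y)
        else:
            scale = 1.0
        r = [scale * qi for qi in q]
        for (s, y, rho), a in zip(history, reversed(alphas)):
            b = rho * dot(y, r)
            r = [ri + (a - b) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]
        # Backtracking line search with the Armijo condition.
        fx, slope, t = f(x), dot(g, d), 1.0
        while True:
            x_new = [xi + t * di for xi, di in zip(x, d)]
            if f(x_new) <= fx + 1e-4 * t * slope or t < 1e-12:
                break
            t *= 0.5
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-10 * dot(y, y):  # keep only pairs with positive curvature
            history.append((s, y, 1.0 / sy))
            if len(history) > m:
                history.pop(0)
        x, g = x_new, g_new
    return x, f(x)

def quad(x):
    """Toy objective: an ill-scaled quadratic with its minimum at all ones."""
    return sum((i + 1) * (xi - 1.0) ** 2 for i, xi in enumerate(x))

def quad_grad(x):
    return [2.0 * (i + 1) * (xi - 1.0) for i, xi in enumerate(x)]

x_min, f_min = lbfgs(quad, quad_grad, [0.0] * 10)
```

Storage grows with m times the number of variables rather than with its square, which is why such methods suit full-dimensional protein problems.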

17
Biasing functions
  • Used to form secondary structure during the first
    phase and sometimes in full-dimensional local
    minimizations.
  • Dihedral angle biasing
      • $E_{\phi\psi} = \sum_{\text{dihedrals}} k_\phi [1 - \cos(\phi - \phi_0)] + k_\psi [1 - \cos(\psi - \psi_0)]$
  • Hydrogen bond biasing
      • For α-helices
          • $E_{HB} = w_i w_{i+4} / \Delta r_{i,i+4}$ (the $w$s are weights
            from the server for residues $i$ and $i+4$ in
            the helix)
      • To form β-sheets from β-strands
          • $E_{HB\beta} = w_i w_j / \Delta r_{i,j}$
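
The dihedral bias can be sketched in a few lines. The signs in the slide's formula were lost in this transcript, so the sketch assumes the form k[1 - cos(angle - target)], which is zero at the target angle and grows away from it; the force constants and function name are hypothetical.

```python
import math

def dihedral_bias(phi_psi, targets, k_phi=1.0, k_psi=1.0):
    """Bias energy pulling each (phi, psi) pair toward its target (degrees).

    Residues whose target is None (e.g. coil) contribute nothing.
    Assumed form: k * [1 - cos(angle - target)] per angle.
    """
    total = 0.0
    for (phi, psi), target in zip(phi_psi, targets):
        if target is None:
            continue
        phi0, psi0 = target
        total += k_phi * (1.0 - math.cos(math.radians(phi - phi0)))
        total += k_psi * (1.0 - math.cos(math.radians(psi - psi0)))
    return total

# Two residues biased toward helical angles, one unbiased coil residue.
targets = [(-57.0, -47.0), (-57.0, -47.0), None]
at_target = dihedral_bias([(-57.0, -47.0), (-57.0, -47.0), (60.0, 60.0)], targets)
off_target = dihedral_bias([(123.0, -47.0), (-57.0, -47.0), (60.0, 60.0)], targets)
```

A cosine form is periodic, so an angle of 179 degrees and one of -181 degrees are treated identically, which a plain quadratic penalty would not do.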


18
Neural Network Predictions
Sequence
SKIGIDGFGRIGRLVLRAALSCGAQ
Neural nets trained on a large database of
proteins can predict secondary structure likely
to be in a target protein.
Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ
Type:     BBBB B AAAAAAA BBBBB
Weight:   13552 6789992 56673
19
Forming β-sheets from the predicted β-strands
is a combinatorial problem.
  • Which strands are paired?
  • Which orientation: anti-parallel or parallel?
  • Which residues are paired: even or odd?
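
To give a feel for how quickly the choices multiply, the sketch below enumerates the options for a single strand pairing: which two strands, which orientation, and which residue parity. It is only a counting illustration with hypothetical names; it does not model complete sheet topologies, where several pairings must be chosen together.

```python
from itertools import combinations, product

ORIENTATIONS = ("anti-parallel", "parallel")
PARITIES = ("even", "odd")

def pairing_choices(n_strands):
    """Yield every (strand_a, strand_b, orientation, parity) option."""
    for a, b in combinations(range(n_strands), 2):
        for orientation, parity in product(ORIENTATIONS, PARITIES):
            yield (a, b, orientation, parity)

def count_choices(n_strands):
    """Number of options for one pairing: C(n, 2) * 2 * 2."""
    return sum(1 for _ in pairing_choices(n_strands))
```

With three strands there are already 12 options for one pairing, and the count grows quadratically with the number of strands before combinations of pairings are even considered.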
20
  • Distribution of Beta Sheets in
    Proteins with Applications to Structure
    Prediction
  • Ruczinski, Kooperberg, Bonneau,
    and Baker
  • Proteins 48, 2002

21
Parallel Organization
  • Select k subsets of dihedral angles
  • Maintain a queue of (configuration, subspace) pairs for
    k optimization crews to work on
  • Each optimization crew performs a small-scale
    global optimization of its assigned configuration
    and subspace.
  • Gather intermediate results and re-insert them
    into the work queue. Idle optimization crews do
    full-dimensional local minimizations or
    additional small-scale global optimization.
  • Massively parallel exploration of optimization
    space
  • Automatic load balancing
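
The queue-based organization can be sketched with a shared work queue and a few worker threads standing in for the optimization crews. Threads are used here only to show the structure; the toy energy, the random-perturbation optimizer, and all names are hypothetical, and the re-insertion of intermediate results is left out for brevity.

```python
import math
import queue
import random
import threading

def energy(config):
    """Made-up energy: minimized when every angle is a multiple of 2*pi."""
    return sum(1.0 - math.cos(a) for a in config)

def optimize_subspace(config, subspace, seed, trials=200):
    """Toy small-scale search: perturb only the angles in `subspace`."""
    rng = random.Random(seed)
    best, best_e = list(config), energy(config)
    for _ in range(trials):
        cand = list(best)
        for i in subspace:
            cand[i] += rng.gauss(0.0, 0.3)
        e = energy(cand)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e

def crew(tasks, results, lock):
    """One optimization crew: pull tasks until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            break
        config, subspace, seed = item
        outcome = optimize_subspace(config, subspace, seed)
        with lock:
            results.append(outcome)

def run_crews(pool, subspaces, k=3):
    tasks, results, lock = queue.Queue(), [], threading.Lock()
    pairs = [(c, s) for c in pool for s in subspaces]
    for seed, (config, subspace) in enumerate(pairs):
        tasks.put((config, subspace, seed))
    workers = [threading.Thread(target=crew, args=(tasks, results, lock))
               for _ in range(k)]
    for w in workers:
        w.start()
    for _ in workers:
        tasks.put(None)  # one sentinel per crew
    for w in workers:
        w.join()
    return sorted(results, key=lambda r: r[1])

pool = [[2.0, -1.5, 0.7, 3.0], [1.0, 1.0, -2.0, 0.5]]
ranked = run_crews(pool, subspaces=[[0, 1], [2, 3]])
```

Load balancing falls out of the design: a crew that finishes early simply takes the next task from the queue, so no crew waits on a fixed assignment.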

22
2UTG_A: 7.5 Å RMSD from crystal structure
1POU: 6.3 Å RMSD from NMR structure
23
CASP competition
  • Community-wide experiment on the Critical
    Assessment of Techniques for Protein Structure
    Prediction
  • Protein crystallographers and NMR
    spectroscopists provide structures prior to their
    publication for blind prediction by participants.
  • Biennial competition open to all
    computational methods, including servers.
  • Difficulty of targets is assessed by which type
    of method succeeds in predicting the structure:
    comparative modeling (CM), fold recognition (FR),
    or new fold (NF).
  • We participated in CASP4 (Dec. 2000) and
    CASP5 (Dec. 2002).

24
Our submitted CASP4 models ranked by target
difficulty and relative accuracy
25
Results on Phospholipase C beta C-terminus,
turkey (242 amino acids). Ribbon structure
comparison between experiment (center), our
submitted M1 prediction (right), and a
next-generation run of the global optimization
algorithm (left). The M1 prediction, our lowest
energy submission, had an RMSD from experiment of
8.46 Å. The new run lowered the energy of our
previous best minimizer, resulting in a new
structure with an RMSD of 7.7 Å.
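
The RMSD figures quoted above measure the root-mean-square distance between matched atoms of two structures. A minimal sketch is shown below; it assumes the two coordinate lists are already matched atom for atom and already optimally superimposed, which in practice requires a separate alignment step not shown here. The function name and sample coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Assumes both structures are already superimposed; no rotation or
    translation fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 3.0, y + 4.0, z) for x, y, z in reference]
```

A uniform shift such as the one in `shifted` would be removed by superposition in a real comparison; it is used here only because its RMSD is easy to check by hand.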
26
CASP4 Results Summary
  • Best structure predicted on one of the
    hardest targets
  • Our method is more effective than some
    knowledge-based methods on targets for which less
    information from known proteins is available.
  • Global optimization algorithm is very
    effective at improving structures from a small
    initial population.

27
Our submitted CASP5 models ranked by target
difficulty and relative accuracy
28
Our submitted CASP5 models of targets (domains)
that were assessed in the CASP5 NEW FOLD
category.
29
Our submissions for CASP5 Target 162
30
CASP5 Results Summary
  • Ranked 15th of 165 groups in assessments of New
    Fold (and NF/FR) results.
  • Our method uses less knowledge from known protein
    structures than most other (New Fold) methods
    participating in CASP5.
  • More diverse starting populations (especially for
    β-sheet proteins) using the visualization tool
    led to better performance in some cases.

31
Future Research Directions
  • Simpler energy models for early stages of the
    algorithm, and alternative models of solvation.
  • New techniques for choosing β-strand pairings.
  • Improve our techniques for maintaining existing
    secondary structure in our models.