Computational Protein Design: A problem in combinatorial optimization

1
Computational Protein Design: A problem in
combinatorial optimization
  • CSE 549 Guest Lecture
  • September 17, 2009
  • David Green
  • Applied Mathematics & Statistics

2
What is a protein?
  • Polymers (chains) of amino acids.
  • There are 20 different amino acids that can be
    part of the chain.
  • Machines of the cell.
  • It's proteins that do most of the work involved
    in life!

3
Polymers of amino acids.
  • Amino acids link to form polypeptides.
  • There is a backbone of constant composition.
  • There are side chains that vary.

4
The twenty amino acids.
  • AA side chains vary from
  • Big to small.
  • Non-polar (all C and H) to polar.
  • Positive to negative.
  • Flexible to rigid.

5
The machinery of life.
  • Protein sensors (receptors) are responsible for
    all the senses (sight, smell, taste, touch,
    hearing).
  • Enzymes are proteins that catalyze chemical
    reactions, like the ones that convert food to
    energy.
  • Specialized structural proteins make skin
    elastic, and make the lens of the eye work.
  • Muscles are primarily composed of proteins that
    combine structural and enzymatic parts to make a
    machine.

6
Why design proteins?
  • New sensors based on biology.
  • Proteins have been engineered to detect TNT
    (explosive) and sarin (nerve gas).
  • Proteins are used as treatments for many
    diseases.
  • Protein engineering has helped improve proteins
    that are given to cancer patients on radiation or
    chemo-therapy.
  • Work in the Green lab is ongoing to design
    proteins for use as anti-HIV prophylactics.
  • Many nanotechnology applications that haven't
    even been considered yet!

7
Where do proteins come from?
  • The genome contains instructions for every
    protein in a cell.
  • A few HUGE molecules of DNA.
  • Each gene is the code for one protein.
  • There are roughly 20,000 to 30,000 genes in humans.
  • Genes are expressed through an intermediate
    molecule, RNA.
  • Many copies of each protein can be made.

8
The Central Dogma of Molecular Biology.
  • Then proteins do the work!

9
How do proteins work?
  • Proteins fold into a unique 3-dimensional
    structure.
  • The amino acid sequence of a protein dictates
    its structure.
  • The function of a protein is controlled by its
    structure.

10
Many polymers are long, unstructured chains.
  • Polyethylene
  • Is made of long chains of the same monomers.
  • Adopts a random mesh of inter-weaving strands.
  • This structure gives us PLASTIC!

11
DNA has the same structure for every sequence.
  • The double helix is a great structure for
    storing and replicating information.

12
Protein structures are well-defined and diverse!
  • One chain or many.
  • Elongated or globular.
  • Many forms of symmetry (or none).

13
What does a protein look like?
  • Cyanovirin: a protein that inhibits the entry of
    HIV into human cells.

14
What does a protein look like?
  • The atoms of a protein form a compact,
    well-packed cluster.

15
What does a protein look like?
  • A protein can be thought of as a nearly solid
    object.

16
What does a protein look like?
  • Simplified cartoons make the structure easier to
    see.

17
What does a protein look like?
  • The path of the backbone of a protein is called
    its fold.

18
What does a protein look like?
  • Different types of amino acids are found all
    along the protein chain.

19
What does a protein look like?
  • Each amino acid has a side chain that protrudes
    from the backbone.

20
What does a protein look like?
  • Many proteins bind other molecules, like the
    sugar molecules here.

21
What does a protein look like?
  • Binding interfaces are usually a close fit of two
    complementary surfaces.

22
What does a protein look like?
  • The core of a protein is key in keeping a stable
    structure.

23
Many side chains fill the core.
24
The core is well packed
25
with groups from all along the chain.
26
Each side chain fits perfectly.
27
What is a protein?
  • A protein is a complicated three-dimensional
    structure, made up by an amazing 3-D jigsaw
    puzzle of interlocking amino acids.
  • Amino acids pack together not just geometrically,
    but with complementary chemical groups as well.
  • Proteins move too, but we'll ignore that for now.

28
How can we design one?!?
  1. Choose a fold (path of the backbone).
  2. Pack the core with the right set of amino acids
    to achieve the desired fold.
  3. Choose other amino acids to achieve the desired
    function (such as binding to a target molecule,
    or getting the right molecular motions).

29
Structure prediction is a forward problem.
  • Given a protein sequence, what is the structure
    that it will adopt (fold to)?
  • This is a VERY hard problem, and it is not yet
    fully solved.
  • Prediction is difficult because you are stuck
    with what nature gives you.

30
Protein design is an inverse problem.
  • Given a desired 3-dimensional protein structure,
    what is a sequence that will fold to that
    structure?
  • We have the freedom to add constraints that
    simplify the problem.
  • As a result, methods for protein design have had
    many successes.
  • Pabo. Nature 301:200 (1983).
  • Drexler. PNAS 78:5275-5278 (1981).

31
A designed sequence should fold according to
design.
  • ANY sequence which folds to the correct target
    structure (and carries out the desired function)
    can be considered a successful design.
  • There is more than one right answer, unlike in
    prediction!

32
Choosing a backbone fold.
  • The structure dictates the function, and a big
    part of structure is the fold.
  • We still don't really know how to choose the
    best fold.
  • Instead, we just borrow from nature: redesign a
    natural protein to do something new.

33
Zinc finger proteins bind DNA.
34
A Zinc ion holds them together.
  • The protein will not fold if zinc is not present.
  • The protein only binds DNA when it is folded.
  • A group at Caltech set out to design a zinc
    finger that doesn't need zinc!

35
1997: The first fully automated protein design!
  • Dahiyat and Mayo. Science 278:82-87 (1997).

36
Designing function.
  • Making a molecule bind is like designing the
    core: we want to make the interface between the
    two pieces complementary.
  • Other functions are a lot trickier and we don't
    have good ways to solve them yet, but we're on
    our way.

37
2003: A Duke group designs a set of protein
sensors.
  • Looger, Dwyer, Smith and Hellinga. Nature
    423:185-190 (2003).

38
Protein design is a BIG problem.
  • The zinc finger is one of the smallest protein
    domains: about 30 amino acids long.
  • How many different 30 amino acid polypeptides are
    there?
  • Choose from any of 20 amino acids at each
    position.
  • Total sequences: $20^{30} \approx 1\times10^{39}$
  • Mass of earth: $6\times10^{27}$ g
  • Mass of a grain of sand: $1\times10^{-3}$ g
  • Hundreds of millions of earths' worth of sand
    grains ($10^{39} / (6\times10^{30}) \approx 2\times10^{8}$)
  • Enumeration of possible states is beyond
    impossible: we must take advantage of the need to
    achieve complementary interactions between amino
    acids.
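The arithmetic on this slide can be checked in a few lines. This is only a sketch: the constants are the ones quoted on the slide, and the variable names are invented for illustration.

```python
# Back-of-the-envelope check of the numbers quoted on the slide.
N_AMINO_ACIDS = 20   # choices at each position
N_POSITIONS = 30     # approximate length of a zinc finger domain

# Python integers are arbitrary precision, so this count is exact.
total_sequences = N_AMINO_ACIDS ** N_POSITIONS

MASS_OF_EARTH_G = 6e27        # value quoted on the slide
MASS_OF_SAND_GRAIN_G = 1e-3   # value quoted on the slide
grains_per_earth = MASS_OF_EARTH_G / MASS_OF_SAND_GRAIN_G

# How many earths, ground into sand, give one grain per sequence?
earths_of_sand = total_sequences / grains_per_earth

print(f"total sequences:  {total_sequences:.2e}")
print(f"grains per earth: {grains_per_earth:.1e}")
print(f"earths of sand:   {earths_of_sand:.1e}")
```

Whatever the exact figure, the count is far beyond anything that could be enumerated one sequence at a time.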

39
Many different structures are possible.
  • An arginine and a glutamate interact.

40
Many different structures are possible.
  • An arginine and a glutamate interact.

41
Many different structures are possible.
  • An arginine and a glutamate interact.

42
Many different structures are possible.
  • An arginine and a glutamate interact in several
    different conformations.

43
Really Big!!!
  • Amino-acid side chains are flexible.
  • But not every shape (conformation) is equal.
  • Each amino acid has a set of preferred
    conformations (rotamers).
  • 1 to 80 per amino acid.
  • Instead of choosing from 20 amino acids we need
    to choose from 400 (at least) amino acid
    rotamers!
  • Total structures: $400^{30} \approx 1\times10^{78}$
  • (approx. number of atoms in the universe!!!!!)

44
Packing side chains: a puzzle.
  • How do you solve a jigsaw puzzle?
  • Impossible to try all combinations of piece
    placement.
  • The number of unique ways of placing N pieces on
    a grid is $4^{N} \cdot N!$
  • For $N = 100$: $(1.6\times10^{60})(9.3\times10^{157}) = 1.5\times10^{218}$
  • Trying each piece one by one is better, but still
    infeasible.
  • Number of iterative tries for an N-piece puzzle:
    for $N = 100$, $1.37\times10^{6}$
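The brute-force count for the jigsaw analogy can be reproduced directly. This is a minimal sketch that assumes each piece has four possible orientations and that every ordering of pieces over the grid counts as distinct, which is how the slide's $4^{N} \cdot N!$ reads; the function name is illustrative.

```python
import math

def brute_force_placements(n_pieces: int) -> int:
    """Count placements as (4 orientations per piece)^N times N! orderings."""
    return 4 ** n_pieces * math.factorial(n_pieces)

n = 100
orientations = 4 ** n          # orientation choices for all pieces
orderings = math.factorial(n)  # assignments of pieces to grid slots
total = brute_force_placements(n)

print(f"4^N   = {orientations:.1e}")
print(f"N!    = {orderings:.1e}")
print(f"total = {total:.1e}")
```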

45
Packing side chains: be smart.
  • How do you solve a jigsaw puzzle?
  • Group pieces by colors and patterns.
  • Iterate over matching of pieces that are
    complementary
  • Shape is important.
  • The pattern must also match.

46
Pattern matching in proteins?
  • What does it mean for two amino acids in the core
    of a protein to match?
  • Must fit close together (but not too close) →
    Steric complementarity.
  • Neighboring atoms must have complementary charges
    (neutral likes neutral, positive likes negative)
    → Electrostatic complementarity.

47
Steric fit: Lennard-Jones potential.
  • Van der Waals attraction between atoms at
    moderate distances.
  • Repulsion of atoms from one another at short
    distances.
  • If atoms are not nearby, the energy between them
    will be very close to zero.
  • The total score of the goodness of fit in a
    molecule is the sum of the energy for every pair
    of atoms.
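A minimal sketch of the steric score described above, using the common 12-6 form of the Lennard-Jones potential. The slide does not give a functional form or parameters, so the well depth `epsilon` and optimal distance `r_min` below are illustrative assumptions, not values from the lecture.

```python
import math
from itertools import combinations

def lennard_jones(r: float, epsilon: float = 0.1, r_min: float = 3.8) -> float:
    """12-6 Lennard-Jones energy with its minimum of -epsilon at r = r_min.

    epsilon and r_min are placeholder values chosen for illustration.
    """
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def total_lj_energy(coords) -> float:
    """Sum the pair energy over every pair of atoms (coords: list of xyz tuples)."""
    return sum(lennard_jones(math.dist(a, b)) for a, b in combinations(coords, 2))

print(lennard_jones(2.0))    # short distance: strongly repulsive (positive)
print(lennard_jones(3.8))    # optimal distance: most attractive (-epsilon)
print(lennard_jones(12.0))   # far apart: very close to zero
print(total_lj_energy([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]))
```

The three single-pair calls correspond to the three regimes on the slide: repulsion at short range, attraction at moderate range, and essentially no interaction at long range.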

48
Electrostatic fit: Coulomb's Law
  • Atoms in molecules can be thought of as having
    tiny charges on them, even if the total charge on
    a molecule is zero.
  • Coulomb's Law describes the energy of interaction
    between two charges.
  • The overall electrostatic fit is calculated by
    adding up the energy of all pairs of atoms.
  • Like charges give a positive value.
  • Opposite charges give a negative value.
  • Neutral (zero charge) groups don't matter.
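The sign rules above follow directly from Coulomb's Law. Here is a small sketch under stated assumptions: charges are in units of the elementary charge, distances in Angstroms, and the conversion constant 332.06 gives energies in kcal/mol, a convention common in molecular mechanics but not stated in the slides. The `dielectric` parameter and the function names are likewise illustrative.

```python
import math
from itertools import combinations

# Conversion factor so that charges in e and distances in Angstroms give kcal/mol.
COULOMB_CONSTANT = 332.06

def coulomb_energy(q1: float, q2: float, r: float, dielectric: float = 1.0) -> float:
    """Pairwise Coulomb energy between two point charges separated by r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

def total_coulomb_energy(atoms) -> float:
    """Sum over all atom pairs; atoms is a list of (charge, (x, y, z)) tuples."""
    return sum(
        coulomb_energy(qa, qb, math.dist(pa, pb))
        for (qa, pa), (qb, pb) in combinations(atoms, 2)
    )

print(coulomb_energy(+0.5, +0.5, 3.0))   # like charges: positive (unfavorable)
print(coulomb_energy(+0.5, -0.5, 3.0))   # opposite charges: negative (favorable)
print(coulomb_energy(+0.5, 0.0, 3.0))    # neutral partner: contributes nothing
```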

49
The total energy describes the fitness of a
structure.
  • Van der Waals + Coulomb's Law, for every pair of
    atoms, and all added up.
  • Negative energies are favorable, positive
    energies unfavorable.
  • Nature works to MINIMIZE energy.

50
Protein Design as a Discrete Conformational Search
[Figure: tree over Position 1, Position 2, Position 3; leaves labeled "Conformational states of system"]
51
Tree Pruning with Dead End Elimination
  • Molecular Mechanics energy
  • Van der Waals
  • Coulombic
  • Bond, angle, dihedral

i, j are positions in sequence; Ri is the rotamer
choice at position i.
The Dead-end Elimination Theorem
Given two rotamer choices, X and Y, at position i,
if the best energy of X (with any choice of
rotamers at other positions) is worse than the
worst energy of Y, then X cannot be part of the
global energy minimum.
We need to make this comparison, but the min and
max functions require evaluating all states.
52
Making DEE feasible.
Ri is used to replace Xi in the first equation.
The last sum is invariant with respect to our
choice at position i, and thus: [equation]
But note that: [equation]
This gives a sufficient condition for the DEE
theorem to hold: [equation]
The min and max are evaluated over rotamers at a
single position, so this is entirely feasible!
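The equations on the two DEE slides are images and are not preserved in the transcript. A reconstruction consistent with the surrounding text, written in the standard pairwise form (compare Desmet et al., 1992), might read as follows. The notation is an assumption: $E(R_i)$ is the energy of a single rotamer, $E(R_i, R_j)$ is a pair energy, and $E_{\text{rest}}$ collects every term that does not involve position $i$.

```latex
% Pairwise energy, split into the terms that involve position i and the rest
E_{\text{total}}
  = E(R_i) + \sum_{j \ne i} E(R_i, R_j)
  + \underbrace{\sum_{j \ne i} E(R_j)
      + \sum_{j \ne i} \sum_{\substack{k > j \\ k \ne i}} E(R_j, R_k)}_{E_{\text{rest}},\ \text{independent of } R_i}

% DEE theorem: X_i cannot be part of the global minimum if
\min_{\{R_j\}} \Big[ E(X_i) + \sum_{j \ne i} E(X_i, R_j) \Big]
  > \max_{\{R_j\}} \Big[ E(Y_i) + \sum_{j \ne i} E(Y_i, R_j) \Big]

% Bounds that only need one position at a time
\min_{\{R_j\}} \sum_{j \ne i} E(X_i, R_j) \ \ge\ \sum_{j \ne i} \min_{R_j} E(X_i, R_j),
\qquad
\max_{\{R_j\}} \sum_{j \ne i} E(Y_i, R_j) \ \le\ \sum_{j \ne i} \max_{R_j} E(Y_i, R_j)

% Sufficient condition for eliminating X_i
E(X_i) + \sum_{j \ne i} \min_{R_j} E(X_i, R_j)
  \ >\ E(Y_i) + \sum_{j \ne i} \max_{R_j} E(Y_i, R_j)
```

Each min or max in the last line runs over the rotamers of one position $j$ only, which is what makes the test cheap to evaluate.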
53
Improving the bounds for DEE
The original problem statement was to compare the
best structure with Xi at position i to the worst
with Yi. It is an easier to satisfy, but still
sufficient, criterion to find the single set of
choices at all other positions with the minimum
difference between choices Xi and Yi.
Again, we use the same trick to bound the minimum
in a feasible manner: [equation]
This gives an alternate sufficient condition for
the DEE theorem to hold: [equation]
This is a tighter bound on the true desired
comparison, since: [equation]
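The improved criterion described on this slide matches the form published by Goldstein (1994). Since the slide's equations are not preserved, the following is a reconstruction in assumed notation ($E(R_i)$ for single-rotamer energies, $E(R_i, R_j)$ for pair energies), not a copy of the original.

```latex
% Eliminate X_i if, even in the environment most favorable to X relative to Y,
% X is still worse than Y:
E(X_i) - E(Y_i) + \sum_{j \ne i} \min_{R_j} \big[ E(X_i, R_j) - E(Y_i, R_j) \big] \ >\ 0

% This is at least as easy to satisfy as the min/max criterion, since for every j
\min_{R_j} \big[ E(X_i, R_j) - E(Y_i, R_j) \big]
  \ \ge\ \min_{R_j} E(X_i, R_j) - \max_{R_j} E(Y_i, R_j)
```

The inequality in the second line holds term by term, so any rotamer eliminated by the min/max test is also eliminated by this one.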
54
DEE as an iterative algorithm
As rotamers are flagged as incompatible with the
global minimum solution, the min/max functions
are evaluated over a smaller and smaller set of
choices, and so additional iterations of the
comparison can eliminate more possibilities.
Thus, the algorithm can be outlined as:
    While (any rotamers eliminated)
      For each position in the sequence (i)
        For each rotamer choice (X) at position i
          For each rotamer choice (Y ≠ X) at position i
            If DEE criterion is satisfied
              Eliminate choice X at position i
The number of comparisons in each cycle is of
order $NR^2$, where N is the number of positions,
and R is the number of choices at each position
(each comparison itself sums over the other
positions). However, as R decreases with
iteration, each subsequent cycle costs less.
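The loop outlined above can be sketched in Python. This is an illustrative toy rather than the MATLAB script used in the lecture: the energy tables are random numbers from a hypothetical `random_problem` helper, the data layout (`self_e[i][r]`, `pair_e[i][j][r][s]`) is an assumption, and the elimination test is the difference-based sufficient condition (eliminate X if it is worse than Y in every environment).

```python
import itertools
import random

def random_problem(n_pos, n_rot, seed=0):
    """Hypothetical toy energies; self terms vary more than pair terms."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-5, 5) for _ in range(n_rot)] for _ in range(n_pos)]
    pair_e = [[None] * n_pos for _ in range(n_pos)]
    for i in range(n_pos):
        for j in range(i + 1, n_pos):
            table = [[rng.uniform(-0.5, 0.5) for _ in range(n_rot)] for _ in range(n_rot)]
            pair_e[i][j] = table
            # Store the transpose so that pair_e[j][i][s][r] == pair_e[i][j][r][s].
            pair_e[j][i] = [[table[a][b] for a in range(n_rot)] for b in range(n_rot)]
    return self_e, pair_e

def total_energy(state, self_e, pair_e):
    """Energy of a full assignment: single terms plus every pair term once."""
    n = len(state)
    energy = sum(self_e[i][state[i]] for i in range(n))
    energy += sum(pair_e[i][j][state[i]][state[j]]
                  for i in range(n) for j in range(i + 1, n))
    return energy

def dee_prune(self_e, pair_e):
    """Repeat elimination sweeps until a full sweep removes nothing."""
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    eliminated_any = True
    while eliminated_any:
        eliminated_any = False
        for i in range(n):
            for x in sorted(alive[i]):
                for y in sorted(alive[i]):
                    if y == x:
                        continue
                    # Smallest possible value of E(X) - E(Y) over all environments.
                    gap = self_e[i][x] - self_e[i][y]
                    for j in range(n):
                        if j != i:
                            gap += min(pair_e[i][j][x][r] - pair_e[i][j][y][r]
                                       for r in alive[j])
                    if gap > 0:  # X is worse than Y everywhere: dead end
                        alive[i].discard(x)
                        eliminated_any = True
                        break
    return alive

self_e, pair_e = random_problem(n_pos=4, n_rot=6, seed=1)
alive = dee_prune(self_e, pair_e)
best_state = min(itertools.product(range(6), repeat=4),
                 key=lambda s: total_energy(s, self_e, pair_e))
print("surviving rotamers per position:", [len(a) for a in alive])
print("global minimum survives:", all(best_state[i] in alive[i] for i in range(4)))
```

The brute-force check at the end is only feasible because the toy problem is tiny; it is there to confirm that pruning never removes a rotamer belonging to the global minimum.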
55
DEE identifies branches that are incompatible
with the global minimum
[Figure: tree over Position 1, Position 2, Position 3; leaves labeled "Conformational states of system"]
56
The pruned tree can be much smaller
[Figure: tree over Position 1, Position 2, Position 3; leaves labeled "Conformational states of system"]
57
DEE can be highly effective
Example of a 5-position design, with 306 choices
per position, done in a simple MATLAB script:
    Iteration 0: 306 306 306 306 306
    Iteration 0: Structures 2.682916e12
    Iteration 1: 143 198 145 42 33
    Iteration 1: Structures 5.690265e09 from 2.682916e12
    Iteration 1: Elapsed time is 96.461053 seconds.
    Iteration 2: 52 83 55 11 8
    Iteration 2: Structures 20889440 from 5.690265e09
    Iteration 2: Elapsed time is 11.604577 seconds.
    Iteration 3: 4 12 36 6 5
    Iteration 3: Structures 51840 from 20889440
    Iteration 3: Elapsed time is 0.625266 seconds.
    Iteration 4: 4 12 35 6 5
    Iteration 4: Structures 50400 from 51840
    Iteration 4: Elapsed time is 0.148598 seconds.
    Iteration 5: 4 12 35 6 5
    Iteration 5: Structures 50400 from 50400
    Iteration 5: Elapsed time is 0.045631 seconds.
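The structure counts in this output are simply the product of the surviving rotamer counts at each position, which a short sketch can confirm. The per-position numbers are copied from the output above; the dictionary name is illustrative.

```python
import math

# Surviving rotamer counts per position after each DEE iteration (from the slide).
survivors = {
    0: [306, 306, 306, 306, 306],
    1: [143, 198, 145, 42, 33],
    2: [52, 83, 55, 11, 8],
    3: [4, 12, 36, 6, 5],
    4: [4, 12, 35, 6, 5],
    5: [4, 12, 35, 6, 5],
}

for iteration, counts in survivors.items():
    # Size of the remaining search space: one independent choice per position.
    print(f"Iteration {iteration}: {math.prod(counts):.6e} structures")
```

Three pruning passes shrink the space by almost eight orders of magnitude, to a size that is trivial to enumerate.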
58
DEE Notes and Caveats
  • Dead-end elimination is not guaranteed to
    eliminate any choices; in that case,
    computational expense is used at zero gain.
    However, experience suggests that in the case of
    protein design, the algorithm is highly
    efficient.
  • For large design problems, even a highly
    efficient pruning can leave a tree which is too
    large to be searched by enumeration (such as
    depth-first search); for example, consider an
    original space of $10^{100}$ states, reduced to
    $10^{20}$. An efficient bounding heuristic can be
    defined using similar tricks as discussed here;
    this can be used in the A* algorithm to find the
    global minimum within the remaining space.
  • The DEE criterion can be extended to pairs,
    triples, and n-tuples of positions. Most
    applications use only singles and pairs.
59
Optimization as a tree search.
How do we choose a path through a tree, with the
goal of reaching the global minimum state?
First, define an energy to be associated with
each step through the tree: [equation]
This is the energy of placing rotamer choice R at
position i, given that all positions in the tree
above i have been selected. Note this is similar,
but not identical, to the definition used with
DEE.
At the leaf node, the state energy is then the
sum of all individual path energies: [equation]
Thus, we wish to find the path with the lowest
total.
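The step energy and the leaf energy described on this slide are not preserved in the transcript. One way to write them, consistent with the description and with a pairwise energy function, is sketched below; the symbol $e$ for the step energy is an assumption.

```latex
% Energy of placing rotamer R_i at position i, given the choices above it in the tree
e(R_i \mid R_1, \dots, R_{i-1}) = E(R_i) + \sum_{j < i} E(R_j, R_i)

% Energy of a leaf (a complete assignment) is the sum of the step energies along its path
E_{\text{total}}(R_1, \dots, R_N) = \sum_{i=1}^{N} e(R_i \mid R_1, \dots, R_{i-1})
```

Each pair energy is counted exactly once, at the deeper of its two positions, which is why this differs from the per-position energy used in DEE, where the sum runs over all other positions.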
60
Efficient search using a heuristic (A*)
The challenge is that we do not know the total
until we have traversed the tree to a leaf!
When choosing a path to take at a given node, we
know the path we have already taken, and thus its
true cost. We thus combine this information with
the heuristic (an estimate of the energy still to
come below that node) in deciding what step to
take: [equation]
At a leaf node, the heuristic is 0, and this
gives the total energy.
However, if we only use the heuristic, how do we
know we get the correct solution?
61
Guaranteed optimality with A*
  1. Allow backtracking.
    At every step, choose not only between the
    possible paths from the current node, but also
    the paths from all nodes which have been visited
    in the past. In this way, if the heuristic turned
    out to be poor for a given path (and the true
    energy became large), a new path is chosen.
  2. Use a heuristic that bounds the true solution.
    Consider a heuristic that is guaranteed to be
    lower than the true best energy down a path (an
    optimistic prediction of the best energy). When a
    leaf is reached, a comparison between the true
    energy of that leaf and the heuristic energy of
    the unfollowed choices can be made. Since the
    heuristic is an optimistic guess, if the true
    energy is lower than the heuristic for all other
    choices, it must be the global minimum.
Thus, it is possible to have a guarantee that the
solution found is the true global minimum!
62
Defining the bounding heuristic
Recall our energy definitions: [equation]
The optimal heuristic is: [equation]
Now, we bound the second term: [equation]
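The heuristic equations on this slide are not preserved. A reconstruction that fits the description on the following slide (an outer minimum over each choice at every later position, with an inner minimum over positions after that) is given below; it follows the general approach of Leach and Lemon (1998), and the symbols $f$, $g$, $h$, and $h^{*}$ are assumed notation.

```latex
% Score of a partial path through positions 1..i: true cost so far plus an estimate
f = g + h, \qquad g = \sum_{k \le i} \Big[ E(R_k) + \sum_{l < k} E(R_l, R_k) \Big]

% Optimal heuristic: the true best completion of the path (as hard as the original problem)
h^{*} = \min_{R_{i+1}, \dots, R_N} \sum_{j > i}
        \Big[ E(R_j) + \sum_{k \le i} E(R_k, R_j) + \sum_{k > j} E(R_j, R_k) \Big]

% Bounding heuristic: minimize each later position independently
h = \sum_{j > i} \min_{R_j}
    \Big[ E(R_j) + \sum_{k \le i} E(R_k, R_j) + \sum_{k > j} \min_{R_k} E(R_j, R_k) \Big]
  \ \le\ h^{*}
```

Because every term is minimized separately, $h$ can never exceed the true remaining energy, which is the optimistic-bound property needed for the optimality guarantee.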
63
Overview of A* for protein design
    Initialize a list of traveled nodes with the root.
    While (no minimum leaf in list)
      Select minimum f of all paths from nodes in list.
      Add this new node to list.
      If (new node is leaf)
        Compare leaf energy to minimum f from list.
        If (leaf energy < min(f))
          Leaf is global minimum.
Our final heuristic is: [equation]
The second term involves a minimum over each
choice at every following position, which is of
order $(N-i)R$, with an inner minimum over the
same, of order $(N-j)R$. Thus, the cost is less
than $(N-i)^2R^2$.
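The outline above can be sketched with a priority queue. This is a toy under stated assumptions, not the lecture's implementation: the energies come from a hypothetical `random_problem` helper, the data layout (`self_e[i][r]`, `pair_e[i][j][r][s]`) is invented for the example, and the heuristic is an optimistic bound that minimizes each unassigned position independently.

```python
import heapq
import itertools
import random

def random_problem(n_pos, n_rot, seed=0):
    """Hypothetical toy energy tables, symmetric in the two positions."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_pos)]
    pair_e = [[None] * n_pos for _ in range(n_pos)]
    for i in range(n_pos):
        for j in range(i + 1, n_pos):
            table = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
            pair_e[i][j] = table
            pair_e[j][i] = [[table[a][b] for a in range(n_rot)] for b in range(n_rot)]
    return self_e, pair_e

def total_energy(state, self_e, pair_e):
    """Energy of a full assignment: single terms plus every pair term once."""
    n = len(state)
    return (sum(self_e[i][state[i]] for i in range(n))
            + sum(pair_e[i][j][state[i]][state[j]]
                  for i in range(n) for j in range(i + 1, n)))

def astar_min(self_e, pair_e):
    """Return (energy, path) of the global minimum via best-first search."""
    n = len(self_e)

    def heuristic(path):
        """Optimistic bound on the energy still to be added below this node."""
        i = len(path)
        bound = 0.0
        for j in range(i, n):
            bound += min(
                self_e[j][r]
                + sum(pair_e[k][j][path[k]][r] for k in range(i))      # to chosen positions
                + sum(min(pair_e[j][k][r]) for k in range(j + 1, n))   # best case to later ones
                for r in range(len(self_e[j]))
            )
        return bound

    # Each entry is (f, g, path): f = g + h, g = true energy of the partial path.
    frontier = [(heuristic(()), 0.0, ())]
    while frontier:
        f, g, path = heapq.heappop(frontier)   # lowest f among ALL open nodes
        if len(path) == n:
            return g, path                     # leaf with minimal f is optimal
        i = len(path)
        for r in range(len(self_e[i])):
            step = self_e[i][r] + sum(pair_e[k][i][path[k]][r] for k in range(i))
            child = path + (r,)
            heapq.heappush(frontier, (g + step + heuristic(child), g + step, child))
    raise ValueError("empty search space")

self_e, pair_e = random_problem(n_pos=5, n_rot=4, seed=3)
energy, path = astar_min(self_e, pair_e)
brute = min(total_energy(s, self_e, pair_e)
            for s in itertools.product(range(4), repeat=5))
print("A* minimum:", round(energy, 6), "brute force:", round(brute, 6))
```

Keeping every open node in one heap is what implements the backtracking step: the next node expanded may come from any previously visited branch, not just the current one.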
64
A* Notes and Caveats
  • Performance of A* is not guaranteed: in the
    worst case, the entire tree must be enumerated to
    find the solution. Again, however, experience
    suggests that the algorithm is highly efficient
    for protein design.
  • In many cases, we wish not only to have the
    global minimum, but all solutions within a cutoff
    of the minimum. A* can be adapted to solve this
    problem as well, but care must be taken in the
    first step of pruning with DEE: the elimination
    criterion must be modified to prevent elimination
    of low, but not minimum, energy states.
65
References
  • Protein design as an inverse problem
  • Pabo. Nature 301:200 (1983).
  • Drexler. PNAS 78:5275-5278 (1981).
  • Examples of successful protein design
  • Dahiyat and Mayo. Science 278:82-87 (1997).
  • Looger, Dwyer, Smith and Hellinga. Nature
    423:185-190 (2003).
  • Development of the Dead-End Elimination
    algorithm
  • Desmet, De Maeyer, Hazes and Lasters. Nature
    356:539-542 (1992).
  • Goldstein. Biophys. J. 66:1335-1340 (1994).
  • Gordon and Mayo. J. Comput. Chem. 19:1505-1514
    (1998).
  • Formulation of A* for protein design
  • Leach and Lemon. Proteins 33:227-239 (1998).