Title: Sidechain Placement and Protein Design


1
Sidechain Placement and Protein Design
  • GCMB07, 2 May

2
Protein design
  • Sequence → Structure → Function
  • Modify sequence to change structure
    and function (binding) Looger03
  • or behavior (folding order) Ambroggio06
  [Figure: ribose binding protein with sequence fragments KDTIALVVST, YPVDLKLVVKQ, TNT]

3
Protein Design or Redesign
  • Create an amino acid sequence that folds to a
    stable protein and performs a desired function
  • Avoid
  • Sampling all sequences
  • Solving protein folding
  • Relying on molecular dynamics
  • A successful design strategy: build on an
    existing structure
  • Scaffold backbone from a known folded structure
  • Redesign 20 residues
  • Find side chains that fit

4
Outline
  • Sidechain Rotamers & Rotamer Libraries
  • Algorithms for Sidechain Placement
  • Brute Force
  • Dead End Elimination
  • Simulated Annealing
  • Stochastic Mean Field
  • Dynamic Programming
  • A Biased View of Protein Structure Design
  • How is design done?
  • Why is it successful?

5
Protein Structure
  • Chemical
  • 1-Dimensional Sequence of amino acids
  • Two components for each amino acid
  • Backbone (N, Cα, C, O)
  • Side chain (residue)
  • Placed residue: a position in an amino acid
    sequence

[Figure: chemical structure diagram with labels MSS and MSW]
6
Side chain geometry
  • Conformation flexibility from dihedral angles
  • Side chain internal geometry
  • Bond angles and bond lengths fixed
  • Dihedrals χ1, χ2, ... may rotate
  • Rotamers: rotational isomers
  • Side chains have preferred conformations
  • Prefer dihedrals around 60°, 180° and -60°
  • Rotamer library: set of dihedral angles
    Ponder87, Dunbrack93, Lovell2000

7
Side chain conformation
Side chains differ in size (# of atoms) and
degrees of freedom (# of χ angles)
[Figure: side chain with dihedral angles χ1 and χ2 marked]
8
Serine χ1 distribution
A chosen combination of side chain torsion
angles χ1, χ2, etc. for a residue is known
as a rotamer.
9
Side chain conformations: canonical staggered forms
[Figure: Newman projections for χ1 of glutamate, labeled with the name of each conformation; t = trans, g = gauche]
Side chain angles are defined moving outward from
the backbone, starting with the N atom, so the χ1
angle is N-Cα-Cβ-Cγ, the χ2 angle is Cα-Cβ-Cγ-Cδ,
...
IUPAC nomenclature: http://www.chem.qmw.ac.uk/iupac/misc/biop.html
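
A dihedral such as χ1 can be computed from the coordinates of the four atoms that define it. The following is a minimal sketch using only the standard library; the function name `dihedral_degrees` and the test coordinates are illustrative assumptions, not part of the original slides. It uses the usual atan2 formulation, with the sign chosen so that a clockwise rotation of the front bond onto the rear bond is positive.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four points.

    For chi1 the points would be the N, CA, CB and CG atom positions.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal of the plane (p1, p2, p3)
    n2 = _cross(b2, b3)          # normal of the plane (p2, p3, p4)
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up coordinates: a planar trans arrangement and a planar cis arrangement.
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
```

Binning the resulting angle near 60°, 180° or -60° gives the staggered conformation it belongs to.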
10
Backbone independent rotamer library
  • Dunbrack & Cohen, 1997

11
What do rotamer libraries provide? J. Meiler07
  • Rotamer libraries significantly reduce the number
    of conformations that need to be evaluated during
    the search.
  • This is done with almost no risk of missing the
    real conformations.
  • Even small libraries of about 100-150 rotamers
    cover about 96-97% of the conformations actually
    found in protein structures.
  • The probabilities of each rotamer in the library
    provide estimates of the potential energy due to
    interactions within the side chain and with the
    local backbone atoms, using the Boltzmann
    distribution: E ∝ -ln(P)
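
As a rough illustration of the Boltzmann relation, library probabilities can be turned into pseudo-energies with E = -kT ln(P). This is a sketch only: the function name `rotamer_energies`, the `kT` parameter, and the probabilities below are invented for the example and are not taken from any published rotamer library.

```python
import math

def rotamer_energies(probabilities, kT=1.0):
    """Map each rotamer's library probability to a pseudo-energy -kT * ln(P)."""
    return {name: -kT * math.log(p) for name, p in probabilities.items()}

# Hypothetical chi1 probabilities for a single residue (illustrative values).
probs = {"g+": 0.5, "t": 0.3, "g-": 0.2}
energies = rotamer_energies(probs)
```

More probable rotamers receive lower energies, and only energy differences between rotamers are meaningful.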

12
Side chain geometry
  • Conformation flexibility from dihedral angles
  • Side chain internal geometry
  • Bond angles and bond lengths fixed
  • Dihedrals χ1, χ2, ... may rotate
  • Rotamers: rotational isomers
  • Side chains have preferred conformations
  • Prefer dihedrals around 60°, 180° and -60°
  • Rotamer library: set of dihedral angles
    Ponder87, Dunbrack93, Lovell2000
  • http://dunbrack.fccc.edu/bbdep/bbdepdownload.php
    (Backbone dependent and independent libraries)
  • http://kinemage.biochem.duke.edu/databases/rotamer.html
    (Backbone independent library)

13
Rotamers in crystallographic refinement
Fit structure to electron density from X-ray
diffraction
  • Red indicates clashes with added hydrogen atoms

better choice of side chain
14
Outline
  • Sidechain Rotamers & Rotamer Libraries
  • Algorithms for Sidechain Placement
  • Brute Force Search
  • Dead End Elimination
  • Simulated Annealing
  • Stochastic Mean Field
  • Dynamic Programming

15
Side Chain Placement Problem
  • Given
  • A fixed protein backbone
  • A set of fixed (background) residues
  • A set of changing (molten) residues
  • A list of allowed amino acids for each molten
    residue
  • A rotamer library
  • A pairwise decomposable energy function
  • Find the assignment of rotamers to the molten
    residues, S, that minimizes the energy function

Kinemage rotamers for Ubiquitin surface residues
16
Energy Functions
  • f : Protein Structure → ℝ
  • Lennard-Jones
  • van der Waals: attractive energies
  • atom overlap: repulsive energies
  • Electrostatics
  • Solvent Effects
  • Hydrogen bonds
  • Often pairwise decomposable
  • sum of atom-pair or rotamer-pair interaction
    energies
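
The Lennard-Jones term can be sketched with the standard 12-6 form, which combines the attractive van der Waals part and the repulsive overlap part in one atom-pair function. The parameter names `epsilon` (well depth) and `sigma` (distance at which the energy crosses zero) and their default values are assumptions for illustration; design programs typically use tuned or softened variants.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones energy for two atoms at distance r."""
    sr6 = (sigma / r) ** 6
    # sr6 * sr6 is the repulsive r^-12 term, -sr6 the attractive r^-6 term.
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

r_min = 2.0 ** (1.0 / 6.0)   # distance of the energy minimum when sigma = 1
```

Summing such terms over atom pairs gives a pairwise decomposable score, since each term depends on only two atoms.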

17
Side Chain Placement Problem
  • Find the assignment of rotamers to the molten
    residues, S, that minimizes the energy function
  • Functions stated in terms of rotamer energies
  • rotamer / background energy
  • rotamer pair energies

Esingle
Epair
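
With a pairwise decomposable function, the energy of an assignment is the sum of one rotamer/background term per residue plus one term per interacting residue pair. The sketch below evaluates that sum and finds the minimum by brute-force enumeration. The table layout (`e_single[residue][rotamer]`, `e_pair[(res_a, res_b)][(rot_a, rot_b)]` with `res_a < res_b`), the residue and rotamer names, and all numbers are assumptions made for the example.

```python
from itertools import product

def total_energy(assignment, e_single, e_pair):
    """Sum of single (rotamer/background) and pair (rotamer/rotamer) energies."""
    residues = sorted(assignment)
    energy = sum(e_single[r][assignment[r]] for r in residues)
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            # Missing entries mean the two residues do not interact.
            energy += e_pair.get((a, b), {}).get((assignment[a], assignment[b]), 0.0)
    return energy

def brute_force(e_single, e_pair):
    """Try every combination of rotamers; return (best energy, best assignment)."""
    residues = sorted(e_single)
    best = None
    for states in product(*(sorted(e_single[r]) for r in residues)):
        assignment = dict(zip(residues, states))
        e = total_energy(assignment, e_single, e_pair)
        if best is None or e < best[0]:
            best = (e, assignment)
    return best

# Toy instance: two molten residues with two rotamers each.
e_single = {"A": {"r1": 0.0, "r2": 1.0}, "B": {"r1": 0.5, "r2": 0.0}}
e_pair = {("A", "B"): {("r1", "r1"): 0.0, ("r1", "r2"): 3.0,
                       ("r2", "r1"): 0.0, ("r2", "r2"): -2.0}}
best_energy, best_assignment = brute_force(e_single, e_pair)
```

The enumeration visits the product of all rotamer counts, which is why it is only usable on very small instances.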
18
Side Chain Placement Problem
  • NP-Complete
  • Reduction from SAT Pierce2002
  • Techniques
  • Optimality Guarantee
  • Dead-End Elimination Desmet92, Goldstein94,
    Looger2001
  • Integer Linear Programming Erickson2001
  • Branch and Bound Gordon99, Canutescu2003
  • Dynamic Programming Leaver-Fay2005
  • No Optimality Guarantee
  • Genetic Algorithms Jones94
  • Simulated Annealing Holm92, Hellinga94, Kuhlman03
  • Self-Consistent Mean Field Koehl96

19
Dead End Elimination (DEE)
  • Reduce the search space without losing the Global
    Minimum Energy Conformation (GMEC).
  • Eliminates rotamers which cannot be in the GMEC,
    using more accurate (and more computationally
    expensive) upper and lower bounds.
  • Uses brute force search on rotamers remaining.
  • Typically assumes that the scoring function can
    be expressed as a sum of pair-wise interactions

20
A first, simple condition for elimination
  • A rotamer can be eliminated for a residue when
    the minimum (best) energy it obtains by
    interaction with other rotamers is still
    higher (worse) than the maximum energy of some
    other rotamer
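
The simple elimination condition can be sketched as follows: rotamer r at residue i is eliminated if its best-case energy exceeds the worst-case energy of some alternative rotamer t at the same residue. The table layout (`e_single[residue][rotamer]`, `e_pair[(res_a, res_b)][(rot_a, rot_b)]`), the helper names, and the numbers are assumptions for illustration.

```python
def pair_energy(e_pair, i, ri, j, rj):
    """Look up the pair energy regardless of the order the pair was stored in."""
    if (i, j) in e_pair:
        return e_pair[(i, j)].get((ri, rj), 0.0)
    return e_pair.get((j, i), {}).get((rj, ri), 0.0)

def can_eliminate_simple(i, r, e_single, e_pair):
    """True if rotamer r at residue i satisfies the simple elimination condition."""
    others = [j for j in e_single if j != i]
    # Best case for r: every other residue picks the rotamer most favorable to r.
    best_r = e_single[i][r] + sum(
        min(pair_energy(e_pair, i, r, j, s) for s in e_single[j]) for j in others)
    for t in e_single[i]:
        if t == r:
            continue
        # Worst case for t: every other residue picks the rotamer least favorable to t.
        worst_t = e_single[i][t] + sum(
            max(pair_energy(e_pair, i, t, j, s) for s in e_single[j]) for j in others)
        if best_r > worst_t:
            return True
    return False

# Toy instance: rotamer a2 has a high self energy that no partner can offset.
e_single = {"A": {"a1": 0.0, "a2": 5.0}, "B": {"b1": 0.0, "b2": 0.0}}
e_pair = {("A", "B"): {("a1", "b1"): 1.0, ("a1", "b2"): -1.0,
                       ("a2", "b1"): 0.0, ("a2", "b2"): 0.5}}
```

Here the best case for a2 is 5.0 and the worst case for a1 is 1.0, so a2 cannot be part of the GMEC.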

21
The Goldstein improvement
  • A rotamer can be safely eliminated when there
    exists a rotamer that has lower (better) energy
    for each given environment.
  • This criterion is more powerful, though it
    typically requires more computation time.
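
A sketch of the Goldstein test: rotamer r is compared against an alternative t in the same environment, taking for each neighboring residue the rotamer that most favors r relative to t. If r still loses, it can be eliminated. The table layout (`e_single[residue][rotamer]`, `e_pair[(res_a, res_b)][(rot_a, rot_b)]`), the helper names, and the numbers are assumptions for illustration.

```python
def pair_energy(e_pair, i, ri, j, rj):
    """Look up the pair energy regardless of the order the pair was stored in."""
    if (i, j) in e_pair:
        return e_pair[(i, j)].get((ri, rj), 0.0)
    return e_pair.get((j, i), {}).get((rj, ri), 0.0)

def can_eliminate_goldstein(i, r, e_single, e_pair):
    """True if some rotamer t at residue i beats r in every environment."""
    others = [j for j in e_single if j != i]
    for t in e_single[i]:
        if t == r:
            continue
        # Smallest possible advantage of t over r, minimized per neighbor.
        gap = e_single[i][r] - e_single[i][t] + sum(
            min(pair_energy(e_pair, i, r, j, s) - pair_energy(e_pair, i, t, j, s)
                for s in e_single[j])
            for j in others)
        if gap > 0:
            return True
    return False

# Toy instance: a2 is always 1.5 worse than a1, whichever rotamer B adopts.
e_single = {"A": {"a1": 0.0, "a2": 1.0}, "B": {"b1": 0.0, "b2": 0.0}}
e_pair = {("A", "B"): {("a1", "b1"): 0.0, ("a1", "b2"): 10.0,
                       ("a2", "b1"): 0.5, ("a2", "b2"): 10.5}}
```

In this instance a comparison of the best case of a2 (1.5) with the worst case of a1 (10.0) would not eliminate a2, while the per-environment comparison does.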

22
Even more powerful criteria can be obtained with
even more computation
  • A rotamer can be safely eliminated when, for each
    environment, there exists some rotamer that has
    lower (better) energy.

23
Dynamic Programming via an Interaction Graph
  • Surface residues on Ubiquitin's β-sheet

Interaction graph defined by Rosetta's
energy function
24
Interaction Graph
  • G = (V, E), a multi-hypergraph
  • vertices ↔ molten residues: v
  • state space ↔ rotamers for a residue: S(v)
  • edge ↔ possibility of residue interaction: e ⊆ V
  • scoring function ↔ interaction energy:
    f_e : ∏_{v ∈ e} S(v) → ℝ

[Figure: a hypergraph compared with a graph]
25
Interaction Graph Evaluation (Pairwise case)
  • For G = (V, E), minimization:
  • Each vertex, v, has a function to capture
    interactions with the background: f_v : S(v) → ℝ
  • Each pair of interacting vertices, {u, v},
    defines an edge with a function to capture pair
    interactions: f_{u,v} : S(u) × S(v) → ℝ
  • Given an interaction graph, G = (V, E), find the
    state assignment S that minimizes Σ_{w ∈ V ∪ E} f_w
26
Bottom Up Dynamic Programming
  • Eliminate node v
  • Let E_v be the edges incident upon v
  • Let N_v be the neighbors of v
  • For each edge e ∈ E_v with scoring function f_e,
    let f_{e,v=s} be edge e's scoring function with
    vertex v fixed in state s
  • Create a new hyperedge incident upon N_v.
  • Compute f_{N_v} = min_{s ∈ S(v)} Σ_{e ∈ E_v} f_{e,v=s}
  • Remove v from graph
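
The elimination step above can be sketched with scoring functions stored as tables. This is an illustrative sketch, not the implementation behind the slides: each factor is a hypothetical `(scope, table)` pair, where `scope` is a tuple of vertices and `table` maps a tuple of their states to an energy. Vertex functions are treated as hyperedges over a single vertex, and the instance at the bottom is made up.

```python
from itertools import product

def eliminate(factors, states, v):
    """Replace all factors touching v by one new factor over v's neighbors."""
    touching = [f for f in factors if v in f[0]]
    rest = [f for f in factors if v not in f[0]]
    neighbors = tuple(sorted({u for scope, _ in touching for u in scope if u != v}))
    table = {}
    for assign in product(*(states[u] for u in neighbors)):
        ctx = dict(zip(neighbors, assign))
        best = float("inf")
        for s in states[v]:
            ctx[v] = s  # fix v in state s and sum every incident factor
            total = sum(tab[tuple(ctx[u] for u in scope)] for scope, tab in touching)
            best = min(best, total)
        table[assign] = best  # f_{N_v}(assign) = min over the states of v
    return rest + [(neighbors, table)]

def min_energy(factors, states, order):
    """Eliminate the vertices in the given order; return the minimum total energy."""
    for v in order:
        factors = eliminate(factors, states, v)
    # Every remaining factor now has an empty scope and holds a constant.
    return sum(tab[()] for _, tab in factors)

# Toy chain u - v - w with two states per vertex.
states = {"u": [0, 1], "v": [0, 1], "w": [0, 1]}
factors = [
    (("u",), {(0,): 1.0, (1,): 0.0}),
    (("v",), {(0,): 0.0, (1,): 0.5}),
    (("w",), {(0,): 0.2, (1,): 0.0}),
    (("u", "v"), {(0, 0): 0.0, (0, 1): -1.0, (1, 0): 2.0, (1, 1): 0.0}),
    (("v", "w"), {(0, 0): -0.5, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.3}),
]
gmec_energy = min_energy(factors, states, ["u", "v", "w"])
```

The size of each new table grows with the product of the neighbors' state counts, so the cost depends on the elimination order. This sketch returns only the minimum energy; recovering the assignment would additionally require storing the minimizing state for each table entry.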

27
Scoring Function Representation Tables
[Figure: table for edge e = {u, v}, indexed by the states S(u) and S(v); state labels a-e and f-j]
28
Scoring Function Representation Tables
[Figure: table for hyperedge e = {u, v, w}, indexed by the states of u, v and w]
29
Experiments and Results
  • Rotamer Relaxation Task
  • Sequence fixed; choose new rotamers for each
    residue
  • Redesign Task
  • Search of conformation and sequence spaces.
  • Ubiquitin's 15 surface residues
  • Large rotamer library
  • Relaxation, 32 states per vertex, treewidth-4
    interaction graph
  • Redesign, 680 states per vertex, treewidth-3
    interaction graph (drop one edge)

             Running Time   Memory
Relaxation   200 ms         (small)
Redesign     15.99 hrs      3.7 GB
30
Dynamic Programming for Hydrogen Placement
  • Dynamic programming (DP) limited by treewidth of
    graph instances
  • Treewidths from graphs in protein design too
    large for DP to be practical
  • Adding hydrogen atoms to PDB
  • Hydrogen placement via combinatorial
    optimization in REDUCE Word99
  • Non-pairwise decomposable energy function
  • Previously used brute force
  • Replaced with dynamic programming
  • Interaction graphs have low treewidth
  • Effective in practice: minutes to ms.
  • REDUCE v3.02 in MolProbity suite, and distributed
    from http://kinemage.biochem.duke.edu/software/reduce.php

31
Simulated Annealing
  • Stochastic optimization technique
  • Monte Carlo
  • Make a random change, determine ΔE
  • Metropolis criterion Metropolis57
  • accept with probability min(1, exp(-ΔE / T))
  • Gradually lower temperature T
  • In Side Chain Placement
  • Assign each residue a rotamer
  • Repeat
  • Select a random residue, and a random alternate
    rotamer
  • Find ΔE induced by substituting the alternate
    rotamer
  • Accept/Reject substitution according to
    Metropolis criterion
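
The loop above can be sketched as follows. This is an illustration under assumptions: the table layout (`e_single[residue][rotamer]`, `e_pair[(res_a, res_b)][(rot_a, rot_b)]` with `res_a < res_b`), the geometric cooling schedule, and all parameter values are chosen for the example, and ΔE is obtained by recomputing the full energy, where a practical implementation would evaluate only the terms involving the changed residue.

```python
import math
import random

def energy(assignment, e_single, e_pair):
    """Total energy: single terms plus pair terms for interacting residues."""
    total = sum(e_single[r][s] for r, s in assignment.items())
    for (a, b), table in e_pair.items():
        total += table.get((assignment[a], assignment[b]), 0.0)
    return total

def anneal(e_single, e_pair, steps=2000, t_start=5.0, t_end=0.05, seed=0):
    """Simulated annealing over rotamer assignments; returns (energy, assignment)."""
    rng = random.Random(seed)
    residues = sorted(e_single)
    current = {r: rng.choice(sorted(e_single[r])) for r in residues}
    cur_e = energy(current, e_single, e_pair)
    best, best_e = dict(current), cur_e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        res = rng.choice(residues)
        alternates = [s for s in sorted(e_single[res]) if s != current[res]]
        if not alternates:
            continue
        trial = dict(current)
        trial[res] = rng.choice(alternates)
        trial_e = energy(trial, e_single, e_pair)
        delta = trial_e - cur_e
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_e = trial, trial_e
            if cur_e < best_e:
                best, best_e = dict(current), cur_e
    return best_e, best

# Toy instance: two molten residues with two rotamers each.
e_single = {"A": {"r1": 0.0, "r2": 1.0}, "B": {"r1": 0.5, "r2": 0.0}}
e_pair = {("A", "B"): {("r1", "r1"): 0.0, ("r1", "r2"): 3.0,
                       ("r2", "r1"): 0.0, ("r2", "r2"): -2.0}}
sa_energy, sa_assignment = anneal(e_single, e_pair)
```

The sketch keeps the best assignment seen so far; unlike dead-end elimination or dynamic programming, nothing guarantees that this is the global minimum.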

32
Self-consistent mean field
  • I planned to cull a description from Patrice's
    BioEbook sections
  • http://nook.cs.ucdavis.edu:8080/koehl/BioEbook/design_scmf.html
  • http://nook.cs.ucdavis.edu:8080/koehl/BioEbook/scmf.html
  • but didn't have time in class.

33
The practical problem of side chain modeling M07
  • The way we deal today with the problem of protein
    structure prediction is very different from the
    way nature deals with it.
  • Due to technical issues such as computation time
    we are usually forced to accept a fixed backbone
    and only then put the side chains on it.
  • The quality of the side chain modeling is
    therefore heavily dependent on the position of
    the backbone. If the initial backbone
    conformation is wrong, the side chain modeling
    quality will be accordingly bad.
  • What is really needed is a combined algorithm
    that optimizes backbone conformation
    simultaneously with side chain modeling.

34
Protein Design or Redesign
  • Create an amino acid sequence that folds to a
    stable protein and performs a desired function
  • Avoid
  • Sampling all sequences
  • Solving protein folding
  • Relying on molecular dynamics
  • A successful design strategy: build on an
    existing structure
  • Scaffold backbone from a known folded structure
  • Redesign 20 residues
  • Find side chains that fit

35
Why Design Proteins?
  • Nature uses proteins
  • to signal events
  • to catalyze reactions
  • to move cells (motors)
  • to bear weight (I-beams)
  • Design is an experiment to help understand
    folding/binding
  • Industrial biosynthesis
  • Proteins are both efficient and specific
  • Cure disease
  • Antibodies
  • Inhibition peptides as drugs
  • Perturb cell signaling pathways

36
Why do RosettaDesign, Dezymer, work?
  • Geometric approximations (3d jigsaw puzzles) are
    surprisingly effective in design.
  • They mine PDB structures for behaviors of native
    proteins and fragments.
  • They precompute energies for pairwise
    interactions.
  • They use many fast computers to allow detailed
    sampling of discrete conformations.
  • Fast optimization algorithms
  • Competition

37
How do RosettaDesign, Dezymer, fail?
  • Computationally difficult to achieve good packing
    and hydrogen bond satisfaction in protein core
  • Scores for packing, solvation and hydrogen bond
    satisfaction cannot be pairwise additive.
  • Scores are often used as filters; we'd prefer to
    optimize.
  • Stability of designed proteins
  • Multistate or negative design

38
Protein Stability
  • A naturally occurring protein adopts a compact
    geometry when placed in water
  • Stability is the difference in free energies of
    the folded and unfolded states

39
Protein Stability
  • A naturally occurring protein adopts a compact
    geometry when placed in water
  • Different proteins have different free energies
    in their unfolded states

40
Challenges in Protein Design
  • Side chain placement is hard
  • The complexities of individual instances of SCPP
    are related to the treewidth of their interaction
    graphs.
  • Tight, collision-free packing is often impossible
    on the input scaffold
  • The interaction graph to allow simultaneous
    optimization of side chain and backbone
    structures
  • Protein stability is not well captured by
    pairwise decomposable energy functions
  • The interaction graph supports using non-pairwise
    decomposable energy functions during side chain
    placement