Combinatorial optimisation in protein structure prediction and recognition: Background, review, and research direction

1
Combinatorial optimisation in protein structure
prediction and recognition: Background, review,
and research direction
  • Speaker: Vicky Mak

2
What's in this talk?
  • What is protein structure prediction and
    recognition?
  • Who has done what before?
  • What's interesting and hasn't been done?
  • Being critical about others' work is easy.
  • Doing something brilliant is difficult.
  • This talk addresses the easy problem.

3
Combining two amino acids
[Figure: two amino acids before and after being joined.]
4
Protein polypeptide chain
[Figure: a polypeptide chain, labelled from the N-terminal to the C-terminal.]
A polypeptide chain is a chain of amino acids linked
together by peptide bonds. All amino acids share the
same backbone and differ only in their side chains
(the "residues"). There are 20 such amino acids.
Different combinations of these 20 amino acids make
different proteins. A protein sequence can contain
from tens to thousands of amino acids.
5
An example
Primary structure: individual amino acids.
Quaternary structure: green and blue chains.
Secondary structure: α-helix and β-sheet.
The green chain defines a tertiary structure, and so
does the blue chain.
6
Motivation
  • Note: it is the 3-D structures of proteins that
    matter (two different sequences can have exactly
    the same structure!)
  • We need to know the shape of a protein in order to
    develop antibodies that bind to that shape: fold
    prediction.
  • Antibodies produced against one protein may also
    work for another protein that looks similar:
    structure recognition.

7
Structure prediction
8
HP models (ab initio prediction)
  • Given a sequence of amino acids, determine the
    structure from scratch.
  • Hydrophobic-hydrophilic (HP) model
  • proposed by Dill (1985)
  • Two groups of amino acids
  • Hydrophobic acids (H)
  • Hydrophilic acids (P)
  • Self-avoiding walks on lattices
  • Objective: minimise the global free energy
  • In other words, it is good to place as many
    hydrophobic acids as possible close together.
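To make the HP objective concrete, here is a minimal Python sketch that scores one fold on the 2-D square lattice. It assumes a fold is encoded as a string of unit moves (`U`, `D`, `L`, `R`) starting at the origin; that encoding and the names `walk` and `hh_contacts` are illustrative choices, not taken from the talk. The HP energy is then minus the number of HH contacts.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]

# Unit moves on the 2-D square lattice (other lattices would need other moves).
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk(moves: str) -> List[Point]:
    """Turn a move string such as "RRULL" into lattice coordinates, starting at (0, 0)."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = STEPS[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hh_contacts(seq: str, moves: str) -> Optional[int]:
    """Count HH contacts of a fold, or return None if the walk is not self-avoiding.

    A contact is a pair of H residues on adjacent lattice points that are not
    consecutive in the sequence.
    """
    if len(moves) != len(seq) - 1:
        raise ValueError("need exactly len(seq) - 1 moves")
    coords = walk(moves)
    if len(set(coords)) != len(coords):
        return None  # two residues share a lattice point
    index = {p: i for i, p in enumerate(coords)}
    count = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair once
            if j is not None and j > i + 1 and seq[j] == "H":
                count += 1
    return count


print(hh_contacts("HHPPHH", "RRULL"))  # 2 contacts, i.e. energy -2
print(hh_contacts("HPPH", "RRL"))      # None: residue 3 lands on residue 1
```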

9
HP model on lattices: a 2-dimensional example
[Figure legend: hydrophobic acids; hydrophilic acids.]
10
HP model on lattices: a 2-dimensional example
[Figure legend: hydrophobic acids; hydrophilic acids.]
[Figure: a fold with 5 hydrophobic contacts.]
11
Previous work on HP models
  • Most previous work involves complete enumeration
    of self-avoiding random walks on various lattices
    (e.g. Lau and Dill (1989), Irback and Troein
    (2002))
  • Irback and Troein (2002) managed sequences with
    up to 25 amino acids
  • Unger and Moult (1993) - hybrid Genetic Algorithm
    and Simulated Annealing (2-D)
  • Sizes 20 to 64; claimed optimal for sizes 36, 48
    and 60 (optimal?! How do they know?)
  • Shakhnovich et al. (1991) tried SA on 30
    27-acid problems. (Only one found the global
    minimum. Inappropriate local search is to blame.)
  • Backofen (2001): constraint programming approach
  • Tested on problems of size 27 to 36, taking from
    20 min to 1 h 38 min (solved to optimality)
  • IP models proposed recently in Greenberg, Hart
    and Lancia (2002). No numerical results reported
    as yet.
  • (See pages 1-4 of pdf file)
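Complete enumeration, as in the first bullet, can be sketched as a depth-first search over self-avoiding walks. This is a minimal illustration on the 2-D square lattice (the function `best_fold` is hypothetical), not the algorithm of any of the cited papers. Fixing the first two residues removes translated and rotated copies of each fold, a simple form of symmetry breaking. The number of walks grows exponentially with chain length, which is why such searches stop at a few tens of residues.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]
STEPS: List[Point] = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def best_fold(seq: str) -> Tuple[int, Optional[List[Point]]]:
    """Complete enumeration of self-avoiding walks on the 2-D square lattice.

    Returns the maximum number of HH contacts and one fold that attains it.
    Residue 0 is fixed at (0, 0) and residue 1 at (1, 0), which removes
    translated and rotated copies of each fold (mirror images remain).
    """
    n = len(seq)
    if n < 2:
        return 0, [(0, 0)][:n]
    coords: List[Point] = [(0, 0), (1, 0)]
    occupied: Dict[Point, int] = {(0, 0): 0, (1, 0): 1}
    best: Tuple[int, Optional[List[Point]]] = (-1, None)

    def gain(i: int, p: Point) -> int:
        # New HH contacts if residue i is placed at p (chain neighbour i - 1 excluded).
        if seq[i] != "H":
            return 0
        total = 0
        for dx, dy in STEPS:
            j = occupied.get((p[0] + dx, p[1] + dy))
            if j is not None and j < i - 1 and seq[j] == "H":
                total += 1
        return total

    def extend(i: int, score: int) -> None:
        nonlocal best
        if i == n:
            if score > best[0]:
                best = (score, list(coords))
            return
        x, y = coords[-1]
        for dx, dy in STEPS:
            p = (x + dx, y + dy)
            if p in occupied:
                continue  # self-avoidance
            g = gain(i, p)
            occupied[p] = i
            coords.append(p)
            extend(i + 1, score + g)
            coords.pop()
            del occupied[p]

    extend(2, 0)
    return best


print(best_fold("HHPPHH")[0])  # 2
```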

12
Problems with IP models
  • Dealing with symmetry
  • Methods are suggested in Greenberg, Hart and
    Lancia (2002) and in Backofen's PhD thesis.
  • What about other lattices?
  • Number of lattice points unnecessarily large.
  • Lau and Dill (1989) proposed maximal compact
    chain conformations: lattice walks in which every
    point is occupied by exactly one amino acid.
  • E.g. a 3x3x3 cubic lattice for a 27-amino-acid
    chain.
  • Maybe not that tight, but definitely not n^2
    points.
  • Maybe a union of some of those maximal compact
    chain conformations.
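The idea of maximal compact conformations can be illustrated in 2-D: on a 3x3 square grid, these are the walks that occupy all nine points exactly once. The sketch below (hypothetical function `compact_walks`) enumerates them by backtracking. It also accepts a 3-D shape such as `(3, 3, 3)`, but that case is far slower in pure Python. It does not remove symmetric copies, which shows how much the symmetry issue above inflates the search space.

```python
from itertools import product
from typing import Iterator, List, Tuple

Point = Tuple[int, ...]


def compact_walks(shape: Tuple[int, ...]) -> List[List[Point]]:
    """All directed walks that visit every point of a box-shaped lattice exactly once.

    Works for 2-D or 3-D boxes, e.g. (3, 3) or (3, 3, 3); no symmetry is removed.
    """
    points = list(product(*(range(k) for k in shape)))
    total = len(points)
    results: List[List[Point]] = []

    def neighbours(p: Point) -> Iterator[Point]:
        # Lattice neighbours of p that lie inside the box.
        for axis in range(len(shape)):
            for delta in (-1, 1):
                q = list(p)
                q[axis] += delta
                if 0 <= q[axis] < shape[axis]:
                    yield tuple(q)

    def extend(path: List[Point], seen: set) -> None:
        if len(path) == total:
            results.append(list(path))
            return
        for q in neighbours(path[-1]):
            if q not in seen:
                seen.add(q)
                path.append(q)
                extend(path, seen)
                path.pop()
                seen.remove(q)

    for start in points:
        extend([start], {start})
    return results


walks = compact_walks((3, 3))
print(len(walks))  # 40 directed walks (20 if a walk and its reverse are identified)
```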

13
Let's be critical
  • Cubic lattices are probably not good enough, but
    they are a good start anyway.
  • Faulon, Rintoul and Young (2002) tried 2-D
    honeycomb, 2-D square, 3-D diamond and 3-D cubic
    lattices. Agarwala et al. used a triangular
    lattice (constrained self-avoiding walks, no
    optimisation involved).
  • Use an energy matrix rather than a simple unit
    credit for each HH interaction? (Amino acids
    differ in hydrophobicity.)
  • The energy released by putting different pairs of
    H-acids together differs, and depends on how far
    apart they are in the sequence!
  • Dill's HP model is too simplified.
  • Besides, the neighbourhood for interactions
    between H-acids should be defined differently
    from the lattice (domain) neighbourhood.
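The proposed changes (an energy matrix instead of a unit credit, a dependence on sequence separation, and an interaction neighbourhood defined by Euclidean distance rather than lattice adjacency) can be combined in one scoring function. This is a sketch under assumptions: `pair_energy`, `cutoff` and `separation_weight` are hypothetical parameters, and the talk gives no values for them. With a single `("H", "H")` entry of -1, a cut-off of 1 and a constant weight, it reduces to the ordinary HP energy.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Coord = Tuple[float, ...]


def generalized_energy(
    seq: str,
    coords: Sequence[Coord],
    pair_energy: Dict[Tuple[str, str], float],
    cutoff: float,
    separation_weight: Callable[[int], float] = lambda d: 1.0,
) -> float:
    """Sum pairwise contact energies of a fold (illustrative model, not Dill's).

    pair_energy: hypothetical energy per unordered residue-type pair;
                 pairs missing from the table contribute nothing.
    cutoff: residues interact if their Euclidean distance is at most this,
            independently of the lattice neighbourhood.
    separation_weight: hypothetical factor depending on |i - j| in sequence.
    """
    total = 0.0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):  # skip chain neighbours
            if math.dist(coords[i], coords[j]) > cutoff:
                continue
            e = pair_energy.get((seq[i], seq[j]), pair_energy.get((seq[j], seq[i]), 0.0))
            total += e * separation_weight(j - i)
    return total


# A 2x3 fold of HHPPHH on the square lattice.
fold = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
hp_matrix = {("H", "H"): -1.0}
print(generalized_energy("HHPPHH", fold, hp_matrix, cutoff=1.0))  # -2.0 (plain HP energy)
print(generalized_energy("HHPPHH", fold, hp_matrix, cutoff=1.5))  # -4.0 (diagonals count too)
```

Widening the cut-off from 1 to 1.5 also counts the two diagonal H pairs in this fold, which is exactly the gap between the lattice neighbourhood and an interaction neighbourhood based on Euclidean distance.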

14
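Under the old definitions, suppose [the highlighted
residues in the figure] are hydrophobic acids; then
[the configurations shown] are all the same.
15
But surely [some of the configurations in the figure]
look better than [the others].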
16
Research opportunities
  • Exact algorithms
  • Alternative ILP formulations (with tight LP
    relaxation bounds)
  • Distinguish between the lattice neighbourhood and
    the hydrophobic-interaction neighbourhood (use
    Euclidean distance for the latter).
  • Development of solution methodologies
  • Modify Dill's model to make it more realistic
  • Alternative lattices (apply optimisation
    techniques as opposed to complete or simple
    constrained enumeration).
  • More complicated hydrophobicity (Atkins and Hart
    (1999) discussed a fixed energy matrix and proved
    NP-hardness).
  • Previous methods use either constraint programming
    or integer linear programming. Why not a hybrid
    CP and ILP approach?

17
Research opportunities
  • No method so far can handle a sequence with more
    than 100 amino acids
  • Heuristics
  • Meta-heuristics: still room for research; try
    different neighbourhood schemes
  • Tailor-made search techniques that consider
    folding patterns
  • Development of problem-specific or greedy
    heuristics
  • At the very least, these will provide quick
    initial bounds for exact methods.
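A problem-specific greedy heuristic can be very simple. The sketch below (hypothetical `greedy_fold`, 2-D square lattice assumed) grows the chain one residue at a time, always choosing the free neighbouring site that creates the most new HH contacts, with random tie-breaking and several restarts. Any complete fold it returns is feasible, so its contact count is a lower bound on the optimum, which is the kind of quick initial bound mentioned above. It carries no quality guarantee.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[int, int]
STEPS: List[Point] = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def greedy_fold(seq: str, restarts: int = 50, seed: int = 0) -> Tuple[int, Optional[List[Point]]]:
    """Randomised greedy construction for the 2-D HP model (illustrative heuristic).

    Returns (contacts, fold) for the best complete fold found, or (-1, None)
    if every attempt got trapped.
    """
    if not seq:
        return 0, []
    rng = random.Random(seed)
    best: Tuple[int, Optional[List[Point]]] = (-1, None)
    for _ in range(restarts):
        coords: List[Point] = [(0, 0)]
        occupied = {(0, 0): 0}
        score = 0
        for i in range(1, len(seq)):
            x, y = coords[-1]
            options = []
            for dx, dy in STEPS:
                p = (x + dx, y + dy)
                if p in occupied:
                    continue
                gain = 0
                if seq[i] == "H":
                    for ex, ey in STEPS:
                        j = occupied.get((p[0] + ex, p[1] + ey))
                        if j is not None and j < i - 1 and seq[j] == "H":
                            gain += 1
                options.append((gain, rng.random(), p))
            if not options:
                break  # trapped: every neighbouring site is occupied
            gain, _tiebreak, p = max(options)  # most new contacts, random tie-break
            occupied[p] = i
            coords.append(p)
            score += gain
        else:  # runs only if the chain was completed without getting trapped
            if score > best[0]:
                best = (score, coords)
    return best


print(greedy_fold("HPHPPHHPHPPHPHHPPHPH"))
```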

18
Structure recognition
19
  • Sequence alignment
  • Comparing a sequence of amino acids with known
    sequences in the Protein Data Bank (PDB) at the
    primary-structure level.
  • Does this sequence look like that sequence?
  • Methods are well developed, e.g. BLAST.
  • Fold recognition
  • Comparing the structure of an unknown protein
    with known protein structures in the PDB.
  • Contact Map Optimisation (primary-structure
    comparisons)
  • Arthur Lesk's model (secondary-structure
    comparisons)
  • Ip et al.'s model (secondary-structure
    comparisons)
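For comparison at the primary-structure level, the classic exact tool is dynamic-programming alignment; BLAST itself relies on fast heuristics for database search. The sketch below computes a Needleman-Wunsch global alignment score with illustrative scoring (match +1, mismatch -1, gap -1). Real tools use substitution matrices such as BLOSUM62 and affine gap penalties, so these parameters are assumptions for illustration only.

```python
def global_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] = best score for aligning a[:i-1] with b[:j]; starts with all-gap row.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


print(global_alignment_score("MKV", "MV"))  # 1: M/M, K/gap, V/V
```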

20
Contact Map Optimisation
  • Comparing the 3-D structures of two sequences of
    amino acids, e.g. s = (s1, ..., sm) and
    t = (t1, ..., tn). (Assuming you already know what
    each of them looks like, and now want to know how
    much they resemble each other.)
  • Construct an undirected graph for each of s and
    t, with amino acids as vertices.
  • For each sequence, two amino acids that are
    within a certain Euclidean distance from each
    other are connected by an edge.
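The graph construction above takes only a few lines, assuming 3-D coordinates are available for each amino acid (e.g. one representative atom per residue). In this sketch the threshold value and the rule that skips pairs fewer than `min_separation` positions apart in the sequence are assumptions, and `contact_graph` is a hypothetical name.

```python
import math
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contact_graph(coords: Sequence[Coord], threshold: float, min_separation: int = 2) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, between residues within `threshold` of each other.

    Pairs closer than `min_separation` in the sequence (e.g. chain neighbours,
    which are always close) are left out; this is an assumption of the sketch.
    """
    edges = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= threshold:
                edges.add((i, j))
    return edges


# Hypothetical 5-residue chain with unit bond lengths.
chain = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 2, 0)]
print(sorted(contact_graph(chain, 1.0)))  # [(0, 3)]
print(sorted(contact_graph(chain, 1.5)))  # [(0, 2), (0, 3), (1, 3), (2, 4)]
```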

21
Contact Map Optimisation
[Figure: contact graphs of s (vertices s1, s2, ..., sm) and t (vertices t1, t2, ..., tn).]
22
Contact Map Optimisation
One way of mapping. 4 pairs of edges mapped.
23
Contact Map Optimisation
Another way of mapping. 5 pairs of edges mapped.
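The objective illustrated in the last two figures counts the pairs of edges that a mapping matches. A minimal sketch, assuming a mapping is a list of pairs `(i, j)` that preserves sequence order (no crossings): `shared_contacts` scores one mapping and `best_cmo` tries every order-preserving mapping. Both names are hypothetical, and brute force only works for tiny graphs, since there are C(m + n, m) order-preserving partial mappings.

```python
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[int, int]                # contact (i, k) with i < k
Alignment = List[Tuple[int, int]]     # pairs (i, j): residue i of s mapped to residue j of t


def shared_contacts(alignment: Alignment, edges_s: Set[Edge], edges_t: Set[Edge]) -> int:
    """Number of contacts of s whose endpoints are mapped onto a contact of t."""
    to_t = dict(alignment)
    count = 0
    for i, k in edges_s:
        if i in to_t and k in to_t:
            j, l = sorted((to_t[i], to_t[k]))
            if (j, l) in edges_t:
                count += 1
    return count


def best_cmo(m: int, n: int, edges_s: Set[Edge], edges_t: Set[Edge]) -> Tuple[int, Alignment]:
    """Brute-force contact map overlap over all order-preserving mappings."""
    best: Tuple[int, Alignment] = (0, [])
    for size in range(1, min(m, n) + 1):
        for rows in combinations(range(m), size):
            for cols in combinations(range(n), size):
                alignment = list(zip(rows, cols))  # both sorted, so no crossings
                score = shared_contacts(alignment, edges_s, edges_t)
                if score > best[0]:
                    best = (score, alignment)
    return best


edges = {(0, 2), (1, 3)}
print(best_cmo(4, 4, edges, edges)[0])  # 2
```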
24
Wait a minute...
  • Remember from the HP models that amino acids are
    divided into two groups. What is the point of
    mapping a hydrophobic amino acid in one graph to
    a hydrophilic amino acid in the other, or vice
    versa???
  • Adding constraints so that only amino acids of
    the same group can be matched might be helpful!!!
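One way to add the suggested constraint is to discard mixed pairs before any optimisation, which in an ILP amounts to fixing the corresponding alignment variables to zero and so shrinks the model. The sketch below assumes each protein is summarised by an `H`/`P` string; `hp_allowed_pairs` and `is_hp_consistent` are hypothetical helpers.

```python
from typing import Iterable, List, Tuple


def hp_allowed_pairs(classes_s: str, classes_t: str) -> List[Tuple[int, int]]:
    """Candidate alignment pairs (i, j) whose residues are in the same HP class."""
    return [
        (i, j)
        for i, a in enumerate(classes_s)
        for j, b in enumerate(classes_t)
        if a == b
    ]


def is_hp_consistent(alignment: Iterable[Tuple[int, int]], classes_s: str, classes_t: str) -> bool:
    """True if every aligned pair matches H with H and P with P."""
    return all(classes_s[i] == classes_t[j] for i, j in alignment)


pairs = hp_allowed_pairs("HPH", "PHH")
print(len(pairs), "of", 3 * 3)  # 5 of 9 candidate pairs remain
```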

25
Who has done what?
  • No one has noticed the HP issue, so the models
    aren't 100% cool.
  • Lancia et al. (2001): ILP model (see pages 5-6 of
    pdf file)
  • The LP relaxation of the no-crossing constraints
    is typically weak, hence (exponentially many)
    clique constraints are introduced.
  • The problem can be converted to a maximum
    independent set problem, for which (maximal)
    clique inequalities are facet-defining.
  • O(n^2)-time separation for the clique
    inequalities.
  • Root-node LP relaxation: from 1 min to 2 hours for
    62 to 74 acids and 80 to 140 contacts. (The more
    alike the two proteins, the faster the LP
    relaxation can be solved!)
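For reference, a generic ILP of this kind can be written as follows. It is a sketch, not a transcription of the Lancia et al. (2001) model: $x_{ij} = 1$ if residue $i$ of $s$ is mapped to residue $j$ of $t$, $y_{ik,jl} = 1$ if contact $(i,k)$ of $s$ is matched with contact $(j,l)$ of $t$, and $E_s$, $E_t$ are the contact sets with $i < k$ and $j < l$. Two pairs $(i,j)$ and $(k,l)$ conflict if they share a residue or cross, i.e. $(i-k)(j-l) \le 0$; the feasible $x$ are exactly the independent sets of this conflict graph. The pairwise no-crossing constraints are the two-element clique inequalities; the larger cliques $Q$ are what tighten the LP relaxation.

```latex
\begin{aligned}
\max\quad & \sum_{(i,k)\in E_s}\ \sum_{(j,l)\in E_t} y_{ik,jl} \\
\text{s.t.}\quad & y_{ik,jl} \le x_{ij}, \quad y_{ik,jl} \le x_{kl}
  && \forall\, (i,k)\in E_s,\ (j,l)\in E_t, \\
& x_{ij} + x_{kl} \le 1
  && \forall\, (i,j) \ne (k,l) \text{ with } (i-k)(j-l) \le 0, \\
& \textstyle\sum_{(i,j)\in Q} x_{ij} \le 1
  && \forall\, \text{cliques } Q \text{ of the conflict graph}, \\
& x_{ij},\ y_{ik,jl} \in \{0,1\}.
\end{aligned}
```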

26
Who has done what?
  • Heuristic approaches
  • Lancia et al. (2001)
  • Genetic algorithm (GA)
  • Steepest ascent local search
  • Results of Lancia et al.
  • Exact algorithm
  • Gaps from 0 to >5 (mostly >5; exactly how much??)
  • Heuristics
  • Same story as above. GA is much better than LS.
  • Work on similar topics can also be found in Havel
    et al. (1979), Martin et al. (1992) and so on.

27
Let's be critical...
  • Even just the LP relaxation of the IP formulation
    without the no-crossing constraints takes a long
    time to solve when comparing pairs of real protein
    sequences with 100 to 200 amino acids.
  • Comparing two sequences with 120 amino acids
    took more than 10 hours!!!
  • We really should consider the HP issue, and maybe
    even aggregate certain amino acids!

28
Let's be critical...
  • A big problem with the model: a 3-D example

Consider the sequence 1 2 3 4 5 6 7.
[Figure: two different 3-D structures of residues 1 to 7.]
The two different structures give the same
objective value under the ILP formulation of Lancia
et al., assuming acids within Euclidean distance
$3^{1/3}$ of each other are connected by an edge.
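One simple way to see how a contact-based objective can confuse different structures: a structure and its mirror image have identical inter-residue distances, hence identical contact maps. The sketch below uses a hypothetical 7-residue cubic-lattice walk (not the structures in the slide's figure) and the cut-off 3^(1/3) quoted above; `contacts` and `mirror` are illustrative names.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[int, int, int]
CUTOFF = 3 ** (1 / 3)  # cut-off from the slide, about 1.44


def contacts(coords: List[Coord], min_separation: int = 2) -> Set[Tuple[int, int]]:
    """Contact map: residue pairs within CUTOFF, ignoring chain neighbours."""
    return {
        (i, j)
        for i in range(len(coords))
        for j in range(i + min_separation, len(coords))
        if math.dist(coords[i], coords[j]) <= CUTOFF
    }


def mirror(coords: List[Coord]) -> List[Coord]:
    """Reflect a structure through the plane z = 0."""
    return [(x, y, -z) for x, y, z in coords]


# Hypothetical self-avoiding walk on the cubic lattice.
fold = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1), (1, 0, 1)]
print(contacts(fold) == contacts(mirror(fold)))  # True: same contact map, different coordinates
```

For a chiral fold the mirror image is a genuinely different structure, yet any objective built only from the contact map scores the two identically.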
29
Research opportunities
  • Exact methods
  • New ILP formulation.
  • Alternative solution methodologies for solving
    the ILPs - now that we know the ILP models are
    huge and solving them is hard.
  • Heuristics
  • Problem-specific heuristics.
  • Different neighbourhood search schemes for
    meta-heuristics.

30
Arthur Lesk's model
  • Compare structures of two protein sequences by
    inspecting relations between secondary structures

Does the blue protein look like the green protein?
33
[Figure: secondary structures of protein sequence 1 and protein sequence 2.]
34
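Similar to CMO...
[Figure: labelled secondary-structure elements (B, C, D and numbered helix/sheet elements) of the two proteins, matched against each other.]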
35
Useful papers and websites
  • Greenberg, H.J., Hart, W.E., Lancia, G.
    Opportunities for Combinatorial Optimization in
    Computational Biology.
  • http://www.dkfz-heidelberg.de/tbi/bioinfo/ProteinStructure/
  • Lemmen, C., Lengauer, T. Computational methods for
    the structural alignment of molecules. Journal of
    Computer-Aided Molecular Design, 14: 215-232,
    2000.