1
Combinatorial optimisation in protein structure
prediction and recognition: Background, review,
and research direction

Speaker: Vicky Mak

2
What's in this talk?

What is protein structure prediction and
recognition?
Who has done what before?
What's interesting and hasn't been done?
Being critical about others' work is easy.
Doing something brilliant is difficult.
This talk addresses the easy problem.

3
Combining two amino acids
Before
After
4
Protein: polypeptide chain
N-terminal
C-terminal
A polypeptide chain is a chain of amino acids linked
together by peptide bonds. All amino acids share
the same backbone and differ only in their side
chains. There are 20 such amino acids. Different
combinations of these 20 amino acids make
different proteins. A protein sequence can contain
from tens to thousands of amino acids.
5
An example
Primary structure: individual amino acids.
Quaternary structure: green + blue chains.
Secondary structure: α-helix and β-sheet.
The green chain defines a tertiary structure. So
does the blue chain.
6
Motivation

Notice: it is the 3-D structures of proteins
that are important (2 different sequences can
have exactly the same structure!)
We need to know the shape of a protein in order to
develop antibodies that bind that shape: fold
prediction.
Antibodies produced against one protein may also
work for another protein that looks similar:
structure recognition.

7
Structure prediction
8
HP models (ab initio prediction)

Given a sequence of amino acids, determine the
structure from scratch.
Hydrophobic-hydrophilic (HP) model,
proposed by Dill (1985)
Two groups of amino acids:
Hydrophobic acids (H)
Hydrophilic acids (P)
Self-avoiding walks on lattices
Objective: minimise global free energy
Meaning, it's good to put as many hydrophobic
acids as close together as possible.
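
The objective on this slide can be made concrete with a small sketch. It assumes the common simplification that the energy of a fold is minus the number of H-H contacts, where a contact is a pair of hydrophobic acids on adjacent points of a 2-D square lattice that are not neighbours in the chain. The function name hh_contacts and the coordinate representation are illustrative choices, not taken from the talk.

```python
def hh_contacts(sequence, coords):
    """Count H-H contacts of a fold on the 2-D square lattice.

    sequence: string over {"H", "P"}; coords: list of (x, y) lattice points,
    one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("need exactly one lattice point per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("walk is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is seen once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


# Folding HPPH into a unit square brings the two H acids together.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hh_contacts("HPPH", square), hh_contacts("HPPH", line))
```

Under this assumption, minimising free energy is the same as maximising the value returned by hh_contacts.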

9
HP model on lattices: a 2-dimensional example
Hydrophobic acids
Hydrophilic acids
10
HP model on lattices: a 2-dimensional example
Hydrophobic acids
Hydrophilic acids
Fold with 5 hydrophobic contacts
11
Previous work on HP models

Most previous work involves complete enumeration
of self-avoiding random walks on various lattices
(e.g. Lau and Dill (1989), Irbäck and Troein
(2002))
Irbäck and Troein (2002) managed sequences with
up to 25 amino acids
Unger and Moult (1993): hybrid Genetic Algorithm
and Simulated Annealing (2-D),
sizes 20-64. Optimal for sizes 36, 48, 60 (Optimal?! How do
they know?)
Shakhnovich et al. (1991) tried SA on 30 27-acid
problems. (Only 1 found the global minimum.
Inappropriate local search is to blame.)
Backofen (2001): constraint programming approach,
tested on problems of size 27-36, time 20 min to
1 hr 38 min (optimal)
IP models were proposed recently in Greenberg, Hart
and Lancia (2002). No numerical results reported
as yet.
(See pages 1-4 of pdf file)
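
To illustrate what complete enumeration means here, the following is a minimal sketch that tries every self-avoiding walk on the 2-D square lattice and keeps the fold with the most H-H contacts. It is not the procedure of any of the cited papers: the function name best_fold, the unit-credit contact count, and the choice to fix the first step (to remove some symmetric duplicates) are assumptions made for illustration. The running time grows exponentially with chain length, which is consistent with enumeration only reaching short sequences.

```python
def best_fold(sequence):
    """Exhaustively search 2-D square-lattice folds of an HP sequence.

    Returns (max_contacts, coords) where coords lists one (x, y) per residue.
    """
    n = len(sequence)
    if n < 2:
        return 0, [(0, 0)] * n

    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
    best = {"contacts": -1, "coords": None}

    def gained_contacts(i, pos, occupied):
        # H-H contacts created by placing residue i at pos.
        if sequence[i] != "H":
            return 0
        count = 0
        for dx, dy in moves:
            j = occupied.get((pos[0] + dx, pos[1] + dy))
            # Skip the chain predecessor: it is a bond, not a contact.
            if j is not None and j != i - 1 and sequence[j] == "H":
                count += 1
        return count

    def extend(coords, occupied, contacts):
        i = len(coords)
        if i == n:
            if contacts > best["contacts"]:
                best["contacts"] = contacts
                best["coords"] = list(coords)
            return
        x, y = coords[-1]
        for dx, dy in moves:
            pos = (x + dx, y + dy)
            if pos in occupied:
                continue  # keep the walk self-avoiding
            gained = gained_contacts(i, pos, occupied)
            coords.append(pos)
            occupied[pos] = i
            extend(coords, occupied, contacts + gained)
            coords.pop()
            del occupied[pos]

    # Fix the first step to (1, 0); rotations of a fold score the same.
    extend([(0, 0), (1, 0)], {(0, 0): 0, (1, 0): 1}, 0)
    return best["contacts"], best["coords"]


contacts, fold = best_fold("HPPHPPH")
print(contacts, fold)
```

A search like this certifies optimality only because it visits every walk; a heuristic that reports an "optimal" fold without such a certificate is what the question on the Unger and Moult line is getting at.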

12
Problems with IP models

Dealing with symmetry
Methods are suggested in Greenberg, Hart and
Lancia (2002) and in Backofen's PhD thesis.
What about other lattices?
Number of lattice points unnecessarily large.
Lau and Dill (1989) proposed maximal compact
chain conformations: lattice walks in which every
point is occupied by exactly one amino acid.
E.g. a 3x3x3 cubic lattice for a 27-amino-acid
chain
Maybe not that tight, but definitely not n^2.
Maybe a union of some of those maximal compact
chain conformations.

13
Let's be critical

Cubic lattices are probably not good enough. But it's
a good start anyway.
Faulon, Rintoul and Young (2002) tried 2-D
honeycomb, 2-D square, 3-D diamond and 3-D cubic
lattices. Agarwala et al.: triangular lattice
(constrained SAW, no optimisation involved).
Use an energy matrix rather than a simple unit credit
for each HH interaction? (Different
hydrophobicity)
The energy released by putting different pairs of
H-acids together differs, and depends on how far
apart they are in the sequence!
Dill's HP model is too simplified.
Besides, interactions between H-acids should be
defined differently from the domain and
neighbourhood.

14
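Under old definitions, suppose [acids marked in figure] are hydrophobic
acids; then [folds shown in figure]
are all the same.
15
But surely [folds shown in figure]
look better than [folds shown in figure]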
16
Research opportunities

Exact algorithms
Alternative ILP formulations (with tight LP
relaxation bounds)
Distinguish the lattice neighbourhood from the
hydrophobic interaction neighbourhood (use
Euclidean distance for the latter).
Development of solution methodologies
Modify Dill's model to deal with reality
Alternative lattices (apply optimisation
techniques as opposed to complete or simple
constrained enumeration).
More complicated hydrophobicity (Atkins and Hart
(1999) discussed a fixed energy matrix and proved
NP-hardness).
Previous methods use either constraint programming
or integer linear programming. Why not a hybrid
CP and ILP approach?
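
The suggestion to separate the lattice neighbourhood from the hydrophobic interaction neighbourhood can be sketched as follows. The chain still walks along lattice edges, but two H acids are counted as interacting whenever their Euclidean distance is within a cutoff. The function name hh_pairs_within and the cutoff values are hypothetical; the talk does not specify a threshold.

```python
import math


def hh_pairs_within(sequence, coords, cutoff):
    """Count pairs of H residues, not adjacent in the chain, whose
    Euclidean distance is at most cutoff."""
    count = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # i + 2 skips chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                # Small tolerance so that cutoffs such as sqrt(2) behave as expected.
                if math.dist(coords[i], coords[j]) <= cutoff + 1e-9:
                    count += 1
    return count


# An L-shaped fold of HPH: the two H acids sit diagonally across a lattice cell.
fold = [(0, 0), (1, 0), (1, 1)]
print(hh_pairs_within("HPH", fold, 1.0))           # lattice-neighbour cutoff
print(hh_pairs_within("HPH", fold, math.sqrt(2)))  # diagonal pairs also count
```

With a cutoff of 1 this reduces to the usual lattice-contact count, so the lattice definition is a special case of the distance-based one.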

17
Research opportunities

No method so far can manage a sequence with >100
amino acids
Heuristics
Meta-heuristics: still room for research, try
different neighbourhood schemes
Tailor-made search techniques that consider
folding patterns
Development of a problem-specific heuristic or
greedy heuristic
At least that will provide quick initial bounds
for exact methods.

18
Structure recognition
19

Sequence alignment
Comparing a sequence of amino acids with known
sequences in the Protein Data Bank at the primary
structure level.
Does this sequence look like that sequence?
Methods well developed, e.g. BLAST.
Fold recognition
Comparing the structure of an unknown protein
with known protein structures in PDB.
Contact Map Optimisation (primary-structure
comparisons)
Arthur Lesk's model (secondary-structure
comparisons)
Ip et al.'s model (secondary-structure
comparisons)

20
Contact Map Optimisation

Comparing 3-D structures of two sequences of
amino acids, e.g. s = (s1, ..., sm) and t = (t1, ..., tn).
(Assuming you already know what each of them looks
like, and you now want to know how much they
resemble each other.)
Construct an undirected graph for each of s and
t, with amino acids as vertices.
For each sequence, two amino acids that are
within a certain Euclidean distance of each
other are connected by an edge.
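
The graph construction described on this slide can be sketched in a few lines. The function name contact_map, the representation of a structure as a list of 3-D coordinates (one per amino acid), and the threshold value are assumptions for illustration; the talk does not fix a distance.

```python
import math
from itertools import combinations


def contact_map(coords, threshold):
    """Return the edge set {(i, j) : i < j} of residues whose Euclidean
    distance is at most threshold."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= threshold
    }


# Four residues on the corners of a unit square in the plane z = 0.
structure = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(sorted(contact_map(structure, 1.0)))
```

As written, residues that are consecutive in the chain also receive an edge whenever they are within the threshold, which follows the wording of the slide literally.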

21
Contact Map Optimisation
[Figure: the graphs of s (vertices s1, s2, ..., sm) and t (vertices t1, t2, ..., tn)]
22
Contact Map Optimisation
One way of mapping. 4 pairs of edges mapped.
23
Contact Map Optimisation
Another way of mapping. 5 pairs of edges mapped.
24
Wait a minute...

Remember from the HP models, amino acids are
divided into two groups. What is the point of
mapping a hydrophobic amino acid in one graph to
a hydrophilic amino acid in another or vice
versa???
Adding constraints that only amino acids of the
same group are supposed to be matched might be
helpful!!!
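
The mapping objective and the proposed same-group constraint can be sketched together. The code below scores an order-preserving (non-crossing) mapping by the number of edge pairs it maps onto each other, and finds the best mapping by brute force, optionally allowing only H-to-H and P-to-P matches. The names shared_contacts and best_alignment, the H/P strings, and the same_group_only flag are hypothetical; this is not the ILP of Lancia et al., and the exhaustive search is usable only on toy instances.

```python
from itertools import combinations


def shared_contacts(edges_s, edges_t, mapping):
    """Count edges (i, k) of s whose images under mapping form an edge of t.

    Edges are tuples (i, k) with i < k; mapping is a dict from residues of s
    to residues of t that preserves order.
    """
    count = 0
    for i, k in edges_s:
        if i in mapping and k in mapping:
            a, b = mapping[i], mapping[k]
            if (min(a, b), max(a, b)) in edges_t:
                count += 1
    return count


def best_alignment(seq_s, seq_t, edges_s, edges_t, same_group_only=False):
    """Brute-force the best non-crossing mapping between two contact maps."""
    best_score, best_mapping = 0, {}
    for size in range(1, min(len(seq_s), len(seq_t)) + 1):
        for rows in combinations(range(len(seq_s)), size):
            for cols in combinations(range(len(seq_t)), size):
                # Pairing the sorted picks in order keeps the mapping non-crossing.
                if same_group_only and any(
                    seq_s[i] != seq_t[j] for i, j in zip(rows, cols)
                ):
                    continue
                mapping = dict(zip(rows, cols))
                score = shared_contacts(edges_s, edges_t, mapping)
                if score > best_score:
                    best_score, best_mapping = score, mapping
    return best_score, best_mapping


# Same contact map, but the H/P pattern differs between the two sequences.
print(best_alignment("HPH", "PHH", {(0, 2)}, {(0, 2)}))
print(best_alignment("HPH", "PHH", {(0, 2)}, {(0, 2)}, same_group_only=True))
```

In this toy case the unconstrained search maps the single edge onto its counterpart, while the constrained search cannot, because doing so would match a hydrophobic acid to a hydrophilic one. The constraint also shrinks the set of mappings that has to be examined.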

25
Who has done what?

No one noticed the HP issue, so models aren't 100%
cool.
Lancia et al. (2001): ILP model (see pages 5-6 of
pdf file)
LP relaxation of no-crossing constraints is
typically weak, hence clique constraints
(exponentially many) are introduced.
Problem can be converted to a maximum independent
set problem, for which clique inequalities are
facet-defining.
O(n^2) time separation for cliques.
Root-node LP relaxation: from 1 min to 2 hours for
62-74 acids and 80-140 contacts. (The more alike
the two proteins, the faster the LP relaxation can
be solved!)

26
Who has done what?

Heuristic approaches
Lancia et al. (2001)
Genetic algorithm (GA)
Steepest ascent local search
Results of Lancia et al.
Exact algorithm
Gaps: 0 to >5% (mostly >5%; exactly how much??)
Heuristics
Same story as above. GA much better than LS.
Work on similar topics can also be found in Havel
et al. (1979), Martin et al. (1992) and so on.

27
Let's be critical...

Even just the LP relaxation of the IP formulation
without no-crossing constraints takes a long time
to solve when comparing pairs of real protein
sequences with 100-200 amino acids.
Tried comparing two sequences with 120 amino
acids; it took more than 10 hours!!!
We really should consider the HP issue, and maybe
even aggregate certain amino acids!

28
Let's be critical...

A big problem with the model: a 3-D example

Consider the following sequence
1 2 3 4 5 6 7
[Figure: two different 3-D structures of this sequence, each with vertices labelled 1 to 7]
Two different structures giving the same
objective value under the ILP formulation of Lancia
et al., assuming acids within a Euclidean distance
of 3^(1/3) are connected by an edge.
29
Research opportunities

Exact methods
New ILP formulation.
Alternative solution methodologies for solving
the ILPs - now that we know the ILP models are
huge and solving them is hard.
Heuristics
Problem specific heuristic.
Different neighbourhood search for
meta-heuristics.

30
Arthur Lesk's model

Compare structures of two protein sequences by
inspecting relations between secondary structures

Does the blue protein look like the green protein?
33
Protein sequence 1
Protein sequence 2
34
Similar to CMO...
[Figure: matching between labelled secondary-structure elements of the two proteins; labels garbled in the transcript]
35
Useful papers and websites

Greenberg, H.J., Hart, W.E., Lancia, G.
Opportunities for Combinatorial Optimization in
Computational Biology.
http://www.dkfz-heidelberg.de/tbi/bioinfo/ProteinStructure/
Christian Lemmen and Thomas Lengauer.
Computational methods for the structural
alignment of molecules. Journal of
Computer-Aided Molecular Design, 14: 215-232,
2000.