Peter Smooker, Heiko Schröder, Margaret Hamilton, Aditya, Mannan, Sundara, Saravanan, Rajalingam Aravinthan

Slides: 39

Provided by: goannaCs

1
A new approach to protein structure prediction

Peter Smooker, Heiko Schröder, Margaret Hamilton,
Aditya, Mannan, Sundara, Saravanan, Rajalingam
Aravinthan, Gad Abraham, Abdullah Al Amin,
Nalinda, Prashant

What's on today?
Predicting protein structures
Fast implementation
Special purpose HPC
Searching for structural similarity
Visualisation of proteins
Lots of speculation, some results!

Aim: prediction of protein structures
Common methods
Homology modelling: > 30% match → similar fold
Molecular modelling: only for small molecules
Crystallography: very expensive, very slow and not
always possible.
Only a few structures are known, and we are falling
behind (< 1%).
Major efforts are being made, e.g. IBM's Blue Gene
(fastest supercomputer)
Linear time method?

4
Motivation

Genetic sequence databases are growing
exponentially (maybe not?)
This growth will continue, since multiple
concurrent genome projects have begun, with more
to come

5
Full Genome Comparison

Related organisms, but tuberculosis causes a
disease → find the common and the different parts
16 × 10⁶ pairwise sequence comparisons
More clever ways? I guess!
Many Genome-Genome Comparisons will be required
in the near future
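The 16 × 10⁶ figure is what an all-against-all comparison of two genomes gives when every gene of one is compared with every gene of the other and each genome has about 4000 genes. That gene count is our assumption, chosen to match the slide; it is not stated in the talk. A minimal sketch of the count:

```python
def all_pairs_comparisons(genes_a: int, genes_b: int) -> int:
    """Number of pairwise sequence comparisons when every gene of genome A
    is compared with every gene of genome B."""
    return genes_a * genes_b


# Assumed gene counts (hypothetical, roughly 4000 genes per genome).
comparisons = all_pairs_comparisons(4000, 4000)
print(f"{comparisons:,} pairwise comparisons")
```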

6
Homology Modeling

Discovered sequences are analyzed by comparison
with databases
The complexity of sequence comparison is proportional
to the product of query size and database size
→ analysis is too slow on sequential computers
Two possible approaches
Heuristics, e.g. BLAST, FastA; but the more
efficient the heuristic, the worse the quality
of the results
Parallel processing: get high-quality results in
reasonable time
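The cost statement can be made concrete: a full Smith-Waterman search fills one dynamic-programming cell per pair of query and database residues, so runtime scales with their product. The query length, database size and cell-update rates below are illustrative assumptions, not measurements.

```python
def dp_cells(query_length: int, database_residues: int) -> int:
    """Cells filled by a full Smith-Waterman search: query size x database size."""
    return query_length * database_residues


def runtime_seconds(cells: int, cell_updates_per_second: float) -> float:
    """Idealised runtime, ignoring I/O and traceback."""
    return cells / cell_updates_per_second


cells = dp_cells(300, 10**8)              # hypothetical 300-residue query, 10^8-residue database
sequential = runtime_seconds(cells, 1e7)  # hypothetical sequential update rate
parallel = runtime_seconds(cells, 1e10)   # hypothetical rate of a parallel machine
print(cells, sequential, parallel)
```

Heuristics such as BLAST and FastA reduce the number of cells examined; parallel hardware raises the update rate instead.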

7
Protein Sequence Alignment

BLAST, FastA, Smith-Waterman

[Figure: Smith-Waterman, FastA and BLAST compared]
8
Smith-Waterman Algorithm
Align S1 = ATCTCGTATGATG, S2 = GTCTATCAC
[Figure: Smith-Waterman dynamic-programming matrix for S1 and S2; both scoring parameters set to 1]
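A minimal Python sketch of the Smith-Waterman recurrence for S1 and S2. It assumes a match score of +2 and reads the slide's two parameters (both 1) as a mismatch penalty of 1 and a linear gap penalty of 1; with these values the best local score is 10, which agrees with the highest entry among the matrix values on the slide. The traceback tie-breaking (diagonal first, then a gap in S2) is our own choice.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_s1, aligned_s2); '-' marks a gap.
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(s1) + 1, len(s2) + 1
    h = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]),  # (mis)match
                          h[i - 1][j] + gap,                            # gap in s2
                          h[i][j - 1] + gap)                            # gap in s1
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]):
            out1.append(s1[i - 1])
            out2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out1.append(s1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))


print(smith_waterman("ATCTCGTATGATG", "GTCTATCAC"))
```

For S1 and S2 this returns (10, 'TCGTATGA', 'TC-TATCA'): six matches (+12), one gap and one mismatch (−1 each).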
9
10
Context sensitivity!
11

Protein folding
Our approach
Linear method: we do not compute electromagnetic
fields; nature has done it for us!
Physical forces have short range (decreasing
quadratically with distance)
→ context sensitivity: find the same protein with
the same context in the database and copy that
structure.
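The context-sensitivity idea can be sketched as a lookup table: index every k-residue window of known structures by its amino-acid string and, for a query, copy the (φ, ψ) of the central residue from every matching window. The toy database, its angles and the window length k = 3 are hypothetical; V and A are the one-letter codes for valine and alanine, as in the val-val-ala example later in the talk.

```python
from collections import defaultdict


def build_context_index(database, k=3):
    """Map every length-k window of each known structure to the (phi, psi)
    of the window's central residue.

    `database` is a list of (sequence, angles) pairs with angles[i] = (phi, psi).
    """
    index = defaultdict(list)
    half = k // 2
    for sequence, angles in database:
        for start in range(len(sequence) - k + 1):
            index[sequence[start:start + k]].append(angles[start + half])
    return index


def predict_angles(query, index, k=3):
    """For each query residue, list every (phi, psi) copied from a database window
    with the same k-residue context; None where that context was never seen."""
    half = k // 2
    prediction = [None] * len(query)
    for start in range(len(query) - k + 1):
        hits = index.get(query[start:start + k])  # .get does not insert missing keys
        if hits:
            prediction[start + half] = list(hits)
    return prediction


# Hypothetical toy database: one fragment with made-up (phi, psi) values in degrees.
toy_db = [("VVAG", [(-120.0, 130.0), (-125.0, 135.0), (-60.0, -45.0), (80.0, 10.0)])]
index = build_context_index(toy_db)
print(predict_angles("AVVAK", index))
```

When several database windows share a context, the list holds several candidate angle pairs; these are the "peaks" the later slides try to reduce.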

12
Dihedral Angles

The 6 atoms in each peptide unit lie in the same
plane; φ and ψ are free to rotate

The structure of a protein is almost totally
determined if all angles φ and ψ are known
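Given atom coordinates, each torsion angle is the signed angle between the planes (p0, p1, p2) and (p1, p2, p3): φ uses the atoms C(i−1), N, Cα, C and ψ uses N, Cα, C, N(i+1). A minimal pure-Python sketch using the common atan2 formulation (positive means clockwise when viewed along p1 → p2); the helper names are ours.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


# Bond p1-p2 along z; p3 is rotated 60 degrees away from p0 around that bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0.5, math.sqrt(3) / 2, 1)))
```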

13
Abdullah Al Amin
[Figure: φ and ψ]
14
Abdullah Al Amin
[Figure: φ]
15
Abdullah Al Amin
[Figure: ψ]
16
Abdullah Al Amin
17
Which φ?
Abdullah Al Amin
Σ val-val-xxx
18
Abdullah Al Amin
19
Abdullah Al Amin
val-val-ala
φ, ψ → same AA; φ, ψ → neighbour
20
Abdullah Al Amin
21
Complexity
Reducing the size of the search space
Reducing the number of peaks
$2^X$: size of the search space
$2^{X-Y}$: assuming we have predicted $Y$ angles with high confidence
Our aim: large $Y$ ($Y = X$ is not possible)
Method: increase the context
Problem: the longer the context → the fewer matches
Example: there are $20^k$ different sequences of length $k$, so $E_k \approx |\mathrm{PDB}|/20^k$.
$k = 3$: $E_3 \approx 1000$; $k = 5$: $E_5 \approx 3$; $k = 9$: $E_9 \approx 1/50000$.
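These estimates follow from a uniform-composition assumption: with $20^k$ equally likely sequences of length $k$, a database of $N$ residues contains about $N/20^k$ copies of any one of them. The sketch takes $N = 10^7$ residues (our assumption; it roughly reproduces $E_3 \approx 1000$, $E_5 \approx 3$ and $E_9 \approx 1/50000$) and two peaks per angle (our reading of $2^X$).

```python
def expected_occurrences(database_residues: float, k: int, alphabet: int = 20) -> float:
    """E_k: expected number of database hits for one specific length-k sequence,
    assuming all alphabet**k sequences are equally likely."""
    return database_residues / alphabet ** k


def search_space(total_angles: int, confident_angles: int, peaks: int = 2) -> int:
    """Angle combinations left when every angle not predicted with high
    confidence can still take one of `peaks` values."""
    return peaks ** (total_angles - confident_angles)


PDB_RESIDUES = 1e7  # assumed database size, chosen to match the slide's numbers
for k in (3, 5, 9):
    print(k, expected_occurrences(PDB_RESIDUES, k))
print(search_space(10, 4))  # hypothetical: 10 angles, 4 predicted with high confidence
```

This gives 1250, 3.125 and about 1.95e-05 (roughly 1/51200) for k = 3, 5 and 9, which shows how quickly longer contexts run out of matches.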
22
Which context?

I I O ALA LYS SER O O I (E ≈ 20)
→ reduce number of peaks
Different lists for different groups of proteins
(inside cells, outside cells)? (Saravanan)
→ reduce number of peaks
Short and perfect → longer and less perfect?
(Rajalingam Aravinthan, Gad Abraham)
→ reduce number of peaks
Reduce the size of the search space!
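The E ≈ 20 beside the I/O context can be checked the same way. Three exact residues give $20^3$ choices and six flanking residues reduced to I or O give $2^6$, so $E = N/(20^3 \cdot 2^6) \approx 19.5$ for an assumed $N = 10^7$ residues. We read I as "inside" and O as "outside", and the hydrophobic residue set that decides between them is a hypothetical stand-in, since the slides do not say how I and O are assigned.

```python
# Hypothetical classification: residues in this set count as "I" (assumed buried),
# all others as "O" (assumed exposed).
HYDROPHOBIC = set("AVLIMFWC")


def io_class(residue: str) -> str:
    return "I" if residue in HYDROPHOBIC else "O"


def context_key(sequence: str, centre: int, core: int = 3, flank: int = 3) -> str:
    """Exact residues for a `core` window around `centre`; I/O classes for
    `flank` residues on each side."""
    half = core // 2
    start, end = centre - half, centre - half + core
    if start - flank < 0 or end + flank > len(sequence):
        raise ValueError("context window extends past the end of the sequence")
    left = "".join(io_class(r) for r in sequence[start - flank:start])
    right = "".join(io_class(r) for r in sequence[end:end + flank])
    return f"{left}|{sequence[start:end]}|{right}"


def expected_matches(database_residues: float, core: int = 3, flank: int = 3) -> float:
    """20 choices per exact residue, 2 (I or O) per flanking residue."""
    return database_residues / (20 ** core * 2 ** (2 * flank))


print(context_key("LVKAKSDEF", 4))  # made-up sequence
print(expected_matches(1e7))
```

The made-up sequence LVKAKSDEF yields the key IIO|AKS|OOI, the slide's I I O ALA LYS SER O O I pattern.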

23
Rajalingam Aravinthan, Gad Abraham
24
Prediction based on length 3
25
Abdullah Al Amin
[Figure: φ and ψ]
26
Why 9?
27
Suffix trie and suffix tree: fast search!
Suffix trie for abcacbcabacb (all suffixes up to
length 4).
Find all strings that are similar to aacb
(tolerance 1).
Breadth-first search!
Prashant
[Figure: suffix trie of abcacbcabacb up to depth 4, searched for aacb with tolerance 1]
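The trie search can be sketched in Python: build a depth-limited suffix trie from nested dictionaries, then explore it breadth-first, counting mismatches against the pattern and pruning any branch whose count already exceeds the tolerance. This is a plain trie rather than a compressed suffix tree, "similar" is taken to mean Hamming distance, and the function names are ours.

```python
from collections import deque


def build_suffix_trie(text, max_depth):
    """Nested-dict trie of all suffixes of `text`, truncated to `max_depth`."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:start + max_depth]:
            node = node.setdefault(ch, {})
    return root


def approximate_search(trie, pattern, tolerance):
    """Breadth-first search for trie paths of len(pattern) with at most
    `tolerance` mismatches; returns (string, mismatches) pairs."""
    hits = []
    queue = deque([(trie, "", 0)])  # (node, prefix spelled so far, mismatches so far)
    while queue:
        node, prefix, mismatches = queue.popleft()
        depth = len(prefix)
        if depth == len(pattern):
            hits.append((prefix, mismatches))
            continue
        for ch, child in sorted(node.items()):
            cost = mismatches + (ch != pattern[depth])
            if cost <= tolerance:  # prune branches that already exceed the tolerance
                queue.append((child, prefix + ch, cost))
    return hits


trie = build_suffix_trie("abcacbcabacb", 4)
print(approximate_search(trie, "aacb", 1))
```

For the slide's example this finds bacb and cacb, each with one mismatch.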
28
Parallel Architectures for Bioinformatics

Embedded Massively Parallel Accelerators

29
Parallel Architectures for Bioinformatics

Supercomputer performance at low cost
Combines the SIMD and MIMD paradigms within one parallel
architecture → hybrid computer

30
Speculation: finding similar structures based on
sequences of φs and ψs
We could search for a structure that has a high degree of similarity
with a predicted structure (instead of similarity
of the sequence, particularly in hydrophobic
parts).
Modify Smith-Waterman:
What should be the penalty for gaps (do gaps make any sense)?
How do we treat confidence information?
31
Smith-Waterman Algorithm
Align S1 = ATCTCGTATGATG, S2 = GTCTATCAC
[Figure: Smith-Waterman matrix; both scoring parameters set to 1]
32
Nalinda
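Score for two angles $a_i$, $a_j$ as a function of their difference in degrees:
$\text{Score}(a_i, a_j) = \dfrac{1500}{\left(|a_i - a_j| \times 0.9\right)^2} - 10$
[Figure: score plotted against the difference in degrees; values 50 and −10 marked]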
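A sketch of the modified Smith-Waterman raised on the Speculation slide, scoring pairs of angles with $1500/(|a_i - a_j| \times 0.9)^2 - 10$. Several choices are our assumptions: the score is capped at 50 (the formula diverges at zero difference, and 50 appears on the slide), angle differences wrap around 360°, the gap penalty is a placeholder of −10, and confidence information is not used.

```python
def angle_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (wraps at 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def angle_score(a: float, b: float, cap: float = 50.0) -> float:
    """1500 / (diff * 0.9)**2 - 10, capped because it diverges at diff = 0."""
    diff = angle_difference(a, b)
    if diff == 0:
        return cap
    return min(cap, 1500.0 / (diff * 0.9) ** 2 - 10.0)


def local_angle_alignment(xs, ys, gap=-10.0):
    """Smith-Waterman recurrence over two angle sequences; returns the best local score."""
    h = [[0.0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    best = 0.0
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            h[i][j] = max(0.0,
                          h[i - 1][j - 1] + angle_score(xs[i - 1], ys[j - 1]),
                          h[i - 1][j] + gap,   # skip an angle of xs
                          h[i][j - 1] + gap)   # skip an angle of ys
            best = max(best, h[i][j])
    return best


print(local_angle_alignment([-60.0, -45.0, -60.0], [-60.0, -45.0, -60.0]))
```

Identical angles score the capped maximum, while a 180° difference scores close to −10, so a local alignment only extends through stretches of similar backbone geometry.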
33
Nalinda
34
Look ahead
35

Visualisation tool
Sequence of dihedral angles
Structure of protein
Visualise structure
Indicate confidence
Translate a change of a dihedral angle into a change
of the 3D structure
Emphasise physical collisions
Show positions for potential S-S bonds and
hydrogen bonds
Show fields?
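One standard way to turn torsion angles into coordinates is the NeRF (Natural Extension Reference Frame) construction: each new atom is placed from the previous three atoms plus a bond length, a bond angle and a torsion. The sketch simplifies heavily: it builds a chain of identical atoms with a hypothetical bond length of 1.5 and bond angle of 110°, not a real N, Cα, C backbone with its own lengths, angles and ω. Changing one torsion moves every atom after it, which is what a visualisation tool would animate.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """NeRF step: return d with |d - c| = bond_length, angle b-c-d = bond_angle
    and dihedral a-b-c-d = torsion (angles in degrees)."""
    theta, phi = math.radians(bond_angle), math.radians(torsion)
    bc = _normalize(_sub(c, b))
    n = _normalize(_cross(_sub(b, a), bc))  # normal of the plane through a, b, c
    m = _cross(n, bc)                       # in-plane axis perpendicular to bc
    local = (-bond_length * math.cos(theta),
             bond_length * math.sin(theta) * math.cos(phi),
             bond_length * math.sin(theta) * math.sin(phi))
    return tuple(c[k] + local[0] * bc[k] + local[1] * m[k] + local[2] * n[k]
                 for k in range(3))


def build_chain(torsions, bond_length=1.5, bond_angle=110.0):
    """Hypothetical uniform chain: three seed atoms, then one new atom per torsion."""
    theta = math.radians(bond_angle)
    chain = [(0.0, 0.0, 0.0),
             (bond_length, 0.0, 0.0),
             (bond_length - bond_length * math.cos(theta), bond_length * math.sin(theta), 0.0)]
    for torsion in torsions:
        chain.append(place_atom(chain[-3], chain[-2], chain[-1],
                                bond_length, bond_angle, torsion))
    return chain


chain = build_chain([60.0, -60.0, 180.0, -120.0])
for atom in chain:
    print(tuple(round(x, 3) for x in atom))
```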

Speculation
Simulation of the folding process
Predict the structure of the following
hydrophobic subsequence (it needs to be tested
whether hydrophobicity is highly correlated with
being inside a protein).
Mark all positions of cysteines
Mark all positions of potential hydrogen bonds
Simulate the bending process
Look for structures that are similar up to here
Compare structures of identical O/I sequences
Compare surfaces (cut the protein at a hydrophilic
position and look at the set of exposed
hydrophobic amino acids)
Develop an algorithm to determine structural
similarity, based either on dihedral angles or on
Euclidean positions, using dynamic programming.
With such an algorithm, similar surroundings can
be found.
Do new parts deform old parts significantly?