Peter Smooker, Heiko Schröder, Margaret Hamilton, Aditya, Mannan, Sundara, Saravanan, Rajalingam Arav

Provided by: goannaCs
1
A new approach to protein structure prediction
  • Peter Smooker, Heiko Schröder, Margaret Hamilton,
    Aditya, Mannan, Sundara, Saravanan, Rajalingam
    Aravinthan, Gad Abraham, Abdullah Al Amin,
    Nalinda, Prashant

2
  • What's on today?
  • Predicting protein structures
  • Fast implementation
  • Special purpose HPC
  • Searching for structural similarity
  • Visualisation of proteins
  • Lots of speculation, some results!

3
  • Aim: prediction of protein structures
  • Common methods:
  • Homology modelling: > 30% match → similar fold
  • Molecular modelling: only for small molecules
  • Crystallography: very expensive, very slow and not
    always possible.
  • Only a few structures are known and we are falling
    behind (< 1%).
  • Major efforts are being made, e.g. Blue Gene
    (fastest supercomputer, IBM)
  • Linear time method?

4
Motivation
  • Genetic sequence databases are growing
    exponentially (maybe not?)
  • Growth rate will continue, since multiple
    concurrent genome projects have begun, with more
    to come

5
Full Genome Comparison
  • Related organisms, but tuberculosis causes a
    disease → find common and different parts
  • 16 × 10^6 pairwise sequence comparisons
  • More clever ways? I guess!
  • Many Genome-Genome Comparisons will be required
    in the near future

6
Homology Modeling
  • Discovered sequences are analyzed by comparison
    with databases
  • Complexity of sequence comparison is proportional
    to the product of query size times database size
  • → Analysis too slow on sequential computers
  • Two possible approaches:
  • Heuristics, e.g. BLAST, FastA; but the more
    efficient the heuristics, the worse the quality
    of the results
  • Parallel Processing, get high-quality results in
    reasonable time

7
Protein Sequence Alignment
  • BLAST, FastA, Smith-Waterman

[Figure: Smith-Waterman, FastA and BLAST compared; label "TO(S)"]
8
Smith-Waterman Algorithm
Align S1 = ATCTCGTATGATG, S2 = GTCTATCAC
[Figure: Smith-Waterman score matrix for S1 and S2, with two parameters each set to 1; cell values range from 0 to 10]
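The alignment on this slide can be reproduced with a short dynamic-programming sketch. The slide only states that two parameters are set to 1, so the scoring used here (match +2, mismatch -1, linear gap penalty 1) is an assumption for illustration, and the function name is hypothetical.

```python
def smith_waterman_score(s1, s2, match=2, mismatch=-1, gap=1):
    """Return (best local-alignment score, score matrix H).

    Scoring values are assumed for illustration; they are not taken
    from the slide.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    # H[i][j] = best score of a local alignment ending at s1[i-1], s2[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align the two characters
                H[i - 1][j] - gap,      # gap in s2
                H[i][j - 1] - gap,      # gap in s1
            )
            best = max(best, H[i][j])
    return best, H


score, matrix = smith_waterman_score("ATCTCGTATGATG", "GTCTATCAC")
print(score)
```

The double loop visits every cell once, which is why the running time grows with the product of the two sequence lengths.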
9
(No Transcript)
10
Context sensitivity!
11
  • Protein folding
  • Our approach
  • Linear method: we do not compute electromagnetic
    fields; nature has done it for us!
  • Physical forces have short range (decreasing
    quadratically with distance)
  • → Context sensitivity: find the same protein with
    the same context in the database and copy that
    structure.

12
Dihedral Angles
  • The 6 atoms in each peptide unit lie in the same
    plane
  • φ and ψ are free to rotate
  • The structure of a protein is almost totally
    determined if all angles φ and ψ are known
13
Abdullah Al Amin
[Figure: plot with axes ψ and φ]
14
Abdullah Al Amin
[Figure: axis label φ]
15
Abdullah Al Amin
[Figure: axis label ψ]
16
Abdullah Al Amin
17
Which φ?
Abdullah Al Amin
S val-val-xxx
18
Abdullah Al Amin
19
Abdullah Al Amin
val-val-ala
φ, ψ: same AA; φ, ψ: neighbour
20
Abdullah Al Amin
21
Complexity
  • Reducing the size of the search space
  • Reducing the number of peaks
  • 2^X: size of the search space
  • 2^(X-Y), assuming we have predicted Y angles with
    high confidence
  • Our aim: large Y (Y = X is not possible)
  • Method: increase the context
  • Problem: the longer the context, the fewer the
    matches
  • Example: 20^k different sequences of length k.
    E_k = PDB / 20^k.
    k = 3: E_3 ≈ 1000. k = 5: E_5 ≈ 3.
    k = 9: E_9 ≈ 1/50000.
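The expected number of database matches for a context of length k can be checked with a few lines of arithmetic. The database size of 8 million residues is an assumption, back-calculated from the slide's figure for k = 3; the function name is hypothetical.

```python
def expected_matches(k, db_residues=8_000_000, alphabet=20):
    """Expected occurrences of one specific length-k sequence.

    db_residues is an assumed database size chosen so that k = 3
    gives about 1000 matches.
    """
    return db_residues / alphabet ** k


for k in (3, 5, 9):
    print(k, expected_matches(k))
```

Each extra residue of context divides the expected number of matches by 20, which is why long contexts quickly run out of examples.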
22
Which context??
  • I I O ALA LYS SER O O I (E ≈ 20)
  • → reduce number of peaks
  • Different lists for different groups of proteins?
  • (inside cells, outside cells), Saravanan
  • → reduce number of peaks
  • Short and perfect → longer and less perfect?
  • Rajalingam Aravinthan, Gad Abraham
  • → reduce number of peaks
  • Reduce the size of the search space!
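A context lookup of this kind can be sketched as an index from residue windows to the angle pairs observed at the window's centre. The data below is made up for illustration and is not taken from any database; the function name and the choice of an odd window length are assumptions.

```python
from collections import defaultdict


def build_context_index(chains, k=3):
    """Map each length-k residue window (k odd) to the (phi, psi)
    pairs observed at its central residue."""
    index = defaultdict(list)
    half = k // 2
    for residues, angles in chains:
        for i in range(half, len(residues) - half):
            context = tuple(residues[i - half:i + half + 1])
            index[context].append(angles[i])
    return index


# Made-up toy chains: (residue names, (phi, psi) per residue).
toy_chains = [
    (["VAL", "VAL", "ALA", "LYS"],
     [(-120, 130), (-118, 125), (-60, -45), (-65, -40)]),
    (["SER", "VAL", "VAL", "ALA"],
     [(-70, 150), (-122, 128), (-115, 135), (-62, -43)]),
]

index = build_context_index(toy_chains, k=3)
print(index[("VAL", "VAL", "ALA")])
```

Inside/outside labels could be appended to the window tuple to form a richer context. A context whose observations fall into a single tight cluster would then count as a confident prediction.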

23
Rajalingam Aravinthan, Gad Abraham
24
Prediction based on length 3
25
Abdullah Al Amin
[Figure: plot with axes ψ and φ]
26
Why 9?
27
Suffix trie and suffix tree: fast search!
Suffix trie for abcacbcabacb (all suffixes up to
length 4).
Find all strings that are similar to aacb
(tolerance 1).
Breadth-first search!
Prashant
[Figure: suffix trie with edges labelled a, b and c and node values 0 and 1]
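The search on this slide can be sketched as a depth-limited suffix trie plus a breadth-first traversal that carries a mismatch count. Interpreting "similar with tolerance 1" as at most one substituted character (Hamming distance) is an assumption, and the function names are hypothetical.

```python
from collections import deque


def build_suffix_trie(text, max_depth):
    """Nested-dict trie of every suffix of text, cut off at max_depth."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:start + max_depth]:
            node = node.setdefault(ch, {})
    return root


def search_with_tolerance(trie, query, tolerance):
    """Breadth-first search for trie paths of len(query) characters
    that differ from query in at most `tolerance` positions."""
    matches = []
    queue = deque([(trie, "", 0)])  # (node, path so far, mismatches)
    while queue:
        node, path, mismatches = queue.popleft()
        depth = len(path)
        if depth == len(query):
            matches.append(path)
            continue
        for ch, child in node.items():
            cost = mismatches + (ch != query[depth])
            if cost <= tolerance:  # prune branches over the tolerance
                queue.append((child, path + ch, cost))
    return sorted(matches)


trie = build_suffix_trie("abcacbcabacb", 4)
print(search_with_tolerance(trie, "aacb", 1))  # ['bacb', 'cacb']
```

Branches are abandoned as soon as their mismatch count exceeds the tolerance, so only a small part of the trie is visited.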
28
Parallel Architectures for Bioinformatics
  • Embedded Massively Parallel Accelerators

29
Parallel Architectures for Bioinformatics
  • Supercomputer performance at low cost
  • Combines the SIMD and MIMD paradigms within one
    parallel architecture → Hybrid Computer

30
Speculation
  • Finding similar structures based on sequences of
    φs and ψs.
  • We could search for a structure that has a high
    degree of similarity with a predicted structure
    (instead of similarity of the sequence,
    particularly in hydrophobic parts).
  • Modify Smith-Waterman: what should be the penalty
    for gaps (do gaps make any sense?), and how do we
    treat confidence information?
31
Smith-Waterman Algorithm
Align S1 = ATCTCGTATGATG, S2 = GTCTATCAC
[Figure: Smith-Waterman score matrix, with two parameters each set to 1]
32
Nalinda
Score = 1500 / ((a_i − a_j) × 0.9)^2 − 10
[Figure: score plotted against degrees difference; axis marks 50 and −10]
33
Nalinda
34
Look ahead
35
  • Visualisation tool
  • Sequence of dihedral angles
  • Structure of protein
  • Visualise structure
  • Indicate confidence
  • Translate change of dihedral angle into change
    of 3D structure
  • Emphasise physical collisions
  • Show positions for potential S-S bonds and
    hydrogen bonds
  • Show fields?

36
  • Speculation
  • Simulation of the folding process
  • Predict the structure of the following
    hydrophobic subsequence (it needs to be tested
    whether hydrophobicity is highly correlated with
    being inside a protein).
  • Mark all positions of cysteines
  • Mark all positions of potential hydrogen bonds
  • Simulate the bending process
  • Look for similar structures (similar up to here)
  • Compare structures of identical O/I sequences
  • Compare surfaces (cut the protein at a hydrophilic
    position and look at the set of exposed
    hydrophobic amino acids)
  • Develop an algorithm to determine structural
    similarity, based either on dihedral angles or on
    Euclidean positions, using dynamic programming.
  • With such an algorithm similar surroundings can
    be found.
  • Do new parts deform old parts significantly?

37
38
Thank you!