Rapid Methods for Comparing Protein Structures and Scanning Structure Databases

1
Rapid Methods for Comparing Protein Structures
and Scanning Structure Databases

Oliviero Carugo, Current Bioinformatics 1(1),
2006
Azhar Ali Shah
Computational Foundations of Nanoscience Journal
Club (CFNJC)

CFNJC, October 19, 2007
2
Overview

Introduction
About the author
Problem
Requirements
Motivations
Background
Classification of methods
Summary
Observations

3
Introduction about the author 1/2

Name: Oliviero Carugo
Nationality: Italian and French
Education:
PhD (Chemistry), Univ. of Pavia, Italy (1985-1986)
Post Doc (Structural Biology Program), EMBL,
Heidelberg, Germany (1995-2000)
Current Position:
AP, Dept. of General Chemistry, Univ. of Pavia,
Italy (2000-)
Visiting Professor, Dept. of Biomolecular
Structural Chemistry, University of Vienna,
Austria (2005-)

4
Introduction about the author 2/2

Research interests
Structural bioinformatics
Estimation of protein structure similarity,
prediction of inter-molecular interactions,
prediction of crystallizability of gene products
DBLP: Carugo
CX, DPX and PRIDE WWW servers for the analysis
and comparison of protein 3D structures. Nucleic
Acids Research 33(Web-Server-Issue): 252-254
(2005)
DPX for the analysis of the protein core.
Bioinformatics 19(2): 313-314 (2003)
Prediction of protein polypeptide fragments
exposed to the solvent. In Silico Biology 3: 35
(2003)
CX, an algorithm that identifies protruding atoms
in proteins. Bioinformatics 18(7): 980-984 (2002)

5
Introduction problem 1/2

The complexity of structural biological
information is increasing more rapidly than
computer performance
Consider:
Number of PDB entries as a measure of structural
biological information (PDB graph)
Number of transistors per IC as a measure of
computing performance (Moore's Law)
Evaluation over 3 decades (1971 to 2003) gives

6
Introduction problem 2/2
Confusing description!
[Figure: number of PDB structures and number of transistors per IC (x 100,000)]
Total structures in 2003: 20,000. Yearly growth
in 2003: 5,000
7
Introduction requirement

Fast algorithms and protocols to measure
similarity between protein 3D structures available in
large-scale databases

8
Introduction motivations

The estimation of similarity between protein 3D
structures helps in
Molecular evolution
Molecular modelling
Function prediction
Database scanning

9
Introduction background 1/3

So many algorithms
Each biological problem requires its own
comparison method
Different problems need different logical
approaches

10
Introduction background 2/3

Slow methods
Careful examination of proximity between two or
more proteins using structural alignment
Too slow for large databases
Often use a two-step strategy
Coarse structure representation (e.g. SSEs)
Fine structure representation (e.g. positions of
Cα atoms)
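
The two-step strategy above can be sketched as a coarse filter followed by a fine re-ranking. This is only an illustration: `two_step_search`, `coarse_score`, `fine_score` and `keep` are hypothetical names, and both scoring functions are placeholders supplied by the caller, not any published method.

```python
def two_step_search(query, database, coarse_score, fine_score, keep=10):
    """Rank a database with a cheap score, then refine only the best hits.

    coarse_score and fine_score are caller-supplied functions (assumptions of
    this sketch); both take (query, entry) and return a number where higher
    means more similar.
    """
    # Step 1: the cheap, coarse comparison is run against every entry.
    ranked = sorted(database, key=lambda entry: coarse_score(query, entry),
                    reverse=True)
    shortlist = ranked[:keep]
    # Step 2: the expensive, fine comparison is run on the shortlist only.
    refined = [(fine_score(query, entry), entry) for entry in shortlist]
    return sorted(refined, key=lambda pair: pair[0], reverse=True)
```

The expensive comparison is called at most `keep` times, whatever the size of the database; the price is that a true hit rejected by the coarse step is never recovered.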

11
Introduction background 3/3

Fast methods
Used for large scale databases
Work on coarse representation of protein
structures
Results are less accurate and detailed (e.g. no
structural alignment)

12
Introduction focus of the paper

Fast comparison methods that can handle large
scale structural databases
Rapid Methods for Comparing Protein Structures
and Scanning Structure Databases

13
Classification of methods

Based on the representation of the protein's 3D
structure
String
Array
Secondary structure elements (SSEs)
Backbone

14
String representation 1/4

Uncommon but appealing
Allows sequence alignment methods to be used to
compare 3D structures
3D structure of n residues/SSEs (or other
structural units) is represented by n characters
Characters are chosen from an alphabet
Each character has associated structural features

15
String representation 2/4

Problem
Difficult to design an appropriate alphabet that
describes the 3D structural features well
Comparison methods based on strings
TOPSCAN (Martin ACR, Protein Eng, 2000), UCL
Uses the STRIDE program to identify SSEs
Builds the vectors between the endpoints of SSEs
Each SSE is assigned one of 12 characters
on the basis of the largest component of its vector
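
A minimal sketch of this kind of encoding, not TOPSCAN itself. It assumes the 12 characters arise from 2 SSE types (helix, strand) times 6 signed axis directions, which the slide does not state; the letters and the function names `sse_to_char` and `encode_structure` are hypothetical.

```python
def sse_to_char(sse_type, start, end):
    """Map one SSE to a character from an illustrative 12-letter alphabet.

    sse_type: "H" (helix) or "E" (strand).
    start, end: (x, y, z) endpoints of the SSE.
    """
    vec = [e - s for s, e in zip(start, end)]
    # Axis with the largest absolute component: 0 = x, 1 = y, 2 = z.
    axis = max(range(3), key=lambda i: abs(vec[i]))
    # Two characters per axis: positive direction first, then negative.
    index = 2 * axis + (0 if vec[axis] >= 0 else 1)
    letters = "ABCDEF" if sse_type == "H" else "abcdef"
    return letters[index]


def encode_structure(sses):
    """Turn a list of (type, start, end) SSEs into a string."""
    return "".join(sse_to_char(t, s, e) for t, s, e in sses)


example = [
    ("H", (0, 0, 0), (10, 1, 2)),   # helix pointing mainly along +x
    ("E", (0, 0, 0), (1, -8, 2)),   # strand pointing mainly along -y
    ("H", (0, 0, 0), (0, 1, -5)),   # helix pointing mainly along -z
]
print(encode_structure(example))  # prints "AdF"
```

The result depends on the coordinate frame, so both structures would have to be placed in a comparable orientation before encoding.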

16
String representation 3/4
17
String representation 4/4

Uses the Needleman and Wunsch algorithm on the
string representations of two 3D structures and
calculates the percentage similarity score using
the following scheme

How fast is TOPSCAN?
Should be 10?
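
The scoring scheme from the slide is not in the transcript, so the sketch below uses a generic Needleman-Wunsch global alignment with made-up scores (match +1, mismatch -1, gap -1). The normalisation in `percent_similarity` is also an assumption, not TOPSCAN's formula.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def percent_similarity(a, b):
    """Score relative to a perfect match of the longer string (hypothetical)."""
    best = max(len(a), len(b))  # score of the longer string against itself
    if best == 0:
        return 100.0
    return 100.0 * max(needleman_wunsch(a, b), 0) / best
```

For example, `percent_similarity("AdF", "AdF")` gives 100.0, while `needleman_wunsch("AdF", "AF")` gives 1 (two matches and one gap).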
18
Array representation 1/4

3D structure represented as a fixed length array
of real numbers
Benefits
For the comparison of equal-length arrays there
are well-established mathematical tools based on
proximity detection
E.g. Euclidean distance between two points in an
orthogonal space
Problems
Definition of the array
No obvious way to describe an object by means of
a predefined set of variables
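
As an illustration of the benefit listed above, once every structure is a fixed-length array, proximity reduces to a few lines. The descriptor values and the helper `nearest` are hypothetical; how to choose the descriptors is exactly the open problem the slide names.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length arrays of real numbers."""
    if len(x) != len(y):
        raise ValueError("arrays must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest(query, database):
    """Name of the database entry closest to the query.

    database: dict mapping a name to a fixed-length array.
    """
    return min(database, key=lambda name: euclidean_distance(query, database[name]))


# Made-up three-variable descriptors, for illustration only.
db = {"protA": [0.1, 0.9, 0.3], "protB": [0.8, 0.2, 0.5]}
print(nearest([0.2, 0.8, 0.3], db))  # prints "protA"
```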

19
Array representation 2/4

Comparison methods based on arrays
PRIDE (Carugo and Pongor, J Mol Biol 2002)
Uses distances between Cα atoms to represent the 3D
structure
28 histograms are computed for each structure
e.g.

Two histograms are compared through a contingency
table and χ² test to obtain the probability of
identity score
Fold similarity of two structures is estimated as
the average of the probability of identity scores
obtained from the pairwise comparison of the 28
histograms
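
A rough sketch of the procedure described above, not the PRIDE implementation. It assumes the 28 histograms are Cα(i) to Cα(i+n) distance distributions for sequence separations n = 3 to 30, and it picks a 0.5 Å bin width arbitrarily; the slide states neither. All function names are hypothetical, and chains need more than 30 residues.

```python
import math


def ca_distance_histogram(coords, n, bin_width=0.5, max_dist=50.0):
    """Histogram of distances between C-alpha i and C-alpha i+n."""
    nbins = int(max_dist / bin_width)
    hist = [0] * nbins
    for a, b in zip(coords, coords[n:]):
        # Distances beyond max_dist are counted in the last bin.
        hist[min(int(math.dist(a, b) / bin_width), nbins - 1)] += 1
    return hist


def chi2_sf(x, dof):
    """Upper-tail chi-square probability via the incomplete-gamma series."""
    if x <= 0 or dof <= 0:
        return 1.0
    a, half = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    k = a
    for _ in range(10000):
        k += 1.0
        term *= half / k
        total += term
        if term < total * 1e-15:
            break
    p_lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return max(0.0, 1.0 - p_lower)


def identity_probability(h1, h2):
    """Chi-square test on the 2 x k contingency table of two histograms."""
    cols = [(a, b) for a, b in zip(h1, h2) if a + b > 0]
    r1 = sum(a for a, _ in cols)
    r2 = sum(b for _, b in cols)
    if r1 == 0 or r2 == 0:
        return 0.0  # an empty histogram cannot be compared
    if len(cols) < 2:
        return 1.0  # all counts fall in the same single bin
    total = r1 + r2
    chi2 = 0.0
    for a, b in cols:
        for observed, row_sum in ((a, r1), (b, r2)):
            expected = row_sum * (a + b) / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2_sf(chi2, len(cols) - 1)


def pride_like_score(coords_a, coords_b, separations=range(3, 31)):
    """Average probability of identity over all sequence separations."""
    probs = [identity_probability(ca_distance_histogram(coords_a, n),
                                  ca_distance_histogram(coords_b, n))
             for n in separations]
    return sum(probs) / len(probs)
```

Each structure becomes a fixed set of histograms regardless of chain length, which is what makes the comparison alignment-free and fast.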
20
(No Transcript)
21
Array representation 4/4

PRIDE results agree with CATH
Fast comparison
1000 comparisons per second
on a 200 MHz SGI R10000 system

22
Secondary structural elements (SSEs) 1/6

Simplified description of 3D structure
i.e. a few tens of SSEs as compared to several
tens or hundreds of residues
A smaller number of variables makes comparison
easier

23
Secondary structural elements (SSEs) 2/6

Different ways to represent protein 3D structure
by means of SSEs
Secondary structural assignments
SSE approximation

24
Secondary structural elements (SSEs) 3/6

Secondary structural assignments
Different assignments with different programs
Due to variable torsion angles along the backbone
Common methods
DSSP (Kabsch and Sander, Biopolymers 1983)
Dictionary of protein secondary structures
Looks for hydrogen bonds between main-chain atoms and
assigns each residue one of eight types of
secondary structure conformation
STRIDE (Frishman and Argos, Proteins 1995)
Uses both hydrogen bonds and torsion angles to
assign secondary structures

25
Secondary structural elements (SSEs) 4/6

Other methods for SSE assignments
P-Curve
DEFINE
SSA
VADAR
Voronoi Tessellations
Contradiction in results
DSSP and STRIDE agree in 96% (for 707 proteins)
DSSP, STRIDE, DEFINE agree in 71% (for 126 proteins)
DSSP, DEFINE, P-Curve agree in 63% (for 154 proteins)

Secondary structure assignments are quite
ambiguous and inconsistent! (consensus based on
majority vote needed)
Serious limitation of the methods that compare
3D structures based on SSE arrangements
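
The agreement percentages and the majority-vote consensus mentioned above can be computed per residue as sketched below. The three assignment strings are made-up toy data in a three-state code (H, E, C), not real output from DSSP, STRIDE or DEFINE, and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(*assignments):
    """Percentage of residues on which all assignment strings agree."""
    length = len(assignments[0])
    if length == 0 or any(len(s) != length for s in assignments):
        raise ValueError("assignments must be non-empty and of equal length")
    agree = sum(1 for column in zip(*assignments) if len(set(column)) == 1)
    return 100.0 * agree / length


def majority_consensus(*assignments, undecided="?"):
    """Per-residue majority vote; positions without a majority are marked."""
    consensus = []
    for column in zip(*assignments):
        state, votes = Counter(column).most_common(1)[0]
        consensus.append(state if votes > len(column) / 2 else undecided)
    return "".join(consensus)


# Toy assignments for a 10-residue fragment (illustrative only).
prog_a = "HHHHCCEEEE"
prog_b = "HHHHHCEEEC"
prog_c = "HHHCCCEEEE"
print(percent_agreement(prog_a, prog_b, prog_c))   # prints 70.0
print(majority_consensus(prog_a, prog_b, prog_c))  # prints HHHHCCEEEE
```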
26
Secondary structural elements (SSEs) 5/6

SSE approximations
As a vector from N to C terminus
Differ from arrays in having variable length
So the well-established mathematical tools for
fixed-length arrays cannot be used
Different ways

27
Secondary structural elements (SSEs) 6/6
Statistical performance of SSM or other
methods? Two-step methods are slow?

Two-step methods based on SSEs
SSM (Krissinel and Henrick, EMBL 2003)
Secondary Structure Matching
http://www.ebi.ac.uk/msd-srv/ssm/
Protein 3D structures are represented as graphs
Nodes are SSEs
Graph comparison results in identification of
equivalent residues
Subsequent minimization of RMSD between equivalent
residues
DEJAVU (http://xray.bmc.uu.se/usf/)
Matras (http://biunit.naist.jp/matras/)
VAST (http://www.ncbi.nlm.nih.gov/Structure/VAST)
28
Backbone representations

Uses vector-based profiles to describe the
trajectory of the backbone from N to C terminus
The trajectory can be described as a simple curve
Each residue is associated with the curvature and
torsion of the curve
Differences in these parameters are used to
compare two 3D structures
Useful when one compares the same protein in two
different states (e.g. with or without a
substrate, inhibitors, cofactors etc.)
Gaps and insertions are hard to handle
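
A minimal sketch of such a profile, using discrete stand-ins chosen here rather than taken from the reviewed methods: curvature as the inverse radius of the circle through three consecutive Cα atoms, and torsion as the dihedral angle of four consecutive Cα atoms. It assumes no two consecutive points coincide, and that both chains have the same length, which reflects the gap problem noted above.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def curvature(p0, p1, p2):
    """Inverse radius of the circle through three points (Menger curvature)."""
    a, b, c = _norm(_sub(p1, p0)), _norm(_sub(p2, p1)), _norm(_sub(p2, p0))
    twice_area = _norm(_cross(_sub(p1, p0), _sub(p2, p0)))
    return 2.0 * twice_area / (a * b * c)


def torsion(p0, p1, p2, p3):
    """Signed dihedral angle of four points, in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(_cross(n1, n2), b2) / _norm(b2)
    return math.degrees(math.atan2(y, _dot(n1, n2)))


def backbone_profile(coords):
    """List of (curvature, torsion) pairs along the chain."""
    return [(curvature(*coords[i:i + 3]), torsion(*coords[i:i + 4]))
            for i in range(len(coords) - 3)]


def profile_difference(pa, pb):
    """Mean absolute curvature and torsion differences of two profiles."""
    if len(pa) != len(pb) or not pa:
        raise ValueError("profiles must be non-empty and of equal length")
    dk = sum(abs(a[0] - b[0]) for a, b in zip(pa, pb)) / len(pa)
    # Torsion angles are periodic, so differences wrap around at 360 degrees.
    dt = sum(min(abs(a[1] - b[1]), 360.0 - abs(a[1] - b[1]))
             for a, b in zip(pa, pb)) / len(pa)
    return dk, dt
```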

Hardly used in the general case for similarity
evaluation, and hence no public web servers are
available. However?
29
Comparison between various methods
Strange! Speed also depends on the power of
the computing environment the algorithm runs on.

For 86 queries, DALI gives the best quality of
results as compared to
CE, Matras, PRIDE, SGM, Structal and VAST
(Sierk and Pearson, Protein Sci 2004)
For 70 queries, CE, DALI, VAST and Matras provide
better quality of results with high speed as
compared to
DEJAVU, Lock, PRIDE, SSM, TOP, TOPS, TOPSCAN
(Novotny et al. Proteins 2004)

30
Summary

Rapid methods may use coarse representations of 3D
structures in the following forms
Strings
E.g. TOPSCAN
Arrays
E.g. PRIDE
SSEs
Two-step methods: SSM, DEJAVU, Matras, VAST
Backbone
Algorithmic-level studies; no public web servers
Comparison on the same collection of data in the same
computing environment is useful
To benchmark the state of the art of fast
procedures

31
Observations

Actual benchmarking of rapid methods on large
scale databases
Proper evaluation of methods based on different
representations of the protein's 3D structure
Full classification of methods based on structure
representation

32
Source: www.intel.com/research/silicon/mooreslaw.htm
33
Source www.ncsb.org
