Rapid Methods for Comparing Protein Structures and Scanning Structure Databases

1
Rapid Methods for Comparing Protein Structures
and Scanning Structure Databases
  • Oliviero Carugo, Current Bioinformatics 1(1),
    2006
  • Azhar Ali Shah
  • Computational Foundations of Nanoscience Journal
    Club (CFNJC)

CFNJC, October 19, 2007
2
Overview
  • Introduction
  • About the author
  • Problem
  • Requirements
  • Motivations
  • Background
  • Classification of methods
  • Summary
  • Observations

3
Introduction: about the author 1/2
  • Name: Oliviero Carugo
  • Nationality: Italian and French
  • Education
  • PhD (Chemistry), Univ. of Pavia, Italy, (1985 -
    1986)
  • Post Doc (Structural Biology Program), EMBL,
    Heidelberg, Germany, (1995-2000)
  • Current Position
  • AP, Dept. of General Chemistry, Univ. of Pavia,
    Italy (2000 --)
  • Visiting Professor, Dept. of Biomolecular
    Structural Chemistry, University of Vienna,
    Austria (2005 --)

4
Introduction: about the author 2/2
  • Research interests
  • Structural bioinformatics
  • Estimation of protein structure similarity,
  • prediction of inter-molecular interactions,
  • prediction of crystallizability of gene products
  • DBLP: Carugo
  • CX, DPX and PRIDE: WWW servers for the analysis
    and comparison of protein 3D structures. Nucleic
    Acids Research 33(Web-Server-Issue): 252-254
    (2005)
  • DPX: for the analysis of the protein core.
    Bioinformatics 19(2): 313-314 (2003)
  • Prediction of protein polypeptide fragments
    exposed to the solvent. In Silico Biology 3: 35
    (2003)
  • CX, an algorithm that identifies protruding atoms
    in proteins. Bioinformatics 18(7): 980-984 (2002)

5
Introduction: problem 1/2
  • The complexity of structural biological
    information is increasing more rapidly than
    computer performance
  • Consider
  • Number of PDB entries as a measure of structural
    biological information (PDB graph)
  • Number of transistors per IC as a measure of
    computer performance (Moore's Law)
  • Evaluation for 3 decades (1971 to 2003) gives

6
Introduction: problem 2/2
Confusing description!
[Figure: number of PDB structures and number of
transistors per IC (x 100,000), 1971 to 2003]
Total structures in 2003: 20,000. Yearly growth
in 2003: 5,000.
7
Introduction: requirement
  • Fast algorithms and protocols to measure
    similarity between protein 3D structures
    available in large-scale databases

8
Introduction: motivations
  • The estimation of similarity between protein 3D
    structures helps in
  • Molecular evolution
  • Molecular modelling
  • Function prediction
  • Database scanning

9
Introduction: background 1/3
  • So many algorithms
  • Each biological problem requires its own
    comparison method
  • Different problems need different logical
    approaches

10
Introduction: background 2/3
  • Slow methods
  • Careful examination of proximity between two or
    more proteins using structural alignment
  • Too slow for large databases
  • Often use a two-step strategy
  • Coarse structure representation (e.g. SSE)
  • Fine structure representation (e.g. positions of
    Cα atoms)
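
The two-step strategy can be sketched in a few lines. This is only an illustration of the idea, not the code of any published tool: `two_step_scan`, `coarse_distance`, `fine_score` and `keep` are hypothetical names, and the toy functions stand in for a real coarse filter (e.g. on SSEs) and a real structural alignment.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

S = TypeVar("S")  # whatever object represents one structure


def two_step_scan(
    query: S,
    database: Dict[str, S],
    coarse_distance: Callable[[S, S], float],
    fine_score: Callable[[S, S], float],
    keep: int,
) -> List[Tuple[str, float]]:
    """Rank database entries against the query in two passes.

    Step 1 (cheap): sort all entries by the coarse distance, keep the closest.
    Step 2 (expensive): re-score only those survivors with the fine method.
    """
    shortlist = sorted(
        database, key=lambda name: coarse_distance(query, database[name])
    )[:keep]
    rescored = [(name, fine_score(query, database[name])) for name in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins (hypothetical): a "structure" is just a list of numbers.
def toy_coarse(a, b):
    return abs(len(a) - len(b))  # e.g. difference in the number of SSEs


def toy_fine(a, b):
    return -sum((x - y) ** 2 for x, y in zip(a, b))  # higher = more similar


db = {"p1": [1.0, 2.0, 3.0], "p2": [1.0, 2.0, 3.5], "p3": [9.0] * 10}
hits = two_step_scan([1.0, 2.0, 3.0], db, toy_coarse, toy_fine, keep=2)
```

The expensive function is called `keep` times instead of once per database entry, which is where the time is saved; the price is that anything the coarse filter discards is never seen by the fine step.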

11
Introduction: background 3/3
  • Fast methods
  • Used for large scale databases
  • Work on coarse representation of protein
    structures
  • Results are less accurate and detailed (e.g. no
    structural alignment)

12
Introduction: focus of the paper
  • Fast comparison methods that can handle large
    scale structural databases
  • Rapid Methods for Comparing Protein Structures
    and Scanning Structure Databases

13
Classification of methods
  • Based on the representation of the protein 3D
    structure
  • String
  • Array
  • Secondary structure elements (SSEs)
  • Backbone

14
String representation 1/4
  • Uncommon but appealing
  • Allows sequence alignment methods to be used to
    compare 3D structures
  • 3D structure of n residues/SSEs (or other
    structural units) is represented by n characters
  • Characters are chosen from an alphabet
  • Each character has associated structural features

15
String representation 2/4
  • Problem
  • Difficult to design an appropriate alphabet that
    can well describe the 3D structural features
  • Comparison methods based on strings
  • TOPSCAN (Martin ACR, Protein Eng, 2000), UCL
  • Uses the STRIDE program to identify SSEs
  • Builds the vectors between the endpoints of SSEs
  • Each SSE is assigned one of 12 characters on the
    basis of the largest component of its vector
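
A minimal sketch of this kind of encoding follows. The slide only says that 12 characters are used and that the largest vector component decides; the split into 6 axis directions x 2 SSE types, the letters themselves, and the function names are assumptions made for illustration and are not taken from TOPSCAN.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical 12-letter alphabet: 6 axis directions x 2 SSE types.
# Upper case = helix ("H"), lower case = strand ("E"); the letters are made up.
DIRECTION_LETTERS = {(0, 1): "A", (0, -1): "B", (1, 1): "C",
                     (1, -1): "D", (2, 1): "F", (2, -1): "G"}


def encode_sse(start: Point, end: Point, sse_type: str) -> str:
    """Map one SSE to a character from its end-to-end vector."""
    vector = [e - s for s, e in zip(start, end)]
    # Index of the component with the largest absolute value (0=x, 1=y, 2=z).
    axis = max(range(3), key=lambda i: abs(vector[i]))
    sign = 1 if vector[axis] >= 0 else -1
    letter = DIRECTION_LETTERS[(axis, sign)]
    return letter if sse_type == "H" else letter.lower()


def encode_structure(sses: Sequence[Tuple[Point, Point, str]]) -> str:
    """Concatenate the characters of all SSEs, from N- to C-terminus."""
    return "".join(encode_sse(start, end, kind) for start, end, kind in sses)


example = [
    ((0.0, 0.0, 0.0), (9.0, 1.0, 2.0), "H"),      # helix pointing along +x
    ((9.0, 1.0, 2.0), (8.0, -6.0, 2.5), "E"),     # strand pointing along -y
    ((8.0, -6.0, 2.5), (8.5, -5.0, 14.0), "H"),   # helix pointing along +z
]
print(encode_structure(example))  # prints AdF
```

Because only the dominant axis is kept, the string depends on the coordinate frame, so both structures have to be placed in a comparable orientation before their strings are compared.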

16
String representation 3/4
17
String representation 4/4
  • Uses the Needleman and Wunsch algorithm on the
    string representations of two 3D structures and
    calculates the percentage similarity score using
    the following scheme
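
The scoring scheme shown on the slide did not survive in this transcript, so the sketch below uses placeholder values. It is a plain Needleman and Wunsch global alignment over two structure strings; the match, mismatch and gap scores and the way the score is turned into a percentage are assumptions, not TOPSCAN's actual parameters.

```python
from typing import List


def needleman_wunsch(a: str, b: str, match: int = 2,
                     mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score: List[List[int]] = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[-1][-1]


def percent_similarity(a: str, b: str, match: int = 2) -> float:
    """Score as a percentage of the best score the longer string could reach.

    This normalisation is an assumption chosen for the sketch.
    """
    best_possible = match * max(len(a), len(b))
    if best_possible == 0:
        return 100.0
    score = needleman_wunsch(a, b, match=match)
    return max(0.0, 100.0 * score / best_possible)
```

The table has (len(a)+1) x (len(b)+1) cells; with one character per SSE the strings are only a few tens of characters long, which is why a string comparison of this kind is cheap compared with a residue-level structural alignment.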

How fast is TOPSCAN?
Should be 10?
18
Array representation 1/4
  • 3D structure represented as a fixed length array
    of real numbers
  • Benefits
  • For the comparison of equal-length arrays there
    are well-established mathematical tools based on
    proximity detection
  • E.g. Euclidean distance between two points in an
    orthogonal space
  • Problems
  • Definition of the array
  • No obvious way to describe an object by means of
    a predefined set of variables

19
Array representation 2/4
  • Comparison methods based on arrays
  • PRIDE (Carugo and Pongor, J Mol Biol 2002)
  • Uses distances between Cα atoms to represent the
    3D structure
  • 28 histograms are computed for each structure,
    e.g.

Two histograms are compared through a contingency
table and a χ² test to obtain the probability of
identity score
The fold similarity of two structures is estimated
as the average of the probability of identity
scores obtained from the pairwise comparison of
the 28 histograms
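
The procedure above can be sketched as follows. This is not the PRIDE implementation: the choice of Cα(i) to Cα(i+n) distances with n = 3 to 30 is assumed here because it yields 28 histograms, and the bin width, the number of bins, the handling of empty histograms and all function names are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_histogram(ca: Sequence[Point], n: int,
                       bin_width: float = 1.0, n_bins: int = 60) -> List[int]:
    """Histogram of the distances between C-alpha atoms i and i+n."""
    counts = [0] * n_bins
    for i in range(len(ca) - n):
        d = math.dist(ca[i], ca[i + n])
        counts[min(int(d / bin_width), n_bins - 1)] += 1  # last bin = overflow
    return counts


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-squared distribution (df >= 1)."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # exact result for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # exact result for df = 1
    while a < df / 2.0:                          # raise df by 2 on each pass
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return min(1.0, q)


def identity_probability(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Chi-squared test on the 2 x k contingency table of two histograms."""
    columns = [(x, y) for x, y in zip(h1, h2) if x + y > 0]
    total1 = sum(x for x, _ in columns)
    total2 = sum(y for _, y in columns)
    if total1 == 0 or total2 == 0:
        return 0.0  # assumption: no data on one side counts as "not identical"
    if len(columns) < 2:
        return 1.0  # everything falls in the same single bin
    grand = total1 + total2
    chi2 = 0.0
    for x, y in columns:
        expected1 = total1 * (x + y) / grand
        expected2 = total2 * (x + y) / grand
        chi2 += (x - expected1) ** 2 / expected1 + (y - expected2) ** 2 / expected2
    return chi2_sf(chi2, len(columns) - 1)


def pride_like_score(ca_a: Sequence[Point], ca_b: Sequence[Point],
                     n_values: Sequence[int] = range(3, 31)) -> float:
    """Average probability of identity over all histogram pairs."""
    probabilities = [
        identity_probability(distance_histogram(ca_a, n), distance_histogram(ca_b, n))
        for n in n_values
    ]
    return sum(probabilities) / len(probabilities)
```

The histograms depend only on the structure itself, so they can be computed once per database entry and stored; a comparison then touches 28 short integer arrays rather than the coordinates, and no superposition or alignment is involved.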
20
(No Transcript)
21
Array representations 4/4
  • PRIDE results agree with CATH
  • Fast comparison
  • 1000 comparisons per second
  • On an SGI R10000 system at 200 MHz

22
Secondary structural elements (SSEs) 1/6
  • Simplified description of 3D structure
  • i.e. a few tens of SSEs as compared to several
    tens or hundreds of residues
  • A smaller number of variables makes comparison
    easier

23
Secondary structural elements (SSEs) 2/6
  • Different ways to represent protein 3D structure
    by means of SSEs
  • Secondary structural assignments
  • SSE approximation

24
Secondary structural elements (SSEs) 3/6
  • Secondary structural assignments
  • Different assignments with different programs
  • Due to variable torsion angles along the backbone
  • Common methods
  • DSSP (Kabsch and Sander, Biopolymers 1983)
  • Dictionary of protein secondary structures
  • Looks for hydrogen bonds between main-chain atoms
    and assigns each residue one of eight types of
    secondary structure conformation
  • STRIDE (Frishman and Argos, Proteins 1995)
  • Uses both hydrogen bonds and torsion angles to
    assign secondary structures

25
Secondary structural elements (SSEs) 4/6
  • Other methods for SSE assignments
  • P-Curve
  • DEFINE
  • SSA
  • VADAR
  • Voronoi Tessellations
  • Contradiction in results
  • DSSP and STRIDE agree in 96% (for 707 proteins)
  • DSSP, STRIDE, DEFINE agree in 71% (for 126
    proteins)
  • DSSP, DEFINE, P-Curve agree in 63% (for 154
    proteins)

Secondary structure assignments are quite
ambiguous and inconsistent! (a consensus based on
majority vote is needed)
This is a serious limitation of methods that
compare 3D structures based on SSE arrangements
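
A minimal sketch of the two quantities mentioned above: the percentage of residues on which several programs agree, and a per-residue majority-vote consensus. The three assignment strings are invented toy data in a three-state alphabet (H, E, C), not real output of DSSP, STRIDE or DEFINE, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def agreement(assignments: Sequence[str]) -> float:
    """Percentage of residues to which all programs assign the same state."""
    length = len(assignments[0])
    if any(len(a) != length for a in assignments):
        raise ValueError("all assignment strings must cover the same residues")
    same = sum(1 for column in zip(*assignments) if len(set(column)) == 1)
    return 100.0 * same / length


def consensus(assignments: Sequence[str], undecided: str = "-") -> str:
    """Per-residue majority vote; without a strict majority, emit `undecided`."""
    result = []
    for column in zip(*assignments):
        state, votes = Counter(column).most_common(1)[0]
        result.append(state if votes > len(column) / 2 else undecided)
    return "".join(result)


# Toy three-state strings (H = helix, E = strand, C = coil).
program_1 = "CHHHHHHCCEEEEC"
program_2 = "CHHHHHHHCEEEEC"
program_3 = "CCHHHHHHCCEEEC"

toy_agreement = agreement([program_1, program_2, program_3])
toy_consensus = consensus([program_1, program_2, program_3])
```

In the toy data the programs disagree only at the ends of the elements, which is enough to shift SSE boundaries and lengths and therefore to change any SSE-based comparison that follows.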
26
Secondary structural elements (SSEs) 5/6
  • SSE approximations
  • As a vector from N to C terminus
  • Differ from arrays in terms of variable length
  • Well-established mathematical tools cannot be used
  • Different ways
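
One simple way to approximate an SSE by a vector is to join its first and last Cα atoms; this is an assumption made for the sketch, since programs may instead fit an axis through the whole element. The angle between two such vectors is then a natural quantity for comparing SSE arrangements. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Vector = Tuple[float, float, float]


def sse_vector(ca_coords: Sequence[Point]) -> Vector:
    """Vector from the N-terminal to the C-terminal C-alpha atom of one SSE."""
    first, last = ca_coords[0], ca_coords[-1]
    return (last[0] - first[0], last[1] - first[1], last[2] - first[2])


def angle_between(u: Vector, v: Vector) -> float:
    """Angle between two vectors in degrees (0 = parallel, 180 = antiparallel)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        raise ValueError("zero-length vector")
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


# Toy coordinates: two neighbouring strands running in opposite directions.
strand_1 = [(0.0, 0.0, 0.0), (3.3, 0.0, 0.0), (6.6, 0.0, 0.0)]
strand_2 = [(6.6, 4.8, 0.0), (3.3, 4.8, 0.0), (0.0, 4.8, 0.0)]
pair_angle = angle_between(sse_vector(strand_1), sse_vector(strand_2))
```

A protein described this way is a list of vectors whose length varies from protein to protein, which is the reason the fixed-length array tools do not apply directly.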

27
Secondary structural elements (SSEs) 6/6
Statistical performance of SSM or other
methods? Two-step methods are slow?
  • Two-step methods based on SSEs
  • SSM (Krissinel and Henrick, EMBL 2003)
  • Secondary Structure Matching
  • http://www.ebi.ac.uk/msd-srv/ssm/
  • Protein 3D structures are represented as graphs
  • Nodes are SSEs
  • Graph comparison results in identification of
    equivalent residues
  • Subsequent minimization of RMSD between
    equivalent residues
  • DEJAVU (http://xray.bmc.uu.se/usf/)
  • Matras (http://biunit.naist.jp/matras/)
  • VAST (http://www.ncbi.nlm.nih.gov/Structure/VAST)

28
Backbone representations
  • Uses vector based profiles to describe
    trajectories from N to C terminus of backbone
  • Trajectory could be described as a simple curve
  • Each residue is associated with the curvature and
    torsion of the curve
  • Differences of these parameters are used to
    compare two 3D structures
  • Useful when comparing the same protein in two
    different states (e.g. with or without a
    substrate, inhibitors, cofactors etc.)
  • Gaps and insertions are hard to handle
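
A rough sketch of such a profile on a Cα trace follows. It uses discrete stand-ins that are assumptions of this example rather than the formulas of any particular method: curvature is taken as the inverse radius of the circle through three consecutive Cα atoms, and torsion as the dihedral angle of four consecutive Cα atoms. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def curvature(p: Point, q: Point, r: Point) -> float:
    """Inverse radius of the circle through p, q, r (0 for collinear points)."""
    a, b, c = _norm(_sub(q, p)), _norm(_sub(r, q)), _norm(_sub(r, p))
    if a * b * c == 0.0:
        return 0.0
    twice_area = _norm(_cross(_sub(q, p), _sub(r, p)))
    return 2.0 * twice_area / (a * b * c)


def torsion(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle in degrees around the p1-p2 axis."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    length = _norm(b1)
    if length == 0.0:
        return 0.0
    b1 = tuple(x / length for x in b1)
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))  # b0 projected off the axis
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))  # b2 projected off the axis
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def backbone_profile(ca: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Curvature per inner residue and torsion per window of four residues."""
    curv = [curvature(ca[i - 1], ca[i], ca[i + 1]) for i in range(1, len(ca) - 1)]
    tors = [torsion(*ca[i:i + 4]) for i in range(len(ca) - 3)]
    return curv, tors


def profile_difference(ca_a: Sequence[Point], ca_b: Sequence[Point]) -> Tuple[float, float]:
    """Mean absolute curvature difference and mean absolute torsion difference."""
    if len(ca_a) != len(ca_b) or len(ca_a) < 4:
        raise ValueError("need two traces of equal length with at least 4 residues")
    curv_a, tors_a = backbone_profile(ca_a)
    curv_b, tors_b = backbone_profile(ca_b)
    d_curv = sum(abs(x - y) for x, y in zip(curv_a, curv_b)) / len(curv_a)
    # Wrap angle differences into [-180, 180) before taking the magnitude.
    d_tors = sum(abs((x - y + 180.0) % 360.0 - 180.0)
                 for x, y in zip(tors_a, tors_b)) / len(tors_a)
    return d_curv, d_tors
```

The equal-length check in `profile_difference` makes the limitation explicit: residue i of one trace is compared with residue i of the other, so a single inserted residue shifts everything after it.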

Hardly used in general case for similarity
evaluation and hence no public web servers are
available. However?
29
Comparison b/w various methods
Strange! Speed also depends on the power of the
computing environment the algorithm runs on.
  • For 86 queries, DALI gives the best quality of
    results compared to
  • CE, Matras, PRIDE, SGM, Structal and VAST
  • (Sierk and Pearson, Protein Sci 2004)
  • For 70 queries, CE, DALI, VAST and Matras provide
    better quality of results with high speed
    compared to
  • DEJAVU, Lock, PRIDE, SSM, TOP, TOPS, TOPSCAN
  • (Novotny et al., Proteins 2004)

30
Summary
  • Rapid methods may use a coarse representation of
    3D structures in the following forms
  • Strings
  • E.g. TOPSCAN
  • Arrays
  • E.g. PRIDE
  • SSEs
  • Two-step methods: SSM, DEJAVU, Matras, VAST
  • Backbone
  • Algorithmic-level studies; no public web servers
  • Comparison on the same collection of data in the
    same computing environment is useful
  • To benchmark the state of the art of fast
    procedures

31
Observations
  • Actual benchmarking of rapid methods on large
    scale databases
  • Proper evaluation of methods based on different
    representations of the protein 3D structure
  • Full classification of methods based on structure
    representation

32
Source: www.intel.com/research/silicon/mooreslaw.htm
33
Source: www.ncsb.org