Lecture 9.2: Homology and Structural Similarity (What to do when you have no structure ...)

1
Lecture 9.2: Homology and Structural Similarity
(What to do when you have no structure ...)
  • Boris Steipe
  • boris.steipe_at_utoronto.ca
    http://biochemistry.utoronto.ca/steipe
  • Departments of Biochemistry and Molecular and
    Medical Genetics
  • Program in Proteomics and Bioinformatics
  • University of Toronto
  • (This lecture is based in part on a lecture held
    by Chris Hogue, Toronto, for CBW in 2002)

2
Concepts
  • Domains are folding units, functional units and
    units of inheritance.
  • Homologous domains have similar structure.
  • Structural similarity can be measured and similar
    domains can be retrieved from databases.
  • Detection of similar folds can provide
    mechanistic explanations.
  • Threading methods can sometimes find similar
    folds.
  • Ab initio predictions of structure are highly
    experimental.

3
Concept 1
  • Domains are
  • folding units, functional units, and units of
    inheritance.

4
Domains as units of inheritance - the PH domain
story
Dotlet - A dotplot of Pleckstrin (p47) reveals
similarity between the N- and C-terminus!
5
Domains as units of inheritance - the PH domain
story
Matrix: EBLOSUM62
Gap_penalty: 10.0
Extend_penalty: 0.5
Length: 100
Identity:   31/100 (31.0%)
Similarity: 48/100 (48.0%)
Gaps:        6/100 ( 6.0%)

  6 IREGYLVKKGSVFNTWKPMWVVLLEDG--IEFYKKKSDNSPKGMIPLKGS  53
245 IKQGCLLKQGHRRKNWKVRKFILREDPAYLHYYDPAGAEDPLGAIHLRGC 294

 54 TLTSPCQDFGKRMF----VFKITTTKQQDHFFQAAFLEERDAWVRDINKA  99
295 VVTSVESNSNGRKSEEENLFEIITADEVHYFLQAATPKERTEWIKAIQMA 344

Emboss - Optimal sequence alignment: 31% identity
over 100 amino acids.
6
Domains as units of inheritance - the PH domain
story
(Alignment repeated from slide 5.)
[Figure: Human p47, N- to C-terminus]
Overlapping alignments may define domain
boundaries! We can search a database with this
knowledge ...
7
Domains as units of inheritance - the PH domain
story
[Figure: Human p47, N- to C-terminus]
Hits are smoothly bounded and extend over the
entire domain.
486 hits ... etc.
8
Domains as units of inheritance - the PH domain
story
in contrast ...
[Figure: Human p47, N- to C-terminus]
Hits extend over the entire domain. PSI-BLAST
would be difficult ...
(Yeast only, for clarity)
9
Concept 2
  • Homologous domains have similar structure.

10
Homologous domains have similar structures
1PLS/2DYN: 23% ID
1PLS - PH domain (Human pleckstrin)
2DYN - PH domain (Human dynamin)
11
Homology and Structural Similarity
Proteins that diverge in evolution maintain their
global fold!
Russell et al. (1997) J Mol Biol 269: 423-439
12
Concept 3
  • Structural similarity can be measured and similar
    domains can be retrieved from databases.

13
RMSD metric
To calculate the RMSD, a pairwise correspondence
of points has to be defined first.
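As a minimal sketch of this definition, the function below computes the coordinate RMSD for two point lists whose i-th entries are assumed to correspond. The function name and the tuple-based coordinate format are illustrative choices, not part of the lecture.

```python
import math


def rmsd(coords_a, coords_b):
    """Coordinate RMSD of two equally long lists of (x, y, z) tuples.

    The i-th point of coords_a is assumed to correspond to the i-th
    point of coords_b; no superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty point lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Two points, each displaced by 1 unit along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```

Without a defined correspondence the sum has no meaning, which is why the pairing step comes first.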
14
RMSDopt
$\mathrm{RMSD}_{opt} = \min(\mathrm{RMSD}_{coord})$
$\mathrm{RMSD}_{opt} = \mathrm{RMSD}_{coord}(A, R_s \times (B - T_s))$
The translation vector $T_s$ and the rotation matrix
$R_s$ define a superposition of the vector set B on
A.
An analytic solution of the superposition problem
is available, but not straightforward (involves
an eigenvalue problem).
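One analytic route of this kind is Horn's quaternion formulation, sketched below; that this is the method the slide has in mind is an assumption. After both point sets are centred, the largest eigenvalue of a symmetric 4x4 matrix built from their cross-covariance gives the minimum RMSD directly. To keep the sketch dependency-free, the eigenvalue is approximated by shifted power iteration, a stand-in for a proper eigen-solver. All function names are illustrative.

```python
import math


def _centred(points):
    """Return the points translated so that their centroid is the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in points]


def optimal_rmsd(coords_a, coords_b, iterations=2000):
    """Minimum RMSD over all rigid superpositions of coords_b on coords_a."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty point lists of equal length")
    n = len(coords_a)
    a0, b0 = _centred(coords_a), _centred(coords_b)

    # Cross-covariance S[i][j] = sum over points of b_i * a_j.
    s = [[sum(q[i] * p[j] for p, q in zip(a0, b0)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s

    # Horn's symmetric, traceless 4x4 matrix.
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]

    # Shifting by the Frobenius norm makes every eigenvalue non-negative,
    # so power iteration converges towards the largest one.
    shift = math.sqrt(sum(x * x for row in nmat for x in row))
    v = [1.0, 0.7, 0.4, 0.2]  # arbitrary start; assumed not orthogonal to the answer
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(nmat[i][j] * v[j] for j in range(4)) + shift * v[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points coincide with their centroid
            break
        v = [x / norm for x in w]

    # Rayleigh quotient of the unit vector v estimates the largest eigenvalue.
    lam = sum(v[i] * nmat[i][j] * v[j] for i in range(4) for j in range(4))
    ga = sum(x * x for p in a0 for x in p)
    gb = sum(x * x for p in b0 for x in p)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / n))


a = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
# b is a rotated (90 degrees about z) and translated copy of a.
b = [(-y + 5, x - 2, z + 1) for x, y, z in a]
print(optimal_rmsd(a, b))  # close to zero
```

The translation is handled by centring both sets; the eigenvector belonging to the largest eigenvalue is the unit quaternion of the optimal rotation, which this sketch does not convert into a matrix because only the RMSD value is needed.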
15
Superposition in practice
  • Prealigned structures
  • VAST (http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml)
  • FSSP (http://www.bioinfo.biocenter.helsinki.fi:8080/dali/index.html)
  • Homstrad (http://www-cryst.bioc.cam.ac.uk/homstrad/)

                        60        70        80        90       100
1dro  (   32 )  wdkVyMaAkAG-------rIsFykd-qkgyk----------snpelTfrg
1btn  (   23 )  whnVyCvin-------nqeMgFykd-aksaa----------sg--ipYhs
1pls  (   21 )  wkpmwVVLle-------dgIeFykk-ksdn---------------spk--
1fgya (  281 )  wkrrwFiLTd-------ncLyYFey-ttdk---------------epr--
1faoa (  181 )  wktrwFtLhr-------neLkYfkd-qmsp---------------epi--
1qqga (   25 )  mhkrFFVLraaseaggparLEyYen-ekkwr----------hkssapk--
1bak  (  576 )  wqrryFyLfp-------nrlewrge----------------geap-----
1dyna (   30 )  skeYwFvLta-------enLsWykd-deek---------------ekk--
1dbha (  456 )  kherhIFLFd--------gLICCksnhgqprl--------pgasnaeyrL
1b55a (   25 )  fkkrlFlLtv-------hkLsYyeydfe--r----------grrgskk--
1mai  (   37 )  rreRfYkLqe-----dcktIwqesr-kv-----------------mrspe
1fhoa (   25 )  pKlRyVfLfr-------nkimFtEqd---ast--------s---ppsyth
1foea ( 1288 )  ePeLaAfVFk-------tAVVLVykdgskqkkklvgshrlsiyeewdpfr
bbbbbb bbbbb

16
Superposition in practice
  • Web services
  • VAST (http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml)
  • CE (http://cl.sdsc.edu/ce.html)
  • LGA (http://predictioncenter.llnl.gov/local/lga/lga.html)
  • Prosup (http://lore.came.sbg.ac.at:8080/CAME/CAME_EXTERN/PROSUP/)

(Note: Click on "Rasmol" on the results page to
return the alignment)
The usability and reliability of these services
vary. "Intelligent" algorithms can
superimpose without the need for user definition
of correspondence. The downside is that the user
cannot define correspondences.
17
Superposition in practice - locally installed
  • Many molecular modeling programs have
    superposition features
  • DeepView (http://ca.expasy.org/spdbv/)
  • MolMol (http://www.mol.biol.ethz.ch/wuthrich/software/molmol/)
  • O (http://alpha2.bmc.uu.se/alwyn/o_related.html)
  • WhatIf (http://www.cmbi.kun.nl/whatif/)

18
When is RMSD misleading ?
  • Rigid body movement of domains or subdomains ...

19
Internal coordinates as an alternative to
superposition
[Figure: points a, b, c and a', b', c', compared as
pairs (a,a'), (b,b'), (c,c')]
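One superposition-free comparison in this spirit is the distance RMSD: instead of comparing coordinates, compare the internal point-to-point distances within each structure. The sketch below assumes that this is the kind of internal coordinate meant here and that the i-th points of the two lists correspond; the function name is illustrative.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """RMS difference between the internal distance matrices of two structures.

    Because only intramolecular distances are used, the value does not
    change when either structure is rotated or translated.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two point lists of equal length >= 2")
    pairs = list(combinations(range(len(coords_a)), 2))
    squared = 0.0
    for i, j in pairs:
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        squared += (d_a - d_b) ** 2
    return math.sqrt(squared / len(pairs))


a = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
# b is a rotated and translated copy of a, so all internal distances match.
b = [(-y + 5, x - 2, z + 1) for x, y, z in a]
print(distance_rmsd(a, b))
```

A known limitation of distance-based measures is that they cannot tell a structure from its mirror image.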
20
VAST - Database searches at MMDB
21
DALI ...
22
... and FSSP
The prealigned fold-tree
23
Workflow MMDB ...
Open http://www.ncbi.nlm.nih.gov/, enter your
search term ...
24
Workflow MMDB ...
Choose "Structure" ...
25
Workflow MMDB ...
Choose your protein of interest ...
26
... structure summary ...
27
... access domains similar to SH3 ...
28
... select, download ...
29
... display.
30
Concept 4
  • Detection of similar folds can provide
    mechanistic explanations.

31
Protein Modules
Modular interactions between biomolecules are
responsible for the inner workings of the
cell. There are far more modular interacting
proteins than classical enzymes in the human
genome; we have known this since S. cerevisiae.
Pawson Lin
32
Protein Domains: an alphabet of functional
modules
33
Workflow for domain architectures
Starting from a citation ...
34
... access sequence ...
35
... display sequence ...
36
... link to domain architecture ...
(from the CDD database - incl. SMART and Pfam)
37
... show domain relatives ...
38
... access domain information ...
39
... in CDD ...
40
... visualize in Cn3D.
41
Protein structure prediction
  • What to do when no structure is known and no
    homologues are found ?

42
Three Paths to Protein Structure Prediction
  • Homology Modeling
  • Threading (Fold recognition)
  • Ab initio prediction

43
Concept 5
  • Threading methods can sometimes find similar
    folds.

44
Fold recognition ("Threading")
[Figure: Template Structure, Query Sequence]
45
Threading Database Search
  • Premise is that most sequences match some 3-D
    structure that is already known (1/2)
  • Given a database of known 3-D protein folds
  • align the test sequence to each known protein
  • in real 3-D coordinate space (slow but exact)
  • in parameterized 1-D space (fast but approximate)
  • optimize some scoring function
  • sort out best sequence-structure alignment
  • assess alignments - statistically significant?

46
Threading Statistics
  • Z score (sequence composition correction)
  • number of standard deviations the found alignment
    is off from the mode of a randomized version of
    the structure or profile
  • P value (sequence length correction)
  • Shuffle the sequence - make a distribution of
    random threads
  • Is the unscrambled thread any better than a
    randomly optimized sequence?
  • Z score of Z scores
  • Look for P values as a criterion for choosing a
    threading method...
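The shuffling idea can be sketched as follows. The scoring function is a toy stand-in (it counts hydrophobic residues placed at buried template positions), the burial pattern is hypothetical, and the Z score is taken relative to the mean of the shuffled scores rather than the mode mentioned above, which is a simplifying assumption.

```python
import random
import statistics

HYDROPHOBIC = set("AVILMFWC")


def burial_score(sequence, burial="BEBEBEBEBEBE"):
    """Toy threading score: hydrophobic residues at buried (B) positions.

    The burial string is a hypothetical template, not real data.
    """
    return sum(1 for res, env in zip(sequence, burial)
               if env == "B" and res in HYDROPHOBIC)


def shuffle_z_score(sequence, score_fn, n_shuffles=200, seed=0):
    """Z score of the real sequence against shuffled versions of itself.

    Shuffling keeps composition and length fixed, so the background
    distribution corrects for both.
    """
    rng = random.Random(seed)
    observed = score_fn(sequence)
    letters = list(sequence)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        background.append(score_fn("".join(letters)))
    spread = statistics.stdev(background)
    if spread == 0:
        raise ValueError("shuffled scores do not vary; Z score undefined")
    return (observed - statistics.mean(background)) / spread


z = shuffle_z_score("LKVEIRLTVDAK", burial_score)
print(f"Z = {z:.2f}")
```

A real threading program would re-optimize the alignment for every shuffled sequence, which is far more expensive than this fixed, ungapped placement.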

47
Database Searching...
  • Sensitivity
  • High sensitivity implies finding all possible
    true positive matches in the database
  • Specificity
  • High specificity implies finding no false
    positive matches in the search.
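These two notions can be made concrete from the four outcome counts of a search. The sketch below uses the conventional definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and also reports the fraction of hits that are false, which is what a user reading a hit list actually experiences. The counts in the example are hypothetical.

```python
def search_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, false_hit_fraction) for a search.

    tp/fp: true/false matches reported; tn/fn: true/false non-matches.
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("counts leave one of the ratios undefined")
    sensitivity = tp / (tp + fn)         # share of true matches that were found
    specificity = tn / (tn + fp)         # share of non-matches correctly rejected
    false_hit_fraction = fp / (tp + fp)  # share of reported hits that are wrong
    return sensitivity, specificity, false_hit_fraction


# Hypothetical search of 10,000 entries containing 100 true matches,
# returning 150 hits of which 15 are correct.
sens, spec, false_frac = search_metrics(tp=15, fp=135, tn=9765, fn=85)
print(f"sensitivity {sens:.2f}, specificity {spec:.3f}, false hits {false_frac:.2f}")
```

With a large database, the conventional specificity can stay close to 1 even when most reported hits are false, so the false-hit fraction is the more telling number for a ranked hit list.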

48
Threading as a Database Search Method
  • Has INCREDIBLY poor sensitivity
  • 10-20% on a good day
  • Has INCREDIBLY poor specificity.
  • 90% of hits are false positives
  • So...

49
Interpret Threading Accordingly...
  • In a ranked list of 10 matches, expect that only
    one might be correct
  • Expect that none may be correct
  • Expect that the top ranked hit is a false
    positive...

50
How then does Threading find things?
  • If there is a true positive in a threading search
    hit list - People find it ...
  • It is most often found by FUNCTIONAL similarity.
  • Similar enzymatic mechanisms
  • Motifs, DART ...
  • Similar roles, cellular distributions ...

51
Concept 6
  • Ab initio predictions of structure are highly
    experimental.

52
Protein structure prediction is easy
The assumption: Native structure is a global
energy minimum
  • The algorithm
  • Reasonably generate all conformations
  • Score with an appropriate scoring function
  • Choose the one with best score

reasonable: search finishes in reasonable time
appropriate: monotonic with q (or at least ΔG),
useful radius of convergence
53
Why is structure prediction hard ?
  • Appropriate scoring functions
  • Reasonable structure generation
  • Working approaches

54
Protein structure scoring functions
Molecular Mechanics / Empirical (Statistical) / Combinations
The scoring function is the single most important
component of any optimization !
55
Protein structure scoring functions
Molecular Mechanics / Empirical (Statistical) / Combinations
Molecular Mechanics terms: bonds, angles, dihedrals,
Van der Waals, Coulomb
56
Protein structure scoring functions
Molecular Mechanics / Empirical (Statistical) / Combinations
[Equation labels: energy of state i, frequency,
partition function]
[Equation labels: potential energy between a,b at
separation x; frequency of observation of a,b at
separation x; all observations of a,b]
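The labels suggest the usual Boltzmann inversion: if states are populated as exp(-E/kT)/Z, an observed frequency can be turned back into an energy. A minimal sketch, assuming the potential is E(x) = -kT ln(N_ab(x) / N_ab) with the constant partition-function term dropped and no separate reference state, follows. The counts are invented for illustration and kT is set to 1 (reduced units).

```python
import math


def statistical_potential(counts_by_separation, kt=1.0):
    """Convert observation counts for a residue pair a,b into energies.

    counts_by_separation maps a separation bin x to the number of times
    a,b were observed at that separation. Bins with zero counts are
    skipped because their energy would be infinite.
    """
    total = sum(counts_by_separation.values())
    if total <= 0:
        raise ValueError("no observations")
    return {x: -kt * math.log(count / total)
            for x, count in counts_by_separation.items() if count > 0}


# Hypothetical counts for one residue pair in three distance bins (in angstrom).
energies = statistical_potential({4.0: 10, 6.0: 30, 8.0: 60})
for separation, energy in sorted(energies.items()):
    print(f"{separation:4.1f}  {energy:6.3f}")
```

Frequently observed separations come out with low energy. Practical potentials normally divide by a reference-state frequency as well, which this sketch omits.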
57
Protein structure scoring functions
Molecular Mechanics / Empirical (Statistical) / Combinations
Usually combine potential energy and empirical
solvation terms
58
Why is structure prediction hard ?
  • Appropriate scoring functions
  • Reasonable structure generation
  • Working approaches

59
Combinatorially large search spaces make
enumeration impossible.
  • Consider
  • 100 residues
  • 3 states
  • 3^100 ≈ 5 × 10^47 conformations
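The arithmetic is easy to check with exact integers. The enumeration rate used below (10^12 conformations per second) is a hypothetical, deliberately generous figure chosen only to illustrate the scale.

```python
def count_conformations(residues, states_per_residue):
    """Number of conformations if every residue independently picks a state."""
    return states_per_residue ** residues


def enumeration_years(conformations, per_second=1e12):
    """Years needed to visit every conformation at an assumed rate."""
    seconds = conformations / per_second
    return seconds / (365.25 * 24 * 3600)


n = count_conformations(residues=100, states_per_residue=3)
print(f"{n:.2e} conformations")                  # about 5e47
print(f"{enumeration_years(n):.1e} years at 1e12 per second")
```

Even at that rate the enumeration would take on the order of 10^28 years, so exhaustive search is ruled out regardless of hardware.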

60
A Blind Golfer's view of global optimization I
How do you hit a hole-in-one, when you can't even
see the hole ? How do you hit 18 holes-in-one in
a row ?
61
A Blind Golfer's view of global optimization II
Change the shape of the golf course !
62
An analysis of why the Blind Golfer's strategy
works
[Figure: panels (a) and (b)]
Local improvements in position (a) lead to
incremental improvements in energy (b) !!!
63
How does nature fold proteins ?
The funnel model reconciles the thermodynamic and
the kinetic view !
[Figure axes: q, ΔG]
In a flat folding landscape, a thermodynamic
minimum is kinetically inaccessible.
An ideal funnel results in fast, two-state
folding through many possible pathways.
But ...
Dill KA & Chan HS (1997) From Levinthal to
pathways to funnels. Nature Struct Biol 4: 10-19
64
How does nature fold proteins ?
Real folding landscapes appear to be more complex
- robust folding is possible, but so are
populated intermediate states and kinetic traps.
What does this mean for promising computational
strategies?
  • To the degree that folding is under
    thermodynamic control: direct inference of
    structure is possible.
  • To the degree that folding is under kinetic
    control: simulation of the folding pathway is
    required.
Dill KA & Chan HS (1997) From Levinthal to
pathways to funnels. Nature Struct Biol 4: 10-19
65
How to solve hard problems
  • Simplification
  • Brute force
  • Branch and bound
  • Heuristics
  • Local optimization
  • Simulated annealing
  • Genetic algorithms
  • Neural networks
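As an illustration of one entry in this list, here is a minimal simulated-annealing sketch on a toy one-dimensional landscape: a broad parabola (the funnel) with a sinusoidal ripple (the local traps). The energy function, the cooling schedule and all parameter values are assumptions chosen for the example; nothing here is specific to proteins.

```python
import math
import random


def rugged_funnel(x):
    """Toy energy: a broad funnel with many local minima; never below -1."""
    return 0.05 * x * x + math.sin(3.0 * x)


def simulated_annealing(energy, x0, steps=5000, t_start=2.0, t_end=0.01,
                        step_size=0.5, seed=0):
    """Metropolis search with a geometrically decreasing temperature.

    Returns the best position and energy seen during the run.
    """
    if steps < 2:
        raise ValueError("need at least two steps")
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(steps):
        # Temperature falls from t_start to t_end over the run.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        candidate = x + rng.uniform(-step_size, step_size)
        e_candidate = energy(candidate)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability so the walk can leave local minima.
        if e_candidate <= e or rng.random() < math.exp(-(e_candidate - e) / t):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = simulated_annealing(rugged_funnel, x0=8.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

Setting the temperature to a very small constant turns the same loop into plain local optimization, which gets stuck in the first ripple it meets; the uphill moves at high temperature are what let the search follow the overall funnel.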

66
Is structure prediction NP hard ?
Not necessarily: nature does it in P.
A problem that is NP-hard in principle can be P
in practice. This is the significance of the
protein folding funnel. Search for local
solutions - subproblems !
67
Why is structure prediction hard ?
  • Appropriate scoring functions
  • Reasonable structure generation
  • Working approaches

68
Ab initio prediction
I-sites: sequence-structure motifs
HMMSTR: Hidden Markov Model 2°-structure
prediction
Rosetta: Monte Carlo fragment-move-based
structure generation, Bayesian conditional
probability scoring function
Bystroff, C. & Shao, Y. (2002) Fully automated
protein structure prediction using ISITES, HMMSTR
and ROSETTA. Bioinformatics 18 S1: S54-S61
69
Ab initio prediction
What can you expect ?
50 residues < 6 Å RMSD
20% of proteins globally topologically correct
60% of proteins with partially topologically
correct substructures
[Figure labels: RMSD 5.9 Å, RMSD 5.9 Å, RMSD 5.9 Å]
Bystroff, C. & Shao, Y. (2002) Fully automated
protein structure prediction using ISITES, HMMSTR
and ROSETTA. Bioinformatics 18 S1: S54-S61
70
An ab initio prediction server on the WWW
http://robetta.bakerlab.org
71
Open Issues
  • Scoring functions
  • radius of convergence ...
  • Workflow
  • what will you do with the results ?