Homology Modeling (advanced) - PowerPoint PPT Presentation

Title: Homology Modeling (advanced)


1
Homology Modeling (advanced)
  • Boris Steipe
  • University of Toronto
  • boris.steipe_at_utoronto.ca

2
Concepts
  • Review of homology modeling basics
  • Multiple sequence alignment revisited
  • Modeling goals revisited
  • Modeling methods revisited
  • Modeling problems and energy minimization
  • Conclusions

3
Concept 1
  • Sequence alignment is the single most important
    step in homology modeling.

4
Alignment is the limiting step for homology model
accuracy
No amount of forcefield minimization will put a
misaligned residue in the right place!
HOMSTRAD @ CASP4: Williams MG et al. (2001)
Proteins Suppl. 5: 92-97
5
Superposition vs. Alignment
  • The coordinates of two proteins can be
    superimposed in space.
  • An alignment may be derived from a superposition
    by correlating residues that are close in space.
  • An optimal sequence alignment may lead to a
    different alignment ...

Superposition of 1GTR and 2TS1
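The second bullet above (deriving an alignment from a superposition) can be sketched in a few lines. This is a minimal illustration only, assuming both structures are already superimposed and reduced to C-alpha coordinates. The function name, the 3.0 Å cutoff and the toy coordinates are assumptions made for this example, not taken from the slides.

```python
import math

def pairs_from_superposition(coords_a, coords_b, cutoff=3.0):
    """Pair residues of two superimposed structures that lie close in space.

    coords_a, coords_b: lists of (x, y, z) C-alpha positions in a common
    frame (i.e. already superimposed). For each residue i of A, the
    nearest residue j of B is reported if it lies within `cutoff`.
    """
    pairs = []
    for i, a in enumerate(coords_a):
        best_j, best_d = None, cutoff
        for j, b in enumerate(coords_b):
            d = math.dist(a, b)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs

# Toy data (hypothetical): B carries one extra residue (index 1) that
# bulges away from A, as an insert would.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.2, 0.0, 0.0), (2.0, 5.0, 0.0), (3.9, 0.3, 0.0), (7.5, 0.1, 0.0)]
print(pairs_from_superposition(a, b))
```

The sketch does not enforce sequential order or one-to-one pairing, which a real structure alignment program would have to do. It only shows that the pairing is driven by spatial proximity, not by residue identity.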
6
Superposition vs. Alignment
TyrRS    ERVTLYCGFDPTAdS--LHIGHLATILTMRRFQQAGHRPIALVGGAtgligdpsgkkser
1GTR 26  TTVHTRFPPEPNG-YLHIGHAKSICL--NF---------------GIAqDYKGQCN--
2TS1 29  ERVTLYCGFDPTAdSLHIGHLATILT--MR---------------RFQ-QAGHRPI--

TyrRS    tlnaketVEAWSARIKEQLgrfldfeadgnpa----------------k--------IKN
1GTR 26  ----------------------LRFD-DTnpv----------------keDIEYVESIKN
2TS1 29  ----------------------ALVG-GAtgligdpsgkksertlnaketVEAWSARIKE

TyrRS    NYDWIgpldvitflrdvgk----hfsvnymmakesvqsrietgisftefsYMMLQAYDFL
1GTR 26  DVewl------------gf----hwsgnVRYSSD---------------------YFdql
2TS1 29  QLgrf------------ldfeadgnpakIKNNYD---------------------WIgpl

TyrRS    RLYetegCRLQIGGSDQwgnitaGL--------ELIRKTKgearAFGLTIPLV
1GTR 26  hayaie-------------linkglayvdeltpeqireyrgtltqpgknspyrdrsveen
2TS1 29  dvitfl-------------rdvgkhfsvnym-----------------------------

TyrRS
1GTR 26  lalfekmraggfeegkaclrakidmaspfivmrdpvlyrikfaehhqtgnkwciypmYDF
2TS1 29  -------------------------------------makesvqsrietgisftefsYMM

TyrRS
1GTR 26  THCISDALEG----ITHSLCTLEFqdnrrlYDWVLDNITipvhPRQYEFSRL 262
2TS1 29  LQAYDFLRLYetegCRLQIGGSDQwgnitaGLELIRKTKgearAFGLTIPLV 223
Example: E. coli GlnRS (1GTR) and G.
stearothermophilus TyrRS (2TS1). Although the
optimal sequence alignment (top/middle) is not
unreasonable (19% ID, 40/212 residues; similar
function; ATP binding motif conserved (box)),
comparison with the structure shows it is
actually wrong for all but 11 residues! The
superposition-based alignment (middle/bottom) is
quite dissimilar in sequence (4.5% ID, 12/265
residues), but the superposition actually matches
39% of residues (104/265) as pairs in space
over the length of the domain.
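Percent identity figures like those quoted above are easy to recompute from two rows of an alignment. The sketch below is one possible way to do it. The function name and the choice of denominator (columns without a gap in either row) are assumptions for illustration. The slide's own figures use fixed lengths (212 and 265 residues) as the denominator.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (identical, aligned, percent) for two rows of an alignment.

    Only columns in which neither row has a gap count as aligned.
    The comparison ignores case, because the alignment on the slide
    mixes upper- and lower-case residues.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    aligned = identical = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue
        aligned += 1
        if x.upper() == y.upper():
            identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return identical, aligned, percent

# Short fragment around the conserved HIGH motif, used as toy input.
print(percent_identity("LHIGHLATIL", "LHIGHAKSIC"))
```

With the slide's counts, 40/212 is about 19% and 12/265 about 4.5%. Whichever denominator is used, it should be stated alongside the percentage.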
7
Inserts may be accommodated in a distant part of
the structure
Example - a five residue insert
  • Sequence alignment (shows what happened)
  • gktlit-----nfsqehip
  • gktlisflyeqnfsqehip
  • Structure alignment (shows how it's accommodated)
  • gktlitnfsq-----ehip
  • gktlisflyeqnfsqehip

α-helix
8
Indels (inserts or deletions)
  • Comparisons of alignments and structures
    demonstrate that uniform gap penalty assumptions
    are NOT BIOLOGICAL.
  • Indels are most often observed in loops, less
    often in secondary structure elements
  • When they do not occur in loops, there is
    frequently a maintenance of helical or strand
    properties.

9
Can we do better than using a uniform gap
assumption?
  • Required: position-specific gap penalties
  • One approach: implemented in Clustal as secondary
    structure masks
  • Get secondary structure information, convert it
    to Clustal mask format. (Easy - read the
    documentation!)
  • Alternatively, use a manual sequence alignment
    editor to move gaps out of secondary structure
    regions.

However: this is "automatically" achieved by
modern multiple sequence alignment programs.
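The idea of position-specific gap penalties can be illustrated as follows. This is a hypothetical sketch, not Clustal's mask format or its actual penalty scheme. The function name, the one-letter secondary structure codes and the numeric values are assumptions chosen for the example.

```python
def gap_open_penalties(secondary_structure, base=10.0, factor=4.0):
    """Return one gap-opening penalty per sequence position.

    secondary_structure: string with 'H' (helix) and 'E' (strand);
    any other character is treated as loop/coil. Positions inside
    helices or strands get base * factor, all others get base, so an
    aligner using these values prefers to place gaps in loops.
    """
    return [base * factor if s in "HE" else base
            for s in secondary_structure]

ss = "CCHHHHHHCCCEEEECC"   # hypothetical secondary structure string
for state, penalty in zip(ss, gap_open_penalties(ss)):
    print(state, penalty)
```

An aligner would look up the penalty for the position where a gap is about to be opened instead of using one global constant.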
10
Concept 4
  • SwissModel in practice.

11
Homology Modeling Process
[Flowchart nodes: TAR, PSI-BLAST, Search, nr (PDB), MSA, Align, Cinema, HOM, SwissModel, Model, ExPDB, TEM, LIG, 3D, TextEditor, Complete, 3DC, RasMol, Analyse, Consurf, PUB]
These are really two queries rolled into one
procedure.
Legend:
  • TAR: Target sequence
  • Search: Sequence database similarity search
  • nr: non-redundant Genbank subset (with annotated
    structures)
  • HOM: Homologous sequences
  • TEM: Sequences of homologues with known structure
  • Align: Careful Multiple Sequence Alignment
  • MSA: Multiple Sequence Alignment
  • Model: Generate 3D Model
  • ExPDB: Modeling template structure database
  • Complete: Add ligands, substrates etc. to model
  • Analyse: Interpret and conclude
  • PUB: Publish results
12
SwissModel ... first approach mode
http://www.expasy.org/swissmod
13
Uses of structure revisited - I
  • Prototype 1: Analytical
  • Explain mechanistic aspects of protein.
  • (e.g. in terms of)
  • residues involved in catalysis
  • global properties (like electrostatics)
  • shape, relative orientation and distances of
    domains or subdomains
  • flexibility and dynamics - e.g. hypothesizing
    about the rate limiting step

14
Uses of structure revisited - II
  • Prototype 2: Comparative
  • Bring conservation patterns into a spatial
    context in order to infer causality from
    (database) correlations.
  • (e.g. in terms of)
  • describing context-specific conservation patterns
    and analyzing these according to conserved
    properties
  • analyzing the predicted effect of sequence
    variation (e.g. for engineering changes, fusing
    domains or predicting SNP effects)
  • distinguishing physiological vs. nonphysiological
    interactions

15
Item 2
Multiple Sequence Alignment Revisited
16
Current State of the Art: ProbCons
ProbCons is a novel tool for generating multiple
alignments of protein sequences. Using a
combination of probabilistic modeling and
consistency-based alignment techniques, ProbCons
has achieved the highest accuracies of all
alignment methods to date. On the BAliBASE
benchmark alignment database, alignments produced
by ProbCons show statistically significant
improvement over current programs, containing an
average of 7% more correctly aligned columns than
those of T-Coffee, 11% more correctly aligned
columns than those of CLUSTAL W, and 14% more
correctly aligned columns than those of DIALIGN.
http://probcons.stanford.edu
Do, C.B., Mahabhashyam, M.S.P., Brudno, M., and
Batzoglou, S. 2005. PROBCONS: Probabilistic
Consistency-based Multiple Sequence
Alignment. Genome Research 15: 330-340.
17
Item 3
Modeling Goals Revisited
18
A homology model is ...
  • A 3-D map that integrates information on
  • evolutionary conservation of structures
  • a protein sequence
  • principles of protein structure

Always ask: where does the information come from
... and how reliable is it?
19
What is a homology model useful for?
Goal: Biochemical inference from 3D similarity
  • Bonds
  • Angles, plane and dihedral
  • Surfaces, solvent accessibility
  • Amino acid functions, presence in structure
    patterns
  • Spatial relationship of residues to active site
  • Spatial relationship to other residues
  • Participation in function / mechanism
  • Static and dynamic disorder
  • Electrostatics
  • Conservation patterns (structural and functional)
  • Plausibility of posttranslational modification
    sites
  • Suitability as drug target

Unreliable
Primary use
Educated guesswork
... but you can't predict the structural
consequences of posttranslational modifications!
20
Abuse of homology models
  • Modelling properties that cannot / will not be
    verified
  • Analysing geometry of model
  • Interpreting loop structures near indels
  • Inferring relative domain arrangement
  • Inferring structures of complexes

Homology models map information from a sequence
alignment into 3D space. They cannot be used to
"predict structure".
21
Databases of Models
  • Don't make models unless you check first ...
  • Swiss-Model repository
  • 64,000 models based on 4000 structures and
    Swiss-Prot proteins
  • ModBase
  • Made with "Modeller" - 15,000 reliable models for
    substantial segments of approximately 4,000
    proteins in the genomes of Saccharomyces
    cerevisiae, Mycoplasma genitalium, Methanococcus
    jannaschii, Caenorhabditis elegans, and
    Escherichia coli.
  • 3D crunch
  • 1998 large scale modeling experiment

22
http://modbase.compbio.ucsf.edu/modbase-cgi-new/index.cgi
23
http://swissmodel.expasy.org/repository/
24
http://www.expasy.ch/swissmod/SM_3DCrunch.html
25
Example: Interpreting peptide scans
Peptides. 2005; 26(3): 395-404. Identification
of immunodominant regions of Brassica juncea
glyoxalase I as potential antitumor
immunomodulation targets. Deswal R, Singh R, Lynn
AM, Frank R.
Goals: Validate exposed properties of
immuno-reactive peptides identified by peptide
scanning.
Methods: Simple threading of sequence on human
homologue
26
Example: Comparative Drug Design
Caffrey CR, Placha L, Barinka C, Hradilek M,
Dostal J, Sajid M, McKerrow JH, Majer P,
Konvalinka J, Vondrasek J. Homology modeling and
SAR analysis of Schistosoma japonicum cathepsin D
(SjCD) with statin inhibitors identify a unique
active site steric barrier with potential for the
design of specific inhibitors. Biol Chem. 2005
Apr; 386(4): 339-49.
Goals: Compare active sites to obtain hints for
drug design
Methods: Homology modeling of the S. japonicum
sequence on the human structure with a commercial
package (Insight, Accelrys); extensive energy
minimization.
Comments: Questionable results. Inappropriate
method, and residues that are identified actually
appear conserved (F/F, M/I).
27
Example: Inferring complexes (I)
High-quality homology models derived from NMR and
X-ray structures of E. coli proteins YgdK and SufE
suggest that all members of the YgdK/SufE
protein family are enhancers of cysteine
desulfurases. Protein Sci. 2005
Jun; 14(6): 1597-608. Liu G, Li Z, Chiang Y, Acton
T, Montelione GT, Murray D, Szyperski T.
The structural biology of proteins mediating
iron-sulfur (Fe-S) cluster assembly is central
for understanding several important biological
processes. Here we present the NMR structure of
the 16-kDa protein YgdK from Escherichia coli,
which shares 35% sequence identity with the E.
coli protein SufE. The SufE X-ray crystal
structure was solved in parallel with the YgdK
NMR structure in the Northeast Structural
Genomics (NESG) consortium. Both proteins are (1)
key components for Fe-S metabolism, (2) exhibit
the same distinct fold, and (3) belong to a
family of at least 70 prokaryotic and eukaryotic
sequence homologs. Accurate homology models were
calculated for the YgdK/SufE family based on YgdK
NMR and SufE crystal structure. Both structural
templates contributed equally, exemplifying
synergy of NMR and X-ray crystallography. SufE
acts as an enhancer of the cysteine desulfurase
activity of SufS by SufE-SufS complex formation.
A homology model of CsdA, a desulfurase encoded
in the same operon as YgdK, was modeled using the
X-ray structure of SufS as a template. Protein
surface and electrostatic complementarities
strongly suggest that YgdK and CsdA likewise form
a functional two-component desulfurase complex.
Moreover, structural features of YgdK and SufS,
which can be linked to their interaction with
desulfurases, are conserved in all homology
models. It thus appears very likely that all
members of the YgdK/SufE family act as enhancers
of SufS-like desulfurases. The present study
exemplifies that "refined" selection of two (or
more) targets enables high-quality homology
modeling of large protein families.
28
Example: Inferring complexes (II)
Methods: "Nest" (http://honiglab.cpmc.columbia.edu/)
Comments: Structural similarity of models is NOT
a sign of accuracy!
29
Example: Annotation of Function
Saunders NF, Goodchild A, Raftery M, Guilhaus M,
Curmi PM, Cavicchioli R. Predicted roles for
hypothetical proteins in the low-temperature
expressed proteome of the Antarctic archaeon
Methanococcoides burtonii. J Proteome Res.
2005; 4(2): 464-72.
Goals: Derive functional annotation
Methods: InterProScan, Prospect, prediction of
subcellular localization (secretion),
visualization of conserved genomic context
Comments: Difficult challenge (archaeon!). Well
done, state-of-the-art analysis gives functional
information for 55/135 novel proteins. (See
also http://psychro.bioinformatics.unsw.edu.au/)
30
Synopsis of Goals
Valid goals use 3-D models as a map of
information on conservation, such as spatial
proximity and surface exposure of
residues. Poorly stated goals attempt to
interpret details of geometry.
31
Item 4
Modeling Methods Revisited
32
Homology Modeling Software?
  • Freely available packages perform as well as
    commercial ones at CASP (Critical Assessment of
    Structure Prediction)
  • Swiss Model (see February's Integrated
    Assignment)
  • Modeller (http://guitar.rockefeller.edu)
  • others ...

33
Swissmodel in comparison
3D-Crunch Experiment: 211,000 sequences → 64,000
models
  • >50% seqID → 1 Å RMSD
  • 40-49% seqID → 63% < 3 Å
  • 25-29% seqID → 49% < 4 Å
Manual alternatives: Modeller ...
Automatic alternatives: SwissModel, sdsc1, 3djigsaw,
pcomb_pcons, cphmodels, easypred
First place for RMSD and correctly
aligned, second place for coverage
Guex et al. (1999) TIBS 24: 365-367; EVA: Eyrich et
al. (2001) Bioinformatics 17: 1242-1243
(http://cubic.bioc.columbia.edu/eva)
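The RMSD values quoted on this slide measure the root-mean-square distance between equivalent atoms of model and reference structure. A minimal sketch of that calculation follows, assuming the two coordinate sets are already superimposed and listed in the same atom order. The function name and the toy coordinates are assumptions for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) positions. No superposition is performed here: the
    coordinates are assumed to be in a common frame already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy data (hypothetical): every model atom is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rmsd(model, reference))
```

Because deviations are squared before averaging, a few badly placed residues (for example near an indel) can dominate the value.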
34
Comparison of Approaches
Wallner B, Elofsson A. All are not equal: a
benchmark of different homology modeling
programs. Protein Sci. 2005; 14(5): 1315-27.
35
Item 5
Modeling Problems
36
Homology Modeling in Practice
How to assess model reliability?
  • All indels are wrong
  • Structure analysis ("threading", "solvent
    accessibility", compatibility with ligands) can
    point out possible alignment errors
  • But there is no point in "repairing"
    stereochemistry; only review the alignment.
37
Homology Modeling in Practice
Can you predict function from your model? No
(and yes) - the model may be incompatible with a
specific function.
38
Homology Modeling in Practice
Evaluation of errors: "We found that 'through
space' proximity to gaps and chain termini, local
three-dimensional 'density', three-dimensional
environment conservation, and B-factor of the
template contribute to local deviations in the
backbone in addition to local sequence identity."
Comput Chem. 2000; 24(1): 13-31. Estimating local
backbone structural deviation in homology models.
Cardozo T, Batalov S, Abagyan R.
39
Can energy minimization correct errors?
40
Energy Minimization (slides by David Wishart)
41
Energy Minimization
  • Efficient way of polishing and shining your
    protein model
  • Removes atomic overlaps and unnatural strains in
    the structure
  • Stabilizes or reinforces strong hydrogen bonds,
    breaks weak ones
  • Brings protein to lowest energy in about 1-2
    minutes CPU time

42
Energy Minimization (Theory)
  • Treat the protein molecule as a set of balls (with
    mass) connected by rigid rods and springs
  • Rods and springs have empirically determined
    force constants
  • Allows one to treat atomic-scale motions in
    proteins as classical physics problems (OK
    approximation)

43
Standard Energy Function
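$$
E = K_r (r_i - r_j)^2 + K_\theta (\theta_i - \theta_j)^2 + K_\phi (1 - \cos(n\phi_j))^2
  + \frac{q_i q_j}{4 \pi \varepsilon r_{ij}} + \frac{A_{ij}}{r^6} - \frac{B_{ij}}{r^{12}} + \frac{C_{ij}}{r^{10}} - \frac{D_{ij}}{r^{12}}
$$
Terms, in order: bond length, bond bending, bond torsion, Coulomb, van
der Waals, H-bond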
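To make the individual terms concrete, the sketch below evaluates the three bonded terms and the Coulomb term using the functional forms as they appear on the slide. Several points are assumptions: the slide does not define r_i and r_j (or the two angles), so the code treats them as an actual and a reference value. All function names and numeric parameters are made up for illustration. The van der Waals and H-bond terms are left out for brevity. Real force fields such as AMBER or CHARMM use their own functional forms and parameter sets.

```python
import math

def bond_stretch(k_r, r, r_ref):
    """Bond length term: K_r (r_i - r_j)^2."""
    return k_r * (r - r_ref) ** 2

def bond_bend(k_theta, theta, theta_ref):
    """Bond angle term: K_theta (theta_i - theta_j)^2, angles in radians."""
    return k_theta * (theta - theta_ref) ** 2

def bond_torsion(k_phi, n, phi):
    """Torsion term as written on the slide: K_phi (1 - cos(n phi))^2."""
    return k_phi * (1.0 - math.cos(n * phi)) ** 2

def coulomb(q_i, q_j, r_ij, epsilon=1.0):
    """Electrostatic term: q_i q_j / (4 pi epsilon r_ij)."""
    return q_i * q_j / (4.0 * math.pi * epsilon * r_ij)

# Hypothetical parameter values, chosen only to show the arithmetic.
terms = {
    "stretch": bond_stretch(k_r=300.0, r=1.55, r_ref=1.53),
    "bend": bond_bend(k_theta=50.0, theta=math.radians(112.0),
                      theta_ref=math.radians(109.5)),
    "torsion": bond_torsion(k_phi=1.5, n=3, phi=math.radians(60.0)),
    "coulomb": coulomb(q_i=0.4, q_j=-0.4, r_ij=3.0),
}
for name, value in terms.items():
    print(name, value)
print("total", sum(terms.values()))
```

The total energy is simply the sum of such terms over all bonds, angles, torsions and atom pairs, which is what the minimization methods on the following slides try to lower.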
44
Energy Terms
[Figure labels: r, φ, θ]
  • Stretching: $K_r (r_i - r_j)^2$
  • Bending: $K_\theta (\theta_i - \theta_j)^2$
  • Torsional: $K_\phi (1 - \cos(n\phi_j))^2$
45
Energy Terms
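[Figure labels: r, r, r]
  • Coulomb: $\frac{q_i q_j}{4 \pi \varepsilon r_{ij}}$
  • van der Waals: $\frac{A_{ij}}{r^6} - \frac{B_{ij}}{r^{12}}$
  • H-bond: $\frac{C_{ij}}{r^{10}} - \frac{D_{ij}}{r^{12}}$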
46
An Energy Surface
[Figure: energy surface in overhead view and side view, with high-energy and low-energy regions marked]
47
A More Realistic Protein Energy Surface
The Folding Funnel
48
Minimization Methods
  • Energy surfaces for proteins are complex
    hyperdimensional spaces
  • The biggest problem is escaping from local
    minima
  • Simple methods (slow) to complex methods (fast)
  • Monte Carlo Method
  • Steepest Descent
  • Conjugate Gradient

49
Monte Carlo Algorithm
  • Generate a conformation or alignment (a state)
  • Calculate that state's energy or score
  • If that state's energy is less than the previous
    state's, accept that state and go back to step 1
  • If that state's energy is greater than the
    previous state's, accept it if a randomly chosen
    number is < e^(-ΔE/kT), where ΔE is the energy
    increase over the previous state; otherwise reject it
  • Go back to step 1 and repeat until done
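The steps listed above can be sketched as a short Metropolis-style loop. This is an illustration under stated assumptions, not the implementation of any particular package. The one-dimensional toy energy function, the step size, the value of kT and all names are made up for the example, and the acceptance test uses the energy difference between the new and the previous state.

```python
import math
import random

def energy(x):
    """Toy 1-D energy surface with two minima; the deeper one lies at
    negative x, the shallower one at positive x."""
    return x ** 4 - 2.0 * x ** 2 + 0.3 * x

def metropolis_accept(delta_e, kt, rng):
    """Accept downhill moves always, uphill moves with probability
    exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)

def monte_carlo_minimize(x0, steps=5000, step_size=0.5, kt=0.3, seed=0):
    rng = random.Random(seed)          # seeded for reproducibility
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # generate a state
        e_new = energy(x_new)                           # score it
        if metropolis_accept(e_new - e, kt, rng):       # accept or reject
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Start in the shallower minimum; uphill moves allow the walk to leave it.
print(monte_carlo_minimize(x0=1.0))
```

The occasional acceptance of uphill moves is what lets the search cross barriers between local minima. With kT close to zero the procedure degenerates into a purely downhill random search.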

50
Conformational Sampling
[Figure: sampled conformations labeled highest energy, mid-energy, lower energy, lowest energy]
51
Monte Carlo Minimization
[Figure: energy surface with high-energy and low-energy regions marked]
Performs a progressive or directed random search
52
Steepest Descent & Conjugate Gradients
  • Frequently used for energy minimization of large
    (and small) molecules
  • Ideal for calculating minima for complex (i.e.
    non-linear) surfaces or functions
  • Both use derivatives to calculate the slope and
    direction of the optimization path
  • Both require that the scoring or energy function
    be differentiable (smooth)

53
Steepest Descent Minimization
[Figure: energy surface with high-energy and low-energy regions marked]
Makes small locally steep moves down gradient
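A minimal sketch of steepest descent on a toy two-dimensional energy surface is given below. The quadratic energy function, the fixed step length and all names are assumptions chosen so that the example stays short. Real minimizers work on thousands of coordinates and usually adapt the step length.

```python
def energy(x, y):
    """Toy energy surface: an elongated bowl with its minimum at (1, -2)."""
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2

def gradient(x, y):
    """Analytical first derivatives of the toy energy."""
    return 2.0 * (x - 1.0), 20.0 * (y + 2.0)

def steepest_descent(x, y, step=0.04, tol=1e-8, max_iter=10000):
    """Repeatedly move a small step against the local gradient until
    the gradient norm drops below `tol`."""
    n_steps = 0
    while n_steps < max_iter:
        gx, gy = gradient(x, y)
        if gx * gx + gy * gy < tol * tol:
            break
        x -= step * gx
        y -= step * gy
        n_steps += 1
    return x, y, n_steps

x_min, y_min, n_steps = steepest_descent(0.0, 0.0)
print(x_min, y_min, n_steps)
```

On an elongated surface like this one the method needs many small steps, because the step length is limited by the steep direction while progress along the shallow direction is slow.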
54
Conjugate Gradient Minimization
[Figure: energy surface with high-energy and low-energy regions marked]
Includes information about the prior history of
path
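How the "prior history" enters can be seen in the sketch below, which applies a conjugate gradient scheme with the Fletcher-Reeves update to a toy quadratic energy, (x - 1)^2 + 10 (y + 2)^2. The energy function, the use of an exact line minimum (possible here because the Hessian of a quadratic is known) and all names are assumptions for illustration. Molecular mechanics programs use a numerical line search instead.

```python
def gradient(p):
    """Gradient of the toy energy (x - 1)^2 + 10 (y + 2)^2."""
    x, y = p
    return [2.0 * (x - 1.0), 20.0 * (y + 2.0)]

def hessian_times(d):
    """Product of the (constant) Hessian diag(2, 20) with a vector."""
    return [2.0 * d[0], 20.0 * d[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(p, tol=1e-10, max_iter=50):
    g = gradient(p)
    d = [-gi for gi in g]              # first direction: steepest descent
    n_steps = 0
    while dot(g, g) > tol * tol and n_steps < max_iter:
        # Exact minimum along direction d (valid for a quadratic surface).
        alpha = -dot(g, d) / dot(d, hessian_times(d))
        p = [pi + alpha * di for pi, di in zip(p, d)]
        g_new = gradient(p)
        # Fletcher-Reeves factor: how much of the previous direction to keep.
        beta = dot(g_new, g_new) / dot(g, g)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
        n_steps += 1
    return p, n_steps

p_min, n_steps = conjugate_gradient([0.0, 0.0])
print(p_min, n_steps)
```

For a quadratic surface in two dimensions the method reaches the minimum in two steps in exact arithmetic, because each new direction is built from the current gradient plus a fraction of the previous direction.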
55
Energy Minimization (end of slides by David
Wishart)
  • Very complex programs that have taken years to
    develop and refine
  • Several freeware options to choose from
  • XPLOR (Axel Brunger, Yale)
  • GROMACS (Groningen, The Netherlands)
  • AMBER (Peter Kollman, UCSF)
  • CHARMM (Martin Karplus, Harvard)
  • TINKER (Jay Ponder, Wash U)

56
However ... (CASP5 (2002) - State of the art in
homology modeling)
[Figure labels: The good; The ugly; better; worse than template; shocking!]
Coordinate manipulations do not improve accuracy!
Remote sequence similarity detection methods have
improved.
Tramontano A, Morea V (2003) Assessment of
homology-based predictions in CASP5. Proteins
S6: 352-368
57
Can energy minimization correct errors?
Apparently not: errors are avoided by better
alignment, judicious choice of templates and
careful interpretation, considering the
limitations of the method.