Lab 9.3: Homology Modeling

1
Lab 9.3: Homology Modeling
  • Boris Steipe
  • boris.steipe@utoronto.ca
    http://biochemistry.utoronto.ca/steipe
  • Departments of Biochemistry and Molecular and
    Medical Genetics
  • Program in Proteomics and Bioinformatics
  • University of Toronto

2
http://creativecommons.org/licenses/by-sa/2.0/
3
Concepts
  • Sequence alignment is the single most important
    step in homology modeling.
  • Reasons to model need to be defined.
  • Fully automated homology modeling services
    perform well.
  • SwissModel in practice.

4
Concept 1
  • Sequence alignment is the single most important
    step in homology modeling.

5
What is conserved in structure?
E-E.coli    ... IKTRFAPSPTGYLHVGGARTA ... EQMAKGE----KPRYDGRC ... AHVSMINGDDGKKLSKRH
E-P.putida  ... VRTRIAPSPTGDPHVGTAYIA ... EQQARGE----TPRYDGRA ... CYMPLLRNPDKSKLSKRK
Q-E.coli    ... VHTRFPPEPNGYLHIGHAKSI ... TLTQPGKNSPYRDRSVEEN ... YEFSRL-NLEYTVMSKRK
Q-Fly       ... VHTRFPPEPNGILHIGHAKAI ... FNPKPS---PWRERPIEES ... WEYGRL-NMNYALVSKRK
Q-Human     ... VRTRFPPEPNGILHIGHAKAI ... HNTLPS---PWRDRPMEES ... WEYGRL-NLHYAVVSKRK
E-Fly       ... VVVRFPPEASGYLHIGHAKAA ... QRVE----SANRSNSVEKN ... WSYSRL-NMTNTVLSKRK
E-Human     ... VTVRFPPEASGYLHIGHAKAA ... QRIE----SKHRKNPIEKN ... WEYSRL-NLNNTVLSKRK
E-Yeast     ... VVTRFPPEPSGYLHIGHAKAA ... DGVA----SARRDRSVEEN ... WDFARI-NFVRTLLSKRK
ATP-Binding

QRS E. coli vs. ERS P. putida: ~19% ID
Many regions are expected to be highly conserved in structure.
Some changes should be straightforward to model.
6
What is conserved in structure?
E-E.coli    ... IKTRFAPSPTGYLHVGGARTA ... EQMAKGE----KPRYDGRC ... AHVSMINGDDGKKLSKRH
E-P.putida  ... VRTRIAPSPTGDPHVGTAYIA ... EQQARGE----TPRYDGRA ... CYMPLLRNPDKSKLSKRK
Q-E.coli    ... VHTRFPPEPNGYLHIGHAKSI ... TLTQPGKNSPYRDRSVEEN ... YEFSRL-NLEYTVMSKRK
Q-Fly       ... VHTRFPPEPNGILHIGHAKAI ... FNPKPS---PWRERPIEES ... WEYGRL-NMNYALVSKRK
Q-Human     ... VRTRFPPEPNGILHIGHAKAI ... HNTLPS---PWRDRPMEES ... WEYGRL-NLHYAVVSKRK
E-Fly       ... VVVRFPPEASGYLHIGHAKAA ... QRVE----SANRSNSVEKN ... WSYSRL-NMTNTVLSKRK
E-Human     ... VTVRFPPEASGYLHIGHAKAA ... QRIE----SKHRKNPIEKN ... WEYSRL-NLNNTVLSKRK
E-Yeast     ... VVTRFPPEPSGYLHIGHAKAA ... DGVA----SARRDRSVEEN ... WDFARI-NFVRTLLSKRK
ATP-Binding

How would sidechain rotamers be modeled?
  • conserved dihedral angles
  • preferred rotamers
  • DEE (Dead End Elimination theorem) for global
    consistency
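
The Dead End Elimination idea named on this slide can be sketched on a toy problem. This is a minimal illustration of the simple single-rotamer elimination criterion only: a rotamer is discarded when its best-case energy is still worse than the worst-case energy of another rotamer at the same position. The function name `dee_prune`, the tables `self_e` and `pair_e`, and all energy values are invented for illustration; real side-chain packers use stronger criteria and then search the surviving rotamers.

```python
from itertools import product

def dee_prune(self_e, pair_e):
    """Iteratively apply the simple DEE criterion.

    self_e[i][r]         : self energy of rotamer r at position i
    pair_e[(i, j)][r][s] : pair energy of rotamer r at i and rotamer s at j (i < j)
    Returns, per position, the rotamer indices that survive.
    """
    n = len(self_e)
    alive = [list(range(len(e))) for e in self_e]

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def bound(i, r, pick):
        # Self energy plus the best (min) or worst (max) interaction
        # with the surviving rotamers of every other position.
        return self_e[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j]) for j in range(n) if j != i
        )

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                # r is a dead end if some rotamer t beats it even in t's worst case.
                if any(bound(i, r, min) > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].remove(r)
                    changed = True
    return alive

def total_energy(self_e, pair_e, choice):
    e = sum(self_e[i][r] for i, r in enumerate(choice))
    return e + sum(pair_e[(i, j)][choice[i]][choice[j]] for (i, j) in pair_e)

# Toy system: three positions with two or three rotamers each (invented energies).
self_e = [[0.0, 5.0], [1.0, 0.5, 9.0], [0.0, 0.2]]
pair_e = {
    (0, 1): [[0.0, 0.3, 1.0], [0.5, 0.2, 0.1]],
    (0, 2): [[0.1, 0.4], [0.3, 0.0]],
    (1, 2): [[0.2, 0.0], [0.0, 0.6], [0.5, 0.5]],
}
alive = dee_prune(self_e, pair_e)
best = min(product(*[range(len(e)) for e in self_e]),
           key=lambda c: total_energy(self_e, pair_e, c))
print("surviving rotamers:", alive)
print("brute-force optimum:", best)
```

The brute-force search is only there as a check: the globally best combination must always be among the rotamers that survive pruning.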
7
Homology Modeling Issues
E-E.coli    ... IKTRFAPSPTGYLHVGGARTA ... EQMAKGE----KPRYDGRC ... AHVSMINGDDGKKLSKRH
E-P.putida  ... VRTRIAPSPTGDPHVGTAYIA ... EQQARGE----TPRYDGRA ... CYMPLLRNPDKSKLSKRK
Q-E.coli    ... VHTRFPPEPNGYLHIGHAKSI ... TLTQPGKNSPYRDRSVEEN ... YEFSRL-NLEYTVMSKRK
Q-Fly       ... VHTRFPPEPNGILHIGHAKAI ... FNPKPS---PWRERPIEES ... WEYGRL-NMNYALVSKRK
Q-Human     ... VRTRFPPEPNGILHIGHAKAI ... HNTLPS---PWRDRPMEES ... WEYGRL-NLHYAVVSKRK
E-Fly       ... VVVRFPPEASGYLHIGHAKAA ... QRVE----SANRSNSVEKN ... WSYSRL-NMTNTVLSKRK
E-Human     ... VTVRFPPEASGYLHIGHAKAA ... QRIE----SKHRKNPIEKN ... WEYSRL-NLNNTVLSKRK
E-Yeast     ... VVTRFPPEPSGYLHIGHAKAA ... DGVA----SARRDRSVEEN ... WDFARI-NFVRTLLSKRK
ATP-Binding

How would you (or should you even) model indels?
  • Where should the insertion be placed?
  • What is the conformation of the new residues?
  • Which residues should be deleted?
  • How many additional residues need to change
    conformation?
8
Alignment is the limiting step for homology model accuracy
No amount of forcefield minimization will put a
misaligned residue in the right place!
HOMSTRAD @ CASP4: Williams MG et al. (2001)
Proteins Suppl. 5: 92-97
9
Superposition vs. Alignment
  • The coordinates of two proteins can be
    superimposed in space.
  • An alignment may be derived from a superposition
    by correlating residues that are close in space.
  • An optimal sequence alignment may lead to a
    different alignment ...

1GTR vs 2TS1
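
Deriving an alignment from a superposition, as described on this slide, can be sketched as follows. This is a minimal illustration that assumes the two C-alpha traces are already superimposed; the function name, the mutual-nearest-neighbour rule, the 3.0 Å cutoff and the coordinates are all assumptions made for the example, not the method used to produce the 1GTR vs 2TS1 comparison.

```python
import math

def alignment_from_superposition(ca_a, ca_b, cutoff=3.0):
    """Pair residues of two already-superimposed C-alpha traces.

    A pair (i, j) is reported when j is the residue of B nearest to residue i
    of A, i is the residue of A nearest to j, and the two are within `cutoff` Å.
    """
    def nearest(p, pts):
        return min(range(len(pts)), key=lambda k: math.dist(p, pts[k]))

    pairs = []
    for i, p in enumerate(ca_a):
        j = nearest(p, ca_b)
        if nearest(ca_b[j], ca_a) == i and math.dist(p, ca_b[j]) <= cutoff:
            pairs.append((i, j))
    return pairs

# Invented coordinates: B carries one extra residue (index 2) that bulges away.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
ca_b = [(0.2, 0.1, 0.0), (3.9, 0.3, 0.0), (5.7, 4.5, 0.0),
        (7.5, 0.2, 0.0), (11.2, 0.1, 0.0)]
print(alignment_from_superposition(ca_a, ca_b))
```

Residues that have no close partner in space simply stay unpaired, which is how a structure-based alignment places its gaps.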
10
Superposition vs. Alignment
TyrRS      ERVTLYCGFDPTAdS--LHIGHLATILTMRRFQQAGHRPIALVGGAtgligdpsgkkser
1GTR  26   TTVHTRFPPEPNG-YLHIGHAKSICL--NF---------------GIAqDYKGQCN--
2TS1  29   ERVTLYCGFDPTAdSLHIGHLATILT--MR---------------RFQ-QAGHRPI--

TyrRS      tlnaketVEAWSARIKEQLgrfldfeadgnpa----------------k--------IKN
1GTR  26   ----------------------LRFD-DTnpv----------------keDIEYVESIKN
2TS1  29   ----------------------ALVG-GAtgligdpsgkksertlnaketVEAWSARIKE

TyrRS      NYDWIgpldvitflrdvgk----hfsvnymmakesvqsrietgisftefsYMMLQAYDFL
1GTR  26   DVewl------------gf----hwsgnVRYSSD---------------------YFdql
2TS1  29   QLgrf------------ldfeadgnpakIKNNYD---------------------WIgpl

TyrRS      RLYetegCRLQIGGSDQwgnitaGL--------ELIRKTKgearAFGLTIPLV
1GTR  26   hayaie-------------linkglayvdeltpeqireyrgtltqpgknspyrdrsveen
2TS1  29   dvitfl-------------rdvgkhfsvnym-----------------------------

TyrRS
1GTR  26   lalfekmraggfeegkaclrakidmaspfivmrdpvlyrikfaehhqtgnkwciypmYDF
2TS1  29   -------------------------------------makesvqsrietgisftefsYMM

TyrRS
1GTR  26   THCISDALEG----ITHSLCTLEFqdnrrlYDWVLDNITipvhPRQYEFSRL 262
2TS1  29   LQAYDFLRLYetegCRLQIGGSDQwgnitaGLELIRKTKgearAFGLTIPLV 223
  • Example: structural vs. sequence alignment
    between E. coli GlnRS and G. stearothermophilus
    TyrRS.
  • Although the optimal sequence alignment is not
    unreasonable (19% ID, 40/212 residues),
    comparison with the structure shows it is
    actually wrong for all but 11 residues! The
    structure-based alignment is quite dissimilar in
    sequence (4.5% ID, 12/265 residues) but the
    superposition actually matches 39% of residues
    (104/265) over the length of the domain.
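
The percentages quoted on this slide follow directly from the residue counts. A small sketch makes the arithmetic explicit and shows how identities can be counted from two gapped alignment rows; the helper names are hypothetical, and the two short rows are only a toy example with `-` as an assumed gap character.

```python
def percent_identity(identical, aligned):
    """Percent identity from counts of identical and aligned positions."""
    return 100.0 * identical / aligned

def count_identities(row_a, row_b, gap="-"):
    """Count identical and aligned (gap-free) columns of two alignment rows."""
    cols = [(a, b) for a, b in zip(row_a, row_b) if gap not in (a, b)]
    return sum(a == b for a, b in cols), len(cols)

# Counts quoted on the slide.
print(round(percent_identity(40, 212), 1))    # optimal sequence alignment
print(round(percent_identity(12, 265), 1))    # structure-based alignment
print(round(percent_identity(104, 265), 1))   # residues matched by the superposition

# Toy rows with a five-residue gap.
identical, aligned = count_identities("gktlit-----nfsqehip", "gktlisflyeqnfsqehip")
print(identical, aligned, round(percent_identity(identical, aligned), 1))
```

Note that the denominator matters: identity over aligned columns and identity over the full domain length can differ considerably for the same pair of sequences.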

11
Inserts may be accommodated in a distant part of
the structure
Example: a five-residue insert
  • Sequence alignment (shows what happened)
      gktlit-----nfsqehip
      gktlisflyeqnfsqehip
  • Structure alignment (shows how it's accommodated)
      gktlitnfsq-----ehip
      gktlisflyeqnfsqehip

α-helix
12
Off by 1, Off by 4
3.8Å
  • A shift in alignment of 1 residue corresponds to
    a skew in the modeled structure of about 4 Å (3.8
    Å is the inter-alpha carbon distance)
  • Nothing you can do AFTER an alignment will fix
    this error (not even molecular dynamics).
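
The size of this error can be checked on an idealized helix. The sketch below builds C-alpha positions for an ideal α-helix using textbook geometry (radius 2.3 Å, rise 1.5 Å and 100° rotation per residue, taken here as assumptions) and measures how far each residue lands from its correct position when the alignment is shifted. The function names are made up for the example.

```python
import math

def ideal_helix_ca(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha coordinates (Å) of an idealized alpha-helix along the z axis."""
    return [
        (radius * math.cos(math.radians(k * turn_deg)),
         radius * math.sin(math.radians(k * turn_deg)),
         k * rise)
        for k in range(n)
    ]

def shift_error(ca, shift):
    """Mean displacement (Å) when residue k is built at the position of residue k + shift."""
    dists = [math.dist(ca[k], ca[k + shift]) for k in range(len(ca) - shift)]
    return sum(dists) / len(dists)

helix = ideal_helix_ca(20)
for shift in (1, 4):
    print(f"shift by {shift}: {shift_error(helix, shift):.2f} Å")
```

A shift of one residue displaces every C-alpha by roughly the 3.8 Å quoted above; a shift of four residues, about one helical turn, puts side chains back on nearly the same face of the helix but still leaves the backbone several ångströms out of register.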

13
Indels (inserts or deletions)
  • Observations of known similarities in structures
    demonstrate that uniform gap penalty assumptions
    are NOT BIOLOGICAL.
  • Indels are most often observed in loops, less
    often in secondary structure elements
  • When they do not occur in loops, helical or
    strand properties are usually maintained.

14
Can we do better with the gap assumption?
  • Required: position-specific gap penalties
  • One approach: implemented in Clustal as secondary
    structure masks
  • Get secondary structure information, convert it
    to Clustal mask format. (Easy: read the
    documentation!)
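
The idea of position-specific gap penalties can be illustrated with a short sketch that turns a DSSP-style secondary structure string into one gap-opening penalty per residue. This is not the Clustal mask format, which should be taken from the Clustal documentation; the function name, the base penalty and the multipliers are invented for illustration.

```python
def gap_open_penalties(ss, base=10.0, helix_factor=4.0, strand_factor=4.0, loop_factor=1.0):
    """Map a DSSP-style secondary structure string to per-residue gap-opening penalties.

    H, G, I are treated as helix; E, B as strand; everything else as loop.
    The factors are illustrative, not values used by any particular program.
    """
    factors = {c: helix_factor for c in "HGI"}
    factors.update({c: strand_factor for c in "EB"})
    return [base * factors.get(c, loop_factor) for c in ss]

ss = "--EEEEE--TT-HHHHHHHH--"
penalties = gap_open_penalties(ss)
for state, penalty in zip(ss, penalties):
    print(state, penalty)
```

With penalties like these an aligner is steered toward placing indels in loops, which matches the observation that gaps are rare inside helices and strands.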

15
Secondary structure from PDB .... (Algorithm ?)
16
Secondary structure from RasMol .... (DSSP !)
17
Concept 2
  • Reasons to model need to be defined.

18
Use of homology models
Biochemical inference from 3D similarity
  • Bonds
  • Angles, plain and dihedral
  • Surfaces, solvent accessibility
  • Amino acid functions, presence in structure
    patterns
  • Spatial relationship of residues to active site
  • Spatial relationship to other residues
  • Participation in function / mechanism
  • Static and dynamic disorder
  • Electrostatics
  • Conservation patterns (structural and functional)
  • Posttranslational modification sites (but not
    structural consequences!)
  • Suitability as drug target

Don't !
19
Abuse of homology models
  • Modelling properties that cannot / will not be
    verified
  • Analysing geometry of model
  • Interpreting loop structures near indels
  • Inferring relative domain arrangement
  • Inferring structures of complexes

20
Databases of Models
  • Don't make models unless you check first ...
  • Swiss-Model repository: 64,000 models based on
    4000 structures and Swiss-Prot proteins
  • ModBase: made with "Modeller"; 15,000 reliable
    models for substantial segments of approximately
    4,000 proteins in the genomes of Saccharomyces
    cerevisiae, Mycoplasma genitalium, Methanococcus
    jannaschii, Caenorhabditis elegans, and
    Escherichia coli.

21
Concept 3
  • Fully automated services perform well.

22
Homology Modeling Process
[Flowchart. Steps: Search, Align, Model, Complete, Analyse. Tools shown: PSI-BLAST, T-Coffee, Cinema, SwissModel, TextEditor, RasMol, Consurf. Data shown: TAR, nr (PDB), HOM, TEM, MSA, ExPDB, 3D, LIG, 3DC, PUB.]
These are really two queries rolled into one
procedure.
Legend:
  • TAR: Target sequence
  • Search: Sequence database similarity search
  • nr: non-redundant Genbank subset (with annotated
    structures)
  • HOM: Homologous sequences
  • TEM: Sequences of homologues with known structure
  • Align: Careful Multiple Sequence Alignment
  • MSA: Multiple Sequence Alignment
  • Model: Generate 3D Model
  • ExPDB: Modeling template structure database
  • Complete: Add ligands, substrates etc. to model
  • Analyse: Interpret and conclude
  • PUB: Publish results
23
Homology Modeling Software?
  • Freely available packages perform as well as
    commercial ones at CASP (Critical Assessment of
    Structure Prediction)
  • Swiss Model (see your Integrated Assignment)
  • Modeller (http://guitar.rockefeller.edu)

24
Swiss-Model steps
  • Search for sequence similarities

BLASTP against EX-NRL 3D
Peitsch M, Guex N (1997) Electrophoresis 18: 2714
25
Swiss-Model steps
Identity > 25%; expected model > 20 residues
  • Search for sequence similarities
  • Evaluate suitable templates

Peitsch M, Guex N (1997) Electrophoresis 18: 2714
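
The two template criteria on this slide amount to a simple filter over search hits. The sketch below is only an illustration of that rule: the `Hit` record, its field names, and the example hits are hypothetical and do not reflect Swiss-Model's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    template_id: str      # hypothetical identifier of a template chain
    identity_pct: float   # sequence identity to the target, in percent
    aligned_length: int   # number of target residues covered by the hit

def suitable_templates(hits, min_identity=25.0, min_length=20):
    """Keep hits above both thresholds, highest identity first."""
    keep = [h for h in hits
            if h.identity_pct > min_identity and h.aligned_length > min_length]
    return sorted(keep, key=lambda h: h.identity_pct, reverse=True)

hits = [
    Hit("tmplA", 42.0, 180),
    Hit("tmplB", 19.0, 210),   # too dissimilar
    Hit("tmplC", 61.0, 15),    # too short
    Hit("tmplD", 27.5, 95),
]
print([h.template_id for h in suitable_templates(hits)])
```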
26
Swiss-Model steps
  • Search for sequence similarities
  • Evaluate suitable templates
  • Generate structural alignments

Select regions of similarity and match in
coordinate-space (EXPDB).
Peitsch M, Guex N (1997) Electrophoresis 18: 2714
27
Swiss-Model steps
  • Search for sequence similarities
  • Evaluate suitable templates
  • Generate structural alignments
  • Average backbones

Compute weighted average coordinates for backbone
atoms expected to be in model.
Peitsch M, Guex N (1997) Electrophoresis 18: 2714
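
Averaging backbones from several templates can be sketched as a weighted mean over equivalent atoms. The slide does not say how the weights are chosen, so the weights below are placeholders, and the function name and coordinates are invented for illustration. The templates are assumed to be superimposed already and to list the same atoms in the same order.

```python
def weighted_average_backbone(templates, weights):
    """Weighted average of equivalent backbone atom positions.

    templates: one coordinate list per template, all superimposed and
               listing the same atoms in the same order.
    weights:   one non-negative weight per template.
    """
    total = sum(weights)
    n_atoms = len(templates[0])
    return [
        tuple(
            sum(w * t[a][axis] for t, w in zip(templates, weights)) / total
            for axis in range(3)
        )
        for a in range(n_atoms)
    ]

# Two toy templates with two atoms each; the first template counts three times as much.
t1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
t2 = [(0.4, 0.0, 0.0), (1.5, 0.8, 0.0)]
averaged = weighted_average_backbone([t1, t2], weights=[0.75, 0.25])
print(averaged)
```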
28
Swiss-Model steps
  • Search for sequence similarities
  • Evaluate suitable templates
  • Generate structural alignments
  • Average backbones
  • Build loops
  • Pick plausible loops from library and ligate to
    stems; if not possible, try combinatorial search.

Peitsch M, Guex N (1997) Electrophoresis 18: 2714
29
Swiss-Model steps
  • Search for sequence similarities
  • Evaluate suitable templates
  • Generate structural alignments
  • Average backbones
  • Build loops
  • Bridge incomplete backbones

Bridge with overlapping pieces from pentapeptide
fragment library, anchor with the terminal
residues and add the three central residues.
Peitsch M, Guex N (1997) Electrophoresis 18: 2714
30
Swiss-Model steps
  • Search for sequence similarities
  • Evaluate suitable templates
  • Generate structural alignments
  • Average backbones
  • Build loops
  • Bridge incomplete backbones
  • Rebuild sidechains

Rebuild sidechains from rotamer library:
complete sidechains first, then regenerate
partial sidechains using a probabilistic approach.

Peitsch M, Guex N (1997) Electrophoresis 18: 2714
31
Swiss-Model steps
  • Search for sequence similarities
  • Evaluate suitable templates
  • Generate structural alignments
  • Average backbones
  • Build loops
  • Bridge incomplete backbones
  • Rebuild sidechains
  • Energy minimize

Gromos 96 - Energy minimization
Peitsch M, Guex N (1997) Electrophoresis 18: 2714
32
Swiss-Model steps
  • Search for sequence similarities
  • Evaluate suitable templates
  • Generate structural alignments
  • Average backbones
  • Build loops
  • Bridge incomplete backbones
  • Rebuild sidechains
  • Energy minimize
  • Write Alignment and PDB file

e-mail results
Peitsch M, Guex N (1997) Electrophoresis 18: 2714
33
CASP5 (2002) - Homology
[Plot: RMSD(target, template) against RMSD(target, model), in Å; regions labeled "worse than template", "better" and "shocking!"]
Remote sequence similarity detection methods have
improved.
Coordinate manipulations do not improve accuracy.
Tramontano A, Morea V (2003) Assessment of
homology based predictions in CASP5. Proteins
S6: 352-368
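
The two quantities compared in this plot are ordinary RMSD values over equivalent atoms. The sketch below assumes the structures are already superimposed and that the coordinate lists are in matching order; the function name and the toy coordinates are invented for illustration. A model counts as "worse than template" when RMSD(target, model) exceeds RMSD(target, template).

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between equivalent atoms of two superimposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# Toy C-alpha traces: the model has drifted further from the target than the template.
target   = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
model    = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]

r_template = rmsd(target, template)
r_model = rmsd(target, model)
print(r_template, r_model, "worse than template" if r_model > r_template else "better")
```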
34
Swissmodel in comparison
3D-Crunch: 211,000 sequences -> 64,000 models
Controls:
  • > 50% ID: 1 Å RMSD
  • 40-49% ID: 63% < 3 Å
  • 25-29% ID: 49% < 4 Å
Manual alternatives: Modeller ...
Automatic alternatives: SwissModel, sdsc1, 3djigsaw,
pcomb_pcons, cphmodels, easypred
1st for RMSD and correctly aligned residues, 2nd for
coverage
Guex et al. (1999) TIBS 24: 365-367
EVA: Eyrich et al. (2001) Bioinformatics 17: 1242-1243
(http://cubic.bioc.columbia.edu/eva)
35
Concept 4
  • SwissModel in practice.

36
SwissModel ... first approach mode
http://www.expasy.org/swissmod
37
... enter the ExPDB template ID...
38
... run in Normal Mode (except if defining a
DeepView project) ...
39
... successful submission.
Results come by e-mail.
40
Homology Modeling in Practice
How to assess model reliability?
  • All indels are wrong.
  • Structure analysis ("threading", "solvent
    accessibility", compatibility with ligands) can
    point out possible alignment errors.
  • But there is no point in "repairing"
    stereochemistry; only review the alignment.
41
Homology Modeling in Practice
Can you predict function from your model?
No (and yes): the model may be incompatible with a
specific function.
42
Uses of structure revisited - I
  • Prototype 1: Analytical
  • Explain mechanistic aspects of protein
    (e.g. in terms of):
  • residues involved in catalysis
  • global properties (like electrostatics)
  • shape, relative orientation and distances of
    domains or subdomains
  • flexibility and dynamics - e.g. hypothesizing
    about the rate limiting step

43
Uses of structure revisited - II
  • Prototype 2: Comparative
  • Bring conservation patterns into a spatial
    context in order to infer causality from
    (database) correlations
    (e.g. in terms of):
  • describing context-specific conservation patterns
    and analyzing these according to conserved
    properties
  • analyzing the predicted effect of sequence
    variation (e.g. for engineering changes, fusing
    domains or predicting SNP effects)
  • distinguish physiological vs. nonphysiological
    interactions