Homology Modelling - PowerPoint PPT Presentation

1
Homology Modelling
  • Thomas Blicher
  • Center for Biological Sequence Analysis

2
Why Do We Need Homology Modelling?
  • Ab initio protein folding (random sampling)
  • 100 aa, 3 conformations/residue gives approximately
    10^48 different overall conformations!
  • Random sampling is NOT feasible, even if
    conformations can be sampled at picosecond
    (10^-12 s) rates.
  • Levinthal's paradox
  • Do homology modelling instead.
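The numbers on this slide can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes exactly 3 discrete states per residue and one conformation sampled per picosecond, as on the slide, and takes a year to be 365.25 days.

```python
import math

RESIDUES = 100                   # chain length used on the slide
CONFORMATIONS_PER_RESIDUE = 3    # discrete states per residue (slide's assumption)
SAMPLES_PER_SECOND = 1e12        # one conformation per picosecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Every residue can independently take any of its states, so the counts multiply.
total = CONFORMATIONS_PER_RESIDUE ** RESIDUES            # exact integer
exponent = RESIDUES * math.log10(CONFORMATIONS_PER_RESIDUE)
years = total / SAMPLES_PER_SECOND / SECONDS_PER_YEAR    # time to visit each once

print(f"conformations ~ 10^{exponent:.1f}")
print(f"exhaustive search ~ {years:.1e} years")
```

For 100 residues this gives 3^100, about 5 × 10^47 (the "approximately 10^48" above), and an exhaustive search time of the order of 10^28 years.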

3
How Is It Possible?
  • The structure of a protein is uniquely determined
    by its amino acid sequence (but sequence alone is
    sometimes not enough):
      • prions
      • pH, ions, cofactors, chaperones
  • Structure is conserved much longer than sequence
    in evolution.
  • Conservation: Structure > Function > Sequence

4
How Often Can We Do It?
  • There are currently 40,000 structures in the PDB
    (but only 4,000 if you include only those that are
    no more than 30% identical and have a resolution
    better than 3.0 Å).
  • An estimated 25% of all sequences can be modeled,
    and structural information can be obtained for
    50%.
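A redundancy-filtered set like the one described above can be sketched as a simple greedy filter: visit chains from best to worst resolution and keep a chain only if it beats the resolution cutoff and is no more than 30% identical to every chain already kept. The chain IDs and identity values below are hypothetical, and `lookup_identity` stands in for a real pairwise sequence alignment.

```python
def nonredundant(chains, identity, max_identity=30.0, max_resolution=3.0):
    """Greedy redundancy filter.

    chains: list of (chain_id, resolution in Å).
    identity: function (chain_a, chain_b) -> percent sequence identity.
    Returns the IDs of the chains kept, best resolution first.
    """
    kept = []
    for chain_id, resolution in sorted(chains, key=lambda c: c[1]):
        if resolution >= max_resolution:      # must be better than 3.0 Å
            continue
        if all(identity(chain_id, other) <= max_identity for other in kept):
            kept.append(chain_id)
    return kept

# Hypothetical example data.
CHAINS = [("A", 1.5), ("B", 2.0), ("C", 3.5), ("D", 2.5)]
PAIR_IDENTITY = {
    frozenset("AB"): 90.0, frozenset("AD"): 20.0, frozenset("BD"): 25.0,
    frozenset("AC"): 10.0, frozenset("BC"): 10.0, frozenset("CD"): 10.0,
}

def lookup_identity(a, b):
    return PAIR_IDENTITY[frozenset((a, b))]

print(nonredundant(CHAINS, lookup_identity))
```

Sorting by resolution first means that, within each cluster of near-identical chains, the best-resolved one is the one that survives.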

5
Worldwide Structural Genomics
  • Fold space coverage
  • Complete genomes
  • Signaling proteins
  • Improving technology
  • Disease-causing organisms
  • Model organisms
  • Membrane proteins
  • Protein-ligand interactions

6
Structural Genomics in North America
  • 10-year, $600 million project initiated in 2000,
    funded largely by NIH.
  • Aim: structural information on 10,000 unique
    proteins (now 4,000-6,000); so far 1,000 have been
    determined.
  • Improve current techniques to reduce time (from
    months to days) and cost (from $100,000 to
    $20,000 per structure).
  • 9 research centers currently funded (2005);
    targets are from model and disease-causing
    organisms (a separate project on TB proteins).

7
Homology Modeling for Structural Genomics
Roberto Sánchez et al. Nature Structural Biology 7, 986-990 (2000)
8
How Well Can We Do It?
Sali, A., Kuriyan, J. Trends Biochem. Sci. 22, M20-M24 (1999)
9
How Is It Done?
  • Identify template(s) and make an initial alignment
  • Improve alignment
  • Backbone generation
  • Loop modelling
  • Side chains
  • Refinement
  • Validation ?

10
Template Identification
  • Search with sequence
      • Blast
      • Psi-Blast
      • Fold recognition methods
  • Use biological information
      • Functional annotation in databases
      • Active site/motifs

11
Alignment
12
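Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14
Seq. 1    PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
Seq. 2a   PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
Seq. 2b   PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
(Rows 2a and 2b are the same sequence aligned to sequence 1 with the three-residue gap in two different places.)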
14
Improving the Alignment
From Professional Gambling by Gert Vriend
http://www.cmbi.kun.nl/gv/articles/text/gambling.html
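Sequence identity alone does not settle where the gap belongs. The sketch below scores the two alignments shown above (the same 11-residue sequence aligned to the 14-residue one, with the three-residue gap either after or before THR PRO) by counting identical residue pairs, ignoring gapped columns. It is only an illustration of a sequence-only score; it says nothing about which gap placement is structurally correct.

```python
GAP = "---"

SEQ_1 = "PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS".split()
ALIGN_A = "PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS".split()  # gap after THR PRO
ALIGN_B = "PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS".split()  # gap before THR PRO

def percent_identity(row1, row2):
    """Return (identical pairs, aligned pairs, percent identity), skipping gapped columns."""
    pairs = [(a, b) for a, b in zip(row1, row2) if a != GAP and b != GAP]
    same = sum(a == b for a, b in pairs)
    return same, len(pairs), 100.0 * same / len(pairs)

for name, row in (("gap after THR PRO", ALIGN_A), ("gap before THR PRO", ALIGN_B)):
    same, aligned, pct = percent_identity(SEQ_1, row)
    print(f"{name}: {same}/{aligned} identical ({pct:.1f}%)")
```

The first placement wins by one identity (7/11 against 6/11), which is exactly the kind of small difference where structural and biological information about the template, such as whether the gap falls in a loop, should decide.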
15
Template Quality
  • Selecting the best template is crucial!
  • The best template may not be the one with the
    highest sequence identity (best p-value).
  • Template 1: 93% identity, 3.5 Å resolution?
  • Template 2: 90% identity, 1.5 Å resolution?

16
The Importance of Resolution
[Figure: panels at 4 Å, 3 Å, 2 Å and 1 Å resolution]
17
Ramachandran Plot
  • Allowed backbone torsion angles (φ and ψ) in proteins

[Figure: amino acid residue]
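The φ and ψ angles in a Ramachandran plot are ordinary dihedral angles computed from backbone atom coordinates: φ(i) from C(i-1), N(i), CA(i), C(i), and ψ(i) from N(i), CA(i), C(i), N(i+1). A minimal pure-Python sketch, assuming the coordinates have already been read into (x, y, z) tuples (no PDB parsing is shown, and the dictionary layout of `backbone` is a hypothetical choice):

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle about the p1-p2 bond, in degrees (IUPAC sign convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_psi(backbone):
    """backbone: list of dicts with 'N', 'CA', 'C' coordinates, one per residue in chain order.
    Returns one (phi, psi) pair per residue; phi of the first and psi of the last are None."""
    angles = []
    for i, res in enumerate(backbone):
        phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"]) if i > 0 else None
        psi = (dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
               if i + 1 < len(backbone) else None)
        angles.append((phi, psi))
    return angles
```

Plotting the resulting (φ, ψ) pairs for a template or model gives its Ramachandran plot.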
18
Template Quality: Ramachandran Plot
[Figure: Ramachandran plot of an X-ray structure with good data]
19
Backbone Generation
  • Generate the backbone coordinates from the
    template for the aligned regions.
  • Several programs can do this; most of the groups
    at CASP6 used Modeller.
  • http://salilab.org/modeller/modeller.html
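The simplest version of "generate the backbone coordinates from the template for the aligned regions" is to copy, for each target residue aligned to a template residue, that template residue's backbone coordinates, and to leave target residues aligned to gaps empty for loop modelling. This is a sketch of that idea only; Modeller itself builds models by satisfying spatial restraints derived from the alignment rather than by literal copying. The string alignment format with "-" gaps and the `template_coords` list are assumptions.

```python
def copy_backbone(target_aln, template_aln, template_coords):
    """Transfer template backbone coordinates to aligned target residues.

    target_aln, template_aln: gapped alignment strings of equal length ('-' = gap).
    template_coords: one entry per ungapped template residue (e.g. a dict atom -> xyz).
    Returns one entry per target residue: the copied coordinates, or None where the
    target residue faces a gap and has to be built later (e.g. by loop modelling).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    model = []
    t_index = 0  # position in the ungapped template sequence
    for q, t in zip(target_aln, template_aln):
        if q != "-":
            model.append(template_coords[t_index] if t != "-" else None)
        if t != "-":
            t_index += 1
    return model

# Toy example: strings stand in for real per-residue coordinate records.
print(copy_backbone("AC-DE", "ACGD-", ["c0", "c1", "c2", "c3"]))
```

The residue types do not have to match: the backbone is copied for every aligned pair, and side chains are handled in a later step.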

20
Loop Modelling
  • Knowledge-based
      • Searches the PDB for fragments that match the
        sequence to be modelled (Levitt, Holm, Baker
        etc.).
  • Energy-based
      • Uses an energy function to evaluate the quality
        of the loop and minimizes this function by Monte
        Carlo (sampling) or molecular dynamics (MD)
        techniques.
  • Combination
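The energy-based approach can be illustrated with a Metropolis Monte Carlo search over the loop's (φ, ψ) torsions. The energy function below is a hypothetical toy stand-in, not a real force field, and the step size, temperature and number of steps are arbitrary choices for the sketch.

```python
import math
import random

def angle_diff(a, b):
    """Smallest signed difference a - b between two angles, in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def toy_energy(torsions):
    """Hypothetical stand-in for a real energy function: squared angular distance of
    every (phi, psi) pair from an arbitrary preferred value, scaled down."""
    phi0, psi0 = -60.0, -45.0
    return sum(angle_diff(phi, phi0) ** 2 + angle_diff(psi, psi0) ** 2
               for phi, psi in torsions) / 1000.0

def metropolis_mc(torsions, energy, steps=5000, temperature=1.0, max_step=30.0, seed=0):
    """Minimise `energy` over a list of (phi, psi) pairs by Metropolis Monte Carlo.
    Returns the lowest-energy torsions seen and their energy."""
    rng = random.Random(seed)
    current = list(torsions)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Propose a move: perturb the torsions of one randomly chosen loop residue.
        trial = list(current)
        i = rng.randrange(len(trial))
        phi, psi = trial[i]
        trial[i] = (angle_diff(phi + rng.uniform(-max_step, max_step), 0.0),
                    angle_diff(psi + rng.uniform(-max_step, max_step), 0.0))
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill moves, accept uphill moves
        # with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

START = [(120.0, 150.0)] * 5   # hypothetical 5-residue loop in a poor starting conformation
best, e_best = metropolis_mc(START, toy_energy)
print(f"energy {toy_energy(START):.1f} -> {e_best:.2f}")
```

Accepting some uphill moves is what lets the search escape local minima; in a real loop modeller the energy would also include terms that keep the loop ends attached to the rest of the model.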

21
Loops: the Rosetta Method
  • Find fragments (10 per amino acid) with the same
    sequence and secondary structure profile as the
    query sequence.
  • Combine them using a Monte Carlo scheme to build
    the loop.
  • David Baker et al.
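The first step above, finding fragments whose sequence and secondary structure resemble each stretch of the query, can be sketched as a ranking over a fragment library. The scoring (one point per identical residue plus one per identical secondary-structure state), the fragment length and the tiny library are simplifying assumptions; Rosetta's own fragment selection is more elaborate. The Monte Carlo assembly step is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    source: str    # hypothetical label of the structure the fragment was cut from
    sequence: str  # one-letter residue codes
    ss: str        # secondary structure per residue: H (helix), E (strand), L (loop)

def fragment_score(frag, window_seq, window_ss):
    """One point per identical residue plus one per identical secondary-structure state."""
    seq_matches = sum(a == b for a, b in zip(frag.sequence, window_seq))
    ss_matches = sum(a == b for a, b in zip(frag.ss, window_ss))
    return seq_matches + ss_matches

def pick_fragments(query_seq, query_ss, library, length=3, per_position=10):
    """For every window of `length` residues in the query, return the best-scoring
    `per_position` fragments of that length from `library`."""
    candidates = [f for f in library if len(f.sequence) == length]
    picks = []
    for start in range(len(query_seq) - length + 1):
        window_seq = query_seq[start:start + length]
        window_ss = query_ss[start:start + length]
        ranked = sorted(candidates,
                        key=lambda f: fragment_score(f, window_seq, window_ss),
                        reverse=True)
        picks.append(ranked[:per_position])
    return picks

# Hypothetical three-fragment library and a 4-residue loop predicted as all loop (L).
LIBRARY = [
    Fragment("frag1", "GSA", "LLL"),
    Fragment("frag2", "AAA", "HHH"),
    Fragment("frag3", "SAT", "LLL"),
]
picks = pick_fragments("GSAT", "LLLL", LIBRARY, per_position=2)
for start, frags in enumerate(picks):
    print(start, [f.source for f in frags])
```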

22
Side Chains
  • Side chain rotamers are dependent on backbone
    conformation.
  • The most successful method in CASP6 was SCWRL by
    Dunbrack et al.
  • A graph-theory, knowledge-based method to solve the
    combinatorial problem of side chain modelling.
  • http://dunbrack.fccc.edu/SCWRL3.php
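The combinatorial problem can be stated compactly: choose one rotamer per residue so that the sum of per-rotamer (self) energies and pairwise interaction energies is minimal. The sketch below solves it by brute-force enumeration, which is only feasible for a handful of residues because the number of combinations is the product of the rotamer counts; it is not SCWRL's graph-based algorithm, and the energies are hypothetical toy values.

```python
import itertools

def best_rotamers(self_energy, pair_energy):
    """Exhaustively find the lowest-energy rotamer assignment.

    self_energy: one list per residue, giving the energy of each of its rotamers.
    pair_energy: dict mapping a residue pair (i, j) to a table [rot_i][rot_j] of
                 interaction energies; pairs not listed do not interact.
    Returns (total energy, tuple of chosen rotamer indices).
    """
    n = len(self_energy)
    best = None
    # itertools.product enumerates every combination: prod(len(e)) assignments.
    for assignment in itertools.product(*(range(len(e)) for e in self_energy)):
        total = sum(self_energy[i][assignment[i]] for i in range(n))
        total += sum(table[assignment[i]][assignment[j]]
                     for (i, j), table in pair_energy.items())
        if best is None or total < best[0]:
            best = (total, assignment)
    return best

# Hypothetical toy energies: 3 residues with 2 rotamers each.
SELF = [[0.0, 1.0], [0.0, 0.5], [0.2, 0.0]]
PAIR = {
    (0, 1): [[5.0, 0.0], [0.0, 0.0]],  # rotamer 0 of residue 0 clashes with rotamer 0 of residue 1
    (1, 2): [[0.0, 0.0], [0.0, 0.0]],
}
energy, rotamers = best_rotamers(SELF, PAIR)
print(energy, rotamers)
```

Note that the best individual rotamers are not the best combination here: the clash term forces residue 1 out of its preferred rotamer, which is why side chains must be placed jointly rather than one at a time.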

23
Side Chains
  • Prediction accuracy is high for buried residues,
    but much lower for surface residues
  • Experimental reasons: side chains at the surface
    are more flexible.
  • Theoretical reasons: it is much easier to handle
    hydrophobic packing in the core than the
    electrostatic interactions, including H-bonds to
    water.

24
Side Chains
  • If the sequence identity is high, the networks of side
    chain contacts may be conserved, and keeping the
    side chain rotamers from the template may be
    better than predicting new ones.

25
Refinement
  • Energy minimization
  • Molecular dynamics
  • Big errors like atom clashes can be removed, but
    force fields are not perfect and small errors
    will also be introduced: keep minimization to a
    minimum or matters will only get worse.
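Locating the atom clashes that refinement is meant to remove only needs interatomic distances. A minimal sketch, assuming atoms are given as (name, residue index, (x, y, z)) tuples; the 2.0 Å cutoff and the rule of skipping atoms in the same or neighbouring residues (a crude stand-in for excluding covalently bonded pairs) are hypothetical choices, not a standard.

```python
import math
from itertools import combinations

def find_clashes(atoms, cutoff=2.0, min_residue_separation=2):
    """Report atom pairs closer than `cutoff` Å.

    atoms: list of (atom_name, residue_index, (x, y, z)).
    Pairs whose residue indices differ by less than `min_residue_separation`
    are skipped as a crude way of ignoring bonded neighbours.
    Returns a list of (name1, res1, name2, res2, distance rounded to 0.01 Å).
    """
    clashes = []
    for (n1, r1, x1), (n2, r2, x2) in combinations(atoms, 2):
        if abs(r1 - r2) < min_residue_separation:
            continue
        d = math.dist(x1, x2)  # Euclidean distance (Python 3.8+)
        if d < cutoff:
            clashes.append((n1, r1, n2, r2, round(d, 2)))
    return clashes

# Hypothetical toy coordinates.
ATOMS = [
    ("CA", 1, (0.0, 0.0, 0.0)),
    ("CB", 5, (1.0, 0.0, 0.0)),
    ("CG", 9, (10.0, 0.0, 0.0)),
    ("N", 2, (0.5, 0.0, 0.0)),
]
for clash in find_clashes(ATOMS):
    print(clash)
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a sketch; real validation tools use spatial grids or neighbour lists.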

26
Error Recovery
  • If errors are introduced in the model, they
    normally can NOT be recovered at a later step.
  • The alignment cannot make up for a bad choice of
    template.
  • Loop modelling cannot make up for a poor
    alignment.
  • If errors are discovered, the step where they
    were introduced should be redone.

27
Validation
  • Most programs will get the bond lengths and
    angles right.
  • The Ramachandran plot of the model usually looks
    pretty much like the Ramachandran plot of the
    template (so select a high quality template).
  • Inside/outside distributions of polar and apolar
    residues can be useful.
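The inside/outside check rests on the expectation that a well-folded protein buries mostly apolar residues and exposes polar ones. A minimal sketch, assuming per-residue relative solvent accessibility has already been computed by another program; the set of apolar residue types and the 0.25 burial cutoff are simplified, hypothetical choices.

```python
# Assumption: a simplified set of apolar residue types (one-letter codes).
APOLAR = set("AVLIMFWC")

def apolar_fractions(sequence, rel_accessibility, buried_cutoff=0.25):
    """Fraction of apolar residues among buried and among exposed residues.

    rel_accessibility: relative solvent accessibility per residue
    (0 = fully buried, 1 = fully exposed). Residues below `buried_cutoff`
    count as buried. Returns (buried_fraction, exposed_fraction); a fraction
    is None if its group is empty.
    """
    if len(sequence) != len(rel_accessibility):
        raise ValueError("need one accessibility value per residue")
    buried = [aa for aa, acc in zip(sequence, rel_accessibility) if acc < buried_cutoff]
    exposed = [aa for aa, acc in zip(sequence, rel_accessibility) if acc >= buried_cutoff]

    def fraction(group):
        return sum(aa in APOLAR for aa in group) / len(group) if group else None

    return fraction(buried), fraction(exposed)

# Hypothetical 4-residue example: L and V buried, K and E exposed.
print(apolar_fractions("LKVE", [0.05, 0.8, 0.1, 0.9]))
```

A model whose buried fraction is low (polar or charged residues packed in the core) deserves a closer look, for example at its alignment.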

28
Validation ProQ Server
  • ProQ is a neural-network-based predictor that
    predicts the quality of a protein model from a
    number of structural features.
  • ProQ is optimized to find correct models, in
    contrast to other methods, which are optimized to
    find native structures.

Arne Elofsson's group: http://www.sbc.su.se/bjorn/ProQ/
29
Structure Validation
  • ProCheck
  • http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html
  • WhatIf server
  • http://swift.cmbi.kun.nl/WIWWWI/

30
Homology Modelling Servers
  • Eva-CM performs continuous and automated analysis
    of comparative protein structure modeling servers.
  • A current list of the best-performing servers can
    be found at:
  • http://cubic.bioc.columbia.edu/eva/doc/intro_cm.html

31
The Hardest Target in CASP6
  • Only 8% sequence identity between target and template.

Dunbrack, Wang, Jin (2004), CASP6 Fold Recognition Assessment
32
Summary
  • Successful homology modelling depends on the
    following:
      • Template quality
      • Alignment (add biological information)
      • Modelling program/procedure (use more than one)
  • Always validate your final model!