Homology Modeling: Principles, tools and techniques

1
Homology Modeling: Principles, tools and techniques
  • Supa Hannongbua
  • Department of Chemistry,
  • Faculty of Science,
  • Kasetsart University, Bangkok 10900, THAILAND
  • fscisph@ku.ac.th, http://kuchem.sci.ku.ac.th

2
Introduction
  • The development of molecular biology has enabled
    rapid identification, isolation and sequencing of
    genes.
  • Problem: obtaining the 3D structure of a protein
    is a time-consuming task.
  • An alternative strategy in structural biology is
    to develop models of a protein when the
    constraints from X-ray diffraction or NMR are not
    yet available.
  • Homology modeling is the method that can be
    applied to generate reasonable models of protein
    structure.

3
What is Homology Modeling?
  • Predicts the three-dimensional structure of a
    given protein sequence (TARGET) based on an
    alignment to one or more known protein structures
    (TEMPLATES)
  • If similarity between the TARGET sequence and the
    TEMPLATE sequence is detected, structural
    similarity can be assumed.
  • In general, at least 30% sequence identity is
    required for generating useful models.
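
The sequence identity mentioned above can be computed from a pairwise alignment. The following is a minimal sketch, assuming identity is counted over columns where neither sequence has a gap (other conventions for the denominator exist). The function name and the toy sequences are hypothetical and not part of the original slides.

```python
def percent_identity(target_row: str, template_row: str) -> float:
    """Percent identity over the aligned (gap-free) columns of a pairwise alignment."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    aligned = 0
    identical = 0
    for t, s in zip(target_row, template_row):
        if t == "-" or s == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if t.upper() == s.upper():
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Toy alignment with made-up sequences, for illustration only.
target = "MKT-AYIAKQR"
template = "MKSLAYLAK-R"
print(f"{percent_identity(target, template):.1f}% identity")
```

Counting only gap-free columns keeps the value independent of how long the unaligned overhangs are, which is one reason this convention is often used for a first template screen.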

4
Structural Prediction by Homology Modeling
Flowchart (box labels in extraction order):
  • Structural Databases: SeqFold, Profiles-3D,
    PSI-BLAST, BLAST, FASTA
  • Reference Proteins
  • Cα Matrix Matching
  • Conserved Regions
  • Protein Sequence
  • Sequence Alignment, Coordinate Assignment
  • Predicted Conserved Regions
  • Loop Searching/generation
  • MODELER
  • Initial Model
  • Structure Analysis
  • Sidechain Rotamers and/or MM/MD
  • WHAT IF, PROCHECK, PROSAII, ...
  • Refined Model
5
How good can homology modeling be?
  • Sequence identity 60-100%
      • Comparable to medium-resolution NMR
      • Substrate specificity
  • Sequence identity 30-60%
      • Molecular replacement in crystallography
      • Support site-directed mutagenesis through
        visualization
  • Sequence identity <30%
      • Serious errors
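
As a rough illustration, the identity ranges on this slide can be turned into a lookup. This is a sketch only: the function name is hypothetical, and assigning the boundary values (exactly 30% and 60%) to the higher tier is an assumption, since the slide does not say which side they belong to.

```python
def expected_model_quality(identity_percent: float) -> str:
    """Map a target-template sequence identity to the slide's expectation tiers."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    if identity_percent >= 60.0:  # boundary placement is an assumption
        return "comparable to medium-resolution NMR"
    if identity_percent >= 30.0:  # boundary placement is an assumption
        return "usable for molecular replacement and mutagenesis planning"
    return "serious errors likely"


for identity in (85.0, 42.0, 22.0):
    print(identity, "->", expected_model_quality(identity))
```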

6
Significance of Protein Structure
  • What does a structure offer in the way of
    biological knowledge?
  • Location of mutants and conserved residues
  • Ligand and functional sites
  • Clefts/Cavities
  • Evolutionary Relationships
  • Mechanisms

7
The importance of the sequence alignment
  • The quality of the sequence alignment is of
    crucial importance
  • Misplaced gaps, representing insertions or
    deletions, will cause residues to be misplaced in
    space
  • Careful inspection and adjustment of the
    automatic alignment may improve the quality of
    the model.
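
To see why a misplaced gap misplaces residues, it helps to list which template residue each target residue inherits its coordinates from. The sketch below assumes a simple two-row alignment with "-" as the gap character; the function name and the toy sequences are hypothetical.

```python
def residue_mapping(target_row: str, template_row: str) -> dict:
    """Map 0-based target residue indices to the template residues they align to."""
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    t_idx = 0  # index of the next target residue
    s_idx = 0  # index of the next template residue
    for t, s in zip(target_row, template_row):
        if t != "-" and s != "-":
            mapping[t_idx] = s_idx
        if t != "-":
            t_idx += 1
        if s != "-":
            s_idx += 1
    return mapping


template = "ABCDEFGH"
correct = residue_mapping("ABC-EFGH", template)    # deletion placed at D
misplaced = residue_mapping("ABCEF-GH", template)  # same deletion, two columns late
shifted = [i for i in correct if correct[i] != misplaced.get(i)]
print("target residues copied from the wrong template position:", shifted)
```

In this toy case, moving one gap by two columns assigns two target residues to the wrong template positions, so their modeled coordinates are displaced along the chain.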

8
Programs for Model Protein Construction
  • MODELLER 4.0:
    guitar.rockefeller.edu/modeller/modeller.html
  • SWISS-MODEL Server:
    www.expasy.ch/swissmod/SWISS-MODEL.html
  • SCWRL (SideChain placement With Rotamer Library):
    www.fccc.edu/research/labs/dunbrack/scwrl/

9
Protein Structural Databases
  • Templates can be found using the TARGET sequence
    as a query for searching using FASTA or BLAST
  • PDB (http://www.rcsb.org/pdb)
  • MODELLER
    (http://guitar.rockefeller.edu/modeller/modeller.html)
  • ModBase
    (http://pipe.rockefeller.edu/modbase/general-info.html)
  • 3DCrunch
    (http://www.expasy.ch/swissmod/SM_3DCrunch.html)

10
Gaining confidence in template searching
  • Once a suitable template is found, it is a good
    idea to do a literature search (PubMed) on the
    relevant fold to determine what biological
    role(s) it plays.
  • Does this match the biological/biochemical
    function that you expect?

11
Other factors to consider in selecting templates
  • Template environment
  • pH
  • Ligands present?
  • Resolution of the templates
  • Family of proteins
  • Phylogenetic tree construction can help find the
    subfamily closest to the target sequence
  • Multiple templates?

12
Target-Template Alignment
  • No current comparative modeling method can
    recover from an incorrect alignment
  • Use multiple sequence alignments as initial
    guide.
  • Consider slightly alternative alignments in areas
    of uncertainty, build multiple models
  • Sequence-structure alignment programs try to put
    gaps in variable regions/loops
  • Note: the sequence from a database and the
    sequence in the actual PDB entry are not always
    identical

13
Differences in multiple sequence alignments
  • Inserting gaps at the ends of a helix versus in
    the middle
  • When gaps are placed at the ends of helices, all
    models built from these alignments have an RMSD
    to the actual structure of 1.3-1.8 Å.
  • In another helical region, placing the gaps in
    the middle results in an RMSD of 2.0 Å, versus
    less than 1.0 Å for the correct alignment.

14
Differences in multiple sequence alignments
  • Inserting gaps into the middle of helices and
    misaligning
  • For residues 75-95, this caused an RMSD between
    model and actual structure of 5-7 Å, versus less
    than 1.0 Å for the correct alignment.
  • For residues 95-115, which include a random
    section, the RMSD is 5.0 Å, versus 1.5 Å for the
    correct alignment.
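
The RMSD values quoted on these two slides compare equivalent atoms in the model and in the actual structure. A minimal sketch of the calculation follows; it assumes the two coordinate sets are already superimposed and listed in the same atom order (no optimal fitting is performed here), and the function name and coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in Å: every atom of the model is displaced by 1 Å along z.
actual = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(f"RMSD = {rmsd(actual, model):.2f} Å")
```

For a per-region comparison such as residues 75-95, the same function can be applied to the coordinate slices covering only those residues.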

15
Target-Multiple Template Alignment
  • Alignment is prepared by superimposing all
    template structures
  • Add target sequence to this alignment
  • Compare with multiple sequence alignment and
    adjust

16
Adjusting the alignment
  • Use tools such as Genedoc
    (www.psc.edu/biomed/genedoc) to view secondary
    structure along the alignment, and use this
    information as a criterion for adjustments
  • Avoid gaps in secondary structure elements
  • Use MEME to find a relatively large number of
    well-conserved regions
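
The rule "avoid gaps in secondary structure elements" can be checked automatically once the template's secondary structure is known. The sketch below assumes a per-residue string for the template with H for helix and E for strand (any other character is treated as loop); the function name and example data are hypothetical.

```python
def gaps_in_secondary_structure(target_row, template_row, template_ss):
    """Return alignment columns where a gap falls inside a helix (H) or strand (E)."""
    n_template = sum(1 for c in template_row if c != "-")
    if len(target_row) != len(template_row) or n_template != len(template_ss):
        raise ValueError("inconsistent alignment or secondary structure length")
    flagged = []
    s_idx = 0  # index of the next template residue
    for col, (t, s) in enumerate(zip(target_row, template_row)):
        if s != "-":
            if t == "-" and template_ss[s_idx] in "HE":
                flagged.append(col)  # deletion inside a helix or strand
            s_idx += 1
        elif t != "-":
            # Insertion in the target: flag it when both neighbouring
            # template residues belong to the same kind of element.
            prev_ss = template_ss[s_idx - 1] if s_idx > 0 else "-"
            next_ss = template_ss[s_idx] if s_idx < len(template_ss) else "-"
            if prev_ss in "HE" and prev_ss == next_ss:
                flagged.append(col)
    return flagged


template_row = "ACDEFGHIKL"
template_ss = "--HHHHHH--"
print(gaps_in_secondary_structure("ACDE-GHIKL", template_row, template_ss))
print(gaps_in_secondary_structure("A-DEFGHIKL", template_row, template_ss))
```

Flagged columns are candidates for manual adjustment, for example by sliding the gap to the nearest loop.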

17
Secondary Structure Prediction
  • The Predict Protein server
  • http://www.embl-heidelberg.de/predictprotein/
  • Adding secondary structure prediction algorithms
    can help make decisions on whether helices should
    be shortened/extended in areas of poor sequence
    identity.
  • The output of the PHD program can be read by
    Genedoc.

18
Constructing Multi-domain protein models
  • Building a multi-domain protein using templates
    corresponding to the individual domains
  • proteinA  aaaaaaaaaaaaa---------------
  • proteinB  -------------bbbbbbbbbbbbbbb
  • Target    aaaaaaaaaaaaabbbbbbbbbbbbbbb
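
The layout above can be generated by padding each domain's template with gaps so that all rows share the length of the full target. This is a minimal sketch assuming the domains are contiguous and listed in target order; the function name is hypothetical.

```python
def pad_domain_templates(domains):
    """Pad per-domain template segments with gaps to the full alignment length.

    `domains` is a list of (template_name, aligned_segment) in target order.
    """
    total = sum(len(segment) for _, segment in domains)
    rows = {}
    offset = 0
    for name, segment in domains:
        # Gaps before the domain, the segment itself, gaps after the domain.
        rows[name] = "-" * offset + segment + "-" * (total - offset - len(segment))
        offset += len(segment)
    return rows


rows = pad_domain_templates([("proteinA", "a" * 13), ("proteinB", "b" * 15)])
for name, row in rows.items():
    print(f"{name:<9} {row}")
```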

19
Multiple model approach
  • Reminder: consider the effects of different
    substitution matrices, different gap penalties,
    and different algorithms. (Vogt et al. J. Mol.
    Biol. 1995, 249:816-831.)
  • Construct multiple models
  • Use structural analysis programs to determine
    best model

Jaroszewski, Pawlowski and Godzik, J. Molecular
Modeling, 1998, 4:294-309.
Venclovas, Ginalski and Fidelis. PROTEINS, 1999,
3:73-80 (Suppl).
20
Model Building
  • Rigid-Body Assembly
      • Assembles a model from a small number of
        rigid bodies obtained from the aligned protein
        structures
      • Implemented in COMPOSER
  • Segment Matching
  • Satisfaction of Spatial Restraints
      • MODELLER
      • guitar.rockefeller.edu/modeller/modeller.html

21
Initial model and procedures
  • Calculate coordinates for atoms that have
    equivalent atoms in the templates as an average
    over all templates
  • CHARMM internal coordinates are used for
    remaining unknown coordinates
  • Generate stereochemical and homology derived
    restraints

22
Spatial restraints ?
  • Minimizes the objective function, $F$, with
    respect to the Cartesian coordinates of the
    protein atoms
  • $F(R) = \sum_i c_i(f_i, p_i)$
  • $R$ are the Cartesian coordinates of the atoms
  • $c_i$ is a restraint dependent on $f_i$ and $p_i$
  • $f_i$ is a geometric feature of the molecule, such
    as a distance, angle or dihedral value
  • $p_i$ are parameters that help describe the
    restraint
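
The objective function can be sketched as a sum of restraint terms evaluated on the current coordinates. The example below is illustrative only: it assumes every restraint is a harmonic penalty on an interatomic distance, with a mean and a standard deviation as the parameters. That functional form and all names are assumptions made for the sketch, not the form used by any particular program.

```python
import math


def harmonic(value: float, mean: float, sigma: float) -> float:
    """One restraint term: zero at the mean, growing quadratically away from it."""
    return 0.5 * ((value - mean) / sigma) ** 2


def objective(coords, restraints) -> float:
    """Sum of restraint terms for the Cartesian coordinates `coords`.

    Each restraint is (atom_i, atom_j, mean_distance, sigma); the geometric
    feature is the distance between the two atoms.
    """
    total = 0.0
    for i, j, mean, sigma in restraints:
        feature = math.dist(coords[i], coords[j])
        total += harmonic(feature, mean, sigma)
    return total


# Toy system: two atoms restrained to be 3.8 Å apart (sigma 0.1 Å).
restraints = [(0, 1, 3.8, 0.1)]
satisfied = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
violated = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(objective(satisfied, restraints), objective(violated, restraints))
```

A minimizer would then adjust the coordinates to lower this sum; coordinates that satisfy every restraint give a value of zero in this toy form.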

23
Homology and Stereochemical Restraints
  • Initial model is an average over all templates
  • Stereochemical: bond, angle, dihedral, improper
  • Homology:
      • mainchain and sidechain dihedrals
      • mainchain CA-CA distances
      • sidechain-mainchain distances
      • sidechain-sidechain distances
  • Non-bonded pairs (on the fly)
  • User-defined
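
Several of the restrained features listed above are dihedral angles. As a sketch of how such a feature is measured from four atom positions, the function below returns the signed angle in degrees, assuming the usual convention that a clockwise rotation of the far bond, viewed along the central bond, is positive. The helper names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) defined by four points, about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    length = math.sqrt(_dot(b2, b2))
    b2_unit = (b2[0] / length, b2[1] / length, b2[2] / length)
    x = _dot(n1, n2)
    y = _dot(b2_unit, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # quarter turn
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans arrangement
```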

24
Sidechain Conformation
  • Protein sidechains play a key role in molecular
    recognition and packing of hydrophobic cores of
    globular proteins
  • Protein sidechain conformations tend to exist in
    a limited number of canonical shapes, usually
    called rotamers
  • Rotamer libraries can be constructed where only
    3-50 conformations are taken into account for
    each side chain
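
As a sketch of the rotamer idea, a measured chi1 angle can be assigned to the nearest canonical state. The three-state library below (about -60°, 180° and +60°, labelled m, t and p) is a deliberately tiny, hypothetical stand-in for a real rotamer library, which would hold several chi angles per residue type.

```python
# Hypothetical three-state chi1 library (degrees); real libraries are
# residue-specific and may depend on the backbone conformation.
CHI1_ROTAMERS = {"m": -60.0, "t": 180.0, "p": 60.0}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_rotamer(chi1_deg: float, library=CHI1_ROTAMERS) -> str:
    """Name of the library state closest to the given chi1 angle."""
    return min(library, key=lambda name: angular_difference(chi1_deg, library[name]))


for chi1 in (-70.0, 175.0, -170.0, 55.0):
    print(chi1, "->", nearest_rotamer(chi1))
```

The wrap-around in `angular_difference` matters because -170° and 180° are only 10° apart, not 350°.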

25
Coupling between mainchain and sidechain
  • Mainchain shifts (0.2-0.5 Å) cause increased
    sidechain coordinate errors (0.1-0.8 Å),
    torsional errors of 10-30° and exaggerated strain
    energy for overpacked mutants compared with the
    correct mutant backbones.
  • Lee, C. Folding and Design, 1995, 1:1-12

26
Sidechains on surface of protein
  • Exposed sidechains on surface can be highly
    flexible without a single dominant conformation
  • Ultimately, if these solvent-exposed sidechains
    do not form binding interactions with other
    molecules and are not involved in, say, a
    catalytic reaction, then their accuracy may not
    be crucial; also look at the B-factors
  • Can refine the sidechains with molecular
    mechanics minimization
  • Sampling?
  • Scoring?

27
Clustering the ensemble
  • Cluster analysis, based on overall fold, followed
    by selection of the structure closest to the
    centroid of the largest cluster is likely to
    identify a structure more representative of the
    ensemble than the commonly used minimized average
    structure

NMRCLUST
(http://neon.chem.le.ac.uk/nmrclust/protocol.html)
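
The selection step described on this slide can be sketched from a matrix of pairwise RMSD values between models. The example below is a simplification and not the NMRCLUST algorithm: it assumes single-linkage clustering with a fixed cutoff, and it approximates "closest to the centroid" by the cluster medoid, the member with the smallest summed RMSD to the other members. All names and numbers are illustrative.

```python
def cluster_by_threshold(rmsd_matrix, cutoff):
    """Single-linkage clustering: a model joins a cluster if it lies within
    `cutoff` of any current member."""
    unassigned = set(range(len(rmsd_matrix)))
    clusters = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            neighbours = [j for j in unassigned if rmsd_matrix[current][j] <= cutoff]
            for j in neighbours:
                unassigned.remove(j)
                cluster.append(j)
                frontier.append(j)
        clusters.append(sorted(cluster))
    return clusters


def representative(rmsd_matrix, cutoff):
    """Index of the medoid of the largest cluster."""
    largest = max(cluster_by_threshold(rmsd_matrix, cutoff), key=len)
    return min(largest, key=lambda i: sum(rmsd_matrix[i][j] for j in largest))


# Toy pairwise RMSD matrix (Å) for four models; model 3 is an outlier.
matrix = [
    [0.0, 1.0, 1.2, 5.0],
    [1.0, 0.0, 0.8, 5.2],
    [1.2, 0.8, 0.0, 4.9],
    [5.0, 5.2, 4.9, 0.0],
]
print(cluster_by_threshold(matrix, 1.5), representative(matrix, 1.5))
```

Picking an actual ensemble member avoids the distorted geometry that an averaged structure can have before minimization.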
28
Errors in Homology Modeling
  • a) Side chain packing
  • b) Distortions and shifts
  • c) No template

29
Errors in Homology Modeling
  • d) Misalignments
  • e) Incorrect template
  • Marti-Renom et al., Ann. Rev. Biophys. Biomol.
    Struct., 2000, 29:291-325.

30
Detection of Errors
  • The first check should be a stereochemical check
    of the modeled structure with PROCHECK, WHATCHECK
    or DISTAN, which show deviations from normal
    bond lengths, dihedrals, etc.
  • Visualization: follow the backbone trace and then
    move out to the Cα-Cβ orientation.
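
A very small version of such a check, far simpler than what PROCHECK or WHATCHECK do, is to scan the backbone trace for implausible distances between consecutive Cα atoms. The sketch below assumes trans peptide bonds, for which the Cα-Cα distance is about 3.8 Å; the tolerance value and the function name are assumptions chosen for illustration.

```python
import math

IDEAL_CA_CA = 3.8  # Å, approximate value for consecutive residues (trans peptide)


def check_ca_distances(ca_coords, tolerance=0.2):
    """Return (i, i+1, distance) for consecutive Cα pairs outside the tolerance."""
    outliers = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - IDEAL_CA_CA) > tolerance:
            outliers.append((i, i + 1, round(d, 2)))
    return outliers


# Toy Cα trace in Å: the second pair is stretched to 5.0 Å.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 5.0, 0.0)]
print(check_ca_distances(trace))
```

An outlier usually points to a chain break, a badly built loop, or an alignment gap that was closed by force.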

31
PROCHECK
http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html
32
Dihydrofolate Reductase (DHFR) multiple sequence alignment
33
Dihydrofolate Reductase (DHFR) alignment