Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates - PowerPoint PPT Presentation

About This Presentation
Title:

Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates

Description:

Title: Multiple Mapping Method: A novel approach to the sequence-to-structure alignment problem in comparative protein structure modeling (PowerPoint PPT presentation)

Slides: 24
Provided by: Braj155

Transcript and Presenter's Notes



1
Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates
  • András Fiser
  • Department of Biochemistry and Seaver Center for Bioinformatics
  • Albert Einstein College of Medicine, Bronx, New York, USA

2
Comparative protein structure modeling
START
  • Template search (multiple templates)
  • Target-template alignment (Multiple Mapping Method)
  • Model building (loop and side-chain modeling)
  • Model evaluation (statistical potential)
END
3
Why do we need sequence alignments?
  • Sequence vs. sequence: establishing residue equivalencies between two proteins to locate conserved/variable regions
  • Sequence vs. databases: querying sequence databases
  • Sequence vs. structure: generating the input alignment for comparative modeling/threading
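Establishing residue equivalencies is the most basic of the uses listed above. A minimal sketch, assuming a pairwise alignment given as two equal-length gapped strings with '-' marking gaps; the function name residue_equivalences is hypothetical. It walks the alignment column by column and records which ungapped residue indices end up paired.

```python
def residue_equivalences(row_a: str, row_b: str) -> list[tuple[int, int]]:
    """Return (i, j) pairs: residue i of sequence A is aligned to residue j of B.

    Indices are 0-based positions in the ungapped sequences; '-' marks a gap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = []
    i = j = 0  # counters over the ungapped residues of A and B
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.append((i, j))  # both columns hold residues: an equivalence
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


print(residue_equivalences("AC-DE", "A-GDE"))  # [(0, 0), (2, 2), (3, 3)]
```

Columns where either row has a gap contribute no equivalence but still advance the residue counter of the row that is not gapped.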
4
Ranking of models built on alternative alignments
Example: template 1a6m, target 1spg chain B, 21% sequence identity

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW  DWTDAERAAIKALWGKIDVGEIGP-QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D  DWTDAERAAIKALWGKI-DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH

Problem: none of the currently available methods produce consistently superior results in all cases.
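The 21% figure above is a sequence identity computed from an alignment. Identity definitions differ in the denominator (aligned positions, length of the shorter sequence, or full alignment length), and the slide does not say which one was used. A minimal sketch using the aligned-positions convention; percent_identity is a hypothetical name, and this is not necessarily how the 21% value was obtained.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity over columns where both aligned rows hold a residue ('-' = gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns without a gap in either row.
    paired = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not paired:
        return 0.0
    identical = sum(a == b for a, b in paired)
    return 100.0 * identical / len(paired)


print(percent_identity("ACDEF", "ACDKL"))  # 60.0
print(percent_identity("AC-DE", "A-GDE"))  # 100.0
```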
5
Alternative solutions vs. sequence similarity
Instead of relying on a single alignment method, one should combine the results of several alternative techniques.
6
Multiple Mapping Method
  • Idea: improve the accuracy of sequence-to-structure alignment by optimally splicing alternative inputs.
  • Three components:
    • Sampling
    • Algorithm
    • Scoring function

7
MMM scoring function: increasing the dimensionality of input information

ClustalW mapping
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Target CLW  DWTDAERAAIKALWGKIDVGEIGP-QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV

Template    KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----

Align2D mapping
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Target A2D  DWTDAERAAIKALWGKI-DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV

Template    KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target A2D  QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH
Each mapping places every target residue in a different template environment. Assess the fitness of each mapping.
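The point of this slide is that each alternative alignment places the target residues into different template environments, and each placement can be scored. A minimal sketch, assuming one environment label per template position (for example 'H', 'E', 'C') and a small residue-environment preference table; the labels, the scores and the name mapping_fitness are hypothetical, and real environment-specific scores such as FUGUE's use much richer environment classes.

```python
def mapping_fitness(target, template_env, mapping, table, default=0):
    """Score one mapping: sum of table[(target residue, template environment)] over aligned pairs.

    mapping: list of (target_index, template_index) pairs, 0-based.
    template_env: one environment label per template position (hypothetical labels).
    table: hypothetical residue-environment preference scores; unseen pairs score `default`.
    """
    return sum(table.get((target[i], template_env[j]), default) for i, j in mapping)


table = {("L", "H"): 3, ("K", "H"): 1, ("E", "C"): 2}  # toy preferences
target, env = "LKE", "CHHC"
mapping_1 = [(0, 1), (1, 2), (2, 3)]  # L->H, K->H, E->C
mapping_2 = [(0, 0), (1, 1), (2, 2)]  # L->C, K->H, E->H
print(mapping_fitness(target, env, mapping_1, table))  # 6
print(mapping_fitness(target, env, mapping_2, table))  # 1
```

The same target residues get very different scores depending on which template environments the mapping assigns them to, which is what makes the alternative mappings comparable.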
8
Multiple Mapping Method algorithm
Step 1: Identify variable regions from the consensus alignment of the input set.
Step 2: Select the best-scoring variable segments and combine them with the core region of the alignment.
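The two steps can be sketched as follows, under stated assumptions: each input alignment is reduced to a list giving, for every target residue, the aligned template position (or None for a gap); the core is where all inputs agree; every maximal run of disagreement is a variable segment; and the segment from the best-scoring input is copied into the result. The names ResidueMap, variable_segments, splice and toy_score are hypothetical, the score function is a pluggable placeholder, and this sketch does not check that spliced segments keep template positions increasing, which a real implementation would need to handle.

```python
from typing import Callable, Optional, Sequence

# Hypothetical representation: entry i is the template position aligned to
# target residue i, or None if residue i is aligned to a gap.
ResidueMap = list[Optional[int]]


def variable_segments(inputs: Sequence[ResidueMap]) -> list[tuple[int, int]]:
    """Return maximal half-open runs [start, end) where the inputs disagree.

    All inputs are assumed to cover the same target residues (equal length).
    """
    n = len(inputs[0])
    segments: list[tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(n):
        agree = all(m[i] == inputs[0][i] for m in inputs)
        if not agree and start is None:
            start = i                      # a variable run begins
        elif agree and start is not None:
            segments.append((start, i))    # a variable run ends
            start = None
    if start is not None:
        segments.append((start, n))        # run extends to the C-terminus
    return segments


def splice(inputs: Sequence[ResidueMap],
           score: Callable[[ResidueMap, int, int], float]) -> ResidueMap:
    """Keep the consensus core; per variable segment, copy the best-scoring input."""
    result = list(inputs[0])               # core positions are identical in every input
    for s, e in variable_segments(inputs):
        best = max(inputs, key=lambda m: score(m, s, e))
        result[s:e] = best[s:e]
    return result


# Toy usage: two alternative alignments of a 6-residue target.
clw = [0, 1, 2, 4, 5, 6]
a2d = [0, 2, 3, 4, 6, 7]
favoured = {1, 2, 6, 7}                    # hypothetical "good environment" template positions


def toy_score(m: ResidueMap, s: int, e: int) -> float:
    """Count residues in the segment placed at a favoured template position."""
    return sum(j in favoured for j in m[s:e])


print(variable_segments([clw, a2d]))       # [(1, 3), (4, 6)]
print(splice([clw, a2d], toy_score))       # [0, 1, 2, 4, 6, 7]
```

With the toy score, the first variable segment comes from one input and the second from the other, so the result mixes segments from both sources while keeping the shared core untouched.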
Example: template 1a6m, target 1spg chain B, 21% sequence identity

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW  DWTDAERAAIKALWGKIDVGEIGP-QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D  DWTDAERAAIKALWGKI-DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH
9
MMM example using an ideal scoring function
Figure labels: Experimental vs. ClustalW model, RMSD 2.0 Å; Experimental vs. Align2D model, RMSD 2.7 Å; CLUSTALW 2.6 Å; ALIGN2D 6.1 Å

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target MMM  DWTDAERAAIKALWGKI-DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target MMM  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----

Figure label: Experimental vs. MMM model, RMSD 1.8 Å
10
Multiple Mapping Method scoring function (1)
  • A composite scoring function assesses the compatibility/fit of alternative variable segments in the template structural environment.
  • The composite scoring function consists of three mostly non-overlapping components:
    • Environment-specific substitution matrices (FUGUE¹).
    • A scoring scheme based on a comparison of secondary structure types, PHD vs. DSSP (H3P2²).
    • A statistically derived residue-residue contact energy (Rykunov and Fiser³).

¹ Shi et al., J. Mol. Biol. (2001) 310, 243-257.
² Rice et al., J. Mol. Biol. (1997) 267, 1026-1038.
³ Rykunov and Fiser, Proteins (2007) 67, 559-568.
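The slide names the three components but not how they are combined. A minimal sketch, assuming each component is computed separately for every candidate segment, standardized across candidates (z-scores) so the different scales can be summed, and combined with equal weights; the normalization, the weights, the sign conventions and the names zscores and best_candidate are all assumptions, not the published MMM formula.

```python
from statistics import mean, pstdev


def zscores(values: list[float]) -> list[float]:
    """Standardize values so components on different scales can be summed."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def best_candidate(candidates: list[tuple[float, float, float]],
                   weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> int:
    """Return the index of the candidate segment with the highest combined score.

    Each candidate is (substitution_score, ss_agreement, contact_energy):
    the first two are assumed higher-is-better, the energy lower-is-better.
    """
    sub = zscores([c[0] for c in candidates])
    ss = zscores([c[1] for c in candidates])
    contact = zscores([-c[2] for c in candidates])  # negate: lower energy is better
    totals = [weights[0] * a + weights[1] * b + weights[2] * c
              for a, b, c in zip(sub, ss, contact)]
    return max(range(len(candidates)), key=lambda k: totals[k])


# Two hypothetical candidates for the same variable segment.
print(best_candidate([(10.0, 0.6, -5.0), (12.0, 0.8, -7.0)]))  # 1
```

Standardizing per segment keeps one component with a large numeric range from dominating the sum; whether MMM does this, or uses fixed weights on raw scores, is not stated in the talk.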
11
MMM performance on 1400 pairs
12
MMM performance on 87 pairs, meta-servers
ESypred3D Consensus
13
Sampling vs. Scoring
14
Summary
  • The Multiple Mapping Method optimally combines alternative alignments obtained from different methods or scoring functions.
  • On a benchmark dataset of 6635 protein-pair structural alignments, comparative models built using MMM alignments are, on average, approximately 0.3 Å and 0.5 Å more accurate over the whole sequence identity spectrum and in the < 30% target-template sequence identity region, respectively, than models built using the alternative input alignments (average accuracy roughly 3 Å and 4 Å).

15
Optimally combining multiple templates
16
(No Transcript)
17
Selecting multiple templates
  • The target sequence is searched with PSI-BLAST.
  • Hits are selected if their sequence overlap with the target is > 60% of the actual SCOP domain length, or > 75% of the PDB chain length when the SCOP classification is missing.
  • An iterative clustering procedure identifies the most suitable templates to combine. Templates are selected or discarded by a hierarchical selection procedure that accounts for:
    • sequence identity between the templates and the target sequence,
    • sequence identity among the templates,
    • crystal resolution of the templates,
    • contribution of the templates to the target sequence (i.e. whether a region is covered by several templates or by a single template only).
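Only the coverage thresholds are stated precisely above; the hierarchical clustering itself is not. A minimal sketch that applies the coverage rule and then orders the surviving hits by identity to the target and by crystal resolution; the Hit fields, the ordering, the function names and the template names are hypothetical, and the template-template identity and coverage-contribution criteria are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One PSI-BLAST hit (fields are hypothetical, chosen for this sketch)."""
    name: str
    overlap: int                    # hit residues overlapping the target
    scop_domain_len: Optional[int]  # None if the chain lacks a SCOP classification
    chain_len: int
    identity: float                 # % sequence identity to the target
    resolution: float               # crystal resolution in Å (lower is better)


def passes_coverage(h: Hit) -> bool:
    """Coverage rule: > 60% of the SCOP domain, else > 75% of the PDB chain."""
    if h.scop_domain_len is not None:
        return h.overlap > 0.60 * h.scop_domain_len
    return h.overlap > 0.75 * h.chain_len


def rank_templates(hits: list[Hit]) -> list[Hit]:
    """Filter by coverage, then sort by identity to target, then resolution (assumed order)."""
    kept = [h for h in hits if passes_coverage(h)]
    return sorted(kept, key=lambda h: (-h.identity, h.resolution))


hits = [
    Hit("tmplA", 90, 120, 150, 35.0, 2.0),    # 90 > 72: kept
    Hit("tmplB", 60, 120, 130, 40.0, 1.8),    # 60 <= 72: dropped
    Hit("tmplC", 100, None, 130, 35.0, 1.5),  # no SCOP: 100 > 97.5, kept
]
print([h.name for h in rank_templates(hits)])  # ['tmplC', 'tmplA']
```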

18
Single versus multiple templates
Using a dataset of 765 proteins of known structure, two sets of models were built: (1) using a single template (the best E-value hit; light bars) and (2) using multiple templates (grey bars).
19
And increased coverage
Histogram of model length differences. Each query sequence is modeled using a single template and using multiple templates. The histogram shows the frequency of (Lm − Ls), where Lm is the length of the model built using multiple templates and Ls is the length of the model built using a single template.
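The histogram above is built from per-query length differences. A minimal sketch, assuming the two model lengths are available as (Lm, Ls) pairs and using a bin width of 10 residues; the input values, the bin width and the name length_gain_histogram are hypothetical.

```python
from collections import Counter


def length_gain_histogram(pairs: list[tuple[int, int]], bin_width: int = 10) -> Counter:
    """Bin Lm - Ls per query; each key is the lower edge of a bin_width-wide bin."""
    return Counter(((lm - ls) // bin_width) * bin_width for lm, ls in pairs)


# Hypothetical (Lm, Ls) model lengths for four queries.
pairs = [(150, 150), (160, 150), (175, 150), (148, 150)]
print(sorted(length_gain_histogram(pairs).items()))
# [(-10, 1), (0, 1), (10, 1), (20, 1)]
```

Floor division places negative differences (a shorter multiple-template model) in the bin whose lower edge is below zero, so a difference of -2 lands in the [-10, 0) bin.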
20
The X-ray structure, the model built with multiple templates and the model built with a single template are shown in grey, red, and blue, respectively. In two exposed regions, A and B, the multiple-template model agrees much better with the X-ray structure than the single-template model.
21
Increased coverage. The X-ray structure, the model built with multiple templates and the model built with a single template are shown in grey, red, and blue, respectively. Adding templates made it possible to build a longer model that includes an extra beta-turn-beta-turn region (20 amino acids), shown in ribbon representation.
22
Acknowledgement
  • Lab members
  • Dmitrij Rykunov
  • Rotem Rubinstein
  • J. Eduardo Fajardo
  • Carlos J. Madrid-Aliste
  • Veena Venkatagiriyappa
  • Joseph Dybas
  • Mario Pujato
  • Brajesh Rai
  • Narcis Fernandez-Fuentes
  • Elliot Sternberger

23
http://www.fiserlab.org/servers