Title: Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates
1Multiple Mapping Method with Multiple Templates
(M4T): optimizing sequence-to-structure
alignments and combining unique information from
multiple templates
- András Fiser
- Department of Biochemistry and
- Seaver Center for Bioinformatics
- Albert Einstein College of Medicine
- Bronx, New York, USA
2Comparative protein structure modeling
START
- Template Search: Multiple Templates
- Target-Template Alignment: Multiple Mapping Method
- Model Building: Loop, side chain modeling
- Model Evaluation: Statistical potential
END
3Why do we need sequence alignments?
- Sequence vs. sequence: establishing residue equivalencies between two proteins to locate conserved/variable regions
- Sequence vs. databases: querying sequence databases
- Sequence vs. structure: generating the input alignment for comparative modeling / threading
4Ranking of models built on alternative alignments
Example
Template: 1a6m; Target: 1spg, chain B
21% sequence identity
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW  DWTDAERAAIKALWGKIDVGEIGP-QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D  DWTDAERAAIKALWGKI-DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH
Problem: None of the currently available methods
produces consistently superior results in all cases.
5 Alternative solutions vs. sequence similarity
Instead of relying on just one alignment method,
one should combine results of several alternative
techniques
6Multiple Mapping Method
- Idea
  - Improve the accuracy of sequence-to-structure alignment by optimally splicing alternative inputs.
- Three components
  - Sampling
  - Algorithm
  - Scoring function
7MMM scoring function: increasing the
dimensionality of input information
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Target CLW  DWTDAERAAIKALWGKIDVGEIGP-QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV

Template    KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----

Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Target A2D  DWTDAERAAIKALWGKI-DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAV

Template    KKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target A2D  QNMDNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH
Different mappings identify a different
environment for each residue to be aligned. Assess the
fitness of each mapping.
8Multiple Mapping Method Algorithm
Step 1: Identify variable regions from the
consensus alignment of the input set.
Step 2: Select the best-scoring variable segments and
combine them with the core
region of the alignment.
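The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MMM implementation: the names `to_mapping`, `variable_regions`, and `splice` are hypothetical, the `score` callable stands in for whatever scoring function is used, and treating the core as the target residues that every input alignment maps to the same template residue is an assumption made here for simplicity.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# For each target residue: index of the aligned template residue, or None if unaligned.
Mapping = List[Optional[int]]


def to_mapping(template_row: str, target_row: str) -> Mapping:
    """Convert one gapped pairwise alignment into a target-to-template mapping."""
    if len(template_row) != len(target_row):
        raise ValueError("alignment rows must have equal length")
    mapping: Mapping = []
    template_index = 0
    for t_char, q_char in zip(template_row, target_row):
        if q_char != "-":
            mapping.append(template_index if t_char != "-" else None)
        if t_char != "-":
            template_index += 1
    return mapping


def variable_regions(mappings: Sequence[Mapping]) -> List[Tuple[int, int]]:
    """Step 1 (sketch): half-open target ranges where the inputs disagree.

    Assumption: a core position is one that all inputs align to the same
    template residue; everything else belongs to a variable region.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    length = len(mappings[0])
    for i in range(length):
        values = {m[i] for m in mappings}
        is_core = len(values) == 1 and None not in values
        if is_core:
            if start is not None:
                regions.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        regions.append((start, length))
    return regions


def splice(
    mappings: Sequence[Mapping],
    score: Callable[[int, Sequence[Optional[int]]], float],
) -> Mapping:
    """Step 2 (sketch): keep the core and take the best-scoring variant per region.

    `score(start, segment)` is a hypothetical callable; higher means a better fit.
    """
    result = list(mappings[0])  # core positions are identical in all inputs
    for start, end in variable_regions(mappings):
        best = max(mappings, key=lambda m: score(start, m[start:end]))
        result[start:end] = best[start:end]
    return result
```

For example, with two toy alignments of the same target, `splice([to_mapping("ABCDEF", "WXY-ZV"), to_mapping("ABCDEF", "WX-YZV")], score)` keeps the four residues both inputs agree on and lets `score` decide how the third target residue is mapped.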
Example
Template: 1a6m; Target: 1spg, chain B
21% sequence identity
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target CLW  DWTDAERAAIKALWGKIDVGEIGP-QALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM
Target A2D  DWTDAERAAIKALWGKI-DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target CLW  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Target A2D  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKF-G---PSAFTPEIHEAWQKFLAVVVSALGRQYH
9MMM example using ideal scoring function
Figure labels: Experimental; ClustalW, RMSD 2.0 Å; Align2D, RMSD 2.7 Å
Figure labels: CLUSTALW 2.6 Å; ALIGN2D 6.1 Å
Template    VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKK
Target MMM  DWTDAERAAIKALWGKI-DVGEIGPQALSRLLIVYPWTQRHFKGFGNISTNAAILGNAKVAEHGKTVMGGLDRAVQNM

Template    GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH-PGDFGADAQGAMNKALELFRKDIAAKYKELGY
Target MMM  DNIKNVYKQLSIKHSEKIHVDPDNFRLLGEIITMCVGAKFGPSAFTPEIHEAWQKFLAVVVSALGRQYH----
Figure labels: Experimental; MMM, RMSD 1.8 Å
10Multiple Mapping Method scoring function (1)
- A composite scoring function to assess the compatibility/fit of alternative variable segments in the template structural environment.
- The composite scoring function consists of three mostly non-overlapping components:
  - Environment-specific substitution matrices (FUGUE [1]).
  - A scoring scheme based on a comparison (PHD vs. DSSP) of the secondary structure types (H3P2 [2]).
  - Statistically derived residue-residue contact energy (Rykunov and Fiser [3]).
[1] Shi et al., J. Mol. Biol. (2001) 310, 243-257
[2] Rice et al., J. Mol. Biol. (1997) 267, 1026-1038
[3] Rykunov and Fiser, Proteins (2007) 67, 559-68
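As a rough illustration of how several component scores could be merged into one value, the sketch below combines named terms as a weighted sum. The slides do not state how the three components are actually combined, so the weighted sum, the weights, and the names `composite_score` and `secondary_structure_term` are all assumptions. The secondary-structure term is a simple match count between a predicted string for the target and an observed string for the template; it is a stand-in and not the H3P2 scheme.

```python
from typing import Callable, Dict, Optional, Sequence

# A segment lists, for consecutive target residues, the aligned template index (or None).
Segment = Sequence[Optional[int]]
# A component scores a segment that begins at target position `start`.
Component = Callable[[int, Segment], float]


def secondary_structure_term(predicted: str, observed: str) -> Component:
    """Hypothetical stand-in: count aligned pairs whose secondary structure types match.

    `predicted` holds one type per target residue (e.g. from a predictor),
    `observed` holds one type per template residue (e.g. from the structure).
    """
    def term(start: int, segment: Segment) -> float:
        matches = 0
        for offset, template_index in enumerate(segment):
            if template_index is None:
                continue  # unaligned target residue contributes nothing
            if predicted[start + offset] == observed[template_index]:
                matches += 1
        return float(matches)

    return term


def composite_score(
    start: int,
    segment: Segment,
    components: Dict[str, Component],
    weights: Dict[str, float],
) -> float:
    """Assumed combination rule: weighted sum of the component scores."""
    return sum(weights[name] * term(start, segment) for name, term in components.items())
```

A caller would supply one callable per component (environment-specific substitution, secondary structure, contact energy) and pass the resulting function wherever alternative variable segments need to be ranked.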
11MMM performance on 1400 pairs
12MMM performance on 87 pairs, meta-servers
ESypred3D Consensus
13Sampling vs. Scoring
14Summary
- The Multiple Mapping Method optimally combines alternative alignments obtained from different methods or scoring functions.
- On a benchmark dataset of 6635 protein pair structural alignments, comparative models built using MMM alignments are approximately 0.3 Å and 0.5 Å more accurate on average over the whole spectrum of sequence identity and in the <30% target-template sequence identity region, respectively, than models built using the alternative input alignments (average accuracy of 3 and 4 Å).
15Optimally combining multiple templates
17Selecting multiple templates
- Templates are searched with the target sequence using PSI-BLAST.
- Hits are selected if the sequence overlap with the target is >60% of the actual SCOP domain length, or more than 75% of the PDB chain length in case of a missing SCOP classification.
- An iterative clustering procedure identifies the most suitable templates to combine. Templates are selected or discarded according to a hierarchical selection procedure that accounts for:
  - sequence identity between templates and target sequence,
  - sequence identity among templates,
  - crystal resolution of the templates,
  - contribution of templates to the target sequence (i.e. whether a region is covered by several templates or by a single template only).
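The coverage rule for accepting hits can be written as a small filter. This is a minimal sketch under stated assumptions: the `Hit` record and its field names are hypothetical, overlap and lengths are counted in residues, and only the two thresholds (60% of the SCOP domain length, 75% of the PDB chain length when no SCOP classification exists) come from the slide. The clustering and hierarchical selection steps are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical record for one PSI-BLAST hit (field names are illustrative)."""
    template_id: str
    overlap: int                          # residues overlapping with the target
    pdb_chain_length: int                 # length of the template PDB chain
    scop_domain_length: Optional[int] = None  # None if no SCOP classification


def passes_coverage(hit: Hit, scop_fraction: float = 0.60, chain_fraction: float = 0.75) -> bool:
    """Apply the coverage thresholds stated on the slide."""
    if hit.scop_domain_length is not None:
        return hit.overlap > scop_fraction * hit.scop_domain_length
    # No SCOP classification: fall back to the PDB chain length.
    return hit.overlap > chain_fraction * hit.pdb_chain_length


def select_hits(hits: List[Hit]) -> List[Hit]:
    """Keep only hits that satisfy the coverage rule, preserving input order."""
    return [hit for hit in hits if passes_coverage(hit)]
```

The thresholds are exposed as keyword arguments so that the stricter chain-length cutoff and the SCOP-based cutoff can be varied independently when experimenting.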
18Single versus multiple templates
Using a dataset of 765 proteins with known
structure, two sets of models were built: (1)
using one template (the best E-value hit; light
bars), and (2) using multiple templates (grey bars).
19And increased coverage
Histogram of the difference in model length. Each
query sequence is modeled using single and
multiple templates. The histogram shows the
frequency of (Lm - Ls), where Lm is the length of the model built
using multiple templates and Ls is the length of the
model built using a single template.
20The x-ray structure, the model built with multiple
templates, and the model built with a single template are shown in
grey, red, and blue, respectively. The multiple-template
model agrees much better with the x-ray structure in two exposed
regions, A and B, than the model built using a
single template.
21Increased Coverage
The x-ray structure, the
model with multiple templates, and the model with a
single template are shown in grey, red, and
blue, respectively. The addition of extra
templates made it possible to build a longer model that
includes an extra beta-turn-beta-turn region (20
amino acids), depicted as a ribbon.
22Acknowledgement
- Lab members
- Dmitrij Rykunov
- Rotem Rubinstein
- J. Eduardo Fajardo
- Carlos J. Madrid-Aliste
- Veena Venkatagiriyappa
- Joseph Dybas
- Mario Pujato
- Brajesh Rai
- Narcis Fernandez-Fuentes
- Elliot Sternberger
23http://www.fiserlab.org/servers