# Introduction to bioinformatics - PowerPoint PPT Presentation

Description:

### Title: Introduction to bioinformatics Author: pirovano Last modified by: gebruiker Created Date: 3/14/2006 9:06:45 AM Document presentation format – PowerPoint PPT presentation

Slides: 81
Provided by: pir86
Transcript and Presenter's Notes

1
Introduction to bioinformatics 2008, Lecture 8
Multiple Sequence Alignment (II)
2
Progressive multiple alignment
[Diagram: pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) → similarity matrix (5×5) → scores to distances → guide tree → multiple alignment, with iteration possibilities indicated.]
3
Progressive alignment strategy
1. Perform pairwise alignments of all of the sequences (all against all, i.e. N(N-1)/2 alignments).
2. Use the alignment scores to make a similarity (or distance) matrix.
3. Use that matrix to produce a guide tree.
4. Align the sequences successively, guided by the order and relationships indicated by the tree (N-1 alignment steps).
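The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular program: the scoring parameters are made up, the guide tree is built by simple average-linkage clustering (an assumption made for brevity), and the function names `nw_score` and `guide_tree_order` are hypothetical. The sketch only reports the merge order. It does not build the alignment itself.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch with a linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def guide_tree_order(seqs):
    """Return (pairwise scores, list of merge steps) for a list of sequences."""
    n = len(seqs)
    # Step 1: all-against-all pairwise alignment gives N(N-1)/2 scores.
    scores = {(i, j): nw_score(seqs[i], seqs[j])
              for i, j in combinations(range(n), 2)}
    # Step 2: convert similarities to distances (high score = small distance).
    top = max(scores.values())
    dist = {pair: top - s for pair, s in scores.items()}

    def cluster_dist(c1, c2):
        # Average distance between all members of the two clusters.
        total = sum(dist[tuple(sorted((x, y)))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    # Steps 3 and 4: repeatedly join the two closest clusters (N-1 joins).
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2))
    return scores, merges

seqs = ["ACDEFG", "ACDEFH", "WWWWWW", "WWWWWY"]
scores, merges = guide_tree_order(seqs)
print(len(scores), "pairwise alignments,", len(merges), "merge steps")
```

For four sequences this gives 6 pairwise alignments and 3 merge steps, matching N(N-1)/2 and N-1.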

4
Progressive alignment strategy
• Methods
• Biopat (Hogeweg and Hesper 1984, first integrated method ever)
• MULTAL (Taylor 1987)
• DIALIGN (1 & 2, Morgenstern 1996)
• PRRP (Gotoh 1996)
• ClustalW (Thompson et al 1994)
• PRALINE (Heringa 1999)
• T-Coffee (Notredame 2000)
• POA (Lee 2002)
• MUSCLE (Edgar 2004)
• PROBCONS (Do 2005)

5
Flavodoxin fold: aligning 13 flavodoxins and cheY
5(βα) fold
6
Flavodoxin-cheY NJ tree
7
Flavodoxin fold helix-beta-helix
8
Flavodoxin family - TOPS diagrams
The basic topology of the flavodoxin fold is
given below, the other four TOPS diagrams show
flavodoxin folds with local insertions of
secondary structure elements.
[Figure: TOPS diagrams with numbered secondary structure elements (1-5); legend: α-helix, β-strand.]
9
Flavodoxin-cheY NJ tree
10
Flavodoxin-cheY Pre-processing (prepro=1500)
11
Clustal, ClustalW, ClustalX
• CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct a guide tree (see lecture on phylogenetic methods).
• Sequence blocks are represented by a profile, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.
• Further carefully crafted heuristics include:
• (i) local gap penalties
• (ii) automatic selection of the amino acid substitution matrix
• (iii) automatic gap penalty adjustment
• (iv) a mechanism to delay alignment of sequences that appear to be distant at the time they are considered.
• CLUSTAL (W/X) does not allow iteration (Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002)
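To make the idea of a weighted profile concrete, here is a small sketch that turns an aligned block into per-column residue frequencies, with each sequence contributing in proportion to a weight. The weights below are invented for illustration. How CLUSTAL derives them from branch lengths is not reproduced here, and `weighted_profile` is a hypothetical helper name.

```python
from collections import defaultdict

def weighted_profile(block, weights):
    """Per-column residue frequencies of an aligned block.

    block   -- list of equal-length aligned strings
    weights -- one non-negative weight per sequence (illustrative values)
    """
    total = sum(weights)
    profile = []
    for column in zip(*block):
        freqs = defaultdict(float)
        for residue, weight in zip(column, weights):
            freqs[residue] += weight / total
        profile.append(dict(freqs))
    return profile

# Two near-identical sequences get low weights, the divergent one a high weight.
block = ["AC", "AC", "GC"]
weights = [1.0, 1.0, 2.0]
profile = weighted_profile(block, weights)
print(profile)
```

With equal weights the two identical sequences would dominate column 1. With the weights above, A and G each contribute half.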

12
ClustalW web-interface
13
• CLUSTAL X (1.64b) multiple sequence alignment
Flavodoxin-cheY
• 1fx1 -PKALIVYGSTTGNTEYTAETIARQLANAG-Y-E
VDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD
-SLEETGAQGRK
VDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD
-SLEETGAQGRK
• FLAV_DESGI MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-E
-DLDRAGLKDKK
• FLAV_DESSA MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-D
• FLAV_DESDE MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-E
-EFNRFGLAGRK
• FLAV_CLOAB -MKISILYSSKTGKTERVAKLIEEGVKRSGNI-E
VKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN---------ISWEMKKWID
-ESSEFNLEGKL
• FLAV_MEGEL --MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-D
VESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF
-TDLAPKLKGKK
• 4fxn ---MKIVYWSGTGNTEKMAELIAKGIIESG-K-D
VNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI
-EEISTKISGKK
• FLAV_ANASP SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT--
--LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS
-ELDDVDFNGKL
• FLAV_AZOVI -AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD-
--ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP
-KIEGLDFSGKT
DKLPEVDMKDLP
--LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN
--VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD-------WDDFFP
-TLEEIDFNGKL
FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG
-LELLKTIR---
• . ... .
.

The secondary structures of 4 sequences are known
and can be used to assess the alignment (red is
β-strand, blue is α-helix)
14
There are problems
• Accuracy is very important!
• Progressive multiple alignment is a greedy strategy: alignment errors made during the construction of the MSA cannot be repaired later, and these errors are propagated into later progressive steps.
• Comparisons of sequences at early steps during progressive alignment cannot make use of information from other sequences.
• It is only later during the alignment progression that more information from other sequences (e.g. through profile representation) becomes employed in the alignment steps.

15
Progressive multiple alignment
"Once a gap, always a gap" (Feng and Doolittle, 1987)
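A tiny sketch of what the rule means in practice: when a later progressive step needs a new gap at some position, a gap column is added to every row of the already aligned block, and gaps placed earlier are never removed. The helper name `add_gap_column` and the example block are illustrative assumptions.

```python
def add_gap_column(block, pos):
    """Insert a gap column at index pos in every row of an aligned block."""
    return [row[:pos] + "-" + row[pos:] for row in block]

# A block produced by an earlier step; the first row already carries a gap.
block = ["AC-GT", "ACTGT"]
# A later step requires a new gap column at position 2.
new_block = add_gap_column(block, 2)
print(new_block)
```

The earlier gap in the first row is still there after the update. The block can only grow wider, which is why early misplaced gaps persist.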
16
Strategies for multiple sequence alignment
• Profile pre-processing (Praline)
• Secondary structure-induced alignment
• Matrix extension
• Objective: try to avoid (early) errors

17
PRALINE web-interface
18
Profile pre-processing
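[Diagram: pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) are used to build a pre-alignment for key sequence 1 from sequences 2-5 by master-slave (N-to-1) alignment; the pre-alignment is converted into a pre-profile with one column per amino acid type (A C D . . Y). Labels: Pi, Px.]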
19
Pre-profile generation
[Diagram: pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) and a cut-off determine the pre-alignments; each key sequence (1, 2, ..., 5) gets its own pre-alignment of the other sequences and a corresponding pre-profile (A C D . . Y).]
20
Pre-profile alignment
[Diagram: the pre-profiles of sequences 1-5 (each with columns A C D . . Y) are aligned to give the final alignment of sequences 1-5.]
21
Pre-profile alignment
[Diagram: the pre-alignments of key sequences 1-5, each containing the other sequences, are combined into the final alignment of sequences 1-5.]
22
Pre-profile alignment: alignment consistency
[Diagram: the pre-alignments of key sequences 1-5 and the final alignment, with residue Ala131 of sequence 1 traced through them; labels: A131 A131 L133 C126 A131.]
23
PRALINE pre-profile generation
• Idea: use the information from all query sequences to make a pre-profile for each query sequence that contains information from the other sequences.
• You can use all sequences in each pre-profile, or use only those sequences that will probably align correctly. Incorrectly aligned sequences in the pre-profiles will increase the noise level.
• Select using alignment score: only allow sequences in a pre-profile if their alignment with the key sequence scores higher than a given threshold value. In PRALINE, this threshold is given as prepro=1500 (the alignment score threshold value is 1500; see next two slides).
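The selection rule can be sketched as a simple filter over pairwise scores. The sequence names, the scores and the function name `select_for_preprofile` are made-up illustrations. Only the rule itself (admit a sequence when its alignment score with the key sequence exceeds the threshold) comes from the slide.

```python
def select_for_preprofile(key, pairwise_scores, threshold):
    """Return the sequences admitted to the pre-alignment of `key`.

    pairwise_scores maps an (a, b) pair of sequence names to the score of
    their pairwise alignment; each pair is stored once.
    """
    members = []
    for (a, b), score in pairwise_scores.items():
        if key in (a, b) and score > threshold:
            members.append(b if a == key else a)
    return sorted(members)

# Hypothetical pairwise alignment scores for three sequences.
pairwise_scores = {("s1", "s2"): 1800, ("s1", "s3"): 1200, ("s2", "s3"): 1600}

for key in ("s1", "s2", "s3"):
    print(key, select_for_preprofile(key, pairwise_scores, threshold=1500))
```

With a threshold of 0 every sequence with a positive score is admitted, which corresponds to using all sequences in each pre-profile.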

24
Reliable sequences for pre-profiles
Each curve gives the number of pairwise
alignments (y) scoring less than x. The range
1500 < x < 1800 shows a flat section of the curve
that can serve as a natural cut-off point for
admitting sequences into the pre-alignment blocks
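As a rough illustration of how such a curve is read, the sketch below counts the pairwise alignments scoring less than x. A flat section shows up as a range of x over which the count does not change. The scores are invented and `count_below` is a hypothetical helper.

```python
from bisect import bisect_left

def count_below(scores, x):
    """Number of pairwise alignment scores strictly less than x."""
    return bisect_left(sorted(scores), x)

# Hypothetical pairwise scores with no values between 1500 and 1800.
scores = [900, 1200, 1450, 1850, 2100]
curve = {x: count_below(scores, x) for x in (1000, 1500, 1800, 1900)}
print(curve)
```

Here the count stays at 3 for both x = 1500 and x = 1800, so any threshold in that range admits the same set of alignments.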
25
Global pre-processing (prepro=0)
• Preprocessed profile for sequence 2
LPVAIFGLGDAEGYPD
• 1fx1 KALIVYGSTTGNTEYTAETIARQL-ANAGYEVDS
RDAASVEAFEGFDLVLLGCSTW--GDD---SIELQDDFLFDSLEETGAQG
RKVACFGCGDS-SY-E
• 4fxn -MKIVYWSGTGNTEKMAELIAKGISGKDVNTINV
SDVNIDELLNE-DILILGC---SAMGDEVLEESEFEPFIEEISTKISGKK
VALGSYGWGDGKWMRD
• FLAV_ANASP KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDV
SEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNG
• FLAV_AZOVI KIGLFFGSNTGKTRKVaKSIKKRFDTMSDA-LNV
NRVS-AEDFAQYQFLILgTPTLGPGLSSDCENESWEEFL-PKIEGLDFSG
KTVALfGLGDQVGYPE
• FLAV_CLOAB KISILYSSKTGKTERVaKLIEE--GVKRSGNIEV
KDAVDKKFLQESEGIIFgTPTYYANISWEMK--KW----IDESSEFNLEG
KLGAAfSTANAGGSDI
• FLAV_DESDE KVLIVFGSSTGNTESIaQKLEELIAA-GGHEVTL
RKVAAfASGDQE-Y-E
• FLAV_DESGI KALIVYGSTTGNTEGVaEAIAKTLNSEGTTVVNV
KKVGVfGCGDS-SY-T
• FLAV_DESSA KSLIVYGSTTGNTETAaEYVAEAFENK-EIDVEL
KKVSVfGCGDSD-Y-T
RDAASVEAFEGFDLVLLgCSTW--GDD---SIELQDDFLFDSLEETGAQG
RKVACfGCGDS-SY-E
HDISSKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNG
KLVALfGCGDQEDYAE
KTVALfGLGDQLNYSK
KKVGLfGYGWGSG---
• 3chy KELKFLVVDDFSTRRIVRNLLKELGFNEEAEDGV
LPVLMV---TAEAKKE
• 2fcr NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEES
KSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
• 1fx1 YFCGAVDAIEEKLKNLGA----------------
EIVQD----GLRID--GDPRAARDDIVGWAHDVRGAI--

26
Global pre-processing (prepro=0)
• Preprocessed profile for sequence 3
• 4fxn MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTIN
VSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVAL
FGSYGWGDGKWMRDFE
• 1fx1 ALIVYGSTTGNTEYTAETIARQLANAGYEVDSRD
AASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVAC
FGSYEYFCGA-VDAIE
-AIFGLGDAEGYPDFC
• FLAV_ANASP IGLFYGTQTGKTESVaEIIRD---EFGNDVVTLD
VSQAEVTDLNDYQYLIIgCPTWNIGEL-QSDWEGLYSELDVDFNGKLVAY
• FLAV_AZOVI IGLFFGSNTGKTRKVaKSIKKRFDDETMS-DALN
VNRVSAEDFAQYQFLILgTPTLGEGELENESWEEFLPKIGLDFSGKTVAL
fGQVGYPEGELYSFFK
• FLAV_CLOAB MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMN
LDAVDKKFLQESEGIIFgTPTYYANI--SWEMKKWIDESSENLEGKLGAA
fSTAGGSDIALLTILN
• FLAV_DESDE VLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLN
fAS---GDQEYVPAIE
• FLAV_DESGI ALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVN
fGSYTYFCGA-VDVIE
• FLAV_DESSA MSIVYGSTTGNTETAaEYVAEAFENKEIDVELKN
fGDYTYFCGA-VDAIE
AASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVAC
fGSYEYFCGA-VDAIE
IAKSSKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVAL
fGDYAFCDAGTIRDIE
VRRATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVAL
fGNYSKNFVSAMRILY
fGSYGWGSGEWMDAWK
• 3chy DKELKFLVVDDFSTMRRIVRNLLKELG--FNNVE
PVLMVTAEAKKENIIA
• 4fxn ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKK
IANI
• 1fx1 EKLKNLGAEIVQDGLRIDGDPRAARDDIVGWAHD
VRGA

27
Pre-profiles (prepro=1500)
1
2
28
Pre-profiles (prepro=1500)
13
14
29

Local pre-processing
Local alignments are calculated from high to low
scoring. Each time, the sequence parts
corresponding to a selected local alignment are
blocked, such that the next local alignment has to
emerge before or after the earlier selected one.
This preserves co-linearity of the local
alignments and associated sequence fragments in
the pre-alignments.
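The co-linearity constraint can be sketched as a greedy filter. This is a simplification: it picks from a fixed list of candidate local alignments, whereas the description above recomputes local alignments after blocking. Each candidate is a hypothetical tuple `(score, start_a, end_a, start_b, end_b)` with inclusive coordinates in the two sequences.

```python
def colinear_select(candidates):
    """Greedily keep local alignments, best score first, that lie entirely
    before or entirely after every alignment already chosen, in BOTH
    sequences."""
    chosen = []
    for cand in sorted(candidates, key=lambda c: c[0], reverse=True):
        _, a0, a1, b0, b1 = cand
        compatible = all(
            (a1 < ca0 and b1 < cb0) or (a0 > ca1 and b0 > cb1)
            for _, ca0, ca1, cb0, cb1 in chosen
        )
        if compatible:
            chosen.append(cand)
    # Report the kept alignments in sequence order.
    return sorted(chosen, key=lambda c: c[1])

candidates = [
    (50, 10, 20, 12, 22),  # best hit, kept first
    (40, 30, 40, 1, 8),    # after the best hit in A but before it in B
    (30, 25, 35, 30, 40),  # after the best hit in both sequences
    (20, 0, 5, 0, 6),      # before the best hit in both sequences
]
kept = colinear_select(candidates)
print(kept)
```

The second candidate is rejected even though it scores well, because accepting it would cross the best hit and break co-linearity.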
30
Local pre-processing (locprepro=0)
• Preprocessed profile for sequence 2 2fcr
LPVAIFGLGDAEGYPD
• 1fx1 ...IVYGSTTGNTEYTAETIARQL---ANAGYEV
DDAASVEAFEGFDLVLLGCSTW--GDDSELQ----DDFLFDSLEETGAQG
RKVACFGCGDS-SY-E
• 4fxn KI-VYWS-GTGNTEKMAELIAKGIGKDVNT-INV
SDVNIDELLNE-DILILGCSA--MGDEVEES--EFEPF----IEEISTKG
KKVALFGWGDGKGYG-
• FLAV_ANASP KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDV
SEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNG
• FLAV_AZOVI KIGLFFGSNTGKTRKVaKSIKKTM---SDA-LNV
NRVS-AEDFAQYQFLILgTPTLGEGSDCENE--SWEEFL-PKIEGLDFSG
KTVALfGLGDQVGYPE
• FLAV_CLOAB KISILYSSKTGKTERVaKLIEE--GVKRSGNIEV
KDAVDKKFLQESEGIIFgTPTY-------YANISWEKWI-DESSEFNLEG
KLGAAfSTANSAGGSD
RKVAAfASGDQE-Y-E
• FLAV_DESGI ...IVYGSTTGNTEGVaEAIAKTLNSEGTTVVNV
KKVGVfGCGDS-SY-T
• FLAV_DESSA ...IVYGSTTGNTETAaEYVAEAFENK---EIDV
KKVSVfGCGDSD-Y-T
DDAASVEAFEGFDLVLLgCSTW--GDDSELQ----DDFLFDSLEETGAQG
RKVACfGCGDS-SY-E
• FLAV_ECOLI ..GIFFGSDTGNTENIaKMIQKQLG-K-----DV
KLVALfGCGDQEDYAE
KTVALfGLGDQLNYSK
KKVGLfGYGWGSG---
• 3chy ..................................
L-----GFNNVEEAED
• 2fcr NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEES
KSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV

31
Local pre-processing (locprepro=0)
• Preprocessed profile for sequence 3 4fxn
• 4fxn MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTIN
VSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVAL
FGSYGWGDGKWMRDFE
• 1fx1 ..IVYGSTTGNTEYTAETIARQLANAGYEVDSRD
AASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVAC
FGC---GDSSYVDAIE
F---GLGDAE------
• FLAV_ANASP ..LFYGTQTGKTESVaEIIRD---EFGNDVVTLD
VSQAEVTDLNDYQYLIIgCPTIGE--L-QSDWEGLYSELDVDFNGKLVAY
• FLAV_AZOVI ..LFFGSNTGKTRKVaKSIKKRFDETMSD--ALN
VNRVSAEDFAQYQFLILgTPTLGEGELNESEFLPKIEGLD--FSGKTVAL
fGQVGYGEGSWSTD--
• FLAV_CLOAB MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMN
LDAVD-KKFLQEEGIIFgTPTMKKWIDESSEFN--LEAfSTANSGSDIAL
LGGVAFGKPK------
• FLAV_DESDE ..IVFGSSTGNTEKLEELIAAG----GHEVTLLN
fAS---GDQEY-EHFE
• FLAV_DESGI ..IVYGSTTGNTEGVaEAIAKTLNSEGMETTVVN
fGC---GDSSYTYDIE
• FLAV_DESSA ..IVYGSTTGNTETAaEYVAEAFENKEIDVELKN
fGC---GDS----DYE
AASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVAC
fGC---GDSSYVDAIE
VHDISKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVAL
fGC---GD---QEDYA
---ATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVAL
f---GLGDQNYSKNFV
fGSYGWGSGEWMDAWK
• 3chy .RIV......N...LKEL---GFVEEAEDVDALN
................
• 4fxn ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKK
IANI

32
• CLUSTAL X (1.64b) multiple sequence alignment
Flavodoxin-cheY
• 1fx1 -PKALIVYGSTTGNTEYTAETIARQLANAG-Y-E
VDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD
-SLEETGAQGRK
VDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD
-SLEETGAQGRK
• FLAV_DESGI MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-E
-DLDRAGLKDKK
• FLAV_DESSA MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-D
• FLAV_DESDE MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-E
-EFNRFGLAGRK
• FLAV_CLOAB -MKISILYSSKTGKTERVAKLIEEGVKRSGNI-E
VKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN---------ISWEMKKWID
-ESSEFNLEGKL
• FLAV_MEGEL --MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-D
VESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF
-TDLAPKLKGKK
• 4fxn ---MKIVYWSGTGNTEKMAELIAKGIIESG-K-D
VNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI
-EEISTKISGKK
• FLAV_ANASP SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT--
--LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS
-ELDDVDFNGKL
• FLAV_AZOVI -AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD-
--ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP
-KIEGLDFSGKT
DKLPEVDMKDLP
--LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN
--VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD-------WDDFFP
-TLEEIDFNGKL
FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG
-LELLKTIR---
• . ... .
.

33
Flavodoxin-cheY Pre-processing (prepro=1500)
• 1fx1 -PKALIVYGSTTGNT-EYTAETIARQLANAG-YE
VDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-
DSLEETGAQGRKVACF
• FLAV_DESDE MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HE
EEFNRFGLAGRKVAAf
VDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-
DSLEETGAQGRKVACf
• FLAV_DESSA MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-ID
• FLAV_DESGI MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-ME
EDLDRAGLKDKKVGVf
DKLPEVDMKDLPVAIF
• FLAV_AZOVI -AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MS
DA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-
PKIEGLDFSGKTVALf
• FLAV_ENTAG MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IA
DAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-
• FLAV_ANASP SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DV
VTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLY-
SELDDVDFNGKLVAYf
• FLAV_ECOLI -AITGIFFGSDTGNT-ENIaKMIQKQLGK---DV
PTLEEIDFNGKLVALf
• 4fxn -MK--IVYWSGTGNT-EKMAELIAKGIIESG-KD
VNTINVSDVNIDELL-NEDILILGCSAMGDEVL-------EESEFEPFI-
EEIS-TKISGKKVALF
VESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL-------EDSVVEPFF-
TDLA-PKLKGKKVGLf
• FLAV_CLOAB -MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIE
VKTMNLDAVD-KKFLQESEGIIFgTPTYYAN---------ISWEMKKWI-
DESSEFNLEGKLGAAf
EEAEDGVDALNKLQAGGYGFVI---SDWNMPNM----------DGLELL-
• T
• 1fx1 GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD
---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI-
-------
• FLAV_DESDE ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE
---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL-
-------
• FLAV_DESVH GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD
---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI-
-------

34
Flavodoxin-cheY Local pre-processing (locprepro=300)
• 1fx1 --PKALIVYGSTTGNTEYTAETIARQLANAGYEV
DSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPL--F
DSLEETGAQGRKVACF
DSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPL--F
DSLEETGAQGRKVACf
• FLAV_DESSA -MSKSLIVYGSTTGNTETAaEYVAEAFENKEIDV
• FLAV_DESGI -MPKALIVYGSTTGNTEGVaEAIAKTLNSEGMET
EDLDRAGLKDKKVGVf
• FLAV_DESDE -MSKVLIVFGSSTGNTESIaQKLEELIAAGGHEV
EEFNRFGLAGRKVAAf
• 4fxn --MK--IVYWSGTGNTEKMAELIAKGIIESGKDV
NTINVSDVNIDELLN-EDILILGCSAMGDEVL------E-ESEFEPF--I
EEIS-TKISGKKVALF
ESVRFEDTNVDDVAS-KDVILLgCPAMGSEEL------E-DSVVEPF--F
TDLA-PKLKGKKVGLf
DKLPEVDMKDLPVAIF
• FLAV_ANASP -SKKIGLFYGTQTGKTESVaEIIRDEFGNDVVTL
H--DVSQAEV-TDLNDYQYLIIgCPTWNIGEL--------QSDWEGL--Y
SELDDVDFNGKLVAYf
• FLAV_AZOVI --AKIGLFFGSNTGKTRKVaKSIKKRFDDETMSD
A-LNVNRVSA-EDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEF--L
PKIEGLDFSGKTVALf
APLDVRRATR-EQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEF--T
H--DIAKSSK-EDLEAYDILLLgIPTWYYGEA--------QCDWDDF--F
PTLEEIDFNGKLVALf
• FLAV_CLOAB --MKISILYSSKTGKTERVaKLIEEGVKRSGNIE
VKTMNLDAVDKKFLQESEGIIFgTPTYYA-----------NISWEMKKWI
DESSEFNLEGKLGAAf
AEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--L
• 1fx1 GCGDS--SY-EYFCGA-VD--AIEEKLKNLGAEI
VQD---------------------GLRID--GDPRAARDDIVGWAHDVRG
AI--------
• FLAV_DESVH GCGDS--SY-EYFCGA-VD--AIEEKLKNLgAEI
VQD---------------------GLRID--GDPRAARDDIVGwAHDVRG
AI--------
• FLAV_DESSA GCGDS--DY-TYFCGA-VD--AIEEKLEKMgAVV
KI--------
• FLAV_DESGI GCGDS--SY-TYFCGA-VD--VIEKKAEELgATL
VAS---------------------SLKID--GEPD--SAEVLDwAREVLA
RV--------

35
Strategies for multiple sequence alignment
• Profile pre-processing
• Secondary structure-induced alignment
(Praline-SS)
• Matrix extension
• Objective: integrate secondary structure
information to anchor alignments and avoid errors

36
Protein structure hierarchical levels
TERTIARY STRUCTURE (fold)
37
Why use (predicted) structural information
• Structure more conserved than sequence
• Many structural protein families (e.g. globins) have family members with very low sequence similarities. For example, globin sequence identities can be as low as 10% while the proteins still have an identical fold.
• This means that you can still observe equivalent secondary structures in homologous proteins even if sequence similarities are extremely low.
• But you are dependent on the quality of prediction methods. For example, secondary structure prediction is currently at about 76% correctness, so roughly 1 out of 4 predicted amino acids is still incorrect.

38
How to combine secondary structure and amino acid
information
[Diagram: a dynamic programming search matrix for sequence MDAGSTVILCFV (secondary structure HHHCCCEEEEEE) against sequence MDAASTILCGS (secondary structure HHHHCCEEECC), with amino acid substitution matrices labelled H, C, E and Default.]
39
In terms of scoring
• So how would you score a profile using this extra information?
• Same way of scoring as before, but you can use secondary-structure-specific substitution scores in various combinations.
• Where does it fit in?
• Very important: structure is always more conserved than sequence, so secondary structure elements can help anchor the alignments.
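One possible combination is sketched below: when two positions share the same secondary structure state, a state-specific matrix is used, otherwise a default matrix. This rule is an assumption chosen for illustration, and the toy matrices (plain match/mismatch values) and the names `make_matrix` and `ss_aware_score` are hypothetical. Real exchange matrices would hold a value per amino acid pair.

```python
def make_matrix(match, mismatch):
    """Toy exchange matrix: one score for identities, one for mismatches."""
    return lambda a, b: match if a == b else mismatch

# Hypothetical state-specific matrices for helix (H), strand (E) and coil (C).
SS_MATRICES = {
    "H": make_matrix(4, -1),
    "E": make_matrix(5, -2),
    "C": make_matrix(3, -1),
}
DEFAULT = make_matrix(2, -1)

def ss_aware_score(aa1, ss1, aa2, ss2):
    """Score a residue pair with the matrix matching their shared state,
    falling back to the default matrix when the states differ."""
    if ss1 == ss2 and ss1 in SS_MATRICES:
        return SS_MATRICES[ss1](aa1, aa2)
    return DEFAULT(aa1, aa2)

print(ss_aware_score("A", "H", "A", "H"))  # both helix: helix matrix
print(ss_aware_score("A", "H", "A", "E"))  # states differ: default matrix
```

The function can be dropped into the cell-scoring step of an ordinary dynamic programming alignment without changing the recursion.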

40
Sequences to be aligned
Predict secondary structure
HHHHCCEEECCCEEECCHH HHHCCCCEECCCEEHHH HHHHHHHHHHHH
HCCCEEEE
CCCCCCEECCCEEEECCHH HHHHHCCEEEECCCEECCC
Secondary structure
Align sequences using secondary structure
Multiple alignment
41
Using predicted secondary structure
1fx1 -PK-ALIVYGSTTGNTEYTAETIARQLANAG-YE
VDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFD
S-LEETGAQGRKVACF e eeee b
ssshhhhhhhhhhhhhhttt eeeee stt tttttt seeee b
ee sss ee ttthhhhtt ttss tt
eeeee FLAV_DESVH MPK-ALIVYGSTTGNTEYTaETIARELA
DAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDD
FIPLFDS-LEETGAQGRKVACf e eeeeee
hhhhhhhhhhhhhhh eeeeee eeeeee
hhhhhh
eeeee FLAV_DESGI MPK-ALIVYGSTTGNTEGVaEAIAKTLN
FVPLYED-LDRAGLKDKKVGVf e eeeeee
hhhhhhhhhhhhhh eeeeee hhhhhh eeeeeee
hhhhhh
eeeeee FLAV_DESSA MSK-SLIVYGSTTGNTETAaEYVAEAF
eeeeee hhhhhhhhhhhhhh eeeee
eeeee hhhhhhh h
eeeee FLAV_DESDE MSK-VLIVFGSSTGNTESIaQKLEELIA
FLSLFEE-FNRFGLAGRKVAAf eeee
hhhhhhhhhhhhhh eeeee hhhhhhhhhhheeeee
hhhhhhh hh eeeee 2fcr
LPVAIF eeeee
ssshhhhhhhhhhhhhggg b eeggg s gggggg seeeeeee
stt s s s sthhhhhhhtggg tt
eeeee FLAV_ANASP SKK-IGLFYGTQTGKTESVaEIIRDEFG
ND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSD
WEGLYSE-LDDVDFNGKLVAYf eeeee
hhhhhhhhhhhh eee hhh hhhhhhheeeeee
hhhhhhhhh
eeeeee FLAV_ECOLI -AI-TGIFFGSDTGNTENIaKMIQKQL
DWDDFFPT-LEEIDFNGKLVALf eee
hhhhhhhhhhhh eee hhh hhhhhhheeeee
hhhhh
eeeeee FLAV_AZOVI -AK-IGLFFGSNTGKTRKVaKSIKKRF
DDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENE
SWEEFLPK-IEGLDFSGKTVALf eee
hhhhhhhhhhhhh hhh hhhhhhheeeee
hhhhhhhhh
eeeeee FLAV_ENTAG MAT-IGIFFGSDTGQTRKVaKLIHQKL
hhhhhhhhhhhh hhh hhhhhhheeeee
hhhhh eeeee 4fxn
----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDV
NIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KIS
GKKVALF eeeee
ssshhhhhhhhhhhhhhhtt eeeettt sttttt seeeeee
btttb ttthhhhhhh hst t tt
eeeee FLAV_MEGEL M---VEIVYWSGTGNTEAMaNEIEAAVK
VEPFFTD-LAP-KLKGKKVGLf
hhhhhhhhhhhhhh eeeee hhhhhhhh eeeee

eeeee FLAV_CLOAB M-K-ISILYSSKTGKTERVaKLIEEGVK
RSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWE
MKKWIDE-SSEFNLEGKLGAAf eee
hhhhhhhhhhhhhh eeeeee hhhhhhhhhh eeee
hhhhhhhhh eeeee 3chy
LPVLMV tt eeee s
hhhhhhhhhhhhhht eeeesshh hhhhhhhh eeeee s
sss hhhhhhhhhh ttttt eeee 1fx1
GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD-----------
----------GLRIDGD--PRAARDDIVGWAHDVRGAI--------
eee s ss sstthhhhhhhhhhhttt ee s
eeees gggghhhhhhhhhhhhhh FLAV_
DESVH GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD------
---------------GLRIDGD--PRAARDDIVGwAHDVRGAI-------
- eee hhhhhhhhhhhh
eeeee eeeee
hhhhhhhhhhhhhh FLAV_DESGI GCGDS-SY-TYFCGAVDVI
EKKAEELgATLVAS---------------------SLKIDGE--P--DSA
EVLDwAREVLARV-------- eee
hhhhhhhhhhhh eeeee
hhhhhhhhhhh FLAV_DESSA
GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD-----------------
hhhhhhhhhhhh eeeee
e eee FLAV_DESDE
ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE-----------------
----GLKMEGD--ASNDPEAVASfAEDVLKQL--------
e hhhhhhhhhhhhhh eeeee
ee hhhhhhhhhhh 2fcr
GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSV
RD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
eee ttt ttsttthhhhhhhhhhhtt eee b gggs
s tteet teesseeeettt ss hhhhhhhhhhhhhhhht FLAV_A
FNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------
hhhhhhhhhhhhhh
eeee
hhhhhhhhhhhhhhhh FLAV_ECOLI
DHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
hhhhhhhhhhhhhh eeee
hhhhhhhhhhhhhhhhhh FLAV_AZOVI
GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESS
EAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--
e hhhhhhhhhhhhhh eeeee
hhhhhhhhhhh FLAV_ENTA
G GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSF
SAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------
hhhhhhhhhhhhhhh eeee
hhhhhhh hhhhhhhhhhhh 4fxn
G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------
------------PLIVQNE--PDEAEQDCIEFGKKIANI---------
e eesss shhhhhhhhhhhhtt ee s
eeees ggghhhhhhhhhhhht FLAV
_MEGEL G-----SYGWGSGEWMDAWKQRTEDTgATVIGT-----
-----------------AIVNEM--PDNAPE-CKElGEAAAKA-------
-- hhhhhhhhhhh
eeeee eeee h
hhhhhhhh FLAV_CLOAB STANSIA-GGSDIALLTILNHLMVK
-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfG
ERiANkV--KQIF--
hhhhhhhhhhhhhh eeeee
hhhh hhh hhhhhhhhhhhh h 3chy
-----------TAEAKKENIIAAAQAGASGY-------------------
------VVK----P-FTAATLEEKLNKIFEKLGM------
ess hhhhhhhhhtt see
ees s hhhhhhhhhhhhhhht

G
42
Strategies for multiple sequence alignment
• Profile pre-processing
• Secondary structure-induced alignment
• Matrix extension
• Objective try to avoid (early) errors

43
Integrating alignment methods and alignment
information with T-Coffee
• Integrating different pair-wise alignment
techniques (NW, SW, ..)
• Combining different multiple alignment methods
(consensus multiple alignment)
• Combining sequence alignment methods with
structural alignment techniques
• Plug in user knowledge

44
Matrix extension
• T-Coffee
• Tree-based Consistency Objective Function For
alignmEnt Evaluation
• Cedric Notredame (Bioinformatics for dummies)
• Des Higgins
• Jaap Heringa
• J. Mol. Biol., 302, 205-217 (2000)

45
Using different sources of alignment information

Structure alignments
Clustal
Clustal
Dialign
Lalign
Manual
T-Coffee
46
T-Coffee library system
Seq1  AA1  Seq2  AA2  Weight
3     V31  5     L33  10
3     V31  6     L34  14
5     L33  6     R35  21
5     L33  6     I36  35
47
Matrix extension
[Diagram: the pairwise combinations of four sequences: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.]
48
Search matrix extension: alignment transitivity
49
T-Coffee
Other sequences
Direct alignment
50
Search matrix extension
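A minimal sketch of the extension idea, using the four library entries from the T-Coffee library slide: the weight of a residue pair x-y is its direct weight plus, for every residue z of another sequence that is linked to both, the minimum of the two link weights. The data layout (a dictionary keyed by unordered residue pairs) and the name `extend_library` are assumptions made for this example.

```python
from collections import defaultdict

def extend_library(library):
    """library maps frozenset({(seq, residue), (seq, residue)}) to a weight."""
    neighbours = defaultdict(dict)
    for pair, weight in library.items():
        x, y = tuple(pair)
        neighbours[x][y] = weight
        neighbours[y][x] = weight

    # Start from the direct (pairwise alignment) weights.
    extended = defaultdict(float)
    for pair, weight in library.items():
        extended[pair] += weight

    # Add the contribution of every intermediate residue z.
    for z, linked in neighbours.items():
        items = list(linked.items())
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                (x, wx), (y, wy) = items[i], items[j]
                if x[0] != y[0]:  # x and y must come from different sequences
                    extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)

library = {
    frozenset({(3, "V31"), (5, "L33")}): 10,
    frozenset({(3, "V31"), (6, "L34")}): 14,
    frozenset({(5, "L33"), (6, "R35")}): 21,
    frozenset({(5, "L33"), (6, "I36")}): 35,
}
extended = extend_library(library)
for pair, weight in extended.items():
    print(sorted(pair), weight)
```

In this small example, V31 of sequence 3 gains a weight of 10 to R35 and I36 of sequence 6 via L33 of sequence 5, although the library held no direct entry for those pairs.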
51
T-COFFEE web-interface
52
3D-COFFEE
• Computes structure-based alignments
• Structures associated with the sequences are
retrieved and the information is used to optimise
the MSA
• More accurate but for many (many) proteins we
do not have the structure!

53
but.....
• T-COFFEE (V1.23) multiple sequence alignment
• Flavodoxin-cheY
• 1fx1 ----PKALIVYGSTTGNTEYTAETIARQLANAG-
YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIP
L-FDSLEETGAQGRK-----
YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIP
L-FDSLEETGAQGRK-----
• FLAV_DESGI ---MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-
L-YEDLDRAGLKDKK-----
• FLAV_DESSA ---MSKSLIVYGSTTGNTETAAEYVAEAFENKE-
• FLAV_DESDE ---MSKVLIVFGSSTGNTESIAQKLEELIAAGG-
L-FEEFNRFGLAGRK-----
• 4fxn ------MKIVYWSGTGNTEKMAELIAKGIIESG-
KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEP
F-IEEIS-TKISGKK-----
• FLAV_MEGEL -----MVEIVYWSGTGNTEAMANEIEAAVKAAG-
F-FTDLA-PKLKGKK-----
• FLAV_CLOAB ----MKISILYSSKTGKTERVAKLIEEGVKRSGN
IEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKK
W-IDESSEFNLEGKL-----
--DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDE
FLYDKLPEVDMKDLP-----
• FLAV_ENTAG ---MATIGIFFGSDTGQTRKVAKLIHQKLDGIA-
--DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQE
• FLAV_ANASP ---SKKIGLFYGTQTGKTESVAEIIRDEFGNDV-
--VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEG
L-YSELDDVDFNGKL-----
• FLAV_AZOVI ----AKIGLFFGSNTGKTRKVAKSIKKRFDDET-
M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEE
F-LPKIEGLDFSGKT-----
• FLAV_ECOLI ----AITGIFFGSDTGNTENIAKMIQKQLGKDV-
F-FPTLEEIDFNGKL-----
VE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE-------------
• . . . .

54
Multiple alignment methods
• Multi-dimensional dynamic programming > extension of pairwise sequence alignment.
• Progressive alignment > incorporates phylogenetic information to guide the alignment process
• Iterative alignment > corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences

55
Iteration
Iteration can help in cases where one can learn
from the data produced in a preceding step, so
that the next step can be taken in a more
informed way.
Convergence
Limit cycle
Divergence
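The three outcomes can be told apart with a generic loop that remembers every state it has seen. This is only a sketch: `step` stands for one realignment round, the state must be hashable (for an alignment, for example a tuple of rows), and since true divergence cannot be proven in a finite number of steps the function simply reports that no repeat was found within `max_iter`. All names are illustrative.

```python
def iterate(step, state, max_iter=100):
    """Apply `step` repeatedly and classify the behaviour.

    Returns (outcome, iterations used, cycle period or None).
    """
    seen = {state: 0}
    for i in range(1, max_iter + 1):
        state = step(state)
        if state in seen:
            period = i - seen[state]
            outcome = "convergence" if period == 1 else "limit cycle"
            return outcome, i, period
        seen[state] = i
    return "no convergence within max_iter", max_iter, None

# Toy stand-ins for an iteration step, acting on integers.
print(iterate(lambda x: x // 2, 8))         # settles on a fixed point
print(iterate(lambda x: (x + 1) % 3, 0))    # revisits states with period 3
print(iterate(lambda x: x + 1, 0, 10))      # keeps producing new states
```

In an iterative alignment scheme, a limit cycle means the procedure alternates between a small set of alignments, so a rule for picking one of them is still needed.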
56
Pre-profile alignment: alignment consistency
[Diagram: the pre-alignments of key sequences 1-5 and the final alignment, with residue Ala131 of sequence 1 traced through them; labels: A131 A131 L133 C126 A131.]
57
Flavodoxin-cheY consistency scores (PRALINE prepro=0)
Completely consistently aligned amino acids
1fx1 --7899999999999TEYTAETIARQL8776-66
57777777777777553799VL999ST97775599989-43556667779
8998878AQGRKVACF FLAV_DESVH
-46788999999999TEYTAETIAREL7777-775777777777777755
3799VL999ST97775599989-435566677798998878AQGRKVACF
FLAV_DESDE -47899999999999999999999988776695
658888777777778763YDAVL999SAW987778987775355666666
9777776789GRKVAAF FLAV_DESGI
-46788999999999TEGVAEAIAKTL9997-766788887777778875
39DVVL999ST987776--9889546667776697776557777888888
FLAV_DESSA 936777999999999999999999999887597
65777888888888876399999999STW77765--99995366666777
97998779999999999 4fxn
-8787799999999999999999997766669675677888888888887
77999999988777776--9889577788888897773237888888888
FLAV_MEGEL 9776779999999999999999997777766-6
65666677788899976799999999987777669--8873623344666
95555455778888888 2fcr
79DLLF99999855312888111224555555407777777888888888
FLAV_ANASP -47899LFYGTQTGKTESVAEIIR977765392
2356677777777897779999999999988843--99985557787778
99998879999999999 FLAV_ECOLI
997789999GSDTGNTENIAKMIQ87742229224566788899999955
69999999999755553----99262225555495777767778999999
FLAV_AZOVI --79IGLFFGSNTGKTRKVAKSIK998877596
57577888888999777899999999999877761112222222244555
-5555555778999999 FLAV_ENTAG
94789999999999999999999998755229223234555555555555
688899999998875521111111133477777-7777777999999999
FLAV_CLOAB -86999ILYSSKTGKTERVAK999755555505
7678887888887777765778899998522223--98883422344555
97777777777777777 3chy
01222222233333356666655555552229222222222222211121
63335555755553222888877674533344493332222222222222
Avrg Consist 86677788888888899999999987765548
44455566666666665557888888888766544887666334445566
586666556778888888 Conservation
01255386758489697469639464633430452443554465434735
16658868567554455000000314365446505575435547747759
1fx1 G888799955555559888888888899777-
---7777797787787978---5555555667765556777777788887
99------ FLAV_DESVH G888799955555559888888888
899777----7777797787787978---555555566776555677777
778888799------ FLAV_DESDE
A88878685555555999988888889998879--8777788-9877777
7--8555555554433245667777777777599------ FLAV_DESG
I 87775977755555677777777777777778---88888887
667778777775555555555542424667888887777-------- FL
AV_DESSA 977768777555556777777777777777767887
777777778888-978985555555556536556888888888877----
---- 4fxn 86777755555555266666666655555
55778877679998777779777776655555555554444666666665
55798------ FLAV_MEGEL 8577775666666525556777
77888888868997788898877655867788554433322222221223
3223355557-------- 2fcr
87777357333333377776666777776553333333333333332283
3333333332244444567777777888777633------ FLAV_ANAS
P 9777737753333447778888887777777333344444444
44433833333344444444444455577777788777734------ FL
AV_ECOLI 977743786444444777788888888888833334
44444444444424444455555455577566778888888887773411
0000 FLAV_AZOVI 97776355333333466666667777777
77333344444444444448233335555555555554555888888887
7772311---- FLAV_ENTAG 9777738865555558666666
66677666633333333333333322123333344444444455555665
566666555582------ FLAV_CLOAB
76662722222221244444444445555558788222222222222211
1111122222222222344443333333233399------ 3chy
222227222222224111355431113324578-877789976
66556877776322222222222322222323344444422------ A
vrg Consist 86665656444444466666666666666665666
55555655555556555654444434444433444556666666666668
89999 Conservation 736630574333341634645344447
46710000011010011000000010434744645443225474454448
434301000000 Iteration 0 SP 135136.00 AvSP
10.473 SId 3838 AvSId 0.297
Consistency values are scored from 0 to 10; the
value 10 is represented by the corresponding
amino acid (red)
58
Flavodoxin-cheY consistency scores (PRALINE prepro=1500)
1fx1 -42444IVYGSTTGNTEYTAETIARQL8866
66666577777775667888DLVLLGCSTW77766----99547666676
9-77888788AQGRKVACFFLAV_DESVH
-34444IVYGSTTGNTEYTAETIAREL77666666657777777566788
8DLVLLGCSTW77766----995476666769-77888788AQGRKVACF
FLAV_DESSA -33444IVYGSTTGNTET999998887776557
77668888899666686YDIVLFGCSTW77777----996466666779-
-34444IVYGSTTGNTEGVA999999999976555567777788666667
8DVVLLGCSTW77777----995466666779-88887688888KKVGVF
FLAV_DESDE -44777IVFGSSTGNTE9887776666555667
77778899999777777YDAVLFGCSAW88877----997587777779-
8887766777GRKVAAF4fxn
-32222IVYWSGTGNTE8888888876666778888888888NI888858
6DILILGCSA888888------8-8888886--66665378ISGKKVALF
FLAV_MEGEL -12222IVYWSGTGNTEAMA8888888888888
888555555555555485DVILLGCPAMGSE77------572222288--
8888755588GKKVGLF2fcr
-41456IFFSTSTGNTTEVA999998865432222765554443244779
YDLLFLGAPT944411999-111112454441-8DKLPEVDMKDLPVAIF
FLAV_ANASP -00456LFYGTQTGKTESVAEII9877553233
22427776666623589YQYLIIGCPTW55532--999843678W98889
9998888888GKLVAYFFLAV_AZOVI
-42445LFFGSNTGKTRKVAKSIK87777434333536666665467777
YQFLILGTPTLGEG862222222222355558-45666666888KTVALF
FLAV_ENTAG -266IGIFFGSDTGQTRKVAKLIHQKL666466
4424DVRRATR88888SYPVLLLGTPT88888644444444446WQEF8-
-51114IFFGSDTGNTENIAKMI987743311111555555588355599
YDILLLGIPT954431----88355225544--44666666779KLVALF
FLAV_CLOAB -63666ILYSSKTGKTERVAKLIE633333333
33333333333366LQESEGIIFGTPTY63--6--------66SWE3333
3333333333GKLGAAF3chy
Avrg Consist
93344599999999999999999887766555555556666677566678
89999999999767658888775555566668967777677889999999
Conservation 023642867584896974696394646334435
43125645654143443665886856755445500000031446544600
55575345547747759 1fx1
G98879-89-999877977--7788899999999955--88888-99
88887798999777778766553344588776666222266899899FL
AV_DESVH G98879-89-999877977--778889999999995
5--88888-99888877989997777787665533445887766662222
66899899FLAV_DESSA G98878-688688888-88--8899
9999999999979988888887788889-89-978777766675664557
7776666654466899899FLAV_DESGI
G98879-898688888987--788888999GATLV7698899-9998789
888-8899787878776663122477788888333276899899FLAV_
DESDE AS8888-68-888888899--9999999999988888-9
99888889887788978887766688542222122555555553332779
999994fxn GS2228-228222222222--2388888
88888888888888888888888888888888777886676553557755
5533221288888888FLAV_MEGEL
G4888--28-8888882MD--AWKQRTEDTGATVI77-------------
--------77222--224444222222244222112--------2fcr
GLGDA5-8Y5DNFC88-88--887777777777776544
45555555555443855557777744653333577999999875553338
99899FLAV_ANASP GTGDQ5-GY5899999-99--99EEKIS
QRGG9997555554444444443328444446666555555555666667
6666433333899899FLAV_AZOVI
GLGDQ5-885777555-55--55555788888888555555555555555
554855555555555666555555888855555544442--288FLAV_
ENTAG GLGDQL-NYSKNFVSA-MR--ILYDLVIARGACVVG888
8EGYKFSFSAA6664NEFVGLPLDQEN88888EERIDSWLE888422426
88688FLAV_ECOLI GC99549784688888987997777777
77888885544444444444444411444477777445577556778888
8887433322100100FLAV_CLOAB
STANS636666333333333333666666666666666666333336336
6336663333336EDENARIFGERIANKVKQI3333336666663chy
VTAEA---KKENIIAA-----------AQAGAS------
-------------------GYVVK-----PFTAATLEEKLNKIFEKLGM-
----- Avrg Consist
99887797877777777779977888888888888667777777777677
66677777676667766655455577776666433355788788Conse
rvation 74664003715454570630035453444474575300
00010100100000000106837601444423355744544484343010
00000 Iteration 0 SP 136702.00
AvSP 10.654 SId 3955 AvSId 0.308
Consistency values are scored from 0 to 10; the
value 10 is represented by the corresponding
amino acid (red)
59
Consistency iteration
Pre-profiles
Multiple alignment positional consistency scores
60
Pre-profile update iteration
Pre-profiles
Multiple alignment
61
Iterate similarity matrix, guide tree and MSA
Pairwise alignments (Score 1-2, Score 1-3, ..., Score 4-5)
Scores
Similarity matrix (5 x 5)
Guide tree
Multiple alignment
This way of iterating was already implemented in
1984 by Hogeweg and Hesper
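One step of this loop, going from the current multiple alignment back to a new similarity matrix, could look like the sketch below. The slides do not say which pairwise score is recomputed, so fractional sequence identity is used here purely as an assumption; the function names and the toy alignment are hypothetical as well.

```python
from itertools import combinations


def fractional_identity(row_a, row_b):
    """Identity over the columns where both aligned rows have a residue."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def similarity_matrix(msa):
    """Symmetric N x N matrix of pairwise identities read off the current MSA."""
    n = len(msa)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = fractional_identity(msa[i], msa[j])
    return sim


def to_distances(sim):
    """Turn similarities into distances for tree building (simple 1 - s)."""
    return [[1.0 - s for s in row] for row in sim]


# Toy alignment (made up); a real run would use the previous iteration's MSA.
msa = ["MKV-LA", "MKVQLA", "MRV-IA"]
sim = similarity_matrix(msa)
dist = to_distances(sim)
```

The resulting distance matrix would then feed a tree-building step (for example neighbour joining) and a new progressive alignment, which closes the loop.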
62
Secondary structure-induced alignment
63
PRALINE: Using secondary structure for alignment
Dynamic programming search matrix
MDAGSTVILCFV
HHHCCCEEEEEE
MDAASTILCGS
HHHHCCEEECC
Amino acid exchange weights matrices: H-H, C-C, E-E, Default
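A minimal sketch of how such a search matrix might be filled is given below. It assumes the labels on the slide mean that a state-specific exchange matrix is used when both residues share the same secondary structure state (H, E or C) and the default matrix otherwise. The numeric weights, the linear gap penalty and the function names are invented for illustration; real exchange matrices hold one weight per amino acid pair.

```python
# Toy (identical, different) weights per shared state; purely illustrative values.
STATE_WEIGHTS = {"H": (5, -2), "E": (6, -3), "C": (3, -1)}
DEFAULT_WEIGHTS = (4, -2)


def exchange_weight(a, b, ss_a, ss_b):
    """Pick the weight set by secondary structure, then score the residue pair."""
    if ss_a == ss_b and ss_a in STATE_WEIGHTS:
        same, diff = STATE_WEIGHTS[ss_a]
    else:
        same, diff = DEFAULT_WEIGHTS
    return same if a == b else diff


def search_matrix(seq1, ss1, seq2, ss2, gap=-4):
    """Global dynamic programming matrix with a linear gap penalty."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    m = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        m[i][0] = i * gap
    for j in range(1, cols):
        m[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = m[i - 1][j - 1] + exchange_weight(
                seq1[i - 1], seq2[j - 1], ss1[i - 1], ss2[j - 1]
            )
            m[i][j] = max(diag, m[i - 1][j] + gap, m[i][j - 1] + gap)
    return m


# The two sequences and state strings shown on the slide.
matrix = search_matrix("MDAGSTVILCFV", "HHHCCCEEEEEE", "MDAASTILCGS", "HHHHCCEEECC")
```

The bottom-right cell of `matrix` holds the global alignment score under these toy weights; a traceback from that cell would recover the alignment itself.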
64
Flavodoxin-cheY multiple alignment / secondary
structure iteration: cheY SSEs
IVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP
3chy-ITERATION-0 PHD EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE
3chy-ITERATION-1 PHD EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-2 PHD EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE
3chy-ITERATION-3 PHD EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-4 PHD EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE
3chy-ITERATION-5 PHD EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-6 PHD EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-7 PHD EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-8 PHD EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE
3chy-ITERATION-9 PHD EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE
3chy-AA SEQUENCE AA FTAATLEEKLNKIFEKLGM
3chy-ITERATION-0 PHD HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH
3chy-ITERATION-1 PHD HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-2 PHD HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-3 PHD HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-4 PHD HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-5 PHD HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-6 PHD HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH
3chy-ITERATION-7 PHD HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-8 PHD HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-9 PHD HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH
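One way to compare the predicted secondary structure elements (SSEs) across iterations is to reduce each PHD string to its runs of H and E and look for the first prediction that recurs. The sketch below does this for iterations 0 to 5 of the first block. It is a hedged illustration only: the original column spacing is lost in this transcript, so coil and gap lengths cannot be recovered and only the order and length of the H/E runs are compared. The function names are hypothetical.

```python
from itertools import groupby


def sse_segments(prediction):
    """Collapse a PHD-style string into (state, length) runs, skipping spaces."""
    return [
        (state, len(list(run)))
        for state, run in groupby(prediction)
        if state != " "
    ]


def first_repeat(predictions):
    """Return (earlier, later) indices of the first recurring SSE pattern, or None."""
    seen = {}
    for index, prediction in enumerate(predictions):
        key = tuple(sse_segments(prediction))
        if key in seen:
            return seen[key], index
        seen[key] = index
    return None


# Iterations 0-5 of the first cheY block, copied from the transcript.
ITERATIONS = [
    "EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE",
    "EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE",
    "EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE",
    "EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE",
    "EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE",
    "EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE",
]

print(first_repeat(ITERATIONS))  # iterations 3 and 5 give the same SSE pattern
```

A recurring pattern like this suggests the iteration alternates between a small number of predictions rather than settling on a single one, which is one possible stopping signal.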
70
Flavodoxin-cheY multiple alignment/ secondary
structure iteration cheY SSEs
IVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP 3chy-I
TERATION-0 PHD EEEEEEE
HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE
3chy-ITERATION-1 PHD EEEEEEEE
HHHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-2 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHHHH EEEEEE
3chy-ITERATION-3 PHD EEEEEEEE
HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-4 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHH EEEEE
3chy-ITERATION-5 PHD EEEEEEEE
HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-6 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHHH EEEEEE
3chy-ITERATION-7 PHD EEEEEEEE
HHHHHHHHHHHHHH EEE HHHHHH EEEEE
3chy-ITERATION-8 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHH EEEEEE
3chy-ITERATION-9 PHD EEEEEEEE
HHHHHHHHHHHHHH HHHHHHHHHH EEEEE
3chy-AA SEQUENCE AA
FTAATLEEKLNKIFEKLGM 3chy-ITERATION-0
PHD HHHHHHEEEEEE HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH 3chy-ITERATION-1
PHD HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH
EEE HHHHHHHHHHHHHH 3chy-ITERATION-2
PHD HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH
EEE HHHHHHHHHHHHHH 3chy-ITERATION-3
PHD HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-4 PHD HHHHH
EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH
3chy-ITERATION-5 PHD HHHHHHHH
EEEEE HHHHHHHHHHHHHHHH EEE
HHHHHHHHHHHHHH 3chy-ITERATION-6 PHD
HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE
HHHHHHHHHHHHHH 3chy-ITERATION-7
PHD HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH