Title: Multiple Sequence Alignment
2Multiple Sequence Alignment
- VTISCTGSSSNIGAG-NHVKWYQQLPG
- VTISCTGTSSNIGS--ITVNWYQQLPG
- LRLSCSSSGFIFSS--YAMYWVRQAPG
- LSLTCTVSGTSFDD--YYSTWVRQPPG
- PEVTCVVVDVSHEDPQVKFNWYVDG--
- ATLVCLISDFYPGA--VTVAWKADS--
- ATLVCLISDFYPGA--VTVAWKADS--
- AALGCLVKDYFPEP--VTVSWNSG---
- VSLTCLVKGFYPSD--IAVEWESNG--
- Goal: bring the greatest number of similar characters into the same column of the alignment
- Similar to alignment of two sequences.
3CLUSTALW MSA
MSA of four oxidoreductase NAD binding domain protein sequences.
Red: AVFPMILW. Blue: DE. Magenta: RHK. Green: STYHCNGQ. Grey: all others.
Residue ranges are shown after sequence names.
Chenna et al., Nucleic Acids Research, 2003, Vol. 31, No. 13, 3497-3500
4Multiple Sequence Alignment Motivation
- Correspondence: find out which parts do the same thing
  - Similar genes are conserved across widely divergent species, often performing similar functions
- Structure prediction
  - Use knowledge of the structure of one or more members of a protein MSA to predict the structure of other members
  - Structure is more conserved than sequence
- Create profiles for protein families
  - Allow us to search for other members of the family
- Genome assembly: automated reconstruction of contig maps of genomic fragments such as ESTs
- MSA is the starting point for phylogenetic analysis
5Multiple Sequence Alignment Approaches
- Optimal global alignments: dynamic programming
  - Generalization of Needleman-Wunsch
  - Find the alignment that maximizes a score function
  - Computationally expensive: time grows as the product of the sequence lengths
- Global progressive alignments: match closely related sequences first using a guide tree
- Global iterative alignments: multiple re-building attempts to find the best alignment
- Local alignments
  - Profiles, blocks, patterns
6Scoring a multiple alignment
[Figure: one alignment column of A and C residues scored under three schemes: sum of pairs, star, and tree.]
7Sum of Pairs
20α - 10β
8Sum-of-Pairs Scoring Function
- Score of multiple alignment $= \sum_{i<j} \mathrm{score}(S_i, S_j)$
- where $\mathrm{score}(S_i, S_j)$ is the score of the induced pairwise alignment of $S_i$ and $S_j$
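The sum-of-pairs formula can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, and a column where both rows hold a gap scores 0 because it vanishes from the induced pairwise alignment.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one column of a pairwise alignment (scoring values are assumed)."""
    if a == "-" and b == "-":
        return 0  # column is dropped in the induced pairwise alignment
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(rows):
    """Sum the induced pairwise scores over all row pairs i < j."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have equal length")
    return sum(
        pair_score(x[c], y[c])
        for x, y in combinations(rows, 2)
        for c in range(len(rows[0]))
    )


if __name__ == "__main__":
    print(sum_of_pairs(["AC", "AC", "A-"]))
```

Scoring column by column is equivalent to scoring each induced pairwise alignment separately, as long as the gap penalty is linear, which is what this sketch assumes.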
9Induced Pairwise Alignment
- S1  S - T I S C T G - S - N I
- S2  L - T I - C N G S S - N I
- S3  L R T I S C S G F S Q N I
Induced pairwise alignment of S1, S2 (columns where both rows have a gap are dropped):
S1  S T I S C T G - S N I
S2  L T I - C N G S S N I
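Inducing a pairwise alignment amounts to taking two rows of the multiple alignment and deleting every column in which both rows hold a gap. The sketch below shows this; the function name is hypothetical, and the S2 row is written with a gap before its C on the assumption that all rows of the slide's alignment have equal length.

```python
def induced_pairwise(row_a, row_b):
    """Project two rows of an MSA onto a pairwise alignment.

    Columns in which both rows contain a gap carry no information
    about this pair, so they are removed.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must come from the same alignment")
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


if __name__ == "__main__":
    s1 = "S-TISCTG-S-NI"
    s2 = "L-TI-CNGSS-NI"  # assumed 13-column form of the S2 row
    for row in induced_pairwise(s1, s2):
        print(" ".join(row))
```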
10MSA Dynamic Programming
- The two-sequence alignment algorithm can be generalized to any number of sequences.
- E.g., for three sequences X, Y, W, define $C_{i,j,k}$ = score of the optimum alignment among X1..i, Y1..j, W1..k
- As for two sequences, divide the possible alignments into different classes, depending on how they end.
  - Use these classes to devise recurrence relations for $C_{i,j,k}$
  - $C_{i,j,k}$ is the maximum over all possibilities
11MSA 7 ways alignment can end for 3 sequences
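Prefixes being aligned:
X1 . . . Xi-1 Xi
Y1 . . . Yj-1 Yj
W1 . . . Wk-1 Wk
The last column (X, Y, W) of the alignment is one of:
(Xi, Yj, Wk)
(-, Yj, Wk)
(Xi, -, Wk)
(Xi, Yj, -)
(Xi, -, -)
(-, Yj, -)
(-, -, Wk)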
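The seven endings are every way of choosing, for each sequence, either its last residue or a gap, excluding the all-gap column. A small sketch (function names are hypothetical) enumerates them for any number of sequences, which gives 2^k - 1 endings for k sequences:

```python
from itertools import product


def column_endings(k):
    """Return all residue/gap patterns for the last column of a k-way alignment.

    Each pattern is a tuple of 1 (residue) or 0 (gap); the all-gap
    pattern is excluded because an alignment column cannot be empty.
    """
    return [mask for mask in product((1, 0), repeat=k) if any(mask)]


def render(mask, names):
    """Show a pattern using symbolic residue names such as Xi, Yj, Wk."""
    return " ".join(name if used else "-" for name, used in zip(names, mask))


if __name__ == "__main__":
    for mask in column_endings(3):
        print(render(mask, ("Xi", "Yj", "Wk")))
```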
12Dynamic programming for three sequences
Each alignment is a path through the dynamic
programming matrix
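[Figure: three-dimensional dynamic programming lattice whose axes are labeled with the residues of three short sequences; alignment paths begin at the corner marked Start.]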
13Dynamic Programming for Three Sequences
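There are 7 ways to get to $C_{i,j,k}$, one from each predecessor cell:
$C_{i-1,j-1,k-1}$, $C_{i-1,j-1,k}$, $C_{i-1,j,k-1}$, $C_{i,j-1,k-1}$, $C_{i-1,j,k}$, $C_{i,j-1,k}$, $C_{i,j,k-1}$
Enumerate all possibilities and choose the best one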
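A minimal sketch of the three-sequence recurrence follows. It fills the table by trying all seven predecessor cells and scoring the new column with sum of pairs. The scoring values (match +1, mismatch -1, gap -1, gap against gap 0) and all function names are assumptions for illustration; the sketch returns only the optimal score, not the alignment itself.

```python
from itertools import combinations, product

# The 7 non-empty ways a column can consume residues from X, Y, W.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Assumed pairwise column score; a gap-gap pair contributes nothing."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def align3_score(x, y, w):
    """Optimal sum-of-pairs score for aligning three sequences."""
    table = {(0, 0, 0): 0}
    # Lexicographic order guarantees every predecessor is filled first.
    for i, j, k in product(range(len(x) + 1), range(len(y) + 1), range(len(w) + 1)):
        if (i, j, k) == (0, 0, 0):
            continue
        best = None
        for di, dj, dk in MOVES:
            pi, pj, pk = i - di, j - dj, k - dk
            if min(pi, pj, pk) < 0:
                continue  # this ending would read before the start of a sequence
            column = (
                x[i - 1] if di else "-",
                y[j - 1] if dj else "-",
                w[k - 1] if dk else "-",
            )
            cand = table[(pi, pj, pk)] + sum(
                pair_score(a, b) for a, b in combinations(column, 2)
            )
            if best is None or cand > best:
                best = cand
        table[(i, j, k)] = best
    return table[(len(x), len(y), len(w))]


if __name__ == "__main__":
    print(align3_score("AC", "AC", "AC"))
```

Storing the table in a dictionary keeps the sketch short; a real implementation would use a dense array and keep back-pointers to recover the alignment.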
14Dynamic Programming MSA General Case
- For k sequences of length n, the dynamic programming algorithm does $(2^k - 1)\,n^k$ operations
  - Example: 6 sequences of length 100 require about $6.3 \times 10^{13}$ calculations
  - Space for the table is $n^k$
- Implementations (e.g., WashU MSA 2.1) use tricks and search only a subset of the dynamic programming table
  - Even this is expensive. E.g., the Baylor CM Search Launcher limits MSA to 8 sequences of 800 characters and 10 minutes of processing time
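The growth of the cost can be checked with a short calculation. The function name below is hypothetical; it simply evaluates $(2^k - 1)\,n^k$ operations and $n^k$ table cells. For k = 6 and n = 100 this gives 63 × 10^12 operations, about 6.3 × 10^13 (the looser bound $2^k n^k$ gives 6.4 × 10^13).

```python
def dp_cost(k, n):
    """Return (operations, table_cells) for exact DP on k sequences of length n."""
    cells = n ** k
    operations = (2 ** k - 1) * cells  # 2^k - 1 predecessor cells per entry
    return operations, cells


if __name__ == "__main__":
    for k in (2, 3, 6):
        ops, cells = dp_cost(k, 100)
        print(f"k={k}: {ops:.2e} operations, {cells:.2e} cells")
```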
15Problems with SP scoring
- Pair-wise comparisons can over-score evolutionarily distant pairs.
- Reason: for 3 or more sequences, SP scoring does not correspond to any evolutionary tree
16Overcoming problems with SP scoring
- Use weights to incorporate evolution in sum-of-pairs scoring
  - Some pair-wise alignments are more important than others
  - E.g., it is more important to have a good alignment between mouse and human sequences than between mouse and bird
  - Assign different weights to different pair-wise alignments.
  - Weight decreases with evolutionary distance.
- Use the star tree approach
  - One sequence is assigned as the ancestor and all others are compared against it.
17Star Alignments
- Construct multiple alignments using pair-wise alignment relative to a fixed sequence
- Out of a set $S = \{S_1, S_2, \ldots, S_r\}$ of sequences, pick the sequence $S_c$ that maximizes
  $\mathrm{star\_score}(c) = \sum_{1 \le i \le r,\ i \ne c} \mathrm{sim}(S_c, S_i)$
  where $\mathrm{sim}(S_i, S_j)$ is the optimal score of a pair-wise alignment between $S_i$ and $S_j$
18Algorithm
- Compute sim(Si, Sj) for every pair (i,j)
- Compute star_score(i) for every i
- Choose the index c that maximizes star_score(c) and make it the center of the star
- Produce a multiple alignment M such that, for every i, the induced pairwise alignment of Sc and Si is the same as the optimum alignment of Sc and Si.
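The first three steps can be sketched as follows. Here sim is computed with a plain Needleman-Wunsch global alignment score; the scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions made for the example, and any other similarity score could be substituted.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b (assumed linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def pick_center(seqs):
    """Return (index of the star center, list of star scores)."""
    scores = [
        sum(nw_score(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    center = max(range(len(seqs)), key=scores.__getitem__)
    return center, scores


if __name__ == "__main__":
    center, scores = pick_center(["AACCTT", "AATGCC", "AGACCGT"])
    print(center, scores)
```

Because sim is a similarity, the center is the sequence with the largest total; with a distance measure instead, the center would be the one with the smallest total.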
19Step 4 Detail
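Optimal pairwise alignment of Sc and S2:
Sc  A-ACC-TT
S2  AGACCGT-
Optimal pairwise alignment of Sc and S1:
Sc  AA--CCTT
S1  AATGCC--
Merged multiple alignment:
Sc  A-A--CC-TT
S1  A-ATGCC---
S2  AGA--CCGT-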
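One way to carry out this merge is the "once a gap, always a gap" rule: for each position of the center sequence, take the largest number of gaps that any pairwise alignment inserts before it, then re-space every pairwise alignment to that common gap pattern. The sketch below assumes each input is a pair (gapped center row, gapped other row); the function names are hypothetical.

```python
def gap_profile(center_row):
    """Count gap columns before each center residue; last entry is trailing gaps."""
    counts, run = [], 0
    for ch in center_row:
        if ch == "-":
            run += 1
        else:
            counts.append(run)
            run = 0
    counts.append(run)
    return counts


def respace(center_row, other_row, target):
    """Rewrite other_row so the center has target[p] gap columns before residue p."""
    out, buf, p = [], [], 0
    for c, o in zip(center_row, other_row):
        if c == "-":
            buf.append(o)  # residue inserted relative to the center
        else:
            out.append("".join(buf).ljust(target[p], "-"))  # pad to common width
            out.append(o)
            buf, p = [], p + 1
    out.append("".join(buf).ljust(target[p], "-"))
    return "".join(out)


def merge_star(pairwise):
    """Merge pairwise alignments against one center into a multiple alignment."""
    profiles = [gap_profile(c) for c, _ in pairwise]
    target = [max(col) for col in zip(*profiles)]
    first_center = pairwise[0][0]
    center = respace(first_center, first_center, target)
    return [center] + [respace(c, o, target) for c, o in pairwise]


if __name__ == "__main__":
    rows = merge_star([("AA--CCTT", "AATGCC--"), ("A-ACC-TT", "AGACCGT-")])
    print("\n".join(rows))
```

Run on the two pairwise alignments of this slide, the sketch returns the center row followed by S1 and S2 in the merged form shown above.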