Multiple Sequence Alignment - PowerPoint PPT Presentation

Provided by: davidfern2
Transcript and Presenter's Notes

Title: Multiple Sequence Alignment


1
Multiple Sequence Alignment
  • Dynamic Programming

2
Multiple Sequence Alignment
  • VTISCTGSSSNIGAG?NHVKWYQQLPG
  • VTISCTGTSSNIGS??ITVNWYQQLPG
  • LRLSCSSSGFIFSS??YAMYWVRQAPG
  • LSLTCTVSGTSFDD??YYSTWVRQPPG
  • PEVTCVVVDVSHEDPQVKFNWYVDG??
  • ATLVCLISDFYPGA??VTVAWKADS??
  • ATLVCLISDFYPGA??VTVAWKADS??
  • AALGCLVKDYFPEP??VTVSWNSG-??
  • VSLTCLVKGFYPSD??IAVEWESNG-?
  • Goal: bring the greatest number of similar
    characters into the same column of the alignment
  • Similar to alignment of two sequences.

3
CLUSTALW MSA
MSA of four oxidoreductase NAD binding domain
protein sequences. Red: AVFPMILW. Blue: DE.
Magenta: RHK. Green: STYHCNGQ. Grey: all
others. Residue ranges are shown after sequence
names.
Chenna et al., Nucleic Acids Research, 2003, Vol.
31, No. 13, 3497-3500
4
Multiple Sequence Alignment Motivation
  • Correspondence: find out which parts do the same
    thing
  • Similar genes are conserved across widely
    divergent species, often performing similar
    functions
  • Structure prediction
  • Use knowledge of structure of one or more members
    of a protein MSA to predict structure of other
    members
  • Structure is more conserved than sequence
  • Create profiles for protein families
  • Allow us to search for other members of the
    family
  • Genome assembly: automated reconstruction of
    contig maps of genomic fragments such as ESTs
  • MSA is the starting point for phylogenetic
    analysis

5
Multiple Sequence Alignment Approaches
  • Optimal Global Alignments - Dynamic programming
  • Generalization of Needleman-Wunsch
  • Find alignment that maximizes a score function
  • Computationally expensive: time grows as the
    product of the sequence lengths
  • Global Progressive Alignments - Match
    closely-related sequences first using a guide
    tree
  • Global Iterative Alignments - Multiple
    re-building attempts to find best alignment
  • Local alignments
  • Profiles, Blocks, Patterns

6
Scoring a multiple alignment
[Figure: an example alignment of A and C residues, with three ways of scoring it: sum of pairs, star, and tree.]
7
Sum of Pairs
20α - 10β
8
Sum-of-Pairs Scoring Function
  • Score of multiple alignment
  • $\sum_{i<j} \mathrm{score}(S_i, S_j)$
  • where $\mathrm{score}(S_i, S_j)$ is the score of the induced
    pairwise alignment of $S_i$ and $S_j$
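
The scoring function above can be sketched in a few lines of Python. This is only an illustration: the match, mismatch and gap values are assumptions chosen for the example (the slides do not give a scoring scheme), and `pair_score` and `sum_of_pairs` are hypothetical helper names.

```python
from itertools import combinations

# Illustrative scores only; the slides do not specify a scoring scheme.
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(row_a, row_b):
    """Score the pairwise alignment induced by two rows of a multiple alignment."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # a gap-gap column disappears from the induced alignment
        if a == "-" or b == "-":
            total += GAP
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total

def sum_of_pairs(rows):
    """Add up pair_score over every pair of rows i < j."""
    return sum(pair_score(a, b) for a, b in combinations(rows, 2))

# Three equal-length rows of a small multiple alignment.
msa = ["A-A--CC-TT", "A-ATGCC---", "AGA--CCGT-"]
print(sum_of_pairs(msa))  # -4 + -1 + -6 = -11 under the assumed scores
```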

9
Induced Pairwise Alignment
  • S1 S - T I S C T G - S - N I
  • S2 L - T I C N G S S - N I
  • S3 L R T I S C S G F S Q N I

Induced pairwise alignment of S1, S2
  • S1 S T I S C T G - S N I
  • S2 L T I C N G S S N I
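
A minimal sketch of how an induced pairwise alignment can be extracted: take two rows of the multiple alignment and drop every column in which both rows have a gap. The function name `induced_pairwise` is hypothetical, and the example rows are the Sc and S1 rows of the star alignment example near the end of this deck.

```python
def induced_pairwise(row_a, row_b, gap="-"):
    """Keep only the columns where at least one of the two rows has a residue."""
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)

sc_row = "A-A--CC-TT"
s1_row = "A-ATGCC---"
print(induced_pairwise(sc_row, s1_row))  # ('AA--CCTT', 'AATGCC--')
```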
10
MSA Dynamic Programming
  • The two-sequence alignment algorithm can be
    generalized to any number of sequences.
  • E.g., for three sequences X, Y, W,
    define Ci,j,k = score of the optimum alignment
    among X1..i, Y1..j, W1..k
  • As for two sequences, divide possible alignments
    into different classes, depending on how they
    end.
  • Use these classes to devise recurrence relations
    for Ci,j,k
  • Ci,j,k is the maximum out of all possibilities

11
MSA: 7 ways an alignment can end for 3 sequences
X1 . . . Xi-1 Xi
Y1 . . . Yj-1 Yj
W1 . . . Wk-1 Wk
The last column is one of:
Xi Yj Wk
- Yj Wk
Xi - Wk
Xi Yj -
Xi - -
- Yj -
- - Wk
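
The seven endings can be enumerated mechanically: each sequence contributes either its last symbol or a gap, and the all-gap column is excluded. The sketch below (with the hypothetical helper `column_endings`) shows this, and the same idea gives 2^k - 1 endings for k sequences.

```python
from itertools import product

def column_endings(symbols):
    """All ways to fill the last column: each sequence gives its last symbol
    or a gap, but a column made only of gaps is not allowed."""
    endings = []
    for mask in product([True, False], repeat=len(symbols)):
        if any(mask):
            endings.append(tuple(s if use else "-" for s, use in zip(symbols, mask)))
    return endings

for ending in column_endings(["Xi", "Yj", "Wk"]):
    print(" ".join(ending))

print(len(column_endings(["Xi", "Yj", "Wk"])))  # 7
```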
12
Dynamic programming for three sequences
Each alignment is a path through the dynamic
programming matrix
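[Figure: three-dimensional dynamic programming matrix with one sequence along each axis; the alignment path begins at the corner labelled Start.]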
13
Dynamic Programming for Three Sequences
There are 7 ways to get to Ci,j,k, one from each neighbouring cell:
Ci-1,j-1,k-1
Ci-1,j-1,k
Ci-1,j,k-1
Ci,j-1,k-1
Ci-1,j,k
Ci,j-1,k
Ci,j,k-1
Enumerate all possibilities and choose the best
one
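
A minimal sketch of the three-sequence recurrence, assuming a sum-of-pairs column score with illustrative match, mismatch and gap values (the slides do not fix these). It fills the whole three-dimensional table and returns only the optimal score, without traceback, so it is practical only for short sequences.

```python
from itertools import combinations, product

# Illustrative scores only; the slides do not specify a scoring scheme.
MATCH, MISMATCH, GAP = 1, -1, -2

def column_score(col):
    """Sum-of-pairs score of one column; a gap-gap pair contributes 0."""
    total = 0
    for a, b in combinations(col, 2):
        if a == "-" and b == "-":
            continue
        total += GAP if "-" in (a, b) else (MATCH if a == b else MISMATCH)
    return total

# The 7 moves: each of (di, dj, dk) is 0 or 1, excluding (0, 0, 0).
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]

def align3_score(x, y, w):
    """Optimal sum-of-pairs score of a global alignment of x, y and w."""
    n, m, p = len(x), len(y), len(w)
    neg_inf = float("-inf")
    C = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    C[0][0][0] = 0
    # Cells are visited in increasing (i, j, k), so predecessors are ready.
    for i, j, k in product(range(n + 1), range(m + 1), range(p + 1)):
        for di, dj, dk in MOVES:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            col = (x[pi] if di else "-", y[pj] if dj else "-", w[pk] if dk else "-")
            C[i][j][k] = max(C[i][j][k], C[pi][pj][pk] + column_score(col))
    return C[n][m][p]

print(align3_score("AACCTT", "AATGCC", "AGACCGT"))
```

The table has (n+1)(m+1)(p+1) cells and each cell looks at 7 predecessors, which is where the cost discussed on the next slide comes from.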
14
Dynamic Programming MSA General Case
  • For k sequences of length n, the dynamic programming
    algorithm does $(2^k - 1)\,n^k$ operations
  • Example: 6 sequences of length 100 require
    about $6.3 \times 10^{13}$ calculations
  • Space for the table is $n^k$
  • Implementations (e.g., WashU MSA 2.1) use tricks
    and only search subset of dynamic programming
    table
  • Even this is expensive. E.g., the Baylor College of
    Medicine (BCM) Search Launcher limits MSA to 8 sequences
    of 800 characters and 10 minutes of processing time
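
The operation and space counts are easy to check by hand. The snippet below simply evaluates (2^k - 1)·n^k and n^k for a given k and n; `msa_dp_cost` is a hypothetical helper name.

```python
def msa_dp_cost(k, n):
    """Return (operations, table_cells) for k sequences of length n."""
    return (2**k - 1) * n**k, n**k

ops, cells = msa_dp_cost(6, 100)
print(f"{ops:.1e} operations, {cells:.1e} table cells")
# 6.3e+13 operations, 1.0e+12 table cells
```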

15
Problems with SP scoring
  • Pair-wise comparisons can over-score
    evolutionarily distant pairs.
  • Reason: for 3 or more sequences, SP scoring does
    not correspond to any evolutionary tree

16
Overcoming problems with SP scoring
  • Use weights to incorporate evolution in sum of
    pairs scoring
  • Some pair-wise alignments are more important than
    others
  • E.g., more important to have a good alignment
    between mouse and human sequences than mouse and
    bird
  • Assign different weights to different pair-wise
    alignments.
  • Weight decreases with evolutionary distance.
  • Use star tree approach
  • One sequence is assigned as the ancestor and all
    others are compared against it.
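
A weighted sum-of-pairs score might look like the sketch below. Everything specific here is an assumption made for illustration: the weight 1 / (1 + distance) is just one simple function that decreases with evolutionary distance, the distance matrix is made up, and the match, mismatch and gap values are not from the slides.

```python
from itertools import combinations

def pair_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score of the induced pairwise alignment (illustrative values)."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue
        if a == "-" or b == "-":
            total += gap
        else:
            total += match if a == b else mismatch
    return total

def weighted_sp(rows, distance):
    """Sum of pairs where closer pairs count more (hypothetical weighting)."""
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        weight = 1.0 / (1.0 + distance[i][j])  # assumed: decreases with distance
        total += weight * pair_score(rows[i], rows[j])
    return total

rows = ["ACGT", "ACGA", "A-GA"]                # made-up alignment rows
distance = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]   # made-up pairwise distances
print(weighted_sp(rows, distance))
```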

17
Star Alignments
  • Construct multiple alignments using pair-wise
    alignment relative to a fixed sequence
  • Out of a set $S = \{S_1, S_2, \ldots, S_r\}$ of
    sequences, pick the sequence $S_c$ that maximizes
    $\mathrm{star\_score}(c) = \sum_{1 \le i \le r,\ i \ne c} \mathrm{sim}(S_c, S_i)$,
    where $\mathrm{sim}(S_i, S_j)$ is the optimal
    score of a pair-wise alignment between $S_i$ and $S_j$

18
Algorithm
  • Compute sim(Si, Sj) for every pair (i,j)
  • Compute star_score(i) for every i
  • Choose the index c that maximizes star_score(c)
    and make it the center of the star
  • Produce a multiple alignment M such that, for
    every i, the induced pairwise alignment of Sc and
    Si is the same as the optimum alignment of Sc and
    Si.
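
Steps 1 to 3 can be sketched as follows, taking sim to be a Needleman-Wunsch global alignment score with a linear gap penalty and picking the center as the sequence with the largest star_score, as in the definition on the Star Alignments slide. The scoring values and the example sequences are assumptions for illustration only.

```python
def sim(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def star_center(seqs):
    """Return (index of the center sequence, list of star scores)."""
    scores = [
        sum(sim(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    center = max(range(len(seqs)), key=scores.__getitem__)
    return center, scores

seqs = ["AAAA", "AAAT", "TTTT"]  # made-up sequences
print(star_center(seqs))  # (1, [-2, 0, -6])
```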

19
Step 4 Detail
Optimum alignment of Sc and S2:
Sc A-ACC-TT
S2 AGACCGT-
Optimum alignment of Sc and S1:
Sc AA--CCTT
S1 AATGCC--
Resulting multiple alignment:
Sc A-A--CC-TT
S1 A-ATGCC---
S2 AGA--CCGT-
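
One way to carry out step 4 is to add the pairwise alignments to the growing multiple alignment one at a time, inserting a gap column wherever the center row needs a gap that is not there yet. The sketch below is one possible implementation under that assumption, not necessarily the procedure the slides have in mind; `merge_star` is a hypothetical name, and the input is the pair of alignments from this slide.

```python
def merge_star(center, pairwise):
    """Merge pairwise alignments (center_row, other_row) into one alignment.

    Every center_row must spell `center` once its gaps are removed.
    Returns the rows of the multiple alignment, center row first.
    """
    msa = [center]
    for c_aln, s_aln in pairwise:
        new_rows = [[] for _ in msa]
        new_s = []
        i = j = 0  # i walks the current center row, j walks c_aln
        width = len(msa[0])
        while i < width or j < len(c_aln):
            msa_gap = i < width and msa[0][i] == "-"
            pw_gap = j < len(c_aln) and c_aln[j] == "-"
            if msa_gap and not pw_gap:
                # Existing gap column: the new sequence gets a gap there.
                for out, row in zip(new_rows, msa):
                    out.append(row[i])
                new_s.append("-")
                i += 1
            elif pw_gap and not msa_gap:
                # New gap in the center: open a gap column in all old rows.
                for out in new_rows:
                    out.append("-")
                new_s.append(s_aln[j])
                j += 1
            else:
                # Same center residue (or a shared gap): consume both columns.
                for out, row in zip(new_rows, msa):
                    out.append(row[i])
                new_s.append(s_aln[j])
                i += 1
                j += 1
        msa = ["".join(out) for out in new_rows] + ["".join(new_s)]
    return msa

rows = merge_star("AACCTT", [("A-ACC-TT", "AGACCGT-"), ("AA--CCTT", "AATGCC--")])
for name, row in zip(["Sc", "S2", "S1"], rows):
    print(name, row)
```

On this input the sketch returns the same three rows as the multiple alignment shown on the slide, with S2 listed before S1 because it was added first.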