1
Multiple Alignment and Motif Searching
  • Burkhard Morgenstern
  • Universität Göttingen
  • Institute of Microbiology and Genetics
  • Department of Bioinformatics
  • Tunis, March 2007

2
Multiple Alignment and Motif Searching
  • http://www.gobics.de/burkhard/teaching/tunis_07.php

3
www.gobics.de/burkhard/teaching/tunis_07.php
4
Information flow in the cell
5
Information flow in the cell
  • Idea
  • Sequence -> Structure -> Function

6
Information flow in the cell
7
Information flow in the cell
  • gap between sequence and structure/function data
  • Lots of data available at the sequence level
  • Fewer data at the structure and function level

8
Exponential growth of data bases
9
  • Major goal of bioinformatics: close the gap
    between sequence information and
    structure/function information
  • Most important tool for sequence analysis:
    sequence comparison
  • Simple approach: dot plot; more advanced
    approach: sequence alignment



10
The dot plot



11
The dot plot
  • Gibbs and McIntyre (1970)



12
The dot plot
  • Y Q E W T Y I V A R E A Q Y E
  • C I V M R E Q Y
  • Two sequences to be compared



13
The dot plot
  • Y Q E W T Y I V A R E A Q Y E
  • C
  • I
  • V
  • M
  • R
  • E
  • Q
  • Y
  • Comparison matrix



14
The dot plot
  • Y Q E W T Y I V A R E A Q Y E
  • C
  • I X
  • V
  • M
  • R
  • E
  • Q
  • Y
  • Search pairs of identical residues



15
The dot plot
  • Y Q E W T Y I V A R E A Q Y E
  • C
  • I X
  • V X
  • M
  • R X
  • E X X X
  • Q X X
  • Y X X
  • Dot plot: dot (X) for all pairs of identical
    residues
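
A minimal sketch of the dot-plot construction described above, using the two sequences from the slides. The function name dot_plot and the text rendering ('X' for a dot, '.' for an empty cell) are illustrative choices, not part of the original presentation.

```python
def dot_plot(seq_x, seq_y):
    """Return one text row per residue of seq_y.

    A cell is 'X' where the residues of seq_x and seq_y are identical
    and '.' everywhere else.
    """
    return ["".join("X" if a == b else "." for a in seq_x) for b in seq_y]


seq_x = "YQEWTYIVAREAQYE"  # horizontal sequence from the slides
seq_y = "CIVMREQY"         # vertical sequence from the slides

rows = dot_plot(seq_x, seq_y)
print("  " + seq_x)
for residue, row in zip(seq_y, rows):
    print(residue + " " + row)
```

Every pair of identical residues gets a dot, so the plot also contains isolated chance matches next to the diagonals.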



16
The dot plot
  • Y Q E W T Y I V A R E A Q Y E
  • C
  • I X
  • V X
  • M
  • R X
  • E X X X
  • Q X X
  • Y X X



17
The dot plot
  • Y Q E W T Y I V A R E A Q Y E
  • C
  • I X
  • V X
  • M
  • R X
  • E X X X
  • Q X X
  • Y X X
  • Homologies as diagonal lines from top-left to
    bottom-right corner



18
The dot plot
  • Y Q E W T Y I V A R E A Q Y E
  • C
  • I X
  • V X
  • M
  • R X
  • E X X X
  • Q X X
  • Y X X
  • Inversions as diagonals from bottom left to top
    right



19
The dot plot
  • Y Q E W T Y Q E V R E Y Q E I
  • C
  • I X
  • V X
  • M
  • R
  • Y X X X
  • Q X X X
  • E X X X X
  • Repeats as parallel diagonals



20
The dot plot
  • Y Q E W T Y Q E V R E Y Q E I
  • C
  • I X
  • V X
  • M
  • R
  • Y X X X
  • Q X X X
  • E X X X X



21
The dot plot
  • Advantages
  • Various types of similarity detectable (repeats,
    inversions)
  • Useful for large-scale analysis
  • Use filtering for long sequences: dots represent
    matching segments instead of matching single
    residues
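
The filtering idea can be sketched as follows: a dot is only set where a whole segment matches. The window length of 3 and the requirement of an exact segment match are assumptions made for this illustration; real tools typically allow a threshold below 100 % identity.

```python
def filtered_dot_plot(seq_x, seq_y, window=3):
    """Return (i, j) start positions where segments of length `window`
    in seq_x and seq_y are identical (0-based positions)."""
    hits = []
    for j in range(len(seq_y) - window + 1):
        for i in range(len(seq_x) - window + 1):
            if seq_x[i:i + window] == seq_y[j:j + window]:
                hits.append((i, j))
    return hits


# Repeat example from the slides: the segment YQE occurs three times
# in the horizontal sequence and once in the vertical sequence.
hits = filtered_dot_plot("YQEWTYQEVREYQEI", "CIVMRYQE", window=3)
print(hits)
```

With these inputs only the three occurrences of the repeated segment remain; single-residue chance matches are filtered out.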


22
The dot plot



23
Pair-wise sequence alignment
  • Evolutionary or structurally related sequences
  • alignment possible
  • Sequence homologies represented by inserting gaps

24
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C I V M R E A Q Y
  • Two input sequences



25
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C
  • I
  • V
  • M
  • R
  • E
  • A
  • Q
  • Y
  • Comparison matrix for two sequences



26
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C
  • I X
  • V X
  • M
  • R X
  • E X X
  • A X
  • Q X
  • Y X X
  • Dot plot for two sequences



27
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C
  • I X
  • V X
  • M
  • R X
  • E X X
  • A X
  • Q X
  • Y X X
  • Similarities in same relative order over entire
    sequences



28
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C
  • I X
  • V X
  • M
  • R X
  • E X
  • A
  • Q X
  • Y X
  • Global alignment of sequences possible



29
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C X X
  • I X
  • V X
  • M X
  • R X
  • E X
  • A X
  • Q X
  • Y X X
  • Alignment corresponds to path through comparison
    matrix



30
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C X X
  • I X
  • V X
  • M X
  • R X
  • E X
  • A X
  • Q X
  • Y X X
  • Matches (red), mis-matches (green), gaps (blue)



31
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C X X
  • I X
  • V X
  • M X
  • R X
  • E X
  • A X
  • Q X
  • Y X X
  • Matches (red), mis-matches (green), gaps (blue)



32
Pair-wise sequence alignment
  • (global) alignment: write sequences on top of
    each other, gaps represented by dash symbols



33
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C I V M R E A Q Y
  • Input sequences



34
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • C - I V M R E A Q Y
  • alignment of input sequences



35
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • C - I V M R E A Q Y -
  • alignment consists of matches (red), mismatches
    (green) and gaps (blue)



36
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • C - I V M R E A Q Y
  • Basic task
  • Find best alignment of two sequences
  • alignment that reflects structural and
    evolutionary relations



37
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • C - I V M R E A Q Y
  • Questions
  • What is a good alignment?
  • How to find the best alignment?



38
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • C - I V M R E A Q Y
  • Idea: consider alignment as hypothesis about
    evolution of sequences.
  • gaps correspond to insertions/deletions
  • mismatches correspond to substitutions



39
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • C - I V M R E A - Q Y
  • Problem
  • astronomical number of possible alignments



40
Pair-wise sequence alignment
  • T Y I V A R E Q Y E
  • C I - V M R E A Q Y
  • Problem
  • astronomical number of possible alignments



41
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • - C I V M R E A Q Y
  • Problem
  • astronomical number of possible alignments
  • stupid computer has to find out which alignment
    is best ??



42
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • - C I V M R E A Q Y
  • First (simplified) rules
  • minimize number of mismatches
  • maximize number of matches



43
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • - C I V M R E A Q Y
  • General assumption: sequences not too distantly
    related.
  • In this case: mismatches (substitutions) and gaps
    (insertions/deletions) unlikely
  • Consequence: good alignment should reduce gaps
    and mismatches



44
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • C I - V M R E A Q Y
  • First (simplified) rules
  • minimize number of mismatches
  • maximize number of matches



45
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • - C I V M R E A Q Y
  • First (simplified) rules
  • minimize number of mismatches
  • maximize number of matches



46
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • - C I V M R E A Q Y
  • First (simplified) rules
  • minimize number of mismatches
  • maximize number of matches



47
Pair-wise sequence alignment
  • T Y I V A R E - Q Y E
  • C I - V M R E A Q Y
  • Second (simplified) rule
  • minimize number of gaps



48
Pair-wise sequence alignment
  • T Y I V - A R E - Q Y E
  • C I - V M - R E A Q Y
  • Second (simplified) rule
  • minimize number of gaps
  • Parsimony principle: minimize number of
    evolutionary events



49
Pair-wise sequence alignment
  • For protein sequences
  • different degrees of similarity among amino
    acids.
  • counting matches/mismatches oversimplistic



50
Pair-wise sequence alignment
  • T Y I V
  • T L V
  • Protein sequences to be aligned



51
Pair-wise sequence alignment
  • T Y I V
  • T L - V
  • Possible alignment



52
Pair-wise sequence alignment
  • T Y I V
  • T - L V
  • Alternative alignment



53
Pair-wise sequence alignment
  • T Y I V
  • T - L V
  • Some amino acid residues are more similar to each
    other than others
  • Therefore similarity among amino acid residues
    has to be taken into account.



55
Pair-wise sequence alignment
  • T Y I V
  • T - L V
  • To assess quality of protein alignments
  • use similarity scores for amino acids
  • s(a,b) similarity score for amino acids a and b



56
Pair-wise sequence alignment
  • Similarity measured by substitution matrices
    based on substitution probabilities
  • Important substitution matrices
  • PAM (M. Dayhoff)
  • BLOSUM (S. Henikoff / J. Henikoff)



57
Pair-wise sequence alignment
  • The PAM matrix


  • Consider probability p_{a,b} of substitution a → b
    (or b → a) for amino acids a and b
  • Define for amino acids a and b similarity score
    S(a,b) based on probability p_{a,b}
  • First task: find out p_{a,b} for every pair of
    amino acids a, b

58
Pair-wise sequence alignment
  • The PAM matrix
  • Use closely related protein families: no
    alignment problem, no double substitutions
  • Construct phylogenetic tree with parsimony method
  • Count substitution frequencies/probabilities
  • Normalize substitution probabilities
  • Extrapolate probabilities for larger evolutionary
    distances



59
Pair-wise sequence alignment
  • Finally, define similarity score
  • S(a,b) = log( p_{a,b} / (q_a q_b) )
  • q_a: (relative) frequency of amino acid a
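
The log-odds definition above can be evaluated directly. The sketch below uses the natural logarithm and made-up probabilities; both are assumptions for illustration only, since the slides give neither the base of the logarithm nor concrete values.

```python
import math


def log_odds_score(p_ab, q_a, q_b):
    """Similarity score S(a,b) = log( p_ab / (q_a * q_b) )."""
    return math.log(p_ab / (q_a * q_b))


# Hypothetical numbers, not taken from any real PAM matrix.
frequent_pair = log_odds_score(p_ab=0.01, q_a=0.05, q_b=0.05)
rare_pair = log_odds_score(p_ab=0.001, q_a=0.05, q_b=0.05)
print(frequent_pair, rare_pair)
```

A pair that is observed more often than expected from the residue frequencies alone gets a positive score, a pair observed less often gets a negative one.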



61
Pair-wise sequence alignment
  • T Y I V
  • T - L V
  • Given a similarity score s(a,b) for pairs of
    amino acids, define quality score of alignment
    as
  • sum of similarity values s(a,b) of aligned
    residues
  • minus gap penalty g for each residue aligned with
    a gap



62
Pair-wise sequence alignment
  • T Y I V
  • T - L V
  • Example
  • Score = s(T,T) + s(I,L) + s(V,V) - g
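
As a sketch, the alignment score defined above (sum of similarity values minus g for each residue aligned with a gap) can be computed as below. The function toy_s is a hypothetical stand-in for a real substitution matrix such as PAM or BLOSUM, and g = 2 is an arbitrary choice.

```python
def alignment_score(row_x, row_y, s, g):
    """Sum of s(a, b) over aligned residue pairs, minus g per gap column."""
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total -= g          # residue aligned with a gap
        else:
            total += s(a, b)    # two aligned residues
    return total


def toy_s(a, b):
    """Hypothetical similarity: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


score = alignment_score("TYIV", "T-LV", toy_s, g=2)
print(score)  # s(T,T) + s(I,L) + s(V,V) - g = 2 - 1 + 2 - 2 = 1
```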



63
Pair-wise sequence alignment
  • T Y I V
  • T - L V
  • Next question: find alignment with best score
  • Dynamic-programming algorithm finds alignment
    with best score.
  • (Needleman and Wunsch, 1970)



64
Pair-wise sequence alignment
  • T Y I V A R E A Q Y E
  • - C I V M R E - Q Y
  • Alignment corresponds to path through comparison
    matrix


65
Pair-wise sequence alignment
  • T Y I V A R E A Q Y E
  • C
  • I X
  • V X
  • M
  • R X
  • E X X
  • Q X
  • Y X X



66
Pair-wise sequence alignment
  • T Y I V A R E A Q Y E
  • X X
  • C X
  • I X
  • V X
  • M X
  • R X
  • E X X
  • Q X
  • Y X X



67
Pair-wise sequence alignment
  • T Y I V A R E A Q Y E
  • - C I V M R E - Q Y
  • Alignment corresponds to path through comparison
    matrix


68
Pair-wise sequence alignment

  • T W L V - R E A Q I
  • - C I V M R E - H Y

69
Pair-wise sequence alignment

  • Score of alignment
  • Sum of similarity values of aligned
    residues
  • minus gap penalty
  • T W L V - R E A Q I
  • - C I V M R E - H Y

70
Pair-wise sequence alignment

  • Example
  • S = - g + s(W,C) + s(L,I) + s(V,V) - g
    + s(R,R) + ...
  • T W L V - R E A Q I
  • - C I V M R E - H Y

71
Pair-wise sequence alignment

  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • R X
  • E X X
  • H X
  • Y X X
  • T W L V - R E A Q I
  • - C I V M R E - H Y

72
Pair-wise sequence alignment

  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • R X
  • E X X
  • H X
  • Y X X
  • Alignment corresponds to path through comparison
    matrix
  • T W L V - R E A Q I
  • - C I V M R E - H Y

73
Pair-wise sequence alignment

  • i
  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • j R X
  • E
  • H
  • Y
  • Dynamic programming: calculate scores S(i,j) of
    optimal alignment of prefixes up to positions
    i and j.
  • T W L V - R
  • - C I V M R

74
Pair-wise sequence alignment

  • i
  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • j R X
  • E
  • H
  • Y
  • S(i,j) can be calculated from possible
    predecessors S(i-1,j-1), S(i,j-1), S(i-1,j).
  • T W L V - R
  • - C I V M R

75
Pair-wise sequence alignment

  • i
  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • j R X
  • E
  • H
  • Y
  • Score of optimal path that comes from top
    left: S(i-1,j-1) + s(R,R)
  • T W L V - R
  • - C I V M R

76
Pair-wise sequence alignment

  • i
  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • j-1 M X
  • j R X
  • E
  • H
  • Y
  • Score of optimal path that comes from
    above: S(i,j-1) - g
  • T W L V R -
  • - C I V M R

77
Pair-wise sequence alignment

  • i-1 i
  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • j R X X
  • E
  • H
  • Y
  • Score of optimal path that comes from
    left: S(i-1,j) - g
  • T W L - - V R
  • - C I V M R -

78
Pair-wise sequence alignment

  • i-1 i
  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • j R X X
  • E
  • H
  • Y
  • Score of optimal path: maximum of these three
    values
  • T W L - - V R
  • - C I V M R -

79
Pair-wise sequence alignment
  • Recursion formula for global alignment
  • For sequences x and y



80
Pair-wise sequence alignment

  • T W L V R
  • C
  • I
  • V
  • M
  • R
  • E
  • H
  • Y

81
Pair-wise sequence alignment

  • T W L V R
  • x x x
  • C x x x
  • I x x
  • V x x
  • M x x
  • R x x
  • E x x
  • H x x
  • Y x x
  • Fill matrix from top left
    to bottom right

82
Pair-wise sequence alignment

  • T W L V R
  • x x x
  • C x x x
  • I x x x
  • V x x
  • M x x
  • R x x
  • E x x
  • H x x
  • Y x x
  • Fill matrix from top left
    to bottom right

83
Pair-wise sequence alignment

  • T W L V R
  • x x x
  • C x x x
  • I x x x
  • V x x x
  • M x x
  • R x x
  • E x x
  • H x x
  • Y x x
  • Fill matrix from top left
    to bottom right

84
Pair-wise sequence alignment

  • T W L V R
  • x x x
  • C x x x
  • I x x x
  • V x x x
  • M x x x
  • R x x
  • E x x
  • H x x
  • Y x x
  • Fill matrix from top left
    to bottom right

85
Pair-wise sequence alignment

  • T W L V R
  • x x x
  • C x x x
  • I x x x
  • V x x x
  • M x x x
  • R x x x
  • E x x
  • H x x
  • Y x x
  • Fill matrix from top left
    to bottom right

86
Pair-wise sequence alignment

  • T W L V R
  • x x x
  • C x x x
  • I x x x
  • V x x x
  • M x x x
  • R x x x
  • E x x x
  • H x x
  • Y x x
  • Fill matrix from top left
    to bottom right

87
Pair-wise sequence alignment

  • T W L V R
  • x x x
  • C x x x
  • I x x x
  • V x x x
  • M x x x
  • R x x x
  • E x x x
  • H x x x
  • Y x x
  • Fill matrix from top left
    to bottom right

88
Pair-wise sequence alignment

  • T W L V R
  • x x x
  • C x x x
  • I x x x
  • V x x x
  • M x x x
  • R x x x
  • E x x x
  • H x x x
  • Y x x x
  • Fill matrix from top left
    to bottom right

89
Pair-wise sequence alignment

  • T W L V R
  • x x x x
  • C x x x
  • I x x x
  • V x x x
  • M x x x
  • R x x x
  • E x x x
  • H x x x
  • Y x x x
  • Fill matrix from top left
    to bottom right

90
Pair-wise sequence alignment

  • T W L V R
  • x x x x
  • C x x x x
  • I x x x
  • V x x x
  • M x x x
  • R x x x
  • E x x x
  • H x x x
  • Y x x x
  • Fill matrix from top left
    to bottom right

91
Pair-wise sequence alignment

  • T W L V R
  • x x x x
  • C x x x x
  • I x x x x
  • V x x x
  • M x x x
  • R x x x
  • E x x x
  • H x x x
  • Y x x x
  • Fill matrix from top left
    to bottom right

92
Pair-wise sequence alignment

  • T W L V R
  • x x x x
  • C x x x x
  • I x x x x
  • V x x x x
  • M x x x
  • R x x x
  • E x x x
  • H x x x
  • Y x x x
  • Fill matrix from top left
    to bottom right

93
Pair-wise sequence alignment

  • T W L V R
  • x x x x x x
  • C x x x x x x
  • I x x x x x x
  • V x x x x x x
  • M x x x x x x
  • R x x x x x x
  • E x x x x x x
  • H x x x x x x
  • Y x x x x x x
  • Fill matrix from top left
    to bottom right

94
Pair-wise sequence alignment

  • T W L V R
  • x x x x x x
  • C x x x x x x
  • I x x x x x x
  • V x x x x x x
  • M x x x x x x
  • R x x x x x x
  • E x x x x x x
  • H x x x x x x
  • Y x x x x x x
  • Find optimal alignment by
    trace-back procedure

95
Pair-wise sequence alignment

  • T W L V R
  • x x x x x x
  • C x
  • I x
  • V x
  • M x
  • R x
  • E x
  • H x
  • Y x
  • Initial matrix entries?

96
Pair-wise sequence alignment

  • i
  • T W L V R
  • X X
  • C X
  • I X
  • j V X
  • M
  • R
  • E
  • H
  • Y
  • Entries S(i,j): scores of optimal alignment of
    prefixes up to positions i and j.
  • T W L V
  • - C I V

97
Pair-wise sequence alignment

  • i
  • T W L V R
  • j X X X X X
  • C
  • I
  • V
  • M
  • R
  • E
  • H
  • Y
  • Entries S(i,0): scores of optimal alignment of
    prefix up to position i and empty prefix.
  • Score = - i g
  • T W L V
  • - - - -

98
Pair-wise sequence alignment

  • T W L V R
  • C
  • I
  • V
  • M
  • R
  • E
  • H
  • Y
  • Initial matrix entries
    Example: g = 2

99
Pair-wise sequence alignment

  • T W L V R
  • 0 -2 -4 -6 -8 -10
  • C -2
  • I -4
  • V -6
  • M -8
  • R -10
  • E -12
  • H -14
  • Y -16
  • Initial matrix entries
    Example: g = 2
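
The recursion, the initial entries and the trace-back from the preceding slides can be put together in a short sketch of global alignment by dynamic programming. The similarity function toy_s is a hypothetical stand-in for a substitution matrix; g = 2 follows the example above. The matrix is stored as S[j][i], with i running over the horizontal and j over the vertical sequence.

```python
def needleman_wunsch(x, y, s, g):
    """Global alignment of x (horizontal) and y (vertical).

    Returns the score matrix S, where S[j][i] is the best score for
    aligning x[:i] with y[:j], and the two rows of one optimal alignment.
    """
    n, m = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, n + 1):
        S[0][i] = -i * g            # prefix of x against empty prefix
    for j in range(1, m + 1):
        S[j][0] = -j * g            # prefix of y against empty prefix
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            S[j][i] = max(
                S[j - 1][i - 1] + s(x[i - 1], y[j - 1]),  # from top left
                S[j][i - 1] - g,                          # from left
                S[j - 1][i] - g,                          # from above
            )
    # Trace-back from the bottom-right corner to the top-left corner.
    row_x, row_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[j][i] == S[j - 1][i - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[j][i] == S[j][i - 1] - g:
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return S, "".join(reversed(row_x)), "".join(reversed(row_y))


def toy_s(a, b):
    """Hypothetical similarity: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


S, row_x, row_y = needleman_wunsch("TWLVR", "CIVMREHY", toy_s, g=2)
print(S[0])                      # first row of the matrix
print([row[0] for row in S])     # first column of the matrix
print(row_x)
print(row_y)
```

The first row and first column reproduce the initial entries 0, -2, -4, ... from the slide. Several alignments can share the optimal score; the trace-back here prefers the diagonal step and returns only one of them.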

100
Pair-wise global alignment

  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • R X
  • E X X
  • F X
  • Y X X
  • T W L V - R E A Q I
  • - C I V M R E - F Y

101
Pair-wise global alignment
  • Computational complexity: how do program run
    time and memory depend on size of input data?
  • l1 and l2: lengths of sequences
  • Computing time and memory proportional to
  • l1 · l2
  • Time and memory complexity: O(l1 · l2)



102
Pair-wise sequence alignment
  • More realistic gap penalty: affine-linear instead
    of linear
  • Penalty for gap of length l:
  • c0 + (l-1) c1
  • c0: gap-opening penalty
  • c1: gap-extension penalty
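
A small sketch comparing the two penalty schemes. The values c0 = 10, c1 = 1 and g = 2 are arbitrary illustration values, not recommendations from the slides.

```python
def linear_gap_penalty(length, g):
    """Linear penalty: every gap position costs g."""
    return length * g


def affine_gap_penalty(length, c0, c1):
    """Affine penalty c0 + (length - 1) * c1 for a gap of given length."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return c0 + (length - 1) * c1


for length in (1, 2, 5):
    print(length,
          linear_gap_penalty(length, g=2),
          affine_gap_penalty(length, c0=10, c1=1))
```

With a high opening and a low extension penalty, one long gap is cheaper than several short gaps of the same total length, which a purely linear penalty cannot express.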



103
Pair-wise local alignment
  • So far: global alignment considered, sequences
    aligned over their entire length.
  • But: sequences often share only local sequence
    similarity (conserved genes or domains)
  • Most important application: database searching



104
Pair-wise local alignment

  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • R X
  • E X X
  • H X
  • Y X X
  • T W L V - R E A Q I
  • - C I V M R E - F Y

105
Pair-wise local alignment

  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • R X
  • E X X
  • F X
  • Y X X
  • T W L V - R E A Q I
  • - C I V M R E - F Y

106
Pair-wise local alignment
  • Problem
  • Find pair of segments with maximal alignment
    score (not necessarily part of optimal global
    alignment!)
  • Equivalent: find path starting and ending
    anywhere in the matrix.



107
Pair-wise local alignment

  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • R X
  • E X X
  • F X
  • Y X X
  • T W L V - R E A Q I
  • - C I V M R E - F Y

108
Pair-wise local alignment

  • Recursion formula for global alignment
  • S(i,j) = max { S(i-1,j-1) + s(a_i,b_j) ,
    S(i-1,j) - g , S(i,j-1) - g }

109
Pair-wise local alignment

  • Recursion formula for local alignment
  • S(i,j) = max { 0 , S(i-1,j-1) + s(a_i,b_j) ,
    S(i-1,j) - g , S(i,j-1) - g }

110
Pair-wise local alignment

  • T W L V R
  • 0 0 0 0 0 0
  • C 0
  • I 0
  • V 0
  • M 0
  • R 0
  • E 0
  • H 0
  • Y 0
  • Initial matrix entries 0

111
Pair-wise local alignment

  • T W L V R
  • 0 0 0 0 0 0
  • C 0 0
  • I 0
  • V 0
  • M 0
  • R 0
  • E 0
  • H 0
  • Y 0
  • s(C,T) = -2

112
Pair-wise sequence alignment
  • Recursion formula for global alignment



113
Pair-wise sequence alignment
  • Recursion formula for local alignment



114
Pair-wise sequence alignment
  • For trace-back
  • Store positions i_max and j_max with
  • S(i_max, j_max) maximal
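
A sketch of local alignment along these lines: the recursion contains the extra value 0, the matrix is initialized with zeros, and the trace-back starts at the stored maximum and stops at the first zero entry. The similarity function toy_s and g = 2 are assumptions for illustration; the sequences are taken from the local-alignment slides.

```python
def smith_waterman(x, y, s, g):
    """Return (best score, aligned segment of x, aligned segment of y)."""
    n, m = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # all initial entries are 0
    best, i_max, j_max = 0, 0, 0
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            S[j][i] = max(
                0,
                S[j - 1][i - 1] + s(x[i - 1], y[j - 1]),
                S[j][i - 1] - g,
                S[j - 1][i] - g,
            )
            if S[j][i] > best:                  # remember position of maximum
                best, i_max, j_max = S[j][i], i, j
    # Trace-back from the maximum until an entry with score 0 is reached.
    row_x, row_y = [], []
    i, j = i_max, j_max
    while i > 0 and j > 0 and S[j][i] > 0:
        if S[j][i] == S[j - 1][i - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif S[j][i] == S[j][i - 1] - g:
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


def toy_s(a, b):
    """Hypothetical similarity: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


best, seg_x, seg_y = smith_waterman("TWLVREAQYI", "CIVMREFY", toy_s, g=2)
print(best, seg_x, seg_y)
```

With this toy scoring only the short identical segment survives as the best local alignment; a real substitution matrix would usually extend it across similar residues.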



115
Pair-wise local alignment

  • T W L V R E A Q Y I
  • X X
  • C X
  • I X
  • V X
  • M X
  • R X
  • E X X
  • F X
  • Y X X
  • T W L V - R E A Q I
  • - C I V M R E - F Y

116
Pair-wise local alignment

  • Algorithm by Smith and Waterman (1981)
  • Implementation: e.g. BestFit in GCG package

117
Pair-wise local alignment
  • Complexity
  • l1 and l2: lengths of sequences; computing time and
    memory proportional to l1 · l2
  • Time and space complexity: O(l1 · l2)
  • Too slow for database searching!
  • Therefore tools like BLAST necessary for database
    searching


118
The Basic Local Alignment Search Tool (BLAST)
  • New BLAST version (1997)
  • Two-hit strategy
  • Gapped BLAST
  • Position-Specific Iterative BLAST
  • (PSI BLAST)


119
The Basic Local Alignment Search Tool (BLAST)
  • PSI BLAST
  • search database with standard BLAST
  • take best hits and create multiple alignment
  • calculate profile from multiple alignment
  • search database again with profile as query


120
The Basic Local Alignment Search Tool (BLAST)



121
The Basic Local Alignment Search Tool (BLAST)
  • profile for sequence family or motif
  • table of amino acid/nucleotide frequencies at any
    position in alignment.


122
The Basic Local Alignment Search Tool (BLAST)
  • Profile: frequencies (in %) of nucleotides at every
    position.
  • seq1 A T T G - A T
  • seq2 C T T G T A G
  • seq3 A - - G T A T
  • seq4 A T G G T G T
  • seq5 A C T G T A C
  • A 80 0 0 0 0 80 0
  • T 0 75 75 0 100 0 60
  • C 20 25 0 0 0 0 20
  • G 0 0 25 100 0 20 20
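
The profile table can be recomputed from the alignment with a few lines. Two assumptions are made here: seq1 carries a gap in the fifth column (this is what the percentages in the table imply), and gap characters are ignored when the column frequencies are calculated.

```python
from collections import Counter


def profile(alignment, alphabet="ATCG"):
    """Return {residue: [percentage per column]} for an alignment.

    Gap characters ('-') are not counted, so each column is normalized
    by the number of residues actually present in it.
    """
    table = {c: [] for c in alphabet}
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        for c in alphabet:
            share = 100 * counts[c] / len(residues) if residues else 0
            table[c].append(round(share))
    return table


alignment = [
    "ATTG-AT",   # seq1, gap position assumed
    "CTTGTAG",   # seq2
    "A--GTAT",   # seq3
    "ATGGTGT",   # seq4
    "ACTGTAC",   # seq5
]
table = profile(alignment)
for residue, row in table.items():
    print(residue, row)
```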



123
Tools for multiple sequence alignment
  • s1 T Y I M R E A Q Y E S A Q
  • s2 T C I V M R E A Y E
  • s3 Y I M Q E V Q Q E R
  • s4 W R Y I A M R E Q Y E



124
Tools for multiple sequence alignment
  • s1 - T Y I - M R E A Q Y E S A Q
  • s2 - T C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -



125
Tools for multiple sequence alignment
  • s1 - T Y I - M R E A Q Y E S A Q
  • s2 - T C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -



126
Tools for multiple sequence alignment
  • s1 - T Y I - M R E A Q Y E S A Q
  • s2 - T C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -



127
Tools for multiple sequence alignment
  • s1 - T Y I - M R E A Q Y E S A Q
  • s2 - T C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • General information in multiple alignment
  • Functionally important regions more conserved
    than non-functional regions
  • Local sequence conservation indicates
    functionality!



128
Tools for multiple sequence alignment
  • s1 - T Y I - M R E A Q Y E S A Q
  • s2 - T C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • For phylogeny reconstruction
  • Estimate pairwise distances between sequences
    (distance-based methods for tree reconstruction)
  • Estimate evolutionary events
    (parsimony and maximum likelihood methods)



129
Tools for multiple sequence alignment
  • s1 - T Y I - M R E A Q Y E S A Q
  • s2 - T C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Astronomical number of possible alignments!



130
Tools for multiple sequence alignment
  • s1 - T Y I - M R E A Q Y E S A Q
  • s2 - T C I V M R E A - - - Y E -
  • s3 Y I - - - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Astronomical number of possible alignments!



131
Tools for multiple sequence alignment
  • s1 - T Y I - M R E A Q Y E S A Q
  • s2 - T C I V M R E A - - - Y E -
  • s3 Y I - - - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Computer has to decide which one is best??



132
Tools for multiple sequence alignment
  • Questions in development of multiple-alignment
    programs (as in pairwise alignment)
  • (1) What is a good alignment?
  • → objective function (score)
  • (2) How to find a good alignment?
  • → optimization algorithm
  • First question far more important !



133
Tools for multiple sequence alignment
  • Traditional Objective functions
  • Define Score of alignments as
  • Sum of individual similarity scores S(a,b)
  • Gap penalties
  • Needleman-Wunsch scoring system (1970)



134
Tools for multiple sequence alignment
  • Traditional Objective functions
  • Can be generalized to multiple alignment
  • (e.g. sum-of-pair score, tree alignment)
  • Needleman-Wunsch algorithm can also be
    generalized to multiple alignment, but
  • Very time and memory consuming!
  • -> Heuristics needed



135
Multiple sequence alignment
  • First question: how to score multiple
    alignments?
  • Possible scoring scheme
  • Sum-of-pairs score

136
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
  • 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
  • 1vie 28 YAVESeahpgsvQIYPVAALERIN......

137
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
  • 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
  • 1vie 28 YAVESeahpgsvQIYPVAALERIN......

138
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
  • 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
  • 1vie 28 YAVESeahpgsvQIYPVAALERIN......

139
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
  • 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
  • 1vie 28 YAVESeahpgsvQIYPVAALERIN......

140
Multiple sequence alignment
  • Multiple alignment implies pairwise alignments
  • Use sum of scores of these pairwise alignments
  • 1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
  • 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
  • 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
  • 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
  • 1vie 28 YAVESeahpgsvQIYPVAALERIN......
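
A sketch of the sum-of-pairs score: each pair of rows of the multiple alignment is scored as a pairwise alignment, and the scores are added up. The toy similarity function, g = 2, the small three-row example and the decision to skip columns where both rows have a gap are assumptions made for this illustration.

```python
from itertools import combinations


def pairwise_score(row_a, row_b, s, g):
    """Score of the pairwise alignment implied by two rows."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue            # column is empty in this pair: ignore it
        if a == "-" or b == "-":
            total -= g
        else:
            total += s(a, b)
    return total


def sum_of_pairs(alignment, s, g):
    """Sum of the scores of all implied pairwise alignments."""
    return sum(pairwise_score(a, b, s, g)
               for a, b in combinations(alignment, 2))


def toy_s(a, b):
    """Hypothetical similarity: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


msa = ["TYIV", "T-LV", "TY-V"]   # small made-up multiple alignment
sp = sum_of_pairs(msa, toy_s, g=2)
print(sp)  # pairs score 1, 4 and 0, so the sum is 5
```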

141
Multiple sequence alignment
  • Needleman-Wunsch scoring scheme can be generalized
    from pair-wise to multiple alignment

142
Multiple sequence alignment

143
Multiple sequence alignment
  • Complexity
  • For three sequences of lengths l1, l2, l3:
  • O( l1 · l2 · l3 )
  • For n sequences (average length l):
  • O( l^n )
  • Exponential complexity!

144
Multiple sequence alignment
  • Needleman-Wunsch scoring scheme can be generalized
    from pair-wise to multiple alignment
  • Optimal solution not feasible
  • -> Heuristics necessary

145
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP

146
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP
  • Guide tree

147
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP
  • Idea: align closely related sequences first!

148
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASFQPVAALERIN
  • WLNYNEERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

149
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

150
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN-
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

151
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN--------
  • WW--RLNDKEGYVPRNLLGLYP--------
  • AVVIQDNSDIKVVP--KAKIIRD-------
  • YAVESEA---SVQ--PVAALERIN------
  • WLN-YNE---ERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

152
Progressive Alignment
  • Greedy algorithm
  • Consider partial solution of bigger problem
  • search best partial solution, fix solution
  • search second-best partial solution that is
    consistent with first solution, fix solution
  • Search third-best partial solution etc.
  • E.g. knapsack problem

153
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN--------
  • WW--RLNDKEGYVPRNLLGLYP--------
  • AVVIQDNSDIKVVP--KAKIIRD-------
  • YAVESEA---SVQ--PVAALERIN------
  • WLN-YNE---ERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

154
Progressive Alignment
  • Most important software program
  • CLUSTAL W
  • J. Thompson, T. Gibson, D. Higgins (1994),
    CLUSTAL W: improving the sensitivity of
    progressive multiple sequence alignment, Nucleic
    Acids Res. 22, 4673-4680
  • (18,000 citations in the literature)

155
Tools for multiple sequence alignment
  • Problems with traditional approach
  • Results depend on gap penalty
  • Heuristic guide tree determines alignment
  • alignment used for phylogeny reconstruction
  • Algorithm produces global alignments.



156
Tools for multiple sequence alignment
  • Problems with traditional approach
  • But
  • Many sequence families share only local
    similarity
  • E.g. sequences share one conserved motif



157
Local sequence alignment
EYENS

ERYENS
ERYAS
Find common motif in sequences; ignore the rest
158
Local sequence alignment
E-YENS

ERYENS
ERYA-S
Find common motif in sequences; ignore the rest
159
Local sequence alignment

E-YENS
ERYENS
ERYA-S
Find common motif in sequences; ignore the rest
Local alignment
160
Local sequence alignment
  • Important methods for local multiple alignment
  • PIMA
  • MEME/MAST
  • Idea: expectation maximization.

161
Local sequence alignment
Traditional alignment approaches: either global
or local methods!
162
New question: sequence families with multiple
local similarities


Neither local nor global methods applicable
163
New question: sequence families with multiple
local similarities


Alignment possible if order conserved
164
The DIALIGN approach
  • Morgenstern, Dress, Werner (1996),
  • PNAS 93, 12098-12103
  • Combination of global and local methods
  • Assemble multiple alignment from
  • gap-free local pair-wise alignments
  • ("fragments")

165
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

166
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

167
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

168
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

169
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

170
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

171
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

172
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

173
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

174
The DIALIGN approach
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

175
The DIALIGN approach
Consistency!
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

176
The DIALIGN approach
  • atc------TAATAGTTAaactccccCGTGC-TTag
  • cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
  • caaa--GAGTATCAcc----------CCTGaaTTGAATaa

177
The DIALIGN approach
  • Score of an alignment
  • Define score of fragment f:
  • l(f): length of f
  • s(f): sum of matches (similarity values)
  • P(f): probability to find a fragment with length
    l(f) and at least s(f) matches in random
    sequences that have the same length as the input
    sequences.
  • Score w(f) = -ln P(f)

178
The DIALIGN approach
  • Score of an alignment
  • Define score of fragment f
  • Define score of alignment as
  • sum of scores of involved fragments
  • No gap penalty!

179
The DIALIGN approach
  • Score of an alignment
  • Goal in fragment-based alignment approach: find
  • Consistent collection of fragments with maximum
    sum of weight scores

180
The DIALIGN approach
  • atctaatagttaaaccccctcgtgcttagagatccaaac
  • cagtgcgtgtattactaacggttcaatcgcgcacatccgc
  • Pair-wise alignment

181
The DIALIGN approach
  • atctaatagttaaaccccctcgtgcttagagatccaaac
  • cagtgcgtgtattactaacggttcaatcgcgcacatccgc
  • Pair-wise alignment
  • recursive algorithm finds optimal chain of
  • fragments.

182
The DIALIGN approach
  • ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
  • cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--
  • Pair-wise alignment
  • recursive algorithm finds optimal chain of
  • fragments.

183
The DIALIGN approach
  • ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
  • cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--
  • Optimal pairwise alignment: chain of fragments
    with maximum sum of weights found by dynamic
    programming
  • Standard fragment-chaining algorithm
  • Space-efficient algorithm
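
Fragment chaining by dynamic programming can be sketched as below. This is a plain quadratic-time version for illustration, not the space-efficient algorithm mentioned above and not DIALIGN's actual implementation. The Fragment type, its field names and the example weights are hypothetical.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    start_x: int    # start position in the first sequence
    start_y: int    # start position in the second sequence
    length: int     # fragments are gap-free, so one length suffices
    weight: float   # weight score w(f)


def best_chain(fragments):
    """Return (total weight, chain) of the best chain of fragments.

    A fragment e may precede f if e ends before f starts in both sequences.
    """
    frags = sorted(fragments, key=lambda f: f.start_x + f.length)
    if not frags:
        return 0.0, []
    best = [f.weight for f in frags]    # best chain weight ending in frags[k]
    prev = [None] * len(frags)          # predecessor index in that chain
    for k, f in enumerate(frags):
        for h in range(k):
            e = frags[h]
            ends_before = (e.start_x + e.length <= f.start_x
                           and e.start_y + e.length <= f.start_y)
            if ends_before and best[h] + f.weight > best[k]:
                best[k] = best[h] + f.weight
                prev[k] = h
    k = max(range(len(frags)), key=best.__getitem__)
    total = best[k]
    chain = []
    while k is not None:                # follow predecessors backwards
        chain.append(frags[k])
        k = prev[k]
    return total, chain[::-1]


a = Fragment(0, 6, 5, 3.0)
b = Fragment(7, 2, 4, 4.0)     # crosses a, so both cannot be in one chain
c = Fragment(6, 12, 4, 2.5)
d = Fragment(12, 17, 3, 1.0)
total, chain = best_chain([a, b, c, d])
print(total, chain)
```

In this example the single heaviest fragment b is left out, because the chain a, c, d has a larger sum of weights.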

184
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

185
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaccctgaattgaagagtatcacataa
  • (1) Calculate all optimal pair-wise alignments

186
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • (1) Calculate all optimal pair-wise alignments

187
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • (1) Calculate all optimal pair-wise alignments

188
The DIALIGN approach
  • Fragments from optimal pair-wise alignments
  • might be inconsistent

189
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

190
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

191
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

192
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

193
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

194
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

195
The DIALIGN approach
  • Fragments from optimal pair-wise alignments might
    be inconsistent
  • Sort fragments according to scores
  • Include them one-by-one into growing multiple
    alignment as long as they are consistent
  • (greedy algorithm, comparable to knapsack
    problem)

196
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

197
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg