Bioinformatics Methods Course: Multiple Sequence Alignment (Burkhard Morgenstern, University of Göttingen)

1
Bioinformatics Methods Course
Multiple Sequence Alignment
Burkhard Morgenstern
University of Göttingen
Institute of Microbiology and Genetics
Department of Bioinformatics
Göttingen, October/November 2006
2
Tools for multiple sequence alignment
  • T Y I M R E A Q Y E
  • T C I V M R E A Y E



3
Tools for multiple sequence alignment
  • T Y I - M R E A Q Y E
  • T C I V M R E A - Y E



4
Tools for multiple sequence alignment
  • T Y I M R E A Q Y E
  • T C I V M R E A Y E
  • Y I M Q E V Q Q E
  • Y I A M R E Q Y E



5
Tools for multiple sequence alignment
  • T Y I - M R E A Q Y E
  • T C I V M R E A - Y E
  • Y - I - M Q E V Q Q E
  • Y I A M R E - Q Y E



6
Tools for multiple sequence alignment
  • T Y I - M R E A Q Y E
  • T C I V M R E A - Y E
  • - Y I - M Q E V Q Q E
  • Y I A M R E - Q Y E
  • Astronomical Number of possible alignments!
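
To put a number on "astronomical", here is a minimal sketch for the two-sequence case. It assumes that every alignment column is either a residue pair, a gap in the first sequence, or a gap in the second, and that differently ordered gap columns count as different alignments. Under that assumption the count follows a simple three-way recurrence (the Delannoy numbers). The function name `count_alignments` is illustrative only.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of gapped alignments of two sequences of lengths m and n."""
    if m == 0 or n == 0:
        # Only one possibility left: align the remainder against gaps.
        return 1
    # The last column is a residue pair, a gap in sequence 2,
    # or a gap in sequence 1.
    return (count_alignments(m - 1, n - 1)
            + count_alignments(m - 1, n)
            + count_alignments(m, n - 1))


# The two example sequences from the slides have 10 residues each.
print(count_alignments(len("TYIMREAQYE"), len("TCIVMREAYE")))
```

Even for two sequences of length 10 the count is already in the millions, and it grows much faster once more sequences are added.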



7
Tools for multiple sequence alignment
  • T Y I - M R E A Q Y E
  • T C I V - M R E A Y E
  • - Y I - M Q E V Q Q E
  • Y I A M R E - Q Y E
  • Astronomical Number of possible alignments!



8
Tools for multiple sequence alignment
  • T Y I - M R E A Q Y E
  • T C I V M R E A - Y E
  • - Y I - M Q E V Q Q E
  • Y I A M R E - Q Y E
  • Which one is the best ???



9
Tools for multiple sequence alignment
  • Questions in development of alignment programs:
  • (1) What is a good alignment?
  • → objective function (score)
  • (2) How to find a good alignment?
  • → optimization algorithm
  • First question far more important!



10
Tools for multiple sequence alignment
  • Before defining an objective function (scoring
    scheme):
  • What is a biologically good alignment?



11
Tools for multiple sequence alignment
  • Criteria for alignment quality:
  • 3D structure: align residues at corresponding
    positions in the 3D structure of the protein!
  • Evolution: align residues with common ancestors!



12
Tools for multiple sequence alignment
  • T Y I - M R E A Q Y E
  • T C I V - M R E A Y E
  • - Y I - M Q E V Q Q E
  • - Y I A M R E - Q Y E
  • Alignment: a hypothesis about sequence evolution
  • Search for the most plausible hypothesis!



13
Tools for multiple sequence alignment
  • Compute for amino acids a and b:
  • Probability $p_{a,b}$ of substitution
  • $a \to b$ (or $b \to a$),
  • Frequency $q_a$ of $a$
  • Define
  • $s(a,b) = \log \frac{p_{a,b}}{q_a \, q_b}$
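
The log-odds definition above can be tried out with a few lines of code. This is a minimal sketch: the frequencies and substitution probabilities below are hypothetical numbers chosen only for illustration, not values from any published matrix, and the natural logarithm is an arbitrary choice of base.

```python
import math


def log_odds_score(p_ab: float, q_a: float, q_b: float) -> float:
    """s(a,b) = log(p_ab / (q_a * q_b))."""
    return math.log(p_ab / (q_a * q_b))


# Hypothetical background frequencies and pair probabilities.
q = {"I": 0.06, "L": 0.10, "W": 0.01}
p = {("I", "L"): 0.012, ("I", "W"): 0.0003}

for (a, b), p_ab in p.items():
    print(a, b, round(log_odds_score(p_ab, q[a], q[b]), 2))
```

A pair that is observed more often than expected by chance (here I/L) gets a positive score; a pair observed less often than expected (here I/W) gets a negative one.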



16
Tools for multiple sequence alignment
  • Traditional objective functions:
  • Define the score of an alignment as
  • the sum of the individual similarity scores s(a,b),
  • minus a gap penalty g for each gap in the alignment
  • Needleman-Wunsch scoring system (1970) for
    pairwise alignment (= alignment of two sequences)



17
  • T Y W I V
  • T - - L V
  • Example:
  • Score = s(T,T) + s(I,L) + s(V,V) - 2g
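
The example score can be computed mechanically. The sketch below assumes that each gap position costs g and that penalties are subtracted; `toy_s` is a hypothetical substitution function made up for this illustration.

```python
def alignment_score(row1: str, row2: str, s, g: float) -> float:
    """Sum of s(a,b) over residue pairs, minus g per gap position."""
    score = 0.0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score -= g
        else:
            score += s(a, b)
    return score


def toy_s(a: str, b: str) -> float:
    """Hypothetical similarity scores, for illustration only."""
    if a == b:
        return 2.0
    if {a, b} == {"I", "L"}:
        return 1.0
    return -1.0


# s(T,T) + s(I,L) + s(V,V) - 2g = 2 + 1 + 2 - 2 * 1.5
print(alignment_score("TYWIV", "T--LV", toy_s, g=1.5))  # 2.0
```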



18
  • T Y W I V
  • T - - L V
  • Idea: the alignment with optimal (maximal) score
    is probably biologically meaningful.
  • A dynamic programming algorithm finds the optimal
    alignment for two sequences efficiently
    (Needleman and Wunsch, 1970).
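
A compact sketch of this dynamic programming idea follows. It assumes a linear gap penalty (each gap position costs g) and takes the similarity function s as a parameter; the unit scoring used in the example (+1 match, -1 mismatch) is an illustrative choice, not the scheme from the course.

```python
def needleman_wunsch(x: str, y: str, s, g):
    """Global alignment of x and y; returns (score, aligned_x, aligned_y)."""
    m, n = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j].
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - g
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - g
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # pair
                          F[i - 1][j] - g,                          # gap in y
                          F[i][j - 1] - g)                          # gap in x
    # Trace back from the bottom-right corner to recover one optimal alignment.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


def unit_score(a: str, b: str) -> int:
    """Illustrative scoring: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


score, row1, row2 = needleman_wunsch("TYIMREAQYE", "TCIVMREAYE", unit_score, 1)
print(score)
print(row1)
print(row2)
```

The table has (m+1)(n+1) cells and each cell looks at three neighbours, so time and memory grow with the product of the two sequence lengths.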



19
Tools for multiple sequence alignment
  • Traditional objective functions can be
    generalized to multiple alignment (e.g.
    sum-of-pairs score, tree alignment)
  • The Needleman-Wunsch algorithm can also be
    generalized to find an optimal multiple alignment,
    but:
  • Very time- and memory-consuming!
  • → Heuristic algorithm needed, i.e. a fast but
    sub-optimal solution
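
A rough count shows why the exact generalization is so expensive. This sketch assumes the straightforward extension of the two-sequence table to a k-dimensional lattice with one axis per sequence; the helper names are illustrative, and the five sequences are the ones used in the progressive alignment example.

```python
from math import prod


def dp_cells(lengths) -> int:
    """Cells in the full k-dimensional lattice: product of (L_i + 1)."""
    return prod(length + 1 for length in lengths)


def dp_predecessors(k: int) -> int:
    """Neighbours examined per cell: every non-empty subset of the
    k sequences can advance by one residue, giving 2**k - 1 options."""
    return 2 ** k - 1


seqs = [
    "WCEAQTKNGQGWVPSNYITPVN",
    "WWRLNDKEGYVPRNLLGLYP",
    "AVVIQDNSDIKVVPKAKIIRD",
    "YAVESEAHPGSFQPVAALERIN",
    "WLNYNETTGERGDFPGTYVEYIGRKKISP",
]
print(dp_cells([len(s) for s in seqs]))
print(dp_predecessors(len(seqs)))
```

Both factors grow exponentially with the number of sequences, which is the motivation for the heuristics on the following slides.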



20
Tools for multiple sequence alignment
  • Most commonly used heuristic for multiple
    alignment:
  • Progressive alignment
  • (mid-1980s)



21
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP

22
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP
  • Guide tree

23
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASFQPVAALERIN
  • WLNYNEERGDFPGTYVEYIGRKKISP
  • Profile alignment: "once a gap, always a gap"

24
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP
  • Profile alignment: "once a gap, always a gap"

25
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN-
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP
  • Profile alignment: "once a gap, always a gap"

26
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN--------
  • WW--RLNDKEGYVPRNLLGLYP--------
  • AVVIQDNSDIKVVP--KAKIIRD-------
  • YAVESEA---SVQ--PVAALERIN------
  • WLN-YNE---ERGDFPGTYVEYIGRKKISP
  • Profile alignment: "once a gap, always a gap"
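
The "once a gap, always a gap" rule can be illustrated with a small merge routine. This is a sketch under stated assumptions: the column path (`'M'` = pair a column from each group, `'A'` = column from the first group only, `'B'` = column from the second group only) is taken as given, whereas a real program would obtain it from a profile-profile dynamic programming step that is not shown here. The function name and the toy sequences are hypothetical.

```python
def merge_groups(group_a, group_b, path):
    """Merge two already-aligned groups along a column path.

    Existing columns are copied unchanged, so gaps placed in earlier
    steps are never removed; new gaps are added to a whole group at once.
    """
    out_a = [""] * len(group_a)
    out_b = [""] * len(group_b)
    ia = ib = 0  # next unread column in each group
    for step in path:
        if step in "MA":
            out_a = [o + row[ia] for o, row in zip(out_a, group_a)]
            ia += 1
        else:
            out_a = [o + "-" for o in out_a]
        if step in "MB":
            out_b = [o + row[ib] for o, row in zip(out_b, group_b)]
            ib += 1
        else:
            out_b = [o + "-" for o in out_b]
    return out_a + out_b


# Toy example: a two-sequence group that already contains a gap,
# merged with a single sequence.
for row in merge_groups(["AC-T", "ACGT"], ["AGT"], "MBAAM"):
    print(row)
```

The gap in `AC-T` survives the merge, and the new gap column introduced by step `B` appears in every row of the first group.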

27
CLUSTAL W
  • Most important software program:
  • CLUSTAL W
  • J. Thompson, T. Gibson, D. Higgins (1994),
    CLUSTAL W: improving the sensitivity of
    progressive multiple sequence alignment, Nucl.
    Acids Res. 22, 4673-4680
  • (20,000 citations in the literature)

28
Tools for multiple sequence alignment
  • Problems with the traditional approach:
  • Results depend on the gap penalty
  • A heuristic guide tree determines the alignment;
  • the alignment is used for phylogeny reconstruction
  • The algorithm produces global alignments.



29
Tools for multiple sequence alignment
  • Problems with the traditional approach:
  • But:
  • Many sequence families share only local
    similarity
  • E.g. sequences share one conserved motif



30
Local sequence alignment
EYENS

ERYENS
ERYAS
Find common motif in sequences; ignore the rest
31
Local sequence alignment
E-YENS

ERYENS
ERYA-S
Find common motif in sequences; ignore the rest
32
Local sequence alignment

E-YENS
ERYENS
ERYA-S
Find common motif in sequences; ignore the rest
Local alignment
33
Gibbs Motif Sampler
Local multiple alignment without gaps
C.E. Lawrence et al. (1993), Detecting subtle sequence
signals: a Gibbs sampling strategy for multiple
alignment, Science 262, 208-214
34
Traditional alignment approaches: either global
or local methods!
35
New question: sequence families with multiple
local similarities


Neither local nor global methods applicable
36
New question: sequence families with multiple
local similarities


Alignment possible if the order is conserved
37
The DIALIGN approach
  • Morgenstern, Dress, Werner (1996),
  • PNAS 93, 12098-12103
  • Combination of global and local methods
  • Assemble multiple alignment from
    gap-free local pair-wise alignments
    ("fragments")

38
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

44
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

45
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

46
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

47
The DIALIGN approach
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

48
The DIALIGN approach
Consistency!
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

49
The DIALIGN approach
  • atc------TAATAGTTAaactccccCGTGC-TTag
  • cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
  • caaa--GAGTATCAcc----------CCTGaaTTGAATaa

50
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

51
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaccctgaattgaagagtatcacataa
  • (1) Calculate all optimal pair-wise alignments

52
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • (1) Calculate all optimal pair-wise alignments

54
The DIALIGN approach
  • Fragments from optimal pair-wise alignments
  • might be inconsistent
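
What "inconsistent" means can be sketched for the simplest case, two fragments between the same pair of sequences. Here a fragment is a gap-free segment pair described by its two start positions and its length (the `Fragment` type and 0-based positions are assumptions for this illustration). Two such fragments fit into one alignment if they lie on the same diagonal or if one comes strictly before the other in both sequences. With three or more sequences, consistency additionally involves conflicts that arise transitively across sequence pairs, which this sketch does not cover.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    start_a: int  # start position in the first sequence (0-based)
    start_b: int  # start position in the second sequence (0-based)
    length: int   # number of aligned residue pairs, no gaps


def pairwise_consistent(f: Fragment, g: Fragment) -> bool:
    """True if both fragments can be part of the same pairwise alignment."""
    if f.start_a - f.start_b == g.start_a - g.start_b:
        # Same diagonal: any overlapping part aligns identical residue pairs.
        return True
    f_before_g = (f.start_a + f.length <= g.start_a
                  and f.start_b + f.length <= g.start_b)
    g_before_f = (g.start_a + g.length <= f.start_a
                  and g.start_b + g.length <= f.start_b)
    # Otherwise the fragments either cross or assign one residue twice.
    return f_before_g or g_before_f


print(pairwise_consistent(Fragment(0, 0, 3), Fragment(5, 4, 2)))  # True
print(pairwise_consistent(Fragment(0, 5, 3), Fragment(4, 0, 3)))  # False
```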

55
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

58
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

59
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

60
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

61
The DIALIGN approach

  • Score of alignment:
  • Define weight score for fragments based on
    probability of random occurrence
  • Score of alignment = sum of weight scores of
    fragments
  • Goal: find consistent set of fragments with
    maximum total weight
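
The slide only states that fragment weights are based on the probability of random occurrence. One way to make this concrete is sketched below; it is an assumption for illustration, not necessarily the formula used by DIALIGN. For a DNA fragment of length l with m matching positions, P is taken as the probability that a random segment pair of length l has at least m matches (match probability 0.25 per position), and the weight is -ln P.

```python
from math import comb, log


def random_match_probability(length: int, matches: int, p: float = 0.25) -> float:
    """P(at least `matches` matches among `length` positions by chance)."""
    return sum(comb(length, k) * p ** k * (1 - p) ** (length - k)
               for k in range(matches, length + 1))


def fragment_weight(seg_a: str, seg_b: str, p: float = 0.25) -> float:
    """Weight of a gap-free fragment: -ln of its chance probability."""
    if len(seg_a) != len(seg_b):
        raise ValueError("fragments are gap-free, so both segments "
                         "must have the same length")
    matches = sum(a == b for a, b in zip(seg_a, seg_b))
    return -log(random_match_probability(len(seg_a), matches, p))


def alignment_weight(fragments) -> float:
    """Score of an alignment: sum of the weights of its fragments."""
    return sum(fragment_weight(a, b) for a, b in fragments)


print(round(fragment_weight("acgt", "acgt"), 2))
print(round(fragment_weight("acgt", "acga"), 2))
```

Under this assumption, long fragments with many matches are unlikely to occur by chance and therefore receive high weights, while short or poorly matching fragments contribute little.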

62
The DIALIGN approach

  • Advantages of the segment-based approach:
  • Program can produce global and local alignments!
  • Sequence families can be aligned that cannot be
    aligned with standard methods

63
T-COFFEE

  • C. Notredame, D. Higgins, J. Heringa (2000),
    T-Coffee: a novel method for fast and accurate
    multiple sequence alignment, J. Mol. Biol.

67
T-COFFEE

  • T-COFFEE
  • Less sensitive to spurious pairwise similarities
  • Can handle local homologies better than CLUSTAL

68
T-COFFEE

  • T-COFFEE
  • Idea:
  • Build library of pairwise alignments
  • Alignments of seq i, j and seq j, k support
    the alignment of seq i, k.
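
The support idea can be sketched as a small "library extension" step. The sketch assumes that the library stores a weight for each aligned residue pair, with residues written as (sequence name, position) tuples, and that a pair x, z gains the minimum of the two weights whenever both residues are aligned to the same residue y of a third sequence. The data layout and the function name are assumptions made for this illustration.

```python
from collections import defaultdict


def extend_library(primary):
    """primary: {(residue_x, residue_y): weight}, residue = (seq, pos).

    Returns {frozenset({residue_x, residue_z}): extended weight}.
    """
    # Symmetric neighbour map: residue -> {aligned residue: weight}.
    neighbours = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = defaultdict(float)
    # Direct evidence from the pairwise alignments.
    for (x, y), w in primary.items():
        extended[frozenset((x, y))] += w
    # Indirect evidence: x ~ y and y ~ z support x ~ z.
    for y, partners in neighbours.items():
        items = list(partners.items())
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                (x, w_xy), (z, w_yz) = items[a], items[b]
                if x[0] != z[0]:  # x and z must be in different sequences
                    extended[frozenset((x, z))] += min(w_xy, w_yz)
    return dict(extended)


library = {
    (("i", 3), ("j", 5)): 80,
    (("j", 5), ("k", 2)): 60,
}
for pair, weight in extend_library(library).items():
    print(sorted(pair), weight)
```

In this toy library, residue 3 of sequence i and residue 2 of sequence k were never aligned directly, but they receive a weight because both align to residue 5 of sequence j.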

69
Evaluation of multi-alignment methods

  • Alignment evaluation by comparison to trusted
    benchmark alignments.
  • "True" alignment known from information about
    structure or evolution.
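
One common way to carry out such a comparison is a sum-of-pairs style measure: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The slides do not say which measure was used, so the sketch below is an assumption; the function names are illustrative, and both `-` and `.` are treated as gap characters.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs that share a column.

    alignment: list of equal-length gapped strings. A residue is
    identified by (row index, index of the residue within its sequence).
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for row, ch in enumerate(column):
            if ch not in "-.":
                present.append((row, counters[row]))
                counters[row] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_accuracy(test, reference) -> float:
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    return len(aligned_pairs(test) & ref) / len(ref)


reference = ["TYI-MREAQYE",
             "TCIVMREA-YE"]
test = ["TYI-MREAQYE",
        "TCIV-MREAYE"]
print(round(sum_of_pairs_accuracy(test, reference), 3))  # 0.556
```

Here the test alignment recovers 5 of the 9 residue pairs of the reference, because one misplaced gap shifts the whole middle block by one position.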

70
Evaluation of multi-alignment methods
  • For protein alignment:
  • M. McClure et al. (1994):
  • 4 protein families, known functional sites
  • J. Thompson et al. (1999):
  • Benchmark database, 130 known 3D structures
    (BAliBASE)
  • T. Lassmann, E. Sonnhammer (2002):
    BAliBASE + simulated evolution (ROSE)


72

Evaluation of multi-alignment methods

1aboA   1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB   1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht    1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA   1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie    1 .drvrkksga.........awqGQIVGWYctnlt.............peG

1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......
1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA  27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie   28 YAVESeahpgsvQIYPVAALERIN......

Key: alpha helix = RED, beta strand = GREEN, core blocks = UNDERSCORE
BAliBASE reference alignments
74
Result: DIALIGN is the best method for distantly related
sequences, T-Coffee the best for globally related
proteins


75
Evaluation of multi-alignment methods
  • BAliBASE: 5 categories of benchmark sequences
    (globally related, internal gaps, end gaps)
  • CLUSTAL W, T-COFFEE, MAFFT, PROBCONS perform well
    on globally related sequences, DIALIGN superior
    for local similarities


76
Evaluation of multi-alignment methods
  • Conclusion: no single best multi-alignment
    program!
  • Advice: try different methods!


77
Anchored sequence alignment
  • Idea: semi-automatic alignment
  • Use expert knowledge to define constraints
    instead of fully automated alignment
  • Define parts of the sequences where the biologically
    correct alignment is known as anchor points;
    align the rest of the sequences automatically.


78
Anchored sequence alignment
  • NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
  • IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
  • GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS


79
Anchored sequence alignment
  • NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
  • IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
  • GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
  • Anchor points in multiple alignment


80
Anchored sequence alignment
  • NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN
  • IIHREDKGVIYALWDYEPQND DELPMKEGDCMT
  • GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
  • Anchor points in multiple alignment


81
Anchored sequence alignment
  • -------NLF V-ALYDFVAS GD-------- NTLSITKGEk lrvLGYNhn
  • iihredkGVI Y-ALWDYEPQ ND-------- DELPMKEGDC MT-------
  • -------GYQ YrALYDYKKE REedidlhlg DILTVNKGSL VA-LGFS--
  • Anchored multiple alignment


82
Algorithmic questions
  • Goal:
  • Find optimal alignment (consistent set of
    fragments) under constraints given by
    user-specified anchor points!

83
Algorithmic questions
  • Additional input file with anchor points
  • 1 3 215 231 5 4.5
  • 2 3 34 78 23 1.23
  • 1 4 317 402 8 8.5

84
Algorithmic questions
  • NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN
  • IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
  • GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS

89
Algorithmic questions
  • Additional input file with anchor points
  • 1 3 215 231 5 4.5
  • 2 3 34 78 23 1.23
  • 1 4 317 402 8 8.5
  • Columns: sequences, start positions, length,
    score
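
Reading such an anchor file takes only a few lines. The sketch below follows the column order given on the slide (two sequence numbers, two start positions, length, score); the `Anchor` type, the field names and the function name are hypothetical, and whether positions count from 1 is not stated in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    seq_a: int      # number of the first sequence
    seq_b: int      # number of the second sequence
    start_a: int    # start position in the first sequence
    start_b: int    # start position in the second sequence
    length: int     # length of the anchored segment pair
    score: float    # user-assigned priority of the anchor


def parse_anchor_line(line: str) -> Anchor:
    """Parse one whitespace-separated line of the anchor file."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    seq_a, seq_b, start_a, start_b, length = map(int, fields[:5])
    return Anchor(seq_a, seq_b, start_a, start_b, length, float(fields[5]))


anchor_file = """\
1 3 215 231 5 4.5
2 3 34 78 23 1.23
1 4 317 402 8 8.5
"""
anchors = [parse_anchor_line(line) for line in anchor_file.splitlines()]
for anchor in anchors:
    print(anchor)
```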