Exon prediction by Genomic Sequence alignment
Burkhard Morgenstern and Oliver Rinner

1

Lecture: Foundations of Bioinformatics (Vorlesung Grundlagen der Bioinformatik)
http://gobics.de/lectures/ss07/grundlagen


2
Sequence alignment in molecular data analysis
Information from a Single Sequence Alone
3
Sequence alignment in molecular data analysis
Information from a Single Sequence Alone
Multi-Organism High Quality Sequences
(M. Brudno)
4
Tools for multiple sequence alignment
  • seq1 T Y I M R E A Q Y E
  • seq2 T C I V M R E A Y E
  • seq3 Y I M Q E V Q Q E
  • seq4 Y I A M R E Q Y E



5
Tools for multiple sequence alignment
  • seq1 T Y I - M R E A Q Y E
  • seq2 T C I V M R E A - Y E
  • seq3 Y - I - M Q E V Q Q E
  • seq4 Y I A M R E - Q Y E



9
Tools for multiple sequence alignment
  • seq1 T Y I - M R E A Q Y E
  • seq2 T C I V M R E A - Y E
  • seq3 Y - I - M Q E V Q Q E
  • seq4 Y I A M R E - Q Y E
  • Functionally important regions more conserved
    than non-functional regions



10
Tools for multiple sequence alignment
  • seq1 T Y I - M R E A Q Y E
  • seq2 T C I V M R E A - Y E
  • seq3 Y - I - M Q E V Q Q E
  • seq4 Y I A M R E - Q Y E
  • Functionally important regions more conserved
    than non-functional regions
  • Local sequence conservation indicates
    functionality!



11
Tools for multiple sequence alignment
  • seq1 T Y I - M R E A Q Y E
  • seq2 T C I V M R E A - Y E
  • seq3 - Y I - M Q E V Q Q E
  • seq4 Y I A M R E - Q Y E
  • Astronomical Number of possible alignments!
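
To give a feeling for "astronomical": even for only two sequences, the number of gapped alignments grows very quickly. The sketch below is not from the slides; the function name `count_alignments` is made up for illustration. It assumes that every column is a match/mismatch, an insertion, or a deletion, and that the order of adjacent insertions and deletions is distinguished, which gives the recurrence D(i,j) = D(i-1,j) + D(i,j-1) + D(i-1,j-1) (the Delannoy numbers).

```python
def count_alignments(m: int, n: int) -> int:
    """Count gapped pairwise alignments of two sequences of lengths m and n.

    Assumption: each column is (residue, residue), (residue, gap) or
    (gap, residue); columns consisting of two gaps are not allowed.
    """
    # table[i][j] = number of alignments of the first i and first j residues.
    # With one sequence empty there is exactly one alignment (all gaps).
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (
                table[i - 1][j]        # last column: residue over gap
                + table[i][j - 1]      # last column: gap over residue
                + table[i - 1][j - 1]  # last column: residue over residue
            )
    return table[m][n]


if __name__ == "__main__":
    for length in (2, 5, 10, 20):
        print(length, count_alignments(length, length))
```

Under these assumptions, two sequences of length 10 already admit 8,097,453 alignments; with four sequences, as on the slide, the count is far larger still.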



12
Tools for multiple sequence alignment
  • seq1 T Y I - M R E A Q Y E
  • seq2 T C I V - M R E A Y E
  • seq3 - Y I - M Q E V Q Q E
  • seq4 Y I A M R E - Q Y E
  • Astronomical Number of possible alignments!



13
Tools for multiple sequence alignment
  • seq1 T Y I - M R E A Q Y E
  • seq2 T C I V M R E A - Y E
  • seq3 - Y I - M Q E V Q Q E
  • seq4 Y I A M R E - Q Y E
  • Which one is the best ???



14
Tools for multiple sequence alignment
  • Questions in development of alignment programs
  • (1) What is a good alignment?
  • → objective function (score)
  • (2) How to find a good alignment?
  • → optimization algorithm
  • First question far more important!



15
Tools for multiple sequence alignment
  • Most important scoring scheme for multiple
    alignment
  • Sum-of-pairs score for global alignment.
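
A minimal sketch of a sum-of-pairs score may help here: the score of a multiple alignment is the sum, over all columns and all pairs of rows, of a pairwise score. The scoring values (`match`, `mismatch`, `gap`) and the function names are assumptions chosen for illustration; real programs typically use substitution matrices and affine gap penalties.

```python
from itertools import combinations


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of symbols from the same alignment column."""
    if a == "-" and b == "-":
        return 0          # gap against gap is ignored
    if a == "-" or b == "-":
        return gap        # residue against gap
    return match if a == b else mismatch


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum pairwise scores over all columns and all pairs of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-", "ACG", "A-G"]))
```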



16
Divide-and-Conquer Alignment (DCA)
  • J. Stoye, A. Dress (Bielefeld)
  • Approximate optimal global multiple alignment
  • Divide sequences into small sub-sequences
  • Use MSA to calculate optimal alignment for
    sub-sequences
  • Concatenate sub-alignments

17
Divide-and-Conquer Alignment (DCA)

18
Divide-and-Conquer Alignment (DCA)
19
Tools for multiple sequence alignment
  • Problems with traditional approach
  • Results depend on gap penalty
  • Heuristic guide tree determines alignment;
    alignment used for phylogeny reconstruction
  • Algorithm produces global alignments.



20
First step in sequence comparison: alignment
  • global alignment (Needleman and Wunsch, 1970;
    Clustal W)
  • atctaatagttaatactcgtccaagtat
  • atctgtattactaaacaactggtgctacta

21
First step in sequence comparison: alignment
  • global alignment (Needleman and Wunsch, 1970;
    Clustal W)
  • atc--taatagttaat--actcgtccaagtat
  • atctgtattact-aaacaactggtgctacta-
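
Global alignment in the sense of Needleman and Wunsch can be sketched as a dynamic-programming table followed by a traceback. The version below is a simplified illustration, assuming a linear gap penalty and made-up scoring values; it is not the scoring used to produce the alignment on the slide.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Traceback from the bottom-right corner to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("atctaatagttaatactcgtccaagtat",
                               "atctgtattactaaacaactggtgctacta")
    print(s)
    print(x)
    print(y)
```

Because every residue of both sequences must appear in the result, the two output rows always have equal length and cover both inputs end to end.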

22
First step in sequence comparison: alignment
  • global alignment (Needleman and Wunsch, 1970;
    Clustal W)
  • atc--taatagttaat--actcgtccaagtat
  • atctgtattact-aaacaactggtgctacta-
  • local alignment (Smith and Waterman, 1981)
  • atctaatagttaatactcgtccaagtat
  • gcgtgtattactaaacggttcaatctaacat

24
First step in sequence comparison: alignment
  • global alignment (Needleman and Wunsch, 1970;
    Clustal W)
  • atc--taatagttaat--actcgtccaagtat
  • atctgtattact-aaacaactggtgctacta-
  • local alignment (Smith and Waterman, 1981)
  • atc--taatagttaatactcgtccaagtat
  • gcgtgtattact-aaacggttcaatctaacat
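
For comparison, local alignment in the sense of Smith and Waterman differs from the global case in two ways: table cells are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The sketch below assumes a linear gap penalty and illustrative scoring values that are not taken from the slides.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment start afresh at any cell.
            h[i][j] = max(0, h[i - 1][j - 1] + sub, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("atctaatagttaatactcgtccaagtat",
                         "gcgtgtattactaaacggttcaatctaacat"))
```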

25
New question: sequence families with multiple
local similarities


Neither local nor global methods applicable
26
New question: sequence families with multiple
local similarities


Alignment possible if order conserved
27
The DIALIGN approach
  • Morgenstern, Dress, Werner (1996),
  • PNAS 93, 12098-12103
  • Combination of global and local methods
  • Assemble multiple alignment from
    gap-free local pair-wise alignments
    ("fragments")

28
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

34
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

35
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

36
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

37
The DIALIGN approach
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

38
The DIALIGN approach
Consistency!
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

39
The DIALIGN approach
  • atc------TAATAGTTAaactccccCGTGC-TTag
  • cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
  • caaa--GAGTATCAcc----------CCTGaaTTGAATaa

40
The DIALIGN approach
  • Score of an alignment
  • Define score of fragment f
  • l(f): length of f
  • s(f): sum of matches (similarity values)
  • P(f): probability of finding a fragment with length
    l(f) and at least s(f) matches in random
    sequences that have the same length as the input
    sequences
  • Score: w(f) = -ln P(f)
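
To make the weight definition concrete, here is a rough sketch of how such a score could be computed for DNA. It is an approximation under stated assumptions, not the formula used by the program: matches are assumed independent with probability `p_match = 0.25`, the chance that one fixed fragment has at least s matches is a binomial tail, and the dependence on sequence length is modelled by a crude union bound over all `len1 * len2` possible start-position pairs, capped at 1. All function and parameter names are hypothetical.

```python
from math import comb, log


def tail_probability(length: int, matches: int, p_match: float = 0.25) -> float:
    """P(at least `matches` matches among `length` aligned residue pairs)."""
    return sum(
        comb(length, k) * p_match**k * (1.0 - p_match) ** (length - k)
        for k in range(matches, length + 1)
    )


def fragment_weight(length: int, matches: int, len1: int, len2: int,
                    p_match: float = 0.25) -> float:
    """Approximate w(f) = -ln P(f) for a gap-free fragment.

    Assumption: P(f) is bounded by (number of start-position pairs) times
    the binomial tail, capped at 1 so that the weight is never negative.
    """
    p = min(1.0, len1 * len2 * tail_probability(length, matches, p_match))
    return -log(p)


if __name__ == "__main__":
    # Longer, better-matching fragments are less likely by chance,
    # so they receive higher weights.
    print(fragment_weight(10, 10, 100, 100))
    print(fragment_weight(20, 18, 100, 100))
```

Under this approximation, short fragments in long sequences get weight 0, because they are expected to occur by chance somewhere.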

41
The DIALIGN approach
  • Score of an alignment
  • Define score of fragment f
  • Define score of alignment as
  • sum of scores of involved fragments
  • No gap penalty!

42
The DIALIGN approach
  • Score of an alignment
  • Goal in fragment-based alignment approach: find a
    consistent collection of fragments with maximum
    sum of weight scores

43
The DIALIGN approach
  • atctaatagttaaaccccctcgtgcttagagatccaaac
  • cagtgcgtgtattactaacggttcaatcgcgcacatccgc
  • Pair-wise alignment

44
The DIALIGN approach
  • atctaatagttaaaccccctcgtgcttagagatccaaac
  • cagtgcgtgtattactaacggttcaatcgcgcacatccgc
  • Pair-wise alignment
  • Recursive algorithm finds optimal chain of
    fragments.

45
The DIALIGN approach
  • ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
  • cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--
  • Pair-wise alignment
  • Recursive algorithm finds optimal chain of
    fragments.

46
The DIALIGN approach
  • ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
  • cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--
  • Optimal pairwise alignment: chain of fragments
    with maximum sum of weights, found by dynamic
    programming
  • Standard fragment-chaining algorithm
  • Space-efficient algorithm
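
The chaining step can be illustrated with a simple quadratic-time dynamic program. This is a generic textbook-style sketch, not the space-efficient algorithm mentioned on the slide; the `Fragment` type and its field names are assumptions. A fragment f may precede a fragment g in a chain if f ends before g starts in both sequences.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    start1: int    # start position in sequence 1 (0-based)
    start2: int    # start position in sequence 2 (0-based)
    length: int    # gap-free, so the same length in both sequences (>= 1)
    weight: float


def precedes(f: Fragment, g: Fragment) -> bool:
    """True if f ends before g starts in both sequences."""
    return f.start1 + f.length <= g.start1 and f.start2 + f.length <= g.start2


def best_chain(fragments: list[Fragment]) -> tuple[float, list[Fragment]]:
    """Return the maximum total weight and one chain achieving it."""
    if not fragments:
        return 0.0, []
    # Sorting by end position guarantees that any possible predecessor
    # of a fragment appears earlier in the list.
    frags = sorted(fragments, key=lambda f: (f.start1 + f.length, f.start2 + f.length))
    best = [f.weight for f in frags]   # best[j]: best chain weight ending in frags[j]
    prev = [-1] * len(frags)           # back-pointers for reconstructing the chain
    for j, g in enumerate(frags):
        for i in range(j):
            if precedes(frags[i], g) and best[i] + g.weight > best[j]:
                best[j] = best[i] + g.weight
                prev[j] = i
    j = max(range(len(frags)), key=best.__getitem__)
    total = best[j]
    chain = []
    while j != -1:
        chain.append(frags[j])
        j = prev[j]
    return total, chain[::-1]
```

Note that no gap penalty appears anywhere: the regions between chained fragments simply stay unaligned.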

47
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

48
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaccctgaattgaagagtatcacataa
  • (1) Calculate all optimal pair-wise alignments

49
The DIALIGN approach
  • Multiple alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • (1) Calculate all optimal pair-wise alignments

51
The DIALIGN approach
  • Fragments from optimal pair-wise alignments
    might be inconsistent

52
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

55
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

56
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

57
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

58
The DIALIGN approach
  • Fragments from optimal pair-wise alignments might
    be inconsistent
  • Sort fragments according to scores
  • Include them one-by-one into growing multiple
    alignment as long as they are consistent
  • (greedy algorithm, comparable to knapsack
    problem)
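
The greedy step can be sketched as follows. This is a deliberately naive illustration, not the program's implementation: it re-checks consistency from scratch for every candidate fragment. A set of fragments is treated as consistent if (a) no group of mutually aligned sites contains two sites from the same sequence and (b) the groups can be ordered left to right without contradicting the residue order in any sequence. The `Fragment` type and all names are assumptions.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    seq_a: int     # index of the first sequence
    start_a: int   # start position in the first sequence
    seq_b: int     # index of the second sequence
    start_b: int   # start position in the second sequence
    length: int
    weight: float


def is_consistent(fragments: list[Fragment]) -> bool:
    parent: dict = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    # Merge sites (sequence, position) aligned by any fragment.
    for f in fragments:
        for k in range(f.length):
            parent[find((f.seq_a, f.start_a + k))] = find((f.seq_b, f.start_b + k))
    sites = sorted(parent)
    # (a) at most one site per sequence in each group of aligned sites
    seen = set()
    for site in sites:
        key = (find(site), site[0])
        if key in seen:
            return False
        seen.add(key)
    # (b) the order of groups implied by each sequence must be acyclic
    succ: dict = {}
    for s, t in zip(sites, sites[1:]):
        if s[0] == t[0]:                    # consecutive sites of one sequence
            succ.setdefault(find(s), set()).add(find(t))
    state: dict = {}

    def has_cycle(c) -> bool:
        state[c] = 1                        # 1 = on the current DFS path
        for d in succ.get(c, ()):
            if state.get(d) == 1 or (d not in state and has_cycle(d)):
                return True
        state[c] = 2                        # 2 = fully explored
        return False

    return not any(c not in state and has_cycle(c) for c in list(succ))


def greedy_alignment(fragments: list[Fragment]) -> list[Fragment]:
    """Accept fragments in order of decreasing weight while they stay consistent."""
    accepted: list[Fragment] = []
    for f in sorted(fragments, key=lambda f: f.weight, reverse=True):
        if is_consistent(accepted + [f]):
            accepted.append(f)
    return accepted
```

Re-running the full check for every fragment is far too slow for real data; it is used here only because it keeps the consistency condition easy to read.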

59
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

63
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • Consistency problem

65
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • Upper and lower bounds for alignable positions

66
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa
  • Upper and lower bounds for alignable positions

67
The DIALIGN approach
  • atc------taatagt taaactcccccgtgcttag
  • cagtgcgtgtattact aacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa
  • Upper and lower bounds for alignable positions

68
The DIALIGN approach
  • atc------taata-----gttaaactcccccgtgcttag
  • cagtgcgtgtatta-----ctaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa
  • Upper and lower bounds for alignable positions

69
The DIALIGN approach
Site x = [i, p] (sequence i, position p)
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • Upper and lower bounds for alignable positions

70
The DIALIGN approach
Calculate lower bound bl(x,i) and upper
bound bu(x,i) for each site x and sequence i
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • Upper and lower bounds for alignable positions

71
The DIALIGN approach
bl(x,i) and bu(x,i) updated for each new
fragment in alignment
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa
  • Upper and lower bounds for alignable positions

72
The DIALIGN approach
  • Consistency bounds must be updated for each
    new fragment that is included in the growing
    alignment
  • Efficient algorithm
  • (Abdeddaim and Morgenstern, 2002)

73
The DIALIGN approach

  • Advantages of segment-based approach
  • Program can produce global and local alignments!
  • Sequence families alignable that cannot be
    aligned with standard methods

74
Program input
  • Program usage
  • > dialign2-2 [options] <input_file>
  • <input_file>: multi-sequence file in
    FASTA format
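
As a small illustration of the expected input, the sketch below writes several sequences into one multi-sequence FASTA string: a header line starting with ">" followed by the sequence, wrapped to a fixed line width. The helper name `to_fasta`, the sequence names and the file name are made up for this example.

```python
def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format name -> sequence pairs as multi-sequence FASTA text."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        # Wrap the sequence so that no line is longer than `width`.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = {
        "seq1": "atctaatagttaaactcccccgtgcttag",
        "seq2": "cagtgcgtgtattactaacggttcaatcgcg",
        "seq3": "caaagagtatcacccctgaattgaataa",
    }
    # "example.fa" is a hypothetical file name.
    with open("example.fa", "w") as handle:
        handle.write(to_fasta(records))
```

The resulting file can then be passed as the input file in the usage line above.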

75
Program output

  • DIALIGN 2.2.1
  • Program code written by Burkhard Morgenstern and Said Abdeddaim
  • e-mail contact: bmorgen_at_gwdg.de
  • Published research assisted by DIALIGN 2 should cite:
  • Burkhard Morgenstern (1999).
    DIALIGN 2: improvement of the segment-to-segment
    approach to multiple sequence alignment.
    Bioinformatics 15, 211-218.
  • For more information, please visit the DIALIGN home page at
    http://bibiserv.techfak.uni-bielefeld.de/dialign/

76
Program output
  • Alignment (DIALIGN format)

  • dog_il4   1  cagg------ ----GTTTGA atctgataca ttgc------ ----------
  • bla       1  ctga------ ---------- ---------- --------GC CAAGTGGGAA
  • blu       1  ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG

  •              0000000000 0000000000 0000000000 0000000011 1111111111

  • dog_il4  25  ---------- --ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC
  • bla      17  ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc----
  • blu      51  ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC

  •              0000000000 0000000000 0000000000 0000000000 0000000000

  • dog_il4  63  GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT
  • bla      63  ---------- ---------- ---TTTCCCA TGTGCTCCAT GGTGGAATGG

77
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

83
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

84
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

85
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

86
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaac----------ggttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

87
The DIALIGN approach
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

88
The DIALIGN approach
  • atc------TAATAGTTAaactccccCGTGC-TTag------
  • cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
  • caaa--GAGTATCAcc----------CCTGaaTTGAATaa--

89
The DIALIGN approach
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

90
Alignment of large genomic sequences
  • Fragment-based alignment approach useful for
    alignment of genomic sequences.
  • Possible applications
  • Detection of regulatory elements
  • Identification of pathogenic microorganisms
  • Gene prediction

91
DIALIGN alignment of human and murine genomic
sequences
92
DIALIGN alignment of tomato and Arabidopsis thaliana genomic
sequences
93
Alignment of large genomic sequences


Gene-regulatory sites identified by multiple
sequence alignment (phylogenetic footprinting)
94
Alignment of large genomic sequences


95
Performance of long-range alignment programs for
exon discovery (human - mouse comparison)
96
Performance of long-range alignment programs for
exon discovery (Arabidopsis thaliana - tomato comparison)