Exon prediction by Genomic Sequence alignment Burkhard Morgenstern and Oliver Rinner



1

Burkhard Morgenstern
Institut für Mikrobiologie und Genetik
Molecular Evolution and Reconstruction of Phylogenetic Trees
Winter semester 2006/2007


2
  • Goal
  • Phylogeny reconstruction based on molecular
    sequence data (DNA, RNA, protein sequences)

3
Multiple sequence alignment
  • Molecular phylogeny reconstruction relies on
    comparative nucleic acid and protein sequence
    analysis
  • Alignment is the most important tool for sequence
    comparison
  • A multiple alignment contains more information than
    a pairwise alignment

4
Tools for multiple sequence alignment
  • Y I M Q E V Q Q E R
  • A sequence is duplicated in evolutionary history (e.g. by a
    speciation event)



6
Tools for multiple sequence alignment
  • Y I M Q E V Q Q E R
  • Y I M Q E V Q Q E R



7
Tools for multiple sequence alignment
  • Y I M Q E A Q Q E R
  • Y L M Q E V Q Q E R
  • Substitutions occur



9
Tools for multiple sequence alignment
  • Y A I M Q E A Q Q E R
  • Y L M - - V Q Q E R V
  • Insertions/deletions (indels) occur



11
Tools for multiple sequence alignment
  • Y A I M Q E A Q Q E R
  • Y L M V Q Q E R V
  • Because of insertions/deletions, sequence
    similarity is no longer immediately visible!



12
Tools for multiple sequence alignment
  • Y A I M Q E A Q Q E R -
  • Y - L M V - - Q Q E R V
  • Alignment brings together related parts of the
    sequences by inserting gaps into sequences



14
Tools for multiple sequence alignment
  • Y A I M Q E A Q Q E R -
  • Y - L M V - - Q Q E R V
  • Mismatches correspond to substitutions
  • Gaps correspond to indels
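The column-by-column reading of the alignment on slide 12 can be done mechanically. This is a minimal sketch. The rows are written without the separating spaces, and each maximal run of gaps in one row is counted as one indel event, which is an interpretation not stated on the slide. The function names are hypothetical.

```python
def classify_columns(row1: str, row2: str) -> tuple[int, int, int]:
    """Count (matches, mismatches, gap columns) in a pairwise alignment."""
    matches = mismatches = gap_columns = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gap_columns += 1      # gap column: insertion/deletion
        elif a == b:
            matches += 1
        else:
            mismatches += 1       # mismatch: substitution
    return matches, mismatches, gap_columns


def count_gap_runs(row: str) -> int:
    """Count maximal runs of '-' in one row (assumed: one run = one indel event)."""
    runs, in_gap = 0, False
    for c in row:
        if c == "-" and not in_gap:
            runs += 1
        in_gap = c == "-"
    return runs


row1 = "YAIMQEAQQER-"
row2 = "Y-LMV--QQERV"
print(classify_columns(row1, row2))                 # (6, 2, 4)
print(count_gap_runs(row1) + count_gap_runs(row2))  # 3
```

There are four gap columns but only three gap runs. Treating a run as a single event is the same intuition behind the non-linear gap penalties introduced later.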



15
Tools for multiple sequence alignment
  • Pairwise alignment: alignment of two sequences
  • Multiple alignment: alignment of N > 2 sequences



16
Tools for multiple sequence alignment
  • s1 R Y I M R E A Q Y E S A Q
  • s2 R C I V M R E A Y E
  • s3 Y I M Q E V Q Q E R
  • s4 W R Y I A M R E Q Y E
  • Assumption: the sequence family is related by common
    ancestry; similarity is due to common history
  • Sequence similarity is not obvious (insertions and
    deletions may have happened)



17
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Multiple alignment: arrangement of sequences by
    introducing gaps
  • The alignment reveals sequence similarities



20
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • General information in a multiple alignment:
  • Functionally important regions are more conserved
    than non-functional regions
  • Local sequence conservation indicates
    functionality!
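One simple way to make "local conservation" concrete is to score each column by the fraction of sequences that share its most frequent residue, with gaps never counting. This measure is chosen here for illustration and is not taken from the slides. It is applied to the s1 to s4 alignment shown above.

```python
from collections import Counter

alignment = [
    "-RYI-MREAQYESAQ",   # s1
    "-RCIVMREA-YE---",   # s2
    "--YI-MQEVQQER--",   # s3
    "WRYIAMRE-QYE---",   # s4
]


def column_conservation(rows: list[str]) -> list[float]:
    """Fraction of sequences sharing the most frequent residue in each column."""
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]   # gaps are ignored
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(rows))
    return scores


scores = column_conservation(alignment)
print([i for i, s in enumerate(scores) if s == 1.0])  # [3, 5, 7, 11]: columns I, M, E, E
```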



21
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Phylogeny reconstruction based on a multiple
    alignment:
  • Estimate pairwise distances between sequences
    (distance-based methods for tree reconstruction)
  • Estimate evolutionary events in sequence history
    (parsimony and maximum-likelihood methods)



22
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Task in bioinformatics: find the best multiple
    alignment for a given sequence set



23
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Astronomical number of possible alignments!
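This sketch gives a sense of scale for just two sequences. It assumes that every arrangement of residue-pair and residue-gap columns counts as a distinct alignment. Under that assumption, the count follows the same three-way recursion as the dynamic-programming alignment algorithm (these are the Delannoy numbers). The function name is hypothetical.

```python
def count_pairwise_alignments(m: int, n: int) -> int:
    """Number of global alignments of sequences of lengths m and n.

    N[i][j] counts alignments of the first i and first j residues; the last
    column is either a residue pair, a residue of seq 1 against a gap, or a
    gap against a residue of seq 2.
    """
    N = [[1] * (n + 1) for _ in range(m + 1)]   # one alignment if a prefix is empty
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            N[i][j] = N[i - 1][j - 1] + N[i - 1][j] + N[i][j - 1]
    return N[m][n]


print(count_pairwise_alignments(10, 10))  # 8097453, for two sequences of only 10 residues
```

The count grows exponentially with sequence length and again with the number of sequences, so enumerating all alignments is out of the question.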



24
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - - - Y E -
  • s3 Y I - - - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Astronomical number of possible alignments!



25
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - - - Y E -
  • s3 Y I - - - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • The computer has to decide: which one is best?



26
Tools for multiple sequence alignment
  • Questions in the development of alignment programs:
  • (1) What is a good alignment?
  • → objective function (score)
  • (2) How to find a good alignment?
  • → optimization algorithm
  • The first question is far more important!



27
Tools for multiple sequence alignment
  • Before defining an objective function (scoring
    scheme):
  • What is a biologically good alignment?



28
Tools for multiple sequence alignment
  • Criteria for alignment quality
  • 3D structure: align residues at corresponding
    positions in the 3D structure of the protein!




31
Tools for multiple sequence alignment
  • Species related by common history



32
Tools for multiple sequence alignment
  • Genes / proteins related by common history



33
Tools for multiple sequence alignment
  • Criteria for alignment quality
  • 3D structure: align residues at corresponding
    positions in the 3D structure of the protein!
  • Evolution: align residues that share a common ancestor!



34
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Alignment = hypothesis about sequence evolution
  • Mismatches correspond to substitutions
  • Gaps correspond to insertions/deletions



35
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - - Y I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Alignment = hypothesis about sequence evolution
  • Search for the most plausible scenario!
  • Estimate probabilities for individual
    evolutionary events: insertions/deletions,
    substitutions



36
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - Y - I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Alignment = hypothesis about sequence evolution
  • Search for the most plausible scenario!
  • Estimate probabilities for individual
    evolutionary events: insertions/deletions,
    substitutions



37
Tools for multiple sequence alignment
  • Compute score s(a,b) for the degree of similarity
    between amino acids a and b based on the probability
    $p_{a,b}$ of substitution a → b (or b → a)
  • (Extremely simplified!)



38
Tools for multiple sequence alignment
39
Tools for multiple sequence alignment
  • Reason for different substitution probabilities
    $p_{a,b}$:
  • Different physical and chemical properties of
    amino acids
  • Amino acids with similar properties are more likely
    to be substituted for each other



41
Tools for multiple sequence alignment
  • Use a penalty for gaps introduced into the alignment
  • Simplest approach: linear gap costs, i.e. a penalty
    proportional to the gap length
  • Non-linear gap penalties are more realistic: a long gap
    is often caused by a single insertion/deletion
  • Most frequently used: affine-linear gap
    penalties, more realistic but still efficient to
    calculate!
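The following sketch contrasts the two cost models. It assumes the affine form g(l) = gap_open + gap_extend · (l − 1). Some texts write o + e · l instead, which only shifts the parameters. The parameter values are hypothetical.

```python
def linear_gap_cost(length: int, g: float) -> float:
    """Penalty proportional to the gap length."""
    return g * length


def affine_gap_cost(length: int, gap_open: float, gap_extend: float) -> float:
    """One opening penalty per gap plus a smaller penalty for each extra position."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


# One gap of length 10 versus ten separate gaps of length 1 (hypothetical parameters)
print(linear_gap_cost(10, 4), 10 * linear_gap_cost(1, 4))          # 40 40
print(affine_gap_cost(10, 10, 1), 10 * affine_gap_cost(1, 10, 1))  # 19 100
```

Linear costs cannot tell the two situations apart. Affine costs favour the single long gap, matching the idea that it stems from one insertion/deletion. Affine penalties still permit quadratic-time dynamic programming (Gotoh, 1982).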



42
  • Traditional Objective functions
  • Define Score of alignments as
  • Sum of individual similarity scores s(a,b)
  • Minus gap penalties
  • Needleman-Wunsch scoring system for pairwise
    alignment (1970)



43
Pair-wise sequence alignment
  • T Y W I V
  • T - - L V
  • Example
  • Score: $s(T,T) + s(I,L) + s(V,V) - 2g$
  • Assumption: linear gap penalty!
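The score in this example can be computed directly. This sketch uses hypothetical similarity values for the three residue pairs, which are not taken from any real substitution matrix, and a hypothetical gap penalty g = 4.

```python
# Hypothetical similarity scores, for illustration only
TOY_SCORES = {("T", "T"): 5, ("I", "L"): 2, ("V", "V"): 4}


def s(a: str, b: str) -> int:
    """Symmetric lookup in the toy score table (KeyError for unknown pairs)."""
    return TOY_SCORES[(a, b)] if (a, b) in TOY_SCORES else TOY_SCORES[(b, a)]


def alignment_score(row1: str, row2: str, g: int) -> int:
    """Sum of s(a,b) over residue pairs minus g per gap position (linear gap penalty)."""
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total -= g
        else:
            total += s(a, b)
    return total


print(alignment_score("TYWIV", "T--LV", g=4))  # 5 + 2 + 4 - 2*4 = 3
```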



44
Pair-wise sequence alignment
  • T Y W I V
  • T - - L V
  • A dynamic-programming algorithm finds the
  • alignment with the best score
  • (Needleman and Wunsch, 1970)
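This is a minimal sketch of the Needleman-Wunsch recursion with a linear gap penalty g per gap position. It returns the best score and one optimal alignment. Ties in the traceback are broken by preferring a residue pair, then a gap in the second sequence, which is an arbitrary choice. The match/mismatch scores in the usage line are hypothetical.

```python
def needleman_wunsch(x, y, s, g):
    """Return (best score, aligned x, aligned y) for a global alignment."""
    n, m = len(x), len(y)
    # F[i][j]: best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -g * i
    for j in range(1, m + 1):
        F[0][j] = -g * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # residue pair
                          F[i - 1][j] - g,                          # gap in y
                          F[i][j - 1] - g)                          # gap in x
    # Traceback from the bottom-right cell recovers one optimal alignment
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


def match_mismatch(a, b):
    return 2 if a == b else -1   # hypothetical scores


print(needleman_wunsch("AC", "A", match_mismatch, g=1))  # (1, 'AC', 'A-')
```

The two nested loops fill (l1 + 1)(l2 + 1) cells, which is where the running time proportional to the product of the sequence lengths comes from.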



45
Pair-wise sequence alignment
  • T Y W I V
  • T - - L V
  • Running time proportional to the product of the sequence
    lengths
  • Time complexity: $O(l_1 \cdot l_2)$



46
Pair-wise sequence alignment
  • The algorithm for pairwise alignment can be
    generalized to multiple alignment of N sequences
  • Time complexity: $O(l_1 \cdot l_2 \cdots l_N)$
  • Not feasible in practice (running time too long!)
  • Heuristics necessary, i.e. fast algorithms that
    do not necessarily produce the mathematically best
    alignment
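The arithmetic behind "not feasible" is shown below. The table size grows as the product of the sequence lengths. In addition, each cell of an N-dimensional table has 2^N − 1 possible predecessor cells, one for every non-empty subset of sequences contributing a residue to the last column. The sequence lengths used are hypothetical.

```python
from math import prod


def dp_table_size(lengths: list[int]) -> int:
    """Order of magnitude of the DP table: product of the sequence lengths."""
    return prod(lengths)


def predecessors_per_cell(n_sequences: int) -> int:
    """Neighbouring cells examined per cell: all non-empty subsets of sequences."""
    return 2 ** n_sequences - 1


for n in (2, 5, 10):
    lengths = [300] * n   # hypothetical: n sequences of 300 residues each
    print(n, dp_table_size(lengths), predecessors_per_cell(n))
```

Two sequences of 300 residues need about 90,000 cells. Five such sequences already need about 2.4 × 10^12 cells, each with 31 predecessors.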



47
Progressive Alignment
  • Most popular approach to (global) multiple
    sequence alignment:
  • Progressive alignment
  • Since the mid-1980s: Feng/Doolittle,
    Higgins/Sharp, Taylor,

49
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WWRLNDKEGYVPRNLLGLYP
  • AVVIQDNSDIKVVPKAKIIRD
  • YAVESEAHPGSFQPVAALERIN
  • WLNYNETTGERGDFPGTYVEYIGRKKISP
  • Guide tree
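The guide tree fixes the order in which sequences and profiles are merged. This sketch assumes the tree is given as nested pairs of sequence labels (the labels s1 to s5 are hypothetical). It only lists the merge steps. The profile-profile alignment performed at each step is not shown.

```python
def merge_order(tree):
    """List the profile merges implied by a rooted binary guide tree.

    Leaves are sequence labels (str); internal nodes are (left, right) pairs.
    Children are merged before their parent (post-order traversal).
    """
    steps = []

    def visit(node):
        if isinstance(node, str):
            return [node]
        left, right = node
        a, b = visit(left), visit(right)
        steps.append((a, b))   # align profile a with profile b
        return a + b

    visit(tree)
    return steps


guide_tree = (("s1", "s2"), ("s3", ("s4", "s5")))
for a, b in merge_order(guide_tree):
    print(a, "+", b)
```

Because each merge keeps the gaps already placed in both profiles, an early merge decision is never revisited ("once a gap, always a gap").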

50
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASFQPVAALERIN
  • WLNYNEERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

51
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

52
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN-
  • WW--RLNDKEGYVPRNLLGLYP-
  • AVVIQDNSDIKVVP--KAKIIRD
  • YAVESEASVQ--PVAALERIN------
  • WLN-YNEERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

53
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN--------
  • WW--RLNDKEGYVPRNLLGLYP--------
  • AVVIQDNSDIKVVP--KAKIIRD-------
  • YAVESEA---SVQ--PVAALERIN------
  • WLN-YNE---ERGDFPGTYVEYIGRKKISP
  • Profile alignment, once a gap - always a gap

54
Progressive Alignment
  • WCEAQTKNGQGWVPSNYITPVN--------
  • WW--RLNDKEGYVPRNLLGLYP--------
  • AVVIQDNSDIKVVP--KAKIIRD-------
  • YAVESEA---SVQ--PVAALERIN------
  • WLN-YNE---ERGDFPGTYVEYIGRKKISP
  • Most important implementation: CLUSTAL W

55
Progressive Alignment
  • CLUSTAL W: Thompson et al., 1994 (17,000
    citations)
  • Pairwise distances defined as 1 minus the fraction of identical residues
  • Calculate an unrooted tree with Neighbor Joining
  • Define the root at the central position of the tree
  • Define sequence weights based on the tree
  • Gap penalties calculated based on various
    parameters
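The first two CLUSTAL W steps are sketched below under simplifying assumptions. The identity distance is taken over the gap-free columns of an already computed pairwise alignment, which may differ from CLUSTAL W's exact conventions. The tree is built with the standard neighbor-joining Q-criterion (Saitou and Nei). Rooting and sequence weights are not shown. The 5-taxon distance matrix is hypothetical example data.

```python
def identity_distance(row1: str, row2: str) -> float:
    """1 - fraction of identical residues, over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(row1, row2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return 1.0 - sum(a == b for a, b in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Unrooted tree by neighbor joining.

    Returns the joins as (node1, node2, branch1, branch2); the last entry
    connects the final two nodes by one edge (branch2 = 0.0).
    """
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes, joins = list(names), []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, k] for k in nodes) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(f, g) - r_f - r_g
        f, g = min(((a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        bf = 0.5 * d[f, g] + (r[f] - r[g]) / (2 * (n - 2))   # branch length to f
        bg = d[f, g] - bf                                     # branch length to g
        u = f"({f},{g})"
        nodes = [k for k in nodes if k not in (f, g)]
        for k in nodes:
            d[u, k] = d[k, u] = 0.5 * (d[f, k] + d[g, k] - d[f, g])
        d[u, u] = 0.0
        nodes.append(u)
        joins.append((f, g, bf, bg))
    a, b = nodes
    joins.append((a, b, d[a, b], 0.0))
    return joins


names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
print(neighbor_joining(names, dist)[0])  # ('a', 'b', 2.0, 3.0)
```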

56
Tools for multiple sequence alignment
  • Problems with the traditional approach:
  • Results depend on the gap penalty
  • The heuristic guide tree determines the alignment,
    yet the alignment is then used for phylogeny reconstruction
  • The algorithm produces global alignments



57
Tools for multiple sequence alignment
  • Problems with traditional approach
  • But
  • Many sequence families share only local
    similarity
  • E.g. sequences share one conserved motif



58
The DIALIGN approach
  • Morgenstern, Dress, Werner (1996),
  • PNAS 93, 12098-12103
  • Combination of global and local methods
  • Assemble a multiple alignment from
  • gap-free local pairwise alignments
  • ("fragments")

59
The DIALIGN approach
  • atctaatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

65
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaagagtatcacccctgaattgaataa

66
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacccctgaattgaataa

67
The DIALIGN approach
  • atc------taatagttaaactcccccgtgcttag
  • cagtgcgtgtattactaacggttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

68
The DIALIGN approach
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa

69
The DIALIGN approach
Consistency!
  • atc------taatagttaaactcccccgtgc-ttag
  • cagtgcgtgtattactaac----------gg-ttcaatcgcg
  • caaa--gagtatcacc----------cctgaattgaataa
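For two sequences, the consistency condition is easy to state. Ordered by their start in sequence 1, fragments must neither overlap nor cross in either sequence. The sketch below checks this condition and accepts fragments greedily by decreasing weight, skipping any that would break consistency. The weights are hypothetical inputs, since DIALIGN derives its fragment weights differently. Consistency for more than two sequences is more involved and is not covered.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Gap-free local alignment (diagonal) between sequence 1 and sequence 2."""
    i: int          # start position in sequence 1 (0-based)
    j: int          # start position in sequence 2 (0-based)
    length: int
    weight: float   # hypothetical score supplied by the caller


def consistent(frags: list[Fragment]) -> bool:
    """True if all fragments fit into one alignment of the two sequences."""
    ordered = sorted(frags, key=lambda f: f.i)
    return all(b.i >= a.i + a.length and b.j >= a.j + a.length
               for a, b in zip(ordered, ordered[1:]))


def greedy_select(candidates: list[Fragment]) -> list[Fragment]:
    """Accept fragments by decreasing weight, skipping inconsistent ones."""
    chosen: list[Fragment] = []
    for f in sorted(candidates, key=lambda f: f.weight, reverse=True):
        if consistent(chosen + [f]):
            chosen.append(f)
    return sorted(chosen, key=lambda f: f.i)


f1 = Fragment(0, 0, 3, 5.0)
f2 = Fragment(5, 4, 2, 3.0)
f3 = Fragment(2, 6, 2, 4.0)   # overlaps f1 in sequence 1, so it is rejected
print(greedy_select([f1, f2, f3]))
```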

70
The DIALIGN approach
  • atc------TAATAGTTAaactccccCGTGC-TTag
  • cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
  • caaa--GAGTATCAcc----------CCTGaaTTGAATaa

71
More methods for multiple alignment
  • T-Coffee
  • PIMA
  • Muscle
  • Prrp
  • Mafft
  • ProbCons

72
Substitution matrices
  • Similarity score s(a,b) for amino acids a and b
    based on the probability $p_{a,b}$ of substitution a → b
  • Idea: it is more reasonable to align amino acids
    that are often replaced by each other!



73
Substitution matrices
  • Assumptions:
  • $p_{a,b}$ does not depend on the sequence position
  • Sequence positions are independent of each other
  • $p_{a,b} = p_{b,a}$ (symmetry!)



74
Substitution matrices
  • Compute score s(a,b) for the degree of similarity
    between amino acids a and b from
  • the probability $p_{a,b}$ of substitution
  • a → b (or b → a),
  • the frequency $q_a$ of a
  • Define
  • $s(a,b) = \log \frac{p_{a,b}}{q_a \, q_b}$
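This sketch computes the log-odds definition on a hypothetical two-letter alphabet. It assumes the joint probabilities $p_{a,b}$ are symmetric and sum to 1, and it obtains $q_a$ as the marginal $\sum_b p_{a,b}$. The natural logarithm is used here. The base only rescales the scores, and published matrices typically rescale and round to integers.

```python
import math


def log_odds_scores(p: dict) -> dict:
    """s(a,b) = log(p_ab / (q_a * q_b)) with background q_a = sum over b of p_ab."""
    q = {}
    for (a, _), pab in p.items():
        q[a] = q.get(a, 0.0) + pab
    return {(a, b): math.log(pab / (q[a] * q[b])) for (a, b), pab in p.items()}


# Hypothetical joint probabilities: identical pairs are more frequent than chance
p = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
scores = log_odds_scores(p)
print(scores[("A", "A")] > 0, scores[("A", "B")] < 0)  # True True
```

A pair that is observed more often than expected by chance ($p_{a,b} > q_a q_b$) gets a positive score. A rarer pair gets a negative score.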



75

Substitution matrices
76
Substitution matrices
  • To calculate $p_{a,b}$:
  • Consider alignments of related proteins and count
    substitutions
  • a → b (or b → a)



77
Substitution matrices
  • To calculate $p_{a,b}$:
  • Consider alignments of related proteins and count
    substitutions
  • a → b (or b → a)
  • ESWTS-RQWERYTIALMSDQRREVLYWIALY
  • ERWTSERQWERYTLALMS-QRREALYWIALY
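Counting pairs in the example alignment is straightforward. This sketch pools a → b and b → a by storing unordered pairs and skips gap columns. It turns the counts into relative frequencies as a naive estimate of $p_{a,b}$. How counts are symmetrized, weighted, and normalized differs between the PAM and BLOSUM procedures, so this is only the basic idea.

```python
from collections import Counter


def count_aligned_pairs(row1: str, row2: str) -> Counter:
    """Count unordered residue pairs over gap-free columns (a->b and b->a pooled)."""
    counts = Counter()
    for a, b in zip(row1, row2):
        if a != "-" and b != "-":
            counts[tuple(sorted((a, b)))] += 1
    return counts


row1 = "ESWTS-RQWERYTIALMSDQRREVLYWIALY"
row2 = "ERWTSERQWERYTLALMS-QRREALYWIALY"
counts = count_aligned_pairs(row1, row2)
total = sum(counts.values())
print(total)  # 29 gap-free columns
print({pair: n for pair, n in counts.items() if pair[0] != pair[1]})
# {('R', 'S'): 1, ('I', 'L'): 1, ('A', 'V'): 1}
p_hat = {pair: n / total for pair, n in counts.items()}  # naive estimate of p_{a,b}
```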



79
Substitution matrices
  • Problems involved:
  • Probability $p_{a,b}$ depends on the time t since the
    sequences separated in evolution: $p_{a,b} = p_{a,b}(t)$
  • Protein families contain multiple sequences: the
    phylogenetic tree must be known!
  • The alignment of the protein families must be known!
  • Multiple mutations at one sequence position



80
Substitution matrices
  • M. Dayhoff et al., Atlas of Protein Sequence and
    Structure, 1978
  • PAM matrices



81
Substitution matrices
  • Calculation of $p_{a,b}(t)$:
  • Consider multiple alignments of closely related
    protein families
  • Count occurrences of a and b at corresponding
    positions in the alignments, using the phylogenetic tree
  • Estimate $p_{a,b}(t)$ for small times t
  • Calculate conditional probabilities $p(a \mid b, t)$ for
    small t
  • Normalize to a distance of 1 PAM (percentage of
    accepted mutations: one accepted mutation per 100 residues)
  • Calculate $p(a \mid b, t)$ for larger evolutionary
    distances by matrix multiplication
  • Calculate $p_{a,b}(t)$ for larger evolutionary
    distances
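The extrapolation step can be illustrated with a hypothetical 1-PAM matrix for a two-letter alphabet. Each row sums to 1, and 1% of residues change per unit. The orientation (row x, column y = probability that x is observed as y after time t) is an assumption of this sketch. The matrix for t PAM units is the t-th power of the 1-PAM matrix. Joint probabilities then follow by multiplying with background frequencies $q$, which are also hypothetical here.

```python
def mat_mult(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(M, t):
    """M**t for integer t >= 1 by repeated multiplication."""
    result = M
    for _ in range(t - 1):
        result = mat_mult(result, M)
    return result


# Hypothetical 1-PAM matrix: rows sum to 1, 1% of residues change
M1 = [[0.99, 0.01],
      [0.01, 0.99]]
M250 = mat_power(M1, 250)                  # 250 PAM units, as in PAM250
q = [0.5, 0.5]                             # hypothetical background frequencies
joint = [[q[a] * M250[a][b] for b in range(2)] for a in range(2)]
print([round(v, 3) for v in M250[0]])
```

At large distances the conditional probabilities approach the background frequencies, so the matrix carries less and less information about the original residue.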



82
Substitution matrices
83
Substitution matrices
  • Alternative: BLOSUM matrices
  • S. Henikoff and J.G. Henikoff, PNAS, 1992
  • Basis: BLOCKS database, gap-free regions of
    multiple alignments
  • Cluster sequences if their percentage of identity
    is > L
  • Estimate $p_{a,b}$ directly (no extrapolation as in PAM)
  • Common values: L = 62, L = 50
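The clustering step can be sketched as single linkage over the rows of one gap-free block. Two rows end up in the same cluster if they are connected by a chain of pairs with percent identity > L, matching the threshold as stated on the slide. The block is hypothetical data. The subsequent weighting of counts between clusters is not shown.

```python
def percent_identity(s: str, t: str) -> float:
    """Percentage of identical positions between two equal-length gap-free rows."""
    return 100.0 * sum(a == b for a, b in zip(s, t)) / len(s)


def cluster_rows(rows: list[str], L: float) -> list[list[int]]:
    """Single-linkage clusters of row indices (union-find)."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if percent_identity(rows[i], rows[j]) > L:
                parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(rows)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


block = ["AAAA", "AAAB", "BBBB"]   # hypothetical gap-free block
print(cluster_rows(block, 62))      # [[0, 1], [2]]
```

A larger L merges fewer sequences, so the resulting matrix reflects substitutions between more closely related sequences.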

