Exon prediction by Genomic Sequence alignment Burkhard Morgenstern and Oliver Rinner




1

Burkhard Morgenstern, Institut für Mikrobiologie und Genetik
Molecular Evolution and Reconstruction of Phylogenetic Trees
Winter semester 2006/2007


2
Tools for multiple sequence alignment
  • s1 - R Y I - M R E A Q Y E S A Q
  • s2 - R C I V M R E A - Y E - - -
  • s3 - Y - I - M Q E V Q Q E R - -
  • s4 W R Y I A M R E - Q Y E - - -
  • Alignment: hypothesis about sequence evolution
  • Search for most plausible scenario!
  • Estimate probabilities for individual
    evolutionary events: insertions/deletions,
    substitutions



3
Substitution matrices
  • All protein alignment programs depend on
    similarity scores s(a,b)
  • Similarity score s(a,b) for amino acids a and b
    is based on probability p_{a,b} of substitution
    a → b
  • Idea: it is more reasonable to align amino acids
    that are often replaced by each other!



4
Tools for multiple sequence alignment
Substitution matrix assigns score s(a,b) to every
pair of amino acids a and b
5
Substitution matrices
  • Compute similarity score s(a,b) for amino acids a
    and b
  • Probability p_{a,b} of substitution
    a → b (or b → a)
  • Frequency q_a of a
  • Define:
    s(a,b) = log( p_{a,b} / (q_a q_b) )
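A minimal sketch of the log-odds definition above, for illustration only. The two-letter alphabet, the probability values, and the choice of base-2 logarithms are assumptions; the slide does not fix a log base or give numbers.

```python
import math

def log_odds_score(p_ab, q_a, q_b, base=2.0):
    """Return s(a,b) = log(p_ab / (q_a * q_b)) in the given log base."""
    return math.log(p_ab / (q_a * q_b), base)

# Hypothetical toy values, chosen only so that the pair probabilities sum to 1.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}

scores = {
    pair: log_odds_score(p_ab, q[pair[0]], q[pair[1]])
    for pair, p_ab in p.items()
}

for pair, s in sorted(scores.items()):
    print(pair, round(s, 3))
```

Pairs that co-occur more often than expected by chance (p_{a,b} > q_a q_b) get a positive score, and pairs that co-occur less often get a negative one.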



6
Substitution matrices
  • Simplifying assumptions:
  • Evolution as random process: substitution a → b
    occurs with probability p_{a,b}(t) depending on
    time t in evolution since sequences originated
    from common ancestor
  • p_{a,b}(t) does not depend on sequence position
  • Sequence positions independent of each other
  • p_{a,b}(t) = p_{b,a}(t) (symmetry!)



7
Substitution matrices
  • For small values of t: p_{a,b}(t) grows linearly
    with time t for a ≠ b
  • For large values of t: multiple mutations must be
    accounted for



8
Substitution matrices
  • To calculate p_{a,b}:
  • Consider alignments of closely related proteins
    and count substitutions
    a → b (or b → a)



9
Substitution matrices
  • To calculate p_{a,b}:
  • Consider alignments of closely related proteins
    and count substitutions
    a → b (or b → a)
  • ESWTS-RQWERYTIALMSDQRREVLYWIALY
  • ERWTSERQWERYTLALMS-QRREALYWIALY
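The counting step can be sketched on the two aligned sequences shown above. The function name `count_substitutions` is hypothetical, and treating a → b and b → a as the same unordered pair is an assumption that follows the "(or b → a)" remark on the slide.

```python
from collections import Counter

def count_substitutions(seq1, seq2, gap="-"):
    """Count unordered mismatched residue pairs in a pairwise alignment.

    Columns containing a gap and columns with identical residues are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for x, y in zip(seq1, seq2):
        if x == gap or y == gap or x == y:
            continue
        counts[tuple(sorted((x, y)))] += 1
    return counts

s1 = "ESWTS-RQWERYTIALMSDQRREVLYWIALY"
s2 = "ERWTSERQWERYTLALMS-QRREALYWIALY"
subs = count_substitutions(s1, s2)
print(dict(subs))  # {('R', 'S'): 1, ('I', 'L'): 1, ('A', 'V'): 1}
```

Summed over many such alignments and divided by the total number of aligned pairs, these counts would give an estimate of p_{a,b}.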






11
Substitution matrices
  • Calculation of p_{a,b}(t):
  • Consider multiple alignments of closely related
    protein families
  • Count substitutions a → b (or b → a) in alignments
    based on phylogenetic tree
  • Estimate p_{a,b}(t) for small times: t = 1 PAM
    (Percentage of Accepted Mutations)
  • Calculate conditional probabilities p(a|b,t) for
    small t
  • Calculate p(a|b,t) for larger evolutionary
    distances by matrix multiplication
  • Calculate p_{a,b}(t) for larger evolutionary
    distances
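The extrapolation to larger distances can be sketched as repeated matrix multiplication. Everything concrete here is an assumption made for illustration: the three-letter alphabet, the 1-step matrix `M1`, the uniform frequencies `q`, and the helper names. Row b, column a of a matrix holds p(a|b,t).

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, t):
    """Return m multiplied by itself t times (identity for t = 0)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, m)
    return result

# Hypothetical conditional probabilities p(a|b, t=1) for a toy alphabet.
M1 = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
]
q = [1.0 / 3, 1.0 / 3, 1.0 / 3]  # assumed uniform residue frequencies

M100 = mat_pow(M1, 100)  # p(a|b, t=100)
# Joint probabilities p_{a,b}(t) = q_b * p(a|b,t)
joint100 = [[q[b] * M100[b][a] for a in range(3)] for b in range(3)]

print(round(M100[0][1], 4))
```

In this toy setting the off-diagonal entry after 100 steps stays well below 100 × 0.01, the value a linear extrapolation would give, because repeated and reverse substitutions at the same position are accounted for by the multiplication.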



12
Substitution matrices
13
Substitution matrices
  • Alternative: BLOSUM matrices
  • S. Henikoff and J.G. Henikoff, PNAS, 1992
  • Basis: BLOCKS database, gap-free regions of
    multiple alignments.
  • Cluster of sequences if percentage of similarity
    > L
  • Estimate p_{a,b}(t) directly.
  • Default values: L = 62, L = 50
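The clustering step can be sketched as follows, assuming gap-free sequences of equal length as in a block. The single-linkage rule, the use of percent identity as the similarity measure, the function names, and the toy sequences are all assumptions; the slide only states that sequences are clustered when their similarity exceeds L.

```python
def percent_identity(s1, s2):
    """Percentage of identical positions between two equal-length sequences."""
    matches = sum(1 for x, y in zip(s1, s2) if x == y)
    return 100.0 * matches / len(s1)

def cluster_sequences(seqs, L):
    """Group sequences so that any pair with identity > L ends up linked."""
    clusters = []
    for seq in seqs:
        linked, unlinked = [], []
        for c in clusters:
            if any(percent_identity(seq, member) > L for member in c):
                linked.append(c)
            else:
                unlinked.append(c)
        # Merge the new sequence with every cluster it links to.
        merged = [seq] + [member for c in linked for member in c]
        clusters = unlinked + [merged]
    return clusters

# Hypothetical gap-free block of three sequences.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
clusters = cluster_sequences(block, L=62)
print(clusters)
```

With L = 62 the first two sequences (90% identical) fall into one cluster and the third stays alone. How clusters are weighted when substitutions are counted is not covered by the slide and is left out here.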

