Biological Sequence Alignment - PowerPoint PPT Presentation

Provided by: danielm47

Transcript and Presenter's Notes


1
Biological Sequence Alignment
  • Objectives
  • Terminology surrounding sequence alignment
  • Simple Edit Distance Dynamic Programming Algorithm

2
Biological Sequence Alignment
  • Materials adapted from many sources
  • Gerton Lunter
  • Paul Higgs, McMaster Univ.
  • McClure, Montana State Univ.
  • Aoife McLysaght
  • For your reference
  • http://etutorials.org/Misc/blast/PartIITheory/Chapter3.SequenceAlignment/
  • http://lectures.molgen.mpg.de/Alg/
  • A good on-line text
  • Has an applet (a program-based web page) that
    illustrates the algorithms, but I like the
    following applet better:
  • http://bibiserv.techfak.uni-bielefeld.de/media/seqanalysis/align-applet.html
  • It single-steps through the algorithms more
    clearly.
  • This applet is also associated with an on-line text.
    It too is good, but it is very expansive.

3
Progression of Algorithms
  • The basic algorithm:
  • Edit distance
  • Origins: how many typewriter keystrokes are needed to fix a mistake
  • // (minimum)
  • Sequence of refinements:
  • Global alignment: Needleman-Wunsch (1970)
  • Weighting substitutions (Sellers, 1974), gaps
  • // evolutionary model
  • Local alignment: Smith-Waterman (1981)

4
Variations on Edit Distance
5
Differences (confusion)
  • 1. Allowable edits
  • e.g. S(Pet, Pep) = 1
  • substitute T with P
  • one operation
  • or S(Pet, Pep) = 2
  • delete T, insert P
  • two operations

6
Edit Distance - no substitutions
  • Pairs: (peter, piper) and (peter, pepper)
  • (peter, piper):
  • delete e, insert i
  • delete t, insert p
  • S(peter, piper) = 4
  • (peter, pepper):
  • delete t, insert p
  • insert p
  • S(peter, pepper) = 3

7
Edit Distance - substitutions
  • (peter, piper):
  • S(peter, piper) = 2
  • substitute e/i
  • substitute t/p
  • (peter, pepper):
  • S(peter, pepper) = 2
  • substitute t/p
  • insert p
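
The two counting conventions on the preceding slides can be checked with a small table-based sketch. The function name `edit_distance` and the `allow_substitution` flag are illustrative choices, not from the slides; every edit is assumed to cost 1.

```python
def edit_distance(v, w, allow_substitution=True):
    """Minimum number of unit-cost edits that turn v into w."""
    n, m = len(v), len(w)
    # s[i][j] = distance between the first i chars of v and the first j of w
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i          # delete all i characters
    for j in range(1, m + 1):
        s[0][j] = j          # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(s[i - 1][j], s[i][j - 1]) + 1   # delete or insert
            if v[i - 1] == w[j - 1]:
                best = min(best, s[i - 1][j - 1])      # match, free
            elif allow_substitution:
                best = min(best, s[i - 1][j - 1] + 1)  # substitute
            s[i][j] = best
    return s[n][m]


for pair in [("pet", "pep"), ("peter", "piper"), ("peter", "pepper")]:
    print(pair,
          "no substitutions:", edit_distance(*pair, allow_substitution=False),
          "with substitutions:", edit_distance(*pair))
```

Without substitutions, a mismatch must be paid for as a delete plus an insert, which is why the counts differ between the two conventions.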

8
Weighting
  • Should substitutions be cheaper than insertions?
  • substitution = 1, insertion = 1.2
  • S(peter, piper) = 2, S(peter, pepper) = 2.2
  • Should vowel substitutions (0.7) be cheaper than
    consonant substitutions (1.0)?
  • S(peter, piper) = 1.7 (e/i costs 0.7, t/p costs 1.0)
  • S(peter, pepper) = 2.2 (t/p costs 1.0, insert p costs 1.2;
    no vowel is substituted)
  • The edits, as before:
  • (peter, piper): substitute e/i, substitute t/p
  • (peter, pepper): substitute t/p, insert p
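
A weighted variant can be sketched by swapping the unit costs for a cost function. The weights below follow the slide (vowel-for-vowel substitution 0.7, other substitutions 1.0, insertion 1.2); treating deletion as 1.2 and vowel-for-consonant substitution as 1.0 are assumptions, and the names `substitution_cost`, `INDEL_COST` and `weighted_edit_distance` are illustrative.

```python
VOWELS = set("aeiou")
INDEL_COST = 1.2  # insertion cost from the slide; deletion assumed equal


def substitution_cost(x, y):
    """0 for a match, 0.7 for vowel-for-vowel, 1.0 otherwise (assumed)."""
    if x == y:
        return 0.0
    if x in VOWELS and y in VOWELS:
        return 0.7
    return 1.0


def weighted_edit_distance(v, w):
    n, m = len(v), len(w)
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + INDEL_COST
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + INDEL_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = min(
                s[i - 1][j - 1] + substitution_cost(v[i - 1], w[j - 1]),
                s[i][j - 1] + INDEL_COST,   # insert w[j-1]
                s[i - 1][j] + INDEL_COST,   # delete v[i-1]
            )
    return s[n][m]


print(round(weighted_edit_distance("peter", "piper"), 2))
print(round(weighted_edit_distance("peter", "pepper"), 2))
```

Only the cost terms change; the table and the three-way minimum are the same as in the unweighted case.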

9
Variations on Edit Distance
10
Goal: Model Biology (Evolution)
  • Point mutations:
  • substitution
  • insertion
  • deletion
  • Impact relative to position in a codon
  • Large-scale mutation

11
To date, biological sequence analysis --> local
alignment
  • local alignment
  • weighted point substitution
  • no penalty for boundary problem
  • gaps associated with a penalty
  • similarity weighting, not distance
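
The properties listed above can be seen in a minimal Smith-Waterman-style scoring sketch. The score values (match +2, mismatch -1, gap -1) and the linear gap penalty are assumptions chosen for illustration, not values from the slides, and `local_alignment_score` is a hypothetical name.

```python
def local_alignment_score(v, w, match=2, mismatch=-1, gap=-1):
    """Best similarity score over all pairs of substrings of v and w."""
    n, m = len(v), len(w)
    # Row 0 and column 0 stay at 0: no penalty at the boundaries.
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            h[i][j] = max(
                0,                        # start a fresh local alignment
                h[i - 1][j - 1] + sub,    # weighted substitution or match
                h[i][j - 1] + gap,        # gap in v
                h[i - 1][j] + gap,        # gap in w
            )
            best = max(best, h[i][j])
    return best


print(local_alignment_score("xxpetexx", "yypeteyy"))  # shared core "pete"
```

Because this is a similarity rather than a distance, the recurrence takes a maximum, and the floor at 0 lets an alignment begin and end anywhere.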

12
An Alignment Illustration Method
  • Peter    Pet_er    Pe_ter
  • P  er    Pe  er    Pe  er
  • Piper    Pepper    Pepper
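
The middle row of each illustration repeats the characters on which the two aligned rows agree. A small sketch of that rendering, assuming `_` marks a gap and that both rows are already padded to the same length (`match_line` and `show_alignment` are illustrative names):

```python
def match_line(top, bottom):
    """Characters where the two aligned rows agree, blanks elsewhere."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        a if a == b and a != "_" else " "
        for a, b in zip(top, bottom)
    )


def show_alignment(top, bottom):
    print(top)
    print(match_line(top, bottom))
    print(bottom)
    print()


show_alignment("Peter", "Piper")
show_alignment("Pet_er", "Pepper")
show_alignment("Pe_ter", "Pepper")
```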

13
Notation
  • Let V, W be two strings.
  • Let $V = v_1, \ldots, v_n$ and $W = w_1, \ldots, w_m$, where
  • n is the length of V, m is the length of W
  • $v_i$ or $w_j$ represents the i-th or j-th character
  • // capital letters for the strings, small letters
    for the characters
  • Let $V_i$ represent a prefix of the string V, where
    $V_i = v_1, \ldots, v_i$
  • e.g.
  • V = betty,
  • $v_1$ = b, $v_2$ = e, $v_3$ = t, ..., $v_5$ = y
  • $V_2$ = be,
  • W = butter, $W_3$ = but
  • Let $S(V_i, W_j) = S_{i,j}$ be the edit distance for the
    strings $v_1, \ldots, v_i$ and $w_1, \ldots, w_j$
  • e.g.
  • $S(V_2, W_3)$ = S(be, but)

14
More Notation
  • Lecture slides are in standard mathematical
    notation
  • string subscripts start at 1 unless stated
    otherwise
  • matrices
  • Code uses coding conventions (indices start at 0)
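
The 1-based slide notation maps onto 0-based code with a shift of one. A sketch using the strings from the notation slide; the helper names `char_at` and `prefix` are hypothetical:

```python
V = "betty"
W = "butter"


def char_at(s, i):
    """The slide's 1-based character: v_i is s[i - 1] in code."""
    return s[i - 1]


def prefix(s, i):
    """The slide's prefix V_i: the first i characters; V_0 is empty."""
    return s[:i]


print(char_at(V, 1), char_at(V, 5))   # v_1 and v_5
print(prefix(V, 2), prefix(W, 3))     # V_2 and W_3
print(repr(prefix(V, 0)))             # the empty prefix
```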

15
More
  • Indel: a hybrid term (combining the words
    "insertion" and "deletion") used to describe a
    difference in sequence due to either an insertion
    or a deletion event; especially used when the
    evolutionary direction of the change is
    unspecified. http://www.yeastgenome.org/help/glossary.html
  • pet_er
  • pepper
  • // the _ aligned with p is an indel

16
Dynamic Programming Algorithms
  • Dynamic programming is a generic template
    constituting an entire class of algorithms.
  • // In the sense that divide and conquer
    constitutes a class of algorithms
  • Sequence alignment problems are mostly solved
    using dynamic programming.
  • // (but not BLAST)

17
Aside: Principle of Optimality
  • Suppose the solution to a problem, P, can be broken
    into two subproblems, P1 and P2, where combining
    the solutions to P1 and P2 constitutes a solution to P.
  • The principle of optimality holds if, whenever the
    solution to P is optimal, the solutions to
    P1 and P2 are optimal.
  • The principle of optimality holds for alignment
    problems.
  • In general, dynamic programming algorithms are
    applicable to problems where the principle of
    optimality holds.
  • The subproblems do not have to be of equal size.

18
  • S(pete, pipe)
  • S(p, p) = 0
  • S(pe, p) = 1
  • S(pe, pi) = 1
  • S(pet, pi) = 2
  • S(pet, pip) = 2
  • S(pete, pip) = 3
  • S(pete, pipe) = 2 // ? What of the principle
    of optimality?
  • Actually there are 3 ways to make the problem bigger,
    starting at S(p, p):
  • S(p, pi)
  • S(pe, pi)
  • S(pe, p)

19
But the principle of optimality says: pick the best
of the smaller problems
  • S(p, pi)
  • S(pe, pi)
  • S(pe, p)

20
Recursive Definition of Simple Edit Distance
  • for strings $V = v_1, \ldots, v_n$, $W = w_1, \ldots, w_m$,
    where $v_i$ or $w_j$ represents the i-th or j-th character
  • The cost of substituting a character x with y is
    represented as c(x, y)
  • indel: an insert or delete, represented by _
  • Recurrence (each term is used only when defined):

\[
S_{i,j} = \min
\begin{cases}
S_{i-1,j-1} + c(v_i, w_j) \\
S_{i,j-1} + c(\_, w_j) \\
S_{i-1,j} + c(v_i, \_)
\end{cases}
\qquad S_{0,0} = 0
\]

  • Costs:
  • if $v_i = w_j$ then $c(v_i, w_j) = 0$
  • if $v_i \neq w_j$ then $c(v_i, w_j) = 1$
  • $c(\_, w_j) = 1$
  • $c(v_i, \_) = 1$
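
The recurrence can be transcribed almost literally as a recursive function, which is useful for checking the definition on small inputs. This is a sketch only: the memoization via `functools.lru_cache` is an addition so that repeated subproblems are not recomputed, and the name `simple_edit_distance` is illustrative.

```python
from functools import lru_cache


def simple_edit_distance(v, w):
    def c(x, y):
        # substitution cost: 0 for a match, 1 otherwise
        return 0 if x == y else 1

    @lru_cache(maxsize=None)
    def s(i, j):
        if i == 0 and j == 0:
            return 0                                   # S(0,0) = 0
        candidates = []
        if i > 0 and j > 0:                            # "when defined"
            candidates.append(s(i - 1, j - 1) + c(v[i - 1], w[j - 1]))
        if j > 0:
            candidates.append(s(i, j - 1) + 1)         # c(_, w_j) = 1
        if i > 0:
            candidates.append(s(i - 1, j) + 1)         # c(v_i, _) = 1
        return min(candidates)

    return s(len(v), len(w))


print(simple_edit_distance("pete", "pipe"))
print(simple_edit_distance("PEPPER", "PETER"))
```

The `i > 0` and `j > 0` guards play the role of the "when defined" note: along the boundary only one of the three terms exists.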

21
We Don't Implement as a Recursive Function
  • V = PEPPER
  • W = PETER
  • Create a table so we can save the answers
    to the smaller problem instances
  • We will populate the table starting from the
    smallest problem, S("", "")
  • // "" is the empty string
  • Dirty trick:
  • In a program, index the table from 0.
  • Strings will still index starting from 1.
  • $v_0$ and $w_0$ will mean the empty string.
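
A table-filling sketch for this example, under the unit-cost assumptions of the recursive definition. Row 0 and column 0 hold the empty-prefix cases, so table indices line up with the 1-based string positions; `fill_table` is an illustrative name.

```python
def fill_table(v, w):
    """Return the full (len(v)+1) x (len(w)+1) edit-distance table."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + 1      # boundary: only deletions
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + 1      # boundary: only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = 0 if v[i - 1] == w[j - 1] else 1
            s[i][j] = min(
                s[i - 1][j - 1] + c,   # match or substitute
                s[i][j - 1] + 1,       # insert w[j-1]
                s[i - 1][j] + 1,       # delete v[i-1]
            )
    return s


table = fill_table("PEPPER", "PETER")
for row in table:
    print(" ".join(str(x) for x in row))
print("edit distance:", table[-1][-1])
```

Each cell is computed once from three already-filled neighbours, so the work is proportional to the table size rather than to the number of recursive call paths.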

22
Example - Simple Edit Distance: complete base-case
and boundary
\[
S_{i,j} = \min
\begin{cases}
S_{i-1,j-1} + c(v_i, w_j) \\
S_{i,j-1} + c(\_, w_j) \\
S_{i-1,j} + c(v_i, \_)
\end{cases}
\]

23
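Example - Simple Edit Distance: Example Cell $S_{2,2}$
\[
S_{i,j} = \min
\begin{cases}
S_{i-1,j-1} + c(v_i, w_j) \\
S_{i,j-1} + c(\_, w_j) \\
S_{i-1,j} + c(v_i, \_)
\end{cases}
\]
\[
S_{2,2} = \min
\begin{cases}
S_{1,1} + c(\text{E}, \text{E}) \\
S_{2,1} + c(\_, \text{E}) \\
S_{1,2} + c(\text{E}, \_)
\end{cases}
\]
\[
S(\text{PE}, \text{PE}) = \min
\begin{cases}
S(\text{P}, \text{P}) + c(\text{E}, \text{E}) = 0 + 0 \\
S(\text{PE}, \text{P}) + c(\_, \text{E}) = 1 + 1 \\
S(\text{P}, \text{PE}) + c(\text{E}, \_) = 1 + 1
\end{cases}
= 0
\]
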
24
Example - Simple Edit Distance
25
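Example - Simple Edit Distance: Cell example 2,
$S_{4,3}$ - what is special?
\[
S_{i,j} = \min
\begin{cases}
S_{i-1,j-1} + c(v_i, w_j) \\
S_{i,j-1} + c(\_, w_j) \\
S_{i-1,j} + c(v_i, \_)
\end{cases}
\]
\[
S_{4,3} = \min
\begin{cases}
S_{3,2} + c(\text{P}, \text{T}) \\
S_{4,2} + c(\_, \text{T}) \\
S_{3,3} + c(\text{P}, \_)
\end{cases}
\]
\[
S(\text{PEPP}, \text{PET}) = \min
\begin{cases}
S(\text{PEP}, \text{PE}) + c(\text{P}, \text{T}) = 1 + 1 \\
S(\text{PEPP}, \text{PE}) + c(\_, \text{T}) = 2 + 1 \\
S(\text{PEP}, \text{PET}) + c(\text{P}, \_) = 1 + 1
\end{cases}
= 2
\]
// a tie for the winner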
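
The two worked cells can be reproduced by listing the three candidate values for a cell and reporting every minimum. A sketch with illustrative names (`fill_table`, `candidates`, `winners`); "diagonal", "left" and "up" refer to the neighbouring table cells at (i-1, j-1), (i, j-1) and (i-1, j).

```python
def fill_table(v, w):
    n, m = len(v), len(w)
    s = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)]
         for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = 0 if v[i - 1] == w[j - 1] else 1
            s[i][j] = min(s[i - 1][j - 1] + c, s[i][j - 1] + 1, s[i - 1][j] + 1)
    return s


def candidates(s, v, w, i, j):
    """The three candidate values for cell (i, j), for i, j >= 1."""
    return {
        "diagonal": s[i - 1][j - 1] + (0 if v[i - 1] == w[j - 1] else 1),
        "left": s[i][j - 1] + 1,    # S(i, j-1) + c(_, w_j)
        "up": s[i - 1][j] + 1,      # S(i-1, j) + c(v_i, _)
    }


def winners(cands):
    """Minimum value and the names of all candidates that achieve it."""
    best = min(cands.values())
    return best, sorted(name for name, val in cands.items() if val == best)


V, W = "PEPPER", "PETER"
table = fill_table(V, W)
for cell in [(2, 2), (4, 3)]:
    cands = candidates(table, V, W, *cell)
    print(cell, cands, winners(cands))
```

Recording all winners, not just the minimum value, is what later allows more than one optimal set of edits to be recovered.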
26
Example - Simple Edit Distance: Cell example 2,
$S_{4,3}$ - there is a tie
27
Edit distance = 2; what are the edits?
  • Traceback step:
  • mark the winning cases
  • more than one if there are ties
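
A traceback sketch: starting from the bottom-right cell, step to whichever neighbour produced the current value and record the corresponding edit. This version follows only one winner per cell, preferring the diagonal, then left, then up; that tie-breaking order is an arbitrary assumption, and a different order can give a different, equally short set of edits. The name `align` is illustrative.

```python
def align(v, w):
    """Return (distance, aligned v, aligned w) with '_' marking indels."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                s[i][j] = i + j                      # boundary cells
            else:
                s[i][j] = min(
                    s[i - 1][j - 1] + (v[i - 1] != w[j - 1]),
                    s[i][j - 1] + 1,
                    s[i - 1][j] + 1,
                )
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + (v[i - 1] != w[j - 1]):
            top.append(v[i - 1]); bottom.append(w[j - 1])   # match/substitute
            i, j = i - 1, j - 1
        elif j > 0 and s[i][j] == s[i][j - 1] + 1:
            top.append("_"); bottom.append(w[j - 1])        # insertion
            j -= 1
        else:
            top.append(v[i - 1]); bottom.append("_")        # deletion
            i -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))


distance, top, bottom = align("PEPPER", "PETER")
print(distance)
print(top)
print(bottom)
```

Columns where the two printed rows differ are the edits; their count equals the edit distance.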