Provided by: danb195
Title: More Alignments


1
More Alignments
2
Outline
  • We have learned
  • The basic DP algorithm for alignment
  • The general gap penalty
  • The affine gap penalty
  • Now let us study several other alignment
    models
  • Local alignment (to compare substrings)
  • Fit alignment (to embed one sequence in
    another)
  • Linear space alignment (to reduce the space
    requirement)
  • These models are suitable in different
    circumstances

3
Mouse, Human, Chimpanzee
Mouse to Human
Chimpanzee to Human
4
Mouse vs. Human
Chromosome X of Mouse to Human
5
Local Alignment
  • Typically, when people do alignment, they're
    actually finding good local alignments.
  • Given two sequences S and T
  • Find subregions of S and T for which there's
    enough sequence similarity that they're likely to
    have come from the homologous model.
  • (That is, find the subregions with the greatest
    score.)
  • AATTAG-CCGATGAC
  • TGGAGGCTGATATA
  • This is done by exactly the same sort of
    procedure as before, with only one
    change.

6
Warm-up: suffix alignment
  • Suppose gaps are free only at the start of the
    alignment, so that skipping a prefix of either
    sequence costs nothing.
  • AATTAG-CCGAT
  • TGGAGGCTGAT
  • That is, we choose a suffix of each sequence and
    align the two suffixes together optimally.
  • Let $D_{i,j}$ denote the optimal suffix alignment
    score of S[1..i] and T[1..j].

7
Last column
  • Consider the last column of this optimal suffix
    alignment. Four cases arise:
  • Case 1: $s_i$ vs. $t_j$
  • Case 2: $s_i$ vs. a gap
  • Case 3: a gap vs. $t_j$
  • Case 4: an empty alignment
  • Case 4 is the only new case compared with the
    basic alignment.

8
DP algorithm for suffix alignment
$$
D_{i,j} = \max
\begin{cases}
D_{i-1,j-1} + f(s_i, t_j) \\
D_{i-1,j} + \text{indel} \\
D_{i,j-1} + \text{indel} \\
0
\end{cases}
$$
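The recurrence above can be sketched in a few lines of Python. This is only an illustration: the function name `suffix_alignment_matrix` and the scoring values (match 1, mismatch -1, indel -1) are assumptions, not taken from the slides.

```python
def suffix_alignment_matrix(s, t, match=1, mismatch=-1, indel=-1):
    """Fill D, where D[i][j] is the best score of aligning some suffix
    of s[:i] with some suffix of t[:j] (the empty alignment scores 0)."""
    m, n = len(s), len(t)
    # Row 0 and column 0 stay 0: only the empty alignment is possible there.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            f = match if s[i - 1] == t[j - 1] else mismatch  # f(s_i, t_j)
            D[i][j] = max(
                D[i - 1][j - 1] + f,   # case 1: s_i vs. t_j
                D[i - 1][j] + indel,   # case 2: s_i vs. a gap
                D[i][j - 1] + indel,   # case 3: a gap vs. t_j
                0,                     # case 4: empty alignment
            )
    return D
```

Because of the 0 option, no entry of the matrix is ever negative.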

How to backtrace?
How to do local alignment?
9
Local Alignment
  • The optimal local alignment score is
    $\max_{i,j} D_{i,j}$.
  • Suppose the optimal local alignment aligns
    S[i'..i] and T[j'..j] together; then it is the
    best suffix alignment of S[1..i] and T[1..j].
  • Moreover, any local alignment is a suffix
    alignment of two prefixes, and vice versa.
  • DP algorithm as before, but backtrack from the
    maximum element in the matrix.
  • When extended to affine gap penalties, this is the
    classical Smith-Waterman algorithm.
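A minimal sketch of local alignment with a linear gap penalty, backtracking from the maximum entry until the score reaches zero. The name `local_align` and the scoring values are illustrative assumptions; the affine-gap version mentioned on the slide would need extra matrices and is not shown.

```python
def local_align(s, t, match=1, mismatch=-1, indel=-1):
    """Return (score, aligned_s, aligned_t) for one best local alignment."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            f = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(0, D[i - 1][j - 1] + f,
                          D[i - 1][j] + indel, D[i][j - 1] + indel)
            if D[i][j] > best:            # remember the maximum entry
                best, bi, bj = D[i][j], i, j
    # Backtrack from the maximum entry until the score drops to zero.
    a, b = [], []
    i, j = bi, bj
    while D[i][j] > 0:
        f = match if s[i - 1] == t[j - 1] else mismatch
        if D[i][j] == D[i - 1][j - 1] + f:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif D[i][j] == D[i - 1][j] + indel:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


score, a, b = local_align("XXGATTACAYY", "ZZGATTACAWW")
print(score, a, b)
```

The loop never leaves the matrix, because row 0 and column 0 are all zeros and a positive entry always equals one of its three non-zero options.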

10
A Little History
  • The algorithm was first proposed by Temple Smith
    and Michael Waterman in 1981. It had been cited
    more than 3000 times as of 2008.
  • The global alignment algorithm is called the
    Needleman-Wunsch algorithm; it was published
    in 1970.

11
A quick 1-slide catch-up
  • We want to find local alignments.
  • We build the matrix whose entries are the score of
    the best local alignment (which might be the
    empty one) that ends at $s_i$ and $t_j$.
  • That works by traditional dynamic programming.
  • To incorporate more complicated gap penalties, we
    may need extra matrices, as for global alignment.
  • After O(nm) time, we have the matrix. Find the
    highest entry, and backtrack until the score is
    zero.
  • That's the optimal local alignment.

12
An important side note
  • It's very straightforward to find not just one
    local alignment of S and T, but many of them.
  • This is crucially important when several
    subregions of S and T are evolutionarily
    conserved.
  • One key example, which we will come to later in
    the term, is gene finding: the coding parts of
    genes are conserved, and the other parts, not so
    much.
  • But for now, think about that O(nm) runtime.

13
Is this runtime good enough?
  • No!
  • GenBank is $10^{11}$ letters long.
  • Filling in the DP matrix for two sequences of that
    length takes about $10^{22}$ steps.
  • That's no good.
  • Local alignments must be computed heuristically
    to avoid hideous runtimes.
  • This is done by filtration, as we will learn in
    the future.
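To put that figure in perspective, here is a rough estimate. It assumes a hypothetical machine that fills $10^{9}$ matrix cells per second; that rate is an illustrative assumption, not a number from the slides.

$$
10^{11} \times 10^{11} = 10^{22}\ \text{cells},
\qquad
\frac{10^{22}\ \text{cells}}{10^{9}\ \text{cells/s}} = 10^{13}\ \text{s} \approx 3 \times 10^{5}\ \text{years}.
$$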

14
Fit Alignment
  • Given sequences S and T, find a global alignment
    between S and a substring of T that maximizes the
    alignment score.
  • Deleting a prefix of T is free, and deleting a
    suffix of T is free.
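One way to sketch fit alignment is to change only the boundary conditions of the global DP: the first row is all zeros (skipping a prefix of T is free), and the answer is the maximum of the last row (skipping a suffix of T is free). The name `fit_align_score` and the scoring values below are assumptions for illustration.

```python
def fit_align_score(s, t, match=1, mismatch=-1, indel=-1):
    """Return (score, end): the best score of aligning all of s against a
    substring of t, and the position in t where that substring ends."""
    m, n = len(s), len(t)
    # D[0][j] = 0 for every j: dropping a prefix of t costs nothing.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * indel          # s must be fully used, so gaps cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            f = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + f,
                          D[i - 1][j] + indel,
                          D[i][j - 1] + indel)
    # Dropping a suffix of t is free: take the best entry of the last row.
    end = max(range(n + 1), key=lambda j: D[m][j])
    return D[m][end], end


print(fit_align_score("GATTACA", "CCCGATTACACCC"))
```

Backtracking from `D[m][end]` up to row 0 would recover the alignment itself, in the same way as for global alignment.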

[Figure: S aligned against a substring of T]
15
Linear Space Alignment
  • Why linear space?
  • Computer RAM used to be very expensive in the
    1980s.
  • Prediction: "The cost for 128 kilobytes of memory
    will fall below US$100 in the near future."
  • Creative Computing magazine, December 1981, page
    6.
  • Even today, keeping everything in the L2 cache or
    within one page of RAM will speed up the
    computation.
  • We have learned that linear space suffices if only
    the alignment score, rather than the alignment
    itself, is required.
  • We will learn how to get slightly more than the
    alignment score.
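As a reminder of the score-only case, a minimal sketch that keeps just two rows of the global DP matrix. The function name and scoring values are illustrative assumptions.

```python
def global_score_linear_space(s, t, match=1, mismatch=-1, indel=-1):
    """Global alignment score of s and t using O(len(t)) memory."""
    n = len(t)
    prev = [j * indel for j in range(n + 1)]    # row 0 of the DP matrix
    for i in range(1, len(s) + 1):
        cur = [i * indel] + [0] * n             # row i, column 0 filled in
        for j in range(1, n + 1):
            f = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + f,       # s_i vs. t_j
                         prev[j] + indel,       # s_i vs. a gap
                         cur[j - 1] + indel)    # a gap vs. t_j
        prev = cur                              # row i-1 is no longer needed
    return prev[n]


print(global_score_linear_space("CATTG", "ATTGA"))
```

Only the score survives: the discarded rows are exactly what a traceback would need, which is why recovering the alignment takes the extra idea on the following slides.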

16
Linear Space Alignment
  • We want to compute one more piece of information:
  • the j such that S[1..m/2] aligns with T[1..j] in
    the optimal alignment.
  • This is only slightly more than the alignment
    score.
  • Can this be done in linear space?
  • So what?

[Figure: the optimal alignment path crosses position m/2 of S at position j of T]
17
Compute the j
[Figure: DP matrix for CATTG vs. ATTGA]
18
How to find the midpoint?
  • Keep two columns of alignment scores.
  • Each column depends only on the previous column.
  • For columns past the middle of the DP matrix,
    keep track of which position in column m/2 the
    path to a point (i,j) in the DP matrix went
    through.
  • This is exactly equivalent to knowing, in the
    optimal alignment of $s_1 \ldots s_i$ and
    $t_1 \ldots t_j$, which prefix of T was aligned
    to the first m/2 letters of S.
  • And it only requires two columns of pointers, as
    well.
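The pointer idea can be sketched as follows, assuming a linear gap penalty. Each array below holds one slice of the DP matrix for a fixed position i of S, indexed by position j of T. The name `find_midpoint` and the scoring values are illustrative assumptions.

```python
def find_midpoint(s, t, match=1, mismatch=-1, indel=-1):
    """Return (score, j): the optimal global score, and a j such that some
    optimal alignment pairs s[:len(s)//2] with t[:j]. Uses O(len(t)) memory."""
    m, n = len(s), len(t)
    mid = m // 2
    prev = [j * indel for j in range(n + 1)]
    # ptr[j] = where the best path to cell (i, j) left slice `mid`.
    # Values computed before slice `mid` are placeholders and get overwritten.
    prev_ptr = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i * indel] + [0] * n
        cur_ptr = [0] * (n + 1)
        for j in range(1, n + 1):
            f = match if s[i - 1] == t[j - 1] else mismatch
            # Take the best predecessor and inherit its pointer.
            cur[j], cur_ptr[j] = max(
                (prev[j - 1] + f, prev_ptr[j - 1]),
                (prev[j] + indel, prev_ptr[j]),
                (cur[j - 1] + indel, cur_ptr[j - 1]),
                key=lambda cell: cell[0])
        if i == mid:
            cur_ptr = list(range(n + 1))   # on the middle slice, ptr[j] = j
        prev, prev_ptr = cur, cur_ptr
    return prev[n], prev_ptr[n]


print(find_midpoint("CATTG", "ATTGA"))
```

For CATTG vs. ATTGA the only optimal alignment is `CATTG-` over `-ATTGA`, so the first two letters `CA` are paired with the single letter `A` of T.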

19
Compute the j
[Figure: DP matrix for CATTG vs. ATTGA]
20
Algorithm
  • Use the previous idea to compute j such that the
    optimal alignment can be divided into two parts:
  • S[1..m/2] vs. T[1..j]
  • S[m/2+1..m] vs. T[j+1..n]
  • Then we use the same algorithm to recursively
    compute the optimal alignments of these two
    parts.
  • Return the concatenation of these two optimal
    alignments.
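Putting the pieces together, a sketch of the recursive algorithm for a linear gap penalty. The names `_midpoint` and `linear_space_align` and the scoring values are assumptions for illustration. The base cases (an empty sequence, or a single letter of S) are not spelled out on the slides and are one possible choice; the single-letter case assumes match >= mismatch.

```python
def _midpoint(s, t, match, mismatch, indel):
    """Return j such that an optimal alignment pairs s[:len(s)//2] with t[:j]."""
    mid, n = len(s) // 2, len(t)
    prev = [j * indel for j in range(n + 1)]
    prev_ptr = list(range(n + 1))
    for i in range(1, len(s) + 1):
        cur, cur_ptr = [i * indel] + [0] * n, [0] * (n + 1)
        for j in range(1, n + 1):
            f = match if s[i - 1] == t[j - 1] else mismatch
            cur[j], cur_ptr[j] = max(
                (prev[j - 1] + f, prev_ptr[j - 1]),
                (prev[j] + indel, prev_ptr[j]),
                (cur[j - 1] + indel, cur_ptr[j - 1]),
                key=lambda cell: cell[0])
        if i == mid:
            cur_ptr = list(range(n + 1))  # paths are tracked from here on
        prev, prev_ptr = cur, cur_ptr
    return prev_ptr[n]


def linear_space_align(s, t, match=1, mismatch=-1, indel=-1):
    """Return an optimal global alignment (aligned_s, aligned_t)."""
    if not s:
        return "-" * len(t), t
    if not t:
        return s, "-" * len(s)
    if len(s) == 1:
        # Pair the single letter with a matching letter of t if one exists,
        # otherwise with the first letter, unless two indels are cheaper.
        k = max(range(len(t)), key=lambda p: t[p] == s)
        pair = match if t[k] == s else mismatch
        if pair >= 2 * indel:
            return "-" * k + s + "-" * (len(t) - 1 - k), t
        return s + "-" * len(t), "-" + t
    mid = len(s) // 2
    j = _midpoint(s, t, match, mismatch, indel)
    a1, b1 = linear_space_align(s[:mid], t[:j], match, mismatch, indel)
    a2, b2 = linear_space_align(s[mid:], t[j:], match, mismatch, indel)
    return a1 + a2, b1 + b2


print(*linear_space_align("CATTG", "ATTGA"), sep="\n")
```

Because `len(s) >= 2` whenever the function recurses, both halves of S are strictly shorter than S, so the recursion terminates.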

21
Time Complexity
  • $T(m,n) = mn + T(m/2, j) + T(m/2, n-j) \le 2mn$
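The bound follows by induction. As a sketch, assume the bound $T(m', n') \le 2m'n'$ already holds for the two smaller subproblems:

$$
T(m,n) = mn + T\!\left(\tfrac{m}{2}, j\right) + T\!\left(\tfrac{m}{2}, n-j\right)
\le mn + 2\cdot\tfrac{m}{2}\cdot j + 2\cdot\tfrac{m}{2}\cdot(n-j)
= mn + mn = 2mn.
$$

Equivalently, the work per recursion level halves each time: $mn + \tfrac{mn}{2} + \tfrac{mn}{4} + \cdots \le 2mn$.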

22
Making it practical
  • This algorithm is a pain in the neck to implement
    for affine gap penalties.
  • But for the linear gap penalty, it is all right.
  • And we'll do this in our assignment.