Needleman Wunsch Sequence Alignment - PowerPoint PPT Presentation

Provided by: siddh1



1
Needleman Wunsch Sequence Alignment
  • The Needleman-Wunsch algorithm performs a global
    alignment of two sequences (called A and B here).
  • It is commonly used in bioinformatics to align
    protein or nucleotide sequences.
  • The algorithm was proposed in 1970 by Saul
    Needleman and Christian Wunsch in their paper "A
    general method applicable to the search for
    similarities in the amino acid sequence of two
    proteins", J. Mol. Biol. 48(3):443-53.
  • The Needleman-Wunsch algorithm is an example of
    dynamic programming, and was the first
    application of dynamic programming to biological
    sequence comparison.

2
Needleman Wunsch Sequence Alignment
  • Scores for aligned characters are specified by a
    similarity matrix; here, S(i,j) is the similarity
    of characters i and j. The algorithm uses a linear
    gap penalty d, a fixed score added for every
    character that is aligned with a gap.
  • For example, if the similarity matrix was
          A    G    C    T
      A  10   -1   -3   -4
      G  -1    7   -5   -3
      C  -3   -5    9    0
      T  -4   -3    0    8
  • Then the alignment
        AGACTAGTTAC
        CGA---GACGT
    with a gap penalty of d = -5 would have the
    following score:
  • $S(A,C) + S(G,G) + S(A,A) + 3d + S(G,G) + S(T,A) + S(T,C) + S(A,G) + S(C,T)$
  • $= -3 + 7 + 10 - 3 \times 5 + 7 - 4 + 0 - 1 + 0 = 1$
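  • The scoring rule above can be checked with a short
    Python sketch. It assumes the nucleotide matrix and
    d = -5 from this slide; the names SIM, GAP and
    alignment_score are illustrative, not from the
    original slides.

```python
# Similarity matrix from this slide: SIM[x][y] = S(x, y).
BASES = "AGCT"
SIM = {x: dict(zip(BASES, row)) for x, row in zip(BASES, [
    [10, -1, -3, -4],
    [-1, 7, -5, -3],
    [-3, -5, 9, 0],
    [-4, -3, 0, 8],
])}
GAP = -5  # linear gap penalty d


def alignment_score(top: str, bottom: str, sim=SIM, d=GAP) -> int:
    """Score an already-aligned pair of rows: S(x, y) per pair, d per gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += d          # each gap position costs d
        else:
            total += sim[x][y]  # aligned characters use the similarity matrix
    return total


print(alignment_score("AGACTAGTTAC", "CGA---GACGT"))  # 1
```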

3
Needleman Wunsch Sequence Alignment
  • To find the alignment with the highest score, a
    two-dimensional array (or matrix) is allocated.
    This matrix is often called the F matrix, and its
    (i,j)th entry is often denoted $F_{ij}$ (j along the
    horizontal axis and i along the vertical axis).
  • There is one column for each character in
    sequence A and one row for each character in
    sequence B, plus an extra row 0 and column 0
    that stand for the empty prefix.
  • Thus, if we are aligning sequences of lengths n
    and m, the matrix has (n+1)(m+1) cells, so the
    running time of the algorithm is O(nm) and the
    amount of memory used is O(nm).
  • As the algorithm progresses, $F_{ij}$ is
    assigned the optimal score for the alignment of
    the first j characters of A and the first i
    characters of B.
  • The principle of optimality is then applied as
    follows.
  • Basis: $F_{0j} = d \cdot j$, $F_{i0} = d \cdot i$
  • Recursion, based on the principle of optimality:
    $F_{ij} = \max\left(F_{i-1,j-1} + S(B_i, A_j),\ F_{i,j-1} + d,\ F_{i-1,j} + d\right)$
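  • A direct, top-down reading of the basis and the
    recursion can be sketched in Python. This is a
    minimal sketch, assuming the slide 2 matrix and
    d = -5; nw_score, SIM and GAP are hypothetical
    names. functools.lru_cache ensures each F(i, j)
    is computed once, so the work stays O(nm), but
    Python's recursion limit makes this form suitable
    only for short sequences.

```python
from functools import lru_cache

# Similarity matrix and gap penalty from slide 2 (assumed for this sketch).
BASES = "AGCT"
SIM = {x: dict(zip(BASES, row)) for x, row in zip(BASES, [
    [10, -1, -3, -4],
    [-1, 7, -5, -3],
    [-3, -5, 9, 0],
    [-4, -3, 0, 8],
])}
GAP = -5  # linear gap penalty d


def nw_score(a: str, b: str, sim=SIM, d=GAP) -> int:
    """Return F(len(b), len(a)), the optimal global alignment score."""

    @lru_cache(maxsize=None)
    def F(i: int, j: int) -> int:
        # F(i, j): best score for the first i characters of B vs the first j of A.
        if i == 0:
            return d * j  # basis F(0, j) = d*j
        if j == 0:
            return d * i  # basis F(i, 0) = d*i
        return max(
            F(i - 1, j - 1) + sim[b[i - 1]][a[j - 1]],  # B(i) aligned with A(j)
            F(i, j - 1) + d,  # A(j) aligned with a gap
            F(i - 1, j) + d,  # B(i) aligned with a gap
        )

    return F(len(b), len(a))


print(nw_score("AC", "A"))  # 5: A/A scores 10, then C against a gap costs 5
```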

4
Needleman Wunsch Sequence Alignment
  • The pseudo-code for the algorithm to compute the
    F matrix therefore looks like this (F is indexed
    from 0, with row 0 and column 0 standing for the
    empty prefix; A(j) and B(i) are the jth and ith
    characters of A and B, counting from 1):
      for i = 0 to length(B)
          F(i,0) <- d*i
      for j = 0 to length(A)
          F(0,j) <- d*j
      for i = 1 to length(B)
          for j = 1 to length(A)
              Choice1 <- F(i-1,j-1) + S(B(i), A(j))
              Choice2 <- F(i-1,j) + d
              Choice3 <- F(i,j-1) + d
              F(i,j) <- max(Choice1, Choice2, Choice3)
  • Once the F matrix is computed, the bottom-right
    cell F(length(B), length(A)) holds the maximum
    score over all global alignments of A and B.
  • To recover which alignment actually gives this
    score, start from the bottom-right cell and
    compare its value with the three possible
    sources (Choice1, Choice2, and Choice3 above) to
    see which one it came from.
  • If Choice1, then A(j) and B(i) are aligned.
  • If Choice2 (from the cell above), then B(i) is
    aligned with a gap.
  • If Choice3 (from the cell to the left), then
    A(j) is aligned with a gap.
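  • A Python transcription of the fill loop above, as
    a minimal sketch assuming the slide 2 matrix and
    d = -5 (fill_f_matrix, SIM and GAP are
    hypothetical names). Python strings are
    0-indexed, so the pseudo-code's B(i) becomes
    b[i - 1].

```python
# Similarity matrix and gap penalty from slide 2 (assumed for this sketch).
BASES = "AGCT"
SIM = {x: dict(zip(BASES, row)) for x, row in zip(BASES, [
    [10, -1, -3, -4],
    [-1, 7, -5, -3],
    [-3, -5, 9, 0],
    [-4, -3, 0, 8],
])}
GAP = -5  # linear gap penalty d


def fill_f_matrix(a: str, b: str, sim=SIM, d=GAP) -> list[list[int]]:
    """Fill F bottom-up: len(b)+1 rows (one per B prefix), len(a)+1 columns."""
    n, m = len(b), len(a)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = d * i  # column 0: B prefix against empty A
    for j in range(m + 1):
        F[0][j] = d * j  # row 0: A prefix against empty B
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choice1 = F[i - 1][j - 1] + sim[b[i - 1]][a[j - 1]]  # diagonal
            choice2 = F[i - 1][j] + d                           # from above
            choice3 = F[i][j - 1] + d                           # from the left
            F[i][j] = max(choice1, choice2, choice3)
    return F


for row in fill_f_matrix("AC", "A"):
    print(row)
# [0, -5, -10]
# [-5, 10, 5]    <- bottom-right cell 5 is the optimal score
```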

5
Needleman Wunsch Sequence Alignment
  • Traceback pseudo-code, starting from the
    bottom-right cell (+ prepends a character to a
    string):
      AlignmentA <- ""
      AlignmentB <- ""
      i <- length(B)
      j <- length(A)
      while (i > 0 AND j > 0)
          Score <- F(i,j)
          ScoreDiag <- F(i-1,j-1)
          ScoreLeft <- F(i,j-1)
          ScoreUp <- F(i-1,j)
          if (Score == ScoreDiag + S(B(i), A(j)))
              AlignmentA <- A(j) + AlignmentA
              AlignmentB <- B(i) + AlignmentB
              i <- i - 1
              j <- j - 1
          else if (Score == ScoreLeft + d)
              AlignmentA <- A(j) + AlignmentA
              AlignmentB <- "-" + AlignmentB
              j <- j - 1
          else if (Score == ScoreUp + d)
              AlignmentA <- "-" + AlignmentA
              AlignmentB <- B(i) + AlignmentB
              i <- i - 1
      while (j > 0)
          AlignmentA <- A(j) + AlignmentA
          AlignmentB <- "-" + AlignmentB
          j <- j - 1
      while (i > 0)
          AlignmentA <- "-" + AlignmentA
          AlignmentB <- B(i) + AlignmentB
          i <- i - 1
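  • Fill and traceback together, as a runnable Python
    sketch assuming the slide 2 matrix and d = -5
    (needleman_wunsch, SIM and GAP are hypothetical
    names). Two small departures from the
    pseudo-code: characters are collected in reverse
    and joined once at the end instead of being
    prepended to a string, and the last branch is a
    plain else, since one of the three sources must
    match.

```python
# Similarity matrix and gap penalty from slide 2 (assumed for this sketch).
BASES = "AGCT"
SIM = {x: dict(zip(BASES, row)) for x, row in zip(BASES, [
    [10, -1, -3, -4],
    [-1, 7, -5, -3],
    [-3, -5, 9, 0],
    [-4, -3, 0, 8],
])}
GAP = -5  # linear gap penalty d


def needleman_wunsch(a: str, b: str, sim=SIM, d=GAP) -> tuple[str, str, int]:
    """Return (aligned A, aligned B, optimal score) for a global alignment."""
    n, m = len(b), len(a)
    # Fill F as in the slide 4 pseudo-code (rows follow B, columns follow A).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = d * i
    for j in range(m + 1):
        F[0][j] = d * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sim[b[i - 1]][a[j - 1]],
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Traceback from the bottom-right cell; output is built back to front.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 and j > 0:
        score = F[i][j]
        if score == F[i - 1][j - 1] + sim[b[i - 1]][a[j - 1]]:
            out_a.append(a[j - 1])  # diagonal: A(j) aligned with B(i)
            out_b.append(b[i - 1])
            i, j = i - 1, j - 1
        elif score == F[i][j - 1] + d:
            out_a.append(a[j - 1])  # left: A(j) aligned with a gap
            out_b.append("-")
            j -= 1
        else:  # score == F[i - 1][j] + d
            out_a.append("-")  # up: B(i) aligned with a gap
            out_b.append(b[i - 1])
            i -= 1
    while j > 0:  # leftover characters of A
        out_a.append(a[j - 1])
        out_b.append("-")
        j -= 1
    while i > 0:  # leftover characters of B
        out_a.append("-")
        out_b.append(b[i - 1])
        i -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), F[n][m]


print(needleman_wunsch("AC", "A"))  # ('AC', 'A-', 5)
```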

6
Needleman Wunsch Sequence Alignment
  • Project Deliverables
  • Given the computation flow of the Needleman-Wunsch
    sequence alignment (NWSA) algorithm, architect a
    pipelined VHDL implementation in which each
    pipeline stage contains a single processing
    element (PE).
  • 1. Find the number and width of data elements
    that move between PEs.
  • 2. Assume that the testbench code includes the
    read/write memory.
  • a. Assume the A string has a fixed length and
    does not change.
  • b. B strings are sent from the memory to the
    PEs as inputs. Once a B string is consumed, the
    next B string is fed into the system from the
    memory.
  • c. The final score values are sent back to
    memory as outputs. Each score corresponds to a
    single B string.
  • d. Explicit instantiations of memory elements
    are not required: supply input values from the
    testbench, and read output values back into the
    testbench.
  • e. Each PE also stores the compass value (cv) to
    remember where its score came from (0 =
    diagonal, 1 = up, 2 = left).
  • 3. Describe your pipelined design implementation
    in your report.
  • 4. Give printouts of the VHDL code, including the
    testbench, in the report.
  • 5. Attach the waveform printouts to the report.
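  • A software reference ("golden") model can help
    check the VHDL simulation outputs. The Python
    sketch below is one possibility under stated
    assumptions: A is fixed, each B string yields one
    final score, and cv holds the compass value of
    every interior cell (0 = diagonal, 1 = up,
    2 = left). The tie-breaking order (diagonal, then
    left, then up, mirroring the traceback slide) and
    the names golden_model and score_bits are
    hypothetical choices, not part of the assignment.
    score_bits gives a conservative two's-complement
    width for F values, assuming d < 0, from the
    bounds d(n+m) <= F <= max(S) * min(n,m).

```python
from typing import Iterable, Iterator

# Similarity matrix and gap penalty from slide 2 (assumed for this sketch).
BASES = "AGCT"
SIM = {x: dict(zip(BASES, row)) for x, row in zip(BASES, [
    [10, -1, -3, -4],
    [-1, 7, -5, -3],
    [-3, -5, 9, 0],
    [-4, -3, 0, 8],
])}
GAP = -5  # linear gap penalty d
DIAG, UP, LEFT = 0, 1, 2  # compass encoding from item 2e


def golden_model(a: str, b_strings: Iterable[str], sim=SIM, d=GAP
                 ) -> Iterator[tuple[int, list[list[int]]]]:
    """Yield (final score, compass matrix) for each B string against fixed A."""
    m = len(a)
    for b in b_strings:
        prev = [d * j for j in range(m + 1)]  # F row 0 (basis)
        cv = []  # cv[i-1][j-1] is the compass value of cell (i, j)
        for i in range(1, len(b) + 1):
            cur = [d * i] + [0] * m  # F(i, 0) = d*i
            cv_row = []
            for j in range(1, m + 1):
                diag = prev[j - 1] + sim[b[i - 1]][a[j - 1]]
                left = cur[j - 1] + d
                up = prev[j] + d
                best = max(diag, left, up)
                cur[j] = best
                # Assumed tie-break: diagonal, then left, then up.
                cv_row.append(DIAG if diag == best else LEFT if left == best else UP)
            cv.append(cv_row)
            prev = cur  # only the previous row of F is kept
        yield prev[m], cv


def score_bits(len_a: int, len_b: int, sim=SIM, d=GAP) -> int:
    """Smallest two's-complement width holding every F(i, j), assuming d < 0."""
    lo = d * (len_a + len_b)  # all-gap alignment is always available
    max_s = max(max(row.values()) for row in sim.values())
    hi = max(0, max_s) * min(len_a, len_b)  # at most min(n, m) aligned pairs
    w = 1
    while not (-(1 << (w - 1)) <= lo and hi <= (1 << (w - 1)) - 1):
        w += 1
    return w


for score, cv in golden_model("AC", ["A", "C"]):
    print(score, cv)
# 5 [[0, 2]]
# 4 [[0, 0]]
print(score_bits(11, 8))  # 8: F ranges over [-95, 80]
```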