Space Efficient Alignment Algorithms and Affine Gap Penalties - PowerPoint PPT Presentation

About This Presentation
Title:

Space Efficient Alignment Algorithms and Affine Gap Penalties

Description:

Space Efficient Alignment Algorithms and Affine Gap Penalties. Dr. Nancy Warter-Perez ... Compute the score matrix(s) between the source (0,0) and (n, m/2) ... – PowerPoint PPT presentation

Slides: 21
Provided by: lanct9
Transcript and Presenter's Notes

Title: Space Efficient Alignment Algorithms and Affine Gap Penalties


1
Space Efficient Alignment Algorithms and Affine
Gap Penalties
  • Dr. Nancy Warter-Perez

2
Outline
  • Algorithm complexity
  • Complexity of dynamic programming alignment
    algorithms
  • Memory efficient algorithms
  • Hirschberg's divide and conquer algorithm
  • Affine gap penalty

3
Algorithm Complexity
  • Indicates the space and time (computational)
    efficiency of a program
  • Space complexity refers to how much memory is
    required to execute the algorithm
  • Time complexity refers to how long it will take
    to execute (compute) the algorithm
  • Generally written in Big-O notation
  • O represents the complexity (order)
  • n represents the size of the data set
  • Examples
  • O(n): order n, linear complexity
  • O(n^2): order n squared, quadratic complexity
  • Constants and lower-order terms are ignored
  • O(2n) = O(n) and O(n^2 + n + 1) = O(n^2)

4
Complexity of Dynamic Programming Algorithms for
Global/Local Alignment
  • Time complexity: O(mn)
  • For each cell in the score matrix, perform 3
    operations
  • Compute Up, Left, and Diagonal scores
  • O(3mn) = O(mn)
  • Space complexity: O(mn)
  • Size of scoring matrix = mn
  • Size of trace back matrix = mn
  • O(2mn) = O(mn)
  • Here m and n are the lengths of the sequences
    being aligned.
  • Since m ≈ n, this is O(n^2): quadratic complexity!
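
The full-matrix approach described on this slide can be sketched as follows. This is a minimal illustration, not the course's reference implementation: the function name global_align and the scoring values (match = 1, mismatch = -1, linear gap = -2) are assumptions chosen for the example. It keeps both an (n+1) x (m+1) score matrix and a trace back matrix of the same size, which is where the O(mn) space comes from.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with full score and trace back matrices.

    Scoring values are illustrative assumptions, not taken from the slides.
    Time O(mn), space O(mn).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    trace = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: a aligned to gaps
        score[i][0] = i * gap
        trace[i][0] = "U"
    for j in range(1, m + 1):          # first row: b aligned to gaps
        score[0][j] = j * gap
        trace[0][j] = "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The three operations per cell: Diagonal, Up, Left
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            trace[i][j] = "D" if best == diag else ("U" if best == up else "L")
    # Walk the trace back matrix from the sink (n, m) to the source (0, 0)
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        move = trace[i][j]
        if move == "D":
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif move == "U":
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Under these assumed scores, aligning "ACGT" with "AGT" places one gap opposite the C. Both matrices are allocated in full, so doubling both sequence lengths quadruples the memory used.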

5
Memory Requirements
  • For a sequence of 200-500 amino acids or
    nucleotides
  • O(n^2): 500^2 = 250,000 cells
  • If each score is stored as a 32-bit value (4 bytes),
    it requires 1,000,000 bytes to represent the
    scoring matrix!
  • If each trace back symbol is stored as a character
    (8-bit value), it requires 250,000 bytes to
    represent the trace back matrix
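
The arithmetic on this slide can be checked with a few lines of Python. The helper name matrix_bytes is hypothetical. Like the slide, the calculation ignores the extra boundary row and column of the matrices.

```python
def matrix_bytes(n, m, bytes_per_cell):
    """Bytes needed for an n x m matrix with a fixed cell size."""
    return n * m * bytes_per_cell

n = m = 500                               # upper end of the 200-500 residue range
score_bytes = matrix_bytes(n, m, 4)       # 32-bit scores
trace_bytes = matrix_bytes(n, m, 1)       # 8-bit trace back symbols
print(score_bytes, trace_bytes, score_bytes + trace_bytes)
```

The two matrices together come to 1,250,000 bytes for a single pair of 500-residue sequences.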

6
Simple Improvement for Scoring Matrix
  • In reality, the space complexity of the scoring
    matrix is only linear, i.e., O(2 min(m,n)) =
    O(min(m,n)), because each row depends only on the
    row before it
  • O(min(m,n)) ≈ O(n) for sequences of comparable
    lengths
  • On the order of 2,000 bytes per stored row
    (500 scores × 4 bytes), instead of 1 million
  • But the trace back matrix still has quadratic
    space complexity
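
A minimal sketch of the two-row idea, assuming the same illustrative scoring values as before (match = 1, mismatch = -1, gap = -2; the function name alignment_score is hypothetical). It returns only the optimal score, since no trace back information is kept.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score using two rows of the score matrix.

    Space is O(min(m, n)); the alignment itself is not recovered.
    """
    if len(b) > len(a):
        a, b = b, a                      # rows are as long as the shorter sequence
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                      # the older row is discarded
    return prev[-1]
```

Swapping the sequences is safe here because the assumed scoring scheme is symmetric.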

7
Hirschberg's Divide and Conquer Space Efficient
Algorithm
  • Compute the score matrix (s) between the source
    (0,0) and (n, m/2). Save the m/2 column of s.
    Compute the reverse score matrix (sreverse)
    between the sink (n, m) and (0, m/2). Save the
    m/2 column of sreverse.
  • Find the middle vertex (i, m/2) that maximizes
    s(i, m/2) + sreverse(n-i, m/2) over 0 ≤ i ≤ n
  • Recursively partition the problem into 2 subproblems
8
Pseudo Code of Space-Efficient Alignment Algorithm
  • Path (source, sink)
  • If source and sink are in consecutive columns
      • output the longest path from the source to the
        sink
  • Else
      • middle ← middle vertex between source and sink
      • Path (source, middle)
      • Path (middle, sink)
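
The pseudo code above might be turned into runnable Python along these lines. This is a sketch under stated assumptions, not the presenter's implementation: the names hirschberg, last_column, align_one_column, and score_alignment are hypothetical, and the scoring values (match = 1, mismatch = -1, linear gap = -2) are chosen only for the example. The base case handles a single column of b directly, playing the role of "source and sink in consecutive columns".

```python
def last_column(a, b, match=1, mismatch=-1, gap=-2):
    """Scores of aligning each prefix a[:i] with all of b, keeping one column."""
    col = [i * gap for i in range(len(a) + 1)]
    for ch in b:
        new = [col[0] + gap]
        for i in range(1, len(a) + 1):
            diag = col[i - 1] + (match if a[i - 1] == ch else mismatch)
            new.append(max(diag, col[i] + gap, new[i - 1] + gap))
        col = new
    return col

def align_one_column(a, ch, match, mismatch, gap):
    """Base case: align non-empty a against the single character ch."""
    k = max(range(len(a)), key=lambda t: match if a[t] == ch else mismatch)
    sub = match if a[k] == ch else mismatch
    if sub >= 2 * gap:                       # pairing ch with a[k] beats two gaps
        return a, "-" * k + ch + "-" * (len(a) - k - 1)
    return a + "-", "-" * len(a) + ch        # cheaper to leave ch unpaired

def hirschberg(a, b, match=1, mismatch=-1, gap=-2):
    """Return an optimal global alignment (aligned_a, aligned_b)."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(b) == 1:
        return align_one_column(a, b, match, mismatch, gap)
    n, mid = len(a), len(b) // 2
    forward = last_column(a, b[:mid], match, mismatch, gap)
    backward = last_column(a[::-1], b[mid:][::-1], match, mismatch, gap)
    # Middle vertex: the row where an optimal path crosses column mid
    i = max(range(n + 1), key=lambda k: forward[k] + backward[n - k])
    left = hirschberg(a[:i], b[:mid], match, mismatch, gap)    # Path(source, middle)
    right = hirschberg(a[i:], b[mid:], match, mismatch, gap)   # Path(middle, sink)
    return left[0] + right[0], left[1] + right[1]

def score_alignment(x, y, match=1, mismatch=-1, gap=-2):
    """Score a finished alignment column by column."""
    return sum(gap if "-" in (p, q) else (match if p == q else mismatch)
               for p, q in zip(x, y))
```

For example, hirschberg("ACGT", "AGT") yields an alignment whose score, computed with score_alignment, matches the optimal score from the full-matrix method. Only single columns of scores are held at any time; the string slices add linear overhead per recursion level.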

9
Complexity of Space-Efficient Alignment Algorithm
  • Time complexity
  • Equal to the sum of the areas of the rectangles
  • Area + ½ Area + ¼ Area + ... ≤ 2 Area
  • where Area = nm
  • O(2nm) = O(nm)
  • Quadratic time/computation complexity (same as
    before)
  • Space complexity
  • Need to save a column of s and sreverse for each
    computation (but can discard after computing
    middle)
  • O(min(n,m)): if m < n, switch the sequences (or
    save a row of s and sreverse instead)
  • Linear space complexity!!
  • Reference: http://www.csse.monash.edu.au/lloyd/tildeAlgDS/Dynamic/Hirsch/
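
The time bound above follows from a geometric series. As a short worked derivation (assuming, as on the slide, that the two subproblem rectangles at each recursion level together cover half the area of their parent):

```latex
\[
  nm + \tfrac{1}{2}\,nm + \tfrac{1}{4}\,nm + \cdots
  \;=\; nm \sum_{k=0}^{\infty} 2^{-k}
  \;=\; 2\,nm ,
\]
\[
  \text{so the total work is at most } 2nm = O(nm).
\]
```

The recursion stops after a finite number of levels, so the actual sum is strictly below 2nm.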

10
Gap Penalties
  • Gap penalties account for the introduction of a
    gap (in the evolutionary model, an insertion or
    deletion mutation) in both nucleotide and
    protein sequences, and therefore the penalty
    values should be proportional to the expected
    rate of such mutations.
  • http://en.wikipedia.org/wiki/Sequence_alignment#Assessment_of_significance

11
(No Transcript)
12
Source: http://www.apl.jhu.edu/przytyck/Lect03_2005.pdf
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
Project Verification - Use EMBOSS Pairwise
Alignment Tool: http://www.ebi.ac.uk/Tools/emboss/align/index.html
20
Project Verification - LALIGN: http://www.ch.embnet.org/software/LALIGN_form.html