# Space Efficient Alignment Algorithms and Affine Gap Penalties

1
Space Efficient Alignment Algorithms and Affine Gap Penalties
• Dr. Nancy Warter-Perez

2
Outline
• Algorithm complexity
• Complexity of dynamic programming alignment algorithms
• Memory efficient algorithms
• Hirschberg's Divide and Conquer algorithm
• Affine gap penalty

3
Algorithm Complexity
• Indicates the space and time (computational) efficiency of a program
  • Space complexity refers to how much memory is required to execute the algorithm
  • Time complexity refers to how long it will take to execute (compute) the algorithm
• Generally written in Big-O notation
  • O represents the complexity (order)
  • n represents the size of the data set
• Examples
  • O(n): order n, linear complexity
  • O(n^2): order n squared, quadratic complexity
• Constants and lower orders ignored
  • O(2n) = O(n) and O(n^2 + n + 1) = O(n^2)

4
Complexity of Dynamic Programming Algorithms for Global/Local Alignment
• Time complexity: O(mn)
  • For each cell in the score matrix, perform 3 operations
  • Compute Up, Left, and Diagonal scores
  • O(3mn) = O(mn)
• Space complexity: O(mn)
  • Size of scoring matrix = mn
  • Size of trace back matrix = mn
  • O(2mn) = O(mn)
• Where m and n are the lengths of the sequences being aligned
• Since m ≈ n, O(n^2): quadratic complexity!
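
The costs above can be made concrete with a minimal sketch of global alignment that keeps both full matrices. The function name `global_align` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, not values taken from the slides.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with full score and trace back matrices (O(mn) space)."""
    n, m = len(a), len(b)
    # Two (n+1) x (m+1) matrices: this is the O(2mn) = O(mn) space cost.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    trace = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
        trace[i][0] = "U"
    for j in range(1, m + 1):
        score[0][j] = j * gap
        trace[0][j] = "L"
    # Three candidate scores per cell: this is the O(3mn) = O(mn) time cost.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            trace[i][j] = "D" if best == diag else ("U" if best == up else "L")
    # Follow the trace back symbols from the sink (n, m) to the source (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = trace[i][j]
        if move == "D":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif move == "U":
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_align("AC", "A"))  # (-1, 'AC', 'A-')
```

Ties between the three moves are broken in the order diagonal, up, left; a different order can give a different but equally scoring alignment.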

5
Memory Requirements
• For a sequence of 200-500 amino acids or nucleotides
  • O(n^2): 500^2 = 250,000 cells
• If each score is stored as a 32-bit value (4 bytes), it requires 1,000,000 bytes to represent the scoring matrix!
• If each trace back symbol is stored as a character (8-bit value), it requires 250,000 bytes to represent the trace back matrix
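
The arithmetic on this slide can be checked with a few lines of Python. The helper name `matrix_bytes` is hypothetical, and the sketch follows the slide in counting n x m cells rather than (n+1) x (m+1).

```python
def matrix_bytes(n, m, bytes_per_cell):
    """Bytes needed for an n x m matrix with a fixed cell size."""
    return n * m * bytes_per_cell


n = m = 500                          # two sequences of 500 residues each
score_bytes = matrix_bytes(n, m, 4)  # 32-bit scores
trace_bytes = matrix_bytes(n, m, 1)  # 8-bit trace back symbols
print(score_bytes)  # 1000000
print(trace_bytes)  # 250000
```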

6
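Simple Improvement for Scoring Matrix
• In reality, the space complexity of the scoring matrix is only linear, i.e., O(2·min(m,n)) = O(min(m,n))
  • Only the previous and the current row (or column) are needed to compute the next one
• O(min(m,n)) ≈ O(n) for sequences of comparable lengths
  • About 2,000 bytes per stored row of 500 4-byte scores (instead of 1 million)
• But the trace back matrix still has quadratic space complexity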
7
Hirschberg's Divide and Conquer Space Efficient Algorithm
• Compute the score matrix (s) between the source (0,0) and (n, m/2). Save the m/2 column of s.
• Compute the reverse score matrix (s_reverse) between the sink (n, m) and (0, m/2). Save the m/2 column of s_reverse.
• Find the middle vertex (i, m/2) that maximizes s(i, m/2) + s_reverse(n-i, m/2) over 0 ≤ i ≤ n
• Recursively partition the problem into 2 subproblems
8
Pseudo Code of Space-Efficient Alignment Algorithm
• Path(source, sink)
  • If source and sink are in consecutive columns
    • output the longest path from the source to the sink
  • Else
    • middle ← middle vertex between source and sink
    • Path(source, middle)
    • Path(middle, sink)
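
One possible runnable reading of this pseudo code is sketched below. It is not the presenter's implementation: the function names, the scoring values, and the handling of the base case are assumptions. String slicing is used for readability; a strictly linear-space version would pass index ranges instead of copies.

```python
def last_column(a, b, match=1, mismatch=-1, gap=-2):
    """Return col where col[i] is the best global score of a[:i] versus all of b."""
    col = [i * gap for i in range(len(a) + 1)]
    for j in range(1, len(b) + 1):
        new = [j * gap] + [0] * len(a)
        for i in range(1, len(a) + 1):
            diag = col[i - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            new[i] = max(diag, col[i] + gap, new[i - 1] + gap)
        col = new
    return col


def hirschberg(a, b, match=1, mismatch=-1, gap=-2):
    """Return (aligned_a, aligned_b), an optimal global alignment of a and b."""
    scoring = (match, mismatch, gap)
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(b) == 1:
        # Base case: source and sink are in consecutive columns.
        subs = [match if ch == b else mismatch for ch in a]
        k = max(range(len(a)), key=subs.__getitem__)
        if subs[k] >= 2 * gap:
            # Pair b with a[k]; every other character of a faces a gap.
            return a, "-" * k + b + "-" * (len(a) - k - 1)
        # Pairing scores worse than two gaps, so b gets its own gapped column.
        return a + "-", "-" * len(a) + b
    # Else: find the middle vertex (i, mid) and solve the two subproblems.
    n, mid = len(a), len(b) // 2
    forward = last_column(a, b[:mid], *scoring)
    reverse = last_column(a[::-1], b[mid:][::-1], *scoring)
    i = max(range(n + 1), key=lambda r: forward[r] + reverse[n - r])
    left_a, left_b = hirschberg(a[:i], b[:mid], *scoring)    # Path(source, middle)
    right_a, right_b = hirschberg(a[i:], b[mid:], *scoring)  # Path(middle, sink)
    return left_a + right_a, left_b + right_b


print(hirschberg("AC", "A"))  # ('AC', 'A-')
```

When several alignments share the optimal score, the one returned depends on how ties are broken in `max`.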

9
Complexity of Space-Efficient Alignment Algorithm
• Time complexity
  • Equal to the sum of the areas of the rectangles
  • Area + ½ Area + ¼ Area + ... ≤ 2 Area
  • where Area = nm
  • O(2nm) = O(nm)
  • Quadratic time/computation complexity (same as before)
• Space complexity
  • Need to save a column of s and s_reverse for each computation (but can discard after computing middle)
  • O(min(n,m)): if m < n, switch the sequences (or save a row of s and s_reverse instead)
  • Linear space complexity!!
• Reference: http://www.csse.monash.edu.au/lloyd/tildeAlgDS/Dynamic/Hirsch/
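
The time bound follows from a geometric series. As a short derivation: splitting at the middle vertex (i, m/2) leaves two rectangles of areas i·m/2 and (n-i)·m/2, which sum to half the area of the parent rectangle, so each level of recursion does half the work of the previous one.

```latex
\[
\text{Area} + \tfrac{1}{2}\,\text{Area} + \tfrac{1}{4}\,\text{Area} + \cdots
\;\le\; \text{Area}\sum_{k=0}^{\infty} \frac{1}{2^{k}}
\;=\; 2\,\text{Area},
\qquad \text{Area} = nm .
\]
```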

10
Gap Penalties
• Gap penalties account for the introduction of a gap (on the evolutionary model, an insertion or deletion mutation) in both nucleotide and protein sequences, and therefore the penalty values should be proportional to the expected rate of such mutations.
• http://en.wikipedia.org/wiki/Sequence_alignment#Assessment_of_significance
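
As a hedged illustration of the affine gap penalty named in the title, the sketch below compares a linear penalty with an affine one. The function names, the parameter values, and the convention that the opening cost covers the first gapped position are all assumptions; tools differ on this convention.

```python
def linear_gap_penalty(length, per_residue=10):
    """Linear model: every gapped position costs the same."""
    return per_residue * length


def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """Affine model: opening a gap is costly, extending it is cheap.

    Assumed convention: gap_open is charged for the first gapped position
    and gap_extend for each additional position.
    """
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)


for length in (1, 5):
    print(length, linear_gap_penalty(length), affine_gap_penalty(length))
# 1 10 10
# 5 50 14
```

Under the affine model one long gap costs less than several short gaps of the same total length, which fits the view that a single insertion or deletion event can span many residues.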

12
Source: http://www.apl.jhu.edu/przytyck/Lect03_2005.pdf

19
Project Verification
• Use the EMBOSS Pairwise Alignment Tool: http://www.ebi.ac.uk/Tools/emboss/align/index.html

20
Project Verification
• LALIGN: http://www.ch.embnet.org/software/LALIGN_form.html