Space Efficient Alignment Algorithms and Affine Gap Penalties

- Dr. Nancy Warter-Perez

Outline

- Algorithm complexity
- Complexity of dynamic programming alignment algorithms
- Memory efficient algorithms
- Hirschberg's Divide and Conquer algorithm
- Affine gap penalty

Algorithm Complexity

- Indicates the space and time (computational) efficiency of a program
- Space complexity refers to how much memory is required to execute the algorithm
- Time complexity refers to how long it will take to execute (compute) the algorithm
- Generally written in Big-O notation
  - O represents the complexity (order)
  - n represents the size of the data set
- Examples
  - O(n): order n, linear complexity
  - O(n²): order n squared, quadratic complexity
- Constants and lower orders ignored
  - O(2n) = O(n) and O(n² + n + 1) = O(n²)

Complexity of Dynamic Programming Algorithms for Global/Local Alignment

- Time complexity O(mn)
  - For each cell in the score matrix, perform 3 operations
    - Compute Up, Left, and Diagonal scores
  - O(3mn) = O(mn)
- Space complexity O(mn)
  - Size of scoring matrix = mn
  - Size of trace back matrix = mn
  - O(2mn) = O(mn)
- Where m and n are the lengths of the sequences being aligned.
- Since m ≈ n, O(n²): quadratic complexity!
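
The quadratic costs above can be seen in a minimal sketch of full-matrix global alignment. The function name `global_align` and the scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not values taken from the slides.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Full-matrix global alignment; returns (score, aligned_a, aligned_b).

    Scoring values are hypothetical defaults chosen for illustration.
    """
    n, m = len(a), len(b)
    # Scoring matrix: (n+1) x (m+1) cells -> O(mn) space.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Trace back matrix: one symbol per cell -> another O(mn) space.
    trace = [[''] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], trace[i][0] = i * gap, 'U'
    for j in range(1, m + 1):
        score[0][j], trace[0][j] = j * gap, 'L'
    # Three operations per cell (Diagonal, Up, Left) -> O(mn) time.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            trace[i][j] = 'D' if best == diag else ('U' if best == up else 'L')
    # Follow the trace back symbols from the sink (n, m) to the source (0, 0).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = trace[i][j]
        if move == 'D':
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif move == 'U':
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

With these assumed scores, `global_align("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`: three matches and one gap.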

Memory Requirements

- For sequences of 200-500 amino acids or nucleotides
  - O(n²): 500² = 250,000 cells
- If each score is stored as a 32-bit value (4 bytes), the scoring matrix requires 1,000,000 bytes!
- If each trace back symbol is stored as a character (8-bit value), the trace back matrix requires 250,000 bytes
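
The arithmetic above can be reproduced in a few lines. The helper name `matrix_bytes` is hypothetical, and, like the slide, the sketch ignores the extra boundary row and column of the matrices.

```python
def matrix_bytes(n, m, bytes_per_cell):
    """Bytes needed for an n x m matrix with a fixed cell size."""
    return n * m * bytes_per_cell

n = m = 500                              # two sequences of 500 residues
score_bytes = matrix_bytes(n, m, 4)      # 32-bit scores
trace_bytes = matrix_bytes(n, m, 1)      # 8-bit trace back symbols
print(score_bytes, trace_bytes)
```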

Simple Improvement for Scoring Matrix

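- Only the previous and current rows (or columns) of the scoring matrix are needed to fill in the next one, so its space complexity is only linear, i.e., O(2 min(m,n)) = O(min(m,n))
- O(min(m,n)) ≈ O(n) for sequences of comparable lengths
  - 2,000 bytes per stored row of 500 scores (instead of 1 million for the full matrix)
- But the trace back matrix still has quadratic space complexity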
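
A minimal sketch of the two-row idea follows. It returns only the optimal score, since no trace back information is kept. The function name and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions made for illustration.

```python
def global_score_linear_space(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score using two rows of the scoring matrix."""
    # Put the shorter sequence along the row so each row has min(n, m) + 1 cells.
    # Swapping is safe here because this scoring scheme is symmetric.
    if len(b) > len(a):
        a, b = b, a
    prev = [j * gap for j in range(len(b) + 1)]      # row 0
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)              # first cell: all gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr[j] = max(diag, up, left)
        prev = curr                                  # the older row is discarded
    return prev[len(b)]
```

The score is the same as the full-matrix computation would give, but the alignment itself cannot be recovered from two rows alone, which is the gap that Hirschberg's algorithm fills.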

Hirschberg's Divide and Conquer Space Efficient Algorithm

- Compute the score matrix (s) between the source (0,0) and (n, m/2). Save the m/2 column of s.
- Compute the reverse score matrix (s_reverse) between the sink (n, m) and (0, m/2). Save the m/2 column of s_reverse.
- Find the middle vertex (i, m/2) that satisfies max over 0 ≤ i ≤ n of s(i, m/2) + s_reverse(n-i, m/2)
- Recursively partition the problem into 2 subproblems
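
The middle-vertex step can be sketched as follows, with v the sequence of length n along the rows and w the sequence of length m along the columns. The helper names `last_column` and `find_middle` and the scoring values are assumptions for illustration. The reverse scores are obtained by running the same routine on the reversed sequences, so `s_reverse[n - i]` plays the role of s_reverse(n-i, m/2) above.

```python
def last_column(v, w, match=1, mismatch=-1, gap=-2):
    """col[i] = best global score of v[:i] against all of w, in O(len(v)) space."""
    col = [i * gap for i in range(len(v) + 1)]       # column j = 0
    for j in range(1, len(w) + 1):
        new = [j * gap] + [0] * len(v)
        for i in range(1, len(v) + 1):
            diag = col[i - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            left = col[i] + gap       # w[j-1] aligned to a gap
            up = new[i - 1] + gap     # v[i-1] aligned to a gap
            new[i] = max(diag, left, up)
        col = new                     # only one previous column is kept
    return col


def find_middle(v, w):
    """Return (i, mid, score): row i where an optimal path crosses column mid."""
    n, mid = len(v), len(w) // 2
    s = last_column(v, w[:mid])                        # source -> column mid
    s_reverse = last_column(v[::-1], w[mid:][::-1])    # sink -> column mid
    totals = [s[i] + s_reverse[n - i] for i in range(n + 1)]
    best_i = max(range(n + 1), key=totals.__getitem__)
    return best_i, mid, totals[best_i]
```

The maximum of the summed columns equals the optimal global alignment score, so `find_middle("GATTACA", "GATTACA")` returns `(3, 3, 7)` under these assumed scores.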

Pseudo Code of Space-Efficient Alignment Algorithm

- Path (source, sink)
  - If source and sink are in consecutive columns
    - output the longest path from the source to the sink
  - Else
    - middle ← middle vertex between source and sink
    - Path (source, middle)
    - Path (middle, sink)
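
One possible runnable rendering of the Path pseudo code is sketched below. It is an illustration under assumed scoring values (match +1, mismatch -1, linear gap -2), not the presenter's implementation. The names `last_column`, `hirschberg`, and `alignment_score` are hypothetical, and the base case handles a single column of w directly.

```python
def last_column(v, w, match=1, mismatch=-1, gap=-2):
    """col[i] = best global score of v[:i] against all of w, in O(len(v)) space."""
    col = [i * gap for i in range(len(v) + 1)]
    for j in range(1, len(w) + 1):
        new = [j * gap] + [0] * len(v)
        for i in range(1, len(v) + 1):
            diag = col[i - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            new[i] = max(diag, col[i] + gap, new[i - 1] + gap)
        col = new
    return col


def hirschberg(v, w, match=1, mismatch=-1, gap=-2):
    """Return (aligned_v, aligned_w) using linear space."""
    n = len(v)
    if n == 0:
        return '-' * len(w), w
    if len(w) == 0:
        return v, '-' * n
    if len(w) == 1:
        # Base case: source and sink are in consecutive columns.
        sub = [match if c == w else mismatch for c in v]
        k = max(range(n), key=sub.__getitem__)
        if sub[k] >= 2 * gap:          # pairing w with v[k] beats two gaps
            return v, '-' * k + w + '-' * (n - k - 1)
        return v + '-', '-' * n + w
    # middle <- middle vertex between source and sink
    mid = len(w) // 2
    s = last_column(v, w[:mid], match, mismatch, gap)
    s_reverse = last_column(v[::-1], w[mid:][::-1], match, mismatch, gap)
    i = max(range(n + 1), key=lambda k: s[k] + s_reverse[n - k])
    # Path(source, middle) followed by Path(middle, sink)
    left_v, left_w = hirschberg(v[:i], w[:mid], match, mismatch, gap)
    right_v, right_w = hirschberg(v[i:], w[mid:], match, mismatch, gap)
    return left_v + right_v, left_w + right_w


def alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score a finished alignment column by column (for checking results)."""
    total = 0
    for p, q in zip(x, y):
        if p == '-' or q == '-':
            total += gap
        else:
            total += match if p == q else mismatch
    return total
```

For example, `hirschberg("ACGT", "AGT")` returns `("ACGT", "A-GT")`, which `alignment_score` rates at 1, the same score the quadratic-space algorithm finds.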

Complexity of Space-Efficient Alignment Algorithm

- Time complexity
  - Equal to the sum of the areas of the rectangles
  - Area + ½ Area + ¼ Area + ... ≤ 2 Area
  - where Area = nm
  - O(2nm) = O(nm)
  - Quadratic time/computation complexity (same as before)
- Space complexity
  - Need to save a column of s and s_reverse for each computation (but can discard after computing middle)
  - O(min(n,m)): if m < n, switch the sequences (or save a row of s and s_reverse instead)
  - Linear space complexity!!
- Reference: http://www.csse.monash.edu.au/lloyd/tildeAlgDS/Dynamic/Hirsch/
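
The halving in the time bound can be written out as a short worked derivation. It assumes the middle vertex sits at row i of column m/2 (with m treated as even for simplicity), so the two subproblems are the rectangles from the source to the middle and from the middle to the sink.

```latex
\begin{align*}
\underbrace{i \cdot \tfrac{m}{2}}_{\text{source to middle}}
  + \underbrace{(n - i) \cdot \tfrac{m}{2}}_{\text{middle to sink}}
  &= \tfrac{1}{2}\, nm = \tfrac{1}{2}\,\text{Area}, \\
\text{Area} + \tfrac{1}{2}\,\text{Area} + \tfrac{1}{4}\,\text{Area} + \cdots
  &= \text{Area} \sum_{k=0}^{\infty} 2^{-k} = 2\,\text{Area} = 2nm .
\end{align*}
```

Whatever row i the middle falls on, the two subproblems together cover half the area of their parent, so the total work over all levels of recursion is bounded by 2nm.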

Gap Penalties

- Gap penalties account for the introduction of a gap (on the evolutionary model, an insertion or deletion mutation) in both nucleotide and protein sequences, and therefore the penalty values should be proportional to the expected rate of such mutations.
- http://en.wikipedia.org/wiki/Sequence_alignment#Assessment_of_significance
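
To make the difference between a linear and an affine gap penalty concrete, here is a small sketch. The function names and the penalty values (2 per residue; 10 to open, 1 to extend) are hypothetical. The affine form shown, open + (length - 1) × extend, is one common convention; some tools instead charge open + length × extend.

```python
def linear_gap_cost(length, per_residue=2):
    """Linear model: every gapped position costs the same."""
    return per_residue * length


def affine_gap_cost(length, gap_open=10, gap_extend=1):
    """Affine model: opening a gap is expensive, extending it is cheap.

    Assumes the convention cost = gap_open + (length - 1) * gap_extend.
    """
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend


for k in (1, 2, 5):
    print(k, linear_gap_cost(k), affine_gap_cost(k))
```

Under an affine model one gap of length 5 costs 14, whereas five separate gaps of length 1 cost 50, so alignments with a few long gaps are preferred over those with many scattered short ones.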

Source: http://www.apl.jhu.edu/przytyck/Lect03_2005.pdf

Project Verification - Use EMBOSS Pairwise Alignment Tool: http://www.ebi.ac.uk/Tools/emboss/align/index.html

Project Verification - LALIGN: http://www.ch.embnet.org/software/LALIGN_form.html