# A Quick Tour


### A Quick Tour: Algorithms, Complexity, Data Structure

Laxmi Parida, Computational Biology Center, IBM T J Watson Research Center


1
A Quick Tour: Algorithms, Complexity, Data Structure
Laxmi Parida, Computational Biology Center, IBM T J Watson Research Center
2
Problem, Algorithm?
• Problem
  • parameters (values are unspecified)
  • solution (what is it)
• Algorithm
  • step-by-step procedure to solve the problem

3
Example Problem
Given n positive integers, reorder the numbers in ascending order.
An instance of the problem with n = 5 is 3, 6, 4, 2, 1.
4
Algorithm 1
Obtain all possible orderings of the n numbers.
Check each ordering to see if it satisfies the ascending property.
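Algorithm 1 can be sketched directly in Python. The function name `sort_by_enumeration` is an illustrative choice, not something from the slides; the code simply tries orderings until one is ascending, so it may examine up to n! of them.

```python
from itertools import permutations


def sort_by_enumeration(numbers):
    """Return the first ordering of `numbers` that is ascending."""
    for candidate in permutations(numbers):
        # An ordering is ascending if no element exceeds its successor.
        if all(candidate[k] <= candidate[k + 1]
               for k in range(len(candidate) - 1)):
            return list(candidate)


print(sort_by_enumeration([3, 6, 4, 2, 1]))  # [1, 2, 3, 4, 6]
```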
5
Algorithm 2
1. Let i be 1.
2. If i is equal to n, then exit.
3. Go through the numbers from position i to n and pick the minimum, m_i.
4. Move m_i to position i, the start of the unsorted part of the list.
5. Increment i by 1 and go to Step 2.

Critical question: how efficient is the algorithm?
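A minimal sketch of Algorithm 2 (selection sort) follows. It assumes the minimum of the unsorted part is placed at position i by swapping, which is one common way to realize the "move" step; the name `selection_sort` is illustrative.

```python
def selection_sort(numbers):
    """Sort by repeatedly selecting the minimum of the unsorted part."""
    items = list(numbers)  # work on a copy
    n = len(items)
    for i in range(n - 1):
        # Index of the smallest element among positions i..n-1.
        m = min(range(i, n), key=items.__getitem__)
        # Put it at the front of the unsorted part (assumed: by swapping).
        items[i], items[m] = items[m], items[i]
    return items


print(selection_sort([3, 6, 4, 2, 1]))  # [1, 2, 3, 4, 6]
```

The inner `min` scans up to n elements and runs n - 1 times, so the number of comparisons grows quadratically with n.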
6
Algorithm 3
Sort(n):
1. Split the elements into 2 sets of n/2 elements each.
2. Solve the problem for the two sets as Sort(n/2).
3. Merge the two solutions in linear time.

This is Merge Sort.
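The three steps of Algorithm 3 can be sketched as below. The helper names `merge` and `merge_sort` are illustrative; the split uses integer division, so the two halves differ by at most one element when n is odd.

```python
def merge(left, right):
    """Merge two ascending lists into one ascending list in linear time."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two lists still has elements left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(numbers):
    """Split, sort each half recursively, then merge."""
    if len(numbers) <= 1:
        return list(numbers)
    mid = len(numbers) // 2
    return merge(merge_sort(numbers[:mid]), merge_sort(numbers[mid:]))


print(merge_sort([3, 6, 4, 2, 1]))  # [1, 2, 3, 4, 6]
```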
7
Time Complexity (worst-case measure)
Described in terms of the size of the input, n.
Notation: $f(n) = O(g(n))$ if $f(n) \le c\,g(n)$ for some constant $c$ and all sufficiently large $n$.
• Algorithm 1: $n!$
• Algorithm 2: $O(n^2)$
• Algorithm 3: $O(n \log n)$, from $T(1) = O(1)$, $T(n) \le 2T(n/2) + O(n)$
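Unrolling the recurrence for Algorithm 3 shows where the n log n bound comes from. The constant $c$ is an assumed name for the constant hidden in the $O(n)$ merge cost, and $n$ is taken to be a power of 2 for simplicity:

$$
\begin{aligned}
T(n) &\le 2T(n/2) + cn \\
     &\le 4T(n/4) + 2cn \\
     &\le \dots \le 2^k\,T(n/2^k) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n)
\end{aligned}
$$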
8
Tractable Problems
A problem is tractable if it has a polynomial-time solution.
It is intractable if no polynomial-time solution exists.
9
Intractable Problems
Causes of intractability
• so difficult that exponential time needed to
discover the solution
• solution itself is exponential in size

10
How do we prove Intractability?
(NP-Complete Problems)
• Cook's theorem (1971): SATISFIABILITY (SAT) is NP-complete
• If a polynomial-time solution to a given problem P would imply that SAT can be solved in polynomial time, then P is intractable
• There exists a set of known NP-complete problems N_i
• If one of the N_i can be transformed to P in polynomial time, then P is intractable
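To make SAT concrete, here is a hedged sketch of the obvious exhaustive algorithm, which tries all 2^n truth assignments. The clause encoding (lists of non-zero integers, where `k` means variable k and `-k` its negation) and the name `brute_force_sat` are assumptions made for this illustration.

```python
from itertools import product


def brute_force_sat(num_vars, clauses):
    """Return a satisfying assignment as a tuple of booleans, or None."""
    for bits in product([False, True], repeat=num_vars):
        # A clause holds if at least one of its literals is true.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(brute_force_sat(3, [[1, 2], [-1, 2], [-2, 3]]))
```

The loop body is cheap, but the number of iterations doubles with every added variable.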

11
How do we deal with Intractable Problems?
• Intrinsic hardness
  • Approximation algorithms (provable approximation bounds, hardness of approximation...)
  • Heuristics
• Exponential size of output
  • Output-sensitive algorithms
  • Intelligent means of reducing output size

12
Solving a Problem/ Designing an Algorithm
Consult an algorithmicist!
• Abstract the essentials of the problem
• Is it tractable?
  • No. Is it approximable?
    • Yes. Design an efficient algorithm
    • No. Can the problem be reformulated without significant loss?
      • Yes. Repeat the process.
      • No. Explore heuristics or other ad-hoc methods
  • Yes. Design an efficient algorithm

13
The Mismatch Distance Problem I
Given strings s1 and s2, the mismatch distance of an alignment is the number of positions at which a character of one string is matched with a blank in the other. The objective is to find an alignment that minimizes this distance.
Let s1 = GTTCAGT, s2 = TTCGTT
GTTCAGT-
-TTC-GTT
mismatch distance = 3
14
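Mismatch Distance Problem (Recursive Formulation)

$$
D(i,j) =
\begin{cases}
0, & \text{if } i = j = 0 \\
D(i-1,\,j-1), & \text{if } s_1[i] = s_2[j] \\
\min\{D(i,\,j-1),\; D(i-1,\,j)\} + 1, & \text{otherwise}
\end{cases}
$$

On the boundary, $D(i,0) = i$ and $D(0,j) = j$. The required value is $D(n_1, n_2)$.
Running time of the direct recursion, with $n = n_1 + n_2$: $T(1) = O(1)$, $T(n) \le 2T(n-1) + O(1)$, so $T(n) = O(2^n)$.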
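The recursive formulation translates almost line for line into code. This is a sketch under two assumptions: strings are indexed from 1 as on the slide (hence `s1[i - 1]`), and an empty prefix is at distance equal to the length of the other prefix. The function name is illustrative.

```python
def mismatch_distance_recursive(s1, s2, i=None, j=None):
    """Problem I distance of s1[1..i] and s2[1..j] by direct recursion."""
    if i is None:
        i, j = len(s1), len(s2)
    if i == 0:
        return j  # the remaining characters of s2 all face blanks
    if j == 0:
        return i  # the remaining characters of s1 all face blanks
    if s1[i - 1] == s2[j - 1]:
        return mismatch_distance_recursive(s1, s2, i - 1, j - 1)
    return 1 + min(mismatch_distance_recursive(s1, s2, i, j - 1),
                   mismatch_distance_recursive(s1, s2, i - 1, j))


print(mismatch_distance_recursive("GTTCAGT", "TTCGTT"))  # 3
```

Each mismatch spawns two recursive calls on inputs that are only one character shorter, which is the source of the exponential worst case.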
15
Dynamic Programming (polynomial time)
• optimal substructure
• overlapping subproblems
• recursive formulation
• subproblem solutions in a "table" (programming)
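As an illustration of these four ingredients, the sketch below fills a table for the first mismatch distance problem (only blanks are penalized). The layout, with one row per character of s2 and one column per character of s1, and the name `mismatch_table` are assumptions chosen for this example.

```python
def mismatch_table(s1, s2):
    """Return table D where D[j][i] is the distance of s1[:i] and s2[:j]."""
    n1, n2 = len(s1), len(s2)
    D = [[0] * (n1 + 1) for _ in range(n2 + 1)]
    for i in range(n1 + 1):
        D[0][i] = i  # prefix of s1 against the empty string
    for j in range(n2 + 1):
        D[j][0] = j  # prefix of s2 against the empty string
    for j in range(1, n2 + 1):
        for i in range(1, n1 + 1):
            if s1[i - 1] == s2[j - 1]:
                D[j][i] = D[j - 1][i - 1]
            else:
                D[j][i] = 1 + min(D[j][i - 1], D[j - 1][i])
    return D


table = mismatch_table("GTTCAGT", "TTCGTT")
for row in table:
    print(row)
print("distance:", table[-1][-1])  # distance: 3
```

Every entry is computed once from at most three neighbours, so the work is proportional to the table size, (n1 + 1)(n2 + 1).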

16
Dynamic Programming
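|   | - | G | T | T | C | A | G | T |
|---|---|---|---|---|---|---|---|---|
| - | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| T | 1 | 2 | 1 | 2 | 3 | 4 | 5 | 6 |
| T | 2 | 3 | 2 | 1 | 2 | 3 | 4 | 5 |
| C | 3 | 4 | 3 | 2 | 1 | 2 | 3 | 4 |
| G | 4 | 3 | 4 | 3 | 2 | 3 | 2 | 3 |
| T | 5 | 4 | 3 | 4 | 3 | 4 | 3 | 2 |
| T | 6 | 5 | 4 | 3 | 4 | 5 | 4 | 3 |

GTTCAGT-
-TTC-GTT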
17
The Dynamic Table (memoization)
• obtain optimal value
• use table to track the optimal path
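A hedged sketch of both steps for the first mismatch distance problem: fill the table, then walk back from the last cell to recover one optimal alignment. The tie-breaking order in the walk and the name `align_with_blanks` are choices made for this example, so the alignment returned may differ from the one drawn on the slides while having the same distance.

```python
def align_with_blanks(s1, s2):
    """Return (aligned s1, aligned s2, distance), with '-' as the blank."""
    n1, n2 = len(s1), len(s2)
    D = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1 + 1):
        D[i][0] = i
    for j in range(n2 + 1):
        D[0][j] = j
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            if s1[i - 1] == s2[j - 1]:
                D[i][j] = D[i - 1][j - 1]
            else:
                D[i][j] = 1 + min(D[i][j - 1], D[i - 1][j])

    # Trace the optimal path backwards from D[n1][n2] to D[0][0].
    top, bottom = [], []
    i, j = n1, n2
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s1[i - 1] == s2[j - 1]:
            top.append(s1[i - 1])
            bottom.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or D[i][j] == D[i - 1][j] + 1):
            top.append(s1[i - 1])  # character of s1 faces a blank
            bottom.append("-")
            i -= 1
        else:
            top.append("-")        # character of s2 faces a blank
            bottom.append(s2[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), D[n1][n2]


a, b, dist = align_with_blanks("GTTCAGT", "TTCGTT")
print(a)
print(b)
print("distance:", dist)
```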

18
The Mismatch Distance Problem II
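Given strings s1 and s2, the mismatch distance of an alignment is the number of positions that do not match: a character facing a blank, or two different characters facing each other. The objective is to find an alignment that minimizes this distance.
Example: aligning TCGTTCG with TACG.
• the best alignment that allows mismatched characters has mismatch distance 4
• the best alignment that uses only blanks has mismatch distance 5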
19
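Mismatch Distance Problem II (Recursive Formulation)

$$
D(i,j) =
\begin{cases}
0, & \text{if } i = j = 0 \\
D(i-1,\,j-1), & \text{if } s_1[i] = s_2[j] \\
\min\{D(i-1,\,j-1),\; D(i,\,j-1),\; D(i-1,\,j)\} + 1, & \text{otherwise}
\end{cases}
$$

The required value is $D(n_1, n_2)$.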
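A minimal sketch of this second recurrence, which is the classic edit distance. It keeps only the previous table row, a space-saving choice made for this example rather than something stated on the slides; the name `edit_distance` is illustrative.

```python
def edit_distance(s1, s2):
    """Problem II distance: blanks and mismatched characters each cost 1."""
    prev = list(range(len(s2) + 1))  # row for the empty prefix of s1
    for i in range(1, len(s1) + 1):
        cur = [i]                    # s1[:i] against the empty string
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                cur.append(prev[j - 1])
            else:
                cur.append(1 + min(prev[j - 1],  # mismatch
                                   cur[j - 1],   # blank in s1
                                   prev[j]))     # blank in s2
        prev = cur
    return prev[-1]


print(edit_distance("TCGTTCG", "TACG"))  # 4
```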
20
Dynamic Programming
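|   | - | T | C | G | T | T | C | G |
|---|---|---|---|---|---|---|---|---|
| - | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| T | 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| A | 2 | 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| C | 3 | 2 | 1 | 2 | 3 | 4 | 4 | 5 |
| G | 4 | 3 | 2 | 1 | 2 | 3 | 4 | 4 |

Strings aligned: TCGTTCG and TACG (distance 4).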
21
Mismatch Distance Problems
Aligning TCGTTCG with TACG:
• Problem II (mismatched characters allowed): distance 4
• Problem I (blanks only): distance 5; this is equivalent to the Longest Common Subsequence problem
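The link to the Longest Common Subsequence (LCS) problem can be checked numerically: when only blanks are allowed, every character outside a longest common subsequence faces a blank, so the Problem I distance equals n1 + n2 - 2 * LCS. The sketch below assumes that identity and uses illustrative function names.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def blank_only_distance(a, b):
    """Problem I distance obtained from the LCS length."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(blank_only_distance("TCGTTCG", "TACG"))    # 5
print(blank_only_distance("GTTCAGT", "TTCGTT"))  # 3
```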
22
Mismatch Distance Problems
III. Distance: # of gaps - # of mismatches
IV. Obtain the k best solutions (k 2)
23
Data Structure
• Organization of data
• to answer specific queries efficiently
• to retrieve efficiently, leading to efficient
algorithms

24
A Quick Tour Continues: Data Structure
26
Graphs, Trees
• Entities with binary relationships
• Acyclic graphs
• rooted, unrooted

27
Example Property of (compact) Trees
• root, internal, leaf nodes
• # of internal nodes < # of leaf nodes

28
Suffix Tree (compact)
• rooted, directed tree
• leaves numbered bijectively from 1 to n
• the path from the root to leaf i represents the suffix s[i..n]
• each internal node has at least 2 outgoing edges

Given a string s, the suffix tree represents all the suffixes of s.
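The idea can be illustrated with an uncompacted suffix trie, a simpler and more space-hungry relative of the compact suffix tree described here. This is a sketch only: the nested-dictionary layout, the end marker `$` (assumed not to occur in the string) and the function names are all assumptions for this example.

```python
def build_suffix_trie(s, end="$"):
    """Insert every suffix of s; leaf `end` stores the 1-based start."""
    root = {}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node.setdefault(ch, {})
        node[end] = start + 1
    return root


def occurrences(trie, pattern, end="$"):
    """Sorted start positions of pattern, read off the leaves below it."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [node]
    while stack:
        cur = stack.pop()
        for key, child in cur.items():
            if key == end:
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_suffix_trie("GTTCGATT")
print(occurrences(trie, "TT"))  # [2, 7]
print(occurrences(trie, "G"))   # [1, 5]
```

Finding the node for a pattern takes time proportional to the pattern length, independent of the length of the string; the compact tree keeps this property while using linear space.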
29
Suffix Tree
Example: GTTCGATT
[Figure: compact suffix tree of GTTCGATT with leaves numbered 1 to 8; edge labels include G, T, ATT, CGATT and TTCGATT]
30
Patterns
Given the string GATCGATCGA, what are the patterns?
Maximal patterns:
• GATCGA (2 occurrences)
• GA (3 occurrences)
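The occurrence counts can be checked with a short sketch that counts overlapping matches. The name `count_occurrences` is illustrative, and the code only counts occurrences; it does not test maximality.

```python
def count_occurrences(text, pattern):
    """Number of positions where pattern starts in text, overlaps included."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))


s = "GATCGATCGA"
print(count_occurrences(s, "GATCGA"))  # 2
print(count_occurrences(s, "GA"))      # 3
```

Overlaps matter here: the two occurrences of GATCGA share the characters at positions 5 and 6, so the built-in `str.count`, which skips overlapping matches, would report only one.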
31
At most how many maximal non-unique patterns
exist in a string of length n?
32
Problem
Consider a genomic sequence s.
s1, s2, ..., sm are fragment compomers (mass spectrometry).
Can we compute the sequence of s?
33
One man's algorithm is another man's data
structure
Jon Bentley
34
Problem Formulation
Given a set X and subsets S1, S2, S3, ..., Sm of X, is there a permutation o of the elements of X such that the elements of each Si are consecutive in o?
35
PQ Trees
A tree with different kinds of nodes:
- P: the children are in any order
- Q: the children are in the fixed left-to-right or right-to-left order
- leaf nodes: elements of a set
36
PQ Tree (example)
Consistent sequences: ABCDE, BACDE, ABEDC, BAEDC, CDEAB, CDEBA, EDCAB, EDCBA
Is DACBE consistent?
[Figure: PQ tree with leaves A, B, C, D, E]
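The set of sequences consistent with a PQ tree can be enumerated mechanically. The sketch below assumes a tree shape that the transcript does not preserve: a P root whose children are a P node over A, B and a Q node over C, D, E. That assumed shape reproduces the eight sequences listed on this slide. The tuple encoding `(kind, children)` and the name `frontiers` are illustrative.

```python
from itertools import permutations, product


def frontiers(node):
    """Set of leaf sequences that the PQ tree rooted at node allows."""
    if isinstance(node, str):
        return {node}  # a leaf
    kind, children = node
    if kind == "P":
        orders = permutations(children)  # children in any order
    else:
        # Q node: the fixed order or its reversal
        orders = [tuple(children), tuple(reversed(children))]
    result = set()
    for order in orders:
        for parts in product(*(frontiers(child) for child in order)):
            result.add("".join(parts))
    return result


# Assumed tree shape (not recoverable from the transcript).
tree = ("P", [("P", ["A", "B"]), ("Q", ["C", "D", "E"])])
print(sorted(frontiers(tree)))
print("DACBE" in frontiers(tree))  # False
```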
37
Few Definitions
Least Common Ancestor (LCA) of nodes 1, 2, ..., k is the node p that is a common ancestor of all the nodes, such that there does not exist a common ancestor p' of which p is an ancestor.
- strips of Q
- whole P
Reachable Set (R(p)) is the collection of the leaf nodes that are reachable from node p.
38
Formulation as a PQ Tree Problem
Given a set X and subsets S1, S2, S3, ..., Sm of X, is there a permutation o of the elements of X such that the elements of each Si are consecutive in o?
Does there exist a PQ Tree such that S1, S2, S3, ..., Sm are consistent and, for each Si, R(LCA(Si)) = Si?
39
Example
Sets: {e,b,d,a}, {h,f,a}, {c,h,f,g}, {b,d,a,f}
What permutation of a, b, c, d, e, f, g, h gives the four sets as neighbors?
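For an instance this small, the question can be answered by exhaustive search, which also serves as a check on any answer obtained with a PQ tree. This is a brute-force sketch, not the PQ tree algorithm; it examines all 8! = 40320 permutations, and the function names are illustrative.

```python
from itertools import permutations


def is_consecutive(order, subset):
    """True if the members of subset occupy adjacent positions in order."""
    positions = [order.index(x) for x in subset]
    return max(positions) - min(positions) + 1 == len(subset)


def consistent_orders(elements, subsets):
    """All permutations of elements in which every subset is consecutive."""
    return ["".join(p) for p in permutations(elements)
            if all(is_consecutive(p, s) for s in subsets)]


subsets = ["ebda", "hfa", "chfg", "bdaf"]
orders = consistent_orders("abcdefgh", subsets)
print(len(orders))
print("ebdafhcg" in orders)  # True
```

The running time grows with n!, which is why a structure that represents all consistent orders compactly is attractive.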
40
Step 1
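Set added: {e,b,d,a}. Next set: {h,f,a}.
[Figure: PQ tree with leaves a, e, d, b]
41
Step 2
Set added: {h,f,a}. Next set: {c,h,f,g}.
[Figure: PQ tree with leaves a, e, f, h, d, b]
42
Step 3
Set added: {c,h,f,g}. Next set: {b,d,a,f}.
[Figure: PQ tree with leaves a, e, f, g, c, h, d, b]
43
Step 4
Set added: {b,d,a,f}. All four sets, {e,b,d,a}, {h,f,a}, {c,h,f,g} and {b,d,a,f}, are now processed.
[Figure: PQ tree with leaves f, a, h, e, g, c, d, b]
Resulting permutation: e b d a f h c g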
44
Observations (algorithm)
• only a constant number of levels affected
• linear in the size of the input

45
Properties of PQ Tree
• minimal PQ Tree (unique)
• Reduce a tree using the following:
  • if only one child, merge with parent
  • if k children of P node are P, merge the k nodes with parent
  • if k consecutive children of Q are Q, merge with parent
• This reduction gives a unique PQ Tree

46
Reverse PQ Tree Problem
Given o1, o2, ... om permutations of X, find the
minimal PQ Tree that is consistent with o1, o2,
... om