
Dynamic programming algorithms for all-pairs shortest path and longest common subsequences

- We will study a new technique: dynamic programming algorithms (typically for optimization problems).
- Ideas:
- Characterize the structure of an optimal solution.
- Recursively define the value of an optimal solution.
- Compute the value of an optimal solution in a bottom-up fashion (using a matrix to compute).
- Backtrack to construct an optimal solution from computed information.

Floyd-Warshall algorithm for shortest path

- Use a different dynamic-programming formulation to solve the all-pairs shortest-paths problem on a directed graph $G = (V, E)$.
- The resulting algorithm, known as the Floyd-Warshall algorithm, runs in $O(V^3)$ time.
- Negative-weight edges may be present,
- but we shall assume that there are no negative-weight cycles.

The structure of a shortest path

- We use a different characterization of the structure of a shortest path than we used in the matrix-multiplication-based all-pairs algorithms.
- The algorithm considers the intermediate vertices of a shortest path, where an intermediate vertex of a simple path $p = \langle v_1, v_2, \ldots, v_l \rangle$ is any vertex in $p$ other than $v_1$ or $v_l$, that is, any vertex in the set $\{v_2, v_3, \ldots, v_{l-1}\}$.

Continue

- Let the vertices of $G$ be $V = \{1, 2, \ldots, n\}$, and consider a subset $\{1, 2, \ldots, k\}$ of vertices for some $k$.
- For any pair of vertices $i, j \in V$, consider all paths from $i$ to $j$ whose intermediate vertices are all drawn from $\{1, 2, \ldots, k\}$, and let $p$ be a minimum-weight path from among them.
- The Floyd-Warshall algorithm exploits a relationship between path $p$ and shortest paths from $i$ to $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$.

Relationship

- The relationship depends on whether or not $k$ is an intermediate vertex of path $p$.
- If $k$ is not an intermediate vertex of path $p$, then all intermediate vertices of path $p$ are in the set $\{1, 2, \ldots, k-1\}$. Thus, a shortest path from vertex $i$ to vertex $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$ is also a shortest path from $i$ to $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k\}$.
- If $k$ is an intermediate vertex of path $p$, then we break $p$ down into $i \overset{p_1}{\leadsto} k \overset{p_2}{\leadsto} j$, as shown in Figure 2. $p_1$ is a shortest path from $i$ to $k$ with all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$, and likewise $p_2$ is a shortest path from $k$ to $j$ with all intermediate vertices in the same set.

Figure 2. Path $p$ is a shortest path from vertex $i$ to vertex $j$, and $k$ is the highest-numbered intermediate vertex of $p$. Path $p_1$, the portion of path $p$ from vertex $i$ to vertex $k$, has all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$. The same holds for path $p_2$ from vertex $k$ to vertex $j$.

A recursive solution to the all-pairs shortest paths problem

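- Let $d_{ij}^{(k)}$ be the weight of a shortest path from vertex $i$ to vertex $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k\}$. A recursive definition is given by
$$d_{ij}^{(k)} = \begin{cases} w_{ij} & \text{if } k = 0, \\ \min\left(d_{ij}^{(k-1)},\; d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right) & \text{if } k \ge 1. \end{cases}$$
- The matrix $D^{(n)} = (d_{ij}^{(n)})$ gives the final answer, $d_{ij}^{(n)}$ for all $i, j \in V$, because all intermediate vertices are in the set $\{1, 2, \ldots, n\}$.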

Computing the shortest-path weights bottom up

- FLOYD-WARSHALL(W)
- n ← rows[W]
- $D^{(0)}$ ← W
- for k ← 1 to n
-   do for i ← 1 to n
-     do for j ← 1 to n
-       $d_{ij}^{(k)} \leftarrow \min\left(d_{ij}^{(k-1)},\; d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right)$
- return $D^{(n)}$
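The triple loop above can be sketched in Python as follows. This is a minimal sketch, not the slides' own code: the function name `floyd_warshall`, the use of `math.inf` for a missing edge, and the 5-vertex weight matrix `W` are all assumptions made for illustration (the graph is not recovered from the slides' figure). It also updates a single matrix in place instead of keeping every $D^{(k)}$, which gives the same result when there are no negative-weight cycles because row $k$ and column $k$ do not change during iteration $k$.

```python
import math

INF = math.inf  # assumption: INF marks "no edge from i to j"

def floyd_warshall(w):
    """Return the matrix of shortest-path weights for weight matrix w.

    w[i][j] is the weight of edge (i, j), 0 on the diagonal, INF if absent.
    Vertices are 0-based here. Assumes no negative-weight cycles.
    """
    n = len(w)
    d = [row[:] for row in w]      # D(0) = W (copied so w is not modified)
    for k in range(n):             # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d

# Hypothetical example graph: vertices 1..5 stored as indices 0..4.
W = [
    [0,   3,   8,   INF, -4],
    [INF, 0,   INF, 1,   7],
    [INF, 4,   0,   INF, INF],
    [2,   INF, -5,  0,   INF],
    [INF, INF, INF, 6,   0],
]
D = floyd_warshall(W)
for row in D:
    print(row)
```

The three nested loops over `n` vertices give the $O(V^3)$ running time stated earlier; the in-place update reduces the space to a single $n \times n$ matrix.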

Example

- Figure 3: example weighted directed graph (drawing not transcribed), with the computed matrices $D^{(2)}$, $D^{(3)}$, $D^{(4)}$, and $D^{(5)}$.

Comparison of two strings

- Longest common subsequence
- Shortest common supersequence
- Edit distance between two sequences

1. Longest common subsequence

- Definition 1: Given a sequence $X = x_1 x_2 \ldots x_m$, another sequence $Z = z_1 z_2 \ldots z_k$ is a subsequence of $X$ if there exists a strictly increasing sequence $i_1, i_2, \ldots, i_k$ of indices of $X$ such that for all $j = 1, 2, \ldots, k$, we have $x_{i_j} = z_j$.
- Example 1: If X = abcdefg, then Z = abdg is a subsequence of X.
- X = abcdefg
- Z = ab d  g

- Definition 2: Given two sequences X and Y, a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
- Example 2: X = abcdefg and Y = aaadgfd. Z = adf is a common subsequence of X and Y.
- X = abc defg
- Y = aaaadgfd
- Z = a d f
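Definitions 1 and 2 translate directly into a short check. The sketch below is illustrative only; the function names `is_subsequence` and `is_common_subsequence` are assumptions, not names from the slides. It scans the longer sequence once from left to right, which enforces the strictly increasing index condition.

```python
def is_subsequence(z, x):
    """Return True if z is a subsequence of x (Definition 1)."""
    it = iter(x)
    # `letter in it` consumes the iterator up to the first match, so each
    # letter of z is matched at a strictly later position of x.
    return all(letter in it for letter in z)

def is_common_subsequence(z, x, y):
    """Return True if z is a subsequence of both x and y (Definition 2)."""
    return is_subsequence(z, x) and is_subsequence(z, y)

print(is_subsequence("abdg", "abcdefg"))
print(is_common_subsequence("adf", "abcdefg", "aaadgfd"))
```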

- Definition 3: A longest common subsequence (LCS) of X and Y is a common subsequence of X and Y with the longest length. (The length of a sequence is the number of letters in the sequence.)
- A longest common subsequence may not be unique.

Longest common subsequence problem

- Input: Two sequences $X = x_1 x_2 \ldots x_m$ and $Y = y_1 y_2 \ldots y_n$.
- Output: a longest common subsequence of X and Y.
- A brute-force approach:
- Suppose that $m \le n$. Try all subsequences of X (there are $2^m$ subsequences of X), test whether each such subsequence is also a subsequence of Y, and select the one with the longest length.
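The brute-force approach can be sketched as below. This is only an illustration of why the approach is exponential: the names `lcs_brute_force` and `is_subsequence` are assumptions, and the sketch tries candidates from longest to shortest so that it can stop at the first hit.

```python
from itertools import combinations

def is_subsequence(z, y):
    """Return True if z is a subsequence of y."""
    it = iter(y)
    return all(letter in it for letter in z)

def lcs_brute_force(x, y):
    """Enumerate subsequences of the shorter string, longest first."""
    if len(x) > len(y):
        x, y = y, x                      # make x the shorter sequence
    for length in range(len(x), -1, -1):
        # every choice of `length` increasing indices is one subsequence of x
        for idx in combinations(range(len(x)), length):
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate
    return ""

print(len(lcs_brute_force("BDCABA", "ABCBDAB")))
```

In the worst case all $2^m$ index subsets are generated, and each test against Y takes $O(n)$ time, so this is usable only for very short inputs.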

Characterizing a longest common subsequence

- Theorem (Optimal substructure of an LCS)
- Let $X = x_1 x_2 \ldots x_m$ and $Y = y_1 y_2 \ldots y_n$ be two sequences, and
- let $Z = z_1 z_2 \ldots z_k$ be any LCS of X and Y.
- 1. If $x_m = y_n$, then $z_k = x_m = y_n$ and $Z[1..k-1]$ is an LCS of $X[1..m-1]$ and $Y[1..n-1]$.
- 2. If $x_m \ne y_n$, then $z_k \ne x_m$ implies that Z is an LCS of $X[1..m-1]$ and Y.
- 3. If $x_m \ne y_n$, then $z_k \ne y_n$ implies that Z is an LCS of X and $Y[1..n-1]$.

The recursive equation

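- Let $c[i,j]$ be the length of an LCS of $X[1..i]$ and $Y[1..j]$.
- $c[i,j]$ can be computed as follows:
$$c[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\ \max\left(c[i,j-1],\; c[i-1,j]\right) & \text{if } i, j > 0 \text{ and } x_i \ne y_j. \end{cases}$$
- Computing the length of an LCS:
- There are $n \times m$ entries $c[i,j]$, so we can compute them in a specific order (row by row, so that the entries each one depends on are already known).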

The algorithm to compute an LCS

- for i ← 1 to m do
-   c[i,0] ← 0
- for j ← 0 to n do
-   c[0,j] ← 0
- for i ← 1 to m do
-   for j ← 1 to n do
-     if $x_i = y_j$ then
-       c[i,j] ← c[i-1,j-1] + 1
-       b[i,j] ← 1
-     else if c[i-1,j] > c[i,j-1] then
-       c[i,j] ← c[i-1,j]
-       b[i,j] ← 2
-     else c[i,j] ← c[i,j-1]
-       b[i,j] ← 3

- Example 3: X = BDCABA and Y = ABCBDAB.
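The table-filling algorithm can be run on Example 3 with a sketch like the following. The function name `lcs_tables` and the list-of-lists layout are assumptions; the codes 1, 2, 3 stored in `b` follow the pseudocode (diagonal, up, left). Python strings are 0-based, so $x_i$ is `x[i - 1]`.

```python
def lcs_tables(x, y):
    """Fill c (LCS lengths) and b (back-tracking codes) bottom-up."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 1          # came from the diagonal cell
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 2          # came from the cell above
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 3          # came from the cell to the left
    return c, b

c, b = lcs_tables("BDCABA", "ABCBDAB")
for row in c:
    print(row)
print("LCS length:", c[6][7])
```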

Constructing an LCS (back-tracking)

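- We can find an LCS using the $b[i,j]$'s.
- We start with $b[m,n]$ and track back to some cell $b[0,j]$ or $b[i,0]$.
- The algorithm to construct an LCS:
- 1. i ← m
- 2. j ← n
- 3. if i = 0 or j = 0 then exit
- 4. if b[i,j] = 1 then
-   print $x_i$
-   i ← i-1
-   j ← j-1
- 5. else if b[i,j] = 2 then i ← i-1
- 6. else if b[i,j] = 3 then j ← j-1
- 7. Goto Step 3.
- The letters are printed in reverse order (the last letter of the LCS comes first).
- The time complexity: O(nm) to fill the tables; the back-tracking itself takes at most n + m steps.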
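The back-tracking steps can be sketched as a loop in place of the goto. The names `lcs_direction_table` and `construct_lcs` are assumptions for this illustration. The sketch records $x_i$ before decrementing $i$, and since the walk runs from the end of the sequences toward the start, it reverses the collected letters at the end.

```python
def lcs_direction_table(x, y):
    """Fill the tables bottom-up and return only the direction table b."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, 1
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], 2
            else:
                c[i][j], b[i][j] = c[i][j - 1], 3
    return b

def construct_lcs(x, y):
    """Walk back from b[m][n] until row 0 or column 0 is reached."""
    b = lcs_direction_table(x, y)
    i, j = len(x), len(y)
    letters = []
    while i > 0 and j > 0:
        if b[i][j] == 1:             # x_i = y_j is part of the LCS
            letters.append(x[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == 2:           # move up
            i -= 1
        else:                        # move left
            j -= 1
    return "".join(reversed(letters))

print(construct_lcs("BDCABA", "ABCBDAB"))
```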

2. Shortest common supersequence

- Definition: Let X and Y be two sequences. A sequence Z is a supersequence of X and Y if both X and Y are subsequences of Z.
- Shortest common supersequence problem:
- Input: Two sequences X and Y.
- Output: a shortest common supersequence of X and Y.

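- Recursive equation:
- Let $c[i,j]$ be the length of a shortest common supersequence of $X[1..i]$ and $Y[1..j]$.
- $c[i,j]$ can be computed as follows:
$$c[i,j] = \begin{cases} j & \text{if } i = 0, \\ i & \text{if } j = 0, \\ c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\ \min\left(c[i,j-1] + 1,\; c[i-1,j] + 1\right) & \text{if } i, j > 0 \text{ and } x_i \ne y_j. \end{cases}$$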
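The recursive equation for the supersequence length can be evaluated bottom-up in the same way as the LCS table. The function name `scs_length` is an assumption made for this sketch; it returns only the length, not a supersequence itself.

```python
def scs_length(x, y):
    """Length of a shortest common supersequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        c[0][j] = j                  # i = 0: all j letters of Y are needed
    for i in range(m + 1):
        c[i][0] = i                  # j = 0: all i letters of X are needed
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1    # one shared letter
            else:
                c[i][j] = min(c[i][j - 1], c[i - 1][j]) + 1
    return c[m][n]

print(scs_length("BDCABA", "ABCBDAB"))
```

A useful sanity check is that the result equals $m + n$ minus the LCS length, since exactly the letters of an LCS can be shared between X and Y in the supersequence.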


3. Edit distance between two sequences

- Three operations:
- insertion: inserting an x into abc (between a and b), we get axbc.
- deletion: deleting b from abc, we get ac.
- replacement: given a sequence abc, replacing a with x, we get xbc.

- Definition: Suppose that we can use three edit operations (insertion, deletion, and replacement) to edit a sequence into another. The edit distance between two sequences is the minimum number of operations required to edit one sequence into another.
- Note: each operation is counted as 1.
- Weighted edit distance:
- There is a weight on each operation.
- For example, s(a,b) = 1, s(a,_) = 1.5, s(b,a) = 1, s(b,_) = 1.5.
- Where the weight comes from:
- For DNA and protein sequences, it is from statistics.

Alignment of sequences -- an alternative

- An alignment of two sequences is obtained by inserting spaces into or at either end of X and Y such that the two resulting sequences X' and Y' are of the same length. That is, every letter in X' is opposite to a unique letter in Y'.
- The alignment value is defined as $\sum_{i=1}^{l} s(X'_i, Y'_i)$,
- where $l$ is the common length of X' and Y', $X'_i$ and $Y'_i$ denote the two letters in column $i$ of the alignment, and $s(X'_i, Y'_i)$ is the score (weight) of these opposing letters.
- There are several popular score schemes for DNA and protein sequences.
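The alignment value is just a column-by-column sum, as this small sketch shows. The names `alignment_value` and `unit_score` are assumptions, as is the choice of `_` for the space symbol and the 0/1 score scheme (match 0, anything else 1), which is only one possible scheme.

```python
def unit_score(a, b):
    """Hypothetical score scheme: 0 for a match, 1 for a mismatch or a space."""
    return 0 if a == b else 1

def alignment_value(x_aligned, y_aligned, score=unit_score):
    """Sum the scores of opposing letters, column by column."""
    if len(x_aligned) != len(y_aligned):
        raise ValueError("aligned sequences must have the same length")
    return sum(score(a, b) for a, b in zip(x_aligned, y_aligned))

# Two columns differ (b/a and g/h), so the value under unit_score is 2.
print(alignment_value("abcdefgh", "aacdefhh"))
# A space ("_") opposite a letter counts as one operation.
print(alignment_value("a_bc", "axbc"))
```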

- Fact: The edit distance between two sequences is the same as the minimum alignment value of the two sequences if we use the same score scheme.
- Recursive equation:
- $c[i,j] = \min\{\, c[i-1,j-1] + s(X_i, Y_j),\; c[i,j-1] + s(\_, Y_j),\; c[i-1,j] + s(X_i, \_) \,\}$.
- Time and space complexity:
- Both are $O(nm)$, or $O(n^2)$ if both sequences have equal length $n$.
- Why?
- We have to compute $c[i,j]$ (the cost) and $b[i,j]$ (for back-tracking). Each will take $O(n^2)$.
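The recursive equation can be sketched as a full-table computation. The names `edit_distance` and `unit_score` and the `_` space symbol are assumptions. The slides do not state the boundary values, so this sketch assumes the natural ones: row 0 is the cost of inserting a prefix of Y, and column 0 is the cost of deleting a prefix of X.

```python
def unit_score(a, b):
    """Hypothetical score scheme: every operation costs 1, a match costs 0."""
    return 0 if a == b else 1

def edit_distance(x, y, s=unit_score):
    """Fill the full (len(x)+1) x (len(y)+1) cost table; O(nm) time and space."""
    n, m = len(x), len(y)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):            # assumed boundary: delete x[0..i-1]
        c[i][0] = c[i - 1][0] + s(x[i - 1], "_")
    for j in range(1, m + 1):            # assumed boundary: insert y[0..j-1]
        c[0][j] = c[0][j - 1] + s("_", y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c[i][j] = min(
                c[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # match / replace
                c[i][j - 1] + s("_", y[j - 1]),           # insert y_j
                c[i - 1][j] + s(x[i - 1], "_"),           # delete x_i
            )
    return c[n][m]

print(edit_distance("abcdefgh", "aacdefhh"))
print(edit_distance("abc", "axbc"))
```

Passing a different `s` gives the weighted edit distance; only the score function changes, not the recurrence.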

Linear space algorithm

- Hint: Computing $c[i,j]$ needs only linear space, whereas back-tracking needs $O(nm)$ space.

- To compute $c[i,j]$, we need $c[i-1,j-1]$, $c[i,j-1]$, and $c[i-1,j]$.
- So, to get $c[n,m]$, we only have to keep the dark cells (the current row and the previous row).
- However, if we do not have all the $b[i,j]$'s, we cannot get the alignment (nor the edit process, the subsequence, or the supersequence).
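Keeping only two rows can be sketched as follows. The names `last_row` and `unit_score` and the `_` space symbol are assumptions, and the boundary values (costs of inserting or deleting a prefix) are assumed rather than taken from the slides. The function returns the whole last row, whose final entry is $c[n,m]$.

```python
def unit_score(a, b):
    """Hypothetical score scheme: 0 for a match, 1 otherwise."""
    return 0 if a == b else 1

def last_row(x, y, s=unit_score):
    """Return row len(x) of the cost table using O(len(y)) space."""
    m = len(y)
    prev = [0] * (m + 1)
    for j in range(1, m + 1):                    # row 0: insert y[0..j-1]
        prev[j] = prev[j - 1] + s("_", y[j - 1])
    for i in range(1, len(x) + 1):
        cur = [prev[0] + s(x[i - 1], "_")] + [0] * m
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j - 1] + s(x[i - 1], y[j - 1]),  # c[i-1, j-1]
                cur[j - 1] + s("_", y[j - 1]),        # c[i, j-1]
                prev[j] + s(x[i - 1], "_"),           # c[i-1, j]
            )
        prev = cur                               # older rows are discarded
    return prev

print(last_row("abcdefgh", "aacdefhh")[-1])
```

Because earlier rows are thrown away, this gives the cost only; there is no direction table left to back-track through.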

- Discussion: Each time we keep only a few $b[i,j]$'s, and we can re-compute the $b[i,j]$'s again. In this way, we can get a linear-space algorithm. However, the time complexity is increased to $O(n^3)$.

- A better idea: find a cutting point.
- For the problems of smaller size, we do the same thing until one of the segments contains 1 letter.

- Key: each time, we fix the middle point (n/2) of X.

- Example: X = abcdefgh and Y = aacdefhh.
- Score scheme: match -- 0 and mismatch -- 1.
- The alignment:
- abcdefgh      abcd efgh
- aacdefhh      aacd efhh
-                   /\
- cutting point (4,4).

- Finding the cutting point:
- Let $X = x_1 x_2 x_3 \ldots x_n$ and $Y = y_1 y_2 y_3 \ldots y_m$.
- Define $X^T = x_n x_{n-1} \ldots x_1$ and $Y^T = y_m y_{m-1} \ldots y_1$.
- Let $c[i,j]$ be the cost of an optimal alignment for $X[1..i]$ and $Y[1..j]$, and $cc[k,l]$ be the cost of an optimal alignment for $X^T[1..k]$ and $Y^T[1..l]$.
- for (i = 0; i <= m; i++)
-   if (c[n/2, i] + cc[n-n/2, m-i] == c[n, m])
-     point = i
- We need to check two rows, c[n/2, 0], c[n/2, 1], ..., c[n/2, m] and cc[n-n/2, 0], cc[n-n/2, 1], ..., cc[n-n/2, m]. O(m) space.
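The search for the cutting point can be sketched with two linear-space row computations, one on the first half of X and one on the reversed second half. The names `last_row`, `cutting_point`, and `unit_score` are assumptions, as are the boundary values of the cost table. In place of comparing each sum against $c[n,m]$, this sketch takes the first index that minimizes the sum, which is the same index because the minimum of the sums equals $c[n,m]$.

```python
def unit_score(a, b):
    """Hypothetical score scheme: 0 for a match, 1 otherwise."""
    return 0 if a == b else 1

def last_row(x, y, s=unit_score):
    """Row len(x) of the alignment cost table, using two rows of space."""
    m = len(y)
    prev = [0] * (m + 1)
    for j in range(1, m + 1):
        prev[j] = prev[j - 1] + s("_", y[j - 1])
    for i in range(1, len(x) + 1):
        cur = [prev[0] + s(x[i - 1], "_")] + [0] * m
        for j in range(1, m + 1):
            cur[j] = min(prev[j - 1] + s(x[i - 1], y[j - 1]),
                         cur[j - 1] + s("_", y[j - 1]),
                         prev[j] + s(x[i - 1], "_"))
        prev = cur
    return prev

def cutting_point(x, y, s=unit_score):
    """Return (n/2, i) such that an optimal alignment splits Y at i."""
    n, m = len(x), len(y)
    h = n // 2
    fwd = last_row(x[:h], y, s)                # fwd[i] = c[n/2, i]
    bwd = last_row(x[h:][::-1], y[::-1], s)    # bwd[l] = cc[n - n/2, l]
    i = min(range(m + 1), key=lambda i: fwd[i] + bwd[m - i])
    return h, i

print(cutting_point("abcdefgh", "aacdefhh"))
```

Only `fwd` and `bwd` are held in memory, which is the $O(m)$ space mentioned above.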

The algorithm

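- 1. Compute $c[n,m]$ and the two rows needed to find the cutting point: the $(n/2)$-th row of $c$ and the $(n - n/2)$-th row of $cc$ (which covers X from position $n/2+1$ on).
- 2. Find the cutting point $(n/2, i)$ as shown above.
- 3. If one of the segments $X[1..n/2]$ and $Y[1..i]$ contains at most 1 letter, then compute the alignment of $X[1..n/2]$ and $Y[1..i]$ directly.
- 4. If one of the segments $X[n/2+1..n]$ and $Y[i+1..m]$ contains at most 1 letter, then compute the alignment of $X[n/2+1..n]$ and $Y[i+1..m]$ directly.
- 5. For each of the two pairs that is not such a base case, recurse on steps 1-4: the pairs are $X[1..n/2]$ with $Y[1..i]$, and $X[n/2+1..n]$ with $Y[i+1..m]$. Finally, combine (concatenate) the two alignments for the two pairs of sequences.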
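The whole divide-and-conquer procedure can be sketched as below. This is an illustrative reading of the steps, not the slides' own code: the names `align`, `last_row`, and `unit_score`, the `_` space symbol, and the exact base cases (an empty segment, or a single letter of X) are assumptions chosen so that the recursion always terminates.

```python
def unit_score(a, b):
    """Hypothetical score scheme: 0 for a match, 1 otherwise."""
    return 0 if a == b else 1

def last_row(x, y, s):
    """Row len(x) of the alignment cost table, using two rows of space."""
    m = len(y)
    prev = [0] * (m + 1)
    for j in range(1, m + 1):
        prev[j] = prev[j - 1] + s("_", y[j - 1])
    for i in range(1, len(x) + 1):
        cur = [prev[0] + s(x[i - 1], "_")] + [0] * m
        for j in range(1, m + 1):
            cur[j] = min(prev[j - 1] + s(x[i - 1], y[j - 1]),
                         cur[j - 1] + s("_", y[j - 1]),
                         prev[j] + s(x[i - 1], "_"))
        prev = cur
    return prev

def align(x, y, s=unit_score):
    """Return (x_aligned, y_aligned), with '_' standing for a space."""
    n, m = len(x), len(y)
    if n == 0:
        return "_" * m, y
    if m == 0:
        return x, "_" * n
    if n == 1:
        # Base case: pair the single letter with some y[j], or with a space.
        gaps = sum(s("_", ch) for ch in y)
        best_j, best = None, s(x, "_") + gaps
        for j in range(m):
            cost = s(x, y[j]) + gaps - s("_", y[j])
            if cost < best:
                best_j, best = j, cost
        if best_j is None:
            return x + "_" * m, "_" + y
        return "_" * best_j + x + "_" * (m - best_j - 1), y
    h = n // 2                                   # middle point of X
    fwd = last_row(x[:h], y, s)
    bwd = last_row(x[h:][::-1], y[::-1], s)
    i = min(range(m + 1), key=lambda i: fwd[i] + bwd[m - i])  # cutting point
    left = align(x[:h], y[:i], s)
    right = align(x[h:], y[i:], s)
    return left[0] + right[0], left[1] + right[1]

xa, ya = align("abcdefgh", "aacdefhh")
print(xa)
print(ya)
```

Each call keeps only two cost rows alive, and the recursion depth is about $\log_2 n$ because X is halved at every level.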

- Time complexity analysis:
- The first round needs $T$ time, where $T$ is the time for the normal algorithm ($O(n^2)$).
- The 2nd round needs $\frac{1}{2}T$. ($0.5n \times i + 0.5n \times (n-i) = 0.5n^2$.)
- The 3rd round needs $\frac{1}{4}T$.
- The $i$-th round needs $\frac{1}{2^{i-1}}T$.
- Total time: $T + \left(\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots\right)T = 2T = O(n^2)$.