Approximation Algorithms

- Greedy Strategies

Max and Min

- min f is equivalent to max -f.
- However, a good approximation for min f may not be a good approximation for max -f.
- For example, consider a graph G(V,E). C is a minimum vertex cover of G iff V \ C is a maximum independent set of G. The minimum vertex cover has a polynomial-time 2-approximation, but the maximum independent set has no constant-bounded approximation unless NP = P.
- Another example: Minimum Connected Dominating Set and Minimum Spanning Tree with Maximum Number of Leaves.

Greedy for Max and Min

- Max: independent system
- Min: submodular potential function

- Independent System

Independent System

- Consider a set E and a collection C of subsets of E. (E,C) is called an independent system if A ⊆ B ∈ C implies A ∈ C.
- The elements of C are called independent sets.

Maximization Problem

Greedy Approximation MAX
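A minimal sketch of greedy MAX, assuming it is the usual rule for an independent system (E, C): scan the elements of E in nonincreasing order of weight and keep an element whenever the kept set stays independent. The system C is accessed only through a membership oracle; `greedy_max` and `is_independent` are illustrative names, not taken from the slides.

```python
from typing import Callable, Dict, FrozenSet, Hashable


def greedy_max(
    weight: Dict[Hashable, float],
    is_independent: Callable[[FrozenSet[Hashable]], bool],
) -> FrozenSet[Hashable]:
    """Greedy MAX over an independent system given by a membership oracle."""
    chosen: FrozenSet[Hashable] = frozenset()
    # Consider heavier elements first.
    for e in sorted(weight, key=weight.get, reverse=True):
        candidate = chosen | {e}
        # Keep e only if the enlarged set is still independent.
        if is_independent(candidate):
            chosen = candidate
    return chosen


# Toy independent system: every subset with at most two elements.
weights = {"a": 5, "b": 3, "c": 4}
print(sorted(greedy_max(weights, lambda s: len(s) <= 2)))  # ['a', 'c']
```

Because C is hereditary, checking only the enlarged set is enough: every subset of an accepted set is independent by definition.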

Theorem

Proof


Maximum Weight Hamiltonian Cycle

- Given an edge-weighted complete graph, find a Hamiltonian cycle with maximum total weight.

Independent sets

- E: all edges
- A subset of edges is independent if it is a Hamiltonian cycle or a vertex-disjoint union of paths.
- C: the collection of all such subsets

Maximal Independent Sets

- Consider a subset F of edges. For any two maximal independent sets I and J of F, |J| ≤ 2|I|.


- Theorem: For the maximum Hamiltonian cycle problem, the greedy algorithm MAX produces a polynomial-time approximation with performance ratio at most 2.
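The sketch below runs the greedy rule on the independent system defined above for a complete graph on vertices 0..n-1: an edge set is accepted if it is a vertex-disjoint union of paths or a single Hamiltonian cycle. The function names and the small weight table are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def is_independent(edges: List[Edge], n: int) -> bool:
    """True iff edges form vertex-disjoint paths or one Hamiltonian cycle."""
    degree = [0] * n
    parent = list(range(n))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    cycles = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if degree[u] > 2 or degree[v] > 2:
            return False  # paths and cycles never have a vertex of degree 3
        ru, rv = find(u), find(v)
        if ru == rv:
            cycles += 1  # this edge closes a cycle
        else:
            parent[ru] = rv
    # With max degree 2 and exactly one cycle, |edges| == n forces that
    # cycle to pass through every vertex.
    return cycles == 0 or (cycles == 1 and len(edges) == n)


def greedy_hamiltonian_cycle(n: int, weight: Dict[Edge, int]) -> List[Edge]:
    chosen: List[Edge] = []
    for e in sorted(weight, key=weight.get, reverse=True):
        if is_independent(chosen + [e], n):
            chosen.append(e)
    return chosen


w = {(0, 1): 10, (2, 3): 9, (0, 2): 8, (1, 3): 7, (0, 3): 1, (1, 2): 1}
tour = greedy_hamiltonian_cycle(4, w)
print(sorted(tour), sum(w[e] for e in tour))  # [(0, 1), (0, 2), (1, 3), (2, 3)] 34
```

In this tiny instance the greedy tour happens to be optimal; the theorem only guarantees a tour of weight at least half the optimum.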

Maximum Weight Directed Hamiltonian Cycle

- Given an edge-weighted complete digraph, find a Hamiltonian cycle with maximum total weight.

Independent sets

- E: all edges
- A subset of edges is independent if it is a directed Hamiltonian cycle or a vertex-disjoint union of directed paths.


Tightness

The rest of all edges have a cost ε.

[Figure: tight instance; the labeled edges have costs 1 and 1 + ε.]

A Special Case

- If c satisfies the following quadrilateral condition:
- For any 4 vertices u, v, u', v' in V,
- Then the greedy approximation for maximum weight Hamiltonian cycle has performance ratio 2.


Superstring

- Given n strings s1, s2, ..., sn, find a shortest string s containing all of s1, s2, ..., sn as substrings.
- No si is a substring of another sj.

An Example

- Given S = {abcc, efaab, bccef}
- Some possible solutions:
- Concatenate all strings: abccefaabbccef (14 chars)
- A shortest superstring is abccefaab (9 chars)
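To check the example, here is a brute-force sketch for tiny inputs only: it tries every order of the strings and merges consecutive strings on their longest overlap. It relies on the assumption above that no si is a substring of another sj; the function names are illustrative.

```python
from itertools import permutations
from typing import List, Optional


def merge(a: str, b: str) -> str:
    """Append b to a, reusing the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return a + b


def shortest_superstring(strings: List[str]) -> str:
    best: Optional[str] = None
    for order in permutations(strings):  # n! orders: exponential time
        s = order[0]
        for t in order[1:]:
            s = merge(s, t)
        if best is None or len(s) < len(best):
            best = s
    return best


print(shortest_superstring(["abcc", "efaab", "bccef"]))  # abccefaab
```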

Relationship to Set Cover?

- How can the shortest superstring (SS) problem be transformed into the Set Cover (SC) problem?
- Need to identify U
- Need to identify 𝒮 (the collection of subsets)
- Need to define the cost function
- Building the SC instance from an SS instance:
- Let U = S (the set of n strings).
- How to define 𝒮?

Relationship to SC (cont)

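- Let M be the set that consists of the strings σijk, where σijk is obtained by overlapping si and sj on k > 0 characters (the last k characters of si coincide with the first k characters of sj).

[Figure: si and sj overlapping on k characters, merged into σijk]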

Relationship to SC (cont)

Now, define 𝒮

Define cost of

Let C be a set cover of this constructed SC instance. Then the concatenation of all strings in C is a solution of SS. Note that C is a collection of

Algorithm 1 for SS
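The definitions of the sets and their cost are missing from this transcript, so the sketch below assumes a common choice: for every candidate string π (each si and each σijk), the set is {s ∈ S : s is a substring of π} with cost |π|. Greedy weighted set cover then picks the set with the smallest cost per newly covered string, and the chosen π are concatenated. `algorithm1` and the tie-breaking rule are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from typing import List


def algorithm1(strings: List[str]) -> str:
    # Candidates: every s_i plus every sigma_ijk (s_i, s_j merged on k > 0 chars).
    candidates = set(strings)
    for a, b in permutations(strings, 2):
        for k in range(1, min(len(a), len(b))):
            if a[-k:] == b[:k]:
                candidates.add(a + b[k:])
    # Assumed SC instance: set(pi) = input strings inside pi, cost(pi) = len(pi).
    covers = {pi: frozenset(s for s in strings if s in pi) for pi in candidates}
    uncovered = set(strings)
    chosen: List[str] = []
    while uncovered:
        # Smallest cost per newly covered string; ties broken
        # lexicographically so that the output is deterministic.
        pi = min(
            (p for p in covers if covers[p] & uncovered),
            key=lambda p: (Fraction(len(p), len(covers[p] & uncovered)), p),
        )
        chosen.append(pi)
        uncovered -= covers[pi]
    return "".join(chosen)


print(algorithm1(["abcc", "efaab", "bccef"]))  # abccefefaab
```

On the running example the result has 11 characters against the optimal 9, comfortably inside the 2H_n bound.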

Approximation Ratio

- Lemma 1: Let OPT be the length of an optimal solution of SS and OPT' be the cost of an optimal solution of SC. Then OPT ≤ OPT' ≤ 2·OPT.
- Proof

Proof of Lemma 1 (cont)

Approximation Ratio

- Theorem 1: Algorithm 1 has an approximation ratio within a factor of 2H_n.
- Proof: The greedy algorithm for Set Cover has approximation ratio H_n. From Lemma 1, it follows directly that Algorithm 1 is a 2H_n-factor algorithm for SS.

Prefix and Overlap

- For two strings s1 and s2, we have
- overlap(s1,s2): the longest suffix of s1 that is also a prefix of s2.
- pref(s1,s2): the prefix of s1 that remains after chopping off overlap(s1,s2).
- Example
- s1 = abcbcaa and s2 = bcaaca, then
- overlap(s1,s2) = bcaa
- pref(s1,s2) = abc
- Note: in general, overlap(s1,s2) ≠ overlap(s2,s1).
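The two definitions translate directly into code. This sketch scans overlap lengths from longest to shortest and only considers proper overlaps (shorter than both strings), so it stays well defined even when s1 = s2; for substring-free inputs this agrees with the definition above. The function names simply mirror the notation.

```python
def overlap(s1: str, s2: str) -> str:
    """Longest proper suffix of s1 that is also a prefix of s2."""
    for k in range(min(len(s1), len(s2)) - 1, 0, -1):
        if s1[-k:] == s2[:k]:
            return s1[-k:]
    return ""


def pref(s1: str, s2: str) -> str:
    """Prefix of s1 left after chopping off overlap(s1, s2)."""
    return s1[: len(s1) - len(overlap(s1, s2))]


print(overlap("abcbcaa", "bcaaca"), pref("abcbcaa", "bcaaca"))  # bcaa abc
print(overlap("bcaaca", "abcbcaa"))                             # a
```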

Is there any better approach?

- Now, suppose that in the optimal solution the strings appear from left to right in the order s1, s2, ..., sn.
- Define OPT = |pref(s1,s2)| + ... + |pref(sn-1,sn)| + |pref(sn,s1)| + |overlap(sn,s1)|.

Why overlap(sn,s1)? Consider the example S = {agagag, gagaga}. If we consider only the prefixes, the result would be ag, whereas the correct result is agagaga.
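A small sketch that rebuilds the superstring from a cyclic order, following the formula for OPT: concatenate the cyclic prefixes and append overlap(sn, s1). It uses proper overlaps (shorter than both strings); the helper names are illustrative.

```python
from typing import List


def overlap_len(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def pref(a: str, b: str) -> str:
    return a[: len(a) - overlap_len(a, b)]


def cyclic_prefixes(order: List[str]) -> str:
    """pref(s1,s2) pref(s2,s3) ... pref(sn,s1)."""
    n = len(order)
    return "".join(pref(order[i], order[(i + 1) % n]) for i in range(n))


def superstring_from_order(order: List[str]) -> str:
    """Cyclic prefixes followed by overlap(sn, s1)."""
    last, first = order[-1], order[0]
    return cyclic_prefixes(order) + last[len(last) - overlap_len(last, first):]


print(cyclic_prefixes(["agagag", "gagaga"]))         # ag
print(superstring_from_order(["agagag", "gagaga"]))  # agagaga
```

Since pref(sn,s1) followed by overlap(sn,s1) is exactly sn, the result is pref(s1,s2) ... pref(sn-1,sn) sn, which contains every si.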

Prefix Graph

- Define the prefix graph as follows:
- Complete weighted directed graph G(V,E)
- V is a set of vertices labeled 1 to n (vertex i represents string si)
- For each edge i → j, assign the weight |pref(si, sj)|
- Example
- S = {abc, bcd, dab}

[Figure: prefix graph on vertices 1 (abc), 2 (bcd), 3 (dab)]
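The weight matrix of the prefix graph for the example can be computed as below. This sketch also fills in the diagonal (self-loops i → i) using the longest proper self-overlap; that is an assumption that only matters if self-loops are allowed. Vertices are 0-based here.

```python
from typing import List


def overlap_len(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def prefix_graph(strings: List[str]) -> List[List[int]]:
    """w[i][j] = |pref(s_i, s_j)| = |s_i| - |overlap(s_i, s_j)|."""
    return [[len(a) - overlap_len(a, b) for b in strings] for a in strings]


for row in prefix_graph(["abc", "bcd", "dab"]):
    print(row)
# [3, 1, 3]
# [3, 3, 2]
# [1, 2, 3]
```

For instance, the tour 1 → 2 → 3 → 1 has weight 1 + 2 + 1 = 4.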

Cycle Cover

- Cycle cover: a collection of disjoint cycles covering all vertices (each vertex is in exactly one cycle).
- Note that the tour 1 → 2 → ... → n → 1 is a cycle cover.
- Minimum weight cycle cover: the sum of weights is minimum over all covers.
- Thus, we want to find a minimum weight cycle cover.

How to find a min. weight cycle cover

- Corresponding to the prefix graph, construct a bipartite graph H(X, Y, E) such that
- X = {x1, x2, ..., xn} and Y = {y1, y2, ..., yn}
- For each i, j (in 1..n), add edge (xi, yj) of weight |pref(si, sj)|
- Each cycle cover of the prefix graph ↔ a perfect matching of the same weight in H. (A perfect matching is a matching that covers all the vertices.)
- Finding a minimum weight cycle cover ⇔ finding a minimum weight perfect matching (which can be done in polynomial time).
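A sketch of the correspondence: a permutation σ of the vertices is exactly a cycle cover i → σ(i), i.e. a perfect matching xi–yσ(i) in H. For tiny n we can simply enumerate all permutations; this brute force is a stand-in, and a polynomial-time version would use a minimum weight perfect matching (assignment) solver instead. The matrix is the 0-based prefix-graph matrix of S = {abc, bcd, dab}, with self-loops weighted by the longest proper self-overlap (an assumption).

```python
from itertools import permutations
from typing import List, Tuple


def min_cycle_cover(w: List[List[int]]) -> Tuple[int, List[List[int]]]:
    """Brute-force minimum weight cycle cover of a complete digraph."""
    n = len(w)
    # A permutation succ is the cycle cover i -> succ[i].
    succ = min(permutations(range(n)), key=lambda s: sum(w[i][s[i]] for i in range(n)))
    cost = sum(w[i][succ[i]] for i in range(n))
    # Split the permutation into its cycles.
    seen = [False] * n
    cycles: List[List[int]] = []
    for start in range(n):
        if seen[start]:
            continue
        cycle = []
        v = start
        while not seen[v]:
            seen[v] = True
            cycle.append(v)
            v = succ[v]
        cycles.append(cycle)
    return cost, cycles


W = [[3, 1, 3], [3, 3, 2], [1, 2, 3]]
print(min_cycle_cover(W))  # (4, [[0, 1, 2]])
```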

How to break the cycle

A constant factor algorithm

- Algorithm 2
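The body of Algorithm 2 is missing from this transcript. The sketch below assumes the usual cycle-cover algorithm: find a minimum weight cycle cover of the prefix graph; open each cycle c = (i1 → ... → il → i1) at its representative r = s_i1 into σ(c) = pref(s_i1,s_i2) ... pref(s_il,s_i1) followed by r, so |σ(c)| = w(c) + |r|; output the concatenation of all σ(c). The cycle cover is found by brute force (tiny inputs only), and all names are hypothetical.

```python
from itertools import permutations
from typing import List


def overlap_len(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def pref(a: str, b: str) -> str:
    return a[: len(a) - overlap_len(a, b)]


def algorithm2(strings: List[str]) -> str:
    n = len(strings)
    w = [[len(pref(a, b)) for b in strings] for a in strings]
    # Minimum weight cycle cover by brute force (stand-in for perfect matching).
    succ = min(permutations(range(n)), key=lambda s: sum(w[i][s[i]] for i in range(n)))
    seen = [False] * n
    pieces: List[str] = []
    for start in range(n):
        if seen[start]:
            continue
        cycle = []
        v = start
        while not seen[v]:
            seen[v] = True
            cycle.append(v)
            v = succ[v]
        # Open the cycle at r = strings[cycle[0]]: alpha(c) followed by r.
        alpha = "".join(
            pref(strings[cycle[t]], strings[cycle[(t + 1) % len(cycle)]])
            for t in range(len(cycle))
        )
        pieces.append(alpha + strings[cycle[0]])
    return "".join(pieces)


print(algorithm2(["abc", "bcd", "dab"]))  # abcdabc
```

Here the output has 7 characters while the shortest superstring, dabcd, has 5, well within the factor 4.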

Approximation Ratio

- Lemma 2: Let C be the minimum weight cycle cover of S. Let c and c' be two cycles in C, and let r, r' be representative strings from these cycles. Then
- |overlap(r, r')| < w(c) + w(c')
- Proof: Exercise

Approximation Ratio (cont)

- Theorem 2: Algorithm 2 has an approximation ratio of 4.
- Proof (see next slide)

Proof

Modification to 3-Approximation

3-Approximation Algorithm

- Algorithm 3

Superstring via Hamiltonian path

- ov(u,v) = max{|w| : there exist x and y such that u = xw and v = wy}
- Overlap graph G is a complete digraph:
- V = {s1, s2, ..., sn}
- ov(u,v) is the edge weight.
- Suppose s is the shortest superstring. Let s1, ..., sn be the strings in order of appearance from left to right. Then si and si+1 must have maximum overlap in s. Hence s1, ..., sn form a directed Hamiltonian path in G.


The Algorithm (via Hamiltonian)

A special property

[Figure: four strings u, v, u', v' and the overlaps between them]

Theorem

- Theorem: The greedy approximation MAX for maximum Hamiltonian path in the overlap graph has performance ratio 2.
- Conjecture: This greedy approximation also gives the minimum superstring an approximate solution within a factor of 2 of optimal.
- Example: S = {ab^k, b^(k+1), b^k a}. The shortest superstring is s = ab^(k+1)a. Our obtained solution: ab^k a b^(k+1).
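A sketch of greedy MAX on the overlap graph, assuming the independent sets are vertex-disjoint directed paths: take edges i → j in nonincreasing order of ov and accept one whenever i has no out-edge yet, j has no in-edge yet, and the edge does not close a cycle; then merge the strings along the resulting Hamiltonian path. The names and the index-based tie-breaking are illustrative choices.

```python
from typing import List, Optional


def ov(u: str, v: str) -> int:
    """Length of the longest proper suffix of u that is a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def greedy_superstring(strings: List[str]) -> str:
    n = len(strings)
    succ: List[Optional[int]] = [None] * n  # chosen out-edge of each vertex
    has_pred = [False] * n                  # whether a vertex has an in-edge
    comp = list(range(n))                   # union-find over the growing paths

    def find(x: int) -> int:
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    # Edges of the overlap graph, heaviest first (ties broken by index).
    edges = sorted(
        ((ov(strings[i], strings[j]), i, j) for i in range(n) for j in range(n) if i != j),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    for _, i, j in edges:
        # Accept i -> j only if the chosen edges stay vertex-disjoint paths.
        if succ[i] is None and not has_pred[j] and find(i) != find(j):
            succ[i] = j
            has_pred[j] = True
            comp[find(i)] = find(j)
    # The accepted edges form one Hamiltonian path; walk it from its head.
    v = has_pred.index(False)
    result = strings[v]
    while succ[v] is not None:
        u, v = v, succ[v]
        result += strings[v][ov(strings[u], strings[v]):]
    return result


print(greedy_superstring(["abcc", "efaab", "bccef"]))  # efaabccef
print(greedy_superstring(["abbb", "bbba", "bbbb"]))    # abbbabbbb
```

The second call is the example above with k = 3: the optimum is abbbba (6 characters), but with this input order the tie among weight-3 edges is resolved as ab^k → b^k a, which yields ab^k a b^(k+1).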

- Submodular Function

What is a submodular function?

- Consider a finite set E (called the ground set) and a function f: 2^E → Z. The function f is said to be submodular if for any two subsets A and B in 2^E, f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B).
- Example: f(A) = |A| is submodular.
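On a small ground set the definition can be checked by brute force over all pairs of subsets. The sketch below (illustrative names) confirms the example f(A) = |A|, rejects f(A) = |A|^2, and checks the coverage function that greedy set cover uses.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, List


def all_subsets(ground: Iterable[Hashable]) -> List[FrozenSet[Hashable]]:
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_submodular(f: Callable[[FrozenSet[Hashable]], int], ground: Iterable[Hashable]) -> bool:
    """Check f(A) + f(B) >= f(A | B) + f(A & B) for every pair of subsets."""
    subsets = all_subsets(ground)
    return all(f(A) + f(B) >= f(A | B) + f(A & B) for A in subsets for B in subsets)


print(is_submodular(len, {1, 2, 3}))                     # True
print(is_submodular(lambda A: len(A) ** 2, {1, 2, 3}))  # False

# Coverage function: number of elements covered by a subcollection.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3}}


def coverage(A: FrozenSet[Hashable]) -> int:
    return len(set().union(*(sets[x] for x in A)))


print(is_submodular(coverage, sets))                     # True
```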

Set-Cover

- Given a collection C of subsets of a set E, find a minimum subcollection C' of C such that every element of E appears in a subset in C'.

Greedy Algorithm

Return C. Here f(C) is the number of elements covered by C. Basically, the algorithm picks the set that covers the most uncovered elements at each step.
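A minimal sketch of this greedy rule; the function name and the toy instance are illustrative, and ties go to the lowest index.

```python
from typing import List, Set


def greedy_set_cover(universe: Set[int], collection: List[Set[int]]) -> List[int]:
    """Return the indices of the sets picked by the greedy rule."""
    uncovered = set(universe)
    chosen: List[int] = []
    while uncovered:
        # Pick the set covering the most still-uncovered elements.
        best = max(range(len(collection)), key=lambda i: len(collection[i] & uncovered))
        if not collection[best] & uncovered:
            raise ValueError("the collection does not cover the universe")
        chosen.append(best)
        uncovered -= collection[best]
    return chosen


print(greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]))  # [0, 3]
```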

Analyze the Approximation Ratio


Alternative Analysis

What do we need?

- Actually, this inequality holds if and only if f is submodular and monotone increasing (f(A) ≤ f(B) whenever A ⊆ B).


Proof

Proof of (1)

Proof of (2)

Theorem

- The Greedy Algorithm produces an approximation within a factor of 1 + ln n of optimal for the set cover problem.
- The same result holds for weighted set cover.

Weighted Set Cover

- Given a collection C of subsets of a set E and a weight function w on C, find a minimum total-weight subcollection C' of C such that every element of E appears in a subset in C'.

Greedy Algorithm
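The weighted greedy rule is not reproduced in this transcript; the sketch below assumes the standard one: repeatedly pick the set with the smallest weight per newly covered element. `Fraction` keeps the ratios exact; names are illustrative.

```python
from fractions import Fraction
from typing import List, Set


def greedy_weighted_set_cover(
    universe: Set[int], collection: List[Set[int]], weight: List[int]
) -> List[int]:
    uncovered = set(universe)
    chosen: List[int] = []
    while uncovered:
        useful = [i for i, s in enumerate(collection) if s & uncovered]
        if not useful:
            raise ValueError("the collection does not cover the universe")
        # Smallest weight per newly covered element; min keeps the first tie.
        best = min(useful, key=lambda i: Fraction(weight[i], len(collection[i] & uncovered)))
        chosen.append(best)
        uncovered -= collection[best]
    return chosen


sets = [{1, 2, 3}, {1}, {2}, {3}]
print(greedy_weighted_set_cover({1, 2, 3}, sets, [5, 1, 1, 1]))  # [1, 2, 3]
print(greedy_weighted_set_cover({1, 2, 3}, sets, [2, 1, 1, 1]))  # [0]
```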

A General Problem

Greedy Algorithm

A General Theorem

Remark (Normalized)

Proof

Proof (cont)

- We will prove the following claims

Show the First Claim


Show the Second Claim

For any integers p > q > 0, we have

(p - q)/p = \sum_{j=q+1}^{p} 1/p \le \sum_{j=q+1}^{p} 1/j

Connected Vertex-Cover

- Given a connected graph, find a minimum vertex cover that induces a connected subgraph.
- For any vertex subset A, p(A) is the number of edges not covered by A.
- For any vertex subset A, q(A) is the number of connected components of the subgraph induced by A.
- -p is submodular.
- -q is not submodular.
- Note that when A is a connected vertex cover, q(A) = 1 and p(A) = 0.

-p-q

- Define f(A) = -p(A) - q(A). Then f is submodular and monotone increasing.
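A brute-force check of these claims on one small example graph (an arbitrary choice: a 4-cycle plus a chord). It tests f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) over all pairs of vertex subsets, so it illustrates the claims on this graph rather than proving them. Helper names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]


def p(edges: List[Edge], A: FrozenSet[int]) -> int:
    """Number of edges with no endpoint in A."""
    return sum(1 for u, v in edges if u not in A and v not in A)


def q(edges: List[Edge], A: FrozenSet[int]) -> int:
    """Number of connected components of the subgraph induced by A."""
    parent = {v: v for v in A}

    def find(v: int) -> int:
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        if u in A and v in A:
            parent[find(u)] = find(v)
    return len({find(v) for v in A})


def subsets(n: int) -> List[FrozenSet[int]]:
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def is_submodular(f, n: int) -> bool:
    S = subsets(n)
    return all(f(A) + f(B) >= f(A | B) + f(A & B) for A in S for B in S)


def is_monotone(f, n: int) -> bool:
    S = subsets(n)
    return all(f(A) <= f(B) for A in S for B in S if A <= B)


E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]


def f(A: FrozenSet[int]) -> int:
    return -p(E, A) - q(E, A)


print(is_submodular(lambda A: -p(E, A), 4))    # True
print(is_submodular(lambda A: -q(E, A), 4))    # False
print(is_submodular(f, 4), is_monotone(f, 4))  # True True
```

For -q, a violating pair is A = {1, 3} and B = {0}: q(A) + q(B) = 3 exceeds q(A ∪ B) + q(A ∩ B) = 1.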

Theorem

- Connected Vertex-Cover has a (1 + ln Δ)-approximation, where Δ is the maximum degree.
- -p(Ø) = -|E|, -q(Ø) = 0.
- |E| - p({x}) - q({x}) ≤ Δ - 1.
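A sketch of the greedy algorithm with potential f(A) = -p(A) - q(A): repeatedly add the vertex with the largest increase of f, and stop when no vertex increases f. We assume the graph is connected with at least two edges, so that f(A) = -1 exactly when A is a connected vertex cover, and monotonicity plus submodularity guarantee a positive gain until then. The function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]


def potential(edges: List[Edge], A: FrozenSet[int]) -> int:
    """f(A) = -p(A) - q(A): p = uncovered edges, q = components of G[A]."""
    uncovered = sum(1 for u, v in edges if u not in A and v not in A)
    parent = {v: v for v in A}

    def find(v: int) -> int:
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        if u in A and v in A:
            parent[find(u)] = find(v)
    components = len({find(v) for v in A})
    return -uncovered - components


def greedy_connected_vertex_cover(n: int, edges: List[Edge]) -> List[int]:
    A: FrozenSet[int] = frozenset()
    while True:
        base = potential(edges, A)
        gains = {x: potential(edges, A | {x}) - base for x in range(n) if x not in A}
        if not gains:
            break
        x = max(gains, key=gains.get)  # first vertex with the largest gain
        if gains[x] <= 0:
            break  # under the assumptions above, A is now a connected vertex cover
        A = A | {x}
    return sorted(A)


print(greedy_connected_vertex_cover(4, [(0, 1), (1, 2), (2, 3)]))  # [1, 2]
print(greedy_connected_vertex_cover(4, [(0, 1), (0, 2), (0, 3)]))  # [0]
```

The first gain of a vertex x is deg(x) - 1, matching the bound |E| - p({x}) - q({x}) ≤ Δ - 1 above.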

Weighted Connected Vertex-Cover

- Given a vertex-weighted connected graph, find a connected vertex cover with minimum total weight.
- Theorem: Weighted Connected Vertex-Cover has a (1 + ln Δ)-approximation.