
Approximation Algorithms

Motivation

- By now we've seen many NP-Complete problems.
- We conjecture that none of them has a polynomial-time algorithm.

Motivation

- Is this a dead-end? Should we give up altogether?


Motivation

- Or maybe we can settle for good approximation algorithms?

Introduction

- Objectives
- To formalize the notion of approximation.
- To demonstrate several such algorithms.
- Overview
- Optimization and Approximation
- VERTEX-COVER, SET-COVER

Optimization

- Many of the problems we've encountered so far are really optimization problems.
- I.e., the task can be naturally rephrased as finding a maximal/minimal solution.
- For example: finding a maximal clique in a graph.

Approximation

- An algorithm which returns an answer C that is close to the optimal solution C* is called an approximation algorithm.
- Closeness is usually measured by the ratio bound ρ(n) the algorithm produces.
- This is a function that satisfies, for any input size n, max(C/C*, C*/C) ≤ ρ(n).

VERTEX-COVER

- Instance: an undirected graph G=(V,E).
- Problem: find a set C ⊆ V of minimal size s.t. for any (u,v) ∈ E, either u ∈ C or v ∈ C.

Example

Minimum VC NP-hard

- Proof: It is enough to show that the decision problem below is NP-Complete:
- Instance: an undirected graph G=(V,E) and a number k.
- Problem: to decide whether there exists a set V′ ⊆ V of size k s.t. for any (u,v) ∈ E, u ∈ V′ or v ∈ V′.

This follows immediately from the following observation.

Minimum VC NP-hard

- Observation: Let G=(V,E) be an undirected graph. The complement V\C of a vertex-cover C is an independent set of G.
- Proof: Two vertices outside a vertex-cover cannot be connected by an edge. ∎

VC - Approximation Algorithm

COR(B) 523-524

- C ← ∅
- E′ ← E
- while E′ ≠ ∅
-   do let (u,v) be an arbitrary edge of E′
-     C ← C ∪ {u, v}
-     remove from E′ every edge incident to either u or v
- return C
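The pseudocode above translates directly to Python; this is a minimal sketch, assuming edges are given as a list of vertex pairs:

```python
def vertex_cover_2approx(edges):
    """2-approximation for minimum VERTEX-COVER: repeatedly pick an
    arbitrary uncovered edge, add BOTH endpoints to the cover, and
    discard every edge incident to either of them."""
    cover = set()
    remaining = set(edges)               # E' <- E
    while remaining:                     # while E' != empty
        u, v = next(iter(remaining))     # an arbitrary edge of E'
        cover.update((u, v))             # C <- C U {u, v}
        remaining = {e for e in remaining
                     if u not in e and v not in e}
    return cover
```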

Demo

Compare this cover to the one from the example

Polynomial Time

- C ← ∅
- E′ ← E
- while E′ ≠ ∅ do
-   let (u,v) be an arbitrary edge of E′
-   C ← C ∪ {u, v}
-   remove from E′ every edge incident to either u or v
- return C

Each iteration removes at least one edge from E′, so there are at most |E| iterations, each taking polynomial time.

Correctness

- The set of vertices our algorithm returns is clearly a vertex-cover, since we iterate until every edge is covered.

How Good an Approximation is it?

Observe the set of edges our algorithm chooses: these edges are disjoint, so any vertex-cover must contain at least 1 endpoint of each of them, while our cover contains both endpoints. Hence our cover is at most twice as large as the optimal one.

The Traveling Salesman Problem

The Mission: A Tour Around the World

The Problem: Traveling Costs Money

Introduction

- Objectives
- To explore the Traveling Salesman Problem.
- Overview
- TSP Formal definition Examples
- TSP is NP-hard
- Approximation algorithm for special cases
- Inapproximability result

TSP

- Instance: a complete weighted undirected graph G=(V,E) (all weights are non-negative).
- Problem: to find a Hamiltonian cycle of minimal cost.


Polynomial Algorithm for TSP?

What about the greedy strategy: at any point, choose the closest vertex not explored yet?

The Greedy Strategy Fails

[Figure: a weighted graph on which the greedy tour is strictly more expensive than the optimal tour]


TSP is NP-hard

- The corresponding decision problem:
- Instance: a complete weighted undirected graph G=(V,E) and a number k.
- Problem: to decide whether there exists a Hamiltonian cycle whose cost is at most k.

TSP is NP-hard

(verify!)

- Theorem: HAM-CYCLE ≤p TSP.
- Proof: By the straightforward efficient reduction illustrated below: every edge of G gets weight 0, every non-edge gets weight 1, and the bound is k = 0.

HAM-CYCLE → TSP

What Next?

- We'll show an approximation algorithm for TSP,
- which yields a ratio-bound of 2,
- for cost functions which satisfy a certain property.

The Triangle Inequality

- Definition: We'll say the cost function c satisfies the triangle inequality if
- ∀u,v,w ∈ V: c(u,v) + c(v,w) ≥ c(u,w)

Approximation Algorithm

COR(B) 525-527

- 1. Grow a Minimum Spanning Tree (MST) for G.
- 2. Return the cycle resulting from a preorder walk on that tree.
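The two steps can be sketched in Python, assuming the input is a list of Euclidean points (so the triangle inequality holds); Prim's algorithm grows the MST, and a DFS emits the preorder walk. The names here are illustrative, not from the source:

```python
import math

def tsp_2approx(points):
    """2-approximation for metric TSP: build an MST (Prim's
    algorithm on the complete graph), then visit vertices in
    preorder of a DFS from the root."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    parent = [0] * n
    best = [math.inf] * n        # cheapest connection to the tree
    best[0] = 0.0
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best[i])
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v] = dist(u, v)
                parent[v] = u
    # Preorder walk of the MST yields the tour
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour
```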

Demonstration and Analysis

The cost of a minimal Hamiltonian cycle is at least the cost of an MST: deleting one edge from the cycle yields a spanning tree.

Demonstration and Analysis

The cost of a preorder walk is twice the cost of the tree, since every tree edge is traversed exactly twice.

Demonstration and Analysis

Due to the triangle inequality, the Hamiltonian cycle obtained by short-cutting the walk is no more expensive.

What About the General Case?

COR(B) 528

- We'll show TSP cannot be approximated within any constant factor ρ ≥ 1,
- by showing the corresponding gap version is NP-hard.

gap-TSP[ρ]

- Instance: a complete weighted undirected graph G=(V,E).
- Problem: to distinguish between the following two cases:
- There exists a Hamiltonian cycle whose cost is at most |V|.
- The cost of every Hamiltonian cycle is more than ρ·|V|.

[Figure: YES instances are those with min cost ≤ |V|; NO instances are those with min cost > ρ·|V|]

What Should an Algorithm for gap-TSP Return?

On instances whose min cost falls in the gap between |V| and ρ·|V|, the algorithm may return either answer (DON'T-CARE...).

gap-TSP Approximation

- Observation: An efficient approximation of factor ρ for TSP implies an efficient algorithm for gap-TSP[ρ].

gap-TSP is NP-hard

- Theorem: For any constant ρ ≥ 1,
- HAM-CYCLE ≤p gap-TSP[ρ].
- Proof idea: Edges from G cost 1. Other edges cost much more.

The Reduction Illustrated

Edges of G get cost 1; non-edges get cost ρ·|V| + 1.

HAM-CYCLE → gap-TSP

Verify (a) correctness (b) efficiency

Approximating TSP is NP-hard

- gap-TSP[ρ] is NP-hard
- ⇒ Approximating TSP within factor ρ is NP-hard.

Summary


- We've studied the Traveling Salesman Problem (TSP).
- We've seen it is NP-hard.
- Nevertheless, when the cost function satisfies the triangle inequality, there exists an approximation algorithm with ratio-bound 2.

Summary


- For the general case, we've proven there is probably no efficient approximation algorithm for TSP.
- Moreover, we've demonstrated a generic method for showing approximation problems are NP-hard.

SET-COVER

- Instance: a finite set X and a family F of subsets of X, such that every element of X belongs to at least one set in F.
- Problem: to find a subfamily C ⊆ F of minimal size which covers X, i.e., the union of the sets in C is X.

SET-COVER Example

SET-COVER is NP-Hard

- Proof: Observe the corresponding decision problem.
- Clearly, it's in NP (check!).
- We'll sketch a reduction from (decision) VERTEX-COVER to it:

VERTEX-COVER ?p SET-COVER

one element for every edge; one set for every vertex, containing the edges it covers

Greedy Algorithm

COR(B) 530-533

- C ← ∅
- U ← X
- while U ≠ ∅ do
-   select S ∈ F that maximizes |S ∩ U|
-   C ← C ∪ {S}
-   U ← U − S
- return C
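The greedy pseudocode above, sketched in Python (F is assumed to be a list of sets that together cover X):

```python
def greedy_set_cover(X, F):
    """Greedy SET-COVER: while elements remain uncovered, select
    the set S in F that covers the most still-uncovered elements."""
    U = set(X)                                # U <- X
    C = []                                    # C <- empty
    while U:                                  # while U != empty
        S = max(F, key=lambda S: len(S & U))  # maximize |S ∩ U|
        C.append(S)                           # C <- C U {S}
        U -= S                                # U <- U - S
    return C
```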

Demonstration

compare to the optimal cover


Is Being Greedy Worthwhile? How Do We Proceed From Here?

- We can easily bound the approximation ratio by log n.
- A more careful analysis yields a tight bound of ln n.

Loose Ratio-Bound

- Claim: If ∃ a cover of size k, then after k iterations the algorithm has covered at least ½ of the elements.

Proof: Suppose it doesn't, and observe the situation after k iterations: more than ½ of the elements (> n/2) are still uncovered.

Since this uncovered part can also be covered by the k sets of the optimal cover, there must be a set not chosen yet whose size is at least (n/2)·(1/k) = n/(2k).

But the greedy choice covers at least as many new elements as any unchosen set, so in each of the k iterations we've covered at least n/(2k) new elements — at least n/2 elements in total, a contradiction.

Thus every batch of k iterations covers at least half of the remaining elements. Therefore after k·log n iterations (i.e., after choosing k·log n sets) all n elements must be covered, and the bound is proved.

Tight Ratio-Bound

- Claim: The greedy algorithm approximates the optimal set-cover within factor
- H(max{ |S| : S ∈ F }),
- where H(d) = 1 + 1/2 + ... + 1/d is the d-th harmonic number.

Tight Ratio-Bound

Claim's Proof:

- Whenever the algorithm chooses a set, charge it a cost of 1.
- Split that cost between all newly covered elements.

Analysis

- That is, we charge every element x ∈ X with cx = 1 / |Si − (S1 ∪ ... ∪ Si−1)|,
- where Si is the first set which covers x.

Lemma

- Lemma: For every S ∈ F, Σx∈S cx ≤ H(|S|).
- Notation: ui = number of members of S left uncovered after i iterations; so u0 = |S|.
- Let k be the smallest index for which uk = 0.
- For 1 ≤ i ≤ k, the set Si covers ui−1 − ui elements from S.

Proof: Our greedy strategy promises that Si (1 ≤ i ≤ k) covers at least as many new elements as S does, i.e., |Si − (S1 ∪ ... ∪ Si−1)| ≥ ui−1 (since ui−1 = |S − (S1 ∪ ... ∪ Si−1)|). This observation yields

Σx∈S cx = Σ1≤i≤k (ui−1 − ui) / |Si − (S1 ∪ ... ∪ Si−1)| ≤ Σ1≤i≤k (ui−1 − ui) / ui−1.

For any b > a in ℕ, H(b) − H(a) = 1/(a+1) + ... + 1/b ≥ (b − a)·(1/b), hence

Σ1≤i≤k (ui−1 − ui) / ui−1 ≤ Σ1≤i≤k (H(ui−1) − H(ui)).

This is a telescopic sum, equal to H(u0) − H(uk) = H(|S|) − H(0) = H(|S|). ∎

Analysis

- Now we can finally complete our analysis: letting C* denote an optimal cover,
- |C| = Σx∈X cx ≤ ΣS∈C* Σx∈S cx ≤ ΣS∈C* H(|S|) ≤ |C*| · H(max{ |S| : S ∈ F }).

Summary


- As it turns out, we can sometimes find efficient approximation algorithms for NP-hard problems.
- We've seen two such algorithms:
- for VERTEX-COVER (factor 2)
- for SET-COVER (logarithmic factor).

The Subset Sum Problem

- Problem definition
- Given a finite set S and a target t, find a subset S′ ⊆ S whose elements sum to t.
- All possible sums
- S = {x1, x2, .., xn}
- Li = set of all possible sums of x1, x2, .., xi
- Example
- S = {1, 4, 5}
- L1 = {0, 1}
- L2 = {0, 1, 4, 5} = L1 ∪ (L1 + x2)
- L3 = {0, 1, 4, 5, 6, 9, 10} = L2 ∪ (L2 + x3)
- In general, Li = Li−1 ∪ (Li−1 + xi)
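The recurrence Li = Li−1 ∪ (Li−1 + xi) translates directly into a small sketch:

```python
def all_subset_sums(S):
    """Compute L_n, the set of all subset sums of S,
    via L_i = L_{i-1} ∪ (L_{i-1} + x_i)."""
    L = {0}                          # L_0 = {0}
    for x in S:
        L = L | {y + x for y in L}   # L_{i-1} ∪ (L_{i-1} + x_i)
    return sorted(L)
```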

Subset Sum, revisited

- Given a set S of numbers, find a subset S′ that adds up to some target number t.
- To find the largest possible sum that doesn't exceed t:
- T = {0}
- for each x in S
-   T = union(T, x + T)
- remove elements from T that exceed t
- return largest element in T
- (Aside: How should we implement T?)

x + T adds x to each element in the set T.

T can potentially double at each step, so the worst-case complexity is O(2^n).
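The algorithm above as a sketch, pruning sums above t inside the loop (still O(2^n) in the worst case, since T may double each iteration):

```python
def exact_subset_sum(S, t):
    """Largest subset sum that does not exceed t. T can double
    at each step, so worst-case running time is O(2^n)."""
    T = {0}
    for x in S:
        T = T | {y + x for y in T}      # union(T, x + T)
        T = {y for y in T if y <= t}    # remove elements exceeding t
    return max(T)
```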

Trimming

- To reduce the size of the set T at each stage, we apply a trimming process.
- For example, if z and y are consecutive elements with (1−d)y ≤ z < y, then remove y: z becomes its representative.
- If d = 0.1: {10, 11, 12, 15, 20, 21, 22, 23, 24, 29} → {10, 12, 15, 20, 23, 29}

Subset Sum with Trimming

- Incorporate trimming in the previous algorithm:
- T = {0}
- for each x in S
-   T = union(T, x + T)
-   T = trim(d, T)
- remove elements from T that exceed t
- return largest element in T
- Trimming only eliminates values, it doesn't create new ones. So the final result is still the sum of a subset of S that doesn't exceed t.

0 < d < 1/n

- At each stage, values in the trimmed T are within a factor somewhere between (1−d) and 1 of the corresponding values in the untrimmed T.
- The final result (after n iterations) is within a factor somewhere between (1−d)^n and 1 of the result produced by the original algorithm.

- After trimming, the ratio between successive elements in T is greater than 1/(1−d), and all of the values are between 0 and t.
- Hence the maximum number of elements in T is log_{1/(1−d)} t ≈ (log t) / d.
- This is enough to give us a polynomial bound on the running time of the algorithm.

Subset Sum Trim

- Want to reduce the size of a list by trimming:
- L: the original list
- L′: the list after trimming L
- d: trimming parameter, 0 < d < 1
- y: an element that is removed from L
- z: the corresponding (representing) element in L′ (also in L)
- (y − z)/y ≤ d, i.e., (1−d)y ≤ z ≤ y
- Example
- L = 10, 11, 12, 15, 20, 21, 22, 23, 24, 29
- d = 0.1
- L′ = 10, 12, 15, 20, 23, 29
- 11 is represented by 10: (11 − 10)/11 ≤ 0.1
- 21, 22 are represented by 20: (21 − 20)/21 ≤ 0.1, (22 − 20)/22 ≤ 0.1
- 24 is represented by 23: (24 − 23)/24 ≤ 0.1

Subset Sum Trim (2)

- Trim(L, d)  // L = y1, y2, .., ym, sorted
- L′ = {y1}
- last = y1  // the most recent element z of L′, representing elements of L
- for i = 2 to m do
-   if last < (1−d)·yi then  // yi cannot be represented by last
-     append yi onto the end of L′
-     last = yi
- return L′
- Example
- L = 10, 11, 12, 15, 20, 21, 22, 23, 24, 29
- d = 0.1
- L′ = 10, 12, 15, 20, 23, 29
- Running time: O(m)
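The Trim procedure in Python (a direct sketch of the pseudocode above; L must be sorted):

```python
def trim(L, d):
    """Trim a sorted list L: keep y only when it cannot be
    represented by the last kept element, i.e. last < (1-d)*y."""
    trimmed = [L[0]]
    last = L[0]                    # most recent kept element
    for y in L[1:]:
        if last < (1 - d) * y:     # y is not within factor (1-d) of last
            trimmed.append(y)
            last = y
    return trimmed
```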

Subset Sum Approximate Algorithm

- Approx_subset_sum(S, t, e)  // S = x1, x2, .., xn
- L0 = {0}
- for i = 1 to n do
-   Li = Li−1 ∪ (Li−1 + xi)
-   Li = Trim(Li, e/n)
-   remove elements that are greater than t from Li
- return the largest element in Ln
- Example
- S = 104, 102, 201, 101; t = 308; e = 0.20; d = e/n = 0.05
- L0 = {0}
- L1 = {0, 104}
- L2 = {0, 102, 104, 206}
- After trimming 104: L2 = {0, 102, 206}
- L3 = {0, 102, 201, 206, 303, 407}
- After trimming 206: L3 = {0, 102, 201, 303, 407}
- After removing 407: L3 = {0, 102, 201, 303}
- L4 = {0, 101, 102, 201, 203, 302, 303, 404}
- After trimming 102, 203, 303: L4 = {0, 101, 201, 302, 404}
- After removing 404: L4 = {0, 101, 201, 302}
- The algorithm returns 302 (the optimum is 104 + 102 + 101 = 307).
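Putting the pieces together in Python — a sketch that reproduces the run above (the trim step is the one-pass procedure from the Trim slide):

```python
def trim(L, d):
    """Keep an element of sorted L only when it is not within
    factor (1-d) of the last kept element."""
    trimmed, last = [L[0]], L[0]
    for y in L[1:]:
        if last < (1 - d) * y:
            trimmed.append(y)
            last = y
    return trimmed

def approx_subset_sum(S, t, e):
    """Fully polynomial-time (1-e)-approximation for subset sum."""
    n = len(S)
    L = [0]                                      # L_0 = {0}
    for x in S:
        L = sorted(set(L) | {y + x for y in L})  # L_i = L_{i-1} ∪ (L_{i-1}+x_i)
        L = trim(L, e / n)                       # Trim(L_i, e/n)
        L = [y for y in L if y <= t]             # drop elements above t
    return max(L)
```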

Subset Sum - Correctness

- The approximate solution C is not smaller than (1−e) times the optimal solution C*:
- i.e., (1−e)·C* ≤ C
- Proof
- For every element y in L there is a z in the trimmed L such that (1−e/n)·y ≤ z ≤ y.
- For every element y in the untrimmed Li there is a z in the trimmed Li such that (1−e/n)^i · y ≤ z ≤ y.
- If y* is the optimal solution, then there is a corresponding z in Ln with (1−e/n)^n · y* ≤ z ≤ y*.
- Since (1−e) ≤ (1−e/n)^n ((1−e/n)^n is increasing in n):
- (1−e)·y* ≤ (1−e/n)^n · y* ≤ z
- So the value z returned is not smaller than (1−e) times the optimal solution y*.

Subset Sum Correctness (2)

- The approximation algorithm is fully polynomial.
- Proof
- Successive elements z < z′ in the trimmed Li must satisfy z′/z > 1/(1−e/n),
- i.e., they differ by a factor of more than 1/(1−e/n).
- The number of elements in each Li is therefore at most
- log_{1/(1−e/n)} t      (t is the largest value)
- = ln t / (−ln(1−e/n))
- ≤ (ln t) / (e/n)       (by x/(1+x) ≤ ln(1+x) ≤ x, for x > −1)
- = (n ln t) / e
- So the length of each Li is polynomial,
- and hence the running time of the algorithm is polynomial.

Summary

- Not all problems are computable.
- Some problems can be solved in polynomial time (P).
- Some problems can be verified in polynomial time (NP).
- Nobody knows whether P = NP.
- But the existence of NP-complete problems is often taken as an indication that P ≠ NP.
- In the meantime, we use approximation to find good-enough solutions to hard problems.

What's Next?


- But where can we draw the line?
- Does every NP-hard problem have an approximation?
- And to within which factor?
- Can approximation be NP-hard as well?