Better Approximations for the Minimum Common Integer Partition Problem - PowerPoint PPT Presentation

About This Presentation

Title:

Better Approximations for the Minimum Common Integer Partition Problem

Description:

Let c1,2 be the number of common integers of Y1 and Y2 ... Find the expected output size as a function of the frequency of different integers ...


Slides: 23

Provided by: davidwo1

Learn more at: http://web.mit.edu


Transcript and Presenter's Notes


1
Better Approximations for the Minimum Common
Integer Partition Problem

David Woodruff

MIT and Tsinghua University
Approx 2006
2
Minimum Common Integer Partition

$X = \{x_1, \ldots, x_r\}$ and $Y = \{y_1, \ldots, y_s\}$ are multisets of positive integers, with $r \ge s$
Consider a partition of $X$ into $s$ subsets $B_1, \ldots, B_s$
If there exist $B_1, \ldots, B_s$ with $\sum_{b \in B_i} b = y_i$ for all $i$, then $X$ is an integer partition of $Y$. Think of $X$ as a refinement of $Y$
k-MCIP problem: Given $Y_1, \ldots, Y_k$, find a smallest integer partition $X$ of each of $Y_1, \ldots, Y_k$
Let $m = \sum_{i=1}^{k} |Y_i|$. Efficiency is measured in terms of $m$.
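The definition can be turned into a checker. The following is a minimal Python sketch that assumes multisets are plain lists. The helper `is_integer_partition` is hypothetical and does not come from the talk. It uses backtracking to decide whether X can be split into blocks whose sums are exactly the elements of Y. The search is exponential in the worst case, so it is only meant for small examples.

```python
def is_integer_partition(x: list[int], y: list[int]) -> bool:
    """Return True if multiset x splits into len(y) blocks whose sums are the elements of y."""
    if sum(x) != sum(y):
        return False
    items = sorted(x, reverse=True)  # placing large integers first prunes the search early
    remaining = list(y)  # how much of each y_i is still unfilled

    def place(idx: int) -> bool:
        if idx == len(items):
            # Every item is placed and the totals agree, so each y_i is filled exactly.
            return True
        tried = set()
        for i, cap in enumerate(remaining):
            # Blocks with the same leftover capacity are interchangeable; try only one.
            if cap >= items[idx] and cap not in tried:
                tried.add(cap)
                remaining[i] -= items[idx]
                if place(idx + 1):
                    return True
                remaining[i] += items[idx]  # undo and try the next block
        return False

    return place(0)


print(is_integer_partition([1, 1, 2, 3], [2, 2, 3]))  # True
```

All elements of Y are positive, so every block that ends up filled is nonempty. This means the sum check plus the placement search matches the definition.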

3
MCIP Example

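$Y_1 = \{2, 2, 3\}$, $Y_2 = \{1, 1, 5\}$
Claim: $\{1, 1, 2, 3\}$ = k-MCIP$(Y_1, Y_2)$
Proof: Partition 1: $\{1, 1\}, \{2\}, \{3\}$ (sums 2, 2, 3)
Partition 2: $\{1\}, \{1\}, \{2, 3\}$ (sums 1, 1, 5)
So $\{1, 1, 2, 3\}$ is an integer partition of both
$Y_1$ and $Y_2$
Any integer partition of both $Y_1$ and $Y_2$ has size $\ge 4$: it needs at least 3 elements, and one with exactly 3 elements would have to equal both $Y_1$ and $Y_2$, which differ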
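Both partitions in the proof can be checked mechanically. This small Python sketch uses `is_valid`, a hypothetical helper. It confirms two things: each block list uses exactly the elements of $\{1, 1, 2, 3\}$, and the block sums reproduce $Y_1$ and $Y_2$ in order.

```python
from collections import Counter

x = [1, 1, 2, 3]
y1 = [2, 2, 3]
y2 = [1, 1, 5]

# The two partitions of X from the proof; block i is matched to the i-th element of Y.
partition_1 = [[1, 1], [2], [3]]
partition_2 = [[1], [1], [2, 3]]


def is_valid(blocks: list[list[int]], x: list[int], y: list[int]) -> bool:
    """Blocks must use exactly the elements of x, and block i must sum to y[i]."""
    used = Counter(v for block in blocks for v in block)
    return used == Counter(x) and [sum(b) for b in blocks] == y


print(is_valid(partition_1, x, y1), is_valid(partition_2, x, y2))  # True True
```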

4
Applications
AAA-AAAAA-AA-A  ->  3, 5, 2, 1
AA-AA-AAAA-AAA  ->  2, 2, 4, 3
MCIP: $\{2, 3, 1, 2, 3\}$
Since the MCIP is small, humans and monkeys are
similar (this measure has been proposed in
practice [Jiang et al.])
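Reading each string as the lengths of its '-'-separated runs turns it into a multiset of positive integers. This encoding is an assumption, but it is consistent with the numbers on the slide. The following Python sketch checks that the MCIP shown splits both sequences. The helper `block_lengths` and the particular block assignments are illustrative choices.

```python
from collections import Counter


def block_lengths(sequence: str) -> list[int]:
    """Lengths of the runs separated by '-', e.g. 'AA-A' -> [2, 1]."""
    return [len(run) for run in sequence.split("-")]


first = block_lengths("AAA-AAAAA-AA-A")   # [3, 5, 2, 1]
second = block_lengths("AA-AA-AAAA-AAA")  # [2, 2, 4, 3]
mcip = [2, 3, 1, 2, 3]

# One way to split the MCIP so that the block sums reproduce each sequence in order.
blocks_first = [[3], [2, 3], [2], [1]]
blocks_second = [[2], [2], [3, 1], [3]]

for lengths, blocks in ((first, blocks_first), (second, blocks_second)):
    assert [sum(b) for b in blocks] == lengths
    assert Counter(v for b in blocks for v in b) == Counter(mcip)
print(first, second, len(mcip))  # [3, 5, 2, 1] [2, 2, 4, 3] 5
```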
5
Applications
A-A-A-A-AA-A-AA-A-A  ->  1, 1, 1, 1, 2, 1, 2, 1, 1
AA-AA-AAAA-AAA  ->  2, 2, 4, 3
MCIP: $\{1, 1, 1, 1, 1, 1, 1, 2, 2\}$
Since the MCIP is large, humans and mice are not
similar
6
Applications

DNA fingerprint assembly
Oligonucleotide Fingerprinting Ribosomal Genes
Project [Valinsky et al.]
Goal is to identify microbial organisms
Uses MCIP as a subroutine, with $k \approx 2^8$ and $m \approx 2^{12}$
[Jiang]
Clustering? Scheduling?

7
Previous Work

k-MCIP problem: Given $Y_1, \ldots, Y_k$, find a smallest
integer partition of each of $Y_1, \ldots, Y_k$

[CLLJ] NP-hard (Maximum Set Packing)
APX-hard for every $k \ge 2$ (Maximum-3-Dimensional
Matching with Bounded Degree)
8
Previous Work

[CLLJ] Upper bounds
(5/4)-approximation for $k = 2$
Problem: $\Theta(m^9)$ running time
($m \approx 2^{12}$ in practice)
$(k - 1/3)$-approximation in general
Problems:
(1) Large ratio
(2) Unknown if there is a tight instance

9
Our Contributions

$0.614k + o(k)$ approximation
$O(m \log k)$ time
Extremely easy to implement
If $Y_1, \ldots, Y_k$ are disjoint, then a $(k+1)/2$
approximation
We show that the CLLJ $(k - 1/3)$-approximation
algorithm is actually a $(k - 1/2)$-approximation, and
this is tight

10
Algorithm Overview

Let $A$ be an algorithm for 2-MCIP. We build an
algorithm $B$ for k-MCIP:
Choose a random set partition $\Pi$ of $\{1, \ldots, k\}$ into
pairs of integers
For each pair $(i, j) \in \Pi$, let $A_{i,j} = A(Y_i, Y_j)$
If there is only one pair $(1, 2) \in \Pi$, output $A_{1,2}$;
otherwise recurse on the multisets $A_{i,j}$ with $(i, j) \in \Pi$
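The recursion can be sketched in Python as a loop over rounds. Two parts of this sketch are assumptions and do not come from the slides:

- `prefix_cut_partition` is a simple stand-in for the 2-MCIP algorithm $A$. It cuts at every partial sum of either list, which always yields a common integer partition.
- When the number of multisets is odd, the unpaired multiset is carried into the next round unchanged.

```python
import random
from itertools import accumulate


def prefix_cut_partition(y1: list[int], y2: list[int]) -> list[int]:
    """Stand-in 2-MCIP algorithm (hypothetical): cut at every partial sum of either list."""
    cuts = sorted(set(accumulate(y1)) | set(accumulate(y2)))
    return [b - a for a, b in zip([0] + cuts[:-1], cuts)]


def pairwise_mcip(multisets, pair_alg, rng=random):
    """Algorithm B: pair the inputs at random, solve each pair with pair_alg, repeat."""
    current = [list(y) for y in multisets]
    while len(current) > 1:
        rng.shuffle(current)  # pairing neighbours in a random order gives a random pairing
        nxt = [pair_alg(current[i], current[i + 1]) for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            nxt.append(current[-1])  # assumption: an unpaired multiset waits for the next round
        current = nxt
    return current[0]


print(pairwise_mcip([[2, 2, 3], [1, 1, 5]], prefix_cut_partition))  # [1, 1, 2, 3]
```

Carrying a multiset forward is safe because being an integer partition is transitive. A partition of $A_{i,j}$ is therefore also a partition of $Y_i$ and $Y_j$.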

11
2-MCIP Algorithm

What is the algorithm for 2-MCIP?
Greedy algorithm

[Figure: example run of Greedy on Y1 and Y2, building the Output multiset step by step]
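Choose two integers, one from $Y_1$ and one from $Y_2$
Take the minimum
Subtract the minimum from both integers and
append it to the output
Remove all 0s
Repeat
$|\mathrm{Greedy}(Y_1, Y_2)| < |Y_1| + |Y_2|$, since each step removes at least one integer and the last step removes two
Generalization: $|\mathrm{Greedy}(Y_1, \ldots, Y_k)| \le \sum_{i=1}^{k} |Y_i| = m$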
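This is a minimal Python sketch of Greedy, generalized to $k$ inputs as on the slide. The slide does not say which integers are chosen. The sketch assumes each multiset is a list consumed in order, so the current front of every list is the chosen integer. All inputs must have the same sum.

```python
from collections import deque


def greedy(*multisets: list[int]) -> list[int]:
    """Greedy common integer partition of k multisets that all have the same sum."""
    if len({sum(y) for y in multisets}) != 1:
        raise ValueError("all multisets must have the same sum")
    queues = [deque(y) for y in multisets]
    output = []
    # Equal sums mean every queue runs out on the same step.
    while all(queues):
        smallest = min(q[0] for q in queues)  # minimum of the chosen integers
        output.append(smallest)
        for q in queues:
            q[0] -= smallest  # subtract the minimum from each chosen integer
            if q[0] == 0:
                q.popleft()  # remove 0s
    return output


print(greedy([2, 2, 3], [1, 1, 5]))  # [1, 1, 2, 3]
```

Every step removes at least one integer and the final step removes $k$. The output therefore has at most $m - k + 1 \le m$ elements.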
12
Better 2-MCIP Algorithm

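CommonElements algorithm for 2-MCIP of $Y_1, Y_2$:
$T \leftarrow \emptyset$. While there is a common integer $x$ of $Y_1$
and $Y_2$:
$T \leftarrow T \cup \{x\}$
$Y_1 \leftarrow Y_1 \setminus \{x\}$
$Y_2 \leftarrow Y_2 \setminus \{x\}$
Output $T \cup \mathrm{Greedy}(Y_1, Y_2)$
Let $c_{1,2}$ be the number of common integers of $Y_1$ and $Y_2$ (counted with multiplicity)
$|\mathrm{CommonElements}(Y_1, Y_2)| \le (|Y_1| + |Y_2| - 2c_{1,2}) + c_{1,2}$
$= |Y_1| + |Y_2| - c_{1,2}$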
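This is a Python sketch of CommonElements. It assumes list-based multisets with equal sums and a two-input Greedy that consumes each list in order. `collections.Counter` provides the multiset intersection, and the size of that intersection is $c_{1,2}$. Removing common integers one at a time, as the loop does, removes exactly this intersection, so the sketch removes it in one step.

```python
from collections import Counter, deque


def greedy(y1: list[int], y2: list[int]) -> list[int]:
    """Two-multiset Greedy: repeatedly output and subtract the smaller front integer."""
    q1, q2, out = deque(y1), deque(y2), []
    while q1 and q2:
        smallest = min(q1[0], q2[0])
        out.append(smallest)
        for q in (q1, q2):
            q[0] -= smallest
            if q[0] == 0:
                q.popleft()
    return out


def common_elements(y1: list[int], y2: list[int]) -> list[int]:
    """CommonElements: set aside shared integers, then run Greedy on what is left."""
    shared = Counter(y1) & Counter(y2)  # multiset intersection; its size is c_{1,2}
    rest1 = list((Counter(y1) - shared).elements())
    rest2 = list((Counter(y2) - shared).elements())
    return list(shared.elements()) + greedy(rest1, rest2)


print(sorted(common_elements([3, 1, 3], [3, 4])))  # [1, 3, 3]
```

In the example, $c_{1,2} = 1$, and the output has 3 elements. This is within the bound $|Y_1| + |Y_2| - c_{1,2} = 4$.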

13
Algorithm Recap

Choose a random set partition $\Pi$ of $\{1, \ldots, k\}$ into
pairs of integers
For each pair $(i, j) \in \Pi$, let $A_{i,j} =
\mathrm{CommonElements}(Y_i, Y_j)$
If there is only one pair $(1, 2) \in \Pi$, output $A_{1,2}$;
otherwise recurse on the multisets $A_{i,j}$ with $(i, j) \in
\Pi$

14
Analysis

Lower bound the optimal output size, opt, as a
function of the frequency of different integers
Find the expected output size as a function of
the frequency of different integers
Divide these two to get a worst-case (expected)
ratio
Derandomize using conditional expectations

15
Frequency of Integers

Define the r-redundancy Red(r) to capture
integer frequencies

[Figure: example input multisets Y1, Y2, Y3]
Consider $r$ disjoint sub-multisets $A_1, \ldots, A_r$ of the input elements such that
1. Each $A_i$ contains at most one element of each input multiset
2. Each $A_i$ contains only 1 distinct integer
$\mathrm{Red}(r) = \max_{A_1, \ldots, A_r} \sum_{i=1}^{r} |A_i|$
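For small inputs, $\mathrm{Red}(r)$ can be computed directly. The Python sketch below follows our own reading of the definition and is not code from the talk. For each integer $v$, the $t$-th group of $v$ can take one copy of $v$ from every input multiset that holds at least $t$ copies. These group sizes never increase with $t$, so taking the $r$ largest group sizes over all integers gives the maximum.

```python
from collections import Counter


def redundancy(multisets: list[list[int]], r: int) -> int:
    """Red(r): max total size of r disjoint groups, each holding a single distinct
    integer and at most one element from each input multiset."""
    counts = [Counter(y) for y in multisets]
    group_sizes = []
    for value in set().union(*counts):
        copies = [c[value] for c in counts]  # multiplicity of value in each input
        for t in range(1, max(copies) + 1):
            # The t-th group of `value` draws from every input with at least t copies.
            group_sizes.append(sum(1 for n in copies if n >= t))
    group_sizes.sort(reverse=True)
    return sum(group_sizes[:r])


print(redundancy([[1, 1, 3], [1, 2], [1, 3]], 2))  # 5: groups {1, 1, 1} and {3, 3}
```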
16
Lower Bound

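opt is the size of the k-MCIP

[Figure: bipartite graph with the elements of Y1, Y2, ..., Yk as left vertices and the elements of the k-MCIP as right vertices]
There are opt right vertices, each of degree $k$
A left vertex is joined to the elements
partitioning it
A degree-1 left vertex equals a single element of the k-MCIP. Grouping these vertices by that element gives opt disjoint (possibly empty) groups, each with one distinct integer and at most one element per input multiset. So the number of degree-1 vertices on the left is at most
$\mathrm{Red}(\mathrm{opt})$. So the number of edges is at least $1 \cdot \mathrm{Red}(\mathrm{opt}) + 2(m -
\mathrm{Red}(\mathrm{opt}))$. But the number of edges is exactly $k \cdot \mathrm{opt}$. So $k \cdot
\mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$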
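The counting argument can be checked on the example from slide 3, taking $X = \{1, 1, 2, 3\}$ as the k-MCIP with opt $= 4$. In this Python sketch, which is our own illustration, each input element is a left vertex. Its degree is the number of X-elements in its block.

```python
# Slide-3 example: X = {1, 1, 2, 3} is the k-MCIP of Y1 = {2, 2, 3} and Y2 = {1, 1, 5}.
# For each input multiset, block i lists the elements of X that partition its i-th element.
partitions = {
    "Y1": [[1, 1], [2], [3]],  # 2 = 1 + 1, 2 = 2, 3 = 3
    "Y2": [[1], [1], [2, 3]],  # 1 = 1, 1 = 1, 5 = 2 + 3
}
opt = 4              # |X|
k = len(partitions)  # number of input multisets

# A left vertex (an input element) is joined to each X-element in its block.
left_degrees = [len(block) for blocks in partitions.values() for block in blocks]
m = len(left_degrees)
edges = sum(left_degrees)
degree_one = sum(1 for d in left_degrees if d == 1)

assert edges == k * opt             # each X-element lies in exactly one block per input
assert edges >= 2 * m - degree_one  # vertices of degree >= 2 contribute at least 2 edges
print(m, edges, degree_one)         # 6 8 4
```

For these inputs $\mathrm{Red}(4) = 4$, which equals the number of degree-1 vertices. This gives $k \cdot \mathrm{opt} = 8 = 2m - \mathrm{Red}(\mathrm{opt})$, so the bound is tight on this example.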
17
Example

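Our bound is $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$
If the input multisets are disjoint, $\mathrm{Red}(\mathrm{opt}) = \mathrm{opt}$
The trivial greedy algorithm has output size at most $m$
So the greedy algorithm is an $m/\mathrm{opt} \le (k+1)/2$
approximation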
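The last step can be written out in full. It combines the bound with $\mathrm{Red}(\mathrm{opt}) = \mathrm{opt}$ and with the greedy output size of at most $m$:

```latex
k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt}) = 2m - \mathrm{opt}
\;\Longrightarrow\; (k+1)\,\mathrm{opt} \ge 2m
\;\Longrightarrow\; \frac{|\mathrm{Greedy}(Y_1, \ldots, Y_k)|}{\mathrm{opt}} \le \frac{m}{\mathrm{opt}} \le \frac{k+1}{2}
```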
approximation

18
Algorithm Recap

Choose a random set partition $\Pi$ of $\{1, \ldots, k\}$ into
pairs of integers
For each pair $(i, j) \in \Pi$, let $A_{i,j} =
\mathrm{CommonElements}(Y_i, Y_j)$
If there is only one pair $(1, 2) \in \Pi$, output $A_{1,2}$;
otherwise recurse on the multisets $A_{i,j}$ with $(i, j) \in
\Pi$

19
Upper Bound

In a recursive call on multisets $Y_a$ and $Y_b$, we
are interested in the number of common elements
of $Y_a$ and $Y_b$
Since we choose a random partition of the input
multisets, we can bound the expected number of
common elements as a function of $\mathrm{Red}(\mathrm{opt})$
Linearity of expectation and some calculus
allow us to bound the expected number of common
elements encountered over all recursive calls in
terms of $\mathrm{Red}(\mathrm{opt})$
Use the lower bound in terms of $\mathrm{Red}(\mathrm{opt})$ to get the
overall ratio

20
Upper Bound

Each of the $O(\log k)$ recursive calls can be
implemented in $O(m)$ time, so the total time is $O(m \log k)$
Actually, the proof shows that only 3 recursive calls
are necessary to get a $0.614k + o(k)$ approximation
This allows derandomization using conditional
expectations in $O(m \cdot \mathrm{poly}(k))$ time

21
Conclusions and Future Work

$0.614k + o(k)$ approximation in $O(m \log k)$ time
Improved analysis of the previous best algorithm,
showing it has ratio exactly $k - 1/2$
Upper bound uses our notion of redundancy
Lower bound uses an adversarial argument
Best known lower bound is $\Omega(1)$, so there is a
huge gap

22
Another Example

Consider the algorithm which repeatedly removes an
integer common to all $k$ input multisets and then
runs a greedy algorithm on the remaining
multisets [CLLJ06]
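The following is a Python sketch of this algorithm. It assumes list-based multisets with equal sums and a Greedy that consumes each list in order. Removing common integers one at a time until none is left removes exactly the k-way multiset intersection, so the sketch removes it in one step with `Counter` intersection. The function names are illustrative and are not taken from [CLLJ06].

```python
from collections import Counter, deque


def greedy(multisets: list[list[int]]) -> list[int]:
    """k-way Greedy on equal-sum multisets, consuming each list in order."""
    queues = [deque(y) for y in multisets]
    out = []
    while queues and all(queues):
        smallest = min(q[0] for q in queues)
        out.append(smallest)
        for q in queues:
            q[0] -= smallest
            if q[0] == 0:
                q.popleft()
    return out


def remove_common_then_greedy(multisets: list[list[int]]) -> tuple[list[int], int]:
    """Remove integers common to all k inputs, then run Greedy on the rest.
    Returns the common integer partition and r, the number of removed integers."""
    common = Counter(multisets[0])
    for y in multisets[1:]:
        common &= Counter(y)  # keep only copies present in every multiset
    leftovers = [list((Counter(y) - common).elements()) for y in multisets]
    return list(common.elements()) + greedy(leftovers), sum(common.values())


out, r = remove_common_then_greedy([[3, 3, 1], [3, 4], [3, 2, 2]])
print(sorted(out), r)  # [1, 1, 2, 3] 1
```

Here $m = 8$ and $k = 3$. The output has 4 elements, which is within the bound $(m - rk) + r = 6$.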
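Suppose $r$ common integers are removed. Then the
output size is at most $(m - rk) + r$
But $\mathrm{Red}(\mathrm{opt}) \le rk + (\mathrm{opt} - r)(k - 1)$, since at most $r$ of the groups can have size $k$ (each such group uses an integer common to all $k$ multisets) and every other group has size at most $k - 1$
Our bound is $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$
This implies $\mathrm{opt} \ge (2m - r)/(2k - 1)$, and
$(m - rk + r)/\mathrm{opt} \le k - 1/2$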
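The two implications can be written out in full. The final step needs $2(m - rk + r) \le 2m - r$, that is $r(3 - 2k) \le 0$, which holds whenever $k \ge 2$:

```latex
\mathrm{Red}(\mathrm{opt}) \le rk + (\mathrm{opt} - r)(k - 1) = (k - 1)\,\mathrm{opt} + r \\
k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt}) \ge 2m - (k - 1)\,\mathrm{opt} - r
  \;\Longrightarrow\; \mathrm{opt} \ge \frac{2m - r}{2k - 1} \\
\frac{m - rk + r}{\mathrm{opt}} \le \frac{(2k - 1)(m - rk + r)}{2m - r} \le \frac{2k - 1}{2} = k - \frac{1}{2}
```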
Using an adversarial argument, one can show this is
tight