Title: Better Approximations for the Minimum Common Integer Partition Problem
1. Better Approximations for the Minimum Common Integer Partition Problem
MIT and Tsinghua University
Approx 2006
2. Minimum Common Integer Partition
- $X = \{x_1, \ldots, x_r\}$ and $Y = \{y_1, \ldots, y_s\}$ are multisets of positive integers, $r \ge s$
- Consider a partition of $X$ into $s$ subsets $B_1, \ldots, B_s$
- If there exist $B_1, \ldots, B_s$ with $\sum_{b \in B_i} b = y_i$ for all $i$, then $X$ is an integer partition of $Y$. Think of $X$ as a refinement of $Y$
- k-MCIP problem: Given $Y_1, \ldots, Y_k$, find a smallest $X$ that is an integer partition of each of $Y_1, \ldots, Y_k$
- Let $m = \sum_{i=1}^{k} |Y_i|$. Efficiency is measured in terms of $m$.
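3. MCIP Example
- $Y_1 = \{2, 2, 3\}$, $Y_2 = \{1, 1, 5\}$
- Claim: $\{1, 1, 2, 3\}$ = k-MCIP($Y_1$, $Y_2$)
- Proof: Partition 1: $\{1, 1\}, \{2\}, \{3\}$
- The block sums are 2, 2, 3, which is $Y_1$
- Partition 2: $\{1\}, \{1\}, \{2, 3\}$
- The block sums are 1, 1, 5, which is $Y_2$
- So $\{1, 1, 2, 3\}$ is an integer partition of $Y_1$ and $Y_2$
- Any integer partition of both $Y_1$ and $Y_2$ has size at least 4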
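The claim on this slide can be checked by exhaustive search on small inputs. The following is a minimal sketch, not the method from the talk: the names `is_integer_partition`, `partitions_of`, and `brute_force_mcip` are hypothetical, and the search is exponential, so it is only meant for toy instances like the one above.

```python
def is_integer_partition(x, y):
    """Return True if multiset x can be split into blocks whose sums are the entries of y."""
    x = sorted(x, reverse=True)
    remaining = list(y)  # remaining[i] is the part of y[i] not yet covered

    def place(idx):
        if idx == len(x):
            return all(r == 0 for r in remaining)
        tried = set()  # skip blocks with identical remaining capacity
        for i, r in enumerate(remaining):
            if r >= x[idx] and r not in tried:
                tried.add(r)
                remaining[i] -= x[idx]
                ok = place(idx + 1)
                remaining[i] += x[idx]
                if ok:
                    return True
        return False

    return sum(x) == sum(y) and place(0)


def partitions_of(total, parts, largest=None):
    """Yield non-increasing tuples of `parts` positive integers summing to `total`."""
    if largest is None:
        largest = total
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(min(largest, total - parts + 1), 0, -1):
        for rest in partitions_of(total - first, parts - 1, first):
            yield (first,) + rest


def brute_force_mcip(multisets):
    """Smallest multiset that is an integer partition of every input (exhaustive search)."""
    total = sum(multisets[0])
    for size in range(max(len(y) for y in multisets), total + 1):
        for candidate in partitions_of(total, size):
            if all(is_integer_partition(candidate, y) for y in multisets):
                return list(candidate)
    return None


print(sorted(brute_force_mcip([[2, 2, 3], [1, 1, 5]])))  # [1, 1, 2, 3]
```

No candidate of size 3 passes the check, which matches the slide's statement that every common integer partition of these two multisets has size at least 4.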
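4. Applications
- Sequence 1: AAA-AAAAA-AA-A, with segment lengths $\{3, 5, 2, 1\}$
- Sequence 2: AA-AA-AAAA-AAA, with segment lengths $\{2, 2, 4, 3\}$
- MCIP = $\{2, 3, 1, 2, 3\}$
- Since the MCIP is small, humans and monkeys are similar (this measure has been proposed in practice [Jiang, et al.])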
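5. Applications
- Sequence 1: A-A-A-A-AA-A-AA-A-A, with segment lengths $\{1, 1, 1, 1, 2, 1, 2, 1, 1\}$
- Sequence 2: AA-AA-AAAA-AAA, with segment lengths $\{2, 2, 4, 3\}$
- MCIP = $\{1, 1, 1, 1, 1, 1, 1, 2, 2\}$
- Since the MCIP is large, humans and mice are not similar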
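The multisets on the two Applications slides are the segment lengths of the dash-separated sequences. A minimal sketch of that conversion, assuming "-" marks a segment boundary as in the slides (the helper name `segment_lengths` is hypothetical):

```python
def segment_lengths(marked_sequence, separator="-"):
    """Lengths of the non-empty segments between separators, in order."""
    return [len(segment) for segment in marked_sequence.split(separator) if segment]


print(segment_lengths("AAA-AAAAA-AA-A"))       # [3, 5, 2, 1]
print(segment_lengths("AA-AA-AAAA-AAA"))       # [2, 2, 4, 3]
print(segment_lengths("A-A-A-A-AA-A-AA-A-A"))  # [1, 1, 1, 1, 2, 1, 2, 1, 1]
```

All three multisets sum to 11, as they must for a common integer partition to exist.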
6. Applications
- DNA fingerprint assembly
- Oligonucleotide Fingerprinting Ribosomal Genes Project [Valinsky, et al.]
- Goal is to identify microbial organisms
- Use MCIP as a subroutine, $k \approx 28$, $m \approx 212$ [Jiang]
- Clustering? Scheduling?
7. Previous Work
- k-MCIP problem: Given $Y_1, \ldots, Y_k$, find a smallest $X$ that is an integer partition of each of $Y_1, \ldots, Y_k$
- [CLLJ]: NP-hard (Maximum Set Packing)
- [CLLJ]: APX-hard for every $k \ge 2$ (Maximum 3-Dimensional Matching with Bounded Degree)
8. Previous Work
- [CLLJ] upper bounds
- (5/4)-approximation for $k = 2$
- Problem: $\Omega(m^9)$ running time ($m \approx 212$ in practice)
- $(k - 1/3)$-approximation in general
- Problems:
- (1) Large ratio
- (2) Unknown whether there is a tight instance
9. Our Contributions
- $.614k + o(k)$ approximation
- $O(m \log k)$ time
- Extremely easy to implement
- If $Y_1, \ldots, Y_k$ are disjoint, then $(k+1)/2$ approximation
- We show that the CLLJ $(k - 1/3)$-approximation algorithm is actually a $(k - 1/2)$-approximation, and this is tight
10. Algorithm Overview
- Let $A$ be an algorithm for 2-MCIP. We build an algorithm $B$ for k-MCIP
- Choose a random set partition $\pi$ of $\{1, \ldots, k\}$ into pairs of integers
- For each pair $(i, j) \in \pi$, let $A_{i,j} = A(Y_i, Y_j)$
- If there is only one pair $(1, 2) \in \pi$, output $A_{1,2}$; otherwise recurse on the multisets $A_{i,j}$ with $(i, j) \in \pi$
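11. 2-MCIP Algorithm
- What is the algorithm for 2-MCIP?
- Greedy algorithm:
- Choose two integers, one from $Y_1$ and one from $Y_2$
- Take the minimum
- Subtract the minimum from both integers and append it to the output
- Remove all 0s
- Repeat
[Figure: worked example of the greedy steps on $Y_1$ and $Y_2$, with the output list]
- $|\mathrm{Greedy}(Y_1, Y_2)| < |Y_1| + |Y_2|$
- Generalization: $|\mathrm{Greedy}(Y_1, \ldots, Y_k)| \le \sum_{i=1}^{k} |Y_i| = m$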
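The greedy steps listed on this slide can be sketched as follows. The slide does not say which two integers to choose, so this sketch assumes the last entry of each list; the function name `greedy_2mcip` is hypothetical.

```python
def greedy_2mcip(y1, y2):
    """Greedy common integer partition of two multisets with equal sums."""
    a, b = list(y1), list(y2)
    if sum(a) != sum(b):
        raise ValueError("the two multisets must have equal sums")
    output = []
    while a and b:
        low = min(a[-1], b[-1])  # choose two integers and take the minimum
        output.append(low)       # append it to the output
        a[-1] -= low             # subtract it from both integers
        b[-1] -= low
        if a[-1] == 0:           # remove all 0s
            a.pop()
        if b[-1] == 0:
            b.pop()
    return output


print(greedy_2mcip([2, 2, 3], [1, 1, 5]))  # [3, 2, 1, 1]
```

Every step removes at least one input element and the final step removes two, which is why the output is strictly smaller than $|Y_1| + |Y_2|$.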
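12. Better 2-MCIP Algorithm
- CommonElements algorithm for 2-MCIP of $Y_1$, $Y_2$
- $T \leftarrow \emptyset$. While there is a common integer $x$ of $Y_1$ and $Y_2$:
- $T \leftarrow T \cup \{x\}$
- $Y_1 \leftarrow Y_1 \setminus \{x\}$
- $Y_2 \leftarrow Y_2 \setminus \{x\}$
- Output $T \cup \mathrm{Greedy}(Y_1, Y_2)$
- Let $c_{1,2}$ be the number of common integers of $Y_1$ and $Y_2$
- $|\mathrm{CommonElements}(Y_1, Y_2)| \le (|Y_1| + |Y_2| - 2c_{1,2}) + c_{1,2} = |Y_1| + |Y_2| - c_{1,2}$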
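A minimal sketch of CommonElements, assuming Python's `collections.Counter` as the multiset type. The multiset intersection `c1 & c2` removes the same elements as the while loop on the slide. The helper `greedy` repeats the greedy step (choosing the last entry of each list, which is an assumption), and both function names are hypothetical.

```python
from collections import Counter


def greedy(y1, y2):
    """Greedy 2-MCIP step: repeatedly split off the minimum of two chosen integers."""
    a, b, output = list(y1), list(y2), []
    while a and b:
        low = min(a[-1], b[-1])
        output.append(low)
        a[-1] -= low
        b[-1] -= low
        if a[-1] == 0:
            a.pop()
        if b[-1] == 0:
            b.pop()
    return output


def common_elements(y1, y2):
    """Move the integers common to both multisets into T, then run greedy on the rest."""
    c1, c2 = Counter(y1), Counter(y2)
    common = c1 & c2                        # multiset intersection, c_{1,2} elements
    t = list(common.elements())
    rest1 = list((c1 - common).elements())  # Y1 with the common integers removed
    rest2 = list((c2 - common).elements())  # Y2 with the common integers removed
    return t + greedy(rest1, rest2)


result = common_elements([2, 2, 4, 3], [3, 5, 2, 1])
print(sorted(result))  # [1, 2, 2, 3, 3]
```

On the pair of multisets from the Applications slide, $c_{1,2} = 2$, so the bound gives at most $4 + 4 - 2 = 6$ elements. The sketch returns 5, the same multiset the slide lists as the MCIP.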
13. Algorithm Recap
- Choose a random set partition $\pi$ of $\{1, \ldots, k\}$ into pairs of integers
- For each pair $(i, j) \in \pi$, let $A_{i,j} = \mathrm{CommonElements}(Y_i, Y_j)$
- If there is only one pair $(1, 2) \in \pi$, output $A_{1,2}$; otherwise recurse on the multisets $A_{i,j}$ with $(i, j) \in \pi$
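The pairing scheme can be sketched as a loop over rounds instead of an explicit recursion. This is a sketch under stated assumptions, not the talk's implementation: the slides do not say what happens when the number of multisets is odd, so an unpaired multiset is simply carried into the next round here. The 2-MCIP routine is passed in as `solve_pair`, with a plain greedy step as a stand-in (CommonElements could be passed instead). All names are hypothetical.

```python
import random


def greedy_pair(y1, y2):
    """Stand-in 2-MCIP routine; any algorithm A for 2-MCIP can be passed instead."""
    a, b, output = list(y1), list(y2), []
    while a and b:
        low = min(a[-1], b[-1])
        output.append(low)
        a[-1] -= low
        b[-1] -= low
        if a[-1] == 0:
            a.pop()
        if b[-1] == 0:
            b.pop()
    return output


def k_mcip_by_pairing(multisets, solve_pair=greedy_pair, rng=None):
    """Repeatedly pair up the multisets at random and merge each pair with solve_pair."""
    rng = rng or random.Random()
    current = [list(y) for y in multisets]
    while len(current) > 1:
        rng.shuffle(current)  # a random partition into consecutive pairs
        merged = [solve_pair(current[i], current[i + 1])
                  for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            merged.append(current[-1])  # assumption: the odd one out waits a round
        current = merged
    return current[0]


inputs = [[2, 2, 3], [1, 1, 5], [3, 4], [7]]
print(k_mcip_by_pairing(inputs, rng=random.Random(0)))
```

Because a refinement of a refinement is again a refinement, the multiset left after the last round is an integer partition of every input. The number of rounds is about $\log_2 k$.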
14. Analysis
- Lower bound opt, the size of an optimal solution, as a function of the frequency of different integers
- Find the expected output size of our algorithm as a function of the frequency of different integers
- Divide these two to get a worst-case (expected) ratio
- Derandomize using conditional expectations
15. Frequency of Integers
- Define the r-redundancy $\mathrm{Red}(r)$ to capture integer frequencies
[Figure: three example input multisets $Y_1$, $Y_2$, $Y_3$]
- Consider $r$ disjoint multisets $A_1, \ldots, A_r$ of input elements such that
- 1. Each $A_i$ contains at most one element from each input multiset
- 2. Each $A_i$ contains only one distinct integer
- $\mathrm{Red}(r) = \max_{A_1, \ldots, A_r} \sum_{i=1}^{r} |A_i|$
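The following sketch computes $\mathrm{Red}(r)$ under one reading of the definition, namely that each $A_i$ holds copies of a single integer and takes at most one element from each input multiset. That reading is an assumption, chosen because it is the one the later bounds rely on. Under it, the $j$-th group for an integer $v$ has size equal to the number of inputs containing at least $j$ copies of $v$, and $\mathrm{Red}(r)$ is the sum of the $r$ largest group sizes. The function name `redundancy` is hypothetical.

```python
from collections import Counter


def redundancy(multisets, r):
    """Red(r): best total size of r disjoint single-integer groups,
    each using at most one element per input multiset (assumed reading)."""
    counts = [Counter(y) for y in multisets]
    values = set().union(*counts)  # every integer that occurs in some input
    group_sizes = []
    for v in values:
        j = 1
        while True:
            # inputs that still have a j-th copy of v to contribute
            size = sum(1 for c in counts if c[v] >= j)
            if size == 0:
                break
            group_sizes.append(size)
            j += 1
    group_sizes.sort(reverse=True)
    return sum(group_sizes[:r])


print(redundancy([[1, 1, 3], [1, 2], [1, 3]], 2))  # 5
```

In the usage line, the best two groups are the three 1s taken from different inputs and the two 3s. When no integer occurs in two different inputs, every group has size 1 and `redundancy(multisets, r)` is simply `r` (for `r` up to $m$).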
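16. Lower Bound
- opt is the size of the k-MCIP
[Figure: bipartite graph with the elements of $Y_1, Y_2, \ldots, Y_k$ on the left and the elements of the k-MCIP on the right; vertex labels 5, 2, 3]
- There are opt right vertices, each of degree $k$
- A left vertex is joined to the elements partitioning it
- The number of degree-1 vertices on the left is at most $\mathrm{Red}(\mathrm{opt})$
- So the number of edges is at least $1 \cdot \mathrm{Red}(\mathrm{opt}) + 2(m - \mathrm{Red}(\mathrm{opt}))$
- But the number of edges is exactly $k \cdot \mathrm{opt}$. So $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$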
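17. Example
- Our bound is $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$
- If the input multisets are disjoint, $\mathrm{Red}(\mathrm{opt}) = \mathrm{opt}$
- The trivial greedy algorithm has output size at most $m$
- So the greedy algorithm is an $m/\mathrm{opt} \le (k+1)/2$ approximation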
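The step from the bound to the ratio, written out for the disjoint case (the only input is the substitution $\mathrm{Red}(\mathrm{opt}) = \mathrm{opt}$ stated on the slide):

```latex
\begin{align*}
k \cdot \mathrm{opt} &\ge 2m - \mathrm{Red}(\mathrm{opt}) = 2m - \mathrm{opt} \\
(k + 1)\,\mathrm{opt} &\ge 2m \\
\frac{m}{\mathrm{opt}} &\le \frac{k + 1}{2}
\end{align*}
```

As a check against the earlier MCIP example, $Y_1 = \{2, 2, 3\}$ and $Y_2 = \{1, 1, 5\}$ share no integer, with $k = 2$, $m = 6$, and $\mathrm{opt} = 4$. The bound reads $2 \cdot 4 \ge 12 - 4$, which holds with equality, and $m/\mathrm{opt} = 3/2 = (k+1)/2$.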
18. Algorithm Recap
- Choose a random set partition $\pi$ of $\{1, \ldots, k\}$ into pairs of integers
- For each pair $(i, j) \in \pi$, let $A_{i,j} = \mathrm{CommonElements}(Y_i, Y_j)$
- If there is only one pair $(1, 2) \in \pi$, output $A_{1,2}$; otherwise recurse on the multisets $A_{i,j}$ with $(i, j) \in \pi$
19. Upper Bound
- In some recursive call on multisets $Y_a$ and $Y_b$, we are interested in the number of common elements of $Y_a$, $Y_b$
- Since we choose a random partition of the input multisets, we can bound the expected number of common elements as a function of $\mathrm{Red}(\mathrm{opt})$
- Linearity of expectation and some calculus allow us to bound the expected number of common elements encountered over all recursive calls, in terms of $\mathrm{Red}(\mathrm{opt})$
- Use the lower bound in terms of $\mathrm{Red}(\mathrm{opt})$ to get the overall ratio
20. Upper Bound
- Each of the $O(\log k)$ recursive calls can be implemented in $O(m)$ time, so $O(m \log k)$ time overall
- Actually, the proof shows that only 3 recursive calls are necessary to get a $.614k + o(k)$ approximation
- This allows derandomization using conditional expectations in $O(m \cdot \mathrm{poly}(k))$ time
21. Conclusions and Future Work
- $.614k + o(k)$ approximation in $O(m \log k)$ time
- Improved analysis of the previous best algorithm, showing it has ratio exactly $k - 1/2$
- Upper bound uses our notion of redundancy
- Lower bound uses an adversarial argument
- Best known lower bound is $\Omega(1)$, so there is a huge gap
22. Another Example
- Consider the algorithm which repeatedly removes an integer common to all $k$ input multisets, and then runs a greedy algorithm on the remaining multisets [CLLJ06]
- Suppose $r$ common integers are removed. Then the output size is at most $(m - rk) + r$
- But $\mathrm{Red}(\mathrm{opt}) \le rk + (\mathrm{opt} - r)(k - 1)$
- Our bound is $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$
- This implies $\mathrm{opt} \ge (2m - r)/(2k - 1)$, and $(m - rk + r)/\mathrm{opt} \le k - 1/2$
- Using an adversarial argument, one can show this is tight
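The algebra behind the last two bullets, written out as a short derivation that uses only the inequalities stated on this slide:

```latex
\begin{align*}
k \cdot \mathrm{opt} &\ge 2m - \mathrm{Red}(\mathrm{opt})
  \ge 2m - rk - (\mathrm{opt} - r)(k - 1) \\
(2k - 1)\,\mathrm{opt} &\ge 2m - rk + r(k - 1) = 2m - r \\
\frac{(m - rk) + r}{\mathrm{opt}} &\le \frac{(2k - 1)(m - rk + r)}{2m - r}
  \le \frac{2k - 1}{2} = k - \frac{1}{2}
\end{align*}
```

The final inequality is equivalent to $2(m - rk + r) \le 2m - r$, that is, $r(2k - 3) \ge 0$, which holds for every $k \ge 2$.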