Better Approximations for the Minimum Common Integer Partition Problem - PowerPoint PPT Presentation

Slides: 23
Provided by: davidwo1
Learn more at: http://web.mit.edu
1
Better Approximations for the Minimum Common
Integer Partition Problem
  • David Woodruff

MIT and Tsinghua University
Approx 2006
2
Minimum Common Integer Partition
  • $X = \{x_1, \ldots, x_r\}$ and $Y = \{y_1, \ldots, y_s\}$ are multisets of
    positive integers, with $r \ge s$
  • Consider a partition of $X$ into $s$ subsets $B_1, \ldots,
    B_s$
  • If there exist $B_1, \ldots, B_s$ with $\sum_{b \in B_i} b = y_i$ for
    all $i$, then $X$ is an integer partition of $Y$. Think
    of $X$ as a refinement of $Y$
  • k-MCIP problem: given $Y_1, \ldots, Y_k$, find a smallest multiset $X$
    that is an integer partition of each of $Y_1, \ldots, Y_k$
  • Let $m = \sum_{i=1}^{k} |Y_i|$. Efficiency is measured in terms of $m$.
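To make the definition concrete, here is a minimal Python sketch of a checker for "X is an integer partition of Y". It uses plain backtracking, so it is only meant for tiny inputs. The name `is_integer_partition` is a hypothetical helper, not something from the talk.

```python
from typing import List, Sequence


def is_integer_partition(xs: Sequence[int], ys: Sequence[int]) -> bool:
    """True if the multiset xs can be split into blocks B_1..B_s with sum(B_i) == ys[i]."""
    if sum(xs) != sum(ys):
        return False
    items = sorted(xs, reverse=True)   # place large integers first to fail early
    remaining: List[int] = list(ys)    # unfilled capacity of each block B_i

    def place(idx: int) -> bool:
        if idx == len(items):
            # Sums match and no capacity was exceeded, so every block is exactly filled.
            return True
        x = items[idx]
        tried = set()
        for b, cap in enumerate(remaining):
            # Blocks with equal remaining capacity are interchangeable; try only one of them.
            if cap >= x and cap not in tried:
                tried.add(cap)
                remaining[b] -= x
                if place(idx + 1):
                    return True
                remaining[b] += x
        return False

    return place(0)


# The MCIP example: X = {1, 1, 2, 3} refines both {2, 2, 3} and {1, 1, 5}.
print(is_integer_partition([1, 1, 2, 3], [2, 2, 3]))  # True
print(is_integer_partition([1, 1, 2, 3], [1, 1, 5]))  # True
print(is_integer_partition([2, 2, 3], [1, 1, 5]))     # False
```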

3
MCIP Example
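  • $Y_1 = \{2, 2, 3\}$, $Y_2 = \{1, 1, 5\}$
  • Claim: $\{1, 1, 2, 3\}$ is a k-MCIP of $Y_1, Y_2$
  • Proof: Partition 1: $\{\{1, 1\}, \{2\}, \{3\}\}$ (sums 2, 2, 3)
  • Partition 2: $\{\{1\}, \{1\}, \{2, 3\}\}$ (sums 1, 1, 5)
  • So $\{1, 1, 2, 3\}$ is an integer partition of
    both $Y_1$ and $Y_2$
  • Any integer partition of both $Y_1$ and
    $Y_2$ has size $\ge 4$, since size 3 would force $X = Y_1 = Y_2$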
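For inputs this small, the optimum can be confirmed by exhaustive search. The sketch below enumerates candidate multisets X in order of increasing size and returns the first one that refines every input. The helpers `fits`, `partitions`, and `exact_mcip` are hypothetical. The search is exponential and only suitable for toy examples, since k-MCIP is NP-hard in general.

```python
from typing import Iterator, List, Optional, Sequence


def fits(xs: Sequence[int], ys: Sequence[int]) -> bool:
    """Backtracking test: can xs be split into blocks whose sums are exactly ys?"""
    if sum(xs) != sum(ys):
        return False
    items, remaining = sorted(xs, reverse=True), list(ys)

    def place(idx: int) -> bool:
        if idx == len(items):
            return True
        tried = set()
        for b, cap in enumerate(remaining):
            if cap >= items[idx] and cap not in tried:
                tried.add(cap)
                remaining[b] -= items[idx]
                if place(idx + 1):
                    return True
                remaining[b] += items[idx]
        return False

    return place(0)


def partitions(n: int, parts: int, largest: int) -> Iterator[List[int]]:
    """Yield the integer partitions of n into exactly `parts` parts, each <= largest."""
    if parts == 0:
        if n == 0:
            yield []
        return
    for p in range(min(n - parts + 1, largest), 0, -1):
        for rest in partitions(n - p, parts - 1, p):
            yield [p] + rest


def exact_mcip(*ys: Sequence[int]) -> Optional[List[int]]:
    """Smallest common integer partition, trying candidate sizes in increasing order."""
    n = sum(ys[0])
    if any(sum(y) != n for y in ys):
        return None                      # unequal sums: no common partition exists
    for size in range(max(len(y) for y in ys), n + 1):
        for xs in partitions(n, size, n):
            if all(fits(xs, y) for y in ys):
                return xs
    return None


print(exact_mcip([2, 2, 3], [1, 1, 5]))  # [3, 2, 1, 1]
```

The same search also confirms the MCIP sizes on the two Applications slides (5 and 9).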

4
Applications
AAA-AAAAA-AA-A (block lengths 3, 5, 2, 1)
AA-AA-AAAA-AAA (block lengths 2, 2, 4, 3)
MCIP $= \{2, 3, 1, 2, 3\}$
Since the MCIP is small, humans and monkeys are
similar (this measure has been proposed in
practice [Jiang et al.])
5
Applications
A-A-A-A-AA-A-AA-A-A (block lengths 1, 1, 1, 1, 2, 1, 2, 1, 1)
AA-AA-AAAA-AAA (block lengths 2, 2, 4, 3)
MCIP $= \{1, 1, 1, 1, 1, 1, 1, 2, 2\}$
Since the MCIP is large, humans and mice are not
similar
6
Applications
  • DNA fingerprint assembly
  • Oligonucleotide Fingerprinting Ribosomal Genes
    Project [Valinsky et al.]
  • Goal is to identify microbial organisms
  • Uses MCIP as a subroutine, with $k = 28$ and $m = 212$
    [Jiang]
  • Clustering? Scheduling?

7
Previous Work
  • k-MCIP problem: given $Y_1, \ldots, Y_k$, find a smallest
    integer partition of each of $Y_1, \ldots, Y_k$

[CLLJ] NP-hard (Maximum Set Packing)
APX-hard for every $k \ge 2$ (Maximum 3-Dimensional
Matching with Bounded Degree)
8
Previous Work
  • CLLJ Upper Bounds
  • (5/4)-approximation for $k = 2$
  • Problem: $\Omega(m^9)$ running time
  • ($m = 212$ in practice)
  • $(k - 1/3)$-approximation in general
  • Problems
  • (1) Large ratio
  • (2) Unknown whether there is a tight instance

9
Our Contributions
  • $.614k + o(k)$ approximation
  • $O(m \log k)$ time
  • Extremely easy to implement
  • If $Y_1, \ldots, Y_k$ are disjoint, then a $(k+1)/2$
    approximation
  • We show that the CLLJ $(k - 1/3)$-approximation
    algorithm is actually a $(k - 1/2)$-approximation, and
    this is tight

10
Algorithm Overview
  • Let A be an algorithm for 2-MCIP. We build an
    algorithm B for k-MCIP
  • Choose a random set partition $\pi$ of $\{1, \ldots, k\}$ into
    pairs of integers
  • For each pair $(i,j) \in \pi$, let $A_{i,j} = A(Y_i, Y_j)$
  • If there is only one pair $(1,2) \in \pi$, output $A_{1,2}$;
    otherwise recurse on the multisets $A_{i,j}$ with $(i,j) \in \pi$
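The reduction from k-MCIP to 2-MCIP can be sketched in a few lines of Python. This is an illustration, not the author's code. The names `pair_and_recurse` and `greedy_pair` are hypothetical. The handling of odd k is also an assumption (the unpaired multiset is simply carried to the next round), because the slide does not say what happens in that case. The output refines every input because a refinement of a refinement is still a refinement. Any 2-MCIP routine with the same signature can be passed in as A.

```python
import random
from typing import Callable, List, Optional

Multiset = List[int]
TwoMCIP = Callable[[Multiset, Multiset], Multiset]


def pair_and_recurse(ys: List[Multiset], two_mcip: TwoMCIP,
                     rng: Optional[random.Random] = None) -> Multiset:
    """Algorithm B: pair the inputs at random, merge each pair with A, and recurse."""
    rng = rng or random.Random()
    if len(ys) == 1:                       # one multiset left: it refines every input
        return list(ys[0])
    order = list(range(len(ys)))
    rng.shuffle(order)                     # a random pairing pi of the indices
    merged = [two_mcip(ys[i], ys[j]) for i, j in zip(order[0::2], order[1::2])]
    if len(order) % 2 == 1:                # assumption: an unpaired multiset moves on unchanged
        merged.append(list(ys[order[-1]]))
    return pair_and_recurse(merged, two_mcip, rng)


def greedy_pair(y1: Multiset, y2: Multiset) -> Multiset:
    """A simple stand-in for A: repeatedly split off the smaller of two integers."""
    assert sum(y1) == sum(y2), "inputs must have equal sums"
    a, b, out = sorted(y1), sorted(y2), []
    while a and b:
        x = min(a[-1], b[-1])
        out.append(x)
        a[-1] -= x
        b[-1] -= x
        if a[-1] == 0:
            a.pop()
        if b[-1] == 0:
            b.pop()
    return out


print(sorted(pair_and_recurse([[2, 2, 3], [1, 1, 5]], greedy_pair)))  # [1, 1, 2, 3]
```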

11
2-MCIP Algorithm
  • What is the algorithm for 2-MCIP?
  • Greedy algorithm

[Figure: example run of Greedy on Y1 and Y2, building up the Output]
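  • Choose two integers, one from $Y_1$ and one from $Y_2$
  • Take the minimum
  • Subtract the minimum from both integers and
    append it to the output
  • Remove all 0s
  • Repeat
  • $|\mathrm{Greedy}(Y_1, Y_2)| < |Y_1| + |Y_2|$
  • Generalization: $|\mathrm{Greedy}(Y_1, \ldots, Y_k)| \le \sum_{i=1}^{k} |Y_i| = m$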
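The greedy rule translates directly into code. Below is a minimal sketch, assuming the inputs are Python lists with equal sums. The slide leaves open which integer is chosen from each multiset, and this sketch arbitrarily takes the largest remaining one. The name `greedy` is hypothetical. Each round zeroes at least one chosen integer, which is why the output never exceeds m.

```python
from typing import List, Sequence


def greedy(*ys: Sequence[int]) -> List[int]:
    """Greedy common integer partition of k multisets with equal sums."""
    stacks = [sorted(y) for y in ys]             # working copies; we always use the last integer
    if len({sum(s) for s in stacks}) != 1:
        raise ValueError("all multisets must have the same sum")
    out: List[int] = []
    while stacks[0]:                             # equal sums => all stacks empty out together
        x = min(s[-1] for s in stacks)           # take the minimum of the chosen integers
        out.append(x)                            # append it to the output
        for s in stacks:
            s[-1] -= x                           # subtract it from every chosen integer
            if s[-1] == 0:
                s.pop()                          # remove 0s
    return out


print(greedy([2, 2, 3], [1, 1, 5]))             # [3, 2, 1, 1]
```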
12
Better 2-MCIP Algorithm
  • CommonElements algorithm for 2-MCIP of Y1, Y2
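  • $T \leftarrow \emptyset$. While there is a common integer $x$ of $Y_1$
    and $Y_2$:
  • $T \leftarrow T \cup \{x\}$
  • $Y_1 \leftarrow Y_1 \setminus \{x\}$
  • $Y_2 \leftarrow Y_2 \setminus \{x\}$
  • Output $T \cup \mathrm{Greedy}(Y_1, Y_2)$
  • Let $c_{1,2}$ be the number of common integers of $Y_1$ and $Y_2$
    (counted with multiplicity)
  • $|\mathrm{CommonElements}(Y_1, Y_2)| \le (|Y_1| + |Y_2| - 2c_{1,2}) + c_{1,2}$
    $= |Y_1| + |Y_2| - c_{1,2}$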
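Removing common integers one at a time until none remain is the same as removing the multiset intersection of Y1 and Y2. That makes CommonElements a short sketch with `collections.Counter`. The helper names `greedy_pair` and `common_elements` are hypothetical, and the greedy step always splits the two largest remaining integers.

```python
from collections import Counter
from typing import List, Sequence


def greedy_pair(y1: Sequence[int], y2: Sequence[int]) -> List[int]:
    """Greedy 2-MCIP: split off the minimum of the two largest remaining integers."""
    a, b, out = sorted(y1), sorted(y2), []
    while a and b:
        x = min(a[-1], b[-1])
        out.append(x)
        a[-1] -= x
        b[-1] -= x
        if a[-1] == 0:
            a.pop()
        if b[-1] == 0:
            b.pop()
    return out


def common_elements(y1: Sequence[int], y2: Sequence[int]) -> List[int]:
    """CommonElements: keep shared integers as-is, then run Greedy on what is left."""
    c1, c2 = Counter(y1), Counter(y2)
    common = c1 & c2                        # multiset intersection; its size is c_{1,2}
    t = list(common.elements())             # T: every common integer, with multiplicity
    rest1 = list((c1 - common).elements())  # Y1 with the common integers removed
    rest2 = list((c2 - common).elements())  # Y2 with the common integers removed
    return t + greedy_pair(rest1, rest2)


out = common_elements([3, 5, 2, 1], [2, 2, 4, 3])
print(sorted(out), len(out))                # [1, 1, 2, 3, 4] 5
```

Here $c_{1,2} = 2$ (the shared 2 and 3), so the bound $|Y_1| + |Y_2| - c_{1,2} = 6$ holds with room to spare.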

13
Algorithm Recap
  • Choose a random set partition $\pi$ of $\{1, \ldots, k\}$ into
    pairs of integers
  • For each pair $(i,j) \in \pi$, let
    $A_{i,j} = \mathrm{CommonElements}(Y_i, Y_j)$
  • If there is only one pair $(1,2) \in \pi$, output $A_{1,2}$;
    otherwise recurse on the multisets $A_{i,j}$ with $(i,j) \in \pi$
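Putting the pieces together gives a compact sketch of the whole randomized algorithm. As before, this is an illustration under stated assumptions, not the author's implementation. The helper names are hypothetical, the recursion is written as a loop over rounds, an unpaired multiset (odd k) is carried to the next round, and derandomization is not shown.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def greedy_pair(y1: Sequence[int], y2: Sequence[int]) -> List[int]:
    """Greedy 2-MCIP on two multisets with equal sums."""
    a, b, out = sorted(y1), sorted(y2), []
    while a and b:
        x = min(a[-1], b[-1])
        out.append(x)
        a[-1] -= x
        b[-1] -= x
        if a[-1] == 0:
            a.pop()
        if b[-1] == 0:
            b.pop()
    return out


def common_elements(y1: Sequence[int], y2: Sequence[int]) -> List[int]:
    """CommonElements: keep the multiset intersection, then run Greedy on the rest."""
    c1, c2 = Counter(y1), Counter(y2)
    common = c1 & c2
    rest1, rest2 = (c1 - common).elements(), (c2 - common).elements()
    return list(common.elements()) + greedy_pair(list(rest1), list(rest2))


def k_mcip(ys: Sequence[Sequence[int]], rng: Optional[random.Random] = None) -> List[int]:
    """Randomized pairing algorithm with CommonElements as the 2-MCIP subroutine."""
    rng = rng or random.Random()
    current = [list(y) for y in ys]
    while len(current) > 1:                    # one round per level of recursion
        order = list(range(len(current)))
        rng.shuffle(order)                     # random pairing pi
        nxt = [common_elements(current[i], current[j])
               for i, j in zip(order[0::2], order[1::2])]
        if len(order) % 2 == 1:                # assumption: odd one out is carried over
            nxt.append(current[order[-1]])
        current = nxt
    return current[0]


print(sorted(k_mcip([[3, 5, 2, 1], [2, 2, 4, 3]])))  # [1, 1, 2, 3, 4]
```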

14
Analysis
  • Lower bound the optimal output size (opt) as a
    function of the frequency of different integers
  • Find the expected output size as a function of
    the frequency of different integers
  • Divide these two to get a worst-case (expected)
    ratio
  • Derandomize using conditional expectations

15
Frequency of Integers
  • Define the r-redundancy Red(r) to capture
    integer frequencies

[Figure: example input $Y_1 = \{1, 1, 3, 4, 1\}$, $Y_2 = \{1, 1, 1, 2, 5\}$, $Y_3 = \{1, 3, 1, 3, 2\}$]
Consider r disjoint multisets $A_1, \ldots, A_r$ of input elements such that
1. Each $A_i$ contains at most one element of each input
multiset
2. Each $A_i$ contains only 1 distinct integer
$\mathrm{Red}(r) = \max_{A_1, \ldots, A_r} \sum_{i=1}^{r} |A_i|$
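Red(r) can be computed exactly by a greedy count. The sketch below assumes condition 1 means each $A_i$ takes at most one element from each input multiset; this is the reading the lower-bound argument relies on. Suppose an integer v appears $c_j(v)$ times in $Y_j$. Then the t-th set holding only copies of v can take one copy from each $Y_j$ with $c_j(v) \ge t$. These marginal gains never increase with t, so summing the r largest gains over all v gives the maximum. The function name `redundancy` is hypothetical, and the test input uses the numbers from the example figure.

```python
from collections import Counter
from typing import List, Sequence


def redundancy(ys: Sequence[Sequence[int]], r: int) -> int:
    """Red(r): max total size of r disjoint single-integer groups, at most one element per Y_j."""
    counts = [Counter(y) for y in ys]
    gains: List[int] = []
    for v in set().union(*counts):
        for t in range(1, max(c[v] for c in counts) + 1):
            # Size of the t-th group of copies of v: one copy from each Y_j holding >= t copies.
            gains.append(sum(1 for c in counts if c[v] >= t))
    gains.sort(reverse=True)
    return sum(gains[:r])


ys = [[1, 1, 3, 4, 1], [1, 1, 1, 2, 5], [1, 3, 1, 3, 2]]
print([redundancy(ys, r) for r in range(1, 6)])  # [3, 6, 8, 10, 12]
```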
16
Lower Bound
  • opt is the size of the k-MCIP

[Figure: bipartite graph with the elements of Y1, Y2, …, Yk on the left and the elements of the k-MCIP on the right]
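There are opt right vertices, each of degree k (each element of the
k-MCIP lies in exactly one block for each $Y_i$)
A left vertex is joined to the elements
partitioning it
The number of degree-1 vertices on the left is at most
$\mathrm{Red}(\mathrm{opt})$ (group them by their right neighbor: each group is a single
integer with at most one element from each $Y_i$), and every other left
vertex has degree at least 2. So, the number of edges is at least
$1 \cdot \mathrm{Red}(\mathrm{opt}) + 2(m - \mathrm{Red}(\mathrm{opt}))$. But the number of edges is exactly
$k \cdot \mathrm{opt}$. So, $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$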
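Red is nondecreasing, so the left side of $k \cdot t + \mathrm{Red}(t) \ge 2m$ strictly increases with t. Because opt satisfies this inequality, opt is at least the smallest t that does. The sketch below evaluates that computable lower bound. The names are hypothetical, and Red is computed under the same "at most one element per input multiset" reading as in the r-redundancy definition. On the MCIP example it gives 4, which matches the optimum shown there.

```python
from collections import Counter
from typing import List, Sequence


def redundancy(ys: Sequence[Sequence[int]], r: int) -> int:
    """Red(r) as the sum of the r largest marginal gains."""
    counts = [Counter(y) for y in ys]
    gains: List[int] = []
    for v in set().union(*counts):
        for t in range(1, max(c[v] for c in counts) + 1):
            gains.append(sum(1 for c in counts if c[v] >= t))
    return sum(sorted(gains, reverse=True)[:r])


def opt_lower_bound(ys: Sequence[Sequence[int]]) -> int:
    """Smallest t with k*t + Red(t) >= 2m; opt satisfies this, and the left side grows with t."""
    k, m = len(ys), sum(len(y) for y in ys)
    for t in range(1, m + 1):
        if k * t + redundancy(ys, t) >= 2 * m:
            return t
    return m


print(opt_lower_bound([[2, 2, 3], [1, 1, 5]]))        # 4
print(opt_lower_bound([[3, 5, 2, 1], [2, 2, 4, 3]]))  # 5
```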
17
Example
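  • Our bound is $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$
  • If the input multisets are disjoint, $\mathrm{Red}(\mathrm{opt}) = \mathrm{opt}$,
    so $(k+1)\,\mathrm{opt} \ge 2m$
  • The trivial greedy algorithm has output size at most $m$
  • So the greedy algorithm is an $m/\mathrm{opt} \le (k+1)/2$
    approximation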

18
Algorithm Recap
  • Choose a random set partition $\pi$ of $\{1, \ldots, k\}$ into
    pairs of integers
  • For each pair $(i,j) \in \pi$, let
    $A_{i,j} = \mathrm{CommonElements}(Y_i, Y_j)$
  • If there is only one pair $(1,2) \in \pi$, output $A_{1,2}$;
    otherwise recurse on the multisets $A_{i,j}$ with $(i,j) \in \pi$

19
Upper Bound
  • In some recursive call on multisets Ya and Yb, we
    are interested in the number of common elements
    of Ya, Yb
  • Since we choose a random partition of input
    multisets, we can bound the expected number of
    common elements as a function of Red(opt)
  • Linearity of expectation and some calculus
    allow us to bound the expected number of common
    elements encountered over all recursive calls, in
    terms of Red(opt)
  • Use the lower bound in terms of Red(opt) to get the
    overall ratio

20
Upper Bound
  • Each of the $O(\log k)$ recursive calls can be
    implemented in $O(m)$ time, so the total time is $O(m \log k)$
  • Actually, the proof shows that only 3 recursive calls
    are necessary to get a $.614k + o(k)$ approximation
  • This allows derandomization using conditional
    expectations in $O(m \cdot \mathrm{poly}(k))$ time

21
Conclusions and Future Work
  • $.614k + o(k)$ approximation in $O(m \log k)$ time
  • Improved analysis of the previous best algorithm,
    showing it has ratio exactly $k - 1/2$
  • Upper bound uses our notion of redundancy
  • Lower bound uses an adversarial argument
  • Best known lower bound is $\Omega(1)$, so there is a
    huge gap

22
Another Example
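  • Consider the algorithm that repeatedly removes an
    integer common to all k input multisets, and then
    runs a greedy algorithm on the remaining
    multisets [CLLJ06]
  • Suppose r common integers are removed. Then the
    output size is at most $(m - rk) + r$
  • But $\mathrm{Red}(\mathrm{opt}) \le rk + (\mathrm{opt} - r)(k - 1)$: the k multisets
    have exactly r integers in common (with multiplicity), so at
    most r of the $A_i$ can have size k
  • Our bound is $k \cdot \mathrm{opt} \ge 2m - \mathrm{Red}(\mathrm{opt})$
  • This implies $\mathrm{opt} \ge (2m - r)/(2k - 1)$, and
    $(m - rk + r)/\mathrm{opt} \le k - \frac{1}{2}$
  • Using an adversarial argument, one can show this is
    tight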
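The algebra behind the last two bullets can be written out as a short derivation. It uses only the two inequalities stated on this slide together with $k \ge 2$:

```latex
\begin{align*}
k\,\mathrm{opt} &\ge 2m - \mathrm{Red}(\mathrm{opt}) \ge 2m - rk - (\mathrm{opt} - r)(k-1)\\
\Longrightarrow\quad (2k-1)\,\mathrm{opt} &\ge 2m - rk + r(k-1) = 2m - r\\
\Longrightarrow\quad \frac{m - rk + r}{\mathrm{opt}} &\le \frac{(2k-1)(m - rk + r)}{2m - r} \le \frac{2k-1}{2} = k - \tfrac{1}{2}
\end{align*}
```

The final inequality is equivalent to $2(m - rk + r) \le 2m - r$, that is, $r(3 - 2k) \le 0$, which holds whenever $k \ge 2$.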