1
  • Randomized Approximation Algorithms for
  • Offline and Online Set Multicover Problems
  • Bhaskar DasGupta
  • Department of Computer Science
  • Univ of IL at Chicago
  • dasgupta@cs.uic.edu
  • Joint work with Piotr Berman (Penn State) and
    Eduardo Sontag (Rutgers)
  • collection of results that appeared in
    APPROX-2004, WADS-2005 and to appear in Discrete
    Applied Math (special issue on computational
    biology)
  • Supported by NSF grants CCR-0206795,
    CCR-0208749 and a CAREER award IIS-0346973

2
  • More interesting title for the theoretical
    computer science community
  • Randomized Approximation Algorithms for
  • Set Multicover Problems
  • with Applications to
  • Reverse Engineering of Protein and Gene Networks

3
  • More interesting title for the biological
    community
  • Randomized Approximation Algorithms for
  • Set Multicover Problems
  • with Applications to
  • Reverse Engineering of Protein and Gene Networks

4
  • Set k-multicover (SCk)
  • Input: universe U = {1,2,…,n}, sets S1,S2,…,Sm ⊆ U,
    integer (coverage factor) k ≥ 1
  • Valid solution: cover every element of the universe ≥ k
    times, i.e., a subset of indices I ⊆ {1,2,…,m} such that
    ∀x∈U: |{ j∈I : x∈Sj }| ≥ k
  • Objective: minimize the number of picked sets, |I|
  • k = 1 ⇒ simply called (unweighted) set-cover
  • a well-studied problem
  • Special case of interest in our applications:
    k is large, e.g., k = n−1

5
a = maximum size of any set
  • Known positive results
  • Set-cover (k = 1):
  • can approximate with approx. ratio of 1 + ln a
    (deterministic or randomized)
  • Johnson 1974, Chvátal 1979, Lovász 1975
  • Set-multicover (k > 1):
  • the same holds for k ≥ 1
  • e.g., primal-dual fitting (Rajagopalan and Vazirani 1999)

6
  • Known negative results for set-cover (i.e., k = 1)
  • (modulo NP ⊄ DTIME(n^{log log n}))
    an approx. ratio better than (1−ε) ln n is not possible
    for any constant 0 < ε < 1 (Feige 1998)
  • (modulo NP ≠ P)
    better than (1−ε) ln n is not possible
    for some constant 0 < ε < 1 (Raz and Safra 1997)
  • the lower bound can be generalized in terms of the
    maximum set size a:
    better than ln a − O(ln ln a) is not possible
    (Trevisan 2001)

7
  • r(a,k) = approx. ratio of an algorithm as a function of a, k
  • We know that for the greedy algorithm r(a,k) ≤ 1 + ln a
  • at every step, select the set that contains the maximum
    number of elements not yet covered k times
  • Can we design an algorithm such that r(a,k) decreases with
    increasing k?
  • possible approaches:
  • improved analysis of greedy?
  • randomized approach (LP rounding)?
  • …
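The greedy rule just described fits in a few lines. This is a minimal illustrative sketch (ours, not the speakers' code; the function name and data representation are assumptions):

```python
# A minimal sketch of greedy set k-multicover: repeatedly pick the set
# covering the most elements still covered fewer than k times.
def greedy_multicover(universe, sets, k):
    """universe: iterable of elements; sets: list of Python sets."""
    need = {x: k for x in universe}            # residual coverage demand
    picked, available = [], set(range(len(sets)))
    while any(c > 0 for c in need.values()):
        # gain of a set = number of still-deficient elements it contains
        gain = lambda i: sum(1 for x in sets[i] if need[x] > 0)
        best = max(sorted(available), key=gain)
        if gain(best) == 0:
            raise ValueError("instance is infeasible")
        for x in sets[best]:
            if need[x] > 0:
                need[x] -= 1
        picked.append(best)
        available.remove(best)
    return picked
```

For k = 1 this is the classical greedy set-cover; the slides' point is that its ratio stays Ω(log n) even when k is large.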

8
  • Our results (very roughly)
  • n = number of elements of the universe U
  • k = number of times each element must be covered
  • a = maximum size of any set
  • Greedy would not do any better:
    r(a,k) = Ω(log n) even if k is large, e.g., k = n
  • But we can design a randomized algorithm based on the
    LP-rounding approach such that the expected approx. ratio
    is better:
  • E[r(a,k)] ≤ max{2+o(1), ln(a/k)} (as appears in the
    conference proceedings)
  • further improvement (via comments from Feige):
  • E[r(a,k)] ≤ max{1+o(1), ln(a/k)}

9
  • More precise bounds on E[r(a,k)]:
  • E[r(a,k)] ≤
  •   1 + ln a                               if k = 1
  •   (1 + e^{−(k−1)/5}) ln(a/(k−1))         if a/(k−1) ≥ e² ≈ 7.4 and k > 1
  •   min{2 + 2e^{−(k−1)/5}, 2 + 0.46·a/k}   if ¼ ≤ a/(k−1) ≤ e² and k > 1
  •   1 + 2(a/k)^{1/2}                       if a/(k−1) ≤ ¼ and k > 1
10
  • Can E[r(a,k)] converge to 1 at a much faster rate?
  • Probably not... for example, the problem can be shown to be
    APX-hard for a/k ≤ 1
  • Can we prove matching lower bounds of the form
    max{1+o(1), 1+ln(a/k)}?
  • We do not know...

11
  • How about the weighted case?
  • each set has an arbitrary positive weight
  • minimize the sum of the weights of the selected sets
  • It seems that the multi-cover version may not be much easier
    than the single-cover version:
  • take a single-cover instance
  • add a few new elements, plus new must-select sets with
    almost-zero weights that cover the original elements
    k−1 times and all new elements k times

12
  • Our randomized algorithm
  • Standard LP-relaxation for set multicover (SCk):
  • selection variable xi for each set Si (1 ≤ i ≤ m)
  • minimize Σ_{i=1}^{m} xi
  • subject to Σ_{i : u∈Si} xi ≥ k for every u∈U
  •            0 ≤ xi ≤ 1 for all i
13
  • Our randomized algorithm
  • Solve the LP-relaxation
  • Select a scaling factor β carefully:
  •   β = ln a               if k = 1
  •   β = ln(a/(k−1))        if a/(k−1) ≥ e² and k > 1
  •   β = 2                  if ¼ ≤ a/(k−1) ≤ e² and k > 1
  •   β = 1 + (a/k)^{1/2}    otherwise
  • Deterministic rounding: select Si if βxi ≥ 1
  •   C0 = { Si : βxi ≥ 1 }
  • Randomized rounding: select each Si ∈ {S1,…,Sm}\C0 with
    probability βxi
  •   C1 = collection of such selected sets
  • Greedy choice: if an element u∈U is covered fewer than k
    times, pick sets from {S1,…,Sm}\(C0 ∪ C1) arbitrarily
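The three phases above can be sketched as follows. This is a hedged reconstruction (names and structure are ours); the LP-solving step is omitted and the fractional solution x is taken as input:

```python
# Sketch of the three-phase rounding: deterministic rounding, randomized
# rounding with scaled probabilities, then greedy repair of any deficit.
import math
import random

def scaling_factor(a, k):
    # the choice of beta from the slide
    if k == 1:
        return math.log(a)
    r = a / (k - 1)
    if r >= math.e ** 2:
        return math.log(r)
    if r >= 0.25:
        return 2.0
    return 1.0 + math.sqrt(a / k)

def round_multicover(x, sets, universe, k, beta):
    m = len(sets)
    C0 = {i for i in range(m) if beta * x[i] >= 1}     # deterministic rounding
    C1 = {i for i in range(m)                          # randomized rounding
          if i not in C0 and random.random() < beta * x[i]}
    cover = {u: sum(1 for i in C0 | C1 if u in sets[i]) for u in universe}
    C2 = set()                                         # greedy repair
    for u in universe:
        while cover[u] < k:
            i = next(j for j in range(m)
                     if j not in C0 | C1 | C2 and u in sets[j])
            C2.add(i)
            for v in sets[i]:
                cover[v] += 1
    return C0, C1, C2
```

When the LP solution is already integral (all xi = 1), the deterministic phase selects everything and the other two phases are empty.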

14
  • The most non-trivial part of the analysis involved proving
    the following bound for E[r(a,k)]:
  • E[r(a,k)] ≤ (1 + e^{−(k−1)/5}) ln(a/(k−1)) if
    a/(k−1) ≥ e² and k > 1
  • We needed an amortized analysis of the interaction between
    the deterministic and randomized rounding steps and the
    greedy step.
  • For a tight analysis, the standard Chernoff bounds were not
    always sufficient, so we devised more appropriate bounds
    for certain parameter ranges.

15
  • Proof of the simplest of the bounds:
  • E[r(a,k)] ≤ 1 + 2(a/k)^{1/2} if a/k ≤ ¼
  • Notational simplification:
  • α = (k/a)^{1/2} ≥ 2
  • thus β = 1 + (1/α)
  • need to show that E[r(a,k)] ≤ 1 + (2/α)
  • (x1, x2, …, xm) is the solution vector for the LP;
    thus OPT ≥ Σi xi
  • Also, obviously, OPT ≥ (n·k)/a = n·α²

16
  • Focus on a single element j∈U
  • Remember the algorithm:
  • Deterministic rounding: select Si if βxi ≥ 1
  •   C0 = { Si : βxi ≥ 1 }
  •   Let C0,j = those sets in C0 that contain j
  • Randomized rounding: select each Si ∈ {S1,…,Sm}\C0 with
    probability βxi
  •   C1 = collection of such selected sets
  •   Let C1,j = those sets in C1 that contain j
  •   p = sum of the probabilities of those sets that contain j
  • Greedy choice: if an element j∈U is covered fewer than k
    times, pick sets from {S1,…,Sm}\(C0 ∪ C1) that contain j
    arbitrarily; let C2 be all such sets selected
  •   Let C2,j = those sets in C2 that contain j

17
  • What is E[|C0| + |C1|]?
  • Obvious:
  • E[|C0| + |C1|] ≤ β·(Σi xi) ≤ (1 + α^{−1})·OPT
  • (no set is in both C0 and C1)
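The "obvious" step can be spelled out (our reconstruction): each set in C0 has βxi ≥ 1, so |C0| ≤ Σ_{Si∈C0} βxi, while E[|C1|] is exactly the sum of the selection probabilities of the remaining sets:

```latex
\mathbb{E}\big[\,|C_0|+|C_1|\,\big]
\;\le\; \sum_{S_i\in C_0}\beta x_i \;+\; \sum_{S_i\notin C_0}\beta x_i
\;=\; \beta\sum_{i=1}^{m} x_i
\;\le\; \beta\cdot\mathrm{OPT}
\;=\; (1+\alpha^{-1})\cdot\mathrm{OPT}.
```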

18
  • What is E[|C2,j|]?
  • Suppose that |C0,j| = k − f for some f
  • [slide figure: the sets S1, S2, … containing j, with the
    k − f sets of C0,j marked; since βxi < 1 for every set
    outside C0, the total probability p of the remaining sets
    containing j is bounded below]
19
  • (Focus on a single element j∈U)
  • Goal is to:
  • first determine E[|C0| + |C1|]
  • then determine E[|C2,j|]
  • sum it up over all j to get E[|C2|]
  • finally determine E[|C0| + |C1| + |C2|]

20
  • What is E[|C2,j|]? (contd.)

|C1,j| ≥ f − |C2,j|, and thus, after some algebra, the bound
follows [formula omitted in transcript]
21
  • What is E[|C2,j|]? (contd.)
22
(No Transcript)
23
  • One application
  • We used the randomized algorithm for robust string barcoding
  • Check the publications on the software webpage:
  • http://dna.engr.uconn.edu/software/barcode/
  • (joint project with Kishori Konwar, Ion Mandoiu and Alex
    Shvartsman at Univ. of Connecticut)

24
  • Another (the original) motivation for looking at
  • set-multicover
  • Reverse engineering of biological networks

25
Biological Motivation → Biological problem via Differential
Equations → Linear Algebraic formulation → Set-multicover
formulation → Randomized Algorithm → Selection of appropriate
biological experiments
27
[slide figure: the matrix equation C = A × B, with C and B of
size n×m and A of size n×n]
  • A: unknown
  • B: initially unknown, but its columns can be queried;
    the columns are linearly independent
  • C: only the zero structure of C is known
  • query the jth column Bj; get the zero structure of the jth
    column Cj
28
[slide figure: a worked numerical example of C = A × B, with
the columns of B in general position]
  • C0 = zero structure of C, known
  • A: unknown
  • B: initially unknown, but its columns can be queried
  • what is B2?
unknown
initially unknown but can query columns
29
  • Rough objective: obtain as much information about A while
    performing as few queries as possible
  • Obviously, the best we can hope for is to identify A up to
    scaling

30
[slide figure: continuing the example; after querying columns
the set J1 of queried columns j with c1j = 0 has size ≥ n−1,
so the row A1 can be recovered (up to scaling)]
31
  • Suppose we query columns Bj for j ∈ J = {j1,…,jl}
  • Let Ji = { j : j∈J and cij = 0 }
  • Suppose |Ji| ≥ n−1. Then each row Ai is uniquely determined
    up to a scalar multiple (theoretically the best possible)
  • Thus, the combinatorial question is:
  • find J of minimum cardinality such that
    |Ji| ≥ n−1 for all i

32
  • Combinatorial Question
  • Input: sets Ji ⊆ {1,2,…,n} for 1 ≤ i ≤ m
  • Valid solution: a subset Γ ⊆ {1,2,…,m} such that
    ∀ 1 ≤ i ≤ n: |{ Jγ : γ∈Γ and i∈Jγ }| ≥ n−1
  • Goal: minimize |Γ|
  • This is the set-multicover problem with coverage factor n−1
  • More generally, one can ask for a lower coverage factor,
    n−k for some k > 1, to allow fewer queries but resulting in
    ambiguous determination of A
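The reduction can be made explicit with a small helper (the name and the 0/1 matrix representation are ours, not from the talk): querying column j "covers" row i exactly when the (i,j) entry of C is known to be zero.

```python
# Build the set-multicover instance from the known zero structure of C.
def multicover_instance(c0):
    """c0: n x m 0/1 matrix; c0[i][j] == 0 means entry (i,j) of C is zero."""
    n, m = len(c0), len(c0[0])
    # J[j] = set of rows i that querying column j covers
    J = [{i for i in range(n) if c0[i][j] == 0} for j in range(m)]
    return J, n - 1          # the sets and the required coverage factor
```

Feeding J with coverage factor n−1 into any set-multicover routine (e.g., the greedy or the randomized algorithm of the earlier slides) yields a small set of columns to query.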

33
Biological problem via Differential Equations → Linear
Algebraic formulation → Combinatorial formulation →
Combinatorial Algorithms (randomized) → Selection of
appropriate biological experiments
34
  • Time evolution of the state variables (x1(t),x2(t),…,xn(t))
    is given by a set of differential equations:
  •   ∂x1/∂t = f1(x1,x2,…,xn,p1,p2,…,pm)
  •   ⋮                         (in short, ∂x/∂t = f(x,p))
  •   ∂xn/∂t = fn(x1,x2,…,xn,p1,p2,…,pm)
  • p = (p1,p2,…,pm) represents the concentrations of certain
    enzymes
  • f(x̄,p̄) = 0
  • p̄ is the wild-type (i.e., normal) condition of p
  • x̄ is the corresponding steady-state condition

35
  • Goal
  • We are interested in obtaining information about the sign of
    ∂fi/∂xj(x̄,p̄)
  • e.g., if ∂fi/∂xj > 0, then xj has a positive (catalytic)
    effect on the formation of xi

36
  • Assumption
  • We do not know f, but we do know that certain parameters pj
    do not affect certain variables xi
  • This gives the zero structure of the matrix C:
  • the matrix C0 = (c0ij) with c0ij = 0 ⇒ ∂fi/∂pj = 0

37
  • m experiments
  • change one parameter, say pk (1 ≤ k ≤ m)
  • for the perturbed p ≠ p̄, measure the steady-state vector
    x = ξ(p)
  • estimate the n sensitivities
  •   bij ≈ (ξi(p̄ + ε·ej) − ξi(p̄)) / ε
  • where ej is the jth canonical basis vector
  • consider the matrix B = (bij)
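The sensitivity estimate can be illustrated on a toy system (entirely our construction, not from the talk): a linear f(x,p) = A_true·x + p whose steady state is known in closed form.

```python
# Toy finite-difference sensitivity estimate: perturb one parameter,
# re-solve for the steady state, and record one column of B.
def steady_state(p):
    # toy system f(x, p) = A_true x + p with A_true = [[-2, 0], [0, -1]];
    # f = 0 gives x1 = p1/2, x2 = p2
    return [p[0] / 2.0, p[1] / 1.0]

def sensitivity_column(j, p_bar, eps=1e-6):
    """Finite-difference estimate of column j of B = d(steady state)/dp."""
    p = list(p_bar)
    p[j] += eps                              # perturb one parameter
    x_bar, x_new = steady_state(p_bar), steady_state(p)
    return [(x_new[i] - x_bar[i]) / eps for i in range(len(x_bar))]
```

Here B = −A_true^{-1}, so the estimated first column is close to (0.5, 0); in a real experiment `steady_state` would be a measurement, not a formula.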

38
  • In practice, a perturbation experiment involves:
  • letting the system relax to steady state
  • measuring the expression profiles of the variables xi
    (e.g., using microarrays)

39
  • Biology to linear algebra (continued)
  • Let A be the Jacobian matrix ∂f/∂x
  • Let C be the negative of the Jacobian matrix ∂f/∂p
  • From f(ξ(p),p) = 0, taking the derivative with respect to p
    and using the chain rule, we get C = AB.
  • This gives the linear algebraic formulation of the problem.
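The chain-rule step can be written out in one line; with B = ∂ξ/∂p as above:

```latex
f(\xi(p),p)\equiv 0
\;\Longrightarrow\;
\underbrace{\frac{\partial f}{\partial x}}_{A}\,
\underbrace{\frac{\partial \xi}{\partial p}}_{B}
\;+\;\frac{\partial f}{\partial p} \;=\; 0
\;\Longrightarrow\;
AB \;=\; -\frac{\partial f}{\partial p} \;=\; C.
```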

40
  • Online Set-multicover

41
  • Performance measure
  • via the competitive ratio:
  • the ratio of the total cost of the online algorithm to that
    of an optimal offline algorithm that knows the entire input
    in advance
  • For randomized algorithms, we measure the expected
    competitive ratio

42
  • Parameters of interest (for the performance measure)
  • frequency m
  • (maximum number of sets to which any presented element
    belongs); unknown
  • maximum set size d
  • (maximum number of presented elements a set contains);
    unknown
  • total number of elements in the universe n
  • (n ≥ d); unknown
  • coverage factor k; given

43
  • Previous result
  • Alon, Awerbuch, Azar, Buchbinder, and Naor
    (STOC 2003 and SODA 2004)
  • considered k = 1
  • both deterministic and randomized algorithms
  • competitive ratio O(log m log n),
    worst-case/expected
  • almost matching lower bound of
    Ω( log m log n / (log log m + log log n) )
    for deterministic algorithms and almost all parameter values
44
  • Our improved algorithm
  • expected competitive ratio of O(log m log d) instead of
    O(log m log n), where d ≤ n
  • more precisely, log₂ m · ln d plus a lower-order term, with
    small precise constants
  • the ratio improves with larger k
  • (weighted case: c = largest weight / smallest weight)
45
  • Even more precise, smaller constants for the
    unweighted k = 1 case
  • via improved analysis

46
  • Our lower bounds on the competitive ratio
    (for deterministic algorithms)
  • [formulas omitted in transcript: one bound for the
    unweighted case and one for the weighted case, holding for
    many values of the parameters]
47
  • Work concurrent with our conference publication
  • Alon, Azar and Gutner (SPAA 2005)
  • a different version of the online problem (weighted case):
  • the same element can be presented multiple times
  • if the same element is presented k times, the goal is to
    cover it by at least k different sets
  • expected competitive ratio O(log m log n)
  • easy to see that it applies to our version with the same
    bounds
  • Conversely, our algorithm and analysis can be easily adapted
    to provide an expected competitive ratio of
    log₂ m · ln(d/....) for the above version

48
  • Yet another version of online set-cover
  • Awerbuch, Azar, Fiat, Leighton (STOC 96)
  • elements presented one at a time
  • allowed to pick k sets at a given time, for a specified k
  • goal: maximize the number of presented elements for which
    at least one set containing the element was selected before
    the element was presented
  • provides efficient randomized approximation algorithms and
    matching lower bounds

49
  • Our algorithmic approach
  • a randomized version of the so-called winnowing approach
  • the (deterministic) winnowing approach was first used long
    ago:
  • N. Littlestone, Learning Quickly When Irrelevant Attributes
    Abound: A New Linear-Threshold Algorithm, Machine Learning,
    2, pp. 285-318, 1988.
  • this approach was also used by Alon, Awerbuch, Azar,
    Buchbinder and Naor in their STOC-2003 paper

50
  • Very rough description of our approach
  • every set starts with zero probability of selection
  • start with an empty solution
  • when the next element i is presented:
  • if ≥ k selected sets already contain i, skip to the next
    element
  • appropriately increase the probabilities of all sets
    containing i (the promotion step of winnowing)
  • select sets containing i with the above probabilities
  • if still fewer than k sets are selected, select more sets
    greedily:
  • select the least-cost set not selected already, then the
    next least-cost set, etc.
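The steps above can be sketched as follows for the unweighted case. This is a rough stand-in (ours): the paper's precise promotion rule is not reproduced in this transcript, so the doubling-plus-1/f update below is only an assumed placeholder with the inverse-frequency flavour the description calls for.

```python
# Sketch of the randomized winnowing scheme: promote probabilities of
# sets containing the arriving element, sample, then greedily repair.
import random

def online_multicover(stream, sets, k, rng=None):
    rng = rng or random.Random(0)
    m = len(sets)
    prob = [0.0] * m            # every set starts with zero probability
    solution = set()
    for i in stream:            # elements arrive one at a time
        containing = [j for j in range(m) if i in sets[j]]
        if sum(1 for j in containing if j in solution) >= k:
            continue            # already covered k times
        f = len(containing)     # frequency of i
        for j in containing:    # promotion step (stand-in rule)
            prob[j] = min(1.0, 2.0 * prob[j] + 1.0 / f)
        for j in containing:    # randomized selection step
            if j not in solution and rng.random() < prob[j]:
                solution.add(j)
        # greedy repair; in the weighted case, least-cost sets first
        deficit = [j for j in containing if j not in solution]
        while sum(1 for j in containing if j in solution) < k and deficit:
            solution.add(deficit.pop(0))
    return solution
```

The greedy repair guarantees feasibility whenever enough sets contain the element; the analysis in the talk concerns how rarely it has to fire.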

51
  • Many desirable (and sometimes conflicting) goals:
  • the increase in probability of each set should not be too
    large
  • else, e.g., the randomized step may select too many sets
  • the increase in probability of each set should not be too
    small
  • else, e.g., optimal sets may be missed too many times, and
    the greedy step may dominate too much
  • "light" sets should be preferred over "heavy" sets, unless
    the heavy sets are in an optimal solution
  • the increase in probability should be somehow inversely
    linked to the frequency of i, to avoid selecting too many
    sets in the randomized step

52
(No Transcript)
53
  • Slightly improved algorithm for the unweighted case
  • (the expected competitive ratio has better
    constants/asymptotics)
  • Modify the promotion step slightly:
  • [formulas omitted in transcript: change the old promotion
    rule to the new one]
54
  • New expected competitive ratio

55
  • Motivation for the online version
  • similar to before, except that:
  • we use fluorescent proteins instead of microarrays
  • fluorescent proteins can be used to measure the rate at
    which a certain gene is transcribed in a cell under a set
    of conditions
  • a priori, the matrix C is not known completely but is to be
    learnt by doing experiments

56
  • Thank you for your attention!