Transcript and Presenter's Notes

Title: Multiplicative weights method: A meta algorithm with applications to linear and semi-definite programming


1
Multiplicative weights method: A meta algorithm
with applications to linear and semi-definite
programming
  • Sanjeev Arora
  • Princeton University

Based upon:
Fast algorithms for approximate SDP [FOCS '05]
√(log n) approximation to SPARSEST CUT in Õ(n²) time [FOCS '04]
The multiplicative weights update method and its applications ['05]
See also recent papers by Hazan and Kale.
2
Multiplicative update rule (long history)
n agents, with weights w_1, w_2, . . . , w_n
Update weights according to performance: w_i^{t+1} ← w_i^t · (1 + ε · performance of i)
(a minimal Python sketch of this update appears after the applications list below)
  • Applications: approximate solutions to LPs and
    SDPs, flow problems, online learning (boosting),
    derandomization of Chernoff bounds, online convex
    optimization, computational geometry,
    metric embeddings, portfolio management (see our
    survey)
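A minimal sketch of the update rule in Python (the function name, the payoff scaling to [-1, 1], and the step size eps are illustrative assumptions, not from the slides):

    # One multiplicative-weights update: w_i <- w_i * (1 + eps * performance of i).
    # performance[i] is assumed to lie in [-1, 1]; eps is the step size.
    def mw_update(weights, performance, eps):
        return [w * (1 + eps * p) for w, p in zip(weights, performance)]

    # Example: n = 4 agents, all weights initially 1, one round of payoffs.
    weights = [1.0, 1.0, 1.0, 1.0]
    weights = mw_update(weights, [1.0, -0.5, 0.0, 0.25], eps=0.1)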

3
Simplest setting: predicting the market
Payoff: 1 for a correct prediction, 0 for an incorrect one
  • N experts on TV
  • Can we perform as well as the best expert?

4
Weighted majority algorithm [LW '94]
Predict according to the weighted majority.
  • Multiplicative update (initially all w_i = 1)
  • If expert i predicted correctly: w_i^{t+1} ← w_i^t
  • If incorrectly: w_i^{t+1} ← w_i^t · (1 − ε)
  • Claim: # mistakes by algorithm ≈ 2(1+ε) · (# mistakes
    by best expert)
  • Potential: Φ^t = sum of weights = Σ_i w_i^t
    (initially n)
  • If the algorithm predicts incorrectly ⇒ Φ^{t+1} ≤ Φ^t − ε Φ^t / 2
  • Φ^T ≤ (1 − ε/2)^{m(A)} · n, where m(A) = # mistakes by the
    algorithm
  • Φ^T ≥ (1 − ε)^{m_i}, where m_i = # mistakes of expert i
  • ⇒ m(A) ≤ 2(1+ε) · m_i + O(log n / ε)   (a one-round code sketch follows below)
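An illustrative one-round sketch in Python (the function name and the 0/1 encoding of predictions are assumptions for illustration):

    # One round of the [LW '94] weighted majority rule. predictions and outcome
    # are 0/1; experts that predicted incorrectly are penalized by a (1 - eps) factor.
    def weighted_majority_round(weights, predictions, outcome, eps):
        vote_1 = sum(w for w, p in zip(weights, predictions) if p == 1)
        vote_0 = sum(w for w, p in zip(weights, predictions) if p == 0)
        algo_prediction = 1 if vote_1 >= vote_0 else 0          # weighted majority vote
        new_weights = [w if p == outcome else w * (1 - eps)     # w_i <- w_i (1 - eps) on a mistake
                       for w, p in zip(weights, predictions)]
        return algo_prediction, new_weights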

5
Generalized weighted majority [A., Hazan, Kale '05]
Set of events (possibly infinite)
n agents (experts); when event j occurs, expert i receives payoff M(i, j)
6
Generalized weighted majority [AHK '05]
Set of events (possibly infinite)
n agents, distribution p_1, p_2, . . . , p_n
Algorithm plays a distribution (p_1, ..., p_n) on the experts.
Payoff for event j: Σ_i p_i · M(i, j)
Update rule: p_i^{t+1} ← p_i^t · (1 + ε · M(i, j))
Claim: After T iterations, algorithm payoff ≥ (1 − ε) · (best expert's payoff) − O(log n / ε)
(a one-round code sketch follows below)
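A one-round sketch in Python (the function name and the explicit renormalization are illustrative assumptions; the played distribution is simply the normalized weights):

    # One round of generalized MW: p is the current distribution over experts,
    # M[i][j] in [-1, 1] is expert i's payoff on event j, eps is the step size.
    def generalized_mw_round(p, M, j, eps):
        payoff = sum(p[i] * M[i][j] for i in range(len(p)))       # algorithm's expected payoff
        p = [p[i] * (1 + eps * M[i][j]) for i in range(len(p))]   # p_i <- p_i (1 + eps M(i, j))
        total = sum(p)
        return [x / total for x in p], payoff                     # renormalize to a distribution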
7
Lagrangean relaxation
Game playing, Online optimization
Gradient descent
Chernoff bounds
Games with Matrix Payoffs
Boosting
8
Common features of MW algorithms
  • competition amongst n experts
  • Appearance of terms like
    exp( − ε · Σ_t (performance at time t) )
  • Time to get ε-approximate solutions is
    proportional to 1/ε².

9
Boosting and AdaBoost (Schapire '91,
Freund-Schapire '97)
Weak learning ⇒ Strong learning
Input: 0/1 vectors with a Yes/No label, e.g.
(white, tall, vegetarian, smoker, ...) → Has disease
(nonwhite, short, vegetarian, nonsmoker, ...) → No disease
Desired: short OR-of-AND (i.e., DNF) formula that
describes a ρ fraction of the data (if one exists), e.g.
(white ∧ vegetarian) ∨ (nonwhite ∧ nonsmoker)
ρ = 1 − ε: Strong learning; ρ = ½ + ε: Weak learning
Main idea in boosting: data points are experts, weak
learners are events. (An illustrative sketch follows below.)
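A hedged Python sketch of this view (the weak_learner interface, eps, and the final majority vote are my assumptions for illustration, not the slide's exact algorithm):

    # Boosting seen through the MW lens: data points are the "experts" and each
    # weak hypothesis is an "event". Points the weak learner gets right are
    # down-weighted, so later rounds concentrate on the hard examples.
    def boost(data, labels, weak_learner, rounds, eps):
        weights = [1.0] * len(data)                      # one weight per data point
        hypotheses = []
        for _ in range(rounds):
            h = weak_learner(data, labels, weights)      # weak hypothesis: x -> 0/1
            hypotheses.append(h)
            weights = [w * (1 - eps) if h(x) == y else w
                       for w, x, y in zip(weights, data, labels)]
        # Final strong hypothesis: majority vote over the weak hypotheses.
        return lambda x: int(2 * sum(h(x) for h in hypotheses) >= len(hypotheses))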
10
Approximate solutions to convex programming
  • e.g., Plotkin-Shmoys-Tardos '91, Young '97,
    Garg-Koenemann '99, Fleischer '99

The MW meta-algorithm gives a unified view
Note: distribution ≡ convex combination
p_i ≥ 0, Σ_i p_i = 1
If P is a convex set and each x_i ∈ P, then Σ_i p_i · x_i ∈ P
11
Solving LPs (feasibility)
Constraints: a_1·x ≥ b_1, a_2·x ≥ b_2, . . . , a_m·x ≥ b_m
x ∈ P,  P a convex domain
Weights: w_1, w_2, . . . , w_m (one per constraint)
Oracle (each query generates one event): find x ∈ P such that Σ_k w_k (a_k·x − b_k) ≥ 0
12
Solving LPs (feasibility)
Constraints: a_1·x ≥ b_1, a_2·x ≥ b_2, . . . , a_m·x ≥ b_m
x ∈ P
Weights: w_1, w_2, . . . , w_m
Update: w_k ← w_k · (1 − ε (a_k·x₀ − b_k)/ρ) for each k, where x₀ is the oracle's answer
Final solution: the average of the x vectors returned by the oracle
Oracle: find x ∈ P such that Σ_k w_k (a_k·x − b_k) ≥ 0
(a code sketch of the full loop follows below)
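A hedged Python sketch of this loop (the oracle interface, rho as a bound on |a_k·x − b_k|, and the iteration count T are assumptions for illustration):

    # MW loop for LP feasibility: maintain one weight per constraint, query an
    # oracle that satisfies the single weighted constraint, and average the answers.
    def mw_lp_feasibility(A, b, oracle, eps, T, rho):
        m = len(b)
        w = [1.0] * m                          # one weight per constraint
        xs = []
        for _ in range(T):
            x = oracle(w)                      # some x in P with sum_k w_k (a_k.x - b_k) >= 0
            xs.append(x)
            slack = [sum(aki * xi for aki, xi in zip(A[k], x)) - b[k] for k in range(m)]
            # Well-satisfied constraints lose weight, violated ones gain weight.
            w = [w[k] * (1 - eps * slack[k] / rho) for k in range(m)]
        n = len(xs[0])
        return [sum(x[i] for x in xs) / len(xs) for i in range(n)]   # average solution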
13
Performance guarantees
  • In O(ρ² log(n)/ε²) iterations, the average x is
    ε-feasible.
  • Packing-covering LPs [Plotkin, Shmoys, Tardos
    '91] (covering problem):
  • ∃? x ∈ P such that
  • ∀ j = 1, 2, ..., m: a_j·x ≥ 1
  • Want to find x ∈ P s.t. a_j·x ≥ 1 − ε
  • Assume ∀ x ∈ P: 0 ≤ a_j·x ≤ ρ
  • MW algorithm gets an ε-feasible x in O(ρ log(n)/ε²)
    iterations
14
Connection to Chernoff bounds and derandomization
Deterministic approximation algorithms for 0/1
packing/covering problems a la Raghavan-Thompson
Derandomize using pessimistic estimators of the form
exp( Σ_i t · f(y_i) )
[Young '95]: Randomized rounding without solving
the LP. The MW update rule mimics the pessimistic
estimator.
15
Semidefinite programming (Klein-Lu '97)
Application 2
Constraints: a_1·x ≥ b_1, a_2·x ≥ b_2, . . . , a_m·x ≥ b_m
x ∈ P
  • a_j and x: symmetric matrices in R^{n×n}

P = { x : x is psd, tr(x) ≤ 1 }
Oracle: max Σ_j w_j (a_j·x) over P (an eigenvalue
computation!)
16
The promise of SDPs
  • 0.878-approximation to MAX-CUT (GW '94),
    7/8-approximation to MAX-3SAT (KZ '00)
  • √(log n) approximation to SPARSEST CUT, BALANCED
    SEPARATOR (ARV '04).
  • √(log n)-approximation to MIN-2CNF
    DELETION, MIN-UNCUT (ACMM '05), MIN-VERTEX
    SEPARATOR (FHL '06), etc.

The pitfall
High running times, as high as n^4.5 in recent
works
17
Solving SDP relaxations more efficiently using MW
[AHK '05]
Problem               | Using interior point | Our result
MAXQP (e.g. MAX-CUT)  | Õ(n^3.5)             | Õ(n^1.5 N / ε^2.5) or Õ(n^3 / ε^3.5)
HAPLOFREQ             | Õ(n^4)               | Õ(n^2.5 / ε^2.5)
SCP                   | Õ(n^4)               | Õ(n^1.5 N / ε^4.5)
EMBEDDING             | Õ(n^4)               | Õ(n^3 / (d^5 ε^3.5))
SPARSEST CUT          | Õ(n^4.5)             | Õ(n^3 / ε^2)
MIN UNCUT etc.        | Õ(n^4.5)             | Õ(n^3.5 / ε^2)
18
Key difference between efficient and
not-so-efficient implementations of the MW
idea: width management (e.g., the difference
between PST '91 and GK '99).
19
Recall: the issue of width
MW: a_1·x ≥ b_1, a_2·x ≥ b_2, . . . , a_m·x ≥ b_m
  • Õ(ρ²/ε²) iterations to obtain an ε-feasible x
  • ρ = max_k |a_k·x − b_k|
  • ρ is too large!!

Oracle: find x ∈ P such that Σ_k w_k (a_k·x − b_k) ≥ 0
20
Issue 1: Dealing with width
MW: a_1·x ≥ b_1, a_2·x ≥ b_2, . . . , a_m·x ≥ b_m
  • A few high-width constraints
  • Run a hybrid of MW and ellipsoid/Vaidya
  • poly(m, log(ρ/ε)) iterations to obtain an ε-feasible
    x

Oracle: find x ∈ P such that Σ_k w_k (a_k·x − b_k) ≥ 0
21
Dealing with width (contd.)
MW: a_1·x ≥ b_1, a_2·x ≥ b_2, . . . , a_m·x ≥ b_m
  • Hybrid of MW and Vaidya
  • Õ(ρ_L²/ε²) iterations to obtain an ε-feasible x
  • ρ_L ≪ ρ

Dual ellipsoid/Vaidya
Oracle: find x ∈ P such that Σ_k w_k (a_k·x − b_k) ≥ 0
22
Issue 2: Efficient implementation of the Oracle:
fast eigenvalue computation via matrix sparsification
Sparsify C to a matrix C' with O(√n · Σ_ij |C_ij| / ε) non-zero entries, such that ‖C − C'‖ ≤ ε
  • The Lanczos algorithm effectively uses the sparsity of C
  • Similar to Achlioptas-McSherry '01, but better
    in some situations (also an easier analysis)
23
O(n²)-time algorithm to compute an O(√(log n))-approximation
to SPARSEST CUT [A., Hazan, Kale '04]
(combinatorial algorithm; improves
upon the O(n^4.5) algorithm of A., Rao,
Vazirani)
24
Sparsest Cut
(Figure: a cut S in a graph G with sparsity α(G) = 2/5, the sparsest cut.)
  • O(log n) approximation: Leighton-Rao '88
  • O(√(log n)) approximation: A., Rao, Vazirani '04
  • O(√(log n)) approximation in O(n²) time.
    (Actually finds expander flows.) A., Hazan,
    Kale '05

25
MW algorithm to find expander flows
  • Events: (s, w, z) = weights on vertices, edges, and
    cuts
  • Experts: pairs of vertices (i, j)
  • Payoff (for weights d_{i,j} on the experts): defined in
    terms of shortest paths according to the edge weights w_e
    and the cuts separating i and j
  • Fact: If events are chosen optimally, the
    distribution d_{i,j} on experts converges to a
    demand graph which is an expander flow; by
    results of Arora-Rao-Vazirani '04 this suffices to
    produce an approximate sparsest cut
26
New: Online games with matrix payoffs (joint w/
Satyen Kale '06)
Payoff is a matrix, and so is the distribution
on experts!
Uses matrix analogues of the usual inequalities:
1 + x ≤ e^x becomes I + A ⪯ e^A
Can be used to give a new primal-dual view of
SDP relaxations (⇒ easier, faster approximation
algorithms)
(an illustrative sketch of a matrix-weights step follows below)
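A hedged Python sketch of one way to realize a matrix analogue of exponential weights (the density-matrix normalization and the use of scipy's expm are my assumptions for illustration, not necessarily the exact rule in the paper):

    import numpy as np
    from scipy.linalg import expm

    # Matrix-weights step: the "weights" become a density matrix (psd, trace 1)
    # obtained by exponentiating the accumulated symmetric matrix payoffs.
    def matrix_mw_density(payoff_matrices, eps):
        S = eps * sum(payoff_matrices)     # accumulated, scaled matrix payoffs
        W = expm(S)                        # matrix analogue of exp(eps * total payoff)
        return W / np.trace(W)             # normalize to trace 1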
27
New: Faster algorithms for online learning and
portfolio management (Agarwal-Hazan '06,
Agarwal-Hazan-Kalai-Kale '06)
  • Framework for online optimization inspired by
    Newton's method (2nd-order optimization). (Note:
    MW ≈ gradient descent)
  • Fast algorithms for Portfolio management and
    other online optimization problems

28
Open problems
  • Better approaches to width management?
  • Faster run times?
  • Lower bounds?

THANK YOU
29
Connection to Chernoff bounds and derandomization
  • Deterministic approximation algorithms a la
    Raghavan-Thompson
  • Packing/covering IP with variables x_i ∈ {0, 1}
  • ∃? x ∈ P such that ∀ j ∈ [m], f_j(x) ≥ 0
  • Solve the LP relaxation using variables y_i ∈ [0, 1]
  • Randomized rounding:
  • x_i ← 1 with probability y_i
  • Chernoff: O(log m) sampling iterations suffice
    (a sketch follows below)
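A hedged Python sketch of one covering-style instantiation of this rounding (the repetition/union scheme and the function name are assumptions for illustration):

    import random

    # Raghavan-Thompson style randomized rounding for a covering problem:
    # y is a fractional LP solution in [0,1]^n; in each of O(log m) rounds we set
    # x_i = 1 independently with probability y_i, and keep the union of the rounds.
    def randomized_round(y, rounds):
        x = [0] * len(y)
        for _ in range(rounds):
            for i, yi in enumerate(y):
                if random.random() < yi:
                    x[i] = 1
        return x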

30
Derandomization [Young '95]
  • Can derandomize the rounding using exp(t · Σ_j f_j(x))
    as a pessimistic estimator of the failure probability
  • By minimizing the estimator in every iteration,
    we mimic the random experiment, so O(log m) iterations
    suffice
  • The structure of the estimator obviates the need
    to solve the LP: randomized rounding without
    solving the linear program
  • Punchline: the resulting algorithm is the MW
    algorithm!

31
Weighted majority [LW '94]
  • If the algorithm lost at time t: Φ^{t+1} ≤ (1 − ½ε) · Φ^t
  • At time T: Φ^T ≤ (1 − ½ε)^{# mistakes} · Φ^0
  • Overall: # mistakes ≤ O(log(n)/ε) + 2(1+ε) · m_i

where m_i = # mistakes of expert i
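Filling in the step between the two potential bounds, as a worked calculation in LaTeX notation (standard reasoning, using \ln\frac{1}{1-\epsilon/2} \ge \epsilon/2 and \ln\frac{1}{1-\epsilon} \le \epsilon + \epsilon^2 for \epsilon \le 1/2):

    (1-\epsilon)^{m_i} \;\le\; \Phi^T \;\le\; \Big(1-\frac{\epsilon}{2}\Big)^{m(A)} n
    \;\Longrightarrow\;
    m(A)\,\ln\frac{1}{1-\epsilon/2} \;\le\; \ln n \;+\; m_i \ln\frac{1}{1-\epsilon}
    \;\Longrightarrow\;
    m(A) \;\le\; \frac{2\ln n}{\epsilon} \;+\; 2(1+\epsilon)\, m_i .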
32
Semidefinite programming
  • Vectors a_j and x: symmetric matrices in R^{n×n}
  • x ⪰ 0
  • Assume Tr(x) ≤ 1
  • Set P = { x : x ⪰ 0, Tr(x) ≤ 1 }
  • Oracle: max Σ_j w_j (a_j·x) over P
  • Optimum: x = vv^T, where v is the top
    eigenvector of Σ_j w_j a_j

33
Efficiently implementing the oracle
  • Optimum: x = vv^T
  • v is the top eigenvector of some matrix C
  • Suffices to find a vector v such that v^T C v ≥ 0
  • The Lanczos algorithm with a random starting vector
    is ideal for this
  • Advantage: uses only matrix-vector products
  • Exploits sparsity (also the sparsification
    procedure)
  • Uses the analysis of Kuczynski and Wozniakowski '92
    (a sketch of the oracle follows below)
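A hedged Python sketch of the oracle (an exact eigendecomposition via numpy stands in for the randomized Lanczos method the slide recommends; names are illustrative):

    import numpy as np

    # Oracle for max sum_j w_j (a_j . x) over P = {x psd, Tr(x) <= 1}:
    # form C = sum_j w_j a_j and return the rank-one matrix built from its
    # top eigenvector (valid when the top eigenvalue is nonnegative).
    def sdp_oracle(w, a_matrices):
        C = sum(wj * aj for wj, aj in zip(w, a_matrices))
        eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
        v = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
        return np.outer(v, v)                    # x = v v^T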