Title: Multiplicative weights method: A meta algorithm with applications to linear and semi-definite programming
Slide 1: Multiplicative weights method: a meta-algorithm with applications to linear and semi-definite programming
- Sanjeev Arora
- Princeton University
Based upon:
- "Fast algorithms for approximate SDP" (FOCS '05)
- "√log(n) approximation to SPARSEST CUT in Õ(n²) time" (FOCS '04)
- "The multiplicative weights update method and its applications" ('05)
See also recent papers by Hazan and Kale.
Slide 2: Multiplicative update rule (long history)
n agents, with weights w_1, w_2, …, w_n.
Update weights according to performance:
  w_i^{t+1} ← w_i^t (1 + ε · performance of i)
- Applications: approximate solutions to LPs and SDPs, flow problems, online learning (boosting), derandomization of Chernoff bounds, online convex optimization, computational geometry, metric embeddings, portfolio management (see our survey)
Slide 3: Simplest setting: predicting the market
- N experts on TV; payoff 1 for a correct prediction, 0 for an incorrect one.
- Can we perform as well as the best expert?
Slide 4: Weighted majority algorithm [LW '94]
Predict according to the weighted majority.
- Multiplicative update (initially all w_i = 1):
  - If expert i predicted correctly: w_i^{t+1} ← w_i^t
  - If incorrectly: w_i^{t+1} ← w_i^t (1 − ε)
- Claim: mistakes by algorithm ≈ 2(1+ε) · (mistakes by best expert)
- Potential: Φ^t = sum of weights = Σ_i w_i^t (initially n)
- If the algorithm predicts incorrectly, then Φ^{t+1} ≤ Φ^t − ε·Φ^t/2
- So Φ^T ≤ (1 − ε/2)^{m(A)} · n, where m(A) = mistakes by the algorithm
- Also Φ^T ≥ (1 − ε)^{m_i}, where m_i = mistakes by expert i
- ⇒ m(A) ≤ 2(1+ε)·m_i + O(log n / ε)
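The update and potential argument above can be sketched directly in code. This is our own minimal Python illustration (the function name and toy data are ours, not from the talk):

```python
def weighted_majority(expert_preds, outcomes, eps=0.5):
    """Deterministic weighted majority, in the spirit of [LW '94].

    expert_preds[t][i] is expert i's 0/1 prediction at time t;
    outcomes[t] is the true 0/1 outcome at time t.
    Returns (mistakes by the algorithm, mistakes per expert).
    """
    n = len(expert_preds[0])
    w = [1.0] * n                      # initially all w_i = 1
    alg_mistakes = 0
    expert_mistakes = [0] * n
    for preds, y in zip(expert_preds, outcomes):
        # Predict with the weighted majority vote of the experts.
        ones = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if 2 * ones >= sum(w) else 0
        if guess != y:
            alg_mistakes += 1
        # Multiplicative penalty for every expert that was wrong.
        for i, p in enumerate(preds):
            if p != y:
                expert_mistakes[i] += 1
                w[i] *= (1 - eps)
    return alg_mistakes, expert_mistakes
```

On any input sequence, the potential argument above guarantees m(A) ≤ 2(1+ε)·min_i m_i + 2 ln(n)/ε for ε ≤ 1/2.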
Slide 5: Generalized weighted majority [A., Hazan, Kale '05]
- Set of events (possibly infinite)
- n agents (experts)
- On event j, expert i receives payoff M(i, j)
Slide 6: Generalized weighted majority [AHK '05]
- Set of events (possibly infinite); n agents
- Algorithm plays a distribution on experts (p_1, p_2, …, p_n)
- Payoff for event j: Σ_i p_i M(i, j)
- Update rule: p_i^{t+1} ← p_i^t (1 + ε M(i, j))
- Claim: after T iterations, algorithm payoff ≥ (1 − ε) · (best expert's payoff) − O(log n / ε)
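The claim on this slide can be exercised on a small instance. A minimal Python sketch of the generalized rule, assuming payoffs in [0, 1] (names and data are our own illustration):

```python
def generalized_mw(M, events, eps=0.1):
    """Generalized weighted majority sketch.

    M[i][j] = payoff M(i, j), assumed here to lie in [0, 1];
    events is the observed sequence of event indices j.
    Returns (algorithm's total payoff, each expert's total payoff).
    """
    n = len(M)
    w = [1.0] * n
    alg = 0.0
    experts = [0.0] * n
    for j in events:
        total = sum(w)
        p = [wi / total for wi in w]          # distribution played this round
        alg += sum(p[i] * M[i][j] for i in range(n))
        for i in range(n):
            experts[i] += M[i][j]
            w[i] *= (1 + eps * M[i][j])       # p_i^{t+1} ∝ p_i^t (1 + ε M(i,j))
    return alg, experts
```

For payoffs in [0, 1] the guarantee reads: algorithm payoff ≥ (1 − ε)·(best expert's payoff) − ln(n)/ε.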
Slide 7: Connections
- Lagrangean relaxation
- Game playing, online optimization
- Gradient descent
- Chernoff bounds
- Games with matrix payoffs
- Boosting
Slide 8: Common features of MW algorithms
- Competition amongst n experts
- Appearance of terms like exp(−ε Σ_t (performance at time t))
- Time to get ε-approximate solutions is proportional to 1/ε².
Slide 9: Boosting and AdaBoost (Schapire '91, Freund-Schapire '97)
Weak learning ⇒ strong learning
Input: 0/1 vectors with a Yes/No label, e.g.
  (white, tall, vegetarian, smoker, …) → has disease
  (nonwhite, short, vegetarian, nonsmoker, …) → no disease
Desired: a short OR-of-ANDs (i.e., DNF) formula that describes a γ fraction of the data (if one exists), e.g.
  (white ∧ vegetarian) ∨ (nonwhite ∧ nonsmoker)
γ ≥ 1 − ε: strong learning; γ ≥ ½ + ε: weak learning
Main idea in boosting: data points are experts, weak
learners are events.
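A toy illustration of "data points are experts" in Python. This is our own sketch, not the talk's algorithm: the dataset, the stump hypothesis class, and all names are invented for illustration.

```python
def boost_majority(points, labels, hypotheses, rounds=15, eps=0.5):
    """Boosting viewed as MW: maintain weights on the data points.

    Each round, the weak learner is simulated by picking the hypothesis
    with the best weighted accuracy; points it classifies correctly are
    down-weighted, so later rounds focus on the hard points.
    The final classifier is the majority vote of the chosen hypotheses.
    """
    w = [1.0] * len(points)
    chosen = []
    for _ in range(rounds):
        def weighted_acc(h):
            return sum(wi for wi, x, y in zip(w, points, labels) if h(x) == y)
        h = max(hypotheses, key=weighted_acc)
        chosen.append(h)
        # MW update: shrink the weight of correctly handled points.
        w = [wi * (1 - eps) if h(x) == y else wi
             for wi, x, y in zip(w, points, labels)]

    def majority_vote(x):
        return 1 if 2 * sum(h(x) for h in chosen) >= len(chosen) else 0
    return majority_vote
```

For example, with 1-D data labelled by a threshold and threshold "stumps" as the hypothesis pool, the boosted majority vote fits the data.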
Slide 10: Approximate solutions to convex programming
- e.g., Plotkin-Shmoys-Tardos '91, Young '97, Garg-Koenemann '99, Fleischer '99
- The MW meta-algorithm gives a unified view.
Note: distribution = convex combination:
  p_i ≥ 0, Σ_i p_i = 1
If P is a convex set and each x_i ∈ P, then Σ_i p_i x_i ∈ P.
Slide 11: Solving LPs (feasibility)
Constraints:
  a_1 · x ≥ b_1, a_2 · x ≥ b_2, …, a_m · x ≥ b_m, x ∈ P (a convex domain)
Maintain weights w_1, w_2, …, w_m on the constraints.
Oracle: given the weights, find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Event = the Oracle's response.
Slide 12: Solving LPs (feasibility), contd.
Same setup: a_1 · x ≥ b_1, …, a_m · x ≥ b_m, x ∈ P, with weights w_1, …, w_m.
Update rule, where x is the Oracle's current response and ρ bounds the width:
  w_k ← w_k (1 − ε (a_k · x − b_k)/ρ)   for each k
Final solution: the average of the x vectors returned by the Oracle.
Oracle: find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
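A minimal sketch of this loop, assuming P is given as the convex hull of a small vertex list so the Oracle is a trivial maximization over vertices (all names and the toy instance are ours):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mw_lp_feasibility(A, b, vertices, rounds=2000, eps=0.1):
    """MW sketch for LP feasibility: find x in conv(vertices) with A x >= b.

    The Oracle maximizes the weighted slack sum_k w_k (a_k . x - b_k)
    over the vertices; rho bounds |a_k . x - b_k| (the 'width').
    Returns the average of the Oracle's answers, or None on infeasibility.
    """
    m = len(A)
    rho = max(abs(dot(A[k], v) - b[k]) for k in range(m) for v in vertices)
    w = [1.0] * m
    avg = [0.0] * len(vertices[0])
    for _ in range(rounds):
        x = max(vertices,
                key=lambda v: sum(w[k] * (dot(A[k], v) - b[k]) for k in range(m)))
        if sum(w[k] * (dot(A[k], x) - b[k]) for k in range(m)) < 0:
            return None                     # Oracle failed: system is infeasible
        avg = [a + xi / rounds for a, xi in zip(avg, x)]
        # Satisfied constraints lose weight, violated ones gain.
        w = [w[k] * (1 - eps * (dot(A[k], x) - b[k]) / rho) for k in range(m)]
    return avg
```

For instance, on the probability simplex in R² with the constraints x_1 ≥ 0.4 and x_2 ≥ 0.4, the averaged answer is approximately feasible.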
Slide 13: Performance guarantees
- In O(ρ² log(n)/ε²) iterations, the average x is ε-feasible.
- Packing-covering LPs [Plotkin, Shmoys, Tardos '91]:
  - ∃? x ∈ P s.t. for all j = 1, 2, …, m: a_j · x ≥ 1
  - Want to find x ∈ P s.t. a_j · x ≥ 1 − ε
  - Assume ∀ x ∈ P: 0 ≤ a_j · x ≤ ρ
  - The MW algorithm gets an ε-feasible x in O(ρ log(n)/ε²) iterations.
(This is a covering problem.)
Slide 14: Connection to Chernoff bounds and derandomization
- Deterministic approximation algorithms for 0/1 packing/covering problems à la Raghavan-Thompson.
- Derandomize using pessimistic estimators of the form exp(Σ_i t f(y_i)).
- Young '95: randomized rounding without solving the LP; the MW update rule mimics the pessimistic estimator.
Slide 15: Semidefinite programming (Klein-Lu '97)
Application 2: the same feasibility framework, a_1 · x ≥ b_1, …, a_m · x ≥ b_m, x ∈ P,
where the a_j and x are symmetric matrices in R^{n×n} and
  P = { x : x is psd, tr(x) = 1 }.
Oracle: max Σ_j w_j (a_j · x) over P — this is an eigenvalue computation!
Slide 16: The promise of SDPs
- 0.878-approximation to MAX-CUT (GW '94), 7/8-approximation to MAX-3SAT (KZ '00)
- √log n approximation to SPARSEST CUT, BALANCED SEPARATOR (ARV '04)
- √log n-approximation to MIN-2CNF DELETION, MIN-UNCUT (ACMM '05), MIN-VERTEX SEPARATOR (FHL '06), etc.
The pitfall: high running times, as high as n^4.5 in recent works.
Slide 17: Solving SDP relaxations more efficiently using MW [AHK '05]

  Problem               | Interior point | Our result
  ----------------------|----------------|----------------------------------
  MAXQP (e.g. MAX-CUT)  | Õ(n^3.5)       | Õ(n^1.5 N/ε^2.5) or Õ(n^3/ε^3.5)
  HAPLOFREQ             | Õ(n^4)         | Õ(n^2.5/ε^2.5)
  SCP                   | Õ(n^4)         | Õ(n^1.5 N/ε^4.5)
  EMBEDDING             | Õ(n^4)         | Õ(n^3/(d^5 ε^3.5))
  SPARSEST CUT          | Õ(n^4.5)       | Õ(n^3/ε^2)
  MIN UNCUT etc.        | Õ(n^4.5)       | Õ(n^3.5/ε^2)
Slide 18: Key difference between efficient and not-so-efficient implementations of the MW idea: width management (e.g., the difference between PST '91 and GK '99).
Slide 19: Recall: the issue of width
MW on a_1 · x ≥ b_1, a_2 · x ≥ b_2, …, a_m · x ≥ b_m:
- Õ(ρ²/ε²) iterations to obtain an ε-feasible x, where ρ = max_k |a_k · x − b_k|
- ρ can be too large!
Oracle: find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Slide 20: Issue 1: dealing with width
MW on a_1 · x ≥ b_1, …, a_m · x ≥ b_m:
- A few high-width constraints
- Run a hybrid of MW and ellipsoid/Vaidya
- poly(m, log(ρ/ε)) iterations to obtain an ε-feasible x
Oracle: find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Slide 21: Dealing with width (contd.)
MW on a_1 · x ≥ b_1, …, a_m · x ≥ b_m:
- Hybrid of MW and Vaidya
- Õ(ρ_L²/ε²) iterations to obtain an ε-feasible x, where ρ_L ≪ ρ
Oracle (with a dual ellipsoid/Vaidya step): find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Slide 22: Issue 2: efficient implementation of the Oracle: fast eigenvalues via matrix sparsification
Sparsify C to a matrix C' with O(√n Σ_ij |C_ij| / ε) non-zero entries and ‖C − C'‖ ≤ ε.
- The Lanczos algorithm effectively uses the sparsity of C.
- Similar to Achlioptas-McSherry '01, but better in some situations (and an easier analysis).
Slide 23: An O(n²)-time algorithm to compute an O(√log n)-approximation to SPARSEST CUT [A., Hazan, Kale '04]
(A combinatorial algorithm; improves upon the O(n^4.5) algorithm of A., Rao, Vazirani.)
Slide 24: Sparsest cut
[Figure: a graph with a cut S of sparsity α(G) = 2/5]
The sparsest cut:
- O(log n) approximation: Leighton-Rao '88
- O(√log n) approximation: A., Rao, Vazirani '04
- O(√log n) approximation in O(n²) time (actually, finds expander flows): A., Hazan, Kale '05
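For intuition, the quantity being approximated can be computed exactly by brute force on tiny graphs. Our sketch below uses the normalization |E(S, S̄)| / min(|S|, |S̄|); note that other papers divide by |S|·|S̄| instead.

```python
from itertools import combinations

def sparsest_cut(n, edges):
    """Exhaustive sparsest-cut search: exponential time, for intuition only.

    Returns (best ratio, best side S), where the ratio of a cut is
    (number of edges crossing S) / min(|S|, n - |S|).
    """
    best, best_S = float("inf"), None
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            side = set(S)
            crossing = sum(1 for u, v in edges if (u in side) != (v in side))
            ratio = crossing / min(len(side), n - len(side))
            if ratio < best:
                best, best_S = ratio, side
    return best, best_S
```

On a "dumbbell" graph of two triangles joined by one edge, the sparsest cut is the bridge, with ratio 1/3.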
Slide 25: MW algorithm to find expander flows
- Events: triples (s, w, z) of weights on vertices, edges, and cuts
- Experts: pairs of vertices (i, j)
- Payoff (for weights d_{i,j} on the experts): defined via the shortest path between i and j according to the edge weights w_e, and the cuts separating i and j
- Fact: if the events are chosen optimally, the distribution d_{i,j} on the experts converges to a demand graph which is an expander flow; by results of Arora-Rao-Vazirani '04, this suffices to produce an approximately sparsest cut
Slide 26: New: online games with matrix payoffs (joint w/ Satyen Kale, '06)
The payoff is a matrix, and so is the distribution on experts!
Uses matrix analogues of the usual scalar inequalities, e.g.
  1 + x ≤ e^x   becomes   I + A ⪯ e^A
Can be used to give a new primal-dual view of SDP relaxations (⇒ easier, faster approximation algorithms).
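A sketch of the matrix version, where the "distribution on experts" becomes a density matrix (psd, trace 1). This is our own NumPy illustration of the general idea, not the paper's algorithm; all names are ours.

```python
import numpy as np

def sym_expm(A):
    """exp(A) for a symmetric matrix, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(vals)) @ vecs.T

def matrix_mw(payoffs, eps=0.1):
    """Matrix MW sketch: play density matrices instead of distributions.

    payoffs: symmetric matrices M_t with eigenvalues in [-1, 1].
    Plays rho_t proportional to exp(eps * sum of past payoffs) and earns
    tr(rho_t M_t); compare with lambda_max of the summed payoffs
    (the matrix analogue of the best single expert).
    """
    n = payoffs[0].shape[0]
    running = np.zeros((n, n))
    earned = 0.0
    for M in payoffs:
        E = sym_expm(eps * running)
        rho = E / np.trace(E)            # psd, trace 1: a density matrix
        earned += float(np.trace(rho @ M))
        running = running + M
    lam_max = float(np.linalg.eigvalsh(running)[-1])
    return earned, lam_max
```

The standard analysis (via I + A ⪯ e^A and the Golden-Thompson inequality) gives the additive regret bound: earned ≥ λ_max − εT − ln(n)/ε.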
Slide 27: New: faster algorithms for online learning and portfolio management (Agarwal-Hazan '06, Agarwal-Hazan-Kalai-Kale '06)
- A framework for online optimization inspired by Newton's method (2nd-order optimization). (Note: MW ≈ gradient descent.)
- Fast algorithms for portfolio management and other online optimization problems
Slide 28: Open problems
- Better approaches to width management?
- Faster run times?
- Lower bounds?
THANK YOU
Slide 29: Connection to Chernoff bounds and derandomization
- Deterministic approximation algorithms à la Raghavan-Thompson
- Packing/covering IP with 0/1 variables x_i:
  ∃? x ∈ P s.t. ∀ j ∈ [m]: f_j(x) ≥ 0
- Solve the LP relaxation using variables y_i ∈ [0, 1]
- Randomized rounding: set x_i = 1 with probability y_i
- Chernoff: O(log m) sampling iterations suffice
Slide 30: Derandomization [Young '95]
- Can derandomize the rounding using exp(t Σ_j f_j(x)) as a pessimistic estimator of the failure probability.
- By minimizing the estimator in every iteration, we mimic the random experiment, so O(log m) iterations suffice.
- The structure of the estimator obviates the need to solve the LP: "randomized rounding without solving the linear program".
- Punchline: the resulting algorithm is the MW algorithm!
Slide 31: Weighted majority [LW '94]
- If the algorithm loses at time t: Φ^{t+1} ≤ (1 − ½ε) Φ^t
- At time T: Φ^T ≤ (1 − ½ε)^{mistakes} Φ^0
- Overall: mistakes ≤ log(n)/ε + (1 + ε) m_i, where m_i = mistakes of expert i
Slide 32: Semidefinite programming
- The vectors a_j and x are symmetric matrices in R^{n×n}
- x ⪰ 0; assume Tr(x) = 1
- Set P = { x : x ⪰ 0, Tr(x) = 1 }
- Oracle: max Σ_j w_j (a_j · x) over P
- Optimum: x = vvᵀ, where v is the eigenvector of Σ_j w_j a_j with the largest eigenvalue
Slide 33: Efficiently implementing the Oracle
- Optimum: x = vvᵀ, where v is the top eigenvector of some matrix C
- It suffices to find a vector v such that vᵀCv ≥ 0
- The Lanczos algorithm with a random starting vector is ideal for this
- Advantage: uses only matrix-vector products
- Exploits sparsity (and the sparsification procedure)
- Use the analysis of Kuczynski and Wozniakowski '92