Title: Multiplicative weights method: A meta algorithm with applications to linear and semi-definite programming
Slide 1: Multiplicative weights method: a meta-algorithm with applications to linear and semi-definite programming
- Sanjeev Arora
- Princeton University
Based upon:
- "Fast algorithms for approximate SDP" (FOCS '05)
- "√log(n) approximation to SPARSEST CUT in Õ(n²) time" (FOCS '04)
- "The multiplicative weights update method and its applications" ('05)
See also recent papers by Hazan and Kale.
Slide 2: Multiplicative update rule (long history)
n agents, with weights w_1, w_2, …, w_n.
Update weights according to performance:
  w_i^{t+1} ← w_i^t (1 + ε · performance of i)
- Applications: approximate solutions to LPs and SDPs, flow problems, online learning (boosting), derandomization of Chernoff bounds, online convex optimization, computational geometry, metric embeddings, portfolio management (see our survey)
Slide 3: Simplest setting: predicting the market
- N experts on TV; payoff 1 for a correct prediction, 0 for an incorrect one.
- Can we perform as well as the best expert?
Slide 4: Weighted majority algorithm [LW '94]
Predict according to the weighted majority.
- Multiplicative update (initially all w_i = 1):
  - If expert i predicted correctly: w_i^{t+1} ← w_i^t
  - If incorrectly: w_i^{t+1} ← w_i^t (1 − ε)
- Claim: mistakes by algorithm ≈ 2(1+ε) · (mistakes by best expert)
- Potential: Φ^t = sum of weights = Σ_i w_i^t (initially n)
- If the algorithm predicts incorrectly, then Φ^{t+1} ≤ Φ^t − ε·Φ^t/2
- So Φ^T ≤ (1 − ε/2)^{m(A)} · n, where m(A) = mistakes by the algorithm
- Also Φ^T ≥ (1 − ε)^{m_i}, where m_i = mistakes by expert i
- ⇒ m(A) ≤ 2(1+ε)·m_i + O(log n / ε)
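The update and potential argument above can be sketched directly in code. This is our own minimal Python illustration (the function name and toy data are ours, not from the talk):

```python
def weighted_majority(expert_preds, outcomes, eps=0.5):
    """Deterministic weighted majority, in the spirit of [LW '94].

    expert_preds[t][i] is expert i's 0/1 prediction at time t;
    outcomes[t] is the true 0/1 outcome at time t.
    Returns (mistakes by the algorithm, mistakes per expert).
    """
    n = len(expert_preds[0])
    w = [1.0] * n                      # initially all w_i = 1
    alg_mistakes = 0
    expert_mistakes = [0] * n
    for preds, y in zip(expert_preds, outcomes):
        # Predict with the weighted majority vote of the experts.
        ones = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if 2 * ones >= sum(w) else 0
        if guess != y:
            alg_mistakes += 1
        # Multiplicative penalty for every expert that was wrong.
        for i, p in enumerate(preds):
            if p != y:
                expert_mistakes[i] += 1
                w[i] *= (1 - eps)
    return alg_mistakes, expert_mistakes
```

On any input sequence, the potential argument above guarantees m(A) ≤ 2(1+ε)·min_i m_i + 2 ln(n)/ε for ε ≤ 1/2.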
Slide 5: Generalized weighted majority [A., Hazan, Kale '05]
- Set of events (possibly infinite)
- n agents (experts)
- On event j, expert i receives payoff M(i, j)
Slide 6: Generalized weighted majority [AHK '05]
- Set of events (possibly infinite); n agents
- Algorithm plays a distribution on experts (p_1, p_2, …, p_n)
- Payoff for event j: Σ_i p_i M(i, j)
- Update rule: p_i^{t+1} ← p_i^t (1 + ε M(i, j))
- Claim: after T iterations, algorithm payoff ≥ (1 − ε) · (best expert's payoff) − O(log n / ε)
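The claim on this slide can be exercised on a small instance. A minimal Python sketch of the generalized rule, assuming payoffs in [0, 1] (names and data are our own illustration):

```python
def generalized_mw(M, events, eps=0.1):
    """Generalized weighted majority sketch.

    M[i][j] = payoff M(i, j), assumed here to lie in [0, 1];
    events is the observed sequence of event indices j.
    Returns (algorithm's total payoff, each expert's total payoff).
    """
    n = len(M)
    w = [1.0] * n
    alg = 0.0
    experts = [0.0] * n
    for j in events:
        total = sum(w)
        p = [wi / total for wi in w]          # distribution played this round
        alg += sum(p[i] * M[i][j] for i in range(n))
        for i in range(n):
            experts[i] += M[i][j]
            w[i] *= (1 + eps * M[i][j])       # p_i^{t+1} ∝ p_i^t (1 + ε M(i,j))
    return alg, experts
```

For payoffs in [0, 1] the guarantee reads: algorithm payoff ≥ (1 − ε)·(best expert's payoff) − ln(n)/ε.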
Slide 7: Connections
- Lagrangean relaxation
- Game playing, online optimization
- Gradient descent
- Chernoff bounds
- Games with matrix payoffs
- Boosting
Slide 8: Common features of MW algorithms
- Competition amongst n experts
- Appearance of terms like exp(−ε Σ_t (performance at time t))
- Time to get ε-approximate solutions is proportional to 1/ε².
Slide 9: Boosting and AdaBoost (Schapire '91, Freund-Schapire '97)
Weak learning ⇒ strong learning
Input: 0/1 vectors with a Yes/No label, e.g.
  (white, tall, vegetarian, smoker, …) → has disease
  (nonwhite, short, vegetarian, nonsmoker, …) → no disease
Desired: a short OR-of-ANDs (i.e., DNF) formula that describes a γ fraction of the data (if one exists), e.g.
  (white ∧ vegetarian) ∨ (nonwhite ∧ nonsmoker)
γ ≥ 1 − ε: strong learning; γ ≥ ½ + ε: weak learning
Main idea in boosting: data points are experts, weak
learners are events.
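A toy illustration of "data points are experts" in Python. This is our own sketch, not the talk's algorithm: the dataset, the stump hypothesis class, and all names are invented for illustration.

```python
def boost_majority(points, labels, hypotheses, rounds=15, eps=0.5):
    """Boosting viewed as MW: maintain weights on the data points.

    Each round, the weak learner is simulated by picking the hypothesis
    with the best weighted accuracy; points it classifies correctly are
    down-weighted, so later rounds focus on the hard points.
    The final classifier is the majority vote of the chosen hypotheses.
    """
    w = [1.0] * len(points)
    chosen = []
    for _ in range(rounds):
        def weighted_acc(h):
            return sum(wi for wi, x, y in zip(w, points, labels) if h(x) == y)
        h = max(hypotheses, key=weighted_acc)
        chosen.append(h)
        # MW update: shrink the weight of correctly handled points.
        w = [wi * (1 - eps) if h(x) == y else wi
             for wi, x, y in zip(w, points, labels)]

    def majority_vote(x):
        return 1 if 2 * sum(h(x) for h in chosen) >= len(chosen) else 0
    return majority_vote
```

For example, with 1-D data labelled by a threshold and threshold "stumps" as the hypothesis pool, the boosted majority vote fits the data.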
Slide 10: Approximate solutions to convex programming
- e.g., Plotkin-Shmoys-Tardos '91, Young '97, Garg-Koenemann '99, Fleischer '99
- The MW meta-algorithm gives a unified view.
Note: distribution = convex combination:
  p_i ≥ 0, Σ_i p_i = 1
If P is a convex set and each x_i ∈ P, then Σ_i p_i x_i ∈ P.
Slide 11: Solving LPs (feasibility)
Constraints:
  a_1 · x ≥ b_1, a_2 · x ≥ b_2, …, a_m · x ≥ b_m, x ∈ P (a convex domain)
Maintain weights w_1, w_2, …, w_m on the constraints.
Oracle: given the weights, find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Event = the Oracle's response.
Slide 12: Solving LPs (feasibility), contd.
Same setup: a_1 · x ≥ b_1, …, a_m · x ≥ b_m, x ∈ P, with weights w_1, …, w_m.
Update rule, where x is the Oracle's current response and ρ bounds the width:
  w_k ← w_k (1 − ε (a_k · x − b_k)/ρ)   for each k
Final solution: the average of the x vectors returned by the Oracle.
Oracle: find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
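A minimal sketch of this loop, assuming P is given as the convex hull of a small vertex list so the Oracle is a trivial maximization over vertices (all names and the toy instance are ours):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mw_lp_feasibility(A, b, vertices, rounds=2000, eps=0.1):
    """MW sketch for LP feasibility: find x in conv(vertices) with A x >= b.

    The Oracle maximizes the weighted slack sum_k w_k (a_k . x - b_k)
    over the vertices; rho bounds |a_k . x - b_k| (the 'width').
    Returns the average of the Oracle's answers, or None on infeasibility.
    """
    m = len(A)
    rho = max(abs(dot(A[k], v) - b[k]) for k in range(m) for v in vertices)
    w = [1.0] * m
    avg = [0.0] * len(vertices[0])
    for _ in range(rounds):
        x = max(vertices,
                key=lambda v: sum(w[k] * (dot(A[k], v) - b[k]) for k in range(m)))
        if sum(w[k] * (dot(A[k], x) - b[k]) for k in range(m)) < 0:
            return None                     # Oracle failed: system is infeasible
        avg = [a + xi / rounds for a, xi in zip(avg, x)]
        # Satisfied constraints lose weight, violated ones gain.
        w = [w[k] * (1 - eps * (dot(A[k], x) - b[k]) / rho) for k in range(m)]
    return avg
```

For instance, on the probability simplex in R² with the constraints x_1 ≥ 0.4 and x_2 ≥ 0.4, the averaged answer is approximately feasible.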
Slide 13: Performance guarantees
- In O(ρ² log(n)/ε²) iterations, the average x is ε-feasible.
- Packing-covering LPs [Plotkin, Shmoys, Tardos '91]:
  - ∃? x ∈ P s.t. for all j = 1, 2, …, m: a_j · x ≥ 1
  - Want to find x ∈ P s.t. a_j · x ≥ 1 − ε
  - Assume ∀ x ∈ P: 0 ≤ a_j · x ≤ ρ
  - The MW algorithm gets an ε-feasible x in O(ρ log(n)/ε²) iterations.
(This is a covering problem.)
Slide 14: Connection to Chernoff bounds and derandomization
- Deterministic approximation algorithms for 0/1 packing/covering problems à la Raghavan-Thompson.
- Derandomize using pessimistic estimators of the form exp(Σ_i t f(y_i)).
- Young '95: randomized rounding without solving the LP; the MW update rule mimics the pessimistic estimator.
Slide 15: Semidefinite programming (Klein-Lu '97)
Application 2: the same feasibility framework, a_1 · x ≥ b_1, …, a_m · x ≥ b_m, x ∈ P,
where the a_j and x are symmetric matrices in R^{n×n} and
  P = { x : x is psd, tr(x) = 1 }.
Oracle: max Σ_j w_j (a_j · x) over P — this is an eigenvalue computation!
Slide 16: The promise of SDPs
- 0.878-approximation to MAX-CUT (GW '94), 7/8-approximation to MAX-3SAT (KZ '00)
- √log n approximation to SPARSEST CUT, BALANCED SEPARATOR (ARV '04)
- √log n-approximation to MIN-2CNF DELETION, MIN-UNCUT (ACMM '05), MIN-VERTEX SEPARATOR (FHL '06), etc.
The pitfall: high running times, as high as n^4.5 in recent works.
Slide 17: Solving SDP relaxations more efficiently using MW [AHK '05]

  Problem               | Interior point | Our result
  ----------------------|----------------|----------------------------------
  MAXQP (e.g. MAX-CUT)  | Õ(n^3.5)       | Õ(n^1.5 N/ε^2.5) or Õ(n^3/ε^3.5)
  HAPLOFREQ             | Õ(n^4)         | Õ(n^2.5/ε^2.5)
  SCP                   | Õ(n^4)         | Õ(n^1.5 N/ε^4.5)
  EMBEDDING             | Õ(n^4)         | Õ(n^3/(d^5 ε^3.5))
  SPARSEST CUT          | Õ(n^4.5)       | Õ(n^3/ε^2)
  MIN UNCUT etc.        | Õ(n^4.5)       | Õ(n^3.5/ε^2)
Slide 18: Key difference between efficient and not-so-efficient implementations of the MW idea: width management (e.g., the difference between PST '91 and GK '99).
Slide 19: Recall: the issue of width
MW on a_1 · x ≥ b_1, a_2 · x ≥ b_2, …, a_m · x ≥ b_m:
- Õ(ρ²/ε²) iterations to obtain an ε-feasible x, where ρ = max_k |a_k · x − b_k|
- ρ can be too large!
Oracle: find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Slide 20: Issue 1: dealing with width
MW on a_1 · x ≥ b_1, …, a_m · x ≥ b_m:
- A few high-width constraints
- Run a hybrid of MW and ellipsoid/Vaidya
- poly(m, log(ρ/ε)) iterations to obtain an ε-feasible x
Oracle: find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Slide 21: Dealing with width (contd.)
MW on a_1 · x ≥ b_1, …, a_m · x ≥ b_m:
- Hybrid of MW and Vaidya
- Õ(ρ_L²/ε²) iterations to obtain an ε-feasible x, where ρ_L ≪ ρ
Oracle (with a dual ellipsoid/Vaidya step): find x ∈ P with Σ_k w_k (a_k · x − b_k) ≥ 0.
Slide 22: Issue 2: efficient implementation of the Oracle: fast eigenvalues via matrix sparsification
Sparsify C to a matrix C' with O(√n Σ_ij |C_ij| / ε) non-zero entries and ‖C − C'‖ ≤ ε.
- The Lanczos algorithm effectively uses the sparsity of C.
- Similar to Achlioptas-McSherry '01, but better in some situations (and an easier analysis).
Slide 23: An O(n²)-time algorithm to compute an O(√log n)-approximation to SPARSEST CUT [A., Hazan, Kale '04]
(A combinatorial algorithm; improves upon the O(n^4.5) algorithm of A., Rao, Vazirani.)
Slide 24: Sparsest cut
[Figure: a graph with a cut S of sparsity α(G) = 2/5]
The sparsest cut:
- O(log n) approximation: Leighton-Rao '88
- O(√log n) approximation: A., Rao, Vazirani '04
- O(√log n) approximation in O(n²) time (actually, finds expander flows): A., Hazan, Kale '05
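For intuition, the quantity being approximated can be computed exactly by brute force on tiny graphs. Our sketch below uses the normalization |E(S, S̄)| / min(|S|, |S̄|); note that other papers divide by |S|·|S̄| instead.

```python
from itertools import combinations

def sparsest_cut(n, edges):
    """Exhaustive sparsest-cut search: exponential time, for intuition only.

    Returns (best ratio, best side S), where the ratio of a cut is
    (number of edges crossing S) / min(|S|, n - |S|).
    """
    best, best_S = float("inf"), None
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            side = set(S)
            crossing = sum(1 for u, v in edges if (u in side) != (v in side))
            ratio = crossing / min(len(side), n - len(side))
            if ratio < best:
                best, best_S = ratio, side
    return best, best_S
```

On a "dumbbell" graph of two triangles joined by one edge, the sparsest cut is the bridge, with ratio 1/3.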
Slide 25: MW algorithm to find expander flows
- Events: triples (s, w, z) of weights on vertices, edges, and cuts
- Experts: pairs of vertices (i, j)
- Payoff (for weights d_{i,j} on the experts): defined via the shortest path between i and j according to the edge weights w_e, and the cuts separating i and j
- Fact: if the events are chosen optimally, the distribution d_{i,j} on the experts converges to a demand graph which is an expander flow; by results of Arora-Rao-Vazirani '04, this suffices to produce an approximately sparsest cut
Slide 26: New: online games with matrix payoffs (joint w/ Satyen Kale, '06)
The payoff is a matrix, and so is the distribution on experts!
Uses matrix analogues of the usual scalar inequalities, e.g.
  1 + x ≤ e^x   becomes   I + A ⪯ e^A
Can be used to give a new primal-dual view of SDP relaxations (⇒ easier, faster approximation algorithms).
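A sketch of the matrix version, where the "distribution on experts" becomes a density matrix (psd, trace 1). This is our own NumPy illustration of the general idea, not the paper's algorithm; all names are ours.

```python
import numpy as np

def sym_expm(A):
    """exp(A) for a symmetric matrix, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(vals)) @ vecs.T

def matrix_mw(payoffs, eps=0.1):
    """Matrix MW sketch: play density matrices instead of distributions.

    payoffs: symmetric matrices M_t with eigenvalues in [-1, 1].
    Plays rho_t proportional to exp(eps * sum of past payoffs) and earns
    tr(rho_t M_t); compare with lambda_max of the summed payoffs
    (the matrix analogue of the best single expert).
    """
    n = payoffs[0].shape[0]
    running = np.zeros((n, n))
    earned = 0.0
    for M in payoffs:
        E = sym_expm(eps * running)
        rho = E / np.trace(E)            # psd, trace 1: a density matrix
        earned += float(np.trace(rho @ M))
        running = running + M
    lam_max = float(np.linalg.eigvalsh(running)[-1])
    return earned, lam_max
```

The standard analysis (via I + A ⪯ e^A and the Golden-Thompson inequality) gives the additive regret bound: earned ≥ λ_max − εT − ln(n)/ε.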
Slide 27: New: faster algorithms for online learning and portfolio management (Agarwal-Hazan '06, Agarwal-Hazan-Kalai-Kale '06)
- A framework for online optimization inspired by Newton's method (2nd-order optimization). (Note: MW ≈ gradient descent.)
- Fast algorithms for portfolio management and other online optimization problems
Slide 28: Open problems
- Better approaches to width management?
- Faster run times?
- Lower bounds?
THANK YOU
Slide 29: Connection to Chernoff bounds and derandomization
- Deterministic approximation algorithms à la Raghavan-Thompson
- Packing/covering IP with 0/1 variables x_i:
  ∃? x ∈ P s.t. ∀ j ∈ [m]: f_j(x) ≥ 0
- Solve the LP relaxation using variables y_i ∈ [0, 1]
- Randomized rounding: set x_i = 1 with probability y_i
- Chernoff: O(log m) sampling iterations suffice
Slide 30: Derandomization [Young '95]
- Can derandomize the rounding using exp(t Σ_j f_j(x)) as a pessimistic estimator of the failure probability.
- By minimizing the estimator in every iteration, we mimic the random experiment, so O(log m) iterations suffice.
- The structure of the estimator obviates the need to solve the LP: "randomized rounding without solving the linear program".
- Punchline: the resulting algorithm is the MW algorithm!
Slide 31: Weighted majority [LW '94]
- If the algorithm loses at time t: Φ^{t+1} ≤ (1 − ½ε) Φ^t
- At time T: Φ^T ≤ (1 − ½ε)^{mistakes} Φ^0
- Overall: mistakes ≤ log(n)/ε + (1 + ε) m_i, where m_i = mistakes of expert i
Slide 32: Semidefinite programming
- The vectors a_j and x are symmetric matrices in R^{n×n}
- x ⪰ 0; assume Tr(x) = 1
- Set P = { x : x ⪰ 0, Tr(x) = 1 }
- Oracle: max Σ_j w_j (a_j · x) over P
- Optimum: x = vvᵀ, where v is the eigenvector of Σ_j w_j a_j with the largest eigenvalue
Slide 33: Efficiently implementing the Oracle
- Optimum: x = vvᵀ, where v is the top eigenvector of some matrix C
- It suffices to find a vector v such that vᵀCv ≥ 0
- The Lanczos algorithm with a random starting vector is ideal for this
- Advantage: uses only matrix-vector products
- Exploits sparsity (and the sparsification procedure)
- Use the analysis of Kuczynski and Wozniakowski '92