1
Algorithmic Game Theory
Uri Feige, Robi Krauthgamer, Moni Naor
Lecture 8: Regret Minimization
  • Lecturer: Moni Naor

2
Announcements
  • Next Week (Dec 24th)
  • Israeli Seminar on Computational Game Theory
    (10:00-4:30)
  • At Microsoft Israel R&D Center,
  • 13 Shenkar St., Herzliya.
  • The January course will be 13:00-15:00
  • The meetings on Jan 7th, 14th, and 21st, 2009

3
The Peanuts Game
  • There are n bins.
  • At each round nature throws a peanut into one
    of the bins.
  • If you (the player) are at the chosen bin you
    catch the peanut;
  • otherwise you miss it.
  • You may choose to move to any bin before any
    round.
  • The game ends when d peanuts have been thrown at
    some bin,
  • independent of whether they were caught or not.
  • Goal: catch as many peanuts as possible.
  • This is hopeless if the opponent tosses the
    peanuts knowing where you stand.
  • So say the sequence of bins is predetermined (but
    unknown).
  • How well can we do as a function of d and n?

4
Basic setting
  • View learning as a sequence of trials.
  • In each trial, the algorithm is given x, asked to
    predict f(x), and then is told the correct value.
  • Make no assumptions about how examples are
    chosen.
  • Goal is to minimize number of mistakes.

The focus is on the number of mistakes: we need to
learn from our mistakes.
5
Using expert advice
Want to predict the stock market.
  • We solicit n experts for their advice: will the
    market go up or down tomorrow?
  • We want to use their advice to make our
    prediction.
  • What is a good strategy for using their opinions,
  • with no a priori knowledge of which expert is
    best?
  • Expert = someone with an opinion.

6
Some expert is perfect
  • We have n experts.
  • One of these is perfect (never makes a mistake);
    we just don't know which one.
  • Can we find a strategy that makes no more than
    log n mistakes?

Simple algorithm: take a majority vote over all
experts that have been completely correct so far
(a sketch follows below).

What if we have a prior p over the experts? Can we
make no more than log(1/p_i) mistakes, where expert
i is the perfect one? Yes: take a weighted vote
according to p.
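
A minimal runnable sketch of the halving strategy above (the function name and the 0/1 encoding of predictions are my own):

    # Halving algorithm: majority vote over the experts that are still
    # perfect. advice[t][i] is expert i's 0/1 prediction at round t,
    # truth[t] is the revealed outcome; some expert is assumed perfect.
    def halving(advice, truth):
        n = len(advice[0])
        alive = set(range(n))            # experts with no mistakes so far
        mistakes = 0
        for preds, y in zip(advice, truth):
            votes = sum(preds[i] for i in alive)
            guess = 1 if 2 * votes > len(alive) else 0   # majority vote
            mistakes += (guess != y)
            alive = {i for i in alive if preds[i] == y}  # cross off the wrong
        return mistakes

Each of our mistakes at least halves the number of surviving experts, giving the log n bound; starting the weights at the prior p and taking a weighted vote gives the log(1/p_i) bound.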
7
Relation to concept learning
  • If computation time is no object, we can have one
    expert per concept in C.
  • If the target is in C, then the number of
    mistakes is at most log |C|.
  • More generally, for any description language, the
    number of mistakes is at most the number of bits
    needed to write down f.

8
What if no expert is perfect?
  • Goal: do nearly as well as the best one in
    hindsight.
  • Strategy 1:
  • Iterated halving algorithm. Same as before, but
    once we've crossed off all the experts, restart
    from the beginning.
  • Makes at most log(n)·(OPT+1) mistakes, where
  • OPT is the number of mistakes of the best expert
    in hindsight.
  • Seems wasteful: we constantly forget what we've
    learned.

[Figure: the sequence of trials splits into epochs, each incurring at most log n mistakes.]
9
Weighted Majority Algorithm
  • Intuition: making a mistake doesn't completely
    disqualify an expert. So, instead of crossing it
    off, just lower its weight.
  • Weighted Majority Alg:
  • Start with all experts having weight 1.
  • Predict based on a weighted majority vote.
  • Penalize mistakes by cutting the weight in half
    (sketch below).
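
A sketch of this deterministic rule (names are mine; predictions are 0/1):

    # Weighted Majority: keep every expert, halve the weight of wrong ones.
    def weighted_majority(advice, truth):
        n = len(advice[0])
        w = [1.0] * n
        mistakes = 0
        for preds, y in zip(advice, truth):
            up = sum(w[i] for i in range(n) if preds[i] == 1)
            guess = 1 if up > sum(w) / 2 else 0   # weighted majority vote
            mistakes += (guess != y)
            for i in range(n):
                if preds[i] != y:
                    w[i] /= 2                     # penalize a mistake
        return mistakes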

10
Weighted Majority Algorithm
  • Weighted Majority Alg:
  • Start with all experts having weight 1.
  • Predict based on a weighted majority vote.
  • Penalize mistakes by cutting the weight in half.
  • Example:

[Example: the slide shows the evolution of the weights and predictions over Days 1-4.]
11
Analysis: not far from best expert in hindsight
  • M = number of mistakes we've made so far.
  • b = number of mistakes the best expert has made so
    far.
  • W = total weight. Initially W is set to n.
  • After each of our mistakes, W drops by at least
    25%.
  • So, after M mistakes, W is at most n·(3/4)^M.
  • The weight of the best expert is (1/2)^b. So
  • (1/2)^b ≤ n·(3/4)^M
  • and
  • M ≤ 2.41·(b + log₂ n) = O(b + log n)

constant competitive ratio
12
Randomized Weighted Majority
  • If the best expert makes a mistake 20% of the
    time, then O(b + log n) is not so good. Can we do
    better?
  • Instead of a majority vote, use the weights as
    probabilities.
  • If 70% of the weight is on up and 30% on down,
    then predict up/down with probability 70%/30%.
  • Idea: smooth out the worst case.
  • Also, generalize ½ to 1 − ε.

13
Randomized Weighted Majority
  • /* Initialization */
  •   W_i ← 1 for i ∈ 1..n
  • /* Main loop */
  • For t = 1..T
  •   Let P_t(i) = W_i / (Σ_{j=1..n} W_j)
  •   Choose i according to P_t
  •   /* Update scores */
  •   Observe the losses
  •   for i ∈ 1..n:
  •     W_i ← W_i · (1−ε)^loss(i)
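
A runnable version of this pseudocode, as a sketch (losses lie in [0,1]; variable names are mine):

    import random

    def rwm(losses, eps):
        """losses[t][i]: the loss of expert i at step t, in [0,1]."""
        n = len(losses[0])
        w = [1.0] * n
        total = 0.0
        for loss_t in losses:
            # sample i from P_t(i) = W_i / sum_j W_j
            # (random.choices normalizes the weights itself)
            i = random.choices(range(n), weights=w)[0]
            total += loss_t[i]
            for j in range(n):
                w[j] *= (1 - eps) ** loss_t[j]   # W_j <- W_j (1-eps)^loss(j)
        return total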

14
Analysis
  • Say at time t a fraction F_t of the weight is on
    experts that made a mistake.
  • Then we have probability F_t of making a mistake,
    and we remove an εF_t fraction of the total
    weight:
  • W_final = n·(1 − εF_1)(1 − εF_2)...
  • ln(W_final) = ln(n) + Σ_t ln(1 − εF_t)
  •   ≤ ln(n) − ε·Σ_t F_t   (using ln(1−x) < −x)
  •   = ln(n) − εM.   (Σ_t F_t = E[mistakes] = M)
  • If the best expert makes b mistakes, then
    ln(W_final) > ln((1−ε)^b).
  • Now solve: ln(n) − εM > b·ln(1−ε), i.e.,
  • M < b·ln(1−ε)/(−ε) + ln(n)/ε
  •   ≤ b·(1+ε) + ln(n)/ε.
  • M = expected number of mistakes we make
  • b = number of mistakes the best expert makes
  • W = total weight

−ln(1−x) ≤ x + x² for 0 ≤ x ≤ ½
15
Summarizing RWM
  • Can be (1+ε)-competitive with the best expert in
    hindsight, with an additive ln(n)/ε.
  • If running T time steps, set ε = (ln n/T)^{1/2}
    to get
  • M ≤ b·(1 + (ln n/T)^{1/2}) + ln(n)/(ln n/T)^{1/2}
  •   = b + (b²·ln n/T)^{1/2} + (ln(n)·T)^{1/2}
  •   ≤ b + 2·(ln(n)·T)^{1/2}
  • M = number of mistakes made
  • b = number of mistakes the best expert made
  • M ≤ b·(1+ε) + ln(n)/ε

additive loss
16
Questions
  • Isn't it better to sometimes take the majority?
  • The best expert may have a hard time on an easy
    question, and we would be better off using the
    wisdom of the crowds.
  • Answer: if it is a good idea, make the majority
    vote one of the experts!

17
Lower Bounds
  • Cannot hope to do better than
  • log n
  • T^{1/2}

18
What can we use this for?
  • Can use it to combine multiple algorithms so as to
    do nearly as well as the best one in hindsight.
  • E.g., online auctions: one expert per price
    level.
  • Play a repeated game so as to do nearly as well as
    the best strategy in hindsight: Regret
    Minimization.
  • Extensions: the bandit problem, movement costs.

19
No-regret algorithms for repeated games
  • Repeated play of a matrix game with N rows.
    (The algorithm is the row player; rows represent
    different possible actions.)
  • At each step t the algorithm picks a row i_t;
    life picks a column.
  • Alg pays the cost of the action chosen: C_t(i_t).
  • Alg gets the whole column C_t as feedback
  • (or just its own cost, in the bandit model).
  • Assume a bound on the max cost: all costs are
    between 0 and 1.

20
No-regret algorithms for repeated games
  • At each time step, the algorithm picks a row and
    life picks a column.
  • Alg pays the cost of the action chosen: C_t(i).
  • Alg gets the column as feedback.
  • Assume a bound on the max cost: all costs are
    between 0 and 1.

Define the average regret in T time steps as
(avg cost of alg) − (avg cost of best fixed row
in hindsight)
= (1/T)·(Σ_{t=1..T} C_t(i_t) − min_i Σ_{t=1..T} C_t(i)).
Want this to go to 0 (or better) as T gets
large: a no-regret algorithm.
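
The definition written out as a small helper (a sketch; names are mine):

    def average_regret(costs, actions):
        """costs[t][i]: cost of row i at step t; actions[t]: row picked."""
        T = len(costs)
        alg_cost = sum(costs[t][actions[t]] for t in range(T))
        best_fixed = min(sum(costs[t][i] for t in range(T))
                         for i in range(len(costs[0])))
        return (alg_cost - best_fixed) / T   # no-regret: tends to 0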
21
Randomized Weighted Majority
  • /* Initialization */
  •   W_i ← 1 for i ∈ 1..n
  • /* Main loop */
  • For t = 1..T
  •   Let P_t(i) = W_i / (Σ_{j=1..n} W_j)
  •   Choose i according to P_t
  •   /* Update scores */
  •   Observe the column C_t
  •   for i ∈ 1..n:
  •     W_i ← W_i · (1−ε)^{C_t(i)}

22
Analysis
  • Similar to the {0,1} case:
  • E[cost of RWM]
  •   ≤ (min_i Σ_{t=1..T} C_t(i)) / (1−ε) + ln(n)/ε
  •   ≤ (min_i Σ_{t=1..T} C_t(i))·(1+2ε) + ln(n)/ε
  • No regret: as T grows, the difference goes to 0.

For ε ≤ ½
23
Properties of no-regret algorithms
  • Time-average performance is guaranteed to approach
    the minimax value V of the game
  • (or better, if life isn't adversarial).
  • Two NR algorithms playing against each other
  • have an empirical distribution approaching minimax
    optimal.
  • The existence of no-regret algorithms yields a
    proof of the minimax theorem.

24
von Neumann's Minimax Theorem
  • Zero-sum game: u_2(a_1,a_2) = −u_1(a_1,a_2)
  • Theorem:
  • For any two-player zero-sum game with finite
    strategy sets A_1, A_2 there is a value v ∈ R,
    the game value, s.t.
  • v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q)
  •   = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q)
  • For all mixed Nash equilibria (p,q): u_1(p,q) = v

Δ(A) = the mixed strategies over A
25
Convergence to Minimax
  • Suppose we know
  • v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q)
  •   = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q)
  • Consider the distribution q for player 2: the
    observed frequencies of player 2's play over T
    steps.
  • There is a best response x ∈ A_1 to q such that
    u_1(x,q) ≥ v.
  • If player 1 always plays x, the expected gain is
    at least vT.
  • If player 1 follows a no-regret procedure,
  • the gain is at least vT − R, where R/T → 0.
  • Using RWM the average gain is v − O((log n/T)^{1/2})
    (see the simulation sketch below).
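
A small simulation of this convergence, in cost terms, assuming matching pennies as the game (the matrix and parameters are my own; the update is the RWM rule from slide 13):

    import random

    # Matching pennies: C is player 1's cost; player 2's cost is 1 - C[i][j].
    # The game value, as a cost to player 1, is 1/2.
    C = [[0.0, 1.0], [1.0, 0.0]]
    T, eps = 10000, 0.05
    w1, w2 = [1.0, 1.0], [1.0, 1.0]
    avg = 0.0
    for _ in range(T):
        i = random.choices([0, 1], weights=w1)[0]   # player 1's RWM draw
        j = random.choices([0, 1], weights=w2)[0]   # player 2's RWM draw
        avg += C[i][j] / T
        for a in (0, 1):
            w1[a] *= (1 - eps) ** C[a][j]           # player 1 sees column j
            w2[a] *= (1 - eps) ** (1 - C[i][a])     # player 2 sees row i
    print(avg)   # close to the game value 1/2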

26
Proof of the Minimax
  • Want to show:
  • v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q)
  •   = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q)
  • Consider, for player 1:
  • v1max = max_{x∈A_1} min_{q∈Δ(A_2)} u_1(x,q)
  • v1min = min_{y∈A_2} max_{p∈Δ(A_1)} u_1(p,y)
  • For player 2:
  • v2max = max_{y∈A_2} min_{p∈Δ(A_1)} −u_1(p,y)
  • v2min = min_{x∈A_1} max_{q∈Δ(A_2)} −u_1(x,q)
  • Need to prove v1max ≥ v1min. The direction
    v1max ≤ v1min is easy.
  • Suppose v1max ≤ v1min − δ for some δ > 0.
  • Players 1 and 2 follow a no-regret procedure for
    T steps with regret R.
  • Need R/T < δ/2.

v1max: the best choice given player 2's distribution
v1min: the best distribution, not given player 2's
distribution
27
Proof of the Minimax
  • Consider, for player 1:
  • v1max = max_{x∈A_1} min_{q∈Δ(A_2)} u_1(x,q)
  • v1min = min_{y∈A_2} max_{p∈Δ(A_1)} u_1(p,y)
  • For player 2:
  • v2max = max_{y∈A_2} min_{p∈Δ(A_1)} −u_1(p,y)
  • v2min = min_{x∈A_1} max_{q∈Δ(A_2)} −u_1(x,q)
  • Suppose v1max ≤ v1min − δ for some δ > 0.
  • Players 1 and 2 follow a no-regret procedure for
    T steps with regret R, where R/T < δ/2.
  • The losses are L_T and −L_T.
  • Let L_1 and L_2 be the best-response losses for
    the empirical distributions.
  • Then L_1/T ≤ v1max and L_2/T ≤ v2max = −v1min.
  • But L_1 ≥ L_T − R and L_2 ≥ −L_T − R.
  • Adding: −2R ≤ L_1 + L_2 ≤ (v1max − v1min)·T ≤ −δT,
    contradicting R/T < δ/2.

28
History and development
  • Hannan '57, Blackwell '56: algorithms with regret
    O((N/T)^{1/2}).
  • Need T = O(N/δ²) steps to get time-average regret
    ≤ δ.
  • Call this quantity T_δ.
  • Optimal dependence on T (or δ). Views N as
    constant.
  • Learning theory, '80s-'90s: combining expert
    advice.
  • Perform (nearly) as well as the best f ∈ C. Views
    N as large.
  • Littlestone-Warmuth '89: the Weighted Majority
    algorithm.
  • E[cost] ≤ OPT·(1+ε) + (log N)/ε ≤ OPT + εT + (log N)/ε
  • Regret O(((log N)/T)^{1/2}).
  • T_δ = O((log N)/δ²).

29
Why Regret Minimization?
  • Finding Nash equilibria can be computationally
    difficult.
  • It is not clear that agents would converge to one,
    or would remain in one if there are several.
  • Regret minimization is realistic:
  • there are efficient algorithms that minimize
    regret;
  • it is locally computed;
  • players improve by lowering their regret.

30
Efficient implicit implementation for large n
  • The bounds have only a log dependence on n.
  • So, conceivably, we can do well even when n is
    exponential in the natural problem size, if only
    we could implement the algorithm efficiently.
  • E.g., the case of paths.
  • Recent years: a series of results giving efficient
    implementations/alternatives in various settings.

31
The Eavesdropping Game
  • Let G = (V,E).
  • Player 1 chooses an edge e ∈ E.
  • Player 2 chooses a spanning tree T.
  • Payoff: u_1(e,T) = 1 if e ∈ T and 0 otherwise.
  • The number of moves is exponential in the size
    of G.
  • But the best response for Player 2, given a
    distribution on the edges, is easy:
  • solve a minimum spanning tree problem on the
    probabilities (see the sketch below).
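
A sketch of this best-response computation, assuming the networkx library (the function name is mine):

    import networkx as nx

    # The expected number of interceptions of a tree is the sum of the
    # chosen-edge probabilities over its edges, so Player 2's best
    # response is a minimum spanning tree under those probabilities.
    def best_response_tree(edges, prob):
        """edges: list of (u, v); prob[(u, v)]: Player 1's probability."""
        G = nx.Graph()
        for (u, v) in edges:
            G.add_edge(u, v, weight=prob[(u, v)])
        T = nx.minimum_spanning_tree(G, weight="weight")
        return set(T.edges())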

32
Correlated Equilibrium and Swap Regret
  • What about Nash?
  • For correlated equilibria: if the algorithm has
    low swap regret, then the play converges to a
    correlated equilibrium.

33
Back to the Peanuts Game
  • There are n bins.
  • At each round nature throws a peanut into one
    of the bins.
  • If you (the player) are at the chosen bin you
    catch the peanut;
  • otherwise you miss it.
  • You may choose to move to any bin before any
    round.
  • The game ends when d peanuts have been thrown at
    some bin.
  • Homework: what guarantees can you provide?