1
Algorithmic Game Theory
Uri Feige, Robi Krauthgamer, Moni Naor
Lecture 8: Regret Minimization
  • Lecturer: Moni Naor

2
Announcements
  • Next Week (Dec 24th)
  • Israeli Seminar on Computational Game Theory
    (10:00-4:30)
  • At Microsoft Israel R&D Center,
  • 13 Shenkar St., Herzliya.
  • The January course will be 13:00-15:00
  • The meetings on Jan 7th, 14th, and 21st, 2009

3
The Peanuts Game
  • There are n bins.
  • At each round nature throws a peanut into one
    of the bins.
  • If you (the player) are at the chosen bin you
    catch the peanut;
  • otherwise you miss it.
  • You may choose to move to any bin before any
    round.
  • The game ends when d peanuts have been thrown at
    some bin,
  • independent of whether they were caught or not.
  • Goal: catch as many peanuts as possible.
  • This is hopeless if the opponent tosses the
    peanuts knowing where you stand.
  • So say the sequence of bins is predetermined (but
    unknown).
  • How well can we do as a function of d and n?

4
Basic setting
  • View learning as a sequence of trials.
  • In each trial, the algorithm is given x, asked to
    predict f(x), and then is told the correct value.
  • Make no assumptions about how examples are
    chosen.
  • Goal is to minimize number of mistakes.

The focus is on the number of mistakes: we need to
learn from our mistakes.
5
Using expert advice
Want to predict the stock market.
  • We solicit n experts for their advice: will the
    market go up or down tomorrow?
  • We want to use their advice to make our
    prediction.
  • What is a good strategy for using their opinions,
  • with no a priori knowledge of which expert is
    best?
  • Expert = someone with an opinion.

6
Some expert is perfect
  • We have n experts.
  • One of these is perfect (never makes a mistake);
    we just don't know which one.
  • Can we find a strategy that makes no more than
    log n mistakes?

Simple algorithm: take a majority vote over all
experts that have been completely correct so far
(a sketch follows below).

What if we have a prior p over the experts? Can we
make no more than log(1/p_i) mistakes, where expert
i is the perfect one? Yes: take a weighted vote
according to p.
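
A minimal runnable sketch of the halving strategy above (the function name and the 0/1 encoding of predictions are my own):

    # Halving algorithm: majority vote over the experts that are still
    # perfect. advice[t][i] is expert i's 0/1 prediction at round t,
    # truth[t] is the revealed outcome; some expert is assumed perfect.
    def halving(advice, truth):
        n = len(advice[0])
        alive = set(range(n))            # experts with no mistakes so far
        mistakes = 0
        for preds, y in zip(advice, truth):
            votes = sum(preds[i] for i in alive)
            guess = 1 if 2 * votes > len(alive) else 0   # majority vote
            mistakes += (guess != y)
            alive = {i for i in alive if preds[i] == y}  # cross off the wrong
        return mistakes

Each of our mistakes at least halves the number of surviving experts, giving the log n bound; starting the weights at the prior p and taking a weighted vote gives the log(1/p_i) bound.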
7
Relation to concept learning
  • If computation time is no object, we can have one
    expert per concept in C.
  • If the target is in C, then the number of
    mistakes is at most log |C|.
  • More generally, for any description language, the
    number of mistakes is at most the number of bits
    needed to write down f.

8
What if no expert is perfect?
  • Goal: do nearly as well as the best one in
    hindsight.
  • Strategy 1:
  • Iterated halving algorithm. Same as before, but
    once we've crossed off all the experts, restart
    from the beginning.
  • Makes at most log(n)·(OPT+1) mistakes, where
  • OPT is the number of mistakes of the best expert
    in hindsight.
  • Seems wasteful: we constantly forget what we've
    learned.

[Figure: the sequence of trials splits into epochs, each incurring at most log n mistakes.]
9
Weighted Majority Algorithm
  • Intuition: making a mistake doesn't completely
    disqualify an expert. So, instead of crossing it
    off, just lower its weight.
  • Weighted Majority Alg:
  • Start with all experts having weight 1.
  • Predict based on a weighted majority vote.
  • Penalize mistakes by cutting the weight in half
    (sketch below).
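
A sketch of this deterministic rule (names are mine; predictions are 0/1):

    # Weighted Majority: keep every expert, halve the weight of wrong ones.
    def weighted_majority(advice, truth):
        n = len(advice[0])
        w = [1.0] * n
        mistakes = 0
        for preds, y in zip(advice, truth):
            up = sum(w[i] for i in range(n) if preds[i] == 1)
            guess = 1 if up > sum(w) / 2 else 0   # weighted majority vote
            mistakes += (guess != y)
            for i in range(n):
                if preds[i] != y:
                    w[i] /= 2                     # penalize a mistake
        return mistakes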

10
Weighted Majority Algorithm
  • Weighted Majority Alg:
  • Start with all experts having weight 1.
  • Predict based on a weighted majority vote.
  • Penalize mistakes by cutting the weight in half.
  • Example:

[Example: the slide shows the evolution of the weights and predictions over Days 1-4.]
11
Analysis: not far from best expert in hindsight
  • M = number of mistakes we've made so far.
  • b = number of mistakes the best expert has made so
    far.
  • W = total weight. Initially W is set to n.
  • After each of our mistakes, W drops by at least
    25%.
  • So, after M mistakes, W is at most n·(3/4)^M.
  • The weight of the best expert is (1/2)^b. So
  • (1/2)^b ≤ n·(3/4)^M
  • and
  • M ≤ 2.41·(b + log₂ n) = O(b + log n)

constant competitive ratio
12
Randomized Weighted Majority
  • If the best expert makes a mistake 20% of the
    time, then O(b + log n) is not so good. Can we do
    better?
  • Instead of a majority vote, use the weights as
    probabilities.
  • If 70% of the weight is on up and 30% on down,
    then predict up/down with probability 70%/30%.
  • Idea: smooth out the worst case.
  • Also, generalize ½ to 1 − ε.

13
Randomized Weighted Majority
  • /* Initialization */
  •   W_i ← 1 for i ∈ 1..n
  • /* Main loop */
  • For t = 1..T
  •   Let P_t(i) = W_i / (Σ_{j=1..n} W_j)
  •   Choose i according to P_t
  •   /* Update scores */
  •   Observe the losses
  •   for i ∈ 1..n:
  •     W_i ← W_i · (1−ε)^loss(i)
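
A runnable version of this pseudocode, as a sketch (losses lie in [0,1]; variable names are mine):

    import random

    def rwm(losses, eps):
        """losses[t][i]: the loss of expert i at step t, in [0,1]."""
        n = len(losses[0])
        w = [1.0] * n
        total = 0.0
        for loss_t in losses:
            # sample i from P_t(i) = W_i / sum_j W_j
            # (random.choices normalizes the weights itself)
            i = random.choices(range(n), weights=w)[0]
            total += loss_t[i]
            for j in range(n):
                w[j] *= (1 - eps) ** loss_t[j]   # W_j <- W_j (1-eps)^loss(j)
        return total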

14
Analysis
  • Say at time t a fraction F_t of the weight is on
    experts that made a mistake.
  • Then we have probability F_t of making a mistake,
    and we remove an εF_t fraction of the total
    weight:
  • W_final = n·(1 − εF_1)(1 − εF_2)...
  • ln(W_final) = ln(n) + Σ_t ln(1 − εF_t)
  •   ≤ ln(n) − ε·Σ_t F_t   (using ln(1−x) < −x)
  •   = ln(n) − εM.   (Σ_t F_t = E[mistakes] = M)
  • If the best expert makes b mistakes, then
    ln(W_final) > ln((1−ε)^b).
  • Now solve: ln(n) − εM > b·ln(1−ε), i.e.,
  • M < b·ln(1−ε)/(−ε) + ln(n)/ε
  •   ≤ b·(1+ε) + ln(n)/ε.
  • M = expected number of mistakes we make
  • b = number of mistakes the best expert makes
  • W = total weight

−ln(1−x) ≤ x + x² for 0 ≤ x ≤ ½
15
Summarizing RWM
  • Can be (1+ε)-competitive with the best expert in
    hindsight, with an additive ln(n)/ε.
  • If running T time steps, set ε = (ln n/T)^{1/2}
    to get
  • M ≤ b·(1 + (ln n/T)^{1/2}) + ln(n)/(ln n/T)^{1/2}
  •   = b + (b²·ln n/T)^{1/2} + (ln(n)·T)^{1/2}
  •   ≤ b + 2·(ln(n)·T)^{1/2}
  • M = number of mistakes made
  • b = number of mistakes the best expert made
  • M ≤ b·(1+ε) + ln(n)/ε

additive loss
16
Questions
  • Isn't it better to sometimes take the majority?
  • The best expert may have a hard time on an easy
    question, and we would be better off using the
    wisdom of the crowds.
  • Answer: if it is a good idea, make the majority
    vote one of the experts!

17
Lower Bounds
  • Cannot hope to do better than
  • log n
  • T^{1/2}

18
What can we use this for?
  • Can use it to combine multiple algorithms so as to
    do nearly as well as the best one in hindsight.
  • E.g., online auctions: one expert per price
    level.
  • Play a repeated game so as to do nearly as well as
    the best strategy in hindsight: Regret
    Minimization.
  • Extensions: the bandit problem, movement costs.

19
No-regret algorithms for repeated games
  • Repeated play of a matrix game with N rows.
    (The algorithm is the row player; rows represent
    different possible actions.)
  • At each step t the algorithm picks a row i_t;
    life picks a column.
  • Alg pays the cost of the action chosen: C_t(i_t).
  • Alg gets the whole column C_t as feedback
  • (or just its own cost, in the bandit model).
  • Assume a bound on the max cost: all costs are
    between 0 and 1.

20
No-regret algorithms for repeated games
  • At each time step, the algorithm picks a row and
    life picks a column.
  • Alg pays the cost of the action chosen: C_t(i).
  • Alg gets the column as feedback.
  • Assume a bound on the max cost: all costs are
    between 0 and 1.

Define the average regret in T time steps as
(avg cost of alg) − (avg cost of best fixed row
in hindsight)
= (1/T)·(Σ_{t=1..T} C_t(i_t) − min_i Σ_{t=1..T} C_t(i)).
Want this to go to 0 (or better) as T gets
large: a no-regret algorithm.
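
The definition written out as a small helper (a sketch; names are mine):

    def average_regret(costs, actions):
        """costs[t][i]: cost of row i at step t; actions[t]: row picked."""
        T = len(costs)
        alg_cost = sum(costs[t][actions[t]] for t in range(T))
        best_fixed = min(sum(costs[t][i] for t in range(T))
                         for i in range(len(costs[0])))
        return (alg_cost - best_fixed) / T   # no-regret: tends to 0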
21
Randomized Weighted Majority
  • /* Initialization */
  •   W_i ← 1 for i ∈ 1..n
  • /* Main loop */
  • For t = 1..T
  •   Let P_t(i) = W_i / (Σ_{j=1..n} W_j)
  •   Choose i according to P_t
  •   /* Update scores */
  •   Observe the column C_t
  •   for i ∈ 1..n:
  •     W_i ← W_i · (1−ε)^{C_t(i)}

22
Analysis
  • Similar to the {0,1} case:
  • E[cost of RWM]
  •   ≤ (min_i Σ_{t=1..T} C_t(i)) / (1−ε) + ln(n)/ε
  •   ≤ (min_i Σ_{t=1..T} C_t(i))·(1+2ε) + ln(n)/ε
  • No regret: as T grows, the difference goes to 0.

For ε ≤ ½
23
Properties of no-regret algorithms
  • Time-average performance is guaranteed to approach
    the minimax value V of the game
  • (or better, if life isn't adversarial).
  • Two NR algorithms playing against each other
  • have an empirical distribution approaching minimax
    optimal.
  • The existence of no-regret algorithms yields a
    proof of the minimax theorem.

24
von Neumann's Minimax Theorem
  • Zero-sum game: u_2(a_1,a_2) = −u_1(a_1,a_2)
  • Theorem:
  • For any two-player zero-sum game with finite
    strategy sets A_1, A_2 there is a value v ∈ R,
    the game value, s.t.
  • v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q)
  •   = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q)
  • For all mixed Nash equilibria (p,q): u_1(p,q) = v

Δ(A) = the mixed strategies over A
25
Convergence to Minimax
  • Suppose we know
  • v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q)
  •   = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q)
  • Consider the distribution q for player 2: the
    observed frequencies of player 2's play over T
    steps.
  • There is a best response x ∈ A_1 to q such that
    u_1(x,q) ≥ v.
  • If player 1 always plays x, the expected gain is
    at least vT.
  • If player 1 follows a no-regret procedure,
  • the gain is at least vT − R, where R/T → 0.
  • Using RWM the average gain is v − O((log n/T)^{1/2})
    (see the simulation sketch below).
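
A small simulation of this convergence, in cost terms, assuming matching pennies as the game (the matrix and parameters are my own; the update is the RWM rule from slide 13):

    import random

    # Matching pennies: C is player 1's cost; player 2's cost is 1 - C[i][j].
    # The game value, as a cost to player 1, is 1/2.
    C = [[0.0, 1.0], [1.0, 0.0]]
    T, eps = 10000, 0.05
    w1, w2 = [1.0, 1.0], [1.0, 1.0]
    avg = 0.0
    for _ in range(T):
        i = random.choices([0, 1], weights=w1)[0]   # player 1's RWM draw
        j = random.choices([0, 1], weights=w2)[0]   # player 2's RWM draw
        avg += C[i][j] / T
        for a in (0, 1):
            w1[a] *= (1 - eps) ** C[a][j]           # player 1 sees column j
            w2[a] *= (1 - eps) ** (1 - C[i][a])     # player 2 sees row i
    print(avg)   # close to the game value 1/2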

26
Proof of the Minimax
  • Want to show:
  • v = max_{p∈Δ(A_1)} min_{q∈Δ(A_2)} u_1(p,q)
  •   = min_{q∈Δ(A_2)} max_{p∈Δ(A_1)} u_1(p,q)
  • Consider, for player 1:
  • v1max = max_{x∈A_1} min_{q∈Δ(A_2)} u_1(x,q)
  • v1min = min_{y∈A_2} max_{p∈Δ(A_1)} u_1(p,y)
  • For player 2:
  • v2max = max_{y∈A_2} min_{p∈Δ(A_1)} −u_1(p,y)
  • v2min = min_{x∈A_1} max_{q∈Δ(A_2)} −u_1(x,q)
  • Need to prove v1max ≥ v1min. The direction
    v1max ≤ v1min is easy.
  • Suppose v1max ≤ v1min − δ for some δ > 0.
  • Players 1 and 2 follow a no-regret procedure for
    T steps with regret R.
  • Need R/T < δ/2.

v1max: the best choice given player 2's distribution
v1min: the best distribution, not given player 2's
distribution
27
Proof of the Minimax
  • Consider, for player 1:
  • v1max = max_{x∈A_1} min_{q∈Δ(A_2)} u_1(x,q)
  • v1min = min_{y∈A_2} max_{p∈Δ(A_1)} u_1(p,y)
  • For player 2:
  • v2max = max_{y∈A_2} min_{p∈Δ(A_1)} −u_1(p,y)
  • v2min = min_{x∈A_1} max_{q∈Δ(A_2)} −u_1(x,q)
  • Suppose v1max ≤ v1min − δ for some δ > 0.
  • Players 1 and 2 follow a no-regret procedure for
    T steps with regret R, where R/T < δ/2.
  • The losses are L_T and −L_T.
  • Let L_1 and L_2 be the best-response losses for
    the empirical distributions.
  • Then L_1/T ≤ v1max and L_2/T ≤ v2max = −v1min.
  • But L_1 ≥ L_T − R and L_2 ≥ −L_T − R.
  • Adding: −2R ≤ L_1 + L_2 ≤ (v1max − v1min)·T ≤ −δT,
    contradicting R/T < δ/2.

28
History and development
  • Hannan '57, Blackwell '56: algorithms with regret
    O((N/T)^{1/2}).
  • Need T = O(N/δ²) steps to get time-average regret
    ≤ δ.
  • Call this quantity T_δ.
  • Optimal dependence on T (or δ). Views N as
    constant.
  • Learning theory, '80s-'90s: combining expert
    advice.
  • Perform (nearly) as well as the best f ∈ C. Views
    N as large.
  • Littlestone-Warmuth '89: the Weighted Majority
    algorithm.
  • E[cost] ≤ OPT·(1+ε) + (log N)/ε ≤ OPT + εT + (log N)/ε
  • Regret O(((log N)/T)^{1/2}).
  • T_δ = O((log N)/δ²).

29
Why Regret Minimization?
  • Finding Nash equilibria can be computationally
    difficult.
  • It is not clear that agents would converge to one,
    or would remain in one if there are several.
  • Regret minimization is realistic:
  • there are efficient algorithms that minimize
    regret;
  • it is locally computed;
  • players improve by lowering their regret.

30
Efficient implicit implementation for large n
  • The bounds have only a log dependence on n.
  • So, conceivably, we can do well even when n is
    exponential in the natural problem size, if only
    we could implement the algorithm efficiently.
  • E.g., the case of paths.
  • Recent years: a series of results giving efficient
    implementations/alternatives in various settings.

31
The Eavesdropping Game
  • Let G = (V,E).
  • Player 1 chooses an edge e ∈ E.
  • Player 2 chooses a spanning tree T.
  • Payoff: u_1(e,T) = 1 if e ∈ T and 0 otherwise.
  • The number of moves is exponential in the size
    of G.
  • But the best response for Player 2, given a
    distribution on the edges, is easy:
  • solve a minimum spanning tree problem on the
    probabilities (see the sketch below).
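
A sketch of this best-response computation, assuming the networkx library (the function name is mine):

    import networkx as nx

    # The expected number of interceptions of a tree is the sum of the
    # chosen-edge probabilities over its edges, so Player 2's best
    # response is a minimum spanning tree under those probabilities.
    def best_response_tree(edges, prob):
        """edges: list of (u, v); prob[(u, v)]: Player 1's probability."""
        G = nx.Graph()
        for (u, v) in edges:
            G.add_edge(u, v, weight=prob[(u, v)])
        T = nx.minimum_spanning_tree(G, weight="weight")
        return set(T.edges())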

32
Correlated Equilibrium and Swap Regret
  • What about Nash?
  • For correlated equilibria: if the algorithm has
    low swap regret, then the play converges to a
    correlated equilibrium.

33
Back to the Peanuts Game
  • There are n bins.
  • At each round nature throws a peanut into one
    of the bins.
  • If you (the player) are at the chosen bin you
    catch the peanut;
  • otherwise you miss it.
  • You may choose to move to any bin before any
    round.
  • The game ends when d peanuts have been thrown at
    some bin.
  • Homework: what guarantees can you provide?