Fictitious Play The Theory of Learning in Games D. Fudenberg and D. Levine - PowerPoint PPT Presentation


Slides: 68
Provided by: tau1

Fictitious Play: The Theory of Learning in Games, D. Fudenberg and D. Levine
  • Speaker: Tzur Sayag
  • 03/06/2003

Do you believe that PM Sharon is serious about the peace process?
  • A voter has to decide whether he should support the PM.
  • Belief: Sharon will never evacuate settlements.
  • Action: Vote against the new economics revolution.
  • May 24: Sharon announces that the occupation is no good.
  • Belief: Sharon will probably never evacuate settlements.
  • Action: Vote against the new economics revolution.
  • Jun 5: Sharon meets Abu-Mazen and declares support for a Palestinian state.
  • Belief: It seems Sharon might evacuate the settlements after all.
  • Action: Vote for the new economics revolution.

  • Introduction to the common models of learning in games
  • Cournot adjustment
  • Fictitious play and Nash equilibria
  • Motivation
  • Definitions
  • Results
  • Generalizations of fictitious play (if we have time)

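[Figure: payoff table with Player 1 choosing the row and Player 2 the column; P1 gets a1 and P2 gets b1 if they play Action1 and Action1, respectively.]
Learning in Games - 1
  • Repeated games: the same or related games are played repeatedly.
  • Fixed-player model: the same players meet again and again.
  • In that setting a sophisticated player can try to teach the opponent to play a best response to a particular action by repeating it over and over.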

Being Sophisticated: Example
  • D is dominant for Bob.
  • If Alice learns that Bob only plays D, the game converges to <D,L>.
  • Bob's payoff at <D,L> is 2.
  • If Bob is patient, he can always play U and just wait for a while.
  • If Bob always plays U, Alice, who thought Bob was going to play D, should shift her play from L to R (since L was only good when Bob actually played D).
  • So Bob plays constant U, which leads Alice to play constant R, with payoff 2 > 1.
  • In this case Bob gets 3, which is better than 2.
  • Bingo!
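A minimal simulation of this story, under stated assumptions: the slide's payoff table did not survive extraction, so the sketch assumes the table (U,L) = (1,0), (U,R) = (3,2), (D,L) = (2,1), (D,R) = (4,0), Bob's payoff first, chosen to match the numbers quoted above (D dominant for Bob, 2 for Bob at <D,L>, 3 for Bob at <U,R>). Alice is modeled as a fictitious player who counts Bob's actions; her initial weight of 5 on D and the function names are hypothetical.

```python
# Assumed payoffs (row = Bob, column = Alice): (bob_payoff, alice_payoff).
PAYOFFS = {
    ("U", "L"): (1, 0), ("U", "R"): (3, 2),
    ("D", "L"): (2, 1), ("D", "R"): (4, 0),
}

def alice_best_response(weights):
    """Alice's best response to the empirical frequencies of Bob's actions."""
    total = sum(weights.values())

    def expected(a):
        return sum(weights[b] / total * PAYOFFS[(b, a)][1] for b in ("U", "D"))

    # max() keeps the first maximizer, so ties go to "L".
    return max(("L", "R"), key=expected)

def simulate(periods=10, prior_d=5):
    weights = {"U": 0, "D": prior_d}   # Alice starts out believing Bob plays D
    history = []
    for _ in range(periods):
        alice = alice_best_response(weights)
        bob = "U"                      # patient Bob commits to U every period
        history.append((bob, alice, PAYOFFS[(bob, alice)][0]))
        weights[bob] += 1              # Alice counts what Bob actually played
    return history

hist = simulate()
print([a for _, a, _ in hist])   # Alice's actions
print([p for _, _, p in hist])   # Bob's payoffs
```

Bob pays for the lesson: while Alice still plays L he earns 1 instead of the 2 he would get from D, but once her counts tip toward U she switches to R for good and Bob earns 3 every period.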

Being Sophisticated: Abstracting
  • Most learning theories rely on models in which the incentive to alter the opponent's future play is small. Such models include:
  • Locked in for 2 periods
  • Large anonymous population
  • Embed a two-player game by pairing players randomly from a large population.

Models of Embedding
  • Single-pair model:
  • a single pair is matched at random; their actions are revealed to everyone.
  • Aggregate statistic model:
  • all players are randomly matched; aggregate outcomes are revealed to everyone.
  • Random-matching model:
  • all players are randomly matched; each player sees only the outcome of his own game.

Three Common Models of Learning
  • Fictitious play:
  • players observe only their own matches and play a best response to the observed frequencies of opponents' play.
  • Partial best-response dynamics:
  • a fixed portion of the population switches each period from its current action to a best response to the aggregate statistics from the previous period.
  • Replicator dynamics:
  • the share of the population using each strategy grows in proportion to that strategy's current payoff.
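To make the last model concrete, here is a sketch of one standard discrete-time form of replicator dynamics, $x_k' = x_k f_k / \bar f$, where $f_k$ is strategy k's payoff against the current population and $\bar f$ is the population average. The discrete-time form, the function name and the 3-versus-2 coordination payoffs are assumptions for illustration; the slide states only the qualitative rule.

```python
def replicator_step(shares, payoff_matrix):
    """One step of discrete-time replicator dynamics for a symmetric game.

    shares[k] is the population share using strategy k; payoffs are assumed
    non-negative so that the normalization below is well defined.
    """
    n = len(shares)
    # Payoff of strategy k against a randomly drawn member of the population.
    fitness = [sum(payoff_matrix[k][j] * shares[j] for j in range(n)) for k in range(n)]
    average = sum(shares[k] * fitness[k] for k in range(n))
    # Strategies earning more than the population average grow, the others shrink.
    return [shares[k] * fitness[k] / average for k in range(n)]

# Hypothetical symmetric coordination game: A earns 3 against A, B earns 2 against B.
game = [[3, 0], [0, 2]]
x = [0.5, 0.5]
for _ in range(50):
    x = replicator_step(x, game)
print([round(v, 4) for v in x])
```

In this example the share of A grows whenever it exceeds 0.4 (where $3x_A = 2(1 - x_A)$), so starting from an even split the population converges to A, while starting at 30% A it converges to B.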

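Cournot Adjustment: a flavor of analysis
  • Two firms, 1 and 2.
  • Strategy: choose a quantity $s_i \in [0, \infty)$.
  • A strategy profile is $\langle s_i, s_{-i} \rangle \in S$.
  • The utility of firm i is $u_i(\langle s_i, s_{-i} \rangle)$.
  • Assume $u_i(\langle \cdot, s_{-i} \rangle)$ is strictly concave.
  • $BR_i(s_{-i}) = \arg\max_{x \in S_i} u_i(\langle x, s_{-i} \rangle)$.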

Why the BR is unique: $u_i$ is strictly concave in $s_i$, so (assuming it is differentiable) its derivative $u_i'$ is strictly decreasing. A strictly decreasing function has at most one zero, so $u_i$ has at most one interior extreme point, and the maximum, when it exists, is therefore unique. $u_i$ cannot be flat on an interval, since it is STRICTLY concave by assumption.
Cournot Adjustment Model
  • Time periods $t = 0, 1, 2, \dots$ are discrete.
  • Initial state (profile) $\theta^0 \in S$.
  • In each period each player chooses a pure strategy that is a BR to the opponent's strategy in the previous period.
  • Formally, i chooses $s_i^t = BR_i(s_{-i}^{t-1})$.
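A minimal sketch of Cournot adjustment, assuming linear inverse demand $P = \alpha - q_1 - q_2$ and constant marginal cost $c$ (the values $\alpha = 10$, $c = 1$ and the function names are hypothetical). Profit $q_i(\alpha - q_i - q_j) - c\,q_i$ is strictly concave in $q_i$, as the slides assume, and its unique best response is $\max(0, (\alpha - c - q_j)/2)$.

```python
def best_response(q_other, alpha=10.0, cost=1.0):
    """Unique maximizer of q * (alpha - q - q_other) - cost * q over q >= 0.

    The profit is strictly concave in q, so the first-order condition
    alpha - cost - 2q - q_other = 0 pins down a single maximizer (clipped at 0).
    """
    return max(0.0, (alpha - cost - q_other) / 2.0)

def cournot_adjustment(theta0, periods=60):
    """Each firm best-responds to the other's quantity from the previous period."""
    q1, q2 = theta0
    path = [(q1, q2)]
    for _ in range(periods):
        q1, q2 = best_response(q2), best_response(q1)  # simultaneous update
        path.append((q1, q2))
    return path

path = cournot_adjustment((8.0, 0.0))
print(path[:4])
print(path[-1])
```

In this linear example each firm's deviation from the Nash quantity $(\alpha - c)/3 = 3$ halves (and flips sign) every period, so the adjustment converges to the profile where both quantities are 3, which is a fixed point of the best-response map.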

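Cournot Dynamics: Reaction Curves
[Figure: the reaction curve $BR_1$ gives, for every $\theta_2$, player 1's best response to it (the height of the curve at $\theta_2$). From $\theta^t = (\theta^t_1, \theta^t_2)$, player 1 moves to the new BR if 2 plays $\theta^t_2$. Question on the slide: can you convince yourself that the point where the reaction curves cross is a Nash?]
Cournot Dynamics
  • A movement between profiles such that $\theta^{t+1} = f(\theta^t)$, where $f_i(\theta^t) = BR_i(\theta^t_{-i})$.
  • A steady state is a $\theta^s$ such that $\theta^s = f(\theta^s)$.
  • Once $\theta^t = \theta^s$, the system remains there.
  • Claim (simple): $\theta^s$ is a Nash equilibrium.
  • Proof: by definition, for every player $\theta^s_i = BR_i(\theta^s_{-i})$, so no player wants to move.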

Cournot Dynamics: oblivious to linear transformations
  • Proposition 1.1: Suppose $\tilde u_i(s) = a\,u_i(s) + v_i(s_{-i})$ with $a > 0$, for all players i. Then $u$ and $\tilde u$ are best-response equivalent.
  • Proof:
  • $v_i(s_{-i})$ depends only on the opponents' play, so it does not change the ranking of my actions.
  • Multiplying all payoffs by the same positive constant a has no effect on the ranking.
  • So a transformation that leaves preferences, and consequently best responses, unchanged gives rise to the same dynamic learning process.
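A quick numerical check of Proposition 1.1 on a hypothetical 2x2 game (the payoffs, the constants and the function names are illustrative assumptions): adding a term that depends only on the opponent's action and rescaling by a positive constant leaves player 1's best responses unchanged.

```python
def best_responses(u, opp_action, own_actions):
    """Player 1's best responses to a fixed opponent action; u[(own, opp)] is the payoff."""
    best = max(u[(a, opp_action)] for a in own_actions)
    return {a for a in own_actions if u[(a, opp_action)] == best}

def transform(u, a, v):
    """u~(s) = a * u(s) + v(s_2): rescale and add a term that depends only on the opponent."""
    return {(s1, s2): a * p + v[s2] for (s1, s2), p in u.items()}

ACTIONS = ["A", "B"]
# Hypothetical payoffs for player 1 (row player) in a 2x2 game.
u1 = {("A", "A"): 3, ("A", "B"): 0, ("B", "A"): 1, ("B", "B"): 2}
u1_tilde = transform(u1, a=2.5, v={"A": -7, "B": 4})

for s2 in ACTIONS:
    print(s2, best_responses(u1, s2, ACTIONS), best_responses(u1_tilde, s2, ACTIONS))
```

With a negative constant the ranking flips (against A, the original best response A becomes the worst action), which is why the proposition needs $a > 0$.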

Cournot Dynamics and Zero-Sum Games
  • Recall: in a zero-sum game (ZSG) the payoffs add up to zero.
  • Proposition 1.2: every 2 x 2 game for which the best-response correspondences have a unique intersection that lies in the interior of the strategy space is best-response equivalent to a zero-sum game.
  • Proof: given G, a 2x2 game whose best-response correspondences have a unique interior intersection,
  • w.l.o.g. assume 1) A is a BR for player 1 against A, and 2) B is a BR for player 2 against A.
  • (2) is justified: if A were also a BR for player 2 against A, then <A,A> would be a pure profile at which the BR correspondences intersect, which contradicts our assumption.

Cournot Dynamics and Zero-Sum Games 2
  • Proof outline: given G, the 2x2 game with a unique interior intersection, we build a zero-sum game that has the same best responses. Player 1's payoffs are $u_1(A,A) = 1$, $u_1(A,B) = 0$, $u_1(B,A) = a$, $u_1(B,B) = b$, and $u_2 = -u_1$. Observe the following:
  • If $a < 1$ then $BR_1(A) = A$, since $u_1(A,A) = 1$ but $u_1(B,A)$ is only $a$.
  • $BR_2(A) = B$, since $u_2(A,B) = -u_1(A,B) = 0$ but $u_2(A,A) = -u_1(A,A) = -1$.
  • Denote by $s_i$ player i's probability of playing A.
  • Claim 1: player 1 is indifferent between A and B iff $s_2 = a s_2 + b(1 - s_2)$.
  • Claim 2: player 2 is indifferent between A and B iff $s_1 + a(1 - s_1) = b(1 - s_1)$.

Proof that player 1 is indifferent between A and B iff $s_2 = a s_2 + b(1 - s_2)$
  • Recall that $s_2$ is the probability that player 2 plays A.
  • (1) If player 1 plays A he gets $u_1(A,\sigma_2) = s_2 u_1(A,A) + (1 - s_2) u_1(A,B) = s_2 (1) + (1 - s_2)(0) = s_2$ (by the game table).
  • (2) If player 1 plays B he gets $u_1(B,\sigma_2) = s_2 u_1(B,A) + (1 - s_2) u_1(B,B) = s_2 a + (1 - s_2) b$ (by the game table).
  • He does not care which to choose iff (1) = (2), i.e. $s_2 = a s_2 + b(1 - s_2)$, as claimed.
  • The proof of Claim 2, regarding player 2's indifference, follows the same path using $u_2 = -u_1$.

Proof cont.: Building the ZSG
Mental note: $s_i = \Pr[\text{player } i \text{ plays } A]$
  • 1 is indifferent between A and B iff $s_2 = a s_2 + b(1 - s_2)$.
  • 2 is indifferent between A and B iff $s_1 + a(1 - s_1) = b(1 - s_1)$.
  • Fixing an intersection point $(s_1, s_2)$, we can solve the two equations for the unknown payoffs: $a = \frac{s_2 - s_1}{1 - s_1}$ and $b = \frac{s_2}{1 - s_1}$. Here $0 < s_i < 1$ because the intersection is interior ($s_i > 0$, otherwise i never plays A).
  • $a < 1 \iff s_2 - s_1 < 1 - s_1 \iff s_2 < 1$, which holds at an interior point. Q.E.D. We already showed that $a < 1$ gives the same best responses we had in the original game G: A for player 1 against A, and B for player 2 against A. The remaining best responses also match: $b > 0$ makes B player 1's BR against B, and $b > a$ makes A player 2's BR against B.
  • To sum up: it should have been obvious that $(s_1, s_2)$ is a Nash equilibrium; the point was to find a 2x2 ZSG that has the same best responses as the original game G.
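A small exact-arithmetic check of the construction, as a sketch (the function name and the sample intersection points are hypothetical). Solving Claims 1 and 2 for the unknown payoffs gives $a = (s_2 - s_1)/(1 - s_1)$ and $b = s_2/(1 - s_1)$; the code builds player 1's payoff matrix with $u_1(A,A) = 1$, $u_1(A,B) = 0$ and verifies both indifference conditions with `fractions.Fraction`.

```python
from fractions import Fraction

def zero_sum_from_intersection(s1, s2):
    """Player 1's payoffs in a 2x2 zero-sum game whose mixed equilibrium is (s1, s2).

    Rows are player 1's actions (A, B), columns player 2's actions (A, B);
    player 2 receives the negative. u1(A,A) = 1 and u1(A,B) = 0 as on the slides,
    and a = u1(B,A), b = u1(B,B) solve the two indifference conditions.
    """
    assert 0 < s1 < 1 and 0 < s2 < 1, "the intersection must be interior"
    a = (s2 - s1) / (1 - s1)
    b = s2 / (1 - s1)
    return [[Fraction(1), Fraction(0)], [a, b]]

s1, s2 = Fraction(1, 3), Fraction(3, 4)
(u_AA, u_AB), (u_BA, u_BB) = zero_sum_from_intersection(s1, s2)
print("a =", u_BA, " b =", u_BB)
# Claim 1: player 1 is indifferent when player 2 plays A with probability s2.
print(s2 * u_AA + (1 - s2) * u_AB == s2 * u_BA + (1 - s2) * u_BB)
# Claim 2: player 2 (payoff -u1) is indifferent when player 1 plays A with probability s1.
print(s1 * u_AA + (1 - s1) * u_BA == s1 * u_AB + (1 - s1) * u_BB)
```

Exact fractions avoid any floating-point doubt in the equality checks; for $(s_1, s_2) = (1/3, 3/4)$ the construction gives $a = 5/8 < 1$ and $b = 9/8$.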

Strategic-Form Games
  • Finite action sets.
  • One-shot, simultaneous-move games.
  • The players, their strategy spaces and their payoff functions make up the strategic form of a game.

Nash and Correlated Nash
  • A game can have several Nash equilibria, <A,A>, <B,B> and <(1/2,1/2), (1/2,1/2)>, but the payoffs may be different: <A,A> gives 2 to each player, while <(1/2,1/2), (1/2,1/2)> gives 1 to each.
  • Let's question the robustness of the mixed-strategy Nash point.
  • Intuitively, at the mixed equilibrium the players are indifferent (in real life: play A or B, whatever), so one player may believe that the other plays A with slightly higher probability. He then wants to switch to pure A, so the robustness of the mixed Nash seems questionable.
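The fragility argument is plain arithmetic; the sketch below assumes the symmetric coordination game used in these slides (2 to each player on <A,A> and <B,B>, 0 otherwise) and a hypothetical helper `expected_payoff`.

```python
from fractions import Fraction

# Symmetric coordination game: 2 each on <A,A> and <B,B>, 0 otherwise.
U = {("A", "A"): 2, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 2}

def expected_payoff(action, p_opponent_A):
    """Expected payoff of a pure action when the opponent plays A with probability p."""
    return p_opponent_A * U[(action, "A")] + (1 - p_opponent_A) * U[(action, "B")]

half = Fraction(1, 2)
# At the mixed equilibrium both actions earn 1, so each player is indifferent...
print(expected_payoff("A", half), expected_payoff("B", half))
# ...but a belief tilted slightly toward A makes pure A the unique best response.
tilted = Fraction(51, 100)
print(expected_payoff("A", tilted) > expected_payoff("B", tilted))
```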

Nash and Correlated Nash
  • A Nash equilibrium is strict if for each player i, $s_i$ is the unique best response to $s_{-i}$.
  • Only pure-strategy equilibria can be strict: if a mixed strategy is a BR, then so is every pure strategy in its support (otherwise there would be no point in including it), so the BR is not unique.
  • Recall: the support of a mixed strategy is the set of pure strategies that are played with positive probability.

Some Questions in the Theory of Games
  • When and why should we expect play to correspond to a Nash equilibrium?
  • If there are several Nash equilibria, which one should we expect to occur?
  • In the previous example, in the absence of coordination, we face the possibility that player 1 expects NE1 = <A,A> and plays A, while the opponent expects NE2 = <B,B> and plays B, resulting in the non-equilibrium outcome profile <A,B>.

The Idea of a Learning-Based Explanation of Equilibrium
  • Intuitively, the history of observations can provide a way for the players to coordinate their expectations on one of the two pure-strategy equilibria.
  • Typically, learning models predict that this coordination will eventually occur, with the choice of which of the two equilibria arises left to initial conditions or to random chance.

The Idea of a Learning-Based Explanation of Equilibrium
  • For the history to serve this coordination role, the sequence of actions played must eventually become constant, or at least readily predictable by the players. Of course, there is no presumption that this is always the case.
  • Perhaps, rather than going to a Nash equilibrium, players wander around the space aimlessly, or perhaps play lies in some set of alternatives larger than the set of Nash equilibria?

The Idea of a Learning-Based Explanation of Equilibrium
  • For the simple coordination game (symmetric, with <2,2> and <0,0> payoffs) there is no reason to think that any learning process will prefer one Nash equilibrium over the other.
  • What if we alter it so that there is a better Nash equilibrium? Will the players learn to play the <A,A> equilibrium?

Correlated Nash (Aumann 74)
  • Suppose the players have access to randomizing devices that are privately viewed.
  • If each player chooses a strategy according to his own randomizing device, the result is a probability distribution over strategy profiles, denoted $\mu \in \Delta(S)$.
  • Unlike a profile of mixed strategies, which is by definition uncorrelated, such a distribution may be correlated.

Correlated Nash: Jordan's Matching Pennies
  • 3 players.
  • Each chooses H or T.
  • Payoffs are 1 or -1 only.
  • 1 wins if he matches 2.
  • 2 wins if he matches 3.
  • 3 wins by not matching 1.
  • This game has a unique NE: each player plays (1/2, 1/2).
  • However, it has many correlated NE.

[Figure: payoff tables for the cases "Player 3 plays H" and "Player 3 plays T"]
Correlated Nash: Jordan's Matching Pennies
  • C-NE: the uniform distribution over these 6 profiles: (H,H,H), (H,H,T), (H,T,T), (T,T,T), (T,T,H), (T,H,H).
  • Each player plays H with probability 1/2.
  • No weight is placed on (H,T,H) or on (T,H,T), so the players' play is not independent (it is correlated).
  • For player 1: when his device tells him to play H, his opponents play (H,H), (H,T) or (T,T), with probability 1/3 each. Since his goal is to match 2, he wins 2/3 of the time by playing H and only 1/3 of the time if he plays T. Similarly, when told to play T, his opponents can only be playing (T,T), (T,H) or (H,H); now tails wins 2/3 of the time, against 1/3 for heads. So following the recommendation is a best response and he has no incentive to deviate; the same check works for players 2 and 3.
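The argument above can be checked mechanically for all three players. This sketch uses the standard correlated-equilibrium condition (no player gains by deviating, conditional on his own recommended action) with the ±1 payoffs of the slide; profiles are written as strings such as "HHT", and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def payoff(player, profile):
    """Jordan's three-player matching pennies: +1 for a win, -1 for a loss."""
    s1, s2, s3 = profile
    if player == 0:
        win = s1 == s2          # 1 wins by matching 2
    elif player == 1:
        win = s2 == s3          # 2 wins by matching 3
    else:
        win = s3 != s1          # 3 wins by not matching 1
    return 1 if win else -1

support = ["HHH", "HHT", "HTT", "TTT", "TTH", "THH"]
mu = {p: Fraction(1, 6) for p in support}   # uniform over the six profiles

def is_correlated_equilibrium(mu):
    """True if no player gains by deviating from any recommended action."""
    for i in range(3):
        for rec, dev in product("HT", repeat=2):
            gain = Fraction(0)
            for prof, prob in mu.items():
                if prof[i] != rec:
                    continue
                deviated = prof[:i] + dev + prof[i + 1:]
                gain += prob * (payoff(i, deviated) - payoff(i, prof))
            if gain > 0:
                return False
    return True

print(is_correlated_equilibrium(mu))
# Each player still plays H with probability 1/2.
print([sum(q for p, q in mu.items() if p[i] == "H") for i in range(3)])
```

For contrast, a point mass on (H,H,H) fails the check, because player 3 gains by switching to T.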

Why Is Correlated Nash of Significance?
  • Hint: cycles create correlation between profiles.
  • Informally, a cycle is a finite sequence of profiles $s^0, s^1, \dots, s^k$ such that $s^0 = s^k$.
  • Cournot play can exhibit cycles (example on the next slide).
  • So cycles ⇒ correlation ⇒ correlated Nash.

Cournot Cycle: Matching Pennies
  • 3 players (heads, tails).
  • 1 wants to match 2.
  • 2 wants to match 3.
  • 3 wants to un-match 1.
  • Cournot means each player assumes his opponents play the same as in their last step.
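Cournot adjustment in this game is easy to trace in code. In the sketch below (function names hypothetical), play started at (H,H,H) cycles through exactly the six profiles that carry weight in the correlated equilibrium above, so the time average over one cycle is that uniform distribution.

```python
def cournot_step(profile):
    """Each player best-responds to the opponents' actions from the previous period."""
    s1, s2, s3 = profile
    flip = {"H": "T", "T": "H"}
    return (s2,          # 1 wants to match 2
            s3,          # 2 wants to match 3
            flip[s1])    # 3 wants to un-match 1

def cournot_path(start, steps):
    path = [start]
    for _ in range(steps):
        path.append(cournot_step(path[-1]))
    return path

print(["".join(p) for p in cournot_path(("H", "H", "H"), 6)])
```

Started at (H,T,H) instead, play alternates between (H,T,H) and (T,H,T), the two profiles the correlated equilibrium leaves out.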

  • Introduction to the common models of learning in games
  • Cournot adjustment
  • Fictitious play and Nash equilibria
  • Motivation
  • Definitions
  • Results
  • Generalizations of fictitious play

Fictitious Play: Introduction
  • Motivation:
  • Repeated game, stationarity assumption (the opponent's strategy is assumed not to change).
  • Each player forms a belief about his opponent's strategy by looking at what happened in the past.
  • The player plays a best response according to his/her belief.

Two-Player Fictitious Play: Notation
  • S1 and S2 are the finite action spaces of players one and two, respectively.
  • S1 = {?, ?, ?}
  • S2 = {?, ?, ?}
  • u1, u2: the players' payoff functions
  • u1(?, ?) = 15
  • For mixed strategies we take expectations:
  • u1(<½,½>, <¼,¾>) = u1(
  • u1(?, <¼,¾>) = ¼ u1(?, ?) + ¾ u1(?, ?)
  • Player i is p_i, the opponent is p_{-i}, i = 1, 2.
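The expectation rule for mixed strategies can be written as a double sum, $u_1(\sigma_1, \sigma_2) = \sum_{s_1, s_2} \sigma_1(s_1)\,\sigma_2(s_2)\,u_1(s_1, s_2)$. The slide's own payoff symbols were lost, so the 2x2 table below is hypothetical; the mixed strategies are the slide's $\langle ½,½ \rangle$ and $\langle ¼,¾ \rangle$.

```python
from fractions import Fraction

def mixed_payoff(u, sigma1, sigma2):
    """u1(sigma1, sigma2) = sum over pure profiles of sigma1(s1) * sigma2(s2) * u1(s1, s2)."""
    return sum(p1 * p2 * u[i][j]
               for i, p1 in enumerate(sigma1)
               for j, p2 in enumerate(sigma2))

# Hypothetical 2x2 payoff table for player 1 (rows: own actions, columns: opponent's).
u1 = [[4, 0],
      [1, 2]]
sigma1 = [Fraction(1, 2), Fraction(1, 2)]
sigma2 = [Fraction(1, 4), Fraction(3, 4)]
print(mixed_payoff(u1, sigma1, sigma2))
```

Equivalently, one can first compute each pure action's payoff against $\langle ¼,¾ \rangle$ (here 1 and 7/4, as in the last line of the slide) and then average them with weights ½, ½, giving 11/8.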

Two-Player Fictitious Play
  • Notion of belief:
  • a prediction of the opponent's action distribution, i.e. the degree to which 1 believes 2 will play ?
  • Assume players choose their action in each period to maximize their expected payoff with respect to their belief for the current period.

Two-Player Fictitious Play: Forming Beliefs
  • Player i starts with a weight function $K^0_i$
  • $K^0_i : S_{-i} \to \mathbb{R}_+$
  • For example
  • K0i?,,? ? ?
  • K0i()4
  • As the game is repeated, K is updated.

Two-Player Fictitious Play: Belief Update
  • If some action was played by the opponent (!) last time, we add 1 to its count:
  • $K^t_i(s_{-i}) = K^{t-1}_i(s_{-i}) + \begin{cases} 1 & \text{if } s^{t-1}_{-i} = s_{-i} \\ 0 & \text{otherwise} \end{cases}$
  • That's a complicated way of saying that $K(s)$ simply counts the number of times the opponent played s (on top of the initial weight $K^0$).

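Two-Player Fictitious Play: Using Frequencies to Form Beliefs
  • Given the weight (frequency) vector K,
  • each player forms a probability vector $\gamma$ over his opponent's actions:
  • $\gamma^t_i(s_{-i}) = \frac{K^t_i(s_{-i})}{\sum_{\tilde s_{-i} \in S_{-i}} K^t_i(\tilde s_{-i})}$
  • His belief can be said to be that $\Pr[-i \text{ plays } s_{-i}] = K^t_i(s_{-i}) / \text{(number of steps)}$ (ignoring the initial weights).
  • Simple normalization.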

The formula for $\gamma^t_i(s_{-i})$ reads: the belief player i holds at time t regarding the probability that his opponent plays $s_{-i}$ at time t.
Two-Player Fictitious Play: Using Frequencies 2
My belief is that my opponent plays ? with probability ½, ? with probability ¼ and ? with probability ¼; looking at my payoff table, by playing ? I can maximize my expected payoff.
  • We now have a belief about how the opponent plays.
  • A FP is any rule $\rho^t_i$ which assigns a best-response action to the belief $\gamma^t_i$.
  • Example:
  • $\rho_1(\langle ½,¼,¼ \rangle) = $ ? (extends naturally to mixed strategies)
  • This implies that $u_1(?, \langle ½,¼,¼ \rangle)$ is better for player 1 than the payoff of any other action against this belief.
Two-Player Fictitious Play remarks
  • Many BR are possible for a given belief set
  • An example of such rules ? may be
  • Always prefer pure action over mixed action
  • Pick the best response for which your action
    index is least, (thats the limit of my
  • (both of course must still be best responses)

Two-Player Fictitious Play: Interpretation
  • Bayesian inference:
  • Player i believes the opponent's play corresponds to a sequence of i.i.d. multinomial random variables with a fixed but unknown distribution.
  • Player i's prior over that unknown distribution takes the form of a Dirichlet distribution.
  • i's prior and posterior beliefs correspond to distributions over the set $\Delta(S_{-i})$ of probability distributions over $S_{-i}$.
  • The distribution over the opponent's strategies, $\gamma^t_i$, is the induced marginal distribution over pure strategies.
  • If beliefs over $\Delta(S_{-i})$ are denoted $\mu_i$, then we have $\gamma^t_i(s_{-i}) = \int_{\Delta(S_{-i})} \sigma_{-i}(s_{-i})\, \mu^t_i(d\sigma_{-i})$.

Two-Player Fictitious Play: Interpretation
  • Denote the marginal empirical distribution as $d^t_i$.
  • The assessment $\gamma$ is not the same as $d$ because of the influence of i's prior belief.
  • This prior has the form of a fictitious sample observed before the game started.
  • As observations are incorporated into $\gamma$, it converges to $d$ (the empirical distribution): the prior weight stays fixed while the number of observations grows.
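A Monte Carlo sketch of the Bayesian interpretation. It relies on the standard conjugacy fact that a Dirichlet prior with parameters $K^0$ plus multinomial observations gives a Dirichlet posterior with parameters $K^0 + \text{counts}$, whose mean is $K^t / \sum K^t$, which is exactly the fictitious-play belief. The prior weights, counts and function names are hypothetical; Dirichlet draws are produced by normalizing Gamma draws from Python's `random.gammavariate`.

```python
import random

def dirichlet_sample(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def posterior_mean_mc(prior, counts, n=50_000, seed=0):
    """Monte Carlo estimate of the posterior mean of the opponent's mixed strategy.

    With a Dirichlet(prior) prior and multinomial observations, the posterior is
    Dirichlet(prior + counts) (conjugacy), so we sample from that directly.
    """
    rng = random.Random(seed)
    post = [a + c for a, c in zip(prior, counts)]
    sums = [0.0] * len(post)
    for _ in range(n):
        for k, x in enumerate(dirichlet_sample(post, rng)):
            sums[k] += x
    return [s / n for s in sums]

prior = [1.0, 2.0, 1.0]   # hypothetical initial weights K^0 (the "fictitious sample")
counts = [5, 1, 2]        # hypothetical observed play of the opponent
K = [a + c for a, c in zip(prior, counts)]
fp_belief = [k / sum(K) for k in K]
empirical = [c / sum(counts) for c in counts]
print("FP belief:", fp_belief)
print("posterior mean (MC):", [round(v, 3) for v in posterior_mean_mc(prior, counts)])
print("empirical d:", empirical)
```

The FP belief (½, ¼, ¼) matches the posterior mean up to sampling noise, and both differ from the empirical distribution (0.625, 0.125, 0.25) only because of the prior weights, an influence that shrinks as the counts grow.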

Two-Player Fictitious Play: Interpretation
  • Notes:
  • As long as the initial weights are positive, the beliefs stay positive.
  • The belief reflects the conviction that the opponent's strategy is constant and unknown.
  • It may be wrong if the process cycles.
  • However, any finite sequence that looks like a cycle is actually consistent with the assumption that the world is constant and the observations are a random sample.
  • If cycles persist, we might expect i to notice it, but in any case his beliefs will not be falsified in the first few periods, as they were in the Cournot process.

Asymptotic Behavior: does play converge?
  • Sufficient conditions
  • Proposition 2.1:
  • (1) If s is a strict Nash equilibrium and s is played at time t in the process of FP, then s is played at all subsequent dates.
  • (2) Any pure-strategy steady state of FP must be a Nash equilibrium.

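If s is a strict Nash and is played at time t, s is played at all subsequent dates
  • Proof
  • Suppose the players' beliefs $\gamma^t_i$ are such that the chosen actions form the strict Nash equilibrium s.
  • Believe me that when profile s is played at time t, each player's beliefs at t+1 are a convex combination of $\gamma^t_i$ and a point mass on $s_{-i}$: $\gamma^{t+1}_i = (1 - \alpha^t)\gamma^t_i + \alpha^t \delta(s_{-i})$.
  • Since expected payoffs are linear in beliefs, we get, for every action $s'_i$: $u_i(s'_i, \gamma^{t+1}_i) = (1 - \alpha^t) u_i(s'_i, \gamma^t_i) + \alpha^t u_i(s'_i, s_{-i})$.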

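If s is a strict Nash and is played at time t, s is played at all subsequent dates
  • We want to show that the payoff of $s_i$ against $\gamma^{t+1}_i$ is still better than the payoff of any other action against $\gamma^{t+1}_i$.
  • Now $s_i$ was a strict BR to $\gamma^t_i$.
  • This should be obvious for the first term (by the assumption that $s_i$ is a strict BR to $\gamma^t_i$).
  • For the second term, note that against the point mass $s_i$ is strictly better, because the profile $\langle s_i, s_{-i} \rangle$ is a strict Nash equilibrium, which was our assumption.
  • Both terms favor $s_i$ strictly, so $s_i$ remains the unique BR at t+1; s is played again, and by induction at every later date.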

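So, what is a point mass on $s_{-i}$, and why is $\gamma^{t+1}_i$ a convex combination of $\gamma^t_i$ and of $\delta(s_{-i})$?
  • I need to show you that $\gamma^{t+1}_i = (1 - \alpha^t)\gamma^t_i + \alpha^t \delta(s_{-i})$.
  • For clarity, let's say t = 10 and there are 2 players.
  • Let's say s = <(½, ½), (¼, ¾)>.
  • $S_{-i} = S_2 = \{C, D\}$; look at $\gamma^{11}_1(C)$.
  • Recall $\gamma^{10}_1(C) = K^{10}_1(C)/10$ (ignore the prior; it does not matter).
  • Suppose that at time 10 player 2 actually played C (he played a mixed strategy, which is interpreted as playing C ¼ of the time).
  • Then $\gamma^{11}_1(C) = (K^{10}_1(C) + 1)/11 = \frac{10}{11}\gamma^{10}_1(C) + \frac{1}{11} \cdot 1$, while $\gamma^{11}_1(D) = \frac{10}{11}\gamma^{10}_1(D) + \frac{1}{11} \cdot 0$. This is exactly the convex combination with $\alpha^{10} = 1/11$, where the point mass $\delta(C)$ puts probability 1 on C and 0 on D.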
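The same identity can be checked numerically: recomputing the belief from the updated counts gives exactly the convex combination of the old belief and a point mass, with weight $\alpha = 1/(n + 1)$ on the newest observation when n observations have been counted. The counts and function names below are hypothetical.

```python
from fractions import Fraction

def next_belief_by_counting(counts, observed):
    """Recompute the belief from scratch after adding the new observation."""
    new = dict(counts)
    new[observed] += 1
    total = sum(new.values())
    return {s: Fraction(k, total) for s, k in new.items()}

def next_belief_convex(counts, observed):
    """Same belief as a convex combination of the old belief and a point mass."""
    total = sum(counts.values())
    alpha = Fraction(1, total + 1)                       # weight of the newest observation
    old = {s: Fraction(k, total) for s, k in counts.items()}
    point_mass = {s: 1 if s == observed else 0 for s in counts}
    return {s: (1 - alpha) * old[s] + alpha * point_mass[s] for s in counts}

# Player 2 played C 3 times and D 7 times in the first 10 periods (hypothetical counts).
counts = {"C": 3, "D": 7}
print(next_belief_by_counting(counts, "C") == next_belief_convex(counts, "C"))
print(next_belief_convex(counts, "C"))
```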

2nd part: any pure-strategy steady state of FP must be a Nash equilibrium
  • A steady state is a strategy profile that is played in every step after perhaps a finite time.
  • Ideas?
  • If play remains at a pure-strategy profile, then eventually the assessments become concentrated at this profile.
  • If it were not a Nash equilibrium, then for one of the players, playing what he played would not be a BR against the opponents' profile; with finitely many actions, it would then also fail to be a BR against beliefs close enough to that profile. This is a contradiction to how FP works,
  • since all players always play a BR according to their beliefs.
  • Food for thought: why does this argument not work for a mixed-strategy profile?

To Conclude
  • We wanted to show that if s is a strict Nash equilibrium and s is played at time t in the process of FP, then s is played at all subsequent dates.
  • We showed it by looking at what happens to the players' beliefs and proving that, given the new beliefs, the actions are still strict BRs.
  • This means the system is at a steady state.
  • We also showed that a pure-strategy steady state is a Nash equilibrium.

No Pure Nash ⇒ FP Can't Converge to a Pure Steady State
  • Matching pennies
  • For example:
  • At time 3 player I believes that II prefers Tails, so he plays Tails to match.
  • But II plays Heads, so I adds one to Heads.
  • Now Heads > Tails, and I has convinced himself that II will play Heads, so he switches to H.
  • Play cycles and never converges to the Nash equilibrium.

No Pure Nash ⇒ FP Can't Converge to a Pure Steady State
  • If play did converge, it would be in a pure steady state that is not a Nash equilibrium (since matching pennies has no pure Nash equilibrium), but we showed that any pure steady state must be a Nash equilibrium.
  • So it is OK that play does not converge.
  • Interestingly, the empirical distributions over each player's strategies converge to (½, ½), and their product <(½, ½), (½, ½)> is a Nash equilibrium.
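A short simulation of the story on the previous slides, as a sketch: two-player fictitious play in matching pennies, where player 1 wants to match and player 2 wants to mismatch. The initial weights are hypothetical and use half-integer gaps so that best responses never tie. Play keeps switching, with runs that get longer over time, while both empirical frequencies of H approach ½.

```python
def fp_matching_pennies(periods, k1=(1.0, 1.5), k2=(1.5, 1.0)):
    """Two-player fictitious play in matching pennies.

    Player 1 wants to match, player 2 wants to mismatch. k1 holds player 1's
    weights on player 2's (H, T); k2 holds player 2's weights on player 1's (H, T).
    The half-integer gaps (a hypothetical choice) rule out ties in the best responses.
    """
    k1, k2 = list(k1), list(k2)
    history = []
    for _ in range(periods):
        a1 = "H" if k1[0] > k1[1] else "T"   # match the action 2 seems more likely to play
        a2 = "T" if k2[0] > k2[1] else "H"   # mismatch the action 1 seems more likely to play
        history.append((a1, a2))
        k1[0 if a2 == "H" else 1] += 1       # each player counts the opponent's play
        k2[0 if a1 == "H" else 1] += 1
    return history

hist = fp_matching_pennies(20_000)
freq1_H = sum(a1 == "H" for a1, _ in hist) / len(hist)
freq2_H = sum(a2 == "H" for _, a2 in hist) / len(hist)
switches1 = sum(hist[t][0] != hist[t - 1][0] for t in range(1, len(hist)))
print(hist[:6])
print(round(freq1_H, 3), round(freq2_H, 3), switches1)
```

The first periods already show the pattern described on the slide: (T,T), then (T,H) twice, then (H,H) three times, and so on, each profile lasting one period longer than the previous one.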

Asymptotic Behavior
  • Proposition 2.2:
  • If the empirical distribution over each player's choices converges, the strategy profile corresponding to the product of these distributions is a Nash equilibrium.
  • Proof:
  • Intuitively, if the empirical distribution converges, then the beliefs converge to the same thing; hence, if the limit were not a Nash equilibrium, players would move away from it.
  • In general, it is enough for this that the beliefs are asymptotically empirical; the process need not be FP.

Asymptotic Behavior
  • More results (proof omitted):
  • The empirical distribution converges if the game is
  • (1) 2x2 with generic payoffs,
  • (2) zero-sum, or
  • (3) solvable by iterated strict dominance.
  • The empirical distribution, however, need not converge! Two examples follow.

Example: Shapley (1964)
The Nash equilibrium is at (1/3, 1/3, 1/3) for both players. If the initial weights lead to <T,M>, play cycles and the diagonals are never played (<T,L>, <D,M>, <T,M>). The number of consecutive periods each profile is played increases sufficiently fast that the empirical distributions never converge.
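The slide's payoff table did not survive extraction, so this sketch assumes one standard version of Shapley's game: player 1 gets 1 at <T,M>, <M,R>, <D,L>; player 2 gets 1 at <T,R>, <M,L>, <D,M>; all other payoffs, including the diagonal, are 0. This version has the unique equilibrium (1/3, 1/3, 1/3) for both players. The initial weights are hypothetical, chosen so that play starts at <T,M> with no ties; the code shows the off-diagonal cycle and run lengths that keep growing.

```python
# Assumed version of Shapley's game: rows T, M, D for player 1; columns L, M, R.
U1 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
U2 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
ROWS, COLS = "TMD", "LMR"

def fictitious_play(periods, k1, k2):
    """k1: player 1's weights on columns L, M, R; k2: player 2's weights on rows T, M, D.

    Best-responding to raw weights is the same as best-responding to the normalized
    belief, since normalization only rescales every expected payoff by the same factor.
    """
    k1, k2 = list(k1), list(k2)
    history = []
    for _ in range(periods):
        r = max(range(3), key=lambda i: sum(U1[i][j] * k1[j] for j in range(3)))
        c = max(range(3), key=lambda j: sum(U2[i][j] * k2[i] for i in range(3)))
        history.append(ROWS[r] + COLS[c])
        k1[c] += 1   # player 1 counts player 2's action
        k2[r] += 1   # player 2 counts player 1's action
    return history

def run_lengths(history):
    """Lengths of maximal blocks of consecutive identical profiles."""
    runs = [1]
    for prev, cur in zip(history, history[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return runs

hist = fictitious_play(2000, k1=[0.1, 1.0, 0.3], k2=[0.2, 0.4, 1.0])
print(hist[0], sorted(set(hist)))
print(run_lengths(hist)[:10])
```

With these weights the runs last 1, 2, 3, 6, 9, 14, 21, 32, 47, 70 periods, growing roughly by a factor of 1.5, so the latest run always carries a large share of the history and the empirical frequencies keep swinging.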
Example due to Jordan
  • Increase the diagonal payoffs to (1/2, 1/2).
  • We get rock-paper-scissors: a constant-sum (hence strategically zero-sum) game with the same Nash equilibrium.
  • Here the empirical distributions do converge; there are still cycles, but the number of times a profile is repeated within a sequence grows slowly.
  • Note that this does not conflict with our previous statements, namely:
  • Here is a ZSG whose empirical distributions converge but whose play is not steady (it cycles).
  • Had the empirical distribution converged to a pure-strategy Nash equilibrium, the fact that play cycled would have contradicted our earlier results.

Payoffs in Fictitious Play
  • The question we deal with here: if FP learns the distributions, then asymptotically it should yield the same utility that would be achieved if the frequency distribution were known in advance.
  • Here we suppose there are more than 2 players, and their assessments track the joint distribution of opponents' strategies.

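Payoffs in Fictitious Play: Notation
  • Empirical joint distribution of the opponents' play: $D^t_{-i}$
  • Best payoff against the empirical distribution: $U^t_i = \max_{\sigma_i} u_i(\sigma_i, D^t_{-i})$
  • Time average of realized payoffs: $\bar u^t_i = \frac{1}{t}\sum_{\tau=1}^{t} u_i(s^\tau)$
  • Definition:
  • Fictitious play is $\varepsilon$-consistent along a history if there is a T such that for any $t \ge T$, $\bar u^t_i + \varepsilon \ge U^t_i$ for all i.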
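The two quantities in the definition can be computed directly from a history. This sketch (function names hypothetical) returns the gap $U^t_i - \bar u^t_i$ for one player; $\varepsilon$-consistency asks that this gap be at most $\varepsilon$ for all $t \ge T$. The matching-pennies histories are illustrative: a player who always mismatches an alternating opponent falls a full payoff unit short of what the empirical distribution promises, while a player who always plays H gets exactly what it promises.

```python
from collections import Counter

def consistency_gap(history, u, own_actions):
    """U^t - ubar^t for player i along a history of (own action, opponents' profile) pairs.

    U^t: best payoff against the empirical joint distribution of opponents' play.
    ubar^t: time average of the payoffs player i actually realized.
    """
    t = len(history)
    empirical = Counter(opp for _, opp in history)   # D^t, stored as counts
    best = max(sum(n * u(s, opp) for opp, n in empirical.items()) for s in own_actions) / t
    realized = sum(u(s, opp) for s, opp in history) / t
    return best - realized

def u_match(s, opp):
    """Matching pennies payoff for the player who wants to match: +1 on a match, -1 otherwise."""
    return 1 if s == opp else -1

opp_play = ["H", "T"] * 50
mismatching = [("H" if o == "T" else "T", o) for o in opp_play]   # always mismatches
steady = [("H", o) for o in opp_play]                             # always plays H
print(consistency_gap(mismatching, u_match, "HT"), consistency_gap(steady, u_match, "HT"))
```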


  • We don't look at how well the player does globally, but at how well he does compared to his expectations, which are built upon his observations.
  • If FP were not consistent it would be a much less interesting model: it would be as if someone simply played a different game. So we want consistency.

  • If play is consistent, this can be useful: if after some period of time a player sees that his expectations are not fulfilled (by comparing the expected payoff with the actual payoff), he can deduce that something is wrong in his model of the world.

Consistency: Main Result
  • Relate the frequency with which a player switches strategies to the consistency of his play.
  • We define the frequency of switches $\eta^t_i$ to be the fraction of periods $\tau \le t$ for which $s^\tau_i \ne s^{\tau-1}_i$.

Fictitious Play and BR Dynamics
  • Definition:
  • Fictitious play exhibits infrequent switches along a history if for every $\varepsilon > 0$ there is a T such that for any $t \ge T$, $\eta^t_i \le \varepsilon$ for all i.
  • Proposition (Fudenberg and Levine 94):
  • If FP exhibits infrequent switches along a history, then it is $\varepsilon$-consistent along that history for every $\varepsilon > 0$.
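A sketch of the switch frequency $\eta^t_i$ (function names hypothetical). A sequence whose runs keep growing, like the FP cycles discussed earlier, has switch frequency tending to 0 and so exhibits infrequent switches; a player who alternates every period switches almost all the time.

```python
def switch_frequency(actions):
    """Fraction of periods in which the action differs from the previous period's."""
    if len(actions) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(actions, actions[1:]))
    return switches / len(actions)

def growing_runs(k):
    """H once, T twice, H three times, ...: run lengths 1, 2, ..., k."""
    seq = []
    for n in range(1, k + 1):
        seq += ["H" if n % 2 else "T"] * n
    return seq

for k in (10, 100, 1000):
    seq = growing_runs(k)
    print(len(seq), round(switch_frequency(seq), 4))
print(switch_frequency(["H", "T"] * 50))
```

With run lengths 1, 2, ..., k there are k - 1 switches in k(k + 1)/2 periods, so the frequency falls roughly like $2/k$.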

Infrequent switches ⇒ ε-consistency
  • Intuition:
  • Once the prior loses its influence, at each date t player i plays a BR to the empirical distribution of observations through date t-1.
  • On the other hand, if i is not doing, on average, as well as a best response to the empirical distribution, there must be a non-negligible fraction of dates t at which i's action is not a BR to the empirical distribution at the end of date t; but in this case, player i must switch at date t+1. Conversely, infrequent switches imply that most of the time, i's date-t action is a best response to the empirical distribution at the end of date t.

Proof: Notation
  • k: length of the initial history (kept as prior)
  • initial belief
  • best-response strategy of i to
  • player i's expected date-t payoff is

Proof: Summary
  • We showed that if there are not many switches along a player's history, his play is ε-consistent.
  • It means that the actual payoffs do not fall far below the expectation!
  • Again, a player can use this to check whether something is wrong with his model of the environment.

  • Crash introduction to the common models of learning
  • Cournot adjustment
  • Fictitious play and Nash equilibria
  • Motivation
  • Definitions
  • Results
  • Generalizations of fictitious play

Generalizations of Fictitious Play
  • They mainly generalize the update rule for the beliefs.
  • Example: exponential weighting.
  • As long as beliefs are asymptotically empirical, what we showed for FP holds.
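The slide names exponential weighting without details; one common form discounts past counts, $K \leftarrow \beta K + \mathbf{1}[\text{observed}]$, and this sketch assumes that form (the parameter name `discount` and the function name are hypothetical). With $\beta = 1$ it is ordinary fictitious play. With a fixed $\beta < 1$ the newest observation keeps a weight of at least $1 - \beta$, so such beliefs are not asymptotically empirical, and the FP results above do not carry over automatically.

```python
def weighted_belief(observations, actions, discount=1.0, prior=None):
    """Belief from exponentially discounted counts: K <- discount * K + indicator.

    discount = 1.0 recovers ordinary fictitious play; discount < 1 (an assumed
    parameter) puts more weight on recent observations.
    """
    K = dict(prior) if prior else {a: 0.0 for a in actions}
    for obs in observations:
        for a in actions:
            K[a] *= discount
        K[obs] += 1.0
    total = sum(K.values())
    return {a: K[a] / total for a in actions}

obs = ["H"] * 90 + ["T"] * 10                    # the opponent recently switched to T
print(weighted_belief(obs, "HT"))                # plain FP: H 0.9, T 0.1
print(weighted_belief(obs, "HT", discount=0.8))  # recent T's dominate
```

The design trade-off is responsiveness versus stability: discounting reacts quickly when the opponent really changes, but it also keeps beliefs moving forever when play merely fluctuates.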

  • Dynamics of games and a flavor of analysis.
  • Although this is different from Nash analysis, Nash equilibrium is still an important point: we showed that it is a point FP can converge to.
  • FP play is consistent when switches are infrequent.
  • Have a nice summer vacation, and thanks for listening.