Title: Fictitious Play: The Theory of Learning in Games, D. Fudenberg and D. Levine
1 Fictitious Play: The Theory of Learning in Games, D. Fudenberg and D. Levine
- Speaker: Tzur Sayag
- 03/06/2003
2 Do you believe that PM Sharon is serious about the peace process?
- A voter has to decide whether he should support PM Sharon.
  - Belief: Sharon will never evacuate settlements.
  - Action: Vote against the new economics revolution.
- May 24: Sharon announces that occupation is no good.
  - Belief: Sharon will probably never evacuate settlements.
  - Action: Vote against the new economics revolution.
- Jun 5: Sharon meets Abu-Mazen and declares support for a Palestinian state.
  - Belief: It seems Sharon might evacuate the settlements after all.
  - Action: Vote for the new economics revolution.
3 Roadmap
- Introduction to the common models of learning in games
- Cournot adjustment
- Fictitious play and Nash equilibria
- Motivation
- Definitions
- Results
- Generalizations of fictitious play, if we have time
4 Notations
P1 gets a1 and P2 gets b1 if they play (Action1, Action1) respectively.

                      Player 2
                      Action1    Action2
Player 1   Action1    (a1,b1)    (a2,b2)
           Action2    (a3,b3)    (a4,b4)
5 Learning in Games - 1
- Repeated games: same or related
- Fixed-player model
  - A player can teach the opponent to play a best response to a particular action by repeating it over and over.
6 Being Sophisticated: Example
- D is dominant for Bob.
- If Alice learns that Bob only plays D, the game converges to <D,L>.
  - Bob's payoff for <D,L> is 2.
- If Bob is patient, he can always play U and just wait for a while.
- If Bob always plays U,
  - Alice, who thought Bob was going to play D, should shift her play from L to R (since L was only good when Bob actually played D).
- So Bob plays constant U, which leads Alice to play constant R, where her payoff is 2 > 1.
  - In this case Bob gets 3, which is better than 2.
- Bingo!

Payoffs are listed as (Bob, Alice):

            Alice
            L      R
Bob   U     1,0    3,2
      D     2,1    4,0
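The reasoning on this slide can be checked mechanically. The following is a minimal sketch, assuming the table is read as (Bob's payoff, Alice's payoff) with Bob choosing the row; the function names are illustrative and not from the talk.

```python
# Payoffs from the slide's table, read as (Bob, Alice); Bob picks the row.
PAYOFFS = {
    ("U", "L"): (1, 0), ("U", "R"): (3, 2),
    ("D", "L"): (2, 1), ("D", "R"): (4, 0),
}

def alice_best_response(bob_action):
    """Alice's best reply to a fixed action of Bob."""
    return max(("L", "R"), key=lambda a: PAYOFFS[(bob_action, a)][1])

def bob_dominant_action():
    """Return Bob's strictly dominant action, or None if there is none."""
    for cand in ("U", "D"):
        other = "D" if cand == "U" else "U"
        if all(PAYOFFS[(cand, a)][0] > PAYOFFS[(other, a)][0] for a in ("L", "R")):
            return cand
    return None

# Myopic Bob plays his dominant action; patient Bob commits to U.
myopic = ("D", alice_best_response("D"))
teaching = ("U", alice_best_response("U"))
print(myopic, "Bob gets", PAYOFFS[myopic][0])
print(teaching, "Bob gets", PAYOFFS[teaching][0])
```

Committing to the dominated action U pays off for Bob only because Alice adapts to what she observes, which is exactly the incentive the following slides try to rule out.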
7 Being Sophisticated: Abstracting
- Most learning theory relies on models in which the incentive to alter the opponent's future play is small.
  - Locked in for 2 periods
  - Large anonymous population
- Embed a two-player game by pairing players randomly from a large population.
8 Models of Embedding
- Single-pair model
  - A random single pair plays; actions are revealed to everyone
- Aggregate statistic model
  - All players are randomly matched; aggregate outcomes are revealed to everyone
- Random-matching model
  - All players are randomly matched; each player sees only the outcome of his own game
9 Three common models of Learning
- Fictitious play
  - Players observe only their own matches and play a best response to the historical frequencies.
- Partial best-response
  - A fixed portion of the population switches each period from its current action to a BR to the aggregate statistics from the previous period.
- Replicator dynamics
  - The share of the population using each strategy grows proportionally to that strategy's current payoff.
10 Cournot Adjustment: a flavor of analysis
- Two firms, 1 and 2.
- Strategy: choose a quantity $s_i \in [0, \infty)$
- Strategy profile is $\langle s_i, s_{-i} \rangle \in S$
- Utility for i is $u_i(\langle s_i, s_{-i} \rangle)$
- Assume $u_i(\langle \cdot, s_{-i} \rangle)$ is strictly concave
- $BR_i(s_{-i}) = \arg\max_{x \in S_i} u_i(\langle x, s_{-i} \rangle)$
BR is unique: since $u_i$ is strictly concave in i's own strategy, $u_i''$ is negative, so $u_i'$ is a strictly decreasing function and has at most one zero. That means, yes, you guessed it right, $u_i$ has only one extreme point and the max is therefore unique. $u_i'$ can't be constant, since $u_i$ is STRICTLY concave by assumption.
11 Cournot Adjustment Model
- Time periods $t = 0, 1, 2, \ldots$ are discrete
- State profile $\theta^0 \in S$
- In each period each player chooses a pure strategy that is a BR to the opponent's play in the previous period
  - Formally, i chooses $s_i^t = BR_i(s_{-i}^{t-1})$
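To make the adjustment rule concrete, here is a minimal sketch of the process for a textbook linear duopoly. The demand and cost parameters (`a`, `b`, `c`) and the resulting best-response formula are assumptions chosen for illustration; the slides do not specify a payoff function.

```python
def best_response(q_other, a=10.0, b=1.0, c=1.0):
    """BR for profit q * (a - b*(q + q_other) - c), an assumed linear Cournot duopoly."""
    return max(0.0, (a - c - b * q_other) / (2.0 * b))

def cournot_adjustment(q1, q2, periods):
    """Each period both firms best-respond to the rival's previous quantity."""
    path = [(q1, q2)]
    for _ in range(periods):
        q1, q2 = best_response(q2), best_response(q1)
        path.append((q1, q2))
    return path

path = cournot_adjustment(0.0, 0.0, 50)
print(path[1], path[-1])
```

With these parameters the best-response slope is -1/2, so the distance to the steady state (3, 3) halves every period. Steeper reaction curves need not converge.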
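12 Cournot Dynamics: Reaction Curve
[Figure: reaction curves BR1 and BR2 in the ($\theta_1$, $\theta_2$) plane. BR1: for every $\theta_2$ the curve states the BR of player 1 against it; the value for player 1 is the height of the curve at $\theta_2$. From the state $\theta^t = (\theta^t_1, \theta^t_2)$, an arrow marks player 1's new BR if 2 plays $\theta^t_2$. At the intersection of BR1 and BR2: can you convince yourself this point is a Nash?]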
13 Cournot Dynamics
- A movement between profiles such that
  - $\theta^{t+1} = f(\theta^t)$, where $f_i(\theta^t) = BR_i(\theta^t_{-i})$
- A steady state is $\theta^s$ s.t. $\theta^s = f(\theta^s)$
- Once $\theta^t = \theta^s$ the system remains there
- Claim (simple): $\theta^s$ is a Nash
  - Proof: by definition, for every player i, $\theta^s_i = BR_i(\theta^s_{-i})$, so players don't want to move.
- SO EVERY STEADY STATE IS A NASH EQUILIBRIUM
14 Cournot Dynamics: oblivious to linear transformations
- Proposition 1.1: Suppose $u'_i(s) = a\,u_i(s) + v_i(s_{-i})$ for all players i, with $a > 0$. Then $u'$ and $u$ are best-response equivalent.
- Proof
  - $v_i(s_{-i})$ depends only on the opponents' play, so it does not change the ranking of my own actions.
  - Multiplying all payoffs by the same positive constant a has no effect on the ranking.
- So a transformation that leaves preferences, and consequently best responses, unchanged will give rise to the same dynamic learning process.
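Proposition 1.1 can be spot-checked numerically. The sketch below uses a hypothetical 2x2 payoff table for one player, an arbitrary positive scale `a`, and an arbitrary shift `v` that depends only on the opponent's action; all of these numbers are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical payoffs u1[(own action, opponent action)].
U1 = {("A", "A"): 2, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 1}
a = 3                      # positive scale factor
v = {"A": -5, "B": 7}      # shift depending only on the opponent's action
U1_T = {(s, o): a * U1[(s, o)] + v[o] for (s, o) in U1}

def best_responses(payoff, p_opp_A):
    """Set of pure best responses when the opponent plays A with probability p_opp_A."""
    value = {
        s: p_opp_A * payoff[(s, "A")] + (1 - p_opp_A) * payoff[(s, "B")]
        for s in ("A", "B")
    }
    top = max(value.values())
    return {s for s, val in value.items() if val == top}

# Exact arithmetic so that the indifference point p = 1/3 is detected as a tie.
grid = [Fraction(k, 12) for k in range(13)]
same = all(best_responses(U1, p) == best_responses(U1_T, p) for p in grid)
print(same)
```

The grid includes the indifference belief 1/3, where both payoff tables return the full set {A, B}.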
15 Cournot Dynamics and Zero-Sum Games
- Recall: payoffs in a ZSG add to zero.
- Proposition 1.2: every 2 x 2 game for which the best-response correspondences have a unique intersection that lies in the interior of the strategy space is best-response equivalent to a zero-sum game.
- Proof: given G, a 2x2 game with a unique interior intersection,
  - w.l.o.g. assume 1) A is a BR for player 1 against A, and 2) B is a BR for player 2 against A.
  - If A were also a BR for player 2 against A, then <A,A> would be an intersection of the BR correspondences at a pure profile, which contradicts our assumption.
16 Cournot Dynamics and Zero-Sum Games 2
- Proof outline: given G, the 2x2 game with a unique interior intersection, we build a zero-sum game that has the same best responses. Observe the ZSG in the table below.
  - If $a < 1$ then $BR_1(A) = A$, since $u_1(A,A) = 1$ but $u_1(B,A)$ is only $a$.
  - $BR_2(A) = B$, since $u_2(A,B) = 0$ but $u_2(A,A)$ is only $-1$.
- Denote by $s_i$ player i's probability of playing A.
- Claim 1: player 1 is indifferent between A and B iff $s_2 = a s_2 + b(1 - s_2)$
- Claim 2: player 2 is indifferent between A and B iff $s_1 + a(1 - s_1) = b(1 - s_1)$

Payoffs are listed as (player 1, player 2); player 1 chooses the row:

         A         B
  A    (1,-1)    (0,0)
  B    (a,-a)    (b,-b)
17 Proof of Claim 1: player 1 is indifferent between A and B iff $s_2 = a s_2 + b(1 - s_2)$
- Assume $s_2 = a s_2 + b(1 - s_2)$ (*), where $s_2$ is the probability that 2 plays A.
- (1) If player 1 plays A he gets
  - $u_1(A, s_2) = s_2 u_1(A,A) + (1 - s_2) u_1(A,B) = s_2 (1) + (1 - s_2)(0) = s_2$ (by the game table)
- (2) If player 1 plays B he gets
  - $u_1(B, s_2) = s_2 u_1(B,A) + (1 - s_2) u_1(B,B) = s_2 a + (1 - s_2) b$ (by the game table)
- So if (1) = (2) he does not care which to choose: (1) = (2) $\iff s_2 = s_2 a + (1 - s_2) b$, as required.
- The proof of Claim 2, regarding 2's indifference, follows the same path.
18 Proof cont.: Building the ZSG
Mental note: $s_i$ = Pr[player i plays A]
- 1 is indifferent between A and B iff $s_2 = a s_2 + b(1 - s_2)$
- 2 is indifferent between A and B iff $s_1 + a(1 - s_1) = b(1 - s_1)$
- Fixing an intersection point $(s_1, s_2)$, we can solve these two linear equations for the unknown payoffs a, b: $a = (s_2 - s_1)/(1 - s_1)$ and $b = s_2/(1 - s_1)$.
- The intersection is interior, so $0 < s_i < 1$ (otherwise i would never, or always, play A). Hence $s_2 - s_1 < 1 - s_1$, which implies $a < 1$. Q.E.D.
- We already showed that when $a < 1$ we get the same best responses we had in the original game G: A for player 1 against A, B for player 2 against A.
- To sum up: it should have been obvious that $(s_1, s_2)$ is a Nash; the point was to find a 2x2 ZSG which has the same best responses as the original game.
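As a sanity check on the construction, the sketch below solves the two indifference conditions (Claim 1 and Claim 2) directly for a and b and verifies both indifferences in the zero-sum table with rows (1,-1), (0,0) and (a,-a), (b,-b). The intersection point used is a hypothetical example, and the closed form in `zero_sum_payoffs` is obtained by elementary algebra from the two claims rather than quoted from the talk.

```python
from fractions import Fraction

def zero_sum_payoffs(s1, s2):
    """Solve s2 = a*s2 + b*(1 - s2) and s1 + a*(1 - s1) = b*(1 - s1) for (a, b)."""
    a = (s2 - s1) / (1 - s1)
    b = s2 / (1 - s1)
    return a, b

s1, s2 = Fraction(1, 3), Fraction(3, 5)   # hypothetical interior intersection
a, b = zero_sum_payoffs(s1, s2)

# Player 1 (row) against player 2 playing A with probability s2.
u1_A = s2 * 1 + (1 - s2) * 0
u1_B = s2 * a + (1 - s2) * b
# Player 2 (column) against player 1 playing A with probability s1.
u2_A = s1 * (-1) + (1 - s1) * (-a)
u2_B = s1 * 0 + (1 - s1) * (-b)

print(a, b, u1_A == u1_B, u2_A == u2_B)
```

For this point the sketch gives a = 2/5 and b = 9/10, and a < 1 holds because s2 < 1.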
19 Strategic-Form Games
- Finite actions
- One-shot simultaneous-move games
- The strategic form of a game consists of the players, their strategy spaces, and their payoff functions.
20 Nash and Correlated Nash
- A game can have several Nash equilibria: <A,A>, <B,B>, <(1/2,1/2), (1/2,1/2)>, but the payoffs may be different.
  - <A,A> gets 2 for each player.
  - <(1/2,1/2), (1/2,1/2)> gets 1 for each player.
- Let's question the robustness of the mixed-strategy Nash point.
- Intuitively, at the mixed equilibrium players are indifferent (in real life: play A, B, whatever), so one may come to believe that the other plays A with slightly higher probability. He then wants to switch to pure A, so the robustness of this Nash seems questionable.

         A        B
  A    (2,2)    (0,0)
  B    (0,0)    (2,2)
21 Nash and Correlated Nash
- A Nash is strict if for each player i, $s_i$ is the unique best response to $s_{-i}$.
- Only pure-strategy equilibria can be strict: if a mixed strategy is a BR, then so is every pure strategy in the mixed strategy's support (otherwise there would be no point in including it).
- Recall: the support of a mixed strategy is the set of pure strategies that are played with positive probability.
22 Some Questions in the Theory of Games
- When and why should we expect play to correspond to a Nash equilibrium?
- If there are several Nash equilibria, which one should we expect to occur?
- In the previous example, in the absence of coordination, we are faced with the possibility that player 1 expects NE1 = <A,A>, so he plays A, while the opponent expects NE2 = <B,B> and plays B, resulting in the non-equilibrium outcome <A,B>.
23 The Idea of a Learning-Based Explanation of Equilibrium
- Intuitively, the history of observations can provide a way for the players to coordinate their expectations on one of the two pure-strategy equilibria.
- Typically, learning models predict that this coordination will eventually occur, with the determination of which of the two equilibria arises left to initial conditions or to random chance.
24 The Idea of a Learning-Based Explanation of Equilibrium
- For the history to serve this coordination role, the sequence of actions played must eventually become constant, or at least readily predictable by the players. Of course, there is no presumption that this is always the case.
- Perhaps, rather than going to a Nash, players wander around the space aimlessly, or perhaps play lies in some set of alternatives larger than the set of Nash equilibria?
25 The Idea of a Learning-Based Explanation of Equilibrium
- For the simple coordination game (symmetric, with payoffs <2,2> and <0,0>) there is no reason to think that any learning process will prefer one Nash over the other.
- What if we alter it such that there is a better Nash? Will the players learn to play the <A,A> Nash?

Altered game:
         A         B
  A    (2,2)    (-a,0)
  B    (0,-a)   (1,1)
26 Correlated Nash (Aumann 74)
- Suppose the players have access to randomization devices that are privately viewed.
- If each player chooses a strategy according to his own randomization device, the result is a probability distribution over strategy profiles, denoted $\mu \in \Delta(S)$.
- Unlike a profile of mixed strategies, which is by definition uncorrelated, such a distribution may be correlated.
27 Correlated Nash: Jordan's matching pennies
- 3 players.
- Each chooses H or T.
- Payoffs are 1 or -1 only.
- 1 wins if he matches 2.
- 2 wins if he matches 3.
- 3 wins by not matching 1.
- This game has a unique NE: each player plays (1/2,1/2).
- However, it has many correlated NE.

Payoffs are listed as (player 1, player 2, player 3); player 1 chooses the row, player 2 the column.

Player 3 plays H
         H            T
  H    1, 1,-1     -1,-1,-1
  T   -1, 1, 1      1,-1, 1

Player 3 plays T
         H            T
  H    1,-1, 1     -1, 1, 1
  T   -1,-1,-1      1, 1,-1
28 Correlated Nash: Jordan's matching pennies
- C-NE: the uniform distribution over these 6 profiles: (H,H,H), (H,H,T), (H,T,T), (T,T,T), (T,T,H), (T,H,H)
- Each player plays H with probability 50%.
- No weight is placed on (H,T,H) or (T,H,T), so the play of the players is not independent (it is correlated).
- For player 1: when he plays H, he faces a 1/3 chance each that his opponents play (H,H), (H,T), (T,T). Since his goal is to match 2, he wins 2/3 of the time by playing H and only 1/3 of the time if he plays T. Similarly, when he plays T, his opponents can only play (T,T), (T,H), (H,H). Now tails wins 2/3 of the time, as against heads, which wins only 1/3 of the time. Either way he has no reason to deviate: he is at a (correlated) Nash.
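The argument for player 1 extends to all three players, and it can be verified by brute force. The sketch below encodes the payoff rules stated for Jordan's game (1 matches 2, 2 matches 3, 3 mismatches 1) and checks the incentive constraints of a correlated equilibrium: conditional on each recommended action, no player gains by deviating. The function names are illustrative.

```python
from fractions import Fraction

def payoff(profile):
    """Payoffs (p1, p2, p3): 1 matches 2, 2 matches 3, 3 wants to differ from 1."""
    s1, s2, s3 = profile
    return (1 if s1 == s2 else -1, 1 if s2 == s3 else -1, 1 if s3 != s1 else -1)

SUPPORT = ["HHH", "HHT", "HTT", "TTT", "TTH", "THH"]
MU = {tuple(p): Fraction(1, 6) for p in SUPPORT}   # uniform over the six profiles

def is_correlated_equilibrium(mu):
    """Check that no player gains by deviating from any recommended action."""
    for i in range(3):
        for told in "HT":
            for dev in "HT":
                obey = deviate = Fraction(0)
                for prof, pr in mu.items():
                    if prof[i] != told:
                        continue
                    alt = prof[:i] + (dev,) + prof[i + 1:]
                    obey += pr * payoff(prof)[i]
                    deviate += pr * payoff(alt)[i]
                if deviate > obey:
                    return False
    return True

print(is_correlated_equilibrium(MU))
```

As a contrast, a point mass on (H,H,H) fails the check, because player 3 would rather switch to T.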
29 Why is Correlated Nash of Significance?
- Hint: cycles create correlation between the players' strategies.
- Informally, a cycle is a finite sequence of profiles of length k such that $s^0 = s^k$.
- Cournot play can exhibit cycles; an example follows.
- So cycles => correlation => correlated Nash
30 Cournot Cycle - matching pennies
- 3 players, each plays head or tail
- 1 wants to match 2.
- 2 wants to match 3.
- 3 wants to un-match 1.
- Cournot means each player assumes his opponents play the same as in their last step.

Current profile (s1,s2,s3)    Remarks
H,H,H                         P3: H->T (switch), P2: H->H (stay), P1: H->H (stay)
H,H,T                         P2: H->T
H,T,T                         P1: H->T
T,T,T                         P3: T->H
T,T,H                         P2: T->H
T,H,H                         P1: T->H
H,H,H                         P3: H->T
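The cycle in the table can be reproduced with a few lines. This is a minimal sketch in which all three players simultaneously best-respond to the previous profile; the function names are illustrative.

```python
def cournot_step(profile):
    """Simultaneous best responses to last period's profile."""
    s1, s2, s3 = profile
    flip = {"H": "T", "T": "H"}
    # 1 copies 2, 2 copies 3, 3 plays the opposite of 1.
    return (s2, s3, flip[s1])

def orbit(start, steps):
    """List of profiles visited from `start` over `steps` adjustments."""
    out = [start]
    for _ in range(steps):
        out.append(cournot_step(out[-1]))
    return out

path = orbit(("H", "H", "H"), 6)
print(" -> ".join("".join(p) for p in path))
```

In each step exactly one player wants to switch, so the simultaneous rule reproduces the one-switch-per-row table and returns to H,H,H after six steps.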
31 Roadmap
- Introduction to the common models of learning
- Cournot adjustment
- Fictitious play and Nash equilibria
- Motivation
- Definitions
- Results
- Generalizations of fictitious play
32 Fictitious Play - Introduction
- Motivation
  - Repeated game, stationarity assumption.
  - Each player forms a belief about his opponent's strategy by looking at what happened in the past.
  - Each player plays a best response according to his/her belief.
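33 Two-Player Fictitious Play - notations
- $S_1$ and $S_2$ are finite action spaces for players one and two respectively.
  - $S_1 = \{\ldots\}$, $S_2 = \{\ldots\}$ (each listed on the slide as three action symbols)
- $u_1, u_2$: the players' payoff functions
  - e.g. $u_1(s_1, s_2) = 15$ for one pair of actions
- For mixed strategies we take the expectation over pure actions
  - e.g. against a mix <¼, ¾> over two opponent actions $x, y$: $u_1(s_1, <¼, ¾>) = ¼\,u_1(s_1, x) + ¾\,u_1(s_1, y)$
  - $u_1(<½, ½>, <¼, ¾>)$ is expanded in the same way over both players' actions
- The player is $p_i$, the opponent is $p_{-i}$, $i = 1, 2$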
34 Two-Player Fictitious Play
- Notion of belief
  - A prediction of the opponent's action distribution: the degree to which 1 believes 2 will play each action.
- Assume players choose their actions in each period to maximize their expected payoff with respect to their belief for the current period.
35 Two-Player Fictitious Play: Forming Beliefs
- Player i starts with a weight function $K_0^i$
  - $K_0^i : S_{-i} \to \mathbb{R}_+$
- For example
  - $K_0^i$ assigns a nonnegative weight to each opponent action, e.g. weight 4 to one of them
- As the game is iteratively repeated, K is updated
36 Two-Player Fictitious Play: Belief update
- If some action was played (by the opponent!) the last time, we add 1 to its count. Generally,
  - $K_t^i(s_{-i}) = K_{t-1}^i(s_{-i}) + 1$ if $s_{-i}^{t-1} = s_{-i}$, and $K_t^i(s_{-i}) = K_{t-1}^i(s_{-i}) + 0$ otherwise
- That's a complicated way of saying that K(s) simply counts the number of times the opponent played s (on top of the initial weight).
37 Two-Player Fictitious Play: Using frequencies to form beliefs
- Given K, the frequency vector,
- each player forms a probability vector $\gamma$ over his opponent's actions.
- His belief is the weight of each action divided by the total weight (roughly, count / steps):
  - $\gamma_t^i(s_{-i}) = K_t^i(s_{-i}) / \sum_{\tilde{s}_{-i} \in S_{-i}} K_t^i(\tilde{s}_{-i})$
- Simple normalization
$\gamma_t^i(s_{-i})$ reads: the belief player i holds at time t regarding the probability that his opponent plays $s_{-i}$ at time t.
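The count-then-normalize step can be sketched as follows. The dictionary representation and the function names are assumptions made for illustration; the initial weights are simply example numbers.

```python
def update_weights(weights, observed):
    """Add 1 to the weight of the action the opponent just played."""
    new = dict(weights)
    new[observed] += 1
    return new

def assessment(weights):
    """Normalize the weights into a probability vector over opponent actions."""
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

K = {"H": 1.5, "T": 2.0}        # example initial weights (the prior)
K = update_weights(K, "T")      # the opponent played T
gamma = assessment(K)
print(K, gamma)
```

After one observation of T the weights are (1.5, 3.0), so the belief puts 1/3 on H and 2/3 on T.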
38 Two-Player Fictitious Play: Using frequencies 2
Example: my belief is that my opponent plays his three actions with probabilities ½, ¼ and ¼; looking at my payoff table, I pick the action that maximizes my expected utility against that belief.
- We now have a belief about how the opponent plays.
- A FP is any rule $\rho_t^i$ which assigns a best-response action to the belief $\gamma_t^i$
- Example
  - $\rho^1(<½, ¼, ¼>)$ is an action of player 1 (the rule extends naturally to mixed actions)
  - This implies that $u_1(\rho^1(<½, ¼, ¼>), <½, ¼, ¼>)$ is at least as good for player 1 as any other action against <½, ¼, ¼>
39 Two-Player Fictitious Play: remarks
- Many BRs are possible for a given belief
- Examples of such rules $\rho$:
  - Always prefer a pure action over a mixed action
  - Pick the best response whose action index is least (that's the limit of my creativity)
  - (both, of course, must still select best responses)
40 Two-Player Fictitious Play: Interpretation (pages 31-32)
- Bayesian inference
  - Player i believes the opponent's play corresponds to a sequence of i.i.d. multinomial random variables with a fixed but unknown distribution.
  - Player i's prior over that unknown distribution takes the form of a Dirichlet distribution.
  - i's prior and posterior beliefs correspond to a distribution over the set $\Delta(S_{-i})$ of probability distributions over $S_{-i}$.
  - The distribution over the opponent's strategies, $\gamma_t^i$, is the induced marginal distribution over pure strategies.
  - If beliefs over $\Delta(S_{-i})$ are denoted $\mu_t^i$, then we have $\gamma_t^i(s_{-i}) = \int_{\Delta(S_{-i})} \sigma_{-i}(s_{-i})\, \mu_t^i(d\sigma_{-i})$
41 Two-Player Fictitious Play: Interpretation
- Denote the marginal empirical distribution as $d_t^{-i}(s_{-i}) = (K_t^i(s_{-i}) - K_0^i(s_{-i})) / t$
- The assessment $\gamma$ is not the same as d because of the influence of i's prior belief
  - The prior has the form of a fictitious sample observed before the game started.
- As observations are incorporated into $\gamma$, it converges to d (the empirical distribution)
42 Two-Player Fictitious Play: Interpretation
- Notes
  - As long as the initial weights are positive, the assessment keeps positive probability on every action.
  - The belief reflects the conviction that the opponent's strategy is constant and unknown.
  - It may be wrong if the process cycles.
  - Any finite sequence of what looks like a cycle is actually consistent with the assumption that the world is constant and those observations are a fluke.
  - If cycles persist, we might expect i to notice them, but in any case his beliefs will not be falsified in the first few periods, as they were in the Cournot process.
43 Asymptotic Behavior: does play converge?
- Sufficient conditions
- Proposition 2.1
  - (1) If s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates.
  - (2) Any pure-strategy steady state of FP must be a Nash.
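44 If s is a strict Nash and is played at time t, s is played at all subsequent dates
- Proof
  - Suppose $\gamma_t^i$ (the players' beliefs) are such that the actions played are the strict Nash s.
  - Believe me that when profile s is played at time t, each player's beliefs at t+1 are a convex combination of $\gamma_t^i$ and a point mass on $s_{-i}$: $\gamma_{t+1}^i = (1 - \alpha_t)\gamma_t^i + \alpha_t \delta(s_{-i})$
  - Since expected utility is linear in beliefs, we get $u_i(s_i, \gamma_{t+1}^i) = (1 - \alpha_t)\, u_i(s_i, \gamma_t^i) + \alpha_t\, u_i(s_i, \delta(s_{-i}))$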
45 If s is a strict Nash and is played at time t, s is played at all subsequent dates
- We want to show that this payoff is still better than the payoff of any other action against $\gamma_{t+1}^i$
- Now $s_i$ was a BR to $\gamma_t^i$ (it was played at time t)
  - This should be obvious for the first term (by the assumption that $s_i$ is a BR to $\gamma_t^i$).
  - For the second term, note that against the point mass on $s_{-i}$, $s_i$ is strictly better than any other action, because the profile <$s_i$, $s_{-i}$> is a strict Nash, which was our assumption.
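46 So, what is a point mass on $s_{-i}$, and why is $\gamma_{t+1}^i$ a convex combination of it and of $\gamma_t^i$?
- I need to show you that $\gamma_{t+1}^i = (1 - \alpha_t)\gamma_t^i + \alpha_t \delta(s_{-i})$
- For clarity, let's say t = 10 and there are 2 players.
- Let's say s = ((½, ½), (¼, ¾))
- $S_{-i} = S_2 = \{C, D\}$, and look at $\gamma_{10+1}^1(C)$
- Recall $\gamma_{10}^1(C) = K_{10}(C)/10$ (ignore the prior, it does not matter here)
- Suppose that at time 10 player 2 actually played C (he played a mixed strategy, which is interpreted to mean that ¼ of the time he would play C)
- $\gamma_{11}^1(C) = (K_{10}(C) + 1)/11 = (10/11)\,\gamma_{10}^1(C) + (1/11) \cdot 1$, so here $\alpha_{10} = 1/11$ and the point mass $\delta(C)$ puts probability 1 on C.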
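The same step can be written for general weights, without ignoring the prior. The following derivation is a sketch in which $W_t$ (the total weight at time t) is notation introduced here for convenience, and $K$, $\gamma$, $\delta$ stand for the weights, the assessment and the point mass.

```latex
\gamma^i_{t+1}(s_{-i})
  = \frac{K^i_t(s_{-i}) + \mathbf{1}[s^t_{-i} = s_{-i}]}{W_t + 1}
  = \frac{W_t}{W_t + 1}\,\gamma^i_t(s_{-i})
    + \frac{1}{W_t + 1}\,\delta_{s^t_{-i}}(s_{-i}),
\qquad
W_t = \sum_{\tilde s_{-i} \in S_{-i}} K^i_t(\tilde s_{-i}),
\quad
\alpha_t = \frac{1}{W_t + 1}.
```

With no prior weight and ten observations, $W_{10} = 10$ and $\alpha_{10} = 1/11$.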
47 2nd part: Any pure-strategy steady state of FP must be a Nash
- A steady state is a strategy profile that is played in every step after, perhaps, a finite time T.
- Ideas?
- If play remains at a pure-strategy profile, then eventually the assessments will become concentrated at this profile.
- If it were not a Nash, then for one of the players, playing what he played would not be a BR to his assessment; this contradicts how FP works,
  - since all players always play a BR according to their beliefs.
- Food for thought: why does it not work for a mixed-strategy profile?
48 To Conclude this
- We wanted to show that if s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates.
  - We showed it by looking at what happens to the players' beliefs and proving that the actions played at t, given the new beliefs, are still strict BRs.
  - This means the system is at a steady state.
- We also showed that if it is a pure-strategy steady state, it is a Nash.
49 No Pure Nash => FP can't converge to a pure profile
- Matching pennies
- For example
  - At time 3 player I believes that II prefers Tails, so he plays Tails to match.
  - But II plays Heads, so I adds one to Heads.
  - Now Heads > Tails, and I has convinced himself that II will play Heads, so he switches to H.
  - The game cycles and play never settles on a fixed profile.

Payoffs are listed as (I, II); I chooses the row and wins on a match:

         H        T
  H    1,-1    -1,1
  T   -1,1     1,-1

Weights each player holds over the opponent's (H,T) after each period:

  t   Profile   I           II
  1   Initial   (1.5,2)     (2,1.5)
  2   (T,T)     (1.5,3)     (2,2.5)
  3   (T,H)     (2.5,3)     (2,3.5)
  4   (T,H)     (3.5,3)     (2,4.5)
  5   (H,H)     (4.5,3)     (3,4.5)
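The weight updates can be replayed step by step from the stated initial weights. The sketch below is a minimal two-player fictitious play for matching pennies, assuming player I wants to match and player II wants to mismatch; ties are broken toward T for I and toward H for II, an arbitrary choice that never triggers with these initial weights.

```python
def best_reply_I(weights):
    """I wants to match: play whichever action II is believed more likely to play."""
    return "H" if weights["H"] > weights["T"] else "T"

def best_reply_II(weights):
    """II wants to mismatch: play the opposite of I's more likely action."""
    return "T" if weights["H"] > weights["T"] else "H"

def fictitious_play(w_I, w_II, periods):
    """Return [(profile, I's weights, II's weights)] after each period."""
    w_I, w_II = dict(w_I), dict(w_II)
    history = []
    for _ in range(periods):
        a_I, a_II = best_reply_I(w_I), best_reply_II(w_II)
        w_I[a_II] += 1      # I counts what II just played
        w_II[a_I] += 1      # II counts what I just played
        history.append(((a_I, a_II), (w_I["H"], w_I["T"]), (w_II["H"], w_II["T"])))
    return history

rows = fictitious_play({"H": 1.5, "T": 2.0}, {"H": 2.0, "T": 1.5}, 4)
for profile, wi, wii in rows:
    print(profile, wi, wii)
```

Because II's weights count I's actions, the two (T,H) periods raise II's weight on T, and the (H,H) period raises II's weight on H.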
50 No Pure Nash => FP can't converge to a pure profile
- If the game did converge, it would be in a steady state that is pure and not a Nash (since matching pennies has no pure Nash), but we showed that any pure steady state must be a Nash.
- It's OK, then, that the game does not converge.
- Interestingly, the empirical distributions over each player i's strategies are converging to (½, ½), and their product <(½, ½), (½, ½)> is a Nash.
51 Asymptotic Behavior
- Proposition 2.2
  - If the empirical distribution d over each player's choices converges, the strategy profile corresponding to the product of these distributions is a Nash.
- Proof
  - Intuitively, if the empirical distribution converges, then the beliefs converge to the same thing; hence, if it were not a Nash, players would move away from it.
- More generally, for this it is enough that the beliefs are asymptotically empirical; the process need not be FP.
52 Asymptotic Behavior
- More results (proofs omitted)
- The empirical distribution converges if
  - (1) the game is 2x2 with generic payoffs
  - (2) the game is zero-sum
  - (3) the game is solvable by iterated strict dominance
- The empirical distribution, however, need not converge! 2 examples follow.
53 Example: Shapley (1964)
         L        M        R
  T    (0,0)    (1,0)    (0,1)
  M    (0,1)    (0,0)    (1,0)
  D    (1,0)    (0,1)    (0,0)
The Nash is at (1/3, 1/3, 1/3) for both players. If the initial weights lead to <T,M>, we cycle: <T,M>, <T,R>, <M,R>, <M,L>, <D,L>, <D,M>, and back to <T,M>. The diagonal profiles are never played. The number of consecutive periods each profile is played increases sufficiently fast that the empirical distributions never converge.
54 Example due to Jordan
- Increase the payoffs on the diagonal to (1/2,1/2).
- We get rock-paper-scissors, a zero-sum game with the same Nash.
- Here the empirical distributions do converge. There are still cycles, but the number of consecutive periods each profile is played grows slowly.
- Note that this does not conflict with our previous statements, namely:
  - Here's a ZSG whose empirical distributions converge but whose play is not steady (it cycles).
  - Had the empirical distributions converged to a pure-strategy Nash, the fact that play cycled would have been a conflict with our earlier result.
55 Payoffs in Fictitious Play
- The question we deal with here: if FP learns the distributions, then it should asymptotically yield the same utility that would be achieved if the frequency distribution were known in advance.
- Here we allow more than 2 players; their assessments track the joint distribution of opponents' strategies.
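56 Payoffs in Fictitious Play - Notations
- Empirical joint distribution of the opponents' play: $D_t^{-i}$
- Best payoff against the empirical distribution: $\hat{U}_t^i = \max_{s_i} u_i(s_i, D_t^{-i})$
- Time average of realized payoffs: $U_t^i = \frac{1}{t} \sum_{\tau=1}^{t} u_i(s_i^\tau, s_{-i}^\tau)$
- Definition
  - Fictitious play is $\varepsilon$-consistent along a history if there is a T such that for any $t \ge T$, $U_t^i + \varepsilon \ge \hat{U}_t^i$ for all i.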
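The two quantities compared in the consistency definition can be computed from a history of play. The sketch below does this for player I in matching pennies on a short hand-written history; the symbols, the function names and the history itself are illustrative assumptions.

```python
from collections import Counter

# Player I's payoffs in matching pennies, keyed by (I's action, II's action).
U_I = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

def realized_average(history, payoff):
    """Time average of the payoffs actually received."""
    return sum(payoff[p] for p in history) / len(history)

def best_against_empirical(history, payoff, actions=("H", "T")):
    """Best fixed-action payoff against the opponent's empirical distribution."""
    counts = Counter(opp for _, opp in history)
    t = len(history)
    return max(
        sum(counts[o] / t * payoff[(a, o)] for o in actions) for a in actions
    )

history = [("T", "T"), ("T", "H"), ("T", "H"), ("H", "H")]
gap = best_against_empirical(history, U_I) - realized_average(history, U_I)
print(gap)
```

Play is consistent up to a tolerance when this gap eventually stays below the tolerance. On this four-period history the gap is 0.5, which says nothing yet about the long run.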
58 Consistency
- We don't look at how well the player does globally, but at how well he does in comparison to his expectations, which are built upon his beliefs.
- If FP were not consistent it would be a much less interesting model; it would be as if someone were simply playing a different game. So, we want consistency.
59 Consistency
- If play is consistent it can be useful: if, after some period of time, a player sees that his expectations are not fulfilled (by comparing the expected payoff with the actual payoff), he can deduce that something is wrong in his model of the world.
60 Consistency: main result
- Relate a player's frequency of switching strategies to the consistency of his play.
- We define the frequency of switches $\eta_t^i$ to be the fraction of periods $\tau \le t$ for which $s_i^\tau \ne s_i^{\tau-1}$.
61 Fictitious Play and BR dynamics
- Definition
  - Fictitious play exhibits infrequent switches along a history if for every $\varepsilon > 0$ there is a T such that for any $t \ge T$, $\eta_t^i \le \varepsilon$ for all i.
- Proposition (Fudenberg and Levine 94)
  - If FP exhibits infrequent switches along a history, then it is $\varepsilon$-consistent along that history for every $\varepsilon > 0$.
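The switching frequency is easy to compute from one player's action sequence. The sketch below assumes the frequency is the number of periods whose action differs from the previous period's, divided by the total number of periods; the example sequences are made up for illustration.

```python
def switch_frequency(actions):
    """Fraction of all periods in which the action differs from the previous one."""
    if len(actions) < 2:
        return 0.0
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)
    return switches / len(actions)

short = ["T", "T", "T", "H"]

# Runs of growing length (1, 2, ..., 20), alternating H and T.
growing = []
for k in range(1, 21):
    growing.extend(["H" if k % 2 else "T"] * k)

print(switch_frequency(short), switch_frequency(growing))
```

When runs keep getting longer, as in the second sequence, the frequency of switches falls toward zero even though play never stops cycling.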
62 Infrequent switches => $\varepsilon$-consistency
- Intuition
  - Once the prior loses its influence, at each date t player i plays a BR to the empirical distribution (the observations) through date t-1.
  - On the other hand, if i is not doing on average as well as the best response to the empirical distribution, there must be a non-negligible fraction of dates t at which i is not playing a BR to the empirical distribution through date t; but in this case player i must switch at date t+1. Conversely, infrequent switches imply that most of the time i's date-t action is a best response to the empirical distribution at the end of date t.
63 Proof - notations
- k: length of the initial history (kept as prior belief)
- initial belief
- best response strategy of i to
- player i expected date-t payoff is
64 Proof - summary
- We showed that if there are not many switches along the player's history, his play is consistent.
- It means that the actual payoffs do not fall below the expectation (by more than $\varepsilon$)!
- Again, one can use this to see if something is wrong with one's model of the environment.
65 Roadmap
- Crash introduction to the common models of learning
- Cournot adjustment
- Fictitious play and Nash equilibria
- Motivation
- Definitions
- Results
- Generalizations of fictitious play
66 Generalizations of Fictitious Play
- Mainly generalize the update rule for the beliefs
  - Example: exponential weighting
- As long as beliefs are asymptotically empirical, what we showed for FP holds.
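One possible reading of "exponential weighting" is to discount old observations geometrically before adding the new one. The sketch below illustrates that reading only; the `decay` parameter and the function name are assumptions, not the update rule from the talk.

```python
def discounted_update(weights, observed, decay=0.9):
    """Shrink all old weights by `decay`, then add 1 to the observed action."""
    new = {s: decay * w for s, w in weights.items()}
    new[observed] += 1.0
    return new

plain = {"H": 1.0, "T": 1.0}
disc = {"H": 1.0, "T": 1.0}
for obs in "HHHT":
    plain = discounted_update(plain, obs, decay=1.0)   # decay = 1 is ordinary counting
    disc = discounted_update(disc, obs, decay=0.5)     # recent play dominates

print(plain, disc)
```

With `decay=1.0` the rule is the standard count, while with a fixed decay below 1 a single recent T already outweighs three earlier H observations. Beliefs of that kind track recent play rather than the long-run frequencies, so the "asymptotically empirical" condition should not be taken for granted.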
67 Summary
- Dynamics of games and a flavor of analysis
- Although learning analysis differs from Nash analysis, Nash is still an important point: we showed that it is still a point FP can converge to.
- FP is a consistent play (when switches are infrequent).
- Have a nice summer vacation; thanks for listening.