Fictitious Play The Theory of Learning in Games D. Fudenberg and D. Levine

1
Fictitious Play
The Theory of Learning in Games
D. Fudenberg and D. Levine

Speaker: Tzur Sayag
03/06/2003

2
Do you believe that PM Sharon is serious about the peace process?

A voter has to decide whether he should support PM Sharon.
Belief: Sharon will never evacuate settlements.
Action: Vote against the new economics revolution.
May 24: Sharon announces that the occupation is no good.
Belief: Sharon will probably never evacuate settlements.
Action: Vote against the new economics revolution.
Jun 5: Sharon meets Abu-Mazen and declares support for a Palestinian state.
Belief: It seems Sharon might evacuate the settlements after all.
Action: Vote for the new economics revolution.

3
Roadmap

Introduction to the common models of learning in
games
Cournot adjustment
Fictitious play and Nash equilibria
Motivation
Definitions
Results
Generalizations of fictitious play if we have
time

4
Notations
P1 gets a1 and P2 gets b1 if they play (Action1, Action1), respectively.

                      Player 2
                      Action1    Action2
Player 1   Action1    (a1,b1)    (a2,b2)
           Action2    (a3,b3)    (a4,b4)

5
Learning in Games - 1

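Repeated games: the same or a related game is played repeatedly.
Fixed-player model: the same players meet in every period.
A sophisticated player may try to teach the opponent to play a best response to a particular action by repeating it over and over.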

6
Being Sophisticated Example

D is dominant for Bob.
If Alice learns that Bob only plays D, the game converges to (D,L).
Bob's payoff for (D,L) is 2.
If Bob is patient, he can always play U and just wait for a while.
If Bob always plays U, Alice, who thought Bob was going to play D, should shift her play from L to R (since L was only good when Bob actually played D).
So Bob plays constant U, which leads Alice to play constant R; her payoff at (U,R) is 2 > 1, her payoff at (D,L).
In this case Bob gets 3, which is better than the 2 he gets at (D,L).
Bingo!

Payoffs are listed as (Bob, Alice):

              Alice
              L      R
Bob    U     1,0    3,2
       D     2,1    4,0
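
The argument on this slide can be checked mechanically. The following is a minimal sketch, assuming each cell lists Bob's payoff first and Alice's second (the reading under which D is dominant for Bob) and that Alice eventually best-responds to whatever row Bob repeats. The function names are hypothetical.

```python
# Payoff table from the slide, read as (Bob's payoff, Alice's payoff).
# Assumption: Bob picks the row (U/D), Alice picks the column (L/R).
PAYOFFS = {
    ("U", "L"): (1, 0), ("U", "R"): (3, 2),
    ("D", "L"): (2, 1), ("D", "R"): (4, 0),
}

def bob_dominant_action():
    """Return Bob's strictly dominant row, or None if there is none."""
    for row, other in (("U", "D"), ("D", "U")):
        if all(PAYOFFS[(row, c)][0] > PAYOFFS[(other, c)][0] for c in "LR"):
            return row
    return None

def alice_best_response(bob_row):
    """Alice's best column against a fixed row played by Bob."""
    return max("LR", key=lambda c: PAYOFFS[(bob_row, c)][1])

def bob_long_run_payoff(bob_row):
    """Bob's payoff once Alice has learned to best-respond to bob_row."""
    return PAYOFFS[(bob_row, alice_best_response(bob_row))][0]

print(bob_dominant_action())                                # D
print(alice_best_response("D"), alice_best_response("U"))   # L R
print(bob_long_run_payoff("D"), bob_long_run_payoff("U"))   # 2 3
```

Committing to the dominated action U pays off for Bob only because Alice adapts, which is exactly the teaching incentive the learning models try to rule out.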
7
Being Sophisticated Abstracting

Most of learning theory relies on models in which the incentive to alter the opponent's future play is small:
Players locked in for 2 periods
Large anonymous population
Embed a two-player game by pairing players randomly from a large population.

8
Models of Embedding

Single-pair model:
a single pair is chosen at random; their actions are revealed to everyone
Aggregate statistic model:
all players are randomly matched; aggregate outcomes are revealed to everyone
Random-matching model:
all players are randomly matched; each player sees only the outcome of his own game

9
Three common models of Learning

Fictitious play
Players observe only their own matches and play a
best response to the frequencies.
Partial best-response
A fixed portion switches each period from its
current action to a BR to the aggregate stats
from the previous period.
Replicator Dynamics
The share of the population using each strategy grows at a rate proportional to that strategy's current payoff.

10
Cournot Adjustment a flavor of analysis

Two firms, 1 and 2.
Strategy: choose a quantity $s_i \in [0, \infty)$
Strategy profile is $\langle s_i, s_{-i} \rangle \in S$
Utility for i is $u_i(\langle s_i, s_{-i} \rangle)$
Assume $u_i(\langle \cdot, s_{-i} \rangle)$ is strictly concave
$BR_i(s_{-i}) = \arg\max_x u_i(\langle x, s_{-i} \rangle)$, where x ranges over player i's quantities

BR is unique: since $u_i$ is strictly concave in the own quantity, the relevant $u_i''$ is negative, so $u_i'$ is a strictly decreasing function, which means it has at most one zero. So, yes, you guessed it right, $u_i$ has only one extreme point and the max is therefore unique. $u_i'$ can't be constant, since $u_i$ is STRICTLY concave by assumption.
11
Cournot Adjustment Model

Time periods $t = 0, 1, 2, \ldots$ (discrete)
State profile $\theta_0 \in S$
In each period each player chooses a pure strategy that is a BR to the opponent's play in the previous period
Formally, i chooses $s_i^t = BR_i(s_{-i}^{t-1})$
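
To see the adjustment process run, here is a minimal sketch. The slides do not specify payoffs, so the linear-demand profit $u_i = s_i(a - s_i - s_{-i}) - c\,s_i$ and the parameter values `a`, `c` are assumptions made purely for illustration; this profit is strictly concave in $s_i$, so the best response is unique.

```python
def best_response(opponent_qty, a=10.0, c=1.0):
    """Unique maximiser of s * (a - s - opponent_qty) - c * s over s >= 0.

    Assumed (hypothetical) linear-demand payoff; not taken from the slides.
    """
    return max(0.0, (a - c - opponent_qty) / 2.0)

def cournot_adjustment(start, periods, a=10.0, c=1.0):
    """Each firm best-responds to the other's quantity from the previous period."""
    path = [start]
    s1, s2 = start
    for _ in range(periods):
        # Simultaneous update: both right-hand sides use last period's values.
        s1, s2 = best_response(s2, a, c), best_response(s1, a, c)
        path.append((s1, s2))
    return path

path = cournot_adjustment(start=(0.0, 8.0), periods=50)
print(path[1])                                  # (0.5, 4.5)
print(tuple(round(q, 6) for q in path[-1]))     # (3.0, 3.0)
```

With these assumed payoffs the process settles at $(a - c)/3$ for each firm, which is the profile where both best responses hold at once.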

12
Cournot Dynamics: Reaction Curve
[Figure: reaction curves BR1 and BR2 in the ($\theta^1$, $\theta^2$) plane, with the current state $\theta_t = (\theta_t^1, \theta_t^2)$ and player 1's new BR if 2 plays $\theta_t^2$ marked.]
BR1: for every $\theta^2$ the curve states the BR of player 1 against it.
Can you convince yourself that the point where BR1 and BR2 cross is a Nash?
13
Cournot Dynamics

A movement between profiles such that
$\theta_{t+1} = f(\theta_t)$, where $f_i(\theta_t) = BR_i(\theta_t^{-i})$
A steady state is a $\theta^s$ such that $\theta^s = f(\theta^s)$
Once $\theta_t = \theta^s$ the system remains there
Claim (simple): $\theta^s$ is a NASH
Proof: by definition, for every player i, $\theta^{s,i} = BR_i(\theta^{s,-i})$, so players don't want to move.
SO EVERY STEADY STATE IS A NASH EQUILIBRIUM

14
Cournot Dynamics: oblivious to linear transformations

Proposition 1.1: Suppose $u'_i(s) = a\,u_i(s) + v_i(s_{-i})$ for all players i, with $a > 0$. Then $u'$ and $u$ are best-response equivalent.
Proof:
$v_i(s_{-i})$ depends only on the opponent's play, so it does not change the ranking of my own actions.
Multiplying all payoffs by the same positive constant a has no effect on the ranking either.
So a transformation that leaves preferences, and consequently best responses, unchanged will give rise to the same dynamic learning process.

15
Cournot Dynamics and Zero sum Games

Recall: payoffs in a zero-sum game (ZSG) add to zero.
Proposition 1.2: Every 2x2 game for which the best-response correspondences have a unique intersection that lies in the interior of the strategy space is best-response equivalent to a zero-sum game.
Proof: given G, a 2x2 game with a unique interior intersection,
w.l.o.g. assume 1) A is a BR for player 1 against A, and 2) B is a BR for player 2 against A.
If A were also a BR for player 2 against A, then (A,A) would be an intersection of the BR correspondences at a pure profile, which contradicts our assumption.

16
Cournot Dynamics and Zero Sum Games 2

Proof outline: Given G, the 2x2 game with a unique interior intersection, we build a zero-sum game that has the same best responses. Observe the following ZSG (player 1 picks the row, player 2 the column):

       A         B
A    (1,-1)    (0,0)
B    (a,-a)    (b,-b)

If $a < 1$ then $BR_1(A) = A$, since $u_1(A,A) = 1$ but $u_1(B,A)$ is only $a$.
Also $BR_2(A) = B$, since $u_2(A,B) = 0$ but $u_2(A,A)$ is only $-1$.
Denote by $s_i$ player i's probability of playing A.
Claim 1: player 1 is indifferent between A and B if $s_2 = a s_2 + b(1 - s_2)$
Claim 2: player 2 is indifferent between A and B if $s_1 + a(1 - s_1) = b(1 - s_1)$

17
Proof of claim 1: player 1 is indifferent between A and B if $s_2 = a s_2 + b(1 - s_2)$

Assume $s_2 = a s_2 + b(1 - s_2)$ (*), where $s_2$ is the probability that 2 plays A.
(1) If player 1 plays A he gets $u_1(A, s_2) = s_2\,u_1(A,A) + (1 - s_2)\,u_1(A,B) = s_2 \cdot 1 + (1 - s_2) \cdot 0 = s_2$ (by the game table)
(2) If player 1 plays B he gets $u_1(B, s_2) = s_2\,u_1(B,A) + (1 - s_2)\,u_1(B,B) = s_2\,a + (1 - s_2)\,b$ (by the game table)
So if (1) = (2) he does not care which to choose, and (1) = (2) is exactly $s_2 = a s_2 + b(1 - s_2)$, as required.
The proof of claim 2, regarding 2's indifference, follows the same path.

18
Proof cont.: Building the ZSG
Mental note: $s_i$ = Pr[player i plays A]

1 is indifferent between A and B if $s_2 = a s_2 + b(1 - s_2)$
2 is indifferent between A and B if $s_1 + a(1 - s_1) = b(1 - s_1)$
Fixing an intersection point $(s_1, s_2)$, we can solve these two equations for the unknown payoffs: $a = (s_2 - s_1)/(1 - s_1)$ and $b = s_2/(1 - s_1)$.
Notice that $0 < s_i < 1$ (the intersection is interior; otherwise i would never, or always, play A).
$s_2 < 1$ gives $s_2 - s_1 < 1 - s_1$, which implies $a < 1$. Q.E.D.
We already showed that when $a < 1$ we get the same best responses we had in the original game G: A for player 1 against A, B for player 2 against A.
To sum up: it should have been obvious that $(s_1, s_2)$ is a Nash; the point was to find a 2x2 ZSG which has the same best responses as the original game.
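
The construction can be tried on numbers. The sketch below solves the two indifference conditions from the slide directly for a and b (the closed form in `zero_sum_payoffs` is obtained by eliminating b between the two equations, so treat it as a derivation to check rather than a quotation) and then verifies both conditions with exact fractions. The intersection point (1/3, 3/4) and the function names are hypothetical.

```python
from fractions import Fraction

def zero_sum_payoffs(s1, s2):
    """Solve the two indifference conditions for the unknown payoffs a and b.

    Player 1 indifferent:  s2 = a*s2 + b*(1 - s2)
    Player 2 indifferent:  s1 + a*(1 - s1) = b*(1 - s1)
    Requires an interior point, 0 < s1 < 1 and 0 < s2 < 1.
    """
    a = (s2 - s1) / (1 - s1)
    b = s2 / (1 - s1)
    return a, b

def indifference_gaps(s1, s2, a, b):
    """Left side minus right side of each condition; both are 0 at a solution."""
    gap1 = s2 - (a * s2 + b * (1 - s2))
    gap2 = (s1 + a * (1 - s1)) - b * (1 - s1)
    return gap1, gap2

s1, s2 = Fraction(1, 3), Fraction(3, 4)   # hypothetical interior intersection
a, b = zero_sum_payoffs(s1, s2)
print(a, b)                               # 5/8 9/8
print(indifference_gaps(s1, s2, a, b))    # (Fraction(0, 1), Fraction(0, 1))
print(a < 1)                              # True
```

Any interior point gives a below 1, which is the property the proof needs for the best responses against A to match the original game.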

19
Strategic-Form Games

Finite actions
One-shot simultaneous-move games
The players, their strategy spaces and their payoff functions make up the strategic form of a game

20
Nash and Correlated Nash

A game can have several Nash equilibria: (A,A), (B,B) and ((1/2,1/2), (1/2,1/2)), but the payoffs may be different. (A,A) gets 2 for each player; ((1/2,1/2), (1/2,1/2)) gets 1 for each.
Let's question the robustness of the mixed-strategy Nash point.
Intuitively, at the mixed equilibrium players are indifferent (in real life: play A, B, whatever), so one player may believe that the other plays A with slightly more probability. He then wants to switch to pure A, so the robustness of this Nash seems questionable.

       A        B
A    (2,2)    (0,0)
B    (0,0)    (2,2)
21
Nash and Correlated Nash

A Nash is strict if for each player i, $s_i$ is the unique best response to $s_{-i}$
Only pure strategies can be strict: if a mixed strategy is a BR, then so is every pure strategy in the mixed strategy's support (otherwise there would be no point in including it), so the BR is not unique.
Recall: the support of a mixed strategy is the set of pure strategies that are played with positive probability.

22
Some Questions in Theory of Games

When and why should we expect play to correspond to a Nash equilibrium?
If there are several Nash equilibria, which one should we expect to occur?
In the previous example, in the absence of coordination, we are faced with the possibility that player 1 expects NE1 = (A,A), so he plays A, while the opponent expects NE2 = (B,B) and plays B, resulting in the non-equilibrium outcome profile (A,B).

23
The Idea of Learning based explanation of
equilibrium

Intuitively, the history of observations can provide a way for the players to coordinate their expectations on one of the two pure-strategy equilibria.
Typically, learning models predict that this coordination will eventually occur, with the determination of which of the two equilibria arises left to initial conditions or to random chance.

24
The Idea of Learning based explanation of
equilibrium

For the history to serve this coordination role, the sequence of actions played must eventually become constant, or at least readily predictable by the players. Of course, there is no presumption that this is always the case.
Perhaps, rather than going to a Nash, players wander around the space aimlessly, or perhaps play lies in some set of alternatives larger than the set of Nash equilibria?

25
The Idea of Learning based explanation of
equilibrium

For the simple coordination game (symmetric, with payoffs (2,2) and (0,0)) there is no reason to think that any learning process will prefer one Nash over the other.
What if we alter it such that there is a better Nash? Will the players learn to play the (A,A) Nash?

Altered game:
       A         B
A    (2,2)     (-a,0)
B    (0,-a)    (1,1)
26
Correlated Nash (Aumann 74)

Suppose the players have access to randomizing devices that are privately viewed.
If each player chooses a strategy according to his own randomizing device, the result is a probability distribution over strategy profiles, denoted $\mu \in \Delta(S)$.
Unlike a profile of mixed strategies, which is by definition uncorrelated, such a distribution may be correlated.

27
Correlated Nash: Jordan's matching pennies

3 players.
Each chooses H or T
Payoffs are 1 or -1 only
1 wins if he matches 2
2 wins if he matches 3
3 wins by not matching 1
This game has a unique NE: each plays (1/2,1/2)
However, it has many correlated NE.

Rows are player 1's action, columns are player 2's action, payoffs are listed for players (1, 2, 3).

Player 3 plays H
         H             T
H    1, 1,-1      -1,-1,-1
T   -1, 1, 1       1,-1, 1

Player 3 plays T
         H             T
H    1,-1, 1      -1, 1, 1
T   -1,-1,-1       1, 1,-1
28
Correlated Nash: Jordan's matching pennies

C-NE: the uniform distribution over these 6 profiles: (H,H,H), (H,H,T), (H,T,T), (T,T,T), (T,T,H), (T,H,H)
Each player plays H with probability 50%.
No weight is placed on (H,T,H) or (T,H,T), so the play of the players is not independent (it is correlated).
For player 1: when he is told to play H, he faces a 1/3 chance for each of his opponents' plays (H,H), (H,T), (T,T). Since his goal is to match 2, he wins 2/3 of the time by playing H and only 1/3 if he plays T. Similarly, when he is told to play T his opponents can only play (T,T), (T,H), (H,H); now tails wins 2/3 of the time, against heads which wins only 1/3 of the time. So he has no reason to deviate. The same reasoning applies to players 2 and 3, so this distribution is a correlated Nash.
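
The claim for all three players can be verified by brute force. This is a minimal sketch, assuming the payoff rules stated on the previous slide (1 matches 2, 2 matches 3, 3 mismatches 1, payoffs of 1 or -1) and the uniform distribution over the six listed profiles; the helper names are hypothetical.

```python
from fractions import Fraction

def payoffs(s1, s2, s3):
    """Payoffs (u1, u2, u3) of Jordan's three-player matching pennies."""
    u1 = 1 if s1 == s2 else -1      # 1 wins by matching 2
    u2 = 1 if s2 == s3 else -1      # 2 wins by matching 3
    u3 = 1 if s3 != s1 else -1      # 3 wins by not matching 1
    return (u1, u2, u3)

SUPPORT = ["HHH", "HHT", "HTT", "TTT", "TTH", "THH"]
MU = {tuple(p): Fraction(1, 6) for p in SUPPORT}    # uniform over six profiles

def deviation_gain(player, recommended, deviation):
    """Probability-weighted gain from playing `deviation` whenever the
    device recommends `recommended` (player is 0, 1 or 2)."""
    gain = Fraction(0)
    for profile, prob in MU.items():
        if profile[player] != recommended:
            continue
        deviated = list(profile)
        deviated[player] = deviation
        gain += prob * (payoffs(*deviated)[player] - payoffs(*profile)[player])
    return gain

def is_correlated_equilibrium():
    """True if no player gains by disobeying any recommendation."""
    return all(
        deviation_gain(i, rec, dev) <= 0
        for i in range(3)
        for rec, dev in (("H", "T"), ("T", "H"))
    )

print(deviation_gain(0, "H", "T"))    # -1/3
print(is_correlated_equilibrium())    # True
```

Every deviation loses 1/3 in expectation, so obeying the recommendation is strictly better for each player and each recommended action.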
29
Why is Correlated Nash of Significance?

Hint: cycles create correlation between the players' strategies.
Informally, a cycle is a finite sequence of profiles of length k such that $s_0 = s_k$.
Cournot play can exhibit cycles; an example follows.
So cycles => correlation => correlated Nash

30
Cournot Cycle: matching pennies

3 players (heads, tails)
1 wants to match 2.
2 wants to match 3.
3 wants to un-match 1.
Cournot means each player assumes his opponents play the same as in their last step.

Current profile (s1,s2,s3)   Remarks
H,H,H                        P3: H->T (switch), P2: H->H (stay), P1: H->H (stay)
H,H,T                        P2: H->T
H,T,T                        P1: H->T
T,T,T                        P3: T->H
T,T,H                        P2: T->H
T,H,H                        P1: T->H
H,H,H                        P3: H->T
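
The table can be reproduced with a few lines of code. This is a small sketch, assuming all three players update simultaneously and each best-responds to the others' previous actions; the function names are hypothetical.

```python
FLIP = {"H": "T", "T": "H"}

def best_responses(profile):
    """Simultaneous Cournot update for the three-player matching pennies game."""
    s1, s2, s3 = profile
    return (
        s2,         # player 1 copies what 2 did last period
        s3,         # player 2 copies what 3 did last period
        FLIP[s1],   # player 3 plays the opposite of what 1 did last period
    )

def cournot_path(start, steps):
    """Profiles visited, starting from `start`, over `steps` updates."""
    path = [start]
    for _ in range(steps):
        path.append(best_responses(path[-1]))
    return path

for profile in cournot_path(("H", "H", "H"), 6):
    print(",".join(profile))
# H,H,H
# H,H,T
# H,T,T
# T,T,T
# T,T,H
# T,H,H
# H,H,H
```

The process returns to its starting profile after six steps and visits exactly the six profiles in the support of the correlated equilibrium discussed earlier.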
31
Roadmap

Introduction to the common models of learning
Cournot adjustment
Fictitious play and Nash equilibria
Motivation
Definitions
Results
Generalizations of fictitious play

32
Fictitious play - Introduction

Motivation
Repeated game, stationarity assumption.
Each player forms a belief about his opponent's strategy by looking at what happened in the past.
Each player plays a best response to his/her belief.

33
Two-Player Fictitious Play - notations

S1 and S2 are finite actions spaces for players
one and two respectively.
S1 ,?,?
S2 ?,?,?
u1, u2 player payoff functions
u1(, ?)15
for mixed strategy we take
u1(lt½,½gt,lt¼, ¾gt ) u1(
u1(,lt¼, ¾gt )¼ u1(, ?) ¾ u1(, ?)
Player i is $p_i$, the opponent is $p_{-i}$, $i = 1, 2$

34
Two-Player Fictitious Play

Notion of belief:
A prediction of the opponent's action distribution, i.e. the degree to which player 1 believes player 2 will play each of his actions.
Assume players choose their actions in each period to maximize their expected payoff with respect to their belief for the current period.

35
Two-Player Fictitious Play Forming Beliefs

Player i starts with a weight function $K_0^i$:
$K_0^i : S_{-i} \to \mathbb{R}_+$
For example
K0i?,,? ? ?
K0i()4
As the game is iteratively repeated K is updated

36
Two-Player Fictitious Play Belief update

If some action was played (by the opponent!) the last time, we add 1 to its count. Generally,
$K_t^i(s_{-i}) = K_{t-1}^i(s_{-i}) + 1$ if $s_{-i}^{t-1} = s_{-i}$, and $K_t^i(s_{-i}) = K_{t-1}^i(s_{-i})$ otherwise.
That's a complicated way of saying that K(s), beyond its initial weight, simply counts the number of times the opponent played s.

37
Two-Player Fictitious Play Using frequencies to
form beliefs

Given K, the vector of weights (frequencies),
each player forms a probability vector $\gamma$ over his opponent's actions.
His belief is
$\gamma_t^i(s_{-i}) = K_t^i(s_{-i}) / \sum_{\tilde{s}_{-i} \in S_{-i}} K_t^i(\tilde{s}_{-i})$
(simple normalization).

Reads: the belief player i holds at time t regarding the probability that his opponent plays $s_{-i}$ at time t.
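
The weight update and the normalization can be sketched in a few lines. The action labels "a", "b", "c" and the initial weights are hypothetical placeholders, chosen so that the initial belief is (1/2, 1/4, 1/4).

```python
def update_weights(weights, observed_action):
    """Add 1 to the weight of the action the opponent just played."""
    new_weights = dict(weights)
    new_weights[observed_action] += 1
    return new_weights

def beliefs(weights):
    """Normalise the weights into a probability vector over opponent actions."""
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}

# Hypothetical initial weights over three opponent actions.
k0 = {"a": 2, "b": 1, "c": 1}
print(beliefs(k0))    # {'a': 0.5, 'b': 0.25, 'c': 0.25}

k1 = update_weights(k0, "b")    # the opponent played "b"
print(k1)             # {'a': 2, 'b': 2, 'c': 1}
print(beliefs(k1))    # {'a': 0.4, 'b': 0.4, 'c': 0.2}
```

Copying the dictionary in `update_weights` keeps earlier weight vectors intact, which makes it easy to compare beliefs across periods.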
38
Two-Player Fictitious Play Using frequencies 2
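My belief is that my opponent plays his first action with probability ½ and each of his other two actions with probability ¼; looking at my payoff table, I pick the action that maximizes my expected utility.

We now have a belief about how the opponent plays.
An FP is any rule $\rho_t^i$ which assigns a best-response action to the belief $\gamma_t^i$, i.e. $\rho_t^i(\gamma_t^i) \in BR_i(\gamma_t^i)$.
Example:
$\rho^1(\langle 1/2, 1/4, 1/4 \rangle)$ is an action of player 1 (payoffs extend naturally to mixed strategies).
This implies that $u_1(\rho^1(\langle 1/2, 1/4, 1/4 \rangle), \langle 1/2, 1/4, 1/4 \rangle)$ is at least as good for player 1 as the payoff of any other action against $\langle 1/2, 1/4, 1/4 \rangle$.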

39
Two-Player Fictitious Play remarks

Many BRs are possible for a given belief.
Examples of such rules $\rho$:
Always prefer a pure action over a mixed action.
Pick the best response whose action index is least (that's the limit of my creativity).
(Both, of course, must still be best responses.)

40
Two-Player Fictitious Play Interpretation (pages 31-32)

Bayesian inference:
Player i believes that the opponent's play corresponds to a sequence of i.i.d. multinomial random variables with a fixed but unknown distribution.
Player i's prior over that unknown distribution takes the form of a Dirichlet distribution.
i's prior and posterior beliefs correspond to a distribution over the set $\Delta(S_{-i})$ of probability distributions over $S_{-i}$.
The distribution over the opponent's strategies, $\gamma_t^i$, is the induced marginal distribution over pure strategies.
If beliefs over $\Delta(S_{-i})$ are denoted $\mu_t^i$, then we have
$\gamma_t^i(s_{-i}) = \int_{\Delta(S_{-i})} \sigma_{-i}(s_{-i}) \, \mu_t^i(d\sigma_{-i})$

41
Two-Player Fictitious Play Interpretation

Denote the marginal empirical distribution as $d_t^i(s_{-i}) = (K_t^i(s_{-i}) - K_0^i(s_{-i})) / t$
The assessment $\gamma$ is not the same as d because of the influence of i's prior belief.
The prior has the form of a fictitious sample observed before the game started.
As observations are incorporated into $\gamma$, it will converge to d (the empirical distribution).

42
Two-Player Fictitious Play Interpretation

Notes
As long as the initial weights are positive, the beliefs stay positive.
The belief reflects the conviction that the opponent's strategy is constant and unknown.
It may be wrong if the process cycles.
Any finite sequence of what looks like a cycle is actually consistent with the assumption that the world is constant and those observations are a fluke.
If cycles persist, we might expect i to notice it, but in any case his beliefs will not be falsified in the first few periods, as they were in the Cournot process.

43
Asymptotic Behavior: does play converge?

Sufficient conditions
Proposition 2.1
(1) If s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates.
(2) Any pure-strategy steady state of FP must be a Nash.

44
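If s is a strict Nash and is played at time t, s is played at all subsequent dates

Proof:
Suppose $\gamma_t^i$ (the players' beliefs) are such that the actions chosen at time t are the strict Nash s.
Believe me for now that when profile s is played at time t, each player's beliefs at t+1 are a convex combination of $\gamma_t^i$ and a point mass on $s_{-i}$: $\gamma_{t+1}^i = (1 - \alpha_t)\gamma_t^i + \alpha_t \delta(s_{-i})$
Since expected utility is linear in beliefs, we get
$u_i(s_i, \gamma_{t+1}^i) = (1 - \alpha_t)\,u_i(s_i, \gamma_t^i) + \alpha_t\,u_i(s_i, \delta(s_{-i}))$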

45
If s is a strict Nash and is played at time t, s is played at all subsequent dates

We want to show that the payoff of $s_i$ against $\gamma_{t+1}^i$ is still better than the payoff of any other action against $\gamma_{t+1}^i$.
Now $s_i$ was a BR to $\gamma_t^i$ (it was played at time t).
For the first term (the one with $\gamma_t^i$) this is immediate: $s_i$ is a BR to $\gamma_t^i$.
For the second term (the one with the point mass), $s_i$ is strictly better than any other action because the profile ($s_i$, $s_{-i}$) is a strict Nash, which was our assumption.
So $s_i$ is the unique BR to $\gamma_{t+1}^i$ and is played again at t+1.

46
So, what is a point mass on $s_{-i}$, and why is $\gamma_{t+1}^i$ a convex combination of it and of $\gamma_t^i$?

I need to show you that $\gamma_{t+1}^i = (1 - \alpha_t)\gamma_t^i + \alpha_t \delta(s_{-i})$
For clarity, let's say t = 10 and there are 2 players.
Let's say s = ((½, ½), (¼, ¾)).
$S_{-i} = S_2 = \{C, D\}$, and look at $\gamma_{11}^1(C)$.
Recall $\gamma_{10}^1(C) = K_{10}(C)/10$ (ignore the prior, it does not matter here).
Suppose that at time 10 player 2 actually played C (he played a mixed strategy, which is interpreted as playing C ¼ of the time).
$\gamma_{11}^1(C) = (K_{10}(C) + 1)/11$
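
To finish the argument, the update for t = 10 can be rewritten as a convex combination. This is a worked derivation under the slide's simplification that the prior is ignored; $\delta_C$ denotes the point mass on C (so $\delta_C(C) = 1$ and $\delta_C(D) = 0$), and $\alpha_{10} = 1/11$ is the weight that falls out of the algebra.

```latex
\begin{align*}
\gamma^1_{11}(C) &= \frac{K_{10}(C) + 1}{11}
   = \frac{10}{11}\cdot\frac{K_{10}(C)}{10} + \frac{1}{11}\cdot 1
   = (1-\alpha_{10})\,\gamma^1_{10}(C) + \alpha_{10}\,\delta_C(C), \\
\gamma^1_{11}(D) &= \frac{K_{10}(D)}{11}
   = \frac{10}{11}\cdot\frac{K_{10}(D)}{10} + \frac{1}{11}\cdot 0
   = (1-\alpha_{10})\,\gamma^1_{10}(D) + \alpha_{10}\,\delta_C(D),
   \qquad \alpha_{10} = \tfrac{1}{11}.
\end{align*}
```

Both components use the same weight, so the whole belief vector is the stated convex combination, and the weight on the new observation shrinks as t grows.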

47
2nd part: Any pure-strategy steady state of FP must be a Nash

A steady state is a strategy profile that is played in every step after, perhaps, a finite time T.
Ideas?
If play remains at a pure-strategy profile, then eventually the assessments will become concentrated at this profile.
If it were not a Nash, then for one of the players playing what he played would not be a BR. This contradicts how FP works, since all players always play a BR to their beliefs.
Food for thought: why does it not work for a mixed-strategy profile?

48
To Conclude this

We wanted to show that if s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates.
We showed it by looking at what happens to the players' beliefs and proving that the actions, given the new beliefs, are still strict BRs.
This means the system is at a steady state.
We also showed that if it is a pure-strategy steady state, it is a Nash.

49
No pure Nash => FP can't converge to a pure profile

Matching pennies (player I wants to match, player II wants to mismatch)
For example:
At time 3 player I believes that II prefers Tails, so he plays Tails to match.
But II plays Heads, so I adds one to Heads.
Now Heads > Tails, and I has convinced himself that II will play Heads, so he switches to H.
The game cycles and never converges to the Nash profile.

       H       T
H    1,-1    -1,1
T    -1,1    1,-1

Weights are (weight on opponent's H, weight on opponent's T) after the profile in that row was played.

t   Profile   I's weights   II's weights
1   Initial   (1.5,2)       (2,1.5)
2   (T,T)     (1.5,3)       (2,2.5)
3   (T,H)     (2.5,3)       (2,3.5)
4   (T,H)     (3.5,3)       (2,4.5)
5   (H,H)     (4.5,3)       (3,4.5)
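
The weight table can be regenerated by simulating the process. This is a minimal sketch, assuming player I wants to match, player II wants to mismatch, and each player's weights are kept as (weight on opponent's H, weight on opponent's T). Ties are not handled, because the half-integer initial weights from the slide never produce one.

```python
def fictitious_play_matching_pennies(w1, w2, periods):
    """Return a list of (profile, I's weights, II's weights) per period.

    w1: player I's weights on II's actions (H, T).
    w2: player II's weights on I's actions (H, T).
    Assumes no ties in the weights (true for the slide's initial weights).
    """
    w1, w2 = list(w1), list(w2)
    trace = []
    for _ in range(periods):
        # Player I matches the action he believes II is more likely to play.
        a1 = "H" if w1[0] > w1[1] else "T"
        # Player II plays the opposite of the action she expects from I.
        a2 = "T" if w2[0] > w2[1] else "H"
        w1["HT".index(a2)] += 1     # I observes II's action
        w2["HT".index(a1)] += 1     # II observes I's action
        trace.append((a1 + a2, tuple(w1), tuple(w2)))
    return trace

for row in fictitious_play_matching_pennies((1.5, 2), (2, 1.5), 4):
    print(row)
# ('TT', (1.5, 3), (2, 2.5))
# ('TH', (2.5, 3), (2, 3.5))
# ('TH', (3.5, 3), (2, 4.5))
# ('HH', (4.5, 3), (3, 4.5))
```

Running more periods shows the play cycling through the four profiles in ever longer runs, while the share of H in each player's history drifts toward one half.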
50
No pure Nash => FP can't converge to a pure profile

If the game did converge, it would be in a steady state that is pure and not a Nash (since matching pennies has no pure Nash), but we showed that any pure steady state must be a Nash.
It's OK, then, that the game does not converge.
Interestingly, the empirical distributions over each player's strategies are converging to (½, ½), and their product ((½, ½), (½, ½)) is a Nash.

51
Asymptotic Behavior

Proposition 2.2
If the empirical distributions over each player's choices converge, the strategy profile corresponding to the product of these distributions is a Nash.
Proof:
Intuitively, if the empirical distributions converge, then the beliefs converge to the same thing; hence, if it were not a Nash, players would eventually move away from it.
Generally, for this it is enough that the beliefs are asymptotically empirical; the process need not be FP.

52
Asymptotic Behavior

More results (proof omitted)
The empirical distributions converge if the game
(1) is 2x2 with generic payoffs, or
(2) is zero sum, or
(3) is solvable by iterated strict dominance.
The empirical distributions, however, need not converge! 2 examples.

53
Example: Shapley (1964)
       L        M        R
T    (0,0)    (1,0)    (0,1)
M    (0,1)    (0,0)    (1,0)
D    (1,0)    (0,1)    (0,0)
Nash is at (1/3, 1/3, 1/3) for both. If the initial weights lead to (T,M), we cycle. The diagonal profiles (T,L), (M,M), (D,R) are never played. The number of consecutive periods each profile is played increases sufficiently fast that the empirical distributions never converge.
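
The growing run lengths can be observed directly. This is a minimal sketch using the payoff table above; the initial weights are hypothetical, chosen so that the first profile played is (T,M) and so that no ties ever occur (their pairwise differences are never whole numbers).

```python
# Payoffs from the slide: rows T, M, D for player 1; columns L, M, R for player 2.
U1 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
U2 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
ROWS, COLS = "TMD", "LMR"

def simulate(w1, w2, periods):
    """Fictitious play. w1: player 1's weights on columns; w2: player 2's on rows."""
    w1, w2 = list(w1), list(w2)
    history = []
    for _ in range(periods):
        row_values = [sum(U1[r][c] * w1[c] for c in range(3)) for r in range(3)]
        col_values = [sum(U2[r][c] * w2[r] for r in range(3)) for c in range(3)]
        r = row_values.index(max(row_values))   # best response of player 1
        c = col_values.index(max(col_values))   # best response of player 2
        history.append(ROWS[r] + COLS[c])
        w1[c] += 1      # player 1 observes the column played
        w2[r] += 1      # player 2 observes the row played
    return history

def run_lengths(history):
    """Collapse the history into (profile, consecutive periods played) pairs."""
    runs = []
    for profile in history:
        if runs and runs[-1][0] == profile:
            runs[-1][1] += 1
        else:
            runs.append([profile, 1])
    return [(profile, n) for profile, n in runs]

history = simulate(w1=(0.25, 1.0, 0.5), w2=(0.5, 0.25, 1.0), periods=100)
print(run_lengths(history)[:7])
# [('TM', 1), ('TR', 2), ('MR', 4), ('ML', 7), ('DL', 11), ('DM', 17), ('TM', 26)]
```

Play moves around the six off-diagonal profiles in a fixed order, and each visit lasts longer than the previous one.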
54
Example due to Jordan

Increase the payoffs on the diagonal to (1/2,1/2).
We get Rock-Paper-Scissors, a zero-sum game (payoffs sum to a constant) with the same Nash.
Here the empirical distributions do converge. There are still cycles, but the number of consecutive periods each profile is played grows slowly.
Note that this does not conflict with our previous statements, namely:
Here's a ZSG whose empirical distributions converge but whose play is not steady (it cycles).
Had the empirical distributions converged to a pure-strategy Nash, the fact that play cycled would have been a conflict with our earlier result.

55
Payoffs in Fictitious Play

The question we deal with here: if FP learns the distributions, then it should asymptotically yield the same utility that would be achieved if the frequency distribution were known in advance.
Here we allow more than 2 players; their assessments track the joint distribution of the opponents' strategies.

56
Payoffs Fictitious Play - Notations

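Empirical joint distribution of the opponents' play: $D_t^{-i}$
Best payoff against the empirical distribution: $\hat{U}_t^i = \max_{\sigma_i} u_i(\sigma_i, D_t^{-i})$
Time average of realized payoffs: $U_t^i = \frac{1}{t} \sum_{\tau=1}^{t} u_i(s_\tau)$
Definition:
Fictitious play is ε-consistent along a history if there is a T such that for any t ≥ T, $U_t^i + \varepsilon \ge \hat{U}_t^i$ for all i.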

58
Consistency

We don't look at how well the player does globally, but at how well he does in comparison to his expectations, which are built upon his beliefs.
If FP were not consistent it would be a much less interesting model; it would be as if someone were simply playing a different game. So we want consistency.

59
Consistency

If play is consistent, this can be useful: if after some period of time a player sees that his expectations are not fulfilled (by comparing the expected payoff with the actual payoff), he can deduce that something is wrong in his model of the world.

60
Consistency main result

Relate the frequency with which a player switches strategies to the consistency of his play.
We define the frequency of switches $\eta_t^i$ to be the fraction of periods $\tau \le t$ for which $s_\tau^i \ne s_{\tau-1}^i$.

61
Fictitious Play and BR dynamics

Definition
Fictitious play exhibits infrequent switches along a history if for every ε > 0 there is a T such that for any t ≥ T, $\eta_t^i \le \varepsilon$ for all i.
Proposition (Fudenberg and Levine 94)
If FP exhibits infrequent switches along a history, then it is ε-consistent along that history for every ε > 0.

62
Infrequent switches => ε-consistency

Intuition:
Once the prior loses its influence, at each date t player i plays a BR to the empirical distribution (the observations) through date t-1.
On the other hand, if i is not doing on average as well as the best response to the empirical distribution, there must be a non-negligible fraction of dates t at which i's action is not a BR to the empirical distribution at the end of date t; but in that case player i must switch at date t+1. Conversely, infrequent switches imply that most of the time i's date-t action is a best response to the empirical distribution at the end of date t.

63
Proof - notations

k length of initial history (kept as prior
belief)
initial belief
best response strategy of i to
player i expected date-t payoff is

64
Proof - summary

We showed that if there are not many switches along the player's history, his play is consistent.
It means that the actual payoffs do not fall far below the expectation!
Again, one can use this to see if something is wrong with his model of the environment.

65
Roadmap

Crash introduction to the common models of
learning
Cournot adjustment
Fictitious play and Nash equilibria
Motivation
Definitions
Results
Generalizations of fictitious play

66
Generalizations of Fictitious Play

Mainly generalize the update rule for the belief.
Example: exponential weighting.
As long as beliefs are asymptotically empirical, what we showed for FP holds.

67
Summary

Dynamics of games and a flavor of analysis.
Although the analysis is different from Nash analysis, Nash is still an important point: we showed that it is still a point FP can converge to.
FP is consistent when switches are infrequent.
Have a nice summer vacation. Thanks for listening.
