
Fictitious Play: The Theory of Learning in Games, D. Fudenberg and D. Levine

- Speaker: Tzur Sayag
- 03/06/2003

Do you believe that PM Sharon is serious about the peace process?

- A voter has to decide if he should support PM Sharon
- Belief: Sharon will never evacuate settlements
- Action: Vote against the new economics revolution.
- May 24: Sharon announces occupation is no-good
- Belief: Sharon will probably never evacuate settlements
- Action: Vote against the new economics revolution
- Jun 5: Sharon meets Abu-Mazen and declares support for a Palestinian state.
- Belief: Seems like Sharon might evacuate the settlements after all
- Action: Vote for the new economics revolution.

Roadmap

- Introduction to the common models of learning in games
- Cournot adjustment
- Fictitious play and Nash equilibriums
- Motivation
- Definitions
- Results
- Generalizations of fictitious play, if we have time

Notations

[Payoff table for Player 1 and Player 2]

P1 gets a1 and P2 gets b1 if they play <Action1, Action1> respectively

Learning in Games - 1

- Repeated games: same or related
- Fixed-player model
- Teach the opponent to play a best response to a particular action by repeating it over and over.

Being Sophisticated: Example

- D is dominant for Bob.
- If Alice learns that Bob only plays D, the game converges to <D,L>
- Bob's payoff for <D,L> is 2.
- If Bob is patient, he can always play U and just wait for a while
- If Bob always plays U,
- Alice, who thought Bob was going to play D, should shift her play from L to R (since L was only good when Bob actually played D)
- So Bob plays constant U, which leads Alice to play constant R, with payoff 2 > 1.
- In this case Bob gets 3, which is better.
- Bingo!

[Payoff table for Alice and Bob]

Being Sophisticated: Abstracting

- Most learning theory relies on models in which the incentive to alter the future play of the opponent is small.
- Locked in for 2 periods
- Large anonymous population
- Embed a two-player game by pairing players randomly from a large population.

Models of Embedding

- Single-pair model
- random single pair, actions revealed to everyone
- Aggregate statistic model
- all players randomly matched, aggregate outcomes revealed to everyone
- Random-matching model
- all players randomly matched, each player sees only the outcome of his own game

Three common models of Learning

- Fictitious play
- Players observe only their own matches and play a best response to the frequencies.
- Partial best-response
- A fixed portion of the population switches each period from its current action to a BR to the aggregate statistics from the previous period.
- Replicator dynamics
- The share of the population using each strategy grows proportionally to that strategy's current payoff.

Cournot Adjustment: a flavor of analysis

- Two firms, 1 and 2.
- Strategy: choose a quantity $s_i \in [0, \infty)$
- Strategy profile is $\langle s_i, s_{-i} \rangle \in S$
- Utility for i is $u_i(\langle s_i, s_{-i} \rangle)$
- Assume $u_i(\langle \cdot, s_{-i} \rangle)$ is strictly concave
- $BR_i(s_{-i}) = \arg\max_{x \in S_i} u_i(\langle x, s_{-i} \rangle)$

BR is unique: since $u_i$ is strictly concave in the player's own quantity, the relevant $u_i''$ is negative. This means that $u_i'$ is a monotone decreasing function, which means it has at most one zero, which means, yes, you guessed it right, $u_i$ has only one extreme point and the max is therefore unique. $u_i'$ can't be constant since $u_i$ is STRICTLY concave by assumption.

Cournot Adjustment Model

- Time periods t = 0, 1, 2, ... (discrete)
- State profile $\theta_0 \in S$
- In each period each player chooses a pure strategy that is a BR to the opponent's play in the previous period
- Formally, i chooses $s_i^t = BR_i(s_{-i}^{t-1})$
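The adjustment rule above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: a linear inverse demand `a - b*(q1 + q2)` and constant marginal cost `c` (all parameter values hypothetical), for which each firm's profit is strictly concave in its own quantity and the best response has a closed form.

```python
def best_response(q_other, a=12.0, b=1.0, c=0.0):
    """BR of a firm with profit q*(a - b*(q + q_other)) - c*q.

    The linear demand and the values of a, b, c are assumptions for this sketch.
    """
    return max(0.0, (a - c - b * q_other) / (2.0 * b))


def cournot_adjustment(q1, q2, periods=50):
    """Each period both firms best-respond to the rival's previous quantity."""
    path = [(q1, q2)]
    for _ in range(periods):
        q1, q2 = best_response(q2), best_response(q1)
        path.append((q1, q2))
    return path


path = cournot_adjustment(1.0, 9.0)
final_q1, final_q2 = path[-1]
print(path[:3])
print(final_q1, final_q2)
```

With these assumed parameters the path settles at the profile where each quantity is a best response to the other, which is the steady state of the process.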

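Cournot Dynamics: Reaction Curve

[Figure: reaction curves BR1 and BR2 in the ($\theta^1$, $\theta^2$) plane, with the state $\theta_t = (\theta_t^1, \theta_t^2)$ and player 1's new BR if 2 plays $\theta_t^2$.]

BR1: for every $\theta^2$ the curve gives the BR of player 1 against it. The value for player 1 is the height at point $\theta^2$.

Can you convince yourself that the point where BR1 and BR2 intersect is a Nash?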

Cournot Dynamics

- A movement between profiles such that
- $\theta_{t+1} = f(\theta_t)$, $f_i(\theta_t) = BR_i(\theta_t^{-i})$
- A steady state is $\theta^s$ s.t. $\theta^s = f(\theta^s)$
- Once $\theta_t = \theta^s$ the system remains there
- Claim (simple): $\theta^s$ is a NASH
- Proof: by definition, for every player i, $\theta_i^s = BR_i(\theta_{-i}^s)$, so players don't want to move.
- SO EVERY STEADY STATE IS A NASH EQUILIBRIUM

Cournot Dynamics: oblivious to linear transformation

- Proposition 1.1: Suppose $u_i'(s) = a\,u_i(s) + v_i(s_{-i})$ with $a > 0$ for all players i. Then u' and u are best-response equivalent
- Proof
- $v_i(s_{-i})$ depends only on the opponent's play, so it does not change the ranking of my actions
- Multiplying all payoffs by the same positive constant a has no effect on the ranking
- So a transformation that leaves preferences, and consequently best responses, unchanged will give rise to the same dynamic learning process.
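A quick numerical check of the proposition, as a sketch only: the payoff numbers, the constant `a` and the shift `v` below are made up for illustration. The check compares the best-response sets under `u` and under the transformed payoffs `a*u + v(opponent action)`.

```python
def best_responses(u, own_actions, opp_action):
    """Set of own actions that maximize u against a fixed opponent action."""
    best = max(u[(s, opp_action)] for s in own_actions)
    return {s for s in own_actions if u[(s, opp_action)] == best}


actions = ["A", "B"]
# Hypothetical payoffs of one player, keyed by (own action, opponent action).
u = {("A", "A"): 2, ("A", "B"): 0, ("B", "A"): 1, ("B", "B"): 3}
a = 5                       # positive scaling constant (assumed)
v = {"A": -7, "B": 4}       # shift that depends only on the opponent's action
u_prime = {(s, o): a * u[(s, o)] + v[o] for (s, o) in u}

same = all(
    best_responses(u, actions, o) == best_responses(u_prime, actions, o)
    for o in actions
)
print(same)
```

The shift `v` moves every payoff in a column by the same amount and the positive factor `a` preserves order, so the best-response sets coincide column by column.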

Cournot Dynamics and Zero-Sum Games

- Recall: payoffs in a ZSG add to zero.
- Proposition 1.2: every 2 x 2 game for which the best-response correspondences have a unique intersection that lies in the interior of the strategy space is best-response equivalent to a zero-sum game.
- Proof: given G, a 2x2 game with unique intersection,
- w.l.o.g. assume 1) A is a BR for player 1 against A, 2) B is a BR for player 2 against A
- If A were also a BR for player 2 against A, then <A,A> would be an intersection of the BR correspondences at a pure profile, which contradicts our assumption.

Cournot Dynamics and Zero-Sum Games 2

- Proof outline: given G, the 2x2 game with unique intersection, we build a zero-sum game that has the same best responses. Observe the following ZSG.
- [Payoff table of the ZSG: $u_1(A,A) = 1$, $u_1(A,B) = 0$, $u_1(B,A) = a$, $u_1(B,B) = b$, and $u_2 = -u_1$]
- If a < 1 then $BR_1(A) = A$, since $u_1(A,A) = 1$ but $u_1(B,A)$ is only a
- If a < 1 then $BR_2(A) = B$, since $u_2(A,B) = 0$ but $u_2(A,A)$ is only -1
- Denote by $s_i$ player i's probability of playing A
- Claim 1: player 1 is indifferent between A and B if $s_2 = a s_2 + b(1 - s_2)$
- Claim 2: player 2 is indifferent between A and B if $s_1 + a(1 - s_1) = b(1 - s_1)$

Proof of: player 1 is indifferent between A and B if $s_2 = a s_2 + b(1 - s_2)$

- Assume $s_2 = a s_2 + b(1 - s_2)$ (*) ($s_2$ is the probability that 2 plays A)
- (1) If player 1 plays A he gets
- $u_1(A, s_2) = s_2\,u_1(A,A) + (1 - s_2)\,u_1(A,B) = s_2 (1) + (1 - s_2)(0) = s_2$ (by the game table)
- (2) If player 1 plays B he gets
- $u_1(B, s_2) = s_2\,u_1(B,A) + (1 - s_2)\,u_1(B,B) = s_2 a + (1 - s_2) b$ (by the game table)
- So if (1) = (2) he does not care which to choose, and (1) = (2) means $s_2 = s_2 a + (1 - s_2) b$, as required.
- The proof of claim 2, regarding 2's indifference, follows the same path.

Proof cont.: Building the ZSG

Mental note: $s_i$ = Pr[player i plays A]

- 1 is indifferent between A and B if $s_2 = a s_2 + b(1 - s_2)$
- 2 is indifferent between A and B if $s_1 + a(1 - s_1) = b(1 - s_1)$
- Fixing an intersection point $(s_1, s_2)$, we can solve these two equations for the unknown payoffs a, b: $a = (s_2 - s_1)/(1 - s_1)$ and $b = s_2/(1 - s_1)$. Notice that $0 < s_i < 1$ (the intersection is interior; otherwise i never mixes)
- $s_2 < 1$ implies $s_2 - s_1 < 1 - s_1$, hence a < 1. Q.E.D. We already showed that when a < 1 we get the same best responses we had in the original game G: A for player 1 against A, B for player 2 against A
- To sum up: it should have been obvious that $(s_1, s_2)$ is a Nash; the point was to find a 2x2 ZSG which has the same best responses as the original game
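The construction can be checked numerically. The sketch below assumes the zero-sum table used in the proof (player 1 gets 1, 0, a, b at <A,A>, <A,B>, <B,A>, <B,B>, and player 2 gets the negatives) and solves the two indifference conditions for a and b in closed form; the intersection point is a hypothetical example.

```python
from fractions import Fraction


def zero_sum_payoffs(s1, s2):
    """Solve the two indifference conditions for the unknown payoffs a and b.

    From s2 = a*s2 + b*(1 - s2) and s1 + a*(1 - s1) = b*(1 - s1).
    """
    a = (s2 - s1) / (1 - s1)
    b = s2 / (1 - s1)
    return a, b


# Hypothetical interior intersection point (probabilities of playing A).
s1, s2 = Fraction(1, 3), Fraction(3, 4)
a, b = zero_sum_payoffs(s1, s2)

# Player 1: A yields s2, B yields a*s2 + b*(1 - s2).
p1_indifferent = s2 == a * s2 + b * (1 - s2)
# Player 2 (negated payoffs): A costs s1 + a*(1 - s1), B costs b*(1 - s1).
p2_indifferent = s1 + a * (1 - s1) == b * (1 - s1)

print(a, b, p1_indifferent, p2_indifferent)
```

Exact rational arithmetic is used so the indifference conditions can be tested with equality rather than a tolerance.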

Strategic-Form Games

- Finite actions
- One-shot simultaneous-move games
- Players, strategy spaces and payoff functions make up the strategic form of a game

Nash and Correlated Nash

- A game can have several Nash equilibria: <A,A>, <B,B>, <(1/2,1/2), (1/2,1/2)>, but the payoffs may be different. <A,A> gets 2 for each; <(1/2,1/2), (1/2,1/2)> gets 1 for each.
- Let's question the robustness of the mixed-strategy Nash point.
- Intuitively, at the mixed equilibrium players are indifferent (in real life: play A, B, whatever), so one may believe that the other plays A with slightly more probability. He then wants to switch to pure A, so the robustness of the mixed Nash seems questionable.

Nash and Correlated Nash

- A Nash is strict if for each player i, $s_i$ is the unique best response to $s_{-i}$
- Only pure-strategy equilibria can be strict: if a mixed strategy is a BR, then so is every pure strategy in the mixed strategy's support (otherwise there is no point in including it).
- Recall: the support of a mixed strategy is the set of pure strategies that are played with positive probability.

Some Questions in Theory of Games

- When and why should we expect play to correspond to a Nash equilibrium?
- If there are several Nash equilibria, which one should we expect to occur?
- In the previous example, in the absence of coordination, we face the possibility that player 1 expects NE1 <A,A>, so he plays A, while the opponent expects NE2 <B,B> and plays B, resulting in the non-equilibrium outcome <A,B>

The Idea of Learning based explanation of equilibrium

- Intuitively, the history of observations can provide a way for the players to coordinate their expectations on one of the two pure-strategy equilibria.
- Typically, learning models predict that this coordination will eventually occur, with the determination of which of the two equilibria arises left to initial conditions or to random chance.

The Idea of Learning based explanation of equilibrium

- For the history to serve this coordination role, the sequence of actions played must eventually become constant, or at least readily predictable by the players. Of course, there is no presumption that this is always the case.
- Perhaps, rather than going to a Nash, players wander around the space aimlessly, or perhaps play lies in some set of alternatives larger than the set of Nash equilibria?

The Idea of Learning based explanation of equilibrium

- For the simple coordination game (symmetric <2,2>, <0,0>) there is no reason to think that any learning process will prefer one Nash over the other.
- What if we alter it such that there is a better Nash? Will the players learn to play the <A,A> Nash?

[Altered payoff table]

Correlated Nash (Aumann 74)

- Suppose the players have access to randomization devices that are privately viewed.
- If each player chooses a strategy according to his own randomization device, the result is a probability distribution over strategy profiles, denoted $\mu \in \Delta(S)$.
- Unlike a profile of mixed strategies, which is by definition uncorrelated, such a distribution may be correlated.

Correlated Nash: Jordan's matching pennies

- 3 players.
- Each chooses H or T
- Payoffs are 1 or -1 only
- 1 wins if he matches 2
- 2 wins if he matches 3
- 3 wins by not matching 1
- This game has a unique NE: each plays (1/2,1/2)
- However, it has many correlated NE.

[Payoff tables: one for "Player 3 plays H", one for "Player 3 plays T"]

Correlated Nash: Jordan's matching pennies

- C-NE: the uniform distribution over these 6 profiles: (H,H,H) (H,H,T) (H,T,T) (T,T,T) (T,T,H) (T,H,H)
- Each player plays H with probability 50%.
- No weight is placed on (H,T,H), so the play of the players is not independent (it is correlated)
- For Player 1: when he plays H, he faces a 1/3 chance each that his opponents play (H,H), (H,T), (T,T). Since his goal is to match 2, he wins 2/3 of the time by playing H and only 1/3 of the time if he plays T. Similarly, when he plays T his opponents only play (T,T), (T,H), (H,H). Now tails wins 2/3 of the time, against heads, which wins only 1/3 of the time. So in both cases he cannot gain by deviating. He is at a correlated Nash.
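The argument given for Player 1 can be repeated mechanically for all three players. The following is a minimal sketch (function and variable names are made up here) that checks, for every player and every action he may be told to play, that deviating does not raise his conditional expected payoff under the uniform distribution on the six profiles.

```python
from fractions import Fraction
from itertools import product

SUPPORT = [("H", "H", "H"), ("H", "H", "T"), ("H", "T", "T"),
           ("T", "T", "T"), ("T", "T", "H"), ("T", "H", "H")]
MU = {profile: Fraction(1, 6) for profile in SUPPORT}


def payoff(i, profile):
    """Payoffs of the three-player game: 1 matches 2, 2 matches 3, 3 mismatches 1."""
    s1, s2, s3 = profile
    if i == 0:
        return 1 if s1 == s2 else -1
    if i == 1:
        return 1 if s2 == s3 else -1
    return 1 if s3 != s1 else -1


def is_correlated_equilibrium(mu):
    """No player gains by replacing a recommended action with another one."""
    for i in range(3):
        for rec, dev in product("HT", repeat=2):
            told = [(p, w) for p, w in mu.items() if p[i] == rec]
            follow = sum(w * payoff(i, p) for p, w in told)
            deviate = sum(w * payoff(i, p[:i] + (dev,) + p[i + 1:]) for p, w in told)
            if deviate > follow:
                return False
    return True


prob_heads = [sum(w for p, w in MU.items() if p[i] == "H") for i in range(3)]
print(is_correlated_equilibrium(MU), prob_heads)
```

The sums are left unnormalized because dividing both sides by the probability of the recommendation does not change the comparison.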

Why is Correlated Nash of Significance?

- Hint: cycles create correlation between profile strategies.
- Informally, a cycle is a finite sequence of profiles of length k such that $s^0 = s^k$.
- Cournot play can exhibit cycles; an example follows.
- So cycles => correlation => correlated Nash

Cournot Cycle: matching pennies

- 3 players (heads, tails)
- 1 wants to match 2.
- 2 wants to match 3.
- 3 wants to un-match 1.
- Cournot means each player assumes his opponents play the same as in their last step
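The cycle can be traced directly. In this sketch (helper names are illustrative) each player best-responds to what the relevant opponent did in the previous step: 1 copies 2's last action, 2 copies 3's last action, and 3 plays the opposite of 1's last action.

```python
def flip(x):
    """Return the other side of the coin."""
    return "T" if x == "H" else "H"


def cournot_step(profile):
    """One Cournot step: every player best-responds to last period's profile."""
    s1, s2, s3 = profile
    return (s2, s3, flip(s1))


def cycle_from(start):
    """Follow Cournot steps from `start` until the profile recurs."""
    seen = [start]
    current = cournot_step(start)
    while current != start:
        seen.append(current)
        current = cournot_step(current)
    return seen


cycle = cycle_from(("H", "H", "H"))
print(cycle)
```

Starting from (H,H,H) the process visits six profiles before returning, the same six profiles that carry the weight in the correlated equilibrium of the previous slides.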

Roadmap

- Introduction to the common models of learning
- Cournot adjustment
- Fictitious play and Nash equilibriums
- Motivation
- Definitions
- Results
- Generalizations of fictitious play

Fictitious play - Introduction

- Motivation
- Repeated game, stationarity assumption.
- Each player forms a belief about his opponent's strategy by looking at what happened
- Each player plays a best response according to his/her belief

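Two-Player Fictitious Play: notations

- $S_1$ and $S_2$ are finite action spaces for players one and two respectively.
- [Example action sets for $S_1$ and $S_2$; the action symbols are not recoverable]
- $u_1$, $u_2$: player payoff functions
- e.g. $u_1$ of one pure action pair is 15
- For mixed strategies we take expectations
- e.g. against <¼, ¾>, the payoff of a pure action is ¼ of its payoff against the opponent's first action plus ¾ of its payoff against the second
- Player i is $p_i$, the opponent is $p_{-i}$, i = 1, 2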

Two-Player Fictitious Play

- Notion of belief
- A prediction of the opponent's action distribution: the degree to which 1 believes 2 will play each action
- Assume players choose their actions in each period to maximize their expected payoff with respect to their belief for the current period.

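Two-Player Fictitious Play: Forming Beliefs

- Player i starts with a weight function $K_0^i$
- $K_0^i : S_{-i} \to \mathbb{R}_+$
- For example, $K_0^i$ assigns a weight to each of the opponent's actions, such as the weight 4 to one of them
- As the game is iteratively repeated, K is updated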

Two-Player Fictitious Play: Belief update

- If some action was played (by the opponent!) the last time, we add 1 to its count; generally
- $K_t^i(s_{-i}) = K_{t-1}^i(s_{-i}) + 1$ if $s_{-i}^{t-1} = s_{-i}$, and $K_t^i(s_{-i}) = K_{t-1}^i(s_{-i})$ otherwise
- That's a complicated way of saying that K(s) simply counts the number of times the opponent played s.

Two-Player Fictitious Play: Using frequencies to form beliefs

- Given K, the frequency vector,
- Each player forms a probability vector $\gamma$ over his opponent's actions
- His belief is that the opponent plays each action with probability proportional to its weight:
- $\gamma_t^i(s_{-i}) = K_t^i(s_{-i}) / \sum_{\tilde{s}_{-i} \in S_{-i}} K_t^i(\tilde{s}_{-i})$
- Simple normalization

$\gamma_t^i(s_{-i})$ reads: the belief player i holds at time t regarding the probability that his opponent plays $s_{-i}$ at time t
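The counting and normalization steps can be sketched as follows. The prior weights and the observed sequence are hypothetical, and the dictionary representation is just one convenient choice.

```python
def update_weights(weights, observed):
    """Add 1 to the weight of the action the opponent just played."""
    new_weights = dict(weights)
    new_weights[observed] += 1
    return new_weights


def beliefs(weights):
    """Normalize the weights into a probability vector over opponent actions."""
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}


k = {"C": 1.0, "D": 1.0}            # hypothetical prior weights K_0
for observed in ["C", "C", "D", "C"]:
    k = update_weights(k, observed)

gamma = beliefs(k)
print(k, gamma)
```

After these four observations the weights are 4 on C and 2 on D, so the belief puts probability 2/3 on C; the prior acts like two extra observations made before play began.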

Two-Player Fictitious Play: Using frequencies 2

My belief is that my opponent plays his three actions with probabilities ½, ¼ and ¼; looking at my payoff table, I can pick the action that maximizes my expected utility

- We now have a belief about how the opponent plays.
- A FP is any rule $\rho_t^i$ which assigns a best-response action to the belief $\gamma_t^i$
- Example
- $\rho^1(<½,¼,¼>)$ is one of player 1's actions (extends naturally to mixed)
- This implies that this action's payoff against <½,¼,¼> is at least as good for player 1 as that of any other action against <½,¼,¼>

Two-Player Fictitious Play: remarks

- Many BRs are possible for a given belief
- Examples of such rules $\rho$ may be
- Always prefer a pure action over a mixed action
- Pick the best response for which your action index is least (that's the limit of my creativity)
- (both of course must still be best responses)

Two-Player Fictitious Play: Interpretation (pages 31-32)

- Bayesian inference
- Player i believes his opponent's play corresponds to a sequence of i.i.d. multinomial random variables with a fixed but unknown distribution.
- Player i's prior over that unknown distribution takes the form of a Dirichlet distribution.
- i's prior and posterior beliefs correspond to a distribution over the set $\Delta(S_{-i})$ of probability distributions over $S_{-i}$
- The distribution over the opponent's strategies, $\gamma_t^i$, is the induced marginal distribution over pure strategies.
- If beliefs over $\Delta(S_{-i})$ are denoted $\mu_t^i$, then we have $\gamma_t^i(s_{-i}) = \int_{\Delta(S_{-i})} \sigma_{-i}(s_{-i}) \, \mu_t^i(d\sigma_{-i})$

Two-Player Fictitious Play: Interpretation

- Denote the marginal empirical distribution as d
- The assessment $\gamma$ is not the same as d because of the influence of i's prior belief
- The prior has the form of a fictitious sample observed before the game started.
- As observations are incorporated into $\gamma$, it will converge to d (the empirical distribution)

Two-Player Fictitious Play: Interpretation

- Notes
- As long as the initial weights are positive, the beliefs stay positive
- The belief reflects the conviction that the opponent's strategy is constant and unknown.
- It may be wrong if the process cycles.
- Any finite sequence of what looks like a cycle is actually consistent with the assumption that the world is constant and those observations are a fluke
- If cycles persist, we might expect i to notice it, but in any case his beliefs will not be falsified in the first few periods, as they were in the Cournot process.

Asymptotic Behavior: does play converge?

- Sufficient conditions
- Proposition 2.1
- (1) If s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates
- (2) Any pure-strategy steady state of FP must be a Nash

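If s is a strict Nash and is played at time t, s is played at all subsequent dates

- Proof
- Suppose $\gamma_t^i$ (the players' beliefs) are such that the actions played are the strict Nash s.
- Believe me that when profile s is played at time t, each player's belief at t+1 is a convex combination of $\gamma_t^i$ and a point mass on $s_{-i}$: $\gamma_{t+1}^i = (1 - \alpha_t)\gamma_t^i + \alpha_t \delta(s_{-i})$
- By linearity of expected utility we get $u_i(s_i, \gamma_{t+1}^i) = (1 - \alpha_t)\,u_i(s_i, \gamma_t^i) + \alpha_t\,u_i(s_i, \delta(s_{-i}))$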

If s is a strict Nash and is played at time t, s is played at all subsequent dates

- We want to show that this payoff is still better than any other payoff against $\gamma_{t+1}^i$
- Now $s_i$ was a strict BR for $\gamma_t^i$
- So it should be obvious for the first term (by the assumption that $s_i$ is a strict BR for $\gamma_t^i$).
- For the second term, the point mass, it is obvious that $s_i$ is better, because the profile <$s_i$, $s_{-i}$> is a strict Nash, which was our assumption

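So, what is a point mass on $s_{-i}$, and why is $\gamma_{t+1}^i$ a convex combination of it and of $\gamma_t^i$?

- I need to show you that $\gamma_{t+1}^i = (1 - \alpha_t)\gamma_t^i + \alpha_t \delta(s_{-i})$
- For clarity, let's say t = 10 and there are 2 players,
- Let's say s = ((½, ½), (¼, ¾))
- $S_{-i} = S_2 = \{C, D\}$, and look at $\gamma_{10}^1(C)$
- Recall $\gamma_{10}^1(C) = K_{10}(C)/10$ (ignore the prior, it does not matter)
- Suppose that at time 10 player 2 actually played C (he played a mixed strategy, which is interpreted as: ¼ of the time he would play C)
- $\gamma_{11}^1(C) = (K_{10}(C) + 1)/11 = (10/11)\,\gamma_{10}^1(C) + (1/11) \cdot 1$, so $\alpha_{10} = 1/11$ and the point mass puts probability 1 on C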

2nd part: Any pure-strategy steady state of FP must be a Nash

- A steady state is a strategy profile that is played in every step after, perhaps, a finite time T.
- Ideas?
- If play remains at a pure-strategy profile, then eventually the assessments will become concentrated at this profile.
- If it were not a Nash, then for one of the players, playing what he played would not be a BR; this contradicts how FP works,
- since all players always play a BR according to their belief.
- Food for thought: why does it not work for a mixed-strategy profile?

To Conclude this

- We wanted to show that if s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates
- We showed it by looking at what happens to the players' beliefs and proving that the actions are still strict BRs given the new beliefs.
- This means the system is at a steady state.
- We also showed that if it is a pure-strategy steady state, it is a Nash.

No Pure Nash => FP can't converge to a pure profile

- Matching pennies
- For example
- At time 3 player I believes that II prefers Tails, so he plays Tails to match
- But II plays Heads, so I adds one to Heads
- Now Heads > Tails and I has convinced himself that II will play Heads, so he switches to H
- The game cycles and never converges to a pure profile.

No Pure Nash => FP can't converge to a pure profile

- If the game did converge, it would be in a steady state that is pure and not a Nash (since matching pennies has no pure Nash), but we showed that any pure steady state must be a Nash.
- It's OK, then, that the game does not converge.
- Interestingly, the empirical distributions over each player's strategies are converging to (½, ½); their product ((½, ½), (½, ½)) is a Nash.
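A small simulation illustrates both points: actual play keeps cycling while the empirical frequencies approach ½. This is a sketch under assumptions not in the slides: the initial weights are hypothetical (chosen as non-integers so ties never occur), player I wants to match and player II wants to mismatch.

```python
def fictitious_play_matching_pennies(periods, w1=(1.5, 2.0), w2=(2.0, 1.5)):
    """Run fictitious play; index 0 is Heads, index 1 is Tails.

    w1: player I's weights on II's actions; w2: player II's weights on I's actions.
    Returns the empirical frequency of Heads for each player and I's action history.
    """
    w1, w2 = list(w1), list(w2)
    heads = [0, 0]
    history = []
    for _ in range(periods):
        a1 = 0 if w1[0] >= w1[1] else 1      # I matches II's more likely action
        a2 = 1 if w2[0] >= w2[1] else 0      # II plays the opposite of I's more likely action
        heads[0] += a1 == 0
        heads[1] += a2 == 0
        history.append(a1)
        w1[a2] += 1                          # I observes II's action
        w2[a1] += 1                          # II observes I's action
    return heads[0] / periods, heads[1] / periods, history


freq1, freq2, history = fictitious_play_matching_pennies(10000)
print(freq1, freq2)
```

Player I's action history contains both Heads and Tails throughout, so play itself does not settle, even though the two frequencies end up close to ½.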

Asymptotic Behavior

- Proposition 2.2
- If the empirical distribution d over each player's choices converges, the strategy profile corresponding to the product of these distributions is a Nash
- Proof
- Intuitively, if the empirical distribution does converge, then the belief converges to the same thing; hence, if it were not a Nash, players would move away from it.
- More generally, for this it is enough that the beliefs are asymptotically empirical; they need not come from FP

Asymptotic Behavior

- More results (proofs omitted)
- The empirical distribution converges if the game is
- (1) a 2x2 game with generic payoffs
- (2) zero sum
- (3) solvable by iterated strict dominance
- The empirical distribution, however, need not converge! 2 examples.

Example: Shapley (1964)

Nash is at (1/3, 1/3, 1/3) for both players. If the initial weights lead to <T,M>, we cycle. Diagonals are never played. (<T,L>, <D,M>, <T,M>) The number of consecutive periods each profile is played increases sufficiently fast that the empirical distributions never converge.

Example due to Jordan

- Increase the diagonal to (1/2,1/2).
- We get rock-paper-scissors, a zero-sum game with the same Nash
- Here the empirical distributions do converge; there are still cycles, but the number of repeated profiles within a sequence grows slowly.
- Note that this does not conflict with our previous statements, namely
- Here's a ZSG whose empirical distributions converge but whose play is not steady (it cycles)
- Had the empirical distributions converged to a pure-strategy Nash, the fact that play cycled would have been in conflict with the earlier result

Payoffs in Fictitious Play

- The question we deal with here: if FP learns the distributions, then it should asymptotically yield the same utility that would be achieved if the frequency distribution were known in advance.
- Here we will suppose more than 2 players; their assessments track the joint distribution of opponents' strategies.

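Payoffs in Fictitious Play: Notations

- Empirical joint distribution of the opponents' play
- Best payoff against the empirical distribution
- Time average of realized payoffs
- Definition
- Fictitious play is ε-consistent along a history if there is a T such that for any t ≥ T, the time average of each player's realized payoffs is no more than ε below his best payoff against the empirical distribution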


Consistency

- We don't look at how well the player does globally, but at how well he does in comparison to his expectations, which are built upon his beliefs.
- If FP were not consistent it would be a much less interesting model; it would be as if someone simply played a different game. So we want consistency.

Consistency

- Consistency can be useful: if, after some period of time, a player sees that his expectations are not fulfilled (by comparing the expected payoff with the actual payoff), he can deduce that something is wrong in his model of the world.

Consistency: main result

- Relate a player's frequency of switching strategies to the consistency of his play.
- We define the frequency of switches $\eta_t^i$ to be the fraction of periods up to t in which player i's action differs from his action in the previous period

Fictitious Play and BR dynamics

- Definition
- Fictitious play exhibits infrequent switches along a history if for every ε > 0 there is a T such that for any t ≥ T, $\eta_t^i \le \varepsilon$ for all i.
- Proposition (Fudenberg and Levine 94)
- If FP exhibits infrequent switches along a history, then it is ε-consistent along that history for every ε > 0
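The frequency of switches is easy to compute from an action history. In this sketch the convention of dividing by the total number of periods, and the function name, are assumptions made for illustration.

```python
def switch_frequency(actions):
    """Fraction of periods in which the action differs from the previous period's action."""
    if len(actions) < 2:
        return 0.0
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)
    return switches / len(actions)


rare = switch_frequency(list("HHHHHHHHTT"))      # one switch in ten periods
frequent = switch_frequency(list("HTHTHTHTHT"))  # a switch in almost every period
print(rare, frequent)
```

Infrequent switching in the sense of the definition means this number eventually stays below any given ε as the history grows.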

infrequent switches => ε-consistency

- Intuition
- Once the prior loses its influence, at each date t player i plays a BR to the empirical distribution (observations) through date t-1.
- On the other hand, if i is not doing on average as well as the best response to the empirical distribution, there must be a non-negligible fraction of dates t for which i is not playing a BR to the empirical distribution through date t; but in this case player i must switch at date t+1. Conversely, infrequent switches imply that most of the time i's date-t action is a best response to the empirical distribution at the end of date t.

Proof - notations

- k: length of the initial history (kept as prior belief)
- initial belief
- best response strategy of i to
- player i expected date-t payoff is

Proof - summary

- We showed that if there are not many switches along the player's history, his play is consistent.
- It means that the actual payoffs do not fall significantly below the expectation!
- Again, one can use this to see if something is wrong with his model of the environment.

Roadmap

- Crash introduction to the common models of learning
- Cournot adjustment
- Fictitious play and Nash equilibriums
- Motivation
- Definitions
- Results
- Generalizations of fictitious play

Generalizations of Fictitious Play

- Mainly generalize the update rule for the belief
- Example: exponential weighting
- As long as beliefs are asymptotically empirical, what we showed for FP holds.
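One way to picture exponential weighting, as a sketch only: old weights are multiplied by a discount factor before the new observation is added. The factor `beta`, the prior weights and the observations are hypothetical; `beta = 1` gives back the plain counting rule, while a fixed `beta` below 1 lets recent observations dominate, so whether the asymptotically-empirical condition holds has to be checked for the particular weighting chosen.

```python
def discounted_update(weights, observed, beta=0.9):
    """Discount all old weights by beta, then add 1 to the observed action."""
    new_weights = {action: beta * w for action, w in weights.items()}
    new_weights[observed] += 1.0
    return new_weights


def beliefs(weights):
    """Normalize weights into a probability vector."""
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}


observations = ["C", "C", "D"]
plain = {"C": 1.0, "D": 1.0}
recent = {"C": 1.0, "D": 1.0}
for obs in observations:
    plain = discounted_update(plain, obs, beta=1.0)    # ordinary fictitious play
    recent = discounted_update(recent, obs, beta=0.5)  # recent play counts more

print(beliefs(plain), beliefs(recent))
```

With the same three observations the plain rule still leans toward C, while the discounted rule already leans toward the most recent action D.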

Summary

- Dynamics of games and a flavor of analysis
- Although this is different from Nash analysis, Nash is still an important point; we showed that it is still a point FP can converge to
- FP is a consistent play
- Have a nice summer vacation, and thanks for listening.