Title: On the Convergence of Regret Minimizing Dynamics in Concave Games
1 On the Convergence of Regret Minimizing Dynamics in Concave Games
Uri Nadav, Tel Aviv University, Tel Aviv, Israel
Joint work with Eyal Even-Dar and Yishay Mansour
Microsoft Research, Cambridge, UK, March 26, 2009
2 Nash Equilibrium
- Nash equilibrium is a steady state of the game
- No player has an incentive to unilaterally deviate from his state
[Figure: two-player game matrix, Player I vs. Player II]
- Existence (pure strategy)
- Uniqueness
- Quality: Price of Anarchy/Stability
- Dynamics: reaching an equilibrium
3 Dynamics
[Figure: actions of Player 1 and Player 2 on Days 1 to 5]
4 Example Dynamics
- Best Response
- On each day, adjust to the other players' current actions
- Ignore the fact that they also adjust
[Figure: best-response actions of Player 1 and Player 2 on Days 1 to 5]
Unfortunately, best response does not always converge to an equilibrium
5 No External Regret
A procedure is without external regret if, for every sequence, the external regret is sublinear in T
- No single action significantly outperforms the dynamics
- Define regret in T time steps as
$\mathrm{Regret}_{\mathrm{Alg}}(T) = (\text{total cost of alg}) - (\text{total cost of best fixed row in hindsight})$
- Many (different) algorithms can guarantee this: [Hannan 57], [Blackwell 56], [Banos 68], [Megiddo 80], [Fudenberg, Levine 94], [Auer et al. 95]
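One concrete way to obtain sublinear external regret is the Hedge (multiplicative weights) rule. The sketch below is only an illustration and is not taken from the talk: the function names `hedge` and `demo`, the random cost table, and the learning rate `eta = sqrt(8 ln K / T)` are assumptions made here. With costs in [0, 1] and that learning rate, the expected regret is at most `sqrt(T ln K / 2)`.

```python
import math
import random


def hedge(costs, eta):
    """Run Hedge on a T x K table of costs in [0, 1].

    Returns (expected total cost of the algorithm, external regret).
    """
    num_actions = len(costs[0])
    cumulative = [0.0] * num_actions   # total cost of each fixed action so far
    alg_cost = 0.0
    for row in costs:
        # Weight each action by exp(-eta * past cost). Shifting by the minimum
        # keeps the exponentials well scaled and cancels in the normalisation.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected cost paid today under the mixed action.
        alg_cost += sum(p * c for p, c in zip(probs, row))
        cumulative = [s + c for s, c in zip(cumulative, row)]
    best_fixed = min(cumulative)       # best fixed action in hindsight
    return alg_cost, alg_cost - best_fixed


def demo(T=2000, K=3, seed=0):
    """Hypothetical experiment: i.i.d. uniform costs, tuned learning rate."""
    rng = random.Random(seed)
    costs = [[rng.random() for _ in range(K)] for _ in range(T)]
    eta = math.sqrt(8.0 * math.log(K) / T)
    return hedge(costs, eta)
```

Calling `demo()` returns the algorithm's expected total cost and its regret; dividing the regret by `T` gives the average regret, which shrinks as `T` grows.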
6 Our Main Result
- If each player uses a procedure without regret in some class of interesting games (socially concave games), then their joint play converges to Nash equilibrium
Selfish Routing
Resource Allocation
Cournot Oligopoly
TCP Congestion Control
7 Cournot Oligopoly [Cournot 1838]
- Firms select a production level (supply)
- Market price depends on total supply
- Firms maximize their profit = revenue - cost
[Figure: two firms with supplies X and Y and costs Cost1(X), Cost2(Y); market price P as a function of the overall quantity]
We will show that no-regret dynamics converge to NE for any number of players
- Best response dynamics
- Converges for 2 players
- Diverges for n ≥ 5 [Theocharis 1960]
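The contrast between 2 and 5 firms can be seen in a small simulation. This is a sketch under assumptions not stated on the slide: a linear inverse demand `P = a - b * total`, a constant marginal cost `c`, simultaneous best responses, and supplies clipped at zero; the function names and parameter values are hypothetical. With the clipping, the 5-firm dynamics end up oscillating rather than growing without bound, but they still never settle at the equilibrium.

```python
def best_response_step(x, a=10.0, b=1.0, c=1.0):
    """One simultaneous best-response step in a linear Cournot market.

    Firm i maximises x_i * (a - b * total) - c * x_i, so its best response to
    the others' supply S is max(0, (a - c - b * S) / (2 * b)).
    """
    total = sum(x)
    return [max(0.0, (a - c - b * (total - xi)) / (2.0 * b)) for xi in x]


def distance_after(n, days, a=10.0, b=1.0, c=1.0):
    """Largest distance from the symmetric equilibrium after `days` steps."""
    x_star = (a - c) / (b * (n + 1))   # equilibrium supply of each firm
    x = [x_star + 0.1] * n             # start slightly off equilibrium
    for _ in range(days):
        x = best_response_step(x, a, b, c)
    return max(abs(xi - x_star) for xi in x)
```

Comparing `distance_after(2, 50)` with `distance_after(5, 50)` shows the two regimes: the first shrinks toward zero, the second stays bounded away from it.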
8 Resource Allocation Games
We can show that the best response dynamics generally diverges for linear resource allocation games
- Equilibrium
- Existence, Uniqueness [Hajek, Gopalakrishnan]
- Efficiency Loss (POA) 3/4 [Johari, Tsitsiklis]
- Each advertiser wins a proportional market share
[Figure: bids of 5M, 10M, 17M and 25M; the 25M bid's allocated rate is $\frac{25}{5 + 10 + 17 + 25}$]
- Utility
- Concave utility from allocated rate
- Quasi-linear with money
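The proportional rule and the quasi-linear utility can be sketched in a few lines. The bids are the ones shown in the slide's figure; the function names, the unit capacity, and the default value function `math.log1p` are assumptions made here for illustration only (any concave value function could be passed in).

```python
import math


def proportional_shares(bids, capacity=1.0):
    """Split `capacity` among bidders in proportion to their bids."""
    total = sum(bids)
    if total <= 0:
        raise ValueError("at least one bid must be positive")
    return [capacity * b / total for b in bids]


def quasi_linear_utility(bids, i, value=math.log1p):
    """Concave value of bidder i's allocated rate minus bidder i's payment."""
    return value(proportional_shares(bids)[i]) - bids[i]


shares = proportional_shares([5, 10, 17, 25])
# The last bidder's share is 25 / (5 + 10 + 17 + 25) = 25 / 57.
```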
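9 Routing Games
- $\mathrm{Cost}_i = \sum_{p \in (s_i, t_i)} \mathrm{Latency}(p) \cdot \mathrm{flow}_i(p)$
[Figure: network with sources s1, s2 and sinks t1, t2; player 1 splits its flow f1 into f1,L and f1,R, player 2 splits its flow f2 into f2,T and f2,B; edge e carries f1,L and f2,T]
Latency on edge e: $L_e(f_{1,L} + f_{2,T})$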
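The cost formula can be evaluated on a toy instance. Everything concrete below is an assumption made for illustration, since the slide's figure is lost: each path is a single edge, paths L and T share edge `e`, path R uses a hypothetical edge `r`, path B a hypothetical edge `b`, and every edge has latency equal to its total load. The latency of a path is taken to be the sum of its edge latencies.

```python
def player_costs(flows, paths, latency):
    """Cost_i = sum over player i's paths of Latency(path) * flow_i(path).

    flows[i][p] is the flow player i sends on path p, paths[p] lists the edges
    of path p, and latency[e] maps the total load on edge e to its latency.
    """
    # Total load on each edge, summed over all players and paths using it.
    load = {}
    for player_flows in flows.values():
        for p, f in player_flows.items():
            for e in paths[p]:
                load[e] = load.get(e, 0.0) + f
    costs = {}
    for i, player_flows in flows.items():
        costs[i] = sum(
            f * sum(latency[e](load[e]) for e in paths[p])
            for p, f in player_flows.items()
        )
    return costs


# Hypothetical instance: paths L and T share edge "e".
paths = {"L": ["e"], "R": ["r"], "T": ["e"], "B": ["b"]}
latency = {"e": lambda f: f, "r": lambda f: f, "b": lambda f: f}
flows = {1: {"L": 0.5, "R": 0.5}, 2: {"T": 1.0, "B": 0.0}}
costs = player_costs(flows, paths, latency)
```

Here edge `e` carries 0.5 + 1.0 units, so player 1's flow on path L is slowed by player 2's flow on path T, which is the coupling expressed by $L_e(f_{1,L} + f_{2,T})$.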
10 Socially Concave Games
- Closed convex strategy set
- A (weighted) social welfare is concave: there exist $\lambda_1, \dots, \lambda_n > 0$ such that $\lambda_1 u_1(x) + \lambda_2 u_2(x) + \dots + \lambda_n u_n(x)$ is concave
- The utility of a player is convex in the vector of actions of the other players
Zero-sum games $\subset$ socially concave games
- Some socially concave games
- Subclasses of Cournot competition, Resource allocation, Selfish Routing, TCP congestion control (near equilibrium)
11 Our Main Result
If each player uses a procedure without regret in a socially concave game, then their joint play converges to Nash equilibrium
- The average action profile converges to NE
[Figure: actions of Players 1, ..., n on Days 1, ..., T; the average of days 1 to T is an $\epsilon(T)$-Nash equilibrium]
- The average daily payoff of each player converges to her payoff in NE
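The statement about the average action profile can be illustrated with a small simulation. This is a sketch under assumptions that are not part of the talk: a Cournot market with linear price `a - b * total` and constant marginal cost `c`, five firms, supplies restricted to `[0, cap]`, and every firm running projected online gradient ascent with step `scale / sqrt(t)` as its no-regret procedure. All names and parameter values are hypothetical.

```python
import math


def gradient_play(x0, days, a=10.0, b=1.0, c=1.0, cap=10.0, scale=0.1):
    """Every firm runs projected online gradient ascent on its own profit.

    Profit of firm i: x_i * (a - b * sum(x)) - c * x_i.
    Returns the average action profile over all days.
    """
    x = list(x0)
    running_sum = [0.0] * len(x)
    for t in range(1, days + 1):
        running_sum = [s + xi for s, xi in zip(running_sum, x)]  # record day t
        total = sum(x)
        step = scale / math.sqrt(t)
        # d(profit_i) / d(x_i) = a - c - b * total - b * x_i
        x = [min(cap, max(0.0, xi + step * (a - c - b * total - b * xi)))
             for xi in x]
    return [s / days for s in running_sum]


def epsilon_at(profile, a=10.0, b=1.0, c=1.0):
    """Largest gain any firm obtains by best-responding to `profile`."""
    def profit(xi, others):
        return xi * (a - b * (xi + others)) - c * xi

    gains = []
    for xi in profile:
        others = sum(profile) - xi
        best = max(0.0, (a - c - b * others) / (2.0 * b))
        gains.append(profit(best, others) - profit(xi, others))
    return max(gains)


average = gradient_play([0.5, 1.0, 2.0, 3.0, 4.0], days=5000)
```

With these parameters the symmetric equilibrium supply is `(a - c) / (b * (n + 1)) = 1.5` per firm; `epsilon_at(average)` measures how far the average profile is from being an exact equilibrium.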
12 Convergence to NE: Proof Outline
Definition of $\epsilon$-Nash equilibrium
- Goal: show that for every player, the utility from the average action profile equals, up to $\epsilon$, the utility of playing best response to the average
[Equation: compares the utility of player i at the average with the utility of i playing Best Response to the average]
13 Convergence to NE: Proof Outline
- Upper bound on the utility of the average action profile
[Equation: for each player i, relates the sum of utilities to the utility of the average action profile, by definition of Best Response]
14 Convergence to NE: Proof Outline
- Lower bound on the sum of average utilities
By assumption, there exist $\lambda_1, \dots, \lambda_n$ such that $\sum_i \lambda_i u_i$ is concave
[Equation: utility of average action profile $\ge$ (average) sum of utilities]
15 Convergence to NE: Proof Outline
[Equation: combines the Upper Bound and the Lower Bound with the Average Regret]
Q.E.D.
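The argument outlined on the proof slides can be written as one chain of inequalities. This is a reconstruction under assumptions, not a transcription of the slides' lost formulas. The notation is chosen here: $x^t$ is the action profile on day $t$, $\hat{x} = \frac{1}{T}\sum_t x^t$ is the average profile, $b_i$ is a best response of player $i$ to $\hat{x}_{-i}$, $\lambda_i > 0$ are weights for which $\sum_i \lambda_i u_i$ is concave, and $R_i(T) = \max_{y} \sum_t u_i(y, x^t_{-i}) - \sum_t u_i(x^t)$ is player $i$'s external regret measured in utility.

```latex
\begin{align*}
% Concavity of the weighted social welfare (Jensen's inequality)
\sum_i \lambda_i u_i(\hat{x})
  &\ge \frac{1}{T}\sum_{t=1}^{T}\sum_i \lambda_i u_i(x^t) \\
% No regret against the fixed action b_i, then convexity of u_i in x_{-i}
\frac{1}{T}\sum_{t=1}^{T} u_i(x^t)
  &\ge \frac{1}{T}\sum_{t=1}^{T} u_i(b_i, x^t_{-i}) - \frac{R_i(T)}{T}
   \ge u_i(b_i, \hat{x}_{-i}) - \frac{R_i(T)}{T} \\
% Combine the two lines, weighting the second by lambda_i and summing over i
\sum_i \lambda_i \bigl[ u_i(b_i, \hat{x}_{-i}) - u_i(\hat{x}) \bigr]
  &\le \frac{1}{T}\sum_i \lambda_i R_i(T)
\end{align*}
```

Each bracket in the last line is nonnegative by the definition of best response, so for every player $i$ the gain from deviating at $\hat{x}$ is at most $\frac{1}{\lambda_i T}\sum_j \lambda_j R_j(T)$, which vanishes as $T$ grows when every regret is sublinear in $T$.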
16 Convergence in Almost Socially Concave Games
- TCP game is a concave game [Karp, Koutsoupias, Papadimitriou, Shenker]
- And the weighted social welfare is concave
- But the utility of player i is not convex in the entire strategy space of the other players
Therefore, the convergence theorem cannot be directly applied
Playing gradient-based dynamics guarantees no regret in concave decision making [Zinkevich]
Playing gradient-based dynamics guarantees playing in a socially concave zone
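A minimal sketch of gradient-based dynamics for a single decision maker follows, in the cost-minimisation form. The setting is an assumption chosen for illustration: decisions in [0, 1], quadratic costs `(x - z_t)^2` with targets `z_t` in [0, 1], and step size `1 / sqrt(t)`. For a feasible set of diameter D and gradients bounded by G, Zinkevich's analysis bounds the regret of this rule by `D^2 * sqrt(T) / 2 + (sqrt(T) - 1/2) * G^2`; here D = 1 and G = 2.

```python
import math
import random


def online_gradient_descent(targets):
    """Projected gradient descent on costs c_t(x) = (x - z_t)^2 over [0, 1].

    Returns (total cost, regret against the best fixed point in hindsight).
    """
    x = 0.0
    total_cost = 0.0
    for t, z in enumerate(targets, start=1):
        total_cost += (x - z) ** 2        # pay the cost of the current point
        grad = 2.0 * (x - z)              # gradient revealed after playing
        x = min(1.0, max(0.0, x - grad / math.sqrt(t)))  # step, then project
    best = sum(targets) / len(targets)    # minimiser of the summed cost
    best_cost = sum((best - z) ** 2 for z in targets)
    return total_cost, total_cost - best_cost


def demo(T=2000, seed=0):
    """Hypothetical experiment: targets drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    return online_gradient_descent([rng.random() for _ in range(T)])
```

The regret returned by `demo()` can be compared with the bound above; the bound grows like `sqrt(T)`, so the average regret goes to zero.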
17 Regret Minimization and Equilibrium
- Zero sum game
- Guarantee at least min-max value
- Correlated equilibrium
- Internal/Swap regret dynamics converge to it [Foster, Vohra], [Hart, Mas-Colell], [Blum, Mansour]
- Specific games
- Routing [Blum, Even-Dar, Ligett]
- Price of Total Anarchy [Blum, Ligett, Hajiaghayi, Roth]
18 Ongoing Research I
- We studied the allocation of a single link
- Extend to general resource allocation games
- A set of resources
- Players buy a path (subset of resources)
- Resource allocation in parallel edges is socially concave
- An equilibrium does not necessarily exist in general networks
- Always exists in the [Johari, Tsitsiklis] extended game
- Not socially concave
19 Ongoing Research II
- Resource Allocation Game
- Players act as price anticipators
- Resource Allocation Market
- Players act as price takers
- Efficient competitive equilibrium exists (price bids) [Kelly]
- Continuous time algorithms converge to equilibrium [Kelly et al.]
- No regret
- Players have no regret if they believe that they don't influence the market price
- Simulation results: fast convergence to market equilibrium
20 Other stuff I work on
21 Thank you!
22 TCP Congestion Control [KKPS 01]
Fraction of good-put determined by router policy
User action: push flow $f_i$
[Figure: channel receiving flow $f_i$, which is split into good-put $g_i$ and loss $l_i$]
good-put $g_i$: fraction of $f_i$ forwarded
loss $l_i$: fraction of $f_i$ discarded
User utility: $u_i = g_i - \alpha_i l_i$
$\alpha_i$: cost associated with lost flow (retransmission, lost bandwidth, utilization)
23 Router Policy
- Random Early Discard (RED)
- Number of dropped packets increases as queue grows
- Tail Drop
- Drop an incoming packet when out of space
Amount of flow to discard depends on the total
amount of flow
24 Resource Allocation Games
We can show that the best response dynamics generally diverges for linear resource allocation games
- Equilibrium
- Existence, Uniqueness [Hajek, Gopalakrishnan]
- Efficiency Loss (POA) 3/4 [Johari, Tsitsiklis]
- Users choose payment per unit time
- Users are allocated rate proportionally
[Figure: payments of 5M, 10M, 17M and 25M; the 25M payment's allocated rate is $\frac{25}{5 + 10 + 17 + 25}$]
- Utility
- Concave utility from allocated rate
- Quasi-linear with money