Decision Theory and Game Theory - PowerPoint PPT Presentation

Title: Lecture 6: MultiAgent Interactions
Subject: Introduction to MultiAgent Systems
Author: Jeff Rosenschein
Last modified by: yzhang

Slides: 36

Provided by: JeffRo162

Learn more at: http://www.cs.trinity.edu

1
Decision Theory and Game Theory

An Introduction to MultiAgent Systems: http://www.csc.liv.ac.uk/~mjw/pubs/imas
Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations: http://www.masfoundations.org/resources.html

2
Decision Theory

Probability
Self-interested agents
Utilities and Preferences
Rationality

3
Decision Theory

Decision theory can be viewed as a game between an agent and nature.
Examples: lotteries, slot machines.

4
Probability

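Xi is a variable that captures some aspect of the current state of the environment.
xi is a possible value of Xi.
P(Xi = xi) is the probability that Xi takes the value xi.
P(Xi = xi) ∈ [0, 1].
Σ P(Xi = xi) = 1, summing over all possible values xi.
Joint probability: P(X1 = x1, X2 = x2) = P(X1 = x1) P(X2 = x2) if X1 and X2 are independent.
If not: P(X1 = x1, X2 = x2) = P(X1 = x1 | X2 = x2) P(X2 = x2).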
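The two ways of computing a joint probability can be checked numerically. The following is a minimal Python sketch; the variables (a coin, the weather, an umbrella) and all probabilities are made-up illustration values, not taken from the lecture.

```python
from itertools import product

# Hypothetical marginal distributions (illustration values only).
p_coin = {"heads": 0.5, "tails": 0.5}      # X1 in the independent case
p_weather = {"sunny": 0.7, "rainy": 0.3}   # X2 in both cases

# Independent case: P(x1, x2) = P(x1) * P(x2).
joint_independent = {
    (c, w): p_coin[c] * p_weather[w] for c, w in product(p_coin, p_weather)
}

# Dependent case: P(x1, x2) = P(x1 | x2) * P(x2).
# Hypothetical conditional distribution of X1 (umbrella) given X2 (weather).
p_umbrella_given_weather = {
    "sunny": {"umbrella": 0.1, "none": 0.9},
    "rainy": {"umbrella": 0.8, "none": 0.2},
}
joint_dependent = {
    (u, w): p_umbrella_given_weather[w][u] * p_weather[w]
    for w in p_weather
    for u in p_umbrella_given_weather[w]
}

def total(dist):
    """Sum of all probabilities in a distribution; should be 1."""
    return sum(dist.values())

print(total(joint_independent), total(joint_dependent))
print(joint_dependent[("umbrella", "rainy")])
```

In both cases the joint probabilities sum to 1 (up to floating-point rounding), as any probability distribution must.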

5
Self-Interested Agents

What does it mean to say that an agent is self-interested?
Not that it wants to harm other agents
Not that it only cares about things that benefit itself
Rather, that the agent has its own description of which states of the environment it prefers, and that its actions are motivated by this description

6
State-Action Diagram

Set of agents
Each agent has a set of actions
Actions define outcomes
For each possible set of actions there is an
outcome.
Outcomes define payoffs
Agents derive utility from different outcomes

[Figure: state-action diagram; payoff labels 5, 10, -8]

7
Utilities and Preferences

Assume we have just two agents: Ag = {i, j}
Assume W = {w1, w2, ...} is the set of outcomes that agents have preferences over
We capture preferences by utility functions: ui : W → ℝ, uj : W → ℝ
Utility functions lead to preference orderings over outcomes:
w ≥i w' means ui(w) ≥ ui(w') (i prefers w to w')
w >i w' means ui(w) > ui(w') (i strictly prefers w to w')
w ~i w' means w ≥i w' and w' ≥i w (i is indifferent)

8
What is Utility?

Utility is not money (but it is a useful analogy)
Typical relationship between utility and money: [figure]

9
Rationality

Agent attempts to maximize its expected utility.

Lottery 1: 0.5M w.p. 1
Lottery 2: 1M w.p. 0.5, 0 w.p. 0.5
The agent's strategy is the choice of lottery
Risk aversion => insurance companies
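The choice between the two lotteries can be worked through in code. This is a minimal Python sketch: the lottery payoffs come from the slide (in millions), while the square-root utility function is an assumed stand-in for a risk-averse agent, not something the lecture specifies.

```python
import math

# Each lottery is a list of (payoff, probability) pairs; payoffs in millions.
lottery_1 = [(0.5, 1.0)]
lottery_2 = [(1.0, 0.5), (0.0, 0.5)]

def expected_utility(lottery, utility):
    """Probability-weighted sum of the utility of each payoff."""
    return sum(p * utility(x) for x, p in lottery)

def risk_neutral(x):
    # Utility is linear in money.
    return x

def risk_averse(x):
    # Assumed concave utility (illustration only).
    return math.sqrt(x)

for name, u in [("risk neutral", risk_neutral), ("risk averse", risk_averse)]:
    print(name, expected_utility(lottery_1, u), expected_utility(lottery_2, u))
```

With linear utility both lotteries have expected utility 0.5, so the agent is indifferent. With the concave utility, lottery 1 scores about 0.707 against 0.5 for lottery 2, so the risk-averse agent takes the sure payoff, which is the behavior insurance relies on.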

10
Game Theory

What is a game
Strategies
Dominant strategies
Nash equilibrium
Pareto optimality
Competition games
Cooperation games
Coordination games
Axelrod's tournament
Bounded rationality

11
What is a Game

Game: a formal representation of a situation of strategic interdependence
We focus on games where:
There are 2 or more players.
There is some choice of action where strategy matters.
The game has one or more outcomes, e.g. someone wins, someone loses.
The outcome depends on the strategies chosen by all players: there is strategic interaction.
What does this rule out?
Games of pure chance, e.g. lotteries, slot machines. (Strategies don't matter.)
Games without strategic interaction between players, e.g. Solitaire.

12
Strategies

Strategy
A strategy, si, is a comprehensive plan of action: it defines the actions agent i should take for all possible states of the world
Prisoner's Dilemma: Defect, Confess
Strategy profile: s = (s1, ..., sn)
s-i = (s1, ..., si-1, si+1, ..., sn)
Utility function: ui(si, s-i)
Note that the utility of an agent depends on the strategy profile, not just its own strategy
We assume agents are expected utility maximizers

13
Three Elements of a Game

1. The players
how many players are there? (here: two players, two actions)
does nature/chance play a role?
2. A complete description of the strategies of each player
pure strategies
mixed strategies (chosen with some probability)
3. A description of the consequences (payoffs) for each player for every possible profile of strategy choices of all players.

14
Assumptions Game Theorists Make

Payoffs are known and fixed. People treat expected payoffs the same as certain payoffs (they are risk neutral).
Example: a risk-neutral person is indifferent between $25 for certain and a 25% chance of earning $100 with a 75% chance of earning $0.
We can relax this assumption to capture risk-averse behavior.
All players behave rationally.
They understand and seek to maximize their own payoffs.
They are flawless in calculating which actions will maximize their payoffs.
The rules of the game are common knowledge.
Each player knows the set of players, strategies and payoffs from all possible combinations of strategies.

15
Multi-Agent Encounters

We need a model of the environment in which these agents will act:
agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in W will result
the actual outcome depends on the combination of actions
assume each agent has just two possible actions that it can perform: C (cooperate) and D (defect)
Environment behavior is given by a state transformer function that maps each pair of actions (one per agent) to an outcome in W

16
Multi-Agent Encounters

Here is a state transformer function. (This environment is sensitive to the actions of both agents.)
Here is another. (Neither agent has any influence in this environment.)
And here is another. (This environment is controlled by j.)
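The three kinds of environment can be illustrated with small lookup tables. The sketch below is a reconstruction under assumptions: the transcript does not preserve the lecture's actual functions, so the outcome labels w1..w4 and the three tables are hypothetical examples chosen to match the three descriptions.

```python
from itertools import product

ACTIONS = ("C", "D")

# Keys are (action of i, action of j); values are hypothetical outcome labels.
tau_both = {("D", "D"): "w1", ("D", "C"): "w2", ("C", "D"): "w3", ("C", "C"): "w4"}
tau_none = {pair: "w1" for pair in product(ACTIONS, ACTIONS)}
tau_j_only = {("D", "D"): "w1", ("D", "C"): "w2", ("C", "D"): "w1", ("C", "C"): "w2"}

def influences(tau, agent):
    """True if the agent (0 = i, 1 = j) can change the outcome by changing
    its own action while the other agent's action stays fixed."""
    for other in ACTIONS:
        outcomes = set()
        for own in ACTIONS:
            pair = (own, other) if agent == 0 else (other, own)
            outcomes.add(tau[pair])
        if len(outcomes) > 1:
            return True
    return False

for name, tau in [("both", tau_both), ("none", tau_none), ("j only", tau_j_only)]:
    print(name, influences(tau, 0), influences(tau, 1))
```

In the first table both agents influence the outcome, in the second neither does, and in the third only j does.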

17
Rational Action

Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows:
With a bit of abuse of notation:
Then agent i's preferences are:
C is the rational choice for i. (Because i prefers all outcomes that arise through C over all outcomes that arise through D.)

18
Normal Form

We can characterize the previous scenario in a payoff matrix
Agent i is the column player
Agent j is the row player
Normal form is a way of describing a game: it represents the game as a matrix.
This representation is useful for identifying dominated strategies and Nash equilibria.

19
Dominant Strategies

Given any particular strategy (either C or D) of agent i, there will be a number of possible outcomes
We say s1 dominates s2 if every outcome possible by i playing s1 is preferred over every outcome possible by i playing s2
A rational agent will never play a dominated strategy
So in deciding what to do, we can delete dominated strategies
Unfortunately, there isn't always a unique undominated strategy
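The dominance test as defined on this slide can be sketched in a few lines of Python. Two assumptions are made here: "preferred" is read as strictly higher utility, and the first utility table is a hypothetical example in which i always does better by cooperating. The second table uses the prisoner's dilemma payoffs 4, 3, 2, 1 that appear later in the lecture.

```python
ACTIONS = ("C", "D")

# Utilities for agent i, keyed by (action of i, action of j).
u_example = {("C", "C"): 4, ("C", "D"): 4, ("D", "C"): 1, ("D", "D"): 1}  # hypothetical
u_pd = {("C", "C"): 3, ("C", "D"): 1, ("D", "C"): 4, ("D", "D"): 2}

def payoffs_of(own_action, utility):
    """All payoffs i can end up with when playing own_action."""
    return [utility[(own_action, other)] for other in ACTIONS]

def dominates(s1, s2, utility):
    """Slide's definition: every outcome of s1 beats every outcome of s2."""
    return min(payoffs_of(s1, utility)) > max(payoffs_of(s2, utility))

def undominated(utility):
    """Strategies that no other strategy dominates."""
    return [
        s for s in ACTIONS
        if not any(dominates(t, s, utility) for t in ACTIONS if t != s)
    ]

print(undominated(u_example))
print(undominated(u_pd))
```

In the hypothetical table only C survives. In the prisoner's dilemma table neither strategy dominates the other under this strong definition, so both remain, which is the situation the last bullet describes.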

20
Nash Equilibrium

In general, we will say that two strategies s1 and s2 are in Nash equilibrium if:
under the assumption that agent i plays s1, agent j can do no better than play s2; and
under the assumption that agent j plays s2, agent i can do no better than play s1.
Neither agent has any incentive to deviate from a Nash equilibrium
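The two conditions translate directly into a brute-force search over strategy pairs. The following Python sketch checks every pair in a two-action game; the utility tables are hypothetical illustration values, and only pure strategies are considered.

```python
from itertools import product

ACTIONS = ("C", "D")

def pure_nash_equilibria(u_i, u_j):
    """Return all (s_i, s_j) where neither agent gains by deviating alone.
    Utility tables are keyed by (action of i, action of j)."""
    equilibria = []
    for s_i, s_j in product(ACTIONS, ACTIONS):
        # i cannot do better given that j plays s_j.
        i_ok = all(u_i[(s_i, s_j)] >= u_i[(alt, s_j)] for alt in ACTIONS)
        # j cannot do better given that i plays s_i.
        j_ok = all(u_j[(s_i, s_j)] >= u_j[(s_i, alt)] for alt in ACTIONS)
        if i_ok and j_ok:
            equilibria.append((s_i, s_j))
    return equilibria

# Hypothetical payoffs (illustration only).
u_i = {("C", "C"): 3, ("C", "D"): 2, ("D", "C"): 4, ("D", "D"): 1}
u_j = {("C", "C"): 3, ("C", "D"): 4, ("D", "C"): 2, ("D", "D"): 1}

print(pure_nash_equilibria(u_i, u_j))
```

This example game has two pure equilibria, (C, D) and (D, C), so the search also shows that a Nash equilibrium need not be unique.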

21
Nash Equilibrium

Interpretations:
Focal points, self-enforcing agreements, stable social convention, consequence of rational inference.
Criticisms:
They may not be unique (Bach or Stravinsky)
Ways of overcoming this: refinements of the equilibrium concept, mediation, learning
They do not exist in all games (in the form defined)
They may be hard to find
People don't always behave based on what equilibria would predict (ultimatum games and notions of fairness, ...)

22
Pareto Optimality

Sometimes, one outcome O is at least as good for every agent as another outcome O', and there is some agent who strictly prefers O to O'
In this case, it seems reasonable to say that O is better than O'
We say that O Pareto-dominates O'
An outcome O is Pareto-optimal (or Pareto-efficient) if there is no other outcome that Pareto-dominates it.
Pareto optimality is implied by social welfare maximization: an outcome that maximizes the sum of utilities is Pareto-optimal.
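Pareto dominance is a pairwise comparison of utility vectors, so the Pareto-optimal set can be found by filtering. A minimal Python sketch, using hypothetical outcomes and payoffs (one utility per agent):

```python
def pareto_dominates(a, b):
    """a and b are tuples of utilities, one entry per agent.
    a dominates b if it is at least as good for everyone and
    strictly better for at least one agent."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_optimal(outcomes):
    """Names of outcomes that no other outcome Pareto-dominates."""
    return {
        name
        for name, payoff in outcomes.items()
        if not any(pareto_dominates(other, payoff) for other in outcomes.values())
    }

# Hypothetical outcomes: (utility for i, utility for j).
outcomes = {"w1": (2, 2), "w2": (1, 4), "w3": (4, 1), "w4": (3, 3)}

print(sorted(pareto_optimal(outcomes)))
```

Here w1 is dominated by w4, while w2, w3 and w4 are all Pareto-optimal. The outcome w4 also has the largest sum of utilities, consistent with the last bullet above.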

23
Competition Games

Where preferences of agents are diametrically opposed we have strictly competitive scenarios
Competition Games
Players have exactly opposed interests
There must be precisely two players (otherwise they can't have exactly opposed interests)
There is some constant C such that, for every strategy profile s ∈ S, ui(s) + uj(s) = C
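The constant-sum condition can be checked mechanically by summing the two utilities for every strategy profile. The Python sketch below uses a hypothetical game in which two agents split 10 units, plus the Matching Pennies payoffs (1 and -1) from the lecture's zero-sum example.

```python
def constant_sum(u_i, u_j):
    """Return the constant C if u_i(s) + u_j(s) is the same for every
    strategy profile s, otherwise None."""
    totals = {u_i[s] + u_j[s] for s in u_i}
    return totals.pop() if len(totals) == 1 else None

# Hypothetical game: the two agents always split 10 units between them.
split_i = {("a", "a"): 7, ("a", "b"): 4, ("b", "a"): 5, ("b", "b"): 6}
split_j = {s: 10 - v for s, v in split_i.items()}

# Matching Pennies: one player's gain is the other's loss.
pennies_i = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
pennies_j = {s: -v for s, v in pennies_i.items()}

print(constant_sum(split_i, split_j))      # constant-sum with C = 10
print(constant_sum(pennies_i, pennies_j))  # zero-sum: C = 0
```

Note that the function returns 0 for a zero-sum game, so callers should compare the result against None rather than test it for truthiness.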

24
Zero-Sum Games

Zero-sum games are those where utilities sum to zero:
ui(s) + uj(s) = 0
E.g. Matching Pennies (a zero-sum game):

              i
          Head     Tail
j  Head   1, -1    -1, 1
   Tail   -1, 1    1, -1

25
Cooperation Game

Players have exactly the same interests.
No conflict: all players want the same things
∀s ∈ S, ∀i, j: ui(s) = uj(s)

            i
          A       B
j  A    1, 1    0, 0
   B    0, 0    1, 1

Two Nash equilibria: (A, A) and (B, B). Both are also Pareto optimal.

26
Coordination Game - The Prisoner's Dilemma

The Prisoner's Dilemma is any game where T > R > P > S.

        D       C
D     P, P    S, T
C     T, S    R, R

(each cell: column player's payoff, row player's payoff)

27
Prisoner's Dilemma

Two people are arrested for a crime. If neither suspect confesses, both are released. If both confess, they both get sent to jail. If one confesses and the other does not, the confessor gets a light sentence and the other gets a heavy sentence.

Remain silent = Cooperate (C); Confess = Defect (D)

        D       C
D     P, P    T, S
C     S, T    R, R

(each cell: row player's payoff, column player's payoff)

28
The Prisoner's Dilemma

Payoff matrix for the prisoner's dilemma:
Top left: If both defect, then both get punishment for mutual defection
Top right: If i cooperates and j defects, i gets the sucker's payoff of 1, while j gets 4
Bottom left: If j cooperates and i defects, j gets the sucker's payoff of 1, while i gets 4
Bottom right: Reward for mutual cooperation

29
The Prisoner's Dilemma

The individually rational action is to defect. This guarantees a payoff of no worse than 2, whereas cooperating guarantees only a payoff of no worse than 1
So defection is the best response to all possible strategies: both agents defect, and get payoff 2
But intuition says this is not the best outcome. Surely they should both cooperate and each get payoff of 3!
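The argument for defection can be verified with the payoffs quoted in these slides (4 for defecting against a cooperator, 3 for mutual cooperation, 2 for mutual defection, 1 for the sucker's payoff). A minimal Python sketch; the function names are illustrative choices.

```python
ACTIONS = ("C", "D")

# payoff[(my_action, other_action)] is my own payoff.
payoff = {("D", "C"): 4, ("C", "C"): 3, ("D", "D"): 2, ("C", "D"): 1}

def worst_case(action):
    """The payoff this action guarantees, whatever the other agent does."""
    return min(payoff[(action, other)] for other in ACTIONS)

def best_response(other_action):
    """My payoff-maximizing action against a fixed action of the other agent."""
    return max(ACTIONS, key=lambda a: payoff[(a, other_action)])

print(worst_case("D"), worst_case("C"))
print(best_response("C"), best_response("D"))
```

Defecting guarantees 2 and cooperating guarantees only 1, and D is the best response to both C and D. Mutual defection therefore yields 2 each, even though mutual cooperation would yield 3 each.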

30
The Prisoner's Dilemma

This apparent paradox is the fundamental problem of multi-agent interactions. It appears to imply that cooperation will not occur in societies of self-interested agents.
Real world examples:
Nuclear arms reduction (why don't I keep mine...)
Free rider systems: public transport
In the UK: television licenses

31
Axelrod's Tournament

Suppose you play the iterated prisoner's dilemma against a range of opponents. What strategy should you choose, so as to maximize your overall payoff?
Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner's dilemma

32
Strategies in Axelrod's Tournament

ALLD
Always defect: the hawk strategy
TIT-FOR-TAT
On round u = 0, cooperate
On round u > 0, do what your opponent did on round u - 1
TESTER
On 1st round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation and defection.
JOSS
As TIT-FOR-TAT, except periodically defect
33
Recipes for Success in Axelrod's Tournament

Axelrod suggests the following rules for succeeding in his tournament:
Don't be envious: don't play as if it were zero-sum!
Be nice: start by cooperating, and reciprocate cooperation
Retaliate appropriately: always punish defection immediately, but use measured force, don't overdo it
Don't hold grudges: always reciprocate cooperation immediately

34
Game of Chicken

Consider another type of encounter: the game of chicken. (Think of James Dean in Rebel Without a Cause: swerving = cooperate, driving straight = defect.)
Strategies (C, D) and (D, C) are in Nash equilibrium
Difference from the prisoner's dilemma: mutual defection is the most feared outcome. (Whereas the sucker's payoff is most feared in the prisoner's dilemma.)
It refers to a situation in which there is competition for a shared resource and the contestants can choose either conciliation or conflict.

35
Bounded Rationality

According to Herbert Simon, perfectly rational decisions are often not feasible in practice because of the finite computational resources available for making them.
Game theory assumes that it is possible to characterize an agent's preferences with respect to possible outcomes. Humans, however, find it extremely hard to consistently define their preferences over outcomes.
Most game-theoretic negotiation techniques tend to assume the availability of unlimited computational resources to find an optimal solution: the underlying problems have the characteristics of NP-hard problems.