Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs - PowerPoint PPT Presentation

About This Presentation


Description:

solveGame($\Theta^0$, $p(\Theta^0)$). Make observation: $h_i = obs_i^t \cup a_i^{t-1} \cup h_i$. Determine type: $\theta_i^t$ = bestMatch($h_i$, ... Execute action: $a_i^t = \sigma_i^t(\theta_i^t)$. Propagate forward: $\Theta^{t+1}$, $p(\Theta^{t+1})$ ...


Slides: 35

Provided by: rosemar72

Learn more at: http://www.cs.cmu.edu


Transcript and Presenter's Notes


1
Approximate Solutions for Partially Observable
Stochastic Games with Common Payoffs

Rosemary Emery-Montemerlo
joint work with
Geoff Gordon, Jeff Schneider and Sebastian Thrun
July 21, 2004 AAMAS 2004

2
Robot Teams
3
Robot Teams

With limited communication, existing paradigms for decentralized robot control are not sufficient
Game-theoretic methods are necessary for multi-robot coordination under these conditions

4
Decentralized Decision Making
5
Decentralized Decision Making
6
Decentralized Decision Making
7
Decentralized Decision Making
8
Decentralized Decision Making
9
Decentralized Decision Making
10
Decentralized Decision Making
11
Decentralized Decision Making
12
Decentralized Decision Making
13
Decentralized Decision Making

A robot cannot choose actions based only on the joint observations consistent with its own sensor readings

It must consider all joint observations that are consistent with its possible sensor readings
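A toy illustration of this point, assuming two robots whose only readings are "none" and "intruder" (hypothetical labels, not from the slides). A robot's own reading narrows the joint observation down to a set but never to a single element, because its teammate's reading is hidden from it.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical per-robot reading sets (Z_1 and Z_2 in POSG notation).
Z1 = ["none", "intruder"]
Z2 = ["none", "intruder"]


def consistent_joint_obs(
    own_reading: str, own_index: int, obs_sets: Sequence[Sequence[str]]
) -> List[Tuple[str, ...]]:
    """All joint observations whose entry for this robot equals its own reading."""
    return [z for z in product(*obs_sets) if z[own_index] == own_reading]


# Robot 1 reads "none"; robot 2 may have read either value.
print(consistent_joint_obs("none", 0, [Z1, Z2]))
# [('none', 'none'), ('none', 'intruder')]

# Robot 2 plans without knowing robot 1's reading, so robot 1's plan has to
# cover every reading in Z1, not just the one it actually received.
for reading in Z1:
    print(reading, consistent_joint_obs(reading, 0, [Z1, Z2]))
```

This is why, later in the talk, strategies are defined over every possible type of an agent rather than only the one it ends up in.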

14
Relationship Between Decision Theoretic Models
[Figure: the progression MDP, POMDP, ?, where the MDP plans over the state space, the POMDP over the belief space, and the open (?) multi-agent case over a distribution over belief space]
15
Models of Multi-Agent Systems

Partially observable stochastic games
Generalization of stochastic games to partially observable worlds
Related models
DEC-POMDP (Bernstein et al., 2000)
MTDP (Pynadath and Tambe, 2002)
I-POMDP (Gmytrasiewicz and Doshi, 2004)
POIPSG (Peshkin et al., 2000)

16
Partially Observable Stochastic Games

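POSG = $\langle I, S, A, Z, T, R, O \rangle$
$I$ is the set of agents, $I = \{1, \ldots, n\}$
$S$ is the set of states
$A$ is the set of joint actions, $A = A_1 \times \cdots \times A_n$
$Z$ is the set of joint observations, $Z = Z_1 \times \cdots \times Z_n$
$T$ is the transition function, $T : S \times A \rightarrow S$
$R$ is the reward function, $R : S \times A \rightarrow \mathbb{R}$
$O$ are the observation emission probabilities, $O : S \times Z \times A \rightarrow [0, 1]$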
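The tuple above can be held in a plain container. This is a minimal sketch, not the authors' code: the class and field names are hypothetical, and it assumes $T$ returns a distribution over next states (the slide writes $T : S \times A \rightarrow S$, but transitions are stochastic). The joint sets $A$ and $Z$ are built as Cartesian products of the per-agent sets.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

State = str
JointAction = Tuple[str, ...]
JointObs = Tuple[str, ...]


@dataclass
class POSG:
    """Plain container for the tuple <I, S, A, Z, T, R, O> (illustrative only)."""

    agents: List[int]                 # I = {1, ..., n}
    states: List[State]               # S
    action_sets: List[List[str]]      # A_1, ..., A_n
    obs_sets: List[List[str]]         # Z_1, ..., Z_n
    # Assumption: T returns a distribution over next states.
    transition: Callable[[State, JointAction], Dict[State, float]]
    reward: Callable[[State, JointAction], float]              # R: S x A -> reals
    obs_prob: Callable[[State, JointObs, JointAction], float]  # O: S x Z x A -> [0, 1]

    def joint_actions(self) -> List[JointAction]:
        # A = A_1 x ... x A_n
        return list(product(*self.action_sets))

    def joint_observations(self) -> List[JointObs]:
        # Z = Z_1 x ... x Z_n
        return list(product(*self.obs_sets))


# Hypothetical two-robot toy instance.
game = POSG(
    agents=[1, 2],
    states=["s0", "s1"],
    action_sets=[["left", "right"], ["left", "right"]],
    obs_sets=[["none", "seen"], ["none", "seen"]],
    transition=lambda s, a: {"s0": 0.5, "s1": 0.5},
    reward=lambda s, a: 1.0 if s == "s1" else 0.0,
    obs_prob=lambda s, z, a: 0.25,  # uniform over the 4 joint observations
)
print(len(game.joint_actions()), len(game.joint_observations()))  # 4 4
```

Even this toy shows why exact solution blows up: the joint sets grow as the product of the per-agent sets, and policies must range over histories of them.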

17
Solving POSGs

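Solving POSGs exactly is computationally infeasible for all but very small problems; even the finite-horizon DEC-POMDP is NEXP-complete (Bernstein et al., 2000)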

18
Solving POSGs

We can approximate a POSG as a series of smaller Bayesian games

[Figure: the full POSG compared with the one-step lookahead game at time t, which is a Bayesian game]
19
Bayesian Games

Private information relevant to the game
Uncertainty in utility
Type
Encapsulates private information
We limit ourselves to games with a finite number of types
In the robot example:
Type 1: the robot doesn't see anything
Type 2: the robot sees an intruder at location x

20
Bayesian Games

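BG = $\langle I, \Theta, A, p(\Theta), u \rangle$
$\Theta$ is the joint type space, $\Theta = \Theta_1 \times \cdots \times \Theta_n$
$\theta$ is a specific joint type, $\theta = (\theta_1, \ldots, \theta_n)$
$p(\Theta)$ is the common prior on the distribution over $\Theta$
$u$ is the utility function, $u = (u_1, \ldots, u_n)$
$u_i(a_i, a_{-i}, (\theta_i, \theta_{-i}))$
$\sigma_i$ is a strategy for player $i$
It defines what player $i$ does for each of its possible types
Actions are individual actions, not joint actions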
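A minimal sketch of these definitions for the two-type robot example, assuming a common payoff (every $u_i$ equal). The prior, the action names `stay` and `go_x`, and the utility values are hypothetical. Each strategy $\sigma_i$ is a dictionary from the agent's own type to an individual action, and the joint action is assembled agent by agent.

```python
from typing import Callable, Dict, List, Tuple

JointType = Tuple[str, ...]
JointAction = Tuple[str, ...]
Strategy = Dict[str, str]  # sigma_i: own type theta_i -> individual action a_i


def expected_utility(
    prior: Dict[JointType, float],
    utility: Callable[[JointAction, JointType], float],
    strategies: List[Strategy],
) -> float:
    """Expected common payoff of the joint strategy (sigma_1, ..., sigma_n).

    Each agent looks up its action from its own type only, so the joint
    action is assembled agent by agent rather than chosen jointly.
    """
    total = 0.0
    for theta, p in prior.items():
        joint_action = tuple(sigma[t] for sigma, t in zip(strategies, theta))
        total += p * utility(joint_action, theta)
    return total


# Hypothetical types: "none" = sees nothing, "x" = sees an intruder at x.
prior = {("none", "none"): 0.5, ("x", "none"): 0.25, ("none", "x"): 0.25}


def intruder_utility(a: JointAction, theta: JointType) -> float:
    if "x" in theta:  # intruder present: someone must go to x
        return 1.0 if "go_x" in a else 0.0
    return 1.0 if all(ai == "stay" for ai in a) else 0.0  # no intruder: stay put


react = {"none": "stay", "x": "go_x"}
always_stay = {"none": "stay", "x": "stay"}
print(expected_utility(prior, intruder_utility, [react, react]))              # 1.0
print(expected_utility(prior, intruder_utility, [always_stay, always_stay]))  # 0.5
```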

21
Bayesian-Nash Equilibrium

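Set of best-response strategies
Each agent tries to maximize its expected utility conditioned on its probability distribution over the other agents' types, $p(\Theta)$
Each agent $i$ has a policy $\sigma_i$ that, given $\sigma_{-i}$, maximizes $\sum_{\theta_{-i}} p(\theta_{-i} \mid \theta_i)\, u_i(\sigma_i(\theta_i), \sigma_{-i}(\theta_{-i}), (\theta_i, \theta_{-i}))$ for each of its types $\theta_i$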
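One way to find a Bayes-Nash equilibrium in a small common-payoff game is to let agents take turns best-responding until no one can strictly improve. This is a swapped-in technique for illustration, not necessarily how the authors solve each game. In a common-payoff game the optimal joint strategy is always an equilibrium, but alternating best responses can stop at a worse one. The toy prior and utility are the same hypothetical robot example as before; all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

JointType = Tuple[str, ...]
Strategy = Dict[str, str]  # sigma_i: own type -> individual action


def expected_utility(prior: Dict[JointType, float], utility, strategies: Sequence[Strategy]) -> float:
    # Sum over joint types theta of p(theta) * u(sigma(theta), theta).
    return sum(
        p * utility(tuple(s[t] for s, t in zip(strategies, theta)), theta)
        for theta, p in prior.items()
    )


def best_response(i: int, prior, utility, strategies: List[Strategy],
                  types_i: List[str], actions_i: List[str]) -> Strategy:
    """Best response of agent i to sigma_{-i}, chosen type by type.

    Changing sigma_i(t) only affects joint types whose i-th entry is t,
    so optimizing each type separately yields a full best response.
    """
    new = dict(strategies[i])
    for t in types_i:
        def value(a: str) -> float:
            trial = list(strategies)
            trial[i] = {**new, t: a}
            return expected_utility(prior, utility, trial)
        new[t] = max(actions_i, key=value)
    return new


def alternating_best_response(prior, utility, type_sets, action_sets, init, max_iters=100):
    """Agents take turns best-responding until nobody can strictly improve."""
    strategies = [dict(s) for s in init]
    for _ in range(max_iters):
        changed = False
        for i in range(len(strategies)):
            br = best_response(i, prior, utility, strategies, type_sets[i], action_sets[i])
            trial = strategies[:i] + [br] + strategies[i + 1:]
            if expected_utility(prior, utility, trial) > expected_utility(prior, utility, strategies) + 1e-12:
                strategies = trial
                changed = True
        if not changed:  # every sigma_i is a best response to sigma_{-i}
            return strategies
    return strategies


prior = {("none", "none"): 0.5, ("x", "none"): 0.25, ("none", "x"): 0.25}


def intruder_utility(a, theta):
    if "x" in theta:
        return 1.0 if "go_x" in a else 0.0
    return 1.0 if all(ai == "stay" for ai in a) else 0.0


types = [["none", "x"], ["none", "x"]]
actions = [["stay", "go_x"], ["stay", "go_x"]]
start = [{"none": "stay", "x": "stay"}, {"none": "stay", "x": "stay"}]
eq = alternating_best_response(prior, intruder_utility, types, actions, start)
print(eq)  # [{'none': 'stay', 'x': 'go_x'}, {'none': 'stay', 'x': 'go_x'}]
print(expected_utility(prior, intruder_utility, eq))  # 1.0
```

Because the payoff is common and only strict improvements are accepted, the loop cannot cycle; on larger games, random restarts are a common way to escape poor equilibria.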

22
POSG to Bayesian Game Approximation

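$\langle I, S, A, Z, T, R, O \rangle$ to $\langle I, \Theta, A, p(\Theta), u \rangle^t$
$I$ is unchanged
$A$ is unchanged
Type space $\Theta_i^t$: all possible histories of agent $i$'s actions and observations up to time $t$
$p(\Theta)^t$ is calculated from $S^0, A, T, Z, O$ and $\sigma^{t-1}$
Prune low-probability types
Each joint type $\theta$ maps to a joint belief
$u$ is given by a heuristic, and $u_i = u_j$ (common payoffs)
QMDP is used as the heuristic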
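A minimal sketch of two of these steps, under stated assumptions. QMDP here means: solve the fully observable MDP underlying the POSG by value iteration, then score a joint action for a joint type by averaging its Q-values under that type's joint belief. The pruning rule drops joint types below a threshold and renormalizes; the slide does not say how low "low probability" is, so the threshold, the toy MDP, and every function name are hypothetical.

```python
from collections.abc import Hashable
from typing import Dict, List

State = str


def qmdp_q_values(states: List[State], actions: List[Hashable], T, R,
                  gamma: float = 0.95, iters: int = 500):
    """Q-values of the fully observable MDP underlying the POSG (value iteration).

    T[s][a] maps next states to probabilities; R[s][a] is the common reward.
    """
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                    for a in actions)
             for s in states}
    return {s: {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                for a in actions}
            for s in states}


def qmdp_utility(belief: Dict[State, float], Q, joint_action: Hashable) -> float:
    """u(theta, a) = sum_s b_theta(s) * Q_MDP(s, a), with b_theta the joint belief of theta."""
    return sum(b * Q[s][joint_action] for s, b in belief.items())


def prune_types(prior: Dict[Hashable, float], threshold: float) -> Dict[Hashable, float]:
    """Drop joint types below `threshold` and renormalize the survivors."""
    kept = {theta: p for theta, p in prior.items() if p >= threshold}
    if not kept:  # never prune everything: keep the single most likely type
        best = max(prior, key=prior.get)
        return {best: 1.0}
    z = sum(kept.values())
    return {theta: p / z for theta, p in kept.items()}


# Hypothetical two-state MDP: joint action "go" from s0 reaches an absorbing goal (reward 1).
states = ["s0", "goal"]
actions = ["go", "stay"]
T = {"s0": {"go": {"goal": 1.0}, "stay": {"s0": 1.0}},
     "goal": {"go": {"goal": 1.0}, "stay": {"goal": 1.0}}}
R = {"s0": {"go": 1.0, "stay": 0.0}, "goal": {"go": 0.0, "stay": 0.0}}
Q = qmdp_q_values(states, actions, T, R, gamma=0.5)
belief = {"s0": 0.5, "goal": 0.5}  # joint belief of some joint type
print(qmdp_utility(belief, Q, "go"), qmdp_utility(belief, Q, "stay"))  # 0.5 0.25
print(prune_types({("n", "n"): 0.6, ("x", "n"): 0.3, ("n", "x"): 0.1}, threshold=0.2))
```

QMDP assumes uncertainty vanishes after one step, so it undervalues information-gathering actions; it is cheap, which is what makes it usable as a per-step utility.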

23
Algorithm
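The algorithm figure did not survive extraction; its steps appear only in the page description (solveGame, make observation, determine type with bestMatch, execute action, propagate forward). Below is a minimal sketch of the per-agent online steps only, assuming the strategy $\sigma_i^t$ and the retained (unpruned) types for time $t$ were already produced by solving that step's Bayesian game. `best_match` and `agent_step` are hypothetical Python stand-ins; because the description truncates bestMatch's arguments, passing the retained types and matching by fewest mismatched steps are assumptions.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]      # (previous action, new observation)
History = Tuple[Step, ...]  # an agent's type is its action/observation history


def best_match(history: History, retained: List[History], prob: Dict[History, float]) -> History:
    """Hypothetical bestMatch: the retained type closest to the real history.

    Pruning may have removed the agent's true history, so fall back to the
    retained type with the fewest mismatched steps, ties broken by probability.
    """
    def mismatches(theta: History) -> int:
        return sum(x != y for x, y in zip(theta, history)) + abs(len(theta) - len(history))
    return min(retained, key=lambda theta: (mismatches(theta), -prob[theta]))


def agent_step(history: History, prev_action: str, observation: str,
               retained: List[History], prob: Dict[History, float],
               sigma: Dict[History, str]) -> Tuple[History, str]:
    """One online step for agent i."""
    # Make observation: h_i = obs_i^t U a_i^{t-1} U h_i
    history = history + ((prev_action, observation),)
    # Determine type: theta_i^t = bestMatch(h_i, retained types)
    theta = best_match(history, retained, prob)
    # Execute action: a_i^t = sigma_i^t(theta_i^t)
    return history, sigma[theta]


retained = [(("start", "none"),), (("start", "intruder"),)]
prob = {retained[0]: 0.7, retained[1]: 0.3}
sigma = {retained[0]: "stay", retained[1]: "go_x"}
h, a = agent_step((), "start", "intruder", retained, prob, sigma)
print(a)  # go_x

# If the true history had been pruned, the closest retained type is used instead.
h2, a2 = agent_step((), "start", "intruder", [retained[0]], prob, sigma)
print(a2)  # stay
```

Solving the next game and propagating $\Theta^{t+1}$, $p(\Theta^{t+1})$ forward happen between calls and are omitted here.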
24
Robotic Team Tag

A version of Team Tag
The environment is a portion of Gates Hall

Full teammate observability
The opponent can be captured by a single robot in any state
QMDP is used as the heuristic
Two Pioneer-class robots

25
Robot Policies
26
Lady and the Tiger (Nair et al., 2003)
27
Contributions

An algorithm for finding approximate solutions to POSGs with common payoffs
Tractability achieved by modeling the POSG as a sequence of Bayesian games
Performs comparably to solving the full POSG on a small finite-horizon problem
Improves performance over blind application of the utility heuristic on more complex problems
A successful real-time game-theoretic controller for indoor robots

28
Questions?