Title: Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
1Approximate Solutions for Partially Observable
Stochastic Games with Common Payoffs
- Rosemary Emery-Montemerlo
- joint work with
- Geoff Gordon, Jeff Schneider and Sebastian Thrun
- July 21, 2004 AAMAS 2004
2Robot Teams
- With limited communication, existing paradigms for decentralized robot control are not sufficient
- Game theoretic methods are necessary for multi-robot coordination under these conditions
4Decentralized Decision Making
- A robot cannot choose actions based only on joint
observations consistent with its own sensor
readings
- It must consider all joint observations that are
consistent with its possible sensor readings
14Relationship Between Decision Theoretic Models
[Figure: diagram with labels MDP, POMDP, "?", State Space, Belief Space, Distribution over Belief Space]
15Models of Multi-Agent Systems
- Partially observable stochastic games
- Generalization of stochastic games to partially observable worlds
- Related models
- DEC-POMDP (Bernstein et al., 2000)
- MTDP (Pynadath and Tambe, 2002)
- I-POMDP (Gmytrasiewicz and Doshi, 2004)
- POIPSG (Peshkin et al., 2000)
16Partially Observable Stochastic Games
- POSG $= \langle I, S, A, Z, T, R, O \rangle$
- $I$ is the set of agents, $I = \{1, \dots, n\}$
- $S$ is the set of states
- $A$ is the set of actions, $A = A_1 \times \dots \times A_n$
- $Z$ is the set of observations, $Z = Z_1 \times \dots \times Z_n$
- $T$ is the transition function, $T: S \times A \to S$
- $R$ is the reward function, $R: S \times A \to \mathbb{R}$
- $O$ are the observation emission probabilities, $O: S \times Z \times A \to [0,1]$
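The tuple above can be sketched as a small container. This is only an illustration: the class name `POSG`, the toy two-state game, and its probabilities are all hypothetical, and the sketch assumes that `T` and `O` are given as probability functions, P(s' | s, a) and P(z | s', a), which is one common reading of the signatures on the slide.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

JointAction = Tuple[str, ...]
JointObs = Tuple[str, ...]


@dataclass(frozen=True)
class POSG:
    """Hypothetical container mirroring the tuple (I, S, A, Z, T, R, O)."""
    agents: Sequence[int]
    states: Sequence[str]
    actions: Sequence[Sequence[str]]        # A_i for each agent
    observations: Sequence[Sequence[str]]   # Z_i for each agent
    T: Callable[[str, JointAction, str], float]   # assumed P(s' | s, a)
    R: Callable[[str, JointAction], float]        # common payoff
    O: Callable[[str, JointObs, JointAction], float]  # assumed P(z | s', a)

    def joint_actions(self) -> List[JointAction]:
        # A = A_1 x ... x A_n
        return list(product(*self.actions))

    def joint_observations(self) -> List[JointObs]:
        # Z = Z_1 x ... x Z_n
        return list(product(*self.observations))


def toy_posg() -> POSG:
    """A made-up two-robot, two-state game used only to exercise the class."""
    states = ["left", "right"]
    actions = [["stay", "move"], ["stay", "move"]]
    observations = [["hear-left", "hear-right"]] * 2

    def T(s, a, s_next):
        # The state flips only when both robots move (invented dynamics).
        flipped = "right" if s == "left" else "left"
        target = flipped if a == ("move", "move") else s
        return 1.0 if s_next == target else 0.0

    def R(s, a):
        return 1.0 if s == "right" else 0.0

    def O(s_next, z, a):
        # Each robot independently hears the true side with probability 0.8.
        p = 1.0
        for z_i in z:
            p *= 0.8 if z_i == "hear-" + s_next else 0.2
        return p

    return POSG([0, 1], states, actions, observations, T, R, O)


game = toy_posg()
```

Note that the joint action and observation spaces grow as the product of the individual spaces, which is already visible in this toy game (2 x 2 = 4 joint actions).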
17Solving POSGs
- POSGs are computationally infeasible to solve
18Solving POSGs
- We can approximate a POSG as a series of smaller
Bayesian games
[Figure: the full POSG and the one-step lookahead game at time t (a Bayesian game)]
19Bayesian Games
- Private information relevant to game
- Uncertainty in utility
- Type
- Encapsulates private information
- We limit ourselves to games with a finite number of types
- In robot example
- Type 1: Robot doesn't see anything
- Type 2: Robot sees intruder at location x
20Bayesian Games
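- BG $= \langle I, \Theta, A, p(\Theta), u \rangle$
- $\Theta$ is the joint type space, $\Theta = \Theta_1 \times \dots \times \Theta_n$
- $\theta$ is a specific joint type, $\theta = (\theta_1, \dots, \theta_n)$
- $p(\Theta)$ is the common prior on the distribution over $\Theta$
- $u$ is the utility function, $u = (u_1, \dots, u_n)$
- $u_i(a_i, a_{-i}, (\theta_i, \theta_{-i}))$
- $\sigma_i$ is a strategy for player $i$
- Defines what player $i$ does for each of its possible types
- Actions are individual actions, not joint actions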
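To make the pieces of a Bayesian game concrete, the sketch below evaluates the expected utility of a strategy profile under a common prior. The two types loosely follow the robot example (sees nothing / sees an intruder), but the prior, the action names, and the payoff function `team_utility` are invented for illustration and are not taken from the talk.

```python
def expected_utility(prior, utility, strategies):
    """Expected common payoff of a strategy profile.

    prior:      dict mapping joint type (theta_1, ..., theta_n) -> probability
    utility:    function (joint_action, joint_type) -> float
    strategies: list of dicts, strategies[i][theta_i] -> individual action a_i
    """
    total = 0.0
    for joint_type, p in prior.items():
        # Each player picks its own action from its own type only.
        joint_action = tuple(s[t] for s, t in zip(strategies, joint_type))
        total += p * utility(joint_action, joint_type)
    return total


# Hypothetical common prior over joint types of two robots.
prior = {
    ("nothing", "nothing"): 0.4,
    ("nothing", "intruder"): 0.2,
    ("intruder", "nothing"): 0.2,
    ("intruder", "intruder"): 0.2,
}


def team_utility(joint_action, joint_type):
    # Invented payoff: pursuing pays off only if some robot saw the intruder.
    pursuers = joint_action.count("pursue")
    return float(pursuers) if "intruder" in joint_type else -float(pursuers)


react = {"nothing": "patrol", "intruder": "pursue"}          # type-dependent
always_pursue = {"nothing": "pursue", "intruder": "pursue"}  # ignores type

eu_react = expected_utility(prior, team_utility, [react, react])
eu_always = expected_utility(prior, team_utility, [always_pursue, always_pursue])
```

With these made-up numbers the type-dependent profile scores higher than the one that ignores types, which is the point of letting a strategy map each possible type to an action.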
21Bayesian-Nash Equilibrium
- Set of best response strategies
- Each agent tries to maximize its expected utility conditioned on its probability distribution over the other agents' types $p(\Theta)$
- Each agent has a policy $\sigma_i$ that, given $\sigma_{-i}$, maximizes $u_i(\sigma_i, \sigma_{-i}, \theta_{-i})$
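One simple way to search for such a set of mutual best responses in a common-payoff game is alternating maximization: hold every other agent's strategy fixed, replace one agent's strategy with its best response, and repeat until nothing changes. The slides do not say which solver is used, so treat this as an assumed technique. The prior and payoff below are invented, and the procedure only guarantees a local optimum (a pure-strategy equilibrium), not the best one.

```python
def expected_utility(prior, utility, strategies):
    """Expected common payoff of a pure-strategy profile under the prior."""
    return sum(
        p * utility(tuple(s[t] for s, t in zip(strategies, jt)), jt)
        for jt, p in prior.items()
    )


def best_response(i, prior, utility, actions, strategies):
    """Best action for each type of agent i, with the other agents held fixed."""
    response = {}
    for theta_i in sorted({jt[i] for jt in prior}):
        best_action, best_value = None, float("-inf")
        for a_i in actions[i]:
            value = 0.0
            for jt, p in prior.items():
                if jt[i] != theta_i:
                    continue  # only joint types consistent with agent i's type
                ja = tuple(a_i if j == i else strategies[j][jt[j]]
                           for j in range(len(jt)))
                value += p * utility(ja, jt)
            if value > best_value:  # strict: ties keep the earlier action
                best_action, best_value = a_i, value
        response[theta_i] = best_action
    return response


def alternating_maximization(prior, utility, actions, strategies, max_sweeps=100):
    strategies = [dict(s) for s in strategies]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(strategies)):
            response = best_response(i, prior, utility, actions, strategies)
            if response != strategies[i]:
                strategies[i], changed = response, True
        if not changed:
            break  # no agent wants to deviate unilaterally
    return strategies


# Hypothetical two-robot game: capture pays off most when both robots pursue.
prior = {("nothing", "nothing"): 0.7, ("nothing", "intruder"): 0.1,
         ("intruder", "nothing"): 0.1, ("intruder", "intruder"): 0.1}
actions = [["patrol", "pursue"], ["patrol", "pursue"]]


def team_utility(joint_action, joint_type):
    pursuers = joint_action.count("pursue")
    if "intruder" in joint_type:
        return {0: 0.0, 1: 1.0, 2: 4.0}[pursuers]
    return -float(pursuers)


start = [{"nothing": "patrol", "intruder": "patrol"} for _ in range(2)]
equilibrium = alternating_maximization(prior, team_utility, actions, start)
```

Because every agent shares the same payoff, each best-response step can only keep or raise the team's expected utility, which is why the loop settles instead of cycling.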
22POSG to Bayesian Game Approximation
- $\langle I, S, A, Z, T, R, O \rangle$ to $\langle I, \Theta, A, p(\Theta), u \rangle^t$
- $I = I$
- $A = A$
- Type space $\Theta_i^t$ = all possible histories of agent $i$'s actions and observations up to time $t$
- $p(\Theta)^t$ calculated from $S_0, A, T, Z, O, \Theta^{t-1}$
- Prune low probability types
- Each joint type $\theta$ maps to a joint belief
- $u$ given by heuristic and $u_i = u_j$
- QMDP
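Two of the steps above lend themselves to a short sketch: pruning low-probability types, and scoring a joint action with a QMDP-style heuristic, which weights the Q-values of the fully observable MDP by the joint belief attached to a joint type. The function names, the pruning threshold, the discount factor, and the toy two-state problem are assumptions made for this example, not details from the talk.

```python
from itertools import product


def prune_types(prior, threshold):
    """Drop joint types below the threshold and renormalize the rest."""
    kept = {t: p for t, p in prior.items() if p >= threshold}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}


def q_mdp(states, joint_actions, T, R, gamma=0.9, sweeps=200):
    """Value iteration on the underlying fully observable MDP; returns Q[s][a]."""
    V = {s: 0.0 for s in states}
    Q = {}
    for _ in range(sweeps):
        Q = {
            s: {a: R(s, a) + gamma * sum(T(s, a, s2) * V[s2] for s2 in states)
                for a in joint_actions}
            for s in states
        }
        V = {s: max(Q[s].values()) for s in states}
    return Q


def qmdp_utility(belief, Q, joint_action):
    """Heuristic utility of a joint action: belief-weighted MDP Q-value."""
    return sum(p * Q[s][joint_action] for s, p in belief.items())


# Toy problem: a static intruder is on the left or on the right.
states = ["intruder-left", "intruder-right"]
joint_actions = list(product(["go-left", "go-right"], repeat=2))


def T(s, a, s_next):
    return 1.0 if s_next == s else 0.0  # the intruder does not move


def R(s, a):
    side = "go-left" if s == "intruder-left" else "go-right"
    return float(a.count(side))  # reward per robot heading to the right side


Q = q_mdp(states, joint_actions, T, R)

# Joint belief associated with some joint type (numbers invented).
belief = {"intruder-left": 0.7, "intruder-right": 0.3}
best_joint_action = max(joint_actions, key=lambda a: qmdp_utility(belief, Q, a))

pruned = prune_types({"h1": 0.90, "h2": 0.09, "h3": 0.01}, threshold=0.05)
```

The heuristic is cheap because the MDP is solved once, but it evaluates actions as if the state became fully observable after one step, so it undervalues information gathering.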
23Algorithm
24Robotic Team Tag
- Version of Team Tag
- Environment is portion of Gates Hall
- Full teammate observability
- Opponent can be captured by a single robot in any state
- QMDP used as heuristic
- Two Pioneer-class robots
25Robot Policies
26Lady And The Tiger (Nair et al., 2003)
27Contributions
- Algorithm for finding approximate solutions to POSGs with common payoffs
- Tractability achieved by modeling POSG as a sequence of Bayesian games
- Performs comparably to the full POSG for a small finite-horizon problem
- Improved performance over blind application of utility heuristic in more complex problems
- Successful real-time game-theoretic controller for indoor robots
28Questions?
- remery_at_cs.cmu.edu
- www.cs.cmu.edu/remery
29Back-Up Slides
30Lady And The Tiger (Nair et al., 2003)
31Robotic Team Tag
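- $I = \{1, 2\}$
- $S = S_1 \times S_2 \times S_{opponent}$
- $S_i = \{s_0, \dots, s_{28}\}$, $S_{opponent} = \{s_0, \dots, s_{28}, s_{tagged}\}$
- $|S| = 25230$
- $A_i = \{N, S, E, W, Tag\}$
- $Z_i = \{s_i, -1, s_{-i}, a_{-i}\}$
- $T$: adjacent cells
- $O$: see opponent if on same cell
- $R$: minimize capture time
- Modified from Pineau et al., 2003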
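The state count follows directly from the factored state space: 29 cells for each robot and 29 cells plus a "tagged" state for the opponent. A quick check of that arithmetic, with variable names chosen here for illustration:

```python
from itertools import product

robot_cells = [f"s{k}" for k in range(29)]   # s0 .. s28
opponent_states = robot_cells + ["tagged"]   # s0 .. s28 plus the tagged state
individual_actions = ["N", "S", "E", "W", "Tag"]

# |S| = |S_1| * |S_2| * |S_opponent|
n_states = len(robot_cells) ** 2 * len(opponent_states)

# Same count by explicit enumeration of the joint state space.
n_enumerated = sum(1 for _ in product(robot_cells, robot_cells, opponent_states))

# Two robots with five actions each give 25 joint actions.
n_joint_actions = len(individual_actions) ** 2
```

Here `n_states` is 29 x 29 x 30 = 25230, matching the figure on the slide.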
32Environment
33Robotic Team Tag Results
34Robotic Team Tag Results