Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs

Description:

solveGame(Θ^0, p(Θ^0)) Make Observation: h_i = obs_i^t ∪ a_i^{t-1} ∪ h_i. Determine Type: θ_i^t = bestMatch(h_i, ... Execute Action: a_i^t = σ_i^t(θ_i^t). Propagate Forward: Θ^{t+1}, p(Θ^{t+1}) ...

Provided by: rosemar72
Learn more at: http://www.cs.cmu.edu


1
Approximate Solutions for Partially Observable
Stochastic Games with Common Payoffs
  • Rosemary Emery-Montemerlo
  • joint work with
  • Geoff Gordon, Jeff Schneider and Sebastian Thrun
  • July 21, 2004 AAMAS 2004

2-3
Robot Teams
  • With limited communication, existing paradigms
    for decentralized robot control are not
    sufficient
  • Game theoretic methods are necessary for
    multi-robot coordination under these conditions

4-13
Decentralized Decision Making
  • A robot cannot choose actions based only on joint
    observations consistent with its own sensor
    readings
  • It must consider all joint observations that are
    consistent with its possible sensor readings

14
Relationship Between Decision Theoretic Models
[Figure: diagram relating MDP, POMDP, and an unnamed model marked "?" through state space, belief space, and distribution over belief space]
15
Models of Multi-Agent Systems
  • Partially observable stochastic games
  • Generalization of stochastic games to partially
    observable worlds
  • Related models
  • DEC-POMDP (Bernstein et al., 2000)
  • MTDP (Pynadath and Tambe, 2002)
  • I-POMDP (Gmytrasiewicz and Doshi, 2004)
  • POIPSG (Peshkin et al., 2000)

16
Partially Observable Stochastic Games
  • POSG = ⟨I, S, A, Z, T, R, O⟩
  • I is the set of agents, I = {1, ..., n}
  • S is the set of states
  • A is the set of actions, A = A_1 × ... × A_n
  • Z is the set of observations, Z = Z_1 × ... × Z_n
  • T is the transition function, T: S × A → S
  • R is the reward function, R: S × A → ℝ
  • O are the observation emission probabilities,
    O: S × Z × A → [0, 1]
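
The tuple above maps naturally onto a small container type. The following is a minimal sketch, not the authors' implementation: the class name, field names, and the toy two-robot instance are all hypothetical, and T is assumed to return a distribution over next states rather than a single state.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[str, ...]
JointObs = Tuple[str, ...]


@dataclass
class POSG:
    agents: List[int]               # I = {1, ..., n}
    states: List[str]               # S
    actions: List[List[str]]        # A_i per agent; A is their product
    observations: List[List[str]]   # Z_i per agent; Z is their product
    # Assumption: T returns a distribution {next_state: probability}.
    transition: Callable[[str, JointAction], Dict[str, float]]
    reward: Callable[[str, JointAction], float]              # R: S x A -> reals
    obs_prob: Callable[[str, JointObs, JointAction], float]  # O: S x Z x A -> [0, 1]

    def joint_actions(self) -> List[JointAction]:
        # A = A_1 x ... x A_n
        return list(product(*self.actions))

    def joint_observations(self) -> List[JointObs]:
        # Z = Z_1 x ... x Z_n
        return list(product(*self.observations))


# Hypothetical toy instance: every name and number here is made up.
toy = POSG(
    agents=[1, 2],
    states=["s0", "s1"],
    actions=[["left", "right"], ["left", "right"]],
    observations=[["nothing", "intruder"], ["nothing", "intruder"]],
    transition=lambda s, a: {"s1": 1.0} if a == ("right", "right") else {s: 1.0},
    reward=lambda s, a: 1.0 if s == "s1" else 0.0,
    obs_prob=lambda s, z, a: 0.25,  # uniform over the 4 joint observations
)

print(len(toy.joint_actions()), len(toy.joint_observations()))  # 4 4
```

Joint action and observation spaces grow as a product over agents, which is one reason the full game becomes expensive quickly.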

17
Solving POSGs
  • Solving POSGs exactly is computationally infeasible
    for all but very small problems

18
Solving POSGs
  • We can approximate a POSG as a series of smaller
    Bayesian games

[Figure: the full POSG approximated by a one-step lookahead game at time t (a Bayesian game)]
19
Bayesian Games
  • Private information relevant to game
  • Uncertainty in utility
  • Type
  • Encapsulates private information
  • We limit ourselves to games with a finite number of
    types
  • In robot example
  • Type 1: Robot doesn't see anything
  • Type 2: Robot sees intruder at location x

20
Bayesian Games
  • BG = ⟨I, Θ, A, p(Θ), u⟩
  • Θ is the joint type space, Θ = Θ_1 × ... × Θ_n
  • θ is a specific joint type, θ = {θ_1, ..., θ_n}
  • p(Θ) is common prior on the distribution over Θ
  • u is the utility function, u = {u_1, ..., u_n}
  • u_i(a_i, a_-i, (θ_i, θ_-i))
  • σ_i is a strategy for player i
  • Defines what player i does for each of its
    possible types
  • Actions are individual actions, not joint actions
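
To make the pieces of a Bayesian game concrete, here is a minimal sketch that evaluates the expected utility of a strategy profile under a common prior. The two robot types echo the robot example from the previous slide, but the prior, the payoff numbers, and all identifiers are assumptions chosen only for illustration.

```python
# Hypothetical two-robot game. Each strategy maps a player's own type
# to an individual action, never to a joint action.
prior = {  # common prior over joint types (made-up numbers summing to 1)
    ("sees_nothing", "sees_nothing"): 0.5,
    ("sees_nothing", "sees_intruder"): 0.2,
    ("sees_intruder", "sees_nothing"): 0.2,
    ("sees_intruder", "sees_intruder"): 0.1,
}


def utility(joint_action, joint_type):
    """Made-up common payoff shared by both robots."""
    score = 0.0
    for action, own_type in zip(joint_action, joint_type):
        if own_type == "sees_intruder":
            score += 1.0 if action == "chase" else 0.0
        else:
            score += 0.5 if action == "patrol" else 0.0
    return score


def expected_utility(strategies):
    """Average the utility over joint types drawn from the common prior."""
    total = 0.0
    for joint_type, p in prior.items():
        joint_action = tuple(
            strategy[own_type] for strategy, own_type in zip(strategies, joint_type)
        )
        total += p * utility(joint_action, joint_type)
    return total


sigma = [
    {"sees_nothing": "patrol", "sees_intruder": "chase"},
    {"sees_nothing": "patrol", "sees_intruder": "chase"},
]
print(expected_utility(sigma))
```

Because each strategy is indexed by the player's own type only, a robot commits to an action for every type it might have, including types it does not currently hold.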

21
Bayesian-Nash Equilibrium
  • Set of best response strategies
  • Each agent tries to maximize its expected utility
    conditioned on its probability distribution over
    the other agents' types, p(Θ)
  • Each agent has a policy σ_i that, given σ_-i,
    maximizes u_i(σ_i, Θ_-i, σ_-i)
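
For common-payoff games, one standard way to reach a profile that satisfies the best-response condition above is alternating maximization: hold the other players' strategies fixed, improve one player's strategy, and repeat until nothing changes. The slides do not state which solver is used, so the sketch below is only an illustration; the game, the numbers, and the function names are hypothetical.

```python
def type_value(i, th_i, a_i, strategies, prior, utility):
    """Unnormalized expected utility when player i, holding type th_i, plays a_i."""
    value = 0.0
    for joint_type, p in prior.items():
        if joint_type[i] != th_i:
            continue
        joint_action = tuple(
            a_i if j == i else strategies[j][joint_type[j]]
            for j in range(len(joint_type))
        )
        value += p * utility(joint_action, joint_type)
    return value


def alternating_maximization(types, actions, prior, utility, max_iters=100):
    # Arbitrary starting profile: every type plays its player's first action.
    strategies = [{th: acts[0] for th in ts} for ts, acts in zip(types, actions)]
    for _ in range(max_iters):
        changed = False
        for i in range(len(types)):
            for th_i in types[i]:
                current = strategies[i][th_i]
                best_a = current
                best_v = type_value(i, th_i, current, strategies, prior, utility)
                for a_i in actions[i]:
                    v = type_value(i, th_i, a_i, strategies, prior, utility)
                    if v > best_v + 1e-12:  # switch only on strict improvement
                        best_a, best_v = a_i, v
                if best_a != current:
                    strategies[i][th_i] = best_a
                    changed = True
        if not changed:  # no player can improve: a best-response profile
            break
    return strategies


# Hypothetical game with correlated types and a shared payoff.
types = [["sees_nothing", "sees_intruder"], ["sees_nothing", "sees_intruder"]]
actions = [["patrol", "chase"], ["patrol", "chase"]]
prior = {
    ("sees_nothing", "sees_nothing"): 0.4,
    ("sees_nothing", "sees_intruder"): 0.1,
    ("sees_intruder", "sees_nothing"): 0.1,
    ("sees_intruder", "sees_intruder"): 0.4,
}


def utility(joint_action, joint_type):
    # Reward chasing when any robot sees the intruder, patrolling otherwise.
    if "sees_intruder" in joint_type:
        return float(joint_action.count("chase"))
    return float(joint_action.count("patrol"))


equilibrium = alternating_maximization(types, actions, prior, utility)
print(equilibrium)
```

Switching only on strict improvement means the shared expected utility rises with every change, so the loop terminates on a finite game. The result is a local optimum that depends on the starting profile, not necessarily the best equilibrium.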

22
POSG to Bayesian Game Approximation
  • ⟨I, S, A, Z, T, R, O⟩ to ⟨I, Θ, A, p(Θ), u⟩^t
  • I = I
  • A = A
  • Type space Θ_i^t = all possible histories of agent
    i's actions and observations up to time t
  • p(Θ)^t calculated from S_0, A, T, Z, O, Θ^{t-1}
  • Prune low probability types
  • Each joint type θ maps to a joint belief
  • u given by heuristic and u_i = u_j
  • QMDP
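
Two steps on this slide lend themselves to a short sketch: scoring a joint belief with the QMDP heuristic, and pruning low-probability types. QMDP weights the Q-values of the underlying fully observable MDP by the belief, Q_MDP(b, a) = Σ_s b(s) Q(s, a). The toy MDP, the pruning threshold, and all function names below are assumptions made for illustration; in the setting of the slides, the actions would be joint actions.

```python
def value_iteration(states, actions, T, R, gamma=0.9, iters=200):
    """Q-values of the underlying fully observable MDP.

    T[s][a] is a dict {next_state: probability}; R[s][a] is a float.
    """
    Q = {s: {a: 0.0 for a in actions} for s in states}
    for _ in range(iters):
        V = {s: max(Q[s].values()) for s in states}
        Q = {
            s: {
                a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                for a in actions
            }
            for s in states
        }
    return Q


def qmdp_utility(belief, action, Q):
    """QMDP heuristic: belief-weighted average of the MDP Q-values."""
    return sum(p * Q[s][action] for s, p in belief.items())


def prune_types(type_probs, threshold=0.01):
    """Drop types below the (hypothetical) threshold and renormalize the rest."""
    kept = {th: p for th, p in type_probs.items() if p >= threshold}
    total = sum(kept.values())
    return {th: p / total for th, p in kept.items()}


# Toy two-state MDP with made-up dynamics: "goal" is absorbing and rewarding.
states = ["start", "goal"]
actions = ["stay", "go"]
T = {
    "start": {"stay": {"start": 1.0}, "go": {"goal": 1.0}},
    "goal": {"stay": {"goal": 1.0}, "go": {"goal": 1.0}},
}
R = {
    "start": {"stay": 0.0, "go": 0.0},
    "goal": {"stay": 1.0, "go": 1.0},
}

Q = value_iteration(states, actions, T, R)
belief = {"start": 0.5, "goal": 0.5}
print(qmdp_utility(belief, "go", Q))
print(prune_types({"h1": 0.6, "h2": 0.395, "h3": 0.005}))
```

QMDP assumes the state becomes fully observable after one step, so it is cheap to compute but places no value on actions whose only purpose is to gather information.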

23
Algorithm
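
The presentation description lists the steps each agent runs: solve the game, make an observation, determine the type, execute the action, and propagate forward. A minimal single-agent sketch of that loop follows. The game solver is a stub, and `best_match` is a hypothetical matcher, since the original matching rule is elided in the description; every identifier here is an assumption.

```python
def best_match(history, candidate_types):
    """Hypothetical matcher: pick the retained type sharing the most
    entries, position by position, with the agent's actual history."""
    def overlap(candidate):
        return sum(1 for x, y in zip(history, candidate) if x == y)
    return max(candidate_types, key=overlap)


def run_agent(solve_game, observe, types, prior, horizon):
    history, prev_action, executed = (), None, []
    for t in range(horizon):
        # Solve the one-step game for the current type space. Solving at
        # the top of the loop covers both the initial solve and the
        # "propagate forward" step of the previous iteration.
        strategy, next_types, next_prior = solve_game(types, prior)
        # Make observation: extend the history with last action and new reading.
        history = history + (prev_action, observe(t))
        # Determine type: closest retained history.
        theta = best_match(history, types)
        # Execute action: the strategy maps the type to an individual action.
        action = strategy[theta]
        executed.append(action)
        prev_action = action
        types, prior = next_types, next_prior
    return executed


OBSERVATIONS = ["nothing", "intruder"]


def stub_solve_game(types, prior):
    """Stand-in for a real solver: chase once an intruder has ever been seen."""
    strategy = {th: ("chase" if "intruder" in th else "patrol") for th in types}
    next_types = [th + (strategy[th], z) for th in types for z in OBSERVATIONS]
    next_prior = {th: 1.0 / len(next_types) for th in next_types}
    return strategy, next_types, next_prior


readings = ["nothing", "intruder", "intruder"]
initial_types = [(None, z) for z in OBSERVATIONS]
initial_prior = {th: 0.5 for th in initial_types}
actions_taken = run_agent(
    stub_solve_game, lambda t: readings[t], initial_types, initial_prior, horizon=3
)
print(actions_taken)  # ['patrol', 'chase', 'chase']
```

The stub never prunes, so the type space doubles every step. A real controller would prune low-probability types before the next solve, which is what makes an approximate matcher necessary.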
24
Robotic Team Tag
  • Version of Team Tag
  • Environment is a portion of Gates Hall
  • Full teammate observability
  • Opponent can be captured by a single robot in any
    state
  • QMDP used as heuristic
  • Two Pioneer-class robots

25
Robot Policies
26
Lady and the Tiger (Nair et al., 2003)
27
Contributions
  • Algorithm for finding approximate solutions to
    POSGs with common payoffs
  • Tractability achieved by modeling the POSG as a
    sequence of Bayesian games
  • Performs comparably to the full POSG solution for
    a small finite-horizon problem
  • Improved performance over blind application of
    utility heuristic in more complex problems
  • Successful real-time game-theoretic controller
    for indoor robots

28
Questions?
  • remery_at_cs.cmu.edu
  • www.cs.cmu.edu/remery

29
Back-Up Slides
30
Lady and the Tiger (Nair et al., 2003)
31
Robotic Team Tag
  • I = {1, 2}
  • S = S_1 × S_2 × S_opponent
  • S_i = {s_0, ..., s_28}, S_opponent = {s_0, ..., s_28, s_tagged}
  • |S| = 25230
  • A_i = {N, S, E, W, Tag}
  • Z_i = {s_i, -1, s_-i, a_-i}
  • T: adjacent cells
  • O: see opponent if on same cell
  • R: minimize capture time
  • Modified from Pineau et al., 2003
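
The state count on this slide follows directly from the component sizes: 29 cells for each robot and 30 opponent states (29 cells plus the tagged state). A short check, with variable names chosen only for illustration:

```python
robot_cells = 29                    # s_0 ... s_28 for each robot
opponent_states = robot_cells + 1   # the same cells plus s_tagged
num_states = robot_cells * robot_cells * opponent_states
num_joint_actions = 5 * 5           # each robot picks one of N, S, E, W, Tag

print(num_states)         # 25230
print(num_joint_actions)  # 25
```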

32
Environment
33-34
Robotic Team Tag Results