1
Optimal Sequential Planning in Partially Observable Multi-agent Settings
9th AAAI/SIGART Doctoral Consortium
Prashant Doshi, Dept. of Computer Science, Univ. of Illinois at Chicago
Joint work with Piotr Gmytrasiewicz
2
Outline
  • Motivation: Real-World Application Settings
  • General Problem Setting
  • Background: Single-agent POMDPs
  • Definition and Solution
  • Single-agent Tiger game
  • Interactive POMDPs
  • Definition and Solution
  • Multi-agent Tiger game
  • Research Contributions

3
Real-World Application Settings
  • Surface Mapping of Mars by Autonomous Rovers
  • Coordinate to explore a pre-defined region of
    Mars optimally
  • Uncertainty
  • Robot Soccer
  • RoboCup competitions
  • Coordination with teammates, Deception of
    opponents
  • Anticipate and track others' actions

(Images: the Spirit and Opportunity Mars rovers; AIBO robots)
4
General Problem Setting
5
Background: Single-agent POMDPs
  • Partially Observable Markov Decision Processes
  • Standard Optimal Sequential Planning Framework
  • Realistic

POMDP Parameters
  • S, physical state space of environment
  • A, action space of the agent
  • Ω, observation space of the agent
  • T : S × A → Δ(S), transition function
  • O : S × A → Δ(Ω), observation function
  • R : S × A → ℝ, preference (reward) function

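These parameters can be pictured as a small container in code. A minimal Python sketch (the names S, A, Omega, T, O, R mirror the slide's notation; the class itself is illustrative, not part of the original):

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class POMDP:
    """Container for the POMDP tuple <S, A, Omega, T, O, R>."""
    S: Sequence[str]                      # physical state space of the environment
    A: Sequence[str]                      # action space of the agent
    Omega: Sequence[str]                  # observation space of the agent
    T: Callable[[str, str, str], float]   # T(s, a, s2) = Pr(s2 | s, a), transition function
    O: Callable[[str, str, str], float]   # O(s2, a, o) = Pr(o | s2, a), observation function
    R: Callable[[str, str], float]        # R(s, a), preference (reward) function
```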
6
Single-agent Tiger game
  • Task Maximize collection of gold over a finite
    or infinite number of steps while avoiding tiger
  • Tiger emits a growl periodically
  • Agent may open doors or listen

Tiger game as a POMDP: S = {TL, TR}, A = {L, OL, OR}, Ω = {GL, GR}
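As a concrete illustration, a sketch of the tiger game with the parameter values commonly used for this problem (0.85 growl accuracy, +10 for the gold door, -100 for the tiger, -1 per listen; these numbers are assumed here, not stated on the slide):

```python
# Tiger game as a POMDP (standard parameter values, assumed for illustration)
S = ["TL", "TR"]                         # tiger behind the left / right door
A = ["L", "OL", "OR"]                    # listen, open left, open right
Omega = ["GL", "GR"]                     # growl heard from the left / right


def T(s, a, s2):
    """Transition: listening leaves the state unchanged; opening a door resets the game."""
    if a == "L":
        return 1.0 if s2 == s else 0.0
    return 0.5                           # after opening, the tiger is placed uniformly at random


def O(s2, a, o):
    """Observation: growls are 85% accurate when listening, uninformative otherwise."""
    if a == "L":
        correct = (s2 == "TL" and o == "GL") or (s2 == "TR" and o == "GR")
        return 0.85 if correct else 0.15
    return 0.5


def R(s, a):
    """Reward: -1 for listening, +10 for the gold door, -100 for the tiger door."""
    if a == "L":
        return -1.0
    opened_tiger = (a == "OL" and s == "TL") or (a == "OR" and s == "TR")
    return -100.0 if opened_tiger else 10.0
```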
7
Single-agent Tiger game
Belief Update (SE(b, a, o)):
b'(s') = β · O(s', a, o) · Σ_{s ∈ S} T(s, a, s') · b(s),   where β is a normalizing constant
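A sketch of SE(b, a, o) over a finite state set, written against the T and O signatures above (illustrative; pass in the tiger-game functions to reproduce the update on this slide):

```python
def belief_update(b, a, o, S, T, O):
    """SE(b, a, o): Bayes-update the belief b (a dict mapping state -> probability).

    T(s, a, s2) and O(s2, a, o) are the transition and observation functions.
    """
    unnormalized = {
        s2: O(s2, a, o) * sum(T(s, a, s2) * b[s] for s in S)
        for s2 in S
    }
    beta = 1.0 / sum(unnormalized.values())   # normalizing constant
    return {s2: beta * p for s2, p in unnormalized.items()}


# Example: from the uninformed belief, listen and hear a growl on the left;
# with the tiger-game T and O above this gives Pr(TL) = 0.85, Pr(TR) = 0.15.
# belief_update({"TL": 0.5, "TR": 0.5}, "L", "GL", ["TL", "TR"], T, O)
```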
8
Single-agent Tiger game
Policy Computation
V(b) = max_{a ∈ A} [ Σ_{s ∈ S} R(s, a) b(s) + Σ_{o ∈ Ω} Pr(o | b, a) V(SE(b, a, o)) ]
(Figure: trace of policy computation)
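A minimal sketch of finite-horizon value iteration by exhaustive enumeration of alpha vectors (no pruning, so it is exponential in the horizon; purely illustrative, not the solver used for the slides):

```python
import itertools


def value_iteration(S, A, Omega, T, O, R, horizon):
    """Finite-horizon POMDP value iteration by enumerating alpha vectors.

    Returns a list of (action, alpha) pairs, where alpha maps states to values.
    """
    alphas = [(None, {s: 0.0 for s in S})]          # horizon-0 value function
    for _ in range(horizon):
        new_alphas = []
        for a in A:
            # pick one successor alpha vector for each possible observation
            for choice in itertools.product(alphas, repeat=len(Omega)):
                alpha = {
                    s: R(s, a) + sum(
                        T(s, a, s2) * O(s2, a, o) * choice[k][1][s2]
                        for s2 in S
                        for k, o in enumerate(Omega)
                    )
                    for s in S
                }
                new_alphas.append((a, alpha))
        alphas = new_alphas
    return alphas


def value(b, alphas):
    """Evaluate the piecewise linear and convex value function at belief b."""
    return max(sum(b[s] * alpha[s] for s in b) for _, alpha in alphas)
```

The action of the pair that maximizes value(b, alphas) at the current belief is the optimal first step of the policy; repeating this after each belief update traces out the policy shown in the figure.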
9
Single-agent Tiger game
  • Value of all beliefs
(Figure: optimal policy and value function over the belief simplex)
  • Properties of the Value Function
  • The value function is piecewise linear and convex (PWLC)
  • The value function converges asymptotically under value iteration

10
I-POMDPs
  • Interactive Partially Observable Markov Decision
    Processes
  • Generalization of POMDPs to multi-agent settings
  • Main Idea:
    1. Consider other agents as part of the environment
    2. The agent maintains possible models of the other agents, including their beliefs and their beliefs about others' beliefs
  • Borrows concepts from several fields
  • Bayesian games
  • Interactive epistemology / recursive modeling
  • Decision-theoretic planning
  • Decision-theoretic approach to game theory

11
I-POMDPs
  • I-POMDP_i parameters: ⟨IS_i, A, Ω_i, T_i, O_i, R_i⟩, where the interactive states IS_i = S × M_j include models of the other agent j, assumed to be Bayes rational
  • Belief Non-manipulability Assumption (BNM): Actions don't directly manipulate beliefs (instead, actions → observations → belief update)
  • Belief Non-observability Assumption (BNO): Beliefs of other agents cannot be directly observed (instead, beliefs → actions → observations)
  • Preferences are generally over physical states
    and actions

An intentional model, or type, of agent j (computable): j's belief paired with its frame, i.e., its actions, observations, transition, observation, and preference functions.
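One way to picture an intentional model in code; the fields follow the frame described above, while the class name and the optimality-criterion field are illustrative assumptions rather than part of the original:

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class IntentionalModel:
    """Intentional model (type) of agent j: j's belief plus its frame.

    Under BNM/BNO, agent i never observes b_j directly; it can only be
    inferred from j's (unobserved) actions and observations.
    """
    b_j: dict                                  # j's belief
    A_j: Sequence[str]                         # j's actions
    Omega_j: Sequence[str]                     # j's observations
    T_j: Callable[[str, str, str], float]      # j's transition function
    O_j: Callable[[str, str, str], float]      # j's observation function
    R_j: Callable[[str, str], float]           # j's preference function
    OC_j: str = "expected utility"             # j's optimality criterion (Bayes-rational)
```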
12
I-POMDPs
  • Beliefs
  • Single-agent POMDP: b ∈ Δ(S), a distribution over physical states
  • I-POMDP_i: b_i ∈ Δ(IS_i), a distribution over physical states and models of the other agent
  • The space of models is uncountably infinite, and beliefs about beliefs form a countably infinite nesting
13
I-POMDPs
  • Finitely nested I-POMDP: I-POMDP_{i,l}, where l is the nesting level
  • Computable approximations of I-POMDPs
  • Nesting is built bottom-up: a 0th-level type is a POMDP, and a level-l belief ranges over physical states and level-(l-1) models of the other agent (see the sketch below)

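A small sketch of the bottom-up construction of the finitely nested spaces; the function name and the example model set are hypothetical:

```python
def interactive_states(S, models_of_j):
    """Interactive states at level l: physical states crossed with level-(l-1) models of j.

    With an empty model set this collapses to S, matching the fact that a
    0th-level type is an ordinary single-agent POMDP.
    """
    if not models_of_j:                  # level 0: beliefs over physical states only
        return list(S)
    return [(s, m) for s in S for m in models_of_j]


# Bottom-up construction for the two-agent tiger game, with hypothetical level-0
# models of j represented only by j's candidate values of Pr_j(TL):
level0_models = [0.5, 0.9]
IS_level1 = interactive_states(["TL", "TR"], level0_models)
# Agent i's level-1 belief is then a distribution over IS_level1.
```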
14
Multi-agent Tiger game
  • Task Maximize collection of gold over a finite
    or infinite number of steps while avoiding tiger
  • Each agent hears growls as well as creaks
  • Each agent may open doors or listen
  • Each agent is unable to perceive the other's action or observation

2 agents (i and j)
Multi-agent Tiger game as a level-1 I-POMDP:
S = {TL, TR}, A = A_i × A_j = {L, OL, OR} × {L, OL, OR}, Ω_i = {GL, GR} × {S, CL, CR}
(growl from the left/right door; silence, creak from the left/right door)
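The spaces above can be spelled out as cross products; a small illustrative sketch:

```python
from itertools import product

S = ["TL", "TR"]                              # tiger behind the left / right door
A_i = A_j = ["L", "OL", "OR"]                 # each agent: listen, open left, open right
A = list(product(A_i, A_j))                   # joint actions (i's action, j's action)

growls = ["GL", "GR"]                         # growl heard from the left / right
creaks = ["S", "CL", "CR"]                    # silence, creak from the left / right door
Omega_i = list(product(growls, creaks))       # i hears a growl and possibly a door creak

# The creak is only probabilistic evidence of the other agent opening a door;
# neither agent perceives the other's action or observation directly.
```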
15
Multi-agent Tiger game
Example: agent i's level-1 beliefs (three cases, shown as figures)
  • i is uninformed about j's beliefs
  • i knows j is clueless
  • i believes j is informed
16
Multi-agent Tiger game
Agent i's belief update process
(Figure: the update after i performs action L and observes ⟨GL, S⟩, i.e., a growl from the left and silence)
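A simplified sketch of this update over interactive states (s, b_j), assuming a finite set of candidate beliefs for j; the function names (act_prob, update_j, etc.) are illustrative placeholders, not from the slides:

```python
def interactive_belief_update(b_i, a_i, o_i, S, A_j, Omega_j,
                              act_prob, T, O_i, O_j, update_j):
    """Level-1 belief update for agent i over interactive states (s, b_j).

    b_i maps (s, b_j) pairs to probabilities, where b_j is any hashable
    representation of j's belief (e.g. Pr_j(TL) as a float).
    act_prob(b_j, a_j): probability that a Bayes-rational j with belief b_j picks a_j.
    T, O_i, O_j: joint transition and observation functions, e.g. T(s, a_i, a_j, s2).
    update_j(b_j, a_j, o_j): j's own belief-update function (its SE).
    """
    b_new = {}
    for (s, b_j), p in b_i.items():
        for a_j in A_j:                          # marginalize over j's unobserved action
            w_a = p * act_prob(b_j, a_j)
            if w_a == 0.0:
                continue
            for s2 in S:                         # physical transition and i's observation
                w_s = w_a * T(s, a_i, a_j, s2) * O_i(s2, a_i, a_j, o_i)
                if w_s == 0.0:
                    continue
                for o_j in Omega_j:              # marginalize over j's unobserved observation
                    b_j2 = update_j(b_j, a_j, o_j)
                    key = (s2, b_j2)
                    b_new[key] = b_new.get(key, 0.0) + w_s * O_j(s2, a_i, a_j, o_j)
    norm = sum(b_new.values())                   # normalize by Pr(o_i | b_i, a_i)
    return {k: v / norm for k, v in b_new.items()}
```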
17
Multi-agent Tiger game
Policy Computation
Policy traces
(Figure: policies plotted against Pr(TL) and Pr(TR) for different beliefs b_j attributed to agent j)
18
Multi-agent Tiger game
Value Function
Team behavior emerges amongst the agents: i prefers coordination with j
19
I-POMDPs
  • Theoretical Results
  • Proposition 1 (Sufficiency): In an I-POMDP, the belief over the interactive states IS_i is a sufficient statistic for the past history of i's observations
  • Proposition 2 (Belief Update): Under the BNM and BNO assumptions, the belief update function for I-POMDP_i is
    b_i^t(s^t, θ_j^t) = β Σ_{s^{t-1}, θ_j^{t-1}} b_i^{t-1}(s^{t-1}, θ_j^{t-1}) Σ_{a_j} Pr(a_j | θ_j^{t-1}) T(s^{t-1}, a_i, a_j, s^t) O_i(s^t, a_i, a_j, o_i^t) Σ_{o_j} O_j(s^t, a_i, a_j, o_j) [SE_{θ_j}(b_j^{t-1}, a_j, o_j) = b_j^t]
  • Theorem 1 (Convergence) For any finitely nested
    I-POMDP, the Value Iteration algorithm starting
    from an arbitrary value function converges to a
    unique fixed point
  • Theorem 2 (PWLC) For any finitely nested
    I-POMDP, the value function is piecewise linear
    and convex

20
Research Contributions
  • Limitations of Nash equilibrium as a general
    multi-agent control paradigm in AI
  • Incomplete: does not say what to do off-equilibrium
  • Non-unique: multiple solutions, with no way to choose among them
  • Our approach complements Nash equilibrium: it adopts optimality and best response to anticipated actions, rather than stability
  • Game-theoretic concepts + decision theory → strategic and long-term planning
  • Formalizes greater autonomy amongst agents: actions and observations of other agents are not directly known (BNM, BNO)
  • Applicable to games of cooperation and competition

21
Related Work
  • Multi-agent Decision-making
  • Learning in repeated games
  • Fictitious play
  • Fudenberg & Levine 97
  • Rational (Bayesian) learning
  • Kalai & Lehrer 93; Nyarko 97
  • Learning in stochastic games
  • Multi-agent reinforcement learning
  • Littman 94; Hu & Wellman 98; Bowling & Veloso 00
  • Other extensions of POMDPs
  • DEC-POMDP: restricted to team behavior (common payoffs)
  • Bernstein et al. 02; Nair et al. 03
  • Prior work places importance on Nash equilibrium
  • Learning in game theory → attempts to justify Nash eq.
  • Learning in stochastic games → impractical assumptions needed to obtain convergence to Nash eq.
  • Is the emphasis on Nash eq. in AI misguided?

22
Proposed Work
  • Develop approximate solution techniques that trade off solution quality against computation time
  • Investigate the effect of increasing levels of
    belief nesting on error bounds of approximate
    solutions
  • Investigate whether, and how, solutions to I-POMDPs lead to Nash-equilibrium-type conditions
  • Study settings of the multi-agent Tiger game that
    lead to human-like social interaction patterns
  • Empirically evaluate the I-POMDP framework on
    another realistic problem domain
  • Develop a graphical model using the language of
    influence diagrams to solve I-POMDPs online

23
Thank You
Questions?