Transcript and Presenter's Notes

Title: Generalizing Plans to New Environments in Multiagent Relational MDPs


1
Generalizing Plans to New Environments in
Multiagent Relational MDPs
  • Carlos Guestrin
  • Daphne Koller
  • Stanford University

2
Multiagent Coordination Examples
  • Search and rescue
  • Factory management
  • Supply chain
  • Firefighting
  • Network routing
  • Air traffic control
  • Multiple, simultaneous decisions
  • Exponentially-large spaces
  • Limited observability
  • Limited communication

3
(Figure: game screenshot with a peasant, a footman, and a building labeled)
  • Real-time strategy game
  • Peasants collect resources and build
  • Footmen attack enemies
  • Buildings train peasants and footmen
4
Scaling up by Generalization
  • Exploit similarities between world elements
  • Generalize plans
  • From a set of worlds to a new, unseen world
  • Avoid need to replan
  • Tackle larger problems

Formalize the notion of similar elements
Compute generalizable plans
5
Relational Models and MDPs
  • Classes
  • Peasant, Gold, Wood, Barracks, Footman, Enemy
  • Relations
  • Collects, Builds, Trains, Attacks
  • Instances
  • Peasant1, Peasant2, Footman1, Enemy1
  • Value functions at the class level
  • Objects of the same class make the same contribution
    to the value function
  • Factored-MDP equivalents of PRMs [Koller & Pfeffer '98]
    (a schematic sketch follows below)
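A minimal Python sketch of the schema-versus-instantiation split described above; the class, relation, and attribute names echo the slide, but the data structures themselves are illustrative assumptions, not the authors' implementation:

  from dataclasses import dataclass

  @dataclass
  class ObjectClass:
      # e.g. Peasant, Gold, Wood, Barracks, Footman, Enemy
      name: str
      attributes: tuple

  @dataclass
  class Relation:
      # e.g. Collects(Peasant, Gold), Attacks(Footman, Enemy)
      name: str
      source: str
      target: str

  @dataclass
  class Instance:
      # a concrete object in one world, e.g. Peasant1
      name: str
      of_class: str

  # Schema shared by every world
  classes = [
      ObjectClass("Peasant", ("task",)),
      ObjectClass("Gold", ("amount",)),
      ObjectClass("Footman", ("health",)),
      ObjectClass("Enemy", ("health",)),
  ]
  relations = [
      Relation("Collects", "Peasant", "Gold"),
      Relation("Attacks", "Footman", "Enemy"),
  ]

  # One instantiation (world): a particular set of objects and links
  world = {
      "objects": [
          Instance("Peasant1", "Peasant"), Instance("Peasant2", "Peasant"),
          Instance("Footman1", "Footman"), Instance("Enemy1", "Enemy"),
          Instance("Gold1", "Gold"),
      ],
      "links": [("Peasant1", "Collects", "Gold1"),
                ("Footman1", "Attacks", "Enemy1")],
  }

The schema (classes and relations) is fixed once; each world only fills in how many objects exist and how they are related.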

6
Relational MDPs
  • Class-level transition probabilities depend on
  • attributes, actions, and attributes of related
    objects
  • Class-level reward function
  • Instantiation (world)
  • number of objects and their relations
  • The result is a well-defined MDP

7
Planning in a World
  • Long-term planning by solving the MDP
  • Number of states is exponential in the number of objects
  • Number of actions is exponential too
  • Efficient approximation by exploiting structure!
  • An RMDP world is a factored MDP

8
Roadmap to Generalization
  • Solve 1 world
  • Compute generalizable value function
  • Tackle a new world

9
World is a Factored MDP
  • State
  • Dynamics
  • Decisions
  • Rewards

(Figure: dynamic Bayesian network over the state variables, with transition probabilities such as P(F' | F, G, H, A_F))
10
Long-term Utility: Value of MDP
  • Value computed by linear programming [Manne '60]
    (the LP is sketched below)
  • One variable V(x) for each state
  • One constraint for each state x and action a
  • Number of states and actions is exponential!
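The LP the slide refers to, in its standard form (the state-relevance weights α(x) are the usual textbook choice and are an assumption here, not taken from the slide):

  minimize    Σ_x α(x) V(x)
  subject to  V(x) ≥ R(x,a) + γ Σ_x' P(x'|x,a) V(x')    for every state x and action a

One variable per state and one constraint per state-action pair, hence the exponential blowup noted above.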

11
Approximate Value Functions
Linear combination of restricted-domain functions
[Bellman et al. '63; Tsitsiklis & Van Roy '96;
Koller & Parr '99, '00; Guestrin et al. '01]
  • Each Vo depends on the state of one object and its related
    objects
  • State of a footman
  • Status of the barracks
  • Must find Vo giving a good approximate value
    function (the form is sketched below)
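Written out, the approximation the slide describes is a standard linear decomposition (my notation; x_o stands for the state variables of object o and its related objects):

  V(x) ≈ Σ_o Vo(x_o)

so the overall value is a sum of small local value functions, one per object.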

12
Single LP Solution for Factored MDPs
[Schweitzer & Seidmann '85]
  • Variables for each Vo, for each object ⇒
  • polynomially many LP variables
  • One constraint for every state and action ⇒
  • exponentially many LP constraints
  • Vo, Qo depend on small sets of variables/actions
    ⇒
  • exploit structure as in variable elimination
    [Guestrin, Koller & Parr '01]

13
Representing Exponentially Many Constraints
Exponentially many linear constraints are equivalent to one
nonlinear constraint (spelled out below)
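Spelling out that equivalence, using the notation of the LP on slide 10:

  V(x) ≥ R(x,a) + γ Σ_x' P(x'|x,a) V(x')    for all x, a
      ⇔
  0 ≥ max_{x,a} [ R(x,a) + γ Σ_x' P(x'|x,a) V(x') − V(x) ]

The factored LP represents this single max compactly using variable elimination, as the next slides describe.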
14
Variable Elimination
  • Can use variable elimination to maximize over the
    state space [Bertele & Brioschi '72]
    (a runnable sketch follows below)

In the slide's example, only 23 sum operations are needed instead of 63
  • As in Bayes nets, maximization is exponential only in
    the tree-width
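A small, runnable Python sketch of maximizing a factored function by variable elimination; the factors, their binary domains, and the elimination order below are illustrative assumptions, not the slide's example:

  from itertools import product

  def make_factor(scope, fn):
      """Tabulate a factor over binary variables, keyed by assignments in scope order."""
      return (scope, {vals: fn(*vals) for vals in product([0, 1], repeat=len(scope))})

  # f(A,B,C,D) = f1(A,B) + f2(B,C) + f3(C,D): a chain, so the tree-width is 1.
  factors = [
      make_factor(("A", "B"), lambda a, b: 2 * a - b),
      make_factor(("B", "C"), lambda b, c: b * c),
      make_factor(("C", "D"), lambda c, d: c + 3 * d),
  ]

  def eliminate(factors, var):
      """Combine the factors that mention `var` and maximize it away."""
      touching = [f for f in factors if var in f[0]]
      rest = [f for f in factors if var not in f[0]]
      new_scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
      table = {}
      for vals in product([0, 1], repeat=len(new_scope)):
          assign = dict(zip(new_scope, vals))
          best = float("-inf")
          for x in (0, 1):
              assign[var] = x
              best = max(best, sum(t[tuple(assign[v] for v in s)] for s, t in touching))
          table[vals] = best
      return rest + [(new_scope, table)]

  # Eliminate one variable at a time; only small intermediate tables are built.
  fs = factors
  for var in ("A", "D", "B", "C"):
      fs = eliminate(fs, var)
  print(sum(t[()] for _, t in fs))  # the maximum of f over all 2^4 assignments

Enumerating all assignments would touch every factor for each one; elimination only ever builds tables over one variable at a time here, which is the kind of saving the slide's operation counts illustrate.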

15
Representing the Constraints
  • Since the functions are factored, use variable elimination
    to represent the constraints compactly

Number of constraints exponentially smaller
16
Roadmap to Generalization
  • Solve 1 world
  • Compute generalizable value function
  • Tackle a new world

17
Generalization
  • Sample a set of worlds
  • Solve a linear program for these worlds
  • Obtain class value functions
  • When faced with a new problem
  • Use the class value function
  • No replanning needed

18
Worlds and RMDPs
  • Meta-level MDP
  • Meta-level LP

19
Class-level Value Functions
  • Approximate solution to meta-level MDP
  • Linear approximation
  • Value function defined at the class level
  • All instances of a class use the same local value function
    (see the form below)
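Concretely, the class-level value function has the form (my notation; O[C](ω) denotes the objects of class C in world ω, and x_o the local state of object o):

  V_ω(x) ≈ Σ_C  Σ_{o ∈ O[C](ω)}  V_C(x_o)

so every object of a class contributes through the same shared function V_C.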

20
Class-level LP
  • Constraints for each world represented by
    factored LP
  • Number of worlds is exponential, or infinite
  • Sample worlds from the world distribution P(ω)

21
Theorem
  • Exponentially (infinitely) many worlds!
  • Do we need exponentially many samples?

NO!
Polynomially many sampled worlds suffice to obtain a value function
within ε, with probability at least 1-δ
(Rmax is the maximum class reward). Proof method
related to [de Farias & Van Roy '02]
22
LP with sampled worlds
  • Solve LP for sampled worlds
  • Use Factored LP for each world
  • Obtain class-level value function
  • New world: instantiate the value function and act

23
Learning Classes of Objects
  • Which classes of objects have the same value
    function?
  • Plan for sampled worlds individually
  • Use the resulting value functions as training data
  • Find objects with similar values
  • Include features of the world
  • Used decision-tree regression in the experiments
    (a sketch follows below)
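A minimal sketch of the class-discovery step, assuming scikit-learn is available; the object features, values, and the use of tree leaves as learned classes are illustrative assumptions, not the authors' exact procedure:

  from sklearn.tree import DecisionTreeRegressor

  # One row per object, pooled across the individually solved sample worlds:
  # features of the object/world, and the object's local value in that solution.
  # Feature columns here are hypothetical: [num_neighbors, is_server, world_size].
  X = [[1, 1, 5], [3, 0, 5], [2, 0, 8], [1, 1, 8]]
  y = [10.2, 4.1, 4.3, 9.8]  # local values from the per-world solutions
  objects = ["m1", "m2", "m3", "m4"]

  tree = DecisionTreeRegressor(max_depth=3)
  tree.fit(X, y)

  # Objects falling in the same leaf get similar predicted values;
  # treat each leaf as one learned class that shares a value function.
  classes = {}
  for obj, leaf in zip(objects, tree.apply(X)):
      classes.setdefault(leaf, []).append(obj)
  print(classes)

Each leaf groups objects whose local values are similar given the chosen features, and each group is then treated as a class with a shared class-level value function.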

24
Summary of Generalization Algorithm
  • Model the domain as a relational MDP
  • Pick local object value functions Vo
  • Learn classes by solving some instances
  • Sample set of worlds
  • Factored LP computes class-level value function

25
A New World
  • When faced with a new world ω, instantiate the
    class-level value function
  • The Q function decomposes the same way
  • At each state, choose the action maximizing Q(x,a)
  • Number of actions is exponential!
  • But each QC depends only on a few objects!!!

26
Local Q function Approximation
Q(A1,…,A4, X1,…,X4) ≈ Q1(A1,A4, X1,X4) + Q2(A1,A2, X1,X2)
                      + Q3(A2,A3, X2,X3) + Q4(A3,A4, X3,X4)
Q3 is associated with Agent 3
Limited observability: agent i only observes the
variables in Qi
Must choose a joint action maximizing Σi Qi
27
Maximizing Σi Qi: Coordination Graph
  • Use variable elimination for the maximization
    [Bertele & Brioschi '72] (a sketch follows below)
  • Limited communication suffices for the optimal action choice
  • Communication bandwidth = induced width of the coordination
    graph
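A tiny runnable sketch of action selection on a coordination graph: a three-agent chain with pairwise Q components (the Q tables and the elimination order are illustrative assumptions). Each elimination step is a small message computed from local information only, which is the limited-communication point above:

  from itertools import product

  ACTIONS = (0, 1)
  # Pairwise Q components on the chain A1 - A2 - A3 (illustrative numbers).
  Q12 = {(a1, a2): float(a1 ^ a2) for a1, a2 in product(ACTIONS, repeat=2)}
  Q23 = {(a2, a3): 2.0 * a2 + a3 for a2, a3 in product(ACTIONS, repeat=2)}

  # Eliminate agent 1: its best response and contribution, for each choice of A2.
  best_a1 = {a2: max(ACTIONS, key=lambda a1: Q12[(a1, a2)]) for a2 in ACTIONS}
  g1 = {a2: Q12[(best_a1[a2], a2)] for a2 in ACTIONS}

  # Eliminate agent 3 symmetrically.
  best_a3 = {a2: max(ACTIONS, key=lambda a3: Q23[(a2, a3)]) for a2 in ACTIONS}
  g3 = {a2: Q23[(a2, best_a3[a2])] for a2 in ACTIONS}

  # Agent 2 decides using only the two messages g1 and g3, then the choice is
  # propagated back so agents 1 and 3 play their recorded best responses.
  a2 = max(ACTIONS, key=lambda a: g1[a] + g3[a])
  joint_action = {"A1": best_a1[a2], "A2": a2, "A3": best_a3[a2]}
  print(joint_action, g1[a2] + g3[a2])  # maximizes Q12 + Q23 exactly

On this chain each message is a table over a single neighbor's action, so the communication needed matches the induced width of the graph, as the slide states.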

28
Summary of Algorithm
  • Model the domain as a relational MDP
  • Factored LP computes class-level value function
  • Reuse class-level value function in new world

29
Experimental Results
  • SysAdmin problem

30
Generalizing to New Problems
31
Generalizing to New Problems
32
Generalizing to New Problems
33
Classes of Objects Discovered
  • Learned 3 classes

34
Learning Classes of Objects
35
Learning Classes of Objects
36
Results
with Gearhart and Kanodia
  • 2 Peasants, Gold, Wood,
  • Barracks, 2 Footmen, Enemy
  • Reward for a dead enemy
  • About 1 million state/action pairs
  • Solve with Factored LP
  • Some factors are exponential
  • Coordination graph for action selection

37
Generalization
  • 9 Peasants, Gold, Wood,
  • Barracks, 3 Footmen, Enemy
  • Reward for a dead enemy
  • About 3 trillion state/action pairs
  • Instantiate generalizable value function
  • At run-time, factors are polynomial
  • Coordination graph for action selection

38
The 3 aspects of this talk
  • Scaling up collaborative multiagent planning
  • Exploiting structure
  • Generalization
  • Factored representation and algorithms
  • Relational MDP, Factored LP, coordination graph
  • Freecraft as a benchmark domain

39
Conclusions
  • RMDP
  • A compact representation for a set of similar
    planning problems
  • Solve single instance with factored MDP
    algorithms
  • Tackle sets of problems with class-level value
    functions
  • Efficient sampling of worlds
  • Learn classes of value functions
  • Generalization to new domains
  • Avoid replanning
  • Solve larger, more complex MDPs