Title: Structured Representations for POMDPs


1
Structured Representations for POMDPs
  • Guy Shani
  • Machine Learning and Applied Statistics
  • Microsoft Research

2
Structured vs. Flat
  • Flat: states, actions, observations
  • Structured:
  • States → state variables
  • Actions → action variables
  • Observations → observation variables
  • State variables: $X = \{X_1, \dots, X_n\}$
  • State: $s = \langle x_1, \dots, x_n \rangle$
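
The contrast between the two views can be sketched in a few lines. This is only an illustration: the three Boolean variables and the names `structured_states` and `flat_index` are assumptions made here, not part of the slides.

```python
from itertools import product

# Hypothetical example: three Boolean state variables.
variables = ["X1", "X2", "X3"]

# Structured view: a state is an assignment <x1, ..., xn> to the variables.
structured_states = [
    dict(zip(variables, values))
    for values in product([False, True], repeat=len(variables))
]

# Flat view: every assignment becomes one opaque state index.
flat_index = {tuple(s.values()): i for i, s in enumerate(structured_states)}

# The flat state space grows as 2 ** n for n Boolean variables.
num_flat_states = len(structured_states)
```

With n Boolean variables the flat enumeration has 2^n entries, which is why the structured view matters as n grows.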

3
System Dynamics as DBNs [Boutilier and Poole, 1996]
  • Dynamic Bayesian Networks: two-layered, model
    dynamic changes
  • Nodes: variables
  • Edges: dependencies
  • CPT: conditional probability table

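[Figure: DBN for the transition given action a, with nodes 1 to 4 of X at time t and of X' at time t+1. Example CPT entries: $\Pr(X_1' = T \mid X_1 = T, X_3 = F, a) = 0.2$ and $\Pr(X_1' = F \mid X_1 = T, X_3 = F, a) = 0.8$.]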
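
A CPT such as the one in the DBN figure can be stored as a small lookup table. The sketch below is an assumption-laden illustration: only the row for X1 = T, X3 = F comes from the slide, and the other rows, the dictionary `cpt_x1_true`, and the helper `pr_x1_next` are invented here.

```python
# CPT for X1' under action a, keyed by the parent values (x1, x3).
# Each entry is Pr(X1' = True | X1 = x1, X3 = x3, a).
cpt_x1_true = {
    (True, False): 0.2,   # from the slide
    (True, True): 0.9,    # hypothetical
    (False, False): 0.5,  # hypothetical
    (False, True): 0.5,   # hypothetical
}

def pr_x1_next(x1_next, x1, x3):
    """Return Pr(X1' = x1_next | X1 = x1, X3 = x3, a)."""
    p_true = cpt_x1_true[(x1, x3)]
    return p_true if x1_next else 1.0 - p_true
```

Storing only the probability of the True outcome is enough for a Boolean variable, since the False outcome is its complement.
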
4
Example: Rock Sample [Smith and Simmons, 2004]
Goal: move to interesting rocks and sample them.
[Figure: DBN for the action "sample rock i", with variables X, Y, Ri at time t and X', Y', R'i at time t+1.]

5
CPTs as Decision Diagrams
  • Decision diagrams:
  • Inner nodes: variables
  • Edges: values (left = False, right = True)
  • Leaves: hold values
  • Algebraic Decision Diagrams (ADDs):
  • Nodes with identical children are removed
  • Context-specific independence

[Figure: the same CPT over X1 and X3 shown as a table, as a decision diagram, and as an ADD, with leaf values .5, .9, and .2.]
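
The reduction rule (a node whose children are identical is removed) can be sketched as follows. The nested-tuple encoding, the functions `build_add` and `size`, and the table values are assumptions made for illustration. The sketch builds a tree rather than a shared graph, so it shows the reduction rule but not node sharing.

```python
def build_add(table, order):
    """Build a reduced decision diagram from a function table.

    table maps tuples of Booleans (one per variable in `order`) to values.
    A node is either a leaf value or a tuple (var, low, high), where
    low is the False branch and high is the True branch.
    """
    def rec(prefix, depth):
        if depth == len(order):
            return table[prefix]
        low = rec(prefix + (False,), depth + 1)
        high = rec(prefix + (True,), depth + 1)
        # Reduction rule: a node with identical children is removed.
        if low == high:
            return low
        return (order[depth], low, high)
    return rec((), 0)

def size(node):
    """Count inner nodes and leaves, treating the diagram as a tree."""
    if not isinstance(node, tuple):
        return 1
    return 1 + size(node[1]) + size(node[2])

# Hypothetical CPT: when X1 is False the value is .5 regardless of X3.
table = {
    (False, False): 0.5, (False, True): 0.5,
    (True, False): 0.2, (True, True): 0.9,
}
add = build_add(table, ["X1", "X3"])
```

Here the X3 test under X1 = False disappears, which is the context-specific independence mentioned on the slide: X3 matters only in the context X1 = True.
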
6
ADD Operations [Bryant, 1986]
  • Product
  • Sum
  • Inner product
  • Variable elimination: replace each node for Xi by
    the sum of its children
  • Translation: replace each occurrence of X by Y,
    assuming that Y did not appear in the original
    ADD
  • Reduce: reduces an ADD to its minimal form
  • The order of variables is important
  • All operations are implemented using traversals
    over the ADDs
  • Execution is enhanced by caching visited paths
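
The product, sum, and variable-elimination operations can be sketched as recursive traversals. This is a minimal illustration under stated assumptions: ADDs are nested `(var, low, high)` tuples with numeric leaves, `ORDER` is a hypothetical global variable order, and the helper names are invented here. Caching is left out for brevity.

```python
import operator

ORDER = ["X1", "X2", "X3"]  # hypothetical variable order shared by all ADDs

def is_leaf(node):
    return not isinstance(node, tuple)

def make(var, low, high):
    """Create a node, dropping it when both children are identical (reduce)."""
    return low if low == high else (var, low, high)

def cofactors(node, var):
    """Return the (False, True) children of node with respect to var."""
    if not is_leaf(node) and node[0] == var:
        return node[1], node[2]
    return node, node  # node does not test var at its root

def apply_op(op, f, g):
    """Combine two ADDs pointwise with a binary operator (product, sum)."""
    if is_leaf(f) and is_leaf(g):
        return op(f, g)
    # Expand on the earliest variable in ORDER among the two roots.
    roots = [n[0] for n in (f, g) if not is_leaf(n)]
    var = min(roots, key=ORDER.index)
    f0, f1 = cofactors(f, var)
    g0, g1 = cofactors(g, var)
    return make(var, apply_op(op, f0, g0), apply_op(op, f1, g1))

def sum_out(f, var):
    """Eliminate var: each node testing var becomes the sum of its children."""
    if is_leaf(f) or ORDER.index(f[0]) > ORDER.index(var):
        # var is not tested below this point, so both of its values give f.
        return apply_op(operator.add, f, f)
    if f[0] == var:
        return apply_op(operator.add, f[1], f[2])
    return make(f[0], sum_out(f[1], var), sum_out(f[2], var))

f = ("X1", 0.5, ("X3", 0.25, 0.75))
g = ("X3", 1.0, 2.0)
product = apply_op(operator.mul, f, g)
total = apply_op(operator.add, f, g)
```

Because both operands follow the same variable order, each recursive step only needs to compare the two root variables, which is one reason the order of variables matters.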

7
System Dynamics in Factored Form
  • $tr(s, a, s') = tr(\langle x_1, \dots, x_n \rangle, a, \langle x_1', \dots, x_n' \rangle)$
  • $O(a, s', o) = O(a, \langle x_1', \dots, x_n' \rangle, o)$
  • $P^{a,o}$: complete action-observation diagram
    [Hansen and Feng, 2000]
  • Can be computed by joining together CPTs (no
    products)
  • Problem: the resulting ADD might be large
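
For reference, the factored form can be written out explicitly. The first line is the standard DBN factorization, assuming no dependencies among the next-step variables; the notation $\mathrm{pa}(X_i')$ for the parents of $X_i'$ is introduced here and does not appear in the slides. The second line is one reading of the complete action-observation diagram, namely the transition function combined with the observation function, and should be taken as an assumption.

```latex
\begin{align*}
tr(s, a, s') &= \prod_{i=1}^{n} \Pr\bigl(x_i' \mid \mathrm{pa}(X_i'),\, a\bigr) \\
P^{a,o}(s, s') &= tr(s, a, s') \, O(a, s', o)
\end{align*}
```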

8
Value Iteration
  • Beliefs as ADDs
  • α-vectors as ADDs
  • Point-based backup:
  • ADDs
  • Belief update:
  • ADDs
  • Need to normalize using pr(o | b, a)
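
The belief update and its normalization step can be sketched over a flat state space, which is easier to read than the ADD version. The nested-list layout of `tr` and `O` and the function name `belief_update` are assumptions made for this illustration. In the structured setting the same sums and products would be carried out with ADD operations.

```python
def belief_update(belief, action, obs, tr, O):
    """Flat belief update: b'(s') is proportional to
    O(a, s', o) * sum over s of tr(s, a, s') * b(s).

    tr[a][s][s2] and O[a][s2][o] are nested lists of probabilities.
    Returns the new belief and the normalizer pr(o | b, a).
    """
    n = len(belief)
    unnormalized = [
        O[action][s2][obs] * sum(tr[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    pr_obs = sum(unnormalized)  # this is pr(o | b, a)
    if pr_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [v / pr_obs for v in unnormalized], pr_obs

# Hypothetical two-state model: the action leaves the state unchanged and
# the observation reports the true state with probability 0.75.
tr = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.75, 0.25], [0.25, 0.75]]]
new_belief, pr_obs = belief_update([0.5, 0.5], 0, 0, tr, O)
```

The normalizer is returned alongside the belief because point-based backups also need pr(o | b, a).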

9
Compressing ADDs [Hansen and Feng, 2001]
  • ADD size is influenced by the number of distinct values
  • ADDs can be compressed by joining similar values
  • α-vectors: join values that are ε-close
  • Beliefs: after joining values we must normalize
  • Never join zero and non-zero values

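[Figure: an ADD over X1 and X3 with leaf values .5, .6, .9, and .2. Step 1, "Compress 0.1 differences": values that differ by at most 0.1 are joined. Step 2, "Reduce": the resulting ADD is reduced to a smaller diagram.]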
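
Joining ε-close values and then reducing can be sketched as below. The grouping strategy (sort the leaves and greedily group values within ε of the smallest value in the group) is one simple choice made for this illustration, not necessarily the one used in the cited work. The nested `(var, low, high)` tuple encoding and the function names are also assumptions. For a belief, the result would still need to be normalized afterwards.

```python
def leaves(node):
    """Return the set of leaf values of a diagram of (var, low, high) tuples."""
    if not isinstance(node, tuple):
        return {node}
    return leaves(node[1]) | leaves(node[2])

def compress(node, eps):
    """Join leaf values that are eps-close, then reduce the diagram."""
    rep, anchor = {}, None
    for v in sorted(leaves(node)):
        # Start a new group when v is too far from the group's anchor,
        # or when joining would mix a zero with a non-zero value.
        if anchor is None or v - anchor > eps or (anchor == 0.0) != (v == 0.0):
            anchor = v
        rep[v] = anchor

    def rebuild(n):
        if not isinstance(n, tuple):
            return rep[n]
        low, high = rebuild(n[1]), rebuild(n[2])
        # Reduce: a node with identical children is removed.
        return low if low == high else (n[0], low, high)

    return rebuild(node)

# Hypothetical alpha-vector whose two X3 subtrees differ only slightly.
alpha = ("X1", ("X3", 0.5, 0.9), ("X3", 0.55, 0.9))
compressed = compress(alpha, 0.1)
```

After the two close values are joined, both X3 subtrees become identical, so the X1 test is removed by the reduction step.
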
10
Relevant Variables [Shani et al., 2008]
  • Some variables do not influence transitions or
    observations: $\Pr(x_i' = x_i \mid x_i, a) = 1.0$
  • A variable is relevant if it affects the
    transition or observation given an action.
  • The complete action-observation diagram can
    specify only relevant variables
  • Advantage: complete action diagrams become
    smaller
  • Exact method: no approximations
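
Finding the relevant variables for one action can be sketched from a DBN description. The encoding is an assumption made here: `effects` lists only the variables the action can change, each with the parents of its next-step value, and every unlisted variable keeps its value with probability 1. The RockSample-style example values are illustrative.

```python
def relevant_variables(effects, observation_parents):
    """Return the variables that affect the transition or the observation.

    effects maps each variable the action can change to the list of
    variables its next value depends on. Variables that are not listed
    persist unchanged and are therefore not relevant on their own.
    """
    relevant = set(observation_parents)
    for var, parents in effects.items():
        relevant.add(var)
        relevant.update(parents)
    return relevant

# Hypothetical encoding of "sample rock 0": only R0 can change, and the
# outcome depends on the robot position (X, Y) and on R0 itself.
all_vars = {"X", "Y", "R0", "R1", "R2"}
sample_rock_0 = relevant_variables({"R0": ["X", "Y", "R0"]}, [])
irrelevant = all_vars - sample_rock_0
```

The action diagram then needs to mention only the relevant variables, which is where the size reduction comes from.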

11
Relevant Variables
Goal: sample all good rocks.
Actions: Move (north, south, east, west), Check (long-range sensor), Sample (drill into rock).
[Figure: DBN for the action "sample rock 0", with variables X, Y, R0, R1, R2 at time t and at time t+1.]

12
Relevant Variables: Results
  • Relevant variables and variable orders over the
    RockSample domain.

13
Example: Network Administration
Given: computers connected in a network.
Goal: reduce downtime.
Actions: ping a machine, restart a machine, no-op.
[Figure: diagram over machines M0, M1, M2, M3.]

14
Example: Network Administration
No effect locality! After a few time steps everything is influenced by everything. The relevant-variables trick does not hold.
[Figure: DBN over machines M0 to M6, unrolled over time steps t, t+1, t+2.]

15
Beliefs as Product of Marginals [Boyen and Koller, 1998; Poupart, 2005]
  • Intuition: separate variables with low
    correlations
  • Replace a single belief ADD with a set of ADDs
    over disjoint sets of variables (components)
  • The belief over all variables is the product of
    the component ADDs
  • Exact if components are independent.
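
The approximation can be sketched with beliefs stored as plain dictionaries instead of ADDs. The function names and the two-variable example are assumptions made for illustration. The example shows that the product of marginals reproduces an independent belief and loses the correlation in a dependent one.

```python
from itertools import product

def marginal(joint, component):
    """Project a joint belief (assignment tuple -> prob) onto variable indices."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in component)
        out[key] = out.get(key, 0.0) + p
    return out

def product_of_marginals(marginals, components, n):
    """Rebuild a belief over n Boolean variables as a product of components."""
    joint = {}
    for assignment in product([False, True], repeat=n):
        p = 1.0
        for m, comp in zip(marginals, components):
            p *= m[tuple(assignment[i] for i in comp)]
        joint[assignment] = p
    return joint

components = [(0,), (1,)]  # each variable is its own component

# Independent belief: the approximation is exact.
independent = {(False, False): 0.375, (False, True): 0.125,
               (True, False): 0.375, (True, True): 0.125}
exact = product_of_marginals(
    [marginal(independent, c) for c in components], components, 2)

# Fully correlated belief: the correlation is lost.
correlated = {(False, False): 0.5, (False, True): 0.0,
              (True, False): 0.0, (True, True): 0.5}
lossy = product_of_marginals(
    [marginal(correlated, c) for c in components], components, 2)
```

In the correlated case every assignment ends up with probability 0.25, including the two that were impossible in the original belief.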

16
Beliefs as Products of Marginals
[Figure: a belief over machines M0 to M4 and its values, split into component ADDs.]

17
Beliefs as Product of Marginals
  • Straightforward solution:
  • First compute the complete belief ADD
  • Then eliminate variables
  • Advantage: products are computed only once
  • Variable elimination:
  • Eliminate variables after each ADD product
  • Keeps intermediate ADDs small
  • Runtime depends on ADD size
  • Need heuristics to order the products and
    eliminations
  • Disadvantage: products are recomputed repeatedly

18
Experiments
19
Basis Functions [Guestrin, Koller and Parr, 2001]
  • Problem: α-vectors become exponential in size
  • Idea: restrict α-vectors to linear combinations
    of basis functions
  • Basis function: a fixed function (a vector) over
    a subset of the state variables.
  • Reduction to basis functions can be done using
    linear programming (LP)
  • Can we compute the reduction without explicitly
    computing the complete function first?
  • As we do for the belief marginals.
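
One way to write the reduction as an LP is a max-norm projection. This is a sketch under assumptions: the slides do not say which LP is meant, and the symbols $h_i$ (basis functions), $w_i$ (weights), $k$ (number of basis functions), and $\phi$ (error bound) are introduced here.

```latex
\begin{align*}
\alpha(s) &\approx \sum_{i=1}^{k} w_i \, h_i(s) \\
\min_{w,\,\phi} \ \phi \quad \text{s.t.} \quad
  & \Bigl| \alpha(s) - \sum_{i=1}^{k} w_i \, h_i(s) \Bigr| \le \phi
  \quad \text{for all } s
\end{align*}
```

Each absolute-value constraint expands into two linear constraints. Written this way there is one pair of constraints per state, which is exactly the explicit enumeration that the question on the slide asks how to avoid.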

20
Relational POMDPs [Wang, 2007]
  • Captures identical dynamics over objects
  • Move(A,B)
  • Pre: Clear(B), Clear(A), ¬On(A,B)
  • Effect:
  • 0.7: On(A,B)
  • 0.3: ¬On(A,B)
  • Stronger structure than a regular factored POMDP
  • FODD: First-Order Decision Diagram
  • ADD over propositions

21
Other Structures
  • Exploiting different types of structure
  • Hierarchical: a hierarchy of POMDPs
  • [Pineau, 2002; Hansen, 2003; Foka et al.,
    2007]
  • Value-directed compression: exploit structure in
    the value function
  • [Poupart, 2003]
  • Belief compression: exploit structure in
    reachable beliefs
  • [Roy et al., 2004; Pineau et al., 2003]

22
Summary
  • Flat methods got us far
  • 10 states in 1998
  • 200,000 states in 2008
  • Factored methods got us to the next step
  • 20,000,000 states
  • We need to exploit more structure in order to
    scale up
  • Much research is needed