# Structured Representations for POMDPs


Slides: 23
Provided by: guys3


## Slide 1: Structured Representations for POMDPs
• Guy Shani
• Machine Learning and Applied Statistics
• Microsoft Research

## Slide 2: Structured vs. Flat
• Flat: states, actions, observations
• Structured:
• States → state variables
• Actions → action variables
• Observations → observation variables
• State variables: $X = \{X_1, \ldots, X_n\}$
• State: $s = \langle x_1, \ldots, x_n \rangle$
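As a small illustration of the flat/structured contrast (a sketch, assuming Boolean state variables with hypothetical names), each structured state is an assignment $\langle x_1, \ldots, x_n \rangle$, and enumerating every assignment recovers the flat state space, which has $2^n$ states:

```python
from itertools import product

def enumerate_states(var_names):
    """Return every flat state as a tuple of Boolean values, one per state variable."""
    return list(product([False, True], repeat=len(var_names)))

# Hypothetical Boolean state variables X1..X3.
STATE_VARS = ("X1", "X2", "X3")
states = enumerate_states(STATE_VARS)
print(len(states))                       # 8, i.e. 2**3
print(dict(zip(STATE_VARS, states[0])))  # {'X1': False, 'X2': False, 'X3': False}
```

This exhaustive enumeration is exactly what the structured methods in the following slides try to avoid materializing.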

## Slide 3: System Dynamics as DBNs [Boutilier and Poole, 1996]
• Dynamic Bayesian Networks: 2-layered, model dynamic changes
• Nodes: variables
• Edges: dependencies
• CPT: conditional probability table

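DBN for transition given action a
[Figure: two-layer DBN over $X_1, \ldots, X_4$ (current and next time step). Example CPT entries: $\Pr(X_1'=T \mid X_1=T, X_3=F, a) = 0.2$ and $\Pr(X_1'=F \mid X_1=T, X_3=F, a) = 0.8$.]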
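Here is a minimal Python sketch of how a two-layer DBN defines the transition function, assuming Boolean variables and that each next-step variable depends only on its parents in the previous layer. `DBN_A` and every CPT number except the slide's $\Pr(X_1'=T \mid X_1=T, X_3=F, a) = 0.2$ are hypothetical:

```python
from itertools import product

# Hypothetical two-slice DBN for a single action a over Boolean variables X1..X3.
# For each next-step variable X'_i: (parents at time t, CPT giving Pr(X'_i = True | parent values)).
DBN_A = {
    "X1": (("X1", "X3"), {(False, False): 0.5, (False, True): 0.5,
                          (True, False): 0.2, (True, True): 0.9}),
    "X2": (("X2",), {(False,): 0.1, (True,): 1.0}),
    "X3": (("X3",), {(False,): 0.0, (True,): 1.0}),
}
VARS = ("X1", "X2", "X3")

def transition_prob(dbn, s, s_next):
    """tr(s, a, s'): product over variables of Pr(x'_i | parents of X'_i, a)."""
    p = 1.0
    for var, (parents, cpt) in dbn.items():
        p_true = cpt[tuple(s[v] for v in parents)]
        p *= p_true if s_next[var] else 1.0 - p_true
    return p

s = {"X1": True, "X2": False, "X3": False}
print(transition_prob(DBN_A, s, {"X1": True, "X2": False, "X3": False}))  # approx. 0.2 * 0.9 * 1.0

# Sanity check: the next-state distribution sums to 1.
total = sum(transition_prob(DBN_A, s, dict(zip(VARS, vals)))
            for vals in product([False, True], repeat=len(VARS)))
print(total)
```

Only the per-variable CPTs are stored; the full $2^n \times 2^n$ transition matrix is never built.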
## Slide 4: Example: Rock Sample [Smith and Simmons, 2004]
Action: Sample rock i
[Figure: DBN over X, Y, and $R_i$ at times $t$ and $t+1$ ($R_i \to R_i'$).]
Goal: move to interesting rocks and sample them.
## Slide 5: CPTs as Decision Diagrams
• Decision diagrams
• Inner nodes: variables
• Edges: values (left = False, right = True)
• Leaves: hold values
• Nodes with identical children are removed
• Captures context-specific independence

[Figure: a CPT over $X_1$ and $X_3$ next to the equivalent decision diagram; leaf values .5, .9, and .2.]
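A minimal sketch of the decision-diagram rules above, assuming Boolean variables, a fixed variable order, and plain floats as leaves. `Node`, `make_node`, and `from_table` are illustrative names, and the CPT values are hypothetical. A CPT is Shannon-expanded into a diagram; a node whose two children are identical is skipped, and identical nodes are shared through a unique table:

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass(frozen=True)
class Node:
    """Inner node: tests `var`; `low` is the False (left) edge, `high` the True (right) edge."""
    var: str
    low: "ADD"
    high: "ADD"

ADD = Union[float, Node]  # leaves are plain floats

_unique: Dict[Tuple[str, ADD, ADD], Node] = {}  # shares structurally identical nodes

def make_node(var: str, low: ADD, high: ADD) -> ADD:
    """Apply the reduction rules: skip a test with identical children, reuse identical nodes."""
    if low == high:
        return low
    return _unique.setdefault((var, low, high), Node(var, low, high))

def from_table(order, table, prefix=()):
    """Shannon-expand a full table (keyed by value tuples in `order`) into a reduced diagram."""
    if len(prefix) == len(order):
        return table[prefix]
    var = order[len(prefix)]
    return make_node(var, from_table(order, table, prefix + (False,)),
                     from_table(order, table, prefix + (True,)))

def evaluate(add: ADD, assignment: Dict[str, bool]) -> float:
    """Follow edges from the root down to a leaf."""
    while isinstance(add, Node):
        add = add.high if assignment[add.var] else add.low
    return add

# Hypothetical CPT Pr(X1' = True | X1, X3, a); it ignores X3 whenever X1 is False.
cpt = {(False, False): 0.5, (False, True): 0.5, (True, False): 0.2, (True, True): 0.9}
root = from_table(("X1", "X3"), cpt)
print(root)  # Node(var='X1', low=0.5, high=Node(var='X3', low=0.2, high=0.9))
print(evaluate(root, {"X1": True, "X3": False}))  # 0.2
```

Because the hypothetical CPT gives 0.5 whenever $X_1$ is False, the $X_3$ test disappears on that branch; this is the context-specific independence the slide refers to.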
## Slide 6
• Product
• Sum
• Inner product
• Variable elimination: replace each $X_i$ node by the sum of its children
• Translation: replace each occurrence of X by Y, assuming that Y did not appear in the original
• Reduce: reduces an ADD (algebraic decision diagram) to its minimal form
• The order of variables is important
• All operations are implemented using traversals
• Execution is enhanced by caching visited paths
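The operations listed above can be sketched as traversals over ordered diagrams. This is a simplified sketch, not a library API: `apply`, `restrict`, and `sum_out` are hypothetical names, the global order `ORDER` is an assumption, and `functools.lru_cache` stands in for the cache of visited paths:

```python
import operator
from dataclasses import dataclass
from functools import lru_cache

ORDER = ("X1", "X2", "X3")  # hypothetical global variable order shared by all diagrams

@dataclass(frozen=True)
class Node:
    var: str
    low: object   # sub-diagram for var = False
    high: object  # sub-diagram for var = True

def node(var, low, high):
    """Reduction rule: a test whose two children are identical is dropped."""
    return low if low == high else Node(var, low, high)

def rank(f):
    """Position of f's root variable in ORDER (leaves come last)."""
    return ORDER.index(f.var) if isinstance(f, Node) else len(ORDER)

def split(f, var):
    """Children of f on `var`; if f's root is not `var`, ordering implies f does not test it."""
    if isinstance(f, Node) and f.var == var:
        return f.low, f.high
    return f, f

@lru_cache(maxsize=None)  # caches the result of each (op, f, g) call
def apply(op, f, g):
    """Pointwise op (e.g. product or sum) of two diagrams via a simultaneous ordered traversal."""
    if not isinstance(f, Node) and not isinstance(g, Node):
        return op(f, g)
    var = ORDER[min(rank(f), rank(g))]
    f0, f1 = split(f, var)
    g0, g1 = split(g, var)
    return node(var, apply(op, f0, g0), apply(op, f1, g1))

def restrict(f, var, value):
    """Fix var = value by replacing every test on var with the matching child."""
    if not isinstance(f, Node):
        return f
    if f.var == var:
        return f.high if value else f.low
    return node(f.var, restrict(f.low, var, value), restrict(f.high, var, value))

def sum_out(f, var):
    """Variable elimination: the sum of the var = False and var = True restrictions."""
    return apply(operator.add, restrict(f, var, False), restrict(f, var, True))

def evaluate(f, assignment):
    while isinstance(f, Node):
        f = f.high if assignment[f.var] else f.low
    return f

cpt = Node("X1", 0.5, Node("X3", 0.2, 0.9))  # hypothetical CPT as a diagram
g = Node("X2", 1.0, 2.0)                      # hypothetical second factor
prod = apply(operator.mul, cpt, g)
print(evaluate(prod, {"X1": True, "X2": True, "X3": False}))  # 0.4
print(sum_out(cpt, "X3"))
```

Every operation visits both diagrams in the shared order, which is why a poor variable order can blow up the intermediate results.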

## Slide 7: System Dynamics in Factored Form
• $tr(s,a,s') = tr(\langle x_1, \ldots, x_n \rangle, a, \langle x_1', \ldots, x_n' \rangle)$
• $O(a,s',o) = O(a, \langle x_1', \ldots, x_n' \rangle, o)$
• $P_{a,o}$: complete action-observation diagram [Hansen and Feng, 2000]
• Can be computed by joining together CPTs (no products)
• Problem: the resulting ADD might be large
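Assuming the DBN structure from slide 3 (each next-step state variable depends only on its parents in the previous layer, and each observation variable $o_j$ only on its parents in the next layer), the factored forms expand into products of CPT entries. The parent notation $\mathrm{pa}_a(\cdot)$ is introduced here for illustration:

$$tr(s,a,s') = \prod_{i=1}^{n} \Pr\big(x_i' \mid \mathrm{pa}_a(X_i'),\, a\big)$$

$$O(a,s',o) = \prod_{j} \Pr\big(o_j \mid \mathrm{pa}_a(O_j),\, a\big)$$

$$P_{a,o}(s,s') = tr(s,a,s')\, O(a,s',o) = \Pr(s', o \mid s, a)$$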

## Slide 8: Value Iteration
• Point-based backup
• Belief update: $b'(s') = \frac{O(a,s',o) \sum_{s} tr(s,a,s')\, b(s)}{pr(o \mid b,a)}$
• Need to normalize using $pr(o \mid b,a)$
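A flat (non-factored) Python sketch of the belief update with its $pr(o \mid b,a)$ normalizer, together with a standard point-based backup in the style of PBVI, may help. The two-state model and its numbers are hypothetical, and the backup shown is the textbook flat version, not necessarily the exact variant used in these slides:

```python
# Hypothetical two-state "listen" problem used only for illustration.
S = [0, 1]
A = ["listen"]
OBS = ["o0", "o1"]
TR = {"listen": {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}}                     # tr(s, a, s')
OBS_P = {"listen": {0: {"o0": 0.85, "o1": 0.15}, 1: {"o0": 0.15, "o1": 0.85}}}  # O(a, s', o)
R = {"listen": {0: -1.0, 1: -1.0}}                                              # r(s, a)

def belief_update(b, a, o):
    """b'(s') = O(a,s',o) * sum_s tr(s,a,s') b(s) / pr(o | b, a)."""
    unnorm = {s2: OBS_P[a][s2][o] * sum(TR[a][s][s2] * b[s] for s in S) for s2 in S}
    pr_o = sum(unnorm.values())  # pr(o | b, a), the normalizer from the slide
    return {s2: p / pr_o for s2, p in unnorm.items()}

def dot(b, alpha):
    return sum(b[s] * alpha[s] for s in S)

def point_based_backup(b, alphas, gamma=0.95):
    """Return the alpha-vector that is best at belief b after one backup of `alphas`."""
    best = None
    for a in A:
        alpha_ab = dict(R[a])
        for o in OBS:
            # g_{a,o}^alpha(s) = sum_{s'} tr(s,a,s') O(a,s',o) alpha(s') for every alpha
            gs = [{s: sum(TR[a][s][s2] * OBS_P[a][s2][o] * al[s2] for s2 in S) for s in S}
                  for al in alphas]
            g_best = max(gs, key=lambda g: dot(b, g))
            for s in S:
                alpha_ab[s] += gamma * g_best[s]
        if best is None or dot(b, alpha_ab) > dot(b, best):
            best = alpha_ab
    return best

b1 = belief_update({0: 0.5, 1: 0.5}, "listen", "o0")
print(b1)  # approximately {0: 0.85, 1: 0.15}
print(point_based_backup(b1, [{0: 0.0, 1: 0.0}]))  # {0: -1.0, 1: -1.0}
```

In the factored setting the same sums are carried out as ADD products and eliminations instead of explicit loops over $S$.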

## Slide 9
• ADD size is influenced by the number of distinct values
• ADDs can be compressed by joining similar values
• α-vectors: join values that are ε-close
• Beliefs: after joining values we must renormalize
• Never join zero and non-zero values

[Figure: "Compress 0.1 differences": an ADD over $X_1$ and $X_3$ with leaves including .5, .6, .9, and .2, shown at successive stages of compression and Reduce.]
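A sketch of the compression rule, assuming a belief stored as a flat dictionary and using the mean of each group as its representative (an assumption; other representatives are possible). Values are sorted and grouped greedily, zero is kept apart, and the belief is renormalized afterwards. `compress_values` and `compress_belief` are hypothetical names:

```python
def compress_values(values, eps):
    """Map each distinct value to a representative shared with values within eps of its group.

    Zero is never merged with non-zero values, and positive and negative values are
    grouped separately, so a non-zero value never becomes zero.
    """
    distinct = set(values)
    mapping = {v: v for v in distinct if v == 0.0}
    for positive in (True, False):
        same_sign = sorted(v for v in distinct if v != 0.0 and (v > 0.0) == positive)
        group = []
        for v in same_sign + [None]:
            # Close the group once the next value is more than eps above its smallest member.
            if group and (v is None or v - group[0] > eps):
                rep = sum(group) / len(group)  # assumption: the group mean is the representative
                for member in group:
                    mapping[member] = rep
                group = []
            if v is not None:
                group.append(v)
    return mapping

def compress_belief(belief, eps):
    """Join eps-close probabilities, then renormalize so the belief sums to 1 again."""
    mapping = compress_values(belief.values(), eps)
    joined = {s: mapping[p] for s, p in belief.items()}
    total = sum(joined.values())
    return {s: p / total for s, p in joined.items()}

b = {"s0": 0.5, "s1": 0.45, "s2": 0.05, "s3": 0.0}
print(compress_belief(b, eps=0.1))
```

Three distinct non-zero values become two, so the corresponding ADD needs fewer leaves. For an α-vector the same `compress_values` mapping applies, only without the renormalization step.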
## Slide 10: Relevant Variables [Shani et al., 2008]
• Some variables do not influence transitions or observations: $\Pr(x_i' = x_i \mid x_i, a) = 1.0$
• A variable is relevant if it affects the transition or observation given an action
• The complete action-observation diagram can specify only relevant variables
• Advantage: complete action diagrams become smaller
• Exact method: no approximations
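A simplified sketch of the relevance test, assuming Boolean variables and DBN CPTs stored as `(parents, Pr(X'_i = True | parents))`. It marks a variable as relevant when its own CPT is not the identity $\Pr(x_i' = x_i \mid x_i, a) = 1.0$, or when it is a parent of such a variable or of the observation. This is an assumed reading of the slide, not the exact procedure of Shani et al.; `SAMPLE_R0` is a hypothetical Boolean abstraction of RockSample's "sample rock 0":

```python
def is_persistent(var, parents, cpt):
    """True if Pr(x'_i = x_i | x_i, a) = 1.0 for every parent configuration."""
    if var not in parents:
        return False
    i = parents.index(var)
    return all(p == (1.0 if values[i] else 0.0) for values, p in cpt.items())

def relevant_variables(dbn, obs_parents=()):
    """Variables that can change under the action, plus every variable that those
    changes or the observation depend on (a simplified, assumed criterion)."""
    changed = {v for v, (parents, cpt) in dbn.items() if not is_persistent(v, parents, cpt)}
    relevant = set(changed) | set(obs_parents)
    for v in changed:
        relevant.update(dbn[v][0])  # parents of a changing variable affect the transition
    return relevant

# Hypothetical: sampling while at rock 0 makes rock 0 bad; nothing else changes.
SAMPLE_R0 = {
    "AtRock0": (("AtRock0",), {(False,): 0.0, (True,): 1.0}),
    "R0Good": (("AtRock0", "R0Good"), {(False, False): 0.0, (False, True): 1.0,
                                       (True, False): 0.0, (True, True): 0.0}),
    "R1Good": (("R1Good",), {(False,): 0.0, (True,): 1.0}),
}
print(sorted(relevant_variables(SAMPLE_R0)))  # ['AtRock0', 'R0Good']
```

Here `R1Good` never enters the complete action-observation diagram for this action, which is where the size savings come from.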

## Slide 11: Relevant Variables
Action: Sample rock 0
[Figure: DBN over X, Y, $R_0$, $R_1$, $R_2$ at times $t$ and $t+1$.]
Goal: sample all good rocks. Actions: Move (north, south, east, west), Check (long-range sensor), Sample (drill into rock).
## Slide 12: Relevant Variables Results
• Relevant variables and variable orders over the RockSample domain.

## Slide 13
Given: computers connected in a network
[Figure: network of machines M0-M3.]
Goal: reduce downtime. Actions: ping a machine, restart a machine, no-op.
## Slide 14
[Figure: the network DBN unrolled over times $t$, $t+1$, $t+2$ for machines M0-M6.]
No effect locality! After a few time steps everything is influenced by everything, so the relevant-variables trick does not hold.
## Slide 15: Beliefs as Product of Marginals [Boyen and Koller, 1998; Poupart, 2005]
• Intuition: separate variables with low correlations
• Maintain beliefs over disjoint sets of variables (components)
• The belief over all variables is the product of the component marginals
• Exact if the components are independent
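A flat sketch of the projection step, assuming Boolean variables and a joint belief stored as a dictionary from value tuples to probabilities (in practice the joint is never built; this only shows what the approximation computes). Components and numbers are hypothetical:

```python
from itertools import product

def marginal(joint, names, component):
    """Marginal of a joint belief (dict: value tuple -> prob) over the variables in `component`."""
    idx = [names.index(v) for v in component]
    m = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return m

def product_of_marginals(joint, names, components):
    """Approximate the joint as the product of marginals over disjoint components."""
    margs = [(comp, marginal(joint, names, comp)) for comp in components]
    approx = {}
    for assignment in product([False, True], repeat=len(names)):
        p = 1.0
        for comp, m in margs:
            p *= m.get(tuple(assignment[names.index(v)] for v in comp), 0.0)
        approx[assignment] = p
    return approx

NAMES = ("M0", "M1", "M2")
# Hypothetical belief: M0 and M1 are perfectly correlated; M2 is independent of both.
joint = {(True, True, True): 0.45, (True, True, False): 0.05,
         (False, False, True): 0.45, (False, False, False): 0.05}

exact = product_of_marginals(joint, NAMES, [("M0", "M1"), ("M2",)])
lossy = product_of_marginals(joint, NAMES, [("M0",), ("M1",), ("M2",)])
print(round(exact[(True, True, True)], 6), round(exact[(True, False, True)], 6))  # 0.45 0.0
print(round(lossy[(True, False, True)], 6))  # 0.225
```

With components that respect the correlation between M0 and M1 the product is exact; splitting them apart assigns probability 0.225 to an assignment the true belief rules out.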

## Slide 16: Beliefs as Products of Marginals
[Figure: variables M0-M4 and a grouping into components (M0, M3, M4) and (M1, M2), each with its own table of values.]
## Slide 17: Beliefs as Product of Marginals
• Straightforward solution
• First compute the complete belief ADD
• Then eliminate variables
• Advantage: products are computed only once
• Variable elimination
• Eliminate variables after each ADD product
• Runtime depends on ADD size
• Need heuristics to order the products and eliminations
• Disadvantage: products are recomputed repeatedly

## Slide 18: Experiments
## Slide 19: Basis Functions [Guestrin, Koller and Parr, 2001]
• Problem: α-vectors become exponential in size
• Idea: restrict α-vectors to linear combinations of basis functions
• Basis function: a fixed function (a vector) over a subset of the state variables
• Reduction to basis functions can be done using LP
• Can we compute the reduction without explicitly computing the complete function first?
• As we do for the belief marginals.
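A sketch of evaluating an α-vector that is a weighted sum of basis functions, assuming Boolean variables, hypothetical basis functions and weights, and (for the inner product) a fully factored belief. Fitting the weights, for example with the LP mentioned above, is not shown:

```python
from itertools import product

# Hypothetical basis functions: (scope, table over the scope's Boolean values).
BASIS = [
    (("M0",), {(False,): 0.0, (True,): 1.0}),
    (("M1", "M2"), {(False, False): 0.0, (False, True): 0.5,
                    (True, False): 0.5, (True, True): 1.0}),
]

def alpha_value(weights, basis, state):
    """alpha(s) = sum_i w_i * h_i(s restricted to scope_i); no table over all states is built."""
    return sum(w * table[tuple(state[v] for v in scope)]
               for w, (scope, table) in zip(weights, basis))

def expected_alpha(weights, basis, marginals):
    """b . alpha for a fully factored belief (marginals[v] = Pr(v = True)), by linearity:
    sum_i w_i * E_b[h_i], where each expectation only enumerates the scope of h_i."""
    total = 0.0
    for w, (scope, table) in zip(weights, basis):
        for values in product([False, True], repeat=len(scope)):
            p = 1.0
            for v, x in zip(scope, values):
                p *= marginals[v] if x else 1.0 - marginals[v]
            total += w * p * table[values]
    return total

weights = [2.0, 4.0]
print(alpha_value(weights, BASIS, {"M0": True, "M1": True, "M2": False}))  # 4.0
print(expected_alpha(weights, BASIS, {"M0": 0.5, "M1": 1.0, "M2": 0.0}))   # 3.0
```

By linearity, $b \cdot \alpha = \sum_i w_i \, \mathbb{E}_b[h_i]$, so the cost depends on the basis scopes rather than on the number of states.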

## Slide 20: Relational POMDPs [Wang, 2007]
• Captures identical dynamics over objects
• Move(A,B)
• Pre: Clear(B), Clear(A), ¬On(A,B)
• Effect:
• 0.7: On(A,B)
• 0.3: ¬On(A,B)
• Stronger structure than a regular factored POMDP
• FODD: First-Order Decision Diagram

## Slide 21: Other Structures
• Exploiting different types of structure
• Hierarchical: a hierarchy of POMDPs [Pineau, 2002; Hansen, 2003; Foka et al., 2007]
• Value-directed compression: exploit structure in the value function [Poupart, 2003]
• Belief compression: exploit structure in reachable beliefs [Roy et al., 2004; Pineau et al., 2003]

## Slide 22: Summary
• Flat methods got us far
• 10 states in 1998
• 200,000 states in 2008
• Factored methods got us to the next step
• 20,000,000 states
• We need to exploit more structure in order to scale up
• Much research is needed