Policy Generation for Continuous-time Stochastic Domains with Concurrency - PowerPoint PPT Presentation

Slides: 26
Provided by: hakany3
Learn more at: http://www.tempastic.org
Transcript and Presenter's Notes

1
Policy Generation for Continuous-time Stochastic
Domains with Concurrency
Håkan L. S. Younes, Carnegie Mellon University
Reid G. Simmons, Carnegie Mellon University
2
Introduction
  • Policy generation for asynchronous stochastic
    systems
  • Rich goal formalism
  • Policy generation and repair
      • Solve relaxed problem using deterministic
        temporal planner
      • Decision tree learning to generalize plan
      • Sample path analysis to guide repair

3
Motivating Example
  • Deliver package from CMU to Honeywell

[Figure: route from CMU (Pittsburgh, airport PIT) to Honeywell (Minneapolis, airport MSP)]
4
Elements of Uncertainty
  • Uncertain duration of flight and taxi ride
  • Plane can get full without reservation
  • Taxi might not be at airport when arriving in
    Minneapolis
  • Package can get lost at airports

Asynchronous events ⇒ not semi-Markov
5
Asynchronous Events
  • While the taxi is on its way to the airport, the
    plane may become full

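[Figure: the event "fill plane" occurs at time t0 and moves the system from state (driving taxi, plane not full) to state (driving taxi, plane full). Arrival time distribution of the taxi: F(t) in the first state, F(t | t > t0) in the second.]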
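
The conditioning in F(t | t > t0) can be illustrated with a minimal Python sketch. The uniform and exponential distributions and their parameters are assumptions made up for this example, not values from the slides. The point is that for a non-exponential trigger time the remaining time to arrival depends on how long the taxi has already been driving, so the state after "fill plane" must remember t0.

```python
import math

def conditional_cdf(cdf, t0):
    """Return the CDF of T given T > t0, i.e. F(t | t > t0)."""
    survival = 1.0 - cdf(t0)
    if survival <= 0.0:
        raise ValueError("the event has certainly triggered by t0")

    def conditioned(t):
        if t <= t0:
            return 0.0
        return (cdf(t) - cdf(t0)) / survival

    return conditioned

# Hypothetical arrival time distributions (minutes); parameters are illustrative.
def uniform_cdf(t, low=0.0, high=10.0):
    return min(1.0, max(0.0, (t - low) / (high - low)))

def exponential_cdf(t, rate=0.2):
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0

t0 = 5.0  # time at which the plane fills up
uniform_after_t0 = conditional_cdf(uniform_cdf, t0)
exponential_after_t0 = conditional_cdf(exponential_cdf, t0)

# Uniform: chance of arriving within 2.5 more minutes changes once t0 has elapsed.
before = uniform_cdf(2.5)
after = uniform_after_t0(t0 + 2.5)

# Exponential: memoryless, so conditioning on t > t0 only shifts the distribution.
memoryless_gap = abs(exponential_after_t0(t0 + 2.5) - exponential_cdf(2.5))
```

Only in the exponential case could the elapsed time be forgotten, which is why general delay distributions combined with asynchronous events fall outside the semi-Markov setting.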
6
Rich Goal Formalism
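  • Goals specified as CSL formulae
      • $\phi ::= \mathrm{true} \mid a \mid \phi \land \phi \mid \neg\phi \mid P_{\ge\theta}(\phi \; U^{\le T} \phi)$
  • Goal example
      • Probability is at least 0.9 that the package
        reaches Honeywell within 300 minutes without
        getting lost on the way
      • $P_{\ge 0.9}(\neg \mathit{lost}_{package} \; U^{\le 300} \; \mathit{at}_{pkg,honeywell})$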
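
As a rough illustration of what the time-bounded until operator asks of a single execution path, here is a small Python sketch. The path representation (a list of `(entry_time, state)` pairs with states as sets of atom strings) and the atom spellings are assumptions for this example only.

```python
def satisfies_until(path, phi1, phi2, time_bound):
    """Check phi1 U<=T phi2 on one piecewise-constant timed path.

    path: list of (entry_time, state) pairs with increasing entry times.
    A path that ends before reaching phi2 is treated as not satisfying
    the formula (an assumption about truncated paths).
    """
    for entry_time, state in path:
        if entry_time > time_bound:
            return False          # deadline passed before phi2 became true
        if phi2(state):
            return True           # phi2 reached in time, phi1 held until now
        if not phi1(state):
            return False          # phi1 violated before phi2 was reached
    return False

# Hypothetical encodings of the atoms in the goal example.
not_lost = lambda s: "lost(pkg)" not in s
delivered = lambda s: "at(pkg,honeywell)" in s

on_time = [(0.0, {"at(pkg,cmu)"}), (40.0, {"at(pkg,pgh-airport)"}),
           (280.0, {"at(pkg,honeywell)"})]
too_late = [(0.0, {"at(pkg,cmu)"}), (310.0, {"at(pkg,honeywell)"})]
lost = [(0.0, {"at(pkg,cmu)"}), (40.0, {"lost(pkg)"}),
        (200.0, {"lost(pkg)", "at(pkg,honeywell)"})]

results = [satisfies_until(p, not_lost, delivered, 300.0)
           for p in (on_time, too_late, lost)]
```

The probabilistic goal then asks that the fraction of paths for which this check succeeds is at least 0.9.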

7
Problem Specification
  • Given
      • Complex domain model
          • Stochastic discrete event system
      • Initial state
      • Probabilistic temporally extended goal
          • CSL formula
  • Wanted
      • Policy satisfying goal formula in initial state

8
Generate, Test and Debug [Simmons, AAAI-88]
[Flowchart: Generate initial policy → Test if policy is good → if good, done; if bad, Debug and repair policy, then repeat the test]
9
Generate
  • Ways of generating initial policy
  • Generate policy for relaxed problem
  • Use existing policy for similar problem
  • Start with null policy
  • Start with random policy

10
Test [Younes et al., ICAPS-03]
  • Use discrete event simulation to generate sample
    execution paths
  • Use acceptance sampling to verify probabilistic
    CSL goal conditions
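
One standard form of acceptance sampling is Wald's sequential probability ratio test. The sketch below is an assumption about how such a test could be set up for a goal of the form "probability at least θ"; the indifference region half-width `delta`, the error bounds `alpha` and `beta`, and the `sample` callable (one simulated path per call) are illustrative choices, not parameters taken from the slides.

```python
import math
import random

def sprt(sample, theta, delta=0.05, alpha=0.01, beta=0.01, max_samples=100_000):
    """Sequential test of H0: p >= theta + delta against H1: p <= theta - delta.

    sample() simulates one execution path and returns True if it satisfies
    the path formula. Returns (decision, samples_used), where decision is
    True (accept that the goal holds), False (reject), or None (undecided).
    """
    p0, p1 = theta + delta, theta - delta
    if not 0.0 < p1 < p0 < 1.0:
        raise ValueError("indifference region must lie strictly inside (0, 1)")
    accept_h1 = math.log((1.0 - beta) / alpha)
    accept_h0 = math.log(beta / (1.0 - alpha))
    log_ratio = 0.0  # log of P(data | p1) / P(data | p0)
    for n in range(1, max_samples + 1):
        if sample():
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1.0 - p1) / (1.0 - p0))
        if log_ratio <= accept_h0:
            return True, n
        if log_ratio >= accept_h1:
            return False, n
    return None, max_samples

# Usage with a stand-in simulator whose true success probability is 0.99.
rng = random.Random(0)
decision, samples_used = sprt(lambda: rng.random() < 0.99, theta=0.9)
```

The number of simulated paths adapts to the data: policies that are clearly good or clearly bad are decided after few samples, while borderline ones need more.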

11
Debug
  • Analyze sample paths generated in test step to
    find reasons for failure
  • Change policy to reflect outcome of failure
    analysis

12
Closer Look at Generate Step
[Flowchart, Generate step highlighted: Generate initial policy → Test if policy is good → Debug and repair policy]
13
Policy Generation
Probabilistic planning problem
  ↓ Eliminate uncertainty
Deterministic planning problem
  ↓ Solve using temporal planner (e.g. VHPOP [Younes & Simmons, JAIR 20])
Temporal plan
  ↓ Generate training data by simulating plan
State-action pairs
  ↓ Decision tree learning
Policy (decision tree)
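
The last stage of this pipeline, turning state-action pairs into a decision tree policy, can be sketched with a small ID3-style learner. This is a generic textbook learner substituted for illustration; the slides do not say which learner is used at this step. States are assumed to be sets of atom strings, and the example states are hypothetical.

```python
import math
from collections import Counter

def entropy(examples):
    """Shannon entropy of the action labels in a list of (state, action) pairs."""
    counts = Counter(action for _, action in examples)
    total = len(examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def split(examples, atom):
    """Partition examples by whether the atom holds in the state."""
    yes = [ex for ex in examples if atom in ex[0]]
    no = [ex for ex in examples if atom not in ex[0]]
    return yes, no

def learn_tree(examples, atoms):
    """Return an action (leaf) or a tuple (atom, subtree_if_true, subtree_if_false)."""
    actions = Counter(action for _, action in examples)
    majority = actions.most_common(1)[0][0]
    if len(actions) == 1:
        return majority
    best, best_cost = None, float("inf")
    for atom in atoms:
        yes, no = split(examples, atom)
        if not yes or not no:
            continue  # atom does not separate these examples
        cost = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(examples)
        if cost < best_cost:
            best, best_cost = atom, cost
    if best is None:
        return majority
    yes, no = split(examples, best)
    rest = [a for a in atoms if a != best]
    return (best, learn_tree(yes, rest), learn_tree(no, rest))

def act(tree, state):
    """Follow the tree from the root to a leaf and return the action there."""
    while isinstance(tree, tuple):
        atom, if_true, if_false = tree
        tree = if_true if atom in state else if_false
    return tree

# Hypothetical training data in the spirit of the taxi example.
examples = [
    (frozenset({"at(me,cmu)", "at(pgh-taxi,cmu)"}), "enter-taxi"),
    (frozenset({"in(me,pgh-taxi)", "at(pgh-taxi,cmu)"}), "depart-taxi"),
    (frozenset({"in(me,pgh-taxi)", "moving(pgh-taxi,cmu,pgh-airport)"}), "idle"),
    (frozenset({"in(me,pgh-taxi)", "at(pgh-taxi,pgh-airport)"}), "leave-taxi"),
    (frozenset({"at(me,pgh-airport)", "at(pgh-taxi,pgh-airport)"}), "check-in"),
    (frozenset({"in(me,plane)"}), "idle"),
]
atoms = sorted(set().union(*(state for state, _ in examples)))
tree = learn_tree(examples, atoms)
```

Because the tree tests individual atoms rather than whole states, it also returns an action for states that never appeared in the simulated plan, which is the generalization the pipeline relies on.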
14
Conversion to Deterministic Planning Problem
  • Assume we can control nature
  • Exogenous events are treated as actions
  • Actions with probabilistic effects are split into
    multiple deterministic actions
  • Trigger time distributions are turned into
    interval duration constraints
  • Objective: Find some execution trace satisfying
    path formula $\phi_1 \; U^{\le T} \phi_2$ of probabilistic goal
    $P_{\ge\theta}(\phi_1 \; U^{\le T} \phi_2)$
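
Two of the conversion rules above (splitting probabilistic effects, and replacing a trigger time distribution by an interval) can be sketched as follows. The class names, the `support` argument, and the example action with its numbers are hypothetical; they only illustrate the shape of the transformation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProbabilisticAction:
    name: str
    outcomes: tuple  # pairs (probability, effect)

@dataclass(frozen=True)
class DeterministicAction:
    name: str
    effect: str
    min_duration: float
    max_duration: float

def determinize(action, support):
    """Split an action into one deterministic action per possible outcome.

    support is the (low, high) interval on which the trigger time
    distribution has positive density; the distribution's shape and the
    outcome probabilities are discarded, as if nature could be controlled.
    """
    low, high = support
    return [
        DeterministicAction(f"{action.name}-{i}", effect, low, high)
        for i, (probability, effect) in enumerate(action.outcomes)
        if probability > 0.0
    ]

# Hypothetical example: moving the package through an airport may lose it.
transfer = ProbabilisticAction(
    "transfer-package",
    ((0.95, "at(pkg,plane)"), (0.05, "lost(pkg)"), (0.0, "unreachable")),
)
deterministic_actions = determinize(transfer, support=(5.0, 30.0))
```

The deterministic planner is then free to pick whichever outcome and whichever duration within the interval yields a trace that satisfies the path formula.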

15
Generating Training Data
Plan events: enter-taxi(me, pgh-taxi, cmu); depart-plane(plane, pgh-airport, mpls-airport); depart-taxi(me, pgh-taxi, cmu, pgh-airport); arrive-taxi(pgh-taxi, cmu, pgh-airport); leave-taxi(me, pgh-taxi, pgh-airport); check-in(me, plane, pgh-airport)

State  Action
s0     enter-taxi(me, pgh-taxi, cmu)
s1     depart-taxi(me, pgh-taxi, cmu, pgh-airport)
s2     idle
s3     leave-taxi(me, pgh-taxi, pgh-airport)
s4     check-in(me, plane, pgh-airport)
s5     idle

[Figure: simulated state sequence over states s0 to s6]
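
A minimal sketch of how such state-action pairs might be produced by simulating a plan. The step encoding (name, controllable flag, add and delete sets), the execution order, and the atom names are assumptions for illustration; the one idea taken from the slide is that the recorded action is "idle" whenever the next plan step is an exogenous event rather than an action of the agent.

```python
def training_pairs(initial_state, plan):
    """Simulate a plan and record one (state, action) pair per step.

    plan: list of (name, controllable, add_atoms, delete_atoms) in execution
    order. Exogenous steps (controllable=False) are recorded as "idle",
    since the agent can only wait for them.
    """
    state = frozenset(initial_state)
    pairs = []
    for name, controllable, add_atoms, delete_atoms in plan:
        pairs.append((state, name if controllable else "idle"))
        state = (state - frozenset(delete_atoms)) | frozenset(add_atoms)
    return pairs

# Hypothetical effects for the plan steps named on the slide.
initial = {"at(me,cmu)", "at(pgh-taxi,cmu)", "at(plane,pgh-airport)"}
plan = [
    ("enter-taxi", True, {"in(me,pgh-taxi)"}, {"at(me,cmu)"}),
    ("depart-taxi", True, {"moving(pgh-taxi,cmu,pgh-airport)"}, {"at(pgh-taxi,cmu)"}),
    ("arrive-taxi", False, {"at(pgh-taxi,pgh-airport)"}, {"moving(pgh-taxi,cmu,pgh-airport)"}),
    ("leave-taxi", True, {"at(me,pgh-airport)"}, {"in(me,pgh-taxi)"}),
    ("check-in", True, {"in(me,plane)"}, {"at(me,pgh-airport)"}),
    ("depart-plane", False, {"moving(plane,pgh-airport,mpls-airport)"}, {"at(plane,pgh-airport)"}),
]
pairs = training_pairs(initial, plan)
recorded_actions = [action for _, action in pairs]
```

Under these assumptions the recorded actions follow the same pattern as the table: two taxi actions, an idle step while the taxi drives, then leave-taxi, check-in, and a final idle step.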
16
Policy Tree
[Figure: decision tree policy. Internal nodes test the atoms at(pgh-taxi, cmu), at(me, cmu), at(plane, mpls-airport), at(mpls-taxi, mpls-airport), at(me, pgh-airport), in(me, plane), moving(mpls-taxi, mpls-airport, honeywell), at(me, mpls-airport), and moving(pgh-taxi, cmu, pgh-airport). Leaves are the actions enter-taxi, depart-taxi, check-in, leave-taxi, and idle.]
17
Closer Look at Debug Step
[Flowchart, Debug step highlighted: Generate initial policy → Test if policy is good → Debug and repair policy]
18
Policy Debugging
Sample execution paths
  ↓ Sample path analysis
Failure scenarios
  ↓ Solve deterministic planning problem taking failure scenario into account
Temporal plan
  ↓ Generate training data by simulating plan
State-action pairs
  ↓ Incremental decision tree learning [Utgoff et al., MLJ 29]
Revised policy
19
Sample Path Analysis
  • Construct Markov chain from paths
  • Assign values to states
      • Failure: -1; Success: +1
      • All other
  • Assign values to events
      • $V(s') - V(s)$ for transition $s \to s'$ caused by $e$
  • Generate failure scenarios
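
The first three steps can be sketched in Python. Two parts are assumptions, because the transcript does not show them: non-terminal states get the discounted expected value of their successors, V(s) = γ Σ p(s, s') V(s'), and an event's value is the mean of V(s') - V(s) over all transitions it caused. The sample paths and the labels (s2 is a failure state, s3 a success state) follow the example slides.

```python
from collections import defaultdict

def transitions(path):
    """Yield (state, event, next_state) from an alternating list [s0, e1, s1, ...]."""
    for i in range(0, len(path) - 2, 2):
        yield path[i], path[i + 1], path[i + 2]

def analyze(paths, terminal_values, gamma=0.9, sweeps=100):
    """Return (state_values, event_values) estimated from sample paths."""
    counts = defaultdict(lambda: defaultdict(int))  # empirical Markov chain
    for path in paths:
        for s, _, s_next in transitions(path):
            counts[s][s_next] += 1

    states = {s for path in paths for s in path[0::2]}
    values = {s: terminal_values.get(s, 0.0) for s in states}
    for _ in range(sweeps):  # simple iterative value propagation
        for s in states:
            total = sum(counts[s].values())
            if s in terminal_values or total == 0:
                continue
            values[s] = gamma * sum(
                n / total * values[s_next] for s_next, n in counts[s].items()
            )

    deltas = defaultdict(list)
    for path in paths:
        for s, e, s_next in transitions(path):
            deltas[e].append(values[s_next] - values[s])
    event_values = {e: sum(d) / len(d) for e, d in deltas.items()}
    return values, event_values

sample_paths = [
    ["s0", "e1", "s1", "e2", "s2"],
    ["s0", "e1", "s1", "e4", "s4", "e2", "s2"],
    ["s0", "e3", "s3"],
]
state_values, event_values = analyze(sample_paths, {"s2": -1.0, "s3": +1.0})
worst_event = min(event_values, key=event_values.get)
```

Events with strongly negative values are the ones that move execution toward failure, which makes them natural candidates for the failure scenarios generated in the last step.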

20
Sample Path Analysis Example
Sample paths
  1. s0 --e1--> s1 --e2--> s2
  2. s0 --e1--> s1 --e4--> s4 --e2--> s2
  3. s0 --e3--> s3
21
Failure Scenarios
Failure paths
  1. s0 --e1--> s1 --e2--> s2
  2. s0 --e1--> s1 --e4--> s4 --e2--> s2

Failure path 1   Failure path 2   Failure scenario
e1 @ 1.2         e1 @ 1.6         e1 @ 1.4
e2 @ 4.4         e4 @ 4.5         e2 @ 4.6
-                e2 @ 4.8         -
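
The numbers in the table are consistent with keeping only the events that occur in every failure path and averaging their trigger times. The sketch below implements that reading; it is an interpretation of the table, not a procedure stated on the slide, and it assumes each event occurs at most once per path.

```python
def failure_scenario(failure_paths):
    """Combine timed failure paths into one scenario.

    failure_paths: list of paths, each a list of (event, time) pairs.
    Keeps events common to all paths and averages their times.
    """
    common = set.intersection(*({event for event, _ in path} for path in failure_paths))
    scenario = []
    for event in common:
        times = [t for path in failure_paths for e, t in path if e == event]
        scenario.append((event, sum(times) / len(times)))
    return sorted(scenario, key=lambda item: item[1])  # order by time

path_1 = [("e1", 1.2), ("e2", 4.4)]
path_2 = [("e1", 1.6), ("e4", 4.5), ("e2", 4.8)]
scenario = failure_scenario([path_1, path_2])
```

Here e4 is dropped because it appears in only one of the two failure paths, while e1 and e2 are kept at the mean of their observed times.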
22
Additional Training Data
Plan events: leave-taxi(me, pgh-taxi, cmu); depart-plane(plane, pgh-airport, mpls-airport); fill-plane(plane, pgh-airport); make-reservation(me, plane, cmu); enter-taxi(me, pgh-taxi, cmu); depart-taxi(me, pgh-taxi, cmu, pgh-airport); arrive-taxi(pgh-taxi, cmu, pgh-airport)

State  Action
s0     leave-taxi(me, pgh-taxi, cmu)
s1     make-reservation(me, plane, cmu)
s2     enter-taxi(me, pgh-taxi, cmu)
s3     depart-taxi(me, pgh-taxi, cmu, pgh-airport)
s4     idle
s5     idle

[Figure: simulated state sequence over states s0 to s6]
23
Revised Policy Tree
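[Figure: fragment of the revised decision tree policy. Internal nodes test at(pgh-taxi, cmu), at(me, cmu), and has-reservation(me, plane) (twice). Leaves are the actions enter-taxi, depart-taxi, make-reservation, and leave-taxi.]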
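
The transcript lists the tests and leaves of the revised tree fragment but not how they are nested. The Python sketch below shows one hypothetical nesting, chosen only because it agrees with the actions listed on the additional training data slide (leave the taxi or make a reservation when there is none, enter or depart when there is one). The atom spellings and the "idle" fallback for the rest of the tree are assumptions as well.

```python
def revised_policy(state):
    """One hypothetical reading of the revised tree fragment.

    state: set of atom strings that currently hold.
    """
    if "at(pgh-taxi,cmu)" in state:
        has_reservation = "has-reservation(me,plane)" in state
        if "at(me,cmu)" in state:
            # Standing at CMU: reserve a seat first, then get into the taxi.
            return "enter-taxi" if has_reservation else "make-reservation"
        # Assumed to be sitting in the taxi: only depart with a reservation.
        return "depart-taxi" if has_reservation else "leave-taxi"
    return "idle"  # placeholder for the part of the tree not shown

chosen = [
    revised_policy({"at(pgh-taxi,cmu)", "in(me,pgh-taxi)"}),
    revised_policy({"at(pgh-taxi,cmu)", "at(me,cmu)"}),
    revised_policy({"at(pgh-taxi,cmu)", "at(me,cmu)", "has-reservation(me,plane)"}),
    revised_policy({"at(pgh-taxi,cmu)", "in(me,pgh-taxi)", "has-reservation(me,plane)"}),
]
```

Read this way, the repair guards the taxi ride with a reservation check, which addresses the failure scenario in which the plane fills up while the taxi is on its way.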
24
Summary
  • Planning with stochastic asynchronous events
    using a deterministic planner
  • Decision tree learning to generalize
    deterministic plan
  • Sample path analysis for generating failure
    scenarios to guide plan repair

25
Coming Attractions
  • Decision theoretic planning with asynchronous
    events
      • "A formalism for stochastic decision processes
        with asynchronous events", MDP Workshop at
        AAAI-04
      • "Solving GSMDPs using continuous phase-type
        distributions", AAAI-04