Title: A Framework for Planning in Continuous-time Stochastic Domains
1. A Framework for Planning in Continuous-time Stochastic Domains
Håkan L. S. Younes, Carnegie Mellon University
David J. Musliner, Honeywell Laboratories
Reid G. Simmons, Carnegie Mellon University
2. Introduction
- Policy generation for complex domains
- Uncertainty in outcome and timing of actions and events
- Time as a continuous quantity
- Concurrency
- Rich goal formalism
  - Achievement, maintenance, prevention
  - Deadlines
3. Motivating Example
- Deliver package from CMU to Honeywell
[Figure: map showing the route from CMU in Pittsburgh (via PIT) to Honeywell in Minneapolis (via MSP)]
4. Elements of Uncertainty
- Uncertain duration of flight and taxi ride
- Plane can get full without reservation
- Taxi might not be at airport when arriving in Minneapolis
- Package can get lost at airports
5. Modeling Uncertainty
- Associate a delay distribution F(t) with each action/event a
- F(t) is the cumulative distribution function for the delay from when a is enabled until it triggers
[Figure: Pittsburgh taxi as a semi-Markov process — in state "driving", the event "arrive" with delay distribution U(20,40) leads to state "at airport"]
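The delay-distribution model can be made concrete with a small sampler. This is a minimal sketch; the tuple encoding and the function name `sample_delay` are illustrative choices, not from the talk.

```python
import random

def sample_delay(dist):
    """Sample a trigger delay for an enabled action/event from its
    delay distribution, encoded as a (kind, *params) tuple."""
    kind, *params = dist
    if kind == "uniform":        # U(a, b)
        a, b = params
        return random.uniform(a, b)
    if kind == "exponential":    # Exp(rate)
        (rate,) = params
        return random.expovariate(rate)
    raise ValueError(f"unknown distribution: {kind}")

# The taxi's "arrive" event on the slide has delay distribution U(20, 40):
delay = sample_delay(("uniform", 20, 40))
```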
6. Concurrency
- Concurrent semi-Markov processes
[Figure: the Pittsburgh taxi (event "arrive", delay U(20,40)) and the Minneapolis taxi (events "move" and "return", with delay distributions U(10,20) and Exp(1/40)) evolve concurrently between their "driving", "at airport" and "moving" states]
- Generalized semi-Markov process
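The concurrent processes above can be executed with a next-event loop: among the events enabled in the current state, the one with the smallest sampled delay triggers first. This is only a sketch (full GSMP semantics would also keep the clocks of events that stay enabled across a transition; this version resamples every step), and all names here are illustrative.

```python
import random

def simulate_gsmp(initial_state, enabled, sample_delay, transition, horizon):
    """Next-event simulation sketch: repeatedly race the enabled events and
    apply the transition of the earliest one, up to the time horizon."""
    t, state = 0.0, initial_state
    path = [(t, state)]
    while True:
        events = enabled(state)
        if not events:
            break
        delays = {e: sample_delay(e) for e in events}
        winner = min(delays, key=delays.get)
        if t + delays[winner] > horizon:
            break
        t += delays[winner]
        state = transition(state, winner)
        path.append((t, state))
    return path

# Toy model of the Pittsburgh taxi: "arrive" is enabled while driving,
# has delay U(20, 40), and leads to the "at airport" state.
enabled = lambda s: ["arrive"] if s == "driving" else []
delay = lambda e: random.uniform(20, 40)
step = lambda s, e: "at airport"
trace = simulate_gsmp("driving", enabled, delay, step, horizon=300)
```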
7. Rich Goal Formalism
- Goals specified as CSL formulae
  - Φ ::= true | a | Φ ∧ Φ | ¬Φ | Pr≥θ(φ)
  - φ ::= Φ U≤t Φ | ◇≤t Φ | □≤t Φ
8. Goal for Motivating Example
- Probability at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way
- Pr≥0.9(¬pkg_lost U≤300 pkg_at_Honeywell)
9. Problem Specification
- Given
  - Complex domain model
    - Stochastic discrete event system
  - Initial state
  - Probabilistic temporally extended goal
    - CSL formula
- Wanted
  - Policy satisfying goal formula in initial state
10. Generate, Test and Debug [Simmons 1988]
[Flowchart: generate initial policy → test if policy is good → if good, done; if bad, debug and repair the policy, then repeat the test]
11. Generate
- Ways of generating initial policy
  - Generate policy for relaxed problem
  - Use existing policy for similar problem
  - Start with null policy
  - Start with random policy
- Not the focus of this talk!
12. Test
- Use discrete event simulation to generate sample execution paths
- Use acceptance sampling to verify probabilistic CSL goal conditions
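Each simulated path yields one Bernoulli observation for acceptance sampling: did the path satisfy the goal's path formula? A minimal check of a time-bounded until on one path might look as follows; the function name and the (time, state) path encoding are assumptions for illustration.

```python
def satisfies_until(path, phi1, phi2, t_bound):
    """Check the time-bounded until  phi1 U<=t phi2  on one sample execution
    path, given as (time, state) pairs in increasing time order: phi2 must
    hold in some state reached by t_bound, with phi1 holding until then."""
    for time, state in path:
        if time > t_bound:
            return False
        if phi2(state):
            return True
        if not phi1(state):
            return False
    return False

# One observation for the example goal  ¬pkg_lost U≤300 pkg_at_Honeywell:
path = [(0.0, "in transit"), (250.0, "pkg_at_Honeywell")]
ok = satisfies_until(path,
                     lambda s: s != "pkg_lost",
                     lambda s: s == "pkg_at_Honeywell", 300)
```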
13. Debug
- Analyze sample paths generated in test step to find reasons for failure
- Change policy to reflect outcome of failure analysis
14. More on Test Step
15. Error Due to Sampling
- Probability of false negative: α
  - Rejecting a good policy
- Probability of false positive: β
  - Accepting a bad policy
- (1−β)-soundness
16. Acceptance Sampling
17. Performance of Test
18. Ideal Performance
19. Realistic Performance
20. Sequential Acceptance Sampling [Wald 1945]
21. Graphical Representation of Sequential Test
22. Graphical Representation of Sequential Test
- We can find an acceptance line A_{p0,p1,α,β}(n) and a rejection line R_{p0,p1,α,β}(n) given p0, p1, α, and β
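The acceptance and rejection lines express Wald's sequential probability ratio test in terms of sample count versus successes; the equivalent log-likelihood-ratio form is easy to sketch. The function name and the Bernoulli encoding (True = goal satisfied on a sample path) are illustrative assumptions.

```python
from math import log

def sprt(samples, p0, p1, alpha, beta):
    """Wald's sequential probability ratio test for a Bernoulli probability p.
    H0: p >= p0 (accept the policy) vs. H1: p <= p1 (reject it), with p1 < p0;
    alpha/beta bound the false-negative/false-positive probabilities."""
    assert p1 < p0
    accept_bound = log(beta / (1.0 - alpha))   # accept H0 at or below this
    reject_bound = log((1.0 - beta) / alpha)   # accept H1 at or above this
    llr = 0.0  # log likelihood ratio of H1 vs. H0
    for x in samples:
        llr += log(p1 / p0) if x else log((1.0 - p1) / (1.0 - p0))
        if llr <= accept_bound:
            return "accept"
        if llr >= reject_bound:
            return "reject"
    return "undecided"  # ran out of samples before either bound was crossed
```

Note how the test is sequential: it stops as soon as the accumulated evidence crosses either bound, so easy cases need few samples.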
23. Graphical Representation of Sequential Test
24. Graphical Representation of Sequential Test
25. Anytime Policy Verification
- Find the best acceptance and rejection lines after each sample, in terms of α and β
26. Verification Example
- Initial policy for example problem
[Figure: error probability vs. CPU time (seconds) for verifying the initial policy, with a 0.01 error level marked]
27. More on Debug Step
28. Role of Negative Sample Paths
- Negative sample paths provide evidence on how the policy can fail
- They serve as counterexamples
29. Generic Repair Procedure
- Select some state along some negative sample path
- Change the action planned for the selected state
- Need heuristics to make informed state/action choices
30. Scoring States
- Assign −1 to the last state along each negative sample path and propagate the score backwards, discounted by γ
- Add the scores over all negative sample paths
[Figure: two negative sample paths through states s1, s5, s2, s3, s9 ending in failure, with states scored −1, −γ, −γ², −γ³, −γ⁴ from the failure state backwards]
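The scoring rule above is straightforward to sketch; the dictionary-based representation and the function name are illustrative assumptions.

```python
def score_states(negative_paths, gamma):
    """Heuristic state scoring from the slides: the last state on each
    negative sample path gets -1, earlier states get -gamma, -gamma**2, ...
    (propagated backwards), and scores are summed over all negative paths."""
    scores = {}
    for path in negative_paths:
        value = -1.0
        for state in reversed(path):
            scores[state] = scores.get(state, 0.0) + value
            value *= gamma
    return scores

# States that appear close to failure on many paths get the lowest scores,
# making them good candidates for repair:
scores = score_states([["s1", "s5", "s2", "s5", "s9"],
                       ["s1", "s5", "s3", "s5"]], gamma=0.9)
```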
31. Example
- Package gets lost at Minneapolis airport while waiting for the taxi
- Repair: store package until taxi arrives
32. Verification of Repaired Policy
[Figure: error probability vs. CPU time (seconds) for verifying the repaired policy, with a 0.01 error level marked]
33. Comparing Policies
- Use acceptance sampling
- Pair samples from the verification of the two policies
- Count the pairs where the policies' outcomes differ
- Prefer the first policy if, with probability at least 0.5, it is the better one in a pair where they differ
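The paired comparison can be sketched as a sign-test-style decision on the discordant pairs; a full treatment would again use acceptance sampling on this proportion rather than a fixed sample. All names here are illustrative.

```python
def compare_policies(paired_outcomes):
    """Prefer the first policy if, among the sample pairs where the two
    policies' outcomes differ, the first policy succeeds in at least half.
    `paired_outcomes` is a list of (first_ok, second_ok) booleans from
    verifying both policies on paired samples."""
    diff = [(a, b) for a, b in paired_outcomes if a != b]
    if not diff:
        return "no preference"
    first_better = sum(1 for a, b in diff if a and not b)
    return "first" if first_better / len(diff) >= 0.5 else "second"

# Four paired samples; the policies differ on three, and the first policy
# is the better one in two of those:
result = compare_policies([(True, True), (True, False),
                           (True, False), (False, True)])
```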
34. Summary
- Framework for dealing with complex stochastic domains
- Efficient sampling-based anytime verification of policies
- Initial work on debug and repair heuristics