Title: An Overview of Partially Observable Markov Decision Process
1. An Overview of Partially Observable Markov Decision Process
- Dr. Karl Altenburg
- 18 OCT 02
2. Introduction
- The operation of a UAV (or any automated device) depends on decisions and actions
- The decision of which action to take is typically based on the state of the environment
  - if sense_a_target then strike
  - if sense_a_threat then avoid
- Understandably, given limited sensors, a UAV's view of the state of the environment is only partially complete
  - It can't see through mountains, for example
3. Introduction
- A challenge in the design of intelligent agents, such as a UAV, is to find an effective mapping from environmental states to actions, even if the agent isn't completely sure which environmental state it is in
4. Markov Chains
- Developed by A.A. Markov in 1906
- Markov chains apply to chains of events
  - Repeated trials
  - Outcomes depend only on the outcome of the previous trial
- Each experiment has a finite, fixed number of outcomes called states
- The outcomes may be stochastic, that is, entry into the next state is governed by a probability
5. Markov Chains
- A Markov process may be depicted as a state transition diagram where the nodes are states and the transitions are labeled with the probabilities of moving to the succeeding state
[Figure: two-state transition diagram with states "heads" and "tails"; each of the four transitions (heads→heads, heads→tails, tails→heads, tails→tails) has probability 0.5]
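The slides contain no code, but the coin-flip chain in the figure can be simulated directly. This is a minimal sketch; the function and variable names (TRANSITIONS, step, simulate) are illustrative, not from the presentation.

```python
import random

# Two-state Markov chain from the figure: from either state, the next state is
# "heads" or "tails" with probability 0.5 each.
TRANSITIONS = {
    "heads": {"heads": 0.5, "tails": 0.5},
    "tails": {"heads": 0.5, "tails": 0.5},
}

def step(state: str) -> str:
    """Sample the successor state from the current state's transition distribution."""
    successors, probs = zip(*TRANSITIONS[state].items())
    return random.choices(successors, weights=probs)[0]

def simulate(start: str, n_steps: int) -> list[str]:
    """Run the chain for n_steps, returning the sequence of visited states."""
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

print(simulate("heads", 10))
```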
6. STDs and Agent Specification
- State Transition Diagrams are also used to depict finite state automata, which are often used to describe, and even specify, autonomous agent behavior
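As a minimal sketch of what such a specification might look like, here is a tiny finite state automaton for UAV behavior; the state names, triggers, and transitions are illustrative assumptions based on the Introduction slide, not part of the presentation.

```python
# Hypothetical finite state automaton for a simple UAV behavior.
# (state, trigger) -> next state; triggers echo the Introduction slide's examples.
FSM = {
    ("search", "sense_a_target"):   "strike",
    ("search", "sense_a_threat"):   "avoid",
    ("avoid",  "threat_cleared"):   "search",
    ("strike", "target_destroyed"): "search",
}

def advance(state: str, trigger: str) -> str:
    """Follow the transition for (state, trigger); stay in place if none is defined."""
    return FSM.get((state, trigger), state)

state = "search"
for trigger in ["sense_a_threat", "threat_cleared", "sense_a_target"]:
    state = advance(state, trigger)
    print(trigger, "->", state)
```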
7. FSM and Agents
8. Partially Observable Markov Decision Process
- For a good introduction see:
  - Atrash, A., and Koenig, S. 2001. Probabilistic Planning for Behavior-Based Robotics. GA Tech / AAAI.
- Describes a police robot that has a noisy sensor and the task of:
  - searching rooms for victims
  - avoiding rooms with terrorists
- Avoiding rooms with victims is bad
- Searching rooms with terrorists is worse
9. POMDP
- A POMDP may be described as follows (a small sketch in code follows the list):
  - S: finite set of states
  - O: finite set of observations
  - π: initial state distribution
  - s: current state
  - A(s): set of actions available in state s
  - a: an action
  - p(s' | s, a): transition function
  - q(o | s, a): observation function
  - r(s, a): reward function
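A minimal sketch of the tuple above as a data structure, plus the standard Bayesian belief update; the belief update and all names below are additions for illustration, not part of the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class POMDP:
    S: List[str]                              # finite set of states
    O: List[str]                              # finite set of observations
    pi: Dict[str, float]                      # initial state distribution
    A: Callable[[str], List[str]]             # A(s): actions available in state s
    p: Callable[[str, str, str], float]       # p(s_next, s, a) = Pr(s_next | s, a)
    q: Callable[[str, str, str], float]       # q(o, s, a)      = Pr(o | s, a)
    r: Callable[[str, str], float]            # r(s, a): immediate reward

def belief_update(m: POMDP, b: Dict[str, float], a: str, o: str) -> Dict[str, float]:
    """Standard belief filter: b'(s') is proportional to
    q(o | s', a) * sum over s of p(s' | s, a) * b(s)."""
    unnormalized = {
        s2: m.q(o, s2, a) * sum(m.p(s2, s, a) * b[s] for s in m.S)
        for s2 in m.S
    }
    z = sum(unnormalized.values()) or 1.0
    return {s2: v / z for s2, v in unnormalized.items()}
```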
10. (No transcript)
11. POMDP and Policy Graphs
- Given some observed state of the world, a decision maker must take some action, which results in a new observation and a reward
- The mapping of actions and observations may be done with a policy graph (see the sketch below)
  - Nodes are actions
  - Arcs are observations
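A minimal sketch of executing a policy graph; the node names, actions, and observations are illustrative assumptions loosely based on the police-robot example, not from the slides.

```python
# Hypothetical policy graph: each node carries an action, each arc is labeled by an observation.
ACTION_AT_NODE = {
    "n0": "move_to_next_room",
    "n1": "search_room",
    "n2": "skip_room",
}

# (node, observation) -> next node
EDGES = {
    ("n0", "sense_victim"):    "n1",
    ("n0", "sense_terrorist"): "n2",
    ("n1", "room_cleared"):    "n0",
    ("n2", "room_passed"):     "n0",
}

def execute(start_node: str, observations: list[str]) -> list[str]:
    """Walk the graph: take the current node's action, then follow the arc labeled
    by the next observation (staying put if that arc is undefined)."""
    node, actions_taken = start_node, []
    for obs in observations:
        actions_taken.append(ACTION_AT_NODE[node])
        node = EDGES.get((node, obs), node)
    return actions_taken

print(execute("n0", ["sense_victim", "room_cleared", "sense_terrorist"]))
```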
12. (No transcript)
13. Policy Graph
- The objective of the planner is to derive a policy (graph) that maximizes the average total reward over an infinite planning horizon (we don't know when things will end)
- γ: a discount factor
  - Typically set to slightly less than 1.0 (e.g., 0.9) to ensure that the total discounted reward is always finite (see the bound below)
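As a quick justification (not on the original slide): if the per-step reward is bounded by r_max, the discounted total reward is bounded by a geometric series,

    sum over t ≥ 0 of γ^t · r_t  ≤  r_max / (1 − γ),   for 0 ≤ γ < 1,

so the objective is always finite.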
14. Policy Graph
- Mapping policy graphs to finite state automata is straightforward
- Note:
  - Optimal policy graphs can potentially be large, but often are not
  - Finding an optimal policy graph is PSPACE-complete (see Papadimitriou & Tsitsiklis), and only feasible for small planning tasks
  - There may be unreachable nodes, which can be removed (see the pruning sketch below)
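A minimal sketch of removing unreachable nodes with a breadth-first reachability pass; the graph and node names are illustrative, reusing the earlier hypothetical policy graph plus an orphan node.

```python
from collections import deque

def prune_unreachable(edges: dict[tuple[str, str], str], start: str) -> set[str]:
    """Return the nodes reachable from `start`; any node outside this set can be dropped."""
    reachable, frontier = {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        for (src, _obs), dst in edges.items():
            if src == node and dst not in reachable:
                reachable.add(dst)
                frontier.append(dst)
    return reachable

edges = {
    ("n0", "sense_victim"):    "n1",
    ("n0", "sense_terrorist"): "n2",
    ("n1", "room_cleared"):    "n0",
    ("n2", "room_passed"):     "n0",
    ("n3", "anything"):        "n0",   # "n3" has no incoming arcs, so it is unreachable
}
print(prune_unreachable(edges, "n0"))  # {'n0', 'n1', 'n2'}
```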
15. (No transcript)
16. POMDP vs. FSM
- POMDPs assume discrete actions, with observations made after each action
- FSMs assume continuous behavior, with triggers that can be observed at any time
- POMDP planners assume every observation can be made in every state
  - Leads to a combinatorial explosion
- FSMs ignore non-trigger observations
- Atomic packaging and abstraction of actions are solutions to these differences
17. Ideas
- "Enter or continue" is very much like our UAV search-or-strike question
- POMDPs appear to be a good fit as a formal model for planning for emergent swarm intelligence
- Work needs to be done to clearly define the sets of states, actions, observations, and rewards