1
An Overview of Partially Observable Markov
Decision Process
  • Dr. Karl Altenburg
  • 18 October 2002

2
Introduction
  • The operation of a UAV (or any automated device)
    depends on decisions and actions
  • The decision of which action to take is typically
    based on the state of the environment
  • if sense_a_target then strike
  • if sense_a_threat then avoid
  • Understandably, given limited sensors, a UAV's
    view of the state of the environment is only
    partially complete
  • It can't see through mountains, for example

3
Introduction
  • A challenge in the design of intelligent agents,
    such as a UAV, is to find an effective mapping
    from environmental states to actions, even when
    the agent isn't completely sure which
    environmental state it's in

4
Markov Chains
  • Developed by A. A. Markov in 1906
  • Markov chains are applied to chains of events
  • Repeated trials
  • Outcomes depend only on the outcome of the
    previous trial
  • Each experiment has a finite, fixed number of
    outcomes, called states
  • The outcomes may be stochastic, that is, entry
    into the next state is based on a probability

5
Markov Chains
  • A Markov process may be depicted as a state
    transition diagram, where the nodes are states
    and the arcs are labeled with the probabilities
    of moving to each succeeding state (a simulation
    sketch follows the diagram below)

[Diagram: two-state Markov chain for a fair coin flip, with states
"heads" and "tails" and each of the four transitions labeled 0.5]
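
As an illustrative sketch (not part of the original slides), the
coin-flip chain above can be simulated in a few lines of Python; the
state names and probabilities follow the diagram:

```python
import random

# Two-state Markov chain for the fair-coin diagram above: from either
# state, the next state is "heads" or "tails" with probability 0.5.
TRANSITIONS = {
    "heads": {"heads": 0.5, "tails": 0.5},
    "tails": {"heads": 0.5, "tails": 0.5},
}

def step(state):
    """Entry into the next state is stochastic: sample a successor."""
    successors = list(TRANSITIONS[state])
    weights = list(TRANSITIONS[state].values())
    return random.choices(successors, weights=weights)[0]

state = "heads"
for _ in range(5):
    state = step(state)
    print(state)
```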
6
STDs and Agent Specification
  • State Transitions Diagrams are also used to
    depict finite state automata, which are often
    used to describe, and even specify, autonomous
    agent behavior
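
A minimal sketch of such a specification, reusing the trigger names
from the introduction; the behavior states and the other triggers are
hypothetical, not the slides' own design:

```python
# Hypothetical FSM for an autonomous agent: maps (state, trigger)
# pairs to successor behavior states. sense_a_target / sense_a_threat
# come from the introduction slide; the rest are assumed names.
FSM = {
    ("patrol", "sense_a_target"): "strike",
    ("patrol", "sense_a_threat"): "avoid",
    ("strike", "target_destroyed"): "patrol",
    ("avoid", "threat_cleared"): "patrol",
}

def react(state, trigger):
    """Return the next behavior state; non-trigger observations are ignored."""
    return FSM.get((state, trigger), state)

print(react("patrol", "sense_a_target"))  # -> strike
```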

7
FSM and Agents
8
Partially Observable Markov Decision Process
  • For a good introduction see
  • Atrash A, Koenig S. 2001. Probabilistic Planning
    for Behavior-Based Robotics. GA Tech/AAAI.
  • Describes a police robot that has a noisy sensor
    and the task of
  • searching rooms for victims
  • avoiding rooms with terrorists
  • Avoiding rooms with victims is bad
  • Searching rooms with terrorists is worse
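
The numbers below are hypothetical (the paper's actual rewards may
differ); they only encode the ordering stated above:

```python
# Hypothetical rewards: searching a victim room is good, avoiding a
# terrorist room is good, avoiding a victim room is bad, and searching
# a terrorist room is worst of all.
REWARD = {
    ("victim", "search"): 10,
    ("terrorist", "avoid"): 10,
    ("victim", "avoid"): -10,
    ("terrorist", "search"): -100,
}
```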

9
POMDP
  • A POMDP may be described as follows
  • S: finite set of states
  • O: finite set of observations
  • π: initial state distribution
  • s: current state
  • A(s): set of actions available in state s
  • a: an action
  • p(s' | s, a): transition function
  • q(o | s, a): observation function
  • r(s, a): reward function
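
A minimal sketch of this tuple as a data structure; the field
representations (sets and dictionaries) are assumptions for
illustration, not the paper's formulation:

```python
from dataclasses import dataclass

@dataclass
class POMDP:
    # A container for the tuple above (a sketch, not a solver).
    states: set          # S: finite set of states
    observations: set    # O: finite set of observations
    initial: dict        # pi: state -> probability of starting there
    actions: dict        # A(s): state -> set of available actions
    p: dict              # (s, a) -> {s': probability}  transition function
    q: dict              # (s, a) -> {o: probability}   observation function
    r: dict              # (s, a) -> reward             reward function
```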

10
(No Transcript)
11
POMDP and Policy Graphs
  • Given some observed state of the world, a
    decision maker must take some action, which
    results in a new observation and a reward
  • The mapping between observations and actions may
    be done with a policy graph (sketched below)
  • Nodes are actions
  • Arcs are observations
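
A sketch of a tiny policy graph and its execution loop, with
hypothetical node, action, and observation names:

```python
# Each node carries an action; the observation received after acting
# selects the outgoing arc to the next node. All names are assumed.
ACTION = {"n0": "search", "n1": "avoid"}
ARCS = {
    ("n0", "saw_victim"): "n0",
    ("n0", "saw_terrorist"): "n1",
    ("n1", "saw_victim"): "n0",
    ("n1", "saw_terrorist"): "n1",
}

def execute(node, observe, steps=10):
    """Act, observe, and follow the matching arc for a fixed number of steps."""
    for _ in range(steps):
        action = ACTION[node]
        o = observe(action)        # the environment supplies an observation
        node = ARCS[(node, o)]
    return node

print(execute("n0", lambda action: "saw_victim"))  # stays at n0, always searching
```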

12
(No Transcript)
13
Policy Graph
  • The objective of the planner is to derive a
    policy (graph) that maximizes the average total
    reward over an infinite planning horizon (it is
    not known when things will end)
  • γ: a discount factor
  • Typically set to slightly less than 1.0 (e.g.,
    0.9) to ensure that the total reward is always
    finite (see the check below)
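
A quick numeric check of why discounting keeps the infinite-horizon
total finite: the geometric series sum of gamma**t * r_max converges
to r_max / (1 - gamma).

```python
# With gamma < 1, an infinite reward stream has a finite discounted
# sum: sum over t of gamma**t * r_t is at most r_max / (1 - gamma).
gamma, r_max = 0.9, 1.0
total = sum(gamma**t * r_max for t in range(1000))  # 1000 terms stand in for infinity
print(total, r_max / (1 - gamma))  # both are (essentially) 10.0
```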

14
Policy Graph
  • Mapping policy graphs to finite state automata is
    straightforward
  • Note
  • Optimal policy graphs can potentially be large,
    but often are not
  • Finding an optimal policy graph is
    PSPACE-complete (see Papadimitriou and
    Tsitsiklis), and only feasible for small
    planning tasks
  • There may be unreachable nodes, which can be
    removed (a pruning sketch follows below)
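
A sketch of the pruning step, using the arc representation assumed
earlier: a breadth-first walk from the start node finds every node
some observation sequence can reach, and the rest can be deleted.

```python
from collections import deque

def reachable(start, arcs):
    """Breadth-first walk over policy-graph arcs ((node, obs) -> node);
    nodes never visited are unreachable and can be removed."""
    seen, frontier = {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        for (src, _obs), dst in arcs.items():
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

# Hypothetical graph in which nothing ever leads to n2:
arcs = {("n0", "o1"): "n1", ("n1", "o1"): "n0", ("n2", "o1"): "n0"}
print(reachable("n0", arcs))  # {'n0', 'n1'} -- n2 can be pruned
```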

15
(No Transcript)
16
POMDP vs. FSM
  • POMDPs assume discrete actions, with
    observations made after each action
  • FSMs assume continuous behavior, where triggers
    can be observed at any time
  • POMDP planners assume every observation can be
    made in every state
  • This leads to a combinatorial explosion
  • FSMs ignore non-trigger observations
  • Packaging actions atomically and abstracting
    actions are solutions to these differences
    (sketched below)
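
One possible reading of atomic packaging, as a hedged sketch with
hypothetical names: a continuous behavior is wrapped so that, from the
planner's point of view, it is one discrete action that ends with a
single observation.

```python
# Wrap a continuous FSM behavior as one atomic POMDP action: run the
# behavior until a terminating trigger fires, then report that trigger
# as the action's single discrete observation. Names are assumed.
def atomic_action(behavior, triggers):
    """Step `behavior` (a generator of low-level events) until it emits
    a trigger; return the trigger as the discrete observation."""
    for event in behavior():
        if event in triggers:
            return event

def avoid():                 # a toy continuous behavior
    yield "turning"
    yield "turning"
    yield "threat_cleared"

print(atomic_action(avoid, {"threat_cleared"}))  # -> threat_cleared
```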

17
Ideas
  • The robot's enter-or-continue decision is very
    much like our UAV's search-or-strike question
  • POMDPs appear to be a good fit as a formal model
    for planning in emergent swarm intelligence
  • Work needs to be done to clearly define the sets
    of states, actions, observations, and rewards