1
A BDI model for High-Level Agent Control with a
POMDP Planner
Gavin Rens
Meraka Institute, Knowledge Systems Group
  • Agents, robots and their architectures
  • The architecture in this work
  • The BDI model for control of agents and robots
  • The POMDP model for planning
  • Golog and the situation calculus
  • A POMDP planner in Golog
  • Future work on the planner

2
Agents, robots and their architectures
  • Distinguish between agent control and robot
    control
  • Agent: an autonomous, embodied software entity
  • Robot: an autonomous, independent hardware entity
  • My research considers agents with an eye on
    implementation on robots
  • Therefore, complex agents, not the agents of
    multi-agent systems (MAS)

3
Agents, robots and their architectures
  • Two fundamental components in robot
    architectures
  • High-level decision making / deliberative
    component
  • Low-level reactive component
  • Another fundamental paradigm in robotics
  • the SENSE-THINK-ACT cycle

4
Agents, robots and their architectures
  • A basic architecture could thus be

[Diagram: the Robot / Agent consists of a Deliberative Layer (Think) on top of a Reactive Layer (Interface), which acts on and senses the Environment]
5
The architecture in this work
  • Hybrid deliberative-reactive architecture
  • Also known as a 3T (3 tier) architecture

[Diagram: a three-tier stack of High-level (Deliberation), Controller (Will), and Low-level (Reaction)]
6
The architecture in this work
  • I shall only be looking at the high-level /
    deliberative component, and the control component
  • When referred to together, these two components
    are conventionally known as high-level control
    in robotics
  • In the study of (software) agents, high-level
    control is sometimes all there is to an agent

7
The architecture in this work
  • The high-level component will be a plan generator
  • versus, e.g., a pre-compiled library of plans
  • The control-level component will be based on a
    Belief-Desire-Intention (BDI) model of agency

8
Conclusion to first part
  • My MSc will comprise three things
  • The development of a new planner for the
    deliberative component
  • The development of a BDI controller suitable for
    the 3T architecture (with robots in mind)
  • Combining the planner and controller such that
    the performance of the agent is acceptable

9
Introduction to second part
  • Most of the rest of the talk is devoted to the
    planner
  • Next: a brief review of the basic ideas behind the
    BDI model

10
The BDI model for control of agents and robots
  • A BDI model has the following elements
  • A set of current beliefs
  • A belief revision function (based on new
    observations and current beliefs)
  • A set of current desires (all reasonable
    courses of action)
  • A desire generation function (based on beliefs
    and intentions)
  • A set of current intentions (the desires
    committed to)
  • A commitment function (choosing which desires to
    make intentions)
  • A plan function (selects or generates plans)
    [Wooldridge, 2000b]

11
The BDI model for control of agents and robots
  • A possible control loop for BDI model agents
  • B ← B0 // Initialize beliefs
  • I ← I0 // Initialize intentions
  • Loop forever
  • get next percept p
  • B ← brf(B, p)
  • D ← desires(B, I)
  • I ← commit(B, D, I)
  • Pol ← plan(B, I, Actions)
  • While not ( succeeded(I, B) or impossible(I, B) )
  • see next slide...
  • end Loop forever

12
The BDI model for control of agents and robots
  • Inner while loop continued ...
  • While not ( succeeded(I, B) or impossible(I, B) )
  • a ← head(Pol)
  • execute(a)
  • Pol ← tail(Pol)
  • get next percept p
  • B ← brf(B, p)
  • if intentions need to be reconsidered (according
    to new beliefs)
  • take appropriate actions
  • end While
  • end Loop forever [adapted from Wooldridge, 2002]
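
A minimal Python sketch of this control loop (slides 11 and 12), assuming the BDI functions brf, desires, commit, plan, succeeded, impossible and a reconsideration test are supplied as callables, and that a plan/policy is a plain list of actions; these representational choices are illustrative assumptions, not part of the model above.

  # Sketch only: all BDI functions are injected; a policy is a list of actions.
  def bdi_control_loop(B0, I0, actions, next_percept, execute,
                       brf, desires, commit, plan,
                       succeeded, impossible, reconsider):
      B, I = B0, I0                          # initialize beliefs and intentions
      while True:                            # Loop forever
          p = next_percept()                 # get next percept p
          B = brf(B, p)                      # revise beliefs
          D = desires(B, I)                  # generate desires
          I = commit(B, D, I)                # commit to some desires as intentions
          Pol = plan(B, I, actions)          # generate (or select) a plan
          while not (succeeded(I, B) or impossible(I, B)):
              a, Pol = Pol[0], Pol[1:]       # a <- head(Pol); Pol <- tail(Pol)
              execute(a)
              p = next_percept()
              B = brf(B, p)
              if reconsider(I, B):           # intentions need reconsidering?
                  break                      # re-deliberate in the outer loop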

13
The BDI model for control of agents and robots
  • The BDI architecture was originally developed
    with the software environment in mind
  • actions are deterministic
  • observations are complete (known with certainty)
  • The BDI controller manages reactivity and
    attention, but traditionally does not handle
  • stochastic actions (may turn out differently than
    intended)
  • partial observability (observations are uncertain /
    probabilistic)

14
The BDI model for control of agents and robots
  • The plan() function for a BDI controller could be
    implemented as
  • a simple plan-library lookup procedure
  • a STRIPS-style planner [Fikes & Nilsson, 1971]
  • a planner based on theorem proving [Green, 1969]
  • some other classical AI planner

15
The POMDP model for planning
  • For an agent (robot) high-level controller to
    have the advantages of a BDI-based controller
    while also dealing with
  • stochastic actions, and
  • partial observations,
  • an idea is to augment the BDI controller with a
    planner that does deal with
  • stochastic actions, and
  • partial observations
  • a POMDP planner, in particular

16
The POMDP model for planning
  • A partially observable Markov decision process
    (POMDP) has the following elements [Kaelbling et
    al., 1995]
  • Set of states of a system (world states)
  • Set of (deterministic) actions
  • Set of possible/recognized observations
  • Transition function (probability with which
    action a, done in state s, will put the agent in a
    specific other state s')

17
The POMDP model for planning
  • Observation function (probability with which
    observation o will be perceived in state s' after
    action a)
  • Belief state (probability distribution over all
    states)
  • Belief state b is a set of pairs (state, probability)
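
To make these elements concrete, a minimal Python sketch of a toy two-state POMDP follows; the state, action and observation names and all probabilities are invented for illustration and are not from the presentation.

  # Toy POMDP (all names and numbers are illustrative assumptions)
  S = ["s1", "s2"]                  # world states
  A = ["listen", "act"]             # actions
  O = ["o1", "o2"]                  # observations

  # Transition function: T[a][s][s2] = Pr(s2 | s, a)
  T = {"listen": {"s1": {"s1": 1.0, "s2": 0.0},
                  "s2": {"s1": 0.0, "s2": 1.0}},
       "act":    {"s1": {"s1": 0.5, "s2": 0.5},
                  "s2": {"s1": 0.5, "s2": 0.5}}}

  # Observation function: Z[a][s2][o] = Pr(o | s2, a)
  Z = {"listen": {"s1": {"o1": 0.85, "o2": 0.15},
                  "s2": {"o1": 0.15, "o2": 0.85}},
       "act":    {"s1": {"o1": 0.5, "o2": 0.5},
                  "s2": {"o1": 0.5, "o2": 0.5}}}

  # Belief state: a probability distribution over S, i.e. a set of (state, probability) pairs
  b0 = {"s1": 0.5, "s2": 0.5}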

18
The POMDP model for planning
  • State estimation function: b_new = SE(o, a, b_old)
  • (Belief update function)
  • SE(o, a, b_old):
  • for each state s,
  • b_new(s) = Pr(s | o, a, b_old)
  • SE captures the Markov assumption: a new state of
    belief depends only on the immediately previous
    observation, action and state of belief
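
A minimal sketch of SE for the toy POMDP above (the function and argument names are illustrative assumptions, not the presentation's notation):

  def SE(o, a, b_old, S, T, Z):
      """Belief update: b_new(s) = Pr(s | o, a, b_old)."""
      # Unnormalized: Pr(o | s, a) * sum over previous states s0 of Pr(s | s0, a) * b_old(s0)
      b_new = {s: Z[a][s][o] * sum(T[a][s0][s] * b_old[s0] for s0 in S) for s in S}
      total = sum(b_new.values())            # = Pr(o | a, b_old), used for normalization
      return {s: p / total for s, p in b_new.items()}

  # Example: SE("o1", "listen", b0, S, T, Z) gives roughly {"s1": 0.85, "s2": 0.15}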

19
The POMDP model for planning
  • Reward function Rs(a, s) (determines a reward for
    doing any action in any world state)
  • Reward function Rb(a, b) over belief states,
    derived from the function above, where a reward
    is weighted by the probability of being in a
    state
  • I.e., Rb(a, b) = Σs b(s) · Rs(a, s),
  • where b(s) = Pr(s | b)
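
A minimal sketch of Rb for the toy POMDP sketched earlier; the state-reward table Rs below is an illustrative assumption.

  # Illustrative state-reward table: Rs[a][s] = reward for doing a in s
  Rs = {"listen": {"s1": -1.0, "s2": -1.0},
        "act":    {"s1": 10.0, "s2": -10.0}}

  def Rb(a, b, Rs):
      """Rb(a, b) = sum over states s of b(s) * Rs(a, s)."""
      return sum(prob * Rs[a][s] for s, prob in b.items())

  # Example: Rb("act", {"s1": 0.85, "s2": 0.15}, Rs) = 0.85*10.0 + 0.15*(-10.0) = 7.0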

20
The POMDP model for planning
  • Optimality prescription of utility theory:
  • Maximize the expected sum of rewards that an
    agent gets on the next k steps [Kaelbling et
    al., 1995]
  • I.e., an agent should maximize E( r1 + r2 + ... + rk ),
  • where rt is the reward received on time-step t

21
The POMDP model for planning
  • A policy is a description of the behaviour of an
    agent
  • i.e., a policy is a conditional plan
  • actions are recommended according to the state
    the agent is in, and the observation the agent
    makes in that state
  • Initial belief state b0

22
The POMDP model for planning
  • A policy tree in the POMDP model can thus be
    represented as in the diagram

[Policy-tree diagram: the root node (with t steps to go) is an action A; each possible observation O1, O2, ..., Ok branches to a further action node, and so on down to the leaf actions with 1 step to go]
23
The POMDP model for planning
  • Let p be a (conditional) policy, i.e., a policy
    tree
  • Let Vp,t(s), the value function, be the expected sum
    of rewards gained from starting in state s and
    executing policy p for t steps
  • The optimal policy p* can be defined as
  • p* = argmax_p Vp,h(s0)

24
The POMDP model for planning
  • To implement p* = argmax_p Vp,t(s), we use a
    decision-tree search

[Decision-tree diagram: from the current belief state, deterministic (agent) actions, stochastic (nature) actions and observations lead to possible new belief states]
25
The POMDP model for planning
  • My work will concern planning for a finite number
    of agent actions
  • I.e., the agent designer sets the number of steps
    t = h, where h is known as the planning horizon

26
The POMDP model for planning
  • Belief states in the decision tree are decision
    nodes
  • In Decision Analysis, we roll back a decision
    tree to decide the action by taking the action
    that results in maximum expected utility
  • We iteratively roll back from the last decision nodes
    to the first decision node

27
The POMDP model for planning
  • An agent can choose only its actions (the best),
    not what it observes
  • Therefore, a policy recommends actions
    conditioned on observations
  • As the decision tree is rolled back, the best
    decision/action is placed into the policy,
    conditioned on the most recent possible
    observations
  • This is the essence of the theory on which the
    POMDP planner is based, which I developed at
    RWTH Aachen University, Germany (under the
    supervision of Alexander Ferrein)
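
As an illustration of this roll-back (and of p* = argmax_p Vp,h from slide 23), a minimal Python sketch of an exhaustive finite-horizon decision-tree search over belief states follows; it reuses the toy S, A, O, T, Z, Rs, SE and Rb sketches from the earlier slides and is only an assumption-laden stand-in, not the Golog-based planner itself.

  def rollback(b, h, S, A, O, T, Z, Rs):
      """Return (value, policy) of the best h-step conditional plan from belief b."""
      if h == 0:
          return 0.0, None
      best_value, best_policy = float("-inf"), None
      for a in A:                                  # the agent chooses its action...
          value = Rb(a, b, Rs)
          subpolicies = {}
          for o in O:                              # ...but not what it observes
              # Pr(o | a, b): probability of perceiving o after doing a in belief b
              p_o = sum(Z[a][s][o] * sum(T[a][s0][s] * b[s0] for s0 in S) for s in S)
              if p_o == 0.0:
                  continue
              v_sub, pi_sub = rollback(SE(o, a, b, S, T, Z), h - 1, S, A, O, T, Z, Rs)
              subpolicies[o] = pi_sub              # condition the policy on the observation
              value += p_o * v_sub                 # expected value of the subtree
          if value > best_value:                   # keep the maximizing action
              best_value, best_policy = value, (a, subpolicies)
      return best_value, best_policy

  # Example: value, policy = rollback(b0, 2, S, A, O, T, Z, Rs)
  # policy is a nested pair (action, {observation: sub-policy, ...})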

28
The situation calculus (in one slide)
  • An extension of first-order logic (FOL)
  • Actions and situations are reified
  • A situation is defined in terms of the predicates
    that hold in the situation
  • do(a, s) is a special term: the name of the
    situation after doing action a in situation s
  • In the situation calculus, fluents are predicates
    whose truth value can change
  • Successor-state axioms define how fluent values
    change
  • Precondition axioms must also be provided for
    actions

29
Golog (in one slide)
  • Based on the situation calculus
  • Invented as an agent programming language (APL)
  • It has most of the constructs of regular
    procedural programming languages (iteration,
    conditionals, etc.)
  • Complex actions (A) can be specified
  • while X do Z (iteration of actions)
  • if X then Y else Z (conditional actions)
  • a1 ; a2 ; ... ; ak (sequence of actions)
  • a1 | a2 (nondeterministic action)
  • more ...
  • Do(A, s, s') holds iff A can terminate legally in
    s' when started in situation s

30
A POMDP planner in Golog
  • For decision-theoretic planning, the Do formula
    becomes the BestDo formula for (fully observable)
    MDPs [Boutilier et al., 2000]
  • My work modifies BestDo to deal with POMDPs
  • Relation of my BestDoXxx predicates to a POMDP
    decision tree:

[Diagram: BestDoPo and BestDoObserve mapped onto the POMDP decision tree, from the current belief state to the possible new belief states]
31
A POMDP planner in Golog: BestDoPo
  • Introduction to BestDoPo and its arguments
  • BestDoPo (
  • program A (a complex action),
  • belief-state b,
  • horizon h,
  • policy PI,
  • value v,
  • program-probability )
  • Initially: BestDoPo(a1 ... an, b0, 7, PI?, v?)

32
A POMDP planner in Golog: BestDoPo
  • choiceNat(a) is the set of all possible actions that a
    could be realized as in the environment (by nature)

33
A POMDP planner in Golog: BestDoObserve
(probabilistic observations)

34
Future work on the planner
  • Finding exact optimal policies for POMDP problems
    is notoriously intractable
  • To make the planner more efficient, we can
    constrain the branching factor of the decision
    tree
  • When concentrating on information gathering, expand
    only the actions that produce the highest EVI
    (expected value of information)
  • When concentrating on task completion, expand
    only the actions that lead to the most probable states

35
A BDI model for High-Level Agent Control with a
POMDP Planner
  • THANK YOU
  • FOR LISTENING
  • Contact Info.
  • Gavin Rens
  • grens@csir.co.za