Title: A BDI Model for High-Level Agent Control with a POMDP Planner
Gavin Rens
Meraka Institute, Knowledge Systems Group
- Agents, robots and their architectures
- The architecture in this work
- The BDI model for control of agents and robots
- The POMDP model for planning
- Golog and the situation calculus
- A POMDP planner in Golog
- Future work on the planner
2. Agents, robots and their architectures
- Distinguish between agent control and robot control
- Agent: an autonomous embodied software entity
- Robot: an autonomous independent hardware entity
- My research considers agents with an eye on implementation on robots
- Therefore, complex agents, not the agents of multi-agent systems (MAS)
3. Agents, robots and their architectures
- Two fundamental components in robot architectures:
- a high-level decision-making / deliberative component
- a low-level reactive component
- Another fundamental paradigm in robotics: the SENSE-THINK-ACT cycle
4. Agents, robots and their architectures
- A basic architecture could thus be:
[Diagram: the Robot/Agent comprises a Deliberative Layer (Think) on top of a Reactive Layer (Interface); it Acts on and Senses the Environment]
5. The architecture in this work
- Hybrid deliberative-reactive architecture
- Also known as a 3T (3-tier) architecture
[Diagram: three tiers: High-level (Deliberation), Controller (Will), Low-level (Reaction)]
6. The architecture in this work
- I shall only be looking at the high-level / deliberative component and the control component
- When referred to together, these two components are conventionally known as high-level control in robotics
- In the study of (software) agents, high-level control is sometimes all there is to an agent
7. The architecture in this work
- The high-level component will be a plan generator
- versus, e.g., a pre-compiled library of plans
- The control-level component will be based on a Belief-Desire-Intention (BDI) model of agency
8. Conclusion to first part
- My MSc will comprise three things:
- The development of a new planner for the deliberative component
- The development of a BDI controller suitable for the 3T architecture (with robots in mind)
- Combining the planner and controller such that the performance of the agent is acceptable
9. Introduction to second part
- Most of the rest of the talk is devoted to the planner
- Next: a brief review of the basic ideas behind the BDI model
10. The BDI model for control of agents and robots
- A BDI model has the following elements:
- A set of current beliefs
- A belief revision function (based on new observations and current beliefs)
- A set of current desires (all reasonable courses of action)
- A desire generation function (based on beliefs and intentions)
- A set of current intentions (the desires committed to)
- A commitment function (choosing which desires to make intentions)
- A plan function (selects or generates plans) [Wooldridge, 2000b]
11. The BDI model for control of agents and robots
- A possible control loop for BDI-model agents:
- B := B0  // Initialize beliefs
- I := I0  // Initialize intentions
- Loop forever:
-   get next percept p
-   B := brf(B, p)
-   D := desires(B, I)
-   I := commit(B, D, I)
-   Pol := plan(B, I, Actions)
-   While not (succeeded(I, B) or impossible(I, B)):
-     see next slide ...
- end Loop forever
12. The BDI model for control of agents and robots
- Inner while loop continued ...
- While not (succeeded(I, B) or impossible(I, B)):
-   a := head(Pol)
-   execute(a)
-   Pol := tail(Pol)
-   get next percept p
-   B := brf(B, p)
-   if intentions need to be reconsidered (according to new beliefs):
-     take appropriate actions
- end While
- end Loop forever  [adapted from Wooldridge, 2002]
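The control loop on the two slides above can be sketched in Python. Everything here is a minimal illustration: `brf`, `desires`, `commit`, `plan`, `succeeded`, `impossible`, `reconsider`, `get_percept` and `execute` are placeholder callables supplied by the caller, not part of any particular BDI library, and a `max_cycles` bound is added so the sketch terminates.

```python
def bdi_loop(B, I, actions, *, brf, desires, commit, plan,
             succeeded, impossible, reconsider,
             get_percept, execute, max_cycles=100):
    """Run the BDI deliberate/execute loop for at most max_cycles cycles."""
    for _ in range(max_cycles):
        B = brf(B, get_percept())         # belief revision on a new percept
        D = desires(B, I)                 # desire generation
        I = commit(B, D, I)               # commit to some desires as intentions
        pol = list(plan(B, I, actions))   # plan: a sequence of actions
        while pol and not (succeeded(I, B) or impossible(I, B)):
            a, pol = pol[0], pol[1:]      # head / tail of the plan
            execute(a)
            B = brf(B, get_percept())     # revise beliefs after acting
            if reconsider(I, B):          # new beliefs invalidate intentions?
                break                     # drop back to deliberation
        if succeeded(I, B):
            return B, I
    return B, I
```

A toy usage: beliefs are a counter read from a simulated world, the (hypothetical) intention is to reach a count of 3, and the plan is the right number of `inc` actions.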
13. The BDI model for control of agents and robots
- The BDI architecture was originally developed with the software environment in mind:
- actions are deterministic
- observations are complete (known with certainty)
- The BDI controller manages reactivity and attention, but traditionally does not handle:
- stochastic actions (may turn out differently than intended)
- partial observability (observations are uncertain / probabilistic)
14. The BDI model for control of agents and robots
- The plan() function for a BDI controller could be implemented as:
- a simple plan-library lookup procedure
- a STRIPS-style planner [Fikes & Nilsson, 1971]
- a planner based on theorem proving [Green, 1969]
- some other classical AI planner
15. The POMDP model for planning
- For an agent (robot) high-level controller to have the advantages of one based on the BDI model, and that deals with
- stochastic actions, and
- partial observations,
- an idea is to augment the BDI controller with a planner that does deal with
- stochastic actions, and
- partial observations
- a POMDP planner, in particular
16. The POMDP model for planning
- A partially observable Markov decision process (POMDP) has the following elements [Kaelbling et al., 1995]:
- A set of states of the system (world states)
- A set of (deterministic) actions
- A set of possible/recognized observations
- A transition function (the probability with which action a, done in state s, will put the agent in a specific other state s′)
17. The POMDP model for planning
- An observation function (the probability with which observation o will be perceived in state s′ after action a)
- A belief state (a probability distribution over all states)
- A belief state b is a set of pairs (state, probability)
18. The POMDP model for planning
- State estimation function: b_new = SE(o, a, b_old)
- (the belief update function)
- SE(o, a, b_old):
- for each state s:
-   b_new(s) = Pr(s | o, a, b_old)
- SE captures the Markov assumption: a new state of belief depends only on the immediately previous observation, action and state of belief
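The state-estimation function above can be sketched concretely. This is a minimal illustration, not from any particular library: beliefs are dictionaries mapping states to probabilities, `T[s][a][s2]` is an assumed transition table Pr(s2 | s, a), and `O[a][s2][o]` an assumed observation table Pr(o | s2, a).

```python
def SE(o, a, b, T, O):
    """Return the updated belief b' with b'(s') = Pr(s' | o, a, b)."""
    # Unnormalized update: Pr(o | s', a) * sum_s Pr(s' | s, a) * b(s)
    b_new = {s2: O[a][s2][o] * sum(T[s][a][s2] * b[s] for s in b)
             for s2 in b}
    norm = sum(b_new.values())     # this is Pr(o | a, b), the normalizer
    return {s2: p / norm for s2, p in b_new.items()}
```

For example, with two states and a noisy sensor, observing evidence for `x` shifts a uniform belief sharply toward `x`.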
19. The POMDP model for planning
- Reward function Rs(a, s) (determines a reward for doing any action in any world state)
- Reward function Rb(a, b) over belief states, derived from the function above, where a reward is proportional to the probability of being in a state
- I.e., Rb(a, b) = Σ_s b(s) · Rs(a, s),
- where b(s) = Pr(s | b)
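The derived belief-state reward is a one-liner; as a sketch (with an assumed reward table `R[a][s]` and a belief dictionary, as in the illustrations above):

```python
def belief_reward(a, b, R):
    """Expected reward of doing a in belief state b: sum_s b(s) * R_s(a, s)."""
    return sum(b[s] * R[a][s] for s in b)
```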
20. The POMDP model for planning
- Optimality prescription of utility theory:
- Maximize the expected sum of rewards that an agent gets on the next k steps [Kaelbling et al., 1995]
- I.e., an agent should maximize E[ Σ_{t=0}^{k-1} r_t ],
- where r_t is the reward received on time-step t
21. The POMDP model for planning
- A policy is a description of the behaviour of an agent
- i.e., a policy is a conditional plan
- actions are recommended according to the state the agent is in, and the observation the agent makes in that state
- Initial belief state: b0
22. The POMDP model for planning
- A policy tree in the POMDP model can thus be represented as in the diagram
[Diagram: a policy tree. The root is an action A with t steps to go; each possible observation O1, O2, ..., Ok leads to a child action node, and so on down to the leaf actions with 1 step to go]
23. The POMDP model for planning
- Let p be a (conditional) policy, i.e., a policy tree
- Let V_{p,t}(s) (the value function) be the expected sum of rewards gained from starting in state s and executing policy p for t steps
- The optimal policy p* can be defined as
- p* = argmax_p V_{p,h}(s0)
24. The POMDP model for planning
- To implement p* = argmax_p V_{p,t}(s), we use a decision-tree search
[Diagram: a decision tree rooted at the current belief state; stochastic actions branch into their deterministic realizations, observations follow, and the leaves are possible new belief states]
25. The POMDP model for planning
- My work will concern planning for a finite number of agent actions
- I.e., the agent designer sets the number of steps t = h, where h is known as the planning horizon
26. The POMDP model for planning
- Belief states in the decision tree are decision nodes
- In decision analysis, we "roll back" a decision tree to decide the action, by taking the action that results in maximum expected utility
- We iteratively roll back from the last decision nodes to the first decision node
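The roll-back can be sketched as a recursive finite-horizon value computation: the value of a belief state with t steps to go is the maximum over actions of the immediate belief reward plus the expected value of the updated belief, weighted by the probability of each observation. This is an illustrative sketch only, with the same assumed dictionary model (`T`, `O`, `R`) as before and the belief update inlined.

```python
def value(b, t, actions, observations, T, O, R):
    """Optimal t-steps-to-go value of belief state b (exhaustive roll-back)."""
    if t == 0:
        return 0.0
    best = float('-inf')
    for a in actions:
        v = sum(b[s] * R[a][s] for s in b)      # immediate expected reward
        for o in observations:
            # Unnormalized new belief; its total mass is Pr(o | a, b)
            bu = {s2: O[a][s2][o] * sum(T[s][a][s2] * b[s] for s in b)
                  for s2 in b}
            p_o = sum(bu.values())
            if p_o > 0:
                b2 = {s2: p / p_o for s2, p in bu.items()}
                v += p_o * value(b2, t - 1, actions, observations, T, O, R)
        best = max(best, v)
    return best
```

The branching over actions and observations is exactly why exact planning blows up with the horizon, which motivates the pruning ideas in the future-work slide.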
27. The POMDP model for planning
- An agent can choose only its actions (the best ones), not what it observes
- Therefore, a policy recommends actions conditioned on observations
- As the decision tree is rolled back, the best decision/action is placed into the policy, conditioned on the most recent possible observations
- This is the essence of the theory on which the POMDP planner is based, which I developed at RWTH Aachen University, Germany (under the supervision of Alexander Ferrein)
28. The situation calculus (in one slide)
- An extension of first-order logic (FOL)
- Actions and situations are reified
- A situation is defined in terms of the predicates that hold in the situation
- do(a, s) is a special term: the name of the situation after doing action a in situation s
- In the situation calculus, fluents are predicates whose truth value can change
- Successor-state axioms define how fluent values change
- Precondition axioms must also be provided for actions
29. Golog (in one slide)
- Based on the situation calculus
- Invented as an agent programming language (APL)
- It has most of the constructs of regular procedural programming languages (iteration, conditionals, etc.)
- Complex actions (A) can be specified:
- while X do Z (iteration of actions)
- if X then Y else Z (conditional actions)
- a1 ; a2 ; ... ; ak (sequence of actions)
- a1 | a2 (nondeterministic choice of action)
- more ...
- Do(A, s, s′) holds iff A can terminate legally in situation s′ when started in situation s
30. A POMDP planner in Golog
- For decision-theoretic planning, the Do formula becomes the BestDo formula for (fully observable) MDPs [Boutilier et al., 2000]
- My work modifies BestDo to deal with POMDPs
- Relation of my BestDoXxx to a POMDP decision tree:
[Diagram: in the decision tree, BestDoPo expands actions at the current belief state and at possible new belief states, while BestDoObserve branches on observations]
31. A POMDP planner in Golog: BestDoPo
- Introduction to BestDoPo and its arguments:
- BestDoPo(
-   program A (a complex action),
-   belief-state b,
-   horizon h,
-   policy π,
-   value v,
-   program-probability)
- Initially: BestDoPo(a1 ; ... ; an, b0, 7, π, v)
32. A POMDP planner in Golog: BestDoPo
- choiceNat(a) is the set of all the possible actions that a could be realized as in the environment (by nature)
33. A POMDP planner in Golog: BestDoObserve (probabilistic observation)
34. Future work on the planner
- Finding exact optimal policies for POMDP problems is notoriously intractable
- To make the planner more efficient, we can constrain the branching factor of the decision tree:
- When concentrating on information gathering, expand only the actions that produce the highest expected value of information (EVI)
- When concentrating on task completion, expand only the actions that lead to the most probable states
35. A BDI Model for High-Level Agent Control with a POMDP Planner
- THANK YOU
- FOR LISTENING
- Contact info:
- Gavin Rens
- grens _at_ csir.co.za