Title: A BDI Model for High-Level Agent Control with a POMDP Planner
Gavin Rens
Meraka Institute, Knowledge Systems Group
- Agents, robots and their architectures
- The architecture in this work
- The BDI model for control of agents and robots
- The POMDP model for planning
- Golog and the situation calculus
- A POMDP planner in Golog
- Future work on the planner
2. Agents, robots and their architectures
- Distinguish between agent control and robot control
- Agent: an autonomous embodied software entity
- Robot: an autonomous independent hardware entity
- My research considers agents with an eye on implementation on robots
- Therefore, complex agents, not the agents of multi-agent systems (MAS)
3. Agents, robots and their architectures
- Two fundamental components in robot architectures:
- a high-level decision-making / deliberative component
- a low-level reactive component
- Another fundamental paradigm in robotics: the SENSE-THINK-ACT cycle
4. Agents, robots and their architectures
- A basic architecture could thus be:
[Diagram: the Robot/Agent comprises a Deliberative Layer (Think) on top of a Reactive Layer (Interface); it Acts on and Senses the Environment]
5. The architecture in this work
- Hybrid deliberative-reactive architecture
- Also known as a 3T (3-tier) architecture
[Diagram: three tiers: High-level (Deliberation), Controller (Will), Low-level (Reaction)]
6. The architecture in this work
- I shall only be looking at the high-level / deliberative component and the control component
- When referred to together, these two components are conventionally known as high-level control in robotics
- In the study of (software) agents, high-level control is sometimes all there is to an agent
7. The architecture in this work
- The high-level component will be a plan generator
- versus, e.g., a pre-compiled library of plans
- The control-level component will be based on a Belief-Desire-Intention (BDI) model of agency
8. Conclusion to first part
- My MSc will comprise three things:
- The development of a new planner for the deliberative component
- The development of a BDI controller suitable for the 3T architecture (with robots in mind)
- Combining the planner and controller such that the performance of the agent is acceptable
9. Introduction to second part
- Most of the rest of the talk is devoted to the planner
- Next: a brief review of the basic ideas behind the BDI model
10. The BDI model for control of agents and robots
- A BDI model has the following elements:
- A set of current beliefs
- A belief revision function (based on new observations and current beliefs)
- A set of current desires (all reasonable courses of action)
- A desire generation function (based on beliefs and intentions)
- A set of current intentions (the desires committed to)
- A commitment function (choosing which desires to make intentions)
- A plan function (selects or generates plans) [Wooldridge, 2000b]
11. The BDI model for control of agents and robots
- A possible control loop for BDI-model agents:
- B := B0  // Initialize beliefs
- I := I0  // Initialize intentions
- Loop forever:
-   get next percept p
-   B := brf(B, p)
-   D := desires(B, I)
-   I := commit(B, D, I)
-   Pol := plan(B, I, Actions)
-   While not (succeeded(I, B) or impossible(I, B)):
-     see next slide ...
- end Loop forever
12. The BDI model for control of agents and robots
- Inner while loop continued ...
- While not (succeeded(I, B) or impossible(I, B)):
-   a := head(Pol)
-   execute(a)
-   Pol := tail(Pol)
-   get next percept p
-   B := brf(B, p)
-   if intentions need to be reconsidered (according to new beliefs):
-     take appropriate actions
- end While
- end Loop forever  [adapted from Wooldridge, 2002]
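The control loop on the two slides above can be sketched in Python. Everything here is a minimal illustration: `brf`, `desires`, `commit`, `plan`, `succeeded`, `impossible`, `reconsider`, `get_percept` and `execute` are placeholder callables supplied by the caller, not part of any particular BDI library, and a `max_cycles` bound is added so the sketch terminates.

```python
def bdi_loop(B, I, actions, *, brf, desires, commit, plan,
             succeeded, impossible, reconsider,
             get_percept, execute, max_cycles=100):
    """Run the BDI deliberate/execute loop for at most max_cycles cycles."""
    for _ in range(max_cycles):
        B = brf(B, get_percept())         # belief revision on a new percept
        D = desires(B, I)                 # desire generation
        I = commit(B, D, I)               # commit to some desires as intentions
        pol = list(plan(B, I, actions))   # plan: a sequence of actions
        while pol and not (succeeded(I, B) or impossible(I, B)):
            a, pol = pol[0], pol[1:]      # head / tail of the plan
            execute(a)
            B = brf(B, get_percept())     # revise beliefs after acting
            if reconsider(I, B):          # new beliefs invalidate intentions?
                break                     # drop back to deliberation
        if succeeded(I, B):
            return B, I
    return B, I
```

A toy usage: beliefs are a counter read from a simulated world, the (hypothetical) intention is to reach a count of 3, and the plan is the right number of `inc` actions.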
13. The BDI model for control of agents and robots
- The BDI architecture was originally developed with the software environment in mind:
- actions are deterministic
- observations are complete (known with certainty)
- The BDI controller manages reactivity and attention, but traditionally does not handle:
- stochastic actions (may turn out differently than intended)
- partial observability (observations are uncertain / probabilistic)
14. The BDI model for control of agents and robots
- The plan() function for a BDI controller could be implemented as:
- a simple plan-library lookup procedure
- a STRIPS-style planner [Fikes & Nilsson, 1971]
- a planner based on theorem proving [Green, 1969]
- some other classical AI planner
15. The POMDP model for planning
- For an agent (robot) high-level controller to have the advantages of one based on the BDI model, and that deals with
- stochastic actions, and
- partial observations,
- an idea is to augment the BDI controller with a planner that does deal with
- stochastic actions, and
- partial observations
- a POMDP planner, in particular
16. The POMDP model for planning
- A partially observable Markov decision process (POMDP) has the following elements [Kaelbling et al., 1995]:
- A set of states of the system (world states)
- A set of (deterministic) actions
- A set of possible/recognized observations
- A transition function (the probability with which action a, done in state s, will put the agent in a specific other state s′)
17. The POMDP model for planning
- An observation function (the probability with which observation o will be perceived in state s′ after action a)
- A belief state (a probability distribution over all states)
- A belief state b is a set of pairs (state, probability)
18. The POMDP model for planning
- State estimation function: b_new = SE(o, a, b_old)
- (the belief update function)
- SE(o, a, b_old):
- for each state s:
-   b_new(s) = Pr(s | o, a, b_old)
- SE captures the Markov assumption: a new state of belief depends only on the immediately previous observation, action and state of belief
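The state-estimation function above can be sketched concretely. This is a minimal illustration, not from any particular library: beliefs are dictionaries mapping states to probabilities, `T[s][a][s2]` is an assumed transition table Pr(s2 | s, a), and `O[a][s2][o]` an assumed observation table Pr(o | s2, a).

```python
def SE(o, a, b, T, O):
    """Return the updated belief b' with b'(s') = Pr(s' | o, a, b)."""
    # Unnormalized update: Pr(o | s', a) * sum_s Pr(s' | s, a) * b(s)
    b_new = {s2: O[a][s2][o] * sum(T[s][a][s2] * b[s] for s in b)
             for s2 in b}
    norm = sum(b_new.values())     # this is Pr(o | a, b), the normalizer
    return {s2: p / norm for s2, p in b_new.items()}
```

For example, with two states and a noisy sensor, observing evidence for `x` shifts a uniform belief sharply toward `x`.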
19. The POMDP model for planning
- Reward function Rs(a, s) (determines a reward for doing any action in any world state)
- Reward function Rb(a, b) over belief states, derived from the function above, where a reward is proportional to the probability of being in a state
- I.e., Rb(a, b) = Σ_s b(s) · Rs(a, s),
- where b(s) = Pr(s | b)
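The derived belief-state reward is a one-liner; as a sketch (with an assumed reward table `R[a][s]` and a belief dictionary, as in the illustrations above):

```python
def belief_reward(a, b, R):
    """Expected reward of doing a in belief state b: sum_s b(s) * R_s(a, s)."""
    return sum(b[s] * R[a][s] for s in b)
```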
20. The POMDP model for planning
- Optimality prescription of utility theory:
- Maximize the expected sum of rewards that an agent gets on the next k steps [Kaelbling et al., 1995]
- I.e., an agent should maximize E[ Σ_{t=0}^{k-1} r_t ],
- where r_t is the reward received on time-step t
21. The POMDP model for planning
- A policy is a description of the behaviour of an agent
- i.e., a policy is a conditional plan
- actions are recommended according to the state the agent is in, and the observation the agent makes in that state
- Initial belief state: b0
22. The POMDP model for planning
- A policy tree in the POMDP model can thus be represented as in the diagram
[Diagram: a policy tree. The root is an action A with t steps to go; each possible observation O1, O2, ..., Ok leads to a child action node, and so on down to the leaf actions with 1 step to go]
23. The POMDP model for planning
- Let p be a (conditional) policy, i.e., a policy tree
- Let V_{p,t}(s) (the value function) be the expected sum of rewards gained from starting in state s and executing policy p for t steps
- The optimal policy p* can be defined as
- p* = argmax_p V_{p,h}(s0)
24. The POMDP model for planning
- To implement p* = argmax_p V_{p,t}(s), we use a decision-tree search
[Diagram: a decision tree rooted at the current belief state; stochastic actions branch into their deterministic realizations, observations follow, and the leaves are possible new belief states]
25. The POMDP model for planning
- My work will concern planning for a finite number of agent actions
- I.e., the agent designer sets the number of steps t = h, where h is known as the planning horizon
26. The POMDP model for planning
- Belief states in the decision tree are decision nodes
- In decision analysis, we "roll back" a decision tree to decide the action, by taking the action that results in maximum expected utility
- We iteratively roll back from the last decision nodes to the first decision node
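The roll-back can be sketched as a recursive finite-horizon value computation: the value of a belief state with t steps to go is the maximum over actions of the immediate belief reward plus the expected value of the updated belief, weighted by the probability of each observation. This is an illustrative sketch only, with the same assumed dictionary model (`T`, `O`, `R`) as before and the belief update inlined.

```python
def value(b, t, actions, observations, T, O, R):
    """Optimal t-steps-to-go value of belief state b (exhaustive roll-back)."""
    if t == 0:
        return 0.0
    best = float('-inf')
    for a in actions:
        v = sum(b[s] * R[a][s] for s in b)      # immediate expected reward
        for o in observations:
            # Unnormalized new belief; its total mass is Pr(o | a, b)
            bu = {s2: O[a][s2][o] * sum(T[s][a][s2] * b[s] for s in b)
                  for s2 in b}
            p_o = sum(bu.values())
            if p_o > 0:
                b2 = {s2: p / p_o for s2, p in bu.items()}
                v += p_o * value(b2, t - 1, actions, observations, T, O, R)
        best = max(best, v)
    return best
```

The branching over actions and observations is exactly why exact planning blows up with the horizon, which motivates the pruning ideas in the future-work slide.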
27. The POMDP model for planning
- An agent can choose only its actions (the best ones), not what it observes
- Therefore, a policy recommends actions conditioned on observations
- As the decision tree is rolled back, the best decision/action is placed into the policy, conditioned on the most recent possible observations
- This is the essence of the theory on which the POMDP planner is based, which I developed at RWTH Aachen University, Germany (under the supervision of Alexander Ferrein)
28. The situation calculus (in one slide)
- An extension of first-order logic (FOL)
- Actions and situations are reified
- A situation is defined in terms of the predicates that hold in the situation
- do(a, s) is a special term: the name of the situation after doing action a in situation s
- In the situation calculus, fluents are predicates whose truth value can change
- Successor-state axioms define how fluent values change
- Precondition axioms must also be provided for actions
29. Golog (in one slide)
- Based on the situation calculus
- Invented as an agent programming language (APL)
- It has most of the constructs of regular procedural programming languages (iteration, conditionals, etc.)
- Complex actions (A) can be specified:
- while X do Z (iteration of actions)
- if X then Y else Z (conditional actions)
- a1 ; a2 ; ... ; ak (sequence of actions)
- a1 | a2 (nondeterministic choice of action)
- more ...
- Do(A, s, s′) holds iff A can terminate legally in situation s′ when started in situation s
30. A POMDP planner in Golog
- For decision-theoretic planning, the Do formula becomes the BestDo formula for (fully observable) MDPs [Boutilier et al., 2000]
- My work modifies BestDo to deal with POMDPs
- Relation of my BestDoXxx to a POMDP decision tree:
[Diagram: in the decision tree, BestDoPo expands actions at the current belief state and at possible new belief states, while BestDoObserve branches on observations]
31. A POMDP planner in Golog: BestDoPo
- Introduction to BestDoPo and its arguments:
- BestDoPo(
-   program A (a complex action),
-   belief-state b,
-   horizon h,
-   policy π,
-   value v,
-   program-probability)
- Initially: BestDoPo(a1 ; ... ; an, b0, 7, π, v)
32. A POMDP planner in Golog: BestDoPo
- choiceNat(a) is the set of all the possible actions that a could be realized as in the environment (by nature)
33. A POMDP planner in Golog: BestDoObserve (probabilistic observation)
34. Future work on the planner
- Finding exact optimal policies for POMDP problems is notoriously intractable
- To make the planner more efficient, we can constrain the branching factor of the decision tree:
- When concentrating on information gathering, expand only the actions that produce the highest expected value of information (EVI)
- When concentrating on task completion, expand only the actions that lead to the most probable states
35. A BDI Model for High-Level Agent Control with a POMDP Planner
- THANK YOU
- FOR LISTENING
- Contact info:
- Gavin Rens
- grens _at_ csir.co.za