Title: A polynomial time algorithm for constructing k-maintainable policies
- Chitta Baral, Arizona State University
- Thomas Eiter, Vienna University of Technology
2. Motivation: what is "maintain f"?
- Always f, also written as $\Box f$.
  - Too strong for many kinds of maintainability (e.g., maintain the room clean).
- Always Eventually f, also written as $\Box\Diamond f$.
  - Weak, in the sense that it gives no estimate of when f will be made true.
  - May not be achievable in the presence of continuous interference by belligerent agents.
- $\Box f$ ---- $\Box\Diamond_k f$ ---- $\Box\Diamond f$ (from strongest to weakest).
- $\Box\Diamond_3 f$ is shorthand for $\Box(f \lor \bigcirc f \lor \bigcirc\bigcirc f \lor \bigcirc\bigcirc\bigcirc f)$.
- But if an external agent keeps interfering, how is one supposed to guarantee $\Box\Diamond_3 f$?
- k-maintain f: if the environment gives the agent a break of k steps, then during that window the agent will reach a state where f is true.
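The bounded operator $\Box\Diamond_k f$ can be made concrete on a finite trace. The Python sketch below is illustrative only: the finite-trace reading (only windows of $k+1$ states that fit inside the trace are checked) and the name `always_within` are assumptions, and it ignores the interference from the environment that k-maintainability is designed to handle.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def always_within(trace: Sequence[State], k: int,
                  f: Callable[[State], bool]) -> bool:
    """Check Box(f or Next f or ... or Next^k f) on a finite trace.

    Finite-trace reading (an assumption): every window of k+1 consecutive
    states that fits entirely inside the trace must contain an f-state.
    """
    for i in range(len(trace) - k):
        window = trace[i:i + k + 1]
        if not any(f(state) for state in window):
            return False
    return True


room = ["dirty", "dirty", "clean", "dirty", "dirty", "dirty", "clean"]


def is_clean(state: str) -> bool:
    return state == "clean"


print(always_within(room, 3, is_clean))  # True: every 4-state window has a clean state
print(always_within(room, 2, is_clean))  # False: the window dirty, dirty, dirty fails
```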
3. Motivation: a controller-agent transcript
- Controller (to the agent/robot): Your goal is to maintain the room clean.
- Robot/Agent: Can you be precise about what you mean by "maintain"? Also, can I clean anytime, or are there restrictions?
- Controller: You can only clean when the room is unoccupied.
- Controller: By "maintain" I mean ALWAYS clean.
- Robot/Agent: I won't be able to guarantee that. What if someone makes the room dirty while it is occupied?
- Controller: OK, I understand. How about ALWAYS EVENTUALLY clean?
- Controller's Boss: EVENTUALLY is too lenient. We can't have the room unclean for too long. We should put some bound on it.
4. Controller-agent transcript (cont.)
- Controller: Sorry, Sir. I should have made it more precise.
  - ALWAYS EVENTUALLY$_3$ clean.
- Robot/Agent: Sorry. I can guarantee neither ALWAYS EVENTUALLY clean nor ALWAYS EVENTUALLY$_3$ clean.
  - What if the room is continuously being used? You told me I cannot clean while it is being used.
- Controller: You have a good point. Let me clarify again.
  - If you are given an opportunity of 3 units of time without the room being occupied (i.e., without any interference from external agents), then you should have the room clean during that time.
- Robot/Agent: I think I understand you. But as you know, I am a robot and not that good at understanding English. Can you please input it in a precise language?
5. Formulating k-maintainability: a system
- A system is a quadruple $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, where
  - $\mathcal{S}$ is the set of system states;
  - $A$ is the set of actions, which is the union of the set of agent actions, $A_{ag}$, and the set of environmental actions, $A_{env}$;
  - $\Phi : \mathcal{S} \times A \to 2^{\mathcal{S}}$ is a non-deterministic transition function that specifies how the state of the world changes in response to actions;
  - $\mathit{poss} : \mathcal{S} \to 2^{A}$ is a function that describes which actions are possible (by the agent or the environment) in which states.
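One way to hold such a system in code is a small Python dataclass, sketched below under stated assumptions: the class and method names are hypothetical, $\Phi(s, a)$ is treated as empty when $a \notin \mathit{poss}(s)$, and the toy instance is a reconstruction of the example on the next slide. Since that figure did not survive, its edges are inferred from the reachability, closure, and least-model results stated later in the deck and may differ from the original picture; `a2` stands for $a'$.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

State = str
Action = str


@dataclass
class System:
    """A = (S, A, Phi, poss), with A split into agent and environment actions."""
    states: FrozenSet[State]
    agent_actions: FrozenSet[Action]
    env_actions: FrozenSet[Action]
    phi: Dict[Tuple[State, Action], Set[State]]  # non-deterministic transitions
    poss: Dict[State, Set[Action]]               # actions possible in each state

    @property
    def actions(self) -> FrozenSet[Action]:
        # A is the union of the agent's and the environment's actions
        return self.agent_actions | self.env_actions

    def successors(self, s: State, a: Action) -> Set[State]:
        # Phi(s, a), treated as empty when a is not possible in s (assumption)
        if a not in self.poss.get(s, set()):
            return set()
        return set(self.phi.get((s, a), set()))


# Toy instance; edges reconstructed from later slides (assumption), "a2" is a'.
example = System(
    states=frozenset("bcdfgh"),
    agent_actions=frozenset({"a", "a2"}),
    env_actions=frozenset({"e"}),
    phi={("b", "a"): {"c"}, ("b", "a2"): {"f"}, ("c", "a"): {"d"},
         ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}},
    poss={"b": {"a", "a2"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}},
)
print(sorted(example.actions))        # ['a', 'a2', 'e']
print(example.successors("b", "a2"))  # {'f'}
print(example.successors("g", "a"))   # set()
```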
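6. [Figure: an example system with states b, c, d, f, g, h; its transitions are labelled with the agent actions $a$, $a'$ and the environmental action $e$.]
$\mathcal{S} = \{b, c, d, f, g, h\}$, $A = \{a, a', e\}$, $A_{ag} = \{a, a'\}$, $A_{env} = \{e\}$, and $\Phi$ is as shown in the picture. $\mathit{poss}(b) = \{a, a'\}$, but only $a$ is executed at $b$ when our policy dictates $a$ there.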
7. Controls and super-controls
- Given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$ and a set $A_{ag} \subseteq A$ of agent actions:
  - a control policy for $\mathcal{A}$ w.r.t. $A_{ag}$ is a partial function $K : \mathcal{S} \to A_{ag}$ such that $K(s) \in \mathit{poss}(s)$ whenever $K(s)$ is defined;
  - a super-control policy for $\mathcal{A}$ w.r.t. $A_{ag}$ is a partial function $K : \mathcal{S} \to 2^{A_{ag}}$ such that $K(s) \subseteq \mathit{poss}(s)$ and $K(s) \neq \emptyset$ whenever $K(s)$ is defined.
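The two side conditions translate directly into checks. In this hypothetical Python sketch a control is a dict from states to actions and a super-control a dict from states to non-empty action sets; `POSS` and `AGENT` describe a toy reconstruction of the example system (with `a2` standing for $a'$), not data from the slides.

```python
from typing import Dict, Set

Control = Dict[str, str]            # partial function K : S -> A_ag
SuperControl = Dict[str, Set[str]]  # partial function K : S -> 2^A_ag


def is_control(K: Control, poss: Dict[str, Set[str]], agent_actions: Set[str]) -> bool:
    # every defined K(s) must be an agent action that is possible in s
    return all(a in agent_actions and a in poss.get(s, set()) for s, a in K.items())


def is_super_control(K: SuperControl, poss: Dict[str, Set[str]],
                     agent_actions: Set[str]) -> bool:
    # every defined K(s) must be a non-empty set of agent actions possible in s
    return all(
        len(acts) > 0 and acts <= agent_actions and acts <= poss.get(s, set())
        for s, acts in K.items()
    )


# Toy data (assumption); "a2" stands for a'.
POSS = {"b": {"a", "a2"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
AGENT = {"a", "a2"}
print(is_control({"b": "a", "c": "a", "d": "a"}, POSS, AGENT))  # True
print(is_control({"f": "e"}, POSS, AGENT))                      # False: e is exogenous
print(is_super_control({"b": {"a", "a2"}}, POSS, AGENT))        # True
print(is_super_control({"b": set()}, POSS, AGENT))              # False: empty set
```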
8. Reachable states and closure
- Reachable states $R(\mathcal{A}, s)$ from an individual state $s$:
  - Given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$ and a state $s$, $R(\mathcal{A}, s)$ is the smallest set of states that satisfies the following conditions:
    - (i) $s \in R(\mathcal{A}, s)$, and
    - (ii) if $s' \in R(\mathcal{A}, s)$ and $a \in \mathit{poss}(s')$, then $\Phi(s', a) \subseteq R(\mathcal{A}, s)$.
- Closure of a set of states $S$, $\mathit{Closure}(S, \mathcal{A})$:
  - Let $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$ be a system and let $S \subseteq \mathcal{S}$. Then the closure of $\mathcal{A}$ w.r.t. $S$ is defined by $\mathit{Closure}(S, \mathcal{A}) = \bigcup_{s \in S} R(\mathcal{A}, s)$.
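Since $R(\mathcal{A}, s)$ is the least set closed under (i) and (ii), a breadth-first search computes it. The sketch below is a minimal illustration with hypothetical function names, assuming toy edges reconstructed for the example system (the figure itself is lost, and `a2` stands for $a'$); with them it reproduces the values of $R(\mathcal{A}, d)$, $R(\mathcal{A}, f)$, and $\mathit{Closure}(\{d, f\}, \mathcal{A})$ given on the next slide.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Phi = Dict[Tuple[str, str], Set[str]]
Poss = Dict[str, Set[str]]


def reachable(phi: Phi, poss: Poss, s: str) -> Set[str]:
    """R(A, s): least set containing s and closed under Phi(s', a), a in poss(s')."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for a in poss.get(u, set()):
            for t in phi.get((u, a), set()):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen


def closure(phi: Phi, poss: Poss, initial: Iterable[str]) -> Set[str]:
    """Closure(S, A): union of R(A, s) over the states s in S."""
    result: Set[str] = set()
    for s in initial:
        result |= reachable(phi, poss, s)
    return result


# Toy edges reconstructed from the slides (assumption); "a2" stands for a'.
PHI = {("b", "a"): {"c"}, ("b", "a2"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
POSS = {"b": {"a", "a2"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
print(sorted(reachable(PHI, POSS, "d")))       # ['d', 'h']
print(sorted(reachable(PHI, POSS, "f")))       # ['f', 'g', 'h']
print(sorted(closure(PHI, POSS, {"d", "f"})))  # ['d', 'f', 'g', 'h']
```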
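9. [Figure: the example system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$.]
$R(\mathcal{A}, d) = \{d, h\}$, $R(\mathcal{A}, f) = \{f, g, h\}$, $\mathit{Closure}(\{d, f\}, \mathcal{A}) = \{d, f, g, h\}$.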
10. $\mathit{Unfold}_k(s, \mathcal{A}, K)$
- An element of $\mathit{Unfold}_k(s, \mathcal{A}, K)$ is a sequence of at most $k+1$ states that the system may go through if it follows the control $K$ starting from the state $s$.
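11. [Figure: the example system with one additional $a$-labelled transition.]
Consider the policy $K$: do action $a$ in states $b$, $c$, and $d$. Then $\mathit{Unfold}_3(b, \mathcal{A}, K) = \{\langle b, c, d, h\rangle, \langle b, g\rangle\}$ and $\mathit{Unfold}_3(c, \mathcal{A}, K) = \{\langle c, d, h\rangle\}$.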
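The slides do not spell out when a sequence stops, so the Python sketch below assumes the reading that matches the example above: a run is extended through every successor in $\Phi(s, K(s))$ while $K$ is defined and fewer than $k+1$ states have been visited, and it ends early at a state where $K$ is undefined. The transitions are also assumed: action $a$ at $b$ may lead to $c$ or to $g$, which is what produces $\langle b, g\rangle$. The function name `unfold` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def unfold(s: str, k: int, phi: Dict[Tuple[str, str], Set[str]],
           K: Dict[str, str]) -> List[Tuple[str, ...]]:
    """Enumerate Unfold_k(s, A, K) as tuples of at most k+1 states.

    Assumed reading: extend a run through every successor of Phi(last, K(last))
    while K is defined at its last state and it has fewer than k+1 states.
    Phi(s, K(s)) is assumed non-empty wherever K is defined.
    """
    results: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        last = path[-1]
        if len(path) == k + 1 or last not in K:
            results.append(tuple(path))
            return
        for t in sorted(phi.get((last, K[last]), set())):
            extend(path + [t])

    extend([s])
    return results


# Assumed transitions for this example: a in b may lead to c or to g.
PHI = {("b", "a"): {"c", "g"}, ("c", "a"): {"d"}, ("d", "a"): {"h"}}
K = {"b": "a", "c": "a", "d": "a"}
print(unfold("b", 3, PHI, K))  # [('b', 'c', 'd', 'h'), ('b', 'g')]
print(unfold("c", 3, PHI, K))  # [('c', 'd', 'h')]
```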
12. Definition of k-maintainability: the parameters
- 1. a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$;
- 2. a set $A_{ag} \subseteq A$ of agent actions;
- 3. a set $S \subseteq \mathcal{S}$ of initial states;
- 4. a set $E \subseteq \mathcal{S}$ of desired states that we want to maintain;
- 5. the maintainability parameter $k$;
- 6. a function $\mathit{exo} : \mathcal{S} \to 2^{A_{env}}$ detailing exogenous actions, such that $\mathit{exo}(s) \subseteq \mathit{poss}(s)$; and
- 7. a control $K$ (mapping a relevant part of $\mathcal{S}$ to $A_{ag}$) such that $K(s) \in \mathit{poss}(s)$.
13. Basic idea
- Ignoring interference:
  - From any state under consideration, by following the control policy one should visit $E$ within $k$ steps.
- Accounting for interference:
  - Broaden the states under consideration from the initial states to all states reachable due to the control and the environment (use Closure).
- When using Closure:
  - Account for the control policy.
  - Ignore other agent actions.
  - Also, only consider the exogenous actions in $\mathit{exo}(s)$.
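14. Definition of k-maintainability
- $\mathit{poss}_{K,\mathit{exo}}(s) = \{K(s)\} \cup \mathit{exo}(s)$.
- $\mathcal{A}_{K,\mathit{exo}} = (\mathcal{S}, A, \Phi, \mathit{poss}_{K,\mathit{exo}})$.
- Given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, a set of agent actions $A_{ag} \subseteq A$, and a specification $\mathit{exo}$ of exogenous action occurrences, we say that a control $K$ for $\mathcal{A}$ w.r.t. $A_{ag}$ k-maintains $S \subseteq \mathcal{S}$ with respect to $E \subseteq \mathcal{S}$, where $k \ge 0$, if
  - for each state $s \in \mathit{Closure}(S, \mathcal{A}_{K,\mathit{exo}})$ and each sequence $\sigma = s_0, s_1, \ldots, s_r$ in $\mathit{Unfold}_k(s, \mathcal{A}, K)$ with $s_0 = s$, it holds that $\{s_0, s_1, \ldots, s_r\} \cap E \neq \emptyset$.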
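15. [Figure: the example system.]
Consider the policy $K$: do action $a$ in states $b$, $c$, and $d$. Then $\mathit{poss}(b) = \{a, a'\}$, $\mathit{poss}_{K,\mathit{exo}}(b) = \{a\}$, $\mathit{Closure}(\{b, c\}, \mathcal{A}) = \{b, c, d, f, g, h\}$, and $\mathit{Closure}(\{b, c\}, \mathcal{A}_{K,\mathit{exo}}) = \{b, c, d, h\}$.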
16. [Figure: the example system.]
Goal: a 3-maintainable policy for $S = \{b\}$ w.r.t. $E = \{h\}$. Such a policy: do $a$ in $b$, $c$, and $d$.
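The definition can be checked by brute force on small systems: compute $\mathit{Closure}(S, \mathcal{A}_{K,\mathit{exo}})$, unfold $K$ for $k$ steps from every state in it, and require each run to meet $E$. This Python sketch (hypothetical names) can be exponential in $k$ for non-deterministic controls and is meant only to make the definition concrete. It assumes a reading of $\mathit{Unfold}_k$ as runs of at most $k+1$ states that stop where $K$ is undefined, and toy edges reconstructed from the slides, with `a2` standing for $a'$. It confirms that the policy above 3-maintains $\{b\}$ but does not 2-maintain it.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Phi = Dict[Tuple[str, str], Set[str]]
Control = Dict[str, str]


def closure_k_exo(initial: Iterable[str], phi: Phi, K: Control,
                  exo: Dict[str, Set[str]]) -> Set[str]:
    """Closure(S, A_{K,exo}): from each state follow only K(s) and exo(s)."""
    seen = set(initial)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        acts = set(exo.get(u, set()))
        if u in K:
            acts.add(K[u])
        for a in acts:
            for t in phi.get((u, a), set()):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen


def unfold(s: str, k: int, phi: Phi, K: Control) -> List[Tuple[str, ...]]:
    """Runs of at most k+1 states under K; a run stops early where K is undefined."""
    if k == 0 or s not in K:
        return [(s,)]
    return [(s,) + rest
            for t in sorted(phi.get((s, K[s]), set()))
            for rest in unfold(t, k - 1, phi, K)]


def k_maintains(K: Control, initial: Set[str], goal: Set[str], k: int,
                phi: Phi, exo: Dict[str, Set[str]]) -> bool:
    """Every Unfold_k run from every state of the closure must visit goal."""
    for s in closure_k_exo(initial, phi, K, exo):
        for run in unfold(s, k, phi, K):
            if not set(run) & goal:
                return False
    return True


# Toy edges reconstructed from the slides (assumption); "a2" stands for a'.
PHI = {("b", "a"): {"c"}, ("b", "a2"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
EXO = {"f": {"e"}}
K = {"b": "a", "c": "a", "d": "a"}
print(sorted(closure_k_exo({"b", "c"}, PHI, K, EXO)))  # ['b', 'c', 'd', 'h']
print(k_maintains(K, {"b"}, {"h"}, 3, PHI, EXO))       # True
print(k_maintains(K, {"b"}, {"h"}, 2, PHI, EXO))       # False: b, c, d misses h
print(k_maintains({"b": "a2", "f": "a"}, {"b"}, {"h"}, 3, PHI, EXO))  # False: e can push f to the dead end g
```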
17. [Figure: the example system with one additional exogenous transition labelled $e$.]
Goal: find a 3-maintainable policy for $S = \{b\}$ w.r.t. $E = \{h\}$. No such policy exists!
18. Constructing k-maintainable control policies: pre-formulation attempts
- Handwritten policies: subsumption architecture, RAPs, situation control rules, protocols.
- Our initial motivation for formulating maintainability came when we tried to formalize what a control module was doing.
- Kaelbling and Rosenschein 1991: in the control rule "if condition c is satisfied then do action a", the action a is the action that leads to the goal from any state where the condition c is satisfied.
19. [Figure: the example system.]
Forward search: if we use minimal paths or minimal-cost paths, we might pick $a'$, and then we would have to backtrack.
Backward search: should we include both $d$ and $f$?
20. Propositional encoding of solutions
- Input: an input $I$ consists of a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, a set of goal states $E \subseteq \mathcal{S}$, a set of initial states $S \subseteq \mathcal{S}$, a set $A_{ag} \subseteq A$, a function $\mathit{exo}$, and an integer $k \ge 0$.
- Output: a control $K$ such that $S$ is k-maintainable with respect to $E$ (using the control $K$), if such a control exists; otherwise, the output is NO.
- Aim: given input $I$, construct $\mathit{sat}(I)$ in polynomial time such that
  - $\mathit{sat}(I)$ is satisfiable if and only if the input $I$ allows for a k-maintainable control,
  - satisfying assignments of $\mathit{sat}(I)$ encode such possible controls, and
  - $\mathit{sat}(I)$ is solvable in polynomial time.
21. Propositional encoding: notation
- $s_i$ denotes that there is a path from state $s$ to some state in $E$ using only agent actions, and at most $i$ of them (we refer to this as "there is an a-path from $s$ to $E$ of length at most $i$").
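Read in isolation, this notation is a shortest-path question, answered by backward breadth-first search over agent transitions. The Python sketch below (hypothetical names; toy edges reconstructed from the slides, `a2` for $a'$) computes that distance, so that $s_i$ would hold iff the distance is at most $i$. It rates $b$ at distance 2 via $a'$ and $f$, the very detour the forward-search slide warns about; the clauses of $\mathit{sat}(I)$ are stronger than this plain path reading, because the exogenous action $e$ can push $f$ to the dead end $g$.

```python
from collections import deque
from typing import Dict, Set, Tuple


def a_path_distance(goal: Set[str], phi: Dict[Tuple[str, str], Set[str]],
                    poss: Dict[str, Set[str]],
                    agent_actions: Set[str]) -> Dict[str, int]:
    """Length of the shortest a-path (agent actions only) from each state to goal.

    States that cannot reach goal are absent. Backward breadth-first search
    over the reversed agent transitions.
    """
    preds: Dict[str, Set[str]] = {}
    for (s, a), targets in phi.items():
        if a in agent_actions and a in poss.get(s, set()):
            for t in targets:
                preds.setdefault(t, set()).add(s)
    dist = {s: 0 for s in goal}
    queue = deque(goal)
    while queue:
        t = queue.popleft()
        for s in preds.get(t, set()):
            if s not in dist:
                dist[s] = dist[t] + 1
                queue.append(s)
    return dist


# Toy edges reconstructed from the slides (assumption); "a2" stands for a'.
PHI = {("b", "a"): {"c"}, ("b", "a2"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
POSS = {"b": {"a", "a2"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
dist = a_path_distance({"h"}, PHI, POSS, {"a", "a2"})
print(sorted(dist.items()))  # [('b', 2), ('c', 2), ('d', 1), ('f', 1), ('h', 0)]
```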
22. The encoding $\mathit{sat}(I)$
- (0) For all states $s$ and all $j$, $0 \le j < k$: $s_j \rightarrow s_{j+1}$.
- (1) For all initial states $s \in E$: $s_0$.
- (2) For all states $s, t$ such that $\Phi(s, a) = t$ for some action $a \in \mathit{exo}(s)$: $s_k \rightarrow t_k$.
- (3) For all states $s \notin E$ and all $i$, $1 \le i \le k$: $s_i \rightarrow \bigvee_{t \in PS(s)} t_{i-1}$,
  - where $PS(s) = \{t \in \mathcal{S} \mid \exists a \in A_{ag} \cap \mathit{poss}(s) : t = \Phi(s, a)\}$.
- (4) For all initial states $s \notin E$: $s_k$.
- (5) For all states $s \notin E$: $\neg s_0$.
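For a deterministic system the clauses can be generated mechanically. The following Python sketch (hypothetical helper names; an atom $(s, i)$ stands for $s_i$, and a literal pairs an atom with its sign) builds groups (0)-(5) for a toy reconstruction of the example system with $S = \{b\}$, $E = \{h\}$, $k = 3$, and checks the assignment from the worked example on slide 27, in which $s_i$ is false exactly for the listed atoms.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, int]       # (s, i) stands for the proposition s_i
Literal = Tuple[Atom, bool]  # (atom, True) is s_i, (atom, False) is not s_i
Clause = List[Literal]


def sat_clauses(states: Set[str], goal: Set[str], initial: Set[str], k: int,
                phi: Dict[Tuple[str, str], Set[str]], poss: Dict[str, Set[str]],
                agent_actions: Set[str], exo: Dict[str, Set[str]]) -> List[Clause]:
    """Clauses (0)-(5) of sat(I) in CNF, for a deterministic Phi."""
    clauses: List[Clause] = []
    for s in states:                                  # (0) s_j -> s_{j+1}
        for j in range(k):
            clauses.append([((s, j), False), ((s, j + 1), True)])
    for s in initial & goal:                          # (1) s_0
        clauses.append([((s, 0), True)])
    for s in states:                                  # (2) s_k -> t_k
        for a in exo.get(s, set()):
            for t in phi.get((s, a), set()):
                clauses.append([((s, k), False), ((t, k), True)])
    for s in states - goal:                           # (3) s_i -> OR of t_{i-1}
        ps = {t for a in poss.get(s, set()) & agent_actions
              for t in phi.get((s, a), set())}
        for i in range(1, k + 1):
            clauses.append([((s, i), False)] + [((t, i - 1), True) for t in sorted(ps)])
    for s in initial - goal:                          # (4) s_k
        clauses.append([((s, k), True)])
    for s in states - goal:                           # (5) not s_0
        clauses.append([((s, 0), False)])
    return clauses


def satisfies(clauses: List[Clause], true_atoms: Set[Atom]) -> bool:
    return all(any((atom in true_atoms) == pos for atom, pos in cl) for cl in clauses)


# Toy system reconstructed from the slides (assumption); "a2" stands for a'.
STATES = set("bcdfgh")
PHI = {("b", "a"): {"c"}, ("b", "a2"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
POSS = {"b": {"a", "a2"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
clauses = sat_clauses(STATES, {"h"}, {"b"}, 3, PHI, POSS, {"a", "a2"}, {"f": {"e"}})

# s_i is false exactly for these atoms (the set M of the worked example)
FALSE_ATOMS = ({("b", 0), ("b", 1), ("b", 2), ("c", 0), ("c", 1), ("d", 0)}
               | {(s, i) for s in "fg" for i in range(4)})
model = {(s, i) for s in STATES for i in range(4)} - FALSE_ATOMS
print(satisfies(clauses, model))               # True
print(satisfies(clauses, model - {("b", 3)}))  # False: clause (4) needs b_3
```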
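23. Constructing policies from the models of $\mathit{sat}(I)$
- Let $M$ be a model of $\mathit{sat}(I)$.
- $C_M = \{s \in \mathcal{S} \mid M \models s_k\}$.
- $L_M(s)$ is the smallest index $j$ such that $M \models s_j$ (i.e., $s_0, s_1, \ldots, s_{j-1}$ are false and $s_j$ is true).
- $K(s)$ is defined iff $s \in C_M \setminus E$, and
  - $K(s) \in \{a \in A_{ag} \mid \Phi(s, a) = t,\ t \in C_M,\ L_M(t) < L_M(s)\}$.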
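Reading a control off a model is a direct transcription of these definitions. In this Python sketch (hypothetical names; deterministic $\Phi$ assumed, so each $\Phi(s, a)$ is a one-element set) the model is given as the set of true atoms $(s, i)$, taken from the worked example on slide 27 for a toy reconstruction of the example system with $E = \{h\}$ and $k = 3$. Any qualifying action may be chosen; the sketch takes the first in sorted order.

```python
from typing import Dict, Set, Tuple

Atom = Tuple[str, int]  # (s, i) stands for the proposition s_i


def extract_control(model: Set[Atom], states: Set[str], goal: Set[str], k: int,
                    phi: Dict[Tuple[str, str], Set[str]],
                    poss: Dict[str, Set[str]], agent_actions: Set[str]):
    """C_M, L_M and a control K read off a model M of sat(I), deterministic Phi."""
    c_m = {s for s in states if (s, k) in model}
    level = {s: min(i for i in range(k + 1) if (s, i) in model) for s in c_m}
    control: Dict[str, str] = {}
    for s in sorted(c_m - goal):
        for a in sorted(poss.get(s, set()) & agent_actions):
            (t,) = phi[(s, a)]  # deterministic: exactly one successor
            if t in c_m and level[t] < level[s]:
                control[s] = a  # any qualifying action is allowed; take the first
                break
    return c_m, level, control


# Toy system reconstructed from the slides (assumption); "a2" stands for a'.
STATES = set("bcdfgh")
PHI = {("b", "a"): {"c"}, ("b", "a2"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
POSS = {"b": {"a", "a2"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
FALSE_ATOMS = ({("b", 0), ("b", 1), ("b", 2), ("c", 0), ("c", 1), ("d", 0)}
               | {(s, i) for s in "fg" for i in range(4)})
model = {(s, i) for s in STATES for i in range(4)} - FALSE_ATOMS

c_m, level, K = extract_control(model, STATES, {"h"}, 3, PHI, POSS, {"a", "a2"})
print(sorted(c_m))            # ['b', 'c', 'd', 'h']
print(sorted(level.items()))  # [('b', 3), ('c', 2), ('d', 1), ('h', 0)]
print(K)                      # {'b': 'a', 'c': 'a', 'd': 'a'}
```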
24. Proposition
- Let $I$ consist of a system $\mathcal{A} = (\mathcal{S}, A, \Phi, \mathit{poss})$, where $\Phi$ is deterministic, a set $A_{ag} \subseteq A$, sets of states $E \subseteq \mathcal{S}$ and $S \subseteq \mathcal{S}$, an exogenous function $\mathit{exo}$, and an integer $k$. Then:
  - (i) $S$ is k-maintainable w.r.t. $E$ iff $\mathit{sat}(I)$ is satisfiable;
  - (ii) given any model $M$ of $\mathit{sat}(I)$, any control $K$ constructed by the algorithm above k-maintains $S$ w.r.t. $E$.
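25. Reverse encoding
- $a \rightarrow b$ is equivalent to
- $\neg a \lor b$, which is equivalent to
- $\neg(\neg b) \lor \neg a$, which is equivalent to
- $\neg b \rightarrow \neg a$, which is equivalent to
- $\bar{b} \rightarrow \bar{a}$, where $\bar{x}$ is a new atom standing for $\neg x$, which is equivalent to
- $\bar{a} \leftarrow \bar{b}$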
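26. Rearranging $\mathit{sat}(I)$ to Horn form
- (0) For all states $s$ and all $j$, $0 \le j < k$:
  - $s_j \rightarrow s_{j+1}$ becomes $\bar{s}_j \leftarrow \bar{s}_{j+1}$
- (1) For all initial states $s \in E$:
  - $s_0$ becomes $\bot \leftarrow \bar{s}_0$
- (2) For all states $s, t$ such that $\Phi(s, a) = t$ for some action $a \in \mathit{exo}(s)$:
  - $s_k \rightarrow t_k$ becomes $\bar{s}_k \leftarrow \bar{t}_k$
- (3) For all states $s \notin E$ and all $i$, $1 \le i \le k$:
  - $s_i \rightarrow \bigvee_{t \in PS(s)} t_{i-1}$ becomes $\bar{s}_i \leftarrow \bigwedge_{t \in PS(s)} \bar{t}_{i-1}$,
  - where $PS(s) = \{t \in \mathcal{S} \mid \exists a \in A_{ag} \cap \mathit{poss}(s) : t = \Phi(s, a)\}$.
- (4) For all initial states $s \notin E$:
  - $s_k$ becomes $\bot \leftarrow \bar{s}_k$
- (5) For all states $s \notin E$:
  - $\neg s_0$ becomes $\bar{s}_0$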
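27. [Figure: the example system.]
For $S = \{b\}$, $E = \{h\}$, and $k = 3$, writing $\bar{s}_i$ for the atom standing for $\neg s_i$:
- (6) $\bar{b}_0, \bar{c}_0, \bar{d}_0, \bar{f}_0, \bar{g}_0$ (from 5)
- (7) $\bar{g}_1, \bar{g}_2, \bar{g}_3$ (from 3)
- (8) $\bar{b}_1, \bar{c}_1$ (from 6 and 3)
- (9) $\bar{f}_3$ (from 7 and 2)
- (10) $\bar{f}_2$ (from 9 and 0)
- (11) $\bar{f}_1$ (from 10 and 0)
- (12) $\bar{b}_2$ (from 8, 11, and 3)
Thus $M = \{\bar{f}_3, \bar{f}_2, \bar{f}_1, \bar{f}_0, \bar{g}_3, \bar{g}_2, \bar{g}_1, \bar{g}_0, \bar{b}_2, \bar{b}_1, \bar{b}_0, \bar{c}_1, \bar{c}_0, \bar{d}_0\}$, so $L_M(b) = 3$, $L_M(c) = 2$, and $L_M(d) = 1$.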
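The derivation above is forward chaining on the Horn rules of slide 26. The Python sketch below is a naive fixpoint loop (polynomial, but not the linear-time propagation that slide 29 refers to); an atom $(s, i)$ stands for $\bar{s}_i$, a rule with head `None` is a constraint, and the system is a toy reconstruction of the example (`a2` for $a'$, edges inferred from the slides). On it the loop derives exactly the set $M$ listed above, and it reports inconsistency when $f$ is made an initial state. Function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, int]                    # (s, i) stands for the negated atom s-bar_i
Rule = Tuple[Optional[Atom], List[Atom]]  # (head, body); head None = constraint


def horn_rules(states: Set[str], goal: Set[str], initial: Set[str], k: int,
               phi: Dict[Tuple[str, str], Set[str]], poss: Dict[str, Set[str]],
               agent_actions: Set[str], exo: Dict[str, Set[str]]) -> List[Rule]:
    rules: List[Rule] = []
    for s in states:
        for j in range(k):
            rules.append(((s, j), [(s, j + 1)]))                      # (0)
    for s in initial & goal:
        rules.append((None, [(s, 0)]))                                # (1)
    for s in states:
        for a in exo.get(s, set()):
            for t in phi.get((s, a), set()):
                rules.append(((s, k), [(t, k)]))                      # (2)
    for s in states - goal:
        ps = {t for a in poss.get(s, set()) & agent_actions
              for t in phi.get((s, a), set())}
        for i in range(1, k + 1):
            rules.append(((s, i), [(t, i - 1) for t in sorted(ps)]))  # (3)
    for s in initial - goal:
        rules.append((None, [(s, k)]))                                # (4)
    for s in states - goal:
        rules.append(((s, 0), []))                                    # (5)
    return rules


def least_model(rules: List[Rule]) -> Optional[Set[Atom]]:
    """Naive fixpoint iteration; returns None if a constraint fires."""
    model: Set[Atom] = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if all(b in model for b in body):
                if head is None:
                    return None  # inconsistency: no k-maintainable control
                if head not in model:
                    model.add(head)
                    changed = True
    return model


# Toy system reconstructed from the slides (assumption); "a2" stands for a'.
STATES = set("bcdfgh")
PHI = {("b", "a"): {"c"}, ("b", "a2"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
POSS = {"b": {"a", "a2"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
AGENT, EXO = {"a", "a2"}, {"f": {"e"}}

neg = least_model(horn_rules(STATES, {"h"}, {"b"}, 3, PHI, POSS, AGENT, EXO))
levels = {s: sum(1 for t, _ in neg if t == s) for s in sorted(STATES)}
print(levels)  # {'b': 3, 'c': 2, 'd': 1, 'f': 4, 'g': 4, 'h': 0}
print(least_model(horn_rules(STATES, {"h"}, {"f"}, 3, PHI, POSS, AGENT, EXO)))  # None
```

A count of $k+1$ derived atoms (here $f$ and $g$) means the state lies outside $C_M$; otherwise the count is $L_M(s)$.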
28. Big picture of the algorithm: summary
- Initialization: states not in $E$ (5), and states with no agent transitions, for which $\bar{s}_1, \ldots, \bar{s}_k$ follow from (3).
- Backward reasoning from there using (2) and (3), and downward propagation using (0).
- Use (1) and (4) to detect inconsistency.
- Compute $L_M(s)$.
- Use $L_M(s)$ to compute the control $K(s)$.
29. Polynomial-time generation of a control policy and a maximal control policy
- Horn satisfiability is a well-known polynomial-time problem.
- Theorem: under deterministic state transitions, problem k-MAINTAIN is solvable in polynomial time.
- Maximal control:
  - Each satisfiable Horn theory $T$ has a least model, $M_T$, which is given by the intersection of all its models.
  - $M_T$ is computable in time linear in the size of the encoding.
  - $M_T$ leads to a maximal control, in the sense that it works on a greatest set $S'$ of states w.r.t. $E$ such that $S \subseteq S'$.
  - I.e., it is robust with respect to increasing $S$.
30. Dealing with non-deterministic transition functions
- Notation: $s_{a,i}$, $i > 0$, denotes that there is an a-path from $s$ to $E$ of length at most $i$ starting with action $a$.
- The encoding $\mathit{sat}'(I)$ again has groups (0)-(5) of clauses, as follows:
  - (0), (1), (4), and (5) are the same as in $\mathit{sat}(I)$.
  - (2) For all states $s$ and $t$ such that $t \in \Phi(s, a)$ for some action $a \in \mathit{exo}(s)$: $s_k \rightarrow t_k$.
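31. Dealing with non-deterministic transition functions (cont.)
- (3) For every state $s \notin E$ and all $i$, $1 \le i \le k$:
  - (3.1) $s_i \rightarrow \bigvee_{a \in A_{ag} \cap \mathit{poss}(s)} s_{a,i}$
  - (3.2) for every $a \in A_{ag} \cap \mathit{poss}(s)$ and $t \in \Phi(s, a)$: $s_{a,i} \rightarrow t_{i-1}$
  - (3.3) for every $a \in A_{ag} \cap \mathit{poss}(s)$, if $i < k$: $s_{a,i} \rightarrow s_{a,i+1}$
- After the same reverse encoding, this leads to a Horn theory!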
32. Direct algorithm using counters
- Idea: $c_s = i$ means that $\bar{s}_0, \ldots, \bar{s}_i$ hold, and $c_{s,a} = i$ means that $\bar{s}_{a,0}, \ldots, \bar{s}_{a,i}$ hold.
- Initialization:
  - For all states $s \notin E$, make $\bar{s}_0$ true: $c_s = 0$.
  - For all states $s \notin E$ without any outgoing edges labelled with agent actions, make $\bar{s}_0, \ldots, \bar{s}_k$ true: $c_s = k$.
  - For all states $s$, if agent action $a$ is not executable in $s$, make $\bar{s}_{a,0}, \ldots, \bar{s}_{a,k}$ true: $c_{s,a} = k$.
- The other steps are similar.
- The idea can then be extended to actions with durations (or costs).
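A possible fixpoint reading of the counter idea, covering non-deterministic actions, is sketched below in Python. It is a reconstruction under stated assumptions, not necessarily the authors' procedure: it recomputes counters until nothing changes rather than propagating incrementally, it assumes $\Phi(s, a)$ is non-empty for every possible agent action, and it picks, for each state, an action all of whose outcomes have smaller counters. Function names are hypothetical, and the system is a toy reconstruction of the example (`a2` for $a'$). On the deterministic toy system the counters match the least model of slide 27; letting $a$ at $b$ also lead to $g$ makes the instance unsolvable.

```python
from typing import Dict, Optional, Set, Tuple

Phi = Dict[Tuple[str, str], Set[str]]


def counters(states: Set[str], goal: Set[str], k: int, phi: Phi,
             poss: Dict[str, Set[str]], agent_actions: Set[str],
             exo: Dict[str, Set[str]]) -> Dict[str, int]:
    """c[s] = i means s-bar_0 .. s-bar_i are derived; -1 means none is."""
    c = {s: (-1 if s in goal else 0) for s in states}
    changed = True
    while changed:
        changed = False
        for s in sorted(states):
            new = c[s]
            if any(c[t] == k for a in exo.get(s, set()) for t in phi.get((s, a), set())):
                new = k  # reversed (2): an exogenous outcome outside C_M drags s out
            elif s not in goal:
                # c_{s,a}: a single outcome t with t-bar_{i-1} yields s-bar_{a,i} (3.2)
                c_sa = [max(min(k, c[t] + 1) for t in phi[(s, a)])
                        for a in poss.get(s, set()) & agent_actions]
                new = max(new, min(c_sa, default=k))  # reversed (3.1)
            if new > c[s]:
                c[s] = new
                changed = True
    return c


def control_from_counters(c: Dict[str, int], states: Set[str], goal: Set[str],
                          initial: Set[str], k: int, phi: Phi,
                          poss: Dict[str, Set[str]],
                          agent_actions: Set[str]) -> Optional[Dict[str, str]]:
    """Return a control, or None when clause (1) or (4) is violated."""
    for s in initial:
        if c[s] >= (0 if s in goal else k):
            return None
    control: Dict[str, str] = {}
    for s in sorted(states - goal):
        if c[s] >= k:
            continue  # s lies outside C_M
        for a in sorted(poss.get(s, set()) & agent_actions):
            if all(c[t] < c[s] for t in phi[(s, a)]):  # every outcome is closer to E
                control[s] = a
                break
    return control


# Toy system reconstructed from the slides (assumption); "a2" stands for a'.
STATES, GOAL = set("bcdfgh"), {"h"}
PHI = {("b", "a"): {"c"}, ("b", "a2"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
POSS = {"b": {"a", "a2"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
AGENT, EXO = {"a", "a2"}, {"f": {"e"}}

c = counters(STATES, GOAL, 3, PHI, POSS, AGENT, EXO)
print(sorted(c.items()))  # [('b', 2), ('c', 1), ('d', 0), ('f', 3), ('g', 3), ('h', -1)]
print(control_from_counters(c, STATES, GOAL, {"b"}, 3, PHI, POSS, AGENT))  # {'b': 'a', 'c': 'a', 'd': 'a'}

PHI_ND = dict(PHI)
PHI_ND[("b", "a")] = {"c", "g"}  # non-deterministic variant (assumption)
c_nd = counters(STATES, GOAL, 3, PHI_ND, POSS, AGENT, EXO)
print(control_from_counters(c_nd, STATES, GOAL, {"b"}, 3, PHI_ND, POSS, AGENT))  # None
```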
33. Computational complexity
- k-maintainability is PTIME-complete (under log-space reductions).
- PTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.
- k-maintainability is EXPTIME-complete when we have a compact representation (e.g., STRIPS-like).
- EXPTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.
34. Conclusion
- k-maintainability is an important notion.
- Most specifications over infinite trajectories would be better off with k-maintainability-like notions as part of the specification.
  - Role 1 of k: the length of the window of opportunity.
  - Role 2 of k: the bound within which maintenance is guaranteed.
- k-maintainability is related to Dijkstra's notion of self-stabilization.
  - There is a big research community on self-stabilization in distributed control and fault tolerance.
  - But it has not focused much on the automatic generation of controls (protocols, in its parlance).
  - It has focused more on proving the correctness of handwritten protocols.
- Moving from a SAT encoding to a Horn logic program encoding is an interesting and fruitful approach to designing a polynomial algorithm.
  - One does not often think in terms of negative propositions.
- We have a prototype implementation using DLV.
35. THANK YOU!