A polynomial time algorithm for constructing k-maintainable policies

Slides: 36

Provided by: chit151

Learn more at: https://www.public.asu.edu

1
A polynomial time algorithm for constructing
k-maintainable policies

Chitta Baral
Arizona State University
and
Thomas Eiter
Vienna University of Technology

2
Motivation: What is "maintain f"?

Always f, also written as □ f
- Too strong for many kinds of maintainability
(e.g., maintain the room clean).
Always Eventually f, also written as □ ◇ f
- Weak in the sense that it does not give an
estimate of when f will be made true.
- May not be achievable in the presence of
continuous interference by belligerent agents.
□ f ------------------ □ ◇_k f -------------------------- □ ◇ f
□ ◇_3 f is shorthand for □ (f ∨ ○f ∨ ○○f ∨ ○○○f).
But if an external agent keeps interfering, how is
one supposed to guarantee □ ◇_3 f?
k-maintain f: If there is a break from the
environment for k steps, then during that break the
agent will reach a state where f is true.

3
Motivation: a controller-agent transcript

Controller (to the agent/robot): Your goal is to
maintain the room clean.
Robot/Agent: Can you be precise about what you
mean by "maintain"? Also, can I clean anytime, or
are there restrictions?
Controller: You can only clean when the room is
unoccupied.
Controller: By "maintain" I mean ALWAYS clean.
Robot/Agent: I won't be able to guarantee that.
What if someone makes it dirty while the room is
occupied?
Controller: OK, I understand. How about
ALWAYS EVENTUALLY clean?
Controller's Boss: "Eventually" is too lenient.
We can't have the room unclean for too long. We
should put some bound on it.

4
Controller-agent transcript (cont.)

Controller: Sorry, Sir. I should have made it
more precise:
ALWAYS EVENTUALLY_3 clean.
Robot/Agent: Sorry. I can guarantee neither
ALWAYS EVENTUALLY clean nor ALWAYS
EVENTUALLY_3 clean.
What if the room is continuously being used? You
told me I cannot clean while it is being used.
Controller: You have a good point. Let me clarify
again.
If you are given an opportunity of 3 units of
time without the room being occupied (i.e.,
without any interference from external agents),
then you should have the room clean during that
time.
Robot/Agent: I think I understand you. But as you
know, I am a robot and not that good at
understanding English. Can you please state it in
a precise language?

5
Formulating k-maintainability: a system

A system is a quadruple A = (𝒮, 𝒜, Φ, poss), where
𝒮 is the set of system states;
𝒜 is the set of actions, which is the union of
the set of the agent's actions, A_ag, and the set of
environmental actions, A_env;
Φ : 𝒮 × 𝒜 → 2^𝒮 is a non-deterministic
transition function that specifies how the state
of the world changes in response to actions;
poss : 𝒮 → 2^𝒜 is a function that describes
which actions are possible (by the agent or the
environment) in which states.
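
The quadruple can be written down directly as a small data structure. The following is a minimal Python sketch, assuming a dictionary-based representation; the class name `System`, its field names, the `successors` helper, and the two-state room example are all illustrative assumptions and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class System:
    """Hypothetical container for a system (states, actions, Phi, poss)."""
    states: set   # system states
    agent: set    # agent actions A_ag
    env: set      # environmental actions A_env
    phi: dict     # (state, action) -> set of possible successor states
    poss: dict    # state -> set of actions possible in that state

    @property
    def actions(self):
        # The action set is the union of agent and environmental actions.
        return self.agent | self.env

    def successors(self, s, a):
        # Phi is non-deterministic, so the result is a set of states.
        if a not in self.poss.get(s, set()):
            raise ValueError(f"action {a!r} is not possible in state {s!r}")
        return self.phi.get((s, a), set())


# Illustrative system: the agent can clean up, the environment can mess up.
room = System(
    states={"clean", "dirty"},
    agent={"clean_up"},
    env={"mess_up"},
    phi={("dirty", "clean_up"): {"clean"}, ("clean", "mess_up"): {"dirty"}},
    poss={"dirty": {"clean_up"}, "clean": {"mess_up"}},
)

print(room.successors("dirty", "clean_up"))  # {'clean'}
```

Keeping poss separate from Φ mirrors the definition: an action may have transitions recorded in Φ and still be ruled out in a given state.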

6
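[Figure: transition diagram over states b, c, d, f, g, h, with edges labeled a, a', and e]

𝒮 = {b, c, d, f, g, h}; 𝒜 = {a, a', e}; A_ag = {a, a'}; A_env = {e}.
Φ is as shown in the picture.
poss(b) = {a} when our policy dictates a to be executed at b.
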
7
Controls and super-controls

Given a system A = (𝒮, 𝒜, Φ, poss) and a set A_ag ⊆ 𝒜
of agent actions,
a control policy for A w.r.t. A_ag is a partial
function K : 𝒮 → A_ag such that K(s) is an
element of poss(s) whenever K(s) is defined;
a super-control policy for A w.r.t. A_ag is a
partial function
K : 𝒮 → 2^A_ag such that K(s) is a subset of
poss(s) and K(s) ≠ ∅ whenever K(s) is defined.

8
Reachable states and closure

Reachable states R(A, s) from an individual state s:
Given a system A = (𝒮, 𝒜, Φ, poss) and a state
s, R(A, s) is the smallest set of states that
satisfies the following conditions:
(i) s is in R(A, s), and
(ii) if s' is in R(A, s) and a is in poss(s'),
then Φ(s', a) is a subset of R(A, s).
Closure(S, A) of a set of states S:
Let A = (𝒮, 𝒜, Φ, poss) be a system and let S
be a subset of 𝒮. Then the closure of A w.r.t.
S, denoted by Closure(S, A), is defined by
Closure(S, A) = ⋃_{s ∈ S} R(A, s).
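
Both definitions amount to a graph search. The sketch below is one way to compute them in Python. The function names and the dictionary layout are assumptions, and the edge list for the six-state example is inferred from the example slides rather than stated in them.

```python
def reachable(phi, poss, s):
    """Smallest set containing s and closed under every possible action."""
    seen, stack = {s}, [s]
    while stack:
        cur = stack.pop()
        for a in poss.get(cur, ()):
            for t in phi.get((cur, a), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def closure(phi, poss, states):
    """Union of reachable(phi, poss, s) over all s in `states`."""
    result = set()
    for s in states:
        result |= reachable(phi, poss, s)
    return result


# Assumed edges of the example diagram (an inference, not given in the text).
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}

print(sorted(reachable(PHI, POSS, "d")))        # ['d', 'h']
print(sorted(closure(PHI, POSS, {"d", "f"})))   # ['d', 'f', 'g', 'h']
```

Restricting `poss` to the actions chosen by a control plus the exogenous actions gives the closure used later in the definition of k-maintainability.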

9
[Figure: transition diagram over states b, c, d, f, g, h, with edges labeled a, a', and e]

A = (𝒮, 𝒜, Φ, poss)
R(A, d) = {d, h}
R(A, f) = {f, g, h}
Closure({d, f}, A) = {d, f, g, h}

10
Unfold_k(s, A, K)

An element of Unfold_k(s, A, K) is a sequence of
states of length at most k + 1 that the system
may go through if it follows the control K
starting from the state s.
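
A possible reading of this definition in Python is sketched below. It assumes that a sequence stops early only at a state where K is undefined, and it uses a hypothetical variant of the example in which action a at b may lead to either c or g. The function name and data layout are illustrative.

```python
def unfold(phi, K, s, k):
    """All state sequences from s that follow control K for at most k steps.

    A sequence is cut short only at a state where K is undefined.
    """
    results = set()

    def extend(path):
        cur = path[-1]
        if len(path) - 1 == k or cur not in K:
            results.add(tuple(path))
            return
        # Assumes phi lists at least one successor for a prescribed action.
        for t in phi.get((cur, K[cur]), ()):
            extend(path + [t])

    extend([s])
    return results


# Hypothetical variant of the example: a at b may lead to c or to g.
PHI = {
    ("b", "a"): {"c", "g"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
K = {"b": "a", "c": "a", "d": "a"}

print(sorted(unfold(PHI, K, "b", 3)))  # [('b', 'c', 'd', 'h'), ('b', 'g')]
print(sorted(unfold(PHI, K, "c", 3)))  # [('c', 'd', 'h')]
```

Only the control's own actions are followed here; exogenous actions play no part in unfolding.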

11
[Figure: transition diagram over states b, c, d, f, g, h, with edges labeled a, a', and e]

Consider the policy K: do action a in states b, c, and d.
Unfold_3(b, A, K) = {⟨b, c, d, h⟩, ⟨b, g⟩}
Unfold_3(c, A, K) = {⟨c, d, h⟩}

12
Definition of k-maintainability: the parameters

1. a system A = (𝒮, 𝒜, Φ, poss),
2. a set A_ag ⊆ 𝒜 of agent actions,
3. a set of initial states S,
4. a set of desired states E that we want to
maintain,
5. a maintainability parameter k,
6. a function exo : 𝒮 → 2^A_env detailing
exogenous actions, such that exo(s) is a subset
of poss(s), and
7. a control K (mapping a relevant part of 𝒮 to
A_ag) such that K(s) belongs to poss(s).

13
Basic Idea

Ignoring interference:
From any state under consideration, by following
the control policy one should visit E within k steps.
Accounting for interference:
Broaden the states under consideration from the
initial states to all states reachable due to
control and the environment. (Use Closure.)
When using Closure:
Account for the control policy.
Ignore other agent actions.
Also, only consider exogenous actions in exo(s).

14
Definition of k-maintainability

poss_{K,exo}(s) is the set {K(s)} ∪ exo(s).
A_{K,exo} = (𝒮, 𝒜, Φ, poss_{K,exo})
Given a system A = (𝒮, 𝒜, Φ, poss), a set of agent
actions A_ag ⊆ 𝒜, and a specification of
exogenous action occurrence exo, we say that a
control K for A w.r.t. A_ag k-maintains a subset S
of 𝒮 with respect to a subset E of 𝒮, where k ≥ 0,
if
- for each state s in Closure(S, A_{K,exo}) and each
sequence σ = s_0, s_1, ..., s_r in Unfold_k(s, A, K)
with s_0 = s, it holds that
{s_0, s_1, ..., s_r} ∩ E ≠ ∅.
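
The definition can be checked by brute force for a given control. The sketch below does this in Python under stated assumptions: the edge list of the six-state example is inferred, the function name `k_maintains` is illustrative, and a prescribed action with no recorded successor is treated as a failure.

```python
def k_maintains(phi, K, exo, S, E, k):
    """Check whether control K k-maintains S with respect to E."""
    # Closure under poss_{K,exo}(s) = {K(s)} ∪ exo(s).
    seen, stack = set(S), list(S)
    while stack:
        cur = stack.pop()
        acts = set(exo.get(cur, ()))
        if cur in K:
            acts.add(K[cur])
        for a in acts:
            for t in phi.get((cur, a), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)

    def hits_E(s, steps):
        # True if every K-trajectory from s meets E within `steps` transitions.
        if s in E:
            return True
        if steps == 0 or s not in K:
            return False
        succ = phi.get((s, K[s]), ())
        # Assumption: a prescribed action without successors counts as failure.
        return bool(succ) and all(hits_E(t, steps - 1) for t in succ)

    return all(hits_E(s, k) for s in seen)


# Assumed edges of the example diagram (an inference, not given in the text).
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
EXO = {"f": {"e"}}
K = {"b": "a", "c": "a", "d": "a"}

print(k_maintains(PHI, K, EXO, {"b"}, {"h"}, 3))                      # True
print(k_maintains(PHI, K, EXO, {"b"}, {"h"}, 2))                      # False
print(k_maintains(PHI, {"b": "a'", "f": "a"}, EXO, {"b"}, {"h"}, 3))  # False
```

The last call shows the effect of interference: going through f looks shorter, but the exogenous action e can move the system to g, from which the agent cannot recover.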

15
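[Figure: transition diagram over states b, c, d, f, g, h, with edges labeled a, a', and e]

Consider the policy K: do action a in states b, c, and d.
poss(b) = {a, a'}
poss_{K,exo}(b) = {a}
Closure({b, c}, A) = {b, c, d, f, g, h}
Closure({b, c}, A_{K,exo}) = {b, c, d, h}
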
16
[Figure: transition diagram over states b, c, d, f, g, h, with edges labeled a, a', and e]

Goal: a 3-maintainable policy for S = {b} w.r.t. E = {h}.
Such a policy: do a in b, c, and d.

17
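[Figure: transition diagram over states b, c, d, f, g, h, with edges labeled a, a', and e; this version has two e-labeled edges]

Goal: find a 3-maintainable policy for S = {b} w.r.t. E = {h}.
No such policy!
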
18
Constructing k-maintainable control policies:
pre-formulation attempts

Handwritten policies: subsumption architecture,
RAPs, situation control rules, protocols.
Our initial motivation for formulating
maintainability arose when we tried to formalize
what a control module was doing.
Kaelbling and Rosenschein 1991: In the control
rule "if condition c is satisfied then do action
a", the action a is the action that leads to the
goal from any state where the condition c is
satisfied.

19
[Figure: transition diagram over states b, c, d, f, g, h, with edges labeled a, a', and e]

Forward search: if we use minimal paths or
minimal-cost paths, we might pick a'; then we
would have to backtrack.
Backward search: should we include both d and f?

20
Propositional Encoding of solutions

Input: An input I is a system A = (𝒮, 𝒜, Φ, poss), a
set of goal states E ⊆ 𝒮, a set of initial states
S ⊆ 𝒮, a set A_ag ⊆ 𝒜, a function exo, and an
integer k ≥ 0.
Output: A control K such that S is k-maintainable
with respect to E (using the control K), if such
a control exists. Otherwise the output is NO.
Aim: Given input I, construct sat(I) in PTIME
such that
sat(I) is satisfiable if and only if the input I
allows for a k-maintainable control,
satisfying assignments for sat(I) encode the
possible controls, and
sat(I) is polynomially solvable.

21
Propositional encoding: notation

s_i denotes that
there is a path from state s to some state in E
using only agent actions and at most i of them
(to which we refer as "there is an a-path from
s to E of length at most i").

22
The encoding sat(I)

(0) For all states s and for all j, 0 ≤ j < k:
s_j ⇒ s_{j+1}
(1) For all initial states s in E: s_0
(2) For all states s, t such that Φ(s, a) = t for
some action a ∈ exo(s): s_k ⇒ t_k
(3) For all states s not in E and all i, 1 ≤ i ≤ k:
s_i ⇒ ⋁_{t ∈ PS(s)} t_{i-1},
where PS(s) = { t ∈ 𝒮 | ∃ a ∈ A_ag ∩ poss(s) : t = Φ(s, a) }
(4) For all initial states s not in E:
s_k
(5) For all states s not in E: ¬s_0

23
Constructing policies from the models of sat(I)

Let M be a model of sat(I).
C_M = { s ∈ 𝒮 | M ⊨ s_k }
L_M(s) = the smallest index j such that M ⊨ s_j
(i.e., s_0, s_1, ..., s_{j-1} are false and s_j is true)
K(s) is defined iff s ∈ C_M \ E, and
K(s) ∈ { a ∈ A_ag | Φ(s, a) = t, t ∈ C_M, L_M(t) < L_M(s) }

24
Proposition

Let I consist of a system A = (𝒮, 𝒜, Φ, poss),
where Φ is deterministic, a set A_ag ⊆ 𝒜, sets of
states E ⊆ 𝒮 and S ⊆ 𝒮, an exogenous function
exo, and an integer k. Then,
(i) S is k-maintainable w.r.t. E iff sat(I) is
satisfiable.
(ii) Given any model M of sat(I), any control K
obtained by the construction above k-maintains
S w.r.t. E.

25
Reverse Encoding

a ⇒ b is equivalent to
¬a ∨ b, which is equivalent to
¬(¬b) ∨ ¬a, which is equivalent to
¬b ⇒ ¬a, which is equivalent to
b' ⇒ a', which is equivalent to
a' ⇐ b'
(where x' stands for ¬x).

26
Rearranging sat(I) to Horn

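Each clause of sat(I) is followed by its reversed (Horn) form, where s_i' stands for ¬s_i.
(0) For all states s and for all j, 0 ≤ j < k:
s_j ⇒ s_{j+1}    becomes    s_j' ⇐ s_{j+1}'
(1) For all initial states s in E:
s_0    becomes    ⇐ s_0'
(2) For all states s, t such that Φ(s, a) = t for
some action a ∈ exo(s):
s_k ⇒ t_k    becomes    s_k' ⇐ t_k'
(3) For all states s not in E and all i, 1 ≤ i ≤ k:
s_i ⇒ ⋁_{t ∈ PS(s)} t_{i-1}    becomes    s_i' ⇐ ⋀_{t ∈ PS(s)} t_{i-1}'
where
PS(s) = { t ∈ 𝒮 | ∃ a ∈ A_ag ∩ poss(s) : t = Φ(s, a) }
(4) For all initial states s not in E:
s_k    becomes    ⇐ s_k'
(5) For all states s not in E:
¬s_0    becomes    s_0'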

27
[Figure: transition diagram over states b, c, d, f, g, h, with edges labeled a, a', and e]

(6) b_0', c_0', d_0', f_0', g_0' (from 5)
(7) g_1', g_2', g_3' (from 3)
(8) b_1', c_1' (from 6 and 3)
(9) f_3' (from 7 and 2)
(10) f_2' (from 9 and 0)
(11) f_1' (from 10 and 0)
(12) b_2' (from 8, 11, and 3)
Thus M = {f_3', f_2', f_1', f_0', g_3', g_2', g_1', g_0', b_2', b_1', b_0', c_1', c_0', d_0'}
L_M(b) = 3, L_M(c) = 2, L_M(d) = 1

28
Big picture of the algorithm: summary

Initialization: start from states not in E (5) and
states with no agent transitions to compute s_i'
(3).
Backward reasoning from there using (2) and (3),
and downward propagation using (0).
Use (1) and (4) for inconsistency detection.
Computation of L_M(s).
Use L_M(s) to compute the control K(s).
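
The steps above can be sketched in Python for the deterministic case. This is an illustration under stated assumptions, not the authors' implementation: it computes the least model of the reversed clauses by a naive fixpoint loop (polynomial, but not the linear-time Horn procedure), the function name and data layout are hypothetical, and the edges of the six-state example are inferred.

```python
def solve_k_maintain(states, agent_moves, exo_moves, S, E, k):
    """agent_moves: state -> {agent action: successor state}.
    exo_moves: state -> set of states reachable by one exogenous action.
    Returns (policy, level) or None if no k-maintaining control exists."""
    # (s, i) in neg means the reversed atom s_i' holds in the least model.
    neg = {(s, 0) for s in states if s not in E}            # group (5)
    changed = True
    while changed:
        changed = False
        for s in states:
            for i in range(k + 1):
                if (s, i) in neg:
                    continue
                by_0 = i < k and (s, i + 1) in neg          # group (0)
                by_2 = i == k and any(                      # group (2)
                    (t, k) in neg for t in exo_moves.get(s, ()))
                by_3 = i >= 1 and s not in E and all(       # group (3)
                    (t, i - 1) in neg
                    for t in agent_moves.get(s, {}).values())
                if by_0 or by_2 or by_3:
                    neg.add((s, i))
                    changed = True
    # Groups (1) and (4): inconsistency detection on the initial states.
    if any((s, 0 if s in E else k) in neg for s in S):
        return None
    # level[s] is L_M(s); its keys form C_M.
    level = {s: min(i for i in range(k + 1) if (s, i) not in neg)
             for s in states if (s, k) not in neg}
    policy = {}
    for s in level:
        if s in E:
            continue
        for a, t in sorted(agent_moves.get(s, {}).items()):
            if t in level and level[t] < level[s]:
                policy[s] = a
                break
    return policy, level


# Assumed edges of the example diagram (an inference, not given in the text).
STATES = {"b", "c", "d", "f", "g", "h"}
AGENT = {"b": {"a": "c", "a'": "f"}, "c": {"a": "d"},
         "d": {"a": "h"}, "f": {"a": "h"}}
EXO = {"f": {"g"}}

policy, level = solve_k_maintain(STATES, AGENT, EXO, {"b"}, {"h"}, 3)
print(sorted(policy.items()))  # [('b', 'a'), ('c', 'a'), ('d', 'a')]
print(sorted(level.items()))   # [('b', 3), ('c', 2), ('d', 1), ('h', 0)]
```

On these assumed edges the computed levels agree with L_M(b) = 3, L_M(c) = 2, and L_M(d) = 1 from the worked example, and with k = 2 the function returns None.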

29
Polynomial time generation of control policy and
maximal control policy

Horn satisfiability is a well-known polynomial-time
problem.
Theorem: Under deterministic state transitions,
the problem k-MAINTAIN is solvable in polynomial
time.
Maximal Control:
Each satisfiable Horn theory T has a least
model, M_T, which is given by the intersection of
all its models.
M_T is computable in time linear in the size of
the encoding.
M_T leads to a maximal control, in the sense that
it works on a greatest set S' of states w.r.t.
E such that S is a subset of S'.
I.e., it is robust with respect to increasing S.

30
Dealing with non-deterministic transition
functions

Notation: s_{a,i}, i > 0, denotes that there is
an a-path from s to E of length at most i
starting with action a.
The encoding sat'(I) again has groups (0)-(5) of
clauses, as follows:
(0), (1), (4), and (5) are the same as in sat(I).
(2) For any states s and t such that t ∈ Φ(s, a)
for some action a ∈ exo(s):
s_k ⇒ t_k

31
Dealing with non-deterministic transition
functions (cont.)

(3) For every state s not in E and for all i, 1
≤ i ≤ k:
(3.1) s_i ⇒ ⋁_{a ∈ A_ag ∩ poss(s)} s_{a,i}
(3.2) for every a ∈ A_ag ∩ poss(s) and t ∈ Φ(s, a):
s_{a,i} ⇒ t_{i-1}
(3.3) for every a ∈ A_ag ∩ poss(s), if i < k:
s_{a,i} ⇒ s_{a,i+1}
Leading to a Horn theory!
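
A hedged Python sketch of the non-deterministic case follows. It is a simplification rather than a literal transcription of sat'(I): the auxiliary atoms s_{a,i} are folded into one check ("every agent action has some bad successor"), which yields the same least model because group (0) already propagates downward. The policy extraction rule, the function name, and both sample inputs are assumptions.

```python
def solve_nondet(states, agent_moves, exo_moves, S, E, k):
    """agent_moves: state -> {agent action: set of successor states}.
    exo_moves: state -> set of states reachable by one exogenous action.
    Returns (policy, level) or None if no k-maintaining control exists."""
    # (s, i) in neg means the negated atom for s_i holds in the least model.
    neg = {(s, 0) for s in states if s not in E}            # group (5)
    changed = True
    while changed:
        changed = False
        for s in states:
            for i in range(k + 1):
                if (s, i) in neg:
                    continue
                by_0 = i < k and (s, i + 1) in neg          # group (0)
                by_2 = i == k and any(                      # group (2)
                    (t, k) in neg for t in exo_moves.get(s, ()))
                # Groups (3.1)-(3.3) folded: each action has a bad successor.
                by_3 = i >= 1 and s not in E and all(
                    any((t, i - 1) in neg for t in succs)
                    for succs in agent_moves.get(s, {}).values())
                if by_0 or by_2 or by_3:
                    neg.add((s, i))
                    changed = True
    if any((s, 0 if s in E else k) in neg for s in S):      # groups (1), (4)
        return None
    level = {s: min(i for i in range(k + 1) if (s, i) not in neg)
             for s in states if (s, k) not in neg}
    policy = {}
    for s in level:
        if s in E:
            continue
        # Assumed rule: pick an action whose successors all have lower level.
        for a, succs in sorted(agent_moves.get(s, {}).items()):
            if all(t in level and level[t] < level[s] for t in succs):
                policy[s] = a
                break
    return policy, level


STATES = {"b", "c", "d", "f", "g", "h"}
EXO = {"f": {"g"}}
# Hypothetical input 1: a at b may lead to c or to the dead end g.
BAD = {"b": {"a": {"c", "g"}, "a'": {"f"}}, "c": {"a": {"d"}},
       "d": {"a": {"h"}}, "f": {"a": {"h"}}}
# Hypothetical input 2: a at b may lead to c or directly to d.
GOOD = {"b": {"a": {"c", "d"}, "a'": {"f"}}, "c": {"a": {"d"}},
        "d": {"a": {"h"}}, "f": {"a": {"h"}}}

print(solve_nondet(STATES, BAD, EXO, {"b"}, {"h"}, 3))  # None
print(sorted(solve_nondet(STATES, GOOD, EXO, {"b"}, {"h"}, 3)[0].items()))
# [('b', 'a'), ('c', 'a'), ('d', 'a')]
```

In the first input every agent action at b risks reaching g, so no control exists. In the second, both outcomes of a stay within reach of h.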

32
Direct algorithm using counters

Idea (on the reversed atoms, where x' stands for ¬x):
c[s] = i means s_0', ..., s_i' are true, and c[s_a] = i
means s_{a,0}', ..., s_{a,i}' are true.
Initialization:
For all states s not in E, make s_0' true:
c[s] = 0.
For all states s not in E without any outgoing
edges labeled with agent actions, make s_0', ..., s_k'
true: c[s] = k.
For all states s, if agent action a is not
executable in s, then make s_{a,0}', ..., s_{a,k}' true:
c[s_a] = k.
The other steps are similar.
The idea can then be extended to actions with
durations (or costs).

33
Computational Complexity

k-maintainability is PTIME-complete (under
log-space reductions).
PTIME-hardness holds for 1-maintainability, even
if all actions are deterministic and there is
only one deterministic exogenous action.
k-maintainability is EXPTIME-complete when we
have a compact representation (e.g., STRIPS-like).
EXPTIME-hardness holds for 1-maintainability,
even if all actions are deterministic and there
is only one deterministic exogenous action.

34
Conclusion

k-maintainability is an important notion.
Most specifications over infinite trajectories
would be better off with k-maintainability-like
notions as part of the specification.
Role 1 of k: length of the window of opportunity.
Role 2 of k: bound within which maintenance is
guaranteed.
k-maintainability is related to Dijkstra's notion
of self-stabilization.
There is a big research community working on
self-stabilization in distributed control and
fault tolerance.
But it has not focused much on automatic
generation of control (protocol, in their
parlance).
It has focused more on proving the correctness of
handwritten protocols.
SAT encoding to Horn logic program encoding: an
interesting and fruitful approach to designing a
polynomial algorithm.
One does not often think in terms of negative
propositions.
We have a prototype implementation using DLV.