A polynomial time algorithm for constructing k-maintainable policies

1
A polynomial time algorithm for constructing
k-maintainable policies
  • Chitta Baral
  • Arizona State University
  • and
  • Thomas Eiter
  • Vienna University of Technology

2
Motivation: What is "maintain f"?
  • Always f, also written as $\Box f$
  • - too strong for many kinds of maintainability
    (e.g., maintain the room clean)
  • Always Eventually f, also written as $\Box \Diamond f$.
  • - Weak in the sense that it does not give an
    estimate of when f will be made true.
  • - May not be achievable in the presence of
    continuous interference by belligerent agents.
  • $\Box f$ ------------------ $\Box \Diamond_k f$ ------------------ $\Box \Diamond f$
  • $\Box \Diamond_3 f$ is a shorthand for $\Box ( f \lor \bigcirc f \lor \bigcirc\bigcirc f \lor \bigcirc\bigcirc\bigcirc f )$
  • But if an external agent keeps interfering, how is
    one supposed to guarantee $\Box \Diamond_3 f$?
  • k-maintain f: If there is a break from the
    environment for k steps, then during that break the
    agent will reach a state where f is true.

3
Motivation: a controller-agent transcript
  • Controller (to the agent/robot): Your goal is to
    maintain the room clean.
  • Robot/Agent: Can you be precise about what you
    mean by "maintain"? Also, can I clean anytime or
    are there restrictions?
  • Controller: You can only clean when the room is
    unoccupied.
  • Controller: By "maintain" I mean ALWAYS clean.
  • Robot/Agent: I won't be able to guarantee that.
    What if someone makes the room dirty while it is
    occupied?
  • Controller: OK, I understand. How about ALWAYS
    EVENTUALLY clean?
  • Controller's Boss: "Eventually" is too lenient.
    We can't have the room unclean for too long. We
    should put some bound.

4
Controller-agent transcript (cont.)
  • Controller: Sorry, Sir. I should have made it
    more precise.
  • ALWAYS EVENTUALLY_3 clean
  • Robot/Agent: Sorry. I can guarantee neither
    ALWAYS EVENTUALLY clean nor ALWAYS
    EVENTUALLY_3 clean.
  • What if the room is continuously being used? You
    told me I cannot clean while it is being used.
  • Controller: You have a good point. Let me clarify
    again.
  • If you are given an opportunity of 3 units of
    time without the room being occupied (i.e.,
    without any interference from external agents),
    then you should have the room clean during that
    time.
  • Robot/Agent: I think I understand you. But as you
    know, I am a robot and not that good at
    understanding English. Can you please state it in
    a precise language?

5
Formulating k-maintainability: a system
  • A system is a quadruple $\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$, where
  • $\mathcal{S}$ is the set of system states
  • $A$ is the set of actions, which is the union of
    the set of the agent's actions, $A_{ag}$, and the set of
    environmental actions, $A_{env}$
  • $\Phi : \mathcal{S} \times A \to 2^{\mathcal{S}}$ is a non-deterministic
    transition function that specifies how the state
    of the world changes in response to actions
  • $poss : \mathcal{S} \to 2^{A}$ is a function that describes
    which actions are possible (by the agent or the
    environment) in which states.

6
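[Figure: transition diagram of the example system over states b, c, d, f, g, h]
$\mathcal{S} = \{b, c, d, f, g, h\}$, $A = \{a, a', e\}$, $A_{ag} = \{a, a'\}$, $A_{env} = \{e\}$
$\Phi$: as shown in the picture
$poss(b) = \{a\}$ when our policy dictates a to be executed at b.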
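A minimal Python sketch of how such a system might be written down. The edge list is an assumption: the picture did not survive in this transcript, so the transitions are inferred from the worked examples on the later slides (reachable sets, closures, and the Horn derivation). The names `PHI`, `poss`, `A_AG`, and `A_ENV` are illustrative, and `poss` is assumed to be "every action with at least one successor".

```python
from typing import Dict, FrozenSet, Set, Tuple

State = str
Action = str

STATES: Set[State] = {"b", "c", "d", "f", "g", "h"}
A_AG: Set[Action] = {"a", "a'"}   # agent actions
A_ENV: Set[Action] = {"e"}        # environmental actions
ACTIONS: Set[Action] = A_AG | A_ENV

# Phi : S x A -> 2^S, stored sparsely (a missing key means "no successor").
# ASSUMPTION: edges inferred from the slides' worked examples.
PHI: Dict[Tuple[State, Action], FrozenSet[State]] = {
    ("b", "a"): frozenset({"c"}),
    ("b", "a'"): frozenset({"f"}),
    ("c", "a"): frozenset({"d"}),
    ("d", "a"): frozenset({"h"}),
    ("f", "a"): frozenset({"h"}),
    ("f", "e"): frozenset({"g"}),
}


def poss(s: State) -> Set[Action]:
    """Actions possible in s: here, those with at least one successor."""
    return {a for (t, a), succ in PHI.items() if t == s and succ}
```

Storing $\Phi$ as a sparse dictionary keeps the non-deterministic case open: a value with more than one element models an action with several possible outcomes.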
7
Controls and super-controls
  • Given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$ and a set $A_{ag} \subseteq A$ of agent actions,
  • a control policy for $\mathcal{A}$ w.r.t. $A_{ag}$ is a partial
    function $K : \mathcal{S} \to A_{ag}$ such that $K(s) \in poss(s)$
    whenever $K(s)$ is defined.
  • a super-control policy for $\mathcal{A}$ w.r.t. $A_{ag}$ is a
    partial function $K : \mathcal{S} \to 2^{A_{ag}}$ such that
    $K(s) \subseteq poss(s)$ and $K(s) \neq \emptyset$ whenever $K(s)$ is defined.
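The two definitions can be checked mechanically. The sketch below represents a partial function as a dictionary whose keys are the states where it is defined. The `POSS` table is an assumption for the running example, inferred from the slides, and the function names are illustrative.

```python
from typing import Dict, Set

A_AG: Set[str] = {"a", "a'"}  # agent actions

# ASSUMPTION: poss for the running example, inferred from the slides.
POSS: Dict[str, Set[str]] = {
    "b": {"a", "a'"}, "c": {"a"}, "d": {"a"},
    "f": {"a", "e"}, "g": set(), "h": set(),
}


def is_control(K: Dict[str, str]) -> bool:
    """Partial map S -> A_ag with K(s) in poss(s) wherever K is defined."""
    return all(a in A_AG and a in POSS[s] for s, a in K.items())


def is_super_control(K: Dict[str, Set[str]]) -> bool:
    """Partial map S -> 2^A_ag with a non-empty K(s) inside poss(s)."""
    return all(
        bool(acts) and acts <= A_AG and acts <= POSS[s]
        for s, acts in K.items()
    )
```

For instance, `{"f": "e"}` is rejected as a control because `e` is an environmental action, even though it is possible in `f`.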

8
Reachable states and closure
  • Reachable states $R(\mathcal{A}, s)$ from an individual state s
  • Given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$ and a state
    s, $R(\mathcal{A}, s)$ is the smallest set of states that
    satisfies the following conditions:
  • (i) $s \in R(\mathcal{A}, s)$, and
  • (ii) if $s' \in R(\mathcal{A}, s)$ and $a \in poss(s')$,
    then $\Phi(s', a) \subseteq R(\mathcal{A}, s)$.
  • $Closure(S, \mathcal{A})$ of a set of states $S$
  • Let $\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$ be a system and let $S \subseteq \mathcal{S}$.
    Then the closure of $\mathcal{A}$ w.r.t. $S$, denoted by $Closure(S, \mathcal{A})$, is defined by
    $Closure(S, \mathcal{A}) = \bigcup_{s \in S} R(\mathcal{A}, s)$.

9
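[Figure: transition diagram of the example system over states b, c, d, f, g, h]
$\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$
$R(\mathcal{A}, d) = \{d, h\}$, $R(\mathcal{A}, f) = \{f, g, h\}$
$Closure(\{d, f\}, \mathcal{A}) = \{d, f, g, h\}$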
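Reachability and closure amount to a graph search. The following sketch uses breadth-first search over the example system. The edge list is an assumption inferred from the values quoted on the slides, and `reachable` and `closure` are illustrative names.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# ASSUMPTION: transitions of the example system, inferred from the slides.
PHI: Dict[Tuple[str, str], Set[str]] = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}


def poss(s: str) -> Set[str]:
    """Actions (agent or environment) that have a transition from s."""
    return {a for (t, a) in PHI if t == s}


def reachable(s: str) -> Set[str]:
    """Smallest set containing s and closed under all possible actions."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for a in poss(u):
            for t in PHI[(u, a)]:
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen


def closure(states: Iterable[str]) -> Set[str]:
    """Union of the reachable sets of the given states."""
    result: Set[str] = set()
    for s in states:
        result |= reachable(s)
    return result
```

On this edge list, `reachable("d")`, `reachable("f")`, and `closure({"d", "f"})` reproduce the three sets stated on the slide.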
10
$Unfold_k(s, \mathcal{A}, K)$
  • An element of $Unfold_k(s, \mathcal{A}, K)$ is a sequence of
    states of length at most $k + 1$ that the system
    may go through if it follows the control K
    starting from the state s.

11
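[Figure: the example system over states b, c, d, f, g, h, with one additional edge labeled a]
Consider policy K: do action a in states b, c, and d.
$Unfold_3(b, \mathcal{A}, K) = \{\langle b, c, d, h \rangle, \langle b, g \rangle\}$
$Unfold_3(c, \mathcal{A}, K) = \{\langle c, d, h \rangle\}$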
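A sketch of how the unfolding might be computed. Two assumptions are made here. First, a sequence is extended until it has k transitions or reaches a state where K is undefined, which is one reading of "may go through if it follows the control K". Second, in this variant of the example, action a at b may lead to either c or g, which is inferred from the sequence ending in g listed above.

```python
from typing import Dict, List, Set, Tuple

# ASSUMPTION: agent transitions of this variant; a at b is non-deterministic.
PHI: Dict[Tuple[str, str], Set[str]] = {
    ("b", "a"): {"c", "g"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
}


def unfold(k: int, s: str, K: Dict[str, str]) -> Set[Tuple[str, ...]]:
    """All maximal state sequences of at most k steps that follow K from s."""
    results: Set[Tuple[str, ...]] = set()

    def extend(seq: List[str]) -> None:
        last = seq[-1]
        successors = PHI.get((last, K[last]), set()) if last in K else set()
        # Stop after k transitions, or where the control gives no next step.
        if len(seq) - 1 == k or not successors:
            results.add(tuple(seq))
            return
        for t in sorted(successors):
            extend(seq + [t])

    extend([s])
    return results


K = {"b": "a", "c": "a", "d": "a"}
```

Only the control's own actions are followed here. Exogenous actions are deliberately ignored, since the unfolding describes what happens during a break from the environment.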
12
Definition of k-maintainability: the parameters
  • 1. a system $\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$,
  • 2. a set $A_{ag} \subseteq A$ of agent actions,
  • 3. a set of initial states $S$,
  • 4. a set of desired states $E$ that we want to
    maintain,
  • 5. a maintainability parameter k,
  • 6. a function $exo : \mathcal{S} \to 2^{A_{env}}$ detailing
    exogenous actions, such that $exo(s) \subseteq poss(s)$, and
  • 7. a control K (mapping a relevant part of $\mathcal{S}$ to
    $A_{ag}$) such that $K(s) \in poss(s)$.

13
Basic Idea
  • Ignoring interference
  • From any state under consideration by following
    the control policy one should visit E in k steps.
  • Accounting for interference
  • Broaden the states under consideration from the
    initial states to all reachable states due to
    control and the environment. (Use Closure.)
  • When using Closure
  • Account for the control policy.
  • Ignore other agent actions.
  • Also only consider exogenous actions in exo(s).

14
Definition of k-maintainability
  • $poss_{K,exo}(s)$ is the set $\{K(s)\} \cup exo(s)$.
  • $\mathcal{A}_{K,exo} = (\mathcal{S}, A, \Phi, poss_{K,exo})$
  • Given a system $\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$, a set of agent
    actions $A_{ag} \subseteq A$, and a specification of
    exogenous action occurrence exo, we say that a
    control K for $\mathcal{A}$ w.r.t. $A_{ag}$ k-maintains $S \subseteq \mathcal{S}$
    with respect to $E \subseteq \mathcal{S}$, where $k \geq 0$,
    if
  • - for each state $s \in Closure(S, \mathcal{A}_{K,exo})$ and each
    sequence $\sigma = s_0, s_1, \ldots, s_r$ in $Unfold_k(s, \mathcal{A}, K)$
    with $s_0 = s$, it holds that
  • $\{s_0, s_1, \ldots, s_r\} \cap E \neq \emptyset$.
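The definition can be turned into a direct, brute-force check. The sketch below assumes the example system's edges and `exo` (both inferred from the slides, with `exo(f) = {e}` and no other exogenous actions), and assumes that an unfolding stops early only where K is undefined. All function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

# ASSUMPTION: example transitions and exogenous actions inferred from the slides.
PHI: Dict[Tuple[str, str], Set[str]] = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
EXO: Dict[str, Set[str]] = {"f": {"e"}}


def closure_under(init: Set[str], K: Dict[str, str]) -> Set[str]:
    """Closure(S, A_{K,exo}): follow only K(s) and the actions in exo(s)."""
    seen, stack = set(init), list(init)
    while stack:
        u = stack.pop()
        allowed = set(EXO.get(u, set()))
        if u in K:
            allowed.add(K[u])
        for a in allowed:
            for t in PHI.get((u, a), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def unfold(k: int, s: str, K: Dict[str, str]) -> Set[Tuple[str, ...]]:
    """Maximal sequences of at most k control steps starting from s."""
    results: Set[Tuple[str, ...]] = set()

    def extend(seq: List[str]) -> None:
        last = seq[-1]
        successors = PHI.get((last, K[last]), set()) if last in K else set()
        if len(seq) - 1 == k or not successors:
            results.add(tuple(seq))
            return
        for t in successors:
            extend(seq + [t])

    extend([s])
    return results


def k_maintains(K: Dict[str, str], k: int, init: Set[str], goal: Set[str]) -> bool:
    """Every unfolding from every state in the closure must visit the goal."""
    return all(
        bool(set(seq) & goal)
        for s in closure_under(init, K)
        for seq in unfold(k, s, K)
    )
```

With the control that does a in b, c, and d, the check succeeds for k = 3 and fails for k = 2. A control that picks a' at b fails because the closure then includes g, from which h cannot be reached.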

15
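[Figure: transition diagram of the example system over states b, c, d, f, g, h]
Consider policy K: do action a in states b, c, and d.
$poss(b) = \{a, a'\}$, $poss_{K,exo}(b) = \{a\}$
$Closure(\{b, c\}, \mathcal{A}) = \{b, c, d, f, g, h\}$
$Closure(\{b, c\}, \mathcal{A}_{K,exo}) = \{b, c, d, h\}$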
16
[Figure: transition diagram of the example system over states b, c, d, f, g, h]
Goal: a 3-maintainable policy for $S = \{b\}$ w.r.t. $E = \{h\}$.
Such a policy: do a in b, c, and d.
17
[Figure: the example system over states b, c, d, f, g, h, with an additional edge labeled e]
Goal: find a 3-maintainable policy for $S = \{b\}$ w.r.t. $E = \{h\}$.
No such policy!
18
Constructing k-maintainable control policies:
pre-formulation attempts
  • Handwritten policies: subsumption architecture,
    RAPs, situation control rules, protocols.
  • Our initial motivation for formulating
    maintainability arose when we tried to formalize
    what a control module was doing.
  • Kaelbling and Rosenschein 1991: In the control
    rule "if condition c is satisfied then do action
    a", the action a is the action that leads to the
    goal from any state where the condition c is
    satisfied.

19
[Figure: transition diagram of the example system over states b, c, d, f, g, h]
Forward search: if we use minimal paths or minimal-cost paths, we might pick a', and then we would have to backtrack.
Backward search: should we include both d and f?
20
Propositional Encoding of solutions
  • Input: An input I is a system $\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$,
    a set of goal states $E \subseteq \mathcal{S}$, a set of initial states
    $S \subseteq \mathcal{S}$, a set $A_{ag} \subseteq A$, a function exo, and an
    integer $k \geq 0$.
  • Output: A control K such that $S$ is k-maintainable
    with respect to E (using the control K), if such
    a control exists. Otherwise the output is NO.
  • AIM: Given input I, construct sat(I) in PTIME
    s.t.
  • sat(I) is satisfiable if and only if the input I
    allows for a k-maintainable control,
  • satisfying assignments for sat(I) encode possible
    such controls, and
  • sat(I) is polynomially solvable.

21
Propositional encoding notation
  • $s_i$ denotes that
  • there is a path from state s to some state in E
    using only agent actions and at most i of them
  • (to which we refer as "there is an a-path from
    s to E of length at most i").

22
The encoding sat(I)
  • (0) For all states s and for all j, $0 \leq j < k$: $s_j \Rightarrow s_{j+1}$
  • (1) For all initial states s in E: $s_0$
  • (2) For all states s, t such that $\Phi(s, a) = t$ for
    some action $a \in exo(s)$: $s_k \Rightarrow t_k$
  • (3) For all states s not in E and all i, $1 \leq i \leq k$:
  • $s_i \Rightarrow \bigvee_{t \in PS(s)} t_{i-1}$,
  • where $PS(s) = \{ t \in \mathcal{S} \mid \exists a \in A_{ag} \cap poss(s) : t = \Phi(s, a) \}$
  • (4) For all initial states s not in E: $s_k$
  • (5) For all states s not in E: $\neg s_0$
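To make the clause groups concrete, the sketch below generates them for the running example and checks one assignment against them. The edge list and `exo` are assumptions inferred from the slides. The clause representation (a pair of premise and conclusion lists) and the names `sat_clauses` and `satisfies` are illustrative, not the authors' implementation.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, int]                   # (s, i) stands for the proposition s_i
Clause = Tuple[List[Atom], List[Atom]]   # AND(premises) => OR(conclusions)

# ASSUMPTION: deterministic example system inferred from the slides.
STATES: Set[str] = {"b", "c", "d", "f", "g", "h"}
A_AG: Set[str] = {"a", "a'"}
PHI: Dict[Tuple[str, str], str] = {
    ("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
    ("d", "a"): "h", ("f", "a"): "h", ("f", "e"): "g",
}
EXO: Dict[str, Set[str]] = {"f": {"e"}}


def sat_clauses(init: Set[str], goal: Set[str], k: int) -> Iterator[Clause]:
    for s in sorted(STATES):
        for j in range(k):
            yield [(s, j)], [(s, j + 1)]                      # (0)
        for a in EXO.get(s, set()):
            yield [(s, k)], [(PHI[(s, a)], k)]                # (2)
        if s in goal:
            if s in init:
                yield [], [(s, 0)]                            # (1)
            continue
        ps = sorted({PHI[(s, a)] for a in A_AG if (s, a) in PHI})
        for i in range(1, k + 1):
            yield [(s, i)], [(t, i - 1) for t in ps]          # (3)
        if s in init:
            yield [], [(s, k)]                                # (4)
        yield [(s, 0)], []                                    # (5): not s_0


def satisfies(true_atoms: Set[Atom], clauses: List[Clause]) -> bool:
    """A clause holds if some premise is false or some conclusion is true."""
    return all(
        not all(p in true_atoms for p in prem)
        or any(c in true_atoms for c in concl)
        for prem, concl in clauses
    )


clauses = list(sat_clauses({"b"}, {"h"}, 3))
# Candidate assignment: s_i is true exactly for the atoms listed here.
model = {("b", 3), ("c", 2), ("c", 3), ("d", 1), ("d", 2), ("d", 3),
         ("h", 0), ("h", 1), ("h", 2), ("h", 3)}
```

The candidate `model` marks, for each of b, c, d, and h, the indices from its distance to h upward. It satisfies every clause, while the all-false assignment violates group (4).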

23
Constructing policies from the models of sat(I)
  • Let M be a model of sat(I).
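  • $C_M = \{ s \in \mathcal{S} \mid M \models s_k \}$
  • $L_M(s)$ = the smallest index j such that $M \models s_j$
    (i.e., $s_0, s_1, \ldots, s_{j-1}$ are false and $s_j$ is true)
  • K(s) is defined iff $s \in C_M \setminus E$, and
  • $K(s) \in \{ a \in A_{ag} \mid \Phi(s, a) = t,\ t \in C_M,\ L_M(t) < L_M(s) \}$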

24
Proposition
  • Let I consist of a system $\mathcal{A} = (\mathcal{S}, A, \Phi, poss)$,
    where $\Phi$ is deterministic, a set $A_{ag} \subseteq A$, sets of
    states $E \subseteq \mathcal{S}$ and $S \subseteq \mathcal{S}$, an exogenous function
    exo, and an integer k. Then,
  • (i) $S$ is k-maintainable w.r.t. E iff sat(I) is
    satisfiable.
  • (ii) Given any model M of sat(I), any control K
    constructed by the algorithm above k-maintains
    $S$ w.r.t. E.

25
Reverse Encoding
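  • $a \Rightarrow b$ is equivalent to
  • $\neg a \lor b$ is equivalent to
  • $\neg(\neg b) \lor \neg a$ is equivalent to
  • $\neg b \Rightarrow \neg a$ is equivalent to
  • $b' \Rightarrow a'$ (writing $x'$ for $\neg x$) is equivalent to
  • $a' \Leftarrow b'$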
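The chain of equivalences can be confirmed with a four-row truth table. This is a small sanity check, with `implies` and `forms` as illustrative helper names.

```python
from itertools import product
from typing import List


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def forms(a: bool, b: bool) -> List[bool]:
    a_neg, b_neg = not a, not b            # the primed atoms a' and b'
    return [
        implies(a, b),                     # a => b
        (not a) or b,                      # not a, or b
        (not (not b)) or (not a),          # not (not b), or not a
        implies(not b, not a),             # not b => not a
        implies(b_neg, a_neg),             # b' => a', i.e. a' <= b'
    ]


# Every form must take the same truth value under every assignment.
ALL_AGREE = all(
    len(set(forms(a, b))) == 1
    for a, b in product([False, True], repeat=2)
)
```

Because each implication in sat(I) has a single atom as premise, reversing it this way yields a rule with a single positive head over the primed atoms, which is what makes the rearranged theory Horn.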

26
Rearranging sat(I) to Horn
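  • (0) For all states s and for all j, $0 \leq j < k$:
  • $s_j \Rightarrow s_{j+1}$ becomes $s_j' \Leftarrow s_{j+1}'$
  • (1) For all initial states s in E:
  • $s_0$ becomes $\Leftarrow s_0'$
  • (2) For all states s, t such that $\Phi(s, a) = t$ for
    some action $a \in exo(s)$:
  • $s_k \Rightarrow t_k$ becomes $s_k' \Leftarrow t_k'$
  • (3) For all states s not in E and all i, $1 \leq i \leq k$:
  • $s_i \Rightarrow \bigvee_{t \in PS(s)} t_{i-1}$ becomes $s_i' \Leftarrow \bigwedge_{t \in PS(s)} t_{i-1}'$
  • where
  • $PS(s) = \{ t \in \mathcal{S} \mid \exists a \in A_{ag} \cap poss(s) : t = \Phi(s, a) \}$
  • (4) For all initial states s not in E:
  • $s_k$ becomes $\Leftarrow s_k'$
  • (5) For all states s not in E:
  • $\neg s_0$ becomes $s_0'$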

27
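[Figure: transition diagram of the example system over states b, c, d, f, g, h]
(6) $b_0', c_0', d_0', f_0', g_0'$ (from 5)
(7) $g_1', g_2', g_3'$ (from 3)
(8) $b_1', c_1'$ (from 6 and 3)
(9) $f_3'$ (from 7 and 2)
(10) $f_2'$ (from 9 and 0)
(11) $f_1'$ (from 10 and 0)
(12) $b_2'$ (from 8, 11, and 3)
Thus $M = \{f_3', f_2', f_1', f_0', g_3', g_2', g_1', g_0', b_2', b_1', b_0', c_1', c_0', d_0'\}$
$L_M(b) = 3$, $L_M(c) = 2$, $L_M(d) = 1$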
28
Big picture of the algorithm summary
  • Initialization about states not in E (5) and
    states with no agent transitions to compute $s_i$
    (3).
  • Backward reasoning from there using (2) and (3)
    and downward propagation using (0).
  • Use (1) and (4) for inconsistency detection.
  • Computation of $L_M(s)$.
  • Use $L_M(s)$ to compute the control K(s).
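The steps above can be sketched end to end for deterministic transitions. This is a naive fixpoint loop over the Horn rules, not the linear-time Horn algorithm, and it is not the authors' implementation. The example edges and `exo` are assumptions inferred from the slides, an atom `(s, i)` stands for the primed proposition $s_i'$ ("no a-path from s to E of length at most i"), and `solve` is an illustrative name.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, int]  # (s, i) stands for the primed atom s_i'

# ASSUMPTION: deterministic example system inferred from the slides.
STATES: Set[str] = {"b", "c", "d", "f", "g", "h"}
A_AG: Set[str] = {"a", "a'"}
PHI: Dict[Tuple[str, str], str] = {
    ("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
    ("d", "a"): "h", ("f", "a"): "h", ("f", "e"): "g",
}
EXO: Dict[str, Set[str]] = {"f": {"e"}}


def solve(init: Set[str], goal: Set[str], k: int) -> Optional[Dict[str, str]]:
    """Return a control that k-maintains init w.r.t. goal, or None."""
    rules: List[Tuple[Atom, List[Atom]]] = []          # head <= AND(body)
    for s in sorted(STATES):
        rules += [((s, j), [(s, j + 1)]) for j in range(k)]              # (0)
        rules += [((s, k), [(PHI[(s, a)], k)])
                  for a in EXO.get(s, set())]                            # (2)
        if s not in goal:
            rules.append(((s, 0), []))                                   # (5)
            ps = {PHI[(s, a)] for a in A_AG if (s, a) in PHI}
            rules += [((s, i), [(t, i - 1) for t in ps])
                      for i in range(1, k + 1)]                          # (3)

    derived: Set[Atom] = set()                         # least model
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(x in derived for x in body):
                derived.add(head)
                changed = True

    # Inconsistency detection with the constraints (1) and (4).
    if any((s, 0) in derived for s in init & goal):
        return None
    if any((s, k) in derived for s in init - goal):
        return None

    # L_M(s): smallest j with s_j true, i.e. s_j' not derived.
    level = {s: min(j for j in range(k + 1) if (s, j) not in derived)
             for s in STATES if (s, k) not in derived}
    control: Dict[str, str] = {}
    for s in sorted(level):
        if s in goal:
            continue
        for a in sorted(A_AG):
            t = PHI.get((s, a))
            if t in level and level[t] < level[s]:
                control[s] = a
                break
    return control
```

For the initial state b, the goal h, and k = 3, this yields the control "do a in b, c, and d". For k = 2 the constraint from group (4) is violated and no control is returned.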

29
Polynomial time generation of control policy and
maximal control policy
  • Horn satisfiability is a well-known polynomial-time
    problem.
  • Theorem: Under deterministic state transitions,
    problem k-MAINTAIN is solvable in polynomial
    time.
  • Maximal Control
  • Each satisfiable Horn theory T has a least
    model, $M_T$, which is given by the intersection of
    all its models.
  • $M_T$ is computable in linear time in the size of
    the encoding.
  • $M_T$ leads to a maximal control, in the sense that
    it works on a greatest set $S'$ of states w.r.t.
    E such that $S$ is a subset of $S'$.
  • I.e., it is robust with respect to increasing $S$.

30
Dealing with non-deterministic transition
functions
  • Notation: $s\_a_i$, $i > 0$, will denote that there is
    an a-path from s to E of length at most i
    starting with action a.
  • The encoding sat'(I) again has groups (0)-(5) of
    clauses, as follows:
  • (0), (1), (4) and (5) are the same as in sat(I).
  • (2) For any states s and t such that $t \in \Phi(s, a)$
    for some action $a \in exo(s)$:
  • $s_k \Rightarrow t_k$

31
Dealing with non-deterministic transition
functions (cont.)
  • (3) For every state s not in E and for all i, $1 \leq i \leq k$:
  • (3.1) $s_i \Rightarrow \bigvee_{a \in A_{ag} \cap poss(s)} s\_a_i$
  • (3.2) for every $a \in A_{ag} \cap poss(s)$ and $t \in \Phi(s, a)$:
  • $s\_a_i \Rightarrow t_{i-1}$
  • (3.3) for every $a \in A_{ag} \cap poss(s)$, if $i < k$:
  • $s\_a_i \Rightarrow s\_a_{i+1}$
  • Leading to a Horn theory!
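A sketch of how sat'(I) might be solved once reversed into Horn form over primed atoms: (3.1) becomes a rule whose body ranges over the actions, and (3.2) becomes one rule per possible successor. Everything concrete here is an assumption: the example is the variant in which a at b may lead to c or g, the naive fixpoint loop stands in for a proper Horn solver, and the atom encoding and the name `maintainable` are illustrative.

```python
from typing import Dict, Hashable, List, Set, Tuple

# ASSUMPTION: non-deterministic variant of the example (a at b leads to c or g).
STATES: Set[str] = {"b", "c", "d", "f", "g", "h"}
A_AG: Set[str] = {"a", "a'"}
PHI: Dict[Tuple[str, str], Set[str]] = {
    ("b", "a"): {"c", "g"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
EXO: Dict[str, Set[str]] = {"f": {"e"}}

# (s, i) is the primed atom s_i'; ((s, a), i) is the primed atom s_a_i'.
Atom = Tuple[Hashable, int]


def maintainable(init: Set[str], goal: Set[str], k: int) -> bool:
    rules: List[Tuple[Atom, List[Atom]]] = []          # head <= AND(body)
    for s in sorted(STATES):
        for j in range(k):
            rules.append(((s, j), [(s, j + 1)]))                   # (0)
        for a in EXO.get(s, set()):
            for t in PHI[(s, a)]:
                rules.append(((s, k), [(t, k)]))                   # (2)
        if s in goal:
            continue
        rules.append(((s, 0), []))                                 # (5)
        acts = sorted(a for a in A_AG if (s, a) in PHI)
        for i in range(1, k + 1):
            rules.append(((s, i), [((s, a), i) for a in acts]))    # (3.1)
            for a in acts:
                for t in PHI[(s, a)]:
                    rules.append((((s, a), i), [(t, i - 1)]))      # (3.2)
                if i < k:
                    rules.append((((s, a), i), [((s, a), i + 1)])) # (3.3)

    derived: Set[Atom] = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(x in derived for x in body):
                derived.add(head)
                changed = True

    violated = any((s, 0) in derived for s in init & goal)         # (1)
    violated = violated or any((s, k) in derived for s in init - goal)  # (4)
    return not violated
```

In this variant, b is not 3-maintainable w.r.t. h, because action a may land in g and a' leads to f, where the environment can move to g. State c remains 3-maintainable.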

32
Direct algorithm using counters
  • Idea: $c_s = i$ means $s_0', \ldots, s_i'$, and $c_{s\_a} = i$
    means $s\_a_0', \ldots, s\_a_i'$
  • Initialization
  • For all states s not in E, make $s_0'$ true:
    $c_s = 0$.
  • For all states s not in E without any outgoing
    edges labeled with agent actions, make $s_0', \ldots, s_k'$
    true: $c_s = k$.
  • For all states s, if agent action a is not
    executable in s, then make $s\_a_0', \ldots, s\_a_k'$ true:
    $c_{s\_a} = k$.
  • The other steps are similar.
  • The idea can then be extended to actions with
    durations (or costs).

33
Computational Complexity
  • k-maintainability is PTIME-complete (under
    log-space reductions).
  • PTIME-hardness holds for 1-maintainability, even
    if all actions are deterministic and there is
    only one deterministic exogenous action.
  • k-maintainability is EXPTIME-complete when we
    have a compact representation (e.g., STRIPS-like).
  • EXPTIME-hardness holds for 1-maintainability,
    even if all actions are deterministic and there
    is only one deterministic exogenous action.

34
Conclusion
  • k-maintainability is an important notion.
  • Most specifications over infinite trajectories
    would be better off with k-maintainability like
    notions as part of the specification.
  • Role 1 of k: length of the window of opportunity
  • Role 2 of k: bound within which maintenance is
    guaranteed
  • k-maintainability is related to Dijkstra's notion
    of self-stabilization.
  • There is a big research community of
    self-stabilization in distributed control and
    fault tolerance.
  • But they have not focused much on the automatic
    generation of control (protocol, in their
    parlance).
  • They have focused more on proving the correctness of
    hand-written protocols.
  • SAT encoding to Horn logic program encoding: an
    interesting and fruitful approach to designing a
    polynomial algorithm.
  • One does not often think in terms of negative
    propositions.
  • We have a prototype implementation using DLV.

35
THANK YOU!