Markov Decision Processes: Approximate Equivalence

1
Markov Decision Processes Approximate Equivalence
  • Michel de Rougemont
  • Université Paris II, LRI
  • http://www.lri.fr/mdr/

2
The world of MDPs
  • Follow-up of "On the complexity of partially
    observed Markov decision processes", 1996, D.
    Burago, A. Slissenko, M. de Rougemont
  • What is robustness? Deviation model in the 1990s.
  • Distance on runs in the 2000s
  • Efficient computation of the distance from a run to an MDP
  • Approximate Comparison of MDPs
  • Statistics Analysis of Probabilistic Processes
    (LICS 2009, with Mathieu Tracol)

3
M.D.P
  • S: states s, t, u, v
  • Σ: actions a, b, c
  • P(u | t, b) > 0
  • A policy σ resolves the
    non-determinism.
  • Example: σ(t) = b, σ(v) = c
  • Run: s, t, a, u, a, v
  • Trace: aba
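As an illustration of these definitions, here is a minimal Python sketch that samples a run of a small MDP under a stationary deterministic policy. The transition probabilities are not recoverable from the slide, so the numbers in `P` are hypothetical, as are the helper names `sample_run` and `trace`. We also assume that the trace of a run is its word of actions, which may not be exactly the convention used in the talk.

```python
import random

# Hypothetical MDP with made-up probabilities (not taken from the slide).
# P[state][action] maps each successor state to its probability.
P = {
    "s": {"a": {"t": 0.5, "u": 0.5}},
    "t": {"a": {"u": 1.0}, "b": {"u": 0.3, "v": 0.7}},
    "u": {"a": {"v": 1.0}},
    "v": {"a": {"s": 1.0}, "c": {"s": 0.5, "v": 0.5}},
}

# Stationary deterministic (SD) policy: sigma(t) = b and sigma(v) = c as on
# the slide; s and u have a single enabled action, so their choice is forced.
sigma = {"s": "a", "t": "b", "u": "a", "v": "c"}


def sample_run(P, sigma, start, steps, rng):
    """Return a run [s0, a0, s1, a1, ..., s_steps] under the SD policy sigma."""
    run = [start]
    state = start
    for _ in range(steps):
        action = sigma[state]  # the policy resolves the non-determinism
        successors = P[state][action]
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        run += [action, state]
    return run


def trace(run):
    """Assumed convention: the trace is the word of actions (odd positions)."""
    return "".join(run[1::2])


run = sample_run(P, sigma, "s", 6, random.Random(0))
print(run, trace(run))
```

Once σ is fixed, the only remaining randomness is in the transitions, so the MDP behaves as a Markov chain.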

4
This talk
  • Approximation of decision problems: Property
    Testing
  • Non-deterministic automata: testers for membership
    and equivalence
  • Markov Decision Processes: testers for the
    existence of strategies and for equivalence

5
1. Testers on a class K
  • Let F be a property on a class K of structures
    U.
  • An ε-tester for F is a probabilistic algorithm
    A such that:
  • If U ⊨ F, A accepts.
  • If U is ε-far from F, A rejects with high
    probability.
  • F is testable if there is a probabilistic
    algorithm A such that:
  • A is an ε-tester for all ε;
  • Time(A) is independent of n = size(U).
  • Robust characterizations of polynomials, R.
    Rubinfeld, M. Sudan, 1994
  • Property Testing and its connection to Learning
    and Approximation, O. Goldreich, S. Goldwasser,
    D. Ron, 1996
  • A tester usually implies a linear-time corrector.
  • (ε1, ε2)-tolerant tester
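To make the definition concrete, the following Python sketch is an ε-tester for a deliberately simple property that is not from the talk: F is the set of binary words containing only 0s, and the distance is the Hamming distance (fraction of positions to change). The name `zero_string_tester` and the choice of ⌈2/ε⌉ samples are our own illustration; what matters is that the number of queries depends on ε only, never on n.

```python
import math
import random


def zero_string_tester(w, eps, rng):
    """One-sided eps-tester for F = {w : w contains only 0s}, Hamming distance.

    If w is in F, it always accepts. If w is eps-far from F (at least
    eps * len(w) positions hold a 1), each sample hits a 1 with probability
    >= eps, so m = ceil(2 / eps) samples all miss with probability
    <= (1 - eps) ** m <= exp(-2) < 0.14.
    """
    m = math.ceil(2 / eps)  # depends on eps only, not on n = len(w)
    for _ in range(m):
        if w[rng.randrange(len(w))] == "1":
            return False  # reject: found a witness that w is not in F
    return True  # accept


rng = random.Random(0)
print(zero_string_tester("0" * 10_000, 0.1, rng))
print(zero_string_tester("1" * 5_000 + "0" * 5_000, 0.1, rng))
```

This tester has one-sided error: a word of F is never rejected.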

6
Edit Distances with Moves on Strings
  • Classical edit distance: insertions, deletions,
    modifications
  • Edit distance with moves: dist(w, w′)
  • w = 0111000011110011001
  • w′ = 0111011110000011001
  • 3. Edit Distance with Moves generalizes to
    Ordered Trees
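The two words of this slide can be checked by hand: cutting the factor 1111 out of w and reinserting it in front of the block 000 yields w′. A small Python sketch of the move operation makes this explicit; the function `move` and its (i, j, p) parameter convention are hypothetical choices of ours.

```python
def move(w, i, j, p):
    """Edit operation 'move': cut the factor w[i:j] and reinsert it at
    position p of the remaining string."""
    block, rest = w[i:j], w[:i] + w[j:]
    return rest[:p] + block + rest[p:]


w = "0111000011110011001"
w_prime = "0111011110000011001"

# Moving "1111" (positions 8..11) in front of "000" (positions 5..7) gives w'.
assert move(w, 8, 12, 5) == w_prime

# Compared position by position, the two words differ in 6 places.
print(sum(x != y for x, y in zip(w, w_prime)))  # -> 6
```

So dist(w, w′) is at most 1 for the edit distance with moves, although the words disagree in 6 positions.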

7
Uniform statistics k-gram
  • w = 001010101110, of length n
  • u.stat(w): for each word u of length k, the fraction of
    the n − k + 1 blocks (subwords of length k, or shingles) of w equal to u
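The definition of u.stat fits in a few lines of Python. This is a sketch assuming that u.stat(w) is normalized by the number n − k + 1 of blocks, so that it is a probability distribution over words of length k; the name `u_stat` is ours.

```python
from collections import Counter
from fractions import Fraction


def u_stat(w, k):
    """Uniform statistics of w: for each word u of length k, the fraction of
    the n - k + 1 blocks (length-k subwords) of w that are equal to u."""
    n = len(w)
    blocks = [w[i:i + k] for i in range(n - k + 1)]
    return {u: Fraction(c, len(blocks)) for u, c in Counter(blocks).items()}


stats = u_stat("001010101110", 2)
for u in sorted(stats):
    print(u, stats[u])
```

For w = 001010101110 and k = 2, the 11 blocks give 00: 1/11, 01: 4/11, 10: 4/11 and 11: 2/11.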

8
Tester for equality
  • Edit distance with moves: computing it exactly is NP-complete,
    but it is approximable in constant time with additive error.
  • Uniform statistics, e.g. w = 001010101110
  • Theorem 1. ‖u.stat(w) − u.stat(w′)‖₁ approximates dist(w, w′)/n.
  • Sample N subwords of length k and compute Y(w) and Y(w′).
  • Lemma (Chernoff). Y(w) approximates u.stat(w).
  • Corollary. ‖Y(w) − Y(w′)‖₁ approximates dist(w, w′)/n.
  • Tester 1: if ‖Y(w) − Y(w′)‖₁ ≤ ε, accept; else reject.

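Tester 1 can be sketched in Python as follows. We assume the statistics are compared with the L1 norm, and we leave the sample size N and the threshold ε as free parameters instead of deriving them from the Chernoff bound; all function names are hypothetical.

```python
import random
from collections import Counter


def sampled_stat(w, k, N, rng):
    """Y(w): empirical distribution of N length-k blocks of w, drawn uniformly."""
    starts = (rng.randrange(len(w) - k + 1) for _ in range(N))
    counts = Counter(w[i:i + k] for i in starts)
    return {u: c / N for u, c in counts.items()}


def l1_distance(p, q):
    """L1 distance between two distributions stored as sparse dicts."""
    return sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in p.keys() | q.keys())


def equality_tester(w1, w2, k, N, eps, rng):
    """Tester 1: accept iff ||Y(w1) - Y(w2)||_1 <= eps."""
    y1 = sampled_stat(w1, k, N, rng)
    y2 = sampled_stat(w2, k, N, rng)
    return l1_distance(y1, y2) <= eps


rng = random.Random(0)
w = "01" * 500
print(equality_tester(w, w, 2, 2000, 0.3, rng))
print(equality_tester(w, "0" * 1000, 2, 2000, 0.3, rng))
```

The two words are sampled independently, so even w = w′ yields a small nonzero distance; this is why the test compares against a threshold ε rather than checking exact equality.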
9
Tester for w ∈ r (regular language)
  • H = {u.stat(w) : w ∈ r} is a union of
    polytopes (figure: 2 polytopes for r).
  • Membership tester: compare the sampled statistics Y(w) with H.

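For a toy regular expression, H can be written down by hand. Take k = 2 and r = 0*1* + (01)*, an example of ours rather than the talk's. For long words, u.stat of 0^a 1^b approaches the segment between the unit vectors for 00 and 11, and u.stat of (01)^m approaches the point giving weight 1/2 to 01 and to 10. So, ignoring O(1/n) terms, H is the union of a segment and a point. The Python sketch below computes the L1 distance to this union in closed form. For a general r, each polytope would instead be handled by a linear program.

```python
from collections import Counter

KEYS = ("00", "01", "10", "11")


def stat2(w):
    """Exact u.stat(w) for k = 2, as a dict over all four 2-letter words."""
    counts = Counter(w[i:i + 2] for i in range(len(w) - 1))
    return {u: counts[u] / (len(w) - 1) for u in KEYS}


def dist_to_segment(y):
    """L1 distance from y to {(l, 0, 0, 1 - l) : 0 <= l <= 1}, coordinates
    (00, 01, 10, 11): the asymptotic statistics of 0^a 1^b.

    For y00, y11 in [0, 1], min over l of |y00 - l| + |y11 - (1 - l)|
    equals |1 - y00 - y11| (triangle inequality, attained at l = y00 or l = 1 - y11).
    """
    return y["01"] + y["10"] + abs(1 - y["00"] - y["11"])


ALTERNATING = {"00": 0.0, "01": 0.5, "10": 0.5, "11": 0.0}  # limit of u.stat((01)^m)


def dist_to_H(y):
    """Distance to H = segment (from 0*1*) union point (from (01)*)."""
    to_point = sum(abs(y[u] - ALTERNATING[u]) for u in KEYS)
    return min(dist_to_segment(y), to_point)


def membership_tester(y, eps):
    """Accept iff the statistics y (exact here, sampled in practice) are eps-close to H."""
    return dist_to_H(y) <= eps


print(membership_tester(stat2("0" * 300 + "1" * 700), 0.05))
print(membership_tester(stat2("0011" * 250), 0.05))
```

A word such as 1^500 0^500 is also accepted although it is not in r. This is expected: one move turns it into 0^500 1^500, so it is close to r for the edit distance with moves, which is what the statistics approximate.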
10
2. Equivalence Tester for regular properties
  • Time polynomial in m = max(|A|, |B|).
  • Exact equivalence is PSPACE-complete.

11
3. Markov Decision Processes
  • Policies σ:
  • HR: history-dependent and randomized
  • MR(k): memory k, randomized
  • SD: stationary deterministic
  • Communicating MDP
  • SD: σ(t) = b, σ(v) = c
  • Trace 1: abac ab abac ab ...
  • Trace 2: ab abac ab abac ...

12
Classical results (k = 1)
  • State-action frequencies
  • For a class K of strategies
  • Theorem (Puterman, Derman, Tsitsiklis)
  • For a communicating MDP,
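The formulas of this slide did not survive the transcript. In the classical formulation of the result attributed above to Puterman and Derman, the state-action frequencies x(s, a) of a policy satisfy the linear flow constraints Σ_a x(j, a) = Σ_{s,a} P(j | s, a) x(s, a) for every state j, with x ≥ 0 and Σ x = 1. The Python sketch below, on a hypothetical communicating MDP with made-up probabilities, computes the frequencies of one SD policy by power iteration and checks these constraints. It illustrates the constraints only and does not reproduce the theorem.

```python
# Hypothetical communicating MDP with made-up probabilities: P[s][a] = {successor: prob}.
P = {
    "s": {"a": {"t": 0.5, "u": 0.5}},
    "t": {"a": {"s": 1.0}, "b": {"u": 1.0}},
    "u": {"a": {"v": 1.0}},
    "v": {"a": {"t": 1.0}, "c": {"s": 0.5, "v": 0.5}},
}
sigma = {"s": "a", "t": "b", "u": "a", "v": "c"}  # an SD policy


def stationary(P, sigma, iters=10_000, tol=1e-14):
    """Stationary distribution of the chain induced by sigma, by power iteration.
    Assumes that chain is irreducible and aperiodic (true for this example)."""
    pi = {s: 1 / len(P) for s in P}
    for _ in range(iters):
        new = {s: 0.0 for s in P}
        for s in P:
            for nxt, prob in P[s][sigma[s]].items():
                new[nxt] += pi[s] * prob
        converged = max(abs(new[s] - pi[s]) for s in P) < tol
        pi = new
        if converged:
            break
    return pi


def state_action_frequencies(P, sigma):
    """x(s, a): long-run fraction of steps spent in state s playing action a."""
    pi = stationary(P, sigma)
    return {(s, a): (pi[s] if a == sigma[s] else 0.0) for s in P for a in P[s]}


def flow_residual(P, x):
    """Largest violation of sum_a x(j, a) = sum_{s,a} P(j | s, a) x(s, a)."""
    inflow = {j: 0.0 for j in P}
    for (s, a), freq in x.items():
        for j, prob in P[s][a].items():
            inflow[j] += prob * freq
    outflow = {j: sum(f for (s, _), f in x.items() if s == j) for j in P}
    return max(abs(inflow[j] - outflow[j]) for j in P)


x = state_action_frequencies(P, sigma)
print(x, flow_residual(P, x))
```

For this policy the induced chain has stationary distribution (2/9, 1/9, 2/9, 4/9) on (s, t, u, v), and the actions not chosen by σ get frequency 0.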

13
Generalization
  • Theorem. For a communicating MDP,

(Figure: polytope H and a point x)

14
Existence of a strategy
  • Input: MDP, w_n, d, ?
  • Theorem. Existence of a strategy is PSPACE-hard
    but testable.
  • Tester: sample w_n.
  • Estimate the distance to H (linear program).

(Figure: polytope H and a point x)

15
General MDPs
  • Union of polytopes: each H can be computed by a
    linear program.
  • Threshold value for each component.

(Figure: components H1, threshold .4, and H2, threshold .6)

16
Equivalence of MDPs
  • Decide whether the polytopes are identical, with
    identical threshold values.
  • Equivalence tester: discretize the polytopes
    with an ε-grid and check mutual inclusion.
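The grid-based check can be sketched in Python under assumptions of ours: each component is given as a list of half-spaces a·x ≤ b in [0, 1]^d together with its threshold value, and two MDPs are declared equivalent when the sets of grid points covered by their components, tagged with thresholds, include each other. The grid has (1/ε + 1)^d points, so this brute-force version only makes sense in small dimension d; all names are hypothetical.

```python
from itertools import product


def grid_points(eps, dim):
    """All points of the eps-grid of [0, 1]^dim (1 / eps assumed to be an integer)."""
    steps = round(1 / eps)
    return [tuple(i / steps for i in idx)
            for idx in product(range(steps + 1), repeat=dim)]


def inside(polytope, x, tol=1e-9):
    """polytope: list of half-spaces (a, b), each meaning a . x <= b."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) <= b + tol for a, b in polytope)


def discretize(components, eps, dim):
    """Grid points covered by a union of polytopes, each tagged with the
    threshold value of every component that contains it."""
    points = grid_points(eps, dim)
    return {(x, threshold)
            for polytope, threshold in components
            for x in points if inside(polytope, x)}


def equivalence_tester(components1, components2, eps, dim):
    """Accept iff the two discretized unions include each other."""
    d1 = discretize(components1, eps, dim)
    d2 = discretize(components2, eps, dim)
    return d1 <= d2 and d2 <= d1


T = [((-1, 0), 0), ((0, -1), 0), ((1, 1), 1)]  # triangle x, y >= 0, x + y <= 1
print(equivalence_tester([(T, 0.4)], [(T + [((1, 0), 1)], 0.4)], 0.1, 2))
```

Two descriptions of the same polytope (here, one with a redundant constraint x ≤ 1) are accepted, while differences smaller than the grid step can go unnoticed; that is the price of the approximation.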

17
Conclusion
  • 1. Testers for MDPs: verify properties such as
     "Almost surely there are less than 10 a"
  • "After an a, there is a b"
  • 2. Testers for probabilistic systems:
  • Approximate probabilistic membership
  • Approximate equivalence
  • 3. VERAP: http://www.lri.fr/mdr/verap/