Approximation Techniques for Automated Reasoning
(Transcript and Presenter's Notes)
1
Approximation Techniques for Automated Reasoning
  • Irina Rish
  • IBM T.J.Watson Research Center
  • rish@us.ibm.com
  • Rina Dechter
  • University of California, Irvine
  • dechter@ics.uci.edu

2
Outline
  • Introduction
  • Reasoning tasks
  • Reasoning approaches: elimination and
    conditioning
  • CSPs: exact inference and approximations
  • Belief networks: exact inference and
    approximations
  • MDPs: decision-theoretic planning
  • Conclusions

3
Automated reasoning tasks
  • Propositional satisfiability
  • Constraint satisfaction
  • Planning and scheduling
  • Probabilistic inference
  • Decision-theoretic planning
  • Etc.

Reasoning is NP-hard => Approximations
4
Graphical Frameworks
  • Our focus - graphical frameworks
  • constraint and belief networks
  • Nodes = variables
  • Edges = dependencies
  • (constraints, probabilities, utilities)
  • Reasoning = graph transformations

5
Propositional Satisfiability
Example: party problem
  • If Alex goes, then Becky goes
  • If Chris goes, then Alex goes
  • Query
  • Is it possible that Chris goes to the party
    but Becky does not?

6
Constraint Satisfaction
  • Example: map coloring
  • Variables - countries (A,B,C,etc.)
  • Values - colors (e.g., red, green, yellow)
  • Constraints

7
Constrained Optimization
Example: power plant scheduling
8
Probabilistic Inference
Example: medical diagnosis
[Belief network figure: V (visit to Asia), S (smoking),
T (tuberculosis), C (lung cancer), B (bronchitis),
A (abnormality in lungs), X (X-ray),
D (dyspnoea, shortness of breath)]
Query: P(T = yes | S = no, D = yes) = ?
9
Decision-Theoretic Planning
Example: robot navigation
  • State: (X, Y, Battery_Level)
  • Actions: Go_North, Go_South, Go_West, Go_East
  • Probability of success: P
  • Task: reach the goal location ASAP

10
Reasoning Methods
  • Our focus - conditioning and elimination
  • Conditioning
  • (guessing assignments, reasoning by
    assumptions)
  • Branch-and-bound (optimization)
  • Backtracking search (CSPs)
  • Cycle-cutset (CSPs, belief nets)
  • Variable elimination
  • (inference, propagation of constraints,
    probabilities, cost functions)
  • Dynamic programming (optimization)
  • Adaptive consistency (CSPs)
  • Joint-tree propagation (CSPs, belief nets)

11
Conditioning: Backtracking Search
12
Bucket Elimination: Adaptive Consistency (Dechter
and Pearl, 1987)

Bucket E:  E ≠ D,  E ≠ C
Bucket D:  D ≠ A
Bucket C:  C ≠ B
Bucket B:  B ≠ A
Bucket A:
13
Bucket-elimination and conditioning: a
uniform framework
  • Unifying approach to different reasoning tasks
  • Understanding commonality and differences
  • Technology transfer
  • Ease of implementation
  • Extensions to hybrids: conditioning + elimination
  • Approximations

14
Exact CSP techniques: complexity
15
Approximations
  • Exact approaches can be intractable
  • Approximate conditioning
  • Local search, gradient descent (optimization,
    CSPs, SAT)
  • Stochastic simulations (belief nets)
  • Approximate elimination
  • Local consistency enforcing (CSPs), local
    probability propagation (belief nets)
  • Bounded resolution (SAT)
  • Mini-bucket approach (belief nets)
  • Hybrids (conditioning + elimination)
  • Other approximations (e.g., variational)

16
Road map
  • CSPs: complete algorithms
  • Variable Elimination
  • Conditioning (Search)
  • CSPs: approximations
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • MDPs

17
Constraint Satisfaction
Applications
  • Planning and scheduling
  • Configuration and design problems
  • Circuit diagnosis
  • Scene labeling
  • Temporal reasoning
  • Natural language processing

18
Constraint Satisfaction
  • Example: map coloring
  • Variables - countries (A,B,C,etc.)
  • Values - colors (e.g., red, green, yellow)
  • Constraints

19
Constraint Networks
20
The Idea of Elimination
21
Variable Elimination
Eliminate variables one by one: constraint propagation
Solution generation after elimination is
backtrack-free
22
Elimination Operation: join followed by projection
Join operation over A finds all solutions
satisfying constraints that involve A
23
Bucket Elimination: Adaptive Consistency (Dechter
and Pearl, 1987)
[Figure: bucket-elimination trace; the new relations
derived in each bucket are recorded in lower buckets]
24
Induced Width
Width along ordering d: max number of previous
neighbors (parents) over the nodes. Induced width w*(d):
the width of the ordered induced graph, obtained
by connecting the parents of each node
recursively, from the last node to the first.
25
Induced width (continued)
  • Finding a minimum-w* ordering is NP-complete
    (Arnborg, 1985)
  • Greedy ordering heuristics: min-width,
    min-degree, max-cardinality (Bertele and Brioschi,
    1972; Freuder, 1982)
  • Tractable classes: trees have w* = 1
  • w* of an ordering is computed in O(n) time,
  • i.e. the complexity of elimination is easy to
    predict

26
Example: crossword puzzle
27
Crossword Puzzle: Adaptive consistency
28
Adaptive Consistency as bucket-elimination
  • Initialize: partition the constraints into
    buckets bucket(X_n), ..., bucket(X_1) along ordering d
  • For i = n down to 1 // process buckets in the
    reverse order
  • for all relations R_1, ..., R_m in bucket(X_i)
    do
  • // join all relations and project out X_i:
    R_new = π(R_1 ⋈ ... ⋈ R_m), with X_i projected out

If R_new is not empty, add it to bucket(X_k),
where k is the largest variable
index in its scope; else the problem is unsatisfiable.
Return the set of all relations (old and new) in
the buckets.
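As an illustration, here is a minimal Python sketch of this bucket-elimination schema (the function name adaptive_consistency, the extensional relation format, and all identifiers are assumptions for illustration, not the authors' code):

from itertools import product

def adaptive_consistency(domains, constraints, ordering):
    # Bucket elimination for CSPs (adaptive consistency), a minimal sketch.
    # domains: {var: list of values}
    # constraints: list of (scope_tuple, set_of_allowed_tuples)
    # ordering: list of variables; buckets are processed from last to first.
    pos = {v: i for i, v in enumerate(ordering)}
    buckets = {v: [] for v in ordering}
    for scope, rel in constraints:
        buckets[max(scope, key=pos.get)].append((scope, rel))  # highest variable's bucket
    recorded = list(constraints)
    for var in reversed(ordering):
        if not buckets[var]:
            continue
        # scope of the new relation: every other variable mentioned in this bucket
        scope = sorted({u for s, _ in buckets[var] for u in s if u != var}, key=pos.get)
        new_rel = set()
        for assignment in product(*(domains[u] for u in scope)):
            amap = dict(zip(scope, assignment))
            for val in domains[var]:      # join all relations and project out 'var'
                amap[var] = val
                if all(tuple(amap[u] for u in s) in r for s, r in buckets[var]):
                    new_rel.add(assignment)   # some consistent extension to 'var' exists
                    break
        if not new_rel:
            return None                       # an empty relation was derived: unsatisfiable
        if scope:                             # record the relation in the bucket of its
            new_c = (tuple(scope), new_rel)   # highest-ordered variable
            recorded.append(new_c)
            buckets[max(scope, key=pos.get)].append(new_c)
    return recorded   # the augmented network is backtrack-free along 'ordering'

For instance, the map-coloring constraints above could be passed as sets of allowed color pairs; solutions can then be generated deadend-free along the given ordering.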
29
Solving Trees (Mackworth and Freuder, 1985)
Adaptive consistency is linear for trees
and equivalent to enforcing directional
arc-consistency (recording only unary
constraints)
30
Properties of bucket-elimination (adaptive
consistency)
  • Adaptive consistency generates a constraint
    network that is backtrack-free (can be solved
    without deadends).
  • The time and space complexity of adaptive
    consistency along ordering d is
    O(n exp(w*(d))), exponential in the induced width w*(d).
  • Therefore, problems having bounded induced width
    are tractable (solved in polynomial time).
  • Examples of tractable problem classes: trees
    (w* = 1), series-parallel networks (w* = 2),
    and in general k-trees (w* = k).

31
Road map
  • CSPs: complete algorithms
  • Variable Elimination
  • Conditioning (Search)
  • CSPs: approximations
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • MDPs

32
The Idea of Conditioning
33
Backtracking Search + Heuristics
Vanilla backtracking: variable/value ordering
Heuristics: constraint propagation, learning
  • Look-ahead schemes
  • Forward checking (Haralick and Elliot, 1980)
  • MAC (full arc-consistency at each node) (Gaschnig
    1977)
  • Look-back schemes
  • Backjumping (Gaschnig 1977, Dechter 1990, Prosser
    1993)
  • Backmarking (Gaschnig 1977)
  • BJ+DVO (Frost and Dechter, 1994)
  • Constraint learning (Dechter 1990, Frost and
    Dechter 1994, Bayardo and Miranker 1996)

34
Search complexity distributions
Complexity histograms (deadends, time) =>
continuous distributions (Frost, Rish, and Vila
1997; Selman and Gomes 1997; Hoos 1998)
[Plot: frequency (probability) vs. number of nodes
explored in the search space]
35
Constraint Programming
  • Constraint solving embedded in programming
    languages
  • Allows flexible modeling with algorithms
  • Logic programs + forward checking
  • Eclipse, Ilog, OPL
  • Using only look-ahead schemes.

36
Complete CSP algorithms summary
  • Bucket elimination
  • adaptive consistency (CSP), directional
    resolution (SAT)
  • elimination operation: join-project (CSP),
    resolution (SAT)
  • Time and space exponential in the induced width
  • (given a variable ordering)
  • Conditioning
  • Backtracking search + heuristics
  • Time complexity: worst-case O(exp(n)), but
    average-case is often much better. Space
    complexity: linear.

37
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Approximating elimination
  • Approximating conditioning
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • MDPs

38
Approximating Elimination: Local Constraint
Propagation
  • Problem: bucket-elimination algorithms are
    intractable
  • when the induced width is large
  • Approximation: bound the size of recorded
    dependencies,
  • i.e. perform local constraint propagation
    (local inference)
  • Advantages: efficiency; may discover
  • inconsistencies by deducing new constraints
  • Disadvantages: does not guarantee that a solution
    exists

39
From Global to Local Consistency
40
Constraint Propagation
  • Arc-consistency, unit resolution, i-consistency

[Figure: constraint network over X, Y, Z, T, each with
domain {1, 2, 3}; constraints include 1 ≤ X, Y, Z, T ≤ 3,
Y = Z, and order constraints between the pairs
(X, Y), (T, Z), (X, T)]
41
Constraint Propagation
  • Arc-consistency, unit resolution, i-consistency

[Figure: the same network after constraint propagation,
with reduced domains]
  • Incorporated into backtracking search
  • Constraint programming languages: a powerful
    approach for modeling and solving combinatorial
    optimization problems.

42
Arc-consistency
Only domain constraints are recorded
Example
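As a concrete illustration of arc-consistency, here is a minimal AC-3-style sketch in Python (the names revise and ac3, and the predicate-based constraint format, are illustrative assumptions, not from the slides):

from collections import deque

def revise(domains, constraint, x, y):
    # Remove values of x that have no support in y under the binary constraint.
    # constraint: predicate taking (value_of_x, value_of_y).
    removed = False
    for vx in list(domains[x]):
        if not any(constraint(vx, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed

def ac3(domains, arcs):
    # arcs: {(x, y): predicate}; include both directions of every binary constraint.
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        if revise(domains, arcs[(x, y)], x, y):
            if not domains[x]:
                return False            # an empty domain: the problem is inconsistent
            # re-examine arcs pointing into x (except the one coming from y)
            queue.extend((z, w) for (z, w) in arcs if w == x and z != y)
    return True

# Hypothetical usage: X < Y over domains {1, 2, 3}
# ac3({'X': [1, 2, 3], 'Y': [1, 2, 3]},
#     {('X', 'Y'): lambda vx, vy: vx < vy, ('Y', 'X'): lambda vy, vx: vx < vy})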
43
Local consistency: i-consistency
  • i-consistency:
  • Any consistent assignment to any i-1
    variables is consistent with at least one value
    of any i-th variable
  • strong i-consistency: k-consistency for every
    k ≤ i
  • directional i-consistency:
  • Given an ordering, each variable is
    i-consistent with any i-1 preceding variables
  • strong directional i-consistency:
  • Given an ordering, each variable is strongly
    i-consistent with any i-1 preceding variables

44
Directional i-consistency
[Figure: constraints recorded by adaptive consistency
vs. directional path-consistency (d-path) vs.
directional arc-consistency (d-arc)]
45
Enforcing Directional i-consistency
  • Directional i-consistency bounds
  • the size of recorded constraints by i.
  • i = 1: arc-consistency
  • i = 2: path-consistency
  • For i = w*(d), directional i-consistency is
    equivalent to adaptive consistency

46
Example: SAT
  • Elimination operation: resolution
  • Directional Resolution = adaptive consistency
  • (Davis and Putnam, 1960; Dechter and Rish, 1994)
  • Bounded resolution: bounds the resolvent size
  • BDR(i): directional i-consistency (Dechter and
    Rish, 1994)
  • k-closure: full k-consistency (van Gelder and
    Tsuji, 1996)
  • In general: bounded induced-width resolution
  • DCDR(b): generalizes the cycle-cutset idea; limits
  • the induced width by conditioning on cutset
    variables
  • (Rish and Dechter 1996, Rish and Dechter 2000)
  • (a small code sketch of directional resolution
    follows below)
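The following is a minimal Python sketch of directional resolution over CNF clauses, encoded as frozensets of signed integers; the function name and the encoding are assumptions for illustration, not the authors' implementation:

def directional_resolution(clauses, ordering):
    # Directional resolution (bucket elimination for SAT), a minimal sketch.
    # clauses: iterable of clauses, each a collection of nonzero ints
    #   (v = positive literal of variable v, -v = its negation)
    # ordering: list of variables; buckets are processed from last to first.
    pos = {v: i for i, v in enumerate(ordering)}
    buckets = {v: set() for v in ordering}
    all_clauses = set()
    for c in clauses:
        c = frozenset(c)
        if any(-l in c for l in c):
            continue                      # skip tautological clauses
        all_clauses.add(c)
        buckets[max((abs(l) for l in c), key=pos.get)].add(c)
    for var in reversed(ordering):
        pos_cl = [c for c in buckets[var] if var in c]
        neg_cl = [c for c in buckets[var] if -var in c]
        for cp in pos_cl:                 # resolve every pair of clauses on 'var'
            for cn in neg_cl:
                resolvent = (cp - {var}) | (cn - {-var})
                if not resolvent:
                    return None           # empty clause derived: unsatisfiable
                if any(-l in resolvent for l in resolvent):
                    continue              # tautology, discard
                if resolvent not in all_clauses:
                    all_clauses.add(resolvent)
                    buckets[max((abs(l) for l in resolvent), key=pos.get)].add(resolvent)
    return all_clauses                    # backtrack-free along 'ordering'

# BDR(i) would additionally discard resolvents mentioning more than i variables.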

47
Directional Resolution = Adaptive Consistency
48
DR complexity
49
History
  • 1960: resolution-based Davis-Putnam algorithm
  • 1962: resolution step replaced by conditioning
  • (Davis, Logemann and Loveland, 1962) to
    avoid
  • memory explosion, resulting in a
    backtracking search
  • algorithm known as Davis-Putnam (DP), or the
    DPLL procedure.
  • The dependency on induced width was not known in
    1960.
  • 1994: Directional Resolution (DR), a rediscovery
    of
  • the original Davis-Putnam, identification of
    tractable classes
  • (Dechter and Rish, 1994).

50
DR versus DPLL: complementary properties
(k,m)-tree 3-CNFs (bounded induced width)
Uniform random 3-CNFs (large induced width)
51
Complementary properties => hybrids
52
BDR-DP(i): bounded resolution + backtracking
  • Complete algorithm: run BDR(i) as preprocessing
    before
  • the Davis-Putnam backtracking algorithm.
  • Empirical results: random vs. structured (low-w)
    problems

53
DCDR(b): Conditioning + DR
54
(No Transcript)
55
DCDR(b): empirical results
56
Approximating Elimination: Summary
  • Key idea: local propagation, restricting the
    number of
  • variables involved in recorded constraints
  • Examples: arc-, path-, and i-consistency (CSPs),
    bounded resolution, k-closure (SAT)
  • For SAT:
  • bucket-elimination = directional resolution
    (the original resolution-based Davis-Putnam)
  • Conditioning = DPLL (backtracking search)
  • Hybrids: bounded resolution + search =
  • complete algorithms (BDR-DP(i), DCDR(b))

57
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Approximating elimination
  • Approximating conditioning
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • MDPs

58
Approximating Conditioning: Local Search
  • Problem: complete (systematic, exhaustive) search
    can be intractable (O(exp(n)) worst-case)
  • Approximation idea: explore only parts of the
    search space
  • Advantages: anytime answer; may run into a
    solution quicker than systematic approaches
  • Disadvantages: may not find an exact solution
    even if there is one; cannot detect that a
    problem is unsatisfiable

59
Simple greedy search
  • 1. Generate a random assignment to all variables
  • 2. Repeat until no improvement is made or a
    solution is found
  • // hill-climbing step
  • 3. flip a variable (change its value)
    that
  • increases the number of satisfied
    constraints

Easily gets stuck at local maxima
60
GSAT: local search for SAT (Selman, Levesque and
Mitchell, 1992)
Greatly improves hill-climbing by adding
restarts and sideways moves
  • For i = 1 to MaxTries
  • Select a random assignment A
  • For j = 1 to MaxFlips
  • if A satisfies all constraints,
    return A
  • else flip a variable to maximize the
    score
  • (number of satisfied constraints);
    if no variable
  • assignment increases the score,
    flip at random
  • end
  • end

61
WalkSAT (Selman, Kautz and Cohen, 1994)
Adds random walk to GSAT
  • With probability p
  • random walk: flip a variable in some
    unsatisfied constraint
  • With probability 1-p
  • perform a hill-climbing step

Randomized hill-climbing often solves large and
hard satisfiable problems
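A minimal Python sketch of a WalkSAT-style procedure (clauses as lists of signed integers; the function name and parameters such as max_flips and p are illustrative assumptions):

import random

def walksat(clauses, n_vars, max_flips=100000, p=0.5):
    # WalkSAT-style local search, a minimal sketch (one 'try'; restarts would wrap this).
    # clauses: list of clauses, each a list of nonzero ints (v true literal, -v negated).
    assign = {v: random.choice([True, False]) for v in range(1, n_vars + 1)}
    satisfied = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(satisfied(l) for l in c)]
        if not unsat:
            return assign                        # all clauses satisfied
        clause = random.choice(unsat)            # pick a random unsatisfied clause
        if random.random() < p:
            var = abs(random.choice(clause))     # random-walk step
        else:
            def score(v):                        # greedy step: count clauses satisfied
                assign[v] = not assign[v]        # after flipping v (then undo the flip)
                s = sum(any(satisfied(l) for l in c) for c in clauses)
                assign[v] = not assign[v]
                return s
            var = max((abs(l) for l in clause), key=score)
        assign[var] = not assign[var]            # flip the chosen variable
    return None                                  # no solution found within max_flips

Wrapping the call in an outer MaxTries loop with fresh random assignments recovers the restart behavior described on the GSAT slide.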
62
Other approaches
  • Different flavors of GSAT with randomization
    (GenSAT by Gent and Walsh, 1993; Novelty by
    McAllester, Kautz and Selman, 1997)
  • Simulated annealing
  • Tabu search
  • Genetic algorithms
  • Hybrid approximations:
  • elimination + conditioning

63
Approximating conditioning with elimination
  • Energy minimization in neural networks
  • (Pinkas and Dechter, 1995)

For cycle-cutset nodes, use the greedy update
function (relative to neighbors). For the rest
of the nodes, run the arc-consistency algorithm
followed by value assignment.
64
GSAT with Cycle-Cutset (Kask and Dechter, 1996)
Input: a CSP, and a partition of the variables into
cycle-cutset and tree
variables. Output: an assignment to all the
variables. Within each try: generate a random
initial assignment, and then alternate between
the two steps: 1. Run the Tree algorithm
(arc-consistency + assignment) on the
problem with fixed values of the cutset variables.
2. Run GSAT on the problem with fixed values of the
tree variables.
65
Results: GSAT with Cycle-Cutset (Kask and
Dechter, 1996)
66
Results: GSAT with Cycle-Cutset (Kask and
Dechter, 1996)
67
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Bayesian belief nets: complete algorithms
  • Bucket-elimination
  • Relation to join-tree, Pearl's poly-tree
    algorithm, conditioning
  • Belief nets: approximations
  • MDPs

68
Belief Networks
[Belief network: Smoking (S), lung Cancer (C),
Bronchitis (B), X-ray (X), Dyspnoea (D)]
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
69
Example: Printer Troubleshooting
70
Example: Car Diagnosis
71
What are they good for?
  • Diagnosis: P(cause | symptom)?
  • Prediction: P(symptom | cause)?
  • Decision-making (given a cost function)

72
Probabilistic Inference Tasks
  • Belief updating
  • Finding most probable explanation (MPE)
  • Finding the maximum a-posteriori (MAP) hypothesis
  • Finding the maximum-expected-utility (MEU) decision
    (standard forms of these tasks are sketched below)
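A sketch of these tasks in standard notation (assumed notation, not copied from the slides: x ranges over assignments consistent with the evidence e, pa_k denotes the parents of X_k, A is the hypothesis set, d the decision, and u a utility function):

\text{Belief updating:}\quad P(x_i \mid e) \;\propto\; \sum_{x:\,X_i = x_i} \prod_k P(x_k \mid pa_k)
\text{MPE:}\quad x^{*} \;=\; \arg\max_{x} \prod_k P(x_k \mid pa_k)
\text{MAP:}\quad a^{*} \;=\; \arg\max_{a} \sum_{x:\,A = a} \prod_k P(x_k \mid pa_k)
\text{MEU:}\quad d^{*} \;=\; \arg\max_{d} \sum_{x} \prod_k P(x_k \mid pa_k, d)\, u(x, d)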

73
Belief Updating
[Belief network: Smoking, lung Cancer, Bronchitis,
X-ray, Dyspnoea]
P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
74
Moral Graph
Conditional Probability Distribution (CPD)
Clique in moral graph (family)
75
Belief updating: P(X | evidence)?
[Figure: network over variables A, B, C, D, E]
P(a | e = 0) ∝ P(a, e = 0) =
Σ_{b,c,d} P(a) P(b|a) P(c|a) P(d|a,b) P(e = 0 | b,c)
76
Bucket elimination: Algorithm elim-bel (Dechter,
1996)
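A minimal, dict-based Python sketch of the elim-bel idea (the factor representation and the helper names multiply, sum_out, elim_bel are illustrative assumptions, not the authors' pseudocode; evidence is assumed to have been absorbed into the CPT tables beforehand by zeroing inconsistent entries):

from itertools import product

def multiply(factors, domains):
    # Product of factors; each factor is (scope_tuple, {assignment_tuple: value}).
    scope = sorted({v for s, _ in factors for v in s})
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        amap = dict(zip(scope, vals))
        p = 1.0
        for s, t in factors:
            p *= t[tuple(amap[v] for v in s)]
        table[vals] = p
    return tuple(scope), table

def sum_out(factor, var):
    # Sum a variable out of a factor.
    scope, table = factor
    new_scope = tuple(v for v in scope if v != var)
    new_table = {}
    for vals, p in table.items():
        key = tuple(v for v, u in zip(vals, scope) if u != var)
        new_table[key] = new_table.get(key, 0.0) + p
    return new_scope, new_table

def elim_bel(cpts, domains, ordering, query_var):
    # Bucket elimination for belief updating (elim-bel), a minimal sketch.
    # cpts: list of factors; ordering: query_var first, then the remaining variables.
    pos = {v: i for i, v in enumerate(ordering)}
    buckets = {v: [] for v in ordering}
    for f in cpts:
        buckets[max(f[0], key=pos.get)].append(f)
    for var in reversed(ordering):          # process buckets from last to first
        if var == query_var or not buckets[var]:
            continue
        message = sum_out(multiply(buckets[var], domains), var)
        target = max(message[0], key=pos.get) if message[0] else query_var
        buckets[target].append(message)      # send the message to a lower bucket
    scope, table = multiply(buckets[query_var], domains)
    z = sum(table.values())
    return {vals[0]: p / z for vals, p in table.items()}   # P(query_var | evidence)

Each bucket multiplies its functions, sums out its variable, and passes the resulting message down to the bucket of its highest remaining variable.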
77
Finding MPE: Algorithm elim-mpe (Dechter 1996)
Elimination operator: maximization
78
Generating the MPE-tuple
79
Complexity of elimination: O(n exp(w*(d)))
The effect of the ordering:
80
Other tasks and algorithms
  • MAP and MEU tasks
  • Similar bucket-elimination algorithms: elim-map,
    elim-meu (Dechter 1996)
  • Elimination operation: either summation or
    maximization
  • Restriction on variable ordering: summation must
    precede maximization (i.e. hypothesis or decision
    variables are eliminated last)
  • Other inference algorithms:
  • Join-tree clustering
  • Pearl's poly-tree propagation
  • Conditioning, etc.

81
Relationship with join-tree clustering
[Figure: a join tree with clusters such as ABC, ADB,
and BCE]
A cluster is a set of buckets (a
super-bucket)
82
Relationship with Pearl's belief propagation in
poly-trees
Pearl's belief propagation for a
single-root query =
elim-bel using a topological ordering and
super-buckets for families
Elim-bel, elim-mpe, and elim-map are linear for
poly-trees.
83
Conditioning generates the probability tree
Complexity of conditioning: exponential time,
linear space
84
Conditioning + Elimination
85
Super-bucket elimination (Dechter and El Fattah,
1996)
  • Eliminating several variables at once
  • Conditioning is done only in super-buckets

86
The idea of super-buckets
Larger super-buckets (cliques) => more time but
less space
  • Complexity:
  • Time: exponential in clique (super-bucket) size
  • Space: exponential in separator size

87
Application: circuit diagnosis
Problem: Given a circuit and its unexpected
output, identify the faulty components. The problem
can be modeled as a constraint optimization
problem and solved by bucket elimination.
88
Time-Space Tradeoff
89
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • Local inference: mini-buckets
  • Stochastic simulations
  • Variational techniques
  • MDPs

90
Mini-buckets: local inference
  • The idea is similar to i-consistency:
  • bound the size of recorded dependencies
  • Computation in a bucket is time and space
  • exponential in the number of variables
    involved
  • Therefore, partition the functions in a bucket
  • into mini-buckets over a smaller number of
    variables

91
Mini-bucket approximation: MPE task
Split a bucket into mini-buckets => bound the
complexity
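The split is justified by a simple inequality (standard; the notation below is assumed): if bucket(X) holds functions h_1, ..., h_r partitioned into mini-buckets Q_1, ..., Q_m, then

\max_{x} \prod_{j=1}^{r} h_j \;\le\; \prod_{l=1}^{m} \Big( \max_{x} \prod_{h_j \in Q_l} h_j \Big),

so processing each mini-bucket separately (over at most i variables) yields an upper bound on the exact MPE value, while the probability of the assignment decoded from the approximate functions provides a lower bound.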
92
Approx-mpe(i)
  • Input: i, the max number of variables allowed in a
    mini-bucket
  • Output: lower bound (probability of a sub-optimal
    solution), upper bound

Example: approx-mpe(3) versus elim-mpe
93
Properties of approx-mpe(i)
  • Complexity: O(exp(2i)) time and O(exp(i))
    space.
  • Accuracy: determined by the upper/lower (U/L) bound.
  • As i increases, both accuracy and complexity
    increase.
  • Possible uses of mini-bucket approximations:
  • As anytime algorithms (Dechter and Rish, 1997)
  • As heuristics in best-first search (Kask and
    Dechter, 1999)
  • Other tasks: similar mini-bucket approximations
    for belief updating, MAP and MEU (Dechter and
    Rish, 1997)

94
Anytime Approximation
95
Empirical Evaluation (Dechter and Rish, 1997;
Rish, 1999)
  • Randomly generated networks
  • Uniform random probabilities
  • Random noisy-OR
  • CPCS networks
  • Probabilistic decoding
  • Comparing approx-mpe and anytime-mpe
  • versus elim-mpe

96
Random networks
  • Uniform random: 60 nodes, 90 edges (200
    instances)
  • In 80% of cases, a 10-100 times speed-up while
    U/L < 2
  • Noisy-OR: even better results
  • Exact elim-mpe was infeasible; approx-mpe took
    0.1 to 80 sec.

97
CPCS networks: medical diagnosis (noisy-OR model)
Test case: no evidence
98
The effect of evidence
More likely evidence => higher MPE => higher
accuracy (why?)
Likely evidence versus random (unlikely) evidence
99
Probabilistic decoding
Error-correcting linear block code
State-of-the-art
approximate algorithm: iterative belief
propagation (IBP) (Pearl's poly-tree algorithm
applied to loopy networks)
100
Iterative Belief Propagation
  • Belief propagation is exact for poly-trees
  • IBP - applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks

101
approx-mpe vs. IBP
Bit error rate (BER) as a function of noise
(sigma)
102
Mini-buckets: summary
  • Mini-buckets: a local inference approximation
  • Idea: bound the size of recorded functions
  • Approx-mpe(i): a mini-bucket algorithm for MPE
  • Better results for noisy-OR than for random
    problems
  • Accuracy increases with decreasing noise
  • Accuracy increases for likely evidence
  • Sparser graphs -> higher accuracy
  • Coding networks: approx-mpe outperforms IBP on
    low induced-width codes

103
Heuristic search
  • Mini-buckets record upper-bound heuristics
  • The evaluation function f over a partial assignment
  • Best-first: expand a node with the maximal evaluation
    function
  • Branch and Bound: prune if f > upper bound
  • Properties:
  • an exact algorithm
  • Better heuristics lead to more pruning

104
Heuristic Function
Given a cost function
P(a,b,c,d,e) = P(a) P(b|a) P(c|a) P(e|b,c) P(d|b,a)
Define an evaluation function over a partial
assignment as the probability of its best
extension:
[Figure: search tree over the partial assignments of
A, E, D (values 0/1)]
f(a,e,d) = max_{b,c} P(a,b,c,d,e)
         = P(a) max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
         = g(a,e,d) H(a,e,d)
105
Heuristic Function
H(a,e,d) = max_{b,c} P(b|a) P(c|a) P(e|b,c) P(d|a,b)
         = max_c P(c|a) max_b P(e|b,c) P(b|a) P(d|a,b)
         ≤ max_c P(c|a) [max_b P(e|b,c)] [max_b P(b|a) P(d|a,b)]
           (the mini-bucket relaxation, used as the heuristic H)
g(a,e,d) H(a,e,d) ≥ f(a,e,d)
The heuristic function H is compiled
during the preprocessing stage of the
Mini-Bucket algorithm.
106
Heuristic Function
The evaluation function f(x^p) can be computed
using the functions recorded by the Mini-Bucket scheme,
and can be used to estimate the probability of
the best extension of the partial assignment
x^p = (x_1, ..., x_p):
f(x^p) = g(x^p) H(x^p)
For example, along the ordering A, E, D, C, B:
Bucket B:  P(e|b,c) || P(b|a), P(d|a,b)
           => h^B(e,c) = max_b P(e|b,c),  h^B(d,a) = max_b P(b|a) P(d|a,b)
Bucket C:  P(c|a), h^B(e,c)   => h^C(e,a) = max_c P(c|a) h^B(e,c)
Bucket D:  h^B(d,a)           => h^D(a) = max_d h^B(d,a)
Bucket E:  h^C(e,a)           => h^E(a) = max_e h^C(e,a)
Bucket A:  P(a), h^E(a), h^D(a)
H(a,e,d) = h^B(d,a) h^C(e,a)
g(a,e,d) = P(a)
107
Properties
  • The heuristic is monotone
  • The heuristic is admissible
  • The heuristic is computed in linear time
  • IMPORTANT:
  • Mini-buckets generate heuristics of varying
    strength using the control parameter (bound) i
  • Higher bound -> more preprocessing ->
  • stronger heuristics -> less search
  • Allows a controlled trade-off between preprocessing
    and search

108
Empirical Evaluation of mini-bucket heuristics
109
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • Local inference: mini-buckets
  • Stochastic simulations
  • Variational techniques
  • MDPs

110
Stochastic Simulation
  • Forward sampling (logic sampling)
  • Likelihood weighting
  • Markov Chain Monte Carlo (MCMC): Gibbs sampling

111
Approximation via Sampling
112
Forward Sampling (logic sampling; Henrion, 1988)

113
Forward sampling (example)
Drawback: high rejection rate!
114
Likelihood Weighting (Fung and Chang, 1990;
Shachter and Peot, 1990)
Clamping evidence + forward sampling + weighting
samples by the evidence likelihood
Works well for likely evidence!
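A minimal Python sketch of likelihood weighting for a discrete network (the network representation, the cpds format, and the function name are assumptions for illustration):

import random

def likelihood_weighting(nodes, parents, cpds, evidence, query, n_samples=10000):
    # Likelihood weighting, a minimal sketch.
    # nodes: topologically ordered list of variable names
    # parents: {node: tuple of its parents}
    # cpds: {node: function(parent_values_tuple) -> {value: probability}}
    # evidence: {node: observed value}; query: a single node
    weights = {}
    for _ in range(n_samples):
        sample, w = {}, 1.0
        for x in nodes:
            dist = cpds[x](tuple(sample[p] for p in parents[x]))
            if x in evidence:
                sample[x] = evidence[x]
                w *= dist[evidence[x]]           # clamp evidence, weight by its likelihood
            else:
                vals, probs = zip(*dist.items())
                sample[x] = random.choices(vals, weights=probs)[0]  # forward-sample
        weights[sample[query]] = weights.get(sample[query], 0.0) + w
    z = sum(weights.values())
    return {val: wt / z for val, wt in weights.items()}   # estimate of P(query | evidence)

With an empty evidence dict this degenerates to plain forward (logic) sampling; with unlikely evidence most of the weight concentrates on a few samples, which is why the slide notes that the method works best for likely evidence.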
115
Gibbs Sampling (Geman and Geman, 1984)
Markov Chain Monte Carlo (MCMC): create a Markov
chain of samples
Advantage: guaranteed to converge to
P(X). Disadvantage: convergence may be slow
116
Gibbs Sampling (cont'd) (Pearl, 1988)
Markov blanket:
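In a Bayesian network, the Gibbs update of an unobserved variable X_i conditions only on its Markov blanket (its parents, children, and the children's other parents); a standard form of the update (notation assumed) is

P(x_i \mid x_{-i}) \;=\; P\big(x_i \mid \text{markov blanket}(X_i)\big) \;\propto\; P(x_i \mid pa_i)\, \prod_{X_j \in ch(X_i)} P(x_j \mid pa_j).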
117
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • Local inference: mini-buckets
  • Stochastic simulations
  • Variational techniques
  • MDPs

118
Variational Approximations
  • Idea:
  • variational transformation of CPDs simplifies
    inference
  • Advantages:
  • Compute upper and lower bounds on P(Y)
  • Usually faster than sampling techniques
  • Disadvantages:
  • More complex and less general; must be re-derived
    for each particular form of CPD functions

119
Variational bounds: example
log(x)
This approach can be generalized for any concave
(convex) function in order to compute its
upper (lower) bounds
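For the concave log function, the convex-duality (conjugate) transformation referred to here can be written as (a standard bound; λ > 0 is the variational parameter)

\log(x) \;=\; \min_{\lambda > 0}\,\big(\lambda x - \log\lambda - 1\big) \;\le\; \lambda x - \log\lambda - 1,

with equality at λ = 1/x; optimizing over λ recovers log(x) exactly, and an analogous max form yields lower bounds for convex functions.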
120
Convex duality approach (Jaakkola and Jordan,
1997)
121
Example: QMR-DT network (Quick Medical Reference,
Decision-Theoretic; Shwe et al., 1991)
600 diseases
4000 findings
Noisy-OR model
122
Inference in QMR-DT
Negative evidence: factorized
Positive evidence couples the disease nodes
(not factorized)
Inference complexity: O(exp(min(p, k))), where p =
number of positive findings and k = max family
size (Heckerman, 1989 (Quickscore); Rish and
Dechter, 1998)
123
Variational approach to QMR-DT (Jaakkola and
Jordan, 1997)
The effect of positive evidence is now factorized
(diseases are decoupled)
124
Variational approach (cont.)
  • Bounds on local CPDs yield a bound on the posterior
  • Two approaches: sequential and block
  • Sequential: applies the variational transformation to
    (a subset of) nodes sequentially during inference,
    using a heuristic node ordering, and then optimizes
    over the variational parameters
  • Block: selects in advance the nodes to be
    transformed, then selects the variational parameters
    minimizing the KL-distance between the true and
    approximate posteriors

125
Block approach
126
Variational approach: summary
  • Variational approximations were successfully
    applied to inference in QMR-DT and neural
    networks (logistic functions), and to learning
    (the approximate E-step in the EM algorithm)
  • For more details, see:
  • Saul, Jaakkola, and Jordan, 1996
  • Jaakkola and Jordan, 1997
  • Neal and Hinton, 1998
  • Jordan, 1999

127
Road map
  • CSPs: complete algorithms
  • CSPs: approximations
  • Belief nets: complete algorithms
  • Belief nets: approximations
  • MDPs
  • Elimination and Conditioning

128
Decision-Theoretic Planning
Example: robot navigation
  • State: (X, Y, Battery_Level)
  • Actions: Go_North, Go_South, Go_West, Go_East
  • Probability of success: P
  • Task: reach the goal location ASAP

129
Dynamic Belief Networks (DBNs)
Two-stage influence diagram
Interaction graph
130
Markov Decision Process
131
Dynamic Programming = Elimination
132
Bucket Elimination
Complexity: O(exp(w))
133
MDPs: Elimination and Conditioning
  • Finite-horizon MDPs:
  • dynamic programming = elimination along the
    temporal ordering (N slices)
  • Infinite-horizon MDPs:
  • Value Iteration (VI): elimination along the
    temporal ordering (iterative; a small sketch is
    given below)
  • Policy Iteration (PI): conditioning on A_j,
    elimination on X_j (iterative)
  • Bucket elimination: non-temporal orderings
  • Complexity: exponential in the induced width
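A minimal value-iteration sketch for a finite MDP in Python (the dict-based transition/reward tables and all names are illustrative assumptions):

def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-6):
    # Value Iteration: repeatedly take an expectation over transitions
    # and a max over actions (elimination along the temporal ordering).
    # P[s][a] = {s_next: probability}; R[s][a] = immediate reward.
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break                      # value function has (approximately) converged
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + gamma * sum(p * V[t]
                                                         for t, p in P[s][a].items()))
              for s in states}
    return V, policy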

134
MDPs: approximations
  • Open directions for further research:
  • Applying probabilistic inference approximations
    to DBNs
  • Handling actions (rewards)
  • Approximating elimination, heuristic search, etc.

135
Conclusions
  • Common reasoning approaches: elimination and
    conditioning
  • Exact reasoning is often intractable => need
    approximations
  • Approximation principles:
  • Approximating elimination: local inference,
    bounding the size of dependencies among variables
    (cliques in a problem's graph).
  • Mini-buckets, IBP, i-consistency enforcing
  • Approximating conditioning: local search,
    stochastic simulations
  • Other approximations: variational techniques,
    etc.
  • Further research:
  • Combining orthogonal approximation approaches
  • Better understanding of what works well where:
    which approximation suits which problem structure
  • Other approximation paradigms (e.g., other ways
    of approximating probabilities, constraints, cost
    functions)