1
Approximation Techniques: bounded inference
  • COMPSCI 276
  • Fall 2007

2
Approximate Inference
  • Metrics of evaluation
  • Absolute error: given ε > 0 and a query p = P(x|e), an
    estimate r has absolute error ε iff |p − r| < ε
  • Relative error: the ratio r/p lies in [1 − ε, 1 + ε]
  • Dagum and Luby 1993: approximation up to a
    relative error is NP-hard
  • Absolute error is also NP-hard if the error is less
    than 0.5
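
For concreteness, the two metrics in code (a minimal sketch; the
helper names are mine, not the presentation's):

    def has_absolute_error(p, r, eps):
        # estimate r lies within an additive band of width eps around p
        return abs(p - r) < eps

    def has_relative_error(p, r, eps):
        # the ratio r/p falls in [1 - eps, 1 + eps]; undefined for p == 0
        return p > 0 and (1 - eps) <= r / p <= (1 + eps)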

3
Mini-buckets: local inference
  • Computation in a bucket is time and space
    exponential in the number of variables involved
  • Therefore, partition the functions in a bucket
    into mini-buckets over smaller numbers of
    variables
  • (The idea is similar to i-consistency:
    bound the size of recorded dependencies,
    Dechter 2003)

4
Mini-bucket approximation: MPE task
Split a bucket into mini-buckets ⇒ bound
complexity
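
Why splitting bounds the value as well as the complexity: maximizing
each mini-bucket separately can only overestimate the exact bucket
maximum. A small self-contained check with random tables (an
illustrative sketch, not the course's code):

    import random
    from itertools import product

    random.seed(0)
    dom = (0, 1)
    # four functions over (B, X) placed in B's bucket
    f = [{(b, x): random.random() for b, x in product(dom, dom)}
         for _ in range(4)]

    def exact_bucket(x):
        # exact elimination: max over B of the product of all functions
        return max(f[0][b, x] * f[1][b, x] * f[2][b, x] * f[3][b, x]
                   for b in dom)

    def mini_bucket_bound(x):
        # split into {f0, f1} and {f2, f3}; maximize each part separately
        return (max(f[0][b, x] * f[1][b, x] for b in dom) *
                max(f[2][b, x] * f[3][b, x] for b in dom))

    for x in dom:
        assert mini_bucket_bound(x) >= exact_bucket(x)  # upper bound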
5
Mini-Bucket Elimination
[Diagram: mini-bucket elimination along the ordering A, B, C, D, E.
Bucket B holds F(b,d), F(b,e), F(a,b), F(b,c) and is split into two
mini-buckets, {F(b,d), F(b,e)} and {F(a,b), F(b,c)}; eliminating B
from each separately yields the messages hB(d,e) and hB(a,c).
Bucket C: F(c,e), F(a,c), hB(a,c), producing hC(e,a).
Bucket D: F(a,d), hB(d,e), producing hD(e,a).
Bucket E (evidence e = 0): hC(e,a), hD(e,a), producing hE(a).
Bucket A: hE(a), producing the bound L (a lower bound when
mini-buckets are processed by min rather than max).]
6
Semantics of Mini-Bucket: Splitting a Node
Variables in different mini-buckets are renamed and
duplicated (Kask et al., 2001), (Geffner et al., 2007),
(Choi, Chavira, Darwiche, 2007)
[Diagram: network N before splitting, with node U; network N'
after splitting, with U duplicated into U and Û.]
7
Approx-mpe(i)
  • Input: i, the maximum number of variables allowed in a
    mini-bucket
  • Output: lower bound (the probability of a sub-optimal
    solution) and upper bound

Example: approx-mpe(3) versus elim-mpe
8
MBE(i,m) (also MBE(i), approx-mpe)
  • Input: belief network (P1, ..., Pn)
  • Output: upper and lower bounds
  • Initialize: put the functions in buckets
  • Process each bucket, from bucket n down to bucket 1:
  • create (i,m)-mini-buckets
  • process each mini-bucket
  • (For MPE) assign values along the ordering d
  • Return the MPE tuple and the upper and lower bounds
    (a sketch follows below)
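
A compact Python sketch of the scheme under simplifying assumptions
(binary variables, tables stored as dicts, a greedy mini-bucket
partition; every name here is mine, not the authors' code). Each
input function must be placed in the bucket of its earliest variable
in the elimination order, and i_bound should be at least as large as
the biggest input scope:

    from itertools import product

    DOM = (0, 1)  # binary domains, for simplicity

    def combine(funcs):
        # multiply a set of functions; a function is (scope, table)
        scope = tuple(sorted({v for s, _ in funcs for v in s}))
        table = {}
        for asg in product(DOM, repeat=len(scope)):
            env = dict(zip(scope, asg))
            val = 1.0
            for s, t in funcs:
                val *= t[tuple(env[v] for v in s)]
            table[asg] = val
        return scope, table

    def max_out(func, var):
        # eliminate var by maximization (the MPE operator)
        scope, table = func
        keep = tuple(v for v in scope if v != var)
        i = scope.index(var)
        new = {}
        for asg, val in table.items():
            key = asg[:i] + asg[i + 1:]
            new[key] = max(new.get(key, 0.0), val)
        return keep, new

    def mbe_upper_bound(buckets, order, i_bound):
        # buckets: var -> list of functions initially placed there
        bound = 1.0
        for pos, var in enumerate(order):
            minis = []  # greedy partition into mini-buckets
            for fn in buckets[var]:
                home = None
                for mb in minis:
                    joint = {v for g in mb for v in g[0]} | set(fn[0])
                    if len(joint) <= i_bound:
                        home = mb
                        break
                if home is None:
                    minis.append([fn])
                else:
                    home.append(fn)
            for mb in minis:
                h = max_out(combine(mb), var)
                dest = next((v for v in order[pos + 1:] if v in h[0]), None)
                if dest is None:
                    bound *= max(h[1].values())  # constant message
                else:
                    buckets[dest].append(h)
        return bound  # upper bound on the MPE value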

9
Properties of approx-mpe(i)
  • Complexity: O(exp(2i)) time and O(exp(i))
    space
  • Accuracy: determined by the upper/lower (U/L) bound
    ratio
  • As i increases, both accuracy and complexity
    increase
  • Possible uses of mini-bucket approximations:
  • as anytime algorithms (Dechter and Rish, 1997)
  • as heuristics in best-first search (Kask and
    Dechter, 1999)
  • Other tasks: similar mini-bucket approximations
    for belief updating, MAP and MEU (Dechter and
    Rish, 1997)

10
Anytime Approximation
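
The anytime use of MBE can be sketched as a loop that raises the
i-bound until the bounds are tight enough; run_mbe is a hypothetical
handle that runs MBE(i) and returns its two bounds:

    def anytime_mpe(run_mbe, eps, i_max):
        # each iteration costs more (time exp(2i), space exp(i))
        # but tightens the U/L ratio
        upper = lower = None
        for i in range(1, i_max + 1):
            upper, lower = run_mbe(i)
            if lower > 0 and upper / lower <= 1 + eps:
                break  # desired accuracy reached
        return upper, lower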
11
Bounded Inference for belief updating
  • Idea: the mini-bucket scheme is the same
  • So we can apply a sum in each mini-bucket or,
    better, one sum and the rest max, or min (for a
    lower bound)
  • Approx-bel-max(i,m), generating upper and
    lower bounds on beliefs, approximates elim-bel
  • Approx-map(i,m): max buckets are maximized,
    sum buckets are sum/max; approximates
    elim-map

12
Empirical Evaluation (Dechter and Rish, 1997;
Rish thesis, 1999)
  • Randomly generated networks
  • Uniform random probabilities
  • Random noisy-OR
  • CPCS networks
  • Probabilistic decoding
  • Comparing approx-mpe and anytime-mpe
    versus elim-mpe

13
Causal Independence
  • Event X has two possible causes, A and B. It is hard
    to elicit P(X|A,B), but it is easy to determine
    P(X|A) and P(X|B)
  • Example: several diseases cause a symptom
  • The effect of A on X is independent of the effect
    of B on X
  • Causal independence, using canonical models:
    noisy-OR, noisy-AND, noisy-max

[Diagram: causes A and B, each with an edge into effect X.]
14
Binary OR
[Diagram: causes A and B with edges into effect X.]

A B   P(X=0|A,B)   P(X=1|A,B)
0 0   1            0
0 1   0            1
1 0   0            1
1 1   0            1
15
Noisy-OR
  • Noise is associated with each edge and
    described by a noise parameter ε ∈ [0,1]
  • Define qi = 1 − εi = P(X=0 | Ai=1, all other causes absent)
  • Let qa = 0.1, qb = 0.2
  • When both causes are present:
    P(x=0|a,b) = (1 − εa)(1 − εb) = qa · qb
    P(x=1|a,b) = 1 − (1 − εa)(1 − εb)

[Diagram: causes A and B with noisy edges εa, εb into effect X.]

A B   P(X=0|A,B)   P(X=1|A,B)
0 0   1            0
0 1   0.2          0.8
1 0   0.1          0.9
1 1   0.02         0.98
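
The full CPT follows from the qi by the product rule above; a short
sketch (the function name is mine):

    from itertools import product

    def noisy_or_cpt(q):
        # q[i] = P(X=0 | only cause i present) = 1 - noise_i; by causal
        # independence, P(X=0|u) is the product of q[i] over active causes
        cpt = {}
        for u in product((0, 1), repeat=len(q)):
            p0 = 1.0
            for qi, ui in zip(q, u):
                if ui:
                    p0 *= qi
            cpt[u] = (p0, 1.0 - p0)  # (P(X=0|u), P(X=1|u))
        return cpt

    print(noisy_or_cpt([0.1, 0.2]))  # reproduces the table above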
16
Closed Form Bel(X) - 1
Given a noisy-OR CPT P(x|u) with noise parameters εi,
let Tu = { i : Ui = 1 } and define qi = 1 − εi. Then
P(x = 0|u) = ∏_{i ∈ Tu} qi  and  P(x = 1|u) = 1 − ∏_{i ∈ Tu} qi
17
Closed Form Bel(X) - 2
Using Iterative Belief Propagation: set πi = πx(ui = 1),
the causal-support message for parent Ui. Then we can show
that Bel(x = 0) = ∏_i ( 1 − πi (1 − qi) )
18
Methodology for Empirical Evaluation (for MPE)
  • U/L accuracy
  • Better: U/MPE or MPE/L
  • Benchmarks: random networks
  • Given n, e, v, generate a random DAG
  • For each xi and its parents, generate a table from
    uniform [0,1], or noisy-OR
  • Create k instances; for each, generate random
    evidence and likely evidence
  • Measure averages

19
Random networks
  • Uniform random: 60 nodes, 90 edges (200
    instances)
  • In 80% of cases, a 10-100 times speed-up while
    U/L < 2
  • Noisy-OR: even better results
  • Exact elim-mpe was infeasible; approx-mpe took
    0.1 to 80 sec.

20
CPCS networks: medical diagnosis (noisy-OR model)
Test case: no evidence
21
The effect of evidence
More likely evidence ⇒ higher MPE ⇒ higher
accuracy (why?)
Likely evidence versus random (unlikely) evidence
22
Probabilistic decoding
Error-correcting linear block code
State-of-the-art approximate algorithm: iterative
belief propagation (IBP) (Pearl's poly-tree
algorithm applied to loopy networks)
23
Iterative Belief Propagation
  • Belief propagation is exact for poly-trees
  • IBP - applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks

24
approx-mpe vs. IBP
Bit error rate (BER) as a function of noise
(sigma)
25
Mini-buckets: summary
  • Mini-buckets: a local inference approximation
  • Idea: bound the size of recorded functions
  • Approx-mpe(i): a mini-bucket algorithm for MPE
  • Better results for noisy-OR than for random
    problems
  • Accuracy increases with decreasing noise in
    coding
  • Accuracy increases for likely evidence
  • Sparser graphs → higher accuracy
  • Coding networks: approx-mpe outperforms IBP on
    low induced-width codes

26
Cluster Tree Elimination - properties
  • Correctness and completeness: Algorithm CTE is
    correct, i.e. it computes the exact joint
    probability of a single variable and the
    evidence
  • Time complexity: O( deg · (n + N) · d^(w* + 1) )
  • Space complexity: O( N · d^sep )
  • where deg = the maximum degree of a node
  • n = number of variables (= number of CPTs)
  • N = number of nodes in the tree decomposition
  • d = the maximum domain size of a variable
  • w* = the induced width
  • sep = the separator size

27
Cluster Tree Elimination - the messages
Cluster 1: {A,B,C} with p(a), p(b|a), p(c|a,b)
  separator {B,C}
Cluster 2: {B,C,D,F} with p(d|b), p(f|c,d) and incoming h(1,2)(b,c)
  separator {B,F}: sep(2,3) = {B,F}, elim(2,3) = {C,D}
Cluster 3: {B,E,F} with p(e|b,f) and incoming h(2,3)(b,f)
  separator {E,F}
Cluster 4: {E,F,G} with p(g|e,f)
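
For example, h(2,3)(b,f) is the sum over elim(2,3) = {C,D} of the
product of cluster 2's functions. A sketch, assuming CPTs are stored
as dicts keyed by (child, parents...):

    from itertools import product

    def h_2_3(p_d_b, p_f_cd, h_1_2, dom=(0, 1)):
        # sum out C and D, leaving the separator sep(2,3) = {B,F}
        return {(b, f): sum(p_d_b[d, b] * p_f_cd[f, c, d] * h_1_2[b, c]
                            for c in dom for d in dom)
                for b, f in product(dom, dom)}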
28
Mini-Clustering for belief updating
  • Motivation
  • The time and space complexity of Cluster Tree
    Elimination depend on the induced width w of the
    problem
  • When the induced width w is big, the CTE algorithm
    becomes infeasible
  • The basic idea
  • Try to reduce the size of the cluster (the
    exponent): partition each cluster into
    mini-clusters with fewer variables
  • Accuracy parameter i = maximum number of
    variables in a mini-cluster
  • The idea was explored for variable elimination
    (mini-buckets)

29
Idea of Mini-Clustering
30
Mini-Clustering - MC
[Diagram: Cluster Tree Elimination versus Mini-Clustering with
i = 3 on the same tree. In CTE, cluster 2 = {B,C,D,F} holds p(d|b),
h(1,2)(b,c), p(f|c,d) and sends one exact message; in MC the cluster
is partitioned before its message is sent. sep(2,3) = {B,F},
elim(2,3) = {C,D}.]
31
Mini-Clustering - the messages, i = 3
Cluster 1: {A,B,C} with p(a), p(b|a), p(c|a,b)
  separator {B,C}
Cluster 2, partitioned into mini-clusters
  {B,C,D}: p(d|b), h(1,2)(b,c) and {C,D,F}: p(f|c,d)
  separator {B,F}: sep(2,3) = {B,F}, elim(2,3) = {C,D}
Cluster 3: {B,E,F} with p(e|b,f) and incoming h1(2,3)(b), h2(2,3)(f)
  separator {E,F}
Cluster 4: {E,F,G} with p(g|e,f)
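
Under mini-clustering, the single exact message becomes two cheaper
ones. A sketch in the same dict representation, with the first
mini-cluster summed and the second maximized so that the product of
the two messages upper-bounds the exact h(2,3)(b,f):

    def mc_messages_2_3(p_d_b, p_f_cd, h_1_2, dom=(0, 1)):
        # mini-cluster {B,C,D}: sum out C and D from p(d|b) * h(1,2)(b,c)
        h1 = {b: sum(p_d_b[d, b] * h_1_2[b, c] for c in dom for d in dom)
              for b in dom}
        # mini-cluster {C,D,F}: eliminate C and D by max (upper bound)
        h2 = {f: max(p_f_cd[f, c, d] for c in dom for d in dom)
              for f in dom}
        return h1, h2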
32
Mini-Clustering - example
[Diagram: the tree 1:{A,B,C}, 2:{B,C,D,F}, 3:{B,E,F}, 4:{E,F,G}
with separators BC, BF, EF.]
33
Cluster Tree Elimination vs. Mini-Clustering
[Diagram: the same tree decomposition shown side by side, processed
exactly by CTE and approximately by MC with partitioned clusters.]
34
Mini-Clustering
  • Correctness and completeness: Algorithm MC(i)
    computes a bound (or an approximation) on the
    joint probability P(Xi, e) of each variable and
    each of its values
  • Time and space complexity: O(n · hw* · d^i),
    where hw* = max_u | { f : scope(f) ∩ χ(u) ≠ ∅ } |

35
Lower bounds and mean approximations
  • We can replace the max operator by
  • min ⇒ a lower bound on the joint
  • mean ⇒ an approximation of the joint

36
Normalization
  • MC can compute an (upper) bound on
    the joint P(Xi, e)
  • Deriving a bound on the conditional P(Xi|e) is
    not easy when the exact P(e) is not available
  • If a lower bound on P(e) were available, we could
    use (upper bound on the joint) / (lower bound on
    P(e)) as an upper bound on the posterior
  • In our experiments we normalized the results and
    regarded them as approximations of the posterior
    P(Xi|e), as sketched below
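
The normalization step just rescales each variable's vector of joint
bounds; a one-line sketch:

    def normalize(bel):
        # treat the bounds on P(Xi, e) as unnormalized beliefs
        z = sum(bel.values())
        return {x: v / z for x, v in bel.items()}

    print(normalize({0: 0.03, 1: 0.01}))  # {0: 0.75, 1: 0.25}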

37
Experimental results
  • Algorithms
  • Exact
  • IBP
  • Gibbs sampling (GS)
  • MC with normalization (approximate)
  • Networks (all variables are binary)
  • Coding networks
  • CPCS 54, 360, 422
  • Grid networks (MxM)
  • Random noisy-OR networks
  • Random networks
  • Measures
  • Normalized Hamming Distance (NHD)
  • BER (Bit Error Rate)
  • Absolute error
  • Relative error
  • Time

38
Random networks - Absolute error
evidence = 0
evidence = 10
39
Noisy-OR networks - Absolute error
evidence = 10
evidence = 20
40
Grid 15x15 - 10 evidence
41
CPCS422 - Absolute error
evidence = 0
evidence = 10
42
Coding networks - Bit Error Rate
sigma = 0.22
sigma = 0.51
43
Mini-Clustering summary
  • MC extends the partition-based approximation from
    mini-buckets to general tree decompositions for
    the problem of belief updating
  • Empirical evaluation demonstrates its
    effectiveness and superiority (for certain types
    of problems, with respect to the measures
    considered) relative to other existing algorithms

44
What is IJGP?
  • IJGP is an approximate algorithm for belief
    updating in Bayesian networks
  • IJGP is a version of join-tree clustering which
    is both anytime and iterative
  • IJGP applies message passing along a join-graph,
    rather than a join-tree
  • Empirical evaluation shows that IJGP is almost
    always superior to other approximate schemes
    (IBP, MC)

45
Iterative Belief Propagation - IBP
One-step update of BEL(U1)
[Diagram: a network fragment with parents U1, U2, U3 and children
X1, X2 of the updated variable.]
  • Belief propagation is exact for poly-trees
  • IBP - applying BP iteratively to cyclic networks
  • No guarantees for convergence
  • Works well for many coding networks
46
IJGP - Motivation
  • IBP is applied to a loopy network iteratively
  • not an anytime algorithm
  • when it converges, it converges very fast
  • MC applies bounded inference along a tree
    decomposition
  • MC is an anytime algorithm controlled by the i-bound
  • MC converges in two passes, up and down the tree
  • IJGP combines
  • the iterative feature of IBP
  • the anytime feature of MC

47
IJGP - The basic idea
  • Apply Cluster Tree Elimination to any join-graph
  • We commit to graphs that are minimal I-maps
  • Avoid cycles as long as I-mapness is not violated
  • Result: use minimal arc-labeled join-graphs

48
IJGP - Example
[Diagram: a) a belief network over variables A through J; b) the
dual join-graph IBP works on, with clusters ABC, ABDE, BCE, CDEF,
FGH, FGI, GHIJ and arcs labeled by shared variables.]
49
Arc-minimal join-graph
[Diagram: the join-graph above before and after removing redundant
arcs, keeping each variable's arc subgraph connected with as few
arcs as possible.]
50
Minimal arc-labeled join-graph
[Diagram: the arc labels of the join-graph above reduced to minimal
variable sets.]
51
Join-graph decompositions
[Diagram: a) a minimal arc-labeled join-graph; b) the join-graph
obtained by collapsing nodes of graph a), e.g. merging clusters
into ABCDE; c) the minimal arc-labeled version of graph b).]
52
Tree decomposition
[Diagram: a) a minimal arc-labeled join-graph; b) the corresponding
tree decomposition with clusters ABCDE, CDEF, FGHI, GHIJ.]
53
Join-graphs
[Diagram: a spectrum of join-graph decompositions, from the graph
IBP works on to a tree decomposition: more accuracy in one
direction, less complexity in the other.]
54
Message propagation
[Diagram: cluster 1 = {A,B,C,D,E} with p(a), p(c), p(b|a,c),
p(d|a,b,e), p(e|b,c) and incoming message h(3,1)(b,c); it sends
message h(1,2) to cluster 2 = {C,D,E,F}.]
Minimal arc-labeled: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}
Non-minimal arc-labeled: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}
55
Bounded decompositions
  • We want arc-labeled decompositions such that
  • the cluster size (internal width) is bounded by i
    (the accuracy parameter)
  • the width of the decomposition as a graph
    (external width) is as small as possible
  • Possible approaches to building decompositions:
  • partition-based algorithms, inspired by the
    mini-bucket decomposition
  • grouping-based algorithms

56
Partition-based algorithms
[Diagram: a) a schematic mini-bucket(i) run, i = 3, with buckets
G: P(G|F,E); E: P(E|B,F); F: P(F|C,D); D: P(D|B); C: P(C|A,B);
B: P(B|A); A: P(A); b) the corresponding arc-labeled join-graph
decomposition.]
57
IJGP properties
  • IJGP(i) applies BP to a minimal arc-labeled
    join-graph whose cluster size is bounded by i
  • On join-trees, IJGP finds exact beliefs
  • IJGP is a Generalized Belief Propagation
    algorithm (Yedidia, Freeman, Weiss 2001)
  • Complexity of one iteration:
  • time: O(deg · (n + N) · d^(i+1))
  • space: O(N · d^θ), θ being the maximum arc-label size

58
Empirical evaluation
  • Measures
  • Absolute error
  • Relative error
  • Kullback-Leibler (KL) distance
  • Bit Error Rate
  • Time
  • Algorithms
  • Exact
  • IBP
  • MC
  • IJGP
  • Networks (all variables are binary)
  • Random networks
  • Grid networks (MxM)
  • CPCS 54, 360, 422
  • Coding networks

59
Random networks - KL at convergence
evidence = 0
evidence = 5
60
Random networks - KL vs. iterations
evidence = 0
evidence = 5
61
Random networks - Time
62
Coding networks - BER
sigma = 0.22
sigma = 0.32
sigma = 0.51
sigma = 0.65
63
Coding networks - Time
64
IJGP summary
  • IJGP borrows the iterative feature from IBP and
    the anytime virtues of bounded inference from MC
  • Empirical evaluation showed the potential of
    IJGP, which improves with iteration and, most of
    the time, with the i-bound, and scales up to large
    networks
  • IJGP is almost always superior, often by a wide
    margin, to IBP and MC
  • Based on all our experiments, we think that IJGP
    provides a practical breakthrough for the task of
    belief updating

65
Heuristic search
  • Mini-buckets record upper-bound heuristics
  • The evaluation function is defined over partial
    assignments
  • Best-first: expand a node with maximal evaluation
    function
  • Branch and Bound: prune if f is no better than the
    best solution found so far (the lower bound); see
    the sketch below
  • Properties:
  • an exact algorithm
  • better heuristics lead to more pruning
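
A generic sketch of the Branch-and-Bound loop with a mini-bucket
heuristic; g and H are hypothetical stand-ins for the functions
defined on the next slides:

    def branch_and_bound(order, domains, g, H):
        # depth-first BnB for MPE (maximization); g scores a partial
        # assignment and H upper-bounds its best possible extension,
        # so f = g * H is admissible
        best = [0.0]

        def expand(partial):
            if len(partial) == len(order):
                best[0] = max(best[0], g(partial))
                return
            for val in domains[order[len(partial)]]:
                ext = partial + [val]
                if g(ext) * H(ext) > best[0]:  # otherwise prune
                    expand(ext)

        expand([])
        return best[0]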

66
Heuristic Function
Given a cost function
P(a,b,c,d,e) = P(a) · P(b|a) · P(c|a) · P(e|b,c) · P(d|b,a)
define an evaluation function over a partial
assignment as the probability of its best
extension:
[Diagram: search tree over the ordering A, E, D, branching on the
values 0/1 of each variable, with B and C not yet assigned.]
f(a,e,d) = max_{b,c} P(a,b,c,d,e)
         = P(a) · max_{b,c} P(b|a) · P(c|a) · P(e|b,c) · P(d|a,b)
         = g(a,e,d) · H(a,e,d)
67
Heuristic Function
H(a,e,d) = max_{b,c} P(b|a) · P(c|a) · P(e|b,c) · P(d|a,b)
         = max_c P(c|a) · max_b P(e|b,c) · P(b|a) · P(d|a,b)
         ≤ max_c [ P(c|a) · max_b P(e|b,c) ] · max_b [ P(b|a) · P(d|a,b) ]
The mini-bucket product on the right is the heuristic H actually
used, so g(a,e,d) · H(a,e,d) ≥ f(a,e,d).
The heuristic function H is compiled during the preprocessing stage
of the Mini-Bucket algorithm.
68
Heuristic Function
The evaluation function f(x^p) can be computed using the functions
recorded by the Mini-Bucket scheme, and estimates the probability of
the best extension of the partial assignment x^p = (x1, ..., xp):
f(x^p) = g(x^p) · H(x^p)
For example:
  bucket B: max_B over {P(e|b,c)} and {P(b|a), P(d|a,b)}
            produces hB(e,c) and hB(d,a)
  bucket C: max_C P(c|a) · hB(e,c) produces hC(e,a)
  bucket D: max_D hB(d,a) produces hD(a)
  bucket E: max_E hC(e,a) produces hE(a)
  bucket A: max_A P(a) · hE(a) · hD(a)
H(a,e,d) = hB(d,a) · hC(e,a)
g(a,e,d) = P(a)
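
With hB and hC recorded, evaluating f(a,e,d) is just table lookups;
a sketch with illustrative table layouts:

    def f_aed(a, e, d, P_a, hB_da, hC_ea):
        # two lookups and a product: linear time, as the next
        # slide notes
        g = P_a[a]                      # exact cost of the prefix
        H = hB_da[d, a] * hC_ea[e, a]   # compiled mini-bucket bound
        return g * H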
69
Properties
  • The heuristic is monotone
  • The heuristic is admissible
  • The heuristic is computed in linear time
  • IMPORTANT:
  • Mini-buckets generate heuristics of varying
    strength using the control parameter (i-bound)
  • Higher bound → more preprocessing →
    stronger heuristics → less search
  • Allows a controlled trade-off between preprocessing
    and search

70
Empirical Evaluation of mini-bucket heuristics