Bayesian Networks
Transcript and Presenter's Notes

Title: Bayesian Networks


1
Bayesian Networks

2
Contents
  • Introduction
  • Probability Theory (skipped)
  • Inference
  • Clique Tree Propagation
  • Building the Clique Tree
  • Inference by Propagation

3
Bayesian Networks
  • Introduction

4
What Are Bayesian Networks?
  • Bayesian Networks are directed acyclic graphs
    (DAGs) with an associated set of probability
    tables.
  • The nodes are random variables.
  • Certain independence relations can be induced by
    the topology of the graph.

5
Why Use a Bayesian Network?
  • Deal with uncertainty in inference via
    probability (Bayes' rule).
  • Handle incomplete data sets, e.g., in
    classification and regression.
  • Model domain knowledge, e.g., causal
    relationships.

6
Example
Use a DAG to model causality. The nodes of the example network are:
Train Strike, Norman Oversleep, Martin Oversleep, Martin Late, Norman Late,
Project Delay, Office Dirty, Boss Failure-in-Love, Boss Angry.
7
Example
Attach prior probabilities to all root nodes:

  Norman oversleep:       P(T) = 0.2,   P(F) = 0.8
  Train strike:           P(T) = 0.1,   P(F) = 0.9
  Martin oversleep:       P(T) = 0.01,  P(F) = 0.99
  Boss failure-in-love:   P(T) = 0.01,  P(F) = 0.99
8
Example
Attach conditional probability tables (CPTs) to non-root nodes.
Each column sums to 1.

P(Martin Late | Train strike, Martin oversleep):
  Train strike:        T     T     F     F
  Martin oversleep:    T     F     T     F
  Martin Late = T:     0.95  0.80  0.70  0.05
  Martin Late = F:     0.05  0.20  0.30  0.95

P(Norman untidy | Norman oversleep):
  Norman oversleep:    T    F
  Norman untidy = T:   0.6  0.2
  Norman untidy = F:   0.4  0.8
9
Example
Attach conditional probability tables (CPTs) to non-root nodes.
Each column sums to 1.

P(Boss Angry | Boss Failure-in-love, Project Delay, Office Dirty):
  Boss Failure-in-love:   T     T     T     T     F     F     F     F
  Project Delay:          T     T     F     F     T     T     F     F
  Office Dirty:           T     F     T     F     T     F     T     F
  Boss Angry = very:      0.98  0.85  0.6   0.5   0.3   0.2   0     0.01
  Boss Angry = mid:       0.02  0.15  0.3   0.25  0.5   0.5   0.2   0.02
  Boss Angry = little:    0     0     0.1   0.25  0.2   0.3   0.7   0.07
  Boss Angry = no:        0     0     0     0     0     0     0.1   0.9

What is the difference between probability and fuzzy measurements?
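As a side note, the tables above can be written down directly in code. The sketch below is only an illustrative encoding (the dictionary representation and variable names are choices made here, not taken from the slides); it stores two of the CPTs and checks that each column sums to 1:

```python
# Illustrative encoding of two CPTs from the example network.
# Keys are parent assignments; values are distributions over the child.

# P(Martin Late | Train strike, Martin oversleep)
p_martin_late = {
    (True, True):   {True: 0.95, False: 0.05},
    (True, False):  {True: 0.80, False: 0.20},
    (False, True):  {True: 0.70, False: 0.30},
    (False, False): {True: 0.05, False: 0.95},
}

# P(Norman untidy | Norman oversleep)
p_norman_untidy = {
    (True,):  {True: 0.6, False: 0.4},
    (False,): {True: 0.2, False: 0.8},
}

# Each column of a CPT (i.e., each parent assignment) must sum to 1.
for cpt in (p_martin_late, p_norman_untidy):
    for parent_assignment, dist in cpt.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9, parent_assignment
```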
10
Example
Medical Knowledge
11
Definition of Bayesian Networks
  • A Bayesian network is a directed acyclic graph
    with the following properties:
  • Each node represents a random variable.
  • Each node representing a variable A with parent
    nodes representing variables B1, B2, ..., Bn is
    assigned a conditional probability table (CPT)
    P(A | B1, B2, ..., Bn).

12
Problems
  • How do we perform inference?
  • How do we learn the probabilities from data?
  • How do we learn the structure from data?
  • What applications might we have?

Bad news: all of these problems are NP-hard.
13
Bayesian Networks
  • Inference

14
Inference
15
Example
P(Train Strike):
  Train Strike:   T    F
  Probability:    0.1  0.9

P(Norman Late | Train Strike):
  Train Strike:       T    F
  Norman Late = T:    0.8  0.1
  Norman Late = F:    0.2  0.9

P(Martin Late | Train Strike):
  Train Strike:       T    F
  Martin Late = T:    0.6  0.5
  Martin Late = F:    0.4  0.5

Questions:
  P(Martin Late, Norman Late, Train Strike)?    Joint distribution
  P(Martin Late)?                               Marginal distribution
  P(Martin Late | Norman Late)?                 Conditional distribution
16
Example
With A = Martin Late, B = Norman Late, C = Train Strike, the joint distribution
P(A, B, C) = P(C) P(A | C) P(B | C) is:

  A  B  C   Probability
  T  T  T   0.048
  F  T  T   0.032
  T  F  T   0.012
  F  F  T   0.008
  T  T  F   0.045
  F  T  F   0.045
  T  F  F   0.405
  F  F  F   0.405

Question: P(Martin Late, Norman Late, Train Strike)?   (joint distribution)
e.g., P(A=T, B=T, C=T) = P(C=T) P(A=T | C=T) P(B=T | C=T) = 0.1 × 0.6 × 0.8 = 0.048
17
Example
Summing the joint distribution over C gives the marginal P(A, B)
(A = Martin Late, B = Norman Late; CPTs as on the previous slides):

  A  B   Probability
  T  T   0.093
  F  T   0.077
  T  F   0.417
  F  F   0.413

Question: P(Martin Late, Norman Late)?   (marginal distribution)
e.g., P(A=T, B=T) = P(A=T, B=T, C=T) + P(A=T, B=T, C=F) = 0.048 + 0.045 = 0.093
18
Example
Summing further over B gives the marginal P(A) (A = Martin Late):

  A   Probability
  T   0.51
  F   0.49

Question: P(Martin Late)?   (marginal distribution)
e.g., P(A=T) = P(A=T, B=T) + P(A=T, B=F) = 0.093 + 0.417 = 0.51
19
Example
Summing the joint over A gives the marginal P(B) (B = Norman Late):

  B   Probability
  T   0.17
  F   0.83

Question: P(Martin Late | Norman Late)?   (conditional distribution)
e.g., P(A=T | B=T) = P(A=T, B=T) / P(B=T) = 0.093 / 0.17 ≈ 0.547
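The joint, marginal, and conditional queries worked through on the last few slides can be checked by brute-force enumeration over this three-variable network. This is only a sketch for the small example (the A/B/C shorthand follows the slides); it is not the propagation algorithm introduced later:

```python
from itertools import product

# C = Train Strike, A = Martin Late, B = Norman Late
p_c = {True: 0.1, False: 0.9}          # P(C)
p_a_given_c = {True: 0.6, False: 0.5}  # P(A = T | C)
p_b_given_c = {True: 0.8, False: 0.1}  # P(B = T | C)

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(C) * P(A | C) * P(B | C)."""
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

# Marginal P(A=T, B=T): sum the joint over C  -> 0.048 + 0.045 = 0.093
p_ab = sum(joint(True, True, c) for c in (True, False))

# Marginal P(A=T): sum the joint over B and C  -> 0.51
p_a = sum(joint(True, b, c) for b, c in product((True, False), repeat=2))

# Conditional P(A=T | B=T) = P(A=T, B=T) / P(B=T)  -> 0.093 / 0.17, about 0.547
p_b = sum(joint(a, True, c) for a, c in product((True, False), repeat=2))
print(p_ab, p_a, p_ab / p_b)
```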
20
Inference Methods
  • Exact algorithms
    • Probability propagation
    • Variable elimination
    • Cutset conditioning
    • Dynamic programming
  • Approximation algorithms
    • Variational methods
    • Sampling (Monte Carlo) methods
    • Loopy belief propagation
    • Bounded cutset conditioning
    • Parametric approximation methods

21
Independence Assertions
  • Bayesian networks have built-in independence
    assertions.
  • An independence assertion is a statement of the
    form
  • X and Y are independent given Z.
  • We say that X and Y are d-separated by Z.

That is, P(X | Y, Z) = P(X | Z), or equivalently
P(X, Y | Z) = P(X | Z) P(Y | Z).
The given (conditioning) variables are called the evidence.
22
d-Separation
23
Type of Connections
Three types of connections through an intermediate node Z:
  • Serial:       X → Z → Y
  • Diverging:    X ← Z → Y
  • Converging:   X → Z ← Y
24
d-Separation
Whether a path through Z is blocked depends on the connection type:
  • Serial:       blocked when Z is in the evidence.
  • Diverging:    blocked when Z is in the evidence.
  • Converging:   blocked unless Z or one of its descendants is in the evidence.
25
Joint Distribution
JPT: joint probability table.  CPT: conditional probability table.
With the CPTs we can compute all probabilities.
By the chain rule:
  P(X1, ..., Xn) = P(X1) P(X2 | X1) ... P(Xn | X1, ..., Xn-1)
By the independence assertions:
  P(X1, ..., Xn) = Π_i P(Xi | ΠXi),   where ΠXi denotes the parents of Xi.

Consider binary random variables:
  1. To store the JPT of all r.v.s: 2^n − 1 table entries.
  2. To store the CPTs of all r.v.s: Σ_i 2^|ΠXi| table entries (one free entry
     per parent assignment), far fewer when the graph is sparse (see the count below).
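To make the comparison concrete, the snippet below counts free parameters for the nodes of the office example whose parent sets can be read off the CPTs shown earlier. Treating every variable as binary (as the slide does) and restricting to this subset of nodes are illustrative assumptions:

```python
# Number of parents for each node, read off the CPTs shown on earlier slides.
num_parents = {
    "TrainStrike": 0, "NormanOversleep": 0, "MartinOversleep": 0,
    "BossFailureInLove": 0, "MartinLate": 2, "NormanUntidy": 1, "BossAngry": 3,
}

n = len(num_parents)
jpt_entries = 2 ** n - 1                                  # full joint table
cpt_entries = sum(2 ** k for k in num_parents.values())   # one free entry per parent assignment
print(jpt_entries, cpt_entries)   # 127 vs 18
```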

26
Joint Distribution

Consider binary random variables
  1. To store the JPT of all r.v.s: 2^n − 1 table entries.
  2. To store the CPTs of all r.v.s: Σ_i 2^|ΠXi| table entries.

27
Joint Distribution
To store JPT of all random variables
To store CPT of all random variables
28
More on d-Separation
A path from X to Y is d-connecting w.r.t. evidence
nodes E if every interior node N on the path has
the property that either
  1. it is linear (serial) or diverging and not a member of E, or
  2. it is converging, and either N or one of its
    descendants is in E.

29
More on d-Separation
Identify the d-connecting and non-d-connecting
paths from X to Y.
A path from X to Y is d-connecting w.r.t. evidence
nodes E if every interior node N on the path has
the property that either
  1. it is linear (serial) or diverging and not a member of E, or
  2. it is converging, and either N or one of its
    descendants is in E.

30
More on d-Separation
Two nodes are d-separated if there is no
d-connecting path between them.
Exercise:
Remove the minimum number of edges such that X and
Y are d-separated.
31
More on d-Separation
Two sets of nodes, say X = {X1, ..., Xm} and
Y = {Y1, ..., Yn}, are d-separated w.r.t. evidence
nodes E if every pair Xi, Yj is d-separated w.r.t. E.
In this case, we have P(X | Y, E) = P(X | E).
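One standard way to test a d-separation assertion (not spelled out on the slides) is the moralized ancestral graph criterion: keep X, Y, E and their ancestors, moralize, delete E, and check whether X and Y are still connected. A sketch, assuming the network is given as a map from each node to its list of parents:

```python
def _ancestors(parents, nodes):
    seen, stack = set(), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, es):
    """True iff X and Y are d-separated given evidence E (ancestral moral graph test)."""
    keep = set(xs) | set(ys) | set(es)
    keep |= _ancestors(parents, keep)
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, ()) if p in keep]
        for p in ps:                          # drop the edge directions
            adj[v].add(p); adj[p].add(v)
        for i in range(len(ps)):              # "marry" parents of a common child
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # remove the evidence nodes, then check reachability from X to Y
    reached, frontier = set(xs), [v for v in xs if v not in set(es)]
    while frontier:
        v = frontier.pop()
        for w in adj[v] - reached - set(es):
            reached.add(w); frontier.append(w)
    return not (reached & set(ys))

# e.g., in the small network (A <- C -> B), A and B are d-separated given C
print(d_separated({"A": ["C"], "B": ["C"], "C": []}, {"A"}, {"B"}, {"C"}))  # True
```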
32
Bayesian Networks
  • Clique Tree Propagation

33
References
  • Developed by Lauritzen and Spiegelhalter and
    refined by Jensen et al.

Lauritzen, S. L., and Spiegelhalter, D. J., "Local computations with
probabilities on graphical structures and their application to expert
systems," J. Roy. Stat. Soc. B, 50, 157-224, 1988.
Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., "Bayesian updating
in causal probabilistic networks by local computations," Comp. Stat.
Quart., 4, 269-282, 1990.
Shenoy, P., and Shafer, G., "Axioms for probability and belief-function
propagation," in Uncertainty in Artificial Intelligence, Vol. 4
(R. D. Shachter, T. Levitt, J. F. Lemmer and L. N. Kanal, Eds.),
Elsevier, North-Holland, Amsterdam, 169-198, 1990.
34
Clique Tree Propagation (CTP)
  • Given a Bayesian network, build a secondary
    structure, called a clique tree (an undirected tree).
  • Inference is performed by propagating belief
    potentials among the tree nodes.
  • It is an exact algorithm.

35
Notations
  Item                                 Notation              Examples
  Random variables (uninstantiated)    uppercase             A, B, C
  Random variables (instantiated)      lowercase             a, b, c
  Random vectors (uninstantiated)      boldface uppercase    X, Y, Z
  Random vectors (instantiated)        boldface lowercase    x, y, z
36
Definition: Family of a Node
The family of a node V, denoted FV, is defined by
FV = {V} ∪ ΠV, i.e., the node together with its parents.
Example: in the small network above, F(Martin Late) = {Martin Late, Train Strike}
and F(Train Strike) = {Train Strike}.
37
Potential and Distributions
We will model the probability tables as potential
functions. All of these tables map a set of random
variables to a real value.

Prior probability P(a), a function of a:
  a:      on   off
  P(a):   0.5  0.5

Conditional probability P(b | a), a function of a and b:
  a:               on   off
  P(b = on | a):   0.7  0.2
  P(b = off | a):  0.3  0.8

Conditional probability P(f | d, e), a function of d, e and f:
  d:                  on    on   off   off
  e:                  on    off  on    off
  P(f = on | d, e):   0.95  0.8  0.7   0.05
  P(f = off | d, e):  0.05  0.2  0.3   0.95
38
Potential
Used to implement matrices or tables.
Two operations
1. Marginalization 2. Multiplication
39
Marginalization
Example: marginalizing φABC onto {A, B} (sum over C), and φAB onto {A} (sum over B).

  A  B  C   φABC           A  B   φAB           A   φA
  T  T  T   0.048          T  T   0.093         T   0.51
  F  T  T   0.032          F  T   0.077         F   0.49
  T  F  T   0.012          T  F   0.417
  F  F  T   0.008          F  F   0.413
  T  T  F   0.045
  F  T  F   0.045
  T  F  F   0.405
  F  F  F   0.405
40
Multiplication
Example: multiplying φAB by φBC gives φABC. The entries of x and y that are
consistent with z are multiplied; the result does not necessarily sum to one.

  A  B   φAB              B  C   φBC
  T  T   0.093            T  T   0.08
  F  T   0.077            F  T   0.02
  T  F   0.417            T  F   0.09
  F  F   0.413            F  F   0.91

  A  B  C   φABC
  T  T  T   0.093 × 0.08 = 0.00744
  F  T  T   0.077 × 0.08 = 0.00616
  T  F  T   0.417 × 0.02 = 0.00834
  F  F  T   0.413 × 0.02 = 0.00826
  T  T  F   0.093 × 0.09 = 0.00837
  F  T  F   0.077 × 0.09 = 0.00693
  T  F  F   0.417 × 0.91 = 0.37947
  F  F  F   0.413 × 0.91 = 0.37583
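A minimal sketch of these two potential operations in code. The dictionary-based representation and the restriction to binary variables are illustrative choices, not the notation used in the slides:

```python
from itertools import product

def make_potential(variables, table):
    """A potential over `variables`, stored as {assignment tuple: value}."""
    return {"vars": list(variables), "table": dict(table)}

def marginalize(phi, keep):
    """Sum the potential down onto the variables in `keep`."""
    idx = [phi["vars"].index(v) for v in keep]
    out = {}
    for assign, val in phi["table"].items():
        key = tuple(assign[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return make_potential(keep, out)

def multiply(phi1, phi2):
    """Pointwise product: entries of phi1 and phi2 consistent with the joint assignment."""
    union = phi1["vars"] + [v for v in phi2["vars"] if v not in phi1["vars"]]
    out = {}
    for assign in product((True, False), repeat=len(union)):
        row = dict(zip(union, assign))
        v1 = phi1["table"][tuple(row[v] for v in phi1["vars"])]
        v2 = phi2["table"][tuple(row[v] for v in phi2["vars"])]
        out[assign] = v1 * v2
    return make_potential(union, out)

# The example above: multiply phi_AB by phi_BC, then marginalize back onto {A}.
phi_ab = make_potential(["A", "B"], {(True, True): 0.093, (False, True): 0.077,
                                     (True, False): 0.417, (False, False): 0.413})
phi_bc = make_potential(["B", "C"], {(True, True): 0.08, (False, True): 0.02,
                                     (True, False): 0.09, (False, False): 0.91})
phi_abc = multiply(phi_ab, phi_bc)
print(marginalize(phi_abc, ["A"])["table"])
```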
41
The Secondary Structure
Given a Bayesian network over a set of
variables U = {V1, ..., Vn}, its secondary
structure contains a graphical and a numerical
component.
Graphical component:
  an undirected clique tree satisfying the join
  tree property.
Numerical component:
  belief potentials on nodes and edges.
42
The Clique Tree T
How do we build a clique tree?
The clique tree T for a belief network over a
set of variables U = {V1, ..., Vn} satisfies the
following properties.
  • Each node in T is a cluster or clique (a nonempty
    set) of variables.
  • The clusters satisfy the join tree property:
    given two clusters X and Y in T, all clusters on
    the path between X and Y contain X ∩ Y.
  • For each variable V ∈ U, FV is included in at least
    one of the clusters.
  • Sepsets: each edge in T is labeled with the
    intersection of the adjacent clusters.

43
The Numeric Component
How do we assign belief functions?
Clusters and sepsets are attached with belief
potentials.
  • Local consistency: for each cluster X and neighboring
    sepset S, it holds that Σ_{X\S} φX = φS.
  • Global consistency: it also holds that
    P(U) = Π_clusters φX / Π_sepsets φS.
44
The Numeric Component
How do we assign belief functions?
Clusters and sepsets are attached with belief
potentials.
The key step to satisfying these constraints is to let
  φX = P(X) for each cluster X, and
  φS = P(S) for each sepset S.
If so, both local and global consistency hold, and P(V) can be read off
any cluster containing V by marginalization.
45
Bayesian Networks
  • Building the Clique Tree

46
The Steps
Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree
47
Moral Graph
Belief Network → Moral Graph
  1. Convert the directed graph to an undirected one.
  2. Connect ("marry") each pair of parent nodes of every node
     (a code sketch follows below).
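A sketch of these two steps, assuming the belief network is given as a map from each node to its list of parents (the same illustrative representation used in the d-separation sketch earlier):

```python
from itertools import combinations

def moral_graph(parents):
    """Return the moral graph as an undirected adjacency dict."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {v: set() for v in nodes}
    for child, ps in parents.items():
        for p in ps:                             # 1. drop the edge directions
            adj[child].add(p); adj[p].add(child)
        for a, b in combinations(ps, 2):         # 2. marry each pair of parents
            adj[a].add(b); adj[b].add(a)
    return adj

# e.g., the small Martin/Norman network: A <- C -> B
print(moral_graph({"A": ["C"], "B": ["C"], "C": []}))
```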

48
Triangulation
In practice, this step is done together with the next step (clique selection).
Moral Graph → Triangulated Graph
  1. Add chords so that every cycle of length greater than 3 has a chord.

There are many ways to triangulate a graph.
49
Select Clique Set
  • Copy GM to GM'.
  • While GM' is not empty:
  • Select a node V from GM', according to a
    criterion (next slide).
  • Node V and its neighbors form a cluster.
  • Connect all the nodes in the cluster. For each
    edge added to GM', add the same edge to GM.
  • Remove V from GM'.

50
Select Clique Set
  • Criterion:
  • The weight of a node V is the number of values of
    V.
  • The weight of a cluster is the product of the
    weights of its constituent nodes.
  • Choose the node that causes the fewest edges to
    be added.
  • Break ties by choosing the node that induces
    the cluster with the smallest weight
    (see the sketch below).
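A sketch of this elimination loop, assuming the moral graph is given as an adjacency dict and each variable's weight (number of values) is known. It only collects the induced clusters; recording the fill-in edges in GM and discarding non-maximal clusters are omitted for brevity:

```python
from itertools import combinations

def select_cliques(adj, weight):
    """Greedy node elimination on a working copy GM' of the moral graph."""
    g = {v: set(ns) for v, ns in adj.items()}          # working copy GM'
    clusters = []
    while g:
        def fill_in(v):                                # edges added if v is eliminated
            return sum(1 for a, b in combinations(g[v], 2) if b not in g[a])
        def cluster_weight(v):
            w = weight[v]
            for n in g[v]:
                w *= weight[n]
            return w
        v = min(g, key=lambda u: (fill_in(u), cluster_weight(u)))   # the criterion above
        clusters.append({v} | g[v])                    # V and its neighbours form a cluster
        for a, b in combinations(g[v], 2):             # connect all nodes of the cluster
            g[a].add(b); g[b].add(a)
        for n in g[v]:                                 # remove V from GM'
            g[n].discard(v)
        del g[v]
    return clusters
```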

52
Building an Optimal Join Tree
We need to find a minimal set of edges that
connects these cliques, i.e., to build a tree.
Given n cliques, n − 1 edges are required to build a tree.
There are many ways to connect them.
How do we achieve optimality?
53
Building an Optimal Join Tree
  • Begin with a set of n trees, each consisting of a
    single clique, and an empty set S.
  • For each distinct pair of cliques X and Y:
  • Create a candidate sepset SXY = X ∩ Y, with
    backpointers to X and Y.
  • Insert SXY into S.
  • Repeat until n − 1 sepsets have been inserted into
    the forest:
  • Select a sepset SXY from S, according to the
    criterion described on the next slide. Delete SXY
    from S.
  • Insert SXY between cliques X and Y only if X and
    Y are on different trees in the forest.

54
Building an Optimal Join Tree
  • Criterion:
  • The mass of SXY is the number of variables in X ∩ Y.
  • The cost of SXY is the weight of X plus the weight
    of Y.
  • The weight of a variable V is its number of values.
  • The weight of a set of variables X is the product of
    the weights of its constituent variables.
  • Choose the sepset with the largest mass.
  • Break ties by choosing the sepset with the
    smallest cost (see the sketch below).
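A sketch of this construction, assuming the cliques are given as sets of variables and each variable's weight is known. A union-find structure keeps the result a forest, and the tuple sort implements "largest mass, then smallest cost":

```python
from itertools import combinations

def build_join_tree(cliques, weight):
    """Connect n cliques with n - 1 sepsets, preferring large mass and small cost."""
    def set_weight(s):
        w = 1
        for v in s:
            w *= weight[v]
        return w

    candidates = []
    for i, j in combinations(range(len(cliques)), 2):
        sep = set(cliques[i]) & set(cliques[j])
        mass, cost = len(sep), set_weight(cliques[i]) + set_weight(cliques[j])
        candidates.append((-mass, cost, i, j, sep))
    candidates.sort()                                   # largest mass first, then smallest cost

    parent = list(range(len(cliques)))                  # union-find over the forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    for _, _, i, j, sep in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                                    # only join cliques in different trees
            parent[ri] = rj
            edges.append((set(cliques[i]), sep, set(cliques[j])))
        if len(edges) == len(cliques) - 1:
            break
    return edges
```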

55
Building an Optimal Join Tree
Graphical Transformation
56
Bayesian Networks
  • Inference by Propagation

57
Inferences
Inference without evidence
Inference with evidence
PPTC: Probability Propagation in Tree of Cliques.
58
Inference without Evidence
Demo
59
Procedure for PPTC without Evidence
Belief Network
  → Graphical Transformation (building the graphical component)
Join Tree Structure
  → Initialization (building the numerical component)
Inconsistent Join Tree
  → Propagation
Consistent Join Tree
  → Marginalization
60
Initialization
  • For each cluster and sepset X, set each φX(x) to
    1.
  • For each variable V:
  • Assign to V a cluster X that contains FV; call X
    the parent cluster of FV.
  • Multiply φX(x) by P(V | ΠV).

61
Initialization
62
Initialization
By the independence assertions (N clusters, Q variables):
  Π_{i=1..N} φXi = Π_{j=1..Q} P(Vj | ΠVj) = P(V1, ..., VQ)
63
Initialization
By the independence assertions (N clusters, Q variables):
  Π_{i=1..N} φXi / Π_sepsets φS = P(V1, ..., VQ)   (all sepset potentials are 1).
After initialization, global consistency is
satisfied, but local consistency generally is not.
64
Global Propagation
It is used to achieve local consistency.
Let's consider single message passing first.
Message passing from cluster X through sepset S to cluster Y:
  Projection onto the sepset:            φS_new(s) = Σ_{X\S} φX(x)
  Absorption into the receiving cluster: φY_new(y) = φY(y) · φS_new(s) / φS_old(s)
65
The Effect of Single Message Passing
A single message pass from X to Y makes the sepset S consistent with the
sending cluster X, and leaves the joint distribution encoded by the tree
(Π clusters / Π sepsets) unchanged.
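A sketch of a single message pass, built on the potential helpers from the marginalization/multiplication sketch above. The 0/0-treated-as-0 convention follows the usual PPTC treatment; everything else is an illustrative encoding:

```python
def pass_message(phi_x, phi_s, phi_y, sep_vars):
    """Pass a message from cluster X through sepset S to cluster Y.
    Returns the new sepset potential and the updated receiving-cluster potential."""
    # Projection: marginalize the sending cluster onto the sepset.
    phi_s_new = marginalize(phi_x, sep_vars)
    # Absorption: scale the receiving cluster by the ratio new/old sepset potential.
    table = {}
    for assign, val in phi_y["table"].items():
        row = dict(zip(phi_y["vars"], assign))
        key = tuple(row[v] for v in sep_vars)
        old = phi_s["table"][key]
        ratio = 0.0 if old == 0 else phi_s_new["table"][key] / old   # treat 0/0 as 0
        table[assign] = val * ratio
    return phi_s_new, make_potential(phi_y["vars"], table)
```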
66
Global Propagation
  1. Choose an arbitrary cluster X.
  2. Unmark all clusters. Call Ingoing-Propagation(X).
  3. Unmark all clusters. Call Outgoing-Propagation(X).

67
Global Propagation
  1. Choose an arbitrary cluster X.
  2. Unmark all clusters. Call Ingoing-Propagation(X).
  3. Unmark all clusters. Call Outgoing-Propagation(X).
  • Ingoing-Propagation(X):
  • Mark X.
  • Call Ingoing-Propagation recursively on X's
    unmarked neighboring clusters, if any.
  • Pass a message from X to the cluster which
    invoked Ingoing-Propagation(X).
  • Outgoing-Propagation(X):
  • Mark X.
  • Pass a message from X to each of its unmarked
    neighboring clusters, if any.
  • Call Outgoing-Propagation recursively on X's
    unmarked neighboring clusters, if any.

(In the accompanying figure, the numbers 1-10 give the order in which
the messages are passed.)
After global propagation, the clique tree is both
globally and locally consistent.
68
Marginalization
From a consistent join tree, the marginal of any variable V is obtained by
summing out a cluster potential that contains V:  P(V) = Σ_{X \ {V}} φX.
69
Review: Procedure for PPTC without Evidence
Belief Network
  → Graphical Transformation (building the graphical component)
Join Tree Structure
  → Initialization (building the numerical component)
Inconsistent Join Tree
  → Propagation
Consistent Join Tree
  → Marginalization
70
Inference with Evidence
Demo
71
Observations
  • Observations are the simplest forms of evidence.
  • An observation is a statement of the form V = v.
  • A collection of observations may be denoted by
    E = e, an instantiation of a set of variables E.
  • Observations are referred to as hard evidence.
72
Likelihoods
Given E = e, the likelihood of V, denoted λV,
is defined as
  λV(v) = 1 if V ∉ E,
  λV(v) = 1 if V ∈ E and v is consistent with e,
  λV(v) = 0 otherwise.
73
Likelihoods
Example (here C is observed to be on and D is observed to be off):

  Variable V   λV(on)   λV(off)
  A            1        1
  B            1        1
  C            1        0
  D            0        1
  E            1        1
  F            1        1
  G            1        1
  H            1        1
74
Procedure for PPTC with Evidence
  1. Initialization
  2. Observation entry
  3. Global propagation
  4. Marginalization
  5. Normalization

75
Initialization with Observations
  1. Set each likelihood element λV(v) to 1.

76
Observation Entry
  1. Encode the observation V = v as a likelihood λV_new
     (1 on the observed value v, 0 elsewhere).
  2. Identify a cluster X that contains V.
  3. Update φX := φX · λV_new and set λV := λV_new
     (a code sketch follows below).
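A sketch of observation entry using the same potential encoding as above; the likelihood is simply a 0/1 mask on the observed variable, and the function name is an illustrative choice:

```python
def enter_observation(phi_x, var, value):
    """Multiply the cluster potential by the likelihood of the observation var = value:
    entries inconsistent with the observed value are zeroed out."""
    i = phi_x["vars"].index(var)
    table = {assign: (val if assign[i] == value else 0.0)
             for assign, val in phi_x["table"].items()}
    return make_potential(phi_x["vars"], table)

# e.g., observe C = True in the phi_abc potential computed earlier:
# phi_abc_obs = enter_observation(phi_abc, "C", True)
```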

77
Marginalization
After global propagation, each cluster potential encodes P(X, e); marginalizing
a cluster that contains V gives P(V, e) = Σ_{X \ {V}} φX.
78
Normalization
After global propagation, marginalization gives P(V, e).
Normalization:
  P(V | e) = P(V, e) / Σ_v P(V = v, e) = P(V, e) / P(e).
79
Handling Dynamic Observations
Suppose that the join tree is now consistent for
observations e1.
How do we maintain consistency if the observations
change to e2?
80
Observation States
Three observation states for a variable, say, V:
  1. No change
  2. Update: V goes from unobserved to observed.
  3. Retraction: V goes from observed to unobserved,
     or V = v1 changes to V = v2 with v1 ≠ v2.
81
Handling Dynamic Observations
Global Update: when?
Global Retraction: when?