Bayesian Networks
Transcript and Presenter's Notes

Title: Bayesian Networks


1
Bayesian Networks

2
Contents
  • Introduction
  • Probability Theory (skipped)
  • Inference
  • Clique Tree Propagation
  • Building the Clique Tree
  • Inference by Propagation

3
Bayesian Networks
  • Introduction

4
What Are Bayesian Networks?
  • Bayesian Networks are directed acyclic graphs
    (DAGs) with an associated set of probability
    tables.
  • The nodes are random variables.
  • Certain independence relations can be induced by
    the topology of the graph.

5
Why Use a Bayesian Network?
  • Deal with uncertainty in inference via
    probability (Bayes' rule).
  • Handle incomplete data sets, e.g., in
    classification and regression.
  • Model domain knowledge, e.g., causal
    relationships.

6
Example
Use a DAG to model causality. The nodes of the example network are:
Train Strike, Norman Oversleep, Martin Oversleep, Martin Late, Norman Late,
Project Delay, Office Dirty, Boss Failure-in-Love, Boss Angry.
7
Example
Attach prior probabilities to all root nodes:

  Norman oversleep:       P(T) = 0.2,   P(F) = 0.8
  Train strike:           P(T) = 0.1,   P(F) = 0.9
  Martin oversleep:       P(T) = 0.01,  P(F) = 0.99
  Boss failure-in-love:   P(T) = 0.01,  P(F) = 0.99
8
Example
Attach conditional probability tables (CPTs) to non-root nodes.
Each column sums to 1.

P(Martin Late | Train strike, Martin oversleep):
  Train strike:        T     T     F     F
  Martin oversleep:    T     F     T     F
  Martin Late = T:     0.95  0.80  0.70  0.05
  Martin Late = F:     0.05  0.20  0.30  0.95

P(Norman untidy | Norman oversleep):
  Norman oversleep:    T    F
  Norman untidy = T:   0.6  0.2
  Norman untidy = F:   0.4  0.8
9
Example
Attach conditional probability tables (CPTs) to non-root nodes.
Each column sums to 1.

P(Boss Angry | Boss Failure-in-love, Project Delay, Office Dirty):
  Boss Failure-in-love:   T     T     T     T     F     F     F     F
  Project Delay:          T     T     F     F     T     T     F     F
  Office Dirty:           T     F     T     F     T     F     T     F
  Boss Angry = very:      0.98  0.85  0.6   0.5   0.3   0.2   0     0.01
  Boss Angry = mid:       0.02  0.15  0.3   0.25  0.5   0.5   0.2   0.02
  Boss Angry = little:    0     0     0.1   0.25  0.2   0.3   0.7   0.07
  Boss Angry = no:        0     0     0     0     0     0     0.1   0.9

What is the difference between probability and fuzzy measurements?
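As a side note, the tables above can be written down directly in code. The sketch below is only an illustrative encoding (the dictionary representation and variable names are choices made here, not taken from the slides); it stores two of the CPTs and checks that each column sums to 1:

```python
# Illustrative encoding of two CPTs from the example network.
# Keys are parent assignments; values are distributions over the child.

# P(Martin Late | Train strike, Martin oversleep)
p_martin_late = {
    (True, True):   {True: 0.95, False: 0.05},
    (True, False):  {True: 0.80, False: 0.20},
    (False, True):  {True: 0.70, False: 0.30},
    (False, False): {True: 0.05, False: 0.95},
}

# P(Norman untidy | Norman oversleep)
p_norman_untidy = {
    (True,):  {True: 0.6, False: 0.4},
    (False,): {True: 0.2, False: 0.8},
}

# Each column of a CPT (i.e., each parent assignment) must sum to 1.
for cpt in (p_martin_late, p_norman_untidy):
    for parent_assignment, dist in cpt.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9, parent_assignment
```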
10
Example
Medical Knowledge
11
Definition of Bayesian Networks
  • A Bayesian network is a directed acyclic graph
    with the following properties:
  • Each node represents a random variable.
  • Each node representing a variable A with parent
    nodes representing variables B1, B2, ..., Bn is
    assigned a conditional probability table (CPT)
    P(A | B1, B2, ..., Bn).

12
Problems
  • How do we perform inference?
  • How do we learn the probabilities from data?
  • How do we learn the structure from data?
  • What applications might we have?

Bad news: all of these problems are NP-hard.
13
Bayesian Networks
  • Inference

14
Inference
15
Example
P(Train Strike):
  Train Strike:   T    F
  Probability:    0.1  0.9

P(Norman Late | Train Strike):
  Train Strike:       T    F
  Norman Late = T:    0.8  0.1
  Norman Late = F:    0.2  0.9

P(Martin Late | Train Strike):
  Train Strike:       T    F
  Martin Late = T:    0.6  0.5
  Martin Late = F:    0.4  0.5

Questions:
  P(Martin Late, Norman Late, Train Strike)?    Joint distribution
  P(Martin Late)?                               Marginal distribution
  P(Martin Late | Norman Late)?                 Conditional distribution
16
Example
With A = Martin Late, B = Norman Late, C = Train Strike, the joint distribution
P(A, B, C) = P(C) P(A | C) P(B | C) is:

  A  B  C   Probability
  T  T  T   0.048
  F  T  T   0.032
  T  F  T   0.012
  F  F  T   0.008
  T  T  F   0.045
  F  T  F   0.045
  T  F  F   0.405
  F  F  F   0.405

Question: P(Martin Late, Norman Late, Train Strike)?   (joint distribution)
e.g., P(A=T, B=T, C=T) = P(C=T) P(A=T | C=T) P(B=T | C=T) = 0.1 × 0.6 × 0.8 = 0.048
17
Example
Summing the joint distribution over C gives the marginal P(A, B)
(A = Martin Late, B = Norman Late; CPTs as on the previous slides):

  A  B   Probability
  T  T   0.093
  F  T   0.077
  T  F   0.417
  F  F   0.413

Question: P(Martin Late, Norman Late)?   (marginal distribution)
e.g., P(A=T, B=T) = P(A=T, B=T, C=T) + P(A=T, B=T, C=F) = 0.048 + 0.045 = 0.093
18
Example
Summing further over B gives the marginal P(A) (A = Martin Late):

  A   Probability
  T   0.51
  F   0.49

Question: P(Martin Late)?   (marginal distribution)
e.g., P(A=T) = P(A=T, B=T) + P(A=T, B=F) = 0.093 + 0.417 = 0.51
19
Example
Summing the joint over A gives the marginal P(B) (B = Norman Late):

  B   Probability
  T   0.17
  F   0.83

Question: P(Martin Late | Norman Late)?   (conditional distribution)
e.g., P(A=T | B=T) = P(A=T, B=T) / P(B=T) = 0.093 / 0.17 ≈ 0.547
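The joint, marginal, and conditional queries worked through on the last few slides can be checked by brute-force enumeration over this three-variable network. This is only a sketch for the small example (the A/B/C shorthand follows the slides); it is not the propagation algorithm introduced later:

```python
from itertools import product

# C = Train Strike, A = Martin Late, B = Norman Late
p_c = {True: 0.1, False: 0.9}          # P(C)
p_a_given_c = {True: 0.6, False: 0.5}  # P(A = T | C)
p_b_given_c = {True: 0.8, False: 0.1}  # P(B = T | C)

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(C) * P(A | C) * P(B | C)."""
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

# Marginal P(A=T, B=T): sum the joint over C  -> 0.048 + 0.045 = 0.093
p_ab = sum(joint(True, True, c) for c in (True, False))

# Marginal P(A=T): sum the joint over B and C  -> 0.51
p_a = sum(joint(True, b, c) for b, c in product((True, False), repeat=2))

# Conditional P(A=T | B=T) = P(A=T, B=T) / P(B=T)  -> 0.093 / 0.17, about 0.547
p_b = sum(joint(a, True, c) for a, c in product((True, False), repeat=2))
print(p_ab, p_a, p_ab / p_b)
```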
20
Inference Methods
  • Exact algorithms
    • Probability propagation
    • Variable elimination
    • Cutset conditioning
    • Dynamic programming
  • Approximation algorithms
    • Variational methods
    • Sampling (Monte Carlo) methods
    • Loopy belief propagation
    • Bounded cutset conditioning
    • Parametric approximation methods

21
Independence Assertions
  • Bayesian networks have built-in independence
    assertions.
  • An independence assertion is a statement of the
    form
  • X and Y are independent given Z.
  • We say that X and Y are d-separated by Z.

That is, P(X | Y, Z) = P(X | Z), or equivalently
P(X, Y | Z) = P(X | Z) P(Y | Z).
The given (conditioning) variables are called the evidence.
22
d-Separation
23
Type of Connections
Three types of connections through an intermediate node Z:
  • Serial:       X → Z → Y
  • Diverging:    X ← Z → Y
  • Converging:   X → Z ← Y
24
d-Separation
Whether a path through Z is blocked depends on the connection type:
  • Serial:       blocked when Z is in the evidence.
  • Diverging:    blocked when Z is in the evidence.
  • Converging:   blocked unless Z or one of its descendants is in the evidence.
25
Joint Distribution
JPT: joint probability table.  CPT: conditional probability table.
With the CPTs we can compute all probabilities.
By the chain rule:
  P(X1, ..., Xn) = P(X1) P(X2 | X1) ... P(Xn | X1, ..., Xn-1)
By the independence assertions:
  P(X1, ..., Xn) = Π_i P(Xi | ΠXi),   where ΠXi denotes the parents of Xi.

Consider binary random variables:
  1. To store the JPT of all r.v.s: 2^n − 1 table entries.
  2. To store the CPTs of all r.v.s: Σ_i 2^|ΠXi| table entries (one free entry
     per parent assignment), far fewer when the graph is sparse (see the count below).
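To make the comparison concrete, the snippet below counts free parameters for the nodes of the office example whose parent sets can be read off the CPTs shown earlier. Treating every variable as binary (as the slide does) and restricting to this subset of nodes are illustrative assumptions:

```python
# Number of parents for each node, read off the CPTs shown on earlier slides.
num_parents = {
    "TrainStrike": 0, "NormanOversleep": 0, "MartinOversleep": 0,
    "BossFailureInLove": 0, "MartinLate": 2, "NormanUntidy": 1, "BossAngry": 3,
}

n = len(num_parents)
jpt_entries = 2 ** n - 1                                  # full joint table
cpt_entries = sum(2 ** k for k in num_parents.values())   # one free entry per parent assignment
print(jpt_entries, cpt_entries)   # 127 vs 18
```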

26
Joint Distribution

Consider binary random variables
  1. To store the JPT of all r.v.s: 2^n − 1 table entries.
  2. To store the CPTs of all r.v.s: Σ_i 2^|ΠXi| table entries.

27
Joint Distribution
To store JPT of all random variables
To store CPT of all random variables
28
More on d-Separation
A path from X to Y is d-connecting w.r.t. evidence
nodes E if every interior node N on the path has
the property that either
  1. it is linear (serial) or diverging and not a member of E, or
  2. it is converging, and either N or one of its
    descendants is in E.

29
More on d-Separation
Identify the d-connecting and non-d-connecting
paths from X to Y.
A path from X to Y is d-connecting w.r.t. evidence
nodes E if every interior node N on the path has
the property that either
  1. it is linear (serial) or diverging and not a member of E, or
  2. it is converging, and either N or one of its
    descendants is in E.

30
More on d-Separation
Two nodes are d-separated if there is no
d-connecting path between them.
Exercise:
Remove the minimum number of edges such that X and
Y are d-separated.
31
More on d-Separation
Two sets of nodes, say X = {X1, ..., Xm} and
Y = {Y1, ..., Yn}, are d-separated w.r.t. evidence
nodes E if every pair Xi, Yj is d-separated w.r.t. E.
In this case, we have P(X | Y, E) = P(X | E).
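One standard way to test a d-separation assertion (not spelled out on the slides) is the moralized ancestral graph criterion: keep X, Y, E and their ancestors, moralize, delete E, and check whether X and Y are still connected. A sketch, assuming the network is given as a map from each node to its list of parents:

```python
def _ancestors(parents, nodes):
    seen, stack = set(), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, es):
    """True iff X and Y are d-separated given evidence E (ancestral moral graph test)."""
    keep = set(xs) | set(ys) | set(es)
    keep |= _ancestors(parents, keep)
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, ()) if p in keep]
        for p in ps:                          # drop the edge directions
            adj[v].add(p); adj[p].add(v)
        for i in range(len(ps)):              # "marry" parents of a common child
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # remove the evidence nodes, then check reachability from X to Y
    reached, frontier = set(xs), [v for v in xs if v not in set(es)]
    while frontier:
        v = frontier.pop()
        for w in adj[v] - reached - set(es):
            reached.add(w); frontier.append(w)
    return not (reached & set(ys))

# e.g., in the small network (A <- C -> B), A and B are d-separated given C
print(d_separated({"A": ["C"], "B": ["C"], "C": []}, {"A"}, {"B"}, {"C"}))  # True
```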
32
Bayesian Networks
  • Clique Tree Propagation

33
References
  • Developed by Lauritzen and Spiegelhalter and
    refined by Jensen et al.

Lauritzen, S. L., and Spiegelhalter, D. J., "Local computations with
probabilities on graphical structures and their application to expert
systems," J. Roy. Stat. Soc. B, 50, 157-224, 1988.
Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., "Bayesian updating
in causal probabilistic networks by local computations," Comp. Stat.
Quart., 4, 269-282, 1990.
Shenoy, P., and Shafer, G., "Axioms for probability and belief-function
propagation," in Uncertainty in Artificial Intelligence, Vol. 4
(R. D. Shachter, T. Levitt, J. F. Lemmer and L. N. Kanal, Eds.),
Elsevier, North-Holland, Amsterdam, 169-198, 1990.
34
Clique Tree Propagation (CTP)
  • Given a Bayesian network, build a secondary
    structure, called a clique tree (an undirected tree).
  • Inference is performed by propagating belief
    potentials among the tree nodes.
  • It is an exact algorithm.

35
Notations
  Item                                 Notation              Examples
  Random variables (uninstantiated)    uppercase             A, B, C
  Random variables (instantiated)      lowercase             a, b, c
  Random vectors (uninstantiated)      boldface uppercase    X, Y, Z
  Random vectors (instantiated)        boldface lowercase    x, y, z
36
Definition: Family of a Node
The family of a node V, denoted FV, is defined by
FV = {V} ∪ ΠV, i.e., the node together with its parents.
Example: in the small network above, F(Martin Late) = {Martin Late, Train Strike}
and F(Train Strike) = {Train Strike}.
37
Potential and Distributions
We will model the probability tables as potential
functions. All of these tables map a set of random
variables to a real value.

Prior probability P(a), a function of a:
  a:      on   off
  P(a):   0.5  0.5

Conditional probability P(b | a), a function of a and b:
  a:               on   off
  P(b = on | a):   0.7  0.2
  P(b = off | a):  0.3  0.8

Conditional probability P(f | d, e), a function of d, e and f:
  d:                  on    on   off   off
  e:                  on    off  on    off
  P(f = on | d, e):   0.95  0.8  0.7   0.05
  P(f = off | d, e):  0.05  0.2  0.3   0.95
38
Potential
Used to implement matrices or tables.
Two operations
1. Marginalization 2. Multiplication
39
Marginalization
Example: marginalizing φABC onto {A, B} (sum over C), and φAB onto {A} (sum over B).

  A  B  C   φABC           A  B   φAB           A   φA
  T  T  T   0.048          T  T   0.093         T   0.51
  F  T  T   0.032          F  T   0.077         F   0.49
  T  F  T   0.012          T  F   0.417
  F  F  T   0.008          F  F   0.413
  T  T  F   0.045
  F  T  F   0.045
  T  F  F   0.405
  F  F  F   0.405
40
Multiplication
Example: multiplying φAB by φBC gives φABC. The entries of x and y that are
consistent with z are multiplied; the result does not necessarily sum to one.

  A  B   φAB              B  C   φBC
  T  T   0.093            T  T   0.08
  F  T   0.077            F  T   0.02
  T  F   0.417            T  F   0.09
  F  F   0.413            F  F   0.91

  A  B  C   φABC
  T  T  T   0.093 × 0.08 = 0.00744
  F  T  T   0.077 × 0.08 = 0.00616
  T  F  T   0.417 × 0.02 = 0.00834
  F  F  T   0.413 × 0.02 = 0.00826
  T  T  F   0.093 × 0.09 = 0.00837
  F  T  F   0.077 × 0.09 = 0.00693
  T  F  F   0.417 × 0.91 = 0.37947
  F  F  F   0.413 × 0.91 = 0.37583
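A minimal sketch of these two potential operations in code. The dictionary-based representation and the restriction to binary variables are illustrative choices, not the notation used in the slides:

```python
from itertools import product

def make_potential(variables, table):
    """A potential over `variables`, stored as {assignment tuple: value}."""
    return {"vars": list(variables), "table": dict(table)}

def marginalize(phi, keep):
    """Sum the potential down onto the variables in `keep`."""
    idx = [phi["vars"].index(v) for v in keep]
    out = {}
    for assign, val in phi["table"].items():
        key = tuple(assign[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return make_potential(keep, out)

def multiply(phi1, phi2):
    """Pointwise product: entries of phi1 and phi2 consistent with the joint assignment."""
    union = phi1["vars"] + [v for v in phi2["vars"] if v not in phi1["vars"]]
    out = {}
    for assign in product((True, False), repeat=len(union)):
        row = dict(zip(union, assign))
        v1 = phi1["table"][tuple(row[v] for v in phi1["vars"])]
        v2 = phi2["table"][tuple(row[v] for v in phi2["vars"])]
        out[assign] = v1 * v2
    return make_potential(union, out)

# The example above: multiply phi_AB by phi_BC, then marginalize back onto {A}.
phi_ab = make_potential(["A", "B"], {(True, True): 0.093, (False, True): 0.077,
                                     (True, False): 0.417, (False, False): 0.413})
phi_bc = make_potential(["B", "C"], {(True, True): 0.08, (False, True): 0.02,
                                     (True, False): 0.09, (False, False): 0.91})
phi_abc = multiply(phi_ab, phi_bc)
print(marginalize(phi_abc, ["A"])["table"])
```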
41
The Secondary Structure
Given a Bayesian network over a set of
variables U = {V1, ..., Vn}, its secondary
structure contains a graphical and a numerical
component.
Graphical component:
  an undirected clique tree satisfying the join
  tree property.
Numerical component:
  belief potentials on nodes and edges.
42
The Clique Tree T
How do we build a clique tree?
The clique tree T for a belief network over a
set of variables U = {V1, ..., Vn} satisfies the
following properties.
  • Each node in T is a cluster or clique (a nonempty
    set) of variables.
  • The clusters satisfy the join tree property:
    given two clusters X and Y in T, all clusters on
    the path between X and Y contain X ∩ Y.
  • For each variable V ∈ U, FV is included in at least
    one of the clusters.
  • Sepsets: each edge in T is labeled with the
    intersection of the adjacent clusters.

43
The Numeric Component
How do we assign belief functions?
Clusters and sepsets are attached with belief
potentials.
  • Local consistency: for each cluster X and neighboring
    sepset S, it holds that Σ_{X\S} φX = φS.
  • Global consistency: it also holds that
    P(U) = Π_clusters φX / Π_sepsets φS.
44
The Numeric Component
How do we assign belief functions?
Clusters and sepsets are attached with belief
potentials.
The key step to satisfying these constraints is to let
  φX = P(X) for each cluster X, and
  φS = P(S) for each sepset S.
If so, both local and global consistency hold, and P(V) can be read off
any cluster containing V by marginalization.
45
Bayesian Networks
  • Building the Clique Tree

46
The Steps
Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree
47
Moral Graph
Belief Network → Moral Graph
  1. Convert the directed graph to an undirected one.
  2. Connect ("marry") each pair of parent nodes of every node
     (a code sketch follows below).
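A sketch of these two steps, assuming the belief network is given as a map from each node to its list of parents (the same illustrative representation used in the d-separation sketch earlier):

```python
from itertools import combinations

def moral_graph(parents):
    """Return the moral graph as an undirected adjacency dict."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {v: set() for v in nodes}
    for child, ps in parents.items():
        for p in ps:                             # 1. drop the edge directions
            adj[child].add(p); adj[p].add(child)
        for a, b in combinations(ps, 2):         # 2. marry each pair of parents
            adj[a].add(b); adj[b].add(a)
    return adj

# e.g., the small Martin/Norman network: A <- C -> B
print(moral_graph({"A": ["C"], "B": ["C"], "C": []}))
```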

48
Triangulation
In practice, this step is done together with the next step (clique selection).
Moral Graph → Triangulated Graph
  1. Add chords so that every cycle of length greater than 3 has a chord.

There are many ways to triangulate a graph.
49
Select Clique Set
  • Copy GM to GM'.
  • While GM' is not empty:
  • Select a node V from GM', according to a
    criterion (next slide).
  • Node V and its neighbors form a cluster.
  • Connect all the nodes in the cluster. For each
    edge added to GM', add the same edge to GM.
  • Remove V from GM'.

50
Select Clique Set
  • Criterion:
  • The weight of a node V is the number of values of
    V.
  • The weight of a cluster is the product of the
    weights of its constituent nodes.
  • Choose the node that causes the fewest edges to
    be added.
  • Break ties by choosing the node that induces
    the cluster with the smallest weight
    (see the sketch below).
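A sketch of this elimination loop, assuming the moral graph is given as an adjacency dict and each variable's weight (number of values) is known. It only collects the induced clusters; recording the fill-in edges in GM and discarding non-maximal clusters are omitted for brevity:

```python
from itertools import combinations

def select_cliques(adj, weight):
    """Greedy node elimination on a working copy GM' of the moral graph."""
    g = {v: set(ns) for v, ns in adj.items()}          # working copy GM'
    clusters = []
    while g:
        def fill_in(v):                                # edges added if v is eliminated
            return sum(1 for a, b in combinations(g[v], 2) if b not in g[a])
        def cluster_weight(v):
            w = weight[v]
            for n in g[v]:
                w *= weight[n]
            return w
        v = min(g, key=lambda u: (fill_in(u), cluster_weight(u)))   # the criterion above
        clusters.append({v} | g[v])                    # V and its neighbours form a cluster
        for a, b in combinations(g[v], 2):             # connect all nodes of the cluster
            g[a].add(b); g[b].add(a)
        for n in g[v]:                                 # remove V from GM'
            g[n].discard(v)
        del g[v]
    return clusters
```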

52
Building an Optimal Join Tree
We need to find a minimal set of edges that
connects these cliques, i.e., to build a tree.
Given n cliques, n − 1 edges are required to build a tree.
There are many ways to connect them.
How do we achieve optimality?
53
Building an Optimal Join Tree
  • Begin with a set of n trees, each consisting of a
    single clique, and an empty set S.
  • For each distinct pair of cliques X and Y:
  • Create a candidate sepset SXY = X ∩ Y, with
    backpointers to X and Y.
  • Insert SXY into S.
  • Repeat until n − 1 sepsets have been inserted into
    the forest:
  • Select a sepset SXY from S, according to the
    criterion described on the next slide. Delete SXY
    from S.
  • Insert SXY between cliques X and Y only if X and
    Y are on different trees in the forest.

54
Building an Optimal Join Tree
  • Criterion:
  • The mass of SXY is the number of variables in X ∩ Y.
  • The cost of SXY is the weight of X plus the weight
    of Y.
  • The weight of a variable V is its number of values.
  • The weight of a set of variables X is the product of
    the weights of its constituent variables.
  • Choose the sepset with the largest mass.
  • Break ties by choosing the sepset with the
    smallest cost (see the sketch below).
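A sketch of this construction, assuming the cliques are given as sets of variables and each variable's weight is known. A union-find structure keeps the result a forest, and the tuple sort implements "largest mass, then smallest cost":

```python
from itertools import combinations

def build_join_tree(cliques, weight):
    """Connect n cliques with n - 1 sepsets, preferring large mass and small cost."""
    def set_weight(s):
        w = 1
        for v in s:
            w *= weight[v]
        return w

    candidates = []
    for i, j in combinations(range(len(cliques)), 2):
        sep = set(cliques[i]) & set(cliques[j])
        mass, cost = len(sep), set_weight(cliques[i]) + set_weight(cliques[j])
        candidates.append((-mass, cost, i, j, sep))
    candidates.sort()                                   # largest mass first, then smallest cost

    parent = list(range(len(cliques)))                  # union-find over the forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    for _, _, i, j, sep in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                                    # only join cliques in different trees
            parent[ri] = rj
            edges.append((set(cliques[i]), sep, set(cliques[j])))
        if len(edges) == len(cliques) - 1:
            break
    return edges
```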

55
Building an Optimal Join Tree
Graphical Transformation
56
Bayesian Networks
  • Inference by Propagation

57
Inferences
Inference without evidence
Inference with evidence
PPTC: Probability Propagation in Tree of Cliques.
58
Inference without Evidence
Demo
59
Procedure for PPTC without Evidence
Belief Network
  → Graphical Transformation (building the graphical component)
Join Tree Structure
  → Initialization (building the numerical component)
Inconsistent Join Tree
  → Propagation
Consistent Join Tree
  → Marginalization
60
Initialization
  • For each cluster and sepset X, set each φX(x) to
    1.
  • For each variable V:
  • Assign to V a cluster X that contains FV; call X
    the parent cluster of FV.
  • Multiply φX(x) by P(V | ΠV).

61
Initialization
62
Initialization
By the independence assertions (N clusters, Q variables):
  Π_{i=1..N} φXi = Π_{j=1..Q} P(Vj | ΠVj) = P(V1, ..., VQ)
63
Initialization
By the independence assertions (N clusters, Q variables):
  Π_{i=1..N} φXi / Π_sepsets φS = P(V1, ..., VQ)   (all sepset potentials are 1).
After initialization, global consistency is
satisfied, but local consistency generally is not.
64
Global Propagation
It is used to achieve local consistency.
Let's consider single message passing first.
Message passing from cluster X through sepset S to cluster Y:
  Projection onto the sepset:            φS_new(s) = Σ_{X\S} φX(x)
  Absorption into the receiving cluster: φY_new(y) = φY(y) · φS_new(s) / φS_old(s)
65
The Effect of Single Message Passing
A single message pass from X to Y makes the sepset S consistent with the
sending cluster X, and leaves the joint distribution encoded by the tree
(Π clusters / Π sepsets) unchanged.
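A sketch of a single message pass, built on the potential helpers from the marginalization/multiplication sketch above. The 0/0-treated-as-0 convention follows the usual PPTC treatment; everything else is an illustrative encoding:

```python
def pass_message(phi_x, phi_s, phi_y, sep_vars):
    """Pass a message from cluster X through sepset S to cluster Y.
    Returns the new sepset potential and the updated receiving-cluster potential."""
    # Projection: marginalize the sending cluster onto the sepset.
    phi_s_new = marginalize(phi_x, sep_vars)
    # Absorption: scale the receiving cluster by the ratio new/old sepset potential.
    table = {}
    for assign, val in phi_y["table"].items():
        row = dict(zip(phi_y["vars"], assign))
        key = tuple(row[v] for v in sep_vars)
        old = phi_s["table"][key]
        ratio = 0.0 if old == 0 else phi_s_new["table"][key] / old   # treat 0/0 as 0
        table[assign] = val * ratio
    return phi_s_new, make_potential(phi_y["vars"], table)
```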
66
Global Propagation
  1. Choose an arbitrary cluster X.
  2. Unmark all clusters. Call Ingoing-Propagation(X).
  3. Unmark all clusters. Call Outgoing-Propagation(X).

67
Global Propagation
  1. Choose an arbitrary cluster X.
  2. Unmark all clusters. Call Ingoing-Propagation(X).
  3. Unmark all clusters. Call Outgoing-Propagation(X).
  • Ingoing-Propagation(X):
  • Mark X.
  • Call Ingoing-Propagation recursively on X's
    unmarked neighboring clusters, if any.
  • Pass a message from X to the cluster which
    invoked Ingoing-Propagation(X).
  • Outgoing-Propagation(X):
  • Mark X.
  • Pass a message from X to each of its unmarked
    neighboring clusters, if any.
  • Call Outgoing-Propagation recursively on X's
    unmarked neighboring clusters, if any.

(In the accompanying figure, the numbers 1-10 give the order in which
the messages are passed.)
After global propagation, the clique tree is both
globally and locally consistent.
68
Marginalization
From a consistent join tree, the marginal of any variable V is obtained by
summing out a cluster potential that contains V:  P(V) = Σ_{X \ {V}} φX.
69
Review: Procedure for PPTC without Evidence
Belief Network
  → Graphical Transformation (building the graphical component)
Join Tree Structure
  → Initialization (building the numerical component)
Inconsistent Join Tree
  → Propagation
Consistent Join Tree
  → Marginalization
70
Inference with Evidence
Demo
71
Observations
  • Observations are the simplest forms of evidence.
  • An observation is a statement of the form V = v.
  • A collection of observations may be denoted by
    E = e, an instantiation of a set of variables E.
  • Observations are referred to as hard evidence.
72
Likelihoods
Given E = e, the likelihood of V, denoted λV,
is defined as
  λV(v) = 1 if V ∉ E,
  λV(v) = 1 if V ∈ E and v is consistent with e,
  λV(v) = 0 otherwise.
73
Likelihoods
Example (here C is observed to be on and D is observed to be off):

  Variable V   λV(on)   λV(off)
  A            1        1
  B            1        1
  C            1        0
  D            0        1
  E            1        1
  F            1        1
  G            1        1
  H            1        1
74
Procedure for PPTC with Evidence
  1. Initialization
  2. Observation entry
  3. Global propagation
  4. Marginalization
  5. Normalization

75
Initialization with Observations
  1. Set each likelihood element λV(v) to 1.

76
Observation Entry
  1. Encode the observation V = v as a likelihood λV_new
     (1 on the observed value v, 0 elsewhere).
  2. Identify a cluster X that contains V.
  3. Update φX := φX · λV_new and set λV := λV_new
     (a code sketch follows below).
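A sketch of observation entry using the same potential encoding as above; the likelihood is simply a 0/1 mask on the observed variable, and the function name is an illustrative choice:

```python
def enter_observation(phi_x, var, value):
    """Multiply the cluster potential by the likelihood of the observation var = value:
    entries inconsistent with the observed value are zeroed out."""
    i = phi_x["vars"].index(var)
    table = {assign: (val if assign[i] == value else 0.0)
             for assign, val in phi_x["table"].items()}
    return make_potential(phi_x["vars"], table)

# e.g., observe C = True in the phi_abc potential computed earlier:
# phi_abc_obs = enter_observation(phi_abc, "C", True)
```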

77
Marginalization
After global propagation, each cluster potential encodes P(X, e); marginalizing
a cluster that contains V gives P(V, e) = Σ_{X \ {V}} φX.
78
Normalization
After global propagation, marginalization gives P(V, e).
Normalization:
  P(V | e) = P(V, e) / Σ_v P(V = v, e) = P(V, e) / P(e).
79
Handling Dynamic Observations
Suppose that the join tree is now consistent for
observations e1.
How do we maintain consistency if the observations
change to e2?
80
Observation States
Three observation states for a variable, say, V:
  1. No change
  2. Update: V goes from unobserved to observed.
  3. Retraction: V goes from observed to unobserved,
     or V = v1 changes to V = v2 with v1 ≠ v2.
81
Handling Dynamic Observations
Global Update: when?
Global Retraction: when?