1
Bayesian Statistics and Belief Networks
2
Overview
  • Book Ch 8.3
  • Refresher on Bayesian statistics
  • Bayesian classifiers
  • Belief Networks / Bayesian Networks

3
Why Should We Care?
  • Theoretical framework for machine learning,
    classification, knowledge representation,
    analysis
  • Bayesian methods are capable of handling noisy,
    incomplete data sets
  • Bayesian methods are commonly in use today

4
Bayesian Approach To Probability and Statistics
  • Classical Probability: A physical property of
    the world (e.g., a 50% chance of heads on a flip
    of a fair coin). True probability.
  • Bayesian Probability: A person's degree of
    belief in event X. Personal probability.
  • Unlike classical probability, Bayesian
    probabilities benefit from but do not require
    repeated trials - they focus only on the next
    event, e.g., the probability the Seawolves win
    their next game.

5
Bayes Rule
Product Rule: P(A, B) = P(A|B) P(B) = P(B|A) P(A)
Equating the two right-hand sides and dividing by
P(B) gives Bayes Rule:
P(A|B) = P(B|A) P(A) / P(B)
i.e., P(Class|evidence) = P(evidence|Class) P(Class) / P(evidence)
All classification methods can be seen as
estimates of Bayes Rule, with different
techniques to estimate P(evidence|Class).
6
Simple Bayes Rule Example
The probability your computer has a virus, V, is
1/1000. If it has a virus, the probability of a
crash that day, C, is 4/5. The probability your
computer crashes on a given day, C, is 1/10.
P(C|V) = 0.8
P(V) = 1/1000
P(C) = 1/10
P(V|C) = P(C|V) P(V) / P(C) = (0.8)(0.001) / (0.1) = 0.008
Even though a crash is a strong indicator of a
virus, we expect only 8/1000 crashes to be caused
by viruses.
Why not compute P(V|C) from direct evidence?
Causal vs. diagnostic knowledge (consider what
happens if P(C) suddenly drops).
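As a sanity check, a minimal Python sketch of the calculation above (the variable names are ours, not the slides'):

```python
# Virus example from this slide; values come from the slide itself.
p_c_given_v = 0.8    # P(C|V): probability of a crash given a virus
p_v = 0.001          # P(V):   prior probability of a virus
p_c = 0.1            # P(C):   overall daily crash probability

# Bayes Rule: P(V|C) = P(C|V) P(V) / P(C)
p_v_given_c = p_c_given_v * p_v / p_c
print(p_v_given_c)   # 0.008, i.e. only 8 in 1000 crashes are virus-caused
```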
7
Bayesian Classifiers
If we're selecting the single most likely class,
we only need to find the class that maximizes
P(e|Class) P(Class).
The hard part is estimating P(e|Class).
Evidence e typically consists of a set of
observations e1, e2, ..., en.
The usual simplifying assumption is conditional
independence:
P(e|Class) = P(e1|Class) P(e2|Class) ... P(en|Class)
8
Bayesian Classifier Example
Probability       C = Virus   C = Bad Disk
P(C)              0.4         0.6
P(crashes|C)      0.1         0.2
P(diskfull|C)     0.6         0.1

Given a case where the disk is full and the
computer crashes, the classifier chooses Virus as
most likely, since (0.4)(0.1)(0.6) > (0.6)(0.2)(0.1).
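A small sketch of this decision rule in Python, using the table above (the dictionary layout is illustrative, not from the slides):

```python
# Naive Bayes decision for the virus / bad-disk example.
priors = {"Virus": 0.4, "BadDisk": 0.6}
likelihoods = {
    "Virus":   {"crashes": 0.1, "diskfull": 0.6},
    "BadDisk": {"crashes": 0.2, "diskfull": 0.1},
}

def classify(evidence):
    """Return the class maximizing P(Class) * prod_i P(e_i|Class)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for e in evidence:
            score *= likelihoods[c][e]   # conditional independence assumption
        scores[c] = score
    return max(scores, key=scores.get), scores

print(classify(["crashes", "diskfull"]))
# ('Virus', ...) since 0.024 > 0.012
```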
9
Beyond Conditional Independence
(Figure: a linear decision boundary separating
classes C1 and C2.)
  • Include second-order dependencies, i.e.,
    pairwise combinations of variables via joint
    probabilities such as P(ei, ej|Class), as a
    correction factor to the first-order estimate.
  • Difficult to compute - with n evidence variables
    there are n(n-1)/2 pairwise joint probabilities
    to consider.
10
Belief Networks
  • DAG that represents the dependencies between
    variables and specifies the joint probability
    distribution
  • Random variables make up the nodes
  • Directed links represent direct causal influences
  • Each node has a conditional probability table
    quantifying the effects from the parents
  • No directed cycles

11
Burglary Alarm Example
Burglary: P(B) = 0.001        Earthquake: P(E) = 0.002

Alarm (depends on Burglary and Earthquake):
  B  E   P(A)
  T  T   0.95
  T  F   0.94
  F  T   0.29
  F  F   0.001

John Calls: P(J|A=T) = 0.90,  P(J|A=F) = 0.05
Mary Calls: P(M|A=T) = 0.70,  P(M|A=F) = 0.01
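One way to encode these tables is as plain Python dicts - a sketch for illustration, not any particular library's API. Later slides reuse these names:

```python
# CPTs for the burglary network, keyed by parent truth values.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def joint(b, e, a, j, m):
    """Full joint probability via the chain rule the network encodes."""
    def pr(p_true, val):          # P(var = val) from P(var = True)
        return p_true if val else 1 - p_true
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))
```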
12
Sample Bayesian Network
(Figure only - not reproduced in this transcript.)
13
Using The Belief Network
(Same network and tables as slide 11.)
Probability of an alarm with no burglary or
earthquake, and both John and Mary calling:
P(A, ¬B, ¬E, J, M) = P(¬B) P(¬E) P(A|¬B,¬E) P(J|A) P(M|A)
                   = 0.999 × 0.998 × 0.001 × 0.9 × 0.7 ≈ 0.00062
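Using the joint() sketch from slide 11, the query is one call:

```python
# One row of the full joint distribution.
print(joint(b=False, e=False, a=True, j=True, m=True))
# 0.999 * 0.998 * 0.001 * 0.9 * 0.7 ≈ 0.000628
```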
14
Belief Computations
  • Two types; both are NP-hard
  • Belief Revision
    • Models explanatory/diagnostic tasks
    • Given evidence, what is the most likely
      hypothesis to explain the evidence?
    • Also called abductive reasoning
  • Belief Updating
    • Queries
    • Given evidence, what is the probability of some
      other random variable occurring?

15
Belief Revision
  • Given some evidence variables, find the state of
    all other variables that maximizes the
    probability.
  • E.g., we know John calls but Mary does not. What
    is the most likely state? Only consider
    assignments where J=T and M=F, and maximize.
    Best: B=F, E=F, A=F (see the sketch below).
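A brute-force sketch of this revision, reusing joint() from slide 11; with only three unknown variables we can simply enumerate all eight assignments:

```python
from itertools import product

# Fix the evidence J=T, M=F and maximize over (B, E, A).
best = max(product([True, False], repeat=3),
           key=lambda bea: joint(bea[0], bea[1], bea[2], True, False))
print(best)   # (False, False, False): no burglary, no earthquake, no alarm
```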

16
Belief Updating
  • Causal Inferences
  • Diagnostic Inferences
  • Intercausal Inferences
  • Mixed Inferences

(Figure: for each inference type, a small network
marking which nodes are evidence (E) and which are
queried (Q).)
17
Causal Inferences
(Same network and tables as slide 11.)
Inference from cause to effect. E.g., given a
burglary, what is P(J|B)?
P(A|B) = P(A|B,E) P(E) + P(A|B,¬E) P(¬E)
       = 0.95 × 0.002 + 0.94 × 0.998 ≈ 0.94
P(J|B) = P(J|A) P(A|B) + P(J|¬A) P(¬A|B)
       = 0.9 × 0.94 + 0.05 × 0.06 ≈ 0.85
P(M|B) ≈ 0.67 via similar calculations
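A sketch of these calculations, reusing the tables from slide 11:

```python
# Condition on B=T, marginalize out E, then A.
p_a_given_b = P_A[(True, True)] * P_E + P_A[(True, False)] * (1 - P_E)

p_j_given_b = P_J[True] * p_a_given_b + P_J[False] * (1 - p_a_given_b)
p_m_given_b = P_M[True] * p_a_given_b + P_M[False] * (1 - p_a_given_b)
print(p_j_given_b, p_m_given_b)
# ≈ 0.85 and ≈ 0.66 (the slide's 0.67 reflects intermediate rounding)
```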
18
Diagnostic Inferences
From effect to cause. E.g., given that John calls,
what is P(B|J)?
We need P(A) first: summing over B and E gives
P(A) ≈ 0.0025, so P(J) = P(J|A) P(A) + P(J|¬A) P(¬A) ≈ 0.052.
Then P(B|J) = P(J|B) P(B) / P(J) ≈ 0.85 × 0.001 / 0.052 ≈ 0.016.
Many false positives.
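The same numbers in code, reusing the slide 11 tables and p_j_given_b from the previous sketch:

```python
# P(A) by summing over both parents, then Bayes Rule for P(B|J).
p_a = sum(pb * pe * P_A[(b, e)]
          for b, pb in ((True, P_B), (False, 1 - P_B))
          for e, pe in ((True, P_E), (False, 1 - P_E)))   # ≈ 0.0025
p_j = P_J[True] * p_a + P_J[False] * (1 - p_a)            # ≈ 0.052
print(p_j_given_b * P_B / p_j)                            # P(B|J) ≈ 0.016
```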
19
Intercausal Inferences
"Explaining away" inferences.
Given an alarm, P(B|A) ≈ 0.37. But if we add the
evidence that Earthquake is true, then
P(B|A,E) ≈ 0.003.
Even though B and E are independent a priori, once
their common effect (the alarm) is observed, the
presence of one may make the other more or less
likely.
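Explaining away can be verified by brute-force enumeration over joint() from slide 11:

```python
from itertools import product

def p_b_given(condition):
    """P(B=T | condition), summing the joint over matching assignments."""
    num = den = 0.0
    for b, e, a, j, m in product([True, False], repeat=5):
        if condition(b, e, a, j, m):
            p = joint(b, e, a, j, m)
            den += p
            num += p if b else 0.0
    return num / den

print(p_b_given(lambda b, e, a, j, m: a))          # P(B|A)   ≈ 0.37
print(p_b_given(lambda b, e, a, j, m: a and e))    # P(B|A,E) ≈ 0.003
```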
20
Mixed Inferences
Simultaneous intercausal and diagnostic inference,
e.g., querying the network given that John calls
and Earthquake is false.
Computing these values exactly is somewhat
complicated.
21
Exact Computation - Polytree Algorithm
  • Judea Pearl, 1982
  • Only works on singly-connected networks - at most
    one undirected path between any two nodes.
  • Backward-chaining, message-passing algorithm for
    computing posterior probabilities for a query
    node X
  • Compute causal support for X: evidence from
    variables "above" X (its ancestors)
  • Compute evidential support for X: evidence from
    variables "below" X (its descendants)

22
Polytree Computation
(Figure: query node X with parents U(1) ... U(m)
above it and children Y(1) ... Y(n) below; each
child Y(j) also has its own other parents
Z(1,j) ... Z(n,j).)
The algorithm is recursive - a message-passing chain.
23
Other Query Methods
  • Exact Algorithms
    • Clustering: cluster nodes to make a single
      cluster, message-pass along that cluster
    • Symbolic Probabilistic Inference: uses
      d-separation to find expressions to combine
  • Approximate Algorithms (see the sketch below)
    • Forward Simulation: select a sampling
      distribution, conduct trials sampling from the
      roots to the evidence nodes, accumulating a
      weight for each node
    • Stochastic Simulation
    • Still tractable for dense networks
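As an illustration of this family, here is a sketch of likelihood weighting - one standard weighted-sampling scheme (the slides do not name a specific one) - reusing the slide 11 tables:

```python
import random

def likelihood_weighting(j_observed=True, trials=100_000):
    """Estimate P(B=T | J=j_observed) by weighted forward sampling."""
    num = den = 0.0
    for _ in range(trials):
        b = random.random() < P_B                  # sample roots first...
        e = random.random() < P_E
        a = random.random() < P_A[(b, e)]          # ...then their children
        w = P_J[a] if j_observed else 1 - P_J[a]   # weight by the evidence J
        den += w
        num += w if b else 0.0
    return num / den

print(likelihood_weighting())   # ≈ 0.016, agreeing with slide 18
```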

24
Summary
  • Bayesian methods provide a sound theory and
    framework for implementing classifiers
  • Bayesian networks are a natural way to represent
    conditional independence information - qualitative
    information in the links, quantitative in the
    tables.
  • Computing exact values is NP-hard, so it is
    typical to make simplifying assumptions or use
    approximate methods.
  • Many Bayesian tools and systems exist

25
References
  • Russell, S. and Norvig, P. (1995). Artificial
    Intelligence: A Modern Approach. Prentice Hall.
  • Weiss, S. and Kulikowski, C. (1991). Computer
    Systems That Learn. Morgan Kaufmann.
  • Heckerman, D. (1996). A Tutorial on Learning
    with Bayesian Networks. Microsoft Technical
    Report MSR-TR-95-06.
  • Internet Resources on Bayesian Networks and
    Machine Learning:
    http://www.cs.orst.edu/~wangxi/resource.html