Introduction to Bayesian Belief Nets

Transcript and Presenter's Notes
1
Introduction to Bayesian Belief Nets
  • Russ Greiner
  • Dept of Computing Science
  • Alberta Ingenuity Centre for Machine Learning
  • University of Alberta
  • http://www.cs.ualberta.ca/~greiner/bn.html

2
[Figure: timeline with 1980, 1990, 1996]

3
[Figure]
4
Motivation
  • Gates says (LA Times, 28/Oct/96):
  • "Microsoft's competitive advantage is its
    expertise in Bayesian networks"
  • Current Products
  • Microsoft Pregnancy and Child Care (MSN)
  • Answer Wizard (Office, ...)
  • Print Troubleshooter
  • Excel Workbook Troubleshooter
  • Office 95 Setup Media Troubleshooter
  • Windows NT 4.0 Video Troubleshooter
  • Word Mail Merge Troubleshooter

5
Motivation (II)
  • US Army: SAIP (Battalion Detection from SAR, IR;
    Gulf War)
  • NASA: Vista (DSS for Space Shuttle)
  • GE: Gems (real-time monitor for utility
    generators)
  • Intel (infer possible processing problems from
    end-of-line tests on semiconductor chips)
  • KIC
  • medical: sleep disorders, pathology, trauma care,
    hand and wrist evaluations, dermatology,
    home-based health evaluations
  • DSS for capital equipment: locomotives,
    gas-turbine engines, office equipment

6
Motivation (III)
  • Lymph-node pathology diagnosis
  • Manufacturing control
  • Software diagnosis
  • Information retrieval
  • Types of tasks
  • Classification/Regression
  • Sensor Fusion
  • Prediction/Forecasting

7
Outline
  • Existing uses of Belief Nets (BNs)
  • How to reason with BNs
  • Specific Examples of BNs
  • Contrast with Rules, Neural Nets, ...
  • Possible applications of BNs
  • Challenges
  • How to reason efficiently
  • How to learn BNs

8

9
Objectives: Decision Support System
  • Determine
  • which tests to perform
  • which repair to suggest
  • based on costs, sensitivity/specificity, ...
  • Use all sources of information
  • symbolic (discrete observations, history, ...)
  • signal (from sensors)
  • Handle partial information
  • Adapt to track fault distribution

10
Underlying Task
  • Situation: Given observations O1 = v1, ..., Ok = vk
  • (symptoms, history, test results, ...)
  • what is the best DIAGNOSIS Dx for the patient?
  • Seldom Completely Certain

11
Underlying Task, II
  • Situation: Given observations O1 = v1, ..., Ok = vk
  • (symptoms, history, test results, ...)
  • what is the best DIAGNOSIS Dx for the patient?
  • Challenge: How to express Probabilities?

12
How to deal with Probabilities
  • Sufficient: atomic events
  • for all 2^(1+N) values u ∈ {T, F}, vj ∈ {T, F}:

P( Dx = u, O1 = v1, ..., Ok = vk, ..., ON = vN )
  • But even if Dx is binary, 20 binary observations ⇒
    2^21 - 1 > 2,097,000 numbers!

13
Problems with Atomic Events
  • Representation is not intuitive
  • ⇒ Should make connections explicit
  • use local information

P(Jaundice | Hepatitis), P(LightDim |
BadBattery), ...
  • Too many numbers: O(2^N)
  • Hard to store
  • Hard to use
  • Must add 2^r values to marginalize out r
    variables
  • Hard to learn
  • Takes O(2^N) samples to learn 2^N parameters
  • ⇒ Include only necessary connections
    (the sketch below illustrates the marginalization cost)
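To make the marginalization cost concrete, here is a tiny Python sketch (the helper name and data layout are illustrative assumptions, not from the talk) that marginalizes a flat joint table; note it must touch all 2^N entries:

```python
# A toy flat joint table over N binary variables, stored as
# {tuple-of-N-bits: prob}.  Marginalizing out r variables sums
# 2^r entries into every remaining cell -- and scans all 2^N rows.
from collections import defaultdict

def marginalize(joint, keep):
    """Sum out every variable whose index is not in `keep`."""
    out = defaultdict(float)
    for bits, p in joint.items():            # touches all 2^N entries
        out[tuple(bits[i] for i in keep)] += p
    return dict(out)

# E.g. with N = 3, keeping variable 0 folds 2^2 = 4 entries into
# each of the two cells of the result.
```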

14

15
Hepatitis Example
H = Hepatitis, J = Jaundice, B = (positive) Blood
test
  • (Boolean) Variables
  • Want P( H=1 | J=0, B=1 )
  • ..., P( H=1 | B=1, J=1 ), P( H=1 | B=0, J=0 ), ...
  • Alternatively ...

16
Encoding Causal Links
  • Simple Belief Net
  • Node = Variable
  • Link = Causal dependency
  • CPTable = P(child | parents)

17
Encoding Causal Links

P(H=1)
0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h  b   P(J=1 | h, b)
1  1   0.8
1  0   0.8
0  1   0.3
0  0   0.3

(Net: H → B, H → J, B → J)
  • P(J | H, B=0) = P(J | H, B=1) ∀ J, H ⇒
    P( J | H, B ) = P( J | H )
  • J is INDEPENDENT of B, once we know H
  • Don't need B → J arc!

18
Encoding Causal Links

P(H=1)
0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h   P(J=1 | h)
1   0.8
0   0.3

(Net: H → B, H → J)
  • P(J | H, B=0) = P(J | H, B=1) ∀ J, H ⇒
    P( J | H, B ) = P( J | H )
  • J is INDEPENDENT of B, once we know H
  • Don't need B → J arc!

19
Encoding Causal Links

P(H=1)
0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h   P(J=1 | h)
1   0.8
0   0.3

(Net: H → B, H → J)
  • P(J | H, B=0) = P(J | H, B=1) ∀ J, H ⇒
    P( J | H, B ) = P( J | H )
  • J is INDEPENDENT of B, once we know H
  • Don't need B → J arc!

20
Sufficient Belief Net

P(H=1)
0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h   P(J=1 | h)
1   0.8
0   0.3
  • Requires P(H=1) known
  • P(J=1 | H=h) known
  • P(B=1 | H=h) known
  • (Only 5 parameters, not 7)
  • (A small inference sketch follows below)
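As a concrete illustration of how these 5 parameters answer the queries on the earlier slide, here is a minimal Python sketch (an assumption of this write-up, not code from the talk) that computes P(H=1 | B, J) by enumeration over the H → B, H → J net:

```python
# CPT values copied from the slide above.
P_H1 = 0.05                                  # P(H=1)
P_B1_given_H = {1: 0.95, 0: 0.03}            # P(B=1 | H=h)
P_J1_given_H = {1: 0.80, 0: 0.30}            # P(J=1 | H=h)

def joint(h, b, j):
    """P(H=h, B=b, J=j) = P(h) P(b|h) P(j|h), the net's factorization."""
    ph = P_H1 if h == 1 else 1 - P_H1
    pb = P_B1_given_H[h] if b == 1 else 1 - P_B1_given_H[h]
    pj = P_J1_given_H[h] if j == 1 else 1 - P_J1_given_H[h]
    return ph * pb * pj

def posterior_H1(b, j):
    """P(H=1 | B=b, J=j), normalizing over both values of H."""
    num = joint(1, b, j)
    return num / (num + joint(0, b, j))

print(posterior_H1(b=1, j=1))    # ~0.82: both findings support hepatitis
```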

21
Factoring
  • B does depend on J
  • If J=1, then likely that H=1 ⇒ B=1

N.b., B and J ARE correlated a priori: P(J | B)
≠ P(J). GIVEN H, they become uncorrelated:
P(J | B, H) = P(J | H)
22
Factored Distribution
  • Symptoms independent, given Disease
  • ReadingAbility and ShoeSize are dependent,
  • P(ReadAbility | ShoeSize ) ≠ P(ReadAbility )
  • but become independent, given Age
  • P(ReadAbility | ShoeSize, Age ) = P(ReadAbility
    | Age)

23
Naïve Bayes
  • Classification Task
  • Given O1 = v1, ..., On = vn
  • Find hi that maximizes P(H = hi | O1 = v1,
    ..., On = vn)
  • Find argmax_hi P(H = hi) Πj P(Oj = vj | H = hi)

24
Naïve Bayes (cont)
  • Normalizing term
  • (No need to compute, as same for all hi)
  • Easy to use for Classification
  • Can use even if some vj's not specified
    (see the sketch below)
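A hedged sketch of the classification rule on these two slides (names are illustrative): maximize P(hi) Πj P(Oj = vj | hi), skipping any unobserved Oj, which is exactly why partial evidence is no problem:

```python
import math

def naive_bayes_argmax(prior, likelihood, obs):
    """prior: {h: P(h)};  likelihood: {(j, v, h): P(O_j = v | H = h)};
    obs: {j: v} -- observations missing from obs are simply skipped.
    Log-space avoids underflow; the normalizer is omitted, as it is
    the same for every hi."""
    def score(h):
        s = math.log(prior[h])
        for j, v in obs.items():
            s += math.log(likelihood[(j, v, h)])
        return s
    return max(prior, key=score)
```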

25
Bigger Networks
P(I=1)
0.20

P(H=1)
0.32

g  lt   P(H=1 | g, lt)
1  1    0.82
1  0    0.10
0  1    0.45
0  0    0.04

h   P(J=1 | h)
1   0.8
0   0.3

h   P(B=1 | h)
1   0.98
0   0.01
  • Intuition: Show CAUSAL connections
  • GeneticPH CAUSES Hepatitis; Hepatitis CAUSES
    Jaundice

26
Belief Nets
  • DAG structure
  • Each node ≈ Variable v
  • v depends (only) on its parents
  • conditional prob P(vi | parents(vi)) ∈ [0, 1]
  • v is INDEPENDENT of non-descendants,
  • given assignments to its parents
    (a sketch of the resulting factorization follows)
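This definition pins down the whole joint distribution; a minimal sketch of the resulting factorization (helper and argument names are assumptions of this write-up):

```python
def joint_prob(assignment, parents, cpt):
    """P(full assignment) = product over nodes of P(v_i | parents_i).
    assignment: {node: value}
    parents:    {node: ordered list of parent nodes}
    cpt:        {node: {(value, tuple of parent values): prob}}"""
    p = 1.0
    for node, value in assignment.items():
        pv = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][(value, pv)]
    return p
```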

27
Less Trivial Situations
  • N.b., obs1 is not always independent of obs2,
    given H
  • E.g., FamilyHistoryDepression causes
    MotherSuicide and Depression
  • MotherSuicide causes Depression (w/ or w/o
    F.H.Depression)
  • Here, P( D | MS, FHD ) ≠ P( D | FHD ) !
  • Can be done using Belief Network,
  • but need to specify
  • P( FHD ): 1 value
  • P( MS | FHD ): 2 values
  • P( D | MS, FHD ): 4 values

28
Example: Car Diagnosis
29
MammoNet
30
ALARM
  • A Logical Alarm Reduction Mechanism
  • 8 diagnoses, 16 findings, ...

31
Troop Detection
32
ARCO1: Forecasting Oil Prices
33
ARCO1: Forecasting Oil Prices
34
Forecasting Potato Production
35
Warning System
36
Extensions
  • Find best values (posterior distr.) for
  • SEVERAL (> 1) output variables
  • Partial specification of input values
  • only subset of variables
  • only distribution of each input variable
  • General Variables
  • Discrete, but domain size > 2
  • Continuous (Gaussian: x = Σi bi yi for
    parents Y )
  • Decision Theory ⇒ Decision Nets (Influence
    Diagrams): Making Decisions, not just assigning
    probs
  • Storing P( v | p1, p2, ..., pk ): General CP Tables
    O(2^k); Noisy-Or, Noisy-And, Noisy-Max; Decision
    Trees (see the noisy-OR sketch below)
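As one example of these compact CP-table forms, here is a minimal noisy-OR sketch (parameter names are illustrative): each parent carries a single activation probability, so k numbers replace a 2^k table:

```python
def noisy_or(parent_values, p, leak=0.0):
    """P(child=1 | parents) = 1 - (1-leak) * prod over active
    parents i of (1 - p[i]); p[i] is the chance that parent i
    alone turns the child on."""
    q = 1.0 - leak
    for i, on in enumerate(parent_values):
        if on:
            q *= 1.0 - p[i]
    return 1.0 - q

print(noisy_or([1, 1, 0], p=[0.8, 0.6, 0.9]))   # 0.92, from 3 numbers
```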

37
Outline
  • Existing uses of Belief Nets (BNs)
  • How to reason with BNs
  • Specific Examples of BNs
  • Contrast with Rules, Neural Nets, ...
  • Possible applications of BNs
  • Challenges
  • How to reason efficiently
  • How to learn BNs

38
Belief Nets vs Rules
  • Both have Locality
  • Specific clusters (rules / connected nodes)

WHY? Easier for people to reason CAUSALLY, even
if use is DIAGNOSTIC
  • BNs provide an OPTIMAL way to deal with
  • Uncertainty
  • Vagueness (var not given, or only dist)
  • Error
  • Signals meeting Symbols
  • BN permits different directions of inference

39
Belief Nets vs Neural Nets
  • Both have graph structure, but
  • BN: Nodes have SEMANTICS,
  • Combination Rules: Sound Probability
  • NN: Nodes arbitrary,
  • Combination Rules: Arbitrary
  • So harder to
  • Initialize NN
  • Explain NN
  • (But perhaps easier to learn NN from examples
    only?)
  • BNs can deal with
  • Partial Information
  • Different directions of inference

40
Belief Nets vs Markov Nets
  • Each uses graph structure
  • to FACTOR a distribution
  • explicitly specify dependencies (and implicitly,
    independencies)
  • but subtle differences
  • BNs capture causality, hierarchies
  • MNs capture temporality

41
Uses of Belief Nets, 1
  • Medical Diagnosis: Assist/Critique MD
  • identify diseases not ruled-out
  • specify additional tests to perform
  • suggest appropriate/cost-effective treatments
  • react to MD's proposed treatment
  • Decision Support: Find/repair faults in complex
    machines
  • Device, or Manufacturing Plant, or ...
  • based on sensors, recorded info, history, ...
  • Preventative Maintenance
  • Anticipate problems in complex machines
  • Device, or Manufacturing Plant, or ...
  • based on sensors, statistics, recorded info,
    device history, ...

42
Uses (cont)
  • Logistics Support: Stock warehouses
    appropriately, based on (estimated) freq. of
    needs, costs, ...
  • Diagnose Software: Find most probable bugs,
    given program behavior, core dump, source code, ...
  • Part Inspection/Classification: based on
    multiple sensors, background, model of
    production, ...
  • Information Retrieval: Combine information from
    various sources, based on info from various
    agents, ...

General: Partial Info, Sensor fusion
- Classification - Interpretation - Prediction - ...
43
Challenge 1: Computational Efficiency

For a given BN, the general problem is: Given
O1 = v1, ..., On = vn, compute
P(H | O1 = v1, ..., On = vn).
- If BN is a poly-tree ⇒ efficient algorithm.
- If BN is a general DAG (> 1 path from X to Y):
  - NP-hard in theory
  - slow in practice
Tricks: Get approximate answer (quickly); use
abstraction of BN; use abstraction of query (range).

[Figure: example network over nodes H, D, I, J, B]
44
2a: Obtaining Accurate BN
  • BN encodes distribution over n variables
  • Not O(2^n) values, but only Σi 2^{ki}
  • (Node ni binary, with ki parents)
  • Still lots of values! structure ...
  • ⇒ Qualitative Information
  • Structure: What depends on what?
  • Easy for people (background knowledge)
  • But NP-hard to learn from samples
  • ⇒ Quantitative Information
  • Actual CP-tables
  • Easy to learn, given lots of examples
  • But people have a hard time

Knowledge acquisition from human experts
Simple learning algorithm (a counting sketch follows)
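To make "easy to learn, given lots of examples" concrete, a counting sketch (function and field names are assumptions, not from the talk): with complete data, each CP-table entry is just a conditional frequency:

```python
from collections import Counter

def learn_cpt(data, child, parents):
    """data: list of complete {variable: value} rows.
    Returns estimates of P(child = v | parent values) by counting."""
    joint, marg = Counter(), Counter()
    for row in data:
        pv = tuple(row[p] for p in parents)
        joint[(row[child], pv)] += 1
        marg[pv] += 1
    return {(v, pv): n / marg[pv] for (v, pv), n in joint.items()}

# e.g. learn_cpt(rows, child="J", parents=["H"]) approaches the true
# P(J | H) table as the number of rows grows.
```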
45
Notes on Learning
  • Mixed Sources: Person provides structure;
  • Algorithm fills in numbers.
  • Just Human Expert: People produce CP-tables, as
    well as structure
  • Relatively few values really required
  • Esp. if NoisyOr, NoisyAnd, NaiveBayes, ...
  • Actual values not that important
  • Sensitivity studies

46
My Current Work
  • Learning Belief Nets
  • Model selection
  • Challenging the myth that MDL is the appropriate
    criterion
  • Learning a performance system, not a model
  • Validating Belief Nets
  • Error bars around answers
  • Adaptive User Interfaces
  • Efficient Vision Systems
  • Foundations of Learnability
  • Learning Active Classifiers
  • Sequential learners
  • Condition-Based Maintenance, Bio-signal
    interpretation, ...

47
2b: Maintaining Accurate BN
  • The world changes.
  • Information in BN may be
  • perfect at time t
  • sub-optimal at time t + 20
  • worthless at time t + 200
  • Need to MAINTAIN a BN over time
  • using on-going human consultant
  • Adaptive BN
  • Dirichlet distribution (variables)
  • Priors over BNs

48
Conclusions
  • Provide effective way to
  • Represent complicated, inter-related events
  • Reason about such situations
  • Diagnosis, Explanation, ValueOfInfo
  • Explain conclusions
  • Mix Symbolic and Numeric observations
  • Challenges
  • Efficient ways to use BNs
  • Ways to create BNs
  • Ways to maintain BNs
  • Reason about time

49
Extra Slides
  • AI Seminar
  • Friday, noon, CSC 3-33
  • Free PIZZA!
  • http://www.cs.ualberta.ca/ai/ai-seminar.html
  • References
  • http://www.cs.ualberta.ca/~greiner/bn.html
  • Crusher Controller
  • Formal Framework
  • Decision Nets
  • Developing the Model
  • Why Reasoning is Hard
  • Learning Accurate Belief Nets

50
References
  • http://www.cs.ualberta.ca/~greiner/bn.html
  • Overview textbooks
  • Judea Pearl, Probabilistic Reasoning in
    Intelligent Systems: Networks of Plausible
    Inference, Morgan Kaufmann, 1988.
  • Stuart Russell and Peter Norvig, Artificial
    Intelligence: A Modern Approach, Prentice Hall,
    1995. (See esp. Ch 14, 15, 19.)
  • General info re Bayes Nets
  • http://www.afit.af.mil:80/Schools/EN/ENG/LABS/AI/BayesianNetworks
  • Proceedings: http://www.sis.pitt.edu/~dsl/uai.html
  • Assoc for Uncertainty in AI: http://www.auai.org/
  • Learning
  • David Heckerman, A tutorial on learning with
    Bayesian networks, 1995,
  • http://www.research.microsoft.com/research/dtg/heckerma/TR-95-06.htm
  • Software
  • General: http://bayes.stat.washington.edu/almond/belief.html
  • JavaBayes: http://www.cs.cmu.edu/~fgcozman/Research/JavaBayes
  • Norsys: http://www.norsys.com/

51
Decision Net: Test/Buy a Car
52
Utility: Decision Nets
  • Given c(action, state) ∈ ℝ (cost function)
  • Cp(a) = Es[ c(a, s) ] = Σ s∈S p(s | obs) c(a, s)
  • Best (immediate) action: a* = argmin a∈A Cp(a)
    (see the sketch after this list)
  • Decision Net (like Belief Net), but
  • 3 types of nodes
  • chance (like Belief net)
  • action: repair, sensing
  • cost/utility
  • Links for dependency
  • Given observations obs, computes best action a
  • Sequence of Actions: MDPs, POMDPs, ...
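A small sketch of this expected-cost rule (the action/state names in the usage note are hypothetical):

```python
def best_action(actions, states, p_state_given_obs, cost):
    """Return a* = argmin_a sum_s P(s | obs) * c(a, s)."""
    def expected_cost(a):
        return sum(p_state_given_obs[s] * cost[(a, s)] for s in states)
    return min(actions, key=expected_cost)

# best_action(["repair", "wait"], ["ok", "faulty"],
#             {"ok": 0.7, "faulty": 0.3},
#             {("repair", "ok"): 100, ("repair", "faulty"): 100,
#              ("wait", "ok"): 0, ("wait", "faulty"): 500})
# -> "repair"   (expected cost 100 vs. 150 for "wait")
```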

Go Back
53
Decision Net: Drill for Oil?
Go Back
54
Formal Framework
  • Always true:
  • P(x1, ..., xn) = P(x1) P(x2 | x1) P(x3 | x2,
    x1) ... P(xn | xn-1, ..., x1)
  • Given independencies,
  • P(xk | x1, ..., xk-1) = P(xk | pa_k) for some
    pa_k ⊆ {x1, ..., xk-1}
  • Hence P(x1, ..., xn) = Πi P(xi | pa_i)
  • So just connect each y ∈ pa_i to xi ⇒ DAG
    structure

Note:
- Size of BN is Σi 2^|pa_i|,
  so better to use small pa_i.
- pa_i = {1, ..., i-1} is never incorrect,
  but seldom minimal (so hard to store,
  learn, reason with, ...)
- Order of variables can
  make a HUGE difference: can have |pa_i| = 1
  for one ordering, |pa_i| = i-1 for another
Go Back
55
Developing the Model
  • Sources of information
  • (Human) Expert(s)
  • Data from earlier runs
  • Simulator
  • Typical Process
  • 1. Develop / Refine Initial Prototype
  • 2. Test Prototype ⇒ Accurate System
  • 3. Deploy System
  • 4. Update / Maintain System

56
Develop/Refine Prototype
  • Requires expert
  • useful to have data
  • Initial Interview(s)
  • To establish what relates to what
  • Expert time: ~½ day
  • Iterative process (Gradual refinement)
  • To refine qualitative connections
  • To establish correct operations

Expert presents Good Performance; KE implements
Expert's claims; KE tests on examples (real data
or expert), and reports to Expert
Expert time: 1-2 hours / week for ??
weeks (Depends on complexity of device, and
accuracy of model)
Go Back
57
Why Reasoning is Hard
  • BN reasoning may look easy
  • Just propagate information from node to node

P(Z=t)
0.5

z   P(B=t | Z=z)
t   0.0
f   1.0

z   P(A=t | Z=z)
t   1.0
f   0.0

a  b   P(C=t | a, b)
t  t   1.0
t  f   0.0
f  t   0.0
f  f   0.0
  • Challenge: What is P(C=t)?

(Net: Z → A, Z → B; A, B → C.)
Here P(A=t) = P(B=t) = ½.
So P(C=t) = P(A=t, B=t) =? P(A=t) P(B=t) = ½ · ½ = ¼ ?
Wrong: P(C=t) = 0 !
  • Need to maintain dependencies! P(A=t, B=t)
    = P(A=t) P(B=t | A=t)
    (a small sketch follows)
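The trap is easy to reproduce; a short sketch using the slide's deterministic CPTs:

```python
# Marginally P(A=t) = P(B=t) = 1/2, but A = Z and B = NOT Z, so
# C = A AND B can never be true.  Multiplying marginals (1/4)
# silently drops the dependence routed through Z.
def p_c_true():
    total = 0.0
    for z in (0, 1):
        a, b = z, 1 - z         # P(A=t|Z) = z,  P(B=t|Z) = 1 - z
        total += 0.5 * (a & b)  # P(Z=t) = P(Z=f) = 0.5; C = A AND B
    return total

print(p_c_true())               # 0.0 -- not the naive 0.25
```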

Go Back
58
Crusher Controller
  • Given observations
  • History, sensor readings, schedule, ...
  • Specify best action for crusher
  • stop immediately, increase roller speed by ?, ...
  • Best = minimize expected cost
  • Initially: just recommendation to human operator
  • Later: Directly implement (some) actions
  • Request values of other sensors?

59
Approach
  • For each state s
  • (Good flow, tooth about to enter, ...)
  • for each action a
  • (Stop immediately, Change p7 = 0.32, ...)
  • determine utility of performing a in s
  • (Cost of lost production if stopped;
  • of reduced production efficiency if continued ...)
  • Use observations to estimate (dist over) current
    states
  • Infer EXPECTED UTILITY of each action, based on
    distr.
  • Return action with highest Expected Utility

60
Details
  • Inputs
  • Sensor Readings (history)
  • Camera, microphone, power-draw
  • Parameter settings
  • Log files, Maintenance records
  • Schedule (maintenance, anticipated load, )
  • Outputs
  • Continue as is
  • Adjust parameters
  • GapSize, ApronFeederSpeed, 1J_ConveyorSpeed
  • Shut down immediately
  • Stop adding new material
  • Tell operator to look
  • State: Crusher / Environment
  • UncrushableThingsNowInCrusher
  • TeethMissing
  • NextUncrushableEntry
  • Control Parameters

61
Benefits
  • Increase Crusher Effectiveness
  • Find best settings for parameters
  • To maximize production of well-sized chunks
  • Reduce Down Time
  • Know when maintenance/repair is critical
  • Reduce Damage to Crusher
  • Usable Model of Crusher
  • Easy to modify when needed
  • Training
  • Design of next generation
  • Prototype for design of controllers and
    diagnosticians for other machines

Go Back
62
My Background
  • PhD, Stanford (Computer Science)
  • Representational issues, Analogical Inference
  • everything in Logic
  • PostDoc at U of Toronto (CS)
  • Foundations of learnability, logical inference,
    DB, control theory, ...
  • everything in Logic
  • Industrial research (Siemens Corporate Research)
  • Need to solve REAL problems
  • Theory Revision, Navigational systems, ...
  • logic is not the be-all-and-end-all!
  • Prof at U of Alberta (CS)
  • Industrial problems (Siemens, BioTools, Syncrude)
  • Foundations of learnability, probabilistic
    inference

63
Less Trivial Situations
  • N.b., obs1 is not always independent of obs2,
    given H
  • E.g., FamilyHistoryDepression causes
    MotherSuicide and Depression
  • MotherSuicide causes Depression (w/ or w/o
    F.H.Depression)

(Net: FHD → MS, FHD → D, MS → D)

f   P(MS=1 | FHD=f)
1   0.10
0   0.03

f  m   P(D=1 | FHD=f, MS=m)
1  1   0.97
1  0   0.90
0  1   0.08
0  0   0.04
  • Here, P( D | MS, FHD ) ≠ P( D | FHD ) !
  • Can be done using Belief Network,
  • but need to specify
  • P( FHD ): 1 value
  • P( MS | FHD ): 2 values
  • P( D | MS, FHD ): 4 values