Introduction to Bayesian Belief Nets

Transcript and Presenter's Notes
1
Introduction to Bayesian Belief Nets
  • Russ Greiner
  • Dept of Computing Science
  • Alberta Ingenuity Centre for Machine Learning
  • University of Alberta
  • http://www.cs.ualberta.ca/~greiner/bn.html

2
[Figure: timeline with 1980, 1990, 1996]

3
[Figure]
4
Motivation
  • Gates says (LA Times, 28/Oct/96):
  • "Microsoft's competitive advantage is its
    expertise in Bayesian networks"
  • Current Products
  • Microsoft Pregnancy and Child Care (MSN)
  • Answer Wizard (Office, ...)
  • Print Troubleshooter
  • Excel Workbook Troubleshooter
  • Office 95 Setup Media Troubleshooter
  • Windows NT 4.0 Video Troubleshooter
  • Word Mail Merge Troubleshooter

5
Motivation (II)
  • US Army: SAIP (Battalion Detection from SAR, IR;
    Gulf War)
  • NASA: Vista (DSS for Space Shuttle)
  • GE: Gems (real-time monitor for utility
    generators)
  • Intel (infer possible processing problems from
    end-of-line tests on semiconductor chips)
  • KIC
  • medical: sleep disorders, pathology, trauma care,
    hand and wrist evaluations, dermatology,
    home-based health evaluations
  • DSS for capital equipment: locomotives,
    gas-turbine engines, office equipment

6
Motivation (III)
  • Lymph-node pathology diagnosis
  • Manufacturing control
  • Software diagnosis
  • Information retrieval
  • Types of tasks
  • Classification/Regression
  • Sensor Fusion
  • Prediction/Forecasting

7
Outline
  • Existing uses of Belief Nets (BNs)
  • How to reason with BNs
  • Specific Examples of BNs
  • Contrast with Rules, Neural Nets, ...
  • Possible applications of BNs
  • Challenges
  • How to reason efficiently
  • How to learn BNs

8

9
Objectives: Decision Support System
  • Determine
  • which tests to perform
  • which repair to suggest
  • based on costs, sensitivity/specificity, ...
  • Use all sources of information
  • symbolic (discrete observations, history, ...)
  • signal (from sensors)
  • Handle partial information
  • Adapt to track fault distribution

10
Underlying Task
  • Situation: Given observations O1 = v1, ..., Ok = vk
  • (symptoms, history, test results, ...)
  • what is the best DIAGNOSIS Dx for the patient?
  • Seldom Completely Certain

11
Underlying Task, II
  • Situation: Given observations O1 = v1, ..., Ok = vk
  • (symptoms, history, test results, ...)
  • what is the best DIAGNOSIS Dx for the patient?
  • Challenge: How to express Probabilities?

12
How to deal with Probabilities
  • Sufficient: atomic events
  • for all 2^(1+N) values u ∈ {T, F}, vj ∈ {T, F}:

P( Dx = u, O1 = v1, ..., Ok = vk, ..., ON = vN )
  • But even if Dx is binary, 20 binary observations ⇒
    2^21 - 1 > 2,097,000 numbers!

13
Problems with Atomic Events
  • Representation is not intuitive
  • ⇒ Should make connections explicit
  • use local information

P(Jaundice | Hepatitis), P(LightDim |
BadBattery), ...
  • Too many numbers: O(2^N)
  • Hard to store
  • Hard to use
  • Must add 2^r values to marginalize out r
    variables
  • Hard to learn
  • Takes O(2^N) samples to learn 2^N parameters
  • ⇒ Include only necessary connections
    (the sketch below illustrates the marginalization cost)
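To make the marginalization cost concrete, here is a tiny Python sketch (the helper name and data layout are illustrative assumptions, not from the talk) that marginalizes a flat joint table; note it must touch all 2^N entries:

```python
# A toy flat joint table over N binary variables, stored as
# {tuple-of-N-bits: prob}.  Marginalizing out r variables sums
# 2^r entries into every remaining cell -- and scans all 2^N rows.
from collections import defaultdict

def marginalize(joint, keep):
    """Sum out every variable whose index is not in `keep`."""
    out = defaultdict(float)
    for bits, p in joint.items():            # touches all 2^N entries
        out[tuple(bits[i] for i in keep)] += p
    return dict(out)

# E.g. with N = 3, keeping variable 0 folds 2^2 = 4 entries into
# each of the two cells of the result.
```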

14

15
Hepatitis Example
H = Hepatitis, J = Jaundice, B = (positive) Blood
test
  • (Boolean) Variables
  • Want P( H=1 | J=0, B=1 )
  • ..., P( H=1 | B=1, J=1 ), P( H=1 | B=0, J=0 ), ...
  • Alternatively ...

16
Encoding Causal Links
  • Simple Belief Net
  • Node = Variable
  • Link = Causal dependency
  • CPTable = P(child | parents)

17
Encoding Causal Links

P(H=1)
0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h  b   P(J=1 | h, b)
1  1   0.8
1  0   0.8
0  1   0.3
0  0   0.3

(Net: H → B, H → J, B → J)
  • P(J | H, B=0) = P(J | H, B=1) ∀ J, H ⇒
    P( J | H, B ) = P( J | H )
  • J is INDEPENDENT of B, once we know H
  • Don't need B → J arc!

18
Encoding Causal Links

P(H=1)
0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h   P(J=1 | h)
1   0.8
0   0.3

(Net: H → B, H → J)
  • P(J | H, B=0) = P(J | H, B=1) ∀ J, H ⇒
    P( J | H, B ) = P( J | H )
  • J is INDEPENDENT of B, once we know H
  • Don't need B → J arc!

19
Encoding Causal Links

P(H=1)
0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h   P(J=1 | h)
1   0.8
0   0.3

(Net: H → B, H → J)
  • P(J | H, B=0) = P(J | H, B=1) ∀ J, H ⇒
    P( J | H, B ) = P( J | H )
  • J is INDEPENDENT of B, once we know H
  • Don't need B → J arc!

20
Sufficient Belief Net

P(H=1)
0.05

h   P(B=1 | H=h)
1   0.95
0   0.03

h   P(J=1 | h)
1   0.8
0   0.3
  • Requires P(H=1) known
  • P(J=1 | H=h) known
  • P(B=1 | H=h) known
  • (Only 5 parameters, not 7)
  • (A small inference sketch follows below)
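As a concrete illustration of how these 5 parameters answer the queries on the earlier slide, here is a minimal Python sketch (an assumption of this write-up, not code from the talk) that computes P(H=1 | B, J) by enumeration over the H → B, H → J net:

```python
# CPT values copied from the slide above.
P_H1 = 0.05                                  # P(H=1)
P_B1_given_H = {1: 0.95, 0: 0.03}            # P(B=1 | H=h)
P_J1_given_H = {1: 0.80, 0: 0.30}            # P(J=1 | H=h)

def joint(h, b, j):
    """P(H=h, B=b, J=j) = P(h) P(b|h) P(j|h), the net's factorization."""
    ph = P_H1 if h == 1 else 1 - P_H1
    pb = P_B1_given_H[h] if b == 1 else 1 - P_B1_given_H[h]
    pj = P_J1_given_H[h] if j == 1 else 1 - P_J1_given_H[h]
    return ph * pb * pj

def posterior_H1(b, j):
    """P(H=1 | B=b, J=j), normalizing over both values of H."""
    num = joint(1, b, j)
    return num / (num + joint(0, b, j))

print(posterior_H1(b=1, j=1))    # ~0.82: both findings support hepatitis
```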

21
Factoring
  • B does depend on J
  • If J=1, then likely that H=1 ⇒ B=1

N.b., B and J ARE correlated a priori: P(J | B)
≠ P(J). GIVEN H, they become uncorrelated:
P(J | B, H) = P(J | H)
22
Factored Distribution
  • Symptoms independent, given Disease
  • ReadingAbility and ShoeSize are dependent,
  • P(ReadAbility | ShoeSize ) ≠ P(ReadAbility )
  • but become independent, given Age
  • P(ReadAbility | ShoeSize, Age ) = P(ReadAbility
    | Age)

23
Naïve Bayes
  • Classification Task
  • Given O1 = v1, ..., On = vn
  • Find hi that maximizes P(H = hi | O1 = v1,
    ..., On = vn)
  • Find argmax_hi P(H = hi) Πj P(Oj = vj | H = hi)

24
Naïve Bayes (cont)
  • Normalizing term
  • (No need to compute, as same for all hi)
  • Easy to use for Classification
  • Can use even if some vj's not specified
    (see the sketch below)
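A hedged sketch of the classification rule on these two slides (names are illustrative): maximize P(hi) Πj P(Oj = vj | hi), skipping any unobserved Oj, which is exactly why partial evidence is no problem:

```python
import math

def naive_bayes_argmax(prior, likelihood, obs):
    """prior: {h: P(h)};  likelihood: {(j, v, h): P(O_j = v | H = h)};
    obs: {j: v} -- observations missing from obs are simply skipped.
    Log-space avoids underflow; the normalizer is omitted, as it is
    the same for every hi."""
    def score(h):
        s = math.log(prior[h])
        for j, v in obs.items():
            s += math.log(likelihood[(j, v, h)])
        return s
    return max(prior, key=score)
```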

25
Bigger Networks
P(I=1)
0.20

P(H=1)
0.32

g  lt   P(H=1 | g, lt)
1  1    0.82
1  0    0.10
0  1    0.45
0  0    0.04

h   P(J=1 | h)
1   0.8
0   0.3

h   P(B=1 | h)
1   0.98
0   0.01
  • Intuition: Show CAUSAL connections
  • GeneticPH CAUSES Hepatitis; Hepatitis CAUSES
    Jaundice

26
Belief Nets
  • DAG structure
  • Each node ≈ Variable v
  • v depends (only) on its parents
  • conditional prob P(vi | parents(vi)) ∈ [0, 1]
  • v is INDEPENDENT of non-descendants,
  • given assignments to its parents
    (a sketch of the resulting factorization follows)
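This definition pins down the whole joint distribution; a minimal sketch of the resulting factorization (helper and argument names are assumptions of this write-up):

```python
def joint_prob(assignment, parents, cpt):
    """P(full assignment) = product over nodes of P(v_i | parents_i).
    assignment: {node: value}
    parents:    {node: ordered list of parent nodes}
    cpt:        {node: {(value, tuple of parent values): prob}}"""
    p = 1.0
    for node, value in assignment.items():
        pv = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][(value, pv)]
    return p
```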

27
Less Trivial Situations
  • N.b., obs1 is not always independent of obs2,
    given H
  • E.g., FamilyHistoryDepression causes
    MotherSuicide and Depression
  • MotherSuicide causes Depression (w/ or w/o
    F.H.Depression)
  • Here, P( D | MS, FHD ) ≠ P( D | FHD ) !
  • Can be done using Belief Network,
  • but need to specify
  • P( FHD ): 1 value
  • P( MS | FHD ): 2 values
  • P( D | MS, FHD ): 4 values

28
Example: Car Diagnosis
29
MammoNet
30
ALARM
  • A Logical Alarm Reduction Mechanism
  • 8 diagnoses, 16 findings, ...

31
Troop Detection
32
ARCO1: Forecasting Oil Prices
33
ARCO1: Forecasting Oil Prices
34
Forecasting Potato Production
35
Warning System
36
Extensions
  • Find best values (posterior distr.) for
  • SEVERAL (> 1) output variables
  • Partial specification of input values
  • only subset of variables
  • only distribution of each input variable
  • General Variables
  • Discrete, but domain size > 2
  • Continuous (Gaussian: x = Σi bi yi for
    parents Y )
  • Decision Theory ⇒ Decision Nets (Influence
    Diagrams): Making Decisions, not just assigning
    probs
  • Storing P( v | p1, p2, ..., pk ): General CP Tables
    O(2^k); Noisy-Or, Noisy-And, Noisy-Max; Decision
    Trees (see the noisy-OR sketch below)
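As one example of these compact CP-table forms, here is a minimal noisy-OR sketch (parameter names are illustrative): each parent carries a single activation probability, so k numbers replace a 2^k table:

```python
def noisy_or(parent_values, p, leak=0.0):
    """P(child=1 | parents) = 1 - (1-leak) * prod over active
    parents i of (1 - p[i]); p[i] is the chance that parent i
    alone turns the child on."""
    q = 1.0 - leak
    for i, on in enumerate(parent_values):
        if on:
            q *= 1.0 - p[i]
    return 1.0 - q

print(noisy_or([1, 1, 0], p=[0.8, 0.6, 0.9]))   # 0.92, from 3 numbers
```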

37
Outline
  • Existing uses of Belief Nets (BNs)
  • How to reason with BNs
  • Specific Examples of BNs
  • Contrast with Rules, Neural Nets, ...
  • Possible applications of BNs
  • Challenges
  • How to reason efficiently
  • How to learn BNs

38
Belief Nets vs Rules
  • Both have Locality
  • Specific clusters (rules / connected nodes)

WHY? Easier for people to reason CAUSALLY, even
if use is DIAGNOSTIC
  • BNs provide an OPTIMAL way to deal with
  • Uncertainty
  • Vagueness (var not given, or only dist)
  • Error
  • Signals meeting Symbols
  • BN permits different directions of inference

39
Belief Nets vs Neural Nets
  • Both have graph structure, but
  • BN: Nodes have SEMANTICS,
  • Combination Rules: Sound Probability
  • NN: Nodes arbitrary,
  • Combination Rules: Arbitrary
  • So harder to
  • Initialize NN
  • Explain NN
  • (But perhaps easier to learn NN from examples
    only?)
  • BNs can deal with
  • Partial Information
  • Different directions of inference

40
Belief Nets vs Markov Nets
  • Each uses graph structure
  • to FACTOR a distribution
  • explicitly specify dependencies (and implicitly,
    independencies)
  • but subtle differences
  • BNs capture causality, hierarchies
  • MNs capture temporality

41
Uses of Belief Nets, 1
  • Medical Diagnosis: Assist/Critique MD
  • identify diseases not ruled-out
  • specify additional tests to perform
  • suggest appropriate/cost-effective treatments
  • react to MD's proposed treatment
  • Decision Support: Find/repair faults in complex
    machines
  • Device, or Manufacturing Plant, or ...
  • based on sensors, recorded info, history, ...
  • Preventative Maintenance
  • Anticipate problems in complex machines
  • Device, or Manufacturing Plant, or ...
  • based on sensors, statistics, recorded info,
    device history, ...

42
Uses (cont)
  • Logistics Support: Stock warehouses
    appropriately, based on (estimated) freq. of
    needs, costs, ...
  • Diagnose Software: Find most probable bugs,
    given program behavior, core dump, source code, ...
  • Part Inspection/Classification: based on
    multiple sensors, background, model of
    production, ...
  • Information Retrieval: Combine information from
    various sources, based on info from various
    agents, ...

General: Partial Info, Sensor fusion
- Classification - Interpretation - Prediction - ...
43
Challenge 1: Computational Efficiency

For a given BN, the general problem is: Given
O1 = v1, ..., On = vn, compute
P(H | O1 = v1, ..., On = vn).
- If BN is a poly-tree ⇒ efficient algorithm.
- If BN is a general DAG (> 1 path from X to Y):
  - NP-hard in theory
  - slow in practice
Tricks: Get approximate answer (quickly); use
abstraction of BN; use abstraction of query (range).

[Figure: example network over nodes H, D, I, J, B]
44
2a: Obtaining Accurate BN
  • BN encodes distribution over n variables
  • Not O(2^n) values, but only Σi 2^{ki}
  • (Node ni binary, with ki parents)
  • Still lots of values! structure ...
  • ⇒ Qualitative Information
  • Structure: What depends on what?
  • Easy for people (background knowledge)
  • But NP-hard to learn from samples
  • ⇒ Quantitative Information
  • Actual CP-tables
  • Easy to learn, given lots of examples
  • But people have a hard time

Knowledge acquisition from human experts
Simple learning algorithm (a counting sketch follows)
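To make "easy to learn, given lots of examples" concrete, a counting sketch (function and field names are assumptions, not from the talk): with complete data, each CP-table entry is just a conditional frequency:

```python
from collections import Counter

def learn_cpt(data, child, parents):
    """data: list of complete {variable: value} rows.
    Returns estimates of P(child = v | parent values) by counting."""
    joint, marg = Counter(), Counter()
    for row in data:
        pv = tuple(row[p] for p in parents)
        joint[(row[child], pv)] += 1
        marg[pv] += 1
    return {(v, pv): n / marg[pv] for (v, pv), n in joint.items()}

# e.g. learn_cpt(rows, child="J", parents=["H"]) approaches the true
# P(J | H) table as the number of rows grows.
```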
45
Notes on Learning
  • Mixed Sources: Person provides structure;
  • Algorithm fills in numbers.
  • Just Human Expert: People produce CP-tables, as
    well as structure
  • Relatively few values really required
  • Esp. if NoisyOr, NoisyAnd, NaiveBayes, ...
  • Actual values not that important
  • Sensitivity studies

46
My Current Work
  • Learning Belief Nets
  • Model selection
  • Challenging the myth that MDL is the appropriate
    criterion
  • Learning a performance system, not a model
  • Validating Belief Nets
  • Error bars around answers
  • Adaptive User Interfaces
  • Efficient Vision Systems
  • Foundations of Learnability
  • Learning Active Classifiers
  • Sequential learners
  • Condition-Based Maintenance, Bio-signal
    interpretation, ...

47
2b: Maintaining Accurate BN
  • The world changes.
  • Information in BN may be
  • perfect at time t
  • sub-optimal at time t + 20
  • worthless at time t + 200
  • Need to MAINTAIN a BN over time
  • using on-going human consultant
  • Adaptive BN
  • Dirichlet distribution (variables)
  • Priors over BNs

48
Conclusions
  • Provide effective way to
  • Represent complicated, inter-related events
  • Reason about such situations
  • Diagnosis, Explanation, ValueOfInfo
  • Explain conclusions
  • Mix Symbolic and Numeric observations
  • Challenges
  • Efficient ways to use BNs
  • Ways to create BNs
  • Ways to maintain BNs
  • Reason about time

49
Extra Slides
  • AI Seminar
  • Friday, noon, CSC 3-33
  • Free PIZZA!
  • http://www.cs.ualberta.ca/ai/ai-seminar.html
  • References
  • http://www.cs.ualberta.ca/~greiner/bn.html
  • Crusher Controller
  • Formal Framework
  • Decision Nets
  • Developing the Model
  • Why Reasoning is Hard
  • Learning Accurate Belief Nets

50
References
  • http://www.cs.ualberta.ca/~greiner/bn.html
  • Overview textbooks
  • Judea Pearl, Probabilistic Reasoning in
    Intelligent Systems: Networks of Plausible
    Inference, Morgan Kaufmann, 1988.
  • Stuart Russell and Peter Norvig, Artificial
    Intelligence: A Modern Approach, Prentice Hall,
    1995. (See esp. Ch 14, 15, 19.)
  • General info re Bayes Nets
  • http://www.afit.af.mil:80/Schools/EN/ENG/LABS/AI/BayesianNetworks
  • Proceedings: http://www.sis.pitt.edu/~dsl/uai.html
  • Assoc for Uncertainty in AI: http://www.auai.org/
  • Learning
  • David Heckerman, A tutorial on learning with
    Bayesian networks, 1995,
  • http://www.research.microsoft.com/research/dtg/heckerma/TR-95-06.htm
  • Software
  • General: http://bayes.stat.washington.edu/almond/belief.html
  • JavaBayes: http://www.cs.cmu.edu/~fgcozman/Research/JavaBayes
  • Norsys: http://www.norsys.com/

51
Decision Net: Test/Buy a Car
52
Utility: Decision Nets
  • Given c(action, state) ∈ ℝ (cost function)
  • Cp(a) = Es[ c(a, s) ] = Σ s∈S p(s | obs) c(a, s)
  • Best (immediate) action: a* = argmin a∈A Cp(a)
    (see the sketch after this list)
  • Decision Net (like Belief Net), but
  • 3 types of nodes
  • chance (like Belief net)
  • action: repair, sensing
  • cost/utility
  • Links for dependency
  • Given observations obs, computes best action a
  • Sequence of Actions: MDPs, POMDPs, ...
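A small sketch of this expected-cost rule (the action/state names in the usage note are hypothetical):

```python
def best_action(actions, states, p_state_given_obs, cost):
    """Return a* = argmin_a sum_s P(s | obs) * c(a, s)."""
    def expected_cost(a):
        return sum(p_state_given_obs[s] * cost[(a, s)] for s in states)
    return min(actions, key=expected_cost)

# best_action(["repair", "wait"], ["ok", "faulty"],
#             {"ok": 0.7, "faulty": 0.3},
#             {("repair", "ok"): 100, ("repair", "faulty"): 100,
#              ("wait", "ok"): 0, ("wait", "faulty"): 500})
# -> "repair"   (expected cost 100 vs. 150 for "wait")
```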

Go Back
53
Decision Net: Drill for Oil?
Go Back
54
Formal Framework
  • Always true:
  • P(x1, ..., xn) = P(x1) P(x2 | x1) P(x3 | x2,
    x1) ... P(xn | xn-1, ..., x1)
  • Given independencies,
  • P(xk | x1, ..., xk-1) = P(xk | pa_k) for some
    pa_k ⊆ {x1, ..., xk-1}
  • Hence P(x1, ..., xn) = Πi P(xi | pa_i)
  • So just connect each y ∈ pa_i to xi ⇒ DAG
    structure

Note:
- Size of BN is Σi 2^|pa_i|,
  so better to use small pa_i.
- pa_i = {1, ..., i-1} is never incorrect,
  but seldom minimal (so hard to store,
  learn, reason with, ...)
- Order of variables can
  make a HUGE difference: can have |pa_i| = 1
  for one ordering, |pa_i| = i-1 for another
Go Back
55
Developing the Model
  • Sources of information
  • (Human) Expert(s)
  • Data from earlier runs
  • Simulator
  • Typical Process
  • 1. Develop / Refine Initial Prototype
  • 2. Test Prototype ⇒ Accurate System
  • 3. Deploy System
  • 4. Update / Maintain System

56
Develop/Refine Prototype
  • Requires expert
  • useful to have data
  • Initial Interview(s)
  • To establish what relates to what
  • Expert time: ~½ day
  • Iterative process (Gradual refinement)
  • To refine qualitative connections
  • To establish correct operations

Expert presents Good Performance; KE implements
Expert's claims; KE tests on examples (real data
or expert), and reports to Expert
Expert time: 1-2 hours / week for ??
weeks (Depends on complexity of device, and
accuracy of model)
Go Back
57
Why Reasoning is Hard
  • BN reasoning may look easy
  • Just propagate information from node to node

P(Z=t)
0.5

z   P(B=t | Z=z)
t   0.0
f   1.0

z   P(A=t | Z=z)
t   1.0
f   0.0

a  b   P(C=t | a, b)
t  t   1.0
t  f   0.0
f  t   0.0
f  f   0.0
  • Challenge: What is P(C=t)?

(Net: Z → A, Z → B; A, B → C.)
Here P(A=t) = P(B=t) = ½.
So P(C=t) = P(A=t, B=t) =? P(A=t) P(B=t) = ½ · ½ = ¼ ?
Wrong: P(C=t) = 0 !
  • Need to maintain dependencies! P(A=t, B=t)
    = P(A=t) P(B=t | A=t)
    (a small sketch follows)
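The trap is easy to reproduce; a short sketch using the slide's deterministic CPTs:

```python
# Marginally P(A=t) = P(B=t) = 1/2, but A = Z and B = NOT Z, so
# C = A AND B can never be true.  Multiplying marginals (1/4)
# silently drops the dependence routed through Z.
def p_c_true():
    total = 0.0
    for z in (0, 1):
        a, b = z, 1 - z         # P(A=t|Z) = z,  P(B=t|Z) = 1 - z
        total += 0.5 * (a & b)  # P(Z=t) = P(Z=f) = 0.5; C = A AND B
    return total

print(p_c_true())               # 0.0 -- not the naive 0.25
```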

Go Back
58
Crusher Controller
  • Given observations
  • History, sensor readings, schedule, ...
  • Specify best action for crusher
  • stop immediately, increase roller speed by ?, ...
  • Best = minimize expected cost
  • Initially: just recommendation to human operator
  • Later: Directly implement (some) actions
  • Request values of other sensors?

59
Approach
  • For each state s
  • (Good flow, tooth about to enter, ...)
  • for each action a
  • (Stop immediately, Change p7 = 0.32, ...)
  • determine utility of performing a in s
  • (Cost of lost production if stopped;
  • of reduced production efficiency if continued ...)
  • Use observations to estimate (dist over) current
    states
  • Infer EXPECTED UTILITY of each action, based on
    distr.
  • Return action with highest Expected Utility

60
Details
  • Inputs
  • Sensor Readings (history)
  • Camera, microphone, power-draw
  • Parameter settings
  • Log files, Maintenance records
  • Schedule (maintenance, anticipated load, )
  • Outputs
  • Continue as is
  • Adjust parameters
  • GapSize, ApronFeederSpeed, 1J_ConveyorSpeed
  • Shut down immediately
  • Stop adding new material
  • Tell operator to look
  • State: Crusher / Environment
  • UncrushableThingsNowInCrusher
  • TeethMissing
  • NextUncrushableEntry
  • Control Parameters

61
Benefits
  • Increase Crusher Effectiveness
  • Find best settings for parameters
  • To maximize production of well-sized chunks
  • Reduce Down Time
  • Know when maintenance/repair is critical
  • Reduce Damage to Crusher
  • Usable Model of Crusher
  • Easy to modify when needed
  • Training
  • Design of next generation
  • Prototype for design of controllers and
    diagnosticians for other machines

Go Back
62
My Background
  • PhD, Stanford (Computer Science)
  • Representational issues, Analogical Inference
  • everything in Logic
  • PostDoc at U of Toronto (CS)
  • Foundations of learnability, logical inference,
    DB, control theory, ...
  • everything in Logic
  • Industrial research (Siemens Corporate Research)
  • Need to solve REAL problems
  • Theory Revision, Navigational systems, ...
  • logic is not the be-all-and-end-all!
  • Prof at U of Alberta (CS)
  • Industrial problems (Siemens, BioTools, Syncrude)
  • Foundations of learnability, probabilistic
    inference

63
Less Trivial Situations
  • N.b., obs1 is not always independent of obs2,
    given H
  • E.g., FamilyHistoryDepression causes
    MotherSuicide and Depression
  • MotherSuicide causes Depression (w/ or w/o
    F.H.Depression)

(Net: FHD → MS, FHD → D, MS → D)

f   P(MS=1 | FHD=f)
1   0.10
0   0.03

f  m   P(D=1 | FHD=f, MS=m)
1  1   0.97
1  0   0.90
0  1   0.08
0  0   0.04
  • Here, P( D | MS, FHD ) ≠ P( D | FHD ) !
  • Can be done using Belief Network,
  • but need to specify
  • P( FHD ): 1 value
  • P( MS | FHD ): 2 values
  • P( D | MS, FHD ): 4 values