Reasoning Under Uncertainty - PowerPoint PPT Presentation

Slides: 45
Provided by: peopleCs80

Transcript and Presenter's Notes

Title: Reasoning Under Uncertainty


1
Reasoning Under Uncertainty
  • Artificial Intelligence
  • CSPP 56553
  • February 18, 2004

2
Agenda
  • Motivation
  • Reasoning with uncertainty
  • Medical Informatics
  • Probability and Bayes Rule
  • Bayesian Networks
  • Noisy-Or
  • Decision Trees and Rationality
  • Conclusions

3
Uncertainty
  • Search and Planning Agents
  • Assume fully observable, deterministic, static
  • Real World
  • Probabilities capture ignorance and laziness
  • Lack relevant facts, conditions
  • Failure to enumerate all conditions, exceptions
  • Partially observable, stochastic, extremely
    complex
  • Can't be sure of success, agent will maximize
  • Bayesian (subjective) probabilities relate to
    knowledge

4
Motivation
  • Uncertainty in medical diagnosis
  • Diseases produce symptoms
  • In diagnosis, observed symptoms => disease ID
  • Uncertainties
  • Symptoms may not occur
  • Symptoms may not be reported
  • Diagnostic tests not perfect
  • False positive, false negative
  • How do we estimate confidence?

5
Motivation II
  • Uncertainty in medical decision-making
  • Physicians, patients must decide on treatments
  • Treatments may not be successful
  • Treatments may have unpleasant side effects
  • Choosing treatments
  • Weigh risks of adverse outcomes
  • People are BAD at reasoning intuitively about
    probabilities
  • Provide systematic analysis

6
Probability Basics
  • The sample space
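  • A set $\Omega = \{\omega_1, \omega_2, \omega_3, \ldots, \omega_n\}$
  • E.g. 6 possible rolls of a die
  • $\omega_i$ is a sample point/atomic event
  • Probability space/model is a sample space with an
    assignment $P(\omega)$ for every $\omega \in \Omega$ s.t.
    $0 \le P(\omega) \le 1$ and $\sum_{\omega} P(\omega) = 1$
  • E.g. $P(\text{die roll} < 4) = 1/6 + 1/6 + 1/6 = 1/2$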

7
Random Variables
  • A random variable is a function from sample
    points to a range (e.g. reals, bools)
  • E.g. $\text{Odd}(1) = \text{true}$
  • P induces a probability distribution for any r.v.
    X
  • $P(X = x_i) = \sum_{\omega : X(\omega) = x_i} P(\omega)$
  • E.g. $P(\text{Odd} = \text{true}) = 1/6 + 1/6 + 1/6 = 1/2$
  • A proposition is the event (set of sample pts) where the
    proposition is true, e.g. event $a = \{\omega : A(\omega) = \text{true}\}$
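
The die example from the last two slides can be checked in a few lines of Python. This is a minimal sketch; the names `omega`, `P`, `prob`, and `odd` are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Sample space: the six faces of a fair die, each with probability 1/6.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """Probability of an event, given as a predicate over sample points."""
    return sum(P[w] for w in omega if event(w))

def odd(w):
    """Random variable Odd: maps a sample point to a boolean."""
    return w % 2 == 1

total = sum(P.values())                 # must equal 1 for a valid model
p_less_than_4 = prob(lambda w: w < 4)   # sums P over the points 1, 2, 3
p_odd = prob(odd)                       # sums P over the points 1, 3, 5
print(total, p_less_than_4, p_odd)
```

Exact fractions are used so that the sums come out as rational numbers rather than rounded floats.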

8
Why probabilities?
  • Definitions imply that logically related events
    have related probabilities
  • In AI applications, sample points are defined by
    set of random variables
  • Random vars: boolean, discrete, continuous

9
Prior Probabilities
  • Prior probabilities: belief prior to evidence
  • E.g. $P(\text{cavity} = t) = 0.2$, $P(\text{weather} = \text{sunny}) = 0.6$
  • Distribution gives values for all assignments
  • Joint distribution on set of r.v.s gives
    probability on every atomic event of r.v.s
  • E.g. $P(\text{weather}, \text{cavity})$ = 4x2 matrix of values
  • Every question about a domain can be answered
    with joint b/c every event is a sum of sample pts

10
Conditional Probabilities
  • Conditional (posterior) probabilities
  • E.g. $P(\text{cavity} \mid \text{toothache}) = 0.8$, given only that
  • $P(\text{Cavity} \mid \text{Toothache})$ = 2-element vector of 2-element vectors
  • Can add new evidence, possibly irrelevant
  • $P(a \mid b) = P(a, b) / P(b)$ where $P(b) \neq 0$
  • Also, $P(a, b) = P(a \mid b) P(b) = P(b \mid a) P(a)$
  • Product rule generalizes to chaining
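
For reference, the chaining mentioned in the last bullet can be written out as follows. The three-variable line applies the product rule twice; the general form is the standard chain rule, with the variable names $x_1, \ldots, x_n$ chosen here for illustration.

```latex
P(a, b, c) = P(a \mid b, c)\, P(b, c) = P(a \mid b, c)\, P(b \mid c)\, P(c)

P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})
```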

11
Inference By Enumeration
12
Inference by Enumeration
13
Inference by Enumeration
14
Independence
15
Conditional Independence
16
Conditional Independence II
17
Probabilities Model Uncertainty
  • The World - Features
  • Random variables
  • Feature values
  • States of the world
  • Assignments of values to variables
  • Exponential in # of variables
  • $2^n$ possible states for $n$ binary variables

18
Probabilities of World States
  • Joint probability of assignments
  • States are distinct and exhaustive
  • Typically care about SUBSET of assignments
  • aka Circumstance
  • Exponential in # of don't cares

19
A Simpler World
  • $2^n$ world states: maximum entropy
  • Know nothing about the world
  • Many variables independent
  • $P(\text{strep}, \text{ebola}) = P(\text{strep}) P(\text{ebola})$
  • Conditionally independent
  • Depend on same factors but not on each other
  • $P(\text{fever}, \text{cough} \mid \text{flu}) = P(\text{fever} \mid \text{flu}) P(\text{cough} \mid \text{flu})$

20
Probabilistic Diagnosis
  • Question
  • How likely is a patient to have a disease if they
    have the symptoms?
  • Probabilistic model: Bayes' rule
  • $P(D \mid S) = P(S \mid D) P(D) / P(S)$
  • Where
  • $P(S \mid D)$: probability of symptom given disease
  • $P(D)$: prior probability of having disease
  • $P(S)$: prior probability of having symptom

21
Diagnosis
  • Consider Meningitis
  • Disease: meningitis, m
  • Symptom: stiff neck, s
  • $P(s \mid m) = 0.5$
  • $P(m) = 0.0001$
  • $P(s) = 0.1$
  • How likely is it that someone with a stiff neck
    actually has meningitis?
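
The question can be answered by plugging the three numbers on this slide into Bayes' rule, reading them as P(s | m) = 0.5, P(m) = 0.0001, and P(s) = 0.1. A minimal sketch, with variable names chosen here for illustration:

```python
# Values from the slide.
p_s_given_m = 0.5   # P(s | m): stiff neck given meningitis
p_m = 0.0001        # P(m): prior probability of meningitis
p_s = 0.1           # P(s): prior probability of a stiff neck

# Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(f"P(m | s) = {p_m_given_s:.4f}")
```

The posterior is about 0.0005, or 1 in 2000: five times the prior, but still small because meningitis is so rare to begin with.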

22
Modeling (In)dependence
  • Simple, graphical notation for conditional
    independence; compact spec of joint
  • Bayesian network
  • Nodes: variables
  • Directed acyclic graph: link means "directly
    influences"
  • Arcs: child depends on parent(s)
  • No arcs: independent (0 incoming: only a priori)
  • Parents of X
  • For each X need

23
Example I
24
Simple Bayesian Network
  • MCBN1

Need:         P(A)   P(B | A)   P(C | A)   P(D | B, C)   P(E | C)
Truth table:  2      2*2        2*2        2*2*2         2*2
A only a priori; B depends on A; C depends on A; D
depends on B, C; E depends on C
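
The table sizes on this slide follow directly from the parent sets. The sketch below recomputes them and, as an addition not on the slide, compares the count of independent parameters against the full joint over five binary variables. The dictionary `parents` and the other names are illustrative.

```python
# Parents of each node, as listed on the slide.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}

# Every variable is binary, so a node with p parents has 2 * 2**p table entries.
table_sizes = {node: 2 * 2 ** len(ps) for node, ps in parents.items()}
network_entries = sum(table_sizes.values())

# Each row of a table sums to 1, so only one number per row is free.
network_params = sum(2 ** len(ps) for ps in parents.values())
joint_params = 2 ** len(parents) - 1   # full joint over 5 binary variables

print(table_sizes)
print(network_entries, network_params, joint_params)
```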
25
Simplifying with Noisy-OR
  • How many computations?
  • p parents, k values for variable
  • $(k-1)k^p$
  • Very expensive! 10 binary parents: $2^{10} = 1024$
  • Reduce computation by simplifying model
  • Treat each parent as possible independent cause
  • Only 11 computations
  • 10 causal probabilities + leak probability
  • "Some other cause"
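
A minimal sketch of the noisy-OR combination rule described above: the effect is absent only if the leak and every active cause all fail to produce it. The function name `noisy_or` and the numbers (0.2 per cause, 0.05 leak) are hypothetical and are not taken from the slides.

```python
def noisy_or(causal_probs, active, leak=0.0):
    """P(effect = true | parents) under a noisy-OR model.

    causal_probs[i]: probability that parent i alone produces the effect.
    active[i]: whether parent i is present.
    leak: probability that some other, unmodeled cause produces the effect.
    """
    p_no_effect = 1.0 - leak
    for c, on in zip(causal_probs, active):
        if on:
            p_no_effect *= 1.0 - c
    return 1.0 - p_no_effect

# Hypothetical model: 10 binary parents, each with causal probability 0.2.
causal = [0.2] * 10
leak = 0.05

p_none = noisy_or(causal, [False] * 10, leak)          # only the leak acts
p_one = noisy_or(causal, [True] + [False] * 9, leak)   # one cause present
p_all = noisy_or(causal, [True] * 10, leak)            # every cause present

full_table_rows = 2 ** len(causal)   # parent combinations in a full table
noisy_or_params = len(causal) + 1    # causal probabilities plus the leak
print(p_none, p_one, p_all, full_table_rows, noisy_or_params)
```

With this rule, any of the 1024 rows of the full table can be generated on demand from the 11 stored numbers.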

26
Noisy-OR Example
P(b | a)
        b      ¬b
  a     0.6    0.4
  ¬a    0.5    0.5
27
Noisy-OR Example II
Full model: P(d | a, b), P(d | a, ¬b), P(d | ¬a, b), P(d | ¬a, ¬b), plus negations
Assume P(a)0.1 P(b)0.05 P(dab)0.3
0.5 P(db) 0.7
28
Graph Models
  • Bipartite graphs
  • E.g. medical reasoning
  • Generally, diseases cause symptoms (not the reverse)

29
Topologies
  • Generally more complex
  • Polytree: one path between any two nodes
  • General Bayes Nets
  • Graphs with undirected cycles
  • No directed cycles - can't be own cause
  • Issue: automatic net acquisition
  • Update probabilities by observing data
  • Learn topology: use statistical evidence of
    independence, heuristic search to find most probable
    structure

30
Holmes Example (Pearl)
Holmes is worried that his house will be burgled.
For the time period of interest, there is a
$10^{-4}$ a priori chance of this happening, and
Holmes has installed a burglar alarm to try to
forestall this event. The alarm is 95% reliable
in sounding when a burglary happens, but also has
a false positive rate of 1%. Holmes' neighbor,
Watson, is 90% sure to call Holmes at his office
if the alarm sounds, but he is also a bit of a
practical joker and, knowing Holmes' concern,
might (30%) call even if the alarm is silent.
Holmes' other neighbor, Mrs. Gibbons, is a
well-known lush and often befuddled, but Holmes
believes that she is four times more likely to
call him if there is an alarm than not.
31
Holmes Example Model
There are four binary random variables:
  B: whether Holmes' house has been burgled
  A: whether his alarm sounded
  W: whether Watson called
  G: whether Gibbons called
32
Holmes Example Tables
P(B)
  B=t      B=f
  0.0001   0.9999

P(A | B)
  B    A=t    A=f
  t    0.95   0.05
  f    0.01   0.99

P(W | A)
  A    W=t    W=f
  t    0.90   0.10
  f    0.30   0.70

P(G | A)
  A    G=t    G=f
  t    0.40   0.60
  f    0.10   0.90
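
With these four tables, any query about the Holmes network can be answered by enumeration over the joint. The sketch below assumes the structure implied by the tables (B influences A; A influences W and G) and computes the probability of a burglary given that Watson, or both neighbors, called. Function and variable names are illustrative.

```python
from itertools import product

# Probability of "true" for each variable, read off the tables.
P_B = 0.0001
P_A_given_B = {True: 0.95, False: 0.01}
P_W_given_A = {True: 0.90, False: 0.30}
P_G_given_A = {True: 0.40, False: 0.10}

def bern(p_true, value):
    """Probability that a binary variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true

def joint(b, a, w, g):
    """P(b, a, w, g) = P(b) P(a | b) P(w | a) P(g | a)."""
    return (bern(P_B, b) * bern(P_A_given_B[b], a)
            * bern(P_W_given_A[a], w) * bern(P_G_given_A[a], g))

def posterior_burglary(evidence):
    """P(B = true | evidence), summing the joint over unobserved variables."""
    totals = {True: 0.0, False: 0.0}
    for b, a, w, g in product([True, False], repeat=4):
        world = {"B": b, "A": a, "W": w, "G": g}
        if all(world[name] == val for name, val in evidence.items()):
            totals[b] += joint(b, a, w, g)
    return totals[True] / (totals[True] + totals[False])

p_watson = posterior_burglary({"W": True})
p_both = posterior_burglary({"W": True, "G": True})
print(p_watson, p_both)
```

Even with both neighbors calling, the posterior stays near one in a thousand, because the prior is tiny and Watson's 30% prank rate makes his call weak evidence.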
33
Decision Making
  • Design model of rational decision making
  • Maximize expected value among alternatives
  • Uncertainty from
  • Outcomes of actions
  • Choices taken
  • To maximize outcome
  • Select maximum over choices
  • Weighted average value of chance outcomes

34
Gangrene Example
Decision: Amputate foot or Medicine
  Amputate foot
    Live 0.99: 850
    Die 0.01: 0
  Medicine
    Full Recovery 0.7: 1000
    Die 0.05: 0
    Worse 0.25: decide again, Amputate leg or Medicine
      Amputate leg
        Live 0.98: 700
        Die 0.02: 0
      Medicine
        Live 0.6: 995
        Die 0.4: 0
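
The tree can be evaluated by averaging over chance outcomes and taking the maximum over choices. The sketch below assumes the tree reads as follows: the first decision is between amputating the foot (live 0.99 worth 850, die 0.01 worth 0) and medicine (full recovery 0.7 worth 1000, die 0.05 worth 0, worse 0.25), and "worse" leads to a second decision between amputating the leg (live 0.98 worth 700) and more medicine (live 0.6 worth 995). The data layout and function names are illustrative.

```python
# Leaf: a number. Chance node: list of (probability, subtree).
# Decision node: dict mapping each choice to a subtree.
second_decision = {
    "amputate leg": [(0.98, 700), (0.02, 0)],
    "medicine": [(0.6, 995), (0.4, 0)],
}
tree = {
    "amputate foot": [(0.99, 850), (0.01, 0)],
    "medicine": [(0.7, 1000), (0.25, second_decision), (0.05, 0)],
}

def value(node):
    """Expected value of a node: max over choices, average over outcomes."""
    if isinstance(node, dict):
        return max(value(child) for child in node.values())
    if isinstance(node, list):
        return sum(p * value(child) for p, child in node)
    return node

def best_choice(decision):
    """Choice with the highest expected value at a decision node."""
    return max(decision, key=lambda choice: value(decision[choice]))

for choice, subtree in tree.items():
    print(choice, value(subtree))
print("best:", best_choice(tree))
```

Under these numbers, trying medicine first has the higher expected value (871.5 versus 841.5), with amputating the leg as the fallback if the patient gets worse.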
35
Decision Tree Issues
  • Problem 1: Tree size
  • k activities: $2^k$ orders
  • Solution 1: Hill-climbing
  • Choose best apparent choice after one step
  • Use entropy reduction
  • Problem 2: Utility values
  • Difficult to estimate, Sensitivity, Duration
  • Change value depending on phrasing of question
  • Solution 2c: Model effect of outcome over lifetime

36
Conclusion
  • Reasoning with uncertainty
  • Many real systems uncertain - e.g. medical
    diagnosis
  • Bayes Nets
  • Model (in)dependence relations in reasoning
  • Noisy-OR simplifies model/computation
  • Assumes causes independent
  • Decision Trees
  • Model rational decision making
  • Maximize outcome: max over choices, average over outcomes

40
Bayesian Spam Filtering
  • Automatic Text Categorization
  • Probabilistic Classifier
  • Conditional Framework
  • Naïve Bayes Formulation
  • Independence assumptions galore
  • Feature Selection
  • Classification Evaluation

41
Spam Classification
  • Text categorization problem
  • Given a message, M, is it Spam or NotSpam?
  • Probabilistic framework
  • $P(\text{Spam} \mid M) > P(\text{NotSpam} \mid M)$
  • $P(\text{Spam} \mid M) = P(\text{Spam}, M) / P(M)$
  • $P(\text{NotSpam} \mid M) = P(\text{NotSpam}, M) / P(M)$
  • Which is more likely?

42
Characterizing a Message
  • Represent message M as set of features
  • Features: $a_1, a_2, \ldots, a_n$
  • What features?
  • Words! (again)
  • Alternatively (skip) n-gram sequences
  • Stemmed (?)
  • Term frequencies: N(W, Spam), N(W, NotSpam)
  • Also, N(Spam), N(NotSpam): # of words in each class

43
Characterizing a Message II
  • Estimating term conditional probabilities
  • Selecting good features
  • Exclude terms s.t.
  • $N(W, \text{Spam}) + N(W, \text{NotSpam}) < 4$
  • $0.45 < P(W \mid \text{Spam}) / (P(W \mid \text{Spam}) + P(W \mid \text{NotSpam})) < 0.55$
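
The two exclusion rules can be sketched as a filter over term counts. The slide does not show how the term conditional probabilities are estimated, so the plain relative frequency used below is an assumption, as are the function name `keep_term` and the example counts.

```python
def keep_term(n_spam, n_notspam, total_spam, total_notspam):
    """Return True if a term survives both exclusion rules.

    n_spam, n_notspam: occurrences of the term in each class.
    total_spam, total_notspam: total word counts per class (must be > 0).
    """
    # Rule 1: drop rare terms (fewer than 4 occurrences overall).
    if n_spam + n_notspam < 4:
        return False
    # Assumption: estimate P(W | class) by relative frequency.
    p_w_spam = n_spam / total_spam
    p_w_notspam = n_notspam / total_notspam
    # Rule 2: drop terms that barely discriminate between the classes.
    ratio = p_w_spam / (p_w_spam + p_w_notspam)
    return not (0.45 < ratio < 0.55)

# Hypothetical counts, with 1000 words in each class.
rare = keep_term(1, 1, 1000, 1000)          # too few occurrences
neutral = keep_term(50, 50, 1000, 1000)     # ratio 0.5, uninformative
spammy = keep_term(90, 10, 1000, 1000)      # ratio 0.9, kept
print(rare, neutral, spammy)
```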

44
Naïve Bayes Formulation
  • Naïve Bayes (aka Idiot Bayes)
  • Assumes all features independent
  • Not accurate but useful simplification
  • So,
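  • $P(M, \text{Spam}) = P(a_1, a_2, \ldots, a_n, \text{Spam})$
  • $= P(a_1, a_2, \ldots, a_n \mid \text{Spam}) P(\text{Spam})$
  • $= P(a_1 \mid \text{Spam}) \cdots P(a_n \mid \text{Spam}) P(\text{Spam})$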
  • Likewise for NotSpam
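
The factorization above leads to a very short classifier: score each class by its prior times the product of per-word probabilities, and pick the larger. The sketch below works in log space to avoid underflow and uses add-one smoothing for unseen words; both are assumptions made here, not details from the slides, and the toy counts are hypothetical.

```python
import math

def log_joint(words, class_counts, prior, vocab_size):
    """log P(a1, ..., an, class) under the naive Bayes factorization.

    Assumption: P(word | class) is estimated with add-one smoothing.
    """
    class_total = sum(class_counts.values())
    score = math.log(prior)
    for w in words:
        count = class_counts.get(w, 0)
        score += math.log((count + 1) / (class_total + vocab_size))
    return score

def classify(words, spam_counts, notspam_counts, p_spam):
    """Return the class with the larger joint probability P(M, class)."""
    vocab_size = len(set(spam_counts) | set(notspam_counts))
    spam_score = log_joint(words, spam_counts, p_spam, vocab_size)
    notspam_score = log_joint(words, notspam_counts, 1.0 - p_spam, vocab_size)
    return "Spam" if spam_score > notspam_score else "NotSpam"

# Hypothetical term frequencies N(W, Spam) and N(W, NotSpam).
spam_counts = {"free": 20, "money": 15, "meeting": 1}
notspam_counts = {"free": 2, "money": 3, "meeting": 25}

label_a = classify(["free", "money"], spam_counts, notspam_counts, 0.5)
label_b = classify(["meeting"], spam_counts, notspam_counts, 0.5)
print(label_a, label_b)
```

Because P(M) is the same for both classes, comparing the joints P(M, Spam) and P(M, NotSpam) gives the same decision as comparing the posteriors.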

45
Experimentation (Pantel & Lin)
  • Training: 160 spam, 466 non-spam
  • Test: 277 spam, 346 non-spam
  • 230,449 training words; 60,434 spam
  • 12,228 terms; filtering reduces to 3,848

46
Results (P&L)
  • False positives: 1.16%
  • False negatives: 8.3%
  • Overall error: 4.33%
  • Simple approach, effective
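
The three rates are mutually consistent with the test set sizes given earlier (277 spam, 346 non-spam) if the figures are read as percentages. The error counts below are not stated on the slides; they are inferred here as the whole numbers that reproduce the reported rates, and should be treated as an assumption.

```python
test_spam, test_notspam = 277, 346   # test set sizes from the slides

# Assumed counts, inferred to match the reported rates (not on the slides).
false_negatives = 23   # spam messages classified as non-spam
false_positives = 4    # non-spam messages classified as spam

fn_rate = false_negatives / test_spam
fp_rate = false_positives / test_notspam
overall = (false_negatives + false_positives) / (test_spam + test_notspam)
print(f"FP {fp_rate:.2%}, FN {fn_rate:.2%}, overall {overall:.2%}")
```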

47
Variants
  • Features?
  • Model?
  • Explicit bias to certain error types
  • Address lists
  • Explicit rules