Reasoning Under Uncertainty

Slides: 45

Provided by: peopleCs80

Learn more at: http://people.cs.uchicago.edu

Transcript and Presenter's Notes

1
Reasoning Under Uncertainty

Artificial Intelligence
CSPP 56553
February 18, 2004

2
Agenda

Motivation
Reasoning with uncertainty
Medical Informatics
Probability and Bayes Rule
Bayesian Networks
Noisy-Or
Decision Trees and Rationality
Conclusions

3
Uncertainty

Search and Planning Agents
Assume fully observable, deterministic, static
Real World
Probabilities capture ignorance and laziness
Lack relevant facts, conditions
Failure to enumerate all conditions, exceptions
Partially observable, stochastic, extremely
complex
Can't be sure of success, agent will maximize
Bayesian (subjective) probabilities relate to
knowledge

4
Motivation

Uncertainty in medical diagnosis
Diseases produce symptoms
In diagnosis, observed symptoms => disease ID
Uncertainties
Symptoms may not occur
Symptoms may not be reported
Diagnostic tests not perfect
False positive, false negative
How do we estimate confidence?

5
Motivation II

Uncertainty in medical decision-making
Physicians, patients must decide on treatments
Treatments may not be successful
Treatments may have unpleasant side effects
Choosing treatments
Weigh risks of adverse outcomes
People are BAD at reasoning intuitively about
probabilities
Provide systematic analysis

6
Probability Basics

The sample space
A set Ω = {ω1, ω2, ω3, ..., ωn}
E.g. 6 possible rolls of a die
ωi is a sample point/atomic event
Probability space/model is a sample space with an
assignment P(ω) for every ω in Ω s.t.
0 ≤ P(ω) ≤ 1 and Σ_ω P(ω) = 1
E.g. P(die roll < 4) = 1/6 + 1/6 + 1/6 = 1/2

7
Random Variables

A random variable is a function from sample
points to a range (e.g. reals, booleans)
E.g. Odd(1) = true
P induces a probability distribution for any r.v.
X
P(X = xi) = Σ_{ω : X(ω) = xi} P(ω)
E.g. P(Odd = true) = 1/6 + 1/6 + 1/6 = 1/2
A proposition is the event (set of sample points)
where the proposition is true, e.g. event a = {ω : A(ω) = true}
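The die example on these two slides can be sketched in a few lines. This is a minimal illustration, not part of the original deck; the names `omega`, `P`, `prob`, and `odd` are chosen here for the sketch.

```python
from fractions import Fraction

# Sample space for one fair die: six equally likely sample points.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}


def prob(event):
    """Probability of an event, given as a predicate over sample points."""
    return sum(P[w] for w in omega if event(w))


def odd(w):
    """Random variable Odd: maps a sample point to a boolean."""
    return w % 2 == 1


p_less_than_4 = prob(lambda w: w < 4)  # event {1, 2, 3}
p_odd_true = prob(odd)                 # event {1, 3, 5}
print(p_less_than_4, p_odd_true)
```

Exact fractions are used so that the sums 1/6 + 1/6 + 1/6 come out as 1/2 without floating-point rounding.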

8
Why probabilities?

Definitions imply that logically related events
have related probabilities
In AI applications, sample points are defined by
set of random variables
Random variables: boolean, discrete, continuous

9
Prior Probabilities

Prior probabilities: belief prior to evidence
E.g. P(cavity = t) = 0.2, P(weather = sunny) = 0.6
Distribution gives values for all assignments
Joint distribution on a set of r.v.s gives the
probability of every atomic event on those r.v.s
E.g. P(weather, cavity) = 4x2 matrix of values
Every question about a domain can be answered
with the joint, because every event is a sum of sample points

10
Conditional Probabilities

Conditional (posterior) probabilities
E.g. P(cavity | toothache) = 0.8, given only that toothache is known
P(Cavity | Toothache) = 2-element vector of 2-element vectors
Can add new evidence, possibly irrelevant
P(a | b) = P(a, b) / P(b) where P(b) ≠ 0
Also, P(a, b) = P(a | b) P(b) = P(b | a) P(a)
Product rule generalizes to chaining
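Written out, the chaining mentioned on this slide is the standard chain rule, obtained by applying the product rule repeatedly. The variable names x_1, ..., x_n are notation assumed here, not taken from the slide.

```latex
P(x_1, \ldots, x_n)
  = P(x_n \mid x_1, \ldots, x_{n-1}) \, P(x_1, \ldots, x_{n-1})
  = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})
```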

11
Inference By Enumeration
12
Inference by Enumeration
13
Inference by Enumeration
14
Independence
15
Conditional Independence
16
Conditional Independence II
17
Probabilities Model Uncertainty

The World - Features
Random variables
Feature values
States of the world
Assignments of values to variables
Exponential in # of variables
2^n possible states (for n binary variables)

18
Probabilities of World States

Joint probability of assignments
States are distinct and exhaustive
Typically care about SUBSET of assignments
aka Circumstance
Exponential in # of don't cares

19
A Simpler World

2^n world states: maximum entropy
Know nothing about the world
Many variables independent
P(strep, ebola) = P(strep) P(ebola)
Conditionally independent
Depend on same factors but not on each other
P(fever, cough | flu) = P(fever | flu) P(cough | flu)

20
Probabilistic Diagnosis

Question
How likely is a patient to have a disease if they
have the symptoms?
Probabilistic Model: Bayes Rule
P(D | S) = P(S | D) P(D) / P(S)
Where
P(S | D): probability of symptom given disease
P(D): prior probability of having disease
P(S): prior probability of having symptom

21
Diagnosis

Consider Meningitis
Disease: meningitis (m)
Symptom: stiff neck (s)
P(s | m) = 0.5
P(m) = 0.0001
P(s) = 0.1
How likely is it that someone with a stiff neck
actually has meningitis?
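The question can be answered by plugging the three numbers from this slide into Bayes rule. The sketch below does only that; the function name `bayes_posterior` is an assumption made for the example.

```python
def bayes_posterior(p_s_given_d, p_d, p_s):
    """Bayes rule: P(D | S) = P(S | D) * P(D) / P(S)."""
    if p_s <= 0:
        raise ValueError("P(S) must be positive")
    return p_s_given_d * p_d / p_s


# Numbers from the slide: P(s | m) = 0.5, P(m) = 0.0001, P(s) = 0.1.
p_m_given_s = bayes_posterior(p_s_given_d=0.5, p_d=0.0001, p_s=0.1)
print(f"P(m | s) = {p_m_given_s:.4f}")  # prints "P(m | s) = 0.0005"
```

Even though half of meningitis patients have a stiff neck, the low prior keeps the posterior at about 1 in 2000.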

22
Modeling (In)dependence

Simple, graphical notation for conditional
independence: compact spec of joint
Bayesian network
Nodes: variables
Directed acyclic graph: link means "directly
influences"
Arcs: child depends on parent(s)
No arcs: independent (0 incoming: only a priori)
Parents of X
For each X need P(X | Parents(X))

23
Example I
24
Simple Bayesian Network

MCBN1

Need: P(A), P(B | A), P(C | A), P(D | B, C), P(E | C)
Truth table sizes: 2, 2x2, 2x2, 2x2x2, 2x2
A: only a priori; B depends on A; C depends on A; D
depends on B, C; E depends on C
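The table sizes follow directly from the parent sets. As a rough sketch (the dictionary `parents` and helper `table_size` are names invented here), one can count the entries each node needs and compare with the full joint over five binary variables:

```python
# Parents of each node in the five-node network from the slide.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["C"],
}


def table_size(node, values_per_variable=2):
    """Entries in the table for `node`: one per joint value of node and parents."""
    return values_per_variable ** (1 + len(parents[node]))


sizes = {node: table_size(node) for node in parents}
network_entries = sum(sizes.values())  # 2 + 4 + 4 + 8 + 4
full_joint_entries = 2 ** len(parents)  # every assignment to A..E
print(sizes, network_entries, full_joint_entries)
```

For a network this small the saving is modest (22 entries against 32), but the gap widens quickly as variables are added while parent sets stay small.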
25
Simplifying with Noisy-OR

How many computations?
p parents, k values for variable
(k-1)k^p
Very expensive! 10 binary parents: 2^10 = 1024
Reduce computation by simplifying model
Treat each parent as possible independent cause
Only 11 computations
10 causal probabilities + leak probability
Some other cause
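A minimal sketch of the noisy-OR combination described here: the effect is absent only if the leak and every active cause all fail to produce it. The function name and the numeric values (0.2 per cause, 0.5 leak) are assumptions for illustration, not values stated on this slide.

```python
def noisy_or(causal_probs, active, leak=0.0):
    """P(effect = true) under a noisy-OR model.

    causal_probs[i] is the probability that parent i alone produces the effect,
    active[i] says whether parent i is present, and leak covers
    "some other cause".
    """
    p_no_effect = 1.0 - leak
    for c, on in zip(causal_probs, active):
        if on:
            p_no_effect *= 1.0 - c
    return 1.0 - p_no_effect


# Hypothetical numbers, chosen only for illustration.
causal = [0.2] * 10
none_active = noisy_or(causal, [False] * 10, leak=0.5)
one_active = noisy_or(causal, [True] + [False] * 9, leak=0.5)

n_parameters = len(causal) + 1        # 10 causal probabilities + leak
n_full_table_rows = 2 ** len(causal)  # parent configurations in a full table
print(none_active, one_active, n_parameters, n_full_table_rows)
```

The model needs 11 numbers where a full conditional table over 10 binary parents needs 1024 rows, at the price of assuming the causes act independently.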

26
Noisy-OR Example
P(b | a)
      b     ¬b
a     0.6   0.4
¬a    0.5   0.5
27
Noisy-OR Example II
Full model: P(d | a, b), P(d | a, ¬b), P(d | ¬a, b), P(d | ¬a, ¬b), plus negations
Assume P(a)0.1 P(b)0.05 P(dab)0.3
0.5 P(db) 0.7
28
Graph Models

Bipartite graphs
E.g. medical reasoning
Generally, diseases cause symptoms (not the reverse)

29
Topologies

Generally more complex
Polytree: one path between any two nodes
General Bayes Nets
Graphs with undirected cycles
No directed cycles: a node can't be its own cause
Issue: automatic net acquisition
Update probabilities by observing data
Learn topology: use statistical evidence of
independence, heuristic search to find most probable
structure

30
Holmes Example (Pearl)
Holmes is worried that his house will be burgled.
For the time period of interest, there is a
10^-4 a priori chance of this happening, and
Holmes has installed a burglar alarm to try to
forestall this event. The alarm is 95% reliable
in sounding when a burglary happens, but also has
a false positive rate of 1%. Holmes' neighbor,
Watson, is 90% sure to call Holmes at his office
if the alarm sounds, but he is also a bit of a
practical joker and, knowing Holmes' concern,
might (30%) call even if the alarm is silent.
Holmes' other neighbor, Mrs. Gibbons, is a
well-known lush and often befuddled, but Holmes
believes that she is four times more likely to
call him if there is an alarm than not.
31
Holmes Example Model
There are four binary random variables: B, whether
Holmes' house has been burgled; A, whether his
alarm sounded; W, whether Watson called; G, whether
Gibbons called
32
Holmes Example Tables
P(B)
B=t      B=f
0.0001   0.9999

P(W | A)
A    W=t    W=f
t    0.90   0.10
f    0.30   0.70

P(A | B)
B    A=t    A=f
t    0.95   0.05
f    0.01   0.99

P(G | A)
A    G=t    G=f
t    0.40   0.60
f    0.10   0.90
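With these tables, questions such as "how likely is a burglary given that Watson called?" can be answered by enumeration. The sketch below assumes the tables are read as P(B), P(A | B), P(W | A), and P(G | A), with B the parent of A and A the parent of both W and G; the function names are invented for the example.

```python
from itertools import product

# Probability that each variable is true, as read from the slide tables.
P_B = 0.0001
P_A_GIVEN_B = {True: 0.95, False: 0.01}
P_W_GIVEN_A = {True: 0.90, False: 0.30}
P_G_GIVEN_A = {True: 0.40, False: 0.10}


def p(value, p_true):
    """Probability of a boolean value, given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(b, a, w, g):
    """P(B=b, A=a, W=w, G=g) from the network factorization."""
    return (p(b, P_B) * p(a, P_A_GIVEN_B[b])
            * p(w, P_W_GIVEN_A[a]) * p(g, P_G_GIVEN_A[a]))


def query_burglary(**evidence):
    """P(B = true | evidence), summing the joint over unobserved variables."""
    totals = {True: 0.0, False: 0.0}
    for b, a, w, g in product([True, False], repeat=4):
        world = {"a": a, "w": w, "g": g}
        if all(world[name] == value for name, value in evidence.items()):
            totals[b] += joint(b, a, w, g)
    return totals[True] / (totals[True] + totals[False])


print(query_burglary(w=True))          # Watson called
print(query_burglary(w=True, g=True))  # both neighbors called
```

Enumeration visits all 16 worlds, which is fine here but grows exponentially with the number of variables.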
33
Decision Making

Design model of rational decision making
Maximize expected value among alternatives
Uncertainty from
Outcomes of actions
Choices taken
To maximize outcome
Select maximum over choices
Weighted average value of chance outcomes

34
Gangrene Example
First decision
  Medicine
    Full recovery 0.7: 1000
    Worse 0.25: go to second decision
    Die 0.05: 0
  Amputate foot
    Live 0.99: 850
    Die 0.01: 0
Second decision (after Worse)
  Medicine
    Live 0.6: 995
    Die 0.4: 0
  Amputate leg
    Live 0.98: 700
    Die 0.02: 0
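The tree can be evaluated bottom-up: average over chance outcomes, then take the maximum over choices. The sketch assumes the flattened figure reads as a two-stage tree, with the first choice between medicine and amputating the foot, and a second choice between medicine and amputating the leg reached only on the "worse" branch. The helper names are invented here.

```python
def expected_value(outcomes):
    """Weighted average over (probability, value) pairs at a chance node."""
    return sum(prob * value for prob, value in outcomes)


def best_choice(choices):
    """Pick the (action, value) pair with the highest expected value."""
    return max(choices.items(), key=lambda item: item[1])


# Second decision, reached only if medicine makes things worse.
second = {
    "medicine": expected_value([(0.6, 995), (0.4, 0)]),
    "amputate leg": expected_value([(0.98, 700), (0.02, 0)]),
}
second_action, second_value = best_choice(second)

# First decision; the "worse" branch is worth the best second-stage value.
first = {
    "medicine": expected_value([(0.7, 1000), (0.25, second_value), (0.05, 0)]),
    "amputate foot": expected_value([(0.99, 850), (0.01, 0)]),
}
first_action, first_value = best_choice(first)
print(second_action, second_value, first_action, first_value)
```

Under this reading, trying medicine first has the higher expected value (871.5 against 841.5), with amputating the leg as the fallback if the patient gets worse.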
35
Decision Tree Issues

Problem 1: tree size
k activities: 2^k orders
Solution 1: hill-climbing
Choose best apparent choice after one step
Use entropy reduction
Problem 2: utility values
Difficult to estimate; sensitivity; duration
Value changes depending on phrasing of question
Solution 2: model effect of outcome over lifetime

36
Conclusion

Reasoning with uncertainty
Many real systems uncertain - e.g. medical
diagnosis
Bayes Nets
Model (in)dependence relations in reasoning
Noisy-OR simplifies model/computation
Assumes causes independent
Decision Trees
Model rational decision making
Maximize outcome: max over choices, average over outcomes

40
Bayesian Spam Filtering

Automatic Text Categorization
Probabilistic Classifier
Conditional Framework
Naïve Bayes Formulation
Independence assumptions galore
Feature Selection
Classification Evaluation

41
Spam Classification

Text categorization problem
Given a message, M, is it Spam or NotSpam?
Probabilistic framework
P(Spam | M) > P(NotSpam | M)
P(Spam | M) = P(Spam, M) / P(M)
P(NotSpam | M) = P(NotSpam, M) / P(M)
Which is more likely?

42
Characterizing a Message

Represent message M as set of features
Features: a1, a2, ..., an
What features?
Words! (again)
Alternatively (skip) n-gram sequences
Stemmed (?)
Term frequencies: N(W, Spam), N(W, NotSpam)
Also, N(Spam), N(NotSpam): # of words in each class

43
Characterizing a Message II

Estimating term conditional probabilities
Selecting good features
Exclude terms s.t.
N(W, Spam) + N(W, NotSpam) < 4
0.45 < P(W | Spam) / (P(W | Spam) + P(W | NotSpam)) < 0.55
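The two exclusion rules translate into a small filter. The slide does not show how P(W | Spam) is estimated, so the sketch assumes a plain relative frequency N(W, Class) / N(Class); the function name, its parameters, and the counts in the usage lines are hypothetical.

```python
def keep_term(n_spam, n_notspam, total_spam, total_notspam):
    """Return True if a term survives both exclusion rules."""
    # Rule 1: drop rare terms.
    if n_spam + n_notspam < 4:
        return False
    # Assumed estimator: unsmoothed relative frequency within each class.
    p_w_spam = n_spam / total_spam
    p_w_notspam = n_notspam / total_notspam
    # Rule 2: drop terms that barely discriminate between the classes.
    ratio = p_w_spam / (p_w_spam + p_w_notspam)
    return not (0.45 < ratio < 0.55)


# Hypothetical counts, for illustration only.
rare = keep_term(1, 1, total_spam=1000, total_notspam=1000)
uninformative = keep_term(50, 50, total_spam=1000, total_notspam=1000)
informative = keep_term(90, 10, total_spam=1000, total_notspam=1000)
print(rare, uninformative, informative)
```

Terms that are too rare give unreliable estimates, and terms with a ratio near 0.5 carry almost no evidence for either class.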

44
Naïve Bayes Formulation

Naïve Bayes (aka Idiot Bayes)
Assumes all features are independent given the class
Not accurate but useful simplification
So,
P(M, Spam) = P(a1, a2, ..., an, Spam)
= P(a1, a2, ..., an | Spam) P(Spam)
= P(a1 | Spam) ... P(an | Spam) P(Spam)
Likewise for NotSpam
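Since P(M) is the same for both classes, it is enough to compare the two joint probabilities. A minimal sketch of that comparison follows; it works in log space to avoid underflow on long messages, which is a design choice made here rather than something the slides state. All term probabilities and the prior are hypothetical.

```python
import math


def log_joint(features, term_probs, prior):
    """log P(a1, ..., an, Class) = log P(Class) + sum of log P(ai | Class)."""
    return math.log(prior) + sum(math.log(term_probs[a]) for a in features)


def classify(features, spam_probs, notspam_probs, p_spam):
    """Return 'Spam' if P(M, Spam) > P(M, NotSpam), else 'NotSpam'."""
    # Ignore features with no estimate in either class (assumed handling).
    known = [a for a in features if a in spam_probs and a in notspam_probs]
    spam_score = log_joint(known, spam_probs, p_spam)
    notspam_score = log_joint(known, notspam_probs, 1.0 - p_spam)
    return "Spam" if spam_score > notspam_score else "NotSpam"


# Hypothetical term probabilities and prior, for illustration only.
spam_probs = {"free": 0.05, "meeting": 0.001, "offer": 0.03}
notspam_probs = {"free": 0.005, "meeting": 0.02, "offer": 0.002}
label_1 = classify(["free", "offer"], spam_probs, notspam_probs, p_spam=0.25)
label_2 = classify(["meeting"], spam_probs, notspam_probs, p_spam=0.25)
print(label_1, label_2)
```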

45
Experimentation (Pantel & Lin)

Training: 160 spam, 466 non-spam
Test: 277 spam, 346 non-spam
230,449 training words; 60,434 spam
12,228 terms; filtering reduces to 3,848

46
Results (P&L)

False positives: 1.16%
False negatives: 8.3%
Overall error: 4.33%
Simple approach, effective

47
Variants

Features?
Model?
Explicit bias to certain error types
Address lists
Explicit rules
