1
Reasoning Under Uncertainty
  • Artificial Intelligence
  • Chapter 9

2
Part 2
Reasoning
3
Notation
  • Random variable (RV): a variable (uppercase) that
    takes on values (lowercase) from a domain of
    mutually exclusive and exhaustive values
  • A=a: a proposition, world state, event, effect,
    etc.
  • abbreviate P(A=true) to P(a)
  • abbreviate P(A=false) to P(¬a)
  • abbreviate P(A=value) to P(value)
  • abbreviate P(A≠value) to P(¬value)
  • Atomic event: a complete specification of the
    state of the world about which the agent is
    uncertain

4
Notation
  • P(a): prior probability that RV A = a, i.e. the
    degree of belief in proposition a in the absence
    of any other relevant information
  • P(a|e): conditional probability of A = a given
    E = e, i.e. the degree of belief in proposition a
    when all that is known is evidence e
  • P(A): probability distribution, i.e. the set of
    P(a_i) for all i
  • Joint probabilities are for conjunctions of
    propositions

5
Reasoning under Uncertainty
  • Rather than reasoning about the truth or falsity
    of a proposition, reason about the degree of
    belief that the proposition is true.
  • Use a knowledge base of known probabilities to
    determine probabilities for query propositions.

6
Reasoning under Uncertainty using Full Joint Distributions
  • Assume a simplified Clue game having two
    characters, two weapons and two rooms
  • each row in the table below is an atomic event
  • - exactly one of these must be true
  • - the list must be mutually exclusive
  • - the list must be exhaustive

Who    What   Where     Probability
plum   rope   hall      1/8
plum   rope   kitchen   1/8
plum   pipe   hall      1/8
plum   pipe   kitchen   1/8
green  rope   hall      1/8
green  rope   kitchen   1/8
green  pipe   hall      1/8
green  pipe   kitchen   1/8

  • the prior probability for each atomic event is 1/8
  • - each is equally likely
  • - e.g. P(plum, rope, hall) = 1/8

∑ P(atomic_event_i) = 1, since each RV's domain is
exhaustive and its values are mutually exclusive
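This table can be represented directly in code. A minimal Python sketch (not from the original slides; the dict name full_joint and the tuple encoding are illustrative assumptions):

# The simplified Clue full joint distribution, keyed by atomic events
# (who, what, where); every atomic event has prior probability 1/8.
full_joint = {
    (who, what, where): 1 / 8
    for who in ("plum", "green")
    for what in ("rope", "pipe")
    for where in ("hall", "kitchen")
}

# The atomic events are mutually exclusive and exhaustive, so they sum to 1.
assert abs(sum(full_joint.values()) - 1.0) < 1e-9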
7
Determining Marginal Probabilitiesusing Full
Joint Distributions
  • The probability of any proposition a is equal to
    the sum of the probabilities of the atomic events
    in which it holds; this set of events is called e(a).
  • P(a) = ∑ P(e_i), where e_i is an element of e(a)
  • i.e. a is the disjunction of the atomic events in
    the set e(a)
  • recall this property of atomic events: any
    proposition is logically equivalent to the
    disjunction of all the atomic events that entail
    the truth of that proposition

8
Determining Marginal Probabilities using Full Joint Distributions
  • Assume a simplified Clue game having two
    characters, two weapons and two rooms

P(a) = ∑ P(e_i), where e_i is an element of e(a)

Who    What   Where     Probability
plum   rope   hall      1/8
plum   rope   kitchen   1/8
plum   pipe   hall      1/8
plum   pipe   kitchen   1/8
green  rope   hall      1/8
green  rope   kitchen   1/8
green  pipe   hall      1/8
green  pipe   kitchen   1/8

P(plum) = ?
P(plum) = 1/8 + 1/8 + 1/8 + 1/8 = 1/2

A probability obtained in this manner is called a
marginal probability. It can be just a prior
probability (as shown) or something more complex
(next slide). This process is called marginalization
or summing out.
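Marginalization is easy to express in code. A sketch reusing the full_joint dict from the earlier sketch; the helper name marginal and its predicate-based interface are illustrative assumptions:

# Summing out: P(a) is the sum of the probabilities of the atomic events
# in which proposition a holds, i.e. the events satisfying the predicate.
def marginal(joint, holds):
    return sum(p for event, p in joint.items() if holds(event))

# P(plum) = 1/8 + 1/8 + 1/8 + 1/8 = 1/2
print(marginal(full_joint, lambda e: e[0] == "plum"))  # 0.5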
9
Reasoning under Uncertainty using Full Joint Distributions
  • Assume a simplified Clue game having two
    characters, two weapons and two rooms

Who    What   Where     Probability
plum   rope   hall      1/8
plum   rope   kitchen   1/8
plum   pipe   hall      1/8
plum   pipe   kitchen   1/8
green  rope   hall      1/8
green  rope   kitchen   1/8
green  pipe   hall      1/8
green  pipe   kitchen   1/8

P(green, pipe) = ?
P(rope, ¬hall) = ?
P(rope ∨ hall) = ?
10
Independence
  • Using the game Clue for an example is
    uninteresting! Why?
  • Because the random variables Who, What, Where are
    independent.
  • Does picking the murderer from the deck of cards
    affect which weapon is chosen? The location?
  • No! Each is randomly selected.

11
Independence
  • Unconditional (absolute) independence: RVs have
    no effect on each other's probabilities
  • 1. P(X|Y) = P(X)
  • 2. P(Y|X) = P(Y)
  • 3. P(X,Y) = P(X) P(Y)
  • Example (full Clue: 6 characters, 6 weapons, 9 rooms)
  • P(green | hall) = P(green, hall) / P(hall)
    = (6/324) / (1/9) = 1/6 = P(green)
  • P(hall | green) = P(hall) = 1/9
  • P(green, hall) = P(green) P(hall) = 1/54
  • We need a more interesting example!
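The three conditions above can also be checked numerically on the simplified Clue joint, reusing full_joint and marginal from the earlier sketches (a sketch, not part of the slides):

# Unconditional independence of Who and Where in the simplified game:
# P(green, hall) should equal P(green) * P(hall).
p_green = marginal(full_joint, lambda e: e[0] == "green")   # 1/2
p_hall = marginal(full_joint, lambda e: e[2] == "hall")     # 1/2
p_green_hall = marginal(full_joint,
                        lambda e: e[0] == "green" and e[2] == "hall")  # 1/4
assert abs(p_green_hall - p_green * p_hall) < 1e-9   # P(X,Y) = P(X) P(Y)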

12
Independence
  • Conditional independence: RVs X and Y each depend
    on another RV Z but are independent of each other
    given Z
  • 1. P(X|Y,Z) = P(X|Z)
  • 2. P(Y|X,Z) = P(Y|Z)
  • 3. P(X,Y|Z) = P(X|Z) P(Y|Z)
  • Idea: sneezing (X) and itchy eyes (Y) are both
    directly caused by hay fever (Z),
  • but neither sneezing nor itchy eyes has a direct
    effect on the other

13
Reasoning under Uncertainty using Full Joint Distributions
  • Assume three boolean RVs: Hayfever (HF), Sneeze
    (SN), ItchyEyes (IE)

and fictional probabilities

P(a) = ∑ P(e_i), where e_i is an element of e(a)

HF     SN     IE     Probability
false  false  false  0.50
false  false  true   0.09
false  true   false  0.10
false  true   true   0.10
true   false  false  0.01
true   false  true   0.06
true   true   false  0.04
true   true   true   0.10

P(sn) = 0.1 + 0.1 + 0.04 + 0.1 = 0.34
P(hf) = 0.01 + 0.06 + 0.04 + 0.1 = 0.21
P(sn, ie) = 0.1 + 0.1 = 0.20
P(hf, sn) = 0.04 + 0.1 = 0.14
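The same representation works for this table. A sketch (the dict name hayfever_joint and the (hf, sn, ie) tuple encoding are assumptions), reusing marginal from the earlier sketch:

# The fictional hay fever joint distribution, keyed by (hf, sn, ie).
hayfever_joint = {
    (False, False, False): 0.50,
    (False, False, True):  0.09,
    (False, True,  False): 0.10,
    (False, True,  True):  0.10,
    (True,  False, False): 0.01,
    (True,  False, True):  0.06,
    (True,  True,  False): 0.04,
    (True,  True,  True):  0.10,
}

print(marginal(hayfever_joint, lambda e: e[1]))           # P(sn)     = 0.34
print(marginal(hayfever_joint, lambda e: e[0]))           # P(hf)     = 0.21
print(marginal(hayfever_joint, lambda e: e[1] and e[2]))  # P(sn, ie) = 0.20
print(marginal(hayfever_joint, lambda e: e[0] and e[1]))  # P(hf, sn) = 0.14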
14
Reasoning under Uncertainty using Full Joint Distributions
  • Assume three boolean RVs: Hayfever (HF), Sneeze
    (SN), ItchyEyes (IE)
  • and fictional probabilities

HF     SN     IE     Probability
false  false  false  0.50
false  false  true   0.09
false  true   false  0.10
false  true   true   0.10
true   false  false  0.01
true   false  true   0.06
true   true   false  0.04
true   true   true   0.10

P(a|e) = P(a, e) / P(e)

P(hf | sn) = P(hf, sn) / P(sn) = 0.14 / 0.34 ≈ 0.41
P(hf | ie) = P(hf, ie) / P(ie) = 0.16 / 0.35 ≈ 0.46
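Conditional probabilities from the full joint follow the same pattern. A sketch (the helper name conditional is an assumption), built on marginal and hayfever_joint from the earlier sketches:

# P(a | e) = P(a, e) / P(e), both terms computed by summing atomic events.
def conditional(joint, holds_a, holds_e):
    p_e = marginal(joint, holds_e)
    p_a_and_e = marginal(joint, lambda ev: holds_a(ev) and holds_e(ev))
    return p_a_and_e / p_e

print(conditional(hayfever_joint, lambda e: e[0], lambda e: e[1]))  # P(hf|sn) ~ 0.41
print(conditional(hayfever_joint, lambda e: e[0], lambda e: e[2]))  # P(hf|ie) ~ 0.46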
15
Reasoning under Uncertainty using Full Joint Distributions
  • Assume three boolean RVs: Hayfever (HF), Sneeze
    (SN), ItchyEyes (IE)
  • and fictional probabilities

HF     SN     IE     Probability
false  false  false  0.50
false  false  true   0.09
false  true   false  0.10
false  true   true   0.10
true   false  false  0.01
true   false  true   0.06
true   true   false  0.04
true   true   true   0.10

P(a|e) = P(a, e) / P(e)

Instead of computing P(e), we could use normalization:
P(hf | sn) = 0.14 / P(sn)
also compute P(¬hf | sn) = 0.20 / P(sn); since
P(hf | sn) + P(¬hf | sn) = 1, substituting and solving
gives P(sn) = 0.34 !
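The normalization trick can be shown with the same joint, reusing marginal from the earlier sketch:

# Compute the unnormalized values P(hf, sn) and P(not hf, sn), then divide
# by their sum instead of computing P(sn) separately.
unnorm_hf = marginal(hayfever_joint, lambda e: e[0] and e[1])          # 0.14
unnorm_not_hf = marginal(hayfever_joint, lambda e: not e[0] and e[1])  # 0.20
alpha = unnorm_hf + unnorm_not_hf    # this sum is P(sn) = 0.34
print(unnorm_hf / alpha)             # P(hf | sn) ~ 0.41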
16
Combining Multiple Evidence
  • As evidence describing the state of the world is
    accumulated, we'd like to be able to easily update
    the degree of belief in a conclusion.
  • Using the full joint probability distribution table:
  • P(v1,...,vk | vk+1,...,vn)
    = ∑ P(V1=v1,...,Vn=vn) / ∑ P(Vk+1=vk+1,...,Vn=vn)
  • i.e. the sum of all entries in the table where
    V1=v1, ..., Vn=vn,
  • divided by the sum of all entries in the table
    corresponding to the evidence, where
    Vk+1=vk+1, ..., Vn=vn

17
Combining Multiple Evidence using Full Joint Distributions
  • Assume three boolean RVs with fictional
    probabilities: Hayfever (HF), Sneeze (SN),
    ItchyEyes (IE)

HF     SN     IE     Probability
false  false  false  0.50
false  false  true   0.09
false  true   false  0.10
false  true   true   0.10
true   false  false  0.01
true   false  true   0.06
true   true   false  0.04
true   true   true   0.10

P(a | b, c) = P(a, b, c) / ∑ P(b, c), as described on
the prior slide

P(hf | sn, ie) = P(hf, sn, ie) / P(sn, ie)
= 0.10 / (0.1 + 0.1) = 0.5
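With the full joint available, combining evidence is just a conditional query with a conjunctive evidence predicate. A sketch reusing conditional from the earlier sketch:

# P(hf | sn, ie) = P(hf, sn, ie) / P(sn, ie) = 0.10 / 0.20 = 0.5
print(conditional(hayfever_joint,
                  lambda e: e[0],             # query: hf
                  lambda e: e[1] and e[2]))   # evidence: sn and ie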
18
Combining Multiple Evidence (cont.)
  • FJDT techniques are intractable in general
    because the table size grows exponentially.
  • Independence assertions can help reduce the size
    of the domain and the complexity of the inference
    problem.
  • Independence assertions are usually based on
    knowledge of the domain, enabling the FJD table to
    be factored into separate joint distribution tables.
  • it's a good thing that problem domains contain
    many independent RVs
  • but typically the subsets of dependent RVs are
    quite large

19
Probability Rules for Multi-valued Variables
  • Summing Out: P(Y) = ∑ P(Y, z), summing over all
    values z of RV Z
  • Conditioning: P(Y) = ∑ P(Y|z) P(z), summing over
    all values z of RV Z
  • Product Rule: P(X, Y) = P(X|Y) P(Y) = P(Y|X) P(X)
  • Chain Rule: P(X, Y, Z) = P(X|Y, Z) P(Y|Z) P(Z)
  • this is a generalization of the product rule with
    Y = Y,Z
  • the order of RVs doesn't matter, i.e. any order
    gives the same result
  • Conditionalized Chain Rule (let Y = A,B):
    P(X, A | B) = P(X|A, B) P(A|B)
    = P(A|X, B) P(X|B)   (order doesn't matter)

20
Bayes' Rule
  • Bayes' Rule: P(b|a) = P(a|b) P(b) / P(a)
  • derived from P(a ∧ b) = P(b|a) P(a) = P(a|b) P(b);
    just divide both sides of the equation by P(a)
  • basis of AI systems that use probabilistic reasoning
  • For example:
  • a = happy, b = sun            a = sneeze, b = fall
  • P(sun|happy) = ?              P(fall|sneeze) = ?
  • P(happy|sun) = 0.95           P(sneeze|fall) = 0.85
    P(sun) = 0.5                  P(fall) = 0.25
    P(happy) = 0.75               P(sneeze) = 0.3
  • (0.95 × 0.5) / 0.75 ≈ 0.63    (0.85 × 0.25) / 0.3 ≈ 0.71
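A small sketch of the rule applied to the slide's toy numbers (the helper name bayes is an assumption):

# Bayes' rule: P(b | a) = P(a | b) P(b) / P(a)
def bayes(p_a_given_b, p_b, p_a):
    return p_a_given_b * p_b / p_a

print(bayes(0.95, 0.5, 0.75))   # P(sun | happy)   ~ 0.63
print(bayes(0.85, 0.25, 0.3))   # P(fall | sneeze) ~ 0.71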

21
Bayes' Rule
  • P(b|a) = P(a|b) P(b) / P(a). What's the benefit
    of being able to calculate P(b|a) from the three
    probabilities on the right?
  • Usefulness of Bayes' Rule:
  • many problems have good estimates of the
    probabilities on the right
  • P(b|a) is needed to identify a cause, a
    classification, a diagnosis, etc.
  • the typical use is to calculate diagnostic
    knowledge from causal knowledge

22
Bayes' Rule
  • Causal knowledge: from causes to effects
  • e.g. P(sneeze|cold), the probability of the effect
    sneeze given the cause common cold
  • this probability the doctor obtains from
    experience treating patients and understanding
    the disease process
  • Diagnostic knowledge: from effects to causes
  • e.g. P(cold|sneeze), the probability of the cause
    common cold given the effect sneeze
  • knowing this probability helps a doctor make a
    disease diagnosis based on a patient's symptoms
  • diagnostic knowledge is more fragile than causal
    knowledge since it can change significantly over
    time given variations in the rate of occurrence of
    its causes (due to epidemics, etc.)

23
Bayes' Rule
  • Using Bayes' Rule with causal knowledge
  • we want to determine diagnostic knowledge
    (diagnostic reasoning) that is difficult to
    obtain from a general population
  • e.g. the symptom is s = stiffNeck, the disease is
    m = meningitis
  • P(s|m) = 1/2, the causal knowledge
  • P(m) = 1/50000, P(s) = 1/20, the prior probabilities
  • P(m|s) = ?, the desired diagnostic knowledge
  • = (1/2 × 1/50000) / (1/20) = 1/5000
  • the doctor can now use P(m|s) to guide diagnosis
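The meningitis calculation as a sketch, reusing the bayes helper from the earlier sketch:

p_s_given_m = 1 / 2      # causal knowledge: P(stiffNeck | meningitis)
p_m = 1 / 50000          # prior P(meningitis)
p_s = 1 / 20             # prior P(stiffNeck)
print(bayes(p_s_given_m, p_m, p_s))   # P(m | s) = 1/5000 = 0.0002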

24
Combining Multiple Evidence using Bayes' Rule
  • How do you update the conditional probability of Y
    given two pieces of evidence A and B?
  • General Bayes' Rule for multi-valued RVs:
    P(Y|X) = P(X|Y) P(Y) / P(X)
  • let X = A,B
  • P(Y|A,B) = P(A,B|Y) P(Y) / P(A,B)
    = P(Y) (P(B|A,Y) P(A|Y)) / (P(B|A) P(A))
  • = P(Y) (P(A|Y)/P(A)) (P(B|A,Y)/P(B|A))
  • conditionalized chain rule and product rule used
  • Problems:
  • P(B|A,Y) is generally hard to compute or obtain
  • doesn't scale well for n evidence RVs; the table
    size grows O(2^n)

25
Combining Multiple Evidence using Bayes' Rule
  • These problems can be circumvented.
  • If A and B are conditionally independent given Y,
    then P(A,B|Y) = P(A|Y) P(B|Y), and for P(A,B) use
    the product rule
  • P(Y|A,B) = P(Y) P(A,B|Y) / P(A,B)   (Bayes' rule,
    multiple evidence)
  • P(Y|A,B) = P(Y) (P(A|Y)/P(A)) (P(B|Y)/P(B|A))
  • no joint probabilities needed; the representation
    grows O(n)
  • If A is unconditionally independent of B, then
    P(A,B|Y) = P(A|Y) P(B|Y) and P(A,B) = P(A) P(B)
  • P(Y|A,B) = P(Y) P(A,B|Y) / P(A,B)   (Bayes' rule,
    multiple evidence)
    P(Y|A,B) = P(Y) (P(A|Y)/P(A)) (P(B|Y)/P(B))
  • This equation is used to define a naïve Bayes
    classifier.

26
Combining Multiple Evidence using Bayes' Rule
  • Example
  • What is the likelihood that a patient has
    sclerosing cholangitis?
  • the doctor's initial belief: P(sc) = 1/1,000,000
  • an exam reveals jaundice: P(j) = 1/10,000,
    P(j|sc) = 1/5
  • the doctor's belief given the test result:
    P(sc|j) = P(sc) P(j|sc) / P(j) = 2/1000
  • tests reveal fibrosis of the bile ducts:
    P(f|sc) = 4/5, P(f) = 1/100
  • the doctor naïvely assumes jaundice and fibrosis
    are independent
  • the doctor's belief now rises: P(sc|j,f) = 16/100
    P(sc|j,f) = P(sc) (P(j|sc)/P(j)) (P(f|sc)/P(f))
    cf. P(Y|A,B) = P(Y) (P(A|Y)/P(A)) (P(B|Y)/P(B))
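The same calculation as a sketch (the variable names are illustrative assumptions):

# Naive combination of two pieces of evidence:
# P(sc | j, f) = P(sc) * (P(j | sc) / P(j)) * (P(f | sc) / P(f))
p_sc = 1 / 1_000_000
p_j, p_j_given_sc = 1 / 10_000, 1 / 5
p_f, p_f_given_sc = 1 / 100, 4 / 5

p_sc_given_j = p_sc * p_j_given_sc / p_j                              # 0.002
p_sc_given_j_f = p_sc * (p_j_given_sc / p_j) * (p_f_given_sc / p_f)   # 0.16
print(p_sc_given_j, p_sc_given_j_f)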

27
Naïve Bayes Classifier
  • A naïve Bayes classifier is used where a single
    class is based on a number of features, or where a
    single cause influences a number of effects
  • based on P(Y|A,B) = P(Y) (P(A|Y)/P(A)) (P(B|Y)/P(B))
  • given RV C
  • its domain is the possible classifications, say
    c1, c2, c3
  • classify an input example with features F1, ..., Fn
  • compute
  • P(c1|F1, ..., Fn), P(c2|F1, ..., Fn), P(c3|F1, ..., Fn)
  • naïvely assume the features are independent
  • choose the value of C that gives the maximum
    probability
  • works surprisingly well in practice even when the
    independence assumptions aren't true
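A minimal naïve Bayes classifier sketch (the data layout and function name are assumptions, not from the slides). It drops the denominators P(F1), ..., P(Fn), which are the same for every class, and picks the class c that maximizes P(c) times the product of the P(f|c):

def naive_bayes_classify(priors, likelihoods, features):
    """priors: {class: P(c)}; likelihoods: {class: {feature: P(f | c)}}."""
    def score(c):
        s = priors[c]
        for f in features:
            s *= likelihoods[c][f]   # naive independence assumption
        return s
    return max(priors, key=score)    # class with the maximum probability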

28
Bayesian Networks
  • AKA Bayes Nets, Belief Nets, Causal Nets, etc.
  • Encodes the full joint probability distribution
    (FJPD) for the set of RVs defining a problem
    domain
  • Uses a space-efficient data structure by exploiting
  • the fact that dependencies between RVs are
    generally local
  • which results in lots of conditionally independent
    RVs
  • Captures both qualitative and quantitative
    relationships between RVs

29
Bayesian Networks
  • Can be used to compute any value in FJPD
  • Can be used to reason
  • predictive/causal reasoning: forward (top-down)
    from causes to effects
  • diagnostic reasoning: backward (bottom-up) from
    effects to causes

30
Bayesian Network Representation
  • Is an augmented DAG (i.e. directed acyclic graph)
  • Represented by (V, E) where
  • V is a set of vertices
  • E is a set of directed edges joining vertices; no
    loops
  • Each vertex contains
  • the RV's name
  • either a prior probability distribution or a
    conditional probability distribution table (CDT)
    that quantifies the effects of the parents on this
    RV
  • Each directed arc
  • is from a cause (parent) to its immediate effects
    (children)
  • represents a direct causal relationship between RVs

31
Bayesian Network Representation
  • Example in class
  • each row in a conditional probability table must
    sum to 1
  • columns don't need to sum to 1
  • values are obtained from experts
  • The number of probabilities required is typically
    far fewer than the number required for a FJDT
  • Quantitative information is usually given by an
    expert or determined empirically from data

32
Conditional Independence
  • Assume effects are conditionally independent of
    each other given their common cause
  • The net is constructed so that, given its parents,
    a node is conditionally independent of its
    non-descendant RVs in the net
  • P(X1=x1, ..., Xn=xn) = P(x1 | parents(X1)) × ... ×
    P(xn | parents(Xn))
  • Note the full joint probability distribution isn't
    needed; we only need each RV's conditional
    probabilities relative to its parent RVs

33
Algorithm for Constructing Bayesian Networks
  • Choose a set of relevant random variables
  • Choose an ordering for them
  • Assume they're X1 .. Xm, where X1 is first, X2 is
    second, etc.
  • For i = 1 to m
  • add a new node for Xi to the network
  • set Parents(Xi) to be a minimal subset of X1 .. Xi-1
    such that we have conditional independence of Xi
    and all other members of X1 .. Xi-1 given Parents(Xi)
  • add a directed arc from each node in Parents(Xi)
    to Xi
  • non-root nodes define a conditional probability
    table, P(Xi = x | combinations of Parents(Xi));
    root nodes define a prior probability distribution
    at Xi, P(Xi)
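A structural sketch of this loop (the minimal_parents and elicit_cpt callbacks are hypothetical placeholders for the domain knowledge and probability elicitation the slides describe):

def build_network(ordered_vars, minimal_parents, elicit_cpt):
    # Returns {variable: (parents, table)} following the chosen ordering.
    net = {}
    for i, x in enumerate(ordered_vars):
        # minimal subset of the earlier variables that makes x conditionally
        # independent of the remaining earlier variables, given those parents
        parents = minimal_parents(x, ordered_vars[:i])
        # CPT for non-root nodes, prior distribution for root nodes
        net[x] = (parents, elicit_cpt(x, parents))
    return net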

34
Algorithm for Constructing Bayesian Networks
  • For a given set of random variables (RVs) there
    is not, in general, a unique Bayesian net, but all
    of them represent the same information
  • For the best net, topologically sort the RVs in
    step 2
  • each RV comes before all of its children
  • the first nodes are roots, then the nodes they
    directly influence
  • The best Bayesian network for a problem has
  • the fewest probabilities and arcs
  • CDT probabilities that are easy to determine
  • The algorithm won't construct a net that violates
    the rules of probability

35
Computing Joint Probabilities using a Bayesian Network
  • Use the product rule
  • Simplify using independence
  • For example:
  • Compute P(a,b,c,d) = P(d,c,b,a)
  • order the RVs in the joint probability bottom-up:
    D, C, B, A
  • = P(d|c,b,a) P(c,b,a)           product rule on P(d,c,b,a)
  • = P(d|c) P(c,b,a)               conditional independence of D given C
  • = P(d|c) P(c|b,a) P(b,a)        product rule on P(c,b,a)
  • = P(d|c) P(c|b,a) P(b|a) P(a)   product rule on P(b,a)
  • = P(d|c) P(c|b,a) P(b) P(a)     independence of B and A given no evidence

36
Computing Joint Probabilities using a Bayesian Network
  • Any entry in the full joint distribution table
    (i.e. any atomic event) can be computed!
  • P(v1,...,vn) = ∏ P(vi | Parents(Vi)), over i from 1
    to n
  • e.g. given boolean RVs, what is P(a,..,h,k,..,p)?
  • P(a) P(b) P(c) P(d|a,b) P(e|b,c) P(f) P(g|d,e) P(h)
    P(k|f,g) P(l|g,h) P(m|k) P(n|k) P(o|k,l) P(p|l)
  • Note this is fast, i.e. linear in the number of
    nodes in the net!
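A sketch of this product (the net data structure, mapping each variable to its parent list and a table keyed by (value, parent values), is an assumption for illustration):

def joint_probability(net, assignment):
    # P(v1,...,vn) = product over i of P(vi | Parents(Vi))
    p = 1.0
    for var, (parents, table) in net.items():
        parent_vals = tuple(assignment[par] for par in parents)
        p *= table[(assignment[var], parent_vals)]
    return p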

37
Computing Joint Probabilities using a Bayesian Network
  • How is any joint probability computed?
  • sum the relevant joint probabilities
  • e.g. compute P(a,b):
  • P(a,b,c,d) + P(a,b,c,¬d) + P(a,b,¬c,d) + P(a,b,¬c,¬d)
  • e.g. compute P(c):
  • P(a,b,c,d) + P(a,¬b,c,d) + P(¬a,b,c,d) + P(¬a,¬b,c,d)
    + P(a,b,c,¬d) + P(a,¬b,c,¬d) + P(¬a,b,c,¬d)
    + P(¬a,¬b,c,¬d)
  • A BN can answer any query (i.e. probability) about
    the domain by summing the relevant joint
    probabilities.
  • Enumeration can require many computations!
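Enumeration can be sketched on top of joint_probability from the earlier sketch (boolean RVs assumed, as on the slide):

from itertools import product

def enumerate_probability(net, partial):
    # Sum the joint probabilities of all completions of a partial assignment,
    # e.g. P(a, b) sums over the 2^(n-2) completions of {"A": True, "B": True}.
    hidden = [v for v in net if v not in partial]
    total = 0.0
    for values in product((True, False), repeat=len(hidden)):
        assignment = dict(partial, **dict(zip(hidden, values)))
        total += joint_probability(net, assignment)
    return total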

38
Computing Conditional Probabilities using a Bayesian Network
  • The basic task of a probabilistic system is to
    compute conditional probabilities.
  • Any conditional probability can be computed:
  • P(v1,...,vk | vk+1,...,vn)
    = ∑ P(V1=v1,...,Vn=vn) / ∑ P(Vk+1=vk+1,...,Vn=vn)
  • The key problem is that the technique of
    enumerating joint probabilities can make the
    computations intractable (exponential in the
    number of RVs).

39
Computing Conditional Probabilities using a Bayesian Network
  • These computations generally rely on the
    simplifications resulting from the independence
    of the RVs.
  • Every variable that isn't an ancestor of a query
    variable or an evidence variable is irrelevant to
    the query.
  • What ancestors are irrelevant?

40
Independence in a Bayesian Network
  • Given a Bayesian network, how is independence
    established?
  • A node is conditionally independent (CI) of its
    non-descendants, given its parents.
  • e.g. Given D and E, G is CI of ?

41
Independence in a Bayesian Network
  • Given a Bayesian network, how is independence
    established?
  • A node is conditionally independent (CI) of its
    non-descendants, given its parents.
  • e.g. Given D and E, G is CI of ?

A, B, C, F, H
e.g. Given F and G, K is CI of ?
42
Independence in a Bayesian Network
  • Given a Bayesian network, how is independence
    established?
  • A node is conditionally independent of all other
    nodes in the network given its parents, children,
    and children's parents, which is called its Markov
    blanket
  • e.g. What is the Markov blanket for G?

Given this blanket, G is CI of ? A, B, C, M, N, O, P
What about absolute independence?
43
Computing Conditional Probabilities using a Bayesian Network
  • The general algorithm for computing conditional
    probabilities is complicated.
  • It is easy if the query involves nodes that are
    directly connected to each other.
  • the examples are assumed to use boolean RVs
  • Simple causal inference: P(E|C)
  • the conditional prob. distribution of effect E
    given cause C as evidence
  • reasoning in the same direction as the arc, e.g.
    disease to symptom
  • Simple diagnostic inference: P(Q|E)
  • the conditional prob. distribution of query Q
    given effect E as evidence
  • reasoning in the direction opposite the arc, e.g.
    symptom to disease

44
Computing Conditional Probabilities: Causal (Top-Down) Inference
  • Compute P(e|c)
  • the conditional probability of effect E=e given
    cause C=c as evidence
  • assume arcs exist to E from C and from C2
  1. Rewrite the conditional probability of e in terms
     of e and all of its parents (that aren't evidence),
     given evidence c
  2. Re-express each joint probability back to the
     probability of e given all of its parents
  3. Simplify using independence, and look up the
     required values in the Bayesian network

45
Computing Conditional Probabilities: Causal (Top-Down) Inference
  • Compute P(e|c)
  • = P(e,c) / P(c)                                  product rule
  • = (P(e,c,c2) + P(e,c,¬c2)) / P(c)                marginalizing
  • = P(e,c,c2) / P(c) + P(e,c,¬c2) / P(c)           algebra
  • = P(e,c2|c) + P(e,¬c2|c)                         product rule, e.g. X = e,c2
  • = P(e|c2,c) P(c2|c) + P(e|¬c2,c) P(¬c2|c)        conditionalized chain rule
  • Simplify given that C and C2 are independent:
  • P(c2|c) = P(c2)
  • P(¬c2|c) = P(¬c2)
  • = P(e|c2,c) P(c2) + P(e|¬c2,c) P(¬c2)            algebra
  • now look up the values to finish the computation
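The last line of the derivation can be computed directly. A sketch, assuming (as above) that E's parents are C and C2 and that C and C2 are independent:

# P(e | c) = P(e | c2, c) P(c2) + P(e | not c2, c) P(not c2)
def causal_inference(p_e_given_c_c2, p_e_given_c_not_c2, p_c2):
    return p_e_given_c_c2 * p_c2 + p_e_given_c_not_c2 * (1 - p_c2)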

46
Computing Conditional Probabilities: Diagnostic (Bottom-Up) Inference
  • Compute P(c|e)
  • the conditional probability of cause C=c given
    effect E=e as evidence
  • assume an arc exists from C to E
  • idea: convert to causal inference using Bayes'
    rule
  • Use Bayes' rule: P(c|e) = P(e|c) P(c) / P(e)
  • Compute P(e|c) using the causal inference method
  • Look up the value of P(c) in the Bayesian net
  • Use normalization to avoid computing P(e)
  • this requires computing P(¬c|e)
  • using steps as in 1-3 above
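A sketch of the normalization step (the helper name is an assumption): compute the unnormalized products for c and ¬c, then divide, so P(e) is never computed explicitly:

# P(c | e) is proportional to P(e | c) P(c);
# normalize against P(e | not c) P(not c).
def diagnostic_inference(p_e_given_c, p_c, p_e_given_not_c):
    unnorm_c = p_e_given_c * p_c
    unnorm_not_c = p_e_given_not_c * (1 - p_c)
    return unnorm_c / (unnorm_c + unnorm_not_c)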

47
Summary: the Good News
  • Bayesian nets are the bread and butter of the
    AI-uncertainty community (like resolution for
    AI-logic)
  • Bayesian nets are a compact representation
  • they don't require exponential storage to hold all
    of the info in the full joint probability
    distribution (FJPD) table
  • they are a decomposed representation of the FJPD
    table
  • the conditional probability distribution tables in
    non-root nodes are only exponential in the maximum
    number of parents of any node
  • Bayesian nets are fast at computing joint
    probabilities P(V1, ..., Vk), i.e. the prior
    probability of V1, ..., Vk
  • computing the probability of an atomic event can
    be done in time linear in the number of nodes in
    the net

48
Summary: the Bad News
  • Conditional probabilities can also be computed
  • P(Q|E1, ..., Ek): the posterior probability of
    query Q given multiple pieces of evidence
    E1, ..., Ek
  • requires enumerating all of the matching entries,
    which takes exponential time in the number of
    variables
  • in special cases it can be done faster, in
    polynomial time or less; e.g. for a polytree (a
    net structured like a tree) it takes linear time
  • In general, inference in Bayesian networks (BNs)
    is NP-hard.
  • but BNs are well studied, so there exist many
    efficient exact solution methods as well as a
    variety of approximation techniques