1
Reasoning Under Uncertainty
  • Artificial Intelligence
  • Chapter 9

2
Part 2
Reasoning
3
Notation
  • Random variable (RV): a variable (uppercase) that
    takes on values (lowercase) from a domain of
    mutually exclusive and exhaustive values
  • A=a: a proposition, world state, event, effect,
    etc.
  • abbreviate P(A=true) to P(a)
  • abbreviate P(A=false) to P(¬a)
  • abbreviate P(A=value) to P(value)
  • abbreviate P(A≠value) to P(¬value)
  • Atomic event: a complete specification of the
    state of the world about which the agent is
    uncertain

4
Notation
  • P(a): prior probability that RV A = a, i.e. the
    degree of belief in proposition a in the absence
    of any other relevant information
  • P(a|e): conditional probability of A = a given
    E = e, i.e. the degree of belief in proposition a
    when all that is known is evidence e
  • P(A): probability distribution, i.e. the set of
    P(a_i) for all i
  • Joint probabilities are for conjunctions of
    propositions

5
Reasoning under Uncertainty
  • Rather than reasoning about the truth or falsity
    of a proposition, reason about the degree of
    belief that the proposition is true.
  • Use a knowledge base of known probabilities to
    determine probabilities for query propositions.

6
Reasoning under Uncertainty using Full Joint Distributions
  • Assume a simplified Clue game having two
    characters, two weapons and two rooms
  • each row in the table below is an atomic event
  • - exactly one of these must be true
  • - the list must be mutually exclusive
  • - the list must be exhaustive

Who    What   Where     Probability
plum   rope   hall      1/8
plum   rope   kitchen   1/8
plum   pipe   hall      1/8
plum   pipe   kitchen   1/8
green  rope   hall      1/8
green  rope   kitchen   1/8
green  pipe   hall      1/8
green  pipe   kitchen   1/8

  • the prior probability for each atomic event is 1/8
  • - each is equally likely
  • - e.g. P(plum, rope, hall) = 1/8

∑ P(atomic_event_i) = 1, since each RV's domain is
exhaustive and its values are mutually exclusive
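This table can be represented directly in code. A minimal Python sketch (not from the original slides; the dict name full_joint and the tuple encoding are illustrative assumptions):

# The simplified Clue full joint distribution, keyed by atomic events
# (who, what, where); every atomic event has prior probability 1/8.
full_joint = {
    (who, what, where): 1 / 8
    for who in ("plum", "green")
    for what in ("rope", "pipe")
    for where in ("hall", "kitchen")
}

# The atomic events are mutually exclusive and exhaustive, so they sum to 1.
assert abs(sum(full_joint.values()) - 1.0) < 1e-9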
7
Determining Marginal Probabilitiesusing Full
Joint Distributions
  • The probability of any proposition a is equal to
    the sum of the probabilities of the atomic events
    in which it holds; this set of events is called e(a).
  • P(a) = ∑ P(e_i), where e_i is an element of e(a)
  • i.e. a is the disjunction of the atomic events in
    the set e(a)
  • recall this property of atomic events: any
    proposition is logically equivalent to the
    disjunction of all the atomic events that entail
    the truth of that proposition

8
Determining Marginal Probabilities using Full Joint Distributions
  • Assume a simplified Clue game having two
    characters, two weapons and two rooms

P(a) = ∑ P(e_i), where e_i is an element of e(a)

Who    What   Where     Probability
plum   rope   hall      1/8
plum   rope   kitchen   1/8
plum   pipe   hall      1/8
plum   pipe   kitchen   1/8
green  rope   hall      1/8
green  rope   kitchen   1/8
green  pipe   hall      1/8
green  pipe   kitchen   1/8

P(plum) = ?
P(plum) = 1/8 + 1/8 + 1/8 + 1/8 = 1/2

A probability obtained in this manner is called a
marginal probability. It can be just a prior
probability (as shown) or something more complex
(next slide). This process is called marginalization
or summing out.
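Marginalization is easy to express in code. A sketch reusing the full_joint dict from the earlier sketch; the helper name marginal and its predicate-based interface are illustrative assumptions:

# Summing out: P(a) is the sum of the probabilities of the atomic events
# in which proposition a holds, i.e. the events satisfying the predicate.
def marginal(joint, holds):
    return sum(p for event, p in joint.items() if holds(event))

# P(plum) = 1/8 + 1/8 + 1/8 + 1/8 = 1/2
print(marginal(full_joint, lambda e: e[0] == "plum"))  # 0.5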
9
Reasoning under Uncertainty using Full Joint Distributions
  • Assume a simplified Clue game having two
    characters, two weapons and two rooms

Who    What   Where     Probability
plum   rope   hall      1/8
plum   rope   kitchen   1/8
plum   pipe   hall      1/8
plum   pipe   kitchen   1/8
green  rope   hall      1/8
green  rope   kitchen   1/8
green  pipe   hall      1/8
green  pipe   kitchen   1/8

P(green, pipe) = ?
P(rope, ¬hall) = ?
P(rope ∨ hall) = ?
10
Independence
  • Using the game Clue for an example is
    uninteresting! Why?
  • Because the random variables Who, What, Where are
    independent.
  • Does picking the murderer from the deck of cards
    affect which weapon is chosen? The location?
  • No! Each is randomly selected.

11
Independence
  • Unconditional (absolute) independence: RVs have
    no effect on each other's probabilities
  • 1. P(X|Y) = P(X)
  • 2. P(Y|X) = P(Y)
  • 3. P(X,Y) = P(X) P(Y)
  • Example (full Clue: 6 characters, 6 weapons, 9 rooms)
  • P(green | hall) = P(green, hall) / P(hall)
    = (6/324) / (1/9) = 1/6 = P(green)
  • P(hall | green) = P(hall) = 1/9
  • P(green, hall) = P(green) P(hall) = 1/54
  • We need a more interesting example!
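The three conditions above can also be checked numerically on the simplified Clue joint, reusing full_joint and marginal from the earlier sketches (a sketch, not part of the slides):

# Unconditional independence of Who and Where in the simplified game:
# P(green, hall) should equal P(green) * P(hall).
p_green = marginal(full_joint, lambda e: e[0] == "green")   # 1/2
p_hall = marginal(full_joint, lambda e: e[2] == "hall")     # 1/2
p_green_hall = marginal(full_joint,
                        lambda e: e[0] == "green" and e[2] == "hall")  # 1/4
assert abs(p_green_hall - p_green * p_hall) < 1e-9   # P(X,Y) = P(X) P(Y)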

12
Independence
  • Conditional independence: RVs X and Y each depend
    on another RV Z but are independent of each other
    given Z
  • 1. P(X|Y,Z) = P(X|Z)
  • 2. P(Y|X,Z) = P(Y|Z)
  • 3. P(X,Y|Z) = P(X|Z) P(Y|Z)
  • Idea: sneezing (X) and itchy eyes (Y) are both
    directly caused by hay fever (Z),
  • but neither sneezing nor itchy eyes has a direct
    effect on the other

13
Reasoning under Uncertainty using Full Joint Distributions
  • Assume three boolean RVs: Hayfever (HF), Sneeze
    (SN), ItchyEyes (IE)

and fictional probabilities

P(a) = ∑ P(e_i), where e_i is an element of e(a)

HF     SN     IE     Probability
false  false  false  0.50
false  false  true   0.09
false  true   false  0.10
false  true   true   0.10
true   false  false  0.01
true   false  true   0.06
true   true   false  0.04
true   true   true   0.10

P(sn) = 0.1 + 0.1 + 0.04 + 0.1 = 0.34
P(hf) = 0.01 + 0.06 + 0.04 + 0.1 = 0.21
P(sn, ie) = 0.1 + 0.1 = 0.20
P(hf, sn) = 0.04 + 0.1 = 0.14
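The same representation works for this table. A sketch (the dict name hayfever_joint and the (hf, sn, ie) tuple encoding are assumptions), reusing marginal from the earlier sketch:

# The fictional hay fever joint distribution, keyed by (hf, sn, ie).
hayfever_joint = {
    (False, False, False): 0.50,
    (False, False, True):  0.09,
    (False, True,  False): 0.10,
    (False, True,  True):  0.10,
    (True,  False, False): 0.01,
    (True,  False, True):  0.06,
    (True,  True,  False): 0.04,
    (True,  True,  True):  0.10,
}

print(marginal(hayfever_joint, lambda e: e[1]))           # P(sn)     = 0.34
print(marginal(hayfever_joint, lambda e: e[0]))           # P(hf)     = 0.21
print(marginal(hayfever_joint, lambda e: e[1] and e[2]))  # P(sn, ie) = 0.20
print(marginal(hayfever_joint, lambda e: e[0] and e[1]))  # P(hf, sn) = 0.14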
14
Reasoning under Uncertainty using Full Joint Distributions
  • Assume three boolean RVs: Hayfever (HF), Sneeze
    (SN), ItchyEyes (IE)
  • and fictional probabilities

HF     SN     IE     Probability
false  false  false  0.50
false  false  true   0.09
false  true   false  0.10
false  true   true   0.10
true   false  false  0.01
true   false  true   0.06
true   true   false  0.04
true   true   true   0.10

P(a|e) = P(a, e) / P(e)

P(hf | sn) = P(hf, sn) / P(sn) = 0.14 / 0.34 ≈ 0.41
P(hf | ie) = P(hf, ie) / P(ie) = 0.16 / 0.35 ≈ 0.46
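Conditional probabilities from the full joint follow the same pattern. A sketch (the helper name conditional is an assumption), built on marginal and hayfever_joint from the earlier sketches:

# P(a | e) = P(a, e) / P(e), both terms computed by summing atomic events.
def conditional(joint, holds_a, holds_e):
    p_e = marginal(joint, holds_e)
    p_a_and_e = marginal(joint, lambda ev: holds_a(ev) and holds_e(ev))
    return p_a_and_e / p_e

print(conditional(hayfever_joint, lambda e: e[0], lambda e: e[1]))  # P(hf|sn) ~ 0.41
print(conditional(hayfever_joint, lambda e: e[0], lambda e: e[2]))  # P(hf|ie) ~ 0.46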
15
Reasoning under Uncertainty using Full Joint Distributions
  • Assume three boolean RVs: Hayfever (HF), Sneeze
    (SN), ItchyEyes (IE)
  • and fictional probabilities

HF     SN     IE     Probability
false  false  false  0.50
false  false  true   0.09
false  true   false  0.10
false  true   true   0.10
true   false  false  0.01
true   false  true   0.06
true   true   false  0.04
true   true   true   0.10

P(a|e) = P(a, e) / P(e)

Instead of computing P(e), we could use normalization:
P(hf | sn) = 0.14 / P(sn)
also compute P(¬hf | sn) = 0.20 / P(sn); since
P(hf | sn) + P(¬hf | sn) = 1, substituting and solving
gives P(sn) = 0.34 !
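The normalization trick can be shown with the same joint, reusing marginal from the earlier sketch:

# Compute the unnormalized values P(hf, sn) and P(not hf, sn), then divide
# by their sum instead of computing P(sn) separately.
unnorm_hf = marginal(hayfever_joint, lambda e: e[0] and e[1])          # 0.14
unnorm_not_hf = marginal(hayfever_joint, lambda e: not e[0] and e[1])  # 0.20
alpha = unnorm_hf + unnorm_not_hf    # this sum is P(sn) = 0.34
print(unnorm_hf / alpha)             # P(hf | sn) ~ 0.41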
16
Combining Multiple Evidence
  • As evidence describing the state of the world is
    accumulated, we'd like to be able to easily update
    the degree of belief in a conclusion.
  • Using the full joint probability distribution table:
  • P(v1,...,vk | vk+1,...,vn)
    = ∑ P(V1=v1,...,Vn=vn) / ∑ P(Vk+1=vk+1,...,Vn=vn)
  • i.e. the sum of all entries in the table where
    V1=v1, ..., Vn=vn,
  • divided by the sum of all entries in the table
    corresponding to the evidence, where
    Vk+1=vk+1, ..., Vn=vn

17
Combining Multiple Evidence using Full Joint Distributions
  • Assume three boolean RVs with fictional
    probabilities: Hayfever (HF), Sneeze (SN),
    ItchyEyes (IE)

HF     SN     IE     Probability
false  false  false  0.50
false  false  true   0.09
false  true   false  0.10
false  true   true   0.10
true   false  false  0.01
true   false  true   0.06
true   true   false  0.04
true   true   true   0.10

P(a | b, c) = P(a, b, c) / ∑ P(b, c), as described on
the prior slide

P(hf | sn, ie) = P(hf, sn, ie) / P(sn, ie)
= 0.10 / (0.1 + 0.1) = 0.5
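With the full joint available, combining evidence is just a conditional query with a conjunctive evidence predicate. A sketch reusing conditional from the earlier sketch:

# P(hf | sn, ie) = P(hf, sn, ie) / P(sn, ie) = 0.10 / 0.20 = 0.5
print(conditional(hayfever_joint,
                  lambda e: e[0],             # query: hf
                  lambda e: e[1] and e[2]))   # evidence: sn and ie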
18
Combining Multiple Evidence (cont.)
  • FJDT techniques are intractable in general
    because the table size grows exponentially.
  • Independence assertions can help reduce the size
    of the domain and the complexity of the inference
    problem.
  • Independence assertions are usually based on
    knowledge of the domain, enabling the FJD table to
    be factored into separate joint distribution tables.
  • it's a good thing that problem domains contain
    many independent RVs
  • but typically the subsets of dependent RVs are
    quite large

19
Probability Rules for Multi-valued Variables
  • Summing Out: P(Y) = ∑ P(Y, z), summing over all
    values z of RV Z
  • Conditioning: P(Y) = ∑ P(Y|z) P(z), summing over
    all values z of RV Z
  • Product Rule: P(X, Y) = P(X|Y) P(Y) = P(Y|X) P(X)
  • Chain Rule: P(X, Y, Z) = P(X|Y, Z) P(Y|Z) P(Z)
  • this is a generalization of the product rule with
    Y = Y,Z
  • the order of RVs doesn't matter, i.e. any order
    gives the same result
  • Conditionalized Chain Rule (let Y = A,B):
    P(X, A | B) = P(X|A, B) P(A|B)
    = P(A|X, B) P(X|B)   (order doesn't matter)

20
Bayes' Rule
  • Bayes' Rule: P(b|a) = P(a|b) P(b) / P(a)
  • derived from P(a ∧ b) = P(b|a) P(a) = P(a|b) P(b);
    just divide both sides of the equation by P(a)
  • basis of AI systems that use probabilistic reasoning
  • For example:
  • a = happy, b = sun            a = sneeze, b = fall
  • P(sun|happy) = ?              P(fall|sneeze) = ?
  • P(happy|sun) = 0.95           P(sneeze|fall) = 0.85
    P(sun) = 0.5                  P(fall) = 0.25
    P(happy) = 0.75               P(sneeze) = 0.3
  • (0.95 × 0.5) / 0.75 ≈ 0.63    (0.85 × 0.25) / 0.3 ≈ 0.71
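A small sketch of the rule applied to the slide's toy numbers (the helper name bayes is an assumption):

# Bayes' rule: P(b | a) = P(a | b) P(b) / P(a)
def bayes(p_a_given_b, p_b, p_a):
    return p_a_given_b * p_b / p_a

print(bayes(0.95, 0.5, 0.75))   # P(sun | happy)   ~ 0.63
print(bayes(0.85, 0.25, 0.3))   # P(fall | sneeze) ~ 0.71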

21
Bayes' Rule
  • P(b|a) = P(a|b) P(b) / P(a). What's the benefit
    of being able to calculate P(b|a) from the three
    probabilities on the right?
  • Usefulness of Bayes' Rule:
  • many problems have good estimates of the
    probabilities on the right
  • P(b|a) is needed to identify a cause, a
    classification, a diagnosis, etc.
  • the typical use is to calculate diagnostic
    knowledge from causal knowledge

22
Bayes' Rule
  • Causal knowledge: from causes to effects
  • e.g. P(sneeze|cold), the probability of the effect
    sneeze given the cause common cold
  • this probability the doctor obtains from
    experience treating patients and understanding
    the disease process
  • Diagnostic knowledge: from effects to causes
  • e.g. P(cold|sneeze), the probability of the cause
    common cold given the effect sneeze
  • knowing this probability helps a doctor make a
    disease diagnosis based on a patient's symptoms
  • diagnostic knowledge is more fragile than causal
    knowledge since it can change significantly over
    time given variations in the rate of occurrence of
    its causes (due to epidemics, etc.)

23
Bayes' Rule
  • Using Bayes' Rule with causal knowledge
  • we want to determine diagnostic knowledge
    (diagnostic reasoning) that is difficult to
    obtain from a general population
  • e.g. the symptom is s = stiffNeck, the disease is
    m = meningitis
  • P(s|m) = 1/2, the causal knowledge
  • P(m) = 1/50000, P(s) = 1/20, the prior probabilities
  • P(m|s) = ?, the desired diagnostic knowledge
  • = (1/2 × 1/50000) / (1/20) = 1/5000
  • the doctor can now use P(m|s) to guide diagnosis
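The meningitis calculation as a sketch, reusing the bayes helper from the earlier sketch:

p_s_given_m = 1 / 2      # causal knowledge: P(stiffNeck | meningitis)
p_m = 1 / 50000          # prior P(meningitis)
p_s = 1 / 20             # prior P(stiffNeck)
print(bayes(p_s_given_m, p_m, p_s))   # P(m | s) = 1/5000 = 0.0002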

24
Combining Multiple Evidence using Bayes' Rule
  • How do you update the conditional probability of Y
    given two pieces of evidence A and B?
  • General Bayes' Rule for multi-valued RVs:
    P(Y|X) = P(X|Y) P(Y) / P(X)
  • let X = A,B
  • P(Y|A,B) = P(A,B|Y) P(Y) / P(A,B)
    = P(Y) (P(B|A,Y) P(A|Y)) / (P(B|A) P(A))
  • = P(Y) (P(A|Y)/P(A)) (P(B|A,Y)/P(B|A))
  • conditionalized chain rule and product rule used
  • Problems:
  • P(B|A,Y) is generally hard to compute or obtain
  • doesn't scale well for n evidence RVs; the table
    size grows O(2^n)

25
Combining Multiple Evidence using Bayes' Rule
  • These problems can be circumvented.
  • If A and B are conditionally independent given Y,
    then P(A,B|Y) = P(A|Y) P(B|Y), and for P(A,B) use
    the product rule
  • P(Y|A,B) = P(Y) P(A,B|Y) / P(A,B)   (Bayes' rule,
    multiple evidence)
  • P(Y|A,B) = P(Y) (P(A|Y)/P(A)) (P(B|Y)/P(B|A))
  • no joint probabilities needed; the representation
    grows O(n)
  • If A is unconditionally independent of B, then
    P(A,B|Y) = P(A|Y) P(B|Y) and P(A,B) = P(A) P(B)
  • P(Y|A,B) = P(Y) P(A,B|Y) / P(A,B)   (Bayes' rule,
    multiple evidence)
    P(Y|A,B) = P(Y) (P(A|Y)/P(A)) (P(B|Y)/P(B))
  • This equation is used to define a naïve Bayes
    classifier.

26
Combining Multiple Evidence using Bayes' Rule
  • Example
  • What is the likelihood that a patient has
    sclerosing cholangitis?
  • the doctor's initial belief: P(sc) = 1/1,000,000
  • an exam reveals jaundice: P(j) = 1/10,000,
    P(j|sc) = 1/5
  • the doctor's belief given the test result:
    P(sc|j) = P(sc) P(j|sc) / P(j) = 2/1000
  • tests reveal fibrosis of the bile ducts:
    P(f|sc) = 4/5, P(f) = 1/100
  • the doctor naïvely assumes jaundice and fibrosis
    are independent
  • the doctor's belief now rises: P(sc|j,f) = 16/100
    P(sc|j,f) = P(sc) (P(j|sc)/P(j)) (P(f|sc)/P(f))
    cf. P(Y|A,B) = P(Y) (P(A|Y)/P(A)) (P(B|Y)/P(B))
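The same calculation as a sketch (the variable names are illustrative assumptions):

# Naive combination of two pieces of evidence:
# P(sc | j, f) = P(sc) * (P(j | sc) / P(j)) * (P(f | sc) / P(f))
p_sc = 1 / 1_000_000
p_j, p_j_given_sc = 1 / 10_000, 1 / 5
p_f, p_f_given_sc = 1 / 100, 4 / 5

p_sc_given_j = p_sc * p_j_given_sc / p_j                              # 0.002
p_sc_given_j_f = p_sc * (p_j_given_sc / p_j) * (p_f_given_sc / p_f)   # 0.16
print(p_sc_given_j, p_sc_given_j_f)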

27
Naïve Bayes Classifier
  • A naïve Bayes classifier is used where a single
    class is based on a number of features, or where a
    single cause influences a number of effects
  • based on P(Y|A,B) = P(Y) (P(A|Y)/P(A)) (P(B|Y)/P(B))
  • given RV C
  • its domain is the possible classifications, say
    c1, c2, c3
  • classify an input example with features F1, ..., Fn
  • compute
  • P(c1|F1, ..., Fn), P(c2|F1, ..., Fn), P(c3|F1, ..., Fn)
  • naïvely assume the features are independent
  • choose the value of C that gives the maximum
    probability
  • works surprisingly well in practice even when the
    independence assumptions aren't true
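A minimal naïve Bayes classifier sketch (the data layout and function name are assumptions, not from the slides). It drops the denominators P(F1), ..., P(Fn), which are the same for every class, and picks the class c that maximizes P(c) times the product of the P(f|c):

def naive_bayes_classify(priors, likelihoods, features):
    """priors: {class: P(c)}; likelihoods: {class: {feature: P(f | c)}}."""
    def score(c):
        s = priors[c]
        for f in features:
            s *= likelihoods[c][f]   # naive independence assumption
        return s
    return max(priors, key=score)    # class with the maximum probability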

28
Bayesian Networks
  • AKA Bayes Nets, Belief Nets, Causal Nets, etc.
  • Encodes the full joint probability distribution
    (FJPD) for the set of RVs defining a problem
    domain
  • Uses a space-efficient data structure by exploiting
  • the fact that dependencies between RVs are
    generally local
  • which results in lots of conditionally independent
    RVs
  • Captures both qualitative and quantitative
    relationships between RVs

29
Bayesian Networks
  • Can be used to compute any value in FJPD
  • Can be used to reason
  • predictive/causal reasoning: forward (top-down)
    from causes to effects
  • diagnostic reasoning: backward (bottom-up) from
    effects to causes

30
Bayesian Network Representation
  • Is an augmented DAG (i.e. directed acyclic graph)
  • Represented by (V, E) where
  • V is a set of vertices
  • E is a set of directed edges joining vertices; no
    loops
  • Each vertex contains
  • the RV's name
  • either a prior probability distribution or a
    conditional probability distribution table (CDT)
    that quantifies the effects of the parents on this
    RV
  • Each directed arc
  • is from a cause (parent) to its immediate effects
    (children)
  • represents a direct causal relationship between RVs

31
Bayesian Network Representation
  • Example in class
  • each row in a conditional probability table must
    sum to 1
  • columns don't need to sum to 1
  • values are obtained from experts
  • The number of probabilities required is typically
    far fewer than the number required for a FJDT
  • Quantitative information is usually given by an
    expert or determined empirically from data

32
Conditional Independence
  • Assume effects are conditionally independent of
    each other given their common cause
  • The net is constructed so that, given its parents,
    a node is conditionally independent of its
    non-descendant RVs in the net
  • P(X1=x1, ..., Xn=xn) = P(x1 | parents(X1)) × ... ×
    P(xn | parents(Xn))
  • Note the full joint probability distribution isn't
    needed; we only need each RV's conditional
    probabilities relative to its parent RVs

33
Algorithm for Constructing Bayesian Networks
  • Choose a set of relevant random variables
  • Choose an ordering for them
  • Assume they're X1 .. Xm, where X1 is first, X2 is
    second, etc.
  • For i = 1 to m
  • add a new node for Xi to the network
  • set Parents(Xi) to be a minimal subset of X1 .. Xi-1
    such that we have conditional independence of Xi
    and all other members of X1 .. Xi-1 given Parents(Xi)
  • add a directed arc from each node in Parents(Xi)
    to Xi
  • non-root nodes define a conditional probability
    table, P(Xi = x | combinations of Parents(Xi));
    root nodes define a prior probability distribution
    at Xi, P(Xi)
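A structural sketch of this loop (the minimal_parents and elicit_cpt callbacks are hypothetical placeholders for the domain knowledge and probability elicitation the slides describe):

def build_network(ordered_vars, minimal_parents, elicit_cpt):
    # Returns {variable: (parents, table)} following the chosen ordering.
    net = {}
    for i, x in enumerate(ordered_vars):
        # minimal subset of the earlier variables that makes x conditionally
        # independent of the remaining earlier variables, given those parents
        parents = minimal_parents(x, ordered_vars[:i])
        # CPT for non-root nodes, prior distribution for root nodes
        net[x] = (parents, elicit_cpt(x, parents))
    return net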

34
Algorithm for Constructing Bayesian Networks
  • For a given set of random variables (RVs) there
    is not, in general, a unique Bayesian net, but all
    of them represent the same information
  • For the best net, topologically sort the RVs in
    step 2
  • each RV comes before all of its children
  • the first nodes are roots, then the nodes they
    directly influence
  • The best Bayesian network for a problem has
  • the fewest probabilities and arcs
  • CDT probabilities that are easy to determine
  • The algorithm won't construct a net that violates
    the rules of probability

35
Computing Joint Probabilities using a Bayesian Network
  • Use the product rule
  • Simplify using independence
  • For example:
  • Compute P(a,b,c,d) = P(d,c,b,a)
  • order the RVs in the joint probability bottom-up:
    D, C, B, A
  • = P(d|c,b,a) P(c,b,a)           product rule on P(d,c,b,a)
  • = P(d|c) P(c,b,a)               conditional independence of D given C
  • = P(d|c) P(c|b,a) P(b,a)        product rule on P(c,b,a)
  • = P(d|c) P(c|b,a) P(b|a) P(a)   product rule on P(b,a)
  • = P(d|c) P(c|b,a) P(b) P(a)     independence of B and A given no evidence

36
Computing Joint Probabilities using a Bayesian Network
  • Any entry in the full joint distribution table
    (i.e. any atomic event) can be computed!
  • P(v1,...,vn) = ∏ P(vi | Parents(Vi)), over i from 1
    to n
  • e.g. given boolean RVs, what is P(a,..,h,k,..,p)?
  • P(a) P(b) P(c) P(d|a,b) P(e|b,c) P(f) P(g|d,e) P(h)
    P(k|f,g) P(l|g,h) P(m|k) P(n|k) P(o|k,l) P(p|l)
  • Note this is fast, i.e. linear in the number of
    nodes in the net!
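A sketch of this product (the net data structure, mapping each variable to its parent list and a table keyed by (value, parent values), is an assumption for illustration):

def joint_probability(net, assignment):
    # P(v1,...,vn) = product over i of P(vi | Parents(Vi))
    p = 1.0
    for var, (parents, table) in net.items():
        parent_vals = tuple(assignment[par] for par in parents)
        p *= table[(assignment[var], parent_vals)]
    return p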

37
Computing Joint Probabilities using a Bayesian Network
  • How is any joint probability computed?
  • sum the relevant joint probabilities
  • e.g. compute P(a,b):
  • P(a,b,c,d) + P(a,b,c,¬d) + P(a,b,¬c,d) + P(a,b,¬c,¬d)
  • e.g. compute P(c):
  • P(a,b,c,d) + P(a,¬b,c,d) + P(¬a,b,c,d) + P(¬a,¬b,c,d)
    + P(a,b,c,¬d) + P(a,¬b,c,¬d) + P(¬a,b,c,¬d)
    + P(¬a,¬b,c,¬d)
  • A BN can answer any query (i.e. probability) about
    the domain by summing the relevant joint
    probabilities.
  • Enumeration can require many computations!
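Enumeration can be sketched on top of joint_probability from the earlier sketch (boolean RVs assumed, as on the slide):

from itertools import product

def enumerate_probability(net, partial):
    # Sum the joint probabilities of all completions of a partial assignment,
    # e.g. P(a, b) sums over the 2^(n-2) completions of {"A": True, "B": True}.
    hidden = [v for v in net if v not in partial]
    total = 0.0
    for values in product((True, False), repeat=len(hidden)):
        assignment = dict(partial, **dict(zip(hidden, values)))
        total += joint_probability(net, assignment)
    return total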

38
Computing Conditional Probabilities using a Bayesian Network
  • The basic task of a probabilistic system is to
    compute conditional probabilities.
  • Any conditional probability can be computed:
  • P(v1,...,vk | vk+1,...,vn)
    = ∑ P(V1=v1,...,Vn=vn) / ∑ P(Vk+1=vk+1,...,Vn=vn)
  • The key problem is that the technique of
    enumerating joint probabilities can make the
    computations intractable (exponential in the
    number of RVs).

39
Computing Conditional Probabilities using a Bayesian Network
  • These computations generally rely on the
    simplifications resulting from the independence
    of the RVs.
  • Every variable that isn't an ancestor of a query
    variable or an evidence variable is irrelevant to
    the query.
  • What ancestors are irrelevant?

40
Independence in a Bayesian Network
  • Given a Bayesian network, how is independence
    established?
  • A node is conditionally independent (CI) of its
    non-descendants, given its parents.
  • e.g. Given D and E, G is CI of ?

41
Independence in a Bayesian Network
  • Given a Bayesian network, how is independence
    established?
  • A node is conditionally independent (CI) of its
    non-descendants, given its parents.
  • e.g. Given D and E, G is CI of ?

A, B, C, F, H
e.g. Given F and G, K is CI of ?
42
Independence in a Bayesian Network
  • Given a Bayesian network, how is independence
    established?
  • A node is conditionally independent of all other
    nodes in the network given its parents, children,
    and children's parents, which is called its Markov
    blanket
  • e.g. What is the Markov blanket for G?

Given this blanket, G is CI of ? A, B, C, M, N, O, P
What about absolute independence?
43
Computing Conditional Probabilities using a Bayesian Network
  • The general algorithm for computing conditional
    probabilities is complicated.
  • It is easy if the query involves nodes that are
    directly connected to each other.
  • the examples are assumed to use boolean RVs
  • Simple causal inference: P(E|C)
  • the conditional prob. distribution of effect E
    given cause C as evidence
  • reasoning in the same direction as the arc, e.g.
    disease to symptom
  • Simple diagnostic inference: P(Q|E)
  • the conditional prob. distribution of query Q
    given effect E as evidence
  • reasoning in the direction opposite the arc, e.g.
    symptom to disease

44
Computing Conditional Probabilities: Causal (Top-Down) Inference
  • Compute P(e|c)
  • the conditional probability of effect E=e given
    cause C=c as evidence
  • assume arcs exist to E from C and from C2
  1. Rewrite the conditional probability of e in terms
     of e and all of its parents (that aren't evidence),
     given evidence c
  2. Re-express each joint probability back to the
     probability of e given all of its parents
  3. Simplify using independence, and look up the
     required values in the Bayesian network

45
Computing Conditional Probabilities: Causal (Top-Down) Inference
  • Compute P(e|c)
  • = P(e,c) / P(c)                                  product rule
  • = (P(e,c,c2) + P(e,c,¬c2)) / P(c)                marginalizing
  • = P(e,c,c2) / P(c) + P(e,c,¬c2) / P(c)           algebra
  • = P(e,c2|c) + P(e,¬c2|c)                         product rule, e.g. X = e,c2
  • = P(e|c2,c) P(c2|c) + P(e|¬c2,c) P(¬c2|c)        conditionalized chain rule
  • Simplify given that C and C2 are independent:
  • P(c2|c) = P(c2)
  • P(¬c2|c) = P(¬c2)
  • = P(e|c2,c) P(c2) + P(e|¬c2,c) P(¬c2)            algebra
  • now look up the values to finish the computation
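The last line of the derivation can be computed directly. A sketch, assuming (as above) that E's parents are C and C2 and that C and C2 are independent:

# P(e | c) = P(e | c2, c) P(c2) + P(e | not c2, c) P(not c2)
def causal_inference(p_e_given_c_c2, p_e_given_c_not_c2, p_c2):
    return p_e_given_c_c2 * p_c2 + p_e_given_c_not_c2 * (1 - p_c2)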

46
Computing Conditional Probabilities: Diagnostic (Bottom-Up) Inference
  • Compute P(c|e)
  • the conditional probability of cause C=c given
    effect E=e as evidence
  • assume an arc exists from C to E
  • idea: convert to causal inference using Bayes'
    rule
  • Use Bayes' rule: P(c|e) = P(e|c) P(c) / P(e)
  • Compute P(e|c) using the causal inference method
  • Look up the value of P(c) in the Bayesian net
  • Use normalization to avoid computing P(e)
  • this requires computing P(¬c|e)
  • using steps as in 1-3 above
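A sketch of the normalization step (the helper name is an assumption): compute the unnormalized products for c and ¬c, then divide, so P(e) is never computed explicitly:

# P(c | e) is proportional to P(e | c) P(c);
# normalize against P(e | not c) P(not c).
def diagnostic_inference(p_e_given_c, p_c, p_e_given_not_c):
    unnorm_c = p_e_given_c * p_c
    unnorm_not_c = p_e_given_not_c * (1 - p_c)
    return unnorm_c / (unnorm_c + unnorm_not_c)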

47
Summary: the Good News
  • Bayesian nets are the bread and butter of the
    AI-uncertainty community (like resolution for
    AI-logic)
  • Bayesian nets are a compact representation
  • they don't require exponential storage to hold all
    of the info in the full joint probability
    distribution (FJPD) table
  • they are a decomposed representation of the FJPD
    table
  • the conditional probability distribution tables in
    non-root nodes are only exponential in the maximum
    number of parents of any node
  • Bayesian nets are fast at computing joint
    probabilities P(V1, ..., Vk), i.e. the prior
    probability of V1, ..., Vk
  • computing the probability of an atomic event can
    be done in time linear in the number of nodes in
    the net

48
Summary: the Bad News
  • Conditional probabilities can also be computed
  • P(Q|E1, ..., Ek): the posterior probability of
    query Q given multiple pieces of evidence
    E1, ..., Ek
  • requires enumerating all of the matching entries,
    which takes exponential time in the number of
    variables
  • in special cases it can be done faster, in
    polynomial time or less; e.g. for a polytree (a
    net structured like a tree) it takes linear time
  • In general, inference in Bayesian networks (BNs)
    is NP-hard.
  • but BNs are well studied, so there exist many
    efficient exact solution methods as well as a
    variety of approximation techniques