Abduction, Uncertainty, and Probabilistic Reasoning - PowerPoint PPT Presentation

Slides: 67
Provided by: YunP151
1
Abduction, Uncertainty, and Probabilistic
Reasoning
  • Chapters 13, 14, and more

2
Introduction
  • Abduction is a reasoning process that tries to
    form plausible explanations for abnormal
    observations
  • Abduction is distinct from deduction
    and induction
  • Abduction is inherently uncertain
  • Uncertainty becomes an important issue in AI
    research
  • Some major formalisms for representing and
    reasoning about uncertainty
  • Mycin's certainty factor (an early
    representative)
  • Probability theory (esp. Bayesian networks)
  • Dempster-Shafer theory
  • Fuzzy logic
  • Truth maintenance systems

3
Abduction
  • Definition (Encyclopedia Britannica): reasoning
    that derives an explanatory hypothesis from a
    given set of facts
  • The inference result is a hypothesis which, if
    true, could explain the occurrence of the given
    facts
  • Examples
  • Dendral, an expert system to construct the 3D
    structure of chemical compounds
  • Facts: mass spectrometer data of the compound and
    the chemical formula of the compound
  • KB: chemistry, esp. strength of different types
    of bonds
  • Reasoning: form a hypothetical 3D structure that
    meets the given chemical formula, and would most
    likely produce the given mass spectrum if
    subjected to electron beam bombardment

4
  • Medical diagnosis
  • Facts: symptoms, lab test results, and other
    observed findings (called manifestations)
  • KB: causal associations between diseases and
    manifestations
  • Reasoning: one or more diseases whose presence
    would causally explain the occurrence of the
    given manifestations
  • Many other reasoning processes (e.g., word sense
    disambiguation in natural language processing, image
    understanding, detective work, etc.) can also
    be seen as abductive reasoning.

5
Comparing abduction, deduction and induction
  • Deduction
  • major premise: All balls in the box are black
  • minor premise: This ball is from the box
  • conclusion: This ball is black
  • Abduction
  • rule: All balls in the box are black
  • observation: This ball is black
  • explanation: This ball is from the box
  • Induction
  • case: These balls are from the box
  • observation: These balls are black
  • hypothesized rule: All balls in the box are black

Deduction: A → B and A, therefore B
Abduction: A → B and B, therefore possibly A
Induction: whenever A then B (but not vice versa), therefore possibly A → B
Induction goes from specific cases to general
rules. Abduction and deduction both go from
part of a specific case to another part of
the case using general rules (in different ways).
6
Characteristics of abduction reasoning
  • Reasoning results are hypotheses, not theorems
    (they may be false even if rules and facts are true)
  • e.g., misdiagnosis in medicine
  • There may be multiple plausible hypotheses
  • When given rules A → B and C → B, and fact B,
  • both A and C are plausible hypotheses
  • Abduction is inherently uncertain
  • Hypotheses can be ranked by their plausibility if
    that can be determined
  • Reasoning is often a hypothesize-and-test cycle
  • Hypothesize phase: postulate possible hypotheses,
    each of which could explain the given facts (or
    explain most of the important facts)
  • Test phase: test the plausibility of all or some
    of these hypotheses

7
  • One way to test a hypothesis H is to test if
    something that is currently unknown but can be
    predicted from H is actually true.
  • If we also know A → D and C → E, then ask if D
    and E are true.
  • If it turns out D is true and E is false, then
    hypothesis A becomes more plausible (support for
    A increased, support for C decreased)
  • Alternative hypotheses compete with each other
    (Occam's razor prefers simpler hypotheses)
  • Reasoning is non-monotonic
  • Plausibility of hypotheses can increase/decrease
    as new facts are collected (deductive inference
    determines if a sentence is true but would never
    change its truth value)
  • Some hypotheses may be discarded/defeated, and
    new ones may be formed when new observations are
    made

8
Sources of Uncertainty in Intelligent Systems
  • Uncertain data (noise)
  • Uncertain knowledge (e.g., causal relations)
  • A disorder may cause any and all POSSIBLE
    manifestations in a specific case
  • A manifestation can be caused by more than one
    POSSIBLE disorder
  • Uncertain reasoning results
  • Abduction and induction are inherently uncertain
  • Default reasoning, even in deductive fashion, is
    uncertain
  • Incomplete deductive inference may be uncertain

9
Probabilistic Inference
  • Based on probability theory (especially Bayes
    theorem)
  • A well-established discipline about uncertain
    outcomes
  • An empirical science, like physics/chemistry: it can be
    verified by experiments
  • Probability theory is too rigid to apply directly
    in many knowledge-based applications
  • Some assumptions have to be made to simplify the
    reality
  • Different formalisms have been developed in which
    some aspects of the probability theory are
    changed/modified.
  • We will briefly review the basics of probability
    theory before discussing different approaches to
    uncertainty
  • The presentation uses the diagnostic process (an
    abductive and evidential reasoning process) as an
    example

10
Probability of Events
  • Sample space and events
  • Sample space S (e.g., all people in an area)
  • Events $E_1 \subseteq S$ (e.g., all people having
    cough)
  • $E_2 \subseteq S$ (e.g., all people having
    cold)
  • Prior (marginal) probabilities of events
  • $P(E) = |E| / |S|$ (frequency interpretation)
  • $P(E) = 0.1$ (subjective probability)
  • $0 \le P(E) \le 1$ for all events
  • Two special events $\emptyset$ and S: $P(\emptyset) = 0$ and $P(S) = 1.0$
  • Boolean operators between events (to form
    compound events)
  • Conjunctive (intersection): $E_1 \wedge E_2$ ($= E_1 \cap E_2$)
  • Disjunctive (union): $E_1 \vee E_2$ ($= E_1 \cup E_2$)
  • Negation (complement): $\neg E$ ($= E^C = S - E$)
11
  • Probabilities of compound events
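  • $P(\neg E) = 1 - P(E)$ because $P(E) + P(\neg E) = 1$
  • $P(E_1 \vee E_2) = P(E_1) + P(E_2) - P(E_1 \wedge E_2)$
  • But how to compute the joint probability $P(E_1 \wedge E_2)$?
  • Conditional probability (of E1, given E2):
    $P(E_1 \mid E_2) = P(E_1 \wedge E_2) / P(E_2)$
  • How likely E1 occurs in the subspace of E2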

12
  • Independence assumption
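  • Two events E1 and E2 are said to be independent
    of each other if $P(E_1 \mid E_2) = P(E_1)$
  • (given E2 does not change the likelihood of E1)
  • Computation can be simplified with independent
    events: $P(E_1 \wedge E_2) = P(E_1) P(E_2)$
  • Mutually exclusive (ME) and exhaustive (EXH) set
    of events $E_1, \dots, E_n$
  • ME: $E_i \cap E_j = \emptyset$ (so $P(E_i \wedge E_j) = 0$) for all $i \ne j$
  • EXH: $E_1 \cup \dots \cup E_n = S$ (so $P(E_1 \vee \dots \vee E_n) = 1$)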

13
Bayes Theorem
  • In the setting of diagnostic/evidential reasoning
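  • Know prior probability of hypothesis: $P(H)$
  • conditional probability: $P(E \mid H)$
  • Want to compute the posterior probability: $P(H \mid E)$
  • Bayes theorem (formula 1): $P(H \mid E) = \frac{P(E \mid H) P(H)}{P(E)}$
  • If the purpose is to find which of the n
    hypotheses $H_1, \dots, H_n$
  • is more plausible given E, then we can ignore
    the denominator and rank them using the relative
    likelihood $rel(H_i \mid E) = P(E \mid H_i) P(H_i)$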

14
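  • $P(E)$ can be computed from $P(E \mid H_i)$
    and $P(H_i)$, if we assume all hypotheses
    are ME and EXH: $P(E) = \sum_{i=1}^{n} P(E \mid H_i) P(H_i)$
  • Then we have another version of Bayes theorem:
    $P(H_i \mid E) = \frac{P(E \mid H_i) P(H_i)}{\sum_{j=1}^{n} P(E \mid H_j) P(H_j)}$
  • where $\sum_{j=1}^{n} P(E \mid H_j) P(H_j)$, the sum of relative
    likelihoods of all n hypotheses, is a
    normalization factor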
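As a rough illustration of the normalized form, here is a minimal Python sketch. It assumes n hypotheses that are ME and EXH. The function name `posteriors` and the numbers in the usage line are hypothetical and are not taken from the slides.

```python
def posteriors(priors, likelihoods):
    """Return P(H_i | E) for hypotheses assumed to be ME and EXH.

    priors[i]      -- P(H_i), assumed to sum to 1
    likelihoods[i] -- P(E | H_i)
    """
    # Relative likelihood of each hypothesis: P(E | H_i) * P(H_i)
    rel = [p * l for p, l in zip(priors, likelihoods)]
    # Normalization factor: sum of relative likelihoods (equals P(E) under ME/EXH)
    norm = sum(rel)
    return [r / norm for r in rel]


# Hypothetical numbers: three hypotheses and one piece of evidence
post = posteriors([0.7, 0.2, 0.1], [0.1, 0.5, 0.9])
print(post)  # H_2 ranks highest, as it would by relative likelihood alone
```

Every hypothesis is divided by the same normalization factor, so the ranking never changes. The factor only matters when absolute posteriors are needed.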

15
Probabilistic Inference for simple diagnostic
problems
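  • Knowledge base: hypotheses $H_1, \dots, H_n$ (ME and EXH),
    evidence $E_1, \dots, E_m$, priors $P(H_i)$, and
    conditional probabilities $P(E_j \mid H_i)$
  • Case input: $E_1, \dots, E_l$
  • Find the hypothesis with the highest
    posterior probability $P(H_i \mid E_1, \dots, E_l)$
  • By Bayes theorem:
    $P(H_i \mid E_1, \dots, E_l) = \frac{P(E_1, \dots, E_l \mid H_i) P(H_i)}{P(E_1, \dots, E_l)}$
  • Assume all pieces of evidence are conditionally
    independent, given any hypothesis:
    $P(E_1, \dots, E_l \mid H_i) = \prod_{j=1}^{l} P(E_j \mid H_i)$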

16
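  • The relative likelihood:
    $rel(H_i \mid E_1, \dots, E_l) = P(H_i) \prod_{j=1}^{l} P(E_j \mid H_i)$
  • The absolute posterior probability:
    $P(H_i \mid E_1, \dots, E_l) = \frac{rel(H_i \mid E_1, \dots, E_l)}{\sum_{k=1}^{n} rel(H_k \mid E_1, \dots, E_l)}$
  • Evidence accumulation (when new evidence is
    discovered)
  • If $E_{l+1}$ is present:
    $rel(H_i \mid E_1, \dots, E_l, E_{l+1}) = rel(H_i \mid E_1, \dots, E_l) \cdot P(E_{l+1} \mid H_i)$
  • If $E_{l+1}$ is absent:
    $rel(H_i \mid E_1, \dots, E_l, \neg E_{l+1}) = rel(H_i \mid E_1, \dots, E_l) \cdot (1 - P(E_{l+1} \mid H_i))$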
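The evidence-accumulation step can be sketched as follows. Each new finding multiplies the running relative likelihoods, and normalization happens only at the end. The two hypotheses, two findings, and all numbers are hypothetical, chosen only for illustration. The sketch assumes ME/EXH hypotheses and conditionally independent findings, as the slides do.

```python
def normalize(rel):
    """Absolute posteriors: divide each relative likelihood by their sum."""
    total = sum(rel.values())
    return {h: r / total for h, r in rel.items()}


def accumulate(rel, cond, finding, present):
    """Evidence accumulation for one new finding E_{l+1}.

    Multiplies rel(H_i | ...) by P(E_{l+1} | H_i) if the finding is present,
    or by 1 - P(E_{l+1} | H_i) if it is absent.
    """
    return {h: r * (cond[h][finding] if present else 1.0 - cond[h][finding])
            for h, r in rel.items()}


# Hypothetical KB: two ME/EXH hypotheses and two findings (illustrative numbers)
prior = {"flu": 0.1, "cold": 0.9}
cond = {"flu": {"fever": 0.9, "cough": 0.8},
        "cold": {"fever": 0.1, "cough": 0.6}}

rel = dict(prior)                                    # no evidence yet: rel = prior
rel = accumulate(rel, cond, "fever", present=True)   # fever observed
rel = accumulate(rel, cond, "cough", present=False)  # cough known to be absent
post = normalize(rel)
print(post)  # flu about 0.333, cold about 0.667
```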

17
Assessing the Assumptions
  • Assumption 1: hypotheses are mutually exclusive
    and exhaustive
  • Single fault assumption (one and only one hypothesis
    must be true)
  • Multi-faults do exist in individual cases
  • Can be viewed as an approximation of situations
    where hypotheses are independent of each other
    and their prior probabilities are very small
  • Assumption 2: pieces of evidence are
    conditionally independent of each other, given
    any hypothesis
  • Manifestations themselves are not independent of
    each other; they are correlated by their common
    causes
  • Reasonable under single fault assumption
  • Not so when multi-faults are to be considered

18
Limitations of the simple Bayesian system
  • Cannot handle well hypotheses of multiple
    disorders
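  • Suppose $H_1$ and $H_2$ are independent of
    each other
  • Consider a composite hypothesis $H_1 \wedge H_2$
  • How to compute the posterior probability (or
    relative likelihood) $P(H_1 \wedge H_2 \mid E)$?
  • Using Bayes theorem:
    $P(H_1 \wedge H_2 \mid E) = \frac{P(E \mid H_1 \wedge H_2) P(H_1 \wedge H_2)}{P(E)}$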

19
  • but this is a very unreasonable assumption
  • Cannot handle causal chaining
  • Ex.: A: weather of the year
  • B: cotton production of the year
  • C: cotton price of next year
  • Observed: A influences C
  • The influence is not direct (A → B → C)
  • $P(C \mid B, A) = P(C \mid B)$: instantiation of B blocks
    influence of A on C
  • Need a better representation and a better
    assumption

E and B are independent. But when A is given, they
are (adversely) dependent because they become
competitors to explain A: $P(B \mid A, E) \ll P(B \mid A)$
20
Bayesian Networks (BNs)
  • Definition: BN = (DAG, CPD)
  • DAG: directed acyclic graph (BN's structure)
  • Nodes: random variables (typically binary or
    discrete, but methods also exist to handle
    continuous variables)
  • Arcs: indicate probabilistic dependencies between
    nodes (lack of link signifies conditional
    independence)
  • CPD: conditional probability distribution (BN's
    parameters)
  • Conditional probabilities at each node, usually
    stored as a table (conditional probability table,
    or CPT)
  • Root nodes are a special case: no parents, so
    just use priors in CPD

21
Example BN
Network: A → B, A → C, B → D, C → D, C → E
$P(a_0) = 0.001$
$P(b_0 \mid a_0) = 0.3$    $P(b_0 \mid a_1) = 0.001$
$P(c_0 \mid a_0) = 0.2$    $P(c_0 \mid a_1) = 0.005$
$P(d_0 \mid b, c)$:    b0       b1
          c0     0.1      0.01
          c1     0.01     0.00001
$P(e_0 \mid c_0) = 0.4$    $P(e_0 \mid c_1) = 0.002$
Uppercase: variables (A, B, ...). Lowercase:
values/states of variables (A has two states a0
and a1).
Note that we only specify P(a0) etc., not P(a1),
since they have to add to one.
22
Netica
  • A commercial BN package by Norsys
  • Download a limited version for free from
  • http://www.norsys.com/
  • May also download APIs

23
Conditional independence and chaining
  • Conditional independence assumption:
    $P(x_i \mid \pi_i, q) = P(x_i \mid \pi_i)$,
    where q is any set of variables (nodes) other
    than $x_i$ and its successors
  • $\pi_i$ (the parents of $x_i$) blocks the influence of other
    nodes on $x_i$ and its successors (q influences $x_i$
    only through variables in $\pi_i$)
  • With this assumption, the complete joint
    probability distribution of all variables in the
    network can be represented by (recovered from)
    local CPDs by chaining these CPDs:
    $P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \pi_i)$
24
Chaining Example
  • Computing the joint probability for all
    variables is easy
  • The joint distribution of all variables:
  • $P(A, B, C, D, E)$
  • $= P(E \mid A, B, C, D) P(A, B, C, D)$ by the product rule
    (definition of conditional probability)
  • $= P(E \mid C) P(A, B, C, D)$ by cond. indep.
    assumption
  • $= P(E \mid C) P(D \mid A, B, C) P(A, B, C)$
  • $= P(E \mid C) P(D \mid B, C) P(C \mid A, B) P(A, B)$
  • $= P(E \mid C) P(D \mid B, C) P(C \mid A) P(B \mid A) P(A)$
  • For a particular state:
  • $P(a_0, b_0, c_1, d_1, e_0) = P(a_0) P(b_0 \mid a_0) P(c_1 \mid a_0) P(d_1 \mid b_0, c_1) P(e_0 \mid c_1)$
  • $= 0.001 \times 0.3 \times 0.8 \times 0.99 \times 0.002 = 4.752 \times 10^{-7}$
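The chaining computation can be checked with a short Python sketch of the example BN from slide 21. The encoding is an assumption of this sketch: state 0 stands for x0, state 1 for x1, and each table stores only P(X = x0 | parents).

```python
from itertools import product

# CPTs of the example BN (A -> B, A -> C, B -> D, C -> D, C -> E).
# State 0 stands for x0 and state 1 for x1; each table stores P(X = x0 | parents).
P_A0 = 0.001
P_B0 = {0: 0.3, 1: 0.001}                 # keyed by a
P_C0 = {0: 0.2, 1: 0.005}                 # keyed by a
P_D0 = {(0, 0): 0.1, (0, 1): 0.01,        # keyed by (b, c)
        (1, 0): 0.01, (1, 1): 0.00001}
P_E0 = {0: 0.4, 1: 0.002}                 # keyed by c


def pr(p0, x):
    """P(X = x) for a binary variable, given p0 = P(X = x0)."""
    return p0 if x == 0 else 1.0 - p0


def joint(a, b, c, d, e):
    """P(a, b, c, d, e) = P(a) P(b|a) P(c|a) P(d|b,c) P(e|c), by chaining CPTs."""
    return (pr(P_A0, a) * pr(P_B0[a], b) * pr(P_C0[a], c)
            * pr(P_D0[(b, c)], d) * pr(P_E0[c], e))


print(joint(0, 0, 1, 1, 0))  # P(a0, b0, c1, d1, e0), about 4.752e-07
# The chained joint is a proper distribution: it sums to 1 over all 32 states
print(sum(joint(*s) for s in product((0, 1), repeat=5)))
```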

25
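Network: B → A ← E (A has parents B and E)
$P(E) = 0.002$    $P(B) = 0.01$
$P(E \mid A) = 0.167$    $P(B \mid A) = 0.835$    $P(E \mid A, E) = 1.0$
$P(B \mid A, E) = 0.0112$
$P(A \mid B, E)$:    B      ¬B
          E      0.9    0.8
          ¬E     0.8    0.0
$P(B \mid A, E) = \frac{P(B, A, E)}{P(A, E)} = \frac{P(B, A, E)}{P(B, A, E) + P(\neg B, A, E)}$
$= \frac{0.01 \times 0.002 \times 0.9}{0.01 \times 0.002 \times 0.9 + 0.99 \times 0.002 \times 0.8} = \frac{0.000018}{0.000018 + 0.001584} = \frac{0.000018}{0.001602} \approx 0.0112$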
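The "explaining away" numbers on this slide can be reproduced by brute-force summation over the full joint of the three-node network. The sketch below is a minimal illustration. The dictionary-based `prob` helper is hypothetical; it is not an API from any BN package.

```python
from itertools import product

P_B = 0.01   # P(B = true)
P_E = 0.002  # P(E = true)
# P(A = true | B, E), keyed by (b, e)
P_A = {(True, True): 0.9, (False, True): 0.8,
       (True, False): 0.8, (False, False): 0.0}


def joint(b, e, a):
    """P(b, e, a) = P(b) P(e) P(a | b, e)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa


def prob(query, evidence):
    """P(query | evidence) by summing the joint over all consistent worlds."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        world = {"B": b, "E": e, "A": a}
        if all(world[k] == v for k, v in evidence.items()):
            w = joint(b, e, a)
            den += w
            if all(world[k] == v for k, v in query.items()):
                num += w
    return num / den


print(round(prob({"B": True}, {"A": True}), 3))             # 0.835
print(round(prob({"E": True}, {"A": True}), 3))             # 0.167
print(round(prob({"B": True}, {"A": True, "E": True}), 4))  # 0.0112
```

Once the earthquake is also observed, it accounts for the alarm. The posterior of a burglary then falls from about 0.835 back to about 0.011, close to its prior of 0.01.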
26
Topological semantics
  • A node is conditionally independent of its
    non-descendants given its parents
  • A node is conditionally independent of all other
    nodes in the network given its parents, children,
    and children's parents (also known as its Markov
    blanket)
  • The method called d-separation can be applied to
    decide whether a set of nodes X is independent of
    another set Y, given a third set Z

Chain: A and C are independent, given B
Converging: B and C are independent, NOT given A
Diverging: B and C are independent, given A
27
Inference tasks
  • Simple queries: compute posterior probability
    $P(X_i \mid E = e)$
  • E.g., $P(NoGas \mid Gauge = empty, Lights = on, Starts = false)$
  • Posteriors for ALL nonevidence nodes
  • Priors for all nodes ($E = \emptyset$)
  • Conjunctive queries
  • $P(X_i, X_j \mid E = e) = P(X_i \mid E = e) P(X_j \mid X_i, E = e)$
  • Optimal decisions: decision networks or influence
    diagrams
  • include utility information and actions
  • inference is to find $P(outcome \mid action, evidence)$
  • Value of information: which evidence should we
    seek next?
  • Sensitivity analysis: which probability values
    are most critical?
  • Explanation: why do I need a new starter motor?

28
  • MAP problems (explanation)
  • The solution provides a good explanation for your
    action
  • This is an optimization problem

29
Approaches to inference
  • Exact inference
  • Enumeration
  • Variable elimination
  • Belief propagation in polytrees (singly connected
    BNs)
  • Clustering / junction tree algorithms
  • Approximate inference
  • Stochastic simulation / sampling methods
  • Markov chain Monte Carlo methods
  • Loopy propagation
  • Others
  • Mean field theory
  • Neural networks

30
Inference by enumeration
  • To compute $P(X \mid E = e)$, where X is a single variable
    and E is evidence (instantiation of a set of
    variables)
  • Add all of the terms (atomic event probabilities)
    from the full joint distribution that are
    consistent with E
  • If Y are the other (unobserved) variables,
    excluding X, then the posterior distribution is
  • $P(X \mid E = e) = \alpha P(X, e) = \alpha \sum_y P(X, e, y)$
  • Sum is over all possible instantiations of
    variables in Y
  • Each P(X, e, Y) term can be computed using the
    chain rule
  • Computationally expensive!

31
Example Enumeration
  • $P(x_i) = \sum_{\pi_i} P(x_i \mid \pi_i) P(\pi_i)$
  • Suppose we want $P(D \mid E = e)$, i.e., only the value of E is
    given (as true)
  • $P(D \mid e) = \alpha \sum_{A,B,C} P(a, b, c, D, e)$
    $= \alpha \sum_{A,B,C} P(a) P(b \mid a) P(c \mid a) P(D \mid b, c) P(e \mid c)$
  • With simple iteration to compute this expression,
    there's going to be a lot of repetition (e.g.,
    $P(e \mid c)$ has to be recomputed every time we iterate
    over C for all possible assignments of A and B)
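The enumeration formula above translates almost literally into Python. The sketch uses the example BN of slide 21, encoded so that state 0 means x0 and each table stores P(X = x0 | parents); this encoding is an assumption of the sketch. The evidence value e is fixed, but the full joint is still recomputed for every assignment of A, B, and C, which is the repetition the slide points out.

```python
from itertools import product

# CPTs of the example BN (slide 21); state 0 = x0, tables store P(X = x0 | parents).
P_A0 = 0.001
P_B0 = {0: 0.3, 1: 0.001}
P_C0 = {0: 0.2, 1: 0.005}
P_D0 = {(0, 0): 0.1, (0, 1): 0.01, (1, 0): 0.01, (1, 1): 0.00001}
P_E0 = {0: 0.4, 1: 0.002}


def pr(p0, x):
    """P(X = x) for a binary variable, given p0 = P(X = x0)."""
    return p0 if x == 0 else 1.0 - p0


def joint(a, b, c, d, e):
    return (pr(P_A0, a) * pr(P_B0[a], b) * pr(P_C0[a], c)
            * pr(P_D0[(b, c)], d) * pr(P_E0[c], e))


def enumerate_d_given_e(e):
    """P(D | E = e) by enumeration: sum the joint over A, B, C, then normalize."""
    unnormalized = [sum(joint(a, b, c, d, e) for a, b, c in product((0, 1), repeat=3))
                    for d in (0, 1)]
    alpha = 1.0 / sum(unnormalized)  # alpha = 1 / P(e)
    return [alpha * u for u in unnormalized]


print(enumerate_d_given_e(0))  # [P(d0 | e0), P(d1 | e0)]
```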

32
Exercise Enumeration
Network: Smart → Prepared ← Study; Smart → Pass, Prepared → Pass, Fair → Pass
p(smart) = .8    p(study) = .6    p(fair) = .9

p(prep | Smart, Study)    smart    ¬smart
  study                   .9       .7
  ¬study                  .5       .1

p(pass | Smart, Prep, Fair)    smart, prep    smart, ¬prep    ¬smart, prep    ¬smart, ¬prep
  fair                         .9             .7              .7              .2
  ¬fair                        .1             .1              .1              .1

Query: What is the probability that a student
studied, given that they pass the exam?
33
Variable elimination
  • Basically just enumeration, but with caching of
    local calculations
  • Linear for polytrees
  • Potentially exponential for multiply connected
    BNs
  • Exact inference in Bayesian networks is NP-hard!

34
Variable elimination
  • General idea
  • Write query in the form
    $P(X_1, e) = \sum_{x_n} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i \mid \pi_i)$
  • Iteratively
  • Move all irrelevant terms outside of innermost
    sum
  • Perform innermost sum, getting a new term
  • Insert the new term into the product

35
Variable elimination
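Enumeration: 8 × 4 = 32 multiplications; variable elimination: 8 × 2 + 4 + 2 = 22 multiplications
  • Example
  • $\sum_A \sum_B \sum_C P(a) P(b \mid a) P(c \mid a) P(d \mid b, c) P(e \mid c)$
  • $= \sum_A \sum_B P(a) P(b \mid a) \sum_C P(c \mid a) P(d \mid b, c) P(e \mid c)$
  • $= \sum_A P(a) \sum_B P(b \mid a) \sum_C P(c \mid a) P(d \mid b, c) P(e \mid c)$
  • for each state a of A
  • for each state b of B
  • compute $f_C(a, b) = \sum_C P(c \mid a) P(d \mid b, c) P(e \mid c)$
  • compute $f_B(a) = \sum_B P(b \mid a) f_C(a, b)$
  • Compute result $\sum_A P(a) f_B(a)$
  • Here $f_C(a, b)$ and $f_B(a)$ are called factors, which
    are vectors or matrices

Variable C is summed out; variable B is summed
out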
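The factor computation above can be sketched in Python on the same example BN. The sketch builds f_C(a, b) as a 2 × 2 table, then f_B(a) as a vector, and finally the sum over A, then compares the result with plain enumeration. The encoding (state 0 = x0, tables store P(X = x0 | parents)) is an assumption of this sketch. The two methods differ only in the number of multiplications, not in the answer.

```python
from itertools import product

# CPTs of the example BN (slide 21); state 0 = x0, tables store P(X = x0 | parents).
P_A0 = 0.001
P_B0 = {0: 0.3, 1: 0.001}
P_C0 = {0: 0.2, 1: 0.005}
P_D0 = {(0, 0): 0.1, (0, 1): 0.01, (1, 0): 0.01, (1, 1): 0.00001}
P_E0 = {0: 0.4, 1: 0.002}


def pr(p0, x):
    """P(X = x) for a binary variable, given p0 = P(X = x0)."""
    return p0 if x == 0 else 1.0 - p0


def ve_unnormalized(d, e):
    """sum_A P(a) sum_B P(b|a) sum_C P(c|a) P(d|b,c) P(e|c), innermost sum first."""
    # Factor f_C(a, b): variable C is summed out (a 2 x 2 table over a, b)
    f_C = {(a, b): sum(pr(P_C0[a], c) * pr(P_D0[(b, c)], d) * pr(P_E0[c], e)
                       for c in (0, 1))
           for a, b in product((0, 1), repeat=2)}
    # Factor f_B(a): variable B is summed out (a vector over a)
    f_B = {a: sum(pr(P_B0[a], b) * f_C[(a, b)] for b in (0, 1)) for a in (0, 1)}
    # Result: sum over A of P(a) f_B(a)
    return sum(pr(P_A0, a) * f_B[a] for a in (0, 1))


def ve_d_given_e(e):
    unnormalized = [ve_unnormalized(d, e) for d in (0, 1)]
    total = sum(unnormalized)  # equals P(e)
    return [u / total for u in unnormalized]


def brute_force_d_given_e(e):
    """Plain enumeration over A, B, C, for comparison."""
    def joint(a, b, c, d):
        return (pr(P_A0, a) * pr(P_B0[a], b) * pr(P_C0[a], c)
                * pr(P_D0[(b, c)], d) * pr(P_E0[c], e))
    unnormalized = [sum(joint(a, b, c, d) for a, b, c in product((0, 1), repeat=3))
                    for d in (0, 1)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


print(ve_d_given_e(0))
print(brute_force_d_given_e(0))  # same values, up to floating-point rounding
```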
36
Exercise Variable elimination
Network: Smart → Prepared ← Study; Smart → Pass, Prepared → Pass, Fair → Pass
p(smart) = .8    p(study) = .6    p(fair) = .9

p(prep | Smart, Study)    smart    ¬smart
  study                   .9       .7
  ¬study                  .5       .1

p(pass | Smart, Prep, Fair)    smart, prep    smart, ¬prep    ¬smart, prep    ¬smart, ¬prep
  fair                         .9             .7              .7              .2
  ¬fair                        .1             .1              .1              .1

Query: What is the probability that a student is
smart, given that they pass the exam?
37
Belief Propagation
  • Singly connected network, SCN (also known as
    polytree)
  • there is at most one undirected path between any
    two nodes (i.e., the network is a tree if the
    direction of arcs is ignored)
  • The influence of the instantiated variable
    (evidence) spreads to the rest of the network
    along the arcs
  • The instantiated variable influences
    its predecessors and successors differently
    (using CPT along opposite directions)
  • Computation is linear in the diameter of
    the network (the longest undirected path)
  • Update belief (posterior) of every non-evidence
    node in one pass
  • For multiply connected networks: conditioning

38
Conditioning
  • Conditioning: find the network's smallest cutset
    S (a set of nodes whose removal renders the
    network singly connected)
  • In this network, S = {A}, {B}, {C}, or {D}
  • For each instantiation of S, compute the belief
    update with the belief propagation algorithm
  • Combine the results from all instantiations of S
    (each is weighted by $P(S = s)$)
  • Computationally expensive (finding the smallest
    cutset is in general NP-hard, and the total
    number of possible instantiations of S is $O(2^{|S|})$)

39
Junction Tree
  • Convert a BN to a junction tree
  • Moralization: add an undirected edge between every
    pair of parents of each node, then drop the directions
    of all arcs (result: moralized graph)
  • Triangulation: add an edge (chord) to any chordless
    cycle of length > 3 (result: triangulated graph)
  • A junction tree is a tree of cliques of the
    triangulated graph
  • Cliques are connected by links
  • A link stands for the set of all variables S
    shared by these two cliques
  • Each clique has a potential (similar to CPT),
    constructed from CPT of variables in the original
    BN

40
Junction Tree
  • Example

41
Junction Tree
  • Reasoning
  • Since it is now a tree, polytree algorithm can be
    applied, but now two cliques exchange P(S), the
    distribution over S, their shared variables.
  • Complexity
  • O(n) steps, where n is the number of cliques
  • Each step is expensive if cliques are large (CPT
    exponential to clique size)
  • Construction of CPT of JT is expensive as well,
    but it needs to be computed only once.

42
Some comments on BN reasoning
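  • Let $X = \{X_1, \dots, X_n\}$ be the set of
    all variables in a BN. Any BN reasoning task can
    be expressed in the form of calculating
    $P(U \mid V) = P(U, V) / P(V)$ for some subsets U, V of X
  • This can be done by marginalization of the joint
    distribution P(X) over $Y = X \setminus U \setminus V$:
    $P(u, v) = \sum_y P(u, v, y)$
  • where each entry $P(x) = P(u, v, y)$ can be
    calculated by the chain rule from CPTs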
  • Computation can be done more efficiently using,
    say Junction tree, by utilizing variable
    interdependencies
  • Computational complexity of BN reasoning is
    proved to be NP-hard by reducing 3SAT problems to
    BN reasoning (Cooper 1990)

43
Approximate inference Direct sampling
  • Suppose you are given values for some subset of
    the variables, E, and want to infer distributions
    for unknown variables, Z
  • Randomly generate a very large number of
    instantiations from the BN according to the
    distribution
  • Generate instantiations for all variables: start
    at root variables and work your way forward in
    topological order
  • Rejection sampling: only keep those
    instantiations that are consistent with the
    values for E
  • Use the frequency of values for Z to get
    estimated probabilities
  • Accuracy of the results depends on the size of
    the sample (asymptotically approaches exact
    results)
  • Very expensive and inefficient
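A minimal sketch of direct sampling with rejection follows, using the alarm network from slide 25 with the query P(B = true | A = true). The seed, the sample size, and the function names are arbitrary choices for illustration. The alarm is rare (P(A) is about 0.0096), so about 99% of the generated samples are thrown away. This is the inefficiency the slide refers to.

```python
import random

# Alarm network from slide 25: B -> A <- E
P_B, P_E = 0.01, 0.002
P_A = {(True, True): 0.9, (False, True): 0.8,   # P(A = true | b, e)
       (True, False): 0.8, (False, False): 0.0}


def prior_sample(rng):
    """Direct sampling: root variables first, then their child (topological order)."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    return b, e, a


def rejection_sampling_b_given_a(n, seed=0):
    """Estimate P(B = true | A = true); also report how many samples survived."""
    rng = random.Random(seed)
    kept = with_burglary = 0
    for _ in range(n):
        b, e, a = prior_sample(rng)
        if not a:            # inconsistent with the evidence A = true: reject
            continue
        kept += 1
        with_burglary += b
    return with_burglary / kept, kept


estimate, kept = rejection_sampling_b_given_a(200_000)
print(estimate, kept)  # estimate near 0.835; only about 1% of samples are kept
```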

44
Likelihood weighting
  • Idea: don't generate samples that need to be
    rejected in the first place!
  • Sample only the unknown variables Z and
    $Y = X \setminus Z \setminus E$ (E are fixed)
  • Weight each sample according to the likelihood
    that it would occur, given the evidence E
  • A weight w is associated with each sample (w
    initialized to 1)
  • When an evidence node (say $E_1 = e_{1,0}$) is selected
    for weighting, its parents are already
    instantiated (say parents A and B are assigned
    states a and b)
  • Modify $w \leftarrow w \cdot P(e_{1,0} \mid a, b)$, based on E1's CPT
  • Repeat for the other evidence nodes
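Likelihood weighting for the same query (alarm network from slide 25, P(B = true | A = true)) can be sketched as follows. The evidence node A is never sampled. Each sample's weight is instead multiplied by P(A = true | b, e). The seed and the sample size are arbitrary choices for illustration.

```python
import random

# Alarm network from slide 25: B -> A <- E
P_B, P_E = 0.01, 0.002
P_A = {(True, True): 0.9, (False, True): 0.8,   # P(A = true | b, e)
       (True, False): 0.8, (False, False): 0.0}


def likelihood_weighting_b_given_a(n, seed=0):
    """Estimate P(B = true | A = true) by likelihood weighting."""
    rng = random.Random(seed)
    weighted_b = total_weight = 0.0
    for _ in range(n):
        w = 1.0                      # each sample starts with weight 1
        b = rng.random() < P_B       # non-evidence root nodes are sampled
        e = rng.random() < P_E
        w *= P_A[(b, e)]             # evidence A = true is fixed, not sampled:
                                     # multiply w by P(A = true | b, e)
        total_weight += w
        if b:
            weighted_b += w
    return weighted_b / total_weight


estimate = likelihood_weighting_b_given_a(200_000)
print(estimate)  # near 0.835
```

In this particular network, most samples have ¬B and ¬E and so get weight 0, because P(A | ¬B, ¬E) = 0. Likelihood weighting avoids explicit rejection, but it still wastes effort when the evidence sits below rare causes.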

45
Markov chain Monte Carlo algorithm
  • So called because:
  • Markov chain: each instance generated in the
    sampling is dependent on the previous instance
  • Monte Carlo: statistical sampling method
  • Perform a random walk through variable assignment
    space, collecting statistics as you go
  • Start with a random instantiation, consistent
    with evidence variables
  • At each step, randomly select a non-evidence
    variable x, and randomly sample its value from
    $P(x \mid MB(x))$, its distribution given its Markov blanket
  • Given enough samples, MCMC gives an accurate
    estimate of the true distribution of values
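A minimal Gibbs sampler (one kind of MCMC) for the alarm network of slide 25 is sketched below, again estimating P(B = true | A = true). The slide picks a non-evidence variable at random at each step. This sketch instead visits B and E in a fixed order on each sweep (systematic-scan Gibbs sampling), which is a common variant. The seed, burn-in, and run length are arbitrary choices for illustration.

```python
import random

# Alarm network from slide 25: B -> A <- E
P_B, P_E = 0.01, 0.002
P_A = {(True, True): 0.9, (False, True): 0.8,   # P(A = true | b, e)
       (True, False): 0.8, (False, False): 0.0}


def joint(b, e, a):
    """Chained joint P(b) P(e) P(a | b, e)."""
    pa = P_A[(b, e)]
    return (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E) * (pa if a else 1 - pa)


def gibbs_b_given_a(n_sweeps, burn_in=1_000, seed=0):
    """Estimate P(B = true | A = true) with a Gibbs sampler.

    Each non-evidence variable is resampled from its distribution given all the
    others. Normalizing the joint over the variable's two values gives the same
    distribution as conditioning on its Markov blanket, because every factor
    that does not mention the variable cancels.
    """
    rng = random.Random(seed)
    state = {"b": True, "e": False}  # a starting point consistent with A = true
    count_b = 0
    for sweep in range(burn_in + n_sweeps):
        for var in ("b", "e"):       # fixed visiting order (systematic scan)
            w_true = joint(**{**state, var: True}, a=True)
            w_false = joint(**{**state, var: False}, a=True)
            state[var] = rng.random() < w_true / (w_true + w_false)
        if sweep >= burn_in:
            count_b += state["b"]
    return count_b / n_sweeps


estimate = gibbs_b_given_a(200_000)
print(estimate)  # should be near the exact value 0.835
```

Because P(A | ¬B, ¬E) = 0, this chain can only move among (B, ¬E), (B, E), and (¬B, E). It switches between the B and ¬B regions rarely, and this slow mixing is why a long run is used here.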

46
Loopy Propagation
  • Belief propagation
  • Works only for polytrees (exact solution)
  • Each evidence propagates once throughout the
    network
  • Loopy propagation
  • Let propagation continue until the network
    stabilizes (hopefully)
  • Experiments show:
  • Many BNs stabilize with loopy propagation
  • If it stabilizes, it often yields exact or very
    good approximate solutions
  • Analysis
  • Conditions for convergence and quality of
    approximation are under intense investigation

47
Noisy-Or BN
  • A special BN of binary variables (Peng & Reggia,
    Cooper)
  • Causation independence: parent nodes influence a
    child independently
  • Advantages
  • One-to-one correspondence between causal links
    and causal strengths
  • Easy for humans to understand (acquire and
    evaluate KB)
  • Fewer probabilities needed in KB
  • Computation is less expensive
  • Disadvantage: less expressive (less general)
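The causation-independence idea can be sketched as a one-line noisy-OR combination. This assumes the basic form with no leak term (an assumption of this sketch). The causal strengths in the usage lines are hypothetical. The sketch needs Python 3.8 or later for `math.prod`.

```python
from math import prod


def noisy_or(strengths, present):
    """P(child = true | parents) for a basic noisy-OR node (no leak term assumed).

    strengths[i] -- causal strength c_i: the probability that parent i, when
                    present, causes the child on its own
    present[i]   -- True if parent i is present
    Present parents fail independently, each with probability 1 - c_i; the
    child is false only if every present parent fails.
    """
    return 1.0 - prod(1.0 - c for c, on in zip(strengths, present) if on)


strengths = [0.8, 0.5, 0.3]                          # hypothetical causal strengths
print(noisy_or(strengths, [True, True, False]))     # about 0.9 = 1 - 0.2 * 0.5
print(noisy_or(strengths, [False, False, False]))   # 0.0: no cause present
# A full CPT for this child would need 2 ** 3 = 8 rows; noisy-OR uses 3 numbers.
```

The last comment illustrates the "fewer probabilities" advantage: with n parents, noisy-OR needs n causal strengths instead of 2^n CPT entries.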

48
Learning BN (from case data)
  • Needs for learning
  • Difficult to construct BN by humans (esp. CPT)
  • Experts' opinions are often biased, inaccurate,
    and incomplete
  • Large databases of cases have become available
  • What to learn
  • Parameter learning: learning CPT when DAG is
    known (easy)
  • Structural learning: learning DAG (hard)
  • Difficulties in learning DAG from case data
  • There are too many possible DAGs when the number of
    variables is large (more than exponential)
  • n: number of variables; number of possible DAGs
  • n = 3: 25
  • n = 10: about $4 \times 10^{18}$
  • Missing values in database
  • Noisy data
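The DAG counts on this slide can be reproduced with Robinson's recurrence for the number of labeled DAGs on n nodes. The recurrence is a known formula that does not appear in the slides:
a(n) = sum over k = 1..n of (-1)^(k+1) · C(n, k) · 2^(k(n-k)) · a(n-k), with a(0) = 1.
The sketch below is a direct transcription and needs Python 3.8 or later for `math.comb`.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))


print(num_dags(3))   # 25
print(num_dags(10))  # roughly 4.2e18
```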

49
BN Learning Approaches
  • Early effort: based on variable dependencies
    (Pearl)
  • Find all pairs of variables that are dependent on
    each other (applying standard statistical method
    on the database)
  • Eliminate (as much as possible) indirect
    dependencies
  • Determine directions of dependencies
  • Learning results are often incomplete (learned BN
    contains indirect dependencies and undirected
    links)

50
BN Learning Approaches
  • Bayesian approach (Cooper)
  • Find the most probable DAG, given database DB,
    i.e.,
  • $\max P(DAG \mid DB)$ or $\max P(DAG, DB)$
  • Based on some assumptions, a formula is developed
    to compute P(DAG, DB) for a given pair of DAG and
    DB
  • A hill-climbing algorithm (K2) is developed to
    search a (sub)optimal DAG using a pre-determined
    partial order of the variables
  • Compute CPTs after the DAG is determined
  • Extensions to handle some form of missing values

51
BN Learning Approaches
  • Minimum description length (MDL) (Lam, etc.)
  • Sacrifices accuracy for simpler (less dense)
    structure
  • Case data not always accurate
  • Outliers are hard to model (needs more links)
  • Fewer links imply smaller CPD tables and less
    expensive inference
  • $L = L_1 + L_2$, where
  • L1: the length of the encoding of DAG (smaller
    for simpler DAG)
  • L2: the length of the encoding of the difference
    between DAG and DB (smaller for better match of
    DAG with DB)
  • Smaller L1 implies less accurate DAG, and thus
    larger L2
  • Find DAG by heuristic best-first search that
    minimizes L

52
BN Learning Approaches
  • Neural network approach (Neal, Peng)
  • For noisy-or BN
  • Change inter-node link strength locally,
    following gradient descent approach to maximize
    L.

53
  • Compare the neural network approach with Cooper's K2
  • Network: Alarm (37 nodes)

cases    missing links    extra links    time
500      2/0              2/6            63.76/5.91
1000     0/0              1/1            69.62/6.04
2000     0/0              0/0            77.45/5.86
10000    0/0              0/0            161.97/5.83
54
Current research in BN
  • Missing data
  • Missing values: EM (expectation maximization)
  • Missing (hidden) variables are harder to handle
  • BN with time
  • Dynamic BN: assuming temporal relations obey a
    Markov chain
  • Cyclic relations
  • Often found in social-economic analysis
  • Using dynamic BN?
  • Continuous variables
  • Some work on variables obeying Gaussian
    distribution
  • Connecting to other fields
  • Databases, statistics, symbolic AI (FOL),
    semantic web
  • Reasoning with uncertain evidence
  • Virtual evidence
  • Soft evidence

55
Other formalisms for Uncertainty Fuzzy sets and
fuzzy logic
  • Ordinary set theory
  • There are sets that are described by vague
    linguistic terms (sets without hard, clearly
    defined boundaries), e.g., tall-person, fast-car
  • Continuous
  • Subjective (context dependent)
  • Hard to define a clear-cut 0/1 membership function

56
  • Fuzzy set theory
  • height(john) = 6'5"    Tall(john) = 0.9
  • height(harry) = 5'8"   Tall(harry) = 0.5
  • height(joe) = 5'1"     Tall(joe) = 0.1
  • Examples of membership functions

57
  • Fuzzy logic: many-valued logic
  • Fuzzy predicates (degree of truth)
  • Connectors/Operators
  • Compare with probability theory
  • Prob.: uncertainty of outcome
  • Based on a large number of repetitions or instances
  • For each experiment (instance), the outcome is
    either true or false (without uncertainty or
    ambiguity)
  • unsure before it happens but sure after it
    happens
  • Fuzzy: vagueness of conceptual/linguistic
    characteristics
  • Unsure even after it happens
  • whether a child of a tall mother and a short father
    is tall
  • unsure before the child is born
  • unsure after the child has grown up (height 5'6")

58
  • Empirical vs subjective (testable vs agreeable)
  • Fuzzy set connectors may lead to unreasonable
    results
  • Consider two events A and B with $P(A) < P(B)$
  • If A → B (or $A \subseteq B$), then
  • $P(A \wedge B) = P(A) = \min\{P(A), P(B)\}$
  • $P(A \vee B) = P(B) = \max\{P(A), P(B)\}$
  • Not the case in general
  • $P(A \wedge B) = P(A) P(B \mid A) \le P(A)$
  • $P(A \vee B) = P(A) + P(B) - P(A \wedge B) \ge P(B)$
  • (equality holds only if $P(B \mid A) = 1$, i.e., A
    → B)
  • Something prob. theory cannot represent
  • $Tall(john) = 0.9$, $\neg Tall(john) = 0.1$
  • $Tall(john) \wedge \neg Tall(john) = \min\{0.9, 0.1\} = 0.1$
  • john's degree of membership in the fuzzy set of
    medium-height people (both Tall and not-Tall)
  • In prob. theory, $P(john \in Tall \wedge john \in \neg Tall) = 0$
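The min/max connectors used above, together with complement as fuzzy negation, fit in a few lines of Python. The sketch reproduces the Tall(john) example. The function names are hypothetical.

```python
def fuzzy_and(x, y):
    """Fuzzy conjunction: minimum of the degrees of truth."""
    return min(x, y)


def fuzzy_or(x, y):
    """Fuzzy disjunction: maximum of the degrees of truth."""
    return max(x, y)


def fuzzy_not(x):
    """Fuzzy negation (complement)."""
    return 1.0 - x


tall_john = 0.9
print(fuzzy_and(tall_john, fuzzy_not(tall_john)))  # about 0.1, not 0 as in probability
print(fuzzy_or(tall_john, fuzzy_not(tall_john)))   # 0.9, not 1 as in probability
```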

59
Uncertainty in rule-based systems
  • Elements in Working Memory (WM) may be uncertain
    because
  • Case input (initial elements in WM) may be
    uncertain
  • Ex.: the CD drive does not work 70% of the time
  • Decision from a rule application may be uncertain
    even if the rule's conditions are met by WM with
    certainty
  • Ex.: flu → sore throat with high probability
  • Combining symbolic rules with numeric
    uncertainty: Mycin's
  • Certainty Factor (CF)
  • An early attempt to incorporate uncertainty into
    KB systems
  • $CF \in [-1, 1]$
  • Each element in WM is associated with a CF:
    certainty of that assertion
  • Each rule C1, ..., Cn → Conclusion is associated
    with a CF: certainty of the association (between
    C1, ..., Cn and Conclusion)

60
  • CF propagation
  • Within a rule, if each Ci has CFi, then the
    certainty of the conclusion is
  • $\min\{CF_1, \dots, CF_n\} \times CF_{rule}$
  • When more than one rule can apply to the current
    WM for the same Conclusion with different CFs,
    the largest of these CFs will be assigned as the
    CF for Conclusion
  • Similar to fuzzy rule for conjunctions and
    disjunctions
  • Good things about Mycin's CF method
  • Easy to use
  • CF operations are reasonable in many applications
  • Probably the only method for uncertainty used in
    real-world rule-based systems
  • Limitations
  • It is in essence an ad hoc method (it can be
    viewed as a probabilistic inference system with
    some strong, sometimes unreasonable assumptions)
  • May produce counter-intuitive results.
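The two propagation rules stated on the previous slide can be sketched as follows: min of the premise CFs times the rule's CF, and the largest CF when several rules reach the same conclusion. The sketch assumes positive CFs for simplicity. The rules and numbers in the usage lines are hypothetical.

```python
def rule_cf(premise_cfs, cf_of_rule):
    """CF of a rule's conclusion: min of the premise CFs times the rule's CF."""
    return min(premise_cfs) * cf_of_rule


def conclusion_cf(cfs_from_rules):
    """Several rules support the same conclusion: keep the largest CF."""
    return max(cfs_from_rules)


# Hypothetical rules concluding the same thing
r1 = rule_cf([0.9, 0.7], 0.8)   # min(0.9, 0.7) * 0.8, about 0.56
r2 = rule_cf([0.6], 0.9)        # about 0.54
print(conclusion_cf([r1, r2]))  # about 0.56
```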

61
Dempster-Shafer theory
  • A variation of Bayes theorem to represent
    ignorance
  • Uncertainty and ignorance
  • Suppose two events A and B are ME and EXH, given
    an evidence E
  • A: having cancer; B: not having cancer; E: smoking
  • By Bayes theorem, our beliefs in A and B, given
    E, are measured by $P(A \mid E)$ and $P(B \mid E)$, and
    $P(A \mid E) + P(B \mid E) = 1$
  • In reality,
  • I may have some belief in A, given E
  • I may have some belief in B, given E
  • I may have some belief not committed to either
    one
  • The uncommitted belief (ignorance) should not be
    given to either A or B, even though I know one of
    the two must be true; rather, it should be
    given to "A or B", denoted {A, B}
  • Uncommitted belief may be given to A and B when
    new evidence is discovered

62
  • Representing ignorance
  • Ex.: $\theta = \{A, B, C\}$
  • Belief function: $Bel(X) = \sum_{Y \subseteq X} m(Y)$

63
  • Plausibility (upper bound of belief of a node):
    $Pl(X) = 1 - Bel(\neg X)$

Lower bound (known belief): $Bel(X)$
Upper bound (maximally possible): $Pl(X)$
64
  • Evidence combination (how to use D-S theory)
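  • Each piece of evidence has its own m(.) function
    for the same $\theta$
  • Belief based on combined evidence can be computed
    from Dempster's rule of combination:
    $m(C) = \frac{\sum_{X \cap Y = C} m_1(X) m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X) m_2(Y)}$,
    where the denominator is the normalization factor that
    discards incompatible combinations ($X \cap Y = \emptyset$)

Example: $\theta = \{A, B\}$, A: having cancer, B: not having cancer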
65

[Table: combining the evidence E1 and E2]
66
  • Ignorance is reduced
  • from $m_1(\{A, B\}) = 0.3$ to $m(\{A, B\}) = 0.049$
  • Belief interval is narrowed
  • A: from [0.2, 0.5] to [0.607, 0.656]
  • B: from [0.5, 0.8] to [0.344, 0.393]
  • Advantage
  • The only formal theory about ignorance
  • Disciplined way to handle evidence combination
  • Disadvantages
  • Computationally very expensive (lattice size
    $2^{|\theta|}$)
  • Assuming hypotheses are ME and EXH
  • How to obtain m(.) for each piece of evidence is
    not clear, except subjectively
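Dempster's rule and the belief/plausibility bounds can be sketched with frozensets as focal elements. Here m1 follows from the initial intervals quoted on this slide: m1(A) = 0.2, m1(B) = 0.5, m1({A, B}) = 0.3. The second mass function m2 does not survive in the text. The values used below are an assumption, chosen because they reproduce the combined figures quoted above.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozensets."""
    combined = {}
    conflict = 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mx * my
        else:
            conflict += mx * my      # incompatible combination (empty intersection)
    norm = 1.0 - conflict            # normalization factor
    return {s: v / norm for s, v in combined.items()}


def belief(m, s):
    """Bel(s): total mass of focal elements contained in s (lower bound)."""
    return sum(v for x, v in m.items() if x <= s)


def plausibility(m, s):
    """Pl(s): total mass of focal elements intersecting s (upper bound)."""
    return sum(v for x, v in m.items() if x & s)


A, B = frozenset("A"), frozenset("B")
AB = A | B
m1 = {A: 0.2, B: 0.5, AB: 0.3}   # from the initial intervals on this slide
m2 = {A: 0.7, B: 0.2, AB: 0.1}   # ASSUMED: not given in the surviving text
m = combine(m1, m2)
print(round(m[AB], 3))                                        # 0.049
print(round(belief(m, A), 3), round(plausibility(m, A), 3))   # 0.607 0.656
print(round(belief(m, B), 3), round(plausibility(m, B), 3))   # 0.344 0.393
```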