Abduction, Uncertainty, and Probabilistic Reasoning

Slides: 67

Provided by: YunP151

Learn more at: https://redirect.cs.umbc.edu

1
Abduction, Uncertainty, and Probabilistic
Reasoning

Chapters 13, 14, and more

2
Introduction

Abduction is a reasoning process that tries to
form plausible explanations for abnormal
observations
Abduction is distinct from deduction
and induction
Abduction is inherently uncertain
Uncertainty is therefore an important issue in AI
research
Some major formalisms for representing and
reasoning about uncertainty:
MYCIN's certainty factors (an early
representative)
Probability theory (esp. Bayesian networks)
Dempster-Shafer theory
Fuzzy logic
Truth maintenance systems

3
Abduction

Definition (Encyclopedia Britannica): reasoning
that derives an explanatory hypothesis from a
given set of facts
The inference result is a hypothesis which, if
true, could explain the occurrence of the given
facts
Examples
Dendral, an expert system to construct the 3D
structure of chemical compounds
Facts: mass spectrometer data of the compound and
the chemical formula of the compound
KB: chemistry, esp. strength of different types
of bonds
Reasoning: form a hypothetical 3D structure which
meets the given chemical formula, and would most
likely produce the given mass spectrum if
subjected to electron beam bombardment

Medical diagnosis
Facts: symptoms, lab test results, and other
observed findings (called manifestations)
KB: causal associations between diseases and
manifestations
Reasoning: one or more diseases whose presence
would causally explain the occurrence of the
given manifestations
Many other reasoning processes (e.g., word sense
disambiguation in natural language processing, image
understanding, detective work, etc.) can also
be seen as abductive reasoning.

5
Comparing abduction, deduction and induction

Deduction
  major premise: All balls in the box are black
  minor premise: This ball is from the box
  conclusion:    This ball is black
  schema: $A \Rightarrow B$, $A$, therefore $B$
Abduction
  rule:        All balls in the box are black
  observation: This ball is black
  explanation: This ball is from the box
  schema: $A \Rightarrow B$, $B$, therefore possibly $A$
Induction
  case:              These balls are from the box
  observation:       These balls are black
  hypothesized rule: All balls in the box are black
  schema: whenever $A$ then $B$ (but not vice versa), therefore possibly $A \Rightarrow B$

Induction: from specific cases to general rules
Abduction and deduction: both from part of a specific case to another part of
the case using general rules (in different ways)
6
Characteristics of abduction reasoning

Reasoning results are hypotheses, not theorems
(may be false even if rules and facts are true),
e.g., misdiagnosis in medicine
There may be multiple plausible hypotheses
When given rules $A \Rightarrow B$ and $C \Rightarrow B$, and fact $B$,
both $A$ and $C$ are plausible hypotheses
Abduction is inherently uncertain
Hypotheses can be ranked by their plausibility if
that can be determined
Reasoning is often a hypothesize-and-test cycle
Hypothesize phase: postulate possible hypotheses,
each of which could explain the given facts (or
explain most of the important facts)
Test phase: test the plausibility of all or some
of these hypotheses

One way to test a hypothesis H is to test if
something that is currently unknown but can be
predicted from H is actually true.
If we also know $A \Rightarrow D$ and $C \Rightarrow E$, then ask if $D$
and $E$ are true.
If it turns out $D$ is true and $E$ is false, then
hypothesis $A$ becomes more plausible (support for
$A$ increased, support for $C$ decreased)
Alternative hypotheses compete with each other
(Occam's razor prefers simpler hypotheses)
Reasoning is non-monotonic
Plausibility of hypotheses can increase/decrease
as new facts are collected (deductive inference
determines if a sentence is true but would never
change its truth value)
Some hypotheses may be discarded/defeated, and
new ones may be formed when new observations are
made

8
Source of Uncertainty in Intelligent Systems

Uncertain data (noise)
Uncertain knowledge (e.g., causal relations)
A disorder may cause any or all of its POSSIBLE
manifestations in a specific case
A manifestation can be caused by more than one
POSSIBLE disorder
Uncertain reasoning results
Abduction and induction are inherently uncertain
Default reasoning, even in deductive fashion, is
uncertain
Incomplete deductive inference may be uncertain

9
Probabilistic Inference

Based on probability theory (especially Bayes'
theorem)
A well-established discipline for reasoning about
uncertain outcomes
An empirical science, like physics or chemistry: it
can be verified by experiments
Probability theory is too rigid to apply directly
in many knowledge-based applications
Some assumptions have to be made to simplify the
reality
Different formalisms have been developed in which
some aspects of the probability theory are
changed/modified.
We will briefly review the basics of probability
theory before discussing different approaches to
uncertainty
The presentation uses the diagnostic process (an
abductive and evidential reasoning process) as an
example

10
Probability of Events

Sample space and events
Sample space $S$ (e.g., all people in an area)
Events $E_1 \subseteq S$ (e.g., all people having a cough)
$E_2 \subseteq S$ (e.g., all people having a cold)
Prior (marginal) probabilities of events
$P(E) = |E| / |S|$ (frequency interpretation)
$P(E) = 0.1$ (subjective probability)
$0 \le P(E) \le 1$ for all events
Two special events, $\emptyset$ and $S$: $P(\emptyset) = 0$ and $P(S) = 1.0$
Boolean operators between events (to form
compound events)
Conjunction (intersection): $E_1 \wedge E_2$ ($E_1 \cap E_2$)
Disjunction (union): $E_1 \vee E_2$ ($E_1 \cup E_2$)
Negation (complement): $\neg E$ ($\neg E = S \setminus E$)

11

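Probabilities of compound events
$P(\neg E) = 1 - P(E)$ because $P(\neg E) + P(E) = 1$
$P(E_1 \vee E_2) = P(E_1) + P(E_2) - P(E_1 \wedge E_2)$
But how to compute the joint probability $P(E_1 \wedge E_2)$?
Conditional probability (of $E_1$, given $E_2$): $P(E_1 \mid E_2) = P(E_1 \wedge E_2) / P(E_2)$
How likely $E_1$ is to occur in the subspace of $E_2$

Independence assumption
Two events $E_1$ and $E_2$ are said to be independent
of each other if $P(E_1 \mid E_2) = P(E_1)$ (given $E_2$
does not change the likelihood of $E_1$)
Computation can be simplified with independent
events: $P(E_1 \wedge E_2) = P(E_1) P(E_2)$
Mutually exclusive (ME) and exhaustive (EXH) set
of events
ME: $E_i \wedge E_j = \emptyset$, so $P(E_i \wedge E_j) = 0$, for all $i \ne j$
EXH: $E_1 \vee \cdots \vee E_n = S$, so $P(E_1 \vee \cdots \vee E_n) = 1$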

13
Bayes Theorem

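In the setting of diagnostic/evidential reasoning
Know: prior probability of hypothesis $P(H_i)$
conditional probability $P(E_j \mid H_i)$
Want to compute the posterior probability $P(H_i \mid E_j)$
Bayes' theorem (formula 1): $P(H_i \mid E_j) = P(H_i) P(E_j \mid H_i) / P(E_j)$
If the purpose is to find which of the $n$
hypotheses $H_1, \ldots, H_n$
is more plausible given $E_j$, then we can ignore
the denominator and rank them using the relative
likelihood $rel(H_i \mid E_j) = P(E_j \mid H_i) P(H_i)$

$P(E_j)$ can be computed from $P(E_j \mid H_i)$
and $P(H_i)$, if we assume all hypotheses
are ME and EXH: $P(E_j) = \sum_{i=1}^{n} P(E_j \mid H_i) P(H_i)$
Then we have another version of Bayes' theorem:
$P(H_i \mid E_j) = P(E_j \mid H_i) P(H_i) / \sum_{k=1}^{n} P(E_j \mid H_k) P(H_k)$
where $\sum_{k=1}^{n} P(E_j \mid H_k) P(H_k)$, the sum of the relative
likelihoods of all $n$ hypotheses, is a
normalization factor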
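
The normalized form of Bayes' theorem above can be sketched in a few lines of Python. The function name `posterior` and the three prior/likelihood values are made up for illustration; only the formula comes from the slide.

```python
def posterior(priors, likelihoods):
    """Posterior P(H_i | E) for mutually exclusive, exhaustive hypotheses.

    priors[i] is P(H_i); likelihoods[i] is P(E | H_i).
    """
    # Relative likelihood of each hypothesis: P(E | H_i) * P(H_i)
    rel = [p * l for p, l in zip(priors, likelihoods)]
    # Under ME and EXH the sum of relative likelihoods equals P(E)
    norm = sum(rel)
    return [r / norm for r in rel]

# Hypothetical numbers: three candidate disorders, one observed symptom.
priors = [0.7, 0.2, 0.1]
likelihoods = [0.1, 0.6, 0.9]
post = posterior(priors, likelihoods)
best = post.index(max(post))  # index of the most plausible hypothesis
print(post, best)
```

Ranking by `rel` alone would pick the same hypothesis; dividing by `norm` is only needed when absolute posteriors are wanted.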

15
Probabilistic Inference for simple diagnostic
problems

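Knowledge base: evidence $E_1, \ldots, E_m$, hypotheses $H_1, \ldots, H_n$, priors $P(H_i)$, and conditional probabilities $P(E_j \mid H_i)$
Case input: the observed evidence $E_1, \ldots, E_l$
Find the hypothesis $H_i$ with the highest
posterior probability $P(H_i \mid E_1, \ldots, E_l)$
By Bayes' theorem: $P(H_i \mid E_1, \ldots, E_l) = P(E_1, \ldots, E_l \mid H_i) P(H_i) / P(E_1, \ldots, E_l)$
Assume all pieces of evidence are conditionally
independent, given any hypothesis: $P(E_1, \ldots, E_l \mid H_i) = \prod_{j=1}^{l} P(E_j \mid H_i)$

The relative likelihood: $rel(H_i \mid E_1, \ldots, E_l) = P(H_i) \prod_{j=1}^{l} P(E_j \mid H_i)$
The absolute posterior probability: $P(H_i \mid E_1, \ldots, E_l) = rel(H_i \mid E_1, \ldots, E_l) / \sum_{k=1}^{n} rel(H_k \mid E_1, \ldots, E_l)$
Evidence accumulation (when new evidence is
discovered)
If $E_{l+1}$ is present: $rel(H_i \mid E_1, \ldots, E_{l+1}) = P(E_{l+1} \mid H_i) \, rel(H_i \mid E_1, \ldots, E_l)$
If $E_{l+1}$ is absent: $rel(H_i \mid E_1, \ldots, E_l, \neg E_{l+1}) = (1 - P(E_{l+1} \mid H_i)) \, rel(H_i \mid E_1, \ldots, E_l)$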
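
A minimal sketch of evidence accumulation under the conditional independence assumption. The knowledge base below (two hypotheses, three findings) is invented for illustration, and treating an absent finding with the factor 1 - P(E | H) is an assumption of this sketch, since the slide's own formulas were lost in extraction.

```python
def accumulate(rel, cond_probs, present):
    """Update relative likelihoods with one new piece of evidence.

    rel[i] is the current relative likelihood of H_i, cond_probs[i] is
    P(E_new | H_i), and `present` says whether E_new was observed.
    Assumption: an absent finding contributes the factor 1 - P(E_new | H_i).
    """
    return [r * (p if present else 1.0 - p) for r, p in zip(rel, cond_probs)]

def normalize(rel):
    """Turn relative likelihoods into absolute posteriors (ME and EXH)."""
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical knowledge base: priors for H_1, H_2 and P(E_j | H_i).
priors = [0.3, 0.7]
p_e_given_h = [[0.9, 0.2],   # P(E_1 | H_1), P(E_1 | H_2)
               [0.8, 0.3],   # P(E_2 | H_1), P(E_2 | H_2)
               [0.1, 0.6]]   # P(E_3 | H_1), P(E_3 | H_2)
case = [True, True, False]   # E_1 and E_2 present, E_3 absent

rel = list(priors)           # before any evidence, rel(H_i) = P(H_i)
for cond, present in zip(p_e_given_h, case):
    rel = accumulate(rel, cond, present)
post = normalize(rel)
print(post)
```

Because each finding only multiplies the running value, evidence can be folded in one piece at a time and in any order.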

17
Assessing the Assumptions

Assumption 1: hypotheses are mutually exclusive
and exhaustive
Single fault assumption (one and only one hypothesis
must be true)
Multi-faults do exist in individual cases
Can be viewed as an approximation of situations
where hypotheses are independent of each other
and their prior probabilities are very small
Assumption 2: pieces of evidence are
conditionally independent of each other, given
any hypothesis
Manifestations themselves are not independent of
each other; they are correlated by their common
causes
Reasonable under the single fault assumption
Not so when multi-faults are to be considered

18
Limitations of the simple Bayesian system

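Cannot handle hypotheses of multiple
disorders well
Suppose $H_1, \ldots, H_n$ are independent of
each other
Consider a composite hypothesis $H_1 \wedge H_2$
How to compute the posterior probability (or
relative likelihood) $P(H_1 \wedge H_2 \mid E_1, \ldots, E_l)$?
Using Bayes' theorem: $P(H_1 \wedge H_2 \mid E_1, \ldots, E_l) = P(E_1, \ldots, E_l \mid H_1 \wedge H_2) P(H_1 \wedge H_2) / P(E_1, \ldots, E_l)$

Factoring the posterior as $P(H_1 \mid E_1, \ldots, E_l) P(H_2 \mid E_1, \ldots, E_l)$ would require $H_1$ and $H_2$ to stay independent given the evidence,
but this is a very unreasonable assumption
Cannot handle causal chaining
Ex. $A$: weather of the year
$B$: cotton production of the year
$C$: cotton price of next year
Observed: $A$ influences $C$
The influence is not direct ($A \rightarrow B \rightarrow C$)
$P(C \mid B, A) = P(C \mid B)$: instantiation of $B$ blocks
the influence of $A$ on $C$
Need a better representation and a better
assumption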

$E$ and $B$ are independent. But when $A$ is given, they
are (adversely) dependent, because they become
competitors to explain $A$: $P(B \mid A, E) \ll P(B \mid A)$
20
Bayesian Networks (BNs)

Definition: BN = (DAG, CPD)
DAG: directed acyclic graph (the BN's structure)
Nodes: random variables (typically binary or
discrete, but methods also exist to handle
continuous variables)
Arcs: indicate probabilistic dependencies between
nodes (lack of a link signifies conditional
independence)
CPD: conditional probability distribution (the BN's
parameters)
Conditional probabilities at each node, usually
stored as a table (conditional probability table,
or CPT)
Root nodes are a special case: no parents, so
just use priors in the CPD

21
Example BN
Nodes: A, B, C, D, E (arcs implied by the CPTs below: A to B, A to C, B to D, C to D, C to E)
$P(a_0) = 0.001$
$P(b_0 \mid a_0) = 0.3$, $P(b_0 \mid a_1) = 0.001$
$P(c_0 \mid a_0) = 0.2$, $P(c_0 \mid a_1) = 0.005$
$P(e_0 \mid c_0) = 0.4$, $P(e_0 \mid c_1) = 0.002$
$P(d_0 \mid b, c)$:
        b0      b1
c0      0.1     0.01
c1      0.01    0.00001
Uppercase: variables (A, B, ...). Lowercase:
values/states of variables (A has two states, a0
and a1)
Note that we only specify $P(a_0)$ etc., not $P(a_1)$,
since they have to add to one
22
Netica

A commercial BN package by Norsys
Download a limited version for free from
http://www.norsys.com/
APIs may also be downloaded

23
Conditional independence and chaining

Conditional independence assumption
$P(x_i \mid \pi_i, q) = P(x_i \mid \pi_i)$, where $\pi_i$ is the set of parents of $x_i$
and $q$ is any set of variables
(nodes) other than $x_i$ and its successors
$\pi_i$ blocks the influence of other nodes on $x_i$
and its successors ($q$ influences $x_i$ only
through variables in $\pi_i$)
With this assumption, the complete joint
probability distribution of all variables in the
network can be represented by (recovered from)
local CPDs by chaining these CPDs:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \pi_i)$
24
Chaining Example

Computing the joint probability for all
variables is easy
The joint distribution of all variables:
$P(A, B, C, D, E)$
$= P(E \mid A, B, C, D) P(A, B, C, D)$ by the product rule
$= P(E \mid C) P(A, B, C, D)$ by the cond. indep. assumption
$= P(E \mid C) P(D \mid A, B, C) P(A, B, C)$
$= P(E \mid C) P(D \mid B, C) P(C \mid A, B) P(A, B)$
$= P(E \mid C) P(D \mid B, C) P(C \mid A) P(B \mid A) P(A)$
For a particular state:
$P(a_0, b_0, c_1, d_1, e_0) = P(a_0) P(b_0 \mid a_0) P(c_1 \mid a_0) P(d_1 \mid b_0, c_1) P(e_0 \mid c_1)$
$= 0.001 \times 0.3 \times 0.8 \times 0.99 \times 0.002 = 4.752 \times 10^{-7}$
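
The chaining computation can be checked with a short Python sketch. The CPT values are those of the example BN; the dictionary layout, the helper `pick`, and the reading of the second C entry as P(c0 | a1) = 0.005 are assumptions of this sketch.

```python
from itertools import product

# CPTs of the example network. State index 0 means a0/b0/..., 1 means a1/b1/...
P_A0 = 0.001
P_B0 = {0: 0.3, 1: 0.001}      # P(b0 | a)
P_C0 = {0: 0.2, 1: 0.005}      # P(c0 | a)
P_D0 = {(0, 0): 0.1, (0, 1): 0.01,
        (1, 0): 0.01, (1, 1): 0.00001}   # P(d0 | b, c), keyed (b, c)
P_E0 = {0: 0.4, 1: 0.002}      # P(e0 | c)

def pick(p0, state):
    """Probability of `state` for a binary variable whose P(state 0) is p0."""
    return p0 if state == 0 else 1.0 - p0

def joint(a, b, c, d, e):
    """P(a, b, c, d, e) = P(a) P(b|a) P(c|a) P(d|b,c) P(e|c)."""
    return (pick(P_A0, a) * pick(P_B0[a], b) * pick(P_C0[a], c)
            * pick(P_D0[(b, c)], d) * pick(P_E0[c], e))

p_state = joint(0, 0, 1, 1, 0)   # the state (a0, b0, c1, d1, e0)
total = sum(joint(*s) for s in product((0, 1), repeat=5))
print(p_state, total)
```

Summing `joint` over all 32 states gives 1 up to rounding, which is a quick sanity check that the five local CPTs define a proper joint distribution.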

25
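$P(E) = 0.002$
$P(B) = 0.01$
$P(A \mid B, E)$:
        B      ¬B
E       0.9    0.8
¬E      0.8    0.0
$P(E \mid A) = 0.167$, $P(B \mid A) = 0.835$, $P(E \mid A, \neg B) = 1.0$,
$P(B \mid A, E) = 0.0112$
$P(B \mid A, E) = P(B, A, E) / P(A, E) = P(B, A, E) / (P(B, A, E) + P(\neg B, A, E))$
$= 0.01 \times 0.002 \times 0.9 / (0.01 \times 0.002 \times 0.9 + 0.99 \times 0.002 \times 0.8)$
$= 0.000018 / (0.000018 + 0.001584) = 0.000018 / 0.001602 \approx 0.01124$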
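
The numbers on this slide can be reproduced by brute-force summation over the three binary variables. This is a sketch only: the function names and the dictionary-based query format are illustrative choices, not part of the original deck.

```python
from itertools import product

P_B, P_E = 0.01, 0.002
# P(A = true | B, E), keyed (b, e)
P_A = {(True, True): 0.9, (False, True): 0.8,
       (True, False): 0.8, (False, False): 0.0}

def joint(b, e, a):
    """P(B=b, E=e, A=a) with B and E as independent root causes of A."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa

def prob(query, given):
    """P(query | given); both are dicts such as {'B': True}."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        world = {'B': b, 'E': e, 'A': a}
        if all(world[k] == v for k, v in given.items()):
            p = joint(b, e, a)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

p_b_given_a = prob({'B': True}, {'A': True})
p_b_given_ae = prob({'B': True}, {'A': True, 'E': True})
p_e_given_a = prob({'E': True}, {'A': True})
print(p_b_given_a, p_b_given_ae, p_e_given_a)
```

Learning that E is true drops the belief in B from about 0.835 to about 0.011, which is the "competing explanations" effect the slide describes.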
26
Topological semantics

A node is conditionally independent of its
non-descendants given its parents
A node is conditionally independent of all other
nodes in the network given its parents, children,
and children's parents (also known as its Markov
blanket)
The method called d-separation can be applied to
decide whether a set of nodes X is independent of
another set Y, given a third set Z

Chain: A and C are independent, given B
Converging: B and C are independent, NOT given A
Diverging: B and C are independent, given A
27
Inference tasks

Simple queries: compute the posterior probability
$P(X_i \mid E = e)$
E.g., $P(NoGas \mid Gauge = empty, Lights = on, Starts = false)$
Posteriors for ALL non-evidence nodes
Priors for all nodes ($E = \emptyset$)
Conjunctive queries:
$P(X_i, X_j \mid E = e) = P(X_i \mid E = e) P(X_j \mid X_i, E = e)$
Optimal decisions: decision networks or influence
diagrams
include utility information and actions
inference is to find $P(outcome \mid action, evidence)$
Value of information: Which evidence should we
seek next?
Sensitivity analysis: Which probability values
are most critical?
Explanation: Why do I need a new starter motor?

MAP problems (explanation)
The solution provides a good explanation for your
action
This is an optimization problem

29
Approaches to inference

Exact inference
Enumeration
Variable elimination
Belief propagation in polytrees (singly connected
BNs)
Clustering / junction tree algorithms
Approximate inference
Stochastic simulation / sampling methods
Markov chain Monte Carlo methods
Loopy propagation
Others
Mean field theory
Neural networks

30
Inference by enumeration

To compute $P(X \mid E = e)$, where X is a single variable
and E is evidence (instantiation of a set of
variables)
Add all of the terms (atomic event probabilities)
from the full joint distribution that are
consistent with E
If Y are the other (unobserved) variables,
excluding X, then the posterior distribution is
$P(X \mid E = e) = \alpha P(X, e) = \alpha \sum_y P(X, e, y)$
The sum is over all possible instantiations of the
variables in Y
Each $P(X, e, y)$ term can be computed using the
chain rule
Computationally expensive!

31
Example Enumeration

$P(x_i) = \sum_{\pi_i} P(x_i \mid \pi_i) P(\pi_i)$
Suppose we want the distribution of D, and only the value of E is
given as true
$P(d \mid e) = \alpha \sum_{A,B,C} P(a, b, c, d, e)$
$= \alpha \sum_{A,B,C} P(a) P(b \mid a) P(c \mid a) P(d \mid b, c) P(e \mid c)$
With simple iteration to compute this expression,
there's going to be a lot of repetition (e.g.,
$P(e \mid c)$ has to be recomputed every time we iterate
over C for all possible assignments of A and B)
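
A minimal enumeration sketch for the query above, using the CPTs of the earlier example BN. Treating "E is true" as the state e0, and the helper names, are assumptions made for this illustration.

```python
from itertools import product

# CPTs of the example BN; state 0 means a0/b0/..., state 1 means a1/b1/...
P_A0 = 0.001
P_B0 = {0: 0.3, 1: 0.001}      # P(b0 | a)
P_C0 = {0: 0.2, 1: 0.005}      # P(c0 | a)
P_D0 = {(0, 0): 0.1, (0, 1): 0.01,
        (1, 0): 0.01, (1, 1): 0.00001}   # P(d0 | b, c), keyed (b, c)
P_E0 = {0: 0.4, 1: 0.002}      # P(e0 | c)

def pick(p0, state):
    return p0 if state == 0 else 1.0 - p0

def joint(a, b, c, d, e):
    return (pick(P_A0, a) * pick(P_B0[a], b) * pick(P_C0[a], c)
            * pick(P_D0[(b, c)], d) * pick(P_E0[c], e))

def posterior_d(e):
    """P(D | E = e): sum the full joint over A, B, C, then normalize."""
    unnorm = [sum(joint(a, b, c, d, e)
                  for a, b, c in product((0, 1), repeat=3))
              for d in (0, 1)]
    alpha = 1.0 / sum(unnorm)          # normalization constant
    return [alpha * u for u in unnorm]

post = posterior_d(0)                  # [P(d0 | e0), P(d1 | e0)]
print(post)
```

Every call to `joint` recomputes all five factors, which is exactly the repetition the slide points out.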

32
Exercise Enumeration
Network: smart and study are parents of prepared; smart, prepared, and fair are parents of pass
p(smart) = .8
p(study) = .6
p(fair) = .9
p(prep | smart, study):
            smart   ¬smart
study       .9      .7
¬study      .5      .1
p(pass | smart, prep, fair):
            smart   smart   ¬smart  ¬smart
            prep    ¬prep   prep    ¬prep
fair        .9      .7      .7      .2
¬fair       .1      .1      .1      .1
Query: What is the probability that a student
studied, given that they pass the exam?
33
Variable elimination

Basically just enumeration, but with caching of
local calculations
Linear for polytrees
Potentially exponential for multiply connected
BNs
Exact inference in Bayesian networks is NP-hard!

34
Variable elimination

General idea
Write the query in the form $P(X_n, e) = \sum_{x_k} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i \mid \pi_i)$
Iteratively
Move all irrelevant terms outside of innermost
sum
Perform innermost sum, getting a new term
Insert the new term into the product

35
Variable elimination
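Example
$\sum_A \sum_B \sum_C P(a) P(b \mid a) P(c \mid a) P(d \mid b, c) P(e \mid c)$ (8 × 4 = 32 multiplications)
$= \sum_A \sum_B P(a) P(b \mid a) \sum_C P(c \mid a) P(d \mid b, c) P(e \mid c)$
$= \sum_A P(a) \sum_B P(b \mid a) \sum_C P(c \mid a) P(d \mid b, c) P(e \mid c)$ (8 × 2 + 4 + 2 = 22 multiplications)
for each state of A = a
  for each state of B = b
    compute $f_C(a, b) = \sum_C P(c \mid a) P(d \mid b, c) P(e \mid c)$ (variable C is summed out)
  compute $f_B(a) = \sum_B P(b \mid a) f_C(a, b)$ (variable B is summed out)
compute result $= \sum_A P(a) f_B(a)$
Here $f_C(a, b)$ and $f_B(a)$ are called factors, which
are vectors or matrices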
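
The loop structure above can be written out directly and compared against plain enumeration. This sketch hard-codes the elimination order C, B, A for the example BN and is not a general variable elimination routine; the function names are illustrative.

```python
# CPTs of the example BN; state 0 means a0/b0/..., state 1 means a1/b1/...
P_A0 = 0.001
P_B0 = {0: 0.3, 1: 0.001}      # P(b0 | a)
P_C0 = {0: 0.2, 1: 0.005}      # P(c0 | a)
P_D0 = {(0, 0): 0.1, (0, 1): 0.01,
        (1, 0): 0.01, (1, 1): 0.00001}   # P(d0 | b, c), keyed (b, c)
P_E0 = {0: 0.4, 1: 0.002}      # P(e0 | c)

def pick(p0, state):
    return p0 if state == 0 else 1.0 - p0

def p_de_eliminated(d, e):
    """P(d, e) with sums pushed inward: sum_a P(a) sum_b P(b|a) sum_c ..."""
    total = 0.0
    for a in (0, 1):
        f_b = 0.0
        for b in (0, 1):
            # Factor f_C(a, b): variable C is summed out
            f_c = sum(pick(P_C0[a], c) * pick(P_D0[(b, c)], d) * pick(P_E0[c], e)
                      for c in (0, 1))
            f_b += pick(P_B0[a], b) * f_c      # variable B is summed out
        total += pick(P_A0, a) * f_b           # variable A is summed out
    return total

def p_de_enumerated(d, e):
    """Same quantity, multiplying all five CPT entries for each (a, b, c)."""
    return sum(pick(P_A0, a) * pick(P_B0[a], b) * pick(P_C0[a], c)
               * pick(P_D0[(b, c)], d) * pick(P_E0[c], e)
               for a in (0, 1) for b in (0, 1) for c in (0, 1))

ve = p_de_eliminated(0, 0)
en = p_de_enumerated(0, 0)
print(ve, en)
```

Both functions return the same value; the eliminated version simply avoids re-multiplying `P(a)` and `P(b|a)` inside the innermost sum.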
36
Exercise Variable elimination
Network: smart and study are parents of prepared; smart, prepared, and fair are parents of pass
p(smart) = .8
p(study) = .6
p(fair) = .9
p(prep | smart, study):
            smart   ¬smart
study       .9      .7
¬study      .5      .1
p(pass | smart, prep, fair):
            smart   smart   ¬smart  ¬smart
            prep    ¬prep   prep    ¬prep
fair        .9      .7      .7      .2
¬fair       .1      .1      .1      .1
Query: What is the probability that a student is
smart, given that they pass the exam?
37
Belief Propagation

Singly connected network, SCN (also known as
polytree):
there is at most one undirected path between any
two nodes (i.e., the network is a tree if the
directions of arcs are ignored)
The influence of the instantiated variable
(evidence) spreads to the rest of the network
along the arcs

The instantiated variable influences
its predecessors and successors differently
(using the CPT along opposite directions)
Computation is linear in the diameter of
the network (the longest undirected path)
Update belief (posterior) of every non-evidence
node in one pass
For multiply connected networks: conditioning

38
Conditioning

Conditioning: find the network's smallest cutset
S (a set of nodes whose removal renders the
network singly connected)
In this network, $S = \{A\}$ or $\{B\}$ or $\{C\}$ or $\{D\}$
For each instantiation of S, compute the belief
update with the belief propagation algorithm
Combine the results from all instantiations of S
(each is weighted by $P(S = s)$)
Computationally expensive (finding the smallest
cutset is in general NP-hard, and the total
number of possible instantiations of S is $O(2^{|S|})$)

39
Junction Tree

Convert a BN to a junction tree
Moralization: add an undirected edge between every
pair of parents, then drop the directions of all arcs,
giving the moralized graph
Triangulation: add an edge (chord) to any cycle of length
> 3, giving the triangulated graph
A junction tree is a tree of cliques of the
triangulated graph
Cliques are connected by links
A link stands for the set of all variables S
shared by the two cliques it connects
Each clique has a potential (similar to a CPT),
constructed from the CPTs of variables in the original
BN

40
Junction Tree

Example

41
Junction Tree

Reasoning
Since it is now a tree, the polytree algorithm can be
applied, but now two cliques exchange P(S), the
distribution over S, their shared variables.
Complexity
O(n) steps, where n is the number of cliques
Each step is expensive if cliques are large (CPT size is
exponential in clique size)
Construction of the CPTs of the JT is expensive as well,
but they need to be computed only once.

42
Some comments on BN reasoning

Let $X = \{x_1, \ldots, x_n\}$ be the set of
all variables in a BN. Any BN reasoning task can
be expressed in the form of calculating $P(U \mid V)$ for some $U, V \subseteq X$
This can be done by marginalization of the joint
distribution $P(X)$ over $Y = X \setminus U \setminus V$,
where each entry $P(x) = P(u, v, y)$ can be
calculated by the chain rule from the CPTs
Computation can be done more efficiently using,
say, a junction tree, by utilizing variable
interdependencies
Computational complexity of BN reasoning is
proved to be NP-hard by reducing 3SAT problems to
BN reasoning (Cooper 1990)

43
Approximate inference Direct sampling

Suppose you are given values for some subset of
the variables, E, and want to infer distributions
for unknown variables, Z
Randomly generate a very large number of
instantiations from the BN according to the
distribution
Generate instantiations for all variables: start
at root variables and work your way forward in
topological order
Rejection sampling: only keep those
instantiations that are consistent with the
values for E
Use the frequency of values for Z to get
estimated probabilities
Accuracy of the results depends on the size of
the sample (asymptotically approaches exact
results)
Very expensive and inefficient
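
As an illustration, here is a small rejection sampler for the smart/study/prepared/fair/pass network from the enumeration exercise, estimating P(study | pass). The sample size, seed, and function names are arbitrary choices for this sketch.

```python
import random

# CPTs from the exercise network.
P_PREP = {(True, True): 0.9, (False, True): 0.7,
          (True, False): 0.5, (False, False): 0.1}       # keyed (smart, study)
P_PASS_FAIR = {(True, True): 0.9, (True, False): 0.7,
               (False, True): 0.7, (False, False): 0.2}  # keyed (smart, prep)
P_PASS_UNFAIR = 0.1                                       # exam not fair

def sample_world(rng):
    """Sample every variable in topological order (roots first)."""
    smart = rng.random() < 0.8
    study = rng.random() < 0.6
    fair = rng.random() < 0.9
    prep = rng.random() < P_PREP[(smart, study)]
    p_pass = P_PASS_FAIR[(smart, prep)] if fair else P_PASS_UNFAIR
    passed = rng.random() < p_pass
    return {'smart': smart, 'study': study, 'fair': fair,
            'prep': prep, 'pass': passed}

def rejection_estimate(n, seed=0):
    """Estimate P(study | pass) and report how many samples were kept."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        w = sample_world(rng)
        if not w['pass']:        # inconsistent with the evidence: reject
            continue
        kept += 1
        hits += w['study']
    return hits / kept, kept

estimate, kept = rejection_estimate(20000)
print(estimate, kept)
```

Here roughly 30% of the samples are thrown away; with rarer evidence the rejected fraction grows quickly, which is why the method is called inefficient above.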

44
Likelihood weighting

Idea: don't generate samples that need to be
rejected in the first place!
Sample only from the unknown variables Z and
$Y = X \setminus Z \setminus E$ (E are fixed)
Weight each sample according to the likelihood
that it would occur, given the evidence E
A weight w is associated with each sample (w
initialized to 1)
When an evidence node (say E1 = e1-0) is selected
for weighting, its parents are already
instantiated (say parents A and B are assigned
states a and b)
Modify w: $w = w \cdot P(\text{e1-0} \mid a, b)$, based on E1's CPT
Repeat for the other evidence nodes
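
The weighting scheme can be sketched on the same exercise network, again estimating P(study | pass). Only the non-evidence variables are sampled; the single evidence node contributes its CPT entry to the weight. Names, seed, and sample size are illustrative.

```python
import random

# CPTs from the exercise network.
P_PREP = {(True, True): 0.9, (False, True): 0.7,
          (True, False): 0.5, (False, False): 0.1}       # keyed (smart, study)
P_PASS_FAIR = {(True, True): 0.9, (True, False): 0.7,
               (False, True): 0.7, (False, False): 0.2}  # keyed (smart, prep)
P_PASS_UNFAIR = 0.1

def likelihood_weighting(n, seed=0):
    """Estimate P(study | pass = true) from n weighted samples."""
    rng = random.Random(seed)
    w_study = w_total = 0.0
    for _ in range(n):
        # Sample the non-evidence variables in topological order.
        smart = rng.random() < 0.8
        study = rng.random() < 0.6
        fair = rng.random() < 0.9
        prep = rng.random() < P_PREP[(smart, study)]
        # Evidence node: do not sample it, multiply its likelihood into w.
        w = 1.0
        w *= P_PASS_FAIR[(smart, prep)] if fair else P_PASS_UNFAIR
        w_total += w
        if study:
            w_study += w
    return w_study / w_total

estimate = likelihood_weighting(20000)
print(estimate)
```

Every sample contributes, so none of the sampling effort is discarded; samples that make the evidence unlikely just count for less.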

45
Markov chain Monte Carlo algorithm

So called because
Markov chain: each instance generated in the
sampling is dependent on the previous instance
Monte Carlo: statistical sampling method
Perform a random walk through variable assignment
space, collecting statistics as you go
Start with a random instantiation, consistent
with evidence variables
At each step, randomly select a non-evidence
variable x and randomly sample its value by $P(x \mid mb_x)$, where $mb_x$ is the current state of the Markov blanket of x
Given enough samples, MCMC gives an accurate
estimate of the true distribution of values
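
A Gibbs-style sketch of this random walk on the exercise network, with pass fixed to true. For brevity it resamples a variable from the full joint with all other variables held fixed; the factors outside the Markov blanket cancel in the ratio, so this gives the same conditional as sampling from the blanket. Step count, burn-in, and seed are arbitrary.

```python
import random

P_PREP = {(True, True): 0.9, (False, True): 0.7,
          (True, False): 0.5, (False, False): 0.1}       # keyed (smart, study)
P_PASS_FAIR = {(True, True): 0.9, (True, False): 0.7,
               (False, True): 0.7, (False, False): 0.2}  # keyed (smart, prep)

def joint(w):
    """Full joint probability of one complete assignment w (a dict)."""
    p = 0.8 if w['smart'] else 0.2
    p *= 0.6 if w['study'] else 0.4
    p *= 0.9 if w['fair'] else 0.1
    pp = P_PREP[(w['smart'], w['study'])]
    p *= pp if w['prep'] else 1 - pp
    ps = P_PASS_FAIR[(w['smart'], w['prep'])] if w['fair'] else 0.1
    p *= ps if w['pass'] else 1 - ps
    return p

def gibbs_estimate(steps, burn_in=1000, seed=0):
    """Estimate P(study | pass = true) from a single Markov chain."""
    rng = random.Random(seed)
    hidden = ['smart', 'study', 'fair', 'prep']
    w = {v: rng.random() < 0.5 for v in hidden}   # random start
    w['pass'] = True                              # evidence stays fixed
    hits = count = 0
    for t in range(steps):
        v = rng.choice(hidden)                    # pick a non-evidence variable
        w[v] = True
        p_true = joint(w)
        w[v] = False
        p_false = joint(w)
        # Resample v from P(v | all other variables)
        w[v] = rng.random() < p_true / (p_true + p_false)
        if t >= burn_in:
            count += 1
            hits += w['study']
    return hits / count

estimate = gibbs_estimate(50000)
print(estimate)
```

Consecutive states differ in at most one variable, so the samples are correlated and more steps are needed than with independent sampling to reach the same accuracy.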

46
Loopy Propagation

Belief propagation
Works only for polytrees (exact solution)
Each evidence propagates once throughout the
network
Loopy propagation
Let propagation continue until the network
stabilizes (one hopes)
Experiments show
Many BNs stabilize with loopy propagation
If a network stabilizes, it often yields exact or very
good approximate solutions
Analysis
Conditions for convergence and the quality of the
approximation are under intense investigation

47
Noisy-Or BN

A special BN of binary variables (Peng & Reggia,
Cooper)
Causation independence: parent nodes influence a
child independently
Advantages
One-to-one correspondence between causal links
and causal strengths
Easy for humans to understand (acquire and
evaluate KB)
Fewer probabilities needed in KB
Computation is less expensive
Disadvantage: less expressive (less general)

48
Learning BN (from case data)

Needs for learning
Difficult to construct BN by humans (esp. CPT)
Experts' opinions are often biased, inaccurate,
and incomplete
Large databases of cases become available
What to learn
Parameter learning: learning CPT when DAG is
known (easy)
Structural learning: learning DAG (hard)
Difficulties in learning DAG from case data
There are too many possible DAGs when the number of
variables is large (more than exponential)
n     number of possible DAGs
3     25
10    $4 \times 10^{18}$
Missing values in database
Noisy data
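
The DAG counts quoted above can be checked with Robinson's recurrence for labeled DAGs, a known combinatorial result that is brought in here for illustration and is not part of the original slides. The function name `num_dags` is an arbitrary choice.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence).

    a(0) = 1
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k)
    """
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in (3, 10):
    print(n, num_dags(n))
```

The count for 10 variables is already on the order of 10^18, which is why structure learning relies on heuristic search rather than exhaustive scoring.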

49
BN Learning Approaches

Early effort: based on variable dependencies
(Pearl)
Find all pairs of variables that are dependent on
each other (applying standard statistical methods
to the database)
Eliminate (as much as possible) indirect
dependencies
Determine directions of dependencies
Learning results are often incomplete (learned BN
contains indirect dependencies and undirected
links)

50
BN Learning Approaches

Bayesian approach (Cooper)
Find the most probable DAG, given database DB,
i.e.,
$\max P(DAG \mid DB)$ or $\max P(DAG, DB)$
Based on some assumptions, a formula is developed
to compute $P(DAG, DB)$ for a given pair of DAG and
DB
A hill-climbing algorithm (K2) is developed to
search for a (sub)optimal DAG using a pre-determined
partial order of the variables
Compute CPTs after the DAG is determined
Extensions to handle some forms of missing values

51
BN Learning Approaches

Minimum description length (MDL) (Lam et al.)
Sacrifices accuracy for simpler (less dense)
structure
Case data not always accurate
Outliers are hard to model (need more links)
Fewer links imply smaller CPD tables and less
expensive inference
$L = L_1 + L_2$ where
$L_1$: the length of the encoding of DAG (smaller
for simpler DAG)
$L_2$: the length of the encoding of the difference
between DAG and DB (smaller for better match of
DAG with DB)
Smaller $L_1$ implies less accurate DAG, and thus
larger $L_2$
Find DAG by heuristic best-first search that
minimizes $L$

52
BN Learning Approaches

Neural network approach (Neal, Peng)
For noisy-or BN
Change inter-node link strength locally,
following gradient descent approach to maximize
L.

Compare the neural network approach with Cooper's K2
Network: Alarm (37 nodes)

cases    missing links    extra links    time
500      2/0              2/6            63.76/5.91
1000     0/0              1/1            69.62/6.04
2000     0/0              0/0            77.45/5.86
10000    0/0              0/0            161.97/5.83
54
Current research in BN

Missing data
Missing values: EM (expectation maximization)
Missing (hidden) variables are harder to handle
BN with time
Dynamic BN: assuming temporal relations obey a
Markov chain
Cyclic relations
Often found in socio-economic analysis
Using dynamic BN?
Continuous variables
Some work on variables obeying Gaussian
distributions
Connecting to other fields
Databases, statistics, symbolic AI (FOL),
Semantic Web
Reasoning with uncertain evidence
Virtual evidence
Soft evidence

55
Other formalisms for Uncertainty: Fuzzy sets and
fuzzy logic

Ordinary set theory
There are sets that are described by vague
linguistic terms (sets without hard, clearly
defined boundaries), e.g., tall-person, fast-car
Continuous
Subjective (context dependent)
Hard to define a clear-cut 0/1 membership function

Fuzzy set theory
height(john) = 6'5", Tall(john) = 0.9
height(harry) = 5'8", Tall(harry) = 0.5
height(joe) = 5'1", Tall(joe) = 0.1
Examples of membership functions

Fuzzy logic: many-valued logic
Fuzzy predicates (degree of truth)
Connectors/operators
Compare with probability theory
Prob.: uncertainty of outcome
Based on a large number of repetitions or instances
For each experiment (instance), the outcome is
either true or false (without uncertainty or
ambiguity)
Unsure before it happens but sure after it
happens
Fuzzy: vagueness of conceptual/linguistic
characteristics
Unsure even after it happens
E.g., whether a child of a tall mother and a short father
is tall:
unsure before the child is born
unsure after the child has grown up (height 5'6")

Empirical vs subjective (testable vs agreeable)
Fuzzy set connectors may lead to unreasonable
results
Consider two events A and B with $P(A) < P(B)$
If $A \Rightarrow B$ (or $A \subseteq B$) then
$P(A \wedge B) = P(A) = \min\{P(A), P(B)\}$
$P(A \vee B) = P(B) = \max\{P(A), P(B)\}$
Not the case in general
$P(A \wedge B) = P(A) P(B \mid A) \le P(A)$
$P(A \vee B) = P(A) + P(B) - P(A \wedge B) \ge P(B)$
(equality holds only if $P(B \mid A) = 1$, i.e., $A \Rightarrow B$)
Something prob. theory cannot represent
$Tall(john) = 0.9$, $\neg Tall(john) = 0.1$
$Tall(john) \wedge \neg Tall(john) = \min\{0.1, 0.9\} = 0.1$
john's degree of membership in the fuzzy set of
median-height people (both Tall and not-Tall)
In prob. theory: $P(john \in Tall \wedge john \notin Tall) = 0$
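
The min/max connectors and the contrast with probability can be sketched as follows. The complement 1 - x matches the Tall/not-Tall values on the slide; the probabilities 0.3 and 0.6 and the independence of A and B are hypothetical choices for the comparison.

```python
def f_and(x, y):
    """Fuzzy conjunction: minimum of the two membership degrees."""
    return min(x, y)

def f_or(x, y):
    """Fuzzy disjunction: maximum of the two membership degrees."""
    return max(x, y)

def f_not(x):
    """Fuzzy negation: complement of the membership degree."""
    return 1.0 - x

tall_john = 0.9
# Degree to which john is both Tall and not-Tall (median height)
both = f_and(tall_john, f_not(tall_john))

# Hypothetical probabilities of two independent events A and B.
p_a, p_b = 0.3, 0.6
p_and = p_a * p_b                 # exact value under independence
p_or = p_a + p_b - p_and
fuzzy_and = f_and(p_a, p_b)       # what min would report instead
fuzzy_or = f_or(p_a, p_b)         # what max would report instead
print(both, p_and, fuzzy_and, p_or, fuzzy_or)
```

For these independent events min overstates the conjunction and max understates the disjunction, matching the inequalities above.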

59
Uncertainty in rule-based systems

Elements in Working Memory (WM) may be uncertain
because
Case input (initial elements in WM) may be
uncertain
Ex: the CD-Drive does not work 70% of the time
Decision from a rule application may be uncertain
even if the rule's conditions are met by WM with
certainty
Ex: flu => sore throat with high probability
Combining symbolic rules with numeric
uncertainty: MYCIN's
Certainty Factor (CF)
An early attempt to incorporate uncertainty into
KB systems
$CF \in [-1, 1]$
Each element in WM is associated with a CF:
certainty of that assertion
Each rule $C_1, \ldots, C_n \Rightarrow Conclusion$ is associated
with a CF: certainty of the association (between
$C_1, \ldots, C_n$ and Conclusion).

CF propagation
Within a rule: each $C_i$ has $CF_i$, then the
certainty of the Conclusion is
$\min\{CF_1, \ldots, CF_n\} \times$ CF-of-the-rule
When more than one rule can apply to the current
WM for the same Conclusion with different CFs,
the largest of these CFs will be assigned as the
CF for Conclusion
Similar to fuzzy rules for conjunctions and
disjunctions
Good things about MYCIN's CF method
Easy to use
CF operations are reasonable in many applications
Probably the only method for uncertainty used in
real-world rule-based systems
Limitations
It is in essence an ad hoc method (it can be
viewed as a probabilistic inference system with
some strong, sometimes unreasonable assumptions)
May produce counter-intuitive results.
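
The two propagation rules as stated on this slide can be sketched in a few lines. The rules and CF values below are hypothetical, and the sketch follows only the slide's simplified description (min within a rule, max across rules), not a full reimplementation of MYCIN.

```python
def rule_cf(condition_cfs, cf_of_rule):
    """CF of a rule's conclusion: weakest condition times the rule's own CF."""
    return min(condition_cfs) * cf_of_rule

def combine(cfs):
    """Several rules support the same conclusion: keep the largest CF."""
    return max(cfs)

# Hypothetical working memory and rules, both concluding "flu".
cf_rule1 = rule_cf([0.8, 0.6], 0.9)   # two conditions, rule CF 0.9
cf_rule2 = rule_cf([0.7], 0.5)        # one condition, rule CF 0.5
cf_flu = combine([cf_rule1, cf_rule2])
print(cf_rule1, cf_rule2, cf_flu)
```

The min and max here play the same role as the fuzzy conjunction and disjunction connectors.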

61
Dempster-Shafer theory

A variation of Bayes' theorem to represent
ignorance
Uncertainty and ignorance
Suppose two events A and B are ME and EXH, given
an evidence E
A: having cancer; B: not having cancer; E: smoking
By Bayes' theorem: our beliefs on A and B, given
E, are measured by $P(A \mid E)$ and $P(B \mid E)$, and
$P(A \mid E) + P(B \mid E) = 1$
In reality,
I may have some belief in A, given E
I may have some belief in B, given E
I may have some belief not committed to either
one
The uncommitted belief (ignorance) should not be
given to either A or B, even though I know one of
the two must be true, but rather it should be
given to "A or B", denoted $\{A, B\}$
Uncommitted belief may be given to A and B when
new evidence is discovered