Transcript and Presenter's Notes

Title: Lecture Notes 16: Bayes


1
Lecture Notes 16: Bayes Theorem and Data Mining
  • Zhangxi Lin
  • ISQS 6347

2
Modeling Uncertainty
  • Probability Review
  • Bayes Classifier
  • Value of Information
  • Conditional Probability and Bayes Theorem
  • Expected Value of Perfect Information
  • Expected Value of Imperfect Information

3
Probability Review
  • P(A|B) = P(A and B) / P(B)
  • Probability of A given B
  • Example: there are 40 female students in a class
    of 100, and 10 of them are from foreign
    countries. 20 male students are also foreign
    students.
  • Event A: the student is from a foreign country
  • Event B: the student is female
  • If we randomly choose a female student to present
    in class, the probability that she is a foreign
    student is P(A|B) = 10 / 40 = 0.25, or P(A|B) =
    P(A and B) / P(B) = (10 / 100) / (40 / 100) = 0.1 /
    0.4 = 0.25
  • That is, P(A|B) = #(A and B) / #B = (#(A and B) /
    Total) / (#B / Total) = P(A and B) / P(B)
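A quick numeric check of this counting argument, as a minimal Python sketch (not part of the original slides); the counts are the ones from the example above:

  # Counts from the example: 100 students, 40 female, 10 of whom are foreign.
  total = 100
  female = 40
  female_and_foreign = 10

  # P(A|B): probability a student is foreign (A) given she is female (B)
  p_B = female / total                    # P(B) = 0.4
  p_A_and_B = female_and_foreign / total  # P(A and B) = 0.1
  p_A_given_B = p_A_and_B / p_B

  print(p_A_given_B)                      # 0.25, the same as 10 / 40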

4
Venn Diagrams
(Venn diagram of the 100 students: the Female circle contains 30 + 10 = 40 students, the Foreign Student circle contains 20 + 10 = 30 students, their overlap is the 10 female foreign students, and the remaining 40 are male non-foreign students.)
5
Probability Review
  • Complement: P(not A) = 1 - P(A)
(Venn diagram: the complement of Female is Non-Female; the complement of Foreign Student is Non-Foreign Student.)
6
Bayes Classifier
7
Bayes Theorem (From Wikipedia)
  • In probability theory, Bayes' theorem (often
    called Bayes' Law) relates the conditional and
    marginal probabilities of two random events. It
    is often used to compute posterior probabilities
    given observations. For example, a patient may be
    observed to have certain symptoms. Bayes' theorem
    can be used to compute the probability that a
    proposed diagnosis is correct, given that
    observation.
  • As a formal theorem, Bayes' theorem is valid in
    all interpretations of probability. However, it
    plays a central role in the debate over the
    foundations of statistics: frequentist and
    Bayesian interpretations disagree about the ways
    in which probabilities should be assigned in
    applications. Frequentists assign probabilities
    to random events according to their frequencies
    of occurrence or to subsets of populations as
    proportions of the whole, while Bayesians
    describe probabilities in terms of beliefs and
    degrees of uncertainty. The articles on Bayesian
    probability and frequentist probability discuss
    these debates at greater length.

8
Bayes Theorem
From the definition of conditional probability, P(A|B) = P(A and B) / P(B) and P(B|A) = P(A and B) / P(A), so
P(A|B) = P(B|A) P(A) / P(B)
The above formula is referred to as Bayes theorem. It is extremely useful in decision analysis when using information.
9
Example of Bayes Theorem
  • Given
  • A doctor knows that meningitis (M) causes stiff
    neck (S) 50% of the time
  • Prior probability of any patient having
    meningitis is 1/50,000
  • Prior probability of any patient having a stiff
    neck is 1/20
  • If a patient has a stiff neck, what is the
    probability he/she has meningitis?
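The arithmetic this question calls for (the answer is not shown on the slide itself) follows directly from Bayes theorem:

  P(M|S) = P(S|M) P(M) / P(S) = 0.5 × (1/50,000) / (1/20) = 0.0002

So observing a stiff neck raises the probability of meningitis from 1/50,000 (0.00002) to 0.0002, which is still small.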

10
Bayes Classifiers
  • Consider each attribute and class label as random
    variables
  • Given a record with attributes (A1, A2, ..., An)
  • Goal is to predict class C (C is one of c1, c2, ..., cm)
  • Specifically, we want to find the value of C that
    maximizes P(C | A1, A2, ..., An)
  • Can we estimate P(C | A1, A2, ..., An) directly from
    data?

11
Bayes Classifiers
  • Approach
  • Compute the posterior probability P(C | A1, A2,
    ..., An) for all values of C using Bayes
    theorem
  • Choose the value of C that maximizes P(C | A1, A2,
    ..., An)
  • Equivalent to choosing the value of C that maximizes
    P(A1, A2, ..., An | C) P(C)
  • How to estimate P(A1, A2, ..., An | C)?

12
Example
  • C = Evade (Yes, No)
  • A1 = Refund (Yes, No)
  • A2 = Marital Status (Single, Married, Divorced)
  • A3 = Taxable Income (60K - 220K)
  • We can obtain P(A1, A2, A3 | C), P(A1, A2, A3), and
    P(C) from the data set
  • Then calculate P(C | A1, A2, A3) for predictions
    given A1, A2, and A3, while C is unknown.

13
Naïve Bayes Classifier
  • Assume independence among attributes Ai when the
    class is given
  • P(A1, A2, ..., An | Cj) = P(A1 | Cj) P(A2 | Cj) ... P(An | Cj)
  • Can estimate P(Ai | Cj) for all Ai and Cj.
  • A new point is classified as Cj if P(Cj) Π P(Ai | Cj)
    is maximal.
  • Note: when the priors P(Cj) are identical, this is
    equivalent to finding the class j for which Π P(Ai | Cj)
    is maximal (a sketch of the whole procedure follows below).
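A minimal Python sketch of a categorical naive Bayes classifier built from counts, in the spirit of the Refund / Marital Status / Evade example on the following slides; the training records below are illustrative, not the lecture's exact data set:

  from collections import Counter, defaultdict

  # Illustrative training records (hypothetical values, loosely modeled on
  # the Refund / Marital Status / Evade example).
  records = [
      ({"Refund": "Yes", "Status": "Single"},   "No"),
      ({"Refund": "No",  "Status": "Married"},  "No"),
      ({"Refund": "No",  "Status": "Single"},   "No"),
      ({"Refund": "Yes", "Status": "Married"},  "No"),
      ({"Refund": "No",  "Status": "Divorced"}, "Yes"),
      ({"Refund": "No",  "Status": "Single"},   "Yes"),
      ({"Refund": "No",  "Status": "Married"},  "No"),
  ]

  # Estimate P(C) and P(Ai = v | C) by counting.
  class_counts = Counter(label for _, label in records)
  cond_counts = defaultdict(Counter)   # (attribute, class) -> value counts
  for attrs, label in records:
      for attr, value in attrs.items():
          cond_counts[(attr, label)][value] += 1

  def predict(attrs):
      """Return the class maximizing P(C) * prod_i P(Ai | C)."""
      best_class, best_score = None, -1.0
      n = sum(class_counts.values())
      for c, nc in class_counts.items():
          score = nc / n                                        # prior P(C)
          for attr, value in attrs.items():
              score *= cond_counts[(attr, c)][value] / nc       # P(Ai = v | C)
          if score > best_score:
              best_class, best_score = c, score
      return best_class

  print(predict({"Refund": "No", "Status": "Married"}))         # -> "No"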

14
How to Estimate Probabilities from Data?
  • Class prior: P(C) = Nc / N
  • e.g., P(No) = 7/10, P(Yes) = 3/10
  • For discrete attributes: P(Ai | Ck) = |Aik| / Nc
  • where |Aik| is the number of instances having
    attribute value Ai and belonging to class Ck
  • Examples:
  • P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0
15
How to Estimate Probabilities from Data?
  • For continuous attributes:
  • Discretize the range into bins
  • one ordinal attribute per bin
  • violates the independence assumption
  • Two-way split: (A < v) or (A > v)
  • choose only one of the two splits as the new
    attribute
  • Probability density estimation
  • Assume the attribute follows a normal distribution
  • Use data to estimate the parameters of the distribution
    (e.g., mean and standard deviation)
  • Once the probability distribution is known, we can use
    it to estimate the conditional probability P(Ai | c)

16
How to Estimate Probabilities from Data?
  • Normal distribution
  • One for each (Ai, ci) pair
  • For (Income, Class=No):
  • If Class = No
  • sample mean = 110
  • sample variance = 2975
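A quick Python check (not part of the original slides) of the 0.0072 figure used on the next slide, assuming the normal density evaluated at Income = 120K with the sample mean and variance quoted here:

  import math

  mean, var = 110.0, 2975.0   # sample mean and variance for (Income, Class=No)
  x = 120.0                   # Income = 120K

  density = math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
  print(round(density, 4))    # ~0.0072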

17
Example of Naïve Bayes Classifier
Given a test record X = (Refund=No, Married, Income=120K)
  • P(X | Class=No) = P(Refund=No | Class=No) ×
    P(Married | Class=No) × P(Income=120K |
    Class=No) = 4/7 × 4/7 × 0.0072
    = 0.0024
  • P(X | Class=Yes) = P(Refund=No | Class=Yes)
    × P(Married | Class=Yes)
    × P(Income=120K | Class=Yes)
    = 1 × 0 × 1.2 × 10^-9 = 0
  • Since P(X|No)P(No) > P(X|Yes)P(Yes),
  • therefore P(No|X) > P(Yes|X), i.e. Class = No
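Reproducing the slide's arithmetic in Python (priors taken from slide 14, P(No) = 7/10 and P(Yes) = 3/10):

  p_x_given_no  = (4 / 7) * (4 / 7) * 0.0072   # ~0.0024
  p_x_given_yes = 1 * 0 * 1.2e-9               # 0

  p_no, p_yes = 7 / 10, 3 / 10

  # ~0.00165 vs 0.0, so the predicted class is No.
  print(p_x_given_no * p_no, p_x_given_yes * p_yes)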

18
Naïve Bayes Classifier
  • If one of the conditional probabilities is zero,
    then the entire expression becomes zero
  • Probability estimation (see the corrected estimators sketched below)

c: number of classes, p: prior probability, m: parameter
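The estimator formulas on this slide were images in the original deck; the standard corrections that match the parameter names listed above (my reading, not necessarily the exact formulas shown) are:

  Original:   P(Ai | C) = Nic / Nc
  Laplace:    P(Ai | C) = (Nic + 1) / (Nc + c)
  m-estimate: P(Ai | C) = (Nic + m p) / (Nc + m)

For example, with c = 2 classes the Laplace-corrected estimate of P(Refund=Yes | Yes) from the earlier slide becomes (0 + 1) / (3 + 2) = 0.2 instead of 0.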
19
Example of Naïve Bayes Classifier
A: attributes, M: mammals, N: non-mammals
P(A|M)P(M) > P(A|N)P(N), therefore Mammals
20
Naïve Bayes (Summary)
  • Robust to isolated noise points
  • Handle missing values by ignoring the instance
    during probability estimate calculations
  • Robust to irrelevant attributes
  • Independence assumption may not hold for some
    attributes
  • Use other techniques such as Bayesian Belief
    Networks (BBN)

21
Value of Information
  • When facing uncertain prospects we need
    information in order to reduce uncertainty
  • Information gathering includes consulting
    experts, conducting surveys, performing
    mathematical or statistical analyses, etc.

22
Expected Value of Perfect Information (EVPI)
Problem: A buyer is to buy something online.
Decision tree (net gain by seller type):
  • Do not use insurance (pay 100): Good seller (0.99) → gain 20; Bad seller (0.01) → lose 100. EMV = 18.8
  • Use insurance (pay 100 + 2 = 102): Good seller (0.99) → gain 18; Bad seller (0.01) → lose 2. EMV = 17.8
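A small Python check (not part of the original slides) of the two EMVs above; the last lines also compute the EVPI that the slide title refers to, under the usual assumption that a buyer with perfect information insures only when the seller is bad:

  p_good, p_bad = 0.99, 0.01

  emv_no_insurance = p_good * 20 + p_bad * (-100)   # 18.8
  emv_insurance    = p_good * 18 + p_bad * (-2)     # 17.8

  # With perfect information: skip insurance for a good seller (gain 20),
  # buy insurance for a bad seller (lose only 2).
  emv_perfect_info = p_good * 20 + p_bad * (-2)     # 19.78

  evpi = emv_perfect_info - max(emv_no_insurance, emv_insurance)
  print(round(emv_no_insurance, 2), round(emv_insurance, 2), round(evpi, 2))
  # 18.8  17.8  0.98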
23
Expected Value of Imperfect Information (EVII)
  • We rarely have access to perfect information, so we
    must extend our analysis to deal with imperfect
    information.
  • Now suppose we can use the seller's online reputation
    to estimate the risk in trading with that seller.
  • Someone provides suggestions to you based on their
    experience. Their predictions are not 100% correct:
  • If the product is actually good, the person's
    prediction is 90% correct, whereas the remaining
    10% of the time a bad product is suggested.
  • If the product is actually bad, the person's
    prediction is 80% correct, whereas the remaining
    20% of the time a good product is suggested.
  • Although the estimate is not perfectly accurate, it
    can be used to improve our decision making:
  • If we predict that the risk of buying the product
    online is high, we purchase insurance.

24
Decision Tree
Extended from the previous online trading question.
Questions: 1. Given the suggestion, what is your decision? 2. What is the probability w.r.t. the decision you made? 3. How do you estimate the accuracy of a prediction?
Decision tree (net gain by seller type; probabilities to be determined):
  • Predicted Good:
    • No insurance: Good (?) → 20; Bad (?) → -100
    • Insurance: Good (?) → 18; Bad (?) → -2
  • Predicted Bad:
    • No insurance: Good (?) → 20; Bad (?) → -100
    • Insurance: Good (?) → 18; Bad (?) → -2
25
Applying Bayes Theorem
  • Let Good be event A
  • Let Bad be event B
  • Let Predicted Good be event G
  • Let Predicted Bad be event W
  • According to the previous information, for
    example by data mining the historical data, we
    know
  • P(G|A) = 0.9, P(W|A) = 0.1
  • P(W|B) = 0.8, P(G|B) = 0.2
  • P(A) = 0.99, P(B) = 0.01
  • We want to learn the probability that the outcome is
    good given that the prediction is good, i.e.
  • P(A|G) = ?
  • We want to learn the probability that the outcome is
    bad given that the prediction is bad, i.e.
  • P(B|W) = ?
  • We may apply Bayes theorem to solve this with
    imperfect information

26
Calculate P(G) and P(W)
  • P(G) = P(G|A)P(A) + P(G|B)P(B)
  •      = 0.9 × 0.99 + 0.2 × 0.01
  •      = 0.893
  • P(W) = P(W|B)P(B) + P(W|A)P(A)
  •      = 0.8 × 0.01 + 0.1 × 0.99
  •      = 0.107
  •      = 1 - P(G)

27
Applying Bayes Theorem
  • We have
  • P(A|G) = P(G|A)P(A) / P(G)
  •        = P(G|A)P(A) / [P(G|A)P(A) + P(G|B)P(B)]
  •        = P(G|A)P(A) / [P(G|A)P(A) + P(G|B)(1 - P(A))]
  •        = 0.9 × 0.99 / (0.9 × 0.99 + 0.2 × 0.01)
  •        = 0.9978 > 0.99
  • P(B|W) = P(W|B)P(B) / P(W)
  •        = P(W|B)P(B) / [P(W|B)P(B) + P(W|A)P(A)]
  •        = P(W|B)P(B) / [P(W|B)P(B) + P(W|A)(1 - P(B))]
  •        = 0.8 × 0.01 / (0.8 × 0.01 + 0.1 × 0.99)
  •        = 0.0748 > 0.01
  • Apparently, data mining provides good information
    and changes the original probabilities
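The same posterior calculation as a minimal Python sketch (not part of the original slides), using the priors and prediction accuracies from slide 25:

  # Priors and prediction accuracies from slide 25.
  p_A, p_B = 0.99, 0.01                     # actually good / bad
  p_G_given_A, p_W_given_A = 0.9, 0.1
  p_W_given_B, p_G_given_B = 0.8, 0.2

  p_G = p_G_given_A * p_A + p_G_given_B * p_B   # 0.893
  p_W = 1 - p_G                                 # 0.107

  p_A_given_G = p_G_given_A * p_A / p_G         # ~0.9978
  p_B_given_W = p_W_given_B * p_B / p_W         # ~0.0748
  print(round(p_A_given_G, 4), round(p_B_given_W, 4))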

28
Decision Tree
P(A) = 0.99, P(B) = 0.01
  • Predicted Good, P(G) = 0.893:
    • No insurance: Good (0.9978) → 20; Bad (0.0022) → -100. EMV = 19.87 (your choice)
    • Insurance: Good (0.9978) → 18; Bad (0.0022) → -2. EMV = 17.78
  • Predicted Bad, P(W) = 0.107:
    • No insurance: Good (0.9252) → 20; Bad (0.0748) → -100. EMV = 11.03
    • Insurance: Good (0.9252) → 18; Bad (0.0748) → -2. EMV = 16.50 (your choice)
Data mining can significantly improve your decision-making accuracy!
29
Consequences of a Decision
                     Predicted Good (G)            Predicted Bad (W)
                     (not buying insurance)        (buying insurance, cost 2)
  Actual Good (A)    a: gain 20                    b: net gain 18
  Actual Bad (B)     c: lose 100                   d: cost 2

P(A) = (a + b) / (a + b + c + d) = 0.99,  P(B) = (c + d) / (a + b + c + d) = 0.01
P(G) = (a + c) / (a + b + c + d) = 0.893, P(W) = (b + d) / (a + b + c + d) = 0.107
P(G|A) = a / (a + b) = 0.9,  P(W|A) = b / (a + b) = 0.1
P(W|B) = d / (c + d) = 0.8,  P(G|B) = c / (c + d) = 0.2
30
German Bank Credit Decision
                 Computed Good (Action A, B)     Computed Bad (Action A, B)     Total
  Actual Good    True Positive: 600 (6, 0)       False Negative: 100 (0, -1)    700
  Actual Bad     False Positive: 80 (-2, -1)     True Negative: 220 (-20, 0)    300
  Total          680                             320
This is a modified version of the German bank credit decision problem.
1. Assume that, because of anti-discrimination regulation, there could be a cost for a false negative depending on the action taken.
2. The bank has two choices of action, A and B. Each will have different results.
3. Question 1: When the classification model suggests that a specific loan applicant has a probability of 0.8 of being GOOD, which action should be taken?
4. Question 2: When the classification model suggests that a specific loan applicant has a probability of 0.6 of being GOOD, which action should be taken? (One possible reading of these questions is sketched below.)
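The slide does not spell out the decision rule, so the short Python sketch below is only one reading: it assumes the applicant falls in the "Computed Good" column and compares the expected per-applicant payoff of Actions A and B using the payoffs from the table above.

  # Payoffs per applicant from the "Computed Good" column: Action A pays 6 if
  # the applicant is actually good and -2 if actually bad; Action B pays 0 and -1.
  payoff_if_good = {"A": 6, "B": 0}
  payoff_if_bad = {"A": -2, "B": -1}

  def expected_payoff(p_good, action):
      """Expected payoff of an action for an applicant who is GOOD with probability p_good."""
      return p_good * payoff_if_good[action] + (1 - p_good) * payoff_if_bad[action]

  for p_good in (0.8, 0.6):   # Question 1 and Question 2
      print(p_good, expected_payoff(p_good, "A"), expected_payoff(p_good, "B"))
  # Under this reading Action A has the larger expected payoff in both cases.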
31
The Payoffs from Two Actions
                 Computed Good (Action A)     Computed Bad (Action A)     Total
  Actual Good    True Positive: 600 (6)       False Negative: 100 (0)     700
  Actual Bad     False Positive: 100 (-2)     True Negative: 200 (-20)    300
  Total          700                          300

                 Computed Good (Action B)     Computed Bad (Action B)     Total
  Actual Good    True Positive: 600 (0)       False Negative: 100 (-1)    700
  Actual Bad     False Positive: 100 (-1)     True Negative: 200 (0)      300
  Total          700                          300
32
Summary
  • There are two decision scenarios:
  • In the previous classification problems, when the
    predicted target is 1 we take an action,
    otherwise we do nothing. Only the action makes
    a difference.
  • There is a cutoff value for this kind of
    decision. A risk-averse person may set a
    higher cutoff value when the utility
    function is not linear with respect to the
    monetary result.
  • The risk-averse person may opt to earn less
    without the emotional worry of the risk.
  • In the current Bayesian decision problem, when the
    predicted target is 1 we take action A,
    otherwise we take action B. Both actions result
    in some outcome.

33
Web Page Browsing
Problem: When a browsing user entered P5 from P2, what is the probability that he will proceed to P3? How do we solve the problem in general?
1. Assume this is a first-order Markov chain.
2. Construct a transition probability matrix.
(Figure: web site link graph over pages P0-P5; the labels 0.7 and 0.3 appear on the links between P5, P4, and P3.)
  • We notice that:
  • P(P2→P4 | P0) may not equal P(P2→P4 | P1), i.e. the
    transition out of P2 may depend on the page the user came from
  • There is only one entrance to the web site, at P0
  • There is no link from P3 to other pages.

34
Transition Probabilities
P(K, L) = probability of traveling FROM K TO L

  From \ To   P0/H      P1        P2        P3        P4        P5        Exit
  P0/H        P(H,H)    P(H,1)    P(H,2)    P(H,3)    P(H,4)    P(H,5)    P(H,E)
  P1          P(1,H)    P(1,1)    P(1,2)    P(1,3)    P(1,4)    P(1,5)    P(1,E)
  P2          P(2,H)    P(2,1)    P(2,2)    P(2,3)    P(2,4)    P(2,5)    P(2,E)
  P3          P(3,H)    P(3,1)    P(3,2)    P(3,3)    P(3,4)    P(3,5)    P(3,E)
  P4          P(4,H)    P(4,1)    P(4,2)    P(4,3)    P(4,4)    P(4,5)    P(4,E)
  P5          P(5,H)    P(5,1)    P(5,2)    P(5,3)    P(5,4)    P(5,5)    P(5,E)
  Exit        0         0         0         0         0         0         0
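A minimal Python sketch of how such a transition matrix can be estimated from click sequences under the first-order Markov assumption; the session data below is hypothetical, not the Commrex log referenced on the next slide:

  from collections import Counter, defaultdict

  # Hypothetical browsing sessions; each ends with an explicit "Exit" state.
  sessions = [
      ["P0", "P2", "P5", "P4", "Exit"],
      ["P0", "P1", "P5", "P3", "Exit"],
      ["P0", "P2", "P5", "P3", "Exit"],
  ]

  counts = defaultdict(Counter)
  for s in sessions:
      for src, dst in zip(s, s[1:]):
          counts[src][dst] += 1        # count transitions FROM src TO dst

  # Row-normalize the counts to get P(K, L).
  transition = {
      src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
      for src, dsts in counts.items()
  }
  print(transition["P5"])   # e.g. {'P4': 0.33..., 'P3': 0.66...}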
35
Demonstration
  • Dataset: Commrex web log data
  • Data exploration
  • Link analysis
  • The links among nodes
  • Calculate the transition matrix
  • The Bayesian network model for the web log data
  • Reference:
  • David Heckerman, A Tutorial on Learning With
    Bayesian Networks, March 1995 (revised November
    1996), Technical Report MSR-TR-95-06
    (\\BASRV1\ISQS 6347\tr-95-06.pdf)

36
Readings
  • SPB, Chapter 3
  • RG, Chapter 10