# Lecture 2. Bayesian Decision Theory - PowerPoint PPT Presentation

Author: Tianwei Yu. Last modified by: Tianwei Yu. Created: 1/23/2009 1:44:14 AM. Document presentation format: PowerPoint presentation.

Slides: 36

1
Lecture 2. Bayesian Decision Theory
• Bayes Decision Rule
• Loss function
• Decision surface
• Multivariate normal and Discriminant Function

2
Bayes Decision
Bayes decision theory describes decision making when all underlying probability distributions are known. Given those distributions, the decision is optimal.
For two classes $\omega_1$ and $\omega_2$, the prior probabilities for an unknown new observation are:
• $P(\omega_1)$: the new observation belongs to class 1
• $P(\omega_2)$: the new observation belongs to class 2
• $P(\omega_1) + P(\omega_2) = 1$
The priors reflect our prior knowledge. They give our decision rule when no feature of the new object is available: classify as class 1 if $P(\omega_1) > P(\omega_2)$.
3
Bayes Decision
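We observe features $x$ on each object.
$p(x \mid \omega_1)$, $p(x \mid \omega_2)$: the class-specific densities.
The Bayes rule:
$P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$, where $p(x) = \sum_j p(x \mid \omega_j)\, P(\omega_j)$.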
4
Bayes Decision
Likelihood of observing x given class label.
5
Bayes Decision
Posterior probabilities.
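As a minimal sketch of how posteriors follow from priors and class-specific densities via Bayes rule, the snippet below uses univariate normal likelihoods. The function names, the priors, and the class means are hypothetical values chosen for illustration; they are not from the lecture.

```python
import math

def normal_pdf(x, mean, sd):
    """Univariate normal density, used as a stand-in class-specific density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, likelihoods):
    """Bayes rule: P(w_j | x) = p(x | w_j) P(w_j) / p(x)."""
    joint = [lik(x) * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x) = sum_j p(x | w_j) P(w_j)
    return [j / evidence for j in joint]

# Hypothetical two-class setup (all numbers are assumptions).
priors = [0.7, 0.3]
likelihoods = [
    lambda x: normal_pdf(x, 0.0, 1.0),  # p(x | w1)
    lambda x: normal_pdf(x, 2.0, 1.0),  # p(x | w2)
]

# x = 1.0 is equally far from both means, so the likelihoods cancel
# and the posteriors equal the priors.
post = posteriors(1.0, priors, likelihoods)
decision = 1 + max(range(len(post)), key=post.__getitem__)  # 1-based class label
```

At a point where both likelihoods are equal, the observation carries no information and the decision falls back on the priors.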
6
Loss function
Loss function: probability statement → decision. Some classification mistakes can be more costly than others.
• The set of $c$ classes: $\{\omega_1, \ldots, \omega_c\}$
• The set of possible actions $\alpha_i$: deciding that an observation belongs to $\omega_i$
• $\lambda(\alpha_i \mid \omega_j)$: the loss when taking action $\alpha_i$ given that the observation belongs to hidden class $\omega_j$
7
Loss function
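The expected loss: given an observation with covariate vector $x$, the conditional risk is
$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)$.
Our final goal is to minimize the total risk over all $x$:
$R = \int R(\alpha(x) \mid x)\, p(x)\, dx$.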
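The conditional risk can be sketched in a few lines, assuming the standard form $R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x)$. The loss matrix and posterior values below are made up for illustration, and the helper names are hypothetical.

```python
def conditional_risk(loss, posterior):
    """R(a_i | x) = sum_j loss[i][j] * P(w_j | x), one value per action i."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posterior)) for row in loss]

def bayes_action(loss, posterior):
    """Return the index of the action with the smallest conditional risk."""
    risks = conditional_risk(loss, posterior)
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks

# Assumed loss matrix: loss[i][j] is the cost of deciding class i+1
# when the true class is j+1. Deciding class 1 for a true class 2 is costly.
loss = [[0.0, 10.0],
        [1.0, 0.0]]
posterior = [0.8, 0.2]  # assumed P(w1 | x), P(w2 | x)

action, risks = bayes_action(loss, posterior)
```

With these numbers the risk-minimizing action is to decide class 2 even though class 1 has the higher posterior, which is the point of using a loss function rather than the posterior alone.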
8
Loss function
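The zero-one loss: all errors are equally costly.
$\lambda(\alpha_i \mid \omega_j) = 0$ if $i = j$, and $1$ if $i \neq j$.
The conditional risk is
$R(\alpha_i \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$.
The risk corresponding to this loss function is the average probability of error.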
9
Loss function
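Let $\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)$ denote the loss for deciding class $i$ when the true class is $j$.
In minimizing the risk, we decide class one if $R(\alpha_1 \mid x) < R(\alpha_2 \mid x)$, that is, if
$(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid x) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid x)$.
Rearranging with the Bayes rule (and assuming $\lambda_{21} > \lambda_{11}$), we have
$\dfrac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \dfrac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \dfrac{P(\omega_2)}{P(\omega_1)}$.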
10
Loss function
Example
11
Loss function
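Likelihood ratio $p(x \mid \omega_1) / p(x \mid \omega_2)$, compared against two thresholds:
• Zero-one loss function
• If misclassifying $\omega_2$ is penalized more: the threshold is higher, so the region decided as $\omega_1$ shrinks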
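The effect of the loss on the likelihood-ratio threshold can be sketched as follows, assuming the two-class rule "decide class 1 if the likelihood ratio exceeds $\frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$". The loss values and priors are hypothetical.

```python
def lr_threshold(loss, prior1, prior2):
    """Threshold (l12 - l22) / (l21 - l11) * P(w2) / P(w1) on p(x|w1)/p(x|w2)."""
    l11, l12 = loss[0]
    l21, l22 = loss[1]
    if l21 <= l11:
        raise ValueError("the rule assumes an error costs more than a correct decision")
    return (l12 - l22) / (l21 - l11) * prior2 / prior1

def decide(likelihood_ratio, threshold):
    """Decide class 1 when the likelihood ratio exceeds the threshold, else class 2."""
    return 1 if likelihood_ratio > threshold else 2

# loss[i][j]: cost of deciding class i+1 when the true class is j+1 (assumed values).
zero_one = [[0.0, 1.0], [1.0, 0.0]]
costly_w2 = [[0.0, 5.0], [1.0, 0.0]]  # misclassifying a class 2 object costs 5

theta_a = lr_threshold(zero_one, 0.5, 0.5)
theta_b = lr_threshold(costly_w2, 0.5, 0.5)

# The same observation (likelihood ratio 3) is classified differently
# under the two loss functions.
label_a = decide(3.0, theta_a)
label_b = decide(3.0, theta_b)
```

Raising the cost of misclassifying class 2 raises the threshold, so stronger evidence is needed before deciding class 1.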
12
Discriminant function and decision surface
Features → discriminant functions $g_i(x)$, $i = 1, \ldots, c$.
Assign class $i$ if $g_i(x) > g_j(x)$ for all $j \neq i$.
The decision surface is defined by $g_i(x) = g_j(x)$.
13
Decision surface
The discriminant functions help partition the
feature space into c decision regions (not
necessarily contiguous). Our interest is to
estimate the boundaries between the regions.
14
Minimax
Minimizing the maximum possible loss. What
happens when the priors change?
15
Normal density
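The multivariate normal density in $d$ dimensions with mean $\mu$ and covariance matrix $\Sigma$:
$p(x) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right)$.
Reminder: the covariance matrix is symmetric and positive semidefinite.
Entropy is the measure of uncertainty. The normal distribution has the maximum entropy over all distributions with a given mean and variance.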
16
Reminder of some results for random vectors
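Let $A$ be a $k \times k$ square symmetric matrix. Then it has $k$ pairs of eigenvalues and eigenvectors, $(\lambda_i, e_i)$, $i = 1, \ldots, k$.
$A$ can be decomposed as $A = \sum_{i=1}^{k} \lambda_i e_i e_i^t = P \Lambda P^t$, where the columns of $P$ are the orthonormal eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues.
Positive-definite matrix: $x^t A x > 0$ for all $x \neq 0$, which holds exactly when all eigenvalues are positive.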
17
Normal density
Whitening transform
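The slide's equations did not survive extraction. The standard construction can be sketched as follows, under the assumption that $\Sigma$ is positive definite. The symbols $\Phi$ (matrix of orthonormal eigenvectors of $\Sigma$), $\Lambda$ (diagonal matrix of its eigenvalues), and $A_w$ are notation assumed here, not recovered from the slide.

```latex
\Sigma = \Phi \Lambda \Phi^{t}, \qquad A_w = \Phi \Lambda^{-1/2}

y = A_w^{t} x
\;\Longrightarrow\;
\operatorname{Cov}(y) = A_w^{t} \Sigma A_w
= \Lambda^{-1/2} \Phi^{t} \left( \Phi \Lambda \Phi^{t} \right) \Phi \Lambda^{-1/2}
= \Lambda^{-1/2} \Lambda \Lambda^{-1/2}
= I
```

After the transform the components of $y$ are uncorrelated with unit variance, which is why the transformed distribution has spherical contours.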
18
Normal density
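To make a minimum error rate classification (zero-one loss), we use the discriminant functions
$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$.
This is the log of the numerator in the Bayes formula. The posterior probability is proportional to that numerator, so the log posterior differs from $g_i(x)$ only by a term that does not depend on $i$. The log is used because we are only comparing the $g_i$'s, and log is monotone.
When a normal density is assumed, $p(x \mid \omega_i) \sim N(\mu_i, \Sigma_i)$, we have
$g_i(x) = -\tfrac{1}{2} (x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \tfrac{d}{2} \ln 2\pi - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$.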
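A minimal sketch of the normal-density discriminant, restricted to the univariate case ($d = 1$) to keep it dependency-free. It assumes $g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$; the class parameters and function names are hypothetical.

```python
import math

def g(x, mean, var, prior):
    """g_i(x) = ln p(x | w_i) + ln P(w_i) for a univariate normal class density."""
    return (-0.5 * (x - mean) ** 2 / var
            - 0.5 * math.log(2.0 * math.pi)
            - 0.5 * math.log(var)
            + math.log(prior))

def classify(x, classes):
    """Return the 0-based index of the class with the largest discriminant."""
    scores = [g(x, mean, var, prior) for (mean, var, prior) in classes]
    return max(range(len(scores)), key=scores.__getitem__), scores

# Assumed classes as (mean, variance, prior). With equal variances and
# equal priors the boundary sits halfway between the means, at x = 1.5.
classes = [(0.0, 1.0, 0.5), (3.0, 1.0, 0.5)]

label_a, _ = classify(1.0, classes)
label_b, _ = classify(2.0, classes)
```

Because only the ordering of the $g_i$ matters, the term $-\tfrac{1}{2}\ln 2\pi$ could be dropped without changing any decision; it is kept here so that `g` equals the log of the Bayes numerator.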
19
Discriminant function for normal density
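(1) $\Sigma_i = \sigma^2 I$
$g_i(x) = -\dfrac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)$
Expanding the square and dropping $x^t x$, which is the same for every class, gives a linear discriminant function:
$g_i(x) = w_i^t x + w_{i0}$, with $w_i = \dfrac{\mu_i}{\sigma^2}$ and $w_{i0} = -\dfrac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(\omega_i)$.
Note: the blue boxes on the original slide mark irrelevant terms, that is, terms identical for all classes.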
20
Discriminant function for normal density
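The decision surface is where $g_i(x) = g_j(x)$, which can be written as $w^t (x - x_0) = 0$ with
$w = \mu_i - \mu_j$ and $x_0 = \tfrac{1}{2} (\mu_i + \mu_j) - \dfrac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \dfrac{P(\omega_i)}{P(\omega_j)}\, (\mu_i - \mu_j)$.
With equal priors, $x_0$ is the midpoint between the two means. The decision surface is a hyperplane, perpendicular to the line between the means.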
21
Discriminant function for normal density
Linear machine: the decision surfaces are hyperplanes.
22
Discriminant function for normal density
With unequal prior probabilities, the decision boundary shifts toward the less likely mean.
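The shift of the boundary can be checked numerically. The sketch below assumes the case $\Sigma_i = \sigma^2 I$ and the standard expression $x_0 = \tfrac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln\frac{P(\omega_i)}{P(\omega_j)} (\mu_i - \mu_j)$ for the point where the hyperplane crosses the line between the means. The means, variance, and priors are made-up values.

```python
import math

def boundary_point(mu_i, mu_j, sigma2, prior_i, prior_j):
    """Point x0 on the line between the means through which the hyperplane passes."""
    diff = [a - b for a, b in zip(mu_i, mu_j)]
    sq_norm = sum(d * d for d in diff)
    shift = sigma2 / sq_norm * math.log(prior_i / prior_j)
    return [0.5 * (a + b) - shift * d for a, b, d in zip(mu_i, mu_j, diff)]

# Assumed class means in two dimensions.
mu_1, mu_2 = [0.0, 0.0], [4.0, 0.0]

# Equal priors: x0 is the midpoint of the two means.
x0_equal = boundary_point(mu_1, mu_2, 1.0, 0.5, 0.5)

# Class 1 much more likely: x0 moves toward mu_2, the less likely mean.
x0_unequal = boundary_point(mu_1, mu_2, 1.0, 0.9, 0.1)
```

Moving the boundary toward the less likely mean enlarges the region assigned to the more probable class.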
23
Discriminant function for normal density
(2) $\Sigma_i = \Sigma$ (all classes share the same covariance matrix)
24
Discriminant function for normal density
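Set $w_i = \Sigma^{-1} \mu_i$ and $w_{i0} = -\tfrac{1}{2} \mu_i^t \Sigma^{-1} \mu_i + \ln P(\omega_i)$, so that $g_i(x) = w_i^t x + w_{i0}$.
The decision boundary is $w^t (x - x_0) = 0$, with
$w = \Sigma^{-1} (\mu_i - \mu_j)$ and $x_0 = \tfrac{1}{2} (\mu_i + \mu_j) - \dfrac{\ln [P(\omega_i) / P(\omega_j)]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)}\, (\mu_i - \mu_j)$.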
25
Discriminant function for normal density
The hyperplane is generally not perpendicular to
the line between the means.
26
Discriminant function for normal density
(3) $\Sigma_i$ is arbitrary
Decision boundary: hyperquadrics (hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids)
27
Discriminant function for normal density
28
Discriminant function for normal density
29
Discriminant function for normal density
Extension to multi-class.
30
Discriminant function for discrete features
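Discrete features: $x = (x_1, x_2, \ldots, x_d)^t$, $x_i \in \{0, 1\}$
$p_i = P(x_i = 1 \mid \omega_1)$, $q_i = P(x_i = 1 \mid \omega_2)$
Assuming the features are conditionally independent given the class, the likelihood will be
$P(x \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i}$, $P(x \mid \omega_2) = \prod_{i=1}^{d} q_i^{x_i} (1 - q_i)^{1 - x_i}$.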
31
Discriminant function for discrete features
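The discriminant function:
$g(x) = \ln \dfrac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \dfrac{P(\omega_1)}{P(\omega_2)}$
The likelihood ratio:
$\dfrac{P(x \mid \omega_1)}{P(x \mid \omega_2)} = \prod_{i=1}^{d} \left( \dfrac{p_i}{q_i} \right)^{x_i} \left( \dfrac{1 - p_i}{1 - q_i} \right)^{1 - x_i}$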
32
Discriminant function for discrete features
So the decision surface is again a hyperplane.
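To see why the surface is a hyperplane, the log likelihood ratio for independent binary features can be rewritten as $g(x) = \sum_i w_i x_i + w_0$ with $w_i = \ln\frac{p_i (1 - q_i)}{q_i (1 - p_i)}$ and $w_0 = \sum_i \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}$. The sketch below assumes conditional independence of the features given the class and checks the linear form against a direct computation. All probabilities and function names are hypothetical.

```python
import math

def linear_weights(p, q, prior1, prior2):
    """Weights w_i and offset w_0 of the linear discriminant for binary features."""
    w = [math.log(pi * (1 - qi) / (qi * (1 - pi))) for pi, qi in zip(p, q)]
    w0 = (sum(math.log((1 - pi) / (1 - qi)) for pi, qi in zip(p, q))
          + math.log(prior1 / prior2))
    return w, w0

def g_linear(x, w, w0):
    """Linear form g(x) = sum_i w_i x_i + w_0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def g_direct(x, p, q, prior1, prior2):
    """ln P(x|w1) - ln P(x|w2) + ln[P(w1)/P(w2)] from the product-form likelihoods."""
    def loglik(probs):
        return sum(math.log(pr if xi == 1 else 1 - pr) for pr, xi in zip(probs, x))
    return loglik(p) - loglik(q) + math.log(prior1 / prior2)

# Assumed per-feature probabilities of x_i = 1 under each class.
p = [0.8, 0.6, 0.3]
q = [0.2, 0.5, 0.7]
w, w0 = linear_weights(p, q, 0.5, 0.5)

x = [1, 0, 1]
score = g_linear(x, w, w0)
label = 1 if score > 0 else 2  # decide class 1 when g(x) > 0
```

A feature with $p_i = q_i$ gets weight zero, so it has no influence on the decision.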
33
Optimality
Consider a two-class case. There are two ways to make a mistake in the classification:
• Misclassifying an observation from class 2 as class 1
• Misclassifying an observation from class 1 as class 2
Any classifier partitions the feature space into two regions, $R_1$ and $R_2$.
34
Optimality
35
Optimality
In the multi-class case, there are numerous ways to make mistakes. It is easier to calculate the probability of correct classification.
The Bayes classifier maximizes P(correct). No other partitioning can yield a lower probability of error.
The result does not depend on the form of the underlying distributions.
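The optimality claim can be illustrated numerically in a simple setting. The sketch below assumes two univariate normal classes with equal variance and a threshold rule "decide class 1 if $x < t$". It compares the error probability at the Bayes threshold with the error at a few other thresholds. All parameter values are made up, and the example is an illustration, not a proof.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def error_probability(t, mu1, mu2, sd, prior1, prior2):
    """P(error) for the rule 'decide class 1 if x < t', assuming mu1 < mu2."""
    miss_1 = 1.0 - normal_cdf((t - mu1) / sd)  # class 1 observation lands in R2
    miss_2 = normal_cdf((t - mu2) / sd)        # class 2 observation lands in R1
    return prior1 * miss_1 + prior2 * miss_2

def bayes_threshold(mu1, mu2, sd, prior1, prior2):
    """Point where p(x|w1) P(w1) = p(x|w2) P(w2) for equal-variance normals."""
    return 0.5 * (mu1 + mu2) + sd ** 2 / (mu2 - mu1) * math.log(prior1 / prior2)

# Assumed class parameters.
mu1, mu2, sd, prior1, prior2 = 0.0, 2.0, 1.0, 0.7, 0.3

t_star = bayes_threshold(mu1, mu2, sd, prior1, prior2)
bayes_error = error_probability(t_star, mu1, mu2, sd, prior1, prior2)

# Any other threshold gives a different partition into R1 and R2.
other_errors = [error_probability(t, mu1, mu2, sd, prior1, prior2)
                for t in (0.0, 0.5, 1.0, 2.0, 3.0)]
```

Every entry of `other_errors` exceeds `bayes_error` for these parameters, consistent with the statement that the Bayes partition minimizes the probability of error.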