Transcript and Presenter's Notes

Title: Theory


1
Dept. of Electrical and Computer Engineering
0909.402.02 / 0909.504.04
Lecture 2
Theory & Applications of Pattern Recognition
Probability Theory
  • Random variables
  • Probability mass function
  • Joint probability
  • Expected values
  • Mean / Variance / Covariance
  • Statistical independence
  • Correlation
  • Conditional probability
  • Bayes rule
  • Vector RVs
  • Normal distribution
  • Central limit theorem
  • Gaussian derivatives
  • Multivariate densities
2
Probability Theory
  • Discrete random variables
  • Probability mass function
  • Cumulative mass function
  • Expected value (average)
  • Variance and standard deviation
  • Pairs of random variables
  • Joint probability
  • Joint distribution
  • Statistical independence
  • Expectation for two variables
  • Covariance / covariance matrix
  • Correlation / correlation coefficient
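As a concrete illustration of the covariance matrix and correlation coefficient listed above, here is a minimal NumPy sketch; the two correlated variables, the 0.8 slope, and the sample size are invented for illustration and are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated random variables: y = 0.8*x + noise (illustrative model)
x = rng.normal(0.0, 1.0, size=1000)
y = 0.8 * x + rng.normal(0.0, 0.5, size=1000)

samples = np.stack([x, y])        # shape (2, N): one row per variable
cov = np.cov(samples)             # 2x2 covariance matrix
corr = np.corrcoef(samples)       # 2x2 correlation-coefficient matrix

print("covariance matrix:\n", cov)
print("correlation coefficient rho(x, y):", corr[0, 1])
```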

3
The Law of Total Probability
  • Conditional Probability
  • Bayes Rule
  • Prior probability
  • Likelihood
  • Evidence
  • Posterior probability
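A small numeric sketch tying together the quantities above (prior, likelihood, evidence, posterior); the two-class probability values are made up for illustration.

```python
import numpy as np

priors = np.array([2/3, 1/3])          # P(w1), P(w2): prior probabilities
likelihoods = np.array([0.05, 0.20])   # p(x|w1), p(x|w2) at some observed x (illustrative)

# Law of total probability: evidence p(x) = sum_i p(x|wi) P(wi)
evidence = np.sum(likelihoods * priors)

# Bayes rule: P(wi|x) = p(x|wi) P(wi) / p(x)
posteriors = likelihoods * priors / evidence
print(posteriors, posteriors.sum())    # posteriors sum to 1
```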

4
Vector Random Variables
Continuous Random Variables
  • Joint distribution for vector random variables
  • Bayes rule for vector r.v.
  • Expectation, mean vector, covariance matrix for
    vector r.v.
  • Probability density function (pdf)
  • Cumulative distribution function (cdf): the
    probability that the random variable X takes a
    value no greater than x
  • FX(x) = P(X ≤ x)
  • Distribution for the sum of r.v.

5
Gaussian Distribution
  • Central Limit Theorem
  • Gaussian (Normal) Distribution
  • Mean and variance
  • Standardizing Gaussian dist.
  • Gaussian derivatives and integrals
  • Error function
  • Using Gaussian Distribution
  • Using tables of Gaussian distribution
  • Using MATLAB
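In place of distribution tables (or MATLAB), a short SciPy/NumPy sketch of the standardization step and the error-function relation for the Gaussian; the mean, variance, and test value below are arbitrary.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import erf

mu, sigma = 5.0, 2.0      # illustrative Gaussian parameters
x = 8.0

# Standardize: z = (x - mu) / sigma, then use the standard normal CDF
z = (x - mu) / sigma
p_table = norm.cdf(z)     # what a standard-normal table would give

# Same value via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
p_erf = 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

print(p_table, p_erf)     # both ~0.9332 for z = 1.5
```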

6
Other Important Distributions
  • Chi-square
  • Poisson
  • Binomial
  • Beta
  • Gamma
  • Student's t-distribution
  • F-distribution

7
Multivariate Gaussian Distribution
  • Multivariate normal density function
  • Mahalanobis distance
  • Whitening Transform
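A brief NumPy sketch of the Mahalanobis distance and a whitening transform for a 2-D Gaussian; the mean, covariance, and test point are invented for illustration.

```python
import numpy as np

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([2.5, 1.0])

# Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu)
diff = x - mu
d2 = diff @ np.linalg.inv(Sigma) @ diff

# Whitening transform A_w = Phi Lambda^{-1/2}: eigenvectors scaled by 1/sqrt(eigenvalues)
eigvals, eigvecs = np.linalg.eigh(Sigma)
A_w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals))

# After y = A_w^T (x - mu), the transformed data has identity covariance
print("Mahalanobis^2:", d2)
print("whitened covariance:\n", A_w.T @ Sigma @ A_w)   # ~ identity matrix
```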

8
Bayes Decision Theory
  • Statistically best classifier
  • Based on quantifying the trade-offs between
    various classification decisions using a
    probabilistic approach
  • The theory assumes
  • Decision problem can be posed in probabilistic
    terms
  • All relevant probability values are known (in
    practice this is not true)

9
Class Conditional Probabilities
P(x | ω2): class-conditional probability for
salmon, i.e., the likelihood. Given that a salmon
has been observed, what is the probability that
this salmon's lightness is between 11 and 12?
ω1: sea bass, ω2: salmon
x: lightness
10
Definitions Bayes Decision Rule
  • State of nature
  • A priori probability (prior)
  • A posteriori probability (posterior)
  • Likelihood
  • Evidence

11
Posterior Probabilities
  • Bayes rule allows us to compute the posterior
    probability (difficult to determine) from prior
    probabilities, likelihood and the evidence
    (easier to determine).

Posterior probabilities for priors P(ω1) = 2/3
and P(ω2) = 1/3. For example, given that a
pattern is measured to have feature value x = 14,
the probability it is in category ω2 is roughly
0.08, and that it is in ω1 is roughly 0.92. At
every x, the posteriors sum to 1.0.
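The figure itself is not reproduced here, but the computation it illustrates can be sketched as follows: posteriors from priors 2/3 and 1/3 together with assumed Gaussian class-conditional densities. The means and variances below are invented, so the numbers will not match the 0.08 / 0.92 in the figure.

```python
import numpy as np
from scipy.stats import norm

priors = np.array([2/3, 1/3])               # P(w1), P(w2)

def posteriors(x):
    # Assumed class-conditional densities p(x|w1), p(x|w2); parameters are illustrative
    likelihoods = np.array([norm.pdf(x, loc=11.0, scale=1.0),
                            norm.pdf(x, loc=13.0, scale=2.0)])
    joint = likelihoods * priors
    return joint / joint.sum()               # P(w1|x), P(w2|x), summing to 1

print(posteriors(14.0))
```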
12
Bayes Decision Rule
  • Choose the class that has the larger posterior
    probability !!!

Choose ωi if P(ωi | x) > P(ωj | x) for all j ≠ i,
i, j = 1, 2, ..., c
P(error | x) = 1 - max_i P(ωi | x)
If there are multiple features, x = [x1, x2, ..., xd]:
Choose ωi if P(ωi | x) > P(ωj | x) for all j ≠ i
P(error | x) = 1 - max_i P(ωi | x)
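A minimal sketch of this decision rule and its conditional error for a vector of posteriors; the posterior values below are placeholders.

```python
import numpy as np

# P(w1|x), ..., P(wc|x) at some observed x (placeholder values)
post = np.array([0.10, 0.65, 0.25])

decision = np.argmax(post) + 1      # choose the wi with the largest posterior
p_error = 1.0 - np.max(post)        # P(error|x) = 1 - max_i P(wi|x)
print(f"choose w{decision}, P(error|x) = {p_error:.2f}")
```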
13
2.2 The Loss Function
  • Mathematical description of how costly each
    action (making a class decision) is. Are certain
    mistakes more costly than others?

{ω1, ω2, ..., ωc}: the set of states of nature
(classes)
{α1, α2, ..., αa}: the set of possible actions.
Note that a need not be the same as c, because we
may take more (or fewer) actions than the number
of classes; for example, not making a decision is
also an action.
{λ1, λ2, ..., λa}: losses associated with each
action
λ(αi | ωj): the loss function, the loss incurred
for taking action αi when the true state of
nature is in fact ωj.
R(αi | x): conditional risk, the expected loss for
taking action αi
Bayes decision takes the action that minimizes
the conditional risk!
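In its standard form the conditional risk expands as R(αi | x) = Σj λ(αi | ωj) P(ωj | x). A short sketch of that computation follows; the loss matrix, the inclusion of a "reject" action, and the posterior values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Loss matrix lam[i, j] = loss for taking action alpha_i when the true class is w_j.
# Rows: actions ("decide w1", "decide w2", "reject"); the values are illustrative.
lam = np.array([[0.0,  2.0],
                [1.0,  0.0],
                [0.25, 0.25]])

post = np.array([0.3, 0.7])          # P(w1|x), P(w2|x) at the observed x (placeholder)

# Conditional risk R(alpha_i|x) = sum_j lam(alpha_i|w_j) P(w_j|x)
risk = lam @ post
best = np.argmin(risk)               # Bayes decision: take the minimum-risk action
print("R(alpha_i|x) =", risk, "-> take action", best)
```

With these numbers the "reject" action has the lowest conditional risk, illustrating that the minimum-risk action need not be a class decision at all.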
14
Bayes Decision RuleUsing Conditional Risk
  • Compute conditional risk R(?i x) for each action
    taken
  • Select the action that has the minimum
    conditional risk. Let this be action k
  • The overall risk is then
  • This is the Bayes risk, the minimum possible risk
    that can be achieved by any classifier!

15
2.3 Minimum Error Rate Classification
  • If we associate taking action αi with selecting
    class ωi, and if all errors are equally costly, we
    obtain the zero-one loss
  • This loss function assigns no loss to a correct
    classification, and a loss of 1 to any
    misclassification. The risk corresponding to this
    loss function is then
  • which is precisely the average probability of
    error. Clearly, to minimize this risk, we need to
    choose the class that maximizes the posterior
    probability, hence the Bayes rule !!!
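A quick numeric check of this argument: under the zero-one loss, minimizing the conditional risk picks the same class as maximizing the posterior, because R(αi | x) = 1 - P(ωi | x). The posterior values below are placeholders.

```python
import numpy as np

c = 4
post = np.array([0.15, 0.45, 0.30, 0.10])    # placeholder posteriors P(wi|x)

zero_one = 1.0 - np.eye(c)                   # zero-one loss: 0 on the diagonal, 1 elsewhere
risk = zero_one @ post                       # R(alpha_i|x) = 1 - P(wi|x)

print(risk)                                  # equals 1 - post
print(np.argmin(risk) == np.argmax(post))    # True: the two rules give the same decision
```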

16
????
17
2.4 Discriminant-Based Classification
  • A discriminant is a function g(x) that
    discriminates between classes. This function
    assigns the input vector to a class according to
    its definition: choose class ωi if
    gi(x) > gj(x) for all j ≠ i
  • Bayes rule can be implemented in terms of
    discriminant functions

The discriminant functions generate c decision
regions, R1, ..., Rc, which are separated by
decision boundaries. Decision regions need NOT
be contiguous. The decision boundary between Ri
and Rj satisfies gi(x) = gj(x).
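One standard choice of discriminant that implements the Bayes rule is gi(x) = ln p(x | ωi) + ln P(ωi); the slides' exact form is not shown here, and the 1-D Gaussian class-conditionals and priors below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

priors = np.array([0.5, 0.3, 0.2])                 # P(wi), illustrative
means = np.array([0.0, 2.0, 4.0])                  # assumed class-conditional means
sigmas = np.array([1.0, 1.0, 1.5])                 # assumed class-conditional std devs

def g(x):
    # Discriminant g_i(x) = ln p(x|wi) + ln P(wi); monotonic in the posterior
    return norm.logpdf(x, loc=means, scale=sigmas) + np.log(priors)

x = 1.2
scores = g(x)
print("choose class", np.argmax(scores) + 1)       # decision regions follow from the argmax
```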
18
The Normal Density
  • Univariate normal density
  • Multivariate normal density

19
Discriminant Functions for the Normal Density
  • If the likelihood probabilities are normally
    distributed, then a number of simplifications can
    be made. In particular, the discriminant function
    can be written in this greatly simplified form (!)
  • (cf. the discriminant functions of Section 2.4.1)
  • There are three distinct cases
    that can occur

20
Case 1: Σi = σ²I
  • Features are statistically independent, and
    all features have the same variance. The
    distributions are spherical in d dimensions, the
    boundary is a generalized hyperplane of d - 1
    dimensions, and the features create equal-sized
    hyperspherical clusters. Examples of such
    hyperspherical clusters are

21
Case 1: Σi = σ²I (continued)
  • This case results in linear discriminants of the
    form

Note how unequal priors shift the decision boundary
away from the more likely mean !!!
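A sketch of the Case 1 linear discriminant in its standard form, gi(x) = wi·x + wi0 with wi = μi/σ² and wi0 = -μi·μi/(2σ²) + ln P(ωi); the means, shared variance, priors, and test point are made up.

```python
import numpy as np

sigma2 = 1.0                                  # shared variance sigma^2 (Case 1: Sigma_i = sigma^2 I)
means = np.array([[0.0, 0.0],
                  [3.0, 1.0]])                # mu_1, mu_2 (illustrative)
priors = np.array([0.7, 0.3])

# Linear discriminant: g_i(x) = w_i . x + w_i0
W = means / sigma2                            # w_i = mu_i / sigma^2
w0 = -np.sum(means**2, axis=1) / (2 * sigma2) + np.log(priors)

x = np.array([1.0, 0.5])
g = W @ x + w0
print("g(x) =", g, "-> choose class", np.argmax(g) + 1)
```

Changing the priors changes only w0, which is how unequal priors shift the boundary away from the more likely mean.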
22
Case 2: Σi = Σ
Covariance matrices are arbitrary, but equal to
each other for all classes. Features then form
hyperellipsoidal clusters of equal size and
shape. This also results in linear discriminant
functions whose decision boundaries are again
hyperplanes.
23
Case 3: Σi = arbitrary
All bets are off! In the two-class case, the
decision boundaries form hyperquadrics. The
discriminant functions are now, in general,
quadratic (not linear).
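For arbitrary Σi the discriminant takes the standard quadratic form gi(x) = -½(x - μi)ᵀΣi⁻¹(x - μi) - ½ ln|Σi| + ln P(ωi), dropping the constant term common to all classes; the parameters below are invented for illustration.

```python
import numpy as np

means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]       # mu_i (illustrative)
covs = [np.array([[1.0, 0.0], [0.0, 1.0]]),
        np.array([[2.0, 0.7], [0.7, 0.5]])]                # arbitrary Sigma_i
priors = [0.5, 0.5]

def g(x, mu, Sigma, prior):
    # Quadratic discriminant; the constant -(d/2) ln(2*pi) is dropped (same for every class)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

x = np.array([1.0, 1.5])
scores = [g(x, m, S, p) for m, S, p in zip(means, covs, priors)]
print(scores, "-> choose class", int(np.argmax(scores)) + 1)
```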
24
Case 3: Σi = arbitrary (continued)
For the multi-class case, the boundaries look
even more complicated, as the example decision
boundaries show.
25
Case 3: Σi = arbitrary (continued)
In 3-D
26
Error Probabilities
In a two-class case, there are two sources of
error: x falls in R1, yet the state of nature is
ω2, or vice versa.
xB: the optimal Bayes decision point; x*: a
non-optimal decision point

P(error)
27
Bayes Error
The Bayes error is obtained by averaging the
conditional error P(error | x) over all x,
P(error) = ∫ P(error | x) p(x) dx.
For the multi-class case it is easier to compute
the probability of being correct, since the
decision rule selects the maximum posterior at
every x (pp. 54, Eq. (71)).
28
Error Bounds
It is difficult, if at all possible, to
analytically compute the error probabilities,
particularly when the decision regions are not
contiguous. However, upper bounds for this error
can be obtained: the Chernoff bound and its
approximation, the Bhattacharyya bound, are two
such bounds that are often used. If the
distributions are Gaussian, these expressions are
relatively easy to compute; oftentimes even
non-Gaussian cases are treated as Gaussian.
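A sketch of the Bhattacharyya bound for two Gaussian class-conditionals in its standard form, P(error) ≤ sqrt(P(ω1)P(ω2)) exp(-k(1/2)), where k(1/2) is computed from the means and covariances; the parameter values below are illustrative.

```python
import numpy as np

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])     # illustrative parameters
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[1.5, -0.3], [-0.3, 0.8]])
P1, P2 = 0.6, 0.4

# Bhattacharyya exponent k(1/2) for Gaussian densities
Savg = 0.5 * (S1 + S2)
diff = mu2 - mu1
k = (0.125 * diff @ np.linalg.inv(Savg) @ diff
     + 0.5 * np.log(np.linalg.det(Savg)
                    / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))

bound = np.sqrt(P1 * P2) * np.exp(-k)     # P(error) <= sqrt(P1 P2) exp(-k(1/2))
print("Bhattacharyya bound on P(error):", bound)
```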
29
2.7 Error probabilities and Integrals
30
2.8 Error Bounds for Normal Densities
2.8.1 Chernoff Bound
2.8.2 Bhattacharyya Bound
31
2.8.3 Signal Detection Theory and Operating
Characteristics
Receiver Operating Characteristic
  • Discriminability

Can calculate the Bayes error rate
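For the equal-variance Gaussian signal-detection model, the discriminability is d' = |μ2 - μ1| / σ, and it can be recovered from a hit rate and a false-alarm rate through the inverse normal CDF; the operating-point values below are made up.

```python
from scipy.stats import norm

hit_rate = 0.85       # P(x > x* | w2): illustrative operating point
false_alarm = 0.20    # P(x > x* | w1)

# d' = |mu2 - mu1| / sigma = Phi^{-1}(hit rate) - Phi^{-1}(false-alarm rate)
d_prime = norm.ppf(hit_rate) - norm.ppf(false_alarm)
print("discriminability d' =", round(d_prime, 3))
```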
32
2.9 Bayes Decision Theory Discrete Features
Bayes formula
33
2.9.1 Independent Binary Features
Relevance of a yes answer for xi in determining
the classification
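For two classes with independent binary features, writing pi = P(xi = 1 | ω1) and qi = P(xi = 1 | ω2), the relevance (weight) of a yes answer for xi is wi = ln[pi(1 - qi) / (qi(1 - pi))] in the standard linear discriminant for this case; the probabilities below are invented for illustration.

```python
import numpy as np

p = np.array([0.8, 0.6, 0.3])       # P(x_i = 1 | w1), illustrative
q = np.array([0.2, 0.5, 0.4])       # P(x_i = 1 | w2), illustrative
P1, P2 = 0.5, 0.5

# Weight (relevance) of a "yes" answer for feature x_i
w = np.log(p * (1 - q) / (q * (1 - p)))
w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(P1 / P2)

x = np.array([1, 0, 1])             # an observed binary feature vector
g = w @ x + w0                      # decide w1 if g(x) > 0
print("weights:", w, " g(x) =", round(g, 3))
```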
34
2.10.1 Missing Features
Marginal distribution
→ Integrate the posterior probability over the
bad features
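A discretized sketch of this marginalization: the posterior given only the good feature xg is obtained by integrating the joint density over the bad (missing) feature xb. The Gaussian joint class-conditional models and priors below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assumed joint class-conditional densities p(x_g, x_b | w_i) and priors (illustrative)
priors = np.array([0.5, 0.5])
models = [multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]]),
          multivariate_normal(mean=[2.0, 1.0], cov=[[1.0, 0.0], [0.0, 2.0]])]

x_g = 1.0                                    # observed "good" feature
xb_grid = np.linspace(-10, 10, 2001)         # grid over the missing "bad" feature

# Marginal p(x_g | w_i) = integral of p(x_g, x_b | w_i) dx_b, approximated on the grid
pts = np.column_stack([np.full_like(xb_grid, x_g), xb_grid])
marginals = np.array([np.trapz(m.pdf(pts), xb_grid) for m in models])

post = marginals * priors / np.sum(marginals * priors)
print("P(w_i | x_g) =", post)
```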
35
2.10.2 Noisy Features
Assumption: if the true value xt were known, the
noisy observation xb would be independent of the
class ωi and the good features xg.
36
2.11 Bayesian Belief Networks
37
2.11 Bayesian Belief Networks
38
2.11 Bayesian Belief Networks
Useful when we seek to determine the probability
of some particular configuration of other
variables, given the evidence
39
2.11 Bayesian Belief Networks
40
2.12 Compound Bayesian Decision Theory and Context
Exploit statistical dependence to gain improved
performance → by using context
  • Compound decision problem
  • Sequential compound decision problem

41
2.12 Compound Bayesian Decision Theory and Context
The posterior probability of ω:
→ The optimal procedure is to minimize the
compound conditional risk.
If there is no loss for being correct and all
errors are equally costly, the procedure is to
compute P(ω | X) for all ω and select the ω for
which the posterior probability is maximum.
In practice this is an enormous task (there are
cⁿ possible values of ω), and the prior P(ω)
must capture the dependence among the states.