Transcript and Presenter's Notes

Title: Discriminative Naïve Bayesian Classifiers


1
Discriminative Naïve Bayesian Classifiers
  • Kaizhu Huang
  • Supervisors: Prof. Irwin King,
    Prof. Michael R. Lyu
  • Markers: Prof. Lai Wan Chan,
    Prof. Kin Hong Wong

2
Outline
  • Background
  • Classifiers
  • Discriminative classifiers: Support Vector
    Machines
  • Generative classifiers: Naïve Bayesian
    Classifiers
  • Motivation
  • Discriminative Naïve Bayesian Classifiers
  • Experiments
  • Discussions
  • Conclusion

3
Background
  • Discriminative Classifiers
  • Directly maximize a discriminative function or
    a posterior function
  • Example: Support Vector Machines

4
Background
  • Generative Classifiers
  • Model the class-conditional distribution P(x|C)
    for each class and then use Bayes' rule to
    construct the posterior classifier P(C|x).
  • Example: Naïve Bayesian Classifiers
  • Model the distribution of each class under the
    assumption that the features are mutually
    independent given the class label (a minimal
    sketch follows).
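To make this concrete, here is a minimal sketch of
a discrete Naïve Bayesian classifier (illustrative
code, not from the deck; all names are ours):

  import numpy as np

  def train_nb(X, y, n_values, alpha=1.0):
      """Estimate P(C) and P(x_i|C) by Laplace-smoothed counting."""
      classes = np.unique(y)
      priors = {c: np.mean(y == c) for c in classes}
      cond = {}  # cond[c][i]: distribution over values of feature i
      for c in classes:
          Xc = X[y == c]
          counts = [np.bincount(Xc[:, i], minlength=n_values[i]) + alpha
                    for i in range(X.shape[1])]
          cond[c] = [p / p.sum() for p in counts]
      return priors, cond

  def log_posterior(x, priors, cond):
      """log P(C|x) up to a constant: log P(C) + sum_i log P(x_i|C)."""
      return {c: np.log(priors[c]) +
                 sum(np.log(cond[c][i][v]) for i, v in enumerate(x))
              for c in priors}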

5
Background
  • Comparison

Figure: Example of missing information. From left
to right: original digit, 50% missing, 75%
missing, and occluded digit.
6
Background
  • Why are generative classifiers not as accurate
    as discriminative classifiers?
  1. It is incomplete for generative classifiers to
    approximate only the intra-class information.
  2. The inter-class discriminative information
    between classes is discarded.

Figure: Scheme for generative classifiers in
two-category classification tasks.
7
Background
  • Why are generative classifiers superior to
    discriminative classifiers in handling
    missing-information problems?
  • SVMs lack the ability to perform inference
    under uncertainty.
  • NB can conduct uncertainty inference under the
    estimated distribution (see the formula below).

A is the feature set; T is the subset of A that is
missing.
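In symbols (our reconstruction of the slide's
formula, using the slide's notation), NB infers the
class with the missing subset T marginalized out of
the estimated joint distribution:

  P(C \mid A \setminus T) \;\propto\; \sum_{T} P(C, A)
  \;=\; P(C) \prod_{i \in A \setminus T} P(x_i \mid C),

since, under the independence assumption, each
missing feature's factor \sum_{x_j} P(x_j \mid C)
sums to one and simply drops out.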
8
Motivation
  • It seems that a good classifier should combine
    the strategies of discriminative classifiers and
    generative classifiers.
  • Our work trains one of the generative
    classifiers, the Naïve Bayesian Classifier, in a
    discriminative way.

9
Roadmap of our work

(Roadmap diagram; label: Discriminative training)
10
How does our work relate to other work?
1. Jaakkola and Haussler, NIPS '98.
   Difference: our method performs the reverse
   process, from generative classifiers to
   discriminative classifiers.
2. Beaufays et al., ICASSP '99; Hastie et al.,
   JRSS '96.
   Difference: our method is designed for Bayesian
   classifiers.
11
How does our work relate to other work?
3. Optimization on the posterior distribution
   P(C|x): Logistic Regression (LR).
   Difference: LR encounters computational
   difficulties in handling missing-information
   problems; as the number of missing or unknown
   features grows, inference becomes intractable.
12
Roadmap of our work

13
Discriminative Naïve Bayesian Classifiers
Mathematical explanation of the Naïve Bayesian
Classifier: its maximum-likelihood formulation is
easily solved by the Lagrange-multiplier method
(a standard reconstruction follows).
Working scheme of the Naïve Bayesian Classifier.
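The equations on this slide are not preserved in
the transcript; the standard maximum-likelihood
view they describe is the constrained problem

  \max_{P} \;\; \sum_{c} \sum_{x} \hat{P}(x, c)\, \log P(x, c)
  \quad \text{s.t.} \quad \sum_{x_i} P(x_i \mid c) = 1 \;\; \forall\, i, c,

where \hat{P} is the empirical distribution.
Attaching a Lagrange multiplier to each
normalization constraint and setting the derivative
to zero gives P(x_i \mid c) = \hat{P}(x_i, c) / \hat{P}(c),
i.e., NB's usual counting estimates.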
14
Discriminative Naïve Bayesian Classifiers (DNB)
  • Optimization function of DNB

(Divergence term)
  • On one hand, minimizing this function tries to
    approximate the dataset as accurately as
    possible.
  • On the other hand, the optimization also tries
    to enlarge the divergence between classes.
  • Optimizing the joint distribution directly
    inherits NB's ability to handle
    missing-information problems (a reconstruction
    of the objective follows).
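The objective itself appears only as an image in
the original deck. A plausible reconstruction
consistent with the bullets above (the exact form
and weighting are our assumption) is, for two
classes with model distributions P_1, P_2 and
empirical distributions \hat{P}_1, \hat{P}_2:

  \min_{P_1, P_2} \;\; KL(\hat{P}_1 \,\|\, P_1) + KL(\hat{P}_2 \,\|\, P_2)
  \;-\; \lambda\, D(P_1, P_2), \qquad \lambda > 0,

where the two KL terms fit each class as in NB, and
D(\cdot, \cdot) is a divergence between the class
distributions that the minus sign encourages the
optimizer to enlarge.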

15
Discriminative Naïve Bayesian Classifiers (DNB)
  • Complete optimization problem

The two class distributions P_1 and P_2 cannot be
optimized separately as in NB, since they are now
coupled (interacting) variables.
16
Discriminative Naïve Bayesian Classifiers (DNB)
  • Solving the optimization problem
  • It is a nonlinear optimization problem under
    linear constraints; we use Rosen's
    gradient-projection method (a generic sketch
    follows).
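As an illustration (a generic sketch of one
projected-gradient step under linear equality
constraints, not the authors' Matlab code; all
names are ours):

  import numpy as np

  def rosen_step(p, grad, A, lr=0.1):
      """One Rosen gradient-projection step.

      p    : current parameters (concatenated probabilities)
      grad : gradient of the objective at p
      A    : active linear equality constraints, A @ p = b
      """
      # Projection matrix onto the constraint surface:
      # P = I - A^T (A A^T)^{-1} A
      proj = np.eye(len(p)) - A.T @ np.linalg.inv(A @ A.T) @ A
      d = -proj @ grad                    # feasible descent direction
      p_new = p + lr * d
      return np.clip(p_new, 1e-12, None)  # keep probabilities positive

For NB-style parameters, each row of A would pick
out one conditional distribution's entries so that
they continue to sum to one.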

17
Discriminative Naïve Bayesian Classifiers (DNB)
Gradient and Projection matrix
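The matrices on this slide are likewise images; in
Rosen's method the standard forms, with active
constraint matrix A and objective f, are

  P = I - A^{\top} (A A^{\top})^{-1} A, \qquad
  d = -P\, \nabla f(p),

so the search direction d stays inside the feasible
surface A p = b.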
18
Extension to Multi-category Classification
problems
19
Experimental results
  • Experimental Setup
  • Datasets
  • 5 benchmark datasets from the UCI machine
    learning repository
  • Experimental Environment
  • Platform: Windows 2000
  • Development tool: Matlab 6.5

20
Without information missing
  • Observations
  • DNB outperforms NB on every dataset
  • Compared with SVM, DNB wins on 2 datasets and
    loses on 3
  • SVM outperforms DNB on Segment and Satimages

21
With information missing
  • DNB uses the estimated distribution
    (marginalizing out the missing features) to
    conduct inference when information is missing
  • SVM sets the missing features to 0 (the default
    way to process unknown features in LIBSVM); see
    the sketch below
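To illustrate the two treatments (a sketch reusing
priors and cond from the NB sketch on slide 4;
None marks a missing value, and the zero-fill
mirrors LIBSVM's default):

  import numpy as np

  # SVM route: impute 0 for unknown features, then classify as usual.
  def zero_fill(x_partial):
      return [v if v is not None else 0 for v in x_partial]

  # NB/DNB route: drop the factors of missing features, which is
  # exactly marginalizing them out under the independence assumption.
  def log_posterior_missing(x_partial, priors, cond):
      return {c: np.log(priors[c]) +
                 sum(np.log(cond[c][i][v])
                     for i, v in enumerate(x_partial) if v is not None)
              for c in priors}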

22-24
With information missing (results charts)
25
With information missing
  • Observations
  • NB demonstrates a robust ability to handle
    missing-information problems.
  • DNB inherits NB's ability to handle missing
    information while achieving higher
    classification accuracy than NB.
  • SVM cannot easily deal with missing-information
    problems.
  • On small datasets, DNB demonstrates a superior
    ability compared with NB.

26
Discussion
  • Why does SVM outperform DNB when no information
    is missing?

SVM vs. DNB:
  • SVM directly minimizes the error rate, while
    DNB minimizes an intermediate term.
  • SVM assumes no model, while DNB assumes
    independence among the features: "all models
    are wrong, but some are useful."

27
Discussion
  • How does DNB relate to the Fisher Discriminant
    (FD)?

FD:
  • Using the difference of the class means as the
    divergence measure is less informative than
    using the full distributions.
  • FD is usually used as a dimensionality-reduction
    method rather than a classification method.

28
Discussion
  • Can DNB be extended to general Bayesian Network
    (BN) classifiers?
  • Finding optimal general Bayesian Network
    classifiers is an NP-complete problem.
  • A structure-learning problem is involved: direct
    application of DNB encounters difficulties,
    since the structure is not fixed even in
    restricted BNs.
  • A tree-like discriminative Bayesian Network
    classifier is ongoing work.

29
Discussion
Discriminative training of tree-like Bayesian
Network classifiers:
  • Approximate the empirical distribution of each
    class as closely as possible, while staying as
    far as possible from the distribution of the
    other dataset.
  • Two reference distributions are used in each
    iteration.
30
Future work
  • Extensive evaluations of discriminative
    Bayesian network classifiers, including
    Discriminative Naïve Bayesian Classifiers and
    tree-like Bayesian Network Classifiers.

31
Conclusion
  • We develop a novel model named Discriminative
    Naïve Bayesian Classifiers.
  • It outperforms Naïve Bayesian Classifiers when
    no information is missing.
  • It outperforms SVMs in handling
    missing-information problems.