Transcript and Presenter's Notes

Title: Review


1
Review
  • Rong Jin

2
Comparison of Different Classification Models
  • The goal of all classifiers
  • Predicting the class label y for an input x
  • Estimate p(y|x)

3
K Nearest Neighbor (kNN) Approach
4
K Nearest Neighbor Approach (KNN)
  • What is the appropriate size for the neighborhood
    N(x)?
  • Leave-one-out approach
  • Weighted K nearest neighbor
  • A neighbor is defined through a weight function
  • Estimate p(y|x) (see the sketch below)
  • How to estimate the appropriate value for σ²?

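A minimal numpy sketch of the basic estimate; the Euclidean neighborhood and the function name are illustrative choices, not fixed by the slides:

    import numpy as np

    def knn_posterior(X_train, y_train, x, k=5):
        """kNN estimate of p(y|x): the fraction of each label among the
        k nearest training examples of x (Euclidean neighborhood N(x))."""
        dist = np.linalg.norm(X_train - x, axis=1)
        neighbors = y_train[np.argsort(dist)[:k]]
        return {c: np.mean(neighbors == c) for c in np.unique(y_train)}
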
7
Weighted K Nearest Neighbor
  • Leave-one-out maximum likelihood
  • Estimate the leave-one-out probability
  • Leave-one-out likelihood of the training data
  • Search for the optimal σ² by maximizing the
    leave-one-out likelihood (sketched below)

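A sketch of the leave-one-out search, assuming the Gaussian weight function w(xi, xj) = exp(-||xi - xj||² / (2σ²)) implied by the slides; the candidate grid is illustrative:

    import numpy as np

    def loo_log_likelihood(X, y, sigma2):
        """Leave-one-out log-likelihood of the training data under weighted
        kNN: p(y|x_i) is estimated from all other examples, each weighted
        by exp(-||x_i - x_j||^2 / (2 sigma2))."""
        ll = 0.0
        for i in range(len(X)):
            w = np.exp(-((X - X[i]) ** 2).sum(axis=1) / (2 * sigma2))
            w[i] = 0.0                          # leave example i out
            ll += np.log(w[y == y[i]].sum() / w.sum() + 1e-12)
        return ll

    # Search for the optimal sigma^2 over a (hypothetical) grid:
    # best = max([0.01, 0.1, 1.0, 10.0],
    #            key=lambda s2: loo_log_likelihood(X, y, s2))
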
9
Gaussian Generative Model
  • p(y|x) ∝ p(x|y) p(y): posterior ∝ likelihood ×
    prior
  • Estimate p(x|y) and p(y)
  • Allocate a separate set of parameters for each
    class
  • θ = {θ1, θ2, …, θc}
  • p(x|y; θ) = p(x|θy)
  • Maximum likelihood estimation

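A compact sketch of the two estimation steps, using scipy's Gaussian density; the function names are illustrative:

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_gaussian_generative(X, y, classes):
        """MLE: a separate (mu_c, Sigma_c) plus prior p(y=c) per class,
        each estimated from that class's data only."""
        return {c: (X[y == c].mean(axis=0),
                    np.cov(X[y == c].T, bias=True),   # MLE covariance
                    np.mean(y == c))
                for c in classes}

    def posterior(params, x):
        """p(y|x) via Bayes' rule: posterior is proportional to
        likelihood times prior, normalized over the classes."""
        s = {c: multivariate_normal.pdf(x, mean=m, cov=S) * prior
             for c, (m, S, prior) in params.items()}
        z = sum(s.values())
        return {c: v / z for c, v in s.items()}
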
11
Gaussian Generative Model
  • Difficult to estimate p(x|y) if x is of high
    dimensionality
  • Naïve Bayes
  • Essentially a linear model
  • How to make a Gaussian generative model
    discriminative?
  • (μm, Σm) of each class are estimated only from the
    data belonging to that class → lack of
    discriminative power

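To see why the model is essentially linear, expand the log-odds for two classes with a shared covariance Σ (Naïve Bayes is the diagonal-Σ special case); this is the standard derivation, not taken verbatim from the slides:

    \log\frac{p(y=1\mid x)}{p(y=2\mid x)}
      = \log\frac{p(x\mid\mu_1,\Sigma)\,p(y=1)}{p(x\mid\mu_2,\Sigma)\,p(y=2)}
      = \underbrace{(\mu_1-\mu_2)^\top\Sigma^{-1}}_{w^\top}\,x
        + \underbrace{\tfrac12\bigl(\mu_2^\top\Sigma^{-1}\mu_2
          - \mu_1^\top\Sigma^{-1}\mu_1\bigr)
          + \log\tfrac{p(y=1)}{p(y=2)}}_{b}

The quadratic terms in x cancel because Σ is shared, leaving a linear decision boundary w·x + b = 0.
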
12
Gaussian Generative Model
  • Maximum likelihood estimation

13
Gaussian Generative Model
  • Bound optimization algorithm

14
Gaussian Generative Model
We have decomposed the interaction of parameters
between different classes
Question: how to handle x with multiple features?
15
Logistic Regression Model
  • A linear decision boundary w·x + b
  • A probabilistic model p(y|x)
  • Maximum likelihood approach for estimating
    weights w and threshold b

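A minimal gradient-ascent sketch of the maximum likelihood fit (the learning rate, iteration count, and optional lam penalty are assumed hyperparameters):

    import numpy as np

    def logreg_fit(X, y, lr=0.1, lam=0.0, n_iter=500):
        """Maximize sum_i log p(y_i|x_i) with p(y=1|x) = sigmoid(w.x + b).
        Labels y in {0, 1}; lam is an optional L2 regularization weight."""
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # p(y=1|x)
            g = y - p                     # gradient of log-lik wrt score
            w += lr * (X.T @ g / len(y) - lam * w)
            b += lr * g.mean()
        return w, b
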
16
Logistic Regression Model
  • Overfitting issue
  • Example: text classification
  • Words that appear in only one document will be
    assigned an infinitely large weight
  • Solution: regularization (see the usage note below)

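With the lam parameter of the logreg_fit sketch above set positive, the -lam * w term shrinks the weights and keeps rare words from receiving unbounded weight; a hypothetical call:

    w, b = logreg_fit(X, y, lr=0.1, lam=1e-2, n_iter=500)  # L2-regularized
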
17
Non-linear Logistic Regression Model
  • Kernelize the logistic regression model

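A sketch of the kernelized model, replacing w·x with the expansion f(x) = Σi αi K(xi, x); the Gram matrix K and the hyperparameters are assumed inputs:

    import numpy as np

    def kernel_logreg_fit(K, y, lr=0.1, lam=1e-3, n_iter=500):
        """Kernel logistic regression on the n x n training Gram matrix K,
        learning one coefficient alpha_i per training example (y in {0,1})."""
        alpha = np.zeros(len(y))
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-(K @ alpha)))
            alpha += lr * (K @ (y - p) / len(y) - lam * (K @ alpha))
        return alpha  # predict x via sigmoid(sum_i alpha_i K(x_i, x))
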
18
Non-linear Logistic Regression Model
  • Hierarchical Mixture of Experts Model
  • Group linear classifiers into a tree structure

Products of gating and expert models generate
nonlinearity in the prediction function
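
A two-expert sketch of the idea, with all weight vectors assumed given (EM training is not shown): the gate mixes two linear experts, and the product of gating and expert probabilities is what makes the prediction nonlinear in x:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def hme_predict(x, gate_w, expert_w1, expert_w2):
        """p(y=1|x) = g(x) p1(y=1|x) + (1 - g(x)) p2(y=1|x), where the
        gate g and the experts p1, p2 are all linear (logistic) models."""
        g = sigmoid(gate_w @ x)
        return g * sigmoid(expert_w1 @ x) + (1 - g) * sigmoid(expert_w2 @ x)
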
19
Non-linear Logistic Regression Model
  • It may be too rough to assume that all data
    points can be fitted by a single linear model
  • But it is usually appropriate to assume a locally
    linear model
  • KNN can be viewed as a localized model without
    any parameters
  • Can we extend the KNN approach by introducing a
    localized linear model?

20
Localized Logistic Regression Model
  • Similar to weighted KNN
  • Weight each training example by its closeness to
    the test example
  • Build a logistic regression model using the
    weighted examples

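A sketch combining the two steps, assuming the same Gaussian weight function as in weighted kNN:

    import numpy as np

    def localized_logreg(X, y, x_test, sigma2=1.0, lr=0.1, n_iter=300):
        """Weight each training example by
        exp(-||x_i - x_test||^2 / (2 sigma2)), fit a weighted logistic
        regression, and predict p(y=1|x_test)."""
        a = np.exp(-((X - x_test) ** 2).sum(axis=1) / (2 * sigma2))
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
            g = a * (y - p)              # example weights scale the gradient
            w += lr * X.T @ g / len(y)
            b += lr * g.mean()
        return 1.0 / (1.0 + np.exp(-(x_test @ w + b)))
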
22
Conditional Exponential Model
  • An extension of the logistic regression model to
    the multi-class case
  • A different set of weights w_y and threshold b_y
    for each class y
  • Translation invariance

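A sketch of the prediction rule; the max-subtraction line is exactly the translation invariance noted above, reused for numerical stability:

    import numpy as np

    def softmax_posterior(W, b, x):
        """Conditional exponential model: p(y|x) = exp(w_y.x + b_y) / Z(x),
        with one weight vector (row of W) and threshold per class."""
        s = W @ x + b
        s -= s.max()          # translation invariance: p(y|x) is unchanged
        e = np.exp(s)
        return e / e.sum()
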
23
Maximum Entropy Model
  • Finding the simplest model that is consistent
    with the data
  • Iterative scaling methods for optimization

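A minimal Generalized Iterative Scaling sketch for indicator features f_{j,c}(x, y) = x_j · 1[y = c] over binary inputs; classically a slack feature pads every example so feature counts sum exactly to C, which is omitted here for brevity:

    import numpy as np

    def gis(X, y, n_classes, n_iter=100):
        """Maxent / conditional exponential model fit by GIS:
        lambda_j <- lambda_j + (1/C) log(E_emp[f_j] / E_model[f_j])."""
        n, d = X.shape
        C = X.sum(axis=1).max()                 # GIS scaling constant
        lam = np.zeros((n_classes, d))
        emp = np.stack([X[y == c].sum(axis=0) / n for c in range(n_classes)])
        for _ in range(n_iter):
            s = X @ lam.T
            s -= s.max(axis=1, keepdims=True)
            p = np.exp(s)
            p /= p.sum(axis=1, keepdims=True)   # p(y=c | x_i)
            model = (p.T @ X) / n               # model feature expectations
            lam += np.log((emp + 1e-12) / (model + 1e-12)) / C
        return lam
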
24
Support Vector Machine
  • Classification margin
  • Maximum margin principle
  • Separate data far away from the decision boundary
  • Two objectives
  • Minimize the classification error over training
    data
  • Maximize the classification margin
  • Support vectors
  • Only support vectors have an impact on the
    location of the decision boundary

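A Pegasos-style stochastic subgradient sketch of the soft-margin linear SVM; the algorithm choice and hyperparameters are assumptions, since the slides only state the two objectives:

    import numpy as np

    def linear_svm(X, y, lam=0.01, n_epochs=50):
        """min_w  lam/2 ||w||^2 + (1/n) sum_i max(0, 1 - y_i w.x_i),
        with labels y in {+1, -1}; a bias can be absorbed by appending
        a constant feature to X."""
        w, t = np.zeros(X.shape[1]), 0
        for _ in range(n_epochs):
            for i in np.random.permutation(len(y)):
                t += 1
                eta = 1.0 / (lam * t)
                violated = y[i] * (w @ X[i]) < 1  # inside the margin?
                w *= 1 - eta * lam                # shrink: margin term
                if violated:
                    w += eta * y[i] * X[i]        # hinge subgradient step
        return w
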
26
Support Vector Machine
  • Separable case
  • Noisy case

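The standard quadratic programs behind the two bullets (the equations on the original slides are not preserved in this transcript):

    Separable case:
    \min_{w,b}\ \tfrac12\lVert w\rVert^2
      \quad\text{s.t.}\quad y_i\,(w\cdot x_i + b)\ \ge\ 1\ \ \forall i

    Noisy case (slack variables trade margin against training error):
    \min_{w,b,\xi}\ \tfrac12\lVert w\rVert^2 + C\sum_i \xi_i
      \quad\text{s.t.}\quad y_i\,(w\cdot x_i + b)\ \ge\ 1-\xi_i,\ \ \xi_i\ge 0
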
28
Logistic Regression Model vs. Support Vector
Machine
  • Logistic regression model
  • Support vector machine

29
Logistic Regression Model vs. Support Vector
Machine
Logistic regression differs from the support vector
machine only in the loss function
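
The two loss functions, written for a decision value f(x) = w·x + b and label y ∈ {+1, −1} (standard forms, stated here since the slide's plot is not preserved):

    \ell_{\text{logistic}}(y, f(x)) = \log\bigl(1 + e^{-y\,f(x)}\bigr),
    \qquad
    \ell_{\text{hinge}}(y, f(x)) = \max\bigl(0,\ 1 - y\,f(x)\bigr)

Both decrease with the margin y f(x); the hinge is exactly zero beyond margin 1, which is why only support vectors affect the SVM boundary.
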
30
Kernel Tricks
  • Introducing nonlinearity into the discriminative
    models
  • Diffusion kernel
  • A graph Laplacian L for local similarity
  • Diffusion kernel
  • Propagates local similarity information into a
    global similarity

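A sketch following the Kondor-Lafferty construction, with the adjacency matrix W and diffusion parameter beta as assumed inputs:

    import numpy as np
    from scipy.linalg import expm

    def diffusion_kernel(W, beta=0.5):
        """K = expm(-beta * L) with graph Laplacian L = D - W: the matrix
        exponential sums over all paths of the graph, propagating the
        local similarities in W into a global similarity."""
        L = np.diag(W.sum(axis=1)) - W
        return expm(-beta * L)
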
31
Fisher Kernel
  • Derive a kernel function from a generative model
  • Key idea
  • Map a point x in original input space into the
    model space
  • The similarity of two data points is measured in
    the model space

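A sketch for the simplest generative model, a 1-D Gaussian with theta = (mu, var); estimating the Fisher information empirically from the scores is an assumption of this sketch:

    import numpy as np

    def fisher_kernel(X, mu, var):
        """Map each x to its Fisher score U_x = grad_theta log p(x|theta)
        (the model space), then measure similarity there:
        K(x, x') = U_x^T F^{-1} U_{x'}."""
        U = np.column_stack([(X - mu) / var,                         # d/d mu
                             ((X - mu) ** 2 - var) / (2 * var ** 2)])  # d/d var
        F = U.T @ U / len(X)                  # empirical Fisher information
        return U @ np.linalg.solve(F, U.T)    # Gram matrix over X
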
32
Kernel Methods in Generative Model
  • Usually, kernels can be introduced into a
    generative model through a Gaussian process
  • Define a kernelized covariance matrix
  • Positive semi-definite, similar to Mercer's
    condition

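A sketch with an RBF kernel standing in for the kernelized covariance; any Mercer kernel yields a positive semi-definite matrix and hence a valid Gaussian process covariance:

    import numpy as np

    def rbf_covariance(X, length_scale=1.0):
        """K[i, j] = exp(-||x_i - x_j||^2 / (2 l^2)) as a GP covariance."""
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq / (2 * length_scale ** 2))
        assert np.linalg.eigvalsh(K).min() > -1e-8   # numerically PSD
        return K
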
33
Multi-class SVM
  • SVMs can only handle two-class outputs
  • One-against-all
  • Learn N SVMs
  • SVM 1 learns "Output == 1" vs "Output != 1"
  • SVM 2 learns "Output == 2" vs "Output != 2"
  • SVM N learns "Output == N" vs "Output != N"

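A one-against-all sketch; scikit-learn's LinearSVC is used here purely for brevity and is not part of the original slides:

    import numpy as np
    from sklearn.svm import LinearSVC

    def one_vs_all_fit(X, y, n_classes):
        """SVM c learns 'Output == c' vs 'Output != c'."""
        return [LinearSVC().fit(X, (y == c).astype(int))
                for c in range(n_classes)]

    def one_vs_all_predict(models, X):
        """Pick the class whose SVM reports the largest decision value."""
        scores = np.column_stack([m.decision_function(X) for m in models])
        return scores.argmax(axis=1)
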
34
Error Correcting Output Code (ECOC)
  • Encode each class into a bit vector

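A decoding sketch with a hypothetical 3-class, 4-bit codebook (the codeword table on the original slide did not survive the transcript):

    import numpy as np

    def ecoc_decode(bits, codebook):
        """Each class is a row of `codebook`; one binary classifier is
        trained per bit column. A test point's predicted bit vector is
        decoded to the nearest codeword in Hamming distance."""
        return (codebook != bits[None, :]).sum(axis=1).argmin()

    codebook = np.array([[1, 1, 0, 0],     # class 0
                         [0, 1, 1, 0],     # class 1
                         [0, 0, 1, 1]])    # class 2
    print(ecoc_decode(np.array([1, 1, 0, 0]), codebook))  # -> 0
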
35
Ordinal Regression
  • A special class of multi-class classification
    problem
  • There is a natural ordinal relationship among the
    multiple classes
  • Maximum margin principle
  • The computation of margin involves multiple
    classes

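A sketch of the prediction rule only (the max-margin training of w and of the ordered thresholds is not shown): with one shared direction w and thresholds b_1 < … < b_{K-1}, the rank is the number of thresholds the score exceeds:

    import numpy as np

    def ordinal_predict(w, thresholds, X):
        """Predicted rank in {0, ..., K-1}: how many of the ordered
        thresholds the projection w.x passes."""
        scores = X @ w
        return (scores[:, None] > np.asarray(thresholds)[None, :]).sum(axis=1)
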
36
Ordinal Regression
37
Decision Tree
From slides of Andrew Moore
38
Decision Tree
  • A greedy approach for generating a decision tree
  • Choose the most informative feature
  • Using the mutual information measure
  • Split data set according to the values of the
    selected feature
  • Recurse until each data item is classified
    correctly
  • Attributes with real values
  • Quantize the real value into a discrete one

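A sketch of the feature-selection step; the greedy builder picks the feature with the largest gain, splits on it, and recurses on each subset:

    import numpy as np

    def entropy(y):
        """Shannon entropy of a label vector."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    def information_gain(x, y):
        """Mutual information between a discrete feature x and labels y:
        the entropy reduction from splitting the data on x."""
        gain = entropy(y)
        for v in np.unique(x):
            mask = x == v
            gain -= mask.mean() * entropy(y[mask])
        return gain

    # best_feature = max(range(X.shape[1]),
    #                    key=lambda j: information_gain(X[:, j], y))
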
39
Decision Tree
  • The overfitting problem
  • Tree pruning
  • Reduced error pruning
  • Rule post-pruning

41
Generalized Decision Tree
Each node is a linear classifier
(figure: a decision tree using classifiers for data
partition, compared with a decision tree with simple
data partition)