Generalized Linear Discriminant Analysis

1
Generalized Linear Discriminant Analysis
Hao Wu
Center for Automation Research
Department of Electrical and Computer Engineering
University of Maryland, College Park
ENEE698A
2
Outline
  • Recast LDA as a linear regression problem
  • Flexible discriminant analysis
  • Penalized discriminant analysis
  • Mixture discriminant analysis
  • Examples and conclusions

3
Linear Discriminant Analysis
  • Virtues of LDA
      • Simple prototype method for multiclass classification
      • Linear decision boundaries lead to a simple decision
        rule, and often produce the best classification results
      • Can provide natural low-dimensional views of the data
  • Limitations of LDA
      • Linear decision boundaries are often not adequate
        to separate the classes
      • A single prototype per class is often insufficient
      • Too many correlated predictors can lead to
        noisy coefficients

4
Generalize LDA
  • Recast LDA as a linear regression problem
      • Many techniques exist for generalizing linear
        regression to more flexible, nonparametric forms
        of regression. This in turn leads to more
        flexible forms of discriminant analysis, called
        flexible discriminant analysis (FDA)
  • Penalized discriminant analysis
      • With too many predictors, we want to fit the LDA
        model but penalize its coefficients to be smooth
        or otherwise coherent. In addition, the expanded
        basis set of FDA is often so large that
        regularization is required.
      • Both can be achieved via suitably regularized
        regression in the context of the FDA model
  • Mixture discriminant analysis
      • Model each class by a mixture of two or more
        Gaussians with different centroids. This allows
        for more complex decision boundaries.

5
LDA by optimal scoring
Suppose $\theta: \{1, \dots, J\} \to \mathbb{R}$ is a function
that assigns scores to the classes, such that the
transformed class labels are optimally predicted
by linear regression on X. This produces a
one-dimensional separation between the classes.
More generally, we can find K sets of independent scorings
for the class labels, $\theta_1, \dots, \theta_K$, and K
corresponding linear maps $\eta_k(X) = X^T \beta_k$,
chosen to be optimal for multiple regression in $\mathbb{R}^p$.
6
LDA by optimal scoring (contd)
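  • Linear regression for classification
      • Fit the linear model: $\hat{B} = (X^T X)^{-1} X^T Y$
      • Compute the prediction: $\hat{f}(x) = \hat{B}^T x$
      • Decide the class: $\hat{G}(x) = \arg\max_j \hat{f}_j(x)$

Here Y is the N × J class-indicator matrix (J classes) and
each x has p + 1 features (a constant 1 plus p predictors).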
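
The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the presenter's code: the function names, the synthetic three-class data, and the random seed are all assumptions made for the example.

```python
import numpy as np

def fit_indicator_regression(X, g, n_classes):
    """Least-squares fit of the N x J indicator matrix Y on [1, X]."""
    N = X.shape[0]
    X1 = np.hstack([np.ones((N, 1)), X])          # N x (p + 1), constant first
    Y = np.eye(n_classes)[g]                      # N x J indicator responses
    B, *_ = np.linalg.lstsq(X1, Y, rcond=None)    # (p + 1) x J coefficients
    return B

def predict_class(B, X):
    """Compute the J fitted values per row and pick the largest."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X1 @ B, axis=1)

# Synthetic data (assumed for illustration): three well-separated classes.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
g = np.repeat(np.arange(3), 50)
X = centers[g] + rng.normal(scale=0.5, size=(150, 2))

B = fit_indicator_regression(X, g, n_classes=3)
accuracy = np.mean(predict_class(B, X) == g)
```

Because the indicator columns sum to one and the model contains a constant, the fitted values for any x also sum to one, although they are not guaranteed to lie in [0, 1].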
7
LDA by optimal scoring (contd)
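  • By optimal scoring
  • Some notation
      • $\Theta$ (J × K): matrix of K score vectors for the J classes
      • $\Theta^* = Y \Theta$ (N × K): matrix of K score vectors
        for the N training samples
      • $P_X = X (X^T X)^{-1} X^T$: regression projection matrix
  • Then the average squared residual becomes
    $\mathrm{ASR}(\Theta) = \frac{1}{N} \mathrm{tr}\left( \Theta^{*T} (I - P_X) \Theta^* \right) = \frac{1}{N} \mathrm{tr}\left( \Theta^T Y^T (I - P_X) Y \Theta \right)$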

8
LDA by optimal scoring (contd)
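  • Minimize $\mathrm{ASR}(\Theta)$ subject to the normalization
    $\Theta^{*T} \Theta^* / N = \Theta^T D_\pi \Theta = I_K$,
    where $D_\pi = Y^T Y / N$
  • This amounts to finding the eigenvectors $\Theta$ of
    $Y^T P_X Y$ with the K largest eigenvalues (excluding the
    trivial constant score), normalized so that
    $\Theta^T D_\pi \Theta = I_K$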
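
As a rough numerical check of the eigenproblem above, the sketch below computes the scores for a small synthetic data set. The data, the seed, and the whitening trick used to impose the normalization with `np.linalg.eigh` are assumptions of this example; the symbols follow the usual optimal-scoring notation (Y the indicator matrix, P_X the projection onto the columns of X including a constant, D_pi = Y^T Y / N).

```python
import numpy as np

# Synthetic data (assumed for illustration): J = 3 overlapping classes in 2-D.
rng = np.random.default_rng(1)
J, n_per = 3, 40
g = np.repeat(np.arange(J), n_per)
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
X = centers[g] + rng.normal(size=(J * n_per, 2))
N = X.shape[0]

Y = np.eye(J)[g]                                     # N x J indicator matrix
X1 = np.hstack([np.ones((N, 1)), X])                 # p + 1 features
Y_hat = X1 @ np.linalg.lstsq(X1, Y, rcond=None)[0]   # P_X Y
D_pi = Y.T @ Y / N                                   # diagonal class proportions
M = Y.T @ Y_hat / N                                  # Y^T P_X Y / N
M = (M + M.T) / 2                                    # remove rounding asymmetry

# Solve M theta = alpha^2 D_pi theta by whitening with D_pi^{-1/2}.
d = 1.0 / np.sqrt(np.diag(D_pi))
evals, V = np.linalg.eigh(d[:, None] * M * d[None, :])
order = np.argsort(evals)[::-1]                      # largest eigenvalue first
evals = evals[order]
Theta = d[:, None] * V[:, order]                     # Theta^T D_pi Theta = I

# The first column is the trivial constant score (eigenvalue 1); skip it.
K = J - 1
Theta_K = Theta[:, 1:1 + K]
Theta_star = Y @ Theta_K
asr = np.trace(Theta_star.T @ (Theta_star - Y_hat @ Theta_K)) / N
```

Under the normalization, the average squared residual equals K minus the sum of the K retained eigenvalues, which is why minimizing it picks out the largest nontrivial eigenvalues.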

9
Summary for LDA by optimal scoring
10
An important fact
  • A well-known fact
      • LDA can be performed by a sequence of linear
        regressions, followed by classification to the
        closest centroid in the space of fits.
  • In the optimal scoring method
      • The final coefficient matrix B is, up to a
        diagonal scale matrix, the same as the
        discriminant analysis coefficient matrix.
  • Classification
      • Assign an observation x to the class j that
        minimizes the distance between its fitted value
        $\hat{\eta}(x)$ and the class centroid $\bar{\eta}^j$,
        measured in the suitably scaled space of fits.
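
The equivalence can be checked numerically. The sketch below classifies by the closest centroid in the space of optimally scored fits and compares the result with a direct LDA rule. Several choices are assumptions of this example rather than content from the slides: the classes are balanced and the priors are ignored in both rules, the predictors are centred instead of carrying a constant column (so the trivial score has eigenvalue 0), and each fit coordinate is weighted by 1 / (alpha^2 (1 - alpha^2)), where alpha^2 is the corresponding eigenvalue.

```python
import numpy as np

def fit_os_lda(X, g, n_classes):
    """LDA via optimal scoring; assumes roughly equal class priors."""
    N = X.shape[0]
    Y = np.eye(n_classes)[g]                          # N x J indicator matrix
    mean = X.mean(axis=0)
    Xc = X - mean                                     # centring replaces the intercept
    B_raw, *_ = np.linalg.lstsq(Xc, Y, rcond=None)    # regress each indicator on X
    M = Y.T @ (Xc @ B_raw) / N                        # Y^T P_X Y / N
    d = 1.0 / np.sqrt(Y.mean(axis=0))                 # diagonal of D_pi^{-1/2}
    evals, V = np.linalg.eigh(d[:, None] * (M + M.T) / 2 * d[None, :])
    keep = np.argsort(evals)[::-1][: n_classes - 1]   # drop the trivial score
    alpha2 = evals[keep]
    B = B_raw @ (d[:, None] * V[:, keep])             # coefficients of scored fits
    fits = Xc @ B
    centroids = np.array([fits[g == j].mean(axis=0) for j in range(n_classes)])
    weights = 1.0 / (alpha2 * (1.0 - alpha2))         # diagonal rescaling of fits
    return mean, B, centroids, weights

def predict_os_lda(model, X):
    """Assign each row to the closest centroid in the weighted space of fits."""
    mean, B, centroids, weights = model
    eta = (X - mean) @ B
    dist = ((eta[:, None, :] - centroids[None, :, :]) ** 2 * weights).sum(axis=2)
    return np.argmin(dist, axis=1)

def predict_direct_lda(X_train, g, n_classes, X):
    """Equal-prior LDA: nearest class mean in Mahalanobis distance."""
    means = np.array([X_train[g == j].mean(axis=0) for j in range(n_classes)])
    resid = X_train - means[g]
    W_inv = np.linalg.inv(resid.T @ resid / len(X_train))  # pooled covariance
    diff = X[:, None, :] - means[None, :, :]
    dist = np.einsum("njp,pq,njq->nj", diff, W_inv, diff)
    return np.argmin(dist, axis=1)

# Synthetic, balanced data (assumed for illustration).
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
g = np.repeat(np.arange(3), 60)
X = centers[g] + rng.normal(size=(180, 2))

pred_os = predict_os_lda(fit_os_lda(X, g, 3), X)
pred_lda = predict_direct_lda(X, g, 3, X)
agreement = np.mean(pred_os == pred_lda)
accuracy = np.mean(pred_os == g)
```

With unequal priors both rules would need the usual log-prior adjustment, which is omitted here to keep the comparison short.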

11
Flexible Discriminant Analysis
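  • Generalization
      • The real power of the above result is in the
        generalizations that it invites. We can replace
        the linear regression fits by far more flexible,
        nonparametric fits to achieve a more flexible
        classifier than LDA.
  • A more general form of the regression criterion
    $\mathrm{ASR}(\{\theta_k, \eta_k\}_{k=1}^{K}) = \frac{1}{N} \sum_{k=1}^{K} \left[ \sum_{i=1}^{N} \left( \theta_k(g_i) - \eta_k(x_i) \right)^2 + \lambda R(\eta_k) \right]$
      • Here $R(\eta)$ is a regularizer appropriate for the
        chosen form of nonparametric regression
        (generalized additive fits, spline functions, MARS)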

12
Summary for FDA
  • Multivariate nonparametric regression
      • Fit a multiresponse, adaptive nonparametric
        regression of Y on X, giving fitted values $\hat{Y}$.
        Let $S_\lambda$ be the linear operator that fits the
        final chosen model, and $\eta^*(x)$ be the vector of
        fitted regression functions.
  • Optimal scores
      • Compute the eigen-decomposition of
        $Y^T \hat{Y} = Y^T S_\lambda Y$, where the eigenvectors
        $\Theta$ are normalized: $\Theta^T D_\pi \Theta = I$,
        with $D_\pi = Y^T Y / N$
  • Update
      • Update the model from the first step using the optimal
        scores: $\eta(x) = \Theta^T \eta^*(x)$
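
The three steps can be sketched with a very simple stand-in for the nonparametric regression: least squares on a fixed quadratic basis. That basis (`quad_basis`), the ring-shaped synthetic data, the centring of the basis functions, and the weighted nearest-centroid rule are all assumptions of this example; a real FDA fit would typically use an adaptive method such as MARS or additive splines.

```python
import numpy as np

def quad_basis(X):
    """Hypothetical basis h(x) for p = 2: linear, squared and cross terms."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

def fit_fda(H, g, n_classes):
    N = H.shape[0]
    Y = np.eye(n_classes)[g]
    mean = H.mean(axis=0)
    Hc = H - mean                                     # centring replaces the intercept
    # Step 1: multiresponse regression of Y on the basis, giving eta*(x).
    B_raw, *_ = np.linalg.lstsq(Hc, Y, rcond=None)
    Y_hat = Hc @ B_raw
    # Step 2: eigen-decompose Y^T Y_hat / N with Theta^T D_pi Theta = I.
    d = 1.0 / np.sqrt(Y.mean(axis=0))
    M = Y.T @ Y_hat / N
    evals, V = np.linalg.eigh(d[:, None] * (M + M.T) / 2 * d[None, :])
    keep = np.argsort(evals)[::-1][: n_classes - 1]   # drop the trivial score
    Theta = d[:, None] * V[:, keep]
    # Step 3: update the fits, eta(x) = Theta^T eta*(x).
    B = B_raw @ Theta
    fits = Hc @ B
    centroids = np.array([fits[g == j].mean(axis=0) for j in range(n_classes)])
    a2 = evals[keep]
    return mean, B, centroids, 1.0 / (a2 * (1.0 - a2))

def predict_fda(model, H):
    """Closest centroid in the weighted space of fits (equal priors assumed)."""
    mean, B, centroids, w = model
    eta = (H - mean) @ B
    dist = ((eta[:, None, :] - centroids[None, :, :]) ** 2 * w).sum(axis=2)
    return np.argmin(dist, axis=1)

# Synthetic data (assumed): class 0 is a disk, class 1 a surrounding ring.
rng = np.random.default_rng(0)
n = 100
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(3.0, 4.0, n)])
angle = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
g = np.repeat([0, 1], n)

# With h(x) = x the procedure reduces to LDA; the quadratic basis gives FDA.
lda_accuracy = np.mean(predict_fda(fit_fda(X, g, 2), X) == g)
H = quad_basis(X)
fda_accuracy = np.mean(predict_fda(fit_fda(H, g, 2), H) == g)
```

On this data no straight line separates the classes, so the linear fit does little better than chance, while the quadratic fit recovers the circular boundary.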

13
Some Results
14
FDA vs Regression
15
Penalized Discriminant Analysis
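  • FDA can also be viewed directly as a form of
    regularized discriminant analysis.
  • Suppose the regression procedure used in FDA is linear
    regression onto a basis expansion $h(X)$,
    with a quadratic penalty on the coefficients:
    $\mathrm{ASR}(\{\theta_k, \beta_k\}_{k=1}^{K}) = \frac{1}{N} \sum_{k=1}^{K} \left[ \sum_{i=1}^{N} \left( \theta_k(g_i) - h^T(x_i) \beta_k \right)^2 + \lambda \beta_k^T \Omega \beta_k \right]$
  • Then the steps in FDA can be viewed as a
    generalized form of LDA, called penalized
    discriminant analysis (PDA)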
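
To illustrate the effect of the quadratic penalty, the sketch below fits the scored regression with and without a penalty on many correlated predictors. Everything specific here is an assumption of the example: the synthetic "curve" data, the use of the raw predictors as the basis, a second-difference roughness matrix as the penalty matrix Omega, and the value lambda = 100.

```python
import numpy as np

def fit_pda(X, g, n_classes, Omega, lam):
    """Optimal scoring with penalized regression (X^T X + lam * Omega)^{-1} X^T Y."""
    N = X.shape[0]
    Y = np.eye(n_classes)[g]
    mean = X.mean(axis=0)
    Xc = X - mean                                     # centring replaces the intercept
    B_raw = np.linalg.solve(Xc.T @ Xc + lam * Omega, Xc.T @ Y)
    d = 1.0 / np.sqrt(Y.mean(axis=0))                 # diagonal of D_pi^{-1/2}
    M = Y.T @ (Xc @ B_raw) / N                        # Y^T S_lambda Y / N
    evals, V = np.linalg.eigh(d[:, None] * (M + M.T) / 2 * d[None, :])
    keep = np.argsort(evals)[::-1][: n_classes - 1]   # drop the trivial score
    Theta = d[:, None] * V[:, keep]
    return mean, B_raw @ Theta                        # penalized discriminant coefficients

# Synthetic data (assumed): noisy curves sampled at p = 30 neighbouring points,
# so adjacent predictors are highly correlated.
rng = np.random.default_rng(3)
p, n_per = 30, 50
t = np.linspace(0.0, 1.0, p)
g = np.repeat([0, 1], n_per)
amplitude = rng.normal(1.0, 0.3, size=(2 * n_per, 1))
signal = amplitude * np.sin(np.pi * t) + 0.5 * g[:, None] * t
X = signal + rng.normal(scale=0.3, size=(2 * n_per, p))

# Roughness penalty: squared second differences of the coefficient vector.
D2 = np.diff(np.eye(p), n=2, axis=0)                  # (p - 2) x p difference operator
Omega = D2.T @ D2

_, B_lda = fit_pda(X, g, 2, Omega, lam=0.0)           # unpenalized fit
_, B_pda = fit_pda(X, g, 2, Omega, lam=100.0)         # penalized fit

def roughness(B):
    b = B[:, 0]
    return b @ Omega @ b

rough_lda, rough_pda = roughness(B_lda), roughness(B_pda)
```

The penalized coefficient vector varies smoothly along the index of the predictors, which is the kind of coherence the penalty is meant to encourage; the unpenalized one is much more erratic.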

16
Some results
17
Mixture Discriminant Analysis
  • LDA can be derived as the maximum likelihood
    method for normal populations with different
    means and a common covariance matrix.
  • It is natural to generalize LDA by assuming that
    each observed class is in fact a mixture of
    unobserved normally distributed subclasses.
  • Gaussian mixture model for class j with $R_j$ subclasses:
    $P(X \mid G = j) = \sum_{r=1}^{R_j} \pi_{jr} \, \phi(X; \mu_{jr}, \Sigma)$
  • Maximum likelihood estimation of the parameters: EM algorithm
  • Assumption: the same covariance matrix $\Sigma$ for every
    subclass
  • Then the M-step becomes a weighted LDA, so FDA can be
    used
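
A compact EM sketch for this model follows. It is an illustration under several assumptions that are not in the slides: two subclasses per class, a deterministic farthest-point initialisation, a fixed number of iterations, and synthetic data in which one class is bimodal. The M-step computes weighted subclass proportions, weighted centroids, and a pooled covariance, which is the weighted-LDA structure mentioned above.

```python
import numpy as np

def farthest_point_init(X, R):
    """Deterministic start: spread R centres across the class."""
    centres = [X[np.argmax(np.sum((X - X.mean(axis=0)) ** 2, axis=1))]]
    for _ in range(R - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    return np.array(centres)

def log_gauss(X, mu, Sigma_inv, logdet):
    """Log density of N(mu, Sigma) at each row of X."""
    d = X - mu
    maha = np.einsum("ij,jk,ik->i", d, Sigma_inv, d)
    return -0.5 * (maha + logdet + X.shape[1] * np.log(2.0 * np.pi))

def subclass_logp(X, mix_j, mus_j, Sigma):
    """Column r holds log(pi_jr) + log phi(x; mu_jr, Sigma)."""
    Sigma_inv = np.linalg.inv(Sigma)
    logdet = np.linalg.slogdet(Sigma)[1]
    return np.stack([np.log(mix_j[r]) + log_gauss(X, mus_j[r], Sigma_inv, logdet)
                     for r in range(len(mix_j))], axis=1)

def fit_mda(X, g, n_classes, R=2, n_iter=50):
    N, p = X.shape
    mus = [farthest_point_init(X[g == j], R) for j in range(n_classes)]
    mix = [np.full(R, 1.0 / R) for _ in range(n_classes)]
    Sigma = np.cov(X.T, bias=True)                    # crude starting covariance
    for _ in range(n_iter):
        S = np.zeros((p, p))
        for j in range(n_classes):
            Xj = X[g == j]
            # E-step: subclass responsibilities within class j.
            logp = subclass_logp(Xj, mix[j], mus[j], Sigma)
            W = np.exp(logp - logp.max(axis=1, keepdims=True))
            W /= W.sum(axis=1, keepdims=True)
            # M-step: weighted proportions, centroids and scatter.
            mix[j] = W.mean(axis=0)
            mus[j] = (W.T @ Xj) / W.sum(axis=0)[:, None]
            for r in range(R):
                d = Xj - mus[j][r]
                S += (W[:, r, None] * d).T @ d
        Sigma = S / N                                 # common covariance for all subclasses
    priors = np.bincount(g, minlength=n_classes) / N
    return priors, mix, mus, Sigma

def predict_mda(model, X):
    """Pick the class with the largest prior-weighted mixture density."""
    priors, mix, mus, Sigma = model
    scores = []
    for j in range(len(priors)):
        logp = subclass_logp(X, mix[j], mus[j], Sigma)
        m = logp.max(axis=1)
        scores.append(np.log(priors[j]) + m
                      + np.log(np.exp(logp - m[:, None]).sum(axis=1)))
    return np.argmax(np.stack(scores, axis=1), axis=1)

# Synthetic data (assumed): class 0 has two clusters on either side of class 1,
# so the two class means nearly coincide and a single prototype cannot work.
rng = np.random.default_rng(0)
left = rng.normal([-3.0, 0.0], 0.5, size=(50, 2))
right = rng.normal([3.0, 0.0], 0.5, size=(50, 2))
middle = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X = np.vstack([left, right, middle])
g = np.array([0] * 100 + [1] * 50)

model = fit_mda(X, g, n_classes=2, R=2)
mda_accuracy = np.mean(predict_mda(model, X) == g)
```

A production implementation would also guard against subclasses whose mixing proportion collapses to zero and would monitor the log-likelihood for convergence instead of running a fixed number of iterations.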

18
Some results
19
Conclusions
  • Linear discriminant analysis is equivalent to
    multi-response linear regression using optimal
    scorings to represent the groups.
  • Replacing linear regression by any nonparametric
    regression method produces flexible
    discriminant analysis.
    (In this way, any multi-response regression
    technique can be post-processed to improve its
    classification performance.)
  • PDA is designed for situations with many highly
    correlated predictors, such as image data.
  • MDA fits a Gaussian mixture to each class and
    performs well when the class distributions are
    non-normal.

20
References
  • Hastie, T., Tibshirani, R. and Friedman, J. "The Elements of
    Statistical Learning: Data Mining, Inference and Prediction."
    Springer-Verlag, New York.
  • Hastie, T. J., Tibshirani, R. and Buja, A. "Flexible
    Discriminant Analysis by Optimal Scoring." JASA,
    December 1994.
  • Hastie, T. and Tibshirani, R. "Discriminant Analysis by
    Gaussian Mixtures." JRSSB, January 1996.
  • Hastie, T. J., Buja, A. and Tibshirani, R. "Penalized
    Discriminant Analysis." Annals of Statistics, 1995.
  • Hastie, T., Tibshirani, R. and Buja, A. "Flexible
    Discriminant and Mixture Models." In J. Kay and
    D. Titterington (Eds.), edited proceedings of the "Neural
    Networks and Statistics" conference, Edinburgh, 1995.
    Oxford University Press.
  • Hastie, T. Talk on "Flexible Discriminant and
    Mixture Models."
21
Thank you!
Thanks to Kevin for very helpful discussions!