Chapter 4: Linear Methods for Classification

Transcript and Presenter's Notes



1
Chapter 4: Linear Methods for Classification
  • Linear regression of an indicator matrix
  • Linear discriminant analysis
  • Logistic regression
  • Separating hyperplanes

In this chapter, decision boundaries are linear.
2
4.2. Linear regression of an indicator matrix
indicator: Y_k = 1 if G = k, and Y_k = 0 otherwise
indicator response matrix: Y = (Y_1, ..., Y_K), an N x K matrix of 0s and 1s
with a single 1 in each row
example
2 groups (K = 2) and 5 observations (N = 5)
observations 1 and 5 in group 1, observations 2, 3
and 4 in group 2
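A minimal sketch (in Python with numpy, which the slides do not prescribe) of building this indicator response matrix for the example above:

```python
import numpy as np

# class labels for the N = 5 observations of the example (groups 1 and 2)
g = np.array([1, 2, 2, 2, 1])

# indicator response matrix Y (N x K): Y[i, k] = 1 if observation i belongs to class k + 1
K = 2
Y = np.zeros((len(g), K))
Y[np.arange(len(g)), g - 1] = 1
print(Y)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```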
3
Fit a linear regression model to each column of Y simultaneously
(see chapter 3 for linear regression): Y_hat = X (X^T X)^{-1} X^T Y,
i.e. the coefficient matrix is B_hat = (X^T X)^{-1} X^T Y,
where X is the N x (p+1) model matrix (a column of 1s followed by the p input columns)
4
Classification of a new observation
  • compute the fitted output
  • identify the largest component and classify
    accordingly

justification: the regression fit estimates the conditional expectation,
E(Y_k | X = x) = Pr(G = k | X = x),
so x is assigned to group k if the fitted value for G = k is the largest
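A rough numpy sketch of slides 3-4: regress every column of Y on the inputs by least squares and classify a new point to the class with the largest fitted component. The two input features are invented for illustration; only the indicator matrix comes from the example above.

```python
import numpy as np

# indicator response matrix from the K = 2, N = 5 example
Y = np.array([[1, 0], [0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                      # toy inputs with p = 2 features (illustrative only)
Xm = np.hstack([np.ones((5, 1)), X])             # model matrix: column of 1s plus the inputs

# least-squares fit of every column of Y at once: B_hat = (X^T X)^{-1} X^T Y
B_hat, *_ = np.linalg.lstsq(Xm, Y, rcond=None)

# classify a new observation: compute the fitted output and pick the largest component
x_new = np.array([1.0, 0.2, -0.5])               # [intercept, x1, x2]
f_hat = x_new @ B_hat
print("fitted output:", f_hat, "-> class", f_hat.argmax() + 1)
```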
5
Is linear regression a good estimate of the conditional
expectation?
problem
the fitted values can be negative or greater than 1
(in particular if the prediction point lies outside the hull of the training
data)
so they are hard to interpret as probabilities,
BUT the rule still gives good classification results in practice
solution: linear regression onto a basis
expansion h(X) of the inputs (see chapter 5)
6
A more simplistic viewpoint: construct a target t_k for each class,
where t_k is the kth column of the K x K identity matrix (kth element equal to 1),
fit the linear model by least squares, min_B Σ_i || y_i − B^T (1, x_i^T)^T ||^2,
classification: assign a new x to the class whose target is closest to the fitted vector,
G_hat(x) = argmin_k || f_hat(x) − t_k ||^2
7
Problem: with K ≥ 3 classes, some classes can be masked by others
(their fitted values are never the largest anywhere in input space).
Solution: a quadratic rather than linear fit (augment the inputs with quadratic terms).
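As a rough illustration of both the masking effect and the suggested fix, the sketch below augments a one-dimensional input with a quadratic term before the indicator regression; the use of scikit-learn here is an assumption about tooling, not part of the slides.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# three well-separated classes along a line; with a linear fit the middle class gets masked
X = np.concatenate([rng.normal(m, 0.3, 50) for m in (-3, 0, 3)]).reshape(-1, 1)
Y = np.kron(np.eye(3), np.ones((50, 1)))                 # indicator response matrix (150 x 3)

X_quad = PolynomialFeatures(2, include_bias=False).fit_transform(X)   # columns [x, x^2]
pred_lin = LinearRegression().fit(X, Y).predict(X).argmax(axis=1)
pred_quad = LinearRegression().fit(X_quad, Y).predict(X_quad).argmax(axis=1)

# the linear fit predicts the middle class (label 1) almost never; the quadratic fit recovers it
print("middle class, linear fit:   ", (pred_lin == 1).sum(), "of 50")
print("middle class, quadratic fit:", (pred_quad == 1).sum(), "of 50")
```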
8
4.3. Linear discriminant analysis
f_k(x): density of X in class G = k
π_k: prior probability of class k
assume f_k(x) Gaussian and that the classes have a common
covariance matrix Σ
log-ratio: log[ Pr(G = k | X = x) / Pr(G = l | X = x) ]
  = log(π_k / π_l) − (1/2)(μ_k + μ_l)^T Σ^{-1} (μ_k − μ_l) + x^T Σ^{-1} (μ_k − μ_l)
is linear in x, so the
decision boundaries are linear
discriminant function: δ_k(x) = x^T Σ^{-1} μ_k − (1/2) μ_k^T Σ^{-1} μ_k + log π_k
classification: G_hat(x) = argmax_k δ_k(x)
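A small numpy sketch of the rule under these Gaussian, common-covariance assumptions: estimate the priors, class means, and pooled covariance, evaluate each δ_k(x), and take the largest. The function names are illustrative.

```python
import numpy as np

def lda_fit(X, g):
    """Estimate priors, class means, and the (inverse) pooled covariance matrix."""
    classes = np.unique(g)
    priors = np.array([(g == k).mean() for k in classes])
    means = np.array([X[g == k].mean(axis=0) for k in classes])
    resid = np.concatenate([X[g == k] - means[i] for i, k in enumerate(classes)])
    cov = resid.T @ resid / (len(X) - len(classes))          # pooled within-class covariance
    return classes, priors, means, np.linalg.inv(cov)

def lda_predict(x, classes, priors, means, cov_inv):
    """delta_k(x) = x^T S^{-1} mu_k - 0.5 mu_k^T S^{-1} mu_k + log pi_k; pick the largest."""
    quad = np.einsum('ki,ij,kj->k', means, cov_inv, means)   # mu_k^T S^{-1} mu_k for each class
    deltas = x @ cov_inv @ means.T - 0.5 * quad + np.log(priors)
    return classes[np.argmax(deltas)]

# toy usage: two Gaussian classes with the same covariance
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
g = np.array([0] * 30 + [1] * 30)
print(lda_predict(np.array([2.0, 2.0]), *lda_fit(X, g)))     # expected: 1
```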
9
Remarks
  • with 2 classes, linear discriminant analysis is closely
    related to classification by linear least squares
    (the coefficient vectors are proportional)
  • with more than 2 classes, LDA avoids the masking problems
    of indicator regression
  • if a common covariance matrix is not assumed, we get quadratic
    discriminant analysis (QDA), with quadratic decision boundaries

10
Regularized discriminant analysis (RDA)
a compromise between linear discriminant analysis
(LDA) and quadratic discriminant analysis (QDA)
regularized covariance matrix: Σ_k(α) = α Σ_k + (1 − α) Σ,
where Σ is the pooled covariance matrix used in LDA and Σ_k the per-class covariance used in QDA
α ∈ [0, 1] is determined by cross-validation
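The shrinkage itself is a one-liner; a minimal sketch (function name illustrative), with α = 1 giving QDA and α = 0 giving LDA:

```python
import numpy as np

def rda_covariance(cov_k, cov_pooled, alpha):
    """RDA compromise: alpha = 1 -> class covariance (QDA), alpha = 0 -> pooled covariance (LDA)."""
    return alpha * cov_k + (1 - alpha) * cov_pooled

# alpha is chosen by cross-validation over a grid, e.g. np.linspace(0, 1, 11)
```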
11
Computations
  • Algorithm:
  • Sphere the data X* = D^{-1/2} U^T X, using the eigen-decomposition
    Σ = U D U^T of the common covariance matrix, so that the common
    covariance becomes the identity
  • classify in the transformed space to the closest class centroid
    (taking the prior probabilities π_k into account)

LDA and QDA computations are simplified by diagonalising the covariance
matrices
(eigen-decomposition)
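A sketch of the sphering step, assuming the common covariance estimate is already available; after the transform the common covariance is the identity, so classification reduces to distance to the closest class centroid (modulo the priors).

```python
import numpy as np

def sphere(X, cov):
    """Whiten X with the eigen-decomposition cov = U diag(d) U^T, i.e. X* = X U D^{-1/2}."""
    eigval, eigvec = np.linalg.eigh(cov)
    return X @ (eigvec / np.sqrt(eigval))      # sphered data has identity covariance
```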
12
Reduced-rank linear discriminant analysis
Fisher: "Find the linear combination Z = a^T X such
that the between-class variance is maximized
relative to the within-class variance."
i.e. maximize the Rayleigh quotient: max_a (a^T B a) / (a^T W a),
where B is the between-class covariance and W the
within-class covariance
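The maximizer a of the Rayleigh quotient is the leading generalized eigenvector of B with respect to W; a sketch using scipy (an assumed dependency):

```python
import numpy as np
from scipy.linalg import eigh

def fisher_direction(B, W):
    """Leading generalized eigenvector of B a = lambda W a maximizes a^T B a / a^T W a."""
    eigval, eigvec = eigh(B, W)      # symmetric generalized eigenproblem, eigenvalues ascending
    return eigvec[:, -1]             # direction with the largest between/within variance ratio
```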
13
4.4. Logistic regression
model specified by K−1 log-odds or logit
transformations:
log[ Pr(G = k | X = x) / Pr(G = K | X = x) ] = β_{k0} + β_k^T x,  k = 1, ..., K−1
(the last class K serves as the reference class in the denominator)
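A small sketch of how the K class probabilities are recovered from the K−1 fitted logits (class K as reference); the function name is illustrative.

```python
import numpy as np

def class_probabilities(logits):
    """Map the K-1 log-odds (vs. the reference class K) to the K class probabilities."""
    e = np.exp(logits)
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)          # Pr(G=1..K-1 | x), then Pr(G=K | x)

print(class_probabilities(np.array([0.5, -1.0])))     # K = 3 example; probabilities sum to 1
```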
14
Fitting logistic regression models
usually by maximum likelihood (a Newton-Raphson
algorithm solves the score equations)
example: K = 2 (2 groups)
write p(x; β) = Pr(G = 1 | X = x)
encode the response as y_i = 1 when g_i = 1 and y_i = 0 when g_i = 2
log-likelihood: l(β) = Σ_i [ y_i log p(x_i; β) + (1 − y_i) log(1 − p(x_i; β)) ]
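A hedged sketch of the K = 2 fit by Newton-Raphson (equivalently, iteratively reweighted least squares); the fixed iteration count stands in for a proper convergence test.

```python
import numpy as np

def logistic_fit(X, y, n_iter=25):
    """Newton-Raphson for two-class logistic regression; X should include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities p(x_i; beta)
        grad = X.T @ (y - p)                       # score: gradient of the log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])  # X^T W X with weights p_i (1 - p_i)
        beta += np.linalg.solve(hess, grad)        # Newton step
    return beta
```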
15
Example: South African heart disease
there is correlation between the set of predictors,
which leads to surprising results: some apparently important variables are not included
in the fitted logistic model
16
Quadratic approximations and inference
  • the quadratic approximation to the deviance gives the Pearson
    chi-square statistic
  • if the model is correct, then β_hat is consistent
    (convergence to the true β)
  • β_hat is asymptotically normally distributed
  • model building: Rao score test, Wald test

connection with least squares:
the maximum-likelihood parameter estimates of logistic regression are the
coefficients of a weighted least-squares fit
with weights p_i (1 − p_i) (iteratively reweighted least squares)
17
Differences between LDA and logistic regression
the models have the same linear form, BUT they differ in the way the
coefficients are estimated
logistic regression is more general and makes fewer
assumptions (it leaves the marginal density of X arbitrary), so it is
more robust, BUT the two give very similar results in practice
18
4.5. Separating hyperplanes
perceptrons: classifiers that compute a linear combination of the inputs and
return its sign, e.g. f(x) = sign(β_0 + β^T x)
hyperplane or affine set L: defined by the
equation f(x) = β_0 + β^T x = 0
(= a line in R^2)
properties
  • β* = β / ||β|| is the vector normal to the surface L
  • for any point x_0 in L, β^T x_0 = −β_0
  • the signed distance of any point x to L is given
    by β*^T (x − x_0) = (β^T x + β_0) / ||β||
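A small numpy check of the last property (names chosen for illustration):

```python
import numpy as np

def signed_distance(x, beta, beta0):
    """Signed distance from x to the hyperplane { x : beta0 + beta^T x = 0 }."""
    return (beta @ x + beta0) / np.linalg.norm(beta)

# the point (2, 0) lies one unit from the line x1 = 1 (beta = (1, 0), beta0 = -1)
print(signed_distance(np.array([2.0, 0.0]), np.array([1.0, 0.0]), -1.0))   # 1.0
```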
19
Rosenblatt's perceptron learning algorithm
tries to find a separating hyperplane by minimizing the
distance of misclassified points to the
decision boundary
minimize D(β, β_0) = − Σ_{i ∈ M} y_i (x_i^T β + β_0),
where M is the index set of misclassified points (responses coded y_i ∈ {−1, +1}).
The algorithm uses stochastic gradient descent to
minimize this piecewise-linear criterion.
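A rough sketch of the stochastic-gradient updates with responses coded ±1 and a unit learning rate; convergence is only guaranteed when the classes are linearly separable.

```python
import numpy as np

def perceptron(X, y, n_epochs=100):
    """Rosenblatt's perceptron: cycle through the data and update on misclassified points."""
    beta, beta0 = np.zeros(X.shape[1]), 0.0
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):                   # y_i in {-1, +1}
            if yi * (xi @ beta + beta0) <= 0:      # misclassified (or exactly on the boundary)
                beta += yi * xi                    # gradient step on -y_i (x_i^T beta + beta_0)
                beta0 += yi
    return beta, beta0
```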
20
Optimal separating hyperplanes
find the hyperplane that separates the two classes and maximizes the margin,
i.e. the distance to the closest training point of either class
(when the classes overlap, one instead looks for a hyperplane that minimizes
some measure of overlap in the training data; see chapter 12)
advantages over Rosenblatt's algorithm:
  • unique solution
  • better classification performance on test data

(Figure: the least-squares boundary and 2 solutions found by the perceptron
algorithm with different random starts.)
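As a rough pointer, the maximum-margin hyperplane can be computed with a linear support vector machine; the scikit-learn call below (with a large C to approximate the hard margin on separable data) is one possible tool, not part of the original slides.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])   # separable toy data
y = np.array([-1] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=1e6).fit(X, y)     # large C ~ hard-margin optimal separating hyperplane
print("beta:", svm.coef_[0], "beta_0:", svm.intercept_[0])
```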