Discriminant Analysis Concepts: Presentation Transcript
1
Discriminant Analysis Concepts
  • Used to predict group membership from a set of
    continuous predictors.
  • Think of it as MANOVA in reverse: in MANOVA we
    asked whether groups are significantly different on
    a set of linearly combined responses.
  • The same responses can be used to predict group
    membership.

2
Discriminant Analysis Concepts
  • Determines how continuous variables can be
    linearly combined to best classify a subject into
    a group.
  • A better term may be separation.
  • Slightly different is classification, where we
    seek rules that allocate new subjects into
    established classes.
  • Logistic regression is a competitor.

3
Classification
  • Two populations, $\pi_1$ and $\pi_2$.
  • We have measurements $\mathbf{x}' = (x_1, x_2, \ldots, x_p)$ on
    each of the individuals concerned.
  • Given a new value of $\mathbf{x}$ for an unknown individual,
    our problem is how best to classify this
    individual.

4
Illustration
[Figure: densities $f_1(x)$ and $f_2(x)$ over classification regions $R_1$ and $R_2$; the tail of $f_1$ over $R_2$ is the probability of misclassifying a Population 1 member into Population 2, and the tail of $f_2$ over $R_1$ is the probability of misclassifying a Population 2 member into Population 1.]
5
Misclassification
The probability that an individual from $\pi_1$ is wrongly
classified is
$$P(2 \mid 1) = \int_{R_2} f_1(\mathbf{x})\,d\mathbf{x},$$
and the probability that an individual from $\pi_2$ is wrongly
classified is
$$P(1 \mid 2) = \int_{R_1} f_2(\mathbf{x})\,d\mathbf{x}.$$
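As a concrete illustration of these two integrals, here is a minimal numeric sketch, assuming two univariate normal populations and a midpoint cutoff; all parameter values below are invented for illustration.

```python
# Misclassification probabilities for two univariate normals
# with cutoff t: R1 = {x < t}, R2 = {x >= t}.
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 2.0, 1.0   # hypothetical means and common SD
t = (mu1 + mu2) / 2               # midpoint cutoff

# P(2|1): mass of f1 falling in R2 (a pi_1 member classified as pi_2)
p21 = 1 - norm.cdf(t, loc=mu1, scale=sigma)
# P(1|2): mass of f2 falling in R1 (a pi_2 member classified as pi_1)
p12 = norm.cdf(t, loc=mu2, scale=sigma)
print(p21, p12)                   # both ~0.159 for this symmetric example
```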
6
Four Possibilities
  • Assume $p_1$ and $p_2$ are the prior probabilities of
    $\pi_1$ and $\pi_2$, respectively.

                      Classified as pi_1     Classified as pi_2
    Actual pi_1       P(1 | 1) p_1           P(2 | 1) p_1
    Actual pi_2       P(1 | 2) p_2           P(2 | 2) p_2
7
Costs
  • In general there is a cost associated with
    misclassification.
  • Assume the cost is zero for correct
    classification.
  • $C(2 \mid 1)$ is the cost of misclassifying a $\pi_1$
    individual as a $\pi_2$ individual.
  • $C(1 \mid 2)$ is the cost of misclassifying a $\pi_2$
    individual as a $\pi_1$ individual.

                      Classified as pi_1     Classified as pi_2
    Actual pi_1       0                      C(2 | 1)
    Actual pi_2       C(1 | 2)               0
8
Expected Cost of Misclassification (ECM)
$$\mathrm{ECM} = C(2 \mid 1)\,P(2 \mid 1)\,p_1 + C(1 \mid 2)\,P(1 \mid 2)\,p_2$$
Goal: minimize the ECM.
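A one-line worked instance of the ECM formula, plugging in the misclassification probabilities from the sketch above together with assumed priors and costs (all values illustrative):

```python
# Expected cost of misclassification for one choice of R1, R2.
p21, p12 = 0.1587, 0.1587    # P(2|1), P(1|2) from the sketch above
p1, p2 = 0.7, 0.3            # assumed priors
c21, c12 = 10.0, 5.0         # assumed costs C(2|1), C(1|2)
ecm = c21 * p21 * p1 + c12 * p12 * p2
print(ecm)                   # ~1.349
```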
9
It can be shown that the ECM is minimized if $R_1$ contains
those values of $\mathbf{x}$ for which
$$C(1 \mid 2)\,p_2\,f_2(\mathbf{x}) - C(2 \mid 1)\,p_1\,f_1(\mathbf{x}) \le 0$$
and excludes those $\mathbf{x}$ for which the above is $> 0$.
In other words, $R_1$ is the set of points $\mathbf{x}$ for which
$$\frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \frac{p_2\,C(1 \mid 2)}{p_1\,C(2 \mid 1)},$$
so when $\mathbf{x}$ satisfies this inequality we classify the
corresponding individual in $\pi_1$.
10
Conversely, since $R_2$ is the complement of $R_1$, $R_2$ is
the set such that
$$\frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \frac{p_2\,C(1 \mid 2)}{p_1\,C(2 \mid 1)},$$
and an individual whose $\mathbf{x}$ vector satisfies this
inequality is allocated to $\pi_2$.
11
  • Assuming $\mathbf{x}$ has a multivariate normal distribution,
    i.e.,
  • $\mathbf{x} \sim N_p(\boldsymbol{\mu}_i, \Sigma)$ in population $i$ ($i = 1, 2$)
  • (note that this implies the same covariance
    matrix applies to each population), we have
    $$f_1(\mathbf{x}) \propto \exp\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)\}$$
    $$f_2(\mathbf{x}) \propto \exp\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)\}$$

12
  • and the general rule (1), after taking natural
    logs and some rearrangement, can be shown to be
    equivalent to
    $$\boldsymbol{\lambda}'\mathbf{x} - \tfrac{1}{2}\,\boldsymbol{\lambda}'(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) \ge c,$$
  • where
    $$\boldsymbol{\lambda} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) = (\delta_1, \delta_2, \ldots, \delta_p)', \text{ say}$$
    (correspondingly, $\boldsymbol{\lambda}' = (\delta_1, \delta_2, \ldots, \delta_p) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\Sigma^{-1}$),
  • and
    $$c = \ln \frac{C(1 \mid 2)\,p_2}{C(2 \mid 1)\,p_1}.$$
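A minimal sketch of this population rule in Python; the means, covariance matrix, costs, and priors below are all invented for illustration.

```python
# Classify a new observation x0 using the population rule
# lambda' x0 - 0.5 lambda'(mu1 + mu2) >= c.
import numpy as np

mu1 = np.array([1.0, 2.0])
mu2 = np.array([3.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam = np.linalg.solve(Sigma, mu1 - mu2)   # lambda = Sigma^{-1}(mu1 - mu2)
c = np.log((5.0 * 0.5) / (10.0 * 0.5))    # c = ln[C(1|2) p2 / (C(2|1) p1)]

x0 = np.array([2.0, 1.5])                 # new observation
score = lam @ x0 - 0.5 * lam @ (mu1 + mu2)
print("pi1" if score >= c else "pi2")
```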



13
Priors
  • Typically, information is not available on the
    prior probabilities $p_1$ and $p_2$.
  • They are usually taken to be equal, making $c$ a
    function only of the ratio of the two costs.
  • If, in addition, the misclassification costs
    $C(1 \mid 2)$ and $C(2 \mid 1)$ are equal, then $c = 0$.

14
  • Ordinarily $\Sigma$, $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$ are not known and must be
    estimated from the data by $S$, $\bar{\mathbf{x}}_1$, and $\bar{\mathbf{x}}_2$,
    respectively; we therefore use
    $S_{\text{pooled}}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$ for $\boldsymbol{\lambda}$, etc.,
  • where $S_{\text{pooled}}^{-1}$ is taken as the inverse of
    $$S_{\text{pooled}} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2},$$
  • and $S_1$ and $S_2$ are the sample covariance
    matrices for each of the two groups (populations),
    respectively.

15
Minimum ECM for Two Normals
  • Allocate $\mathbf{x}_0$ to $\pi_1$ if
    $$\hat{\boldsymbol{\lambda}}'\mathbf{x}_0 - \tfrac{1}{2}\,\hat{\boldsymbol{\lambda}}'(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2) \ge c.$$


16
Linear Discriminant Function
  • $\boldsymbol{\lambda}'\mathbf{x} = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\Sigma^{-1}\mathbf{x}$ is called the linear
    discriminant function of $\mathbf{x}$.
  • This linear combination of $\mathbf{x}$ summarizes all of
    the information in $\mathbf{x}$ that is available for
    discriminating between the two populations.

17
Unequal Covariance Matrices
Allocate $\mathbf{x}_0$ to $\pi_1$ if
$$-\tfrac{1}{2}\,\mathbf{x}_0'(S_1^{-1} - S_2^{-1})\mathbf{x}_0 + (\bar{\mathbf{x}}_1'S_1^{-1} - \bar{\mathbf{x}}_2'S_2^{-1})\mathbf{x}_0 - k \ge c,$$
where
$$k = \tfrac{1}{2}\ln\frac{|S_1|}{|S_2|} + \tfrac{1}{2}(\bar{\mathbf{x}}_1'S_1^{-1}\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2'S_2^{-1}\bar{\mathbf{x}}_2).$$
(Quadratic Classification Rule)
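A sketch of the quadratic rule with invented inputs; note the determinant term in $k$.

```python
# Quadratic classification rule for unequal covariance matrices.
import numpy as np

xbar1, xbar2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S1 = np.array([[1.0, 0.3], [0.3, 1.0]])
S2 = np.array([[2.0, -0.4], [-0.4, 1.5]])
S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)

k = 0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2)) \
    + 0.5 * (xbar1 @ S1i @ xbar1 - xbar2 @ S2i @ xbar2)

x0 = np.array([1.0, 0.4])
score = -0.5 * x0 @ (S1i - S2i) @ x0 + (xbar1 @ S1i - xbar2 @ S2i) @ x0 - k
print("pi1" if score >= 0 else "pi2")        # compare to c; c = 0 here
```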
18
Fisher's Discriminant Function
Allocate $\mathbf{x}_0$ to $\pi_1$ if
$$(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'S_{\text{pooled}}^{-1}\mathbf{x}_0 \ge \tfrac{1}{2}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'S_{\text{pooled}}^{-1}(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2).$$
Note: the p-variate standard distance between the two mean
vectors in the direction $\mathbf{a}$ is defined as
$$D(\mathbf{a}) = \frac{|\mathbf{a}'(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)|}{\sqrt{\mathbf{a}'S_{\text{pooled}}\,\mathbf{a}}}.$$
For this problem it is maximized at $\mathbf{a} = S_{\text{pooled}}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$.
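A small numeric check of the maximization claim, comparing Fisher's direction against many random directions; the mean difference and pooled covariance below are invented.

```python
# Verify that a = S_pooled^{-1}(xbar1 - xbar2) maximizes the
# standardized separation |a'd| / sqrt(a' S a).
import numpy as np

rng = np.random.default_rng(1)
d = np.array([1.5, -0.5])                    # xbar1 - xbar2
S = np.array([[1.0, 0.4], [0.4, 2.0]])       # S_pooled

def sep(a):
    return abs(a @ d) / np.sqrt(a @ S @ a)

a_star = np.linalg.solve(S, d)               # Fisher's direction
best_random = max(sep(rng.normal(size=2)) for _ in range(10000))
print(sep(a_star) >= best_random)            # True
```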
19
Linear Discriminant Function: Alternative View
The linear combination of $\mathbf{x}$, say $z = \mathbf{a}'\mathbf{x}$, is
called a linear discriminant function if $\mathbf{a}$ maximizes the
standard distance $D(\mathbf{a})$ defined above.
20
Example
21
Example
Example of Linear Discriminant Function
[Figure: two-group scatterplot with the unscaled discriminant vector b; both axes run from -4 to 4.]
22
Example
Example With Correlation
[Figure: the same two-group scatterplot with a correlation of 0.6 between the predictors; the discriminant vector b is shown, with both axes running from -4 to 4.]
23
More Than Two Groups
The among-groups sums of squares and cross-products (SSCP)
matrix:
$$B = \sum_{i=1}^{g} n_i(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})'$$
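A short sketch computing $B$ for three simulated groups; the group means and sizes are invented.

```python
# Among-groups SSCP matrix B = sum_i n_i (xbar_i - xbar)(xbar_i - xbar)'.
import numpy as np

rng = np.random.default_rng(2)
groups = [rng.normal(m, 1.0, size=(15, 2)) for m in ([0, 0], [2, 1], [1, 3])]

grand = np.vstack(groups).mean(axis=0)       # grand mean over all groups
B = sum(len(g) * np.outer(g.mean(axis=0) - grand,
                          g.mean(axis=0) - grand) for g in groups)
print(B)
```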
24
More Than Two Groups
  • Note: this is the same decomposition (total =
    among-groups + within-groups) we used with MANOVA.

25
More Than Two Groups
With more than two groups, the procedure is referred to as
canonical discriminant analysis.
26
Canonical Correlation Analysis
  • A statistical technique to identify and measure
    the association between two sets of variables.
  • Multiple regression can be interpreted as a
    special case of such an analysis.
  • The multiple correlation coefficient, R, can be
    thought of as the maximum correlation attainable
    between the dependent variable and a linear
    combination of the independent variables.

27
Canonical Correlation Analysis
  • CCA is an extension of the multiple R in multiple
    regression.
  • In CCA, there can be multiple response variables.
  • A canonical correlation is the maximum
    correlation between a linear combination of the
    responses and a linear combination of the
    predictor variables.

28
Canonical Correlations
Suppose $\mathbf{x} = (\mathbf{x}_1', \mathbf{x}_2')'$ with covariance matrix $\Sigma$,
where $\mathbf{x}_1 = (x_{11}, \ldots, x_{1q})'$ and $\mathbf{x}_2 = (x_{21}, \ldots, x_{2,p-q})'$.
Note that $\mathrm{Var}(\mathbf{x}_1) = \Sigma_{11}$ is $q \times q$,
$\mathrm{Var}(\mathbf{x}_2) = \Sigma_{22}$ is $(p-q) \times (p-q)$,
$\mathrm{Cov}(\mathbf{x}_1, \mathbf{x}_2) = \Sigma_{12}$ is $q \times (p-q)$,
$\mathrm{Cov}(\mathbf{x}_2, \mathbf{x}_1) = \Sigma_{21}$ is $(p-q) \times q$,
and $\Sigma_{21} = \Sigma_{12}'$.
29
The First Canonical Correlation
  • Find $\mathbf{a}_1$ and $\mathbf{b}_1$ (vectors of constants) such that
    $\mathrm{Corr}(\mathbf{a}_1'\mathbf{x}_1, \mathbf{b}_1'\mathbf{x}_2)$ is as large as possible.
  • Let $U_1 = \mathbf{a}_1'\mathbf{x}_1$ and $V_1 = \mathbf{b}_1'\mathbf{x}_2$ and call them
    canonical variables.
  • Then $\mathrm{Var}(U_1) = \mathbf{a}_1'\Sigma_{11}\mathbf{a}_1$,
  • $\mathrm{Var}(V_1) = \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1$,
  • and $\mathrm{Cov}(U_1, V_1) = \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1$.

30
The First Canonical Correlation
The correlation between $U_1$ and $V_1$ is
$$\mathrm{Corr}(U_1, V_1) = \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\sqrt{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)}}.$$
31
Finding the Correlation
Let ?1 . It
can be shown that
      is the largest eigenvalue of      
a1 is the eigenvector corresponding to       b1
is the eigenvector corresponding to the largest
eigenvalue of - this
largest eigenvalue also is .  
Note that 0??1?1.
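A sketch of this eigenvalue computation with an invented partitioned covariance matrix; numpy's general eigensolver is used, so the (real) eigenvalues are extracted explicitly.

```python
# First canonical correlation via the largest eigenvalue of
# Sigma11^{-1} Sigma12 Sigma22^{-1} Sigma21.
import numpy as np

S11 = np.array([[1.0, 0.4], [0.4, 1.0]])     # Var(x1), q x q
S22 = np.array([[1.0, 0.2], [0.2, 1.0]])     # Var(x2)
S12 = np.array([[0.5, 0.3], [0.1, 0.4]])     # Cov(x1, x2); S21 = S12'

M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
eigvals, eigvecs = np.linalg.eig(M)
i = np.argmax(eigvals.real)
rho1 = np.sqrt(eigvals.real[i])              # first canonical correlation
a1 = eigvecs[:, i].real                      # coefficients for U1 = a1' x1
print(rho1)                                  # lies in [0, 1]; ~0.54 here
```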