Discriminant Analysis Concepts: Presentation Transcript
1
Discriminant Analysis Concepts
  • Used to predict group membership from a set of
    continuous predictors.
  • Think of it as MANOVA in reverse: in MANOVA we
    asked whether groups are significantly different on
    a set of linearly combined responses.
  • The same responses can be used to predict group
    membership.

2
Discriminant Analysis Concepts
  • Determines how continuous variables can be
    linearly combined to best classify a subject into
    a group.
  • A better term may be separation.
  • Slightly different is classification, where we
    seek rules that allocate new subjects into
    established classes.
  • Logistic regression is a competitor.

3
Classification
  • Two populations, $\pi_1$ and $\pi_2$.
  • We have measurements $\mathbf{x}' = (x_1, x_2, \ldots, x_p)$ on
    each of the individuals concerned.
  • Given a new value of $\mathbf{x}$ for an unknown individual,
    our problem is how best to classify this
    individual.

4
Illustration
[Figure: densities $f_1(x)$ and $f_2(x)$ over classification regions $R_1$ and $R_2$; the tail of $f_1$ over $R_2$ is the probability of misclassifying a Population 1 member into Population 2, and the tail of $f_2$ over $R_1$ is the probability of misclassifying a Population 2 member into Population 1.]
5
Misclassification
The probability that an individual from $\pi_1$ is wrongly
classified is
$$P(2 \mid 1) = \int_{R_2} f_1(\mathbf{x})\,d\mathbf{x},$$
and the probability that an individual from $\pi_2$ is wrongly
classified is
$$P(1 \mid 2) = \int_{R_1} f_2(\mathbf{x})\,d\mathbf{x}.$$
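As a concrete illustration of these two integrals, here is a minimal numeric sketch, assuming two univariate normal populations and a midpoint cutoff; all parameter values below are invented for illustration.

```python
# Misclassification probabilities for two univariate normals
# with cutoff t: R1 = {x < t}, R2 = {x >= t}.
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 2.0, 1.0   # hypothetical means and common SD
t = (mu1 + mu2) / 2               # midpoint cutoff

# P(2|1): mass of f1 falling in R2 (a pi_1 member classified as pi_2)
p21 = 1 - norm.cdf(t, loc=mu1, scale=sigma)
# P(1|2): mass of f2 falling in R1 (a pi_2 member classified as pi_1)
p12 = norm.cdf(t, loc=mu2, scale=sigma)
print(p21, p12)                   # both ~0.159 for this symmetric example
```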
6
Four Possibilities
  • Assume $p_1$ and $p_2$ are the prior probabilities of
    $\pi_1$ and $\pi_2$, respectively.

                      Classified as pi_1     Classified as pi_2
    Actual pi_1       P(1 | 1) p_1           P(2 | 1) p_1
    Actual pi_2       P(1 | 2) p_2           P(2 | 2) p_2
7
Costs
  • In general there is a cost associated with
    misclassification.
  • Assume the cost is zero for correct
    classification.
  • $C(2 \mid 1)$ is the cost of misclassifying a $\pi_1$
    individual as a $\pi_2$ individual.
  • $C(1 \mid 2)$ is the cost of misclassifying a $\pi_2$
    individual as a $\pi_1$ individual.

                      Classified as pi_1     Classified as pi_2
    Actual pi_1       0                      C(2 | 1)
    Actual pi_2       C(1 | 2)               0
8
Expected Cost of Misclassification (ECM)
$$\mathrm{ECM} = C(2 \mid 1)\,P(2 \mid 1)\,p_1 + C(1 \mid 2)\,P(1 \mid 2)\,p_2$$
Goal: minimize the ECM.
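A one-line worked instance of the ECM formula, plugging in the misclassification probabilities from the sketch above together with assumed priors and costs (all values illustrative):

```python
# Expected cost of misclassification for one choice of R1, R2.
p21, p12 = 0.1587, 0.1587    # P(2|1), P(1|2) from the sketch above
p1, p2 = 0.7, 0.3            # assumed priors
c21, c12 = 10.0, 5.0         # assumed costs C(2|1), C(1|2)
ecm = c21 * p21 * p1 + c12 * p12 * p2
print(ecm)                   # ~1.349
```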
9
It can be shown that the ECM is minimized if $R_1$ contains
those values of $\mathbf{x}$ for which
$$C(1 \mid 2)\,p_2\,f_2(\mathbf{x}) - C(2 \mid 1)\,p_1\,f_1(\mathbf{x}) \le 0$$
and excludes those $\mathbf{x}$ for which the above is $> 0$.
In other words, $R_1$ is the set of points $\mathbf{x}$ for which
$$\frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ge \frac{p_2\,C(1 \mid 2)}{p_1\,C(2 \mid 1)},$$
so when $\mathbf{x}$ satisfies this inequality we classify the
corresponding individual in $\pi_1$.
10
Conversely, since $R_2$ is the complement of $R_1$, $R_2$ is
the set such that
$$\frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} < \frac{p_2\,C(1 \mid 2)}{p_1\,C(2 \mid 1)},$$
and an individual whose $\mathbf{x}$ vector satisfies this
inequality is allocated to $\pi_2$.
11
  • Assuming $\mathbf{x}$ has a multivariate normal distribution,
    i.e.,
  • $\mathbf{x} \sim N_p(\boldsymbol{\mu}_i, \Sigma)$ in population $i$ ($i = 1, 2$)
  • (note that this implies the same covariance
    matrix applies to each population), we have
    $$f_1(\mathbf{x}) \propto \exp\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)\}$$
    $$f_2(\mathbf{x}) \propto \exp\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)\}$$

12
  • and the general rule (1), after taking natural
    logs and some rearrangement, can be shown to be
    equivalent to
    $$\boldsymbol{\lambda}'\mathbf{x} - \tfrac{1}{2}\,\boldsymbol{\lambda}'(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) \ge c,$$
  • where
    $$\boldsymbol{\lambda} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) = (\delta_1, \delta_2, \ldots, \delta_p)', \text{ say}$$
    (correspondingly, $\boldsymbol{\lambda}' = (\delta_1, \delta_2, \ldots, \delta_p) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\Sigma^{-1}$),
  • and
    $$c = \ln \frac{C(1 \mid 2)\,p_2}{C(2 \mid 1)\,p_1}.$$
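A minimal sketch of this population rule in Python; the means, covariance matrix, costs, and priors below are all invented for illustration.

```python
# Classify a new observation x0 using the population rule
# lambda' x0 - 0.5 lambda'(mu1 + mu2) >= c.
import numpy as np

mu1 = np.array([1.0, 2.0])
mu2 = np.array([3.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam = np.linalg.solve(Sigma, mu1 - mu2)   # lambda = Sigma^{-1}(mu1 - mu2)
c = np.log((5.0 * 0.5) / (10.0 * 0.5))    # c = ln[C(1|2) p2 / (C(2|1) p1)]

x0 = np.array([2.0, 1.5])                 # new observation
score = lam @ x0 - 0.5 * lam @ (mu1 + mu2)
print("pi1" if score >= c else "pi2")
```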



13
Priors
  • Typically, information is not available on the
    prior probabilities $p_1$ and $p_2$.
  • They are usually taken to be equal, making $c$ a
    function only of the ratio of the two costs.
  • If, in addition, the misclassification costs
    $C(1 \mid 2)$ and $C(2 \mid 1)$ are equal, then $c = 0$.

14
  • Ordinarily $\Sigma$, $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$ are not known and must be
    estimated from the data by $S$, $\bar{\mathbf{x}}_1$, and $\bar{\mathbf{x}}_2$,
    respectively; we therefore use
    $S_{\text{pooled}}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$ for $\boldsymbol{\lambda}$, etc.,
  • where $S_{\text{pooled}}^{-1}$ is taken as the inverse of
    $$S_{\text{pooled}} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2},$$
  • and $S_1$ and $S_2$ are the sample covariance
    matrices for each of the two groups (populations),
    respectively.

15
Minimum ECM for Two Normals
  • Allocate $\mathbf{x}_0$ to $\pi_1$ if
    $$\hat{\boldsymbol{\lambda}}'\mathbf{x}_0 - \tfrac{1}{2}\,\hat{\boldsymbol{\lambda}}'(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2) \ge c.$$


16
Linear Discriminant Function
  • $\boldsymbol{\lambda}'\mathbf{x} = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\Sigma^{-1}\mathbf{x}$ is called the linear
    discriminant function of $\mathbf{x}$.
  • This linear combination of $\mathbf{x}$ summarizes all of
    the information in $\mathbf{x}$ that is available for
    discriminating between the two populations.

17
Unequal Covariance Matrices
Allocate $\mathbf{x}_0$ to $\pi_1$ if
$$-\tfrac{1}{2}\,\mathbf{x}_0'(S_1^{-1} - S_2^{-1})\mathbf{x}_0 + (\bar{\mathbf{x}}_1'S_1^{-1} - \bar{\mathbf{x}}_2'S_2^{-1})\mathbf{x}_0 - k \ge c,$$
where
$$k = \tfrac{1}{2}\ln\frac{|S_1|}{|S_2|} + \tfrac{1}{2}(\bar{\mathbf{x}}_1'S_1^{-1}\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2'S_2^{-1}\bar{\mathbf{x}}_2).$$
(Quadratic Classification Rule)
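A sketch of the quadratic rule with invented inputs; note the determinant term in $k$.

```python
# Quadratic classification rule for unequal covariance matrices.
import numpy as np

xbar1, xbar2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S1 = np.array([[1.0, 0.3], [0.3, 1.0]])
S2 = np.array([[2.0, -0.4], [-0.4, 1.5]])
S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)

k = 0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2)) \
    + 0.5 * (xbar1 @ S1i @ xbar1 - xbar2 @ S2i @ xbar2)

x0 = np.array([1.0, 0.4])
score = -0.5 * x0 @ (S1i - S2i) @ x0 + (xbar1 @ S1i - xbar2 @ S2i) @ x0 - k
print("pi1" if score >= 0 else "pi2")        # compare to c; c = 0 here
```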
18
Fisher's Discriminant Function
Allocate $\mathbf{x}_0$ to $\pi_1$ if
$$(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'S_{\text{pooled}}^{-1}\mathbf{x}_0 \ge \tfrac{1}{2}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'S_{\text{pooled}}^{-1}(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2).$$
Note: the p-variate standard distance between the two mean
vectors in the direction $\mathbf{a}$ is defined as
$$D(\mathbf{a}) = \frac{|\mathbf{a}'(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)|}{\sqrt{\mathbf{a}'S_{\text{pooled}}\,\mathbf{a}}}.$$
For this problem it is maximized at $\mathbf{a} = S_{\text{pooled}}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$.
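A small numeric check of the maximization claim, comparing Fisher's direction against many random directions; the mean difference and pooled covariance below are invented.

```python
# Verify that a = S_pooled^{-1}(xbar1 - xbar2) maximizes the
# standardized separation |a'd| / sqrt(a' S a).
import numpy as np

rng = np.random.default_rng(1)
d = np.array([1.5, -0.5])                    # xbar1 - xbar2
S = np.array([[1.0, 0.4], [0.4, 2.0]])       # S_pooled

def sep(a):
    return abs(a @ d) / np.sqrt(a @ S @ a)

a_star = np.linalg.solve(S, d)               # Fisher's direction
best_random = max(sep(rng.normal(size=2)) for _ in range(10000))
print(sep(a_star) >= best_random)            # True
```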
19
Linear Discriminant Function: Alternative View
The linear combination of $\mathbf{x}$, say $z = \mathbf{a}'\mathbf{x}$, is
called a linear discriminant function if $\mathbf{a}$ maximizes the
standard distance $D(\mathbf{a})$ defined above.
20
Example
21
Example
Example of Linear Discriminant Function
[Figure: two-group scatterplot with the unscaled discriminant vector b; both axes run from -4 to 4.]
22
Example
Example With Correlation
[Figure: the same two-group scatterplot with a correlation of 0.6 between the predictors; the discriminant vector b is shown, with both axes running from -4 to 4.]
23
More Than Two Groups
The among-groups sums of squares and cross-products (SSCP)
matrix:
$$B = \sum_{i=1}^{g} n_i(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})'$$
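A short sketch computing $B$ for three simulated groups; the group means and sizes are invented.

```python
# Among-groups SSCP matrix B = sum_i n_i (xbar_i - xbar)(xbar_i - xbar)'.
import numpy as np

rng = np.random.default_rng(2)
groups = [rng.normal(m, 1.0, size=(15, 2)) for m in ([0, 0], [2, 1], [1, 3])]

grand = np.vstack(groups).mean(axis=0)       # grand mean over all groups
B = sum(len(g) * np.outer(g.mean(axis=0) - grand,
                          g.mean(axis=0) - grand) for g in groups)
print(B)
```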
24
More Than Two Groups
  • Note: this is the same decomposition (total =
    among-groups + within-groups) we used with MANOVA.

25
More Than Two Groups
With more than two groups, the procedure is referred to as
canonical discriminant analysis.
26
Canonical Correlation Analysis
  • A statistical technique to identify and measure
    the association between two sets of variables.
  • Multiple regression can be interpreted as a
    special case of such an analysis.
  • The multiple correlation coefficient, R, can be
    thought of as the maximum correlation attainable
    between the dependent variable and a linear
    combination of the independent variables.

27
Canonical Correlation Analysis
  • CCA is an extension of the multiple R in multiple
    regression.
  • In CCA, there can be multiple response variables.
  • A canonical correlation is the maximum
    correlation between a linear combination of the
    responses and a linear combination of the
    predictor variables.

28
Canonical Correlations
Suppose $\mathbf{x} = (\mathbf{x}_1', \mathbf{x}_2')'$ with covariance matrix $\Sigma$,
where $\mathbf{x}_1 = (x_{11}, \ldots, x_{1q})'$ and $\mathbf{x}_2 = (x_{21}, \ldots, x_{2,p-q})'$.
Note that $\mathrm{Var}(\mathbf{x}_1) = \Sigma_{11}$ is $q \times q$,
$\mathrm{Var}(\mathbf{x}_2) = \Sigma_{22}$ is $(p-q) \times (p-q)$,
$\mathrm{Cov}(\mathbf{x}_1, \mathbf{x}_2) = \Sigma_{12}$ is $q \times (p-q)$,
$\mathrm{Cov}(\mathbf{x}_2, \mathbf{x}_1) = \Sigma_{21}$ is $(p-q) \times q$,
and $\Sigma_{21} = \Sigma_{12}'$.
29
The First Canonical Correlation
  • Find $\mathbf{a}_1$ and $\mathbf{b}_1$ (vectors of constants) such that
    $\mathrm{Corr}(\mathbf{a}_1'\mathbf{x}_1, \mathbf{b}_1'\mathbf{x}_2)$ is as large as possible.
  • Let $U_1 = \mathbf{a}_1'\mathbf{x}_1$ and $V_1 = \mathbf{b}_1'\mathbf{x}_2$ and call them
    canonical variables.
  • Then $\mathrm{Var}(U_1) = \mathbf{a}_1'\Sigma_{11}\mathbf{a}_1$,
  • $\mathrm{Var}(V_1) = \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1$,
  • and $\mathrm{Cov}(U_1, V_1) = \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1$.

30
The First Canonical Correlation
The correlation between $U_1$ and $V_1$ is
$$\mathrm{Corr}(U_1, V_1) = \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\sqrt{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)}}.$$
31
Finding the Correlation
Let ?1 . It
can be shown that
      is the largest eigenvalue of      
a1 is the eigenvector corresponding to       b1
is the eigenvector corresponding to the largest
eigenvalue of - this
largest eigenvalue also is .  
Note that 0??1?1.
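A sketch of this eigenvalue computation with an invented partitioned covariance matrix; numpy's general eigensolver is used, so the (real) eigenvalues are extracted explicitly.

```python
# First canonical correlation via the largest eigenvalue of
# Sigma11^{-1} Sigma12 Sigma22^{-1} Sigma21.
import numpy as np

S11 = np.array([[1.0, 0.4], [0.4, 1.0]])     # Var(x1), q x q
S22 = np.array([[1.0, 0.2], [0.2, 1.0]])     # Var(x2)
S12 = np.array([[0.5, 0.3], [0.1, 0.4]])     # Cov(x1, x2); S21 = S12'

M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
eigvals, eigvecs = np.linalg.eig(M)
i = np.argmax(eigvals.real)
rho1 = np.sqrt(eigvals.real[i])              # first canonical correlation
a1 = eigvecs[:, i].real                      # coefficients for U1 = a1' x1
print(rho1)                                  # lies in [0, 1]; ~0.54 here
```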