Title: Techniques for studying correlation and covariance structure
- Principal Components Analysis (PCA)
- Factor Analysis
Principal Component Analysis
Let $\mathbf{x} = (x_1, \ldots, x_p)'$ have a $p$-variate Normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $S$.

Definition

The linear combination $C_1 = \mathbf{a}_1'\mathbf{x}$ is called the first principal component if $\mathbf{a}_1$ is chosen to maximize

$\operatorname{Var}(C_1) = \mathbf{a}_1' S \mathbf{a}_1$

subject to

$\mathbf{a}_1'\mathbf{a}_1 = 1.$
Consider maximizing $\mathbf{a}' S \mathbf{a}$ subject to $\mathbf{a}'\mathbf{a} = 1$.

Using the Lagrange multiplier technique, let

$Q = \mathbf{a}' S \mathbf{a} - \lambda(\mathbf{a}'\mathbf{a} - 1).$

Now

$\dfrac{\partial Q}{\partial \mathbf{a}} = 2S\mathbf{a} - 2\lambda\mathbf{a} = \mathbf{0}, \quad \text{i.e.} \quad S\mathbf{a} = \lambda\mathbf{a},$

and

$\dfrac{\partial Q}{\partial \lambda} = -(\mathbf{a}'\mathbf{a} - 1) = 0, \quad \text{i.e.} \quad \mathbf{a}'\mathbf{a} = 1.$

Hence $\mathbf{a}$ is an eigenvector of $S$ of length 1, and $\operatorname{Var}(C_1) = \mathbf{a}' S \mathbf{a} = \lambda\,\mathbf{a}'\mathbf{a} = \lambda$, which is largest when $\lambda$ is the largest eigenvalue of $S$.
Summary

$C_1 = \mathbf{a}_1'\mathbf{x}$ is the first principal component if $\mathbf{a}_1$ is the eigenvector (of length 1) of $S$ associated with the largest eigenvalue $\lambda_1$ of $S$.
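In practice the first principal component is found numerically. A minimal sketch with NumPy, using a made-up 3×3 covariance matrix (the values are illustrative, not taken from the example later in these notes):

```python
import numpy as np

# Hypothetical covariance matrix S (illustrative values only).
S = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])

# eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(S)

# The first principal component direction a1 is the unit eigenvector
# associated with the largest eigenvalue l1.
l1 = eigenvalues[-1]
a1 = eigenvectors[:, -1]

# a1 has length 1, and Var(C1) = a1' S a1 = l1.
assert np.isclose(np.linalg.norm(a1), 1.0)
assert np.isclose(a1 @ S @ a1, l1)
```

Given observations `x`, the first principal component score would then be `a1 @ x`.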
The complete set of principal components

Let $\mathbf{x}$ have a $p$-variate Normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $S$.

Definition

The set of linear combinations $C_i = \mathbf{a}_i'\mathbf{x}$, $i = 1, \ldots, p$, are called the principal components of $\mathbf{x}$ if $\mathbf{a}_1, \ldots, \mathbf{a}_p$ are chosen such that
$\mathbf{a}_i'\mathbf{a}_i = 1$ for each $i$, and

- $\operatorname{Var}(C_1)$ is maximized.
- $\operatorname{Var}(C_i)$ is maximized subject to $C_i$ being independent of $C_1, \ldots, C_{i-1}$ (the previous $i-1$ principal components).
Note: we have already shown that $\mathbf{a}_1$ is the eigenvector of $S$ associated with the largest eigenvalue, $\lambda_1$, of the covariance matrix, and that $\operatorname{Var}(C_1) = \lambda_1$.
We will now show that $\mathbf{a}_i$ is the eigenvector of $S$ associated with the $i$th largest eigenvalue, $\lambda_i$, of the covariance matrix, and that $\operatorname{Var}(C_i) = \lambda_i$.

Proof (by induction: assume true for $i-1$, then prove true for $i$).
Now $(C_1, \ldots, C_{i-1}, C_i)' = (\mathbf{a}_1'\mathbf{x}, \ldots, \mathbf{a}_{i-1}'\mathbf{x}, \mathbf{a}'\mathbf{x})'$ has covariance matrix whose entries in the last row and column are $\operatorname{Cov}(C_j, C_i) = \mathbf{a}_j' S \mathbf{a}$, $j = 1, \ldots, i-1$.

Hence $C_i$ is independent of $C_1, \ldots, C_{i-1}$ if $\mathbf{a}_j' S \mathbf{a} = \lambda_j \mathbf{a}_j'\mathbf{a} = 0$, i.e. if $\mathbf{a}_j'\mathbf{a} = 0$ for $j = 1, \ldots, i-1$.

We want to maximize $\operatorname{Var}(C_i) = \mathbf{a}' S \mathbf{a}$ subject to $\mathbf{a}'\mathbf{a} = 1$ and $\mathbf{a}_j'\mathbf{a} = 0$ for $j < i$. Using the Lagrange multiplier technique, let

$Q = \mathbf{a}' S \mathbf{a} - \lambda(\mathbf{a}'\mathbf{a} - 1) - \sum_{j=1}^{i-1} \phi_j\, \mathbf{a}_j'\mathbf{a}.$
Now

$\dfrac{\partial Q}{\partial \mathbf{a}} = 2S\mathbf{a} - 2\lambda\mathbf{a} - \sum_{j=1}^{i-1} \phi_j \mathbf{a}_j = \mathbf{0}$

and

$\dfrac{\partial Q}{\partial \lambda} = -(\mathbf{a}'\mathbf{a} - 1) = 0, \qquad \dfrac{\partial Q}{\partial \phi_j} = -\mathbf{a}_j'\mathbf{a} = 0,$

hence

$S\mathbf{a} = \lambda\mathbf{a} + \tfrac{1}{2}\sum_{j=1}^{i-1} \phi_j \mathbf{a}_j. \qquad (1)$

Also, for $j < i$, premultiplying (1) by $\mathbf{a}_j'$ gives $\mathbf{a}_j' S \mathbf{a} = \lambda\,\mathbf{a}_j'\mathbf{a} + \tfrac{1}{2}\phi_j = \tfrac{1}{2}\phi_j$, while $\mathbf{a}_j' S \mathbf{a} = (S\mathbf{a}_j)'\mathbf{a} = \lambda_j \mathbf{a}_j'\mathbf{a} = 0$.

Hence $\phi_j = 0$ for $j < i$, and equation (1) becomes $S\mathbf{a} = \lambda\mathbf{a}$.
Thus $\mathbf{a}_1, \ldots, \mathbf{a}_p$ are the eigenvectors of $S$ associated with the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. The principal components are $C_i = \mathbf{a}_i'\mathbf{x}$, and

- $\operatorname{Var}(C_1)$ is maximized.
- $\operatorname{Var}(C_i)$ is maximized subject to $C_i$ being independent of $C_1, \ldots, C_{i-1}$ (the previous $i-1$ principal components),

where $\operatorname{Var}(C_i) = \lambda_i$.
Recall that any positive definite matrix $S$ can be written

$S = \lambda_1\mathbf{a}_1\mathbf{a}_1' + \lambda_2\mathbf{a}_2\mathbf{a}_2' + \cdots + \lambda_p\mathbf{a}_p\mathbf{a}_p'$

where $\mathbf{a}_1, \ldots, \mathbf{a}_p$ are eigenvectors of $S$ of length 1 and $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $S$.
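This spectral decomposition is easy to verify numerically; a sketch with NumPy, again using a hypothetical symmetric positive definite matrix:

```python
import numpy as np

# Hypothetical symmetric positive definite matrix (illustrative values).
S = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(S)

# Rebuild S from its spectral decomposition: S = sum_i l_i a_i a_i'.
S_rebuilt = sum(l * np.outer(a, a)
                for l, a in zip(eigenvalues, eigenvectors.T))

assert np.allclose(S, S_rebuilt)
```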
Example

- In this example, wildlife (moose) population density was measured over time (once a year) in three areas.
[Figure: moose population density plotted over time for Area 1, Area 2 and Area 3.]
The Sample Statistics

- The mean vector $\bar{\mathbf{x}}$
- The covariance matrix $S$
- The correlation matrix $R$
Principal Component Analysis

- The eigenvalues of $S$
- The eigenvectors of $S$
- The principal components
[Figures: three plots, each labelled Area 1, Area 2 and Area 3.]
Graphical Picture of Principal Components

Multivariate Normal data falls in an ellipsoidal pattern. The shape and orientation of the ellipsoid are determined by the covariance matrix $S$.

The eigenvectors of $S$ give the directions of the axes of the ellipsoid, and the square roots of the eigenvalues are proportional to the lengths of these axes.
- Recall that if $S$ is a positive definite matrix, then $S = PDP'$, where $P$ is an orthogonal matrix ($P'P = PP' = I$) with columns equal to the eigenvectors of $S$, and $D$ is a diagonal matrix with diagonal elements equal to the eigenvalues of $S$.
- The vector of principal components $\mathbf{C} = P'\mathbf{x}$ has covariance matrix $P'SP = P'(PDP')P = D$.

- An orthogonal matrix rotates vectors; thus $P'$ rotates the vector $\mathbf{x}$ into the vector of principal components $\mathbf{C}$.

Also $\operatorname{tr}(S) = \operatorname{tr}(PDP') = \operatorname{tr}(DP'P) = \operatorname{tr}(D) = \lambda_1 + \lambda_2 + \cdots + \lambda_p$.
The ratio $\dfrac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}$ denotes the proportion of variance explained by the $i$th principal component $C_i$.
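The identities above can be checked directly; a sketch with NumPy, using the same hypothetical covariance matrix as earlier (illustrative values only):

```python
import numpy as np

# Hypothetical covariance matrix (illustrative values only).
S = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])

eigenvalues, P = np.linalg.eigh(S)   # columns of P are the eigenvectors
D = np.diag(eigenvalues)

# The principal components C = P'x have covariance matrix P'SP = D.
assert np.allclose(P.T @ S @ P, D)

# Total variance: tr(S) = tr(D) = sum of the eigenvalues.
assert np.isclose(np.trace(S), eigenvalues.sum())

# Proportion of variance explained by each component (largest first).
proportions = eigenvalues[::-1] / eigenvalues.sum()
assert np.isclose(proportions.sum(), 1.0)
```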
The Example

[Numerical eigenvalues, eigenvectors and variance proportions for the moose data are not preserved in this transcript.]
- Comment

If, instead of the covariance matrix $S$, the correlation matrix $R$ is used to extract the principal components, then the principal components are defined in terms of the standard scores of the observations:

$z_{ij} = \dfrac{x_{ij} - \bar{x}_j}{s_j}.$

The correlation matrix is the covariance matrix of the standard scores of the observations.
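A small sketch of this comment, using simulated data (the mixing matrix below is arbitrary): standardizing each column of the data and taking the covariance matrix of the standard scores reproduces the correlation matrix $R$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated observations with correlated columns (arbitrary mixing matrix).
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.5, 3.0]])

# Standard scores: subtract the column means, divide by the column s.d.'s.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.corrcoef(X, rowvar=False)        # correlation matrix of X
# The covariance matrix of the standard scores equals R.
assert np.allclose(np.cov(Z, rowvar=False), R)
```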
More Examples
- Computation of the eigenvalues and eigenvectors of $S$

Recall $S = PDP'$, so that

$S^n = PD^nP' = \lambda_1^n\mathbf{a}_1\mathbf{a}_1' + \lambda_2^n\mathbf{a}_2\mathbf{a}_2' + \cdots + \lambda_p^n\mathbf{a}_p\mathbf{a}_p'.$

Continuing, we see that for large values of $n$

$S^n \approx \lambda_1^n\mathbf{a}_1\mathbf{a}_1',$

since $(\lambda_i/\lambda_1)^n \to 0$ for $i > 1$.

The algorithm for computing the eigenvector $\mathbf{a}_1$: repeatedly multiply a starting vector by $S$, rescaling so that the elements do not become too large in value, i.e. rescale so that the largest element is 1, using the fact that $S^n\mathbf{x}_0 \approx \lambda_1^n(\mathbf{a}_1'\mathbf{x}_0)\,\mathbf{a}_1$ is proportional to $\mathbf{a}_1$.
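This is the power method. A minimal sketch in NumPy, with a hypothetical covariance matrix (illustrative values, not the moose data) and a fixed iteration count chosen for the sketch:

```python
import numpy as np

# Hypothetical covariance matrix S (illustrative values only).
S = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])

x = np.ones(S.shape[0])           # arbitrary starting vector
for _ in range(200):
    x = S @ x                     # multiply by S ...
    x = x / np.max(np.abs(x))     # ... rescaling so the largest element is 1

# At convergence S x ≈ l1 x, so the rescaling factor estimates l1,
# and x (normalized to length 1) estimates a1.
l1_est = np.max(np.abs(S @ x))
a1_est = x / np.linalg.norm(x)

assert np.isclose(l1_est, np.linalg.eigh(S)[0][-1])
```

Convergence is fast when the ratio $\lambda_2/\lambda_1$ is well below 1.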
- Continue with $i = 2, \ldots, p-1$, using the matrix $S_i = S - \lambda_1\mathbf{a}_1\mathbf{a}_1' - \cdots - \lambda_{i-1}\mathbf{a}_{i-1}\mathbf{a}_{i-1}'$, whose largest eigenvalue is $\lambda_i$.

Example: Using Excel - Eigen
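This continuation step is often called deflation. A sketch of one deflation step in NumPy (hypothetical matrix; the first eigenpair is taken from `eigh` for brevity, though the power method above would serve):

```python
import numpy as np

# Hypothetical covariance matrix (same illustrative values as above).
S = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])

# Suppose the largest eigenpair (l1, a1) has already been found.
eigenvalues, eigenvectors = np.linalg.eigh(S)
l1, a1 = eigenvalues[-1], eigenvectors[:, -1]

# Deflation: subtracting l1*a1*a1' removes the largest eigenvalue, so the
# power method applied to S2 converges to the second eigenpair of S.
S2 = S - l1 * np.outer(a1, a1)

x = np.ones(S.shape[0])
for _ in range(200):
    x = S2 @ x
    x = x / np.max(np.abs(x))   # keep the largest element at 1

l2_est = np.max(np.abs(S2 @ x))
assert np.isclose(l2_est, eigenvalues[-2])
```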