Principal Components presentation

Transcript and Presenter's Notes

Title: Principal Components

1
Principal Components

As part of a study on football helmets,
scientists collected head measurements from a
number of football players.
They measured 6 different aspects of the players'
heads.

2
Principal Components

3
Principal Components

Suppose we have data that is distributed over the
plane in 3D that passes through (1,0,0), (0,1,0),
(0,0,1).
No matter which pair of axes we use, the plots
will just look like a cloud of points.
From those plots alone we will never realize that
the data actually lies on a 2-dimensional surface.
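
A minimal sketch of this example, written in the MATLAB-style syntax that the later slides use. The sample size and the use of randn to spread the points are illustrative assumptions, not part of the original presentation.

```matlab
% Illustrative data: n points on the plane through (1,0,0), (0,1,0), (0,0,1),
% which is the plane x + y + z = 1. n and the normal draws are assumptions.
n = 200;
x = randn(n, 1);
y = randn(n, 1);
z = 1 - x - y;              % forces every point onto the plane
data = [x y z];

% Scatter plot for each pair of axes
pairs = [1 2; 1 3; 2 3];
names = {'x', 'y', 'z'};
for p = 1:3
    subplot(1, 3, p);
    plot(data(:, pairs(p, 1)), data(:, pairs(p, 2)), '.');
    xlabel(names{pairs(p, 1)});
    ylabel(names{pairs(p, 2)});
end

% Every row sums to 1 (up to rounding), so the points lie on one plane
maxdev = max(abs(sum(data, 2) - 1));
```

None of the three panels shows the plane directly; only the row-sum check reveals it.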

4
Principal Components

Recall the formula for correlation in simple
linear regression.
Corr = Sxy/√(SxxSyy)
The numerator, Sxy, is proportional to the
covariance.
It measures the extent to which larger values of
X are associated with larger (or smaller) values
of Y.
If Cov = 0, then X and Y are not linearly related.
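
The correlation formula can be checked numerically. The sketch below uses a small made-up data set (the x and y values are illustrative only) and compares the hand-computed result with the built-in corrcoef and cov functions.

```matlab
% Made-up data, for illustration only
x = [1 2 3 4 5]';
y = [2.1 3.9 6.2 7.8 10.1]';
n = numel(x);

% Sums of squares and cross-products about the means
Sxx = sum((x - mean(x)).^2);
Syy = sum((y - mean(y)).^2);
Sxy = sum((x - mean(x)) .* (y - mean(y)));

r = Sxy / sqrt(Sxx * Syy);      % correlation from the formula

R = corrcoef(x, y);             % built-in; R(1,2) should match r
C = cov([x y]);                 % 2x2 covariance matrix; C(1,2) = Sxy/(n-1)
```

The last line shows what "proportional" means here: the sample covariance is Sxy divided by n-1.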

5
Principal Components

In terms of a plot, this means that Y plotted
against X is just a cloud of points.
The cloud does not tilt in either direction.

6
Principal Components

The problem with the data in the plane example is
that the values are correlated.
If we could look along the edge of the plane, then
we would see that there is no third dimension
to the data.
All the data lies in only the two dimensions of
the plane.

7
Principal Components

8
Principal Components

If we can find the axes of the ellipsoid, we can
view the data in terms of these components.
The longest axis is the direction in which the
data varies most, so it is the most interesting.
The shortest axis carries little information.

9
Principal Components

10
Principal Components

For variables x1, x2, ..., xk, define the
covariance matrix so that the (i,j) element is
Sxixj.
Since Sxixj = Sxjxi, the matrix will be symmetric.
If two variables are uncorrelated, then the
corresponding element of Cov will be zero (or
nearly so).
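
As a sketch of this definition, the covariance matrix can be built element by element and compared with the built-in cov. The random data is illustrative, and the division by n-1 is an assumption made to match the convention cov uses; the slide only states that each element is Sxixj.

```matlab
% Illustrative data: n observations of k variables
n = 50;
k = 3;
data = randn(n, k);

% Subtract each column's mean
centered = data - repmat(mean(data), n, 1);

% Fill in the (i,j) element from the cross-products of columns i and j
C = zeros(k);
for i = 1:k
    for j = 1:k
        C(i, j) = sum(centered(:, i) .* centered(:, j)) / (n - 1);
    end
end

asym = max(max(abs(C - C')));           % zero if C is symmetric
diff_builtin = max(max(abs(C - cov(data))));
```

Because the columns here are generated independently, the off-diagonal elements come out small but not exactly zero.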

11
Principal Components

If we find the eigenvectors and eigenvalues of
Cov, this will diagonalize the Cov matrix.
c = cov(data)
[v,d] = eig(c)
d is a diagonal matrix of eigenvalues.
diag(d) returns a list of the eigenvalues.
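
A minimal sketch of what "diagonalize" means here, using points on the plane x + y + z = 1 from the earlier example as illustrative data (the sample size and random draws are assumptions). Rotating the covariance matrix by the eigenvectors leaves only the eigenvalues on the diagonal, and one eigenvalue is essentially zero because the data has no spread perpendicular to the plane.

```matlab
% Illustrative data on the plane x + y + z = 1
n = 200;
x = randn(n, 1);
y = randn(n, 1);
data = [x, y, 1 - x - y];

c = cov(data);
[v, d] = eig(c);                 % columns of v are eigenvectors

% v' * c * v should equal the diagonal matrix d, up to rounding
resid = norm(v' * c * v - d);

% Re-express the centered data in eigenvector coordinates
centered = data - repmat(mean(data), n, 1);
scores = centered * v;
cs = cov(scores);                % approximately equal to d

smallest = min(abs(diag(d)));    % near zero: no third dimension
```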

12
Principal Components

13
Principal Components

14
Principal Components

The sum of the e-values equals the sum of the
variances of the individual variables, so it
measures the overall variance.
We can think of the individual e-values in terms
of what percent of the total they are.
eig() tends to return the e-values in ascending
order.
We want them in descending order.
dsort = -sort(-diag(d))
Then cumsum(dsort)/sum(dsort) tells us what
fraction of the total would be contained in the
first k PCs.
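
The sorting step can be sketched as follows. This version uses the two-output form of sort with 'descend', which is an alternative to negating twice; the extra index output lets the eigenvector columns be reordered to match. The data is the illustrative plane example again, and the variable names vsort and frac are assumptions introduced for this sketch.

```matlab
% Illustrative data on the plane x + y + z = 1
n = 200;
x = randn(n, 1);
y = randn(n, 1);
data = [x, y, 1 - x - y];

c = cov(data);
[v, d] = eig(c);

% Sort e-values in descending order and keep eigenvectors aligned
[dsort, idx] = sort(diag(d), 'descend');
vsort = v(:, idx);

% Fraction of the total contained in the first k PCs, for k = 1, 2, 3
frac = cumsum(dsort) / sum(dsort);

% The sum of the e-values matches the sum of the variances
total_var = sum(diag(c));
```

For this data the first two entries of frac already reach essentially 1, which is the numerical sign that the data is two-dimensional.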

15
Principal Components

General rule: use enough PCs to contain 80-90% of
the total.
Balance this against how many PCs that requires.
If only 2-3 PCs contain most of the total, then
our problem is a lot simpler than we thought.
Besides plots, we can use the PCs to detect
groupings of the data or outliers, say.
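
One way the rule of thumb might be applied is sketched below. The data is synthetic (six variables driven by two hidden factors plus noise) and only stands in for a data set like the head measurements; it is not the helmet data. The 0.90 threshold and all variable names are assumptions for illustration.

```matlab
% Synthetic stand-in for a 6-variable data set (NOT the helmet data):
% two hidden factors plus a little noise. All sizes are assumptions.
n = 100;
latent = randn(n, 2);
loadings = randn(2, 6);
data = latent * loadings + 0.1 * randn(n, 6);

c = cov(data);
[v, d] = eig(c);
[dsort, idx] = sort(diag(d), 'descend');
vsort = v(:, idx);
frac = cumsum(dsort) / sum(dsort);

threshold = 0.90;                    % upper end of the 80-90% rule
k = find(frac >= threshold, 1);      % smallest k that reaches the threshold

% Scores: the centered data expressed in PC coordinates
centered = data - repmat(mean(data), n, 1);
scores = centered * vsort;

% Plot the first two PCs to look for groupings or outliers
plot(scores(:, 1), scores(:, 2), '.');
xlabel('PC 1');
ylabel('PC 2');
```

Points that sit far from the main cloud in this plot are candidates for outliers, and separate clumps suggest groupings.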

16
Principal Components
