Principal Components Analysis (transcript and presenter's notes)
1
Principal Components Analysis
2
  • Principal components analysis is a method for
    re-expressing a set of variables. It is used for:
  • Converting a large set of variables to a smaller
    number of linear combinations that contain most
    of the original information
  • Exploring the dimensionality in a set of
    variables
  • Eliminating multicollinearity in a set of
    predictors
  • Identification of outliers

3
Principal components analysis creates linear
combinations of the original variables. Each
successive linear combination (a) Accounts for
as much of the variance in the original variables
as possible (b) Is independent of the previously
created linear combinations
4
The principal components represent a rotation of
the original reference axes to a new position.
The projection of the data points onto these new
axes represents the new linear combination scores.
5
Weight Vectors and Rotation . . . One more time


6
Linear combinations are created by vector
multiplication and produce a new vector. The
result of the vector multiplication is the
perpendicular projection of points onto the new
vector.
[Figure: a point a in the (x1, x2) plane is projected
perpendicularly onto a new vector w; the projection, labeled
LC, is the score that the person gets on the linear
combination.]
7
If a second linear combination is formed, and
this vector is kept orthogonal to the first, then
a rigid rotation of the original reference
vectors occurs.
[Figure: point a in the (x1, x2) plane with two orthogonal
vectors, w1 and w2, produced by a rigid rotation of the
original axes.]
The result of the matrix multiplication is to
give the projections of Point a onto these new
vectors. These are also the scores on the new
linear combinations.
8
The elements of the weight matrix, W, for
creating the two new linear combinations are the
cosines of the angles between the old reference
vectors and the new vectors. They indicate the
angle of rotation necessary for moving the old
reference axes rigidly into the new position. The
result of the matrix multiplication is to produce
the projections onto these new vectors.
    W = | cos θ11   cos θ12 |
        | cos θ21   cos θ22 |
Each column of W is a weight vector for producing
a new linear combination.
9
[Figure: the original axes (1, 2) and the new, rotated axes,
with angles of 45, 135, 45, and 225 degrees marked between
the old and new reference vectors.]

    W = | cos θ11   cos θ12 |   | .707   -.707 |
        | cos θ21   cos θ22 | = | .707    .707 |
11
Principal components analysis proceeds by seeking a
linear combination, z = Xu, a weighted combination of
the original variables, such that the variance of z has
the largest value possible, under the constraint that u
have unit length (u'u = 1).
The constraint that u have unit length is not just a
convenience; it is necessary in this case for finding a
solution.
12
The problem can be recast as maximizing u'Ru. This is
sensible because applying a vector of weights to a
variance-covariance matrix produces the variance of the
linear combination that results from applying those same
weights to the corresponding data matrix.
13
The vector of weights that satisfies the variance-
maximizing goal is called an eigenvector (u). The
variance of the resulting linear combination is called
an eigenvalue (λ). If the original correlation
matrix is of full rank (no perfect dependencies),
then there will be as many eigenvectors and
eigenvalues as there are original variables.
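To make this concrete, here is a minimal numpy sketch of the
eigendecomposition step; the data, sizes, and variable names
are illustrative, not from the slides:

    import numpy as np

    # Illustrative data: n cases by p variables
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 3))
    R = np.corrcoef(X, rowvar=False)      # p x p correlation matrix

    # Eigendecomposition of R: columns of U are the eigenvectors
    # (weight vectors u); l holds the eigenvalues (the variances
    # of the resulting linear combinations)
    l, U = np.linalg.eigh(R)
    order = np.argsort(l)[::-1]           # largest eigenvalue first
    l, U = l[order], U[:, order]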
14
The variance-covariance matrix of the principal
components is a diagonal matrix, D, with the
eigenvalues on the main diagonal: D = U'RU
15
Principal components analysis seeks to place the
orthogonal axes in such a way that the major
dimensions in the data are highlighted. Three
variable problem, with two major underlying
dimensions.
16
Principal components analysis seeks to place the
orthogonal axes in such a way that the major
dimensions in the data are highlighted. Three
variable problem, with one major underlying
dimension.
17
Principal components analysis re-expresses the
original variables. It simply rearranges the
variance, shifting it so that most is contained
in the first principal component, the next most
in the second principal component, and so
on. This means that trace(D) = trace(R). The
matrices D and R contain the same information,
but in different arrangements.
18
The proportion of variance that each principal
component accounts for in the original data can
be expressed as the ratio of its eigenvalue to the
total variance: λj / trace(R), which equals λj / p
when a correlation matrix is analyzed.
19
The determinant of R is usually difficult to compute
directly. It can be found easily from D, as the product
of the eigenvalues: |R| = |D| = λ1 λ2 ... λp.
That the determinant of R is also the determinant
of D is a reminder that principal components
analysis simply rearranges the variance.
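Continuing the numpy sketch above, both quantities are one-liners:

    # Proportion of variance each component accounts for (sums to 1)
    print(l / l.sum())
    # The determinant of R is simply the product of the eigenvalues
    print(np.prod(l), np.linalg.det(R))   # the two values agree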
20
Recall that X = Zs D W'. Any data matrix can be
decomposed into three parts:
  • A matrix of uncorrelated variables with unit
    variance
  • A transformation that changes the scale
    (stretches or shrinks)
  • An orthogonal transformation that rotates the
    reference vectors.

21
In the context of principal components, Zs is a
matrix of standardized principal components
scores, and D is the diagonal matrix that contains
the variances of the principal components (the
eigenvalues). W is a matrix that contains the
weights for creating the linear combinations. In
principal components, this matrix is symbolized as
U and contains the eigenvectors: X = Zs D½ U'
22
The formula can be rearranged to provide the
principal components scores: Zs = X U D-½. These
might be used in other statistical analyses
because of their desirable properties.
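A minimal sketch of the score computation, continuing the
numpy example above (standardization step assumed, since the
slides work with the correlation matrix):

    # Standardized principal component scores: Zs = X U D^(-1/2)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize first
    Zs = Xs @ U @ np.diag(1.0 / np.sqrt(l))

    # Their covariance matrix is (approximately) an identity matrix
    print(np.cov(Zs, rowvar=False).round(2))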
23
The meaning of the new linear combinations is
sometimes easier to grasp by examining the
correlations of the original variables with the
new linear combinations, the principal component
scores.
25
These correlations are called principal component
loadings and are found by F = U D½. The
principal component loadings are just a rescaling
of the eigenvectors.
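In the numpy sketch, this rescaling is a single matrix product:

    # Principal component loadings: F = U D^(1/2), the correlations
    # between the original variables and the components
    F = U @ np.diag(np.sqrt(l))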
26
Principal components analysis seeks to place the
orthogonal axes in such a way that the major
dimensions in the data are highlighted. Three
variable problem, with two major underlying
dimensions.
27
Principal components analysis seeks to place the
orthogonal axes in such a way that the major
dimensions in the data are highlighted. Three
variable problem, with one major underlying
dimension.
28
The proportion of variance in Xi that is
accounted for by c principal components is
calculated by summing the squared loadings for
that variable: fi1² + fi2² + ... + fic².
Each element, fij, is a principal component
loading. Why does it make sense to simply sum the
squares of these correlations to get the
proportion of variance accounted for in Xi by the
c components?
29
The proportion of variance in Xi that is
accounted for by c principal components is
sometimes called the communality. It provides an
index of how well the principal components can
reproduce each of the original variables.
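A sketch of the communality calculation, continuing the
example above (c = 2 is an arbitrary illustrative choice):

    # Communality: proportion of variance in each variable accounted
    # for by the first c components (row sums of squared loadings)
    c = 2
    h2 = (F[:, :c] ** 2).sum(axis=1)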
30
The variances of linear combinations can be
obtained by applying the weights for the linear
combinations to the covariance matrix of the
original variables. In standard score form: D = U'RU
31
This also means that the correlations among the
original variables can be recovered from the
covariance matrix of the linear combinations: R = U D U'
32
If all of the principal components are derived,
the reconstructed correlation matrix will be
exactly the original correlation matrix. But, if
only c components are derived, the reconstructed
matrix will be an estimate of the original
matrix, R, that is implied by the components:
R-hat = Fc Fc', using only the first c columns of
the loading matrix.
33
The closeness of the reconstructed matrix to the
original can be used to gauge how well the c
components capture the variance in the original
variables
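Continuing the sketch (with c as defined above), the
reconstruction and residual are:

    # Correlation matrix implied by c components, and the residual
    # matrix used to judge how well those components do
    R_hat = F[:, :c] @ F[:, :c].T
    residual = R - R_hat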
34
The correlation matrix, the reconstructed correlation
matrix, and the residual matrix all play a role in
determining how many components should be derived.
Bartlett's test of sphericity tests whether the matrix
R is an identity matrix. A modification of this test
can be applied to residual matrices to test whether
additional components should be extracted.
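A sketch of the usual chi-square approximation for Bartlett's
test, assuming the numpy objects defined above (scipy is used
for the p-value):

    from scipy.stats import chi2

    # Bartlett's test of sphericity: H0 is that R is an identity matrix
    n, p = X.shape
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = chi2.sf(statistic, df)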
35
  • Two other methods are commonly used to determine
    how many components are necessary to capture the
    information in the original variables:
  • The scree test
  • Kaiser's λ > 1.0 rule

36
Huba et al. (1981) collected data on drug use
reported by 1634 students (7th to 9th grade) in
Los Angeles. Participants rated their use on a
5-point scale: 1 = never tried, 2 = only once,
3 = a few times, 4 = many times, 5 = regularly.

37
Should the matrix be analyzed?
Bartlett's test indicates the correlation matrix
is clearly not an identity matrix
The measure of sampling adequacy can vary between 0 and
1, indicating whether there is sufficient
multicollinearity to warrant an analysis. Higher values
indicate the desirability of a principal components
analysis.
38
Two principal components account for nearly half
of the information in the original variables
39
All principal components are extracted so all of
the variance in the variables is accounted for.
40
Part of the loading matrix. How should the first
two principal components be interpreted?
41
With all components extracted, the reproduced
correlations equal the original correlations and
the residuals are zero.
42
Only two principal components are indicated by
the scree test.
43
Only two principal components are indicated by
the Kaiser rule as well.
44
With only two principal components, much less of
the variance in each variable is accounted for.
45
The loadings for the first two components do not
change when they are the only ones extracted.
46
Now the reproduced correlations deviate from the
original correlations, sometimes substantially.
47
Scores on the principal components could be
calculated using Zs = X U D-½. These might be used
as independent predictors of some other useful
outcome (e.g., school attendance, grades,
truancy, etc.), or they might be used as
uncorrelated outcomes representing overall level
of experience with drugs and alcohol and
preference for hard-core drugs. These might be
predicted by other variables (e.g., age,
stability of the family, stress, etc.).
48
Another example . . . Ofir and Simonson (2001)
collected data (N = 201) on an individual
difference measure called need for cognition.
They wanted to know if this 18-item scale was
best described by one dimension. If so, a single
composite score would be an appropriate summary
for research purposes.
49
1. I would prefer complex to simple problems.
2. I like to have the responsibility of handling a situation
   that involves a lot of thinking.
3. Thinking is not my idea of fun.
4. I would rather do something that requires little thought
   than something that is sure to challenge my thinking
   abilities.
5. I try to anticipate and avoid situations where there is a
   likely chance I will have to think in depth about something.
6. I find joy in deliberating hard and for long hours.
7. I only think as hard as I have to.
8. I prefer to think about small, daily projects to long-term
   ones.
9. I like tasks that require little thought once I've learned
   them.
10. The idea of relying on thought to make my way to the top
    appeals to me.
11. I really enjoy a task that involves coming up with new
    solutions to problems.
12. Learning new ways to think doesn't excite me very much.
13. I prefer my life to be filled with puzzles that I must
    solve.
14. The notion of thinking abstractly appeals to me.
15. I would prefer a task that is intellectual, difficult, and
    important to one that is somewhat important but does not
    require much thought.
16. I feel relief rather than satisfaction after completing a
    task that required a lot of attention.
17. It's enough for me that something gets the job done; I
    don't care how or why it works.
18. I usually end up deliberating about issues even though
    they do not affect me personally.
50
Each item is rated using the following scale: 1 = very
characteristic of me, 2 = somewhat characteristic of me,
3 = neutral, 4 = somewhat uncharacteristic of me, 5 =
very uncharacteristic of me.
51
Sample size is less than 201 due to listwise
deletion.
(Slides 52-58: no transcript)
59
The correlation matrix for the 18 items is not an
identity matrix and can be analyzed using
principal components.
60
Is a one-dimensional model reasonable?
62
How is the first principal component interpreted?
Why are some of the weights negative?
  • I would prefer complex to simple problems.
  • I like to have the responsibility of handling a
    situation that involves a lot of thinking.
  • Thinking is not my idea of fun.
  • I would rather do something that requires little
    thought than something that is sure to challenge
    my thinking abilities.
  • I try to anticipate and avoid situations where
    there is likely a chance I will have to think in
    depth about something.
  • I find joy in deliberating hard and for long
    hours.

63
The scree test suggests a single component is
required, but several additional components have
eigenvalues greater than 1.00. Are those
meaningful? Are they just error?
One way to address this question is to determine
what pattern of eigenvalues would emerge if the
data were simply random.
64
One approach, first suggested by Horn, is based
on generating a new matrix of random, normally
distributed variables. This matrix is the same
size as the original data matrix.
65
Another approach, called bootstrapping, generates
a new matrix of data by resampling with
replacement from the original data matrix. Each
data point is sampled independently (rather than
resampling intact cases).
66
The random data are then analyzed with principal
components analysis.
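A minimal sketch of both baselines, continuing the numpy
example; resampling independently within each variable is one
common reading of "each data point sampled independently," and
is assumed here because it preserves each variable's marginal
distribution:

    # Horn's approach: eigenvalues of a same-sized matrix of random,
    # normally distributed variables
    n, p = X.shape
    X_rand = rng.standard_normal((n, p))
    l_rand = np.sort(np.linalg.eigvalsh(np.corrcoef(X_rand, rowvar=False)))[::-1]

    # Bootstrap approach: resample data points independently within
    # each variable, breaking the correlations but keeping each
    # variable's marginal distribution
    X_boot = np.column_stack(
        [rng.choice(X[:, j], size=n, replace=True) for j in range(p)]
    )
    l_boot = np.sort(np.linalg.eigvalsh(np.corrcoef(X_boot, rowvar=False)))[::-1]

    # Compare the observed eigenvalues l against l_rand and l_boot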
67
If the data are truly random, then the
correlation matrix should approximate an identity
matrix.
68
The principal components analysis will capitalize
on chance relations in the data and extract some
components with eigenvalues greater than 1.00.
69
The pattern of eigenvalues for random data is
quite different from that for data with a
nonrandom structure.
70
The same logic underlies the analysis of the
bootstrap sample. It too is a random data matrix,
but has the advantage of carrying any
idiosyncratic distributional characteristics that
existed in the original data. This makes it a
potentially better baseline for comparison.
71
Because of the way the bootstrap sample is
generated, its correlation matrix should
approximate an identity matrix.
72
The analysis capitalizes on chance, providing a
clue to how many eigenvalues greater than 1.00
might occur even in the absence of any meaningful
structure.
73
This approach also produces the expected random
signature in the scree plot.
74
Clearly, only one component underlies the actual
data.
75
Principal components analysis can also be used to
screen the data for outliers, especially cases
that may not be univariate outliers but are
unusual in the multivariate sense.
A random sample of 250 cases was generated for 9
variables from a population having this
correlation structure:
76
The data are in standard score form to make
interpretation easy. One case was replaced with a
profile that made it unusual in the multivariate
sense, though not terribly deviant in the
univariate sense: Variable 1 = 3, Variable 2 = -3,
Variable 3 = 3, Variable 4 = -3, Variable 5 = 3,
Variable 6 = -3, Variable 7 = 3, Variable 8 = -3,
Variable 9 = 3.
Each value by itself is unusual. In a sample this
large, such values would be expected, but not all
for the same case.
77
Do Univariate Diagnostics Detect the Case?
(Slides 78-101: no transcript)
102
A principal components analysis will seek linear
combinations that capture the major sources of
variance in the data. Most of these will be
governed by the well-behaved data. But once
those data are captured, especially deviant
multivariate cases may dominate the smaller
components and emerge more readily. In this
approach, all components are derived and
component scores are produced. Then diagnostics
are performed on the component scores.
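A sketch of this screening idea, continuing the numpy example
(the |score| > 3 cutoff is an illustrative choice, not from
the slides):

    # Derive all components, compute standardized scores, and flag
    # cases that are extreme on any component; multivariate outliers
    # often surface on the later, small-variance components
    scores = Xs @ U @ np.diag(1.0 / np.sqrt(l))
    flagged = np.where(np.abs(scores).max(axis=1) > 3.0)[0]
    print(flagged)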
(Slides 103-121: no transcript)
122
Principal components analysis can also be used to
solve multicollinearity problems. The previously
used data file (9 variables) contains an extra
variable that can be used as an outcome variable.
123
Note the relations among the predictors (first
nine columns) and between the predictors and the
outcome variable (last column).
124
The previous outlier has been removed, N = 249.
126
If only data were really this good . . .
127
What's the problem here?
128
We could eliminate one of the predictors, perhaps
the one with the lowest tolerance.
129
That improves matters, but principal components
analysis offers an alternative.
Three components account for over three fourths
of the variance.
130
Three components are indicated and were
extracted. The principal component scores were
generated.
131
These components will track the clusters of
variables in the data, but not as nicely as an
additional rotation of the reference axes might
allow.
132
The regression analysis can be duplicated, but
with the 3 principal components scores in place
of the original 9 variables.
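A sketch of the substitution, assuming a hypothetical outcome
vector y and using np.linalg.lstsq as a stand-in for the
regression procedure on the slides (with the slides' data,
scores[:, :3] would be the first three of nine components):

    # Regress the outcome on the first three component scores instead
    # of the original, collinear predictors
    Z3 = scores[:, :3]
    A = np.column_stack([np.ones(len(Z3)), Z3])    # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # y: outcome column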
133
Does this make sense?
135
Why are the part and zero-order correlations the
same? All multicollinearity has been removed.
136
The regression coefficients are completely
independent . . .
137
Additional confirmation that the
multicollinearity has been removed . . .
138
Sometimes researchers will use principal
components analysis to determine how composite
scores should be created, but then will create
these scores as simple sums rather than optimally
weighted principal component scores. Why would it
matter?
139
  • Three non-optimal rules were used
  • (a) Use all items but add or subtract depending on
    the sign of the loading on a component.

Unit_1 = IV1 + IV2 + IV3 + IV4 + IV5 + IV6 + IV7 + IV8 + IV9
Unit_2 = -IV1 - IV2 - IV3 - IV4 + IV5 + IV6 + IV7 + IV8 + IV9
Unit_3 = IV1 + IV2 + IV3 - IV4 + IV5 + IV6 + IV7 + IV8 + IV9
140
Three non-optimal rules were used: (b) Use only
those items that load at least .30 in absolute
value.

L30_1 = IV1 + IV2 + IV3 + IV4 + IV5 + IV6 + IV7
L30_2 = -IV1 - IV2 - IV3 - IV4 + IV7 + IV8 + IV9
L30_3 = IV1 + IV2 + IV3 + IV5 + IV6 + IV8 + IV9
141
Three non-optimal rules were used: (c) Use only
those items that load at least .50 in absolute
value.

L50_1 = IV1 + IV2 + IV3 + IV4 + IV5 + IV6 + IV7
L50_2 = IV7 + IV8 + IV9
L50_3 = IV2 + IV5 + IV6
142
  • Three non-optimal rules were used
  • Use all items but add or subtract depending on
    the sign of the loading on a component.
  • Use only those items that load at least .30 in
    absolute value.
  • Use only those items that load at least .50 in
    absolute value.
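A sketch of rule (a), continuing the small numpy example (with
the slides' data this would use the nine IVs):

    # Rule (a) by hand: add or subtract each variable according to the
    # sign of its loading, then compare with the exact component score
    signs = np.sign(F[:, 0])
    unit_1 = (Xs * signs).sum(axis=1)
    print(np.corrcoef(unit_1, scores[:, 0])[0, 1])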

(Slides 143-146: no transcript)
147
Principal components only retain their desirable
properties when they are computed appropriately.
Short-cut procedures can lead to composite scores
that are no longer orthogonal and that are
missing important sources of information.
148
The use of principal components analysis has a
hidden danger when used in experimental research.
Numerous measures might be collected, and
principal components analysis is a reasonable way
to simplify the data prior to conducting the major
analyses. Imagine a researcher conducts a study
(N = 500) in which 20 different outcome measures
are collected.
149
Concerned that these would produce a large number
of significance tests and inflate the Type I
error rate, he decides to conduct a principal
components analysis to reduce the set.
150
Looks like a good plan. The correlation matrix is
clearly not an identity matrix . . .
152
Looks like a considerable reduction is possible.
Just one component appears to underlie the data.
A single principal component score could be
generated for each person and used in subsequent
analyses in place of the original 20 variables.
153
How would this component be interpreted? Any
potential problems in its derivation? Because the
data came from an experiment, there is probably
variation in the scores that is due to the
manipulation. That variation could be
artificially inflating or deflating the
correlations among the variables. It needs to be
removed before a principal components analysis is
conducted.
(Slides 154-157: no transcript)
158
Model the group (treatment) contribution and then
remove it. Analyze the residuals in a principal
components analysis.
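A sketch of this residualization, where group (dummy-coded
treatment codes) and Y (an n x 20 matrix of outcomes) are
hypothetical stand-ins for the experiment's data:

    # Model the treatment contribution, remove it, and analyze the
    # residual correlations with principal components
    G = np.column_stack([np.ones(len(Y)), group])  # group model
    H = G @ np.linalg.pinv(G)                      # projection onto model
    resid = Y - H @ Y                              # treatment removed
    R_resid = np.corrcoef(resid, rowvar=False)
    l_resid = np.sort(np.linalg.eigvalsh(R_resid))[::-1]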
160
Weak evidence for principal components . . .
161
Hmm, look familiar . . . ?
162
How about this?
164
The original principal components analysis would
have indicated that a single component, and a
single t-test, would have provided an adequate
test of group differences. When the correct
matrix (the residuals) is analyzed, there is no
evidence of any common dimensions, indicating the
need for 20 individual t-tests.
165
Principal components can be made more
interpretable by a further rotation of the axes
away from their original optimal position. The
new position (accomplished by another weight
matrix) will account for as much variance as the
original location, but distributes it in a way
that might enhance interpretation. These rotation
procedures are common in factor analysis, a close
relative of principal components that places a
stronger emphasis on underlying dimensions.