Interpreting Principal Components presentation

Title: Interpreting Principal Components

1
Interpreting Principal Components

Simon Mason
International Research Institute for Climate Prediction
The Earth Institute of Columbia University

Linking Science to Society
2
Retaining Principal Components
Principal components analysis is specifically designed as a data reduction technique. How many of the new variables should be retained to represent the total variability of the original variables adequately? A stopping rule is required to identify at which point additional principal components are no longer required.
3
Retaining Principal Components
There is a range of criteria that could be used to formulate a stopping rule.
Internal criteria
1. Total variance explained
2. Marginal variance explained
3. Comparison with other deleted/retained eigenvalues
External criteria
4. Usefulness
5. Physical interpretability
4
Retaining Principal Components
Total variance explained
Ensures a minimum loss of information, but there are no a priori criteria for defining the proportion of signal.
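As a rough illustration of this criterion, the sketch below counts how many leading components are needed to reach a chosen share of the total variance. The function name, the 0.75 target and the eigenvalues are all made up for the example; as the slide notes, nothing tells us in advance what the target should be.

```python
import numpy as np

def n_components_for_variance(eigenvalues, target=0.9):
    """Smallest number of leading components whose cumulative
    share of the total variance reaches `target` (hypothetical helper)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(lam) / lam.sum()
    # First position where the cumulative share is at least `target`.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, lam.size)

example_eigenvalues = [5.0, 2.0, 1.0, 1.0, 1.0]  # made-up values
print(n_components_for_variance(example_eigenvalues, target=0.75))  # 3
```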
5
Retaining Principal Components
Marginal variance explained
Retain only components whose variance (eigenvalue) exceeds a threshold c. Ensures that each component explains a substantial proportion of the total variance. How should c be chosen?
6
Retaining Principal Components
Marginal variance explained
1. Original variables
For the correlation matrix, the Guttman-Kaiser criterion sets c = 1. For the covariance matrix, Kaiser's rule sets c to the average variance of the original variables.
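A minimal sketch of this threshold rule, assuming the data are stored as a samples-by-variables array. The function name and the synthetic data are hypothetical. For the covariance matrix the mean eigenvalue equals the average variance of the original variables, which is what the code uses for c.

```python
import numpy as np

def kaiser_retained(data, use_correlation=True):
    """Count components whose eigenvalue exceeds the threshold c.
    c = 1 for the correlation matrix, or the average variance of the
    original variables for the covariance matrix."""
    data = np.asarray(data, dtype=float)
    if use_correlation:
        matrix = np.corrcoef(data, rowvar=False)
    else:
        matrix = np.cov(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(matrix)[::-1]  # largest first
    # Mean eigenvalue = trace / p = average variance of the variables.
    c = 1.0 if use_correlation else float(eigenvalues.mean())
    return int(np.sum(eigenvalues > c)), c

# Synthetic example: four variables sharing one common signal.
rng = np.random.default_rng(0)
signal = rng.standard_normal(200)
data = signal[:, None] + 0.1 * rng.standard_normal((200, 4))
print(kaiser_retained(data, use_correlation=True))
print(kaiser_retained(data, use_correlation=False))
```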
7
Retaining Principal Components
Marginal variance explained
2. Significant
a. The broken stick rule
b. Rule N. Randomization procedures.
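The two rules named here can be sketched as follows. This is one common reading of each rule rather than the presenter's own code: the broken stick rule compares each component's share of variance with the expected length of the corresponding piece of a randomly broken stick, and Rule N compares each eigenvalue with the eigenvalues obtained from uncorrelated random data of the same size. The function names, the 95th percentile and the number of simulations are assumptions.

```python
import numpy as np

def count_leading_true(flags):
    """Number of True values before the first False."""
    flags = np.asarray(flags, dtype=bool)
    return int(flags.size if flags.all() else np.argmin(flags))

def broken_stick_retained(eigenvalues):
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    p = lam.size
    # Expected share of the k-th longest of p pieces of a randomly broken stick.
    expected = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p
                         for k in range(1, p + 1)])
    return count_leading_true(lam / lam.sum() > expected)

def rule_n_retained(data, n_sim=200, percentile=95, seed=0):
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    rng = np.random.default_rng(seed)
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    simulated = np.empty((n_sim, p))
    for i in range(n_sim):
        # Uncorrelated Gaussian noise with the same shape as the data.
        noise = rng.standard_normal((n, p))
        simulated[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    return count_leading_true(observed > threshold)

print(broken_stick_retained([2.8, 0.6, 0.4, 0.2]))  # 1
```

Both functions stop at the first component that fails the test, so a later component that happens to pass is not retained.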
8
Retaining Principal Components
Similar variance explained
Delete a component if components with similar variance are deleted.
1. χ² approximations
2. Scree test: delete eigenvalues below the elbow.
9
Retaining Principal Components
Similar variance explained
3. Log-eigenvalue test: a scree test using logarithms of the eigenvalues. Based on the assumption that the eigenvalues should decline exponentially.
10
Retaining Principal Components
Usefulness
If principal components are to be used in other applications, retain the number that gives the best results. Use cross-validation. Perhaps retain subsets that do not necessarily include the first few components. Possibly subject to sampling errors, especially subset selection.
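One way to apply this idea is sketched below, assuming the "other application" is a regression of some target y on the leading components. The function name, the contiguous folds and the synthetic data are assumptions. The components are recomputed inside each training fold so that the held-out samples do not influence the loadings. Only leading subsets are tried here; arbitrary subsets could be searched the same way, at a greater risk of sampling error.

```python
import numpy as np

def cv_error_by_n_components(X, y, max_components, n_folds=5):
    """Cross-validated mean squared error of a regression on the
    first k principal components, for k = 1..max_components."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    errors = np.zeros(max_components)
    for test_idx in np.array_split(np.arange(n), n_folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc = X[train] - x_mean
        # Loadings from the training samples only (rows of vt).
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        for k in range(1, max_components + 1):
            loadings = vt[:k].T
            beta, *_ = np.linalg.lstsq(Xc @ loadings, y[train] - y_mean, rcond=None)
            pred = (X[test_idx] - x_mean) @ loadings @ beta + y_mean
            errors[k - 1] += np.sum((y[test_idx] - pred) ** 2)
    return errors / n

# Synthetic example: y depends on the first two directions of variability.
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 6)) * np.array([5.0, 3.0, 1.5, 1.0, 0.7, 0.5])
y = X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.standard_normal(120)
errors = cv_error_by_n_components(X, y, max_components=4)
print(errors, "best k:", int(np.argmin(errors)) + 1)
```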
11
Retaining Principal Components
Physical interpretability
1. Time scores: do the time scores differ from white noise?
2. Spatial loadings: loadings identify modes of variability.
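The slide does not say which white-noise test is intended. As one simple possibility, the sketch below checks whether the lag-1 autocorrelation of the time scores lies outside the approximate 95% range expected for white noise, about ±1.96/√n. The function names and the example series are hypothetical.

```python
import numpy as np

def lag1_autocorrelation(scores):
    x = np.asarray(scores, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

def differs_from_white_noise(scores, z=1.96):
    """True if the lag-1 autocorrelation is outside the approximate
    white-noise range of +/- z / sqrt(n)."""
    n = len(scores)
    return bool(abs(lag1_autocorrelation(scores)) > z / np.sqrt(n))

t = np.arange(500)
wave = np.sin(2 * np.pi * t / 50)                       # smooth oscillation
noise = np.random.default_rng(0).standard_normal(500)   # white noise
print(differs_from_white_noise(wave), lag1_autocorrelation(noise))
```

A single lag is a weak check; a spectrum or a test over several lags would say more about the time scale of a component.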
12
Interpreting the Principal Components
Principal components are notoriously difficult to interpret physically. The weights are defined to maximize the variance, not maximize the interpretability! With spatial data (including climate data) the interpretation becomes even more difficult because there are geometric controls on the correlations between the data points.
13
Buell patterns
Imagine a rectangular domain in which all the points are strongly correlated with their neighbours.
14
Buell patterns
The points in the middle of the domain will have the strongest average correlations with all other points, simply because their average distance to all other grid points is a minimum.
The strong correlations between neighbouring grid points will be represented by PC 1, with the central grid points dominating.
15
Buell patterns
The points in the corners of the domain will have the weakest average correlations with all other points, simply because their average distance to all other grid points is a maximum.
The weak correlations between distant grid points will be represented by PC 2. The direction of the dipole reflects the domain shape.
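The effect described on these slides can be reproduced with a small synthetic experiment. The sketch below assumes a 5 by 11 grid and a correlation that decays exponentially with distance; both the grid size and the decay length are arbitrary choices. No physical signal is present, yet the first two sets of loadings should still show a centred monopole and a dipole along the longer axis.

```python
import numpy as np

ny, nx = 5, 11        # rectangular domain, longer in x (assumed size)
decay_length = 4.0    # assumed e-folding distance of the correlation

# Coordinates of every grid point and all pairwise distances.
yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
coords = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Correlation depends on distance only: no real "mode" is built in.
corr = np.exp(-dist / decay_length)

eigenvalues, eigenvectors = np.linalg.eigh(corr)    # ascending order
pc1 = eigenvectors[:, -1].reshape(ny, nx).copy()    # largest eigenvalue
pc2 = eigenvectors[:, -2].reshape(ny, nx).copy()    # second largest
pc1 *= np.sign(pc1.sum())                           # the sign is arbitrary

np.set_printoptions(precision=2, suppress=True)
print("PC 1 loadings:\n", pc1)
print("PC 2 loadings:\n", pc2)
```

Changing `ny` and `nx` so that the domain is longer in the other direction should turn the dipole with it, which is the domain shape dependence at issue.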
16
Buell patterns?
Are these real, or are they a function of the domain shape?
17

Buell patterns
Because of domain shape dependency:
- the first PC frequently indicates positive loadings with strongest values in the centre of the domain;
- the second PC frequently indicates negative loadings on one side and positive loadings on the other side, in the direction of the longest dimension of the domain.
Similar kinds of problems arise when using:
- gridded data with converging longitudes, or simply with longitude spacing different from latitude spacing;
- station data.

18
Rotation
The principal component weights are defined to maximize the variance, not maximize the interpretability! The weights could be redefined to meet alternative criteria. Rotation is sometimes performed to maximize the weights of as many variables as possible, and to minimize the weights of the others. An objective of rotation is to attain simple structure:
1. weights are either close to zero or close to one;
2. variables have high weights on only one component.
20
Rotation

Commonly used rotation procedures include:
- Varimax: maximises the variance of the squared loadings.
- Quartimin: oblique rotation.
- Procrustes: maximises the similarity between one set of loadings and a target set. Can be orthogonal or oblique.
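To make the varimax criterion concrete, here is a minimal sketch of the widely used SVD-based iteration for raw varimax (no Kaiser row normalisation). It is an illustration, not the presenter's code. The loading matrix is made up: a simple-structure matrix is deliberately mixed by a 30 degree rotation, and varimax is then asked to undo the mixing.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(np.asarray(loadings) ** 2, axis=0)))

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal rotation that increases the varimax criterion.
    Returns the rotated loadings and the rotation matrix."""
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient of the criterion, projected back onto rotations via SVD.
        grad = A.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        d = s.sum()
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
        d_old = d
    return A @ R, R

# Made-up simple structure: each variable loads mainly on one component.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.1],
                   [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
theta = np.pi / 6
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
mixed = simple @ mix

rotated, R = varimax(mixed)
np.set_printoptions(precision=2, suppress=True)
print(rotated)
print(varimax_criterion(mixed), varimax_criterion(rotated))
```

The rotation is orthogonal, so the total variance explained by the retained components is unchanged; only its distribution among them changes. The rotated columns may come back in a different order or with flipped signs.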

21
Rotation
Rotation does NOT solve Buell pattern problems, nor station and uneven gridded data problems; it only reduces them. What if a mode does not have simple structure, for example a general warming trend? These problems are only of concern for interpretation. Rotation may be redundant if the principal components are used as input into some other procedure.
