Canonical Correlation - PowerPoint PPT Presentation

About This Presentation
Title: Canonical Correlation

Slides: 120
Provided by: artsci6

Transcript and Presenter's Notes



1
Canonical Correlation
2
Canonical correlation analysis attempts to
simultaneously meet the goals of multiple
correlation and principal components analysis. It
finds the linear combinations of variables in two
sets that are maximally correlated across sets
but orthogonal within sets.
3
X1 X2 X3 X4 . . . Xq
Y1 Y2 Y3 Y4 . . . Yp
What is the best way to understand how the
variables in these two sets are related?
4
  • Bivariate correlations across sets
  • Multiple correlations across sets
  • Principal components within sets, then correlations
    between principal components across sets

5
X1 X2 X3 X4 . . . Xq
Y1 Y2 Y3 Y4 . . . Yp
What linear combinations of the X variables (u)
and the Y variables (t) will maximize their
correlation?
6
$b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + \dots + b_qX_q = u$
$a_1Y_1 + a_2Y_2 + a_3Y_3 + a_4Y_4 + \dots + a_pY_p = t$
What linear combinations of the X variables (u)
and the Y variables (t) will maximize their
correlation?
7
If $X$ and $Y$ are in standard score form, with $u = Xb$
and $t = Ya$, then find $a$ and $b$ to maximize $r_{t,u}$
while
9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
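One standard way to write this maximization and its solution is sketched below. The notation is assumed here rather than taken from the slides: $R_{xx}$ and $R_{yy}$ are the within-set correlation matrices, $R_{yx}$ is the between-set correlation matrix (with $R_{xy} = R_{yx}^{\top}$), $n$ is the number of cases, $\mu$ and $\nu$ are Lagrange multipliers, and the unit-variance constraints are the usual normalization.

```latex
\begin{aligned}
r_{t,u} &= \tfrac{1}{n}\, t^{\top} u = a^{\top} R_{yx}\, b,
  \qquad \text{subject to } a^{\top} R_{yy}\, a = 1,\; b^{\top} R_{xx}\, b = 1 \\
\mathcal{L} &= a^{\top} R_{yx}\, b
  - \tfrac{\mu}{2}\left(a^{\top} R_{yy}\, a - 1\right)
  - \tfrac{\nu}{2}\left(b^{\top} R_{xx}\, b - 1\right) \\
\frac{\partial \mathcal{L}}{\partial a} = 0 &\;\Rightarrow\; R_{yx}\, b = \mu\, R_{yy}\, a,
  \qquad
\frac{\partial \mathcal{L}}{\partial b} = 0 \;\Rightarrow\; R_{xy}\, a = \nu\, R_{xx}\, b \\
\mu &= \nu = r_{t,u} \qquad (\text{premultiply by } a^{\top} \text{ and } b^{\top}) \\
R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}\, b &= r_{t,u}^{2}\, b,
  \qquad a = \frac{1}{r_{t,u}}\, R_{yy}^{-1} R_{yx}\, b
\end{aligned}
```

The eigenvalues of this matrix are squared canonical correlations; the "Eigenvalue" column in the SPSS output later in the presentation reports a different quantity, $r^2/(1 - r^2)$.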
12
The correlation between the two linear combinations
is called the canonical correlation and is the largest
possible correlation that can be found between
linear combinations of the two sets. The weights (a and b) that
are used to create the linear combinations are
called the standardized canonical coefficients.
The linear combinations created are called the
canonical variates.
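A minimal NumPy sketch of the computation, assuming complete numeric data with cases in rows. It whitens each set with a Cholesky factor and takes a singular value decomposition, a common numerical route to the canonical correlations; it is not necessarily what SPSS does internally, and the function name `canonical_correlation` and the simulated data are illustrative.

```python
import numpy as np


def canonical_correlation(X, Y):
    """Canonical correlations and standardized coefficients for two variable sets.

    X is n x q and Y is n x p (cases in rows). Returns (r, b, a, Zx, Zy): r holds
    the canonical correlations in decreasing order, the columns of b and a are
    standardized weights for X and Y, and Zx, Zy are the z-scored data.
    """
    n = X.shape[0]
    Zx = (X - X.mean(axis=0)) / X.std(axis=0)   # standard scores (divide by n)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    Rxx, Ryy, Rxy = Zx.T @ Zx / n, Zy.T @ Zy / n, Zx.T @ Zy / n

    # Whitening matrices: Kx @ Rxx @ Kx.T = I and Ky @ Ryy @ Ky.T = I
    Kx = np.linalg.inv(np.linalg.cholesky(Rxx))
    Ky = np.linalg.inv(np.linalg.cholesky(Ryy))

    # Singular values of the whitened cross-correlation matrix = canonical correlations
    U, s, Vt = np.linalg.svd(Kx @ Rxy @ Ky.T)
    k = min(X.shape[1], Y.shape[1])             # number of canonical variates
    b = Kx.T @ U[:, :k]                         # weights for X: u = Zx @ b
    a = Ky.T @ Vt.T[:, :k]                      # weights for Y: t = Zy @ a
    return s[:k], b, a, Zx, Zy


# Simulated example: three X variables and two Y variables sharing one factor
rng = np.random.default_rng(0)
n = 300
factor = rng.normal(size=n)
X = np.column_stack([factor + rng.normal(size=n) for _ in range(3)])
Y = np.column_stack([factor + rng.normal(size=n) for _ in range(2)])

r, b, a, Zx, Zy = canonical_correlation(X, Y)
u, t = Zx @ b, Zy @ a                           # canonical variates in standard score form
print(np.round(r, 3))
print(np.round(u.T @ t / n, 3))                 # diagonal holds r, off-diagonal is zero
```

Both properties named at the start of the presentation can be checked here: `u.T @ u / n` is an identity matrix (orthogonal within sets), and `u.T @ t / n` is diagonal with the canonical correlations on the diagonal.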
13
Additional canonical variates and their
correlations can be found provided they satisfy
15
The extraction of canonical variates can continue
up to a maximum defined by the number of measures
in the smaller of the two sets.
16
The standardized canonical coefficients (a and b)
are interpreted in the same way as standardized
regression coefficients in multiple
regression: they indicate the unique contribution
of a variable to the linear combination. It is
also possible to derive the correlations between
each variable and the linear combination. These
are called canonical loadings and are interpreted
the same way as loadings in principal components.
17
These loadings can be calculated as
As in principal components analysis, the loadings
can assist in understanding the nature of the
linear combinations in each set.
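One standard way to compute the loadings, assumed here, uses the fact that when the variables and the variate are in standard score form, the loadings equal the within-set correlation matrix times the standardized weights, $L = R_{xx}\,b$. The sketch below checks that identity against directly computed correlations on made-up data. The weights are arbitrary rather than canonical, because the identity holds for any linear combination scaled to unit variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 4
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, k))   # made-up correlated variables
Z = (X - X.mean(axis=0)) / X.std(axis=0)                 # standard scores
Rxx = Z.T @ Z / n                                        # within-set correlation matrix

b = np.array([0.5, -0.2, 0.8, 0.1])                      # hypothetical standardized weights
b = b / np.sqrt(b @ Rxx @ b)                             # rescale so the variate has unit variance
u = Z @ b                                                # the linear combination

loadings_formula = Rxx @ b                               # L = Rxx b
loadings_direct = np.array([np.corrcoef(Z[:, i], u)[0, 1] for i in range(k)])
print(np.round(loadings_formula, 3))
```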
18
Fader and Lodish (1990) collected data for 331
different grocery products. They sought
relations between what they called structural
variables and promotional variables. The
structural variables were characteristics not
likely to be changed by short-term promotional
activities. The promotional variables represented
promotional activities. The major goal was to
determine if different promotional activities
were associated with different types of grocery
products.
19
Structural variables (X)
PENET   Percentage of households making at least one category purchase
PCYCLE  Average interpurchase time
PRICE   Average dollars spent in the category per purchase occasion
PVTSH   Combined market share for all private-label and generic products
PURHH   Average number of purchase occasions per household during the year
20
Promotional variables (Y)
FEAT    Percent of volume sold on feature (advertised in local newspaper)
DISP    Percent of volume sold on display (e.g., end of aisle)
PCUT    Percent of volume sold at a temporary reduced price
SCOUP   Percent of volume purchased using a retailer's store coupon
MCOUP   Percent of volume purchased using a manufacturer's coupon
21
Product      PENET  PURHH  PCYCLE  PRICE  PVTSH  FEAT  DISP  PCUT  SCOUP  MCOUP
BEER          62.3   11.1      46   5.16     .4    19    32    27      1      1
WINE          42.9    5.8      59   4.58    1.0    14    26     8      0      1
FRESH BREAD   98.6   26.6      21   1.30   39.4    12     4    15      1      2
CUPCAKES      27.4    2.5      60   1.11    3.5     4    10    10      1      4
22
(No Transcript)
23
In SPSS, canonical correlation analysis must be obtained
with syntax statements:
MANOVA penet purhh pcycle price pvtsh with feat disp pcut scoup mcoup
  /print signif(multiv dimenr eigen stepdown univ hypoth) error(cor)
  /discrim raw stan cor alpha(1).
24
Test Name     Value  Approx. F  Hypoth. DF  Error DF  Sig. of F
Pillais      .73057   11.12256       25.00   1625.00       .000
Hotellings  1.09732   14.01931       25.00   1597.00       .000
Wilks        .41262   12.85124       25.00   1193.96       .000
Roys         .41271
These tests indicate whether there is any
significant relationship between the two sets of
variables. They do not indicate how many sets of
linear combinations are significant. With
5 variables in each set, up to 5 sets
of linear combinations could be derived.
These tests tell us only that at least the first one is
significant.
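All four statistics are functions of the canonical correlations. The sketch below recomputes them from the rounded "Sq. Cor" column of the eigenvalue table for this example, using the standard definitions (assumed here). Roy's statistic is taken as the largest squared canonical correlation, which is consistent with the Roys value of .41271 matching the first squared canonical correlation of .413.

```python
import math

# Squared canonical correlations from the Fader and Lodish output ("Sq. Cor" column)
r2 = [0.413, 0.234, 0.070, 0.013, 0.001]

pillai = sum(r2)                              # Pillai's trace
hotelling = sum(x / (1 - x) for x in r2)      # Hotelling's trace (sum of eigenvalues)
wilks = math.prod(1 - x for x in r2)          # Wilks' lambda
roy = max(r2)                                 # Roy's largest root, expressed as r^2

print(f"Pillai {pillai:.4f}  Hotelling {hotelling:.4f}  Wilks {wilks:.4f}  Roy {roy:.4f}")
```

Because the inputs are rounded to three decimals, the results agree with the SPSS table only to about two or three decimals.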
25
$R_a$ has an approximate F distribution with $pq$ and
$(1 + ts - .5pq)$ degrees of freedom.
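$R_a$ is Rao's F approximation to Wilks' lambda. The sketch below uses the standard definitions $t = n - 3/2 - (p+q)/2$ and $s = \sqrt{(p^2q^2 - 4)/(p^2 + q^2 - 5)}$, with $s = 1$ when the denominator is not positive. These definitions are assumptions, but with $n = 331$ and $p = q = 5$ they reproduce the Error DF of 1193.96 and the F of 12.85 in the output. The function name `rao_f` and its argument `k` are introduced here: `k` counts canonical roots already removed, which covers the later "k+1 TO 5" rows of a dimension reduction table by shrinking $p$ and $q$ while keeping $t$ fixed.

```python
import math


def rao_f(wilks, n, p, q, k=0):
    """Rao's F approximation for Wilks' lambda.

    n = number of cases, p and q = number of variables in the two sets,
    k = number of canonical roots already removed (0 for the overall test).
    Returns (F, df1, df2).
    """
    pk, qk = p - k, q - k                  # dimensions left after removing k roots
    t = n - 1.5 - (p + q) / 2              # computed from the full p and q
    denom = pk ** 2 + qk ** 2 - 5
    s = math.sqrt((pk ** 2 * qk ** 2 - 4) / denom) if denom > 0 else 1.0
    df1 = pk * qk
    df2 = 1 + t * s - 0.5 * df1            # 1 + ts - .5pq
    root = wilks ** (1 / s)
    F = (1 - root) / root * df2 / df1
    return F, df1, df2


F, df1, df2 = rao_f(0.41262, n=331, p=5, q=5)
print(f"F = {F:.3f}, df = {df1}, {df2:.2f}")
```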
26
Eigenvalues and Canonical Correlations
Root No.  Eigenvalue    Pct.  Cum. Pct.  Canon Cor.  Sq. Cor
1               .703  64.040     64.040        .642     .413
2               .305  27.790     91.830        .483     .234
3               .075   6.877     98.708        .265     .070
4               .013   1.198     99.906        .114     .013
5               .001    .094    100.000        .032     .001
The canonical correlations are extracted in
decreasing size. At each step they represent the
largest correlation possible between linear
combinations in the two sets, provided the linear
combinations are independent of any previously
derived linear combinations.
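The "Eigenvalue" and "Pct." columns are transformations of the canonical correlations: each eigenvalue is $r^2/(1 - r^2)$ (this relation is assumed here, and it matches the table, e.g. .413/.587 ≈ .703), and "Pct." is each eigenvalue's share of their total. The sketch below recomputes the columns from the rounded correlations, so it agrees with the output only to about two decimal places.

```python
from itertools import accumulate

# Canonical correlations from the Fader and Lodish output ("Canon Cor." column)
r = [0.642, 0.483, 0.265, 0.114, 0.032]

eigen = [x ** 2 / (1 - x ** 2) for x in r]      # "Eigenvalue" = r^2 / (1 - r^2)
total = sum(eigen)
pct = [100 * e / total for e in eigen]          # "Pct." column
cum = list(accumulate(pct))                     # "Cum. Pct." column

for root, (rc, e, p, c) in enumerate(zip(r, eigen, pct, cum), start=1):
    print(f"{root}  {e:6.3f}  {p:7.3f}  {c:8.3f}  {rc:5.3f}  {rc ** 2:5.3f}")
```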
28
Dimension Reduction Analysis
Roots     Wilks L.         F  Hypoth. DF  Error DF  Sig. of F
1 TO 5      .41262  12.85124       25.00   1193.96       .000
2 TO 5      .70257   7.53593       16.00    984.36       .000
3 TO 5      .91682   3.17374        9.00    786.25       .001
4 TO 5      .98600   1.14582        4.00    648.00       .334
5 TO 5      .99897    .33534        1.00    325.00       .563
Procedures for testing the significance of the
canonical correlations can be applied
sequentially. At each step, the test indicates
whether any significant relationship remains
between the two sets after the earlier roots are
removed. In this case, the first three tests are
significant (1 TO 5, 2 TO 5, and 3 TO 5), so
three sets of linear combinations can be formed.
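Each row of the dimension reduction table is a Wilks' lambda built only from the roots that remain. Assuming the usual formula $\Lambda_k = \prod_{j \ge k}(1 - r_j^2)$, the sketch below reproduces the "Wilks L." column from the rounded squared canonical correlations; each lambda would then be converted to an approximate F to obtain the significance level.

```python
import math

# Squared canonical correlations from the Fader and Lodish output
r2 = [0.413, 0.234, 0.070, 0.013, 0.001]
reported = [0.41262, 0.70257, 0.91682, 0.98600, 0.99897]   # "Wilks L." column

# Row "k TO 5" keeps only roots k, k+1, ..., 5
sequential = [math.prod(1 - x for x in r2[k:]) for k in range(len(r2))]

for k, (lam, rep) in enumerate(zip(sequential, reported), start=1):
    print(f"{k} TO 5: computed {lam:.5f}, reported {rep:.5f}")
```

Removing roots can only raise lambda toward 1, which is why later rows become progressively less significant.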
29
As in principal components, identifying the
number of significant sets of linear combinations
is just the beginning. The nature of those linear
combinations must also be determined. This
requires interpreting the canonical weights and
loadings.
30
The linear combinations can be formed using the
variables in their original metrics. Sometimes
this makes it easier to understand the role a
particular variable plays because the metric is
well understood.
Raw canonical coefficients for DEPENDENT variables (Function No.)
Variable        1       2       3       4       5
PENET        .036   -.018    .016    .016    .011
PURHH       -.073   -.013   -.175    .072   -.329
PCYCLE      -.012   -.031   -.019    .049   -.020
PRICE        .198   -.838   -.417   -.299    .305
PVTSH        .000    .024   -.061    .002    .039
31
More typically the linear combinations are formed
after the variables have been standardized. The
weights are then interpreted as standardized
regression coefficients and the resulting linear
combinations are in standard score form.
Standardized canonical coefficients for DEPENDENT variables (Function No.)
Variable        1       2       3       4       5
PENET       1.066   -.527    .484    .483    .326
PURHH       -.307   -.055   -.737    .304  -1.382
PCYCLE      -.262   -.695   -.417   1.104   -.455
PRICE        .208   -.883   -.439   -.315    .321
PVTSH        .000    .359   -.898    .024    .576
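Raw and standardized coefficients describe the same linear combination: a standardized weight is the raw weight multiplied by the variable's standard deviation. That is consistent with PENET, a percentage, having a raw weight of only .036 but a standardized weight of 1.066 on the first function. The sketch below checks the identity on made-up data; the variables and weights are hypothetical, and centering the raw scores only shifts the variate by a constant.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120
# Made-up raw scores on very different scales (columns play the role of X variables)
X = np.column_stack([
    rng.normal(60, 25, n),    # a percentage-like variable
    rng.normal(10, 4, n),     # a count-like variable
    rng.normal(3, 1.5, n),    # a dollar-like variable
])
mean, sd = X.mean(axis=0), X.std(axis=0)

b_std = np.array([1.0, -0.3, 0.2])       # hypothetical standardized weights
b_raw = b_std / sd                       # raw weight = standardized weight / SD

u_from_std = ((X - mean) / sd) @ b_std   # weights applied to standard scores
u_from_raw = (X - mean) @ b_raw          # weights applied to centered raw scores
print(np.round(b_raw, 4))
```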
32
The loadings provide information about the
bivariate relationship between each variable and
each linear combination.
Correlations between DEPENDENT and canonical variables (Function No.)
Variable        1       2       3       4       5
PENET        .956    .114   -.042    .223   -.145
PURHH        .555    .148   -.389   -.207   -.690
PCYCLE      -.582   -.320    .060    .697    .263
PRICE       -.011   -.769   -.285   -.569    .059
PVTSH        .336    .465   -.705    .245    .337
33
The same coefficients exist for the other set of
variables.
Raw canonical coefficients for COVARIATES (Function No.)
COVARIATE       1       2       3       4       5
FEAT         .083   -.151   -.058   -.232    .215
DISP         .044    .011    .108    .091    .074
PCUT         .021    .199    .037    .079   -.247
SCOUP       -.015   -.385   -.788   1.124   -.268
MCOUP        .022   -.079    .043   -.003   -.057
34
Standardized canonical coefficients for COVARIATES (CAN. VAR.)
COVARIATE       1       2       3       4       5
FEAT         .637  -1.160   -.448  -1.780   1.649
DISP         .318    .077    .770    .653    .532
PCUT         .164   1.530    .281    .611  -1.898
SCOUP       -.014   -.362   -.740   1.056   -.252
MCOUP        .202   -.728    .400   -.029   -.523
35
Correlations between COVARIATES and canonical variables (CAN. VAR.)
Covariate       1       2       3       4       5
FEAT         .939    .073   -.293   -.157    .046
DISP         .730    .136    .384    .412    .362
PCUT         .896    .321   -.184   -.063   -.238
SCOUP        .617   -.167   -.614    .462   -.024
MCOUP        .156   -.717    .427   -.069   -.523
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
What if the linear combinations had been formed
within each set separately and then correlated
across sets? What might justify this approach?
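One way to explore this question is to compare the two approaches on simulated data. The sketch below assumes that the within-set composites are first principal components (the helper `first_pc_scores` is hypothetical) and builds a made-up example in which the dominant dimension of the X set is unrelated to Y, while a single X variable carries the cross-set relationship. The first canonical correlation can never be smaller than the correlation between any other pair of composites, principal components included. The principal-component composites, however, summarize each set on its own terms, which is one possible justification for that approach.

```python
import numpy as np


def standardize(A):
    return (A - A.mean(axis=0)) / A.std(axis=0)


def first_pc_scores(Z):
    """Scores on the first principal component of standardized data Z."""
    R = Z.T @ Z / Z.shape[0]
    _, vecs = np.linalg.eigh(R)           # eigenvalues in ascending order
    return Z @ vecs[:, -1]                # weights of the largest component


def largest_canonical_corr(Zx, Zy):
    n = Zx.shape[0]
    Rxx, Ryy, Rxy = Zx.T @ Zx / n, Zy.T @ Zy / n, Zx.T @ Zy / n
    Kx = np.linalg.inv(np.linalg.cholesky(Rxx))
    Ky = np.linalg.inv(np.linalg.cholesky(Ryy))
    return np.linalg.svd(Kx @ Rxy @ Ky.T, compute_uv=False)[0]


rng = np.random.default_rng(3)
n = 400
g = rng.normal(size=n)                    # factor shared by both sets
h = rng.normal(size=n)                    # factor found only in the X set
X = np.column_stack([h + 0.5 * rng.normal(size=n) for _ in range(3)]
                    + [g + rng.normal(size=n)])
Y = np.column_stack([g + rng.normal(size=n) for _ in range(3)])
Zx, Zy = standardize(X), standardize(Y)

r_pc = abs(np.corrcoef(first_pc_scores(Zx), first_pc_scores(Zy))[0, 1])
r_cc = largest_canonical_corr(Zx, Zy)
print(f"first PCs correlated: {r_pc:.3f}   first canonical correlation: {r_cc:.3f}")
```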
44
penet purhh pcycle price pvtsh
45
feat disp pcut scoup mcoup
46
(No Transcript)
47
  • Crosby, Evans, and Cowles (1990) examined the
    impact of relationship quality on the outcome of
    insurance sales. They examined relationship
    characteristics and outcomes for 151
    transactions.
  • Relationship Characteristics
      • Appearance similarity
      • Lifestyle similarity
      • Status similarity
      • Interaction intensity
      • Mutual disclosure
      • Cooperative intentions

48
  • Crosby, Evans, and Cowles (1990) examined the
    impact of relationship quality on the outcome of
    insurance sales. They examined relationship
    characteristics and outcomes for 151
    transactions.
  • Outcomes
      • Trust in the salesperson
      • Satisfaction with the salesperson
      • Cross-sell
      • Total insurance sales

49
(No Transcript)
50
Matrix data Variables = rowtype_ trust satis cross total appear life status interact mutual coop.
Begin data
N 151 151 151 151 151 151 151 151 151 151
Mean 0 0 0 0 0 0 0 0 0 0
STDDEV 1 1 1 1 1 1 1 1 1 1
Corr 1.00
corr .63 1.00
corr .28 .22 1.00
corr .23 .24 .51 1.00
corr .38 .33 .29 .20 1.00
corr .42 .28 .36 .39 .57 1.00
corr .37 .30 .39 .29 .48 .59 1.00
corr .30 .36 .21 .18 .15 .29 .30 1.00
corr .45 .37 .31 .39 .29 .41 .35 .44 1.00
corr .56 .56 .24 .29 .18 .33 .30 .46 .63 1.00
end data.
51
Variable labels trust 'Trust in the salesperson'
  satis 'Satisfaction with the salesperson'
  cross 'Cross-sell'
  total 'Total insurance sales'
  appear 'Appearance similarity'
  life 'Lifestyle similarity'
  status 'Status similarity'
  interact 'Interaction intensity'
  mutual 'Mutual disclosure'
  coop 'Cooperative intentions' .
MANOVA trust satis cross total with appear life status interact mutual coop
  /matrix=in(*)
  /print signif(multiv dimenr eigen stepdown univ hypoth) error(cor)
  /discrim raw stan cor alpha(1).
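Because the analysis needs only the correlation matrix, the canonical correlations can also be recomputed outside SPSS. The sketch below rebuilds the matrix from the MATRIX DATA lines and computes the canonical correlations with NumPy, using a Cholesky-and-SVD route rather than SPSS's own code. If the transcribed matrix is exactly what SPSS analyzed, the values should agree with the "Canon Cor." column of the output that follows to about three decimals.

```python
import numpy as np

# Lower triangle of the correlation matrix, in the order
# trust satis cross total (outcomes), then appear life status interact mutual coop
lower = [
    [1.00],
    [.63, 1.00],
    [.28, .22, 1.00],
    [.23, .24, .51, 1.00],
    [.38, .33, .29, .20, 1.00],
    [.42, .28, .36, .39, .57, 1.00],
    [.37, .30, .39, .29, .48, .59, 1.00],
    [.30, .36, .21, .18, .15, .29, .30, 1.00],
    [.45, .37, .31, .39, .29, .41, .35, .44, 1.00],
    [.56, .56, .24, .29, .18, .33, .30, .46, .63, 1.00],
]
R = np.zeros((10, 10))
for i, row in enumerate(lower):
    R[i, :len(row)] = row
R = R + R.T - np.eye(10)              # mirror the lower triangle; the diagonal stays 1

Ryy, Rxx, Ryx = R[:4, :4], R[4:, 4:], R[:4, 4:]
Ky = np.linalg.inv(np.linalg.cholesky(Ryy))
Kx = np.linalg.inv(np.linalg.cholesky(Rxx))
canon = np.linalg.svd(Ky @ Ryx @ Kx.T, compute_uv=False)
print(np.round(canon, 3))             # canonical correlations
print(np.round(canon ** 2, 3))        # squared canonical correlations
```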
52
Multivariate Tests of Significance (S = 4, M = 1/2, N = 69 1/2)
Test Name     Value  Approx. F  Hypoth. DF  Error DF  Sig. of F
Pillais      .73301    5.38481       24.00    576.00       .000
Hotellings  1.35153    7.85574       24.00    558.00       .000
Wilks        .37940    6.57954       24.00    493.10       .000
Roys         .52771
There is at least one significant relationship
between the two sets of measures. With 6 and 4
measures in the two sets, at most 4 sets of
linear combinations can be formed.
53
Eigenvalues and Canonical Correlations
Root No.  Eigenvalue    Pct.  Cum. Pct.  Canon Cor.  Sq. Cor
1              1.117  82.672     82.672        .726     .528
2               .176  13.050     95.722        .387     .150
3               .050   3.706     99.428        .218     .048
4               .008    .572    100.000        .088     .008
54
Dimension Reduction Analysis
Roots     Wilks L.        F  Hypoth. DF  Error DF  Sig. of F
1 TO 4      .37940  6.57954       24.00    493.10       .000
2 TO 4      .80331  2.15996       15.00    392.40       .007
3 TO 4      .94500  1.02566        8.00    286.00       .417
4 TO 4      .99233   .37087        3.00    144.00       .774
Two of the four possible sets of linear
combinations are significant.
55
Standardized canonical coefficients for DEPENDENT variables (Function No.)
Variable        1       2       3       4
TRUST       -.543    .317   -.390   1.082
SATIS       -.364   -.936    .103   -.816
CROSS       -.186    .148   1.160    .057
TOTAL       -.239    .721   -.672   -.597
56
Correlations between DEPENDENT and canonical variables (Function No.)
Variable        1       2       3       4
TRUST       -.879   -.065   -.155    .447
SATIS       -.804   -.530   -.048   -.265
CROSS       -.540    .399    .731   -.124
TOTAL       -.546    .645   -.145   -.515
57
Standardized canonical coefficients for COVARIATES (CAN. VAR.)
COVARIATE       1       2       3       4
APPEAR      -.268   -.561    .342    .552
LIFE        -.164    .833   -.467    .138
STATUS      -.156    .128    .906   -.007
INTERACT    -.049   -.379    .361   -.853
MUTUAL      -.128    .749   -.209   -.441
COOP        -.603   -.773   -.566    .408
58
Correlations between COVARIATES and canonical variables (CAN. VAR.)
Covariate       1       2       3       4
APPEAR      -.589   -.003    .402    .445
LIFE        -.674    .531    .095    .155
STATUS      -.622    .267    .660    .052
INTERACT    -.517   -.209    .196   -.739
MUTUAL      -.729    .319   -.182   -.345
COOP        -.855   -.263   -.353   -.120
59
  • Remaining issues
      • How much variance is really accounted for?
      • How easily does the procedure capitalize on chance?
      • How are canonical correlations cross-validated?
      • Can the results be rotated?

60
How much variance is really accounted
for? Relying on the canonical correlations as
evidence of variance accounted for across sets of
variables can be misleading. Each linear
combination captures only a portion of the
variance in its own set, and that needs to be taken
into account when judging the variance accounted
for across sets.
61
The squared canonical correlation indicates the
shared variance between linear combinations from
the two sets.
62
Each linear combination accounts for only a
portion of the variance in the variables in its
set.
63
Redundancy coefficients indicate the proportion
of variance in the variables of the opposite set
that is accounted for by the linear combination.
64
Canonical Loadings
Adequacy Coefficients
Canonical communality coefficients
65
Redundancy coefficients are defined as the
product of adequacy coefficients and the square
of canonical correlations.
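In symbols, with $L_{ij}$ the loading of variable $i$ on canonical variate $j$ of the same set, $k$ the number of variables in that set, and $r_{c_j}$ the $j$th canonical correlation (notation assumed here), the coefficients can be written as follows.

```latex
\begin{aligned}
\text{Adequacy}_j &= \frac{1}{k}\sum_{i=1}^{k} L_{ij}^{2} \\
\text{Redundancy}_j &= \text{Adequacy}_j \times r_{c_j}^{2} \\
\text{Total redundancy} &= \sum_{j} \text{Redundancy}_j
\end{aligned}
```

The two sets share each $r_{c_j}^{2}$ but not their adequacy coefficients, so the redundancy of one set given the other is generally not symmetric.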
66
Fader and Lodish (1990) collected data for 331
different grocery products. They sought
relations between what they called structural
variables and promotional variables. The
structural variables were characteristics not
likely to be changed by short-term promotional
activities. The promotional variables represented
promotional activities. The major goal was to
determine if different promotional activities
were associated with different types of grocery
products.
67
Structural variables (X)
PENET   Percentage of households making at least one category purchase
PCYCLE  Average interpurchase time
PRICE   Average dollars spent in the category per purchase occasion
PVTSH   Combined market share for all private-label and generic products
PURHH   Average number of purchase occasions per household during the year
68
Promotional variables (Y)
FEAT    Percent of volume sold on feature (advertised in local newspaper)
DISP    Percent of volume sold on display (e.g., end of aisle)
PCUT    Percent of volume sold at a temporary reduced price
SCOUP   Percent of volume purchased using a retailer's store coupon
MCOUP   Percent of volume purchased using a manufacturer's coupon
69
Product      PENET  PURHH  PCYCLE  PRICE  PVTSH  FEAT  DISP  PCUT  SCOUP  MCOUP
BEER          62.3   11.1      46   5.16     .4    19    32    27      1      1
WINE          42.9    5.8      59   4.58    1.0    14    26     8      0      1
FRESH BREAD   98.6   26.6      21   1.30   39.4    12     4    15      1      2
CUPCAKES      27.4    2.5      60   1.11    3.5     4    10    10      1      4
70
Test Name     Value  Approx. F  Hypoth. DF  Error DF  Sig. of F
Pillais      .73057   11.12256       25.00   1625.00       .000
Hotellings  1.09732   14.01931       25.00   1597.00       .000
Wilks        .41262   12.85124       25.00   1193.96       .000
Roys         .41271
These tests indicate whether there is any
significant relationship between the two sets of
variables. They do not indicate how many sets of
linear combinations are significant. With
5 variables in each set, up to 5 sets
of linear combinations could be derived.
These tests tell us only that at least the first one is
significant.
71
$R_a$ has an approximate F distribution with $pq$ and
$(1 + ts - .5pq)$ degrees of freedom.
72
Eigenvalues and Canonical Correlations
Root No.  Eigenvalue    Pct.  Cum. Pct.  Canon Cor.  Sq. Cor
1               .703  64.040     64.040        .642     .413
2               .305  27.790     91.830        .483     .234
3               .075   6.877     98.708        .265     .070
4               .013   1.198     99.906        .114     .013
5               .001    .094    100.000        .032     .001
The canonical correlations are extracted in
decreasing size. At each step they represent the
largest correlation possible between linear
combinations in the two sets, provided the linear
combinations are independent of any previously
derived linear combinations.
74
Dimension Reduction Analysis
Roots     Wilks L.         F  Hypoth. DF  Error DF  Sig. of F
1 TO 5      .41262  12.85124       25.00   1193.96       .000
2 TO 5      .70257   7.53593       16.00    984.36       .000
3 TO 5      .91682   3.17374        9.00    786.25       .001
4 TO 5      .98600   1.14582        4.00    648.00       .334
5 TO 5      .99897    .33534        1.00    325.00       .563
Procedures for testing the significance of the
canonical correlations can be applied
sequentially. At each step, the test indicates
whether any significant relationship remains
between the two sets after the earlier roots are
removed. In this case, the first three tests are
significant (1 TO 5, 2 TO 5, and 3 TO 5), so
three sets of linear combinations can be formed.
75
The standardized canonical coefficients are the
weights applied to standardized variables to
create the new linear combinations.
Standardized canonical coefficients for DEPENDENT variables (Function No.)
Variable        1       2       3       4       5
PENET       1.066   -.527    .484    .483    .326
PURHH       -.307   -.055   -.737    .304  -1.382
PCYCLE      -.262   -.695   -.417   1.104   -.455
PRICE        .208   -.883   -.439   -.315    .321
PVTSH        .000    .359   -.898    .024    .576
76
The loadings provide information about the
bivariate relationship between each variable and
each linear combination.
Correlations between DEPENDENT and canonical variables (Function No.)
Variable        1       2       3       4       5
PENET        .956    .114   -.042    .223   -.145
PURHH        .555    .148   -.389   -.207   -.690
PCYCLE      -.582   -.320    .060    .697    .263
PRICE       -.011   -.769   -.285   -.569    .059
PVTSH        .336    .465   -.705    .245    .337
77
Standardized canonical coefficients for COVARIATES (CAN. VAR.)
COVARIATE       1       2       3       4       5
FEAT         .637  -1.160   -.448  -1.780   1.649
DISP         .318    .077    .770    .653    .532
PCUT         .164   1.530    .281    .611  -1.898
SCOUP       -.014   -.362   -.740   1.056   -.252
MCOUP        .202   -.728    .400   -.029   -.523
78
Correlations between COVARIATES and canonical variables (CAN. VAR.)
Covariate       1       2       3       4       5
FEAT         .939    .073   -.293   -.157    .046
DISP         .730    .136    .384    .412    .362
PCUT         .896    .321   -.184   -.063   -.238
SCOUP        .617   -.167   -.614    .462   -.024
MCOUP        .156   -.717    .427   -.069   -.523
79
Variance in dependent variables explained by canonical variables
CAN. VAR.  Pct Var DE  Cum Pct DE  Pct Var CO  Cum Pct CO
1              33.462      33.462      13.810      13.810
2              18.895      52.357       4.415      18.226
3              14.708      67.065       1.032      19.258
4              19.263      86.328        .250      19.508
5              13.672     100.000        .014      19.522
Correlations between DEPENDENT and canonical variables (Function No.)
Variable        1
PENET        .956
PURHH        .555
PCYCLE      -.582
PRICE       -.011
PVTSH        .336
Variance in covariates explained by canonical variables
CAN. VAR.  Pct Var DE  Cum Pct DE  Pct Var CO  Cum Pct CO
1              21.654      21.654      52.467      52.467
2               3.127      24.781      13.382      65.849
3               1.159      25.940      16.521      82.371
4                .108      26.048       8.337      90.708
5                .010      26.058       9.292     100.000
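Adequacy (33.462) times the squared canonical
correlation (.413) gives the redundancy coefficient (13.810).
Adequacy for variate 1: $\left(\sum_{i} L_{i,1}^{2}\right)/k$, where $k$ is the number of variables in the set
Adequacy Coefficients: Pct Var DE for the dependent variables, Pct Var CO for the covariates
Redundancy Coefficients: Pct Var CO for the dependent variables, Pct Var DE for the covariates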
81
Any given loading can be squared to indicate the
proportion of the variance in that variable that
is accounted for by that canonical variate. The
sum of the squared loadings for a given variable
indicates the total proportion of variance
accounted for by the collection of canonical
variates. The average of the squared loadings for
a canonical variate is the adequacy coefficient
and indicates the proportion of variance in the
collection of variables that is accounted for by
the canonical variate. The redundancy coefficient
is the proportion of variance in a set of
variables that is accounted for by a linear
combination from the other set. The sum of the
redundancy coefficients gives the total
proportion of variance in one set that is
accounted for by the other set. These totals
usually differ between the two sets.
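The quantities just described can be recomputed from the printed loadings. The sketch below uses the function-1 loadings from the Fader and Lodish output (rounded to three decimals), so results agree with the "Pct Var" columns only to about three decimals, expressed as proportions rather than percents. Because each set has five variables and five variates are extracted, the squared loadings of any one variable across all five variates should sum to about 1; PENET is used to check this. The helper name `adequacy` is illustrative.

```python
# Loadings on the first canonical variates (Fader and Lodish output)
structural = {"PENET": .956, "PURHH": .555, "PCYCLE": -.582, "PRICE": -.011, "PVTSH": .336}
promotional = {"FEAT": .939, "DISP": .730, "PCUT": .896, "SCOUP": .617, "MCOUP": .156}
r2 = 0.41271          # squared first canonical correlation

# PENET's loadings on all five structural variates
penet_all = [.956, .114, -.042, .223, -.145]


def adequacy(loadings):
    """Average squared loading: share of a set's variance captured by its own variate."""
    return sum(x * x for x in loadings.values()) / len(loadings)


ad_x, ad_y = adequacy(structural), adequacy(promotional)
rd_x, rd_y = ad_x * r2, ad_y * r2          # redundancy = adequacy x squared canonical correlation
communality_penet = sum(x * x for x in penet_all)

print(f"structural:  adequacy {ad_x:.3f}, redundancy {rd_x:.3f}")
print(f"promotional: adequacy {ad_y:.3f}, redundancy {rd_y:.3f}")
print(f"PENET, all five variates: {communality_penet:.3f}")
```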
82
How easily does the procedure capitalize on
chance? Canonical correlation analysis has
elements of two procedures, principal components
analysis and multiple regression analysis, that
can capitalize on chance. It is important to
gauge how susceptible canonical correlation
analysis is to this problem.
83
A sample of 500 cases was generated, each with 10
variables drawn from random normal distributions
(μ = 100, σ = 10). The first 5 variables are
considered one set; the remaining 5 variables are
considered the other set.
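A minimal sketch of this simulation, assuming NumPy's random generator with an arbitrary seed, so the numbers will not match the SPSS output exactly. The helper name is illustrative, and the cutoff of about .088 for a single correlation is a large-sample approximation to the two-tailed .05 critical value for n = 500.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 500
data = rng.normal(loc=100, scale=10, size=(n, 10))   # 10 independent N(100, 10) variables
X, Y = data[:, :5], data[:, 5:]                      # first 5 vs. remaining 5


def canonical_correlations(X, Y):
    """Canonical correlations (largest first) via whitening and an SVD."""
    Zx = (X - X.mean(axis=0)) / X.std(axis=0)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    m = X.shape[0]
    Rxx, Ryy, Rxy = Zx.T @ Zx / m, Zy.T @ Zy / m, Zx.T @ Zy / m
    Kx = np.linalg.inv(np.linalg.cholesky(Rxx))
    Ky = np.linalg.inv(np.linalg.cholesky(Ryy))
    return np.linalg.svd(Kx @ Rxy @ Ky.T, compute_uv=False)


r = canonical_correlations(X, Y)
cross = np.corrcoef(data, rowvar=False)[:5, 5:]     # the 25 cross-set correlations
cutoff = 1.96 / np.sqrt(n - 2)                       # approx. two-tailed .05 cutoff for r
print("canonical correlations:", np.round(r, 3))
print("cross-set correlations beyond the cutoff:", int((np.abs(cross) > cutoff).sum()))
```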
84
A few correlations are significant by chance
alone . . .
85
MANOVA x1 x2 x3 x4 x5 with y1 y2 y3 y4 y5
  /print signif(multiv dimenr eigen univ) error(cor)
  /discrim raw stan cor alpha(1).
86
Multivariate Tests of Significance (S = 5, M = -1/2, N = 244)
Test Name     Value  Approx. F  Hypoth. DF  Error DF  Sig. of F
Pillais      .05241    1.04651       25.00   2470.00       .400
Hotellings   .05346    1.04436       25.00   2442.00       .403
Wilks        .94845    1.04568       25.00   1821.77       .401
Roys         .02742
The overall test of significance indicates that
there are no linear combinations that can be
formed between the two sets that would provide a
significant association.
87
Eigenvalues and Canonical Correlations
Root No.  Eigenvalue    Pct.  Cum. Pct.  Canon Cor.  Sq. Cor
1               .028  52.738     52.738        .166     .027
2               .015  28.362     81.100        .122     .015
3               .006  11.294     92.394        .077     .006
4               .004   7.561     99.955        .063     .004
5               .000    .045    100.000        .005     .000
The largest canonical correlation represents the
best attempt to make sense out of the random
associations. That this value is so small
indicates that the procedure will not generate
strong associations where none are known to exist.
88
Dimension Reduction Analysis
Roots     Wilks L.        F  Hypoth. DF  Error DF  Sig. of F
1 TO 5      .94845  1.04568       25.00   1821.77       .401
2 TO 5      .97519   .77454       16.00   1500.67       .716
3 TO 5      .98997   .55211        9.00   1197.55       .837
4 TO 5      .99595   .50066        4.00    986.00       .735
5 TO 5      .99998   .01187        1.00    494.00       .913
Given the overall nonsignificant test of
association, the dimension reduction analysis is
unnecessary: there is no evidence that any set of
linear combinations provides a nonrandom association
between the sets.
89
Standardized canonical coefficients for DEPENDENT variables (Function No.)
Variable        1       2       3       4       5
X1          -.912   -.216    .111    .061   -.333
X2          -.174    .980   -.013    .114   -.132
X3          -.368    .025    .046   -.305    .884
X4           .139   -.117    .935    .306    .130
X5           .042    .052    .343   -.857   -.394
Correlations between DEPENDENT and canonical variables (Function No.)
Variable        1       2       3       4       5
X1          -.902   -.219    .108    .137   -.328
X2          -.182    .966    .116    .126   -.065
X3          -.350    .097    .069   -.374    .850
X4           .097    .005    .933    .330    .105
X5           .070    .081    .332   -.889   -.297
Although there are no significant canonical
correlations, individual weights and correlations
can be sizeable. Why?
90
Standardized canonical coefficients for COVARIATES (CAN. VAR.)
COVARIATE       1       2       3       4       5
Y1          -.729    .348    .173   -.175    .547
Y2          -.412   -.621   -.648    .170   -.091
Y3           .042   -.591    .612    .389    .375
Y4          -.427   -.128    .411   -.301   -.742
Y5           .195   -.317   -.065   -.899    .244
Correlations between COVARIATES and canonical variables (CAN. VAR.)
Covariate       1       2       3       4       5
Y1          -.784    .293    .109   -.140    .518
Y2          -.484   -.597   -.623    .145   -.022
Y3           .025   -.645    .629    .278    .333
Y4          -.434   -.167    .467   -.261   -.705
Y5           .217   -.393   -.024   -.850    .275
91
Variance in dependent variables explained by canonical variables
CAN. VAR.  Pct Var DE  Cum Pct DE  Pct Var CO  Cum Pct CO
1              19.686      19.686        .540        .540
2              19.943      39.629        .298        .838
3              20.215      59.844        .121        .959
4              21.474      81.318        .086       1.045
5              18.682     100.000        .000       1.046
Variance in covariates explained by canonical variables
CAN. VAR.  Pct Var DE  Cum Pct DE  Pct Var CO  Cum Pct CO
1                .595        .595      21.717      21.717
2                .311        .906      20.807      42.524
3                .122       1.028      20.268      62.792
4                .073       1.101      18.165      80.957
5                .000       1.101      19.043     100.000
The linear combinations will faithfully reproduce
the variance of the variables within sets. But,
because the data are random, each set accounts
for trivial variance in the other set.
92
Provided the significance tests are used to guide
decisions about the presence of canonical
correlations, the procedure will not unfairly
capitalize on chance. Nonetheless, like other
statistical procedures, our faith in the
conclusions is likely to be bolstered
considerably with cross-validation.
93
How are canonical correlations cross-validated?
The most convincing approach to cross-validation
requires a calibration sample and a hold-out
sample. The calibration sample is analyzed and
the canonical coefficients derived. Those
coefficients are then applied to the hold-out
sample. A standard canonical correlation analysis
is also conducted on the hold-out sample and the
correlations among the actual and estimated
canonical variates are computed.
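The sketch below follows this calibration/hold-out procedure on simulated data. The one-factor population, the sample sizes, and the helper names `cca`, `zscore`, and `draw_sample` are assumptions for illustration; they are not the Crosby, Evans, and Cowles data. Because the sign of a canonical variate is arbitrary, agreement between estimated and actual variates is measured as an absolute correlation.

```python
import numpy as np


def zscore(A):
    return (A - A.mean(axis=0)) / A.std(axis=0)


def cca(X, Y):
    """Canonical correlations r and standardized weights b (for X) and a (for Y)."""
    Zx, Zy = zscore(X), zscore(Y)
    n = X.shape[0]
    Rxx, Ryy, Rxy = Zx.T @ Zx / n, Zy.T @ Zy / n, Zx.T @ Zy / n
    Kx = np.linalg.inv(np.linalg.cholesky(Rxx))
    Ky = np.linalg.inv(np.linalg.cholesky(Ryy))
    U, s, Vt = np.linalg.svd(Kx @ Rxy @ Ky.T)
    k = len(s)
    return s, Kx.T @ U[:, :k], Ky.T @ Vt.T[:, :k]


rng = np.random.default_rng(7)


def draw_sample(n):
    """Hypothetical population: 4 X variables and 3 Y variables sharing one factor."""
    f = rng.normal(size=n)
    X = np.column_stack([f + rng.normal(size=n) for _ in range(4)])
    Y = np.column_stack([0.8 * f + rng.normal(size=n) for _ in range(3)])
    return X, Y


Xc, Yc = draw_sample(250)   # calibration sample
Xh, Yh = draw_sample(250)   # hold-out sample

r_cal, b_cal, a_cal = cca(Xc, Yc)     # weights derived in the calibration sample
r_hold, b_hold, a_hold = cca(Xh, Yh)  # standard analysis of the hold-out sample

# Estimated variates: calibration weights applied to the standardized hold-out data
u_est, t_est = zscore(Xh) @ b_cal[:, 0], zscore(Yh) @ a_cal[:, 0]
# Actual variates: the hold-out sample's own weights
u_act, t_act = zscore(Xh) @ b_hold[:, 0], zscore(Yh) @ a_hold[:, 0]

r_cross = np.corrcoef(u_est, t_est)[0, 1]            # cross-validated canonical correlation
agree_u = abs(np.corrcoef(u_est, u_act)[0, 1])
agree_t = abs(np.corrcoef(t_est, t_act)[0, 1])
print(f"calibration r = {r_cal[0]:.3f}, hold-out r = {r_hold[0]:.3f}, cross-validated r = {r_cross:.3f}")
print(f"estimated vs. actual variates: X {agree_u:.3f}, Y {agree_t:.3f}")
```

The cross-validated correlation can never exceed the hold-out sample's own first canonical correlation, because the hold-out analysis already finds the maximum; how far it falls short indicates how much the calibration weights capitalized on chance.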
94
  • The data from Crosby, Evans, and Cowles (1990)
    were used as the basis for the cross-validation.
    In that study, characteristics of insurance sales
    transactions were measured.
  • Relationship Characteristics
      • Appearance similarity
      • Lifestyle similarity
      • Status similarity
      • Interaction intensity
      • Mutual disclosure
      • Cooperative intentions
  • Outcomes
      • Trust in the salesperson
      • Satisfaction with the salesperson
      • Cross-sell
      • Total insurance sales

95
In the original study, the variables had the
following correlation matrix:
For the cross-validation, two separate samples of
250 cases were generated, each having this
correlation matrix as the population correlation
matrix. One sample was then used as the
calibration sample and the other as the
hold-out sample.
96
Multivariate Tests of Significance (S = 4, M = 1/2, N = 119)
Test Name     Value  Approx. F  Hypoth. DF  Error DF  Sig. of F
Pillais      .88474   11.50214       24.00    972.00       .000
Hotellings  1.63071   16.20515       24.00    954.00       .000
Wilks        .31038   13.92044       24.00    838.47       .000
Roys         .54378
Eigenvalues and Canonical Correlations
Root No.  Eigenvalue    Pct.  Cum. Pct.  Canon Cor.  Sq. Cor
1              1.192  73.093     73.093        .737     .544
2               .353  21.648     94.741        .511     .261
3               .078   4.784     99.525        .269     .072
4               .008    .475    100.000        .088     .008
Sample 1: Calibration Sample
97
Dimension Reduction Analysis
Roots     Wilks L.         F  Hypoth. DF  Error DF  Sig. of F
1 TO 4      .31038  13.92044       24.00    838.47       .000
2 TO 4      .68034   6.64488       15.00    665.70       .000
3 TO 4      .92050   2.55832        8.00    484.00       .010
4 TO 4      .99231    .62784        3.00    243.00       .598
Sample 1: Calibration Sample
98
Standardized canonical coefficients for DEPENDENT variables

              Function No.
Variable      1       2       3
TRUST       -.458    .649    .722
SATIS       -.499   -.991   -.148
CROSS       -.167   -.152   -.955
TOTAL       -.156    .821   -.034

Correlations between DEPENDENT and canonical variables

              Function No.
Variable      1       2       3
TRUST       -.874    .151    .331
SATIS       -.885   -.392    .065
CROSS       -.506    .151   -.795
TOTAL       -.471    .653   -.330

Sample 1: Calibration Sample
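The weights and the correlations (structure coefficients, or loadings) can differ sharply: in Sample 1, TRUST has a weight of -.458 on the first function but a loading of -.874, because the loading also reflects TRUST's correlations with the other outcomes. For standardized weights scaled so that each variate has unit variance, the loadings are simply the within-set correlation matrix times the weights. The sketch below checks that identity on hypothetical simulated data with an arbitrary unit-variance composite; the name `structure_coefficients` is our own.

```python
import numpy as np

def structure_coefficients(r_within, weights):
    """Correlations of each variable with each canonical variate.

    `r_within` is the within-set correlation matrix; `weights` holds
    standardized coefficients (one column per function), scaled so that
    every variate has unit variance.
    """
    return r_within @ weights

# Hypothetical check on simulated data: any unit-variance composite obeys the identity.
rng = np.random.default_rng(0)
y = rng.normal(size=(250, 4)) @ rng.normal(size=(4, 4))   # four correlated variables
z = (y - y.mean(axis=0)) / y.std(axis=0, ddof=1)          # standard scores
r_yy = np.corrcoef(z, rowvar=False)

a = rng.normal(size=(4, 1))
a /= np.sqrt(a.T @ r_yy @ a)                              # rescale to unit variance
t = z @ a[:, 0]                                           # the composite (variate)

loadings = structure_coefficients(r_yy, a)[:, 0]
direct = np.array([np.corrcoef(z[:, j], t)[0, 1] for j in range(4)])
print(np.round(loadings, 3), np.round(direct, 3))
```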
99
Standardized canonical coefficients for COVARIATES

              CAN. VAR.
COVARIATE     1       2       3
APPEAR      -.378   -.399    .364
LIFESTYL    -.019    .764   -.258
STATUS      -.192   -.161   -.719
INTENSIT    -.150   -.507   -.531
MUTUAL      -.070    .865    .087
COOP        -.587   -.481    .767

Correlations between COVARIATES and canonical variables

              CAN. VAR.
Covariate     1       2       3
APPEAR      -.634    .008   -.014
LIFESTYL    -.626    .504   -.325
STATUS      -.632    .060   -.623
INTENSIT    -.594   -.281   -.416
MUTUAL      -.670    .542    .098
COOP        -.837   -.034    .318

Sample 1: Calibration Sample
100
Multivariate Tests of Significance (S = 4, M = 1/2, N = 119)

Test Name     Value     Approx. F   Hypoth. DF   Error DF   Sig. of F
Pillais       .78926     9.95559    24.00        972.00     .000
Hotellings   1.53353    15.23948    24.00        954.00     .000
Wilks         .34466    12.47535    24.00        838.47     .000
Roys          .56014

Eigenvalues and Canonical Correlations

Root No.   Eigenvalue   Pct.     Cum. Pct.   Canon Cor.   Sq. Cor
1          1.273        83.042    83.042     .748         .560
2           .159        10.400    93.442     .371         .138
3           .100         6.490    99.932     .301         .091
4           .001          .068   100.000     .032         .001

Sample 2: Hold-out Sample
101
Dimension Reduction Analysis

Roots     Wilks L.   F          Hypoth. DF   Error DF   Sig. of F
1 TO 4    .34466     12.47535   24.00        838.47     .000
2 TO 4    .78356      4.09951   15.00        665.70     .000
3 TO 4    .90854      2.97232    8.00        484.00     .003
4 TO 4    .99896       .08406    3.00        243.00     .969

Sample 2: Hold-out Sample
102
Standardized canonical coefficients for DEPENDENT variables

              Function No.
Variable      1       2       3
TRUST       -.556   -.066    .299
SATIS       -.329    .722    .010
CROSS       -.034   -.060  -1.193
TOTAL       -.401   -.822    .558

Correlations between DEPENDENT and canonical variables

              Function No.
Variable      1       2       3
TRUST       -.878    .248    .074
SATIS       -.790    .527   -.052
CROSS       -.501   -.285   -.816
TOTAL       -.586   -.752    .008

Sample 2: Hold-out Sample
103
Standardized canonical coefficients for COVARIATES

              CAN. VAR.
COVARIATE     1       2       3
APPEAR      -.247    .717   -.467
LIFESTYL    -.184   -.967    .726
STATUS      -.040   -.208  -1.040
INTENSIT    -.105    .240   -.143
MUTUAL      -.061   -.682    .037
COOP        -.688    .756    .457

Correlations between COVARIATES and canonical variables

              CAN. VAR.
Covariate     1       2       3
APPEAR      -.560    .019   -.472
LIFESTYL    -.648   -.579   -.100
STATUS      -.550   -.350   -.701
INTENSIT    -.526    .058   -.073
MUTUAL      -.692   -.293    .099
COOP        -.906    .185    .239

Sample 2: Hold-out Sample
104
The patterns of canonical correlations, weights,
and loadings are similar across samples, but the
best evidence for cross-validation comes from a
close correspondence between canonical variate
scores calculated in the hold-out sample using
that sample's own weights and using the weights
from the calibration sample.
105
Four sets of canonical scores need to be
calculated (Z = standard scores, W = standardized
canonical coefficients, superscript = sample):
  • Actual dependent canonical variate scores: $Z_{DV}^{(2)} W_{DV}^{(2)}$
  • Actual covariate canonical variate scores: $Z_{CV}^{(2)} W_{CV}^{(2)}$
  • Estimated dependent canonical variate scores: $Z_{DV}^{(2)} W_{DV}^{(1)}$
  • Estimated covariate canonical variate scores: $Z_{CV}^{(2)} W_{CV}^{(1)}$
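The four score sets, and the correlations between actual and estimated scores, can be sketched as follows. This is a minimal sketch under stated assumptions: the canonical weights come from the SVD of Ryy^(-1/2) Ryx Rxx^(-1/2), a standard formulation that is not necessarily SPSS's internal algorithm; the helper names are hypothetical; and the data are simulated with one shared latent factor rather than taken from the study. Because the sign of each canonical function is arbitrary, the absolute value of each correlation is what matters.

```python
import numpy as np

def standardize(a):
    # Standard scores (column means 0, SDs 1) within one sample
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def inv_sqrt(m):
    # Inverse symmetric square root of a positive-definite correlation matrix
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca_weights(y, x):
    """Standardized canonical coefficients for y and x (one column per
    function) plus the canonical correlations."""
    zy, zx = standardize(y), standardize(x)
    n = len(zy)
    ryy = zy.T @ zy / (n - 1)
    rxx = zx.T @ zx / (n - 1)
    ryx = zy.T @ zx / (n - 1)
    ryy_h, rxx_h = inv_sqrt(ryy), inv_sqrt(rxx)
    u, s, vt = np.linalg.svd(ryy_h @ ryx @ rxx_h, full_matrices=False)
    return ryy_h @ u, rxx_h @ vt.T, s

def cross_validate(y1, x1, y2, x2):
    """Sample 1 = calibration, sample 2 = hold-out. Returns, per function,
    r(actual, estimated) for the DV and covariate variates."""
    w_dv1, w_cv1, _ = cca_weights(y1, x1)
    w_dv2, w_cv2, _ = cca_weights(y2, x2)
    z_dv2, z_cv2 = standardize(y2), standardize(x2)
    actual_dv, actual_cv = z_dv2 @ w_dv2, z_cv2 @ w_cv2   # hold-out weights
    est_dv, est_cv = z_dv2 @ w_dv1, z_cv2 @ w_cv1         # calibration weights
    k = actual_dv.shape[1]
    r_dv = np.array([np.corrcoef(actual_dv[:, i], est_dv[:, i])[0, 1] for i in range(k)])
    r_cv = np.array([np.corrcoef(actual_cv[:, i], est_cv[:, i])[0, 1] for i in range(k)])
    return r_dv, r_cv

# Hypothetical data: one shared latent factor links 4 outcomes and 6 predictors.
rng = np.random.default_rng(7)

def simulate(n):
    f = rng.normal(size=(n, 1))
    y = 0.8 * f + 0.6 * rng.normal(size=(n, 4))
    x = 0.8 * f + 0.6 * rng.normal(size=(n, 6))
    return y, x

y1, x1 = simulate(250)   # calibration sample
y2, x2 = simulate(250)   # hold-out sample
r_dv, r_cv = cross_validate(y1, x1, y2, x2)
print(np.round(np.abs(r_dv), 3), np.round(np.abs(r_cv), 3))
```

With only one real dimension in these simulated data, the first function would be expected to cross-validate well while later functions, which fit sampling noise, typically do not.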
106
(No Transcript)
107
(No Transcript)
108
(No Transcript)
109
(No Transcript)
110
(No Transcript)
111
Can the results be rotated? Given the resemblance
to principal components analysis, it might seem
sensible to rotate the variates to make
interpretation easier. This can be done, with
some restrictions and knowledge of how it alters
the nature of the analysis. It must be kept in
mind that any rotation will destroy one key
feature of the original analysis: that successive
pairs of linear combinations have maximum
correlations.
112
An orthogonal rotation to simple structure is
usually done on the structure matrix from one of
the sets. This resembles rotation to simple
structure in principal components analysis.
Varimax is the usual method. The same
transformation is then applied to the other
structure matrix. This may not produce simple
structure in the other matrix, but it preserves
one important feature of the original analysis:
the total amount of variance in one set accounted
for by the other set (i.e., the total redundancy
is unchanged, although it is distributed
differently across the rotated variates).
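The rotation and the redundancy argument can be sketched numerically, using the calibration-sample DV structure matrix (functions 1 to 3) and canonical correlations from the tables above. Assumptions: varimax is implemented with the common SVD-based iteration, which may not match SPSS's routine in sign or column order; and the DV cross-loadings on the covariate variates are taken as the structure matrix times the canonical correlations, which follows from the canonical equations Ryx b = r Ryy a. Rotating the covariate variates by the same matrix leaves the sum of squared cross-loadings, and hence the total redundancy, unchanged.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Kaiser's varimax via the usual SVD iteration.
    Returns the rotated loadings and the orthogonal transformation matrix."""
    p, k = loadings.shape
    t = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        lam = loadings @ t
        target = lam ** 3 - lam @ np.diag((lam ** 2).sum(axis=0)) / p
        u, sv, vt = np.linalg.svd(loadings.T @ target)
        t = u @ vt
        crit_old, crit = crit, sv.sum()
        if crit_old > 0 and crit / crit_old < 1 + tol:
            break
    return loadings @ t, t

# DV structure matrix (functions 1-3) and canonical correlations, Sample 1
s_dv = np.array([[-.874,  .151,  .331],    # TRUST
                 [-.885, -.392,  .065],    # SATIS
                 [-.506,  .151, -.795],    # CROSS
                 [-.471,  .653, -.330]])   # TOTAL
r = np.array([.737, .511, .269])

rot_dv, t = varimax(s_dv)

# Assumed identity: DV cross-loadings on the covariate variates = s_dv * r.
# Rotating the covariate variates with the same t post-multiplies them by t.
cross_before = s_dv * r
cross_after = cross_before @ t

redund_before = (cross_before ** 2).mean(axis=0)   # per-variate redundancy
redund_after = (cross_after ** 2).mean(axis=0)
print(np.round(redund_before, 3), round(redund_before.sum(), 4))
print(np.round(redund_after, 3), round(redund_after.sum(), 4))
```

The per-variate redundancies change after rotation, but their sum does not, which is the sense in which the redundancies are preserved.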
113
The transformation matrix can be applied to the
original weights to get the new weights for the
rotated canonical variates. Because the
transformation is orthogonal, the new variates
will also be uncorrelated within sets. If the
same transformation is done for both sets, then
the correlations among the new canonical variates
will preserve the information contained in the
original canonical variates, but distribute it
differently.
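In matrix terms, if A and B hold the original standardized weights and T is orthogonal, the rotated variates have within-set correlations T'(A'Ryy A)T = T'T = I, and cross-set correlations T'(A'Ryx B)T = T'DT, where D is the diagonal matrix of canonical correlations. The sketch below uses the calibration-sample canonical correlations and a hypothetical 30° rotation (any orthogonal T would do) to show what "distribute it differently" means: off-diagonal correlations appear and the leading correlation drops below its maximum, while the trace and the sum of squared correlations are unchanged.

```python
import numpy as np

r = np.array([.737, .511, .269])     # calibration-sample canonical correlations
d = np.diag(r)                       # cross-set correlations of the unrotated variates

theta = np.deg2rad(30.0)             # hypothetical rotation angle, illustration only
c, s = np.cos(theta), np.sin(theta)
t = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])      # orthogonal: t.T @ t == I

# Within sets: (W t)' R (W t) = t' I t = I, so rotated variates stay uncorrelated.
within = t.T @ np.eye(3) @ t
# Across sets: correlations among the rotated variates.
rotated = t.T @ d @ t
print(np.round(rotated, 3))
```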
114
MANOVA trust,satis,cross,total with
  appear,lifestyl,status,intensit,mutual,coop
  /print signif(multiv dimenr eigen)
  /discrim raw stan cor rotate.
115
VARIMAX rotated correlations between canonical
variables and DEPENDENT variables

              Can. Var.
Variable      1       2       3
TRUST        .398    .285    .432
SATIS        .382    .768    .112
CROSS        .875    .345    .189
TOTAL        .409    .130    .465

Transformation Matrix

        1       2       3
1     -.278   -.601   -.749
2      .011    .778   -.628
3     -.961    .183    .209
This transformation matrix is applied to the
original matrices of canonical weights to get the
new weights. Those weights can be applied to the
correlation matrix relating the sets of variables
to produce the new canonical correlations.
116
VARIMAX rotated correlations between canonical
variables and COVARIATES

              Can. Var.
Covariate     1       2       3
APPEAR       .846    .039    .360
LIFESTYL     .946    .185   -.087
STATUS       .155    .958    .212
INTENSIT     .110    .356    .816
MUTUAL      -.026   -.001    .077
COOP         .100   -.046   -.018

Transformation Matrix

        1       2       3
1     -.855   -.310   -.417
2     -.460    .080    .884
3      .241   -.947    .211
117
If maximum interpretability is desired within
sets, then an alternative approach would be to
conduct principal components analyses within sets
followed by multiple regression analyses between
sets.
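The alternative strategy can be sketched as follows: a minimal sketch, assuming unrotated principal components of each set's correlation matrix (a varimax step could be added for interpretability) and ordinary least squares from the predictor-set components to each outcome-set component. The helper names, the number of retained components, and the simulated data are all hypothetical.

```python
import numpy as np

def principal_components(data, n_comp):
    """Unit-variance scores on the first `n_comp` principal components of
    the correlation matrix (unrotated)."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(vals)[::-1][:n_comp]        # largest eigenvalues first
    return z @ vecs[:, order] / np.sqrt(vals[order])

def r_squared(targets, predictors):
    """R^2 from an OLS regression (with intercept) of each target column
    on all predictor columns."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    resid = targets - design @ coef
    ss_tot = ((targets - targets.mean(axis=0)) ** 2).sum(axis=0)
    return 1 - (resid ** 2).sum(axis=0) / ss_tot

# Hypothetical data: 4 outcomes and 6 predictors sharing one latent factor.
rng = np.random.default_rng(3)
f = rng.normal(size=(250, 1))
y = 0.8 * f + 0.6 * rng.normal(size=(250, 4))
x = 0.8 * f + 0.6 * rng.normal(size=(250, 6))

y_pc = principal_components(y, 2)      # components within the outcome set
x_pc = principal_components(x, 2)      # components within the predictor set
r2 = r_squared(y_pc, x_pc)             # one R^2 per outcome component
print(np.round(r2, 3))
```

Unlike canonical correlation, the components here are chosen to summarize each set on its own, so they are easier to interpret within sets but are not guaranteed to maximize the between-set correlations.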
118
  • Assumptions and other odds and ends
  • Interval level data
  • Linear relations
  • Homoskedasticity
  • Low measurement error
  • Unrestricted variances
  • Low multicollinearity

119
  • Assumptions and other odds and ends
  • Similar distributions for all measures
  • Multivariate normality for significance tests
  • Sufficient sample size (20 times as many cases as
    variables to interpret the first canonical
    correlation; 40 to 60 times as many cases as
    variables to interpret more than one canonical
    correlation)
  • No outliers
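The sample-size rule of thumb is simple arithmetic; the sketch below applies it to the cross-validation example, assuming "variables" means all ten variables across both sets (4 outcomes plus 6 relationship characteristics). The function name `cases_needed` is hypothetical.

```python
def cases_needed(n_vars, n_functions=1):
    """Rule-of-thumb (low, high) number of cases.

    One canonical correlation: 20 cases per variable.
    More than one: 40 to 60 cases per variable.
    """
    if n_functions <= 1:
        return 20 * n_vars, 20 * n_vars
    return 40 * n_vars, 60 * n_vars

n_vars = 4 + 6                             # outcomes + relationship characteristics
print(cases_needed(n_vars))                # (200, 200)
print(cases_needed(n_vars, n_functions=2)) # (400, 600)
```

By this rule, each 250-case sample in the cross-validation example is large enough to interpret the first canonical correlation but not to interpret more than one.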