Title: LECTURE NOTES Repeated Measures Analysis: MANOVA and Covariance Pattern models
1. LECTURE NOTES: Repeated Measures Analysis
MANOVA and Covariance Pattern models
2. Day 16: Basic Repeated Measures Design
- Data are collected at a sequence of points in time (not necessarily equally spaced)
- Treatments are assigned to experimental units
  - I.e., subjects
- Two factors
  - Treatment: between-subjects factor
  - Time: within-subjects factor
3. Hypotheses
- How do treatment differences change over time?
  - Is there a Treatment × Time interaction?
- How do response means change by treatment?
  - Is there a Treatment main effect?
- How do response means change over time?
  - Is there a Time main effect?
4. Example: Two groups
id   group   time1   time2   time3   time4
1    A       31      29      15      26
2    A       24      28      20      32
3    A       14      20      28      30
4    B       38      34      30      34
5    B       25      29      25      29
6    B       30      28      16      34
- Preliminary analysis includes:
  - Profile plots
  - Mean plots
  - Correlation between repeated measurements
5. Profile plots by group
- Differences at baseline among subjects
- Different trends for different subjects
- Variability is higher at time 1 and lower at time 4
[Figure: subject profile plots over time, groups A and B]
6. Mean plots by group
- Differences at baseline between group means
- Non-linear trends
[Figure: group mean plots over time, groups A and B]
7. (b) Correlation (covariance) across time points
        time1      time2      time3      time4
time1    1.00000    0.94035   -0.14150    0.28445
time2    0.94035    1.00000   -0.02819    0.26921
time3   -0.14150   -0.02819    1.00000    0.27844
time4    0.28445    0.26921    0.27844    1.00000
We certainly do NOT have equal correlations (so compound symmetry, CS, is doubtful)!
Time1 and time2 are highly correlated, but time1 and time3 are negatively correlated!
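The correlation matrix above can be reproduced from the six-subject example data. The following is a minimal sketch in plain Python (not the SAS used in these notes); the helper name `pearson` is introduced only for illustration.

```python
from math import sqrt

# Example data from the slide: one tuple per subject (time1, time2, time3, time4).
subjects = [
    (31, 29, 15, 26),  # id 1, group A
    (24, 28, 20, 32),  # id 2, group A
    (14, 20, 28, 30),  # id 3, group A
    (38, 34, 30, 34),  # id 4, group B
    (25, 29, 25, 29),  # id 5, group B
    (30, 28, 16, 34),  # id 6, group B
]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Transpose so that each tuple holds all subjects' values at one time point.
time_columns = list(zip(*subjects))
corr = [[pearson(a, b) for b in time_columns] for a in time_columns]

for label, row in zip(("time1", "time2", "time3", "time4"), corr):
    print(label, "  ".join(f"{r:8.5f}" for r in row))
```

Unequal off-diagonal entries, and in particular a sign change between neighbouring and distant time points, are what argue against a compound symmetry structure here.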
8. Statistical analysis strategies
- Strategy 1: ANCOVA on the final measurement, adjusting for baseline differences (end-point analysis)
- Strategy 2: Repeated-measures ANOVA (univariate approach)
- Strategy 3: Summary approach
- Strategy 4: Multivariate ANOVA approach
- Strategy 5: GEE
- Strategy 6: Mixed models
9. Comparison of traditional and new methods
From: Ralitza Gueorguieva, PhD; John H. Krystal, MD. Move Over ANOVA: Progress in Analyzing Repeated-Measures Data and Its Reflection in Papers Published in the Archives of General Psychiatry. Arch Gen Psychiatry. 2004;61:310-317.
10. General syntax in SAS's GLM procedure
- Syntax for MANOVA and rANOVA
    PROC GLM DATA=sas-dataset-name;
      CLASS factor1 factor2 ... factork;
      MODEL y1 y2 ... yk = factor1 ... factork;
      REPEATED repeated-factor-name k / PRINTE;
      LSMEANS factor1 factor2 ... factork;
    RUN;
- Output can be restricted to rANOVA only with the NOM option, and to MANOVA only with the NOU option, of the REPEATED statement
11. Strategy 1: End-point analysis
ANCOVA asks whether or not the two group means differ at the final time point, adjusting for differences at baseline.
    proc glm data=horizontal;
      class group;
      model time4 = time1 group;
    run;
- Comparing groups at every follow-up time point in this way would greatly inflate your type I error rate.
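To put a rough number on that inflation, here is a minimal sketch in plain Python. It assumes the k tests are independent, which repeated measurements on the same subjects are not, so the figures are only an illustration of the trend. The function names are hypothetical.

```python
def familywise_error(alpha, k):
    """Chance of at least one false rejection in k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / k

# One end-point test versus separate tests at 2 or 3 follow-up time points.
for k in (1, 2, 3):
    fwe = familywise_error(0.05, k)
    per_test = bonferroni_alpha(0.05, k)
    print(f"k={k}: familywise error={fwe:.4f}, Bonferroni per-test alpha={per_test:.4f}")
```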
12. Strategy 2: Univariate repeated measures ANOVA (rANOVA)
Explains away some error variability by accounting for differences between subjects; requires sphericity.
    proc glm data=horizontal;
      class group;
      model time1-time4 = group;
      repeated time 4 / printe;
    run; quit;
13. Strategy 3: Summary analysis
- One way to overcome the problem of correlated observations over time within each subject is to summarize each subject's observations over time by their mean (or some other function) and use ANOVA on the summary
- This summary analysis leads to a conservative test
- Example: avetime = mean(time1, time2, time3, time4)
    proc glm data=horizontal;
      class group;
      model avetime = group;
    run; quit;
- A special application of this is pre-post analysis
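The two steps of the summary approach (collapse each subject to one number, then run an ordinary one-way ANOVA) can be sketched in plain Python on the six-subject example data. This is an illustration only, not the PROC GLM computation itself; the helper names are hypothetical and no p value is computed.

```python
# Six-subject example data, grouped: (time1, time2, time3, time4) per subject.
groups = {
    "A": [(31, 29, 15, 26), (24, 28, 20, 32), (14, 20, 28, 30)],
    "B": [(38, 34, 30, 34), (25, 29, 25, 29), (30, 28, 16, 34)],
}

def mean(values):
    return sum(values) / len(values)

# Step 1: one summary value (the mean over time) per subject.
avetime = {g: [mean(row) for row in rows] for g, rows in groups.items()}

# Step 2: one-way ANOVA F statistic on the summary values.
def one_way_f(samples):
    all_values = [v for s in samples for v in s]
    grand = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

f_stat, df1, df2 = one_way_f(list(avetime.values()))
print(avetime)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

Because each subject contributes a single value, the within-subject correlation no longer enters the test, at the cost of discarding information about how the response changes over time.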
14. Strategy 4: MANOVA Approach
- Successive response measurements made over time are considered correlated dependent variables
- That is, the response at each level of the within-subject factor is treated as a different dependent variable
- MANOVA assumes an unstructured covariance matrix for the dependent variables
15. Why MANOVA
- You do a MANOVA instead of a series of one-at-a-time ANOVAs for two main reasons:
  - To reduce the experiment-wise level of Type I error.
  - None of the individual ANOVAs may produce a significant main effect on the response, but in combination they might, which suggests that the variables are more meaningful taken together than considered separately.
- MANOVA takes into account the inter-correlations among the response variables
16. MANOVA
- If the multivariate test is
  - not significant, report no group differences among the mean vectors
  - significant, perform univariate ANOVAs and relevant contrasts
- Contrasts (similar to contrasts we considered previously)
  - A priori (planned)
  - Post hoc (unplanned)
17. MANOVA Test Statistics
- SAS reports four tests
  - Wilks' Lambda
  - Pillai's trace (good for smaller sample sizes)
  - Hotelling-Lawley trace
  - Roy's greatest root
- These are covered in the Multivariate class
- We will use results from Wilks' Lambda
18. MANOVA Test Statistics
- Wilks' Lambda ($\Lambda$) was the first MANOVA test statistic developed and is very important for several multivariate procedures in addition to MANOVA.
- Wilks' Lambda is the determinant of the error sums of squares and cross-products matrix (E) divided by the determinant of the sum of the effect matrix (H) and the error matrix: $\Lambda = |E| / |H + E|$.
- The quantity $(1 - \Lambda)$ is often interpreted as the proportion of variance in the dependent variables explained by the model effect. However, this quantity is biased and can be quite misleading in small samples.
- A scaled version of $-\log \Lambda$ is approximately chi-square distributed
19. rANOVA vs. MANOVA
- For tests that involve only between-subjects effects, MANOVA and rANOVA give rise to the same tests.
- For within-subject effects they yield different tests.
- In PROC GLM, rANOVA results are in a table labeled "Univariate Tests of Hypotheses for Within Subject Effects."
- Results for MANOVA are displayed in a table labeled "Repeated Measures Analysis of Variance."
- The multivariate tests are Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, and Roy's greatest root.
- The only assumption required for valid multivariate tests is that the dependent variables in the model have a multivariate normal distribution with a common covariance matrix across the between-subject effects.
20. Box's test of equal covariances
- Box's M test can be used to check whether there are significant differences among the covariance matrices by group.
- When Box's test finds that the covariance matrices are significantly different across groups, that may indicate an increased possibility of Type I error, so you might want to use a smaller rejection region (alpha = 0.001).
- If you redo the analysis at a significance level of .001, you should report the results of Box's M test.
Box's M test for equality of covariance matrices:
    proc discrim data=exercise method=normal pool=test wcov;
      class diet;
      var time1 time2 time3;
    run;
21. Example 3
- Suppose 24 subjects are randomly assigned to two groups (Control and Treatment) and their responses are measured at 4 times. These times are labeled 0 (baseline), 1 (posttest, after one month), 3 (after 3 months of follow-up) and 6 (after 6 months of follow-up).
- Time is the within-subjects factor in this design
- Treatment is the between-subjects (grouping) factor
22. Some of the data points
    data short;
      input Group Subj y0 y1 y3 y6;
      datalines;
    1 1 296 175 187 242
    1 2 376 329 236 126
    1 3 309 238 150 173
    1 4 222 60 82 135
    1 5 150 271 250 266
    1 6 316 291 238 194
    1 7 321 364 270 358
    1 8 447 402 294 266
    1 9 220 70 95 137
    2 23 319 68 67 12
    2 24 300 138 114 12
    ;
Hypotheses: H01: no treatment effect; H02: no time effect; H03: no interaction. Sphericity is violated.
23-25. Results of GLM Analysis
- The test of sphericity, when requested, immediately precedes both sets of within-subjects tests.
- Although the output shows two separate tests of sphericity, the only one of interest is the second test, which is the test of sphericity applied to the common covariance matrix of the transformed within-subject variables.
- If the chi-square approximation has an associated p value less than your alpha level, the sphericity assumption has been violated.
Sphericity Tests
Variables               DF   Mauchly's Criterion   Chi-Square   Pr > ChiSq
Transformed Variates     5   0.462959              15.95853     0.0070
Orthogonal Components    5   0.462959              15.95853     0.0070
26. Example 3 continued: rMANOVA
- The first multivariate test of a within-subjects effect is the within-subjects main effect test.
- It examines changes in response as a function of time.
- The null hypothesis is that the mean response does not change over time.
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect
H = Type III SSCP Matrix for time
E = Error SSCP Matrix
S=1 M=0.5 N=9
Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.19328615   27.82     3        20       <.0001
Pillai's Trace           0.80671385   27.82     3        20       <.0001
Hotelling-Lawley Trace   4.17367645   27.82     3        20       <.0001
Roy's Greatest Root      4.17367645   27.82     3        20       <.0001
27.
- Next, SAS tests the hypothesis that treatment interacts with time.
- In this instance, the F value associated with these multivariate tests of the interaction is high; therefore, the associated p value is low: F(3, 20) = 6.73, p = .0025.
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time*Group Effect
H = Type III SSCP Matrix for time*Group
E = Error SSCP Matrix
S=1 M=0.5 N=9
Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.49748100   6.73      3        20       0.0025
Pillai's Trace           0.50251900   6.73      3        20       0.0025
Hotelling-Lawley Trace   1.01012703   6.73      3        20       0.0025
Roy's Greatest Root      1.01012703   6.73      3        20       0.0025
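All four statistics give the same F in these tables because S=1 (a single nonzero root). In that case the exact F follows directly from Wilks' Lambda, Pillai's trace equals 1 - Lambda, and the Hotelling-Lawley trace equals (1 - Lambda)/Lambda. A minimal Python sketch of that arithmetic, using the values printed above (the function name is hypothetical):

```python
def exact_f_from_wilks(wilks_lambda, num_df, den_df):
    """Exact F statistic for Wilks' Lambda when the hypothesis has S = 1."""
    return (1 - wilks_lambda) / wilks_lambda * den_df / num_df

# Values taken from the two MANOVA tables (time, and time*Group).
lambda_time = 0.19328615
lambda_interaction = 0.49748100

f_time = exact_f_from_wilks(lambda_time, 3, 20)
f_interaction = exact_f_from_wilks(lambda_interaction, 3, 20)

# With S = 1 the other statistics are simple functions of Lambda.
pillai_time = 1 - lambda_time
hotelling_time = (1 - lambda_time) / lambda_time

print(f"time:       F(3, 20) = {f_time:.2f}")
print(f"time*Group: F(3, 20) = {f_interaction:.2f}")
print(f"Pillai = {pillai_time:.8f}, Hotelling-Lawley = {hotelling_time:.8f}")
```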
28. Between-Subjects Tests
- Following the multivariate tests of significance for within-subjects effects, SAS prints tests of the between-subjects effects. There is only one approach to testing these effects.
The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects
Source   DF   Type III SS    Mean Square    F Value   Pr > F
Group     1   248677.0417    248677.0417    19.64     0.0002
Error    22   278540.9583    12660.9527
Univariate Tests of Hypotheses for Within Subject Effects
Source        DF   Type III SS    Mean Square    F Value   Pr > F   Adj Pr > F (G-G)   Adj Pr > F (H-F-L)
time           3   326635.5833    108878.5278    37.80     <.0001   <.0001             <.0001
time*Group     3   59461.8750     19820.6250     6.88      0.0004   0.0019             0.0012
Error(time)   66   190098.5417    2880.2809
Greenhouse-Geisser Epsilon     0.7204
Huynh-Feldt-Lecoutre Epsilon   0.8016
29. Observations
- The sphericity assumption was violated
- With nonspherical data, either use the corrected univariate tests that we described earlier or use the results from the MANOVA tests.
- The corrected univariate p values appear under the G-G and H-F-L headers in the output shown above.
- Note that in this case, rANOVA agrees with MANOVA that there is a statistically significant within-subjects main effect for time, as well as an interaction between treatment and time.
- Further polynomial contrast analysis can be carried out on time
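The corrected univariate tests keep the same F ratios and multiply both degrees of freedom by the epsilon estimate before the p value is looked up. A minimal Python sketch of that arithmetic with the mean squares and Greenhouse-Geisser epsilon from the output (function names are hypothetical, and the p value lookup itself is left out):

```python
def f_ratio(ms_effect, ms_error):
    """Univariate F statistic: effect mean square over error mean square."""
    return ms_effect / ms_error

def adjusted_df(df_effect, df_error, epsilon):
    """Degrees of freedom after an epsilon (G-G or H-F-L) correction."""
    return epsilon * df_effect, epsilon * df_error

ms_error_time = 2880.2809
f_time = f_ratio(108878.5278, ms_error_time)
f_interaction = f_ratio(19820.6250, ms_error_time)

gg_epsilon = 0.7204
gg_num_df, gg_den_df = adjusted_df(3, 66, gg_epsilon)

print(f"time:       F = {f_time:.2f}")
print(f"time*Group: F = {f_interaction:.2f}")
print(f"G-G adjusted df: ({gg_num_df:.4f}, {gg_den_df:.4f}) instead of (3, 66)")
```

Shrinking the degrees of freedom makes the reference F distribution heavier-tailed, which is why the adjusted p values for the interaction are larger than the unadjusted one.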
30. Analysis of Variance of Contrast Variables
time_N represents the nth degree polynomial contrast for time
Contrast Variable: time_1
Source   DF   Type III SS    Mean Square    F Value   Pr > F
Mean      1   89560.72781    89560.72781    44.23     <.0001
Group     1   20747.22078    20747.22078    10.24     0.0041
Error    22   44552.41504    2025.10977
Contrast Variable: time_2
Source   DF   Type III SS    Mean Square    F Value   Pr > F
Mean      1   186802.0020    186802.0020    37.82     <.0001
Group     1   4428.6429      4428.6429      0.90      0.3539
Error    22   108650.6885    4938.6677
Contrast Variable: time_3
Source   DF   Type III SS    Mean Square    F Value   Pr > F
Mean      1   50272.85354    50272.85354    29.98     <.0001
Group     1   34286.01136    34286.01136    20.44     0.0002
Error    22   36895.43813    1677.06537
31. More on orthogonal contrasts
    proc glm data=short;
      class group;
      model y0 y1 y3 y6 = group / nouni;
      /* PROFILE generates contrasts between adjacent levels of the factor */
      repeated time 4 (0 1 3 6) profile / summary printm nom;
    run;
    proc glm data=short;
      class group;
      model y0 y1 y3 y6 = group / nouni;
      /* HELMERT generates contrasts between each level of the factor and the mean of subsequent levels */
      repeated time 4 (0 1 3 6) helmert / summary printm nom;
    run;
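To make the two transformations concrete, here is a minimal Python sketch that builds the contrast coefficients as described (adjacent differences for PROFILE; each level minus the mean of the subsequent levels for HELMERT) and applies them to the first subject in the data listing. The function names are hypothetical, and the matrix SAS prints with PRINTM may be scaled differently.

```python
from fractions import Fraction

def profile_contrasts(k):
    """k-1 contrasts, each comparing level i with level i+1."""
    rows = []
    for i in range(k - 1):
        row = [0] * k
        row[i], row[i + 1] = 1, -1
        rows.append(row)
    return rows

def helmert_contrasts(k):
    """k-1 contrasts, each comparing level i with the mean of later levels."""
    rows = []
    for i in range(k - 1):
        later = k - i - 1
        rows.append([Fraction(0)] * i + [Fraction(1)] + [Fraction(-1, later)] * later)
    return rows

def apply_contrasts(contrasts, y):
    """Transformed variables: one value per contrast row."""
    return [sum(c * v for c, v in zip(row, y)) for row in contrasts]

subject_1 = (296, 175, 187, 242)  # y0, y1, y3, y6 for Group 1, Subj 1

print(apply_contrasts(profile_contrasts(4), subject_1))
print([float(v) for v in apply_contrasts(helmert_contrasts(4), subject_1)])
```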
32. Day 17: Strategy 6: Mixed Model Approach
- Models with both fixed and random effects are mixed models
  - Treatment is usually considered a fixed effect
  - The subject factor is a random effect
- Analysis can follow
  - Linear mixed models
  - Covariance pattern models: user specifies the covariance structure
  - Random coefficient models: induce a covariance structure
33. SAS Mixed Repeated Measures Syntax
34. SAS Mixed Model
    PROC MIXED cl;
    CLASS <classification variables>;
    MODEL <dependent variable> = <fixed sources>;
- PROC MIXED: cl requests confidence limits for the variance and covariance estimates
- CLASS: identifies the variables used as sources of variation and in the subject option of the REPEATED statement
- MODEL: specifies the dependent variable and all fixed sources of variation (includes treatment, time and their interaction). The ddfm= option controls how the denominator degrees of freedom are computed for the various terms.
35. SAS Mixed Model
    REPEATED / subject=<EU id> type=<covariance structure> r rcorr;
- subject identifies the experimental unit in the data set on which the repeated measures are taken. It identifies the units that are independent.
- type identifies the covariance structure
- r requests printing of the covariance matrix for the repeated measures
- rcorr requests printing of the correlation matrix for the repeated measures
36. Covariance Structures: Independent with common variance
- Equal variances along the main diagonal
- Zero covariances off the diagonal
- Variances constant and residuals independent across time.
- The standard ANOVA model
- Simple, because a single parameter is estimated: the pooled variance
37. Covariance Structures: Unstructured
- Separate variances on the diagonal
- Separate covariances off the diagonal
- Most complex structure
- Variance estimated for each time, covariance for each pair of times
- With 4 time points, need to estimate 10 parameters: $10 = 4(4+1)/2$
- Leads to less precise parameter estimation (degrees of freedom problem)
38. Covariance Structures: Compound symmetry
- Equal variances on the diagonal
- Equal covariances off the diagonal (equal correlation)
- Simplest structure for fitting repeated measures
- Split-plot in time analysis
- Used for the past 50 years
- Requires estimation of 2 parameters
39. Covariance Structures: First-order Autoregressive
- Equal variances on the main diagonal
- Off-diagonal entries are the variance multiplied by the correlation raised to increasing powers as the observations become increasingly separated in time.
- Increasing power means decreasing covariances.
- Times must be ordered and equally spaced.
- Estimates 2 parameters
- AR(1)
40. First-order Autoregressive, Heterogeneous
- Unequal variances on the main diagonal
- Off-diagonal entries are the product of the two standard deviations multiplied by the correlation raised to increasing powers as the observations become increasingly separated in time.
- Increasing power means decreasing covariances.
- Times must be ordered and equally spaced.
- Estimates 5 parameters (with 4 time points)
- ARH(1)
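The structures described on the last few slides can be written out explicitly. The following minimal Python sketch builds each matrix from its parameters and counts the parameters for t time points. The parameterization by variance and correlation, the function names, and the numeric values in the usage lines are all assumptions made for illustration; they are not estimates from any data set in these notes.

```python
def cs_matrix(variance, rho, t):
    """Compound symmetry: one variance, one common correlation."""
    return [[variance * (1.0 if i == j else rho) for j in range(t)] for i in range(t)]

def ar1_matrix(variance, rho, t):
    """AR(1): correlation rho raised to the lag |i - j|."""
    return [[variance * rho ** abs(i - j) for j in range(t)] for i in range(t)]

def arh1_matrix(sds, rho):
    """ARH(1): AR(1) correlations with a separate standard deviation per time."""
    t = len(sds)
    return [[sds[i] * sds[j] * rho ** abs(i - j) for j in range(t)] for i in range(t)]

def n_cov_params(structure, t):
    """Number of covariance parameters for t time points."""
    counts = {
        "independent": 1,
        "cs": 2,
        "ar1": 2,
        "arh1": t + 1,
        "un": t * (t + 1) // 2,
    }
    return counts[structure]

# Hypothetical values, chosen only to show the pattern of each matrix.
for row in ar1_matrix(2.0, 0.5, 4):
    print(row)
for name in ("independent", "cs", "ar1", "arh1", "un"):
    print(name, n_cov_params(name, 4))
```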
41. Strategies for finding suitable covariance structures
- Run unstructured first
- Next run compound symmetry: the simplest repeated measures structure
- Next try other covariance structures that best fit the experimental design
42. Criteria for selecting the best covariance structure
- Need to use model fitting statistics
  - AIC: Akaike's Information Criterion
  - BIC: Schwarz's Bayesian Criterion
- Let q = number of covariance parameters, p = number of fixed effect parameters in the model, and n = number of observations
  - $\mathrm{AIC} = -2\log(L) + 2q$
  - $\mathrm{BIC} = -2\log(L) + q\log(n)$
  - $\mathrm{CAIC} = -2\log(L) + q(\log n + 1)$
- The smaller the number, the better
- Goal: a covariance structure that is better than compound symmetry
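As a check on the formulas, the following minimal Python sketch recomputes AIC and BIC from the -2 Res Log Likelihood values and parameter counts reported for the exercise example later in these notes. One assumption is labeled in the code: the reported BIC values are matched when n is taken to be the number of subjects (30), not the number of individual measurements.

```python
from math import log

def aic(neg2_loglik, q):
    """AIC = -2 log(L) + 2q, with q covariance parameters."""
    return neg2_loglik + 2 * q

def bic(neg2_loglik, q, n):
    """BIC = -2 log(L) + q log(n)."""
    return neg2_loglik + q * log(n)

# (-2 Res Log Likelihood, q) as reported for the exercise data fits.
fits = {
    "CS": (590.8, 2),
    "UN": (577.7, 6),
    "AR(1)": (590.1, 2),
    "ARH(1)": (579.8, 4),
}

N_SUBJECTS = 30  # assumption: n counts subjects; this reproduces the reported BIC

for name, (neg2ll, q) in fits.items():
    print(f"{name:7s} AIC={aic(neg2ll, q):.1f}  BIC={bic(neg2ll, q, N_SUBJECTS):.1f}")
```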
43. Example 3
- Suppose 24 subjects are randomly assigned to two groups (Control and Treatment) and their responses are measured at 4 times. These times are labeled 0 (baseline), 1 (posttest, after one month), 3 (after 3 months of follow-up) and 6 (after 6 months of follow-up).
    proc corr data=short cov;
      var y0 y1 y3 y6;
    run;
Pearson Correlation Coefficients, N = 24
      y0     y1     y3     y6
y0    1.00   0.51   0.50   0.07
y1           1.00   0.93   0.67
y3                  1.00   0.65
y6                         1.00
- What type of correlation structure do you think is right?
- Variances are 5456, 13505, 7881, 6929
- Exercise: compare models for this
44. Example 4: exercise pulse study
- Exercise data example
- The data consist of people who were randomly assigned to two different diets (low-fat and not low-fat) and three different types of exercise (at rest, walking leisurely and running). Their pulse rate was measured at three different time points during their assigned exercise: at 1 minute, 15 minutes and 30 minutes.
    data exercise;
      input id exertype diet time1 time2 time3;
      cards;
    1 1 1 85 85 88
    2 1 1 90 92 93
    3 1 1 97 97 94
    4 1 1 80 82 83
    5 1 1 91 92 91
    6 1 2 83 83 84
    7 1 2 87 88 90
    8 1 2 92 94 95
    9 1 2 97 99 96
    10 1 2 100 97 100
    ;
45. Example
- Let's look at the correlations, variances and covariances for the exercise data.
- Since we cannot use this kind of covariance structure in a traditional repeated measures analysis, we will use SAS PROC MIXED for such an analysis.
    proc corr data=exercise cov;
      var time1 time2 time3;
    run;
Pearson Correlation Coefficients, N = 30
        time1     time2     time3
time1   1.00000   0.54454   0.51915
time2   0.54454   1.00000   0.85028
time3   0.51915   0.85028   1.00000
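The PROC MIXED runs in these notes read a data set named long with a single response variable pulse and a time variable, whereas the exercise data are entered in wide format (time1, time2, time3). The conversion step is not shown in the notes; the following Python sketch illustrates the reshaping logic on the first three rows of the listing. Coding time as 1, 2, 3 and the function name are assumptions.

```python
# First three rows of the wide-format exercise data listing.
wide = [
    {"id": 1, "exertype": 1, "diet": 1, "time1": 85, "time2": 85, "time3": 88},
    {"id": 2, "exertype": 1, "diet": 1, "time1": 90, "time2": 92, "time3": 93},
    {"id": 3, "exertype": 1, "diet": 1, "time1": 97, "time2": 97, "time3": 94},
]

def wide_to_long(rows, time_vars=("time1", "time2", "time3")):
    """One output record per subject and time point, response stored as 'pulse'."""
    long_rows = []
    for row in rows:
        for t, var in enumerate(time_vars, start=1):
            long_rows.append({
                "id": row["id"],
                "exertype": row["exertype"],
                "diet": row["diet"],
                "time": t,           # assumed coding: 1, 2, 3
                "pulse": row[var],
            })
    return long_rows

long = wide_to_long(wide)
for record in long[:3]:
    print(record)
```

Keeping id on every record is what lets a subject= option group the repeated measurements that belong to the same person.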
46. Example: compound symmetry
    proc mixed data=long;
      class exertype time;
      model pulse = exertype time exertype*time;
      repeated time / subject=id type=cs;
    run;
Fit Statistics
-2 Res Log Likelihood      590.8
AIC (smaller is better)    594.8
AICC (smaller is better)   595.0
BIC (smaller is better)    597.6
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
1    15.36        <.0001
Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
exertype   2        27       27.00     <.0001
time       2        54       23.54     <.0001
47. Example: unstructured
    proc mixed data=long;
      class exertype time;
      model pulse = exertype time exertype*time;
      repeated time / subject=id type=un;
    run;
Fit Statistics
-2 Res Log Likelihood      577.7
AIC (smaller is better)    589.7
AICC (smaller is better)   590.9
BIC (smaller is better)    598.1
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
5    28.46        <.0001
Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
exertype   2        27       27.00     <.0001
time       2        27       22.32     <.0001
48. Example: AR(1)
    proc mixed data=long;
      class exertype time;
      model pulse = exertype time exertype*time;
      repeated time / subject=id type=ar(1);
    run;
Fit Statistics
-2 Res Log Likelihood      590.1
AIC (smaller is better)    594.1
AICC (smaller is better)   594.3
BIC (smaller is better)    596.9
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
1    16.08        <.0001
Type 3 Tests of Fixed Effects
Effect          Num DF   Den DF   F Value   Pr > F
exertype        2        27       28.39     <.0001
time            2        54       18.20     <.0001
exertype*time   4        54       11.73     <.0001
49. Example: ARH(1)
    proc mixed data=long;
      class exertype time;
      model pulse = exertype time exertype*time;
      repeated time / subject=id type=arh(1);
    run;
Covariance Parameter Estimates
Cov Parm   Subject   Estimate
Var(1)     id        35.7683
Var(2)     id        87.1927
Var(3)     id        115.50
ARH(1)     id        0.5101
Fit Statistics
-2 Res Log Likelihood      579.8
AIC (smaller is better)    587.8
AICC (smaller is better)   588.3
BIC (smaller is better)    593.4
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
50. Example: model comparison
Model                                    AIC     -2RLL   Parms (df + 1)   Diff -2RLL (vs. CS)   Diff in df (vs. CS)   p value for Diff (chi-square dist.)
Compound Symmetry                        594.8   590.8   2
Unstructured                             589.7   577.7   6                13.1                  4                     0.01
Autoregressive                           594.1   590.1   2                0.7                   0                     na
Autoregressive Heterogeneous Variances   587.8   579.8   4                11.0                  2                     0.004
The two most promising structures are Autoregressive Heterogeneous Variances and Unstructured, since these two models have the smallest AIC values and their -2 Res Log Likelihood values are significantly smaller than those of the other models.
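The chi-square tail probabilities for the differences in -2 Res Log Likelihood can be checked by hand, because for an even number of degrees of freedom the upper tail has a closed form. A minimal Python sketch follows; the function name is hypothetical. Treating the difference as chi-square is itself an approximation and is best justified when one structure is a special case of the other, as compound symmetry is of unstructured.

```python
from math import exp

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total

neg2rll = {"CS": 590.8, "UN": 577.7, "ARH(1)": 579.8}
n_parms = {"CS": 2, "UN": 6, "ARH(1)": 4}

p_values = {}
for model in ("UN", "ARH(1)"):
    diff = neg2rll["CS"] - neg2rll[model]
    df = n_parms[model] - n_parms["CS"]
    p_values[model] = chi2_sf_even(diff, df)
    print(f"{model} vs CS: diff={diff:.1f}, df={df}, p={p_values[model]:.4f}")
```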
51. RM with two group factors
- Looking at models including only diet or exertype separately does not answer all our questions. We would also like to know if the people on the low-fat diet who engage in running have lower pulse rates than the people on the not low-fat diet who are not running. To address these types of questions we need to look at a model that includes the interaction of diet and exertype.
    proc mixed data=long;
      class diet exertype time;
      model pulse = exertype|diet|time;
      repeated time / subject=id type=arh(1);
    run;
    quit;
    proc glm data=exercise;
      class diet exertype;
      model time1 time2 time3 = diet|exertype;
      repeated time 3;
    run;
    quit;
52. Group comparison in PROC MIXED
- If we would like to look at the differences among groups at each level of another variable, we have to use the lsmeans statement with the slice option.
- For example, we could test for differences among the exertype groups at each level of diet across all levels of time; or we could test for differences among the exertype groups at each time point across both levels of diet; we could also test for differences among the exertype groups for each combination of time and diet levels.
    proc mixed data=long;
      class diet exertype time;
      model pulse = exertype|diet|time;
      repeated time / subject=id type=arh(1);
      lsmeans diet*exertype / slice=diet;   /* testing for differences among exertype for each level of diet */
      lsmeans exertype*time / slice=time;   /* differences in exertype for each time point */
      lsmeans exertype*diet*time / slice=time*diet;
    run;
    quit;
53. Worked Example from JL Textbook (Self Read)
54.
Subject   time 1   time 2   time 3   time 4   time 5   summary
1         y11      y12      y13      y14      y15      f(y11, ..., y15)
2         y21      y22      y23      y24      y25      f(y21, ..., y25)
...
n         yn1      yn2      yn3      yn4      yn5      f(yn1, ..., yn5)
Summarizing with a function over time removes the correlation.
Growth Curve Approach
55. 2^3 factorial design with temperature, moisture and soil type
- Each combination of factor levels was randomly assigned to two pots of soil
- Samples of soil were taken on days 0, 7, 14, 30 and 60
- Concentration of herbicide was measured for each sample
- The sphericity condition does not hold
57-58. x_i = day, n = 5; y_i = log(concentration); summary function f = k = slope, computed from sums over the n sampling days
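The slide's formula did not survive extraction, so what follows is a sketch of the standard least-squares slope written in terms of sums, offered as an assumption about what the slide computes. The sampling days are those of the herbicide study; the concentrations are hypothetical, generated from an exact exponential decay so that the expected slope is known.

```python
from math import exp, log

def ls_slope(x, y):
    """Least-squares slope of y on x, written in terms of sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    return (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)

days = [0, 7, 14, 30, 60]  # sampling days from the study design (n = 5)

# Hypothetical concentrations for one pot: exact decay at rate 0.05 per day.
concentration = [exp(2.0 - 0.05 * d) for d in days]
log_concentration = [log(c) for c in concentration]

# One summary value per pot: the slope of log(concentration) over time.
k = ls_slope(days, log_concentration)
print(f"slope k = {k:.4f}")
```

Each pot then contributes a single slope, and those slopes can be analyzed with an ordinary factorial ANOVA, which sidesteps the sphericity problem.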
61. If you don't have a summary function, proc glm can summarize with orthogonal polynomials over time.
Linear orthogonal polynomial over time
62. Quadratic orthogonal polynomial over time