LECTURE NOTES: Repeated Measures Analysis

MANOVA and Covariance Pattern models

Day 16: Basic Repeated Measures Design

- Data are collected at a sequence of points in time (not necessarily equally spaced)
- Treatments are assigned to experimental units (i.e., subjects)
- Two factors
- Treatment: between-subjects factor
- Time: within-subjects factor

Hypotheses

- How do treatment differences change over time?
- Is there a Treatment × Time interaction?
- How do response means change by trt?
- Is there a Trt main effect?
- How do response means change over time?
- Is there a Time main effect?

Example: Two groups

id  group  time1  time2  time3  time4
1   A      31     29     15     26
2   A      24     28     20     32
3   A      14     20     28     30
4   B      38     34     30     34
5   B      25     29     25     29
6   B      30     28     16     34

- Preliminary analysis includes:
- Profile plots
- Mean plots
- Correlation between repeated measurements

Profile plots by group

- differences at baseline among subjects
- different trends for different subjects
- Variability higher at time 1 and low at time 4

[Figure: profile plots of each subject over time, by group (A and B)]

Mean plots by group

- differences at baseline between group means
- non-linear trends

[Figure: plots of the group means over time (groups A and B)]

(b) Correlation (covariance) across time points

          time1     time2     time3     time4
time1   1.00000   0.94035  -0.14150   0.28445
time2   0.94035   1.00000  -0.02819   0.26921
time3  -0.14150  -0.02819   1.00000   0.27844
time4   0.28445   0.26921   0.27844   1.00000

The correlations are clearly NOT equal, so a compound symmetry (CS) structure is doubtful!

Time1 and time2 are highly correlated, but time1 and time3 are inversely correlated!
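The correlation table above can be reproduced directly from the six-subject example data. The following is a minimal sketch in Python (used here only for illustration; the notes themselves use SAS), with the helper name `pearson` chosen for this example.

```python
from math import sqrt

# Wide-format data from the two-group example: one list entry per subject.
data = {
    "time1": [31, 24, 14, 38, 25, 30],
    "time2": [29, 28, 20, 34, 29, 28],
    "time3": [15, 20, 28, 30, 25, 16],
    "time4": [26, 32, 30, 34, 29, 34],
}

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

names = list(data)
corr = {(r, c): pearson(data[r], data[c]) for r in names for c in names}

# Print the matrix row by row, five decimals as in the SAS output.
for r in names:
    print(r, " ".join(f"{corr[(r, c)]:8.5f}" for c in names))
```

With only six subjects these correlations are very imprecise, so they are best read as a rough guide to the covariance pattern rather than as firm estimates.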

Statistical analysis strategies

- Strategy 1: ANCOVA on the final measurement, adjusting for baseline differences (end-point analysis)
- Strategy 2: Repeated-measures ANOVA (univariate approach)
- Strategy 3: Summary approach
- Strategy 4: Multivariate ANOVA approach
- Strategy 5: GEE
- Strategy 6: Mixed models

Comparison of traditional and new methods

From: Ralitza Gueorguieva, PhD; John H. Krystal, MD. Move Over ANOVA: Progress in Analyzing Repeated-Measures Data and Its Reflection in Papers Published in the Archives of General Psychiatry. Arch Gen Psychiatry. 2004;61:310-317.

General syntax in SAS's GLM procedure

- Syntax for MANOVA and rANOVA

  PROC GLM DATA=sas-dataset-name;
    CLASS factor1 factor2 ... factork;
    MODEL y1 y2 ... yk = factor1 ... factork;
    REPEATED repeated-factor-name k / PRINTE;
    LSMEANS factor1 factor2 ... factork;
  RUN;

- Output can be restricted to rANOVA only using the NOM option, and to the MANOVA analysis only using the NOU option, in the REPEATED statement

Strategy 1. End-point analysis

ANCOVA asks whether or not the two group means differ at the final time point, adjusting for differences at baseline.

  proc glm data=horizontal;
    class group;
    model time4 = time1 group;
  run;

- Comparing groups at every follow-up time point in this way would greatly inflate your Type I error rate.
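To give a rough sense of how quickly the Type I error rate grows, the sketch below computes the chance of at least one false positive across k tests. It assumes the tests are independent, which repeated measurements on the same subjects are not, so the numbers are only an illustration; the function name `familywise_error` is made up for this example.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive in k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

# One end-point test versus separate tests at 3, 5 or 10 follow-up times.
for k in (1, 3, 5, 10):
    print(k, round(familywise_error(0.05, k), 3))
```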

Strategy 2: Univariate repeated measures ANOVA (rANOVA)

Explains away some error variability by accounting for differences between subjects; requires sphericity.

  proc glm data=horizontal;
    class group;
    model time1-time4 = group;
    repeated time 4 / printe;
  run;
  quit;

Strategy 3: Summary analysis

- One way to overcome the problem of correlated observations over time within each subject is to summarize the observations over time by their mean, or some other function, and use ANOVA
- This summary analysis leads to a conservative test
- Example: avetime = mean(time1, time2, time3, time4)

  proc glm data=horizontal;
    class group;
    model avetime = group;
  run;
  quit;

- A special application of this is pre-post analysis
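The summary approach can be traced by hand on the six-subject example. The sketch below, in Python for illustration only, computes each subject's mean over time and then the one-way ANOVA F statistic for group; `one_way_f` is a helper written for this example and not a library routine.

```python
# Wide-format rows (id, group, time1..time4) from the two-group example.
rows = [
    (1, "A", 31, 29, 15, 26),
    (2, "A", 24, 28, 20, 32),
    (3, "A", 14, 20, 28, 30),
    (4, "B", 38, 34, 30, 34),
    (5, "B", 25, 29, 25, 29),
    (6, "B", 30, 28, 16, 34),
]

# Summary measure: each subject's mean over the four time points.
avetime = {}
for sid, group, *times in rows:
    avetime.setdefault(group, []).append(sum(times) / len(times))

def one_way_f(groups):
    """F statistic of a one-way ANOVA on a dict {group: [values]}."""
    values = [v for vals in groups.values() for v in vals]
    grand = sum(values) / len(values)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_value = one_way_f(avetime)
print(avetime)
print(round(f_value, 3))
```

Each subject now contributes a single number, so the within-subject correlation no longer enters the test; the price is that any information about how the profiles change over time is discarded.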

Strategy 4: MANOVA Approach

- Successive response measurements made over time are considered correlated dependent variables
- That is, the response at each level of the within-subject factor is treated as a different dependent variable
- MANOVA assumes there is an unstructured covariance matrix for the dependent variables

Why MANOVA

- You do a MANOVA instead of a series of one-at-a-time ANOVAs for these main reasons:
- To reduce the experiment-wise level of Type I error.
- None of the individual ANOVAs may produce a significant main effect on the response, but in combination they might, which suggests that the variables are more meaningful taken together than considered separately.
- MANOVA takes into account the intercorrelations among the response variables.

MANOVA

- If the multivariate test is
- not significant, report no group differences among the mean vectors
- significant, perform univariate ANOVAs and relevant contrasts
- Contrasts (similar to contrasts we considered previously)
- A priori (planned)
- Post hoc (unplanned)

MANOVA Test Statistics

- SAS reports four tests
- Wilks' Lambda
- Pillai's trace (good for smaller sample sizes)
- Hotelling-Lawley trace
- Roy's greatest root
- These are covered in the Multivariate class
- We will use results from Wilks' Lambda

MANOVA Test Statistics

- Wilks' Lambda (Λ) was the first MANOVA test statistic developed and is very important for several multivariate procedures in addition to MANOVA.
- Wilks' Lambda (Λ) is the determinant of the error SSCP matrix (E) divided by the determinant of the sum of the effect SSCP matrix (H) and the error SSCP matrix: Λ = |E| / |H + E|.
- The quantity (1 - Λ) is often interpreted as the proportion of variance in the dependent variables explained by the model effect. However, this quantity is not unbiased and can be quite misleading in small samples.
- A log transformation of Λ (Bartlett's approximation) is approximately chi-square distributed
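For reference, the definition can be written compactly as follows. Here H and E are the hypothesis (effect) and error SSCP matrices, and the symbols λ_i and s (the eigenvalues of E⁻¹H and their number) are notation introduced for this note rather than taken from the slides.

```latex
\Lambda \;=\; \frac{|E|}{|H + E|}
        \;=\; \prod_{i=1}^{s} \frac{1}{1 + \lambda_i},
\qquad
\lambda_1 \ge \dots \ge \lambda_s \ \text{the nonzero eigenvalues of } E^{-1}H .
```

Small values of Λ therefore correspond to large eigenvalues, that is, to effects that are large relative to error.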

rANOVA vs. MANOVA

- For tests that involve only between-subjects effects, MANOVA and rANOVA give rise to the same tests.
- For within-subject effects they yield different tests.
- In PROC GLM, rANOVA results are in a table labeled "Univariate Tests of Hypotheses for Within Subject Effects."
- Results for MANOVA are displayed in a table labeled "Repeated Measures Analysis of Variance."
- The multivariate tests are Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, and Roy's greatest root.
- The only assumption required for valid multivariate tests is that the dependent variables in the model have a multivariate normal distribution with a common covariance matrix across the between-subject effects.

Box's test of equal covariances

- Box's M test can be used to check whether there are significant differences among the covariance matrices by group.
- When Box's test finds that the covariance matrices are significantly different across groups, that may indicate an increased possibility of Type I error, so you might want to use a smaller rejection region (alpha = 0.001).
- If you redo the analysis at a significance level of .001, you should report the results of Box's M.

Box's M test for equality of covariance matrices:

  proc discrim data=exercise method=normal pool=test wcov;
    class diet;
    var time1 time2 time3;
  run;

- Example 3: Suppose 24 subjects are randomly assigned to two groups (Control and Treatment) and their responses are measured at 4 times. These times are labeled 0 (baseline), 1 (posttest after one month), 3 (after 3 months of follow-up) and 6 (after 6 months of follow-up).
- Time is the within-subjects factor in this design
- Treatment is the between-subjects (grouping) factor

- Some of the data points (subjects 10 to 22 are not shown):

  data short;
    input Group Subj y0 y1 y3 y6;
    datalines;
  1 1 296 175 187 242
  1 2 376 329 236 126
  1 3 309 238 150 173
  1 4 222 60 82 135
  1 5 150 271 250 266
  1 6 316 291 238 194
  1 7 321 364 270 358
  1 8 447 402 294 266
  1 9 220 70 95 137
  2 23 319 68 67 12
  2 24 300 138 114 12

Hypotheses

- H01: no treatment effect
- H02: no time effect
- H03: no interaction
- Sphericity is violated

Results of GLM Analysis


- The test of sphericity, when requested, immediately precedes both sets of within-subjects tests.
- Although the output shows two separate tests of sphericity, the only one of interest is the second test, which is the test of sphericity applied to the common covariance matrix of the transformed within-subject variables.
- If the chi-square approximation has an associated p value less than your alpha level, the sphericity assumption has been violated.

Sphericity Tests

Variables               DF   Mauchly's Criterion   Chi-Square   Pr > ChiSq
Transformed Variates     5   0.462959              15.95853     0.0070
Orthogonal Components    5   0.462959              15.95853     0.0070

Example 3 continued: MANOVA

- The first multivariate test of a within-subjects effect is the within-subjects main effect test.
- It examines changes in response as a function of time.
- The null hypothesis is that the mean response does not change over time.

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect
H = Type III SSCP Matrix for time
E = Error SSCP Matrix
S=1  M=0.5  N=9

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.19328615   27.82     3        20       <.0001
Pillai's Trace           0.80671385   27.82     3        20       <.0001
Hotelling-Lawley Trace   4.17367645   27.82     3        20       <.0001
Roy's Greatest Root      4.17367645   27.82     3        20       <.0001

- Next, SAS tests the hypothesis that treatment interacts with time.
- In this instance, the F value associated with these multivariate tests of the interaction is high; therefore, the associated p value is low: F(3, 20) = 6.73, p = .0025.

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time*Group Effect
H = Type III SSCP Matrix for time*Group
E = Error SSCP Matrix
S=1  M=0.5  N=9

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.49748100   6.73      3        20       0.0025
Pillai's Trace           0.50251900   6.73      3        20       0.0025
Hotelling-Lawley Trace   1.01012703   6.73      3        20       0.0025
Roy's Greatest Root      1.01012703   6.73      3        20       0.0025
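All four criteria give the same F in both MANOVA tables because S = 1: with a single nonzero eigenvalue of E⁻¹H, each statistic is a function of that one root. The Python sketch below (illustrative only; `stats_from_wilks` is a name invented here) recovers the other entries of each table from Wilks' Lambda under that S = 1 assumption.

```python
def stats_from_wilks(wilks, num_df, den_df):
    """Recover the other criteria from Wilks' Lambda when S = 1."""
    root = (1 - wilks) / wilks            # the single nonzero eigenvalue of E^-1 H
    return {
        "pillai": root / (1 + root),      # equals 1 - Lambda
        "hotelling_lawley": root,
        "roy": root,                      # SAS reports the root itself
        "F": root * den_df / num_df,      # exact F when S = 1
    }

time_effect = stats_from_wilks(0.19328615, 3, 20)
interaction = stats_from_wilks(0.49748100, 3, 20)

for name, result in (("time", time_effect), ("time*Group", interaction)):
    print(name, {key: round(value, 4) for key, value in result.items()})
```

When S is larger than 1 the four statistics generally differ and only approximate F values are available, so this shortcut should not be carried over to designs with more than two groups and several time points.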

- Between-Subjects Tests
- Following the multivariate tests of significance for within-subjects effects, SAS prints tests of the between-subjects effects. There is only one approach to testing these effects.

The GLM Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

Source   DF   Type III SS    Mean Square    F Value   Pr > F
Group     1   248677.0417    248677.0417    19.64     0.0002
Error    22   278540.9583     12660.9527

Univariate Tests of Hypotheses for Within Subject Effects

                                                                  Adj Pr > F
Source        DF   Type III SS    Mean Square    F Value   Pr > F   G - G    H-F-L
time           3   326635.5833    108878.5278    37.80     <.0001   <.0001   <.0001
time*Group     3    59461.8750     19820.6250     6.88     0.0004   0.0019   0.0012
Error(time)   66   190098.5417      2880.2809

Greenhouse-Geisser Epsilon      0.7204
Huynh-Feldt-Lecoutre Epsilon    0.8016
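The adjusted p values come from referring the same F ratios to an F distribution whose numerator and denominator degrees of freedom are both multiplied by epsilon. The short Python sketch below (illustrative; `adjusted_df` is a name chosen here) shows the degrees of freedom implied by the two epsilons above and recomputes the F ratios from the mean squares.

```python
def adjusted_df(epsilon, num_df, den_df):
    """Degrees of freedom after an epsilon correction for non-sphericity."""
    return epsilon * num_df, epsilon * den_df

gg = adjusted_df(0.7204, 3, 66)    # Greenhouse-Geisser
hfl = adjusted_df(0.8016, 3, 66)   # Huynh-Feldt-Lecoutre

# The F ratios themselves are unchanged by the correction.
f_time = 108878.5278 / 2880.2809
f_interaction = 19820.6250 / 2880.2809

print("G-G df:", gg)
print("H-F-L df:", hfl)
print(round(f_time, 2), round(f_interaction, 2))
```

Because only the reference distribution changes, the correction can only make the p values larger, which is what the G - G and H-F-L columns show for the interaction.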

- Observations
- The sphericity assumption was violated
- With nonspherical data, either use the corrected univariate tests that we described earlier or use the results from the MANOVA test.
- The corrected univariate p values appear under the G - G and H-F-L headers in the output shown above.
- Note that in this case, rANOVA agrees with the MANOVA that there is a statistically significant within-subjects main effect for time, as well as an interaction between treatment and time.
- Further polynomial contrast analysis can be made on time

Analysis of Variance of Contrast Variables
time_N represents the nth degree polynomial contrast for time

Contrast Variable: time_1

Source   DF   Type III SS    Mean Square    F Value   Pr > F
Mean      1   89560.72781    89560.72781    44.23     <.0001
Group     1   20747.22078    20747.22078    10.24     0.0041
Error    22   44552.41504     2025.10977

Contrast Variable: time_2

Source   DF   Type III SS    Mean Square    F Value   Pr > F
Mean      1   186802.0020    186802.0020    37.82     <.0001
Group     1     4428.6429      4428.6429     0.90     0.3539
Error    22   108650.6885      4938.6677

Contrast Variable: time_3

Source   DF   Type III SS    Mean Square    F Value   Pr > F
Mean      1   50272.85354    50272.85354    29.98     <.0001
Group     1   34286.01136    34286.01136    20.44     0.0002
Error    22   36895.43813     1677.06537

More on Contrasts

  proc glm data=short;
    class group;
    model y0 y1 y3 y6 = group / nouni;
    repeated time 4 (0 1 3 6) profile / summary printm nom;
  run;

- PROFILE generates contrasts between adjacent levels of the factor

  proc glm data=short;
    class group;
    model y0 y1 y3 y6 = group / nouni;
    repeated time 4 (0 1 3 6) helmert / summary printm nom;
  run;

- HELMERT generates contrasts between each level of the factor and the mean of subsequent levels.
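To make the two transformations concrete, the sketch below builds unscaled profile and Helmert contrast rows for four time points and applies them to the first subject of Example 3. It is a Python illustration with helper names chosen here; the matrix that PRINTM displays in SAS may be scaled or signed differently.

```python
def profile_contrasts(k):
    """Row i compares level i with level i + 1 (adjacent levels)."""
    return [[1 if j == i else -1 if j == i + 1 else 0 for j in range(k)]
            for i in range(k - 1)]

def helmert_contrasts(k):
    """Row i compares level i with the mean of all later levels."""
    rows = []
    for i in range(k - 1):
        later = k - i - 1
        rows.append([0.0] * i + [1.0] + [-1.0 / later] * later)
    return rows

def apply_contrasts(contrasts, y):
    """Contrast scores for one subject's vector of responses."""
    return [sum(c * v for c, v in zip(row, y)) for row in contrasts]

subject1 = [296, 175, 187, 242]   # y0, y1, y3, y6 for Group 1, Subj 1

profile_scores = apply_contrasts(profile_contrasts(4), subject1)
helmert_scores = apply_contrasts(helmert_contrasts(4), subject1)
print(profile_scores)
print([round(s, 3) for s in helmert_scores])
```

The SUMMARY option then runs a separate univariate ANOVA on each of these contrast scores, one per row of the transformation.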

Day 17: Strategy 6, Mixed Model Approach

- Models with both fixed and random effects are mixed models
- Treatment is usually considered a fixed effect
- The subject factor is a random effect
- Analysis can follow
- Linear mixed models
- Covariance pattern models: the user specifies the covariance structure
- Random coefficient models: these induce a covariance structure

SAS Mixed Repeated Measures Syntax

SAS Mixed Model

  PROC MIXED cl;
    CLASS <classification variables>;
    MODEL <dependent variable> = <fixed sources>;

- cl requests confidence limits for the variance and covariance estimates
- CLASS identifies the variables used as sources of variation and in the subject= option of the REPEATED statement
- MODEL specifies the dependent variable and all fixed sources of variation (treatment, time and their interaction). The ddfm= option of the MODEL statement controls how the denominator degrees of freedom for the various terms are computed.

SAS Mixed Model

  REPEATED / subject=<EU id> type=<covariance structure> r rcorr;

- subject= identifies the experimental unit in the data set on which the repeated measures are taken. It identifies the units that are independent.
- type= identifies the covariance structure
- r requests printing of the covariance matrix for the repeated measures
- rcorr requests printing of the correlation matrix for the repeated measures

Covariance Structures: Independent with common variance

- Equal variances along the main diagonal
- Zero covariances off the diagonal
- Variances constant and residuals independent across time.
- The standard ANOVA model
- Simple, because a single parameter is estimated: the pooled variance

Covariance Structures: Unstructured

- Separate variances on the diagonal
- Separate covariances off the diagonal
- Most complex structure
- A variance is estimated for each time, and a covariance for each pair of times
- With 4 time points, need to estimate 10 parameters: 10 = 4(4+1)/2
- Leads to less precise parameter estimation (degrees of freedom problem)

Covariance Structures: Compound symmetry

- Equal variances on the diagonal
- Equal covariances off the diagonal (equal correlation)
- Simplest structure for fitting repeated measures
- Split-plot in time analysis
- Used for the past 50 years
- Requires estimation of 2 parameters

Covariance Structures: First-order Autoregressive, AR(1)

- Equal variances on the main diagonal
- Each off-diagonal element is the variance multiplied by the correlation raised to an increasing power as the observations become increasingly separated in time.
- Increasing power means decreasing covariances.
- Times must be ordered and equally spaced.
- Estimates 2 parameters

Covariance Structures: First-order Autoregressive Heterogeneous, ARH(1)

- Unequal variances on the main diagonal
- Each off-diagonal element is the product of the two standard deviations multiplied by the correlation raised to an increasing power as the observations become increasingly separated in time.
- Increasing power means decreasing covariances.
- Times must be ordered and equally spaced.
- With 4 time points, estimates 5 parameters (4 variances and 1 correlation)
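The structures described above can be written out as explicit matrices. The following Python sketch is illustrative only: the function names are invented here, and the ARH(1) inputs are the three variances and the correlation reported for the pulse example later in these notes.

```python
def cs(var, cov, t):
    """Compound symmetry: one variance, one common covariance."""
    return [[var if i == j else cov for j in range(t)] for i in range(t)]

def ar1(var, rho, t):
    """AR(1): covariance decays as rho ** lag between equally spaced times."""
    return [[var * rho ** abs(i - j) for j in range(t)] for i in range(t)]

def arh1(variances, rho):
    """ARH(1): AR(1) correlations with a separate variance at each time."""
    sd = [v ** 0.5 for v in variances]
    t = len(sd)
    return [[sd[i] * sd[j] * rho ** abs(i - j) for j in range(t)] for i in range(t)]

def n_params(structure, t):
    """Number of covariance parameters for t time points."""
    return {"un": t * (t + 1) // 2, "cs": 2, "ar(1)": 2, "arh(1)": t + 1}[structure]

matrix = arh1([35.7683, 87.1927, 115.50], 0.5101)
for row in matrix:
    print([round(v, 2) for v in row])
print({s: n_params(s, 4) for s in ("un", "cs", "ar(1)", "arh(1)")})
```

Counting parameters this way makes the trade-off visible: the unstructured matrix grows quadratically with the number of time points, while the patterned structures stay small.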

Strategies for Finding Suitable Covariance Structures

- Run unstructured first
- Next run compound symmetry, the simplest repeated measures structure
- Next try other covariance structures that best fit the experimental design

Criteria for Selecting the Best Covariance Structure

- Need to use model fitting statistics
- AIC: Akaike's Information Criterion
- BIC: Schwarz's Bayesian Criterion
- Let q = # of covariance parameters, p = # of fixed effect parameters in the model, and n = # of observations
- AIC = -2 log(L) + 2q
- BIC = -2 log(L) + q log(n)
- CAIC = -2 log(L) + q (log n + 1)
- The smaller the number, the better
- Goal: a covariance structure that is better than compound symmetry
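As a check on the formulas, the sketch below applies AIC and BIC to the -2 Res Log Likelihood values reported for the pulse example later in these notes. It is a Python illustration, not SAS output. One assumption to flag: the reported BIC values are reproduced when n is taken as 30, the number of subjects, rather than the total number of observations, so that value is used here.

```python
from math import log

def aic(neg2ll, q):
    """AIC = -2 log(L) + 2q."""
    return neg2ll + 2 * q

def bic(neg2ll, q, n):
    """BIC = -2 log(L) + q log(n)."""
    return neg2ll + q * log(n)

# (-2 Res Log Likelihood, number of covariance parameters q) per structure.
fits = {"CS": (590.8, 2), "UN": (577.7, 6), "AR(1)": (590.1, 2), "ARH(1)": (579.8, 4)}
n_subjects = 30   # assumption: n = subjects, which matches the reported BIC values

table = {name: (round(aic(ll, q), 1), round(bic(ll, q, n_subjects), 1))
         for name, (ll, q) in fits.items()}
for name, (a, b) in table.items():
    print(f"{name:7s} AIC={a}  BIC={b}")
```

AIC favours ARH(1) and then the unstructured model here, while BIC, which penalizes the six unstructured parameters more heavily, ranks the unstructured model last.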

- Example 3 revisited: 24 subjects randomly assigned to two groups (Control and Treatment), with responses measured at months 0 (baseline), 1, 3 and 6.

  proc corr data=short cov;
    var y0 y1 y3 y6;
  run;

Pearson Correlation Coefficients, N = 24

       y0     y1     y3     y6
y0   1.00   0.51   0.50   0.07
y1          1.00   0.93   0.67
y3                 1.00   0.65
y6                        1.00

- What type of correlation structure do you think is right?
- Variances are 5456, 13505, 7881, 6929
- Exercise: compare models for this example

Example 4: Exercise pulse study

- Exercise data example
- The data consist of people who were randomly assigned to two different diets (low-fat and not low-fat) and three different types of exercise (at rest, walking leisurely and running). Their pulse rate was measured at three different time points during their assigned exercise: at 1 minute, 15 minutes and 30 minutes.

  data exercise;
    input id exertype diet time1 time2 time3;
    cards;
  1 1 1 85 85 88
  2 1 1 90 92 93
  3 1 1 97 97 94
  4 1 1 80 82 83
  5 1 1 91 92 91
  6 1 2 83 83 84
  7 1 2 87 88 90
  8 1 2 92 94 95
  9 1 2 97 99 96
  10 1 2 100 97 100

Example

- Let's look at the correlations, variances and covariances for the exercise data.
- Since we cannot use this kind of covariance structure in a traditional repeated measures analysis, we will use SAS PROC MIXED for such an analysis.

  proc corr data=exercise cov;
    var time1 time2 time3;
  run;

Pearson Correlation Coefficients, N = 30

         time1     time2     time3
time1  1.00000   0.54454   0.51915
time2  0.54454   1.00000   0.85028
time3  0.51915   0.85028   1.00000
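The PROC MIXED runs that follow read a data set named long, which presumably holds one row per subject per time point with the response in a single pulse column. The notes do not show how it was built, so the Python sketch below only illustrates the wide-to-long reshaping on three of the rows listed earlier; the function name and the dictionary layout are choices made for this example.

```python
# Three of the wide-format rows: (id, exertype, diet, time1, time2, time3).
wide = [
    (1, 1, 1, 85, 85, 88),
    (2, 1, 1, 90, 92, 93),
    (6, 1, 2, 83, 83, 84),
]

def wide_to_long(rows):
    """One output record per subject per time point."""
    long_rows = []
    for sid, exertype, diet, *pulses in rows:
        for time, pulse in enumerate(pulses, start=1):
            long_rows.append({"id": sid, "exertype": exertype, "diet": diet,
                              "time": time, "pulse": pulse})
    return long_rows

long_rows = wide_to_long(wide)
for record in long_rows:
    print(record)
```

PROC GLM with a REPEATED statement works from the wide layout, whereas PROC MIXED needs the long one, which is why both data sets appear in these notes.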

Example: compound symmetry

  proc mixed data=long;
    class exertype time;
    model pulse = exertype time exertype*time;
    repeated time / subject=id type=cs;
  run;

Fit Statistics
-2 Res Log Likelihood        590.8
AIC (smaller is better)      594.8
AICC (smaller is better)     595.0
BIC (smaller is better)      597.6

Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
 1   15.36        <.0001

Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
exertype   2        27       27.00     <.0001
time       2        54       23.54     <.0001

Example: unstructured

  proc mixed data=long;
    class exertype time;
    model pulse = exertype time exertype*time;
    repeated time / subject=id type=un;
  run;

Fit Statistics
-2 Res Log Likelihood        577.7
AIC (smaller is better)      589.7
AICC (smaller is better)     590.9
BIC (smaller is better)      598.1

Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
 5   28.46        <.0001

Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
exertype   2        27       27.00     <.0001
time       2        27       22.32     <.0001

Example: AR(1)

  proc mixed data=long;
    class exertype time;
    model pulse = exertype time exertype*time;
    repeated time / subject=id type=ar(1);
  run;

Fit Statistics
-2 Res Log Likelihood        590.1
AIC (smaller is better)      594.1
AICC (smaller is better)     594.3
BIC (smaller is better)      596.9

Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
 1   16.08        <.0001

Type 3 Tests of Fixed Effects
Effect          Num DF   Den DF   F Value   Pr > F
exertype        2        27       28.39     <.0001
time            2        54       18.20     <.0001
exertype*time   4        54       11.73     <.0001

Example: ARH(1)

  proc mixed data=long;
    class exertype time;
    model pulse = exertype time exertype*time;
    repeated time / subject=id type=arh(1);
  run;

Covariance Parameter Estimates
Cov Parm   Subject   Estimate
Var(1)     id        35.7683
Var(2)     id        87.1927
Var(3)     id        115.50
ARH(1)     id        0.5101

Fit Statistics
-2 Res Log Likelihood        579.8
AIC (smaller is better)      587.8
AICC (smaller is better)     588.3
BIC (smaller is better)      593.4

Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq

Example: model comparison

Columns: AIC; -2RLL; Parms (df + 1); Diff in -2RLL (vs. CS); Diff in df (vs. CS); p value for Diff (from a chi-square distribution)

Model                                    AIC     -2RLL   Parms   Diff -2RLL   Diff df   p value
Compound Symmetry                        594.8   590.8   2
Unstructured                             589.7   577.7   6       13.1         4         .01
Autoregressive                           594.1   590.1   2       .7           0         na
Autoregressive Heterogeneous Variances   587.8   579.8   4       11           2         0.027

The two most promising structures are Autoregressive Heterogeneous Variances and Unstructured, since these two models have the smallest AIC values and their -2 Log Likelihood scores are significantly smaller than the -2 Log Likelihood scores of the other models.
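The p values in the comparison come from referring the drop in -2RLL to a chi-square distribution with df equal to the difference in the number of covariance parameters. For even df the upper-tail probability has a simple closed form, which the Python sketch below uses (illustrative only; `chi2_sf_even_df` is a helper written for this example, not a library function).

```python
from math import exp

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return exp(-half) * total

# Differences in -2 Res Log Likelihood relative to compound symmetry.
p_un = chi2_sf_even_df(590.8 - 577.7, 4)    # unstructured: 6 - 2 = 4 extra parameters
p_arh = chi2_sf_even_df(590.8 - 579.8, 2)   # ARH(1): 4 - 2 = 2 extra parameters

print(round(p_un, 4), round(p_arh, 4))
```

Under these assumptions the unstructured comparison gives about .011, in line with the .01 in the table. For ARH(1) the 2-df calculation gives about .004, whereas a value near .027 is what 4 df would produce, so the tabulated p value for that row may be worth rechecking.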

RM with two group factors

- Looking at models including only diet or exertype separately does not answer all our questions. We would also like to know if the people on the low-fat diet who engage in running have lower pulse rates than the people on the not low-fat diet who are not running. In order to address these types of questions we need to look at a model that includes the interaction of diet and exertype.

  proc mixed data=long;
    class diet exertype time;
    model pulse = exertype|diet|time;
    repeated time / subject=id type=arh(1);
  run;
  quit;

  proc glm data=exercise;
    class diet exertype;
    model time1 time2 time3 = diet|exertype;
    repeated time 3;
  run;
  quit;

Group comparison in Proc Mixed

- If we would like to look at the differences among groups at each level of another variable, we have to use the lsmeans statement with the slice option.
- For example, we could test for differences among the exertype groups at each level of diet across all levels of time; we could test for differences among the exertype groups at each time point across both levels of diet; or we could test for differences among the exertype groups for each combination of time and diet levels.

  proc mixed data=long;
    class diet exertype time;
    model pulse = exertype|diet|time;
    repeated time / subject=id type=arh(1);
    lsmeans diet*exertype / slice=diet;            /* differences among exertype for each level of diet */
    lsmeans exertype*time / slice=time;            /* differences in exertype for each time point */
    lsmeans exertype*diet*time / slice=time*diet;
  run;
  quit;

Worked Example from JL Text Book (Self Read)

Subject   time 1   time 2   time 3   time 4   time 5   summary
1         y11      y12      y13      y14      y15      f(y11, ..., y15)
2         y21      y22      y23      y24      y25      f(y21, ..., y25)
...
n         yn1      yn2      yn3      yn4      yn5      f(yn1, ..., yn5)

Summarizing with a function over time removes the correlation.

Growth Curve Approach

- 2^3 factorial design with temperature, moisture and soil type
- Each combination of factor levels was randomly assigned to two pots of soil
- Samples of soil were taken on days 0, 7, 14, 30 and 60
- Concentration of herbicide was measured for each sample
- The sphericity condition does not hold

- x_i = day, n = 5; y_i = log(concentration); summary f = k = slope (computed from sums over the five days)
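The slope summary is the ordinary least-squares slope of log(concentration) on day, computed separately for each pot. The worked numbers from the textbook example are not reproduced in these notes, so the Python sketch below uses hypothetical concentrations that decay exponentially at a rate of 0.02 per day, purely to show the calculation; `ls_slope` is a name chosen here.

```python
from math import exp, log

days = [0, 7, 14, 30, 60]   # sampling days from the design above

def ls_slope(x, y):
    """Least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical pot: concentration starts at 100 and decays at rate 0.02 per day.
concentration = [100 * exp(-0.02 * d) for d in days]
slope = ls_slope(days, [log(c) for c in concentration])
print(round(slope, 4))
```

Each pot then contributes one slope, and the slopes can be analysed with an ordinary ANOVA for the 2^3 factorial, which sidesteps the sphericity problem.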

If you don't have a summary function, proc glm can summarize with orthogonal polynomials over time.

Linear orthogonal polynomial over time

Quadratic orthogonal polynomial over time