Analysis of ordinal repeated categorical response data by using marginal model (Maximum likelihood approach)

1
Analysis of ordinal repeated categorical response data by using marginal model (Maximum likelihood approach)
by Abdul Salam
Instructor: K.C. Carriere
Stat 562
2
Contents

Introduction
Background of data
Objective of the study
Basic theory
Marginal model
Model fitting using ML
SAS Codes
Results
Conclusion

3
Introduction

Definition
Categorical data
Repeated categorical data
Advantages and disadvantages of repeated measurements designs

4
Definition

Categorical data
Categorical data fits into a small number of
discrete categories (as opposed to continuous).
Categorical data is either non-ordered (nominal)
such as gender or city, or ordered (ordinal) such
as high, medium, or low temperatures.

5
Definition (cont.)

Repeated categorical data
The term repeated measurements refers broadly
to data in which the response of each
experimental unit or subject is observed on
multiple occasions or under multiple conditions.
When the response is categorical then it is
called repeated categorical data.

6
Definition (cont.)

Application of repeated categorical data
Repeated categorical response data occur commonly
in health-related applications, especially in
longitudinal studies. For example, a physician
might evaluate patients at weekly intervals
regarding whether a new drug treatment is
successful. In some cases the explanatory variables
also vary over time.

7
Advantages of Repeated Measurements Designs

Individual patterns of change.
They provide more efficient estimates of relevant
parameters than cross-sectional designs with the
same number and pattern of measurements.
Between-subject sources of variability can be
excluded from the experimental error.

8
Disadvantages of Repeated Measurements Designs

Analysis of repeated data is complicated by the
dependence among the repeated observations made
on the same experimental unit.
Often the investigator cannot control the
circumstances for obtaining measurements, so that
the data may be unbalanced or partially
incomplete.

9
Background of Insomnia data

A randomized, double-blind clinical trial was
performed to compare an active hypnotic drug with
a placebo in patients who have insomnia problems.
The outcome variable is the patient's response to
the question "How quickly did you fall asleep
after going to bed?", measured using the
categories <20 minutes, 20-30 minutes, 30-60
minutes, and >60 minutes. Patients were asked
this question before and following a two-week
treatment period.

10
Background of Insomnia data

Patients were randomly assigned to one of the two
treatments, active or placebo, which form a binary
explanatory variable. Patients receiving the two
treatments were independent samples.

11
Table 1. Time to falling asleep, by treatment and occasion (n = 239).

                       Time to falling asleep (follow-up)
Treatment  Initial     <20 min   20-30 min   30-60 min   >60 min
Active     <20 min        7          4           1          0
           20-30 min     11          5           2          2
           30-60 min     13         23           3          1
           >60 min        9         17          13          8
Placebo    <20 min        7          4           2          1
           20-30 min     14          5           1          0
           30-60 min      6          9          18          2
           >60 min        4         11          14         22
12
Objectives

To study the effect of time on the response.
To study the effect of treatment on the response.
Is the time to fall asleep shorter for the active
treatment than for placebo?
Is there any interaction between treatment and
time? How does the treatment affect the time to
fall asleep over time?

13
Pharmaceutical Company Interest
The company hopes that patients on the active
treatment have a significantly higher rate of
improvement than patients on placebo.
14
Generalized linear model approaches to the analysis of
repeated measurements designs

Marginal Models
Random Effect Models
Transition models.

15
Basic Theory
16
GLMs for ordinal response.

Extensions of generalized linear model
methodology for the analysis of repeated
measurements accommodate discrete or continuous,
time-independent or time-dependent covariates.
GLMs have three components:
A random component, which identifies the response
variable Y and its probability distribution.
A systematic component, which specifies the
explanatory variables used in a linear predictor
function.
A link function, which specifies the functional
relationship between the systematic component and
E(Y).

17
Random Component.

Since the response is ordinal, it is often
advantageous to construct logits that account for
the category ordering and are less affected by the
number or choice of response categories. These are
the cumulative logits, which are defined from the
cumulative response probabilities. Consider an
ordinal response with c + 1 ordered categories
labeled 0, 1, 2, ..., c for each individual or
experimental unit, with category probabilities
$\pi_0, \pi_1, \ldots, \pi_c$. The cumulative
response probabilities are

$\gamma_j = P(Y \le j) = \pi_0 + \pi_1 + \cdots + \pi_j$, j = 0, 1, ..., c. Thus $\gamma_c = 1$.

18
Systematic component.

The systematic component of the generalized
linear model specifies the explanatory variables.
The linear combination of these explanatory
variables is called the linear predictor, denoted
by

$\eta = x'\beta$.

The vector $\beta$ characterizes how the
cross-sectional response distribution depends on
the explanatory variables.
19
Link Function.

The link function describes the relationship
between the random and systematic components, that
is, how the mean $E(Y)$ relates to the explanatory
variables in the linear predictor. For an ordinal
response having c + 1 categories, one might use the
cumulative logit

$\text{logit}_j = \text{logit}[P(Y < j)] = \log \frac{P(Y < j)}{P(Y \ge j)}$, j = 1, ..., c.
20
Link Function.
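where, in general,
$\text{logit}_j = \alpha_j + x'\beta_j$.
If the GLM is simplified to the proportional odds model,
$\beta_j$ simplifies to $\beta$, indicating the same
effect for each logit. The proportional odds
model is
$\text{logit}[P(Y < j)] = \alpha_j + x'\beta$, for j = 1, ..., c.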
21
Link Function.
For individuals with covariate vectors $x$ and $x^*$,
the odds ratio for the response below category j
is
$\frac{P(Y < j \mid x) / P(Y \ge j \mid x)}{P(Y < j \mid x^*) / P(Y \ge j \mid x^*)} = \exp\{(x - x^*)'\beta\}$.
The odds ratio does not depend on the response
category j. Taking logs gives $(x - x^*)'\beta$, so
each regression coefficient is the difference in
logit (log odds) of the response variable per unit
change in the corresponding x.
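
As a rough illustration of cumulative logits and cumulative odds ratios, the sketch below uses the sample marginal proportions at follow-up from the insomnia data. It assumes categories labeled 0, ..., c and logits for the response below category j; the function name `cumulative_logits` is hypothetical. The three differences are sample log odds ratios, one per cutpoint; a proportional odds model summarizes them with a single coefficient.

```python
import math

def cumulative_logits(probs):
    """Return logit[P(Y < j)] for j = 1..c, given probabilities for categories 0..c."""
    logits = []
    cum = 0.0
    for p in probs[:-1]:          # the last cumulative probability is 1, so it has no logit
        cum += p
        logits.append(math.log(cum / (1.0 - cum)))
    return logits

# Sample marginal proportions at follow-up (<20, 20-30, 30-60, >60 minutes)
active = [0.336, 0.412, 0.160, 0.092]
placebo = [0.258, 0.242, 0.292, 0.208]

# Sample log odds ratio (active vs placebo) at each of the three cutpoints
diffs = [a - b for a, b in zip(cumulative_logits(active), cumulative_logits(placebo))]
for j, d in enumerate(diffs, start=1):
    print(f"cutpoint j={j}: log odds ratio = {d:.3f}, odds ratio = {math.exp(d):.2f}")
```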
22
Maximum Likelihood Method (ML).

The standard approach to maximum likelihood (ML)
fitting of marginal models involves solving the
score equations using the Newton-Raphson method,
Fisher scoring, or some other iteratively
reweighted least squares algorithm. ML fitting of
marginal logit models is awkward. For T
observations on an I-category response, at each
setting of predictors the likelihood refers to $I^T$
multinomial joint probabilities, but the model
applies to T sets of marginal multinomial
parameters, and the marginal multinomial
variates cannot be assumed to be independent.
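
To make the size mismatch concrete, the following sketch counts the joint multinomial cells and the marginal probabilities per predictor setting. The function names are hypothetical; the formulas are simply $I^T$ and $T(I-1)$.

```python
def joint_cells(categories, occasions):
    """Number of joint multinomial cells, I**T, at one predictor setting."""
    return categories ** occasions

def marginal_parameters(categories, occasions):
    """Number of free marginal probabilities, T * (I - 1), at one predictor setting."""
    return occasions * (categories - 1)

# I = 4 response categories, as in the insomnia data (which has T = 2 occasions)
for T in range(2, 7):
    print(f"T={T}: joint cells = {joint_cells(4, T)}, "
          f"marginal parameters = {marginal_parameters(4, T)}")
```

The joint table grows exponentially in T while the marginal description grows only linearly.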

23
ML Model Specification.

Consider T categorical responses, where the
t-th variable has $I_t$ categories. The ordinal
responses are observed for P covariate patterns,
defined by a set of explanatory variables. Let
$r = \prod_{t=1}^{T} I_t$
denote the number of response profiles for each
covariate pattern. The vector of counts for
covariate pattern p is denoted by $Y_p$. The $Y_p$ are
assumed to be independent multinomial random
vectors,
$Y_p \sim \text{Multinomial}(n_p, \pi_p)$, $1_r^T \pi_p = 1$, p = 1, ..., P,
24
ML Model Specification.

where $n_p$ is the number of subjects with covariate
pattern p, $\pi_p$ is a vector of positive
probabilities, and $1_r$ is an r-dimensional vector
of 1s. Since the model applies to T sets of marginal
multinomial parameters, the marginal models can
be written as a generalized linear model with the
link function
$C \log(A\pi) = X\beta$,
where $\pi$ stacks the $\pi_p$, the matrix A sums joint
probabilities into marginal probabilities, and C
forms the logit contrasts.

25
ML Fitting of marginal Models
Lang and Agresti (1994) considered the likelihood
as a function of $\pi$ rather than $\beta$. The likelihood
function for a marginal logit model is the
product of the multinomial mass functions from
the various predictor settings. One approach for
ML fitting views the model as a set of
constraints and uses methods for maximizing a
function subject to constraints.
26
ML Fitting of marginal Models
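Let $\theta$ be a vector having elements $\pi$ and the
Lagrange multipliers $\lambda$. The Lagrangian
likelihood equations have the form
$h(\theta) = 0$,
where $h(\theta)$
is a vector with terms involving the contrasts in
marginal logits that the model specifies as
constraints, as well as log-likelihood derivatives.
The Newton-Raphson iterative scheme is
$\theta^{(k+1)} = \theta^{(k)} - [\partial h(\theta^{(k)}) / \partial \theta]^{-1} h(\theta^{(k)})$, k = 0, 1, 2, ...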
27
ML Fitting of marginal Models
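After obtaining the fitted values $\hat\pi$ on convergence
of the algorithm, they calculate model parameter
estimates using
$\hat\beta = (X'X)^{-1} X' C \log(A\hat\pi)$,
which solves the model equation $C \log(A\pi) = X\beta$
at the fitted values (A and C form the marginal
probabilities and their logit contrasts, and X is
the design matrix).
This maximum likelihood fitting method makes no
assumption about the model that describes the
joint distribution. Thus, when the marginal
model holds, the ML estimates are consistent
regardless of the dependence structure for that
distribution.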
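
The Newton-Raphson idea behind this fitting can be illustrated on a much smaller problem than the constrained marginal model. The sketch below is not the Lang and Agresti algorithm; it assumes a single binomial logit $\theta$ and solves its score equation $h(\theta) = y - n\,\text{expit}(\theta) = 0$, using the 12 of 119 active-treatment patients who fell asleep in <20 minutes at the initial occasion. All function names are hypothetical.

```python
import math

def newton_raphson(h, dh, theta, tol=1e-10, max_iter=50):
    """Solve h(theta) = 0 by iterating theta <- theta - h(theta) / h'(theta)."""
    for _ in range(max_iter):
        step = h(theta) / dh(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

y, n = 12, 119  # successes and trials

def score(theta):
    """Derivative of the binomial log-likelihood with respect to the logit theta."""
    return y - n / (1.0 + math.exp(-theta))

def score_derivative(theta):
    """Derivative of the score: -n * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return -n * p * (1.0 - p)

theta_hat = newton_raphson(score, score_derivative, theta=0.0)
print(f"ML estimate of the logit: {theta_hat:.4f}")
print(f"closed form log(y / (n - y)): {math.log(y / (n - y)):.4f}")
```

In the marginal model the same update is applied to a vector $\theta$, with the scalar derivative replaced by a matrix of partial derivatives.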
28
Inference

Hypothesis testing for parameters
After obtaining the model parameter estimates and
their estimated covariance matrix, one can apply
standard methods of inference, for instance a Wald
chi-squared test for marginal homogeneity.
Goodness-of-fit test
To assess model goodness of fit, one can compare
observed and fitted cell counts using the
likelihood-ratio statistic $G^2$ or the Pearson
chi-squared statistic. For nonsparse tables,
assuming that the model holds, these statistics
have approximate chi-squared distributions with
degrees of freedom equal to the number of
constraints implied by the marginal model.

29
Limitations of ML

The number of multinomial probabilities increases
dramatically as the number of predictors
increases.
ML approaches are not practical when T is large
or there are many predictors, especially when
some are continuous.
It does not make any assumption about the model
that describes the joint distribution.

30
Results
                        Time to falling asleep
Treatment  Occasion    <20 min   20-30 min   30-60 min   >60 min
Active     Initial      0.101      0.168       0.336      0.395
           Follow-up    0.336      0.412       0.160      0.092
Placebo    Initial      0.117      0.167       0.292      0.425
           Follow-up    0.258      0.242       0.292      0.208

Table 2. Sample marginal proportions for the insomnia data.
31
Figure 1. Sample marginal proportions for the insomnia data.
32
Marginal Proportion

The sample proportion of time to falling asleep in
<20 minutes for subjects who received the active
treatment at the initial occasion is
(7 + 4 + 1 + 0) / (7 + 4 + 1 + 0 + 11 + ... + 13 + 8)
= 12/119 = 0.1008.
Similarly, the sample proportion of time to
falling asleep in >60 minutes for subjects who
received placebo at follow-up is
(1 + 0 + 2 + 22) / (7 + 4 + 2 + 1 + ... + 14 + 22)
= 25/120 = 0.20833.
And so on.
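
The same arithmetic can be carried out for every cell of the marginal proportion table. The sketch below assumes the Table 1 counts arranged as one 4 x 4 array per treatment (rows for the initial category, columns for the follow-up category); the names `TABLE1` and `marginal_proportions` are hypothetical.

```python
# Counts from Table 1: rows = initial category, columns = follow-up category,
# both ordered <20, 20-30, 30-60, >60 minutes.
TABLE1 = {
    "Active": [[7, 4, 1, 0], [11, 5, 2, 2], [13, 23, 3, 1], [9, 17, 13, 8]],
    "Placebo": [[7, 4, 2, 1], [14, 5, 1, 0], [6, 9, 18, 2], [4, 11, 14, 22]],
}

def marginal_proportions(table):
    """Return (initial, follow_up) marginal proportions of a square count table."""
    n = sum(sum(row) for row in table)
    initial = [sum(row) / n for row in table]            # row totals / n
    follow_up = [sum(col) / n for col in zip(*table)]    # column totals / n
    return initial, follow_up

for treatment, table in TABLE1.items():
    initial, follow_up = marginal_proportions(table)
    print(treatment, "initial  ", [round(p, 3) for p in initial])
    print(treatment, "follow-up", [round(p, 3) for p in follow_up])
```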

33
What do we learn from the marginal proportion table?

From the initial to the follow-up occasion, time to
falling asleep seems to shift downward for both
treatments.
The degree of shift seems greater for the active
treatment than for placebo, indicating possible
interaction. In other words, the effect of
treatment on the response differs between
occasions.

34
Fitted Marginal Model

Let x represent the treatment, with x = 1 for the
active treatment and x = 0 for the placebo. Let t
denote the occasion of measurement, with t = 0 for
initial and t = 1 for follow-up. Let $Y_t$ represent
the outcome variable, which is the patient's
response at time t to the question "How quickly
did you fall asleep after going to bed?", with
j = 0 for <20 minutes, j = 1 for 20-30 minutes,
j = 2 for 30-60 minutes, and j = 3 for >60 minutes.
The marginal model with cumulative logit link
can be written for our data set as

$\text{logit}[P(Y_t < j)] = \alpha_j + \beta_1 t + \beta_2 x + \beta_3 (t \times x)$, j = 1, 2, 3.
35
SAS code
data isomnia;
  input treatment $ initial $ follow $ count @@;
  /* replace sampling zeros by a tiny count so that empty profiles are kept */
  if count = 0 then count = 1E-8;
  datalines;
active  lt20  lt20   7  active  lt20  20-30  4  active  lt20  30-60  1  active  lt20  gt60  0
active  20-30 lt20  11  active  20-30 20-30  5  active  20-30 30-60  2  active  20-30 gt60  2
active  30-60 lt20  13  active  30-60 20-30 23  active  30-60 30-60  3  active  30-60 gt60  1
active  gt60  lt20   9  active  gt60  20-30 17  active  gt60  30-60 13  active  gt60  gt60  8
placebo lt20  lt20   7  placebo lt20  20-30  4  placebo lt20  30-60  2  placebo lt20  gt60  1
placebo 20-30 lt20  14  placebo 20-30 20-30  5  placebo 20-30 30-60  1  placebo 20-30 gt60  0
placebo 30-60 lt20   6  placebo 30-60 20-30  9  placebo 30-60 30-60 18  placebo 30-60 gt60  2
placebo gt60  lt20   4  placebo gt60  20-30 11  placebo gt60  30-60 14  placebo gt60  gt60 22
;
36
SAS code
proc catmod order=data data=isomnia;
  weight count;
  population treatment;
  response clogit;
  /* columns 1-3: cutpoints a1-a3; 4: treatment; 5: time; 6: time*treatment */
  model initial*follow =
    (1 0 0 1 1 1,   /* a1 + trt + time + trt*time   active follow,   j=1 */
     0 1 0 1 1 1,   /* a2 + trt + time + trt*time   active follow,   j=2 */
     0 0 1 1 1 1,   /* a3 + trt + time + trt*time   active follow,   j=3 */
     1 0 0 1 0 0,   /* a1 + trt                     active initial,  j=1 */
     0 1 0 1 0 0,   /* a2 + trt                     active initial,  j=2 */
     0 0 1 1 0 0,   /* a3 + trt                     active initial,  j=3 */
     1 0 0 0 1 0,   /* a1 + time                    placebo follow,  j=1 */
     0 1 0 0 1 0,   /* a2 + time                    placebo follow,  j=2 */
     0 0 1 0 1 0,   /* a3 + time                    placebo follow,  j=3 */
     1 0 0 0 0 0,   /* a1                           placebo initial, j=1 */
     0 1 0 0 0 0,   /* a2                           placebo initial, j=2 */
     0 0 1 0 0 0)   /* a3                           placebo initial, j=3 */
    (1 2 3='Cutpoint', 4='Treatment', 5='Time effect',
     6='Time*Treatment effect') / freq;
quit;
37
Fitted Marginal Model

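Fitting the marginal model to the above marginal
distributions by the maximum likelihood method gave
the following results:

$\text{logit}[\hat{P}(Y_t < j)] = \hat\alpha_j + 1.074\,(\text{Occasion}) + 0.046\,(\text{Treatment}) + 0.662\,(\text{Occasion} \times \text{Treatment})$,

with estimated cutpoints $\hat\alpha_1 = -1.16$, $\hat\alpha_2 = 0.10$, $\hat\alpha_3 = 1.37$.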

38
Hypothesis testing for estimators

For Occasion
$\hat\beta_1 = 1.074$, S.E.($\hat\beta_1$) = 0.162, p-value < 0.0001
For Treatment
$\hat\beta_2 = 0.046$, S.E.($\hat\beta_2$) = 0.236, p-value = 0.84
For interaction (Occasion x Treatment)
$\hat\beta_3 = 0.662$, S.E.($\hat\beta_3$) = 0.244, p-value = 0.00665
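
These p-values are consistent with two-sided Wald tests, which divide each estimate by its standard error and refer the ratio to a standard normal distribution. The sketch below assumes that is how they were obtained; `wald_test` is a hypothetical helper and the inputs are the rounded values reported on this slide.

```python
import math

def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) for H0: parameter = 0, using the normal approximation."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

ESTIMATES = {
    "Occasion": (1.074, 0.162),
    "Treatment": (0.046, 0.236),
    "Occasion x Treatment": (0.662, 0.244),
}

for name, (estimate, std_error) in ESTIMATES.items():
    z, p = wald_test(estimate, std_error)
    print(f"{name}: z = {z:.2f}, p-value = {p:.4g}")
```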

39
Model Goodness of fit test

The likelihood-ratio statistic ($G^2$) was used for
the goodness-of-fit test. ML model fitting,
comparing the observed to the fitted cell counts in
modeling the 12 marginal logits using these six
parameters (df = 12 - 6 = 6), gives $G^2 = 8.0$ and
p-value = 0.238, indicating that the model fits the
given data set well.
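
The reported p-value can be checked without statistical tables, because the upper tail of a chi-squared distribution with even degrees of freedom has a closed form, $P(X > x) = e^{-x/2} \sum_{k=0}^{df/2-1} (x/2)^k / k!$. The sketch below applies it to the reported statistic; `chi2_sf_even_df` is a hypothetical helper.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0           # k = 0 term of the sum
    for k in range(1, df // 2):
        term *= half / k             # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

df = 12 - 6                          # 12 marginal logits minus 6 parameters
p_value = chi2_sf_even_df(8.0, df)
print(f"G2 = 8.0, df = {df}, p-value = {p_value:.3f}")
```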

40
Interpretation of Parameters

Effect of Treatment (Active vs Placebo)
1. At initial observation
The estimated odds that the time to falling
asleep for the active treatment is below any
fixed level equal $\exp(0.046) \approx 1.05$ times the
estimated odds for the placebo treatment.
2. At follow-up observation
The estimated odds that the time to falling
asleep for the active treatment is below any
fixed level equal $\exp(0.046 + 0.662) = 2.03$ times the
estimated odds for the placebo treatment.
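
Both odds ratios come from exponentiating the treatment effect at the relevant occasion, $\exp(\beta_2 + \beta_3 t)$. The sketch below evaluates this from the rounded coefficients reported on the slides; the constant and function names are hypothetical.

```python
import math

BETA_TREATMENT = 0.046     # treatment effect at the initial occasion (t = 0)
BETA_INTERACTION = 0.662   # additional treatment effect at follow-up (t = 1)

def treatment_odds_ratio(t):
    """Cumulative odds ratio, active vs placebo, at occasion t (0 = initial, 1 = follow-up)."""
    return math.exp(BETA_TREATMENT + BETA_INTERACTION * t)

for t, label in [(0, "initial"), (1, "follow-up")]:
    print(f"{label}: odds ratio = {treatment_odds_ratio(t):.3f}")
```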

41
Interpretation of Parameters (cont.)

For the active treatment the slope is $\hat\beta_3 = 0.662$
(SE = 0.244) higher than for the placebo, giving
strong evidence of faster improvement. In other
words, initially the two treatments had similar
effects, but at follow-up the patients on the
active treatment tended to fall asleep more
quickly.

42
Conclusion

Fitting the marginal model by maximum likelihood
to the insomnia data set, we have sufficient
evidence to conclude that treatment and time have
substantial effects on the response (time to fall
asleep).

43
Thank You For Your Attention
