Analysis of ordinal repeated categorical response data by using marginal model (Maximum likelihood approach) - PowerPoint PPT Presentation

About This Presentation
Slides: 44
Provided by: aa289



1
Analysis of ordinal repeated categorical response data by using marginal model (Maximum likelihood approach)
by Abdul Salam
Instructor: K.C. Carriere
Stat 562
2
Contents
  • Introduction
  • Background of data
  • Objective of the study
  • Basic theory
  • Marginal model
  • Model fitting using ML
  • SAS Codes
  • Results
  • Conclusion

3
Introduction
  • Definition
  • Categorical data
  • Repeated categorical data
  • Advantages and Disadvantages of repeated
    Measurements Designs

4
Definition
  • Categorical data
  • Categorical data fall into a small number of
    discrete categories (as opposed to continuous data).
    Categorical data are either non-ordered (nominal),
    such as gender or city, or ordered (ordinal), such
    as high, medium, or low temperature.

5
Definition (cont.)
  • Repeated categorical data
  • The term repeated measurements refers broadly
    to data in which the response of each
    experimental unit or subject is observed on
    multiple occasions or under multiple conditions.
    When the response is categorical, the data are
    called repeated categorical data.

6
Definition (cont.)
  • Applications of repeated categorical data
  • Repeated categorical response data occur commonly
    in health-related applications, especially in
    longitudinal studies. For example, a physician
    might evaluate patients at weekly intervals
    regarding whether a new drug treatment is
    successful. In some cases, explanatory variables
    also vary over time.

7
Advantages of Repeated Measurements Designs
  • They reveal individual patterns of change.
  • They provide more efficient estimates of relevant
    parameters than cross-sectional designs with the
    same number and pattern of measurements.
  • Between-subject sources of variability can be
    excluded from the experimental error.

8
Disadvantages of Repeated Measurements Designs
  • Analysis of repeated data is complicated by the
    dependence among the repeated observations made
    on the same experimental unit.
  • Often the investigator cannot control the
    circumstances for obtaining measurements, so the
    data may be unbalanced or partially incomplete.

9
Background of Insomnia data
  • A randomized, double-blind clinical trial was
    performed to compare an active hypnotic drug with
    a placebo in patients who have insomnia problems.
    The outcome variable is the patient's response to
    the question "How quickly did you fall asleep
    after going to bed?", measured using the
    categories <20 minutes, 20-30 minutes, 30-60
    minutes, and >60 minutes. Patients were asked
    this question before and following a two-week
    treatment period.

10
Background of Insomnia data
  • Patients were randomly assigned to one of two
    treatments, active or placebo, which form a binary
    explanatory variable. Patients receiving the two
    treatments were independent samples.

11
Table 1. Time to Falling Asleep, by Treatment and Occasion (n = 239)

                          Time to falling asleep at follow-up
Treatment  Initial      <20 min  20-30 min  30-60 min  >60 min
Active     <20 min          7        4          1         0
           20-30 min       11        5          2         2
           30-60 min       13       23          3         1
           >60 min          9       17         13         8
Placebo    <20 min          7        4          2         1
           20-30 min       14        5          1         0
           30-60 min        6        9         18         2
           >60 min          4       11         14        22
12
Objectives
  • To study the effect of time on the response.
  • To study the effect of treatment on the response:
    is the time to fall asleep shorter for the active
    treatment than for placebo?
  • Is there any interaction between treatment and
    time? How does the effect of treatment on the time
    to fall asleep change over time?

13
Pharmaceutical Company Interest
The company hopes that patients on the active
treatment have a significantly higher rate of
improvement than patients on placebo.
14
Generalized Linear Models for the Analysis of
Repeated Measurements Designs
  • Marginal models
  • Random effects models
  • Transition models

15
Basic Theory
16
GLMs for ordinal response
  • Extensions of generalized linear model
    methodology for the analysis of repeated
    measurements accommodate discrete or continuous,
    time-independent or time-dependent covariates.
    GLMs have three components: a random component,
    which identifies the response variable Y and its
    probability distribution; a systematic component,
    which specifies the explanatory variables used in
    a linear predictor function; and a link function,
    which specifies the functional relationship
    between the systematic component and E(Y).

17
Random Component.
  • Since the response is ordinal, it is often
    advantageous to construct logits that account for
    the category ordering and are less affected by the
    number or choice of response categories. These are
    built from the cumulative response probabilities,
    from which the cumulative logits are defined.
    Consider an ordinal response with c + 1 ordered
    categories, labeled 1, 2, ..., c + 1, for each
    individual or experimental unit, with category
    probabilities $\pi_1, \ldots, \pi_{c+1}$. The
    cumulative response probabilities are

$$P(Y \le j) = \pi_1 + \pi_2 + \cdots + \pi_j, \quad j = 1, 2, \ldots, c + 1.$$

Thus

$$P(Y \le 1) \le P(Y \le 2) \le \cdots \le P(Y \le c + 1) = 1.$$
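As a quick numerical check of this definition, the Python sketch below accumulates category probabilities into cumulative probabilities. It uses the active-treatment, initial-occasion proportions from Table 2 purely as an illustrative input; the function name `cumulative_probs` is hypothetical.

```python
from itertools import accumulate

def cumulative_probs(pi):
    """Return P(Y <= j) for j = 1, ..., c+1 from category probabilities pi."""
    if abs(sum(pi) - 1.0) > 1e-6:
        raise ValueError("category probabilities must sum to 1")
    return list(accumulate(pi))   # running sums pi_1 + ... + pi_j

# Active treatment, initial occasion (Table 2): <20, 20-30, 30-60, >60 minutes
pi_active_initial = [0.101, 0.168, 0.336, 0.395]
gamma = cumulative_probs(pi_active_initial)
print([round(g, 3) for g in gamma])   # [0.101, 0.269, 0.605, 1.0]
```

The running sums are nondecreasing and the last one equals 1, which is why only the first c cumulative probabilities carry information.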

18
Systematic component.
  • The systematic component of the generalized
    linear model specifies the explanatory variables.
    The linear combination of these explanatory
    variables is called the linear predictor, denoted
    by

$$\eta = \mathbf{x}^{\top}\boldsymbol{\beta}.$$

The vector β characterizes how the
cross-sectional response distribution depends on
the explanatory variables.
19
Link Function.
  • The link function explains the relationship
    between the random and systematic components, that
    is, how $\mu = E(Y)$ relates to the explanatory
    variables in the linear predictor. For an ordinal
    response with c + 1 categories, one might use the
    cumulative logits

$$\mathrm{logit}_j = \mathrm{logit}\,P(Y \le j) = \log\frac{P(Y \le j)}{1 - P(Y \le j)}, \quad j = 1, \ldots, c.$$
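A minimal Python sketch of the cumulative logits, assuming the category probabilities are listed in increasing category order. The last cumulative probability is 1 and has no finite logit, so only c logits are returned. The name `cumulative_logits` is hypothetical, and the input is the placebo, initial-occasion row of Table 2.

```python
import math

def cumulative_logits(pi):
    """logit P(Y <= j) for j = 1, ..., c, given the c+1 category probabilities pi.

    The final category is skipped because P(Y <= c+1) = 1 has no finite logit.
    """
    logits = []
    gamma = 0.0
    for p in pi[:-1]:
        gamma += p                                   # gamma = P(Y <= j)
        logits.append(math.log(gamma / (1.0 - gamma)))
    return logits

# Placebo, initial occasion (Table 2)
vals = cumulative_logits([0.117, 0.167, 0.292, 0.425])
print([round(v, 3) for v in vals])
```

Because the cumulative probabilities increase with j, the cumulative logits increase with j as well.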
20
Link Function.
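Each cumulative logit has its own intercept $\alpha_j$
and, in general, its own effect vector $\boldsymbol{\beta}_j$.
When the GLM is simplified to the proportional odds
model, $\boldsymbol{\beta}_j$ simplifies to $\boldsymbol{\beta}$,
indicating the same effect for each logit. The
proportional odds model is

$$\mathrm{logit}\,P(Y \le j) = \alpha_j + \mathbf{x}^{\top}\boldsymbol{\beta}, \quad j = 1, \ldots, c.$$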
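To see how the proportional odds model turns cutpoints and a common effect into category probabilities, here is a small Python sketch. The cutpoints and coefficient are made-up values, not estimates from the insomnia data, and `po_category_probs` is a hypothetical helper. With the $\alpha_j + \mathbf{x}^{\top}\boldsymbol{\beta}$ sign convention, a positive β raises every P(Y ≤ j), shifting probability toward the lower categories.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def po_category_probs(alphas, beta, x):
    """Category probabilities under logit P(Y <= j) = alpha_j + x'beta.

    alphas must be increasing so the cumulative probabilities are nondecreasing.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))     # common linear predictor
    cum = [expit(a + eta) for a in alphas] + [1.0]  # P(Y <= j), j = 1..c+1
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical cutpoints and a single covariate effect, for illustration only
probs = po_category_probs(alphas=[-2.0, -0.5, 1.0], beta=[0.8], x=[1.0])
print([round(p, 3) for p in probs], round(sum(probs), 6))
```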
21
Link Function.
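For individuals with covariate vectors $\mathbf{x}_1$ and $\mathbf{x}_2$,
the odds ratio for the response at or below category j
is

$$\frac{P(Y \le j \mid \mathbf{x}_1)/P(Y > j \mid \mathbf{x}_1)}{P(Y \le j \mid \mathbf{x}_2)/P(Y > j \mid \mathbf{x}_2)} = \exp\left[(\mathbf{x}_1 - \mathbf{x}_2)^{\top}\boldsymbol{\beta}\right].$$

The odds ratio does not depend on the response
category j. Taking logs gives $(\mathbf{x}_1 - \mathbf{x}_2)^{\top}\boldsymbol{\beta}$,
so each regression coefficient is the difference in
cumulative logits (log odds) of the response per unit
change in the corresponding explanatory variable,
holding the others fixed.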
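The invariance of the cumulative odds ratio to j can be checked numerically. The Python sketch below uses hypothetical cutpoints, coefficients, and covariate vectors; it computes the odds ratio at each cutpoint and compares it with exp[(x1 − x2)'β].

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_odds(alpha_j, beta, x):
    """Odds P(Y <= j | x) / P(Y > j | x) under the proportional odds model."""
    g = expit(alpha_j + sum(b * xi for b, xi in zip(beta, x)))
    return g / (1.0 - g)

alphas = [-2.0, -0.5, 1.0]        # hypothetical cutpoints
beta = [0.8, -0.3]                # hypothetical effects
x1, x2 = [1.0, 2.0], [0.0, 1.0]   # two hypothetical covariate vectors

ratios = [cumulative_odds(a, beta, x1) / cumulative_odds(a, beta, x2) for a in alphas]
expected = math.exp(sum(b * (u - v) for b, u, v in zip(beta, x1, x2)))
print([round(r, 6) for r in ratios], round(expected, 6))
```

All three ratios agree with exp(0.8 − 0.3) = exp(0.5), whichever cutpoint is used.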
22
Maximum Likelihood Method (ML).
  • The standard approach to maximum likelihood (ML)
    fitting of marginal models involves solving the
    score equations using the Newton-Raphson method,
    Fisher scoring, or some other iteratively
    reweighted least squares algorithm. ML fitting of
    marginal logit models is awkward. For T
    observations on an I-category response, at each
    setting of the predictors the likelihood refers to
    I^T multinomial joint probabilities, but the model
    applies to T sets of marginal multinomial
    parameters; the multinomial samples at different
    predictor settings are assumed independent.
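The mismatch between joint and marginal parameters is easy to quantify. Assuming I = 4 response categories (as in the insomnia data) and one sum-to-one constraint per joint or marginal distribution, the hypothetical helpers below count free probabilities per predictor setting. For T = 2 occasions and two treatment settings this gives 30 free joint probabilities versus 12 free marginal ones.

```python
def joint_free_params(I, T):
    """Free joint probabilities per predictor setting: I**T cells minus one constraint."""
    return I ** T - 1

def marginal_free_params(I, T):
    """Free marginal probabilities per setting: T margins, each with I - 1 free values."""
    return T * (I - 1)

settings = 2  # active and placebo
for T in (2, 3, 5):
    print(T, settings * joint_free_params(4, T), settings * marginal_free_params(4, T))
```

The joint count grows exponentially in T while the marginal count grows linearly, which is the source of the awkwardness described above.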

23
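ML Model Specification
  • Consider T categorical responses, where the
    t-th variable has $I_t$ categories. The responses are
    ordinal and are observed for P covariate patterns,
    defined by a set of explanatory variables. Let
    $r = I_1 \times I_2 \times \cdots \times I_T$

denote the number of response profiles for each
covariate pattern. The vector of counts for
covariate pattern p is denoted by $\mathbf{Y}_p$. The $\mathbf{Y}_p$ are
assumed to be independent multinomial random
vectors,

$$\mathbf{Y}_p \sim \mathrm{Multinomial}(n_p, \boldsymbol{\pi}_p), \quad p = 1, \ldots, P,$$

where $n_p$ is the number of subjects with covariate pattern p.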
24
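ML Model Specification
  • Here $\boldsymbol{\pi}_p$ is a vector of positive probabilities
    with $\mathbf{1}_r^{\top}\boldsymbol{\pi}_p = 1$, where $\mathbf{1}_r$ is an
    r-dimensional vector of 1s. Since the model
    applies to T sets of marginal multinomial
    parameters, the marginal models can be written as
    a generalized linear model with the link function

$$\mathbf{C}\log(\mathbf{A}\boldsymbol{\pi}) = \mathbf{X}\boldsymbol{\beta},$$

where A sums joint probabilities into the marginal
cumulative probabilities and C forms the contrasts
of their logs that give the marginal cumulative logits.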
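To make a matrix link of the form C log(Aπ) concrete, the numpy sketch below builds one possible A and C for the active-treatment group of Table 1, assuming the 16 joint probabilities are ordered (initial, follow-up) row by row. A picks out P(Y_t ≤ j) and P(Y_t > j) for each occasion and cutpoint; C takes the difference of their logs. This construction is an illustrative choice and is not claimed to be the one used in the original analysis.

```python
import numpy as np

I = 4  # response categories per occasion: <20, 20-30, 30-60, >60 minutes
# Active-treatment joint counts from Table 1: rows = initial, columns = follow-up
joint_counts = np.array([[7, 4, 1, 0], [11, 5, 2, 2], [13, 23, 3, 1], [9, 17, 13, 8]])
pi = (joint_counts / joint_counts.sum()).ravel()   # 16 joint probabilities, row-major

rows_A = []
for occasion in (0, 1):          # 0 = initial (row margin), 1 = follow-up (column margin)
    for j in range(1, I):        # cutpoints j = 1, 2, 3
        below = np.zeros((I, I))
        if occasion == 0:
            below[:j, :] = 1.0   # initial response in categories 1..j
        else:
            below[:, :j] = 1.0   # follow-up response in categories 1..j
        rows_A.append(below.ravel())        # row of A giving P(Y_t <= j)
        rows_A.append(1.0 - below.ravel())  # row of A giving P(Y_t > j)
A = np.array(rows_A)                               # 12 x 16
C = np.kron(np.eye(6), np.array([[1.0, -1.0]]))    # 6 x 12: log P(<= j) - log P(> j)

marginal_logits = C @ np.log(A @ pi)   # 3 initial logits, then 3 follow-up logits
print(np.round(marginal_logits, 3))
```

The first entry equals log(12/107), the sample cumulative logit for the <20 minute cutpoint at the initial occasion.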

25
ML Fitting of marginal Models
Lang and Agresti (1994) considered the likelihood
as a function of the cell probabilities rather than
of the model parameters β. The likelihood function
for a marginal logit model is the product of the
multinomial mass functions from the various
predictor settings. One approach to ML fitting
views the model as a set of constraints and uses
methods for maximizing a function subject to
constraints.
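The product-multinomial likelihood can be written down directly. The Python sketch below evaluates its log kernel (dropping the multinomial coefficients, which do not involve the probabilities) for the two treatment groups of Table 1, flattened into 16 response profiles each. The function name is hypothetical; the check confirms that the unrestricted ML estimates (sample proportions) give a log-likelihood at least as large as an arbitrary alternative.

```python
import math

def multinomial_loglik(counts_by_pattern, probs_by_pattern):
    """Log-likelihood kernel for independent multinomial vectors Y_p.

    Adds y_pk * log(pi_pk) over covariate patterns p and response profiles k;
    cells with y_pk = 0 contribute nothing.
    """
    total = 0.0
    for y, pi in zip(counts_by_pattern, probs_by_pattern):
        for y_k, pi_k in zip(y, pi):
            if y_k > 0:
                total += y_k * math.log(pi_k)
    return total

# Response-profile counts (initial, follow-up) from Table 1, flattened row by row
active = [7, 4, 1, 0, 11, 5, 2, 2, 13, 23, 3, 1, 9, 17, 13, 8]
placebo = [7, 4, 2, 1, 14, 5, 1, 0, 6, 9, 18, 2, 4, 11, 14, 22]
counts = [active, placebo]

saturated = [[y_k / sum(y) for y_k in y] for y in counts]   # unrestricted ML estimates
uniform = [[1 / len(y)] * len(y) for y in counts]            # an arbitrary alternative
print(multinomial_loglik(counts, saturated) >= multinomial_loglik(counts, uniform))
```

A marginal model restricts the probabilities, so its maximized log-likelihood lies at or below the saturated value; this is the gap that G² measures.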
26
ML Fitting of marginal Models
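Let θ be a vector whose elements are the cell
probabilities π and the Lagrange multipliers λ. The
Lagrangian likelihood equations have the form

$$\mathbf{h}(\boldsymbol{\theta}) = \mathbf{0},$$

where h is a vector with terms involving the
contrasts in marginal logits that the model
constrains to equal 0, as well as log-likelihood
derivatives. The Newton-Raphson iterative scheme is

$$\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} - \left[\frac{\partial \mathbf{h}(\boldsymbol{\theta}^{(k)})}{\partial \boldsymbol{\theta}}\right]^{-1} \mathbf{h}(\boldsymbol{\theta}^{(k)}).$$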
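The Lagrangian Newton-Raphson idea can be illustrated on a much smaller problem than the insomnia model. The numpy sketch below is a toy, not the Lang and Agresti implementation: it fits a 2×2 table by ML under the single constraint π12 = π21 (marginal homogeneity for a binary response at two occasions), iterating θ ← θ − [∂h/∂θ]⁻¹h with θ = (π, λ0, λ1). The function name and counts are hypothetical, and zero counts are not handled.

```python
import numpy as np

def ml_marginal_homogeneity_2x2(n, tol=1e-10, max_iter=50):
    """Constrained ML fit of a 2x2 table (cells 11, 12, 21, 22) with pi12 = pi21.

    Solves h(theta) = 0, where h stacks
        n_i / pi_i - lam0 - lam1 * c_i   (Lagrangian derivatives, i = 1..4),
        sum(pi) - 1,  c @ pi            (constraints),  with c = (0, 1, -1, 0).
    """
    n = np.asarray(n, dtype=float)
    c = np.array([0.0, 1.0, -1.0, 0.0])
    pi = n / n.sum()                    # start at the sample proportions
    lam0, lam1 = n.sum(), 0.0
    for _ in range(max_iter):
        h = np.concatenate([n / pi - lam0 - lam1 * c, [pi.sum() - 1.0, c @ pi]])
        if np.max(np.abs(h)) < tol:
            break
        J = np.zeros((6, 6))            # Jacobian dh/dtheta
        J[:4, :4] = np.diag(-n / pi**2)
        J[:4, 4] = -1.0
        J[:4, 5] = -c
        J[4, :4] = 1.0
        J[5, :4] = c
        step = np.linalg.solve(J, -h)   # Newton-Raphson update
        pi = pi + step[:4]
        lam0, lam1 = lam0 + step[4], lam1 + step[5]
    return pi, lam0, lam1

pi_hat, lam0, lam1 = ml_marginal_homogeneity_2x2([10, 20, 30, 40])
print(np.round(pi_hat, 4))
```

For this toy problem the constrained ML solution is known in closed form (π̂11 = n11/n, π̂22 = n22/n, π̂12 = π̂21 = (n12 + n21)/(2n)), so the iteration can be checked against (0.1, 0.25, 0.25, 0.4).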
27
ML Fitting of marginal Models
After the algorithm converges, the model parameter
estimates are computed from the fitted values.
This maximum likelihood fitting method makes no
assumption about the model that describes the
joint distribution. Thus, when the marginal
model holds, the ML estimates are consistent
regardless of the dependence structure of that
distribution.
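One way to recover β from converged fitted values, assuming the fitted marginal logits satisfy a linear model logit = Xβ exactly with a full-column-rank design X, is the least-squares solution β̂ = (X'X)⁻¹X'·(fitted logits). The numpy sketch uses a small hypothetical design (three cutpoints plus one binary covariate) and hypothetical β values rather than the insomnia fit.

```python
import numpy as np

# Hypothetical design: cutpoint dummies for j = 1, 2, 3 plus one binary covariate,
# evaluated at covariate value 0 (first three rows) and 1 (last three rows)
X = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)
beta_true = np.array([-2.0, -0.9, 0.3, 0.7])   # hypothetical parameter values
fitted_logits = X @ beta_true                   # fitted logits that satisfy the model

# Normal equations: beta_hat = (X'X)^{-1} X' * fitted_logits
beta_hat = np.linalg.solve(X.T @ X, X.T @ fitted_logits)
print(np.round(beta_hat, 6))
```

Because the fitted logits lie exactly in the column space of X, the least-squares solution reproduces β without error.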
28
Inference
  • Hypothesis testing for parameters
  • After obtaining model parameter estimates and
    estimated covariance matrix, one can apply
    standard methods of inference, for instance Wald
    chi-squared test for marginal homogeneity.
  • Goodness of Fit test
  • To assess model goodness of fit, one can compare
    observed and fitted cell counts using the
    likelihood-ratio statistic G² or the Pearson
    chi-squared statistic X². For nonsparse tables,
    assuming that the model holds, these statistics
    have approximate chi-squared distributions with
    degrees of freedom equal to the number of
    constraints implied by the model.
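The two fit statistics are simple sums over cells. The Python sketch below implements them with the usual convention that empty observed cells contribute 0 to G²; the function names and the two-cell example are hypothetical.

```python
import math

def g2_statistic(observed, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum obs * log(obs / fit); empty cells add 0."""
    return 2.0 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)

def pearson_x2(observed, fitted):
    """Pearson statistic X^2 = sum (obs - fit)^2 / fit (requires positive fitted counts)."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))

obs, fit = [10, 20], [15, 15]
print(round(g2_statistic(obs, fit), 4), round(pearson_x2(obs, fit), 4))
```

Both statistics are 0 when the fitted counts reproduce the observed counts exactly, and they are usually close to each other when the table is not sparse.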

29
Limitations of ML
  • The number of multinomial probabilities increases
    dramatically as the number of predictors
    increases.
  • ML approaches are not practical when T is large
    or there are many predictors, especially when
    some are continuous.
  • It does not make any assumption about the model
    that describes the joint distribution .

30
Results
                          Time to falling asleep
Treatment  Occasion     <20 min  20-30 min  30-60 min  >60 min
Active     Initial       0.101     0.168      0.336     0.395
           Follow-up     0.336     0.412      0.160     0.092
Placebo    Initial       0.117     0.167      0.292     0.425
           Follow-up     0.258     0.242      0.292     0.208
Table 2. Sample Marginal Proportions for Insomnia Data.
31
Figure 1. Sample Marginal Proportions for Insomnia Data.
32
Marginal Proportion
  • The sample proportion of time to falling asleep
    in <20 minutes for subjects who received the
    active treatment, at the initial occasion, is
  • (7 + 4 + 1 + 0) / (7 + 4 + 1 + 0 + 11 + ... + 13 + 8)
    = 12/119 = 0.1008
  • Similarly, the sample proportion of time to
    falling asleep in >60 minutes for subjects who
    received placebo, at follow-up, is
  • (1 + 0 + 2 + 22) / (7 + 4 + 2 + 1 + ... + 14 + 22)
    = 25/120 = 0.2083
  • And so on.
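The same row and column sums can be scripted for all four treatment-by-occasion rows of Table 2. In the Python sketch below, initial marginals are row sums and follow-up marginals are column sums of each Table 1 block, divided by the group size; the dictionary layout and function name are illustrative choices.

```python
# Joint counts from Table 1: rows = initial category, columns = follow-up category,
# both ordered <20, 20-30, 30-60, >60 minutes
table1 = {
    "Active":  [[7, 4, 1, 0], [11, 5, 2, 2], [13, 23, 3, 1], [9, 17, 13, 8]],
    "Placebo": [[7, 4, 2, 1], [14, 5, 1, 0], [6, 9, 18, 2], [4, 11, 14, 22]],
}

def marginal_proportions(joint):
    """Group size plus initial (row) and follow-up (column) marginal proportions."""
    n = sum(sum(row) for row in joint)
    initial = [sum(row) / n for row in joint]
    follow_up = [sum(col) / n for col in zip(*joint)]
    return n, initial, follow_up

for treatment, joint in table1.items():
    n, initial, follow_up = marginal_proportions(joint)
    print(treatment, n, [round(p, 3) for p in initial], [round(p, 3) for p in follow_up])
```

The printed rows match Table 2 to three decimals, and the group sizes 119 and 120 add up to n = 239.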

33
What does the marginal proportion table show?
  • From the initial to the follow-up occasion, the
    time to falling asleep seems to shift downward for
    both treatments.
  • The degree of shift seems greater for the active
    treatment than for placebo, indicating a possible
    interaction. In other words, the effect of
    treatment on the response may differ between
    occasions.

34
Fitted Marginal Model
  • Let x represent the treatment, with x = 1 for
    the active treatment and x = 0 for placebo. Let t
    denote the measurement occasion, with t = 0 for
    initial and t = 1 for follow-up. Let $Y_t$ represent
    the outcome variable, the patient's response at
    time t to the question "How quickly did you fall
    asleep after going to bed?", with j = 1 for <20
    minutes, j = 2 for 20-30 minutes, j = 3 for 30-60
    minutes, and j = 4 for >60 minutes. The marginal
    model with cumulative logit link can be written
    for our data set as

$$\mathrm{logit}\,P(Y_t \le j) = \alpha_j + \beta_1 t + \beta_2 x + \beta_3 (t \times x), \quad j = 1, 2, 3.$$
35
SAS code
data insomnia;
   input treatment $ initial $ follow $ count @@;
   /* keep empty cells in the table by giving them a negligible weight */
   if count = 0 then count = 1E-8;
   datalines;
active <20 <20 7      active <20 20-30 4      active <20 30-60 1      active <20 >60 0
active 20-30 <20 11   active 20-30 20-30 5    active 20-30 30-60 2    active 20-30 >60 2
active 30-60 <20 13   active 30-60 20-30 23   active 30-60 30-60 3    active 30-60 >60 1
active >60 <20 9      active >60 20-30 17     active >60 30-60 13     active >60 >60 8
placebo <20 <20 7     placebo <20 20-30 4     placebo <20 30-60 2     placebo <20 >60 1
placebo 20-30 <20 14  placebo 20-30 20-30 5   placebo 20-30 30-60 1   placebo 20-30 >60 0
placebo 30-60 <20 6   placebo 30-60 20-30 9   placebo 30-60 30-60 18  placebo 30-60 >60 2
placebo >60 <20 4     placebo >60 20-30 11    placebo >60 30-60 14    placebo >60 >60 22
;
run;
36
SAS code
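proc catmod order=data data=insomnia;
   weight count;
   population treatment;
   response clogit;
   /* columns: alpha1 alpha2 alpha3 | treatment | time | time*treatment */
   model initial*follow =
      (1 0 0  1 1 1,   /* alpha1 + treatment + time + interaction: active, follow-up, j=1 */
       0 1 0  1 1 1,   /* alpha2 + treatment + time + interaction: active, follow-up, j=2 */
       0 0 1  1 1 1,   /* alpha3 + treatment + time + interaction: active, follow-up, j=3 */
       1 0 0  1 0 0,   /* alpha1 + treatment: active, initial, j=1 */
       0 1 0  1 0 0,   /* alpha2 + treatment: active, initial, j=2 */
       0 0 1  1 0 0,   /* alpha3 + treatment: active, initial, j=3 */
       1 0 0  0 1 0,   /* alpha1 + time: placebo, follow-up, j=1 */
       0 1 0  0 1 0,   /* alpha2 + time: placebo, follow-up, j=2 */
       0 0 1  0 1 0,   /* alpha3 + time: placebo, follow-up, j=3 */
       1 0 0  0 0 0,   /* alpha1: placebo, initial, j=1 */
       0 1 0  0 0 0,   /* alpha2: placebo, initial, j=2 */
       0 0 1  0 0 0)   /* alpha3: placebo, initial, j=3 */
      (1 2 3 = 'Cutpoint', 4 = 'Treatment', 5 = 'Time effect',
       6 = 'Time*Treatment effect')
      / freq;
quit;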
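To make the SAS design matrix easier to audit, the Python sketch below regenerates its 12 rows from the coding stated in the SAS labels (columns: three cutpoints, treatment, time, time × treatment). It only mirrors the matrix as written; it does not check which response function CATMOD assigns to each row, and `design_row` is a hypothetical helper.

```python
def design_row(j, treatment, time):
    """Design row for cumulative logit j (1, 2, 3): cutpoint dummies, then treatment
    (1 = active), time (1 = follow-up) and their product, as in the CATMOD columns."""
    cutpoints = [1 if k == j else 0 for k in (1, 2, 3)]
    return cutpoints + [treatment, time, treatment * time]

# Row order of the CATMOD matrix: active follow-up, active initial,
# placebo follow-up, placebo initial; within each group j = 1, 2, 3
groups = [(1, 1), (1, 0), (0, 1), (0, 0)]   # (treatment, time)
X = [design_row(j, trt, t) for trt, t in groups for j in (1, 2, 3)]

for row in X:
    print(row)
print(len(X), "logits,", len(X[0]), "parameters, residual df =", len(X) - len(X[0]))
```

The 12 logits minus 6 parameters give the 6 residual degrees of freedom used in the goodness-of-fit test.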
37
Fitted Marginal Model
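  • Fitting the marginal model by maximum likelihood
    to the marginal distributions above gave the
    following results:

$$\mathrm{logit}\,\hat{P}(Y_t \le j) = \hat{\alpha}_j + 1.074\,t + 0.046\,x + 0.662\,(t \times x),$$

with $\hat{\alpha}_1 = -1.16$, $\hat{\alpha}_2 = 0.10$, and $\hat{\alpha}_3 = 1.37$,
where the coefficients are for occasion (t), treatment
(x), and the occasion × treatment interaction (t × x).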

38
Hypothesis testing for estimators
  • Occasion: β1 = 1.074, S.E.(β1) = 0.162, p-value < 0.0001
  • Treatment: β2 = 0.046, S.E.(β2) = 0.236, p-value = 0.84
  • Interaction (Occasion × Treatment): β3 = 0.662,
    S.E.(β3) = 0.244, p-value = 0.00665
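Assuming the reported p-values come from two-sided Wald z tests (z = estimate / SE), they can be reproduced from the estimates and standard errors alone. The Python sketch below uses the identity 2[1 − Φ(|z|)] = erfc(|z|/√2); `wald_test` is a hypothetical helper.

```python
import math

def wald_test(estimate, se):
    """Wald z statistic and two-sided standard normal p-value."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p

estimates = {                      # (estimate, SE) as reported for the fitted model
    "Occasion": (1.074, 0.162),
    "Treatment": (0.046, 0.236),
    "Occasion x Treatment": (0.662, 0.244),
}
results = {name: wald_test(b, se) for name, (b, se) in estimates.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.3g}")
```

The computed p-values agree with the reported ones up to rounding of the standard errors.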

39
Model Goodness of fit test
  • The likelihood-ratio statistic G² was used for
    the goodness-of-fit test. For the ML fit,
    comparing the observed to the fitted cell counts
    when modeling the 12 marginal logits with these
    six parameters gives G² = 8.0 with df = 6 and
    p-value = 0.238, indicating that the model fits
    the data well.
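The p-value follows from the chi-squared upper tail with df = 12 − 6 = 6. For even df the tail has the closed form exp(−x/2)·Σ_{k=0}^{df/2−1}(x/2)^k/k!, which the Python sketch below uses so that no statistics library is needed; `chi2_sf_even_df` is a hypothetical helper.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable with even df, using
    the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

n_logits, n_params = 12, 6          # marginal logits and model parameters
df = n_logits - n_params
p_value = chi2_sf_even_df(8.0, df)  # G^2 = 8.0 as reported
print(df, round(p_value, 3))        # 6 0.238
```

Here the sum is 1 + 4 + 8 = 13, so the p-value is 13·e⁻⁴ ≈ 0.238.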

40
Interpretation of Parameters
  • Effect of treatment (active vs. placebo)
  • 1. At the initial observation
  • The estimated odds that the time to falling
    asleep for the active treatment is below any
    fixed level equal exp(0.046) ≈ 1.05 times the
    estimated odds for the placebo treatment.
  • 2. At the follow-up observation
  • The estimated odds that the time to falling
    asleep for the active treatment is below any
    fixed level equal exp(0.046 + 0.662) = exp(0.708)
    ≈ 2.03 times the estimated odds for the placebo
    treatment.
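The two odds ratios are exponentiated sums of the reported coefficients: exp(β2) at t = 0 and exp(β2 + β3) at t = 1. A short Python check:

```python
import math

beta_treatment = 0.046     # treatment effect at the initial occasion (t = 0)
beta_interaction = 0.662   # additional treatment effect at follow-up (t = 1)

or_initial = math.exp(beta_treatment)
or_follow_up = math.exp(beta_treatment + beta_interaction)
print(round(or_initial, 3), round(or_follow_up, 3))   # 1.047 2.03
```

The interaction term is what moves the treatment odds ratio from about 1 at baseline to about 2 at follow-up.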

41
Interpretation of Parameters (cont.)
  • For the active treatment, the time effect (slope)
    is β3 = 0.662 (SE = 0.244) higher than for placebo,
    giving strong evidence of faster improvement. In
    other words, initially the two treatments had
    similar effects, but at follow-up the patients
    receiving the active treatment tended to fall
    asleep more quickly.

42
Conclusion
  • Using maximum likelihood fitting of the marginal
    model to the insomnia data, we have sufficient
    evidence to conclude that time and the
    treatment × time interaction have substantial
    effects on the response (time to fall asleep): the
    advantage of the active treatment appears at
    follow-up rather than at the initial occasion.

43
Thank You For Your Attention