# Lecture 9: Marginal Logistic Regression Model and GEE (Chapter 8)

Slides: 56
Provided by: mingw
Transcript and Presenter's Notes

1
Lecture 9 Marginal Logistic Regression Model and
GEE (Chapter 8)
2
Marginal Logistic Regression Model and GEE
Marginal models are suitable to estimate
population average parameters
• For example, in the Indonesian study, a marginal
model can be used to address questions such as
• What is the prevalence of respiratory infection
in children as a function of age?
• Is the prevalence of respiratory infection
greater in the sub-population of children with
vitamin A deficiency?
• How does the association of vitamin A deficiency
and respiratory infection change with age?
• The scientific objective is to characterize and
contrast populations of children.

3
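Marginal Models for Binary Responses: Logistic
Regression
Model for the mean: $\operatorname{logit}(\mu_{ij}) = x_{ij}'\beta$, where $\mu_{ij} = E(Y_{ij}) = P(Y_{ij} = 1)$
Model for the association: marginal odds ratio,
$\operatorname{OR}(Y_{ij}, Y_{ik}) = \dfrac{P(Y_{ij}=1, Y_{ik}=1)\, P(Y_{ij}=0, Y_{ik}=0)}{P(Y_{ij}=1, Y_{ik}=0)\, P(Y_{ij}=0, Y_{ik}=1)}$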
4
Marginal Odds Ratio
An odds ratio greater than 1 indicates positive association
Two possible specifications
• The degree of association is the same for all pairs
of observations from the same subject
• The degree of association is inversely proportional
to the time between observations from the same
subject
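As a small illustration of the marginal odds ratio on this slide, the sketch below computes it from the joint distribution of one pair of binary responses. The function name `marginal_odds_ratio` and the probabilities are made up for illustration; they are not taken from the lecture or the ICHS data.

```python
def marginal_odds_ratio(p11, p10, p01, p00):
    """Odds ratio for a pair of binary responses (Y_ij, Y_ik).

    p11 = P(Y_ij=1, Y_ik=1), p10 = P(Y_ij=1, Y_ik=0),
    p01 = P(Y_ij=0, Y_ik=1), p00 = P(Y_ij=0, Y_ik=0).
    """
    total = p11 + p10 + p01 + p00
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    return (p11 * p00) / (p10 * p01)


# Hypothetical joint probabilities (assumed values, not study data).
independent = marginal_odds_ratio(0.25, 0.25, 0.25, 0.25)  # no association
positive = marginal_odds_ratio(0.40, 0.10, 0.10, 0.40)     # responses tend to agree
print(independent, positive)
```

An odds ratio of 1 corresponds to no association between the two responses, and values above 1 to positive association.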
5
Parameter Interpretation in Logistic Regression
ICHS Study
Marginal Logistic Regression
6
Parameter Interpretation in Logistic Regression
ICHS Study
Logistic Regression with Random Effects
7
Parameter Interpretation in Logistic Regression
ICHS Study
Transition Logistic Regression Model
8
Parameter Interpretation in Logistic Regression
ICHS Study
Transition Logistic Regression Model (contd)
9
Maximum Likelihood Estimation of β in GLM:
Cross-sectional Data
• If $Y_i$ is binary or a count, we specify the
likelihood function and estimate the parameters
of interest using maximum likelihood estimation

10
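Maximum Likelihood Estimation of β in GLM:
Cross-sectional Data
• For example, if Y is binary, i.e.
$Y_i \sim \text{Bernoulli}(\mu_i)$ with $\operatorname{logit}(\mu_i) = \beta_0 + \beta_1 x_i$,
we estimate $\beta_0$ and $\beta_1$ by maximizing
$L(\beta_0, \beta_1) = \prod_i \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}$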
11
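Maximum Likelihood Estimation of β in GLM:
Cross-sectional Data
• For example, if Y is a count, i.e.
$Y_i \sim \text{Poisson}(\mu_i)$ with $\log(\mu_i) = \beta_0 + \beta_1 x_i$,
we estimate $\beta_0$ and $\beta_1$ by maximizing
$L(\beta_0, \beta_1) = \prod_i e^{-\mu_i} \mu_i^{y_i} / y_i!$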
12
Maximum Likelihood Estimation of β in GLM:
Cross-sectional Data
In general, we have
$S(\beta) = \sum_i \frac{\partial \mu_i}{\partial \beta}\, v_i^{-1}\, \{y_i - \mu_i(\beta)\} = 0$,
which is called the score equation. Solving the score equation is
equivalent to maximizing the likelihood function. Solutions to the
score equation are not available in closed form,
and so require an iterative procedure called the
iteratively weighted least squares (IWLS) algorithm
13
Maximum Likelihood Estimation of β in GLM:
Cross-sectional Data
Main ideas of IWLS
• $\mu_i = E(Y_i)$, $v_i = \operatorname{var}(Y_i) = v(\mu_i)$
• Choose $\beta$ to make $\mu_i(\beta)$ close to $y_i$ on average
• Weight $y_i$ by $v_i^{-1}$
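The IWLS idea can be sketched for the simplest case: a logistic regression with an intercept and one covariate. This is a minimal pure-Python sketch, not the lecture's implementation; the function name `iwls_logistic`, the fixed number of iterations, and the toy data are assumptions made for illustration.

```python
import math


def iwls_logistic(x, y, n_iter=25):
    """Fit logit(mu_i) = b0 + b1 * x_i by iteratively weighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the 2x2 system (X'WX) b = X'Wz for the weighted regression.
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)           # variance function v(mu) for binary data
            z = eta + (yi - mu) / w       # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01
        b0 = (s11 * t0 - s01 * t1) / det  # solve the 2x2 system directly
        b1 = (s00 * t1 - s01 * t0) / det
    return b0, b1


# Toy data (made up): 1 of 4 successes when x = 0, 3 of 4 when x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 0, 0, 0, 1, 1, 1, 0]
b0, b1 = iwls_logistic(x, y)
print(b0, b1)
```

With a single binary covariate the fitted values match the group proportions, so `b0` should equal logit(0.25) and `b1` the log odds ratio between the two groups, which gives a simple check on the iteration.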

14
GEE Estimation of β in GLM: Longitudinal Data
In the case of a linear regression model with the
assumption of normality, the extension from
ordinary linear regression to longitudinal
problems was facilitated by thinking about a
multivariate normal distribution. By specifying
a model for the mean $E(Y_i)$ and a model for the
covariance matrix $V_i$, we can fully specify the
multivariate normal distribution and use MLE.
15
GEE Estimation of β in GLM: Longitudinal Data
Unfortunately, if the elements of $Y_i$ are counts
or binary responses, we cannot naturally extend
the Bernoulli or Poisson distributions to take
correlation into account. Multivariate
extensions of these distributions are quite
complex (except for biostat students!). The main
impediments with binary and count data are
• There are no convenient multivariate generalizations of the
necessary probability distributions
• Population-average and subject-specific
approaches do not lead to the same model for the
mean response
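The second impediment can be seen numerically. The sketch below assumes a logistic model with a normally distributed random intercept (a choice made for this illustration, not something stated on the slide) and compares the probability for a subject whose random effect is zero with the probability averaged over all subjects. The names `expit` and `marginal_mean` and the values 1.0 and 2.0 are hypothetical.

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def marginal_mean(eta, sigma, n_grid=4000, z_max=8.0):
    """Approximate E[expit(eta + sigma * Z)], Z ~ N(0, 1), by the midpoint rule."""
    h = 2.0 * z_max / n_grid
    total = 0.0
    for k in range(n_grid):
        z = -z_max + (k + 0.5) * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += expit(eta + sigma * z) * density * h
    return total


# Assumed values: linear predictor 1.0, random-intercept SD 2.0.
subject_specific = expit(1.0)             # subject with random effect equal to 0
population_avg = marginal_mean(1.0, 2.0)  # averaged over the random effect
print(subject_specific, population_avg)
```

The population-average probability is pulled toward 0.5 relative to the subject-specific one, so the same coefficient cannot describe both mean models on the logit scale.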

16
GEE (applies only to marginal models)
• Under a GEE approach, we forget about trying to
specify a model for the whole multivariate
distribution of a data vector. Instead, the idea
is to just model the mean response $E(Y_i)$ and the
covariance matrix $V_i$ of a data vector, as in the
normal case.
• GEE is based on the concept of estimating
equations and provides a very general approach
for analyzing correlated responses that can be
discrete or continuous.

17
GEE
• The idea behind GEE is to generalize and extend
the usual likelihood equations for a GLM with a
univariate response by incorporating the
covariance matrix of the vector of responses Y
• For the case of linear models, the generalized
least squares (GLS) estimator for the vector of
regression coefficients is a special case of the
GEE approach

18
GEE
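• In the absence of a convenient likelihood to work
with, it is sensible to estimate β by solving the
following multivariate equation
$S(\beta) = \sum_{i=1}^{m} \left(\frac{\partial \mu_i}{\partial \beta}\right)' V_i^{-1}\, \{Y_i - \mu_i(\beta)\} = 0$
• where $\mu_i(\beta) = E(Y_i)$ and $V_i = \operatorname{var}(Y_i)$

Note: with continuous data, the estimate from
this score equation reduces to the MLE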
19
GEE (contd)
• The method of generalized estimating equations
provides consistent estimates for the mean
parameter when a model for the correlation may
not be reliably specified.
• The generalized estimating equation is a multivariate
generalization of the score equation used to
maximize the likelihood function under a GLM

20
GLM for Longitudinal Data (GEE)
In summary
• For GEE models, we specify a GLM for the mean
response
• We also choose a working correlation structure, ranging from
independence to completely unstructured
• The estimates of β and their standard errors will
be consistent (i.e. unbiased for large sample
size).
• If the specification of $V_i$ is correct, then the
GEE solution is the maximum likelihood estimate.

21
GEE
One important property of the GLM family is that
the score function depends only on
the mean and variance of $Y_i$. Therefore the
estimating equation
$S(\beta) = \sum_i \left(\frac{\partial \mu_i}{\partial \beta}\right)' \operatorname{var}(Y_i)^{-1}\, \{Y_i - \mu_i(\beta)\} = 0$
can be used to estimate the regression
coefficients for any choices of link and variance
functions, whether or not they correspond to a
particular member of the exponential family.
$S(\beta) = 0$ is the generalized
estimating equation.
22
GEE Properties
• $\hat\beta$ is nearly efficient relative to the maximum
likelihood estimate of $\beta$, provided that
$\operatorname{var}(Y_i)$ has been reasonably approximated.
• GEE is the maximum likelihood score equation for
multivariate Gaussian data, and for binary data,
when $\operatorname{var}(Y_i)$ is correctly specified.
• $\hat\beta$ is consistent for $\beta$, even when
$\operatorname{var}(Y_i)$ is incorrectly specified.

23
What we need to specify for implementing GEE
• Model for the mean
• Known variance function
• Working correlation matrix: a model for the
pairwise correlations among the responses
24
Working covariance matrix
V is called the working covariance matrix to
distinguish it from the true underlying
covariance of Y
25
GEE
minimize
GEE equations
Solution of the GEE equation
26
Properties of GEE Estimates
• The GEE estimator is consistent whether or not
the within-subject associations/correlations have
been correctly modelled.
• That is, for the GEE estimator to provide a valid
estimate of the true ß, we only require that the
model for the mean response has been correctly
specified.

27
Asymptotic distribution of the GEE estimator
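• In large samples, the GEE estimator $\hat\beta$ is
approximately multivariate normal with mean $\beta$ and covariance
$B^{-1} M B^{-1}$, where $D_i = \partial \mu_i / \partial \beta$,
$B = \sum_i D_i' V_i^{-1} D_i$ and $M = \sum_i D_i' V_i^{-1} \operatorname{Cov}(Y_i) V_i^{-1} D_i$

$\operatorname{Cov}(Y_i)$: true covariance matrix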
28
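Sandwich estimate of $\operatorname{Cov}(\hat\beta)$: $\hat B^{-1} \hat M \hat B^{-1}$, with bread
$\hat B = \sum_i \hat D_i' V_i^{-1} \hat D_i$ and $\hat D_i = \partial \mu_i / \partial \beta$ evaluated at $\hat\beta$
Meat: $\hat M = \sum_i \hat D_i' V_i^{-1} (y_i - \hat\mu_i)(y_i - \hat\mu_i)' V_i^{-1} \hat D_i$
$(y_i - \hat\mu_i)(y_i - \hat\mu_i)'$: consistent estimate of the true covariance
matrix of $Y_i$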
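To make the bread and meat concrete, here is a minimal sketch for the simplest possible case: an identity link, an intercept-only mean model, and a working independence correlation, so the GEE estimate is just the overall mean. The function name `clustered_mean_variances` and the toy clusters are assumptions for illustration, not material from the lecture.

```python
def clustered_mean_variances(clusters):
    """Return (estimate, model_based_var, sandwich_var) for a common mean.

    Assumes an intercept-only linear model with working independence, so the
    derivative matrix for each cluster is a column of ones and the working
    covariance is proportional to the identity.
    """
    n_total = sum(len(c) for c in clusters)
    beta_hat = sum(sum(c) for c in clusters) / n_total

    # Bread: sum over clusters of (ones' ones) = total number of observations.
    bread = float(n_total)

    # Model-based variance: treats every observation as independent.
    resid_ss = sum((y - beta_hat) ** 2 for c in clusters for y in c)
    model_based = (resid_ss / n_total) / bread

    # Meat: sum over clusters of the squared within-cluster residual totals.
    meat = sum(sum(y - beta_hat for y in c) ** 2 for c in clusters)
    sandwich = meat / bread ** 2
    return beta_hat, model_based, sandwich


# Toy data (made up): responses agree perfectly within each subject.
clusters = [[1, 1], [0, 0], [1, 1], [0, 0]]
beta_hat, model_based, sandwich = clustered_mean_variances(clusters)
print(beta_hat, model_based, sandwich)
```

Because the two responses within each toy subject are perfectly correlated, the sandwich variance is twice the model-based one, which is the kind of discrepancy the robust estimate is meant to reveal.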
29
Link to Stata command xtgee for continuous data
• substitute into GEE equations, to get
• Use Weighted Least Square for

30
Link to Stata command xtgee for continuous data
(contd)
• xtgee, identity link, corr(exch), robust
• Use Sandwich Estimator for

31
Link to Stata command xtgee for binary data
• Substitute into the GEE equation. There is no closed-form
solution, so an iterative procedure is needed.
• The difference between using robust or not is
analogous to the continuous-data case.
• xtgee, logit link, corr(exch), robust

32
Using GEE
33
Bottom Line
If the scientific focus is on the regression
coefficients β
• Focus on modeling the mean structure
• Use a reasonable approximation of the covariance
structure
• Check the inferences for β by comparing the
robust standard errors of $\hat\beta$ under different
covariance assumptions
• If the standard errors differ substantially,
a more careful treatment of the covariance model
might be necessary.

34
Maximum Likelihood Estimation for Binary Data
35
Example: 2x2 crossover trial
Data from the 2x2 crossover trial on
cerebrovascular deficiency adapted from Jones and
Kenward, where treatments A and B are active drug
and placebo, respectively. The outcome indicates
whether an electrocardiogram was judged abnormal
(0) or normal (1).
36
Example: 2x2 crossover trial (contd)
Goal: To compare the effect of an active drug
(A) and a placebo (B) on cerebrovascular
deficiency
• 34 patients received A followed by B
• 33 patients received B followed by A
• $Y_{ij} = 1$ if normal electrocardiogram reading

At period 1
37
Example 2x2 crossover trial (contd)
Calculate MLE of odds ratios separately for
period 1 and period 2. Odds ratio of being
normal for the active drug versus the placebo is
This estimate is larger than 1, and therefore
indicates that the active drug produces a higher
proportion of normal readings. However, the
estimate is not statistically significant.
Should we compare (???) the data for Periods 1
and 2?
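The period-specific calculation described on this slide can be sketched as follows. The trial's 2x2 table did not survive in this transcript, so the counts below are hypothetical and chosen only to show the arithmetic. The function name `odds_ratio_with_ci` and the Wald interval on the log scale are choices made for this illustration.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """MLE of the odds ratio from a 2x2 table with a Wald 95% interval.

    a, b = normal / abnormal readings on the active drug
    c, d = normal / abnormal readings on the placebo
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


# Hypothetical counts, NOT the Jones and Kenward data.
or_hat, lower, upper = odds_ratio_with_ci(10, 5, 5, 10)
print(or_hat, lower, upper)
```

With these made-up counts the estimate is above 1 but the interval still covers 1, mirroring the situation on the slide where the estimate favors the drug without being statistically significant.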
38
Example: 2x2 crossover trial (contd)
This approach has several limitations
• It ignores the carry-over effect, i.e. the effect of
the treatment at period 1 might influence the
response at period 2 (treatment × period
interaction)
• Two responses for the same subject are likely to
be correlated
• In fact, the odds ratio
• is estimated to be

So, let's use GEE to estimate a population-average
odds ratio, taking into account
within-subject correlation
39
Example: 2x2 crossover trial (contd)
GEE Approach
• We combine data from both periods.
• We can analyze a 2x2 crossover trial as a
longitudinal study with $n_i = n = 2$ and $m = 67$.

40
Example 2x2 crossover trial (contd)
GEE Approach (contd)
• Fit a logistic regression model

41
exp(0.57) - 1 ≈ 0.77 → Population-average odds of
a normal reading are estimated to be 77% higher
when using the drug as compared to the placebo
exp(3.56) ≈ 35 → Subjects with normal responses
at the first visit have odds of a normal reading at
the next visit that are about 35 times higher
than those whose first response was abnormal
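The two interpretations above are just exponentiated logistic regression coefficients. A short check of the arithmetic, using the coefficients quoted on the slide (the variable names are ours):

```python
import math

treatment_coef = 0.57          # log odds ratio, drug versus placebo (from the slide)
previous_response_coef = 3.56  # log odds ratio for a normal previous reading (from the slide)

# Percentage increase in the odds of a normal reading on the drug.
pct_increase = (math.exp(treatment_coef) - 1.0) * 100.0

# Multiplicative change in the odds given a normal first response.
odds_multiplier = math.exp(previous_response_coef)

print(round(pct_increase), round(odds_multiplier))  # prints: 77 35
```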
42
In summary
• Model 1 includes the treatment × period
interaction (little support from the data), and
estimates the marginal odds ratio by GEE
• Model 2 drops the period × treatment interaction
and estimates the marginal odds ratio by GEE
• Model 3 assumes that the marginal odds ratio is
1; the treatment estimate is here 0.56, with standard error 0.38
(much larger than under Models 1 and 2)
• Note: If we fit Model 3, but using robust
standard errors, then we obtain similar results
to the GEE approach.

43
• The prevalence of respiratory infections in six
consecutive quarters reveals a positive seasonal
trend with a summer maximum
• The prevalence of xerophthalmia also indicates
some seasonality with a winter maximum

44
Example: Respiratory Infections
• 275 children in Indonesia were examined for up to
six consecutive quarters for the presence of
respiratory infections ($i = 1, \ldots, m = 275$; $j = 1, \ldots, 6$
visits).
• Goals of the analysis
• Determine whether the prevalence of respiratory
infection is higher among children who suffer
from xerophthalmia (an ocular manifestation of
chronic vitamin A deficiency).
• Estimate the change of respiratory infection with
age.
• Consider seasonality as a potential confounder.

45
Cross-Sectional Analysis
• Model 1: First visit only
• Look only at the data from the first visit
• Fit a logistic regression model of respiratory
infection on xerophthalmia and age, adjusting for
other covariates
• We find a strong non-linear cross-sectional age
effect on the prevalence of respiratory infection
• Cross-sectional analysis suggests that the
prevalence of respiratory infection increases
from age 12 months and reaches its peak at age 20
months before starting to decline

46
Cross-Sectional Analysis
• Model 2: All visits, controlling for
seasonality
• Look at data from all visits
• Fit a logistic regression model of respiratory
infection on xerophthalmia and age, adjusting for
other covariates
• We still find a strong non-linear cross-sectional
age effect on prevalence of respiratory infection
• The age coefficients in Model 2 can be interpreted
as weighted averages of the cross-sectional age
coefficients for each visit.

47
Longitudinal Analysis
Here we want to distinguish the contributions of
cross-sectional and longitudinal information to
the estimated relationship of respiratory
infection and age.
48
Longitudinal Analysis
• Model 3: Separate CS from LD
• Separate differences among sub-populations of
children at different ages and a fixed time (CS)
from changes in children over time (LD)
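One common way to separate the two sources of information is to split each child's age into age at the first visit (cross-sectional part) and time since the first visit (longitudinal part). The model formula is not preserved in this transcript, so treat this decomposition as an assumption; the function name `split_age` and the ages are hypothetical.

```python
def split_age(ages):
    """Split one child's ages (in months) into baseline age and follow-up time.

    The baseline age carries the cross-sectional (CS) information; the
    follow-up times carry the longitudinal (LD) information.
    """
    baseline = ages[0]
    follow_up = [age - baseline for age in ages]
    return baseline, follow_up


# Hypothetical child seen at three quarterly visits.
baseline, follow_up = split_age([12, 15, 18])
print(baseline, follow_up)  # prints: 12 [0, 3, 6]
```

Entering the baseline age and the follow-up time as separate covariates lets the model estimate a different coefficient for each.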

49
Longitudinal Analysis
Model 3: Separate CS from LD
50
Longitudinal Analysis
Model 4: Separates CS from LD, controlling for
seasonality
51
Summary of Results
• Pattern of convex relationship between age and
the risk of respiratory infection appears to
coincide with the pattern of seasonality.
• If we include harmonic terms, then the
longitudinal parameters (corresponding to the
follow-up in the table) are not statistically
significant.
• The longitudinal information (i.e. variation over
time of respiratory infections versus variations
over time of age) is highly confounded by
seasonality.
• Therefore, in the presence of a strong seasonal
signal, we can learn little about the effects of
aging from data collected over an 18-month period
if we restrict our attention to longitudinal
information.
• However, much can be learned by comparing
children at different ages so long as we can
assume that there are no cohort effects
• BE CAREFUL: in longitudinal analysis, always
look for time-varying confounders!
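The harmonic terms mentioned in this summary can be sketched as a sine and cosine pair in calendar time. The slides do not give the exact form used, so the annual period of 12 months and the single first-order pair are assumptions; `harmonic_terms` is a hypothetical helper.

```python
import math


def harmonic_terms(time_months, period=12.0):
    """Return the (sin, cos) seasonal covariates for a time measured in months."""
    angle = 2.0 * math.pi * time_months / period
    return math.sin(angle), math.cos(angle)


# Seasonal covariates for six quarterly visits starting at month 0.
visits = [0, 3, 6, 9, 12, 15]
season = [harmonic_terms(t) for t in visits]
for t, (s, c) in zip(visits, season):
    print(t, round(s, 3), round(c, 3))
```

Including both terms as covariates lets the regression absorb a smooth annual cycle with an arbitrary peak month. Over an 18-month follow-up that cycle is hard to distinguish from a curved trend in follow-up time, which is the confounding described above.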

52
Table 8.7. Logistic regressions of the
prevalence of respiratory infection on age and
xerophthalmia, adjusting for gender, season, and
height for age. Models 1 and 2 estimate
cross-sectional effects; Models 3 and 4
distinguish cross-sectional from longitudinal
effects. Models 2-4 are fitted using the
alternating logistic regression implementation of
GEE.
[Table body not recovered. Surviving labels: $\beta_C$, $\beta_L$; CS, 1st visit only; CS, all visits; LD; LD]
53
The association between RI and Xero. is positive,
although not statistically significant at the 5%
level.
Model 2
54
Prevalence of respiratory infection increases
from age 12 months and reaches its peak at 20
months before starting to decline.
The age coefficients in Model 2 can be
interpreted as a weighted average of the
cross-sectional age coefficients from each visit.
55
The risk of RI declines in the first 7 to 8
months of follow-up, before rising later in life.
(ageij ageik)