
Lecture 9: Marginal Logistic Regression Model and GEE (Chapter 8)

Marginal Logistic Regression Model and GEE

Marginal models are suitable for estimating population-average parameters.

- For example, in the Indonesian study, a marginal model can be used to address questions such as:
  - What is the prevalence of respiratory infection in children as a function of age?
  - Is the prevalence of respiratory infection greater in the sub-population of children with vitamin A deficiency?
  - How does the association of vitamin A deficiency and respiratory infection change with age?
- The scientific objective is to characterize and contrast populations of children.

Marginal Models for Binary Responses: Logistic Regression

Model for the Mean

Model for the Association: Marginal Odds Ratio

An odds ratio greater than 1 indicates positive association.

Two possible specifications:

- The degree of association is the same for all pairs of observations from the same subject
- The degree of association is inversely proportional to the time between observations from the same subject
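For reference, a sketch of how the marginal odds ratio and the two specifications are usually written. The notation is assumed here rather than taken from the slides: Y_ij is the binary response of subject i at time t_ij, and α is an association parameter.

```latex
\[
\mathrm{OR}(Y_{ij}, Y_{ik})
  = \frac{\Pr(Y_{ij}=1,\, Y_{ik}=1)\,\Pr(Y_{ij}=0,\, Y_{ik}=0)}
         {\Pr(Y_{ij}=1,\, Y_{ik}=0)\,\Pr(Y_{ij}=0,\, Y_{ik}=1)}
\]
% Specification 1: the same association for all pairs from a subject
\[
\log \mathrm{OR}(Y_{ij}, Y_{ik}) = \alpha
\]
% Specification 2: association inversely proportional to the time separation
\[
\log \mathrm{OR}(Y_{ij}, Y_{ik}) = \frac{\alpha}{|t_{ij} - t_{ik}|}
\]
```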

Parameter Interpretation in Logistic Regression

ICHS Study

Marginal Logistic Regression

Parameter Interpretation in Logistic Regression

ICHS Study

Logistic Regression with Random Effects

Parameter Interpretation in Logistic Regression

ICHS Study

Transition Logistic Regression Model

Parameter Interpretation in Logistic Regression

ICHS Study

Transition Logistic Regression Model (contd)

Maximum Likelihood Estimation of β in GLM

Cross-sectional Data

- If Yi is binary or a count, we specify the likelihood function and estimate the parameters of interest using maximum likelihood estimation.

Maximum Likelihood Estimation of β in GLM

Cross-sectional Data

- For example, if Y is binary, we estimate β0 and β1 by maximizing the Bernoulli likelihood.
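A sketch of the binary case, assuming a single covariate x_i and a logit link (the covariate name and the symbol μ_i are assumptions made for illustration):

```latex
\[
\operatorname{logit} \Pr(Y_i = 1 \mid x_i)
  = \log \frac{\mu_i}{1 - \mu_i}
  = \beta_0 + \beta_1 x_i ,
\qquad \mu_i = \Pr(Y_i = 1 \mid x_i)
\]
\[
\ell(\beta_0, \beta_1)
  = \sum_{i=1}^{m} \bigl[\, y_i \log \mu_i + (1 - y_i) \log (1 - \mu_i) \,\bigr]
\]
```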

Maximum Likelihood Estimation of β in GLM

Cross-sectional Data

- For example, if Y is a count, we estimate β0 and β1 by maximizing the Poisson likelihood.
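A sketch of the count case, assuming a single covariate x_i and a log link (again, the covariate name and μ_i are assumptions made for illustration):

```latex
\[
\log E(Y_i \mid x_i) = \log \mu_i = \beta_0 + \beta_1 x_i
\]
\[
\ell(\beta_0, \beta_1)
  = \sum_{i=1}^{m} \bigl[\, y_i \log \mu_i - \mu_i - \log (y_i!) \,\bigr]
\]
```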

Maximum Likelihood Estimation of β in GLM

Cross-sectional Data

In general, the estimate of β solves the score equation, obtained by setting the derivative of the log-likelihood with respect to β to zero. Solving the score equation is equivalent to maximizing the likelihood function.

Solutions to the score equation are generally not available in closed form, and so require an iterative procedure called the iterative weighted least squares (IWLS) algorithm.
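In the usual GLM notation (assumed here: μ_i = E(Y_i) depends on β through the link function, and v_i = var(Y_i)), the score equation can be sketched as:

```latex
\[
S(\beta)
  = \sum_{i=1}^{m} \frac{\partial \mu_i}{\partial \beta}\, v_i^{-1}\,
    \bigl( y_i - \mu_i(\beta) \bigr) = 0
\]
```

Only the mean μ_i and the variance v_i of each observation enter this expression.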

Maximum Likelihood Estimation of β in GLM

Cross-sectional Data

Main ideas of IWLS

- $\mu_i = E(Y_i)$, $v_i = \operatorname{var}(Y_i) = v(\mu_i)$
- Choose β to make $\mu_i(\beta)$ close to $y_i$ on average
- Weight $y_i$ by $v_i^{-1}$
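The ideas above can be sketched for logistic regression with an intercept and one covariate. This is a minimal illustration in plain Python, not the lecture's implementation: the function name, the toy data, and the choice to write the update as a Fisher scoring step (which is algebraically the same as one weighted least squares fit) are all assumptions.

```python
import math

def fit_logistic_iwls(x, y, n_iter=25, tol=1e-10):
    """Fit logit Pr(Y=1) = b0 + b1*x by Fisher scoring (equivalent to IWLS)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Current fitted means mu_i and variances v_i = mu_i * (1 - mu_i)
        mu = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        v = [m * (1.0 - m) for m in mu]
        # Score vector: sum_i (1, x_i) * (y_i - mu_i)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' W X with weights W = diag(v_i)
        i00 = sum(v)
        i01 = sum(xi * vi for xi, vi in zip(x, v))
        i11 = sum(xi * xi * vi for xi, vi in zip(x, v))
        det = i00 * i11 - i01 * i01
        # Update step: solve the 2x2 system (X'WX) d = score
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data (hypothetical): binary covariate, 1/4 successes at x=0, 3/4 at x=1
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0_hat, b1_hat = fit_logistic_iwls(x, y)
```

With a single binary covariate the model is saturated, so the fitted values should reproduce the observed proportions: b0_hat approaches log(1/3) and b1_hat approaches log(9).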

GEE Estimation of β in GLM: Longitudinal Data

In the case of a linear regression model with the assumption of normality, the extension from ordinary linear regression to longitudinal problems was facilitated by thinking about a multivariate normal distribution. By specifying a model for the mean E(Yi) and a model for the covariance matrix Vi, we can fully specify the multivariate normal distribution and use MLE.

GEE Estimation of β in GLM: Longitudinal Data

Unfortunately, if the elements of Yi are counts or binary responses, we cannot naturally extend the Bernoulli or Poisson distributions to take correlation into account. Multivariate extensions of these distributions are quite complex (except for biostat students!). The main impediments with binary and count data are:

- There are no simple multivariate generalizations of the necessary probability distributions
- Population-average and subject-specific approaches do not lead to the same model for the mean response

GEE (applies only to marginal models)

- Under a GEE approach, we forget about trying to specify a model for the whole multivariate distribution of a data vector. Instead, the idea is to just model the mean response E(Yi) and the covariance matrix Vi of a data vector, as in the normal case.
- GEE is based on the concept of estimating equations and provides a very general approach for analyzing correlated responses that can be discrete or continuous.

GEE

- The idea behind GEE is to generalize and extend the usual likelihood equations for a GLM with a univariate response by incorporating the covariance matrix of the vector of responses Y
- For the case of linear models, the generalized least squares (GLS) estimator for the vector of regression coefficients is a special case of the GEE approach

GEE

- In the absence of a convenient likelihood to work with, it is sensible to estimate β by solving a multivariate analogue of the score equation.

Note: with continuous data, the estimate from this score equation reduces to the MLE.
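In its standard form the multivariate estimating equation can be sketched as follows. The symbols are assumptions chosen to match the rest of the lecture: Y_i is the response vector of subject i, μ_i(β) = E(Y_i) its mean, V_i its (working) covariance matrix, and m the number of subjects.

```latex
\[
U(\beta)
  = \sum_{i=1}^{m}
    \left( \frac{\partial \mu_i}{\partial \beta} \right)^{\!\top}
    V_i^{-1}\, \bigl( Y_i - \mu_i(\beta) \bigr) = 0
\]
```

With a single observation per subject, V_i reduces to the scalar variance v_i and this becomes the univariate GLM score equation.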

GEE (contd)

- The method of generalized estimating equations provides consistent estimates of the mean parameters even when a model for the correlation cannot be reliably specified.
- The GEE is a multivariate generalization of the score equation used to maximize the likelihood function under a GLM.

GLM for Longitudinal Data (GEE)

In summary

- For GEE models, we specify a GLM for the mean response
- We choose a working correlation structure, e.g. independence or completely unstructured
- The estimates of β and their standard errors will be consistent (i.e. unbiased for large sample size).
- If the specification of Vi is correct, then the GEE solution is the maximum likelihood estimate.

GEE

One important property of the GLM family is that the score function depends only on the mean and variance of Yi. Therefore the estimating equation can be used to estimate the regression coefficients for any choices of link and variance functions, whether or not they correspond to a particular member of the exponential family. This is the generalized estimating equation.

GEE Properties

- The GEE estimate of β is nearly efficient relative to the maximum likelihood estimate of β, provided that var(Yi) has been reasonably approximated.
- GEE is the maximum likelihood score equation for multivariate Gaussian data, and for binary data, when var(Yi) is correctly specified.
- The GEE estimate is consistent for β, even when var(Yi) is incorrectly specified.

What we need to specify for implementing GEE

- Model for the mean
- Known variance function
- Working correlation matrix: a model for the pairwise correlations among the responses
- Working covariance matrix

V is called the working covariance matrix to distinguish it from the true underlying covariance of Y.
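As an illustration of how these pieces combine, the working covariance is commonly assembled as V = A^(1/2) R A^(1/2), where A is the diagonal matrix of variances from the variance function and R is the working correlation matrix. The sketch below assumes binary responses (variance function μ(1 − μ), scale parameter 1) and an exchangeable R. The function name and the numbers are hypothetical.

```python
import math

def working_covariance(mu, rho):
    """Working covariance V = A^(1/2) R A^(1/2) for one subject.

    mu  : list of marginal means Pr(Y_ij = 1) for the subject's visits
    rho : common working correlation (exchangeable structure)
    """
    # Standard deviations from the binary variance function v(mu) = mu(1 - mu)
    sd = [math.sqrt(m * (1.0 - m)) for m in mu]
    n = len(mu)
    # R has 1 on the diagonal and rho everywhere else
    return [
        [sd[j] * sd[k] * (1.0 if j == k else rho) for k in range(n)]
        for j in range(n)
    ]

# Hypothetical subject with two visits
V = working_covariance([0.2, 0.5], rho=0.3)
```

Setting rho to 0 gives the independence working covariance, which is the diagonal matrix of variances.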

GEE

minimize

GEE equations

Solution of the GEE equation

Properties of GEE Estimates

- The GEE estimator is consistent whether or not the within-subject associations/correlations have been correctly modelled.
- That is, for the GEE estimator to provide a valid estimate of the true β, we only require that the model for the mean response has been correctly specified.

Asymptotic distribution of the GEE estimator

- In large samples, the GEE estimator is multivariate normal.
- The sandwich estimate of its covariance matrix combines a model-based "bread" with an empirical "meat" built from a consistent estimate of the true covariance matrix of Y.
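To make the bread and meat concrete, here is a minimal sketch for the simplest possible case: an intercept-only model with identity link and independence working correlation, where the GEE estimate is the overall mean and the sandwich has a closed form. The function name and the data are hypothetical, and this is not the general estimator.

```python
def gee_mean_independence(clusters):
    """Intercept-only GEE with identity link and independence working correlation.

    clusters : list of lists, one list of responses per subject.
    Returns (beta_hat, naive_var, robust_var).
    """
    n_total = sum(len(c) for c in clusters)
    beta = sum(sum(c) for c in clusters) / n_total

    # Bread: sum_i D_i' V_i^{-1} D_i with D_i a vector of ones and V_i = I
    # (any constant scale in V_i cancels in the sandwich product).
    bread = float(n_total)
    # Meat: sum_i (D_i' V_i^{-1} r_i)^2, where r_i = y_i - beta are the
    # residuals of subject i; this uses the observed within-subject covariance.
    meat = sum(sum(yij - beta for yij in c) ** 2 for c in clusters)
    robust_var = meat / bread ** 2

    # Model-based ("naive") variance: treats all observations as independent
    resid_ss = sum((yij - beta) ** 2 for c in clusters for yij in c)
    sigma2 = resid_ss / (n_total - 1)
    naive_var = sigma2 / n_total
    return beta, naive_var, robust_var

# Hypothetical data: three subjects whose responses cluster tightly
clusters = [[1.0, 1.2, 0.9], [2.0, 2.1, 1.8], [3.1, 2.9, 3.0]]
beta_hat, naive_var, robust_var = gee_mean_independence(clusters)
```

Because responses within a subject are strongly positively correlated in this toy data, the robust variance comes out larger than the naive one, which is the situation where ignoring the correlation would overstate precision.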

Link to Stata command xtgee for continuous data

- Substitute into the GEE equations to get the estimate of β
- xtgee depvar indepvars, family(gaussian) link(identity) corr(exchangeable)
- Use weighted least squares

Link to Stata command xtgee for continuous data (contd)

- xtgee depvar indepvars, family(gaussian) link(identity) corr(exchangeable) vce(robust)
- Use the sandwich estimator

Link to Stata command xtgee for binary data

- Substitute into the GEE equation. There is no closed-form solution, so an iterative procedure is needed.
- The difference between using robust or not is analogous to continuous data.
- xtgee depvar indepvars, family(binomial) link(logit) corr(exchangeable)
- xtgee depvar indepvars, family(binomial) link(logit) corr(exchangeable) vce(robust)

Using GEE

Bottom Line

If the scientific focus is on the regression coefficients β:

- Focus on modeling the mean structure
- Use a reasonable approximation of the covariance structure
- Check the inferences for β by comparing the estimates of β and their robust standard errors under different covariance assumptions
- If the estimates or their standard errors differ substantially, a more careful treatment of the covariance model might be necessary.

Maximum Likelihood Estimation for Binary Data

Example 2x2 crossover trial

Data from the 2x2 crossover trial on cerebrovascular deficiency, adapted from Jones and Kenward, where treatments A and B are active drug and placebo, respectively; the outcome indicates whether an electrocardiogram was judged abnormal (0) or normal (1).

Example 2x2 crossover trial (contd)

Goal: To compare the effect of an active drug (A) and a placebo (B) on cerebrovascular deficiency

- 34 patients received A followed by B
- 33 patients received B followed by A
- $Y_{ij} = 1$ if normal electrocardiogram reading

At period 1

Example 2x2 crossover trial (contd)

Calculate the MLE of the odds ratios separately for period 1 and period 2. The estimated odds ratio of being normal for the active drug versus the placebo is larger than 1, and therefore indicates that the active drug produces a higher proportion of normal readings. However, the estimate is not statistically significant.

Should we compare (???) the data for Periods 1 and 2?
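The per-period calculation described above can be sketched as follows. The sample odds ratio from a 2x2 table is the MLE, and a Wald interval on the log scale gives a quick significance check. The counts below are hypothetical placeholders, not the trial data, and the function name is an assumption.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b : normal / abnormal counts under the active drug
    c, d : normal / abnormal counts under the placebo
    """
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR): square root of the sum of reciprocal counts
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

# HYPOTHETICAL counts for one period (not taken from the trial)
or_hat, lo, hi = odds_ratio_wald(20, 10, 15, 15)
```

In this made-up table the odds ratio is above 1 but the interval covers 1, which mirrors the pattern described on the slide: an estimate favouring the drug that is not statistically significant.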

Example 2x2 crossover trial (contd)

This approach has several limitations:

- It ignores the carry-over effect, i.e. the effect of the treatment at period 1 might influence the response at period 2 (treatment x period interaction)
- Two responses for the same subject are likely to be correlated
- In fact, the odds ratio is estimated to be

So, let's use GEE to estimate a population-average odds ratio, taking into account within-subject correlation.

Example 2x2 crossover trial (contd)

GEE Approach

- We combine data from both periods.
- We can analyze a 2x2 crossover trial as a longitudinal study with $n_i = n = 2$ and $m = 67$.

Example 2x2 crossover trial (contd)

GEE Approach (contd)

- Fit a logistic regression model

$\exp(0.57) - 1 = 0.77$ ⇒ Population-average odds of a normal reading are estimated to be 77% higher when using the drug as compared to the placebo

$\exp(3.56) \approx 35$ ⇒ Subjects with normal responses at the first visit have odds of a normal reading at the next visit that are about 35 times higher than those whose first response was abnormal
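The two conversions above, from a log odds ratio to a percentage change in odds and to a multiplicative odds ratio, can be checked directly. The helper name is hypothetical; the coefficients 0.57 and 3.56 are the ones quoted on the slide.

```python
import math

def percent_change_in_odds(log_odds_ratio):
    """Percentage change in the odds implied by a logistic coefficient."""
    return (math.exp(log_odds_ratio) - 1.0) * 100.0

# Treatment coefficient from the slide: about 77% higher odds
treatment_pct = percent_change_in_odds(0.57)

# Association parameter from the slide: odds ratio of about 35
association_or = math.exp(3.56)
```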

In summary

- Model 1 includes the treatment x period interaction (little support from the data), and estimates the marginal odds ratio by GEE
- Model 2 drops the period x treatment interaction and estimates the marginal odds ratio by GEE
- Model 3 assumes that the marginal odds ratio is 1; here the estimate is 0.56, with standard error 0.38 (much larger than under Models 1 and 2)
- Note: If we fit Model 3 but use robust standard errors, then we obtain similar results to the GEE approach.

- The prevalence of respiratory infections in six consecutive quarters reveals a positive seasonal trend with a summer maximum
- The prevalence of xerophthalmia also indicates some seasonality, with a winter maximum

Example Respiratory Infections

- 275 children in Indonesia were examined for up to six consecutive quarters for the presence of respiratory infections ($i = 1, \ldots, m = 275$; $j = 1, \ldots, 6$ visits).
- Goals of the analysis:
  - Determine whether the prevalence of respiratory infection is higher among children who suffer from xerophthalmia (an ocular manifestation of chronic vitamin A deficiency).
  - Estimate the change in the prevalence of respiratory infection with age.
  - Consider seasonality as a potential confounder.

Cross-Sectional Analysis

- Model 1: First visit only
- Look only at the data from the first visit
- Fit a logistic regression model of respiratory infection on xerophthalmia and age, adjusting for other covariates
- We find a strong non-linear cross-sectional age effect on the prevalence of respiratory infection
- Cross-sectional analysis suggests that the prevalence of respiratory infection increases from age 12 months and reaches its peak at age 20 months before starting to decline

Cross-Sectional Analysis

- Model 2: All visits, controlling for seasonality
- Look at data from all visits
- Fit a logistic regression model of respiratory infection on xerophthalmia and age, adjusting for other covariates
- We still find a strong non-linear cross-sectional age effect on the prevalence of respiratory infection
- The age coefficients in Model 2 can be interpreted as weighted averages of the cross-sectional age coefficients for each visit.

Longitudinal Analysis

Here we want to distinguish the contributions of cross-sectional and longitudinal information to the estimated relationship of respiratory infection and age.

Longitudinal Analysis

- Model 3: Separate CS from LDA
- Separate differences among sub-populations of children at different ages and a fixed time (CS) from changes in children over time (LD)
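One common way to write this separation is to split each child's age into the age at the first visit and the time elapsed since then. The sketch below is simplified on purpose: it uses a linear age effect, whereas the lecture reports a non-linear one, and the symbols β_C, β_L and the baseline index i1 are assumptions.

```latex
\[
\operatorname{logit} \Pr(Y_{ij} = 1)
  = \beta_0
  + \beta_C \,\mathrm{age}_{i1}
  + \beta_L \,(\mathrm{age}_{ij} - \mathrm{age}_{i1})
\]
```

Here β_C compares children of different ages at a fixed time (cross-sectional), and β_L describes the change within a child as follow-up time accumulates (longitudinal). Setting β_C = β_L recovers a model with a single age effect.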

Longitudinal Analysis

Model 3: Separate CS from LDA

Longitudinal Analysis

Model 4: Separate CS from LDA, controlling for seasonality

Summary of Results

- The pattern of a convex relationship between age and the risk of respiratory infection appears to coincide with the pattern of seasonality.
- If we include harmonic terms, then the longitudinal parameters (corresponding to the follow-up in the table) are not statistically significant.
- The longitudinal information (i.e. variation over time of respiratory infections versus variations over time of age) is highly confounded by seasonality.
- Therefore, in the presence of a strong seasonal signal, we can learn little about the effects of aging from data collected over an 18-month period if we restrict our attention to longitudinal information.
- However, much can be learned by comparing children at different ages, so long as we can assume that there are no cohort effects confounding the inferences about age.
- BE CAREFUL: in longitudinal analysis, always look for time-varying confounders!
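The harmonic terms mentioned above are usually a sine and cosine pair with an annual period, added as covariates so the model can absorb a smooth seasonal cycle. A minimal sketch follows, assuming time is measured in months and the cycle is 12 months long. The function name and the period are assumptions, not details taken from the lecture.

```python
import math

def harmonic_terms(month, period=12.0):
    """Sine and cosine seasonal covariates for a given time in months."""
    angle = 2.0 * math.pi * month / period
    return math.sin(angle), math.cos(angle)

# Covariates at quarterly visits over 18 months (hypothetical visit times)
visit_months = [0, 3, 6, 9, 12, 15]
seasonal_covariates = [harmonic_terms(t) for t in visit_months]
```

Because age and calendar time advance together within a child, follow-up time and these seasonal covariates carry overlapping information over a short study, which is the confounding the summary warns about.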

Table 8.7. Logistic regressions of the prevalence of respiratory infection on age and xerophthalmia, adjusting for gender, season, and height for age. Models 1 and 2 estimate cross-sectional effects; Models 3 and 4 distinguish cross-sectional from longitudinal effects. Models 2-4 are fitted using the alternating logistic regression implementation of GEE.

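Table annotations: βC (cross-sectional age effect), βL (longitudinal age effect). Model 1: CS, 1st visit only. Model 2: CS, all visits. Models 3 and 4: LD.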

The association between RI and Xero. is positive, although not statistically significant at the 5% level.

Model 2

Prevalence of respiratory infection increases from age 12 months and reaches its peak at 20 months before starting to decline.

The age coefficients in Model 2 can be interpreted as a weighted average of the cross-sectional age coefficients from each visit.

The risk of RI declines in the first 7 to 8 months of follow-up, before rising later in life.

$(\mathrm{age}_{ij} - \mathrm{age}_{ik})$