Logistic Regression and the new: Residual Logistic Regression - PowerPoint PPT Presentation

1
Logistic Regression and the New: Residual Logistic Regression
  • F. Berenice Baez-Revueltas
  • Wei Zhu

2
Outline
  1. Logistic Regression
  2. Confounding Variables
  3. Controlling for Confounding Variables
  4. Residual Linear Regression
  5. Residual Logistic Regression
  6. Examples
  7. Discussion
  8. Future Work

3
1. Logistic Regression Model
  • In 1938, Ronald Fisher and Frank Yates
    suggested the logit link for regression with a
    binary response variable.

4
A popular model for a categorical response variable
  • The logistic regression model is the most popular
    model for binary data.
  • It is generally used to study the relationship
    between a binary response variable and a group of
    predictors, which can be either continuous or
    categorical.
  • Y = 1 (true, success, YES, etc.) or
  • Y = 0 (false, failure, NO, etc.)
  • The logistic regression model can be extended to
    a categorical response variable with more than
    two categories. The resulting model is sometimes
    referred to as the multinomial logistic
    regression model, in contrast to the binomial
    logistic regression model for a binary response
    variable.

5
More on the rationale of the logistic regression
model
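  • Consider a binary response variable Y = 0 or 1 and
    a single predictor variable x. We want to model
    $E(Y \mid x) = P(Y=1 \mid x)$ as a function of x.
    The logistic regression model expresses the
    logistic transform of $P(Y=1 \mid x)$ as a linear
    function of the predictor:
    $\ln\frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = \beta_0 + \beta_1 x$
  • This model can be rewritten as
    $P(Y=1 \mid x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$
  • $E(Y \mid x) = P(Y=1 \mid x) \cdot 1 + P(Y=0 \mid x) \cdot 0 = P(Y=1 \mid x)$
    is bounded between 0 and 1 for all values of x.
    The following linear model may violate this
    condition:
  • $P(Y=1 \mid x) = \beta_0 + \beta_1 x$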
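To make the boundedness point concrete, here is a minimal Python sketch that evaluates both forms on a few values of x. The coefficients B0 and B1 and the grid of x values are arbitrary illustrative assumptions, not estimates from any data in this presentation.

```python
import math

def logistic_prob(x, b0, b1):
    """P(Y=1|x) under the logistic model: exp(eta) / (1 + exp(eta))."""
    eta = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-eta))

def linear_prob(x, b0, b1):
    """P(Y=1|x) under the straight-line model b0 + b1*x."""
    return b0 + b1 * x

# Hypothetical coefficients, chosen only for illustration.
B0, B1 = 0.5, 0.2
xs = [-10, -2, 0, 2, 10]

logistic_values = [logistic_prob(x, B0, B1) for x in xs]
linear_values = [linear_prob(x, B0, B1) for x in xs]

for x, p_logit, p_line in zip(xs, logistic_values, linear_values):
    print(f"x={x:>4}  logistic={p_logit:.4f}  linear={p_line:.4f}")
```

The logistic values stay strictly inside (0, 1), while the straight line leaves that interval at the extreme x values.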

6
More on the properties of the logistic regression
model
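  • In the simple logistic regression, the regression
    coefficient $\beta_1$ of x has the interpretation
    that it is the log of the odds ratio of a success
    event (Y = 1) for a unit change in x.
  • For multiple predictor variables, the logistic
    regression model is
    $\ln\frac{P(Y=1 \mid x_1, \dots, x_p)}{1 - P(Y=1 \mid x_1, \dots, x_p)} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$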
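The odds-ratio interpretation can be checked numerically. The sketch below, with hypothetical coefficients B0 and B1, computes the odds of success at x and at x + 1 and confirms that their ratio equals exp(B1) regardless of the starting x.

```python
import math

def logistic_prob(x, b0, b1):
    """P(Y=1|x) under the simple logistic regression model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds of success for a success probability p."""
    return p / (1.0 - p)

def odds_ratio_per_unit(x, b0, b1):
    """Odds at x + 1 divided by odds at x."""
    return odds(logistic_prob(x + 1.0, b0, b1)) / odds(logistic_prob(x, b0, b1))

# Hypothetical coefficients, chosen only for illustration.
B0, B1 = -1.0, 0.7

ratios = [odds_ratio_per_unit(x, B0, B1) for x in (-3.0, 0.0, 2.5)]
print(ratios, math.exp(B1))
```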

7
Logistic Regression, SAS Procedure
  • http://www.ats.ucla.edu/stat/sas/output/SAS_logit_output.htm
  • Proc Logistic
  • This page shows an example of logistic regression
    with footnotes explaining the output. The data
    were collected on 200 high school students, with
    measurements on various tests, including science,
    math, reading and social studies. The response
    variable is high writing test score (honcomp),
    where a writing score greater than or equal to 60
    is considered high and a score below 60 is
    considered low. We explore its relationship with
    gender (female), reading test score (read), and
    science test score (science). The dataset used in
    this page can be downloaded from
    http://www.ats.ucla.edu/stat/sas/webbooks/reg/default.htm.
  • data logit;
  • set "c:\temp\hsb2";
  • honcomp = (write >= 60);  /* 1 = high writing score, 0 = low */
  • run;
  • proc logistic data=logit descending;  /* descending: model P(honcomp = 1) */
  • model honcomp = female read science;
  • run;

8
Logistic Regression, SAS Output

9
2. Confounding Variables
  • Correlated with both the dependent and the
    independent variables
  • Represent a major threat to the validity of
    inferences on cause and effect
  • Add to multicollinearity
  • Can lead to over- or underestimation of an effect
    and can even reverse the direction of the
    conclusion
  • Add error to the interpretation of what may be an
    accurate measurement

10
  • For a variable to be a confounder it needs to
    have:
  • A relationship with the exposure
  • A relationship with the outcome even in the
    absence of the exposure (not an intermediary)
  • No position on the causal pathway
  • An uneven distribution across comparison groups

[Diagram: Exposure, Outcome, Third variable]

11
Confounding
Maternal age is correlated with birth order and is a
risk factor for Down syndrome, even if birth order
is low

No Confounding
Smoking is correlated with alcohol consumption and
is a risk factor for lung cancer, even for persons
who don't drink alcohol

12
3. Controlling for Confounding Variables
  • In study design:
  • Restriction
  • Random allocation of subjects to study groups, to
    attempt to even out unknown confounders
  • Matching subjects on potential confounders

13
  • In data analysis:
  • Stratified analysis using the Mantel-Haenszel
    method to adjust for confounders
  • Case-control studies
  • Cohort studies
  • Restriction (still possible, but it means
    throwing data away)
  • Model fitting using regression techniques
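The slide names the Mantel-Haenszel method without giving a formula. As a rough sketch, the standard Mantel-Haenszel pooled odds ratio over 2x2 strata can be computed as below. The counts are hypothetical and were chosen so that the exposure has no effect within either stratum while the crude (unstratified) odds ratio suggests one.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of one 2x2 table.

    a, b = exposed cases, exposed non-cases
    c, d = unexposed cases, unexposed non-cases
    """
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts, one (a, b, c, d) table per level of the confounder.
strata = [(10, 90, 5, 45), (40, 10, 80, 20)]

# Crude odds ratio: collapse the strata into a single table first.
crude = odds_ratio(*[sum(cell) for cell in zip(*strata)])
pooled = mantel_haenszel_or(strata)
print(f"crude OR = {crude:.3f}, Mantel-Haenszel OR = {pooled:.3f}")
```

In this made-up example the crude odds ratio is well below 1, while the stratum-adjusted estimate is 1, which is the kind of distortion stratification is meant to remove.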

14
Pros and Cons of Controlling Methods
  • Matching methods call for subjects with exactly
    the same characteristics
  • Risk of over- or under-matching
  • Cohort studies can lose too much information
    when subjects are excluded
  • Some strata might become too thin to yield
    significant results, which also loses information
  • Regression methods, if well handled, can control
    for confounding factors

15
4. Residual Linear Regression
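  • Consider a dependent variable Y and a set of n
    independent covariates $X_1, \dots, X_n$, of which
    the first k (k < n) are potential confounding
    factors
  • The initial model treats only the confounding
    variables:
    $Y = \alpha_0 + \alpha_1 X_1 + \dots + \alpha_k X_k + \varepsilon$
  • Residuals are calculated from this model. Let
    $\hat{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 X_1 + \dots + \hat{\alpha}_k X_k$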

16
  • The residuals are $e = Y - \hat{Y}$, with the
    following properties:
  • Zero mean
  • Homoscedasticity
  • Normally distributed
  • ,
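  • This residual $e$ will be considered the new
    dependent variable. That is, the new model to be
    fitted is
    $e = \beta_0 + \beta_{k+1} X_{k+1} + \dots + \beta_n X_n + \varepsilon$
  • which, with $\hat{Y}$ the fitted value of the
    initial model, is equivalent to
    $Y = \hat{Y} + \beta_0 + \beta_{k+1} X_{k+1} + \dots + \beta_n X_n + \varepsilon$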
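The two-stage procedure can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: one confounder (x1), one variable of interest (x2), made-up data, and a hypothetical helper simple_ols for one-predictor least squares.

```python
def simple_ols(x, y):
    """Least-squares (intercept, slope) for the line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: x1 is the confounder, x2 the variable of interest.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8]

# Stage 1: initial model with the confounder only, then its residuals.
a0, a1 = simple_ols(x1, y)
y_hat = [a0 + a1 * v for v in x1]
e = [yi - fitted for yi, fitted in zip(y, y_hat)]

# Stage 2: the residual becomes the new dependent variable.
b0, b2 = simple_ols(x2, e)
print(f"stage 1: a0={a0:.3f}, a1={a1:.3f}")
print(f"stage 2: b0={b0:.3f}, b2={b2:.3f}")
```

Because the first stage includes an intercept, the residuals e sum to zero, which matches the zero-mean property listed for them.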

17
The Usual Logistic Regression Approach to
Control for Confounders
  • Consider a binary outcome Y and n covariates
    $X_1, \dots, X_n$, where the first k (k < n) are
    potential confounding factors
  • The usual way to control for these confounding
    variables is to simply put all n variables in
    the same model:
    $\ln\frac{P(Y=1)}{1 - P(Y=1)} = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$

18
5. Residual Logistic Regression
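  • Each subject has a binary outcome Y
  • Consider n covariates $X_1, \dots, X_n$, where the
    first k (k < n) are potential confounding factors
  • Initial model, with $\pi$ as the probability of
    success, in which only the confounding effect is
    analyzed:
    $\ln\frac{\pi}{1 - \pi} = \alpha_0 + \alpha_1 X_1 + \dots + \alpha_k X_k$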

19
Method 1
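  • The effect of the confounding variables is
    retained and plugged into the second-level
    regression model along with the variables of
    interest, following the residual linear
    regression approach.
  • That is, let
    $T = \hat{\alpha}_0 + \hat{\alpha}_1 X_1 + \dots + \hat{\alpha}_k X_k$
    be the fitted linear predictor of the initial
    confounder-only model.
  • The new model to be fitted is
    $\ln\frac{\pi}{1 - \pi} = T + \beta_0 + \beta_{k+1} X_{k+1} + \dots + \beta_n X_n$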
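One way to read Method 1 is as a logistic regression in which the fitted confounder effect enters as an offset, that is, a term whose coefficient is fixed at 1. The sketch below follows that reading, which is an assumption and not confirmed by the slides. It uses made-up data, one confounder and one variable of interest, and a hypothetical Newton-Raphson helper fit_logistic.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_logistic(x, y, offset=None, iterations=50, tol=1e-10):
    """Newton-Raphson fit of logit(p_i) = offset_i + b0 + b1 * x_i.

    The offset enters with a fixed coefficient of 1. Returns (b0, b1).
    """
    if offset is None:
        offset = [0.0] * len(x)
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(t + b0 + b1 * xi) for t, xi in zip(offset, x)]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # 2 x 2 information matrix (negative Hessian).
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data: one confounder, one binary variable of interest.
confounder = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
interest = [0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

# Initial model: confounder only.
a0, a1 = fit_logistic(confounder, y)
# Retained confounder effect T, one value per subject.
T = [a0 + a1 * c for c in confounder]
# Second-level model: T as offset plus the variable of interest.
b0, b1 = fit_logistic(interest, y, offset=T)
odds_ratio = math.exp(b1)
print(f"initial model: a0={a0:.3f}, a1={a1:.3f}")
print(f"second level:  b0={b0:.3f}, b1={b1:.3f}, OR={odds_ratio:.3f}")
```

Because the second level is still a logistic model, exponentiating b1 gives an odds ratio for the variable of interest.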

20
Method 2
  • Pearson residuals are calculated from the initial
    model (Hosmer and Lemeshow, 1989):
    $Z = \frac{Y - \hat{\pi}}{\sqrt{\hat{\pi}(1 - \hat{\pi})}}$
  • where $\hat{\pi}$ is the estimated probability of
    success based on the confounding variables alone
  • The second-level regression will use this
    residual as the new dependent variable.

21
  • Therefore the new dependent variable is Z and,
    because it is no longer dichotomous, we can
    apply a multiple linear regression model to
    analyze the effect of the rest of the covariates.
  • The new model to be fitted is a linear
    regression model:
    $Z = \beta_0 + \beta_{k+1} X_{k+1} + \dots + \beta_n X_n + \varepsilon$
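A minimal sketch of Method 2 follows. To keep it short, the fitted probabilities pi_hat are hypothetical stand-ins for the output of the initial confounder-only model, and the second level uses a single variable of interest with a hypothetical one-predictor helper simple_ols.

```python
import math

def pearson_residual(y, pi_hat):
    """Pearson residual of a binary outcome y given fitted probability pi_hat."""
    return (y - pi_hat) / math.sqrt(pi_hat * (1.0 - pi_hat))

def simple_ols(x, z):
    """Least-squares (intercept, slope) for the line z = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope

# Hypothetical binary outcomes and fitted probabilities from the initial model.
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
pi_hat = [0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 0.6, 0.7, 0.7, 0.8]
# Hypothetical variable of interest (binary here, but it need not be).
x = [0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1]

# New dependent variable Z, then an ordinary linear regression on x.
z = [pearson_residual(yi, pi) for yi, pi in zip(y, pi_hat)]
b0, b1 = simple_ols(x, z)
print(f"b0={b0:.3f}, b1={b1:.3f}")
```

The slope b1 is on the scale of the Pearson residual, not the log-odds scale, so it cannot be exponentiated into an odds ratio.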

22
6. Example 1
  • Data: Low Birth Weight
  • Dow: Indicator of birth weight less than 2.5 kg
  • Age: Mother's age in years
  • Lwt: Mother's weight in pounds
  • Smk: Smoking status during pregnancy
  • Ht: History of hypertension

       Age      Lwt      Smk      Ht
Age    1.0000   0.1738  -0.0444  -0.0158
Lwt             1.0000  -0.0408   0.2369
Smk                      1.0000   0.0134
Ht                                1.0000
Correlation matrix with alpha = 0.05

23
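  • Potential confounding factor: Age
  • Model for $\pi$ (probability of low birth weight)
  • Logistic regression:
    $\ln\frac{\pi}{1 - \pi} = \beta_0 + \beta_1 \mathrm{Age} + \beta_2 \mathrm{Lwt} + \beta_3 \mathrm{Smk} + \beta_4 \mathrm{Ht}$
  • Residual logistic regression
  • Initial model:
    $\ln\frac{\pi}{1 - \pi} = \alpha_0 + \alpha_1 \mathrm{Age}$
  • Method 1:
    $\ln\frac{\pi}{1 - \pi} = T + \beta_0 + \beta_1 \mathrm{Lwt} + \beta_2 \mathrm{Smk} + \beta_3 \mathrm{Ht}$,
    with $T = \hat{\alpha}_0 + \hat{\alpha}_1 \mathrm{Age}$
  • Method 2:
    $Z = \beta_0 + \beta_1 \mathrm{Lwt} + \beta_2 \mathrm{Smk} + \beta_3 \mathrm{Ht} + \varepsilon$,
    with $Z$ the Pearson residual of the initial model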

24
Results

Variable   Logistic regression             RLR Method 1
           Odds ratio  P-value  SE         Odds ratio  P-value  SE
lwt        0.988       0.060    0.0064     0.989       0.078    0.0065
smk        3.480       0.001    0.3576     3.455       0.001    0.3687
ht         3.395       0.053    0.6322     3.317       0.059    0.6342

RLR Method 2
Variable   P-value  SE
lwt        0.077    0.0024
smk        0.000    0.1534
ht         0.042    0.3094

Confounding factors
Variable   P-value (logistic regression)  P-value (initial model)
Age        0.055                          0.027

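The slides do not say on which scale the SE column is reported. Assuming it is the standard error of the log odds ratio (an assumption), the Wald statistic, two-sided p-value and an approximate 95% confidence interval can be recomputed from the reported odds ratios, as in this rough sketch. The function name wald_summary is illustrative.

```python
import math

def wald_summary(odds_ratio, se_log_odds, z_crit=1.96):
    """Wald z, two-sided normal p-value and approximate 95% CI for an odds ratio.

    Assumes se_log_odds is the standard error of log(odds_ratio).
    """
    beta = math.log(odds_ratio)
    z = beta / se_log_odds
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    lower = math.exp(beta - z_crit * se_log_odds)
    upper = math.exp(beta + z_crit * se_log_odds)
    return z, p_value, (lower, upper)

# (odds ratio, SE) pairs copied from the logistic regression columns
# of the Example 1 results table.
table = {"lwt": (0.988, 0.0064), "smk": (3.480, 0.3576), "ht": (3.395, 0.6322)}

results = {name: wald_summary(ratio, se) for name, (ratio, se) in table.items()}
for name, (z, p_value, (lower, upper)) in results.items():
    print(f"{name}: z={z:.2f}, p={p_value:.4f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Under this assumption the recomputed p-values for lwt and ht land close to the reported 0.060 and 0.053, which suggests the SE column is indeed on the log-odds scale. Small differences are expected because the odds ratios are rounded to three decimals.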
25
Example 2
  • Data: Alzheimer patients
  • Decline: Whether the subject's cognitive
    capabilities deteriorate or not
  • Age: Subject's age
  • Gender: Subject's gender
  • MMS: Mini Mental Score
  • PDS: Psychometric deterioration scale
  • HDT: Depression scale

         Age      Gender   MMS      PDS      HDT
Age      1.0000   0.0413  -0.2120   0.3327   0.9679
Gender            1.0000  -0.1074   0.2020  -0.1839
MMS                        1.0000   0.3784  -0.1839
PDS                                 1.0000   0.0110
HDT                                          1.0000
Correlation matrix with alpha = 0.05

26
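  • Potential confounding factors: Age, Gender
  • Model for $\pi$ (probability of declining)
  • Logistic regression:
    $\ln\frac{\pi}{1 - \pi} = \beta_0 + \beta_1 \mathrm{Age} + \beta_2 \mathrm{Gender} + \beta_3 \mathrm{MMS} + \beta_4 \mathrm{PDS} + \beta_5 \mathrm{HDT}$
  • Residual logistic regression
  • Initial model:
    $\ln\frac{\pi}{1 - \pi} = \alpha_0 + \alpha_1 \mathrm{Age} + \alpha_2 \mathrm{Gender}$
  • Method 1:
    $\ln\frac{\pi}{1 - \pi} = T + \beta_0 + \beta_1 \mathrm{MMS} + \beta_2 \mathrm{PDS} + \beta_3 \mathrm{HDT}$,
    with $T = \hat{\alpha}_0 + \hat{\alpha}_1 \mathrm{Age} + \hat{\alpha}_2 \mathrm{Gender}$
  • Method 2:
    $Z = \beta_0 + \beta_1 \mathrm{MMS} + \beta_2 \mathrm{PDS} + \beta_3 \mathrm{HDT} + \varepsilon$,
    with $Z$ the Pearson residual of the initial model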

27
Results

Variable   Logistic regression             RLR Method 1
           Odds ratio  P-value  SE         Odds ratio  P-value  SE
mms        0.717       0.023    0.1451     0.720       0.023    0.1443
pds        1.691       0.001    0.1629     1.674       0.001    0.1565
hdt        1.018       0.643    0.0380     1.018       0.644    0.0377

RLR Method 2
Variable   P-value  SE
mms        <0.001   0.0915
pds        <0.001   0.0935
hdt        0.061    0.0273

Confounding factors
Variable   P-value (logistic regression)  P-value (initial model)
Age        0.004                          0.000
Gender     0.935                          0.551

28
7. Discussion
  • The usual logistic regression is not designed to
    control for confounding factors, and there is a
    risk of multicollinearity.
  • Method 1 is designed to control for confounding
    factors. However, in the given examples Method 1
    yields results similar to those of the usual
    logistic regression approach.
  • Method 2 appears to be more accurate, with some
    standard errors (SE) markedly reduced and the
    p-values of some regressors correspondingly
    smaller. However, unlike Method 1, it does not
    yield odds ratios.

29
8. Future Work
  • We will further examine the assumptions behind
    Method 2 to understand why it sometimes yields
    more significant results.
  • We will also study residual longitudinal data
    analysis, including survival analysis, where
    one or more time-dependent variables will be
    taken into account.

30
Selected References
  • Menard, S. Applied Logistic Regression Analysis.
    Series: Quantitative Applications in the Social
    Sciences. Sage University Series.
  • Lemeshow, S., Teres, D., Avrunin, J.S. and
    Pastides, H. Predicting the Outcome of Intensive
    Care Unit Patients. Journal of the American
    Statistical Association 83, 348-356.
  • Hosmer, D.W., Jovanovic, B. and Lemeshow, S. Best
    Subsets Logistic Regression. Biometrics 45,
    1265-1270. 1989.
  • Pregibon, D. Logistic Regression Diagnostics. The
    Annals of Statistics 9(4), 705-724. 1981.

31
  • Questions?