logistic regression - PowerPoint PPT Presentation

Title: logistic regression. Author: Department of Community Health. Last modified by: FMHS IT. Created: 10/7/1998 9:16:33 PM. Document presentation format: PowerPoint PPT presentation

Slides: 27
1
POPLHLTH 304
Regression (modelling) in Epidemiology
Simon Thornley (Slides adapted from Assoc. Prof. Roger Marshall)
2
Which method does not control for confounding?
  1. Stratification
  2. Exclusion criteria
  3. Regression modelling
  4. Objective assessment of outcomes

3
Observational epidemiology
Studies in epidemiology are usually observational.
Myriad factors determine the occurrence of disease.
Trying to separate the effects of specific factors from those of others (confounding variables) is often difficult.
Regression models (as an alternative to stratification) are useful.
4
Stratification difficulty
  • Often too many confounders: stratifying leads to
    too many strata, e.g. 4 categories of age, 2 of
    sex and 4 of ethnicity give 4 x 2 x 4 = 32 strata.
    Empty cells are problematic
  • Need better, more statistically efficient, way to
    deal with problem
  • Want to control for (many) possible confounding
    variables while eliciting effect (relative risk
    or odds ratio) of an exposure of interest
  • Building statistical models is one solution
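
The strata count above can be checked with a short sketch. The category labels and the sample size of 200 are assumptions made purely for illustration; only the counts 4, 2 and 4 come from the slide.

```python
from itertools import product

# Labels are illustrative assumptions; only the counts (4, 2, 4) are from the slide.
age_groups = ["<40", "40-54", "55-69", "70+"]
sexes = ["female", "male"]
ethnic_groups = ["A", "B", "C", "D"]

# Every combination of the three confounders is one stratum.
strata = list(product(age_groups, sexes, ethnic_groups))
n_strata = len(strata)  # 4 * 2 * 4


def average_per_stratum(n_subjects, n_strata):
    """Average number of subjects available per stratum."""
    return n_subjects / n_strata


print(n_strata)                            # 32
print(average_per_stratum(200, n_strata))  # 6.25 (hypothetical study of 200 people)
```

With only a handful of subjects per stratum on average, some strata are likely to be empty, which is the problem the slide describes.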

5
What is a statistical regression model?
Usually regarded as a formula that relates an outcome Y to one or more predictors (exposures) X1, X2, ... of Y.
The formula imposes a framework that we assume reflects how Y is related to X1, X2, ... in the real world.
The model is specified in terms of unknown parameters, which are estimated from data (model fitting).
6
Linear (regression) model
Often may consider that Y increases with X, e.g. blood pressure increases with age.
May also consider that it does so linearly.
Data seem to support this, though with much variability.
7
Straight line model (simple linear regression model) for how Y depends on X
[Figure: plot of Y against X with a fitted straight line]
This is the model structure, or framework.
Fitting to data involves drawing a good-fitting line through the points; the line gives the mean Y for a given X, E(Y|X).
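
As a rough sketch of what "drawing a good-fitting line" can mean, the code below fits a straight line by ordinary least squares. Least squares is one common choice and is an assumption here, since the slide does not name a fitting method. The age and blood pressure values are made up for illustration and are not from the lecture.

```python
def fit_line(xs, ys):
    """Least-squares straight line through (x, y) points: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative data (NOT from the slides): age in years, systolic BP in mmHg.
ages = [30, 40, 50, 60, 70]
bps = [118, 125, 129, 138, 140]

b0, b1 = fit_line(ages, bps)


def mean_y_given_x(x):
    """Fitted mean of Y for a given X, i.e. the estimate of E(Y|X)."""
    return b0 + b1 * x


print(round(b0, 2), round(b1, 2))
print(round(mean_y_given_x(55), 2))
```

The fitted line does not pass through every point; it gives the estimated average Y at each X.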
8
Regression
Relationship between X and Y: Y depends on X (rather than X depends on Y).
Y is the dependent (outcome, disease) variable.
X is the independent (exposure, predictor, covariate) variable.
9
Consider 2 potential predictors of Y, say X1 and X2.
Can plot the data as scatter points in a 3-dimensional space.
[Figure: 3-D scatter plot with axes Y, X1 and X2]
10
Analog of a line in 2-D is a plane in 3-D
[Figure: plane in 3-D space with axes Y, X1 and X2]

11
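Straight line model with just X1 is
$E(Y) = b_0 + b_1 X_1$
Extending this to a plane is
$E(Y) = b_0 + b_1 X_1 + b_2 X_2$
Or further
$E(Y) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots$
Here E(Y) means average Y given X1, X2, ...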
12
Binary Y in epidemiology
In epi, Y is often a binary disease/no disease outcome.
X1, X2 etc. are risk factors for the disease, one of which may be an exposure of interest, the others confounders.
13
Logistic model: need to modify to account for
binary Y, occurrence of disease D
  • Again, information on X1, X2, ... is collapsed into a risk
    score Q
  • The relationship between the probability of disease and Q
    now follows the logistic formula

14
Logistic regression formula is of the form
Probability of CHD = $e^{Q}/(1 + e^{Q})$
where Q is a weighted sum of risk factors (a linear score). For example
$Q = -5.32 + 1.09\,\text{SMOKE} + 0.41\,\text{SEX}$
where SEX = 1 if man, 0 if woman, and SMOKE = 1 if smokes, 0 if not.
The values -5.32, 1.09, 0.41 are estimated from the data and are the beta-coefficients.
15
The model gives a probability for each of the 4 combinations.
Smoking man: $Q = -5.32 + 1.09 \times 1 + 0.41 \times 1 = -3.82$, Prob $= e^{-3.82}/(1 + e^{-3.82}) = 0.0214$
Nonsmoking man: $Q = -5.32 + 1.09 \times 0 + 0.41 \times 1 = -4.91$, Prob = 0.00732
Nonsmoking woman: $Q = -5.32$, Prob = 0.00486
Smoking woman: $Q = -5.32 + 1.09 = -4.23$, Prob = 0.01434
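
The four probabilities can be reproduced with a few lines of code. This is a minimal sketch using the coefficients from the worked example (-5.32, 1.09, 0.41); the function and variable names are chosen here for illustration and do not come from the slides.

```python
import math

# Coefficients as used in the worked example on this slide.
INTERCEPT, B_SMOKE, B_SEX = -5.32, 1.09, 0.41


def logistic(q):
    """Logistic formula e^Q / (1 + e^Q)."""
    return math.exp(q) / (1.0 + math.exp(q))


def risk(smoke, sex):
    """Probability of CHD for SMOKE (1 = smoker) and SEX (1 = man)."""
    q = INTERCEPT + B_SMOKE * smoke + B_SEX * sex
    return logistic(q)


for label, smoke, sex in [
    ("Smoking man", 1, 1),
    ("Nonsmoking man", 0, 1),
    ("Nonsmoking woman", 0, 0),
    ("Smoking woman", 1, 0),
]:
    print(f"{label}: {risk(smoke, sex):.5f}")
```

The printed values agree with the slide's figures to within rounding in the last digit.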
16
Relative risk estimates
RR for smoking (in men) is 0.0214/0.00732 = 2.92
RR for smoking (in women) is 0.01434/0.00486 = 2.95
Notice these are also approximately $e^{1.09} = 2.97$, i.e. taking the exponential of a variable's beta-coefficient estimates its RR (actually $e^{1.09} = 2.97$ is the disease odds ratio, but this is approximately equal to the RR when disease is rare).
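
To see how close the odds ratio is to the relative risk here, the sketch below computes both from the model probabilities. It assumes the same coefficients as the worked example; the helper names are illustrative.

```python
import math

INTERCEPT, B_SMOKE, B_SEX = -5.32, 1.09, 0.41


def logistic(q):
    """Logistic formula e^Q / (1 + e^Q)."""
    return math.exp(q) / (1.0 + math.exp(q))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Model probabilities for men (SEX = 1), smokers versus nonsmokers.
p_smoker = logistic(INTERCEPT + B_SMOKE + B_SEX)
p_nonsmoker = logistic(INTERCEPT + B_SEX)

relative_risk = p_smoker / p_nonsmoker           # ratio of probabilities
odds_ratio = odds(p_smoker) / odds(p_nonsmoker)  # ratio of odds

print(f"RR = {relative_risk:.2f}")
print(f"OR = {odds_ratio:.2f}")
print(f"exp(1.09) = {math.exp(B_SMOKE):.2f}")
```

The odds ratio equals the exponential of the smoking coefficient exactly (up to floating-point error), while the relative risk is slightly smaller; the two are close because the disease probabilities are small.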
17
Why logistic formula?
Ans: P(D) is always between 0 and 1 whatever the value of Q, i.e. it behaves like a probability should.
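
A quick numerical check of this property, as a sketch: the logistic formula is evaluated over a range of Q values chosen arbitrarily for illustration. The two-branch form is an implementation choice made here to avoid overflow of e^Q for large positive Q; it is algebraically the same formula.

```python
import math


def logistic(q):
    """e^Q / (1 + e^Q), written to avoid overflow for large positive Q."""
    if q >= 0:
        # Same formula with numerator and denominator divided by e^Q.
        return 1.0 / (1.0 + math.exp(-q))
    e = math.exp(q)
    return e / (1.0 + e)


for q in (-10, -5, 0, 5, 10):
    print(q, logistic(q))

# Every value over this range lies strictly between 0 and 1.
print(all(0.0 < logistic(q) < 1.0 for q in range(-30, 31)))
```

Very negative Q gives probabilities near 0, Q = 0 gives 0.5, and very positive Q gives probabilities near 1.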
18
Can include as many variables in Q as we like
$Q = -5.45 + 1.23\,\text{SMOKE} + 0.31\,\text{SEX} + 0.124\,\text{AGE} - 0.2\,\text{ETHNIC}$
but the model may be too ambitious, i.e. can a single model really be expected to account accurately for the effects of numerous variables?
19
Logistic model in epidemiology: controlling for confounding
Y is occurrence of disease in a cohort study
X1 is the binary exposure of interest
X2, X3, ... are confounding risk factors
b1, the coefficient of X1 in the linear score Q, is the effect of X1 controlling for the effects of X2, X3 etc.
20
Relative risk
In fact $e^{b_1}$ is approximately the relative risk of X1 (assumed to be the same for all values of X2, X3, ...), i.e. no effect modification/interaction.
RR is (assumed) the same whatever X2, X3, etc.
e.g. if X1 is smoking, X2 is age and X3 is alcohol, the RR for smoking is the same whatever the age and alcohol.
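
The "same whatever X2, X3" assumption can be illustrated with a small sketch. All coefficients below are hypothetical values invented for illustration, not estimates from any data. The optional product term X1*X3 is one conventional way to represent effect modification and is an assumption here.

```python
import math

# Hypothetical coefficients (illustration only): intercept, smoking, age, alcohol.
B0, B1, B2, B3 = -6.0, 1.1, 0.05, 0.3


def log_odds(x1, x2, x3, b_interaction=0.0):
    """Linear score Q; b_interaction multiplies an X1*X3 product term."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x3 + b_interaction * x1 * x3


def odds_ratio_x1(x2, x3, b_interaction=0.0):
    """Odds ratio for X1 = 1 versus X1 = 0 at fixed X2 and X3."""
    q_exposed = log_odds(1, x2, x3, b_interaction)
    q_unexposed = log_odds(0, x2, x3, b_interaction)
    return math.exp(q_exposed - q_unexposed)


# Without an interaction term the odds ratio is exp(B1) at any age or alcohol level.
print(odds_ratio_x1(40, 0), odds_ratio_x1(70, 1), math.exp(B1))

# With a (hypothetical) interaction term the odds ratio depends on X3.
print(odds_ratio_x1(40, 0, b_interaction=0.5), odds_ratio_x1(40, 1, b_interaction=0.5))
```

In the model without the product term, the constancy of the odds ratio is built in by the model's form; it is an assumption, not something the data have demonstrated.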
21
Case-control studies
Development is for cohort studies (since probabilities of disease P(D) are estimable in a cohort study) ...
... but can use for case-control studies too (even though probabilities of disease are not estimable).
Can still use $e^{b_1}$ as the RR estimate.
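
One way to see why the odds ratio survives case-control sampling is the sketch below, which works with expected counts. The cohort sizes and sampling fractions are hypothetical; the two risks are borrowed from the smoking and nonsmoking women in the earlier worked example.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases."""
    return (a * d) / (b * c)


# Hypothetical cohort sizes; risks taken from the earlier worked example (women).
n_exposed, n_unexposed = 100_000, 100_000
risk_exposed, risk_unexposed = 0.01434, 0.00486

# Expected counts in the full cohort.
a = n_exposed * risk_exposed
b = n_exposed * (1 - risk_exposed)
c = n_unexposed * risk_unexposed
d = n_unexposed * (1 - risk_unexposed)

# Case-control sampling (assumed fractions): all cases, 2% of non-cases,
# sampled regardless of exposure.
f_case, f_control = 1.0, 0.02
a_s, b_s = a * f_case, b * f_control
c_s, d_s = c * f_case, d * f_control

or_cohort = odds_ratio(a, b, c, d)
or_sample = odds_ratio(a_s, b_s, c_s, d_s)

# The proportion diseased among the exposed is distorted by the sampling.
apparent_risk_exposed = a_s / (a_s + b_s)

print(round(or_cohort, 3), round(or_sample, 3))
print(risk_exposed, round(apparent_risk_exposed, 3))
```

The sampling fractions cancel in the odds ratio, so it is unchanged, whereas the apparent "risk" in the sample bears no relation to the true probability of disease.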
22
Logistic modelling advantages
  • Can adjust for many confounders at once
  • Beta coefficients give odds ratio estimates of relative risk, valid if disease is rare
  • Deals with interactions (effect modification) if necessary
  • Easy to do on computer
  • Gives confidence intervals, P-values etc.
  • Can apply to case-control data
23
Disadvantages
  • Model is just a model, not necessarily reality
  • Black box approach, can lose touch with data
  • Requires decisions: what variables in model? How to code variables? Continuous or dichotomised?
  • ORs not valid as RR for non-rare disease (in cohort)
24
Logistic regression is favoured in epidemiology
because
  1. It can be used to adjust for many confounders at
    once
  2. It enhances statistical power over stratification
  3. It results in an outcome that is constrained
    between one and zero (the domain of a
    probability).

25
How do you estimate an odds ratio from a logistic
model?
  1. It is equal to the beta coefficient
  2. It is equal to the exponential of the beta
    coefficient
  3. It is equal to the logit of the sum of the
    product of the variables and the beta
    coefficients.

26
Which one of the following statements is true?
  1. The choice of independent and dependent variable
    in regression modelling is unimportant
  2. A regression model estimates the average value of
    the dependent variable, given the values of a
    number of independent variables
  3. Independent variables are outcomes and dependent
    variables exposures.