Short Course in Biostatistics 2005, Lesson 6: Introduction to Regression

Transcript and Presenter's Notes

1
Short Course in Biostatistics 2005, Lesson 6:
Introduction to Regression
  • Larry Magder
  • April 15, 2005

2
Outline
  • Brief review of parametric statistical methods
  • Two purposes of regression models
  • Modeling the effect of a quantitative predictor.
  • Estimating the effect of one variable,
    controlling for other variables.

3
Review of Parametric Statistical Methods
  • Many scientific questions can be viewed as
    questions about the unknown probability
    distribution of random variables.
  • These distributions can be characterized by a
    small number of parameters.
  • Statistical Methods
  • 1. Summarize what the data say about the
    possible values of the parameters.
  • 2. Quantify the evidence in the data with
    respect to hypotheses about the parameters.

4
Regression Models, Purpose 1: Modeling the effect
of a quantitative predictor
  • Suppose you want to know whether a new drug
    increases blood pressure. Then you might focus
    on two parameters:
  • μ1 = mean BP change in those given the drug
  • μ0 = mean BP change in those not given the drug
  • The null hypothesis would be H0: μ0 = μ1
  • Given data, we can estimate the μs and assess the
    evidence with respect to the null hypothesis
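The two-group comparison above can be sketched numerically. This is a minimal illustration with invented BP-change values (not from the presentation): estimate the two group means, take their difference as the drug effect, and compute a pooled two-sample t statistic for H0: μ0 = μ1.

```python
# Minimal sketch of the two-group setup; all data values are made up.
from statistics import mean, stdev
from math import sqrt

drug = [4.2, 6.1, 5.5, 7.0, 3.8, 5.9]      # BP change, drug group (hypothetical)
no_drug = [0.5, 1.2, -0.3, 0.8, 1.5, 0.1]  # BP change, no-drug group (hypothetical)

mu1_hat, mu0_hat = mean(drug), mean(no_drug)  # estimates of mu1 and mu0
effect = mu1_hat - mu0_hat                    # estimated drug effect

# Pooled-variance two-sample t statistic for H0: mu0 = mu1
n1, n0 = len(drug), len(no_drug)
sp2 = ((n1 - 1) * stdev(drug) ** 2 + (n0 - 1) * stdev(no_drug) ** 2) / (n1 + n0 - 2)
t = effect / sqrt(sp2 * (1 / n1 + 1 / n0))
print(f"effect = {effect:.2f}, t = {t:.2f}")
```

A large t relative to the t-distribution with n1 + n0 - 2 degrees of freedom would be evidence against the null hypothesis.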

5
Modeling effect of quantitative predictor
  • What if your predictor is a quantitative
    variable?
  • e.g., we might be interested in the effect of
    different doses of a drug.
  • For simplicity, assume there are three doses.
  • One approach: assume
  • μ1 = mean BP change in those given dose 1
  • μ2 = mean BP change in those given dose 2
  • μ3 = mean BP change in those given dose 3

6
Modeling effect of quantitative predictor
  • Alternatively, we might assume
  • μx = β0 + β1x, where x is the dose
  • Note that this expression implies that
  • μ0 = β0
  • μ1 = β0 + β1
  • μ2 = β0 + 2β1
  • μ3 = β0 + 3β1
  • That is, each unit increase in dose raises the
    mean blood pressure change by an additional β1
    units
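The linear dose model can be sketched directly. The coefficients below are invented for illustration; the point is that each one-unit step in dose changes the implied mean by exactly β1.

```python
# Sketch of the linear dose model mu_x = beta0 + beta1 * x.
beta0, beta1 = 2.0, 1.5  # hypothetical intercept and per-unit-dose effect

def mean_bp_change(x):
    """Mean BP change implied by the model at dose x."""
    return beta0 + beta1 * x

means = {x: mean_bp_change(x) for x in range(4)}  # doses 0 through 3
print(means)

# Each one-unit increase in dose adds exactly beta1 to the mean:
assert all(mean_bp_change(x + 1) - mean_bp_change(x) == beta1 for x in range(3))
```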

7
Modeling effect of quantitative predictor
  • Graphically, this model is a straight line of
    mean response plotted against dose (figure not
    reproduced in this transcript)

8
Modeling effect of quantitative predictor
  • Advantages of this model
  • It is more parsimonious than assuming a separate
    mean at each level. (Effect of treatment is
    summarized by one parameter).
  • This model is not confined to situations with
    only a small number of doses, as in this
    example.
  • Disadvantages of this model
  • It assumes a linear relationship between dose and
    effect. (Smoothing Assumption)

9
Regression Models, Purpose 2: Parameterizing
the effect of one variable, controlling for other
variables
  • Motivating example: Suppose you are interested
    in the effect of moderate alcohol use during
    pregnancy on the mean birth weight of infants.
  • Results (table not reproduced in this transcript)

10
Parameterizing the effect of one variable,
controlling for other variables
  • We might want to control for smoking.
  • Let's redo the analysis, stratifying by smoking
    status.
  • What would be a good estimate of the effect of
    moderate drinking on birth weight, controlling
    for smoking?

11
Parameterizing the effect of one variable,
controlling for other variables
  • The effect of drinking controlling for smoking
    might be viewed as the effect of drinking after
    holding smoking constant.
  • In these data, the effect of drinking, controlling
    for smoking, appears to be 50 gm in one stratum
    and 39 gm in the other stratum.
  • If we are willing to assume the effect is the
    same in both strata (no effect modification),
    then we might take an average of 50 and 39.
  • This average can be called the effect of
    drinking on birth weight, controlling for smoking.

12
Parameterizing the effect of one variable,
controlling for other variables
  • One way to model these data and incorporate the
    assumption of no effect modification is to
    parameterize them as a regression model.

13
Parameterizing the effect of one variable,
controlling for other variables
  • Let
  • X1 = 1 if mother drank, 0 otherwise
  • X2 = 1 if mother smoked, 0 otherwise
  • Then, assume E(Y) = β0 + β1X1 + β2X2
  • Note this implies:
  • For Smokers and Drinkers, E(Y) = β0 + β1 + β2
  • For Smokers and Non-Drinkers, E(Y) = β0 + β2
  • For Non-Smokers and Drinkers, E(Y) = β0 + β1
  • For Non-Smokers/Non-Drinkers, E(Y) = β0
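The four implied group means can be computed directly from the model. The coefficients below are invented (birth weight in grams); the sketch shows that the drinking effect is the same β1 in both smoking strata, which is exactly the no-effect-modification assumption.

```python
# Sketch of E(Y) = beta0 + beta1*X1 + beta2*X2 with hypothetical coefficients.
beta0, beta1, beta2 = 3400.0, -45.0, -200.0  # invented values, grams

def expected_bw(drank, smoked):
    """Expected birth weight for the given drinking/smoking indicators."""
    return beta0 + beta1 * drank + beta2 * smoked

# Drinking effect within each smoking stratum (by subtraction):
effect_smokers = expected_bw(1, 1) - expected_bw(0, 1)
effect_nonsmokers = expected_bw(1, 0) - expected_bw(0, 0)
print(effect_smokers, effect_nonsmokers)  # both equal beta1
```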

14
Parameterizing the effect of one variable,
controlling for other variables
  • By subtracting, it can be seen that the effect of
    drinking in each stratum defined by smoking is β1!
  • Therefore, β1 is the effect of drinking,
    controlling for smoking.
  • The linear equation is a simple way to
    parameterize this effect of interest.
  • The assumption that the effect of drinking is the
    same among smokers and non-smokers is another
    strong smoothing assumption.

15
General Multiple Regression Model
  • More generally, we often fit regression models
    with many independent variables, i.e.
  • E(Y) = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk
  • The predictors can be dummy variables or
    continuous variables.
  • Generally we also assume that the variance of Y
    is the same at all levels of the predictor
    variables.
  • Generally we also assume that the distribution of
    Y is normal at all levels of the predictor
    variables (not essential unless departure from
    normality is extreme).
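Fitting such a model is ordinary least squares on a design matrix. A minimal numpy sketch, using invented noiseless data so the fit recovers the coefficients exactly:

```python
# Least-squares sketch of E(Y) = beta0 + beta1*X1 + beta2*X2
# (data and true coefficients are invented for illustration).
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)          # a continuous predictor
X2 = rng.integers(0, 2, size=n)  # a dummy predictor
y = 1.0 + 2.0 * X1 - 0.5 * X2    # noiseless outcome, so the fit is exact

X = np.column_stack([np.ones(n), X1, X2])  # design matrix with intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # recovers [1.0, 2.0, -0.5]
```

With real (noisy) data the estimates would only approximate the true βs, and standard software would also report their standard errors.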

16
General Multiple Regression Model
  • In these models, it can be shown that the βs are
    interpretable as the effect of the corresponding
    predictor, controlling for all other variables in
    the model.
  • All ANOVA models (discussed last week) can be
    parameterized as linear regression models.

17
Smoothing Assumptions of the General Multiple
Regression Model
  • It is important to realize that Multiple
    Regression Models make smoothing assumptions of
    two types.
  • 1. The model assumes that the relationship
    between E(Y) and quantitative predictors
    is linear.
  • 2. The model assumes that the effects are
    simply additive (i.e., the effect of one variable
    is the same at all levels of the other variables)
  • (No effect modification with respect to mean
    difference)

18
Advantages and Disadvantages of Regression
Approach
  • Advantage
  • The smoothing assumptions make it possible to
    represent important research questions with a
    small number of key parameters, and to
    simultaneously consider many variables.
  • Disadvantage
  • The smoothing assumptions impose strong
    simplifying assumptions on the true relationships.

19
Relaxing the Smoothing Assumptions
  • Each type of smoothing assumption can be relaxed
    by adding additional complexity to the regression
    model.
  • Relaxing the assumption of linearity: add
    quadratic terms, piecewise linear terms, or other
    creative approaches.
  • Example:
  • μx = β0 + β1x + β2x², where x is the dose

20
Relaxing the Smoothing Assumptions
  • Relaxing the assumption of no effect
    modification:
  • Let X1 = 1 if mother drank, 0 otherwise
  • X2 = 1 if mother smoked, 0 otherwise
  • Then, assume E(Y) = β0 + β1X1 + β2X2 + β3X1X2
  • Note this implies:
  • For Smokers and Drinkers, E(Y) = β0 + β1 + β2 + β3
  • For Smokers and Non-Drinkers, E(Y) = β0 + β2
  • For Non-Smokers and Drinkers, E(Y) = β0 + β1
  • For Non-Smokers/Non-Drinkers, E(Y) = β0
  • Which implies:
  • Effect of drinking among smokers = β1 + β3
  • Effect of drinking among non-smokers = β1
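The interaction term can be sketched the same way as the additive model. The coefficients below are invented, chosen so that the two stratum-specific drinking effects mirror the 50 gm and 39 gm figures from the earlier example:

```python
# Interaction model E(Y) = beta0 + beta1*X1 + beta2*X2 + beta3*X1*X2
# with hypothetical coefficients (birth weight in grams).
beta0, beta1, beta2, beta3 = 3400.0, -39.0, -200.0, -11.0

def expected_bw(drank, smoked):
    """Expected birth weight with a drinking-by-smoking interaction."""
    return beta0 + beta1 * drank + beta2 * smoked + beta3 * drank * smoked

effect_among_smokers = expected_bw(1, 1) - expected_bw(0, 1)     # beta1 + beta3
effect_among_nonsmokers = expected_bw(1, 0) - expected_bw(0, 0)  # beta1
print(effect_among_smokers, effect_among_nonsmokers)
```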

21
Estimation, Confidence Intervals, P-values
  • How do we estimate the terms in the model?
  • Need to collect independent realizations of Y and
    the associated predictors X1, X2, ...

22
Estimation, Confidence Intervals, P-values
  • Note that any candidate set of estimates b0, b1,
    ..., bk yields a predicted value for Y, namely
  • Ŷ = b0 + b1X1 + b2X2 + ... + bkXk
  • Choose the b's that result in predicted values of
    Y that are closest to the observed values, on
    average.
  • More specifically, we minimize the sum of the
    squared distances ("Least Squares Regression").
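For a single predictor the least-squares solution has a simple closed form. A sketch with made-up data, checking that the fitted line really does minimize the sum of squared distances:

```python
# One-predictor least squares: closed-form slope and intercept (data invented).
xs = [0, 1, 2, 3]
ys = [2.1, 3.4, 5.2, 6.3]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar

# Sum of squared errors at the least-squares solution:
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
# Perturbing the slope can only increase the sum of squared errors:
assert sum((y - (b0 + (b1 + 0.1) * x)) ** 2 for x, y in zip(xs, ys)) > sse
print(b0, b1)
```

With several predictors the same minimization is done in matrix form, which is why software is needed in practice.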

23
Estimation, Confidence Intervals, P-values
  • How is this done? By the computer!
  • (It could take years to fit a single multiple
    regression model by hand).
  • The computer will also provide confidence
    intervals and p-values based on t-distributions.
  • These inferences are approximate in general, but
    they are exact if the underlying data are
    normally distributed.

24
Example
  • Head growth and neurodevelopment of infants born
    to HIV-1-infected drug-using women
  • Macmillan C, Magder LS, et al.,
    Neurology, 2001
  • Using data from the WITS cohort study of
    HIV-1-infected women and their children, we fit
    regression models to estimate the effects of
    HIV-1 infection and in utero hard drug exposure
    on the Bayley Scales of Infant Development,
    controlling for relevant confounders.

25
Example (continued)
  • Data
  • 1,094 infants born to HIV-1-positive mothers
  • 147 (13%) of the infants became HIV-1-positive
  • 383 (35%) were exposed in utero to opiates or
    cocaine (drug-positive).
  • Bayley scores were measured repeatedly during the
    first 2 years of life
  • (Note that repeated measures on the same
    children violate the assumption of
    independence, so we had to use somewhat more
    complex methods).

26
Example (continued)
  • Box plots of Bayley Motor Scores by HIV status
    and in utero drug exposure at 4, 12, and 24
    months of age. (A score of 100 is normal; plots
    not reproduced in this transcript)
27
Example (continued)
  • Regression Model for 4-month Bayley Score:
  • E(Bayley Score) = β0 + β1·HIV/DRUG
    + β2·HIV/NoDRUG + β3·DRUGnoHIV + β4·AZT
    + β5·SMOKE + β6·ALCOHOL + ...

28
Example (continued): Results (table not reproduced in this transcript)
29
Example (continued): Fitted Model
Expected Bayley Motor Standardized Scores by Age
in 4 groups: HIV-/Drug-, HIV-/Drug+, HIV+/Drug-,
and HIV+/Drug+ (figure not reproduced in this
transcript)
30
Extension to other types of random variables
  • The regression models described above were used to
    model a quantitative outcome.
  • They can be extended to model binary outcomes
    simply by replacing E(Y) on the left-hand side by
    P(Y=1).
  • For example, we might assume that
  • Prob(Y=1) = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk
  • This model assumes that an increase in X by one
    unit results in an additive change in the
    probability.
  • The βs are interpretable as "Risk Differences".
  • The model assumes a smooth relationship between
    quantitative predictors and the probability of the
    binary outcome.
  • The βs are interpretable as the effect of the
    corresponding predictor on the probability of the
    outcome, controlling for all other variables in
    the model.

31
Extension to other types of random variables
  • A second approach would be the following:
  • Prob(Y=1) = β0 · β1^X1 · β2^X2 · β3^X3 · ... · βk^Xk
  • This model assumes that an increase in X by one
    unit results in a multiplicative change in the
    probability.
  • The βs are interpretable as "Risk Ratios".
  • The model assumes a smooth relationship between
    quantitative predictors and the probability of the
    binary outcome.
  • The βs are interpretable as the effect of the
    corresponding predictor on the probability of the
    outcome, controlling for all other variables in
    the model.

32
Logistic Regression
  • However, most binary regression analyses are
    based on the odds ratio. The Logistic Regression
    Model is as follows:
  • Odds(Y=1) = β0 · β1^X1 · β2^X2 · β3^X3 · ... · βk^Xk
  • This model assumes that an increase in X by one
    unit results in a multiplicative change in the
    odds.
  • The βs are interpretable as "Odds Ratios".
  • The model assumes a smooth relationship between
    quantitative predictors and the odds of the
    outcome.
  • The βs are interpretable as the effect of the
    corresponding predictor on the odds of the
    outcome, controlling for all other variables in
    the model.
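The odds-scale model can be sketched with invented coefficients, where β1 plays the role of an odds ratio. The sketch also checks the equivalent log form, log-odds = log(β0) + log(β1)·X1, which is the usual way logistic regression is written.

```python
# Sketch of the multiplicative odds model Odds(Y=1) = beta0 * beta1**X1
# (coefficients invented; beta1 is a hypothetical odds ratio).
import math

beta0, beta1 = 0.25, 2.0  # baseline odds and odds ratio per unit of X1

def prob(x1):
    """Probability implied by the odds model at X1 = x1."""
    odds = beta0 * beta1 ** x1  # each unit of X1 multiplies the odds by beta1
    return odds / (1 + odds)    # convert odds back to a probability

# The multiplicative odds model is linear on the log-odds scale:
for x1 in range(3):
    odds = beta0 * beta1 ** x1
    assert math.isclose(math.log(odds), math.log(beta0) + math.log(beta1) * x1)

print(prob(0), prob(1))  # approximately 0.2 and 0.333
```

Note that unlike the additive risk model, the implied probability always stays between 0 and 1, which is one practical reason the logistic form is preferred.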