Multiple Regression - PowerPoint PPT Presentation

Slides: 27
Provided by: larryw4
Transcript and Presenter's Notes

Title: Multiple Regression


1
Chapter 12
  • Multiple Regression

2
Multiple Regression
  • Numeric Response variable (y)
  • k Numeric predictor variables ($k < n$)
  • Model:
  • $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$
  • Partial Regression Coefficients: $\beta_i$ = effect (on
    the mean response) of increasing the ith
    predictor variable by 1 unit, holding all other
    predictors constant
  • Model Assumptions (involving error terms $\varepsilon$):
  • Normally distributed with mean 0
  • Constant variance $\sigma^2$
  • Independent (problematic when data are series in
    time/space)

3
Example - Effect of Birth weight on Body Size in
Early Adolescence
  • Response: Height at early adolescence ($n = 250$
    cases)
  • Predictors ($k = 6$ explanatory variables):
  • Adolescent age ($x_1$, in years, 11-14)
  • Tanner stage ($x_2$, units not given)
  • Gender ($x_3 = 1$ if male, 0 if female)
  • Gestational age ($x_4$, in weeks at birth)
  • Birth length ($x_5$, units not given)
  • Birthweight group ($x_6 = 1, \ldots, 6$: <1500g (1),
    1500-1999g (2), 2000-2499g (3), 2500-2999g (4),
    3000-3499g (5), >3500g (6))

Source: Falkner, et al. (2004)
4
Least Squares Estimation
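  • Population Model for mean response:
    $E(Y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
  • Least Squares Fitted (predicted) equation,
    minimizing SSE:
    $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$, where $SSE = \sum (Y - \hat{Y})^2$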
  • All statistical software packages/spreadsheets
    can compute least squares estimates and their
    standard errors
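
As a rough illustration of what such software computes, the sketch below solves the normal equations $(X'X)b = X'y$ in plain Python. The data, the noise-free response, and the helper names (`solve_linear_system`, `least_squares`, `sse`) are made up for illustration. Real packages use more numerically stable methods and also report standard errors, which this sketch omits.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(xs, ys):
    """Return [b0, b1, ..., bk] minimizing SSE; xs holds one predictor row per case."""
    design = [[1.0] + list(row) for row in xs]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def sse(xs, ys, coefs):
    """Sum of squared differences between observed and fitted responses."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in xs]
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))


# Hypothetical data: two predictors, response generated without noise
# from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefs = least_squares(xs, ys)
print([round(c, 6) for c in coefs], round(sse(xs, ys, coefs), 6))
```

Because the made-up response has no error term, the fitted coefficients match the generating ones and SSE is essentially zero; with real data SSE would be positive.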

5
Analysis of Variance
  • Direct extension to ANOVA based on simple linear
    regression
  • Only adjustments are to degrees of freedom
  • $DFR = k$, $DFE = n-(k+1)$

6
Testing for the Overall Model - F-test
  • Tests whether any of the explanatory variables
    are associated with the response
  • $H_0: \beta_1 = \cdots = \beta_k = 0$ (None of the xs associated with
    y)
  • $H_A$: Not all $\beta_i = 0$

7
Example - Effect of Birth weight on Body Size in
Early Adolescence
  • Authors did not print ANOVA, but did provide
    following
  • $n = 250$, $k = 6$, $R^2 = 0.26$
  • $H_0: \beta_1 = \cdots = \beta_6 = 0$  $H_A$: Not all $\beta_i = 0$
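
The test statistic for this slide did not survive in the transcript. One standard way to get the overall F statistic from the reported values is $F = \frac{R^2/k}{(1-R^2)/(n-(k+1))}$, which is algebraically the same as MSR/MSE. The sketch below applies it to the n, k, and $R^2$ reported on this slide; the function name `overall_f` is illustrative.

```python
def overall_f(r_squared, n, k):
    """Overall F statistic and its degrees of freedom from R^2, n, and k."""
    df_regression = k
    df_error = n - (k + 1)
    f_stat = (r_squared / df_regression) / ((1 - r_squared) / df_error)
    return f_stat, df_regression, df_error


# Values reported for the birth weight example: n = 250, k = 6, R^2 = 0.26.
f_stat, df1, df2 = overall_f(0.26, 250, 6)
print(round(f_stat, 2), df1, df2)  # 14.23 6 243
```

An F of about 14.2 on 6 and 243 degrees of freedom is far above the usual 5% critical value (roughly 2.1 to 2.2 for these degrees of freedom), so $H_0$ would be rejected.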

8
Testing Individual Partial Coefficients - t-tests
  • Wish to determine whether the response is
    associated with a single explanatory variable,
    after controlling for the others
  • $H_0: \beta_i = 0$  $H_A: \beta_i \neq 0$ (2-sided
    alternative)
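
The test statistic itself is missing from the transcript. The usual form is the estimate divided by its standard error, compared with a t-distribution on $n-(k+1)$ degrees of freedom. In the sketch below the estimate and standard error are hypothetical numbers, and only n and k are taken from the birth weight example.

```python
def t_statistic(estimate, std_error):
    """t statistic for H0: beta_i = 0."""
    return estimate / std_error


def error_degrees_of_freedom(n, k):
    """Degrees of freedom for the t-test: n - (k + 1)."""
    return n - (k + 1)


# Hypothetical estimate and standard error for one partial coefficient.
t_obs = t_statistic(1.5, 0.5)
df = error_degrees_of_freedom(250, 6)
print(t_obs, df)  # 3.0 243
```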

9
Example - Effect of Birth weight on Body Size in
Early Adolescence
Controlling for all other predictors, adolescent
age, Tanner stage, and Birth length are
associated with adolescent height measurement
10
Comparing Regression Models
  • Conflicting Goals: Explaining variation in Y
    while keeping model as simple as possible
    (parsimony)
  • We can test whether a subset of $k-g$ predictors
    (including possibly cross-product terms) can be
    dropped from a model that contains the remaining
    g predictors. $H_0: \beta_{g+1} = \cdots = \beta_k = 0$
  • Complete Model: Contains all k predictors
  • Reduced Model: Eliminates the predictors from $H_0$
  • Fit both models, obtaining sums of squares for
    each (or $R^2$ from each)
  • Complete: $SSR_c$, $SSE_c$ ($R_c^2$)
  • Reduced: $SSR_r$, $SSE_r$ ($R_r^2$)

11
Comparing Regression Models
  • $H_0: \beta_{g+1} = \cdots = \beta_k = 0$ (After removing the effects of
    $X_1, \ldots, X_g$, none of the other predictors are associated
    with Y)
  • $H_A$: $H_0$ is false

P-value based on F-distribution with $k-g$ and
$n-(k+1)$ d.f.
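
The test statistic is not preserved in the transcript. A standard form that matches the stated degrees of freedom is $F = \frac{(SSE_r - SSE_c)/(k-g)}{SSE_c/(n-(k+1))}$, or equivalently $\frac{(R_c^2 - R_r^2)/(k-g)}{(1-R_c^2)/(n-(k+1))}$. The sketch below computes both versions; all numbers are hypothetical and the function names are illustrative.

```python
def partial_f_from_sse(sse_reduced, sse_complete, n, k, g):
    """F statistic for dropping k-g predictors, from error sums of squares."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


def partial_f_from_r2(r2_complete, r2_reduced, n, k, g):
    """Same statistic written in terms of R^2 from each model."""
    numerator = (r2_complete - r2_reduced) / (k - g)
    denominator = (1 - r2_complete) / (n - (k + 1))
    return numerator / denominator


# Hypothetical values: 50 cases, complete model with 5 predictors,
# reduced model keeping 3 of them.
n, k, g = 50, 5, 3
sst = 200.0                   # total sum of squares (same for both models)
sse_r, sse_c = 120.0, 88.0    # error sums of squares, reduced and complete

f_sse = partial_f_from_sse(sse_r, sse_c, n, k, g)
f_r2 = partial_f_from_r2(1 - sse_c / sst, 1 - sse_r / sst, n, k, g)
print(f_sse, round(f_r2, 6))  # both versions agree: 8.0
```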
12
Models with Dummy Variables
  • Some models have both numeric and categorical
    explanatory variables (Recall gender in example)
  • If a categorical variable has m levels, need to
    create m-1 dummy variables that take on the
    values 1 if the level of interest is present, 0
    otherwise.
  • The baseline level of the categorical variable is
    the one for which all m-1 dummy variables are set
    to 0
  • The regression coefficient corresponding to a
    dummy variable is the difference between the mean
    for that level and the mean for baseline group,
    controlling for all numeric predictors
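
The m-1 coding rule above can be sketched as follows. The three-level variable and the function name `dummy_code` are hypothetical, chosen only to show the pattern of 1s and 0s.

```python
def dummy_code(values, levels, baseline):
    """Return the non-baseline levels and one row of m-1 indicators per case."""
    non_baseline = [level for level in levels if level != baseline]
    rows = [[1 if value == level else 0 for level in non_baseline]
            for value in values]
    return non_baseline, rows


# Hypothetical categorical variable with m = 3 levels; "A" is the baseline.
levels = ["A", "B", "C"]
columns, rows = dummy_code(["A", "C", "B", "A"], levels, baseline="A")
print(columns)  # ['B', 'C']
print(rows)     # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

Cases at the baseline level get 0 on every dummy variable, which is why each coefficient is read as a difference from the baseline group.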

13
Example - Deep Cervical Infections
  • Subjects - Patients with deep neck infections
  • Response (Y) - Length of Stay in hospital
  • Predictors (One numeric, 11 Dichotomous)
  • Age (x1)
  • Gender ($x_2 = 1$ if female, 0 if male)
  • Fever ($x_3 = 1$ if Body Temp > 38°C, 0 if not)
  • Neck swelling ($x_4 = 1$ if Present, 0 if absent)
  • Neck Pain ($x_5 = 1$ if Present, 0 if absent)
  • Trismus ($x_6 = 1$ if Present, 0 if absent)
  • Underlying Disease ($x_7 = 1$ if Present, 0 if absent)
  • Respiration Difficulty ($x_8 = 1$ if Present, 0 if
    absent)
  • Complication ($x_9 = 1$ if Present, 0 if absent)
  • WBC > 15000/mm³ ($x_{10} = 1$ if Present, 0 if absent)
  • CRP > 100mg/ml ($x_{11} = 1$ if Present, 0 if absent)

Source: Wang, et al. (2003)
14
Example - Weather and Spinal Patients
  • Subjects - Visitors to National Spinal Network in
    23 cities Completing SF-36 Form
  • Response - Physical Function subscale (1 of 10
    reported)
  • Predictors
  • Patient's age ($x_1$)
  • Gender ($x_2 = 1$ if female, 0 if male)
  • High temperature on day of visit ($x_3$)
  • Low temperature on day of visit ($x_4$)
  • Dew point ($x_5$)
  • Wet bulb ($x_6$)
  • Total precipitation ($x_7$)
  • Barometric Pressure ($x_8$)
  • Length of sunlight ($x_9$)
  • Moon Phase (new, wax crescent, 1st Qtr, wax
    gibbous, full moon, wan gibbous, last Qtr, wan
    crescent; presumably had 8-1 = 7 dummy variables)

Source: Glaser, et al. (2004)
15
Modeling Interactions
  • Statistical Interaction: When the effect of one
    predictor (on the response) depends on the level
    of other predictors.
  • Can be modeled (and thus tested) with
    cross-product terms (case of 2 predictors)
  • $E(Y) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$
  • $X_2 = 0 \Rightarrow E(Y) = \alpha + \beta_1 X_1$
  • $X_2 = 10 \Rightarrow E(Y) = \alpha + \beta_1 X_1 + 10\beta_2 + 10\beta_3 X_1$
    $= (\alpha + 10\beta_2) + (\beta_1 + 10\beta_3) X_1$
  • The effect of increasing $X_1$ by 1 on E(Y) depends
    on level of $X_2$, unless $\beta_3 = 0$ (t-test)
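
A small numeric check of the algebra on this slide: with a cross-product term, the slope in $X_1$ is $\beta_1 + \beta_3 X_2$. The coefficient values below are hypothetical, picked so the arithmetic is exact.

```python
def mean_response(x1, x2, alpha, b1, b2, b3):
    """E(Y) for the two-predictor model with a cross-product term."""
    return alpha + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1, b3):
    """Change in E(Y) per 1-unit increase in X1 at a fixed level of X2."""
    return b1 + b3 * x2


# Hypothetical coefficients.
alpha, b1, b2, b3 = 2.0, 1.5, 0.5, 0.25

for x2 in (0, 10):
    intercept = mean_response(0, x2, alpha, b1, b2, b3)
    change = mean_response(1, x2, alpha, b1, b2, b3) - intercept
    print(x2, intercept, change, slope_in_x1(x2, b1, b3))
# 0 2.0 1.5 1.5
# 10 7.0 4.0 4.0
```

At $X_2 = 10$ the intercept is $\alpha + 10\beta_2$ and the slope is $\beta_1 + 10\beta_3$, matching the rearranged equation; setting `b3 = 0` makes the slope the same at every level of $X_2$.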

16
Logistic Regression
  • Logistic Regression - Binary Response variable
    and numeric and/or categorical explanatory
    variable(s)
  • Goal: Model the probability of a particular
    outcome as a function of the predictor
    variable(s)
  • Problem: Probabilities are bounded between 0 and 1

17
Logistic Regression with 1 Predictor
  • Response - Presence/Absence of characteristic
  • Predictor - Numeric variable observed for each
    case
  • Model - $\pi(x)$ = Probability of presence at
    predictor level x
  • $\beta = 0 \Rightarrow$ P(Presence) is the same at each level
    of x
  • $\beta > 0 \Rightarrow$ P(Presence) increases as x increases
  • $\beta < 0 \Rightarrow$ P(Presence) decreases as x increases
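
The model equation on this slide was lost in extraction. Assuming the standard logistic form $\pi(x) = \frac{e^{\beta_0 + \beta x}}{1 + e^{\beta_0 + \beta x}}$, the sketch below shows the three cases in the bullets above. The intercept and slope values are hypothetical.

```python
import math


def logistic_probability(x, beta0, beta):
    """pi(x) under the assumed logistic form; always strictly between 0 and 1."""
    eta = beta0 + beta * x
    return math.exp(eta) / (1 + math.exp(eta))


xs = [0, 1, 2, 3]
flat = [logistic_probability(x, -1.0, 0.0) for x in xs]     # slope = 0
rising = [logistic_probability(x, -1.0, 0.8) for x in xs]   # slope > 0
falling = [logistic_probability(x, -1.0, -0.8) for x in xs]  # slope < 0

print([round(p, 3) for p in flat])
print([round(p, 3) for p in rising])
print([round(p, 3) for p in falling])
```

Whatever the sign of the slope, the output stays inside (0, 1), which is how this form handles the bounded-probability problem raised on the previous slide.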

18
Logistic Regression with 1 Predictor
  • b0, b1 are unknown parameters and must be
    estimated using statistical software such as
    SPSS, SAS, or STATA
  • Primary interest in estimating and testing
    hypotheses regarding b1
  • Large-Sample test (Wald Test) (Some software
    runs z-test)
  • $H_0: \beta_1 = 0$  $H_A: \beta_1 \neq 0$
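
The test statistic is missing from the transcript. In its usual form the Wald test takes $z = \hat{\beta}_1 / SE(\hat{\beta}_1)$ and either compares z with a standard normal or compares $z^2$ with a chi-square on 1 degree of freedom. The sketch below uses a hypothetical estimate and standard error, and computes the two-sided normal p-value with `math.erfc`.

```python
import math


def wald_test(estimate, std_error):
    """Return z, z squared, and the two-sided p-value under a standard normal."""
    z = estimate / std_error
    chi_square = z ** 2
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, chi_square, p_value


# Hypothetical estimate and standard error for beta_1.
z, chi_square, p_value = wald_test(0.75, 0.25)
print(z, chi_square, round(p_value, 4))  # 3.0 9.0 0.0027
```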

19
Example - Rizatriptan for Migraine
  • Response - Complete Pain Relief at 2 hours
    (Yes/No)
  • Predictor - Dose (mg): Placebo (0), 2.5, 5, 10

Source: Gijsmant, et al. (1997)
20
Example - Rizatriptan for Migraine (SPSS)
21
Odds Ratio
  • Interpretation of Regression Coefficient (b1)
  • In linear regression, the slope coefficient is
    the change in the mean response as x increases by
    1 unit
  • In logistic regression, we can show that
    $\frac{\text{odds}(x+1)}{\text{odds}(x)} = e^{\beta_1}$, where $\text{odds}(x) = \frac{\pi(x)}{1-\pi(x)}$
  • Thus $e^{\beta_1}$ represents the change in the odds of
    the outcome (multiplicatively) by increasing x by
    1 unit
  • If $\beta_1 = 0$, the odds (and probability) are equal at
    all x levels ($e^{\beta_1} = 1$)
  • If $\beta_1 > 0$, the odds (and probability) increase as
    x increases ($e^{\beta_1} > 1$)
  • If $\beta_1 < 0$, the odds (and probability) decrease
    as x increases ($e^{\beta_1} < 1$)
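
A quick numeric check of the odds-ratio interpretation, assuming the standard logistic form for $\pi(x)$: the ratio of the odds at x+1 to the odds at x equals $e^{\beta_1}$ no matter where x starts. The coefficients are hypothetical.

```python
import math


def probability(x, beta0, beta1):
    """pi(x) under the logistic model."""
    eta = beta0 + beta1 * x
    return math.exp(eta) / (1 + math.exp(eta))


def odds(x, beta0, beta1):
    """Odds of the outcome at predictor level x: pi(x) / (1 - pi(x))."""
    p = probability(x, beta0, beta1)
    return p / (1 - p)


# Hypothetical coefficients.
beta0, beta1 = -2.0, 0.5

ratios = [odds(x + 1, beta0, beta1) / odds(x, beta0, beta1) for x in (0, 2.5, 5)]
print([round(r, 6) for r in ratios], round(math.exp(beta1), 6))
```

The probability itself does not change by a constant amount per unit of x; only the odds change by a constant multiplicative factor.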

22
95% Confidence Interval for Odds Ratio
  • Step 1: Construct a 95% CI for $\beta$
  • Step 2: Raise $e \approx 2.718$ to the lower and upper
    bounds of the CI
  • If entire interval is above 1, conclude positive
    association
  • If entire interval is below 1, conclude negative
    association
  • If interval contains 1, cannot conclude there is
    an association
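
The two steps can be sketched as follows, assuming the usual large-sample interval of estimate ± 1.96 standard errors for Step 1 (the slide's own formula is not preserved). The estimate and standard error are hypothetical, and the function names are illustrative.

```python
import math


def odds_ratio_ci(estimate, std_error, z=1.96):
    """Step 1: CI for beta. Step 2: exponentiate both bounds."""
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    return (lower, upper), (math.exp(lower), math.exp(upper))


def conclusion(or_lower, or_upper):
    """Apply the decision rules from the bullets above."""
    if or_lower > 1:
        return "positive association"
    if or_upper < 1:
        return "negative association"
    return "cannot conclude there is an association"


# Hypothetical estimate and standard error for beta.
beta_ci, or_ci = odds_ratio_ci(0.5, 0.1)
print([round(v, 3) for v in beta_ci], [round(v, 3) for v in or_ci])
print(conclusion(*or_ci))
```

Checking whether the odds-ratio interval contains 1 is the same as checking whether the interval for $\beta$ contains 0, since $e^0 = 1$.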

23
Example - Rizatriptan for Migraine
  • 95% CI for $\beta_1$
  • 95% CI for population odds ratio
  • Conclude positive association between dose and
    probability of complete relief

24
Multiple Logistic Regression
  • Extension to more than one predictor variable
    (either numeric or dummy variables).
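  • With p predictors, the model is written:
    $\pi = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}$
  • Adjusted Odds ratio for raising $x_i$ by 1 unit,
    holding all other predictors constant: $OR_i = e^{\beta_i}$
  • Inferences on $\beta_i$ and $OR_i$ are conducted as was
    described above for the case with a single
    predictor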

25
Example - ED in Older Dutch Men
  • Response: Presence/Absence of ED ($n = 1688$)
  • Predictors ($k = 12$)
  • Age stratum (50-54, 55-59, 60-64, 65-69, 70-78)
  • Smoking status (Nonsmoker, Smoker)
  • BMI stratum (<25, 25-30, >30)
  • Lower urinary tract symptoms (None, Mild,
    Moderate, Severe)
  • Under treatment for cardiac symptoms (No, Yes)
  • Under treatment for COPD (No, Yes)
  • Baseline group for dummy variables

Source: Blanker, et al. (2001)
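
The predictor count on this slide is consistent with the m-1 rule for dummy variables: each categorical predictor with m levels contributes m-1 indicators. The sketch below tallies them from the levels listed above; the dictionary layout is just one convenient way to hold them.

```python
# Levels as listed on the slide for each categorical predictor.
predictor_levels = {
    "Age stratum": ["50-54", "55-59", "60-64", "65-69", "70-78"],
    "Smoking status": ["Nonsmoker", "Smoker"],
    "BMI stratum": ["<25", "25-30", ">30"],
    "Lower urinary tract symptoms": ["None", "Mild", "Moderate", "Severe"],
    "Under treatment for cardiac symptoms": ["No", "Yes"],
    "Under treatment for COPD": ["No", "Yes"],
}

# Each predictor with m levels needs m - 1 dummy variables.
dummies = {name: len(levels) - 1 for name, levels in predictor_levels.items()}
total = sum(dummies.values())
print(dummies)
print(total)  # 12
```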
26
Example - ED in Older Dutch Men
  • Interpretations: Risk of ED appears to be
  • Increasing with age, BMI, and LUTS strata
  • Higher among smokers
  • Higher among men being treated for cardiac
    symptoms or COPD