Multiple Linear Regression - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Multiple Linear Regression

1
Multiple Linear Regression
Nothing explains everything
  • Laurens Holmes, Jr.
  • Nemours/A.I.duPont Hospital for Children

2
What is MLR?
  • Multiple linear regression (MLR) is a statistical
    method for estimating the relationship between a
    dependent variable and two or more independent (or
    predictor) variables.

3
Multiple Linear Regression
  • Simply, MLR is a method for studying the
    relationship between a dependent variable and two
    or more independent variables.
  • Purposes:
    - Prediction
    - Explanation
    - Theory building

4
Operation?
  • Uses the ordinary least squares solution (as does
    simple linear or bivariable regression).
  • Describes the line for which the sum of squared
    differences between the predicted and the actual
    values of the dependent variable is at a minimum.
  • Represents the function that minimizes the sum
    of the squared errors.
  • Ypred = a + b1X1 + b2X2 + ... + bnXn
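As a minimal sketch of this least-squares operation (not from the
original slides; the data are simulated for illustration), numpy
can solve for the coefficients that minimize the sum of squared
errors:

    import numpy as np

    # Simulated data: 100 cases, 2 predictors (hypothetical)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=100)

    # Ordinary least squares: find [a, b1, b2] minimizing the sum
    # of squared differences between predicted and actual Y
    A = np.column_stack([np.ones(len(X)), X])
    coef, rss, _, _ = np.linalg.lstsq(A, y, rcond=None)
    print(coef)  # approximately [1.0, 2.0, -0.5]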

5
Operation?
  • MLR produces a model that identifies the best
    weighted combination of independent variables to
    predict the dependent (or criterion) variable.
  • Ypred = a + b1X1 + b2X2 + ... + bnXn
  • MLR estimates the relative importance of several
    hypothesized predictors.
  • MLR assesses the contribution of the combined
    variables to changes in the dependent variable.

6
Design Requirements
  • One dependent variable (criterion)
  • Two or more independent variables (predictor or
    explanatory variables).
  • Sample size > 50 (at least 10 times as many
    cases as independent variables)

7
Variations
Total variation in Y is partitioned into variation
predictable by the combination of independent
variables and unpredictable (residual) variation.
8
MLR Model Basic Assumptions
  • Independence: the data of any particular subject
    are independent of the data of all other subjects.
  • Normality: in the population, the data on the
    dependent variable are normally distributed for
    each of the possible combinations of the levels of
    the X variables; each of the variables is
    normally distributed.
  • Homoscedasticity: in the population, the
    variances of the dependent variable for each of
    the possible combinations of the levels of the X
    variables are equal.
  • Linearity: in the population, the relation
    between the dependent variable and each
    independent variable is linear when all the other
    independent variables are held constant.

9
Simple vs. Multiple Regression
  • Multiple regression: one dependent variable Y
    predicted from a set of independent variables
    (X1, X2, ..., Xk)
    - one regression coefficient for each independent
      variable
    - R2 = proportion of variation in dependent
      variable Y predictable from the set of
      independent variables (Xs)
  • Simple regression: one dependent variable Y
    predicted from one independent variable X
    - one regression coefficient
    - r2 = proportion of variation in dependent
      variable Y predictable from X

10
MLR Equation
  • Ypred = a + b1X1 + b2X2 + ... + bnXn
    (pred = predicted; 1, 2, ..., n are subscripts)
  • Ypred = the dependent variable, or the variable
    to be predicted.
  • X = the independent or predictor variables.
  • a = the constant or Y intercept included in
    raw-score equations, representing the value of Y
    when all X = 0.
  • b = the b weights, or partial regression
    coefficients.
  • The bs show the relative contribution of their
    independent variable to the dependent variable
    when controlling for the effects of the other
    predictors.
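As a hedged illustration with tiny made-up data (smf.ols is
statsmodels' formula interface), the constant a and the b weights
appear directly in the fitted parameters:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data frame: outcome y, predictors x1 and x2
    df = pd.DataFrame({"y":  [3, 5, 7, 6, 9, 11],
                       "x1": [1, 2, 3, 3, 5, 6],
                       "x2": [2, 1, 4, 2, 5, 4]})
    fit = smf.ols("y ~ x1 + x2", data=df).fit()
    print(fit.params)       # Intercept (a), b1 for x1, b2 for x2
    print(fit.predict(df))  # Ypred for each case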

11
Variables in the model?
  • One approach is to perform a literature review
    and examine theories to identify potential
    predictors, thus building a theoretical variate,
    which may reflect the biologic or clinical
    relevance of the variables.
  • This is sometimes referred to as the standard
    (simultaneous) regression method.
  • A second approach is to examine statistics that
    show the effects of each variable both within and
    outside of the equation.
  • The statistical variate is built based on those
    variables showing the most effect (significant at
    the 0.25 level).
  • These are sometimes called forward and backward
    stepwise regression.

12
MLR Output
  • The following notions are essential for
    understanding MLR output: R2, adjusted R2,
    constant, b coefficient, beta, F-test, t-test.
  • For MLR, R2 (the coefficient of multiple
    determination) is used rather than r (Pearson's
    correlation coefficient) to assess the strength
    of this more complex relationship (as compared to
    a bivariate correlation).
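Continuing the hypothetical smf.ols sketch above, each of these
quantities is exposed on the fitted results object:

    print(fit.rsquared)              # R2
    print(fit.rsquared_adj)          # adjusted R2
    print(fit.fvalue, fit.f_pvalue)  # overall F-test
    print(fit.tvalues, fit.pvalues)  # t-test per coefficient
    print(fit.params)                # constant and b coefficients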

13
Adjusted R square and b coefficient
  • The adjusted R2 adjusts for the inflation in R2
    caused by the number of variables in the
    equation. As the sample size increases above 20
    cases per variable, adjustment is less needed
    (and vice versa).
  • b coefficient measures the amount of increase or
    decrease in the dependent variable for a one-unit
    difference in the independent variable,
    controlling for the other independent variable(s)
    in the equation.
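For reference, a common form of this adjustment, with n cases and
p predictors, is: adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1);
the correction shrinks as n grows relative to p.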

14
b coefficient
  • Ideally, the independent variables are
    uncorrelated.
  • Consequently, controlling for one of them will
    not affect the relationship between the other
    independent variable and the dependent variable.

15
Intercorrelation or collinearity
  • If the two independent variables are
    uncorrelated, we can uniquely partition the
    amount of variance in Y due to X1 and X2, and
    bias is avoided.
  • Small intercorrelations between the independent
    variables will not greatly bias the b
    coefficients.
  • However, large intercorrelations will bias the
    b coefficients, and for this reason other
    mathematical procedures are needed.
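One common diagnostic for such intercorrelation (a sketch, not
part of the original slides) is the variance inflation factor;
the data below are deliberately simulated to be collinear:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import (
        variance_inflation_factor)

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    x2 = 0.9 * x1 + rng.normal(scale=0.5, size=200)  # correlated
    X = sm.add_constant(np.column_stack([x1, x2]))

    # VIF well above 5-10 is a common flag for collinearity
    for i in (1, 2):
        print(variance_inflation_factor(X, i))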

16
MLR Model Building
  • Each predictor is taken in turn. That is, all
    other predictors are first placed in the equation
    and then the predictor of interest is entered.
  • This allows us to determine the unique
    (additional) contribution of the predictor
    variable.
  • By repeating the procedure for each predictor we
    can determine the unique contribution of each
    independent variable.

17
Different Ways of Building Regression Models
  • Simultaneous: all independent variables entered
    together.
  • Stepwise: independent variables entered according
    to some order
    - by size of correlation with dependent variable
    - in order of significance
  • Hierarchical: independent variables entered in
    stages.

18
Various Significance Tests
  • Testing R2
    - Test R2 through an F test.
    - Test of competing models (difference between
      R2s) through an F test of the difference of
      R2s.
  • Testing b
    - Test of each partial regression coefficient (b)
      by t-test.
    - Comparison of partial regression coefficients
      with each other: t-test of the difference
      between standardized partial regression
      coefficients (beta).
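A minimal sketch of these tests in Python (simulated data;
statsmodels' anova_lm performs the F test of the R2 difference
between nested models):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(2)
    df = pd.DataFrame({"x1": rng.normal(size=60),
                       "x2": rng.normal(size=60)})
    df["y"] = 1 + 2 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=60)

    reduced = smf.ols("y ~ x1", data=df).fit()
    full = smf.ols("y ~ x1 + x2", data=df).fit()
    print(full.fvalue, full.f_pvalue)  # F test of R2, full model
    print(full.tvalues, full.pvalues)  # t test of each b
    print(anova_lm(reduced, full))     # F test of difference in R2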

19
F and t tests
  • The F-test is used as a general indicator of the
    probability that any of the predictor variables
    contribute to the variance in the dependent
    variable within the population.
  • The null hypothesis is that the predictors'
    weights are all effectively equal to zero,
    implying that none of the predictors contributes
    to the variance in the dependent variable in the
    population.

20
F and t tests
  • t-tests are used to test the significance of each
    predictor in the equation.
  • The null hypothesis is that a predictor's weight
    is effectively equal to zero when the effects of
    the other predictors are taken into account.
  • That is, it does not contribute to the variance
    in the dependent variable within the population.

21
R Square
  • When comparing the R2 of an original set of
    variables to the R2 after additional variables
    have been included, the researcher is able to
    identify the unique variation explained by the
    additional set of variables.
  • Any co-variation between the original set of
    variables and the new variables will be
    attributed to the original variables.
  • R2 (multiple correlation squared) = variation in
    Y accounted for by the set of predictors.
  • Adjusted R2: sampling variation around R2 can
    only inflate the value.
  • The adjustment takes into account the size of the
    sample and the number of predictors to make the
    value a better estimate of the population value.
  • R2 is similar to eta squared (η2) but will be a
    little smaller, because R2 captures only the
    linear relationship while η2 also accounts for
    non-linear relationships.

22
Vignette
  • Suppose we wish to examine the factors that
    predict the length of hospitalization following
    spinal surgery in children with CP (dependent
    continuous variable).
  • The available variables in the dataset are
    hematocrit, estimated blood loss, cell saver,
    operating time, age at surgery, and packed red
    blood cells.
  • If the dependent and independent variables are
    measured on a continuous scale, what would be an
    appropriate test statistic?
  • Select appropriate variables (theory-based and
    statistical approaches), and determine the effect
    of estimated blood loss while controlling for
    hematocrit, packed red blood cells, age at
    surgery, cell saver, and operating time (duration
    of surgery).
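A hedged sketch of how this vignette could be analyzed outside
SPSS; the file name and column names (los for length of stay, ebl
for estimated blood loss, prbc for packed red blood cells) are
hypothetical:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical dataset for the CP spinal-surgery vignette
    df = pd.read_csv("cp_spine.csv")
    fit = smf.ols("los ~ ebl + hematocrit + prbc + age_at_surgery"
                  " + cell_saver + op_time", data=df).fit()
    # The coefficient on ebl is the effect of estimated blood loss
    # controlling for all of the other predictors
    print(fit.summary())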

23
SPSS: 1) Analyze, 2) Regression, 3) Linear
24
(No Transcript)
25
SPSS Screen
26
SPSS Output
Interpret the coefficients
27
SPSS Output
Interpret the R2
What does the ANOVA result mean?
28
Repeated Measures Analysis of Variance (RM ANOVA)
RM ANOVA removes variability in baseline prognostic
factors, making it an ideal model!
  • Univariable (univariate)

29
Repeated Measures ANOVA
  • Between-subjects design:
  • ANOVA in which each participant participated in
    only one of, for example, three treatment groups.
  • Within-subjects or repeated measures design:
  • Participants receive one treatment, and the
    outcome of the treatment is measured at different
    time points, for example 3 (before treatment,
    immediately after, and 6 months after treatment).

30
RM ANOVA vs. Paired t-test
  • Repeated measures ANOVA, also known as
    within-subjects ANOVA, is an extension of the
    paired t-test.
  • Like t-tests, repeated measures ANOVA gives us
    the statistical tools to determine whether or not
    change has occurred over time.
  • t-tests compare average scores at two different
    time periods for a single group of subjects.
  • Repeated measures ANOVA compares the average
    scores at multiple time periods for a single
    group of subjects.

31
RM ANOVA: Understanding the terms, analysis, and
interpretation
  • The first step in solving repeated measures ANOVA
    is to combine the data from the multiple time
    periods into a single time factor for analysis.
  • The different time periods are analogous to the
    categories of the independent variable in a
    one-way analysis of variance.
  • The time factor is then tested to see if the mean
    of the dependent variable is different for some
    categories of the time factor.
  • If the time factor is statistically significant
    in the ANOVA test, then Bonferroni pairwise
    comparisons are computed to identify specific
    differences between time periods.
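A sketch of that Bonferroni pairwise step with scipy (the three
time-point arrays are simulated stand-ins for repeated
measurements):

    from itertools import combinations
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(3)
    t1 = rng.normal(50, 5, size=20)       # hypothetical scores
    t2 = t1 - rng.normal(20, 3, size=20)
    t3 = t2 + rng.normal(2, 2, size=20)

    pairs = list(combinations([("time1", t1), ("time2", t2),
                               ("time3", t3)], 2))
    for (name_a, a), (name_b, b) in pairs:
        stat, p = ttest_rel(a, b)
        # Bonferroni: multiply each p by the number of comparisons
        print(name_a, "vs", name_b, "p =", min(p * len(pairs), 1.0))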

32
RM ANOVA: Understanding the terms, analysis, and
interpretation
  • If the dependent variable is measured at three
    time periods, there are three paired comparisons:
  • time 1 versus time 2 (preoperative or
    before-treatment measure)
  • time 2 versus time 3 (immediately-after
    surgery/treatment measure)
  • time 1 versus time 3 (follow-up postoperative
    measure)

33
Statistical Assumptions of RM ANOVA
  • Independence
  • Normality
  • Homogeneity of within-treatment variances
  • Sphericity

RM ANOVA is ideal for testing hypotheses about
treatment effectiveness when ethical constraints
restrict the use of control subjects.
34
Homogeneity of Variance
  • In one-way ANOVA, we expect the variances to be
    equal
  • We also expect that the samples are not related
    to one another (so no covariance or correlation)

35
Sphericity and Compound Symmetry
  • Extension of homogeneity of variance assumption
  • Compound Symmetry is stricter than Sphericity
    (but maybe easier to explain)
  • All variances are equal to each other
  • All covariances are equal to each other

36
Sphericity and Compound Symmetry
  • If we meet the assumption of compound symmetry,
    then we meet the assumption of sphericity.
  • Sphericity is less strict and is the only
    assumption we need to meet for RM ANOVA.
  • Sphericity means that the variances of the
    difference scores are equal:
  • the variance of difference scores between times 1
    and 2 is equal to the variance of difference
    scores between times 2 and 3, and so on.
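This definition can be checked directly (a sketch with simulated
scores):

    import numpy as np

    rng = np.random.default_rng(4)
    scores = rng.normal(size=(20, 3))  # 20 subjects x 3 times

    # Sphericity: variances of all pairwise difference scores
    # are equal
    print(np.var(scores[:, 0] - scores[:, 1], ddof=1))  # t1 - t2
    print(np.var(scores[:, 1] - scores[:, 2], ddof=1))  # t2 - t3
    print(np.var(scores[:, 0] - scores[:, 2], ddof=1))  # t1 - t3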

37
Sphericity Assumption Violations
  • A more conservative method of evaluating the
    significance of the obtained F is needed.
  • Greenhouse-Geisser (1959) correction
  • Gives an appropriate critical value for the worst
    situation, in which the assumptions are maximally
    violated.
  • Huynh-Feldt correction
  • The Huynh-Feldt epsilon is an attempt to correct
    the Greenhouse-Geisser epsilon, which tends to be
    overly conservative, especially for small sample
    sizes

38
Sample Table for RM ANOVA
39
RM ANOVA
  • All participants take part in all treatment
    conditions, e.g., surgery for spinal deformity
    correction.
  • The participant emerges as an independent source
    of variance that can be partitioned out.
  • In RM ANOVA, this variability therefore does not
    enter the error term.
  • The other sources of variance include the
    repeated-measures treatment and the participant
    x treatment interaction.

40
RM ANOVA Equation
41
Vignette
  • Suppose a spinal fusion was performed to correct
    spinal deformities in Adolescent Idiopathic
    Scoliosis (AIS). If the main Cobb angle was
    measured preoperatively, immediately after
    surgery (first erect), and during two years of
    follow-up, was the surgical procedure effective
    in correcting the curve deformity and maintaining
    correction after two years of follow-up?
  • Hint: a correction loss > 10 degrees is
    indicative of a clinically significant loss of
    correction.
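A hedged sketch of this repeated-measures analysis using
statsmodels' AnovaRM; the file name and long-format column names
(patient, time, cobb) are hypothetical:

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Hypothetical long format: one row per patient per time point
    df = pd.read_csv("ais_cobb_long.csv")  # patient, time, cobb
    res = AnovaRM(df, depvar="cobb", subject="patient",
                  within=["time"]).fit()
    print(res)  # F test for the time factor; if significant,
                # follow with Bonferroni pairwise comparisons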

42
Sample variables at the preoperative, immediately
postoperative, and 2-year follow-up measurements.
Check the normality assumption of the variables at
the three measurement points of the Cobb angle.
43
  • In SPSS, select:
  • Analyze
  • General Linear Model (GLM)
  • Repeated Measures (RM)

44
SPSS Output
From the variables box, select the 1st, 2nd, and
3rd measurement points during the study period.
Click the Options box and select descriptive
statistics and the Bonferroni multiple comparison.
45
SPSS Output
Observe the time means and their SDs.
Observe the significance of the sphericity test on
the variances.
46
SPSS Output
Report the Greenhouse-Geisser result
47
SPSS Output
48