Transcript and Presenter's Notes

Title: MARE 250


1
Multiple Regression
MARE 250 Dr. Jason Turner
2
Linear Regression
y = b0 + b1x
y = dependent variable
b0, b1 = constants
b0 = y-intercept
b1 = slope
x = independent variable
Urchin density = b0 + b1(salinity)
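A minimal sketch (not part of the original slides) of fitting this line in Python; the salinity and urchin-density values below are invented for illustration.

```python
import numpy as np
from scipy import stats

salinity = np.array([28.0, 30.5, 31.2, 32.0, 33.5, 34.1, 35.0])  # x, independent variable
urchin_density = np.array([2.1, 3.0, 3.4, 3.9, 4.8, 5.0, 5.6])   # y, dependent variable

# linregress returns the least-squares intercept (b0) and slope (b1)
result = stats.linregress(salinity, urchin_density)
print(f"Urchin density = {result.intercept:.2f} + {result.slope:.3f} * salinity")
```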
3
Multiple Regression
Multiple regression allows us to learn more about
the relationship between several independent or
predictor variables and a dependent or criterion
variable. For example, we might be looking for a
reliable way to estimate the age of AHI at the
dock instead of waiting for laboratory analyses.
y = b0 + b1x
y = b0 + b1x1 + b2x2 + ... + bnxn
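A minimal sketch of fitting the multiple-regression form above with ordinary least squares; the two predictors and their values are hypothetical, not the course's AHI data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.uniform(40, 80, 30)                                # hypothetical predictor 1
x2 = rng.uniform(5, 20, 30)                                 # hypothetical predictor 2
y = 1.5 + 0.04 * x1 + 0.10 * x2 + rng.normal(0, 0.5, 30)    # simulated response

X = sm.add_constant(np.column_stack([x1, x2]))              # prepend the column of ones for b0
model = sm.OLS(y, X).fit()
print(model.params)                                         # [b0, b1, b2]
```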
4
Multiple Regression
In the social and natural sciences multiple
regression procedures are very widely used in
research. Multiple regression allows the
researcher to ask what is the best predictor of
...? For example, educational researchers might
want to learn what are the best predictors of
success in high school. Psychologists may want to
determine which personality variable best
predicts social adjustment. Sociologists may want
to find out which of the multiple social
indicators best predict whether or not a new
immigrant group will adapt and be absorbed into
society.
5
Multiple Regression
The general computational problem that needs to
be solved in multiple regression analysis is to
fit a straight line to a number of points.
In the simplest case - one dependent and one
independent variable - this can be visualized in
a scatterplot.
6
The Regression Equation
A line in a two-dimensional or two-variable space
is defined by the equation Y = a + bX; the animation
below shows a two-dimensional regression equation
plotted with three different confidence intervals
(90%, 95%, and 99%)
In the multivariate case, when there is more than
one independent variable, the regression line
cannot be visualized in two-dimensional space,
but can be computed rather easily
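Since the animation is not reproduced in this transcript, here is a minimal sketch (with invented data) of plotting a fitted line Y = a + bX together with 90%, 95%, and 99% confidence bands.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 2.0 + 0.8 * x + rng.normal(0, 1.0, x.size)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

plt.scatter(x, y, s=15, color="gray")
plt.plot(x, fit.fittedvalues, color="C1", label="Y = a + bX")
for alpha, style in [(0.10, "-"), (0.05, "--"), (0.01, ":")]:   # 90%, 95%, 99% bands
    band = fit.get_prediction(X).summary_frame(alpha=alpha)
    plt.plot(x, band["mean_ci_lower"], style, color="C0")
    plt.plot(x, band["mean_ci_upper"], style, color="C0")
plt.legend()
plt.show()
```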
7
Residual Variance and R-square
The smaller the variability of the residual
values around the regression line relative to the
overall variability, the better is our prediction.
Coefficient of determination (r2) - if we have an
R-square of 0.4, we have explained 40% of the
original variability and are left with 60%
residual variability. Ideally, we would like
to explain most if not all of the original
variability. Therefore, the r2 value is an indicator
of how well the model fits the data (e.g., an r2
close to 1.0 indicates that we have accounted for
almost all of the variability with the variables
specified in the model)
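A quick numeric illustration of r2 as the explained share of variability; the observed and predicted values below are invented.

```python
import numpy as np

y = np.array([3.1, 4.0, 5.2, 6.1, 7.3])          # observed values (hypothetical)
y_hat = np.array([3.0, 4.2, 5.0, 6.3, 7.2])      # values predicted by some fitted model

ss_res = np.sum((y - y_hat) ** 2)                 # residual variability around the line
ss_tot = np.sum((y - y.mean()) ** 2)              # overall variability around the mean
r2 = 1 - ss_res / ss_tot
print(f"r2 = {r2:.2f} -> {100 * r2:.0f}% explained, {100 * (1 - r2):.0f}% residual")
```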
8
Assumptions, Assumptions
Assumption of Linearity - it is assumed that the
relationship between variables is linear;
always look at a bivariate scatterplot of the
variables of interest
Normality Assumption - it is assumed in multiple
regression that the residuals (observed minus
predicted values) are distributed normally
(i.e., follow the normal distribution)
Most tests (specifically the F-test) are quite
robust with regard to violations of this
assumption - review the distributions of the
major variables with histograms
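A sketch of the two visual checks mentioned above - a bivariate scatterplot for linearity and a histogram of residuals for normality - using invented data.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.5 * x + rng.normal(0, 1.0, 100)

fit = sm.OLS(y, sm.add_constant(x)).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y, s=10)            # linearity check: points should follow a straight-line trend
ax1.set_title("Bivariate scatterplot")
ax2.hist(fit.resid, bins=15)       # normality check: residuals should look roughly bell-shaped
ax2.set_title("Histogram of residuals")
plt.show()
```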
9
Effects of Outliers
Outliers may be influential observations -
a data point whose removal causes the regression
equation (line) to change considerably.
Consider removal much like an outlier; if there
is no explanation, it is up to the researcher
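One common way to flag an influential observation is Cook's distance; the sketch below (with invented data) also refits the line without the flagged point to show how much the slope changes.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.append(rng.uniform(0, 10, 25), 20.0)                        # one extreme x value
y = np.append(1.0 + 0.5 * x[:25] + rng.normal(0, 0.5, 25), 3.0)    # paired with a low y

fit_all = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = fit_all.get_influence().cooks_distance[0]
worst = int(np.argmax(cooks_d))                                    # most influential point

keep = np.arange(x.size) != worst
fit_trim = sm.OLS(y[keep], sm.add_constant(x[keep])).fit()
print("slope with the point:   ", round(fit_all.params[1], 3))
print("slope without the point:", round(fit_trim.params[1], 3))
```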

10
For Example
We are interested in predicting values for Y
based upon several Xs - Age of AHI based upon FL,
BM, OP, PF, GR
[Figure: measurement diagrams labeled FL, OP, GR, PF]
11
For Example
We are interested in predicting values for Y
based upon several Xs - Age of AHI based upon FL,
BM, OP, PF, GR
We run multiple regression and get the equation
Age = -12.2 + 0.0370 Fork Len. + 0.093 Body Mass + 0.126 Operculum + 0.463 Pect. Fin + 0.129 Girth
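As an illustration, the fitted equation can be applied directly to a set of measurements; the coefficients below are copied from the slide (reading the intercept as -12.2), while the measurement values are invented and the units are those of the course data.

```python
def predict_age(fork_len, body_mass, operculum, pect_fin, girth):
    # coefficients copied from the slide; intercept read as -12.2
    return (-12.2 + 0.0370 * fork_len + 0.093 * body_mass
            + 0.126 * operculum + 0.463 * pect_fin + 0.129 * girth)

# hypothetical measurements, for illustration only
print(round(predict_age(fork_len=90, body_mass=15, operculum=20, pect_fin=25, girth=60), 2))
```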
12
So What's the Problem?
It has been said that the simplest explanation is
often the best.
Age = -12.2 + 0.0370 Fork Len. + 0.093 Body Mass + 0.126 Operculum + 0.463 Pect. Fin + 0.129 Girth
It is difficult for scientists or fishermen to
estimate age based upon 5 measurements.
13
Variable Reduction
2 methods for variable reduction:
Stepwise Regression - removes and adds variables
to the regression model for the purpose of
identifying a useful subset of the predictors
Best Subsets Regression - generates regression
models using the maximum R2 criterion by first
examining all one-predictor regression models and
selecting the 2 best, then all two-predictor
models and selecting the 2 best, etc.
14
Stepwise Regression
  • Building Models via Stepwise Regression
  • Stepwise model-building techniques for regression
  • The basic procedures involve
  • identifying an initial model
  • iteratively "stepping," that is, repeatedly
    altering the model at the previous step by adding
    or removing a predictor variable in accordance
    with the "stepping criteria,"
  • terminating the search when stepping is no
    longer possible given the stepping criteria (a
    simplified sketch of this loop follows below)
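A minimal forward-selection sketch of the stepping idea: starting from the intercept-only model, add whichever predictor most improves the fit (here judged by AIC) and stop when no addition helps. Real stepwise tools (e.g., Minitab's) also remove variables and use alpha-to-enter/remove criteria, so this is only an approximation of the procedure described above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(X: pd.DataFrame, y) -> list:
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none helps."""
    chosen, remaining = [], list(X.columns)
    best_aic = sm.OLS(y, np.ones(len(y))).fit().aic            # intercept-only starting model
    while remaining:
        scores = {c: sm.OLS(y, sm.add_constant(X[chosen + [c]])).fit().aic
                  for c in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best_aic:                      # stepping no longer possible
            break
        best_aic = scores[candidate]
        chosen.append(candidate)
        remaining.remove(candidate)
    return chosen
```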

15
How does it work? Stepwise Regression
Mallows' Cp can be used to compare the
precision of one model against another
Lower Mallows' Cp = more precise, better model
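A sketch of computing Mallows' Cp for a candidate submodel, using the common definition Cp = SSE_p / MSE_full - (n - 2p), where p counts the subset's predictors plus the intercept and MSE_full comes from the model with all predictors; the function and argument names are hypothetical.

```python
import statsmodels.api as sm

def mallows_cp(y, X_subset, X_full):
    """Cp = SSE_p / MSE_full - (n - 2p), with p = predictors in the subset + intercept."""
    n = len(y)
    fit_sub = sm.OLS(y, sm.add_constant(X_subset)).fit()
    fit_full = sm.OLS(y, sm.add_constant(X_full)).fit()
    p = X_subset.shape[1] + 1
    return fit_sub.ssr / fit_full.mse_resid - (n - 2 * p)
```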
16
Best Subsets Regression
  • Building Models via Best Subsets Regression
  • Best Subsets model-building techniques for
    regression
  • The basic procedures involve
  • first examining all one-predictor regression
    models and selecting the 2 best
  • next examining all two-predictor regression
    models and selecting the 2 best, etc. for all
    variables
  • from the output, WE decide the best model based
    upon a high adjusted R2 and a small Mallows' Cp
    (a sketch follows below)
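A minimal sketch of the enumeration described above: for each subset size, fit every combination of predictors, keep the 2 best by R2, and report adjusted R2 for the researcher to judge (Mallows' Cp could be added via the sketch on the previous slide). The function name and use of pandas/statsmodels are assumptions, not the course's Minitab workflow.

```python
from itertools import combinations

import pandas as pd
import statsmodels.api as sm

def best_subsets(X: pd.DataFrame, y, keep: int = 2) -> pd.DataFrame:
    """For each subset size, fit every predictor combination and keep the `keep` best by R2."""
    rows = []
    for size in range(1, X.shape[1] + 1):
        fits = []
        for cols in combinations(X.columns, size):
            fit = sm.OLS(y, sm.add_constant(X[list(cols)])).fit()
            fits.append((fit.rsquared, fit.rsquared_adj, cols))
        for r2, r2_adj, cols in sorted(fits, reverse=True)[:keep]:
            rows.append({"predictors": cols, "R2": round(r2, 3), "adj_R2": round(r2_adj, 3)})
    return pd.DataFrame(rows)
```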

17
How does it work? Best Subsets Regression
18
Who Cares?
Stepwise and/or Best Subsets analysis allows you
(i.e., the computer) to determine which predictor
variables (or combinations thereof) best explain
(can be used to predict) Y
This becomes much more important as the number of
predictor variables increases
It helps to make better sense of complicated
multivariate data