Transcript and Presenter's Notes

Title: Basic linear regression and multiple regression


1
  • Basic linear regression and multiple regression
  • Psych 350
  • Lecture 12: R. Chris Fraley
  • http://www.yourpersonality.net/psych350/fall2012/

2
Example
  • Let's say we wish to model the relationship
    between coffee consumption and happiness.

3
Some Possible Functions
4
Lines
  • Linear relationships
  • Y = a + bX
  • a = Y-intercept (the value of Y when X = 0)
  • b = slope (the rise over the run, the steepness
    of the line); a weight

Y = 1 + 2X
5
Lines and intercepts
  • Y = a + 2X
  • Notice that the implied values of Y go up as we
    increase a.
  • By changing a, we are changing the elevation of
    the line.

[Figure: lines Y = 5 + 2X, Y = 3 + 2X, and Y = 1 + 2X]
6
Lines and slopes
  • Slope as rise over run: how much of a change in
    Y there is given a 1-unit increase in X.
  • As we move up 1 unit on X, we go up 2 units on Y.
  • 2/1 = 2 (the slope)

[Figure: Y = 1 + 2X; moving from X = 0 to X = 1 (the run), Y rises from 1 to 3 (a rise of 2 units)]
7
Lines and slopes
  • Notice that as we increase the slope, b, we
    increase the steepness of the line

[Figure: HAPPINESS vs. COFFEE; lines Y = 1 + 4X and Y = 1 + 2X]
8
Lines and slopes
  • We can also have negative slopes and slopes of
    zero.
  • When the slope is zero, the predicted values of Y
    are equal to a. Y = a + 0X
  • Y = a

[Figure: HAPPINESS vs. COFFEE; lines with slopes b = 4, 2, 0, -2, and -4]
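As a concrete illustration of the last few slides, here is a minimal Python sketch (not part of the original presentation; the variable names and X values are illustrative) that computes the happiness values implied by Y = a + bX for several choices of intercept and slope:

```python
# Predicted values implied by the line Y = a + b*X
# for several illustrative choices of intercept (a) and slope (b).

def predict_line(a, b, xs):
    """Return the Y values implied by Y = a + b*X for each X in xs."""
    return [a + b * x for x in xs]

coffee = [-4, -2, 0, 2, 4]  # X values, matching the range shown in the figures

# Raising a shifts the line up; raising b makes it steeper; b = 0 gives Y = a.
for a, b in [(1, 2), (3, 2), (5, 2), (1, 4), (1, 0), (1, -2)]:
    print(f"a={a}, b={b}:", predict_line(a, b, coffee))
```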
9
Other functions
  • Quadratic function
  • Y = a + bX²
  • a still represents the intercept (value of Y when
    X = 0)
  • b still represents a weight, and influences the
    magnitude of the squaring function

10
Quadratic and intercepts
  • As we increase a, the elevation of the curve
    increases

[Figure: HAPPINESS vs. COFFEE; curves Y = 5 + 1X² and Y = 0 + 1X²]
11
Quadratic and Weight
  • When we increase the weight, b, the quadratic
    effect is accentuated

[Figure: HAPPINESS vs. COFFEE; curves Y = 0 + 5X² and Y = 0 + 1X²]
12
Quadratic and Weight
  • As before, we can have negative weights for
    quadratic functions.
  • In this case, negative values of b flip the curve
    upside-down.
  • As before, when b = 0, the value of Y = a for
    all values of X.

[Figure: HAPPINESS vs. COFFEE; curves Y = 0 + 5X², Y = 0 + 1X², Y = 0 + 0X², Y = 0 - 1X², and Y = 0 - 5X²]
13
Linear + Quadratic Combinations
  • When linear and quadratic terms are present in
    the same equation, one can derive J-shaped curves
    (see the sketch below).
  • Y = a + b1X + b2X²
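A minimal Python sketch of such a combination (the weights here are illustrative, not from the slides): with a positive linear weight and a positive quadratic weight, the implied values dip slightly and then climb steeply, tracing a J-shaped curve.

```python
# Combining a linear and a quadratic term: Y = a + b1*X + b2*X**2.
# With these illustrative weights the curve is J-shaped over this range.

def predict(a, b1, b2, x):
    return a + b1 * x + b2 * x ** 2

a, b1, b2 = 0, 1.0, 0.5          # illustrative parameter values
for x in range(-4, 5):
    print(f"X={x:>2}  Y-hat={predict(a, b1, b2, x):6.2f}")
```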

14
Some terminology
  • When the relations between variables are expressed
    in this manner, we call the relevant equation(s)
    mathematical models.
  • The intercept and weight values are called
    parameters of the model.
  • Although one can describe the relationship
    between two variables in the way we have done
    here, from now on we'll assume that our models are
    causal models, such that the variable on the
    left-hand side of the equation is assumed to be
    caused by the variable(s) on the right-hand side.

15
Terminology
  • The values of Y in these models are often called
    predicted values, sometimes abbreviated as Y-hat
    or Ŷ. Why? They are the values of Y that are
    implied by the specific parameters of the model.

16
Estimation
  • Up to this point, we have assumed that our models
    are correct.
  • There are two important issues we need to deal
    with, however:
  • Assuming the basic model is correct (e.g.,
    linear), what are the correct parameters for the
    model?
  • Is the basic form of the model correct? That is,
    is a linear, as opposed to a quadratic, model the
    appropriate model for characterizing the
    relationship between variables?

17
Estimation
  • The process of obtaining the correct parameter
    values (assuming we are working with the right
    model) is called parameter estimation.

18
Parameter Estimation example
  • Let's assume that we believe there is a linear
    relationship between X and Y.
  • Assume we have collected the following data.
  • Which set of parameter values will bring us
    closest to representing the data accurately?

19
Estimation example
  • We begin by picking some values, plugging them
    into the linear equation, and seeing how well the
    implied values correspond to the observed values
  • We can quantify what we mean by "how well" by
    examining the difference between the
    model-implied Y and the actual Y value
  • this difference, Y - Ŷ, is often called
    error in prediction

20
Estimation example
  • Let's try a different value of b and see what
    happens
  • Now the implied values of Y are getting closer to
    the actual values of Y, but we're still off by
    quite a bit

21
Estimation example
  • Things are getting better, but certainly things
    could improve

22
Estimation example
  • Ah, much better

23
Estimation example
  • Now that's very nice
  • There is a perfect correspondence between the
    implied values of Y and the actual values of Y

24
Estimation example
  • Whoa. That's a little worse.
  • Simply increasing b doesn't seem to make things
    increasingly better

25
Estimation example
  • Ugh. Things are getting worse again.

26
Parameter Estimation example
  • Here is one way to think about what we're doing:
  • We are trying to find a set of parameter values
    that will give us a small (ideally the smallest)
    discrepancy between the predicted Y values and the
    actual values of Y.
  • How can we quantify this?

27
Parameter Estimation example
  • One way to do so is to find the difference
    between each value of Y and the corresponding
    predicted value (we called these differences
    "errors" before), square these differences, and
    average them together (see the sketch below).
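A minimal Python sketch of this computation (the data here are illustrative, not the slides' dataset): compute each error Y - Ŷ, square it, and average.

```python
# Error variance: the average squared difference between the
# observed Y values and the values implied by the model.

def error_variance(a, b, xs, ys):
    """Mean of (Y - Y-hat)**2 for a linear model Y-hat = a + b*X."""
    squared_errors = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / len(squared_errors)

# Illustrative data consistent with Y = 2 + 2X (not the slides' actual data).
xs = [0, 1, 2, 3, 4]
ys = [2, 4, 6, 8, 10]

print(error_variance(2, 2, xs, ys))   # 0.0 (perfect fit)
print(error_variance(2, 1, xs, ys))   # 6.0 (a worse slope gives a larger error)
```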

28
Parameter Estimation example
  • The form of this equation should be familiar.
    Notice that it represents some kind of average of
    squared deviations
  • This average is often called error variance.

29
Parameter Estimation example
  • In estimating the parameters of our model, we are
    trying to find a set of parameters that minimizes
    the error variance. In other words, we want the
    error variance to be as small as it possibly
    can be.
  • The process of finding this minimum value is
    called least-squares estimation.

30
Parameter Estimation example
  • In this graph I have plotted the error variance
    as a function of the different parameter values
    we chose for b.
  • Notice that our error was large at first (at b =
    -2), but got smaller as we made b larger.
    Eventually, the error reached a minimum when b =
    2 and then began to increase again as we made b
    larger.

[Figure: error variance plotted against different values of b]
31
Parameter Estimation example
  • The minimum in this example occurred when b = 2.
    This is the best value of b, when we define
    "best" as the value that minimizes the error
    variance.
  • There is no other value of b that will make the
    error smaller. (0 is as low as you can go.)

[Figure: error variance plotted against different values of b]
32
Ways to estimate parameters
  • The method we just used is sometimes called the
    brute force or gradient descent method of
    estimating parameters (a sketch follows below).
  • More formally, gradient descent involves starting
    with a viable parameter value, calculating the
    error using a slightly different value, moving the
    best-guess parameter value in the direction of
    the smaller error, and then repeating this process
    until the error is as small as it can be.
  • Analytic methods
  • With simple linear models, the equation is so
    simple that brute force methods are unnecessary.
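A minimal brute-force sketch in Python (illustrative data and grid of candidate values, not from the slides): compute the error variance for a range of candidate slopes and keep the one that yields the smallest error, mirroring the plot on the previous slides.

```python
# Brute-force search for the slope b that minimizes the error variance,
# holding the intercept fixed at a = 2 (as in the slides' example).

def error_variance(a, b, xs, ys):
    errs = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errs) / len(errs)

xs = [0, 1, 2, 3, 4]            # illustrative data generated from Y = 2 + 2X
ys = [2, 4, 6, 8, 10]

candidates = [i / 10 for i in range(-20, 61)]   # b from -2.0 to 6.0 in steps of 0.1
best_b = min(candidates, key=lambda b: error_variance(2, b, xs, ys))
print(best_b)                    # 2.0: the value that minimizes the error variance
```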

33
Analytic least-squares estimation
  • Specifically, one can use calculus to find the
    values of a and b that will minimize the error
    function

34
Analytic least-squares estimation
  • When this is done (we won't actually do the
    calculus here), we obtain the following equations

35
Analytic least-squares estimation
  • Thus, we can easily find the least-squares
    estimates of a and b from simple knowledge of (1)
    the correlation between X and Y, (2) the SDs of
    X and Y, and (3) the means of X and Y
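The equations themselves appear only as images in the original slides; the standard least-squares results they describe, consistent with the bullet above, are b = r(SD of Y / SD of X) and a = mean(Y) - b * mean(X). A minimal Python sketch with illustrative data:

```python
# Analytic least-squares estimates for a simple linear model Y = a + b*X:
#   b = r * (SD of Y / SD of X)
#   a = mean(Y) - b * mean(X)
import statistics as st

def least_squares(xs, ys):
    mx, my = st.mean(xs), st.mean(ys)
    sx, sy = st.stdev(xs), st.stdev(ys)
    # Pearson correlation computed from the sample covariance
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((len(xs) - 1) * sx * sy)
    b = r * (sy / sx)          # slope
    a = my - b * mx            # intercept
    return a, b

xs = [0, 1, 2, 3, 4]           # illustrative data from Y = 2 + 2X
ys = [2, 4, 6, 8, 10]
print(least_squares(xs, ys))   # (2.0, 2.0)
```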

36
A neat fact
  • Notice what happens when X and Y are in standard
    score form: the SDs are 1 and the means are 0.
  • Thus, b = r and a = 0.

37
  • In the parameter estimation example, we dealt
    with a situation in which a linear model of the
    form Y = 2 + 2X perfectly accounted for the data.
    (That is, there was no discrepancy between the
    values implied by the model and the actual data.)
  • Even when this is not the case (i.e., when the
    model doesn't explain the data perfectly), we can
    still find least-squares estimates of the
    parameters.

38
(No Transcript)
39
Error Variance
  • In this example, the value of b that minimizes
    the error variance is also 2. However, even when
    b = 2, there are discrepancies between the
    predictions entailed by the model and the actual
    data values.
  • Thus, the error variance becomes not only a way
    to estimate parameters, but a way to evaluate the
    basic model itself.

40
R-squared
  • In short, when the model is a good representation
    of the relationship between Y and X, the error
    variance of the model should be relatively low.
  • This is typically quantified by an index called
    the multiple R or the squared version of it, R2.

41
R-squared
  • R-squared represents the proportion of the
    variance in Y that is accounted for by the model.
  • When the model doesn't do any better than
    guessing the mean, R2 will equal zero. When the
    model is perfect (i.e., it accounts for the data
    perfectly), R2 will equal 1.00. A sketch follows below.
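A minimal Python sketch of this definition (the observed and predicted values are illustrative): R2 is 1 minus the ratio of the error variance to the variance of Y, so guessing the mean gives 0 and a perfect fit gives 1.

```python
# R-squared: the proportion of the variance in Y accounted for by the model,
# computed as 1 - (error variance / variance of Y).
import statistics as st

def r_squared(ys, y_hats):
    err_var = st.mean([(y - yh) ** 2 for y, yh in zip(ys, y_hats)])
    return 1 - err_var / st.pvariance(ys)

ys     = [2, 4, 5, 4, 5]                        # illustrative observed values
y_hats = [2.6, 3.3, 4.0, 4.7, 5.4]              # predictions from some linear model
print(round(r_squared(ys, y_hats), 2))
print(r_squared(ys, [st.mean(ys)] * len(ys)))   # 0.0 when we just guess the mean
```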

42
Neat fact
  • When dealing with a simple linear model with one
    X, R2 is equal to the correlation of X and Y,
    squared.
  • Why? Keep in mind that R2 is in a standardized
    metric by virtue of having divided the error
    variance by the variance of Y. Previously, when
    working with standardized scores in simple linear
    regression equations, we found that the parameter
    b is equal to r. Since b is estimated via
    least-squares techniques, it is directly related
    to R2.

43
Why is R2 useful?
  • R2 is useful because it is a standard metric for
    interpreting model fit.
  • It doesn't matter how large the variance of Y is
    because everything is evaluated relative to the
    variance of Y.
  • Set end-points: 1 is perfect and 0 is as bad as a
    model can be.

44
Multiple Regression
  • In many situations in personality psychology we
    are interested in modeling Y not only as a
    function of a single X variable, but potentially
    many X variables.
  • Example We might attempt to explain variation in
    academic achievement as a function of SES and
    maternal education.

45
  • Y = a + b1(SES) + b2(MATEDU)
  • Notice that adding a new variable to the model
    is simple. This equation states that Y, academic
    achievement, is a function of at least two
    things, SES and MATEDU.

46
  • However, what the regression coefficients now
    represent is not merely the change in Y expected
    given a 1-unit increase in X. They represent the
    change in Y given a 1-unit change in X, holding
    all the other variables in the equation constant.
  • In other words, these coefficients are kind of
    like partial correlations (technically, they are
    called semi-partial correlations). We're
    statistically controlling for SES when estimating
    the effect of MATEDU.

47
  • Estimating regression coefficients in SPSS
  • Correlations

                SES      MATEDU   ACHIEVEG5
    SES         1.00     .542     .279
    MATEDU      .542     1.00     .364
    ACHIEVEG5   .279     .364     1.00

48
(No Transcript)
49
(No Transcript)
50
Note: The regression parameter estimates are in
the column labeled B. Constant = a = intercept.
51
Achievement = 76.86 + 1.443(MATEDU) + .539(SES)
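A minimal Python sketch applying this fitted equation (the coefficients come from the line above; the MATEDU and SES values plugged in are illustrative):

```python
# Predicted achievement from the fitted multiple regression equation
# Achievement = 76.86 + 1.443*(MATEDU) + .539*(SES)

def predicted_achievement(matedu, ses):
    return 76.86 + 1.443 * matedu + 0.539 * ses

# Illustrative predictor values (not from the slides' data set).
print(predicted_achievement(matedu=12, ses=2))   # 95.254
print(predicted_achievement(matedu=16, ses=3))   # 101.565
```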
52
  • These parameter estimates imply that moving up
    one unit on maternal education leads to a 1.4-unit
    increase in achievement.
  • Moreover, moving up 1 unit on SES corresponds to
    about a half-unit increase in achievement.

53
  • Does this mean that Maternal Education matters
    more than SES in predicting educational
    achievement?
  • Not necessarily. As it stands, the two variables
    might be on very different metrics. (Perhaps
    MATEDU ranges from 0 to 20 and SES ranges from 0
    to 4.) To evaluate their relative contributions
    to Y, one can standardize both variables or
    examine standardized regression coefficients.

54
Z(Achievement) = 0 + .301 Z(MATEDU) + .118 Z(SES)
55
The multiple R and the R-squared for the full
model are listed here. This particular model
explains 14% of the variance in academic
achievement.
56
Adding SES×SES (SES²) improves R-squared by about
1%. These parameters suggest that higher SES
predicts higher achievement, but in a limiting
way. There are diminishing returns on the high
end of SES.
57
  SES    a    B1·MATEDU    B2·SES        B3·SES·SES        Y-hat
  -2     0    .256(0)      .436(-2)      -.320(-2)(-2)     -2.15
  -1     0    .256(0)      .436(-1)      -.320(-1)(-1)     -0.76
   0     0    .256(0)      .436(0)       -.320(0)(0)        0.00
   1     0    .256(0)      .436(1)       -.320(1)(1)        0.12
   2     0    .256(0)      .436(2)       -.320(2)(2)       -0.41
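A minimal Python sketch reproducing the table's computation (coefficients taken from the rows above, with Z(MATEDU) held at 0):

```python
# Predicted standardized achievement from the model with a quadratic SES term:
# Y-hat = a + B1*Z(MATEDU) + B2*Z(SES) + B3*Z(SES)**2, with Z(MATEDU) fixed at 0.

a, b1, b2, b3 = 0, 0.256, 0.436, -0.320

for z_ses in [-2, -1, 0, 1, 2]:
    y_hat = a + b1 * 0 + b2 * z_ses + b3 * z_ses ** 2
    print(f"Z(SES) = {z_ses:>2}   predicted Z(Achievement) = {y_hat:>5.2f}")
```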
58
[Figure: predicted Z(Achievement) plotted against Z(SES), showing the diminishing returns at the high end of SES]