Simple Linear Regression - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Simple Linear Regression


1
Simple Linear Regression
2
1. Introduction
Example
George Bush: 1.81m; Laura Bush: ?
Brad Pitt: 1.83m; Angelina Jolie: 1.70m
David Beckham: 1.83m; Victoria Beckham: 1.68m
Goal: to predict the height of the wife in a couple, based on the husband's height.
Response (outcome or dependent) variable (Y): height of the wife.
Predictor (explanatory or independent) variable (X): height of the husband.
3
Regression analysis
Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables.
When there is just one predictor variable, we use simple linear regression. When there are two or more predictor variables, we use multiple linear regression.
When it is not clear which variable represents the response and which the predictor, correlation analysis is used to study the strength of the relationship.
History
  • The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809.
  • The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
  • This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.

4
A probabilistic model
  • We denote the n observed values of the predictor variable x as x1, x2, ..., xn.
  • We denote the corresponding observed values of the response variable Y as Y1, Y2, ..., Yn.

5
Notations of the simple linear regression
The observed value of the random variable Yi depends on xi:
    Yi = β0 + β1 xi + εi
where εi is a random error with mean 0, so the unknown mean of Yi is E(Yi) = β0 + β1 xi.
The line y = β0 + β1 x is the true regression line, with unknown slope β1 and unknown intercept β0.
6
Simple Linear Regression Model
    y = β0 + β1 x + ε
  • y: dependent variable
  • x: independent variable
  • β0: intercept
  • β1: slope of the line
  • ε: error variable

β0 and β1 are unknown and are therefore estimated from the data.
[Figure: a fitted line in the (x, y) plane; the intercept β0 is where the line meets the y-axis, and the slope is β1 = Rise/Run.]
7
(No Transcript)
8
4 BASIC ASSUMPTIONS for statistical inference
Linearity: the mean of Y is a linear function of the predictor variable x.
Constant variance: the errors have a common variance, the same for all values of x.
Normality: the errors are normally distributed.
Independence: the errors are independent.
9
Comments
1. "Linear" means linear in the parameters β0 and β1, not in x.
Example: E(Y) = β0 + β1 log x is a linear model, even though log x is a nonlinear function of x.
2. The predictor variable need not be set at predetermined fixed values; it can be random along with Y. The model can then be considered a conditional model.
Example: height and weight of children. Given height (X = x), predict weight (Y) through the conditional expectation of Y given X = x.
10
2. Fitting the Simple Linear Regression Model
  • 2.1 Least Squares (LS) Fit

11
Example 10.1 (Tire Tread Wear vs. Mileage: Scatter Plot. From Statistics and Data Analysis, Tamhane and Dunlop, Prentice Hall.)
12
Estimating the Coefficients
  • The estimates are determined by
  • drawing a sample from the population of interest,
  • calculating sample statistics,
  • producing the straight line that best fits the data.

[Scatter plot of the sample points. The question is: which straight line fits best?]
13
For the least squares regression method, the best line is the one that minimizes the sum of squared vertical differences between the points and the line. The smaller the sum of squared differences, the better the fit of the line to the data.

[Figure: four points (1, 2), (2, 4), (3, 1.5), (4, 3.2) with a candidate line and the vertical deviations marked.]
14
(No Transcript)
15
The best fitting straight line, in the sense of minimizing
    Q = Σ [yi - (β0 + β1 xi)]²,
is the LS estimate.
  • One way to find the LS estimates b0 and b1 is to set the partial derivatives ∂Q/∂β0 and ∂Q/∂β1 equal to zero. Simplifying, we get the normal equations:
        Σ yi = n b0 + b1 Σ xi
        Σ xi yi = b0 Σ xi + b1 Σ xi²
16
  • Solving the normal equations, we get
        b1 = (Σ xi yi - n x̄ ȳ) / (Σ xi² - n x̄²)
        b0 = ȳ - b1 x̄
17
  • To simplify, we introduce
        Sxx = Σ (xi - x̄)²,  Syy = Σ (yi - ȳ)²,  Sxy = Σ (xi - x̄)(yi - ȳ)
    so that b1 = Sxy/Sxx and b0 = ȳ - b1 x̄.
  • The resulting equation ŷ = b0 + b1 x is known as the least squares regression line, which is an estimate of the true regression line.
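A minimal Python sketch of these formulas, applied to the tire tread wear data of Example 10.1 (numpy assumed; the data values are taken from the residual table in Section 5.1):

    import numpy as np

    # x = mileage (in 1000s of miles), y = groove depth (in mils)
    x = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
    y = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])

    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)          # 960.0
    Sxy = np.sum((x - xbar) * (y - ybar))  # about -6989.40

    b1 = Sxy / Sxx          # slope estimate, about -7.281
    b0 = ybar - b1 * xbar   # intercept estimate, about 360.64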


18
Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit)
  • To find the equation of the LS line for the tire tread wear data from Table 10.1, we have n = 9 and
        Σ xi = 144, Σ yi = 2197.32, Σ xi² = 3264, Σ xi yi = 28,167.72
  • From these we calculate
        x̄ = 16, ȳ = 244.15, Sxx = 960, Sxy = -6989.40
19
  • The slope and intercept estimates are
        b1 = Sxy/Sxx = -6989.40/960 = -7.281 and b0 = ȳ - b1 x̄ = 244.15 + 7.281 × 16 = 360.64
  • Therefore, the equation of the LS line is
        ŷ = 360.64 - 7.281x
  • Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving.
  • Given a particular x* = 25 (i.e., 25,000 miles),
  • we can find ŷ = 360.64 - 7.281 × 25 = 178.62.
  • This means that the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.
20
2.2 Goodness of Fit of the LS Line
  • Coefficient of Determination and Correlation
  • The residuals
        ei = yi - ŷi,  i = 1, ..., n
    are used to evaluate the goodness of fit of the LS line.

21
  • We define
        SST = Σ (yi - ȳ)²   (total sum of squares)
        SSR = Σ (ŷi - ȳ)²   (regression sum of squares)
        SSE = Σ (yi - ŷi)²  (error sum of squares)
    and it can be shown that SST = SSR + SSE.
  • r² = SSR/SST = 1 - SSE/SST is called the coefficient of determination.
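Continuing the Python sketch above, the goodness-of-fit quantities for the tire data (variable names carried over):

    yhat = b0 + b1 * x                 # fitted values
    SST = np.sum((y - y.mean()) ** 2)  # about 53,418.7
    SSE = np.sum((y - yhat) ** 2)      # about 2,531.5
    SSR = SST - SSE                    # about 50,887.2
    r2 = SSR / SST                     # about 0.953
    r = -np.sqrt(r2)                   # sign follows the sign of b1, about -0.976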

22
Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)
  • For the tire tread wear data, calculate r² using the results from Example 10.2. We have
        SST = Syy = 53,418.73
  • Next calculate
        SSE = 2531.53, so SSR = SST - SSE = 50,887.20
  • Therefore
        r² = 50,887.20/53,418.73 = 0.953
  • The Pearson correlation is
        r = -√0.953 = -0.976
    where the sign of r follows from the sign of b1. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.

23
The Maximum Likelihood Estimators (MLE)
  • Consider the linear model Yi = β0 + β1 xi + εi, where εi is drawn from a normal population with mean 0 and standard deviation σ. The likelihood function for Y is
        L(β0, β1, σ²) = Π [1/(σ √(2π))] exp[-(yi - β0 - β1 xi)²/(2σ²)]
  • Thus, the log-likelihood for the data is
        log L = -(n/2) log(2πσ²) - (1/(2σ²)) Σ (yi - β0 - β1 xi)²

24
The MLE Estimators
  • Solving ∂log L/∂β0 = 0, ∂log L/∂β1 = 0, ∂log L/∂σ² = 0,
  • we obtain the MLEs of the three unknown model parameters.
  • The MLEs of the model parameters β0 and β1 are the same as the LSEs; both are unbiased.
  • The MLE of the error variance, however, is biased: the MLE is SSE/n, whose expectation is (n-2)σ²/n ≠ σ².

25
2.3 An Unbiased Estimator of σ²
  • An unbiased estimate of σ² is given by
        s² = SSE/(n-2) = Σ (yi - ŷi)²/(n-2)
  • Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of σ²)
  • Find the estimate of σ² for the tread wear data using the results from Example 10.3. We have SSE = 2531.53 and n - 2 = 7, therefore
        s² = 2531.53/7 = 361.65
    which has 7 d.f. The estimate of σ is s = √361.65 = 19.02 mils.

26
3. Statistical Inference on β0 and β1
Under the normal error assumption, the point estimators b0 and b1 are unbiased, with sampling distributions
    b1 ~ N(β1, σ²/Sxx) and b0 ~ N(β0, σ²[1/n + x̄²/Sxx])
For the mathematical derivations, please refer to the Tamhane and Dunlop textbook, p. 331.
27
Statistical Inference on β0 and β1, Cont'd
Pivotal Quantities (P.Q.s):
    (b0 - β0)/SE(b0) ~ t(n-2) and (b1 - β1)/SE(b1) ~ t(n-2)
where SE(b1) = s/√Sxx and SE(b0) = s √(1/n + x̄²/Sxx).
Confidence Intervals (CIs):
    b0 ± t(n-2, α/2) SE(b0) and b1 ± t(n-2, α/2) SE(b1)

28
Statistical Inference on β0 and β1, Cont'd
Hypothesis tests:
-- Test statistics: t1 = b1/SE(b1) for H0: β1 = 0, and t0 = b0/SE(b0) for H0: β0 = 0
-- At the significance level α, we reject H0 in favor of the two-sided alternative if and only if (iff) the test statistic exceeds t(n-2, α/2) in absolute value
-- The first test is used to show whether there is a linear relationship between x and y
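A minimal Python sketch of this t-test and CI for the slope, continuing the tire-data variables from the sketches above (scipy assumed):

    from scipy import stats

    n = len(x)
    s2 = SSE / (n - 2)                # unbiased estimate of sigma^2
    se_b1 = np.sqrt(s2 / Sxx)         # standard error of the slope
    t1 = b1 / se_b1                   # about -11.86 for these data
    tcrit = stats.t.ppf(0.975, n - 2)
    ci_b1 = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)  # 95% CI for beta1
    pval = 2 * stats.t.sf(abs(t1), n - 2)             # two-sided p-value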
29
Analysis of Variance (ANOVA), Cont'd
Mean Square: a sum of squares divided by its d.f., e.g.
    MSR = SSR/1, MSE = SSE/(n-2), F = MSR/MSE
30
Analysis of Variance (ANOVA)
ANOVA Table and Example

Source       SS          d.f.    MS                F
Regression   SSR         1       MSR = SSR/1       MSR/MSE
Error        SSE         n - 2   MSE = SSE/(n-2)
Total        SST         n - 1

Source       SS          d.f.    MS          F
Regression   50,887.20   1       50,887.20   140.71
Error        2,531.53    7       361.65
Total        53,418.73   8
31
4. Finance Application: Market Model
  • One of the most important applications of linear regression is the market model.
  • It is assumed that the rate of return on a stock (R) is linearly related to the rate of return on the overall market (Rm):
        R = β0 + β1 Rm + ε
  • R: rate of return on a particular stock
  • Rm: rate of return on some major stock index
  • The beta coefficient β1 measures how sensitive the stock's rate of return is to changes in the level of the overall market.
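A minimal sketch of estimating a stock's beta by an LS fit in Python; the return series below are made-up placeholders, not data from the slides:

    import numpy as np

    rm = np.array([1.2, -0.5, 2.3, 0.8, -1.1, 1.9, 0.4, -0.7])  # market % returns
    r  = np.array([1.5, -0.2, 2.0, 1.1, -1.4, 1.6, 0.9, -0.3])  # stock % returns
    beta1, beta0 = np.polyfit(rm, r, 1)  # slope (beta) first, then intercept
    print(beta0, beta1)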
32
Example: The Market Model
  • Estimate the market model for Nortel, a stock traded on the Toronto Stock Exchange (TSE).
  • Data consisted of monthly percentage returns for Nortel and monthly percentage returns for all the stocks in the index.

The slope estimate, 0.8877, is a measure of the stock's market-related risk: in this sample, for each 1% increase in the TSE return, the average increase in Nortel's return is 0.8877%.
The coefficient of determination is a measure of how much of the total risk embedded in the Nortel stock is market-related: specifically, 31.37% of the variation in Nortel's returns is explained by the variation in the TSE's returns.
33
5. Regression Diagnostics
5.1 Checking for Model Assumptions
  • Checking for Linearity
  • Checking for Constant Variance
  • Checking for Normality
  • Checking for Independence

34
Checking for Linearity
xi: Mileage. Yi: Groove Depth. Fitted value: Ŷi = b0 + b1 xi. Residual: ei = Yi - Ŷi.

 i   xi    Yi       Ŷi       ei
 1    0    394.33   360.64    33.69
 2    4    329.50   331.51    -2.01
 3    8    291.00   302.39   -11.39
 4   12    255.17   273.27   -18.10
 5   16    229.33   244.15   -14.82
 6   20    204.83   215.02   -10.19
 7   24    179.00   185.90    -6.90
 8   28    163.83   156.78     7.05
 9   32    150.33   127.66    22.67

The residuals are positive at both ends of the mileage range and negative in the middle, a curved pattern suggesting the relationship is not linear.
35
Checking for Normality
36
Checking for Constant Variance
  • [Sample residual plots.] When Var(Y) is not constant, the spread of the residuals changes with x (a funnel-shaped plot).
  • When Var(Y) is constant, the residuals fall in a horizontal band of roughly even width.

37
Checking for Independence
  • Does not usually apply for the simple linear regression model;
  • applies mainly to time series data.

38
5.2 Checking for Outliers and Influential Observations
  • What is an outlier?
  • Why is checking for outliers important?
  • Mathematical definition
  • How to deal with them

39
5.2-A. Intro
  • Recall the box-and-whiskers plot (Chapter 4 of T&D),
  • where a (mild) outlier is defined as any observation that lies outside of Q1 - (1.5 × IQR) and Q3 + (1.5 × IQR) (interquartile range: IQR = Q3 - Q1),
  • and an (extreme) outlier as one that lies outside of Q1 - (3 × IQR) and Q3 + (3 × IQR).
  • In short: an observation "far away" from the rest of the data. (A quick computation of these fences appears below.)
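A minimal Python sketch of the IQR fences, using the five-point example from the next slide (numpy assumed):

    import numpy as np

    data = np.array([1, 3, 5, 9, 120.0])  # the "data with error" sample
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    mild = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)     # mild-outlier fences
    extreme = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # extreme-outlier fences
    print([v for v in data if v < mild[0] or v > mild[1]])  # [120.0]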

40
5.2-B. Why are outliers a problem?
  • They may indicate a sample peculiarity, a data entry error, or another problem.
  • Regression coefficients estimated by minimizing the Sum of Squares for Error (SSE) are very sensitive to outliers => bias or distortion of estimates.
  • Any statistical test based on sample means and variances can be distorted in the presence of outliers => distortion of p-values.
  • Faulty conclusions.
  • Example (below, a single data entry error inflates the mean and variance):
  • (Estimators not sensitive to outliers are said to be robust.)

Sorted data                     Median   Mean   Variance   95% CI for mean
Real data: 1 3 5 9 12              5      6.0      20.6    (0.45, 11.55)
Data with error: 1 3 5 9 120       5     27.6    2676.8    (-36.63, 91.83)
41
5.2-C. Mathematical Definition
  • Outlier
  • The standardized residual is given by
        ei* = ei / (s √(1 - hii))
    where hii is the leverage of the i-th observation (defined on the next slide).
  • If |ei*| > 2, then the corresponding observation may be regarded as an outlier.
  • Example (Tire Tread Wear vs. Mileage): the standardized residuals are

        i    1     2      3      4      5      6      7      8     9
        ei*  2.25  -0.12  -0.66  -1.02  -0.83  -0.57  -0.40  0.43  1.51

  • STUDENTIZED RESIDUAL: a type of standardized residual calculated with the current observation deleted from the analysis.
  • The LS fit can also be excessively influenced by an observation that is not necessarily an outlier as defined above.
42
5.2-C. Mathematical Definition
  • Influential Observation
  • An observation with an extreme x-value, y-value, or both.
  • For simple linear regression (k = 1 predictor), the leverage is
        hii = 1/n + (xi - x̄)²/Sxx
  • On average hii is (k+1)/n; regard any hii > 2(k+1)/n as high leverage.
  • If xi deviates greatly from the mean x̄, then hii is large.
  • The standardized residual will be large for a high-leverage observation.
  • Influence can be thought of as the product of leverage and outlierness.
  • Example: an observation can be influential/high leverage, but not an outlier (see the sketch below).
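A minimal Python sketch of these diagnostics, continuing the tire-data variables from the earlier sketches:

    h = 1.0 / n + (x - xbar) ** 2 / Sxx   # leverage of each observation
    s = np.sqrt(SSE / (n - 2))            # residual standard deviation
    e = y - yhat                          # raw residuals
    e_star = e / (s * np.sqrt(1.0 - h))   # standardized residuals
    print(np.where(np.abs(e_star) > 2))   # candidate outliers (observation 1)
    print(np.where(h > 2 * (1 + 1) / n))  # high-leverage points, k = 1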

43
5.2-C. SAS code of the tire example
SAS code:

data tire;
  input x y;
  datalines;
0 394.33
4 329.50
8 291.00
12 255.17
16 229.33
20 204.83
24 179.00
28 163.83
32 150.33
;
run;

proc reg data=tire;
  model y = x;
  output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
run;

proc print data=resid;
  where abs(r)>2 or lev>(4/9) or cd>(4/9) or abs(dffit)>(2*sqrt(1/9));
run;
44
5.2-C. SAS output of the tire example
SAS output
45
5.2-D. How to Deal with Outliers and Influential Observations
  • Investigate (data errors? rare events? can they be corrected?)
  • Ways to accommodate outliers:
  • nonparametric methods (robust to outliers)
  • data transformations
  • deletion (or report model results both with and without the outliers or influential observations, to see how much they change)

46
5.3 Data Transformations
  • Reasons:
  • to achieve linearity
  • to achieve homogeneity of variance
  • to achieve normality or symmetry about the regression equation

47
Types of Transformation
  • Linearizing transformation:
  • a transformation of the response variable, the predictor variable, or both, which produces an approximately linear relationship between the variables.
  • Variance stabilizing transformation:
  • a transformation applied when the constant variance assumption is violated.

48
Linearizing Transformation
  • Use a mathematical operation, e.g. square root, power, log, exponential, etc.
  • Only one variable needs to be transformed in simple linear regression.
  • Which one, predictor or response? Why?

49
e.g. For the model Y = a exp(-bx), we take a log transformation of Y:
    Y = a exp(-bx)  <=>  log Y = log a - bx

 xi    Yi       log Ŷi   Ŷi = exp(log Ŷi)   Ei = Yi - Ŷi
  0    394.33   5.926    374.64              19.69
  4    329.50   5.807    332.58              -3.08
  8    291.00   5.688    295.24              -4.24
 12    255.17   5.569    262.09              -6.92
 16    229.33   5.450    232.67              -3.34
 20    204.83   5.331    206.54              -1.71
 24    179.00   5.211    183.36              -4.36
 28    163.83   5.092    162.77               1.06
 32    150.33   4.973    144.50               5.83
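A minimal Python sketch of this linearizing fit, continuing the tire-data variables above:

    logy = np.log(y)                 # natural log of groove depth
    c1, c0 = np.polyfit(x, logy, 1)  # fit log Y = c0 + c1*x; c0 ~ 5.93, c1 ~ -0.030
    yhat_log = np.exp(c0 + c1 * x)   # back-transform fitted values to original scale
    resid = y - yhat_log             # residuals on the original scale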
50
Variance Stabilizing Transformation
  • Delta method: a two-term Taylor-series approximation gives
        Var(h(Y)) ≈ [h'(μ)]² g²(μ), where Var(Y) = g²(μ) and E(Y) = μ
  • Set [h'(μ)]² g²(μ) = 1, so that
        h'(μ) = 1/g(μ)  =>  h(y) = ∫ dy/g(y)
  • e.g. Var(Y) = c²μ² where c > 0, so g(μ) = cμ and g(y) = cy; then
        h(y) = ∫ dy/(cy) = (1/c) log y
  • Therefore it is the logarithmic transformation that stabilizes the variance in this case.

51
6. Correlation Analysis
  • Pearson product moment correlation: a measure of how closely two variables share a linear relationship.
  • Useful when it is not possible to determine which variable is the predictor and which is the response.
  • Health vs. wealth: which is the predictor? Which is the response?

52
Statistical Inference on the Correlation Coefficient ρ
  • We can derive a test on the correlation coefficient in the same way that we have been doing in class.
  • Assumptions: X, Y are from the bivariate normal distribution.
  • Start with the point estimator:
  • r, the sample correlation coefficient, is the estimator of the population correlation coefficient ρ.
  • Get the pivotal quantity: the distribution of r itself is quite complicated, so we use
        T0 = r √(n-2) / √(1-r²)
    as the test statistic for H0: ρ = 0.

53
Bivariate Normal Distribution
  • pdf:
        f(x, y) = 1/(2π σ1 σ2 √(1-ρ²)) exp{ -[1/(2(1-ρ²))] [((x-μ1)/σ1)² - 2ρ((x-μ1)/σ1)((y-μ2)/σ2) + ((y-μ2)/σ2)²] }
  • Properties:
  • μ1, μ2: means of X, Y
  • σ1², σ2²: variances of X, Y
  • ρ: the correlation coefficient between X and Y

54
Derivation of T0: are the two tests equivalent?
  • We can use t1 = b1/SE(b1) as a statistic for testing the null hypothesis H0: β1 = 0.
  • Equivalently, we can test H0: ρ = 0.
  • Substituting b1 = r √(Syy/Sxx) and s² = Syy(1-r²)/(n-2), after a few steps we can see that these two t-tests are indeed equivalent:
        t1 = b1/SE(b1) = r √(n-2)/√(1-r²) = t0
55
Exact Statistical Inference on ρ
  • Test:
  • H0: ρ = 0, Ha: ρ ≠ 0
  • Test statistic: t0 = r √(n-2)/√(1-r²)
  • Reject H0 iff |t0| > t(n-2, α/2)
  • Example:
  • A researcher wants to determine if two test instruments give similar results. The two test instruments are administered to a sample of 15 students. The correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level?
  • H0: ρ = 0, Ha: ρ ≠ 0
  • t0 = 0.7 √13/√(1 - 0.7²) = 3.534; for α = .01, t0 = 3.534 > t(13, .005) = 3.012
  • => Reject H0
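A minimal Python check of this example (scipy assumed):

    import numpy as np
    from scipy import stats

    r, n = 0.7, 15
    t0 = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)  # about 3.534
    tcrit = stats.t.ppf(1 - 0.01 / 2, n - 2)       # t(13, .005), about 3.012
    print(t0 > tcrit)                              # True -> reject H0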

56
Approximate Statistical Inference on ρ
  • There is no exact method of testing ρ against an arbitrary ρ0:
  • the distribution of R is very complicated, and
  • T0 ~ t(n-2) only when ρ = 0.
  • To test ρ against an arbitrary ρ0 one can use Fisher's transformation
        ψ = (1/2) ln[(1+r)/(1-r)]
    which is approximately normal with mean (1/2) ln[(1+ρ)/(1-ρ)] and variance 1/(n-3).
  • Therefore, let
        Z = [ψ - (1/2) ln((1+ρ0)/(1-ρ0))] √(n-3), which is approximately N(0, 1) under H0.

57
Approximate Statistical Inference on ρ
  • Test: H0: ρ = ρ0 vs. Ha: ρ ≠ ρ0
  • Sample estimate: ψ = (1/2) ln[(1+r)/(1-r)]
  • Z test statistic: z0 = (ψ - ψ0) √(n-3), where ψ0 = (1/2) ln[(1+ρ0)/(1-ρ0)]
  • We reject H0 if |z0| > z(α/2)
  • CI for ρ: compute ψ ± z(α/2)/√(n-3), then transform the endpoints back through r = (e^(2ψ) - 1)/(e^(2ψ) + 1) = tanh(ψ)
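A minimal Python sketch of the Fisher z test and CI; the values of r, n, ρ0 and α here are illustrative, not from the slides:

    import numpy as np
    from scipy import stats

    r, n, rho0, alpha = 0.7, 15, 0.5, 0.05
    psi, psi0 = np.arctanh(r), np.arctanh(rho0)  # arctanh is Fisher's transformation
    z0 = (psi - psi0) * np.sqrt(n - 3)
    zcrit = stats.norm.ppf(1 - alpha / 2)
    reject = abs(z0) > zcrit
    ci_rho = np.tanh([psi - zcrit / np.sqrt(n - 3),
                      psi + zcrit / np.sqrt(n - 3)])  # approximate CI for rho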
58
Approximate Statistical Inference on ρ using SAS
  • Code
  • Output

59
Pitfalls of Regression and Correlation Analysis
  • Correlation and causation
  • Ticks cause good health
  • Coincidental data
  • Sunspots and Republicans
  • Lurking variables
  • Church, suicide, population
  • Restricted range
  • Local vs. global linearity

60
7. Simple Linear Regression in Matrix Form
  • Linear regression model: yi = β0 + β1 xi + εi, i = 1, ..., n, or in matrix form
        y = Xβ + ε
    where y is the n×1 vector of responses, X is the n×2 matrix whose i-th row is (1, xi), β = (β0, β1)' is the vector of parameters, and ε is the n×1 vector of errors.
  • The normal equations, using summations:
        n b0 + b1 Σ xi = Σ yi
        b0 Σ xi + b1 Σ xi² = Σ xi yi
    or, using matrix notation:
        (X'X) b = X'y
61
Matrix Form of Multiple Regression by LS
The model is y = Xβ + ε in short. The LS criterion is
    Q = (y - Xb)'(y - Xb)
The LS solution is
    b = (X'X)⁻¹ X'y
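A minimal matrix-form sketch in Python, reusing the tire data from the earlier sketches (numpy assumed):

    X = np.column_stack([np.ones_like(x), x])  # n x 2 design matrix
    b = np.linalg.solve(X.T @ X, X.T @ y)      # solves the normal equations
    print(b)                                   # about [360.64, -7.281]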
62
Summary
  Linear regression analysis
  Model assumptions
  Correlation coefficient r
  The least squares (LS) estimates b0 and b1
  Probabilistic model for linear regression
  Correlation analysis
  Outliers? Influential observations? Data transformations?
  Confidence interval and prediction interval
63
  Least squares (LS) fit
  Sample correlation coefficient r
  Statistical inference on β0 and β1
  Prediction interval
  Model assumptions: linearity, constant variance, normality, independence
  Correlation analysis
64
Questions?