Title: Simple Linear Regression
1. Introduction
Example
Husband          Height   Wife               Height
George Bush      1.81 m   Laura Bush         ?
Brad Pitt        1.83 m   Angelina Jolie     1.70 m
David Beckham    1.83 m   Victoria Beckham   1.68 m
- Goal: to predict the height of the wife in a couple, based on the husband's height.
- Response (outcome, or dependent) variable (Y): height of the wife
- Predictor (explanatory, or independent) variable (X): height of the husband
Regression analysis
- Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables.
- When there is just one predictor variable, we use simple linear regression. When there are two or more predictor variables, we use multiple linear regression.
- When it is not clear which variable is the response and which is the predictor, correlation analysis is used to study the strength of the relationship.
History
- The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809.
- The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
- This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.
A probabilistic model
- We denote the n observed values of the predictor variable x as $x_1, x_2, \ldots, x_n$.
- We denote the corresponding observed values of the response variable Y as $y_1, y_2, \ldots, y_n$.
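Notation of the simple linear regression model
$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$
- $Y_i$: the random variable whose observed value $y_i$ depends on $x_i$
- $\epsilon_i$: random error with $E(\epsilon_i) = 0$
- $\mu_i = E(Y_i) = \beta_0 + \beta_1 x_i$: the unknown mean of $Y_i$ (the true regression line)
- $\beta_1$: unknown slope
- $\beta_0$: unknown intercept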
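Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \epsilon$
- y: dependent variable
- x: independent variable
- $\beta_0$: intercept
- $\beta_1$: slope of the line
- $\epsilon$: error variable
$\beta_0$ and $\beta_1$ are unknown and therefore must be estimated from the data.
[Figure: the line $y = \beta_0 + \beta_1 x$, crossing the y-axis at $\beta_0$, with slope $\beta_1$ = rise/run.]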
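4 BASIC ASSUMPTIONS for statistical inference
- The mean of $Y_i$ is a linear function of the predictor variable: $E(Y_i) = \beta_0 + \beta_1 x_i$.
- The $Y_i$ have a common variance $\sigma^2$, the same for all values of x.
- The errors $\epsilon_i$ are normally distributed.
- The errors $\epsilon_i$ are independent.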
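Comments
1. "Linear" means linear in the parameters $\beta_0$ and $\beta_1$, not necessarily linear in x.
   Example: $E(Y) = \beta_0 + \beta_1 \log x$ is still a linear model (linear in the predictor $\log x$).
2. The predictor variable need not be set at predetermined fixed values; it can be random along with Y. The model can then be considered a conditional model.
   Example: height and weight of children. Given height (X), predict weight (Y).
   The regression line is the conditional expectation of Y given X = x: $E(Y \mid X = x) = \beta_0 + \beta_1 x$.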
2. Fitting the Simple Linear Regression Model
2.1 Least Squares (LS) Fit
Example 10.1 (Tire Tread Wear vs. Mileage: Scatter Plot). From Statistics and Data Analysis, Tamhane and Dunlop, Prentice Hall.
Estimating the Coefficients
- The estimates are determined by
  - drawing a sample from the population of interest,
  - calculating sample statistics,
  - producing a straight line that cuts through the data.
[Figure: scatter plot of y against x.]
The question is: which straight line fits best?
For the least squares regression method, the best line is the one that minimizes the sum of squared vertical differences between the points and the line.
[Figure: the points (1, 2), (2, 4), (3, 1.5) and (4, 3.2) with a candidate straight line; the vertical distance from each point to the line is the difference that gets squared.]
The smaller the sum of squared differences, the better the fit of the line to the data.
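The best-fitting straight line, in the sense of minimizing
$Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$,
gives the LS estimates $\hat\beta_0$ and $\hat\beta_1$.
- One way to find the LS estimates $\hat\beta_0$ and $\hat\beta_1$ is to take the partial derivatives of Q:
  $\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)], \quad \frac{\partial Q}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i [y_i - (\beta_0 + \beta_1 x_i)]$
- Setting these partial derivatives equal to zero and simplifying, we get the normal equations
  $\beta_0 n + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$
  $\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$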
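- Solving the equations, we get
  $\hat\beta_0 = \frac{(\sum x_i^2)(\sum y_i) - (\sum x_i)(\sum x_i y_i)}{n\sum x_i^2 - (\sum x_i)^2}, \quad \hat\beta_1 = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2}$
- To simplify, we introduce
  $S_{xy} = \sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - \frac{1}{n}(\sum x_i)(\sum y_i)$
  $S_{xx} = \sum_{i=1}^{n}(x_i - \bar x)^2 = \sum x_i^2 - \frac{1}{n}(\sum x_i)^2$
  $S_{yy} = \sum_{i=1}^{n}(y_i - \bar y)^2 = \sum y_i^2 - \frac{1}{n}(\sum y_i)^2$
  so that $\hat\beta_1 = \frac{S_{xy}}{S_{xx}}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.
- The resulting equation $\hat y = \hat\beta_0 + \hat\beta_1 x$ is known as the least squares regression line, which is an estimate of the true regression line.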
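The closed-form estimates can be checked with a few lines of Python. This is a minimal sketch, not code from the course; the helper name `least_squares` is illustrative. It is applied to the four points (1, 2), (2, 4), (3, 1.5) and (4, 3.2) from the least squares illustration earlier in this section.

```python
def least_squares(x, y):
    """Return the LS estimates (b0, b1) of the line y = b0 + b1 * x.

    Uses b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with n >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept: the LS line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares([1, 2, 3, 4], [2, 4, 1.5, 3.2])
print(f"y_hat = {b0:.2f} + {b1:.2f} x")   # y_hat = 2.40 + 0.11 x
```

Working with the centered sums $S_{xx}$ and $S_{xy}$ rather than the raw-sum formula gives the same answer but is less prone to round-off when the x values are large.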
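Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit)
- Find the equation of the LS line for the tire tread wear data from Table 10.1. We have
  $\sum x_i = 144$, $\sum y_i = 2197.32$, $\sum x_i^2 = 3264$, $\sum y_i^2 = 589887.08$, $\sum x_i y_i = 28167.72$,
  and n = 9. From these we calculate
  $\bar x = 16$, $\bar y = 244.147$, $S_{xx} = 3264 - \frac{144^2}{9} = 960$, $S_{xy} = 28167.72 - \frac{(144)(2197.32)}{9} = -6989.40$
- The slope and intercept estimates are
  $\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{-6989.40}{960} = -7.281, \quad \hat\beta_0 = \bar y - \hat\beta_1 \bar x = 244.147 + 7.281 \times 16 = 360.64$
- Therefore, the equation of the LS line is $\hat y = 360.64 - 7.281x$.
- Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving.
- Given a particular $x^* = 25$ (25,000 miles),
- we can find $\hat y^* = 360.64 - 7.281 \times 25 = 178.62$.
- This means that the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.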
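A minimal Python sketch reproducing the Example 10.2 arithmetic. The nine (mileage, groove depth) pairs are the tire data listed in the residual table of Section 5.1; mileage is in thousands of miles and depth in mils.

```python
# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # 960
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # about -6989.40

b1 = s_xy / s_xx           # slope
b0 = y_bar - b1 * x_bar    # intercept
y_hat_25 = b0 + b1 * 25    # estimated mean groove depth at 25,000 miles

print(f"LS line: y_hat = {b0:.2f} + ({b1:.3f}) x")
print(f"estimated mean depth at x = 25: {y_hat_25:.2f} mils")
```

Printing the intermediate sums $S_{xx}$ and $S_{xy}$ is a quick way to cross-check hand calculations against the values in the example.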
2.2 Goodness of Fit of the LS Line
- Coefficient of determination and correlation
- The residuals $e_i = y_i - \hat y_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$, $i = 1, 2, \ldots, n$,
  are used to evaluate the goodness of fit of the LS line.
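- We define
  $SST = \sum_{i=1}^{n}(y_i - \bar y)^2 = \sum_{i=1}^{n}(\hat y_i - \bar y)^2 + \sum_{i=1}^{n}(y_i - \hat y_i)^2 = SSR + SSE$
- Note: total sum of squares (SST), which equals $S_{yy}$
- Regression sum of squares (SSR)
- Error sum of squares (SSE)
- $r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$ is called the coefficient of determination.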
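Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)
- For the tire tread wear data, calculate $r^2$ using the results from Example 10.2. We have $SST = S_{yy} = 589887.08 - \frac{2197.32^2}{9} = 53418.73$.
- Next calculate $SSE = \sum e_i^2 = 2531.53$ and $SSR = SST - SSE = 53418.73 - 2531.53 = 50887.20$.
- Therefore $r^2 = \frac{50887.20}{53418.73} = 0.953$.
- The Pearson correlation is $r = -\sqrt{0.953} = -0.976$,
- where the sign of r follows from the sign of $\hat\beta_1 = -7.281$.
Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.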
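The decomposition SST = SSR + SSE and the value of r can be verified numerically. This is a minimal sketch on the same tire data; the sign of r is copied from the slope, as stated above.

```python
import math

# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total SS (= S_yy)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression SS
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error SS

r_squared = ssr / sst
r = math.copysign(math.sqrt(r_squared), b1)   # r takes the sign of the slope

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}")
```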
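The Maximum Likelihood Estimators (MLE)
- Consider the linear model $y_i = a + b x_i + \epsilon_i$,
- where $\epsilon_i$ is drawn from a normal population with mean 0 and standard deviation $\sigma$. The likelihood function for Y is
  $L(a, b, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - a - b x_i)^2}{2\sigma^2}\right]$
- Thus, the log-likelihood for the data is
  $\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - a - b x_i)^2$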
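The MLE Estimators
- Solving $\frac{\partial \ln L}{\partial a} = 0$, $\frac{\partial \ln L}{\partial b} = 0$, $\frac{\partial \ln L}{\partial \sigma^2} = 0$,
- we obtain the MLEs of the three unknown model parameters:
  $\hat b = \frac{S_{xy}}{S_{xx}}, \quad \hat a = \bar y - \hat b \bar x, \quad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat a - \hat b x_i)^2 = \frac{SSE}{n}$
- The MLEs of the model parameters a and b are the same as the LSEs; both are unbiased.
- The MLE of the error variance, however, is biased: $E(\hat\sigma^2) = \frac{n-2}{n}\sigma^2$.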
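As a numerical sanity check of the MLE result (a sketch, assuming the tire data from Example 10.2), the log-likelihood can be evaluated at the ML estimates of a and b for a few candidate values of $\sigma^2$. The largest value occurs at SSE/n, not at the unbiased SSE/(n - 2). The function name `log_likelihood` is illustrative.

```python
import math

# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = s_xy / s_xx          # MLE of the slope (same as LS)
a = y_bar - b * x_bar    # MLE of the intercept (same as LS)


def log_likelihood(a, b, sigma2):
    """Normal log-likelihood ln L(a, b, sigma^2) for y_i = a + b x_i + e_i."""
    q = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - q / (2 * sigma2)


sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
sigma2_mle = sse / n              # biased MLE of sigma^2
sigma2_unbiased = sse / (n - 2)   # unbiased estimator s^2

# Among these candidates, ln L is largest at the MLE.
candidates = [0.8 * sigma2_mle, sigma2_mle, 1.2 * sigma2_mle, sigma2_unbiased]
best = max(candidates, key=lambda v: log_likelihood(a, b, v))
print(f"MLE of sigma^2 = {sigma2_mle:.2f}, s^2 = {sigma2_unbiased:.2f}")
```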
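2.3 An Unbiased Estimator of $\sigma^2$
- An unbiased estimate of $\sigma^2$ is given by $s^2 = \frac{\sum e_i^2}{n-2} = \frac{SSE}{n-2}$.
- Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of $\sigma$)
- Find the estimate of $\sigma$ for the tread wear data using the results from Example 10.3. We have SSE = 2531.53 and n - 2 = 7, therefore
  $s^2 = \frac{2531.53}{7} = 361.65$,
- which has 7 d.f. The estimate of $\sigma$ is $s = \sqrt{361.65} = 19.02$ mils.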
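The Example 10.4 arithmetic in Python, as a minimal sketch using the SSE and n quoted above:

```python
import math

sse = 2531.53   # error sum of squares for the tire data
n = 9           # number of observations
df = n - 2      # two parameters (intercept and slope) were estimated

s2 = sse / df   # unbiased estimate of sigma^2
s = math.sqrt(s2)
print(f"s^2 = {s2:.2f} on {df} d.f., s = {s:.2f} mils")   # s^2 = 361.65 on 7 d.f., s = 19.02 mils
```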
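3. Statistical Inference on $\beta_0$ and $\beta_1$
Under the normal error assumption:
- Point estimators: $\hat\beta_0$ and $\hat\beta_1$
- Sampling distributions of $\hat\beta_0$ and $\hat\beta_1$:
  $\hat\beta_0 \sim N\left(\beta_0, \frac{\sigma^2 \sum x_i^2}{n S_{xx}}\right), \quad \hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)$
  $SE(\hat\beta_0) = s\sqrt{\frac{\sum x_i^2}{n S_{xx}}}, \quad SE(\hat\beta_1) = \frac{s}{\sqrt{S_{xx}}}$
For mathematical derivations, please refer to the Tamhane and Dunlop textbook, p. 331.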
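Statistical Inference on $\beta_0$ and $\beta_1$, cont.
Pivotal quantities (P.Q.s):
$\frac{\hat\beta_0 - \beta_0}{SE(\hat\beta_0)} \sim t_{n-2}, \quad \frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim t_{n-2}$
Confidence intervals (CIs):
$\hat\beta_0 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_0), \quad \hat\beta_1 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_1)$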
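Statistical Inference on $\beta_0$ and $\beta_1$, cont.
Hypothesis tests:
(1) $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
(2) $H_0: \beta_1 = \beta_1^0$ vs. $H_a: \beta_1 \neq \beta_1^0$
- Test statistics: $t_0 = \frac{\hat\beta_1}{SE(\hat\beta_1)}$ for (1) and $t_0 = \frac{\hat\beta_1 - \beta_1^0}{SE(\hat\beta_1)}$ for (2).
- At the significance level $\alpha$, we reject $H_0$ in favor of $H_a$ if and only if (iff) $|t_0| > t_{n-2,\alpha/2}$.
- The first test is used to show whether there is a linear relationship between x and y.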
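A minimal sketch of the CI and the t-test for $\beta_1$ (plus $SE(\hat\beta_0)$) on the tire data. SciPy is deliberately avoided: the critical value $t_{7,.025} = 2.365$ is taken from a t table and hard-coded as an assumption, so it is valid only for n = 9 and $\alpha = .05$.

```python
import math

# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

se_b1 = s / math.sqrt(s_xx)
se_b0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * s_xx))

T_CRIT = 2.365   # assumed: t_{7, 0.025} from a t table (alpha = 0.05, n - 2 = 7)
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

t0 = b1 / se_b1                  # test statistic for H0: beta1 = 0
reject_h0 = abs(t0) > T_CRIT

print(f"SE(b0) = {se_b0:.2f}, SE(b1) = {se_b1:.4f}")
print(f"95% CI for beta1: ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
print(f"t0 = {t0:.2f}, reject H0: beta1 = 0? {reject_h0}")
```

The interval lies entirely below zero, which agrees with rejecting $H_0: \beta_1 = 0$ at the same level.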
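Analysis of Variance (ANOVA), cont.
Mean square: a sum of squares divided by its d.f.
$MSR = \frac{SSR}{1}, \quad MSE = \frac{SSE}{n-2}, \quad F = \frac{MSR}{MSE} = t_0^2 \sim F_{1,n-2}$ under $H_0: \beta_1 = 0$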
Analysis of Variance (ANOVA)
ANOVA table
Source of Variation (Source)   Sum of Squares (SS)   Degrees of Freedom (d.f.)   Mean Square (MS)    F
Regression                     SSR                   1                           MSR = SSR/1         F = MSR/MSE
Error                          SSE                   n - 2                       MSE = SSE/(n - 2)
Total                          SST                   n - 1

Example (tire tread wear data)
Source       SS          d.f.   MS          F
Regression   50,887.20   1      50,887.20   140.71
Error        2,531.53    7      361.65
Total        53,418.73   8
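A minimal Python sketch that builds the ANOVA table for the tire data from the fitted line and computes F = MSR/MSE. The column widths are arbitrary formatting choices.

```python
# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = ssr + sse

df_reg, df_err = 1, n - 2
msr = ssr / df_reg          # mean square = SS / d.f.
mse = sse / df_err
f_stat = msr / mse

print(f"{'Source':<12}{'SS':>12}{'d.f.':>6}{'MS':>12}{'F':>9}")
print(f"{'Regression':<12}{ssr:>12.2f}{df_reg:>6}{msr:>12.2f}{f_stat:>9.2f}")
print(f"{'Error':<12}{sse:>12.2f}{df_err:>6}{mse:>12.2f}")
print(f"{'Total':<12}{sst:>12.2f}{n - 1:>6}")
```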
4. Finance Application: Market Model
- One of the most important applications of linear regression is the market model.
- It is assumed that the rate of return on a stock (R) is linearly related to the rate of return on the overall market:
  $R = \beta_0 + \beta_1 R_m + \epsilon$
  where R is the rate of return on a particular stock and $R_m$ is the rate of return on some major stock index.
- The beta coefficient $\beta_1$ measures how sensitive the stock's rate of return is to changes in the level of the overall market.
Example: the market model
- Estimate the market model for Nortel, a stock traded on the Toronto Stock Exchange (TSE).
- The data consisted of monthly percentage returns for Nortel and monthly percentage returns for all the stocks.
- $\hat\beta_1 = 0.8877$: this is a measure of the stock's market-related risk. In this sample, for each 1% increase in the TSE return, the average increase in Nortel's return is 0.8877%.
- $r^2 = 0.3137$: this is a measure of the total risk embedded in the Nortel stock that is market-related. Specifically, 31.37% of the variation in Nortel's return is explained by the variation in the TSE's returns.
5. Regression Diagnostics
5.1 Checking for Model Assumptions
- Checking for linearity
- Checking for constant variance
- Checking for normality
- Checking for independence
Checking for Linearity
$x_i$: mileage (1000 miles); $y_i$: groove depth (mils); $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$: fitted value; $e_i = y_i - \hat y_i$: residual
i   x_i   y_i      ŷ_i      e_i
1    0    394.33   360.64    33.69
2    4    329.50   331.51    -2.01
3    8    291.00   302.39   -11.39
4   12    255.17   273.27   -18.10
5   16    229.33   244.15   -14.82
6   20    204.83   215.02   -10.19
7   24    179.00   185.90    -6.90
8   28    163.83   156.78     7.05
9   32    150.33   127.66    22.67
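The fitted values and residuals in the table can be recomputed with a short sketch (same tire data). The residual signs come out positive at both ends and negative in the middle; such a curved pattern in a residual plot suggests the straight-line model misses some curvature.

```python
# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

residuals = []
print(f"{'i':>2}{'x_i':>5}{'y_i':>9}{'y_hat_i':>9}{'e_i':>8}")
for i, (xi, yi) in enumerate(zip(x, y), start=1):
    y_hat = b0 + b1 * xi      # fitted value
    e = yi - y_hat            # residual
    residuals.append(e)
    print(f"{i:>2}{xi:>5}{yi:>9.2f}{y_hat:>9.2f}{e:>8.2f}")

signs = "".join("+" if e > 0 else "-" for e in residuals)
print("residual signs:", signs)   # residual signs: +------++
```

With an intercept in the model, the LS residuals always sum to zero, so the pattern of signs, not their total, is what carries information about linearity.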
Checking for Normality
Checking for Constant Variance
[Figure: sample residual plots when Var(Y) is not constant, and a sample residual plot when Var(Y) is constant.]
Checking for Independence
- Generally not checked for the simple linear regression model
- Mainly applies to time series data
5.2 Checking for Outliers and Influential Observations
- What is an OUTLIER?
- Why is checking for outliers important?
- Mathematical definition
- How to deal with them
5.2-A. Intro
- Recall the box-and-whiskers plot (Chapter 4 of Tamhane and Dunlop),
- where a (mild) OUTLIER is defined as any observation that lies outside of Q1 - (1.5 × IQR) and Q3 + (1.5 × IQR) (interquartile range, IQR = Q3 - Q1),
- and an (extreme) OUTLIER as one that lies outside of Q1 - (3 × IQR) and Q3 + (3 × IQR).
- Informally: an observation "far away" from the rest of the data.
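The fence rules above can be sketched with Python's standard `statistics` module. Quartile conventions differ between textbooks and software; the `method="inclusive"` choice here is an assumption and may not match Chapter 4 of Tamhane and Dunlop exactly. The data are the "data with error" values from the next example, and the helper name `outlier_fences` is illustrative.

```python
import statistics


def outlier_fences(data, k):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR); k = 1.5 for mild, 3 for extreme."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [1, 3, 5, 9, 120]
mild_lo, mild_hi = outlier_fences(data, 1.5)        # (-6.0, 18.0)
extreme_lo, extreme_hi = outlier_fences(data, 3.0)  # (-15.0, 27.0)

mild = [v for v in data if not mild_lo <= v <= mild_hi]
extreme = [v for v in data if not extreme_lo <= v <= extreme_hi]
print("mild outliers:", mild, "extreme outliers:", extreme)
```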
5.2-B. Why are outliers a problem?
- They may indicate a sample peculiarity, a data entry error, or another problem.
- Regression coefficients estimated by minimizing the Sum of Squares for Error (SSE) are very sensitive to outliers → bias or distortion of the estimates.
- Any statistical test based on sample means and variances can be distorted in the presence of outliers → distortion of p-values.
- Faulty conclusions.
- Example (estimators not sensitive to outliers are said to be robust):

                  Sorted Data     Median   Mean   Variance   95% CI for mean
Real Data         1 3 5 9 12      5        6.0    20.0       (0.45, 11.55)
Data with Error   1 3 5 9 120     5        27.6   2676.8     (-36.63, 91.83)
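The table can be reproduced with the standard `statistics` module (a minimal sketch; the helper name `summarize` is illustrative). The critical value $t_{4,.025} = 2.776$ is hard-coded from a t table as an assumption, so the interval is valid only for samples of size 5. Only the median survives the data entry error unchanged.

```python
import math
import statistics

T_4 = 2.776   # assumed: t_{4, 0.025} from a t table (n = 5, 95% CI)


def summarize(data):
    """Median, mean, sample variance and 95% t-interval for the mean (n = 5 only)."""
    n = len(data)
    mean = statistics.mean(data)
    var = statistics.variance(data)          # divisor n - 1
    half_width = T_4 * math.sqrt(var / n)
    return statistics.median(data), mean, var, (mean - half_width, mean + half_width)


real = summarize([1, 3, 5, 9, 12])
with_error = summarize([1, 3, 5, 9, 120])

for label, (med, mean, var, (lo, hi)) in [("Real data", real), ("Data with error", with_error)]:
    print(f"{label:<16} median={med} mean={mean:.1f} var={var:.1f} CI=({lo:.2f}, {hi:.2f})")
```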
5.2-C. Mathematical Definition
- Outlier
- The standardized residual is given by
  $e_i^* = \frac{e_i}{SE(e_i)} = \frac{e_i}{s\sqrt{1 - h_{ii}}}, \quad h_{ii} = \frac{1}{n} + \frac{(x_i - \bar x)^2}{S_{xx}}$
- If $|e_i^*| > 2$, then the corresponding observation may be regarded as an outlier.
- Example (Tire Tread Wear vs. Mileage): standardized residuals
  i       1      2      3      4      5      6      7      8     9
  e_i*    2.25  -0.12  -0.66  -1.02  -0.83  -0.57  -0.40  0.43  1.51
  Only observation 1 has $|e_i^*| > 2$, so it may be regarded as an outlier.
- STUDENTIZED RESIDUAL: a type of standardized residual calculated with the current observation deleted from the analysis.
- The LS fit can be excessively influenced by an observation that is not necessarily an outlier as defined above.
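A minimal sketch computing the standardized residuals $e_i^* = e_i / (s\sqrt{1 - h_{ii}})$ for the tire data; it reproduces the table of standardized residuals and flags observation 1.

```python
import math

# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation in simple linear regression.
leverage = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
# Standardized residual: e_i / SE(e_i) with SE(e_i) = s * sqrt(1 - h_ii).
standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

print([round(v, 2) for v in standardized])
outliers = [i for i, v in enumerate(standardized, start=1) if abs(v) > 2]
print("possible outliers (|e*| > 2):", outliers)   # possible outliers (|e*| > 2): [1]
```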
5.2-C. Mathematical Definition (cont.)
- Influential observation
- An observation with an extreme x-value, y-value, or both.
- On average $h_{ii}$ is $(k+1)/n$, where k is the number of predictors (k = 1 in simple linear regression); regard any $h_{ii} > 2(k+1)/n$ as high leverage.
- If $x_i$ deviates greatly from $\bar x$, then $h_{ii}$ is large.
- For a given residual $e_i$, the standardized residual is larger for a high-leverage observation, because it is divided by $s\sqrt{1 - h_{ii}}$.
- Influence can be thought of as the product of leverage and outlierness.
- Example (observation is influential/high leverage, but not an outlier)
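A minimal sketch of the leverage rule for the tire data (k = 1 predictor, so the cutoff is 2(k + 1)/n = 4/9). As an addition not on the slide, it also computes Cook's distance, $D_i = \frac{(e_i^*)^2}{k+1} \cdot \frac{h_{ii}}{1 - h_{ii}}$, one standard way of combining leverage and outlierness into a single influence measure.

```python
import math

# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n, k = len(x), 1                       # k = number of predictors
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

leverage = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
cutoff = 2 * (k + 1) / n               # 2(k + 1)/n = 4/9 for n = 9
high_leverage = [i for i, h in enumerate(leverage, start=1) if h > cutoff]

# Cook's distance (not on the slide): squared standardized residual
# divided by k + 1, times h_ii / (1 - h_ii).
standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
cooks_d = [r ** 2 / (k + 1) * h / (1 - h) for r, h in zip(standardized, leverage)]

print(f"mean leverage = {sum(leverage) / n:.4f} (compare (k + 1)/n = {(k + 1) / n:.4f})")
print("high leverage (h_ii > 4/9):", high_leverage)
print("Cook's distances:", [round(d, 2) for d in cooks_d])
```

In this data set no observation exceeds the leverage cutoff, so observation 1 stands out through its large residual rather than through an extreme x-value.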
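5.2-C. SAS code of the tire example
    data tire;
      input x y @@;   /* @@ reads several (x, y) pairs from each line */
      datalines;
    0 394.33  4 329.50  8 291.00  12 255.17  16 229.33
    20 204.83  24 179.00  28 163.83  32 150.33
    ;
    run;
    proc reg data=tire;
      model y = x;
      /* save studentized residuals, leverage, Cook's D and DFFITS */
      output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
    run;
    /* list observations that exceed the cutoffs used in this example */
    proc print data=resid;
      where abs(r) > 2 or lev > (4/9) or cd > (4/9) or abs(dffit) > (2*sqrt(1/9));
    run;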
5.2-C. SAS output of the tire example
5.2-D. How to Deal with Outliers and Influential Observations
- Investigate (Data errors? Rare events? Can they be corrected?)
- Ways to accommodate outliers:
  - Nonparametric methods (robust to outliers)
  - Data transformations
  - Deletion (or report model results both with and without the outliers or influential observations to see how much they change)
5.3 Data Transformations
- Reasons:
  - To achieve linearity
  - To achieve homogeneity of variance
  - To achieve normality or symmetry about the regression equation
Types of Transformation
- Linearizing transformation: a transformation of the response variable, the predictor variable, or both, which produces an approximate linear relationship between the variables.
- Variance-stabilizing transformation: a transformation applied when the constant variance assumption is violated.
Linearizing Transformation
- Use a mathematical operation, e.g., square root, power, log, exponential, etc.
- Only one variable needs to be transformed in simple linear regression.
- Which one? Predictor or response? Why?
e.g., we take a log transformation of Y: $Y = a\exp(-bx) \iff \log Y = \log a - bx$
x_i   y_i      fitted log y_i   ŷ_i = exp(fitted log y_i)   e_i = y_i - ŷ_i
 0    394.33   5.926            374.64                       19.69
 4    329.50   5.807            332.58                       -3.08
 8    291.00   5.688            295.24                       -4.24
12    255.17   5.569            262.09                       -6.92
16    229.33   5.450            232.67                       -3.34
20    204.83   5.331            206.54                       -1.71
24    179.00   5.211            183.36                       -4.36
28    163.83   5.092            162.77                        1.06
32    150.33   4.973            144.50                        5.83
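A minimal sketch of the log-linear fit behind the table: regress the natural log of y on x by least squares, then back-transform with exp(). Using natural logs is an assumption, consistent with the exp(·) column in the table.

```python
import math

# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
log_y = [math.log(v) for v in y]            # natural log, so exp() undoes it

x_bar, ly_bar = sum(x) / n, sum(log_y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y)) / s_xx
intercept = ly_bar - slope * x_bar          # estimates log a
a_hat, b_hat = math.exp(intercept), -slope  # model Y = a * exp(-b x)

fitted_log = [intercept + slope * xi for xi in x]
fitted = [math.exp(v) for v in fitted_log]  # back-transformed fitted values
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"log y_hat = {intercept:.3f} - {b_hat:.4f} x")
for xi, lv, fv, e in zip(x, fitted_log, fitted, residuals):
    print(f"{xi:>3}{lv:>8.3f}{fv:>9.2f}{e:>8.2f}")
print(f"sum of squared residuals (original scale): {sum(e * e for e in residuals):.2f}")
```

On the original scale the squared residuals sum to far less than the straight-line SSE of 2531.53, although, unlike LS residuals, these back-transformed residuals need not sum to zero.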
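Variance Stabilizing Transformation
- Delta method: two-term Taylor-series approximation
  $h(Y) \approx h(\mu) + h'(\mu)(Y - \mu)$
- $\mathrm{Var}(h(Y)) \approx [h'(\mu)]^2 g^2(\mu)$, where $\mathrm{Var}(Y) = g^2(\mu)$ and $E(Y) = \mu$.
- Set $[h'(\mu)]^2 g^2(\mu) = 1$,
- so $h'(\mu) = \frac{1}{g(\mu)}$
- and $h(\mu) = \int \frac{d\mu}{g(\mu)} \Rightarrow h(y) = \int \frac{dy}{g(y)}$.
- e.g., $\mathrm{Var}(Y) = c^2\mu^2$, where $c > 0$: $g(\mu) = c\mu \Rightarrow g(y) = cy$,
- $h(y) = \int \frac{dy}{cy} = \frac{1}{c}\log y$.
- Therefore it is the logarithmic transformation.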
6. Correlation Analysis
- Pearson product-moment correlation: a measure of how closely two variables share a linear relationship.
  $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
- Useful when it is not possible to determine which variable is the predictor and which is the response.
- Health vs. wealth: which is the predictor? Which is the response?
Statistical Inference on the Correlation Coefficient ρ
- We can derive a test on the correlation coefficient in the same way that we have been doing in class.
- Assumptions:
  - X, Y are from the bivariate normal distribution.
- Start with the point estimator:
  - r, the sample correlation coefficient, is an estimator of the population correlation coefficient ρ.
- Get the pivotal quantity:
  - The distribution of r is quite complicated.
  - $T_0$, the test statistic for ρ = 0:
    $T_0 = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} \sim t_{n-2}$ under $H_0: \rho = 0$
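Bivariate Normal Distribution
- pdf:
  $f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]\right\}$
- Properties:
  - $\mu_1, \mu_2$: means of X, Y
  - $\sigma_1^2, \sigma_2^2$: variances of X, Y
  - ρ: the correlation coefficient between X and Y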
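Derivation of $T_0$
Are these equivalent?
$t = \frac{\hat\beta_1}{SE(\hat\beta_1)} = \frac{\hat\beta_1\sqrt{S_{xx}}}{s} \quad \text{and} \quad T_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
- We can use t as a statistic for testing against the null hypothesis $H_0: \beta_1 = 0$.
- Equivalently, we can test against $H_0: \rho = 0$.
Substitute $\hat\beta_1 = r\sqrt{S_{yy}/S_{xx}}$ and $s^2 = \frac{SSE}{n-2} = \frac{(1-r^2)S_{yy}}{n-2}$.
After a few steps we can see that these two t-tests are indeed equivalent.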
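The algebraic equivalence can also be checked numerically on the tire data (a minimal sketch): the slope t-statistic and $T_0$ agree to floating-point precision.

```python
import math

# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

t_slope = b1 / (s / math.sqrt(s_xx))                   # t for H0: beta1 = 0
r = s_xy / math.sqrt(s_xx * s_yy)                      # sample correlation
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # T0 for H0: rho = 0

print(f"t (slope) = {t_slope:.4f}, T0 (correlation) = {t_corr:.4f}")
```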
Exact Statistical Inference on ρ
- Example
- A researcher wants to determine whether two test instruments give similar results. The two test instruments are administered to a sample of 15 students. The correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level?
- $H_0: \rho = 0$, $H_a: \rho \neq 0$
- $t_0 = \frac{0.7\sqrt{15-2}}{\sqrt{1 - 0.7^2}} = 3.534$
- For α = .01, $t_0 = 3.534 > t_{13,.005} = 3.012$
- ⇒ Reject $H_0$.
- Test:
- $H_0: \rho = 0$, $H_a: \rho \neq 0$
- Test statistic: $T_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
- Reject $H_0$ iff $|t_0| > t_{n-2,\alpha/2}$.
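The example's arithmetic as a small Python function (a sketch; the name `corr_t_statistic` is illustrative). The critical value 3.012 is the $t_{13,.005}$ quoted in the example and is hard-coded rather than computed.

```python
import math


def corr_t_statistic(r, n):
    """T0 = r * sqrt(n - 2) / sqrt(1 - r^2), distributed t_{n-2} when rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t0 = corr_t_statistic(0.7, 15)
T_CRIT = 3.012          # t_{13, .005}, as given in the example
reject_h0 = abs(t0) > T_CRIT
print(f"t0 = {t0:.3f}; reject H0 at alpha = .01: {reject_h0}")   # t0 = 3.534; reject H0 at alpha = .01: True
```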
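Approximate Statistical Inference on ρ
- There is no exact method of testing ρ against an arbitrary $\rho_0$:
  - the distribution of R is very complicated;
  - $T_0 \sim t_{n-2}$ only when ρ = 0.
- To test ρ against an arbitrary $\rho_0$, one can use Fisher's transformation:
  $\tanh^{-1} R = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) \approx N\left(\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right), \frac{1}{n-3}\right)$
- Therefore, let
  $\psi = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)$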
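Approximate Statistical Inference on ρ (cont.)
- Test: $H_0: \rho = \rho_0$ vs. $H_a: \rho \neq \rho_0$, i.e., $H_0: \psi = \psi_0 = \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right)$
- Sample estimate: $\hat\psi = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$
- Z test statistic: $z_0 = \sqrt{n-3}\,(\hat\psi - \psi_0)$
- CI for ρ: $\tanh\left(\hat\psi - \frac{z_{\alpha/2}}{\sqrt{n-3}}\right) \le \rho \le \tanh\left(\hat\psi + \frac{z_{\alpha/2}}{\sqrt{n-3}}\right)$
We reject $H_0$ if $|z_0| > z_{\alpha/2}$.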
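A minimal sketch of the Fisher-z test and CI using only the standard library (`math.atanh`, `math.tanh`, `statistics.NormalDist`). The numbers reuse r = 0.7 and n = 15 from the exact-inference example; the null value $\rho_0 = 0.5$ is a hypothetical choice for illustration, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def fisher_z_inference(r, n, rho0, alpha=0.05):
    """Approximate z-test of H0: rho = rho0 and (1 - alpha) CI for rho."""
    psi_hat = math.atanh(r)            # 0.5 * ln((1 + r) / (1 - r))
    psi0 = math.atanh(rho0)
    se = 1 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z0 = (psi_hat - psi0) / se
    ci = (math.tanh(psi_hat - z_crit * se), math.tanh(psi_hat + z_crit * se))
    return z0, ci, abs(z0) > z_crit


# rho0 = 0.5 is a hypothetical null value chosen for illustration.
z0, (lo, hi), reject_h0 = fisher_z_inference(r=0.7, n=15, rho0=0.5)
print(f"z0 = {z0:.2f}, 95% CI for rho: ({lo:.3f}, {hi:.3f}), reject H0: {reject_h0}")
```

The CI is computed on the ψ scale and mapped back with tanh, so it is not symmetric about r.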
Approximate Statistical Inference on ρ using SAS
Pitfalls of Regression and Correlation Analysis
- Correlation and causation
  - Ticks cause good health
- Coincidental data
  - Sunspots and Republicans
- Lurking variables
  - Church, suicide, population
- Restricted range
  - Local vs. global linearity
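7. Simple Linear Regression in Matrix Form
Using summations: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, 2, \ldots, n$.
Using matrix notation: $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$, where
- $\mathbf{y} = (y_1, \ldots, y_n)^T$: vector of responses
- $\mathbf{X}$: the $n \times 2$ matrix whose i-th row is $(1, x_i)$
- $\boldsymbol\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$: vector of errors
- $\boldsymbol\beta = (\beta_0, \beta_1)^T$: vector of parameters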
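Matrix Form of Multiple Regression by LS
Written in short as $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$, the LS criterion is
$Q = \sum_{i=1}^{n}(y_i - \mathbf{x}_i^T\boldsymbol\beta)^2 = (\mathbf{y} - \mathbf{X}\boldsymbol\beta)^T(\mathbf{y} - \mathbf{X}\boldsymbol\beta)$, where $\mathbf{x}_i^T$ is the i-th row of $\mathbf{X}$.
The LS solutions are
$\hat{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$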
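A minimal sketch of $\hat{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ for the tire data in plain Python, forming and inverting the 2 × 2 matrix explicitly; the result matches the LS line from Example 10.2. For larger models, a linear-algebra library and a solver (rather than an explicit inverse) would be the usual choice.

```python
# Tire tread wear data: mileage x (1000s of miles), groove depth y (mils).
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

X = [[1.0, float(xi)] for xi in x]   # n x 2 design matrix with rows (1, x_i)

# X^T X (2 x 2) and X^T y (length 2).
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]

# Invert the 2 x 2 matrix X^T X explicitly.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
inv = [[xtx[1][1] / det, -xtx[0][1] / det],
       [-xtx[1][0] / det, xtx[0][0] / det]]

beta_hat = [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]
print(f"beta_hat = ({beta_hat[0]:.2f}, {beta_hat[1]:.3f})")   # beta_hat = (360.64, -7.281)
```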
Summary
- Linear regression analysis
- Model assumptions
- Correlation coefficient r
- The least squares (LS) estimates $\hat\beta_0$ and $\hat\beta_1$
- Probabilistic model for linear regression
- Correlation analysis
- Outliers? Influential observations? Data transformations?
- Confidence interval, prediction interval
Summary (cont.)
- Least squares (LS) fit
- Sample correlation coefficient r
- Statistical inference on $\beta_0$ and $\beta_1$
- Prediction interval
- Model assumptions: linearity, constant variance, normality, independence
- Correlation analysis
Questions?