Simple Linear Regression - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Simple Linear Regression


1
Simple Linear Regression
2
1. Introduction
Example
George Bush: 1.81m; Laura Bush: ?
Brad Pitt: 1.83m; Angelina Jolie: 1.70m
David Beckham: 1.83m; Victoria Beckham: 1.68m
Goal: to predict the height of the wife in a couple, based on the husband's height.
Response (outcome or dependent) variable (Y): height of the wife.
Predictor (explanatory or independent) variable (X): height of the husband.
3
Regression analysis
Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables.
When there is just one predictor variable, we use simple linear regression. When there are two or more predictor variables, we use multiple linear regression.
When it is not clear which variable represents the response and which the predictor, correlation analysis is used to study the strength of the relationship.
History
  • The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809.
  • The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
  • This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.

4
A probabilistic model
  • We denote the n observed values of the predictor variable x as x1, x2, ..., xn.
  • We denote the corresponding observed values of the response variable Y as Y1, Y2, ..., Yn.

5
Notations of the simple linear regression
The observed value of the random variable Yi depends on xi:
    Yi = β0 + β1 xi + εi
where εi is a random error with mean 0, so the unknown mean of Yi is E(Yi) = β0 + β1 xi.
The line y = β0 + β1 x is the true regression line, with unknown slope β1 and unknown intercept β0.
6
Simple Linear Regression Model
    y = β0 + β1 x + ε
  • y: dependent variable
  • x: independent variable
  • β0: intercept
  • β1: slope of the line
  • ε: error variable

β0 and β1 are unknown and are therefore estimated from the data.
[Figure: a fitted line in the (x, y) plane; the intercept β0 is where the line meets the y-axis, and the slope is β1 = Rise/Run.]
7
(No Transcript)
8
4 BASIC ASSUMPTIONS for statistical inference
Linearity: the mean of Y is a linear function of the predictor variable x.
Constant variance: the errors have a common variance, the same for all values of x.
Normality: the errors are normally distributed.
Independence: the errors are independent.
9
Comments
1. "Linear" means linear in the parameters β0 and β1, not in x.
Example: E(Y) = β0 + β1 log x is a linear model, even though log x is a nonlinear function of x.
2. The predictor variable need not be set at predetermined fixed values; it can be random along with Y. The model can then be considered a conditional model.
Example: height and weight of children. Given height (X = x), predict weight (Y) through the conditional expectation of Y given X = x.
10
2. Fitting the Simple Linear Regression Model
  • 2.1 Least Squares (LS) Fit

11
Example 10.1 (Tire Tread Wear vs. Mileage: Scatter Plot. From Statistics and Data Analysis, Tamhane and Dunlop, Prentice Hall.)
12
Estimating the Coefficients
  • The estimates are determined by
  • drawing a sample from the population of interest,
  • calculating sample statistics,
  • producing the straight line that best fits the data.

[Scatter plot of the sample points. The question is: which straight line fits best?]
13
For the least squares regression method, the best line is the one that minimizes the sum of squared vertical differences between the points and the line. The smaller the sum of squared differences, the better the fit of the line to the data.

[Figure: four points (1, 2), (2, 4), (3, 1.5), (4, 3.2) with a candidate line and the vertical deviations marked.]
14
(No Transcript)
15
The best fitting straight line, in the sense of minimizing
    Q = Σ [yi - (β0 + β1 xi)]²,
is the LS estimate.
  • One way to find the LS estimates b0 and b1 is to set the partial derivatives ∂Q/∂β0 and ∂Q/∂β1 equal to zero. Simplifying, we get the normal equations:
        Σ yi = n b0 + b1 Σ xi
        Σ xi yi = b0 Σ xi + b1 Σ xi²
16
  • Solving the normal equations, we get
        b1 = (Σ xi yi - n x̄ ȳ) / (Σ xi² - n x̄²)
        b0 = ȳ - b1 x̄
17
  • To simplify, we introduce
        Sxx = Σ (xi - x̄)²,  Syy = Σ (yi - ȳ)²,  Sxy = Σ (xi - x̄)(yi - ȳ)
    so that b1 = Sxy/Sxx and b0 = ȳ - b1 x̄.
  • The resulting equation ŷ = b0 + b1 x is known as the least squares regression line, which is an estimate of the true regression line.
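A minimal Python sketch of these formulas, applied to the tire tread wear data of Example 10.1 (numpy assumed; the data values are taken from the residual table in Section 5.1):

    import numpy as np

    # x = mileage (in 1000s of miles), y = groove depth (in mils)
    x = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
    y = np.array([394.33, 329.50, 291.00, 255.17, 229.33,
                  204.83, 179.00, 163.83, 150.33])

    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)          # 960.0
    Sxy = np.sum((x - xbar) * (y - ybar))  # about -6989.40

    b1 = Sxy / Sxx          # slope estimate, about -7.281
    b0 = ybar - b1 * xbar   # intercept estimate, about 360.64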


18
Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit)
  • To find the equation of the LS line for the tire tread wear data from Table 10.1, we have n = 9 and
        Σ xi = 144, Σ yi = 2197.32, Σ xi² = 3264, Σ xi yi = 28,167.72
  • From these we calculate
        x̄ = 16, ȳ = 244.15, Sxx = 960, Sxy = -6989.40
19
  • The slope and intercept estimates are
        b1 = Sxy/Sxx = -6989.40/960 = -7.281 and b0 = ȳ - b1 x̄ = 244.15 + 7.281 × 16 = 360.64
  • Therefore, the equation of the LS line is
        ŷ = 360.64 - 7.281x
  • Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving.
  • Given a particular x* = 25 (i.e., 25,000 miles),
  • we can find ŷ = 360.64 - 7.281 × 25 = 178.62.
  • This means that the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.
20
2.2 Goodness of Fit of the LS Line
  • Coefficient of Determination and Correlation
  • The residuals
        ei = yi - ŷi,  i = 1, ..., n
    are used to evaluate the goodness of fit of the LS line.

21
  • We define
        SST = Σ (yi - ȳ)²   (total sum of squares)
        SSR = Σ (ŷi - ȳ)²   (regression sum of squares)
        SSE = Σ (yi - ŷi)²  (error sum of squares)
    and it can be shown that SST = SSR + SSE.
  • r² = SSR/SST = 1 - SSE/SST is called the coefficient of determination.
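Continuing the Python sketch above, the goodness-of-fit quantities for the tire data (variable names carried over):

    yhat = b0 + b1 * x                 # fitted values
    SST = np.sum((y - y.mean()) ** 2)  # about 53,418.7
    SSE = np.sum((y - yhat) ** 2)      # about 2,531.5
    SSR = SST - SSE                    # about 50,887.2
    r2 = SSR / SST                     # about 0.953
    r = -np.sqrt(r2)                   # sign follows the sign of b1, about -0.976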

22
Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)
  • For the tire tread wear data, calculate r² using the results from Example 10.2. We have
        SST = Syy = 53,418.73
  • Next calculate
        SSE = 2531.53, so SSR = SST - SSE = 50,887.20
  • Therefore
        r² = 50,887.20/53,418.73 = 0.953
  • The Pearson correlation is
        r = -√0.953 = -0.976
    where the sign of r follows from the sign of b1. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.

23
The Maximum Likelihood Estimators (MLE)
  • Consider the linear model Yi = β0 + β1 xi + εi, where εi is drawn from a normal population with mean 0 and standard deviation σ. The likelihood function for Y is
        L(β0, β1, σ²) = Π [1/(σ √(2π))] exp[-(yi - β0 - β1 xi)²/(2σ²)]
  • Thus, the log-likelihood for the data is
        log L = -(n/2) log(2πσ²) - (1/(2σ²)) Σ (yi - β0 - β1 xi)²

24
The MLE Estimators
  • Solving ∂log L/∂β0 = 0, ∂log L/∂β1 = 0, ∂log L/∂σ² = 0,
  • we obtain the MLEs of the three unknown model parameters.
  • The MLEs of the model parameters β0 and β1 are the same as the LSEs; both are unbiased.
  • The MLE of the error variance, however, is biased: the MLE is SSE/n, whose expectation is (n-2)σ²/n ≠ σ².

25
2.3 An Unbiased Estimator of σ²
  • An unbiased estimate of σ² is given by
        s² = SSE/(n-2) = Σ (yi - ŷi)²/(n-2)
  • Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of σ²)
  • Find the estimate of σ² for the tread wear data using the results from Example 10.3. We have SSE = 2531.53 and n - 2 = 7, therefore
        s² = 2531.53/7 = 361.65
    which has 7 d.f. The estimate of σ is s = √361.65 = 19.02 mils.

26
3. Statistical Inference on β0 and β1
Under the normal error assumption, the point estimators b0 and b1 are unbiased, with sampling distributions
    b1 ~ N(β1, σ²/Sxx) and b0 ~ N(β0, σ²[1/n + x̄²/Sxx])
For the mathematical derivations, please refer to the Tamhane and Dunlop textbook, p. 331.
27
Statistical Inference on β0 and β1, Cont'd
Pivotal Quantities (P.Q.s):
    (b0 - β0)/SE(b0) ~ t(n-2) and (b1 - β1)/SE(b1) ~ t(n-2)
where SE(b1) = s/√Sxx and SE(b0) = s √(1/n + x̄²/Sxx).
Confidence Intervals (CIs):
    b0 ± t(n-2, α/2) SE(b0) and b1 ± t(n-2, α/2) SE(b1)

28
Statistical Inference on β0 and β1, Cont'd
Hypothesis tests:
-- Test statistics: t1 = b1/SE(b1) for H0: β1 = 0, and t0 = b0/SE(b0) for H0: β0 = 0
-- At the significance level α, we reject H0 in favor of the two-sided alternative if and only if (iff) the test statistic exceeds t(n-2, α/2) in absolute value
-- The first test is used to show whether there is a linear relationship between x and y
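A minimal Python sketch of this t-test and CI for the slope, continuing the tire-data variables from the sketches above (scipy assumed):

    from scipy import stats

    n = len(x)
    s2 = SSE / (n - 2)                # unbiased estimate of sigma^2
    se_b1 = np.sqrt(s2 / Sxx)         # standard error of the slope
    t1 = b1 / se_b1                   # about -11.86 for these data
    tcrit = stats.t.ppf(0.975, n - 2)
    ci_b1 = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)  # 95% CI for beta1
    pval = 2 * stats.t.sf(abs(t1), n - 2)             # two-sided p-value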
29
Analysis of Variance (ANOVA), Cont'd
Mean Square: a sum of squares divided by its d.f., e.g.
    MSR = SSR/1, MSE = SSE/(n-2), F = MSR/MSE
30
Analysis of Variance (ANOVA)
ANOVA Table and Example

Source       SS          d.f.    MS                F
Regression   SSR         1       MSR = SSR/1       MSR/MSE
Error        SSE         n - 2   MSE = SSE/(n-2)
Total        SST         n - 1

Source       SS          d.f.    MS          F
Regression   50,887.20   1       50,887.20   140.71
Error        2,531.53    7       361.65
Total        53,418.73   8
31
4. Finance Application: Market Model
  • One of the most important applications of linear regression is the market model.
  • It is assumed that the rate of return on a stock (R) is linearly related to the rate of return on the overall market (Rm):
        R = β0 + β1 Rm + ε
  • R: rate of return on a particular stock
  • Rm: rate of return on some major stock index
  • The beta coefficient β1 measures how sensitive the stock's rate of return is to changes in the level of the overall market.
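A minimal sketch of estimating a stock's beta by an LS fit in Python; the return series below are made-up placeholders, not data from the slides:

    import numpy as np

    rm = np.array([1.2, -0.5, 2.3, 0.8, -1.1, 1.9, 0.4, -0.7])  # market % returns
    r  = np.array([1.5, -0.2, 2.0, 1.1, -1.4, 1.6, 0.9, -0.3])  # stock % returns
    beta1, beta0 = np.polyfit(rm, r, 1)  # slope (beta) first, then intercept
    print(beta0, beta1)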
32
Example: The Market Model
  • Estimate the market model for Nortel, a stock traded on the Toronto Stock Exchange (TSE).
  • Data consisted of monthly percentage returns for Nortel and monthly percentage returns for all the stocks in the index.

The slope estimate, 0.8877, is a measure of the stock's market-related risk: in this sample, for each 1% increase in the TSE return, the average increase in Nortel's return is 0.8877%.
The coefficient of determination is a measure of how much of the total risk embedded in the Nortel stock is market-related: specifically, 31.37% of the variation in Nortel's returns is explained by the variation in the TSE's returns.
33
5. Regression Diagnostics
5.1 Checking for Model Assumptions
  • Checking for Linearity
  • Checking for Constant Variance
  • Checking for Normality
  • Checking for Independence

34
Checking for Linearity
xi: Mileage. Yi: Groove Depth. Fitted value: Ŷi = b0 + b1 xi. Residual: ei = Yi - Ŷi.

 i   xi    Yi       Ŷi       ei
 1    0    394.33   360.64    33.69
 2    4    329.50   331.51    -2.01
 3    8    291.00   302.39   -11.39
 4   12    255.17   273.27   -18.10
 5   16    229.33   244.15   -14.82
 6   20    204.83   215.02   -10.19
 7   24    179.00   185.90    -6.90
 8   28    163.83   156.78     7.05
 9   32    150.33   127.66    22.67

The residuals are positive at both ends of the mileage range and negative in the middle, a curved pattern suggesting the relationship is not linear.
35
Checking for Normality
36
Checking for Constant Variance
  • [Sample residual plots.] When Var(Y) is not constant, the spread of the residuals changes with x (a funnel-shaped plot).
  • When Var(Y) is constant, the residuals fall in a horizontal band of roughly even width.

37
Checking for Independence
  • Does not usually apply for the simple linear regression model;
  • applies mainly to time series data.

38
5.2 Checking for Outliers and Influential Observations
  • What is an outlier?
  • Why is checking for outliers important?
  • Mathematical definition
  • How to deal with them

39
5.2-A. Intro
  • Recall the box-and-whiskers plot (Chapter 4 of T&D),
  • where a (mild) outlier is defined as any observation that lies outside of Q1 - (1.5 × IQR) and Q3 + (1.5 × IQR) (interquartile range: IQR = Q3 - Q1),
  • and an (extreme) outlier as one that lies outside of Q1 - (3 × IQR) and Q3 + (3 × IQR).
  • In short: an observation "far away" from the rest of the data. (A quick computation of these fences appears below.)
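A minimal Python sketch of the IQR fences, using the five-point example from the next slide (numpy assumed):

    import numpy as np

    data = np.array([1, 3, 5, 9, 120.0])  # the "data with error" sample
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    mild = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)     # mild-outlier fences
    extreme = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # extreme-outlier fences
    print([v for v in data if v < mild[0] or v > mild[1]])  # [120.0]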

40
5.2-B. Why are outliers a problem?
  • They may indicate a sample peculiarity, a data entry error, or another problem.
  • Regression coefficients estimated by minimizing the Sum of Squares for Error (SSE) are very sensitive to outliers => bias or distortion of estimates.
  • Any statistical test based on sample means and variances can be distorted in the presence of outliers => distortion of p-values.
  • Faulty conclusions.
  • Example (below, a single data entry error inflates the mean and variance):
  • (Estimators not sensitive to outliers are said to be robust.)

Sorted data                     Median   Mean   Variance   95% CI for mean
Real data: 1 3 5 9 12              5      6.0      20.6    (0.45, 11.55)
Data with error: 1 3 5 9 120       5     27.6    2676.8    (-36.63, 91.83)
41
5.2-C. Mathematical Definition
  • Outlier
  • The standardized residual is given by
        ei* = ei / (s √(1 - hii))
    where hii is the leverage of the i-th observation (defined on the next slide).
  • If |ei*| > 2, then the corresponding observation may be regarded as an outlier.
  • Example (Tire Tread Wear vs. Mileage): the standardized residuals are

        i    1     2      3      4      5      6      7      8     9
        ei*  2.25  -0.12  -0.66  -1.02  -0.83  -0.57  -0.40  0.43  1.51

  • STUDENTIZED RESIDUAL: a type of standardized residual calculated with the current observation deleted from the analysis.
  • The LS fit can also be excessively influenced by an observation that is not necessarily an outlier as defined above.
42
5.2-C. Mathematical Definition
  • Influential Observation
  • An observation with an extreme x-value, y-value, or both.
  • For simple linear regression (k = 1 predictor), the leverage is
        hii = 1/n + (xi - x̄)²/Sxx
  • On average hii is (k+1)/n; regard any hii > 2(k+1)/n as high leverage.
  • If xi deviates greatly from the mean x̄, then hii is large.
  • The standardized residual will be large for a high-leverage observation.
  • Influence can be thought of as the product of leverage and outlierness.
  • Example: an observation can be influential/high leverage, but not an outlier (see the sketch below).
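A minimal Python sketch of these diagnostics, continuing the tire-data variables from the earlier sketches:

    h = 1.0 / n + (x - xbar) ** 2 / Sxx   # leverage of each observation
    s = np.sqrt(SSE / (n - 2))            # residual standard deviation
    e = y - yhat                          # raw residuals
    e_star = e / (s * np.sqrt(1.0 - h))   # standardized residuals
    print(np.where(np.abs(e_star) > 2))   # candidate outliers (observation 1)
    print(np.where(h > 2 * (1 + 1) / n))  # high-leverage points, k = 1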

43
5.2-C. SAS code of the tire example
SAS code:

data tire;
  input x y;
  datalines;
0 394.33
4 329.50
8 291.00
12 255.17
16 229.33
20 204.83
24 179.00
28 163.83
32 150.33
;
run;

proc reg data=tire;
  model y = x;
  output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
run;

proc print data=resid;
  where abs(r)>2 or lev>(4/9) or cd>(4/9) or abs(dffit)>(2*sqrt(1/9));
run;
44
5.2-C. SAS output of the tire example
SAS output
45
5.2-D. How to Deal with Outliers and Influential Observations
  • Investigate (data errors? rare events? can they be corrected?)
  • Ways to accommodate outliers:
  • nonparametric methods (robust to outliers)
  • data transformations
  • deletion (or report model results both with and without the outliers or influential observations, to see how much they change)

46
5.3 Data Transformations
  • Reasons:
  • to achieve linearity
  • to achieve homogeneity of variance
  • to achieve normality or symmetry about the regression equation

47
Types of Transformation
  • Linearizing transformation:
  • a transformation of the response variable, the predictor variable, or both, which produces an approximately linear relationship between the variables.
  • Variance stabilizing transformation:
  • a transformation applied when the constant variance assumption is violated.

48
Linearizing Transformation
  • Use a mathematical operation, e.g. square root, power, log, exponential, etc.
  • Only one variable needs to be transformed in simple linear regression.
  • Which one, predictor or response? Why?

49
e.g. For the model Y = a exp(-bx), we take a log transformation of Y:
    Y = a exp(-bx)  <=>  log Y = log a - bx

 xi    Yi       log Ŷi   Ŷi = exp(log Ŷi)   Ei = Yi - Ŷi
  0    394.33   5.926    374.64              19.69
  4    329.50   5.807    332.58              -3.08
  8    291.00   5.688    295.24              -4.24
 12    255.17   5.569    262.09              -6.92
 16    229.33   5.450    232.67              -3.34
 20    204.83   5.331    206.54              -1.71
 24    179.00   5.211    183.36              -4.36
 28    163.83   5.092    162.77               1.06
 32    150.33   4.973    144.50               5.83
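A minimal Python sketch of this linearizing fit, continuing the tire-data variables above:

    logy = np.log(y)                 # natural log of groove depth
    c1, c0 = np.polyfit(x, logy, 1)  # fit log Y = c0 + c1*x; c0 ~ 5.93, c1 ~ -0.030
    yhat_log = np.exp(c0 + c1 * x)   # back-transform fitted values to original scale
    resid = y - yhat_log             # residuals on the original scale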
50
Variance Stabilizing Transformation
  • Delta method: a two-term Taylor-series approximation gives
        Var(h(Y)) ≈ [h'(μ)]² g²(μ), where Var(Y) = g²(μ) and E(Y) = μ
  • Set [h'(μ)]² g²(μ) = 1, so that
        h'(μ) = 1/g(μ)  =>  h(y) = ∫ dy/g(y)
  • e.g. Var(Y) = c²μ² where c > 0, so g(μ) = cμ and g(y) = cy; then
        h(y) = ∫ dy/(cy) = (1/c) log y
  • Therefore it is the logarithmic transformation that stabilizes the variance in this case.

51
6. Correlation Analysis
  • Pearson product moment correlation: a measure of how closely two variables share a linear relationship.
  • Useful when it is not possible to determine which variable is the predictor and which is the response.
  • Health vs. wealth: which is the predictor? Which is the response?

52
Statistical Inference on the Correlation Coefficient ρ
  • We can derive a test on the correlation coefficient in the same way that we have been doing in class.
  • Assumptions: X, Y are from the bivariate normal distribution.
  • Start with the point estimator:
  • r, the sample correlation coefficient, is the estimator of the population correlation coefficient ρ.
  • Get the pivotal quantity: the distribution of r itself is quite complicated, so we use
        T0 = r √(n-2) / √(1-r²)
    as the test statistic for H0: ρ = 0.

53
Bivariate Normal Distribution
  • pdf:
        f(x, y) = 1/(2π σ1 σ2 √(1-ρ²)) exp{ -[1/(2(1-ρ²))] [((x-μ1)/σ1)² - 2ρ((x-μ1)/σ1)((y-μ2)/σ2) + ((y-μ2)/σ2)²] }
  • Properties:
  • μ1, μ2: means of X, Y
  • σ1², σ2²: variances of X, Y
  • ρ: the correlation coefficient between X and Y

54
Derivation of T0: are the two tests equivalent?
  • We can use t1 = b1/SE(b1) as a statistic for testing the null hypothesis H0: β1 = 0.
  • Equivalently, we can test H0: ρ = 0.
  • Substituting b1 = r √(Syy/Sxx) and s² = Syy(1-r²)/(n-2), after a few steps we can see that these two t-tests are indeed equivalent:
        t1 = b1/SE(b1) = r √(n-2)/√(1-r²) = t0
55
Exact Statistical Inference on ρ
  • Test:
  • H0: ρ = 0, Ha: ρ ≠ 0
  • Test statistic: t0 = r √(n-2)/√(1-r²)
  • Reject H0 iff |t0| > t(n-2, α/2)
  • Example:
  • A researcher wants to determine if two test instruments give similar results. The two test instruments are administered to a sample of 15 students. The correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level?
  • H0: ρ = 0, Ha: ρ ≠ 0
  • t0 = 0.7 √13/√(1 - 0.7²) = 3.534; for α = .01, t0 = 3.534 > t(13, .005) = 3.012
  • => Reject H0
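A minimal Python check of this example (scipy assumed):

    import numpy as np
    from scipy import stats

    r, n = 0.7, 15
    t0 = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)  # about 3.534
    tcrit = stats.t.ppf(1 - 0.01 / 2, n - 2)       # t(13, .005), about 3.012
    print(t0 > tcrit)                              # True -> reject H0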

56
Approximate Statistical Inference on ρ
  • There is no exact method of testing ρ against an arbitrary ρ0:
  • the distribution of R is very complicated, and
  • T0 ~ t(n-2) only when ρ = 0.
  • To test ρ against an arbitrary ρ0 one can use Fisher's transformation
        ψ = (1/2) ln[(1+r)/(1-r)]
    which is approximately normal with mean (1/2) ln[(1+ρ)/(1-ρ)] and variance 1/(n-3).
  • Therefore, let
        Z = [ψ - (1/2) ln((1+ρ0)/(1-ρ0))] √(n-3), which is approximately N(0, 1) under H0.

57
Approximate Statistical Inference on ρ
  • Test: H0: ρ = ρ0 vs. Ha: ρ ≠ ρ0
  • Sample estimate: ψ = (1/2) ln[(1+r)/(1-r)]
  • Z test statistic: z0 = (ψ - ψ0) √(n-3), where ψ0 = (1/2) ln[(1+ρ0)/(1-ρ0)]
  • We reject H0 if |z0| > z(α/2)
  • CI for ρ: compute ψ ± z(α/2)/√(n-3), then transform the endpoints back through r = (e^(2ψ) - 1)/(e^(2ψ) + 1) = tanh(ψ)
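A minimal Python sketch of the Fisher z test and CI; the values of r, n, ρ0 and α here are illustrative, not from the slides:

    import numpy as np
    from scipy import stats

    r, n, rho0, alpha = 0.7, 15, 0.5, 0.05
    psi, psi0 = np.arctanh(r), np.arctanh(rho0)  # arctanh is Fisher's transformation
    z0 = (psi - psi0) * np.sqrt(n - 3)
    zcrit = stats.norm.ppf(1 - alpha / 2)
    reject = abs(z0) > zcrit
    ci_rho = np.tanh([psi - zcrit / np.sqrt(n - 3),
                      psi + zcrit / np.sqrt(n - 3)])  # approximate CI for rho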
58
Approximate Statistical Inference on ρ using SAS
  • Code
  • Output

59
Pitfalls of Regression and Correlation Analysis
  • Correlation and causation
  • Ticks cause good health
  • Coincidental data
  • Sunspots and Republicans
  • Lurking variables
  • Church, suicide, population
  • Restricted range
  • Local vs. global linearity

60
7. Simple Linear Regression in Matrix Form
  • Linear regression model: yi = β0 + β1 xi + εi, i = 1, ..., n, or in matrix form
        y = Xβ + ε
    where y is the n×1 vector of responses, X is the n×2 matrix whose i-th row is (1, xi), β = (β0, β1)' is the vector of parameters, and ε is the n×1 vector of errors.
  • The normal equations, using summations:
        n b0 + b1 Σ xi = Σ yi
        b0 Σ xi + b1 Σ xi² = Σ xi yi
    or, using matrix notation:
        (X'X) b = X'y
61
Matrix Form of Multiple Regression by LS
The model is y = Xβ + ε in short. The LS criterion is
    Q = (y - Xb)'(y - Xb)
The LS solution is
    b = (X'X)⁻¹ X'y
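A minimal matrix-form sketch in Python, reusing the tire data from the earlier sketches (numpy assumed):

    X = np.column_stack([np.ones_like(x), x])  # n x 2 design matrix
    b = np.linalg.solve(X.T @ X, X.T @ y)      # solves the normal equations
    print(b)                                   # about [360.64, -7.281]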
62
Summary
  Linear regression analysis
  Model assumptions
  Correlation coefficient r
  The least squares (LS) estimates b0 and b1
  Probabilistic model for linear regression
  Correlation analysis
  Outliers? Influential observations? Data transformations?
  Confidence interval and prediction interval
63
  Least squares (LS) fit
  Sample correlation coefficient r
  Statistical inference on β0 and β1
  Prediction interval
  Model assumptions: linearity, constant variance, normality, independence
  Correlation analysis
64
Questions?