PPT – Simple Linear Regression PowerPoint presentation | free to download

Transcript and Presenter's Notes

Simple Linear Regression

Simple Linear Regression

- Regression analysis is used to predict the value of one variable on the basis of other variables.
- The technique involves developing a mathematical equation that describes the relationship between the variable to be forecast, which is called the dependent variable, and variables that the statistician believes are related to the dependent variable.
- The dependent variable is denoted by "y", while the related variables are called independent variables and are denoted x1, x2, ..., xk (where "k" is the number of independent variables).
- Equations such as
- E = mc²
- F = ma

Simple Linear Regression

- are deterministic models because, with the exception of small errors, these equations allow us to determine the value of the dependent variable (on the left side of the equation) from the values of the independent variables.
- These equations do not represent the random nature of real life. Equations that contain some measure of randomness are called probabilistic models.
- To build a probabilistic model, we start with a deterministic model that approximates the relationship we want to model. We then add a random term that measures the error of the deterministic component.

Example: Predict Cost of Building a House

- Suppose that the cost of building a new house is about $75 per square foot and that most lots sell for about $25,000. The approximate selling price would be
- y = 25,000 + 75x
- where y = selling price and x = size of the house in square feet.
- Thus a house of 2,000 square feet would be estimated to sell for
- y = 25,000 + 75(2,000) = 175,000

Example: Predict Cost of Building a House

- We know the price is not exactly $175,000, but between $150,000 and $200,000. To represent this situation properly, we should use the probabilistic model
- y = 25,000 + 75x + ε
- where ε (the Greek letter epsilon) represents the random term (called the error variable).
- ε is the difference between the actual selling price and the estimated price based on the size of the house.
- Thus the random term accounts for all the variables, both measurable and immeasurable, that are not part of the model, such as number of bedrooms and location.
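
The difference between the deterministic and the probabilistic model can be sketched in a few lines of Python. This is only an illustration: the function names and the error standard deviation of 10,000 are assumptions, not values from the slides.

```python
import random

def deterministic_price(size_sqft):
    """Deterministic component from the example: lot price plus cost per square foot."""
    return 25_000 + 75 * size_sqft

def simulated_price(size_sqft, rng, error_sd=10_000):
    """Probabilistic model: deterministic component plus a random error term.

    error_sd is a hypothetical value chosen for illustration only.
    """
    epsilon = rng.gauss(0, error_sd)  # the error variable
    return deterministic_price(size_sqft) + epsilon

rng = random.Random(0)  # seeded so repeated runs give the same draws
print(deterministic_price(2_000))  # 175000
print([round(simulated_price(2_000, rng)) for _ in range(3)])
```

Every simulated house has the same size, yet the prices differ from house to house because of the error term.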

Example: Predict Cost of Building a House

- The value of ε will vary from house to house depending on location, number of bedrooms, etc.
- In this chapter we will consider only equations (models) that have one independent variable. The model we will use is called the first-order linear model or the simple linear regression model.

The Model

- The first-order linear model: y = β0 + β1x + ε
- y = dependent variable
- x = independent variable
- β0 = y-intercept
- β1 = slope of the line
- ε = error variable

β0 and β1 are unknown and are therefore estimated from the data.

[Figure: a straight line in the x-y plane with intercept β0 and slope β1 = Rise/Run.]

Estimating the Coefficients

- The estimates are determined by
- drawing a sample from the population of interest,
- calculating sample statistics,
- producing a straight line that cuts through the data.

The question is: which straight line fits best?

[Figure: scatter diagram of the sample points, with x on the horizontal axis and y on the vertical axis.]

Estimating the Coefficients

The best line is the one that minimizes the sum of squared vertical differences between the points and the line.

Let us compare two lines for the points (1,2), (2,4), (3,1.5), and (4,3.2). The second line is horizontal.

Sum of squared differences for the first line = (2 - 1)² + (4 - 2)² + (1.5 - 3)² + (3.2 - 4)² = 7.89

The smaller the sum of squared differences, the better the fit of the line to the data.

[Figure: the four data points plotted together with the two candidate lines.]
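
The comparison of the two lines can be checked numerically. The sketch below assumes the first line is y = x (its values 1, 2, 3, 4 are the ones subtracted in the sum of squared differences) and that the horizontal line sits at y = 2.5 (an assumption read off the axis labels); the function names are illustrative.

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_squared_differences(points, predict):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

def first_line(x):
    return x        # passes through (1,1), (2,2), (3,3), (4,4)

def horizontal_line(x):
    return 2.5      # assumed height of the horizontal line

sse_first = sum_squared_differences(points, first_line)
sse_horizontal = sum_squared_differences(points, horizontal_line)
print(round(sse_first, 2), round(sse_horizontal, 2))  # 7.89 3.99
```

Under these assumptions the horizontal line has the smaller sum of squared differences, so it fits these four points better than the first line.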

Estimating the Coefficients

To calculate the estimates of the coefficients that minimize the differences between the data points and the line, use the formulas

b1 = cov(X,Y) / sx²
b0 = ȳ - b1x̄

The regression equation that estimates the equation of the first-order linear model is

ŷ = b0 + b1x
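
As a minimal sketch of the calculation, the function below uses the usual least squares formulas, b1 = cov(X,Y)/sx² and b0 = ȳ - b1x̄. The four data points are the toy points from the line-comparison slide, not the car data, and the function name is illustrative.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line through the points."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Sample covariance of x and y, and sample variance of x.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [2, 4, 1.5, 3.2])
print(round(b0, 2), round(b1, 2))  # 2.4 0.11
```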

Example 17.1: Relationship between odometer reading and a used car's selling price

- A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
- A random sample of 100 cars is selected, and the data recorded.
- Find the regression line.

[Table: sample data, with the independent variable x and the dependent variable y.]

Example 17.1: Relationship between odometer reading and a used car's selling price

- Solving by hand: to calculate b0 and b1 we need to calculate several statistics first, where n = 100.

Example 17.1: Relationship between odometer reading and a used car's selling price

- Using the computer (see file Xm17-01.xls)
- Tools > Data Analysis > Regression > shade the y range and the x range > OK

Example 17.1: Relationship between odometer reading and a used car's selling price

[Figure: the fitted regression line. The intercept, 6,533, lies at x = 0, where there is no data.]

- The slope of the line is b1 = -0.0312. For each additional mile on the odometer, the price decreases by an average of $0.0312.
- The intercept is b0 = 6,533.
- Do not interpret the intercept as the price of cars that have not been driven.

Example 17.1: Relationship between odometer reading and a used car's selling price

- Using Minitab
- Type or import the data into 2 columns
- Click Stat, Regression, and Regression
- Type the name of the dependent variable (Response - Price or C2)
- Hit Tab, and type the name of the independent variable (Predictors - Odometer or C1)
- Click O.K.
- Click Stat, Regression, and Fitted Line Plot
- Type the name of the dependent variable (Response - Y - Price or C2)
- Hit Tab, and type the name of the independent variable (Predictors - X - Odometer or C1)
- Click O.K.

Error Variable Required Conditions

- b1 = -.0312, which means that for each additional mile on the odometer, the price decreases by an average of $.0312, or 3.12 cents.
- b0 = 6,533. Since this is the y-intercept, we might think that a car with 0 miles would sell for $6,533.
- In this case, however, the intercept is probably meaningless. Because our sample did not include any cars with 0 miles on the odometer, we have no basis for interpreting b0.
- As a general rule, we cannot determine the value of y for a value of x that is far outside the range of sample values of x. In this sample, x ranged from 19,075 to 49,223.

Error Variable Required Conditions

- The error ε is a critical part of the regression model.
- Four requirements involving the distribution of ε must be satisfied.
- The probability distribution of ε is normal.
- The mean of ε is zero: E(ε) = 0.
- The standard deviation of ε is σε for all values of x.
- The errors associated with different values of y are all independent.

From the first three assumptions we have: y is normally distributed with mean E(y) = β0 + β1x and a constant standard deviation σε.

[Figure: normal distributions of y centered at μ1, μ2, and μ3 for x1, x2, and x3. The standard deviation remains constant, but the mean value changes with x.]

Assessing the Model

- The least squares method will produce a regression line whether or not there is a linear relationship between x and y.
- Consequently, it is important to assess how well the linear model fits the data.
- Several methods are used to assess the model:
- Testing and/or estimating the coefficients.
- Using descriptive measurements.

Standard Error of Estimate

- The mean error is equal to zero.
- If σε is small, the errors tend to be close to zero (close to the mean error). Then, the model fits the data well.
- If σε is large, some of the errors will be large, which implies that the model's fit is poor.
- We could use σε to measure the suitability of using a linear model, but σε is a population parameter, which is usually unknown.
- We can estimate σε from the data using the sum of squares for error (SSE).
- Therefore, we can use the estimate sε as a measure of the suitability of using a linear model.

Standard Error of Estimate

- An unbiased estimator of σε² is given by sε² = SSE / (n - 2). Its square root, sε, is the standard error of estimate.
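
A minimal sketch of the standard error of estimate, assuming the usual definition sε = √(SSE / (n - 2)). The data are the four toy points from the line-comparison slide together with their least squares coefficients (b0 = 2.4, b1 = 0.11); the function name is illustrative.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys, b0, b1):
    """Return (SSE, s_eps) for the line y-hat = b0 + b1*x."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    s_eps = sqrt(sse / (len(xs) - 2))  # n - 2 degrees of freedom
    return sse, s_eps

sse, s_eps = standard_error_of_estimate([1, 2, 3, 4], [2, 4, 1.5, 3.2], 2.4, 0.11)
print(round(sse, 3), round(s_eps, 2))  # 3.807 1.38
```

Whether a given sε is "small" is judged relative to the scale of y; here it is large compared with the spread of the four y values, so the line fits poorly.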

Sum of Squares For Errors

- SSE = the minimized sum of squared differences between the points and the regression line, also known as "sum of squares about the regression (line)", "sum of squares of the residuals", and "sum of squares for error (SSE)".
- It can serve as a measure of how well the line fits the data.
- This statistic plays a role in every statistical technique we employ to assess the model.

Sum of Squares For Errors

Shortcut formula: SSE = (n - 1)(sy² - cov(X,Y)² / sx²)

Example 17.2

- Calculate the standard error of estimate for Example 17.1, and describe what it tells you about the model fit.
- Solution

[Equation image: the solution uses statistics calculated before, in Example 17.1.]

Testing The Slope

- When no linear relationship exists between two variables, the regression line should be horizontal.

[Figure: two scatter diagrams. Left: linear relationship; different inputs (x) yield different outputs (y), and the slope is not equal to zero. Right: no linear relationship; different inputs (x) yield the same output (y), and the slope is equal to zero.]

Testing The Slope

- We can draw inference about β1 from b1 by testing
- H0: β1 = 0
- H1: β1 ≠ 0 (or < 0, or > 0)
- The test statistic is t = (b1 - β1) / sb1, where sb1 = sε / √((n - 1)sx²).

If the error variable is normally distributed, the statistic has a Student t distribution with d.f. = n - 2.
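
A sketch of the slope test in Python, assuming the usual formulas t = (b1 - β1)/sb1 and sb1 = sε/√((n - 1)sx²). The data are hypothetical and the function name is illustrative; the critical value must still be looked up in a t table for n - 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(xs, ys, beta1_null=0.0):
    """Return (t, degrees of freedom) for testing H0: beta1 = beta1_null."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)  # equals (n - 1) * sx^2
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_eps = sqrt(sse / (n - 2))   # standard error of estimate
    s_b1 = s_eps / sqrt(s_xx)     # standard error of the slope
    return (b1 - beta1_null) / s_b1, n - 2

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
t, df = slope_t_statistic(xs, ys)
print(f"t = {t:.2f} with {df} degrees of freedom")
```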

Example 17.3

- Solution: solving by hand
- To compute the test statistic t we need the values of b1 and sb1.

Example 17.3

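- The rejection region is determined by the critical value of the t distribution with n - 2 = 98 degrees of freedom.
- Conclusion: reject the null hypothesis, since the test statistic t = -13.49 falls in the rejection region.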

Example 17.3

- Interpreting the results
- The value of the test statistic is t = -13.49, with a p-value of 0.00. There is overwhelming evidence to infer that a linear relationship exists.
- What this means is that the odometer reading does affect the auction selling price of the cars.

Coefficient of Determination

- When we want to measure the strength of the linear relationship, we use the coefficient of determination.

Coefficient of Determination

- To understand the significance of this coefficient, note that the overall variability in y is explained in part by the regression model, while the rest remains unexplained (the error).

Coefficient of Determination

Two data points (x1,y1) and (x2,y2) of a certain sample are shown.

[Figure: the two points and the regression line.]

Total variation in y = Variation explained by the regression line + Unexplained variation (error)

Coefficient of Determination

Variation in y = SSR + SSE

- R² measures the proportion of the variation in y that is explained by the variation in x.
- R² takes on any value between zero and one.
- R² = 1: perfect match between the line and the data points.
- R² = 0: there is no linear relationship between x and y.
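
A minimal sketch of the coefficient of determination, assuming the usual formula R² = 1 - SSE / Σ(yi - ȳ)², where the denominator is the total variation in y. The first data set is the four toy points from the line-comparison slide; the second is a hypothetical set of points lying exactly on a line. The function name is illustrative.

```python
from statistics import mean

def r_squared(xs, ys):
    """Fit the least squares line and return 1 - SSE / total variation in y."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / total

print(round(r_squared([1, 2, 3, 4], [2, 4, 1.5, 3.2]), 4))  # 0.0156
print(round(r_squared([1, 2, 3], [2, 4, 6]), 4))            # 1.0
```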

Example 17.4

- Find the coefficient of determination for Example 17.1. What does this statistic tell you about the model?
- Solution
- Solving by hand
- Using the computer
- From the regression output we have:

65% of the variation in the auction selling price is explained by the variation in odometer reading. The rest (35%) remains unexplained by this model.

Example 17.4: Interpreting the Results

- We found that R² = 65%. This statistic tells us that 65% of the variation in the auction selling price is explained by the variation in the odometer readings. The remaining 35% is unexplained.
- Unlike test statistics, R² does not have a critical value that enables us to draw conclusions.
- We know that the higher the value of R², the better the model fits the data.
- R² provides us a measure of the strength of the linear relationship between the independent and dependent variables.

Cause and Effect Relationships

- Do not assume that a linear relationship between variables means that there is a cause-and-effect relationship between those variables.
- We cannot infer a causal relationship from statistics alone. Causal relationships must be justified by reasonable theoretical relationships.
- Do not interpret the word "explained" in relation to R² to mean "caused".
- In the Taurus example, we might conclude that decreasing the odometer reading would cause the price to rise. This conclusion may not be entirely true.
- It is theoretically possible that the price is determined by the overall condition of the car and that condition worsens when the car is driven longer.

Using the Regression Equation

- Before using the regression model, we need to assess how well it fits the data.
- If we are satisfied with how well the model fits the data, we can use it to make predictions for y.
- Illustration
- Predict the selling price of a three-year-old Taurus with 40,000 miles on the odometer (Example 17.1):

ŷ = 6,533 - 0.0312(40,000) = 5,285

Thus, the dealer would predict that the car would sell for $5,285. This does not provide any information about how closely the value will match the true selling price. To do that we must construct an interval estimate.
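
The point prediction can be reproduced in a few lines. The coefficients (6,533 and -0.0312) and the sample range of odometer readings (19,075 to 49,223) are taken from Example 17.1; the function name and the choice to raise an error outside the sample range are illustrative.

```python
B0, B1 = 6533, -0.0312          # estimated intercept and slope (Example 17.1)
X_MIN, X_MAX = 19_075, 49_223   # range of odometer readings in the sample

def predict_price(odometer):
    """Point prediction from the regression line, refusing to extrapolate."""
    if not X_MIN <= odometer <= X_MAX:
        raise ValueError("odometer reading is outside the range of the sample")
    return B0 + B1 * odometer

print(round(predict_price(40_000)))  # 5285
```

Calling predict_price(0) raises an error, which mirrors the warning that the intercept should not be read as the price of a car that has not been driven.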

Prediction interval and confidence interval

- Two intervals can be used to discover how closely the predicted value will match the true value of y.
- Prediction interval: for a particular value of y.
- Confidence interval: for the expected value of y.

The prediction interval is wider than the confidence interval.

Example 17.6: Interval Estimates for the Car Auction Price

- Provide an interval estimate for the bidding price on a Ford Taurus with 40,000 miles on the odometer.
- Solution
- The dealer would like to predict the price of a single car.
- The prediction interval (95%), using t.025,98:

ŷ ± tα/2,n-2 · sε · √(1 + 1/n + (xg - x̄)² / ((n - 1)sx²))

Example 17.6: Interval Estimates for the Car Auction Price

- The car dealer wants to bid on a lot of 250 Ford Tauruses, where each car has been driven for about 40,000 miles.
- Solution
- The dealer needs to estimate the mean price per car.
- The confidence interval (95%):

ŷ ± tα/2,n-2 · sε · √(1/n + (xg - x̄)² / ((n - 1)sx²))

The Effect of the Given Value of X on the Interval

- As xg moves away from x̄, the interval becomes longer. That is, the shortest interval is found at xg = x̄.
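
The effect of the given value of x on both intervals can be sketched as follows, assuming the usual half-width formulas t·sε·√(1 + 1/n + (xg - x̄)²/((n - 1)sx²)) for the prediction interval and the same expression without the leading 1 for the confidence interval. The values n = 100, sx = 6597.6, and t.025,98 = 1.984 appear in the slides; the sample mean of 36,000 and the standard error of estimate of 150 are hypothetical.

```python
from math import sqrt

def interval_half_widths(x_g, n, x_bar, s_x, s_eps, t_crit):
    """Return (prediction, confidence) interval half-widths at x_g."""
    distance_term = (x_g - x_bar) ** 2 / ((n - 1) * s_x ** 2)
    prediction = t_crit * s_eps * sqrt(1 + 1 / n + distance_term)
    confidence = t_crit * s_eps * sqrt(1 / n + distance_term)
    return prediction, confidence

N, S_X, T_CRIT = 100, 6597.6, 1.984   # values that appear in the slides
X_BAR, S_EPS = 36_000, 150            # hypothetical values, for illustration

for x_g in (36_000, 40_000, 49_000):
    pred, conf = interval_half_widths(x_g, N, X_BAR, S_X, S_EPS, T_CRIT)
    print(x_g, round(pred, 1), round(conf, 1))
```

Both half-widths grow as x_g moves away from the mean, and the prediction interval is always the wider of the two.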

Coefficient of Correlation

- The coefficient of correlation is used to measure the strength of association between two variables.
- The coefficient values range between -1 and 1.
- If r = -1 (negative association) or r = 1 (positive association), every point falls on the regression line.
- If r = 0, there is no linear pattern.
- The coefficient can be used to test for a linear relationship between two variables.

Coefficient of Correlation

- Use it in cases where we are interested only in whether a linear relationship exists, and not in the form of the relationship.

Coefficient of Correlation

- Testing the coefficient of correlation
- When there is no linear relationship between two variables, ρ = 0.
- The hypotheses are
- H0: ρ = 0
- H1: ρ ≠ 0
- The test statistic is t = r√((n - 2) / (1 - r²)).

The statistic is Student t distributed with d.f. = n - 2, provided the variables are bivariate normally distributed.

Example 17.7: Testing for Linear Relationship

- Test the coefficient of correlation to determine if a linear relationship exists in the data of Example 17.1.
- Solution
- We test H0: ρ = 0 against H1: ρ ≠ 0.
- Solving by hand
- The rejection region is |t| > tα/2,n-2 = t.025,98 = 1.984 or so.

Example 17.7: Testing for Linear Relationship

- The sample coefficient of correlation
- r = cov(X,Y) / (sx sy) = -1,356,256 / [(6597.6)(254.9)] = -.806

The value of the t statistic is t = r√((n - 2) / (1 - r²)) ≈ -13.5.

Conclusion: there is sufficient evidence at the 5% significance level to infer that there is a linear relationship between the two variables.
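
The hand calculation can be checked with the statistics quoted in the example (covariance, standard deviations, n = 100, and the critical value 1.984). The test statistic formula t = r√((n - 2)/(1 - r²)) is the usual one for this test and is an assumption here, since the transcript does not preserve it.

```python
from math import sqrt

cov_xy = -1_356_256         # sample covariance (from the example)
s_x, s_y = 6597.6, 254.9    # sample standard deviations (from the example)
n = 100
t_critical = 1.984          # t.025,98 (from the example)

r = cov_xy / (s_x * s_y)
t = r * sqrt((n - 2) / (1 - r ** 2))

print(round(r, 3), round(r ** 2, 2))  # -0.806 0.65
print(abs(t) > t_critical)            # True
```

The squared correlation, 0.65, agrees with the coefficient of determination reported for the same data.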

Example 17.7: Relationship between odometer reading and a used car's selling price

- Using Minitab
- Type or import the data into 2 columns
- Click Stat, Basic Statistics, and Correlation
- Type the variable names (Price or C2, Odometer or C1)
- Click O.K.

Spearman Rank Correlation Coefficient

- The Spearman rank test is used to test whether a relationship exists between variables in cases where
- at least one variable is ranked, or
- both variables are quantitative but the normality requirement is not satisfied.

Spearman Rank Correlation Coefficient

- The hypotheses are
- H0: ρs = 0
- H1: ρs ≠ 0
- The test statistic is rs = cov(a,b) / (sa sb),
- where a and b are the ranks of the data.
- For a large sample (n > 30), rs is approximately normally distributed.

Example 17.8: Spearman Rank Correlation Coefficient

- A production manager wants to examine the relationship between
- aptitude test score given prior to hiring, and
- performance rating three months after starting work.
- A random sample of 20 production workers was selected. The test scores as well as the performance ratings were recorded.

Example 17.8: Spearman Rank Correlation Coefficient

- The aptitude test results range from 0 to 100.
- The performance ratings are as follows:
- 1 = Employee has performed well below average
- 2 = Employee has performed somewhat below average
- 3 = Employee has performed at the average level
- 4 = Employee has performed somewhat above average
- 5 = Employee has performed well above average
- Can the firm's manager infer at the 5% significance level that aptitude test scores are correlated with performance rating?

Example 17.8: Spearman Rank Correlation Coefficient

- The problem objective is to analyze the relationship between two variables.
- The aptitude test score is quantitative, but the performance rating is ranked.
- We will treat the aptitude test score as if it is ranked and calculate the Spearman rank correlation coefficient.

Example 17.8: Spearman Rank Correlation Coefficient

[Table: sample data. Aptitude scores range from 0 to 100; performance ratings range from 1 to 5.]

- The hypotheses are
- H0: ρs = 0
- H1: ρs ≠ 0
- The test statistic is rs, and the rejection region is |rs| > rcritical (taken from the Spearman rank correlation table).

Example 17.8: Spearman Rank Correlation Coefficient

- Solving by hand
- Rank each variable separately. Ties are broken by averaging the ranks.
- Calculate sa = 5.92, sb = 5.50, and cov(a,b) = 12.34.
- Thus rs = cov(a,b) / (sa sb) = .379.
- The critical value for α = .05 and n = 20 is .450.

Conclusion: do not reject the null hypothesis. At the 5% significance level there is insufficient evidence to infer that the two variables are related to one another.
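
A sketch of the hand calculation in Python: rank each variable with ties averaged, then compute rs = cov(a,b)/(sa sb) on the ranks. The function names and the five-worker data set are hypothetical; only the final two lines use numbers quoted in the example.

```python
from statistics import mean, stdev

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: covariance of the ranks over the product of their s.d."""
    a, b = average_ranks(xs), average_ranks(ys)
    a_bar, b_bar = mean(a), mean(b)
    cov_ab = sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b)) / (len(a) - 1)
    return cov_ab / (stdev(a) * stdev(b))

# Hypothetical scores and ratings for five workers, for illustration only.
print(average_ranks([3, 2, 4, 3, 5]))  # [2.5, 1.0, 4.0, 2.5, 5.0]
print(round(spearman([59, 47, 58, 66, 77], [3, 2, 4, 3, 5]), 3))

# Statistics quoted in the example: sa = 5.92, sb = 5.50, cov(a,b) = 12.34.
rs = 12.34 / (5.92 * 5.50)
print(round(rs, 3), rs > 0.450)  # 0.379 False
```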

Example 17.8: Spearman Rank Correlation Coefficient

- Interpret the results
- There is not enough evidence to believe that the aptitude test scores and performance ratings are related.
- This suggests that the aptitude test should be improved to better measure the knowledge and skill required by a production line worker.
- If this proves impossible, the aptitude test should be discarded.

Example 17.8: Spearman Rank Correlation Coefficient

- Using Minitab
- Type or import the data into 2 columns
- Click Manip, and Rank
- Type the name of the first variable (arbitrary choice, Aptitude or C1)
- Hit Tab and specify the column where the ranks are to be stored. Click O.K.
- Click Manip, and Rank
- Type the name of the second variable (arbitrary choice, C2)
- Hit Tab and specify the column where the ranks are to be stored. Click O.K.
- Click Stat, Basic Statistics, and Correlation
- Type the names of the ranked variables
- Click O.K.

Regression Diagnostics - I

- The three conditions required for the validity of the regression analysis are:
- The error variable is normally distributed.
- The error variance is constant for all values of x.
- The errors are independent of each other.
- How can we diagnose violations of these conditions?

Residual Analysis

- Examining the residuals (or standardized residuals), we can identify violations of the required conditions.
- Example 17.1 - continued
- Nonnormality.
- Use Excel to obtain the standardized residual histogram.
- Examine the histogram and look for a bell-shaped diagram with mean close to zero.

Residual Analysis

[Table: a partial list of standardized residuals.]

Standardized residual i = Residual i / Standard deviation

We can also apply the Lilliefors test or the χ² test of normality.

Heteroscedasticity

- When the requirement of a constant variance is violated we have heteroscedasticity.

[Figure: residuals plotted against y.]

Heteroscedasticity

- When the requirement of a constant variance is not violated we have homoscedasticity.

[Figure: residuals plotted against y. The spread of the data points does not change much. As far as the even spread is concerned, this is a much better situation.]

Nonindependence of Error Variables

- A time series results when data are collected over time.
- Examining the residuals over time, no pattern should be observed if the errors are independent.
- When a pattern is detected, the errors are said to be autocorrelated.
- Autocorrelation can be detected by graphing the residuals against time.

Nonindependence of Error Variables

Patterns in the appearance of the residuals over time indicate that autocorrelation exists.

[Figure: two plots of residuals against time, each with a horizontal line at 0. In the first, note the runs of positive residuals, replaced by runs of negative residuals. In the second, note the oscillating behavior of the residuals around zero.]
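
One informal way to see both patterns without a graph is to count runs of same-signed residuals in time order. This is a simple run count, not a formal test for autocorrelation, and the function name and residual values below are hypothetical.

```python
def count_sign_runs(residuals):
    """Count maximal runs of consecutive residuals with the same sign (zeros ignored)."""
    signs = [e > 0 for e in residuals if e != 0]
    if not signs:
        return 0
    changes = sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
    return changes + 1

long_runs = [1.2, 2.0, 0.7, -0.9, -2.1, -1.0, 0.8, 1.9]   # hypothetical residuals
oscillating = [1.0, -1.1, 0.9, -1.2, 1.1, -0.8]           # hypothetical residuals

print(count_sign_runs(long_runs))    # 3
print(count_sign_runs(oscillating))  # 6
```

Very few runs for the number of observations corresponds to the first pattern (long stretches on one side of zero), and a sign change at almost every step corresponds to the second.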

Outliers

- An outlier is an observation that is unusually small or large.
- Several possibilities need to be investigated when an outlier is observed:
- There was an error in recording the value.
- The point does not belong in the sample.
- The observation is valid.
- Identify outliers from the scatter diagram.
- It is customary to suspect that an observation is an outlier if |standardized residual| > 2.
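
The rule of thumb can be sketched as follows. The standardized residual is taken to be the residual divided by the standard error of estimate, which is an assumption (statistical packages may use a slightly different standard deviation). The data and the line y-hat = 0 + 1x are hypothetical, and the function names are illustrative.

```python
from math import sqrt

def standardized_residuals(xs, ys, b0, b1):
    """Residuals divided by the standard error of estimate, sqrt(SSE / (n - 2))."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_eps = sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))
    return [e / s_eps for e in residuals]

def suspected_outliers(std_residuals, threshold=2.0):
    """Indices of observations whose standardized residual exceeds the threshold in size."""
    return [i for i, z in enumerate(std_residuals) if abs(z) > threshold]

# Hypothetical data: points close to the line y = x, except the ninth one.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.1, 1.9, 3.0, 4.1, 4.9, 6.0, 7.1, 7.9, 14.0, 10.0]

z = standardized_residuals(xs, ys, b0=0, b1=1)
print(suspected_outliers(z))  # [8]
```

A flagged observation is only a suspect: it still has to be checked against the three possibilities listed above before it is removed or kept.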

Outliers

[Figure: two scatter diagrams, one showing an outlier and one showing an influential observation.]

Some outliers may be very influential: the outlier causes a shift in the regression line.

Procedure for regression diagnostics

- Develop a model that has a theoretical basis.
- Gather data for the two variables in the model.
- Draw the scatter diagram to determine whether a linear model appears to be appropriate.
- Check the required conditions for the errors.
- Assess the model fit.
- If the model fits the data, use the regression equation.
