Chapter 8 Linear Regression

- A.P. Statistics

Linear Model

- Making a scatterplot allows you to describe the relationship between the two quantitative variables.
- However, sometimes it is much more useful to use that linear relationship to predict or estimate information based on that real data relationship.
- We use the Linear Model to make those predictions and estimations.

Linear Model

- Normal Model
  - Allows us to make predictions and estimations about the population and future events.
  - It is a model of real data, as long as that data has a nearly symmetric distribution.
- Linear Model
  - Allows us to make predictions and estimations about the population and future events.
  - It is a model of real data, as long as that data has a linear relationship between two quantitative variables.

Linear Model and the Least Squares Regression Line

- To make this model, we need to find a line of best fit.
- This line of best fit is the predictor line; it is how we predict or estimate our response variable, given our explanatory variable.
- The line of best fit is chosen by how well it minimizes the residuals.

Residuals and the Least Squares Regression Line

- The residual is the difference between the observed value and the predicted value (residual = observed - predicted).
- It tells us how far off the model's prediction is at that point.
- Negative residual: the predicted value is too big (overestimation).
- Positive residual: the predicted value is too small (underestimation).
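As a small illustration of the sign convention above, here is a minimal sketch. The data points and the line (intercept 1, slope 2) are hypothetical and chosen only for easy arithmetic; the helper name `residuals` is also just an illustrative choice.

```python
def residuals(xs, ys, intercept, slope):
    """Return observed minus predicted for each (x, y) point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data and a hypothetical line, for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 4, 8, 9]
res = residuals(xs, ys, intercept=1.0, slope=2.0)

for x, y, e in zip(xs, ys, res):
    if e > 0:
        label = "underestimate"   # prediction was too small
    elif e < 0:
        label = "overestimate"    # prediction was too big
    else:
        label = "exact"
    print(f"x={x}: observed={y}, predicted={y - e}, residual={e} ({label})")
```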

Residuals

Least Squares Regression Line

- The LSRL attempts to find the line where the sum of the squared residuals is the smallest.
- Why not just find a line where the sum of the residuals is the smallest?
  - The sum of the residuals is always zero for any line through the point of averages, because positive and negative residuals cancel.
  - By squaring residuals, we get values that are never negative, which can be added.
  - Squaring emphasizes the large residuals, which have a big impact on the correlation and the regression line.
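The point about sums can be checked numerically. The sketch below uses hypothetical data and compares three lines that all pass through the point of averages: each has a residual sum of (essentially) zero, but the sums of squared residuals differ. For this particular data set the least-squares slope works out to 0.6.

```python
def residual_sum_and_sse(xs, ys, intercept, slope):
    """Return (sum of residuals, sum of squared residuals) for one line."""
    res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sum(res), sum(e * e for e in res)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Three candidate lines, all forced through the point of averages.
results = {}
for slope in (0.0, 0.6, 2.0):
    intercept = y_bar - slope * x_bar
    results[slope] = residual_sum_and_sse(xs, ys, intercept, slope)
    total, sse = results[slope]
    print(f"slope={slope}: sum of residuals={total:.6f}, SSE={sse:.2f}")
```

The plain sum cannot tell these lines apart, while the SSE picks out a single best one.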

Scatterplot of Math and Verbal SAT scores

Scatterplot of Math and Verbal SAT scores with incorrect LSRL

Scatterplot of Math and Verbal SAT scores with correct LSRL

Least-Squares Regression Line

- We can find the LSRL in three different situations:
  - Using z-scores of real data (standardizing data)
  - Using summary statistics of data (mean and standard deviation)
  - Using real data

LSRL Using z-Scores of Real Data

- LSRL passes through $(z_x, z_y) = (0, 0)$ and $(1, r)$
- LSRL equation is $\hat{z}_y = r\, z_x$
- Moving one standard deviation from the mean in x, we can expect to move about r standard deviations from the mean in y.
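To see why the standardized line has slope r and passes through the origin, here is a minimal sketch. The data are hypothetical, and `z_scores` and `least_squares` are illustrative helper names, not functions from the slides.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

zx, zy = z_scores(xs), z_scores(ys)
intercept, slope = least_squares(zx, zy)

# Correlation computed directly from the z-scores.
r = sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

print(f"intercept={intercept:.6f}, slope={slope:.4f}, r={r:.4f}")
```

On standardized data the fitted intercept is (numerically) zero and the fitted slope matches r.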

LSRL Using z-Scores of Real Data (Interpretation)

- LSRL of scatterplot: $\hat{z}_{\text{fat}} = 0.83\, z_{\text{protein}}$
- For every standard deviation above (below) the mean a sandwich is in protein, we'll predict that its fat content is 0.83 standard deviations above (below) the mean.

LSRL Using Summary Statistics of Data

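[Table: summary statistics (mean and standard deviation) for Protein and Fat]

LSRL Equation: $\widehat{\text{fat}} = 6.8 + 0.97\,\text{protein}$, using slope $b_1 = r\,\dfrac{s_y}{s_x}$ and intercept $b_0 = \bar{y} - b_1\bar{x}$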
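The summary-statistics route uses the standard formulas slope = r · s_y / s_x and intercept = ȳ - slope · x̄. The sketch below applies them. Only r = 0.83 appears in these slides; the means and standard deviations are assumed values, picked so the result lands near the slope and intercept quoted in the interpretation, and should not be read as the original table.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept, slope) of the LSRL from summary statistics."""
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope

# ASSUMED illustrative values (not taken from the slides), except r = 0.83.
intercept, slope = lsrl_from_summary(
    x_bar=17.2,  # assumed mean protein (g)
    s_x=14.0,    # assumed standard deviation of protein (g)
    y_bar=23.5,  # assumed mean fat (g)
    s_y=16.4,    # assumed standard deviation of fat (g)
    r=0.83,      # correlation quoted in the slides
)

print(f"predicted fat = {intercept:.1f} + {slope:.2f} * protein")
```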

LSRL Using Summary Statistics of Data (Interpretation)

Slope: One additional gram of protein is associated with an additional 0.97 grams of fat, on average.

y-intercept: An item that has zero grams of protein is predicted to have 6.8 grams of fat.

ALWAYS CHECK TO SEE IF Y-INTERCEPT MAKES SENSE IN THE CONTEXT OF THE PROBLEM AND DATA

LSRL Using Real Data

- Use technology to get the LSRL, making sure you check your conditions first.

Properties of the LSRL

- The fact that the Sum of Squared Errors (SSE, the least-squares sum) is as small as possible means that for this line:
  - The sum and mean of the residuals are 0
  - The variation in the residuals is as small as possible
  - The line contains the point of averages $(\bar{x}, \bar{y})$
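These properties can be spot-checked on a small data set. The sketch below uses hypothetical data and illustrative helper names (`fit_lsrl`, `sse`): it fits the line, checks the mean residual and the point of averages, then nudges the coefficients to confirm that the SSE only gets larger.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [2, 4, 6, 8, 10]
ys = [3, 7, 8, 12, 15]
b0, b1 = fit_lsrl(xs, ys)

res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
mean_residual = sum(res) / len(res)
y_at_x_bar = b0 + b1 * (sum(xs) / len(xs))  # prediction at the mean of x

best = sse(xs, ys, b0, b1)
# SSE for every small nudge of the intercept and/or slope.
nudged = [
    sse(xs, ys, b0 + d0, b1 + d1)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]

print(f"mean residual = {mean_residual:.6f}")
print(f"prediction at x-bar = {y_at_x_bar:.4f}, y-bar = {sum(ys) / len(ys):.4f}")
print(f"LSRL SSE = {best:.4f}, smallest nudged SSE = {min(nudged):.4f}")
```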

Assumptions and Conditions for using LSRL

- Quantitative Variable Condition
- Straight Enough Condition
  - If not, re-express (Chapter 10)
- Outlier Condition
  - Check the model with and without the outlier

Residuals and LSRL

- Residuals should be used to see if a linear model is appropriate.
- Residuals are the part of the data that has not been modeled in our linear model.

Residuals and LSRL

- What to look for in a residual plot to satisfy the Straight Enough Condition:
  - No patterns, no interesting features (like direction or shape); the plot should stretch horizontally with about the same scatter throughout, with no bends or outliers.
  - The distribution of residuals should be symmetric if the original data is straight enough.

Looking at a scatterplot of the residuals vs. the x-value is a good way to check the Straight Enough Condition, which determines if a linear model is appropriate.
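To show what a failing residual plot looks like, the sketch below fits a line to deliberately curved, hypothetical data (y = x²) and prints the residuals in x order instead of drawing them. The sign string is just a crude stand-in for the plot.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical curved data: a straight line is the wrong model here.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

b0, b1 = fit_lsrl(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sign(e, tol=1e-9):
    """Classify a residual as '+', '-', or '0' (within a small tolerance)."""
    return "+" if e > tol else "-" if e < -tol else "0"

signs = "".join(sign(e) for e in res)
for x, e in zip(xs, res):
    print(f"x={x}: residual={e:+.1f}")
print("signs in x order:", signs)
```

The residuals are positive at both ends and negative in the middle, which is the kind of bend that fails the Straight Enough Condition.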

Residuals, again

A Complete Linear Regression Analysis: PART I

- Draw a scatterplot of the data. Comment on what you see. (Satisfy Quantitative Data Condition)
  - Form, strength, direction
  - Unusual points, deviations
  - Comment on general variable direction

A Complete Linear Regression Analysis: PART II

- Compute r. Comment on what r means in context and whether it is appropriate to use (does the relationship seem linear? Straight Enough Condition).

A Complete Linear Regression Analysis: PART III

- Find the LSRL
- Check all three conditions
  - Quantitative Data Condition
  - Straight Enough Condition
  - Outlier Condition

A Complete Linear Regression Analysis: PART IV

- Draw a residual plot and interpret it: is the linear model appropriate?

A Complete Linear Regression Analysis: PART V

- Interpret slope in context
- Interpret the y-intercept in context

A Complete Linear Regression Analysis: PART VI

- Compute R-Squared. Interpret the value and use it as a measure of the accuracy of the model. How well does the model predict?
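The numerical parts of this analysis (PARTS II, III, IV and VI) can be strung together in a few lines. This is only a sketch on hypothetical protein and fat values; the scatterplot, the condition checks and the in-context interpretation still have to be done by the analyst.

```python
from math import sqrt

# Hypothetical data, for illustration only (grams).
protein = [10, 15, 20, 25, 30]
fat = [15, 22, 24, 32, 35]

n = len(protein)
x_bar = sum(protein) / n
y_bar = sum(fat) / n
sxx = sum((x - x_bar) ** 2 for x in protein)
syy = sum((y - y_bar) ** 2 for y in fat)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(protein, fat))

# PART II: correlation
r = sxy / sqrt(sxx * syy)

# PART III: least-squares regression line
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# PART IV: residuals (these would normally be plotted against protein)
residuals = [y - (intercept + slope * x) for x, y in zip(protein, fat)]

# PART VI: R-squared, reported as a percent
r_squared = r ** 2

print(f"r = {r:.3f}")
print(f"predicted fat = {intercept:.2f} + {slope:.2f} * protein")
print("residuals:", [round(e, 2) for e in residuals])
print(f"R-squared = {100 * r_squared:.1f}%")
```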

What is R-Squared?

- This value indicates how accurately the linear model predicts your y-values from your x-values.
- It is written as a percent.
- It is, literally, your r-value squared.

R-Squared Interpretation

- If a regression analysis has an R-squared value of 97%, that means the model does an excellent job predicting the y-values in your model.
- How do we interpret that?
  - 97% of the variation in y can be accounted for by the variation in x, on average.

R-Squared Interpretation

- There are other ways to write that interpretation.
- Also, it can be thought of as how much error was eliminated in our predictions if we used the LSRL instead of a guess of $\bar{y}$ (the mean of y).
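The "error eliminated" reading can be made concrete by comparing two sums of squared errors: one from always guessing the mean of y, and one from using the LSRL. The sketch below uses hypothetical data; `error_reduction` is an illustrative name.

```python
def error_reduction(xs, ys):
    """Return (R-squared, SST, SSE) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Squared error when every prediction is simply the mean of y.
    sst = sum((y - y_bar) ** 2 for y in ys)
    # Squared error when predictions come from the LSRL.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst, sst, sse

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_squared, sst, sse = error_reduction(xs, ys)

print(f"error guessing the mean: {sst:.2f}")
print(f"error using the LSRL:    {sse:.2f}")
print(f"fraction of error eliminated (R-squared): {100 * r_squared:.0f}%")
```

For this data the correlation squared gives the same value, which is why R-squared is "literally" r squared.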