LINEAR REGRESSION AND CORRELATION

- CHAPTER 6
- BCT 2053
- Siti Zanariah Satari, FIST/FSKKP, 2009

CONTENT

- 6.1 Simple Linear Regression Analysis and Correlation
- 6.2 Relationship Test and Prediction in Simple Linear Regression Analysis
- 6.3 Multiple Linear Regression Analysis and Correlation
- 6.4 Model Selection

6.1 Simple Linear Regression Analysis and Correlation

- OBJECTIVE
- Find a mathematical equation that relates a dependent variable y to an independent variable x.
- Plot a scatter diagram and graph the regression line.
- Calculate the strength of the linear relationship between x and y using the correlation coefficient r.

INTRODUCTORY CONCEPTS

- Suppose you wish to investigate the relationship between a dependent variable (y) and an independent variable (x).
- Independent variable (x): the variable that has been controlled.
- Dependent variable (y): the response variable.
- In other words, the value of y depends on the value of x.

Example A

Suppose you wish to investigate the relationship between the number of hours students spent studying for an examination and the mark they achieved.

- x (independent variable): the number of hours students spent studying for the examination
- y (dependent variable): the mark they achieved
- The hours studied (x) are taken to cause the mark achieved (y).

Other Examples

- The weight at the end of a spring (x) and the length of the spring (y)
- A student's mark in a Statistics test (x) and the mark in a Programming test (y)
- The diameter of the stem of a plant (x) and the average length of leaf of the plant (y)

SCATTER DIAGRAM

- When pairs of values (x, y) are plotted, a scatter diagram is produced.
- It shows what the data look like and how the two variables relate to each other.
- Exercise: Plot a scatter diagram for Example A.

LINEAR CORRELATION AND SIMPLE LINEAR LINE

- Linear correlation
- If the points on the scatter diagram appear to lie near a straight line (the simple regression line), you would say that there is a linear correlation between x and y.
- Exercise: From the scatter diagram for Example A, is there any correlation between x and y?

Positive Linear Correlation

Negative Linear Correlation

No Correlation

No relationship between x and y

INFERENCES IN CORRELATION

- The product moment correlation coefficient, r, is a numerical value between -1 and 1 inclusive that indicates the degree of linear scatter.
- r = 1 indicates perfect positive linear correlation
- r = -1 indicates perfect negative linear correlation
- r = 0 indicates no correlation

INFERENCES IN CORRELATION

- The nearer the value of r is to 1 or -1, the closer the points on the scatter diagram are to the regression line.
- Nearer to 1 is strong positive linear correlation.
- Nearer to -1 is strong negative linear correlation.
- Exercise: Calculate the correlation coefficient r for Example A and interpret the result.
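The data table for Example A is not reproduced in this text, so the following is only a sketch with made-up hours and marks. It computes r from the usual product moment formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$; the function name `pearson_r` and all data values are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Product moment correlation coefficient of the paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (NOT the Example A table): hours studied and marks achieved.
hours = [2, 4, 6, 8, 10]
marks = [50, 55, 65, 70, 80]

r = pearson_r(hours, marks)
print(f"r = {r:.4f}")  # a value close to 1 suggests strong positive linear correlation
```

With real data, the sign of r gives the direction of the relationship and its distance from 0 gives the strength.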

THE LEAST SQUARES REGRESSION LINE

- A mathematical way of fitting the regression line.
- The line of best fit must pass through the means of both sets of data, i.e. the point $(\bar{x}, \bar{y})$.
- Least squares regression line of y on x: $\hat{y} = b_0 + b_1 x$
- Exercise: Find and draw the regression line for Example A.
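As a minimal sketch of the least squares fit, assuming the standard formulas $b_1 = S_{xy} / S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$ (the slide's own formulas are not preserved in this text), the code below fits the line to made-up data and checks that it passes through $(\bar{x}, \bar{y})$. The name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least squares regression line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept, forces the line through the means
    return b0, b1

# Hypothetical data (NOT the Example A table).
hours = [2, 4, 6, 8, 10]
marks = [50, 55, 65, 70, 80]

b0, b1 = fit_line(hours, marks)
mean_hours = sum(hours) / len(hours)
mean_marks = sum(marks) / len(marks)
print(f"y_hat = {b0} + {b1} x")
# The fitted value at the mean of x equals the mean of y.
print(abs((b0 + b1 * mean_hours) - mean_marks) < 1e-9)
```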

6.2 Relationship Test and Prediction in Simple Linear Regression Analysis

- OBJECTIVE
- Test the significance of the regression slope.
- Predict and estimate a new y value from the regression equation.

RELATIONSHIP TESTS AND PREDICTION IN SIMPLE LINEAR REGRESSION ANALYSIS

- HYPOTHESIS TESTING FOR THE SLOPE OF REGRESSION LINE
- ESTIMATION AND PREDICTION

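1. HYPOTHESIS TESTING FOR THE SLOPE OF REGRESSION LINE

- To test the linear relationship between x and y.
- x and y have a linear relationship if the slope $\beta_1 \neq 0$.
- Test the hypothesis $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$ with the test statistic $t = b_1 / s_{b_1}$, where $b_1$ is the estimated slope and $s_{b_1}$ is its standard error.
- If $H_0$ is rejected, x and y have a linear relationship.
- Exercise: Test the linearity between x and y for Example A at significance level $\alpha$.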
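The slope test can be sketched as follows, assuming the usual t statistic $t = b_1 / s_{b_1}$ with $s_{b_1} = \sqrt{SSE / (n - 2)} / \sqrt{S_{xx}}$ and $n - 2$ degrees of freedom. The data are made up (the Example A table is not available here), `slope_t_statistic` is a hypothetical helper, and the critical value 3.182 is the two-tailed t-table entry for 3 degrees of freedom at $\alpha = 0.05$.

```python
import math

def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: slope = 0 in simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared residuals about the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_b1 = math.sqrt(sse / df) / math.sqrt(sxx)  # standard error of the slope
    return b1 / se_b1, df

# Hypothetical data (NOT the Example A table).
hours = [2, 4, 6, 8, 10]
marks = [50, 55, 65, 70, 80]

t, df = slope_t_statistic(hours, marks)
T_CRITICAL = 3.182  # t-table value for df = 3, two-tailed alpha = 0.05
reject_h0 = abs(t) > T_CRITICAL
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

For other sample sizes the critical value has to be looked up again for the matching degrees of freedom.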

2. ESTIMATION AND PREDICTION

- The regression line of y on x is used to make predictions when there is a linear correlation between x and y. Use it in the following cases:
- When x is the independent (controlled) variable and you want to estimate y for a given value of x, or estimate x for a given value of y.
- When neither variable is controlled and you want to estimate y for a given value of x.

Guidelines for using the regression equation

- If there is no linear correlation, don't use the regression equation to make predictions.
- When using the regression equation for predictions, stay within the scope of the available sample data.
- A regression equation based on old data is not necessarily valid now.
- Don't make predictions about a population that is different from the population from which the sample data were drawn.

Exercise

- Use Example A to find
- the estimate of y when x = 10 hours
- the estimate of x when y = 75 marks
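Both kinds of estimate can be sketched from a single y-on-x line. The coefficients below are assumptions (they come from made-up data, not from Example A), as are the helper names and the sample range of 2 to 10 hours; the range check reflects the guideline about staying within the scope of the sample data.

```python
# Hypothetical fitted line y_hat = B0 + B1 * x (NOT the Example A result).
B0, B1 = 41.5, 3.75
X_MIN, X_MAX = 2, 10  # assumed range of x in the sample data

def predict_y(x):
    """Estimate y for a given x, refusing to extrapolate beyond the sample."""
    if not X_MIN <= x <= X_MAX:
        raise ValueError("x is outside the range of the sample data")
    return B0 + B1 * x

def estimate_x(y):
    """Estimate x for a given y by inverting the y-on-x line (x controlled)."""
    return (y - B0) / B1

y_at_10 = predict_y(10)
x_at_75 = estimate_x(75)
print(f"estimated mark after 10 hours: {y_at_10}")
print(f"estimated hours for a mark of 75: {x_at_75:.2f}")
```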

EXAMPLE B

- A study is done to see whether there is a relationship between a mother's age and the number of children she has. The data are shown here.
- Plot a scatter diagram to illustrate the data.
- Compute the value of the correlation coefficient, r, and comment on the relationship between the value of r and the scatter plot.
- Find the equation of the regression line of y on x. Then predict the number of children of a mother whose age is 34.
- Test the linearity between x and y when $\alpha = 0.05$.

SOLVE SIMPLE LINEAR REGRESSION BY EXCEL

- Excel: key in the data
- Tools > Data Analysis > Regression > enter the data ranges (y and x) > OK

Computer Output - Excel

- Strong positive linear correlation
- x and y have a linear relationship (based on the P-value)

6.3 Multiple Linear Regression Analysis and Correlation

- OBJECTIVE
- Describe linear relationships involving more than two variables.
- Interpret the computer output for multiple linear regression analysis and make predictions.

MULTIPLE LINEAR REGRESSION EQUATION

- A multiple regression equation is used to describe linear relationships involving more than two variables.
- A multiple linear regression equation expresses a linear relationship between a response variable y and two or more predictor variables (x1, x2, ..., xk). The general form of a multiple regression equation is $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$.
- A multiple linear regression equation identifies the plane that gives the best fit to the data.

Notation Multiple regression equation

Examples of real situation

- A manufacturer of jams wants to know where to direct its marketing efforts when introducing a new flavour. Regression analysis can be used to help determine the profile of heavy users of jams. For instance, a company might predict the number of flavours of jam a household might have at any one time on the basis of a number of independent variables such as the number of children living at home, age of children, gender of children, income and time spent on shopping.

Examples of real situation

- Many companies use regression to study market segments to determine which variables seem to have an impact on market share, purchase frequency, product ownership, and product brand loyalty, as well as many other areas.

Examples of real situation

- Personnel directors explore the relationships of employee salary levels to geographic location, unemployment rates, industry growth, union membership, industry type, or competitive salaries.
- Financial analysts look for causes of high stock prices by analysing dividend yields, earnings per share, stock splits, consumer expectation of interest rates, savings levels and inflation rates.

Examples of real situation

- Medical researchers use regression analysis to seek links between blood pressure and independent variables such as age, social class, weight, smoking habits and race.
- Doctors explore the impact of communications, number of contacts, and age of patient on patient satisfaction with service.

Computing the Multiple Linear Regression Equation

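- By using the least squares method, the multiple linear regression equation is given by $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$,
- where $b_0, b_1, \dots, b_k$ are the estimated regression coefficients, chosen to minimise the sum of squared residuals.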
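One common way to obtain the least squares coefficients is to solve the normal equations $(X^\top X)\,b = X^\top y$, where X has a leading column of ones. The sketch below does this in plain Python with Gauss-Jordan elimination. It is an illustration under assumptions, not the method shown on the original slide: `fit_multiple_regression` is a hypothetical helper, the data are made up to lie exactly on the plane $y = 1 + 2x_1 + 3x_2$, and the predictors are assumed not to be perfectly collinear.

```python
def fit_multiple_regression(rows, y):
    """Return [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y."""
    X = [[1.0, *row] for row in rows]          # prepend the intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(xr[i] * xr[j] for xr in X) for j in range(p)]
        + [sum(xr[i] * yi for xr, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no noise.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [9, 8, 19, 18, 29]

coefficients = fit_multiple_regression(rows, y)
print([round(b, 6) for b in coefficients])  # recovers b0, b1, b2 of the plane
```

In practice a spreadsheet or a statistics library does this step; the sketch only shows what "least squares" means computationally.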

EXAMPLE C

- Assume that a sales manager of Tackey Toys needs to predict sales of Tackey products in a selected market area. He believes that advertising expenditures and the population in each market area can be used to predict sales. He gathered a sample of toy sales, advertising expenditures and population as below. Find the multiple linear regression equation that best fits the data.

Example, cont

Solution

- Since we have 2 independent variables, the multiple regression equation is given by $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$.

SOLVE MULTIPLE LINEAR REGRESSION BY EXCEL

- Excel: key in the data
- Tools > Data Analysis > Regression > enter the data ranges (y and x) > OK

Computer Output Microsoft Excel

Interpreting the Values in the Equation

- b0 = 6.3972
- The value of the estimated y when x1 and x2 are both zero.
- b1 = 20.4921
- When the population (in thousands) is held constant, the estimated toy sales increase by 20.4921 thousand dollars for each 1,000 dollars of advertising expenditure.
- b2 = 0.2805
- When the advertising expenditure (in thousands of dollars) is held constant, the estimated toy sales increase by 0.2805 thousand dollars for each 1,000 people in the population.

Making preliminary predictions with the multiple regression equation

- Assume that the sales manager needs a sales forecast for a market area. Tackey Toys has recently spent 4,000 dollars on advertising in this market, which has a population of 500,000 people. With x1 = 4 and x2 = 500 (both in thousands), the point estimate of toy sales is given by $\hat{y} = 6.3972 + 20.4921(4) + 0.2805(500) = 228.6156$ thousand dollars.
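The point estimate can be checked with a few lines of code. The coefficients are the ones quoted in the interpretation above; the function name `predict_sales` and its parameter names are hypothetical, and both inputs are expressed in thousands to match the units of b1 and b2.

```python
# Estimated coefficients quoted from the Excel output for Example C.
B0, B1, B2 = 6.3972, 20.4921, 0.2805

def predict_sales(advertising_thousands, population_thousands):
    """Point estimate of toy sales, in thousands of dollars."""
    return B0 + B1 * advertising_thousands + B2 * population_thousands

# 4,000 dollars of advertising and 500,000 people, both in thousands.
estimate = predict_sales(4, 500)
print(f"estimated sales: {estimate:.4f} thousand dollars")
```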

The Coefficient of Multiple Determination

- Measures the percentage of variation in the y variable associated with the use of the set of x variables.
- A percentage that shows the variation in the y variable that is explained by its relation to the combination of x1 and x2.

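$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$, where $SSE = \sum_t (y_t - \hat{y}_t)^2$ and $SST = \sum_t (y_t - \bar{y})^2$.

$R^2 = 0$ when all $b_1 = b_2 = \dots = b_k = 0$.

$R^2 = 1$ when all observations fall directly on the fitted response surface, i.e. when $y_t = \hat{y}_t$ for all t (the regression equation is good).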

Computer Output Microsoft Excel

97.4% of the variation in Tackey Toys sales in the market area is explained by advertising expenditures and population size.

Multiple R and Adjusted R²

The coefficient of multiple correlation R is the positive square root of R².

The adjusted coefficient of determination is the multiple coefficient of determination R² modified to account for the number of variables and the sample size.

When we compare a multiple regression equation to others, it is better to use the adjusted R².
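A minimal sketch of these quantities, assuming the standard definitions $R^2 = 1 - SSE/SST$ and adjusted $R^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k - 1}$ for n observations and k predictors. The observed and fitted values below are made up for illustration (they are not the Tackey Toys data), and the function names are hypothetical.

```python
import math

def r_squared(y, y_hat):
    """Coefficient of multiple determination from observed and fitted values."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """R^2 penalised for the number of predictors k and the sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observed and fitted values, assumed to come from a model
# with k = 2 predictors.
y = [50, 55, 65, 70, 80]
y_hat = [49, 56.5, 64, 71.5, 79]
k = 2

r2 = r_squared(y, y_hat)
multiple_r = math.sqrt(r2)
adj_r2 = adjusted_r_squared(r2, len(y), k)
print(f"R^2 = {r2:.4f}, multiple R = {multiple_r:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

The adjusted value is never larger than R², and the gap widens as more predictors are added for the same sample size.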

Standard error and ANOVA test

- Standard error
- Measures the extent of the scatter, or dispersion, of the sample data points about the multiple regression plane.
- ANOVA test
- H0: neither of the independent variables is related to the dependent variable ($\beta_1 = \beta_2 = 0$)
- H1: at least one of the independent variables is related to the dependent variable ($\beta_1$ or $\beta_2$ or both $\neq 0$)
- Reject H0 if significance F < $\alpha$
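The ANOVA decision can be sketched as follows, assuming the usual statistic $F = \dfrac{SSR / k}{SSE / (n - k - 1)}$. For exactly two predictors the upper-tail probability of the F distribution has the closed form $(1 + 2F/d)^{-d/2}$ with $d = n - k - 1$, which is used here in place of a statistics library; the formula does not apply for other values of k. The observed and fitted values are made up, and the function names are hypothetical.

```python
def overall_f_statistic(y, y_hat, k):
    """Return (F, residual df) for the overall significance of the regression."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sst - sse  # explained variation, assuming a least squares fit
    df_resid = n - k - 1
    return (ssr / k) / (sse / df_resid), df_resid

def significance_f_two_predictors(f_stat, df_resid):
    """P(F > f_stat) for an F(2, df_resid) distribution; valid only for k = 2."""
    return (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)

# Hypothetical observed and fitted values from a model with k = 2 predictors.
y = [50, 55, 65, 70, 80]
y_hat = [49, 56.5, 64, 71.5, 79]

f_stat, df_resid = overall_f_statistic(y, y_hat, k=2)
sig_f = significance_f_two_predictors(f_stat, df_resid)
ALPHA = 0.05
print(f"F = {f_stat:.2f}, significance F = {sig_f:.4f}, reject H0: {sig_f < ALPHA}")
```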

6.4 Model Selection

- OBJECTIVE
- Select the best multiple linear regression model for any given data set.

TIPS: Model Selection in a simple way

- Use common sense and practical considerations to include or exclude variables.
- Consider the P-value displayed by the computer output (the significance F value, which measures the overall significance of the multiple regression equation). The smaller the better.
- Consider equations with high values of adjusted R² and try to include only a few variables.
- Find the linear correlation coefficient r for each pair of variables being considered. If two predictor variables have a very high r, there is no need to include them both. Exclude the variable with the lower value of r.
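The last tip can be sketched in code. Two things here are assumptions rather than part of the tips: the cut-off of 0.9 for "very high r", and the reading of "the lower value of r" as the weaker correlation with the response y. The data and the names `x1`, `x2`, `x3` are made up for illustration.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Product moment correlation coefficient of the paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors and response.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [1, 2, 3, 4, 6],   # nearly a copy of x1
    "x3": [3, 1, 4, 1, 5],
}
y = [2, 4, 5, 4, 5]
THRESHOLD = 0.9  # assumed cut-off for "very high r"

redundant = []
for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
    if abs(pearson_r(a, b)) > THRESHOLD:
        # Of the two overlapping predictors, flag the one less correlated with y.
        weaker = min((name_a, name_b),
                     key=lambda name: abs(pearson_r(predictors[name], y)))
        redundant.append(weaker)

print("candidates to exclude:", redundant)
```

Such a check only flags candidates; the first tip about common sense and practical considerations still applies before dropping a variable.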

EXAMPLE D

- The following table summarizes the multiple regression analysis in which the response variable (y) is weight (in pounds) and the predictor (x) variables are H (height in inches), W (waist circumference in cm), and C (cholesterol in mg).

Example D, cont

- If only one predictor variable is used to predict weight, which single variable is best? Why?
- If exactly two predictor variables are used to predict weight, which two variables should be chosen? Why?
- Which regression equation is best for predicting weight? Why?

CONCLUSION

- This chapter introduces important methods (regression) for making inferences about a relationship between two or more variables and for describing such a relationship with an equation that can be used to predict the value of one variable given the values of the other variables.

Thank You