Loading...

PPT – Regression Analysis: Estimating Relationships PowerPoint presentation | free to view - id: 119cde-NTcwN

The Adobe Flash plugin is needed to view this content

Regression Analysis and Modeling for Decision

Support

Regression Analysis Estimating Models for

Decision Support

Regression Analysis is a study of relationship

between a set of independent variables and the

dependent variable. Independent variables are

characteristics that can be measured directly

(example the area of a house). These variables

are also caled predictor variables (used to

predict the dependent variable) or explanatory

variables (used to explain the behavior of the

dependent variable). Dependent variable is a

characteristic whose value depends on the values

of independent variables.

Purpose of Regression Analysis

Future/Unknown

Past / Experience / Known

Now

time

ExplanationUse regression analysis to develop a

mathematical model to explain the variance in the

dependent variable based on values of independent

variables.

Prediction If the regression model adequately

explains the dependent variable, use the model to

predict values of the dependent variable.

Explain Selling Price of a house (dependent)

based on its characteristics (independents). If

the model is valid, use it for prediction.

Develop Regression Model using known data

(sample) Selling Price 40,000 100(Sq.ft)

20,000(Baths) If the above model is reliable and

valid, Use this model to predict the Selling

Price of any house based on its area (Sq.ft.) and

the number of bathrooms (Baths) The constant

term (40,000) is the fixed price of the house.

This is not dependent on the values of the

variables considered. Can be interpreted as the

price of the lot and transaction costs. The

coefficient of Sq.ft. (100) is the change in

Selling Price for an additional Square Foot. Can

be interpreted as Price per Sq.Foot.

Procedure for Building Regression Models

Define/Clarify Purpose. Identify, graph and

describe the measurement of the dependent

variable. Specific y

Identify possible independent variables

(predictors should make sense). Describe each

X. Use scatter plots and correlations for y with

each x.

Estimate Regression Coefficients (using least

squares method).

Test to see if all coefficients are significant

(reliability). Establish validity (are

relationships as expected, do predictions match

actuals?).

Implement the model in Decision Support System.

Incorporate error in predictions. Outline

limitations/constraints of the model.

Compare predictions with actual values.

Modify/Refine/Expand model if necessary.

Selecting Independent Variables Scatter Plots

Scatter Plots are used to visualize the

relationship between any two variables. For

regression analysis, we are looking for strong

linear relationship between the independent and

dependent variable.

Y-Intercept (Constant) Value of the dependent

variable irrespective of the value(s) of the

independent variable(s).

X-Coefficient (Slope) Change in dependent

variable per unit change in independent variable.

Overhead 3996 43 M_Hrs 883 Runs

R-Squared Proportion of variance in dependent

variable explained by independent variable(s).

Selecting Independent Variables Correlation

Analysis

Correlation Coefficients are used to measure the

linear relationship between any two variables.

For regression analysis, we are looking for

strong linear relationship between the

independent and dependent variable, and low

correlations among independent variables .

Correlation of MachHrs with Overhead (should be

high)

Correlation of MachHrs with ProdRuns (should be

low)

Correlation of ProdRuns with Overhead (should

be high)

Multicollinearity exists when two independent

variables are highly correlated (redundancy).

Simple Linear Regression

Linear regression function One dependent and one

independent variables Mathematical form Y

b0 b1X e b0 and b1 are parameters (unknown

constants) and their values are estimated from a

known sample of X and corresponding Y.

Estimated Model Y-Pred b0 b1X

Y-actual

e

Y-pred

b0 and b1 are estimates (based on a sample) of

B0 and B1 which are parameters (based on

population) Estimation of b0 and b1

(coefficients) is done by the Least Squares

Method. This method selects the line that has

the smallest squared error

B1 slope

B0 y -intercept

X

Example of Simple Linear Regression Defining

Objective(s)

Define Objectives

- Pharmex is a chain of drugstores that operates

around the country. - To see how effective their advertising and other

promotional activities are, the company has

collected data from 50 randomly selected

metropolitan regions. - In each region it has compared its own

promotional expenditures and sales to those of

the leading competitor in the region over the

past year. - So, Pharmexs objective is to model the

relationship between Promotion expenditures and

Sales - Since Pharmex is interested in improving its

sales, relative to its largest competitor, the

dependent (outcome) variable for this situation

is - Sales Pharmexs sales as a percentage of those

of the leading competitor. This is the dependent

(or predicted) variable.

Example of SLR Select Independent Variable

Variable Selection

The company expects that there is a positive

relationship between the Relative measures of

Sales and Promotion Expenditures, so that regions

with relatively more expenditures have relatively

more sales.

Promote Pharmexs promotional expenditures as a

percentage of those of the leading competitor.

This is the independent variable (or predictor

variable), one which can be controlled by

Pharmex.

- Selection Criteria
- Based in Common Sense and Experience
- Scatter Plots and Correlations

Description of Variables List each variable, how

measured, and expected relationship with

dependent variable. In this section report

results of Correlation Analyses, Scatter Plots,

etc.

Example of SLR Collect and Organize Data

Data Collection

Example of SLR Estimate Coefficients

Estimate Model

Regression Procedure in Excel

R-Square 45 of the variance in Sales is

explained by Promote (model)

Estimated Coefficients

Y-intercept (b0) 25.12 Slope (b1) 0.762

Sales-predicted

25.12 0.762 Promote

P-Value Indicates the probability of making a

Type I error (the possibility that the

coefficient is 0, that is there is no

relationship). If this value is greater than .05

do not use the variable as a predictor.

Example of SLR Testing the Model

- Reliability and Validity
- Does the model make intuitive sense? Is the model

easy to understand and interpret? - Are all coefficients statistically significant?

(p-values less than .05) - Are the signs associated with the coefficients as

expected? - Does the model predict values that are reasonably

close to the actual values? - Is the model sufficiently sound? (High R-square,

low standard error, etc.)

Example of SLR Implementing and Using the Model

Develop a Spreadsheet Model (Decision Support

System)

Estimated

Decision Variable

Forecast (regression formula)

What-if Pharmex spent 160K on promotions?

(Sensitivity analysis) What will Pharmex have to

do to achieve 20 sales more than its competitor?

(goal seeking) What will happen to Pharmexs

sales if its Competitors promotion can be any

value between 130K and 140K? (Monte-Carlo

Simulation)

- Estimating Demand for a Product

Conceptual Structure of Demand Model

Firm Demand Total Industry Demand Market

Share Define Market Share Firm Demand / Total

Industry Demand Market Share Firm

Demand / (Avg. Demand Number of Firms)

Market Share Relative Demand / Number of

Firms Firm Demand Total Industry Demand

(Relative Demand / N)

Macro-economic Influences Seasonal

Patterns Stage of Life Cycle

Exogenous Demand

Industry Activity Pricing, Promotion,

Quality

Endogenous Demand

FD

Competitive Profile Relative Pricing,

Promotion, Quality, and Loyalty

Relative Demand

Total Industry Demand

Relative Demand(Measure of Market Share)

RD is firm specific and a measure of market

share, the predictor variables should also be

relative to industry averages. For example,

relative price of the firm is PREL Firms

Price / Industry Avg. Price

Estimating Demand

- Situation Overview

- Objectives of Analysis

Objectives

- Obtaining reliable forecasts for the Firm Demand
- Build a more dependable model based on regression

analysis - Make operations more efficient based on more

reliable forecasts - Monitor patterns in the overall demand for the

industry

- Defining the Data

Descriptions of the Variables

- FD Firm Demand
- Demand for our Firms product
- TID Total Industry Demand
- Includes demand for all competitors in the market
- MS Market Share
- Firms percentage share of the market
- N Number of Competitors
- AFD Average Demand
- Average Demand for all competitors in the market
- Avg. Demand Total Industry Demand / N
- RD - Relative Demand
- The firms demand relative to the average demand

of the market - Relative Demand Firm Demand / Avg. Demand

Descriptions of the Variables (continued)

- Firm Demand Total Industry Demand Market

Share - Market Share Firm Demand / Total Industry

Demand - Market Share Firm Demand / (Avg. Demand N)
- Market Share Relative Demand / N
- Hence
- Firm Demand Total Industry Demand (Relative

Demand / N)

Predictor Variables for Total Industry Demand

- Total Industry Demand is a factor of individual

company demand - Demand for each company depends on macro-economic

influences and overall industry trends - Average Price Indicates industry trends in

pricing, and is reasonable to use instead of

individual competitor prices - Average Advertising Indicates industry

expenditures in promotions, marketing - In this case, also factors in RD

Predictor Variables for Relative Demand

- Relative Demand is a measure of the firms market

share - Hence, the predictor variables should also be

relative to Industry Average - PREL Relative Price
- PREL Firm Price / Average Price
- AREL Relative Advertising
- AREL Firm Advertising / Average Advertising
- RD1 Relative Loyalty
- This is estimated for the analysis by using the

Relative Demand from the previous quarter

- Description of Variables

TID Summary Measures

TID Describing Avg. Price with Graphs

- Values seem to be trending down over time
- Relatively stable - In the 375 range for the 19

quarters observed - Median is to the right of the Mean in the

Box-Plot - Indicates negative skewness

TID Describing Avg. Advertising with Graphs

- Some indications of seasonality from the

time-series plot - Values change significantly with time
- Median is almost the same as the mean
- Slight negative skewness exists

TID Describing Total Industry Demand with Graphs

- Time series plot indicates a steadily increasing

demand for the product - Short term increases are steep
- Mean is to the right of the Median
- Indicates positive skewness

TID Correlations Matrix

- Decent correlation between Quarter and TID
- Consider Quarter for regression analysis
- High correlation between Avg Price and TID
- Good candidate for regression analysis
- High correlation exists between Avg Advertising

and TID - Good candidate for regression analysis
- Significant correlation between Avg Price, Avg

Advertising and Quarter - Potential candidates for variable exclusion

during regression analysis

RD Scatter Plot Arel vs RD

- Trendline indicates a positive relationship
- A correlation factor of .378 indicating less

correlation between the two variables - Arel is a potential candidate to be discarded

RD Scatter Plot Prel vs RD

- Trendline indicates a negative relationship
- Has a correlation factor of -0.67 indicating a

fair amount of correlation between the two

variables

RD Scatter Plot RD1 vs RD

- Trendline indicates a positive relationship
- Has a correlation factor of 0.711indicating a

fair amount of correlation between the two

variables

RD Correlations Matrix

- RD has a reasonably high correlation to Prel and

RD1 - RD has a relatively lower correlation to Arel
- The variable could be discarded during regression

analysis - The correlations between Prel, Arel, and RD1 are

low enough to indicate that there will not be

problems related to multicollinearity

- Analysis and Modeling

TID Model Regression Analysis (1)

- Dependent Variable
- Total Industry Demand
- Independent Variable
- Quarter
- Resulting Equation
- TID 14218.0702 (645.9825 Quarter)
- The R2 for the resulting equation indicates that

the variable Quarter explains 48 of TID, and

hence the unexplained value is 52 - This makes for a very poor model despite the

p-value being very low which implies that the

probability of a Type 1 error is minimal

TID Model Regression Analysis (2)

- Dependent Variable
- Total Industry Demand
- Independent Variables
- Quarter, Avg. Price, Avg. Advertising
- Resulting Equation
- TID 130249 (132.228 Quarter) (-358.613

Avg Price) (0.263 Avg. Advertising) - The R2 for the resulting equation indicates that

the model explains 90.69 of TID, and hence the

unexplained value is about 9, this generally

signifies a good model - p-value is within tolerance levels for all

variables, except Quarter which has a p-value of

0.21 and includes the 0 value in its range. Hence

Quarter must be discarded

TID Model Regression Analysis (3)

- Dependent Variable
- Total Industry Demand
- Independent Variables
- Avg. Price, Avg. Advertising
- Resulting Equation
- TID 164336.17 (-445.168 Avg Price) (0.262

Avg. Advertising) - The R2 for the resulting equation indicates that

the model explains 89.67 of TID, and hence the

unexplained value is about 10.5 - p-value is within tolerance levels for all

variables, and none of the variables include the

0 value in its range - This model is good, and is the model that will be

used for TID

RD Model Regression Analysis (3)

- Dependent Variable
- Relative Demand
- Independent Variables
- Relative Pricing (Prel), Relative Advertising

(Arel), Loyalty (RD1) - Resulting Equation
- RD 16.13 (-16.445 Prel) (0.779 Arel)

(0.533 RD1) - The R2 for the resulting equation indicates that

the model explains 95.75 of RD, and hence the

unexplained value is about 4.25 - p-value is within tolerance levels for all

variables, and none of the variables include the

0 value in its range - This model is very good, and is the model that

will be used for RD

- Verifying the Model

Verification of TID Model

Verification of RD Model

- Using the Model

DSS Forecasting Firm Demand

- The coefficients from the TID model and the RD

model are entered into this model - Based on estimated values for the various items

on the left, the Firm Demand is calculated by

this model