Model Building

- Chapter 5

Why Model Building is Important

- By model building, we mean writing a model that will provide a good fit to a set of data and that will give good estimates of the mean value of y and good predictions of future values of y.
- The goodness of fit of the model is measured by the coefficient of determination, $R^2$.
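A minimal sketch of the $R^2$ arithmetic, $R^2 = 1 - SSE/SS_{yy}$, assuming the fitted values come from a least-squares fit that includes an intercept. The observed and fitted values below are hypothetical and only illustrate the formula.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - SSE / SS_yy."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual (error) sum of squares
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)             # total sum of squares about y-bar
    return 1 - sse / ss_yy

# Hypothetical observed and fitted values
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]
print(round(r_squared(y, y_hat), 3))  # 0.95
```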

The Two Types of Independent Variables

Quantitative and Qualitative

- Quantitative variable
- Qualitative variable

Definition 5.1

- The different values of an independent variable used in regression are called its levels.

A pth-Order Polynomial with One Independent Variable

$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$$

- where p is an integer and $\beta_0, \beta_1, \ldots, \beta_p$ are unknown parameters that must be estimated.
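The pth-order model is a polynomial in x but is linear in the $\beta$ parameters, so it can be fit by ordinary least squares: each observation contributes one design-matrix row $[1, x, x^2, \ldots, x^p]$. A minimal sketch; the function names and coefficients are illustrative only.

```python
def poly_mean(betas, x):
    """E(y) = beta_0 + beta_1*x + ... + beta_p*x**p, with p = len(betas) - 1."""
    return sum(b * x ** j for j, b in enumerate(betas))

def design_row(x, p):
    """Row [1, x, x**2, ..., x**p] of the regression design matrix."""
    return [x ** j for j in range(p + 1)]

betas = [1.0, 2.0, 3.0]        # hypothetical second-order (p = 2) model
print(poly_mean(betas, 2.0))   # 1 + 2*2 + 3*4 = 17.0
print(design_row(2.0, 2))      # [1.0, 2.0, 4.0]
```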

First-Order (Straight-Line) Model with One Independent Variable

$$E(y) = \beta_0 + \beta_1 x$$

- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when x = 0
- $\beta_1$: slope of the line, the change in E(y) for a 1-unit increase in x
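A minimal least-squares sketch for the straight-line model on hypothetical data, using the closed-form estimates $\hat\beta_1 = SS_{xy}/SS_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The last line shows that a 1-unit increase in x changes the fitted mean by $\hat\beta_1$.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for E(y) = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx        # slope: change in E(y) per 1-unit increase in x
    b0 = y_bar - b1 * x_bar   # intercept: fitted E(y) at x = 0
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]   # hypothetical data
b0, b1 = fit_line(x, y)
print(b0, b1)
print((b0 + b1 * 4) - (b0 + b1 * 3))  # equals b1 up to floating-point rounding
```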

A Second-Order (Quadratic) Model with One Independent Variable

$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$$

- where $\beta_0$, $\beta_1$, and $\beta_2$ are unknown parameters that must be estimated.
- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when x = 0
- $\beta_1$: shift parameter; changing the value of $\beta_1$ shifts the parabola to the right or left. The turning point is at $x = -\beta_1/(2\beta_2)$, so increasing $\beta_1$ shifts the parabola to the left when $\beta_2 > 0$ and to the right when $\beta_2 < 0$.
- $\beta_2$: rate of curvature
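A small check of the shift parameter: the turning point of $\beta_0 + \beta_1 x + \beta_2 x^2$ is at $x = -\beta_1/(2\beta_2)$. The coefficients below are hypothetical. They show that the direction of the shift depends on the sign of $\beta_2$.

```python
def vertex_x(b1, b2):
    """x-coordinate of the turning point of E(y) = b0 + b1*x + b2*x**2 (b2 != 0)."""
    return -b1 / (2 * b2)

# Opens upward (b2 > 0): increasing b1 moves the vertex left
print(vertex_x(-4.0, 1.0), vertex_x(-2.0, 1.0))   # 2.0 1.0
# Opens downward (b2 < 0): increasing b1 moves the vertex right
print(vertex_x(4.0, -1.0), vertex_x(6.0, -1.0))   # 2.0 3.0
```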

Graphs for Two Second-Order Polynomial Models

[Figure panels: $\beta_2 > 0$ and $\beta_2 < 0$]

Example of the Use of a Quadratic Model

What happens out here?

Third-Order Model with One Independent Variable

$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$

- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when x = 0
- $\beta_1$: shift parameter (shifts the polynomial right or left on the x-axis)
- $\beta_2$: rate of curvature
- $\beta_3$: the magnitude of $\beta_3$ controls the rate of reversal of curvature for the polynomial
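For the third-order model the curvature (second derivative) is $2\beta_2 + 6\beta_3 x$. It changes sign at $x = -\beta_2/(3\beta_3)$, and its rate of change is $6\beta_3$, which is why $|\beta_3|$ governs how fast the curvature reverses. A sketch with hypothetical coefficients:

```python
def curvature(b2, b3, x):
    """Second derivative of E(y) = b0 + b1*x + b2*x**2 + b3*x**3."""
    return 2 * b2 + 6 * b3 * x

def inflection_x(b2, b3):
    """Point where the curvature changes sign (requires b3 != 0)."""
    return -b2 / (3 * b3)

b2, b3 = -3.0, 0.5          # hypothetical coefficients
x0 = inflection_x(b2, b3)
print(x0, curvature(b2, b3, x0 - 1), curvature(b2, b3, x0 + 1))  # 2.0 -3.0 3.0
```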

First-Order Model in k Quantitative Independent Variables

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

- where $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters that must be estimated.
- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when $x_1 = x_2 = \cdots = x_k = 0$
- $\beta_1$: change in E(y) for a 1-unit increase in $x_1$, when $x_2, x_3, \ldots, x_k$ are held fixed
- $\beta_2$: change in E(y) for a 1-unit increase in $x_2$, when $x_1, x_3, \ldots, x_k$ are held fixed
- ...
- $\beta_k$: change in E(y) for a 1-unit increase in $x_k$, when $x_1, x_2, \ldots, x_{k-1}$ are held fixed
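A minimal sketch of fitting the first-order model in k variables by solving the normal equations $X'X\hat\beta = X'y$ with Gaussian elimination (stdlib only). The data are hypothetical and noise-free, so the estimates recover the generating coefficients. Statistical software typically uses more numerically stable methods (for example, QR decomposition); this is only an illustration.

```python
def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

def fit_first_order(xs, y):
    """Least squares for E(y) = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(row) for row in xs]  # leading column of 1s for b0
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical noise-free data generated from E(y) = 1 + 2*x1 - 3*x2
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1 + 2 * a - 3 * b for a, b in xs]
print([round(b, 6) for b in fit_first_order(xs, y)])  # [1.0, 2.0, -3.0]
```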

Graph

Contour Lines

- Plot y versus $x_1$ for different values of $x_2$.
- Plot y versus $x_1$ for $x_2 = 1$
- Plot y versus $x_1$ for $x_2 = 2$
- Plot y versus $x_1$ for $x_2 = 3$
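For the first-order model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, holding $x_2$ fixed gives a line in $x_1$ with intercept $\beta_0 + \beta_2 x_2$ and slope $\beta_1$. The contour lines are therefore parallel and spaced $\beta_2$ apart per unit of $x_2$. A sketch with hypothetical coefficients:

```python
def line_in_x1(b0, b1, b2, x2):
    """(intercept, slope) of E(y) versus x1 when x2 is held fixed."""
    return b0 + b2 * x2, b1

b0, b1, b2 = 1.0, 2.0, -3.0   # hypothetical first-order model
for x2 in (1, 2, 3):
    print(x2, line_in_x1(b0, b1, b2, x2))
# 1 (-2.0, 2.0)
# 2 (-5.0, 2.0)
# 3 (-8.0, 2.0)
```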

Example

Interaction (Second-Order) Model with Two Independent Variables

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when $x_1 = x_2 = 0$
- $\beta_1$ and $\beta_2$: changing $\beta_1$ and $\beta_2$ causes the surface to shift along the $x_1$ and $x_2$ axes
- $\beta_3$: controls the rate of twist in the ruled surface (see Figure 5.10)

Continued

- When one independent variable is held fixed, the model produces straight lines with the following slopes:
- $\beta_1 + \beta_3 x_2$: change in E(y) for a 1-unit increase in $x_1$, when $x_2$ is held fixed
- $\beta_2 + \beta_3 x_1$: change in E(y) for a 1-unit increase in $x_2$, when $x_1$ is held fixed

Definition 5.2

- Two variables $x_1$ and $x_2$ are said to interact if the change in E(y) for a 1-unit change in $x_1$ (when $x_2$ is held fixed) depends on the value of $x_2$.
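The interaction slopes can be checked numerically: in the interaction model the change in E(y) over one unit of $x_1$ equals $\beta_1 + \beta_3 x_2$, so it depends on $x_2$ (Definition 5.2). The coefficients below are hypothetical.

```python
def slope_x1(b1, b3, x2):
    """Change in E(y) per 1-unit increase in x1, holding x2 fixed."""
    return b1 + b3 * x2

def e_y(b, x1, x2):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

b = (1.0, 2.0, -1.0, 0.5)   # hypothetical (b0, b1, b2, b3)
for x2 in (1, 2, 3):
    diff = e_y(b, 4, x2) - e_y(b, 3, x2)   # one-unit step in x1
    print(x2, slope_x1(b[1], b[3], x2), diff)
# 1 2.5 2.5
# 2 3.0 3.0
# 3 3.5 3.5
```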

Graph

Contours

Complete Second-Order Model with Two Independent Variables

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$$

- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when $x_1 = x_2 = 0$
- $\beta_1$ and $\beta_2$: changing $\beta_1$ and $\beta_2$ causes the surface to shift along the $x_1$ and $x_2$ axes
- $\beta_3$: the value of $\beta_3$ controls the rotation of the surface
- $\beta_4$ and $\beta_5$: the signs and values of these parameters control the type of surface and the rates of curvature
- Three types of surfaces may be produced by a second-order model:
- A paraboloid that opens upward (Figure 5.12a)
- A paraboloid that opens downward (Figure 5.12b)
- A saddle-shaped surface (Figure 5.12c)
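A sketch of how the three surface types can be told apart from the coefficients. The quadratic part is governed by the matrix $\begin{pmatrix} 2\beta_4 & \beta_3 \\ \beta_3 & 2\beta_5 \end{pmatrix}$, whose determinant is $4\beta_4\beta_5 - \beta_3^2$. Note that $\beta_3$ enters the determinant too, so a large $|\beta_3|$ can turn a bowl into a saddle. The test cases are hypothetical.

```python
def surface_type(b3, b4, b5):
    """Classify E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1**2 + b5*x2**2."""
    det = 4 * b4 * b5 - b3 ** 2   # determinant of [[2*b4, b3], [b3, 2*b5]]
    if det > 0:                   # b4 and b5 share a sign here
        return "paraboloid opening upward" if b4 > 0 else "paraboloid opening downward"
    if det < 0:
        return "saddle"
    return "degenerate"           # no single maximum, minimum, or saddle point

print(surface_type(0, 1, 1))    # paraboloid opening upward
print(surface_type(0, -1, -1))  # paraboloid opening downward
print(surface_type(3, 1, 1))    # saddle: the x1*x2 term dominates
```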

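Complete Second-Order Model with Three Quantitative Independent Variables

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2$$

- where $\beta_0, \beta_1, \ldots, \beta_9$ are unknown parameters that must be estimated.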
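The complete second-order model contains an intercept, every linear term, every cross-product, and every square. That is $(k+1)(k+2)/2$ terms for k variables, or 10 ($\beta_0$ to $\beta_9$) for k = 3. A sketch that enumerates them (the term ordering here differs from the equation's ordering):

```python
from itertools import combinations_with_replacement

def second_order_terms(names):
    """Intercept, linear terms, then all cross-products and squares."""
    terms = ["1"] + list(names)
    for a, b in combinations_with_replacement(names, 2):
        terms.append(f"{a}^2" if a == b else f"{a}*{b}")
    return terms

terms = second_order_terms(["x1", "x2", "x3"])
print(len(terms))   # 10
print(terms)  # ['1', 'x1', 'x2', 'x3', 'x1^2', 'x1*x2', 'x1*x3', 'x2^2', 'x2*x3', 'x3^2']
```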

Coding Procedure for Observational Data

- Let
- x = uncoded quantitative independent variable
- u = coded quantitative independent variable
- Then if x takes values $x_1, x_2, \ldots, x_n$ for the n data points in the regression analysis, let

$$u_i = \frac{x_i - \bar{x}}{s_x}$$

- where $s_x$ is the standard deviation of the x values, i.e.,

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
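A minimal sketch of the coding step using the standard-library `statistics` module (`stdev` uses the $n - 1$ divisor, matching $s_x$). The x values are hypothetical. The coded values always have mean 0 and standard deviation 1.

```python
from statistics import mean, stdev

def code_values(xs):
    """u_i = (x_i - x_bar) / s_x, where s_x is the sample standard deviation."""
    x_bar, s_x = mean(xs), stdev(xs)
    return [(x - x_bar) / s_x for x in xs]

x = [10, 20, 30, 40, 50]   # hypothetical uncoded values
u = code_values(x)
print([round(v, 4) for v in u])   # [-1.2649, -0.6325, 0.0, 0.6325, 1.2649]
```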

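Procedure for Writing a Model with One Qualitative Independent Variable at k Levels (A, B, C, D, ...)

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{k-1} x_{k-1}$$

- where $x_i = 1$ if y is observed at level $i + 1$ and $x_i = 0$ otherwise (so $x_1 = 1$ for level B, $x_2 = 1$ for level C, and so on)
- The number of dummy variables for a single qualitative variable is always 1 less than the number of levels for the variable. Then, assuming the base level is A, the mean for each level is

$$\mu_A = \beta_0, \quad \mu_B = \beta_0 + \beta_1, \quad \mu_C = \beta_0 + \beta_2, \quad \mu_D = \beta_0 + \beta_3, \quad \ldots$$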
Continued

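- Interpretations: $\beta_0 = \mu_A$, $\beta_1 = \mu_B - \mu_A$, $\beta_2 = \mu_C - \mu_A$, $\beta_3 = \mu_D - \mu_A$, ...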
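A sketch of dummy coding with base level A, showing that the $\beta$'s are the base mean and the differences from it. The level means are hypothetical. The code rebuilds each level's mean from $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$.

```python
def dummies(level, levels):
    """0/1 dummy variables for one qualitative variable; levels[0] is the base level."""
    return [1 if level == lev else 0 for lev in levels[1:]]

def means_to_betas(means, levels):
    """beta_0 = mu_base; beta_i = mu_(level i+1) - mu_base."""
    base = means[levels[0]]
    return [base] + [means[lev] - base for lev in levels[1:]]

levels = ["A", "B", "C", "D"]
print(dummies("A", levels), dummies("C", levels))   # [0, 0, 0] [0, 1, 0]

means = {"A": 10.0, "B": 12.0, "C": 9.0, "D": 15.0}  # hypothetical level means
betas = means_to_betas(means, levels)                # [10.0, 2.0, -1.0, 5.0]
for lev in levels:
    x = dummies(lev, levels)
    print(lev, betas[0] + sum(b * xi for b, xi in zip(betas[1:], x)))
# A 10.0, B 12.0, C 9.0, D 15.0 (one per line)
```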

Population Means

- Show Setup
- Example Page 280

Table 8.4 Summary of the Sample Results for Five Populations

Multiple t tests

- Null Hypotheses

Analysis of Variance Procedures

- Each of the five populations has a normal distribution.
- The variances of the five populations are equal; that is, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_5^2$.
- The five sets of measurements are independent random samples from their respective populations.

The Null and Alternative Hypotheses

- $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ (i.e., the t population means are equal)
- $H_a$: At least one of the t population means differs from the rest.
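A minimal sketch of the one-way analysis of variance F statistic for $H_0: \mu_1 = \cdots = \mu_t$, computed as the between-sample mean square divided by the within-sample mean square. The three small samples are hypothetical. The resulting F would be compared with an F critical value on $(t-1, n_T-t)$ degrees of freedom, which needs a table or a statistics library.

```python
def one_way_f(groups):
    """F = MSB / MSW with (t - 1, n_T - t) degrees of freedom."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))      # between samples
    ssw = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))   # within samples
    msb, msw = ssb / (t - 1), ssw / (n_total - t)
    return msb / msw, (t - 1, n_total - t)

f, df = one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, df)   # 27.0 (2, 6)
```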

FIGURE 8.5 Distributions of four populations that satisfy AOV assumptions

Model

Main Effect Model with Two Qualitative Independent Variables, One at Three Levels (F1, F2, F3) and the Other at Two Levels (B1, B2)

Interaction Model with Two Qualitative Independent Variables, One at Three Levels (F1, F2, F3) and the Other at Two Levels (B1, B2)
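A sketch of the design-matrix rows for the two models, under an assumed coding (the slides' equations did not survive): $x_1 = 1$ if F2, $x_2 = 1$ if F3, $x_3 = 1$ if B2, with F1 and B1 as base levels. Under this assumption the main effect model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ (4 parameters). The interaction model adds $\beta_4 x_1 x_3 + \beta_5 x_2 x_3$, giving 6 parameters, one per factor-level combination.

```python
from itertools import product

def row(f, b, interaction):
    """Model terms for levels f in {F1, F2, F3} and b in {B1, B2} (assumed dummy coding)."""
    x1, x2, x3 = int(f == "F2"), int(f == "F3"), int(b == "B2")
    terms = [1, x1, x2, x3]                 # intercept and main effects
    if interaction:
        terms += [x1 * x3, x2 * x3]         # F-by-B interaction terms
    return terms

for f, b in product(["F1", "F2", "F3"], ["B1", "B2"]):
    print(f, b, row(f, b, False), row(f, b, True))
```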

Population Means

Factorial Treatment Structure in a Completely Randomized Design

A factorial experiment is an experiment in which the response y is observed at all factor-level combinations of the independent variables.

Population Parameters A by B

Population Parameters 2 by 2

Main Effects

Figure 15.6a Illustration of the Absence of Interaction in a 2 x 2 Factorial Experiment

[Figure: mean response by factor level; factors A and B do not interact]

Figure 15.6b,c Illustration of the Presence of Interaction in a 2 x 2 Factorial Experiment

[Figure: mean response for level 1 and level 2 of factor B; factors A and B interact]

Population Parameters 2 by 2 No Interaction

Population Parameters 2 by 2 Interaction
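In a 2 x 2 factorial, "no interaction" means the effect of B is the same at both levels of A. Equivalently, the contrast $(\mu_{11} - \mu_{12}) - (\mu_{21} - \mu_{22})$ is zero. A sketch with hypothetical cell means:

```python
def interaction_contrast(mu):
    """(mu11 - mu12) - (mu21 - mu22) for a 2 x 2 table mu[i][j] (level i of A, level j of B)."""
    return (mu[0][0] - mu[0][1]) - (mu[1][0] - mu[1][1])

no_int = [[10, 14], [20, 24]]     # hypothetical: B changes the mean by 4 at both levels of A
with_int = [[10, 14], [20, 18]]   # hypothetical: the effect of B reverses across levels of A
print(interaction_contrast(no_int), interaction_contrast(with_int))   # 0 -6
```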

Engine Performance Example

Graph of sample means for the engine performance example

Pattern of the Model Relating E(y) to k Qualitative Independent Variables

- Main effect terms for all independent variables
- All two-way interaction terms between pairs of independent variables
- All three-way interaction terms between different groups of three independent variables
- All k-way interaction terms for the k independent variables
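The pattern above amounts to taking every nonempty subset of the k factors, which gives $2^k - 1$ effects. In this sketch each string names an effect, and in the model each effect stands for the whole group of dummy-variable products for that effect. The factor names are hypothetical.

```python
from itertools import combinations

def factorial_terms(factors):
    """Main effects plus all 2-way, ..., k-way interaction terms for k factors."""
    terms = []
    for size in range(1, len(factors) + 1):
        terms += ["*".join(c) for c in combinations(factors, size)]
    return terms

print(factorial_terms(["A", "B", "C"]))
# ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
```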

Models with Both Quantitative and Qualitative Independent Variables

- Perhaps the most interesting data analysis problems are those that involve both quantitative and qualitative independent variables. For example, suppose mean performance of a diesel engine is a function of one qualitative independent variable, fuel type, at levels F1, F2, and F3, and one quantitative independent variable, engine speed in revolutions per minute (rpm). We will build a model in stages, showing graphically how the model is interpreted at each stage. This shows the contribution of the various terms in the model.

Analysis of Covariance

Example 16.14

Simple Model

[Figure: response versus covariate, one line per factor level, with a common slope]

Simple Model

Hypothesis testing

- Simple Model

SPSS

SPSS Simple Model

SPSS - Simple

What is being tested?

Estimates

eq1: Predicted sales = 17.368 + .899 prev_sales

eq2: Predicted sales = 12.292 + .899 prev_sales

eq3: Predicted sales = 4.391 + .899 prev_sales
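The three prediction equations share the slope .899, so the lines are parallel. The gap between any two groups is the difference in intercepts at every value of the covariate. This sketch assumes the operator lost in extraction is "+" (intercept + .899 × prev_sales); the prev_sales values are hypothetical.

```python
# Simple (common-slope) model estimates, read as intercept + slope * prev_sales
INTERCEPTS = {"eq1": 17.368, "eq2": 12.292, "eq3": 4.391}
SLOPE = 0.899

def predicted_sales(eq, prev_sales):
    """Predicted sales from the common-slope prediction equation for group `eq`."""
    return INTERCEPTS[eq] + SLOPE * prev_sales

for prev in (50, 100):   # hypothetical covariate values
    p = {eq: predicted_sales(eq, prev) for eq in INTERCEPTS}
    print(prev, round(p["eq1"] - p["eq2"], 3), round(p["eq2"] - p["eq3"], 3))
# 50 5.076 7.901
# 100 5.076 7.901
```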

More Complex Model

[Figure: response versus covariate, one line per factor level, with different slopes]

Complex Model

Hypothesis testing

- Complex Model

SPSS - Complex

SPSS - Complex

What is being tested?

SPSS Complex

What are the prediction equations?

Which model is appropriate?

- Simple?
- Complex?
- We do not know at this point.

We need to test.

L Matrix

- /lmatrix "betas" all 0 0 0 1 -1 0; all 0 0 0 1 0 -1
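Each row of the L matrix is a linear combination of the parameter estimates, and the test is of $H_0: L\beta = 0$. This sketch assumes, hypothetically, that the six coefficients correspond to [group1, group2, group3, group1*prev_sales, group2*prev_sales, group3*prev_sales]. Under that assumption the two rows compare the group 1 slope with the group 2 and group 3 slopes, so both contrasts are zero exactly when all slopes are equal. Check the parameter order in the SPSS parameter-estimates table before relying on this reading.

```python
# ASSUMED parameter order (hypothetical):
# [group1, group2, group3, group1*prev_sales, group2*prev_sales, group3*prev_sales]
L = [
    [0, 0, 0, 1, -1, 0],   # slope for group 1 minus slope for group 2
    [0, 0, 0, 1, 0, -1],   # slope for group 1 minus slope for group 3
]

def contrasts(L, beta):
    """Return L @ beta; H0: L beta = 0 says every contrast is zero."""
    return [sum(l * b for l, b in zip(row, beta)) for row in L]

common = [1.0, 2.0, 3.0, 0.5, 0.5, 0.5]      # hypothetical: equal slopes
separate = [1.0, 2.0, 3.0, 0.5, 0.25, 1.0]   # hypothetical: unequal slopes
print(contrasts(L, common), contrasts(L, separate))   # [0.0, 0.0] [0.25, -0.5]
```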

Additional topics

- Estimated Marginal Means
- Test at some other X
- RSQ
- Design Matrix

Problem Areas

- Multicollinearity
- Problem Points
- Non-constant variance as a function of the independent variables
- Variable selection

External Model Validation

- Models that fit the sample data well may not be successful predictors of y when applied to new data. For this reason, it is important to assess the validity of the regression model, in addition to its adequacy, before using it in practice.
- Model validation involves an assessment of how the fitted regression model will perform in practice:
- Examining the predicted values
- Examining the estimated model parameters
- Collecting new data for prediction
- Data-splitting (cross validation)
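A minimal data-splitting sketch: randomly divide the data into an estimation set and a validation set, fit a straight line on the estimation set only, and score its predictions on the held-out points. The score is the mean squared prediction error and a validation $R^2$, $1 - \sum(y_i - \hat y_i)^2 / \sum(y_i - \bar y)^2$. The 50/50 split, the seed, and the noise-free data are illustrative assumptions.

```python
import random

def fit_line(x, y):
    """Least squares estimates (b0, b1) for E(y) = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1

def split_validate(x, y, train_frac=0.5, seed=1):
    """Fit on a random estimation subset; score predictions on the validation subset."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_frac)
    train, valid = idx[:cut], idx[cut:]
    b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
    y_v = [y[i] for i in valid]
    pred = [b0 + b1 * x[i] for i in valid]
    mean_v = sum(y_v) / len(y_v)
    sse = sum((a - p) ** 2 for a, p in zip(y_v, pred))
    r2_pred = 1 - sse / sum((a - mean_v) ** 2 for a in y_v)
    return sse / len(y_v), r2_pred

x = list(range(10))
y = [3 + 2 * xi for xi in x]   # hypothetical noise-free data
mse, r2 = split_validate(x, y)
print(mse, r2)
```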

