
Chapter 7 Polynomial Regression Models

- Ray-Bing Chen
- Institute of Statistics
- National University of Kaohsiung

7.1 Introduction

- The linear regression model y = Xβ + ε is a general model for fitting any relationship that is linear in the unknown parameters β.
- Polynomial regression model: y = β0 + β1x + β2x² + … + βkx^k + ε

7.2 Polynomial Models in One Variable

- 7.2.1 Basic Principles
- A second-order model (quadratic model): y = β0 + β1x + β2x² + ε


- Polynomial models are useful in situations where the analyst knows that curvilinear effects are present in the true response function.
- Polynomial models are also useful as approximating functions to unknown and possibly very complex nonlinear relationships.
- The polynomial model can be viewed as a Taylor series expansion of the unknown function.
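As a minimal sketch of fitting a polynomial model by least squares (the data below are simulated for illustration, not from the text):

```python
import numpy as np

# Simulated observations from a quadratic trend plus noise (hypothetical values)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 25)
y = 2.0 + 1.5 * x - 0.3 * x**2 + rng.normal(0.0, 0.5, x.size)

# Least-squares fit of the second-order model y = b0 + b1*x + b2*x^2
coefs = np.polyfit(x, y, deg=2)      # highest degree first: [b2, b1, b0]
b2, b1, b0 = coefs
y_hat = np.polyval(coefs, x)         # fitted values
```

Because the model is linear in the unknown coefficients, ordinary least squares applies directly even though the relationship in x is curvilinear.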

- Several important considerations:
- Order of the model: The order (k) should be as low as possible. High-order polynomials (k > 2) should be avoided unless they can be justified for reasons outside the data. In an extreme case, it is always possible to pass a polynomial of order n-1 through n points, so a polynomial of sufficiently high degree can always be found that provides a good fit to the data.
- Model-building strategy: Various strategies for choosing the order of an approximating polynomial have been suggested. Two procedures are forward selection and backward elimination.

- Extrapolation: Extrapolation with polynomial models can be extremely hazardous (see Figure 7.2).
- Ill-conditioning I: The X'X matrix becomes ill-conditioned as the order increases. This means the matrix inversion calculations will be inaccurate, and considerable error may be introduced into the parameter estimates.
- Ill-conditioning II: If the values of x are limited to a narrow range, there can be significant ill-conditioning or multicollinearity in the columns of X.
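The narrow-range ill-conditioning can be seen numerically; the sketch below (hypothetical x values) compares the condition number of X'X for raw and centered regressors:

```python
import numpy as np

# x confined to a narrow range far from zero: the raw powers 1, x, x^2, x^3
# are nearly collinear, so X'X is severely ill-conditioned. Centering x
# before forming the powers is the usual remedy.
x = np.linspace(95.0, 105.0, 20)
X_raw = np.vander(x, N=4, increasing=True)           # columns 1, x, x^2, x^3
X_cen = np.vander(x - x.mean(), N=4, increasing=True)

cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_cen = np.linalg.cond(X_cen.T @ X_cen)
```

The centered design's condition number is many orders of magnitude smaller, which is why centering is routinely recommended before fitting polynomials.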


- Hierarchy: The regression model y = β0 + β1x + β2x² + β3x³ + ε is said to be hierarchical because it contains all terms of order three and lower. Only hierarchical models are invariant under linear transformation.
- Example 7.1 The Hardwood Data
- The strength of kraft paper (y) vs. the percent of hardwood.
- Data in Table 7.1
- A scatter plot in Figure 7.3


- 7.2.2 Piecewise Polynomial Fitting (Splines)
- Sometimes a low-order polynomial provides a poor fit to the data, but increasing the order of the polynomial modestly does not substantially improve the situation.
- This problem may occur when the function behaves differently in different parts of the range of x.
- A usual approach is to divide the range of x into segments and fit an appropriate curve in each segment.
- Spline functions offer a useful way to perform this type of piecewise polynomial fitting.

- Splines are piecewise polynomials of order k.
- The joint points of the pieces are usually called knots.
- Generally the function values and the first k-1 derivatives agree at the knots; that is, a spline is a continuous function with k-1 continuous derivatives.
- Cubic spline
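A cubic spline with knots t1, ..., th can be written in the truncated-power basis S(x) = β0 + β1x + β2x² + β3x³ + Σj θj (x − tj)₊³, where (x − tj)₊³ is zero below the knot. A sketch of the corresponding design matrix and fit (hypothetical data with one knot):

```python
import numpy as np

def cubic_spline_design(x, knots):
    """Design matrix for the truncated-power cubic spline basis:
    1, x, x^2, x^3, and (x - t)_+^3 for each knot t."""
    cols = [x**p for p in range(4)]
    cols += [np.maximum(x - t, 0.0) ** 3 for t in knots]
    return np.column_stack(cols)

x = np.linspace(0.0, 20.0, 41)
# a response that is exactly a cubic spline with one knot at x = 10
y = 1.0 + 0.5 * x - 0.02 * x**3 + 0.03 * np.maximum(x - 10.0, 0.0) ** 3
X = cubic_spline_design(x, knots=[10.0])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
```

Since the truncated cubic term and its first two derivatives vanish at the knot, the fitted curve is automatically continuous with continuous first and second derivatives there.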

- It is not simple to decide the number and position of the knots and the order of the polynomial in each segment.
- Wold (1974) suggests:
- There should be as few knots as possible, with at least four or five data points per segment.
- There should be no more than one extreme point and one point of inflection per segment.
- The great flexibility of spline functions makes it very easy to overfit the data.

- Cubic spline model with h knots and no continuity restrictions
- The fewer continuity restrictions required, the better the fit.
- The more continuity restrictions required, the worse the fit, but the smoother the final curve will be.

- X'X becomes ill-conditioned if there is a large number of knots.
- Use a different representation of the spline: the cubic B-spline.

- Example 7.2 Voltage Drop Data
- The battery voltage drop in a guided missile motor observed over the time of missile flight is shown in Table 7.3.
- The scatter plot is in Figure 7.6.
- Model the data with a cubic spline using two knots at 6.5 and 13.
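A least-squares cubic spline with this knot placement can be fit with SciPy's B-spline machinery; the sketch below uses simulated data (not the Table 7.3 values) with the same interior knots:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

# Simulated response over t in [0, 20]; the interior knots 6.5 and 13
# mirror the knot placement of Example 7.2.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 20.0, 41)
y = 1.4 + 0.6 * x - 0.02 * x**2 + rng.normal(0.0, 0.1, x.size)

# k=3 gives a cubic spline; LSQUnivariateSpline uses a B-spline basis
# internally, avoiding the ill-conditioned truncated-power X'X.
spl = LSQUnivariateSpline(x, y, t=[6.5, 13.0], k=3)
y_hat = spl(x)
mse = float(np.mean((y_hat - y) ** 2))
```

The B-spline representation gives the same fitted curve as the truncated-power basis but with far better-conditioned normal equations.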


- The ANOVA
- A plot of the residuals vs. the fitted values and a normal probability plot of the residuals are in Figures 7.7 and 7.8.


- Example 7.3 Piecewise Linear Regression
- An important special case of practical interest is fitting piecewise linear regression models.
- This can be treated easily using linear splines.
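A sketch of the linear-spline approach (hypothetical data with one slope change): a single truncated term (x − t)₊ lets the slope change at the knot t while keeping the fit continuous.

```python
import numpy as np

def linear_spline_design(x, knots):
    """Design matrix for a linear spline: 1, x, and (x - t)_+ per knot."""
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - t, 0.0) for t in knots]
    return np.column_stack(cols)

x = np.linspace(0.0, 10.0, 21)
# continuous piecewise-linear response whose slope changes at x = 4
y = np.where(x < 4.0, 1.0 + 2.0 * x, 9.0 - 0.5 * (x - 4.0))
X = linear_spline_design(x, knots=[4.0])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# slope is beta[1] before the knot and beta[1] + beta[2] after it
```

The coefficient on the truncated term is the change in slope at the knot, which is often the quantity of direct interest.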


- 7.2.3 Polynomial and Trigonometric Terms
- Sometimes it is useful to consider models that combine polynomial and trigonometric terms.
- From the scatter plot, there may be some periodicity or cyclic behavior in the data.
- A model with fewer terms may result than if only polynomial terms are employed.
- The model: y = β0 + Σ_{j=1..d} βj x^j + Σ_{j=1..r} [δj sin(jx) + γj cos(jx)] + ε

- If the levels of the regressor x are equally spaced, then the pairs of terms sin(jx) and cos(jx) are orthogonal.
- Even without exactly equal spacing, the correlation between these terms will usually be quite small.
- In Example 7.2:
- Rescale the regressor x so that all of the observations are in the interval (0, 2π).
- Fit the model with d = 2 and r = 1.
- R² = 0.9895 and MS_Res = 0.0767
7.3 Nonparametric Regression

- Nonparametric regression is closely related to piecewise polynomial regression.
- The goal is to develop a model-free basis for predicting the response over the range of the data.

- 7.3.1 Kernel Regression
- The kernel smoother uses a weighted average of the data: ỹi = Σj wij yj, where Σj wij = 1.
- In matrix form ỹ = Sy, where S = [wij] is the smoothing matrix.
- Typically, the weights are chosen such that wij ≡ 0 for all yi's outside of a defined neighborhood of the specific location of interest.

- These kernel smoothers use a bandwidth, b, to define this neighborhood of interest.
- A large value for b results in more of the data being used to predict the response at the specific location.
- The resulting plot of predicted values becomes much smoother as b increases.
- As b decreases, less of the data is used to generate the prediction, and the resulting plot looks more wiggly or bumpy.
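The bandwidth effect can be sketched with a Nadaraya-Watson smoother using a Gaussian kernel (simulated data; the kernel choice is one of several listed in Table 7.5):

```python
import numpy as np

def kernel_smooth(x0, x, y, b):
    """Nadaraya-Watson estimate at x0: weighted average of all y_i with
    Gaussian kernel weights of bandwidth b."""
    w = np.exp(-0.5 * ((x0 - x) / b) ** 2)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)

grid = np.linspace(0.0, 10.0, 101)
wiggly = np.array([kernel_smooth(g, x, y, b=0.3) for g in grid])  # small b
smooth = np.array([kernel_smooth(g, x, y, b=3.0) for g in grid])  # large b
```

With b = 3.0 the curve is far flatter than with b = 0.3, illustrating the smoothness/locality trade-off the bullets describe.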

- This approach is called a kernel smoother.
- Several choices of kernel function are possible.
- See Table 7.5

- 7.3.2 Locally Weighted Regression (Loess)
- Another nonparametric method.
- Loess also uses the data from a neighborhood around the specific location.
- The neighborhood is defined by the span, which is the fraction of the total points used to form neighborhoods.
- A span of 0.5 indicates that the closest half of the total data points is used as the neighborhood.
- The loess procedure then uses the points in the neighborhood to generate a weighted least-squares estimate of the specific response.

- The weights are based on the distance of the points used in the estimation from the specific location of interest.
- Let x0 be the specific location of interest, and let Δ(x0) be the distance the farthest point in the neighborhood lies from x0.
- The tri-cube weight function is w_i = [1 − (|x0 − xi| / Δ(x0))³]³ for |x0 − xi| < Δ(x0), and 0 otherwise.

- The model
- Since

- A common estimate of variance is
- R² = (SST - SSRes) / SST
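One loess step can be sketched directly from these definitions (hypothetical data): take the span·n nearest points, weight them with the tri-cube function, and fit a weighted least-squares line.

```python
import numpy as np

def loess_point(x0, x, y, span=0.5):
    """Loess prediction at x0: tri-cube-weighted least-squares line
    fit to the span*n nearest points."""
    n = len(x)
    m = max(2, int(np.ceil(span * n)))       # points in the neighborhood
    d = np.abs(x - x0)
    idx = np.argsort(d)[:m]                  # the m nearest neighbors
    delta = d[idx].max()                     # distance to farthest neighbor
    w = (1.0 - (d[idx] / delta) ** 3) ** 3   # tri-cube weights
    X = np.column_stack([np.ones(m), x[idx]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0

x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x        # noiseless line: loess should reproduce it exactly
pred = loess_point(5.0, x, y, span=0.5)
```

A full loess fit simply repeats this local calculation over a grid of target points x0.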

- Example 7.4 Applying Loess Regression to the

Windmill Data


- 7.3.3 Final Cautions
- Parametric models are guided by appropriate

subject area theory. - Nonparametric models almost always reflect pure

empiricism. - One should always prefer a simple parametric

model when it provides a reasonable and

satisfactory fit to the data. - The model terms often have important

interpretations. - One should prefer the parametric model,

especially when subject area theory supports the

transformation used.

- On the other hand, there are many situations

where no simple parametric model yields an

adequate or satisfactory fit to the data, where

there is little or no subject area theory to

guide the analyst, and where no simple

transformation appears appropriate. - In such cases, nonparametric regression makes a

great deal of sense. - One is willing to accept the relative complexity

and the black box nature of the estimation in

order to give an adequate fit to the data.

7.4 Polynomial Models in Two or More Variables


- Response surface methodology (RSM) is widely applied in industry for modeling the output response(s) of a process in terms of the important controllable variables and then finding the operating conditions that optimize the response.
- Illustrate fitting a second-order response surface in two variables:
- y = the percent conversion of a chemical process
- T = reaction temperature
- C = reaction concentration
- Figure 7.14 shows a central composite design.

- Second-order model: y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ε
- See p.246
- The fitted model is
- The ANOVA table


- R² and adjusted R² values for this model are satisfactory.


- From the response surface plots, the maximum percent conversion occurs at about 245°C and 20% concentration.
- The experimenter is interested in predicting the response y or estimating the mean response at a particular point in the process variable space.


7.5 Orthogonal Polynomials

- In fitting a polynomial model in one variable, even if nonessential ill-conditioning is removed by centering, we may still have high levels of multicollinearity.


- Suppose the model is y_i = α0 P0(x_i) + α1 P1(x_i) + … + αk Pk(x_i) + ε_i, where Pj is an orthogonal polynomial of degree j.
- Then X'X is diagonal, since Σ_i Pj(x_i) Pl(x_i) = 0 for j ≠ l.
- The estimators are α̂j = Σ_i Pj(x_i) y_i / Σ_i Pj²(x_i).

- Example 7.5 Orthogonal Polynomials
- The effect of various reorder quantities on the average annual cost of inventory.


- The fitted equation is