


Chapter 13: Generalized Linear Models

Generalized Linear Models

- Traditional applications of linear models, such as DOX and multiple linear regression, assume that the response variable is
  - Normally distributed
  - Constant variance
  - Independent
- There are many situations where these assumptions are inappropriate
  - The response is either binary (0,1), or a count
  - The response is continuous, but nonnormal

Some Approaches to These Problems

- Data transformation
  - Induce approximate normality
  - Stabilize variance
  - Simplify model form
- Weighted least squares
  - Often used to stabilize variance
- Generalized linear models (GLM)
  - Approach is about 25-30 years old, unifies linear and nonlinear regression models
  - Response distribution is a member of the exponential family (normal, exponential, gamma, binomial, Poisson)

Generalized Linear Models

- Original applications were in biopharmaceutical sciences
- Lots of recent interest in GLMs in industrial statistics
- GLMs are simple models that include linear regression and OLS as a special case
- Parameter estimation is by maximum likelihood (assume that the response distribution is known)
- Inference on parameters is based on large-sample or asymptotic theory
- We will consider logistic regression, Poisson regression, then the GLM

References

- Montgomery, D. C., Peck, E. A., and Vining, G. G. (2012), Introduction to Linear Regression Analysis, 4th Edition, Wiley, New York (see Chapter 14)
- Myers, R. H., Montgomery, D. C., Vining, G. G., and Robinson, T. J. (2010), Generalized Linear Models with Applications in Engineering and the Sciences, 2nd Edition, Wiley, New York
- Hosmer, D. W. and Lemeshow, S. (2000), Applied Logistic Regression, 2nd Edition, Wiley, New York
- Lewis, S. L., Montgomery, D. C., and Myers, R. H. (2001), "Confidence Interval Coverage for Designed Experiments Analyzed with GLMs," Journal of Quality Technology 33, pp. 279-292
- Lewis, S. L., Montgomery, D. C., and Myers, R. H. (2001), "Examples of Designed Experiments with Nonnormal Responses," Journal of Quality Technology 33, pp. 265-278
- Myers, R. H. and Montgomery, D. C. (1997), "A Tutorial on Generalized Linear Models," Journal of Quality Technology 29, pp. 274-291

Binary Response Variables

- The outcome (or response, or endpoint) values 0, 1 can represent success and failure
- Occurs often in the biopharmaceutical field: dose-response studies, bioassays, clinical trials
- Industrial applications include failure analysis, fatigue testing, reliability testing
- For example, functional electrical testing on a semiconductor can yield
  - success, in which case the device works
  - failure due to a short, an open, or some other failure mode

Binary Response Variables

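- Possible model: $y_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i$, where $y_i$ takes only the values 0 and 1
- The response $y_i$ is a Bernoulli random variable with $P(y_i = 1) = \pi_i$ and $P(y_i = 0) = 1 - \pi_i$, so that $E(y_i) = \mathbf{x}_i'\boldsymbol{\beta} = \pi_i$ and $\mathrm{Var}(y_i) = \pi_i(1 - \pi_i)$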

Problems With This Model

- The error terms take on only two values, so they can't possibly be normally distributed
- The variance of the observations is a function of the mean (see previous slide)
- A linear response function could result in predicted values that fall outside the 0, 1 range, and this is impossible because $0 \le E(y_i) = \pi_i \le 1$

Binary Response Variables: The Challenger Data

| Temperature at Launch | At Least One O-ring Failure | Temperature at Launch | At Least One O-ring Failure |
|---|---|---|---|
| 53 | 1 | 70 | 1 |
| 56 | 1 | 70 | 1 |
| 57 | 1 | 72 | 0 |
| 63 | 0 | 73 | 0 |
| 66 | 0 | 75 | 0 |
| 67 | 0 | 75 | 1 |
| 67 | 0 | 76 | 0 |
| 67 | 0 | 76 | 0 |
| 68 | 0 | 78 | 0 |
| 69 | 0 | 79 | 0 |
| 70 | 0 | 80 | 0 |
| 70 | 1 | 81 | 0 |

Data for space shuttle launches and static tests prior to the launch of Challenger

Binary Response Variables

- There is a lot of empirical evidence that the response function should be nonlinear; an S shape is quite logical
- See the scatter plot of the Challenger data
- The logistic response function is a common choice


The Logistic Response Function

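- The logistic response function, $E(y) = \pi = \dfrac{\exp(\mathbf{x}'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}'\boldsymbol{\beta})}$, can be easily linearized. Let $\eta = \mathbf{x}'\boldsymbol{\beta}$
- Define $\eta = \ln\dfrac{\pi}{1 - \pi}$
- This is called the logit transformation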

Logistic Regression Model

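- Model: $y_i = E(y_i) + \varepsilon_i$, where $E(y_i) = \pi_i = \dfrac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}$
- The model parameters are estimated by the method of maximum likelihood (MLE)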

A Logistic Regression Model for the Challenger Data (Using Minitab)

Binary Logistic Regression: O-Ring Fail versus Temperature

Link Function: Logit

Response Information

    Variable   Value   Count
    O-Ring F   1           7   (Event)
               0          17
               Total      24

Logistic Regression Table

                                                      Odds     95% CI
    Predictor   Coef       SE Coef    Z       P       Ratio    Lower   Upper
    Constant    10.875     5.703       1.91   0.057
    Temperat    -0.17132   0.08344    -2.05   0.040   0.84     0.72    0.99

Log-Likelihood = -11.515

A Logistic Regression Model for the Challenger Data

Test that all slopes are zero: G = 5.944, DF = 1, P-Value = 0.015

Goodness-of-Fit Tests

    Method            Chi-Square   DF   P
    Pearson           14.049       15   0.522
    Deviance          15.759       15   0.398
    Hosmer-Lemeshow   11.834        8   0.159

Note that the fitted function has been extended down to 31 deg F, the temperature at which Challenger was launched

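Maximum Likelihood Estimation in Logistic Regression

- The distribution of each observation $y_i$ is $f_i(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}$, $y_i = 0, 1$
- The likelihood function is $L(\mathbf{y}, \boldsymbol{\beta}) = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}$
- We usually work with the log-likelihood $\ln L(\mathbf{y}, \boldsymbol{\beta}) = \sum_{i=1}^{n} y_i \ln\dfrac{\pi_i}{1 - \pi_i} + \sum_{i=1}^{n} \ln(1 - \pi_i)$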

Maximum Likelihood Estimation in Logistic Regression

- The maximum likelihood estimators (MLEs) of the model parameters are those values that maximize the likelihood (or log-likelihood) function
- ML has been around since the first part of the previous century
- Often gives estimators that are intuitively pleasing
- MLEs have nice properties: unbiased (for large samples), minimum variance (or nearly so), and they have an approximate normal distribution when n is large

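Maximum Likelihood Estimation in Logistic Regression

- If we have $n_i$ trials at each observation, we can write the log-likelihood as $\ln L(\mathbf{y}, \boldsymbol{\beta}) = \mathbf{y}'\mathbf{X}\boldsymbol{\beta} - \sum_{i=1}^{n} n_i \ln[1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})]$
- The derivative of the log-likelihood is $\dfrac{\partial \ln L}{\partial \boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} - \sum_{i=1}^{n} n_i \pi_i \mathbf{x}_i = \mathbf{X}'\mathbf{y} - \mathbf{X}'\boldsymbol{\mu}$, where $\mu_i = n_i \pi_i$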

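Maximum Likelihood Estimation in Logistic Regression

- Setting this last result to zero gives the maximum likelihood score equations $\mathbf{X}'(\mathbf{y} - \boldsymbol{\mu}) = \mathbf{0}$
- These equations look easy to solve; we've actually seen them before in linear regression, where $\mathbf{X}'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}$ gives $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$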

Maximum Likelihood Estimation in Logistic Regression

- Solving the ML score equations in logistic regression isn't quite as easy, because $\mu_i = \dfrac{n_i}{1 + \exp(-\mathbf{x}_i'\boldsymbol{\beta})}$
- Logistic regression is a nonlinear model
- It turns out that the solution is actually fairly easy, and is based on iteratively reweighted least squares or IRLS (see Appendix for details)
- An iterative procedure is necessary because parameter estimates must be updated from an initial guess through several steps
- Weights are necessary because the variance of the observations is not constant
- The weights are functions of the unknown parameters
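The IRLS idea can be sketched in a few lines of Python for the Challenger data. This is a minimal illustration, not the Appendix algorithm or Minitab's implementation: it assumes a single regressor, a starting guess of zero, and a hand-written 2x2 solve, and the function name `fit_logistic_irls` is hypothetical. Each pass recomputes the weights $\pi_i(1 - \pi_i)$ from the current estimates, which is why iteration is needed.

```python
import math

# Challenger data from the table: launch temperature (deg F) and
# whether at least one O-ring failure occurred (1 = yes, 0 = no).
temps = [53, 56, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70,
         70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81]
fails = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
         1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]


def fit_logistic_irls(x, y, max_iter=50, tol=1e-10):
    """Fit logit(pi) = b0 + b1*x by IRLS (Newton-Raphson on the score equations)."""
    b0, b1 = 0.0, 0.0  # initial guess (an assumption of this sketch)
    for _ in range(max_iter):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = pi * (1.0 - pi)       # weight: Bernoulli variance at current estimates
            s0 += yi - pi             # score vector X'(y - pi)
            s1 += xi * (yi - pi)
            i00 += w                  # information matrix X'WX
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system (X'WX) * step = score.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Large-sample standard errors from the inverse of the last information matrix.
    se0 = math.sqrt(i11 / det)
    se1 = math.sqrt(i00 / det)
    return b0, b1, se0, se1


b0, b1, se0, se1 = fit_logistic_irls(temps, fails)
print(f"intercept = {b0:.3f} (SE {se0:.3f}), slope = {b1:.5f} (SE {se1:.5f})")
```

The estimates can be compared with the Constant and Temperat rows of the Minitab logistic regression table.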

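Interpretation of the Parameters in Logistic Regression

- The log-odds at $x$ is $\hat{\eta}(x) = \ln\dfrac{\hat{\pi}(x)}{1 - \hat{\pi}(x)} = \hat{\beta}_0 + \hat{\beta}_1 x$
- The log-odds at $x + 1$ is $\hat{\eta}(x + 1) = \hat{\beta}_0 + \hat{\beta}_1 (x + 1)$
- The difference in the log-odds is $\hat{\eta}(x + 1) - \hat{\eta}(x) = \hat{\beta}_1$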

Interpretation of the Parameters in Logistic Regression

- The odds ratio is found by taking antilogs: $\hat{O}_R = \dfrac{\text{odds}_{x+1}}{\text{odds}_x} = e^{\hat{\beta}_1}$
- The odds ratio is interpreted as the estimated multiplicative change in the odds of success associated with a one-unit increase in the value of the predictor variable

Odds Ratio for the Challenger Data

- This implies that every decrease of one degree in temperature increases the odds of O-ring failure by about 1/0.84 = 1.19, or 19 percent
- The temperature at Challenger launch was 22 degrees below the lowest observed launch temperature, so now $\hat{O}_R = e^{22(-0.17132)} = 0.0231$
- This results in an increase in the odds of failure of 1/0.0231 = 43.34, or about 4200 percent!
- There's a big extrapolation here, but if you knew this prior to launch, what decision would you have made?
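The odds-ratio arithmetic on this slide can be checked directly from the fitted temperature coefficient. The sketch below uses only the coefficient reported in the Minitab output and the two temperatures mentioned in the slides (53 deg F, the lowest observed, and 31 deg F at launch); the variable names are illustrative.

```python
import math

b1 = -0.17132   # temperature coefficient from the Minitab output

odds_ratio = math.exp(b1)               # odds multiplier per one-degree increase
per_degree_drop = 1.0 / odds_ratio      # odds multiplier per one-degree decrease

drop = 53 - 31                          # lowest observed temperature minus launch temperature
odds_ratio_22 = math.exp(drop * b1)     # odds ratio for a 22-degree increase
increase_22 = 1.0 / odds_ratio_22       # odds multiplier for a 22-degree decrease

print(round(odds_ratio, 2), round(per_degree_drop, 2))
print(round(odds_ratio_22, 4), round(increase_22, 2))
```

The second line of output is the extrapolated figure, so it inherits the caveat above: 31 deg F lies well outside the range of the data.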

Inference on the Model Parameters

Inference on the Model Parameters

See slide 15; Minitab calls this statistic G.
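As a rough check, G can be recomputed from the Minitab output, assuming it is the usual likelihood-ratio statistic $G = 2[\ln L(\text{fitted model}) - \ln L(\text{intercept-only model})]$. The intercept-only log-likelihood is not printed in the output; the sketch derives it from the response counts (7 events in 24 observations).

```python
import math

n_fail, n_total = 7, 24      # response counts from the Minitab output
loglik_full = -11.515        # log-likelihood reported by Minitab

# Intercept-only model: every observation gets the same failure probability 7/24.
p0 = n_fail / n_total
loglik_null = n_fail * math.log(p0) + (n_total - n_fail) * math.log(1.0 - p0)

G = 2.0 * (loglik_full - loglik_null)

# Upper-tail probability of a chi-square variable with 1 df:
# P(X > g) = erfc(sqrt(g / 2)).
p_value = math.erfc(math.sqrt(G / 2.0))

print(f"G = {G:.3f}, P-value = {p_value:.3f}")
```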

Testing Goodness of Fit

Pearson chi-square goodness-of-fit statistic

The Hosmer-Lemeshow goodness-of-fit statistic

Refer to slide 15 for the Minitab output showing all three goodness-of-fit statistics for the Challenger data

Likelihood Inference on the Model Parameters

- Deviance can also be used to test hypotheses about subsets of the model parameters (analogous to the extra SS method)
- Procedure

Inference on the Model Parameters

- Tests on individual model coefficients can also be done using Wald inference
- Uses the result that the MLEs have an approximate normal distribution, so the distribution of $Z_0 = \hat{\beta} / se(\hat{\beta})$ is standard normal if the true value of the parameter is zero. Some computer programs report the square of $Z_0$ (which is chi-square), and others calculate the P-value using the t distribution
- See slide 14 for the Wald test on the temperature parameter for the Challenger data
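The Wald statistic for temperature can be reproduced from the Coef and SE Coef columns of the Minitab table. This short sketch uses the standard normal reference distribution; the P-value is computed with the identity $2[1 - \Phi(|z|)] = \operatorname{erfc}(|z|/\sqrt{2})$.

```python
import math

coef, se = -0.17132, 0.08344    # temperature row of the Minitab table

z = coef / se                                       # Wald statistic
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal P-value
z_squared = z * z                                   # chi-square (1 df) form some programs report

print(f"Z = {z:.2f}, P = {p_two_sided:.3f}, Z^2 = {z_squared:.2f}")
```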

Another Logistic Regression Example: The Pneumoconiosis Data

- A 1959 article in Biometrics reported the data


The fitted model


Diagnostic Checking


Consider Fitting a More Complex Model

A More Complex Model

Is the expanded model useful? The Wald test on the term (Years)^2 indicates that the term is probably unnecessary. Consider the difference in deviance.

Compare the P-values for the Wald and deviance tests


Other models for binary response data

Logit model

Probit model

Complementary log-log model
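The three models differ only in how the linear predictor $\eta = \mathbf{x}'\boldsymbol{\beta}$ is mapped to a probability. The sketch below writes out the standard inverse links; the function names are illustrative and are not taken from the slides.

```python
import math


def inv_logit(eta):
    """Logit model: pi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Probit model: pi = Phi(eta), the standard normal CDF."""
    return 0.5 * math.erfc(-eta / math.sqrt(2.0))


def inv_cloglog(eta):
    """Complementary log-log model: pi = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (-2.0, 0.0, 2.0):
    print(eta, round(inv_logit(eta), 4), round(inv_probit(eta), 4),
          round(inv_cloglog(eta), 4))
```

The logit and probit curves are symmetric about $\eta = 0$, where both give 0.5; the complementary log-log curve is not symmetric.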


More than two categorical outcomes


Poisson Regression

- Consider now the case where the response is a count of some relatively rare event
  - Defects in a unit of product
  - Software bugs
  - Particulate matter or some pollutant in the environment
  - Number of Atlantic hurricanes
- We wish to model the relationship between the count response and one or more regressor or predictor variables
- A logical model for the count response is the Poisson distribution, $f(y) = \dfrac{e^{-\mu}\mu^{y}}{y!}$, $y = 0, 1, 2, \ldots$

Poisson Regression

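- Poisson regression is another case where the response variance is related to the mean; in fact, in the Poisson distribution $E(y) = \mu$ and $\mathrm{Var}(y) = \mu$
- The Poisson regression model is $y_i = E(y_i) + \varepsilon_i$, $i = 1, 2, \ldots, n$
- We assume that there is a function $g$ that relates the mean of the response to a linear predictor: $g(\mu_i) = \eta_i = \mathbf{x}_i'\boldsymbol{\beta}$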

Poisson Regression

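- The function $g$ is called a link function
- The relationship between the mean of the response distribution and the linear predictor is $\mu_i = g^{-1}(\eta_i) = g^{-1}(\mathbf{x}_i'\boldsymbol{\beta})$
- Choice of the link function
  - Identity link: $g(\mu_i) = \mu_i = \mathbf{x}_i'\boldsymbol{\beta}$
  - Log link: $g(\mu_i) = \ln \mu_i = \mathbf{x}_i'\boldsymbol{\beta}$ (very logical for the Poisson; no negative predicted values)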

Poisson Regression

- The usual form of the Poisson regression model is $y_i = e^{\mathbf{x}_i'\boldsymbol{\beta}} + \varepsilon_i$
- This is a special case of the GLM: Poisson response and a log link
- Parameter estimation in Poisson regression is essentially equivalent to logistic regression: maximum likelihood, implemented by IRLS
- Wald (large-sample) and deviance (likelihood-based) inference is carried out the same way as in the logistic regression model
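A Poisson regression with a log link can be fitted with the same kind of IRLS loop used for logistic regression; only the mean function and the weights change. The sketch below is an illustration under stated assumptions: the data are made up for the example (they are NOT the aircraft damage or mine fracture data), there is a single regressor, and `fit_poisson_irls` is a hypothetical name.

```python
import math

# Hypothetical illustration data (NOT from the slides): one regressor x
# and a count response y.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 0, 2, 3, 3, 6, 9, 12]


def fit_poisson_irls(x, y, max_iter=100, tol=1e-10):
    """Fit ln(mu) = b0 + b1*x by IRLS (Newton-Raphson for the log link)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
    for _ in range(max_iter):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)   # log link keeps the fitted mean positive
            s0 += yi - mu                 # score vector X'(y - mu)
            s1 += xi * (yi - mu)
            i00 += mu                     # information X'WX, W = diag(mu) since Var(y) = mu
            i01 += mu * xi
            i11 += mu * xi * xi
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


b0, b1 = fit_poisson_irls(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

At convergence the fitted means satisfy the score equations, so the fitted values sum to the observed total count.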

An Example of Poisson Regression

- The aircraft damage data
- Response y: the number of locations where damage was inflicted on the aircraft
- Regressors

The table contains data from 30 strike missions. There is a lot of multicollinearity in this data: the A-6 has a two-man crew and is capable of carrying a heavier bomb load. All three regressors tend to increase monotonically.

Based on the full model, we can remove x3. However, when x3 is removed, x1 (type of aircraft) is no longer significant; this is not shown, but easily verified. This is probably multicollinearity at work. Note the Type 1 and Type 3 analyses for each variable. Note also that the P-values for the Wald tests and the Type 3 analysis (based on deviance) don't agree.

Let's consider all of the subset regression models

Deleting either x1 or x2 results in a two-variable model that is worse than the full model. Removing x3 gives a model equivalent to the full model, but as noted before, x1 is insignificant. One of the single-variable models (x2) is equivalent to the full model.

The one-variable model with x2 displays no lack of fit (Deviance/df = 1.1791). The prediction equation is

Another Example Involving Poisson Regression

- The mine fracture data
- The response is a count of the number of fractures in the mine
- The regressors are

The best model of each subset size is flagged in the table. Note that the addition of a term cannot increase the deviance (supporting the analogy between deviance and the usual residual sum of squares). To compare the model with only x1, x2, and x4 to the full model, evaluate the difference in deviance: 38.03 - 37.86 = 0.17 with 1 df. This is not significant.
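The "not significant" conclusion can be checked by referring the difference in deviance to a chi-square distribution with 1 df, which is the usual large-sample reference distribution for this comparison. The sketch uses only the two deviances quoted above.

```python
import math

dev_reduced = 38.03   # deviance of the model with x1, x2, and x4
dev_full = 37.86      # deviance of the full model

diff = dev_reduced - dev_full   # difference in deviance, 1 df

# Upper-tail probability of a chi-square variable with 1 df:
# P(X > d) = erfc(sqrt(d / 2)).
p_value = math.erfc(math.sqrt(diff / 2.0))

print(f"difference = {diff:.2f}, P-value = {p_value:.2f}")
```

A P-value this far above 0.05 supports dropping the extra term.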

There is no indication of lack of fit: deviance/df = 0.9508. The final model is

The Generalized Linear Model

- Poisson and logistic regression are two special cases of the GLM
  - Binomial response with a logistic link
  - Poisson response with a log link
- In the GLM, the response distribution must be a member of the exponential family
- This includes the binomial, Poisson, normal, inverse normal, exponential, and gamma distributions

The Generalized Linear Model

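- The relationship between the mean of the response distribution and the linear predictor is determined by the link function: $g(\mu_i) = \eta_i = \mathbf{x}_i'\boldsymbol{\beta}$
- The canonical link is specified when $\eta_i = \theta_i$, where $\theta_i$ is the natural parameter of the exponential-family distribution
- The canonical link depends on the choice of the response distribution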

Canonical Links for the GLM

Links for the GLM

- You do not have to use the canonical link; it just simplifies some of the mathematics
- In fact, the log (non-canonical) link is very often used with the exponential and gamma distributions, especially when the response variable is nonnegative
- Other links can be based on the power family (as in power family transformations), or the complementary log-log function

Parameter Estimation and Inference in the GLM

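- Estimation is by maximum likelihood (and IRLS); for the canonical link the score function is $\mathbf{X}'(\mathbf{y} - \boldsymbol{\mu}) = \mathbf{0}$
- For the case of a non-canonical link, $\mathbf{X}'\boldsymbol{\Delta}(\mathbf{y} - \boldsymbol{\mu}) = \mathbf{0}$, where $\boldsymbol{\Delta} = \mathrm{diag}(d\theta_i / d\eta_i)$
- Wald inference and deviance-based inference are conducted just as in logistic and Poisson regression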

This is classical data analyzed by many. y = cycles to failure, x1 = cycle length, x2 = amplitude, x3 = load. The experimental design is a 3^3 factorial. Most analysts begin by fitting a full quadratic model using ordinary least squares. Design-Expert V6 was used to analyze the data. A log transform is suggested.

The Final Model is First-Order

Response: Cycles    Transform: Natural log    Constant: 0.000

ANOVA for Response Surface Linear Model

Analysis of variance table [Partial sum of squares]

    Source      Sum of Squares   DF   Mean Square   F Value   Prob > F
    Model       22.32             3    7.44         213.50    < 0.0001
    A           12.47             1   12.47         357.87    < 0.0001
    B            7.11             1    7.11         204.04    < 0.0001
    C            2.74             1    2.74          78.57    < 0.0001
    Residual     0.80            23    0.035
    Cor Total   23.12            26

    Std. Dev.   0.19     R-Squared        0.9653
    Mean        6.34     Adj R-Squared    0.9608
    C.V.        2.95     Pred R-Squared   0.9520
    PRESS       1.11     Adeq Precision   51.520

    Factor      Coefficient Estimate   DF   Standard Error   95% CI Low   95% CI High
    Intercept    6.34                   1   0.036             6.26         6.41
    A-A          0.83                   1   0.044             0.74         0.92
    B-B         -0.63                   1   0.044            -0.72        -0.54
    C-C         -0.39                   1   0.044            -0.48        -0.30

Contour plot (log cycles); response surface (cycles)

A GLM for the Worsted Yarn Data

- We selected a gamma response distribution with a log link
- The resulting GLM (from SAS) is
- Model is adequate; little difference between GLM and OLS
- Contour plots (predictions) very similar

The SAS PROC GENMOD output for the worsted yarn experiment, assuming a first-order model in the linear predictor. Scaled deviance divided by df is the appropriate lack of fit measure in the gamma response situation.

Comparison of the OLS and GLM Models

A GLM for the Worsted Yarn Data

- Confidence intervals on the mean response are uniformly shorter from the GLM than from least squares
- See Lewis, S. L., Montgomery, D. C., and Myers, R. H. (2001), "Confidence Interval Coverage for Designed Experiments Analyzed with GLMs," JQT 33, pp. 279-292
- While point estimates are very similar, the GLM provides better precision of estimation

Residual Analysis in the GLM

- Analysis of residuals is important in any model-fitting procedure
- The ordinary or raw residuals are not the best choice for the GLM, because the approximate normality and constant variance assumptions are not satisfied
- Typically, deviance residuals are employed for model adequacy checking in the GLM
- The deviance residuals are the square roots of the contribution to the deviance from each observation, multiplied by the sign of the corresponding raw residual

Deviance Residuals

- Logistic regression
- Poisson regression
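The verbal definition of a deviance residual (signed square root of each observation's contribution to the deviance) can be sketched in code. The formulas below are the standard binomial and Poisson deviance contributions and are an assumption of this sketch, since the slide equations are not in the transcript; the function names are illustrative, and the term $y \ln(y/\cdot)$ is taken as zero when $y = 0$.

```python
import math


def _y_log_ratio(y, m):
    """Return y * ln(y / m), with the convention 0 * ln(0) = 0."""
    return 0.0 if y == 0 else y * math.log(y / m)


def poisson_deviance_residual(y, mu_hat):
    """Signed square root of 2*[y ln(y/mu) - (y - mu)]."""
    d2 = 2.0 * (_y_log_ratio(y, mu_hat) - (y - mu_hat))
    return math.copysign(math.sqrt(max(d2, 0.0)), y - mu_hat)


def binomial_deviance_residual(y, n, pi_hat):
    """Signed square root of 2*[y ln(y/(n pi)) + (n - y) ln((n - y)/(n (1 - pi)))]."""
    d2 = 2.0 * (_y_log_ratio(y, n * pi_hat)
                + _y_log_ratio(n - y, n * (1.0 - pi_hat)))
    return math.copysign(math.sqrt(max(d2, 0.0)), y - n * pi_hat)


# Example: a count of 5 where the model predicts 2, and a single
# Bernoulli failure (y = 1, n = 1) where the model predicts pi = 0.5.
print(poisson_deviance_residual(5, 2.0))
print(binomial_deviance_residual(1, 1, 0.5))
```

Squaring and summing these residuals over all observations gives the model deviance.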

Deviance Residual Plots

- Deviance residuals behave much like ordinary residuals in normal-theory linear models
- Normal probability plot is appropriate
- Plot versus fitted values, usually transformed to the constant-information scale


Deviance Residual Plots for the Worsted Yarn Experiment

Overdispersion

- Occurs occasionally with Poisson or binomial data
- The variance of the response is greater than one would anticipate based on the choice of response distribution
- For example, in the Poisson distribution, we expect the variance to be approximately equal to the mean; if the observed variance is greater, this indicates overdispersion
- Diagnosis: if deviance/df greatly exceeds unity, overdispersion may be present
- There may be other reasons for deviance/df to be large, such as a poorly specified model, missing regressors, etc. (the same things that cause the mean square for error to be inflated in ordinary least squares modeling)

Overdispersion

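- Most direct way to model overdispersion is with a multiplicative dispersion parameter, say $\phi$, where the variance of the response is $\phi$ times the variance implied by the response distribution (for the Poisson, $\mathrm{Var}(y) = \phi\mu$)
- A logical estimate for $\phi$ is deviance/df
- Unless overdispersion is accounted for, the standard errors will be too small
- The adjustment consists of multiplying the standard errors by $\sqrt{\text{deviance}/df}$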
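The adjustment can be sketched as a small helper. This assumes the dispersion estimate is deviance/df and that each standard error is scaled by its square root; the function name and the numbers in the usage example are hypothetical and are not taken from the wave-soldering output.

```python
import math


def adjust_for_overdispersion(std_errors, deviance, df):
    """Return (phi_hat, adjusted SEs), scaling each SE by sqrt(deviance/df)."""
    phi_hat = deviance / df
    scale = math.sqrt(phi_hat)
    return phi_hat, [se * scale for se in std_errors]


# Hypothetical values for illustration only.
phi_hat, adjusted = adjust_for_overdispersion([0.1, 0.25], deviance=40.0, df=10)
print(phi_hat, adjusted)
```

Because Wald statistics divide each estimate by its standard error, inflating the standard errors shrinks every Z value by the same factor, which is why fewer effects remain significant after the adjustment.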

The Wave-Soldering Experiment

- Response is the number of defects
- Seven design variables
  - A: prebake condition
  - B: flux density
  - C: conveyor speed
  - D: preheat condition
  - E: cooling time
  - F: ultrasonic solder agitator
  - G: solder temperature

The Wave-Soldering Experiment

One observation has been discarded, as it was suspected to be an outlier. This is a resolution IV design.

The Wave-Soldering Experiment

5 of 7 main effects significant; AC, AD, BC, and BD also significant. Overdispersion is a possible problem, as deviance/df is large. Overdispersion causes standard errors to be underestimated, and this could lead to identifying too many effects as significant.

After adjusting for overdispersion, fewer effects are significant: C, G, AC, and BD are the important factors, assuming a 5% significance level. Note that the standard errors are larger than they were before, having been multiplied by $\sqrt{\text{deviance}/df}$.

The Edited Model for the Wave-Soldering Experiment

Generalized Linear Models

- The GLM is a unification of linear and nonlinear models that can accommodate a wide variety of response distributions
- Can be used with both regression models and designed experiments
- Computer implementations in Minitab, JMP, SAS (PROC GENMOD), S-Plus
- Logistic regression available in many basic packages
- GLMs are a useful alternative to data transformation, and should always be considered when data transformations are not entirely satisfactory
- Unlike data transformations, GLMs directly attack the unequal variance problem and use the maximum likelihood approach to account for the form of the response distribution