Validation of predictive regression models - PowerPoint PPT Presentation

Slides: 30
Provided by: Steye
Learn more at: https://sites.pitt.edu

1
Validation of predictive regression models
  • Ewout W. Steyerberg, PhD, clinical epidemiologist
  • Frank E. Harrell, PhD, biostatistician

2
Personal background
  • Ewout Steyerberg: Erasmus MC, Rotterdam, the
    Netherlands
  • Frank Harrell: Health Evaluation Sciences, Univ.
    of Virginia, Charlottesville, VA, USA
  • Validation of predictions from regression models
    is of paramount importance

3
Learning objectives: knowledge of
  • common types of regression models
  • fundamental assumptions of regression models
  • performance criteria of predictive models
  • principles of different types of validation

4
Performance objectives
  • To be able to explain why validation is necessary
    for predictive models
  • To be able to judge the adequacy of a validation
    procedure

5
Predictive models provide quantitative estimates
of an outcome, e.g.
  • Quality of life one year after surgery
  • Death at 30 days after surgery
  • Long term survival

6
Predictive models are often based on regression
analysis
  • $y = a + \sum_i b_i x_i$
  • $y$: outcome variable
  • $a$: intercept
  • $b_i$: regression coefficient $i$
  • $x_i$: predictor variable $i$
  • $i = 1, \dots, k$; the number of predictors $k$ is usually 2 to 20
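To make the formula concrete, here is a minimal Python sketch of the linear predictor for one patient. The function name `linear_predictor` and the numbers are hypothetical. For linear regression this sum is the predicted outcome itself; for logistic and Cox regression a sum of the same form (without an intercept in the Cox model) is on the log-odds or log-hazard scale, respectively.

```python
from typing import Sequence


def linear_predictor(a: float, b: Sequence[float], x: Sequence[float]) -> float:
    """Return a + sum_i b_i * x_i for one patient (hypothetical helper)."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per predictor")
    return a + sum(bi * xi for bi, xi in zip(b, x))


# Hypothetical intercept, coefficients and predictor values, not from the slides.
print(linear_predictor(1.0, [0.5, -2.0], [4.0, 1.0]))  # 1 + 0.5*4 - 2*1 = 1.0
```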

7
Three examples of regression
  • Quality of life one year after surgery: continuous outcome, linear regression
  • Death at 30 days after surgery: binary outcome, logistic regression
  • Long-term survival: time-to-outcome, Cox regression

8
Predictive models make assumptions
  • Distribution
  • Linearity of continuous variables
  • Additivity of effects

9
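Example: a simple logistic regression model
  • $\operatorname{logit} P(\text{30-day mortality}) = a + b_1 \cdot \text{sex} + b_2 \cdot \text{age}$
  • Assumptions:
      • the distribution of 30-day mortality is binomial
      • age has a linear effect (on the log-odds of mortality)
      • the effects of sex and age can be added (on the log-odds scale)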
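The model on this slide can be sketched in Python to show where each assumption enters. The coefficient values `A`, `B1`, `B2` and the 0/1 coding of sex are hypothetical, chosen only for illustration: the effects are added on the log-odds scale (additivity), age enters as a single linear term (linearity), and the inverse logit turns the log-odds into a probability of the binary outcome.

```python
import math


def predicted_risk(a: float, b1: float, b2: float, sex: int, age: float) -> float:
    """Predicted probability of 30-day mortality from a logistic model."""
    lp = a + b1 * sex + b2 * age        # linear predictor = log-odds (additive, linear in age)
    return 1.0 / (1.0 + math.exp(-lp))  # inverse logit maps log-odds to a probability


# Hypothetical coefficients; sex coded 0/1 (an assumption for this sketch).
A, B1, B2 = -6.0, 0.4, 0.06
for sex, age in [(0, 50), (1, 50), (1, 70)]:
    print(sex, age, round(predicted_risk(A, B1, B2, sex, age), 3))
```

Because the effects are additive on the log-odds scale, the odds ratio for sex is exp(`B1`) at every age.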

10
Assessing model assumptions
  • Examine model residuals
  • Perform specific tests:
      • add nonlinear terms, e.g. $\text{age}^2$
      • add interaction terms, e.g. $\text{sex} \times \text{age}$
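One common way to run these specific tests (an assumption; the slide does not name a test) is to add the extra term, refit the model, and compare the two fits with a likelihood-ratio test. The sketch below builds the extra columns and computes a 1-df likelihood-ratio test from two log-likelihoods. The fitting itself is assumed to be done with whatever logistic regression routine you use, and `design_row` and `lr_test_1df` are hypothetical helper names.

```python
import math


def design_row(sex: int, age: float, nonlinear: bool = False,
               interaction: bool = False) -> list[float]:
    """Predictor values for one patient, optionally with extra test terms."""
    row = [float(sex), float(age)]    # main effects: sex (0/1), age in years
    if nonlinear:
        row.append(float(age) ** 2)   # quadratic term: tests linearity of age
    if interaction:
        row.append(float(sex * age))  # product term: tests additivity of sex and age
    return row


def lr_test_1df(loglik_base: float, loglik_extended: float) -> tuple[float, float]:
    """Likelihood-ratio test for ONE added term (chi-square with 1 df)."""
    stat = max(2.0 * (loglik_extended - loglik_base), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return stat, p_value
```

A small p-value for the added $\text{age}^2$ or $\text{sex} \times \text{age}$ term suggests that the linearity or additivity assumption is violated.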

11
Model assumptions and predictions
  • Better predictions if assumptions are met
  • Some violation inherent in empirical data
  • Evaluate predictions in new data

12
Evaluation of predictions
  • Calibration
      • is the average of the predictions correct?
      • are low and high predictions correct?
  • Discrimination
      • can the model distinguish low-risk from high-risk patients?
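These calibration and discrimination questions can be turned into simple summary statistics. The sketch below (hypothetical function names, outcomes assumed coded 0/1) compares the mean predicted risk with the observed event rate, repeats that comparison within groups of low and high predicted risk, and computes the c-statistic, i.e. the area under the ROC curve used in the examples later in this deck, by counting concordant event/non-event pairs.

```python
from itertools import product


def calibration_in_the_large(pred, obs):
    """Mean predicted risk vs observed event rate (is the average correct?)."""
    return sum(pred) / len(pred), sum(obs) / len(obs)


def calibration_by_group(pred, obs, n_groups=2):
    """(mean predicted, observed rate) per group of increasing predicted risk."""
    pairs = sorted(zip(pred, obs))
    n = len(pairs)
    groups = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if chunk:
            groups.append((sum(p for p, _ in chunk) / len(chunk),
                           sum(y for _, y in chunk) / len(chunk)))
    return groups


def c_statistic(pred, obs):
    """Area under the ROC curve as concordance over all event/non-event pairs."""
    events = [p for p, y in zip(pred, obs) if y == 1]
    nonevents = [p for p, y in zip(pred, obs) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe, pn in product(events, nonevents):
        if pe > pn:
            score += 1.0  # concordant: the event got the higher prediction
        elif pe == pn:
            score += 0.5  # ties count as half
    return score / (len(events) * len(nonevents))
```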

13
Example predicted probabilities
14
Three types of validation
  • Apparent: performance on the sample used to develop
    the model
  • Internal: performance on the population underlying
    the sample
  • External: performance on a related but slightly
    different population

15
Apparent validity
  • Easy to calculate
  • Results in optimistic performance estimates

16
Apparent estimates are optimistic because the same data
are used for
  • Definition of model structure, e.g. selection
    and coding of variables
  • Estimation of model parameters, e.g. regression
    coefficients
  • Evaluation of model performance, e.g.
    calibration and discrimination

17
Internal validity
  • More difficult to calculate
  • Test the model in new data drawn at random from the
    underlying population

18
Why internal validation?
  • An honest estimate of performance should be
    obtained, at least for a population similar to
    the development sample
  • Internally validated performance sets an upper
    limit to what may be expected in other settings
    (external validity)

19
External validity
  • Moderately easy to calculate when new data are
    available
  • Test the model in new data from a population
    different from the development population

20
Why external validation?
  • Various factors may differ from the development
    population, including:
      • different selection of patients
      • different definitions of variables
      • different diagnostic or therapeutic procedures

21
Internal validation techniques
  • Split-sample: development / validation
  • Cross-validation: alternating development / validation
      • extreme case: n-1 develop / 1 validate (jack-knife)
  • Bootstrap
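The split-sample and cross-validation schemes differ only in how patient indices are divided between development and validation. A minimal sketch, assuming patients are numbered 0 to n-1; the helper names are hypothetical, and the model fitting and evaluation on each split are left out.

```python
import random


def split_sample(n, frac_dev=0.5, seed=0):
    """Random split into one development set and one validation set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(frac_dev * n)
    return idx[:cut], idx[cut:]


def k_fold(n, k, seed=0):
    """Cross-validation: each of k folds serves once as the validation set."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        dev = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield dev, val


def leave_one_out(n):
    """Extreme case of cross-validation: develop on n-1 patients, validate on 1."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]
```

`leave_one_out(n)` yields the same splits as `k_fold(n, n)`, up to ordering; the split-sample approach uses only part of the data for development, which is the efficiency problem the bootstrap avoids.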

22
Bootstrap is the preferred internal validation
technique
  • Bootstrap sample for model development: n
    patients drawn with replacement
  • Original sample for validation: n patients
  • Difference in performance between the two: optimism
  • Efficiency: development and validation both use
    all n patients
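The steps on this slide can be sketched generically in Python. `fit` and `perf` are hypothetical callables: in practice `fit` would repeat the entire modeling strategy (including any variable selection and coding decisions) on each bootstrap sample, and `perf` would compute, for example, the area under the ROC curve. The toy usage fits a sample mean and scores it by negative mean squared error only to keep the example self-contained. With a binary outcome, a bootstrap sample without events or without non-events would leave the area undefined and would need separate handling.

```python
import random
from typing import Callable, Sequence, TypeVar

R = TypeVar("R")  # one patient record
M = TypeVar("M")  # a fitted model


def bootstrap_optimism(data: Sequence[R],
                       fit: Callable[[Sequence[R]], M],
                       perf: Callable[[M, Sequence[R]], float],
                       n_boot: int = 200,
                       seed: int = 0) -> tuple[float, float, float]:
    """Return (apparent, optimism, optimism-corrected) performance."""
    rng = random.Random(seed)
    n = len(data)
    apparent = perf(fit(data), data)  # develop and evaluate on the same n patients
    diffs = []
    for _ in range(n_boot):
        boot = [data[rng.randrange(n)] for _ in range(n)]  # n patients, with replacement
        model = fit(boot)                                    # repeat all modeling steps
        diffs.append(perf(model, boot) - perf(model, data))  # bootstrap minus original-sample performance
    optimism = sum(diffs) / n_boot
    return apparent, optimism, apparent - optimism


def fit_mean(sample):
    """Toy 'model': the sample mean."""
    return sum(sample) / len(sample)


def neg_mse(model, sample):
    """Toy performance measure: higher is better."""
    return -sum((x - model) ** 2 for x in sample) / len(sample)


data = [float(i) for i in range(20)]
print(bootstrap_optimism(data, fit_mean, neg_mse))
```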

23
Example: bootstrap results for the logistic
regression model
  • Model: $\operatorname{logit} P(\text{30-day mortality}) = a + b_1 \cdot \text{sex} + b_2 \cdot \text{age}$
  • Apparent area under the ROC curve: 0.77
  • Mean area in the 200 bootstrap samples: 0.772
  • Mean area when the 200 bootstrap models are tested in the original sample: 0.762
  • Optimism in apparent performance: 0.01
  • Optimism-corrected area: 0.76
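Using the numbers on this slide, the optimism correction is plain subtraction: the mean area of the bootstrap models in their own bootstrap samples minus their mean area in the original sample gives the optimism, which is then subtracted from the apparent area.

```latex
\begin{aligned}
\text{optimism} &= 0.772 - 0.762 = 0.010 \\
\text{AUC}_{\text{corrected}} &= \text{AUC}_{\text{apparent}} - \text{optimism} = 0.77 - 0.01 = 0.76
\end{aligned}
```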

24
External validation techniques
  • Temporal validation: same investigators, validate
    on patients from more recent years
  • Spatial validation (other place): same
    investigators, cross-validate across centers
  • Fully external: other investigators, other centers
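Temporal and spatial validation again come down to how patients are divided. A sketch assuming each record is a dict with hypothetical `year` and `center` keys: temporal validation develops the model on earlier patients and validates it on more recent ones, and spatial validation holds out one center at a time. Fully external validation cannot be simulated this way, because it needs data collected by other investigators.

```python
from collections import defaultdict


def temporal_split(records, cutoff_year):
    """Develop on patients treated before cutoff_year, validate on later patients."""
    dev = [r for r in records if r["year"] < cutoff_year]
    val = [r for r in records if r["year"] >= cutoff_year]
    return dev, val


def leave_one_center_out(records):
    """Yield (center, development, validation), holding out one center at a time."""
    by_center = defaultdict(list)
    for r in records:
        by_center[r["center"]].append(r)
    for center in sorted(by_center):
        dev = [r for r in records if r["center"] != center]
        yield center, dev, by_center[center]
```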

25
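Example: external validity of the logistic regression
model
  • Model: $\operatorname{logit} P(\text{30-day mortality}) = a + b_1 \cdot \text{sex} + b_2 \cdot \text{age}$
  • Apparent area under the ROC curve in 785 patients: 0.77
  • Area when tested in 20,318 other patients: 0.74
  • Area when tested by other investigators: ?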

26
Example external validation
27
Summary
  • Apparent validity gives an optimistic estimate of
    model performance
  • Internal validity may be estimated by
    bootstrapping
  • External validity should be determined in other
    populations

28
Key references
  • Tutorial and book on multivariable models:
    Harrell 1996, Stat Med 15:361-87;
    Harrell, Regression Modeling Strategies,
    Springer 2001
  • Empirical evaluations of strategies: Steyerberg
    2000, Stat Med 19:1059-79
  • Internal validation: Steyerberg 2001, J Clin Epidemiol
    54:774-81
  • External validation: Justice 1999, Ann Intern
    Med 130:515-24; Altman 2000, Stat Med 19:453-73

29
Links
  • Interactive textbook on predictive modeling:
    http://www.neri.org/symptom/mockup/Chapter_8/
  • Harrell's Regression Modeling Strategies:
    http://hesweb1.med.virginia.edu/biostat/rms/