Title: Comparison of Recalibration Techniques for Logistic Regression in Interventional Cardiology
1Comparison of Recalibration Techniques for
Logistic Regression in Interventional Cardiology
- Michael E. Matheny, MD
- HST 951 Final Presentation
2Background
- Risk models are evaluated for accuracy in two categories
  - Discrimination: the ability of a model to separate data with respect to values of an outcome variable
    - Measured by the area under the Receiver Operating Characteristic (ROC) curve (AUC)
  - Calibration: the ability of a model to accurately predict risk for individuals or small subgroups of the population
    - Multiple measurements: Hosmer-Lemeshow Goodness-of-Fit, Brier Score, Calibration Slope, etc.
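The two categories can be made concrete with a small sketch. The example below computes AUC (discrimination) by pairwise comparison and the Brier score (one calibration-related measure) in plain Python. The function names and the five-case data set are hypothetical and serve only as illustration.

```python
def auc(y, p):
    """Area under the ROC curve: the fraction of (event, non-event) pairs
    in which the event case received the higher predicted risk.
    Ties count as half a win."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y, p):
    """Brier score: mean squared difference between prediction and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


# Hypothetical outcomes (1 = event) and predicted risks for five cases.
y = [0, 0, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.8]

example_auc = auc(y, p)      # 5 of 6 pairs are ordered correctly
example_brier = brier(y, p)  # 0.74 / 5
```

AUC depends only on the ordering of the predictions, whereas the Brier score depends on their actual values.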
3Background
- Risk modeling techniques are generally able to perform well in terms of discrimination and calibration on local development and test data
- Performance depends on
  - Data collection quality (data noise level)
  - Identification of relevant risk factors for an outcome
  - Time delay to realization of an outcome
4Background
- A model is most useful when it can be successfully applied to all patients in that domain
- External validation of these models in multiple medical domains with various risk modeling techniques has produced consistent results
  - Discrimination is preserved
  - Calibration tends to fail
5Background
- Multiple reasons for calibration failure
  - Problems related to location/medical center
    - Different patient demographics / case-mix
    - Different outcome event rates
    - Possibly different data element definitions
  - Problems related to time
    - Changes in the standard of medical care
6Background
- Various recalibration methods have been applied to adapt a risk model to local conditions
  - Outcome Scaling
    - Adjusting the model result by the outcome event ratio between the new and original models
  - Model Refitting
    - Applying a new model to the result of the original model
    - Including the result of the original model as a covariate in the new model
  - Remodeling
    - Fitting a new model using the same covariates
7Background
- These techniques have been variably successful in improving calibration for local populations
- Relative performance of these techniques has not been well-described in the literature
- Application of these techniques over multiple consecutive time periods of data for a population has not been reported
8Background
- Logistic Regression is the most common risk modeling technique used in medicine
- Interventional Cardiology
  - High data quality (National Data Element Standard)
  - Many large published risk models
  - Risk factors for outcome are well-known
  - Access
9Purpose
- The purpose of this study was to evaluate
well-known recalibration methods for Logistic
Regression over multiple periods of time to
compare the relative performance of each method
in the domain of Interventional Cardiology.
10Methods: Source Data
- Brigham and Women's Hospital
- 720-bed academic teaching hospital
- Interventional Cardiology suites
- Electronic data collection
- Compliant with National Data Element Standard
- State-mandated data collection for every case
11Methods: Source Data
- All PCI cases performed from January 1, 2002 to December 31, 2004 were included
- The outcome of interest was post-procedural in-hospital mortality
- A separate data set was created for each year of cases
12Methods: Source Data
Year   Cases   Mortality (%)
2002   1947    15 (0.8)
2003   1841    33 (1.8)
2004   1767    33 (1.9)
13Methods: Data Collection
- The best-known LR risk models were used for the evaluation (event rate)
  - American College of Cardiology (ACC)
    - 707/50123 (1.4%)
  - Northern New England (NNE)
    - 165/15331 (1.1%)
  - Cleveland Clinic (CCL)
    - 169/12985 (1.3%)
  - University of Michigan (MIC)
    - 169/10796 (1.6%)
14Methods: Data Collection
- All statistical evaluations were performed using SAS 9.1 (Cary, NC)
- Discrimination was measured by the area under the Receiver Operating Characteristic (ROC) curve
15Methods: Study Data
- Three calibration evaluations
  - Hosmer-Lemeshow Goodness-of-Fit
  - Brier Score / Spiegelhalter Z Score
  - Calibration Plot (Intercept/Slope)
    - Graphical only
    - Based on risk deciles in HL GOF algorithm
- For each recalibration, the prior year was used to recalibrate (2002 -> 2003, 2003 -> 2004)
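The two numerical calibration evaluations can be sketched as follows. This is a minimal illustration using the standard textbook forms of the Hosmer-Lemeshow chi-square (cases sorted by predicted risk and split into equal-size groups) and the Spiegelhalter Z statistic. It is not a reproduction of the SAS procedures used in the study. The function names and the example data are hypothetical, and every predicted risk is assumed to lie strictly between 0 and 1.

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """HL chi-square over equal-size risk groups (deciles by default)."""
    pairs = sorted(zip(p, y))          # order cases by predicted risk
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        mean_p = expected / n_g
        chi2 += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return chi2


def spiegelhalter_z(y, p):
    """Spiegelhalter Z: standardized test statistic based on the Brier score.
    Negative values arise when low predicted risks are too high on average."""
    num = sum((yi - pi) * (1 - 2 * pi) for yi, pi in zip(y, p))
    var = sum((1 - 2 * pi) ** 2 * pi * (1 - pi) for pi in p)
    return num / math.sqrt(var)


# Hypothetical data: one event in four cases, every case predicted at 25%.
hl_example = hosmer_lemeshow([0, 0, 0, 1], [0.25] * 4, groups=2)
z_calibrated = spiegelhalter_z([0, 0, 0, 1], [0.25] * 4)

# Hypothetical over-predicting model: 40% predicted, 20% observed.
z_overpredict = spiegelhalter_z([0, 0, 0, 0, 1], [0.4] * 5)
```

A large HL chi-square or a Z score far from zero both point to poor calibration, but the two statistics weight errors differently and need not agree.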
16Methods: Post-Score Scaling (PSY)
- At the case level, model results are scaled by the following equation
- P(PSY) can exceed 1 for some cases when ObservedEventRate > ModelEventRate; these values are truncated to 1
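The scaling equation did not survive in this copy of the slides. A minimal sketch, assuming the scaling is P(PSY) = P(model) x ObservedEventRate / ModelEventRate, where ObservedEventRate is the local event rate in the prior year and ModelEventRate is the event rate reported for the published model, might look like this. The function names are hypothetical; the event counts are taken from the source-data and model slides.

```python
def psy_factor(observed_event_rate, model_event_rate):
    """Ratio applied to every case-level probability (assumed form)."""
    return observed_event_rate / model_event_rate


def post_score_scale(probs, factor):
    """Scale each probability and truncate any result above 1 to 1."""
    return [min(p * factor, 1.0) for p in probs]


# 2002 local event rate (15/1947) against the ACC model rate (707/50123).
factor_acc_2003 = psy_factor(15 / 1947, 707 / 50123)
scaled = post_score_scale([0.01, 0.20, 0.90], factor_acc_2003)

# 2003 local event rate (33/1841) against the NNE model rate (165/15331):
# the factor is above 1, so high-risk cases can be pushed past 1 and truncated.
factor_nne_2004 = psy_factor(33 / 1841, 165 / 15331)
truncated = post_score_scale([0.01, 0.20, 0.90], factor_nne_2004)
```

Under this assumption the ACC factor for 2003 is about 0.55, which is in line with the ACC expected deaths for 2003 falling from 414 without recalibration to 226 with PSY in the results tables.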
17Methods: LR Intercept Scaling (IntY)
- In the general LR equation
- B0 is the intercept of the equation
- This term represents the outcome probability in the absence of all other risk factors (baseline risk)
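The role of the intercept can be illustrated with the standard logistic regression form, P = 1 / (1 + exp(-(B0 + B1*X1 + ... + Bn*Xn))). The sketch below uses hypothetical coefficients (not taken from any of the published models) to show that with all risk factors absent the predicted probability depends on B0 alone.

```python
import math


def lr_probability(b0, betas, x):
    """Standard logistic regression: inverse logit of the linear predictor."""
    z = b0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for illustration only.
b0 = -4.0
betas = [0.8, 1.5]

# All risk factors absent: the result depends only on the intercept.
baseline_risk = lr_probability(b0, betas, [0, 0])   # about 0.018

# First risk factor present: risk rises above the baseline.
risk_with_factor = lr_probability(b0, betas, [1, 0])
```

Shifting B0 up or down moves every patient's predicted risk in the same direction without changing the risk ordering of patients.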
18Methods: LR Intercept Scaling (IntY)
- The proportion of risk contributed by the intercept (baseline) can be calculated for a data set by
19Methods: LR Intercept Scaling (IntY)
- The proportion of risk (RiskInt(%)) is multiplied by the observed event rate, and converted back from a probability to a Beta coefficient
- If ObsEventRate(New) > ObsEventRate(Old), then the probability can exceed 1, and is truncated to 1
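The conversion between a probability and a Beta coefficient described above is the logit transform. The sketch below is an illustration only: the formula for RiskInt is not reproduced in this copy of the slides, so `risk_int` is treated as a given input expressed as a fraction, and the example values are hypothetical. Because the logit of exactly 1 is infinite, the sketch truncates slightly below 1; that detail is an assumption of this example, not something stated in the source.

```python
import math


def logit(p):
    """Convert a probability to a log-odds (Beta coefficient) scale."""
    return math.log(p / (1.0 - p))


def inverse_logit(b):
    """Convert a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-b))


def rescaled_intercept(risk_int, observed_event_rate, cap=1.0 - 1e-9):
    """Multiply the intercept's share of risk by the observed event rate,
    truncate, and convert the resulting probability to a new intercept.
    `cap` keeps the logit finite (assumption of this sketch)."""
    p_baseline = min(risk_int * observed_event_rate, cap)
    return logit(p_baseline)


# Hypothetical share of risk (25%) and the 2003 event rate from the source data.
new_b0 = rescaled_intercept(0.25, 33 / 1841)
new_baseline = inverse_logit(new_b0)
```

Replacing the original intercept with the new one changes the baseline risk while leaving the coefficients of the other risk factors untouched.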
20Methods: Recalibration Methods
- LR Model Refitting (SigY)
  - In this method, the output probability of the original LR equation is used to model a new LR equation with that output as the only covariate
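A minimal sketch of this refitting step is shown below, assuming a plain maximum-likelihood fit by Newton-Raphson with an intercept and one slope on the original output probability. The study itself used SAS; the function names and the synthetic data here are hypothetical and stand in for one year of cases.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def refit(p_orig, y, iterations=25):
    """Fit outcome ~ a + b * p_orig by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(p_orig, y):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to a
            g1 += (yi - p) * x      # gradient with respect to b
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Synthetic stand-in data: the original model under-predicts by a factor of 2.
random.seed(0)
p_orig = [random.uniform(0.01, 0.30) for _ in range(2000)]
y = [1 if random.random() < min(1.0, 2.0 * p) else 0 for p in p_orig]

a, b = refit(p_orig, y)
p_new = [sigmoid(a + b * p) for p in p_orig]
```

On the data used for fitting, the refitted probabilities sum to the observed number of events, which is a property of any maximum-likelihood logistic fit that includes an intercept. In the study design, the coefficients would be estimated on the prior year and then applied to the following year's model output.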
21Results: ROC with 95% Confidence Intervals
22Results: No Recalibration
Year  Model  Obs  Exp   HL Chi2  Spiegelhalter Z
2003  ACC    33   414   634      -11.4
2003  NNE    33   39.0  24.3     0.08
2003  MIC    33   27.2  6.6      1.51
2003  CCL    33   56.3  14.0     -3.49
2004  ACC    33   418   641      -11.8
2004  NNE    33   36.6  51.0     0.41
2004  MIC    33   23.3  22.9     1.99
2004  CCL    33   60.3  21.2     -3.78
23Results: Post-Scale (PSY) Recalibration
Year  Model  Obs  Exp   HL Chi2  Spiegelhalter Z
2003  ACC    33   226   210      -13.6
2003  NNE    33   27.9  32.8     1.43
2003  MIC    33   13.4  40.4     5.63
2003  CCL    33   33.3  5.8      -0.74
2004  ACC    33   524   1233     -4.91
2004  NNE    33   58.9  44.7     -1.14
2004  MIC    33   26.7  18.0     1.26
2004  CCL    33   82.9  41.0     -4.79
24Results: 2003 PSY vs None
25Results: 2004 PSY vs None
26Results: LR Intercept Scaling (IntY) Recalibration
Year  Model  Obs  Exp   HL Chi2  Spiegelhalter Z
2003  ACC    33   45.1  10.0     -2.20
2003  NNE    33   26.0  43.6     2.52
2003  MIC    33   22.1  12.7     2.78
2003  CCL    33   24.8  10.5     1.25
2004  ACC    33   34.1  14.6     -0.90
2004  NNE    33   28.9  69.8     1.82
2004  MIC    33   26.5  17.6     1.22
2004  CCL    33   33.5  14.2     -0.50
27Results: 2003 IntY vs None
28Results: 2004 IntY vs None
29Results: LR Refitting (SigY) Recalibration
Year  Model  Obs  Exp   HL Chi2  Spiegelhalter Z
2003  ACC    33   24.0  12.7     1.16
2003  NNE    33   18.6  32.9     4.14
2003  MIC    33   20.1  24.0     4.56
2003  CCL    33   25.5  15.2     2.18
2004  ACC    33   32.0  35.7     -0.47
2004  NNE    33   31.2  21.7     1.00
2004  MIC    33   31.0  23.6     0.27
2004  CCL    33   31.6  13.2     0.84
30Results: 2003 SigY vs None
31Results: 2004 SigY vs None
32Conclusions
- All 4 models failed to maintain calibration on the data without recalibration
- The two measures of calibration used (HL and Brier) commonly disagreed
- If a model was considered to be recalibrated only if both methods showed calibration, then
  - Best was intercept adjustment (IntY): 3 / 8
  - 2nd was LR refitting (SigY): 2 / 8
33Limitations
- Low event rate makes attaining statistical significance for results more difficult
- Variation between the 2002 and 2003/2004 event rates makes successful recalibration less likely in 2003 than in 2004
34Future Directions
- Analyze 2005 data after conclusion of the year
- Include locally derived model (from 2002 data)
- Include Support Vector Machine evaluation
35 Michael Matheny, MD
mmatheny_at_dsg.harvard.edu
Brigham and Women's Hospital
Thorn 309
75 Francis Street
Boston, MA 02115
The End