Combining GLM and data mining techniques - PowerPoint PPT Presentation

About This Presentation

Description:

Greg Taylor. Taylor Fry Consulting Actuaries. University of Melbourne ... ANN may be most useful as an automated tool for seeking out detailed trends in data ... – PowerPoint PPT presentation

Slides: 30
Provided by: PeterMu6

Transcript and Presenter's Notes



1
Combining GLM and data mining techniques
  • Greg Taylor
  • Taylor Fry Consulting Actuaries
  • University of Melbourne
  • University of New South Wales
  • Casualty Actuarial Society
  • Special Interest Seminar on Predictive Modeling
  • Boston, October 4-5 2006

2
Overview
  • Examine general form of model of claims data
  • Examine the specific case of a GLM to represent
    the data
  • Consider how the GLM structure is chosen
  • Introduce and discuss Artificial Neural Networks
    (ANNs)
  • Consider how these may assist in formulating a
    GLM
  • Presentation draws heavily on work of colleague
    Dr Peter Mulquiney

3
Model of claims data
  • General form of claims data model
  • $Y_i = f(X_i; \beta) + \varepsilon_i$
  • $Y_i$: some observation on claims experience
  • $\beta$: vector of parameters that apply to all
    observations
  • $X_i$: vector of attributes (covariates) of the i-th
    observation
  • $\varepsilon_i$: vector of centred stochastic error terms

4
Model of claims data
  • Examples
  • $Y_i = Y_{ad}$: paid losses in the (a, d) cell, where
  • a = accident period
  • d = development period
  • $Y_i$: cost of the i-th completed claim

5
Examples (contd)
  • $Y_{ad}$: paid losses in the (a, d) cell
  • $E[Y_{ad}] = \beta_d \sum_{r=1}^{d-1} Y_{ar}$ (chain ladder)
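A minimal Python sketch of the chain-ladder mean, assuming a small triangle of incremental paid losses held as a list of rows (one per accident period; the numbers are hypothetical). Each $\beta_d$ is estimated by the usual volume-weighted ratio: paid in development period d divided by cumulative paid up to d − 1, over the accident periods in which d has been observed. Indices in the code start at 0 rather than 1.

```python
from typing import List

# Hypothetical triangle of incremental paid losses Y[a][d]:
# row a = accident period, column d = development period (0-based).
TRIANGLE: List[List[float]] = [
    [100.0, 50.0, 25.0],
    [110.0, 55.0],
    [120.0],
]


def estimate_betas(tri: List[List[float]]) -> List[float]:
    """beta_d = sum_a Y[a][d] / sum_a (Y[a][0] + ... + Y[a][d-1]), over rows where d is observed."""
    n_dev = max(len(row) for row in tri)
    betas = [0.0]  # placeholder: development period 0 has no earlier payments
    for d in range(1, n_dev):
        observed = [row for row in tri if len(row) > d]
        paid_at_d = sum(row[d] for row in observed)
        paid_before_d = sum(sum(row[:d]) for row in observed)
        betas.append(paid_at_d / paid_before_d)
    return betas


def expected_incremental(row: List[float], d: int, betas: List[float]) -> float:
    """Chain-ladder mean E[Y_ad] = beta_d * (sum of payments in periods before d)."""
    return betas[d] * sum(row[:d])


def complete_row(row: List[float], betas: List[float]) -> List[float]:
    """Fill the unobserved cells of one accident period, feeding projections back in."""
    full = list(row)
    for d in range(len(row), len(betas)):
        full.append(expected_incremental(full, d, betas))
    return full


if __name__ == "__main__":
    betas = estimate_betas(TRIANGLE)
    for row in TRIANGLE:
        print(complete_row(row, betas))
```

Feeding projected cells back into the running sum is what makes this equivalent to applying cumulative development factors.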

6
Examples (contd)
  • $E[Y_{ad}] = A_a d^{b_a} \exp(-c_a d) = \exp[\alpha_a + \beta_a \ln d - \gamma_a d]$
    (Hoerl curve for each accident period's payments)
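The two forms of the Hoerl curve are the same function, with $\alpha_a = \ln A_a$, $\beta_a = b_a$ and $\gamma_a = c_a$. The short check below (parameter values are arbitrary illustrations) confirms this numerically for a single accident period. The log-linear form is what allows the curve to be fitted as a GLM with a log link and covariates $\ln d$ and $d$.

```python
import math
from typing import Tuple


def hoerl_multiplicative(A: float, b: float, c: float, d: float) -> float:
    """Hoerl curve A * d**b * exp(-c * d) for development period d > 0."""
    return A * d**b * math.exp(-c * d)


def hoerl_log_linear(alpha: float, beta: float, gamma: float, d: float) -> float:
    """The same curve as exp(alpha + beta * ln d - gamma * d): linear in the
    parameters on the log scale, with covariates ln d and d."""
    return math.exp(alpha + beta * math.log(d) - gamma * d)


def to_log_linear(A: float, b: float, c: float) -> Tuple[float, float, float]:
    """Parameter mapping between the two forms: alpha = ln A, beta = b, gamma = c."""
    return math.log(A), b, c


if __name__ == "__main__":
    A, b, c = 1000.0, 1.5, 0.4  # arbitrary illustrative values
    alpha, beta, gamma = to_log_linear(A, b, c)
    for d in range(1, 6):
        print(d, hoerl_multiplicative(A, b, c, d), hoerl_log_linear(alpha, beta, gamma, d))
```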

7
Examples (contd)
  • $Y_i$: cost of the i-th completed claim
  • $Y_i \sim$ Gamma
  • $E[Y_i] = \exp[\alpha_{a_i} + \beta t_i]$
  • where
  • $a_i$: accident period to which the i-th claim belongs
  • $t_i$: operational time at completion of the i-th claim
  • = proportion of claims from accident period $a_i$
    completed before the i-th claim
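A minimal sketch of how operational time might be computed from individual claim records. It assumes that the total number of claims for each accident period is supplied (for recent accident periods this would typically need to be estimated), that completion times are comparable numbers, and that the mean has the form $\exp(\alpha_{a_i} + \beta t_i)$, that is, an accident-period level plus a linear operational-time effect. All names and data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def operational_times(claims: List[Tuple[str, float]],
                      n_claims: Dict[str, int]) -> List[float]:
    """claims: (accident_period, completion_time) for each completed claim.
    n_claims: total number of claims assumed for each accident period.
    Returns t_i = (claims of the same accident period completed before claim i) / n_claims[a_i].
    Ties in completion_time are broken by input order (an assumption)."""
    by_period: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for i, (period, completed_at) in enumerate(claims):
        by_period[period].append((completed_at, i))
    t = [0.0] * len(claims)
    for period, items in by_period.items():
        items.sort()  # order by completion time, then by input position
        for rank, (_, i) in enumerate(items):
            t[i] = rank / n_claims[period]
    return t


def gamma_glm_mean(alpha: Dict[str, float], beta: float, period: str, t: float) -> float:
    """Assumed mean E[Y_i] = exp(alpha_{a_i} + beta * t_i) of the Gamma claim-size model."""
    return math.exp(alpha[period] + beta * t)


if __name__ == "__main__":
    # Hypothetical claims: (accident period, completion time)
    claims = [("2001Q1", 3.0), ("2001Q1", 1.0), ("2001Q2", 2.0), ("2001Q1", 2.0)]
    n_claims = {"2001Q1": 4, "2001Q2": 2}
    print(operational_times(claims, n_claims))  # [0.5, 0.0, 0.0, 0.25]
```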

8
Examples of individual claim models
  • More generally, $E[Y_i]$ = exp of the sum of
  • function of operational time

9
Examples of individual claim models (contd)
  • function of accident period (legislative
    change)

10
Examples of individual claim models (contd)
  • function of completion period (superimposed
    inflation)

11
Examples of individual claim models (contd)
  • joint function (interaction) of operational
    time and accident period (change in payment pattern
    attributable to legislative change)
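To make the structure concrete, here is a sketch of one possible linear predictor on the log scale. The slides only say "function of" each variable. The specific forms below are all assumptions for illustration: linear in operational time, a single step at a hypothetical legislative-change quarter, a constant rate of superimposed inflation per completion quarter, and a linear interaction. The parameter values are also assumptions.

```python
import math
from typing import Dict

# Hypothetical parameter values, for illustration only.
PARAMS: Dict[str, float] = {
    "intercept": 9.0,
    "op_time": 2.0,        # slope of the operational-time effect
    "legislation": 0.1,    # step in the accident-period effect at the rule change
    "si_rate": 0.01,       # superimposed inflation per completion quarter (log scale)
    "interaction": -0.5,   # change in operational-time slope after the rule change
}
CHANGE_QUARTER = 20  # hypothetical first accident quarter under the new rules


def linear_predictor(op_time: float, acc_q: int, comp_q: int,
                     p: Dict[str, float] = PARAMS) -> float:
    """Sum of the four effects; E[Y_i] = exp(linear_predictor(...))."""
    after_change = 1.0 if acc_q >= CHANGE_QUARTER else 0.0
    return (p["intercept"]
            + p["op_time"] * op_time                      # function of operational time
            + p["legislation"] * after_change             # function of accident period
            + p["si_rate"] * comp_q                       # function of completion period
            + p["interaction"] * op_time * after_change)  # joint function (interaction)


def expected_claim_size(op_time: float, acc_q: int, comp_q: int) -> float:
    """Mean claim size under the log link."""
    return math.exp(linear_predictor(op_time, acc_q, comp_q))
```

With this form, moving one completion quarter later multiplies the expected claim size by exp(si_rate), which is how a constant rate of superimposed inflation appears in a log-link GLM.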

12
Examples of individual claim models (contd)
  • Models of this type may be very detailed
  • May include
  • Operational time effect (payment pattern)
  • Seasonality
  • Creeping change in payment pattern
  • Abrupt change in payment pattern
  • Accident period effect (legislative change)
  • Completion quarter effect (superimposed
    inflation)
  • Variations in superimposed inflation over time
  • Variations of superimposed inflation with
    operational time
  • etc

13
Identification of data features
  • Typically largely ad hoc, using
  • Trial and error regressions
  • Diagnostics, e.g. residual plots

14
Identification of data features - illustration
  • Modelling about 60,000 Auto Bodily Injury claims
  • First fitting just an operational time effect

15
Identification of data features - illustration
  • But there appear to be unmodelled trends by
  • Accident quarter
  • Completion (finalisation) quarter
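One simple way to look for trends like these is to average the residuals within each accident or completion quarter and check for systematic drift. A minimal sketch, assuming a Gamma GLM (as in the claim-size model earlier) whose fitted means are already available. The data are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def gamma_pearson_residuals(y: Sequence[float], mu: Sequence[float]) -> List[float]:
    """For a Gamma GLM, Var(Y) = phi * mu**2, so (y - mu) / mu is the Pearson residual
    up to the constant factor 1 / sqrt(phi)."""
    return [(yi - mi) / mi for yi, mi in zip(y, mu)]


def mean_by_group(values: Sequence[float], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Average residual within each group (e.g. accident quarter or completion quarter)."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


if __name__ == "__main__":
    # Hypothetical claim sizes, fitted means from an operational-time-only model,
    # and the accident quarter of each claim.
    y = [110.0, 90.0, 130.0, 140.0]
    mu = [100.0, 100.0, 100.0, 100.0]
    acc_q = [1, 1, 2, 2]
    print(mean_by_group(gamma_pearson_residuals(y, mu), acc_q))
```

A mean residual that drifts steadily across quarters suggests an accident-quarter or completion-quarter term is missing from the model.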

16
Identification of data features - illustration
  • Final model includes terms for
  • Operational time
  • Seasonality
  • Claim frequency
  • Decrease induces increased claim sizes
  • Accident quarter
  • Change in Scheme rules
  • Change in operational time effect with change in
    Scheme rules
  • Superimposed inflation
  • Varying with operational time

17
Identification of data features: alternative approach
  • Final model is complex in structure
  • Structure identified in ad hoc manner
  • More rigorous approach desirable
  • Try Artificial Neural Network (ANN)
  • Essentially a form of non-linear regression

18
(Feed-forward) ANN for regression problem $Y = f(X)$
  • Start with vector of P inputs $X = (x_p)$
  • Create hidden layer with M hidden units
  • Make M linear combinations $h_m$ of the inputs
  • Linear combinations then passed through layer of
    activation functions $g(h_m)$

19
ANN for regression problem $Y = f(X)$
  • Activation function $g$
  • Commonly a sigmoidal curve
  • $g$ introduces non-linearity into the model
  • $g$ keeps the response bounded
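The slide does not name a specific sigmoid. The logistic function below is one common choice (tanh is another). It shows both properties listed: it is non-linear, and its output lies between 0 and 1 whatever the input.

```python
import math


def logistic(h: float) -> float:
    """Logistic sigmoid g(h) = 1 / (1 + exp(-h)), one common sigmoidal activation.
    g(h) lies between 0 and 1, which keeps each hidden unit's response bounded."""
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    z = math.exp(h)  # rewritten form avoids overflow in exp(-h) for large negative h
    return z / (1.0 + z)


if __name__ == "__main__":
    for h in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(h, logistic(h))
```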

20
ANN for regression problem $Y = f(X)$
  • $Y$ is then given by a linear combination of the
    outputs from the hidden layer
  • With enough hidden units, this function can approximate
    any continuous function (on a bounded domain) arbitrarily closely
  • 2 hidden layers ⇒ ANN can approximate any function
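The steps above amount to a single forward pass through the network. A minimal sketch in plain Python, using the labels $w_m$, $h_m$, $Z_m$ and $W_m$ from the ANN illustration, with the logistic function as $g$. The bias terms (w0 and W0 in the code) are an assumption, since the slides do not mention them.

```python
import math
from typing import List, Sequence


def logistic(h: float) -> float:
    # Numerically stable logistic sigmoid, used here as the activation g
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    z = math.exp(h)
    return z / (1.0 + z)


def ann_forward(x: Sequence[float], w: List[List[float]], w0: List[float],
                W: List[float], W0: float) -> float:
    """One-hidden-layer feed-forward network with M hidden units and P inputs:
        h_m = w0[m] + sum_p w[m][p] * x[p]   (linear combinations of the inputs)
        Z_m = g(h_m)                          (activation)
        Y   = W0 + sum_m W[m] * Z_m           (linear combination of hidden outputs)
    The bias terms w0 and W0 are an assumption; the slides do not mention them."""
    if not (len(w) == len(w0) == len(W)):
        raise ValueError("w, w0 and W must all have one entry per hidden unit")
    if any(len(row) != len(x) for row in w):
        raise ValueError("each row of w must have one weight per input")
    y = W0
    for w_m, b_m, W_m in zip(w, w0, W):
        h_m = b_m + sum(w_mp * x_p for w_mp, x_p in zip(w_m, x))
        y += W_m * logistic(h_m)
    return y
```

For example, with all input weights and biases zero every hidden unit outputs g(0) = 0.5, so `ann_forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [2.0, 4.0], 1.0)` returns 1 + 0.5 × (2 + 4) = 4.0.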

21
Illustration of ANN

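[Diagram: inputs $X_i$, input weights $w_m$, hidden-unit inputs $h_m$, activation function $g$, hidden-unit outputs $Z_m$, output weights $W_m$]
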
22
Training of ANN
  • Weights are usually determined by minimising the
    least-squares error
  • A weight-decay penalty (tuning parameter $\lambda$) helps
    prevent overfitting
  • Larger $\lambda$ ⇒ smaller weights
  • Smaller weights ⇒ smoother fit
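The slide does not give the form of the penalty. A common choice, assumed here, is the sum of squared weights, so that training minimises $\sum_i (Y_i - \hat{Y}_i)^2 + \lambda \sum_j w_j^2$. An ANN has no closed-form minimiser, so the effect of $\lambda$ is illustrated with a stand-in: a one-weight linear model (ridge regression), not the ANN itself. Setting the derivative to zero gives $w = \sum x_i y_i / (\sum x_i^2 + \lambda)$, which shrinks towards 0 as $\lambda$ grows.

```python
from typing import Sequence


def penalised_sse(y: Sequence[float], y_hat: Sequence[float],
                  weights: Sequence[float], lam: float) -> float:
    """Least-squares error plus weight-decay penalty lam * (sum of squared weights)."""
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return sse + lam * sum(w * w for w in weights)


def single_weight_fit(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Minimiser of sum (y_i - w * x_i)**2 + lam * w**2 over a single weight w
    (setting the derivative to zero gives w = sum x_i y_i / (sum x_i**2 + lam))."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    for lam in (0.0, 1.0, 14.0):
        print(lam, single_weight_fit(x, y, lam))  # weight shrinks from 2.0 towards 0
```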

23
Training of ANN - example
  • Training data set: 70% of available data
  • Test data set: 30% of available data
  • Network structure
  • Single hidden layer
  • 20 units
  • Weight decay $\lambda = 0.05$
  • These tuning parameters determined by
    cross-validation
  • Prediction error in test data set
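A sketch of the data handling described above. It assumes a simple random 70/30 split and 5-fold cross-validation within the training set; the slide does not say how the split was drawn or how many folds were used. The model-fitting step is left as a hypothetical user-supplied callable so that the sketch stays self-contained.

```python
import random
from typing import Callable, List, Sequence, Tuple


def train_test_split(n: int, train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign n record indices to a training set (train_frac) and a test set (the rest)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train_frac * n)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def k_fold(indices: Sequence[int], k: int = 5,
           seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices into k folds; return (fit, validation) index pairs, one per fold."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(j for f in folds[:i] + folds[i + 1:] for j in f), sorted(folds[i]))
            for i in range(k)]


def choose_by_cv(train_idx: Sequence[int], grid: Sequence[Tuple[int, float]],
                 cv_error: Callable[[Tuple[int, float], List[int], List[int]], float],
                 k: int = 5) -> Tuple[int, float]:
    """Return the (hidden units, weight decay) pair with the lowest mean validation error.
    cv_error is a hypothetical user-supplied function that fits an ANN on the fit
    indices and returns its error on the validation indices."""
    folds = k_fold(train_idx, k)
    return min(grid, key=lambda params: sum(cv_error(params, f, v) for f, v in folds) / k)
```

The test set is held back entirely from `choose_by_cv`, so the reported prediction error is not influenced by the choice of tuning parameters.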

24
Comparison of GLM and ANN
  • GLM: average absolute error 33,777
  • ANN: average absolute error 33,559
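The slide does not define "average absolute error" precisely or state its units. The sketch below assumes it is the mean absolute difference between actual and predicted claim sizes over the test set. Applied to the two reported figures, the ANN's error is lower by about 0.65%, so on this measure the two models perform very similarly.

```python
from typing import Sequence


def average_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| over the test records."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline: float, challenger: float) -> float:
    """Proportional reduction in error of the challenger relative to the baseline."""
    return (baseline - challenger) / baseline


if __name__ == "__main__":
    # Figures from the slide: GLM 33,777, ANN 33,559
    print(f"{relative_reduction(33_777, 33_559):.2%}")  # 0.65%
```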

25
GLM and ANN forecasts
  • Both by simple extrapolation of trends here
  • ANN case
  • Development quarter 10: red
  • Development quarter 20: green
  • Development quarter 30: yellow
  • Development quarter 40: blue
  • Note negative superimposed inflation
  • May be undesirable

[Figure: ANN extrapolation]
26
GLM and ANN forecasts
  • But ANN useful in searching out general form of
    past superimposed inflation
  • Which can then be modelled explicitly in GLM

[Figure: ANN extrapolation]
27
Application of ANN
  • Generalisation of preceding remark
  • ANN may be most useful as an automated tool for
    seeking out detailed trends in data
  • Apply ANN to data set
  • Study trends in fitted model against a range of
    predictors or pairs of predictors
  • Use this knowledge to choose the functional forms
    of the terms included in the linear predictor of the GLM
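The slide does not say how the trends in the fitted ANN are studied. One common technique, chosen here as an assumption, is partial dependence. For each value on a grid of one predictor, that predictor is set to the grid value in every observed record, and the model's predictions are averaged. The fitted model is represented by any hypothetical callable.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(model: Callable[[Row], float], data: Sequence[Row],
                       feature: str, grid: Sequence[float]) -> List[float]:
    """For each grid value v, set `feature` to v in every observed row and average the
    model's predictions; the resulting curve shows the fitted trend in that predictor."""
    curve = []
    for v in grid:
        total = 0.0
        for row in data:
            modified = dict(row)
            modified[feature] = v
            total += model(modified)
        curve.append(total / len(data))
    return curve


if __name__ == "__main__":
    # Hypothetical stand-in for a fitted ANN: any callable mapping a row to a prediction.
    def fitted(r: Row) -> float:
        return 2.0 * r["op_time"] + 0.1 * r["acc_q"]

    data = [{"op_time": 0.2, "acc_q": 10.0}, {"op_time": 0.8, "acc_q": 20.0}]
    print(partial_dependence(fitted, data, "op_time", [0.0, 0.5, 1.0]))
```

For pairs of predictors, the same idea applies over a two-dimensional grid, setting both features in each record before averaging.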

28
Application of ANN (contd)
  • Ultimate test of the GLM is to apply ANN to its
    residuals, seeking structure
  • There should be none
  • The example indicates that the chosen GLM
    structure may
  • Over-estimate the more recent experience at the
    mid-ages of claim
  • Under-estimate it at the older ages

29
Conclusions
  • GLMs provide a powerful and flexible family of
    models for claims data
  • Complex GLM structures may be required for
    adequate representation of the data
  • The identification of these may be difficult
  • The identification procedures are likely to be ad
    hoc
  • ANNs provide an alternative form of non-linear
    regression
  • These are likely to have shortcomings of their own
    if used alone
  • They may, however, provide considerable
    assistance if used in parallel with GLMs to
    identify GLM structure