Data mining in Health Insurance (PowerPoint presentation transcript)
1
Data mining in Health Insurance
2
Introduction
  • Rob Konijn, rob.konijn@achmea.nl
  • VU University Amsterdam
  • Leiden Institute of Advanced Computer Science
    (LIACS)
  • Achmea Health Insurance
  • Currently working here
  • Delivering leads for other departments to follow
    up
  • Fraud, abuse
  • Research topic keywords: data mining /
    unsupervised learning / fraud detection

3
Outline
  • Intro: Application
  • Health Insurance
  • Fraud detection
  • Part 1: Subgroup discovery
  • Part 2: Anomaly detection (slides partly by Z.
    Slavik, VU)

4
Intro: Application
  • Health Insurance Data
  • Health Insurance in NL
  • Obligatory
  • Only private insurance companies
  • About 100 euro/month (everyone) + 170 euro
    (income-dependent)
  • Premium increase of 5-12% each year
  • Achmea: about 6 million customers

5
Funding of Health Insurance Costs in the
Netherlands
[Diagram: money flows in the Dutch system. Employers pay an income-dependent
contribution (ca. 17 bln euro) and the state a contribution for insured under
18 (ca. 2 bln euro) into the risk equalization fund (vereveningsfonds), which
pays equalization contributions (ca. 18 bln euro) to the health insurers. The
insured pay the nominal premium (18+) directly to their insurer: a calculation
premium of ca. 947 euro per insured (ca. 12 bln euro) plus a surcharge of ca.
150 euro per insured (ca. 2 bln euro). Total healthcare expenditure: ca. 30
bln euro.]
6
Risk equalization model (vereveningsmodel)
  • By population characteristics
  • Age
  • Gender
  • Income, social class
  • Type of work
  • Calculation afterwards
  • High-cost compensation (>15,000 euro)

[Chart: average annual cost per insured by age band (0-4 up to 90+), men vs.
women; roughly 900-1,500 euro per year up to middle age, rising to over 3,000
euro above age 80.]
7
Fraud in healthcare
8
Introduction Application: The Data
  • Transactional data
  • Records of an event
  • Visit to a medical practitioner
  • Charged directly by the medical practitioner
  • Patient is not involved
  • Risk of fraud

9
Transactional Data
  • Transactions = facts
  • Achmea: about 200 million transactions per year
  • Info on customers and practitioners = dimensions

10
Different levels of hierarchy
  • Records represent events
  • However, for example for fraud detection, we are
    interested in customers or medical practitioners
  • See examples on the next pages
  • Groups of records: Subgroup Discovery
  • Individual patients/practitioners: outlier detection

11
Different types of fraud: hierarchy
  • On a patient level, or on a hospital level

12
Handling the different hierarchy levels
  • Creating profiles from transactional data
  • Aggregating costs over a time period
  • Each record = one patient
  • Each attribute i (i = 1 to n) = cost spent on
    treatment i
  • Feature construction, for example (see the sketch
    after this list)
  • The ratio of long/short consults (G.P.)
  • The ratio of 3-way and 2-way fillings (dentist)
  • Usually used for one-way analysis

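A minimal pandas sketch of this profile construction; the claim rows, treatment codes used as column names, and the constructed ratio feature are illustrative, not taken from the slides.

import pandas as pd

# Hypothetical transactional claims: one record per treatment event
claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 2],
    "treatment":  ["V11", "M55", "V11", "V21", "V11"],
    "cost":       [42.0, 21.5, 42.0, 15.0, 42.0],
})

# Profile: one row per patient, one column per treatment code,
# costs summed over the time period
profiles = claims.pivot_table(index="patient_id", columns="treatment",
                              values="cost", aggfunc="sum", fill_value=0.0)

# Example constructed feature: share of one-sided fillings (V11) in total cost
profiles["v11_share"] = profiles["V11"] / profiles.sum(axis=1)
print(profiles)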
13
Different types of fraud detection
  • Supervised
  • A labeled fraud set
  • A labeled non-fraud set
  • Credit cards, debit cards
  • Unsupervised
  • No labels
  • Health Insurance, Cargo, telecom, tax etc.

14
Unsupervised learning in Health Insurance Data
  • Anomaly Detection (outlier detection)
  • Finding individual deviating points
  • Subgroup Discovery
  • Finding (descriptions of) deviating groups
  • Focus on differences and uncommon behavior
  • In contrast to other unsupervised learning
    methods
  • Clustering
  • Frequent Pattern mining

15
Subgroup Discovery
  • Goal: find differences in claim behavior of
    medical practitioners
  • To detect inefficient claim behavior
  • Actions
  • A visit from the account manager
  • To include in contract negotiations
  • In the extreme case: fraud
  • Investigation by the fraud detection department
  • By describing deviations of a practitioner from
    its peers
  • Subgroups

16
Patient-level, Subgroup Discovery
  • Subgroup (orange): group of patients
  • Target (red)
  • Indicates whether a patient visited a
    practitioner (1), or not (0)

17
Subgroup Discovery: Quality Measures
  • Target: dentist, 1672 patients
  • Compare with peer group, 100,000 patients in
    total
  • Subgroup: V11 > 42 euro, 10347 patients
  • V11 = one-sided filling
  • Cross table:

                target dentist   rest     total
    V11 > 42    871              9476     10347
    rest        801              88852    89653
    total       1672             98328    100000
18
The cross table
  • Cross table in data
  • Cross table expected
  • Assuming independence

  Observed (in the data):
                target dentist   rest     total
    V11 > 42    871              9476     10347
    rest        801              88852    89653
    total       1672             98328    100000

  Expected (assuming independence):
                target dentist   rest     total
    V11 > 42    173              10174    10347
    rest        1499             88154    89653
    total       1672             98328    100000
19
Calculating WRAcc and Lift
  (Observed and expected cross tables as on the previous slide.)
  • Size of the subgroup: P(S) = 0.10347; size of the
    target dentist: P(T) = 0.01672
  • Weighted Relative ACCuracy (WRAcc) = P(S ∩ T) - P(S)·P(T)
    = (871 - 173)/100000 = 698/100000
  • Lift = P(S ∩ T) / (P(S)·P(T)) = 871/173 ≈ 5.03
    (see the sketch below)

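A minimal Python sketch of this calculation from the cross-table counts (the numbers come from the slide; the variable names are illustrative):

# Cross-table counts for subgroup "V11 > 42 euro" vs. the target dentist
n_total = 100000              # peer group size
n_subgroup = 10347            # patients in the subgroup S
n_target = 1672               # patients of the target dentist T
n_both = 871                  # patients in S and of the target dentist

p_s = n_subgroup / n_total    # P(S)  = 0.10347
p_t = n_target / n_total      # P(T)  = 0.01672
p_st = n_both / n_total       # P(S and T)

expected = p_s * p_t * n_total        # ~173 patients expected under independence
wracc = p_st - p_s * p_t              # (871 - 173) / 100000 ~ 0.00698
lift = p_st / (p_s * p_t)             # 871 / 173 ~ 5.03
print(f"expected={expected:.0f}  WRAcc={wracc:.5f}  lift={lift:.2f}")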
20
Example: dentistry, at depth 1, one target dentist
21
ROC analysis, target dentist
22
Making SD more useful: adding prior knowledge
  • Adding prior knowledge
  • Background variables of the patient (age, gender, etc.)
  • Specialism of the practitioner
  • For dentistry: choice of insurance
  • Adding already known differences
  • Already detected by domain experts themselves
  • Already detected during a previous data mining
    run

23
Prior Knowledge, Motivation
24
Example, influence of prior knowledge
25
The idea: create an expected cross table using
prior knowledge
26
Quality Measures
  • Ratio (Lift)
  • Difference (WRAcc)
  • Squared sum (Chi-square statistic); see the formulas below

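Written in terms of the observed count o and the expected count e of target cases inside the subgroup, with N patients in total, the usual forms of these measures are (the slides only name them, so the exact notation is an assumption):

\mathrm{Lift} = \frac{o}{e}, \qquad
\mathrm{WRAcc} = \frac{o - e}{N}, \qquad
\chi^2 = \sum_{\text{cells}} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}

With the numbers from the earlier dentist example (o = 871, e = 173, N = 100000) this gives Lift ≈ 5.03 and WRAcc = 698/100000, matching the calculation above.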
27
Example, iterative approach
  • Idea: add subgroups to the prior knowledge iteratively
  • Target: a single pharmacy
  • Patients that visited the hospital in the last 3
    years are removed from the data
  • Compare with peer group (400,000 patients); 2929
    patients of the target pharmacy
  • Top subgroup: B03XA01 (erythropoietin) > 0 euro

  Cross table:
                   target pharmacy   rest
    B03XA01 > 0    1297              224
    rest           1632              396,847
28
Next iteration
  • Add B03XA01 (EPO) > 0 euro to the prior knowledge
  • Next best subgroup: N05AX08 (Risperdal) > 500
    euro

29
Figure describing subgroup N05AX08 > 500 euro.
Left: target pharmacy; right: other pharmacies.
30
Addition: adding costs to the quality measure
  • M55 = dental cleaning
  • V11 = 1-way filling
  • V21 = polishing
  • Cost of treatments in the subgroup: 370 euro (average)
  • 791 more patients than expected
  • Total quality: 791 × 370 ≈ 292,469 euro

31
Iterative approach, top 3 subgroups
  • V12 = 2-sided filling
  • V21 = polishing
  • V60 = indirect pulp capping
  • V21 and V60 are not allowed on the same day
  • Claimed back (from all dentists): 1.3 million euro

32
3D isometrics, cost-based QM
33
(No Transcript)
34
Other target types: double binary target
  • Target 1: year (2009 or 2008)
  • Target 2: target practitioner
  • Pattern:
  • M59 = extensive (expensive) dental cleaning
  • C12 = second consult in one year
  • Cross table

35
Other target types: multiclass target
  • Subgroup (orange): group of patients
  • Target (red) is now a multi-value column, one
    value per dentist

36
Multiclass target, in ROC Space
37
Anomaly Detection
  • The example above contains a contextual anomaly...

38
Outline: Anomaly Detection
  • Anomalies
  • Definition
  • Types
  • Technique categories
  • Examples
  • Lecture based on
  • Chandola et al. (2009), Anomaly Detection: A
    Survey
  • Paper in BB

39
Definition
  • Anomaly detection refers to the problem of
    finding patterns in data that do not conform to
    expected behavior
  • Anomalies, aka.
  • Outliers
  • Discordant observations
  • Exceptions
  • Aberrations
  • Surprises
  • Peculiarities
  • Contaminants

40
Anomaly types
  • Point anomalies
  • A data point is anomalous with respect to the
    rest of the data

41
Not covered today
  • Other types of anomalies
  • Collective anomalies
  • Contextual anomalies
  • Other detection approaches
  • Supervised learning
  • Semi-supervised
  • Assume the training data is from the normal class
  • Use it to detect anomalies in the future

42
We focus on outlier scores
  • Scores
  • You get a ranked list of anomalies
  • We investigate the top 10
  • An anomaly has a score of at least 134
  • Leads followed by fraud investigators
  • Labels

43
Detection method categorisation
  1. Model based
  2. Depth based
  3. Distance Based
  4. Information theory related (not covered)
  5. Spectral theory related (not covered)

44
Model based
  • Build a (statistical) model of the data
  • Normal data instances occur in high-probability
    regions of a stochastic model, while anomalies
    occur in low-probability regions
  • Or: data instances that have a high distance to
    the model are outliers
  • Or: data instances that have a high influence on
    the model are outliers

45
Example: one-way outlier detection
  • Pharmacy records
  • Records represent patients
  • One attribute at a time
  • This example: an attribute describing the costs
    spent on fertility medication (gonadotropin) in a
    year
  • We could use such one-way detection for each
    attribute in the data (a sketch follows below)

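A minimal sketch of such a one-way score, here a robust z-score per attribute; the score choice, column name, and example values are assumptions, not from the slides.

import pandas as pd

def one_way_outlier_scores(costs: pd.Series) -> pd.Series:
    """Robust z-score of a single cost attribute: |x - median| / MAD."""
    median = costs.median()
    mad = (costs - median).abs().median()
    return (costs - median).abs() / (mad if mad > 0 else 1.0)  # avoid division by zero

# Hypothetical yearly cost profiles, one column per medication group
profiles = pd.DataFrame({"gonadotropin_cost": [0.0, 0.0, 12.5, 0.0, 4800.0, 0.0]})
scores = one_way_outlier_scores(profiles["gonadotropin_cost"])
print(scores.sort_values(ascending=False).head())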
46
Example: model = parametric probability density function
47
Example: model = non-parametric distribution
  • Left: kernel density estimate
  • Right: boxplot

48
Example: regression model
49
Other models possible
  • Probabilistic
  • Bayesian networks
  • Regression models
  • Regression trees/ random forests
  • Neural networks
  • Outlier score = prediction error (residual); see
    the sketch below

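A minimal sketch of the residual-as-outlier-score idea with an ordinary linear regression; the synthetic data and parameters are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=50.0, size=(500, 3))   # e.g. three other cost categories
y = X @ np.array([0.5, 0.2, 0.1]) + rng.normal(0.0, 10.0, size=500)
y[10] += 800.0                                        # inject one deviating patient

model = LinearRegression().fit(X, y)
residuals = np.abs(y - model.predict(X))              # outlier score = |residual|
print(np.argsort(residuals)[::-1][:5])                # indices of the top-5 outliers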
50
Depth based methods
  • Applied on 1-4 dimensional datasets
  • Or 1-4 attributes at a time
  • Objects that have a high distance to the center
    of the data are considered outliers
  • Example: pharmacy
  • Records represent patients
  • 2 attributes
  • Costs spent on diabetes medication
  • Costs spent on diabetes testing material

51
Example: bagplot (halfspace depth)
52
Distance based (nearest neighbor based)
  • Assumption
  • Normal data instances occur in dense
    neighbourhoods, while anomalies occur far from
    their closest neighbours

53
Similarity/distance
  • You need a similarity measure between two data
    points
  • Numeric attributes: Euclidean distance, etc.
  • Nominal attributes: simple matching is often enough
  • Multivariate:
  • Distance using all attributes
  • Distance between attribute values, then combine
    (see the sketch below)

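A minimal sketch of the "per-attribute distance, then combine" idea, mixing Euclidean distance on the numeric attributes with simple matching on the nominal ones; the attribute names, values, and the unweighted combination are illustrative assumptions.

import numpy as np

def mixed_distance(a: dict, b: dict, numeric_keys, nominal_keys) -> float:
    """Euclidean distance on the numeric part plus the number of nominal mismatches."""
    diffs = np.array([a[k] - b[k] for k in numeric_keys], dtype=float)
    mismatches = sum(a[k] != b[k] for k in nominal_keys)
    return float(np.linalg.norm(diffs)) + mismatches

p1 = {"dental_cost": 120.0, "gp_cost": 40.0, "gender": "F"}
p2 = {"dental_cost": 95.0,  "gp_cost": 60.0, "gender": "M"}
print(mixed_distance(p1, p2, ["dental_cost", "gp_cost"], ["gender"]))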
54
Example, dentistry data
  • Records represent dentists
  • Attributes are 14 cost categories
  • They denote the percentage of patients that
    received a claim from that category

55
Option 1: distance to the kth neighbour as anomaly
score (see the sketch below)
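A minimal sketch of option 1 using scikit-learn's nearest-neighbour search; the library choice, the value of k, and the synthetic dentist profiles are assumptions, not from the slides.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_outlier_scores(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Distance to the k-th nearest other point as anomaly score."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own nearest neighbour
    distances, _ = nn.kneighbors(X)
    return distances[:, -1]

# Hypothetical dentist profiles: rows = dentists, columns = 14 cost-category shares
rng = np.random.default_rng(1)
X = rng.dirichlet(np.ones(14), size=200)
scores = knn_outlier_scores(X, k=5)
print(np.argsort(scores)[::-1][:10])                 # the ten most isolated dentists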
56
Option 2: use relative densities of neighbourhoods
  • Density of neighbourhood estimated for each
    instance
  • Instances in the low density neighbourhoods are
    anomalous, others normal
  • Note
  • Distance to the kth neighbour is an estimate for
    the inverse of density (large distance → low density)
  • But this estimates outliers in varying-density
    neighbourhoods badly

57
LOF
  • Local Outlier Factor
  • Local density
  • k divided by the volume of the smallest
    hyper-sphere centred around the instance,
    containing k neighbours
  • Anomalous instance:
  • Local density will be lower than that of the k
    nearest neighbours (see the sketch below)

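A minimal sketch of LOF using scikit-learn; the slide describes the concept, not this library, and the data and parameters are illustrative.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
X[:5] += 6.0                                 # a few points far from the bulk

lof = LocalOutlierFactor(n_neighbors=20).fit(X)
scores = -lof.negative_outlier_factor_       # higher score = more anomalous
print(np.argsort(scores)[::-1][:5])          # the top-5 LOF outliers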
58
Example: LOF outlier, dentistry
59
3. Clustering-based anomaly detection techniques
  • 3 possibilities
  • 1. Normal data instances belong to a cluster in
    the data, while anomalies do not belong to any
    cluster
  • Use clustering methods that do not force all
    instances to belong to a cluster
  • DBSCAN, ROCK, SNN
  • 2. Distance to the cluster center = outlier score
    (see the sketch below)
  • 3. Clusters with too few points are outlying
    clusters

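A minimal sketch of possibility 2, using the distance of each point to its own k-means cluster centre as outlier score; the number of clusters and the synthetic profiles are assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = rng.dirichlet(np.ones(14), size=200)          # hypothetical dentist profiles

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
own_centre = kmeans.cluster_centers_[kmeans.labels_]
scores = np.linalg.norm(X - own_centre, axis=1)   # distance to own cluster centre
print(np.argsort(scores)[::-1][:5])               # the five least typical dentists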
60
K-means with 6 clusters, centers of the dentistry
data set
  • Attributes: percentage of patients that received
    a claim from each cost category
  • Clusters correspond to specialisms:
  • Dentist
  • Orthodontist
  • Orthodontist (charged by dentist)
  • Dentist
  • Dentist
  • Dental hygienist

61
Combining Subgroup Discovery and Outlier Detection
  • Describe regions with outliers using SD
  • Identify suspicious medical practitioners
  • 2- or 3-step approach to describe outliers (see
    the sketch after this list)
  • Calculate outlier score
  • Use subgroup discovery to describe regions with
    outliers.
  • (optional) identify the involved medical
    practitioners

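A minimal sketch of this 2/3-step idea; for illustration it uses LOF for step 1 and a shallow decision tree as a stand-in for a real subgroup discovery algorithm in step 2, and the data, threshold, and feature names are assumptions.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
X = rng.gamma(shape=2.0, scale=50.0, size=(1000, 4))   # hypothetical cost attributes
X[:10, 0] += 2000.0                                    # some deviating patients

# Step 1: outlier score per patient
scores = -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_
is_outlier = scores > np.quantile(scores, 0.99)        # label the top 1% as outliers

# Step 2: describe the regions that contain the outliers
tree = DecisionTreeClassifier(max_depth=2).fit(X, is_outlier)
print(export_text(tree, feature_names=[f"cost_{i}" for i in range(4)]))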
62
Example output
  • Look at patients with P30 > 1050 euro for
    practitioner number 221
  • Left: all data; right: practitioner 221

63
Descriptions of outliers: LOCI outlier score
  • 1. Calculate outlier score
  • LOCI is a density based outlier score
  • 2. Describe outlying regions
  • Result: top subgroup
  • Orthodontics (dentist) 0.044 Orthodontics
    0.78
  • Group of 9 dentists with an average score of 3.9

64
Conclusions
  • Health insurance: an interesting application domain
  • Very relevant
  • Outlier detection and subgroup discovery are
    useful