Title: Detect and prevent fraud, waste and abuse with data mining
1. Detect and prevent fraud, waste and abuse with data mining
2. Today's agenda
- Examine the impact of fraud, waste and abuse
- Data mining for detection and prevention
- Data mining in action: case studies
3. For the purpose of today's seminar
- Fraud, waste and abuse includes:
  - Illegal practices
  - Waste
  - Payment error
  - Non-compliance
  - Incorrect billing practices
4. The impact of fraud, waste and abuse
5. The impact of fraud
- GAO cited $19.1 billion in improper government payments in 17 major programs for fiscal year 1998 (GAO, "Financial Management: Increased Attention Needed to Prevent Billions in Improper Payments," October 1999):
  - Medicare: $12.6 billion
  - Supplemental Security Income: $1.6 billion
  - The Food Stamp Program: $1.4 billion
  - Old-Age and Survivors Insurance: $1.2 billion
  - Disability Insurance: $941 million
  - Housing subsidies: $847 million
  - Veterans benefits, unemployment insurance and others: $514 million
6. The impact of fraud
- An IRS audit of returns claiming the EIC (Earned Income Credit) for tax year 1994 found $4.4 billion in overpayments out of $17.2 billion in total claims. A follow-up study by the IRS determined that, even after the implementation of compliance reforms, the error rate was still at least 20 percent of all EIC claims filed. (GAO, "Major Management Challenges and Program Risks: Department of the Treasury," January 1999)
- From 1994 through 1998, defense contractors returned a total of roughly $4.6 billion in overpayments. (GAO, "DOD Contract Management: Greater Attention Needed to Identify and Recover Overpayments," July 1999)
- Waste, fraud, and abuse in the Federal Employees Health Benefits Program are estimated to consume as much as $1.8 billion a year. (Office of Personnel Management IG, "Most Serious Management Problems: Office of Personnel Management," 1 December 1999)
7. A pervasive problem...
- Medicare, Medicaid and other essential public health programs
- Entitlement and subsidy payments
- Tax collection
- Procurement
- Food stamps
- Child welfare benefits
- Workers' compensation
- Unemployment benefits
- Any benefit payment
8. And it is set to continue...
- "Total public sector transaction volumes now exceed $2 trillion." (Forrester Research)
9. The impact if fraud goes undetected
- Much-needed programs and services are underfunded
- Billions of dollars are lost
- Countless man-hours are spent on investigative and auditing efforts that yield little in recoupments
10. Data mining for detection and prevention
11. Data mining defined
- "The process of discovering meaningful new relationships, patterns and trends by sifting through data using pattern recognition technologies as well as statistical and mathematical techniques." (The Gartner Group)
12. Matching known fraud/non-compliance
- Which new cases are similar to known cases?
- How can we define similarity?
- How can we rate or score similarity?
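One way to make "similarity" concrete is a distance-based score. The sketch below is only an illustration, not the method used in the case studies: it assumes each case has already been turned into a numeric feature vector scaled to comparable ranges, and the feature names and values are hypothetical.

```python
import math

def similarity(a, b):
    """Map Euclidean distance to a score in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + math.dist(a, b))

def fraud_score(case, known_fraud_cases):
    """Score a case by its similarity to the closest known fraud case."""
    return max(similarity(case, known) for known in known_fraud_cases)

# Hypothetical, pre-scaled features: (claim amount, claims per month)
known_fraud = [(0.9, 0.8), (0.95, 0.7)]
new_cases = {"A": (0.9, 0.8), "B": (0.1, 0.2), "C": (0.85, 0.75)}

# Rank new cases from most to least similar to known fraud
ranked = sorted(
    new_cases,
    key=lambda name: fraud_score(new_cases[name], known_fraud),
    reverse=True,
)
```

The choice of distance measure and scaling is itself a modeling decision. A different definition of similarity may rank the same cases differently.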
13. Anomalies and irregularities
- How can we detect anomalous or unusual behavior?
- What do we mean by "usual"?
- Can we rate or score cases on their degree of
anomaly?
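As a rough illustration of scoring the degree of anomaly, the sketch below treats the median as "usual" and measures how far each value sits from it in units of the median absolute deviation (a robust z-score). The billed amounts and the 3.5 cutoff are assumptions chosen for the example, not figures from the presentation.

```python
import statistics

def anomaly_scores(values):
    """Robust z-score: distance from the median in units of MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # All values are (almost) identical, so nothing stands out
        return [0.0 for _ in values]
    return [abs(v - med) / mad for v in values]

# Hypothetical billed amounts for one procedure code
billed = [100, 105, 98, 102, 97, 101, 480]

scores = anomaly_scores(billed)
# Indexes of cases whose score exceeds an assumed cutoff of 3.5
flagged = [i for i, s in enumerate(scores) if s > 3.5]
```

Using the median rather than the mean keeps a single extreme case from distorting the definition of "usual".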
14. Data mining is not
- Blind application of analysis/modeling algorithms
- Brute-force crunching of bulk data
- Black box technology
- Magic
15. How do you mine data?
- Use the Cross Industry Standard Process for Data Mining (CRISP-DM)
  - Based on real-world lessons
  - Focus on business issues
  - User-centric, interactive
  - Full process
  - Results are used
16. Techniques used to identify fraud
- Predict and classify
  - Regression algorithms (predict numeric outcome): neural networks, CRT, regression, GLM
  - Classification algorithms (predict symbolic outcome): CRT, C5.0, logistic regression
- Group and find associations
  - Clustering/grouping algorithms: K-means, Kohonen, 2Step, factor analysis
  - Association algorithms: Apriori, GRI, Capri, Sequence
17. Techniques for finding fraud
- Predict the expected value for a claim and compare it with the actual value of the claim.
- Cases that fall far outside the expected range should be evaluated more closely.
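The expected-versus-actual comparison can be sketched with a one-variable least-squares fit. This is a simplified stand-in for the regression techniques listed earlier: the single predictor (length of stay), the historical amounts, and the three-standard-deviation cutoff are all assumptions made for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical history of legitimate claims: length of stay (days), billed amount
history_days = [1, 2, 3, 4, 5, 6]
history_amount = [1100, 1900, 3050, 4000, 4950, 6100]

slope, intercept = fit_line(history_days, history_amount)

# Spread of the historical claims around their expected values
residuals = [y - (slope * x + intercept)
             for x, y in zip(history_days, history_amount)]
spread = statistics.pstdev(residuals)

def is_suspicious(days, amount, k=3.0):
    """Flag a claim whose amount is more than k spreads from expectation."""
    expected = slope * days + intercept
    return abs(amount - expected) > k * spread
```

For example, `is_suspicious(3, 9000)` flags a three-day stay billed far above the fitted line, while `is_suspicious(3, 3100)` does not.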
18. Techniques for finding fraud
Decision Trees and Rules
- Build a profile of the characteristics of fraudulent behavior.
- Pull out the cases that meet the historical characteristics of fraud.
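A decision tree is built from many splits of the form "field >= threshold". The sketch below learns just one such split from labeled history, then pulls out new cases that meet it. It is a deliberately minimal stand-in for tree algorithms such as CRT or C5.0, and the field (claims per month) and all values are hypothetical.

```python
def learn_threshold_rule(values, labels):
    """Find threshold t with the fewest errors for 'value >= t means fraud'."""
    best_t, best_errors = None, len(values) + 1
    for t in sorted(set(values)):
        # Count cases where the rule's prediction disagrees with the label
        errors = sum((v >= t) != is_fraud for v, is_fraud in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Hypothetical audited history: claims per month and whether fraud was confirmed
claims_per_month = [2, 3, 4, 5, 12, 15, 18, 20]
confirmed_fraud = [False, False, False, False, True, True, False, True]

threshold = learn_threshold_rule(claims_per_month, confirmed_fraud)

# Pull out new cases that match the learned profile
new_cases = {"P1": 3, "P2": 14, "P3": 25}
flagged = [name for name, v in new_cases.items() if v >= threshold]
```

The learned rule is readable ("12 or more claims per month"), which lets investigators check it against their own experience.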
19. Techniques for finding fraud
Clustering and Associations
- Group behavior using a clustering algorithm
- Find groups of events using the association algorithms
- Identify outliers and investigate
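The clustering step can be sketched with a plain k-means loop followed by a distance check. This is an illustrative toy, not a production implementation: the two-dimensional points, the starting centroids, and the "three times the typical distance" outlier rule are all assumptions.

```python
import math
import statistics

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) from caller-supplied start centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty)
        centroids = [
            tuple(statistics.fmean(dim) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Hypothetical behavior profiles: two normal groups plus one odd case
points = [
    (1, 1), (1.2, 0.9), (0.8, 1.1), (1, 1.2), (0.9, 0.8),
    (5, 5), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1), (4.8, 4.8),
    (9, 1),
]
centroids = kmeans(points, centroids=[(0, 0), (6, 6)])

def distance_to_nearest(p):
    return min(math.dist(p, c) for c in centroids)

# Cases far from every group, relative to the typical distance, are outliers
typical = statistics.median(distance_to_nearest(p) for p in points)
outliers = [p for p in points if distance_to_nearest(p) > 3 * typical]
```

Here the case at (9, 1) belongs to neither group and is the only one flagged for investigation.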
20. Fraud detection using CRISP-DM
- Provides a systematic way to detect fraud and abuse
- Ensures auditing and investigative efforts are maximized
- Continually assesses and updates models to identify new and emerging fraud patterns
- Leads to higher recoupments
21. Data mining in action: fraud, waste and abuse case studies
22. How can data mining help?
- Payment error prevention
- Billing and payment fraud
- Audit selection
23. Payment error prevention
The US Health Care Financing Administration needed
to isolate the likely causes of payment error by
developing a profile of acceptable billing
practices and...
used this information to focus their auditing
effort
24. Payment error prevention: solution
- Clementine
- Using audited discharge records, built profiles of appropriate decisions such as diagnosis coding and admission
- Matched new cases against these profiles
- Cases not matching are audited
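The profile-and-match idea can be illustrated with a simple lookup. This is a sketch under stated assumptions, not a description of the Clementine models: the record layout, the diagnosis codes, and the idea of reducing a profile to a set of accepted (diagnosis, admission) pairs are all hypothetical.

```python
from collections import Counter

# Hypothetical audited discharge records:
# (diagnosis code, admission decision, audit verdict)
audited = [
    ("D10", "inpatient", "appropriate"),
    ("D10", "inpatient", "appropriate"),
    ("D10", "outpatient", "appropriate"),
    ("D22", "outpatient", "appropriate"),
    ("D22", "inpatient", "error"),
    ("D22", "outpatient", "appropriate"),
]

def build_profile(records, min_count=1):
    """Collect the combinations that auditors judged appropriate."""
    counts = Counter(
        (dx, adm) for dx, adm, verdict in records if verdict == "appropriate"
    )
    return {pair for pair, n in counts.items() if n >= min_count}

profile = build_profile(audited)

# New cases: (case id, diagnosis code, admission decision)
new_cases = [
    ("c1", "D10", "inpatient"),
    ("c2", "D22", "inpatient"),
    ("c3", "D35", "inpatient"),
]

# Anything that does not match the profile goes to audit
to_audit = [cid for cid, dx, adm in new_cases if (dx, adm) not in profile]
```

Case c2 matches a combination previously found in error, and c3 has a diagnosis never seen in the audited history, so both are routed to audit.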
25. Payment error prevention: results
- Detected 50% of past incorrect payments
resulting in significant recovery of funding lost
to payment errors
- PRO analysts able to use resultant Clementine
models to prevent future error
26. Billing and payment fraud
The US Defense Finance and Accounting Service
needed to find fraud in millions of Department of
Defense transactions and...
Identified suspicious cases to focus
investigations
27. Billing and payment fraud: solution
- Clementine
- Detection models based on known fraud patterns
- Analyzed all transactions, scored based on similarity to these known patterns
- High-scoring transactions flagged for investigation
28. Billing and payment fraud: results
- Identified over 1,200 payments for further investigation
- Integrated the detection process
- Anomaly detection methods (e.g., clustering) will
serve as sentinel systems for previously
undetected fraud patterns
29. Audit selection
The Washington State Department of Revenue needed
to detect erroneous tax returns and...
Focused audit investigations on cases with the
highest likely adjustments
30. Audit selection: solution
- Clementine
- Using previously audited returns
- Model adjustment (recovery) per auditor hour based on return information
- Models will then score future returns showing highest potential adjustment
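Scoring returns by expected recovery per auditor hour can be sketched as follows. A real model would draw on many fields from the return. Here a per-segment average stands in for it, and the segments, amounts, and hours are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical audited returns: (industry segment, adjustment recovered, auditor hours)
audited = [
    ("retail", 4000, 20),
    ("retail", 6000, 30),
    ("construction", 12000, 24),
    ("construction", 9000, 18),
    ("services", 1000, 10),
]

def yield_per_hour_by_segment(records):
    """Average adjustment recovered per auditor hour, by segment."""
    rates = defaultdict(list)
    for segment, adjustment, hours in records:
        rates[segment].append(adjustment / hours)
    return {segment: statistics.fmean(r) for segment, r in rates.items()}

model = yield_per_hour_by_segment(audited)

# Score future returns and audit the highest expected yield first;
# segments never audited before default to a score of 0.0
new_returns = {"R1": "services", "R2": "construction", "R3": "retail"}
audit_order = sorted(
    new_returns,
    key=lambda r: model.get(new_returns[r], 0.0),
    reverse=True,
)
```

Modeling recovery per hour, rather than recovery alone, favors audits that are both productive and quick to complete.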
31. Audit selection: results
- Maximizes auditors' time by focusing on cases likely to yield the highest return
- Closes the tax gap
32. Data mining - key to detecting and preventing fraud, waste and abuse
- Learn from the past
  - High-quality, evidence-based decisions
- Predict
  - Prevent future instances
- React to changing circumstances
  - Models kept current, from latest data
33. Announcing
http://www.spss.com/dataminingsummit/
34. Questions?
SPSS Sales: 800-543-2185 or sales@spss.com