CS490D: Introduction to Data Mining Prof. Chris Clifton - PowerPoint PPT Presentation

1
CS490D: Introduction to Data Mining
Prof. Chris Clifton
  • April 14, 2004
  • Fraud and Misuse Detection

2
What is Fraud Detection?
  • Identify wrongful actions
    • Are right and wrong universal?
    • If so, why not just prevent wrong actions?
  • Identify actions by the wrong people
  • Identify suspect actions
    • Legal
    • But probably not right

3
In Data Mining terms
  • Classification?
    • Classify into fraudulent and non-fraudulent behavior
    • What do we need to do this?
  • Outlier Detection
    • Assume non-fraudulent behavior is normal
    • Find the exceptions
    • Problems?

4
Solution: Differential Profiling
  • Determine individual behavior
    • What is normal for the individual
    • What separates one individual from another
  • Gives a profile of individual behavior
  • How do we do this?

Classification Mining
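Differential profiling via classification can be illustrated with a minimal sketch. It assumes that each user's profile is a smoothed frequency distribution over the actions that user performs. The slides do not specify a profile representation, so this is an assumption. A new session is attributed to the user whose profile it fits best, and an alarm is raised when that user is not the one who claims to be logged in. The user names, commands, and the smoothing parameter `alpha` are all hypothetical.

```python
import math
from collections import Counter

def build_profile(actions, vocab, alpha=1.0):
    """Smoothed relative frequency of each action in `vocab` for one user."""
    counts = Counter(actions)
    total = len(actions) + alpha * len(vocab)
    return {a: (counts[a] + alpha) / total for a in vocab}

def log_likelihood(session, profile, floor=1e-6):
    """Higher means the session looks more like this profile's owner."""
    return sum(math.log(profile.get(a, floor)) for a in session)

def best_matching_user(session, profiles):
    """Classification step: attribute the session to the best-fitting profile."""
    return max(profiles, key=lambda user: log_likelihood(session, profiles[user]))

def is_suspicious(claimed_user, session, profiles):
    """Alarm when someone else's profile fits better than the claimed user's."""
    return best_matching_user(session, profiles) != claimed_user

# Hypothetical command histories for two users.
history = {
    "alice": ["ls", "cd", "ls", "vim", "make"],
    "bob": ["mail", "ls", "mail", "lpr"],
}
vocab = {a for actions in history.values() for a in actions}
profiles = {user: build_profile(actions, vocab) for user, actions in history.items()}

print(is_suspicious("alice", ["mail", "mail", "lpr"], profiles))  # True
print(is_suspicious("alice", ["ls", "vim", "make"], profiles))    # False
```

The profile captures both "what is normal for the individual" and, when compared against the other profiles, "what separates one individual from another."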
5
Has this been done? Intrusion Detection
(Lane & Brodley)
  • Profiled computer users based on command sequences
    • Command
    • Some (but not all) argument information
    • Sequence information
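Sequence-based profiling compares fixed-length windows of commands. The sketch below uses a simplified similarity in the spirit of Lane and Brodley's measure, not their exact method. Each matching position adds the length of the current run of adjacent matches, so a run of matches counts for more than scattered ones. A new window is scored against the closest window in the user's history. The window length, the history, and any alarm threshold on the score are assumptions.

```python
def run_similarity(x, y):
    """Compare two equal-length command windows position by position.

    Each match adds the length of the current run of adjacent matches;
    a mismatch resets the run (simplified, not Lane & Brodley's exact measure).
    """
    if len(x) != len(y):
        raise ValueError("windows must have the same length")
    score, run = 0, 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        score += run
    return score

def windows(seq, n):
    """All length-n sliding windows over a command history."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def best_match(window, profile):
    """Similarity of a new window to the closest window in the user's profile."""
    return max(run_similarity(window, w) for w in profile)

# Hypothetical history for one user, profiled as windows of 3 commands.
profile = windows(["ls", "cd", "ls", "vim", "make", "ls"], 3)
print(best_match(("ls", "vim", "make"), profile))  # 6, the maximum for n = 3
print(best_match(("ls", "vim", "lpr"), profile))   # 3
print(best_match(("rm", "scp", "su"), profile))    # 0, candidate for an alarm
```

For windows of length n, the maximum score is n(n+1)/2. Dividing by that value gives a score in [0, 1] if a fixed threshold is wanted.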

6
Results: Accuracy, Time to Alarm
7
Scaling Issues
  • What happens with millions of users?
    • Credit card
    • Cell phone
  • What about new users?
  • Ideas?

8
Multi-user profiles
  • Cluster users
  • Develop profiles for clusters
    • E.g., differential profiling
  • Old customers: Do they match the profile for their cluster?
    • Allows a wider range of acceptable behavior
  • New customers: Do they match any profile?
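Multi-user profiles can be sketched under a few assumptions that do not come from the slides. Customers are clustered with plain k-means. Each cluster's profile is its centroid plus an acceptable radius, taken here as a hypothetical `slack` factor times the distance of the cluster's farthest member. An old customer is checked against their own cluster. A new customer is accepted if they fit any cluster. The features, starting centroids, and slack value are hypothetical.

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def kmeans(points, centroids, iters=20):
    """Plain k-means (Lloyd's algorithm) from the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(v) / len(members) for v in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

def cluster_radii(points, centroids, labels, slack=1.5):
    """Acceptable distance per cluster: slack times its farthest member."""
    radii = [0.0] * len(centroids)
    for p, label in zip(points, labels):
        radii[label] = max(radii[label], math.dist(p, centroids[label]))
    return [slack * r for r in radii]

def matches_own_cluster(p, label, centroids, radii):
    """Old customer: is current behavior within their own cluster's profile?"""
    return math.dist(p, centroids[label]) <= radii[label]

def matches_any_cluster(p, centroids, radii):
    """New customer: does the behavior fit any cluster profile?"""
    return any(math.dist(p, c) <= r for c, r in zip(centroids, radii))

# Hypothetical features: (average purchase in $100s, purchases per week).
customers = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (8.0, 0.5), (8.5, 0.4), (7.5, 0.6)]
centroids, labels = kmeans(customers, [customers[0], customers[3]])
radii = cluster_radii(customers, centroids, labels)

print(labels)                                                # [0, 0, 0, 1, 1, 1]
print(matches_own_cluster((8.2, 0.5), 0, centroids, radii))  # False: behavior changed
print(matches_any_cluster((1.1, 2.1), centroids, radii))     # True: new customer fits
```

Under this radius rule, a cluster with a single member gets a radius of zero. A real system would need a floor on the radius.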

9
Data mining for detection and prevention
10
Data mining defined
  • The process of discovering meaningful new
    relationships, patterns and trends by sifting
    through data using pattern recognition
    technologies as well as statistical and
    mathematical techniques.
  • - The Gartner Group

11
Matching known fraud/non-compliance
  • Which new cases are similar to known cases?
  • How can we define similarity?
  • How can we rate or score similarity?
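One common answer to "how can we define and score similarity?" is nearest-neighbor matching against confirmed fraud cases. The sketch below makes several choices that are assumptions, not taken from the slides: Euclidean distance mapped into a score in (0, 1], averaging over the k closest known cases, and the feature values themselves. Features should be standardized first, because Euclidean distance is sensitive to scale.

```python
import math

def similarity(a, b):
    """Turn Euclidean distance into a score in (0, 1]; identical cases score 1."""
    return 1.0 / (1.0 + math.dist(a, b))

def known_fraud_score(case, known_fraud, k=3):
    """Average similarity to the k most similar known fraud cases."""
    sims = sorted((similarity(case, f) for f in known_fraud), reverse=True)[:k]
    return sum(sims) / len(sims)

# Hypothetical, already-standardized features of confirmed fraud cases.
known_fraud = [(2.0, 1.5), (2.2, 1.4), (1.9, 1.7)]
new_cases = {"claim_17": (2.1, 1.5), "claim_18": (-0.3, 0.1)}

ranked = sorted(new_cases, key=lambda c: known_fraud_score(new_cases[c], known_fraud),
                reverse=True)
print(ranked)  # ['claim_17', 'claim_18']
```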

12
Anomalies and irregularities
  • How can we detect anomalous or unusual behavior?
  • What do we mean by usual?
  • Can we rate or score cases on their degree of
    anomaly?
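A robust way to define "usual" and score the degree of anomaly is the modified z-score of Iglewicz and Hoaglin. It is swapped in here as an illustration; the slides do not name a method. "Usual" is taken to be the median, and spread is measured by the median absolute deviation (MAD), so a few extreme values do not distort the baseline. The cutoff of 3.5 is a widely used rule of thumb. The claim amounts are hypothetical.

```python
import statistics

def modified_z_scores(values):
    """Robust anomaly scores: 0.6745 * (x - median) / MAD (Iglewicz and Hoaglin)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; spread too small to score")
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical claim amounts for one procedure code.
claims = [120, 130, 125, 118, 122, 127, 940]
scores = modified_z_scores(claims)
flagged = [c for c, s in zip(claims, scores) if abs(s) > 3.5]
print(flagged)  # [940]
```

The score itself answers "can we rate cases on their degree of anomaly": cases can be ranked by |score| rather than only split by a cutoff.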

13
Data mining is not
  • Blind application of analysis/modeling
    algorithms
  • Brute-force crunching of bulk data
  • Black box technology
  • Magic

14
How do you mine data?
  • Use the Cross Industry Standard Process for Data
    Mining (CRISP-DM)
  • Based on real-world lessons
  • Focus on business issues
  • User-centric and interactive
  • Full process
  • Results are used

15
Techniques used to identify fraud
  • Predict and Classify
    • Regression algorithms (predict numeric outcome):
      neural networks, CART, Regression, GLM
    • Classification algorithms (predict symbolic
      outcome): CART, C5.0, logistic regression
  • Group and Find Associations
    • Clustering/Grouping algorithms: K-means,
      Kohonen, 2Step, Factor analysis
    • Association algorithms: apriori, GRI, Capri,
      Sequence

16
Techniques for finding fraud
  • Predict the expected value for a claim, compare
    that with the actual value of the claim.
  • Those cases that fall far outside the expected
    range should be evaluated more closely
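The predict-and-compare approach can be sketched with a single-predictor least-squares line. This stands in for whichever regression model (neural network, CART, GLM) would be used in practice. The expected claim value comes from the fitted line. A claim is flagged when its actual value lies more than k residual standard errors from the expected value. The predictor, the history, and the 3-sigma cutoff are hypothetical.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for expected = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def residual_sigma(xs, ys, a, b):
    """Residual standard error of the fitted line (n - 2 degrees of freedom)."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))

def outside_expected(x, actual, a, b, sigma, k=3.0):
    """True when the actual claim is more than k sigmas from its expected value."""
    return abs(actual - (a + b * x)) > k * sigma

# Hypothetical history: procedures per claim vs. billed amount in dollars.
procedures = [1, 2, 3, 4, 5]
amounts = [100, 210, 290, 405, 500]
a, b = fit_line(procedures, amounts)
sigma = residual_sigma(procedures, amounts, a, b)

print(outside_expected(3, 305, a, b, sigma))  # False: close to the expected 301
print(outside_expected(2, 900, a, b, sigma))  # True: far above the expected 201.5
```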

17
Techniques for finding fraud
Decision Trees and Rules
  • Build a profile of the characteristics of
    fraudulent behavior.
  • Pull out the cases that meet the historical
    characteristics of fraud.
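The simplest possible instance of a decision tree is a depth-one tree, or decision stump. It learns one rule of the form "feature > threshold means fraud" from labeled historical cases, then pulls out new cases that meet it. Tools such as CART or C5.0 grow and prune much deeper trees. This sketch only shows the idea of learning a fraud profile and matching against it. The features, data, and the single "greater than" rule direction are assumptions.

```python
def learn_stump(rows, labels):
    """Depth-one decision tree: the rule 'feature > threshold => fraud' that
    misclassifies the fewest historical cases (labels: 1 = fraud, 0 = not)."""
    best = None  # (errors, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            errors = sum((r[f] > t) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def matches_fraud_profile(case, feature, threshold):
    """Pull out a case if it meets the learned fraud characteristic."""
    return case[feature] > threshold

# Hypothetical labeled history: (claim amount in $1000s, claims in last 30 days).
FEATURES = ("claim_amount_k", "claims_last_30_days")
rows = [(1.0, 1), (1.5, 2), (2.0, 1), (9.0, 6), (8.0, 7), (1.2, 8)]
labels = [0, 0, 0, 1, 1, 1]

feature, threshold = learn_stump(rows, labels)
print(f"fraud if {FEATURES[feature]} > {threshold}")  # fraud if claims_last_30_days > 2
new_cases = [(3.0, 5), (2.5, 1)]
print([c for c in new_cases if matches_fraud_profile(c, feature, threshold)])  # [(3.0, 5)]
```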

18
Techniques for finding fraud
Clustering and Associations
  • Group behavior using a clustering algorithm
  • Find groups of events using the association
    algorithms
  • Identify outliers and investigate
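Finding groups of events with an association algorithm can be illustrated by the first two levels of apriori, one of the algorithms named earlier. Frequent single items are counted first, and only those items are combined into candidate pairs. Claims whose item combinations match no frequent pattern are then surfaced for investigation. The billing codes, `min_support`, and the outlier rule are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """First two levels of apriori: frequent items, then frequent item pairs.
    Only items that are frequent on their own can be part of a frequent pair."""
    item_counts = Counter(item for b in baskets for item in set(b))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter()
    for b in baskets:
        pair_counts.update(combinations(sorted(set(b) & frequent_items), 2))
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}

def unusual_baskets(baskets, freq_pairs):
    """Events with two or more items, none of whose item pairs is frequent."""
    return [
        b for b in baskets
        if len(set(b)) >= 2
        and not any(p in freq_pairs for p in combinations(sorted(set(b)), 2))
    ]

# Hypothetical billing codes per claim.
claims = [
    ["office_visit", "lab_test"],
    ["office_visit", "lab_test", "xray"],
    ["office_visit", "lab_test"],
    ["office_visit", "xray"],
    ["surgery", "lab_test"],
]
freq = frequent_pairs(claims, min_support=3)
print(freq)                           # {('lab_test', 'office_visit'): 3}
print(unusual_baskets(claims, freq))  # the last two claims
```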

19
Fraud detection using CRISP-DM
  • Provides a systematic way to detect fraud and
    abuse
  • Ensures auditing and investigative efforts are
    maximized
  • Continually assesses and updates models to
    identify new emerging fraud patterns
  • Leads to higher recoupments

20
Data mining in action: Fraud, waste and abuse
case studies
21
How can data mining help?
  • Payment error prevention
  • Billing and payment fraud
  • Audit selection

22
Payment Error Prevention
The US Health Care Financing Administration needed
to isolate the likely causes of payment error by
developing a profile of acceptable billing
practices and...
used this information to focus their auditing
effort
23
Payment error prevention solution
  • Clementine
  • Using audited discharge records, built profiles
    of appropriate decisions such as diagnosis coding
    and admission
  • Matched new cases
  • Cases not matching are audited

24
Payment error prevention results
  • Detected 50% of past incorrect payments,
    resulting in significant recovery of funding lost
    to payment errors
  • PRO analysts able to use resultant Clementine
    models to prevent future error

25
Billing and payment fraud
The US Defense Finance and Accounting Service
needed to find fraud in millions of Dept of
Defense transactions and...
Identified suspicious cases to focus
investigations
26
Billing and payment fraud solution
  • Clementine
  • Detection models based on known fraud patterns
  • Analyzed all transactions, scoring each on
    similarity to these known patterns
  • High scoring transactions flagged for
    investigation

27
Billing and payment fraud results
  • Identified over 1,200 payments for further
    investigation
  • Integrated the detection process
  • Anomaly detection methods (e.g., clustering) will
    serve as sentinel systems for previously
    undetected fraud patterns

28
Audit selection
The Washington State Department of Revenue needed
to detect erroneous tax returns and...
Focused audit investigations on cases with the
highest likely adjustments
29
Audit selection solution
  • Clementine
  • Using previously audited returns
  • Model adjustment (recovery) per auditor hour
    based on return information
  • Models will then score future returns showing
    highest potential adjustment
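The slides do not describe the form of the audit-selection model. The sketch below substitutes the simplest stand-in: the average adjustment (recovery) per auditor hour for each return category, learned from previously audited returns. Future returns are then ranked so that auditors start with the highest expected yield. The categories, amounts, hours, and the default score for unseen categories are hypothetical.

```python
from collections import defaultdict

def fit_yield_per_hour(audited):
    """Average adjustment (recovery) per auditor hour for each return category.

    audited: (category, adjustment_dollars, auditor_hours) for past audits.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for category, adjustment, hours in audited:
        totals[category][0] += adjustment
        totals[category][1] += hours
    return {c: adj / hrs for c, (adj, hrs) in totals.items()}

def rank_returns(returns, model, default=0.0):
    """Order future returns by expected adjustment per auditor hour, highest first.

    returns: (return_id, category); unseen categories get `default`.
    """
    return sorted(returns, key=lambda r: model.get(r[1], default), reverse=True)

# Hypothetical audit history and incoming returns.
audited = [("retail", 4000, 10), ("retail", 2000, 10),
           ("construction", 9000, 15), ("services", 500, 5)]
model = fit_yield_per_hour(audited)
incoming = [("R1", "services"), ("R2", "construction"), ("R3", "retail"), ("R4", "farming")]
print([rid for rid, _ in rank_returns(incoming, model)])  # ['R2', 'R3', 'R1', 'R4']
```

Scoring per auditor hour, rather than by raw adjustment, reflects the slide's goal of using scarce auditor time where it is expected to recover the most.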

30
Audit selection results
  • Maximizes auditors' time by focusing on cases
    likely to yield the highest return
  • Closes the tax gap

31
Data mining - key to detecting and preventing
fraud, waste and abuse
  • Learn from the past
    • High-quality, evidence-based decisions
  • Predict
    • Prevent future instances
  • React to changing circumstances
    • Models kept current, from latest data