Making Data Mining Models Useful to Model Non-paying Customers of Exchange Carriers


Transcript and Presenter's Notes



1
Making Data Mining Models Useful to Model
Non-paying Customers of Exchange Carriers
  • Wei Fan, IBM T.J.Watson
  • Janek Mathuria, and Chang-tien Lu
  • Virginia Tech, Northern Virginia Center

2
Our Selling Points
  • A real practical problem for an actual CLEC
    company.
  • A complete, end-to-end process:
  • Started with an ambitious goal.
  • Reality taught us a lesson.
  • Settled on a realistic solution.
  • A new set of algorithms to calibrate probability
    outputs (as distinguished from Zadrozny and
    Elkan's calibration methods).

3
Challenging Problem
  • Differentiate between Late and Default customers.
  • Late: one month past due.
  • Default: two months past due (see the labeling
    sketch after this list).
  • Default percentage: 20%.
  • Designed feature set (details in the paper):
  • Calling summary.
  • Billing summary.
  • The obvious ones.
  • Other ones out there? Maybe.
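
A minimal sketch of the labeling rule above, assuming a hypothetical months_past_due field; the paper's actual labeling procedure may differ.

# Hypothetical labeling rule for the payment-status classes above.
# The field name months_past_due is an assumption for illustration.
def label_customer(months_past_due: int) -> str:
    if months_past_due >= 2:
        return "default"   # two or more months past due
    if months_past_due == 1:
        return "late"      # one month past due
    return "on_time"

# Example: label_customer(1) -> "late", label_customer(3) -> "default"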

4
Failure
  • Failure of commonly used methods:
  • Nearly every customer is predicted as paying on
    time, yet accuracy is still about 80% (since only
    20% of customers default).
  • What this means:
  • Our feature set is not complete? Probably.
  • The problem itself is stochastic in nature.
  • Natural next step: cost-sensitive learning?
  • Costs are impossible to define precisely due to
    complexity.

5
A Compromised Solution
  • Predict a reliable probability score.
  • A customer is uniquely distinguished by its
    feature vector.
  • If the model predicts that a customer has a 20%
    chance to default,
  • and indeed the customer has a 20% chance to
    default,
  • then the predicted score is considered reliable
    (a reliability check is sketched below).
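
A minimal sketch of what "reliable" means here, assuming held-out predicted scores and 0/1 default labels (variable names are illustrative): bin the predicted probabilities and compare the mean predicted score with the observed default rate in each bin.

import numpy as np

def reliability_table(scores, defaulted, n_bins=10):
    # Among customers scored near p, roughly a fraction p should default.
    scores = np.asarray(scores, dtype=float)
    defaulted = np.asarray(defaulted, dtype=int)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (scores >= lo) & (scores <= hi) if hi >= 1.0 else (scores >= lo) & (scores < hi)
        if in_bin.any():
            rows.append((lo, hi, scores[in_bin].mean(),
                         defaulted[in_bin].mean(), int(in_bin.sum())))
    return rows  # (bin_low, bin_high, mean_predicted, observed_rate, count)

# A predicted score of 0.2 is reliable if the observed rate in its bin is close to 0.2.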

6
Previously Proposed Calibration Methods
  • Existing approaches output scores that are not
    reliable (Zadrozny and Elkan):
  • Decision trees.
  • Naïve Bayes.
  • SVM.
  • Logistic regression.
  • Use a function mapping to calibrate unreliable
    scores into reliable ones.
  • Assumption: the original unreliable scores must be
    monotonic in the true probability.
  • Otherwise, the mapping is not applicable (a
    binning-style calibration is sketched below).
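
A generic sketch of mapping-based calibration in the same spirit: learn a mapping from raw scores to empirical class frequencies on held-out data, then apply it to new scores. This is simple histogram binning for illustration, not the exact procedure of Zadrozny and Elkan or of this paper, and like any monotonic mapping it only helps when the raw scores are already monotonically related to the true probability.

import numpy as np

def fit_binning_calibrator(raw_scores, labels, n_bins=10):
    # Quantile bins over raw scores; each bin stores its empirical positive rate.
    raw_scores = np.asarray(raw_scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    edges = np.quantile(raw_scores, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover unseen score ranges
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (raw_scores >= lo) & (raw_scores < hi)
        rates.append(labels[in_bin].mean() if in_bin.any() else 0.0)
    return edges, np.array(rates)

def calibrate(raw_scores, edges, rates):
    # Map each raw score to the empirical positive rate of its bin.
    idx = np.searchsorted(edges, raw_scores, side="right") - 1
    return rates[np.clip(idx, 0, len(rates) - 1)]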

7
A Good Calibration
8
A Bad Calibration
9
Random Decision Trees
  • Amazingly simple and counter-intuitive:
  • Do not use any purity check function.
  • Pick a feature randomly.
  • For a continuous feature, pick a random splitting
    point.
  • A discrete feature can be picked only once on one
    decision path.
  • A continuous feature can be picked multiple times.
  • Tree depth is at most the number of features.
  • Original feature set, no bootstrap!
  • Each tree computes a probability at its leaf nodes:
  • with 10 fraud and 90 normal transactions at a
    leaf, P(fraud|x) = 0.1.
  • Multiple trees (10 at minimum, 30 is enough);
    average their probabilities (see the sketch
    below).
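
A compact sketch of tree construction as described above: random feature, random split point, original data with no bootstrap, depth up to the number of features, class frequency at the leaf, and an average over 10 to 30 trees. For brevity it treats every feature as continuous and assumes 0/1 labels, so the once-per-path rule for discrete features is omitted.

import random

def build_tree(rows, labels, n_features, depth):
    # Stop when depth is exhausted or the node is pure.
    if depth == 0 or len(set(labels)) <= 1:
        return ("leaf", sum(labels) / len(labels) if labels else 0.5)
    f = random.randrange(n_features)                   # random feature, no purity check
    values = [r[f] for r in rows]
    split = random.uniform(min(values), max(values))   # random splitting point
    left = [i for i, r in enumerate(rows) if r[f] <= split]
    right = [i for i, r in enumerate(rows) if r[f] > split]
    if not left or not right:                          # degenerate split, make a leaf
        return ("leaf", sum(labels) / len(labels))
    return ("node", f, split,
            build_tree([rows[i] for i in left], [labels[i] for i in left], n_features, depth - 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right], n_features, depth - 1))

def tree_prob(tree, x):
    while tree[0] == "node":
        _, f, split, left, right = tree
        tree = left if x[f] <= split else right
    return tree[1]

def train_rdt(rows, labels, n_trees=30):
    # Every tree uses the original data; depth bounded by the number of features.
    n_features = len(rows[0])
    return [build_tree(rows, labels, n_features, n_features) for _ in range(n_trees)]

def rdt_predict(trees, x):
    # Average the leaf probabilities over all trees.
    return sum(tree_prob(t, x) for t in trees) / len(trees)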

10
Random Forest
  • A marriage between Random Decision Trees and
    Random Forests:
  • Pick a feature subset randomly.
  • Compute the information gain for each feature in
    the subset; choose the one with the highest gain.
  • Original dataset, no bootstrap.
  • Leaf nodes compute probabilities.
  • 10 to 30 trees (split selection is sketched
    below).
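
A sketch of the split selection for this variant: draw a random feature subset and keep the feature with the highest information gain; this would replace the purely random feature choice in the sketch above. The entropy and gain formulas are standard; splitting each candidate feature at its mean value is an assumption for brevity.

import math, random

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_feature_by_gain(rows, labels, n_features, subset_size):
    subset = random.sample(range(n_features), subset_size)   # random feature subset
    base = entropy(labels)
    best = None
    for f in subset:
        split = sum(r[f] for r in rows) / len(rows)           # e.g. split at the mean
        left = [labels[i] for i, r in enumerate(rows) if r[f] <= split]
        right = [labels[i] for i, r in enumerate(rows) if r[f] > split]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if best is None or gain > best[0]:
            best = (gain, f, split)
    return best  # (information_gain, feature_index, split_value)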

11
Availability.
  • Software available upon request.