Title: A Fully Distributed Framework for Cost-sensitive Data Mining
1 A Fully Distributed Framework for Cost-sensitive Data Mining
Wei Fan, Haixun Wang, and Philip S. Yu, IBM T.J. Watson, Hawthorne, New York
Salvatore J. Stolfo, Columbia University, New York City, New York
2 Inductive Learning
- Training: Training Data -> Learner -> Classifier
- Learners: 1. Decision trees 2. Rules 3. Naive Bayes ...
- Training data (transactions labeled fraud or nonfraud): (43.45,retail,10025,10040,...,nonfraud) (246,70,weapon,10001,94583,...,fraud)
- Prediction: Test Data -> Classifier -> Class Labels
- Test data: (99.99,pharmacy,10013,10027,...,?) (1.00,gas,10040,00234,...,?)
- Class labels: nonfraud, fraud
4 Distributed Data Mining
- Data is inherently distributed across the network.
- Many credit card authorization servers are distributed. Data are collected at each individual site.
- Other examples include supermarket customer and transaction databases, hotel reservations, travel agencies, and so on.
- In some situations, data cannot even be shared.
- Many different banks have their own data servers. They would rather share the model, but cannot share the data for privacy, legal, and competitive reasons.
5 Cost-sensitive Problems
- Charity donation
- Solicit people who will donate a large amount to charity.
- It costs $0.68 to send a letter.
- A(x): donation amount.
- Only solicit if A(x) > $0.68; otherwise we lose money.
- Credit card fraud detection
- Detect frauds with a high transaction amount.
- It costs $90 to challenge a potential fraud.
- A(x): fraudulent transaction amount.
- Only challenge if A(x) > $90; otherwise we lose money.
6 Different Learning Frameworks
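7 Fully Distributed Framework (training)
[Figure: K sites hold local datasets D1, D2, ..., Dk. A learner at each site (ML1, ML2, ..., MLk) trains on its local data, generating K models C1, C2, ..., Ck.]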
8 Fully-distributed Framework (predicting)
[Figure: each example in the test set D is sent to the k models C1, C2, ..., Ck, which compute k predictions P1, P2, ..., Pk. The k predictions are combined into one prediction P.]
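The training and predicting flow on the two framework slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `train_prior_model` is a hypothetical stand-in learner that ignores the features and returns each site's class frequencies, and the single-feature example encoding is invented for the demo. A real deployment would plug in a decision tree, rule, or naive Bayes learner at each site.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[float, ...]        # feature vector (hypothetical encoding)
Probs = Dict[str, float]           # class label -> estimated probability
Model = Callable[[Example], Probs]
Labeled = Tuple[Example, str]


def train_prior_model(site_data: Sequence[Labeled]) -> Model:
    """Toy stand-in learner: ignores features, returns the site's class frequencies."""
    counts = Counter(label for _, label in site_data)
    total = sum(counts.values())
    probs = {label: c / total for label, c in counts.items()}
    return lambda x: dict(probs)


def train_all_sites(sites: Sequence[Sequence[Labeled]]) -> List[Model]:
    """Training phase: each of the K sites learns from its own local data only."""
    return [train_prior_model(data) for data in sites]


def predict_all(models: Sequence[Model], x: Example) -> List[Probs]:
    """Predicting phase: send x to all K models and collect the K outputs."""
    return [model(x) for model in models]


# Two sites with made-up local data; no raw data leaves a site.
sites = [
    [((43.45,), "nonfraud"), ((246.0,), "fraud")],
    [((99.99,), "nonfraud"), ((1.0,), "nonfraud")],
]
models = train_all_sites(sites)
outputs = predict_all(models, (50.0,))
```

Only the models and their outputs cross site boundaries here; how the K outputs are merged into one prediction is left to the combining step.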
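9 Cost-sensitive Decision Making
- Assume that \(b[\ell_i, \ell_j]\) records the benefit received by predicting an example of class \(\ell_i\) to be an instance of class \(\ell_j\).
- The expected benefit received by predicting an example \(x\) to be an instance of class \(\ell_i\) (regardless of its true label) is \(e(\ell_i \mid x) = \sum_j b[\ell_j, \ell_i] \, p(\ell_j \mid x)\).
- The optimal decision-making policy chooses the label that maximizes the expected benefit, i.e., \(\ell_{\max} = \arg\max_{\ell_i} e(\ell_i \mid x)\).
- When \(b[\ell_i, \ell_i] = 1\) and \(b[\ell_i, \ell_j] = 0\) for \(i \neq j\), this is a traditional accuracy-based problem.
- Total benefits: \(\sum_x b[\ell(x), \ell_{\max}(x)]\), where \(\ell(x)\) is the true label of \(x\).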
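The decision policy on this slide can be sketched as follows. The dictionary layout of the benefit matrix, the function names, and the numbers in the example (a $500 transaction with a $90 challenge cost) are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, Sequence, Tuple

# benefit[(true_label, predicted_label)] -> benefit received (assumed layout)
Benefit = Dict[Tuple[str, str], float]


def expected_benefit(pred: str, probs: Dict[str, float], benefit: Benefit) -> float:
    """Expected benefit of predicting `pred`, summed over the possible true labels."""
    return sum(benefit[(true, pred)] * p for true, p in probs.items())


def best_label(probs: Dict[str, float], benefit: Benefit) -> str:
    """Choose the label that maximizes the expected benefit."""
    return max(probs, key=lambda label: expected_benefit(label, probs, benefit))


def total_benefit(true_labels: Sequence[str], predicted: Sequence[str],
                  benefit: Benefit) -> float:
    """Sum of the benefits actually received over a set of examples."""
    return sum(benefit[(t, p)] for t, p in zip(true_labels, predicted))


probs = {"fraud": 0.3, "nonfraud": 0.7}

# Accuracy-based special case: benefit 1 for a correct label, 0 otherwise.
accuracy = {("fraud", "fraud"): 1.0, ("nonfraud", "nonfraud"): 1.0,
            ("fraud", "nonfraud"): 0.0, ("nonfraud", "fraud"): 0.0}

# Hypothetical cost-sensitive matrix for a $500 transaction and a $90 challenge cost.
fraud_500 = {("fraud", "fraud"): 500.0 - 90.0, ("nonfraud", "fraud"): -90.0,
             ("fraud", "nonfraud"): 0.0, ("nonfraud", "nonfraud"): 0.0}

accuracy_choice = best_label(probs, accuracy)
benefit_choice = best_label(probs, fraud_500)
```

With the same probabilities, the accuracy-based matrix picks the more likely label, while the cost-sensitive matrix picks "fraud" because the large transaction amount outweighs the challenge cost.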
10 Charity Donation Example
- It costs $0.68 to send a solicitation.
- Assume that \(y(x)\) is the best estimate of the donation amount.
- The cost-sensitive decision-making policy will solicit an individual if and only if \(p(\text{donate} \mid x) \cdot y(x) > \$0.68\).
11 Credit Card Fraud Detection Example
- It costs $90 to challenge a potential fraud.
- Assume that \(y(x)\) is the transaction amount.
- The cost-sensitive decision-making policy will predict a transaction to be fraudulent if and only if \(p(\text{fraud} \mid x) \cdot y(x) > \$90\).
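The donation and fraud rules can be sketched as two threshold checks. The transcript does not preserve the slides' formulas, so the product form used here (estimated probability times amount, compared with the fixed cost) is an assumption: it is what the expected-benefit policy gives when acting costs a fixed amount, a correct action recovers \(y(x)\), and not acting yields zero. The function and constant names are illustrative.

```python
SOLICITATION_COST = 0.68  # dollars per letter, from the donation example
CHALLENGE_COST = 90.0     # dollars per challenge, from the fraud example


def should_solicit(p_donate: float, est_amount: float) -> bool:
    """Solicit only if the expected donation exceeds the cost of the letter (assumed rule)."""
    return p_donate * est_amount > SOLICITATION_COST


def should_challenge(p_fraud: float, amount: float) -> bool:
    """Challenge only if the expected recovered amount exceeds the challenge cost (assumed rule)."""
    return p_fraud * amount > CHALLENGE_COST


solicit_big = should_solicit(0.05, 20.0)       # expected donation 1.00
solicit_small = should_solicit(0.05, 10.0)     # expected donation 0.50
challenge_big = should_challenge(0.2, 500.0)   # expected recovery 100
challenge_small = should_challenge(1.0, 80.0)  # amount is below the cost
```

Under this rule a transaction smaller than the challenge cost is never challenged, even when fraud is certain.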
12 Adult Dataset
- Downloaded from the UCI repository.
- Associate a benefit factor of 2 with positives and a benefit factor of 1 with negatives.
- The decision to predict positive is \(2 \cdot p(\text{positive} \mid x) > 1 \cdot p(\text{negative} \mid x)\).
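As a worked step, suppose the decision compares the benefit-weighted class probabilities and the two class probabilities sum to one. Both are assumptions, since the transcript does not preserve the slide's formula. The rule then reduces to a fixed probability threshold:

```latex
\begin{align*}
2\,p(\text{positive}\mid x) &> 1\,p(\text{negative}\mid x) \\
2\,p(\text{positive}\mid x) &> 1 - p(\text{positive}\mid x) \\
p(\text{positive}\mid x) &> \tfrac{1}{3}
\end{align*}
```

Under these assumptions, an example is labeled positive whenever its estimated probability of being positive exceeds one third, rather than the one half used in accuracy-based classification.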
13 Calculating probabilities
- For decision trees, if \(n\) is the number of examples in a node and \(k\) is the number of those examples with class label \(\ell\), then the probability is \(p(\ell \mid x) = k / n\).
- More sophisticated methods:
- smoothing
- early stopping, and early stopping plus smoothing
- For rules, the probability is calculated in the same way as for decision trees.
- For naive Bayes, if \(s(\ell_j \mid x)\) is the score for class label \(\ell_j\), then \(p(\ell_j \mid x) = s(\ell_j \mid x) / \sum_i s(\ell_i \mid x)\).
- binning
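The probability estimates listed on this slide can be sketched as below. The raw leaf estimate follows the slide's definition of n and k. The smoothing formula is Laplace smoothing, chosen here as one common option because the slide does not say which smoothing method is meant. The score normalization for naive Bayes is likewise an assumed form, and all function names are illustrative.

```python
from typing import Dict


def leaf_probability(k: int, n: int) -> float:
    """Raw estimate: fraction of the n examples in a node that carry the label."""
    return k / n


def laplace_probability(k: int, n: int, num_classes: int) -> float:
    """Laplace-smoothed estimate (an assumed choice of smoothing method)."""
    return (k + 1) / (n + num_classes)


def normalize_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-class naive Bayes scores into probabilities by dividing by their sum."""
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


raw = leaf_probability(3, 4)            # 3 of 4 examples in the leaf are fraud
smoothed = laplace_probability(3, 4, 2) # pulled toward 1/2 for a small leaf
nb = normalize_scores({"fraud": 0.002, "nonfraud": 0.006})
```

Smoothing matters most for small leaves, where the raw k/n estimate can be an extreme value backed by very few examples.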
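15 Combining Technique: Averaging
- Each model \(C_k\) computes an expected benefit \(e_k(\ell_i \mid x)\) for example \(x\) over every class label \(\ell_i\).
- Combine the individual expected benefits together: \(E_K(\ell_i \mid x) = \frac{1}{K} \sum_k e_k(\ell_i \mid x)\).
- We choose the label with the highest combined expected benefit: \(\ell_{\max} = \arg\max_{\ell_i} E_K(\ell_i \mid x)\).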
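A minimal sketch of the averaging combiner follows, assuming each model reports its expected benefit per class label as a dictionary and that the combination is a plain unweighted mean. The function names and the example numbers are hypothetical.

```python
from typing import Dict, List


def average_expected_benefit(per_model: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each label's expected benefit over the K models."""
    k = len(per_model)
    labels = per_model[0].keys()
    return {label: sum(m[label] for m in per_model) / k for label in labels}


def combined_decision(per_model: List[Dict[str, float]]) -> str:
    """Pick the label with the highest averaged expected benefit."""
    combined = average_expected_benefit(per_model)
    return max(combined, key=combined.get)


# Expected benefits reported by three models for one transaction (made-up numbers).
per_model = [
    {"fraud": 60.0, "nonfraud": 0.0},
    {"fraud": -30.0, "nonfraud": 0.0},
    {"fraud": 15.0, "nonfraud": 0.0},
]
combined = average_expected_benefit(per_model)
decision = combined_decision(per_model)
```

Each site only needs to send one number per class label for each test example, and no additional training step is needed to combine them.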
16 Why is accuracy higher?
1. Decision threshold line
2. Examples on the left are more profitable than those on the right
3. "Evening effect" biases towards big fish.
17 Partially distributed combining techniques
- Regression
- Treat the base classifiers' outputs as the independent variables of a regression and the true label as the dependent variable.
- Modified meta-learning
- Learn a classifier that maps the base classifiers' class label predictions to the true class label.
- For cost-sensitive learning, the top-level classifier outputs a probability instead of just a label.
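Both regression and meta-learning start from the same meta-level training set: one row per labeled example, with the base classifiers' outputs as inputs and the true label as the target. The sketch below builds only that table. The classifier signatures, the 0/1 label encoding, and the helper name `build_meta_dataset` are assumptions for illustration, and fitting the regression or top-level classifier on the rows is left out.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, ...]
BaseClassifier = Callable[[Example], float]  # assumed to return e.g. p(fraud | x)
MetaRow = Tuple[List[float], int]            # (base outputs, true label)


def build_meta_dataset(base_classifiers: Sequence[BaseClassifier],
                       labeled_set: Sequence[Tuple[Example, int]]) -> List[MetaRow]:
    """Pair every base classifier's output on x with the true label of x."""
    rows = []
    for x, true_label in labeled_set:
        features = [clf(x) for clf in base_classifiers]
        rows.append((features, true_label))
    return rows


# Two hypothetical base classifiers and a tiny labeled set (1 = fraud, 0 = nonfraud).
base = [
    lambda x: 0.9 if x[0] > 100 else 0.1,
    lambda x: 0.5,
]
labeled = [((246.0,), 1), ((43.45,), 0)]
meta_rows = build_meta_dataset(base, labeled)
```

Building this table requires the base classifiers' outputs and a labeled set to be gathered in one place, which is extra communication that plain averaging avoids.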
18 Communication Overhead Summary
19 Experiments
- Decision tree learner: C4.5 version 8
- Datasets
- Donation
- Credit Card
- Adult
20 Accuracy comparison
21 Accuracy comparison
22 Accuracy comparison
23 Detailed Spread
24 Credit Card Fraud Dataset
25 Adult Dataset
26 Why is accuracy higher?
27 Summary and Future Work
- Evaluated a wide range of combining techniques, including variations of averaging, regression, and meta-learning, for scalable cost-sensitive (and cost-insensitive) learning.
- Averaging, although simple, has the highest accuracy.
- Previously proposed approaches have significantly more overhead and only work well for traditional accuracy-based problems.
- Future work: ensemble pruning and performance estimation.