A Fully Distributed Framework for Cost-sensitive Data Mining - PowerPoint PPT Presentation


1
A Fully Distributed Framework for Cost-sensitive Data Mining
Wei Fan, Haixun Wang, and Philip S. Yu, IBM T.J. Watson, Hawthorne, New York
Salvatore J. Stolfo, Columbia University, New York City, New York
2
Inductive Learning
Training Data → Learner → Classifier
Learners: 1. Decision trees 2. Rules 3. Naive Bayes ...
Training data (transactions labeled fraud or nonfraud):
(43.45,retail,10025,10040,...,nonfraud)
(246,70,weapon,10001,94583,...,fraud)
Test Data → Classifier → Class Labels
(99.99,pharmacy,10013,10027,...,?)
(1.00,gas,10040,00234,...,?)
Predicted class labels: nonfraud, fraud
3
(No Transcript)
4
Distributed Data Mining

Data is inherently distributed across the network.
Many credit card authorization servers are distributed. Data are collected at each individual site.
Other examples include supermarket customer and transaction databases, hotel reservations, travel agencies, and so on.
In some situations, data cannot even be shared.
Many different banks have their own data servers. They would rather share models, but they cannot share the data for privacy, legal, and competitive reasons.

5
Cost-sensitive Problems

Charity Donation
Solicit people who will donate a large amount to charity.
It costs $0.68 to send a letter.
A(x): donation amount.
Only solicit if A(x) > $0.68; otherwise we lose money.
Credit card fraud detection
Detect frauds with high transaction amounts.
It costs $90 to challenge a potential fraud.
A(x): fraudulent transaction amount.
Only challenge if A(x) > $90; otherwise we lose money.

6
Different Learning Frameworks
7
Fully Distributed Framework (training)
K sites, each holding its own data set D1, D2, ..., Dk
Each site runs a learner ML1, ML2, ..., MLk on its local data
Generate K models C1, C2, ..., Ck
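A minimal sketch of the training step, assuming a hypothetical one-attribute lookup-table learner (`train_local_model`) standing in for a real learner such as C4.5. The site data and the attribute name `type` are made up for illustration. Each site trains on its own records and only the resulting model leaves the site.

```python
from collections import Counter, defaultdict

def train_local_model(examples, attr):
    """Hypothetical local learner: estimate P(label | attr value) as k/n."""
    counts = defaultdict(Counter)  # attribute value -> label counts
    for features, label in examples:
        counts[features[attr]][label] += 1
    model = {}
    for value, label_counts in counts.items():
        n = sum(label_counts.values())
        model[value] = {lab: k / n for lab, k in label_counts.items()}
    return model

def train_distributed(site_data, attr):
    """Site i trains model C_i on its own data D_i; raw data is never pooled."""
    return [train_local_model(data_i, attr) for data_i in site_data]

# Two hypothetical sites, each holding (features, label) pairs.
site_data = [
    [({"type": "retail"}, "nonfraud"), ({"type": "weapon"}, "fraud"),
     ({"type": "retail"}, "nonfraud")],
    [({"type": "weapon"}, "fraud"), ({"type": "weapon"}, "nonfraud"),
     ({"type": "retail"}, "nonfraud")],
]
models = train_distributed(site_data, "type")
print(models[1]["weapon"])  # {'fraud': 0.5, 'nonfraud': 0.5}
```

Because training runs independently per site, the K calls could run in parallel at the sites themselves.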
8
Fully-distributed Framework (predicting)
Test set D is sent to the k models C1, C2, ..., Ck
Compute k predictions P1, P2, ..., Pk
Combine them into one prediction P
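The prediction step can be sketched as follows, assuming three hypothetical models already trained at separate sites (each maps an attribute value to class probabilities). Majority voting is used only as a placeholder combiner; the combining techniques actually evaluated in this work are discussed on later slides.

```python
from collections import Counter

LABELS = ("fraud", "nonfraud")

# Hypothetical models C1..C3: attribute value -> {label: probability}.
models = [
    {"retail": {"nonfraud": 1.0}, "weapon": {"fraud": 1.0}},
    {"retail": {"nonfraud": 1.0}, "weapon": {"fraud": 0.5, "nonfraud": 0.5}},
    {"retail": {"nonfraud": 0.8, "fraud": 0.2},
     "weapon": {"fraud": 0.6, "nonfraud": 0.4}},
]

def predict(model, value):
    """P_i: one model's probability for every label (unseen value -> uniform)."""
    dist = model.get(value)
    if dist is None:
        return {lab: 1 / len(LABELS) for lab in LABELS}
    return {lab: dist.get(lab, 0.0) for lab in LABELS}

def combine_by_vote(predictions):
    """Placeholder combiner: each model votes for its most probable label."""
    votes = Counter(max(p, key=p.get) for p in predictions)
    return votes.most_common(1)[0][0]

def predict_distributed(models, value):
    # Send the test example to all k models, then combine the k predictions.
    predictions = [predict(m, value) for m in models]
    return combine_by_vote(predictions)

print(predict_distributed(models, "weapon"))  # fraud
print(predict_distributed(models, "retail"))  # nonfraud
```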
9
Cost-sensitive Decision Making

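Assume that \( b[\ell', \ell] \) records the benefit received by predicting an example of class \( \ell' \) to be an instance of class \( \ell \).
The expected benefit received by predicting an example \( x \) to be an instance of class \( \ell \) (regardless of its true label) is
\[ E(\ell \mid x) = \sum_{\ell'} P(\ell' \mid x) \, b[\ell', \ell] \]
The optimal decision-making policy chooses the label that maximizes the expected benefit, i.e.,
\[ \ell^*(x) = \arg\max_{\ell} E(\ell \mid x) \]
When \( b[\ell, \ell] = 1 \) and \( b[\ell', \ell] = 0 \) for \( \ell' \neq \ell \), the problem is a traditional accuracy-based problem.
Total benefits over a test set \( S \): \( \sum_{x \in S} b[\ell(x), \ell^*(x)] \), where \( \ell(x) \) is the true label of \( x \).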
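A small sketch of the decision rule above, assuming the benefit matrix is stored as a dictionary keyed by (true label, predicted label) so that `benefit[(t, l)]` plays the role of \( b[\ell', \ell] \). The labels and probabilities in the example are made up; the example checks that the 0/1 benefit matrix reduces to picking the most probable class.

```python
def expected_benefits(probs, benefit):
    """E(l|x) = sum over l' of P(l'|x) * b[l', l].

    probs:   {true label l': P(l'|x)}
    benefit: {(l', l): benefit of predicting an example of class l' as l}
    """
    labels = list(probs)
    return {l: sum(probs[t] * benefit[(t, l)] for t in labels) for l in labels}

def decide(probs, benefit):
    """Optimal policy: the label with the highest expected benefit."""
    eb = expected_benefits(probs, benefit)
    return max(eb, key=eb.get)

def total_benefit(true_labels, predicted_labels, benefit):
    """Sum of b[true label, predicted label] over a test set."""
    return sum(benefit[(t, p)] for t, p in zip(true_labels, predicted_labels))

# Accuracy-based special case: b[l, l] = 1 and b[l', l] = 0 for l' != l.
labels = ("pos", "neg")
zero_one = {(t, l): 1.0 if t == l else 0.0 for t in labels for l in labels}
probs = {"pos": 0.3, "neg": 0.7}
print(decide(probs, zero_one))  # neg
```

Under the 0/1 matrix, `total_benefit` simply counts correct predictions, which is why accuracy is a special case of total benefit.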

10
Charity Donation Example

It costs $0.68 to send a solicitation.
Assume that \( y(x) \) is the best estimate of the donation amount.
The cost-sensitive decision making will solicit an individual if and only if
\[ P(\text{donate} \mid x) \cdot y(x) > 0.68 \]
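The rule can be derived from the expected-benefit formula. This sketch assumes a benefit matrix in which soliciting a donor yields \( y(x) - 0.68 \), soliciting a non-donor yields \( -0.68 \), and not soliciting yields 0; the probabilities and amounts in the example are hypothetical.

```python
SOLICIT_COST = 0.68  # cost of one solicitation letter, from the slide

def expected_benefit_of_soliciting(p_donate, est_amount, cost=SOLICIT_COST):
    # Assumed benefit matrix:
    #   donor who is solicited:      est_amount - cost
    #   non-donor who is solicited:  -cost
    #   anyone not solicited:        0
    return p_donate * (est_amount - cost) + (1 - p_donate) * (-cost)

def should_solicit(p_donate, est_amount, cost=SOLICIT_COST):
    # Solicit only if this beats the benefit of 0 for not soliciting;
    # algebraically the test simplifies to p_donate * est_amount > cost.
    return expected_benefit_of_soliciting(p_donate, est_amount, cost) > 0.0

print(should_solicit(0.05, 20.0))  # True: expected donation 1.00 > 0.68
print(should_solicit(0.05, 10.0))  # False: expected donation 0.50 < 0.68
```

Note that a person with a low donation probability can still be worth soliciting if the estimated amount is large enough.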

11
Credit Card Fraud Detection Example

It costs $90 to challenge a potential fraud.
Assume that \( y(x) \) is the transaction amount.
The cost-sensitive decision-making policy will predict a transaction to be fraudulent if and only if
\[ P(\text{fraud} \mid x) \cdot y(x) > 90 \]
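A sketch of the fraud rule together with the benefit actually realized on a few hypothetical transactions. It assumes a benefit matrix in which challenging a fraud recovers \( y(x) - 90 \), challenging a legitimate transaction costs \( 90 \), and not challenging yields 0.

```python
CHALLENGE_COST = 90.0  # overhead of challenging a transaction, from the slide

def should_challenge(p_fraud, amount, cost=CHALLENGE_COST):
    """Challenge iff P(fraud|x) * y(x) exceeds the challenge cost."""
    return p_fraud * amount > cost

def realized_benefit(is_fraud, amount, challenged, cost=CHALLENGE_COST):
    # Assumed benefit matrix: a challenged fraud recovers the amount minus the
    # overhead; a challenged legitimate transaction only costs the overhead;
    # an unchallenged transaction yields 0.
    if not challenged:
        return 0.0
    return amount - cost if is_fraud else -cost

# Hypothetical transactions: (P(fraud|x), amount y(x), actually fraud?)
transactions = [(0.9, 1000.0, True), (0.9, 50.0, True), (0.2, 600.0, False)]
total = sum(realized_benefit(f, y, should_challenge(p, y))
            for p, y, f in transactions)
print(total)  # 820.0
```

The second transaction is very likely fraudulent but is not challenged, because its amount is too small to cover the overhead.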

12
Adult Dataset

Downloaded from the UCI repository.
Associate a benefit factor of 2 with positives and a benefit factor of 1 with negatives.
The decision to predict positive is
\[ 2 \cdot P(+ \mid x) > P(- \mid x) \]
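A sketch of the Adult decision, assuming that "benefit factor" means a correct positive prediction is worth 2, a correct negative prediction is worth 1, and misclassifications are worth 0. Under that reading, the rule amounts to predicting positive whenever \( P(+ \mid x) > 1/3 \) instead of 1/2.

```python
# Assumed benefit matrix, keyed by (true label, predicted label).
BENEFIT = {("+", "+"): 2.0, ("-", "-"): 1.0, ("+", "-"): 0.0, ("-", "+"): 0.0}

def predict_adult(p_pos):
    p = {"+": p_pos, "-": 1.0 - p_pos}
    # Expected benefit of each prediction, summed over the possible true labels.
    e_pos = p["+"] * BENEFIT[("+", "+")] + p["-"] * BENEFIT[("-", "+")]
    e_neg = p["+"] * BENEFIT[("+", "-")] + p["-"] * BENEFIT[("-", "-")]
    return "+" if e_pos > e_neg else "-"

print(predict_adult(0.4))  # +  (0.8 > 0.6)
print(predict_adult(0.3))  # -  (0.6 < 0.7)
```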

13
Calculating probabilities

For decision trees, if \( n \) is the number of examples in a node and \( k \) is the number of examples with class label \( \ell \), then the probability is
\[ P(\ell \mid x) = \frac{k}{n} \]
More sophisticated methods:
smoothing
early stopping, and early stopping plus smoothing
For rules, the probability is calculated in the same way as for decision trees.
For naive Bayes, if \( s(\ell) \) is the score for class label \( \ell \), then
\[ P(\ell \mid x) = \frac{s(\ell)}{\sum_{\ell'} s(\ell')} \]
binning
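The estimators above can be sketched in a few lines. The Laplace correction shown for smoothing is one common choice and is an assumption here, since the slide does not say which smoothing method is used; binning is not shown.

```python
def leaf_probability(k, n):
    """Decision tree or rule: k of the n examples in the node have the label."""
    return k / n

def laplace_probability(k, n, num_classes):
    """Assumed smoothing choice (Laplace correction); avoids 0 and 1 estimates."""
    return (k + 1) / (n + num_classes)

def normalize_scores(scores):
    """Naive Bayes: P(l|x) = s(l) / sum of s(l') over all labels."""
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

print(leaf_probability(3, 4))                   # 0.75
print(laplace_probability(0, 2, 2))             # 0.25
print(normalize_scores({"a": 1.0, "b": 3.0}))   # {'a': 0.25, 'b': 0.75}
```

Smoothing matters for cost-sensitive decisions because a small leaf with \( k = 0 \) would otherwise claim \( P(\ell \mid x) = 0 \) and rule out a potentially profitable prediction.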

14
(No Transcript)
15
Combining Technique-Averaging

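Each model \( i \) computes an expected benefit \( E_i(\ell \mid x) \) for example \( x \) over every class label \( \ell \).
Combine the individual expected benefits together:
\[ E(\ell \mid x) = \frac{1}{K} \sum_{i=1}^{K} E_i(\ell \mid x) \]
We choose the label with the highest combined expected benefit:
\[ \ell^*(x) = \arg\max_{\ell} E(\ell \mid x) \]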
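A sketch of averaging, assuming each of the K models reports a probability for every label and all models share one benefit matrix keyed by (true label, predicted label). The example uses the fraud setting with a hypothetical $200 transaction and three hypothetical models.

```python
def combined_decision(per_model_probs, benefit):
    """Average the K models' expected benefits and pick the best label.

    per_model_probs: list of {label: P_i(label|x)}, one dict per model
    benefit: {(true label, predicted label): benefit}
    """
    labels = list(per_model_probs[0])
    k = len(per_model_probs)
    avg = {}
    for l in labels:
        # E_i(l|x) = sum over l' of P_i(l'|x) * b[l', l]; then average over i.
        avg[l] = sum(
            sum(p[t] * benefit[(t, l)] for t in labels) for p in per_model_probs
        ) / k
    return max(avg, key=avg.get), avg

# Hypothetical $200 transaction with a $90 challenge cost.
benefit = {("fraud", "fraud"): 110.0, ("nonfraud", "fraud"): -90.0,
           ("fraud", "nonfraud"): 0.0, ("nonfraud", "nonfraud"): 0.0}
probs = [{"fraud": 0.9, "nonfraud": 0.1},
         {"fraud": 0.2, "nonfraud": 0.8},
         {"fraud": 0.4, "nonfraud": 0.6}]
label, avg = combined_decision(probs, benefit)
print(label)  # fraud: average expected benefit is (90 - 50 - 10) / 3 = 10 > 0
```

Two of the three models would individually decline to challenge, but the one confident model outweighs them once expected benefits rather than votes are combined.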

16
Why is accuracy higher?
1. Decision threshold line
2. Examples on the left are more profitable than those on the right
3. "Evening effect" biases towards big fish
17
Partially distributed combining techniques

Regression
Treat the base classifiers' outputs as independent variables of the regression and the true label as the dependent variable.
Modified meta-learning
Learn a classifier that maps the base classifiers' class label predictions to the true class label.
For cost-sensitive learning, the top-level classifier outputs a probability instead of just a label.
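A sketch of the modified meta-learning idea, assuming a hypothetical lookup-table meta-learner that estimates \( P(\ell \mid \text{base predictions}) \) as \( k/n \) over a labeled validation set; a real top-level learner (for example a decision tree) would generalize to unseen prediction tuples. The validation data is made up, and the regression variant is not shown.

```python
from collections import Counter, defaultdict

def train_meta(base_predictions, true_labels):
    """Map each tuple of base-classifier labels to P(true label | tuple) = k/n."""
    counts = defaultdict(Counter)
    for preds, label in zip(base_predictions, true_labels):
        counts[tuple(preds)][label] += 1
    return {
        key: {lab: k / sum(c.values()) for lab, k in c.items()}
        for key, c in counts.items()
    }

# Hypothetical validation data: labels from 3 base classifiers, plus the truth.
base = [("fraud", "fraud", "nonfraud"), ("fraud", "fraud", "nonfraud"),
        ("fraud", "fraud", "nonfraud"), ("nonfraud", "nonfraud", "nonfraud")]
truth = ["fraud", "fraud", "nonfraud", "nonfraud"]
meta = train_meta(base, truth)
print(meta[("fraud", "fraud", "nonfraud")])  # fraud about 0.67, nonfraud about 0.33
```

Because the meta-learner outputs a probability, its result can be fed directly into the expected-benefit decision rule instead of being used as a hard label.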

18
Communication Overhead Summary
19
Experiments

Decision tree learner: C4.5 version 8
Datasets:
Donation
Credit Card
Adult

20
Accuracy comparison
21
Accuracy comparison
22
Accuracy comparison
23
Detailed Spread
24
Credit Card Fraud Dataset
25
Adult Dataset
26
Why is accuracy higher?
27
Summary and Future Work

Evaluated a wide range of combining techniques, including variations of averaging, regression, and meta-learning, for scalable cost-sensitive (and cost-insensitive) learning.
Averaging, although simple, has the highest accuracy.
Previously proposed approaches have significantly more overhead and only work well for traditional accuracy-based problems.
Future work: ensemble pruning and performance estimation.