A Fully Distributed Framework for Cost-sensitive Data Mining

1
A Fully Distributed Framework for Cost-sensitive Data Mining
Wei Fan, Haixun Wang, and Philip S. Yu, IBM T.J. Watson Research Center, Hawthorne, New York
Salvatore J. Stolfo, Columbia University, New York City, New York
2
Inductive Learning
[Diagram] Training Data → Learner → Classifier. Example learners: 1. decision trees, 2. rules, 3. naive Bayes, ...
Training examples (transaction → {fraud, nonfraud}): (43.45, retail, 10025, 10040, ..., nonfraud), (246.70, weapon, 10001, 94583, ..., fraud)
Test Data → Classifier → Class Labels {nonfraud, fraud}
Test examples: (99.99, pharmacy, 10013, 10027, ..., ?), (1.00, gas, 10040, 00234, ..., ?)
3
(No Transcript)
4
Distributed Data Mining
  • Data is often inherently distributed across the network.
  • Many credit card authorization servers are distributed; data are collected at each individual site.
  • Other examples include supermarket customer and transaction databases, hotel reservations, travel agencies, and so on.
  • In some situations, data cannot even be shared.
  • Many different banks have their own data servers. They would rather share the models, but cannot share the data, for privacy, legal, and competitive reasons.

5
Cost-sensitive Problems
  • Charity donation
  • Solicit people who will donate a large amount.
  • It costs $0.68 to send a letter.
  • A(x): donation amount.
  • Only solicit if A(x) > $0.68; otherwise we lose money.
  • Credit card fraud detection
  • Detect frauds with a high transaction amount.
  • It costs $90 to challenge a potential fraud.
  • A(x): fraudulent transaction amount.
  • Only challenge if A(x) > $90; otherwise we lose money.

6
Different Learning Frameworks
7
Fully Distributed Framework (training)
[Diagram] K sites hold local datasets D1, D2, ..., Dk. Learners ML1, ML2, ..., MLk run at the K sites and generate K models C1, C2, ..., Ck.
8
Fully Distributed Framework (predicting)
[Diagram] The test set D is sent to the k models C1, C2, ..., Ck. Each model computes a prediction, giving k predictions P1, P2, ..., Pk, which are combined into one final prediction P.
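To make the data flow concrete, here is a minimal Python sketch of the two phases, under assumptions of mine: scikit-learn decision trees stand in for the C4.5 learner used in the experiments, and `partitions` is a hypothetical list of per-site (X, y) arrays.

```python
# Minimal sketch of the fully distributed framework.
# Assumption: scikit-learn decision trees stand in for C4.5.
from sklearn.tree import DecisionTreeClassifier

def train_distributed(partitions):
    """Train one model per site; raw data never leaves its site."""
    models = []
    for X_site, y_site in partitions:              # each site's local data
        models.append(DecisionTreeClassifier().fit(X_site, y_site))
    return models                                  # only models are shared

def predict_distributed(models, X_test):
    """Send the test set to all k models; collect k probability outputs."""
    return [m.predict_proba(X_test) for m in models]
```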
9
Cost-sensitive Decision Making
  • Assume that $b[l_p, l_t]$ records the benefit received by predicting an example of class $l_t$ to be an instance of class $l_p$.
  • The expected benefit received to predict an example $x$ to be an instance of class $l_p$ (regardless of its true label) is $E[l_p \mid x] = \sum_{l_t} p(l_t \mid x) \cdot b[l_p, l_t]$ (see the worked sketch after this list).
  • The optimal decision-making policy chooses the label that maximizes the expected benefit, i.e., $L(x) = \arg\max_{l_p} E[l_p \mid x]$.
  • When $b[l, l] = 1$ and $b[l_p, l_t] = 0$ for $l_p \neq l_t$, this is a traditional accuracy-based problem.
  • Total benefits: $B = \sum_x b[L(x), l(x)]$, where $l(x)$ is the true label of $x$.
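As a worked example with purely illustrative numbers, the sketch below evaluates the expected-benefit formula for one credit card transaction; the benefit-matrix entries are my assumptions, not values from the slides.

```python
import numpy as np

# Hypothetical benefit matrix b[l_p, l_t]; rows = predicted label,
# columns = true label (0 = nonfraud, 1 = fraud). Challenging a true
# fraud recovers the $500 transaction minus the $90 challenge cost;
# challenging a legitimate transaction wastes $90.
b = np.array([[0.0,    0.0],      # predict nonfraud: no action, no benefit
              [-90.0, 410.0]])    # predict fraud: -90, or 500 - 90

p = np.array([0.3, 0.7])          # estimated p(l_t | x) for this example
expected = b @ p                  # E[l_p | x] = sum_t p(t | x) * b[l_p, t]
label = expected.argmax()         # optimal policy: maximize expected benefit
print(expected, label)            # [  0. 260.] 1 -> challenge the transaction
```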
10
Charity Donation Example
  • It costs $0.68 to send a solicitation.
  • Assume that $y(x)$ is the best estimate of the donation amount.
  • Cost-sensitive decision making will solicit an individual $x$ if and only if $p(\text{donate} \mid x) \cdot y(x) > 0.68$.
11
Credit Card Fraud Detection Example
  • It costs $90 to challenge a potential fraud.
  • Assume that $y(x)$ is the transaction amount.
  • The cost-sensitive decision-making policy will predict a transaction to be fraudulent if and only if $p(\text{fraud} \mid x) \cdot y(x) > 90$.
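The two decision rules reduce to one-line threshold tests. A minimal sketch (function and parameter names are mine, not from the slides):

```python
def solicit(p_donate, est_donation, letter_cost=0.68):
    """Charity rule: mail iff the expected donation exceeds the letter cost."""
    return p_donate * est_donation > letter_cost

def challenge(p_fraud, tran_amount, challenge_cost=90.0):
    """Fraud rule: challenge iff the expected recovery exceeds the $90 cost."""
    return p_fraud * tran_amount > challenge_cost

print(solicit(0.1, 20.0))     # True: 0.1 * $20 = $2.00 > $0.68
print(challenge(0.4, 100.0))  # False: 0.4 * $100 = $40 < $90
```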
12
Adult Dataset
  • Downloaded from the UCI database.
  • Associate a benefit factor of 2 with positives and a benefit factor of 1 with negatives.
  • The decision is to predict positive if and only if $2 \cdot p(+ \mid x) > 1 \cdot p(- \mid x)$, i.e., $p(+ \mid x) > 1/3$.
13
Calculating Probabilities
  • For decision trees, if $n$ is the number of examples in a node and $k$ is the number of examples with class label $l$, then the probability is $p(l \mid x) = k / n$ (a sketch of these estimates follows the list).
  • More sophisticated methods:
  • smoothing
  • early stopping, and early stopping plus smoothing
  • For rules, the probability is calculated in the same way as for decision trees.
  • For naive Bayes, if $s(l \mid x)$ is the raw score for class label $l$, then the probability is obtained by normalization: $p(l \mid x) = s(l \mid x) / \sum_{l'} s(l' \mid x)$.
  • binning
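A sketch of these estimates in Python; the Laplace-style smoothing shown is one common choice, since the slide does not spell out the exact variant used:

```python
def tree_leaf_probability(k, n, smooth=False, n_classes=2):
    """p(l|x) from a decision-tree leaf with n examples, k of class l.
    With smoothing (a Laplace-style correction, one common choice):
    (k + 1) / (n + n_classes)."""
    return (k + 1) / (n + n_classes) if smooth else k / n

def naive_bayes_probability(scores):
    """Normalize raw naive Bayes scores s(l|x) into probabilities."""
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

print(tree_leaf_probability(3, 10))                               # 0.3
print(naive_bayes_probability({"fraud": 0.02, "nonfraud": 0.06}))
# {'fraud': 0.25, 'nonfraud': 0.75}
```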
14
(No Transcript)
15
Combining Technique: Averaging
  • Each model $C_i$ computes an expected benefit for example $x$ over every class label: $E_i[l_p \mid x] = \sum_{l_t} p_i(l_t \mid x) \cdot b[l_p, l_t]$.
  • The individual expected benefits are combined by averaging: $E[l_p \mid x] = \frac{1}{k} \sum_{i=1}^{k} E_i[l_p \mid x]$ (see the sketch after this list).
  • We choose the label with the highest combined expected benefit: $L(x) = \arg\max_{l_p} E[l_p \mid x]$.
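A minimal sketch of the averaging combiner for one example, assuming each model's expected benefits have already been computed (the array shapes are my convention):

```python
import numpy as np

def combine_by_averaging(benefits):
    """benefits: shape (k, n_labels), row i holding model i's expected
    benefit E_i[l_p | x] for one example x. Returns the chosen label and
    the combined benefits E[l_p | x] = (1/k) * sum_i E_i[l_p | x]."""
    combined = benefits.mean(axis=0)
    return combined.argmax(), combined

benefits = np.array([[0.0, 120.0],     # model 1
                     [0.0, -30.0],     # model 2
                     [0.0,  60.0]])    # model 3
print(combine_by_averaging(benefits))  # (1, array([ 0., 50.]))
```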
16
Why Is Accuracy Higher?
[Figure annotations] 1. The decision threshold line. 2. Examples to its left are more profitable than those to its right. 3. The averaging ("evening") effect biases decisions towards the big, profitable examples ("big fish").
17
Partially Distributed Combining Techniques
  • Regression
  • Treat the base classifiers' outputs as the independent variables of a regression and the true label as the dependent variable (see the sketch after this list).
  • Modified meta-learning
  • Learn a top-level classifier that maps the base classifiers' class-label predictions to the true class label.
  • For cost-sensitive learning, the top-level classifier outputs a probability instead of just a label.
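For contrast with averaging, here is a sketch of the regression combiner under assumptions of mine: the base models' probability outputs on a shared validation set are stacked into a feature matrix, which is why this scheme is only partially distributed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def train_regression_combiner(base_outputs, y_true):
    """base_outputs: shape (n_validation, k), column i = model i's output
    (e.g., p(fraud|x)); y_true: the true targets. The fitted regression
    replaces simple averaging as the combining rule."""
    return LinearRegression().fit(base_outputs, y_true)

# Toy usage: three base models, four validation examples.
base_outputs = np.array([[0.9, 0.8, 0.7],
                         [0.2, 0.1, 0.3],
                         [0.6, 0.7, 0.5],
                         [0.1, 0.2, 0.1]])
y_true = np.array([1.0, 0.0, 1.0, 0.0])
combiner = train_regression_combiner(base_outputs, y_true)
print(combiner.predict(base_outputs[:1]))  # combined score for one example
```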
18
Communication Overhead Summary
19
Experiments
  • Decision tree learner: C4.5 (release 8)
  • Datasets:
  • Donation
  • Credit Card
  • Adult
20
Accuracy comparison
21
Accuracy comparison
22
Accuracy comparison
23
Detailed Spread
24
Credit Card Fraud Dataset
25
Adult Dataset
26
Why Is Accuracy Higher?
27
Summary and Future Work
  • Evaluated a wide range of combining techniques, including variations of averaging, regression, and meta-learning, for scalable cost-sensitive (and cost-insensitive) learning.
  • Averaging, although simple, has the highest accuracy.
  • Previously proposed approaches have significantly more overhead and only work well for traditional accuracy-based problems.
  • Future work: ensemble pruning and performance estimation.