Inductive Learning in Less Than One Sequential Data Scan

1
Inductive Learning in Less Than One Sequential
Data Scan
  • Wei Fan, Haixun Wang, and Philip S. Yu
  • IBM T.J.Watson
  • Shaw-hwa Lo
  • Columbia University

2
Problems
  • Many inductive algorithms are main memory-based.
  • When the dataset is larger than main memory, the
    algorithm will "thrash".
  • Efficiency is very low when thrashing happens.
  • For algorithms that are not memory-based: do we
    need to see every piece of data? Probably not.
  • Overfitting curve? Not practical.

3
Basic Idea: One Scan Algorithm
[Figure: one scan algorithm]

4
Loss and Benefit
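  • Loss function
      • Evaluates performance.
  • Benefit matrix: the inverse of the loss function
  • Traditional 0-1 loss
      • \(b[x, x] = 1\), \(b[x, y] = 0\) for \(y \neq x\)
  • Cost-sensitive loss
      • Overhead of $90 to investigate a fraud.
      • \(b[\text{fraud}, \text{fraud}] = \text{tranamt} - \$90\)
      • \(b[\text{fraud}, \text{nonfraud}] = 0\)
      • \(b[\text{nonfraud}, \text{fraud}] = -\$90\)
      • \(b[\text{nonfraud}, \text{nonfraud}] = 0\)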

5
Probabilistic Modeling
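  • \(p(\ell_i \mid x)\) is the probability that x is an
    instance of class \(\ell_i\).
  • \(e(\ell_i \mid x) = \sum_j b[\ell_j, \ell_i] \, p(\ell_j \mid x)\)
    is the expected benefit of predicting \(\ell_i\).
  • Optimal decision: \(\ell^* = \arg\max_{\ell_i} e(\ell_i \mid x)\)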

6
Example
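  • \(p(\text{fraud} \mid x) = 0.5\) and tranamt = $200
  • \(e(\text{fraud} \mid x) = b[\text{fraud}, \text{fraud}] \, p(\text{fraud} \mid x) + b[\text{nonfraud}, \text{fraud}] \, p(\text{nonfraud} \mid x)\)
  • \(= (200 - 90) \times 0.5 + (-90) \times 0.5 = 10\)
  • \(e(\text{nonfraud} \mid x) = b[\text{fraud}, \text{nonfraud}] \, p(\text{fraud} \mid x) + b[\text{nonfraud}, \text{nonfraud}] \, p(\text{nonfraud} \mid x)\)
  • \(= 0 \times 0.5 + 0 \times 0.5 = 0\) (always 0)
  • Predict fraud since we get $10 back.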
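
The arithmetic on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names `benefit` and `expected_benefit` are illustrative and do not come from the slides, while the $90 overhead and the benefit entries are taken from the Loss and Benefit slide.

```python
OVERHEAD = 90.0  # cost of investigating one transaction (from the slides)

def benefit(actual, predicted, tranamt):
    """Benefit matrix entry b[actual, predicted] for the fraud example."""
    if predicted == "fraud":
        # Investigating always costs the overhead; a true fraud recovers tranamt.
        return tranamt - OVERHEAD if actual == "fraud" else -OVERHEAD
    # Predicting nonfraud neither gains nor loses anything.
    return 0.0

def expected_benefit(predicted, probs, tranamt):
    """e(predicted | x): sum over actual classes of b[actual, predicted] * p(actual | x)."""
    return sum(benefit(actual, predicted, tranamt) * p for actual, p in probs.items())

probs = {"fraud": 0.5, "nonfraud": 0.5}  # p(fraud | x) = 0.5, as on the slide
tranamt = 200.0
scores = {label: expected_benefit(label, probs, tranamt) for label in probs}
decision = max(scores, key=scores.get)  # label with the highest expected benefit
print(scores, decision)
```

The decision rule is just an arg-max over the expected benefits, so the same code works for any benefit matrix once `benefit` is swapped out.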

7
Combining Multiple Models
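  • Individual benefits: \(e_k(\ell_i \mid x) = \sum_j b[\ell_j, \ell_i] \, p_k(\ell_j \mid x)\)
    for base model \(k\)
  • Averaged benefits: \(E_K(\ell_i \mid x) = \frac{1}{K} \sum_{k=1}^{K} e_k(\ell_i \mid x)\)
  • Optimal decision: \(\arg\max_{\ell_i} E_K(\ell_i \mid x)\)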
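
One way to picture the three steps on this slide is the sketch below: each base model produces its own expected benefits, the ensemble averages them, and the label with the highest average wins. The benefit matrix reuses the fraud example (tranamt of 200, overhead of 90). The three probability estimates are invented for illustration, and the function names are not from the slides.

```python
def expected_benefits(probs, benefit_matrix):
    """Expected benefit of predicting each label, for one base model."""
    labels = list(probs)
    return {
        pred: sum(benefit_matrix[(actual, pred)] * probs[actual] for actual in labels)
        for pred in labels
    }

def averaged_benefits(model_probs, benefit_matrix):
    """Average the individual expected benefits over all base models."""
    per_model = [expected_benefits(p, benefit_matrix) for p in model_probs]
    labels = list(per_model[0])
    return {label: sum(e[label] for e in per_model) / len(per_model) for label in labels}

# b[(actual, predicted)] for tranamt = 200 and an overhead of 90.
b = {
    ("fraud", "fraud"): 110.0,
    ("nonfraud", "fraud"): -90.0,
    ("fraud", "nonfraud"): 0.0,
    ("nonfraud", "nonfraud"): 0.0,
}

# Hypothetical probability estimates from three base models.
model_probs = [
    {"fraud": 0.5, "nonfraud": 0.5},
    {"fraud": 0.25, "nonfraud": 0.75},
    {"fraud": 0.75, "nonfraud": 0.25},
]

avg = averaged_benefits(model_probs, b)
decision = max(avg, key=avg.get)
print(avg, decision)
```

In this toy input the second model alone would predict nonfraud (its expected benefit for fraud is negative), but the averaged benefit still favors fraud.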

8
How about accuracy?

9
Do we need all K models?
  • We stop learning if k (< K) models have the same
    accuracy as K models with confidence p.
  • We end up scanning the dataset less than once.
  • Use statistical sampling.

10
Less than one scan
[Figure: flowchart with the labels "Algorithm", "Accurate Enough?", "Yes", and "No"]

11
Hoeffding's inequality
  • Random variable with range \(R = a - b\)
  • After n observations, its mean value is y.
  • What is its error with confidence p, regardless of
    the distribution?
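
The formula that answers this question did not survive in the transcript. A commonly used form of the Hoeffding bound is \(\epsilon = \sqrt{R^2 \ln(1/(1-p)) / (2n)}\), and the sketch below assumes that form; the slide's exact expression may differ. The function name `hoeffding_error` is illustrative.

```python
import math

def hoeffding_error(value_range, n, confidence):
    """Hoeffding error for the mean of n observations.

    With probability at least `confidence`, the true mean is within the
    returned epsilon of the observed mean (one-sided bound), regardless of
    the underlying distribution. `value_range` is R, the width of the
    interval the random variable lives in.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    delta = 1.0 - confidence
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# The error shrinks with the square root of the number of observations.
for n in (100, 400, 1600):
    print(n, hoeffding_error(1.0, n, 0.95))
```

Because the bound depends only on R, n, and p, quadrupling the number of observations halves the error.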

12
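When can we stop?
  • Use k models
  • \(E_k(\ell_a \mid x)\): highest expected benefit
  • \(\epsilon_a\): its Hoeffding error
  • \(E_k(\ell_b \mid x)\): second highest expected benefit
  • \(\epsilon_b\): its Hoeffding error
  • The majority label is still \(\ell_a\) with confidence p
    iff \(E_k(\ell_a \mid x) - \epsilon_a > E_k(\ell_b \mid x) + \epsilon_b\)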
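
The stopping test can be sketched as follows, assuming the usual reading of this slide: stop when the lower bound of the best label's averaged benefit exceeds the upper bound of the runner-up's. The Hoeffding form used, the per-label ranges, and the names `hoeffding_error` and `can_stop` are assumptions made for illustration. The numbers reuse the fraud example, where the benefit of predicting fraud spans -90 to 110 (range 200) and the benefit of predicting nonfraud is always 0 (range 0).

```python
import math

def hoeffding_error(value_range, n, confidence):
    """Assumed Hoeffding form: sqrt(R^2 * ln(1 / (1 - p)) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / (1.0 - confidence)) / (2.0 * n))

def can_stop(per_model_benefits, value_ranges, confidence):
    """Check whether k models already fix the prediction with the given confidence.

    per_model_benefits: label -> list of k expected benefits, one per base model.
    value_ranges: label -> range R of that label's benefit (assumed known).
    """
    k = len(next(iter(per_model_benefits.values())))
    means = {label: sum(v) / k for label, v in per_model_benefits.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, second = ranked[0], ranked[1]
    eps_best = hoeffding_error(value_ranges[best], k, confidence)
    eps_second = hoeffding_error(value_ranges[second], k, confidence)
    # Lower bound of the best label must beat the upper bound of the runner-up.
    return means[best] - eps_best > means[second] + eps_second

ranges = {"fraud": 200.0, "nonfraud": 0.0}
few = {"fraud": [60.0] * 10, "nonfraud": [0.0] * 10}     # k = 10 models
many = {"fraud": [60.0] * 100, "nonfraud": [0.0] * 100}  # k = 100 models
print(can_stop(few, ranges, 0.997), can_stop(many, ranges, 0.997))
```

With only 10 models the error bar on the fraud benefit is wider than the gap to nonfraud, so learning continues; with 100 models the gap is large enough to stop.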

13
Less Than One Scan Algorithm
  • Iterate the process on every instance from a
    validation set.
  • Until every instance has the same prediction as
    the full ensemble with confidence p.

14
Validation Set
  • If we fail on one example x, we do not need to
    examine another one.
  • So we can keep only one example in memory at a
    time.
  • If the prediction of k base models on x is the same
    as that of K models, it is very likely that k+1
    models will also agree with K models with the same
    confidence.

15
Validation Set
  • At any time, we only need to keep one data item x
    from the validation set.
  • It is read sequentially from the validation set.
  • The validation set is read only once.
  • What can be a validation set?
      • The training set itself.
      • A separate holdout set.
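
The validation loop described on the last three slides might look roughly like the sketch below, which uses a separate holdout set. It is a reading of the slides rather than the authors' implementation: `train` is a hypothetical callable that turns a batch into a model, a model is any callable `model(x, label)` returning an expected benefit, and a single benefit range is used for all labels to keep the code short.

```python
import math

def hoeffding_error(value_range, n, confidence):
    """Assumed Hoeffding form: sqrt(R^2 * ln(1 / (1 - p)) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / (1.0 - confidence)) / (2.0 * n))

def confident(models, x, labels, value_range, confidence):
    """True if the best label's lower bound beats the runner-up's upper bound."""
    k = len(models)
    means = sorted((sum(m(x, label) for m in models) / k for label in labels), reverse=True)
    eps = hoeffding_error(value_range, k, confidence)
    return means[0] - eps > means[1] + eps

def less_than_one_scan(batches, validation, train, labels, value_range, confidence):
    """Train one model per batch until every validation item is confident."""
    batch_iter = iter(batches)
    models = [train(next(batch_iter))]
    for x in validation:  # read once; only the current x is kept in memory
        while not confident(models, x, labels, value_range, confidence):
            batch = next(batch_iter, None)
            if batch is None:  # training data exhausted: a full one scan
                return models
            models.append(train(batch))
        # Earlier items are not rechecked: if k models were confident on them,
        # k+1 models are very likely to be confident as well.
    return models

# Toy setup: every trained model reports the same expected benefits.
def toy_train(batch):
    return lambda x, label: 60.0 if label == "fraud" else 0.0

batches = [[i] for i in range(256)]  # 256 hypothetical training batches
validation = range(50)               # 50 hypothetical holdout items
models = less_than_one_scan(batches, validation, toy_train,
                            ["fraud", "nonfraud"], 200.0, 0.997)
print(len(models), "of", len(batches), "batches used")
```

In this toy setup training stops after 130 of the 256 batches, so roughly half of the training data is never read.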

16
Amount of Data Scan
  • Training set: at most once.
  • Validation set: once.
  • Using the training set as the validation set:
      • Once we decide to train a model from a batch, we
        do not use it for validation again.
  • How much is used to train models? Less than one scan.

17
Experiments
  • Donation Dataset
  • Total benefits: amount donated to charity minus
    the overhead of sending solicitations.

18
Experiment Setup
  • Inductive learners
      • C4.5
      • RIPPER
      • NB (naive Bayes)
  • Number of base models
      • 8, 16, 32, 64, 128, 256
      • Results are reported as their average.

19
Baseline Results (with C4.5)
  • Single model: 13292.7
  • Complete one scan: 14702.9
      • The average over 8, 16, 32, 64, 128, and 256 base models.
  • We are actually about 1410 higher than the single
    model.

20
Less-than-one scan (with C4.5)
  • Full one scan: 14702
  • Less-than-one scan: 14828
  • Actually a little higher, by 126.
  • How much data is scanned with 99.7% confidence?
      • 71%

21
Other datasets
  • Credit card fraud detection
  • Total benefits: recovered fraud amount minus the
    overhead of investigation

22
Results
  • Baseline single model: 733980 (with curtailed
    probability)
  • One scan ensemble: 804964
  • Less than one scan: 804914
  • Data scan amount: 64%

23
Smoothing effect.

24
Related Work
  • Ensembles
      • Meta-learning (Chan and Stolfo): 2 scans
      • Bagging (Breiman) and AdaBoost (Freund and
        Schapire): multiple scans
  • Use of Hoeffding's inequality
      • Aggregate query (Hellerstein et al.)
      • Streaming decision tree (Hulten and Domingos):
        single decision tree, less than one scan
  • Scalable decision tree
      • SPRINT (Shafer et al.): multiple scans
      • BOAT (Gehrke et al.): 2 scans

25
Conclusion
  • Both one scan and less than one scan have
    accuracy similar to or higher than the single
    model.
  • Less than one scan uses approximately 60% to 90%
    of the data for training, with little or no loss
    of accuracy.