Inductive Learning in Less Than One Sequential Data Scan - PowerPoint PPT Presentation

Provided by: yia7


1
Inductive Learning in Less Than One Sequential
Data Scan

Wei Fan, Haixun Wang, and Philip S. Yu
IBM T.J. Watson
Shaw-hwa Lo
Columbia University

2
Problems

Many inductive algorithms are main-memory-based.
When the dataset is bigger than main memory, they
will "thrash".
Efficiency is very low when thrashing happens.
For algorithms that are not memory-based:
Do we need to see every piece of data? Probably
not.
Plot an overfitting curve to find out how much data is enough? Not practical.

3
Basic Idea: One Scan Algorithm
[Figure: diagram of the one-scan algorithm]
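A minimal sketch of the one-scan idea, assuming it works as the later slides describe: the data is read once, sequentially, in disjoint batches, and one base model is trained per batch. `one_scan`, `batches`, and the toy `train_majority` learner are hypothetical names; a real run would plug in C4.5, RIPPER, or NB instead of the toy learner.

```python
from collections import Counter
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Example = Tuple[list, str]  # (feature vector, class label)


def batches(stream: Iterable[Example], batch_size: int) -> Iterator[List[Example]]:
    """Yield consecutive, disjoint batches from a sequential stream (read once)."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def one_scan(stream: Iterable[Example], batch_size: int,
             train: Callable[[List[Example]], Any]) -> List[Any]:
    """Train one base model per batch; every example is read exactly once."""
    return [train(batch) for batch in batches(stream, batch_size)]


def train_majority(batch: List[Example]) -> Callable[[list], Dict[str, float]]:
    """Hypothetical toy learner: returns the class frequencies seen in its batch
    as p(label | x), ignoring x entirely."""
    counts = Counter(label for _, label in batch)
    total = sum(counts.values())
    dist = {label: c / total for label, c in counts.items()}
    return lambda x: dist
```

For example, 100 examples split into batches of 20 produce K = 5 base models from a single pass over the data.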
4
Loss and Benefit

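Loss function
Evaluates performance.
Benefit matrix $b[\ell, \ell']$: the inverse of the loss function; the benefit of predicting class $\ell'$ when the true class is $\ell$.
Traditional 0-1 loss
$b[x, x] = 1$, $b[x, y] = 0$ for $x \neq y$
Cost-sensitive loss
Overhead of 90 dollars to investigate a fraud.
$b[\text{fraud}, \text{fraud}] = \text{tranamt} - 90$
$b[\text{fraud}, \text{nonfraud}] = 0$
$b[\text{nonfraud}, \text{fraud}] = -90$
$b[\text{nonfraud}, \text{nonfraud}] = 0$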

5
Probabilistic Modeling

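$p(\ell \mid x)$ is the probability that $x$ is an
instance of class $\ell$.
$E(\ell' \mid x) = \sum_{\ell} b[\ell, \ell']\, p(\ell \mid x)$ is the expected benefit of predicting $\ell'$.
Optimal decision: predict $\arg\max_{\ell'} E(\ell' \mid x)$.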

6
Example

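$p(\text{fraud} \mid x) = 0.5$ and tranamt = 200
$E(\text{fraud} \mid x) = b[\text{fraud}, \text{fraud}]\, p(\text{fraud} \mid x) + b[\text{nonfraud}, \text{fraud}]\, p(\text{nonfraud} \mid x) = (200 - 90) \times 0.5 + (-90) \times 0.5 = 10$
$E(\text{nonfraud} \mid x) = b[\text{fraud}, \text{nonfraud}]\, p(\text{fraud} \mid x) + b[\text{nonfraud}, \text{nonfraud}]\, p(\text{nonfraud} \mid x) = 0 \times 0.5 + 0 \times 0.5 = 0$ (always 0)
Predict fraud, since we get 10 dollars back.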
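The expected-benefit rule and the fraud example can be checked with a short sketch. It reads the benefit matrix as $b[\ell, \ell']$ with the true class first and the predicted class second, which is the reading that reproduces the slide's numbers. The function names are hypothetical.

```python
from typing import Callable, Dict

Benefit = Callable[[str, str], float]  # b(true_label, predicted_label)


def fraud_benefit(tranamt: float, overhead: float = 90.0) -> Benefit:
    """Cost-sensitive benefit matrix from the fraud slide, indexed b[true, predicted]."""
    table = {
        ("fraud", "fraud"): tranamt - overhead,
        ("fraud", "nonfraud"): 0.0,
        ("nonfraud", "fraud"): -overhead,
        ("nonfraud", "nonfraud"): 0.0,
    }
    return lambda true, pred: table[(true, pred)]


def zero_one_benefit(true: str, pred: str) -> float:
    """Traditional 0-1 benefit: 1 for a correct prediction, 0 otherwise."""
    return 1.0 if true == pred else 0.0


def expected_benefit(probs: Dict[str, float], b: Benefit) -> Dict[str, float]:
    """E(l' | x) = sum over true labels l of b[l, l'] * p(l | x), for each candidate l'."""
    return {pred: sum(b(true, p_true_label) if False else b(true, pred) * p
                      for true, p in probs.items())
            for pred, p_true_label in [(k, None) for k in probs]}


def optimal_decision(probs: Dict[str, float], b: Benefit) -> str:
    """Predict the label with the highest expected benefit."""
    e = expected_benefit(probs, b)
    return max(e, key=e.get)
```

With $p(\text{fraud} \mid x) = 0.5$ and tranamt = 200, `expected_benefit` returns 10 for fraud and 0 for nonfraud, matching the slide; with tranamt = 100 the decision flips to nonfraud. Under the 0-1 benefit matrix the same rule reduces to predicting the most probable class.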

7
Combining Multiple Models

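Individual benefits: $E_i(\ell' \mid x) = \sum_{\ell} b[\ell, \ell']\, p_i(\ell \mid x)$ for base model $i$
Averaged benefits: $E_K(\ell' \mid x) = \frac{1}{K} \sum_{i=1}^{K} E_i(\ell' \mid x)$
Optimal decision: predict $\arg\max_{\ell'} E_K(\ell' \mid x)$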
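A sketch of the combination step, assuming each base model has already produced its expected benefits $E_i(\ell' \mid x)$ for every label. `averaged_benefits` and `ensemble_decision` are hypothetical names, and the per-model numbers in the usage note are made up for illustration.

```python
from typing import Dict, List


def averaged_benefits(per_model: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each label's expected benefit over the base models."""
    k = len(per_model)
    return {label: sum(m[label] for m in per_model) / k for label in per_model[0]}


def ensemble_decision(per_model: List[Dict[str, float]]) -> str:
    """Predict the label with the highest averaged expected benefit."""
    avg = averaged_benefits(per_model)
    return max(avg, key=avg.get)
```

For instance, three models whose expected benefits for fraud are 10, -20, and 16 (and 0 for nonfraud) average to 2 for fraud, so the ensemble predicts fraud even though one model alone would not.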

8
How about accuracy?
9
Do we need all K models?

We stop learning if $k$ ($< K$) models have the same
accuracy as $K$ models with confidence $p$.
This ends up scanning less than one full pass over the dataset.
Use statistical sampling.

10
Less than one scan
[Flowchart: Algorithm trains a base model → Accurate enough? → No: train another model / Yes: stop]
11
Hoeffding's inequality

A random variable has range $R = a - b$.
After $n$ independent observations, its mean value is $\bar{y}$.
What is its error $\epsilon$ with confidence $p$, regardless of
the distribution?
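For reference, the standard one-sided Hoeffding bound answers this question: with $\delta = 1 - p$, after $n$ independent observations the true mean is no more than $\epsilon$ below $\bar{y}$ (and, symmetrically, no more than $\epsilon$ above it) with probability at least $p$, where

$$\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}.$$

That the slide used exactly this form is an assumption. A small helper that computes it (the name `hoeffding_error` is hypothetical):

```python
import math


def hoeffding_error(value_range: float, n: int, confidence: float) -> float:
    """epsilon = sqrt(R^2 * ln(1 / (1 - p)) / (2n)) for a variable with range R."""
    if n <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need n > 0 and 0 < confidence < 1")
    delta = 1.0 - confidence
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))
```

The bound does not depend on the distribution, only on $R$, $n$, and $p$: quadrupling $n$ halves $\epsilon$, and asking for higher confidence widens it.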

12
When can we stop?

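Use $k$ models:
Highest expected benefit: $\bar{E}_1$, with Hoeffding error $\epsilon_1$
Second-highest expected benefit: $\bar{E}_2$, with Hoeffding error $\epsilon_2$
The majority label is still the same with confidence $p$
iff
$\bar{E}_1 - \epsilon_1 > \bar{E}_2 + \epsilon_2$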
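A sketch of this stopping test for a single example $x$: average the per-model expected benefits, take the top two labels, and check whether $\bar{E}_1 - \epsilon_1 > \bar{E}_2 + \epsilon_2$ with Hoeffding errors computed from $k$ models. For simplicity it assumes both labels' benefits share one known range $R$, so $\epsilon_1 = \epsilon_2$; `can_stop` is a hypothetical name.

```python
import math
from typing import Dict, List


def hoeffding_error(value_range: float, n: int, confidence: float) -> float:
    """One-sided Hoeffding error after n observations at the given confidence."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / (1.0 - confidence)) / (2.0 * n))


def can_stop(per_model: List[Dict[str, float]], value_range: float, confidence: float) -> bool:
    """True if the label with the highest averaged benefit stays on top with the given confidence.

    per_model: expected benefits E_i(label | x) from the k models trained so far.
    value_range: assumed range R shared by every label's per-model benefit.
    """
    k = len(per_model)
    if k == 0:
        return False  # no models yet: keep training
    avg = {label: sum(m[label] for m in per_model) / k for label in per_model[0]}
    ranked = sorted(avg, key=avg.get, reverse=True)
    if len(ranked) < 2:
        return True  # a single candidate label cannot be overtaken
    eps = hoeffding_error(value_range, k, confidence)  # eps_1 == eps_2 under the shared-R assumption
    e1, e2 = avg[ranked[0]], avg[ranked[1]]
    return e1 - eps > e2 + eps
```

With $R = 10$ and $p = 0.9$, four models that all give fraud a benefit of 10 are not yet enough ($\epsilon \approx 5.4$), while sixteen such models are ($\epsilon \approx 2.7$).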

13
Less Than One Scan Algorithm

Iterate the process on every instance from a
validation set,
until every instance has the same prediction as
the full ensemble with confidence $p$.

14
Validation Set

If the test fails on one example x, we do not need to
examine another one; we train more models until x passes.
So we need to keep only one example in memory at a
time.
If k base models' prediction on x is the same as that of
K models,
it is very likely that k+1 models will also agree with
K models with the same confidence.
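Putting slides 13 and 14 together, a minimal sketch of the less-than-one-scan loop: it reads validation examples one at a time; if the stopping test fails on $x$, it trains one more base model from the next training batch and re-checks the same $x$, never returning to earlier examples (relying on the observation that agreement at $k$ models very likely persists at $k+1$). `agrees` stands in for a test such as the Hoeffding check; all names are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, List


def less_than_one_scan(
    batches: Iterator[Any],
    validation: Iterable[Any],
    train: Callable[[Any], Any],
    agrees: Callable[[List[Any], Any], bool],
    max_models: int,
) -> List[Any]:
    """Train base models only until every validation example passes the stopping test.

    batches: iterator over disjoint training batches; each batch is read at most once.
    validation: read sequentially; only the current example x is held in memory.
    agrees(models, x): hypothetical stopping test, e.g. the Hoeffding check.
    max_models: K, the size of the full one-scan ensemble.
    """
    models: List[Any] = []
    for x in validation:
        # On failure, train one more model and re-check the same x;
        # earlier examples are not revisited.
        while not agrees(models, x):
            if len(models) == max_models:
                return models  # all K models trained: same as the full one scan
            batch = next(batches, None)
            if batch is None:
                return models  # training data exhausted
            models.append(train(batch))
    return models
```

In a toy run with five batches and a test that needs at most three models, only three batches are consumed, i.e. less than one scan of the training data.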

15
Validation Set

At any time, we only need to keep one data item x
from the validation set in memory.
It is sequentially read from the validation set.
The validation set is read only once.
What can be a validation set?
The training set itself
A separate holdout set.

16
Amount of Data Scan

Training set: at most once.
Validation set: once.
Using the training set as the validation set:
Once we decide to train a model from a batch, we do
not use that batch for validation again.
How much is used to train models? Less than one full scan.

17
Experiments

Donation Dataset
Total benefits: charity donations received minus the overhead of
sending solicitations.

18
Experiment Setup

Inductive learners
C4.5
RIPPER
NB (naive Bayes)
Number of base models
8, 16, 32, 64, 128, 256
Results are averaged over these settings.

19
Baseline Results (with C4.5)

Single model: 13292.7
Complete one scan: 14702.9
(the average over 8, 16, 32, 64, 128, and 256 base models)
We are actually 1410 higher than the single
model.

20
Less-than-one scan (with C4.5)

Full one scan: 14702
Less-than-one scan: 14828
Actually a little higher, by 126.
How much data is scanned at 99.7% confidence?
71%

21
Other datasets

Credit card fraud detection
Total benefits
Recovered fraud amount minus overhead of
investigation

22
Results

Baseline single model: 733980 (with curtailed
probability)
One-scan ensemble: 804964
Less-than-one scan: 804914
Data scan amount: 64%

23
Smoothing effect.
24
Related Work

Ensembles
Meta-learning (Chan and Stolfo): 2 scans
Bagging (Breiman) and AdaBoost (Freund and
Schapire): multiple scans
Use of Hoeffding's inequality
Aggregate query (Hellerstein et al.)
Streaming decision tree (Hulten and Domingos):
single decision tree, less than one scan
Scalable decision tree
SPRINT (Shafer et al.): multiple scans
BOAT (Gehrke et al.): 2 scans

25
Conclusion

Both one scan and less than one scan have
accuracy either similar to or higher than that of the single
model.
Less than one scan uses approximately 60% to 90%
of the data for training without loss of accuracy.
