Progressive Modeling - PowerPoint PPT Presentation

1
Progressive Modeling
Wei Fan, Haixun Wang, and Philip S. Yu (IBM T. J. Watson)
Shaw-hwa Lo and Salvatore J. Stolfo (Columbia University)
2
State-of-the-art Data Mining
  • Countless ways to construct and extract features
    from raw data.
  • Exponential combinations of feature selection
    from a fixed feature set.
  • Many different inductive learners to choose from.
  • Their combination? Hard to count.
  • Now, how do we compute models? Batch mode!

3
Batch Mode
  • A helpless and frustrating process:
  • Construct features from raw data.
  • Choose a feature set.
  • Apply an inductive learning algorithm.
  • Wait until a model is constructed.
  • No information about accuracy or training time.
  • Is the model good enough?
  • If unlucky, try a new feature set and repeat this
    process.

4
Problem
  • Accuracy is unknown until the model is completely
    built.
  • Actual training time is unknown.
  • If the final accuracy is too low, all invested
    resources are lost.
  • Particularly bad for very large datasets.

5
Progressive Modeling
  • Basic idea: can we estimate the performance
    (accuracy and training time), either online or a
    priori, before completely constructing a model?
  • The estimated performance falls within a range
    with lower and upper error bounds and a stated
    confidence.

6
An Interactive Learning Interface
7
New Features
  • Both estimated accuracy and remaining training
    time are reported online.
  • Estimated values are given in error ranges with
    confidence intervals.
  • When learning completes, the final accuracy is
    guaranteed to lie within the estimated range at the
    given confidence.
  • Particularly useful for mining very large datasets.

8
Summary of Implementation
  • Based on ensembles of classifiers.
  • The fully constructed model is an ensemble with K
    base classifiers.
  • Use statistical sampling techniques to estimate
    the performance of the full ensemble of K base
    classifiers from a "subset" ensemble of k (k < K)
    classifiers.

9
Averaging Ensemble - Training
  [Diagram] A large dataset D is partitioned into K
  disjoint subsets D1, D2, ..., DK; learning algorithms
  ML1, ML2, ..., MLK are applied, one per subset, to
  generate the K base models C1, C2, ..., CK.
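The training step on this slide can be sketched in Python. The K-way partition follows the slide; the majority-class "learner" and the toy data are stand-ins for whatever inductive learner ML1..MLK is actually plugged in:

```python
import random

def partition(dataset, K):
    """Split a large dataset D into K disjoint subsets D1..DK (slide 9)."""
    dataset = list(dataset)
    random.shuffle(dataset)
    return [dataset[i::K] for i in range(K)]

def train_base_model(subset):
    """Stand-in learner ML_i: a majority-class classifier.
    Any inductive learner (tree, rules, naive Bayes) could replace it."""
    labels = [y for _, y in subset]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def train_ensemble(dataset, K):
    """Generate the K base models C1..CK, one per subset."""
    return [train_base_model(s) for s in partition(dataset, K)]

# toy data: (feature, label) pairs
data = [(i, i % 2) for i in range(100)]
models = train_ensemble(data, K=4)
print(len(models))  # 4
```

Because each base model sees only 1/K of the data, the K models can be trained one at a time, which is what makes progressive estimation possible.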
10
Averaging Ensemble - Testing
  [Diagram] Each example in the test set D is sent to
  all K models C1, C2, ..., CK; the K predictions P1,
  P2, ..., PK are computed and averaged into a single
  combined prediction P.
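A minimal sketch of the averaging step, assuming each base model returns a class-probability dictionary; the two hypothetical base models and their probabilities are illustrative:

```python
def ensemble_predict(models, x):
    """Slide 10: send x to all K models and average their
    class-probability predictions P1..PK into one prediction P."""
    preds = [m(x) for m in models]
    labels = preds[0].keys()
    return {c: sum(p[c] for p in preds) / len(preds) for c in labels}

# two hypothetical base models with fixed outputs
m1 = lambda x: {0: 0.75, 1: 0.25}
m2 = lambda x: {0: 0.50, 1: 0.50}
avg = ensemble_predict([m1, m2], x=None)
print(avg)  # {0: 0.625, 1: 0.375}
```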
11
Summary of formula
  • Each model computes an expected benefit for each
    example over every class label.
  • The individual expected benefits are combined by
    averaging.
  • We choose the label with the highest combined
    expected benefit.
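The decision rule above can be sketched as follows. The benefit matrix (a hypothetical donation of 10.00 minus the 0.68 mailing cost mentioned later in the deck) and the per-model probabilities are illustrative assumptions:

```python
def combined_expected_benefit(model_probs, benefit, label):
    """Average over the k models of e_i(x, label), where
    e_i(x, label) = sum over true classes l of p_i(l|x) * benefit[l][label]."""
    per_model = [sum(p[l] * benefit[l][label] for l in p)
                 for p in model_probs]
    return sum(per_model) / len(per_model)

def predict_label(model_probs, benefit, labels):
    """Choose the label with the highest combined expected benefit."""
    return max(labels, key=lambda c:
               combined_expected_benefit(model_probs, benefit, c))

# hypothetical benefit matrix: soliciting a donor nets 10.00 - 0.68,
# soliciting a non-donor loses the 0.68 mailing cost, skipping nets 0
benefit = {"donor":    {"solicit": 9.32,  "skip": 0.0},
           "nondonor": {"solicit": -0.68, "skip": 0.0}}
probs = [{"donor": 0.3, "nondonor": 0.7},   # model 1's estimate for x
         {"donor": 0.4, "nondonor": 0.6}]   # model 2's estimate for x
print(predict_label(probs, benefit, ["solicit", "skip"]))  # solicit
```

Even at a 30–40% donor probability, the large positive payoff makes soliciting the benefit-maximizing choice here.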

12
Basic Intuition
  • We use the multiple-model ensemble as the basis: k
    models estimate the performance of the full K
    models.
  • For the donation dataset as an example, we use 30
    classifiers to estimate the "probability" that the
    256 classifiers will solicit x.
  • If this probability is 0.95, it means that 95% of
    the time the complete ensemble of 256 classifiers
    will solicit x.
  • The estimated "expected" return of using the 256
    classifiers is 0.95 (y(x) - 0.68) + 0.05 (0).
  • We can also calculate its standard deviation.
  • The estimated performance on a dataset is the
    cumulative performance over every example.
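Plugging illustrative numbers into the slide's formula: the donation amount y(x) = 5.00 is a made-up value, while the 0.68 mailing cost and p = 0.95 come from the slide; the Bernoulli-style variance is one plausible reading of the slide's "standard deviation":

```python
p = 0.95        # estimated P(the 256-classifier ensemble solicits x)
y_x = 5.00      # hypothetical donation amount y(x) for this donor
cost = 0.68     # per-solicitation mailing cost from the slide

# expected return: solicit (and collect y(x) - cost) with prob p,
# skip (and collect 0) with prob 1 - p
expected_return = p * (y_x - cost) + (1 - p) * 0
variance = p * (1 - p) * (y_x - cost) ** 2  # assumed Bernoulli spread
print(round(expected_return, 3))  # 4.104
```

Summing these per-example expected returns over the test set gives the estimated performance of the complete ensemble.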

13
Summary of methods
  • The probability P(solicit x by 256) is related to
    the 30 individual expected benefits e(x).
  • The more widely spread the e(x)'s are, the more
    likely P(solicit x by 256) is to be higher.
  • The higher the average of the e(x)'s is, the more
    likely P(solicit x by 256) is to be higher.
  • The area under the normal density curve gives this
    probability.
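A sketch of this computation with the standard library's NormalDist: fit a normal to the k individual expected benefits and take the area under the density above the decision threshold. The threshold of 0 (solicit when the combined expected benefit is positive) is an assumption consistent with the benefit formulation:

```python
from statistics import NormalDist

def prob_solicit(benefits):
    """Estimate P(solicit x by the complete ensemble) as the area
    under a normal density, fitted to the k observed expected
    benefits e(x), that lies above the threshold 0."""
    mu = sum(benefits) / len(benefits)
    sd = (sum((b - mu) ** 2 for b in benefits)
          / (len(benefits) - 1)) ** 0.5          # sample std deviation
    return 1.0 - NormalDist(mu, sd).cdf(0.0)     # area to the right of 0

print(round(prob_solicit([2.0, 2.1, 1.9]), 4))   # 1.0
```

With a mean far above the threshold, the area above 0 is essentially 1; when the mean sits at the threshold, it is 0.5, matching the slide's intuition that both the average and the spread of the e(x)'s drive the probability.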

14
Assuring learning qualities
  • We request the first random sample, and then the
    second.
  • At this point, we have the accuracy of the current
    model, the estimated accuracy of the complete model
    and its standard deviation,
  • as well as the estimated remaining training time.
  • Learning stops if
  • the accuracy of the current model is sufficient,
  • the estimated accuracy of the complete model is too
    low,
  • or the estimated learning time is too long.
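The stopping rule can be sketched as below; the specific thresholds and the two-standard-deviation margin on the estimated accuracy are illustrative assumptions, not the deck's exact criteria:

```python
def should_stop(current_acc, est_acc, est_sd, est_time_left,
                target_acc, min_acc, max_time):
    """Stop progressive learning if any of the slide's three
    conditions holds (thresholds are assumed parameters)."""
    if current_acc >= target_acc:
        return True                      # current model is sufficient
    if est_acc + 2 * est_sd < min_acc:
        return True                      # complete model estimated too low
    if est_time_left > max_time:
        return True                      # remaining training time too long
    return False
```

For example, `should_stop(0.90, 0.92, 0.01, 10, target_acc=0.85, min_acc=0.70, max_time=60)` stops because the current model already meets the target, while a current accuracy of 0.60 with an estimated 0.80 and 10 time units left would continue training.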

15
Summary of formulas
  • Estimating the probability to solicit x by 256
    classifiers
  • Expected benefit received for x

  • and its variance
  • Estimated Performance