Progressive Modeling - PowerPoint PPT Presentation

1
Progressive Modeling
Wei Fan, Haixun Wang, and Philip S. Yu (IBM T. J. Watson)
Shaw-hwa Lo and Salvatore J. Stolfo (Columbia University)
2
State-of-the-art Data Mining
  • Countless ways to construct and extract features
    from raw data.
  • Exponential combinations of feature selection
    from a fixed feature set.
  • Many different inductive learners to choose from.
  • Their combination? Hard to count.
  • Now, how do we compute models? Batch mode!

3
Batch Mode
  • A helpless and frustrating process:
  • Construct features from raw data.
  • Choose a feature set.
  • Apply an inductive learning algorithm.
  • Wait until a model is constructed.
  • No information about accuracy or training time.
  • Is the model good enough?
  • If unlucky, try a new feature set and repeat this
    process.

4
Problem
  • Accuracy is unknown until the model is completely
    built.
  • Actual training time is unknown.
  • If the final accuracy is too low, all invested
    resources are lost.
  • Particularly bad for very large datasets.

5
Progressive Modeling
  • Basic idea: can we estimate the performance
    (accuracy and training time), either online or a
    priori, before completely constructing a model?
  • The estimated performance falls within a range
    with lower and upper error bounds and a stated
    confidence.

6
An Interactive Learning Interface
7
New Features
  • Both estimated accuracy and remaining training
    time are reported online.
  • Estimated values are given in error ranges with
    confidence intervals.
  • When learning completes, the final accuracy is
    guaranteed to lie within the estimated range at the
    given confidence.
  • Particularly useful for mining very large datasets.

8
Summary of Implementation
  • Based on ensembles of classifiers.
  • The fully constructed model is an ensemble with K
    base classifiers.
  • Use statistical sampling techniques to estimate
    the performance of the full ensemble of K base
    classifiers from a "subset" ensemble of k (k < K)
    classifiers.

9
Averaging Ensemble - Training
  [Diagram] A large dataset D is partitioned into K
  disjoint subsets D1, D2, ..., DK; learning algorithms
  ML1, ML2, ..., MLK are applied, one per subset, to
  generate the K base models C1, C2, ..., CK.
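The training step on this slide can be sketched in Python. The K-way partition follows the slide; the majority-class "learner" and the toy data are stand-ins for whatever inductive learner ML1..MLK is actually plugged in:

```python
import random

def partition(dataset, K):
    """Split a large dataset D into K disjoint subsets D1..DK (slide 9)."""
    dataset = list(dataset)
    random.shuffle(dataset)
    return [dataset[i::K] for i in range(K)]

def train_base_model(subset):
    """Stand-in learner ML_i: a majority-class classifier.
    Any inductive learner (tree, rules, naive Bayes) could replace it."""
    labels = [y for _, y in subset]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def train_ensemble(dataset, K):
    """Generate the K base models C1..CK, one per subset."""
    return [train_base_model(s) for s in partition(dataset, K)]

# toy data: (feature, label) pairs
data = [(i, i % 2) for i in range(100)]
models = train_ensemble(data, K=4)
print(len(models))  # 4
```

Because each base model sees only 1/K of the data, the K models can be trained one at a time, which is what makes progressive estimation possible.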
10
Averaging Ensemble - Testing
  [Diagram] Each example in the test set D is sent to
  all K models C1, C2, ..., CK; the K predictions P1,
  P2, ..., PK are computed and averaged into a single
  combined prediction P.
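A minimal sketch of the averaging step, assuming each base model returns a class-probability dictionary; the two hypothetical base models and their probabilities are illustrative:

```python
def ensemble_predict(models, x):
    """Slide 10: send x to all K models and average their
    class-probability predictions P1..PK into one prediction P."""
    preds = [m(x) for m in models]
    labels = preds[0].keys()
    return {c: sum(p[c] for p in preds) / len(preds) for c in labels}

# two hypothetical base models with fixed outputs
m1 = lambda x: {0: 0.75, 1: 0.25}
m2 = lambda x: {0: 0.50, 1: 0.50}
avg = ensemble_predict([m1, m2], x=None)
print(avg)  # {0: 0.625, 1: 0.375}
```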
11
Summary of formula
  • Each model computes an expected benefit for each
    example over every class label.
  • The individual expected benefits are combined by
    averaging.
  • We choose the label with the highest combined
    expected benefit.
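The decision rule above can be sketched as follows. The benefit matrix (a hypothetical donation of 10.00 minus the 0.68 mailing cost mentioned later in the deck) and the per-model probabilities are illustrative assumptions:

```python
def combined_expected_benefit(model_probs, benefit, label):
    """Average over the k models of e_i(x, label), where
    e_i(x, label) = sum over true classes l of p_i(l|x) * benefit[l][label]."""
    per_model = [sum(p[l] * benefit[l][label] for l in p)
                 for p in model_probs]
    return sum(per_model) / len(per_model)

def predict_label(model_probs, benefit, labels):
    """Choose the label with the highest combined expected benefit."""
    return max(labels, key=lambda c:
               combined_expected_benefit(model_probs, benefit, c))

# hypothetical benefit matrix: soliciting a donor nets 10.00 - 0.68,
# soliciting a non-donor loses the 0.68 mailing cost, skipping nets 0
benefit = {"donor":    {"solicit": 9.32,  "skip": 0.0},
           "nondonor": {"solicit": -0.68, "skip": 0.0}}
probs = [{"donor": 0.3, "nondonor": 0.7},   # model 1's estimate for x
         {"donor": 0.4, "nondonor": 0.6}]   # model 2's estimate for x
print(predict_label(probs, benefit, ["solicit", "skip"]))  # solicit
```

Even at a 30–40% donor probability, the large positive payoff makes soliciting the benefit-maximizing choice here.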

12
Basic Intuition
  • We use the multiple-model ensemble as the basis: k
    models estimate the performance of the full K
    models.
  • For the donation dataset as an example, we use 30
    classifiers to estimate the "probability" that the
    256 classifiers will solicit x.
  • If this probability is 0.95, it means that 95% of
    the time the complete ensemble of 256 classifiers
    will solicit x.
  • The estimated "expected" return of using the 256
    classifiers is 0.95 (y(x) - 0.68) + 0.05 (0).
  • We can also calculate its standard deviation.
  • The estimated performance on a dataset is the
    cumulative performance over every example.
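Plugging illustrative numbers into the slide's formula: the donation amount y(x) = 5.00 is a made-up value, while the 0.68 mailing cost and p = 0.95 come from the slide; the Bernoulli-style variance is one plausible reading of the slide's "standard deviation":

```python
p = 0.95        # estimated P(the 256-classifier ensemble solicits x)
y_x = 5.00      # hypothetical donation amount y(x) for this donor
cost = 0.68     # per-solicitation mailing cost from the slide

# expected return: solicit (and collect y(x) - cost) with prob p,
# skip (and collect 0) with prob 1 - p
expected_return = p * (y_x - cost) + (1 - p) * 0
variance = p * (1 - p) * (y_x - cost) ** 2  # assumed Bernoulli spread
print(round(expected_return, 3))  # 4.104
```

Summing these per-example expected returns over the test set gives the estimated performance of the complete ensemble.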

13
Summary of methods
  • The probability P(solicit x by 256) is related to
    the 30 individual expected benefits e(x).
  • The more widely spread the e(x)'s are, the more
    likely P(solicit x by 256) is to be higher.
  • The higher the average of the e(x)'s is, the more
    likely P(solicit x by 256) is to be higher.
  • The area under the normal density curve gives this
    probability.
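A sketch of this computation with the standard library's NormalDist: fit a normal to the k individual expected benefits and take the area under the density above the decision threshold. The threshold of 0 (solicit when the combined expected benefit is positive) is an assumption consistent with the benefit formulation:

```python
from statistics import NormalDist

def prob_solicit(benefits):
    """Estimate P(solicit x by the complete ensemble) as the area
    under a normal density, fitted to the k observed expected
    benefits e(x), that lies above the threshold 0."""
    mu = sum(benefits) / len(benefits)
    sd = (sum((b - mu) ** 2 for b in benefits)
          / (len(benefits) - 1)) ** 0.5          # sample std deviation
    return 1.0 - NormalDist(mu, sd).cdf(0.0)     # area to the right of 0

print(round(prob_solicit([2.0, 2.1, 1.9]), 4))   # 1.0
```

With a mean far above the threshold, the area above 0 is essentially 1; when the mean sits at the threshold, it is 0.5, matching the slide's intuition that both the average and the spread of the e(x)'s drive the probability.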

14
Assuring learning qualities
  • We request the first random sample, and then the
    second.
  • At this point, we have the accuracy of the current
    model, the estimated accuracy of the complete model
    and its standard deviation,
  • as well as the estimated remaining training time.
  • Learning stops if
  • the accuracy of the current model is sufficient,
  • the estimated accuracy of the complete model is too
    low,
  • or the estimated learning time is too long.
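The stopping rule can be sketched as below; the specific thresholds and the two-standard-deviation margin on the estimated accuracy are illustrative assumptions, not the deck's exact criteria:

```python
def should_stop(current_acc, est_acc, est_sd, est_time_left,
                target_acc, min_acc, max_time):
    """Stop progressive learning if any of the slide's three
    conditions holds (thresholds are assumed parameters)."""
    if current_acc >= target_acc:
        return True                      # current model is sufficient
    if est_acc + 2 * est_sd < min_acc:
        return True                      # complete model estimated too low
    if est_time_left > max_time:
        return True                      # remaining training time too long
    return False
```

For example, `should_stop(0.90, 0.92, 0.01, 10, target_acc=0.85, min_acc=0.70, max_time=60)` stops because the current model already meets the target, while a current accuracy of 0.60 with an estimated 0.80 and 10 time units left would continue training.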

15
Summary of formulas
  • Estimating the probability to solicit x by 256
    classifiers
  • Expected benefit received for x

  • and its variance
  • Estimated Performance