Assessing and Comparing Machine Learning Algorithms

1
Assessing and Comparing Machine Learning
Algorithms
2
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

3
Acknowledgements
  • Some of these slides have been adapted from Ethem
    Alpaydin.

4
Introduction
  • Questions
  • Assessment of the expected error of a learning
    algorithm: is the error rate of 1-NN less than
    2%?
  • Comparing the expected errors of two algorithms:
    is k-NN more accurate than MLP?
  • Training/validation/test sets
  • Resampling methods: K-fold cross-validation

5
Algorithm Preference
  • Criteria (Application-dependent)
  • Misclassification error, or risk (loss functions)
  • Training time/space complexity
  • Testing time/space complexity
  • Interpretability
  • Easy programmability
  • Cost-sensitive learning

6
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

7
Resampling and K-Fold Cross-Validation
  • The need for multiple training/validation sets
  • {Xi, Vi}i: training/validation sets of fold i
  • K-fold cross-validation: divide X into K parts,
    Xi, i = 1,...,K
  • Training sets Ti share K-2 of the K parts
  • Leave-one-out: |Ti| = N-1, |Vi| = 1
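
A minimal Python sketch of generating K-fold and leave-one-out splits; the toy data, K = 5, and the use of scikit-learn are illustrative assumptions, not part of the original slides.

    # Minimal sketch (toy data): K-fold and leave-one-out splits.
    import numpy as np
    from sklearn.model_selection import KFold, LeaveOneOut

    X = np.arange(20).reshape(10, 2)          # N = 10 toy instances

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for i, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
        # Fold i: V_i is one part, T_i is the remaining K-1 parts.
        print(f"fold {i}: |T_i| = {len(train_idx)}, |V_i| = {len(val_idx)}")

    loo = LeaveOneOut()                       # special case: |T_i| = N-1, |V_i| = 1
    print("leave-one-out folds:", loo.get_n_splits(X))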

8
5×2 Cross-Validation
  • 5 times 2-fold cross-validation (Dietterich, 1998)

9
Bootstrapping
  • Draw N instances from a dataset of size N with replacement
  • Prob that we do not pick a given instance after N
    draws is (1 - 1/N)^N ≈ e^-1 = 0.368
  • that is, only 36.8% of the instances are new (never drawn)!
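
A minimal Python sketch of the calculation above, with an illustrative N; it draws N indices with replacement and checks that roughly 36.8% of the instances are never picked.

    # Minimal sketch: bootstrap draws with replacement (toy N).
    import numpy as np

    rng = np.random.default_rng(0)
    N = 10_000
    idx = rng.integers(0, N, size=N)              # N draws with replacement
    picked = np.unique(idx)

    print("fraction picked:  ", len(picked) / N)          # ~ 0.632
    print("fraction left out:", 1 - len(picked) / N)      # ~ 0.368
    print("(1 - 1/N)^N       =", (1 - 1 / N) ** N)        # ~ e^-1 = 0.368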

10
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

11
Measuring Error
  • Error rate = # of errors / # of instances
    = (FN + FP) / N
  • Recall = # of found positives / # of positives
    = TP / (TP + FN) = sensitivity = hit rate
  • Precision = # of found positives / # of found
    = TP / (TP + FP)
  • Specificity = TN / (TN + FP)
  • False alarm rate = FP / (FP + TN) = 1 - Specificity
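
A minimal Python sketch computing the measures above from illustrative label vectors (1 = positive, 0 = negative); the data and the use of NumPy are assumptions.

    # Minimal sketch: error measures from a 2x2 confusion matrix (toy labels).
    import numpy as np

    y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
    y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 0])

    TP = np.sum((y_pred == 1) & (y_true == 1))
    FP = np.sum((y_pred == 1) & (y_true == 0))
    TN = np.sum((y_pred == 0) & (y_true == 0))
    FN = np.sum((y_pred == 0) & (y_true == 1))
    N = len(y_true)

    error_rate  = (FN + FP) / N
    recall      = TP / (TP + FN)              # sensitivity, hit rate
    precision   = TP / (TP + FP)
    specificity = TN / (TN + FP)
    false_alarm = FP / (FP + TN)              # = 1 - specificity
    print(error_rate, recall, precision, specificity, false_alarm)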

12
ROC Curve
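
The ROC curve plots the hit rate (TP rate) against the false alarm rate (FP rate) as the decision threshold is swept. A minimal sketch with illustrative scores, using scikit-learn (an assumption):

    # Minimal sketch: ROC curve points and area under the curve (toy scores).
    import numpy as np
    from sklearn.metrics import roc_curve, auc

    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.45, 0.9, 0.6, 0.3])

    fpr, tpr, thresholds = roc_curve(y_true, scores)   # fpr = false alarm, tpr = hit rate
    print("AUC =", auc(fpr, tpr))
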
13
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

14
Interval Estimation
  • X = {x^t}t where x^t ~ N(µ, σ²)
  • The sample mean m ~ N(µ, σ²/N)
  • 100(1 - α) percent confidence interval:
    P(m - z_(α/2) σ/√N < µ < m + z_(α/2) σ/√N) = 1 - α
15
When σ² is not known
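
When σ² is unknown, the sample variance S² and the t distribution with N-1 degrees of freedom replace σ² and z. A minimal Python sketch of both intervals; the sample and the known σ used for the z case are illustrative assumptions.

    # Minimal sketch: 100(1 - alpha)% confidence intervals for the mean (toy sample).
    import numpy as np
    from scipy import stats

    x = np.array([61., 64., 58., 66., 63., 60., 65., 62.])   # x^t ~ N(mu, sigma^2)
    N, alpha = len(x), 0.05
    m = x.mean()

    sigma = 2.5                                   # known-variance case (assumed sigma)
    z = stats.norm.ppf(1 - alpha / 2)
    print("z interval:", (m - z * sigma / np.sqrt(N), m + z * sigma / np.sqrt(N)))

    S = x.std(ddof=1)                             # unknown variance: use S and t_(N-1)
    t = stats.t.ppf(1 - alpha / 2, df=N - 1)
    print("t interval:", (m - t * S / np.sqrt(N), m + t * S / np.sqrt(N)))
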
16
Hypothesis Testing
  • Reject a null hypothesis if it is not supported by the
    sample with enough confidence
  • X = {x^t}t where x^t ~ N(µ, σ²)
  • H0: µ = µ0 vs. H1: µ ≠ µ0
  • Accept H0 with level of significance α if µ0 is
    in the 100(1 - α)% confidence interval
  • Two-sided test

17
  • One-sided test: H0: µ ≤ µ0 vs. H1: µ > µ0
  • Accept H0 if √N (m - µ0)/σ is less than z_(1-α)
  • Variance unknown: use t instead of z
  • Accept H0: µ ≤ µ0 if √N (m - µ0)/S is less than t_(α, N-1)
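
A minimal Python sketch of the two-sided and one-sided tests with unknown variance; the sample, µ0, and α are illustrative.

    # Minimal sketch: t tests of H0 about the mean (toy sample).
    import numpy as np
    from scipy import stats

    x = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.4, 4.7])
    mu0, alpha = 5.0, 0.05
    N, m, S = len(x), x.mean(), x.std(ddof=1)

    t_stat = np.sqrt(N) * (m - mu0) / S

    # Two-sided (H1: mu != mu0): accept H0 if |t| < t_(alpha/2, N-1)
    print("two-sided accept:", abs(t_stat) < stats.t.ppf(1 - alpha / 2, df=N - 1))

    # One-sided (H1: mu > mu0): accept H0 if t < t_(alpha, N-1)
    print("one-sided accept:", t_stat < stats.t.ppf(1 - alpha, df=N - 1))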

18
Assessing Error: H0: p ≤ p0 vs. H1: p > p0
  • Single training/validation set: Binomial Test
  • If the error probability is p0, the probability that there are e
    errors or fewer in N validation trials is
    P(X ≤ e) = Σ_(x=0..e) C(N, x) p0^x (1 - p0)^(N-x)
  • Accept H0 if this probability is less than 1 - α
    (figure example: N = 100, e = 20)
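
A minimal Python sketch of the binomial test; N and e follow the slide's figure (N = 100, e = 20), while p0 = 0.25 and α = 0.05 are illustrative assumptions.

    # Minimal sketch: accept H0: p <= p0 if P{X <= e} < 1 - alpha.
    from scipy import stats

    N, e, p0, alpha = 100, 20, 0.25, 0.05     # p0 and alpha are assumed
    prob = stats.binom.cdf(e, N, p0)          # P{X <= e} under error prob p0
    print("P{X <= e} =", round(prob, 3))
    print("accept H0" if prob < 1 - alpha else "reject H0")
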
19
Normal Approximation to the Binomial
  • The number of errors X is approximately normal with mean Np0
    and variance Np0(1 - p0)
  • (X - Np0)/√(Np0(1 - p0)) is approximately unit normal;
    accept H0 if this value is less than z_(1-α)
20
Paired t Test
  • Multiple training/validation sets
  • x_i^t = 1 if instance t is misclassified on fold i
  • Error rate of fold i: p_i = (Σ_t x_i^t) / N
  • With m and S² the average and variance of the p_i,
    we accept H0 (p0 or less error) if
    √K (m - p0)/S is less than t_(α, K-1)
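
A minimal Python sketch of this test; the K = 10 fold error rates and p0 are illustrative.

    # Minimal sketch: t test of H0: p <= p0 from K-fold error rates (toy values).
    import numpy as np
    from scipy import stats

    p = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14])
    p0, alpha = 0.15, 0.05
    K, m, S = len(p), p.mean(), p.std(ddof=1)

    t_stat = np.sqrt(K) * (m - p0) / S
    print("accept H0" if t_stat < stats.t.ppf(1 - alpha, df=K - 1) else "reject H0")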

21
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

22
Comparing Classifiers: H0: µ0 = µ1 vs. H1: µ0 ≠ µ1
  • Single training/validation set: McNemar's Test
  • e01: instances misclassified by classifier 1 but not by 2;
    e10: misclassified by 2 but not by 1
  • Under H0, we expect e01 = e10 = (e01 + e10)/2
  • The statistic (|e01 - e10| - 1)² / (e01 + e10) is approximately
    chi-square with 1 degree of freedom;
    accept H0 if it is less than X²_(α,1)
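
A minimal Python sketch of McNemar's test with illustrative counts e01 and e10 (instances misclassified by exactly one of the two classifiers).

    # Minimal sketch: McNemar's test on one validation set (toy counts).
    from scipy import stats

    e01, e10 = 18, 7          # only classifier 1 wrong / only classifier 2 wrong
    alpha = 0.05

    chi2 = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)   # continuity-corrected statistic
    print("accept H0" if chi2 < stats.chi2.ppf(1 - alpha, df=1) else "reject H0")
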
23
K-Fold CV Paired t Test
  • Use K-fold cv to get K training/validation folds
  • p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i
  • p_i = p_i^1 - p_i^2: paired difference on fold i
  • The null hypothesis is that p_i has mean 0:
    H0: µ = 0 vs. H1: µ ≠ 0; under H0, √K m/S ~ t_(K-1)
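
A minimal Python sketch using SciPy's paired t test on the per-fold errors of the two classifiers; the error values are illustrative.

    # Minimal sketch: paired t test on per-fold error differences (toy errors).
    import numpy as np
    from scipy import stats

    p1 = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14])
    p2 = np.array([0.14, 0.16, 0.12, 0.13, 0.15, 0.12, 0.17, 0.14, 0.15, 0.16])

    t_stat, p_value = stats.ttest_rel(p1, p2)     # H0: mean of p1 - p2 is 0
    print("t =", round(t_stat, 3), " p-value =", round(p_value, 3))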

24
5×2 cv Paired t Test
  • Use 5×2 cv to get two folds of five training/validation
    replications (Dietterich, 1998)
  • p_i^(j): difference between the errors of classifiers 1 and 2 on
    fold j = 1, 2 of replication i = 1,...,5
  • With p̄_i = (p_i^(1) + p_i^(2))/2 and
    s_i² = (p_i^(1) - p̄_i)² + (p_i^(2) - p̄_i)²,
    t = p_1^(1) / √((1/5) Σ_i s_i²) is t-distributed with 5 degrees
    of freedom under H0
  • Two-sided test: accept H0: µ0 = µ1 if t is in
    (-t_(α/2,5), t_(α/2,5))
  • One-sided test: accept H0: µ0 ≤ µ1 if t < t_(α,5)
25
5×2 cv Paired F Test
  • f = (Σ_i Σ_j (p_i^(j))²) / (2 Σ_i s_i²) is approximately
    F-distributed with (10, 5) degrees of freedom under H0
  • Two-sided test: accept H0: µ0 = µ1 if f < F_(α,10,5)
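
A minimal Python sketch of both 5×2 cv statistics; the matrix of error differences p[i, j] is illustrative.

    # Minimal sketch: 5x2 cv paired t and F statistics (toy error differences).
    import numpy as np
    from scipy import stats

    p = np.array([[0.02, 0.01],      # p[i, j]: error difference on fold j
                  [0.03, 0.00],      # of replication i
                  [0.01, 0.02],
                  [0.02, 0.03],
                  [0.00, 0.01]])
    alpha = 0.05

    p_bar = p.mean(axis=1)
    s2 = (p[:, 0] - p_bar) ** 2 + (p[:, 1] - p_bar) ** 2

    t_stat = p[0, 0] / np.sqrt(s2.mean())        # ~ t with 5 dof under H0
    f_stat = (p ** 2).sum() / (2 * s2.sum())     # ~ F with (10, 5) dof under H0

    print("t test accepts H0:", abs(t_stat) < stats.t.ppf(1 - alpha / 2, df=5))
    print("F test accepts H0:", f_stat < stats.f.ppf(1 - alpha, dfn=10, dfd=5))
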
26
Comparing L > 2 Algorithms: Analysis of Variance
(ANOVA)
  • Errors of L algorithms on K folds; H0: µ1 = µ2 = ... = µL
  • We construct two estimators of σ².
  • One is valid if H0 is true, the other is always
    valid.
  • We reject H0 if the two estimators disagree.
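
A minimal Python sketch using SciPy's one-way ANOVA over the per-fold errors of L = 3 algorithms; the error values are illustrative.

    # Minimal sketch: one-way ANOVA, H0: mu_1 = mu_2 = mu_3 (toy errors).
    import numpy as np
    from scipy import stats

    errors = [np.array([0.12, 0.14, 0.11, 0.13, 0.15]),   # algorithm 1
              np.array([0.13, 0.15, 0.12, 0.14, 0.16]),   # algorithm 2
              np.array([0.18, 0.20, 0.17, 0.19, 0.21])]   # algorithm 3

    f_stat, p_value = stats.f_oneway(*errors)
    print("F =", round(f_stat, 2), " p-value =", round(p_value, 4))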

27
(No Transcript)
28
(No Transcript)
29
Other Tests
  • Range test (Newman-Keuls)
  • Nonparametric tests (Sign test, Kruskal-Wallis)
  • Contrasts: check whether algorithms 1 and 2 differ from 3, 4, and
    5
  • Multiple comparisons require a Bonferroni
    correction: if there are m tests, to have an
    overall significance of α, each test should use
    a significance of α/m (see the sketch below).
  • Regression: the CLT states that the sum of iid
    variables from any distribution is approximately
    normal, so the preceding methods can be used.
  • Other loss functions?
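
A minimal sketch of the Bonferroni correction mentioned above; m = 10 and α = 0.05 are illustrative.

    # Minimal sketch: per-test significance after a Bonferroni correction.
    alpha, m = 0.05, 10
    print("per-test significance:", alpha / m)   # 0.005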

30
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

31
Prediction Assessment
  • As in classification: performance evaluation via
    cross-validation
  • Difference: the error rate is not appropriate
  • Performance measures for prediction:
  • Mean squared error
  • Root mean squared error
  • Mean absolute error
  • Relative squared error
  • Root relative squared error
  • Relative absolute error
  • Correlation coefficient

32
Prediction Assessment
  • Performance measures for prediction (p =
    predicted values, a = actual values)
  • Mean squared error
  • Root mean squared error
  • Mean absolute error
  • Relative squared error
  • Root relative squared error
  • Relative absolute error
  • Correlation coefficient
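
A minimal Python sketch of these measures using their usual definitions; the predicted and actual values are illustrative.

    # Minimal sketch: prediction performance measures (toy p and a).
    import numpy as np

    p = np.array([2.5, 0.0, 2.1, 7.8])      # predicted values
    a = np.array([3.0, -0.5, 2.0, 7.0])     # actual values
    a_mean = a.mean()

    mse  = np.mean((p - a) ** 2)                                 # mean squared error
    rmse = np.sqrt(mse)                                          # root mean squared error
    mae  = np.mean(np.abs(p - a))                                # mean absolute error
    rse  = np.sum((p - a) ** 2) / np.sum((a - a_mean) ** 2)      # relative squared error
    rrse = np.sqrt(rse)                                          # root relative squared error
    rae  = np.sum(np.abs(p - a)) / np.sum(np.abs(a - a_mean))    # relative absolute error
    corr = np.corrcoef(p, a)[0, 1]                               # correlation coefficient
    print(mse, rmse, mae, rse, rrse, rae, corr)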

33
Prediction Assessment
  • Performance measures for prediction
  • Minimize the error measures and maximize the correlation
    coefficient
  • A significance test is applied to the chosen performance measure
    (e.g., the mean squared error)

34
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

35
Minimum Description Length Principle
  • The MDL (minimum description length)
    principle states that the best theory for some
    data is the one that minimizes the size of the
    model plus the amount of information
    necessary to specify the exceptions relative to
    the theory.
  • Choose the theory that minimizes L(T) + L(E|T), where L(T) is
    the number of bits needed to code the theory and
    L(E|T) is the number of bits needed to code the training set
    given the theory.

36
Clustering Assessment
  • Clustering assessment
  • Evaluate how well the clusters found match predefined
    classes (supervised evaluation; see the sketch below).
  • Evaluate by usefulness in the application context.
  • Evaluate by the minimum description length principle:
  • the best clustering will support the most efficient
    encoding of the samples by the clusters.
  • Example:
  • Encode the cluster centers.
  • For each sample, code the cluster it belongs
    to and its displacement/coordinates from the
    cluster center.
  • The better the clustering fits the data, the more
    compact the representation will be.
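
A minimal Python sketch of the supervised evaluation mentioned above, comparing found clusters to predefined classes with the adjusted Rand index (one possible choice, not prescribed by the slides); the labels are illustrative.

    # Minimal sketch: compare found clusters to predefined classes (toy labels).
    from sklearn.metrics import adjusted_rand_score

    true_classes   = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    found_clusters = [1, 1, 0, 0, 0, 0, 2, 2, 2]

    print("adjusted Rand index:",
          round(adjusted_rand_score(true_classes, found_clusters), 3))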