Assessing and Comparing Machine Learning Algorithms

1
Assessing and Comparing Machine Learning
Algorithms
2
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

3
Acknowledgements
  • Some of these slides have been adapted from Ethem
    Alpaydin.

4
Introduction
  • Questions
  • Assessment of the expected error of a learning
    algorithm: is the error rate of 1-NN less than
    2%?
  • Comparing the expected errors of two algorithms:
    is k-NN more accurate than MLP?
  • Training/validation/test sets
  • Resampling methods: K-fold cross-validation

5
Algorithm Preference
  • Criteria (Application-dependent)
  • Misclassification error, or risk (loss functions)
  • Training time/space complexity
  • Testing time/space complexity
  • Interpretability
  • Easy programmability
  • Cost-sensitive learning

6
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

7
Resampling and K-Fold Cross-Validation
  • The need for multiple training/validation sets
  • {Xi, Vi}i: training/validation sets of fold i
  • K-fold cross-validation: divide X into K parts,
    Xi, i = 1,...,K
  • Training sets Ti share K-2 of the K parts
  • Leave-one-out: |Ti| = N-1, |Vi| = 1
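
A minimal Python sketch of generating K-fold and leave-one-out splits; the toy data, K = 5, and the use of scikit-learn are illustrative assumptions, not part of the original slides.

    # Minimal sketch (toy data): K-fold and leave-one-out splits.
    import numpy as np
    from sklearn.model_selection import KFold, LeaveOneOut

    X = np.arange(20).reshape(10, 2)          # N = 10 toy instances

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for i, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
        # Fold i: V_i is one part, T_i is the remaining K-1 parts.
        print(f"fold {i}: |T_i| = {len(train_idx)}, |V_i| = {len(val_idx)}")

    loo = LeaveOneOut()                       # special case: |T_i| = N-1, |V_i| = 1
    print("leave-one-out folds:", loo.get_n_splits(X))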

8
5×2 Cross-Validation
  • 5 times 2-fold cross-validation (Dietterich, 1998)

9
Bootstrapping
  • Draw N instances from a dataset of size N with replacement
  • Prob that we do not pick a given instance after N
    draws is (1 - 1/N)^N ≈ e^-1 = 0.368
  • that is, only 36.8% of the instances are new (never drawn)!
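
A minimal Python sketch of the calculation above, with an illustrative N; it draws N indices with replacement and checks that roughly 36.8% of the instances are never picked.

    # Minimal sketch: bootstrap draws with replacement (toy N).
    import numpy as np

    rng = np.random.default_rng(0)
    N = 10_000
    idx = rng.integers(0, N, size=N)              # N draws with replacement
    picked = np.unique(idx)

    print("fraction picked:  ", len(picked) / N)          # ~ 0.632
    print("fraction left out:", 1 - len(picked) / N)      # ~ 0.368
    print("(1 - 1/N)^N       =", (1 - 1 / N) ** N)        # ~ e^-1 = 0.368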

10
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

11
Measuring Error
  • Error rate = # of errors / # of instances
    = (FN + FP) / N
  • Recall = # of found positives / # of positives
    = TP / (TP + FN) = sensitivity = hit rate
  • Precision = # of found positives / # of found
    = TP / (TP + FP)
  • Specificity = TN / (TN + FP)
  • False alarm rate = FP / (FP + TN) = 1 - Specificity
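
A minimal Python sketch computing the measures above from illustrative label vectors (1 = positive, 0 = negative); the data and the use of NumPy are assumptions.

    # Minimal sketch: error measures from a 2x2 confusion matrix (toy labels).
    import numpy as np

    y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
    y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 0])

    TP = np.sum((y_pred == 1) & (y_true == 1))
    FP = np.sum((y_pred == 1) & (y_true == 0))
    TN = np.sum((y_pred == 0) & (y_true == 0))
    FN = np.sum((y_pred == 0) & (y_true == 1))
    N = len(y_true)

    error_rate  = (FN + FP) / N
    recall      = TP / (TP + FN)              # sensitivity, hit rate
    precision   = TP / (TP + FP)
    specificity = TN / (TN + FP)
    false_alarm = FP / (FP + TN)              # = 1 - specificity
    print(error_rate, recall, precision, specificity, false_alarm)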

12
ROC Curve
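
The ROC curve plots the hit rate (TP rate) against the false alarm rate (FP rate) as the decision threshold is swept. A minimal sketch with illustrative scores, using scikit-learn (an assumption):

    # Minimal sketch: ROC curve points and area under the curve (toy scores).
    import numpy as np
    from sklearn.metrics import roc_curve, auc

    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.45, 0.9, 0.6, 0.3])

    fpr, tpr, thresholds = roc_curve(y_true, scores)   # fpr = false alarm, tpr = hit rate
    print("AUC =", auc(fpr, tpr))
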
13
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

14
Interval Estimation
  • X = {x^t}t where x^t ~ N(µ, σ²)
  • The sample mean m ~ N(µ, σ²/N)
  • 100(1 - α) percent confidence interval:
    P(m - z_(α/2) σ/√N < µ < m + z_(α/2) σ/√N) = 1 - α
15
When σ² is not known
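
When σ² is unknown, the sample variance S² and the t distribution with N-1 degrees of freedom replace σ² and z. A minimal Python sketch of both intervals; the sample and the known σ used for the z case are illustrative assumptions.

    # Minimal sketch: 100(1 - alpha)% confidence intervals for the mean (toy sample).
    import numpy as np
    from scipy import stats

    x = np.array([61., 64., 58., 66., 63., 60., 65., 62.])   # x^t ~ N(mu, sigma^2)
    N, alpha = len(x), 0.05
    m = x.mean()

    sigma = 2.5                                   # known-variance case (assumed sigma)
    z = stats.norm.ppf(1 - alpha / 2)
    print("z interval:", (m - z * sigma / np.sqrt(N), m + z * sigma / np.sqrt(N)))

    S = x.std(ddof=1)                             # unknown variance: use S and t_(N-1)
    t = stats.t.ppf(1 - alpha / 2, df=N - 1)
    print("t interval:", (m - t * S / np.sqrt(N), m + t * S / np.sqrt(N)))
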
16
Hypothesis Testing
  • Reject a null hypothesis if it is not supported by the
    sample with enough confidence
  • X = {x^t}t where x^t ~ N(µ, σ²)
  • H0: µ = µ0 vs. H1: µ ≠ µ0
  • Accept H0 with level of significance α if µ0 is
    in the 100(1 - α)% confidence interval
  • Two-sided test

17
  • One-sided test: H0: µ ≤ µ0 vs. H1: µ > µ0
  • Accept H0 if √N (m - µ0)/σ is less than z_(1-α)
  • Variance unknown: use t instead of z
  • Accept H0: µ ≤ µ0 if √N (m - µ0)/S is less than t_(α, N-1)
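
A minimal Python sketch of the two-sided and one-sided tests with unknown variance; the sample, µ0, and α are illustrative.

    # Minimal sketch: t tests of H0 about the mean (toy sample).
    import numpy as np
    from scipy import stats

    x = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.4, 4.7])
    mu0, alpha = 5.0, 0.05
    N, m, S = len(x), x.mean(), x.std(ddof=1)

    t_stat = np.sqrt(N) * (m - mu0) / S

    # Two-sided (H1: mu != mu0): accept H0 if |t| < t_(alpha/2, N-1)
    print("two-sided accept:", abs(t_stat) < stats.t.ppf(1 - alpha / 2, df=N - 1))

    # One-sided (H1: mu > mu0): accept H0 if t < t_(alpha, N-1)
    print("one-sided accept:", t_stat < stats.t.ppf(1 - alpha, df=N - 1))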

18
Assessing Error: H0: p ≤ p0 vs. H1: p > p0
  • Single training/validation set: Binomial Test
  • If the error probability is p0, the probability that there are e
    errors or fewer in N validation trials is
    P(X ≤ e) = Σ_(x=0..e) C(N, x) p0^x (1 - p0)^(N-x)
  • Accept H0 if this probability is less than 1 - α
    (figure example: N = 100, e = 20)
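
A minimal Python sketch of the binomial test; N and e follow the slide's figure (N = 100, e = 20), while p0 = 0.25 and α = 0.05 are illustrative assumptions.

    # Minimal sketch: accept H0: p <= p0 if P{X <= e} < 1 - alpha.
    from scipy import stats

    N, e, p0, alpha = 100, 20, 0.25, 0.05     # p0 and alpha are assumed
    prob = stats.binom.cdf(e, N, p0)          # P{X <= e} under error prob p0
    print("P{X <= e} =", round(prob, 3))
    print("accept H0" if prob < 1 - alpha else "reject H0")
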
19
Normal Approximation to the Binomial
  • The number of errors X is approximately normal with mean Np0
    and variance Np0(1 - p0)
  • (X - Np0)/√(Np0(1 - p0)) is approximately unit normal;
    accept H0 if this value is less than z_(1-α)
20
Paired t Test
  • Multiple training/validation sets
  • x_i^t = 1 if instance t is misclassified on fold i
  • Error rate of fold i: p_i = (Σ_t x_i^t) / N
  • With m and S² the average and variance of the p_i,
    we accept H0 (p0 or less error) if
    √K (m - p0)/S is less than t_(α, K-1)
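
A minimal Python sketch of this test; the K = 10 fold error rates and p0 are illustrative.

    # Minimal sketch: t test of H0: p <= p0 from K-fold error rates (toy values).
    import numpy as np
    from scipy import stats

    p = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14])
    p0, alpha = 0.15, 0.05
    K, m, S = len(p), p.mean(), p.std(ddof=1)

    t_stat = np.sqrt(K) * (m - p0) / S
    print("accept H0" if t_stat < stats.t.ppf(1 - alpha, df=K - 1) else "reject H0")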

21
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

22
Comparing Classifiers: H0: µ0 = µ1 vs. H1: µ0 ≠ µ1
  • Single training/validation set: McNemar's Test
  • e01: instances misclassified by classifier 1 but not by 2;
    e10: misclassified by 2 but not by 1
  • Under H0, we expect e01 = e10 = (e01 + e10)/2
  • The statistic (|e01 - e10| - 1)² / (e01 + e10) is approximately
    chi-square with 1 degree of freedom;
    accept H0 if it is less than X²_(α,1)
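
A minimal Python sketch of McNemar's test with illustrative counts e01 and e10 (instances misclassified by exactly one of the two classifiers).

    # Minimal sketch: McNemar's test on one validation set (toy counts).
    from scipy import stats

    e01, e10 = 18, 7          # only classifier 1 wrong / only classifier 2 wrong
    alpha = 0.05

    chi2 = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)   # continuity-corrected statistic
    print("accept H0" if chi2 < stats.chi2.ppf(1 - alpha, df=1) else "reject H0")
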
23
K-Fold CV Paired t Test
  • Use K-fold cv to get K training/validation folds
  • p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i
  • p_i = p_i^1 - p_i^2: paired difference on fold i
  • The null hypothesis is that p_i has mean 0:
    H0: µ = 0 vs. H1: µ ≠ 0; under H0, √K m/S ~ t_(K-1)
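
A minimal Python sketch using SciPy's paired t test on the per-fold errors of the two classifiers; the error values are illustrative.

    # Minimal sketch: paired t test on per-fold error differences (toy errors).
    import numpy as np
    from scipy import stats

    p1 = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14])
    p2 = np.array([0.14, 0.16, 0.12, 0.13, 0.15, 0.12, 0.17, 0.14, 0.15, 0.16])

    t_stat, p_value = stats.ttest_rel(p1, p2)     # H0: mean of p1 - p2 is 0
    print("t =", round(t_stat, 3), " p-value =", round(p_value, 3))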

24
5×2 cv Paired t Test
  • Use 5×2 cv to get two folds of five training/validation
    replications (Dietterich, 1998)
  • p_i^(j): difference between the errors of classifiers 1 and 2 on
    fold j = 1, 2 of replication i = 1,...,5
  • With p̄_i = (p_i^(1) + p_i^(2))/2 and
    s_i² = (p_i^(1) - p̄_i)² + (p_i^(2) - p̄_i)²,
    t = p_1^(1) / √((1/5) Σ_i s_i²) is t-distributed with 5 degrees
    of freedom under H0
  • Two-sided test: accept H0: µ0 = µ1 if t is in
    (-t_(α/2,5), t_(α/2,5))
  • One-sided test: accept H0: µ0 ≤ µ1 if t < t_(α,5)
25
5×2 cv Paired F Test
  • f = (Σ_i Σ_j (p_i^(j))²) / (2 Σ_i s_i²) is approximately
    F-distributed with (10, 5) degrees of freedom under H0
  • Two-sided test: accept H0: µ0 = µ1 if f < F_(α,10,5)
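
A minimal Python sketch of both 5×2 cv statistics; the matrix of error differences p[i, j] is illustrative.

    # Minimal sketch: 5x2 cv paired t and F statistics (toy error differences).
    import numpy as np
    from scipy import stats

    p = np.array([[0.02, 0.01],      # p[i, j]: error difference on fold j
                  [0.03, 0.00],      # of replication i
                  [0.01, 0.02],
                  [0.02, 0.03],
                  [0.00, 0.01]])
    alpha = 0.05

    p_bar = p.mean(axis=1)
    s2 = (p[:, 0] - p_bar) ** 2 + (p[:, 1] - p_bar) ** 2

    t_stat = p[0, 0] / np.sqrt(s2.mean())        # ~ t with 5 dof under H0
    f_stat = (p ** 2).sum() / (2 * s2.sum())     # ~ F with (10, 5) dof under H0

    print("t test accepts H0:", abs(t_stat) < stats.t.ppf(1 - alpha / 2, df=5))
    print("F test accepts H0:", f_stat < stats.f.ppf(1 - alpha, dfn=10, dfd=5))
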
26
Comparing L > 2 Algorithms: Analysis of Variance
(ANOVA)
  • Errors of L algorithms on K folds; H0: µ1 = µ2 = ... = µL
  • We construct two estimators of σ².
  • One is valid if H0 is true, the other is always
    valid.
  • We reject H0 if the two estimators disagree.
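
A minimal Python sketch using SciPy's one-way ANOVA over the per-fold errors of L = 3 algorithms; the error values are illustrative.

    # Minimal sketch: one-way ANOVA, H0: mu_1 = mu_2 = mu_3 (toy errors).
    import numpy as np
    from scipy import stats

    errors = [np.array([0.12, 0.14, 0.11, 0.13, 0.15]),   # algorithm 1
              np.array([0.13, 0.15, 0.12, 0.14, 0.16]),   # algorithm 2
              np.array([0.18, 0.20, 0.17, 0.19, 0.21])]   # algorithm 3

    f_stat, p_value = stats.f_oneway(*errors)
    print("F =", round(f_stat, 2), " p-value =", round(p_value, 4))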

27
(No Transcript)
28
(No Transcript)
29
Other Tests
  • Range test (Newman-Keuls)
  • Nonparametric tests (Sign test, Kruskal-Wallis)
  • Contrasts: check whether algorithms 1 and 2 differ from 3, 4, and
    5
  • Multiple comparisons require a Bonferroni
    correction: if there are m tests, to have an
    overall significance of α, each test should use
    a significance of α/m (see the sketch below).
  • Regression: the CLT states that the sum of iid
    variables from any distribution is approximately
    normal, so the preceding methods can be used.
  • Other loss functions?
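
A minimal sketch of the Bonferroni correction mentioned above; m = 10 and α = 0.05 are illustrative.

    # Minimal sketch: per-test significance after a Bonferroni correction.
    alpha, m = 0.05, 10
    print("per-test significance:", alpha / m)   # 0.005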

30
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

31
Prediction Assessment
  • As in classification: performance evaluation via
    cross-validation
  • Difference: the error rate is not appropriate
  • Performance measures for prediction:
  • Mean squared error
  • Root mean squared error
  • Mean absolute error
  • Relative squared error
  • Root relative squared error
  • Relative absolute error
  • Correlation coefficient

32
Prediction Assessment
  • Performance measures for prediction (p =
    predicted values, a = actual values)
  • Mean squared error
  • Root mean squared error
  • Mean absolute error
  • Relative squared error
  • Root relative squared error
  • Relative absolute error
  • Correlation coefficient
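
A minimal Python sketch of these measures using their usual definitions; the predicted and actual values are illustrative.

    # Minimal sketch: prediction performance measures (toy p and a).
    import numpy as np

    p = np.array([2.5, 0.0, 2.1, 7.8])      # predicted values
    a = np.array([3.0, -0.5, 2.0, 7.0])     # actual values
    a_mean = a.mean()

    mse  = np.mean((p - a) ** 2)                                 # mean squared error
    rmse = np.sqrt(mse)                                          # root mean squared error
    mae  = np.mean(np.abs(p - a))                                # mean absolute error
    rse  = np.sum((p - a) ** 2) / np.sum((a - a_mean) ** 2)      # relative squared error
    rrse = np.sqrt(rse)                                          # root relative squared error
    rae  = np.sum(np.abs(p - a)) / np.sum(np.abs(a - a_mean))    # relative absolute error
    corr = np.corrcoef(p, a)[0, 1]                               # correlation coefficient
    print(mse, rmse, mae, rse, rrse, rae, corr)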

33
Prediction Assessment
  • Performance measures for prediction
  • Minimize the error measures and maximize the correlation
    coefficient
  • A significance test is applied to the chosen performance measure
    (e.g., the mean squared error)

34
Learning Objectives
  • Understand cross-validation and resampling
    methods.
  • Understand how to measure error.
  • Understand hypothesis testing.
  • Understand how to compare classification
    algorithms' performance.
  • Understand how to assess a prediction algorithm's
    performance.
  • Understand how to assess a clustering algorithm's
    performance.

35
Minimum Description Length Principle
  • The MDL (minimum description length)
    principle states that the best theory for some
    data is the one that minimizes the size of the
    model plus the amount of information
    necessary to specify the exceptions relative to
    the theory.
  • Choose the theory that minimizes L(T) + L(E|T), where L(T) is
    the number of bits needed to code the theory and
    L(E|T) is the number of bits needed to code the training set
    given the theory.

36
Clustering Assessment
  • Clustering assessment
  • Evaluate how well the clusters found match predefined
    classes (supervised evaluation; see the sketch below).
  • Evaluate by usefulness in the application context.
  • Evaluate by the minimum description length principle:
  • the best clustering will support the most efficient
    encoding of the samples by the clusters.
  • Example:
  • Encode the cluster centers.
  • For each sample, code the cluster it belongs
    to and its displacement/coordinates from the
    cluster center.
  • The better the clustering fits the data, the more
    compact the representation will be.
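
A minimal Python sketch of the supervised evaluation mentioned above, comparing found clusters to predefined classes with the adjusted Rand index (one possible choice, not prescribed by the slides); the labels are illustrative.

    # Minimal sketch: compare found clusters to predefined classes (toy labels).
    from sklearn.metrics import adjusted_rand_score

    true_classes   = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    found_clusters = [1, 1, 0, 0, 0, 0, 2, 2, 2]

    print("adjusted Rand index:",
          round(adjusted_rand_score(true_classes, found_clusters), 3))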