Combining Classification and Model Trees for Handling Ordinal Problems

Provided by: sot2

Transcript and Presenter's Notes

1
Combining Classification and Model Trees for
Handling Ordinal Problems

D. Anyfantis, M. Karagiannopoulos, S. B.
Kotsiantis, P. E. Pintelas
Educational Software Development Laboratory
and
Computers and Applications Laboratory
Department of Mathematics, University of Patras,
Greece

2
Aim

Handling the problem of learning to predict
ordinal (i.e., ordered discrete) classes.
To propose a technique that can be a more robust
solution to the problem.

3
Contents

Introduction
Techniques for Dealing with Ordinal Problems
Proposed Technique
Experiments
Conclusions

4
Ordinal Classification Problems

A class of problems between classification and
regression (discrete classes with a linear
ordering)
Given ordered classes, one is not only interested
in maximizing the classification accuracy, but
also in minimizing the distances between the
actual and the predicted classes.

5
Simple Techniques for Dealing with Ordinal
Problems

Applying classification algorithms while
discarding the ordering information in the class
attribute.
Applying regression algorithms after mapping each
class to a numeric value.
Reducing the multi-class ordinal classification
problem to a set of binary classification
problems using the one-against-all approach.
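
The regression route can be illustrated with a minimal sketch. It assumes the classes are mapped to consecutive integers 0, 1, 2, ... in their natural order and that a numeric prediction is turned back into a class by rounding to the nearest index; the function names are hypothetical and the slides do not state which mapping was actually used.

```python
import math

def encode_ordinal(labels, ordered_classes):
    """Map each label to its position in the ordered class list."""
    index = {c: i for i, c in enumerate(ordered_classes)}
    return [index[y] for y in labels]

def decode_prediction(value, ordered_classes):
    """Turn a numeric prediction back into the nearest ordinal class."""
    i = math.floor(value + 0.5)                    # round half up
    i = max(0, min(len(ordered_classes) - 1, i))   # clamp to the valid range
    return ordered_classes[i]

classes = ["low", "medium", "high"]
targets = encode_ordinal(["low", "high", "medium"], classes)
label = decode_prediction(1.4, classes)
```

Any regression learner could be trained on `targets`; only the encoding and decoding steps are shown here.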

6
A More Sophisticated Technique (ORD)

Converting the original ordinal class problem
into a series of binary problems that also encode
the ordering of the original classes. However,
to predict the class value of an unseen instance,
this variant needs to estimate the probabilities
of the k original ordinal classes using k - 1
models.
For a three-class ordinal problem, the
probability of the first ordinal class value
depends on a single classifier, P(Target < second
value), as does that of the last ordinal class,
P(Target > second value). For the class value in
the middle of the range, however, the probability
depends on a pair of classifiers and is given by
P(Target > first value) × (1 - P(Target > second
value))
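
The three-class case above can be sketched as follows. The sketch assumes two binary models that output P(Target > first value) and P(Target > second value); with three classes, P(Target < second value) is taken to equal 1 - P(Target > first value). The function names and the example probabilities are hypothetical.

```python
def ord_three_class_scores(p_gt_first, p_gt_second):
    """Combine two binary estimates into one score per ordinal class."""
    p_first = 1.0 - p_gt_first                    # P(Target < second value)
    p_middle = p_gt_first * (1.0 - p_gt_second)   # product form from the slide
    p_last = p_gt_second                          # P(Target > second value)
    return [p_first, p_middle, p_last]

def predict_class(p_gt_first, p_gt_second, ordered_classes):
    """Return the class with the highest combined score."""
    scores = ord_three_class_scores(p_gt_first, p_gt_second)
    best = max(range(3), key=lambda i: scores[i])
    return ordered_classes[best]

scores = ord_three_class_scores(0.8, 0.25)
label = predict_class(0.8, 0.25, ["low", "medium", "high"])
```

Because the two binary models are trained independently, the three scores need not sum exactly to one; only their ranking is used for the prediction.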

7
Proposed Technique (1)

Combines the predictions of a classification tree
and a model tree algorithm.
When learners are combined using a voting
methodology, we expect to obtain good results
based on the belief that the majority of
classifiers are more likely to be correct in
their decision when they agree in their opinion.

8
Proposed Technique (2)

9
Proposed Technique (3)

In the proposed ensemble the sum rule is used:
each voter gives the probability of its
prediction for each candidate.
Next, all confidence values are added for each
candidate, and the candidate with the highest sum
wins the election.
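
The sum rule can be sketched in a few lines. The sketch assumes each voter (here a classification tree and a model tree) returns a probability for every class as a dictionary; the function name and the example probabilities are hypothetical, not taken from the experiments.

```python
def sum_rule_vote(distributions):
    """Add per-class probabilities over all voters and pick the largest sum."""
    totals = {}
    for dist in distributions:
        for label, p in dist.items():
            totals[label] = totals.get(label, 0.0) + p
    # On a tie, max() keeps the class that was encountered first.
    winner = max(totals, key=totals.get)
    return winner, totals

tree_probs = {"low": 0.6, "medium": 0.3, "high": 0.1}    # classification tree
model_probs = {"low": 0.2, "medium": 0.7, "high": 0.1}   # model tree
winner, totals = sum_rule_vote([tree_probs, model_probs])
```

In this example the two voters disagree on their top class, and the sum favours the class that both rate reasonably highly.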

10
Experiments (1)

To test the hypothesis that the above method
improves the generalization performance on
ordinal prediction problems, we performed
experiments on real-world ordinal datasets
donated by Dr. Arie Ben David
(http://www.cs.waikato.ac.nz/ml/weka/).
We also used datasets from the UCI repository
because few benchmark datasets involve ordinal
class values. These datasets originally
represented numeric prediction problems, so we
converted the numeric target values into ordinal
quantities using equal-size binning (three equal
size intervals).
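
The conversion step can be sketched as below. The sketch assumes that "equal-size" means intervals of equal width over the observed target range (an equal-frequency reading is also possible); the function name and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins=3):
    """Assign each numeric target to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All targets are identical, so everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ordinal_targets = equal_width_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

The resulting bin indices 0, 1 and 2 serve as the three ordered classes.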

11
Experiments (2)

All accuracy estimates were obtained by averaging
the results from 10 separate runs of stratified
10-fold cross-validation.
In total, 26 datasets were used.
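
The evaluation protocol can be sketched with a simple fold generator. The sketch assumes that stratification means spreading each class as evenly as possible over the folds and that each of the 10 runs uses a different random shuffle; the function names and the seeding scheme are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Split instance indices into folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class to the folds in turn.
        for i in indices:
            folds[position % n_folds].append(i)
            position += 1
    return folds

def repeated_stratified_folds(labels, n_folds=10, n_runs=10):
    """One fold assignment per run, each with its own shuffle."""
    return [stratified_folds(labels, n_folds, seed=run) for run in range(n_runs)]

labels = [0] * 30 + [1] * 20 + [2] * 10
runs = repeated_stratified_folds(labels)
```

For each run, every fold serves once as the test set while the remaining nine form the training set, and the 100 resulting estimates are averaged.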

12
Experiments (3)

For each data set the algorithms are compared
according to
classification accuracy (the rate of correct
predictions)
mean absolute error,
(|p1 - a1| + ... + |pn - an|) / n,
where p1, ..., pn are the predicted values and
a1, ..., an the actual values.
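
Both measures can be computed as in the sketch below. It assumes the ordinal classes have been encoded as consecutive integers so that the difference between two classes is meaningful; the function names and the sample predictions are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of instances whose predicted class equals the actual class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

def mean_absolute_error(predicted, actual):
    """Average distance between predicted and actual class indices."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

predicted = [0, 1, 2, 2]
actual = [0, 2, 2, 0]
acc = accuracy(predicted, actual)
mae = mean_absolute_error(predicted, actual)
```

The two errors in this example count equally for accuracy, but the prediction that is two classes away contributes twice as much to the mean absolute error.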

13
Results (1)

The table shows the summary results for the
proposed technique (Vote-C4.5-M5') in comparison
with
C4.5 without any modification
C4.5 in conjunction with the ordinal
classification method (C4.5-ORD)
classification via regression using M5'

Datasets               Vote-C4.5-M5'   M5'     C4.5    C4.5-ORD
Average accuracy (%)   75.44           74.59   75.06   75.33
Average mean error     0.28            0.29    0.30    0.30

14
Statistical Results (with respect to root mean
square error)

The presented ensemble is significantly more
accurate than M5' in 4 out of the 26 datasets,
whilst it has significantly higher root mean
square error in no dataset.
The presented ensemble also has significantly
lower root mean square error than both C4.5 and
C4.5-ORD in 8 out of the 26 datasets, whereas it
is significantly less accurate in no dataset.

15
Statistical Results (with respect to
classification accuracy)

The presented ensemble is significantly more
accurate than M5' in 4 out of the 26 datasets,
whilst it has a significantly higher error rate
in 2 datasets.
The presented ensemble also has a significantly
lower error rate than C4.5-ORD in 3 out of the 26
datasets, whereas it is significantly less
accurate in 1 dataset.
The proposed method is significantly more
accurate than C4.5 in 1 out of the 26 datasets,
whilst it has a significantly higher error rate
in no dataset.

16
Discussion

If the ranking problem is posed as a
classification problem, the inherent ordering
present in ranked data is not exploited, and the
generalization ability of such classifiers is
therefore severely limited.
On the other hand, posing the task of sorting as
a regression problem leads to a highly
constrained problem.

17
Conclusion

According to our experiments on synthetic and
real ordinal data sets, the proposed method
manages to minimize the distances between the
actual and the predicted classes while slightly
improving, rather than harming, the
classification accuracy.

18
Future work

More extensive experiments with real ordinal data
sets from diverse areas will be needed to
establish the precise capabilities and relative
advantages of this methodology.

19
Thank you