Combining Classification and Model Trees for Handling Ordinal Problems

1
Combining Classification and Model Trees for
Handling Ordinal Problems
  • D. Anyfantis, M. Karagiannopoulos, S. B.
    Kotsiantis, P. E. Pintelas
  • Educational Software Development Laboratory
    and Computers and Applications Laboratory,
    Department of Mathematics, University of Patras,
    Greece

2
Aim
  • To address the problem of learning to predict
    ordinal (i.e., ordered discrete) classes.
  • To propose a technique that offers a more robust
    solution to this problem.

3
Contents
  • Introduction
  • Techniques for Dealing with Ordinal Problems
  • Proposed Technique
  • Experiments
  • Conclusions

4
Ordinal Classification Problems
  • A class of problems between classification and
    regression (discrete classes with a linear
    ordering)
  • Given ordered classes, one is not only interested
    in maximizing the classification accuracy, but
    also in minimizing the distances between the
    actual and the predicted classes.

5
Simple Techniques for Dealing with Ordinal
Problems
  • Applying classification algorithms after
    discarding the ordering information in the class
    attribute.
  • Applying regression algorithms after mapping each
    class to a numeric value.
  • Reducing the multi-class ordinal classification
    problem to a set of binary classification
    problems using the one-against-all approach.
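
The regression-based option above can be sketched as follows. This is only an illustration, assuming each class is mapped to its rank (0, 1, 2, ...) and that a numeric prediction is rounded back to the nearest valid rank; the slides do not say which numeric values or which rounding rule were used, and the labels and helper names here are hypothetical.

```python
def fit_class_mapping(ordered_classes):
    """Map each ordinal class label to its rank (0, 1, 2, ...)."""
    return {label: rank for rank, label in enumerate(ordered_classes)}


def numeric_to_class(value, ordered_classes):
    """Round a regression output to the nearest rank and clip it to the valid range."""
    rank = int(round(value))
    rank = max(0, min(rank, len(ordered_classes) - 1))
    return ordered_classes[rank]


# Hypothetical ordered labels, used only for illustration.
classes = ["low", "medium", "high"]
mapping = fit_class_mapping(classes)

# Numeric targets that a regression learner would be trained on.
targets = [mapping[c] for c in ["low", "high", "medium"]]

# Regression outputs (made-up numbers) converted back to class labels.
predicted = [numeric_to_class(v, classes) for v in [0.2, 1.7, 3.4, -0.6]]
```

Outputs outside the rank range are clipped to the first or last class, which is one simple way to keep every prediction a valid label.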

6
A More Sophisticated Technique (ORD)
  • The original ordinal class problem is converted
    into a series of binary problems that also encode
    the ordering of the original classes. To predict
    the class value of an unseen instance, this
    variant needs to estimate the probabilities of
    the k original ordinal classes using k - 1
    models.
  • For a three-class ordinal problem, the
    probability of the first ordinal class value
    depends on a single classifier,
    P(Target < second value), as does that of the
    last ordinal class, P(Target > second value).
    For the class value in the middle of the range,
    however, the probability depends on a pair of
    classifiers and is given by
  • P(Target > first value) × (1 - P(Target > second
    value))
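
The combination step described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes the k - 1 binary models each output an estimate of P(Target > i-th value), it writes the first class as 1 - P(Target > first value) (the same quantity as P(Target < second value) for ordered discrete classes), and it extends the three-class product rule to k classes, which the slides do not state. The function names are hypothetical.

```python
def ordinal_probs(p_greater):
    """Turn k-1 binary estimates into k class scores.

    p_greater[i] is the estimated P(Target > (i+1)-th value),
    so a three-class problem supplies two numbers.
    """
    k = len(p_greater) + 1
    probs = []
    for i in range(k):
        if i == 0:
            # First class: the target is not above the first value.
            probs.append(1.0 - p_greater[0])
        elif i == k - 1:
            # Last class: the target is above the second-to-last value.
            probs.append(p_greater[-1])
        else:
            # Middle class: above the previous value, not above this one.
            probs.append(p_greater[i - 1] * (1.0 - p_greater[i]))
    return probs


def predict_rank(p_greater):
    """Return the index of the class with the highest score."""
    probs = ordinal_probs(p_greater)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up outputs of the two binary models for one instance.
scores = ordinal_probs([0.8, 0.3])
best = predict_rank([0.8, 0.3])  # the middle class has the highest score
```

With the product form the scores need not sum to one, so they are better read as relative scores for picking the winning class than as a calibrated distribution.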

7
Proposed Technique (1)
  • Combines the predictions of a classification tree
    and a model tree algorithm.
  • When learners are combined using a voting
    methodology, we expect to obtain good results
    based on the belief that the majority of
    classifiers are more likely to be correct in
    their decision when they agree in their opinion.

8
Proposed Technique (2)

9
Proposed Technique (3)
  • The proposed ensemble uses the sum rule: each
    voter gives the probability of its prediction
    for each candidate class.
  • The confidence values are then added for each
    candidate, and the candidate with the highest sum
    wins the election.
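
The sum rule can be illustrated with a short sketch. It assumes each voter (here, a classification tree and a model tree) already supplies one confidence value per class; how the model tree produces those values is not covered in the slides, so the numbers below are made up and the function name is hypothetical.

```python
def sum_rule_vote(voter_probs):
    """Add the per-class confidences of all voters and pick the largest sum.

    voter_probs is a list with one entry per voter; each entry is a list
    holding that voter's confidence for every candidate class.
    """
    n_classes = len(voter_probs[0])
    totals = [0.0] * n_classes
    for probs in voter_probs:
        if len(probs) != n_classes:
            raise ValueError("every voter must score the same set of classes")
        for c, p in enumerate(probs):
            totals[c] += p
    # On a tie, max() keeps the first (lowest-index) class.
    winner = max(range(n_classes), key=totals.__getitem__)
    return winner, totals


# Made-up confidences for a three-class problem.
classification_tree = [0.5, 0.4, 0.1]
model_tree = [0.2, 0.5, 0.3]
winner, totals = sum_rule_vote([classification_tree, model_tree])
```

In this example the classification tree alone would choose the first class, but the summed confidences favour the second, which is the kind of correction the ensemble is meant to provide.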

10
Experiments (1)
  • To test the hypothesis that the above method
    improves the generalization performance on
    ordinal prediction problems, we performed
    experiments on real-world ordinal datasets
    donated by Dr. Arie Ben David
    (http://www.cs.waikato.ac.nz/ml/weka/).
  • Because few benchmark datasets with ordinal class
    values are available, we also used datasets from
    the UCI repository. These datasets represent
    numeric prediction problems, so we converted the
    numeric target values into ordinal quantities
    using equal-size binning (three equal-size
    intervals).
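
The conversion of numeric targets into three ordinal classes might look like the sketch below. It assumes that "equal-size intervals" means intervals of equal width between the smallest and largest target value; if equal-frequency bins were intended, the cut points would differ. The function name and sample values are hypothetical.

```python
def equal_width_bins(values, n_bins=3):
    """Assign each numeric value to one of n_bins equal-width intervals.

    Returns a list of bin indices (0 = lowest interval).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All targets identical: everything falls into the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clip it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up numeric targets spanning the range 0 to 9.
targets = [0.0, 1.0, 2.9, 3.0, 5.5, 9.0]
ordinal_targets = equal_width_bins(targets)
```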

11
Experiments (2)
  • All accuracy estimates were obtained by averaging
    the results from 10 separate runs of stratified
    10-fold cross-validation.
  • 26 datasets
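
The evaluation protocol can be sketched with the standard library alone. This is a hedged illustration, not the tooling the authors used: `stratified_folds` and `repeated_cv_estimate` are hypothetical helpers, and `evaluate_fold` stands for any user-supplied function that trains on the training indices and returns a score on the test indices.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Return a fold index per instance, keeping class proportions similar in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the instances of this class round-robin across the folds.
        for j, idx in enumerate(indices):
            fold_of[idx] = (offset + j) % n_folds
        offset += len(indices)
    return fold_of


def repeated_cv_estimate(labels, evaluate_fold, n_folds=10, n_runs=10):
    """Average evaluate_fold(train_idx, test_idx) over n_runs x n_folds splits."""
    scores = []
    for run in range(n_runs):
        # A different seed per run gives a different random partition.
        fold_of = stratified_folds(labels, n_folds, seed=run)
        for f in range(n_folds):
            test_idx = [i for i, g in enumerate(fold_of) if g == f]
            train_idx = [i for i, g in enumerate(fold_of) if g != f]
            scores.append(evaluate_fold(train_idx, test_idx))
    return sum(scores) / len(scores)


# Made-up labels: 30, 20 and 10 instances of three ordinal classes.
labels = [0] * 30 + [1] * 20 + [2] * 10
folds = stratified_folds(labels)
# Dummy evaluator that just reports the test-fold size.
mean_test_size = repeated_cv_estimate(labels, lambda train, test: len(test))
```

Averaging over the 100 resulting train/test splits reduces the variance that a single partition into folds would introduce.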

12
Experiments (3)
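  • For each dataset the algorithms are compared
    according to:
  • classification accuracy (the rate of correct
    predictions)
  • mean absolute error,
    $\mathrm{MAE} = \frac{|p_1 - a_1| + \dots + |p_n - a_n|}{n}$,
    where $p_i$ are the predicted values, $a_i$ the
    actual values, and $n$ the number of test
    instances.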
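
Both measures are easy to compute once the ordinal classes are encoded as ranks. The sketch below assumes that encoding (0, 1, 2, ...) and uses made-up predictions; it shows that two classifiers with the same accuracy can differ in mean absolute error, which is why both measures are reported.

```python
def accuracy(predicted, actual):
    """Fraction of instances whose predicted class equals the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_absolute_error(predicted, actual):
    """Average distance between predicted and actual class ranks."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up class ranks for five test instances.
actual = [0, 2, 2, 1, 2]
pred_a = [0, 1, 2, 2, 0]  # last prediction is two classes away
pred_b = [0, 1, 2, 2, 1]  # last prediction is one class away

acc_a, acc_b = accuracy(pred_a, actual), accuracy(pred_b, actual)
mae_a = mean_absolute_error(pred_a, actual)
mae_b = mean_absolute_error(pred_b, actual)
```

Both classifiers get two of the five instances right, but the second one has the lower mean absolute error because its mistakes land closer to the true class.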

13
Results (1)
  • The table shows the summary results for the
    proposed technique (Vote-C4.5-M5') in comparison
    with:
  • C4.5 without any modification
  • C4.5 in conjunction with the ordinal
    classification method (C4.5-ORD)
  • classification via regression using M5'
Datasets            Vote-C4.5-M5'   M5'     C4.5    C4.5-ORD
AVERAGE accuracy    75.44           74.59   75.06   75.33
AVERAGE MeanError   0.28            0.29    0.30    0.30

14
Statistical Results (Root Mean Square Error)
  • The presented ensemble is significantly more
    accurate than M5' in 4 out of the 26 datasets,
    whilst it has a significantly higher root mean
    square error in no dataset.
  • The presented ensemble also has a significantly
    lower root mean square error than both C4.5 and
    C4.5-ORD in 8 out of the 26 datasets, whereas it
    is significantly less accurate in no dataset.

15
Statistical Results (Classification Accuracy)
  • The presented ensemble is significantly more
    accurate than M5' in 4 out of the 26 datasets,
    whilst it has a significantly higher error rate
    in 2 datasets.
  • The presented ensemble also has a significantly
    lower error rate than C4.5-ORD in 3 out of the 26
    datasets, whereas it is significantly less
    accurate in 1 dataset.
  • The proposed method is significantly more
    accurate than C4.5 in 1 out of the 26 datasets,
    whilst it has a significantly higher error rate
    in no dataset.

16
Discussion
  • If the ranking problem is posed as a
    classification problem, the inherent structure
    present in ranked data is not exploited, and
    hence the generalization ability of such
    classifiers is severely limited.
  • On the other hand, posing the task of sorting as
    a regression problem leads to a highly
    constrained problem.

17
Conclusion
  • According to our experiments on synthetic and
    real ordinal datasets, the proposed method
    manages to minimize the distances between the
    actual and the predicted classes while not
    harming, and in fact slightly improving, the
    classification accuracy.

18
Future work
  • More extensive experiments with real ordinal data
    sets from diverse areas will be needed to
    establish the precise capabilities and relative
    advantages of this methodology.

19
Thank you
  • Any questions?