Feature Selection for Regression Problems

Learn more at: http://www.math.upatras.gr
1
Feature Selection for Regression Problems
  • M. Karagiannopoulos, D. Anyfantis, S. B.
    Kotsiantis, P. E. Pintelas
  • Educational Software Development Laboratory
  • and
  • Computers and Applications Laboratory
  • Department of Mathematics, University of Patras,
    Greece

2
Scope
  • To investigate which wrapper feature selection technique (if any) is most suitable for some well-known regression algorithms.

3
Contents
  • Introduction
  • Feature selection techniques
  • Wrapper algorithms
  • Experiments
  • Conclusions

4
Introduction
  • What is the feature subset selection problem?
  • It occurs prior to applying the learning (induction) algorithm
  • It is the selection of the relevant features (variables) that influence the predictions of the learning algorithm

5
Why is feature selection important?
  • It may improve the performance of the learning algorithm
  • The learning algorithm may not scale up to the full feature set, either in samples or in time
  • It allows us to better understand the domain
  • It is cheaper to collect a reduced set of features

6
Characterising features
  • Generally, features are characterised as:
  • Relevant: features which have an influence on the output and whose role cannot be assumed by the rest
  • Irrelevant: features having no influence on the output; their values could be generated at random for each example without changing it
  • Redundant: a redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy)

7
Typical Feature Selection - First step
[Flow diagram: (1) Generation produces a candidate subset from the original feature set; (2) Evaluation measures the goodness of the subset; (3) Stopping criterion: if not satisfied, return to generation; (4) Validation of the selected subset.]
8
Typical Feature Selection - Second step
Evaluation measures the goodness of the subset and compares it with the previous best subset; if the new subset is found to be better, it replaces the previous best.
9
Typical Feature Selection - Third step
  • Stopping criteria based on the generation procedure:
  • a pre-defined number of features
  • a pre-defined number of iterations
  • Stopping criteria based on the evaluation function:
  • neither addition nor deletion of a feature produces a better subset
  • an optimal subset, according to some evaluation function, has been reached

10
Typical Feature Selection - Fourth step
Validation is basically not part of the feature selection process itself: the selected subset is compared with already established results, or with the results of competing feature selection methods.
11
Categorization of feature selection techniques
  • Feature selection methods fall into two broad groups:
  • Filter methods take the set of features, attempt to trim some, and then hand the reduced set to the learning algorithm
  • Wrapper methods use the accuracy of the learning algorithm itself as the evaluation measure
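The filter idea can be sketched with a simple correlation-based ranking. This is a minimal illustration, not a method from the presentation: the scoring rule (absolute Pearson correlation with the target) and the toy data are assumptions for demonstration only.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def correlation_filter(X, y, m):
    """Filter method: rank features by |correlation| with the target and
    keep the top m, independently of any learning algorithm."""
    scores = [(abs(pearson([row[j] for row in X], y)), j)
              for j in range(len(X[0]))]
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:m])

# Toy data: feature 0 equals the target, feature 2 is its negation,
# feature 1 is unrelated -- the filter keeps features 0 and 2.
X = [[1, 5, -1], [2, 1, -2], [3, 4, -3], [4, 2, -4]]
y = [1, 2, 3, 4]
print(correlation_filter(X, y, 2))  # -> [0, 2]
```

Note that the learner never runs during selection; that is exactly what distinguishes a filter from a wrapper.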

12
Argument for wrapper methods
  • The estimated accuracy of the learning algorithm is the best available heuristic for measuring the value of features.
  • Different learning algorithms may perform better
    with different feature sets, even if they are
    using the same training set.

13
Wrapper selection algorithms (1)
  • The simplest method is forward selection (FS). It
    starts with the empty set and greedily adds
    features one at a time (without backtracking).
  • Backward stepwise selection (BS) starts with all
    features in the feature set and greedily removes
    them one at a time (without backtracking).
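The forward-selection wrapper described above can be sketched as follows. This is a minimal sketch: in a real wrapper, `evaluate` would be the cross-validated performance of the wrapped learner on the candidate subset; the `toy_evaluate` function here is purely hypothetical.

```python
def forward_selection(features, evaluate):
    """Greedy forward selection (no backtracking): start from the empty set
    and repeatedly add the single feature that most improves the score."""
    selected = []
    best_score = evaluate(selected)
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            score = evaluate(selected + [f])
            if score > best_score:
                best_score, best_feature = score, f
                improved = True
        if improved:
            selected.append(best_feature)
    return selected, best_score

# Hypothetical evaluator: rewards features 0 and 2, penalises extras.
def toy_evaluate(subset):
    target = {0, 2}
    return len(target & set(subset)) - 0.5 * len(set(subset) - target)

print(forward_selection([0, 1, 2, 3], toy_evaluate))  # -> ([0, 2], 2.0)
```

Backward selection (BS) is the mirror image: start from the full set and greedily remove the feature whose removal most improves the score.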

14
Wrapper selection algorithms (2)
  • Best-first search starts with an empty set of features and generates all possible single-feature expansions. The subset with the highest evaluation is chosen and expanded in the same manner by adding single features (with backtracking). Best-first search can be combined with forward selection (BFFS) or with backward selection (BFBS).
  • Genetic algorithm selection. A solution is typically a fixed-length binary string representing a feature subset; the value of each position in the string represents the presence or absence of a particular feature. The algorithm is an iterative process in which each successive generation is produced by applying genetic operators, such as crossover and mutation, to the members of the current generation.
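The bit-string scheme above can be sketched with a minimal genetic algorithm. This is an assumption-laden illustration, not the authors' implementation: truncation selection, one-point crossover, the 0.1 mutation rate, and the toy fitness are all choices made here for brevity; in a real wrapper the fitness would be the learner's cross-validated performance.

```python
import random

def genetic_selection(n_features, fitness, pop_size=20, generations=30, seed=0):
    """Minimal GA selection: individuals are fixed-length bit strings
    (bit j = 1 means feature j is included). Each generation keeps the top
    half and fills the rest with mutated one-point crossovers of parents."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, n_features)        # one-point crossover
            child = [1 - bit if rng.random() < 0.1 else bit  # mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Hypothetical fitness: reward features 0 and 2, penalise every included bit.
def fitness(bits):
    return 2 * (bits[0] + bits[2]) - sum(bits)

best = genetic_selection(4, fitness)
```

Because the top half survives unchanged, the best subset found so far is never lost between generations.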

15
Experiments
  • For the purpose of the present study, we used 4 well-known learning algorithms (RepTree, M5rules, K, SMOreg), the presented feature selection algorithms, and 12 datasets from the UCI repository.

16
Methodology of experiments
  • The whole training set was divided into ten mutually exclusive, equal-sized subsets, and for each subset the learner was trained on the union of all the other subsets.
  • The best features are selected according to the feature selection algorithm, and the performance of the subset is measured by how well it predicts the values of the test instances.
  • This cross-validation procedure was run 10 times for each algorithm, and the average value over the 10 cross-validations was calculated.
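The splitting procedure described above can be sketched as follows. This is a minimal sketch of 10-fold splitting; the index-interleaved fold assignment is an assumption for illustration, not the paper's exact protocol.

```python
def kfold_splits(n_samples, k=10):
    """Split indices 0..n_samples-1 into k mutually exclusive,
    near-equal-sized test folds; for each fold, the training set is
    the union of the other k - 1 folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, test

# Each of the 10 test folds is disjoint from its training set, and the
# test folds together cover every sample exactly once.
splits = list(kfold_splits(20))
```

Repeating this whole procedure 10 times and averaging, as the slide describes, reduces the variance introduced by any single partition.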

17
Experiment with regression tree - RepTree
BS is a slightly better feature selection method (on average) than the others for RepTree.
18
Experiment with rule learner- M5rules
BS, BFBS, and GS are the best feature selection methods (on average) for the M5rules learner.
19
Experiment with instance based learner - K
BS and BFBS are the best feature selection methods (on average) for the K algorithm.
20
Experiment with SMOreg
Similar results were obtained from all feature selection methods.
21
Conclusions
  • None of the described feature selection algorithms is superior to the others on all data sets for a specific learning algorithm.
  • None of the described feature selection algorithms is superior to the others on all data sets across learning algorithms.
  • Backward selection strategies are very inefficient for large-scale datasets, which may have hundreds of original features.
  • Forward selection wrapper methods are less able to improve the performance of a given learner, but they are less expensive in terms of computational effort and use fewer features for the induction.
  • Genetic selection typically requires a large number of evaluations to reach a minimum.

22
Future Work
  • We will use a light filter feature selection
    procedure as a preprocessing step in order to
    reduce the computational cost of the wrapping
    procedure without harming accuracy.