Feature Selection (presentation transcript)
1
Feature Selection
Jamshid Shanbehzadeh, Samaneh Yazdani
Department of Computer Engineering, Faculty of
Engineering, Kharazmi University (Tarbiat
Moallem University of Tehran)
2
Outline
3
Outline
  • Part 1: Dimension Reduction
  • Dimension
  • Feature Space
  • Definition & Goals
  • Curse of dimensionality
  • Research and Application
  • Grouping of dimension reduction methods
  • Part 2: Feature Selection
  • Parts of the feature set
  • Feature selection approaches
  • Part 3: Applications of Feature Selection and
    Software

4
Part 1: Dimension Reduction
5
  • Dimension Reduction
  • Dimension
  • Dimension (Feature or Variable)
  • A measurement of a certain aspect of an object
  • Two features of a person
  • weight
  • height

6
  • Dimension Reduction
  • Feature Space
  • Feature Space
  • An abstract space where each pattern sample is
    represented as a point

7
  • Dimension Reduction
  • Introduction
  • Large and high-dimensional data
  • Web documents, etc.
  • A large amount of resources is needed for
  • Information retrieval
  • Classification tasks
  • Data preservation, etc.
  • Solution: dimension reduction

8
  • Dimension Reduction
  • Definition & Goals
  • Dimensionality reduction
  • The study of methods for reducing the number
    of dimensions describing the object
  • General objectives of dimensionality reduction
  • Reduce the computational cost
  • Improve the quality of data for efficient
    data-intensive processing tasks

9
  • Dimension Reduction
  • Definition & Goals

(Figure: samples of class 1 (overweight) and class 2 (underweight) plotted by weight (kg) and height (cm); the weight axis alone still separates the two classes.)
  • Dimension Reduction
  • preserves information on classification of
    overweight and underweight as much as possible
  • makes classification easier
  • reduces data size (2 features → 1 feature)

10
  • Dimension Reduction
  • Curse of dimensionality
  • As the number of dimensions increases, a fixed
    data sample becomes exponentially sparse

Example
Observe that the data become more and more sparse
in higher dimensions (a small sketch follows below).
  • An effective solution to the problem of the curse
    of dimensionality is
  • Dimensionality reduction
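The sparsity effect is easy to demonstrate. The following is a minimal sketch (an illustration, not from the slides; it assumes Python with NumPy): a fixed sample of 1,000 uniform points in the unit hypercube, where the fraction of points near a query point collapses as the dimension grows.

```python
# Illustrative sketch: a fixed number of uniform samples becomes sparse as the
# dimension grows. For each dimension we count the average fraction of the
# 1,000 samples lying within distance 0.2 of the centre of the unit hypercube.
import numpy as np

rng = np.random.default_rng(0)
n_samples, radius = 1000, 0.2

for d in (1, 2, 5, 10, 20):
    X = rng.random((n_samples, d))            # uniform samples in [0, 1]^d
    center = np.full(d, 0.5)                  # query point at the centre
    dist = np.linalg.norm(X - center, axis=1)
    frac = np.mean(dist <= radius)            # fraction of "near" neighbours
    print(f"d={d:2d}: fraction of samples within r={radius}: {frac:.3f}")
```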

11
  • Dimension Reduction
  • Research and Application

Why has dimension reduction been a subject of so
much research recently?
  • Massive data of large dimensionality in
  • Knowledge discovery
  • Text mining
  • Web mining
  • and . . .

12
  • Dimension Reduction
  • Grouping of dimension reduction methods
  • Dimensionality reduction approaches include
  • Feature Selection
  • Feature Extraction

13
  • Dimension Reduction
  • Grouping of dimension reduction methods
    Feature Selection
  • Dimensionality reduction approaches include
  • Feature Selection: the problem of choosing a
    small subset of features that are ideally
    necessary and sufficient to describe the target
    concept.

Example
  • Feature set: {X, Y}
  • Two classes

Goal: classification
  • Keep feature X or feature Y?
  • Answer: feature X

14
  • Dimension Reduction
  • Grouping of dimension reduction methods
    Feature Selection
  • Feature Selection (FS)
  • Selects a subset of the original features
  • e.g.
  • keeps the weight feature

15
  • Dimension Reduction
  • Grouping of dimension reduction methods
  • Dimensionality reduction approaches include
  • Feature Extraction: create new features based on
    transformations or combinations of the original
    feature set.

(Figure: a new feature formed as a combination of the original features X1 and X2.)

16
  • Dimension Reduction
  • Grouping of dimension reduction methods
  • Feature Extraction (FE)
  • Generates new features
  • e.g.
  • a combined weight/height feature

17
  • Dimension Reduction
  • Grouping of dimension reduction methods
  • Dimensionality reduction approaches include
  • Feature Extraction: create new features based on
    transformations or combinations of the original
    feature set.
  • N: number of original features
  • M: number of extracted features
  • M < N (a small sketch follows below)
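To make the selection/extraction contrast concrete, here is a minimal sketch (an illustration, not from the slides; it assumes Python with NumPy and scikit-learn): selection keeps one of the original columns unchanged, while extraction (here PCA) builds M = 1 new feature as a combination of the N = 2 originals.

```python
# Feature selection keeps original columns; feature extraction (PCA) builds a
# new combined feature.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy data: N = 2 original features (e.g. weight, height) for 6 people.
X = rng.normal(loc=[70.0, 170.0], scale=[10.0, 8.0], size=(6, 2))

# Feature selection: keep only the first column (weight), drop height.
X_selected = X[:, [0]]                    # shape (6, 1), still an original feature

# Feature extraction: one principal component, a combination of both originals.
X_extracted = PCA(n_components=1).fit_transform(X)   # shape (6, 1), new feature

print(X_selected.shape, X_extracted.shape)
```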

18
  • Dimension Reduction
  • Question: Feature Selection or Feature
    Extraction?
  • It depends on the problem. Examples:
  • Pattern recognition: the dimensionality reduction
    problem is to extract a small set of features
    that recovers most of the variability of the
    data.
  • Text mining: the problem is defined as selecting
    a small subset of words or terms (not new
    features that are combinations of words or
    terms).
  • Image compression: the problem is finding the
    best extracted features to describe the image.

19
Part 2: Feature Selection
20
Feature selection
  • From thousands to millions of low-level features,
    select the most relevant ones to build better,
    faster, and easier-to-understand learning
    machines.

(Figure: data matrix X with m samples and N original features, reduced to n selected features.)
21
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Three disjoint categories of features
  • Irrelevant
  • Weakly Relevant
  • Strongly Relevant

22
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Two classes: Lion and Deer
  • We use some features to classify a new instance

To which class does this animal belong?

23
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Two classes: Lion and Deer
  • We use some features to classify a new instance

So, number of legs is an irrelevant feature
Q: Number of legs? A: 4
  • Feature 1: Number of legs

24
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Two classes: Lion and Deer
  • We use some features to classify a new instance

So, color is an irrelevant feature
Q: What is its color? A: ...
  • Feature 1: Number of legs
  • Feature 2: Color

25
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Two classes: Lion and Deer
  • We use some features to classify a new instance

So, Feature 3 is a relevant feature
Q: What does it eat? A: Grass
  • Feature 1: Number of legs
  • Feature 2: Color
  • Feature 3: Type of food

26
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Three classes: Lion, Deer and Leopard
  • We use some features to classify a new instance

To which class does this animal belong?

27
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Three classes: Lion, Deer and Leopard
  • We use some features to classify a new instance

So, number of legs is an irrelevant feature
Q: Number of legs? A: 4
  • Feature 1: Number of legs

28
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Three classes: Lion, Deer and Leopard
  • We use some features to classify a new instance

So, color is a relevant feature
Q: What is its color? A: ...
  • Feature 1: Number of legs
  • Feature 2: Color

29
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Three classes: Lion, Deer and Leopard
  • We use some features to classify a new instance

So, Feature 3 is a relevant feature
Q: What does it eat? A: Meat
  • Feature 1: Number of legs
  • Feature 2: Color
  • Feature 3: Type of food

30
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Goal: Classification
  • Three classes: Lion, Deer and Leopard
  • We use some features to classify a new instance
  • Feature 1: Number of legs
  • Feature 2: Color
  • Feature 3: Type of food
  • Add a new feature: Felidae (member of the cat
    family)
  • It is a weakly relevant feature
  • Optimal set: {Color, Type of food} or {Color,
    Felidae}

31
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Traditionally, feature selection research has
    focused on searching for relevant features.

(Figure: the feature set partitioned into relevant and irrelevant features.)
32
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant An Example for the
    Problem
  • Data set
  • Five Boolean features
  • C = F1 ∨ F2
  • F3 = ¬F2, F5 = ¬F4
  • Optimal subset
  • {F1, F2} or {F1, F3}

F1 F2 F3 F4 F5 C
0 0 1 0 1 0
0 1 0 0 1 1
1 0 1 0 1 1
1 1 0 0 1 1
0 0 1 1 0 0
0 1 0 1 0 1
1 0 1 1 0 1
1 1 0 1 0 1
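As a quick check (an illustrative sketch, not part of the slides; it assumes Python with NumPy), the stated relations C = F1 ∨ F2, F3 = ¬F2 and F5 = ¬F4 can be verified directly on the eight rows of the table:

```python
# Verify the stated relations on the 8-row Boolean data set from the slide.
import numpy as np

# Columns: F1, F2, F3, F4, F5, C
data = np.array([
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 1],
], dtype=bool)
F1, F2, F3, F4, F5, C = data.T

print("C  == F1 or F2 :", np.array_equal(C, F1 | F2))   # True
print("F3 == not F2   :", np.array_equal(F3, ~F2))      # True
print("F5 == not F4   :", np.array_equal(F5, ~F4))      # True
```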
33
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Formal Definition 1 (Irrelevance)
  • Irrelevance indicates that the feature is not
    necessary at all.
  • In the previous example
  • F4 and F5 are irrelevant

(Figure: F4 and F5 lie outside the relevant part of the feature set.)
34
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Definition 1 (Irrelevance, following [4]). Let F
    be the full set of features, Fi a feature, and
    Si = F - {Fi}. A feature Fi is irrelevant if, for
    every subset S'i ⊆ Si,
    P(C | Fi, S'i) = P(C | S'i).
  • Irrelevance indicates that the feature is not
    necessary at all.

35
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Categories of relevant features
  • Strongly Relevant
  • Weakly Relevant

(Figure: the feature set partitioned into irrelevant, weakly relevant, and strongly relevant features.)
36
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant An Example for the
    Problem

F1 F2 F3 F4 F5 C
0 0 1 0 1 0
0 1 0 0 1 1
1 0 1 0 1 1
1 1 0 0 1 1
0 0 1 1 0 0
0 1 0 1 0 1
1 0 1 1 0 1
1 1 0 1 0 1
  • Data set
  • Five Boolean features
  • C = F1 ∨ F2
  • F3 = ¬F2, F5 = ¬F4

37
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Formal Definition 2 (Strong relevance)
  • Strong relevance of a feature indicates that the
    feature is always necessary for an optimal subset
  • It cannot be removed without affecting the
    original conditional class distribution.
  • In the previous example
  • Feature F1 is strongly relevant

(Figure: F1 lies in the strongly relevant region; F4 and F5 in the irrelevant region.)
38
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Definition 2 (Strong relevance, following [4]). A
    feature Fi is strongly relevant if
    P(C | Fi, Si) ≠ P(C | Si).
  • A strongly relevant feature cannot be removed
    without affecting the original conditional class
    distribution.

39
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Formal Definition 3 (Weak relevance)
  • Weak relevance suggests that the feature is not
    always necessary but may become necessary for an
    optimal subset under certain conditions.
  • In the previous example
  • F2 and F3 are weakly relevant

(Figure: F2 and F3 in the weakly relevant region; F1 strongly relevant; F4 and F5 irrelevant.)
40
  • Feature selection
  • Parts of feature set
  • Irrelevant OR Relevant
  • Definition 3 (Weak relevance, following [4]). A
    feature Fi is weakly relevant if
    P(C | Fi, Si) = P(C | Si) and there exists a
    subset S'i ⊂ Si such that
    P(C | Fi, S'i) ≠ P(C | S'i).
  • Weak relevance suggests that the feature is not
    always necessary but may become necessary for an
    optimal subset under certain conditions.

41
  • Feature selection
  • Parts of feature set
  • Optimal Feature Subset
  • Example
  • In order to determine the target concept
    (C = g(F1, F2)):
  • F1 is indispensable
  • One of F2 and F3 can be discarded
  • Both F4 and F5 can be discarded.

Optimal subset: either {F1, F2} or {F1, F3}
  • The goal of feature selection is to find either
    of them.

42
  • Feature selection
  • Parts of feature set
  • Optimal Feature Subset

Optimal subset: either {F1, F2} or {F1, F3}
  • Conclusion
  • An optimal subset should include all strongly
    relevant features, none of irrelevant features,
    and a subset of weakly relevant features.

Which of the weakly relevant features should be
selected, and which of them removed?
43
  • Feature selection
  • Parts of feature set
  • Redundancy
  • Solution
  • Defining Feature Redundancy

44
  • Feature selection
  • Parts of feature set
  • Redundancy
  • Redundancy
  • It is widely accepted that two features are
    redundant to each other if their values are
    completely correlated
  • In the previous example
  • F2 and F3 (since F3 = ¬F2)

45
  • Feature selection
  • Parts of feature set
  • Redundancy

Markov blanket
  • The Markov blanket condition requires that Mi
    subsume not only the information that Fi has
    about C, but also about all of the other features.
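  • Formally (following the definition in [4]): given
    a feature Fi, a set Mi ⊂ F with Fi ∉ Mi is a
    Markov blanket for Fi if
    P(F - Mi - {Fi}, C | Fi, Mi) = P(F - Mi - {Fi}, C | Mi).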

46
  • Feature selection
  • Parts of feature set
  • Redundancy
  • Redundancy definition further divides weakly
    relevant features into redundant and
    non-redundant ones.

(Figure: the weakly relevant region is split into II: weakly relevant and redundant features, and III: weakly relevant but non-redundant features.)
Optimal subset = strongly relevant features + weakly relevant but non-redundant features
47
  • Feature selection
  • Approaches

48
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )
  • Framework of feature selection via subset
    evaluation

49
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Subset Generation (step 1)
(Flowchart: original feature set → (1) subset generation → (2) subset evaluation → goodness of the subset → (3) stopping criterion; if not satisfied, loop back to generation; if satisfied → (4) validation.)
50
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Subset search method - Exhaustive Search
  • Examine all combinations of features (a sketch
    follows below).
  • Example
  • {f1, f2, f3} → {f1}, {f2}, {f3}, {f1, f2},
    {f1, f3}, {f2, f3}, {f1, f2, f3}
  • Order of the search space: O(2^d), where d is the
    number of features.
  • The optimal subset is achievable.
  • Too expensive if the feature space is large.
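A minimal sketch of exhaustive subset search (illustrative only, not from the slides; it assumes Python and a user-supplied evaluation function for the "goodness" criterion):

```python
# Exhaustive subset search: enumerate every non-empty feature subset,
# score each one with an evaluation criterion, and keep the best.
from itertools import combinations

def exhaustive_search(features, evaluate):
    """features: list of feature names; evaluate: subset -> goodness score."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):     # all subsets of size k
            score = evaluate(subset)
            if score > best_score:                   # keep the best-so-far
                best_subset, best_score = subset, score
    return best_subset, best_score

# Toy usage: a stand-in criterion that prefers small subsets containing f1.
toy_eval = lambda s: ("f1" in s) - 0.1 * len(s)
print(exhaustive_search(["f1", "f2", "f3"], toy_eval))   # best subset: ('f1',)
```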

51
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Subset Evaluation (step 2)
Measures the goodness of the subset and compares it
with the previous best subset; if found better, it
replaces the previous best subset.
(Flowchart as above, highlighting step 2: subset evaluation.)
52
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Subset Evaluation
  • Each feature and feature subset needs to be
    evaluated for its importance by a criterion.
  • Based on the criterion functions used in searching
    for informative features, existing feature
    selection algorithms can generally be categorized
    as
  • Filter model
  • Wrapper model
  • Embedded methods

Note: Different criteria may select different
features.
53
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Filter
  • The filter approach utilizes the data alone to
    decide which features should be kept, without
    running the learning algorithm.
  • The filter approach basically pre-selects the
    features, and then applies the selected feature
    subset to the clustering algorithm.
  • Evaluation function ≠ classifier
  • The effect of the selected subset on classifier
    performance is ignored.

54
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Filter (1) - Independent Criteria
  • Some popular independent criteria (a small
    ranking sketch follows after this list):
  • Distance measures (Euclidean distance measure).
  • Information measures (Entropy, Information
    gain, etc.)
  • Dependency measures (Correlation coefficient)
  • Consistency measures
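As an illustration of an information measure, here is a minimal sketch (not from the slides; it assumes Python with NumPy and discrete-valued features) that ranks features by information gain, i.e. the reduction in class entropy after splitting on the feature:

```python
# Rank discrete features by information gain: IG(F) = H(C) - H(C | F).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, classes):
    h_c = entropy(classes)
    h_c_given_f = 0.0
    for value in np.unique(feature):
        mask = feature == value
        h_c_given_f += mask.mean() * entropy(classes[mask])
    return h_c - h_c_given_f

# Toy usage on the Boolean data set from the slides (columns F1..F5, class C).
X = np.array([[0,0,1,0,1],[0,1,0,0,1],[1,0,1,0,1],[1,1,0,0,1],
              [0,0,1,1,0],[0,1,0,1,0],[1,0,1,1,0],[1,1,0,1,0]])
C = np.array([0,1,1,1,0,1,1,1])
scores = [information_gain(X[:, j], C) for j in range(X.shape[1])]
print(np.round(scores, 3))   # F4 and F5 (the irrelevant features) score 0
```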

55
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Wrappers
  • In wrapper methods, the performance of a
    learning algorithm is used to evaluate the
    goodness of selected feature subsets.
  • Evaluation function = the classifier itself
  • Takes the classifier into account.

56
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Wrappers (2)
  • Wrappers utilize a learning machine as a black
    box to score subsets of features according to
    their predictive power.
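A minimal wrapper sketch (illustrative, not from the slides; it assumes Python with scikit-learn and uses the Iris data as a stand-in): greedy forward selection in which each candidate subset is scored by the cross-validated accuracy of the classifier itself.

```python
# Wrapper-style greedy forward selection: score each candidate subset by the
# cross-validated accuracy of the classifier it is intended for.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Try adding each remaining feature and keep the one that helps most.
    scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:          # stop when no feature improves the score
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print("selected feature indices:", selected,
      "CV accuracy:", round(best_score, 3))
```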

57
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Filters vs. Wrappers
  • Filters
  • Advantages
  • Fast execution: filters generally involve a
    non-iterative computation on the dataset, which
    can execute much faster than a classifier
    training session
  • Generality: since filters evaluate the intrinsic
    properties of the data, rather than their
    interactions with a particular classifier, their
    results exhibit more generality; the solution
    will be good for a larger family of classifiers
  • Disadvantages
  • The main disadvantage of the filter approach is
    that it totally ignores the effects of the
    selected feature subset on the performance of the
    induction algorithm

58
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Filters vs. Wrappers
  • Wrappers
  • Advantages
  • Accuracy: wrappers generally achieve better
    recognition rates than filters since they are
    tuned to the specific interactions between the
    classifier and the dataset
  • Disadvantages
  • Slow execution: since the wrapper must train a
    classifier for each feature subset (or several
    classifiers if cross-validation is used), the
    method can become infeasible for computationally
    intensive methods
  • Lack of generality: the solution lacks
    generality since it is tied to the bias of the
    classifier used in the evaluation function.

59
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Embedded methods
(Figure: Recursive Feature Elimination (RFE) with an SVM: start from all features, train, eliminate the lowest-ranked feature(s), and repeat; "no, continue" loops back, "yes, stop!" ends. Guyon-Weston, 2000. US patent 7,117,188.)
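A minimal RFE sketch (illustrative, not from the slides; it assumes Python with scikit-learn, whose RFE implementation follows the same recursive elimination idea):

```python
# Recursive Feature Elimination with a linear SVM: repeatedly fit the model
# and drop the lowest-weighted feature until 2 features remain.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=2, step=1)
rfe.fit(X, y)

print("selected features:", rfe.support_)    # boolean mask over the 4 features
print("elimination ranking:", rfe.ranking_)  # 1 = kept, larger = dropped earlier
```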
60
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Stopping Criterion (step 3)
  • Based on the generation procedure
  • Pre-defined number of features
  • Pre-defined number of iterations
  • Based on the evaluation function
  • whether the addition or deletion of a feature
    fails to produce a better subset
  • whether an optimal subset according to some
    evaluation function has been achieved

(Flowchart as above, highlighting step 3: stopping criterion.)
61
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Result Validation (step 4)
Basically not part of the feature selection process
itself: compare results with already established
results or with results from competing feature
selection methods.
(Flowchart as above, highlighting step 4: validation.)
62
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Subset Evaluation Advantage
  • A feature subset selected by this approach
    approximates the optimal subset

(Figure: as before, II: weakly relevant and redundant features; III: weakly relevant but non-redundant features.)
Optimal subset = strongly relevant features + weakly relevant but non-redundant features
63
  • Feature selection
  • Approaches Subset Evaluation (Feature Subset
    Selection )

Subset Evaluation Disadvantages
  • The high computational cost of the subset search
    makes the subset evaluation approach inefficient
    for high-dimensional data.

64
  • Feature selection
  • Approaches

65
  • Feature selection
  • Approaches Individual Evaluation (Feature
    Weighting/Ranking)

Individual method (Feature Ranking / Feature
weighting)
  • Individual methods evaluate each feature
    individually according to a criterion.
  • They then select the features that either satisfy
    a condition or are top-ranked.
  • (In contrast, exhaustive, greedy and random
    searches are subset search methods, because they
    evaluate each candidate subset.)

66
  • Feature selection
  • Approaches Individual Evaluation (Feature
    Weighting/Ranking)

Individual Evaluation Advantage
  • Linear time complexity in terms of the
    dimensionality N.
  • Individual methods are efficient for
    high-dimensional data.

67
  • Feature selection
  • Approaches Individual Evaluation (Feature
    Weighting/Ranking)

Individual Evaluation Disadvantages
  • It is incapable of removing redundant features
    (see the sketch below).
  • For high-dimensional data, which may contain a
    large number of redundant features, this approach
    may produce results far from optimal.

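A small illustration of this disadvantage (a sketch, not from the slides; it assumes Python with NumPy and uses per-feature correlation with the class as the individual criterion): a duplicated feature receives exactly the same individual score as the original, so a pure ranking keeps both even though one of them is redundant.

```python
# Individual ranking cannot see redundancy: a duplicated feature receives
# exactly the same per-feature score as the original, so both are kept.
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)                   # binary class labels
f_strong = y + 0.1 * rng.normal(size=200)          # informative feature
f_copy = f_strong.copy()                           # redundant duplicate
f_noise = rng.normal(size=200)                     # irrelevant feature

features = {"f_strong": f_strong, "f_copy": f_copy, "f_noise": f_noise}
scores = {name: abs(np.corrcoef(f, y)[0, 1]) for name, f in features.items()}

# Top-2 by individual score selects both the feature and its exact copy.
top2 = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores, "-> selected:", top2)   # f_strong and f_copy tie; f_noise loses
```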
68
  • Feature selection
  • Approaches

69
  • Feature selection
  • New Framework

New Framework
  • A new framework of feature selection composed of
    two steps (a sketch follows below):
  • First step (relevance analysis): determines the
    subset of relevant features by removing
    irrelevant ones.
  • Second step (redundancy analysis): determines
    and eliminates redundant features from the
    relevant ones, and thus produces the final subset.
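A minimal two-step sketch (illustrative only, not the algorithm from [4]; it assumes Python with NumPy): step 1 drops features whose correlation with the class falls below a threshold (relevance analysis), and step 2 drops features that are highly correlated with an already-kept feature (redundancy analysis).

```python
# Two-step feature selection: (1) remove irrelevant features,
# (2) remove redundant ones among the remaining relevant features.
import numpy as np

def two_step_selection(X, y, relevance_thr=0.1, redundancy_thr=0.9):
    # Step 1: relevance analysis - keep features correlated with the class.
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(X.shape[1])])
    relevant = [j for j in np.argsort(-relevance)
                if relevance[j] >= relevance_thr]

    # Step 2: redundancy analysis - drop a feature if it is too correlated
    # with one already kept (features processed in order of relevance).
    selected = []
    for j in relevant:
        redundant = any(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= redundancy_thr
                        for k in selected)
        if not redundant:
            selected.append(j)
    return selected

# Toy usage: feature 1 duplicates feature 0, feature 2 is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
X = np.column_stack([y + 0.1 * rng.normal(size=200)] * 2
                    + [rng.normal(size=200)])
print(two_step_selection(X, y))   # keeps only one of the two duplicates
```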

70
Part 3: Applications of Feature
Selection and Software
71
  • Feature selection
  • Applications of Feature Selection

72
  • Feature selection
  • Applications of Feature Selection
  • Text categorization: Importance
  • Information explosion
  • 80% of information is stored in text documents:
    journals, web pages, emails...
  • Difficult to extract specific information with
    current technologies...

73
  • Feature selection
  • Applications of Feature Selection
  • Text categorization
  • Assigning documents to a fixed set of
    categories

(Figure: a news article is fed to a categorizer, which assigns it one of a fixed set of categories: sports, cultures, health, politics, economics, vacations.)
74
  • Feature selection
  • Applications of Feature Selection
  • Text categorization
  • Text-Categorization
  • Documents are represented by a vector whose
    dimension is the size of the vocabulary,
    containing word frequency counts
  • Vocabulary of 15,000 words (i.e. each document is
    represented by a 15,000-dimensional vector)
  • Typical tasks (a small sketch follows below)
  • Automatic sorting of documents into
    web directories
  • Detection of spam e-mail
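To ground this, a minimal sketch of feature selection for text (illustrative, not from the slides; it assumes Python with scikit-learn and a tiny hypothetical toy corpus): documents become word-count vectors, and a chi-square filter keeps only the most informative terms.

```python
# Text categorization feature selection: bag-of-words vectors, then keep the
# top-k terms ranked by a chi-square filter criterion.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["cheap pills buy now", "meeting agenda attached",
        "buy cheap watches now", "project meeting tomorrow"]
labels = [1, 0, 1, 0]                        # 1 = spam, 0 = normal e-mail

X = CountVectorizer().fit_transform(docs)    # documents -> word-count vectors
selector = SelectKBest(chi2, k=3).fit(X, labels)
X_reduced = selector.transform(X)            # keep only the 3 best terms

print("vocabulary size:", X.shape[1], "-> reduced to:", X_reduced.shape[1])
```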

75
  • Feature selection
  • Applications of Feature Selection
  • Text categorization
  • The major characteristic, and difficulty, of text
    categorization:

High dimensionality of the feature space
  • Goal: reduce the original feature space without
    sacrificing categorization accuracy

76
  • Feature selection
  • Applications of Feature Selection
  • Image retrieval
  • Importance: rapid increase in the size and number
    of image collections from both civilian and
    military equipment
  • Problem: the information cannot be accessed or
    used unless it is organized.
  • Content-based image retrieval: instead of being
    manually annotated with text-based keywords,
    images are indexed by their own visual contents
    (features), such as color, texture, shape, etc.

One of the biggest obstacles to making content-based
image retrieval truly scalable to large image
collections is still the curse of dimensionality.
77
  • Feature selection
  • Applications of Feature Selection
  • Image retrieval
  • Paper: "ReliefF Based Feature Selection in
    Content-Based Image Retrieval"
  • A. Sarrafzadeh, Habibollah Agh Atabay, Mir
    Mohsen Pedram, Jamshid Shanbehzadeh
  • Image dataset: COIL-20, containing
  • 1440 grayscale pictures from 20 classes of
    objects.

78
  • Feature selection
  • Applications of Feature Selection
  • Image retrieval
  • In this paper they use:
  • Legendre moments to extract features
  • the ReliefF algorithm to select the most relevant
    and non-redundant features (a simplified Relief
    sketch follows below)
  • a support vector machine to classify the images.
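A highly simplified sketch of the Relief idea behind ReliefF (illustrative only; ReliefF itself handles multiple classes and k nearest neighbours; it assumes Python with NumPy and toy data): each feature's weight grows if it differs at the nearest miss and shrinks if it differs at the nearest hit.

```python
# Simplified (binary, 1-nearest-neighbour) Relief weight estimation.
# ReliefF extends this to multiple classes and k nearest hits/misses.
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # scale to [0, 1]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                                 # exclude the sample itself
        same, other = y == y[i], y != y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(other, dist, np.inf))  # nearest other-class sample
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# Toy usage: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.2 * rng.normal(size=100), rng.normal(size=100)])
print(np.round(relief(X, y), 3))   # feature 0 gets the larger weight
```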

(Figure: the effects of the selected features on classification accuracy.)
79
  • Feature selection
  • Weka Software: what can we do with it?
  • Weka is a piece of software, written in Java,
    that provides an array of machine learning tools,
    many of which can be used for data mining:
  • Pre-processing data
  • Feature selection
  • Feature extraction
  • Regression
  • Classifying data
  • Clustering data
  • Association rules
  • More functions:
  • Create random data sets
  • Connect to data sets in other formats
  • Visualize data

80
References
[1] M. Dash and H. Liu, "Dimensionality Reduction," in Encyclopedia of Computer Science and Engineering, John Wiley & Sons, Inc., vol. 2, pp. 958-966, 2009.
[2] H. Liu and L. Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491-502, 2005.
[3] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[4] L. Yu and H. Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy," Journal of Machine Learning Research, vol. 5, pp. 1205-1224, 2004.
[5] H. Liu and H. Motoda, "Computational Methods of Feature Selection," Chapman and Hall/CRC Press, 2007.
[6] I. Guyon, "Lecture 2: Introduction to Feature Selection."
[7] M. Dash and H. Liu, "Feature Selection for Classification."
81
References
[8] Makoto Miwa, "A Survey on Incremental Feature Extraction."
[9] Lei Yu, "Feature Selection and Its Application in Genomic Data Analysis."