1
Dimension Reduction and Feature Selection
  • Craig A. Struble, Ph.D.
  • Department of Mathematics, Statistics, and
    Computer Science
  • Marquette University

2
Overview
  • Dimension Reduction
  • Correlation
  • Principal Component Analysis
  • Singular Value Decomposition
  • Feature Selection
  • Information Content

3
Dimension Reduction
  • The complexity of learning, clustering, etc.
    grows exponentially with the number of attributes
  • This is the curse of dimensionality
  • We need methods to reduce the number of
    attributes
  • Dimension reduction reduces attributes without
    (directly) considering the relevance of each
    attribute
  • It does not really remove attributes, but
    combines/recasts them

4
Correlation
  • A causal, complementary, parallel, or reciprocal
    relationship
  • The simultaneous change in value of two
    numerically valued random variables
  • So, if one attribute's value changes in a
    predictable way whenever another one changes, why
    keep them both?

5
Correlation Analysis
  • Pearson's correlation coefficient
    rA,B = Σ (ai − mean(A))(bi − mean(B)) / (n σA σB)
  • A positive value means both increase simultaneously
  • A negative value means one increases as the other
    decreases
  • If rA,B has a large magnitude, A and B are
    strongly correlated and one of the attributes can
    be removed
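
As a concrete illustration (a sketch only; the synthetic variables x1, x2, x3 and the 0.9 cutoff are my own choices, not from the slides), the pairwise Pearson coefficients can be computed with R's cor() and one member of any strongly correlated pair dropped:

# Illustrative data: x2 is (noisily) a linear function of x1, x3 is independent
set.seed(42)
x1 <- rnorm(100)
d <- data.frame(x1 = x1,
                x2 = 2 * x1 + rnorm(100, sd = 0.1),
                x3 = rnorm(100))

r <- cor(d)                      # matrix of pairwise Pearson coefficients
round(r, 2)

# If |r| exceeds the (arbitrary) 0.9 cutoff, drop one attribute of the pair
high <- which(abs(r) > 0.9 & upper.tri(r), arr.ind = TRUE)
drop <- unique(colnames(r)[high[, "col"]])
d_reduced <- d[, setdiff(colnames(d), drop), drop = FALSE]
colnames(d_reduced)              # x2 is removed; x1 and x3 are kept

Which attribute of a correlated pair to drop is a judgment call; here the later column of each pair is discarded.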

6
Correlation Analysis
(Figure: strong relationship between two attributes)
7
Principal Component Analysis
  • Also known as the Karhunen-Loève or K-L method
  • Combine the 'essence' of attributes to create a
    (hopefully) smaller set of variables that describe
    the data
  • An instance with k attributes is a point in
    k-dimensional space
  • Find c k-dimensional orthogonal vectors that best
    represent the data, such that c < k
  • These vectors are combinations of attributes.

8
Principal Component Analysis
  • Normalize the data
  • Compute c orthonormal vectors, which are the
    principal components
  • Sort in order of decreasing significance
  • Measured in terms of data variance
  • Can reduce data dimension by choosing only the
    most significant principal components
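
A minimal R sketch of this recipe using the built-in prcomp() (the iris data and the 95% variance cutoff are illustrative assumptions, not part of the slides):

# Built-in iris measurements as an example data set with k = 4 attributes
x <- iris[, 1:4]

# Normalize (center and scale), then compute the principal components
p <- prcomp(x, center = TRUE, scale. = TRUE)

# Components are already sorted by decreasing variance
var_explained <- p$sdev^2 / sum(p$sdev^2)
round(var_explained, 3)

# Keep only the most significant components, e.g. enough for 95% of the variance
n_comp <- which(cumsum(var_explained) >= 0.95)[1]   # the 'c' of the slides
x_reduced <- p$x[, 1:n_comp, drop = FALSE]          # data recast onto c < k new axes
dim(x_reduced)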

9
Singular Value Decomposition
  • One method of computing PCA
  • Let A be an m by n matrix. Then A can be written
    as the product of matrices A = UΣVᵀ
  • such that U is an m by n matrix, V is an n by n
    matrix, and Σ is an n by n diagonal matrix with
    singular values σ1 ≥ σ2 ≥ … ≥ σn ≥ 0. Furthermore,
    the columns of U and V are orthonormal

10
Singular Value Decomposition
11
Singular Value Decomposition
> x <- t(array(1:12, dim = c(3, 4)))
> s <- svd(x)
> s$u
           [,1]        [,2]       [,3]
[1,] -0.1408767 -0.82471435 -0.3128363
[2,] -0.3439463 -0.42626394  0.7522216
[3,] -0.5470159 -0.02781353 -0.5659342
[4,] -0.7500855  0.37063688  0.1265489
> s$v
           [,1]        [,2]       [,3]
[1,] -0.5045331  0.76077568 -0.4082483
[2,] -0.5745157  0.05714052  0.8164966
[3,] -0.6444983 -0.64649464 -0.4082483
> a <- diag(s$d)
> a
         [,1]     [,2]         [,3]
[1,] 25.46241 0.000000 0.000000e+00
[2,]  0.00000 1.290662 0.000000e+00
[3,]  0.00000 0.000000 8.920717e-16
12
Singular Value Decomposition
  • The amount of variance captured by singular
    value σi is fi = σi² / (σ1² + σ2² + … + σn²)
  • The entropy of the data set is
    E = −(1 / log n) Σ fi log fi, which is 0 when a
    single component captures all of the variance and
    1 when every component contributes equally
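
Continuing the R session from the SVD slide (a sketch that assumes the result s <- svd(x) is still in scope):

sv <- s$d                        # singular values from s <- svd(x) above

# Fraction of the variance captured by each singular value
f <- sv^2 / sum(sv^2)
round(f, 4)                      # the first component captures nearly everything

# Normalized entropy of the data set, in [0, 1];
# zero fractions are dropped so that 0 * log(0) does not yield NaN
nz <- f[f > 0]
entropy <- -sum(nz * log(nz)) / log(length(sv))
entropy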

13
Feature Selection
  • Select the most relevant subset of attributes
  • Wrapper approach
  • Features are selected as part of the mining
    algorithm
  • Filter approach
  • Features are selected before the mining algorithm
    runs
  • Wrapper approach is generally more accurate but
    also more computationally expensive

14
Feature Selection
  • Feature selection is actually a search problem
  • Want to select subset of features giving most
    accurate model

(Diagram: the lattice of candidate attribute subsets for three
attributes a, b, c: {a,b,c} at the top, then {b,c}, {a,c}, {a,b},
then {a}, {b}, {c}, down to the empty set ∅)
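
To make the size of that search space concrete, here is a small R sketch that enumerates every candidate subset for three placeholder attributes a, b, c:

attrs <- c("a", "b", "c")

# Every candidate subset, from the full set down to the empty set
subsets <- c(
  unlist(lapply(length(attrs):1,
                function(k) combn(attrs, k, simplify = FALSE)),
         recursive = FALSE),
  list(character(0)))            # the empty set

length(subsets)                  # 2^3 = 8 candidate feature sets
sapply(subsets, paste, collapse = ",")

With k attributes there are 2^k subsets, which is why exhaustive search quickly becomes infeasible and heuristics (next slide) are needed.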
15
Feature Selection
  • Any search heuristic will work
  • Branch and bound
  • Best-first or A*
  • Genetic algorithms
  • etc.
  • The bigger problem is estimating the relevance of
    attributes without building a classifier.

16
Feature Selection
  • Using entropy
  • Calculate the information gain of each attribute
  • Select the l attributes with the highest
    information gain (a sketch follows below)
  • This removes attributes that are the same for all
    data instances
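
A base-R sketch of this entropy filter (the entropy() and info_gain() helpers, and the discretized iris attributes, are my own illustration rather than the presenter's code):

# Entropy (in bits) of a discrete class vector
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                  # drop empty levels so 0 * log2(0) is avoided
  -sum(p * log2(p))
}

# Information gain of a discrete attribute x with respect to the class y
info_gain <- function(x, y) {
  cond <- sum(sapply(split(y, x),
                     function(ys) length(ys) / length(y) * entropy(ys)))
  entropy(y) - cond
}

# Example: discretize the numeric iris attributes, then rank them by gain
binned <- lapply(iris[, 1:4], cut, breaks = 3)
gains <- sapply(binned, info_gain, y = iris$Species)
sort(gains, decreasing = TRUE)   # keep the l attributes with the highest gain

An attribute that has the same value for every instance has zero gain, so it falls to the bottom of the ranking and is removed.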

17
Feature Selection
  • Stepwise forward selection
  • Start with empty attribute set
  • Add best of attributes
  • Add best of remaining attributes
  • Repeat. Take the top l
  • Stepwise backward selection
  • Start with entire attribute set
  • Remove worst of attributes
  • Repeat until l are left.
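
A greedy sketch of stepwise forward selection in R; the evaluate() scorer is a hypothetical placeholder for whatever accuracy estimate the mining algorithm provides:

# Hypothetical scorer: estimates model quality for a feature subset.
# A real one would fit and cross-validate a classifier; this stand-in
# simply rewards subsets containing the (assumed) informative attributes.
evaluate <- function(features) {
  if (length(features) == 0) return(0)
  sum(c(x1 = 0.5, x2 = 0.3, x3 = 0.1)[features], na.rm = TRUE)
}

forward_select <- function(all_attrs, l, score = evaluate) {
  selected <- character(0)
  while (length(selected) < l) {
    remaining <- setdiff(all_attrs, selected)
    scores <- sapply(remaining, function(a) score(c(selected, a)))
    selected <- c(selected, remaining[which.max(scores)])   # add the best attribute
  }
  selected
}

forward_select(c("x1", "x2", "x3", "x4"), l = 2)   # "x1" "x2" with this scorer

Stepwise backward selection is the mirror image: start from the full attribute set and repeatedly remove the attribute whose loss hurts the score least, stopping when l remain.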

18
Feature Selection
  • Other methods
  • Sample data, build model for subset of data and
    attributes to estimate accuracy.
  • Select attributes with most or least variance
  • Select attributes most highly correlated with
    goal attribute.
  • What does feature selection provide you?
  • Reduced data size
  • Analysis of most important pieces of
    information to collect.
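
The last two filters are essentially one-liners in R (a sketch; using iris and a numeric recoding of its class label as the goal attribute is my own illustrative choice):

x <- iris[, 1:4]
goal <- as.numeric(iris$Species)   # purely illustrative numeric goal attribute

# Attributes ranked by variance
sort(apply(x, 2, var), decreasing = TRUE)

# Attributes ranked by |correlation| with the goal attribute
sort(abs(cor(x, goal))[, 1], decreasing = TRUE)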