1
Dimension Reduction and Feature Selection
  • Craig A. Struble, Ph.D.
  • Department of Mathematics, Statistics, and
    Computer Science
  • Marquette University

2
Overview
  • Dimension Reduction
  • Correlation
  • Principal Component Analysis
  • Singular Value Decomposition
  • Feature Selection
  • Information Content

3
Dimension Reduction
  • The complexity of learning, clustering, etc.
    grows exponentially with the number of attributes
  • This is the curse of dimensionality
  • We need methods to reduce the number of
    attributes
  • Dimension reduction reduces attributes without
    (directly) considering the relevance of each
    attribute
  • It does not really remove attributes, but
    combines/recasts them

4
Correlation
  • A causal, complementary, parallel, or reciprocal
    relationship
  • The simultaneous change in value of two
    numerically valued random variables
  • So, if one attribute's value changes in a
    predictable way whenever another one changes, why
    keep them both?

5
Correlation Analysis
  • Pearson's correlation coefficient
    rA,B = Σ (ai − mean(A))(bi − mean(B)) / (n σA σB)
  • A positive value means both increase simultaneously
  • A negative value means one increases as the other
    decreases
  • If rA,B has a large magnitude, A and B are
    strongly correlated and one of the attributes can
    be removed
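
As a concrete illustration (a sketch only; the synthetic variables x1, x2, x3 and the 0.9 cutoff are my own choices, not from the slides), the pairwise Pearson coefficients can be computed with R's cor() and one member of any strongly correlated pair dropped:

# Illustrative data: x2 is (noisily) a linear function of x1, x3 is independent
set.seed(42)
x1 <- rnorm(100)
d <- data.frame(x1 = x1,
                x2 = 2 * x1 + rnorm(100, sd = 0.1),
                x3 = rnorm(100))

r <- cor(d)                      # matrix of pairwise Pearson coefficients
round(r, 2)

# If |r| exceeds the (arbitrary) 0.9 cutoff, drop one attribute of the pair
high <- which(abs(r) > 0.9 & upper.tri(r), arr.ind = TRUE)
drop <- unique(colnames(r)[high[, "col"]])
d_reduced <- d[, setdiff(colnames(d), drop), drop = FALSE]
colnames(d_reduced)              # x2 is removed; x1 and x3 are kept

Which attribute of a correlated pair to drop is a judgment call; here the later column of each pair is discarded.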

6
Correlation Analysis
(Figure: strong relationship between two attributes)
7
Principal Component Analysis
  • Also known as the Karhunen-Loève or K-L method
  • Combine the 'essence' of attributes to create a
    (hopefully) smaller set of variables that describe
    the data
  • An instance with k attributes is a point in
    k-dimensional space
  • Find c k-dimensional orthogonal vectors that best
    represent the data, such that c < k
  • These vectors are combinations of attributes.

8
Principal Component Analysis
  • Normalize the data
  • Compute c orthonormal vectors, which are the
    principal components
  • Sort in order of decreasing significance
  • Measured in terms of data variance
  • Can reduce data dimension by choosing only the
    most significant principal components
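
A minimal R sketch of this recipe using the built-in prcomp() (the iris data and the 95% variance cutoff are illustrative assumptions, not part of the slides):

# Built-in iris measurements as an example data set with k = 4 attributes
x <- iris[, 1:4]

# Normalize (center and scale), then compute the principal components
p <- prcomp(x, center = TRUE, scale. = TRUE)

# Components are already sorted by decreasing variance
var_explained <- p$sdev^2 / sum(p$sdev^2)
round(var_explained, 3)

# Keep only the most significant components, e.g. enough for 95% of the variance
n_comp <- which(cumsum(var_explained) >= 0.95)[1]   # the 'c' of the slides
x_reduced <- p$x[, 1:n_comp, drop = FALSE]          # data recast onto c < k new axes
dim(x_reduced)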

9
Singular Value Decomposition
  • One method of computing PCA
  • Let A be an m by n matrix. Then A can be written
    as the product of matrices A = UΣVᵀ
  • such that U is an m by n matrix, V is an n by n
    matrix, and Σ is an n by n diagonal matrix with
    singular values σ1 ≥ σ2 ≥ … ≥ σn ≥ 0. Furthermore,
    the columns of U and V are orthonormal

10
Singular Value Decomposition
11
Singular Value Decomposition
> x <- t(array(1:12, dim = c(3, 4)))
> s <- svd(x)
> s$u
           [,1]        [,2]       [,3]
[1,] -0.1408767 -0.82471435 -0.3128363
[2,] -0.3439463 -0.42626394  0.7522216
[3,] -0.5470159 -0.02781353 -0.5659342
[4,] -0.7500855  0.37063688  0.1265489
> s$v
           [,1]        [,2]       [,3]
[1,] -0.5045331  0.76077568 -0.4082483
[2,] -0.5745157  0.05714052  0.8164966
[3,] -0.6444983 -0.64649464 -0.4082483
> a <- diag(s$d)
> a
         [,1]     [,2]         [,3]
[1,] 25.46241 0.000000 0.000000e+00
[2,]  0.00000 1.290662 0.000000e+00
[3,]  0.00000 0.000000 8.920717e-16
12
Singular Value Decomposition
  • The amount of variance captured by singular
    value σi is fi = σi² / (σ1² + σ2² + … + σn²)
  • The entropy of the data set is
    E = −(1 / log n) Σ fi log fi, which is 0 when a
    single component captures all of the variance and
    1 when every component contributes equally
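
Continuing the R session from the SVD slide (a sketch that assumes the result s <- svd(x) is still in scope):

sv <- s$d                        # singular values from s <- svd(x) above

# Fraction of the variance captured by each singular value
f <- sv^2 / sum(sv^2)
round(f, 4)                      # the first component captures nearly everything

# Normalized entropy of the data set, in [0, 1];
# zero fractions are dropped so that 0 * log(0) does not yield NaN
nz <- f[f > 0]
entropy <- -sum(nz * log(nz)) / log(length(sv))
entropy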

13
Feature Selection
  • Select the most relevant subset of attributes
  • Wrapper approach
  • Features are selected as part of the mining
    algorithm
  • Filter approach
  • Features are selected before the mining algorithm
    runs
  • Wrapper approach is generally more accurate but
    also more computationally expensive

14
Feature Selection
  • Feature selection is actually a search problem
  • Want to select subset of features giving most
    accurate model

(Diagram: the lattice of candidate attribute subsets for three
attributes a, b, c: {a,b,c} at the top, then {b,c}, {a,c}, {a,b},
then {a}, {b}, {c}, down to the empty set ∅)
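
To make the size of that search space concrete, here is a small R sketch that enumerates every candidate subset for three placeholder attributes a, b, c:

attrs <- c("a", "b", "c")

# Every candidate subset, from the full set down to the empty set
subsets <- c(
  unlist(lapply(length(attrs):1,
                function(k) combn(attrs, k, simplify = FALSE)),
         recursive = FALSE),
  list(character(0)))            # the empty set

length(subsets)                  # 2^3 = 8 candidate feature sets
sapply(subsets, paste, collapse = ",")

With k attributes there are 2^k subsets, which is why exhaustive search quickly becomes infeasible and heuristics (next slide) are needed.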
15
Feature Selection
  • Any search heuristic will work
  • Branch and bound
  • Best-first or A*
  • Genetic algorithms
  • etc.
  • The bigger problem is estimating the relevance of
    attributes without building a classifier.

16
Feature Selection
  • Using entropy
  • Calculate the information gain of each attribute
  • Select the l attributes with the highest
    information gain (a sketch follows below)
  • This removes attributes that are the same for all
    data instances
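
A base-R sketch of this entropy filter (the entropy() and info_gain() helpers, and the discretized iris attributes, are my own illustration rather than the presenter's code):

# Entropy (in bits) of a discrete class vector
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                  # drop empty levels so 0 * log2(0) is avoided
  -sum(p * log2(p))
}

# Information gain of a discrete attribute x with respect to the class y
info_gain <- function(x, y) {
  cond <- sum(sapply(split(y, x),
                     function(ys) length(ys) / length(y) * entropy(ys)))
  entropy(y) - cond
}

# Example: discretize the numeric iris attributes, then rank them by gain
binned <- lapply(iris[, 1:4], cut, breaks = 3)
gains <- sapply(binned, info_gain, y = iris$Species)
sort(gains, decreasing = TRUE)   # keep the l attributes with the highest gain

An attribute that has the same value for every instance has zero gain, so it falls to the bottom of the ranking and is removed.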

17
Feature Selection
  • Stepwise forward selection
  • Start with empty attribute set
  • Add best of attributes
  • Add best of remaining attributes
  • Repeat. Take the top l
  • Stepwise backward selection
  • Start with entire attribute set
  • Remove worst of attributes
  • Repeat until l are left.
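
A greedy sketch of stepwise forward selection in R; the evaluate() scorer is a hypothetical placeholder for whatever accuracy estimate the mining algorithm provides:

# Hypothetical scorer: estimates model quality for a feature subset.
# A real one would fit and cross-validate a classifier; this stand-in
# simply rewards subsets containing the (assumed) informative attributes.
evaluate <- function(features) {
  if (length(features) == 0) return(0)
  sum(c(x1 = 0.5, x2 = 0.3, x3 = 0.1)[features], na.rm = TRUE)
}

forward_select <- function(all_attrs, l, score = evaluate) {
  selected <- character(0)
  while (length(selected) < l) {
    remaining <- setdiff(all_attrs, selected)
    scores <- sapply(remaining, function(a) score(c(selected, a)))
    selected <- c(selected, remaining[which.max(scores)])   # add the best attribute
  }
  selected
}

forward_select(c("x1", "x2", "x3", "x4"), l = 2)   # "x1" "x2" with this scorer

Stepwise backward selection is the mirror image: start from the full attribute set and repeatedly remove the attribute whose loss hurts the score least, stopping when l remain.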

18
Feature Selection
  • Other methods
  • Sample data, build model for subset of data and
    attributes to estimate accuracy.
  • Select attributes with most or least variance
  • Select attributes most highly correlated with
    goal attribute.
  • What does feature selection provide you?
  • Reduced data size
  • Analysis of most important pieces of
    information to collect.
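
The last two filters are essentially one-liners in R (a sketch; using iris and a numeric recoding of its class label as the goal attribute is my own illustrative choice):

x <- iris[, 1:4]
goal <- as.numeric(iris$Species)   # purely illustrative numeric goal attribute

# Attributes ranked by variance
sort(apply(x, 2, var), decreasing = TRUE)

# Attributes ranked by |correlation| with the goal attribute
sort(abs(cor(x, goal))[, 1], decreasing = TRUE)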