Genetic Algorithm and Feature Selection - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Genetic Algorithm and Feature Selection
  • Zheng Li
  • Michigan State University

2
Overview
[Diagram labels: Gene profile; Metabolic profile; DATA; GA; FEATURE]
3
[Diagram labels: Gene profile; Metabolic profile; DATA; GA; FEATURE]
4
Genetic Algorithm
1. Encoding
  • Discrete
  • Floating
2. Crossover
  • 11001011 + 11011111 = 11001111
3. Mutation
  • 11001001 => 10001001
4. Target function: prediction accuracy, feature
subset size
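
The crossover and mutation bit strings on this slide can be reproduced with a few lines of Python. This is only a sketch: the crossover point (4), the mutated position (index 1), and the `target` function with its `size_weight` penalty are assumptions chosen to match the slide's example and its mention of prediction accuracy and feature subset size, not the presenter's actual settings.

```python
def single_point_crossover(parent_a: str, parent_b: str, point: int) -> str:
    """Child takes parent_a's bits before `point` and parent_b's bits from `point` on."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    return parent_a[:point] + parent_b[point:]


def flip_bit(chromosome: str, index: int) -> str:
    """Mutation: invert the single bit at `index`."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


def target(accuracy: float, chromosome: str, size_weight: float = 0.01) -> float:
    """Hypothetical target: reward accuracy, penalise the number of selected features."""
    return accuracy - size_weight * chromosome.count("1")


# Bit strings taken from the slide; the crossover point and bit index are assumed.
child = single_point_crossover("11001011", "11011111", point=4)
mutant = flip_bit("11001001", index=1)
print(child)   # 11001111
print(mutant)  # 10001001
```

With a discrete encoding like this, each bit marks whether one feature is in the subset, so `chromosome.count("1")` is the subset size.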
5
GA/PCA/PLS (1. select scores)
[Diagram labels: Independent variables X; Loading; Scores; Encoding; GA; Optimize; Feature space; T1; T3; T4; Regression; Dependent variables; Y]
6
GA/PCA/PLS (2. select original vars)
[Diagram labels: Encoding; GA; Optimize; Feature space; X1; X2; X4; PCA/PLS; Dependent variables; Y]
7
Results: GA/PCA/PLS (Intra TG)
Fluxes with a selection value larger than 0.5 are
selected in the PLS model.
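
The 0.5 cutoff can be read as decoding a real-valued chromosome into a feature subset. A minimal sketch follows; the function name `decode_selection` and the example values are hypothetical.

```python
def decode_selection(selection_values, threshold=0.5):
    """Return the indices whose selection value is strictly larger than the threshold."""
    return [i for i, value in enumerate(selection_values) if value > threshold]


# Four hypothetical fluxes; only those scoring above 0.5 enter the model.
selected = decode_selection([0.91, 0.12, 0.50, 0.67])
print(selected)  # [0, 3]
```

A value of exactly 0.5 is excluded here because the slide says "larger than 0.5".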
8
Results: GA/PLS/PCA (Intra TG)
PLS with all variables included has a fitness
value of 0.1715.
Fitness is defined as the sum of squared errors of
the PLS prediction.
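
A sum-of-squared-errors fitness is straightforward to compute once predictions are available. The sketch below assumes the predictions come from some fitted model (PLS on the slide); the numbers are made up for illustration and are not results from the presentation.

```python
def sum_squared_error(y_true, y_pred):
    """Sum of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Hypothetical observations and predictions.
fitness = sum_squared_error([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
print(round(fitness, 4))  # 0.06
```

With this definition a lower fitness value means a better model, so the GA would minimise it.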
9
Results: GA/PLS/PCA (Urea)
10
Results: GA/PLS/PCA (Urea)
PLS model with all variables has a fitness of -0.006.
11
Results: GA/PLS/PCA
12
GA/MBPLS/MBPCA
[Diagram labels: Encoding group info; GA; Optimize; Block1; Block2; 1; 2; Block score; Super score; Dependent variables; Y]
13
Results: GA/MBPLS
14
Results: GA/MBPLS
[Figure panels: Variance explained; Prediction accuracy]
15
GA/KPLS/KPCA/SVM
[Diagram labels: Encoding; GA; Optimize; KPCA/KPLS; SVM; Dependent variables; Y]
16
Results: GA/SVM
17
Results: GA/SVM
Fitness of the SVM with all variables included is
0.0083.
18
Results: GA/SVM
19
Discussion: Biomarker Identification
  • Application of GA-coupled feature selection in
    bioinformatics
  • Select the most relevant factors from the
    metabolic and genetic profiles that can optimally
    characterize cellular states. This opens new
    avenues for identifying complex disease genes and
    biomarkers for disease diagnosis and for
    assessing drug efficacy.
  • The next step is to try the method on gene data
    to identify marker genes instead of marker
    metabolic fluxes.
  • Find an appropriate optimization function for
    MBPLS/MBPCA.

20
GA/KPCA/KPLS/SVM
  • When solving a problem, we are usually looking
    for the solution that is best among the
    alternatives. The space of all feasible solutions
    (the set of solutions among which the desired
    solution resides) is called the search space
    (also the state space). Each point in the search
    space represents one possible solution. Each
    possible solution can be "marked" by its value
    (or fitness) for the problem. With a GA we look
    for the best solution among a number of possible
    solutions, each represented by one point in the
    search space.
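
The search described above can be sketched as a small GA over bit-string points in the search space. Everything here is an illustrative assumption rather than the presenter's setup: the toy fitness (mismatches against a made-up `IDEAL` subset), the population size, the mutation rate, and the use of truncation selection with one elite individual.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Minimise `fitness` over bit lists of length `n_bits`; return (best, history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness)  # lower fitness is better
        history.append(fitness(ranked[0]))
        next_population = [ranked[0][:]]          # elitism: keep the current best
        parents = ranked[: pop_size // 2]         # truncation selection
        while len(next_population) < pop_size:
            mother, father = rng.sample(parents, 2)
            point = rng.randint(1, n_bits - 1)    # single-point crossover
            child = mother[:point] + father[point:]
            # Flip each bit independently with probability `mutation_rate`.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_population.append(child)
        population = next_population
    return min(population, key=fitness), history


# Toy fitness (assumed): number of positions that differ from a made-up ideal subset.
IDEAL = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(chromosome):
    return sum(a != b for a, b in zip(chromosome, IDEAL))


best, history = run_ga(toy_fitness, n_bits=len(IDEAL))
print(best, toy_fitness(best))
```

Because the best individual is always copied into the next generation, the recorded best fitness never gets worse from one generation to the next. In a feature selection setting, `toy_fitness` would be replaced by the prediction error of a model fitted on the selected variables.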

21
GA/MBPLS
[Diagram labels: X; Block1; Block2; 1; 2; Block score; Super score; Y]