1
Microarray Data Analysis for Gene Selection and
Cancer Classification
  • De-Shuang Huang
  • http://www.intelengine.cn/
  • Intelligent Computing Lab,
  • Institute of Intelligent Machines, CAS, China
  • University of Science and Technology of China
  • September 25, 2005

2
Contents
  • 1. Microarray Data Analysis and Microarray
    Technology
  • 2. Gene Selection Method Based on Support Vectors
    and Penalty Strategy
  • 3. Gene Selection Method Based on Gene Regulation
    Probability (GRP)
  • 4. ICA for Cancer Classification Using Gene
    Expression Data
  • 5. An MSA-HLA Based RBF Classifier for Cancer
    Classification
  • 6. Conclusions

3
1. Microarray Data Analysis and Microarray
Technology
  • Microarray technology was developed in 1993 and
    is used to simultaneously detect the expression
    levels of hundreds of thousands of genes in
    biological samples.
  • It is a landmark technology: microarrays can serve
    as sensitive sensors for clinical diagnosis.
  • The tremendous amount of data from microarray
    technology presents a challenge for data
    analysis.
  • Microarray data analysis is still developing and
    has become an important topic in bioinformatics.
  • Two important issues in applying microarray
    technology to cancer research are gene selection
    and cancer classification.

4
Two ways of manufacturing microarrays
  • In situ synthesized oligonucleotide arrays
  • By Affymetrix Inc.

5
Pre-synthesized cDNA arrays
By Patrick Brown's Lab at Stanford University
6
  • A sample microarray

7
  • Acquisition of gene expression data by microarray
    technology

8
  • A sample image scanned by a two-channel laser
    scanner

The image is digitized into gene expression
values, where each colored spot represents a gene.
9
  • Image analysis and synthesis with specialized
    software

10
  • 1.1 Structure of microarray data

Description: a matrix with one row for each gene
and one column for each condition (replicate).
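For a two-channel (cDNA) array, one common convention, assumed here rather than stated on the slides, is to take the log2 ratio of the two channel intensities as the expression value and store the values in a gene × condition matrix. A minimal NumPy sketch with made-up intensities; `log_ratio_matrix` and its offset `eps` are hypothetical names:

```python
import numpy as np


def log_ratio_matrix(red, green, eps=1.0):
    """Build a genes x conditions matrix of log2(red / green) expression ratios.

    red, green: spot intensities of shape (n_genes, n_conditions), one array per
    scanner channel (assumed already background-corrected).
    eps: hypothetical small offset that avoids log(0) for empty spots.
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    if red.shape != green.shape:
        raise ValueError("channel arrays must have the same shape")
    # One row per gene, one column per condition (replicate).
    return np.log2((red + eps) / (green + eps))


# Hypothetical intensities: 3 genes measured under 2 conditions.
red = np.array([[1023.0, 2047.0], [255.0, 255.0], [63.0, 127.0]])
green = np.array([[255.0, 511.0], [255.0, 255.0], [255.0, 255.0]])
A = log_ratio_matrix(red, green)
print(A.shape)  # (3, 2)
```

Positive entries mean the spot is brighter in the red channel; zero means equal intensity in both channels.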
11
  • 1.2 Aims and applications of microarray data
    analysis
  • Identify biologically specific genes
  • Prognosis and diagnosis of diseases
  • Explore relations between genes and other
    biological factors
  • Discover patterns of alternative splicing in gene
    expression
  • Help understand disease pathology
  • Assist in studying gene regulatory networks

12
  • 1.3 Difficulty and Complication in Microarray
    Data Analysis
  • High dimensionality: generally 5,000-15,000 genes
  • Inherently very noisy
  • High degree of variability
  • The vast majority of variables are hidden
  • Complicated relations between genes
  • Complicated relations between phenotypes
  • Complicated relations between genes and
    phenotypes

13
2. Gene Selection Based on Support Vectors and
Penalty Strategy
  • In the algorithm, a cross-validation procedure is
    performed on the datasets
  • For each validation sub-procedure, support vector
    machines (SVMs) are trained and tested
  • For each SVM, its support vectors are weighted
    and combined into each gene's initial correlation
    degree (CD) with the class distinction
  • A penalty strategy is proposed to penalize the
    initial CDs; the resulting penalized CDs are used
    to produce a criterion for gene selection
  • Applications to the leukemia and colon datasets
    show that our algorithm can identify the key genes
    related to the class distinction and that it is
    competitive with previous methods.

14
  • Algorithm
  • Step 1. Compute the original gene correlation
    degree with the class distinction

where is the number of the support vectors
in set.
15
  • Step 2. Compute the penalized correlation degree
  • where

16
Step 3. For each gene, compute the composite
correlation degree
where is the number of cross-validation
procedures
Step 4. Rank all genes
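Steps 1-4 can be sketched in NumPy as follows. The slide formulas did not survive, so several parts are assumptions: a bias-free linear SVM trained by dual coordinate descent stands in for the SVMs, Step 1 is taken as the average of alpha_i * y_i * x_i over the support vectors (consistent with "divided by the number of support vectors"), the penalty of Step 2 is a hypothetical placeholder, and the test half of each fold is not used. All function names are hypothetical.

```python
import numpy as np


def linear_svm_dual_cd(X, y, C=1.0, epochs=200, seed=0):
    """Linear SVM (hinge loss, no bias) trained by dual coordinate descent.

    Returns the dual variables alpha; samples with alpha > 0 are support vectors.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    q = np.einsum("ij,ij->i", X, X)  # diagonal of the Gram matrix, x_i . x_i
    for _ in range(epochs):
        for i in rng.permutation(n):
            if q[i] == 0.0:
                continue
            grad = y[i] * (w @ X[i]) - 1.0  # gradient of the dual objective
            new_alpha = min(max(alpha[i] - grad / q[i], 0.0), C)
            w += (new_alpha - alpha[i]) * y[i] * X[i]
            alpha[i] = new_alpha
    return alpha


def correlation_degree(X, y, alpha):
    """Step 1 (assumed form): mean of alpha_i * y_i * x_i over the support vectors."""
    sv = alpha > 1e-8
    return (alpha[sv] * y[sv]) @ X[sv] / max(int(sv.sum()), 1)


def penalize(cd):
    """Step 2 placeholder (hypothetical): the slides do not give the penalty,
    so this version only takes the magnitude of each gene's CD."""
    return np.abs(cd)


def rank_genes(X, y, n_folds=5, seed=0):
    """Steps 1-4: per-fold CDs, penalty, composite CD over folds, ranking."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    penalized = []
    for k in range(n_folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        alpha = linear_svm_dual_cd(X[train], y[train])
        penalized.append(penalize(correlation_degree(X[train], y[train], alpha)))
    composite = np.mean(penalized, axis=0)  # Step 3: average over the folds
    return np.argsort(-composite), composite  # Step 4: rank in descending order


rng = np.random.default_rng(1)
y = np.where(np.arange(40) < 20, 1.0, -1.0)
X = rng.normal(size=(40, 20))
X[:, 0] += 3.0 * y  # synthetic data: only gene 0 separates the classes
order, composite = rank_genes(X, y)
```

Replacing `penalize` with the actual penalty from the paper would leave the rest of the pipeline unchanged.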
17
Biotechnology Letters, vol.27, no.8, pp.597-603,
2005.
  • Experimental results
  • Trends in the weight changes of 50 genes over
    50 random cross-validation runs

leukemia dataset

18
  • Convergence of the correlation degree

colon dataset
leukemia dataset

19
3. Gene selection method based on gene regulation
probability (GRP)
  • In this method, gene regulation is defined
    statistically
  • The probabilities of gene regulation are
    estimated using probabilistic statistical methods
  • The method can extract gene regulation
    information and can be used for gene selection
  • Applications to the leukemia and colon datasets
    suggest that our proposed method is effective,
    efficient, and competitive with previous methods.

20
  • Established gene regulation model

Commonly, we can preset
, where the cutoff coefficient
.
21
Definition 3. The regulation matrix, B, has the
same representation form as the microarray data
matrix, A, but contains the generic elements, ,
which are determined by Definitions 1 and 2 and
record the regulation state of gene g in tissue
sample s.
For the up-regulation matrix, B+,
22
For the down-regulation matrix B-,
Define
23
Next, to estimate the regulation probabilities
using probabilistic statistical methods, assume
that the regulation event, E, occurs with the
probability, , under the background context C,
that is,
where x can be obtained from the regulation
matrices, B+ and B-, i.e.,
24
The marginal probability of x can be computed as
follows
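Because Definitions 1 and 2 and the probability formulas are missing from the slides, the sketch below only illustrates the general idea under stated assumptions: a gene is hypothetically called up- (down-) regulated in a sample when its value lies more than `lam` standard deviations above (below) the gene's mean, with `lam` playing the role of the cutoff coefficient; the regulation probability of a class is estimated as p = x / n; and a binomial model is assumed for the probability of x. Function names are hypothetical.

```python
import numpy as np
from math import comb


def regulation_matrices(A, lam=0.5):
    """Hypothetical stand-in for Definitions 1-2.

    Gene g counts as up-regulated (down-regulated) in sample s when a_gs lies
    more than lam standard deviations above (below) the gene's mean.
    Returns binary matrices B+ and B- with the same shape as A.
    """
    mu = A.mean(axis=1, keepdims=True)
    sd = A.std(axis=1, keepdims=True)
    b_up = (A > mu + lam * sd).astype(int)
    b_down = (A < mu - lam * sd).astype(int)
    return b_up, b_down


def regulation_probability(B, labels, cls):
    """Per gene: count x of regulated samples in class cls and the estimate p = x / n."""
    mask = labels == cls
    x = B[:, mask].sum(axis=1)
    return x / mask.sum(), x


def binomial_pmf(x, n, p):
    """P(x regulated samples out of n) under an assumed binomial model."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


A = np.array([[5.0, 6.0, 5.0, 1.0, 0.0, 1.0],
              [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
b_up, b_down = regulation_matrices(A)
p_up_class0, x_up_class0 = regulation_probability(b_up, labels, 0)
```

Gene 0 is up-regulated in every class-0 sample and down-regulated in every class-1 sample, so its estimated up-regulation probability for class 0 is 1; the constant gene 1 is never regulated.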
25
  • Experimental Results
  • Gene distribution over the up- or down-regulation
    event

26
Cell (submitted).
  • Fitting distribution of GRP

27
Illustration of the differences in expression
levels of some of the selected genes
28
  • Selected genes and their biological descriptions

29
  • Significance test of the gene regulation
    probability difference

30
  • Genes regulated under different significance
    levels

31
  • Comparison between actual GRP and permuted
    GRP I

32
  • Comparison between actual GRP and permuted GRP II
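The slides compare the actual GRP with GRP computed on permuted data, but the exact protocol is not given. One standard way to run such a comparison, assumed here, is a label-permutation test: shuffle the class labels, recompute each gene's GRP difference between the two classes, and count how often the permuted difference is at least as large as the observed one. `B` is a binary regulation matrix (for example B+); the function names are hypothetical.

```python
import numpy as np


def grp_difference(B, labels):
    """Per-gene difference in regulation probability between class 1 and class 0."""
    return B[:, labels == 1].mean(axis=1) - B[:, labels == 0].mean(axis=1)


def permutation_pvalues(B, labels, n_perm=1000, seed=0):
    """Two-sided permutation p-values for the GRP difference of every gene.

    The class labels are shuffled n_perm times; the p-value is the fraction of
    permuted |differences| at least as large as the observed one, with the
    usual +1 correction so that p is never exactly zero.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(grp_difference(B, labels))
    exceed = np.zeros(B.shape[0])
    for _ in range(n_perm):
        permuted = np.abs(grp_difference(B, rng.permutation(labels)))
        exceed += permuted >= observed
    return (exceed + 1) / (n_perm + 1)


labels = np.array([0] * 10 + [1] * 10)
# Synthetic matrix: gene 0 is regulated exactly in class 1, gene 1 in every sample.
B = np.vstack([labels, np.ones(20, dtype=int)])
p = permutation_pvalues(B, labels, n_perm=500)
```

A gene whose actual GRP difference stands far above the permuted distribution gets a small p-value; a gene regulated equally in both classes gets p = 1.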

33

4. ICA for Cancer Classification Using Gene
Expression Data
A. Independent Component Analysis
  • Mixing model
  • Demixing model

Neurocomputing (accepted)
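The mixing model X = AS and the demixing model Y = WX can be illustrated with symmetric FastICA using a tanh nonlinearity. This is a common ICA algorithm chosen for the sketch; the slides do not say which ICA algorithm the authors used, nor how the rows and columns of the expression matrix map onto X. The mixing matrix in the example is hypothetical.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W


def fast_ica(X, n_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity.

    Mixing model:   X = A S   (rows of X: observed mixtures, columns: observations)
    Demixing model: Y = W X   (rows of Y: estimated independent components)
    Returns Y and the demixing matrix that maps the centered X to Y.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)
    n = Xc.shape[1]
    d, E = np.linalg.eigh(Xc @ Xc.T / n)  # whitening: Z has identity covariance
    K = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = K @ Xc
    W = _sym_decorrelate(rng.normal(size=(Z.shape[0], Z.shape[0])))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w for every row w.
        W_new = _sym_decorrelate(G @ Z.T / n - np.diag((1.0 - G**2).mean(axis=1)) @ W)
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ Z, W @ K


rng = np.random.default_rng(42)
S = np.vstack([rng.uniform(-1.0, 1.0, 5000), rng.laplace(size=5000)])
A_mix = np.array([[1.0, 0.5], [0.4, 1.0]])  # hypothetical mixing matrix
Y, W = fast_ica(A_mix @ S)
```

ICA recovers the sources only up to order, sign, and scale, so the components in `Y` should be compared with the sources through absolute correlations.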
34
B. The Independent Basis Snapshot Representation
  • t-statistics
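The slide names t-statistics without a formula. A minimal sketch, assuming the Welch two-sample t-statistic (the pooled-variance version is an equally common choice) computed for every row of a feature × sample matrix, with rows ranked by |t|; the rows could be genes or ICA components, and the function names are hypothetical.

```python
import numpy as np


def t_statistics(F, labels):
    """Welch two-sample t-statistic for every row of F (rows: features, columns: samples)."""
    a, b = F[:, labels == 1], F[:, labels == 0]
    se2 = a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(se2)


def top_features(F, labels, k):
    """Indices of the k rows with the largest |t|."""
    return np.argsort(-np.abs(t_statistics(F, labels)))[:k]


F = np.array([[1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
              [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
t = t_statistics(F, labels)
```

Row 0 differs in mean by 6 between the classes and gets a large t; row 1 has identical class means and gets t = 0.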

35
C. Classifiers
  • SVM
  • Leave-one-out cross-validation (LOO-CV)
  • Accuracy
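LOO-CV accuracy is computed by holding out each sample once, training on the rest, and counting correct predictions. To keep the sketch self-contained, a nearest-centroid classifier stands in for the SVM used on the slides; the `predict` parameter is a hypothetical hook through which any classifier with the same call shape could be plugged in.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, x):
    """Stand-in classifier (not the SVM from the slides): nearest class mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes[np.argmin(((centroids - x) ** 2).sum(axis=1))]


def loo_cv_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out CV: train on n - 1 samples, test on the held-out one, n times."""
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i
        correct += predict(X[keep], y[keep], X[i]) == y[i]
    return correct / n


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
y = np.array([0, 0, 0, 1, 1, 1])
acc = loo_cv_accuracy(X, y)
```

With n samples the classifier is retrained n times, which is affordable for microarray studies because they usually have few samples.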

36
Experimental Results
A. Datasets
37
B1. Experiments
38
B2. Experiments
39
C. Comparison of Experimental Results for
Different Methods
40
D. Comparison of Experimental Results for
Different Methods
41
E. The Effects of Gene Numbers Used in ICA
42
5. An MSA-HLA Based RBF Classifier for Cancer
Classification
  • A modified simulated annealing (MSA) algorithm is
    developed and combined with the linear least
    squares method to optimize the structure of the
    radial basis function classifier (RBF classifier).
  • The optimized RBF classifier is applied to cancer
    classification.
  • Experimental results show that the optimized
    RBF classifier is not only parsimonious but also
    has better generalization performance.

43
  • Methods
  • The modified SA (MSA)
  • Simulated annealing (SA) is modified to
    search for the optimal number of RBF centers;
    the resulting MSA uses the mean squared error
    (MSE) of the RBF classifier as the evolving
    environment. The MSA algorithm is as follows:

44
Step 1. Initialize the state and the temperature
of the system, and set the annealing schedule.
Step 2. At each temperature T(t), repeat the
following a predetermined number of times:
(i) Randomly produce a new state, , and compute
and .
(ii) is rejected if and accepted by
otherwise.
Step 3. Update T(t) according to the annealing
schedule; stop if the lowest temperature has been
reached, and go to Step 2 otherwise.
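The steps above follow the usual simulated annealing loop. Below is a plain SA sketch, not the authors' modification: because the acceptance conditions on the slide were lost, the standard Metropolis rule is assumed, together with a geometric cooling schedule. The state is an integer such as the number of RBF centers, and `energy` stands in for the classifier's MSE; the toy energy in the example is hypothetical.

```python
import math
import random


def simulated_annealing(energy, k0, k_min, k_max, t0=10.0, t_min=1e-3,
                        alpha=0.9, n_inner=20, seed=0):
    """Minimize energy(k) over integers k in [k_min, k_max] with plain SA.

    Geometric cooling T <- alpha * T is an assumed annealing schedule.
    Returns the best state visited and its energy.
    """
    rng = random.Random(seed)
    k, e = k0, energy(k0)                  # Step 1: initial state and temperature
    best_k, best_e = k, e
    t = t0
    while t > t_min:                       # Step 3: stop at the lowest temperature
        for _ in range(n_inner):           # Step 2: fixed number of trials per T
            k_new = min(max(k + rng.choice([-2, -1, 1, 2]), k_min), k_max)
            e_new = energy(k_new)
            delta = e_new - e
            # Metropolis rule: always accept improvements, accept worse
            # states with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                k, e = k_new, e_new
                if e < best_e:
                    best_k, best_e = k, e
        t *= alpha
    return best_k, best_e


# Hypothetical toy energy with its minimum at 7 "centers".
best_k, best_e = simulated_annealing(lambda k: (k - 7) ** 2, k0=25, k_min=1, k_max=30)
```

In the RBF setting each call to `energy` would train a classifier with k centers and return its MSE, so the number of inner trials dominates the cost.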
45
  • The hybrid learning algorithm (HLA)
  • HLA is used to further optimize the centers,
    spreads, and weights of the RBF classifier,
    starting from the result of MSA. The HLA
    algorithm is as follows:
  • Step 1. Obtain the weights by least squares (LS)

46
  • Step 2. Compute the MSE
  • Step 3. Update centers

47
  • Step 4. Update spreads
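The four HLA steps can be sketched as alternating least-squares output weights with gradient steps on the centers and spreads. The slide equations are missing, so this assumes Gaussian basis functions, gradient descent on half the mean squared error, and a hypothetical learning rate `lr`; it is not claimed to reproduce the authors' exact update formulas.

```python
import numpy as np


def rbf_design(X, centers, spreads):
    """Gaussian basis outputs phi_ij = exp(-||x_i - c_j||^2 / (2 sigma_j^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * spreads**2)), d2


def hybrid_learning(X, T, centers, spreads, lr=0.01, n_iter=100):
    """Alternate LS output weights with gradient steps on the centers and
    spreads for the loss 0.5 * mean_i ||Phi_i W - T_i||^2.

    T holds one-hot class targets (n_samples x n_classes). Returns the final
    centers, spreads, weights and the MSE recorded at every iteration.
    """
    centers = np.array(centers, dtype=float)
    spreads = np.array(spreads, dtype=float)
    n = X.shape[0]
    history = []
    for _ in range(n_iter):
        Phi, d2 = rbf_design(X, centers, spreads)
        W = np.linalg.lstsq(Phi, T, rcond=None)[0]      # Step 1: LS weights
        err = Phi @ W - T
        history.append(float(np.mean(err**2)))          # Step 2: MSE
        G = (err @ W.T) * Phi                           # sum_k e_ik w_jk phi_ij
        grad_c = (G.T @ X - G.sum(axis=0)[:, None] * centers) / (n * spreads[:, None] ** 2)
        grad_s = (G * d2).sum(axis=0) / (n * spreads**3)
        centers -= lr * grad_c                          # Step 3: update centers
        spreads = np.maximum(spreads - lr * grad_s, 1e-3)  # Step 4: update spreads
    Phi, _ = rbf_design(X, centers, spreads)
    W = np.linalg.lstsq(Phi, T, rcond=None)[0]
    history.append(float(np.mean((Phi @ W - T) ** 2)))
    return centers, spreads, W, history


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])  # XOR classes, one-hot
centers, spreads, W, history = hybrid_learning(
    X, T, [[0.2, 0.2], [0.8, 0.8]], [0.5, 0.5], n_iter=50)
```

Re-solving the weights by LS after every gradient step keeps them optimal for the current centers and spreads, which is what makes the scheme "hybrid".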

48
IEE Electronics Letters, vol.41, no.11,
pp.630-632, 2005.
  • Experimental results
  • Evolution of the number of centers during MSA

49
  • Comparison of cancer recognition rates among
    different algorithms

50
Conclusions
  • Our studies show that it is indeed feasible to
    identify or classify genes by analyzing
    microarray data
  • Microarray data analysis can be used to identify
    genes and pathways, reveal new targets for
    therapy, and give prognoses for individual
    cancer subtypes
  • When more expression signatures of larger tumor
    sets become available, it will become clear how
    microarray data analysis will improve monitoring
    of the stages in which tumors grow and spread,
    and therefore prognosis

51
Conclusions
  • 4. More and better methods for analyzing
    microarray data need to be developed
  • 5. Many pattern recognition tools and machine
    learning methods can potentially be applied to
    microarray data
  • 6. Due to inherent noise and variations in
    microarray data, it is necessary to develop
    probabilistic methods for extracting useful
    information.
  • 7. At present, the most commonly used
    computational approach for analyzing microarray
    data is clustering analysis, e.g., hierarchical
    clustering, k-means, and SOM.

52
  • THE END