Recursive Partitioning for Tumor Classification with Gene Expression Microarray Data

1
Recursive Partitioning for Tumor Classification
with Gene Expression Microarray Data

Heping Zhang, Chang-Yung Yu,
Burton Singer, Momiao Xiong
Presented by Weihua Huang

2
Data Used in the Article
Expression profiles of 2,000 genes, measured with an Affymetrix oligonucleotide array, in 22 normal and 40 colon cancer tissues.
The response is binary, indicating normal or cancer tissue, and the predictor variables are the expression levels of the 2,000 genes.
3
Classification Tree Using Recursive Partitioning
Goal: To partition the feature space into disjoint regions by growing a tree, so that the tissues in the same region are homogeneous in terms of the response.
Algorithm: Start with a root node containing the whole study sample and split it into smaller and smaller nodes according to whether a particular selected predictor is above a chosen cutoff value. At each splitting step, the predictor and its cutoff value are chosen to maximize the reduction in node impurity:
$$\Delta I = P(A)\,I(A) - P(A_L)\,I(A_L) - P(A_R)\,I(A_R),$$
where $A$ is the node being split, $A_L$ and $A_R$ are its left and right daughter nodes, $P(\cdot)$ is the proportion of the study sample falling in a node, and $I(\cdot)$ is the node impurity.
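The splitting rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the entropy impurity introduced on the next slide, tries cutoffs at midpoints between consecutive observed values, stops when no split reduces impurity or a node has fewer than a hypothetical `min_size` tissues, and omits any pruning. The names `grow`, `best_split`, `predict`, the dict-based tree, and the label coding (1 = normal, 0 = cancer) are all assumptions made for the example.

```python
import math

def entropy(labels):
    """Entropy impurity -P log P - (1-P) log(1-P); P = share of label 1 (normal)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return sum(-q * math.log(q) for q in (p, 1 - p) if q > 0)

def best_split(X, y, rows):
    """Search every gene and midpoint cutoff; return (delta_I, gene, cutoff)."""
    n_total = len(y)  # P(node) = tissues in node / tissues in the whole sample
    parent = [y[i] for i in rows]
    best = (1e-12, None, None)  # ignore gains that are only floating-point noise
    for gene in range(len(X[0])):
        values = sorted({X[i][gene] for i in rows})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][gene] <= cut]
            right = [y[i] for i in rows if X[i][gene] > cut]  # "above the cutoff"
            delta_i = (len(parent) * entropy(parent)
                       - len(left) * entropy(left)
                       - len(right) * entropy(right)) / n_total
            if delta_i > best[0]:
                best = (delta_i, gene, cut)
    return best

def grow(X, y, rows=None, min_size=2):
    """Recursively split until no split helps or a node is smaller than min_size."""
    rows = list(range(len(y))) if rows is None else rows
    labels = [y[i] for i in rows]
    majority = 1 if 2 * sum(labels) > len(labels) else 0  # ties go to label 0
    if len(rows) < min_size:
        return {"leaf": majority}
    _, gene, cut = best_split(X, y, rows)
    if gene is None:  # no split reduces impurity (e.g. the node is pure)
        return {"leaf": majority}
    left = [i for i in rows if X[i][gene] <= cut]
    right = [i for i in rows if X[i][gene] > cut]
    return {"gene": gene, "cutoff": cut,
            "left": grow(X, y, left, min_size),
            "right": grow(X, y, right, min_size)}

def predict(tree, x):
    """Follow the splits from the root until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["right"] if x[tree["gene"]] > tree["cutoff"] else tree["left"]
    return tree["leaf"]

# Toy data (not from the study): two hypothetical "genes", 1 = normal, 0 = cancer.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0], [8.0, 5.5], [9.0, 4.5], [10.0, 6.5]]
y = [0, 0, 0, 1, 1, 1]
tree = grow(X, y)
print(tree)
```

On the toy data the first gene separates the two classes perfectly, so the root split uses it and both daughter nodes are pure leaves.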
4
Classification Tree Using Recursive Partitioning
Node impurity: One example of a node impurity measure is the entropy function
$$I(A) = -P\log P - (1-P)\log(1-P),$$
where $P$ is the probability that a tissue within node $A$ is normal (with the convention $0 \log 0 = 0$).

Minimum impurity (0)
When all tissues within the node are of the same type (P = 0 or 1).

Maximum impurity (log 2)
When half of the tissues within the node are normal and half are cancer (P = 0.5).
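The two extreme cases can be checked numerically. This sketch assumes the natural logarithm, which is what makes the maximum equal to log 2 ≈ 0.693 (with base-2 logs the maximum would be 1); the function name `entropy_impurity` is hypothetical.

```python
import math

def entropy_impurity(p):
    """Node impurity -P log(P) - (1-P) log(1-P), natural log, with 0 log 0 := 0.

    p is the proportion of normal tissues in the node.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    # Terms with q == 0 are skipped, which implements the 0 log 0 = 0 convention.
    return sum(-q * math.log(q) for q in (p, 1.0 - p) if q > 0)

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, round(entropy_impurity(p), 4))
```

A pure node (P = 0 or 1) gives 0, and the balanced node (P = 0.5) gives -0.5 log 0.5 - 0.5 log 0.5 = log 2.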

5
Results from the Classification Tree on the Data
Fig. 1. Classification tree for tissue types using expression data from three genes (M26383, R15447, M28214).
6
Another Way to Visualize the Recursive Partitioning
Fig. 3. A scatterplot of expression data from R15447 and M28214 for a subset of tissues (node 3 in Fig. 1).
7
Results from Recursive Partitioning

Quality of the tree-based classification: assessed with a localized 5-fold cross-validation error rate.
The same genes are kept at the same nodes; only the cutoff values are re-estimated.
The 40 cancer tissues were randomly divided into 5 subsamples of 8, and the 22 normal tissues into 5 subsamples of 4, 4, 4, 5, and 5. Four subsamples each from the cancer and normal tissues were used to choose the cutoff values for the three splits, and the remaining samples were used to count the tissues misclassified under the new cutoff values.
Two runs of cross-validation gave error rates between 6% and 8%, much lower than the error rates obtained in existing analyses.
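The localized cross-validation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the tree structure (which gene sits at which node) is a hypothetical nested dict that stays fixed, each node's cutoff is re-chosen on the training folds by the same entropy-based impurity reduction, and the helper names (`split_into_folds`, `refit`, `localized_cv`) are invented for the example. The cancer and normal tissues are folded separately, which reproduces the 8 × 5 and 4, 4, 4, 5, 5 subsample sizes from the slide.

```python
import math
import random

def entropy(labels):
    """Entropy impurity; P = share of label 1 (normal) in the node."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return sum(-q * math.log(q) for q in (p, 1 - p) if q > 0)

def best_cutoff(X, y, rows, gene):
    """Midpoint cutoff for a FIXED gene that maximizes impurity reduction on rows."""
    values = sorted({X[i][gene] for i in rows})
    parent = [y[i] for i in rows]
    best_gain, best_cut = -1.0, (values[0] if values else 0.0)
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [y[i] for i in rows if X[i][gene] <= cut]
        right = [y[i] for i in rows if X[i][gene] > cut]
        # Unnormalized Delta I: dividing by the sample size would not change the argmax.
        gain = (len(parent) * entropy(parent)
                - len(left) * entropy(left) - len(right) * entropy(right))
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut

def refit(structure, X, y, rows):
    """Keep each node's gene; re-choose its cutoff and the leaf labels from rows."""
    if "gene" not in structure:
        labels = [y[i] for i in rows]
        return {"leaf": 1 if 2 * sum(labels) > len(labels) else 0}
    gene = structure["gene"]
    cut = best_cutoff(X, y, rows, gene)
    left = [i for i in rows if X[i][gene] <= cut]
    right = [i for i in rows if X[i][gene] > cut]
    return {"gene": gene, "cutoff": cut,
            "left": refit(structure["left"], X, y, left),
            "right": refit(structure["right"], X, y, right)}

def predict(tree, x):
    while "leaf" not in tree:
        tree = tree["right"] if x[tree["gene"]] > tree["cutoff"] else tree["left"]
    return tree["leaf"]

def split_into_folds(indices, k, rng):
    """Shuffle and deal into k folds whose sizes differ by at most one."""
    idx = list(indices)
    rng.shuffle(idx)
    base, extra = divmod(len(idx), k)
    sizes = [base] * (k - extra) + [base + 1] * extra  # 22 -> 4, 4, 4, 5, 5
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds

def localized_cv(structure, X, y, k=5, seed=0):
    """Fraction of tissues misclassified when cutoffs are refit on k-1 folds."""
    rng = random.Random(seed)
    # Fold cancer (0) and normal (1) tissues separately, as on the slide.
    cancer = split_into_folds([i for i, v in enumerate(y) if v == 0], k, rng)
    normal = split_into_folds([i for i, v in enumerate(y) if v == 1], k, rng)
    errors = 0
    for f in range(k):
        held_out = cancer[f] + normal[f]
        train = [i for g in range(k) if g != f for i in cancer[g] + normal[g]]
        tree = refit(structure, X, y, train)
        errors += sum(predict(tree, X[i]) != y[i] for i in held_out)
    return errors / len(y)

# Toy check: a hypothetical one-split structure on gene index 0, with
# 40 "cancer" (0) and 22 "normal" (1) tissues that gene 0 separates perfectly.
X = [[float(i)] for i in range(40)] + [[100.0 + i] for i in range(22)]
y = [0] * 40 + [1] * 22
structure = {"gene": 0, "left": {}, "right": {}}
print(localized_cv(structure, X, y))
```

Because only the cutoffs are refit, this estimates how stable the chosen splits are; it does not re-run the gene selection inside each fold.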

8
Correlation Analysis on Genes

The expression levels of different genes are correlated.
We examine the correlation patterns of the three genes selected in Fig. 1.

9
Correlation Between the Three Selected Genes and the Remaining Expression Data
10
Another Tree Based on a Different Set of Three Genes
Fig. 6. Classification tree for tissue types using expression data from three genes (R87126, T62947, X15183).
11
Correlation Matrix Among the Genes in Fig. 1 and Fig. 6

12
Advantages of the Classification Tree
1. Efficient with a large number of genes.
2. Automatically selects informative, easily interpreted genes as predictors.
3. More precise than some other classification methods, such as support vector machines and linear discriminant analysis.
13
Conclusions
1. The information contained in a large number of genes can likely be captured by a small optimal set of genes without significant loss of information.
2. The classification precision of recursive partitioning is important for clinical applications.
