From Bicluster to Tricluster: An Alternative Approach

1
From Bicluster to Tricluster: An Alternative
Approach

Presented by
Morshed Osmani

2
Outline

Problem Definition
Related Works
My Idea
Further Improvement
Suggestion and Comments

3
Problem Statement

Biclustering has become increasingly popular for
finding clusters in gene expression data. However,
this method works in two-dimensional space only
((Gene, Sample), (Gene, Time), or (Sample, Time)).
Triclustering uses all three dimensions to
cluster data.
We want an algorithm that uses the biclustering
approach to find clusters in three dimensions.

4
Related Works

There are many biclustering algorithms for
clustering data in 2D [1].
Jiang et al. [2] use all three dimensions but
bicluster only in the (Gene, Sample) dimensions.
TRICLUSTER [3] clusters data in all three
dimensions.
Most of the other algorithms cluster in 2D only.

5
My Idea

Correct the mathematical notation of the original
paper
Develop a method for pruning the large search space
Implement the modified solution and obtain some
simulation results
Compare the modified algorithm with the original
paper's algorithm

6
Original Paper

TRICLUSTER: An Effective Algorithm for Mining
Coherent Clusters in 3D Microarray Data
Lizhuang Zhao, Mohammed J. Zaki
Rensselaer Polytechnic Institute, New York
ACM SIGMOD International Conference on Management
of Data, 2005

7
Introduction

Traditional clustering algorithms work in the
full dimensional space.
Biclustering, on the other hand, does not have
such a strict requirement. If some points are
similar in several dimensions (a subspace), they
will be clustered together in that subspace.
Biclustering is able to identify the
co-expression patterns of a subset of genes that
might be relevant to a subset of the samples of
interest.

8
Introduction (Cont.)

There has been a lot of interest in mining gene
expression patterns across time. These approaches
are also mainly two-dimensional, i.e., finding
patterns along the gene-time dimensions.
The paper deals with mining tri-clusters, i.e.,
mining coherent clusters along the
gene-sample-time (temporal) or gene-sample-region
(spatial) dimensions.
The authors claim TRICLUSTER is the first 3D
microarray subspace clustering method.

9
Related Work

There has been work on mining gene expression
patterns across time.
There are many full-space and biclustering
algorithms designed to work with microarray
datasets, such as feature based clustering, graph
based clustering and pattern based clustering.
There is no previous method that mines
tri-clusters.

10
Challenges

Biclustering itself is known to be an NP-hard
problem, so heuristic methods or probabilistic
approximations are used.
Microarray data is inherently susceptible to
noise due to varying experimental conditions,
so it is essential that the methods be robust
to noise.
As we do not understand the complex gene
regulation circuitry in the cell, clustering
methods should allow overlapping clusters that
share subsets of genes, samples or
time-courses/spatial regions.
Furthermore, the methods should be flexible
enough to mine several (interesting) types of
clusters, and should not be too sensitive to
input parameters.

11
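Mathematical Notations Used

Let $G = \{g_0, g_1, \ldots, g_{n-1}\}$ be a set of n genes,
let $S = \{s_0, s_1, \ldots, s_{m-1}\}$ be a set of m biological
samples (e.g., different tissues or experiments), and
let $T = \{t_0, t_1, \ldots, t_{l-1}\}$ be a set of l experimental
time points.
A three dimensional microarray dataset is a
real-valued $n \times m \times l$ matrix
$D = G \times S \times T = \{d_{ijk}\}$
(with $i \in [0, n-1]$, $j \in [0, m-1]$, $k \in [0, l-1]$),
whose three dimensions correspond to genes,
samples and times respectively.
A tricluster C is a submatrix of the dataset D,
$C = X \times Y \times Z = \{c_{ijk}\}$ with
$X \subseteq G$, $Y \subseteq S$ and $Z \subseteq T$,
provided certain conditions of homogeneity are
satisfied.
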
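To make the submatrix idea concrete, here is a minimal Python sketch. It treats the dataset D as a nested list indexed as D[gene][sample][time]; the index lists X, Y, Z and both function names are hypothetical, chosen for illustration only. The homogeneity test is a deliberately simplified assumption (gene ratios between every pair of samples must agree within a factor of 1 + eps in each time slice, and all values are assumed positive); it is not the full set of conditions from the TRICLUSTER paper.

```python
def submatrix(D, X, Y, Z):
    """Select genes X, samples Y and time points Z from the 3D dataset D."""
    return [[[D[i][j][k] for k in Z] for j in Y] for i in X]


def is_ratio_coherent(C, eps):
    """Simplified homogeneity check (an assumption, not the paper's exact test).

    For every time slice and every pair of sample columns, the per-gene
    ratios must all lie within a factor (1 + eps) of each other.
    Assumes strictly positive expression values.
    """
    n_genes, n_samples, n_times = len(C), len(C[0]), len(C[0][0])
    for k in range(n_times):
        for a in range(n_samples):
            for b in range(a + 1, n_samples):
                ratios = [C[i][a][k] / C[i][b][k] for i in range(n_genes)]
                if max(ratios) > min(ratios) * (1 + eps):
                    return False
    return True


# Toy dataset: genes 0 and 1 scale together, gene 2 does not.
D = [
    [[1.0, 2.0], [2.0, 4.0]],
    [[2.0, 4.0], [4.0, 8.0]],
    [[1.0, 9.0], [1.0, 1.0]],
]
coherent = is_ratio_coherent(submatrix(D, [0, 1], [0, 1], [0, 1]), eps=0.01)
broken = is_ratio_coherent(submatrix(D, [0, 1, 2], [0, 1], [0, 1]), eps=0.01)
```

In this toy example the candidate built from genes 0 and 1 passes the check, while adding gene 2 breaks it.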
12
Problems with Notation

The paper uses an unconventional notation system
that seems to be incorrect.
It may keep the reader from comprehending the
actual meaning.
The Cartesian product is used incorrectly.
This may be solved by using a mapping (function).
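One possible reading of the mapping suggestion, written out as a sketch: with G, S and T denoting the gene, sample and time sets, the dataset is defined as a function on their Cartesian product instead of being equated with it. The index sets X, Y, Z and the restriction notation are assumptions introduced here for illustration, not notation from the original paper.

```latex
% The dataset as a mapping from (gene, sample, time) triples to real values
D : G \times S \times T \to \mathbb{R}, \qquad d_{ijk} = D(g_i, s_j, t_k)

% A tricluster candidate as the restriction of D to a sub-product
C = D\big|_{X \times Y \times Z}, \qquad X \subseteq G,\; Y \subseteq S,\; Z \subseteq T
```

Under this reading, G × S × T is only the index set of the data cube, and the expression values live in the image of D.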

13
The TRICLUSTER Algorithm

3D microarray datasets have more genes than
samples, and perhaps an equal number of time
points and samples, i.e., $|G| \geq |T| \approx |S|$.
Due to the symmetry property, TRICLUSTER always
transposes the input 3D matrix such that
the dimension with the largest cardinality (say
G) is the 1st dimension,
with S as the 2nd and T as the 3rd
dimension.
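The transposition step can be sketched in a few lines of Python. This is an illustration under stated assumptions: the dataset is a nested list, the helper names are hypothetical, and the two smaller dimensions simply keep their original relative order (the slide only fixes which dimension comes first).

```python
def largest_first_order(shape):
    """Axis permutation that puts the largest dimension first.

    The remaining axes keep their original relative order (an assumption).
    """
    largest = max(range(len(shape)), key=lambda ax: shape[ax])
    return [largest] + [ax for ax in range(len(shape)) if ax != largest]


def transpose3d(data, order):
    """Permute the axes of a 3D nested list; new axis p is old axis order[p]."""
    shape = (len(data), len(data[0]), len(data[0][0]))
    new_shape = [shape[ax] for ax in order]
    out = [[[None] * new_shape[2] for _ in range(new_shape[1])]
           for _ in range(new_shape[0])]
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                old = (i, j, k)
                a, b, c = (old[ax] for ax in order)
                out[a][b][c] = data[i][j][k]
    return out


# Toy input stored as 2 samples x 3 genes x 2 time points.
data = [[[100 * i + 10 * j + k for k in range(2)] for j in range(3)]
        for i in range(2)]
order = largest_first_order((2, 3, 2))
cube = transpose3d(data, order)
```

With an array library, the same permutation could be passed to a function such as `numpy.transpose` instead of the hand-written loop.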

14
Steps of TRICLUSTER

TRICLUSTER has the following main steps:
For each GxS time slice matrix, find the valid
ratio-ranges for all pairs of samples, and
construct a range multigraph
Mine the maximal biclusters from the range
multigraph
Construct a graph based on the mined biclusters
(as vertices) and get the maximal triclusters
Optionally, delete or merge clusters if certain
overlapping criteria are met.
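The first step above can be illustrated with a simplified Python sketch for a single pair of samples in one GxS slice. The function name, the `eps` tolerance and the `min_genes` threshold are hypothetical parameters introduced here; the sketch assumes strictly positive expression values and leaves out details of the published algorithm such as handling of negative ratios. Each returned range would correspond to one edge between the two samples in the range multigraph.

```python
def ratio_ranges(slice_gs, a, b, eps=0.01, min_genes=2):
    """Find maximal groups of genes whose ratio (sample a / sample b) agrees.

    slice_gs is one time slice as a list of rows (one row per gene).
    Returns (low_ratio, high_ratio, gene_indices) tuples in which all ratios
    lie within a factor (1 + eps) of the lowest one. Assumes positive values.
    """
    ratios = sorted((row[a] / row[b], gene) for gene, row in enumerate(slice_gs))
    n = len(ratios)
    ranges = []
    end = 0        # exclusive right edge of the current sliding window
    prev_end = 0   # right edge of the window found for the previous start
    for start in range(n):
        end = max(end, start)
        while end < n and ratios[end][0] <= ratios[start][0] * (1 + eps):
            end += 1
        # A window is maximal only if it reaches past the previous window.
        if end > prev_end and end - start >= min_genes:
            genes = sorted(g for _, g in ratios[start:end])
            ranges.append((ratios[start][0], ratios[end - 1][0], genes))
        prev_end = end
    return ranges


# Genes 0-2 share the ratio 0.5 between samples 0 and 1; gene 3 does not.
example_slice = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [5.0, 1.0]]
found = ratio_ranges(example_slice, 0, 1)
```

Sorting the ratios first lets a single sliding window enumerate the maximal ranges, which keeps the per-pair cost dominated by the sort.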

15
Future Direction

Reduce the search space
- How? (yet to be determined)
Implement the modified algorithm
Compare the results with those of the original
algorithm

16
Suggestions Needed

Here dataset D is a data cube. What benefits
might the data cube model offer in this case?
How can this problem be integrated into DataDEX
(if that is possible)?
How can we reduce the search space?
Any other comments or suggestions are welcome.

17
References

[1] S. C. Madeira and A. L. Oliveira. Biclustering
algorithms for biological data analysis: a
survey. IEEE/ACM Transactions on Computational
Biology and Bioinformatics, 1(1):24-45, 2004.
[2] D. Jiang, J. Pei, M. Ramanathan, C. Tang, and A.
Zhang. Mining coherent gene clusters from
gene-sample-time microarray data. In 10th ACM
SIGKDD Conference, 2004.
[3] L. Zhao and M. J. Zaki. TRICLUSTER:
an effective algorithm for mining coherent
clusters in 3D microarray data. Proceedings of
the 2005 ACM SIGMOD International Conference on
Management of Data, Baltimore, Maryland, 2005.