From Bicluster to Tricluster: An Alternative Approach

Slides: 18
Provided by: golammors


1
From Bicluster to Tricluster: An Alternative
Approach
  • Presented by
  • Morshed Osmani

2
Outline
  • Problem Definition
  • Related Works
  • My Idea
  • Further Improvement
  • Suggestion and Comments

3
Problem Statement
  • Biclustering has become increasingly popular for
    finding clusters in gene expression data. However,
    it works in only two dimensions at a time
    ((Gene, Sample), (Gene, Time), or (Sample, Time)).
  • Triclustering uses all three dimensions to
    cluster the data.
  • We want an algorithm that uses the biclustering
    approach to find clusters in three dimensions.

4
Related Works
  • There are many biclustering algorithms that
    cluster data in 2D [1].
  • [2] uses all three dimensions but biclusters
    only in the (Gene, Sample) dimensions.
  • TRICLUSTER [3] clusters data in all three
    dimensions.
  • Most of the other algorithms cluster in 2D only.

5
My Idea
  • Correct the mathematical notation of the original paper
  • Develop a method for pruning the large search space
  • Implement the modified solution and obtain
    simulation results
  • Compare the modified algorithm with the original
    paper's algorithm

6
Original Paper
  • TRICLUSTER: An Effective Algorithm for Mining
    Coherent Clusters in 3D Microarray Data
  • Lizhuang Zhao, Mohammed J. Zaki
  • Rensselaer Polytechnic Institute, New York
  • ACM SIGMOD International Conference on Management
    of Data, 2005

7
Introduction
  • Traditional clustering algorithms work in the
    full dimensional space.
  • Biclustering, on the other hand, does not have
    such a strict requirement. If some points are
    similar in several dimensions (a subspace), they
    will be clustered together in that subspace.
  • Biclustering is able to identify the
    co-expression patterns of a subset of genes that
    might be relevant to a subset of the samples of
    interest.

8
Introduction (Cont.)
  • There has been a lot of interest in mining gene
    expression patterns across time. These approaches
    are also mainly two-dimensional, i.e., finding
    patterns along the gene-time dimensions.
  • The paper deals with mining tri-clusters, i.e.,
    mining coherent clusters along the
    gene-sample-time (temporal) or gene-sample-region
    (spatial) dimensions.
  • The authors claim TRICLUSTER is the first 3D
    microarray subspace clustering method.

9
Related Work
  • There has been work on mining gene expression
    patterns across time.
  • There are many full-space and biclustering
    algorithms designed to work with microarray
    datasets, such as feature based clustering, graph
    based clustering and pattern based clustering.
  • There is no previous method that mines
    tri-clusters.

10
Challenges
  • Biclustering itself is known to be an NP-hard
    problem, so heuristic methods or probabilistic
    approximations are used.
  • Microarray data is inherently susceptible to
    noise, due to varying experimental conditions,
    thus it is essential that the methods be robust
    to noise.
  • As we do not understand the complex gene
    regulation circuitry in the cell, clustering
    methods should allow overlapping clusters that
    share subsets of genes, samples or
    time-courses/spatial regions.
  • Furthermore, the methods should be flexible
    enough to mine several (interesting) types of
    clusters, and should not be too sensitive to
    input parameters.

11
Mathematical Notations Used
Let $G = \{g_0, g_1, \ldots, g_{n-1}\}$ be a set of n genes,
let $S = \{s_0, s_1, \ldots, s_{m-1}\}$ be a set of m biological samples (e.g.,
different tissues or experiments), and
let $T = \{t_0, t_1, \ldots, t_{l-1}\}$ be a set of l experimental time points.
A three dimensional microarray dataset is a
real-valued $n \times m \times l$ matrix $D = G \times S \times T = \{d_{ijk}\}$,
whose three dimensions correspond to genes,
samples and times respectively.
A tricluster C is a submatrix of the dataset D,
provided certain conditions of homogeneity are
satisfied.

12
Problems with Notation
  • The paper uses an unconventional notation system
    that seems to be incorrect.
  • This may make it hard for the reader to comprehend
    the actual meaning.
  • The Cartesian product is used incorrectly.
  • This may be solved by using a mapping (function).
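
One possible reading of the mapping suggestion, written out as a sketch. This formulation is an assumption for illustration and is not taken from the slides or the original paper. G, S and T are the gene, sample and time sets, D is the dataset, and X, Y, Z are hypothetical names for the subsets that make up a tricluster C:

```latex
% Hypothetical reformulation: the dataset as a function on the index sets,
% instead of being equated with the Cartesian product itself.
\[
  D : G \times S \times T \to \mathbb{R},
  \qquad D(g_i, s_j, t_k) = d_{ijk}
\]
% A candidate tricluster is then the restriction of D to a sub-product.
\[
  C = D\big|_{X \times Y \times Z},
  \qquad X \subseteq G,\; Y \subseteq S,\; Z \subseteq T
\]
```

Under this reading, $G \times S \times T$ is only the domain (the set of index triples), and the expression values live in the mapping D.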

13
The TRICLUSTER Algorithm
  • 3D microarray datasets typically have more genes than
    samples, and perhaps an equal number of time
    points and samples, i.e., $|G| \geq |T| \approx |S|$.
  • Due to the symmetric property, TRICLUSTER always
    transposes the input 3D matrix such that
  • the dimension with the largest cardinality (say
    G) is the 1st dimension,
  • then S becomes the 2nd and T the 3rd
    dimension.
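
The transposition step can be sketched with NumPy. This is a minimal illustration, not the authors' implementation. The function name `put_largest_axis_first` and the toy shape are assumptions, and keeping the two smaller axes in their original relative order is a simplification:

```python
import numpy as np

def put_largest_axis_first(data):
    """Return (reordered array, axis order) with the largest axis first.

    Hypothetical helper: the remaining axes keep their original relative order.
    """
    largest = int(np.argmax(data.shape))
    order = [largest] + [ax for ax in range(data.ndim) if ax != largest]
    return np.transpose(data, order), order

# Toy cube stored as 4 samples x 10 genes x 3 time points.
cube = np.zeros((4, 10, 3))
reordered, order = put_largest_axis_first(cube)
print(reordered.shape, order)  # (10, 4, 3) [1, 0, 2]
```

Returning the axis order alongside the array lets the caller map mined clusters back to the original layout.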

14
Steps of TRICLUSTER
  • TRICLUSTER has the following main steps:
  • For each GxS time slice matrix, find the valid
    ratio-ranges for all pairs of samples, and
    construct a range multigraph
  • Mine the maximal biclusters from the range
    multigraph
  • Construct a graph based on the mined biclusters
    (as vertices) and get the maximal triclusters
  • Optionally, delete or merge clusters if certain
    overlapping criteria are met.
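
The first step above can be illustrated for a single pair of samples in one time slice. This is a simplified sketch under stated assumptions, not the algorithm from the paper: all expression values are assumed positive, a group of genes counts as valid when max_ratio / min_ratio - 1 <= eps, and the names `valid_ratio_ranges`, `eps` and `min_genes` are hypothetical:

```python
import numpy as np

def valid_ratio_ranges(slice_gs, a, b, eps, min_genes):
    """Maximal gene groups whose ratios d[g, a] / d[g, b] agree within eps.

    Assumptions (not from the slides): positive values only, and a group
    is valid when max_ratio / min_ratio - 1 <= eps.
    """
    ratios = slice_gs[:, a] / slice_gs[:, b]
    order = np.argsort(ratios)          # gene indices sorted by ratio
    sorted_r = ratios[order]
    n = len(sorted_r)
    ranges = []
    end, prev_end = 0, -1
    for start in range(n):
        end = max(end, start)
        # Grow the window while the widest ratio stays within eps of the smallest.
        while end + 1 < n and sorted_r[end + 1] / sorted_r[start] - 1 <= eps:
            end += 1
        # A window is maximal only if it reaches past every earlier window.
        if end > prev_end and end - start + 1 >= min_genes:
            ranges.append(sorted(order[start:end + 1].tolist()))
        prev_end = max(prev_end, end)
    return ranges

# Toy time slice: 5 genes x 2 samples; genes 0-2 share the ratio 2.0.
slice_gs = np.array([[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [3.0, 3.0], [10.0, 2.0]])
result = valid_ratio_ranges(slice_gs, 0, 1, eps=0.01, min_genes=3)
print(result)  # [[0, 1, 2]]
```

In a fuller version, each returned group would become an edge between the two samples in the range multigraph, repeated for every pair of samples and every time slice.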

15
Future Direction
  • Reduce the search space (how is yet to be
    determined)
  • Implement the modified algorithm
  • Compare the results with those of the original
    algorithm

16
Need Suggestion
  • Here the dataset D is a data cube. What benefits
    can we gain from the data cube model in this
    case?
  • How can this problem be integrated into DataDEX (if
    that is possible)?
  • How can we reduce the search space?
  • Any other comments or suggestions are welcome.

17
References
  • [1] S. C. Madeira and A. L. Oliveira. Biclustering
    algorithms for biological data analysis: a
    survey. IEEE/ACM Transactions on Computational
    Biology and Bioinformatics, 1(1):24-45, 2004.
  • [2] D. Jiang, J. Pei, M. Ramanathan, C. Tang, and A.
    Zhang. Mining coherent gene clusters from
    gene-sample-time microarray data. In 10th ACM
    SIGKDD Conference, 2004.
  • [3] Lizhuang Zhao and Mohammed J. Zaki. TRICLUSTER:
    an effective algorithm for mining coherent
    clusters in 3D microarray data. Proceedings of
    the 2005 ACM SIGMOD International Conference on
    Management of Data, Baltimore, Maryland, 2005.