Take a Walk and Cluster Genes: A TSP-based Approach to Optimal Rearrangement Clustering

Title: Rearrangement Clustering. Author: Sharlee Climer. Last modified by: Sharlee Climer. Created: 7/5/2004 3:54:42 PM.

Learn more at: http://www.cse.wustl.edu
1
Take a Walk and Cluster Genes: A TSP-based
Approach to Optimal Rearrangement Clustering
  • Sharlee Climer and Weixiong Zhang
  • This research was supported in part by NDSEG and
    Olin Fellowships and by NSF grants IIS-0196057 and
    ITR/EIA-0113618.

2
Overview
  • Introduction
  • Example
  • Results
  • Conclusion

3
Introduction
  • Rearrangement clustering
  • Rearrange rows of a matrix
  • Minimize the sum of the differences between
    adjacent rows
  • $\min \sum_i d(i, i+1)$
  • Rows correspond to objects
  • Columns correspond to features
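
The objective on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the difference d is assumed here to be the sum of absolute feature differences, and the matrix and the names `row_difference` and `ordering_cost` are made up for the example.

```python
def row_difference(a, b):
    """Assumed difference measure: sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ordering_cost(matrix, order):
    """Sum of d(i, i+1) over adjacent rows when rows are placed in `order`."""
    return sum(
        row_difference(matrix[order[i]], matrix[order[i + 1]])
        for i in range(len(order) - 1)
    )


# Hypothetical 4-object, 4-feature matrix (rows are objects, columns are features).
matrix = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

original_cost = ordering_cost(matrix, [0, 1, 2, 3])    # rows as given
rearranged_cost = ordering_cost(matrix, [0, 2, 1, 3])  # similar rows made adjacent
print(original_cost, rearranged_cost)
```

Rearranging the rows so that similar objects sit next to each other lowers the sum, which is what rearrangement clustering optimizes.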

4
Introduction
  • Applications
  • Information retrieval
  • Manufacturing
  • Software engineering

5
Example
6
Example
  • Bond Energy Algorithm (BEA)
  • Introduced in 1972 (McCormick, Schweitzer, White)
  • Approximate solution
  • Still widely used

7
Example
8
Example
  • Optimal solution
  • Lenstra (1974) observed equivalence to the
    Traveling Salesman Problem (TSP)
  • Given n cities and the distance between each pair
  • Find shortest cycle visiting every city
  • NP-hard problem

9
Example
  • Transform into a TSP
  • Each object corresponds to a city
  • Distance between two cities equal to difference
    between the corresponding objects
  • Dummy city added to problem
  • Costs from dummy city to all other cities equal a
    constant
  • Location of dummy city indicates position to cut
    cycle into a path
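
The transformation above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: an exhaustive search stands in for a real TSP solver (so it only works for a handful of cities), the difference measure is assumed to be the sum of absolute feature differences, the dummy-city constant is assumed to be 0, and all function names are hypothetical.

```python
from itertools import permutations


def row_difference(a, b):
    """Assumed difference measure: sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_tsp_with_dummy(rows, dummy_cost=0):
    """Distance matrix over len(rows) cities plus one dummy city (last index)."""
    n = len(rows)
    dist = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            dist[i][j] = row_difference(rows[i], rows[j])
        # Every real city is the same constant distance from the dummy city.
        dist[i][n] = dist[n][i] = dummy_cost
    return dist


def solve_tsp_brute_force(dist):
    """Exhaustive search for the shortest cycle; the tour starts at the last city."""
    m = len(dist)
    start = m - 1
    best_tour, best_cost = None, float("inf")
    for perm in permutations(range(m - 1)):
        tour = (start,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % m]] for i in range(m))
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return list(best_tour), best_cost


def cut_at_dummy(tour, dummy):
    """Remove the dummy city and open the cycle into a path at that position."""
    k = tour.index(dummy)
    return tour[k + 1:] + tour[:k]


rows = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
dist = build_tsp_with_dummy(rows)
tour, cost = solve_tsp_brute_force(dist)
path = cut_at_dummy(tour, dummy=len(rows))
print(path, cost)
```

Because every edge to the dummy city costs the same, the dummy contributes a fixed amount to any tour, so the shortest cycle corresponds to the cheapest row ordering once the cycle is cut at the dummy.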

10
Example
  • In the 1970s, TSP solvers were extremely slow even
    for small problems
  • Massive research efforts to solve the TSP over the
    last three decades
  • Current solvers
  • Concorde (Applegate, Bixby, Chvátal, Cook, 2001)
  • Solved a 15,112-city TSP

11
Example
12
Example
  • BEA and TSP offer approximate and optimal
    solutions, respectively
  • We have observed a flaw in the objective function
    when the objects form natural clusters
  • The objective minimizes the sum of the differences
    between every pair of adjacent rows
  • Inter-cluster distances tend to be significantly
    larger than intra-cluster distances
  • Summation dominated by inter-cluster distances
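
A small worked example shows how one inter-cluster gap can dominate the summation. The data are made up for illustration: six one-dimensional objects forming two obvious clusters.

```python
# Hypothetical 1-D objects, already in sorted order, forming two natural clusters.
points = [0, 1, 2, 100, 101, 102]

# Differences between adjacent objects in this ordering.
gaps = [abs(b - a) for a, b in zip(points, points[1:])]

total = sum(gaps)            # value of the rearrangement objective
inter_cluster = max(gaps)    # the single jump between the two clusters
share = inter_cluster / total

print(gaps, total, round(share, 3))
```

Here a single inter-cluster distance accounts for more than 95% of the objective, so the intra-cluster distances have almost no influence on the score.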

13
Example
  • TSPCluster addresses this flaw
  • Add k dummy cities
  • k clusters are specified by the output
  • TSP solver ignores inter-cluster distances
  • Minimizes sum of intra-cluster distances
  • Use sufficiently small constant for distances
    to/from dummy cities
  • Dummy cities never adjacent to each other
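
The idea of adding k dummy cities can be sketched as below. This is a toy illustration rather than TSPCluster itself: an exhaustive search replaces the TSP solver, the dummy-city constant is assumed to be 0, and a very large dummy-to-dummy distance is one assumed way of keeping dummy cities from being adjacent. The function name and data are hypothetical.

```python
from itertools import permutations


def tsp_cluster_brute_force(dist, k, dummy_cost=0, forbid=10**9):
    """Cluster n cities into k groups by solving a TSP with k dummy cities.

    Cities 0..n-1 are real; cities n..n+k-1 are dummies.
    """
    n = len(dist)
    m = n + k

    def d(a, b):
        if a >= n and b >= n:
            return forbid        # assumed: discourage adjacent dummy cities
        if a >= n or b >= n:
            return dummy_cost    # constant cost to or from any dummy city
        return dist[a][b]

    start = n  # fix the first dummy city as the start of the cycle
    others = [c for c in range(m) if c != start]
    best_tour, best_cost = None, float("inf")
    for perm in permutations(others):
        tour = (start,) + perm
        cost = sum(d(tour[i], tour[(i + 1) % m]) for i in range(m))
        if cost < best_cost:
            best_tour, best_cost = tour, cost

    # Each dummy city marks a cluster boundary in the optimal cycle.
    clusters, current = [], []
    for city in best_tour[1:]:
        if city >= n:
            clusters.append(current)
            current = []
        else:
            current.append(city)
    clusters.append(current)
    return clusters, best_cost


points = [0, 1, 2, 100, 101, 102]  # hypothetical 1-D objects
dist = [[abs(a - b) for b in points] for a in points]
clusters, cost = tsp_cluster_brute_force(dist, k=2)
print(clusters, cost)
```

With a dummy cost of 0, the tour cost equals the sum of intra-cluster distances only: the large jump between the two groups is replaced by dummy edges and no longer enters the objective.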

14
Example
15
Results
  • Arabidopsis
  • 499 genes
  • 25 conditions
  • Comparison with BEA
  • Used BEA similarity measure
  • BEA score 447,070
  • TSPCluster score 452,109 (k = 1)

16
Results
BEA
TSPCluster
17
Results
  • Compared with Cluster (Eisen et al., 1998) and
    k-ary (Bar-Joseph et al., 2003)
  • Used Pearson correlation coefficient
  • Cluster: 398
  • k-ary: 427
  • TSPCluster: 436 (k = 1)
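
For reference, a Pearson-based score for an ordering might be computed as sketched below. The slides do not say how the reported scores were computed; summing the Pearson correlation coefficient over adjacent rows is an assumption made for this illustration, and the data and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def adjacent_similarity_score(matrix, order):
    """Assumed score: sum of Pearson correlations between adjacent rows."""
    return sum(
        pearson(matrix[order[i]], matrix[order[i + 1]])
        for i in range(len(order) - 1)
    )


# Hypothetical expression profiles: rows 0 and 1 are perfectly correlated,
# row 2 is perfectly anti-correlated with both.
matrix = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
score = adjacent_similarity_score(matrix, [0, 1, 2])
print(round(score, 6))
```

Under this assumption a higher score is better, since it means adjacent rows are more strongly correlated.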

18
Results
Cluster
k-ary
TSPCluster
19
Results
  • TSPCluster with k ranging from 2 to 50
  • How many clusters?
  • Average inter-cluster distances
  • BEA local peaks
  • 6, 13, 19, 26, 29, 35, 40, 47
  • Pearson correlation coefficient local peaks
  • 3, 9, 12, 21, 26, 40
  • Computation time varied
  • From less than half a minute to 3 minutes
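
Picking candidate values of k from local peaks can be sketched as follows. The numbers are made up and are not the Arabidopsis results; the function name `local_peaks` is hypothetical, and a peak is assumed to mean a value strictly larger than both of its neighbors.

```python
def local_peaks(values_by_k):
    """Return the k values whose score exceeds both neighboring k values."""
    ks = sorted(values_by_k)
    return [
        k
        for prev, k, nxt in zip(ks, ks[1:], ks[2:])
        if values_by_k[k] > values_by_k[prev] and values_by_k[k] > values_by_k[nxt]
    ]


# Hypothetical average inter-cluster distances for k = 2..7.
avg_inter_cluster = {2: 5.0, 3: 7.5, 4: 6.0, 5: 6.5, 6: 9.0, 7: 4.0}
peaks = local_peaks(avg_inter_cluster)
print(peaks)
```

Each peak marks a k at which the clusters are, on average, better separated than at the neighboring values of k.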

20
Results
k = 26
k = 40
21
Conclusion
  • Most problems have errors in their data
  • Error introduced by approximation algorithms
    can't be expected to undo this error
  • Computers are cheap
  • Computers and solvers are sophisticated
  • Don't always have to resort to approximate
    solutions, even for NP-hard problems

22
Conclusion
  • Rearrangement clustering provides a linear
    ordering
  • Linear ordering inherent to many applications
  • Information retrieval
  • Manufacturing
  • Software engineering

23
Conclusion
  • Gene data arranged in linear order to examine
    data
  • Linear ordering not necessarily essential to gene
    clustering problems
  • Current work
  • Optimally solve subproblems in clustering
    algorithms

24
Questions?