1
Robust Information-theoretic Clustering
  • By C. Bohm, C. Faloutsos, J-Y. Pan, and C. Plant
  • Presenter: Niyati Parikh

2
Objective
  • Find the natural clustering in a dataset
  • Two questions to answer:
  • How do we measure the goodness of a clustering?
  • What is an efficient algorithm for finding a good
    clustering?

3
Define goodness
  • Goodness = ability to describe the clusters succinctly
  • Adopt VAC (Volume after Compression) as the measure
  • Record the number of bytes needed to store the number
    of clusters k
  • Record the number of bytes needed to store the type of
    each cluster (Gaussian, uniform, ...)
  • Record the compressed location of each point

4
VAC
  • Tells which grouping is better
  • Lower VAC => better grouping
  • Formula uses the decorrelation matrix
  • Decorrelation matrix = matrix of eigenvectors

5
Computing VAC
  • Steps
  • Compute covariance matrix of cluster C
  • Compute PCA and obtain eigenvector matrix
  • Compute VAC from the matrix
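
The three steps above can be sketched for 2-D data as follows. This is a minimal illustration, not the paper's formula: the slides do not give the VAC expression, so `coding_cost_bits` is an assumed stand-in that charges each decorrelated coordinate its Gaussian code length at a hypothetical grid `resolution`. The function names are also illustrative. The closed-form 2x2 eigendecomposition keeps the sketch free of dependencies; higher dimensions would need a general eigen-solver.

```python
import math

def covariance_2d(points):
    """Sample mean and covariance entries (a, b, c) of [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (a, b, c)

def decorrelate_2d(points):
    """PCA in 2-D: rotate centered points onto the covariance eigenvectors."""
    (mx, my), (a, b, c) = covariance_2d(points)
    theta = 0.5 * math.atan2(2 * b, a - c)    # direction of the first eigenvector
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    eigenvalues = (half_trace + radius, half_trace - radius)
    rotated = [((p[0] - mx) * cos_t + (p[1] - my) * sin_t,
                -(p[0] - mx) * sin_t + (p[1] - my) * cos_t) for p in points]
    return eigenvalues, rotated

def coding_cost_bits(points, resolution=0.01):
    """ASSUMED stand-in for VAC: Gaussian code length of each decorrelated
    coordinate, using the eigenvalue as that axis's variance."""
    eigenvalues, rotated = decorrelate_2d(points)
    bits = 0.0
    for axis, var in enumerate(eigenvalues):
        var = max(var, 1e-12)                 # guard against a degenerate axis
        for p in rotated:
            nll = (0.5 * math.log2(2 * math.pi * var)
                   + p[axis] ** 2 / (2 * var * math.log(2)))
            bits += nll - math.log2(resolution)
    return bits

cluster = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 5)]
eigenvalues, rotated = decorrelate_2d(cluster)
print(eigenvalues, coding_cost_bits(cluster))
```

Under this stand-in, a tighter cluster needs fewer bits than a spread-out one, which is the property the slides rely on when they say lower VAC means better grouping.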

6
Efficient algorithm
  • Take initial clustering given by any algorithm
  • Refine that clustering to remove outliers/noise
  • Output a better clustering through post-processing

7
Refining Clusters
  • Use VAC to refine existing clusters
  • Remove outliers from the given cluster C
  • Define Core and Out as the sets of core points and
    outliers in C
  • Initially, Out contains all points in C
  • Arrange the points in ascending order of their
    distance from the center
  • Compute VAC
  • Pick the closest point in Out and move it to Core
  • Compute the new VAC
  • If the new VAC increases, stop; otherwise pick the
    next closest point and repeat
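
The refinement loop above can be sketched as a greedy procedure. The slides do not say how VAC is evaluated for a Core/Out split, so the sketch takes the cost as a caller-supplied function. `toy_cost` below is a hypothetical stand-in (a per-point cost that grows with the spread of Core, plus a flat `noise_bits` charge per outlier), used only to make the example run.

```python
import math

def refine_cluster(points, center, cost):
    """Greedily move points from Out to Core, closest to the center first.

    `cost(core, out)` is a hypothetical stand-in for VAC.
    """
    out = sorted(points, key=lambda p: math.dist(p, center))  # ascending distance
    core = []
    best = cost(core, out)
    while out:
        candidate_core = core + [out[0]]   # move the closest remaining point
        candidate_out = out[1:]
        new = cost(candidate_core, candidate_out)
        if new > best:                     # cost increased: keep the previous split
            break
        core, out, best = candidate_core, candidate_out, new
    return core, out

def toy_cost(core, out, noise_bits=4.0):
    """ASSUMED cost: core points pay log2(1 + spread of Core) bits each,
    outliers pay a flat noise_bits each."""
    spread = 0.0
    if core:
        cx = sum(p[0] for p in core) / len(core)
        cy = sum(p[1] for p in core) / len(core)
        spread = max(math.dist(p, (cx, cy)) for p in core)
    return len(core) * math.log2(1 + spread) + noise_bits * len(out)

points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (10, 10)]
core, out = refine_cluster(points, center=(0, 0), cost=toy_cost)
print(core, out)
```

With this toy cost, the five points near the origin end up in Core and the distant point (10, 10) stays in Out, because adding it would inflate the spread term for every core point.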

8
VAC and Robust estimation
  • Conventional estimation: covariance matrix uses the
    mean
  • Robust estimation: covariance matrix uses the median
  • The median is less affected by outliers than the mean

9
Sample result
  • Imperfect clusters formed by K-Means affect the
    purifying process
  • May result in redundant clusters that could be merged

10
Cluster Merging
  • Merge Ci and Cj only if the combined VAC decreases
  • savedCost(Ci, Cj) = VAC(Ci) + VAC(Cj) - VAC(Ci ∪ Cj)
  • If savedCost > 0, then merge Ci and Cj
  • Greedy search to maximize savedCost, hence
    minimize VAC
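
The merging rule can be sketched as a greedy loop that repeatedly merges the pair with the largest positive saved cost. The slides do not spell out the search, so this is one plausible reading. `toy_cost` is a hypothetical stand-in for VAC on 1-D points: a fixed `overhead` per cluster plus a per-point cost that grows with the cluster's range.

```python
import math
from itertools import combinations

def saved_cost(ci, cj, cost):
    """savedCost(Ci, Cj) = cost(Ci) + cost(Cj) - cost(Ci ∪ Cj)."""
    return cost(ci) + cost(cj) - cost(ci + cj)

def greedy_merge(clusters, cost):
    """Merge the pair with the largest saved cost until no merge saves anything."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: saved_cost(clusters[ij[0]], clusters[ij[1]], cost))
        if saved_cost(clusters[i], clusters[j], cost) <= 0:
            break                          # no remaining merge lowers the total cost
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

def toy_cost(cluster, overhead=8.0):
    """ASSUMED cost: fixed overhead per cluster plus log2(1 + range) bits per point."""
    return overhead + len(cluster) * math.log2(1 + max(cluster) - min(cluster))

result = greedy_merge([[0, 1], [2, 3], [50, 51]], toy_cost)
print(result)
```

Here the two neighbouring clusters are merged because sharing one overhead outweighs the slightly larger range, while the distant cluster stays separate.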

11
Final Result

12
Experiment results

13
Example

14
Thank You
  • Questions?