
Spectral Visual Clustering Tendency

- L. Wang, X. Geng, J. C. Bezdek, C. Leckie, and K. Ramamohanarao, "SpecVAT: Enhanced visual cluster analysis," in Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM '08), Dec. 2008, pp. 638-647.
- School of Engineering, The University of Melbourne, Vic 3010, Australia

Clustering

Conventional K-means Clustering

1) Choose k initial "means" (in this case k = 3).

2) Associate every observation with the nearest mean.

3) The centroid of each of the k clusters becomes the new mean.

4) Repeat steps 2 and 3 until convergence has been reached.
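The four steps can be sketched in a few lines of plain Python. This is a minimal illustration, not an optimized implementation: the function names `sq_dist` and `kmeans` are made up for this example, the initial means are passed in explicitly so the run is reproducible, and "convergence" is assumed to mean that the assignment stops changing.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_means, max_iter=100):
    """Lloyd-style k-means; returns (labels, means)."""
    means = list(initial_means)              # step 1: k initial means
    k = len(means)
    labels = None
    for _ in range(max_iter):
        # Step 2: associate every observation with the nearest mean.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, means[c]))
                      for p in points]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep the old mean if a cluster is empty
                means[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, means


# Three well-separated groups of three points each (illustrative data).
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
labels, means = kmeans(points, initial_means=[(0, 0), (10, 10), (20, 0)])
print(labels)
```

Note that k has to be supplied by the caller, which is exactly the question raised next.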

How do we determine k?

Determining the Number of Clusters

- Determining before clustering: cluster tendency analysis
- Determining after clustering: cluster validity measurement

[Figure: pipeline diagram. Input -> Cluster Tendency Analysis -> Clustering -> Cluster Validity Measurement -> Output]

Visual Analysis of Cluster Tendency (VAT)

[Figure: scatter plot of a 2D data set; unordered image I(D); reordered VAT image I(D)]

J. C. Bezdek and R. J. Hathaway. VAT: A tool for visual assessment of (cluster) tendency. In Proc. International Joint Conference on Neural Networks, pages 2225-2230, 2002.

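Dissimilarity Matrix

For n objects, the dissimilarity matrix D holds the dissimilarity d_ij between objects o_i and o_j (for example, d12 between objects 1 and 2).

[Figure: five objects labelled 1 to 5, their dissimilarity matrix, and the corresponding dissimilarity image]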
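As a small illustration, the matrix and its grey-scale image can be computed as follows. Euclidean distance is an assumption here (the slides do not fix the dissimilarity measure), and the helper names `dissimilarity_matrix` and `to_image` are invented for this sketch.

```python
import math


def dissimilarity_matrix(objects):
    """n x n matrix D with D[i][j] = Euclidean distance between o_i and o_j."""
    return [[math.dist(oi, oj) for oj in objects] for oi in objects]


def to_image(D):
    """Scale D to grey levels 0..255: dark = similar, bright = dissimilar."""
    max_d = max(max(row) for row in D)
    return [[round(255 * d / max_d) for d in row] for row in D]


# Five illustrative 2D objects.
objects = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (3.0, 5.0), (1.0, 0.0)]
D = dissimilarity_matrix(objects)
d12 = D[0][1]          # dissimilarity between objects 1 and 2 (0-based indices 0 and 1)
image = to_image(D)
print(d12)
```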

Reordered Dissimilarity Matrix

[Figure: scatter plot of a 2D data set with objects labelled 1 to 5, and the dissimilarity matrix D before and after reordering its rows and columns]

Example

VAT Algorithm

[Figure: dissimilarity matrix and dissimilarity image for five objects, with the maximum dissimilarity marked]
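The reordering itself is not spelled out in the text, so the following is a sketch under an assumption: it uses the Prim-like ordering usually attributed to Bezdek and Hathaway's VAT, which starts at an object involved in the maximum dissimilarity and then repeatedly appends the unselected object closest to any already selected one. The names `vat_order` and `reorder` are hypothetical.

```python
def vat_order(D):
    """Return a VAT-style ordering of the objects for dissimilarity matrix D."""
    n = len(D)
    # Start from an object that takes part in the maximum dissimilarity.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Append the unselected object nearest to the selected set.
        nxt = min(sorted(remaining),
                  key=lambda j: min(D[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(D, order):
    """Permute rows and columns of D according to the ordering."""
    return [[D[i][j] for j in order] for i in order]


# Five objects on a line, deliberately listed in a shuffled order.
xs = [0, 10, 1, 11, 2]
D = [[abs(a - b) for b in xs] for a in xs]
order = vat_order(D)
D_reordered = reorder(D, order)
print(order)
```

After reordering, objects from the same group sit next to each other, so small dissimilarities gather in blocks along the diagonal of the image.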

Problem of VAT

[Figure: scatter plots and reordered VAT images of 9 synthetic data sets. From left to right and from top to bottom: S-1 to S-9]

Spectral Clustering

[Figure: scatter plot of a 2D data set, with the results of K-means clustering and of spectral clustering]

U. von Luxburg. A tutorial on spectral clustering. Technical report, Max Planck Institute for Biological Cybernetics, Germany, 2006.

Spectral Graph

[Figure: a similarity graph and its connected groups]

Similarity Graph

- Vertex set
- Weighted adjacency matrix

Similarity Graph

- ε-neighborhood graph
- k-nearest neighbor graphs
- Fully connected graph (Gaussian similarity function)

[Figure: ε-neighborhood graph and k-nearest neighbor graph built on the same points]
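The three graph constructions can be sketched as below. The exact definitions are assumptions taken from common usage rather than from the slides: the Gaussian similarity is taken as exp(-||xi - xj||^2 / (2 sigma^2)), the ε-graph is unweighted, and the k-nearest-neighbor graph is symmetrized by connecting two points if either is among the other's k nearest. All function names are illustrative.

```python
import math


def gaussian_similarity(points, sigma=1.0):
    """Fully connected graph: w_ij = exp(-||xi - xj||^2 / (2 sigma^2))."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
            for p in points]


def epsilon_graph(points, eps):
    """Connect points whose distance is at most eps (no self-loops)."""
    n = len(points)
    return [[1 if i != j and math.dist(points[i], points[j]) <= eps else 0
             for j in range(n)] for i in range(n)]


def knn_graph(points, k):
    """Connect i and j if either one is among the other's k nearest neighbors."""
    n = len(points)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            W[i][j] = W[j][i] = 1
    return W


# Two small groups of points (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
W_eps = epsilon_graph(points, eps=1.5)
W_knn = knn_graph(points, k=1)
W_full = gaussian_similarity(points, sigma=1.0)
for row in W_eps:
    print(row)
```

With these points and eps = 1.5, the ε-graph splits into the two groups {1, 2, 3} and {4, 5}.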

Graph Laplacian

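$L = D - W$

L: Laplacian matrix

W: adjacency matrix

D: degree matrix, with $d_i = \sum_j w_{ij}$

$$
D = \begin{pmatrix}
d_1 & 0 & 0 & 0 & 0 \\
0 & d_2 & 0 & 0 & 0 \\
0 & 0 & d_3 & 0 & 0 \\
0 & 0 & 0 & d_4 & 0 \\
0 & 0 & 0 & 0 & d_5
\end{pmatrix},
\qquad
W = \begin{pmatrix}
w_{11} & w_{12} & w_{13} & w_{14} & w_{15} \\
w_{21} & w_{22} & w_{23} & w_{24} & w_{25} \\
w_{31} & w_{32} & w_{33} & w_{34} & w_{35} \\
w_{41} & w_{42} & w_{43} & w_{44} & w_{45} \\
w_{51} & w_{52} & w_{53} & w_{54} & w_{55}
\end{pmatrix}
$$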

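Example

[Figure: similarity graph on vertices 1 to 5 with edges 1-2, 1-3, 2-3 and 4-5]

$$
W = \begin{pmatrix}
0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix},
\qquad
D = \begin{pmatrix}
2 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$

$$
L = D - W = \begin{pmatrix}
2 & -1 & -1 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 \\
-1 & -1 & 2 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & -1 & 1
\end{pmatrix}
$$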
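The example matrices can be reproduced with a short script. It assumes the unnormalized Laplacian L = D - W, which matches the numbers in the example; `degree_matrix` and `laplacian` are names chosen for this sketch.

```python
def degree_matrix(W):
    """Diagonal matrix whose i-th entry is the sum of row i of W."""
    n = len(W)
    return [[sum(W[i]) if i == j else 0 for j in range(n)] for i in range(n)]


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    D = degree_matrix(W)
    n = len(W)
    return [[D[i][j] - W[i][j] for j in range(n)] for i in range(n)]


# Adjacency matrix of the five-vertex example graph.
W = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
D = degree_matrix(W)
L = laplacian(W)
for row in L:
    print(row)
```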

Property of Graph Laplacian

- L is symmetric and positive semi-definite.
- The smallest eigenvalue of L is 0; the corresponding eigenvector is the constant one vector 1.
- L has n non-negative, real-valued eigenvalues $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$.
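These properties can be checked numerically on the five-vertex example. The sketch below relies on the standard identity x^T L x = (1/2) * sum_ij w_ij (x_i - x_j)^2, which is not stated in the slides but explains why L is positive semi-definite; the helper names are illustrative and the random test vectors are only a spot check, not a proof.

```python
import random

W = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
L = [[2, -1, -1, 0, 0],
     [-1, 2, -1, 0, 0],
     [-1, -1, 2, 0, 0],
     [0, 0, 0, 1, -1],
     [0, 0, 0, -1, 1]]


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def quadratic_form(M, x):
    """x^T M x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(M, x)))


def edge_sum(W, x):
    """(1/2) * sum over i, j of w_ij * (x_i - x_j)^2."""
    n = len(W)
    return 0.5 * sum(W[i][j] * (x[i] - x[j]) ** 2
                     for i in range(n) for j in range(n))


is_symmetric = all(L[i][j] == L[j][i] for i in range(5) for j in range(5))
ones_in_null_space = matvec(L, [1] * 5) == [0] * 5      # L 1 = 0, so eigenvalue 0

rng = random.Random(0)
samples = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(100)]
forms_agree = all(abs(quadratic_form(L, x) - edge_sum(W, x)) < 1e-9 for x in samples)
non_negative = all(quadratic_form(L, x) >= -1e-12 for x in samples)
print(is_symmetric, ones_in_null_space, forms_agree, non_negative)
```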

Eigenvalue and Eigenvector of Graph Laplacian

Connected Component -> Constant Eigenvector

Example

For the Laplacian matrix L of the five-vertex similarity graph, whose connected components are {1, 2, 3} and {4, 5}:

Two Connected Components -> Double Zero Eigenvalue

Eigenvectors: $f_1 = (1, 1, 1, 0, 0)^T$, $f_2 = (0, 0, 0, 1, 1)^T$
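A quick way to see this is to find the connected components, build one indicator vector per component, and confirm that L maps each of them to the zero vector. The depth-first search and the function names are choices made for this sketch.

```python
W = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
L = [[2, -1, -1, 0, 0],
     [-1, 2, -1, 0, 0],
     [-1, -1, 2, 0, 0],
     [0, 0, 0, 1, -1],
     [0, 0, 0, -1, 1]]


def connected_components(W):
    """Connected components of the graph with adjacency matrix W (0-based)."""
    n = len(W)
    seen, components = set(), []
    for s in range(n):
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:                      # iterative depth-first search
            v = stack.pop()
            comp.append(v)
            for u in range(n):
                if W[v][u] and u not in seen:
                    seen.add(u)
                    stack.append(u)
        components.append(sorted(comp))
    return components


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


components = connected_components(W)
indicators = [[1 if i in comp else 0 for i in range(len(W))] for comp in components]
images = [matvec(L, f) for f in indicators]   # each should be the zero vector
print(components, indicators, images)
```

Each indicator vector is therefore an eigenvector of L with eigenvalue 0, one per connected component.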

Example

First Two Eigenvectors

[Figure: similarity graph on vertices 1 to 5]

      W adjacency matrix        First two eigenvectors
      v1  v2  v3  v4  v5        u1  u2
v1     0   1   1   0   0         1   0
v2     1   0   1   0   0         1   0
v3     1   1   0   0   0         1   0
v4     0   0   0   0   1         0   1
v5     0   0   0   1   0         0   1

For any block diagonal L, the spectrum of L is given by the union of the spectra of its blocks Li.
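The union-of-spectra statement can be verified on the example with NumPy (assumed to be available). The blocks `L1` and `L2` are the Laplacians of the two connected components; `numpy.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

# Laplacians of the two connected components {1, 2, 3} and {4, 5}.
L1 = np.array([[2, -1, -1],
               [-1, 2, -1],
               [-1, -1, 2]], dtype=float)
L2 = np.array([[1, -1],
               [-1, 1]], dtype=float)

# Block diagonal Laplacian of the whole graph.
L = np.block([[L1, np.zeros((3, 2))],
              [np.zeros((2, 3)), L2]])

spec_L = np.linalg.eigvalsh(L)
spec_union = np.sort(np.concatenate([np.linalg.eigvalsh(L1),
                                     np.linalg.eigvalsh(L2)]))
print(np.round(spec_L, 6))
print(np.round(spec_union, 6))
```

Both blocks contribute one zero eigenvalue, which is where the double zero eigenvalue of L comes from.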

Spectral Clustering

First k Eigenvectors -> New Clustering Space

[Figure: similarity graph on vertices 1 to 5]

Row i of the matrix [u1 u2] is the new representation yi of object i:

      u1  u2
y1     1   0
y2     1   0
y3     1   0
y4     0   1
y5     0   1

Use k-means clustering in the new space.
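Putting the pieces together, a minimal spectral clustering sketch with NumPy could look like this. It assumes the unnormalized Laplacian L = D - W used in the example; the k-means part uses a deterministic farthest-point initialization, which is a choice made for this sketch so the output is reproducible, and the function names are hypothetical. Because the zero eigenvalue is repeated, the numerical eigenvectors need not equal the 0/1 indicator vectors, but rows belonging to the same connected component still coincide.

```python
import numpy as np


def spectral_embedding(W, k):
    """Rows y_i of the matrix holding the first k eigenvectors of L = D - W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return eigvecs[:, :k]


def kmeans(Y, k, max_iter=100):
    """Plain k-means on the rows of Y with farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    labels = None
    for _ in range(max_iter):
        dists = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                          # assignment is stable
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels


W = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
Y = spectral_embedding(W, k=2)
labels = kmeans(Y, k=2)
print(labels.tolist())
```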

Spectral Clustering

[Figure: scatter plot of a 2D data set, with the results of K-means clustering and of spectral clustering]

Spectral VAT (SpecVAT)

[Figure: scatter plots and the corresponding reordered VAT images]

SpecVAT Algorithm

Input: data

1. Construct similarity matrix W
2. Construct Laplacian matrix L
3. Choose first k eigenvectors u1, ..., uk
4. Construct new dissimilarity matrix D

      u1  u2  u3
y1     1   0   0
y2     1   0   0
y3     1   0   0
y4     0   1   0
y5     0   1   0
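The four steps, followed by a VAT reordering of the new matrix, can be sketched with NumPy as below. Several details are assumptions rather than the paper's exact procedure: a Gaussian similarity with a single global `sigma`, the unnormalized Laplacian L = D - W as in the earlier example, Euclidean distances between the rows yi as the new dissimilarities, and a Prim-like VAT ordering. The names `specvat` and `vat_order` are illustrative.

```python
import numpy as np


def vat_order(D):
    """Prim-like VAT ordering of the objects for dissimilarity matrix D."""
    n = D.shape[0]
    order = [int(np.argmax(D.max(axis=1)))]
    remaining = [j for j in range(n) if j != order[0]]
    while remaining:
        sub = D[np.ix_(order, remaining)]
        nxt = remaining[int(np.argmin(sub.min(axis=0)))]
        order.append(nxt)
        remaining.remove(nxt)
    return order


def specvat(X, k, sigma=1.0):
    """Return (reordered new dissimilarity matrix, ordering)."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))            # 1. similarity matrix
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                # 2. Laplacian matrix
    _, U = np.linalg.eigh(L)
    Y = U[:, :k]                                  # 3. first k eigenvectors
    D_new = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)  # 4. new D
    order = vat_order(D_new)
    return D_new[np.ix_(order, order)], order


# Two well-separated groups, interleaved so the input order hides the structure.
X = [(0, 0), (8, 8), (0, 1), (8, 9), (1, 0), (9, 8)]
D_reordered, order = specvat(X, k=2)
print(order)
print(np.round(D_reordered, 3))
```

In the reordered matrix the two groups appear as two dark diagonal blocks with bright off-diagonal blocks.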

SpecVAT Images

[Figure: original VAT image, SpecVAT images with different k, and the desired result]

SpecVAT Image Analysis

[Figure: VAT images and the histograms of the VAT images]

Good VAT Image? Clarity and Block Structure

SpecVAT Image Analysis

[Figure: histogram with a within-cluster part and a between-cluster part]

- Within-cluster variance $\sigma_W$
- Between-cluster variance $\sigma_B$
- Desired distribution: small $\sigma_W$ and $\sigma_B$

Goodness Measurement of VAT Images

Test all $T = 1, \dots, 255$ to find the smallest $\sigma_B$
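The threshold search can be illustrated as follows. The text does not spell out how the two variances are combined into one score, so this sketch assumes an Otsu-style criterion: for each threshold T the pixels are split into a dark part (below T, within-cluster) and a bright part (at or above T, between-cluster), and the weighted sum of the two variances is minimized. The function names and the sample data are made up.

```python
def variance(values):
    """Population variance of a non-empty list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_threshold(pixels):
    """Try T = 1..255 and return (T, score) with the smallest weighted variance."""
    n = len(pixels)
    best_T, best_score = None, float("inf")
    for T in range(1, 256):
        dark = [p for p in pixels if p < T]       # within-cluster part
        bright = [p for p in pixels if p >= T]    # between-cluster part
        if not dark or not bright:
            continue                              # threshold does not split the data
        score = (len(dark) * variance(dark) + len(bright) * variance(bright)) / n
        if score < best_score:
            best_T, best_score = T, score
    return best_T, best_score


# An idealized VAT image: half the pixels dark, half bright.
pixels = [10] * 50 + [200] * 50
T, score = best_threshold(pixels)
print(T, score)
```

A small score means two sharp histogram peaks, which corresponds to a VAT image with clear dark blocks on a bright background.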

Determining the Number of Clusters

Test all $k = 1, \dots, k_{max}$ to find the smallest $\sigma_B$

Scatter plots of S-1 data

Scatter plots of S-5 data

Visual Clustering

[Figure: scatter plot, with a good partition and a bad partition of the VAT image into diagonal blocks C1, C2, C3]

Good partition: dark within-region and bright between-region.

A genetic algorithm is applied in the paper.
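To make the "dark within, bright between" idea concrete, the sketch below scores a partition of a reordered dissimilarity matrix into contiguous diagonal blocks. Both the score (mean between-block value minus mean within-block value) and the exhaustive search over cut positions are assumptions made for illustration; the paper uses a genetic algorithm and its objective is not given in the text. All names are hypothetical.

```python
from itertools import combinations


def partition_score(D, cuts):
    """Mean between-block dissimilarity minus mean within-block dissimilarity."""
    n = len(D)
    bounds = [0, *cuts, n]
    block = [0] * n
    for b in range(len(bounds) - 1):
        for i in range(bounds[b], bounds[b + 1]):
            block[i] = b
    within, between = [], []
    for i in range(n):
        for j in range(i + 1, n):
            (within if block[i] == block[j] else between).append(D[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(between) - mean(within)


def best_partition(D, c):
    """Exhaustively pick the c - 1 cut positions with the highest score."""
    n = len(D)
    return max(combinations(range(1, n), c - 1),
               key=lambda cuts: partition_score(D, cuts))


# Reordered matrix with two clear blocks: objects 0-2 and objects 3-4.
D = [[0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1],
     [1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0]]
cuts = best_partition(D, c=2)
print(cuts, partition_score(D, cuts))
```

Exhaustive search is only practical for small n and few clusters, which is one reason a heuristic search is attractive for real images.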

Result VAT Images

[Figure: scatter plots, original VAT images and SpecVAT images for S-1, S-2 and S-3]

Result VAT Images

[Figure: scatter plots, original VAT images and SpecVAT images for S-4, S-5 and S-6]

Results

[27] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In Proc. Advances in Neural Information Processing Systems, 2004.

Conclusions

- VAT is enhanced by using spectral analysis.
- Based on SpecVAT, the cluster structure can be estimated by visual inspection.
- The number of clusters can be estimated automatically.