Title: Clustering of the SOM using a clustering validity index based on inter-cluster and intra-cluster density
Clustering of the SOM using a clustering validity index based on inter-cluster and intra-cluster density
Sitao Wu, Tommy W. S. Chow
Pattern Recognition 37 (2004) 175-188
Summarized by Hwang, Seong-Seob
January 2005
Contents
- Introduction
- Self-organizing map and clustering
- Clustering of the SOM using a local clustering validity index and preprocessing of the SOM for filtering
- Experimental results
- Conclusions
Introduction (1/3)
- Self-organizing map (SOM)
  - Proposed by Kohonen
  - Many industrial applications: pattern recognition, biological modeling, data compression, signal processing, data mining
  - An unsupervised, nonparametric neural-network approach
  - The success of the SOM algorithm lies in its simplicity, which makes it easy to understand, simulate, and use in many applications
Introduction (2/3)
- The basic SOM consists of a set of neurons, usually arranged in a 2D structure
  - Neighborhood relations exist among the neurons
- After completion of training, each neuron is attached to a feature vector of the same dimension as the input space
- Vector quantization (VQ)
  - By assigning each input vector to the neuron with the nearest feature vector, the SOM divides the input space into regions with common nearest feature vectors
- Topology preservation
  - If two feature vectors are near each other in the input space, the corresponding neurons will also be close in the output space
- The SOM is suitable for visualization purposes
Introduction (3/3)
- Clustering algorithms
  - Organize unlabeled input vectors into clusters or natural groups such that points within a cluster are more similar to each other than to vectors belonging to different clusters
- Applications
  - Exploratory pattern analysis, grouping, decision making, machine learning, data mining, document retrieval, image segmentation, pattern classification
- Five types of clustering
  - Hierarchical, partitioning, density-based, grid-based, and model-based clustering
Self-organizing map and visualization (1/3)
- Competitive learning is an adaptive process
  - A division of neural nodes emerges in the network to represent different patterns of the inputs after training
  - The division is enforced by competition among the neurons
- The SOM consists of M neurons located on a regular low-dimensional grid, usually one- or two-dimensional
  - Higher-dimensional grids are possible, but their visualization is problematic
- The lattice of the grid is either hexagonal or rectangular
Self-organizing map and visualization (2/3)
- The basic SOM algorithm
  - The winning neuron c is the neuron with the feature vector closest to the input: $c = \arg\min_i \|x(t) - w_i(t)\|$
  - A set of neighboring nodes of the winning node is updated together with the winner
  - The weight update (see the sketch below): $w_i(t+1) = w_i(t) + \varepsilon(t)\, h_{ic}(t)\, [x(t) - w_i(t)]$
- Notation: $i$ is a neuron index; $w_i$ its feature vector; $x(t)$ the data vector; $c$ the winning neuron; $N_c$ the set of neighboring nodes; $h_{ic}(t)$ the neighborhood kernel function, e.g. $h_{ic}(t) = \exp\left(-\|Pos_i - Pos_c\|^2 / 2\sigma(t)^2\right)$; $Pos_i$ the coordinates of neuron $i$; $\sigma(t)$ the kernel width (decreasing monotonically); $\varepsilon(t)$ the learning rate (decreasing monotonically)
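A minimal NumPy sketch of one SOM training step under the usual assumptions (Gaussian neighborhood kernel, linearly decaying kernel width and learning rate). The grid shape, decay schedules, and initial values here are illustrative choices, not taken from the paper.

```python
import numpy as np

def som_step(W, pos, x, t, T, sigma0=2.0, eps0=0.5):
    """One update of the basic SOM: find the winner c, then pull every
    neuron i toward x(t), weighted by the neighborhood kernel h_ic(t).
    W: (M, d) feature vectors; pos: (M, 2) grid coordinates of the neurons."""
    frac = t / T
    sigma = sigma0 * (1.0 - frac) + 1e-3                 # kernel width, decreasing monotonically
    eps = eps0 * (1.0 - frac) + 1e-4                     # learning rate, decreasing monotonically
    c = np.argmin(np.linalg.norm(W - x, axis=1))         # winning neuron
    h = np.exp(-np.sum((pos - pos[c])**2, axis=1) / (2 * sigma**2))  # h_ic(t)
    W += eps * h[:, None] * (x - W)                      # w_i(t+1) = w_i(t) + eps*h_ic*(x - w_i)
    return W

# toy usage: a 4x4 map trained on random 2D data, 100 epochs as in the paper
rng = np.random.default_rng(0)
grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
W = rng.random((16, 2))
X = rng.random((200, 2))
T = 100 * len(X)
for t in range(T):
    W = som_step(W, grid, X[t % len(X)], t, T)
```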
Self-organizing map and visualization (3/3)
- Because of the neighborhood relations, neighboring neurons are pulled in the same direction
  - Feature vectors of neighboring neurons resemble each other
- The 2D map can be easily visualized and thus gives useful information about the input data
- The usual way to display the cluster structure of the data is a distance matrix, such as the U-matrix
  - The U-matrix method displays the SOM grid according to the distances between neighboring neurons (see the sketch below)
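A sketch of a simplified U-matrix for a rectangular grid: each cell stores the average distance from a neuron's feature vector to those of its 4-connected grid neighbors. This is a common reduced form of the full U-matrix, assumed here for brevity.

```python
import numpy as np

def u_matrix(W, rows, cols):
    """Average distance from each neuron to its 4-connected grid neighbors.
    W is (rows*cols, d), indexed row-major. Plotted as an image, high values
    (large neighbor distances) mark the borders between clusters."""
    W = W.reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(W[r, c] - W[rr, cc]))
            U[r, c] = np.mean(dists)
    return U
```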
Clustering algorithms (1/3)
- Partitioning clustering
  - Given a database of n objects, construct k partitions (k ≤ n)
- k-means algorithm (see the sketch below)
  - Each cluster is represented by the mean value of the objects in the cluster
- Advantages
  - The clustering is dynamic
  - Some a priori knowledge, such as cluster shapes, can be incorporated into the clustering
- Drawbacks
  - It has difficulty discovering clusters of arbitrary shapes
  - The number of clusters is fixed in advance, and the optimal number of clusters is hard to determine
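To see the arbitrary-shape drawback concretely, a short scikit-learn sketch on data shaped like the paper's 2D synthetic set (three shallow, elongated, parallel clusters); the data generation here is hypothetical, for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# three shallow, elongated, parallel stripes in the 2D plane
X = np.vstack([np.column_stack([rng.uniform(0, 10, 70), rng.normal(y0, 0.2, 70)])
               for y0 in (0.0, 2.0, 4.0)])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# k-means tends to cut the stripes crosswise rather than recover them,
# illustrating its difficulty with non-spherical cluster shapes
```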
Clustering algorithms (2/3)
- Hierarchical clustering (see the sketch below)
  - A hierarchical decomposition of the given dataset
  - It can be classified as either agglomerative or divisive
- Advantage
  - It is not affected by initialization and local minima
- Shortcomings
  - It is impractical for large data sets due to its high computational complexity
  - It does not incorporate any a priori knowledge, such as cluster shapes
  - The clustering is static
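A minimal agglomerative example with SciPy; the toy data and the choice of single linkage are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in ((0, 0), (3, 0), (0, 3))])
Z = linkage(X, method='single')                    # agglomerative, single-linkage merges
labels = fcluster(Z, t=3, criterion='maxclust')    # cut the dendrogram at 3 clusters
# The O(n^2) pairwise-distance cost is what makes this impractical for large
# data sets; clustering the SOM's neurons instead keeps n small (two-level approach)
```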
Clustering algorithms (3/3)
- Four definitions of inter-cluster distance: single linkage (minimum pairwise distance), complete linkage (maximum pairwise distance), centroid linkage (distance between centroids), and average linkage (average pairwise distance)
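The four definitions, written out for two point sets A and B:

```python
import numpy as np
from scipy.spatial.distance import cdist

def single(A, B):   return cdist(A, B).min()                       # nearest pair
def complete(A, B): return cdist(A, B).max()                       # farthest pair
def average(A, B):  return cdist(A, B).mean()                      # mean over all pairs
def centroid(A, B): return np.linalg.norm(A.mean(0) - B.mean(0))   # centroid distance
```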
Clustering of the SOM
- The two-level approach to clustering of the SOM
  - Different symbols on the map represent different clusters
- Clustering algorithms can be applied to the output neurons of the SOM
- What if the clusters have nonspherical shapes?
  - Partitioning clustering fails (X); hierarchical clustering succeeds (O)
- The extended SOM (minimum distance variance)
Global clustering validity index (1/4)
- Evaluation criteria are needed to justify the correctness of a partition
- The index is based on two widely accepted concepts
  - A cluster's compactness and separation
- The implementation of most validity algorithms is computationally very intensive, especially for very large databases
- Dependent on the data and the number of clusters
- Using the sample mean of each subset vs. all points in each subset
Global clustering validity index (2/4)
- Intra-cluster density
- The intra-cluster density is high for
well-separated clusters
Global clustering validity index (3/4)
- Inter-cluster density
  - The density in the region between clusters should be significantly low
Global clustering validity index (4/4)
- Clusters' separation
- CDbw
  - Composing Density Between and Within clusters (see the simplified sketch below)
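The full CDbw definition in the paper uses multiple representative points per cluster; the sketch below is a deliberately simplified single-representative version that keeps only the structure of the index (intra-cluster density times separation, with separation penalized by inter-cluster density). The radii and counting rules here are assumptions for illustration, not the paper's formulas.

```python
import numpy as np

def density(points, center, radius):
    """Fraction of points within `radius` of `center` (a neighborhood count)."""
    if len(points) == 0:
        return 0.0
    return np.mean(np.linalg.norm(points - center, axis=1) <= radius)

def cdbw_like(X, labels):
    """CDbw-style index: high intra-cluster density and large, sparse gaps
    between clusters both push the value up."""
    ids = np.unique(labels)
    clusters = [X[labels == k] for k in ids]
    centers = [c.mean(axis=0) for c in clusters]
    stdevs = [c.std(axis=0).mean() + 1e-12 for c in clusters]
    # intra-cluster density: high when points pack around their center
    intra = np.mean([density(c, m, s) for c, m, s in zip(clusters, centers, stdevs)])
    # inter-cluster density, measured at midpoints between cluster centers,
    # is intended to be low for a good partition
    sep_terms = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            mid = (centers[i] + centers[j]) / 2
            r = (stdevs[i] + stdevs[j]) / 2
            both = np.vstack([clusters[i], clusters[j]])
            inter = density(both, mid, r)
            d = np.linalg.norm(centers[i] - centers[j])
            sep_terms.append(d / (1.0 + inter))   # separation shrinks if the gap is dense
    sep = np.mean(sep_terms) if sep_terms else 0.0
    return intra * sep                            # compactness x separation
```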
Merging criterion using the CDbw (1/2)
- The inter- and intra-cluster densities are incorporated into the merging criterion, in addition to distance information
- Compute the CDbw for the data belonging to each neighboring pair of clusters
- The merging mechanism finds the pair of clusters with the minimal value of the CDbw (see the sketch below)
  - These two clusters have the strongest tendency to be merged
- The advantage of the merging mechanism is that the clustering result is more accurate, because more information about the individual clusters is considered
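A sketch of the merge selection, reusing the cdbw_like function from the earlier snippet. The neighbor test is abstracted into a caller-supplied predicate, since on the SOM it depends on grid adjacency; both names are illustrative.

```python
import numpy as np

def pick_merge_pair(X, labels, cluster_ids, are_neighbors, cdbw):
    """Among directly neighboring cluster pairs, return the pair whose
    two-cluster CDbw is minimal, i.e. the pair that least resembles two
    well-separated clusters and so should be merged first."""
    best, best_val = None, np.inf
    for i, a in enumerate(cluster_ids):
        for b in cluster_ids[i + 1:]:
            if not are_neighbors(a, b):
                continue                          # only direct neighbors may merge
            mask = np.isin(labels, (a, b))
            val = cdbw(X[mask], labels[mask])     # CDbw on just this pair's data
            if val < best_val:
                best, best_val = (a, b), val
    return best
```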
Merging criterion using the CDbw (2/2)
(figure only)
Preprocessing before clustering of the SOM (1/2)
- Neurons with no input data assigned
  - Not included in the next clustering steps
- Compute $dev_j = \|w_j - m_j\|$, and the mean (mean_dev) and standard deviation (std_dev) of the $dev_j$
  - $w_j$: the feature vector of neuron j; $m_j$: the corresponding mean vector
- If $dev_j$ > mean_dev + std_dev
  - Exclude neuron j from the clustering later on
- This mechanism can filter out the input outliers
Preprocessing before clustering of the SOM (2/2)
- Compute $dis_j(x_i) = \|x_i - w_j\|$, and mean_dis_j and std_dis_j over the inputs assigned to neuron j
  - $x_i$: input vector
- If $\|x_i - w_j\|$ > mean_dis_j + std_dis_j
  - Filter out the input vector $x_i$ for the next clustering steps
- This can filter out the input outliers and noise
- Compute the number of data belonging to the jth cluster, num_j
- Compute the statistics mean_num and std_num over all the num_j
- If num_j < mean_num - std_num
  - Exclude neuron j from the clustering later on
- This can filter out the input noise
(The three filters above are sketched in code below.)
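A sketch of the three filters on the two preprocessing slides. The one-standard-deviation thresholds follow the slides' formulas as reconstructed above; the function signature (in particular passing the mean vectors m_j as an array M) is an assumption.

```python
import numpy as np

def preprocess(W, M, X, bmu):
    """Filter neurons and inputs before clustering the SOM.
    W: (m, d) neuron feature vectors; M: (m, d) the mean vectors m_j;
    X: (n, d) inputs; bmu: (n,) index of the neuron each input is assigned to."""
    # 1) drop deviant neurons: dev_j = ||w_j - m_j|| > mean_dev + std_dev
    dev = np.linalg.norm(W - M, axis=1)
    keep_neuron = dev <= dev.mean() + dev.std()
    # 2) drop outlying inputs: ||x_i - w_j|| > mean_dis_j + std_dis_j
    dis = np.linalg.norm(X - W[bmu], axis=1)
    keep_input = np.ones(len(X), dtype=bool)
    for j in range(len(W)):
        mask = bmu == j
        if mask.any():
            keep_input[mask] = dis[mask] <= dis[mask].mean() + dis[mask].std()
    # 3) drop underpopulated neurons: num_j < mean_num - std_num
    num = np.bincount(bmu, minlength=len(W))
    keep_neuron &= num >= num.mean() - num.std()
    # neurons with no input data assigned are excluded as well
    keep_neuron &= num > 0
    return keep_neuron, keep_input
```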
Clustering of the SOM (1/2)
- After the preprocessing for clustering of the SOM, some neurons and some input data are excluded
- The remaining neurons and input data can be hierarchically clustered
- Rectangular grids are used for the SOM
- The merging process happens only for neighboring clusters, which means the neurons belonging to the pair of clusters are direct neighbors
Clustering of the SOM (2/2)
(a) Neuron A has eight direct neighboring neurons B-I
(b) Multi-neuron neighboring clusters 1 and 2 can be merged into one cluster because the two clusters are direct neighbors
(c) Multi-neuron clusters 1 and 2 cannot be merged into one cluster because the two clusters are not direct neighbors
The algorithm of clustering of the SOM
- Train the SOM on the input data
- Preprocess before clustering of the SOM
- Cluster the SOM using agglomerative hierarchical clustering
  - The merging criterion is based on the CDbw for all pairs of directly neighboring clusters
- Find the optimal partition of the input data
  - According to the CDbw for all the input data as a function of the number of clusters
(A schematic sketch of the merging loop follows.)
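A schematic sketch of the agglomerative loop, assuming the illustrative helpers from the earlier snippets (pick_merge_pair as `merge`, cdbw_like as `cdbw`); none of these names come from the paper.

```python
import numpy as np

def cluster_som(X, labels, are_neighbors, cdbw, merge):
    """Agglomerative clustering of the (preprocessed) SOM neurons' data.
    Start from one cluster per surviving neuron, repeatedly merge the
    neighboring pair with minimal pairwise CDbw, and keep the partition
    whose global CDbw over all input data is maximal."""
    history = []
    while len(np.unique(labels)) > 1:
        ids = list(np.unique(labels))
        pair = merge(X, labels, ids, are_neighbors, cdbw)   # min pairwise CDbw
        if pair is None:
            break                                           # no neighboring pairs remain
        labels = np.where(labels == pair[1], pair[0], labels)
        history.append((cdbw(X, labels), labels.copy()))    # global index per partition
    if not history:
        return labels
    best_score, best_labels = max(history, key=lambda h: h[0])
    return best_labels
```

In a full implementation the neighbor predicate would be updated as clusters merge (two clusters are neighbors when any of their neurons are direct grid neighbors); that bookkeeping is abstracted away here.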
Experimental results
- To demonstrate the effectiveness of the proposed clustering algorithm, four data sets were used in the experiments
- The input data are normalized so that the value of each datum in each dimension lies in [0, 1] (see the snippet below)
- For training the SOM, the authors used 100 training epochs, with the learning rate decreasing from 1 to 0.0001
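The per-dimension [0, 1] normalization is the standard min-max scaling; a minimal sketch:

```python
import numpy as np

def minmax_normalize(X):
    """Scale each dimension of X linearly into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)  # guard constant dimensions
```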
The synthetic data set in the 2D plane
- 2D data set (200 points)
  - Three shallow, elongated, parallel clusters in the 2D plane
  - Some noise and outliers
- Preprocessing
- Using the SOM algorithm
- k-means, four different hierarchical clustering algorithms, and the proposed algorithm were used to cluster the SOM
Three partitions of the synthetic data set
(a) the proposed algorithm
(b) k-means
(c) single-linkage
(d) complete-linkage
(e) centroid-linkage
(f) average-linkage
2D synthetic data set
- Three clusters for the synthetic data set are displayed on the map by the proposed algorithm or the single-linkage clustering algorithm on the SOM (SOM map sizes of 4×4 and 6×6)
- The CDbw as a function of the number of clusters for the synthetic data set
Iris data set (1/3)
- Iris data
  - Widely used in pattern classification
  - 150 data points of four dimensions
  - Three classes with 50 points each
  - The first class is linearly separable from the other two
  - The other two classes overlap to some extent and are not linearly separable
Iris data set (2/3)
- Two clusters for the iris data set are identified by (a) the proposed algorithm and (b) the single-linkage clustering algorithm on the SOM (SOM map size of 4×4)
- For the known three classes, three clusters are formed (SOM map size of 4×4) for the iris data set
- The CDbw as a function of the number of clusters for the iris data
Iris data set (3/3)
- Performance comparison of different clustering
algorithms for the iris data set
15D synthetic data set (1/2)
- The statistical information of 20 clusters for
the 15D synthetic data set
15D synthetic data set (2/2)
- Twenty clusters for the 15D synthetic data set are displayed on the map by the proposed algorithm on the SOM (SOM map size of 8×8)
Wine data set (1/2)
- 13D data with three known classes (178 points: 59, 71, and 48)
- Three clusters for the wine data set are displayed on the map by the proposed clustering algorithm on the SOM (SOM map size of 4×4)
- The CDbw as a function of the number of clusters for the wine data set by the proposed algorithm on the SOM
Wine data set (2/2)
- Performance comparison of different clustering
algorithms for the wine data set
Conclusion (1/2)
- A new SOM-based clustering algorithm
  - Uses the clustering validity index locally to determine which pair of clusters to merge
  - The optimal number of clusters can be determined by the maximum value of the CDbw, the clustering validity index computed globally for all input data
- Compared with classical clustering methods on the SOM, the proposed algorithm utilizes more information about the data in each cluster, in addition to inter-cluster distances
- The proposed algorithm clusters data better than the classical clustering algorithms on the SOM
Conclusion (2/2)
- Preprocessing steps filter out noise and outliers
  - They increase the accuracy and robustness of clustering of the SOM
- The experimental results demonstrate that the proposed clustering algorithm is better than the other clustering algorithms on the SOM