
Clustering and Self-Organizing Feature Map

- KAIST

Clustering

Introduction

- Cluster
- A group of similar objects
- Clustering
- A special method of classification
- Unsupervised learning: no predefined classes

What is Good Clustering?

- High intra-cluster similarity
- Objects are similar to one another within the same cluster
- Low inter-cluster similarity
- Objects are dissimilar to the objects in other clusters
- Depends on the similarity measure used

The problem of unsupervised clustering

- Nearly identical to that of distribution estimation for classes with multi-modal features
- Example: 4 data sets with the same mean and covariance

Similarity Measures

- Based on the distance between samples
- If the Euclidean distance between two samples is less than some threshold distance d0, they are placed in the same cluster (sketch below)
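
A minimal sketch of this threshold rule, assuming illustrative sample points and an illustrative threshold d0:

```python
import numpy as np

def same_cluster(x1, x2, d0=1.0):
    """Two samples are grouped together if their Euclidean distance is below d0."""
    return np.linalg.norm(np.asarray(x1, float) - np.asarray(x2, float)) < d0

print(same_cluster([0.0, 0.0], [0.3, 0.4], d0=1.0))  # True: distance 0.5 < 1.0
print(same_cluster([0.0, 0.0], [3.0, 4.0], d0=1.0))  # False: distance 5.0 >= 1.0
```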

Scaling Axes Effect

Normalization

- To achieve invariance to scaling, normalize the data
- Subtract the mean and divide by the standard deviation
- Inappropriate if the spread is due to the presence of subclasses
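
A minimal sketch of this normalization step, using illustrative data:

```python
import numpy as np

def zscore(X):
    """Standardize each feature: subtract its mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Features measured on very different scales become comparable after normalization.
X = np.array([[170.0, 60000.0],
              [180.0, 90000.0],
              [160.0, 30000.0]])
print(zscore(X))
```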

Similarity function

- Similarity function between vectors s(x, x')
- Based on the angle between two vectors, the normalized inner product s(x, x') = x^T x' / (||x|| ||x'||) may be an appropriate similarity function (sketch below)
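
A small sketch of the normalized inner product as a similarity function (the example vectors are illustrative):

```python
import numpy as np

def cosine_similarity(x, y):
    """Normalized inner product: cosine of the angle between x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

print(cosine_similarity([1, 0], [1, 1]))  # about 0.707 (45-degree angle)
```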

Tanimoto coefficient

- Using binary values
- The ratio of the number of shared attributes to

the number possessed by x or x
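
A small sketch of the Tanimoto coefficient for binary attribute vectors (the example vectors are illustrative):

```python
import numpy as np

def tanimoto(x, y):
    """Shared attributes divided by attributes possessed by x or y (binary vectors)."""
    x, y = np.asarray(x), np.asarray(y)
    shared = float(x @ y)
    return shared / (float(x @ x) + float(y @ y) - shared)

print(tanimoto([1, 1, 0, 1], [1, 0, 0, 1]))  # 2 shared / 3 possessed by either = 0.667
```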

Distance between Sets

Criterion function for clustering

- Sum-of-squared-error criterion (see the formula below)
- Also called the minimum-variance partition
- Problematic when the natural groupings have very different numbers of points
- General form
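
The sum-of-squared-error criterion in its usual form, where m_i is the mean of cluster D_i, n_i its size, and c the number of clusters (the symbols are named here for clarity):

```latex
J_e = \sum_{i=1}^{c} \sum_{\mathbf{x} \in D_i} \lVert \mathbf{x} - \mathbf{m}_i \rVert^2,
\qquad
\mathbf{m}_i = \frac{1}{n_i} \sum_{\mathbf{x} \in D_i} \mathbf{x}
```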

Category of Clustering Method

- Iterative optimization
- Move a randomly selected point to another cluster if this improves the criterion
- Hierarchical Clustering
- Group objects into a tree of clusters
- AGNES (Agglomerative Nesting)
- DIANA (Divisive Analysis)
- Partitioning Clustering
- Construct a partition of the object set V into k clusters (k is a user input parameter)
- K-means
- K-medoids

Hierarchical Clustering

- Sequence of partitions of N samples into C clusters
- 1. Start with N clusters (one per sample)
- 2. Merge the nearest two clusters, leaving N-1 clusters
- 3. Repeat until the number of clusters is C
- Dendrogram
- Agglomerative: bottom-up
- Divisive: top-down

Hierarchical Method

agglomerative (AGNES)

divisive (DIANA)

Hierarchical Method

- Algorithm for agglomerative clustering (see the sketch after this list)
- Input: a set V of objects
- Put each object in its own cluster
- Loop until the number of clusters is one
- Calculate the inter-cluster similarities
- Merge the most similar pair of current clusters
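
A minimal single-linkage sketch of this loop (the data, cluster count, and helper names are illustrative assumptions):

```python
import numpy as np
from itertools import combinations

def agglomerative(points, n_clusters=1):
    """Naive single-linkage agglomerative clustering: start with one cluster per
    object and repeatedly fuse the most similar (closest) pair of clusters."""
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]          # each object in its own cluster
    while len(clusters) > n_clusters:
        # inter-cluster distance = minimum pairwise distance (single linkage)
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(np.linalg.norm(points[i] - points[j])
                               for i in clusters[ab[0]] for j in clusters[ab[1]]))
        clusters[a] += clusters[b]                         # fuse the most similar pair
        del clusters[b]
    return clusters

print(agglomerative([[0, 0], [0, 1], [5, 5], [5, 6]], n_clusters=2))  # [[0, 1], [2, 3]]
```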

Hierarchical Method

- Inter-cluster similarity (linkage) methods
- Single-Linkage
- Complete-Linkage
- Average-Linkage

Nearest-neighbor algorithm

- When dmin is used, the algorithm is called the nearest-neighbor algorithm
- If it is terminated when the distance between the nearest clusters exceeds an arbitrary threshold, it is called the single-linkage algorithm
- Generates a minimum spanning tree
- Chaining effect: a defect of this distance measure (see figure)

K-means

- Uses the center of gravity (mean) of the objects in each cluster
- Algorithm (see the sketch after this list)
- Input: k (the number of clusters), a set V of n objects
- Output: a set of k clusters that minimizes the sum-of-squared-error criterion
- Method
- Choose k objects as the initial cluster centers; set i = 0
- Loop
- For each object v, find the nearest center and assign v to that cluster
- Compute the mean of each cluster as its new center
- Pro: quick convergence
- Con: sensitive to noise, outliers, and initial seed selection
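
A minimal K-means sketch following these steps (the sample data, iteration cap, and seed are illustrative assumptions):

```python
import numpy as np

def kmeans(V, k, n_iter=100, seed=0):
    """Assign each object to the nearest center, then recompute each center
    as the mean of its cluster; stop when the centers no longer move."""
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    centers = V[rng.choice(len(V), size=k, replace=False)]   # k objects as initial centers
    for _ in range(n_iter):
        # assignment step: nearest center for every object
        labels = np.argmin(np.linalg.norm(V[:, None] - centers[None, :], axis=2), axis=1)
        # update step: mean of each cluster (keep the old center if a cluster is empty)
        new_centers = np.array([V[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):                 # converged
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]], k=2)
print(labels, centers)
```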

K-means is sensitive to the initial points

K-means clustering

- Choose k initial cluster centers for V

K-means clustering

- Assign each object to the cluster whose center it is closest to
- Compute the center of each cluster

K-means clustering

- Reassign objects to the cluster whose centroid is nearest

Graph Theoretic Approach

- Removal of inconsistent edges (edges much longer than nearby edges) splits the graph into clusters

Self-Organizing Feature Maps

- Clustering method based on competitive learning
- Only one neuron per group fires at any one time
- Winner-takes-all: the winning neuron
- Winner-takes-all can be realized by lateral inhibitory connections

Self-Organizing Feature Map

- Neurons are placed at the nodes of a lattice
- Usually one- or two-dimensional
- Neurons become selectively tuned to input patterns (stimuli)
- By a competitive learning process
- The locations of the tuned neurons become ordered
- Formation of a topographic map of the input patterns
- Spatial locations of neurons in the lattice correspond to intrinsic statistical features contained in the input patterns
- SOM is a non-linear generalization of PCA

SOFM motivated by human brain

- The brain is organized in such a way that different sensory data are represented by topologically ordered computational maps
- Tactile, visual, and acoustic sensory inputs are mapped onto areas of the cerebral cortex in a topologically ordered manner
- These maps are building blocks of the information-processing infrastructure of the nervous system

SOFM motivated by human brain

- Neurons transform input signals into a place-coded probability distribution
- Represented by sites of maximum relative activity within the map
- Accessed by higher-order processors through simple connections
- Each piece of incoming information is kept in its proper context
- Neurons dealing with closely related information lie close together, so they can be connected via short connections

Kohonen Model

- Captures the essential features of computational maps in the brain
- Capable of dimensionality reduction

Kohonen Model

- Transforms an incoming signal pattern into a discrete map
- Of one or two dimensions
- In an adaptive, topologically ordered fashion
- Topology-preserving transformation
- A class of vector coding algorithm
- Optimally maps inputs onto a fixed number of code words
- An input pattern is represented as a localized region or spot of activity in the network
- After initialization, three essential processes:
- Competition
- Cooperation
- Synaptic adaptation

Competitive Process

- Find the best match of the input vector with the synaptic weight vectors
- x = [x1, x2, ..., xm]^T
- wj = [wj1, wj2, ..., wjm]^T, j = 1, 2, ..., l
- Best-matching (winning) neuron:
- i(x) = arg min_j ||x - wj||, j = 1, 2, ..., l
- Determines the location where the topological neighborhood of excited neurons is to be centered
- The continuous input space is mapped onto a discrete output space of neurons by the competitive process

Cooperative Process

- For a winning neuron, the neurons in its immediate neighborhood are excited more than those farther away
- The topological neighborhood decays smoothly with lateral distance
- Symmetric about the maximum point defined by d_{j,i} = 0
- Monotonically decreasing to zero as d_{j,i} goes to infinity
- Neighborhood function: Gaussian case (see the formula below)
- The size of the neighborhood shrinks with time
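
For the Gaussian case the neighborhood function is commonly written as below, where d_{j,i} is the lateral distance between neuron j and winning neuron i, and the width σ(n) shrinks with time n (σ0 and τ1 are constants named here for clarity):

```latex
h_{j,i(\mathbf{x})}(n) = \exp\!\left(-\frac{d_{j,i}^{2}}{2\sigma^{2}(n)}\right),
\qquad
\sigma(n) = \sigma_{0} \exp\!\left(-\frac{n}{\tau_{1}}\right)
```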

Typical window function

Adaptive process

- The synaptic weight vector is changed in relation to the input vector
- wj(n+1) = wj(n) + η(n) h_{j,i(x)}(n) (x - wj(n))
- Applied to all neurons inside the neighborhood of the winning neuron i
- Effect: moves the weight wj toward the input vector x
- Upon repeated presentation of the training data, the weights tend to follow the input distribution
- The learning rate η(n) may decay with time

SOFM algorithm

- 1. Initialize the weights wj with random numbers (see the sketch after these steps)
- 2. For input x(n), find the nearest cell
- i(x) = arg min_j ||x(n) - wj(n)||
- 3. Update the weights of the neighbors
- wj(n+1) = wj(n) + η(n) h_{j,i(x)}(n) [x(n) - wj(n)]
- 4. Reduce the neighborhood size and η
- 5. Go to 2
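
A compact sketch of these five steps (the grid size, step count, and decay schedules are illustrative assumptions, not values from the slides):

```python
import numpy as np

def train_som(X, grid=(10, 10), n_steps=5000, eta0=0.1, sigma0=3.0, seed=0):
    """Minimal SOM: competition (nearest cell), cooperation (Gaussian neighborhood),
    adaptation (move weights toward the input), with shrinking neighborhood and rate."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))                  # 1. random initial weights
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    tau = n_steps / np.log(sigma0)
    for n in range(n_steps):
        x = X[rng.integers(len(X))]
        i = np.argmin(np.linalg.norm(W - x, axis=1))           # 2. nearest cell i(x)
        sigma = sigma0 * np.exp(-n / tau)                      # 4. neighborhood shrinks
        d2 = np.sum((coords - coords[i]) ** 2, axis=1)
        h = np.exp(-d2 / (2 * sigma ** 2))                     # Gaussian neighborhood
        eta = eta0 * np.exp(-n / n_steps)                      # 4. learning rate decays
        W += eta * h[:, None] * (x - W)                        # 3. weight update
    return W

# 2-D uniform input on a 10x10 lattice, as in the computer simulation slide
X = np.random.default_rng(1).random((1000, 2))
W = train_som(X)
print(W[:5])
```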

Computer Simulation

- Input samples: random numbers within the 2-D unit square
- 100 neurons (10x10)
- Initial weights: random assignment in (0.0, 1.0)
- Display
- Each neuron is positioned at (w1, w2)
- Neighbors are connected by lines
- Next slide
- 2nd example: Figures 9.8 and 9.9

SOFM Example (1): 2-D lattice driven by a 2-D distribution


Topologically ordered map development (2)

Topologically ordered map development (3)

Topologically ordered map development (5)

Topologically ordered map development (1-D array of neurons)

SOFM Example (2): Phoneme Recognition

- Phonotopic maps

- Recognition result for the word "humppila"

SOFM Example (3)

- http://www-ti.informatik.uni-tuebingen.de/goeppert/KohonenApp/KohonenApp.html
- http://davis.wpi.edu/matt/courses/soms/applet.html

Summary of SOM

- A continuous input space of activation patterns that are generated in accordance with a certain probability distribution
- The topology of the network in the form of a lattice of neurons, which defines a discrete output space
- A time-varying neighborhood function defined around the winning neuron
- A learning rate that decreases gradually with time, but never goes to zero

Vector Quantization

- VQ: a data compression technique
- The input space is divided into distinct regions
- Each region is represented by a reproduction (representative) vector
- Code words, code book
- Voronoi quantizer
- Nearest-neighbor rule based on the Euclidean metric
- Learning Vector Quantization (LVQ)
- A supervised learning technique
- Moves Voronoi vectors slightly in order to improve the quality of classification decisions

Voronoi Tessellation

Learning Vector Quantization

- Suppose wc is the Voronoi vector closest to the input xi
- Let Cwc be the class of wc
- Let Cxi be the class label of xi
- If Cwc = Cxi, then
- wc(n+1) = wc(n) + αn [xi - wc(n)]
- Otherwise
- wc(n+1) = wc(n) - αn [xi - wc(n)]
- The other Voronoi vectors are not modified
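
A minimal sketch of one LVQ update (the learning rate and example vectors are illustrative assumptions):

```python
import numpy as np

def lvq_step(W, classes, x, cx, alpha=0.05):
    """One LVQ update: W holds the Voronoi vectors, classes their labels,
    and (x, cx) is a labeled training sample."""
    c = np.argmin(np.linalg.norm(W - x, axis=1))   # closest Voronoi vector wc
    if classes[c] == cx:
        W[c] += alpha * (x - W[c])                 # same class: move wc toward x
    else:
        W[c] -= alpha * (x - W[c])                 # different class: move wc away from x
    return W                                       # all other Voronoi vectors unchanged

W = np.array([[0.0, 0.0], [1.0, 1.0]])
classes = np.array([0, 1])
print(lvq_step(W, classes, x=np.array([0.2, 0.1]), cx=0))
```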

Adaptive Pattern Classification

- Combination of feature extraction and classification
- Feature extraction
- Unsupervised, by SOFM
- Captures the essential information content of the input data
- Data reduction / dimensionality reduction effect
- Classification
- Supervised scheme such as an MLP