Transcript and Presenter's Notes

Title: Clustering


1
Clustering
  • k-Means,
  • hierarchical clustering,
  • Self-Organizing Maps

2
Outline
  • k-means clustering
  • Hierarchical clustering
  • Self-Organizing Maps

3
Classification vs. Clustering
Classification: Supervised learning
4
Classification vs. Clustering
Clustering: Unsupervised learning. Labels unknown;
find the natural grouping of instances
5
Many Clustering Applications
  • Basically, anywhere labels are unknown,
    uncertain, or too expensive to obtain
  • Marketing: find groups of similar customers
  • Astronomy: find groups of similar stars, galaxies
  • Earthquake studies: cluster earthquake
    epicenters along continental faults
  • Genomics: find groups of genes with similar
    expression

6
Clustering Methods: Terminology
Non-overlapping
Overlapping
7
Clustering Methods: Terminology
Bottom-up (agglomerative)
Top-down
8
Clustering Methods: Terminology
Hierarchical
(vs flat)
9
Clustering Methods: Terminology
Deterministic
Probabilistic
10
k-Means Clustering
11
K-means clustering (k=3)
Pick k random points as initial cluster centers
12
K-means clustering (k=3)
Assign each point to nearest cluster center
13
K-means clustering (k=3)
Move cluster centers to mean of each cluster
14
K-means clustering (k=3)
Reassign points to nearest cluster center
15
K-means clustering (k=3)
Repeat steps 3-4 until the cluster centers converge
(don't move, or hardly move)
16
K-means
  • Works with numeric data only
  • Pick k random points as initial cluster centers
  • Assign every item to its nearest cluster center
    (e.g. using Euclidean distance)
  • Move each cluster center to the mean of its
    assigned items
  • Repeat steps 2-3 until convergence (change in
    cluster assignments less than a threshold); see
    the sketch below
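
A minimal sketch of these steps in Python, assuming numpy is available (the function name kmeans and its parameters are illustrative, not from the slides):

  import numpy as np

  def kmeans(X, k, max_iter=100, seed=0):
      rng = np.random.default_rng(seed)
      # Step 1: pick k random points as the initial cluster centers
      centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
      labels = None
      for _ in range(max_iter):
          # Step 2: assign every item to its nearest center (Euclidean distance)
          dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
          new_labels = dists.argmin(axis=1)
          # Convergence: stop once the cluster assignments no longer change
          if labels is not None and np.array_equal(new_labels, labels):
              break
          labels = new_labels
          # Step 3: move each center to the mean of its assigned items
          for j in range(k):
              if np.any(labels == j):
                  centers[j] = X[labels == j].mean(axis=0)
      return centers, labels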

17
K-means clustering: another example
http://www.youtube.com/watch?feature=player_embedded&v=BVFG7fd1H30
18
Discussion
  • Result can vary significantly depending on the
    initial choice of centers
  • Can get trapped in a local minimum
  • Example
  • To increase the chance of finding the global
    optimum: restart with different random seeds
    (see the sketch below)
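
A sketch of the restart idea, reusing the illustrative kmeans function from the slide 16 sketch and keeping the run with the lowest within-cluster sum of squared distances:

  def kmeans_restarts(X, k, n_restarts=10):
      best = None
      for seed in range(n_restarts):
          centers, labels = kmeans(X, k, seed=seed)  # sketch from slide 16
          sse = ((X - centers[labels]) ** 2).sum()   # lower is better
          if best is None or sse < best[0]:
              best = (sse, centers, labels)
      return best[1], best[2]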

19
Discussion: circular data
  • Arbitrary results
  • Prototypes do not lie on the data

20
K-means clustering: summary
  • Advantages
  • Simple, understandable
  • Instances automatically assigned to clusters
  • Fast
  • Disadvantages
  • Must pick the number of clusters beforehand
  • All instances forced into a single cluster
  • Sensitive to outliers
  • Random algorithm, random results
  • Not always intuitive, especially in higher
    dimensions

21
K-means variations
  • k-medoids: instead of the mean, use the median of
    each cluster
  • Mean of 1, 3, 5, 7, 1009 is 205
  • Median of 1, 3, 5, 7, 1009 is 5
    (both checked in the snippet below)
  • For large databases, use sampling
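
A quick check of that arithmetic, using only the Python standard library:

  from statistics import mean, median

  data = [1, 3, 5, 7, 1009]
  print(mean(data))    # 205  (dragged upward by the outlier 1009)
  print(median(data))  # 5    (unaffected by the outlier)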
22
How to choose k?
  • One important parameter: k. But how to choose it?
  • Domain dependent: we simply want k clusters
  • Alternative: repeat for several values of k and
    choose the best (see the sketch below)
  • Example:
  • cluster mammals by their properties
  • each value of k leads to a different clustering
  • use an MDL-based encoding for the data in the
    clusters
  • each additional cluster introduces a penalty
  • optimal for k = 6
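
The deck scores each k with an MDL-based encoding; a simpler, commonly used stand-in (an assumption, not the deck's exact method) is within-cluster error plus a per-cluster penalty, reusing the kmeans sketch from slide 16:

  import numpy as np

  def choose_k(X, k_values):
      n, d = X.shape
      penalty = d * np.log(n)  # per-cluster penalty (assumed, BIC-style form)
      scores = {}
      for k in k_values:
          centers, labels = kmeans(X, k)
          sse = ((X - centers[labels]) ** 2).sum()
          # error term + model-size penalty; smaller is better
          scores[k] = n * np.log(sse / n) + k * penalty
      return min(scores, key=scores.get)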

23
Clustering Evaluation
  • Manual inspection
  • Benchmarking on existing labels
  • Classification through clustering
  • Is this fair?
  • Cluster quality measures
  • distance measures
  • high similarity within a cluster, low across
    clusters (e.g. the silhouette score, sketched
    below)
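
One standard measure of this kind is the silhouette score. A minimal sketch, assuming scikit-learn and numpy are installed:

  import numpy as np
  from sklearn.cluster import KMeans
  from sklearn.metrics import silhouette_score

  X = np.random.default_rng(0).normal(size=(200, 2))
  labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
  # Near +1: compact, well-separated clusters; near 0: poor clustering
  print(silhouette_score(X, labels))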

24
Hierarchical Clustering
25
Hierarchical clustering
  • Hierarchical clustering is represented in a
    dendrogram
  • a tree structure containing hierarchical clusters
  • individual clusters in the leaves, unions of
    child clusters in the internal nodes

26
Bottom-up vs. top-down clustering
  • Bottom-up / agglomerative
  • Start with single-instance clusters
  • At each step, join the two closest clusters
    (see the sketch below)
  • Top-down
  • Start with one universal cluster
  • Split it into two clusters
  • Proceed recursively on each subset
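
A minimal agglomerative sketch that also draws the dendrogram of slide 25, assuming scipy and matplotlib are installed:

  import numpy as np
  from scipy.cluster.hierarchy import linkage, dendrogram
  import matplotlib.pyplot as plt

  X = np.random.default_rng(0).normal(size=(20, 2))
  # Bottom-up: linkage repeatedly joins the two closest clusters;
  # method selects the between-cluster distance of the next slide
  # ('single', 'complete', 'average', or 'centroid').
  Z = linkage(X, method='average')
  dendrogram(Z)
  plt.show()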

27
Distance Between Clusters
  • Centroid: distance between centroids
  • Sometimes hard to compute (e.g. the mean of
    molecules?)
  • Single Link: smallest distance between points
  • Complete Link: largest distance between points
  • Average Link: average distance between points
    (all three are sketched below)
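
The link criteria written out as small functions over two clusters A and B, given as numpy arrays of points (a sketch, assuming numpy):

  import numpy as np

  def pairwise(A, B):
      # All distances between points of cluster A and points of cluster B
      return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

  def single_link(A, B):   return pairwise(A, B).min()   # smallest distance
  def complete_link(A, B): return pairwise(A, B).max()   # largest distance
  def average_link(A, B):  return pairwise(A, B).mean()  # average distance
  def centroid_dist(A, B): return np.linalg.norm(A.mean(0) - B.mean(0))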

28
Clustering dendrogram
29
How many clusters?
30
Probability-based Clustering
  • Given k clusters, each instance belongs to all
    clusters (instead of a single one), with a
    certain probability
  • mixture model: a set of k distributions (one per
    cluster)
  • also, each cluster has a prior likelihood
  • If the correct clustering is known, we know the
    parameters and P(Ci) for each cluster: calculate
    P(Ci|x) using Bayes' rule
  • How to estimate the unknown parameters?
    (see the sketch below)
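
The unknown parameters are usually estimated with the EM algorithm; scikit-learn's GaussianMixture does this. A minimal sketch, assuming scikit-learn and numpy:

  import numpy as np
  from sklearn.mixture import GaussianMixture

  X = np.random.default_rng(0).normal(size=(300, 2))
  gm = GaussianMixture(n_components=3, random_state=0).fit(X)  # EM under the hood
  # Each instance belongs to all clusters with a certain probability:
  print(gm.predict_proba(X[:5]))  # each row sums to 1 over the 3 clusters
  print(gm.weights_)              # prior likelihood of each cluster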

31
Self-Organizing Maps
32
Self-Organizing Map
  • Groups similar data together
  • Dimensionality reduction
  • Data visualization technique
  • Similar to neural networks
  • Neurons try to mimic the input vectors
  • The winning neuron (and its neighborhood) is
    updated
  • Topology preserving, using a neighborhood
    function

33
Self-Organizing Map
  • Input: high-dimensional input space
  • Output: low-dimensional (typically 2D or 3D)
    network topology
  • Training:
  • Starting with a large learning rate and
    neighborhood size, both are gradually decreased
    to facilitate convergence
  • After learning, neurons with similar weights
    tend to cluster on the map

34
Learning the SOM
  • Determine the winner (the neuron whose weight
    vector has the smallest distance to the input
    vector)
  • Move the weight vector w of the winning neuron
    towards the input i (see the update rule below)
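
A standard form of this update rule, for every neuron n with weight vector w_n:

  w_n(t+1) = w_n(t) + eta(t) * h(winner, n, t) * (i - w_n(t))

where eta(t) is the learning rate and h is the neighborhood function (largest at the winner), both decreasing over time.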

35
SOM Learning Algorithm
  • Initialise the SOM (randomly, or such that
    dissimilar input is mapped far apart)
  • for t from 0 to N:
  • Randomly select a training instance
  • Get the best matching neuron
  • calculate the distance, e.g. Euclidean
  • Scale the neighbors
  • Which neighbors? decrease over time; hexagons,
    squares, Gaussian, ...
  • Update the neighbors towards the training
    instance (the whole loop is sketched below)
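
A minimal 2D SOM training sketch following the steps above (all names and decay schedules are illustrative assumptions, not from the deck), assuming numpy:

  import numpy as np

  def train_som(X, grid=(10, 10), n_steps=1000, eta0=0.5, sigma0=3.0, seed=0):
      rng = np.random.default_rng(seed)
      rows, cols, d = grid[0], grid[1], X.shape[1]
      W = rng.normal(size=(rows, cols, d))  # initialise the SOM randomly
      coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                    indexing='ij'), axis=-1)  # grid positions
      for t in range(n_steps):
          x = X[rng.integers(len(X))]       # randomly select a training instance
          # Best matching neuron: smallest Euclidean distance to x
          dists = np.linalg.norm(W - x, axis=2)
          winner = np.unravel_index(dists.argmin(), dists.shape)
          # Learning rate and neighborhood size both decrease over time
          frac = t / n_steps
          eta = eta0 * (1 - frac)
          sigma = sigma0 * (1 - frac) + 0.5
          # Gaussian neighborhood function around the winner (on the grid)
          grid_d2 = ((coords - np.array(winner)) ** 2).sum(axis=2)
          h = np.exp(-grid_d2 / (2 * sigma ** 2))
          # Update all neighbors towards x, scaled by eta and h
          W += eta * h[:, :, None] * (x - W)
      return W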

36
Self-Organizing Map
  • Neighborhood function to preserve topological
    properties of the input space
  • Neighbors share the prize (the "postcode lottery"
    principle)

37
SOM of hand-written numerals
38
SOM of countries (poverty)
39
Clustering Summary
  • Unsupervised
  • Many approaches
  • k-means: simple, sometimes useful
  • k-medoids is less sensitive to outliers
  • Hierarchical clustering works for symbolic
    attributes
  • Self-Organizing Maps
  • Evaluation is a problem