Unsupervised learning: Clustering
1
Unsupervised learning: Clustering
  • Ata Kaban
  • The University of Birmingham
  • http://www.cs.bham.ac.uk/~axk

2
The Clustering Problem
Unsupervised Learning
Data (input)
Interesting structure (output)
  • Should contain essential traits
  • discard unessential details
  • provide a compact summary of the data
  • be interpretable for humans

We need an objective function that expresses our
notion of interestingness for this data
3
Here is some data
4–9
(Figure-only slides; no transcript available)
10
Formalising
  • Data points x_n, n = 1, 2, …, N
  • Assume K clusters
  • Binary indicator variables z_kn associated with
    each data point and cluster: z_kn = 1 if x_n is in
    cluster k, and 0 otherwise
  • Define a measure of cluster compactness as the
    total distance from the cluster mean:

    D_k = \sum_{n=1}^{N} z_{kn} \, \| x_n - m_k \|^2

11
  • Cluster quality objective (the smaller the
    better):

    J = \sum_{k=1}^{K} \sum_{n=1}^{N} z_{kn} \, \| x_n - m_k \|^2

  • Two sets of parameters: the cluster mean values
    m_k and the cluster allocation indicator variables
    z_kn
  • Minimise the above objective over each set of
    variables while holding the other set fixed → this
    is exactly what the K-means algorithm is doing!
    (Can you prove it? See the sketch below.)
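A sketch of the argument (our derivation, using the notation above): holding the allocations z_kn fixed, set the gradient of J with respect to m_k to zero:

    \frac{\partial J}{\partial m_k} = -2 \sum_{n=1}^{N} z_{kn} (x_n - m_k) = 0
    \;\Longrightarrow\;
    m_k = \frac{\sum_n z_{kn} x_n}{\sum_n z_{kn}}

so each m_k is the mean of the points currently allocated to cluster k (the "recompute means" step). Holding the means fixed, each x_n contributes \sum_k z_{kn} \| x_n - m_k \|^2 to J with exactly one z_kn equal to 1, so J is minimised by allocating x_n to its nearest mean (the "classify" step).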

12
  • Pseudo-code of the K-means algorithm
  • Begin
  •   initialise μ1, μ2, …, μK (randomly selected)
  •   do  classify the N samples according to the
        nearest μi
  •       recompute each μi
  •   until no change in μi
  •   return μ1, μ2, …, μK
  • End
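A minimal runnable version of this pseudocode in Python/NumPy (our own illustrative sketch; the names kmeans, mu and z are not from the slides):

    import numpy as np

    def kmeans(X, K, max_iter=100, seed=None):
        rng = np.random.default_rng(seed)
        # initialise mu_1, ..., mu_K as K randomly selected data points
        mu = X[rng.choice(len(X), size=K, replace=False)]
        for _ in range(max_iter):
            # classify the N samples according to the nearest mu_i
            dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (N, K)
            z = dists.argmin(axis=1)
            # recompute each mu_i as the mean of its assigned samples
            # (keep the old mean if a cluster happens to be empty)
            new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                               for k in range(K)])
            if np.allclose(new_mu, mu):  # until no change in mu_i
                break
            mu = new_mu
        return mu, z

    # usage on toy data: two well-separated blobs
    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    means, labels = kmeans(X, K=2, seed=0)

Because the objective is non-convex, a single run can converge to a poor local minimum; a standard remedy is to restart from several random initialisations and keep the run with the smallest objective J.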

13–15
(Figure-only slides; no transcript available)
16
Other forms of clustering
  • Often, clusters are not disjoint: a cluster may
    have subclusters, which in turn have
    sub-subclusters.
  • → Hierarchical clustering

17
  • Given any two samples x and x′, they will be
    grouped together at some level, and if they are
    grouped at level k, they remain grouped at all
    higher levels
  • Hierarchical clustering → a tree representation
    called a dendrogram
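As a quick illustration of producing a dendrogram (our own example; assumes SciPy and Matplotlib, which the slides do not prescribe):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt

    X = np.random.rand(10, 2)        # ten toy samples in 2-D
    Z = linkage(X, method='single')  # bottom-up sequence of merges
    dendrogram(Z)                    # tree of merges; height = merge distance
    plt.show()

The height at which two samples first join in the tree is the level at which they become, and remain, grouped.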

18
  • The similarity values may help to determine
    whether the groupings are natural or forced; but if
    they are evenly distributed, no information can be
    gained
  • Another representation is based on sets, e.g.,
    Venn diagrams

19
  • Hierarchical clustering can be divided into
    agglomerative and divisive.
  • Agglomerative (bottom-up, clumping): start with N
    singleton clusters and form the sequence by
    successively merging clusters
  • Divisive (top-down, splitting): start with all of
    the samples in one cluster and form the sequence
    by successively splitting clusters

20
  • Agglomerative hierarchical clustering
  • The procedure terminates when the specified
    number of clusters has been obtained, and returns
    the clusters as sets of points, rather than a mean
    or a representative vector for each cluster
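A sketch of this stopping rule (again our own example, using SciPy's agglomerative routines, which the slides do not name):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.rand(20, 2)
    Z = linkage(X, method='average')                 # bottom-up merge sequence
    labels = fcluster(Z, t=3, criterion='maxclust')  # stop once 3 clusters remain
    # the result: clusters as sets of point indices, not mean vectors
    clusters = [set(np.where(labels == c)[0]) for c in np.unique(labels)]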

21
Application to image segmentation
22
Application to clustering face images
Cluster centres = face prototypes
23
The problem of the number of clusters
  • Typically, the number of clusters is known.
  • When it is not, we face a hard problem called
    model selection. There are several ways to
    proceed.
  • A common approach is to repeat the clustering
    with K = 1, K = 2, K = 3, etc. (see the sketch
    below)
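One common version of this is the "elbow" heuristic (our own illustration, using scikit-learn's KMeans; the slides do not specify an implementation): run K-means for increasing K and watch the objective.

    import numpy as np
    from sklearn.cluster import KMeans

    # toy data with three well-separated blobs
    X = np.vstack([np.random.randn(50, 2) + offset for offset in (0, 5, 10)])

    for k in range(1, 7):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        print(k, km.inertia_)  # within-cluster sum of squares (the objective J)

The objective always decreases as K grows, so one looks for the point where the decrease levels off, expected here around K = 3.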

24
What did we learn today?
  • Data clustering
  • K-means algorithm in detail
  • How K-means can get stuck and how to take care of
    that
  • The outline of hierarchical clustering methods