
Kernel Methods for Weakly Supervised Mean Shift Clustering

- Oncel Tuzel, Fatih Porikli (Mitsubishi Electric Research Labs)
- Peter Meer (Rutgers University)

Outline

- Motivation
- Mean Shift
- Method Overview
- Kernel Mean Shift
- Constrained Kernel Mean Shift
- Experiments
- Conclusion

Motivation

- Clustering is an ambiguous task
- In many cases, the initially designed similarity metric fails to resolve the ambiguities
- Simple supervision can guide clustering to the desired structure
- We present a semi-supervised mean shift clustering algorithm based on pair-wise similarities

Mean Shift

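- Given n data points $x_i$ on $\mathbb{R}^d$ and associated bandwidths $h_i$, the sample point density estimator is given by
  $$f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i^{d}} \, k\!\left( \left\| \frac{x - x_i}{h_i} \right\|^{2} \right)$$
  where $k(x)$ is the kernel profile
- Stationary points of the density can be found via the mean shift procedure
  $$m(x) = \frac{\sum_{i=1}^{n} \frac{x_i}{h_i^{d+2}} \, g\!\left( \left\| \frac{x - x_i}{h_i} \right\|^{2} \right)}{\sum_{i=1}^{n} \frac{1}{h_i^{d+2}} \, g\!\left( \left\| \frac{x - x_i}{h_i} \right\|^{2} \right)} - x$$
  where $g(x) = -k'(x)$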
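The mean shift procedure on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian profile $k(u) = e^{-u/2}$ (so $g(u) \propto e^{-u/2}$), and the names `mean_shift_step` and `seek_mode`, the tolerance, and the iteration cap are hypothetical choices made for the example.

```python
import numpy as np

def mean_shift_step(x, points, h):
    """One variable-bandwidth mean shift update, returning x + m(x).

    x: (d,) current estimate; points: (n, d) data; h: (n,) bandwidths.
    Assumes a Gaussian profile, so g(u) is proportional to exp(-u / 2).
    """
    d = points.shape[1]
    u = np.sum((x - points) ** 2, axis=1) / h ** 2   # ||(x - x_i) / h_i||^2
    w = np.exp(-0.5 * u) / h ** (d + 2)              # g(u_i) / h_i^(d+2)
    return w @ points / w.sum()                      # weighted mean of the data

def seek_mode(x, points, h, tol=1e-6, max_iter=500):
    """Iterate the update until the estimate stops moving (a stationary point)."""
    for _ in range(max_iter):
        x_new = mean_shift_step(x, points, h)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Two well-separated groups on the real line, unit bandwidths.
points = np.array([[-0.1], [0.1], [9.9], [10.1]])
mode = seek_mode(points[1].copy(), points, np.ones(4))
```

Starting from the second point, the estimate moves to the stationary point between the first two samples; the far group has a negligible weight under the Gaussian profile.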

Mean Shift Clustering

- Mean shift iterations are initialized at the data points
- The cluster centers are located by the mean shift procedure
- The data points associated with the same local maxima of the density function produce a partitioning of the space
- There is no systematic semi-supervised mean shift algorithm
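The clustering step (start at every data point, then group points that reach the same mode) might look like the sketch below. It assumes a Gaussian profile and merges modes that lie within a small distance `merge_tol` of each other; the function name and both tolerances are hypothetical, not taken from the slides.

```python
import numpy as np

def mean_shift_cluster(points, h, merge_tol=1e-2, tol=1e-7, max_iter=500):
    """Label each point by the density mode its mean shift trajectory reaches."""
    n, d = points.shape
    modes = []
    labels = np.empty(n, dtype=int)
    for i in range(n):
        x = points[i].astype(float)              # iteration starts at the data point
        for _ in range(max_iter):
            u = np.sum((x - points) ** 2, axis=1) / h ** 2
            w = np.exp(-0.5 * u) / h ** (d + 2)  # Gaussian profile (assumed)
            x_new = w @ points / w.sum()
            converged = np.linalg.norm(x_new - x) < tol
            x = x_new
            if converged:
                break
        # Assign to an existing mode if one is close enough, else open a new one.
        for j, m in enumerate(modes):
            if np.linalg.norm(x - m) < merge_tol:
                labels[i] = j
                break
        else:
            modes.append(x)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)

points = np.array([[-0.1], [0.1], [9.9], [10.1]])
labels, modes = mean_shift_cluster(points, np.ones(4))
```

The number of clusters is not specified anywhere: it is simply the number of distinct modes found, which is one of the advantages of mean shift that the constrained variant aims to keep.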

Method Overview

[Figure panel: Embedded Space]

- The supervision is given in the form of a few pair-wise similarity constraints
- We embed the input space to a space where the constraint pairs are associated with the same mode
- Mode seeking is performed on the embedded space
- The method preserves all the advantages of mean shift clustering

[Figure panel: Input Space]

Pair-wise Constraints on the Input Space

- Data points are projected to the null space of the constraint matrix
- Since the constraint point pairs overlap after projection, they are clustered together
- The method fails if the clusters are not linearly separable
- At most d-1 constraints can be defined

[Figure panels: Input Points, Constraint Vector, Projection, Clustering]
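A small sketch of the input-space projection may help. It assumes the constraint matrix stacks the difference vectors of the constrained pairs as rows and uses the orthogonal projector onto its null space, $P = I - A^\top (A A^\top)^{+} A$; the function name `nullspace_projector` and the toy data are hypothetical.

```python
import numpy as np

def nullspace_projector(pairs, X):
    """Projector onto the null space of A, whose rows are x_a - x_b per pair."""
    A = np.array([X[a] - X[b] for a, b in pairs])         # (m, d) constraint matrix
    d = X.shape[1]
    return np.eye(d) - A.T @ np.linalg.pinv(A @ A.T) @ A   # I - A^T (A A^T)^+ A

# Two parallel line segments in the plane; one constraint ties points 0 and 1.
X = np.array([[0.0, 0.0], [1.0, 3.0], [5.0, 0.0], [6.0, 3.0]])
P = nullspace_projector([(0, 1)], X)
Z = X @ P.T   # projected points
```

In this toy case the constrained pair collapses to a single point, and so does the unconstrained pair (2, 3), because it lies along the same direction. With d linearly independent constraints the null space would shrink to the origin, which is why at most d-1 constraints can be used in the input space.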

Pair-wise Constraints on the Feature Space

- The method can be extended to handle an increasing number of constraints, or the linearly inseparable case, using a mapping function
- The mapping embeds the input space into an enlarged feature space
- The projection is performed on the feature space
- Defining the mapping explicitly is not practical
- Solution: the kernel trick

[Figure panels: Input Points, Mapping to Feature Space, Constraint Vector, Projection, Clustering]

Kernel Mean Shift (Explicit Form)

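- Given $x_i \in \mathcal{X}$, $i = 1, \dots, n$, and a p.s.d. kernel $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ satisfying
  $$K(x, x') = \phi(x)^\top \phi(x')$$
  where $\phi : \mathcal{X} \to \mathcal{H}$ maps the input space to a $d_\phi$-dimensional feature space
- The density estimator at $y \in \mathcal{H}$ is given by
  $$f_{\mathcal{H}}(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i^{d_\phi}} \, k\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^{2} \right)$$
- The stationary points can be found via the mean shift procedure
  $$\bar{y} = \frac{\sum_{i=1}^{n} \frac{\phi(x_i)}{h_i^{d_\phi+2}} \, g\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^{2} \right)}{\sum_{i=1}^{n} \frac{1}{h_i^{d_\phi+2}} \, g\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^{2} \right)}$$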

Kernel Mean Shift (Implicit Form)

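- Let $\Phi = [\phi(x_1) \; \phi(x_2) \; \cdots \; \phi(x_n)]$ be the $d_\phi \times n$ dimensional feature matrix and $K = \Phi^\top \Phi$ be the $n \times n$ dimensional kernel matrix
- At each iteration the estimate, $\bar{y}$, lies in the column space of $\Phi$, and any point on the subspace can be written as $y = \Phi \alpha_y$
- The distance between two points $y$ and $y'$ is given by
  $$\| y - y' \|^{2} = \alpha_y^\top K \alpha_y + \alpha_{y'}^\top K \alpha_{y'} - 2 \alpha_y^\top K \alpha_{y'}$$
- The implicit form of mean shift updates the weighting vectors
  $$\bar{\alpha}_y = \frac{\sum_{i=1}^{n} \frac{e_i}{h_i^{d_\phi+2}} \, g\!\left( \frac{\alpha_y^\top K \alpha_y + e_i^\top K e_i - 2 \alpha_y^\top K e_i}{h_i^{2}} \right)}{\sum_{i=1}^{n} \frac{1}{h_i^{d_\phi+2}} \, g\!\left( \frac{\alpha_y^\top K \alpha_y + e_i^\top K e_i - 2 \alpha_y^\top K e_i}{h_i^{2}} \right)}$$
  where $e_i$ denotes the i-th canonical basis vector of $\mathbb{R}^n$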
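The implicit update never touches the feature vectors: it only needs the kernel matrix and a weight vector per estimate. A minimal sketch follows, assuming a Gaussian profile and writing an estimate as $y = \Phi\alpha$; the names `kernel_mean_shift_step` and `kernel_seek_mode`, the stopping rule, and the choice to pass `d_phi` explicitly are assumptions made for this example.

```python
import numpy as np

def kernel_mean_shift_step(alpha, K, h, d_phi):
    """One implicit mean shift update of the weight vector (y = Phi @ alpha)."""
    # ||Phi alpha - phi(x_i)||^2 = alpha^T K alpha + K_ii - 2 (K alpha)_i
    dist2 = alpha @ K @ alpha + np.diag(K) - 2.0 * (K @ alpha)
    dist2 = np.maximum(dist2, 0.0)                 # clip round-off negatives
    # Gaussian profile (assumed): g(u) is proportional to exp(-u / 2)
    w = np.exp(-0.5 * dist2 / h ** 2) / h ** (d_phi + 2)
    return w / w.sum()                             # sum_i w_i e_i / sum_i w_i

def kernel_seek_mode(i, K, h, d_phi, tol=1e-8, max_iter=500):
    """Start at phi(x_i), i.e. alpha = e_i, and iterate to a stationary point."""
    alpha = np.zeros(K.shape[0])
    alpha[i] = 1.0
    for _ in range(max_iter):
        new = kernel_mean_shift_step(alpha, K, h, d_phi)
        diff = new - alpha
        alpha = new
        if diff @ K @ diff < tol ** 2:             # squared step length in feature space
            break
    return alpha

# Linear kernel on 1-D points, so Phi @ alpha can be checked directly.
X = np.array([[-0.1], [0.1], [9.9], [10.1]])
K = X @ X.T
alpha = kernel_seek_mode(1, K, h=np.ones(4), d_phi=1)
mode = X.T @ alpha   # explicit location of the mode, available only for this toy kernel
```

With a linear kernel the result coincides with ordinary mean shift on the input points, which makes the toy case easy to verify; for a nonlinear kernel only `alpha` is available and distances are evaluated through `K`.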

Kernel Mean Shift Clustering

- The clustering algorithm starts on the data points, $y = \phi(x_i)$, i.e. $\alpha_y = e_i$
- Upon convergence the mode can be expressed via $\bar{y} = \Phi \bar{\alpha}_y$
- When the rank of the kernel matrix K is smaller than n, the columns of $\Phi$ form an overcomplete basis and the modes can be identified within an equivalence relationship: two modes are the same when $\| \Phi \bar{\alpha}_y - \Phi \bar{\alpha}_{y'} \|^{2} = 0$
- The procedure is restricted to the subspace spanned by the feature points, therefore $d_\phi = \operatorname{rank}(K)$
- The convergence of the procedure follows from the original proof
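Because different weight vectors can describe the same feature-space point when K is rank deficient, modes have to be compared through the kernel-induced distance rather than coefficient by coefficient. A short sketch, with a hypothetical helper `same_mode` and an arbitrary tolerance:

```python
import numpy as np

def same_mode(alpha_a, alpha_b, K, tol=1e-6):
    """True when Phi @ alpha_a and Phi @ alpha_b coincide in feature space."""
    diff = alpha_a - alpha_b
    return diff @ K @ diff < tol ** 2    # ||Phi (alpha_a - alpha_b)||^2

# Rank-1 linear kernel on the 1-D points 1 and 2.
X = np.array([[1.0], [2.0]])
K = X @ X.T
a = np.array([1.0, 0.0])    # represents the point 1
b = np.array([0.0, 0.5])    # also represents the point 1 (0.5 * 2)
c = np.array([0.0, 1.0])    # represents the point 2
```

Here `a` and `b` are different vectors but define the same point, so they fall in the same equivalence class, while `c` does not.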

Constrained Kernel Mean Shift

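[Figure panels: Feature Space, Projection]

- Let $\{(c_{j,1}, c_{j,2})\}_{j=1,\dots,m}$ be the set of point pairs to be clustered together
- The constraint matrix is given by
  $$A = \left[ \phi(c_{1,1}) - \phi(c_{1,2}) \;\; \cdots \;\; \phi(c_{m,1}) - \phi(c_{m,2}) \right]^\top$$
- The null space of A is the set of vectors $\operatorname{null}(A) = \{ w \in \mathcal{H} \mid A w = 0 \}$
- The matrix $P = I_{d_\phi} - A^\top (A A^\top)^{+} A$ projects to $\operatorname{null}(A)$
- Under the projection the constraint point pairs overlap: $P \phi(c_{j,1}) = P \phi(c_{j,2})$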
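The overlap of the constraint pairs follows in two lines from the pseudoinverse identities. The derivation below assumes the constraint matrix $A$ stacks the difference vectors $\phi(c_{j,1}) - \phi(c_{j,2})$ as rows and that the projector is written as $P = I - A^\top (A A^\top)^{+} A$; $e_j$ denotes the j-th canonical basis vector of $\mathbb{R}^m$.

```latex
\begin{aligned}
a_j &:= \phi(c_{j,1}) - \phi(c_{j,2}) = A^\top e_j , \\
P a_j &= A^\top e_j - A^\top (A A^\top)^{+} A A^\top e_j
       = A^\top e_j - A^{+} A A^\top e_j
       = A^\top e_j - A^\top e_j = 0 , \\
\Rightarrow \quad P \phi(c_{j,1}) &= P \phi(c_{j,2}) .
\end{aligned}
```

The middle step uses $A^\top (A A^\top)^{+} = A^{+}$ and $A^{+} A A^\top = (A A^{+} A)^\top = A^\top$, which holds because $A^{+} A$ is symmetric.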

Constrained Kernel Mean Shift

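- The constrained mean shift algorithm implicitly maps the data points to the null space of the constraint matrix, $\hat{\phi}(x) = P \phi(x)$, and performs mean shift on the embedded space
- This process is equivalent to applying the kernel mean shift algorithm with the projected kernel function
  $$\hat{K}(x, x') = \phi(x)^\top P^\top P \, \phi(x') = \phi(x)^\top P \, \phi(x')$$
- The projected kernel matrix only involves mapping through the kernel function and can be expressed in terms of the original kernel matrix
  $$\hat{K} = K - K_A^\top S^{+} K_A$$
  where, for constraint pairs $(c_{j,1}, c_{j,2})$, $[K_A]_{ji} = K(c_{j,1}, x_i) - K(c_{j,2}, x_i)$ is the part of the kernel matrix involving the constraint set and $S_{jl} = K(c_{j,1}, c_{l,1}) - K(c_{j,1}, c_{l,2}) - K(c_{j,2}, c_{l,1}) + K(c_{j,2}, c_{l,2})$ is the scaling matrix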
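The projected kernel matrix can be computed from the original one with a few matrix products. The sketch below assumes the constrained points are among the data points, so each constraint is an index pair (a, b); it uses $\hat{K} = K - K_A^\top S^{+} K_A$ with $K_A = D K$ and $S = D K D^\top$, where row j of $D$ is $e_a - e_b$. The function name `projected_kernel` and the matrix `D` are hypothetical helpers for this example.

```python
import numpy as np

def projected_kernel(K, pairs):
    """Kernel matrix after projecting onto the null space of the constraints."""
    n = K.shape[0]
    D = np.zeros((len(pairs), n))
    for j, (a, b) in enumerate(pairs):
        D[j, a], D[j, b] = 1.0, -1.0      # row j selects phi(x_a) - phi(x_b)
    K_A = D @ K                           # (m, n): constraint rows against all points
    S = D @ K @ D.T                       # (m, m): scaling matrix A A^T
    return K - K_A.T @ np.linalg.pinv(S) @ K_A

# Gaussian kernel on a few random 2-D points (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq)
K_hat = projected_kernel(K, [(0, 3), (1, 4)])

# Feature-space distance between a constrained pair after the projection.
d03 = K_hat[0, 0] + K_hat[3, 3] - 2.0 * K_hat[0, 3]
```

The distance `d03` vanishes up to round-off, so points 0 and 3 start the mean shift iterations from the same location in the embedded space and necessarily reach the same mode.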

Experiments

- We conduct experiments on three datasets
  - Synthetic experiments
  - Clustering faces across illumination on the CMU PIE dataset
  - Clustering object categories on the Caltech-4 dataset
- For the first two experiments we utilize the Gaussian kernel function
- For the last experiment we utilize the $\chi^2$ kernel function
- We use adaptive bandwidth mean shift, where the bandwidth for each point is selected as the k-th smallest distance from the point to all the data points on the feature space
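The adaptive bandwidth rule needs only the kernel matrix, since feature-space distances follow from $\| \phi(x_i) - \phi(x_j) \|^2 = K_{ii} + K_{jj} - 2 K_{ij}$. A minimal sketch, with a hypothetical function name; whether the zero self-distance counts toward k is not stated on the slide, and this version skips it so that the bandwidth is never zero for distinct points.

```python
import numpy as np

def adaptive_bandwidths(K, k):
    """h_i = k-th smallest feature-space distance from point i to the other points."""
    diag = np.diag(K)
    dist2 = np.maximum(diag[:, None] + diag[None, :] - 2.0 * K, 0.0)
    dist = np.sqrt(dist2)
    # Column 0 of each sorted row is the self-distance (zero), so index k skips it.
    return np.sort(dist, axis=1)[:, k]

# Linear kernel on the 1-D points 0, 1, 3, 7, so distances are easy to read off.
X = np.array([[0.0], [1.0], [3.0], [7.0]])
K = X @ X.T
h = adaptive_bandwidths(K, k=1)
```

For constrained clustering the same function would be applied to the projected kernel matrix, so that bandwidths are measured in the embedded space.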

Clustering Linear Structure

[Figure panels: Data Points, Mean Shift, Constrained Mean Shift]

- We generated 240 data points originating from six different lines
- Data is corrupted with normally distributed noise with standard deviation 0.1
- Three pair-wise constraints are given

Clustering Circular Structure

[Figure panels: Data Points, Data Points with Outliers, Mean Shift, Constrained Mean Shift]

- We generated 200 data points originating from five concentric circles
- Data is corrupted with normally distributed noise with standard deviation 0.1
- 80 outlier points are added
- Four pair-wise constraints are enforced from the same circle

Clustering Faces Across Illumination

[Figure panels: Samples from CMU PIE Dataset, Constraint Set]

- Dataset contains 441 images from 21 subjects under 21 different illumination conditions
- Images are coarsely registered and scaled to the same size (128x128)
- Each image is represented with a 16384-dimensional vector
- Two pair-wise similarity constraints are given per subject
- Approximately 1/10 of the dataset is labeled

Clustering Faces with Mean Shift

[Figure panels: Pair-wise Distances, Mean Shift]

- Mean shift finds 5 clusters, corresponding partly to illumination conditions and partly to subject labels

Clustering Faces with Constrained Mean Shift

[Figure panels: Pair-wise Distances after Embedding, Constrained Mean Shift]

- Constrained mean shift recovers all 21 subjects perfectly

Clustering Object Categories

[Figure: Samples from Caltech-4 Dataset]

- Dataset contains 400 images from four object categories: cars, motorcycles, faces, airplanes
- Each image is represented with a 500-bin feature histogram
- Pair-wise constraints are randomly selected within classes
- Experiment is repeated with a varying number of constraints (1 to 20 constraints per object class)

Clustering Object Categories with Mean Shift

[Figure panels: Pair-wise Distances, Mean Shift]

- Some of the samples from the airplanes class and half of the motorcycles class are incorrectly identified as cars
- The overall clustering accuracy is 74.25%

Clustering Object Categories with Constrained Mean Shift

[Figure panels: Pair-wise Distances after Embedding, Constrained Mean Shift]

- Clustering example after enforcing 10 constraints per class
- Only a single example among 400 is misclustered

Clustering Performance vs. Number of Constraints

- The results are averaged over 20 runs, where at each run a different constraint set is selected
- Clustering accuracy is over 99% for more than 7 constraints per class

Conclusion

- We presented a novel constrained mean shift clustering method that can incorporate pair-wise must-link priors
- The method preserves all the advantages of the original mean shift clustering algorithm
- The presented approach also extends to inner product spaces; thus it is applicable to a wide range of problems