Title: Simultaneous Tensor Subspace Selection and Clustering: The Equivalence of High Order SVD and KMeans
1. Simultaneous Tensor Subspace Selection and Clustering: The Equivalence of High Order SVD and K-Means Clustering (KDD '08)
2. Introduction
- Data are everywhere.
- Some are 1-D vectors, e.g. the daily temperature in Chapel Hill.
- Some are 2-D matrices, e.g. the daily temperature in NC cities.
- The rest are high-dimensional (N-D) data, e.g. the daily temperature, humidity, and light in NC cities.
- Data dimension reduction is an important topic in data mining, machine learning, and pattern recognition applications.
- Early on, PCA and SVD were the popular tools for 2-D arrays of data.
- Recently, tensor-based methods have been extensively studied, e.g. HOSVD, 2DSVD, GLRAM.
3. Introduction
- The contributions the authors claim:
- Prove the equivalence of HOSVD to simultaneous subspace selection (GLRAM/2DSVD) plus K-means clustering.
- Present experiments that demonstrate the equivalence theory.
- Provide a HOSVD-based dataset quality assessment method to help select subdatasets with an expected noise level.
4. Outline
- Review of SVD and PCA
- Tensors
- High Order SVD, 2DSVD, tensor clustering
- The equivalence theorem
- Experimental results on the AT&T database
- Dataset quality assessment and subdataset selection
5. Data dimension reduction of matrices
- Singular Value Decomposition (SVD)
- X ≈ U S V^T, where
  - X = (x_ij) is the M x N data matrix
  - U = (u_ij) is M x K, with orthonormal columns
  - S = (s_ij) is K x K, diagonal, holding the singular values
  - V^T = (v_ij) is K x N, with orthonormal rows
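As a concrete sketch (not from the paper, variable names are illustrative), a rank-K truncated SVD in NumPy keeps only the K largest singular values and vectors:

```python
import numpy as np

# Hypothetical data matrix X (M x N); K is the target rank.
rng = np.random.default_rng(0)
M, N, K = 8, 6, 2
X = rng.standard_normal((M, N))

# Thin SVD, then keep the K leading singular values/vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, S_k, Vt_k = U[:, :K], np.diag(s[:K]), Vt[:K, :]

# Rank-K reconstruction X ~= U S V^T
X_k = U_k @ S_k @ Vt_k
assert X_k.shape == X.shape
```

By the Eckart-Young theorem this X_k is the best rank-K approximation of X in Frobenius norm.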
6. Data dimension reduction of matrices
- Principal Component Analysis (PCA)
- An important application of SVD
- X ≈ U V^T, where
  - X = (x_ij) is the M x N data matrix
  - U = (u_ij) is M x K: the principal components (PCs)
  - V^T = (v_ij) is K x N: the loadings
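A minimal sketch of PCA as an application of SVD (my own toy data, not from the paper): center the data, take the SVD, and read off scores and loadings.

```python
import numpy as np

# PCA as a thin wrapper around SVD: center the data, then SVD it.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))          # 100 samples, 5 features
Xc = X - X.mean(axis=0)                    # centering is what turns SVD into PCA

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
K = 2
scores = U[:, :K] * s[:K]                  # principal components (PCs)
loadings = Vt[:K, :]                       # loadings: directions in feature space

# Projecting the centered data onto the loadings recovers the scores.
assert np.allclose(Xc @ loadings.T, scores)
```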
7. Generalization to N dimensions
- The limits of PCA and SVD:
- They are methods for analyzing matrices (2-D arrays)
- Not natural to apply to higher-dimensional data
- From 2-D to N-D (e.g. 3-D):
- Matrix → Tensor
- SVD/PCA → HOSVD, 2DSVD/GLRAM
8. What is a tensor?
- A tensor is a multidimensional array.
9. Tensor mode-n multiplication
- The mode-n product B ×_n U multiplies every mode-n fiber of the tensor B by the matrix U: (B ×_n U)_{i_1 … j … i_N} = Σ_{i_n} u_{j i_n} b_{i_1 … i_n … i_N}.
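A mode-n product can be sketched in NumPy with `tensordot` (my own helper, illustrative only):

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: contract mode n of tensor T with the
    rows of matrix M (shape J x I_n), so mode n changes size I_n -> J."""
    # tensordot contracts T's axis n with M's axis 1 and appends the new
    # axis at the end; moveaxis puts it back in position n.
    return np.moveaxis(np.tensordot(T, M, axes=(n, 1)), -1, n)

# Toy check on a 3 x 4 x 5 tensor.
rng = np.random.default_rng(2)
T = rng.standard_normal((3, 4, 5))
M = rng.standard_normal((2, 4))            # acts on mode 1 (size 4 -> 2)
out = mode_n_product(T, M, 1)
assert out.shape == (3, 2, 5)
```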
10. Frobenius norm
- For an M x N matrix A, the Frobenius norm is ||A||_F = (Σ_{i,j} a_ij^2)^{1/2}.
- For a P x Q x R tensor B, the norm is ||B|| = (Σ_{p,q,r} b_pqr^2)^{1/2}.
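Both norms are just the square root of the sum of squared entries, which a short NumPy check (toy data, my own) makes explicit:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))            # M x N matrix
B = rng.standard_normal((3, 4, 5))         # P x Q x R tensor

# Frobenius norm: sqrt of the sum of squared entries,
# for matrices and tensors alike.
fro_A = np.sqrt((A ** 2).sum())
fro_B = np.sqrt((B ** 2).sum())

# Agrees with NumPy's built-in norms.
assert np.isclose(fro_A, np.linalg.norm(A, 'fro'))
assert np.isclose(fro_B, np.linalg.norm(B.ravel()))
```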
11. High Order SVD
- Assume the data form a 3-D tensor, e.g. a set of images of the same size.
- The HOSVD factorization treats every index uniformly.
- The goal of HOSVD is min_{U,V,W,S} ||X − S ×_1 U ×_2 V ×_3 W||^2.
- U, V, W are orthogonal 2-D matrices; S is a 3-D core tensor.
- With explicit indices, the formula is x_ijk ≈ Σ_{p,q,r} u_ip v_jq w_kr s_pqr.
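A common way to compute a truncated HOSVD (a sketch under the standard unfolding-based construction, not the paper's code; `unfold`/`hosvd` are my own names) is to take the leading left singular vectors of each mode unfolding and project X onto them:

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding: mode n becomes the rows, all other modes the columns."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def hosvd(X, ranks):
    """Truncated HOSVD: factor matrices from the left singular vectors of
    each unfolding; core tensor from projecting X onto all of them."""
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    S = X
    for n, U in enumerate(factors):
        # S = S x_n U^T  (mode-n product with the transposed factor)
        S = np.moveaxis(np.tensordot(S, U.T, axes=(n, 1)), -1, n)
    return S, factors

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 5, 4))
S, (U, V, W) = hosvd(X, (3, 3, 2))
assert S.shape == (3, 3, 2)
```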
12. An illustration of HOSVD
[Figure: the tensor X factored into a core tensor S and matrices U, V, W]
13. GLRAM/2DSVD
- Instead of treating every index equally, 2DSVD views the 3-D tensor as a collection of matrices {X_1, …, X_n3}.
- Each X_i is a 2-D matrix, e.g. an image.
- The goal of GLRAM/2DSVD is min_{U,V,{M_i}} Σ_i ||X_i − U M_i V^T||^2, with U and V shared across all X_i.
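A minimal 2DSVD sketch (toy data and my own variable names; the standard non-iterative construction from the averaged row-row and column-column covariances, not necessarily the exact algorithm in the paper):

```python
import numpy as np

# Shared row-space U and column-space V from averaged covariances,
# then each image is compressed to M_l = U^T X_l V.
rng = np.random.default_rng(5)
images = [rng.standard_normal((10, 8)) for _ in range(20)]   # toy "images"
k1, k2 = 4, 3                                                # reduced sizes

F = sum(X @ X.T for X in images)       # row-row covariance (10 x 10)
G = sum(X.T @ X for X in images)       # column-column covariance (8 x 8)

# eigh returns ascending eigenvalues; reverse to take the top-k eigenvectors.
U = np.linalg.eigh(F)[1][:, ::-1][:, :k1]
V = np.linalg.eigh(G)[1][:, ::-1][:, :k2]

M = [U.T @ X @ V for X in images]      # compressed representations
assert M[0].shape == (k1, k2)
```

Each 10 x 8 image is now a 4 x 3 matrix, mirroring the paper's reduction of 102 x 92 images to 30 x 30.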
14. An illustration of 2DSVD
[Figure: each X_i (i = 1, …, n3) approximated as U M_i V^T, with U and V^T shared]
15. Tensor clustering
- For vectors x_1, …, x_n, K-means minimizes J = Σ_k Σ_{i ∈ C_k} ||x_i − c_k||^2, where c_k is the centroid vector of cluster C_k.
- For tensors M_1, …, M_n, we generalize it to J = Σ_k Σ_{i ∈ C_k} ||M_i − c_k||^2, where c_k is the centroid tensor of cluster C_k and ||·|| is the tensor Frobenius norm.
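Because the Frobenius norm of a tensor equals the Euclidean norm of its flattened entries, tensor K-means reduces to ordinary K-means on flattened tensors. A sketch (my own toy implementation, using Lloyd's iterations):

```python
import numpy as np

def tensor_kmeans(tensors, k, iters=20, seed=0):
    """K-means on tensors: flattening each tensor reduces the Frobenius
    objective to ordinary vector k-means (Lloyd's algorithm)."""
    X = np.stack([t.ravel() for t in tensors])
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # init from the data
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute means.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return labels, centroids.reshape((k,) + tensors[0].shape)

# Two well-separated groups of 3 x 3 "tensors".
a = [np.zeros((3, 3)) + 0.01 * i for i in range(5)]
b = [np.ones((3, 3)) * 10 + 0.01 * i for i in range(5)]
labels, cents = tensor_kmeans(a + b, 2)
assert len(set(labels[:5])) == 1 and len(set(labels[5:])) == 1
```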
16. The equivalence theorem
- HOSVD performs simultaneous 2DSVD and K-means clustering:
- (1) The solution of W in HOSVD is the cluster indicator of K-means.
- (2) The (U, V) in HOSVD is the same (U, V) as in 2DSVD (the Global Consistency Lemma).
17. Experiment on the AT&T face image database
- 10 different images of each of 40 distinct subjects; 400 images in total.
- Images were taken under different conditions:
- Different times
- Varying lighting
- Facial expressions (open/closed eyes, smiling/not smiling)
- Facial details (glasses/no glasses)
- Image size is 102 x 92.
18. Three methods are explored
- PCA + K-means clustering
- Reshape each image into one vector
- The images form a 9384 x 400 matrix
- K-means is run on the matrix (K = 40)
- 2DSVD + K-means clustering
- 2DSVD is applied first; the 102 x 92 images (X_l, l = 1, …, 400) are reduced to 30 x 30 matrices (M_l, l = 1, …, 400)
- The M_l are clustered with K-means (K = 40)
19. Three methods are explored
- HOSVD
- Perform HOSVD on the 102 x 92 x 400 tensor for simultaneous compression and clustering, with reduced dimensions 30 x 30 x 40.
20.
- After running the three methods, the results are represented as 400 x 40 matrices Q, where Q_ij indicates that image i is clustered into cluster j.
- Create a new 40 x 40 matrix I: each row of I represents one subject and each column represents one cluster.
[Figure: images from one subject, aggregated from the image-by-cluster matrix Q into the subject-by-cluster matrix I]
21. Visualization
[Figure: subject-by-cluster matrices for PCA + K-means, 2DSVD + K-means, and HOSVD]
- Rows: subjects
- Columns: clusters
- Green squares show the number of images clustered into the same cluster.
22. Data inconsistency
- Ideally, the 10 images of a subject should all receive the same cluster label.
- In practice, this is not the case.
23. Data inconsistency
24. Clustering accuracy comparison
- Clustering accuracy = (number of images clustered into their default subject cluster) / (total number of images).
- Compared on three image sets:
- (1) All 400 images
- (2) A 300-image subset
- (3) A 220-image subset
- How are the subsets selected?
- Run the three methods on all 400 images, then select the groups in which at least n images (n = 8 and n = 10 in the cases above) are clustered into the default cluster by at least one method.
- Merge these groups together.
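The accuracy metric above can be sketched as follows; since the slides do not spell out how the "default subject cluster" is determined, this toy version (my own) matches each cluster to its majority subject:

```python
import numpy as np

def clustering_accuracy(subjects, clusters):
    """Fraction of images whose cluster's majority subject matches their
    own subject (a simple stand-in for the 'default cluster')."""
    subjects, clusters = np.asarray(subjects), np.asarray(clusters)
    correct = 0
    for c in np.unique(clusters):
        members = subjects[clusters == c]
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()            # images agreeing with the majority
    return correct / len(subjects)

# Toy example: 2 subjects x 3 images; one image lands in the wrong cluster.
subj = [0, 0, 0, 1, 1, 1]
clus = [0, 0, 1, 1, 1, 1]
acc = clustering_accuracy(subj, clus)
assert abs(acc - 5 / 6) < 1e-12
```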
25. Clustering accuracy comparison
26. Dataset quality assessment and subset selection
- From the experiments we learn that a person's images can be more similar to other subjects' images than to their own.
- Such outliers cause the data inconsistency problem and confuse data mining and pattern analysis algorithms.
- How can we select high-dimensional datasets (or subdatasets) with fewer outliers? With HOSVD.
27. Dataset quality assessment and subset selection
- Subset selection method (on the AT&T dataset):
- Apply HOSVD to all images.
- Select the subjects for which at least n images are clustered into the default subject cluster.
28. Conclusions of the paper
- First, the authors prove that HOSVD performs simultaneous 2DSVD and K-means clustering.
- Second, the authors provide experiments that demonstrate the theoretical results.
- Finally, a HOSVD-based dataset quality assessment method is provided to select clean datasets with an expected noise level.
29. Thanks!