Transcript and Presenter's Notes

Title: Clustering Algorithms for Perceptual Image Hashing


1
Clustering Algorithms for Perceptual Image Hashing
IEEE Eleventh DSP Workshop, August 3rd, 2004
Vishal Monga, Arindam Banerjee, and Brian L. Evans
{vishal, abanerje, bevans}@ece.utexas.edu
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
http://signal.ece.utexas.edu
Research supported by a gift from the Xerox Foundation
2
Hash Example
  • Hash function: projects a value from a set with a large
    (possibly infinite) number of members to a set with a
    fixed, smaller number of members
  • Irreversible
  • Provides a short, simple representation of a large
    digital message
  • Example: sum of ASCII codes for the characters in a
    name, modulo N, a prime number (N = 7)

Database name search example
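
A minimal Python sketch of the toy hash above, assuming N = 7 as on the slide; the function name and the sample names are illustrative, not from the presentation.

    def name_hash(name: str, n: int = 7) -> int:
        """Toy hash: sum of the ASCII codes of the characters, modulo a prime n."""
        return sum(ord(c) for c in name) % n

    # Hypothetical database name-search usage: bucket names by hash value,
    # then search only inside the bucket that the query hashes to.
    buckets = {}
    for name in ["Alice", "Bob", "Carol"]:
        buckets.setdefault(name_hash(name), []).append(name)

    query = "Bob"
    print(query in buckets.get(name_hash(query), []))  # True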
3
Perceptual Hash: Desirable Properties
  • Perceptual robustness
  • Fragility to distinct inputs
  • Randomization
    Necessary in security applications to minimize
    vulnerability against malicious attacks
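
These requirements are usually stated probabilistically. A common formalization from the perceptual hashing literature (the thresholds θ1 and θ2 are assumptions here, not values from the slides): for a hash H, an image I, a perceptually identical image I_ident, and a perceptually distinct image I_diff,

    P\big(H(I) = H(I_{\text{ident}})\big) \ge 1 - \theta_1   % perceptual robustness
    P\big(H(I) \ne H(I_{\text{diff}})\big) \ge 1 - \theta_2   % fragility to distinct inputs

and randomization asks that, under a secret key, the hash values be (approximately) uniformly distributed over their range.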

4
Hashing Framework
  • Two-stage hash algorithm
  • Goal: retain perceptual significance
  • Let (li, lj) denote vectors in the metric space of
    feature vectors V, and let 0 < ε < δ; the desired
    behavior is stated below
  • Minimizing the average distance between clusters is
    inappropriate
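
A plausible reading of the desired property referred to above (the formula itself is not reproduced in this transcript): vectors that are perceptually close should receive the same hash value, and vectors that are far apart should not.

    D(l_i, l_j) < \epsilon \;\Rightarrow\; \text{same hash value}
    D(l_i, l_j) > \delta  \;\Rightarrow\; \text{different hash values}

Minimizing only an average inter-cluster distance would not enforce both conditions at once, which motivates the cost function on the following slides.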

5
Cost Function for Feature Vector Compression
  • Define joint cost matrices C1 and C2 (n x n)
  • n = total number of vectors to be clustered; C(li),
    C(lj) denote the clusters that these vectors are
    mapped to
  • Exponential cost
  • Ensures a severe penalty when feature vectors that are
    far apart (perceptually distinct) are clustered
    together

α > 0, Γ > 1 are algorithm parameters
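
The cost-matrix entries themselves are given by equations not reproduced in this transcript. An illustrative form consistent with the description above (an assumption, not a verbatim quote of the paper) makes the penalty grow exponentially with the amount by which the distance requirement is violated:

    C_1(i, j) =
      \begin{cases}
        \alpha \, \Gamma^{\, D(l_i, l_j) - \delta}, & \text{if } D(l_i, l_j) > \delta \text{ and } C(l_i) = C(l_j) \\
        0, & \text{otherwise}
      \end{cases}

with C2 defined analogously to penalize vectors closer than ε that are mapped to different clusters.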
6
Cost Function for Feature Vector Compression
  • Define S1 as the total cost accumulated in C1
  • S2 is defined similarly from C2
  • Normalize S1 and S2 to obtain normalized costs
  • Then, minimize the expected cost
  • p(i) = p(li), p(j) = p(lj)

7
Basic Clustering Algorithm
  • 1. Obtain ε, δ; set k = 1. Select the data point with
    the highest probability mass and label it l1
  • 2. Form the first cluster by including all unclustered
    points lj such that D(l1, lj) < ε/2
  • 3. k ← k + 1. Select the highest-probability data point
    lk among the unclustered points such that lk is
    sufficiently far from every cluster S in C, the set of
    clusters formed up to this step
  • 4. Form the kth cluster Sk by including all unclustered
    points lj such that D(lk, lj) < ε/2
  • 5. Repeat steps 3-4 until no more clusters can be formed
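
A minimal Python sketch of these steps, assuming binary feature vectors compared with the Hamming distance (as in the results slide) and assuming that a point's distance to a cluster is its minimum distance to the cluster's members; the step-3 selection rule, whose exact expression is not reproduced in this transcript, is taken here to require the candidate seed to be at least ε away from every existing cluster.

    import numpy as np

    def hamming(a, b):
        """Hamming distance between two equal-length binary vectors."""
        return int(np.sum(np.asarray(a) != np.asarray(b)))

    def dist_to_cluster(x, cluster, points, dist):
        """Distance from a point to a cluster, taken as the minimum
        distance to any of the cluster's members (assumed convention)."""
        return min(dist(x, points[j]) for j in cluster)

    def basic_clustering(points, probs, eps, dist=hamming):
        """Sketch of the basic clustering algorithm (steps 1-5 above)."""
        unclustered = set(range(len(points)))
        clusters = []
        seed = max(unclustered, key=lambda i: probs[i])   # step 1
        while True:
            # steps 2 and 4: gather all unclustered points within eps/2 of the seed
            members = [j for j in unclustered if dist(points[seed], points[j]) < eps / 2]
            clusters.append(members)
            unclustered -= set(members)
            # step 3 (selection rule assumed): next seed is the highest-probability
            # unclustered point at least eps away from every existing cluster
            candidates = [i for i in unclustered
                          if all(dist_to_cluster(points[i], c, points, dist) >= eps
                                 for c in clusters)]
            if not candidates:
                break                                     # step 5: no more clusters
            seed = max(candidates, key=lambda i: probs[i])
        return clusters, unclustered  # leftovers are handled by Approach 1 or 2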

8
Observations
  • For any (li, lj) in a cluster Sk, D(li, lj) ≤ ε < δ
  • No errors up to this stage of the algorithm
  • Each cluster is at least ε away from any other cluster
  • Within each cluster, the maximum distance between any
    two points is at most ε

9
Approach 1
  • 1. Select the data point l with the highest probability
    mass among the unclustered data points
  • 2. For each existing cluster Si, i = 1, 2, ..., k,
    compute the distance di between l and Si
  • 3. Let S(δ) = set of clusters Si such that di ≤ δ
  • IF S(δ) is empty THEN k ← k + 1 and Sk = {l} becomes a
    cluster of its own
  • ELSE for each Si in S(δ) define a cost F(Si) in terms of
    the complement of Si, i.e. all clusters in S(δ) except
    Si. Then, l is assigned to the cluster S* = arg min F(Si)
  • 4. Repeat steps 1 through 3 until all data points are
    exhausted

10
Approach 2
  • 1. Select the data point l with the highest probability
    mass among the unclustered data points
  • 2. For each existing cluster Si, i = 1, 2, ..., k,
    define a cost F(Si), where β lies in [1/2, 1]
  • Here, the complement of Si means all existing clusters
    except Si. Then, l is assigned to the cluster
    S* = arg min F(Si)
  • 3. Repeat steps 1 and 2 until all data points are
    exhausted
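
A sketch, in the same spirit as the one above, of the point-assignment loop shared by Approaches 1 and 2. The cost F(Si) is passed in as a function because its exact expression comes from equations not reproduced in this transcript; the delta-feasibility test used for Approach 1 follows the S(δ) definition above.

    def assign_remaining(points, probs, unclustered, clusters, F, dist, delta=None):
        """Sketch of the assignment loop shared by Approaches 1 and 2.

        F(i, cluster, clusters) -> cost of putting point i into `cluster`; its
        exact expression (including the beta trade-off of Approach 2) is
        supplied by the caller. When delta is given (Approach 1), only clusters
        within delta of the point are candidates, and a point with no candidate
        starts a cluster of its own.
        """
        def d_to(i, cluster):
            # point-to-cluster distance: minimum distance to any member (assumed convention)
            return min(dist(points[i], points[j]) for j in cluster)

        unclustered = set(unclustered)
        while unclustered:
            i = max(unclustered, key=lambda k: probs[k])        # highest probability mass first
            cands = [c for c in range(len(clusters))
                     if delta is None or d_to(i, clusters[c]) <= delta]
            if not cands:
                clusters.append([i])                            # Sk = {l}: new singleton cluster
            else:
                best = min(cands, key=lambda c: F(i, clusters[c], clusters))
                clusters[best].append(i)                        # assign to the arg-min cluster
            unclustered.discard(i)
        return clusters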

11
Summary
  • Approach 1
  • Tries to minimize conditioned on
    0
  • Approach 2
  • Smoothly trades off the minimization of
    vs.
  • via the parameter ß
  • ß ½ ? joint minimization
  • ß 1 ? exclusive minimization of
  • Final hash length determined automatically!
  • Given by bits, where k is number
    of clusters formed
  • Proposed clustering can compress feature vectors
    in any metric space, e.g. Euclidean, Hamming, and
    Levenshtein
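
Because the algorithms only ever evaluate the metric D, swapping the metric is a one-line change. For instance, a standard Levenshtein (edit) distance can be plugged into the sketches above in place of the Hamming distance; the pairing with basic_clustering in the final comment is illustrative.

    def levenshtein(a: str, b: str) -> int:
        """Standard dynamic-programming edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    # e.g. basic_clustering(strings, probs, eps, dist=levenshtein) clusters string
    # feature vectors just as binary vectors were clustered with the Hamming distance.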

12
Clustering Results
  • Compress binary feature vectors of L = 240 bits
  • Final hash length = 46 bits, with Approach 2, β = 1/2
  • Value of the cost function is orders of magnitude lower
    for the proposed clustering

13
Conclusion and Future Work
  • Two-stage framework for image hashing
  • Feature extraction followed by feature vector
    compression
  • Second stage is media-independent
  • Clustering algorithms for compression
  • Novel cost function for hashing applications
  • Applicable to feature vectors in any metric space
  • Trade-offs facilitated between robustness and
    fragility
  • Final hash length determined automatically
  • Future work
  • Randomized clustering for secure hashing
  • Information-theoretically secure hashing