DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation

1
DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation
  • Alexander Hinneburg
  • Martin-Luther-University Halle-Wittenberg,
    Germany
  • Hans-Henning Gabriel
  • 101tec GmbH, Halle, Germany

2
Overview
  • Density-based clustering and DENCLUE 1.0
  • Hill climbing as EM-algorithm
  • Identification of local maxima
  • Applications of general EM-acceleration
  • Experiments

3
Density-Based Clustering
  • Assumption
    • clusters are regions of high density in the data space
  • How to estimate density?
    • parametric models
      • mixture models
    • non-parametric models
      • histogram
      • kernel density estimation

4
Kernel Density Estimation
  • Idea
    • influence of a data point is modeled by a kernel
    • density is the normalized sum of all kernels
    • smoothing parameter h

Gaussian kernel: $K(u) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{\|u\|^2}{2}\right)$

Density estimate: $\hat{p}(x) = \frac{1}{N h^d} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)$
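
For concreteness, a minimal NumPy sketch of this estimator (function and variable names are ours, not from the slides):

import numpy as np

def kde(x, data, h):
    # Kernel density estimate at point x (shape (d,)) for data (shape (N, d))
    # with smoothing parameter h; a direct transcription of the formulas above.
    N, d = data.shape
    u = (x - data) / h                          # scaled differences
    k = np.exp(-0.5 * np.sum(u ** 2, axis=1))   # Gaussian kernel values
    k /= (2.0 * np.pi) ** (d / 2.0)             # kernel normalization
    return k.sum() / (N * h ** d)               # normalized sum of all kernels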
5
DENCLUE 1.0 Framework
  • Clusters are defined by local maxima of the density estimate
    • find all maxima by hill climbing
  • Problem
    • constant step size

Gradient hill climbing with constant step size $\delta$:
$x^{(t+1)} = x^{(t)} + \delta\, \frac{\nabla \hat{p}\left(x^{(t)}\right)}{\left\|\nabla \hat{p}\left(x^{(t)}\right)\right\|}$
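
A matching sketch of this DENCLUE 1.0 hill climbing; delta and the iteration count are illustrative choices, not values from the paper:

import numpy as np

def hill_climb_v1(x, data, h, delta=0.1, iters=100):
    # Constant-step-size gradient ascent on the density estimate.
    for _ in range(iters):
        u = (x - data) / h
        k = np.exp(-0.5 * np.sum(u ** 2, axis=1))     # kernel weights
        grad = (k[:, None] * (data - x)).sum(axis=0)  # ascent direction (up to positive constants)
        x = x + delta * grad / np.linalg.norm(grad)   # step of fixed length delta
    return x                                          # ends near, but not at, a maximum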
6
Problem of const. Step Size
  • Not efficient
    • many unnecessary small steps
  • Not effective
    • does not converge to a local maximum, just comes close
  • Example (figure omitted)

7
New Hill Climbing Approach
  • General approach
  • differentiate the density estimate and set the gradient to zero
  • no closed-form solution, but the equation can be used as a fixed-point iteration (derivation below)
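
Sketching this step in our notation (Gaussian kernel as on slide 4, for which $\nabla_x K\!\left(\frac{x-x_i}{h}\right) = \frac{1}{h^2} K\!\left(\frac{x-x_i}{h}\right)(x_i - x)$):

$$\nabla \hat p(x) = \frac{1}{N h^{d+2}} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)(x_i - x) = 0 \quad\Longrightarrow\quad x = \frac{\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right) x_i}{\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)}$$

Since $x$ appears on both sides there is no closed-form solution, but the right-hand side defines the iteration $x^{(t+1)} = f\!\left(x^{(t)}\right)$ used below.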

8
New DENCLUE 2.0 Hill Climbing
  • Efficient
    • automatically adjusted step size at no extra cost
  • Effective
    • converges to a local maximum (proof follows)
  • Example (figure omitted; a code sketch follows)
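
A minimal sketch of this update, i.e. the fixed-point iteration derived on the previous slide; the stopping threshold eps is our illustrative choice:

import numpy as np

def hill_climb_v2(x, data, h, eps=1e-4):
    # DENCLUE 2.0 hill climbing: each new point is the kernel-weighted
    # mean of the data, so the step size adapts automatically.
    while True:
        u = (x - data) / h
        k = np.exp(-0.5 * np.sum(u ** 2, axis=1))          # kernel weights
        x_new = (k[:, None] * data).sum(axis=0) / k.sum()  # weighted mean of the data
        if np.linalg.norm(x_new - x) < eps:                # steps shrink near the maximum
            return x_new
        x = x_new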

9
Proof of Convergence
  • Cast the problem of maximizing the kernel density as maximizing the likelihood of a mixture model
  • Introduce a hidden variable (one possible formulation below)
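
In our notation (not verbatim from the slides), the estimate from slide 4 is an equally weighted mixture, with a hidden variable $z \in \{1, \dots, N\}$ selecting the component:

$$\hat p(x) = \sum_{i=1}^{N} \underbrace{\frac{1}{N}}_{P(z=i)} \; \underbrace{\frac{1}{h^d} K\!\left(\frac{x - x_i}{h}\right)}_{p(x \mid z=i)}$$

Maximizing $\log \hat p(x)$ over $x$ with $z$ hidden is then a standard EM setting.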

10
Proof of Convergence
  • Complete likelihood is maximized by the EM-Algorithm
  • this also maximizes the original likelihood, which is the kernel density estimate
  • starting the EM iteration at an initial point $x^{(0)}$ performs the hill climbing for $x$

E-step: $q_i^{(t)} = \frac{K\left(\left(x^{(t)} - x_i\right)/h\right)}{\sum_{j=1}^{N} K\left(\left(x^{(t)} - x_j\right)/h\right)}$

M-step: $x^{(t+1)} = \sum_{i=1}^{N} q_i^{(t)}\, x_i$
11
Identification of Local Maxima
  • EM-Algorithm iterates until the sum of the k last step sizes drops below a threshold $\epsilon$; the reached point is the end point $x^*$
  • Assumption
    • the true local maximum lies in a ball of radius $\epsilon$ around the end point
  • Points whose end points lie closer than $2\epsilon$ to each other belong to the same maximum M
  • In case of a non-unique assignment, do a few extra EM iterations (a grouping sketch follows)
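
A simplified grouping sketch: it applies the $2\epsilon$ rule pairwise and leaves out the extra EM iterations for ambiguous cases.

import numpy as np

def group_end_points(endpoints, eps):
    # Label hill-climbing end points: points closer than 2*eps are
    # assumed to share the same local maximum.
    labels = [-1] * len(endpoints)
    next_label = 0
    for i, e in enumerate(endpoints):
        if labels[i] != -1:
            continue                      # already assigned
        labels[i] = next_label
        for j in range(i + 1, len(endpoints)):
            if labels[j] == -1 and np.linalg.norm(e - endpoints[j]) < 2 * eps:
                labels[j] = next_label
        next_label += 1
    return labels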

12
Acceleration
  • Sparse EM (see the sketch after this list)
    • update only the fraction p of points with the largest posteriors
    • saves the share (1 − p) of kernel computations after the first iteration
  • Data Reduction
    • use only a fraction p of the data as representative points, chosen by
      • random sampling
      • kMeans
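
A sketch of the Sparse-EM idea on top of the hill climbing above; p, eps, and max_iter are illustrative choices:

import numpy as np

def sparse_em_hill_climb(x, data, h, p=0.2, eps=1e-4, max_iter=100):
    # After one full E-step, recompute kernels only for the fraction p
    # of points with the largest posteriors; all other weights stay frozen.
    u = (x - data) / h
    k = np.exp(-0.5 * np.sum(u ** 2, axis=1))       # full first E-step
    n_active = max(1, int(p * len(data)))
    active = np.argsort(k)[-n_active:]              # indices with largest posteriors
    for _ in range(max_iter):
        x_new = (k[:, None] * data).sum(axis=0) / k.sum()  # M-step
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
        ua = (x - data[active]) / h                 # sparse E-step: active points only
        k[active] = np.exp(-0.5 * np.sum(ua ** 2, axis=1))
    return x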

13
Experiments
  • Comparison of DENCLUE 1.0 (FS) vs. 2.0 (SSA)
  • 16-dim. artificial data
  • both methods are tuned to find the correct
    clustering

14
Experiments
  • Comparison of acceleration methods

15
Experiments
  • Clustering quality (normalized mutual
    information, NMI) vs. sample size (RS)

16
Experiments
  • Cluster quality (NMI) of DENCLUE 2.0 (SSA), its acceleration methods, and k-Means on real data

(sample sizes 0.8, 0.4, 0.2)
17
Conclusion
  • New hill climbing for DENCLUE
  • Automatic step size adjustment
  • Convergence proof by reduction to EM
  • Allows the application of general EM
    accelerations
  • Future work
  • automatic setting of the smoothing parameter h (so far tuned manually)

18
Thank you for your attention!