DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation

1
DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation
  • Alexander Hinneburg
  • Martin-Luther-University Halle-Wittenberg,
    Germany
  • Hans-Henning Gabriel
  • 101tec GmbH, Halle, Germany

2
Overview
  • Density-based clustering and DENCLUE 1.0
  • Hill climbing as EM-algorithm
  • Identification of local maxima
  • Applications of general EM-acceleration
  • Experiments

3
Density-Based Clustering
  • Assumption
    • clusters are regions of high density in the data space
  • How to estimate density?
    • parametric models
      • mixture models
    • non-parametric models
      • histogram
      • kernel density estimation

4
Kernel Density Estimation
  • Idea
    • influence of a data point is modeled by a kernel
    • density is the normalized sum of all kernels
    • smoothing parameter h

Gaussian kernel: $K(u) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{\|u\|^2}{2}\right)$

Density estimate: $\hat{p}(x) = \frac{1}{N h^d} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)$
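
For concreteness, a minimal NumPy sketch of this estimator (function and variable names are ours, not from the slides):

import numpy as np

def kde(x, data, h):
    # Kernel density estimate at point x (shape (d,)) for data (shape (N, d))
    # with smoothing parameter h; a direct transcription of the formulas above.
    N, d = data.shape
    u = (x - data) / h                          # scaled differences
    k = np.exp(-0.5 * np.sum(u ** 2, axis=1))   # Gaussian kernel values
    k /= (2.0 * np.pi) ** (d / 2.0)             # kernel normalization
    return k.sum() / (N * h ** d)               # normalized sum of all kernels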
5
DENCLUE 1.0 Framework
  • Clusters are defined by local maxima of the density estimate
    • find all maxima by hill climbing
  • Problem
    • constant step size

Gradient hill climbing with constant step size $\delta$:
$x^{(t+1)} = x^{(t)} + \delta\, \frac{\nabla \hat{p}\left(x^{(t)}\right)}{\left\|\nabla \hat{p}\left(x^{(t)}\right)\right\|}$
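
A matching sketch of this DENCLUE 1.0 hill climbing; delta and the iteration count are illustrative choices, not values from the paper:

import numpy as np

def hill_climb_v1(x, data, h, delta=0.1, iters=100):
    # Constant-step-size gradient ascent on the density estimate.
    for _ in range(iters):
        u = (x - data) / h
        k = np.exp(-0.5 * np.sum(u ** 2, axis=1))     # kernel weights
        grad = (k[:, None] * (data - x)).sum(axis=0)  # ascent direction (up to positive constants)
        x = x + delta * grad / np.linalg.norm(grad)   # step of fixed length delta
    return x                                          # ends near, but not at, a maximum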
6
Problem of const. Step Size
  • Not efficient
    • many unnecessary small steps
  • Not effective
    • does not converge to a local maximum, just comes close
  • Example (figure omitted)

7
New Hill Climbing Approach
  • General approach
  • differentiate the density estimate and set the gradient to zero
  • no closed-form solution, but the equation can be used as a fixed-point iteration (derivation below)
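
Sketching this step in our notation (Gaussian kernel as on slide 4, for which $\nabla_x K\!\left(\frac{x-x_i}{h}\right) = \frac{1}{h^2} K\!\left(\frac{x-x_i}{h}\right)(x_i - x)$):

$$\nabla \hat p(x) = \frac{1}{N h^{d+2}} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)(x_i - x) = 0 \quad\Longrightarrow\quad x = \frac{\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right) x_i}{\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)}$$

Since $x$ appears on both sides there is no closed-form solution, but the right-hand side defines the iteration $x^{(t+1)} = f\!\left(x^{(t)}\right)$ used below.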

8
New DENCLUE 2.0 Hill Climbing
  • Efficient
    • automatically adjusted step size at no extra cost
  • Effective
    • converges to a local maximum (proof follows)
  • Example (figure omitted; a code sketch follows)
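
A minimal sketch of this update, i.e. the fixed-point iteration derived on the previous slide; the stopping threshold eps is our illustrative choice:

import numpy as np

def hill_climb_v2(x, data, h, eps=1e-4):
    # DENCLUE 2.0 hill climbing: each new point is the kernel-weighted
    # mean of the data, so the step size adapts automatically.
    while True:
        u = (x - data) / h
        k = np.exp(-0.5 * np.sum(u ** 2, axis=1))          # kernel weights
        x_new = (k[:, None] * data).sum(axis=0) / k.sum()  # weighted mean of the data
        if np.linalg.norm(x_new - x) < eps:                # steps shrink near the maximum
            return x_new
        x = x_new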

9
Proof of Convergence
  • Cast the problem of maximizing the kernel density as maximizing the likelihood of a mixture model
  • Introduce a hidden variable (one possible formulation below)
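
In our notation (not verbatim from the slides), the estimate from slide 4 is an equally weighted mixture, with a hidden variable $z \in \{1, \dots, N\}$ selecting the component:

$$\hat p(x) = \sum_{i=1}^{N} \underbrace{\frac{1}{N}}_{P(z=i)} \; \underbrace{\frac{1}{h^d} K\!\left(\frac{x - x_i}{h}\right)}_{p(x \mid z=i)}$$

Maximizing $\log \hat p(x)$ over $x$ with $z$ hidden is then a standard EM setting.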

10
Proof of Convergence
  • Complete likelihood is maximized by the EM-Algorithm
  • this also maximizes the original likelihood, which is the kernel density estimate
  • starting the EM iteration at an initial point $x^{(0)}$ performs the hill climbing for $x$

E-step: $q_i^{(t)} = \frac{K\left(\left(x^{(t)} - x_i\right)/h\right)}{\sum_{j=1}^{N} K\left(\left(x^{(t)} - x_j\right)/h\right)}$

M-step: $x^{(t+1)} = \sum_{i=1}^{N} q_i^{(t)}\, x_i$
11
Identification of Local Maxima
  • EM-Algorithm iterates until the sum of the k last step sizes drops below a threshold $\epsilon$; the reached point is the end point $x^*$
  • Assumption
    • the true local maximum lies in a ball of radius $\epsilon$ around the end point
  • Points whose end points lie closer than $2\epsilon$ to each other belong to the same maximum M
  • In case of a non-unique assignment, do a few extra EM iterations (a grouping sketch follows)
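
A simplified grouping sketch: it applies the $2\epsilon$ rule pairwise and leaves out the extra EM iterations for ambiguous cases.

import numpy as np

def group_end_points(endpoints, eps):
    # Label hill-climbing end points: points closer than 2*eps are
    # assumed to share the same local maximum.
    labels = [-1] * len(endpoints)
    next_label = 0
    for i, e in enumerate(endpoints):
        if labels[i] != -1:
            continue                      # already assigned
        labels[i] = next_label
        for j in range(i + 1, len(endpoints)):
            if labels[j] == -1 and np.linalg.norm(e - endpoints[j]) < 2 * eps:
                labels[j] = next_label
        next_label += 1
    return labels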

12
Acceleration
  • Sparse EM (see the sketch after this list)
    • update only the fraction p of points with the largest posteriors
    • saves the share (1 − p) of kernel computations after the first iteration
  • Data Reduction
    • use only a fraction p of the data as representative points, chosen by
      • random sampling
      • kMeans
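
A sketch of the Sparse-EM idea on top of the hill climbing above; p, eps, and max_iter are illustrative choices:

import numpy as np

def sparse_em_hill_climb(x, data, h, p=0.2, eps=1e-4, max_iter=100):
    # After one full E-step, recompute kernels only for the fraction p
    # of points with the largest posteriors; all other weights stay frozen.
    u = (x - data) / h
    k = np.exp(-0.5 * np.sum(u ** 2, axis=1))       # full first E-step
    n_active = max(1, int(p * len(data)))
    active = np.argsort(k)[-n_active:]              # indices with largest posteriors
    for _ in range(max_iter):
        x_new = (k[:, None] * data).sum(axis=0) / k.sum()  # M-step
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
        ua = (x - data[active]) / h                 # sparse E-step: active points only
        k[active] = np.exp(-0.5 * np.sum(ua ** 2, axis=1))
    return x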

13
Experiments
  • Comparison of DENCLUE 1.0 (FS) vs. 2.0 (SSA)
  • 16-dim. artificial data
  • both methods are tuned to find the correct
    clustering

14
Experiments
  • Comparison of acceleration methods

15
Experiments
  • Clustering quality (normalized mutual
    information, NMI) vs. sample size (RS)

16
Experiments
  • Cluster quality (NMI) of DENCLUE 2.0 (SSA), its acceleration methods, and k-Means on real data

(sample sizes 0.8, 0.4, 0.2)
17
Conclusion
  • New hill climbing for DENCLUE
  • Automatic step size adjustment
  • Convergence proof by reduction to EM
  • Allows the application of general EM
    accelerations
  • Future work
  • automatic setting of the smoothing parameter h (so far tuned manually)

18
Thank you for your attention!