1
On the Lower Bound of Local Optimum in K-Means
Algorithm
  • Zhang Zhenjie, Dai Bing Tian, and Anthony K.H.
    Tung

2
Outline
  • Introduction
  • Maximal Region
  • Algorithms
  • Experiments
  • Conclusion and Future Work

3
Introduction
  • K-Means Algorithm (a code sketch follows this slide)
  • Pick k centers randomly
  • K-Means Iterations
  • Assign every point to the closest center
  • Compute the center of every cluster to replace
    the old one
  • Stop the algorithm if the centers are stable
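A minimal sketch of these steps in Python (using NumPy; this illustrates the standard algorithm described above and is not code from the paper):

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        """Plain k-means on an (N, d) data matrix X."""
        rng = np.random.default_rng(seed)
        # Pick k centers randomly from the data.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Assign every point to the closest center.
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Compute the center of every cluster to replace the old one
            # (keep the old center if its cluster is empty).
            new_centers = np.array([X[labels == j].mean(axis=0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            # Stop the algorithm if the centers are stable.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels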

4
Introduction (cont.)
  • Cost
  • Sum of the squared distances from every point to
    its closest center
  • The cost decreases after every k-means iteration
  • Global Optimum
  • The center set minimizing the cost
  • Local Optimum
  • The center set output by k-means from some choice
    of initial centers
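In symbols, with D the data set and M = {m_1, ..., m_k} the current center set, the cost described above is (my restatement, matching the standard k-means objective):

    C(M) = \sum_{p \in D} \min_{m \in M} \lVert p - m \rVert^2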

5
Introduction (cont.)
  • Disadvantages of Local Optimum
  • Can be much worse than the global optimum
  • Requires re-running the algorithm with different
    initial centers
  • This wastes computational resources
  • Solution?
  • Find a center set that leads to the global
    optimum?
  • Detect local optima with large cost as early as
    possible? (the target of this paper)

6
Introduction (cont.)
  • A simple solution for early detection

[Figure: cost vs. iteration curve. Stop when the decrease in cost after one iteration is small.]
7
Introduction (cont.)
  • A simple solution for early detection

[Figure: cost vs. iteration curve. With this criterion, a much better local optimum is missed.]
8
Introduction (cont.)
  • Our solution

[Figure: cost vs. iteration curve. A lower bound is derived and used to guess the potential of the current clustering.]
9
Introduction (cont.)
  • Our solution

[Figure: cost vs. iteration curves. If the yellow curve represents the current best solution, we can stop the computation at this point.]
10
Outline
  • Introduction
  • Maximal Region
  • Algorithms
  • Experiments
  • Conclusion and Future Work

11
Solution Space
  • Given a d-dimensional problem space, we define
    the solution space as a kd-dimensional space

[Figure: two centers c1 and c2 in the problem space correspond to a single point M1 in the solution space.]
12
Solution Space
  • As the iterations proceed, the center set jumps
    around in the solution space

[Figure: successive center sets M1, M2, M3 plotted as points jumping through the solution space with axes c1 and c2.]
13
Definition of Maximal Region
  • A Maximal Region is a region in the solution
    space that covers the local optimum reached by
    future iterations
  • Two problems
  • How to find such a maximal region
  • How to lower bound the cost of any solution in
    the maximal region

14
Maximal Region
[Figure: the cost of center sets in the solution space shown as contour lines, with lighter colors meaning smaller cost (axes c1, c2). Any solution between M1 and M2 must have smaller cost than M1.]
15
Maximal Region
[Figure: the maximal region around M1 in the solution space (axes c1, c2); the local optimum must lie inside this region.]
16
Maximal Region
  • A region is a maximal region for a center set M if
  • It contains M
  • Every solution on the boundary of the region has
    the same cost as M
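Written out (my formalization of the two conditions above, not copied from the paper): a region R in the solution space is a maximal region for a center set M when

    M \in R \quad \text{and} \quad C(M') = C(M) \ \text{for every } M' \in \partial R,

where \partial R denotes the boundary of R and C(\cdot) is the clustering cost defined earlier.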

17
A Special Maximal Region
[Figure: a special maximal region around M1 in the solution space (axes c1, c2): every center moves no more than Delta.]
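In the R(M, Delta) notation used on the later algorithm slides, this special region can be written as (an assumed formalization; the paper's exact definition may differ in detail):

    R(M, \Delta) = \{\, M' = \{m'_1, \dots, m'_k\} \ : \ \lVert m'_i - m_i \rVert \le \Delta \ \text{for every } i \,\},

i.e. the set of center sets obtained by moving every center of M by at most Delta.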
18
Maximal Region
[Figure: the maximal region around the center set M1, whose individual centers are m1 and m2.]
19
Costs in Maximal Region
  • Bounding Theorem
  • Any solution in the maximal region R(M1, Delta)
    must have cost no less than C(M1) - Delta · N,
    where C(M1) is the cost of M1 and N is the size
    of the data set
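As an inequality (assuming the region left implicit on the slide is R(M1, Delta)):

    C(M') \ \ge\ C(M_1) - \Delta \cdot N \qquad \text{for every } M' \in R(M_1, \Delta).

This is what enables early pruning: once C(M1) - Delta · N is no smaller than the cost of the best solution found so far, the current run cannot lead to a better local optimum and can be stopped.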

20
Outline
  • Introduction
  • Maximal Region
  • Algorithms
  • Experiments
  • Conclusion and Future Work

21
Algorithm
  • New Algorithm
  • Same initial center selection as standard k-means
  • New iteration (sketched in code below)
  • Reassignment of points to their closest centers
  • Computing the new centers M
  • Finding the smallest R(M, Delta)
  • Computing the lower bound within the maximal
    region
  • Checking the stopping criterion, or pruning the
    current run
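A minimal Python sketch of one such iteration (my own structuring of the steps listed above; find_smallest_delta stands in for the segment-search procedure on the next slide and is a hypothetical helper, not the authors' code):

    import numpy as np

    def akm_iteration(X, centers, best_cost):
        """One accelerated k-means iteration with lower-bound pruning.
        X: (N, d) data, centers: (k, d) current centers,
        best_cost: cost of the best local optimum found so far."""
        # Reassignment: every point goes to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Compute the new centers M (keep an old center if its cluster is empty).
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(len(centers))])

        # Cost of the new clustering.
        cost = float(((X - new_centers[labels]) ** 2).sum())

        # Find the smallest Delta such that the future local optimum stays in
        # R(new_centers, Delta), then lower bound every solution in that
        # region by cost - Delta * N (the bounding theorem).
        delta = find_smallest_delta(X, new_centers)   # hypothetical helper
        lower_bound = cost - delta * len(X)

        # Prune the current run if it can no longer beat the best solution.
        pruned = lower_bound >= best_cost
        return new_centers, cost, pruned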

22
Finding Smallest Delta
  • The value of Delta can be any non-negative real
    number
  • Divide the search range into N+1 segments
    [0, a(1)), [a(1), a(2)), ..., [a(N), infinity)
  • Search the segments in order, starting from
    [0, a(1))
  • On every segment, solve a quadratic equation
  • If a feasible root is found, return it as the
    smallest Delta (see the sketch below)
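The scan itself might look like the sketch below. The breakpoints a(1), ..., a(N) and the quadratic solved on each segment come from the paper's analysis and are not given on the slides, so quadratic_on_segment is a hypothetical stand-in:

    import math

    def scan_segments_for_delta(breakpoints, quadratic_on_segment):
        """Scan [0, a(1)), [a(1), a(2)), ..., [a(N), inf) in order and
        return the first (hence smallest) feasible root as Delta."""
        edges = [0.0] + sorted(breakpoints) + [math.inf]
        for lo, hi in zip(edges[:-1], edges[1:]):
            # Coefficients of A*Delta^2 + B*Delta + C = 0 valid on [lo, hi).
            A, B, C = quadratic_on_segment(lo, hi)
            if A == 0:
                roots = [] if B == 0 else [-C / B]
            else:
                disc = B * B - 4 * A * C
                if disc < 0:
                    continue                      # no real root on this segment
                roots = sorted([(-B - math.sqrt(disc)) / (2 * A),
                                (-B + math.sqrt(disc)) / (2 * A)])
            for r in roots:
                if lo <= r < hi:                  # root must lie in the segment
                    return r
        return None                               # no feasible Delta found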

23
Algorithm
  • Finding the smallest Delta to bound the local
    optimum in the Maximal Region
  • Sorting and Scan Algorithm
  • Complexity is O(N log N), where N is the size of
    the data
  • Lower bounding the cost of the local optimum
  • Simple computation
  • Done in O(1) time

24
Outline
  • Introduction
  • Maximal Region
  • Algorithms
  • Experiments
  • Conclusion and Future Work

25
Experiments
  • Data Set
  • Synthetic data sets and KDD99 data set
  • Original K-Means Algorithm (OKM)
  • Accelerated K-Means Algorithm (AKM)
  • Run k-means clustering several times
  • The best result from the previous runs is used to
    prune the following runs (sketched below)
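The restart-and-prune protocol can be sketched as follows (building on the akm_iteration sketch from the algorithm section; initialization and convergence handling are simplified and are my own assumptions, while OKM would run the same loop without the pruning check):

    import numpy as np

    def accelerated_kmeans(X, k, runs=10, max_iter=100, seed=0):
        """Repeated k-means in which later runs are pruned as soon as
        their lower bound exceeds the best cost found so far."""
        rng = np.random.default_rng(seed)
        best_cost, best_centers = np.inf, None
        for _ in range(runs):
            # Same initial center selection as the original k-means.
            centers = X[rng.choice(len(X), size=k, replace=False)]
            for _ in range(max_iter):
                centers_new, cost, pruned = akm_iteration(X, centers, best_cost)
                if pruned:
                    break                 # this run cannot beat the current best
                if np.allclose(centers_new, centers):
                    break                 # centers are stable: a local optimum
                centers = centers_new
            if cost < best_cost:
                best_cost, best_centers = cost, centers_new
        return best_centers, best_cost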

26
Experiments
  • Measurement
  • We use the same random seeds for OKM and AKM
  • We measure the number of iterations (I/O cost)
    and the computation time (CPU cost)

27
Experiments (cont.)
  • Varying dimensionality on synthetic data sets

28
Experiments (cont.)
  • Varying k on synthetic data sets

29
Experiments (cont.)
  • Varying k on KDD99 data set

30
Conclusion and Future Work
  • Contributions
  • A lower bound on the local optimum in the K-Means
    algorithm
  • The concept of the Maximal Region
  • An algorithm for finding the Maximal Region
  • An accelerated K-Means algorithm

31
Conclusion and Future Work
  • Additional Applications
  • Data stream clustering
  • Real-time cluster analysis over moving objects
  • Improvements
  • Tighter bounds
  • Extension to general clustering algorithms

32
Q & A