Analyzing and Improving Local Search: k-means and ICP - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Analyzing and Improving Local Search: k-means and ICP


1
Analyzing and Improving Local Search: k-means and ICP
  • David Arthur
  • Special University Oral Exam
  • Stanford University

2
What is this talk about?
  • Two popular but poorly understood algorithms
  • Fast in practice
  • Nobody knows why
  • Find (highly) sub-optimal solutions
  • The big questions
  • What makes these algorithms fast?
  • How can the solutions they find be improved?

3
What is k-means?
  • Divides point-set into k tightly-packed
    clusters

k = 3
4
What is ICP (Iterative Closest Point)?
  • Finds a subset of A similar to B

A

B
5
Main outline
  • Focus on k-means in this talk
  • Goals
  • Understand running time
  • Harder than you might think!
  • Worst case exponential
  • Smoothed polynomial
  • Find better clusterings
  • k-means++ (a modification of k-means)
  • Provably near-optimal
  • In practice faster and more accurate than the
    competition

6
Main outline
  • What is k-means?
  • k-means worst-case complexity
  • k-means smoothed complexity
  • k-means++

7
Main outline
  • What is k-means?
  • What exactly is being solved?
  • Why this algorithm?
  • How does it work?
  • k-means worst-case complexity
  • k-means smoothed complexity
  • k-means++

8
The k-means problem
  • Input
  • An integer k
  • A set X of n points in Rd
  • Task
  • Partition the points into k clusters C1, C2, ..., Ck
  • Also choose centers c1, c2, ..., ck for the clusters

k = 3
9
The k-means problem
  • Input
  • An integer k
  • A set X of n points in Rd
  • Task
  • Partition the points into k clusters C1, C2, ..., Ck
  • Also choose centers c1, c2, ..., ck for the clusters
  • Minimize the objective function f = Σ_{x∈X} ||x − c(x)||² (see the code sketch below)
  • where c(x) is the center of the cluster containing x
  • Similar to variance

k = 3
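For concreteness (not part of the slides), the objective can be written in a few lines of Python; `kmeans_potential` is a hypothetical helper operating on NumPy arrays:

```python
import numpy as np

def kmeans_potential(X, centers, labels):
    """k-means objective f: the sum over all points of the squared distance
    to the center of the cluster containing that point."""
    # X: (n, d) points, centers: (k, d), labels: (n,) cluster index per point
    return float(np.sum((X - centers[labels]) ** 2))
```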
10
The k-means problem
  • Problem is NP-hard
  • Even when k = 2 (Drineas et al., 04)
  • Even when d = 2 (Mahajan et al., 09)
  • Some (1+ε)-approximation algorithms known
  • Example running times
  • O(n + k^(k+2) · ε^(-(2d+1)k) · log^(k+1)(n) · log^(k+1)(1/ε))
  • (Har-Peled and Mazumdar, 04)
  • O(2^((k/ε)^O(1)) · d·n)
  • (Kumar et al., 04)
  • All exponential (or worse) in k

11
The k-means problem
  • An example real-world data set
  • From the UC-Irvine Machine Learning Repository
  • Looking to detect malevolent network connections
  • n = 494,021
  • k = 100
  • d = 38
  • (1+ε)-approximation algorithms are too slow for this!

12
k-means method
  • The fast way
  • k-means method (Lloyd 82, MacQueen 67)
  • By far the most popular clustering algorithm
    used in scientific and industrial applications
    (Berkhin, 02)

13
k-means method
  • Start with k arbitrary centers ci
  • (In practice chosen at random from data points)

15
k-means method
  • Start with k arbitrary centers ci
  • Assign points to clusters Ci based on closest ci

18
k-means method
  • Start with k arbitrary centers ci
  • Assign points to clusters Ci based on closest ci
  • Set each ci to be the center of mass of Ci

20
k-means method
  • Start with k arbitrary centers ci
  • Assign points to clusters Ci based on closest ci
  • Set each ci to be the center of mass of Ci
  • Repeat the last two steps until stable

31
k-means method
  • Start with k arbitrary centers ci
  • Assign points to clusters Ci based on closest ci
  • Set each ci to be the center of mass of Ci
  • Repeat the last two steps until stable

Clustering is stable: k-means terminates!
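A minimal Python sketch of the loop described above (illustrative only, not code from the talk; `lloyd_kmeans` is a hypothetical name, and ties and empty clusters are handled naively):

```python
import numpy as np

def lloyd_kmeans(X, k, centers=None, seed=0):
    """k-means method: assign each point to its closest center, move each
    center to the center of mass of its cluster, repeat until stable."""
    rng = np.random.default_rng(seed)
    if centers is None:
        # In practice the starting centers are chosen at random from the data points.
        centers = X[rng.choice(len(X), size=k, replace=False)]
    centers = np.asarray(centers, dtype=float).copy()
    labels = None
    while True:
        # Assignment step: cluster Ci = points whose closest center is ci.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return centers, labels  # clustering is stable: k-means terminates
        labels = new_labels
        # Update step: each ci becomes the center of mass of Ci.
        for i in range(k):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
```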
32
What is known already
  • Number of iterations
  • Finite!
  • Each iteration decreases f
  • In practice
  • Sub-linear (Duda, 00)
  • Worst-case
  • Ω(n) (Har-Peled and Sadri, 04)
  • min(O(n^(3kd)), O(k^n)) (Inaba et al., 00)
  • Very large gap!

33
What is known already
  • Accuracy
  • Only finds a local optimum
  • Local optimum can be arbitrarily bad
  • i.e., f / fOPT unbounded
  • Even with high probability in 1 dimension
  • Even in natural examples: well-separated Gaussians

34
Main outline
  • What is k-means?
  • k-means worst-case complexity
  • Number of iterations can be super-polynomial!
  • k-means smoothed complexity
  • k-means++
  • How slow is the k-means method? (Arthur and
    Vassilvitskii, 06)

35
Worst-case overview
  • Recursive construction
  • Start with input X (goes from A to B in T
    iterations)
  • Then modify X
  • Add 1 dimension, O(k) points, O(1) clusters
  • Old part of input still goes from A to B in T
    iterations
  • New part: resets everything once

Schematic: the old input runs A → B in T iterations, is reset, then runs A → B again in T iterations.
36
Worst-case overview
  • Recursive construction
  • Repeat m times
  • O(m²) points
  • O(m) clusters
  • 2^m iterations
  • Lower bound of 2^Ω(√n) follows

37
Recursive construction (Overview)
The original input X (Data points not shown)
Start with an arbitrary input...
38
Recursive construction (Overview)
... and add O(1) clusters, O(k) points along a new dimension (the new clusters are G and H). Note the symmetry!
39
Recursive construction (Trace t = 0)
Zoomed in, showing only one side. We trace k-means from here.
40
Recursive construction (Trace t = 0 ... T)
New points are far away. New clusters are stable while k-means works on the old points.
42
Recursive construction (Trace t = T+1): Assigning points to clusters
Choose pi to be a direct lift of the final Ci center. At time T+1, pi is closer to joining Ci than ever before. Can position G so pi joins Ci at time T+1.
45
Recursive construction (Trace t = T+1): Recomputing centers
Center of G moves further away. Centers of Ci constant by symmetry.
47
Recursive construction (Trace t = T+2): Assigning points to clusters
G's center is far away, so it loses points. Each qi switches to Ci regardless of qi's position in the base space.
50
Recursive construction (Trace t = T+2): Recomputing centers
Centers reset to t = 0. By symmetry, the centers of Ci are not lifted towards G. Choose qi's position to reset Ci in the base space.
52
Recursive construction (Trace t = T+3): Assigning points to clusters
H has moved closer to pi, qi but Ci has not. Position H so pi, qi switch to H now.
53
Recursive construction (Trace t = T+3): Assigning points to clusters
Same state as t = 1
55
Recursive construction (Trace t = T+3): Recomputing centers
Same state as t = 1
58
Recursive construction (Trace t = T+4): Assigning points to clusters
Same state as t = 2
61
Recursive construction (Trace t = T+4): Recomputing centers
Same state as t = 2
We are done! The new clusters are completely stable; T−2 more iterations are needed for the Ci. Total time: 2T + 2.
62
Worst-case complexity summary
  • k-means can require super-polynomially many iterations
  • For random centers, even with high probability
  • Even when d = 2 (Vattani, 09)
  • (d = 1 is open)

63
Worst-case complexity summary
  • k-means can require super-polynomially many iterations
  • For random centers, even with high probability
  • Even when d = 2 (Vattani, 09)
  • (d = 1 is open)
  • ICP
  • Can require Ω((n/d)^d) iterations
  • Similar (but easier) argument

64
Main outline
  • What is k-means?
  • k-means worst-case complexity
  • k-means smoothed complexity
  • Take an arbitrary input, but randomly perturb it
  • Expected number of iterations is polynomial
  • Works for any k, d
  • k-means++
  • k-means has smoothed polynomial complexity
    (Arthur, Manthey, and Röglin, 09)

65
The problem with worst-case complexity
  • What is the problem?
  • k-means has bad worst-case complexity
  • But is not actually slow in practice
  • Need a different model to understand real world
  • A simple explanation: average-case analysis
  • But real-world data is not random

66
The problem with worst-case complexity
  • A better explanation
  • Smoothed analysis (Spielman and Teng, 01)
  • Between average case and worst case
  • Perturb each point by a normal distribution with variance σ² (sketched in code below)
  • Show expected running time is polynomial in n and D/σ
  • D = diameter of the point-set
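For concreteness, the perturbation model can be sketched in a few lines of Python (assumed NumPy; `perturb` is a hypothetical helper, not code from the talk):

```python
import numpy as np

def perturb(X, sigma, seed=0):
    """Smoothed-analysis model: add independent Gaussian noise of variance
    sigma^2 to every coordinate of every data point."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(scale=sigma, size=X.shape)
```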

67
Proof overview
  • Recall the potential function f = Σ_{x∈X} ||x − c(x)||²
  • X is the set of all data points
  • c(x) is the corresponding cluster center

68
Proof overview
  • Recall the potential function
  • X is set of all data points
  • c(x) is corresponding cluster center
  • Bound f
  • f ≤ nD² initially
  • Will prove f is very likely to drop by ε² each iteration

69
Proof overview
  • Recall the potential function
  • X is set of all data points
  • c(x) is corresponding cluster center
  • Bound f
  • f ≤ nD² initially
  • Will prove f is very likely to drop by ε² each iteration
  • Gives: number of iterations is at most n(D/ε)²

70
The easy approach
  • Do union bound over all possible k-means steps
  • What defines a step?
  • Original clustering A (≤ k^n choices)
  • Actually ≤ n^(3kd) choices (Inaba et al., 00)
  • Resulting clustering B
  • Total number of possible steps ≤ n^(6kd)
  • Probability a fixed step can be bad
  • Bounded by probability that A and B have near-identical f
  • Probability ≤ (ε/σ)^d

71
The easy approach
  • The argument
  • P[k-means takes more than n(D/ε)² iterations]
  • ≤ P[there exists a possible bad step]
  • ≤ (# of possible steps) · P[step is bad]
  • ≤ n^(6kd) · (ε/σ)^d
  • small... if ε < σ · (1/n)^O(k)
  • Resulting bound: n^O(k) iterations
  • Not polynomial!
  • (Arthur and Vassilvitskii, 06), (Manthey and Röglin, 09)

72
How can this be improved?
  • Union bound is wasteful!
  • These two k-means steps can be analyzed together

73
How can this be improved?
  • Union bound is wasteful!
  • These two k-means steps can be analyzed together

If point is not equidistant between centers,
potential drops. True for both pictures.
74
How can this be improved?
  • Union bound is wasteful!
  • These two k-means steps can be analyzed together

Other clusters do not matter...
75
How can this be improved?
  • Union bound is wasteful!
  • These two k-means steps can be analyzed together

And for the relevant clusters, only the center
matters, not the exact points.
76
How can this be improved?
  • A transition blueprint
  • Which points switched clusters
  • Approximate positions for relevant centers
  • Bonus: most approximate centers are determined by the above!
  • Non-obvious facts
  • m = number of points switching clusters
  • # of transition blueprints ≤ (nk²)^m · (D/ε)^O(m)
  • P[blueprint is bad] ≤ (ε/σ)^m
  • (for most blueprints)

77
A good approach
  • The new argument
  • P[k-means takes more than n(D/ε)² iterations]
  • ≤ P[there exists a possible bad blueprint]
  • ≤ (# of possible blueprints) · P[blueprint is bad]
  • ≤ (nk²)^m · (D/ε)^O(m) · (ε/σ)^m
  • small... if ε < σ · (σ/nD)^O(1)
  • Resulting bound: polynomially many iterations!

78
Smoothed complexity summary
  • Smoothed complexity is polynomial
  • Still have work to do: O(n^26)
  • Getting tight exponents in smoothed analysis is hard
  • Original theorem for the simplex method: O(n^96)!
  • ICP
  • Also polynomial smoothed complexity
  • Much easier argument!

79
Main outline
  • What is k-means?
  • k-means worst-case complexity
  • k-means smoothed complexity
  • k-means++
  • What's wrong with k-means?
  • What's k-means++?
  • O(log k)-competitive with OPT
  • Experimental results
  • k-means++: The advantages of careful seeding (Arthur and Vassilvitskii, 07)

80
What's wrong with k-means?
  • Recall
  • Only finds a local optimum
  • Local optimum can be arbitrarily bad
  • i.e., f / fOPT unbounded
  • Even with high probability in 1 dimension
  • Even in natural examples: well-separated Gaussians

81
What's wrong with k-means?
  • k-means locally optimizes a clustering
  • But can miss the big picture!

If the data set has well separated clusters...
82
What's wrong with k-means?
  • k-means locally optimizes a clustering
  • But can miss the big picture!

... and we use the standard approach (choose initial centers uniformly at random), it is easy to get two centers in one cluster...
83
What's wrong with k-means?
  • k-means locally optimizes a clustering
  • But can miss the big picture!

... and then k-means gets stuck in a local optimum
84
The solution
  • Easy way to fix this mistake
  • Make centers far apart

85
k-means++
  • The right way of choosing initial centers
  • Choose first center uniformly at random

86
k-means++
  • The right way of choosing initial centers
  • Choose first center uniformly at random
  • Repeat until there are k centers
  • Add a new center
  • Choose x = x0 with probability D(x0)² / Σ_x D(x)²
  • D(x) = distance between x and the closest center already chosen (see the sketch below)
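A minimal Python sketch of this seeding rule (illustrative only; `kmeans_pp_seed` is a hypothetical name, not code from the talk):

```python
import numpy as np

def kmeans_pp_seed(X, k, seed=0):
    """k-means++ seeding: the first center is a uniformly random data point;
    each later center is a data point x chosen with probability proportional
    to D(x)^2, where D(x) is the distance from x to the closest center so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # Squared distance from every point to its closest already-chosen center.
        d2 = ((X[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(centers, dtype=float)
```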

87
k-means++
  • Example

(k = 3)
88
k-means++
  • Example

(k = 3)
Choose 1st center uniformly at random
89
k-means++
  • Example

D² = 1² + 7²
D² = 8² + 4²
D² = 7² + 3²
D² = 2² + 1²
(k = 3)
Choose 2nd center with probability proportional to D²
90
k-means++
  • Example

D² = 1² + 7²
D² = 1² + 1²
D² = 2² + 1²
(k = 3)
Choose 3rd center with probability proportional to D²
91
k-means++
  • Example

(k = 3)
Now run k-means as normal
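Putting the earlier sketches together on toy data (using the hypothetical helpers `kmeans_pp_seed`, `lloyd_kmeans`, and `kmeans_potential` defined above; not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: three well-separated Gaussian blobs in the plane.
X = np.concatenate([rng.normal(loc=c, scale=0.3, size=(200, 2))
                    for c in [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]])

# Vanilla k-means: initial centers chosen uniformly at random from the data.
centers, labels = lloyd_kmeans(X, k=3, seed=7)
# k-means++: D^2-weighted seeding, then the same k-means iterations.
pp_centers, pp_labels = lloyd_kmeans(X, k=3, centers=kmeans_pp_seed(X, k=3, seed=7))

print("random seeding:    f =", kmeans_potential(X, centers, labels))
print("k-means++ seeding: f =", kmeans_potential(X, pp_centers, pp_labels))
```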
92
Theoretical guarantee
  • Claim
  • E[f] ≤ O(log k) · fOPT
  • Guarantee holds as soon as the centers are picked
  • But the k-means steps that follow improve it further in practice

93
Proof idea
  • Let C1, C2, ..., Ck be OPT clusters
  • Points in Ci contribute
  • fOPT(Ci) to OPT potential
  • fkm(Ci) to k-means potential
  • Lemma: If we pick a center from Ci
  • E[fkm(Ci)] ≤ 8 · fOPT(Ci)
  • Proof: Linear algebra magic
  • True for any reasonable probability distribution

94
Proof idea
  • Real danger: we waste a center on an already-covered Ci
  • Probability of choosing a covered cluster
  • = fcurrent(covered clusters) / fcurrent
  • ≤ 8 · fOPT / fcurrent
  • Cost of choosing one
  • If t uncovered clusters are left
  • a 1/t fraction of fcurrent is now unfixable
  • Cost ≈ fcurrent / t
  • Expected cost
  • ≈ (fOPT / fcurrent) · (fcurrent / t) = fOPT / t (up to constants)

95
Proof idea
  • Cost over all k steps
  • ≈ fOPT · (1/1 + 1/2 + ... + 1/k) = fOPT · O(log k)

96
k-means++ accuracy improvement
(Chart: improvement factor in f, per data set)
97
k-means++ vs k-means
  • Values above 1 indicate k-means++ is out-performing k-means

98
Other algorithms?
  • Theory community has proposed other reasonable
    algorithms too
  • Iterative swapping best in practice (Kanungo et
    al., 04)
  • Theoretically O(1)-approximation
  • Implementation gives this up to be viable in
    practice
  • Actual guarantees: None

99
k-means++ accuracy improvement vs Iterative Swapping (Kanungo et al., 04)
(Chart: improvement factor in f, per data set)
100
k-means++ vs Iterative Swapping
  • Values above 1 indicate k-means++ is out-performing Iterative Swapping

101
k-means++ summary
  • Friends don't let friends use vanilla k-means!
  • k-means++ has a provable accuracy guarantee
  • O(log k)-competitive with OPT
  • k-means++ is faster on average
  • k-means++ gets better clusterings (almost) always

102
Main outline
  • Goals
  • Understand number of iterations
  • Harder than you might think!
  • Worst case exponential
  • Smoothed polynomial
  • Find better clusterings
  • k-means++ (a modification of k-means)
  • Provably near-optimal
  • In practice faster and better than the
    competition

103
Special thanks!
  • My advisor
  • Rajeev Motwani
  • My defense committee
  • Ashish Goel
  • Vladlen Koltun
  • Serge Plotkin
  • Tim Roughgarden

104
Special thanks!
  • My co-authors
  • Bodo Manthey
  • Rina Panigrahy
  • Heiko Röglin
  • Aneesh Sharma
  • Sergei Vassilvitskii
  • Ying Xu

105
Special thanks!
  • My other fellow students
  • Gagan Aggarwal
  • Brian Babcock
  • Bahman Bahmani
  • Krishnaram Kenthapadi
  • Aleksandra Korolova
  • Shubha Nabar
  • Dilys Thomas

106
Special thanks!
  • And my listeners!