Analyzing and Improving Local Search: k-means and ICP - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Analyzing and Improving Local Search: k-means and ICP


1
Analyzing and Improving Local Search: k-means and ICP
  • David Arthur
  • Special University Oral Exam
  • Stanford University

2
What is this talk about?
  • Two popular but poorly understood algorithms
  • Fast in practice
  • Nobody knows why
  • Find (highly) sub-optimal solutions
  • The big questions
  • What makes these algorithms fast?
  • How can the solutions they find be improved?

3
What is k-means?
  • Divides point-set into k tightly-packed
    clusters

k = 3
4
What is ICP (Iterative Closest Point)?
  • Finds a subset of A similar to B

A

B
5
Main outline
  • Focus on k-means in this talk
  • Goals
  • Understand running time
  • Harder than you might think!
  • Worst case exponential
  • Smoothed polynomial
  • Find better clusterings
  • k-means++ (a modification of k-means)
  • Provably near-optimal
  • In practice faster and more accurate than the
    competition

6
Main outline
  • What is k-means?
  • k-means worst-case complexity
  • k-means smoothed complexity
  • k-means++

7
Main outline
  • What is k-means?
  • What exactly is being solved?
  • Why this algorithm?
  • How does it work?
  • k-means worst-case complexity
  • k-means smoothed complexity
  • k-means++

8
The k-means problem
  • Input
  • An integer k
  • A set X of n points in Rd
  • Task
  • Partition the points into k clusters C1, C2, ..., Ck
  • Also choose centers c1, c2, ..., ck for the clusters

k = 3
9
The k-means problem
  • Input
  • An integer k
  • A set X of n points in Rd
  • Task
  • Partition the points into k clusters C1, C2, ..., Ck
  • Also choose centers c1, c2, ..., ck for the clusters
  • Minimize the objective function f = Σ_{x∈X} ||x − c(x)||² (see the code sketch below)
  • where c(x) is the center of the cluster containing x
  • Similar to variance

k = 3
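For concreteness (not part of the slides), the objective can be written in a few lines of Python; `kmeans_potential` is a hypothetical helper operating on NumPy arrays:

```python
import numpy as np

def kmeans_potential(X, centers, labels):
    """k-means objective f: the sum over all points of the squared distance
    to the center of the cluster containing that point."""
    # X: (n, d) points, centers: (k, d), labels: (n,) cluster index per point
    return float(np.sum((X - centers[labels]) ** 2))
```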
10
The k-means problem
  • Problem is NP-hard
  • Even when k = 2 (Drineas et al., 04)
  • Even when d = 2 (Mahajan et al., 09)
  • Some (1+ε)-approximation algorithms known
  • Example running times
  • O(n + k^(k+2) · ε^(-(2d+1)k) · log^(k+1)(n) · log^(k+1)(1/ε))
  • (Har-Peled and Mazumdar, 04)
  • O(2^((k/ε)^O(1)) · d·n)
  • (Kumar et al., 04)
  • All exponential (or worse) in k

11
The k-means problem
  • An example real-world data set
  • From the UC-Irvine Machine Learning Repository
  • Looking to detect malevolent network connections
  • n = 494,021
  • k = 100
  • d = 38
  • (1+ε)-approximation algorithms are too slow for this!

12
k-means method
  • The fast way
  • k-means method (Lloyd 82, MacQueen 67)
  • By far the most popular clustering algorithm
    used in scientific and industrial applications
    (Berkhin, 02)

13
k-means method
  • Start with k arbitrary centers ci
  • (In practice chosen at random from data points)

15
k-means method
  • Start with k arbitrary centers ci
  • Assign points to clusters Ci based on closest ci

18
k-means method
  • Start with k arbitrary centers ci
  • Assign points to clusters Ci based on closest ci
  • Set each ci to be the center of mass of Ci

20
k-means method
  • Start with k arbitrary centers ci
  • Assign points to clusters Ci based on closest ci
  • Set each ci to be the center of mass of Ci
  • Repeat the last two steps until stable

31
k-means method
  • Start with k arbitrary centers ci
  • Assign points to clusters Ci based on closest ci
  • Set each ci to be the center of mass of Ci
  • Repeat the last two steps until stable

Clustering is stable: k-means terminates!
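A minimal Python sketch of the loop described above (illustrative only, not code from the talk; `lloyd_kmeans` is a hypothetical name, and ties and empty clusters are handled naively):

```python
import numpy as np

def lloyd_kmeans(X, k, centers=None, seed=0):
    """k-means method: assign each point to its closest center, move each
    center to the center of mass of its cluster, repeat until stable."""
    rng = np.random.default_rng(seed)
    if centers is None:
        # In practice the starting centers are chosen at random from the data points.
        centers = X[rng.choice(len(X), size=k, replace=False)]
    centers = np.asarray(centers, dtype=float).copy()
    labels = None
    while True:
        # Assignment step: cluster Ci = points whose closest center is ci.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return centers, labels  # clustering is stable: k-means terminates
        labels = new_labels
        # Update step: each ci becomes the center of mass of Ci.
        for i in range(k):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
```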
32
What is known already
  • Number of iterations
  • Finite!
  • Each iteration decreases f
  • In practice
  • Sub-linear (Duda, 00)
  • Worst-case
  • Ω(n) (Har-Peled and Sadri, 04)
  • min(O(n^(3kd)), O(k^n)) (Inaba et al., 00)
  • Very large gap!

33
What is known already
  • Accuracy
  • Only finds a local optimum
  • Local optimum can be arbitrarily bad
  • i.e., f / fOPT unbounded
  • Even with high probability in 1 dimension
  • Even in natural examples: well-separated Gaussians

34
Main outline
  • What is k-means?
  • k-means worst-case complexity
  • Number of iterations can be super-polynomial!
  • k-means smoothed complexity
  • k-means++
  • How slow is the k-means method? (Arthur and
    Vassilvitskii, 06)

35
Worst-case overview
  • Recursive construction
  • Start with input X (goes from A to B in T
    iterations)
  • Then modify X
  • Add 1 dimension, O(k) points, O(1) clusters
  • Old part of input still goes from A to B in T
    iterations
  • New part: resets everything once

Schematic: the old input runs A → B in T iterations, is reset, then runs A → B again in T iterations.
36
Worst-case overview
  • Recursive construction
  • Repeat m times
  • O(m²) points
  • O(m) clusters
  • 2^m iterations
  • Lower bound of 2^Ω(√n) follows

37
Recursive construction (Overview)
The original input X (Data points not shown)
Start with an arbitrary input...
38
Recursive construction (Overview)
... and add O(1) clusters, O(k) points along a new dimension (the new clusters are G and H). Note the symmetry!
39
Recursive construction (Trace t = 0)
Zoomed in, showing only one side. We trace k-means from here.
40
Recursive construction (Trace t = 0 ... T)
New points are far away. New clusters are stable while k-means works on the old points.
42
Recursive construction (Trace t = T+1): Assigning points to clusters
Choose pi to be a direct lift of the final Ci center. At time T+1, pi is closer to joining Ci than ever before. Can position G so pi joins Ci at time T+1.
45
Recursive construction (Trace t = T+1): Recomputing centers
Center of G moves further away. Centers of Ci constant by symmetry.
47
Recursive construction (Trace t = T+2): Assigning points to clusters
G's center is far away, so it loses points. Each qi switches to Ci regardless of qi's position in the base space.
50
Recursive construction (Trace t = T+2): Recomputing centers
Centers reset to t = 0. By symmetry, the centers of Ci are not lifted towards G. Choose qi's position to reset Ci in the base space.
52
Recursive construction (Trace t = T+3): Assigning points to clusters
H has moved closer to pi, qi but Ci has not. Position H so pi, qi switch to H now.
53
Recursive construction (Trace t = T+3): Assigning points to clusters
Same state as t = 1
55
Recursive construction (Trace t = T+3): Recomputing centers
Same state as t = 1
58
Recursive construction (Trace t = T+4): Assigning points to clusters
Same state as t = 2
61
Recursive construction (Trace t = T+4): Recomputing centers
Same state as t = 2
We are done! The new clusters are completely stable; T−2 more iterations are needed for the Ci. Total time: 2T + 2.
62
Worst-case complexity summary
  • k-means can require super-polynomially many iterations
  • For random centers, even with high probability
  • Even when d = 2 (Vattani, 09)
  • (d = 1 is open)

63
Worst-case complexity summary
  • k-means can require super-polynomially many iterations
  • For random centers, even with high probability
  • Even when d = 2 (Vattani, 09)
  • (d = 1 is open)
  • ICP
  • Can require Ω((n/d)^d) iterations
  • Similar (but easier) argument

64
Main outline
  • What is k-means?
  • k-means worst-case complexity
  • k-means smoothed complexity
  • Take an arbitrary input, but randomly perturb it
  • Expected number of iterations is polynomial
  • Works for any k, d
  • k-means++
  • k-means has smoothed polynomial complexity
    (Arthur, Manthey, and Röglin, 09)

65
The problem with worst-case complexity
  • What is the problem?
  • k-means has bad worst-case complexity
  • But is not actually slow in practice
  • Need a different model to understand real world
  • A simple explanation: average-case analysis
  • But real-world data is not random

66
The problem with worst-case complexity
  • A better explanation
  • Smoothed analysis (Spielman and Teng, 01)
  • Between average case and worst case
  • Perturb each point by a normal distribution with variance σ² (sketched in code below)
  • Show expected running time is polynomial in n and D/σ
  • D = diameter of the point-set
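For concreteness, the perturbation model can be sketched in a few lines of Python (assumed NumPy; `perturb` is a hypothetical helper, not code from the talk):

```python
import numpy as np

def perturb(X, sigma, seed=0):
    """Smoothed-analysis model: add independent Gaussian noise of variance
    sigma^2 to every coordinate of every data point."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(scale=sigma, size=X.shape)
```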

67
Proof overview
  • Recall the potential function f = Σ_{x∈X} ||x − c(x)||²
  • X is the set of all data points
  • c(x) is the corresponding cluster center

68
Proof overview
  • Recall the potential function
  • X is set of all data points
  • c(x) is corresponding cluster center
  • Bound f
  • f ≤ nD² initially
  • Will prove f is very likely to drop by ε² each iteration

69
Proof overview
  • Recall the potential function
  • X is set of all data points
  • c(x) is corresponding cluster center
  • Bound f
  • f ≤ nD² initially
  • Will prove f is very likely to drop by ε² each iteration
  • Gives: number of iterations is at most n(D/ε)²

70
The easy approach
  • Do union bound over all possible k-means steps
  • What defines a step?
  • Original clustering A (≤ k^n choices)
  • Actually ≤ n^(3kd) choices (Inaba et al., 00)
  • Resulting clustering B
  • Total number of possible steps ≤ n^(6kd)
  • Probability a fixed step can be bad
  • Bounded by probability that A and B have near-identical f
  • Probability ≤ (ε/σ)^d

71
The easy approach
  • The argument
  • P[k-means takes more than n(D/ε)² iterations]
  • ≤ P[there exists a possible bad step]
  • ≤ (# of possible steps) · P[step is bad]
  • ≤ n^(6kd) · (ε/σ)^d
  • small... if ε < σ · (1/n)^O(k)
  • Resulting bound: n^O(k) iterations
  • Not polynomial!
  • (Arthur and Vassilvitskii, 06), (Manthey and Röglin, 09)

72
How can this be improved?
  • Union bound is wasteful!
  • These two k-means steps can be analyzed together

73
How can this be improved?
  • Union bound is wasteful!
  • These two k-means steps can be analyzed together

If point is not equidistant between centers,
potential drops. True for both pictures.
74
How can this be improved?
  • Union bound is wasteful!
  • These two k-means steps can be analyzed together

Other clusters do not matter...
75
How can this be improved?
  • Union bound is wasteful!
  • These two k-means steps can be analyzed together

And for the relevant clusters, only the center
matters, not the exact points.
76
How can this be improved?
  • A transition blueprint
  • Which points switched clusters
  • Approximate positions for relevant centers
  • Bonus: most approximate centers are determined by the above!
  • Non-obvious facts
  • m = number of points switching clusters
  • # of transition blueprints ≤ (nk²)^m · (D/ε)^O(m)
  • P[blueprint is bad] ≤ (ε/σ)^m
  • (for most blueprints)

77
A good approach
  • The new argument
  • P[k-means takes more than n(D/ε)² iterations]
  • ≤ P[there exists a possible bad blueprint]
  • ≤ (# of possible blueprints) · P[blueprint is bad]
  • ≤ (nk²)^m · (D/ε)^O(m) · (ε/σ)^m
  • small... if ε < σ · (σ/nD)^O(1)
  • Resulting bound: polynomially many iterations!

78
Smoothed complexity summary
  • Smoothed complexity is polynomial
  • Still have work to do: O(n^26)
  • Getting tight exponents in smoothed analysis is hard
  • Original theorem for the simplex method: O(n^96)!
  • ICP
  • Also polynomial smoothed complexity
  • Much easier argument!

79
Main outline
  • What is k-means?
  • k-means worst-case complexity
  • k-means smoothed complexity
  • k-means++
  • What's wrong with k-means?
  • What's k-means++?
  • O(log k)-competitive with OPT
  • Experimental results
  • k-means++: The advantages of careful seeding (Arthur and Vassilvitskii, 07)

80
What's wrong with k-means?
  • Recall
  • Only finds a local optimum
  • Local optimum can be arbitrarily bad
  • i.e., f / fOPT unbounded
  • Even with high probability in 1 dimension
  • Even in natural examples: well-separated Gaussians

81
What's wrong with k-means?
  • k-means locally optimizes a clustering
  • But can miss the big picture!

If the data set has well separated clusters...
82
What's wrong with k-means?
  • k-means locally optimizes a clustering
  • But can miss the big picture!

... and we use the standard approach (choose initial centers uniformly at random), it is easy to get two centers in one cluster...
83
What's wrong with k-means?
  • k-means locally optimizes a clustering
  • But can miss the big picture!

... and then k-means gets stuck in a local optimum
84
The solution
  • Easy way to fix this mistake
  • Make centers far apart

85
k-means++
  • The right way of choosing initial centers
  • Choose first center uniformly at random

86
k-means++
  • The right way of choosing initial centers
  • Choose first center uniformly at random
  • Repeat until there are k centers
  • Add a new center
  • Choose x = x0 with probability D(x0)² / Σ_x D(x)²
  • D(x) = distance between x and the closest center already chosen (see the sketch below)
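A minimal Python sketch of this seeding rule (illustrative only; `kmeans_pp_seed` is a hypothetical name, not code from the talk):

```python
import numpy as np

def kmeans_pp_seed(X, k, seed=0):
    """k-means++ seeding: the first center is a uniformly random data point;
    each later center is a data point x chosen with probability proportional
    to D(x)^2, where D(x) is the distance from x to the closest center so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # Squared distance from every point to its closest already-chosen center.
        d2 = ((X[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(centers, dtype=float)
```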

87
k-means++
  • Example

(k = 3)
88
k-means++
  • Example

(k = 3)
Choose 1st center uniformly at random
89
k-means++
  • Example

D² = 1² + 7²
D² = 8² + 4²
D² = 7² + 3²
D² = 2² + 1²
(k = 3)
Choose 2nd center with probability proportional to D²
90
k-means++
  • Example

D² = 1² + 7²
D² = 1² + 1²
D² = 2² + 1²
(k = 3)
Choose 3rd center with probability proportional to D²
91
k-means++
  • Example

(k = 3)
Now run k-means as normal
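Putting the earlier sketches together on toy data (using the hypothetical helpers `kmeans_pp_seed`, `lloyd_kmeans`, and `kmeans_potential` defined above; not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: three well-separated Gaussian blobs in the plane.
X = np.concatenate([rng.normal(loc=c, scale=0.3, size=(200, 2))
                    for c in [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]])

# Vanilla k-means: initial centers chosen uniformly at random from the data.
centers, labels = lloyd_kmeans(X, k=3, seed=7)
# k-means++: D^2-weighted seeding, then the same k-means iterations.
pp_centers, pp_labels = lloyd_kmeans(X, k=3, centers=kmeans_pp_seed(X, k=3, seed=7))

print("random seeding:    f =", kmeans_potential(X, centers, labels))
print("k-means++ seeding: f =", kmeans_potential(X, pp_centers, pp_labels))
```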
92
Theoretical guarantee
  • Claim
  • E[f] ≤ O(log k) · fOPT
  • Guarantee holds as soon as the centers are picked
  • But the k-means steps that follow improve it further in practice

93
Proof idea
  • Let C1, C2, ..., Ck be OPT clusters
  • Points in Ci contribute
  • fOPT(Ci) to OPT potential
  • fkm(Ci) to k-means potential
  • Lemma: If we pick a center from Ci
  • E[fkm(Ci)] ≤ 8 · fOPT(Ci)
  • Proof: Linear algebra magic
  • True for any reasonable probability distribution

94
Proof idea
  • Real danger: we waste a center on an already-covered Ci
  • Probability of choosing a covered cluster
  • = fcurrent(covered clusters) / fcurrent
  • ≤ 8 · fOPT / fcurrent
  • Cost of choosing one
  • If t uncovered clusters are left
  • a 1/t fraction of fcurrent is now unfixable
  • Cost ≈ fcurrent / t
  • Expected cost
  • ≈ (fOPT / fcurrent) · (fcurrent / t) = fOPT / t (up to constants)

95
Proof idea
  • Cost over all k steps
  • ≈ fOPT · (1/1 + 1/2 + ... + 1/k) = fOPT · O(log k)

96
k-means++ accuracy improvement
(Chart: improvement factor in f, per data set)
97
k-means++ vs k-means
  • Values above 1 indicate k-means++ is out-performing k-means

98
Other algorithms?
  • Theory community has proposed other reasonable
    algorithms too
  • Iterative swapping best in practice (Kanungo et
    al., 04)
  • Theoretically O(1)-approximation
  • Implementation gives this up to be viable in
    practice
  • Actual guarantees: None

99
k-means++ accuracy improvement vs Iterative Swapping (Kanungo et al., 04)
(Chart: improvement factor in f, per data set)
100
k-means++ vs Iterative Swapping
  • Values above 1 indicate k-means++ is out-performing Iterative Swapping

101
k-means++ summary
  • Friends don't let friends use vanilla k-means!
  • k-means++ has a provable accuracy guarantee
  • O(log k)-competitive with OPT
  • k-means++ is faster on average
  • k-means++ gets better clusterings (almost) always

102
Main outline
  • Goals
  • Understand number of iterations
  • Harder than you might think!
  • Worst case exponential
  • Smoothed polynomial
  • Find better clusterings
  • k-means++ (a modification of k-means)
  • Provably near-optimal
  • In practice faster and better than the
    competition

103
Special thanks!
  • My advisor
  • Rajeev Motwani
  • My defense committee
  • Ashish Goel
  • Vladlen Koltun
  • Serge Plotkin
  • Tim Roughgarden

104
Special thanks!
  • My co-authors
  • Bodo Manthey
  • Rina Panigrahy
  • Heiko Röglin
  • Aneesh Sharma
  • Sergei Vassilvitskii
  • Ying Xu

105
Special thanks!
  • My other fellow students
  • Gagan Aggarwal
  • Brian Babcock
  • Bahman Bahmani
  • Krishnaram Kenthapadi
  • Aleksandra Korolova
  • Shubha Nabar
  • Dilys Thomas

106
Special thanks!
  • And my listeners!