Title: Approximation Algorithms for Representative Points Problem of Clusters
- Sanpawat Kantabutra
- The Theory of Computation Group
- Computer Science Dept.
- Chiang Mai University
2 Outline
- Definitions
- Non-Existence of Absolute Performance Guarantee
- MST-Based Approximation Algorithm
- Greedy Approximation Algorithm
- Performance Guarantees
- Open Problems
3 Dissimilarity
- Let $d(x, y)$ denote the Euclidean distance between points $x$ and $y$, i.e. $d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$ over the coordinates of the two points.
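A minimal Python sketch of this dissimilarity measure, assuming points are equal-length tuples of floats. The name `euclidean` is illustrative only; Python's standard `math.dist` (3.8+) computes the same quantity and is used in the later sketches.

```python
import math
from typing import Sequence

def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) for two points of the same dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean((0.0, 0.0), (3.0, 4.0)))  # 5.0
```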
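4 δ-Clustering Problem
- Let S be a set of n d-dimensional points and δ a real number.
- We want to partition S into l clusters $C_1, C_2, \ldots, C_l$ such that $\forall x, y \in C_i$ with $x \neq y$, $\exists z_1, z_2, \ldots, z_m \in C_i$ such that
- $d(x, z_1) < \delta$, $d(z_t, z_{t+1}) < \delta$ for $1 \le t < m$, and $d(z_m, y) < \delta$,
- where each $C_i$ is a maximal cluster having this property, $\bigcup_i C_i = S$, and $C_i \cap C_j = \emptyset$ when $i \neq j$.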
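Under this definition, any two points of a cluster are joined by a chain of steps shorter than δ and each cluster is maximal, so the δ-clusters coincide with the connected components of the graph that links every pair of points at distance less than δ. The sketch below computes that partition with a union-find structure in O(n²) distance evaluations. The function name `delta_clusters` and the tuple representation are assumptions for illustration, not taken from the slides.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def delta_clusters(points: Sequence[Point], delta: float) -> List[List[int]]:
    """Partition point indices into the connected components of the graph
    that joins every pair of points at distance strictly less than delta."""
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Every close pair merges the components of its two endpoints.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < delta:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(delta_clusters(pts, 1.5))  # [[0, 1, 2], [3, 4]]
```

Because the test is a strict `<`, points exactly δ apart end up in different clusters, matching $d(\cdot,\cdot) < \delta$ in the definition.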
5 Example of δ-Clusters
6 δ-Clustering Problem of Order k
- Let S be a set of n d-dimensional points and δ a real number.
- We want to partition S into l clusters $C_1, C_2, \ldots, C_l$ with $\bigcup_i C_i = S$ such that, for each $C_i$, there exists a path of subsets $G_1, G_2, \ldots, G_m \subseteq C_i$ with $\bigcup_{j=1}^{m} G_j = C_i$ and $G_j \cap G_{j+1} \neq \emptyset$, so that $\forall x \in G_j$ there exist $k-1$ distinct $y \in G_j$ with $d(x, y) < \delta$, where $k > 1$,
- and $\forall x \in C_i$, $\forall y \in C_j$ with $i \neq j$, $d(x, y) \ge \delta$.
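Finding such a decomposition is the hard part; the sketch below only verifies a proposed one against the conditions listed above. It assumes points are indexed tuples, that the subsets $G_j$ are supplied by the caller as index sets, and that "k − 1 distinct y" excludes x itself. The names `group_ok`, `is_order_k_cluster` and `separated` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]

def group_ok(group: Set[int], points: Sequence[Point], k: int, delta: float) -> bool:
    """Each x in G_j has at least k-1 other members y of G_j with d(x, y) < delta."""
    return all(
        sum(1 for y in group if y != x and math.dist(points[x], points[y]) < delta) >= k - 1
        for x in group
    )

def is_order_k_cluster(cluster: Set[int], groups: List[Set[int]],
                       points: Sequence[Point], k: int, delta: float) -> bool:
    """Check a proposed path of subsets G_1, ..., G_m for one cluster C_i."""
    if k <= 1 or not groups:
        return False
    covers = set().union(*groups) == cluster  # the G_j together make up C_i
    # Consecutive subsets must overlap: G_j ∩ G_{j+1} is non-empty.
    chained = all(groups[j] & groups[j + 1] for j in range(len(groups) - 1))
    return covers and chained and all(group_ok(g, points, k, delta) for g in groups)

def separated(clusters: List[Set[int]], points: Sequence[Point], delta: float) -> bool:
    """Points in different clusters are at distance >= delta."""
    return all(math.dist(points[x], points[y]) >= delta
               for a, b in combinations(clusters, 2) for x in a for y in b)

# Hypothetical data: two triangles sharing point 1, plus a far-away point 5.
points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8), (1.5, 0.8), (2.0, 0.0), (10.0, 10.0)]
groups = [{0, 1, 2}, {1, 3, 4}]
print(is_order_k_cluster({0, 1, 2, 3, 4}, groups, points, k=3, delta=1.5))  # True
print(separated([{0, 1, 2, 3, 4}, {5}], points, delta=1.5))                 # True
```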
7 Example of δ-Clusters of Order 3
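8 Representative Points
- Let $S = \bigcup_i C_i$ be a set of d-dimensional points. Given disjoint clusters $C_1, C_2, \ldots, C_l$ and sets $R_1, R_2, \ldots, R_l$ with $R_i \subseteq C_i$, we say that $R_i$ represents $C_i$ iff
- $\forall x \in C_i,\ \exists r_i \in R_i,\ \forall r_j \in R_j\ (j \neq i):\ d(r_i, x) < d(r_j, x)$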
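The quantifiers above can be checked mechanically: for each $x \in C_i$, the closest point of $R_i$ must be strictly closer than the closest representative of any other cluster. A minimal sketch, assuming clusters and representative sets are lists of point tuples; `represents` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]

def represents(clusters: List[List[Point]], reps: List[List[Point]]) -> bool:
    """True iff, for every i, R_i is a subset of C_i and each x in C_i has some
    r_i in R_i strictly closer to x than every r_j in every R_j with j != i."""
    for i, cluster in enumerate(clusters):
        if not all(r in cluster for r in reps[i]):
            return False
        rivals = [r for j, r_j in enumerate(reps) if j != i for r in r_j]
        for x in cluster:
            # Closest representative of any other cluster (inf if there is none).
            nearest_rival = min((math.dist(r, x) for r in rivals), default=math.inf)
            if not any(math.dist(r_i, x) < nearest_rival for r_i in reps[i]):
                return False
    return True

C = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
print(represents(C, [[(0.0, 0.0)], [(11.0, 0.0)]]))  # True
print(represents(C, [[(0.0, 0.0)], []]))             # False: C_2 has no representative
```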
9 Example of Representative Points
10 No Absolute Performance Guarantee
- Theorem 1. If $P \neq NP$, no polynomial-time approximation algorithm A can solve δ-REP on every instance I with $A(I) - OPT(I) \le r$, for any fixed r and any optimum solution OPT(I), where $OPT(I), A(I) \in \mathbb{N}$ and $r \in \mathbb{N}$.
11 MST-Based Approximation Algorithm (I)
- Modified Depth-First Search(v, R)
- 1. If ((Pred(v) ∉ R) AND (v is not the root))
- 2.   R ← R ∪ {v}
- 3. Mark v as visited
- 4. For all nodes i adjacent to v and not yet visited
- 5.   Modified Depth-First Search(i, R)
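A direct Python transcription of the procedure, assuming the tree is an adjacency dictionary and that Pred(v) is v's parent in the DFS tree, stored here in a `pred` map that the slide leaves implicit. Since a node is admitted only when its parent is not already in R, the procedure ends up selecting the nodes at odd depth below the root. All names are hypothetical.

```python
from typing import Dict, List, Optional, Set

def modified_dfs(v: int, adj: Dict[int, List[int]], R: Set[int], root: int,
                 pred: Dict[int, Optional[int]], visited: Set[int]) -> None:
    """Recursive modified DFS; R, pred and visited are updated in place."""
    # Steps 1-2: add v unless it is the root or its predecessor is already in R.
    if v != root and pred[v] not in R:
        R.add(v)
    visited.add(v)                                      # step 3
    for i in adj[v]:                                    # step 4
        if i not in visited:
            pred[i] = v                                 # record Pred(i) in the DFS tree
            modified_dfs(i, adj, R, root, pred, visited)  # step 5

def run_modified_dfs(adj: Dict[int, List[int]], root: int) -> Set[int]:
    R: Set[int] = set()
    modified_dfs(root, adj, R, root, {root: None}, set())
    return R

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(run_modified_dfs(path, root=0)))  # [1, 3]
```

For very deep trees, an explicit stack avoids Python's default recursion limit; the decision for each node only needs its parent to have been processed first.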
12 MST-Based Approximation Algorithm (II)
- INPUT: l δ-clusters (of order k) C_1, C_2, ..., C_l
- OUTPUT: A set R = ∪ R_i of representative points
- MST-Based Approximation Algorithm
- 1. Let R_i ← ∅ for all i = 1, ..., l
- 2. For each C_i
- 3.   Compute the distances of all pairs of points in C_i
- 4.   Find a minimum-cost spanning tree MST_i of C_i, where points and distances become nodes and edges, respectively
13 MST-Based Approximation Algorithm (III)
- 5. For each MST_i
- 6.   Let v be a leaf node of MST_i and the root of the DFS
- 7.   Modified Depth-First Search(v, R_i)
- 8.   Replace every leaf node x ∈ R_i with Pred(x)
- 9. Output R = ∪ R_i
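Steps 1 to 9 can be sketched end to end as follows. Assumptions not fixed by the slides: points are tuples of floats; step 4 uses Prim's O(n²) algorithm on the complete distance graph; the DFS root is the first leaf found; the modified DFS is written iteratively (each node is still decided after its parent); and a single-point cluster, which the slides do not address, keeps its only point as representative. Function names are hypothetical, and representatives are returned as indices into each cluster.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]

def mst_adjacency(pts: Sequence[Point]) -> Dict[int, List[int]]:
    """Steps 3-4: Prim's algorithm on the complete distance graph, O(n^2)."""
    n = len(pts)
    adj: Dict[int, List[int]] = {i: [] for i in range(n)}
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight from the tree to each node
    link = [-1] * n         # tree endpoint of that cheapest edge
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            adj[u].append(link[u])
            adj[link[u]].append(u)
        for w in range(n):
            if not in_tree[w]:
                dw = math.dist(pts[u], pts[w])
                if dw < best[w]:
                    best[w], link[w] = dw, u
    return adj

def mst_representatives(pts: Sequence[Point]) -> Set[int]:
    """Steps 5-8 for one cluster; returns indices into pts."""
    if len(pts) <= 1:
        return set(range(len(pts)))  # assumption: a singleton represents itself
    adj = mst_adjacency(pts)
    root = next(v for v in adj if len(adj[v]) == 1)  # step 6: a leaf as DFS root
    R: Set[int] = set()
    pred: Dict[int, Optional[int]] = {root: None}
    stack = [root]
    while stack:                                     # step 7, iterative form
        v = stack.pop()
        if v != root and pred[v] not in R:
            R.add(v)
        for w in adj[v]:
            if w != pred[v]:
                pred[w] = v
                stack.append(w)
    leaves = {x for x in R if len(adj[x]) == 1}      # step 8
    return (R - leaves) | {pred[x] for x in leaves}

def mst_based(clusters: List[Sequence[Point]]) -> List[Set[int]]:
    """Steps 1, 2, 5 and 9: one representative index set R_i per cluster C_i."""
    return [mst_representatives(c) for c in clusters]

line = [(float(i), 0.0) for i in range(5)]
print(mst_based([line, [(0.0, 0.0), (1.0, 0.0)], [(100.0, 0.0)]]))  # [{1, 3}, {0}, {0}]
```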
14 Algorithm Correctness
- Proposition 1. The MST-based approximation algorithm produces a set R = ∪ R_i of representative points of size in the worst case, where n is the number of input points.
15 Single Cluster Representation Problem
- Given a set $S = \bigcup_i C_i$ of n points in d-dimensional space, where each $C_i$ is a δ-cluster, and a cluster number h, the single cluster representation problem is to find a representative set $R_h \subseteq C_h$ such that
- $\forall y \in C_h\ \exists r \in R_h\ \forall x \in S - C_h:\ d(r, y) < d(x, y)$
16 Heuristic Representative Algorithm (I)
- INPUT: δ-clusters C_i and h, where 1 ≤ h ≤ l
- OUTPUT: Representative set R_h
- R_h ← ∅
- While (C_h ≠ ∅)
-   P_max ← ∅
-   For all r ∈ C_h
-     P ← ∅
17 Heuristic Representative Algorithm (II)
-     For all y ∈ C_h
-       If (CheckCond(r, y, S − C_h))
-         P ← P ∪ {y}
-     If (|P| > |P_max|)
-       P_max ← P
-       r_max ← r
-   C_h ← C_h − P_max, S ← S − P_max, R_h ← R_h ∪ {r_max}
- Return R_h
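A minimal Python sketch of the heuristic, assuming CheckCond means "r is strictly closer to y than every point outside C_h" and using 0-based cluster indices. Since $P_{max} \subseteq C_h$, the update S ← S − P_max leaves S − C_h unchanged, so the sketch computes S − C_h once. The guard against an empty P_max is an addition: it only triggers if a point of C_h coincides with a point of another cluster, in which case the loop could not make progress. Names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

def check_cond(r: Point, y: Point, others: Sequence[Point]) -> bool:
    # Assumed meaning of CheckCond: r is strictly closer to y than any x in S - C_h.
    d_ry = math.dist(r, y)
    return all(d_ry < math.dist(x, y) for x in others)

def heuristic_representatives(clusters: List[List[Point]], h: int) -> List[Point]:
    """Greedy covering of C_h; h is a 0-based cluster index here."""
    others = [x for j, c in enumerate(clusters) if j != h for x in c]  # S - C_h
    remaining = list(clusters[h])                                    # uncovered part of C_h
    R_h: List[Point] = []
    while remaining:                                                 # While (C_h != empty)
        p_max: List[Point] = []
        r_max: Optional[Point] = None
        for r in remaining:
            P = [y for y in remaining if check_cond(r, y, others)]
            if len(P) > len(p_max):                                  # keep the first largest P
                p_max, r_max = P, r
        if r_max is None:
            raise ValueError("no candidate covers any point; are points shared across clusters?")
        remaining = [y for y in remaining if y not in p_max]         # C_h <- C_h - P_max
        R_h.append(r_max)                                            # R_h <- R_h + {r_max}
    return R_h

clusters = [[(0.0, 0.0), (4.0, 0.0)], [(2.5, 0.0)]]
print(heuristic_representatives(clusters, 0))  # [(0.0, 0.0), (4.0, 0.0)]
```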
18 Algorithm Correctness
- Theorem 2. The heuristic representative algorithm finds a representative set R_h ⊆ C_h for the single cluster representation problem according to Definition 5.
19 Greedy Approximation Algorithm
- INPUT: δ-clusters C_1, C_2, ..., C_l, S = ∪ C_i
- OUTPUT: A set T of representative sets R_i
- T ← ∅
- For h ← 1 to l
-   Heuristic Representative Algorithm(C_i, h, R_h)
-   T ← T ∪ {R_h}
- Return T
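The outer loop simply runs the single-cluster heuristic once per cluster and collects the results. The self-contained sketch below folds a compact version of that heuristic into the loop: it uses `max` over the uncovered points instead of the explicit P_max bookkeeping, which gives the same tie-breaking because `max` returns the first largest candidate, just as the strict `>` test keeps the first largest P. CheckCond is again assumed to mean "strictly closer to y than every point outside C_h", cluster indices are 0-based, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def covers(r: Point, y: Point, others: Sequence[Point]) -> bool:
    # Assumed CheckCond: r is strictly closer to y than any point outside C_h.
    return all(math.dist(r, y) < math.dist(x, y) for x in others)

def greedy_representatives(clusters: List[List[Point]]) -> List[List[Point]]:
    """Run the heuristic once per cluster h and collect T = [R_1, ..., R_l]."""
    T: List[List[Point]] = []
    for h, C_h in enumerate(clusters):
        others = [x for j, c in enumerate(clusters) if j != h for x in c]  # S - C_h
        remaining: List[Point] = list(C_h)
        R_h: List[Point] = []
        while remaining:
            # Candidate covering the most uncovered points; first one wins ties.
            r_max = max(remaining,
                        key=lambda r: sum(covers(r, y, others) for y in remaining))
            covered = [y for y in remaining if covers(r_max, y, others)]
            if not covered:
                raise ValueError("no candidate covers any point")
            remaining = [y for y in remaining if y not in covered]
            R_h.append(r_max)
        T.append(R_h)
    return T

clusters = [[(0.0, 0.0), (4.0, 0.0)], [(2.5, 0.0)]]
print(greedy_representatives(clusters))  # [[(0.0, 0.0), (4.0, 0.0)], [(2.5, 0.0)]]
```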
20 Algorithm Correctness
- Theorem 3. This algorithm finds a set of representatives R_i for all δ-clusters C_i, 1 ≤ i ≤ l, according to Definition 4.
21 Performance Guarantee
- Theorem 4. Let A_greedy denote the greedy approximation algorithm and R_greedy its performance ratio. Then, for all input instances I,
- $R_{greedy}(I) \le$
- where C_max is the largest δ-cluster and k is the order of the δ-clusters.
22 Tight Instance
23 Open Problems
- Is the MST-based approximation algorithm in NC?
- Is the greedy-based approximation scheme the best one? What if we change the analysis of the lower bound, or the lower bound itself?
- Does a constant relative performance guarantee exist for this problem?
- Is the greedy approximation problem P-complete?
24 Questions and Answers