GAD: General Activity Detection for Fast Clustering on Large Data

1
GAD: General Activity Detection for Fast
Clustering on Large Data
  • Xin Jin, Sangkyum Kim, Jiawei Han, Liangliang Cao
    and Zhijun Yin
  • SDM'09

2
Outline
  • Introduction
  • GAD
  • GAD for very large clustering
  • Experimental
  • Conclusion
  • My thoughts

3
Introduction
  • The paper focuses on developing fast core
    clustering algorithms.
  • Contribution
  • It exploits activity detection for fast
    clustering in different scenarios
  • It can run much faster than k-means

4
GAD
  • Notations
  • NC(i, p, j): pattern p's jth nearest center
  • D-NC(i, p, j): distance from pattern p to its
    jth nearest center
  • Dist(i, p, Cj): distance between pattern p and
    center Cj
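
The notation above can be made concrete with a small Python sketch. The helper names `dist` and `nearest_centers` are hypothetical, Euclidean distance is an assumption (the slides do not name the metric), and the index i is taken to be the iteration, so it is implicit in whichever list of centers is passed in.

```python
import math

def dist(p, c):
    """Dist(i, p, Cj): distance between pattern p and center c (Euclidean assumed)."""
    return math.dist(p, c)

def nearest_centers(p, centers, m):
    """Return [(center_index, distance), ...] for the m centers nearest to p.

    Entry j-1 of the result holds NC(i, p, j) and D-NC(i, p, j)
    for the iteration whose centers were passed in.
    """
    ranked = sorted((dist(p, c), j) for j, c in enumerate(centers))
    return [(j, d) for d, j in ranked[:m]]
```

For a pattern at the origin and centers at (0, 0), (3, 4) and (10, 0), `nearest_centers(p, centers, 2)` lists center 0 first and center 1 second.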

5
Definition and Concepts
  • The GAD framework function
  • GAD(S, A, m, B)
  • S: search methods; A: activity states; m: the
    number of nearest centers; B: boundary
  • Search methods
  • Full search - find a pattern's m nearest centers
  • Whole full search - perform full search for all
    the patterns
  • Partial search - search from active centers
  • m-search - search from a pattern's previous m
    nearest centers
  • 0-search - a special case of m-search
  • m-boundary
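
One way to compare the search methods is by the set of centers whose distance to a pattern has to be recomputed. The sketch below is an illustration under assumptions: the function name `candidate_centers` and the string labels are hypothetical, and partial search is taken to keep the pattern's previous nearest centers alongside the active ones.

```python
def candidate_centers(method, k, prev_nearest=(), active=()):
    """Return the indices of centers that a search method has to examine.

    k            -- total number of centers
    prev_nearest -- indices of the pattern's previous m nearest centers
    active       -- indices of centers that moved in the last iteration
    """
    if method == "full":
        return set(range(k))                      # every center
    if method == "partial":
        return set(prev_nearest) | set(active)    # previous nearest + moved centers
    if method == "m":
        return set(prev_nearest)                  # previous m nearest only
    if method == "0":
        return set()                              # nothing recomputed, result is copied
    raise ValueError(f"unknown search method: {method}")
```

The fewer centers a method returns, the fewer distance computations the pattern costs in that iteration.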

6
GAD algorithm
  • General algorithm
  • Step 1. initialization
  • Step 2. search method decision
  • Step 3. update pattern p's nearest centers
    according to step 2
  • Step 4. get next pattern
  • Step 5. assign each pattern to its nearest center
  • Step 6. repeat from step 2 until all the centers
    have converged
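
The six steps can be sketched as a k-means loop that tracks which centers moved. This is a minimal illustration, not the authors' implementation: the name `gad_kmeans`, the tolerance `tol`, the restriction to m = 1, and the simple rule "full search if the previous center moved, otherwise search only the previous center plus the active centers" are all assumptions made for the example.

```python
import math

def gad_kmeans(patterns, centers, max_iter=100, tol=1e-12):
    centers = [tuple(c) for c in centers]
    k = len(centers)
    active = set(range(k))                    # Step 1: all centers start active
    assign = [None] * len(patterns)
    for _ in range(max_iter):
        for n, p in enumerate(patterns):      # Step 4: visit each pattern in turn
            prev = assign[n]
            # Step 2: decide the search method for this pattern
            if prev is None or prev in active:
                candidates = range(k)          # full search
            else:
                candidates = active | {prev}   # partial search
            # Steps 3 and 5: update and assign the nearest center
            assign[n] = min(candidates,
                            key=lambda j: (math.dist(p, centers[j]), j))
        # Recompute each center as the mean of its patterns
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(patterns, assign) if a == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster keeps its center
        # Activity detection: a center is active if it moved
        active = {j for j in range(k)
                  if math.dist(centers[j], new_centers[j]) > tol}
        centers = new_centers
        if not active:                        # Step 6: stop once every center is static
            break
    return centers, assign
```

The saving comes from the partial-search branch: as more centers become static, more patterns skip most of their distance computations.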

7
Exact GAD algorithm
  • Search method decision
  • If pattern p's previous nearest center Cprev1 is
    static at this iteration, perform partial search
  • If NC(i, p, 1) becomes active, calculate
    Dist(i+1, p, Cprev1)
  • Dist(i+1, p, Cprev1) < D-NC(i, p, 1): perform
    partial search
  • Dist(i+1, p, Cprev1) > D-NC(i, p, 1) but
    Dist(i+1, p, Cprev1) is within the boundary:
    perform partial search
  • Dist(i+1, p, Cprev1) > D-NC(i, p, 1) and
    beyond the boundary: perform full search
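
The decision rule can be written as a small function. This is a sketch of one reading of the slide: the comparison against the boundary is not legible in the transcript, so "within the boundary" is assumed here to mean `new_dist <= boundary`. The function and parameter names are hypothetical.

```python
def choose_search(center_is_static, new_dist, prev_nearest_dist, boundary):
    """Pick the search method for one pattern in Exact GAD (assumed reading).

    center_is_static  -- True if Cprev1 did not move in this iteration
    new_dist          -- Dist(i+1, p, Cprev1)
    prev_nearest_dist -- D-NC(i, p, 1)
    boundary          -- the pattern's current boundary
    """
    if center_is_static:
        return "partial"
    if new_dist < prev_nearest_dist:   # the center moved closer to the pattern
        return "partial"
    if new_dist <= boundary:           # assumption: still within the boundary
        return "partial"
    return "full"                      # moved away and beyond the boundary
```

Only the last case pays for a search over all centers.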

8
  • Update pattern p's nearest centers
  • If full search is decided
  • Search from all centers to find the m nearest
    centers
  • Update the boundary as D-NC(i+1, p, m)
  • If partial search is decided
  • Check the pattern's previous m nearest centers and
    keep the static centers among them as candidates
    for the current m nearest centers
  • Find the current m nearest centers from the
    candidates and the active centers
  • If D-NC(i+1, p, m) is smaller than the boundary,
    update the boundary
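
The update step for both search methods can be sketched as follows. The function name `update_nearest` and its parameters are hypothetical, Euclidean distance is assumed, and `centers` is taken to hold the center positions at iteration i+1.

```python
import math

def update_nearest(p, centers, prev_nearest, active, m, boundary, full):
    """Return (new m nearest center indices, new boundary) for pattern p."""
    if full:
        candidates = range(len(centers))          # search all centers
    else:
        # keep the static centers among the previous m nearest ...
        static_prev = [j for j in prev_nearest if j not in active]
        # ... and add every active center as a candidate
        candidates = set(static_prev) | set(active)
    ranked = sorted((math.dist(p, centers[j]), j) for j in candidates)[:m]
    nearest = [j for _, j in ranked]
    d_m = ranked[-1][0]                           # D-NC(i+1, p, m)
    if full or d_m < boundary:
        boundary = d_m                            # full: reset; partial: only shrink
    return nearest, boundary
```

After a full search the boundary is reset to the distance of the mth nearest center, while a partial search can only tighten it.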

9
Exact GAD algorithm
11
Full Search
13
NS-AGAD (Naïve State Approximate GAD)
  • Approximate GAD
  • It further speeds up E-GAD (Exact GAD)
  • NS-AGAD
  • Assumes that if a center is static at a certain
    iteration, it will remain static at all future
    iterations
  • Algorithm
  • If p's previous nearest center Cprev1 is static
    at this iteration, perform 0-search
  • If 0-search is decided, simply copy the previous
    m nearest centers as the new m nearest centers
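
A minimal sketch of the NS-AGAD rule follows. The names are hypothetical, and the slide does not say what happens when Cprev1 is not static, so that case is delegated to a caller-supplied `fallback_search`, which is an assumption.

```python
def update_static(static, moved_this_iteration, k):
    """NS-AGAD assumption: once a center has been static, it stays in the static set."""
    return static | (set(range(k)) - set(moved_this_iteration))

def ns_agad_nearest(prev_nearest, static, fallback_search):
    """0-search: reuse the previous m nearest centers without computing any distance."""
    if prev_nearest[0] in static:
        return list(prev_nearest)
    return fallback_search()           # assumed: some other search when Cprev1 moved
```

Because the static set never shrinks, the share of patterns handled by 0-search can only grow as the iterations proceed, which is where the approximation trades accuracy for speed.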

14
S-AGAD (Static AGAD)
  • If a pattern's previous nearest center is static
  • The area near the pattern is relatively stable
  • The new nearest center will likely come from the
    pattern's previous m nearest centers
  • It avoids searching the other centers
  • Algorithm
  • If p's previous nearest center Cprev1 is static,
    perform m-search at this iteration
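
The m-search used by S-AGAD can be sketched as a re-ranking of the previous m nearest centers. The function names are hypothetical, Euclidean distance is assumed, and the `fallback_search` used when Cprev1 is not static is an assumption, since the slide only covers the static case.

```python
import math

def m_search(p, centers, prev_nearest):
    """Re-rank only the previous m nearest centers by their current distance to p."""
    return sorted(prev_nearest, key=lambda j: (math.dist(p, centers[j]), j))

def s_agad_nearest(p, centers, prev_nearest, static, fallback_search):
    if prev_nearest[0] in static:      # Cprev1 is static: the neighborhood is stable
        return m_search(p, centers, prev_nearest)
    return fallback_search()           # assumed fallback when Cprev1 moved
```

Unlike 0-search, m-search still recomputes m distances, so it can notice when the second nearest center has overtaken the first.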

15
I-AGAD (Inward AGAD)
  • Applies when a pattern's previous nearest center
    is static, or becomes active but moves inward
    toward the pattern
  • Algorithm
  • If p's previous nearest center Cprev1 is static,
    perform m-search at this iteration
  • If center Cprev1 is active, calculate
    Dist(i+1, p, Cprev1)
  • If Dist(i+1, p, Cprev1) < Dist(i, p, Cprev1),
    perform m-search
  • If m-search is decided, search within the previous
    m nearest centers and update their order
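
The inward test can be sketched as a single predicate. The function name and the `tol` parameter for deciding whether a center moved are hypothetical, and Euclidean distance is assumed.

```python
import math

def i_agad_uses_m_search(p, old_center, new_center, tol=0.0):
    """True if I-AGAD would choose m-search for pattern p.

    old_center -- position of Cprev1 at iteration i
    new_center -- position of Cprev1 at iteration i+1
    """
    if math.dist(old_center, new_center) <= tol:
        return True                                   # Cprev1 is static
    # Cprev1 is active: m-search only if it moved inward toward the pattern
    return math.dist(p, new_center) < math.dist(p, old_center)
```

For a pattern at the origin, a center moving from (2, 0) to (1, 0) passes the test, while one moving to (3, 0) does not.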

16
WB-AGAD (Within-Boundary AGAD)
  • It further relaxes the constraint to the
    boundary
  • If a pattern's previous nearest center, whether
    static or active, is still within the pattern's
    boundary at the next iteration
  • The new nearest center will likely be among the
    previous m nearest centers
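
The slide does not spell out the exact test, so the sketch below is only one plausible reading: m-search is chosen whenever the previous nearest center's new position lies no farther from the pattern than the boundary. The function name is hypothetical and Euclidean distance is assumed.

```python
import math

def wb_agad_uses_m_search(p, new_center, boundary):
    """True if Cprev1's position at iteration i+1 is still within p's boundary."""
    return math.dist(p, new_center) <= boundary
```

Under this reading the test ignores whether the center moved inward or outward, which makes it looser than the I-AGAD condition.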

17
GAD for Very Large Clusters
  • H-GAD
  • Hierarchical GAD
  • KD-GAD
  • Kd-tree GAD
  • Builds two kd-trees
  • Full kd-tree
  • Active kd-tree

18
Experimental Evaluation
22
Conclusion
  • The paper proposes a General Activity Detection
    framework for fast clustering.
  • It is several times faster than k-means, and the
    best speedup can be as high as 10 times.

23
My thoughts
  • This paper provides a new core clustering
    algorithm, but can it be used on streaming data?
  • Does the GAD algorithm give the same result for
    different initial centers?