Single Pass Clustering

Provided by: jsy2

Title: Single Pass Clustering


1
Single Pass Clustering
  • Jieh-Shan George Yeh

2
Example
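  • Consider a set of five documents and five terms,
    with the term vectors (one column per term)
      T1 = <1, 3, 3, 2, 2>
      T2 = <2, 1, 0, 1, 2>
      T3 = <0, 2, 0, 0, 1>
      T4 = <0, 3, 0, 3, 5>
      T5 = <1, 0, 1, 0, 1>
  • Cluster the terms using the single pass method
    (using the term vector columns).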

3
Preliminary
  • Order the term vectors (column vectors).
  • Choose a similarity function, say, the dot product.
  • Set a pre-specified similarity threshold, here 10.
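
The two choices on this slide can be sketched in Python as follows. This is a minimal illustration, not code from the presentation; the names `THRESHOLD` and `dot` are assumptions made here for readability.

```python
# Similarity threshold from the slide.
THRESHOLD = 10


def dot(u, v):
    """Dot product of two equal-length vectors, used as the similarity."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))
```

An item joins an existing cluster only when its similarity to that cluster's centroid exceeds `THRESHOLD`; otherwise it starts a new cluster.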

4
Step 1
  • Start with T1 in a cluster by itself, say C1.
  • At this point, C1 contains only one item, T1.
  • The centroid of C1 is simply the vector for T1:
    C1 = <1, 3, 3, 2, 2>.

5
Step 2
  • Now compare the next item (T2) to the centroids of
    all existing clusters (i.e., measure the
    similarities).
  • SIM(T2, C1) = <2, 1, 0, 1, 2> · <1, 3, 3, 2, 2>
    = 2·1 + 1·3 + 0·3 + 1·2 + 2·2 = 11 > 10
  • Add T2 to cluster C1.
  • Compute the new centroid for C1 (which now
    contains T1 and T2).
  • The centroid (which is the average vector for T1
    and T2) is
  • C1 = <3/2, 4/2, 3/2, 3/2, 4/2>
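
The arithmetic in this step can be checked with a short Python sketch. The helper names `dot` and `centroid` are illustrative assumptions, and `fractions.Fraction` is used only to keep the averages exact.

```python
from fractions import Fraction

T1 = [1, 3, 3, 2, 2]
T2 = [2, 1, 0, 1, 2]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def centroid(vectors):
    """Component-wise average of a list of equal-length integer vectors."""
    n = len(vectors)
    return [Fraction(sum(column), n) for column in zip(*vectors)]


# At this point the centroid of C1 is just T1.
sim = dot(T2, T1)
c1 = centroid([T1, T2])

print(sim)                   # 11
print([str(x) for x in c1])  # ['3/2', '2', '3/2', '3/2', '2']
```

The printed centroid is the slide's <3/2, 4/2, 3/2, 3/2, 4/2> with the fractions reduced.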

6
Step 3
  • Now, there is only one cluster, C1, so we only
    need to compare T3 with the centroid of C1.
  • SIM(T3, C1) = 0 + 8/2 + 0 + 0 + 4/2 = 6 < 10
  • Therefore, we use T3 to start a new cluster, C2.
  • Now we have two clusters:
  • C1 = {T1, T2}
  • C2 = {T3}

7
Step 4
  • Move to the next unclustered item, T4.
  • SIM(T4, C1) = <0, 3, 0, 3, 5> · <3/2, 4/2, 3/2, 3/2, 4/2>
    = 0 + 12/2 + 0 + 9/2 + 20/2 = 20.5 > 10
  • SIM(T4, C2) = <0, 3, 0, 3, 5> · <0, 2, 0, 0, 1>
    = 0 + 6 + 0 + 0 + 5 = 11 > 10
  • SIM(T4, C1) > SIM(T4, C2)
  • T4 will be added to cluster C1. Now we have the
    following: C1 = {T1, T2, T4}, C2 = {T3}
  • The new centroid for C1 is now
  • C1 = <3/3, 7/3, 3/3, 6/3, 9/3>
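
When an item clears the threshold for more than one cluster, as T4 does here, it goes to the most similar one. The sketch below reproduces that choice and updates the centroid incrementally from the old centroid and the cluster size. The incremental update is an assumption made for illustration (the slides simply re-average the members); both give the same result. The names `add_to_centroid`, `sims`, and `best` are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def add_to_centroid(centroid, size, item):
    """Centroid of a cluster of `size` members after `item` is added."""
    return [(c * size + x) / (size + 1) for c, x in zip(centroid, item)]


T4 = [0, 3, 0, 3, 5]
centroids = {
    "C1": [Fraction(3, 2), Fraction(2), Fraction(3, 2), Fraction(3, 2), Fraction(2)],
    "C2": [Fraction(0), Fraction(2), Fraction(0), Fraction(0), Fraction(1)],
}

# Similarity of T4 to every existing centroid.
sims = {name: dot(T4, c) for name, c in centroids.items()}
# The most similar cluster wins.
best = max(sims, key=sims.get)
# C1 has two members (T1, T2) before T4 joins.
new_c1 = add_to_centroid(centroids["C1"], 2, T4)

print(best)                      # C1
print([str(x) for x in new_c1])  # ['1', '7/3', '1', '2', '3']
```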

8
Step 5
  • The only item left unclustered is T5.
  • SIM(T5, C1) = <1, 0, 1, 0, 1> · <3/3, 7/3, 3/3, 6/3, 9/3>
    = 3/3 + 0 + 3/3 + 0 + 9/3 = 5 < 10
  • SIM(T5, C2) = <1, 0, 1, 0, 1> · <0, 2, 0, 0, 1>
    = 0 + 0 + 0 + 0 + 1 = 1 < 10
  • T5 will have to go into a new cluster C3.

9
Final
  • The final clusters are:
  • C1 = {T1, T2, T4}
  • C2 = {T3}
  • C3 = {T5}
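
Steps 1 through 5 can be put together in one loop. The following is a minimal sketch under stated assumptions, not the presenter's code: the function name `single_pass` is hypothetical, vectors are assumed to hold integers so that `Fraction` can keep centroids exact, and an item joins a cluster only when its similarity is strictly greater than the threshold (the slides do not say what happens on a tie with the threshold).

```python
from fractions import Fraction


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def single_pass(items, threshold, sim=dot):
    """Cluster `items` (dict: name -> integer vector) in insertion order.

    Returns a list of clusters, each a list of item names.
    """
    clusters = []  # member names per cluster
    sums = []      # component-wise sums per cluster, used for centroids
    for name, vec in items.items():
        best, best_sim = None, None
        for i, total in enumerate(sums):
            centroid = [Fraction(s, len(clusters[i])) for s in total]
            s = sim(vec, centroid)
            if best_sim is None or s > best_sim:
                best, best_sim = i, s
        if best is not None and best_sim > threshold:
            # Join the most similar cluster and update its running sum.
            clusters[best].append(name)
            sums[best] = [a + b for a, b in zip(sums[best], vec)]
        else:
            # No cluster is similar enough: start a new one.
            clusters.append([name])
            sums.append(list(vec))
    return clusters


terms = {
    "T1": [1, 3, 3, 2, 2],
    "T2": [2, 1, 0, 1, 2],
    "T3": [0, 2, 0, 0, 1],
    "T4": [0, 3, 0, 3, 5],
    "T5": [1, 0, 1, 0, 1],
}

print(single_pass(terms, 10))  # [['T1', 'T2', 'T4'], ['T3'], ['T5']]
```

Each item is read once and compared only with the centroids that exist at that moment, which is what makes the method a single pass.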

10
Note
  • The results for this method are highly dependent
    on the similarity threshold.
  • The results are also highly dependent on the
    order of the vectors.
  • The computational complexity is O(nk), where n is
    the number of vectors and k is the number of
    clusters.
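
Both dependencies can be seen on the example data by rerunning the method with a different order or threshold. The sketch below is an illustration under assumptions: `single_pass` and `TERMS` are hypothetical names, the reversed order and the threshold of 5 are arbitrary choices made here, and an item joins a cluster only when its similarity is strictly greater than the threshold.

```python
from fractions import Fraction

TERMS = {
    "T1": [1, 3, 3, 2, 2],
    "T2": [2, 1, 0, 1, 2],
    "T3": [0, 2, 0, 0, 1],
    "T4": [0, 3, 0, 3, 5],
    "T5": [1, 0, 1, 0, 1],
}


def single_pass(order, threshold):
    """Single pass clustering of TERMS, visiting names in `order`."""
    clusters, sums = [], []
    for name in order:
        vec = TERMS[name]
        # Dot product of vec with each current centroid (sum / size).
        sims = [
            sum(Fraction(s, len(members)) * x for s, x in zip(total, vec))
            for members, total in zip(clusters, sums)
        ]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] > threshold:
            clusters[best].append(name)
            sums[best] = [s + x for s, x in zip(sums[best], vec)]
        else:
            clusters.append([name])
            sums.append(list(vec))
    return clusters


names = list(TERMS)
print(single_pass(names, 10))        # [['T1', 'T2', 'T4'], ['T3'], ['T5']]
print(single_pass(names[::-1], 10))  # [['T5'], ['T4', 'T3', 'T1'], ['T2']]
print(single_pass(names, 5))         # [['T1', 'T2', 'T3', 'T4'], ['T5']]
```

Reversing the order moves T3 into the cluster with T1 and T4 and leaves T2 on its own, while lowering the threshold to 5 merges T3 into C1. In both runs each item is compared with at most k centroids, consistent with the O(nk) bound.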