Title: SCAN: A Structural Clustering Algorithm for Networks
Joint work with Nurcan Yuruk (UALR) and Thomas A. J. Schweiger (Acxiom)
2. Network Clustering Problem
- Networks made up of the mutual relationships of data elements usually have an underlying structure. Because the relationships are complex, this structure is difficult to discover. How can it be made clear?
- Stated another way: given only information about who associates with whom, could one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells)?
3. An Example Network
- How many clusters?
- What size should they be?
- What is the best partitioning?
- Should some points be segregated?
4. A Social Network Model
- Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group.
- Individuals who are hubs know many people in different groups but belong to no single group. Politicians, for example, bridge multiple groups.
- Individuals who are outliers reside at the margins of society. Hermits, for example, know few people and belong to no group.
5. The Neighborhood of a Vertex
Define Γ(v) as the immediate neighborhood of a vertex v (i.e., the set of people that an individual knows), taken to include v itself:
$\Gamma(v) = \{w \in V \mid (v, w) \in E\} \cup \{v\}$
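A minimal Python sketch of Γ(v), assuming the network arrives as an undirected edge list. The function name `neighborhoods` and the toy graph are hypothetical. Following the SCAN convention, each vertex is counted as a member of its own neighborhood; isolated vertices, which appear in no edge, are simply absent from the result.

```python
from collections import defaultdict


def neighborhoods(edges):
    """Map each vertex v to Gamma(v) = {w : (v, w) in E} | {v}.

    Hypothetical helper for illustration; edges is an iterable of (v, w) pairs.
    """
    gamma = defaultdict(set)
    for v, w in edges:
        # Both endpoints go into both neighborhoods, so each vertex
        # ends up in its own Gamma as well as its neighbor's.
        gamma[v].update((v, w))
        gamma[w].update((v, w))
    return dict(gamma)


# Hypothetical toy network: a triangle (0, 1, 2) plus a pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
gamma = neighborhoods(edges)
print(sorted(gamma[2]))  # [0, 1, 2, 3]
print(sorted(gamma[3]))  # [2, 3]
```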
6. Structural Similarity
- The desired features tend to be captured by a measure we call structural similarity:
$\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$
- Structural similarity is large for members of a clique and small for hubs and outliers.
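The measure can be sketched in a few lines of Python, assuming SCAN's definition σ(v, w) = |Γ(v) ∩ Γ(w)| / sqrt(|Γ(v)| |Γ(w)|) with v counted in its own neighborhood. The helper names and the two-triangle example are hypothetical; the example shows the contrast the slide describes, with a clique edge scoring high and a bridging edge scoring low.

```python
from collections import defaultdict
from math import sqrt


def neighborhoods(edges):
    """Closed neighborhoods: Gamma(v) holds v and every vertex adjacent to v."""
    gamma = defaultdict(set)
    for v, w in edges:
        gamma[v].update((v, w))
        gamma[w].update((v, w))
    return dict(gamma)


def structural_similarity(gamma, v, w):
    """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|)."""
    common = len(gamma[v] & gamma[w])
    return common / sqrt(len(gamma[v]) * len(gamma[w]))


# Hypothetical network: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
gamma = neighborhoods(edges)
print(round(structural_similarity(gamma, 0, 1), 2))  # 1.0 (inside a clique)
print(round(structural_similarity(gamma, 2, 3), 2))  # 0.5 (bridging edge)
```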
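7. Structural Connectivity [1]
- ε-Neighborhood: $N_\varepsilon(v) = \{w \in \Gamma(v) \mid \sigma(v, w) \ge \varepsilon\}$
- Core: $\mathrm{CORE}_{\varepsilon,\mu}(v) \Leftrightarrow |N_\varepsilon(v)| \ge \mu$
- Direct structure reachable: $\mathrm{DirREACH}_{\varepsilon,\mu}(v, w) \Leftrightarrow \mathrm{CORE}_{\varepsilon,\mu}(v) \wedge w \in N_\varepsilon(v)$
- Structure reachable: $\mathrm{REACH}_{\varepsilon,\mu}$, the transitive closure of direct structure reachability
- Structure connected: $\mathrm{CONNECT}_{\varepsilon,\mu}(v, w) \Leftrightarrow \exists u \in V: \mathrm{REACH}_{\varepsilon,\mu}(u, v) \wedge \mathrm{REACH}_{\varepsilon,\mu}(u, w)$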
[1] M. Ester, H. P. Kriegel, J. Sander, X. Xu (KDD'96)
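The connectivity notions can be sketched directly as Python predicates, assuming SCAN's definitions: N_ε(v) keeps the members of Γ(v) (including v itself) with σ ≥ ε, v is a core when |N_ε(v)| ≥ μ, DirREACH(v, w) holds when v is a core and w ∈ N_ε(v), and REACH is the transitive closure of DirREACH. All function names are hypothetical; `reach` computes the closure with a plain depth-first search.

```python
from collections import defaultdict
from math import sqrt


def neighborhoods(edges):
    gamma = defaultdict(set)
    for v, w in edges:
        gamma[v].update((v, w))
        gamma[w].update((v, w))
    return dict(gamma)


def sigma(gamma, v, w):
    return len(gamma[v] & gamma[w]) / sqrt(len(gamma[v]) * len(gamma[w]))


def eps_neighborhood(gamma, v, eps):
    """N_eps(v): members of Gamma(v), including v, with sigma(v, w) >= eps."""
    return {w for w in gamma[v] if sigma(gamma, v, w) >= eps}


def is_core(gamma, v, eps, mu):
    """CORE(v): at least mu vertices in the eps-neighborhood of v."""
    return len(eps_neighborhood(gamma, v, eps)) >= mu


def dir_reach(gamma, v, w, eps, mu):
    """DirREACH(v, w): v is a core and w lies in N_eps(v)."""
    return is_core(gamma, v, eps, mu) and w in eps_neighborhood(gamma, v, eps)


def reach(gamma, v, eps, mu):
    """All w with REACH(v, w): follow DirREACH edges transitively from v."""
    if not is_core(gamma, v, eps, mu):
        return set()  # a non-core vertex directly reaches nothing
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        if is_core(gamma, u, eps, mu):
            for w in eps_neighborhood(gamma, u, eps):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen


# Hypothetical network: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
gamma = neighborhoods(edges)
eps, mu = 0.7, 2
print(sorted(eps_neighborhood(gamma, 2, eps)))  # [0, 1, 2]
print(sorted(reach(gamma, 0, eps, mu)))         # [0, 1, 2]
```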
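8. Structure-Connected Clusters
- Structure-connected cluster C:
  - Connectivity: $\forall v, w \in C: \mathrm{CONNECT}_{\varepsilon,\mu}(v, w)$
  - Maximality: $\forall v, w \in V: v \in C \wedge \mathrm{REACH}_{\varepsilon,\mu}(v, w) \Rightarrow w \in C$
- Hubs:
  - Do not belong to any cluster
  - Bridge two or more clusters
- Outliers:
  - Do not belong to any cluster
  - Connect to at most one cluster
[Figure: example network with a hub and an outlier labeled]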
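Once clusters are known, separating hubs from outliers only requires counting how many distinct clusters a vertex's neighbors fall into. The sketch below assumes the cluster assignment is already given as a dict (the labels "A" and "B" and the example graph are hypothetical). An unclustered vertex whose neighbors span two or more clusters is treated as a hub; any other unclustered vertex is treated as an outlier.

```python
from collections import defaultdict


def neighborhoods(edges):
    gamma = defaultdict(set)
    for v, w in edges:
        gamma[v].update((v, w))
        gamma[w].update((v, w))
    return dict(gamma)


def classify_unclustered(gamma, label):
    """Split vertices that have no cluster label into hubs and outliers.

    label maps clustered vertices to a cluster id; unclustered vertices are absent.
    """
    hubs, outliers = set(), set()
    for v, nbrs in gamma.items():
        if v in label:
            continue
        touched = {label[w] for w in nbrs if w in label}
        (hubs if len(touched) >= 2 else outliers).add(v)
    return hubs, outliers


# Hypothetical network: triangles {0,1,2} and {4,5,6}; vertex 3 links them,
# vertex 7 hangs off vertex 4 only.
edges = [(0, 1), (1, 2), (0, 2), (4, 5), (5, 6), (4, 6), (2, 3), (3, 4), (4, 7)]
label = {0: "A", 1: "A", 2: "A", 4: "B", 5: "B", 6: "B"}
hubs, outliers = classify_unclustered(neighborhoods(edges), label)
print(hubs, outliers)  # {3} {7}
```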
9-21. Algorithm
[Figure sequence: step-by-step run of the algorithm on an example network with vertices 0-13, using μ = 2 and ε = 0.7; successive slides annotate edges with their computed structural similarities (for example 0.63, 0.67, 0.82, 0.75, 0.73, 0.51 and 0.68).]
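The walk-through can be approximated by the following Python sketch, which follows the outline of SCAN under stated assumptions: every unvisited vertex is tested as a core; a core seeds a new cluster that grows through direct structure reachability; non-cores are set aside and later split into hubs and outliers. Visiting vertices in sorted order and giving a border vertex to the first cluster that reaches it are choices made for this sketch, not details taken from the slides. The function name `scan` and the example graph are hypothetical.

```python
from collections import defaultdict, deque
from math import sqrt


def scan(edges, eps=0.7, mu=2):
    """Sketch of SCAN-style clustering; returns (cluster, hubs, outliers)."""
    gamma = defaultdict(set)
    for v, w in edges:
        gamma[v].update((v, w))
        gamma[w].update((v, w))

    def sigma(v, w):
        return len(gamma[v] & gamma[w]) / sqrt(len(gamma[v]) * len(gamma[w]))

    def eps_nbrs(v):
        return {w for w in gamma[v] if sigma(v, w) >= eps}

    def is_core(v):
        return len(eps_nbrs(v)) >= mu

    cluster = {}        # vertex -> cluster id
    non_member = set()  # visited vertices not (yet) placed in any cluster
    next_id = 0
    for v in sorted(gamma):  # sorted only so that runs are deterministic
        if v in cluster or v in non_member:
            continue
        if not is_core(v):
            non_member.add(v)
            continue
        # v is a core: grow a new cluster by following DirREACH from v.
        cid, next_id = next_id, next_id + 1
        cluster[v] = cid
        queue = deque([v])
        while queue:
            y = queue.popleft()
            if not is_core(y):
                continue  # border vertex: joins the cluster but does not expand it
            for x in eps_nbrs(y):
                if x not in cluster:
                    non_member.discard(x)
                    cluster[x] = cid
                    queue.append(x)

    # Unclustered vertices: hub if neighbors span >= 2 clusters, else outlier.
    hubs, outliers = set(), set()
    for v in non_member:
        touched = {cluster[w] for w in gamma[v] if w in cluster}
        (hubs if len(touched) >= 2 else outliers).add(v)
    return cluster, hubs, outliers


# Hypothetical network: triangles {0,1,2} and {4,5,6}, bridge vertex 3, pendant 7.
edges = [(0, 1), (1, 2), (0, 2), (4, 5), (5, 6), (4, 6), (2, 3), (3, 4), (4, 7)]
cluster, hubs, outliers = scan(edges, eps=0.7, mu=2)
print(cluster, hubs, outliers)
```

With μ = 2 and ε = 0.7 the two triangles come out as separate clusters, vertex 3 (σ ≈ 0.58 to vertex 2 and ≈ 0.52 to vertex 4) is left unclustered and bridges both clusters, so it is reported as a hub, and vertex 7 (σ ≈ 0.63 to vertex 4) is reported as an outlier.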
22. Running Time
- Running time is O(|E|), linear in the number of edges
- For sparse networks, where |E| = O(|V|), this is O(|V|)
[2] A. Clauset, M. E. J. Newman, C. Moore, Phys. Rev. E 70, 066111 (2004).
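The linear bound rests on evaluating the structural similarity once per edge. The sketch below (hypothetical helper `edge_similarities`) computes σ for every undirected edge in a single pass over the edge list, so the number of σ evaluations equals |E|. It counts evaluations only: each evaluation is a set intersection whose cost grows with vertex degree, so the overall linear bound assumes degrees stay small. On a ring, a sparse network with |E| = |V|, this gives exactly |V| evaluations.

```python
from collections import defaultdict
from math import sqrt


def edge_similarities(edges):
    """Compute sigma once for every undirected edge in one pass over E."""
    gamma = defaultdict(set)
    for v, w in edges:
        gamma[v].update((v, w))
        gamma[w].update((v, w))
    sims = {}
    for v, w in edges:
        key = (min(v, w), max(v, w))  # canonical key: each edge stored once
        if key not in sims:
            sims[key] = len(gamma[v] & gamma[w]) / sqrt(len(gamma[v]) * len(gamma[w]))
    return sims


# Hypothetical sparse example: a ring of n vertices has exactly n edges.
n = 1000
ring = [(i, (i + 1) % n) for i in range(n)]
sims = edge_similarities(ring)
print(len(sims))  # 1000
```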
23. Are you ready for some football?
- Given only the 2006 schedule of which schools each NCAA Division 1A team played, what underlying structures could one discover?
24. 789 Contests
- 119 Division 1A schools, which play:
  - schools in their own conference
  - schools in other 1A conferences
  - independent 1A schools (e.g., Army)
  - schools in sub-1A conferences (e.g., Maine)
25. Consider the Arkansas Schedule
- USC (Pacific 10)
- Utah State (Western Athletic)
- Vanderbilt (SEC)
- Alabama (SEC)
- Auburn (SEC)
- Southeast Missouri State (non-1A)
- Mississippi (SEC)
- Louisiana-Monroe (Sun Belt)
- South Carolina (SEC)
- Tennessee (SEC)
- Mississippi State (SEC)
- LSU (SEC)
- Florida (SEC)
- Wisconsin (Big 10)
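Turning a schedule into the network is a matter of adding one undirected edge per game. This Python sketch does that for the Arkansas opponents listed above; the helper `add_schedule` is hypothetical, and the conference labels are kept only to compare against afterwards, since the clustering input is the schedule alone. Storing neighbors in sets means a repeated meeting between two schools collapses into a single edge, which is a choice of this sketch.

```python
from collections import defaultdict

# Arkansas's 2006 opponents and conferences, as listed on the slide.
ARKANSAS_2006 = [
    ("USC", "Pacific 10"), ("Utah State", "Western Athletic"),
    ("Vanderbilt", "SEC"), ("Alabama", "SEC"), ("Auburn", "SEC"),
    ("Southeast Missouri State", "non-1A"), ("Mississippi", "SEC"),
    ("Louisiana-Monroe", "Sun Belt"), ("South Carolina", "SEC"),
    ("Tennessee", "SEC"), ("Mississippi State", "SEC"), ("LSU", "SEC"),
    ("Florida", "SEC"), ("Wisconsin", "Big 10"),
]


def add_schedule(adj, team, opponents):
    """Add one undirected edge per game; repeat meetings collapse into one edge."""
    for opponent in opponents:
        adj[team].add(opponent)
        adj[opponent].add(team)


adj = defaultdict(set)
add_schedule(adj, "Arkansas", [name for name, _ in ARKANSAS_2006])
# Conference labels are used only for comparison, never as clustering input.
sec_games = sum(1 for _, conf in ARKANSAS_2006 if conf == "SEC")
print(len(adj["Arkansas"]), sec_games)  # 14 9
```

Nine of the fourteen opponents are SEC schools, which is the kind of dense intra-conference structure a clustering of the full schedule network can pick up.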
26. The Network
27. The 1A Conferences
28. Result of Our Algorithm
29. Result of the Fast Modularity Algorithm [2]
30. Conclusion
- We propose a novel network clustering algorithm.
- It is fast: O(|E|), or O(|V|) for scale-free networks.
- It can find clusters, as well as hubs and outliers.
- For more information:
  - See you at the poster session this evening at poster board 4
  - Email: xwxu_at_ualr.edu
  - URL: http://ifsc.ualr.edu/xwxu
- Thank you!