Title: A Framework For Community Identification in Dynamic Social Networks
1A Framework For Community Identification in
Dynamic Social Networks
- Chayant Tantipathananandh
- Tanya Berger-Wolf
- David Kempe
Presented by Victor Lee
2Outline of Presentation
- The Challenge: Dynamic Social Networks
- Framework and Problem Formulation
- Individual and Group Colorings
- Group Coloring Heuristics
- Experimental Results
- Future Directions
3The Problem
- Many well-known approaches to identify communities in social networks
- Graph Partitioning
- Clustering
- Various measures of closeness or density
- But these approaches generally assume static networks
- Most social networks are dynamic
4Dynamic Social Networks
- Social Networks change over time
- Membership changes
- Interaction changes
- Most community identification techniques
- Use a single snapshot
- Or use time-averaged measurements
- Lose important information
5Importance of Dynamic Information
(Figure: Network 1 and Network 2, each showing the interactions among individuals A, B, and C over time steps T1 to T6.)
- Networks 1 and 2 have the same average characteristics, but
- Network 1 shows an oscillation
- Network 2 suggests that C joins the community
6Proposal
- New framework for modeling social networks over time
- Algorithms and heuristics to identify dynamic communities
- Experiments to verify the concept and the computational performance
7Problem Formation
- Given
- A set of individuals
- A sequence of snapshot observations
- Find
- A best-fit set of time-varying communities C(t)
- Best-fit time-varying community membership for each individual
- Approach
- Combinatorial optimization
- Graph coloring
8Model Individuals and Groups
- Set of individuals X = {i1, i2, ..., in}
- Sequence of observations <P1, P2, ..., PT>
- Discrete time
- Record interaction between individuals
- The set of individuals interacting at time t defines a group.
- If A interacts with B, and B interacts with C, then A, B, and C belong to the same group
(Figure: individuals A, B, and C forming one group.)
9Group vs Community
- Snapshot Graph
- Individual is a vertex
- Interaction is an edge
- Group is a connected subgraph
- Assumption: interaction is sufficiently limited so that the graph is not connected (we have disjoint groups)
- Group ≠ Community
- Groups capture observed interaction at a point in time
- Communities extend over time
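The snapshot-graph definition above (a group is a connected subgraph of one time slice) can be sketched in a few lines. This is only an illustration: the function name `find_groups`, the input format (a list of individuals plus a list of interacting pairs), and the choice to treat an individual with no interactions as a singleton group are all assumptions, not part of the slides.

```python
from collections import defaultdict

def find_groups(individuals, interactions):
    """Return the groups (connected components) of one snapshot graph."""
    # Individuals are vertices; each observed interaction is an undirected edge.
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for start in individuals:
        if start in seen:
            continue
        # Depth-first search collects everyone reachable from `start`.
        component, stack = set(), [start]
        while stack:
            person = stack.pop()
            if person in component:
                continue
            component.add(person)
            stack.extend(adjacency[person] - component)
        seen |= component
        groups.append(frozenset(component))
    return groups

# A interacts with B and B with C, so A, B, C form one group; D and E form another.
snapshot = find_groups(["A", "B", "C", "D", "E"],
                       [("A", "B"), ("B", "C"), ("D", "E")])
```

Running this once per time slice would give the sequence of disjoint groups that the rest of the framework colors.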
10Graphing the Observations
- Each time slice is one observation
- Edges within a time slice show observed interaction at time t
- Add edges joining all observations of the same individual
- No edges between groups from one time to another
(Figure legend: individual vertices and group vertices.)
11Refine the Problem
- A community appears as a sequence of groups, with at most one group per time slice.
- Tasks
- Assign each group to a community (color the group vertices)
- Assign each individual to a community, for each time step (color individual vertices)
- More Assumptions
- Individuals belong to one community at a time
- Individuals don't change community frequently
- Individuals frequently appear in their community
12Cost Model
- Quantify a good community identification
- Assign costs to undesirable behavior
- I-cost α when an individual changes color.
- G-costs
- β1 when an individual is absent from its community.
- β2 when an individual is present in a different community.
- C-cost γ for each color that an individual uses
- Find a coloring with minimum cost
13Coloring Choices and Costs
At time T3, C temporarily changes its interaction.
(Figure: Coloring 1 and Coloring 2 of individuals A, B, C, and D over time steps T1 to T4.)
- Coloring 1: C changes community and then changes back.
- Cost: 2α (+ γ if this color hasn't been used before)
- Coloring 2: C stays in its original community and just visits.
- Cost: β1 + β2
- Optimal coloring depends on the comparison (β1 + β2) < (2α + γ) or (β1 + β2) < (2α)
14Finding Optimal Colorings
- Finding the optimal solution is NP-hard
- Partition the problem
- Find an optimal set of communities
- Find optimal assignment of individuals to communities
- If Phase 1 (Group Coloring) is completed first
- Phase 2 is reduced from O(2^N) to O(2^G), where N = # of individuals and G = # of groups
- The cost incurred by one individual's coloring is independent of the colors chosen by others.
15Independence of Individual Color Choice
- Proof
- Cost of an individual's behavior = A (I-cost) + B (G-cost) + C (C-cost)
- Costs are assessed individually
- I-cost = α × (# of color changes)
- G-cost = β1 × (# of absences from its group) + β2 × (# of visits to other groups)
- C-cost = γ × (# of colors that an individual uses)
- So, we can solve for each individual one at a time.
- Moreover, we can assess cost incrementally, from time t to time t+1
16Individual Coloring Algorithm
- C = set of all colors observed to be used by an individual i
- F(t) = {S ⊆ C : 1 ≤ |S| ≤ t}: all possible subsets of colors up to time t
- G(t, x) = G-cost to use color x at time t
- I(t, x, y) = I-cost to use color x at time t-1 and color y at time t
- C(x, R) = C-cost to use color x when color set R has been used
- G(t, S, x) = min. cost at time t, using color x, with color set S used (the three-argument G is the minimum-cost table; the two-argument G is the G-cost)
- At time 1: G(1, {x}, x) = G(1, x)
- At time t: G(t, S, x) = G(t, x) + min { G(t-1, R, y) + I(t, y, x) + C(x, R) } over all R and y, where R ∈ F(t-1), y ∈ R, and R ∪ {x} = S
i-cost: changing color; g-cost: wrong group; c-cost: new color
17Optimal Individual Coloring
- Given a group coloring, the minimum cost of coloring the individual i is min { G(T, S, x) : S ∈ F(T), x ∈ S }
- Time complexity is O(n T C² 2^C)
- Space requirement is O(C 2^C)
- If the number of colors C (at most the number of groups) is not large, the computation is tractable.
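The dynamic program for one individual can be sketched as follows. It keeps, for each pair (set of colors used so far, current color), the cheapest cost and the coloring that achieves it. The function name, the input format, and the parameter names `alpha`, `beta1`, `beta2`, `gamma` are assumptions made for illustration. One deliberate difference from the base case on the slide: this sketch also charges the C-cost for the first color, which shifts every coloring's total by the same constant and so does not change which coloring is optimal.

```python
def optimal_individual_coloring(group_colors, alpha, beta1, beta2, gamma):
    """Return (min_cost, coloring) for one individual.

    group_colors[t] is the color of the group the individual was observed in
    at step t, or None if it was not observed (assumed convention).
    """
    palette = sorted({g for g in group_colors if g is not None})
    if not palette:
        raise ValueError("the individual was never observed in a colored group")

    def g_cost(t, x):
        # beta1: absent from own community; beta2: present in a different one.
        g = group_colors[t]
        cost = 0
        if g != x:
            cost += beta1
        if g is not None and g != x:
            cost += beta2
        return cost

    # table[(S, x)] = (cost, coloring) with color set S used and color x now.
    table = {(frozenset([x]), x): (g_cost(0, x) + gamma, (x,)) for x in palette}
    for t in range(1, len(group_colors)):
        next_table = {}
        for (used, y), (cost, path) in table.items():
            for x in palette:
                step = g_cost(t, x)
                if x != y:
                    step += alpha      # I-cost: color change from y to x
                if x not in used:
                    step += gamma      # C-cost: x is a new color
                key = (used | {x}, x)
                candidate = (cost + step, path + (x,))
                if key not in next_table or candidate < next_table[key]:
                    next_table[key] = candidate
        table = next_table
    return min(table.values())

# Cheap color changes, expensive G-costs: following the observed groups wins.
best = optimal_individual_coloring(["red", "red", "blue", "red"], 1, 5, 5, 1)
```

The table has at most C·2^C entries per time step, which is where the exponential dependence on the number of colors comes from.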
18Optimal Group Coloring
- Determine the best mapping of groups at time t to groups at time t+1
- Groups that are mapped across time are part of the same community and have the same color
- A coloring is good if most individuals can retain their color from step to step.
19Bipartite Matching Heuristic
- Matching Graph
- For each pair of groups g, g' at times t, t' = t+1, add a weighted edge from v_{g,t} to v_{g',t'}
- Weight = |g ∩ g'| (similarity of g to g')
- Find the maximum weight bipartite matching
- Evaluation
- Weights i-cost more than g-cost
- Performs well if membership is fairly stable
- No long-range perspective
- More efficient heuristics?
i-cost: changing color; g-cost: wrong group; c-cost: new color
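For two consecutive time slices, the matching step can be illustrated with a brute-force search. This is a sketch only: it tries every assignment, which is practical just for a handful of groups, whereas a real implementation would use a polynomial-time assignment algorithm such as the Hungarian method. The function name `best_matching` and the list-of-sets input format are assumptions; the edge weight is the overlap size between the two groups.

```python
from itertools import permutations

def best_matching(groups_t, groups_next):
    """Maximum-weight matching between the groups of two consecutive slices.

    Returns (total_weight, pairs), where each pair is
    (index in groups_t, index in groups_next) and has non-empty overlap.
    """
    # Enumerate assignments from the smaller side into the larger side.
    swapped = len(groups_t) > len(groups_next)
    small, large = (groups_next, groups_t) if swapped else (groups_t, groups_next)

    best_weight, best_pairs = -1, []
    for assignment in permutations(range(len(large)), len(small)):
        weight = sum(len(small[i] & large[j]) for i, j in enumerate(assignment))
        if weight > best_weight:
            best_weight = weight
            best_pairs = [(i, j) for i, j in enumerate(assignment)
                          if small[i] & large[j]]
    if swapped:
        best_pairs = [(j, i) for i, j in best_pairs]
    return best_weight, sorted(best_pairs)

# C moves from the first group to the second between the two slices.
slice_t = [{"A", "B", "C"}, {"D", "E"}]
slice_next = [{"C", "D", "E"}, {"A", "B"}]
matching = best_matching(slice_t, slice_next)
```

Matched groups would then share a color, and any unmatched group would start a new one.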
20Greedy Heuristics for Group Coloring
- Approach: Maximize pairwise similarity between groups, for all pairs of groups over all timesteps
- Jaccard's index: Jac(g, g') = |g ∩ g'| / |g ∪ g'| (overlap between g and g', scaled to the size of g and g')
- Weighted for temporal proximity: JacΔ(g, g') = Jac(g, g') / |t - t'|
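The two similarity measures can be sketched directly on Python sets. The function names are hypothetical, and the time-weighted variant assumes that the Jaccard index is divided by the temporal distance between the two groups, which is one natural reading of "weighted for temporal proximity".

```python
def jaccard(g, h):
    """Jaccard index: size of the overlap divided by the size of the union."""
    union = g | h
    return len(g & h) / len(union) if union else 0.0

def jaccard_delta(g, t, h, t_other):
    """Jaccard index scaled down by temporal distance (assumed form)."""
    if t == t_other:
        raise ValueError("groups must come from different time steps")
    return jaccard(g, h) / abs(t - t_other)

same_size_overlap = jaccard({"A", "B", "C"}, {"B", "C", "D"})
two_steps_apart = jaccard_delta({"A", "B", "C"}, 1, {"B", "C", "D"}, 3)
```

Here the two groups share 2 of 4 distinct members, and the score is halved again because the groups are two time steps apart.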
21Greedy Heuristics for Group Coloring
- Greedy Heuristic 1 (time is not a factor)
- Construct a square similarity matrix of size # groups
- Use agglomerative clustering
- Greedy Heuristic 2 (look backwards in time): For t = 1 to T do
- Match most similar pairs g, g' for any time t' < t
- If similarity = 0 or all colors have been used, add a new color
- Greedy Heuristic 3 (look back the shortest interval)
- Like Heuristic 2, but use only t', where t' is the closest value to t such that there exists a pair with similarity(g, g') > 0
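One possible reading of Greedy Heuristic 2 is sketched below; the slides leave several details open, so the specifics here are assumptions. The sketch uses the plain Jaccard index as the similarity, considers candidate pairs in order of decreasing similarity, and lets each color be used by at most one group per time slice (since a community has at most one group per slice). A group with no usable earlier match gets a new color. The function name and the list-of-lists-of-sets input are hypothetical.

```python
def greedy_backward_coloring(slices):
    """Color groups slice by slice, reusing the color of the most similar earlier group.

    slices[t] is the list of groups (sets of individuals) observed at step t.
    Returns a parallel structure of integer color ids.
    """
    colors = []
    next_color = 0
    for t, groups in enumerate(slices):
        # Collect (similarity, current group index, earlier group's color).
        candidates = []
        for i, g in enumerate(groups):
            for t_prev in range(t):
                for j, h in enumerate(slices[t_prev]):
                    union = g | h
                    similarity = len(g & h) / len(union) if union else 0.0
                    if similarity > 0:
                        candidates.append((similarity, i, colors[t_prev][j]))
        # Most similar pairs first; ties keep their discovery order.
        candidates.sort(key=lambda c: -c[0])

        assigned = [None] * len(groups)
        used = set()
        for _, i, color in candidates:
            if assigned[i] is None and color not in used:
                assigned[i] = color
                used.add(color)
        # No match with positive similarity, or every matching color taken.
        for i in range(len(groups)):
            if assigned[i] is None:
                assigned[i] = next_color
                next_color += 1
        colors.append(assigned)
    return colors

# C visits the {A, B} group at the middle step and then returns.
visit = greedy_backward_coloring([
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A", "B"}, {"C", "D"}],
])
```

Restricting the inner loop to the nearest earlier step with a positive similarity would give a variant in the spirit of Heuristic 3.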
22Experiment 1: Verify the Framework
- Does the framework capture the intuitive concept of dynamic community?
- Procedure
- Construct small, synthetic datasets
- Use exhaustive search to get a truly optimal coloring
23Experiment 1A: Assembly Line
- At each time step, 1 member leaves and 1 enters a group, resulting in a complete membership change in 3 steps.
- Results change as costs change. (A) favors stable membership. (B) allows for more fluid membership.
24Experiment 1B: Dutiful Children
- 2, 3, and 4 are Children. 0 and 1 are Parents that visit a different child each timestep.
- Results: Framework succeeds at detecting the individual children as well as the visitation pattern.
25Experiment 2: Quality of Heuristic Results
- Do the heuristics obtain colorings similar to those of an exhaustive search?
- Procedure
- Re-test the synthetic datasets using the various heuristics
- Results: At least one heuristic method obtains the same coloring and total cost as exhaustive search
26Experiment 3: Real World Datasets
- Do the framework and heuristics together obtain expected results using real-world datasets?
27Experiment 3A: Southern Women
- Eighteen women in 1933 in Natchez, Mississippi
- Tracks their attendance at 14 social events
28Experiment 3A: Prior Results
- Twenty-one analyses (1941 to 2001) all show similar results
- Two clear communities
- The membership of individuals 8, 9, and 16 is less certain.
29Experiment 3A: Results
- Detects 4 communities, which are subsets of the traditional 2 communities
- Individuals 6 and 10 change membership over time
- By adjusting cost factors, the results of most of the 21 prior analyses can be duplicated
30Experiment 3B: Grevy's Zebra
- 28-member zebra herd observed 44 times over 3 months in 2002
- The graph to the left shows the aggregate interaction.
- Temporal information is lost.
31Experiment 3B: Results
- Inferred communities agree with manual results obtained by biologists.
- 4 stable communities
- Some short-lived communities and some visiting
32Conclusions
- We present a framework for identifying communities in dynamic social networks
- The framework produces meaningful results compared to traditional methods
- Heuristic methods produce near-optimal solutions
- Future Directions
- Develop an approximation algorithm which guarantees the quality of the result
- Investigate scalability over network size and time
- Relax assumptions about interaction and dynamics