1
Finding dense components in weighted graphs
  • Paul Horn
  • 12-2-02

2
Overview
  • Addressing the problem
    • What is the problem
    • How it differs from other, already-solved problems
  • Building a solution
    • Already existing research
    • Preliminary work
    • Final solution

3
Overview, The Sequel
  • Analysis
    • Testing
    • Effectiveness
    • Time complexity
  • Future work
    • Trimming the data set further
    • Linking it with real data

4
The problem
  • To find dense subgraphs of a graph
  • Not just the densest one
  • Not necessarily all of them, but as many as
    possible of the subgraphs that are dense enough
  • The idea is to identify communities based on a
    communications network
  • The denser the communication within a subgraph,
    the more likely it is to be a community

5
Why is it hard?
  • The fastest flow-based methods for finding even
    the single densest subgraph are cubic or worse
  • We want more than one dense subgraph
  • The greedy approximation algorithm is destructive
    and thus returns only one subgraph
  • The problem becomes harder when we allow
    subgraphs to overlap

6
Weighty Ideas
  • Input graphs to the algorithm are weighted
  • The weight of an edge represents the intensity of
    a communication
  • Intensity reflects the duration and frequency of
    the communication
  • This requires a new definition of density

7
How dense can it get?
  • Recall our old definition of density (restated
    below)
  • We modify it to give a notion of density for a
    weighted graph
  • Note that if the weight of every edge is one, the
    two definitions agree
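The formulas on this slide did not survive the export. As a reconstruction, assuming the standard definitions used in the Goldberg and Charikar papers, for a vertex subset S with induced edge set E(S) and edge weights w(e):

```latex
% Unweighted density of a vertex subset S (Goldberg, Charikar):
d(S) = \frac{|E(S)|}{|S|}

% Weighted density, where w(e) is the weight (intensity) of edge e:
d_w(S) = \frac{\sum_{e \in E(S)} w(e)}{|S|}
```

When every w(e) equals one, the sum reduces to |E(S)| and the two definitions coincide, which is the point the last bullet makes.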

8
Done before?
  • Discussed in the Charikar paper presentation
  • Goldberg, A.V., Finding a Maximum Density
    Subgraph, gave a flow-based maximum-density-
    subgraph algorithm
  • Charikar, Greedy Approximation Algorithms for
    Finding Dense Components in a Graph, presented a
    linear-time approximation algorithm

9
Preliminary Work
  • An implementation of Goldberg's and Charikar's
    algorithms
  • On test data (generated with a dual-probability
    Erdős–Rényi model, sketched below), Charikar's
    algorithm identified a subgraph close to the
    actual densest subgraph
  • These test graphs, however, were unweighted, and
    thus ignored the weighting requirement; each also
    had only one dense subgraph
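A minimal sketch of how such test data might be generated, assuming the "dual-probability Erdős–Rényi model" means a sparse background graph plus one denser planted subgraph; the function name and parameter values are illustrative, not taken from the original work:

```python
import random
import networkx as nx

def planted_dense_graph(n=200, planted=20, p_out=0.02, p_in=0.5, seed=0):
    """Hypothetical dual-probability model: a background G(n, p_out)
    Erdos-Renyi graph plus a denser planted subgraph on `planted`
    vertices whose internal edges appear with probability p_in."""
    rng = random.Random(seed)
    G = nx.gnp_random_graph(n, p_out, seed=seed)   # sparse background edges
    core = rng.sample(range(n), planted)           # vertices of the planted part
    for i, u in enumerate(core):
        for v in core[i + 1:]:
            if rng.random() < p_in:
                G.add_edge(u, v)                   # dense planted edges
    nx.set_edge_attributes(G, 1.0, "weight")       # unweighted test data: all weights 1
    return G, set(core)
```

The planted vertex set is returned so the output of a density algorithm can be checked against it.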

10
A First Attempt
  • A modification of Charikar's algorithm for
    weighted graphs (see the sketch below)
  • At each step, remove a random edge of lowest
    weight
  • Then find all connected components
  • Recurse down on each component, and return the
    maximum-density subgraph
  • By repeating the algorithm, the hope is that
    different, possibly overlapping, dense components
    will be revealed
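A minimal Python sketch of this first attempt, assuming networkx and the weighted density of slide 7; the function names are illustrative and this is not the author's actual implementation:

```python
import random
import networkx as nx

def weighted_density(G):
    """Total edge weight divided by the number of vertices (slide 7)."""
    n = G.number_of_nodes()
    return G.size(weight="weight") / n if n else 0.0

def densest_by_peeling(G):
    """Repeatedly remove a random minimum-weight edge; whenever the graph
    splits, continue on each connected component separately, and return
    the densest subgraph seen along the way."""
    best = G.copy()
    pending = [G.copy()]                       # components still to be peeled
    while pending:
        H = pending.pop()
        if weighted_density(H) > weighted_density(best):
            best = H.copy()
        if H.number_of_edges() == 0:
            continue
        edges = list(H.edges(data="weight", default=1.0))
        min_w = min(w for _, _, w in edges)
        u, v, _ = random.choice([e for e in edges if e[2] == min_w])
        H.remove_edge(u, v)                    # drop a random lowest-weight edge
        pending.extend(H.subgraph(c).copy() for c in nx.connected_components(H))
    return best
```

Repeated runs with different random choices are what the slide relies on to reveal different, possibly overlapping, dense components.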

11
Seems Promising, but
  • In test cases generated similarly to those used to
    test Charikar's and Goldberg's algorithms, it
    successfully identified most, if not all, of the
    dense portions
  • In simulated communication-network data, the
    graph was dense enough that large regions of the
    graph were denser than the smaller target
    portions, so those portions were not found

12
Partitioning?
  • By partitioning optimally, that is, by finding a
    cut of minimum weight, we can increase the density
    of the resulting pieces (to some extent)
  • Since we cut edges of low weight, the edges of
    high weight remain inside each of the partitions
  • (Obviously) this doesn't work forever
  • However, knowing approximately what size we want,
    we can find ideal candidates (see the example
    below)
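A small made-up example of the effect: take a 4-vertex clique whose six edges have total weight 18, joined by a single weight-1 bridge to a 3-vertex path of total weight 2. Using the weighted density from slide 7:

```latex
% Whole graph: total weight 18 + 1 + 2 over 7 vertices
d_w(G) = \frac{18 + 1 + 2}{7} = 3

% After cutting the weight-1 bridge, the clique side alone gives
d_w(\text{clique side}) = \frac{18}{4} = 4.5
```

Cutting the lowest-weight edge raises the density of the dense side, which is the effect the partitioning step exploits; repeating the cut indefinitely eventually destroys the dense part, hence "doesn't work forever".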

13
Rethinking our algorithm
  • A partitioning-based algorithm idea
  • Uses Kernighan-Lin to find close-to-optimal
    partitions
  • Recurses down on the partitions until they are of
    the desired size
  • The densest of the remaining partitions are our
    output

14
Finalizing our thought
  • Run the algorithm on more than one partition
  • Random partitions are likely to be close to
    orthogonal
  • Generate k partitions, and take the best l
    partitions (after KL is applied) at the top level
  • At each lower level, generate k partitions, and
    take only the top one (a sketch of the whole
    scheme follows below)
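A minimal sketch of the scheme on the last two slides, assuming networkx's kernighan_lin_bisection as the KL step; the parameter names (k, top_l, target_size) and the ranking of bisections by cut weight are illustrative choices, not taken from the original implementation:

```python
import random
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def kl_dense_parts(G, target_size, k=10, top_l=3, top_level=True):
    """Recursively bisect G with Kernighan-Lin until the pieces are no
    larger than target_size.  At the top level keep the best top_l of k
    random bisections; at every lower level keep only the best one."""
    if G.number_of_nodes() <= target_size:
        return [G]
    # Generate k KL bisections from random starting partitions and rank
    # them by the total weight of the edges they cut (smaller is better).
    bisections = []
    for _ in range(k):
        a, b = kernighan_lin_bisection(G, weight="weight",
                                       seed=random.randrange(10**6))
        bisections.append((nx.cut_size(G, a, b, weight="weight"), a, b))
    bisections.sort(key=lambda t: t[0])
    keep = bisections[:top_l] if top_level else bisections[:1]
    parts = []
    for _, a, b in keep:
        for side in (a, b):
            parts.extend(kl_dense_parts(G.subgraph(side).copy(), target_size,
                                        k=k, top_l=top_l, top_level=False))
    return parts
```

The densest of the returned parts, by the weighted density of slide 7, would then be reported as the output.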

15
Analyzing the Situation
  • The 2-approximation bound that we had for KL is
    no longer necessarily valid
  • The algorithm has met with some success in
    identifying clusters in simulated data, but needs
    more tuning with respect to size and the trimming
    of the data set
  • By trimming out small partitions that are similar
    to ones already found, we reduce overlap (one
    possible criterion is sketched below)
  • It may now find too many graphs, or incorrect
    graphs, but this problem can be relieved by
    keeping only the small portions above a certain
    density (say, some percentage of the final
    density)
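The slide does not say what "similar" means; one plausible criterion is vertex-set overlap. A sketch using Jaccard similarity (my assumption) and the weighted_density helper from the earlier sketch:

```python
def trim_similar(parts, max_jaccard=0.5):
    """Keep the densest parts first, dropping any part whose vertex set
    has Jaccard similarity above max_jaccard with an already-kept part."""
    kept = []
    for part in sorted(parts, key=weighted_density, reverse=True):
        nodes = set(part.nodes())
        too_close = any(
            len(nodes & set(k.nodes())) / len(nodes | set(k.nodes())) > max_jaccard
            for k in kept
        )
        if not too_close:
            kept.append(part)
    return kept
```

The threshold max_jaccard would need tuning alongside the size parameter mentioned above.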

16
Time it
  • The original modification to Charikar runs in
    approximately O(VE) time
  • The new algorithm runs in approximately
    O(k·l·V²·log V) time
  • The k and l factors come from generating k
    partitions at each step and keeping the top l
  • The V² factor is a result of Kernighan-Lin
  • The log V factor comes from continuing to
    partition
  • In practice it runs very fast; partitioning graphs
    of 10000 vertices is possible in a reasonable
    amount of time

17
In the future
  • The algorithm still needs to better trim the
    partitions it finds, and specifically needs to
    find partitions of more variable size
  • It could perhaps trim based on the density of the
    entire graph, or perhaps based on a maximum-
    density subgraph (as found by the modified
    Charikar algorithm)
  • It already finds graphs of many sizes, but only
    considers the smallest at the end, so it could be
    modified to include more of the larger partitions

18
In the Future II
  • Future data will not be simulated, but will
    instead come from online sources
  • Running on a newsgroup-induced graph, for
    instance, can hopefully help identify groups
    interested in particular topics
  • Finding dense subgraphs in email data or in
    portions of the web graph could help identify
    groups of friends or topic-related sites as well,
    and thus help predict communities

19
So What?
  • By looking not just at a single graph, but at a
    series of time-based graphs, we can identify
    communities and how they change over time
  • Using this method we can hope to identify rules
    which govern the changes of these communities and
    make predictions about their future actions
  • The simulated data used was designed with this end
    in mind

20
Summing Up
  • Finding multiple dense subgraphs of a graph is a
    relatively unexplored topic, especially in large
    graphs, where exact algorithms are impractical
  • Prior work (such as Goldberg and Charikar)
    centered on finding a single densest subgraph

21
Summing down
  • The first algorithm, a modification of Charikar's,
    centered on removing edges and finding connected
    components
  • The second algorithm is based on the Kernighan-Lin
    algorithm for finding near-optimal partitions, and
    recurses down to find small subgraphs that are
    separated by cuts of small total weight

22
The Summing
  • There is still work to do
  • Linking it back to the real data
  • Internet data from newsgroups, email, etc.
  • Using that data to find communities over time
  • Finding microlaws that govern them, based on how
    the communities change over time
  • Finding better ways to trim the data to ensure
    that the best candidates are found