1
Finding dense components in weighted graphs
  • Paul Horn
  • 12-2-02

2
Overview
  • Addressing the problem
    • What is the problem
    • How it differs from other, already-solved problems
  • Building a solution
    • Already existing research
    • Preliminary work
    • Final solution

3
Overview, The Sequel
  • Analysis
    • Testing
    • Effectiveness
    • Time complexity
  • Future work
    • Trimming the data set further
    • Linking it with real data

4
The problem
  • To find dense subgraphs of a graph
  • Not just the densest one
  • Not necessarily all of them, but as many as
    possible of the subgraphs that are dense enough
  • The idea is to identify communities based on a
    communications network
  • The denser the communication within a subgraph,
    the more likely it is to be a community

5
Why is it hard?
  • The fastest flow-based methods for finding even
    the single densest subgraph are cubic or worse
  • We want more than one dense subgraph
  • The greedy approximation algorithm is destructive
    and thus returns only one subgraph
  • The problem becomes harder when we allow
    subgraphs to overlap

6
Weighty Ideas
  • Input graphs to the algorithm are weighted
  • The weight of an edge represents the intensity of
    a communication
  • Intensity reflects the duration and frequency of
    the communication
  • This requires a new definition of density

7
How dense can it get?
  • Recall our old definition of density (restated
    below)
  • We modify it to give a notion of density for a
    weighted graph
  • Note that if the weight of every edge is one, the
    two definitions agree
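The formulas on this slide did not survive the export. As a reconstruction, assuming the standard definitions used in the Goldberg and Charikar papers, for a vertex subset S with induced edge set E(S) and edge weights w(e):

```latex
% Unweighted density of a vertex subset S (Goldberg, Charikar):
d(S) = \frac{|E(S)|}{|S|}

% Weighted density, where w(e) is the weight (intensity) of edge e:
d_w(S) = \frac{\sum_{e \in E(S)} w(e)}{|S|}
```

When every w(e) equals one, the sum reduces to |E(S)| and the two definitions coincide, which is the point the last bullet makes.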

8
Done before?
  • Discussed in the Charikar paper presentation
  • Goldberg, A.V., Finding a Maximum Density
    Subgraph, gave a flow-based maximum-density-
    subgraph algorithm
  • Charikar, Greedy Approximation Algorithms for
    Finding Dense Components in a Graph, presented a
    linear-time approximation algorithm

9
Preliminary Work
  • An implementation of Goldberg's and Charikar's
    algorithms
  • On test data (generated with a dual-probability
    Erdős–Rényi model, sketched below), Charikar's
    algorithm identified a subgraph close to the
    actual densest subgraph
  • These test graphs, however, were unweighted, and
    thus ignored the weighting requirement; each also
    had only one dense subgraph
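A minimal sketch of how such test data might be generated, assuming the "dual-probability Erdős–Rényi model" means a sparse background graph plus one denser planted subgraph; the function name and parameter values are illustrative, not taken from the original work:

```python
import random
import networkx as nx

def planted_dense_graph(n=200, planted=20, p_out=0.02, p_in=0.5, seed=0):
    """Hypothetical dual-probability model: a background G(n, p_out)
    Erdos-Renyi graph plus a denser planted subgraph on `planted`
    vertices whose internal edges appear with probability p_in."""
    rng = random.Random(seed)
    G = nx.gnp_random_graph(n, p_out, seed=seed)   # sparse background edges
    core = rng.sample(range(n), planted)           # vertices of the planted part
    for i, u in enumerate(core):
        for v in core[i + 1:]:
            if rng.random() < p_in:
                G.add_edge(u, v)                   # dense planted edges
    nx.set_edge_attributes(G, 1.0, "weight")       # unweighted test data: all weights 1
    return G, set(core)
```

The planted vertex set is returned so the output of a density algorithm can be checked against it.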

10
A First Attempt
  • A modification of Charikar's algorithm for
    weighted graphs (see the sketch below)
  • At each step, remove a random edge of lowest
    weight
  • Then find all connected components
  • Recurse down on each component, and return the
    maximum-density subgraph
  • By repeating the algorithm, the hope is that
    different, possibly overlapping, dense components
    will be revealed
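A minimal Python sketch of this first attempt, assuming networkx and the weighted density of slide 7; the function names are illustrative and this is not the author's actual implementation:

```python
import random
import networkx as nx

def weighted_density(G):
    """Total edge weight divided by the number of vertices (slide 7)."""
    n = G.number_of_nodes()
    return G.size(weight="weight") / n if n else 0.0

def densest_by_peeling(G):
    """Repeatedly remove a random minimum-weight edge; whenever the graph
    splits, continue on each connected component separately, and return
    the densest subgraph seen along the way."""
    best = G.copy()
    pending = [G.copy()]                       # components still to be peeled
    while pending:
        H = pending.pop()
        if weighted_density(H) > weighted_density(best):
            best = H.copy()
        if H.number_of_edges() == 0:
            continue
        edges = list(H.edges(data="weight", default=1.0))
        min_w = min(w for _, _, w in edges)
        u, v, _ = random.choice([e for e in edges if e[2] == min_w])
        H.remove_edge(u, v)                    # drop a random lowest-weight edge
        pending.extend(H.subgraph(c).copy() for c in nx.connected_components(H))
    return best
```

Repeated runs with different random choices are what the slide relies on to reveal different, possibly overlapping, dense components.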

11
Seems Promising, but
  • In test cases generated similarly to those used to
    test Charikar's and Goldberg's algorithms, it
    successfully identified most, if not all, of the
    dense portions
  • In simulated communication-network data, the
    graph was dense enough that large regions of the
    graph were denser than the smaller target
    portions, so those portions were not found

12
Partitioning?
  • By partitioning optimally, that is, by finding a
    cut of minimum weight, we can increase the density
    of the resulting pieces (to some extent)
  • Since we cut edges of low weight, the edges of
    high weight remain inside each of the partitions
  • (Obviously) this doesn't work forever
  • However, knowing approximately what size we want,
    we can find ideal candidates (see the example
    below)
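A small made-up example of the effect: take a 4-vertex clique whose six edges have total weight 18, joined by a single weight-1 bridge to a 3-vertex path of total weight 2. Using the weighted density from slide 7:

```latex
% Whole graph: total weight 18 + 1 + 2 over 7 vertices
d_w(G) = \frac{18 + 1 + 2}{7} = 3

% After cutting the weight-1 bridge, the clique side alone gives
d_w(\text{clique side}) = \frac{18}{4} = 4.5
```

Cutting the lowest-weight edge raises the density of the dense side, which is the effect the partitioning step exploits; repeating the cut indefinitely eventually destroys the dense part, hence "doesn't work forever".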

13
Rethinking our algorithm
  • A partitioning-based algorithm idea
  • Uses Kernighan-Lin to find close-to-optimal
    partitions
  • Recurses down on the partitions until they are of
    the desired size
  • The densest of the remaining partitions are our
    output

14
Finalizing our thought
  • Run the algorithm on more than one partition
  • Random partitions are likely to be close to
    orthogonal
  • Generate k partitions, and take the best l
    partitions (after KL is applied) at the top level
  • At each lower level, generate k partitions, and
    take only the top one (a sketch of the whole
    scheme follows below)
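A minimal sketch of the scheme on the last two slides, assuming networkx's kernighan_lin_bisection as the KL step; the parameter names (k, top_l, target_size) and the ranking of bisections by cut weight are illustrative choices, not taken from the original implementation:

```python
import random
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def kl_dense_parts(G, target_size, k=10, top_l=3, top_level=True):
    """Recursively bisect G with Kernighan-Lin until the pieces are no
    larger than target_size.  At the top level keep the best top_l of k
    random bisections; at every lower level keep only the best one."""
    if G.number_of_nodes() <= target_size:
        return [G]
    # Generate k KL bisections from random starting partitions and rank
    # them by the total weight of the edges they cut (smaller is better).
    bisections = []
    for _ in range(k):
        a, b = kernighan_lin_bisection(G, weight="weight",
                                       seed=random.randrange(10**6))
        bisections.append((nx.cut_size(G, a, b, weight="weight"), a, b))
    bisections.sort(key=lambda t: t[0])
    keep = bisections[:top_l] if top_level else bisections[:1]
    parts = []
    for _, a, b in keep:
        for side in (a, b):
            parts.extend(kl_dense_parts(G.subgraph(side).copy(), target_size,
                                        k=k, top_l=top_l, top_level=False))
    return parts
```

The densest of the returned parts, by the weighted density of slide 7, would then be reported as the output.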

15
Analyzing the Situation
  • The 2-approximation bound that we had for KL is
    no longer necessarily valid
  • The algorithm has met with some success in
    identifying clusters in simulated data, but needs
    more tuning with respect to size and the trimming
    of the data set
  • By trimming out small partitions that are similar
    to ones already found, we reduce overlap (one
    possible criterion is sketched below)
  • It may now find too many graphs, or incorrect
    graphs, but this problem can be relieved by
    keeping only the small portions above a certain
    density (say, some percentage of the final
    density)
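The slide does not say what "similar" means; one plausible criterion is vertex-set overlap. A sketch using Jaccard similarity (my assumption) and the weighted_density helper from the earlier sketch:

```python
def trim_similar(parts, max_jaccard=0.5):
    """Keep the densest parts first, dropping any part whose vertex set
    has Jaccard similarity above max_jaccard with an already-kept part."""
    kept = []
    for part in sorted(parts, key=weighted_density, reverse=True):
        nodes = set(part.nodes())
        too_close = any(
            len(nodes & set(k.nodes())) / len(nodes | set(k.nodes())) > max_jaccard
            for k in kept
        )
        if not too_close:
            kept.append(part)
    return kept
```

The threshold max_jaccard would need tuning alongside the size parameter mentioned above.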

16
Time it
  • The original modification to Charikar runs in
    approximately O(VE) time
  • The new algorithm runs in approximately
    O(k·l·V²·log V) time
  • The k and l factors come from generating k
    partitions at each step and keeping the top l
  • The V² factor is a result of Kernighan-Lin
  • The log V factor comes from continuing to
    partition
  • In practice it runs very fast; partitioning graphs
    of 10000 vertices is possible in a reasonable
    amount of time

17
In the future
  • The algorithm still needs to better trim the
    partitions it finds, and specifically needs to
    find partitions of more variable size
  • It could perhaps trim based on the density of the
    entire graph, or perhaps based on a maximum-
    density subgraph (as found by the modified
    Charikar algorithm)
  • It already finds graphs of many sizes, but only
    considers the smallest at the end, so it could be
    modified to include more of the larger partitions

18
In the Future II
  • Future data will not be simulated, but will
    instead come from online sources
  • Running on a newsgroup-induced graph, for
    instance, can hopefully help identify groups
    interested in particular topics
  • Finding dense subgraphs in email data or in
    portions of the web graph could help identify
    groups of friends or topic-related sites as well,
    and thus help predict communities

19
So What?
  • By looking not just at a single graph, but at a
    series of time-based graphs, we can identify
    communities and how they change over time
  • Using this method we can hope to identify rules
    which govern the changes of these communities and
    make predictions about their future actions
  • The simulated data used was designed with this end
    in mind

20
Summing Up
  • Finding multiple dense subgraphs of a graph is a
    relatively unexplored topic, especially in large
    graphs, where exact algorithms are impractical
  • Prior work (such as Goldberg and Charikar)
    centered on finding a single densest subgraph

21
Summing down
  • The first algorithm, a modification of Charikar's,
    centered on removing edges and finding connected
    components
  • The second algorithm is based on the Kernighan-Lin
    algorithm for finding near-optimal partitions, and
    recurses down to find small subgraphs that are
    separated by cuts of small total weight

22
The Summing
  • There is still work to do
  • Linking it back to the real data
  • Internet data from newsgroups, email, etc.
  • Using that data to find communities over time
  • Finding microlaws that govern them, based on how
    the communities change over time
  • Finding better ways to trim the data to ensure
    that the best candidates are found