An Efficient Algorithm for Enumerating Pseudo Cliques - PowerPoint PPT Presentation

Title: An Efficient Algorithm for Enumerating Pseudo Cliques


1
An Efficient Algorithm for Enumerating Pseudo
Cliques
  • Takeaki Uno
  • National Institute of Informatics
  • The Graduate University for Advanced Studies

Dec/18/2007 ISAAC, Sendai
2
Introducing Pseudo Cliques
3
Analyzing Large-Scale Databases
  • With the rapid growth of database sizes, we have to
    analyze databases computationally
  • Finding cliques in similarity/relation graphs
    is a popular way to classify the data or to
    characterize it

Group of similar or related objects
Thanks to good properties such as monotonicity,
(maximal) cliques can be enumerated very quickly
(up to 1,000,000 per second).
Now we are motivated to find richer objects: dense
structures such as pseudo cliques
4
Finding Cliques in a Graph
  • Clique: a complete subgraph
  • (complete bipartite subgraph = bipartite
    clique)

Group of similar or related objects
Often used for finding clusters or groups
Graphs in practice are usually sparse but locally
dense, scale free, and satisfy the small-world
property.
Simple backtracking (branch-and-bound)
works well because of the monotone property
(polynomial time for each clique).
Practically very fast even for maximal cliques
(up to 1,000,000 per second)
5
Def. Pseudo Clique
  • For a vertex set K, the density of K is
    $\mathrm{density}(K) = \dfrac{\#\{\text{edges connecting vertices in } K\}}{|K|(|K|-1)/2}$
  • The denominator is the maximum number of edges in K;
    the density is the average ratio of vertices adjacent to a vertex
  • - K is a clique ⇔ density is 1
  • - K is an independent set ⇔ density is 0
  • ⇒ if the density is high, K is nearly a
    clique

For a given threshold $\theta$, K is a pseudo clique ⇔ (density of
K) ≥ $\theta$
We want to solve the problem of
enumerating all pseudo cliques of the given graph
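
The definition can be sketched in a few lines of Python. This is only an illustration, not the author's code: the adjacency-dictionary representation and the names `density` and `is_pseudo_clique` are assumptions, and treating sets with fewer than two vertices as density 1 is a convention chosen here because the formula is undefined for them.

```python
from itertools import combinations


def density(adj, K):
    """Density of vertex set K in a graph given as {vertex: set of neighbors}."""
    K = set(K)
    k = len(K)
    if k < 2:
        # Assumption: the formula divides by zero here, so we pick density 1.
        return 1.0
    # Count the edges with both endpoints in K.
    edges = sum(1 for u, v in combinations(K, 2) if v in adj[u])
    # Divide by the maximum possible number of edges, |K|(|K|-1)/2.
    return edges / (k * (k - 1) / 2)


def is_pseudo_clique(adj, K, theta):
    """K is a pseudo clique iff its density is at least the threshold theta."""
    return density(adj, K) >= theta
```

For a triangle {1, 2, 3} with a pendant vertex 4 attached to 3, the triangle has density 1, while all four vertices together have 4 of 6 possible edges.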
6
Existing Results
  • Easy to find one pseudo clique
  • - because two connected vertices always form a pseudo
    clique
  • Finding a pseudo clique of size k is
    NP-complete
  • - by reduction from the k-clique problem, setting $\theta = 1$
  • Approximation algorithms for maximizing the
    density for size k
  • - $O(|V|^{1/3-\epsilon})$ approximation algorithm
  • - $O((n/k)^{\epsilon})$ approximation if the optimal solution is dense
    [Tokuyama et al.]
  • - PTAS if $O(n^2)$ edges [Arora et al.]
  • Many heuristic algorithms in data mining, data
    engineering, natural sciences
  • However, no algorithm for "complete"
    enumeration

7
Hardness for Branch-and-Bound
  • A straightforward approach is branch and bound
  • In each iteration, divide the problem into two
    non-empty problems by the inclusion of a vertex

Checking the existence of a pseudo clique is NP-complete
8
Proof of the Hardness
Theorem 1
  • For a given graph G, threshold $\theta$, and vertex
    set U, the problem of checking the existence of a
    pseudo clique including U is NP-complete

Proof: by reduction from the problem of finding a clique of k
vertices
  • Input graph G = (V, E)
  • Only (U + clique) is a pseudo clique
  • The density increases as the size of the pseudo clique
    increases
  • Set the threshold so that a clique of size at least k
    induces a pseudo clique

[Figure: the construction used in the reduction, annotated with densities]
9
Is This Really Hard?
  • We proved NP-hardness for "very dense graphs"
  • ⇒ unclear for moderately dense graphs
  • ⇒ possibility of polynomial time enumeration

[Figure: difficulty by density, with regions labeled hard, easy, ?????, easy]
10
Polynomial Time Enumeration
11
Reverse Search Approach
  • Introduce an acyclic parent-child relation on
    all pseudo cliques

[Figure: objects linked by the parent-child relation]
Enumerate by traversing the tree induced by the
relation
We need an algorithm for listing all children
12
Parent of Pseudo Clique
  • v(K) := the minimum-degree vertex in G[K] (ties broken by minimum index)
  • The parent of pseudo clique K := K \ {v(K)}

[Figure: K and the parent of K]
Density of K = (average degree in G[K]) /
(|K| - 1)
The parent is obtained by removing the most
"sparse" vertex from K, and thus is a pseudo clique
The parent is smaller than its child ⇒ acyclic
relation
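
A minimal sketch of the parent computation, assuming the graph is stored as a dictionary mapping each vertex to its set of neighbors and that vertex labels are comparable so they can serve as the index. The function names are hypothetical, not taken from the author's code.

```python
def sparsest_vertex(adj, K):
    """v(K): the vertex of minimum degree in G[K], ties broken by smallest index."""
    K = frozenset(K)
    # The sort key is (degree inside K, vertex index).
    return min(K, key=lambda v: (len(adj[v] & K), v))


def parent(adj, K):
    """The parent of K is K with its sparsest vertex removed."""
    K = frozenset(K)
    return K - {sparsest_vertex(adj, K)}
```

Removing a minimum-degree vertex never lowers the density, which is why the parent of a pseudo clique is again a pseudo clique. For a triangle {1, 2, 3} with a pendant vertex 4 attached to 3, the parent of {1, 2, 3, 4} is the triangle itself.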
13
Ex. Enumeration Tree
  • threshold 0.7

[Figure: enumeration tree for an example graph on vertices 1 to 7]
14
Finding Children
  • A child is obtained by adding a vertex to the
    parent
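  • $\deg_K(v)$: the number of vertices in K adjacent to v
  • (can be maintained in $O(\Delta)$ time per vertex
    addition, where $\Delta$ is the maximum degree)
  • $K \cup \{v\}$ is a child of K ⇔ both of the following hold
  • - $K \cup \{v\}$ is a pseudo clique ⇔ lower bound on
    $\deg_K(v)$
  • - $v(K \cup \{v\}) = v$ ⇒ upper bound on $\deg_K(v)$
  • - $\deg_K(v)$ < min. deg. of K ⇒ v is the sparsest vertex, so a
    pseudo clique $K \cup \{v\}$ is always a child
  • - $\deg_K(v)$ > min. deg. of K + 1 ⇒ $K \cup \{v\}$ is never
    a child
  • - $\deg_K(v)$ = min. deg. of K or min. deg. of K + 1 ⇒ next slide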
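
The bookkeeping and the three cases above can be sketched as follows. This is an illustration only, under assumptions: the class `PseudoCliqueState`, its method names, and the string labels are hypothetical. For brevity the minimum degree of K is recomputed on each call rather than maintained incrementally.

```python
from collections import defaultdict


class PseudoCliqueState:
    """Keeps deg_K(v) for every vertex v and the edge count of K up to date."""

    def __init__(self, adj):
        self.adj = adj                # {vertex: set of neighbors}
        self.K = set()
        self.deg = defaultdict(int)   # deg_K(v): neighbors of v inside K
        self.edges = 0                # number of edges inside K

    def add(self, u):
        """Add u to K; the cost is proportional to deg(u)."""
        self.edges += self.deg[u]     # u brings deg_K(u) new edges into K
        self.K.add(u)
        for w in self.adj[u]:
            self.deg[w] += 1

    def classify(self, v, theta):
        """Return 'never', 'always', or 'check' for the candidate child K + v."""
        k = len(self.K)
        # K + v is a pseudo clique iff deg_K(v) >= theta*k*(k+1)/2 - edges(K).
        if self.deg[v] < theta * k * (k + 1) / 2 - self.edges:
            return "never"
        if not self.K:
            return "always"           # a single vertex is a child of the empty set
        d = min(self.deg[u] for u in self.K)  # minimum degree inside K
        if self.deg[v] < d:
            return "always"           # v is strictly the sparsest vertex of K + v
        if self.deg[v] > d + 1:
            return "never"            # some vertex of K stays sparser than v
        return "check"                # tie cases need the detailed condition
```

The "check" outcome corresponds to the cases deferred to the next slide, where the (degree, index) order decides whether v is the sparsest vertex.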

15
Detailed Condition
  • S(K): the sequence of vertices in K in increasing order of
    (degree, index)
  • $K \cup \{v\}$ is a child ⇔ v is the top of $S(K \cup \{v\})$
  • $K \cup \{v\}$ is a child only if v is adjacent to all
    vertices preceding v in S(K)
  • For each vertex, find the first "non-adjacent
    vertex" in S(K)
  • This can be done in $O(\Delta^2)$ time

The top of S(K) is v(K)
Computation time for one iteration is $O(\Delta^2 \log |V|)$
($O(\Delta k \log |V|)$ if the graph is k-degenerate)
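
Putting the parent-child relation together, a reverse-search enumeration can be sketched as below. This is a deliberately naive version that tests each candidate child directly from the definitions, so it does not achieve the time bounds stated above. The function names, the adjacency-dictionary input, and the convention that sets with fewer than two vertices have density 1 (so the empty set serves as the root) are assumptions made for this illustration.

```python
from itertools import combinations


def density(adj, K):
    """Edges inside K divided by |K|(|K|-1)/2; defined as 1 when |K| < 2."""
    k = len(K)
    if k < 2:
        return 1.0
    edges = sum(1 for u, v in combinations(K, 2) if v in adj[u])
    return edges / (k * (k - 1) / 2)


def sparsest_vertex(adj, K):
    """v(K): minimum degree in G[K], ties broken by minimum index."""
    return min(K, key=lambda u: (len(adj[u] & K), u))


def children(adj, K, theta):
    """Yield every K + v that is a pseudo clique whose parent is K."""
    for v in sorted(set(adj) - K):
        child = K | {v}
        if density(adj, child) >= theta and sparsest_vertex(adj, child) == v:
            yield child


def enumerate_pseudo_cliques(adj, theta):
    """Traverse the enumeration tree depth-first, starting from the empty set."""
    stack = [frozenset()]
    while stack:
        K = stack.pop()
        if K:
            yield K
        stack.extend(children(adj, K, theta))
```

Because every non-empty pseudo clique has exactly one parent, each is reached exactly once without storing the sets already visited.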
16
Computational Experiments
17
Implementation
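  • The code is a simple version
  • - update $\deg_K(v_i)$ at each addition
  • ⇒ adding u to K takes O(deg(u)) time
  • - to find children, find the vertices $v_i$ satisfying
  • $\theta |K|(|K|+1)/2 - (\text{edges in } K) \le \deg_K(v_i) \le d(K)+1$,
    where d(K) is the min. degree of K
  • ⇒ $O(|C| + d(K)) = O(|E|)$ time; O(1) time
    for each
  • C: the vertices $v_i$ with $\deg_K(v_i) = d(K)$ or
    $d(K)+1$

C seems not to be large relative to the number of children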
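
The lower bound on the degree follows directly from the density definition. The short derivation below is a reconstruction from that definition, writing $E[K]$ (a symbol introduced here) for the set of edges inside K:

```latex
% K \cup \{v\} has |E[K]| + \deg_K(v) edges and |K| + 1 vertices, so
\[
  \frac{|E[K]| + \deg_K(v)}{(|K|+1)\,|K|/2} \;\ge\; \theta
  \quad\Longleftrightarrow\quad
  \deg_K(v) \;\ge\; \theta\,\frac{|K|\,(|K|+1)}{2} - |E[K]| .
\]
```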
18
Problem Instances
  • Pentium M 1.1GHz, 256MB memory, Cygwin, C, gcc
  • Test instances are
  • - random graphs
  • (make each edge with probability p),
  • - locally dense random graphs
  • (vertex i is adjacent to each of the vertices from i-k to
    i+k with probability 1/2),
  • - graphs generated from real-world data
  • (co-author graph)
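
The two kinds of random instances could be generated along these lines. This is a sketch under assumptions, not the generator used in the experiments: vertices are numbered from 0, and the function names and the `seed` parameter are hypothetical.

```python
import random


def random_graph(n, p, seed=0):
    """Each of the n(n-1)/2 possible edges is made with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def locally_dense_random_graph(n, k, p=0.5, seed=0):
    """Vertex i is joined to each vertex within index distance k with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        # Only look forward; pairs (v, u) with v < u were handled at vertex v.
        for v in range(u + 1, min(n, u + k + 1)):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj
```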

19
Random Graphs
  • p = 0.1, vertices 200 to 2000, threshold 0.8,
    0.9

Computation time increases linearly with the average degree
20
Locally Dense Random Graph
  • make an edge from each vertex to its neighbors with
    p = 0.5
  • vertices 100 to 25600, threshold 0.8, 0.9

About 10 times slower than clique enumeration.
Computation time per clique does not change
21
Randomly Generated Scale Free Graph
  • Add vertices of degree 10 iteratively, to a
    clique of 10 vertices
  • Vertices to be connected are chosen according
    to their current degrees

Computation time increases quite slowly
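
The generation procedure described on this slide resembles preferential attachment, and one way to sketch it is shown below. The details are assumptions: the function name, the `seed` parameter, and the way repeated choices are resolved (resampling until m distinct targets are found) are illustrative and not taken from the experiments.

```python
import random


def scale_free_graph(n, m=10, seed=0):
    """Start from a clique of m vertices, then add vertices of degree m one by one.

    Each new vertex picks its m neighbors with probability proportional to the
    current degrees. Requires m >= 2 so the initial clique has edges.
    """
    rng = random.Random(seed)
    adj = {v: set(range(m)) - {v} for v in range(m)}
    # Every vertex appears in this list once per incident edge, so a uniform
    # choice from the list is a degree-proportional choice of vertex.
    endpoints = [v for v in range(m) for _ in range(m - 1)]
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        adj[new] = set(chosen)
        for u in chosen:
            adj[u].add(new)
            endpoints.extend((u, new))
    return adj
```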
22
Real-world Instance
  • co-author graph of academic paper database
  • vertices 30,000, edges 125,000, scale
    free

Computation time for one pseudo clique does not
depend on threshold
23
Bottom-wideness
  • Why is it fast in practice?
  • The algorithm generates several recursive
    calls
  • ⇒ the recursion tree expands exponentially
    going down
  • ⇒ computation time is dominated by the lowest
    levels
  • On lower levels, small-degree vertices are
    added ⇒ fast!

[Figure: recursion tree; iterations take a long time near the top and a short time near the bottom]
When pseudo cliques are sufficiently large (over
5?), the min. degree is small on average ⇒ computation
time is short on average at lower levels
24
Conclusion
  • First polynomial-delay, polynomial-space
    algorithm for enumerating pseudo cliques
  • Hardness result for straightforward
    branch-and-bound
  • Evaluated practical efficiency by computational
    experiments
  • Future work
  • Explain the gap between theory and practice
  • Introduce maximality and enumerate maximal pseudo cliques
  • Apply the technique to other "pseudo" structures
    (path, tree, bipartite clique, matching)
  • What is crucial for the computation
    (enumeration) of structures with ambiguity?