1
A Multi-Level Parallel Implementation of a
Program for Finding Frequent Patterns in a Large
Sparse Graph
  • Steve Reinhardt, Interactive Supercomputing
    sreinhardt_at_interactivesupercomputing.com
  • George Karypis, Dept. of Computer Science,
    University of Minnesota

2
Outline
  • Problem definition
  • Prior work
  • Problem and Approach
  • Results
  • Issues and Conclusions

3
Graph Datasets
  • Flexible and powerful representation
  • Evidence extraction and link discovery (EELD)
  • Social Networks/Web graphs
  • Chemical compounds
  • Protein structures
  • Biological Pathways
  • Object recognition and retrieval
  • Multi-relational datasets

4
Finding Patterns in Graphs: Many Dimensions
  • MIS calculation for frequency
      • exact
      • approximate
      • upper bound
  • Algorithm
      • vertical (depth-first)
      • horizontal (breadth-first)
  • Structure of the graph dataset
      • many small graphs: graph transaction setting
      • one large graph: single-graph setting
  • Type of patterns
      • connected subgraphs
      • induced subgraphs
  • Nature of the algorithm
      • Complete: finds all patterns that satisfy the
        minimum support requirement
      • Incomplete: finds some of the patterns
  • Nature of the pattern's occurrence
      • Exact algorithms: the pattern occurs exactly in
        the input graph
      • Inexact algorithms: there is a sufficiently
        similar embedding of the pattern in the graph

5
Single Graph Setting
  • Find all frequent subgraphs from a single sparse
    graph.
  • Choice of frequency definition

6
vSIGRAM: Vertical Solution
  • Candidate generation by extension
  • Add one more edge to a current embedding.
  • Solve MIS on embeddings in the same equivalence
    class.
  • No downward-closure-based pruning
  • Two important components
  • Frequency-based pruning of extensions
  • Treefication based on canonical labeling

7
vSIGRAM: Connection Table
  • Frequency-based pruning.
  • Trying every possible extension is expensive and
    inefficient.
  • A particular extension might have been tested
    before.
  • Categorize extensions into equivalence classes (in
    terms of isomorphism), and record whether each
    class is frequent or not.
  • If a class becomes infrequent, never try it in
    later exploration.
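
The bookkeeping described above can be sketched as a small status table. This is a minimal illustration, not the vSIGRAM connection table itself: the names ConnectionTable, ct_init, ct_record, and ct_should_try, the fixed capacity, and the idea of indexing by an integer class id are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLASSES 1024  /* illustrative capacity, not from the slides */

/* What is known so far about one equivalence class of extensions. */
typedef enum { CLASS_UNTESTED = 0, CLASS_FREQUENT, CLASS_INFREQUENT } ClassStatus;

typedef struct {
    ClassStatus status[MAX_CLASSES];  /* indexed by a hypothetical class id */
} ConnectionTable;

/* Mark every class as not yet tested. */
void ct_init(ConnectionTable *ct) {
    for (size_t i = 0; i < MAX_CLASSES; i++)
        ct->status[i] = CLASS_UNTESTED;
}

/* Record the outcome of a frequency test for one class. */
void ct_record(ConnectionTable *ct, size_t class_id, int freq, int min_freq) {
    assert(class_id < MAX_CLASSES);
    ct->status[class_id] = (freq >= min_freq) ? CLASS_FREQUENT : CLASS_INFREQUENT;
}

/* An extension is worth trying unless its class is already known to be
 * infrequent; untested and frequent classes are still explored. */
bool ct_should_try(const ConnectionTable *ct, size_t class_id) {
    assert(class_id < MAX_CLASSES);
    return ct->status[class_id] != CLASS_INFREQUENT;
}
```

A caller would check ct_should_try before generating an extension and call ct_record after each frequency computation, so a class that has failed once is skipped for the rest of the search.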

8
Parallelization
  • Two clear sources of parallelism in the algorithm
  • Amount of parallelism from each source not known
    in advance
  • The code is typical C code
  • structs, pointers, frequent mallocs/frees of
    small areas, etc.
  • nothing like the Fortran-like (dense linear
    algebra) examples shown for many parallel
    programming methods
  • Parallel structures need to accommodate dynamic
    parallelism
  • Dynamic specification of parallel work
  • Dynamic allocation of processors to work
  • Chose OpenMP taskq/task constructs
  • Proposed extensions to OpenMP standard
  • Support parallel work being defined in multiple
    places in a program, but be placed on a single
    conceptual queue and executed accordingly
  • 20 lines of code changes in a 15,000-line program
  • Electric Fence was very useful in finding coding
    errors

9
Algorithmic Parallelism
  • vSiGraM(G, MIS_type, f)
  • 1. ℱ ← ∅
  • 2. ℱ^1 ← all frequent size-1 subgraphs in G
  • 3. for each F^1 in ℱ^1 do
  • 4.     M(F^1) ← all embeddings of F^1
  • 5. for each F^1 in ℱ^1 do   // high-level
    parallelism
  • 6.     ℱ ← ℱ ∪ vSiGraM-Extend(F^1, G, f)
  • return ℱ
  • vSiGraM-Extend(F^k, G, f)
  • 1. ℱ ← ∅
  • 2. for each embedding m in M(F^k) do   // low-level
    parallelism
  • 3.     𝒞^(k+1) ← 𝒞^(k+1) ∪ {all (k+1)-subgraphs of G
    containing m}
  • 4. for each C^(k+1) in 𝒞^(k+1) do
  • 5.     if F^k is not the generating parent of C^(k+1)
    then
  • 6.         continue
  • 7.     compute C^(k+1).freq from M(C^(k+1))
  • 8.     if C^(k+1).freq < f then
  • 9.         continue

10
Simple Taskq/Task Example
    int main(void)
    {
        int val;
    #pragma intel omp taskq
        val = fib(12345);
    }

    int fib(int n)
    {
        int i;
        int partret[2];
        if (n > 2) {
    #pragma intel omp task
            for (i = n-2; i < n; i++) {
                partret[i-(n-2)] = fib(i);  /* index 0, then 1 */
            }
            return (partret[0] + partret[1]);
        } else {
            return 1;
        }
    }
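
For comparison, the same recursion can be sketched with the task construct from the OpenMP standard (version 3.0 and later) instead of the Intel taskq/task extension used in these slides. This is a different construct, shown only as an illustration; the function name fib_parallel is an assumption. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>

/* Fibonacci with fib(1) = fib(2) = 1, matching the base case on the slide.
 * Each recursive call becomes a task; taskwait joins both children before
 * their results are combined. */
long fib(int n) {
    long a, b;
    assert(n >= 1);
    if (n <= 2)
        return 1;
    #pragma omp task shared(a) firstprivate(n)
    a = fib(n - 1);
    #pragma omp task shared(b) firstprivate(n)
    b = fib(n - 2);
    #pragma omp taskwait
    return a + b;
}

/* One thread seeds the task tree; the rest of the team executes tasks. */
long fib_parallel(int n) {
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

The taskwait is what makes it safe to read a and b: without it the parent could return before its child tasks have written their results.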

11
High-Level Parallelism with taskq/task
    // At the bottom of expand_subgraph, after all child
    // subgraphs have been identified, start them all.
    #pragma intel omp taskq
    for (ii = 0; ii < sg_set_size(child); ii++) {
        #pragma intel omp task captureprivate(ii)
        {
            SubGraph *csg = sg_set_at(child, ii);
            expand_subgraph(csg, csg->ct, lg, ls, o);
        } // end-task
    }

12
Low-Level Parallelism with taskq/task
    #pragma omp parallel shared(nt, priv_es)
    {
        #pragma omp master
        {
            nt = omp_get_num_threads();  // #threads in par
            priv_es = (ExtensionSet **)kmp_calloc(nt, sizeof(ExtensionSet *));
        }
        #pragma omp barrier
        #pragma intel omp taskq
        {
            for (i = 0; i < sg_vmap_size(sg); i++) {
                #pragma intel omp task captureprivate(i)
                {
                    int th = omp_get_thread_num();
                    if (priv_es[th] == NULL) {
                        priv_es[th] = exset_init(128);
                    }
                    expand_map(sg, ct, ams, i, priv_es[th], lg);
                }
            }
        }
    }

Implementation due to Grant Haab and colleagues
from Intel OpenMP library group
13
Experimental Results
  • SGI Altix 32 Itanium2 sockets (64 cores),
    1.6GHz
  • 64 GBytes (though not memory limited)
  • Linux
  • No special dplace/cpuset configuration
  • Minimum frequencies chosen to illuminate scaling
    behavior, not provide maximum performance

14
Dataset 1 - Chemical
Columns: Graph | Frequency | Type of Parallelism | Time in seconds (speed-up) for 1, 2, 4, 8, 16, 30, 60 processors
dtp 500 High 31.94 17.01 (2.03) 14.76 (2.40) 13.89 (2.58) 14.00 (2.56) 13.97 (2.57)
dtp 500 Low 32.51 (0.98) 31.52 (1.01) 37.95 (0.83) 42.18 (0.74) 49.56 (0.63)
dtp 500 Both 17.52 (1.96) 14.88 (2.37) 15.80 (2.21) 29.85 (1.08) 44.37 (0.70)
dtp 100 High 93.96 48.86 (1.97) 27.12 (3.71) 16.82 (6.39) 15.05 (7.29) 14.52 (7.61)
dtp 100 Low 94.36 (1.00) 92.18 (1.02) 112.17 (0.83) 133.40 (0.70) 116.31 (0.80)
dtp 100 Both 48.38 (1.99) 27.27 (3.69) 61.52 (1.55) 315.94 (0.29) 281.83 (0.33)
dtp 50 High 282.15 142.02 (2.00) 62.73 (4.64) 34.44 (8.76) 19.40 (16.56) 15.06 (22.27) 15.80 (21.03)
dtp 50 Low 283.19 (1.00) 293.6 (0.96) 400.55 (0.70) 262.82 (1.07) 197.27 (1.44)
dtp 50 Both 140.47 (2.03) 81.18 (3.55) 242.09 (1.17) 513.39 (0.55) 581.04 (0.48)
15
Dataset 2 - Aviation
Columns: Graph | Frequency | Type of Parallelism | Time in seconds (speed-up) for 1, 2, 4, 8, 16, 30, 60 processors
air1 1750 High 358.27 54.92 (7.19) 21.74 (22.30) 18.85 (27.29)
air1 1750 Low 171.04 (2.13)
air1 1500 High 771.82 112.30 (7.20) 39.40 (22.89) 33.99 (27.30)
air1 1250 High 1503.49 209.08 (7.37) 67.54 (24.31) 56.56 (29.58)
air1 1000 High 3909.95 490.38 (8.06) 155.33 (26.13) 158.14 (25.65)
16
Performance of High-level Parallelism
  • When there is a sufficient quantity of work (i.e.,
    the frequency threshold is low enough)
  • Good speed-ups to 16P
  • Reasonable speed-ups to 30P
  • Little or no benefit above 30P
  • No insight into performance plateau

17
Poor Performance of Low-level Parallelism
  • Several possible effects ruled out
  • Granularity of data allocation
  • Barrier before master-only reduction
  • Source: highly variable times for
    register_extension
  • 100X slower in parallel than in serial,
  • but in different instances from execution to
    execution
  • Apparently due to highly variable run-times for
    malloc
  • Not understood

18
Issues and Conclusions
  • OpenMP taskq/task were straightforward to use in
    this program and implemented the desired model
  • Performance was good to a medium range of
    processor counts (best 26X on 30P)
  • Difficult to gain insight into lack of
    performance
  • High-level parallelism at 30P and above
  • Low-level parallelism

19
Backup
20
Datasets
Dataset   Connected Components  Vertices  Edges   Vertex Labels  Edge Labels
Aviation  2,703                 101,185   98,482  6,173          51
Credit    700                   14,700    14,000  59             20
Citation  16,999                29,014    42,064  50             12
VLSI      2,633                 12,752    11,542  23             1
21
Datasets
Dataset   Connected Components  Vertices  Edges   Vertex Labels  Edge Labels
Aviation  2,703                 101,185   98,482  6,173          51
Citation  16,999                29,014    42,064  50             12
VLSI      2,633                 12,752    11,542  23             1
22
Aviation Dataset
  • Generally, vSIGRAM is 2-5 times faster than
    hSIGRAM (with exact and upper bound MIS)
  • Largest pattern contained 13 edges.

23
Credit Dataset
  • Generally, vSIGRAM is 2-5 times faster than
    hSIGRAM (with exact and upper bound MIS).
  • Largest pattern contained 13 edges.

24
Citation Dataset
  • But, hSIGRAM can be more efficient especially
    with upper bound MIS (ub).
  • Largest pattern contained 16 edges.

25
Contact Map Dataset
26
DTP Dataset
27
VLSI Dataset
  • Exact MIS never finished.
  • Longest pattern contained 5 edges (constraint).

28
SUBDUE
  • D. J. Cook and L. B. Holder. J. Artificial
    Intelligence Research, vol. 1, 1994.
  • Heuristic pattern discovery system based on MDL,
    written in C.
  • Version 5.0.6
  • With the default setting, finds 3 most
    interesting patterns. No overlaps are allowed.

29
Comparison with SUBDUE
         SUBDUE                                       vSIGRAM (approximate MIS)
Dataset  Freq.                Size     Runtime (sec)  Freq.  Largest Size  Patterns  Runtime (sec)
Credit   341; 395; 387        6; 5; 5  517            200    9             11,696    4
Credit   341; 395; 387        6; 5; 5  517            20     13            613,884   461
DTP      4,957; 4,807; 1,950  2; 2; 6  1,525          500    7             190       20
DTP      4,957; 4,807; 1,950  2; 2; 6  1,525          10     21            112,535   311
VLSI     773; 773; 244        1; 1; 1  16             200    5             137       3
VLSI     773; 773; 244        1; 1; 1  16             25     5             1,452     18
  • Similar results with SEuS

30
Comparison With SEuS
  • S. Ghazizadeh and S. Chawathe. DS2002.
  • Pattern discovery algorithm using the summary
    data structure.
  • Allows overlaps when counting frequency.
  • Tends to produce more patterns, because the
    frequency of each pattern is generally higher.
  • Written in Java
  • From the Credit dataset, SEuS discovered 48
    patterns in 50 seconds (support threshold unknown).
  • vSIGRAM (approximate MIS) spent 20 seconds to find
    11,696 patterns.

31
Summary
  • With approximate and exact MIS, vSIGRAM is 2-5
    times faster than hSIGRAM.
  • With upper bound MIS, however, hSIGRAM can prune
    a larger number of infrequent patterns.
  • The downward closure property makes this pruning
    possible.
  • For some datasets, using exact MIS for frequency
    counting is just intractable.
  • Compared to SUBDUE, SIGRAM finds more and longer
    patterns in less runtime.

32
Thank You!
  • A slightly longer version of this paper is also
    available as a technical report.
  • SIGRAM executables will be available for download
    soon from http://www.cs.umn.edu/karypis/pafi/

33
Complete Frequent Subgraph Mining: Existing Work
So Far
  • Input: a set of graphs (transactions) and a
    support threshold
  • Goal: find all frequently occurring subgraphs in
    the input dataset.
  • AGM (Inokuchi et al., 2000): vertex-based;
    subgraphs may not be connected.
  • FSG (Kuramochi et al., 2001): edge-based; only
    connected subgraphs.
  • AcGM (Inokuchi et al., 2002), gSpan (Yan & Han,
    2002), FFSM (Huan et al., 2003), etc. follow
    FSG's problem definition.
  • Frequency of each subgraph: the number of
    supporting transactions.
  • It does not matter how many embeddings are in each
    transaction.
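
As a minimal sketch of this frequency definition, assume the number of embeddings of a pattern in each transaction is already known (the array embeddings and the function name transaction_support are hypothetical). A transaction counts once whether it holds one embedding or many.

```c
#include <assert.h>
#include <stddef.h>

/* embeddings[t] is the number of embeddings of the pattern in transaction t.
 * The support is the number of transactions with at least one embedding. */
size_t transaction_support(const int *embeddings, size_t num_transactions) {
    size_t support = 0;
    for (size_t t = 0; t < num_transactions; t++) {
        assert(embeddings[t] >= 0);
        if (embeddings[t] > 0)
            support++;
    }
    return support;
}
```

With embedding counts 4, 0, and 1 over three transactions, the support is 2, not 5.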

34
Frequency Under Transaction Setting
[Figure: Transaction 1, Transaction 2, Transaction 3]
  • Convenient assumption
  • No need to care about multiple embeddings per
    transaction

35
Wait!
  • What happens if there is no notion of
    transactions in input datasets?
  • Many real graph datasets are not in the
    transaction format.
  • Network-related, VLSI design, etc.
  • Graphs created from data with temporal nature
    (e.g., link discovery, intrusion detection)

36
What is the reasonable frequency definition?
  • Two reasonable choices
  • The frequency is determined by the total number
    of embeddings.
  • Not downward closed.
  • Too many patterns.
  • Artificially high frequency of certain patterns.
  • The frequency is determined by the number of
    edge-disjoint embeddings (Vanetik et al., ICDM
    2002).
  • Downward closed.
  • Since each occurrence utilizes different sets of
    edges, occurrence frequencies are bounded.
  • Solved by finding the maximum independent set
    (MIS) of the embedding overlap graph.

37
Embedding Overlap and MIS
[Figure: embeddings E1, E2, E3, and E4]
  • Edge-disjoint embeddings
  • {E1, E2, E3}
  • {E1, E2, E4}
  • Create an overlap graph and solve MIS
  • Vertex ↔ Embedding
  • Edge ↔ Overlap
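
A minimal sketch of the overlap-graph construction, assuming each embedding is stored as a bitmask of the input-graph edge ids it uses (a hypothetical encoding limited to 64 edges). The exact MIS here is a brute-force search over all subsets, which is only workable for a handful of embeddings; the names EdgeSet, build_overlap_graph, and exact_mis_size are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EMB 16  /* the brute force below is exponential, so keep this tiny */

/* Set of input-graph edge ids (0..63) used by one embedding. */
typedef uint64_t EdgeSet;

/* overlap[i] gets bit j set when embeddings i and j share at least one edge. */
void build_overlap_graph(const EdgeSet *emb, int n, uint32_t *overlap) {
    assert(n >= 0 && n <= MAX_EMB);
    for (int i = 0; i < n; i++) {
        overlap[i] = 0;
        for (int j = 0; j < n; j++)
            if (i != j && (emb[i] & emb[j]) != 0)
                overlap[i] |= (uint32_t)1 << j;
    }
}

/* Size of a maximum independent set, found by testing every subset. */
int exact_mis_size(const uint32_t *overlap, int n) {
    assert(n >= 0 && n <= MAX_EMB);
    int best = 0;
    for (uint32_t subset = 0; subset < ((uint32_t)1 << n); subset++) {
        int size = 0, independent = 1;
        for (int i = 0; i < n && independent; i++) {
            if (!(subset & ((uint32_t)1 << i)))
                continue;
            if (overlap[i] & subset)  /* a neighbor is also selected */
                independent = 0;
            size++;
        }
        if (independent && size > best)
            best = size;
    }
    return best;
}
```

For example, with four embeddings where only the last two share an edge (edge sets {0,1}, {2,3}, {4,5}, {5,6}, chosen for illustration), the overlap graph has a single edge and the MIS has size 3, consistent with the two edge-disjoint sets listed above.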
38
OK. Definition is Fine, but
  • MIS-based frequency seems reasonable.
  • Next question: how to develop mining algorithms
    for the single-graph setting?

39
How to Handle Single Graph Setting?
  • Issue 1: Frequency counting
  • Exact MIS is often intractable.
  • Issue 2: Choice of search scheme
  • Horizontal (breadth-first)
  • Vertical (depth-first)

40
Issue 1: MIS-Based Frequency
  • We considered approximate (greedy) and upper
    bound MIS too.
  • Approximate MIS may underestimate the frequency.
  • Upper bound MIS may overestimate the frequency.
  • MIS is NP-complete and hard to approximate.
  • In practice, a simple greedy scheme works pretty
    well.
  • Halldórsson and Radhakrishnan. Greed is good,
    1997.

41
Approximate and Upper Bound MIS
  • Greedy MIS
  • Successively remove lowest degree vertices
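
One common reading of this greedy scheme, sketched under assumptions: repeatedly pick a remaining vertex of lowest remaining degree, add it to the independent set, and remove it together with its neighbors. The flat adjacency-matrix layout and the name greedy_mis are choices made for this example, not taken from the SIGRAM code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_V 64  /* illustrative limit on overlap-graph size */

/* adj is an n*n row-major adjacency matrix with no self-loops.
 * chosen[v] is set for every vertex placed in the independent set.
 * Returns the set size, which never exceeds the true MIS size. */
int greedy_mis(const bool *adj, int n, bool *chosen) {
    assert(n >= 0 && n <= MAX_V);
    bool alive[MAX_V];
    int size = 0;
    for (int v = 0; v < n; v++) {
        alive[v] = true;
        chosen[v] = false;
    }
    for (;;) {
        int best = -1, best_deg = 0;
        for (int v = 0; v < n; v++) {
            if (!alive[v])
                continue;
            int deg = 0;  /* degree among vertices still alive */
            for (int u = 0; u < n; u++)
                if (alive[u] && adj[v * n + u])
                    deg++;
            if (best < 0 || deg < best_deg) {
                best = v;
                best_deg = deg;
            }
        }
        if (best < 0)
            break;  /* no vertices left */
        chosen[best] = true;
        size++;
        alive[best] = false;
        for (int u = 0; u < n; u++)
            if (adj[best * n + u])
                alive[u] = false;  /* neighbors can no longer be chosen */
    }
    return size;
}
```

Because the result is always a valid independent set, its size is a lower bound on the exact MIS, which is why the approximate variant can underestimate a pattern's frequency but not overestimate it.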

42
Issue 2: Search Scheme
  • Frequent subgraph mining = exploration in the
    lattice of subgraphs
  • Horizontal
  • Level-wise
  • Candidate generation and pruning
  • Joining
  • Downward closure property
  • Frequency counting
  • Vertical
  • Traverse the lattice as if it were a tree.

43
Stop to Summarize for the moment
  • Type of MIS for frequency counting
  • Approximate (greedy)
  • Exact
  • Upper bound
  • Search scheme
  • Horizontal
  • Vertical

44
hSIGRAM: Horizontal Method
  • Natural extension of FSG to the single graph
    setting.
  • Candidate generation and pruning.
  • Downward closure property → tighter pruning than
    the vertical method
  • Two-phase frequency counting
  • All embeddings by subgraph isomorphism
  • Anchor edge list intersection, instead of TID
    list intersection.
  • Localize subgraph isomorphism
  • MIS for the embeddings
  • Approximate and upper bound MIS give a subset and
    a superset of the frequent patterns, respectively.

45
TID List Recap
TID(subgraph A) = {T1, T2, T3}
TID(candidate) ⊆ TID(subgraph A) ∩ TID(subgraph B) ∩ ...
TID(subgraph B) = {T1, T3}
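
The intersection step can be sketched as a merge of two sorted lists. This assumes TID lists are stored as ascending arrays of integer transaction ids without duplicates; the function name tid_intersect is hypothetical. Only the transactions in the result need a subgraph isomorphism test for the candidate.

```c
#include <assert.h>
#include <stddef.h>

/* Intersect two ascending, duplicate-free TID lists.
 * out must have room for min(na, nb) entries; returns the number written. */
size_t tid_intersect(const int *a, size_t na,
                     const int *b, size_t nb, int *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {
            out[k++] = a[i];  /* id present in both lists */
            i++;
            j++;
        }
    }
    assert(k <= na && k <= nb);
    return k;
}
```

With the lists {1, 2, 3} and {1, 3}, standing for the transaction ids above, the result is {1, 3}.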
46
Anchor Edges
  • Each subgraph must appear close enough together.
  • Keep one edge for each.
  • Complete embeddings require too much memory.
  • Localize subgraph isomorphism.

47
Treefication
[Figure: the lattice of subgraphs and the treefied lattice, with levels of size k - 1, size k, and size k + 1]
  • a node in the search space (i.e., a
    subgraph)
  • Based on subgraph/supergraph relation
  • Avoid visiting the same node in the lattice more
    than once.