Lei Shi - PowerPoint PPT Presentation

Learn more at: https://cse.buffalo.edu

1
Seminar 2009
Frequent Subgraph/Substructure Mining
  • Lei Shi
  • Department of Computer Science and Engineering
  • State University of New York at Buffalo

2
Outline
  • Introduction
  • Apriori-based Subgraph Mining
  • Pattern Growth Subgraph Mining
  • Summary

3
Graphs are everywhere
4
Graph Mining Problems
  • Graph Pattern Mining
  • Frequent subgraph pattern mining
  • Pattern summarization
  • Optimal graph patterns
  • Graph patterns with constraints
  • Approximate graph patterns
  • Graph Classification
  • Graph clustering
  • Important node identification
  • Bridge and hub identification
  • Other Important Topics
  • Graph compression
  • Graph model
  • Social network analysis.

5
Subgraph pattern Mining
  • Frequent subgraph
  • A (sub)graph is frequent if its support
    (occurrence frequency) in a given dataset is no
    less than a minimum support threshold
  • Application of subgraph pattern mining
  • Mining biochemical structures
  • Program control flow analysis
  • Mining XML structures or Web communities
  • Building blocks for graph classification,
    clustering, compression, comparison and
    correlation analysis.
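
The support definition above can be made concrete with a minimal sketch. It assumes undirected graphs with labeled vertices and edges, and it tests containment by brute force (trying every injective vertex mapping), which is only practical for tiny graphs. The names `make_graph`, `contains`, `support` and `is_frequent` are illustrative and do not come from the slides.

```python
from itertools import permutations

def make_graph(vertex_labels, labeled_edges):
    """Build a graph as (vertex -> label, undirected edge -> label)."""
    edges = {frozenset((u, v)): lab for u, v, lab in labeled_edges}
    return vertex_labels, edges

def contains(graph, pattern):
    """True if `pattern` occurs in `graph` as a label-preserving subgraph."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    # Try every injective mapping of pattern vertices onto graph vertices.
    for image in permutations(g_labels, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[f[v]] for v in p_nodes):
            continue
        if all(g_edges.get(frozenset(f[v] for v in e)) == lab
               for e, lab in p_edges.items()):
            return True
    return False

def support(dataset, pattern):
    """Fraction of graphs in the dataset that contain the pattern."""
    return sum(contains(g, pattern) for g in dataset) / len(dataset)

def is_frequent(dataset, pattern, min_sup):
    """A pattern is frequent if its support is no less than min_sup."""
    return support(dataset, pattern) >= min_sup

# Three toy "molecules": C-C=O, C=O and C-N.
dataset = [
    make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2, "-"), (2, 3, "=")]),
    make_graph({1: "C", 2: "O"}, [(1, 2, "=")]),
    make_graph({1: "C", 2: "N"}, [(1, 2, "-")]),
]
c_double_o = make_graph({1: "C", 2: "O"}, [(1, 2, "=")])
print(support(dataset, c_double_o))           # contained in 2 of 3 graphs
print(is_frequent(dataset, c_double_o, 0.5))
```

The brute-force containment test is the expensive part, which is why the challenges listed on the following slides focus on isomorphism and candidate generation.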

6
Frequent Subgraph Example
(1) (2) (3)
7
Key Challenges in Subgraph Mining
  • Graph isomorphism
  • to detect if two graphs are identical in
    structure
  • Graph representation (Canonical Labeling)
  • A canonical label is a unique code of a given
    graph.
  • Canonical label should be the same no matter how
    graphs are represented, as long as graphs have
    the same topological structure and the same
    labeling of edges and vertices.
  • Subgraph candidate generation
  • generate candidate frequent subgraphs from
    datasets
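
As an illustration of canonical labeling, the sketch below tries every vertex ordering, flattens the vertex labels and the upper triangle of the adjacency matrix into one code, and keeps the lexicographically smallest code. This is an assumption-laden toy (all labels are strings, `"0"` marks a missing edge, cost grows as n!), not the exact procedure of any particular miner; `canonical_label` is a hypothetical name.

```python
from itertools import permutations

def canonical_label(vertex_labels, edges):
    """Smallest flattened code over all vertex orderings.

    vertex_labels: list of string labels, indexed by vertex id 0..n-1
    edges: dict mapping (u, v) pairs to string edge labels (undirected)
    """
    n = len(vertex_labels)
    lookup = {frozenset(e): lab for e, lab in edges.items()}
    best = None
    for perm in permutations(range(n)):
        # perm[k] is the original vertex placed at position k.
        vertex_code = tuple(vertex_labels[v] for v in perm)
        # Upper triangle of the permuted adjacency matrix, column by column.
        edge_code = tuple(
            lookup.get(frozenset((perm[i], perm[j])), "0")
            for j in range(n) for i in range(j)
        )
        code = (vertex_code, edge_code)
        if best is None or code < best:
            best = code
    return best

# The same path A -x- B -y- A written with two different vertex numberings.
g1 = (["A", "B", "A"], {(0, 1): "x", (1, 2): "y"})
g2 = (["B", "A", "A"], {(0, 1): "x", (0, 2): "y"})
# A structurally different graph: both edges carry label x.
g3 = (["A", "B", "A"], {(0, 1): "x", (1, 2): "x"})

print(canonical_label(*g1) == canonical_label(*g2))  # same structure
print(canonical_label(*g1) == canonical_label(*g3))  # different labeling
```

Because the code is minimized over every ordering, two graphs get the same label exactly when they are isomorphic, so duplicate candidates can be detected by comparing labels.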

8
Subgraph Mining Approaches
  • Apriori-based
  • AGM/AcGM: Inokuchi et al. (PKDD'00)
  • FSG: Kuramochi and Karypis (ICDM'01)
  • M. Kuramochi and G. Karypis. Frequent subgraph
    discovery. In ICDM'01, pages 313-320, Nov. 2001.
  • PATH: Vanetik and Gudes (ICDM'02, ICDM'04)
  • FFSM: Huan et al. (ICDM'03) and SPIN: Huan et
    al. (KDD'04)
  • FTOSM: Horvath et al. (KDD'06)
  • Pattern growth based
  • Subdue: Holder et al. (KDD'94)
  • MoFa: Borgelt and Berthold (ICDM'02)
  • gSpan: Yan and Han (ICDM'02)
  • Yan, X. and Han, J. 2002. gSpan: Graph-Based
    Substructure Pattern Mining. In Proceedings of
    the 2002 IEEE International Conference on Data
    Mining (ICDM'02) (December 09-12, 2002).
    IEEE Computer Society, Washington, DC, 721.
  • Gaston: Nijssen and Kok (KDD'04)
  • CMTreeMiner: Chi et al. (TKDE'05)
  • LEAP: Yan et al. (SIGMOD'08)

9
Outline
  • Introduction and Background
  • Apriori-based Subgraph Mining
  • Pattern Growth Subgraph Mining
  • Summary

10
Apriori-based Approach
  • FSG: M. Kuramochi and G. Karypis. Frequent
    subgraph discovery. In ICDM'01, Nov. 2001.
  • Flattened representation as canonical labeling
  • Apriori-based method to generate subgraph
    candidates

11
Graph Representation in FSG
  • Flattened Representation

12
Graph Representation in FSG
  • Flattened Representation

Lexicographic order or dictionary order
13
Apriori-based method
  • Apriori Property
  • If a graph is frequent, all of its subgraphs are
    frequent.
  • Candidate Generation
  • Create a set of candidates of size k+1
  • - from two given frequent k-subgraphs
  • - that contain the same (k-1)-subgraph
  • - the join may result in several candidates of size k+1
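
The join-and-prune idea above can be sketched under a strong simplifying assumption: every vertex label is unique, so an edge is fully identified by its pair of endpoint labels and no isomorphism test is needed. Under that assumption each join yields a single candidate, whereas with repeated labels the shared core can be matched in more than one way and a join may yield several. The names `is_connected` and `generate_candidates` are illustrative.

```python
from itertools import combinations

def is_connected(edges):
    """True if the edge set forms one connected component."""
    edges = list(edges)
    seen = set(edges[0])
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if (u in seen) != (v in seen):
                seen.update((u, v))
                changed = True
    return all(u in seen and v in seen for u, v in edges)

def generate_candidates(frequent_k):
    """Join frequent k-edge graphs that share k-1 edges into (k+1)-edge candidates."""
    frequent_k = set(frequent_k)
    k = len(next(iter(frequent_k)))
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        union = a | b
        # The two graphs must overlap in exactly k-1 edges and stay connected.
        if len(union) != k + 1 or not is_connected(union):
            continue
        # Apriori pruning: every connected k-edge subgraph must be frequent.
        if all(union - {e} in frequent_k
               for e in union if is_connected(union - {e})):
            candidates.add(union)
    return candidates

frequent_1 = {
    frozenset({("A", "B")}),
    frozenset({("B", "C")}),
    frozenset({("C", "D")}),
}
candidates_2 = generate_candidates(frequent_1)
print(len(candidates_2))  # A-B-C and B-C-D; A-B plus C-D is disconnected
```

In a full miner each candidate's support would then be counted against the dataset, and only the frequent ones would be passed to the next round.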

14
Apriori-based method
  • Graph candidate generated Example

15
Apriori-based method
  • FlowChart

16
Apriori-based method
  • Experiment Result
  • - Chemical compound dataset, which contains 340
    compounds and 24 different atoms (vertices)

17
Outline
  • Introduction
  • Apriori-based Subgraph Mining
  • Pattern Growth Subgraph Mining
  • Summary

18
Motivation of gSpan
  • Weakness of Apriori-based approach
  • Generating size-(k+1) subgraph candidates from
    size-k frequent subgraphs is complicated and
    costly.
  • Pruning false positives requires subgraph
    isomorphism tests, and subgraph isomorphism is
    NP-complete, so pruning is costly.
  • gSpan: Graph-Based Substructure Pattern Mining
  • Changes the way a graph is represented (DFS
    code; DFS = Depth-First Search)
  • Uses pattern growth to generate new subgraph
    candidates.

19
gSpan: Graph-Based Substructure Pattern Mining
  • DFS (Depth-First Search) Code
  • First step: traverse the graph depth-first and
    use the edges in traversal order to represent
    the graph.
  • Second step: DFS lexicographic order
  • Pattern Growth subgraph generation

20
DFS code
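An edge is represented by a 5-tuple (i, j, l_i, l_(i,j), l_j): the two vertex indices, the label of the first vertex, the edge label, and the label of the second vertex.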
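
A minimal sketch of how one DFS code could be produced for an undirected labeled graph follows. It assumes vertices are numbered in discovery order and that, at each vertex, backward edges to already-discovered vertices are listed before the traversal goes deeper. The function name `dfs_code` and the input layout are illustrative; a different start vertex or neighbor order gives a different code for the same graph, and gSpan selects the minimum code under its DFS lexicographic order as the canonical one (that comparison is not implemented here).

```python
def dfs_code(vertex_labels, edges, start):
    """Return one DFS code: a list of (i, j, l_i, l_ij, l_j) tuples."""
    adj = {v: [] for v in vertex_labels}
    edge_label = {}
    for u, v, lab in edges:
        adj[u].append(v)
        adj[v].append(u)
        edge_label[frozenset((u, v))] = lab

    index = {start: 0}   # vertex -> discovery index
    used = set()         # edges already written to the code
    code = []

    def emit(u, w):
        used.add(frozenset((u, w)))
        code.append((index[u], index[w], vertex_labels[u],
                     edge_label[frozenset((u, w))], vertex_labels[w]))

    def visit(u):
        # Backward edges: to discovered vertices, smallest index first.
        back = [w for w in adj[u]
                if w in index and frozenset((u, w)) not in used]
        for w in sorted(back, key=index.get):
            emit(u, w)
        # Forward edges: discover new vertices and recurse.
        for w in adj[u]:
            if w not in index:
                index[w] = len(index)
                emit(u, w)
                visit(w)

    visit(start)
    return code

# Triangle p-q-r with vertex labels X, Y, X and edge labels a, b, a.
labels = {"p": "X", "q": "Y", "r": "X"}
triangle = [("p", "q", "a"), ("q", "r", "b"), ("r", "p", "a")]
for edge in dfs_code(labels, triangle, "p"):
    print(edge)
# (0, 1, 'X', 'a', 'Y')
# (1, 2, 'Y', 'b', 'X')
# (2, 0, 'X', 'a', 'X')
```

The last tuple has i > j, which marks it as a backward edge closing the triangle; the first two are forward edges of the DFS tree.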
21
DFS code
  • Second Step DFS Lexicographic Order

22
Pattern Growth Approach
  • Pattern Growth (free extension)

23
Pattern Growth Approach
  • Duplicate Graphs

24
Pattern Growth Approach
  • Free extension

25
Pattern Growth Approach
  • Rightmost extension
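
In gSpan, rightmost extension restricts growth to the rightmost path of the DFS tree: backward edges may only start at the rightmost (last discovered) vertex and end on the rightmost path, and forward edges may only start from a vertex on that path. The sketch below computes those extension points from a DFS code given as 5-tuples whose first two entries are vertex indices. The function names are illustrative, and labels for the new edges are left out.

```python
def rightmost_path(code):
    """Vertex indices from the root to the rightmost vertex."""
    parent = {}
    for i, j, *_ in code:
        if i < j:              # forward edge: j was discovered from i
            parent[j] = i
    vertex = max(parent, default=0)
    path = [vertex]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]

def extension_points(code):
    """Return (backward, forward) index pairs allowed by rightmost extension."""
    path = rightmost_path(code)
    rightmost = path[-1]
    existing = {frozenset((i, j)) for i, j, *_ in code}
    # Backward: from the rightmost vertex to an earlier vertex on the path.
    backward = [(rightmost, v) for v in path[:-1]
                if frozenset((rightmost, v)) not in existing]
    # Forward: from any vertex on the path to a brand-new vertex.
    new_vertex = rightmost + 1
    forward = [(v, new_vertex) for v in reversed(path)]
    return backward, forward

path_code = [(0, 1, "X", "a", "Y"), (1, 2, "Y", "b", "X")]
print(rightmost_path(path_code))    # [0, 1, 2]
print(extension_points(path_code))  # ([(2, 0)], [(2, 3), (1, 3), (0, 3)])
```

Vertices off the rightmost path never receive new edges, which is how this scheme avoids many of the duplicate graphs that free extension generates.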

26
Pattern Growth Approach
  • Examples (cont.)

27
gSpan
28
gSpan
29
Pattern Growth Approach
  • Experimental result using Chemical data
  • 340 molecules
  • 66 atom types and
  • 4 bond types as labels
  • On average, each graph has only 27 vertices and 28 edges

30
Summary
  • Graph representation
  • Flattened representation vs. DFS code
  • Generation of Candidate Patterns
  • apriori vs. pattern growth

31

32
Pattern-Growth Approach
33
Frequent Graph Pattern
  • Given a graph dataset D, find subgraph g, s.t.
    $\mathrm{freq}(g) \geq \theta$
  • where $\mathrm{freq}(g)$ is the percentage of graphs
    in D that contain g.
  • Problem 1: Exponential pattern set
  • Problem 2: Threshold setting

34
Difference between frequent itemset and frequent
subgraph discovery
35
Frequent itemset discovery
36
Subgraph Mining Algorithms
  • Apriori-based approach
  • AGM/AcGM: Inokuchi et al. (PKDD'00)
  • FSG: Kuramochi and Karypis (ICDM'01)
  • PATH: Vanetik and Gudes (ICDM'02, ICDM'04)
  • FFSM: Huan et al. (ICDM'03) and SPIN: Huan et
    al. (KDD'04)
  • FTOSM: Horvath et al. (KDD'06)
  • Pattern growth approach
  • Subdue: Holder et al. (KDD'94)
  • MoFa: Borgelt and Berthold (ICDM'02)
  • gSpan: Yan and Han (ICDM'02)
  • Gaston: Nijssen and Kok (KDD'04)
  • CMTreeMiner: Chi et al. (TKDE'05)
  • LEAP: Yan et al. (SIGMOD'08)

37
Framework of Subgraph Mining Algorithms
  • Search Order
  • breadth vs. depth
  • complete vs. incomplete
  • Generation of Candidate Patterns
  • apriori vs. pattern growth
  • Discovery Order of Patterns
  • DFS order
  • path → tree → graph
  • Elimination of Duplicate Subgraphs
  • passive vs. active
  • Support Calculation
  • embedding store or not

38
Frequent Subgraph
Examples
39
Example (cont.)
40
Subgraph Mining Approaches
  • Apriori-based approach
  • AGM/AcGM: Inokuchi et al. (PKDD'00)
  • FSG: Kuramochi and Karypis (ICDM'01)
  • M. Kuramochi and G. Karypis. Frequent subgraph
    discovery. In ICDM'01, pages 313-320, Nov. 2001.
  • PATH: Vanetik and Gudes (ICDM'02, ICDM'04)
  • FFSM: Huan et al. (ICDM'03) and SPIN: Huan et
    al. (KDD'04)
  • FTOSM: Horvath et al. (KDD'06)
  • Pattern growth approach
  • Subdue: Holder et al. (KDD'94)
  • MoFa: Borgelt and Berthold (ICDM'02)
  • gSpan: Yan and Han (ICDM'02)
  • Yan, X. and Han, J. 2002. gSpan: Graph-Based
    Substructure Pattern Mining. In Proceedings of
    the 2002 IEEE International Conference on Data
    Mining (ICDM'02) (December 09-12, 2002).
    IEEE Computer Society, Washington, DC, 721.
  • Gaston: Nijssen and Kok (KDD'04)
  • CMTreeMiner: Chi et al. (TKDE'05)
  • LEAP: Yan et al. (SIGMOD'08)

41
Outline
  • Introduction and Background
  • Apriori-based Subgraph Mining
  • Pattern Growth Subgraph Mining
  • Summary
  • DFS code
  • Yan, X. and Han, J. 2002. gSpan: Graph-Based
    Substructure Pattern Mining. In Proceedings of
    the 2002 IEEE International Conference on Data
    Mining (ICDM'02) (December 09-12, 2002).
    IEEE Computer Society, Washington, DC, 721.

42
Pattern Growth Approach