GRAPH-BASED HIERARCHICAL CONCEPTUAL CLUSTERING

1
GRAPH-BASED HIERARCHICAL CONCEPTUAL CLUSTERING
  • by
  • Istvan Jonyer,
  • Lawrence B. Holder and
  • Diane J. Cook
  • The University of Texas at Arlington

2
Outline
  • What is hierarchical conceptual clustering?
  • Overview of Subdue
  • Conceptual clustering in Subdue
  • Evaluation of hierarchical clusterings
  • Experiments and results
  • Conclusions

3
What is clustering?

4
What is hierarchical conceptual clustering?
  • Unsupervised concept learning
  • Generating hierarchies to explain data
  • Applications
      • Hypothesis generation and testing
      • Prediction based on groups
      • Finding taxonomies

5
Example hierarchical conceptual clustering

6
The Problem
  • Hierarchical conceptual clustering in
    discrete-valued structural databases
  • Existing systems
      • Continuous-valued
      • Discrete but unstructured
  • We can do better! (Field underexplored)

7
Related Work
  • Cobweb
  • Labyrinth
  • AutoClass
  • Snob
  • In Euclidean space: Chameleon, CURE
  • Unsupervised learning algorithms

8
The Solution
  • Take Subdue and extend it!

9
Overview of Subdue
  • Data mining in graph representations of
    structural databases

10
Overview of Subdue
  • Iteratively searching for best substructure by
    MDL heuristic

11
Overview of Subdue
  • Compress using best substructure
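
The search-and-compress step on slides 10 and 11 can be illustrated with a toy measure. This is a minimal sketch under stated assumptions: it uses graph size (vertices plus edges) as a crude stand-in for description length rather than Subdue's actual MDL encoding, it assumes instances do not overlap, and the function names and example numbers are hypothetical.

```python
def graph_size(num_vertices, num_edges):
    """Crude stand-in for description length: count vertices and edges."""
    return num_vertices + num_edges


def compressed_size(graph, sub, instances):
    """Size of the substructure definition plus the compressed graph.

    graph and sub are (num_vertices, num_edges) pairs. Each instance of
    sub is assumed to be non-overlapping and is replaced by one vertex.
    """
    gv, ge = graph
    sv, se = sub
    # Each instance collapses sv vertices into 1 and removes se internal edges.
    new_v = gv - instances * (sv - 1)
    new_e = ge - instances * se
    return graph_size(sv, se) + graph_size(new_v, new_e)


def best_substructure(graph, candidates):
    """Pick the candidate whose use yields the smallest compressed size."""
    return min(
        candidates,
        key=lambda c: compressed_size(graph, c["sub"], c["instances"]),
    )


# Hypothetical input graph and candidate substructures.
graph = (20, 25)
candidates = [
    {"name": "triangle", "sub": (3, 3), "instances": 4},
    {"name": "big", "sub": (6, 7), "instances": 1},
]
best = best_substructure(graph, candidates)
print(best["name"])
```

In this toy setting the small, frequent substructure wins because replacing four instances shrinks the graph more than replacing one large instance does.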

12
Overview of Subdue
  • Fuzzy match
      • Inexact matching of subgraphs
  • Applications
      • Defining fuzzy concepts
      • Evaluation of clusterings

13
Conceptual Clustering with Subdue
  • Use Subdue to identify clusters
  • The best subgraph in an iteration defines a
    cluster
  • When to stop within an iteration?
      • Use limit option
      • Use size option
      • Use first minimum heuristic (new)

14
The First Minimum Heuristic
  • Use subgraph at first local minimum
  • Detect it using prune2 option

15
The First Minimum Heuristic
  • Not a greedy heuristic!
  • Although first local minimum is usually the
    global minimum
  • First local minimum is caused by a smaller, more
    frequently occurring subgraph
  • Subsequent minima are caused by bigger, less
    frequently occurring subgraphs
  • => First subgraph is more general

16
The First Minimum Heuristic
  • A multi-minimum search space
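
The idea of stopping at the first local minimum in a multi-minimum search space can be sketched as follows. The sketch assumes each step of the search yields one numeric value where lower is better (for example, a compressed description length); the function name and the sample values are hypothetical, and this is not a description of how the prune2 option is implemented.

```python
def first_local_minimum(values):
    """Return the index of the first local minimum in values, or None.

    The first local minimum is the first position whose successor is
    strictly larger. If the sequence never turns upward, the last
    position is returned.
    """
    if not values:
        return None
    for i in range(len(values) - 1):
        if values[i + 1] > values[i]:
            return i
    return len(values) - 1


# Hypothetical evaluation values as the search considers larger subgraphs.
values = [10, 8, 6, 7, 5, 9]
first = first_local_minimum(values)
lowest = values.index(min(values))
print(first, lowest)
```

With these sample values the first local minimum and the global minimum differ, which is the case the slides describe as less common: the earlier minimum corresponds to the smaller, more frequent, and therefore more general subgraph.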

17
Lattice vs. Tree
  • Previous work defined classification trees
      • Inadequate in structured domains
  • Better hierarchical description: classification
    lattice
      • A cluster can have more than one parent
      • A parent can be at any level (not only one level
        above)
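
The difference between a tree and a lattice comes down to how many parents a node may have. The sketch below shows one way to represent a classification lattice; the `Cluster` class, the `depth` helper, and the cluster names are hypothetical illustrations, not Subdue's data structures.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare clusters by identity, not by field values
class Cluster:
    name: str
    parents: list = field(default_factory=list)

    def add_parent(self, parent):
        """Link this cluster to a parent; a cluster may have several."""
        if parent not in self.parents:
            self.parents.append(parent)


def depth(cluster):
    """Length of the longest path from cluster up to a root."""
    if not cluster.parents:
        return 0
    return 1 + max(depth(p) for p in cluster.parents)


root = Cluster("root")
a = Cluster("A")
b = Cluster("B")
c = Cluster("C")
d = Cluster("D")
a.add_parent(root)
b.add_parent(root)
c.add_parent(a)
# D has two parents that sit at different levels of the hierarchy.
d.add_parent(c)
d.add_parent(b)
print(depth(d), [p.name for p in d.parents])
```

In a tree, `parents` would hold at most one entry and that entry would always be exactly one level above the child; the lattice drops both restrictions.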

18
Hierarchical Clustering in Subdue
  • Subdue can compress by a subgraph after each
    iteration
  • Subsequent clusters may be defined in terms of
    previously defined clusters
  • This results in a hierarchy
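
The way repeated compression produces a hierarchy can be shown on a deliberately simplified problem. The sketch below works on sets of tokens instead of graphs and picks the most frequent pair of tokens instead of using the MDL heuristic; both are assumptions made to keep the example short. The function names and the `C1`, `C2` labels are hypothetical, and the input tokens are assumed not to collide with those labels.

```python
from collections import Counter
from itertools import combinations


def best_pair(items):
    """Most frequent pair of tokens across items (ties broken by sort order)."""
    counts = Counter()
    for item in items:
        counts.update(combinations(sorted(item), 2))
    if not counts:
        return None, 0
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def cluster_hierarchy(items, max_iterations=10):
    """Repeatedly compress items by the best pair and record parent links."""
    items = [set(i) for i in items]
    definitions = {}  # cluster label -> pair of tokens that defines it
    parents = {}      # cluster label -> earlier clusters used in its definition
    for k in range(1, max_iterations + 1):
        pair, count = best_pair(items)
        if pair is None or count < 2:
            break
        label = f"C{k}"
        parents[label] = [t for t in pair if t in definitions]
        definitions[label] = pair
        # Compress: replace every occurrence of the pair by the new label.
        for item in items:
            if set(pair) <= item:
                item.difference_update(pair)
                item.add(label)
    return definitions, parents


data = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"a", "b"}]
definitions, parents = cluster_hierarchy(data)
print(definitions)
print(parents)
```

After the first iteration the data contains the label of the first cluster, so the second cluster is defined in terms of it. That dependency is what turns a flat sequence of clusters into a hierarchy.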

19
Hierarchical Conceptual Clustering of an Artificial Domain

20
Hierarchical Conceptual Clustering of an Artificial Domain

21
Evaluation of Clusterings
  • Traditional evaluation
      • Not applicable to hierarchical domains
  • No known evaluation for hierarchical clusterings
  • Most hierarchical evaluations are anecdotal

22
New Evaluation Heuristic for Hierarchical Clusterings
  • Properties of a good clustering
      • Small number of clusters
          • Large coverage → good generality
      • Big cluster descriptions
          • More features → more inferential power
      • Minimal or no overlap between clusters
          • More distinct clusters → better defined concepts

23
New Evaluation Heuristic for Hierarchical Clusterings
  • Big clusters: bigger distance between disjoint
    clusters
  • Overlap: less overlap → bigger distance
  • Few clusters: averaging comparisons
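
The slides do not give the formula behind the heuristic, so the following is only an illustration of the three stated properties and not the authors' measure. It assumes each cluster description is a set of features, takes the distance between two descriptions to be the number of features found in exactly one of them, and averages that distance over all pairs of clusters. All names and sample data are hypothetical.

```python
from itertools import combinations


def description_distance(a, b):
    """Number of features that appear in exactly one of the two descriptions."""
    return len(set(a) ^ set(b))


def clustering_quality(descriptions):
    """Average pairwise distance between cluster descriptions."""
    pairs = list(combinations(descriptions, 2))
    if not pairs:
        return 0.0
    return sum(description_distance(a, b) for a, b in pairs) / len(pairs)


big_disjoint = [{"a", "b", "c"}, {"d", "e", "f"}]
overlapping = [{"a", "b", "c"}, {"b", "c", "d"}]
small_disjoint = [{"a"}, {"b"}]
print(clustering_quality(big_disjoint))
print(clustering_quality(overlapping))
print(clustering_quality(small_disjoint))
```

Under these assumptions, big disjoint descriptions score highest, overlap lowers the score, and averaging over pairs keeps a clustering from scoring well simply by having many clusters.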

24
Experiments and Results
  • Validation in an artificial domain
  • Validation in unstructured domains
  • Comparison to existing systems
  • Real world applications

25
The Animal Domain

26
Hierarchical Clustering of the Animal Domain

27
Hierarchical Clustering of the Animal Domain by Cobweb

28
Comparison of Subdue and Cobweb
  • Quality of Subdue's lattice (tree): 2.60
  • Quality of Cobweb's tree: 1.74
  • Therefore Subdue's clustering scores higher
  • Reasons for the higher score
      • Better generalization, resulting in fewer clusters
      • Eliminating overlap between (reptile) and
        (amphibian/fish)

29
Chemical Application: Clustering of a DNA Sequence

30
Chemical Application: Clustering of a DNA Sequence
  • Coverage
  • 61
  • 68
  • 71

31
Conclusions
  • Goal of hierarchical conceptual clustering of
    structured databases was achieved
  • Synthesized classification lattice
  • Developed new evaluation heuristic for
    hierarchical clusterings
  • Good performance in comparison to other systems,
    even in unstructured domains

32
Future Work
  • More experiments on real-world domains
  • Comparison to other systems
  • Incorporation of evaluation tool into Subdue