ACDC: An Algorithm for Comprehension Driven Clustering

Provided by: csUal
1
ACDC: An Algorithm for Comprehension Driven Clustering
2
Introduction
  • Software clustering techniques are used to
    decompose large software systems into subsystems
  • Common clustering criteria include low coupling,
    high cohesion, etc.
  • ACDC does not attempt to satisfy such criteria
  • ACDC decomposes a software system for the purpose
    of system comprehension
  • ACDC discovers clusters that follow commonly
    occurring patterns observed in decompositions of
    large software systems

3
Background
  • Two common existing software clustering
    approaches:
    • knowledge-based approach
      • based on domain knowledge
      • grouping modules of similar or complementary
        functionality
    • structure-based approach
      • looks at interactions between entities
        (procedures, variables)
      • typical interactions are method calls or
        variable accesses
      • clustering becomes partitioning the vertex set of
        a graph
      • graph nodes are entities and edges are
        interactions
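
The graph view of structure-based clustering can be sketched in a few lines of Python. This is only an illustration: the entity names, the edge list, and the helper functions `build_graph`, `is_partition`, and `cut_edges` are all hypothetical and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical interactions: (caller or user, callee or accessed entity).
edges = [
    ("main", "parse"),
    ("main", "render"),
    ("parse", "read_token"),
    ("render", "draw_line"),
    ("parse", "log"),
    ("render", "log"),
]

def build_graph(edges):
    """Return adjacency sets: node -> set of nodes it calls or accesses."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
        graph[dst]  # touching the key makes sure targets are nodes too
    return dict(graph)

def is_partition(graph, clusters):
    """True if every node of the graph belongs to exactly one cluster."""
    seen = [n for members in clusters.values() for n in members]
    return len(seen) == len(set(seen)) and set(seen) == set(graph)

def cut_edges(graph, clusters):
    """Count interactions that cross cluster boundaries."""
    owner = {n: name for name, members in clusters.items() for n in members}
    return sum(1 for s in graph for d in graph[s] if owner[s] != owner[d])

graph = build_graph(edges)
clusters = {
    "parser": {"parse", "read_token"},
    "renderer": {"render", "draw_line"},
    "top": {"main", "log"},
}
print(is_partition(graph, clusters), cut_edges(graph, clusters))
```

A decomposition is valid when `is_partition` holds; the count of crossing edges is one rough way to express the coupling criterion mentioned earlier.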

4
Clustering for Comprehension
  • Comprehension driven clustering algorithm
    features:
    • effective cluster naming
      • meaningful cluster names aid in program
        understanding
    • bounded cluster cardinality
      • creating clusters with a limited number of
        objects
    • pattern-driven approach
      • using commonly occurring patterns to create
        clusters

5
Subsystem patterns
  • Commonly occurring subsystem patterns:
    • source file pattern
      • the set of procedures and variables contained in
        the same source file can be grouped together to
        form a cluster
    • directory structure pattern
      • directories may correspond to subsystems in some
        cases
    • body-header pattern
      • clustering together the two files across which
        a module's procedures are split (for example, a
        header and a body)
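
As a small illustration of the directory structure pattern, the sketch below groups files by their containing directory. The function name `group_by_directory` and the file list are hypothetical, and the result is only a set of candidate subsystems, since the slide notes that directories correspond to subsystems only in some cases.

```python
import posixpath
from collections import defaultdict

def group_by_directory(paths):
    """Map each directory to the sorted list of files directly inside it."""
    clusters = defaultdict(list)
    for path in paths:
        directory = posixpath.dirname(path) or "."
        clusters[directory].append(posixpath.basename(path))
    return {d: sorted(files) for d, files in clusters.items()}

# Hypothetical file list, not taken from any real system.
candidates = group_by_directory(
    ["fs/inode.c", "fs/super.c", "mm/page.c", "main.c"]
)
print(candidates)
```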

6
Subsystem patterns
  • leaf collection pattern
    • independent set of files serving a similar purpose
    • usually represent leaves in the system graph
  • support library pattern
    • set of procedures accessed by the majority of
      subsystems
  • central dispatcher pattern
    • dual of the support library pattern
    • represents nodes with a large out-degree
    • a common example is a procedure called driver
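
The support library and central dispatcher patterns can be approximated with node degrees. The sketch below is an assumption-laden illustration: it works on individual nodes rather than subsystems, and the `threshold` parameter, the function names, and the example graph are hypothetical, not values used by ACDC.

```python
def degrees(graph):
    """Return (in_degree, out_degree) dictionaries for an adjacency-set graph."""
    out_deg = {n: len(targets) for n, targets in graph.items()}
    in_deg = {n: 0 for n in graph}
    for targets in graph.values():
        for t in targets:
            in_deg[t] += 1
    return in_deg, out_deg

def classify(graph, threshold=0.5):
    """Flag nodes used by, or using, more than `threshold` of the other nodes.

    The threshold is an arbitrary choice made for this sketch.
    """
    in_deg, out_deg = degrees(graph)
    limit = threshold * (len(graph) - 1)
    support = sorted(n for n in graph if in_deg[n] > limit)
    dispatchers = sorted(n for n in graph if out_deg[n] > limit)
    return support, dispatchers

# Hypothetical system: every key is a node, values are the nodes it uses.
graph = {
    "driver": {"a", "b", "c", "util"},
    "a": {"util"},
    "b": {"util"},
    "c": {"util", "leaf1"},
    "util": set(),
    "leaf1": set(),
}
support, dispatchers = classify(graph)
print(support, dispatchers)
```

High in-degree marks a support library candidate and high out-degree marks a dispatcher candidate, which reflects the duality stated on the slide.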

7
Subsystem patterns
  • subgraph dominator pattern
    • particular type of subgraph of a system graph
      G(V,E)
    • this subgraph must contain a dominator node n0
      and a set of nodes n
    • in order to reach nodes n from any of the
      remaining nodes in the graph, one must go through
      the dominator node
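
One way to make the dominator condition concrete is the sketch below. It is not taken from the presentation: the function `dominated_set` and the example graph are hypothetical. It starts from everything reachable from n0 and repeatedly discards nodes that have a predecessor outside the candidate set, so that the only way into the remaining set is through n0.

```python
def dominated_set(graph, n0):
    """Nodes reachable from n0 that can only be entered via n0.

    `graph` maps every node to the set of nodes it points to.
    """
    preds = {n: set() for n in graph}
    for src, targets in graph.items():
        for t in targets:
            preds[t].add(src)

    # Candidates: everything reachable from n0, excluding n0 itself.
    candidates, stack = set(), [n0]
    while stack:
        for t in graph[stack.pop()]:
            if t != n0 and t not in candidates:
                candidates.add(t)
                stack.append(t)

    # Drop nodes with a predecessor outside the candidates and n0.
    changed = True
    while changed:
        changed = False
        for n in list(candidates):
            if not preds[n] <= candidates | {n0}:
                candidates.discard(n)
                changed = True
    return candidates

# Hypothetical system graph.
graph = {
    "main": {"ui", "db"},
    "ui": {"widget", "log"},
    "widget": {"paint"},
    "paint": set(),
    "db": {"log"},
    "log": set(),
}
print(dominated_set(graph, "ui"))
```

Here `log` is excluded from the set dominated by `ui` because `db` also reaches it without passing through `ui`.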

8
The ACDC Algorithm
  • The ACDC algorithm performs clustering in two
    stages:
    • Stage 1: Skeleton construction
      • identifying subsystems using a pattern-driven
        approach
    • Stage 2: Orphan adoption
      • maintaining the system's decomposition as the
        system evolves

9
Stage 1: Skeleton Construction
  • ACDC performs the following steps in order of
    precedence:
    • constructing source file clusters
      • ACDC clusters resources belonging to the same
        file and uses the name of the file as the cluster
        name
    • body-header conglomeration
      • the interface and the implementation of a
        software module form a cluster
      • for example, in C, files foo.h and foo.c are
        grouped into cluster foo.ss
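
The foo.h / foo.c example can be sketched as follows. This is a minimal illustration assuming that headers and bodies are matched purely by file stem; the function name `body_header_clusters` and its extension parameters are hypothetical.

```python
import posixpath
from collections import defaultdict

def body_header_clusters(files, header_exts=(".h",), body_exts=(".c",)):
    """Group header/body pairs sharing a stem into a '<stem>.ss' cluster."""
    by_stem = defaultdict(set)
    for name in files:
        stem, ext = posixpath.splitext(name)
        if ext in header_exts or ext in body_exts:
            by_stem[stem].add(name)

    clusters = {}
    for stem, members in by_stem.items():
        exts = {posixpath.splitext(m)[1] for m in members}
        # Require at least one header and one body for the same stem.
        if exts & set(header_exts) and exts & set(body_exts):
            clusters[stem + ".ss"] = sorted(members)
    return clusters

print(body_header_clusters(["foo.h", "foo.c", "bar.c", "baz.h"]))
```

Files without a matching counterpart, such as `bar.c` and `baz.h` here, are left for later steps.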

10
Stage 1: Skeleton Construction
  • leaf collection and support library
    identification
    • ACDC identifies collections of files according to
      the leaf collection and support library patterns
    • no clusters are formed at this point
  • ordered and limited subgraph domination
    • main step of the algorithm
    • considers nodes in order of ascending out-degree
      when trying to identify nodes following the
      subgraph dominator pattern
    • discovered subsystem name is the name of the
      dominator node plus the suffix .ss
    • obtained subsystems are organized in a tree-like
      containment hierarchy
    • number of nodes in a subgraph is bounded
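
The ordering, naming, and size bound described above can be combined in a rough sketch. Several things here are assumptions rather than ACDC's actual behaviour: the `max_size` value, the decision to simply skip oversized candidates, the tie-breaking by name, and the helper names are all hypothetical.

```python
def dominated_set(graph, n0):
    """Nodes reachable from n0 that can only be entered via n0."""
    preds = {n: set() for n in graph}
    for src, targets in graph.items():
        for t in targets:
            preds[t].add(src)
    candidates, stack = set(), [n0]
    while stack:
        for t in graph[stack.pop()]:
            if t != n0 and t not in candidates:
                candidates.add(t)
                stack.append(t)
    changed = True
    while changed:
        changed = False
        for n in list(candidates):
            if not preds[n] <= candidates | {n0}:
                candidates.discard(n)
                changed = True
    return candidates

def limited_domination(graph, max_size):
    """Visit nodes by ascending out-degree and keep bounded dominated sets."""
    order = sorted(graph, key=lambda n: (len(graph[n]), n))
    clusters = {}
    for n0 in order:
        members = dominated_set(graph, n0)
        # Assumption of this sketch: oversized candidates are skipped.
        if members and len(members) + 1 <= max_size:
            clusters[n0 + ".ss"] = members | {n0}
    return clusters

# Hypothetical system graph.
graph = {
    "main": {"ui", "db"},
    "ui": {"widget", "log"},
    "widget": {"paint"},
    "paint": set(),
    "db": {"log"},
    "log": set(),
}
skeleton = limited_domination(graph, max_size=3)
print(skeleton)
```

In this example `widget.ss` is a subset of `ui.ss`, which hints at how a containment hierarchy can arise, while the candidate rooted at `main` is rejected by the size bound.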

11
Stage 1: Skeleton Construction
  • creation of the support.ss subsystem
    • final step of the skeleton construction process
    • leftover files previously identified as
      candidates for the support library pattern are
      assigned to this subsystem

12
Stage 2: Orphan Adoption
  • incremental clustering technique
  • assigns all files to some subsystem
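
The slide does not say how an orphan picks its subsystem, so the sketch below substitutes a simple stand-in rule: each unassigned node joins the cluster it shares the most edges with, and nodes with no connections go to a fallback cluster. The rule, the function name `adopt_orphans`, the fallback name `orphans.ss`, and the example data are all assumptions made for illustration.

```python
def adopt_orphans(graph, clusters, fallback="orphans.ss"):
    """Assign every unclustered node to the most connected cluster."""
    owner = {n: name for name, members in clusters.items() for n in members}
    result = {name: set(members) for name, members in clusters.items()}
    for node in sorted(graph):
        if node in owner:
            continue
        votes = {}
        for src, targets in graph.items():
            for dst in targets:
                # Pick the other endpoint of any edge touching the orphan.
                other = dst if src == node else src if dst == node else None
                if other in owner:
                    votes[owner[other]] = votes.get(owner[other], 0) + 1
        # Ties are broken alphabetically by cluster name.
        best = max(sorted(votes), key=votes.get) if votes else fallback
        result.setdefault(best, set()).add(node)
        owner[node] = best
    return result

# Hypothetical graph and skeleton.
graph = {
    "ui": {"widget", "log"},
    "widget": {"paint", "log"},
    "paint": set(),
    "db": {"query"},
    "query": set(),
    "log": set(),
    "readme": set(),
}
skeleton = {"ui.ss": {"ui", "widget", "paint"}, "db.ss": {"db", "query"}}
full = adopt_orphans(graph, skeleton)
print(full)
```

Because orphans are processed one at a time and earlier adoptions influence later votes, the result of this stand-in rule depends on processing order.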

13
ACDC Algorithm properties
  • The algorithm satisfies three essential
    features:
    • effective cluster naming
      • meaningful names are assigned in each step of the
        algorithm
    • bounded cluster cardinality
      • number of nodes in each cluster is limited
    • pattern-driven approach
      • several commonly occurring patterns are used

14
Algorithm validation
  • ACDC has been validated on two large systems:
    • TOBEY
      • took only 54 seconds to cluster
      • obtained skeleton size was 64.3% of the original
        size
    • Linux
      • took only 84 seconds to cluster
      • obtained skeleton size was 51.1% of the original
        size

15
Conclusion
  • ACDC aids in program comprehension
  • uses commonly occurring decomposition patterns
  • provides encouraging experimental results