Supporting Program Indexing and Querying in Source Code Digital Libraries

1
Supporting Program Indexing and Querying in
Source Code Digital Libraries
  • Yuhanis Yusof and Omer F. Rana
  • Cardiff University, Wales, UK

2
Presentation Outline
  • Background
  • Software reuse
  • Software repositories
  • Literature review
  • Agent-based architecture in source code DLs
  • Agent Interaction
  • Program retrieval
  • Results
  • Conclusions and Future Work

3
Background
  • Software reuse: creating software systems from
    existing software
  • Software artifacts such as source code, design
    specifications and software components
  • Software repositories
  • contain a wealth of valuable information for
    empirical studies in software engineering
  • source control systems store changes to the
    source code as development progresses
  • defect tracking systems follow the resolution of
    software defects
  • As the reuse of existing software artifacts
    becomes more important, open-source
    applications can be stored in a digital library (DL) format
  • This requires suitable management operations such as
    retrieval and querying

4
Background
  • Current open-source sites: sourceforge.net,
    freshmeat.net
  • Keyword search based on the initial description of
    the project
  • Retrieves the whole project; developers need to
    manually browse its content to locate the required
    component
  • Component-based reuse
  • Formal language specifications: Mili
    et al. (1994), Schumann and Fischer (1997),
    Penix (1999), Nakkrasae and Sophatsathit (2002)
  • Faceted index: Prieto-Diaz (1985)
  • Semantic net: Sugumaran and Storey (2003)

5
Our Approach
  • Target users: developers who intend to
    develop reusable software
  • Reusable software offers code and design
    scavenging, software component reuse and concept
    reuse
  • Agents are used to index and retrieve source code programs
    based on:
  • Java program structure
  • classes, comments, identifiers, packages and
    import statements
  • Design patterns
  • patterns of classes and communicating objects
    that solve specific design problems
  • offers design solutions as reused entities
    (concept reuse)
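To make the program-structure features listed above concrete, here is a minimal sketch of extracting them from one Java file. It is a hypothetical illustration, not the authors' implementation: the class `ProgramFeatures` and its fields are assumed names, it uses regular expressions where a real indexer would more likely use a Java parser, it does not recognise comment markers inside string literals, and Java keywords are kept among the identifiers.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts package, imports, declared type names,
// comments and identifier tokens from the source text of one Java file.
final class ProgramFeatures {
    String packageName = "";
    final List<String> imports = new ArrayList<>();
    final List<String> classNames = new ArrayList<>();
    final List<String> comments = new ArrayList<>();
    final Set<String> identifiers = new LinkedHashSet<>(); // keywords are not filtered out

    // Block comments (including Javadoc) and line comments.
    private static final Pattern COMMENT =
        Pattern.compile("/\\*.*?\\*/|//[^\\n]*", Pattern.DOTALL);
    private static final Pattern PACKAGE =
        Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    private static final Pattern IMPORT =
        Pattern.compile("^\\s*import\\s+(?:static\\s+)?([\\w.*]+)\\s*;", Pattern.MULTILINE);
    private static final Pattern TYPE_DECL =
        Pattern.compile("\\b(?:class|interface|enum)\\s+(\\w+)");
    private static final Pattern IDENTIFIER =
        Pattern.compile("\\b[A-Za-z_$][\\w$]*");

    static ProgramFeatures extract(String source) {
        ProgramFeatures f = new ProgramFeatures();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            f.comments.add(m.group().trim());
        }
        // Blank out comments so their words are not indexed as code.
        String code = COMMENT.matcher(source).replaceAll(" ");
        m = PACKAGE.matcher(code);
        if (m.find()) {
            f.packageName = m.group(1);
        }
        m = IMPORT.matcher(code);
        while (m.find()) {
            f.imports.add(m.group(1));
        }
        m = TYPE_DECL.matcher(code);
        while (m.find()) {
            f.classNames.add(m.group(1));
        }
        m = IDENTIFIER.matcher(code);
        while (m.find()) {
            f.identifiers.add(m.group());
        }
        return f;
    }
}
```

Comment text is kept separately from code identifiers so that, for example, the word "instance" in a comment can be indexed as documentation without being mistaken for a variable name.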

6
Our Approach
  • Higher level of abstraction to represent source
    code programs

  • Concept reuse: the reused entity is more abstract and is
    designed to be configured and adapted for a range of
    situations
  • Component reuse: developers are inevitably constrained by
    design decisions specific to the reusable entities
  • Object/function reuse: requires detailed knowledge of the
    classes and function modules
  • The level of abstraction increases from object/function
    reuse to component reuse to concept reuse

7
Agent-based Architecture
  • Examples of queries
  • Natural language (keyword, phrase)
  • Singleton registry class
  • Program template

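public class Registry {
    private static Registry registry;                        // the single shared instance
    private static final Object classlock = Registry.class;  // lock guarding lazy creation
    private Registry() {}                                    // prevents instantiation from outside
    public static Registry getRegistry() {
        synchronized (classlock) {
            if (registry == null) registry = new Registry();
            return registry;
        }
    }
}
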
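A design-pattern agent has to recognise structures like the Singleton registry template above. The sketch below is one hypothetical structural heuristic (an assumption, not the DEPA's published rules): a class counts as a Singleton candidate if all of its constructors are private, it declares a static field of its own type, and it has a static method returning its own type. The name `SingletonDetector` is illustrative, and a text-level check like this will miss variants such as enum-based singletons.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical structural heuristic for spotting a Singleton in one class's
// source text: only private constructors, a static field of the class's own
// type, and a static method returning that type.
final class SingletonDetector {
    private static final Pattern CLASS_DECL = Pattern.compile("\\bclass\\s+(\\w+)");

    static boolean looksLikeSingleton(String source) {
        Matcher decl = CLASS_DECL.matcher(source);
        if (!decl.find()) {
            return false;
        }
        String name = Pattern.quote(decl.group(1));

        // 1. At least one constructor declaration "Name(...) {", all private.
        //    Requiring "{" after ")" skips calls such as "new Name();".
        Matcher ctor = Pattern.compile(
            "(?:\\b(public|protected|private)\\s+)?\\b" + name + "\\s*\\([^)]*\\)\\s*\\{")
            .matcher(source);
        int constructors = 0;
        while (ctor.find()) {
            constructors++;
            if (!"private".equals(ctor.group(1))) {
                return false;
            }
        }
        if (constructors == 0) {
            return false;
        }

        // 2. A static field whose declared type is the class itself.
        boolean selfField = Pattern.compile(
            "\\bstatic\\s+(?:(?:final|volatile)\\s+)*" + name + "\\s+\\w+\\s*[;=]")
            .matcher(source).find();

        // 3. A static method whose return type is the class itself.
        boolean selfFactory = Pattern.compile(
            "\\bstatic\\s+(?:(?:final|synchronized)\\s+)*" + name + "\\s+\\w+\\s*\\(")
            .matcher(source).find();

        return selfField && selfFactory;
    }
}
```

Applied to the registry template, all three checks succeed; a class with a public constructor is rejected at the first check.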
8
Agent-based Architecture
9
Agent-based Architecture
  • Portions of the agent descriptions for program index
    building are given as follows

10
Agent Interaction: Program Retrieval
11
Agent Interaction: Program Retrieval
  • Agent roles
  • Decompose queries
  • Identify relevant information to be extracted
    from queries
  • Identify collaborating classes (design patterns)
  • The Index Creation Agent (ICA) consists of three
    agents
  • Keyword agent (KEMA)
  • Terms in the query are analyzed separately, each as an
    individual token
  • Java Template agent (TEMA)
  • class name(s)
  • file name together with its full path
  • method names and signatures
  • superclass
  • abstract class
  • interface class
  • Design Pattern agent (DEPA)
  • Determines the existence of design patterns
    (Singleton, Composite, Observer)
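The fields listed for the Java Template agent can be pictured as one index entry per type declaration. The sketch below is hypothetical: `TemplateEntry`, its field names and the regular expressions are assumptions chosen for illustration. It handles a single top-level type, a single-name `extends` clause and return types without spaces (so `Map<K, V>` is missed), and it skips constructors on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical index entry holding the fields listed for the template agent.
final class TemplateEntry {
    final String path;                 // file name together with its full path
    String className = "";
    String superclass = "";
    boolean isAbstract;
    boolean isInterface;
    final List<String> interfaces = new ArrayList<>();
    final List<String> methodSignatures = new ArrayList<>();

    // modifiers, class|interface, name, optional extends, optional implements list
    private static final Pattern TYPE_DECL = Pattern.compile(
        "\\b((?:(?:public|protected|private|abstract|final|static)\\s+)*)"
        + "(class|interface)\\s+(\\w+)"
        + "(?:\\s+extends\\s+([\\w.]+))?"
        + "(?:\\s+implements\\s+([\\w.,\\s]+?))?\\s*\\{");
    // return type, method name, parameter list, then a body "{" or an abstract ";"
    private static final Pattern METHOD = Pattern.compile(
        "\\b([\\w.<>\\[\\]]+)\\s+(\\w+)\\s*\\(([^)]*)\\)\\s*(?:throws\\s+[\\w.,\\s]+?)?[{;]");
    // Words that can precede "name(" without being a return type
    // (this also filters out constructors and "new X();" calls).
    private static final Set<String> NOT_A_TYPE = Set.of(
        "new", "return", "else", "throw", "public", "protected", "private",
        "static", "final", "abstract", "synchronized");

    TemplateEntry(String path) {
        this.path = path;
    }

    static TemplateEntry extract(String path, String source) {
        TemplateEntry e = new TemplateEntry(path);
        Matcher t = TYPE_DECL.matcher(source);
        if (t.find()) {
            e.isAbstract = t.group(1).contains("abstract");
            e.isInterface = t.group(2).equals("interface");
            e.className = t.group(3);
            if (t.group(4) != null) {
                e.superclass = t.group(4);
            }
            if (t.group(5) != null) {
                for (String name : t.group(5).split(",")) {
                    e.interfaces.add(name.trim());
                }
            }
        }
        Matcher m = METHOD.matcher(source);
        while (m.find()) {
            if (NOT_A_TYPE.contains(m.group(1))) {
                continue;
            }
            String params = m.group(3).trim().replaceAll("\\s+", " ");
            e.methodSignatures.add(m.group(1) + " " + m.group(2) + "(" + params + ")");
        }
        return e;
    }
}
```

Because a query written as a program template is itself Java source, the same extraction could in principle be run on the query and on the stored programs, and the resulting entries compared field by field.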

12
Results
  • Evaluation measurements: precision and recall
  • Precision (fraction of the retrieved programs
    that are relevant):
    $\text{Precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}$
  • Recall (fraction of the relevant programs that
    have been retrieved):
    $\text{Recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$
  • Experiment: 7 Java applications comprising
    477 files
  • Retrieval evaluated on the first 50 retrieved documents
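Computed over a ranked result list cut off after the top k programs (the slide's experiment uses the first 50 documents), the two measures reduce to a few counts. This is a minimal sketch; `RetrievalMetrics` and `atCutoff` are illustrative names, and returning 0 when nothing is retrieved or nothing is relevant is a convention assumed here.

```java
import java.util.List;
import java.util.Set;

// Precision and recall of a ranked result list, cut off after the top k items.
final class RetrievalMetrics {
    // Returns {precision, recall} for the first k entries of 'ranked'.
    static double[] atCutoff(List<String> ranked, Set<String> relevant, int k) {
        int retrieved = Math.min(k, ranked.size());
        int hits = 0; // |relevant ∩ retrieved|
        for (int i = 0; i < retrieved; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        double precision = retrieved == 0 ? 0.0 : (double) hits / retrieved;
        double recall = relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
        return new double[] {precision, recall};
    }
}
```

As a hypothetical example: if 4 programs are retrieved, 2 of them relevant, and the collection holds 3 relevant programs, precision is 2/4 = 0.5 and recall is 2/3.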

13
Results
  • Both Composite and Observer have higher precision
    at lower recall values: the top-ranked retrieved
    programs are relevant, but not all relevant
    programs are retrieved
  • Singleton is superior at higher recall values: it is
    able to retrieve all relevant programs, but those
    programs were ranked at the bottom
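The behaviour described above follows from how precision and recall move as one walks down a ranking. The sketch records a (recall, precision) point each time a relevant program is reached, which gives the raw points of a precision-recall curve. `PrecisionRecallCurve` is an illustrative name, and the rankings in the note below are hypothetical, not the experiment's data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Walks a ranked list and records a {recall, precision} point every time a
// relevant program is retrieved.
final class PrecisionRecallCurve {
    static List<double[]> points(List<String> ranked, Set<String> relevant) {
        List<double[]> curve = new ArrayList<>();
        if (relevant.isEmpty()) {
            return curve;
        }
        int hits = 0;
        for (int rank = 1; rank <= ranked.size(); rank++) {
            if (relevant.contains(ranked.get(rank - 1))) {
                hits++;
                double recall = (double) hits / relevant.size();
                double precision = (double) hits / rank;
                curve.add(new double[] {recall, precision});
            }
        }
        return curve;
    }
}
```

With two relevant programs `r1` and `r2`, the ranking `r1, x, y, z` yields a single point (recall 0.5, precision 1.0): high precision, incomplete recall. The ranking `x, y, r1, r2` reaches recall 1.0, but only at precision 0.5, because the relevant programs sit at the bottom.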

14
Discussions
  • Result analysis: there is a trade-off between precision
    and recall; higher recall produces lower
    precision.
  • Singleton: not many developers implement the
    Singleton pattern (2 out of 477 files), because
    it controls the creation of class
    instances.
  • Composite: both precision and recall could be
    increased if the (design pattern) task agent
    managed to identify more interface classes.
  • Observer: balanced trade-off between precision
    and recall, as both reached more than 80%
    by the end of the retrieval
  • The system achieves these results through
    cooperation between task agents,
    information agents and interface agents.
  • Based on the search query and the stored programs,
    the agents undertake the appropriate information
    extraction
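The Composite remark above can be illustrated with a hypothetical detection heuristic (an assumption, not the task agent's actual rules): a class is a Composite candidate if it extends or implements some type T and also holds a collection or array of T. Because the check depends on first identifying T from the class declaration, a composite whose interface is not recognised is simply missed. `CompositeDetector` is an illustrative name; generic supertypes and map-valued fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Composite heuristic: the class extends or implements some type T
// and declares a field holding many T's (a generic collection such as List<T>,
// or an array T[]).
final class CompositeDetector {
    private static final Pattern DECL = Pattern.compile(
        "\\bclass\\s+(\\w+)(?:\\s+extends\\s+(\\w+))?"
        + "(?:\\s+implements\\s+([\\w\\s,]+?))?\\s*\\{");

    static boolean looksLikeComposite(String source) {
        Matcher d = DECL.matcher(source);
        if (!d.find()) {
            return false;
        }
        // Supertypes the detector managed to identify from the declaration.
        List<String> supertypes = new ArrayList<>();
        if (d.group(2) != null) {
            supertypes.add(d.group(2));
        }
        if (d.group(3) != null) {
            for (String s : d.group(3).split(",")) {
                supertypes.add(s.trim());
            }
        }
        for (String t : supertypes) {
            String q = Pattern.quote(t);
            // A field such as "List<Shape> children;" or "Shape[] parts = ...".
            Pattern children = Pattern.compile(
                "(?:<\\s*" + q + "\\s*>|\\b" + q + "\\s*\\[\\s*\\])\\s*\\w+\\s*[;=]");
            if (children.matcher(source).find()) {
                return true;
            }
        }
        return false; // no supertype found, or no child container of that type
    }
}
```

A `Group` that implements `Shape` and holds a `List<Shape>` is flagged, whereas the same class with its `implements Shape` clause missing is not, which is the kind of miss that identifying more interface classes would reduce.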

15
Conclusion and Future Work
  • Program structure and design patterns can be used
    as retrieval methods in source code DLs
  • Agent technology has proven to facilitate the
    process of indexing, querying and retrieving
    program source code by
  • decomposing the problem into smaller chunks
  • identifying the appropriate information
    extraction and collaborating classes
  • Agent negotiation strategies need to be
    identified to generate optimum search results