Title: Supporting Program Indexing and Querying in Source Code Digital Libraries
1Supporting Program Indexing and Querying in Source Code Digital Libraries
- Yuhanis Yusof and Omer F. Rana
- Cardiff University, Wales, UK
2Presentation Outline
- Background
- Software reuse
- Software repositories
- Literature review
- Agent-based architecture in source code DLs
- Agent Interaction
- Program retrieval
- Results
- Conclusions and Future Work
3Background
- Software reuse: creating software systems from existing software
- Software artifacts such as source code, design specifications, software components
- Software repositories
- contain a wealth of valuable information for empirical studies in software engineering
- source control systems store changes to the source code as development progresses
- defect tracking systems follow the resolution of software defects
- as the reuse of existing software artifacts becomes more important, store open-source applications in a DL format
- Requires suitable managing operations such as retrieval and querying
4Background
- Current open source sites: sourceforge.net, freshmeat.net
- Keyword search based on the initial description of the project
- Retrieve the (whole) project: developers need to manually browse the content to locate the required component
- Component-based reuse
- Formal language specifications: Mili et al. (1994), Schumann and Fischer (1997), Penix (1999), Nakkrasae and Sophatsathit (2002)
- Faceted index: Prieto-Diaz (1985)
- Semantic net: Sugumaran and Storey (2003)
5Our Approach
- Target users: developers with the intention to develop reusable software
- Reusable software offers code and design scavenging, software component reuse and concept reuse
- Agents to index and retrieve source code programs based on
- Java program structure
- classes, comments, identifiers, packages and import statements
- Design patterns
- patterns of classes and communicating objects that solve specific design problems
- offers design solutions as reused entities (concept reuse)
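To illustrate indexing by Java program structure, here is a minimal sketch that pulls the package name, import statements, type names and comments out of a source file. The names `StructureIndex` and `StructureExtractor` are hypothetical and do not come from the presentation. The regular expressions are a simplification: they do not handle comment markers inside string literals, and a real indexer would more likely use a Java parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for the structural index terms of one source file. */
final class StructureIndex {
    String packageName = "";                          // empty if there is no package declaration
    final List<String> imports = new ArrayList<>();
    final List<String> classNames = new ArrayList<>();
    final List<String> comments = new ArrayList<>();
}

/** Illustrative regex-based extractor (an assumption, not the authors' implementation). */
final class StructureExtractor {
    private static final Pattern COMMENT =
        Pattern.compile("//[^\\n]*|/\\*.*?\\*/", Pattern.DOTALL);
    private static final Pattern PACKAGE =
        Pattern.compile("(?m)^\\s*package\\s+([\\w.]+)\\s*;");
    private static final Pattern IMPORT =
        Pattern.compile("(?m)^\\s*import\\s+(?:static\\s+)?([\\w.*]+)\\s*;");
    private static final Pattern TYPE =
        Pattern.compile("\\b(?:class|interface)\\s+(\\w+)");

    static StructureIndex extract(String source) {
        StructureIndex index = new StructureIndex();

        // Collect comments first, then blank them out so that a word such as
        // "class" inside a comment is not mistaken for a declaration.
        Matcher c = COMMENT.matcher(source);
        while (c.find()) {
            index.comments.add(c.group().trim());
        }
        String code = COMMENT.matcher(source).replaceAll(" ");

        Matcher p = PACKAGE.matcher(code);
        if (p.find()) {
            index.packageName = p.group(1);
        }
        Matcher i = IMPORT.matcher(code);
        while (i.find()) {
            index.imports.add(i.group(1));
        }
        Matcher t = TYPE.matcher(code);
        while (t.find()) {
            index.classNames.add(t.group(1));
        }
        return index;
    }
}
```

Calling `StructureExtractor.extract(sourceText)` on the text of one file returns the terms that could then be stored in the index for that file.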
6Our Approach
- Higher level of abstraction to represent source code programs
- Levels of reuse, from most to least abstract
- Concept reuse: the reused entity is more abstract and is designed to be configured and adapted for a range of situations
- Component reuse: developers are inevitably constrained by the design decisions of specific reusable entities
- Object/function reuse: requires detailed knowledge of the classes and function modules
7Agent-based Architecture
- Examples of queries
- Natural language (keyword, phrase)
- Singleton registry class
- Program template
public class Registry {
    private Registry() {}
    public static Registry getRegistry() {
        synchronized (classlock) {
            if (registry == null) registry = new Registry();
            return registry;
        }
    }
}
9Agent-based Architecture
- Portions of the agent descriptions for program index building are given as follows
11Agent Interaction: Program retrieval
- Agent roles
- Decompose queries
- Identify relevant information to be extracted from queries
- Identify collaborating classes (design patterns)
- The Index Creation Agent (ICA) consists of 3 agents
- Keyword agent (KEMA)
- Terms in the query are analyzed separately as individual tokens
- Java Template agent (TEMA)
- class name(s)
- file name together with its full path
- method names and signatures
- superclass
- abstract class
- interface class
- Design Pattern agent (DEPA)
- Determines the existence of design patterns (Singleton, Composite, Observer)
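As an illustration of what a design pattern agent might check, the sketch below flags a class as a likely Singleton when it has a private constructor and a static method returning the class's own type. Both the class name `SingletonHeuristic` and the two clues are assumptions made for this example; the slides do not describe DEPA's actual detection rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Singleton detector; the heuristic is an assumption, not DEPA's actual rule set. */
final class SingletonHeuristic {
    private static final Pattern CLASS_NAME = Pattern.compile("\\bclass\\s+(\\w+)");

    static boolean looksLikeSingleton(String source) {
        // Use the first declared class in the file as the candidate.
        Matcher m = CLASS_NAME.matcher(source);
        if (!m.find()) {
            return false;
        }
        String name = Pattern.quote(m.group(1));

        // Clue 1: a private constructor, e.g. "private Registry("
        boolean privateConstructor = Pattern
            .compile("\\bprivate\\s+" + name + "\\s*\\(")
            .matcher(source)
            .find();

        // Clue 2: a static method whose return type is the class itself,
        // e.g. "public static Registry getRegistry("
        boolean staticAccessor = Pattern
            .compile("\\bstatic\\s+(?:synchronized\\s+)?" + name + "\\s+\\w+\\s*\\(")
            .matcher(source)
            .find();

        return privateConstructor && staticAccessor;
    }
}
```

A text-level check like this can be fooled by comments or nested classes, so it is best read as a first filter rather than a definitive test.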
12Results
- Similarity measurements: Precision and Recall
- Precision (fraction of the retrieved programs which are relevant) = retrieved relevant programs / retrieved programs
- Recall (fraction of the relevant programs which have been retrieved) = retrieved relevant programs / relevant programs
- Experiment: 7 Java applications consisting of 477 files
- Retrieval based on the first 50 documents
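The two measures can be computed over the top-k ranked results (k = 50 in the experiment above). The following is a minimal sketch; the class name `RetrievalMetrics` and the use of file names as program identifiers are assumptions for illustration, and the ranked list is assumed to contain no duplicates.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper computing precision and recall at a rank cutoff k. */
final class RetrievalMetrics {

    /** Number of relevant programs among the first k ranked results. */
    static int relevantInTopK(List<String> ranked, Set<String> relevant, int k) {
        int limit = Math.min(k, ranked.size());
        int hits = 0;
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    /** Precision = retrieved relevant programs / retrieved programs. */
    static double precisionAt(List<String> ranked, Set<String> relevant, int k) {
        int retrieved = Math.min(k, ranked.size());
        if (retrieved == 0) {
            return 0.0; // nothing retrieved: define precision as 0
        }
        return relevantInTopK(ranked, relevant, k) / (double) retrieved;
    }

    /** Recall = retrieved relevant programs / relevant programs. */
    static double recallAt(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) {
            return 0.0; // no relevant programs exist: define recall as 0
        }
        return relevantInTopK(ranked, relevant, k) / (double) relevant.size();
    }
}
```

Evaluating both functions at increasing values of k gives the precision values at successive recall levels that are compared across the three patterns.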
13Results
- Both Composite and Observer have higher precision at lower recall values: the top-ranked retrieved programs are relevant, but the system fails to retrieve all relevant programs
- Singleton is superior at higher recall values: it is able to retrieve all relevant programs, but they were ranked at the bottom
14Discussions
- Result analysis: a trade-off between precision and recall. Higher recall produces lower precision.
- Singleton: not many developers implement the Singleton pattern (2 out of 477 files) because it controls the creation of class instances.
- Composite: both precision and recall can be increased if the task agent (design pattern) manages to identify more interface classes.
- Observer: a balanced trade-off between precision and recall, as both measures reached more than 80% by the end of the retrieval
- The system is capable of achieving such results due to the cooperation between task agents, information agents and interface agents.
- Based on the search query and stored programs, appropriate information extraction is undertaken by the agents
15Conclusion and Future Work
- Program structure and design patterns can be used as retrieval methods in source code DLs
- Agent technology has proven to facilitate the process of indexing, querying and retrieving program source code
- decomposing the problem into smaller chunks
- identifying the appropriate information extraction and collaborating classes
- Agent negotiation strategies need to be identified to generate optimum search results