How to do successful research in software evolution
1
How to do successful research in software
evolution
  • Michael W. Godfrey
  • Software Architecture Group (SWAG)
  • University of Waterloo

2
A general approach
  • OK, it's really just our research group's way to
    do successful research in software evolution
  • A three stage tool-based pipeline
  • Extract
  • Abstract
  • Navigate, query, explore

3
A general approach
[Pipeline diagram: Source artifacts → Extract raw facts (automated) → Abstract to desired meta-model (automated) → Simplified data → Exploration / navigation / visualization (semi-automated)]
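The three-stage pipeline above can be sketched as plain function composition. The stage functions and the fact format below are illustrative placeholders, not SWAG's actual tools:

```python
# Hypothetical three-stage analysis pipeline: extract -> abstract -> explore.
# Stage names and the (entity, relation, entity) fact format are invented.

def extract(source_artifacts):
    """Automated: pull raw facts (entities and relations) from source."""
    # A real extractor would parse the artifacts; we return canned facts.
    return [("main.c", "calls", "init.c"), ("main.c", "calls", "util.c")]

def abstract(raw_facts, relation="calls"):
    """Automated: lift raw facts into the desired meta-model."""
    return [f for f in raw_facts if f[1] == relation]

def explore(simplified_data):
    """Semi-automated: the human navigates / queries the simplified data."""
    return {target for (_, _, target) in simplified_data}

facts = extract(["main.c", "init.c", "util.c"])
print(explore(abstract(facts)))  # the set of call targets of main.c
```

The point of the sketch is the shape: each stage narrows the data toward something a person can actually browse.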
4
(No Transcript)
5
(No Transcript)
6
Four interesting ways in which history can
teach us about software
  • Michael W. Godfrey
  • Xinyi Dong
  • Cory Kapser
  • Lijie Zou
  • Software Architecture Group (SWAG)
  • University of Waterloo

7
Longitudinal case studies of growth and evolution
  • Studied several OSSs, esp. Linux kernel
  • Looked for evolutionary narratives to explain
    observable historical phenomena
  • Methodology
  • Analyze individual tarball versions
  • Build hierarchical metrics data model
  • Generate graphs, look for interesting lumps under
    the carpet, try to answer why
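The per-version metrics step above can be sketched as follows; the measurement tuples are invented for illustration, not real Linux kernel numbers:

```python
# Sketch: build per-version, per-subsystem LOC metrics from tarball
# analyses, then look for "interesting lumps" (unusual growth).
from collections import defaultdict

measurements = [                     # (version, subsystem, LOC) -- invented
    ("2.0", "drivers", 200_000), ("2.0", "kernel", 50_000),
    ("2.2", "drivers", 450_000), ("2.2", "kernel", 60_000),
]

by_version = defaultdict(dict)
for version, subsystem, loc in measurements:
    by_version[version][subsystem] = loc

def growth(v_old, v_new):
    """Relative LOC growth of each subsystem between two versions."""
    return {s: (by_version[v_new][s] - by_version[v_old][s]) / by_version[v_old][s]
            for s in by_version[v_old]}

print(growth("2.0", "2.2"))  # drivers grew much faster than the core
```

A spike like the drivers subsystem here is exactly the kind of "lump" that prompts the follow-up question: why?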

8
Longitudinal case studies of growth and evolution
Analysis scripts
Source code
Metrics data
Extraction / analysis
MS Excel
Exploration
9
Case studies of origin analysis
  • Reasoning about structural change
  • (moving, renaming, merging, splitting, etc.)
  • Try to reconstruct what happened
  • Formalized several change patterns
  • e.g., service consolidation
  • Methodology
  • Consider consecutive pairs of versions
  • Entity analysis: metrics-based clone detection
  • Relationship analysis: compare relational images
    (calls, called-by, uses, extends, etc.)
  • Create evolutionary record of what happened
  • what evolved from what, and how/why
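The relationship-analysis step can be sketched as matching a vanished entity to its likely origin by comparing relational images. All names and relations below are hypothetical:

```python
# Sketch of origin analysis: a function that disappeared in version N+1
# is matched against new entities by similarity of relational images
# (callers and callees). Data is invented for illustration.

def jaccard(a, b):
    """Set similarity in [0, 1]."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# relational image of each function: (callers, callees)
old = {"fcn_open": ({"main"}, {"lock", "read_hdr"})}
new = {"fcn_open_ex": ({"main"}, {"lock", "read_hdr", "log"}),
       "unrelated":   ({"gui"},  {"draw"})}

def best_origin(old_name):
    oc, oe = old[old_name]
    scores = {n: (jaccard(oc, c) + jaccard(oe, e)) / 2
              for n, (c, e) in new.items()}
    return max(scores, key=scores.get)

print(best_origin("fcn_open"))  # same callers, nearly identical callees
```

A renamed function keeps most of its call neighbourhood, which is why relational images survive the rename even though the name does not.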

10
Case studies of origin analysis
[Pipeline diagram: Source code → (cppx / Understand / Beagle: extraction / analysis) → ER model + metrics data → (Beagle: exploration)]
11
Case studies of code cloning
  • Motivation
  • Lots of research in clone detection, but more on
    algorithms and tools than on case studies and
    comprehension
  • What kinds of cloning are there? Why does
    cloning happen? What kinds are the most/least
    harmful? Do different clone kinds have different
    precision / recall numbers? Different algorithms?
  • Future work: track clone evolution
  • Do related bugs get fixed? Does cloned code have
    more bugs?
  • Methodology
  • Use CCFinder on source to find initial clone
    pairs.
  • Use ctags to map out source files into entity
    regions
  • Consecutive typedefs, fcn prototypes, var defs
  • Individual macros, structs, unions, enums, fcn
    defs
  • Map (abstract up) clone pairs to the source code
    regions
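The "abstract up" step can be sketched as an interval lookup from clone line ranges into ctags-style entity regions. The region-table format here is hypothetical:

```python
# Sketch: map clone pairs (line ranges) up to entity regions, as built
# from ctags output. Region tuples are (name, kind, start, end) -- an
# assumed format, not ctags' actual file format.

regions = [
    ("parse_args", "fcn_def", 10, 40),
    ("options",    "struct",  42, 60),
]

def enclosing_region(line):
    """Return the (name, kind) of the region containing a line, if any."""
    for name, kind, start, end in regions:
        if start <= line <= end:
            return name, kind
    return None

# a clone pair reported by a token-based detector as two line ranges
clone_pair = ((12, 25), (45, 55))
print([enclosing_region(start) for (start, end) in clone_pair])
# one half lands in a function, the other in a struct: a heterogeneous clone
```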

12
Case studies of code cloning
  • Methodology
  • Filter different region kinds according to
    observed heuristics
  • C structs often look alike; parameterized string
    matching returns many more false positives
    without these filters than, say, between
    functions.
  • Sort clones by location
  • Same region, same file, same directory, or
    different directory
  • and entity kind
  • fcn to fcn / structures (enum, union, struct) /
    macro / heterogeneous (different region kinds) /
    misc. clones
  • and even more detailed criteria
  • e.g., function initialization / finalization clones
  • Navigate and investigate using CICS gui, look for
    patterns
  • Cross-subsystem clones seem to vary more over
    time
  • Intra-subsystem clones are usually function clones
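The location-based sorting above is a simple path comparison; the paths below are invented:

```python
# Sketch: classify a clone pair by location, as on the slide --
# same file, same directory, or different directory.
import os.path

def location_kind(path_a, path_b):
    if path_a == path_b:
        return "same file"
    if os.path.dirname(path_a) == os.path.dirname(path_b):
        return "same directory"
    return "different directory"

print(location_kind("drivers/net/a.c", "drivers/net/b.c"))
print(location_kind("drivers/net/a.c", "fs/ext2/inode.c"))
```

("Same region" needs the entity-region mapping as well, so it is omitted here.)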

13
Case studies of code cloning
[Pipeline diagram: Source code → (CCFinder + ctags: extraction / analysis) → (custom filters and sorter) → Taxonomized clone pairs → (CICS GUI: exploration)]
14
Longitudinal case studies of software
manufacturing-related artifacts
  • Q: How much maintenance effort is put into SM
    artifacts, relative to the system as a whole?
  • Studying six OSSs
  • GCC, PostgreSQL, kepler, ant, mycore, midworld
  • All used CVS; we examined their logs
  • We looked for SM artifacts (Makefile, build.xml,
    SConscript) and compared them to non-SM artifacts
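The SM-artifact identification step reduces to a basename check against the names listed above; the change list below is invented:

```python
# Sketch: classify files from CVS logs as software-manufacturing (SM)
# artifacts by basename, using the names given on the slide.
import os.path

SM_NAMES = {"Makefile", "build.xml", "SConscript"}

def is_sm_artifact(path):
    return os.path.basename(path) in SM_NAMES

changed = ["src/Makefile", "src/foo.c", "build.xml", "doc/README"]
sm = [p for p in changed if is_sm_artifact(p)]
print(sm, f"-- {len(sm)}/{len(changed)} of changed files are SM artifacts")
```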

15
Longitudinal case studies of software
manufacturing-related artifacts
  • Some results
  • Between 58% and 81% of the core developers
    contributed changes to SM artifacts
  • SM artifacts were responsible for
  • 3-10% of the number of changes made
  • Up to 20% of the total LOC changed (GCC)
  • Open questions
  • How difficult is it to maintain these artifacts?
  • Do different SM tools require different amounts
    of effort?

16
Longitudinal case studies of software
manufacturing-related artifacts
[Pipeline diagram: CVS repos → (analysis scripts: extraction / analysis) → Metrics data → (MS Excel: exploration)]
17
Dimensions of studies
  • Single version vs. consecutive version pairs vs.
    longitudinal study
  • Coarsely vs. finely grained detail
  • Intermediate representation of artifacts
  • Raw code vs. metrics vs. ER-like semantic model
  • Navigable representation of system architecture:
    auto-abstraction of info at arbitrary levels

18
Challenges in this field
  • Dealing with scale
  • Big-system analysis × many versions
  • Research tools often live at the bleeding edge:
    slow, and they produce voluminous detail
  • Automation
  • Research tools are often buggy and require
    hand-holding
  • Often hard to automate multiple analyses.

19
Challenges in this field
  • Artifact linkage and analysis granularity
  • Repositories (CVS, Unix fs) often store only
    source code, with no special understanding of,
    say, where a particular method resides.
  • (How) should we make them smarter?
  • e.g., ctags and CCFinder
  • Your thoughts?

20
Four interesting ways in which history can
teach us about software
  • Michael W. Godfrey
  • Xinyi Dong
  • Cory Kapser
  • Lijie Zou
  • Software Architecture Group (SWAG)
  • University of Waterloo

21
(No Transcript)
22
Tools that SWAG have written
  • Fact extractors
  • LDX: for object files compiled for Linux [Wu]
  • Recommended for C/C++ systems that can be built
    on Linux
  • CPPX: for gcc-compliant C/C++ systems [Malton /
    Dean]
  • Some features of C++ not yet supported
  • Much slower and less robust than LDX
  • These fact extractors use the TA language for
    output.

23
Tools that SWAG have written
  • Fact manipulators
  • JGrok/QL [Wu]
  • A re-implementation of grok [Holt] in Java
  • Basically, JGrok reads in data stored as sets and
    relations, and allows set/relation operations
    to be performed on them.
  • JGrok has no special knowledge of software
    systems!
  • Can input / output data in the TA language
  • Visualization engine
  • LSEdit [Farmaner / Davis / Synytskyy]
  • A Java application that performs layout and
    visualization of software system facts encoded
    in TA.
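The set/relation operations JGrok provides can be sketched with relations as sets of pairs. This mimics the idea only; it is not JGrok's actual QL syntax:

```python
# Sketch of JGrok-style relational manipulation: relations are sets of
# (source, target) pairs; composition and transitive closure are the
# workhorse operations for questions like "what does main reach?".

def compose(r, s):
    """r o s = {(a, c) | (a, b) in r and (b, c) in s}."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def closure(r):
    """Transitive closure of r (e.g., indirect call relationships)."""
    result = set(r)
    while True:
        extra = compose(result, r) - result
        if not extra:
            return result
        result |= extra

calls = {("main", "parse"), ("parse", "lex")}
print(closure(calls))  # adds the indirect pair ("main", "lex")
```

Because the engine knows nothing about software, the same operations work on any fact base read in from TA.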

24
More on SWAG tools
  • See SWAG's web page for examples and
    documentation
  • http://www.swag.uwaterloo.ca
  • Currently, documents are up-to-date!
  • Ignore Portable Bookshelf (PBS) and Beagle for now