Transcript and Presenter's Notes

Title: Identifying Reasons for Software Changes Using Historic Databases


1
Identifying Reasons for Software Changes Using
Historic Databases
  • The CISC 864 Analysis
  • By Lionel Marks

2
Purpose of the Paper
  • Using the textual description of a change, try to
    understand why the change was performed
    (Adaptive, Corrective, or Perfective)
  • Observe difficulty, size, and interval across the
    different types of changes

3
Three Different Types of Changes
  • Traditionally, there are three types of changes
    (taken from the ELEC 876 slides)

4
Three Types of Changes in This Paper
  • Adaptive: adding new features wanted by the
    customer (switched with Perfective)
  • Corrective: fixing faults
  • Perfective: restructuring code to accommodate
    future changes (switched with Adaptive)
  • They did not say why they changed these
    definitions

5
The Case Study Company
  • This paper did not divulge the company it used
    for its case study
  • It is an actual business
  • Kept developer names/actions anonymous in the
    study
  • This allowed them to study a real system that has
    lasted for many years, and has a large (and old)
    version control system.

6
Structure of the ECMS
  • The Company's source code control system: ECMS
    (Extended Change Management System)
  • MRs vs. Deltas
  • Each MR could have multiple deltas of changes to
    one file
  • Delta: recorded each time a file was touched (see
    the sketch below)
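
A minimal sketch of the MR/delta relationship in Python; the class names
and fields are illustrative stand-ins, not the actual ECMS schema:

```python
from dataclasses import dataclass, field

@dataclass
class Delta:
    """One touch of one file (illustrative fields, not the ECMS schema)."""
    filename: str
    lines_added: int = 0
    lines_deleted: int = 0

@dataclass
class MR:
    """A change request grouping all deltas made to implement it."""
    mr_id: str
    abstract: str              # the textual description analyzed in the paper
    deltas: list = field(default_factory=list)

mr = MR("MR-1", "fix null pointer in parser", [Delta("parser.c", 5, 2)])
print(len(mr.deltas))  # -> 1
```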

7
The Test System
  • Called System A for anonymity purposes
  • It has:
  • 2M lines of source code
  • 3000 files
  • 100 modules
  • Over the last 10 years:
  • 33,171 MRs
  • An average of 4 deltas each

8
How They Classified Maintenance Activities
(Adaptive, Corrective, Perfective)
  • Suppose you were given this project
  • You have:
  • the CVS repository, and access to the
    descriptions along with commits
  • the goal of labelling each commit as Adaptive,
    Corrective, or Perfective
  • What would you intuitively study in the
    descriptions?

9
How They Classified Maintenance Activities
(Adaptive, Corrective, Perfective)
  • They had a 5-step process:
  • Cleanup and normalization
  • Word frequency analysis
  • Keyword clustering and classification
  • MR abstract classification
  • Repeat the analysis from step 2 on unclassified MR
    abstracts

10
Step 1: Cleanup and Normalization
  • Their approach used WordNet
  • A lexical tool that strips prefixes and suffixes
    to get back to the root word, e.g. fixing and
    fixes both reduce to the root word fix (see the
    sketch below)
  • WordNet also had a synonym feature, but it was
    not used
  • Synonyms would be hard to correlate properly to
    the context of SW maintenance, and could be
    misinterpreted
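
A minimal sketch of this normalization step. The paper used WordNet's
morphological processing directly; going through NLTK's WordNet-backed
lemmatizer here is an assumption for illustration:

```python
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def normalize(words):
    """Reduce words to their root form; change descriptions like
    'fixing'/'fixes' are usually verbs, so lemmatize as verbs."""
    return [lemmatizer.lemmatize(w.lower(), pos="v") for w in words]

print(normalize(["Fixing", "fixes", "fixed"]))  # -> ['fix', 'fix', 'fix']
```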

11
Step 2: Word Frequency Analysis
  • Determine the frequency of a set of words in the
    descriptions (a histogram for each description)
  • What words in the English language would be
    neutral to these classifications and act as noise
    in this experiment? (see the sketch below)
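
A sketch of the per-description histogram with neutral "noise" words
filtered out; the stop-word list here is a guess, not the paper's:

```python
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "for"}  # illustrative

def word_frequencies(description):
    """Histogram of the non-neutral words in one MR abstract."""
    words = re.findall(r"[a-z]+", description.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

print(word_frequencies("Fix the bug in the error handling code"))
# -> Counter({'fix': 1, 'bug': 1, 'error': 1, 'handling': 1, 'code': 1})
```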

12
Step 3: Keyword Clustering
  • Classification was done by reading the
    descriptions of 20 randomly selected changes for
    each selected term in their set, such as
    'cleanup' meaning perfective maintenance. The
    reading was done by humans.
  • If a word matched fewer than 75% of cases, it was
    deemed neutral (see the sketch below)
  • Found that 'rework' was used a lot during code
    inspections (a new classification)
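
A sketch of that 75% validation rule: sample descriptions containing a
candidate keyword, have a human label them, and keep the keyword only if
one class dominates. The function name and data shapes are assumptions:

```python
import random

def validate_keyword(keyword, labeled_mrs, sample_size=20, threshold=0.75):
    """labeled_mrs: list of (abstract, human_label) pairs.
    Returns the dominant label if the keyword is predictive, else None."""
    matches = [(a, lbl) for a, lbl in labeled_mrs if keyword in a.lower()]
    sample = random.sample(matches, min(sample_size, len(matches)))
    if not sample:
        return None                      # keyword never appears
    counts = {}
    for _, lbl in sample:
        counts[lbl] = counts.get(lbl, 0) + 1
    label, n = max(counts.items(), key=lambda kv: kv[1])
    return label if n / len(sample) >= threshold else None  # None = neutral
```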

13
Step 4: MR Classification Rules
  • Like the hard-coded answer used when a learning
    algorithm fails
  • If an inspection word is found, the MR is deemed
    an inspection classification
  • If 'fix', 'bug', 'error', 'fixup', or 'fail' is
    present, the change is corrective
  • If keywords of more than one type are present, the
    dominating frequency wins (see the sketch below)
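
A sketch of these hard-coded rules; the corrective and inspection
keywords come from the slides, while everything else (class names,
extra keyword sets) is a placeholder:

```python
from collections import Counter

KEYWORDS = {
    "corrective": {"fix", "bug", "error", "fixup", "fail"},  # from this slide
    "inspection": {"inspection", "rework"},                  # 'rework' from Step 3
    # other classes would get their own validated keyword sets
}

def classify(freq: Counter):
    """freq: word-frequency histogram for one MR abstract (see Step 2)."""
    # Rule 1: any inspection word forces the inspection classification.
    if any(w in freq for w in KEYWORDS["inspection"]):
        return "inspection"
    # Rule 2: otherwise the class whose keywords occur most often wins.
    scores = {cls: sum(freq[w] for w in words) for cls, words in KEYWORDS.items()}
    cls, score = max(scores.items(), key=lambda kv: kv[1])
    return cls if score > 0 else None    # None = unclassified, see Step 5
```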

14
Step 5: Cycle Back to Step 2
  • Since Step 2 cannot cover the frequency of every
    word in the documents all at once, take some more
    words now
  • Perform more learning and see if new frequent
    terms fit
  • Use the static rules to resolve unclassified
    descriptions (see the sketch below)
  • When all else failed, the change was considered
    corrective
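
A sketch of the overall loop, assuming word_frequencies (Step 2) and
classify (Step 4) are defined as in the earlier sketches; in the real
process the keyword sets would grow between rounds as new frequent
terms are validated:

```python
def classify_all(abstracts, max_rounds=3):
    labels = {}
    unclassified = list(abstracts)
    for _ in range(max_rounds):
        if not unclassified:
            break
        remaining = []
        for a in unclassified:           # re-apply the rules to what is left
            label = classify(word_frequencies(a))
            if label:
                labels[a] = label
            else:
                remaining.append(a)
        unclassified = remaining
    for a in unclassified:               # when all else fails: corrective
        labels[a] = "corrective"
    return labels
```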

15
Case Study: Comparison Against Human Classification
  • 20 candidates, 150 MRs
  • More than 61% of the time, the tool and the human
    raters came to the same classification
  • Kappa and ANOVA were used to show the significance
    of the results (see the sketch below)
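
A sketch of measuring tool-versus-human agreement; using scikit-learn's
cohen_kappa_score is an assumption for illustration, not the paper's
tooling:

```python
from sklearn.metrics import cohen_kappa_score

human = ["corrective", "adaptive", "corrective", "perfective", "corrective"]
tool  = ["corrective", "adaptive", "perfective", "perfective", "corrective"]

raw = sum(h == t for h, t in zip(human, tool)) / len(human)
kappa = cohen_kappa_score(human, tool)   # agreement corrected for chance
print(f"raw agreement: {raw:.0%}, kappa: {kappa:.2f}")
```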

16
How Purposes Affect Size and Interval
  • Corrective and adaptive changes had the lowest
    change intervals
  • New code development and inspection changes added
    the most lines
  • Inspection changes deleted the most lines
  • The distribution functions differ significantly at
    the 0.01 level. ANOVA showed significance as well,
    but is inappropriate due to the skewed
    distributions (see the sketch below)
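
A sketch of why a nonparametric test suits these skewed size
distributions; the Kruskal-Wallis test and the log-normal toy data here
are stand-ins, not the paper's actual test or data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Skewed (log-normal) stand-ins for lines added per change type
corrective = rng.lognormal(mean=2.0, sigma=1.0, size=200)
adaptive   = rng.lognormal(mean=2.5, sigma=1.0, size=200)
new_code   = rng.lognormal(mean=3.0, sigma=1.0, size=200)

# Unlike ANOVA, Kruskal-Wallis does not assume normality.
h, p = stats.kruskal(corrective, adaptive, new_code)
print(f"Kruskal-Wallis H={h:.1f}, p={p:.2g}")  # significant if p < 0.01
```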

17
Change Difficulty
  • 20 candidates, 150 MRs
  • Goal: to model the difficulty of each MR. Is the
    classification significant?

18
Modeling Difficulty
  • Modeling of Size Deltas ( of files touched)
  • Difficulty changed with number of deltas except
    in corrective and perfective (changes in SW/HW)
    changes
  • Length of time modeled in difficulty as well
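
A minimal sketch of relating reported difficulty to delta count and
change interval with ordinary least squares; the variables, toy data,
and model form are assumptions, not the paper's exact model:

```python
import numpy as np

# Hypothetical per-MR data: [number of deltas, interval in days] -> difficulty 1-5
X = np.array([[2, 3], [8, 14], [1, 1], [12, 30], [4, 7]], dtype=float)
y = np.array([1, 3, 1, 5, 2], dtype=float)

A = np.column_stack([np.ones(len(X)), X])       # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"intercept={coef[0]:.2f}, per-delta={coef[1]:.2f}, per-day={coef[2]:.2f}")
```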

19
Likes and Dislikes of this Paper
  • Likes
  • The algorithm used to make classifications: a good
    way to break down the problem
  • The accumulation graphs were interesting
  • Their use of a real company is also a breath of
    fresh air: real data!
  • Dislikes
  • Asking developers months after the work how hard
    the changes were. There is no better way at the
    moment, but results can be skewed by time.
  • Because the real company was kept anonymous, the
    product comparison in the paper was less
    interesting