1. Identifying Reasons for Software Changes Using Historic Databases
- The CISC 864 Analysis
- By Lionel Marks
2. Purpose of the Paper
- Using the textual description of a change, try to understand why that change was performed (Adaptive, Corrective, or Perfective)
- Observe difficulty, size, and interval across the different types of changes
3. Three Different Types of Changes
- Traditionally, the three types of changes are defined as follows (taken from the ELEC 876 slides)
4. Three Types of Changes in This Paper
- Adaptive: adding new features wanted by the customer (switched with Perfective)
- Corrective: fixing faults
- Perfective: restructuring code to accommodate future changes (switched with Adaptive)
- The authors did not say why they changed these definitions
5. The Case Study Company
- The paper did not divulge the company used for its case study
- It is an actual business
- Developer names and actions were kept anonymous in the study
- This allowed the authors to study a real system that has lasted for many years and has a large (and old) version control system
6. Structure of the ECMS
- The company's source code control system: the ECMS (Extended Change Management System)
- MRs (Modification Requests) vs. deltas
- Each MR can have multiple deltas of changes to one file
- A delta is recorded each time a file is touched
7. The Test System
- Called System A for anonymity purposes
- Has:
  - 2M lines of source code
  - 3000 files
  - 100 modules
- Over the last 10 years:
  - 33171 MRs
  - An average of 4 deltas each
8. How They Classified Maintenance Activities (Adaptive, Corrective, Perfective)
- If you were given this project, you would have:
  - The CVS repository, and access to the descriptions along with the commits
  - The goal of labelling each commit as Adaptive, Corrective, or Perfective
- What would you intuitively study in the descriptions?
9. How They Classified Maintenance Activities (Adaptive, Corrective, Perfective)
- They had a 5-step process:
  1. Cleanup and normalization
  2. Word frequency analysis
  3. Keyword clustering and classification
  4. MR abstract classification
  5. Repeat the analysis from step 2 on unclassified MR abstracts
10. Step 1: Cleanup and Normalization
- Their approach used WordNet to eliminate prefixes and suffixes and get back to the root word, e.g. "fixing" and "fixes" both reduce to the root word "fix" (see the sketch below)
- WordNet also has a synonym feature, but it was not used
- Synonyms would be hard to correlate properly to the context of software maintenance, and could be misinterpreted
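The slides do not show the authors' actual tooling; a minimal Python sketch of this normalization step, assuming NLTK's WordNet lemmatizer as a stand-in:

```python
# Sketch of Step 1 using NLTK's WordNet lemmatizer as a stand-in for
# the paper's normalization. Requires: nltk.download("wordnet")
import re
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def normalize(abstract: str) -> list[str]:
    """Lowercase an MR abstract, drop punctuation, and reduce each word
    to its root form (e.g. 'fixing' and 'fixes' both become 'fix')."""
    words = re.findall(r"[a-z]+", abstract.lower())
    # Try the verb form first so 'fixing' -> 'fix', then the noun form.
    return [lemmatizer.lemmatize(lemmatizer.lemmatize(w, pos="v"))
            for w in words]

print(normalize("Fixing null pointer; fixes crash on startup"))
# ['fix', 'null', 'pointer', 'fix', 'crash', 'on', 'startup']
```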
11. Step 2: Word Frequency Analysis
- Determine the frequency of a set of words in the descriptions (a histogram for each description; see the sketch below)
- What words in the English language would be neutral to these classifications and act as noise in this experiment?
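A small sketch of such a per-description histogram, with an illustrative stop-word list (not the paper's) standing in for those classification-neutral words:

```python
# Sketch of Step 2: a word histogram per MR abstract, filtering out a
# few neutral English words. The stop-word list is illustrative only.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "to", "of", "in", "on", "and", "for"}

def word_histogram(tokens: list[str]) -> Counter:
    """Count how often each non-neutral word appears in one abstract."""
    return Counter(t for t in tokens if t not in STOP_WORDS)

print(word_histogram(["fix", "null", "pointer", "fix", "crash", "on", "startup"]))
# Counter({'fix': 2, 'null': 1, 'pointer': 1, 'crash': 1, 'startup': 1})
```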
12. Step 3: Keyword Clustering
- Classification was validated by a human reading the descriptions of 20 randomly selected changes for each selected term in their set, such as "cleanup" meaning perfective maintenance
- If a word matched its class in less than 75% of the cases, it was deemed neutral (see the sketch below)
- They found that "rework" was used a lot during code inspection (a new classification)
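A sketch of that 75% validation rule; the keyword, proposed class, and labels below are hypothetical, and the labels themselves would come from the manual reading, not from code:

```python
# Sketch of the Step 3 validation rule: a candidate keyword keeps its
# proposed class only if human review of ~20 sampled abstracts
# containing it agrees with that class at least 75% of the time.
def validate_keyword(keyword: str, proposed_class: str,
                     human_labels: list[str],
                     threshold: float = 0.75) -> bool:
    """human_labels: one manually assigned class per sampled abstract."""
    agreement = sum(1 for label in human_labels
                    if label == proposed_class) / len(human_labels)
    return agreement >= threshold

# 'cleanup' proposed as perfective: 16 of 20 sampled MRs agree -> kept.
labels = ["perfective"] * 16 + ["corrective"] * 4
print(validate_keyword("cleanup", "perfective", labels))  # True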
13. Step 4: MR Classification Rules
- Like a hard-coded answer for when the learning algorithm fails
- If an inspection word is found, the MR is classified as an inspection change
- If "fix", "bug", "error", "fixup", or "fail" is present, the change is corrective
- If more than one type of keyword is present, the dominating frequency wins (see the sketch below)
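A sketch of these rules; the corrective and inspection keyword lists are taken from the slides, while the adaptive and perfective lists are illustrative placeholders:

```python
# Sketch of the Step 4 rules: inspection keywords fire first, then the
# class whose keywords dominate the histogram wins.
from collections import Counter

KEYWORDS = {
    "inspection": {"inspection", "rework"},
    "corrective": {"fix", "bug", "error", "fixup", "fail"},
    "adaptive":   {"add", "new", "feature"},       # illustrative only
    "perfective": {"cleanup", "restructure"},      # illustrative only
}

def classify(histogram: Counter) -> str | None:
    # Inspection keywords take precedence, per the slides.
    if any(w in histogram for w in KEYWORDS["inspection"]):
        return "inspection"
    # Otherwise the dominating keyword frequency decides the class.
    scores = {cls: sum(histogram[w] for w in words)
              for cls, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None -> unclassified

print(classify(Counter({"fix": 2, "crash": 1})))  # corrective
```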
14. Step 5: Cycle Back to Step 2
- Since step 2 cannot cover the frequency of every word in the documents all at once, take some more words now
- Perform more learning and see if the new frequent terms fit
- Use static rules to resolve unclassified descriptions
- When all else failed, fixes were considered corrective (see the sketch below)
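A sketch of the final pass, reusing the placeholder helpers word_histogram() and classify() from the sketches above; the keyword mining that happens between passes is omitted:

```python
# Sketch of Step 5: after mining more frequent terms and re-running
# steps 2-4 on the leftovers (omitted here), apply the static fallback
# rule from the slides: when all else fails, a mention of a fix makes
# the change corrective.
def label_all(abstracts: list[list[str]]) -> list[str | None]:
    labels = []
    for tokens in abstracts:
        hist = word_histogram(tokens)
        label = classify(hist)
        if label is None and "fix" in hist:
            label = "corrective"  # the "when all else failed" rule
        labels.append(label)
    return labels
```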
15. Case Study: Comparison Against Human Classification
- 20 candidates, 150 MRs
- More than 61% of the time, the tool and the human raters came to the same classification
- Kappa and ANOVA were used to show the significance of the results (a kappa sketch follows below)
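The kappa statistic measures tool-versus-human agreement beyond what chance alone would produce; a minimal sketch of Cohen's kappa on toy labels (not the paper's data):

```python
# Sketch of Cohen's kappa: observed agreement corrected for the
# agreement two raters would reach by chance.
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters pick the same class independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

tool  = ["corrective", "adaptive", "corrective", "perfective"]
human = ["corrective", "adaptive", "adaptive",   "perfective"]
print(round(cohens_kappa(tool, human), 2))  # 0.64
```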
16. How Purposes Affect Size and Interval
- Corrective and adaptive changes had the lowest change intervals
- New code development and inspection changes added the most lines
- Inspection changes deleted the most lines
- The distribution functions are significant at the 0.01 level; ANOVA also indicated significance, but is inappropriate due to the skewed distributions
17. Change Difficulty
- 20 candidates, 150 MRs
- Goal: to model the difficulty of each MR. Is the classification significant?
18. Modeling Difficulty
- Size was modeled in deltas (# of files touched)
- Difficulty changed with the number of deltas, except in corrective and perfective (changes in SW/HW) changes
- Length of time was modeled in difficulty as well
19. Likes and Dislikes of This Paper
- Likes
  - The algorithm used to make the classifications: a good way to break down the problem
  - The accumulation graphs were interesting
  - Their use of a real company is also a breath of fresh air: real data!
- Dislikes
  - Asking developers months after the work how hard the changes were: there is no better way at the moment, but the results can be skewed with time
  - Because a real company was used, the anonymity made the product comparison in the paper less interesting