1. Identifying Reasons for Software Changes Using Historic Databases
- The CISC 864 Analysis
- By Lionel Marks
2. Purpose of the Paper
- Using the textual description of a change, try to understand why that change was performed (Adaptive, Corrective, or Perfective)
- Observe difficulty, size, and interval across the different types of changes
3. Three Different Types of Changes
- Traditionally, the three types of changes are defined as follows (taken from the ELEC 876 slides)
4. Three Types of Changes in This Paper
- Adaptive: adding new features wanted by the customer (switched with Perfective)
- Corrective: fixing faults
- Perfective: restructuring code to accommodate future changes (switched with Adaptive)
- The authors did not say why they changed these definitions
5. The Case Study Company
- The paper did not divulge the company used for its case study
- It is an actual business
- Developer names and actions were kept anonymous in the study
- This allowed the authors to study a real system that has lasted for many years and has a large (and old) version control system
6. Structure of the ECMS
- The company's source code control system: the ECMS (Extended Change Management System)
- MRs (Modification Requests) vs. deltas
- Each MR can have multiple deltas of changes to one file
- A delta is recorded each time a file is touched
7. The Test System
- Called System A for anonymity purposes
- Has:
  - 2M lines of source code
  - 3000 files
  - 100 modules
- Over the last 10 years:
  - 33171 MRs
  - An average of 4 deltas each
8. How They Classified Maintenance Activities (Adaptive, Corrective, Perfective)
- If you were given this project, you would have:
  - The CVS repository, and access to the descriptions along with the commits
  - The goal of labelling each commit as Adaptive, Corrective, or Perfective
- What would you intuitively study in the descriptions?
9. How They Classified Maintenance Activities (Adaptive, Corrective, Perfective)
- They had a 5-step process:
  1. Cleanup and normalization
  2. Word frequency analysis
  3. Keyword clustering and classification
  4. MR abstract classification
  5. Repeat the analysis from step 2 on unclassified MR abstracts
10. Step 1: Cleanup and Normalization
- Their approach used WordNet to eliminate prefixes and suffixes and get back to the root word, e.g. "fixing" and "fixes" both reduce to the root word "fix" (see the sketch below)
- WordNet also has a synonym feature, but it was not used
- Synonyms would be hard to correlate properly to the context of software maintenance, and could be misinterpreted
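The slides do not show the authors' actual tooling; a minimal Python sketch of this normalization step, assuming NLTK's WordNet lemmatizer as a stand-in:

```python
# Sketch of Step 1 using NLTK's WordNet lemmatizer as a stand-in for
# the paper's normalization. Requires: nltk.download("wordnet")
import re
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def normalize(abstract: str) -> list[str]:
    """Lowercase an MR abstract, drop punctuation, and reduce each word
    to its root form (e.g. 'fixing' and 'fixes' both become 'fix')."""
    words = re.findall(r"[a-z]+", abstract.lower())
    # Try the verb form first so 'fixing' -> 'fix', then the noun form.
    return [lemmatizer.lemmatize(lemmatizer.lemmatize(w, pos="v"))
            for w in words]

print(normalize("Fixing null pointer; fixes crash on startup"))
# ['fix', 'null', 'pointer', 'fix', 'crash', 'on', 'startup']
```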
11. Step 2: Word Frequency Analysis
- Determine the frequency of a set of words in the descriptions (a histogram for each description; see the sketch below)
- What words in the English language would be neutral to these classifications and act as noise in this experiment?
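A small sketch of such a per-description histogram, with an illustrative stop-word list (not the paper's) standing in for those classification-neutral words:

```python
# Sketch of Step 2: a word histogram per MR abstract, filtering out a
# few neutral English words. The stop-word list is illustrative only.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "to", "of", "in", "on", "and", "for"}

def word_histogram(tokens: list[str]) -> Counter:
    """Count how often each non-neutral word appears in one abstract."""
    return Counter(t for t in tokens if t not in STOP_WORDS)

print(word_histogram(["fix", "null", "pointer", "fix", "crash", "on", "startup"]))
# Counter({'fix': 2, 'null': 1, 'pointer': 1, 'crash': 1, 'startup': 1})
```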
12. Step 3: Keyword Clustering
- Classification was validated by a human reading the descriptions of 20 randomly selected changes for each selected term in their set, such as "cleanup" meaning perfective maintenance
- If a word matched its class in less than 75% of the cases, it was deemed neutral (see the sketch below)
- They found that "rework" was used a lot during code inspection (a new classification)
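A sketch of that 75% validation rule; the keyword, proposed class, and labels below are hypothetical, and the labels themselves would come from the manual reading, not from code:

```python
# Sketch of the Step 3 validation rule: a candidate keyword keeps its
# proposed class only if human review of ~20 sampled abstracts
# containing it agrees with that class at least 75% of the time.
def validate_keyword(keyword: str, proposed_class: str,
                     human_labels: list[str],
                     threshold: float = 0.75) -> bool:
    """human_labels: one manually assigned class per sampled abstract."""
    agreement = sum(1 for label in human_labels
                    if label == proposed_class) / len(human_labels)
    return agreement >= threshold

# 'cleanup' proposed as perfective: 16 of 20 sampled MRs agree -> kept.
labels = ["perfective"] * 16 + ["corrective"] * 4
print(validate_keyword("cleanup", "perfective", labels))  # True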
13. Step 4: MR Classification Rules
- Like a hard-coded answer for when the learning algorithm fails
- If an inspection word is found, the MR is classified as an inspection change
- If "fix", "bug", "error", "fixup", or "fail" is present, the change is corrective
- If more than one type of keyword is present, the dominating frequency wins (see the sketch below)
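A sketch of these rules; the corrective and inspection keyword lists are taken from the slides, while the adaptive and perfective lists are illustrative placeholders:

```python
# Sketch of the Step 4 rules: inspection keywords fire first, then the
# class whose keywords dominate the histogram wins.
from collections import Counter

KEYWORDS = {
    "inspection": {"inspection", "rework"},
    "corrective": {"fix", "bug", "error", "fixup", "fail"},
    "adaptive":   {"add", "new", "feature"},       # illustrative only
    "perfective": {"cleanup", "restructure"},      # illustrative only
}

def classify(histogram: Counter) -> str | None:
    # Inspection keywords take precedence, per the slides.
    if any(w in histogram for w in KEYWORDS["inspection"]):
        return "inspection"
    # Otherwise the dominating keyword frequency decides the class.
    scores = {cls: sum(histogram[w] for w in words)
              for cls, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None -> unclassified

print(classify(Counter({"fix": 2, "crash": 1})))  # corrective
```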
14. Step 5: Cycle Back to Step 2
- Since step 2 cannot cover the frequency of every word in the documents all at once, take some more words now
- Perform more learning and see if the new frequent terms fit
- Use static rules to resolve unclassified descriptions
- When all else failed, fixes were considered corrective (see the sketch below)
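A sketch of the final pass, reusing the placeholder helpers word_histogram() and classify() from the sketches above; the keyword mining that happens between passes is omitted:

```python
# Sketch of Step 5: after mining more frequent terms and re-running
# steps 2-4 on the leftovers (omitted here), apply the static fallback
# rule from the slides: when all else fails, a mention of a fix makes
# the change corrective.
def label_all(abstracts: list[list[str]]) -> list[str | None]:
    labels = []
    for tokens in abstracts:
        hist = word_histogram(tokens)
        label = classify(hist)
        if label is None and "fix" in hist:
            label = "corrective"  # the "when all else failed" rule
        labels.append(label)
    return labels
```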
15. Case Study: Comparison Against Human Classification
- 20 candidates, 150 MRs
- More than 61% of the time, the tool and the human raters came to the same classification
- Kappa and ANOVA were used to show the significance of the results (a kappa sketch follows below)
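The kappa statistic measures tool-versus-human agreement beyond what chance alone would produce; a minimal sketch of Cohen's kappa on toy labels (not the paper's data):

```python
# Sketch of Cohen's kappa: observed agreement corrected for the
# agreement two raters would reach by chance.
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters pick the same class independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

tool  = ["corrective", "adaptive", "corrective", "perfective"]
human = ["corrective", "adaptive", "adaptive",   "perfective"]
print(round(cohens_kappa(tool, human), 2))  # 0.64
```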
16. How Purposes Affect Size and Interval
- Corrective and adaptive changes had the lowest change intervals
- New code development and inspection changes added the most lines
- Inspection changes deleted the most lines
- The distribution functions are significant at the 0.01 level; ANOVA also indicated significance, but is inappropriate due to the skewed distributions
17. Change Difficulty
- 20 candidates, 150 MRs
- Goal: to model the difficulty of each MR. Is the classification significant?
18. Modeling Difficulty
- Size was modeled in deltas (# of files touched)
- Difficulty changed with the number of deltas, except in corrective and perfective (changes in SW/HW) changes
- Length of time was modeled in difficulty as well
19. Likes and Dislikes of This Paper
- Likes
  - The algorithm used to make the classifications: a good way to break down the problem
  - The accumulation graphs were interesting
  - Their use of a real company is also a breath of fresh air: real data!
- Dislikes
  - Asking developers months after the work how hard the changes were: there is no better way at the moment, but the results can be skewed with time
  - Because a real company was used, the anonymity made the product comparison in the paper less interesting