1
Empirical Validation of Three Software Metrics
Suites to Predict Fault-Proneness of
Object-Oriented Classes Developed Using Highly
Iterative or Agile Software Development Processes
EEL 6883 Research Paper Presentation
  • Hector M. Olague, Letha H. Etzkorn, Sampson
    Gholston, Stephen Quattlebaum
  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL.
    33, NO. 6, JUNE 2007

Mustafa Ilhan Akbas Omer Bilal Orhan
2
Introduction
  • OO metrics have been developed to assess design
    quality
  • A measure must be correct both theoretically and
    practically.
  • Empirical validation is necessary to demonstrate
    the usefulness of a metric in practical
    applications

3
Earlier Studies
  • El Emam et al. give a literature survey.
  • Many studies emphasize validation of the CK
    metrics suite; additional metrics have also been
    considered in validation studies.
  • Using OO class metrics as quality predictors may
    be useful when a highly iterative or agile SW
    process is employed.

4
Case Study
  • Empirical validation of OO class metrics
  • Goal: To assess the ability of OO metrics to
    identify fault-prone components in different
    software development environments.
  • New: Agile SW process
  • Questions to be answered:
  • 1. Can the OO metrics suites employed identify
    fault-prone classes in software developed using a
    highly iterative, or agile, development process
    during its initial delivery?
  • 2. Can the OO metrics suites employed identify
    fault-prone classes in multiple, sequential
    releases of software developed using a highly
    iterative or agile process?

5
Test Case
  • Test case: Multiple versions of Rhino.
  • Rhino may be considered an example of the use of
    the agile software development model in open
    source software.
  • New enhancements -> new defects
  • Validated:
  • The Chidamber and Kemerer (CK) metrics
  • Abreu's Metrics for Object-Oriented Design (MOOD)
  • Bansiya and Davis's Quality Metrics for
    Object-Oriented Design (QMOOD)

6
Chidamber and Kemerer's (CK) Metrics
  • Originally 1991, revised in 1994

7
Brito e Abreu's MOOD Metrics
  • A small suite of metrics that could be used to
    evaluate systems designed with an outside-in
    methodology whereby early preparation and
    planning for development are of special
    importance.
  • The metrics that do not depend to a great extent
    on the definitions of functions can be collected
    early in the design phase.
  • The metrics should be easy to compute and should
    have a formal definition not tied to any
    particular OO language, should use consistent
    units, and should result in numbers independent
    of the system size.
  • All of the metrics result in a probability value
    between 0 and 1.
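Since every MOOD metric is a system-wide ratio in [0, 1], each can be sketched as a count of hidden or inherited members over a total count. A minimal illustration of one such metric, the Method Hiding Factor (MHF), on hypothetical per-class counts (not data from the paper):

```python
# MOOD metrics are system-wide ratios in [0, 1]. Sketch of one of them,
# the Method Hiding Factor (MHF): hidden (non-public) methods over all
# methods. The per-class counts below are hypothetical.
def method_hiding_factor(classes):
    """classes: iterable of (hidden_methods, total_methods) per class."""
    hidden = sum(h for h, _ in classes)
    total = sum(t for _, t in classes)
    return hidden / total if total else 0.0

system = [(2, 5), (1, 4), (3, 3)]   # (hidden, total) for three classes
mhf = method_hiding_factor(system)  # 6 hidden of 12 methods -> 0.5
```

Because the result is always a proportion, values are comparable across systems regardless of size, matching the design goals listed above.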

8
Bansiya and Davis's Quality Model for
Object-Oriented Design (QMOOD) Metrics
  • The QMOOD metrics calculated on a system can be
    used to compute a kind of supermetric, the Total
    Quality Index.
  • QMOOD metrics are defined to be computable early
    in the design process.
  • Bansiya and Davis first decided on a set of
    design quality attributes, based loosely on the
    attributes defined in the ISO 9126 standard:
    reusability, flexibility, understandability,
    functionality, extendibility, and effectiveness.
  • Then, they identified a set of object- oriented
    design properties that support the design quality
    attributes.
  • There are 11 of these design properties; for each
    one, Bansiya and Davis identified a metric that
    captures the design property.
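The aggregation idea behind the Total Quality Index can be sketched as follows; the property values and weights here are made-up placeholders for illustration, not Bansiya and Davis's published coefficients:

```python
# Sketch of the QMOOD aggregation: each quality attribute is a weighted
# linear combination of design-property metrics, and the Total Quality
# Index sums the attributes. All values and weights here are made-up
# placeholders, not Bansiya and Davis's published coefficients.
properties = {"abstraction": 0.4, "encapsulation": 0.8,
              "coupling": 0.3, "cohesion": 0.7}

attribute_weights = {
    "reusability": {"coupling": -0.25, "cohesion": 0.25,
                    "encapsulation": 0.5},
    "understandability": {"abstraction": 0.33, "encapsulation": 0.33,
                          "cohesion": 0.33},
}

def attribute_score(weights, props):
    """One quality attribute as a weighted sum of design properties."""
    return sum(w * props[p] for p, w in weights.items())

scores = {name: attribute_score(w, properties)
          for name, w in attribute_weights.items()}
tqi = sum(scores.values())  # the "supermetric" over all attributes
```

Note the negative weight on coupling: a design property can detract from an attribute, which is why the attributes are linear combinations rather than simple averages.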

9
Examined SW Data
  • Mozilla Rhino: open-source implementation of
    JavaScript by Netscape.
  • 6 releases were analyzed.
  • Metrics collection tool used: Software System
    Markup Language (SSML) tool chain
  • A set of tools working together to translate
    source code into an intermediate,
    source-language-independent representation and
    then perform analysis on this representation.
  • One of the tools in the SSML tool chain yielded
    the metrics used in the paper.

10
12 principles by Agile Alliance
  • 1. Early and continuous delivery of SW.
  • 2. Welcome changing requirements, even late.
  • 3. Deliver working SW frequently.
  • 4. Business people and developers must work
    together daily.
  • 5. Build projects around motivated individuals.
  • 6. The most efficient and effective method to
    convey information is face-to-face conversation.
  • 7. Working software is the primary measure of
    progress.
  • 8. Agile processes promote sustainable
    development.
  • 9. Continuous attention to technical excellence
    enhances agility.
  • 10. Simplicity is essential.
  • 11. The best designs emerge from self-organizing
    teams.
  • 12. The team regularly reflects on how to become
    more effective and adjusts its behavior
    accordingly.

Open source: the users/clients of the software are
typically the developers.
Users provide updates continually as the software
is used.
The team performs regression testing and relies
on user feedback.
Users would perform improvements only to their
own work.
The users who need the updates are the teams.
11
Analysis Methods
  • Fault data was collected and analyzed for Rhino.
  • SSML tools were used to collect OO metrics from
    the source code for each version of Rhino

12
Analysis
  • It was checked whether the metrics from the 3
    different suites are related to each other:
  • Do they measure different dimensions of OO class
    quality, or do they measure the same thing?
  • First, a calibration: intercorrelations between
    the CK class metrics were analyzed, and the
    results were compared against previous case
    studies.
  • Then, to determine which metrics can be used as
    fault predictors, a bivariate correlation between
    defects and the individual metrics from the 3
    metrics suites was performed. This is new for the
    QMOOD and MOOD metric suites.

13
Analysis
  • Developed models using the different metrics
    suites to predict faults.
  • The lack of variability in the response variable
    has been the principal motivation in earlier work
    to forgo traditional linear regression techniques
    in favor of logistic regression.
  • The authors examined the distribution of the
    number of defects found in Rhino classes and
    concluded there was good variability in the
    response variable in the 3 later versions of
    Rhino, so they developed multivariate linear
    regression models for those versions.
  • These yielded poor models, so binary logistic
    regression analysis was employed instead to
    develop models to predict faults.
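As a sketch of why logistic regression suits a binary fault label, here is a minimal univariate binary logistic regression (the paper's UBLR step) fit by gradient ascent on synthetic data; this is an illustration of the technique, not the paper's actual model or Rhino data:

```python
import math

# Sketch of UBLR: fit a univariate binary logistic regression of one
# metric against a binary fault label by gradient ascent on the
# log-likelihood. The data below is synthetic, not the Rhino data.
def fit_ublr(x, y, lr=0.1, iters=5000):
    a, b = 0.0, 0.0                # intercept and slope
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))  # P(faulty | metric)
            ga += yi - p           # d(log-likelihood)/d a
            gb += (yi - p) * xi    # d(log-likelihood)/d b
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Synthetic sample: classes with larger metric values fail more often.
metric = [1, 2, 3, 4, 5, 6, 7, 8]
faulty = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_ublr(metric, faulty)
odds_ratio = math.exp(b)  # odds multiplier per unit increase in the metric
```

The fitted odds ratio is one of the measures of association the paper reports for each metric: a value above 1 means higher metric values raise the odds of a class being faulty.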

14
BLR
  1. Performed univariate binary logistic regression
    (UBLR) of metrics versus faults to determine
    which variables were statistically significant
    quality indicators.
  2. Performed a collinearity analysis to determine
    which variables to include in the multivariate
    binary logistic regression (MBLR) models.
  3. Developed three models for the CK metrics, two
    models for the MOOD metrics, and two models for
    the QMOOD metrics. Models are validated using a
    simple holdout method

15
Starting the Results Part
16
Results
  • Comparison of CK to previous studies
  • Bivariate Correlation between defects and models.
  • Logistic Regression Analysis

17
Comparison of CK results in Rhino to Previous
Statistical Studies
  • CK has 6 metrics; these results show the
    intercorrelation between them. (Table 1)
  • WMC: Weighted Methods per Class
  • DIT: Depth of Inheritance Tree
  • NOC: Number of Children
  • CBO: Coupling Between Objects
  • RFC: Response for a Class
  • LCOM: Lack of Cohesion of Methods
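Two of these metrics can be illustrated directly on a small hypothetical Python class hierarchy (an illustration only; counting conventions vary, and here a class whose only base is `object` has DIT 0):

```python
# Illustrative computation of two CK metrics on a small hypothetical
# class hierarchy (single inheritance assumed).
class Node: pass
class Expr(Node): pass
class BinaryExpr(Expr): pass
class Literal(Expr): pass

def dit(cls):
    """Depth of Inheritance Tree: number of user-defined ancestors."""
    # __mro__ is [cls, ..., object]; subtract cls itself and object.
    return len(cls.__mro__) - 2

def noc(cls):
    """Number of Children: count of direct subclasses."""
    return len(cls.__subclasses__())
```

For example, `dit(BinaryExpr)` is 2 (ancestors `Expr` and `Node`) and `noc(Expr)` is 2 (children `BinaryExpr` and `Literal`).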

18
Comparison of CK results
19
Comparison of CK results
          WMC   DIT   RFC   NOC   CBO   LCOM98
WMC      1.00  0.30  0.95  0.15  0.60  0.70
DIT            1.00  0.30  0.10  0.30  0.30
RFC                  1.00  0.15  0.60  0.70
NOC                        1.00  0.20  0.10
CBO                              1.00  0.55
LCOM98                                 1.00
20
Bivariate Correlation Between Defects and Metrics
From Models
  • Similar results were shown for the CK metrics,
    but little similarity for the others.
  • Only 3 versions of Rhino are shown here because
    the number of faults is higher in these versions
    (i.e. 178, 198, 201).

21
Bivariate Correlation Between Defects and Metric
Components
  • CK-RFC and CK-WMC show good positive correlation.
  • QMOOD-DAM shows a consistent negative correlation.
  • The QMOOD-NOM result might be wrong?

22
Logistic Regression Analysis
  • Binary logistic regression analysis was performed
    on the different Rhino versions.
  • First, univariate BLR to determine which metrics
    are good indicators of quality, then multivariate
    BLR on CK and QMOOD.

23
Univariate Binary Logistic RegressionMetrics
versus Faults
  • The measures of association used are:
  • Log-Likelihood (LL)
  • P-value
  • Odds Ratio
  • Test statistic (G)
  • Hosmer-Lemeshow (HL)
  • CK-CBO is significant in 5/6 versions.
  • CK-LCOM98 is significant in 4/6 versions.
  • CK-RFC and QMOOD-CIS are significant in all 6
    versions.
  • MOOD metrics were significant in only 2/6 of the
    versions.

24
Multivariate Binary Logistic Regression
  • Performed a collinearity analysis to determine
    which variables to use in MBLR.
  • There is correlation between 2 variables in every
    metric suite's models. That indicates a potential
    collinearity problem (i.e., there are dependent
    variables in the model).
  • To remove the dependent variables and build a
    model using independent variables, the VIF
    (variance inflation factor) of all possible
    regressors is computed. The ones with a
    multicollinearity problem are removed, and the
    model is reevaluated.
  • The condition number, another indicator of
    multicollinearity, is also examined
    (CN = largest eigenvalue / smallest eigenvalue).
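For a pair of candidate regressors, the VIF screen reduces to a simple formula, sketched here on synthetic metric data; the cutoff of 10 is a common rule of thumb assumed for illustration, not a threshold quoted from the paper:

```python
# Collinearity screen sketch: for two candidate regressors the variance
# inflation factor is VIF = 1 / (1 - r^2), with r the Pearson
# correlation between them. VIF > 10 is a common rule-of-thumb cutoff
# (an assumption here, not from the paper). Data below is synthetic.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def vif_two(x, y):
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)

wmc = [3, 8, 5, 12, 7, 15]      # hypothetical Weighted Methods per Class
rfc = [10, 22, 15, 33, 20, 41]  # hypothetical Response for a Class
high_vif = vif_two(wmc, rfc)    # strongly collinear pair
```

A pair like this (the CK intercorrelation table above shows WMC and RFC correlating at about 0.95) would be flagged, and only one of the two would be kept in the MBLR model.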

25
MBLR Parameter Selection
26
MBLR Parameter SelectionModel1
27
MBLR Parameter SelectionModel2
28
MBLR Parameter SelectionModel3
Similar selections are done for QMOOD and MOOD,
and 2 models from each suite are selected and
created.
29
MBLR RESULTS
  • 3 models using CK, 2 using MOOD and 2 using QMOOD
    is created.
  • Model1 for CK was successful but univariate BLR
    for CK-WMC is also successful. So why use
    Multivariate?
  • Some other parameters give significant result for
    different versions.
  • Models for MOOD were unsuccessful.
  • 2 models of QMOOD were shown significant, and 2
    different parameters were important in 2 models,
    so using multivariate might help.

30
MBLR Model Validation
  • The results are shown separately for small and
    large classes, because the size of a class might
    bias the errors in the code. Also, only the
    concordant (defective classified as defective)
    values are shown.
  • Results:
  • The models are able to classify fault-prone
    classes.
  • There is a general deterioration in the
    effectiveness of the metrics as the software
    progresses through versions.
  • The CK and QMOOD models performed better than the
    MOOD models.

31
MBLR Model Validation
32
MBLR Model Validation
  • Effectiveness also deteriorates faster in
    successive versions for small classes.

33
Conclusions
34
Conclusion
  • The authors conducted a statistical analysis of
    the CK, MOOD, and QMOOD OO class metrics suites
    using six versions of Mozilla's Rhino open source
    software.
  • Primary contribution is using OO metrics to
    predict defects in agile SW.
  • Another contribution is the empirical study of
    the QMOOD and MOOD metrics
  • CK-WMC, CK-RFC, QMOOD-CIS, and QMOOD-NOM are
    consistent predictors of class quality
    (error-proneness).

35
Conclusion
  • The MOOD metrics were not useful as predictors of
    OO class quality.
  • CK metrics suite produced the best three models
    for predicting OO class quality, followed closely
    by one QMOOD model.
  • CK metrics have been shown to be better and more
    reliable predictors of fault-proneness than the
    MOOD or QMOOD metrics.
  • Class size can impact metric performance.
  • There are practical limitations to the
    effectiveness of the metrics over the course of
    several software iterations as the software
    matures and the dynamic nature of the software
    development process subsides.

36
Future Work
  • Complexity-related measures may be effective in
    detecting error-prone classes in highly iterative
    or agile processes
  • Decision trees may be more effective than binary
    logistic regression in detecting error-prone
    classes in highly iterative or agile processes.
  • Various aspects of OO complexity proposed in
    previous studies and implemented in various
    metrics suites may be better predictors of OO
    class quality in highly iterative or agile
    systems.
  • Apply decision trees using the metrics from this
    study.