1
Empirical Validation of Three Software Metrics
Suites to Predict Fault-Proneness of
Object-Oriented Classes Developed Using Highly
Iterative or Agile Software Development Processes
EEL 6883 Research Paper Presentation
  • Hector M. Olague, Letha H. Etzkorn, Sampson
    Gholston, Stephen Quattlebaum
  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL.
    33, NO. 6, JUNE 2007

Mustafa Ilhan Akbas Omer Bilal Orhan
2
Introduction
  • OO metrics have been developed to assess design
    quality
  • A measure must be correct both theoretically and
    practically.
  • Empirical validation is necessary to demonstrate
    the usefulness of a metric in practical
    applications

3
Earlier Studies
  • El Emam et al. give a literature survey.
  • Many studies emphasize validation of the CK
    metrics suite; additional metrics have also been
    considered in validation studies.
  • Using OO class metrics as quality predictors may
    be useful when a highly iterative or agile SW
    process is employed.

4
Case Study
  • Empirical validation of OO class metrics
  • Goal: To assess the ability of OO metrics to
    identify fault-prone components in different
    software development environments.
  • New: Agile SW process
  • Questions to be answered:
  • 1. Can the OO metrics suites employed identify
    fault-prone classes in software developed using a
    highly iterative, or agile, development process
    during its initial delivery?
  • 2. Can the OO metrics suites employed identify
    fault-prone classes in multiple, sequential
    releases of software developed using a highly
    iterative or agile process?

5
Test Case
  • Test case: Multiple versions of Rhino.
  • Rhino may be considered an example of the use of
    the agile software development model in open
    source software.
  • New enhancements -> new defects
  • Validated:
  • The Chidamber and Kemerer (CK) metrics
  • Abreu's Metrics for Object-Oriented Design (MOOD)
  • Bansiya and Davis's Quality Metrics for
    Object-Oriented Design (QMOOD)

6
Chidamber and Kemerer's (CK) Metrics
  • Originally 1991, revised in 1994

7
Brito e Abreu's MOOD Metrics
  • A small suite of metrics that could be used to
    evaluate systems designed with an outside-in
    methodology whereby early preparation and
    planning for development are of special
    importance.
  • The metrics that do not depend to a great extent
    on the definitions of functions can be collected
    early in the design phase.
  • The metrics should be easy to compute and should
    have a formal definition not tied to any
    particular OO language, should use consistent
    units, and should result in numbers independent
    of the system size.
  • All of the metrics result in a probability value
    between 0 and 1.
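Since every MOOD metric is a system-wide ratio in [0, 1], each can be sketched as a count of hidden or inherited members over a total count. A minimal illustration of one such metric, the Method Hiding Factor (MHF), on hypothetical per-class counts (not data from the paper):

```python
# MOOD metrics are system-wide ratios in [0, 1]. Sketch of one of them,
# the Method Hiding Factor (MHF): hidden (non-public) methods over all
# methods. The per-class counts below are hypothetical.
def method_hiding_factor(classes):
    """classes: iterable of (hidden_methods, total_methods) per class."""
    hidden = sum(h for h, _ in classes)
    total = sum(t for _, t in classes)
    return hidden / total if total else 0.0

system = [(2, 5), (1, 4), (3, 3)]   # (hidden, total) for three classes
mhf = method_hiding_factor(system)  # 6 hidden of 12 methods -> 0.5
```

Because the result is always a proportion, values are comparable across systems regardless of size, matching the design goals listed above.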

8
Bansiya and Davis's Quality Model for
Object-Oriented Design (QMOOD) Metrics
  • The QMOOD metrics calculated on a system can be
    used to compute a kind of supermetric, the Total
    Quality Index.
  • QMOOD metrics are defined to be computable early
    in the design process.
  • Bansiya and Davis first decided on a set of
    design quality attributes, based loosely on the
    attributes defined in the ISO 9126 standard:
    reusability, flexibility, understandability,
    functionality, extendibility, and effectiveness.
  • Then, they identified a set of object- oriented
    design properties that support the design quality
    attributes.
  • There are 11 of these design properties; for each
    one, Bansiya and Davis identified a metric that
    captures the design property.
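The aggregation idea behind the Total Quality Index can be sketched as follows; the property values and weights here are made-up placeholders for illustration, not Bansiya and Davis's published coefficients:

```python
# Sketch of the QMOOD aggregation: each quality attribute is a weighted
# linear combination of design-property metrics, and the Total Quality
# Index sums the attributes. All values and weights here are made-up
# placeholders, not Bansiya and Davis's published coefficients.
properties = {"abstraction": 0.4, "encapsulation": 0.8,
              "coupling": 0.3, "cohesion": 0.7}

attribute_weights = {
    "reusability": {"coupling": -0.25, "cohesion": 0.25,
                    "encapsulation": 0.5},
    "understandability": {"abstraction": 0.33, "encapsulation": 0.33,
                          "cohesion": 0.33},
}

def attribute_score(weights, props):
    """One quality attribute as a weighted sum of design properties."""
    return sum(w * props[p] for p, w in weights.items())

scores = {name: attribute_score(w, properties)
          for name, w in attribute_weights.items()}
tqi = sum(scores.values())  # the "supermetric" over all attributes
```

Note the negative weight on coupling: a design property can detract from an attribute, which is why the attributes are linear combinations rather than simple averages.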

9
Examined SW Data
  • Mozilla Rhino: open-source implementation of
    JavaScript by Netscape.
  • 6 releases were analyzed.
  • Metrics collection tool used: Software System
    Markup Language (SSML) tool chain
  • A set of tools working together to translate
    source code into an intermediate,
    source-language-independent representation and
    then perform analysis on this representation.
  • One of the tools in the SSML tool chain yielded
    the metrics used in the paper.

10
12 principles by Agile Alliance
  • 1. Early and continuous delivery of SW.
  • 2. Welcome changing requirements, even late.
  • 3. Deliver working SW frequently.
  • 4. Business people and developers must work
    together daily.
  • 5. Build projects around motivated individuals.
  • 6. The most efficient and effective method to
    convey information is face-to-face conversation.
  • 7. Working software is the primary measure of
    progress.
  • 8. Agile processes promote sustainable
    development.
  • 9. Continuous attention to technical excellence
    enhances agility.
  • 10. Simplicity is essential.
  • 11. The best designs emerge from self-organizing
    teams.
  • 12. The team regularly reflects on how to become
    more effective and adjusts its behavior
    accordingly.

Open source: the users/clients of the software are
typically the developers.
Users provide updates continually as the software
is used.
The team performs regression testing and relies
on user feedback.
Users would perform improvements only to their
own work.
The users who need the updates are the teams.
11
Analysis Methods
  • Fault data was collected and analyzed for Rhino.
  • SSML tools were used to collect OO metrics from
    the source code for each version of Rhino

12
Analysis
  • It was checked whether the metrics from the 3
    different suites are related to each other:
  • Do they measure different dimensions of OO class
    quality, or do they measure the same thing?
  • First, a calibration: intercorrelations between
    the CK class metrics were analyzed, and the
    results were compared against previous case
    studies.
  • Then, to determine which metrics can be used as
    fault predictors, a bivariate correlation between
    defects and the individual metrics from the 3
    metrics suites was performed. This is new for the
    QMOOD and MOOD metric suites.

13
Analysis
  • Developed models using the different metrics
    suites to predict faults.
  • The lack of variability in the response variable
    has been the principal motivation in earlier work
    to forgo traditional linear regression techniques
    in favor of logistic regression.
  • The authors examined the distribution of the
    number of defects found in Rhino classes and
    concluded there was good variability in the
    response variable in the 3 later versions of
    Rhino, so they developed multivariate linear
    regression models for those versions.
  • These yielded poor models, so binary logistic
    regression analysis was employed instead to
    develop models to predict faults.
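As a sketch of why logistic regression suits a binary fault label, here is a minimal univariate binary logistic regression (the paper's UBLR step) fit by gradient ascent on synthetic data; this is an illustration of the technique, not the paper's actual model or Rhino data:

```python
import math

# Sketch of UBLR: fit a univariate binary logistic regression of one
# metric against a binary fault label by gradient ascent on the
# log-likelihood. The data below is synthetic, not the Rhino data.
def fit_ublr(x, y, lr=0.1, iters=5000):
    a, b = 0.0, 0.0                # intercept and slope
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))  # P(faulty | metric)
            ga += yi - p           # d(log-likelihood)/d a
            gb += (yi - p) * xi    # d(log-likelihood)/d b
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Synthetic sample: classes with larger metric values fail more often.
metric = [1, 2, 3, 4, 5, 6, 7, 8]
faulty = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_ublr(metric, faulty)
odds_ratio = math.exp(b)  # odds multiplier per unit increase in the metric
```

The fitted odds ratio is one of the measures of association the paper reports for each metric: a value above 1 means higher metric values raise the odds of a class being faulty.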

14
BLR
  1. Performed univariate binary logistic regression
    (UBLR) of metrics versus faults to determine
    which variables were statistically significant
    quality indicators.
  2. Performed a collinearity analysis to determine
    which variables to include in the multivariate
    binary logistic regression (MBLR) models.
  3. Developed three models for the CK metrics, two
    models for the MOOD metrics, and two models for
    the QMOOD metrics. Models are validated using a
    simple holdout method

15
Starting the Results Part
16
Results
  • Comparison of CK to previous studies
  • Bivariate Correlation between defects and models.
  • Logistic Regression Analysis

17
Comparison of CK results in Rhino to Previous
Statistical Studies
  • CK has 6 metrics; these results show the
    intercorrelation between them. (Table 1)
  • WMC: Weighted Methods per Class
  • DIT: Depth of Inheritance Tree
  • NOC: Number of Children
  • CBO: Coupling Between Objects
  • RFC: Response for a Class
  • LCOM: Lack of Cohesion of Methods
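Two of these metrics can be illustrated directly on a small hypothetical Python class hierarchy (an illustration only; counting conventions vary, and here a class whose only base is `object` has DIT 0):

```python
# Illustrative computation of two CK metrics on a small hypothetical
# class hierarchy (single inheritance assumed).
class Node: pass
class Expr(Node): pass
class BinaryExpr(Expr): pass
class Literal(Expr): pass

def dit(cls):
    """Depth of Inheritance Tree: number of user-defined ancestors."""
    # __mro__ is [cls, ..., object]; subtract cls itself and object.
    return len(cls.__mro__) - 2

def noc(cls):
    """Number of Children: count of direct subclasses."""
    return len(cls.__subclasses__())
```

For example, `dit(BinaryExpr)` is 2 (ancestors `Expr` and `Node`) and `noc(Expr)` is 2 (children `BinaryExpr` and `Literal`).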

18
Comparison of CK results
19
Comparison of CK results
          WMC   DIT   RFC   NOC   CBO   LCOM98
WMC      1.00  0.30  0.95  0.15  0.60  0.70
DIT            1.00  0.30  0.10  0.30  0.30
RFC                  1.00  0.15  0.60  0.70
NOC                        1.00  0.20  0.10
CBO                              1.00  0.55
LCOM98                                 1.00
20
Bivariate Correlation Between Defects and Metrics
From Models
  • Similar results were shown for the CK metrics,
    but little similarity for the others.
  • Only 3 versions of Rhino are shown here because
    the number of faults is higher in these versions
    (i.e. 178, 198, 201).

21
Bivariate Correlation Between Defects and Metric
Components
  • CK-RFC and CK-WMC show good positive correlation.
  • QMOOD-DAM shows a consistent negative correlation.
  • The QMOOD-NOM result might be wrong?

22
Logistic Regression Analysis
  • Binary logistic regression analysis was performed
    on the different Rhino versions.
  • First, univariate BLR to determine which metrics
    are good indicators of quality, then multivariate
    BLR on CK and QMOOD.

23
Univariate Binary Logistic RegressionMetrics
versus Faults
  • The measures of association used are:
  • Log-Likelihood (LL)
  • P-value
  • Odds Ratio
  • Test statistic (G)
  • Hosmer-Lemeshow (HL)
  • CK-CBO is significant in 5/6 versions.
  • CK-LCOM98 is significant in 4/6 versions.
  • CK-RFC and QMOOD-CIS are significant in all 6
    versions.
  • MOOD metrics were significant in only 2/6 of the
    versions.

24
Multivariate Binary Logistic Regression
  • Performed a collinearity analysis to determine
    which variables to use in MBLR.
  • There is correlation between 2 variables in every
    metric suite's models. That indicates a potential
    collinearity problem (i.e., there are dependent
    variables in the model).
  • To remove the dependent variables and build a
    model using independent variables, the VIF
    (variance inflation factor) of all possible
    regressors is computed. The ones with a
    multicollinearity problem are removed, and the
    model is reevaluated.
  • The condition number, another indicator of
    multicollinearity, is also examined
    (CN = largest eigenvalue / smallest eigenvalue).
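For a pair of candidate regressors, the VIF screen reduces to a simple formula, sketched here on synthetic metric data; the cutoff of 10 is a common rule of thumb assumed for illustration, not a threshold quoted from the paper:

```python
# Collinearity screen sketch: for two candidate regressors the variance
# inflation factor is VIF = 1 / (1 - r^2), with r the Pearson
# correlation between them. VIF > 10 is a common rule-of-thumb cutoff
# (an assumption here, not from the paper). Data below is synthetic.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def vif_two(x, y):
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)

wmc = [3, 8, 5, 12, 7, 15]      # hypothetical Weighted Methods per Class
rfc = [10, 22, 15, 33, 20, 41]  # hypothetical Response for a Class
high_vif = vif_two(wmc, rfc)    # strongly collinear pair
```

A pair like this (the CK intercorrelation table above shows WMC and RFC correlating at about 0.95) would be flagged, and only one of the two would be kept in the MBLR model.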

25
MBLR Parameter Selection
26
MBLR Parameter SelectionModel1
27
MBLR Parameter SelectionModel2
28
MBLR Parameter SelectionModel3
Similar selections are done for QMOOD and MOOD,
and 2 models from each suite are selected and
created.
29
MBLR RESULTS
  • 3 models using CK, 2 using MOOD and 2 using QMOOD
    is created.
  • Model1 for CK was successful but univariate BLR
    for CK-WMC is also successful. So why use
    Multivariate?
  • Some other parameters give significant result for
    different versions.
  • Models for MOOD were unsuccessful.
  • 2 models of QMOOD were shown significant, and 2
    different parameters were important in 2 models,
    so using multivariate might help.

30
MBLR Model Validation
  • The results are shown separately for small and
    large classes, because the size of a class might
    bias the errors in the code. Also, only the
    concordant (defective classified as defective)
    values are shown.
  • Results:
  • The models are able to classify fault-prone
    classes.
  • There is a general deterioration in the
    effectiveness of the metrics as the software
    progresses through versions.
  • The CK and QMOOD models performed better than the
    MOOD models.

31
MBLR Model Validation
32
MBLR Model Validation
  • Effectiveness also deteriorates faster in
    successive versions for small classes.

33
Conclusions
34
Conclusion
  • The authors conducted a statistical analysis of
    the CK, MOOD, and QMOOD OO class metrics suites
    using six versions of Mozilla's Rhino open source
    software.
  • Primary contribution is using OO metrics to
    predict defects in agile SW.
  • Another contribution is the empirical study of
    the QMOOD and MOOD metrics
  • CK-WMC, CK-RFC, QMOOD-CIS, and QMOOD-NOM are
    consistent predictors of class quality
    (error-proneness).

35
Conclusion
  • The MOOD metrics were not useful as predictors of
    OO class quality.
  • CK metrics suite produced the best three models
    for predicting OO class quality, followed closely
    by one QMOOD model.
  • CK metrics have been shown to be better and more
    reliable predictors of fault-proneness than the
    MOOD or QMOOD metrics.
  • Class size can impact metric performance.
  • There are practical limitations to the
    effectiveness of the metrics over the course of
    several software iterations as the software
    matures and the dynamic nature of the software
    development process subsides.

36
Future Work
  • Complexity-related measures may be effective in
    detecting error-prone classes in highly iterative
    or agile processes
  • Decision trees may be more effective than binary
    logistic regression in detecting error-prone
    classes in highly iterative or agile processes.
  • Various aspects of OO complexity proposed in
    previous studies and implemented in various
    metrics suites may be better predictors of OO
    class quality in highly iterative or agile
    systems.
  • Apply decision trees using the metrics from this
    study.