A Constrained Regression Technique for COCOMO Calibration - PowerPoint PPT Presentation

1
A Constrained Regression Technique for COCOMO
Calibration
  • Presented by
  • Vu Nguyen
  • On behalf of
  • Vu Nguyen, Bert Steece, Barry Boehm
  • {nguyenvu, berts, boehm}@usc.edu

2
Outline
  • Introduction
  • Multiple Linear Regression
  • OLS, Stepwise, Lasso, Ridge
  • Constrained Linear Regression
  • Validation and Comparison
  • COCOMO overview
  • Cross validation
  • Conclusions
  • Limitations
  • Future Work

3
Introduction
  • Building software estimation models is a search
    problem: find the best possible parameters that
  • generate high prediction accuracy
  • satisfy predefined constraints

4
Multiple Linear Regression
  • Multiple linear regression is written as
  • yᵢ = β₀ + β₁xᵢ₁ + … + βₖxᵢₖ + εᵢ,  i = 1, 2, …, n
  • where
  • β₀, β₁, …, βₖ are the coefficients
  • n is the number of observations
  • k is the number of variables
  • xᵢⱼ is the value of the jth variable for the ith
    observation
  • yᵢ is the response of the ith observation

5
Ordinary Least Squares
  • OLS is the most common method for estimating the
    coefficients β₀, β₁, …, βₖ
  • OLS estimates the coefficients by minimizing the
    sum of squared errors (SSE)
  • Minimize SSE = Σᵢ (yᵢ − ŷᵢ)²
  • where ŷᵢ is the estimate of the ith observation
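The OLS criterion can be checked numerically; a minimal sketch in Python (numpy assumed; the data below are made up, not a COCOMO dataset):

```python
import numpy as np

# Made-up data: y = 1.0 + 2.0*x1 + 0.5*x2 plus small noise
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50),
                     rng.uniform(0.0, 1.0, 50),
                     rng.uniform(0.0, 1.0, 50)])
beta_true = np.array([1.0, 2.0, 0.5])
y = X @ beta_true + rng.normal(0.0, 0.01, 50)

# OLS chooses beta to minimize SSE = sum((y - X beta)^2)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = float(np.sum((y - X @ beta_hat) ** 2))
```

With near-noiseless data the fitted coefficients land close to the generating ones; actual COCOMO calibration replaces X and y with the log-transformed project data.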

6
Some Limitations of OLS
  • Highly sensitive to outliers
  • Low bias but high variance (e.g., caused by
    collinearity or overfitting)
  • Unable to constrain the estimates of coefficients
  • Estimated coefficients may be counter-intuitive
  • For example, the OLS coefficient estimate for RUSE
    is negative, i.e., an increase in the RUSE rating
    results in a decrease in estimated effort

(Figure: OLS estimates for the Develop for Reuse (RUSE) cost driver)
7
Some Other Approaches
  • Stepwise (forward selection)
  • Start with no variables and gradually add
    variables until an optimal solution is reached
  • Ridge
  • Minimize SSE and impose a penalty on sum of
    squared coefficients
  • Lasso
  • Minimize SSE and impose a penalty on sum of
    absolute coefficients
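The ridge penalty above has a closed-form solution, which makes the shrinkage effect easy to see. A sketch in Python (numpy assumed; λ and the data are illustrative, and the intercept is penalized here for brevity, which production implementations avoid):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.5, -0.5, 0.0]) + rng.normal(0.0, 0.1, n)

lam = 1.0  # penalty weight on the sum of squared coefficients
# Ridge: minimize SSE + lam * sum(beta_j^2); closed form below
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
# OLS for comparison (lam = 0)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

Any λ > 0 shrinks the coefficient vector relative to OLS, trading a little bias for lower variance; the lasso replaces the squared penalty with absolute values, which has no closed form but tends to zero out coefficients entirely.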

8
Outline
  • Introduction
  • Multiple Linear Regression
  • OLS, Stepwise, Lasso, Ridge
  • Constrained Linear Regression
  • Validation and Comparison
  • COCOMO overview
  • Cross validation
  • Conclusions
  • Limitations
  • Future Work

9
Constrained Regression
  • Principles
  • Use the optimization paradigm: optimize an
    objective function subject to a constraint
  • Minimize f(y, X) subject to f(z) ≤ c
  • Impose constraints on coefficients and relative
    error
  • Expect to reduce variance by reducing the number
    of variables (variance and bias tradeoff)

10
Constrained Regression (cont)
  • General form
  • Minimize the total error subject to constraints on
    the coefficients and a bound c on the MRE of each
    observation
  • Constrained Minimum Sum of Squared Errors (CMSE)
  • Constrained Minimum Sum of Absolute Errors (CMAE)
  • Constrained Minimum Sum of Relative Errors (CMRE)

11
Solve the Equations
  • Solving the equations is an optimization problem
  • CMSE: quadratic programming
  • CMRE and CMAE: transformed into the form of linear
    programming
  • We used the lpsolve and quadprog packages in R
  • The parameter c is determined by cross-validation
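The slides use the R lpsolve and quadprog packages; as a rough stand-in, the CMSE idea with a non-negativity constraint on the coefficients (the max-MRE bound c is omitted here) can be sketched in Python with projected gradient descent on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n),
                     rng.uniform(0.0, 1.0, n),
                     rng.uniform(0.0, 1.0, n)])
# The last driver has no true effect; noise can push its
# unconstrained OLS estimate negative (the RUSE-style anomaly).
y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(0.0, 0.2, n)

# CMSE sketch: minimize SSE subject to beta >= 0
beta = np.zeros(X.shape[1])
step = 1.0 / np.linalg.norm(X.T @ X, 2)  # step below 1/L, L = largest eigenvalue
for _ in range(5000):
    grad = X.T @ (X @ beta - y)                 # gradient of SSE/2
    beta = np.maximum(0.0, beta - step * grad)  # project onto beta >= 0
```

Quadratic programming solves the same problem exactly and also accommodates the per-project MRE bound; this sketch only shows why the constrained estimate can never go negative.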

12
Outline
  • Introduction
  • Multiple Linear Regression
  • OLS, Stepwise, Lasso, Ridge
  • Constrained Linear Regression
  • Validation and Comparison
  • COCOMO overview
  • Cross validation
  • Conclusions
  • Limitations
  • Future Work

13
Validation and Comparison
  • Two COCOMO datasets
  • COCOMO II 2000: 161 projects
  • COCOMO 81: 63 projects
  • Comparing with popular model building approaches
  • OLS
  • Stepwise
  • Lasso
  • Ridge
  • Cross-validation
  • 10-fold cross validation

14
COCOMO
  • Constructive Cost Model (COCOMO) first published
    in 1981
  • Calibrated using 63 projects (COCOMO 81 dataset)
  • Uses SLOC as a size measure and 15 cost drivers
  • COCOMO II published in 2000
  • Reflects changes in technologies and practices
  • Uses 22 cost drivers plus size measure
  • Introduces 5 scale factors
  • Calibrated using 161 data points (COCOMO II
    dataset)

15
COCOMO Overview (cont)
  • The COCOMO effort equation is non-linear
  • Linearize the model using a log-transformation
  • COCOMO 81
  • log(PM) = β₀ + β₁ log(Size) + β₂ log(EM₁) + … +
    β₁₆ log(EM₁₅)
  • COCOMO II
  • log(PM) = β₀ + β₁ log(Size) + Σᵢ βᵢ SFᵢ log(Size) +
    Σⱼ βⱼ log(EMⱼ)
  • Estimate the coefficients using a linear
    regression method
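The equivalence of the two forms is easy to verify numerically. A sketch with invented parameter values (A, B, and the multipliers below are illustrative, not calibrated COCOMO constants):

```python
import math

# Illustrative COCOMO-81-style parameters (not calibrated values)
A, B = 2.8, 1.05
size = 100.0            # KSLOC
em1, em2 = 1.15, 0.87   # two effort multipliers

# Non-linear form: PM = A * Size^B * EM1 * EM2
pm = A * size ** B * em1 * em2

# Log-linear form: log(PM) = log(A) + B*log(Size) + log(EM1) + log(EM2)
log_pm = math.log(A) + B * math.log(size) + math.log(em1) + math.log(em2)
# math.log(pm) and log_pm agree, so the coefficients can be fit linearly
```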

16
Model Accuracy Measures
  • Magnitude of relative error (MRE):
    MREᵢ = |yᵢ − ŷᵢ| / yᵢ
  • Mean of MRE: MMRE = (1/N) Σᵢ MREᵢ
  • Prediction level: PRED(l) = k/N
  • where k is the number of estimates with MRE ≤ l
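These measures are small helper functions in practice; a sketch in Python with made-up actual and estimated efforts:

```python
def mre(actual, estimate):
    """Magnitude of relative error for one project."""
    return abs(actual - estimate) / actual

def mmre(actuals, estimates):
    """Mean of MRE over all projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)

def pred(actuals, estimates, l=0.30):
    """PRED(l): fraction of estimates with MRE <= l."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(1 for e in errors if e <= l) / len(errors)

# Made-up actual vs. estimated effort (person-months)
actuals = [100.0, 50.0, 200.0, 80.0]
estimates = [110.0, 40.0, 190.0, 160.0]
# MREs are 0.10, 0.20, 0.05, 1.00 -> MMRE = 0.3375, PRED(0.30) = 0.75
```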

17
Cross Validation
  • 10-fold cross-validation was used
  • Step 1. Randomly split the dataset into K = 10
    subsets
  • Step 2. For each i = 1, …, 10
  • Remove the ith subset and build the model on the
    remaining data
  • Use the ith subset as the testing set to calculate
    MMREᵢ and PRED(l)ᵢ
  • Step 3. Repeat steps 1 and 2 r = 15 times
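The three steps can be sketched in plain Python (the fit and score callables below are placeholders, not the actual regression):

```python
import random

def cross_validate(data, fit, score, k=10, r=15, seed=0):
    """r repetitions of k-fold cross-validation; returns one score per fold."""
    rng = random.Random(seed)
    scores = []
    for _ in range(r):                       # Step 3: repeat r times
        idx = list(range(len(data)))
        rng.shuffle(idx)                     # Step 1: random split into k folds
        folds = [idx[i::k] for i in range(k)]
        for fold in folds:                   # Step 2: hold out each fold in turn
            held = set(fold)
            train = [data[i] for i in idx if i not in held]
            test = [data[i] for i in fold]
            model = fit(train)               # build the model without the fold
            scores.append(score(model, test))  # evaluate on the held-out fold
    return scores

# Placeholder "model": the training mean, scored by mean absolute error
data = list(range(20))
scores = cross_validate(
    data,
    fit=lambda train: sum(train) / len(train),
    score=lambda m, test: sum(abs(m - x) for x in test) / len(test),
)
```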

18
Non-cross-validation results

COCOMO II dataset (N = 161):

              CMSE         CMAE         CMRE
Max. MRE (c)  MMRE  PRED   MMRE  PRED   MMRE  PRED
infinity      0.23  0.78   0.22  0.81   0.21  0.78
1.2           0.23  0.78   0.21  0.81   0.21  0.80
1.0           0.23  0.76   0.21  0.77   0.21  0.77
0.8           0.23  0.72   0.22  0.73   0.21  0.75
0.6           0.24  0.68   0.25  0.65   0.23  0.70
(OLS: Max MRE = 1.23, PRED = 0.78)

COCOMO 81 dataset (N = 63):

              CMSE         CMAE         CMRE
Max. MRE (c)  MMRE  PRED   MMRE  PRED   MMRE  PRED
infinity      0.30  0.62   0.29  0.65   0.30  0.68
1.2           0.30  0.65   0.28  0.65   0.28  0.62
1.0           0.30  0.62   0.29  0.62   0.29  0.59
0.8           0.30  0.58   0.29  0.60   0.30  0.59

PRED denotes PRED(0.30)
19
Cross-validation Results
(Chart: cross-validation results, COCOMO II dataset)
20
Statistical Significance
  • Results of statistical significance tests on MMRE
    (0.05 significance level)
  • Mann-Whitney U hypothesis test

Hypothesis                                     COCOMO II 2000   COCOMO 81
No significant differences among Lasso,        p > 0.11         p > 0.15
  Ridge, OLS, Stepwise
CMSE outperforms Ridge, OLS                    p > 0.10         p > 0.10
CMSE outperforms Lasso, Stepwise               p < 0.02         p > 0.05
CMAE outperforms Lasso, Ridge, OLS, Stepwise   p < 10⁻³         p < 0.02
CMRE outperforms Lasso, Ridge, OLS, Stepwise   p < 10⁻⁴         p < 10⁻⁴
21
Comparing With Published Results
  • Some of the best published results for the COCOMO
    datasets
  • Bayesian analysis (Boehm et al., 2000)
  • Chen et al., 2005
  • Best cross-validated mean PRED(0.30), in percent

Dataset     Bayesian   Chen et al.   CMRE
COCOMO II   70         NA            75
COCOMO 81   NA         51            56
22
Productivity Range
CMRE:           A = 2.27, B = 0.98
COCOMO II.2000: A = 2.94, B = 0.91
23
Outline
  • Introduction
  • Multiple Linear Regression
  • OLS, Stepwise, Lasso, Ridge
  • Constrained Linear Regression
  • Validation and Comparison
  • COCOMO overview
  • Cross validation
  • Conclusions
  • Limitations
  • Future Work

24
Conclusions
  • The technique imposes constraints on the
    coefficient estimates and on the magnitude of the
    error terms
  • It directly resolves counter-intuitive coefficient
    estimates determined by the data
  • Estimation accuracies are favorable
  • CMRE and CMAE outperform OLS, Stepwise, Ridge,
    Lasso, and CMSE
  • MRE and MAE are favorable objective functions
  • The technique applies not only to COCOMO-like
    models but also to other linear models
  • It offers an alternative for researchers and
    practitioners building estimation models

25
Limitations
  • Because the technique relies on numerical
    optimization, it may return a sub-optimal solution
    rather than the global optimum
  • Multiple solutions may exist for the coefficient
    estimates
  • Only two datasets were investigated; the technique
    might not work as well on other datasets

26
Future Work
  • Validate the technique using other datasets
    (e.g., NASA datasets)
  • Compare results from the technique with others
    such as neural networks and genetic programming
  • Apply and compare with other objective functions
  • MdMRE (median of MRE)
  • z measure (z = estimate/actual)

27
References
  • Boehm et al., 2000. B. Boehm, E. Horowitz, R.
    Madachy, D. Reifer, B. K. Clark, B. Steece, A. W.
    Brown, S. Chulani, and C. Abts, Software Cost
    Estimation with COCOMO II. Prentice Hall, 2000.
  • Chen et al., 2005. Z. Chen, T. Menzies, D. Port,
    and B. Boehm, Finding the right data for software
    cost modeling. IEEE Software, Nov 2005.

28
Thank You
  • Q&A