Title: A Constrained Regression Technique for COCOMO Calibration
1. A Constrained Regression Technique for COCOMO Calibration
- Presented by
- Vu Nguyen
- On behalf of
- Vu Nguyen, Bert Steece, Barry Boehm
- {nguyenvu, berts, boehm}@usc.edu
2. Outline
- Introduction
- Multiple Linear Regression
- OLS, Stepwise, Lasso, Ridge
- Constrained Linear Regression
- Validation and Comparison
- COCOMO overview
- Cross validation
- Conclusions
- Limitations
- Future Work
3. Introduction
- Building software estimation models is a search problem: find the best possible parameters that
- generate high prediction accuracy
- satisfy predefined constraints
4. Multiple Linear Regression
- Multiple linear regression is presented as
- yi = β0 + β1·xi1 + ... + βk·xik + εi,  i = 1, 2, ..., n
- Where,
- β0, β1, ..., βk are the coefficients
- n is the number of observations
- k is the number of variables
- xij is the value of the jth variable for the ith observation
- yi is the response of the ith observation
5. Ordinary Least Squares
- OLS is the most common method to estimate coefficients β0, β1, ..., βk
- OLS estimates coefficients by minimizing the sum of squared errors (SSE)
- Minimize SSE = Σi (yi − ŷi)²
- ŷi is the estimate of the ith observation
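As a sketch of the OLS step (hypothetical toy data, numpy only; names and values are illustrative, not from the COCOMO datasets):

```python
import numpy as np

# Hypothetical toy data: n = 30 observations, k = 2 predictor variables.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])  # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=30)            # response with small noise

# OLS chooses the coefficients that minimize SSE = sum_i (y_i - yhat_i)^2
beta_hat, residual_sse, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to beta_true
```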
6. Some Limitations of OLS
- Highly sensitive to outliers
- Low bias but high variance (e.g., caused by collinearity or overfitting)
- Unable to constrain the estimates of coefficients
- Estimated coefficients may be counter-intuitive
- Example: the OLS coefficient estimate for RUSE is negative, i.e., increasing the RUSE rating results in a decrease in estimated effort
[Figure: OLS coefficient estimates for Develop for Reuse (RUSE)]
7. Some Other Approaches
- Stepwise (forward selection)
- Start with no variables and gradually add variables until an optimal solution is achieved
- Ridge
- Minimize SSE and impose a penalty on the sum of squared coefficients
- Lasso
- Minimize SSE and impose a penalty on the sum of absolute coefficients
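A minimal numpy sketch of the ridge idea on synthetic, nearly collinear data (all names and values here are illustrative). Penalizing the sum of squared coefficients shrinks the estimates relative to OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear predictor -> OLS variance blows up
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)

lam = 1.0  # weight of the penalty on the sum of squared coefficients
# Ridge: minimize SSE + lam * ||beta||^2, which has the closed form (X'X + lam*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# The ridge coefficient vector never has a larger norm than the OLS one
print(np.linalg.norm(beta_ridge), np.linalg.norm(beta_ols))
```

Lasso replaces the squared-coefficient penalty with a penalty on Σ|βj|; it has no closed form and needs an iterative solver, so it is omitted from this sketch.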
8. Outline
- Introduction
- Multiple Linear Regression
- OLS, Stepwise, Lasso, Ridge
- Constrained Linear Regression
- Validation
- COCOMO overview
- Cross validation
- Conclusions
- Limitations
- Future Work
9. Constrained Regression
- Principles
- Use the optimization paradigm: optimize an objective function subject to constraints
- Minimize f(y, X) subject to constraints on the solution
- Impose constraints on coefficients and relative error
- Expect to reduce variance by reducing the number of variables (variance and bias tradeoff)
10. Constrained Regression (cont)
- General form
- Minimize an error objective subject to constraints on the coefficients (e.g., βj ≥ 0) and a bound on each relative error, |yi − ŷi|/yi ≤ c
- Constrained Minimum Sum of Squared Errors (CMSE)
- Minimize Σi (yi − ŷi)² subject to the constraints
- Constrained Minimum Sum of Absolute Errors (CMAE)
- Minimize Σi |yi − ŷi| subject to the constraints
- Constrained Minimum Sum of Relative Errors (CMRE)
- Minimize Σi |yi − ŷi|/yi subject to the constraints
11. Solve the Equations
- Solving the equations is an optimization problem
- CMSE: quadratic programming
- CMRE and CMAE: transformed to the form of linear programming
- We used the lpsolve and quadprog packages in R
- Determine parameter c using cross-validation
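The slides solve these with lpsolve and quadprog in R. As a hedged illustration of the linear-programming transformation, the CMAE problem with a relative-error bound c can be posed with scipy.optimize.linprog (synthetic data; nonnegative coefficients stand in for the paper's coefficient constraints):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.uniform(0.5, 2.0, (n, k))])  # intercept + positive predictors
beta_true = np.array([1.0, 0.8, 0.5, 0.3])
y = X @ beta_true + rng.normal(0, 0.05, n)   # positive responses with small noise

c_max = 0.3          # bound on each observation's relative error (the "c" parameter)
p = k + 1            # number of coefficients

# Decision variables: [beta_0..beta_k, e_1..e_n]; minimize the sum of absolute errors e_i
cost = np.concatenate([np.zeros(p), np.ones(n)])
I = np.eye(n)
A_ub = np.vstack([
    np.hstack([ X, -I]),                 #  X*beta - e <= y      (e_i >= residual)
    np.hstack([-X, -I]),                 # -X*beta - e <= -y     (e_i >= -residual)
    np.hstack([ X, np.zeros((n, n))]),   #  X*beta <= (1 + c)*y  (relative-error bound, upper)
    np.hstack([-X, np.zeros((n, n))]),   # -X*beta <= -(1 - c)*y (relative-error bound, lower)
])
b_ub = np.concatenate([y, -y, (1 + c_max) * y, -(1 - c_max) * y])
bounds = [(0, None)] * (p + n)           # beta_j >= 0 (example coefficient constraint), e_i >= 0

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
beta_hat = res.x[:p]
print(beta_hat)
```

At the optimum each e_i equals the absolute residual, so the objective is exactly the sum of absolute errors; CMRE follows the same pattern with each row scaled by 1/yi.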
12. Outline
- Introduction
- Multiple Linear Regression
- OLS, Stepwise, Lasso, Ridge
- Constrained Linear Regression
- Validation and comparison
- COCOMO overview
- Cross validation
- Conclusions
- Limitations
- Future Work
13. Validation and Comparison
- Two COCOMO datasets
- COCOMO II.2000: 161 projects
- COCOMO 81: 63 projects
- Comparing with popular model-building approaches
- OLS
- Stepwise
- Lasso
- Ridge
- Cross-validation
- 10-fold cross validation
14. COCOMO
- Constructive Cost Model (COCOMO) first published in 1981
- Calibrated using 63 projects (COCOMO 81 dataset)
- Uses SLOC as a size measure and 15 cost drivers
- COCOMO II published in 2000
- Reflects changes in technologies and practices
- Uses 22 cost drivers plus a size measure
- Introduces 5 scale factors
- Calibrated using 161 data points (COCOMO II dataset)
15. COCOMO Overview (cont)
- COCOMO Effort Equation, non-linear
- Linearize the model using log-transformation
- COCOMO 81
- log(PM) = β0 + β1·log(Size) + β2·log(EM1) + ... + β16·log(EM15)
- COCOMO II
- log(PM) = β0 + β1·log(Size) + Σi βi·SFi·log(Size) + Σj βj·log(EMj)
- Estimate coefficients using a linear regression method
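A sketch of the linearization step on synthetic data (the constants and the use of only 3 effort multipliers are illustrative assumptions; COCOMO 81 has 15 and its own calibrated values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 63                                     # same size as the COCOMO 81 dataset (data here is synthetic)
size = rng.uniform(10, 500, n)             # project size in KSLOC
em = rng.uniform(0.7, 1.4, (n, 3))         # 3 effort multipliers for illustration
A, B = 2.8, 1.05                           # illustrative constants, not the published calibration
pm = A * size**B * em.prod(axis=1) * rng.lognormal(0.0, 0.1, n)  # PM = A * Size^B * prod(EM) * noise

# Log-transform: log(PM) = b0 + b1*log(Size) + sum_j bj*log(EM_j), then fit by linear regression
Xlog = np.column_stack([np.ones(n), np.log(size), np.log(em)])
coef = np.linalg.lstsq(Xlog, np.log(pm), rcond=None)[0]
A_hat, B_hat = np.exp(coef[0]), coef[1]    # back-transform the intercept to recover A
print(A_hat, B_hat)
```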
16. Model Accuracy Measures
- Magnitude of relative error (MRE): MREi = |yi − ŷi| / yi
- Mean of MRE (MMRE): MMRE = (1/N) Σi MREi
- Prediction level: PRED(l) = k/N
- Where k is the number of estimates with MRE ≤ l
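These measures compute directly from actual and estimated effort; a minimal sketch with hypothetical values:

```python
import numpy as np

def mmre_pred(actual, estimate, level=0.30):
    """MRE_i = |actual_i - estimate_i| / actual_i; PRED(l) = fraction of estimates with MRE <= l."""
    mre = np.abs(actual - estimate) / actual
    return mre.mean(), np.mean(mre <= level)

actual = np.array([100.0, 200.0, 50.0, 400.0])     # hypothetical actual effort (PM)
estimate = np.array([110.0, 150.0, 55.0, 380.0])   # hypothetical model estimates
mmre, pred = mmre_pred(actual, estimate)
print(mmre, pred)  # MREs are 0.10, 0.25, 0.10, 0.05 -> MMRE 0.125, PRED(0.30) 1.0
```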
17. Cross Validation
- 10-fold cross validation was used
- Step 1. Randomly split the dataset into K = 10 subsets
- Step 2. For each i = 1 ... 10
- Remove the ith subset and build the model on the rest
- Use the ith subset as the testing set to calculate MMREi and PRED(l)i
- Step 3. Repeat steps 1 and 2 r = 15 times
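The steps above can be sketched as follows, with OLS standing in for the model builder (any of the constrained fits would be swapped in) and synthetic data:

```python
import numpy as np

def repeated_kfold_mmre(X, y, k=10, repeats=15, seed=0):
    """Repeated k-fold CV: shuffle, split into k folds, fit on k-1 folds, score MMRE on the held-out fold."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(repeats):                               # Step 3: repeat r times
        folds = np.array_split(rng.permutation(n), k)      # Step 1: random split into K subsets
        for test_idx in folds:                             # Step 2: hold out each subset in turn
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            beta = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)[0]
            pred = X[test_idx] @ beta
            scores.append(np.mean(np.abs(y[test_idx] - pred) / y[test_idx]))  # fold MMRE
    return float(np.mean(scores))

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(100), rng.uniform(1.0, 2.0, (100, 2))])
y = X @ np.array([5.0, 1.0, 1.0]) + rng.normal(scale=0.1, size=100)
print(repeated_kfold_mmre(X, y))
```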
18. Non-cross-validation Results

COCOMO II dataset (N = 161):
Max. MRE (c)   CMSE MMRE  CMSE PRED  CMAE MMRE  CMAE PRED  CMRE MMRE  CMRE PRED
infinity       0.23       0.78       0.22       0.81       0.21       0.78
1.2            0.23       0.78       0.21       0.81       0.21       0.80
1.0            0.23       0.76       0.21       0.77       0.21       0.77
0.8            0.23       0.72       0.22       0.73       0.21       0.75
0.6            0.24       0.68       0.25       0.65       0.23       0.70
OLS: Max MRE = 1.23, PRED = 0.78

COCOMO 81 dataset (N = 63):
Max. MRE (c)   CMSE MMRE  CMSE PRED  CMAE MMRE  CMAE PRED  CMRE MMRE  CMRE PRED
infinity       0.30       0.62       0.29       0.65       0.30       0.68
1.2            0.30       0.65       0.28       0.65       0.28       0.62
1.0            0.30       0.62       0.29       0.62       0.29       0.59
0.8            0.30       0.58       0.29       0.60       0.30       0.59

PRED denotes PRED(0.30)
19. Cross-validation Results
[Figure: cross-validation results, COCOMO II dataset]
20. Statistical Significance
- Results of statistical significance tests on MMRE (0.05 significance level)
- Mann-Whitney U hypothesis test

Test                                              COCOMO II.2000   COCOMO 81
No significant differences among Lasso,           p > 0.11         p > 0.15
Ridge, OLS, Stepwise
CMSE outperforms Ridge, OLS                       p > 0.10         p > 0.10
CMSE outperforms Lasso, Stepwise                  p < 0.02         p > 0.05
CMAE outperforms Lasso, Ridge, OLS, Stepwise      p < 10^-3        p < 0.02
CMRE outperforms Lasso, Ridge, OLS, Stepwise      p < 10^-4        p < 10^-4
21. Comparing With Published Results
- Some of the best published results for the COCOMO datasets
- Bayesian analysis (Boehm et al., 2000)
- Chen et al., 2005
- Best cross-validated mean PRED(0.30):

Dataset     Bayesian  Chen et al.  CMRE
COCOMO II   70        N/A          75
COCOMO 81   N/A       51           56
22. Productivity Range
- CMRE: A = 2.27, B = 0.98
- COCOMO II.2000: A = 2.94, B = 0.91
23. Outline
- Introduction
- Multiple Linear Regression
- OLS, Stepwise, Lasso, Ridge
- Constrained Linear Regression
- Validation and comparison
- COCOMO overview
- Cross validation
- Conclusions
- Limitations
- Future Work
24. Conclusions
- The technique imposes constraints on the estimates of coefficients and on the magnitude of the error terms
- Directly resolves counter-intuitive coefficient estimates determined by the data
- Estimation accuracies are favorable
- CMRE and CMAE outperform OLS, Stepwise, Ridge, Lasso, and CMSE
- MRE and MAE are favorable objective functions
- The technique can be applied not only to COCOMO-like models but also to other linear models
- An alternative for researchers and practitioners to build models
25. Limitations
- As the technique relies on optimization, a sub-optimal solution may be returned instead of the global optimum
- Multiple solutions may exist for the estimates of coefficients
- Only two datasets were investigated; the technique might not work well on other datasets
26. Future Work
- Validate the technique using other datasets (e.g., NASA datasets)
- Compare results from the technique with others such as neural networks and genetic programming
- Apply and compare with other objective functions
- MdMRE (median of MRE)
- z measure (z = estimate/actual)
27. References
- Boehm et al., 2000. B. Boehm, E. Horowitz, R. Madachy, D. Reifer, B. K. Clark, B. Steece, A. W. Brown, S. Chulani, and C. Abts, Software Cost Estimation with COCOMO II. Prentice Hall, 2000.
- Chen et al., 2005. Z. Chen, T. Menzies, D. Port, and B. Boehm, "Finding the right data for software cost modeling," IEEE Software, Nov 2005.
28. Thank You