Title: A Constrained Regression Technique for COCOMO Calibration
1. A Constrained Regression Technique for COCOMO Calibration
- Presented by
- Vu Nguyen
- On behalf of
- Vu Nguyen, Bert Steece, Barry Boehm
- {nguyenvu, berts, boehm}@usc.edu
2. Outline
- Introduction
- Multiple Linear Regression
- OLS, Stepwise, Lasso, Ridge
- Constrained Linear Regression
- Validation and Comparison
- COCOMO overview
- Cross validation
- Conclusions
- Limitations
- Future Work
3. Introduction
- Building software estimation models is a search problem: find the best possible parameters that
- generate high prediction accuracy
- satisfy predefined constraints
4. Multiple Linear Regression
- Multiple linear regression is presented as
- yi = β0 + β1·xi1 + ... + βk·xik + εi,  i = 1, 2, ..., n
- Where,
- β0, β1, ..., βk are the coefficients
- n is the number of observations
- k is the number of variables
- xij is the value of the jth variable for the ith observation
- yi is the response of the ith observation
5. Ordinary Least Squares
- OLS is the most common method to estimate coefficients β0, β1, ..., βk
- OLS estimates coefficients by minimizing the sum of squared errors (SSE)
- Minimize SSE = Σi (yi − ŷi)²
- ŷi is the estimate of the ith observation
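As a sketch of the OLS step (hypothetical toy data, numpy only; names and values are illustrative, not from the COCOMO datasets):

```python
import numpy as np

# Hypothetical toy data: n = 30 observations, k = 2 predictor variables.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])  # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=30)            # response with small noise

# OLS chooses the coefficients that minimize SSE = sum_i (y_i - yhat_i)^2
beta_hat, residual_sse, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to beta_true
```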
6. Some Limitations of OLS
- Highly sensitive to outliers
- Low bias but high variance (e.g., caused by collinearity or overfitting)
- Unable to constrain the estimates of coefficients
- Estimated coefficients may be counter-intuitive
- Example: the OLS coefficient estimate for RUSE is negative, i.e., increasing the RUSE rating results in a decrease in estimated effort
[Figure: OLS coefficient estimates for Develop for Reuse (RUSE)]
7. Some Other Approaches
- Stepwise (forward selection)
- Start with no variables and gradually add variables until an optimal solution is achieved
- Ridge
- Minimize SSE and impose a penalty on the sum of squared coefficients
- Lasso
- Minimize SSE and impose a penalty on the sum of absolute coefficients
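A minimal numpy sketch of the ridge idea on synthetic, nearly collinear data (all names and values here are illustrative). Penalizing the sum of squared coefficients shrinks the estimates relative to OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear predictor -> OLS variance blows up
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)

lam = 1.0  # weight of the penalty on the sum of squared coefficients
# Ridge: minimize SSE + lam * ||beta||^2, which has the closed form (X'X + lam*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# The ridge coefficient vector never has a larger norm than the OLS one
print(np.linalg.norm(beta_ridge), np.linalg.norm(beta_ols))
```

Lasso replaces the squared-coefficient penalty with a penalty on Σ|βj|; it has no closed form and needs an iterative solver, so it is omitted from this sketch.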
8. Outline
- Introduction
- Multiple Linear Regression
- OLS, Stepwise, Lasso, Ridge
- Constrained Linear Regression
- Validation
- COCOMO overview
- Cross validation
- Conclusions
- Limitations
- Future Work
9. Constrained Regression
- Principles
- Use the optimization paradigm: optimize an objective function subject to constraints
- Minimize f(y, X) subject to constraints on the solution
- Impose constraints on coefficients and relative error
- Expect to reduce variance by reducing the number of variables (variance and bias tradeoff)
10. Constrained Regression (cont)
- General form
- Minimize an error objective subject to constraints on the coefficients (e.g., βj ≥ 0) and a bound on each relative error, |yi − ŷi|/yi ≤ c
- Constrained Minimum Sum of Squared Errors (CMSE)
- Minimize Σi (yi − ŷi)² subject to the constraints
- Constrained Minimum Sum of Absolute Errors (CMAE)
- Minimize Σi |yi − ŷi| subject to the constraints
- Constrained Minimum Sum of Relative Errors (CMRE)
- Minimize Σi |yi − ŷi|/yi subject to the constraints
11. Solve the Equations
- Solving the equations is an optimization problem
- CMSE: quadratic programming
- CMRE and CMAE: transformed to the form of linear programming
- We used the lpsolve and quadprog packages in R
- Determine parameter c using cross-validation
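The slides solve these with lpsolve and quadprog in R. As a hedged illustration of the linear-programming transformation, the CMAE problem with a relative-error bound c can be posed with scipy.optimize.linprog (synthetic data; nonnegative coefficients stand in for the paper's coefficient constraints):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.uniform(0.5, 2.0, (n, k))])  # intercept + positive predictors
beta_true = np.array([1.0, 0.8, 0.5, 0.3])
y = X @ beta_true + rng.normal(0, 0.05, n)   # positive responses with small noise

c_max = 0.3          # bound on each observation's relative error (the "c" parameter)
p = k + 1            # number of coefficients

# Decision variables: [beta_0..beta_k, e_1..e_n]; minimize the sum of absolute errors e_i
cost = np.concatenate([np.zeros(p), np.ones(n)])
I = np.eye(n)
A_ub = np.vstack([
    np.hstack([ X, -I]),                 #  X*beta - e <= y      (e_i >= residual)
    np.hstack([-X, -I]),                 # -X*beta - e <= -y     (e_i >= -residual)
    np.hstack([ X, np.zeros((n, n))]),   #  X*beta <= (1 + c)*y  (relative-error bound, upper)
    np.hstack([-X, np.zeros((n, n))]),   # -X*beta <= -(1 - c)*y (relative-error bound, lower)
])
b_ub = np.concatenate([y, -y, (1 + c_max) * y, -(1 - c_max) * y])
bounds = [(0, None)] * (p + n)           # beta_j >= 0 (example coefficient constraint), e_i >= 0

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
beta_hat = res.x[:p]
print(beta_hat)
```

At the optimum each e_i equals the absolute residual, so the objective is exactly the sum of absolute errors; CMRE follows the same pattern with each row scaled by 1/yi.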
12. Outline
- Introduction
- Multiple Linear Regression
- OLS, Stepwise, Lasso, Ridge
- Constrained Linear Regression
- Validation and comparison
- COCOMO overview
- Cross validation
- Conclusions
- Limitations
- Future Work
13. Validation and Comparison
- Two COCOMO datasets
- COCOMO II.2000: 161 projects
- COCOMO 81: 63 projects
- Comparing with popular model-building approaches
- OLS
- Stepwise
- Lasso
- Ridge
- Cross-validation
- 10-fold cross validation
14. COCOMO
- Constructive Cost Model (COCOMO) first published in 1981
- Calibrated using 63 projects (COCOMO 81 dataset)
- Uses SLOC as a size measure and 15 cost drivers
- COCOMO II published in 2000
- Reflects changes in technologies and practices
- Uses 22 cost drivers plus a size measure
- Introduces 5 scale factors
- Calibrated using 161 data points (COCOMO II dataset)
15. COCOMO Overview (cont)
- COCOMO Effort Equation, non-linear
- Linearize the model using log-transformation
- COCOMO 81
- log(PM) = β0 + β1·log(Size) + β2·log(EM1) + ... + β16·log(EM15)
- COCOMO II
- log(PM) = β0 + β1·log(Size) + Σi βi·SFi·log(Size) + Σj βj·log(EMj)
- Estimate coefficients using a linear regression method
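A sketch of the linearization step on synthetic data (the constants and the use of only 3 effort multipliers are illustrative assumptions; COCOMO 81 has 15 and its own calibrated values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 63                                     # same size as the COCOMO 81 dataset (data here is synthetic)
size = rng.uniform(10, 500, n)             # project size in KSLOC
em = rng.uniform(0.7, 1.4, (n, 3))         # 3 effort multipliers for illustration
A, B = 2.8, 1.05                           # illustrative constants, not the published calibration
pm = A * size**B * em.prod(axis=1) * rng.lognormal(0.0, 0.1, n)  # PM = A * Size^B * prod(EM) * noise

# Log-transform: log(PM) = b0 + b1*log(Size) + sum_j bj*log(EM_j), then fit by linear regression
Xlog = np.column_stack([np.ones(n), np.log(size), np.log(em)])
coef = np.linalg.lstsq(Xlog, np.log(pm), rcond=None)[0]
A_hat, B_hat = np.exp(coef[0]), coef[1]    # back-transform the intercept to recover A
print(A_hat, B_hat)
```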
16. Model Accuracy Measures
- Magnitude of relative error (MRE): MREi = |yi − ŷi| / yi
- Mean of MRE (MMRE): MMRE = (1/N) Σi MREi
- Prediction level: PRED(l) = k/N
- Where k is the number of estimates with MRE ≤ l
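These measures compute directly from actual and estimated effort; a minimal sketch with hypothetical values:

```python
import numpy as np

def mmre_pred(actual, estimate, level=0.30):
    """MRE_i = |actual_i - estimate_i| / actual_i; PRED(l) = fraction of estimates with MRE <= l."""
    mre = np.abs(actual - estimate) / actual
    return mre.mean(), np.mean(mre <= level)

actual = np.array([100.0, 200.0, 50.0, 400.0])     # hypothetical actual effort (PM)
estimate = np.array([110.0, 150.0, 55.0, 380.0])   # hypothetical model estimates
mmre, pred = mmre_pred(actual, estimate)
print(mmre, pred)  # MREs are 0.10, 0.25, 0.10, 0.05 -> MMRE 0.125, PRED(0.30) 1.0
```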
17. Cross Validation
- 10-fold cross validation was used
- Step 1. Randomly split the dataset into K = 10 subsets
- Step 2. For each i = 1 ... 10
- Remove the ith subset and build the model on the rest
- Use the ith subset as the testing set to calculate MMREi and PRED(l)i
- Step 3. Repeat steps 1 and 2 r = 15 times
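The steps above can be sketched as follows, with OLS standing in for the model builder (any of the constrained fits would be swapped in) and synthetic data:

```python
import numpy as np

def repeated_kfold_mmre(X, y, k=10, repeats=15, seed=0):
    """Repeated k-fold CV: shuffle, split into k folds, fit on k-1 folds, score MMRE on the held-out fold."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(repeats):                               # Step 3: repeat r times
        folds = np.array_split(rng.permutation(n), k)      # Step 1: random split into K subsets
        for test_idx in folds:                             # Step 2: hold out each subset in turn
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            beta = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)[0]
            pred = X[test_idx] @ beta
            scores.append(np.mean(np.abs(y[test_idx] - pred) / y[test_idx]))  # fold MMRE
    return float(np.mean(scores))

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(100), rng.uniform(1.0, 2.0, (100, 2))])
y = X @ np.array([5.0, 1.0, 1.0]) + rng.normal(scale=0.1, size=100)
print(repeated_kfold_mmre(X, y))
```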
18. Non-cross-validation Results

COCOMO II dataset (N = 161):
Max. MRE (c)   CMSE MMRE  CMSE PRED  CMAE MMRE  CMAE PRED  CMRE MMRE  CMRE PRED
infinity       0.23       0.78       0.22       0.81       0.21       0.78
1.2            0.23       0.78       0.21       0.81       0.21       0.80
1.0            0.23       0.76       0.21       0.77       0.21       0.77
0.8            0.23       0.72       0.22       0.73       0.21       0.75
0.6            0.24       0.68       0.25       0.65       0.23       0.70
OLS: Max MRE = 1.23, PRED = 0.78

COCOMO 81 dataset (N = 63):
Max. MRE (c)   CMSE MMRE  CMSE PRED  CMAE MMRE  CMAE PRED  CMRE MMRE  CMRE PRED
infinity       0.30       0.62       0.29       0.65       0.30       0.68
1.2            0.30       0.65       0.28       0.65       0.28       0.62
1.0            0.30       0.62       0.29       0.62       0.29       0.59
0.8            0.30       0.58       0.29       0.60       0.30       0.59

PRED denotes PRED(0.30)
19. Cross-validation Results
[Figure: cross-validation results, COCOMO II dataset]
20. Statistical Significance
- Results of statistical significance tests on MMRE (0.05 significance level)
- Mann-Whitney U hypothesis test

Test                                              COCOMO II.2000   COCOMO 81
No significant differences among Lasso,           p > 0.11         p > 0.15
Ridge, OLS, Stepwise
CMSE outperforms Ridge, OLS                       p > 0.10         p > 0.10
CMSE outperforms Lasso, Stepwise                  p < 0.02         p > 0.05
CMAE outperforms Lasso, Ridge, OLS, Stepwise      p < 10^-3        p < 0.02
CMRE outperforms Lasso, Ridge, OLS, Stepwise      p < 10^-4        p < 10^-4
21. Comparing With Published Results
- Some of the best published results for the COCOMO datasets
- Bayesian analysis (Boehm et al., 2000)
- Chen et al., 2005
- Best cross-validated mean PRED(0.30):

Dataset     Bayesian  Chen et al.  CMRE
COCOMO II   70        N/A          75
COCOMO 81   N/A       51           56
22. Productivity Range
- CMRE: A = 2.27, B = 0.98
- COCOMO II.2000: A = 2.94, B = 0.91
23. Outline
- Introduction
- Multiple Linear Regression
- OLS, Stepwise, Lasso, Ridge
- Constrained Linear Regression
- Validation and comparison
- COCOMO overview
- Cross validation
- Conclusions
- Limitations
- Future Work
24. Conclusions
- The technique imposes constraints on the estimates of coefficients and on the magnitude of the error terms
- Directly resolves counter-intuitive coefficient estimates determined by the data
- Estimation accuracies are favorable
- CMRE and CMAE outperform OLS, Stepwise, Ridge, Lasso, and CMSE
- MRE and MAE are favorable objective functions
- The technique can be applied not only to COCOMO-like models but also to other linear models
- An alternative for researchers and practitioners to build models
25. Limitations
- As the technique relies on optimization, a sub-optimal solution may be returned instead of the global optimum
- Multiple solutions may exist for the estimates of coefficients
- Only two datasets were investigated; the technique might not work well on other datasets
26. Future Work
- Validate the technique using other datasets (e.g., NASA datasets)
- Compare results from the technique with others such as neural networks and genetic programming
- Apply and compare with other objective functions
- MdMRE (median of MRE)
- z measure (z = estimate/actual)
27. References
- Boehm et al., 2000. B. Boehm, E. Horowitz, R. Madachy, D. Reifer, B. K. Clark, B. Steece, A. W. Brown, S. Chulani, and C. Abts, Software Cost Estimation with COCOMO II. Prentice Hall, 2000.
- Chen et al., 2005. Z. Chen, T. Menzies, D. Port, and B. Boehm, "Finding the right data for software cost modeling," IEEE Software, Nov 2005.
28. Thank You