A Systematic Review of Cross- vs. Within-Company Cost Estimation Studies - PowerPoint PPT Presentation

Provided by: emi84
1
A Systematic Review of Cross- vs. Within-Company
Cost Estimation Studies
EEL 6883 Research Paper Presentation
  • Barbara Kitchenham, Emilia Mendes, Guilherme
    Travassos
  • IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL.
    33, NO. 5, MAY 2007

Mustafa Ilhan Akbas Omer Bilal Orhan
2
Outline
  • Motivation
  • Objective
  • Method
  • Results
  • Conclusions
  • Comments

3
Motivation
  • Cross- vs. within-company cost estimation
  • Early studies suggested calibrating general-purpose
    cost estimation models and using only
    single-company data. BUT
  • Collecting such data takes time
  • Older projects may not reflect current technology
  • Care is necessary in data collection
  • Cross-company models are therefore favored. BUT

4
Motivation
  • 1999 Maxwell - Within-company model is more
    accurate
  • 1999 Briand - Cross-company model could be as
    accurate
  • 2000 Briand (with Maxwell data) - Cross-company
    model can be as good as within-company model
  • 2002 Wieczorek, Ruhe - Same trend with Briand data
  • 2005 Mendes - Same trend with another data set
  • But
  • 2000, 2001 Jeffery - Within-company models are
    superior
  • 2003 Lefley, Shepperd - Within-company model is
    more accurate with Briand data
  • 2004 Mendes, Kitchenham - Within-company models
    are significantly better

5
Motivation
  • Evidence on whether cross-company models can be
    used to estimate effort for single-company
    projects is contradictory.

6
Objective
  • To determine under what conditions individual
    organisations are able to rely on
    cross-company-based estimation models
  • To provide advice to researchers about the value
    of cross-company models.

7
Method
  • Perform a systematic review to determine the factors
    that influence the outcomes of the studies.
  • Discuss variations in experimental
    procedure

8
Research Questions
  • For the review, the authors follow the approach
    of Kitchenham's report, Procedures for Performing
    Systematic Reviews
  • The point of view is defined by the research questions
  • Question one
  • What evidence is there that cross-company
    estimation models are not significantly different
    from within-company estimation models for
    predicting effort for software/Web projects?

9
Research Questions
  • Question two
  • What characteristics of the study data sets and
    the data analysis methods used in the study
    affect the outcome of within-company and
    cross-company effort estimation accuracy studies?
  • Question three
  • Which experimental procedure is most appropriate
    for studies comparing within-company and
    cross-company effort estimation models?

10
Method: Population, Intervention, Comparison,
Outcome
  • Population: Cross-company benchmarking databases
    of Web and software projects
  • Intervention: Effort estimation models
    constructed from cross-company data, used to
    predict effort for single-company projects
  • Comparison intervention: Effort estimation models
    constructed from the within-company data only
  • Outcome: The accuracy of the cross- and
    within-company models

11
Search Strategy used for Primary Studies
  • The search terms were constructed using the
    following strategy
  • Derive major terms from the questions by
    identifying the population, intervention and
    outcome
  • Identify alternative spellings and synonyms for
    major terms, consulting field experts and/or
    subject librarians to identify the terms
  • Check the keywords in any relevant papers we
    already have
  • Use the Boolean OR to incorporate alternative
    spellings and synonyms. Use the Boolean AND to
    link the major terms from population,
    intervention and outcome.

12
The main search terms
  • Population: software, Web, project
  • Intervention: cross-company, project, effort,
    estimation, model
  • Comparison: single-company, project, effort,
    estimation, model
  • Outcomes: prediction, estimate, accuracy

13
Sample search string
AND (software OR application OR product OR Web)
AND (method OR process OR system OR technique
OR methodology OR procedure) AND (cross company
OR multi organisation OR within organisation OR
single company OR single-organisational OR
company-specific) AND (model) AND (effort OR
cost) AND (estimation OR prediction OR
assessment)
The complete set of search strings is given in the paper.
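The OR/AND strategy described on the previous slides can be sketched in a few lines of Python. The term groups are copied from the sample search string; the function name `build_query` and the quoting of multi-word phrases are assumptions for illustration, not the authors' tooling, and each real database (INSPEC, IEEE Xplore, etc.) has its own query syntax.

```python
def build_query(groups: list[list[str]]) -> str:
    """OR together the synonyms within each group, then AND the groups together."""
    clauses = []
    for terms in groups:
        # Quote multi-word phrases so they match as a unit (an assumption;
        # phrase syntax differs between search engines).
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Term groups taken from the sample search string.
groups = [
    ["software", "application", "product", "Web"],
    ["method", "process", "system", "technique", "methodology", "procedure"],
    ["cross company", "multi organisation", "within organisation",
     "single company", "single-organisational", "company-specific"],
    ["model"],
    ["effort", "cost"],
    ["estimation", "prediction", "assessment"],
]

if __name__ == "__main__":
    print(build_query(groups))
```

Keeping synonyms and major terms as separate lists makes it easy to add a synonym suggested by an expert without retyping the whole string.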
14
Initial Search Phase
  • Identification of candidate primary sources based
    on the authors' knowledge and on searches of
    electronic databases using the derived search strings
  • 1344 papers were retrieved; 25 of these
    corresponded to the 10 papers already known.
  • Manual scan of the titles and/or abstracts of all
    1344 papers

15
Databases/Journals Searched (from an earlier work)
  • Electronic Databases
  • INSPEC
  • Ei Compendex
  • Science Direct
  • Web of Science
  • IEEE Xplore
  • ACM Digital Library
  • Individual journals (J) and conference
    proceedings (C)
  • Empirical Software Engineering (J)
  • Information and Software Technology (J)
  • Software Process Improvement and Practice (J)
  • Management Science (J)
  • International Software Metrics Symposium (C)
  • International Conference on Software Engineering
    (C)
  • Evaluation and Assessment in Software Engineering
    (manual search) (C)

16
Secondary search phase
  • Has two sub-phases
  • Review the references of each primary source to
    find further candidate primary sources, repeating
    until no further relevant documents are found.
  • Contact researchers who authored the primary
    sources found in the first phase, or who could be
    working on the topic. Six researchers were
    contacted; none was working in the area.
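The first sub-phase (reference chasing "until no further relevant document is found") is a fixed-point iteration over reference lists. The sketch below assumes a hypothetical `references` map from paper IDs to the IDs they cite, and a hypothetical `is_relevant` callback standing in for the reviewers' manual screening; neither comes from the paper.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def snowball(seeds: Set[str],
             references: Dict[str, List[str]],
             is_relevant: Callable[[str], bool]) -> Set[str]:
    """Follow reference lists breadth-first until no new relevant paper appears."""
    found = set(seeds)   # relevant primary sources so far
    seen = set(seeds)    # everything already screened, relevant or not
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        for ref in references.get(paper, []):
            if ref in seen:
                continue  # never screen the same paper twice
            seen.add(ref)
            if is_relevant(ref):
                found.add(ref)
                queue.append(ref)  # its own reference list is checked later
    return found


# Hypothetical citation graph: X1 is cited but judged irrelevant.
refs = {"P1": ["P2", "X1"], "P2": ["P3"], "P3": ["P1"]}
relevant = {"P1", "P2", "P3"}
result = snowball({"P1"}, refs, relevant.__contains__)
```

The loop stops exactly when a full pass over the newly found papers yields nothing new, which is the stopping rule stated on the slide.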

17
Study Selection
  • Criteria for including a primary study
  • Any study that compared predictions of
    cross-company models with within-company models
    based on analysis of single-company project data.
  • Criteria for excluding a primary study
  • If projects were only collected from a small
    number of different sources
  • If models derived from a within-company data set
    were compared with predictions from a general
    cost estimation model.

18
Study Quality Assessment
  • Part 1: The quality of the study itself.
  • Has four top-level questions and an additional
    quality issue related to data set size. (Weight
    1.5)
  • Is the data analysis process appropriate?
  • Did studies carry out a sensitivity or residual
    analysis?
  • Were accuracy statistics based on the raw data
    scale?
  • How good was the study comparison method?

19
Study Quality Assessment
  • Part 2: The quality of the provided reporting.
  • Has four top-level questions.
  • (Weight 1)
  • Is it clear what projects were used to construct
    each model?
  • Is it clear how accuracy was measured?
  • Is it clear what cross-validation method was
    used?
  • Were all model construction methods fully
    defined?
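One way to combine the two parts into a single score is a weighted sum using the slide weights (1.5 for study quality, 1 for reporting quality). The per-question scale (each answer in [0, 1], e.g. 0, 0.5 or 1) and treating the data set size issue as a fifth Part 1 item are assumptions made for this sketch, not details confirmed by the slides.

```python
from typing import List

PART1_WEIGHT = 1.5  # study quality (weight from the slides)
PART2_WEIGHT = 1.0  # reporting quality (weight from the slides)


def quality_score(part1: List[float], part2: List[float]) -> float:
    """Weighted sum of per-question scores; each answer is assumed to lie in [0, 1]."""
    for answer in part1 + part2:
        if not 0.0 <= answer <= 1.0:
            raise ValueError(f"answer {answer} outside [0, 1]")
    return PART1_WEIGHT * sum(part1) + PART2_WEIGHT * sum(part2)


# Hypothetical answers: Part 1 = four questions plus the data set size issue,
# Part 2 = the four reporting questions.
example = quality_score([1, 0.5, 1, 0, 1], [1, 1, 0.5, 1])
```

Under these assumptions the maximum score is 1.5 × 5 + 4 = 11.5, so the weighting makes study quality count for more than reporting quality.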

20
Quality
  • Quality is used in two different ways
  • as a score, to ensure that results are not
    largely confounded with quality
  • as an indicator of differences between studies.
  • Quality refers to the study, not to the model used.
  • The overall quality is good.
  • The factors that varied between papers are data
    set size, the prediction method, and whether
    sensitivity analyses were performed.

21
Data Extraction Strategy
  • For each paper, reviewers were assigned at random
    to the roles of data extractor, checker, and
    adjudicator.
  • Extractor: reads the paper and completes the
    form
  • Checker: reads the paper and verifies the
    correctness of the form
  • Adjudicator: if the first two disagree, reads the
    paper and makes the final decision.

22
Data Extraction Strategy
  • Roles were assigned at random with the following
    restrictions
  • No one should be data extractor on a paper he/she
    authored.
  • All reviewers should have an equal work load (as
    far as possible).
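The random assignment with these two restrictions can be sketched as a greedy procedure. The reviewer names are the paper's three authors; the paper IDs and authorship sets are hypothetical, and balancing only the extractor role (on the assumption that extraction is the heaviest task) is an illustrative choice rather than the authors' documented method.

```python
import random
from typing import Dict, Optional, Set

REVIEWERS = ["Kitchenham", "Mendes", "Travassos"]


def assign_roles(papers: Dict[str, Set[str]],
                 seed: Optional[int] = None) -> Dict[str, Dict[str, str]]:
    """Randomly assign extractor/checker/adjudicator for each paper.

    Restrictions: nobody extracts data from a paper they authored, and
    extraction work is spread as evenly as the first restriction allows.
    """
    rng = random.Random(seed)
    extract_load = {r: 0 for r in REVIEWERS}
    assignment = {}
    for paper, authors in papers.items():
        eligible = [r for r in REVIEWERS if r not in authors]
        if not eligible:
            raise ValueError(f"every reviewer authored {paper}")
        # Prefer the least-loaded eligible reviewers; break ties at random.
        least = min(extract_load[r] for r in eligible)
        extractor = rng.choice([r for r in eligible if extract_load[r] == least])
        extract_load[extractor] += 1
        # The other two reviewers take the remaining roles in random order.
        others = [r for r in REVIEWERS if r != extractor]
        rng.shuffle(others)
        assignment[paper] = {"extractor": extractor,
                             "checker": others[0],
                             "adjudicator": others[1]}
    return assignment


# Hypothetical authorship sets for four papers.
papers = {"S1": set(), "S2": {"Mendes"},
          "S3": {"Kitchenham", "Mendes"}, "S4": set()}
roles = assign_roles(papers, seed=1)
```

Passing a fixed `seed` makes the assignment reproducible, which helps if the allocation has to be reported or audited later.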

23
Results Question 1
What evidence is there that cross-company
estimation models are not significantly different
from within-company estimation models for
predicting effort for software/Web projects?

24
Results Question 1
  • The studies are organized into three groups
  • Cross-company models are not significantly
    different from within-company models (4 of
    10)
  • Cross-company models are significantly worse than
    within-company models; all accuracy statistics
    favor the within-company models (6 of 10)
  • Studies that did not undertake formal statistical
    testing are inconclusive (2 studies, S1 and S7)


25
Results Question 1
  • The four studies stating that cross-company
    models are not significantly different use
    leave-one-out cross-validation, which is biased
    in favor of within-company models.
  • S6 is not independent (it uses S2 data), so it
    cannot be used as evidence for group 1.
  • S1 and S7 did not test statistical
    significance. They are regarded as inconclusive
    and cannot be used as evidence either.
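To make the leave-one-out procedure concrete: each within-company project is predicted by a within-company model refitted on the remaining projects, while the cross-company model is fitted once on the cross-company data and then applied to every within-company project. The log-log regression form ln(effort) = a + b·ln(size) is an assumption chosen because it is common in effort estimation; the studies in the review used a variety of model construction methods.

```python
import math
from typing import List, Tuple

Model = Tuple[float, float]  # (intercept a, slope b) of the log-log model


def fit_loglog(sizes: List[float], efforts: List[float]) -> Model:
    """Ordinary least squares fit of ln(effort) = a + b * ln(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # requires at least two distinct sizes
    return my - b * mx, b


def predict(model: Model, size: float) -> float:
    """Predict effort on the raw scale by naively back-transforming the log prediction."""
    a, b = model
    return math.exp(a + b * math.log(size))


def loo_within_predictions(sizes: List[float], efforts: List[float]) -> List[float]:
    """Leave-one-out: refit the within-company model without project i, then predict i."""
    preds = []
    for i in range(len(sizes)):
        train_sizes = sizes[:i] + sizes[i + 1:]
        train_efforts = efforts[:i] + efforts[i + 1:]
        preds.append(predict(fit_loglog(train_sizes, train_efforts), sizes[i]))
    return preds


def cross_company_predictions(cross_sizes: List[float], cross_efforts: List[float],
                              within_sizes: List[float]) -> List[float]:
    """Fit once on cross-company data, then predict every within-company project."""
    model = fit_loglog(cross_sizes, cross_efforts)
    return [predict(model, s) for s in within_sizes]
```

Note that with leave-one-out the within-company model is trained on all but one of the company's own projects, so a small within-company data set still contributes almost all of its data to every fit.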

26
Results Question 2
  • What characteristics of the study data sets and
    the data analysis methods used in the study
    affect the outcome of within-company and
    cross-company effort estimation accuracy studies?
  • S10 contradicts the claim that quality control
    makes cross-company models as good as
    within-company models.
  • S1 and S3 (ESA database) take a different view
    of quality control: they regard it as not
    reliable.
  • S2 and S6 both state that stringent quality
    control was applied to data collection.
  • Quality control cannot ensure that cross-company
    models perform as well as within-company models.

27
Results Question 2
  • No consistent evidence that the quality of the
    studies influences the results
  • S2 and S3 have lower quality scores
  • S10 has the highest quality score

28
Results Question 2
  • Number of projects in the within-company models.
  • There is a noticeable difference in this number
    when S2, S3, and S10 (median 63) are compared
    with S4, S5, S8, and S9 (median 10).
  • All the studies where within-company predictions
    were significantly better than cross-company
    predictions used small within-company data sets
    of fair quality.
  • A similar pattern applies to the range of effort
    values in the entire database.

29
Results Question 2
  • Number of projects in the within-company models.
  • No clear patterns were observed for the size
    metrics used, nor for the procedure used to build
    the within-company model

30
Results Question 2
  • The relationship between within-company and
    cross-company projects.
  • The Tukutuku results suggest that the greater the
    difference between projects, the less likely it
    is that the cross-company model will provide
    accurate predictions for single-company projects.
  • There is no clear indication that the strength of
    the cross-company relationship is a major factor
    in determining whether cross-company prediction
    models are as good as within-company models.

31
Results Question 3
  • Which experimental procedure is most
    appropriate for studies comparing within-company
    and cross-company estimation models?
  • There is a large variation in the adopted
    procedures.

32
Results Question 3
  • Studies aimed at assessing the conditions
    that would favor (or not) the use of a
    cross-company model should adopt the following
    procedure
  • Use new within-company data sets that are
    independent of existing cross-company data sets
  • Perform sensitivity analysis using residual
    analysis for non-regression-based methods and
    influence analysis for regression-based methods.
  • Use regression analysis as the default model
    construction method.
  • Use a stepwise approach on the cross-company
    data, based on the variables collected in the
    within-company data set.
  • Apply data transformations appropriate to the
    specific application
  • Perform statistical tests based on the absolute
    residuals on the raw data scale.
  • Report the residuals for each model or the effort.
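The recommendation to test absolute residuals on the raw data scale can be sketched as follows: compute |actual - predicted| for each model on the same projects, then run a paired test. The exact sign test used here is a deliberately simple, dependency-free stand-in and not necessarily the test the authors had in mind; the Wilcoxon signed-rank test (for example `scipy.stats.wilcoxon`) is a common, more powerful alternative.

```python
import math
from typing import List


def abs_residuals(actual: List[float], predicted: List[float]) -> List[float]:
    """Absolute residuals |actual - predicted| on the raw (untransformed) effort scale."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def sign_test(res_a: List[float], res_b: List[float]) -> float:
    """Exact two-sided paired sign test on two lists of absolute residuals.

    Ties are dropped. Returns the p-value for H0: each model is equally
    likely to have the smaller residual on a given project.
    """
    wins = sum(1 for x, y in zip(res_a, res_b) if x < y)
    losses = sum(1 for x, y in zip(res_a, res_b) if x > y)
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A typical use would be `sign_test(abs_residuals(actual, within_preds), abs_residuals(actual, cross_preds))`, where both prediction lists cover the same within-company projects. Residuals are taken after back-transforming to the raw scale, since differences measured on a log scale do not correspond to differences in person-hours.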

33
Results Question 3
  • The authors are unable to provide definitive
    advice on cross-validation, but they believe that
    leave-one-out cross-validation is not a
    sufficiently stringent criterion.

34
Conclusions
  • Some organizations would benefit from models
    derived from cross-company databases, while some
    others would not.
  • The review is not able to conclusively explain
    the reason for this but shows some trends.

35
Conclusions
  • Some trends
  • In all cases where within-company models
    significantly outperformed cross-company models,
    the within-company data sets were small and the
    cross-validation method was not very stringent.
  • In all studies showing no significant difference
    between the two, the within-company data was a
    subset of the cross-company data.
  • Of the studies showing within-company models to
    be significantly better, half used within-company
    data sets that had been collected separately.

36
Conclusions
  • Authors' advice
  • Consider the similarity of the projects in the
    cross-company dataset to your project and the
    characteristics of your own company.
  • Further research is required.
  • To researchers
  • Come to a consensus about the appropriate
    experimental procedure for this type of study.
    (Do the authors suggest their own procedure?)

37
Comments
  • No other reviews on the same topic had been
    conducted previously.
  • The review criteria are not well-defined.
  • Only 6 of 10 studies give results for Q1.
  • No definitive results.
  • There is no information about company size for
    some projects.
  • If a company's projects are similar to those in
    the cross-company data set, the cross-company
    model can be used. But deciding this similarity
    is another problem.
  • The authors contributed to 3 of the 10 studies.
  • The paper does not get much further than its
    starting point.