Software Reliability Engineering: Techniques and Tools (Presentation Transcript)
1
Software Reliability Engineering Techniques and
Tools
CS130 Winter, 2002
2
Source Material
  • Software Reliability and Risk Management
    Techniques and Tools, Allen Nikora and Michael
    Lyu, tutorial presented at the 1999 International
    Symposium on Software Reliability Engineering
  • Allen Nikora, John Munson, Determining Fault
    Insertion Rates For Evolving Software Systems,
    proceedings of the International Symposium on
    Software Reliability Engineering, Paderborn,
    Germany, November, 1998

3
Agenda
  • Part I Introduction
  • Part II Survey of Software Reliability Models
  • Part III Quantitative Criteria for Model
    Selection
  • Part IV Input Data Requirements and Data
    Collection Mechanisms
  • Part V Early Prediction of Software Reliability
  • Part VI Current Work in Estimating Fault Content
  • Part VII Software Reliability Tools

4
Part I Introduction
  • Reliability Measurement Goal
  • Definitions
  • Reliability Theory

5
Reliability Measurement Goal
  • Reliability measurement is a set of mathematical
    techniques that can be used to estimate and
    predict the reliability behavior of software
    during its development and operation.
  • The primary goal of software reliability modeling
    is to answer the following question
  • Given a system, what is the probability that it
    will fail in a given time interval, or, what is
    the expected duration between successive
    failures?

6
Basic Definitions
  • Software Reliability R(t): The probability of
    failure-free operation of a computer program for
    a specified time under a specified environment.
  • Failure: The departure of program operation from
    user requirements.
  • Fault: A defect in a program that causes
    failure.

7
Basic Definitions (contd)
  • Failure Intensity (rate) f(t): The expected
    number of failures experienced in a given time
    interval.
  • Mean-Time-To-Failure (MTTF): Expected value of a
    failure interval.
  • Expected total failures m(t): The number of
    failures expected in a time period t.

8
Reliability Theory
  • Let "T" be a random variable representing the
    failure time or lifetime of a physical system.
  • For this system, the probability that it will
    fail by time "t" is
    F(t) = P(T ≤ t)
  • The probability of the system surviving until
    time "t" is
    R(t) = P(T > t) = 1 - F(t)

9
Reliability Theory (contd)
  • Failure rate - the probability that a failure
    will occur in the interval [t1, t2], given that a
    failure has not occurred before time t1. This is
    written as
    [F(t2) - F(t1)] / [(t2 - t1) R(t1)]

10
Reliability Theory (contd)
  • Hazard rate - limit of the failure rate as the
    length of the interval approaches zero. This is
    written as
    z(t) = lim (Δt → 0) [F(t + Δt) - F(t)] / [Δt R(t)] = f(t) / R(t)
  • This is the instantaneous failure rate at time t,
    given that the system survived until time t. The
    terms hazard rate and failure rate are often used
    interchangeably.

11
Reliability Theory (contd)
  • A reliability objective expressed in terms of one
    reliability measure can be easily converted into
    another measure as follows (assuming an average
    failure rate, λ, is measured):
    MTTF = 1/λ,  R(t) = exp(-λ t)
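
As a quick illustration of these conversions, here is a minimal Python sketch
assuming a constant (exponential) failure rate; the numeric values are purely
illustrative:

    import math

    lambda_ = 0.002          # assumed average failure rate, failures per hour
    mission_time = 100.0     # mission duration in hours

    mttf = 1.0 / lambda_                               # mean time to failure
    reliability = math.exp(-lambda_ * mission_time)    # R(t) = exp(-lambda * t)

    print(f"MTTF = {mttf:.1f} hours")
    print(f"R({mission_time:.0f} h) = {reliability:.3f}")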

12
Reliability Theory (cont'd)
13
Part II Survey of Software Reliability Models
  • Software Reliability Estimation Models
  • Exponential NHPP Models
  • Jelinski-Moranda/Shooman Model
  • Musa-Okumoto Model
  • Geometric Model
  • Software Reliability Modeling and Acceptance
    Testing

14
Jelinski-Moranda/Shooman Models
  • Jelinski-Moranda model was developed by Jelinski
    and Moranda of McDonnell Douglas Astronautics
    Company for use on Navy NTDS software and a
    number of modules of the Apollo program. The
    Jelinski-Moranda model was published in
    1971.
  • Shooman's model, discovered independently of
    Jelinski and Moranda's work, was also published
    in 1971. Shooman's model is identical to the JM
    model.

15
Jelinski-Moranda/Shooman (cont'd)
  • Assumptions
  • The number of errors in the code is fixed.
  • No new errors are introduced into the code
    through the correction process.
  • The number of machine instructions is essentially
    constant.
  • Detections of errors are independent.
  • The software is operated in a similar manner as
    the anticipated operational usage.
  • The error detection rate is proportional to the
    number of errors remaining in the code.

16
Jelinski-Moranda/Shooman (cont'd)
  • Let τ represent the amount of debugging time
    spent on the system since the start of the test
    phase.
  • From assumption 6, we have
  • where K is the proportionality constant, and εr
    is the error rate (number of remaining errors
    normalized with respect to the number of
    instructions).
  • ET = number of errors initially in the program
  • IT = number of machine instructions in the
    program
  • εc = cumulative number of errors fixed in the
    interval [0, τ] (normalized by the number of
    instructions).

z(τ) = K εr(τ)
εr(τ) = ET / IT - εc(τ)
17
Jelinski-Moranda/Shooman (cont'd)
  • ET and IT are constant (assumptions 1 and 3).
  • No new errors are introduced into the correction
    process (assumption 2).
  • As τ → ∞, εc(τ) → ET/IT, so εr(τ) → 0.
  • The hazard rate becomes
    z(τ) = K [ET / IT - εc(τ)]

18
Jelinski-Moranda/Shooman (cont'd)
  • The reliability function becomes
    R(t) = exp(-K [ET / IT - εc(τ)] t)
  • The expression for MTTF is
    MTTF = 1 / (K [ET / IT - εc(τ)])
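
The following minimal Python sketch puts the Jelinski-Moranda/Shooman quantities
together; the parameter values (K, ET, IT, and the number of errors corrected so
far) are illustrative assumptions, not estimates from real failure data:

    import math

    E_T = 100        # assumed initial number of errors in the program
    I_T = 10_000     # number of machine instructions
    K = 0.5          # assumed proportionality constant
    errors_fixed = 60                      # errors corrected so far

    eps_c = errors_fixed / I_T             # normalized corrected errors
    eps_r = E_T / I_T - eps_c              # normalized remaining errors

    z = K * eps_r                          # hazard rate z(tau)
    mttf = 1.0 / z                         # expected time to next failure

    def reliability(t):
        return math.exp(-z * t)            # R(t) = exp(-z(tau) * t)

    print(round(z, 4), round(mttf, 1), round(reliability(100.0), 3))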

19
Geometric Model
  • Proposed by Moranda in 1975 as a variation of the
    Jelinski-Moranda model.
  • Unlike models previously discussed, it does not
    assume that the number of errors in the program
    is finite, nor does it assume that errors are
    equally likely to occur.
  • This model assumes that errors become
    increasingly difficult to detect as debugging
    progresses, and that the program is never
    completely error free.

20
Geometric Model (cont'd)
  • Assumptions
  • There are an infinite number of total errors.
  • All errors do not have the same chance of
    detection.
  • The detections of errors are independent.
  • The software is operated in a similar manner as
    the anticipated operational usage.
  • The error detection rate forms a geometric
    progression and is constant between error
    occurrences.

21
Geometric Model (cont'd)
  • The above assumptions result in the following
    hazard rate
  • z(t) = D φ^(i-1)
  • for any time "t" between the (i - 1)st and the
    i'th error, where D is the initial hazard rate and
    0 < φ < 1.
  • The initial value of z(t) is D.
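
A small Python sketch of the geometric hazard rate, with illustrative values
assumed for D and the ratio φ (written phi here):

    D = 0.5      # assumed initial hazard rate
    phi = 0.8    # assumed geometric ratio, 0 < phi < 1

    def hazard(i):
        """Hazard rate between the (i-1)st and i-th error."""
        return D * phi ** (i - 1)

    print([round(hazard(i), 4) for i in range(1, 6)])   # D, D*phi, D*phi**2, ...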

22
Geometric Model (cont'd)
Hazard Rate Graph
(Figure: hazard rate plotted against time as a step function, starting at D and
decreasing geometrically to Dφ, Dφ², ... after each successive error.)

23
Musa-Okumoto Model
  • The Musa-Okumoto model assumes that the failure
    intensity function decreases exponentially with
    the number of failures observed:
    λ(t) = λ0 exp(-θ μ(t))
  • Since λ(t) = dμ(t)/dt, we have the following
    differential equation
    dμ(t)/dt = λ0 exp(-θ μ(t))
  • or
    exp(θ μ(t)) dμ(t)/dt = λ0

24
Musa-Okumoto Model (contd)
  • Note that
    d/dt [exp(θ μ(t))] = θ exp(θ μ(t)) dμ(t)/dt
  • We then obtain
    d/dt [exp(θ μ(t))] = λ0 θ

25
Musa-Okumoto Model (contd)
  • Integrating this last equation yields
    exp(θ μ(t)) = λ0 θ t + C
  • Since μ(0) = 0, C = 1, and the mean value
    function μ(t) is
    μ(t) = (1/θ) ln(λ0 θ t + 1)
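
A short Python sketch of the resulting Musa-Okumoto (logarithmic Poisson) mean
value and intensity functions, with assumed illustrative parameters:

    import math

    lam0 = 5.0      # assumed initial failure intensity (failures per unit time)
    theta = 0.02    # assumed failure intensity decay parameter

    def mu(t):
        """Expected cumulative failures by time t (mean value function)."""
        return math.log(lam0 * theta * t + 1.0) / theta

    def intensity(t):
        """Failure intensity lambda(t) = lam0 * exp(-theta * mu(t))."""
        return lam0 / (lam0 * theta * t + 1.0)

    print(round(mu(100.0), 2), round(intensity(100.0), 3))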

26
Software Reliability Modeling and Acceptance
Testing
  • Given a piece of software advertised as having a
    failure rate λ, you can see if it meets that
    failure rate to a specific level of confidence.
  • α is the risk (probability) of falsely saying
    that the software does not meet the failure rate
    goal.
  • β is the risk of saying that the goal is met
    when it is not.
  • The discrimination ratio, γ, is the factor you
    specify that identifies acceptable departure from
    the goal. For instance, if γ = 2, the acceptable
    failure rate lies between λ/2 and 2λ.

27
Software Reliability Modeling and Acceptance
Testing (contd)
(Figure: reliability demonstration chart. Failure number is plotted against
normalized failure time (time to failure times the failure intensity objective);
the chart is divided into Reject, Continue, and Accept regions.)
28
Software Reliability Modeling and Acceptance
Testing (contd)
  • We can now draw a chart as shown in the previous
    slide. Define intermediate quantities A and B as
    follows:
    A = ln((1 - β)/α),  B = ln(β/(1 - α))
  • The boundary between the reject and continue
    regions is given by
    (n ln γ - A) / (γ - 1)
  • where n is the number of failures observed. The
    boundary between the continue and accept
    regions of the chart is given by
    (n ln γ - B) / (γ - 1)
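
One common way to implement this chart is as a sequential probability ratio
test. The Python sketch below assumes that formulation, using the A, B, and γ
defined above; the boundary expressions are the standard sequential-test ones
and the inputs are illustrative:

    import math

    alpha = 0.1    # risk of falsely rejecting software that meets the goal
    beta = 0.1     # risk of accepting software that misses the goal
    gamma = 2.0    # discrimination ratio

    A = math.log((1.0 - beta) / alpha)
    B = math.log(beta / (1.0 - alpha))

    def decision(n, normalized_time):
        """n = failure number, normalized_time = failure time * intensity objective."""
        reject_boundary = (n * math.log(gamma) - A) / (gamma - 1.0)
        accept_boundary = (n * math.log(gamma) - B) / (gamma - 1.0)
        if normalized_time <= reject_boundary:
            return "reject"
        if normalized_time >= accept_boundary:
            return "accept"
        return "continue"

    print(decision(5, 2.0), decision(5, 16.0))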

29
Part III Criteria for Model Selection
  • Background
  • Non-Quantitative criteria
  • Quantitative criteria

30
Criteria for Model Selection - Background
  • When software reliability models first appeared,
    it was felt that a process of refinement would
    produce definitive models that would apply to
    all development and test situations
  • Current situation
  • Dozens of models have been published in the
    literature
  • Studies over the past 10 years indicate that the
    accuracy of the models is variable
  • It does not seem possible to analyze the particular
    context in which reliability measurement is to take
    place and decide a priori which model to use.

31
Criteria for Model Selection (contd)
  • Non-Quantitative Criteria
  • Model Validity
  • Ease of measuring parameters
  • Quality of assumptions
  • Applicability
  • Simplicity
  • Insensitivity to noise

32
Criteria for Model Selection (contd)
  • Quantitative Criteria for Post-Model Application
  • Self-consistency
  • Goodness-of-Fit
  • Relative Accuracy (Prequential Likelihood Ratio)
  • Bias (U-Plot)
  • Bias Trend (Y-Plot)

33
Criteria for Model Selection (contd)
  • Self-consistency - Analysis of a model's predictive
    quality can help the user decide which model(s) to
    use.
  • The simplest question an SRM user can ask is "How
    reliable is the software at this moment?"
  • The time to the next failure, Ti, is usually
    predicted using the observed times to failure
    t1, t2, ..., t(i-1).
  • In general, predictions of Ti can be made using
    the observed times to failure t1, t2, ..., t(i-K)
    for various values of K.
  • The results of predictions made for different
    values of K can then be compared. If a model
    produces self-consistent results for differing
    values of K, this indicates that its use is
    appropriate for the data on which the particular
    predictions were made.
  • HOWEVER, THIS PROVIDES NO GUARANTEE THAT THE
    PREDICTIONS ARE CLOSE TO THE TRUTH.

34
Criteria for Model Selection (contd)
  • Goodness-of-fit - Kolmogorov-Smirnov Test
  • Uses the absolute vertical distance between two
    CDFs to measure goodness of fit.
  • Depends on the fact that the statistic
    Dn = sup over x of |Fn(x) - F0(x)|,
  • where F0 is a known, continuous CDF, and Fn is
    the sample CDF, is distribution free.

35
Criteria for Model Selection (contd)
  • Goodness-of-fit (contd) - Chi-Square Test
  • More suited to determining GOF of failure-counts
    data than of interfailure times.
  • Value given by
    χ² = Σ (j = 1 to k+1) (Nj - n pj)² / (n pj)
  • where
  • n = number of independent repetitions of an
    experiment in which the outcomes are decomposed
    into k+1 mutually exclusive sets A1, A2, ..., A(k+1)
  • Nj = number of outcomes in the jth set
  • pj = P(Aj)
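
A minimal Python sketch of both goodness-of-fit checks, assuming SciPy is
available; the interfailure times and interval counts are illustrative:

    import numpy as np
    from scipy import stats

    # Interfailure times (illustrative data)
    t = np.array([3., 7., 9., 14., 20., 25., 33., 41., 52., 66.])

    # K-S test against a fitted exponential CDF (F0 known and continuous)
    d_stat, p_ks = stats.kstest(t, "expon", args=(0, t.mean()))

    # Chi-square test on failure counts per test interval
    observed = np.array([10, 8, 7, 5, 3])              # N_j per interval
    expected = np.array([10.5, 8.5, 6.5, 4.5, 3.0])    # n * p_j under the model
    chi2, p_chi2 = stats.chisquare(observed, f_exp=expected)

    print(d_stat, p_ks, chi2, p_chi2)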

36
Criteria for Model Selection (contd)
  • Prequential Likelihood Ratio
  • The predictive pdf fi(t) for Ti is based on the
    observations t1, t2, ..., t(i-1).
  • For one-step-ahead predictions of T(j+1), ..., T(j+n),
    the prequential likelihood is
    PLn = product (i = j+1 to j+n) of fi(ti)
  • Two prediction systems, A and B, can be evaluated
    by computing the prequential likelihood ratio
    PLRn = PLn(A) / PLn(B)
  • If PLRn approaches infinity as n approaches
    infinity, B is discarded in favor of A
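
A small Python sketch of the prequential likelihood ratio computation; the
predictive density values for models A and B are illustrative placeholders for
the one-step-ahead pdfs evaluated at the observed failure times:

    import numpy as np

    # fi(ti) for two competing prediction systems A and B (illustrative values)
    f_A = np.array([0.012, 0.010, 0.011, 0.009, 0.008])
    f_B = np.array([0.010, 0.007, 0.009, 0.006, 0.005])

    # Work in logs to avoid underflow over long prediction sequences.
    log_PLR = np.sum(np.log(f_A)) - np.sum(np.log(f_B))
    PLR = np.exp(log_PLR)
    print(log_PLR, PLR)   # PLR growing without bound favors model A over B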

37
Prequential Likelihood Example
(Figure: two panels comparing one-step-ahead predictive pdfs fi, fi+1, fi+2, ...
with the true pdf. The first panel illustrates high bias and low noise; the
second illustrates low bias and high noise.)
38
Criteria for Model Selection (contd)
  • Prequential Likelihood Ratio (cont'd)
  • When predictions have been made for
    T(j+1), ..., T(j+n), the PLR is given by
  • Using Bayes' Rule, the PLR is rewritten as

39
Criteria for Model Selection (contd)
  • Prequential Likelihood Ratio (contd)
  • This equals
  • If the initial conditions were based only on
    prior belief, the second factor of the final
    equation is the prior odds ratio. If the user is
    indifferent between models A and B, this ratio
    has a value of 1.

40
Criteria for Model Selection (contd)
  • Prequential Likelihood Ratio (contd)
  • The final equation is then written as
  • This is the posterior odds ratio, where wA is
    the posterior belief that A is true after making
    predictions with both A and B and comparing them
    with actual behavior.

41
Criteria for Model Selection (contd)
  • The u-plot can be used to assess the predictive
    quality of a model
  • Given a predictor, Fi(t), that estimates the
    probability that the time to the next failure is
    less than t, consider the sequence
    ui = Fi(ti)
  • where each ui is a probability integral
    transform of the observed ti using the previously
    calculated predictor Fi based upon
    t1, t2, ..., t(i-1).
  • If each Fi were identical to the true, but
    hidden, distribution of Ti, then the ui would be
    realizations of independent random variables with a
    uniform distribution in [0,1].
  • The problem then reduces to seeing how closely
    the sequence of ui resembles a random sample from
    [0,1]

42
U-Plots for JM and LV Models
43
Criteria for Model Selection (contd)
  • The y-plot
  • Temporal ordering is not shown in a u-plot. The
    y-plot addresses this deficiency.
  • To generate a y-plot, the following steps are
    taken
  • Compute the sequence of ui
  • For each ui, compute xi = -ln(1 - ui)
  • Obtain yi by computing
    yi = (Σ (j = 1 to i) xj) / (Σ (j = 1 to m) xj)
  • for i ≤ m, m representing the number of
    observations made
  • If the ui really do form a sequence of
    independent random variables in [0,1], the slope
    of the plotted yi will be constant.
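
A compact Python sketch of both the u-plot distance and the y-plot sequence,
assuming the ui values have already been produced by a model; the numbers are
illustrative:

    import numpy as np

    # u_i = F_i(t_i): predictive CDF evaluated at each observed failure time
    u = np.array([0.21, 0.55, 0.48, 0.70, 0.35, 0.62, 0.81, 0.44])
    m = len(u)

    # u-plot: maximum vertical distance between the empirical CDF of the u_i
    # and the CDF of the uniform distribution on [0, 1]
    u_sorted = np.sort(u)
    ecdf = np.arange(1, m + 1) / m
    d_plus = np.max(ecdf - u_sorted)
    d_minus = np.max(u_sorted - (ecdf - 1.0 / m))
    ks_distance = max(d_plus, d_minus)

    # y-plot: x_i = -ln(1 - u_i), then normalized cumulative sums
    x = -np.log(1.0 - u)
    y = np.cumsum(x) / np.sum(x)

    print(round(ks_distance, 3))
    print(np.round(y, 3))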

44
Y-Plots for JM and LV Models
45
Criteria for Model Selection (contd)
  • Quantitative Criteria Prior to Model Application
  • Arithmetical Mean of Interfailure Times
  • Laplace Test

46
Arithmetical Mean of Interfailure Times
  • Calculate the arithmetical mean of the interfailure
    times as follows:
    t(i) = (1/i) Σ (j = 1 to i) θj
  • i = number of observed failures
  • θj = jth interfailure time
  • An increasing series of t(i) suggests reliability
    growth.
  • A decreasing series of t(i) suggests reliability
    decrease.
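
A short Python illustration of the running arithmetical mean, using assumed
interfailure times:

    import numpy as np

    theta = np.array([12., 15., 11., 19., 25., 22., 31., 34.])    # interfailure times
    t_mean = np.cumsum(theta) / np.arange(1, len(theta) + 1)      # t(i), i = 1..n
    print(np.round(t_mean, 1))   # an increasing series suggests reliability growth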

47
Laplace Test
  • The occurrence of failures is assumed to follow a
    non-homogeneous Poisson process whose failure
    intensity is decreasing:
    λ(t) = exp(a + bt), with b < 0
  • Null hypothesis is that occurrences of failures
    follow a homogeneous Poisson process (i.e., b = 0
    above).
  • For interfailure times, the test statistic is
    computed by
    u(n) = [ (1/(n-1)) Σ (i = 1 to n-1) Ti  -  Tn/2 ] / [ Tn √(1/(12(n-1))) ]
  • where Ti is the time of the ith failure and Tn is
    the time of the nth (most recent) failure.

48
Laplace Test (contd)
  • For interval data (n(i) failures observed in each of
    k equal-length test intervals, with N = Σ n(i)), the
    test statistic is computed by
    u(k) = [ (Σ (i = 1 to k) (i-1) n(i)) / N  -  (k-1)/2 ] / √( (k² - 1) / (12 N) )
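
A Python sketch of both forms of the Laplace factor, assuming the usual
expressions given above; the data are illustrative:

    import numpy as np

    def laplace_interfailure(theta):
        """Laplace factor from interfailure times theta_1 .. theta_n."""
        theta = np.asarray(theta, dtype=float)
        n = len(theta)
        failure_times = np.cumsum(theta)          # T_1 .. T_n
        t_n = failure_times[-1]
        mean_early = failure_times[:-1].mean()    # average of T_1 .. T_(n-1)
        return (mean_early - t_n / 2.0) / (t_n * np.sqrt(1.0 / (12.0 * (n - 1))))

    def laplace_counts(counts):
        """Laplace factor from failure counts per equal-length test interval."""
        counts = np.asarray(counts, dtype=float)
        k, total = len(counts), counts.sum()
        weighted = np.sum(np.arange(k) * counts) / total    # sum of (i-1)*n(i) / N
        return (weighted - (k - 1) / 2.0) / np.sqrt((k * k - 1.0) / (12.0 * total))

    # Negative values suggest decreasing failure intensity (reliability growth).
    print(round(laplace_interfailure([12, 15, 11, 19, 25, 22, 31, 34]), 2))
    print(round(laplace_counts([10, 8, 7, 5, 3, 2]), 2))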

49
Laplace Test (contd)
  • Interpretation
  • Negative values of the Laplace factor indicate
    decreasing failure intensity.
  • Positive values suggest an increasing failure
    intensity.
  • Values varying between -2 and +2 indicate stable
    reliability.
  • Significance is that associated with the normal
    distribution, e.g.
  • The null hypothesis H0: HPP vs. H1:
    decreasing failure intensity is rejected at the
    5% significance level for m(T) < -1.645
  • The null hypothesis H0: HPP vs. H1:
    increasing failure intensity is rejected at the
    5% significance level for m(T) > +1.645
  • The null hypothesis H0: HPP vs. H1: there is
    a trend is rejected at the 5% significance level
    for |m(T)| > 1.96

50
Part IV Input Data Requirements and Data
Collection Mechanisms
  • Model Inputs
  • Time Between Successive Failures
  • Failure Counts and Test Interval Lengths
  • Setting up a Data Collection Mechanism
  • Minimal Set of Required Data
  • Data Collection Mechanism Examples

51
Input Data Requirements and Data Collection
Mechanisms
  • Model Inputs - Time between Successive Failures
  • Most of the models discussed in Section II
    require the times between successive failures as
    inputs.
  • Preferred units of time are expressed in CPU time
    (e.g., CPU seconds between subsequent failures).
  • Allows computation of reliability independent of
    wall-clock time.
  • Reliability computations in one environment can
    be easily transformed into reliability estimates
    in another, provided that the operational
    profiles in both environments are the same and
    that the instruction execution rates of the
    original environment and the new environment can
    be related.

52
Input Data Requirements and Data Collection
Mechanisms (contd)
  • Model Inputs - Time between Successive Failures
    (contd)
  • Advantage - CPU time between successive failures
    tends to more accurately characterize the failure
    history of a software system than calendar time.
    Accurate CPU time between failures can give
    greater resolution than other types of data.
  • Disadvantage - CPU time between successive
    failures can often be more difficult to collect
    than other types of failure history data.

53
Input Data Requirements and Data Collection
Mechanisms (contd)
  • Model Inputs ( contd) - Failure Counts and Test
    Interval Lengths
  • Failure history can be collected in terms of test
    interval lengths and the number of failures
    observed in each interval. Several of the models
    described in Section II use this type of input.
  • The failure reporting systems of many
    organizations will more easily support collection
    of this type of data rather than times between
    successive failures. In particular, the use of
    automated test systems can easily establish the
    length of each test interval. Analysis of the
    test run will then provide the number of failures
    for that interval.
  • Disadvantage - failure counts data does not
    provide the resolution that accurately collected
    times between failures provide.

54
Input Data Requirements and Data Collection
Mechanisms (contd)
  • Setting up a Data Collection Mechanism
  • 1. Establish clear, consistent objectives.
  • 2. Develop a plan for the data collection
    process. Involve all individuals concerned (e.g.
    software designers, testers, programmers,
    managers, SQA and SCM staff). Address the
    following issues
  • a. Frequency of data collection.
  • b. Data collection responsibilities
  • c. Data formats
  • d. Processing and storage of data
  • e. Assuring integrity of data/adherence to
    objectives
  • f. Use of existing mechanisms to collect data

55
Input Data Requirements and Data Collection
Mechanisms (contd)
  • Setting up a Data Collection Mechanism (contd)
  • 3. Identify and evaluate tools to support data
    collection effort.
  • 4. Train all parties in use of selected tools.
  • 5. Perform a trial run of the plan prior to
    finalizing it.
  • 6. Monitor the data collection process on a
    regular basis (e.g. weekly intervals) to assure
    that objectives are being met, determine current
    reliability of software, and identify problems in
    collecting/analyzing the data.
  • 7. Evaluate the data on a regular basis. Assess
    software reliability as testing proceeds, not
    only at scheduled release time.
  • 8. Provide feedback to all parties during the data
    collection/analysis effort.

56
Input Data Requirements and Data Collection
Mechanisms (contd)
  • Minimal Set of Required Data - to measure
    software reliability during test, the following
    minimal set of data should be collected by a
    development effort
  • Time between successive failures OR test interval
    lengths/number of failures per test interval.
  • Functional area tested during each interval.
  • Date on which functionality was added to software
    under test; identifier for the functionality added.
  • Number of testers vs. time.
  • Dates on which testing environment changed, and
    nature of changes.
  • Dates on which test method changed.

57
Part VI Early Prediction of Software Reliability
  • Background
  • RADC Study
  • Phase-Based Model

58
Part VI Background
  • Modeling techniques discussed in preceding
    sections can be applied only during test phases.
  • These techniques do not take into account
    structural properties of the system being
    developed or characteristics of the development
    environment.
  • Current techniques can measure software
    reliability, but model outputs cannot be easily
    used to choose development methods or structural
    characteristics that will increase reliability.
  • Measuring software reliability prior to test is
    an open area. Work in this area includes
  • RADC study of 59 projects
  • Phase-Based model
  • Analysis of complexity

59
Part VI RADC Study
  • Study of 59 software development efforts,
    sponsored by RADC in mid 1980s
  • Purpose - develop a method for predicting
    software reliability in the life cycle phases
    prior to test. Acceptable model forms were
  • measures leading directly to reliability/failure
    rate predictions
  • predictions that could be translated to failure
    rates (e.g., error density)
  • Advantages of error density as a software
    reliability figure of merit, according to
    participating investigators
  • It appears to be a fairly invariant number.
  • It can be obtained from commonly available data.
  • It is not directly affected by variables in the
    environment
  • Conversion among error density metrics is fairly
    straightforward.

60
Part VI RADC Study (contd)
  • Advantages of error density as a software
    reliability figure of merit (contd)
  • Possible to include faults found by inspection with
    those found during testing and operations, since
    the time-dependent elements of the latter do not
    need to be accounted for.
  • Major disadvantages cited by the investigators
    are
  • This metric cannot be combined with hardware
    reliability metrics.
  • Does not relate to observations in the user
    environment. It is far easier for users to
    observe the availability of their systems than
    their fault density, and users tend to be far
    more concerned about how frequently they can
    expect the system to go down.
  • No assurance that all of the faults have been
    found.

61
Part VI RADC Study (contd)
  • Given these advantages and disadvantages, the
    investigators decided to attempt prediction of
    error density during the early phases of a
    development effort, and develop a transformation
    function that could be used to interpret the
    predicted error density as a failure rate. The
    driving factor seemed to be that data available
    early in life cycle could be much more easily
    used to predict error densities rather than
    failure rates.

62
Part VI RADC Study (contd)
  • Investigators postulated that the following
    measures representing development environment and
    product characteristics could be used as inputs
    to a model that would predict the error density,
    measured in errors per line of code, at the start
    of the testing phase.
  • A -- Application Type (e.g. real-time control
    system, scientific computation system,
    information management system)
  • D -- Development Environment (characterized by
    development methodology and available tools).
    The types of development environments considered
    are the organic, semi-detached, and embedded
    modes, familiar from the COCOMO cost model.

63
Part VI RADC Study (contd)
  • Measures of development environment and product
    characteristics (contd)
  • Requirements and Design Representation Metrics
  • SA - Anomaly Management
  • ST - Traceability
  • SQ - Incorporation of Quality Review results into
    the software
  • Software Implementation Metrics
  • SL - Language Type (e.g. assembly, high-order
    language, fourth generation language)
  • SS - Program Size
  • SM - Modularity
  • SU - Extent of Reuse
  • SX - Complexity
  • SR - Incorporation of Standards Review results
    into the software

64
Part VI RADC Study (contd)
  • Initial error density at the start of test is given
    by the product of the application-type factor A, the
    development-environment factor D, and the
    requirements/design and implementation metrics listed
    on the preceding slides.
  • Initial failure rate:
    λ0 = F × K × W0
  • F = linear execution frequency of the program
  • K = fault exposure ratio (1.4×10⁻⁷ < K <
    10.6×10⁻⁷, with an average value of 4.2×10⁻⁷)
  • W0 = number of inherent faults

65
Part VI RADC Study (contd)
  • Moreover, F = R/I, where
  • R is the average instruction execution rate
  • I is the number of object instructions in the
    program
  • I can be further rewritten as I = IS × QX, where
  • IS is the number of source instructions,
  • QX is the code expansion ratio (the ratio of
    machine instructions to source instructions,
    which has an average value of 4 according to this
    study).
  • Therefore, the initial failure rate can be
    expressed as
    λ0 = (R × K × W0) / (IS × QX)
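
A small Python sketch of this computation; all parameter values are assumed for
illustration (K and QX use the averages reported by the study):

    R = 2.0e8        # assumed average instruction execution rate (instructions/sec)
    QX = 4           # code expansion ratio (average value from the study)
    IS = 50_000      # assumed number of source instructions
    K = 4.2e-7       # fault exposure ratio (average value from the study)
    W0 = 300         # assumed number of inherent faults

    F = R / (IS * QX)          # linear execution frequency of the program
    lambda_0 = F * K * W0      # initial failure rate, failures per second
    print(F, lambda_0)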

66
Part VI Phase-Based Model
  • Developed by John Gaffney, Jr. and Charles F.
    Davis of the Software Productivity Consortium
  • Makes use of error statistics obtained during
    technical review of requirements, design and the
    implementation to predict software reliability
    during test and operations.
  • Can also use failure data during testing to
    estimate reliability.
  • Assumptions
  • The development effort's current staffing level
    is directly related to the number of errors
    discovered during a development phase.
  • The error discovery curve is monomodal.
  • Code size estimates are available during early
    phases of a development effort.
  • Fagan inspections are used during all development
    phases.

67
Part VI Phase-Based Model
  • The first two assumptions, plus Norden's
    observation that the Rayleigh curve represents
    the "correct" way of applying to a development
    effort, results in the following expression for
    the number of errors discovered during a life
    cycle phase
  • E Total Lifetime Error Rate, expressed in
  • Errors per Thousand Source Lines of Code (KSLOC)
  • t Error Discovery Phase index

68
Part VI Phase-Based Model
  • Note that t does not represent ordinary calendar
    time. Rather, t represents a phase in the
    development process. The values of t and the
    corresponding life cycle phases are
  • t 1 - Requirements Analysis
  • t 2 - Software Design
  • t 3 - Implementation
  • t 4 - Unit Test
  • t 5 - Software Integration Test
  • t 6 - System Test
  • t 7 - Acceptance Test

69
Part VI Phase-Based Model
  • τp, the Defect Discovery Phase Constant, is the
    location of the peak in a continuous fit to the
    failure data. This is the point at which 39% of
    the errors have been discovered; B = 1/(2 τp²).
  • The cumulative form of the model is
    Vt = E [1 - exp(-B t²)]
  • where Vt is the number of errors per KSLOC that
    have been discovered through phase t

70
Part VI Phase-Based Model
71
Part VI Phase-Based Model
  • This model can also be used to estimate the
    number of latent errors in the software. Recall
    that the number of errors per KSLOC removed
    through the n'th phase is
    Vn = E [1 - exp(-B n²)]
  • The number of errors remaining in the software at
    that point is
    E exp(-B n²)
  • times the number of source statements (in KSLOC)
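
A Python sketch of the phase-based model using the expressions above; E, τp, and
the code size are assumed, illustrative values:

    import math

    E = 20.0       # assumed total lifetime error rate, errors per KSLOC
    tau_p = 3.0    # assumed defect discovery phase constant (peak phase)
    B = 1.0 / (2.0 * tau_p ** 2)
    ksloc = 50.0   # assumed code size in KSLOC

    def discovered_in(t):
        """Errors per KSLOC discovered during phase t (t = 1 .. 7)."""
        return E * (math.exp(-B * (t - 1) ** 2) - math.exp(-B * t * t))

    def discovered_through(t):
        """Cumulative errors per KSLOC discovered through phase t."""
        return E * (1.0 - math.exp(-B * t * t))

    latent_after_7 = E * math.exp(-B * 49.0) * ksloc   # latent errors after acceptance test
    print([round(discovered_in(t), 2) for t in range(1, 8)])
    print(round(discovered_through(7), 2), round(latent_after_7, 1))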

72
Part VII Current Work in Estimating Fault Content
  • Analysis of Complexity
  • Regression Tree Modeling

73
Analysis of Complexity
  • The need for measurement
  • The measurement process
  • Measuring software change
  • Faults and fault insertion
  • Fault insertion rates

74
Analysis of Complexity (contd)
  • Recent work has focused on relating measures of
    software structure to fault content.
  • Problem - although different software metrics
    will say different things about a software
    system, they tend to be interrelated and can be
    highly correlated with one another (e.g., McCabe
    complexity and line count are highly correlated).

75
Analysis of Complexity (contd)
  • Relative complexity measure, developed by Munson
    and Khoshgoftaar, attempts to handle the problem
    of interdependence and multicollinearity among
    software metrics.
  • Technique used is factor analysis, whose purpose
    is to decompose a set of correlated measures into
    a set of eigenvalues and eigenvectors.

76
Analysis of Complexity (contd)
  • The need for measurement
  • The measurement process
  • Measuring software change
  • Faults and fault insertion
  • Fault insertion rates

77
Analysis of Complexity - Measuring Software
(Figure: a module's source code is passed through the CMA metric-analysis tool,
producing module characteristics such as LOC = 14, Stmts = 12, N1 = 30, N2 = 23,
eta1 = 15, eta2 = 12.)
78
Analysis of Complexity - Simplifying Measurements
(Figure: the raw metrics produced by metric analysis (CMA) for each module of a
program are reduced by principal components analysis (PCA/RCM) to a single
relative complexity value per module.)
79
Analysis of Complexity - Relative Complexity
  • Relative complexity is a synthesized metric
  • Relative complexity is a fault surrogate
  • Composed of metrics closely related to faults
  • Highly correlated with faults
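
The sketch below illustrates the general idea of synthesizing a relative
complexity value from correlated raw metrics using a principal-components style
decomposition in NumPy; the metric values, the number of retained components,
and the mean-50/standard-deviation-10 rescaling are assumptions for
illustration, not the exact RCM procedure:

    import numpy as np

    # Rows = modules, columns = raw metrics (LOC, statements, N1, N2, eta1, eta2)
    raw = np.array([
        [120,  95, 300, 210, 40, 32],
        [ 60,  50, 150, 100, 25, 20],
        [200, 170, 520, 380, 55, 44],
        [ 90,  75, 240, 160, 35, 28],
    ], dtype=float)

    # Standardize, then decompose the correlation structure (factor-analytic step).
    z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)

    # Keep the dominant components and weight their scores by explained variance.
    order = np.argsort(eigvals)[::-1]
    keep = order[:2]                                  # assumed: two domains retained
    scores = z @ eigvecs[:, keep]
    rho = scores @ (eigvals[keep] / eigvals[keep].sum())

    # Rescale to a mean-50, standard-deviation-10 relative complexity scale.
    relative_complexity = 50 + 10 * (rho - rho.mean()) / rho.std()
    print(np.round(relative_complexity, 1))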

80
Analysis of Complexity (contd)
  • The need for measurement
  • The measurement process
  • Measuring software change
  • Faults and fault insertion
  • Fault insertion rates

81
Analysis of Complexity (contd)
  • Software Evolution
  • We assume that we are developing (maintaining) a
    program
  • We are really working with many programs over
    time
  • They are different programs in a very real sense
  • We must identify and measure each version of each
    program module

82
Analysis of Complexity (contd)
  • Evolution of the STS Primary Avionics Software
    System (PASS)

83
Analysis of Complexity (contd)
(Figure: "The Problem" - a system evolves from build N to build N+1, so
measurements and fault counts must be tracked across successive builds.)
84
Analysis of Complexity (contd)
  • Managing fault counts during evolution
  • Some faults are inserted during branch builds
  • These fault counts must be removed when the
    branch is pruned
  • Some faults are eliminated on branch builds
  • These faults must be removed from the main
    sequence build
  • Fault count should contain only those faults on
    the main sequence to the current build
  • Faults attributed to modules not in the current
    build must be removed from the current count

85
Analysis of Complexity (contd)
  • Baselining a software system
  • Software changes over software builds
  • Measurements, such as relative complexity, change
    across builds
  • Initial build as a baseline
  • Relative complexity of each build
  • Measure change in fault surrogate from initial
    baseline

86
Analysis of Complexity - Measurement Baseline
87
Analysis of Complexity - Baseline Components
  • Vector of means
  • Vector of standard deviations
  • Transformation matrix

88
Analysis of Complexity - Comparing Two Builds
(Figure: source code from build i and build j is run through the measurement
tools and baselined against the measurement baseline; the baselined builds yield
RCM values, from which the code deltas and code churn between the two builds are
computed.)
89
Analysis of Complexity - Measuring Evolution
  • Different modules appear in different builds:
  • the set of modules not in the latest build
  • the set of modules not in the early build
  • the set of common modules
  • Code delta - the change in a module's relative
    complexity (fault surrogate) between two builds
  • Code churn - the absolute value of the code delta
  • Net code churn - the churn accumulated over all
    modules, including those added and deleted
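
A small Python sketch of code delta and code churn between two builds, using
baselined relative complexity values per module; the handling of added and
deleted modules here is an assumption for illustration:

    # Baselined relative complexity (fault surrogate) per module in two builds;
    # module names and values are illustrative.
    build_i = {"mod_a": 48.0, "mod_b": 55.0, "mod_c": 61.0}
    build_j = {"mod_a": 50.0, "mod_b": 52.0, "mod_d": 58.0}

    common = build_i.keys() & build_j.keys()          # modules in both builds
    added = build_j.keys() - build_i.keys()           # not in the early build
    deleted = build_i.keys() - build_j.keys()         # not in the latest build

    deltas = {m: build_j[m] - build_i[m] for m in common}

    # Net code delta: signed change, crediting added and debiting deleted modules.
    code_delta = (sum(deltas.values())
                  + sum(build_j[m] for m in added)
                  - sum(build_i[m] for m in deleted))

    # Code churn: total amount of change, counting added and deleted modules in full.
    code_churn = (sum(abs(d) for d in deltas.values())
                  + sum(build_j[m] for m in added)
                  + sum(build_i[m] for m in deleted))

    print(round(code_delta, 1), round(code_churn, 1))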

90
Analysis of Complexity (contd)
  • The need for measurement
  • The measurement process
  • Measuring software change
  • Faults and fault insertion
  • Fault insertion rates

91
Analysis of Complexity - Fault Insertion
(Figure: between build N and build N+1, some existing faults carry over, some
faults are removed, and new faults are added.)
92
Analysis of Complexity - Identifying and Counting
Faults
  • Unlike failures, faults are not directly
    observable
  • fault counts should be at same level of
    granularity as software structure metrics
  • Failure counts could be used as a surrogate for
    fault counts if
  • Number of faults were related to number of
    failures
  • Distribution of number of faults per failure had
    low variance
  • The faults associated with a failure were
    confined to a single procedure/function
  • Actual situation shown on next slide

93
Analysis of Complexity - Observed Distribution of
Faults per Failure
94
Analysis of Complexity - Fault Identification and
Counting Rules
  • Taxonomy based on corrective actions taken in
    response to failure reports
  • faults in variable usage
  • Definition and use of new variables
  • Redefinition of existing variables (e.g. changing
    type from float to double)
  • Variable deletion
  • Assignment of a different value to a variable
  • faults involving constants
  • Definition and use of new constants
  • Constant definition deletion

95
Analysis of Complexity - Fault Identification and
Counting Rules (contd)
  • Control flow faults
  • Addition of new source code block
  • Deletion of erroneous conditionally-executed
    path(s) within a set of conditionally executed
    statements
  • Addition of execution paths within a set of
    conditionally executed statements
  • Redefinition of an existing condition for execution
    (e.g., change "if i < 9" to "if i <= 9")
  • Removal of source code block
  • Incorrect order of execution
  • Addition of a procedure or function
  • Deletion of a procedure or function

96
Analysis of Complexity (contd)
  • Control flow fault examples - removing execution
    paths from a code block
  • Counts as two faults, since two paths were removed

97
Analysis of Complexity (contd)
  • Control flow fault examples (contd) - addition of
    conditional execution paths to a code block
  • Counts as three faults, since three paths were added
98
Analysis of Complexity - Estimating Fault Content
  • The fault potential of a module i is directly
    proportional to its relative complexity ρi
  • From previous development projects, develop a
    proportionality constant, k, for total faults
  • Faults per module: gi = k ρi
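
A minimal Python sketch of the fault-content estimate; the relative complexity
values and the constant k are illustrative assumptions:

    # Relative complexity (fault surrogate) of each module in the current build
    rho = {"mod_a": 50.0, "mod_b": 62.0, "mod_c": 38.0}

    # Proportionality constant from previous projects: faults per unit of
    # relative complexity (assumed value for illustration)
    k = 0.04

    faults_per_module = {m: k * r for m, r in rho.items()}
    total_estimated_faults = sum(faults_per_module.values())
    print(faults_per_module, round(total_estimated_faults, 1))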

99
Analysis of Complexity - Estimating Fault
Insertion Rate
  • Proportionality constant, k, representing the
    rate of fault insertion
  • For the jth build, total faults inserted
  • Estimate of the fault insertion rate

100
Analysis of Complexity (contd)
  • The need for measurement
  • The measurement process
  • Measuring software change
  • Faults and fault insertion
  • Fault insertion rates

101
Analysis of Complexity - Relationships Between
Change in Fault Count and Structural Change
  • code churn
  • code delta

102
Analysis of Complexity - Regression Models
  • the response variable is the number of faults inserted
    between builds j and j+1
  • one predictor is the measured code churn between builds j
    and j+1
  • the other predictor is the measured code delta between builds j
    and j+1
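
A Python sketch of fitting such a regression model by least squares with NumPy;
the churn, delta, and fault-count data are illustrative:

    import numpy as np

    # Per build-pair measurements (illustrative): code churn, code delta,
    # and the number of faults inserted between builds j and j+1
    churn = np.array([12.0, 30.5, 8.2, 22.1, 40.0])
    delta = np.array([4.0, 18.0, -2.5, 10.3, 25.2])
    faults = np.array([3., 9., 1., 6., 12.])

    # Least-squares fit of faults ~ b0 + b1 * churn + b2 * delta
    X = np.column_stack([np.ones_like(churn), churn, delta])
    coef, *_ = np.linalg.lstsq(X, faults, rcond=None)
    predicted = X @ coef

    print(np.round(coef, 3))
    print(np.round(predicted, 1))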

103
Analysis of Complexity - PRESS Scores - Linear
vs. Nonlinear Models

104
Analysis of Complexity - Selecting an Adequate
Linear Model
  • The linear model gives the best R² and PRESS score.
  • Is the model based only on code churn an adequate
    predictor at the 5% significance level?
  • The R²-adequate test shows that code churn is not an
    adequate predictor at the 5% significance level.

105
Analysis of Complexity - Analysis of Predicted
Residuals
106
Regression Tree Modeling
  • Objectives
  • Attractive way to encapsulate the knowledge of
    experts and to aid decision making.
  • Uncovers structure in data
  • Can handle data with complicated and unexplained
    irregularities
  • Can handle both numeric and categorical variables
    in a single model.

107
Regression Tree Modeling (contd)
  • Algorithm
  • Determine set of predictor variables (software
    metrics) and a response variable (number of
    faults).
  • Partition the predictor variable space such that
    each partition or subset is homogeneous with
    respect to the dependent variable.
  • Establish a decision rule based on the predictor
    variables which will identify the programs with
    the same number of faults.
  • Predict the value of the dependent variable which
    is the average of all the observations in the
    partition.

108
Regression Tree Modeling (contd)
  • Algorithm (contd)
  • Minimize the deviance function given by
    D = Σ over leaf nodes L of Σ over observations i in L of (yi - ȳL)²
  • Establish stopping criteria based on
  • Cardinality threshold - leaf node is smaller than a
    certain absolute size.
  • Homogeneity threshold - deviance of leaf node is
    less than some small percentage of the deviance
    of the root node, i.e., the leaf node is homogeneous
    enough.
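
A short Python sketch of the approach using scikit-learn's DecisionTreeRegressor
(an assumed tool choice, not the one used in the study); min_samples_leaf stands
in for the cardinality threshold and min_impurity_decrease approximates the
homogeneity threshold:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor   # assumed tool choice

    # Predictor variables: software metrics per module; response: fault counts
    metrics = np.array([
        [120, 10, 5], [300, 25, 12], [80, 6, 3], [450, 40, 20],
        [200, 18, 9], [60, 5, 2], [350, 30, 15], [150, 12, 6],
    ], dtype=float)
    faults = np.array([2, 7, 1, 12, 5, 0, 9, 3], dtype=float)

    tree = DecisionTreeRegressor(min_samples_leaf=2, min_impurity_decrease=0.01,
                                 random_state=0)
    tree.fit(metrics, faults)

    # Each leaf predicts the average fault count of the modules in its partition.
    print(tree.predict(np.array([[250.0, 20.0, 10.0]])))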

109
Regression Tree Modeling (contd)
  • Application
  • Software for medical imaging system, consisting
    of 4500 modules amounting to 400,000 lines of
    code written in Pascal, FORTRAN, assembly
    language, and PL/M.
  • Random sample of 390 modules from the ones
    written in Pascal and FORTRAN, consisting of
    about 40,000 lines of code.
  • Software was developed over a period of five
    years, and had been in use at several hundred
    sites.
  • Number of changes made to the executable code
    documented by Change Reports (CRs) indicates
    software development effort.

111
Regression Tree Modeling (contd)
  • Application (contd)
  • Metrics
  • Total lines of code (TC)
  • Number of code lines (CL)
  • Number of characters (Cr)
  • Number of comments (Cm)
  • Comment characters (CC)
  • Code characters (Co)
  • Halstead's program length
  • Halstead's estimate of program length metric
  • Jensen's estimate of program length metric
  • Cyclomatic complexity metric
  • Bandwidth metric

112
Regression Tree Modeling (contd)
  • Pruning
  • Tree grown using the stopping rules is too
    elaborate.
  • Pruning - equivalent to variable selection in
    linear regression.
  • Determines a nested sequence of subtrees of the
    given tree by recursively snipping off partitions
    with minimal gains in deviance reduction
  • Degree of pruning can be determined by using
    cross-validation.

114
Regression Tree Modeling (contd)
  • Cross-Validation
  • Evaluate the predictive performance of the
    regression tree and degree of pruning in the
    absence of a separate validation set.
  • Data are divided into two mutually exclusive
    sets, viz., learning sample and test sample.
  • Learning sample is used to grow the tree, while
    the test sample is used to evaluate the tree
    sequence.
  • Deviance - measure to assess the performance of
    the prediction rule in predicting the number of
    errors for the test sample of different tree
    sizes.

115
Regression Tree Modeling (contd)
  • Performance Analysis
  • Two types of errors
  • Predict more faults than the actual number - Type
    I misclassification.
  • Predict fewer faults than actual number - Type II
    error.
  • Type II error is more serious.
  • The Type II error rate for tree modeling is 8.7%,
    and for the fault density approach it is 13.1%.
  • The tree modeling approach is significantly better
    than the fault density approach.
  • Can also be used to classify modules into
    fault-prone and non-fault-prone categories.
  • Decision rule - classifies the module as
    fault-prone if the predicted number of faults is
    greater than a certain cutoff value a.
  • Choice of a determines the misclassification rate.

116
Part VIII Software Reliability Tools
  • SRMP
  • SMERFS
  • CASRE

117
Where Do They Come From?
  • Software Reliability Modeling Program (SRMP)
  • Bev Littlewood of City University, London
  • Statistical Modeling and Estimation of
    Reliability Functions for Software (SMERFS)
  • William Farr of Naval Surface Warfare Center
  • Computer-Aided Software Reliability Estimation
    Tool (CASRE)
  • Allen Nikora, JPL; Michael Lyu, Chinese
    University of Hong Kong

118
SRMP Main Features
  • Multiple Models (9)
  • Model Application Scheme Multiple Iterations
  • Data Format Time-Between-Failures Data Only
  • Parameter Estimation Maximum Likelihood
  • Multiple Evaluation Criteria - Prequential
    Likelihood, Bias, Bias Trend, Model Noise
  • Simple U-Plots and Y-Plots

119
SMERFS Main Features
  • Multiple Models (12)
  • Model Application Scheme Single Execution
  • Data Format Failure-Counts and Time-Between
    Failures
  • On-line Model Description Manual
  • Two parameter Estimation Methods
  • Least Square Method
  • Maximum Likelihood Method
  • Goodness-of-fit Criteria Chi-Square Test, KS
    Test
  • Model Applicability - Prequential Likelihood,
    Bias, Bias Trend, Model Noise
  • Simple Plots

120
The SMERFS Tool Main Menu
  • Data Input
  • Data Edit
  • Data Transformation
  • Data Statistics
  • Plots of the Raw Data
  • Model Applicability Analysis
  • Executions of the Models
  • Analyses of Model Fit
  • Stop Execution of SMERFS

121
CASRE Main Features
  • Multiple Models (12)
  • Model Application Scheme Multiple Iterations
  • Goodness-of-Fit Criteria - Chi-Square Test, KS
    Test
  • Multiple Evaluation Criteria - Prequential
    Likelihood, Bias, Bias Trend, Model Noise
  • Conversions between Failure-Counts Data and
    Time-Between-Failures Data
  • Menu-Driven, High-Resolution Graphical User
    Interface
  • Capability to Make Linear Combination Models

122
CASRE High-Level Architecture
123
Further Reading
  • A. A. Abdel-Ghaly, P. Y. Chan, and B. Littlewood
    "Evaluation of Competing Software Reliability
    Predictions," IEEE Transactions on Software
    Engineering vol. SE-12, pp. 950-967 Sep. 1986.
  • T. Bowen, "Software Quality Measurement for
    Distributed Systems", RADC TR-83-175.
  • W. K. Erlich, A. Iannino, B. S. Prasanna, J. P.
    Stampfel, and J. R. Wu, "How Faults Cause
    Software Failures Implications for Software
    Reliability Engineering", published in
    proceedings of the International Symposium on
    Software Reliability Engineering, pp 233-241, May
    17-18, 1991, Austin, TX
  • M. E. Fagan, "Advances in Software Inspections",
    IEEE Transactions on Software Engineering, vol
    SE-12, no 7, July, 1986, pp 744-751
  • M. E. Fagan, "Design and Code Inspections to
    Reduce Errors in Program Development," IBM
    Systems Journal, Volume 15, Number 3, pp 182-211,
    1976
  • W. H. Farr, O. D. Smith, and C. L.
    Schimmelpfenneg, "A PC Tool for Software
    Reliability Measurement," published in the 1988
    Proceedings of the Institute of Environmental
    Sciences, King of Prussia, PA

124
Further Reading (contd)
  • W. H. Farr, O. D. Smith, "Statistical Modeling
    and Estimation of Reliability Functions for
    Software (SMERFS) User's Guide," Naval Weapons
    Surface Center, December 1988 (approved for
    unlimited public distribution by NSWC)
  • J. E. Gaffney, Jr. and C. F. Davis, "An Approach
    to Estimating Software Errors and Availability,"
    SPC-TR-88-007, version 1.0, March, 1988,
    proceedings of Eleventh Minnowbrook Workshop on
    Software Reliability, July 26-29, 1988, Blue
    Mountain Lake, NY
  • J. E. Gaffney, Jr. and J. Pietrolewicz, "An
    Automated Model for Software Early Error
    Prediction (SWEEP)," Proceedings of Thirteenth
    Minnow-brook Workshop on Software Reliability,
    July 24-27, 1990, Blue Mountain Lake, NY
  • A. L. Goel, S. N. Sahoo, "Formal Specifications
    and Reliability An Experimental Study",
    published in proceedings of the International
    Symposium on Software Reliability Engineering, pp
    139-142, May 17-18, 1991, Austin, TX
  • A. Grnarov, J. Arlat, A. Avizienis, "On the
    Performance of Software Fault-Tolerance
    Strategies", published in the proceedings of the
    Tenth International Symposium on Fault Tolerant
    Computing (FTCS-10), Kyoto, Japan, October, 1980,
    pp 251-253

125
Further Reading (contd)
  • K. Kanoun, M. Bastos Martini, J. Moreira De
    Souza, A Method for Software Reliability
    Analysis and Prediction - Application to the
    TROPICO-R Switching System, IEEE Transactions on
    Software Engineering, April 1991, pp, 334-344
  • J. C. Kelly, J. S. Sherif, J. Hops, "An Analysis
    of Defect Densities Found During Software
    Inspections", Journal of Systems Software, vol
    17, pp 111-117, 1992
  • T. M. Khoshgoftaar and J. C. Munson, "A Measure
    of Software System Complexity and its
    Relationship to Faults," proceedings of 1992
    International Simulation Technology Conference
    and 992 Workshop on Neural Networks (SIMTEC'92 -
    sponsored by the Society for Computer
    Simulation), pp. 267-272, November 4-6, 1992,
    Clear Lake, TX
  • M. Lu, S. Brocklehurst, and B. Littlewood,
    "Combination of Predictions Obtained from
    Different Software Reliability Growth Models,"
    proceedings of the IEEE 10th Annual Software
    Reliability Symposium, pp 24-33, June 25-26,
    1992, Denver, CO
  • M. Lyu, ed., Handbook of Software Reliability
    Engineering, McGraw-Hill and IEEE Computer
    Society Press, 1996, ISBN 0-07-039400-8

126
Further Reading (contd)
  • M. Lyu, "Measuring Reliability of Embedded
    Software An Empirical Study with JPL Project
    Data," published in the Proceedings of the
    International Conference on Probabilistic Safety
    Assessment and Management February 4-6, 1991,
    Los Angeles, CA.
  • M. Lyu and A. Nikora, "A Heuristic Approach for
    Software Reliability Prediction The
    Equally-Weighted Linear Combination Model,"
    published in the proceedings of the IEEE
    International Symposium on Software Reliability
    Engineering, May 17-18, 1991, Austin, TX M. Lyu
    and A. Nikora, "Applying Reliability Models More
    Effectively", IEEE Software, vol. 9, no. 4, pp.
    43-52, July, 1992
  • M. Lyu and A. Nikora, "Software Reliability
    Measurements Through Com-bination Models
    Approaches, Results, and a CASE Tool,"
    proceedings the 15th Annual International
    Computer Software and Applications Conference
    COMPSAC91), September 11-13, 1991, Tokyo, Japan
  • J. McCall, W. Randall, S. Fenwick, C. Bowen, P.
    Yates, N. McKelvey, M. Hecht, H. Hecht, R. Senn,
    J. Morris, R. Vienneau, "Methodology for Software
    Reliability Prediction and Assessment," Rome Air
    Development Center (RADC) Technical Report
    RADC-TR-87-171. volumes 1 and 2, 1987
  • J. Munson and T. Khoshgoftaar, "The Use of
    Software Metrics in Reliability Models,"
    presented at the initial meeting of the IEEE
    Subcommittee on Software Reliability Engineering,
    April 12-13, 1990, Washington, DC

127
Further Reading (contd)
  • J. C. Munson, "Software Measurement Problems and
    Practice," Annals of Software Engineering, J. C.
    Baltzer AG, Amsterdam 1995.
  • J. C. Munson, Software Faults, Software
    Failures, and Software Reliability Modeling,
    Information and Software Technology, December,
    1996.
  • J. C. Munson and T. M. Khoshgoftaar
    Regression Modeling of Software Quality An
    Empirical Investigation, Journal of Information
    and Software Technology, 32, 1990, pp. 105-114.
  • J. Munson, A. Nikora, Estimating Rates of Fault
    Insertion and Test Effectiveness in Software
    Systems, invited paper, published in Proceedings
    of the Fourth ISSAT International Conference on
    Quality and Reliability in Design, Seattle, WA,
    August 12-14, 1998
  • John D. Musa., Anthony Iannino, Kazuhiro Okumoto,
    Software Reliability Measurement, Prediction,
    Application McGraw-Hill, 1987 ISBN
    0-07-044093-X.
  • A. Nikora, J. Munson, Finding Fault with Faults
    A Case Study, presented at the Annual Oregon
    Workshop on Software Metrics, May 11-13, 1997,
    Coeur d'Alene, ID.
  • A. Nikora, N. Schneidewind, J. Munson, "IV&V
    Issues in Achieving High Reliability and Safety
    in Critical Control System Software," proceedings
    of the Third ISSAT International Conference on
    Reliability and Quality in Design, March 12-14,
    1997, Anaheim, CA.

128
Further Reading (contd)
  • A. Nikora, J. Munson, Determining Fault
    Insertion Rates For Evolving Software Systems,
    proceedings of the Ninth International Symposium
    on Software Reliability Engineering, Paderborn,
    Germany, November 4-7, 1998
  • Norman F. Schneidewind, Ted W. Keller, "Applying
    Reliability Models to the Space Shuttle", IEEE
    Software, pp 28-33, July, 1992
  • N. Schneidewind, Reliability Modeling for
    Safety-Critical Software, IEEE Transactions on
    Reliability, March, 1997, pp. 88-98
  • N. Schneidewind, "Measuring and Evaluating
    Maintenance Process Using Reliability, Risk, and
    Test Metrics", proceedings of the International
    Conference on Software Maintenance, September
    29-October 3, 1997, Bari, Italy.
  • N. Schneidewind, "Software Metrics Model for
    Integrating Quality Control and Prediction",
    proceedings of the 8th International Symposium on
    Software Reliability Engineering, November 2-5,
    1997, Albuquerque, NM.
  • N. Schneidewind, "Software Metrics Model for
    Quality Control", Proceedings of the Fourth
    International Software Metrics Symposium,
    November 5-7, 1997, Albuquerque, NM.

129
Additional Information
  • CASRE Screen Shots
  • Further modeling details
  • Additional Software Reliability Models
  • Quantitative Criteria for Model Selection the
    Subadditivity Property
  • Increasing the Predictive Accuracy of Models

130
CASRE - Initial Display
131
CASRE - Applying Filters
132
CASRE - Running Average Trend Test
133
CASRE - Laplace Test
134
CASRE - Selecting and Running Models
135
CASRE - Displaying Model Results
136
CASRE - Displaying Model Results (contd)
137
CASRE - Prequential Likelihood Ratio
138
CASRE - Model Bias
139
CASRE - Model Bias Trend
140
CASRE - Ranking Models
141
CASRE - Model Ranking Details
142
CASRE - Model Ranking Details (contd)
143
CASRE - Model Results Table
144
CASRE - Model Results Table (contd)
145
CASRE - Model Results Table (contd)
146
Additional Software Reliability Models
  • Software Reliability Estimation Models
  • Exponential NHPP Models
  • Generalized Poisson Model
  • Non-homogeneous Poisson Process Model
  • Musa Basic Model
  • Musa Calendar Time Model
  • Schneidewind Model
  • Littlewood-Verrall Bayesian Model
  • Hyperexponential Model

147
Generalized Poisson Model
  • Proposed by Schafer, Alter, Angus, and Emoto for
    Hughes Aircraft Company under contract to RADC in
    1979.
  • Model is analogous in form to the
    Jelinski-Moranda model but taken within the error
    count framework. The model can be shown to reduce
    to the Jelinski-Moranda model under the
    appropriate circumstances.

148
Generalized Poisson Model (cont'd)
  • Assumptions
  • The expected number of errors occurring in any
    time interval is proportional to the error
    content at the time of testing and to some
    function of the amount of time spent in error
    testing.
  • All errors are equally likely to occur and are
    independent of each other.
  • Each error is of the same order of severity as
    any other error.
  • The software is operated in a similar manner as
    the anticipated usage.
  • The errors are corrected at the ends of the
    testing intervals without introduction of new
    errors into the program.

149
Generalized Poisson Model (cont'd)
  • Construction of Model
  • Given testing intervals of length X1, X2,...,Xn
  • fi = errors discovered during the i'th interval
  • At the end of the i'th interval, a total of Mi
    errors have been corrected
  • First assumption of the model yields
  • E(fi) = φ (N - M(i-1)) gi(x1, x2, ..., xi)
  • where
  • φ is a proportionality constant
  • N is the initial number of errors
  • gi is a function of the amount of testing time
    spent, previously and currently. gi is usually
    non-decreasing. If gi(x1, x2, ..., xi) = xi,
    then the model reduces to the Jelinski-Moranda
    model.

150
Schneidewind Model
  • Proposed by Norman Schneidewind in 1975.
  • Model's basic premise is that as the testing
    progresses over time, the error detection process
    changes. Therefore, recent error counts are
    usually of more use than earlier counts in
    predicting future error counts.
  • Schneidewind identifies three approaches to using
    the error count data. These are identified in
    the following slide.

151
Schneidewind Model
  • First approach is to use all of the error counts
    for all testing intervals.
  • Second approach is to use only the error counts
    from test intervals s through m and ignore
    completely the error counts from the first s - 1
    test intervals, assuming that there have been m
    test intervals to date.
  • Third approach is a hybrid approach which uses
    the cumulative error count for the first s - 1
    intervals and the individual error counts for the
    last m - s + 1 intervals.

152
Schneidewind Model (cont'd)
  • Assumptions
  • The number of errors detected in one interval is
    independent of the error count in another.
  • The error correction rate is proportional to the
    number of errors to be corrected.
  • The software is operated in a similar manner as
    the anticipated operational usage.
  • The mean number of detected errors decreases from
    one interval to the next.
  • The intervals are all of the same length.

153
Schneidewind Model (cont'd)
  • Assumptions (cont'd)
  • The rate of error detection is proportional to the
    number of