
Software Reliability Engineering: Techniques and Tools

CS130 Winter, 2002

Source Material

- Software Reliability and Risk Management: Techniques and Tools, Allen Nikora and Michael Lyu, tutorial presented at the 1999 International Symposium on Software Reliability Engineering
- Allen Nikora and John Munson, "Determining Fault Insertion Rates For Evolving Software Systems," Proceedings of the International Symposium on Software Reliability Engineering, Paderborn, Germany, November 1998

Agenda

- Part I Introduction
- Part II Survey of Software Reliability Models
- Part III Quantitative Criteria for Model Selection
- Part IV Input Data Requirements and Data Collection Mechanisms
- Part V Early Prediction of Software Reliability
- Part VI Current Work in Estimating Fault Content
- Part VII Software Reliability Tools

Part I Introduction

- Reliability Measurement Goal
- Definitions
- Reliability Theory

Reliability Measurement Goal

- Reliability measurement is a set of mathematical techniques that can be used to estimate and predict the reliability behavior of software during its development and operation.
- The primary goal of software reliability modeling is to answer the following question: given a system, what is the probability that it will fail in a given time interval, or, what is the expected duration between successive failures?

Basic Definitions

- Software Reliability R(t): the probability of failure-free operation of a computer program for a specified time under a specified environment.
- Failure: the departure of program operation from user requirements.
- Fault: a defect in a program that causes a failure.

Basic Definitions (contd)

- Failure Intensity (rate) f(t): the expected number of failures experienced in a given time interval.
- Mean-Time-To-Failure (MTTF): the expected value of a failure interval.
- Expected total failures m(t): the number of failures expected in a time period t.

Reliability Theory

- Let "T" be a random variable representing the failure time or lifetime of a physical system.
- For this system, the probability that it will fail by time "t" is F(t) = P(T ≤ t).
- The probability of the system surviving until time "t" is R(t) = 1 − F(t) = P(T > t).

Reliability Theory (contd)

- Failure rate - the probability that a failure will occur in the interval [t1, t2], given that a failure has not occurred before time t1. This is written as
  [F(t2) − F(t1)] / [(t2 − t1) R(t1)]

Reliability Theory (contd)

- Hazard rate - the limit of the failure rate as the length of the interval approaches zero. This is written as
  z(t) = lim Δt→0 [F(t + Δt) − F(t)] / [Δt · R(t)] = f(t) / R(t)
- This is the instantaneous failure rate at time t, given that the system survived until time t. The terms hazard rate and failure rate are often used interchangeably.

Reliability Theory (contd)

- A reliability objective expressed in terms of one reliability measure can easily be converted into another measure as follows (assuming an average failure rate, λ, is measured):
  R(t) = exp(−λt),  MTTF = 1/λ
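The conversions above can be sketched in a few lines of Python; this assumes a constant (exponential) failure rate λ, and the numeric values below are illustrative only:

```python
import math

def reliability(lam, t):
    """R(t) = exp(-lam * t) under a constant failure rate lam."""
    return math.exp(-lam * t)

def mttf(lam):
    """Mean time to failure for an exponential failure model: 1/lam."""
    return 1.0 / lam

def failure_rate_from_reliability(r, t):
    """Invert R(t) = exp(-lam * t) to recover lam from a measured R(t)."""
    return -math.log(r) / t

# Illustrative: lam = 0.02 failures/hour gives MTTF = 50 hours and
# a 10-hour reliability of exp(-0.2) ~ 0.819.
lam = 0.02
r10 = reliability(lam, 10.0)
```

Any one of the three measures determines the other two under this assumption.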


Part II Survey of Software Reliability Models

- Software Reliability Estimation Models
- Exponential NHPP Models
- Jelinski-Moranda/Shooman Model
- Musa-Okumoto Model
- Geometric Model
- Software Reliability Modeling and Acceptance Testing

Jelinski-Moranda/Shooman Models

- The Jelinski-Moranda model was developed by Jelinski and Moranda of McDonnell Douglas Astronautics Company for use on Navy NTDS software and a number of modules of the Apollo program. The Jelinski-Moranda model was published in 1971.
- Shooman's model, discovered independently of Jelinski and Moranda's work, was also published in 1971. Shooman's model is identical to the JM model.

Jelinski-Moranda/Shooman (cont'd)

- Assumptions
- The number of errors in the code is fixed.
- No new errors are introduced into the code through the correction process.
- The number of machine instructions is essentially constant.
- Detections of errors are independent.
- The software is operated in a similar manner as the anticipated operational usage.
- The error detection rate is proportional to the number of errors remaining in the code.

Jelinski-Moranda/Shooman (cont'd)

- Let τ represent the amount of debugging time spent on the system since the start of the test phase.
- From assumption 6, we have z(τ) = K εr(τ), where K is the proportionality constant and εr is the error rate (the number of remaining errors normalized with respect to the number of instructions): εr(τ) = ET/IT − εc(τ)
- ET = number of errors initially in the program
- IT = number of machine instructions in the program
- εc(τ) = cumulative number of errors fixed in the interval [0, τ], normalized by the number of instructions

Jelinski-Moranda/Shooman (cont'd)

- ET and IT are constant (assumptions 1 and 3).
- No new errors are introduced through the correction process (assumption 2).
- As τ → ∞, εc(τ) → ET/IT, so εr(τ) → 0.
- The hazard rate becomes z(τ) = K [ET/IT − εc(τ)]

Jelinski-Moranda/Shooman (cont'd)

- The reliability function becomes R(t) = exp(−z(τ) t)
- The expression for MTTF is MTTF = 1/z(τ) = 1/(K [ET/IT − εc(τ)])
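A minimal sketch of the Jelinski-Moranda quantities in Python; the parameter values below (100 initial errors, 10,000 instructions, K = 500) are illustrative assumptions, not data from the tutorial:

```python
def jm_hazard(E_T, I_T, eps_c, K):
    """Jelinski-Moranda hazard rate: z(tau) = K * (E_T / I_T - eps_c(tau))."""
    return K * (E_T / I_T - eps_c)

def jm_mttf(E_T, I_T, eps_c, K):
    """MTTF is the reciprocal of the hazard rate, which is constant
    between successive failures in this model."""
    return 1.0 / jm_hazard(E_T, I_T, eps_c, K)

# Illustrative: 100 initial errors in a 10,000-instruction program,
# with 40 errors fixed so far (normalized: 40/10000 = 0.004).
z = jm_hazard(E_T=100, I_T=10_000, eps_c=0.004, K=500.0)  # 500 * 0.006 = 3.0
```

As more errors are fixed, eps_c grows, the hazard rate falls, and the MTTF rises, which is the model's notion of reliability growth.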

Geometric Model

- Proposed by Moranda in 1975 as a variation of the Jelinski-Moranda model.
- Unlike models previously discussed, it does not assume that the number of errors in the program is finite, nor does it assume that errors are equally likely to occur.
- This model assumes that errors become increasingly difficult to detect as debugging progresses, and that the program is never completely error free.

Geometric Model (cont'd)

- Assumptions
- There are an infinite number of total errors.
- All errors do not have the same chance of detection.
- The detections of errors are independent.
- The software is operated in a similar manner as the anticipated operational usage.
- The error detection rate forms a geometric progression and is constant between error occurrences.

Geometric Model (cont'd)

- The above assumptions result in the following hazard rate:
- z(t) = D φ^(i−1)
- for any time "t" between the (i − 1)st and the i-th error.
- The initial value is z(t) = D.
Geometric Model (cont'd)

Hazard Rate Graph

The hazard rate starts at D and steps down geometrically at each error occurrence: D, Dφ, Dφ², ... over time.

Musa-Okumoto Model

- The Musa-Okumoto model assumes that the failure intensity function decreases exponentially with the number of failures observed: λ(t) = λ0 exp(−θ μ(t))
- Since λ(t) = dμ(t)/dt, we have the following differential equation: dμ(t)/dt = λ0 exp(−θ μ(t))

Musa-Okumoto Model (contd)

- Note that exp(θ μ(t)) dμ(t)/dt = λ0
- We then obtain d[exp(θ μ(t))]/dt = λ0 θ

Musa-Okumoto Model (contd)

- Integrating this last equation yields exp(θ μ(t)) = λ0 θ t + C
- Since μ(0) = 0, C = 1, and the mean value function μ(t) is μ(t) = (1/θ) ln(λ0 θ t + 1)

Software Reliability Modeling and Acceptance Testing

- Given a piece of software advertised as having a failure rate λ, you can see if it meets that failure rate to a specific level of confidence.
- α is the risk (probability) of falsely saying that the software does not meet the failure rate goal.
- β is the risk of saying that the goal is met when it is not.
- The discrimination ratio, γ, is the factor you specify that identifies acceptable departure from the goal. For instance, if γ = 2, the acceptable failure rate lies between λ/2 and 2λ.

Software Reliability Modeling and Acceptance Testing (cont'd)

The chart plots failure number against normalized failure time (time to failure times the failure intensity objective) and is divided into three regions: Reject, Continue, and Accept.

Software Reliability Modeling and Acceptance Testing (cont'd)

- We can now draw a chart as shown in the previous slide. Define intermediate quantities A and B as follows:
- The boundary between the reject and continue regions is given by
- where n is the number of failures observed. The boundary between the continue and accept regions of the chart is given by
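The equations for A, B, and the two boundaries did not survive conversion to text. A standard Wald sequential probability ratio test (SPRT) formulation consistent with the α, β, and γ quantities defined above is the following reconstruction (an assumption, not necessarily the tutorial's exact notation):

```latex
A = \ln\frac{1-\beta}{\alpha}, \qquad B = \ln\frac{\beta}{1-\alpha}
% Testing H_0: \lambda = \lambda_0 against H_1: \lambda = \gamma\lambda_0,
% the log-likelihood ratio after n failures at normalized time
% \tau_n = \lambda_0 t_n is:
\ln L_n = n\ln\gamma - (\gamma - 1)\,\tau_n
% reject/continue boundary (\ln L_n = A):
\tau_n^{\mathrm{reject}} = \frac{n\ln\gamma - A}{\gamma - 1}
% continue/accept boundary (\ln L_n = B):
\tau_n^{\mathrm{accept}} = \frac{n\ln\gamma - B}{\gamma - 1}
```

Since B is negative for typical α and β, the accept boundary lies to the right of the reject boundary: many failures in little normalized time leads to rejection, few failures over a long normalized time leads to acceptance.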

Part III Criteria for Model Selection

- Background
- Non-Quantitative criteria
- Quantitative criteria

Criteria for Model Selection - Background

- When software reliability models first appeared, it was felt that a process of refinement would produce definitive models that would apply to all development and test situations.
- Current situation:
- Dozens of models have been published in the literature.
- Studies over the past 10 years indicate that the accuracy of the models is variable.
- Analysis of the particular context in which reliability measurement is to take place, so as to decide a priori which model to use, does not seem possible.

Criteria for Model Selection (contd)

- Non-Quantitative Criteria
- Model Validity
- Ease of measuring parameters
- Quality of assumptions
- Applicability
- Simplicity
- Insensitivity to noise

Criteria for Model Selection (contd)

- Quantitative Criteria for Post-Model Application
- Self-consistency
- Goodness-of-Fit
- Relative Accuracy (Prequential Likelihood Ratio)
- Bias (U-Plot)
- Bias Trend (Y-Plot)

Criteria for Model Selection (contd)

- Self-consistency - Analysis of a model's predictive quality can help the user decide which model(s) to use.
- The simplest question an SRM user can ask is "How reliable is the software at this moment?"
- The time to the next failure, Ti, is usually predicted using the observed times to failure t1, ..., t(i−1).
- In general, predictions of Ti can be made using the observed times to failure t1, ..., t(i−K), for K ≥ 1.
- The results of predictions made for different values of K can then be compared. If a model produces self-consistent results for differing values of K, this indicates that its use is appropriate for the data on which the particular predictions were made.
- HOWEVER, THIS PROVIDES NO GUARANTEE THAT THE PREDICTIONS ARE CLOSE TO THE TRUTH.

Criteria for Model Selection (contd)

- Goodness-of-fit - Kolmogorov-Smirnov Test
- Uses the absolute vertical distance between two CDFs to measure goodness of fit.
- Depends on the fact that the statistic D_n = sup |F0(t) − Fn(t)|, where F0 is a known, continuous CDF and Fn is the sample CDF, is distribution-free.

Criteria for Model Selection (contd)

- Goodness-of-fit (cont'd) - Chi-Square Test
- More suited to determining the GOF of failure-counts data than of interfailure times.
- Value given by χ² = Σ (Nj − n pj)² / (n pj), summed over j = 1, ..., k+1
- where:
- n = number of independent repetitions of an experiment in which the outcomes are decomposed into k+1 mutually exclusive sets A1, A2, ..., A(k+1)
- Nj = number of outcomes in the jth set
- pj = P[Aj]
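The chi-square statistic above is a one-liner to compute; the 60/40 split against a fair 50/50 hypothesis is an illustrative example, not tutorial data:

```python
def chi_square_stat(counts, probs):
    """Chi-square GOF statistic: sum over cells of (N_j - n*p_j)^2 / (n*p_j)."""
    n = sum(counts)
    return sum((nj - n * pj) ** 2 / (n * pj) for nj, pj in zip(counts, probs))

# 100 observations split 60/40 against hypothesized probabilities 0.5/0.5:
stat = chi_square_stat([60, 40], [0.5, 0.5])
# (60-50)^2/50 + (40-50)^2/50 = 4.0
```

The statistic would then be compared against a chi-square critical value with k degrees of freedom to accept or reject the hypothesized distribution.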

Criteria for Model Selection (contd)

- Prequential Likelihood Ratio
- The pdf f_i(t) for T_i is based on the observations t1, ..., t(i−1).
- For one-step-ahead predictions of T(j+1), ..., T(j+n), the prequential likelihood is the product of the predictive pdfs evaluated at the observed times: PL_n = f(j+1)(t(j+1)) × ... × f(j+n)(t(j+n))
- Two prediction systems, A and B, can be evaluated by computing the prequential likelihood ratio PLR_n = PL_n(A) / PL_n(B)
- If PLR_n approaches infinity as n approaches infinity, B is discarded in favor of A.

Prequential Likelihood Example

Two sketches compare successive predictive pdfs against the true pdf: in one, the predictions cluster tightly but away from the true pdf (high bias, low noise); in the other, they scatter widely around the true pdf (low bias, high noise).

Criteria for Model Selection (contd)

- Prequential Likelihood Ratio (cont'd)
- When predictions have been made for T(j+1), ..., T(j+n), the PLR is given by
- Using Bayes' Rule, the PLR is rewritten as

Criteria for Model Selection (contd)

- Prequential Likelihood Ratio (cont'd)
- This equals
- If the initial conditions were based only on prior belief, the second factor of the final equation is the prior odds ratio. If the user is indifferent between models A and B, this ratio has a value of 1.

Criteria for Model Selection (contd)

- Prequential Likelihood Ratio (cont'd)
- The final equation is then written as
- This is the posterior odds ratio, where wA is the posterior belief that A is true after making predictions with both A and B and comparing them with actual behavior.

Criteria for Model Selection (contd)

- The u-plot can be used to assess the predictive quality of a model.
- Given a predictor F̂_i(t) that estimates the probability that the time to the next failure is less than t, consider the sequence {u_i}, where each u_i = F̂_i(t_i) is a probability integral transform of the observed t_i using the previously calculated predictor based upon t1, ..., t(i−1).
- If each F̂_i were identical to the true, but hidden, F_i, then the u_i would be realizations of independent random variables with a uniform distribution in [0, 1].
- The problem then reduces to seeing how closely the sequence {u_i} resembles a random sample from U[0, 1].

U-Plots for JM and LV Models

The plot shows the sample CDFs of the u_i for the JM and LV models against the line of unit slope from (0, 0) to (1.0, 1.0); each curve's maximum vertical distance from that line measures the model's bias.

Criteria for Model Selection (contd)

- The y-plot
- Temporal ordering is not shown in a u-plot; the y-plot addresses this deficiency.
- To generate a y-plot, the following steps are taken:
- Compute the sequence of u_i.
- For each u_i, compute x_i = −ln(1 − u_i).
- Obtain y_i by computing y_i = (Σ_{j≤i} x_j) / (Σ_{j≤m} x_j), for i ≤ m, m representing the number of observations made.
- If the u_i really do form a sequence of independent random variables in [0, 1], the slope of the plotted y_i will be constant.

Y-Plots for JM and LV Models

The plot shows the normalized cumulative values y_i for the JM and LV models against the line of unit slope from (0, 0) to (1.0, 1.0); departures from a constant slope reveal trends in prediction bias over time.

Criteria for Model Selection (contd)

- Quantitative Criteria Prior to Model Application
- Arithmetical Mean of Interfailure Times
- Laplace Test

Arithmetical Mean of Interfailure Times

- Calculate the arithmetical mean of the interfailure times as follows: t(i) = (1/i) Σ_{j=1..i} θj
- i = number of observed failures
- θj = jth interfailure time
- An increasing series of t(i) suggests reliability growth.
- A decreasing series of t(i) suggests reliability decrease.

Laplace Test

- The occurrence of failures is assumed to follow a non-homogeneous Poisson process whose failure intensity λ(t) = e^(a + bt) is decreasing (b < 0).
- The null hypothesis is that occurrences of failures follow a homogeneous Poisson process (i.e., b = 0 above).
- For interfailure times, the test statistic is computed by
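The Laplace test statistic itself was lost in conversion; the sketch below implements one common failure-truncated form (observation ends at the last failure), which is an assumption about the tutorial's exact formula:

```python
import math

def laplace_factor(interfailure_times):
    """Laplace test statistic for a failure-truncated sequence of
    interfailure times. Negative values suggest decreasing failure
    intensity (reliability growth); positive values suggest the opposite."""
    # cumulative failure times
    t, cum = [], 0.0
    for x in interfailure_times:
        cum += x
        t.append(cum)
    T = t[-1]          # total observation time = time of the last failure
    n = len(t) - 1     # the last failure time itself carries no trend info
    mean_of_earlier = sum(t[:-1]) / n
    return (mean_of_earlier - T / 2.0) / (T * math.sqrt(1.0 / (12.0 * n)))
```

With growing gaps between failures the statistic is strongly negative, and with shrinking gaps it is strongly positive, matching the interpretation rules on the following slide.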

Laplace Test (contd)

- For interval data, test statistic computed by

Laplace Test (contd)

- Interpretation
- Negative values of the Laplace factor indicate decreasing failure intensity.
- Positive values suggest an increasing failure intensity.
- Values varying between −2 and +2 indicate stable reliability.
- Significance is that associated with the normal distribution, e.g.:
- The null hypothesis "H0: HPP" vs. "H1: decreasing failure intensity" is rejected at the 5% significance level for m(T) < −1.645.
- The null hypothesis "H0: HPP" vs. "H1: increasing failure intensity" is rejected at the 5% significance level for m(T) > 1.645.
- The null hypothesis "H0: HPP" vs. "H1: there is a trend" is rejected at the 5% significance level for |m(T)| > 1.96.

Part IV Input Data Requirements and Data Collection Mechanisms

- Model Inputs
- Time Between Successive Failures
- Failure Counts and Test Interval Lengths
- Setting up a Data Collection Mechanism
- Minimal Set of Required Data
- Data Collection Mechanism Examples

Input Data Requirements and Data Collection Mechanisms

- Model Inputs - Time Between Successive Failures
- Most of the models discussed in Section II require the times between successive failures as inputs.
- The preferred units of time are CPU time (e.g., CPU seconds between subsequent failures).
- This allows computation of reliability independent of wall-clock time.
- Reliability computations in one environment can be easily transformed into reliability estimates in another, provided that the operational profiles in both environments are the same and that the instruction execution rates of the original environment and the new environment can be related.

Input Data Requirements and Data Collection Mechanisms (cont'd)

- Model Inputs - Time Between Successive Failures (cont'd)
- Advantage - CPU time between successive failures tends to characterize the failure history of a software system more accurately than calendar time. Accurate CPU time between failures can give greater resolution than other types of data.
- Disadvantage - CPU time between successive failures can often be more difficult to collect than other types of failure history data.

Input Data Requirements and Data Collection Mechanisms (cont'd)

- Model Inputs (cont'd) - Failure Counts and Test Interval Lengths
- Failure history can be collected in terms of test interval lengths and the number of failures observed in each interval. Several of the models described in Section II use this type of input.
- The failure reporting systems of many organizations will more easily support collection of this type of data than of times between successive failures. In particular, the use of automated test systems can easily establish the length of each test interval. Analysis of the test run will then provide the number of failures for that interval.
- Disadvantage - failure-counts data does not provide the resolution that accurately collected times between failures provide.

Input Data Requirements and Data Collection Mechanisms (cont'd)

- Setting up a Data Collection Mechanism
- 1. Establish clear, consistent objectives.
- 2. Develop a plan for the data collection process. Involve all individuals concerned (e.g., software designers, testers, programmers, managers, SQA and SCM staff). Address the following issues:
- a. Frequency of data collection
- b. Data collection responsibilities
- c. Data formats
- d. Processing and storage of data
- e. Assuring integrity of data/adherence to objectives
- f. Use of existing mechanisms to collect data

Input Data Requirements and Data Collection Mechanisms (cont'd)

- Setting up a Data Collection Mechanism (cont'd)
- 3. Identify and evaluate tools to support the data collection effort.
- 4. Train all parties in the use of the selected tools.
- 5. Perform a trial run of the plan prior to finalizing it.
- 6. Monitor the data collection process on a regular basis (e.g., at weekly intervals) to assure that objectives are being met, determine the current reliability of the software, and identify problems in collecting/analyzing the data.
- 7. Evaluate the data on a regular basis. Assess software reliability as testing proceeds, not only at scheduled release time.
- 8. Provide feedback to all parties during the data collection/analysis effort.

Input Data Requirements and Data Collection Mechanisms (cont'd)

- Minimal Set of Required Data - to measure software reliability during test, the following minimal set of data should be collected by a development effort:
- Time between successive failures OR test interval lengths/number of failures per test interval.
- Functional area tested during each interval.
- Date on which functionality was added to the software under test; identifier for the functionality added.
- Number of testers vs. time.
- Dates on which the testing environment changed, and the nature of the changes.
- Dates on which the test method changed.

Part VI Early Prediction of Software Reliability

- Background
- RADC Study
- Phase-Based Model

Part VI Background

- Modeling techniques discussed in the preceding sections can be applied only during test phases.
- These techniques do not take into account structural properties of the system being developed or characteristics of the development environment.
- Current techniques can measure software reliability, but model outputs cannot be easily used to choose development methods or structural characteristics that will increase reliability.
- Measuring software reliability prior to test is an open area. Work in this area includes:
- RADC study of 59 projects
- Phase-Based model
- Analysis of complexity

Part VI RADC Study

- Study of 59 software development efforts, sponsored by RADC in the mid-1980s.
- Purpose - develop a method for predicting software reliability in the life cycle phases prior to test. Acceptable model forms were:
- measures leading directly to reliability/failure rate predictions
- predictions that could be translated to failure rates (e.g., error density)
- Advantages of error density as a software reliability figure of merit, according to participating investigators:
- It appears to be a fairly invariant number.
- It can be obtained from commonly available data.
- It is not directly affected by variables in the environment.
- Conversion among error density metrics is fairly straightforward.

Part VI RADC Study (contd)

- Advantages of error density as a software reliability figure of merit (cont'd):
- It is possible to include faults found by inspection with those found during testing and operations, since the time-dependent elements of the latter do not need to be accounted for.
- Major disadvantages cited by the investigators are:
- This metric cannot be combined with hardware reliability metrics.
- It does not relate to observations in the user environment. It is far easier for users to observe the availability of their systems than their fault density, and users tend to be far more concerned about how frequently they can expect the system to go down.
- There is no assurance that all of the faults have been found.

Part VI RADC Study (contd)

- Given these advantages and disadvantages, the investigators decided to attempt prediction of error density during the early phases of a development effort, and to develop a transformation function that could be used to interpret the predicted error density as a failure rate. The driving factor seemed to be that data available early in the life cycle could be much more easily used to predict error densities than failure rates.

Part VI RADC Study (contd)

- Investigators postulated that the following measures, representing development environment and product characteristics, could be used as inputs to a model that would predict the error density, measured in errors per line of code, at the start of the testing phase:
- A -- Application Type (e.g., real-time control system, scientific computation system, information management system)
- D -- Development Environment (characterized by development methodology and available tools). The types of development environments considered are the organic, semi-detached, and embedded modes, familiar from the COCOMO cost model.

Part VI RADC Study (contd)

- Measures of development environment and product characteristics (cont'd):
- Requirements and Design Representation Metrics
- SA - Anomaly Management
- ST - Traceability
- SQ - Incorporation of Quality Review results into the software
- Software Implementation Metrics
- SL - Language Type (e.g., assembly, high-order language, fourth-generation language)
- SS - Program Size
- SM - Modularity
- SU - Extent of Reuse
- SX - Complexity
- SR - Incorporation of Standards Review results into the software

Part VI RADC Study (contd)

- Initial error density at the start of test is given by
- The initial failure rate is λ0 = F × K × W0, where:
- F = linear execution frequency of the program
- K = fault exposure ratio (1.4×10⁻⁷ < K < 10.6×10⁻⁷, with an average value of 4.2×10⁻⁷)
- W0 = number of inherent faults

Part VI RADC Study (contd)

- Moreover, F = R/I, where:
- R is the average instruction rate
- I is the number of object instructions in the program
- I can be further rewritten as I = IS × QX, where:
- IS is the number of source instructions
- QX is the code expansion ratio (the ratio of machine instructions to source instructions, which has an average value of 4 according to this study)
- Therefore, the initial failure rate can be expressed as λ0 = (R × K × W0) / (IS × QX)
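Worked numerically, the substitution is simple arithmetic; the program size, instruction rate, and fault count below are illustrative assumptions (only K = 4.2e-7 and QX = 4 come from the study's averages):

```python
def initial_failure_rate(R, K, W0, IS, QX):
    """lambda_0 = F * K * W0, with linear execution frequency F = R / (IS * QX)."""
    return (R * K * W0) / (IS * QX)

# Illustrative: a 20,000-source-line program on a 50 MIPS machine with
# 120 inherent faults, using the study's average K and QX values.
lam0 = initial_failure_rate(R=50e6, K=4.2e-7, W0=120, IS=20_000, QX=4)
# (50e6 * 4.2e-7 * 120) / 80_000 = 0.0315 failures per unit execution time
```

This is how an early error-density prediction (which yields W0) is transformed into a failure rate.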

Part VI Phase-Based Model

- Developed by John Gaffney, Jr. and Charles F. Davis of the Software Productivity Consortium.
- Makes use of error statistics obtained during technical review of requirements, design, and the implementation to predict software reliability during test and operations.
- Can also use failure data during testing to estimate reliability.
- Assumptions:
- The development effort's current staffing level is directly related to the number of errors discovered during a development phase.
- The error discovery curve is monomodal.
- Code size estimates are available during early phases of a development effort.
- Fagan inspections are used during all development phases.

Part VI Phase-Based Model

- The first two assumptions, plus Norden's observation that the Rayleigh curve represents the "correct" way of applying staff to a development effort, result in the following expression for the number of errors discovered during a life cycle phase:
- E = Total Lifetime Error Rate, expressed in Errors per Thousand Source Lines of Code (KSLOC)
- t = Error Discovery Phase index

Part VI Phase-Based Model

- Note that t does not represent ordinary calendar time. Rather, t represents a phase in the development process. The values of t and the corresponding life cycle phases are:
- t = 1 - Requirements Analysis
- t = 2 - Software Design
- t = 3 - Implementation
- t = 4 - Unit Test
- t = 5 - Software Integration Test
- t = 6 - System Test
- t = 7 - Acceptance Test

Part VI Phase-Based Model

- τp, the Defect Discovery Phase Constant, is the location of the peak in a continuous fit to the failure data. This is the point at which 39% of the errors have been discovered.
- The cumulative form of the model is Vt = E [1 − exp(−t²/(2τp²))], where Vt is the number of errors per KSLOC that have been discovered through phase t.
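A sketch of the phase-based model in Python, assuming the usual Rayleigh form with B = 1/(2τp²); this is consistent with the 39% statement, since 1 − e^(−1/2) ≈ 0.393. The parameter values E = 20 errors/KSLOC and τp = 3 are illustrative assumptions:

```python
import math

def cumulative_errors(t, E, tau_p):
    """Cumulative form: V_t = E * (1 - exp(-t^2 / (2 * tau_p^2)))."""
    return E * (1.0 - math.exp(-t * t / (2.0 * tau_p * tau_p)))

def phase_errors(t, E, tau_p):
    """Errors per KSLOC discovered during phase t (difference of the
    cumulative form between consecutive phase indices)."""
    return cumulative_errors(t, E, tau_p) - cumulative_errors(t - 1, E, tau_p)

# At t = tau_p, about 39% of the lifetime errors have been discovered:
frac = cumulative_errors(3.0, E=1.0, tau_p=3.0)   # ~0.3935
per_phase = [phase_errors(t, 20.0, 3.0) for t in range(1, 8)]
```

Summing the per-phase discoveries through phase t recovers the cumulative form, which is what makes latent-error estimation (next slide) a simple subtraction.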

Part VI Phase-Based Model

- This model can also be used to estimate the number of latent errors in the software. Recall that the number of errors per KSLOC removed through the nth phase is Vn.
- The number of errors remaining in the software at that point is (E − Vn) times the number of source statements (in KSLOC).

Part VII Current Work in Estimating Fault Content

- Analysis of Complexity
- Regression Tree Modeling

Analysis of Complexity

- The need for measurement
- The measurement process
- Measuring software change
- Faults and fault insertion
- Fault insertion rates

Analysis of Complexity (contd)

- Recent work has focused on relating measures of software structure to fault content.
- Problem - although different software metrics will say different things about a software system, they tend to be interrelated and can be highly correlated with one another (e.g., McCabe complexity and line count are highly correlated).

Analysis of Complexity (contd)

- The relative complexity measure, developed by Munson and Khoshgoftaar, attempts to handle the problem of interdependence and multicollinearity among software metrics.
- The technique used is factor analysis, whose purpose is to decompose a set of correlated measures into a set of eigenvalues and eigenvectors.

Analysis of Complexity (contd)

- The need for measurement
- The measurement process
- Measuring software change
- Faults and fault insertion
- Fault insertion rates

Analysis of Complexity - Measuring Software

Diagram: metric analysis (CMA) takes a source module and produces its module characteristics - raw metric values such as LOC = 14, Stmts = 12, N1 = 30, N2 = 23, η1 = 15, η2 = 12.

Analysis of Complexity - Simplifying Measurements

Diagram: metric analysis (CMA) produces a vector of raw metrics for each program module; principal components analysis (PCA/RCM) then collapses each vector into a single relative complexity value per module (e.g., 40, 45, 50, 55, 60).

Analysis of Complexity - Relative Complexity

- Relative complexity is a synthesized metric
- Relative complexity is a fault surrogate
- Composed of metrics closely related to faults
- Highly correlated with faults

Analysis of Complexity (contd)

- The need for measurement
- The measurement process
- Measuring software change
- Faults and fault insertion
- Fault insertion rates

Analysis of Complexity (contd)

- Software Evolution
- We assume that we are developing (maintaining) a program
- We are really working with many programs over time
- They are different programs in a very real sense
- We must identify and measure each version of each program module

Analysis of Complexity (contd)

- Evolution of the STS Primary Avionics Software System (PASS)

Analysis of Complexity (contd)

Diagram (The Problem): the sets of modules in Build N and Build N+1 overlap but differ.

Analysis of Complexity (contd)

- Managing fault counts during evolution:
- Some faults are inserted during branch builds. These fault counts must be removed when the branch is pruned.
- Some faults are eliminated on branch builds. These faults must be removed from the main sequence build.
- The fault count should contain only those faults on the main sequence to the current build.
- Faults attributed to modules not in the current build must be removed from the current count.

Analysis of Complexity (contd)

- Baselining a software system:
- Software changes over software builds
- Measurements, such as relative complexity, change across builds
- Initial build as a baseline
- Relative complexity of each build
- Measure change in fault surrogate from initial baseline

Analysis of Complexity - Measurement Baseline

Analysis of Complexity - Baseline Components

- Vector of means
- Vector of standard deviations
- Transformation matrix

Analysis of Complexity - Comparing Two Builds

Diagram: source code for Build i and Build j passes through measurement tools to produce RCM values; each build is baselined against the measurement baseline, and the baselined builds are compared to yield code deltas, code churn, and RCM deltas.

Analysis of Complexity - Measuring Evolution

- Different modules in different builds
- set of modules not in latest build
- set of modules not in early build
- set of common modules
- Code delta
- Code churn
- Net code churn

Analysis of Complexity (contd)

- The need for measurement
- The measurement process
- Measuring software change
- Faults and fault insertion
- Fault insertion rates

Analysis of Complexity - Fault Insertion

Diagram: between Build N and Build N+1, some existing faults persist into the new build, some faults are removed, and some new faults are added.

Analysis of Complexity - Identifying and Counting Faults

- Unlike failures, faults are not directly observable.
- Fault counts should be at the same level of granularity as the software structure metrics.
- Failure counts could be used as a surrogate for fault counts if:
- the number of faults were related to the number of failures
- the distribution of the number of faults per failure had low variance
- the faults associated with a failure were confined to a single procedure/function
- The actual situation is shown on the next slide.

Analysis of Complexity - Observed Distribution of Faults per Failure

Analysis of Complexity - Fault Identification and Counting Rules

- Taxonomy based on corrective actions taken in response to failure reports:
- Faults in variable usage:
- Definition and use of new variables
- Redefinition of existing variables (e.g., changing type from float to double)
- Variable deletion
- Assignment of a different value to a variable
- Faults involving constants:
- Definition and use of new constants
- Constant definition deletion

Analysis of Complexity - Fault Identification and Counting Rules (cont'd)

- Control flow faults:
- Addition of a new source code block
- Deletion of erroneous conditionally-executed path(s) within a set of conditionally executed statements
- Addition of execution paths within a set of conditionally executed statements
- Redefinition of an existing condition for execution (e.g., changing a comparison such as "if i < 9" to "if i <= 9")
- Removal of a source code block
- Incorrect order of execution
- Addition of a procedure or function
- Deletion of a procedure or function

Analysis of Complexity (contd)

- Control flow fault examples - removing execution paths from a code block: counts as two faults, since two paths were removed.

Analysis of Complexity (contd)

Control flow examples (cont'd) - addition of conditional execution paths to a code block: counts as three faults, since three paths were added.

Analysis of Complexity - Estimating Fault Content

- The fault potential of a module i is directly proportional to its relative complexity.
- From previous development projects, develop a proportionality constant, k, for total faults.
- Faults per module

Analysis of Complexity - Estimating Fault Insertion Rate

- Proportionality constant, k, representing the rate of fault insertion
- For the jth build, total faults inserted
- Estimate for the fault insertion rate
Analysis of Complexity (contd)

- The need for measurement
- The measurement process
- Measuring software change
- Faults and fault insertion
- Fault insertion rates

Analysis of Complexity - Relationships Between Change in Fault Count and Structural Change

- code churn
- code delta

Analysis of Complexity - Regression Models

- The number of faults inserted between builds j and j+1
- The measured code churn between builds j and j+1
- The measured code delta between builds j and j+1
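A minimal sketch of such a regression model, fitting faults inserted against code churn alone by ordinary least squares. The per-build measurements below are made up for illustration:

```python
def fit_churn_model(churn, faults):
    """Least-squares fit of faults = b0 + b1 * churn."""
    n = len(churn)
    mx, my = sum(churn) / n, sum(faults) / n
    sxx = sum((x - mx) ** 2 for x in churn)
    sxy = sum((x - mx) * (y - my) for x, y in zip(churn, faults))
    b1 = sxy / sxx
    return my - b1 * mx, b1  # (intercept, slope)

# Hypothetical per-build code churn and observed faults inserted
churn = [12.0, 30.0, 7.0, 22.0, 15.0]
faults = [4.0, 9.0, 2.0, 7.0, 5.0]
b0, b1 = fit_churn_model(churn, faults)
predicted = [b0 + b1 * x for x in churn]
```

A two-predictor model adding code delta follows the same pattern with multivariate least squares.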

Analysis of Complexity - PRESS Scores - Linear

vs. Nonlinear Models

Analysis of Complexity - Selecting an Adequate

Linear Model

- Linear model gives the best R2 and PRESS score.
- Is the model based only on code churn an adequate predictor at the 5% significance level?
- The R2-adequate test shows that code churn alone is not an adequate predictor at the 5% significance level.
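A PRESS score can be computed as the sum of squared leave-one-out prediction errors; lower is better. This is a self-contained sketch, not the tool used in the source study:

```python
def fit_line(xs, ys):
    """Simple least-squares line fit: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

def press(xs, ys):
    """PRESS: for each point, refit the model without it and sum
    the squared error of predicting the held-out point."""
    total = 0.0
    for i in range(len(xs)):
        tx = [x for j, x in enumerate(xs) if j != i]
        ty = [y for j, y in enumerate(ys) if j != i]
        b0, b1 = fit_line(tx, ty)
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total
```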

Analysis of Complexity - Analysis of Predicted

Residuals

Regression Tree Modeling

- Objectives
- An attractive way to encapsulate the knowledge of experts and to aid decision making.
- Uncovers structure in data.
- Can handle data with complicated and unexplained irregularities.
- Can handle both numeric and categorical variables in a single model.

Regression Tree Modeling (contd)

- Algorithm
- Determine a set of predictor variables (software metrics) and a response variable (number of faults).
- Partition the predictor variable space such that each partition or subset is homogeneous with respect to the dependent variable.
- Establish a decision rule based on the predictor variables which will identify the programs with the same number of faults.
- Predict the value of the dependent variable as the average of all the observations in the partition.
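The partitioning step can be illustrated with a single-split sketch: find the one threshold on a metric that minimizes total deviance, and predict the mean fault count in each partition. The metric values below are hypothetical:

```python
def deviance(ys):
    """Sum of squared deviations from the partition mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(metric, faults):
    """Find the single threshold on one metric whose two partitions
    minimize total deviance; each partition predicts its mean faults."""
    pairs = sorted(zip(metric, faults))
    best_d, best = float("inf"), None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        d = deviance(left) + deviance(right)
        if d < best_d:
            thresh = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_d = d
            best = (thresh, sum(left) / len(left), sum(right) / len(right))
    return best  # (threshold, mean faults left, mean faults right)

# Hypothetical module metric (e.g. lines of code) and fault counts
print(best_split([10, 12, 15, 90, 95, 100], [1, 1, 2, 8, 9, 10]))
```

A full tree applies this split recursively until the stopping criteria are met.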

Regression Tree Modeling (contd)

- Algorithm (contd)
- Minimize the deviance function given by
- Establish stopping criteria based on
- Cardinality threshold - leaf node is smaller than a certain absolute size.
- Homogeneity threshold - deviance of leaf node is less than some small percentage of the deviance of the root node, i.e., the leaf node is homogeneous enough.
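The two stopping rules can be expressed as a single predicate; the threshold values `min_size` and `homog_frac` here are hypothetical defaults, not from the source:

```python
def stop_growing(node_faults, root_deviance, min_size=5, homog_frac=0.01):
    """Stopping rules from the slide: stop when the node is below the
    cardinality threshold, or when its deviance falls below a small
    fraction of the root node's deviance (homogeneity threshold)."""
    if len(node_faults) < min_size:
        return True
    m = sum(node_faults) / len(node_faults)
    dev = sum((y - m) ** 2 for y in node_faults)
    return dev < homog_frac * root_deviance
```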

Regression Tree Modeling (contd)

- Application
- Software for a medical imaging system, consisting of 4500 modules amounting to 400,000 lines of code written in Pascal, FORTRAN, assembly language, and PL/M.
- Random sample of 390 modules from the ones written in Pascal and FORTRAN, consisting of about 40,000 lines of code.
- Software was developed over a period of five years, and had been in use at several hundred sites.
- Number of changes made to the executable code, documented by Change Reports (CRs), indicates software development effort.

Regression Tree Modeling (contd)

- Application (contd)
- Metrics
- Total lines of code (TC)
- Number of code lines (CL)
- Number of characters (Cr)
- Number of comments (Cm)
- Comment characters (CC)
- Code characters (Co)
- Halstead's program length
- Halstead's estimate of program length metric
- Jensen's estimate of program length metric
- Cyclomatic complexity metric
- Bandwidth metric

Regression Tree Modeling (contd)

- Pruning
- Tree grown using the stopping rules is too elaborate.
- Pruning is equivalent to variable selection in linear regression.
- Determines a nested sequence of subtrees of the given tree by recursively snipping off partitions with minimal gains in deviance reduction.
- Degree of pruning can be determined by using cross-validation.

Regression Tree Modeling (contd)

- Cross-Validation
- Evaluate the predictive performance of the regression tree and the degree of pruning in the absence of a separate validation set.
- Data are divided into two mutually exclusive sets, viz., a learning sample and a test sample.
- The learning sample is used to grow the tree, while the test sample is used to evaluate the tree sequence.
- Deviance is the measure used to assess the performance of the prediction rule in predicting the number of errors for the test sample at different tree sizes.
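The learning/test split can be sketched as follows. For brevity the fitted model here is a trivial mean-predictor standing in for the grown tree; the split fraction and seed are arbitrary:

```python
import random

def holdout_deviance(metrics, faults, test_frac=0.3, seed=1):
    """Split modules into a learning sample and a test sample, fit a
    trivial mean-predictor on the learning sample, and return the
    test-sample deviance. A real tree would replace the mean model."""
    idx = list(range(len(metrics)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_frac))
    test, learn = idx[:n_test], idx[n_test:]
    mean_faults = sum(faults[i] for i in learn) / len(learn)
    return sum((faults[i] - mean_faults) ** 2 for i in test)
```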

Regression Tree Modeling (contd)

- Performance Analysis
- Two types of errors
- Predicting more faults than the actual number - Type I misclassification.
- Predicting fewer faults than the actual number - Type II error.
- Type II error is more serious.
- Type II error rate for tree modeling is 8.7%, versus 13.1% for the fault density approach.
- The tree modeling approach is significantly better than the fault density approach.
- Can also be used to classify modules into fault-prone and non-fault-prone categories.
- Decision rule - classifies a module as fault-prone if the predicted number of faults is greater than a certain threshold α.
- Choice of α determines the misclassification rate.
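The two error types and the decision rule can be sketched directly; the predicted/actual values below are hypothetical:

```python
def misclassification_rates(predicted, actual):
    """Type I: predicted more faults than actual;
       Type II: predicted fewer faults than actual."""
    n = len(predicted)
    t1 = sum(1 for p, a in zip(predicted, actual) if p > a) / n
    t2 = sum(1 for p, a in zip(predicted, actual) if p < a) / n
    return t1, t2

def fault_prone(predicted, alpha):
    """Decision rule from the slide: a module is fault-prone when its
    predicted fault count exceeds the threshold alpha."""
    return [p > alpha for p in predicted]

print(fault_prone([3.0, 0.5, 2.0], alpha=1.5))  # [True, False, True]
```

Raising alpha trades Type I misclassifications for Type II, so it should be chosen with the greater cost of Type II errors in mind.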

Part VII Software Reliability Tools

- SRMP
- SMERFS
- CASRE

Where Do They Come From?

- Software Reliability Modeling Program (SRMP)
- Bev Littlewood of City University, London
- Statistical Modeling and Estimation of

Reliability Functions for Software (SMERFS) - William Farr of Naval Surface Warfare Center
- Computer-Aided Software Reliability Estimation

Tool (CASRE) - Allen Nikora, JPL Michael Lyu, Chinese

University of Hong Kong

SRMP Main Features

- Multiple Models (9)
- Model Application Scheme: Multiple Iterations
- Data Format: Time-Between-Failures Data Only
- Parameter Estimation: Maximum Likelihood
- Multiple Evaluation Criteria: Prequential Likelihood, Bias, Bias Trend, Model Noise
- Simple U-Plots and Y-Plots

SMERFS Main Features

- Multiple Models (12)
- Model Application Scheme: Single Execution
- Data Format: Failure-Counts and Time-Between-Failures
- On-line Model Description Manual
- Two Parameter Estimation Methods
- Least Squares Method
- Maximum Likelihood Method
- Goodness-of-Fit Criteria: Chi-Square Test, KS Test
- Model Applicability: Prequential Likelihood, Bias, Bias Trend, Model Noise
- Simple Plots

The SMERFS Tool Main Menu

- Data Input
- Data Edit
- Data Transformation
- Data Statistics

- Plots of the Raw Data
- Model Applicability Analysis
- Executions of the Models
- Analyses of Model Fit
- Stop Execution of SMERFS

CASRE Main Features

- Multiple Models (12)
- Model Application Scheme: Multiple Iterations
- Goodness-of-Fit Criteria: Chi-Square Test, KS Test
- Multiple Evaluation Criteria: Prequential Likelihood, Bias, Bias Trend, Model Noise
- Conversions between Failure-Counts Data and Time-Between-Failures Data
- Menu-Driven, High-Resolution Graphical User Interface
- Capability to Make Linear Combination Models
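The simplest linear combination CASRE supports is an equally-weighted average of the component models' predictions (the ELC model of Lyu and Nikora, cited under Further Reading). A sketch, with made-up component predictions:

```python
def equally_weighted_combination(model_predictions):
    """Average the predictions of several component reliability models,
    point by point, with equal weights."""
    n = len(model_predictions)
    return [sum(col) / n for col in zip(*model_predictions)]

# Hypothetical next-time-to-failure predictions from three models
combined = equally_weighted_combination([
    [10.0, 12.0, 15.0],  # model A
    [8.0, 11.0, 14.0],   # model B
    [12.0, 13.0, 16.0],  # model C
])
print(combined)  # [10.0, 12.0, 15.0]
```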

CASRE High-Level Architecture

Further Reading

- A. A. Abdel-Ghaly, P. Y. Chan, and B. Littlewood

"Evaluation of Competing Software Reliability

Predictions," IEEE Transactions on Software

Engineering vol. SE-12, pp. 950-967 Sep. 1986. - T. Bowen, "Software Quality Measurement for

Distributed Systems", RADC TR-83-175. - W. K. Erlich, A. Iannino, B. S. Prasanna, J. P.

Stampfel, and J. R. Wu, "How Faults Cause

Software Failures Implications for Software

Reliability Engineering", published in

proceedings of the International Symposium on

Software Reliability Engineering, pp 233-241, May

17-18, 1991, Austin, TX - M. E. Fagan, "Advances in Software Inspections",

IEEE Transactions on Software Engineering, vol

SE-12, no 7, July, 1986, pp 744-751 - M. E. Fagan, "Design and Code Inspections to

Reduce Errors in Program Development," IBM

Systems Journal, Volume 15, Number 3, pp 182-211,

1976 - W. H. Farr, O. D. Smith, and C. L.

Schimmelpfenneg, "A PC Tool for Software

Reliability Measurement," published in the 1988

Proceedings of the Institute of Environmental

Sciences, King of Prussia, PA

Further Reading (contd)

- W. H. Farr, O. D. Smith, "Statistical Modeling

and Estimation of Reliability Functions for

Software (SMERFS) User's Guide," Naval Weapons

Surface Center, December 1988 (approved for

unlimited public distribution by NSWC) - J. E. Gaffney, Jr. and C. F. Davis, "An Approach

to Estimating Software Errors and Availability,"

SPC-TR-88-007, version 1.0, March, 1988,

proceedings of Eleventh Minnowbrook Workshop on

Software Reliability, July 26-29, 1988, Blue

Mountain Lake, NY - J. E. Gaffney, Jr. and J. Pietrolewicz, "An

Automated Model for Software Early Error

Prediction (SWEEP)," Proceedings of Thirteenth

Minnow-brook Workshop on Software Reliability,

July 24-27, 1990, Blue Mountain Lake, NY - A. L. Goel, S. N. Sahoo, "Formal Specifications

and Reliability An Experimental Study",

published in proceedings of the International

Symposium on Software Reliability Engineering, pp

139-142, May 17-18, 1991, Austin, TX - A. Grnarov, J. Arlat, A. Avizienis, "On the

Performance of Software Fault-Tolerance

Strategies", published in the proceedings of the

Tenth International Symposium on Fault Tolerant

Computing (FTCS-10), Kyoto, Japan, October, 1980,

pp 251-253

Further Reading (contd)

- K. Kanoun, M. Bastos Martini, J. Moreira De

Souza, A Method for Software Reliability

Analysis and Prediction - Application to the

TROPICO-R Switching System, IEEE Transactions on

Software Engineering, April 1991, pp, 334-344 - J. C. Kelly, J. S. Sherif, J. Hops, "An Analysis

of Defect Densities Found During Software

Inspections", Journal of Systems Software, vol

17, pp 111-117, 1992 - T. M. Khoshgoftaar and J. C. Munson, "A Measure

of Software System Complexity and its

Relationship to Faults," proceedings of 1992

International Simulation Technology Conference

and 1992 Workshop on Neural Networks (SIMTEC'92 -

sponsored by the Society for Computer

Simulation), pp. 267-272, November 4-6, 1992,

Clear Lake, TX - M. Lu, S. Brocklehurst, and B. Littlewood,

"Combination of Predictions Obtained from

Different Software Reliability Growth Models,"

proceedings of the IEEE 10th Annual Software

Reliability Symposium, pp 24-33, June 25-26,

1992, Denver, CO - M. Lyu, ed., Handbook of Software Reliability

Engineering, McGraw-Hill and IEEE Computer

Society Press, 1996, ISBN 0-07-039400-8

Further Reading (contd)

- M. Lyu, "Measuring Reliability of Embedded

Software An Empirical Study with JPL Project

Data," published in the Proceedings of the

International Conference on Probabilistic Safety

Assessment and Management February 4-6, 1991,

Los Angeles, CA. - M. Lyu and A. Nikora, "A Heuristic Approach for

Software Reliability Prediction The

Equally-Weighted Linear Combination Model,"

published in the proceedings of the IEEE

International Symposium on Software Reliability

Engineering, May 17-18, 1991, Austin, TX - M. Lyu

and A. Nikora, "Applying Reliability Models More

Effectively", IEEE Software, vol. 9, no. 4, pp.

43-52, July, 1992 - M. Lyu and A. Nikora, "Software Reliability

Measurements Through Combination Models

Approaches, Results, and a CASE Tool,"

proceedings of the 15th Annual International

Computer Software and Applications Conference

(COMPSAC '91), September 11-13, 1991, Tokyo, Japan - J. McCall, W. Randall, S. Fenwick, C. Bowen, P.

Yates, N. McKelvey, M. Hecht, H. Hecht, R. Senn,

J. Morris, R. Vienneau, "Methodology for Software

Reliability Prediction and Assessment," Rome Air

Development Center (RADC) Technical Report

RADC-TR-87-171. volumes 1 and 2, 1987 - J. Munson and T. Khoshgoftaar, "The Use of

Software Metrics in Reliability Models,"

presented at the initial meeting of the IEEE

Subcommittee on Software Reliability Engineering,

April 12-13, 1990, Washington, DC

Further Reading (contd)

- J. C. Munson, "Software Measurement Problems and

Practice," Annals of Software Engineering, J. C.

Baltzer AG, Amsterdam 1995. - J. C. Munson, Software Faults, Software

Failures, and Software Reliability Modeling,

Information and Software Technology, December,

1996. - J. C. Munson and T. M. Khoshgoftaar

Regression Modeling of Software Quality An

Empirical Investigation, Journal of Information

and Software Technology, 32, 1990, pp. 105-114. - J. Munson, A. Nikora, Estimating Rates of Fault

Insertion and Test Effectiveness in Software

Systems, invited paper, published in Proceedings

of the Fourth ISSAT International Conference on

Quality and Reliability in Design, Seattle, WA,

August 12-14, 1998 - John D. Musa., Anthony Iannino, Kazuhiro Okumoto,

Software Reliability Measurement, Prediction,

Application McGraw-Hill, 1987 ISBN

0-07-044093-X. - A. Nikora, J. Munson, Finding Fault with Faults

A Case Study, presented at the Annual Oregon

Workshop on Software Metrics, May 11-13, 1997,

Coeur d'Alene, ID. - A. Nikora, N. Schneidewind, J. Munson, "IV&V

Issues in Achieving High Reliability and Safety

in Critical Control System Software," proceedings

of the Third ISSAT International Conference on

Reliability and Quality in Design, March 12-14,

1997, Anaheim, CA.

Further Reading (contd)

- A. Nikora, J. Munson, Determining Fault

Insertion Rates For Evolving Software Systems,

proceedings of the Ninth International Symposium

on Software Reliability Engineering, Paderborn,

Germany, November 4-7, 1998 - Norman F. Schneidewind, Ted W. Keller, "Applying

Reliability Models to the Space Shuttle", IEEE

Software, pp 28-33, July, 1992 - N. Schneidewind, Reliability Modeling for

Safety-Critical Software, IEEE Transactions on

Reliability, March, 1997, pp. 88-98 - N. Schneidewind, "Measuring and Evaluating

Maintenance Process Using Reliability, Risk, and

Test Metrics", proceedings of the International

Conference on Software Maintenance, September

29-October 3, 1997, Bari, Italy. - N. Schneidewind, "Software Metrics Model for

Integrating Quality Control and Prediction",

proceedings of the 8th International Symposium on

Software Reliability Engineering, November 2-5,

1997, Albuquerque, NM. - N. Schneidewind, "Software Metrics Model for

Quality Control", Proceedings of the Fourth

International Software Metrics Symposium,

November 5-7, 1997, Albuquerque, NM.

Additional Information

- CASRE Screen Shots
- Further modeling details
- Additional Software Reliability Models
- Quantitative Criteria for Model Selection: the Subadditivity Property
- Increasing the Predictive Accuracy of Models

CASRE - Initial Display

CASRE - Applying Filters

CASRE - Running Average Trend Test

CASRE - Laplace Test

CASRE - Selecting and Running Models

CASRE - Displaying Model Results

CASRE - Displaying Model Results (contd)

CASRE - Prequential Likelihood Ratio

CASRE - Model Bias

CASRE - Model Bias Trend

CASRE - Ranking Models

CASRE - Model Ranking Details

CASRE - Model Ranking Details (contd)

CASRE - Model Results Table

CASRE - Model Results Table (contd)

CASRE - Model Results Table (contd)

Additional Software Reliability Models

- Software Reliability Estimation Models
- Exponential NHPP Models
- Generalized Poisson Model
- Non-homogeneous Poisson Process Model
- Musa Basic Model
- Musa Calendar Time Model
- Schneidewind Model
- Littlewood-Verrall Bayesian Model
- Hyperexponential Model

Generalized Poisson Model

- Proposed by Schafer, Alter, Angus, and Emoto for

Hughes Aircraft Company under contract to RADC in

1979. - Model is analogous in form to the

Jelinski-Moranda model but taken within the error

count framework. The model can be shown to reduce

to the Jelinski-Moranda model under the

appropriate circumstances.

Generalized Poisson Model (cont'd)

- Assumptions
- The expected number of errors occurring in any

time interval is proportional to the error

content at the time of testing and to some

function of the amount of time spent in error

testing. - All errors are equally likely to occur and are

independent of each other. - Each error is of the same order of severity as

any other error. - The software is operated in a similar manner as

the anticipated usage. - The errors are corrected at the ends of the

testing intervals without introduction of new

errors into the program.

Generalized Poisson Model (cont'd)

- Construction of Model
- Given testing intervals of length X1, X2,...,Xn
- fi errors discovered during the i'th interval
- At the end of the i'th interval, a total of Mi

errors have been corrected - First assumption of the model yields
- E(fi) = φ(N - Mi-1) gi(x1, x2, ..., xi)
- where
- φ is a proportionality constant
- N is the initial number of errors
- gi is a function of the amount of testing time spent, previously and currently. gi is usually non-decreasing. If gi(x1, x2, ..., xi) = xi, then the model reduces to the Jelinski-Moranda model.
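The expected-errors formula is straightforward to evaluate; the values of φ, N, the interval lengths, and the corrected-error counts below are hypothetical:

```python
def expected_errors(phi, N, M_prev, g_i):
    """E[f_i] = phi * (N - M_{i-1}) * g_i, per the model above."""
    return phi * (N - M_prev) * g_i

# Jelinski-Moranda special case: g_i(x_1, ..., x_i) = x_i
phi, N = 0.05, 100       # hypothetical proportionality constant, initial errors
x = [2.0, 3.0, 4.0]      # testing-interval lengths
M = [0, 8, 15]           # errors corrected before each interval
print([expected_errors(phi, N, M[i], x[i]) for i in range(3)])
```

As errors are corrected (M grows), the expected number of new errors per interval falls, which is the model's reliability-growth mechanism.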

Schneidewind Model

- Proposed by Norman Schneidewind in 1975.
- Model's basic premise is that as the testing

progresses over time, the error detection process

changes. Therefore, recent error counts are

usually of more use than earlier counts in

predicting future error counts. - Schneidewind identifies three approaches to using

the error count data. These are identified in

the following slide.

Schneidewind Model

- First approach is to use all of the error counts

for all testing intervals. - Second approach is to use only the error counts

from test intervals s through m and ignore

completely the error counts from the first s - 1

test intervals, assuming that there have been m

test intervals to date. - Third approach is a hybrid approach which uses

the cumulative error count for the first s - 1

intervals and the individual error counts for the

last m - s + 1 intervals.
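The three approaches amount to different selections of the interval error-count data; a sketch (function name and example counts are illustrative only):

```python
def select_counts(counts, s):
    """Return the error-count data each of Schneidewind's three
    approaches would use, given interval counts f_1..f_m and cutoff s."""
    approach1 = list(counts)                 # all m interval counts
    approach2 = list(counts[s - 1:])         # only intervals s..m
    # hybrid: one cumulative count for intervals 1..s-1, then the
    # individual counts for the last m - s + 1 intervals
    approach3 = [sum(counts[:s - 1])] + list(counts[s - 1:])
    return approach1, approach2, approach3

a1, a2, a3 = select_counts([5, 4, 3, 2, 1], s=3)
print(a1, a2, a3)  # [5, 4, 3, 2, 1] [3, 2, 1] [9, 3, 2, 1]
```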

Schneidewind Model (cont'd)

- Assumptions
- The number of errors detected in one interval is

independent of the error count in another. - The error correction rate is proportional to the

number of errors to be corrected. - The software is operated in a similar manner as

the anticipated operational usage. - The mean number of detected errors decreases from

one interval to the next. - The intervals are all of the same length.

Schneidewind Model (cont'd)

- Assumptions (cont'd)
- The rate of error detection is proportional to the

number of