

1
SOFTWARE METRICS FOR CONTROL AND QUALITY
ASSURANCE COURSE OVERVIEW
2
Course Objectives
  • At the end of this section of the course you
    should be able to:
  • write a metrics plan (define appropriate software
    metrics and data collection programmes to satisfy
    different quality assurance objectives)
  • understand the importance of quantification in
    software engineering.
  • differentiate between good and bad use of
    measurement in software engineering
  • know how to use a range of software measurement
    techniques to monitor product and process quality
  • analyse different types of software metrics
    datasets
  • use a method for software risk management that
    takes account of multiple factors and uncertainty

3
Course Structure
  • Software quality metrics basics
  • Software metrics practice
  • Framework for software metrics
  • Software reliability
  • Measurement theory and statistical analysis
  • Empirical software engineering
  • Software metrics for risk and uncertainty

4
Recommended Reading
  • The main course text for this part of the course
    is
  • Fenton NE and Pfleeger SL, Software Metrics: A
    Rigorous and Practical Approach (2nd Edn), PWS,
    1998

5
LESSON 1: SOFTWARE QUALITY METRICS BASICS
6
Lesson 1 objectives
  • Understand different definitions of software
    quality and how you might measure it
  • Understand different notions of defects and be
    able to classify them
  • Understand the basic techniques of data
    collection and how to apply them

7
How many Lines of Code?
8
What is software quality?
  • Fitness for purpose?
  • Conformance to specification?
  • Absence of defects?
  • Degree of excellence?
  • Timeliness?
  • All of the above?
  • None of the above?

9
Software quality - relevance
[Chart: different views of quality plotted by relevance to producer (vertical axis, low to high) against relevance to customer (horizontal axis, low to high). Items plotted: timeliness (time to market); productivity (LOC or FP per month); technical product quality (delivered defects per KLOC); conformance to schedule (deviation from planned budgets/requirements); process maturity/stability (capability index).]
10
Software Quality Models
[Diagram: a quality model decomposes Use into Factors, Factors into Criteria, and Criteria into Metrics. Example criteria: communicativeness, accuracy, consistency, device efficiency, accessibility, completeness, structuredness, conciseness, device independence, legibility, self-descriptiveness, traceability - grouped under factors such as product operation and product revision.]
11
Definition of system reliability
The reliability of a system is the probability
that the system will execute without failure in a
given environment for a given period of time.
  • Implications
  • No single reliability number for a given system -
    dependent on how the system is used
  • Use probability to express our uncertainty
  • Time dependent
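
To make the time dependence concrete, here is a minimal Python sketch (not from the course text) that assumes a constant failure rate, i.e. the exponential reliability model R(t) = exp(-lambda t):

    import math

    def reliability(failure_rate_per_hour, hours):
        # R(t) = exp(-lambda * t): probability of failure-free
        # operation for 'hours', assuming a constant failure rate.
        return math.exp(-failure_rate_per_hour * hours)

    # One failure per 1000 hours on average, in a fixed usage profile:
    print(reliability(0.001, 10))    # ~0.990 for a 10-hour period
    print(reliability(0.001, 100))   # ~0.905 for a 100-hour period

Note how the same system has different reliability figures for different periods and usage profiles - there is no single number.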

12
What is a software failure?
  • Alternative views
  • Formal view
  • Any deviation from specified program behaviour
    is a failure
  • Conformance with specification is all that
    matters
  • This is the view adopted in computer science
  • Engineering view
  • Any deviation from required, specified or
    expected behaviour is a failure
  • If an input is unspecified the program should
    produce a sensible output appropriate for the
    circumstances
  • This is the view adopted in dependability
    assessment

13
Human errors, faults, and failures

human error --can lead to--> fault --can lead to--> failure

  • Human error: designer's mistake
  • Fault: encoding of an error into a software
    document/product
  • Failure: deviation of the software system from
    specified or expected behaviour

14
Processing errors
In the absence of fault tolerance:
[Diagram: Human Error -> Fault; Fault + Input -> Processing Error -> Failure]
15
Relationship between faults and failures (Adams
1984)
[Diagram: faults plotted against the failures they cause, with failures sized by MTTF.]
35% of all faults only lead to very rare failures
(MTTF > 5000 years)
16
The relationship between faults and failures
  • Most faults are benign
  • For most faults removal will not lead to greatly
    improved reliability
  • Large reliability improvements only come when we
    eliminate the small proportion of faults which
    lead to the more frequent failures
  • Does not mean we should stop looking for faults,
    but warns us to be careful about equating fault
    counts with reliability

17
The defect density measure: an important health
warning
  • Defects = faults ∪ failures
  • but sometimes defects = faults, or defects =
    failures
  • System defect density =
    (number of defects found) / (system size)
  • where size is usually measured as thousands of
    lines of code (KLOC)
  • Defect density is used as a de-facto measure of
    software quality.
  • in the light of the Adams data this is very
    dangerous
  • What are industry norms and what do they mean?
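
As a minimal illustration (Python; the function and example values are ours), with the health warning that "defects" must be defined before the number means anything:

    def defect_density(defects_found, size_loc):
        # Defects per KLOC. State whether 'defects' means faults,
        # failures, or both - the figure is ambiguous otherwise.
        return defects_found / (size_loc / 1000.0)

    print(defect_density(481, 1_600_000))  # ~0.3 defects per KLOC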
18
Defect density vs module size
[Chart: defect density plotted against lines of code, contrasting the theoretical curve with what is actually observed.]
19
A Study in Relative Efficiency of Testing Methods
R B Grady, Practical Software Metrics for
Project Management and Process Improvement,
Prentice Hall, 1992
20
The problem with problems
  • Defects
  • Faults
  • Failures
  • Anomalies
  • Bugs
  • Crashes

21
Incident Types
  • Failure (in pre- or post-release)
  • Fault
  • Change request

22
Generic Data
  • Applicable to all incident types
  • What: Product details
  • Where (Location): Where is it?
  • Who: Who found it?
  • When (Timing): When did it occur?
  • What happened (End Result): What was observed?
  • How (Trigger): How did it arise?
  • Why (Cause): Why did it occur?
  • Severity/Criticality/Urgency
  • Change

23
Example Failure Data
  • What: ABC Software Version 2.3
  • Where: Norman's home PC
  • Who: Norman
  • When: 13 Jan 2000 at 21:08, after 35 minutes of
    operational use
  • End result: Program crashed with error message
    xyz
  • How: Loaded external file and clicked the command
    Z.
  • Why: <BLANK - refer to fault>
  • Severity: Major
  • Change: <BLANK>

24
Example Fault Data (1) - reactive
  • What: ABC Software Version 2.3
  • Where: Help file, section 5.7
  • Who: Norman
  • When: 15 Jan 2000, during formal inspection
  • End result: Likely to cause users to enter
    invalid passwords
  • How: The text wrongly says that passwords are
    case sensitive
  • Why: <BLANK>
  • Urgency: Minor
  • Change: Suggest rewording as follows ...

25
Example Fault Data (2) - responsive
  • What: ABC Software Version 2.3
  • Where: Function <abcd> in Module <ts0023>
  • Who: Simon
  • When: 14 Jan 2000, after 2 hours investigation
  • What happened: Caused reported failure id <0096>
  • How: <BLANK>
  • Why: Missing exception code for command Z
  • Urgency: Major
  • Change: exception code for command Z added to
    function <abcd> and also to function <efgh>.
    Closed on 15 Jan 2000.

26
Example Change Request
  • What: ABC Software Version 2.3
  • Where: File save menu options
  • Who: Norman
  • When: 20 Jan 2000
  • End result: <BLANK>
  • How: <BLANK>
  • Why: Must be able to save files in ASCII format -
    currently not possible
  • Urgency: Major
  • Change: Add function to enable ASCII format file
    saving

27
Tracking incidents to components
  • Incidents need to be traceable to identifiable
    components - but at what level of granularity?
  • Unit
  • Module
  • Subsystem
  • System

28
Fault classifications used in Eurostar control
system
29
Lesson 1 Summary
  • Software quality is a multi-dimensional notion
  • Defect density is a common (but confusing) way of
    measuring software quality
  • The notion of defects or problems is highly
    ambiguous - distinguish between faults and
    failures
  • Removing faults may not lead to large reliability
    improvements
  • Much data collection focuses on incident types:
    failures, faults, and changes. There are who,
    when, where, ... type data to collect in each case
  • System components must be identified at
    appropriate levels of granularity

30
LESSON 2: SOFTWARE METRICS PRACTICE
31
Lesson 2 Objectives
  • Understand why measurement is important for
    software quality assurance and assessment
  • Understand the basic metrics approaches used in
    industry and how to apply them
  • Understand the importance of goal-driven
    measurement and know how to identify specific
    goals
  • Understand what a metrics plan is and how to
    write one

32
Why software measurement?
  • To assess software products
  • To assess software methods
  • To help improve software processes

33
From Goals to Actions
34
Goal Question Metric (GQM)
  • There should be a clearly-defined need for every
    measurement.
  • Begin with the overall goals of the project or
    product.
  • From the goals, generate questions whose answers
    will tell you if the goals are met.
  • From the questions, suggest measurements that can
    help to answer the questions.
  • From Basili and Rombach's Goal-Question-Metric
    paradigm, described in their 1988 IEEE
    Transactions on Software Engineering paper on the
    TAME project.

35
GQM Example
Goal: Identify fault-prone modules as early as
possible
Questions:
  • What do we mean by fault-prone module?
  • Does complexity impact fault-proneness?
  • How much testing is done per module?
Metrics:
  • Defect data for each module
  • faults found per testing phase
  • failures traced to module
  • Effort data for each module
  • testing effort per testing phase
  • faults found per testing phase
  • Size/complexity data for each module
  • KLOC
  • complexity metrics
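
One way to keep every metric traceable to a question and a goal is to record the GQM tree explicitly; a hypothetical Python sketch of the example above (structure and names are ours):

    gqm = {
        "goal": "Identify fault-prone modules as early as possible",
        "questions": {
            "What do we mean by fault-prone module?":
                ["faults found per testing phase",
                 "failures traced to module"],
            "Does complexity impact fault-proneness?":
                ["KLOC", "complexity metrics"],
            "How much testing is done per module?":
                ["testing effort per testing phase"],
        },
    }

    # Every metric must answer a question that serves the goal:
    for question, metrics in gqm["questions"].items():
        print(question, "->", ", ".join(metrics))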

36
The Metrics Plan
  • For each technical goal this contains information
    about
  • WHY metrics can address the goal
  • WHAT metrics will be collected, how they will be
    defined, and how they will be analyzed
  • WHO will do the collecting, who will do the
    analyzing, and who will see the results
  • HOW it will be done - what tools, techniques and
    practices will be used to support metrics
    collection and analysis
  • WHEN in the process and how often the metrics
    will be collected and analyzed
  • WHERE the data will be stored

37
The Enduring LOC Measure
  • LOC = Number of Lines Of Code
  • The simplest and most widely used measure of
    program size. Easy to compute and automate
  • Used (as a normalising measure) for
  • productivity assessment (LOC/effort)
  • effort/cost estimation (Effort = f(LOC))
  • quality assessment/estimation (defects/LOC)
  • Alternative (similar) measures
  • KLOC: Thousands of Lines Of Code
  • KDSI: Thousands of Delivered Source Instructions
  • NCLOC: Non-Comment Lines of Code
  • Number of Characters or Number of Bytes
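
Because there is no standard definition (see the problems listed two slides on), any LOC figure should be published together with its counting rules. A minimal Python sketch of an NCLOC counter for '#'-commented source files - the counting rules here are our own assumption:

    def ncloc(path, comment_prefix="#"):
        # Count non-comment, non-blank lines: one of many possible
        # LOC definitions, so always report the rules with the number.
        count = 0
        with open(path) as source:
            for line in source:
                stripped = line.strip()
                if stripped and not stripped.startswith(comment_prefix):
                    count += 1
        return count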

38
Example: Software Productivity at Toshiba
[Chart: instructions per programmer month (scale 0-300) from 1972 to 1982, with the point at which the Software Workbench System was introduced marked.]
39
Problems with LOC type measures
  • No standard definition
  • Measures length of programs rather than size
  • Wrongly used as a surrogate for
  • effort
  • complexity
  • functionality
  • Fails to take account of redundancy and reuse
  • Cannot be used comparatively for different types
    of programming languages
  • Only available at the end of the development
    life-cycle

40
Fundamental software size attributes
  • length: the physical size of the product
  • functionality: measures the functions supplied by
    the product to the user
  • complexity
  • Problem complexity: measures the complexity of the
    underlying problem.
  • Algorithmic complexity: reflects the
    complexity/efficiency of the algorithm
    implemented to solve the problem
  • Structural complexity: measures the structure of
    the software used to implement the algorithm
    (includes control flow structure, hierarchical
    structure and modular structure)
  • Cognitive complexity: measures the effort required
    to understand the software.

41
The search for more discriminating metrics
  • Measures that
  • capture cognitive complexity
  • capture structural complexity
  • capture functionality (or functional complexity)
  • are language independent
  • can be extracted at early life-cycle phases

42
The 1970s: Measures of Source Code
  • Characterized by
  • Halstead's Software Science metrics
  • McCabe's Cyclomatic Complexity metric
  • Influenced by
  • Growing acceptance of structured programming
  • Notions of cognitive complexity

43
Halstead's Software Science Metrics
A program P is a collection of tokens, classified
as either operators or operands.
n1 = number of unique operators
n2 = number of unique operands
N1 = total occurrences of operators
N2 = total occurrences of operands
Length of P is N = N1 + N2
Vocabulary of P is n = n1 + n2
Theory: Estimate of N is N^ = n1 log2 n1 + n2 log2 n2
Theory: Effort required to generate P is
E = (n1 N2 N log2 n) / (2 n2)
(elementary mental discriminations)
Theory: Time required to program P is T = E/18
seconds
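
The formulas translate directly into code; a Python sketch (the token counts passed in the example call are invented for illustration):

    import math

    def halstead(n1, n2, N1, N2):
        # n1, n2: unique operators/operands
        # N1, N2: total occurrences of operators/operands
        N = N1 + N2                                      # length
        n = n1 + n2                                      # vocabulary
        N_est = n1 * math.log2(n1) + n2 * math.log2(n2)  # estimated length
        E = (n1 * N2 * N * math.log2(n)) / (2 * n2)      # effort
        T = E / 18                                       # time (seconds)
        return N, n, N_est, E, T

    print(halstead(n1=10, n2=15, N1=60, N2=40))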
44
McCabe's Cyclomatic Complexity Metric v
If G is the control flowgraph of program P and G
has e edges (arcs) and n nodes, then
v(P) = e - n + 2
v(P) is the number of linearly independent paths
in G
(In the slide's example flowgraph: e = 16, n = 13, so v(P) = 5.)
More simply, if d is the number of decision nodes
in G then
v(P) = d + 1
McCabe proposed v(P) < 10 for each module P
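
Both forms of the definition are trivial to compute once the flowgraph is known; a Python sketch reproducing the slide's example:

    def cyclomatic_complexity(edges, nodes):
        # v(P) = e - n + 2 for a connected control flowgraph
        return edges - nodes + 2

    def cyclomatic_from_decisions(decisions):
        # Equivalent shortcut: v(P) = d + 1, where d is the
        # number of decision nodes
        return decisions + 1

    assert cyclomatic_complexity(edges=16, nodes=13) == 5
    assert cyclomatic_from_decisions(decisions=4) == 5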
45
Flowgraph based measures
  • Many software measures are based on a flowgraph
    model of a program
  • Most such measures can be automatically computed
    once the flowgraph decomposition is known
  • The notion of flowgraph decomposition provides a
    rigorous, generalised theory of structured
    programming
  • There are tools for computing flowgraph
    decomposition

46
The 1980s: Early Life-Cycle Measures
  • Predictive process measures - effort and cost
    estimation
  • Measures of designs
  • Measures of specifications

47
Software Cost Estimation
48
Simple COCOMO Effort Prediction
effort = a (size)^b
where effort = person months (predicted), size = KDSI,
and a, b are constants depending on the type of system:
organic:       a = 2.4, b = 1.05
semi-detached: a = 3.0, b = 1.12
embedded:      a = 3.6, b = 1.20
49
COCOMO Development Time Prediction
time = a (effort)^b
where effort = person months, time = development time
(months), and a, b are constants depending on the type of system:
organic:       a = 2.5, b = 0.38
semi-detached: a = 2.5, b = 0.35
embedded:      a = 2.5, b = 0.32
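
Combining the two slides, a basic-COCOMO sketch in Python (the 32 KDSI example input is invented):

    COCOMO = {  # (effort a, effort b, time a, time b)
        "organic":       (2.4, 1.05, 2.5, 0.38),
        "semi-detached": (3.0, 1.12, 2.5, 0.35),
        "embedded":      (3.6, 1.20, 2.5, 0.32),
    }

    def cocomo(size_kdsi, mode):
        ea, eb, ta, tb = COCOMO[mode]
        effort = ea * size_kdsi ** eb   # person months
        months = ta * effort ** tb      # development time
        return effort, months

    effort, months = cocomo(32, "embedded")
    print(f"~{effort:.0f} person-months over ~{months:.1f} months")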
50
Regression Based Cost Modelling
[Chart: log-log scatterplot of effort E (person months, 10 to 10,000) against size S (1K to 10000K), with a fitted straight line of slope b and intercept log a.]
log E = log a + b log S, i.e. E = a S^b
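
The fit itself is ordinary least squares on the logged data; a sketch using numpy (the past-project data points are invented):

    import numpy as np

    size = np.array([10, 23, 45, 110, 380])      # KLOC, past projects
    effort = np.array([34, 60, 130, 310, 950])   # person months

    # Fit log E = log a + b log S by least squares
    b, log_a = np.polyfit(np.log(size), np.log(effort), 1)
    a = np.exp(log_a)
    print(f"E = {a:.2f} * S^{b:.2f}")

    print(a * 80 ** b)  # predicted effort for a new 80 KLOC project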
51
Albrecht's Function Points
Count the number of
External inputs, External outputs, External
inquiries, External files, Internal files,
giving each a weighting factor.
The Unadjusted Function Count (UFC) is the sum
of all these weighted scores.
To get the Adjusted Function Count (FP),
multiply by a Technical Complexity Factor (TCF):
FP = UFC x TCF
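
A sketch of the unadjusted count and the adjustment in Python. The slide does not list the weights, so the average-complexity weights and the counts below are illustrative assumptions:

    WEIGHTS = {                  # assumed average-complexity weights
        "external_inputs": 4,
        "external_outputs": 5,
        "external_inquiries": 4,
        "external_files": 7,
        "internal_files": 10,
    }

    def function_points(counts, tcf=1.0):
        # UFC = sum of weighted counts; FP = UFC x TCF
        ufc = sum(WEIGHTS[item] * n for item, n in counts.items())
        return ufc * tcf

    counts = {"external_inputs": 12, "external_outputs": 8,
              "external_inquiries": 5, "external_files": 2,
              "internal_files": 6}
    print(function_points(counts, tcf=0.95))  # UFC = 182, FP = 172.9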
52
Function Points Example
53
Function Points Applications
  • Used extensively as a size measure in
    preference to LOC
  • Examples

FP Person months effort
Productivity
Defects FP
Quality
Effort prediction
Ef(FP)
54
Function Points and Program Size

Language                 Source Statements per FP
Assembler                320
C                        150
Algol                    106
COBOL                    106
FORTRAN                  106
Pascal                   91
RPG                      80
PL/1                     80
MODULA-2                 71
PROLOG                   64
LISP                     64
BASIC                    64
4GL Database             40
APL                      32
SMALLTALK                21
Query languages          16
Spreadsheet languages    6
55
The 1990s: Broader Perspective
  • Reports on company-wide measurement programmes
  • Benchmarking
  • Impact of SEI's CMM process assessment
  • Use of metrics tools
  • Measurement theory as a unifying framework
  • Emergence of international software measurement
    standards
  • measuring software quality
  • function point counting
  • general data collection

56
The SEI Capability Maturity Model
Level 5 (Optimising): process change management,
technology change management, defect prevention
Level 4 (Managed): software quality management,
quantitative process management
Level 3 (Defined): peer reviews, training programme,
intergroup coordination, integrated s/w management,
organization process definition/focus
Level 2 (Repeatable): s/w configuration management,
s/w QA, s/w project planning, s/w subcontract
management, s/w requirements management
Level 1 (Initial/ad-hoc)
57
Results of 1987-1991 SEI Assessments

Level 1   Level 2   Level 3   Level 4   Level 5
81%       12%       7%        0%        0%
87%       9%        4%        0%        0%
62%       23%       15%       0%        0%
58
Process improvement at Motorola
[Chart: in-process defects per MAELOC over time.]
59
IBM Space Shuttle Software Metrics Program (1)
[Chart: early detection rate against total inserted error rate.]
60
IBM Space Shuttle Software Metrics Program (2)
[Chart: predicted total error rate trend (errors per KLOC, scale 0-14) across onboard flight software releases 1 through 8F, showing the actual rate tracking the expected value between the 95% low and 95% high prediction bounds.]
61
IBM Space Shuttle Software Metrics Program (3)
[Chart: onboard flight software failures occurring per base system, by basic operational increment.]
62
ISO 9126 Software Product Evaluation Standard
  • Quality characteristics and guidelines for their
    use
  • Chosen characteristics are
  • Functionality
  • Reliability
  • Usability
  • Efficiency
  • Maintainability
  • Portability

63
Lesson 2 Summary
  • Measurement activities should be goal-driven
  • A metrics plan details how to create a metrics
    programme to meet specific technical objectives
  • Software metrics usually driven by objectives
  • productivity assessment
  • cost/effort estimation
  • quality assessment and prediction
  • All common metrics traceable to above objectives
  • Recent trend away from specific metrics and
    models toward company-wide metrics programmes
  • Software measurement now widely accepted as key
    subject area in software engineering

64
LESSON 3: SOFTWARE METRICS FRAMEWORK
65
Lesson 3 Objectives
  • Learn basic measurement definitions and a
    software metrics framework that conforms to these
  • Understand how and why diverse metrics activities
    fit into the framework
  • Learn how to define your own relevant metrics in
    a rigorous way
  • Bringing it together in case study

66
Software Measurement Activities
Are these diverse activities related?
67
Opposing Views on Measurement?
  • "When you can measure what you are speaking
    about, and express it in numbers, you know
    something about it; but when you cannot measure
    it, when you cannot express it in numbers, your
    knowledge is of a meagre kind."
  • Lord Kelvin
  • "In truth, a good case could be made that if your
    knowledge is meagre and unsatisfactory, the last
    thing in the world you should do is make
    measurements. The chance is negligible that you
    will measure the right things accidentally."
  • George Miller

68
Definition of Measurement
Measurement is the process of empirical objective
assignment of numbers to entities, in order to
characterise a specific attribute.
  • Entity: an object or event
  • Attribute: a feature or property of an entity
  • Objective: the measurement process must be
    based on a well-defined rule whose results
    are repeatable

69
Example Measures
70
Avoiding Mistakes in Measurement
  • Common mistakes in software measurement can be
    avoided simply by adhering to the definition of
    measurement. In particular
  • You must specify both entity and attribute
  • The entity must be defined precisely
  • You must have a reasonable, intuitive
    understanding of the attribute before you propose
    a measure
  • The theory of measurement formalises these ideas

71
Be Clear of Your Attribute
  • It is a mistake to propose a measure if there
    is no consensus on what attribute it
    characterises.
  • Results of an IQ test
  • intelligence?
  • or verbal ability?
  • or problem solving skills?
  • defects found / KLOC
  • quality of code?
  • quality of testing?

72
A Cautionary Note
  • We must not re-define an attribute to fit in with
    an existing measure.

73
Types and uses of measurement
  • Two distinct types of measurement
  • direct measurement
  • indirect measurement
  • Two distinct uses of measurement
  • for assessment
  • for prediction
  • Measurement for prediction requires a prediction
    system

74
Some Direct Software Measures
  • Length of source code (measured by LOC)
  • Duration of testing process (measured by elapsed
    time in hours)
  • Number of defects discovered during the testing
    process (measured by counting defects)
  • Effort of a programmer on a project (measured by
    person months worked)

75
Some Indirect Software Measures
Programmer productivity = LOC produced / person months of effort
Module defect density = number of defects / module size
Defect detection efficiency = number of defects detected / total number of defects
Requirements stability = number of initial requirements / total number of requirements
Test effectiveness ratio = number of items covered / total number of items
System spoilage = effort spent fixing faults / total project effort
76
Predictive Measurement
  • Measurement for prediction requires a prediction
    system. This consists of
  • Mathematical model
  • e.g. E = a S^b, where E is effort in person months
    (to be predicted), S is size (LOC), and a and b
    are constants.
  • Procedures for determining model parameters
  • e.g. Use regression analysis on past project
    data to determine a and b.
  • Procedures for interpreting the results
  • e.g. Use Bayesian probability to determine the
    likelihood that your prediction is accurate to
    within 10%

77
No Short Cut to Accurate Prediction
  • Testing your methods on a sample of past data
    gets to the heart of the scientific approach to
    gambling. Unfortunately this implies some
    preliminary spadework, and most people skimp on
    that bit, preferring to rely on blind faith
    instead
  • Drapkin and Forsyth 1987
  • Software prediction (such as cost estimation) is
    no different from gambling in this respect

78
Products, Processes, and Resources
  • Process: a software related activity or event
  • testing, designing, coding, etc.
  • Product: an object which results from a process
  • test plans, specification and design documents,
    source and object code, minutes of meetings, etc.
  • Resource: an item which is input to a process
  • people, hardware, software, etc.

79
Internal and External Attributes
  • Let X be a product, process, or resource
  • External attributes of X are those which can only
    be measured with respect to how X relates to its
    environment
  • e.g. reliability or maintainability of source
    code (product)
  • Internal attributes of X are those which can be
    measured purely in terms of X itself
  • e.g. length or structuredness of source code
    (product)

80
The Framework Applied

ENTITIES                INTERNAL ATTRIBUTES                      EXTERNAL ATTRIBUTES
PRODUCTS
  Specification         length, functionality, ...               maintainability, ...
  Source Code           modularity, structuredness, reuse, ...   reliability, ...
PROCESSES
  Design                time, effort, spec faults found, ...     stability, ...
  Test                  time, effort, failures observed, ...     cost-effectiveness, ...
RESOURCES
  People                age, price, CMM level, ...               productivity, ...
  Tools                 price, size, ...                         usability, quality, ...
81
Lesson 3 Summary
  • Measurement is about characterising attributes of
    entities
  • Measurement can be either direct or indirect
  • Measurement is either for assessment or
    prediction
  • The framework for software measurement is based
    on
  • classifying software entities as products,
    processes, and resources
  • classifying attributes as internal or external
  • determining whether the activity is assessment or
    prediction
  • only when you can answer all these questions are
    you ready for measurement

82
CASE STUDY COMPANY OBJECTIVES
  • Monitor and improve product reliability
  • requires information about actual operational
    failures
  • Monitor and improve product maintainability
  • requires information about fault discovery and
    fixing
  • Process improvement
  • too high a level objective for metrics programme
  • previous objectives partially characterise
    process improvement

83
General System Information
  • 27 releases since Nov '87 implementation
  • Currently 1.6 million LOC in main system (15.2%
    increase from 1991 to 1992)

[Chart: LOC in 1991 vs 1992, split into COBOL and Natural, rising towards 1.6 million in total.]
84
Main Data

Fault Number | Week In | System Area | Fault Type | Week Out | Hours to Repair
...          | ...     | ...         | ...        | ...      | ...
F254         | 92/14   | C2          | P          | 92/17    | 5.5

  • faults are really failures (the lack of a
    distinction caused problems)
  • 481 (distinct) cleared faults during the year
  • 28 system areas (functionally cohesive)
  • 11 classes of faults
  • Repair time: actual time to locate and fix defect

85
Case Study Components
  • 28 System areas
  • All closed faults traced to system area
  • System areas made up of Natural, Batch COBOL, and
    CICS COBOL programs
  • Typically 80 programs in each; a typical program
    is 1,000 LOC
  • No documented mapping of program to system area
  • For most faults batch repair and reporting
  • No direct, recorded link between fault and
    program in most cases
  • No database with program size information
  • No historical database to capture trends

86
Single Incident Close Report

Fault id:         F752
Reported:         18/6/92
Definition:       Logically deleted work done records appear on
                  enquiries
Description:      Causes misleading info to users. Amend ADDITIONAL
                  WORK PERFORMED in RDVIPG2A to ignore work done
                  records with FLAG-AMEND = 1 or 2
Programs changed: RDVIPG2A, RGHXXZ3B
SPE:              Joe Bloggs
Date closed:      26/6/92
87
Single Incident Close Report: Improved Version

Fault id:         F752
Reported:         18/6/92
Trigger:          Delete work done record, then open enquiry
End result:       Deleted records appear on enquiries, providing
                  misleading info to users
Cause:            Omission of appropriate flag variables for work
                  done records
Change:           Amend ADDITIONAL WORK PERFORMED in RDVIPG2A to
                  ignore work done records with FLAG-AMEND = 1 or 2
Programs changed: RDVIPG2A, RGHXXZ3B
SPE:              Joe Bloggs
Date closed:      26/6/92
88
Fault Classification
Non-orthogonal classes: Data, Micro, JCL, Operations,
Misc, Unresolved, Program, Query, Release,
Specification, User
89
Missing Data
  • Recoverable
  • Size information
  • Static/complexity information
  • Mapping of faults to programs
  • Severity categories
  • Non-recoverable
  • Operational usage per system area
  • Success/failure of fixes
  • Number of repeated failures

90
Reliability Trend
[Chart: faults received per week (0-50) over weeks 10 to 50 of the year.]
91
Identifying Fault Prone Systems?
[Chart: number of faults per system area in 1992 (scale 0-90), ranging from system area C2 at the high end down to area J.]
92
Analysis of Fault Types
[Pie chart: faults by fault type (total 481 faults): Program, Data, User, Query, Unresolved, Release, Misc, and Others.]
93
Fault Types and System Areas
[Chart: the most common fault types (Program, Data, User, Release, Unresolved, Query, Miscellaneous) plotted across system areas; counts range up to about 70 faults.]
94
Maintainability Across System Areas
[Chart: mean time to repair a fault, in hours (0-10), by system area (D, O, S, W1, F, W, C3, P, L, G, C1, J, T, D1, G2, N, Z, C, C2, G1, U).]
95
Maintainability Across Fault Types
[Chart: mean time to repair a fault, in hours (0-9), by fault type (JCL, Program, Spec, Release, Operations, User, Unresolved, Misc, Data, Query).]
96
Case study results with additional data: System
Structure
97
Normalised Fault Rates (1)
Faults per KLOC
98
Normalised Fault Rates (2)
Faults per KLOC
99
Case Study 1 Summary
  • The hard-to-collect data was mostly all there
  • Exceptional information on post-release faults
    and maintenance effort
  • It is feasible to collect this crucial data
  • Some easy-to-collect (but crucial) data was
    omitted or not accessible
  • The addition to the metrics database of some
    basic information (mostly already collected
    elsewhere) would have enabled proactive activity.
  • Goals almost fully met with the simple additional
    data.
  • Crucial explanatory analysis possible with simple
    additional data
  • Goals of monitoring reliability and
    maintainability only partly met with existing data

100
LESSON 4: SOFTWARE METRICS - MEASUREMENT THEORY
AND STATISTICAL ANALYSIS
101
Lesson 4 Objectives
  • To understand in a formal sense what it means to
    measure something and to know when we have a
    satisfactory measure
  • To understand the different measurement scale
    types
  • To understand which types of statistical analyses
    are valid for which scale types
  • To be able to perform some simple statistical
    analyses relevant to software measurement data

102
Natural Evolution of Measures
  • As our understanding of an attribute grows, it is
    possible to define more sophisticated measures
    e.g. temperature of liquids
  • 200 BC - rankings, "hotter than"
  • 1600 - first thermometer preserving "hotter
    than"
  • 1720 - Fahrenheit scale
  • 1742 - Centigrade scale
  • 1854 - Absolute zero, Kelvin scale

103
Measurement Theory Objectives
  • Measurement theory is the scientific basis for
    all types of measurement. It is used to determine
    formally
  • When we have really defined a measure
  • Which statements involving measurement are
    meaningful
  • What the appropriate scale type is
  • What types of statistical operations can be
    applied to measurement data

104
Measurement Theory Key Components
  • Empirical relation system
  • the relations which are observed on entities in
    the real world which characterise our
    understanding of the attribute in question,
    e.g. "Fred taller than Joe" (for height of
    people)
  • Representation condition
  • real world entities are mapped to numbers (the
    measurement mapping) in such a way that all
    empirical relations are preserved in numerical
    relations and no new relations are created, e.g.
    M(Fred) > M(Joe) precisely when Fred is taller
    than Joe
  • Uniqueness theorem
  • which different mappings satisfy the
    representation condition, e.g. we can measure
    height in inches, feet, centimetres, etc. but all
    such mappings are related in a special way.

105
Representation Condition
[Diagram: the measurement mapping M takes real-world entities to numbers, here Joe to 72 and Fred to 63. The empirical relation "Joe taller than Fred" is preserved under M as the numerical relation M(Joe) > M(Fred).]
106
Meaningfulness in Measurement
  • Some statements involving measurement appear more
    meaningful than others
  • Fred is twice as tall as Jane
  • The temperature in Tokyo today is twice that in
    London
  • The difference in temperature between Tokyo and
    London today is twice what it was yesterday

Formally, a statement involving measurement
is meaningful if its truth value is invariant
under transformations of allowable scales
107
Measurement Scale Types
  • Some measures seem to be of a different type to
    others, depending on what kind of statements are
    meaningful. The 5 most important scale types of
    measurement are
  • Nominal
  • Ordinal
  • Interval
  • Ratio
  • Absolute

Increasing order of sophistication
108
Nominal Scale Measurement
  • Simplest possible measurement
  • Empirical relation system consists only of
    different classes no notion of ordering.
  • Any distinct numbering of the classes is an
    acceptable measure (could even use symbols rather
    than numbers), but the size of the numbers have
    no meaning for the measure

109
Ordinal Scale Measurement
  • In addition to classifying, the classes are also
    ordered with respect to the attribute
  • Any mapping that preserves the ordering (i.e. any
    monotonic function) is acceptable
  • The numbers represent ranking only, so addition
    and subtraction (and other arithmetic operations)
    have no meaning

110
Interval Scale Measurement
  • Powerful, but rare in practice
  • Distances between entities matters, but not
    ratios
  • Mapping must preserve order and intervals
  • Examples
  • Timing of events occurrence, e.g. could measure
    these in units of years, days, hours etc, all
    relative to different fixed events. Thus it is
    meaningless to say Project X started twice as
    early as project Y, but meaningful to say the
    time between project X starting and now is twice
    the time between project Y starting and now
  • Air Temperature measured on Fahrenheit or
    Centigrade scale

111
Ratio Scale Measurement
  • Common in physical sciences. Most useful scale of
    measurement
  • Ordering, distance between entities, ratios
  • Zero element (representing total lack of the
    attribute)
  • Numbers start at zero and increase at equal
    intervals (units)
  • All arithmetic can be meaningfully applied

112
Absolute Scale Measurement
  • Absolute scale measurement is just counting
  • The attribute must always be of the form of
    number of occurrences of x in the entity
  • number of failures observed during integration
    testing
  • number of students in this class
  • Only one possible measurement mapping (the actual
    count)
  • All arithmetic is meaningful

113
Problems of measuring program complexity
  • Attribute is complexity of programs
  • Let R be the empirical relation "more complex
    than". Intuition may rank some pairs but not
    others: xRy, but neither xRz nor zRy - a partial
    order only
  • No real-valued measure of complexity is
    possible, since the real numbers are totally
    ordered but the empirical relation is not

114
Validation of Measures
  • Validation of a software measure is the process
    of ensuring that the measure is a proper
    numerical characterisation of the claimed
    attribute
  • Example
  • A valid measure of length of programs must not
    contradict any intuitive notion about program
    length
  • If program P2 is bigger than P1 then m(P2) >
    m(P1)
  • If m(P1) = 7 and m(P2) = 9, then if P1 and P2
    are concatenated, m(P1;P2) must equal
    m(P1) + m(P2) = 16
  • A stricter criterion is to demonstrate that the
    measure is itself part of a valid prediction system

115
Validation of Prediction Systems
  • Validation of a prediction system, in a given
    environment, is the process of establishing the
    accuracy of the predictions made by empirical
    means
  • i.e. by comparing predictions against known data
    points
  • Methods
  • Experimentation
  • Actual use
  • Tools
  • Statistics
  • Probability

116
Scale Types Summary

Scale Type   Characteristics
Nominal      Entities are classified. No arithmetic
             meaningful.
Ordinal      Entities are classified and ordered.
             Cannot use + or -.
Interval     Entities classified, ordered, and
             differences between them understood
             (units). No zero, but can use ordinary
             arithmetic on intervals.
Ratio        Zeros, units, ratios between entities.
             All arithmetic.
Absolute     Counting: only one possible measure.
             All arithmetic.
117
Meaningfulness and Statistics
  • The scale type of a measure affects what
    operations it is meaningful to perform on the
    data
  • Many statistical analyses use arithmetic
    operators
  • These techniques cannot be used on certain data -
    particularly nominal and ordinal measures

118
Example: The Mean
  • Suppose we have a set of values a1, a2, ..., an
    and wish to compute the average
  • The mean is (a1 + a2 + ... + an) / n
  • The mean is not a meaningful average for a set of
    ordinal scale data

119
Alternative Measures of Average
Median The midpoint of the data when it
is arranged in increasing order. It divides the
data into two equal parts
Suitable for ordinal data. Not suitable for
nominal data since it relies on order having
meaning.
Mode The commonest value
Suitable for nominal data
120
Summary of Meaningful Statistics

Scale Type   Average           Spread
Nominal      Mode              Frequency
Ordinal      Median            Percentile
Interval     Arithmetic mean   Standard deviation
Ratio        Geometric mean    Coefficient of variation
Absolute     Any               Any
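
A small Python illustration of why the choice matters for ordinal data (the ranks are invented):

    import statistics

    severity_ranks = [1, 2, 2, 3, 5, 5, 5]     # ordinal scale data

    print(statistics.median(severity_ranks))   # 3 - meaningful
    print(statistics.mode(severity_ranks))     # 5 - meaningful

    # Python will happily compute the mean, but for ordinal data
    # the result is not meaningful:
    print(statistics.mean(severity_ranks))     # ~3.29 - meaningless here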
121
Non-Parametric Techniques
  • Most software measures cannot be assumed to be
    normally distributed. This restricts the kind of
    analytical techniques we can apply.
  • Hence we use non-parametric techniques
  • Pie charts
  • Bar graphs
  • Scatter plots
  • Box plots

122
Box Plots
  • Graphical representation of the spread of data.
  • Consists of a box with tails drawn relative to a
    scale.
  • Constructing the box plot
  • Arrange data in increasing order
  • The box is defined by the median, upper quartile
    (u) and lower quartile (l) of the data. Box
    length b = u - l
  • Upper tail is u + 1.5b, lower tail is l - 1.5b
  • Mark any data items outside upper or lower tail
    (outliers)
  • If necessary truncate tails (usually at 0) to
    avoid meaningless concepts like negative lines of
    code
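
The construction rules above translate into a few lines of Python (the module LOC data is invented; the quartile method is the standard library's, which differs slightly from hand calculation):

    import statistics

    def box_plot_stats(data):
        data = sorted(data)
        median = statistics.median(data)
        l, _, u = statistics.quantiles(data, n=4)   # lower/upper quartiles
        b = u - l                                   # box length
        lower_tail = max(l - 1.5 * b, 0)            # truncate at 0
        upper_tail = u + 1.5 * b
        outliers = [x for x in data
                    if x < lower_tail or x > upper_tail]
        return median, l, u, lower_tail, upper_tail, outliers

    module_loc = [120, 340, 400, 510, 600, 640, 700, 3200]
    print(box_plot_stats(module_loc))   # flags 3200 as an outlier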
123
Box Plots Examples
124
Scatterplots
  • Scatterplots are used to represent data for which
    two measures are given for each entity
  • Two dimensional plot where each axis represents
    one measure and each entity is plotted as a point
    in the 2-D plane

125
Example Scatterplot Length vs Effort
126
Determining Relationships
[Scatterplot annotated with a linear fit, a non-linear fit, and points queried as possible outliers.]
127
Causes of Outliers
  • There may be many causes of outliers, some
    acceptable and others not. Further investigation
    is needed to determine the cause
  • Example A long module with few errors may be due
    to
  • the code being of high quality
  • the module being especially simple
  • reuse of code
  • poor testing
  • Only the last requires action, although if it is
    the first it would be useful to examine further
    explanatory factors so that the good lessons can
    be learnt (was it use of a special tool or
    method, was it just because of good people or
    management, or was it just luck?)

128
Control Charts
  • Help you to see when your data are within
    acceptable bounds
  • By watching the data trends over time, you can
    decide whether to take action to prevent problems
    before they occur.
  • Calculate the mean and standard deviation of the
    data, and then two control limits (typically the
    mean plus and minus a multiple of the standard
    deviation).
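
A minimal sketch in Python (the data and the choice of two-standard-deviation limits are our assumptions):

    import statistics

    def control_limits(data, k=2):
        # Control limits at mean +/- k standard deviations
        mean = statistics.mean(data)
        sd = statistics.stdev(data)
        return mean - k * sd, mean, mean + k * sd

    prep_per_inspection_hour = [1.2, 2.4, 1.1, 4.2, 1.8, 1.6, 2.0]
    lcl, mean, ucl = control_limits(prep_per_inspection_hour)
    for component, value in enumerate(prep_per_inspection_hour, 1):
        flag = "investigate" if not lcl <= value <= ucl else "ok"
        print(component, value, flag)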

129
Control Chart Example
[Chart: preparation hours per hour of inspection (scale 0-4.0) for components 1 through 7, plotted against the mean and the upper and lower control limits.]
130
Lesson 4 Summary
  • Measurement theory enables us to determine when a
    measure is properly defined and what its scale
    type is
  • The scale type for a measure determines
  • Which statements about the measure are meaningful
  • Which statistical operations can be applied to
    the data
  • Most software metrics data comes from a
    non-normal distribution. This means that we need
    to use non-parametric analysis techniques
  • Pie charts, bar graphs, scatterplots, and box
    plots
  • Scatterplots and box plots are particularly
    useful for outlier analysis
  • Finding outliers is a good starting point for
    software quality control

131
LESSON 5: EMPIRICAL RESULTS
132
Lesson 5 Objectives
  • To see typical metrics from a major system
  • To understand how these metrics cast doubt on
    common software engineering assumptions
  • To understand from practical examples both the
    benefits and limitations of software metrics for
    quality control and assurance
  • To learn how measurement is used to evaluate
    technologies in software engineering
  • To appreciate how little is really known about
    what really works in software engineering

133
Case study: Basic data
  • Major switching system software
  • Modules randomly selected from those that were
    new or modified in each release
  • Module is typically 2,000 LOC
  • Only distinct faults that were fixed are counted
  • Numerous metrics for each module

134
Hypotheses tested
  • Hypotheses relating to Pareto principle of
    distribution of faults and failures
  • Hypotheses relating to the use of early fault
    data to predict later fault and failure data
  • Hypotheses about metrics for fault prediction
  • Benchmarking hypotheses

135
Hypothesis 1a: a small number of modules contain
most of the faults discovered during testing
[Chart: cumulative % of faults (0-100) against % of modules (30, 60, 90), showing most faults concentrated in a small proportion of modules.]
136
Hypothesis 1b
  • If a small number of modules contain most of the
    faults discovered during pre-release testing then
    this is simply because those modules constitute
    most of the code size.
  • For release n, the 20% of the modules which
    account for 60% of the faults (discussed in
    hypothesis 1a) actually make up just 30% of the
    system size. The result for release n+1 was
    almost identical.

137
Hypothesis 2a: a small number of modules contain
most of the operational faults?
[Chart: cumulative % of failures (0-100) against % of modules (10, 100), showing operational failures even more heavily concentrated.]
138
Hypothesis 2b
  • If a small number of modules contain most of the
    operational faults then this is simply because
    those modules constitute most of the code size.
  • No: very strong evidence in favour of a converse
    hypothesis
  • most operational faults are caused by faults in a
    small proportion of the code
  • For release n, 100% of operational faults were
    contained in modules that make up just 12% of the
    entire system size. For release n+1, 80% of
    operational faults were contained in modules that
    make up 10% of the entire system size.

139
Higher incidence of faults in function testing
implies higher incidence of faults in system
testing?
[Chart: cumulative % of faults found in system testing (ST) and function testing (FT) against % of modules (15 to 90), comparing the two distributions.]
140
Hypothesis 4: Higher incidence of faults
pre-release implies higher incidence of faults
post-release?
  • At the module level
  • This hypothesis underlies the wide acceptance of
    the fault-density measure

141
Pre-release vs post-release faults
Modules that are fault-prone pre-release are NOT
fault-prone post-release - this demolishes most
defect prediction models
142
Size metrics: good predictors of fault and failure
prone modules?
  • Hypothesis 5a: Smaller modules are less likely to
    be failure prone than larger ones
  • Hypothesis 5b: Size metrics are good predictors
    of the number of pre-release faults in a module
  • Hypothesis 5c: Size metrics are good predictors
    of the number of post-release faults in a module
  • Hypothesis 5d: Size metrics are good predictors
    of a module's (pre-release) fault-density
  • Hypothesis 5e: Size metrics are good predictors
    of a module's (post-release) fault-density

143
Plotting faults against size
[Scatterplot: faults against lines of code - correlation, but poor prediction.]
144
Cyclomatic complexity against pre- and
post-release faults
Cyclomatic complexity no better at prediction
than KLOC (for either pre- or post-release)
145
Defect density vs size
[Scatterplot: defects per KLOC (0-35) against module size (0-10,000). Size is no indicator of defect density - this demolishes many software engineering assumptions.]
146
Complexity metrics vs simple size metrics
  • Are complexity metrics better predictors of fault
    and failure-prone modules than simple size
    metrics? Not really, but they are available
    earlier
  • Results of hypothesis 4 are devastating for
    metrics validation
  • A valid metric is implicitly a very bad
    predictor of what it is supposed to be predicting
  • However
  • complexity metrics can help to identify modules
    likely to be fault-prone pre-release at a very
    early stage (metrics like SigFF are available
    long before LOC)
  • complexity metrics may be good indicators of
    maintainability

147
Benchmarking hypotheses
  • Do software systems produced in similar
    environments have broadly similar fault densities
    at similar testing and operational phases?

148
Case study conclusions
  • Pareto principle confirmed, but normal
    explanations are wrong
  • Complexity metrics not significantly better
    than simple size measures
  • Modules which are especially fault-prone
    pre-release are not especially fault-prone
    post-release this result is very damaging to
    much software metrics work
  • Clearly no causal link between size and defect
    density
  • Crucial explanatory variables missing: testing
    effort and operational usage - incorporated in
    BBNs

149
Evaluating Software Engineering Technologies
through Measurement
150
The Uncertainty of Reliability Achievement methods
  • Software engineering is dominated by
    revolutionary methods that are supposed to solve
    the software crisis
  • Most methods focus on fault avoidance
  • Proponents of methods claim theirs is best
  • Adopting a new method can require a massive
    overhead with uncertain benefits
  • Potential users have to rely on what the experts
    say

151
Actual Promotional Claims for Formal Methods
What are we to make of such claims?
152
The Virtues of Cleanroom
  • "... industrial programming teams can produce
    software with unprecedented quality. Instead of
    coding in 50 errors per thousand lines of code
    and removing 90% by debugging to leave 5 errors
    per thousand lines, programmers using functional
    verification can produce code that has never been
    executed with less than 5 errors per thousand
    lines and remove nearly all of them in
    statistical testing."
  • Mills H, Dyer M, Linger R, Cleanroom software
    engineering, IEEE Software, Sept 1987, 19-25

153
The Virtues of Verification (in Cleanroom)
  • If a program looks hard to verify, it is the
    program that should be revised not the
    verification. The result is high productivity in
    producing software that requires little or no
    debugging.
  • Mills H, Dyer M, Linger R, Cleanroom software
    engineering, IEEE Software, Sept 1987, 19-25

154
Use of Measurement in Evaluating Methods
  • Measurement is the only truly convincing means of
    establishing the efficacy of a method/tool/techniq
    ue
  • Quantitative claims must be supported by
    empirical evidence

We cannot rely on anecdotal evidence. There is
simply too much at stake.
155
Weinberg-Schulman Experiment

Each team was told to optimise a different objective; the table
shows each team's resulting rank (1 = best) on every criterion.

Team objective        Completion  Program  Data space  Program  User-friendly
                      time        size     used        clarity  output
Completion time       1           4        4           5        3
Program size          2-3         1        2           3        5
Data space used       5           2        1           4        4
Program clarity       1-2         4        3           3        2
User-friendly output  1-2         2-3      5           5        1

Ref: Weinberg GM and Schulman EL, Goals and
performance in computer programming,
Human Factors 16(1), 1974, 70-77
156
Empirical Evidence About Software Engineering
Methods
  • Limited support for n-version programming
  • Little public evidence to support claims made for
    formal methods or OOD
  • Conflicting evidence on CASE
  • No conclusive evidence even to support structured
    programming
  • Inspection techniques are cost-effective (but
    ill-defined)

We know almost nothing about which (if
any) software engineering methods really work
157
The Case of Flowcharts vs Pseudocode (1)
  • ... flowcharts are merely a redundant
    presentation of the information contained in the
    programming statements
  • Schneiderman et al, Experimental investigations
    of the usability of detailed flowcharts in
    programming, Comm ACM, June 1977, 861-881
  • led to flowcharts being shunned as a means of
    program or algorithm documentation
  • ... flowcharts should be avoided as a form of
    program documentation
  • J Martin and C McClure, Diagramming Techniques
    for Analysts and Programmers, Prentice-Hall, 1985

158
The Case of Flowcharts vs Pseudocode (2)
  • ... these experiments were flawed in method
    and/or used unstructured flowcharts
  • ... significantly less time is required to
    comprehend algorithms presented as flowcharts
  • DA Scanlan, Structured flowcharts outperform
    pseudocode an experimental comparison, IEEE
    Software, Sept 1989, 28-36

159
The Evidence for Structured Programming
  • The precepts of structured programming are
    compelling, yet the empirical evidence is
    equivocal
  • I Vessey and R Webber, Research on structured
    programming an empiricists evaluation, IEEE
    Trans Software Eng, 10, July 1984, 397-407

It is hard to know which claims we can believe
160
The Virtues of Structured Programming
  • "When a program was claimed to be 90% done with
    solid top-down structured programming, it would
    take only 10% more effort to complete it (instead
    of another 90%)."
  • Mills H, Structured programming: retrospect and
    prospect, IEEE Software, 3(6), Nov 1986, 55-66

161
Management Before Technology
  • Results of SQE's extensive survey were summarised
    as
  • Best projects do not necessarily have state of
    the art methodologies or extensive automation and
    tooling. They do rely on basic principles such as
    strong team work, project communication, and
    project controls. Good organization appears to be
    far more of a critical success factor than
    technology or methodology.
  • Hetzel B, Making Software Measurement Work,
    QED, 1993

162
Formal Methods: Rewarding Quantified Success
  • The Queen's Award for Technological Achievement
    1990 to INMOS and Oxford University PRG
  • "Her Majesty the Queen has been graciously
    pleased to approve the Prime Minister's
    recommendation that the award should be conferred
    this year ... for the development of formal
    methods in the specification and design of
    microprocessors ... The use of formal methods
    has enabled development time to be reduced by 12
    months"
  • The 1991 award went to PRG and IBM Hursley for
    the use of formal methods (Z) on CICS.

163
IBM/PRG Project: Use of Z in CICS
  • Many measurements of the process of developing
    CICS/ESA V3.1 were conducted by IBM
  • Costs of development reduced by almost 5.5M
    (8%)
  • Significant decreases in product failure rate
    claimed
  • "The moral of this tale is that formal methods
    can not only improve quality, but also the
    timeliness and cost of producing state-of-the-art
    products"
  • Jones G, Queen's Award for Technology, e-mail
    broadcast, Oxford University PRG, 1992

But the quantitative evidence is not in the
public domain
164
CICS study: problems found during development
cycle
[Chart: problems per KLOC at each development phase (Pld, Cld, Mld, Ut, Fv, St, Ca), comparing modules where Z was used with non-Z modules.]
165
Comprehensibility of Formal Specifications
  • "After a week's training in formal specification,
    engineers can use it in their work"
  • ConForm project summary, European Focus, Issue
    8, 1997
  • "Use of a formal method is no longer an
    adventure; it is becoming routine"
  • FM99 World Congress of Formal Methods,
    Publicity Material 1998

166
Difficulty of understanding Z
[Chart: number of students against number of correct responses on a Z comprehension test.]
167
Experiment to assess effect of structuring Z on
comprehension
  • 65 students (who had completed an extensive Z
    course). Blocking applied to groups
  • Specification A: monolithic; 121 lines, mostly in
    one Z schema
  • Specification B: 6 main schemas, each approx 20
    lines. Total spec 159 lines
  • Specification C: 18 small schemas. Total spec 165

168
Comparisons of scores for the different
specifications
[Scatterplot: score out of 60 against student id (0-25) for specification A (monolithic), B (6 schemas), and C (small schemas).]
169
Formal Methods for Safety Critical Systems
  • Wide consensus that formal methods must be used
  • Formal methods mandatory in Def Stan 00-55
  • These mathematical approaches provide us with
    the best available approach to the development of
    high-integrity systems.
  • McDermid JA, Safety critical systems a
    vignette, IEE Software Eng J, 8(1), 2-3, 1993

170
SMARTIE Formal Methods Study CDIS Air Traffic
Control System
  • Best quantitative evidence yet to support FM
  • Mixture of formally (VDM, CCS) and informally
    developed modules.
  • The techniques used resulted in extraordinarily
    high levels of reliability (0.81 failures per
    KLOC).
  • Little difference in total number of pre-delivery
    faults for formal and informal methods (though
    unit testing revealed fewer errors in modules
    developed using formal techniques), but clear
    difference in the post-delivery failures.

171
CDIS fault report form
172
Relative sizes and changes reported for each
design type in delivered code

Design    Total Lines   Number of Fault-   Changes  Number of  Number of  Percent of
Type      of Delivered  Report-generated   per      Delivered  Modules    Delivered
          Code          Code Changes       KLOC     Modules    Changed    Modules Changed
FSM       19064         260                13.6     67         52         78%
VDM       61061         1539               25.2     352        284        81%
VDM/CCS   22201         202                9.1      82         57         70%
Formal    102326        2001               19.6     501        393        78%
Informal  78278         1644               21.0     469        335        71%
173
Code changes by design type for modules requiring
many changes

Design    Total Number  Number of Modules   Percent of Modules  Number of Modules    Percent of Modules
Type      of Modules    with Over 5         with Over 5         with Over 10         with Over 10
          Changed       Changes Per Module  Changes Per Module  Changes Per Module   Changes Per Module
FSM       58            11                  16%                 8                    12%
VDM       284           89                  25%                 35                   19%
VDM/CCS   58            11                  13%                 3                    4%
Formal    400           111                 2 (remainder of this row, and the Informal row, truncated in the source)