Title: Applying Statistical Process Control SPC in Software Presented by Dr. Saswati Bhattacharyya
1Applying Statistical Process Control (SPC) in Software
Presented by Dr. Saswati Bhattacharyya
2Agenda
- Quantitative Techniques
- Statistical Techniques
10/15/2009
3Introduction
- The term Statistical Process Control (SPC) is
typically used in the context of manufacturing
processes (although it may also pertain to
services and other activities), to denote
statistical methods used to monitor and improve
the quality of the respective operations.
- By gathering information about the various stages
of the process and performing statistical
analysis on that information, the SPC engineer is
able to take necessary action (often preventive)
to ensure that the overall process stays
in control and to allow the product to meet all
desired specifications.
- SPC involves monitoring processes, identifying
problem areas, recommending methods to reduce
variation and verifying that they work,
optimizing the process, assessing the reliability
of parts, and other analytic operations.
4Introduction
- SPC uses basic statistical quality control
methods such as quality control charts (Shewhart,
Pareto, and others), capability analysis, gage
repeatability/reproducibility analysis, and
reliability analysis.
- Specialized experimental methods (DOE) and other
advanced statistical techniques are often part of
global SPC systems.
- Important components of effective, modern SPC
systems are real-time access to data and
facilities to document and respond to incoming QC
data on-line, efficient central QC data
warehousing, and groupware facilities allowing QC
engineers to share data and reports.
5Why Measure
- If you don't know where you are, a map won't
help.
- If you don't know where you are going, any road
will do.
6Use of Measures
- Measures, by definition, are raw data
- Unless this data is sorted in some way, or used
to derive information, decision making will not
be possible
- Metrics are hence derived out of measures by
- using contextual information
- applying a formula
- performing some calculations or computations
- Some say measures and metrics have no
difference
- Measures can be either base or derived; we
prefer the former terminology due to its industry
usage
7Type of Measures
- Historically, industry has used two types
- Process metrics/measures
- Indicate how an activity was performed or
product delivered/developed or service
rendered
- E.g. time taken to perform a task, effort,
schedule, number of attempts, number of
inspections, number of reviews etc.
- Product metrics/measures
- Used to indicate an attribute of what was
delivered or performed, but not how it was
done.
- E.g. defect density in modules, post-release
defects, number of customer complaints
- Product measures will most likely indicate the
way the product was built (the process).
- Process measures are a great way to predict
quality of product.
8Examples
- Identify whether these are process or product
metrics/measures
- Schedule Variance
- Effort Variance
- Review Efficiency: number of defects detected
per hour
- Review Effectiveness: number of defects detected
in all reviews or per review
- Mean Time to Enhance
- Mean Time to Fix Bug
- Reliability: number of times a product fails on
site
- System Availability: percentage up time
- Time spent in project management activities
- Number of change requests open in last 3 months
- Number of defects per LOC
- Number of defects detected by Joe Smith in an
hour
- Sales in this quarter
- Number of New Customer accounts this year
- Net Cash flow in this company
9Uses and Abuses of Measures
- There are three kinds of lies: lies, damned lies,
and statistics
- Figures don't lie; liars figure
- Many abusers are either ignorant or careless
- Some others aim to mislead the
reader by emphasizing the data that support their
position
10An example of Misuse
- Average: one measure of central tendency
- Average defects across the organization's
projects is 18
- What if there were 7 projects that had 11, 13,
14, 12, 13, 52 and 11 defects?
- Would 18 then seem like a typical number of
defects across the organization?
- There is no objective set of criteria on which
kind of average should be reported every time
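The pull of the single outlier (52) on the average can be checked directly. The following is a minimal sketch using only the Python standard library; comparing the mean with the median is an illustration added here and is not part of the original slide.

```python
from statistics import mean, median

# Defect counts for the seven projects listed on the slide
defects = [11, 13, 14, 12, 13, 52, 11]

avg = mean(defects)    # pulled upward by the one project with 52 defects
mid = median(defects)  # middle value of the sorted counts

print(f"mean = {avg}, median = {mid}")
```

Six of the seven projects have between 11 and 14 defects, so the median (13) describes a typical project better here than the mean (18).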
11Yet Another
- "2 out of 3 software developers recommend Oracle
as the best ERP solution"
- A surveyor can mislead by reporting only the
survey in which 2 of 3 respondents mentioned this ERP
- More surveys are needed
- Strong association between variables
- Number of hours that one studies and the marks
scored: do more hours mean a higher score?
- The two variables are related
12Summary
- Sometimes numbers can be deceptive
- Statistics is meant to be precise
- Understanding these will make you a better
consumer of Statistical information and help you
defend yourself against those who mislead!
13Putting it all together
14 - Quantitative Analysis Techniques
15Quantitative Techniques
- Quantitative Techniques of Analysis
- Typically involve grouping data or combining
data by some key
- E.g., sorting in an ascending or descending order
- Depicting it in simple chart or graph
- Interpreting or understanding the data and the
context
- May not be used for any decision making
- Cannot be used for prediction purposes
16Data Grouping and Charting Techniques
- Frequency Distribution
- A frequency distribution is a grouping of data
into mutually exclusive categories showing the
number of observations in each class
- Example: Actual Effort (in man-months) of
projects completed till date
- 26, 35, 66, 15, 82, 74, 28, 35, 33, 78, 91, 51,
56, 37, 68, 48, 52, 72, 65, 57, 42, 59, 45, 62,
73, 54, 58, 63 (to be organized into a
frequency distribution)
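One way to organize the effort data into a frequency distribution is sketched below. The class width of 10 man-months is an assumption made for illustration (the slide does not state one), and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Actual effort (man-months) of completed projects, from the slide
efforts = [26, 35, 66, 15, 82, 74, 28, 35, 33, 78, 91, 51, 56, 37,
           68, 48, 52, 72, 65, 57, 42, 59, 45, 62, 73, 54, 58, 63]

CLASS_WIDTH = 10  # assumed; the slide does not give a class width


def frequency_distribution(values, width):
    """Count observations per mutually exclusive class [lower, lower + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


freq = frequency_distribution(efforts, CLASS_WIDTH)
for lower, count in freq.items():
    # A text histogram: one '#' per observation in the class
    print(f"{lower:>3}-{lower + CLASS_WIDTH - 1:<3} {'#' * count} ({count})")
```

Each value falls into exactly one class, so the class counts add up to the 28 observations.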
17Graphic Representations
- A histogram is a graph in which the classes are
marked on the horizontal axis and the class
frequencies on the vertical axis
- The class frequencies are represented by the
heights of the bars and the bars are drawn
adjacent to each other
18Graphic Representations
- A frequency polygon consists of line segments
connecting the points formed by the class
midpoint and the class frequency
19Graphic Representations
- A cumulative frequency distribution (Ogive) is
used to determine how many or what proportion of
the data values are below or above a certain value
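The question an ogive answers ("how many, or what proportion, of values lie at or below some value?") can be sketched numerically. This example reuses the effort data from the frequency distribution slide; the threshold of 50 man-months and the function names are illustrative assumptions.

```python
from bisect import bisect_right

# Effort data (man-months) from the frequency distribution slide, sorted
efforts = sorted([26, 35, 66, 15, 82, 74, 28, 35, 33, 78, 91, 51, 56, 37,
                  68, 48, 52, 72, 65, 57, 42, 59, 45, 62, 73, 54, 58, 63])


def cumulative_count(sorted_values, threshold):
    """Number of observations less than or equal to threshold."""
    return bisect_right(sorted_values, threshold)


def cumulative_proportion(sorted_values, threshold):
    """Fraction of observations less than or equal to threshold."""
    return cumulative_count(sorted_values, threshold) / len(sorted_values)


count_50 = cumulative_count(efforts, 50)
share_50 = cumulative_proportion(efforts, 50)
print(f"{count_50} of {len(efforts)} projects ({share_50:.0%}) took 50 man-months or less")
```

Plotting the cumulative count against a range of thresholds gives the ogive itself.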
20Bar Chart
- A bar chart can be used to depict any of the
levels of measurement (nominal, ordinal, interval
or ratio)
21Pie Charts
- A pie chart is useful for displaying a relative
frequency distribution. A circle is divided
proportionally to the relative frequency and
portions of the circle are allocated for the
different groups
- Example: A sample of 100 children was asked to
indicate their favorite color
23Statistical Analysis Techniques
- SPC for Software: Premises
- Software process is performed by people, not
machines
- The software process is (or can be) repeatable,
but not repetitive
- The act of measuring and analyzing will change
behavior, potentially in dysfunctional ways
- Another way to look at it
- Enumerative studies: the aim is to determine "how
many" as opposed to "why so many"
- Analytic studies: the aim is to predict or improve
behavior in the future
24Enumerative Studies in Software
- Inspections of code modules to detect and count
existing defects
- Functional or system testing to ascertain the
extent to which a product has certain qualities
- Measurement of software size to determine project
status or the amount of software under
configuration control
- Measurement of staff hours so that the results
can be used to bill customers or track
expenditures against budget
25Analytic Studies
- Evaluating software tools, technologies or
methods
- Tracking defect discovery rates to predict
product release dates
- Evaluating defect discovery profiles to identify
focal areas for process improvement
- Predicting schedules, costs or operational reliability
- Using control charts to stabilize and improve
software processes or to assess process capability
26Statistically Controlled
- "A phenomenon will be said to be controlled when,
through the use of past experience, we can
predict, at least within limits, how the
phenomenon may be expected to vary in the future"
- Walter A. Shewhart, 1931
27Control Charts
- Statistical Quality Control emphasizes in-process
control with the objective of controlling the
quality of a manufacturing process or service
operation using sampling techniques.
- Statistical sampling techniques are used to aid
in the manufacturing of a product to
specifications rather than attempt to inspect
quality into the product after it is manufactured
- Control charts are useful for monitoring a process
28Causes of Variation
- There is variation in all parts produced by a
process. There are two sources of variation
- Chance Variation (Common Cause)
- Cannot be completely eliminated unless there is a
major change in the equipment or material used in
the process
- Random in nature
- Assignable Variation (Special Cause)
- Can be eliminated or reduced by investigating the
problem and finding the cause
- Non-random in nature
29Example 1
- Bus travel time: 25 minutes from Point A to
Point B
- Each run may not take exactly 25 min
- Some may take longer and some less
- Snowstorm or an accident
- Chance Variation (Common Cause)
- Driver may not hit the green lights at the right
time due to his driving inefficiency
- Assignable Variation (Special Cause)
30Example 2
- Chance Variation (Common Causes)
- Internal machine friction
- Temperature influences
- Humidity influences
- Vibrations transmitted from a passing forklift
- Network clogging due to a snowstorm
- System freeze due to a power failure
- Assignable Variation (Special Cause)
- Operator who continually sets up machine
incorrectly
- Hole drilled into steel with a dull drill
- Processor speed low while requirement is for a
higher one
- New programmer or inexperienced manager
- Tired programmer due to overwork
- Attrition midway on the project
- Network or link failure on a particular day
- No knowledge of domain or lack of functional
knowledge
31Why are we bothered about variation
- It will change the shape, dispersion and central
tendency of the characteristic being measured
- Assignable variation can be corrected, and the
process stabilized, economically
32Purpose of Quality Control Charts
- The purpose of quality-control charts is to
portray graphically when an assignable cause
enters the production system so that it can be
identified and corrected
- This is accomplished by periodically selecting a
random sample from the current production
33Why Control Charts
- Control charts let you know what processes can
do, so that you can set achievable goals
- They represent the "voice of the process"
- Identify unusual events and point at fixable
problems and potential process improvements
34Variable and Attributes Data
- Variable Data
- Represents measurements of a continuous
phenomenon
- E.g. elapsed time, effort expended, years of
experience, memory utilization, cost of rework
- Volume, weight, height, length, efficiency,
viscosity
- Attributes Data
- Represents data that either conform or do not
conform to a discrete set of values
- E.g. number of defects found, number of
defectives per unit, percentage of projects using a formal
code inspection method
35Simple Control Charts for software
- X-Bar Chart
- nP and p-Chart
- C and U-Chart
- I-X Chart (Individual X-Chart)
- XmR Chart (Moving Range Chart)
36Types of Quality Control Charts - Variables
- The mean or x-bar chart is designed to
control variables such as weight, length etc. The
upper control limit (UCL) and the lower control
limit (LCL) are obtained from the equations
- $UCL = \bar{\bar{X}} + A_2 \bar{R}$ and $LCL = \bar{\bar{X}} - A_2 \bar{R}$
- Where $\bar{\bar{X}}$ is the mean of the sample means and $\bar{R}$ is
the mean of the sample ranges.
37Finding special causes by control charts
- Consider Peer Review (PR) preparation rate for 5
samples (each sample is covering 4 PRs from 4
divisions)
38Mean and Ranges
- The table below shows the mean and ranges
39Compute UCL and LCL
- Compute the Grand Mean (X-double bar) and the
average range.
- Grand Mean: $\bar{\bar{X}} = (25.25 + 26.75 + \cdots + 25.25)/5 = 26.35$
- Mean Range: $\bar{R} = (5 + 6 + \cdots + 3)/5 = 5.8$
- Determine the UCL and LCL for the average
preparation time
- $UCL = 26.35 + 0.729 \times 5.8 = 30.58$
- $LCL = 26.35 - 0.729 \times 5.8 = 22.12$
- 0.729 is the constant $A_2$ for a sample size of 4, found from a statistical
table.
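The limit calculation can be reproduced from the summary statistics alone. The sketch below assumes the standard $A_2$ factors for subgroup sizes 2 to 5; `xbar_limits` is a hypothetical helper name, and the per-sample table is not needed because the grand mean and mean range are taken from the slide.

```python
# A2 factors for subgroup sizes 2-5, from standard control chart constant tables
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}


def xbar_limits(grand_mean, mean_range, subgroup_size):
    """Return (LCL, UCL) for an X-bar chart: grand mean -/+ A2 * mean range."""
    margin = A2[subgroup_size] * mean_range
    return grand_mean - margin, grand_mean + margin


# Values from the slide: grand mean 26.35, mean range 5.8, 4 PRs per sample
lcl, ucl = xbar_limits(grand_mean=26.35, mean_range=5.8, subgroup_size=4)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```

Any sample mean outside the interval from LCL to UCL would be a candidate for a special cause.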
40Control Charts for Means and Analysis
- Is the process under control? Any special causes
of variation?
- Control charts are typically indicative after 25
or 30 samples
41Detecting instabilities and Out of Control
- Examine control charts for instances of behavior
and patterns that show nonrandom behavior
- Values falling outside the control limits and
unusual patterns within the running record
suggest that assignable causes exist
42Tests of Out of Control and Instabilities
- Test 1: A single point falls outside the 3-sigma
control limits
- Test 2: At least two of three successive values
fall on the same side of, and more than two
sigma units away from, the center line
- Test 3: At least four out of five successive
values fall on the same side of, and more than
one sigma unit away from, the center line
- Test 4: At least eight successive values fall on
the same side of the center line.
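The four tests can be applied mechanically to a series of plotted values. The following is a minimal sketch, not a reference implementation: `detect_signals` is a hypothetical function, the data are made up, and sigma is assumed to be known (for an X-bar chart it can be taken as one third of the distance from the center line to the UCL).

```python
def detect_signals(values, center, sigma):
    """Return {test number: [indices at which that test fires]}."""
    z = [(v - center) / sigma for v in values]  # distance from center in sigma units
    # (test number, window length, points required, distance in sigma units)
    run_rules = [(2, 3, 2, 2.0), (3, 5, 4, 1.0), (4, 8, 8, 0.0)]
    signals = {1: [], 2: [], 3: [], 4: []}
    for i in range(len(z)):
        # Test 1: a single point beyond the 3-sigma limits
        if abs(z[i]) > 3:
            signals[1].append(i)
        # Tests 2-4: enough points in the latest window on one side, far enough out
        for test, window, needed, distance in run_rules:
            if i + 1 < window:
                continue  # not enough history yet for this window
            recent = z[i + 1 - window : i + 1]
            above = sum(1 for s in recent if s > distance)
            below = sum(1 for s in recent if s < -distance)
            if above >= needed or below >= needed:
                signals[test].append(i)
    return signals


# Hypothetical, already-centered data (center 0, sigma 1)
data = [0.4, -0.2, 3.4, 0.1, 0.6, 0.3, 0.8, 0.2, 0.5, 0.9, 0.7]
signals = detect_signals(data, center=0.0, sigma=1.0)
print(signals)
```

In this made-up series, the third value trips Test 1, and the long run above the center line trips Test 4 once eight successive values have accumulated.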
43Typical Causal Analysis
- Test 1
- Unusual Single event
- Accident, snowstorm, merger, attrition
- Test 2
- Chance variation
- New operator or untrained operator
- Test 3
- New machine or new process/procedure
- Test 4
- Wear and tear of the machine
- These are only typical; there could be atypical
causes as well!
44Statistics in Action An Example
- Conviction of a person who bribed some players to
lose in betting
- X-bar and R-bar charts showed unusual betting
patterns and some contestants did not win as
expected
- A QC expert identified times when assignable
causes stopped and prosecutors were able to tie
this to the times of the arrest of the suspect
- Canadian firm on the Japanese order in 1980s
- Three defectives shipped separately
45Types of Control Charts nP and p
- Used with discrete binomial data (number of
failures)
- Likelihood of an item's failure is unaffected by
failure of the previous item in the sample
- nP charts
- $x_i$ = number of failures in a sample
- fixed sample size $n$
- average fraction non-conforming $p$
- Mean: $np$
- Constant control limits: $np \pm 3\sqrt{np(1-p)}$
- P charts
- $p_i$ = proportion of failures in a sample
- variable sample size $n_i$
- Mean: $p$
- Variable control limits: $p \pm 3\sqrt{p(1-p)/n_i}$
- Control limits tighten up for larger sample sizes
and relax for smaller sample sizes
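The nP and p chart limits can be sketched as follows. The function names and sample data are hypothetical, and clamping the lower limit at 0 (and the p chart upper limit at 1) is a common practice assumed here rather than something stated on the slide.

```python
from math import sqrt


def np_chart_limits(failures, n):
    """nP chart: failures per sample with fixed sample size n.

    Returns (LCL, center, UCL) using np +/- 3 * sqrt(np(1 - p)).
    """
    p = sum(failures) / (n * len(failures))  # average fraction non-conforming
    center = n * p
    spread = 3 * sqrt(n * p * (1 - p))
    return max(0.0, center - spread), center, center + spread


def p_chart_limits(failures, sizes):
    """p chart: variable sample sizes. Returns (p, [(LCL_i, UCL_i), ...])."""
    p = sum(failures) / sum(sizes)
    limits = []
    for n_i in sizes:
        spread = 3 * sqrt(p * (1 - p) / n_i)  # narrower for larger n_i
        limits.append((max(0.0, p - spread), min(1.0, p + spread)))
    return p, limits


# Hypothetical data: failed test cases per build
np_lcl, np_center, np_ucl = np_chart_limits([4, 6, 5, 5], n=50)
p_bar, p_limits = p_chart_limits([2, 5, 3], sizes=[50, 100, 50])
print(np_lcl, np_center, np_ucl)
print(p_bar, p_limits)
```

The p chart output shows the behavior described above: the sample of 100 gets tighter limits than the samples of 50.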
46Types of Control Charts c and u
- Used with discrete Poisson data (count of
defects/sample)
- independent events (defects)
- probability proportional to area of opportunity
(sample size)
- events are rare (< 10 possible defects)
- C charts
- $c_i$ = event count
- constant area of opportunity
- average number of events per sample $c_{avg}$
- Mean: $c_{avg}$
- Constant control limits: $c_{avg} \pm 3\sqrt{c_{avg}}$
- U charts
- $u_i$ = event count per unit area of opportunity
(defects/unit size)
- variable area of opportunity $a_i$
- Mean: $u_{avg}$
- Variable control limits: $u_{avg} \pm 3\sqrt{u_{avg}/a_i}$
- Control limits tighten up for larger sample sizes
and relax for smaller sample sizes
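A short sketch of the c and u chart limits follows. The u chart uses the standard form in which the limits depend on each sample's area of opportunity, $u_{avg} \pm 3\sqrt{u_{avg}/a_i}$. Function names and data are hypothetical, and the lower limit is clamped at 0 as an assumption.

```python
from math import sqrt


def c_chart_limits(counts):
    """c chart: defect counts over a constant area of opportunity.

    Returns (LCL, center, UCL) using c_avg +/- 3 * sqrt(c_avg).
    """
    c_avg = sum(counts) / len(counts)
    spread = 3 * sqrt(c_avg)
    return max(0.0, c_avg - spread), c_avg, c_avg + spread


def u_chart_limits(counts, areas):
    """u chart: defect counts over varying areas of opportunity a_i.

    Returns (u_avg, [(LCL_i, UCL_i), ...]) using u_avg +/- 3 * sqrt(u_avg / a_i).
    """
    u_avg = sum(counts) / sum(areas)
    limits = []
    for a_i in areas:
        spread = 3 * sqrt(u_avg / a_i)  # narrower for larger a_i
        limits.append((max(0.0, u_avg - spread), u_avg + spread))
    return u_avg, limits


# Hypothetical data: defects per inspection, then defects with module size in KLOC
c_lcl, c_center, c_ucl = c_chart_limits([4, 4, 4, 4])
u_bar, u_limits = u_chart_limits([6, 12, 9], areas=[2.0, 4.0, 3.0])
print(c_lcl, c_center, c_ucl)
print(u_bar, u_limits)
```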
47Individual X Chart
- Sample of Individual Reviews
- Can show the distribution of variation in cases
of non homogeneous data
48Individual X Chart
49XmR Charts
- Used with continuous data (measurements)
- no assumptions about underlying distribution
- Appropriate for items that are not produced in
batches or when it is desirable to use all
available data
- Two charts: X and mR (moving range of X)
- $mR_{avg}$ is used to estimate $\sigma$ for X as well as mR
- $mR_i = |X_i - X_{i-1}|$
- X chart mean: $X_{avg}$
- X chart control limits: $X_{avg} \pm 2.660\, mR_{avg}$
- mR chart mean: $mR_{avg}$
- mR chart upper control limit: $3.268\, mR_{avg}$
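The XmR calculations are simple enough to sketch in a few lines. The constants 2.660 and 3.268 are the ones given on the slide; `xmr_limits` is a hypothetical helper and the review-hours data are made up for illustration.

```python
def xmr_limits(xs):
    """Compute X chart and mR chart center lines and limits for individual values."""
    # Moving range: absolute difference between successive observations
    moving_ranges = [abs(curr - prev) for prev, curr in zip(xs, xs[1:])]
    x_avg = sum(xs) / len(xs)
    mr_avg = sum(moving_ranges) / len(moving_ranges)
    return {
        "x_center": x_avg,
        "x_lcl": x_avg - 2.660 * mr_avg,
        "x_ucl": x_avg + 2.660 * mr_avg,
        "mr_center": mr_avg,
        "mr_ucl": 3.268 * mr_avg,
    }


# Hypothetical data: hours spent on five successive reviews
review_hours = [10, 12, 11, 13, 12]
limits = xmr_limits(review_hours)
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```

Because the moving range needs two successive points, n observations yield n - 1 moving ranges.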
50Other Statistical Techniques
- Control Charts are not enough
- Confidence intervals
- Prediction intervals
- Test of hypotheses
- ANOVA
- Probability techniques
- Regression
- Correlation Analysis
51Recent Studies
- Although the software industry has made significant
progress in implementing metrics programs, a large
number of them fail
- Howard Rubin, Rubin Systems, Inc.
- A 1997 study indicated that four out of five
metrics programs fail
- Success is defined as
- A measurement program that lasts for more than two
years and that impacts the business decisions
made by the organization
52Primary Reasons for Failure
- Not tied to business goals
- Irrelevant or not understood by key players
- Perceived to be unfair or resisted
- Motivated wrong behavior
- Expensive, cumbersome
- No action based on the numbers
- No sustained management sponsorship
53Success in Metrics Programs
- Successful metrics programs are more than collecting data
- Benefit and value come from decisions taken from
data
- Sometimes choosing the right metrics becomes
overwhelming due to the many opportunities that exist,
especially in large organizations
- Goal-driven measurement is a must!
- Ref: Guidebook CMU/SEI-97-HB-003 by
William A. Florac, Robert E. Park, Anita D.
Carleton