Prioritizing Test Cases for Regression Testing

1
Prioritizing Test Cases for Regression Testing
  • Article By Rothermel, et al.
  • Presentation by
  • Martin, Otto, and Prashanth

2
  • Test case prioritization techniques schedule
    test cases for execution in an order that
    attempts to increase their effectiveness at
    meeting some performance goal.
  • One such goal is the rate of fault detection: a
    measure of how quickly faults are detected within
    the testing process.
  • An improved rate of fault detection during
    testing can provide faster feedback on the system
    under test and let software engineers begin
    correcting faults earlier than might otherwise be
    possible.
  • One application of prioritization techniques
    involves regression testing

3
  • This paper describes several techniques for using
    test execution information to prioritize test
    cases for regression testing, including
  • 1) techniques that order test cases based on
    their total coverage of code components,
  • 2) techniques that order test cases based on
    their coverage of code components not previously
    covered, and
  • 3) techniques that order test cases based on
    their estimated ability to reveal faults in the
    code components that they cover.

4
  • When the time required to re-execute an entire
    test suite is short, test case prioritization may
    not be cost-effective; it may be sufficient simply
    to schedule test cases in any order.
  • When the time required to execute an entire test
    suite is sufficiently long, however, test-case
    prioritization may be beneficial because, in this
    case, meeting testing goals earlier can yield
    meaningful benefits.
  • In general test case prioritization, given
    program P and test suite T, we prioritize the
    test cases in T with the intent of finding an
    ordering of test cases that will be useful over a
    succession of subsequent modified versions of P.
  • In the case of regression testing, prioritization
    techniques can use information gathered in
    previous runs of existing test cases to help
    prioritize the test cases for subsequent runs.

5
  • This paper considers 9 different test case
    prioritization techniques.
  • The first three techniques serve as experimental
    controls
  • The last six techniques represent heuristics that
    could be implemented using software tools
  • A source of motivation for these approaches is
    the conjecture that the availability of test
    execution data can be an asset.
  • This assumes that past test execution data can be
    used to predict, with sufficient accuracy,
    subsequent execution behavior.

6
(No Transcript)
7
  • Definition 1. The Test Case Prioritization
    Problem
  • Given: T, a test suite; PT, the set of
    permutations of T; and f, a function from PT to
    the real numbers.
  • Problem: Find T' in PT such that, for all T'' in
    PT with T'' not equal to T', f(T') >= f(T'').
  • PT represents the set of all possible
    prioritizations (orderings) of T.
  • f is a function that, applied to any such
    ordering, yields an award value for that
    ordering.

8
  • A challenge: care must be taken to keep the cost
    of performing the prioritization from excessively
    delaying the very regression testing activities
    it is intended to facilitate.

9
  • M3: Optimal prioritization.
  • Given program P and a set of known faults for P,
    if we can determine, for test suite T, which test
    cases in T expose which faults in P, then we can
    determine an optimal ordering of the test cases
    in T for maximizing T's rate of fault detection
    for that set of faults.
  • This is not a practical technique, as it requires
    a priori knowledge of the existence of faults and
    of which test cases expose which faults.
  • However, by using this technique in the empirical
    studies, we can gain insight into the success of
    other practical heuristics, by comparing their
    solutions to optimal solutions.

10
  • M4: Total statement coverage prioritization.
  • By instrumenting a program, we can determine, for
    any test case, which statements in that program
    were exercised (covered) by that test case.
  • We can then prioritize test cases in terms of the
    total number of statements they cover by counting
    the number of statements covered by each test
    case and then sorting the test cases in
    descending order of that number.
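The count-and-sort step above amounts to a few lines of code. A minimal Python sketch, assuming coverage has already been collected by instrumentation into a mapping from test-case id to the set of statement ids it covers (all names here are illustrative, not from the paper):

```python
def total_coverage_prioritize(coverage):
    """Sort test-case ids in descending order of how many statements each covers.

    coverage: dict mapping test-case id -> set of covered statement ids.
    """
    return sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)

# t2 covers 5 statements, t1 covers 3, t3 covers 1.
coverage = {"t1": {1, 2, 3}, "t2": {1, 2, 3, 4, 5}, "t3": {4}}
print(total_coverage_prioritize(coverage))  # -> ['t2', 't1', 't3']
```

Note that this ordering ignores overlap between test cases: two tests covering the same five statements both rank ahead of a test covering four unique ones, which is exactly the weakness the "additional" variant (M5) addresses.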

11
  • M5: Additional statement coverage prioritization.
  • Total statement coverage prioritization schedules
    test cases in order of total coverage achieved;
    however, having executed a test case and covered
    certain statements, more may be gained in
    subsequent testing by executing statements that
    have not yet been covered.
  • Additional statement coverage prioritization
    iteratively selects the test case that yields the
    greatest additional statement coverage, adjusts
    the coverage information on all remaining test
    cases to indicate their coverage of statements
    not yet covered, and repeats this process until
    all statements covered by at least one test case
    have been covered.
  • We may reach a point where each such statement has
    been covered, and the remaining unprioritized
    test cases cannot add additional statement
    coverage. We could order these remaining test
    cases using any prioritization technique.
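The iterative selection described above is the classic greedy set-cover heuristic. A sketch with illustrative data structures; ties and the ordering of leftover test cases are resolved arbitrarily here, which the slide explicitly allows:

```python
def additional_coverage_prioritize(coverage):
    """Greedy 'additional' prioritization: repeatedly pick the test case
    covering the most not-yet-covered statements; once no test adds new
    coverage, append the leftover test cases in their original order.

    coverage: dict mapping test-case id -> set of covered statement ids.
    """
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            # No remaining test adds coverage; any ordering of the rest works.
            order.extend(remaining)
            break
        covered |= coverage[best]
        order.append(best)
        del remaining[best]
    return order

# t1 adds 4 new statements, then t3 adds 1; t2 adds nothing new and trails.
print(additional_coverage_prioritize({"t1": {1, 2, 3, 4}, "t2": {1, 2}, "t3": {5}}))
# -> ['t1', 't3', 't2']
```

Each pick costs one pass over the remaining test cases, so the loop is quadratic in the suite size; the paper's point that prioritization cost must not swamp its benefit applies directly here.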

12
(No Transcript)
13
  • M6: Total branch coverage prioritization.
  • Total branch coverage prioritization is the same
    as total statement coverage prioritization,
    except that it uses test coverage measured in
    terms of program branches rather than statements.
  • In this context, we define branch coverage as
    coverage of each possible overall outcome of a
    (possibly compound) condition in a predicate.
    Thus, for example, each if or while statement
    must be exercised such that it evaluates at least
    once to true and at least once to false.
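As a concrete illustration of the distinction (hypothetical code, not from the paper): a function can be fully statement-covered while one outcome of a predicate is never taken, so branch coverage subsumes statement coverage here.

```python
def clamp_negative(x):
    if x < 0:  # branch coverage requires both a true and a false outcome here
        x = 0
    return x

# A single test with x = -5 executes every statement (the if-body and the
# return), so statement coverage is 100% ...
assert clamp_negative(-5) == 0
# ... yet the false outcome of `x < 0` is never exercised by it; a second
# test, e.g. x = 3, is needed for full branch coverage.
assert clamp_negative(3) == 3
```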

14
  • M7: Additional branch coverage prioritization.
  • Additional branch coverage prioritization is the
    same as additional statement coverage
    prioritization, except that it uses test coverage
    measured in terms of program branches rather than
    statements.
  • After complete coverage has been achieved, the
    remaining test cases are prioritized by resetting
    coverage vectors to their initial values and
    reapplying additional branch coverage
    prioritization to the remaining test cases.

15
  • M8: Total fault-exposing-potential (FEP)
    prioritization.
  • Some faults are more easily exposed than other
    faults, and some test cases are more adept at
    revealing particular faults than other test
    cases.
  • The ability of a test case to expose a fault (that
    test case's fault-exposing potential, or FEP)
    depends not only on whether the test case
    covers (executes) a faulty statement, but also on
    the probability that a fault in that statement
    will cause a failure for that test case.
  • Three probabilities that could be used in
    determining FEP:
  • 1) the probability that a statement s is executed
    (execution probability),
  • 2) the probability that a change in s can cause a
    change in program state (infection probability),
    and
  • 3) the probability that a change in state
    propagates to output (propagation probability).

16
  • This paper adopts an approach that uses mutation
    analysis, to produce a combined estimate of
    propagation-and-infection that does not
    incorporate independent execution probabilities.
  • Mutation analysis creates a large number of
    faulty versions (mutants) of a program by
    altering program statements, and uses these to
    assess the quality of test suites by measuring
    whether those test suites can detect those faults
    (kill those mutants).
  • Given program P and test suite T, we first create
    a set of mutants N = {n1, n2, ..., nm} for P,
    noting which statement sj in P contains each
    mutant. Next, for each test case ti in T, we
    execute each mutant version nk of P on ti, noting
    whether ti kills that mutant.
  • Having collected this information for every test
    case and mutant, we consider each test case ti
    and each statement sj in P, and calculate the
    fault-exposing potential FEP(sj, ti) of ti on sj
    as the ratio of mutants of sj killed by ti to the
    total number of mutants of sj.

17
  • To perform total FEP prioritization, given these
    FEP(sj, ti) values, we next calculate, for each
    test case ti in T, an award value, by summing the
    FEP(sj, ti) values for all statements sj in P.
  • Given these award values, we then prioritize test
    cases by sorting them in order of descending
    award value.
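Assuming the mutant locations and kill information have already been collected (e.g. from a mutation tool such as Proteum), the FEP(sj, ti) ratios and the total FEP ordering can be sketched as follows; the data structures are illustrative, not from the paper:

```python
def fep(mutants_by_stmt, kills):
    """FEP(s, t): fraction of the mutants seeded in statement s that test t kills.

    mutants_by_stmt: dict mapping statement id -> list of mutant ids in it.
    kills:           dict mapping test-case id -> set of mutant ids it kills.
    Returns a dict keyed by (statement id, test-case id).
    """
    return {
        (s, t): sum(m in kills[t] for m in muts) / len(muts)
        for s, muts in mutants_by_stmt.items()
        for t in kills
    }

def total_fep_prioritize(mutants_by_stmt, kills):
    """Award value of test t = sum of FEP(s, t) over all statements s;
    schedule tests in descending order of award value."""
    table = fep(mutants_by_stmt, kills)
    award = {t: sum(table[(s, t)] for s in mutants_by_stmt) for t in kills}
    return sorted(award, key=award.get, reverse=True)

mutants = {"s1": ["m1", "m2"], "s2": ["m3"]}
kills = {"t1": {"m1", "m2", "m3"}, "t2": {"m1"}}
# Award values: t1 = 1.0 + 1.0 = 2.0, t2 = 0.5 + 0.0 = 0.5.
print(total_fep_prioritize(mutants, kills))  # -> ['t1', 't2']
```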

18
(No Transcript)
19
  • M9: Additional fault-exposing-potential (FEP)
    prioritization.
  • This lets us account for the fact that additional
    executions of a statement may be less valuable
    than initial executions.
  • We require a mechanism for measuring the value of
    an execution of a statement, that can be related
    to FEP values.
  • For this, we use the term confidence. We say that
    the confidence in statement s, C(s), is an
    estimate of the probability that s is correct.
  • If we execute a test case t that exercises s and
    does not reveal a fault in s, C(s) should
    increase.

20
(No Transcript)
21
  • Research Questions
  • Can test case prioritization improve the rate of
    fault detection in test suites?
  • How do the various test case prioritization
    techniques discussed earlier compare to one
    another in terms of effects on rate of fault
    detection?
  • Effectiveness Measures
  • Use a weighted Average of the Percentage of
    Faults Detected (APFD)
  • Ranges from 0 to 100
  • Higher numbers mean faster fault detection
  • Problems with APFD
  • Doesn't measure the cost of prioritization
  • Cost is normally amortized because test suites
    are created after the release of a version of the
    software
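The APFD measure can be computed directly from its formula, APFD = 1 - (TF1 + ... + TFm)/(n*m) + 1/(2n), where n is the number of test cases, m the number of faults, and TFi the 1-based position of the first test that exposes fault i. A sketch with illustrative inputs (expressed as a fraction in [0, 1]; multiply by 100 for a percentage):

```python
def apfd(ordering, detects):
    """APFD = 1 - (TF1 + ... + TFm) / (n * m) + 1 / (2 * n).

    ordering: list of test-case ids in prioritized execution order.
    detects:  dict mapping fault id -> set of test-case ids that expose it.
    """
    n, m = len(ordering), len(detects)
    position = {t: i + 1 for i, t in enumerate(ordering)}  # 1-based positions
    tf = [min(position[t] for t in tests) for tests in detects.values()]
    return 1 - sum(tf) / (n * m) + 1 / (2 * n)

# Fault f1 is first exposed by the 1st test and f2 by the 2nd, so
# APFD = 1 - (1 + 2)/(4 * 2) + 1/(2 * 4) = 0.75.
print(apfd(["t1", "t2", "t3", "t4"], {"f1": {"t1"}, "f2": {"t2", "t4"}}))  # -> 0.75
```

An ordering that exposes faults earlier shrinks the TFi terms and raises the score, which is exactly the "rate of fault detection" the techniques aim to improve.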

22
Effectiveness Example
23
  • Tools and programs used
  • Aristotle program analysis system: provided test
    coverage and control-flow graph information
  • Proteum mutation system: used to obtain mutation
    scores
  • Used 8 C programs as subjects
  • The first 7 were created at Siemens; the eighth is
    a European Space Agency program

24
(No Transcript)
25
  • Siemens Programs - Description
  • 7 programs used by Siemens in a study that
    observed the fault-detecting effectiveness of
    coverage criteria
  • Faulty versions of these programs were created by
    manually seeding them with single errors, yielding
    the number of versions shown for each program
  • Using single-line faults only allows researchers
    to determine whether a test case discovers the
    error or not
  • For each of the seven programs, a test case pool
    was created by Siemens, first via a black-box
    method; the pool was then completed using
    white-box testing, so that each executable
    statement, edge, and definition-use pair was
    exercised by at least 30 test cases.
  • Faulty programs were kept only if their errors
    were detectable by between 3 and 350 test cases
  • Test suites were created by the researchers by
    random selection of test cases until a
    branch-coverage-adequate test suite was obtained
  • Proteum was used to create mutants of the seven
    programs

26
  • Space Program Description
  • 33 versions of space with only one fault in each
    were created by the ESA, 2 more were created by
    the research team
  • An initial pool of 10,000 test cases was obtained
    from Vokolos and Frankl
  • Used these as a base and added cases until each
    statement and edge was exercised by at least 30
    test cases
  • Created a branch-coverage-adequate test suite in
    the same way as for the Siemens programs
  • Also created mutants via Proteum

27
  • Empirical Studies and Results
  • 4 different studies using the 8 programs
  • Siemens programs with APFD measured relative to
    Siemens faults
  • Siemens programs with APFD measured relative to
    mutants
  • Space with APFD measured relative to actual
    faults
  • Space with APFD measured relative to mutants

28
  • Siemens programs with APFD measured relative to
    Siemens faults Study Format
  • M2 to M9 were applied to each of the 1000 test
    suites, resulting in 8000 prioritized test suites
  • The original 1000 were used as M1
  • Calculated the APFD relative to the faults
    provided with each program

29
Example boxplot
30
  • Study 1 - Overall observations
  • M3 is markedly better than all of the others (as
    expected)
  • The test case prioritization techniques offered
    appear to provide some improvement, but further
    statistical analysis was needed to confirm this
  • This analysis revealed further results
  • Branch-based coverage did as well as or better
    than statement coverage
  • In all but one program, total branch coverage did
    as well as or better than additional branch
    coverage
  • In all programs, total statement coverage did as
    well as or better than additional statement
    coverage
  • In 5 of 7 programs, even randomly prioritized
    test suites did better than untreated test suites

31
Example Groupings
32
  • Siemens programs with APFD measured relative to
    mutants Study Format
  • Same format as the first study, 9000 test suites
    used, 1000 for each prioritization technique
  • But rather than running those test suites against
    the small set of known errors, they were applied
    to mutated programs, created to form a larger set
    of faulty programs to test against
  • Results
  • Additional and Total FEP prioritization
    outperformed all others (except optimal)
  • Branch almost always outperformed statement
  • Total statement outperformed additional
  • But additional branch coverage outperformed total
    branch coverage
  • However, in this study random did not outperform
    the control

33
  • Space with APFD measured relative to Actual
    Faults
  • M2 to M9 were applied to each of the 50 test
    suites, resulting in 400 prioritized test suites;
    with the original 50, 450 test suites in total
  • Additional FEP outperformed all others, but there
    was no significant difference among the rest
  • Also, random was no better than the control

34
Study 3 Groupings
35
  • Space with APFD measured relative to mutants
  • Same technique as the other space study, only
    using 132,163 mutant versions of the software
  • Additional FEP outperformed all others
  • Branch and statement are indistinguishable
  • But additional coverage always outperforms its
    total counterpart

36
Study 4 Groupings
37
  • Threats to Validity
  • Construct Validity: whether you are measuring what
    you say you are measuring (and not something else)
  • Internal Validity: the ability to say that the
    causal relationship is true
  • External Validity: the ability to generalize
    results across the field

38
  • Construct Validity
  • APFD is highly accurate, but it is not the only
    method of measuring fault detection; one could
    also measure the percentage of a test suite that
    must be run before all errors are found
  • APFD assigns no value to later tests that detect
    the same error
  • FEP-based calculations: other estimates may more
    accurately capture the probability of a test case
    finding a fault
  • Effectiveness is measured without considering cost

39
  • Internal Validity
  • Instrumentation can bias results, especially in
    the APFD and prioritization measurement tools
  • Code reviews were performed to limit this
  • Problems were also limited by running each
    prioritization algorithm on each test suite and
    each subject program

40
  • External Validity
  • The Siemens programs are non-trivial but not
    representative of real-world programs. The space
    program is, but it is only one program
  • Faults in the Siemens programs were seeded (unlike
    those in the real world)
  • Faults in space were found during development,
    but these may differ from those found later in
    the development process. They are also only one
    set of faults, found by one set of programmers
  • Single-fault program versions are also not
    representative of the real world
  • The test suites were created with only a single
    method; other real-world methods exist
  • These threats can only be answered by more
    studies with different test suites, programs, and
    errors

41
Additional Discussion And Practical Implications
42
  • Test case prioritization can substantially
    improve rate of fault detection of test suites.
  • Additional FEP prioritization techniques do not
    always justify the additional expense they incur:
    in some cases specific coverage-based techniques
    outperformed them, and even where the additional
    FEP techniques performed best, the total gain in
    APFD was not large.
  • Branch-coverage-based techniques almost always
    performed as well as or better than
    statement-coverage-based techniques. Thus, if the
    two techniques incur similar costs,
    branch-coverage-based techniques are advocated.

43
  • Total statement and branch coverage techniques
    perform almost on par with the additional branch
    and statement coverage techniques, justifying
    their use due to their lower complexity.
  • However, this does not hold for the space program
    (Study 4), where the additional branch and
    statement coverage techniques outperformed the
    total statement and branch coverage techniques by
    a wide margin.
  • Randomly prioritized test suites typically
    outperform untreated test suites.

44
Conclusion
45
  • Any one of the prioritization techniques offers
    some amount of improved fault detection
    capability.
  • The FEP-based techniques are of interest mainly to
    research groups, due to the high expense that they
    incur. Code-coverage-based techniques, however,
    have immediate practical implications.

46
Future Work
47
  • Additional studies to be performed using a wider
    range of programs, faults, and test suites.
  • The gap between optimal prioritization and FEP
    prioritization techniques has yet to be bridged.
  • Determining which prioritization technique is
    warranted by particular types of programs and
    test suites.
  • Other prioritization objectives have to be
    investigated.
  • Version-specific techniques.
  • Techniques may be applied not only to regression
    testing but also during the initial testing of
    the software.