1
Test Case Filtering and Prioritization Based on
Coverage of Combinations of Program Elements
  • Wes Masri and Marwa El-Ghali
  • American Univ. of Beirut, ECE Department
  • Beirut, Lebanon
  • wm13@aub.edu.lb

2
Test Case Filtering
  • Test case filtering is concerned with selecting
    from a test suite T a subset T' that is capable
    of revealing most of the defects revealed by T
  • Approach: select T' so that it covers all the
    elements covered by T

3
Test Case Filtering What to Cover?
  • Existing techniques cover singular program
    elements of varying granularity
  • methods, statements, branches, def-use pairs,
    slice pairs and information flow pairs
  • Previous studies have shown that increasing the
    granularity leads to revealing more defects at
    the expense of larger subsets

4
Test Case Filtering
  • This work explores covering suspicious
    combinations of simple program elements
  • The number of possible combinations is
    exponential w.r.t. the number of singular
    elements → use an approximation algorithm
  • We use a genetic algorithm

5
Test Case Filtering Conjectures
  • Combinations of program elements are more likely
    to characterize complex failures
  • The percentage of failing tests is typically much
    smaller than that of the passing tests
  • Each defect causes a small number of tests to
    fail
  • Given groups of (structurally) similar tests,
    smaller ones are more likely to be
    failure-inducing than larger ones

6
Test Case Filtering Steps
  1. Given a test suite T, generate execution profiles
    of simple program elements (statements, branches,
    and def-use pairs)
  2. Choose a threshold Mfail for the maximum number
    of tests that could fail due to a single defect
  3. Use the genetic algorithm to generate C, a set
    of combinations of simple program elements that
    were covered by fewer than Mfail tests →
    suspicious combinations
  4. Use a greedy algorithm to extract T', the
    smallest subset of T that covers all the
    combinations in C (see the sketch below)
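  • For illustration only (not the authors' code), step 4 can be
    realized as a standard greedy set-cover pass over the suspicious
    combinations produced by the genetic algorithm; profiles and
    combinations are assumed to be represented as sets of element ids:

    def greedy_filter(profiles, suspicious):
        """profiles: dict test_id -> set of covered element ids.
        suspicious: iterable of frozensets (the combinations in C).
        Returns T', a small subset of tests covering all of C."""
        uncovered = set(suspicious)
        selected = []
        while uncovered:
            # pick the test that covers the most still-uncovered combinations
            best = max(profiles,
                       key=lambda t: sum(c <= profiles[t] for c in uncovered))
            newly = {c for c in uncovered if c <= profiles[best]}
            if not newly:
                break  # remaining combinations are not coverable by any test
            selected.append(best)
            uncovered -= newly
        return selected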

7
Genetic Algorithm
  • A genetic algorithm solves a problem by
  • Operating on an initial population of candidate
    solutions, or chromosomes
  • Evaluating their quality using a fitness function
  • Applying transformations to create new generations
    with improved quality
  • Ultimately evolving to a single solution
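  • As a rough sketch only (not the authors' implementation, and with
    hypothetical parameter values), the overall loop might look like the
    following, where init_population, fitness, and transform correspond
    to the next three slides:

    import random

    def run_ga(profiles, m_fail, pop_size=50, generations=10000):
        """Evolve combinations of elements; keep every combination whose
        fitness clears the threshold implied by Mfail (see slide 11)."""
        population = init_population(profiles, pop_size)       # slide 9
        threshold = 1.0 - m_fail / len(profiles)                # covered by <= Mfail tests
        solutions = set()
        for _ in range(generations):
            i, j = random.sample(range(len(population)), 2)
            child = transform(population[i], population[j], profiles)  # slide 10
            # replace the parent with the worse fitness by the child
            worse = i if fitness(population[i], profiles) < fitness(population[j], profiles) else j
            population[worse] = child
            if child and fitness(child, profiles) >= threshold:
                solutions.add(frozenset(child))
        return solutions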

8
Fitness Function
  • We use the following equation
  • fitness(combination) = 1 - %tests
  • where %tests is the percentage of test cases
    that exercised the combination
  • The smaller the percentage, the higher the
    fitness
  • The aim is to end up with a manageable set of
    combinations in which each combination occurred
    in at most Mfail tests
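  • A direct reading of this definition, assuming the same set-based
    profiles as in the earlier sketch:

    def fitness(combination, profiles):
        """fitness = 1 - fraction of tests whose profile exercises
        every element of the combination."""
        if not combination:
            return 0.0
        exercised = sum(1 for prof in profiles.values() if combination <= prof)
        return 1.0 - exercised / len(profiles)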

9
Initial Population Generation
  • Generated from the union of all execution profiles
  • Population size: 50 in our implementation
  • 0 → 0 always, 1 → 1 with small probability P
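  • One possible reading of this slide; the value of P is not given in
    the slides, so 0.05 below is only a placeholder:

    import random

    def init_population(profiles, pop_size=50, p=0.05):
        """Each chromosome is drawn from the union of all execution
        profiles: elements absent from the union stay absent (0 -> 0),
        elements present in the union are kept with small probability P."""
        union = set().union(*profiles.values())
        return [{e for e in union if random.random() < p}
                for _ in range(pop_size)]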

10
Transformation Operator
  • Combines two parent chromosomes to produce a
    child
  • Passes down properties from each, favoring the
    parent with the higher fitness
  • Goal: the child should have a better fitness
    than its parents
  • The parent with the worse fitness is replaced by
    the child
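  • One plausible realization of this operator; the inheritance
    probabilities 0.8 and 0.2 are illustrative, not from the paper:

    import random

    def transform(parent_a, parent_b, profiles):
        """Combine two parents into a child, passing down properties from
        each but favoring the parent with the higher fitness."""
        if fitness(parent_a, profiles) >= fitness(parent_b, profiles):
            strong, weak = parent_a, parent_b
        else:
            strong, weak = parent_b, parent_a
        child = {e for e in strong if random.random() < 0.8}  # fitter parent
        child |= {e for e in weak if random.random() < 0.2}   # other parent
        return child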

11
Solution Set
  • The obtained solution set contains all the
    encountered combinations with high-enough fitness
    values → suspicious combinations

12
Experimental Work
  • Our subject programs included
  • The JTidy HTML syntax checker and pretty printer:
    1000 tests, 8 defects, 47 failures
  • The NanoXML XML parser: 140 tests, 4 defects,
    20 failures

13
Experimental Work
  • We profiled the following program elements
  • basic-blocks or statements (BB)
  • basic-block edges or branches (BBE)
  • def-use pairs (DUP)
  • Next we applied the genetic algorithm to generate
    the following
  • a pool of BBcomb
  • a pool of BBEcomb
  • a pool of DUPcomb
  • a pool of ALLcomb (combinations of BBs, BBEs and
    DUPs)
  • The values of Mfail we chose for JTidy and
    NanoXML were 100 and 20, respectively

14
Profile Type   % Tests Selected   % Defects Revealed
BB             5.3                55.0
BBcomb         9.6                65.6
BBE            6.5                78.7
BBEcomb        10.2               87.5
DUP            11.7               81.2
DUPcomb        14.1               87.5
ALL            12.4               94.8
ALLcomb        14.1               100.0
SliceP         26.7               100.0
  • JTidy results
  • In the case of ALLcomb, 14.1% of the original
    test suite was needed to exercise all of the
    combinations exercised by the original test
    suite, and these tests revealed all the defects
    revealed by the original test suite
  • In previous work we showed that coverage of slice
    pairs (SliceP) performed better than coverage of
    BB, BBE, and DUP; this is why we include the
    SliceP results here for comparison

15
  • The figure compares the various techniques to
    random sampling
  • All variations performed better than random
    sampling
  • BBcomb revealed 10.6% more defects than BB but
    selected 4.2% more tests
  • BBEcomb revealed 8.8% more defects than BBE but
    selected 3.7% more tests
  • DUPcomb revealed 6.3% more defects than DUP but
    selected 2.4% more tests
  • ALLcomb performed better than SliceP, since it
    revealed all defects, as SliceP did, but selected
    12.6% fewer tests

16
Experimental Work
  • Concerning BBcomb, BBEcomb, and DUPcomb, the
    additional cost due to the selection of more
    tests might not be well justified, since the rate
    of improvement is no better than it is for random
    sampling
  • Concerning ALLcomb, not only did it perform
    better than SliceP, it is also considerably
    less costly
  • It took 90 seconds on average per test to
    generate its profiles (i.e., BBs, BBEs and
    DUPs), whereas it took 1200 seconds per test to
    generate the SliceP profiles (1 day vs. 2 weeks)

17
  • NanoXML observations
  • BB, BBE, DUP, and ALL did not perform any better
    than random sampling, whereas BBcomb, BBEcomb,
    DUPcomb, and ALLcomb performed noticeably better
  • BBcomb, BBEcomb, DUPcomb, and ALLcomb revealed
    all the defects, but at relatively high cost,
    since over 50% of the tests needed to be executed
  • The cost of running the genetic algorithm and the
    greedy selection algorithm has to be factored in
    when comparing our techniques to others

18
Test Case Prioritization
  • Test case prioritization aims at scheduling the
    tests in T so that the defects are revealed as
    early as possible
  • Summary of our technique
  • Prioritize combinations in terms of their
    suspiciousness
  • Then assign the priority of a given combination
    to the tests that cover it

19
Test Case Prioritization Steps
  • Identify combinations that were exercised by
    exactly 1 test; assign that test priority 1, and
    add it to T'
  • Identify combinations that were exercised by
    exactly 2 tests; assign those tests priority 2,
    and add them to T'
  • ... and so on until all tests are prioritized,
    Mfail is exceeded, or all combinations have been
    explored
  • Use the greedy algorithm to reduce T'
  • Any remaining tests that were not prioritized
    are scheduled to run randomly, following the
    prioritized tests (see the sketch below)
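  • A sketch of these steps under the same assumptions as before; the
    greedy reduction of T' is omitted here and can reuse the
    greedy_filter sketch from the filtering slides:

    import random

    def prioritize(profiles, combinations, m_fail):
        """Assign priority k to tests covering a combination exercised by
        exactly k tests, for k = 1 .. Mfail; unprioritized tests run in
        random order after the prioritized ones."""
        priority = {}
        for k in range(1, m_fail + 1):
            for comb in combinations:
                covering = [t for t, prof in profiles.items() if comb <= prof]
                if len(covering) == k:
                    for t in covering:
                        priority.setdefault(t, k)  # keep the smallest k
        ordered = sorted(priority, key=priority.get)
        rest = [t for t in profiles if t not in priority]
        random.shuffle(rest)
        return ordered + rest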

20
Element   % Tests Selected   % Defects Revealed
BBcomb    6.75               56.25
BBEcomb   7.55               81.25
DUPcomb   12.6               87.5
ALLcomb   13.05              100.0
  • JTidy prioritization results when step 3 is
    satisfied, i.e., when all tests are prioritized,
    Mfail is exceeded, or all combinations have been
    explored
  • Observation: using BBcomb, BBEcomb, and DUPcomb,
    not all defects were revealed; combinations of
    BB, BBE, and DUP (ALLcomb) are needed to reveal
    all defects
21
Element   % Tests Selected   % Defects Revealed
BBcomb    50.2               100.0
BBEcomb   50.8               100.0
DUPcomb   52.8               100.0
ALLcomb   53.5               100.0
  • NanoXML prioritization results
  • Observation: all defects were revealed using
    BBcomb, BBEcomb, DUPcomb, or ALLcomb, but at a
    high cost in terms of selected tests
22
Conclusion
  • Our techniques performed better than similar
    coverage-based techniques that consider program
    elements of the same type and that do not take
    into account their combinations
  • We will conduct a more thorough empirical study
  • We will use the APFD (Average Percentage of
    Faults Detected) metric to evaluate prioritization
    (a reference formula is sketched below)
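  • For reference, the standard APFD metric is
    APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n),
    where n is the number of tests, m the number of faults, and TF_i
    the position of the first test revealing fault i; the function
    below computes this textbook formula and is not code from the
    presentation:

    def apfd(order, faults_revealed_by):
        """order: list of tests in prioritized order.
        faults_revealed_by: dict test -> set of faults it reveals.
        Assumes every fault is revealed by at least one test in order."""
        n = len(order)
        faults = set().union(*faults_revealed_by.values())
        m = len(faults)
        first_pos = {}
        for pos, t in enumerate(order, start=1):
            for f in faults_revealed_by.get(t, ()):
                first_pos.setdefault(f, pos)  # position of first revealing test
        return 1 - sum(first_pos[f] for f in faults) / (n * m) + 1 / (2 * n)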