Statistical Debugging: A Tutorial - PowerPoint PPT Presentation

1
Statistical Debugging: A Tutorial
  • Steven C.H. Hoi

Acknowledgement: Some slides in this tutorial
were borrowed from Chao Liu at UIUC.
2
Motivations
  • Software is full of bugs
  • Windows 2000 had about 63,000 known bugs at its
    time of release, 2 bugs per 1000 lines
  • A study by the National Institute of Standards
    and Technology showed that software faults cost
    the U.S. economy about $59.5 billion annually
    (http://www.nist.gov/director/prog-ofc/report02-3.pdf)
  • Testing and debugging are laborious and expensive
  • "50% of my company employees are testers, and the
    rest spends 50% of their time testing!"
    -- Bill Gates, in 1995

3
Expedite Debugging
  • Manual debugging
  • Trace the executions step-by-step.
  • Verify observations against expectations
  • Automated debugging
  • Collect runtime behaviors as the program
    executes.
  • Identify bug-relevant points by contrasting the
    correct and incorrect executions
  • Best efforts so far: bug localization

4
An Example
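/* Buggy version: the putsub condition omits the (lastm != m) check */
void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if (m >= 0) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}

/* Correct version: putsub runs only when the match end differs from the last one */
void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0) && (lastm != m)) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}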
  • Symptoms
  • 563 lines of C code
  • 130 out of 5542 test cases fail to give correct
    outputs
  • No crashes
  • Conventional debugging
  • Few hints
  • Step-by-step tracing
  • Better method
  • Pinpoint the buggy line

5
Review of Recent Work
  • SOBER Algorithm
  • Cause Transition Algorithm
  • Statistical Debugging [Liblit05]
  • Statistical Debugging: Simultaneous
    Identification of Multiple Bugs

6
SOBER: Statistical Model-based Bug Localization
  • Program Predicates
  • Predicate Rankings
  • Experimental Results

7
Program Predicates
  • A predicate is a proposition about any program
    property
  • e.g., idx < BUFSIZE, a + b == c, foo() > 0
  • Each can be evaluated multiple times during one
    execution
  • Every evaluation gives either true or false
  • Therefore, a predicate is simply a boolean random
    variable, which encodes program executions from a
    particular aspect.

8
Evaluation Bias of Predicate P
  • Evaluation bias
  • Definition: the probability of being evaluated as
    true within one execution
  • Maximum likelihood estimation: number of true
    evaluations over the total number of evaluations
    in one run
  • Each run gives one observation of evaluation bias
    for predicate P
  • Suppose we have n correct and m incorrect
    executions; for any predicate P, we end up with
  • An observation sequence for correct runs
  • S_p = (X_1, X_2, ..., X_n)
  • An observation sequence for incorrect runs
  • S_f = (X_1, X_2, ..., X_m)
  • Can we infer whether P is suspicious based on S_p
    and S_f?
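
The evaluation bias described above can be sketched in a few lines of C. This is only an illustration, not the tutorial's instrumentation: the `PredicateCounts` struct and the `observe` and `evaluation_bias` functions are hypothetical names, and returning -1.0 for a predicate that was never evaluated in a run is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>

/* Per-run counters for one predicate P (hypothetical struct). */
typedef struct {
    unsigned long n_true;   /* times P evaluated to true in this run */
    unsigned long n_total;  /* times P was evaluated at all in this run */
} PredicateCounts;

/* Record one evaluation of P and return its value unchanged,
 * so the call can wrap the original condition in place. */
int observe(PredicateCounts *c, int value)
{
    assert(c != NULL);
    c->n_total += 1;
    if (value)
        c->n_true += 1;
    return value;
}

/* Maximum likelihood estimate of the evaluation bias for one run:
 * true evaluations over total evaluations. Returns -1.0 when P was
 * never evaluated, because the ratio is undefined in that case
 * (how to treat such runs is an assumption left to the caller). */
double evaluation_bias(const PredicateCounts *c)
{
    assert(c != NULL);
    if (c->n_total == 0)
        return -1.0;
    return (double)c->n_true / (double)c->n_total;
}
```

One value of `evaluation_bias` per run would then be appended to S_p or S_f, depending on whether that run passed or failed. A condition could be instrumented as, for example, `if ((m >= 0) && observe(&counts, lastm != m))`.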

9
Underlying Populations
  • Imagine that the evaluation bias of P follows one
    underlying distribution over correct executions and
    another over incorrect executions
  • S_p and S_f can be viewed as random samples from
    the respective underlying populations
  • One major heuristic is
  • The larger the divergence between the two
    distributions, the more relevant the predicate P is
    to the bug
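
To make the divergence heuristic concrete, here is a rough sketch that compares the two samples with a squared standardized mean difference. This is a generic stand-in chosen for illustration and is not claimed to be SOBER's actual relevance score; the function names and the handling of zero variance are assumptions.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Sample mean of xs[0..n-1]; n must be positive. */
double sample_mean(const double *xs, size_t n)
{
    assert(xs != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double)n;
}

/* Unbiased sample variance of xs[0..n-1]; n must be at least 2. */
double sample_variance(const double *xs, size_t n)
{
    assert(xs != NULL && n >= 2);
    double mean = sample_mean(xs, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (xs[i] - mean) * (xs[i] - mean);
    return ss / (double)(n - 1);
}

/* Illustrative divergence between S_p (n passing runs) and S_f
 * (m failing runs): the squared distance of the failing-run mean from
 * the passing-run mean, in units of the squared standard error
 * var_p / m. Larger values mean the failing runs look less like the
 * passing runs. If the passing runs have zero variance, returns 0.0
 * when the means coincide and DBL_MAX otherwise (an assumption). */
double divergence_sq(const double *s_p, size_t n,
                     const double *s_f, size_t m)
{
    assert(n >= 2 && m >= 1);
    double diff = sample_mean(s_f, m) - sample_mean(s_p, n);
    double var_p = sample_variance(s_p, n);
    if (var_p == 0.0)
        return diff == 0.0 ? 0.0 : DBL_MAX;
    return (double)m * diff * diff / var_p;
}
```

Estimating the variance only from the passing runs reflects the situation where failing executions are too few to characterize their own distribution reliably.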

10
Major Challenges
  • No knowledge of the closed forms of both
    distributions
  • Usually, we do not have sufficient incorrect
    executions to estimate the distribution for
    incorrect executions reliably.

11
SOBER's Approach
12
Algorithm Outputs
  • A ranked list of program predicates w.r.t. the
    bug relevance score s(P)
  • Higher-ranked predicates are regarded as more
    relevant to the bug
  • What's the use?
  • Top-ranked predicates suggest the possible buggy
    regions
  • Several predicates may point to the same region
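
Producing the ranked list is a plain sort once each predicate has a score. The sketch below assumes the scores s(P) were computed elsewhere; the `RankedPredicate` struct and its fields are hypothetical, and only the standard `qsort` is used.

```c
#include <assert.h>
#include <stdlib.h>

/* One scored predicate (hypothetical layout). */
typedef struct {
    const char *name;  /* human-readable predicate, e.g. "lastm != m" */
    int line;          /* source line where the predicate is evaluated */
    double score;      /* bug relevance score s(P), computed elsewhere */
} RankedPredicate;

/* qsort comparator: higher score sorts first. */
static int by_score_desc(const void *a, const void *b)
{
    const RankedPredicate *pa = a;
    const RankedPredicate *pb = b;
    if (pa->score > pb->score) return -1;
    if (pa->score < pb->score) return 1;
    return 0;
}

/* Sort preds[0..n-1] in place so the most suspicious predicate is first. */
void rank_predicates(RankedPredicate *preds, size_t n)
{
    if (n < 2)
        return;
    assert(preds != NULL);
    qsort(preds, n, sizeof preds[0], by_score_desc);
}
```

Keeping the source line next to each score makes it easy to see when several top-ranked predicates fall in the same region of code.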

13
Cause Transition (CT)
  • Locating Causes of Program Failures, Cleve et
    al., published in ICSE'05, May 15, 2005
  • A variant of delta debugging [Z02]
  • Previous state-of-the-art performance holder on
    the Siemens suite
  • Cons: it relies on memory abnormality, hence its
    performance is restricted.

14
Statistical Debugging [Liblit05]
  • Scalable Statistical Bug Isolation, Liblit et
    al., published in PLDI'05, June 12, 2005
  • Main idea: rank predicates according to their
    correlation with program crashes

15
Statistical Debugging [Liblit05]
  • Context(P) = Pr(Crash | P observed)
  • Failure(P) = Pr(Crash | P observed as true)
  • The probability difference
  • Increase(P) = Failure(P) - Context(P)
  • Limitation: ignores evaluation patterns of
    predicates within each execution
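
The three quantities above can be estimated from run counts. The following is a minimal sketch under the assumption that each probability is estimated as a simple ratio of failing runs to all runs in the relevant group; the `RunCounts` struct, its field names, and the choice to return 0 for an empty group are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Run counts for one predicate P (hypothetical field names). */
typedef struct {
    unsigned long fail_true;     /* failing runs where P was observed true */
    unsigned long succ_true;     /* successful runs where P was observed true */
    unsigned long fail_observed; /* failing runs where P was evaluated at all */
    unsigned long succ_observed; /* successful runs where P was evaluated at all */
} RunCounts;

/* Ratio of failing runs to all runs in a group; 0.0 for an empty group. */
static double fail_ratio(unsigned long fail, unsigned long succ)
{
    unsigned long total = fail + succ;
    return total == 0 ? 0.0 : (double)fail / (double)total;
}

/* Estimate of Failure(P) = Pr(Crash | P observed as true). */
double failure_score(const RunCounts *c)
{
    assert(c != NULL);
    return fail_ratio(c->fail_true, c->succ_true);
}

/* Estimate of Context(P) = Pr(Crash | P observed). */
double context_score(const RunCounts *c)
{
    assert(c != NULL);
    return fail_ratio(c->fail_observed, c->succ_observed);
}

/* Increase(P) = Failure(P) - Context(P). */
double increase_score(const RunCounts *c)
{
    return failure_score(c) - context_score(c);
}
```

Each run contributes at most once to each counter no matter how many times P was evaluated in it, which is the limitation the last bullet points out.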

16
Experiment Results
  • Localization quality metric
  • Software bug benchmark
  • Quantitative metric
  • Related works
  • Cause Transition (CT) [CZ05]
  • Statistical Debugging [LN05]
  • Performance comparisons

17
Bug Benchmark
  • Bug benchmark
  • Dream (ideal) benchmark
  • Large number of known bugs on large-scale
    programs with adequate test suite
  • Siemens Program Suite
  • 130 variants of 7 subject programs, each of
    100-600 LOC
  • 130 known bugs in total
  • mainly logic (or semantic) bugs
  • Advantages
  • Known bugs, thus judgments are objective
  • Large number of bugs, thus comparative study is
    statistically significant.
  • Disadvantages
  • Small-scale subject programs
  • State-of-the-art performance claimed so far in the
    literature
  • Cause-transition approach [CZ05]

18
Localization Quality Metric [RR03]
19
1st Example
[Figure: graph with nodes numbered 1 to 10]
T-score: 70%
20
2nd Example
[Figure: graph with nodes numbered 1 to 10]
T-score: 20%
21
Localized bugs w.r.t. Examined Code
22
Cumulative Effects w.r.t. Code Examination
23
Top-k Selection
  • Regardless of the specific selection of k, both
    [Liblit05] and SOBER are better than CT, the
    current state-of-the-art holder
  • From k = 2 to 10, SOBER is consistently better
    than [Liblit05]

24
Conclusion and Discussion
  • A tutorial on statistical debugging
  • Discussion on Future Work
  • Better Statistical Models
  • Identification of Multiple Bugs
  • Robust to Sampling