Title: Statistical Debugging: A Tutorial
1. Statistical Debugging: A Tutorial
Acknowledgement: Some slides in this tutorial were borrowed from Chao Liu at UIUC.
2. Motivations
- Software is full of bugs
- Windows 2000 had about 63,000 known bugs at its time of release, about 2 bugs per 1,000 lines
- A study by the National Institute of Standards and Technology showed that software faults cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf)
- Testing and debugging are laborious and expensive
- "50% of my company employees are testers, and the rest spends 50% of their time testing!" --Bill Gates, in 1995
3. Expedite Debugging
- Manual debugging
- Trace the executions step-by-step.
- Verify observations against expectations
- Automated debugging
- Collect runtime behaviors as the program executes.
- Identify bug-relevant points by contrasting the correct and incorrect executions
- Best efforts so far: bug localization
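4. An Example
/* buggy version: the (lastm != m) check is missing */
void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if (m >= 0) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}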
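/* correct version: substitute only when the match end differs from the last one */
void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0) && (lastm != m)) {
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}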
- Symptoms
- 563 lines of C code
- 130 out of 5542 test cases fail to give correct outputs
- No crashes
- Conventional debugging
- Few hints
- Step-by-step tracing
- Better method
- Pinpoint the buggy line
5. Review of Recent Work
- SOBER Algorithm
- Cause Transition Algorithm
- Statistical Debugging [Liblit05]
- Statistical Debugging: Simultaneous Identification of Multiple Bugs
6. SOBER: Statistical Model-based Bug Localization
- Program Predicates
- Predicate Rankings
- Experimental Results
7. Program Predicates
- A predicate is a proposition about any program property
- e.g., idx < BUFSIZE, a + b == c, foo() > 0
- Each can be evaluated multiple times during one execution
- Every evaluation gives either true or false
- Therefore, a predicate is simply a boolean random variable, which encodes program executions from a particular aspect.
8. Evaluation Bias of Predicate P
- Evaluation bias
- Definition: the probability that P is evaluated as true within one execution
- Maximum likelihood estimate: the number of true evaluations over the total number of evaluations in one run
- Each run gives one observation of evaluation bias for predicate P
- Suppose we have n correct and m incorrect executions; for any predicate P, we end up with
- An observation sequence for correct runs
- S_p = (X_1, X_2, ..., X_n)
- An observation sequence for incorrect runs
- S_f = (X_1, X_2, ..., X_m)
- Can we infer whether P is suspicious based on S_p and S_f?
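The maximum likelihood estimate above can be sketched in C as follows. This is only an illustration under stated assumptions: the names PredCounts, observe, evaluation_bias, and bias_sequence are hypothetical and do not come from the SOBER implementation, and the choice to skip runs in which P was never evaluated is an assumption of this sketch, not something the slides specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-run counters for one predicate P (names are illustrative). */
typedef struct {
    unsigned long n_true;   /* evaluations of P that came out true */
    unsigned long n_total;  /* all evaluations of P in this run */
} PredCounts;

/* Record one evaluation of P and pass the value through, so the call can
 * wrap a condition in place, e.g. if (observe(&c, idx < BUFSIZE)) ... */
static int observe(PredCounts *c, int value)
{
    c->n_total++;
    if (value)
        c->n_true++;
    return value;
}

/* Maximum likelihood estimate of the evaluation bias for one run:
 * true evaluations over total evaluations.
 * Returns -1.0 when P was never evaluated, since the ratio is undefined. */
static double evaluation_bias(const PredCounts *c)
{
    if (c->n_total == 0)
        return -1.0;
    return (double)c->n_true / (double)c->n_total;
}

/* Build an observation sequence (such as S_p or S_f) from per-run counters.
 * Runs in which P was never evaluated are skipped (an assumption of this
 * sketch). Returns the number of observations written to out[]. */
static size_t bias_sequence(const PredCounts *runs, size_t n_runs, double *out)
{
    size_t k = 0;
    assert(runs != NULL && out != NULL);
    for (size_t r = 0; r < n_runs; r++) {
        double b = evaluation_bias(&runs[r]);
        if (b >= 0.0)
            out[k++] = b;
    }
    return k;
}
```

Calling bias_sequence once on the counters from the correct runs and once on those from the incorrect runs would yield the two sequences S_p and S_f for a given predicate.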
9. Underlying Populations
- Imagine that the evaluation bias follows one underlying distribution for correct executions and another for incorrect executions
- S_p and S_f can be viewed as random samples from the respective underlying populations
- One major heuristic is
- The larger the divergence between the two distributions, the more relevant the predicate P is to the bug
10. Major Challenges
- No knowledge of the closed forms of both distributions
- Usually, we do not have sufficient incorrect executions to estimate the distribution for incorrect executions reliably.
11. SOBER's Approach
12. Algorithm Outputs
- A ranked list of program predicates w.r.t. the bug relevance score s(P)
- Higher-ranked predicates are regarded as more relevant to the bug
- What's the use?
- Top-ranked predicates suggest the possible buggy regions
- Several predicates may point to the same region
13. Cause Transition (CT)
- "Locating Causes of Program Failures", Cleve et al., published in ICSE'05, May 15, 2005
- A variant of delta debugging [Z02]
- Previous state-of-the-art performance holder on the Siemens suite
- Cons: it relies on memory abnormality, hence its performance is restricted.
14. Statistical Debugging [Liblit05]
- "Scalable Statistical Bug Isolation", Liblit et al., published in PLDI'05, June 12, 2005
- Main idea: rank predicates according to their correlation with program crashes
15. Statistical Debugging [Liblit05]
- Context(P) = Pr(Crash | P observed)
- Failure(P) = Pr(Crash | P observed as true)
- The probability difference
- Increase(P) = Failure(P) - Context(P)
- Limitation: ignores evaluation patterns of predicates within each execution
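As a rough illustration, the three scores can be estimated from run counts, replacing each conditional probability with the fraction of failing runs among the runs that satisfy the condition. The sketch below assumes this estimator; the struct RunCounts, its field names, and the function names are hypothetical, and returning 0.0 when no run satisfies the condition is a choice of this sketch.

```c
#include <assert.h>

/* Hypothetical run counts for one predicate P (field names are illustrative). */
typedef struct {
    unsigned f_true;  /* failing runs in which P was observed true at least once */
    unsigned s_true;  /* successful runs in which P was observed true at least once */
    unsigned f_obs;   /* failing runs in which P was evaluated at all */
    unsigned s_obs;   /* successful runs in which P was evaluated at all */
} RunCounts;

/* Fraction of failing runs among (fail + ok) runs; 0.0 when there are none. */
static double failure_ratio(unsigned fail, unsigned ok)
{
    unsigned total = fail + ok;
    return total == 0 ? 0.0 : (double)fail / (double)total;
}

/* Context(P): estimate of Pr(Crash | P observed). */
static double context_score(const RunCounts *c)
{
    return failure_ratio(c->f_obs, c->s_obs);
}

/* Failure(P): estimate of Pr(Crash | P observed as true). */
static double failure_score(const RunCounts *c)
{
    return failure_ratio(c->f_true, c->s_true);
}

/* Increase(P) = Failure(P) - Context(P). */
static double increase_score(const RunCounts *c)
{
    /* A run where P was true is also a run where P was observed. */
    assert(c->f_true <= c->f_obs && c->s_true <= c->s_obs);
    return failure_score(c) - context_score(c);
}
```

For example, a predicate that was true in 3 failing runs and 1 successful run, and evaluated in 4 failing and 12 successful runs, gets Failure = 0.75, Context = 0.25, and Increase = 0.5. Because only per-run "was it ever true" counts enter the scores, how often P was true within a single run plays no role, which is the limitation noted above.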
16. Experiment Results
- Localization quality metric
- Software bug benchmark
- Quantitative metric
- Related works
- Cause Transition (CT) [CZ05]
- Statistical Debugging [LN05]
- Performance comparisons
17. Bug Benchmark
- Bug benchmark
- Dream benchmark
- Large number of known bugs on large-scale programs with an adequate test suite
- Siemens Program Suite
- 130 variants of 7 subject programs, each of 100-600 LOC
- 130 known bugs in total
- Mainly logic (or semantic) bugs
- Advantages
- Known bugs, thus judgments are objective
- Large number of bugs, thus a comparative study is statistically significant.
- Disadvantages
- Small-scale subject programs
- State-of-the-art performance claimed so far in the literature: the cause-transition approach [CZ05]
18. Localization Quality Metric [RR03]
19. 1st Example
[Figure: example graph with nodes numbered 1 to 10]
T-score = 70%
20. 2nd Example
[Figure: example graph with nodes numbered 1 to 10]
T-score = 20%
21. Localized Bugs w.r.t. Examined Code
22. Cumulative Effects w.r.t. Code Examination
23. Top-k Selection
- Regardless of the specific selection of k, both Liblit05 and SOBER are better than CT, the current state-of-the-art holder
- From k = 2 to 10, SOBER is consistently better than Liblit05
24. Conclusion and Discussion
- A tutorial on statistical debugging
- Discussion on Future Work
- Better Statistical Models
- Identification of Multiple Bugs
- Robust to Sampling