Statistical Debugging: A Tutorial

1
Statistical Debugging: A Tutorial

Steven C.H. Hoi

Acknowledgement: Some slides in this tutorial were borrowed from Chao Liu at UIUC.
2
Motivations

Software is full of bugs
  Windows 2000 had about 63,000 known bugs at its time of release, 2 bugs per 1000 lines
  A study by the National Institute of Standards and Technology showed that software faults cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf)
Testing and debugging are laborious and expensive
  "50% of my company employees are testers, and the rest spends 50% of their time testing!" --Bill Gates, in 1995

3
Expedite Debugging

Manual debugging
  Trace the executions step by step
  Verify observations against expectations
Automated debugging
  Collect runtime behaviors as the program executes
  Identify bug-relevant points by contrasting the correct and incorrect executions
  Best efforts so far: bug localization

4
An Example
Buggy version (the first if condition omits the lastm != m check):

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if (m >= 0) {
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Correct version:

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if ((m >= 0) && (lastm != m)) {
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Symptoms
  563 lines of C code
  130 out of 5542 test cases fail to give correct outputs
  No crashes
Conventional debugging
  Few hints
  Step-by-step tracing
Better method
  Pinpoint the buggy line

5
Review of Recent Work

SOBER Algorithm
Cause Transition Algorithm
Statistical Debugging [Liblit05]
Statistical Debugging: Simultaneous Identification of Multiple Bugs

6
SOBER: Statistical Model-based Bug Localization

Program Predicates
Predicate Rankings
Experimental Results

7
Program Predicates

A predicate is a proposition about any program property
  e.g., idx < BUFSIZE, a + b == c, foo() > 0
Each predicate can be evaluated multiple times during one execution
Every evaluation gives either true or false
Therefore, a predicate is simply a Boolean random variable, which encodes program executions from a particular aspect.
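
To make the idea of a predicate as a repeatedly evaluated Boolean concrete, here is a minimal Python sketch. The `PredicateRecorder` class and the toy `count_small` subject function are hypothetical names invented for illustration; real tools instrument the program automatically rather than by hand.

```python
from collections import defaultdict


class PredicateRecorder:
    """Counts true/false evaluations of named predicates during one run."""

    def __init__(self):
        # predicate name -> [number of false evaluations, number of true evaluations]
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, name, value):
        """Record one evaluation of a predicate and return its truth value."""
        outcome = bool(value)
        self.counts[name][int(outcome)] += 1
        return outcome


def count_small(values, limit, rec):
    """Toy subject program (hypothetical): counts the values below `limit`."""
    total = 0
    for v in values:
        # The branch condition is the instrumented predicate.
        if rec.observe("v < limit", v < limit):
            total += 1
    return total


rec = PredicateRecorder()
result = count_small([1, 5, 2, 9], limit=4, rec=rec)
false_count, true_count = rec.counts["v < limit"]
```

In this run the predicate is evaluated four times, twice as true and twice as false, which is exactly the per-run record that the later slides build on.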

8
Evaluation Bias of Predicate P

Evaluation bias
  Definition: the probability of the predicate being evaluated as true within one execution
  Maximum likelihood estimate: the number of true evaluations over the total number of evaluations in one run
Each run gives one observation of evaluation bias for predicate P
Suppose we have n correct and m incorrect executions; for any predicate P, we end up with
  An observation sequence for correct runs
    S_p = (X_1, X_2, ..., X_n)
  An observation sequence for incorrect runs
    S_f = (X_1, X_2, ..., X_m)
Can we infer whether P is suspicious based on S_p and S_f?
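
The estimate above can be sketched in a few lines of Python. The input format (a tuple of true count, total count, and pass/fail flag per run) and the choice to skip runs in which the predicate was never evaluated are assumptions made for this illustration, not details taken from the slides.

```python
def evaluation_bias(true_count, total_count):
    """Maximum likelihood estimate: true evaluations over all evaluations in one run."""
    if total_count == 0:
        return None  # the predicate was never evaluated in this run
    return true_count / total_count


def observation_sequences(runs):
    """Split per-run evaluation biases into (S_p, S_f).

    runs: iterable of (true_count, total_count, passed) tuples, one per execution.
    """
    s_p, s_f = [], []
    for true_count, total_count, passed in runs:
        bias = evaluation_bias(true_count, total_count)
        if bias is None:
            continue  # assumption: unobserved runs contribute no observation
        (s_p if passed else s_f).append(bias)
    return s_p, s_f


# Hypothetical data: three passing runs (one never reaches P) and two failing runs.
runs = [(1, 4, True), (0, 5, True), (3, 4, False), (0, 0, True), (2, 2, False)]
s_p, s_f = observation_sequences(runs)
```

Here `s_p` holds the biases of the passing runs and `s_f` those of the failing runs, one number per execution.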

9
Underlying Populations

Imagine that evaluation bias follows one underlying distribution for correct executions and another for incorrect executions
S_p and S_f can be viewed as random samples from the respective underlying populations
One major heuristic is
  The larger the divergence between the two distributions, the more relevant the predicate P is to the bug

10
Major Challenges

No knowledge of the closed forms of both distributions
Usually, we do not have sufficient incorrect executions to estimate the distribution for incorrect executions reliably.
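
One way to work around both challenges is to estimate only the mean and standard deviation from the plentiful correct runs, and then ask how surprising the average bias of the failing runs would be if both samples came from the same population. The sketch below uses a plain z-statistic for that purpose. It is an illustrative stand-in under that assumption: the slides do not show SOBER's actual scoring formula, and the function name `z_statistic` is invented here.

```python
import math
from statistics import mean, stdev


def z_statistic(s_p, s_f):
    """Standardized gap between the failing-run mean and the passing-run mean.

    Under the null hypothesis that both samples share one distribution, the
    mean of m failing observations is approximately normal with mean mu_p and
    standard deviation sigma_p / sqrt(m), so a large |z| signals divergence.
    """
    if len(s_p) < 2 or not s_f:
        raise ValueError("need at least two passing and one failing observation")
    mu_p = mean(s_p)
    sigma_p = stdev(s_p)  # sample standard deviation of the passing runs
    if sigma_p == 0:
        raise ValueError("passing runs show no variation; z is undefined")
    m = len(s_f)
    return (mean(s_f) - mu_p) / (sigma_p / math.sqrt(m))


# Hypothetical evaluation biases.
z_divergent = z_statistic([0.1, 0.2, 0.3], [0.8, 0.8, 0.8, 0.8])
z_similar = z_statistic([0.1, 0.2, 0.3], [0.2, 0.2])
```

Note that only the passing runs are used to estimate a spread, so the small number of failing runs enters through a single average rather than a full distribution estimate.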

11
SOBER's Approach
12
Algorithm Outputs

A ranked list of program predicates w.r.t. the bug relevance score s(P)
  Higher-ranked predicates are regarded as more relevant to the bug
What's the use?
  Top-ranked predicates suggest the possible buggy regions
  Several predicates may point to the same region
13
Cause Transition (CT)

Locating Causes of Program Failures, Cleve et al., published in ICSE'05, May 15, 2005
A variant of delta debugging [Z02]
Previous state-of-the-art performance holder on the Siemens suite
Cons: it relies on memory abnormality, hence its performance is restricted.

14
Statistical Debugging [Liblit05]

Scalable Statistical Bug Isolation, Liblit et al., published in PLDI'05, June 12, 2005
Main idea: rank predicates according to their correlation with program crashes

15
Statistical Debugging [Liblit05]

Context(P) = Pr(Crash | P observed)
Failure(P) = Pr(Crash | P observed as true)
The probability difference
  Increase(P) = Failure(P) - Context(P)
Limitation: ignores evaluation patterns of predicates within each execution
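
The three quantities can be estimated from per-run flags by simple counting. The sketch below assumes each run is summarized as a tuple (P observed, P observed as true at least once, run crashed); that record layout and the function name `liblit_scores` are assumptions for illustration. The per-run summary also shows the limitation noted above: how often P was true within a run is discarded.

```python
def liblit_scores(runs):
    """Estimate Context(P), Failure(P), and Increase(P) from per-run records.

    runs: list of (observed, observed_true, crashed) booleans, one per execution.
    """
    observed = [crashed for obs, _, crashed in runs if obs]
    observed_true = [crashed for _, true, crashed in runs if true]
    if not observed or not observed_true:
        raise ValueError("P must be observed, and observed as true, in some run")
    context = sum(observed) / len(observed)            # Pr(Crash | P observed)
    failure = sum(observed_true) / len(observed_true)  # Pr(Crash | P observed as true)
    return context, failure, failure - context


# Hypothetical test-suite outcome for one predicate P.
runs = (
    [(True, True, True)] * 3       # P true, run crashed
    + [(True, True, False)] * 1    # P true, run succeeded
    + [(True, False, True)] * 1    # P reached but never true, run crashed
    + [(True, False, False)] * 3   # P reached but never true, run succeeded
    + [(False, False, False)] * 2  # P never reached
)
context, failure, increase = liblit_scores(runs)
```

A positive Increase(P) means that P being true raises the crash probability beyond what merely reaching P already implies.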

16
Experiment Results

Localization quality metric
  Software bug benchmark
  Quantitative metric
Related works
  Cause Transition (CT) [CZ05]
  Statistical Debugging [LN05]
Performance comparisons

17
Bug Benchmark

Bug benchmark
  Dream benchmark
    A large number of known bugs in large-scale programs with adequate test suites
  Siemens Program Suite
    130 variants of 7 subject programs, each of 100-600 LOC
    130 known bugs in total
    Mainly logic (or semantic) bugs
Advantages
  Known bugs, thus judgments are objective
  Large number of bugs, thus the comparative study is statistically significant
Disadvantages
  Small-scale subject programs
State-of-the-art performance claimed so far in the literature
  Cause-transition approach [CZ05]

18
Localization Quality Metric [RR03]
19
1st Example
[Figure: graph with nodes numbered 1 to 10]
T-score = 70%
20
2nd Example
[Figure: graph with nodes numbered 1 to 10]
T-score = 20%
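
The figures for these two examples did not survive in the transcript, so the sketch below does not reproduce them. It illustrates one common reading of this kind of metric, under stated assumptions: the program is a dependence graph treated as undirected, the developer starts from the blamed nodes and examines the graph breadth-first, one whole layer at a time, and the score is the percentage of nodes examined by the time a faulty node is reached. The function `t_score` and the 10-node chain are hypothetical.

```python
def t_score(graph, blamed, faulty):
    """Percentage of nodes examined by layered BFS from `blamed` until a faulty node is hit.

    graph: dict mapping each node to the set of its neighbours (undirected).
    blamed, faulty: sets of nodes.
    """
    examined = set(blamed)
    frontier = set(blamed)
    while not (examined & faulty):
        next_layer = set()
        for node in frontier:
            next_layer.update(graph[node])
        next_layer -= examined
        if not next_layer:
            raise ValueError("no faulty node is reachable from the blamed nodes")
        examined |= next_layer  # the whole layer counts as examined
        frontier = next_layer
    return 100.0 * len(examined) / len(graph)


# Hypothetical graph: a chain 1 - 2 - ... - 10.
chain = {n: set() for n in range(1, 11)}
for n in range(1, 10):
    chain[n].add(n + 1)
    chain[n + 1].add(n)

score_far = t_score(chain, blamed={1}, faulty={3})  # two layers away from the fault
score_hit = t_score(chain, blamed={3}, faulty={3})  # the blamed node is the fault
```

A lower score is better under this reading: it means less of the program had to be examined before the fault was reached.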
21
Localized bugs w.r.t. Examined Code
22
Cumulative Effects w.r.t. Code Examination
23
Top-k Selection

Regardless of the specific selection of k, both Liblit05 and SOBER are better than CT, the current state-of-the-art holder
From k = 2 to 10, SOBER is consistently better than Liblit05

24
Conclusion and Discussion