Title: Building a Better Backtrace: Techniques for Postmortem Program Analysis
1. Building a Better Backtrace: Techniques for Postmortem Program Analysis
2. A Few Grim Realities
- Programs fail post-deployment
  - Ship with known bugs
  - Users discover new bugs
- Users are lousy testers
  - Never do the same thing twice
  - Wild variation in execution environment
  - Poor bug reporting, if any
- Users' bugs are the ones that really matter
3. Program Analysis for Pessimists
- Assume and prepare for postmortem analysis
- Compile-time analysis, stashed away for later
- Lightweight (deployable) instrumentation
- Analyze failed program instances
- Mix of automated / interactive tools
- Not quite static analysis, not quite dynamic
- Help humans find and fix bugs that matter
4. This Talk: Reconstructing Execution Chronologies
- Control flow decision history captures important properties
- Fundamental questions
  - How in the world did it get here?
  - What happened just before this point?
  - How can I make this happen again?
- Broader interest than just crashes
5. Striking a Compromise
- Heavyweight approaches
  - Replay debugging
  - Program tracing
- Lightweight approaches
  - Examine stack trace in debugger
  - printf() debugging
- Middleweight (our) approach
  - How might we have gotten here, given ?
9. The Big Idea: "Gotten Here" is Control Flow Reachability
- Interested in paths
  - How, not just yes/no
- Transitive paths within one function
- Multiple functions?
  - Matched call/return paths
  - This is a form of context-free language (CFL) reachability
13. Global Control Flow Graph
[Figure: global control flow graph; labels: entry, exit, call, return]
14. Variations in Matching Grammar
()()()(())
- Complete execution
- All calls and returns must be matched
15. Variations in Matching Grammar
()()()(())
- Aborted execution
- Some calls without returns
- We use a variant of this
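The difference between the two grammars can be checked with a single scan over a call/return trace. This is a minimal sketch, not taken from the talk: the function name `classify_trace` and the encoding of calls as `(` and returns as `)` are assumptions made for illustration. Calls still pending when the trace ends are the unmatched parens of an aborted execution.

```python
def classify_trace(events):
    """Classify a trace of '(' (call) and ')' (return) events.

    Returns (kind, pending), where kind is "complete", "aborted", or
    "invalid", and pending lists the positions of calls that never returned.
    The encoding is a hypothetical one chosen for this sketch.
    """
    pending = []  # positions of calls that have not returned yet
    for position, event in enumerate(events):
        if event == "(":
            pending.append(position)
        elif event == ")":
            if not pending:
                # A return with no matching call fits neither grammar.
                return "invalid", []
            pending.pop()
        else:
            raise ValueError(f"unexpected event {event!r}")
    # Complete execution: everything matched.
    # Aborted execution: some calls are left without returns.
    return ("complete" if not pending else "aborted"), pending


print(classify_trace("()()()(())"))  # ('complete', [])
print(classify_trace("()(()("))      # ('aborted', [2, 5])
```

In the aborted case, the pending positions read from left to right correspond to the active calls at the point where execution stopped, outermost first.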
16. CFL Reachability Algorithm
- Similar to transitive graph search
- Use a work list to incrementally extend frontier
- Search forward from a (the start) or backward from ? (the crash site)
- Transitively adding flow edges is one case
- Several additional cases for calls/returns
- Complexity
  - O(N^3) for arbitrary grammar and graph
  - O(E) for our analyses (and many others)
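As a rough illustration of the work-list idea, here is a naive sketch of matched call/return reachability. It is the generic formulation with no attempt at efficiency, not the O(E) variant mentioned above, and every name in it (`matched_reachability`, the `site` labels, the tuple layout of edges) is hypothetical. A pair (u, v) is derived when some path from u to v has all of its calls and returns matched.

```python
from collections import defaultdict, deque


def matched_reachability(nodes, flow, calls, returns):
    """Return all (u, v) joined by a path with fully matched calls/returns.

    flow:    (u, v) ordinary flow edges
    calls:   (u, site, v) edges, i.e. an open paren labeled with `site`
    returns: (u, site, v) edges, i.e. a close paren labeled with `site`
    """
    matched = set()
    succ = defaultdict(set)  # u -> {v : (u, v) is matched}
    pred = defaultdict(set)  # v -> {u : (u, v) is matched}
    work = deque()

    def add(u, v):
        if (u, v) not in matched:
            matched.add((u, v))
            succ[u].add(v)
            pred[v].add(u)
            work.append((u, v))

    calls_into = defaultdict(list)    # callee entry -> [(call node, site)]
    returns_from = defaultdict(list)  # callee exit  -> [(site, return node)]
    for u, site, v in calls:
        calls_into[v].append((u, site))
    for u, site, v in returns:
        returns_from[u].append((site, v))

    for n in nodes:       # the empty path is trivially matched
        add(n, n)
    for u, v in flow:     # a flow edge contains no parens at all
        add(u, v)

    while work:
        u, v = work.popleft()
        for w in list(succ[v]):   # matched(u, v) + matched(v, w)
            add(u, w)
        for t in list(pred[u]):   # matched(t, u) + matched(u, v)
            add(t, v)
        for call_node, site in calls_into[u]:
            for ret_site, ret_node in returns_from[v]:
                if site == ret_site:   # "(site", matched body, ")site"
                    add(call_node, ret_node)
    return matched
```

With two callers of the same function, this derives a matched path from each call node to its own return node but never to the other caller's return node, which is the property plain transitive closure would miss.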
17. Reconstruction With Crash Site Only
- Work backward from crash site
- Remember why each path is extended
- Record justifications in route map
- route(x, z) = {r1, ..., rn}
- ri: cross from x to y, then see route(y, z)
- x and y must be adjacent: one of four cases
- route(a, ?) defines possible chronologies
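A route map of this kind might be built as follows. This is a simplified sketch under stated assumptions: it treats every edge kind alike and ignores call/return matching, the target z is fixed to the crash site, and the names `build_route_map` and `chronologies` are invented for illustration. Each entry of `route[x]` records one justification: cross from x to y, then consult `route[y]`.

```python
from collections import defaultdict, deque


def build_route_map(edges, crash):
    """Search backward from `crash`, recording why each node can reach it.

    edges: (x, kind, y) triples; `kind` is a free-form label in this sketch.
    Returns route, where route[x] is a list of (kind, y) justifications.
    """
    preds = defaultdict(list)
    for x, kind, y in edges:
        preds[y].append((x, kind))

    route = defaultdict(list)
    seen = {crash}
    work = deque([crash])  # FIFO: nodes nearest the crash are explored first
    while work:
        y = work.popleft()
        for x, kind in preds[y]:
            route[x].append((kind, y))  # x reaches the crash by way of y
            if x not in seen:
                seen.add(x)
                work.append(x)
    return route


def chronologies(route, start, crash, limit=10):
    """List up to `limit` cycle-free paths from `start` to `crash`."""
    found = []

    def walk(x, path):
        if len(found) >= limit:
            return
        if x == crash:
            found.append(path)
            return
        for _kind, y in route.get(x, []):
            if y not in path:  # skip loops to keep the enumeration finite
                walk(y, path + [y])

    walk(start, [start])
    return found
```

Because the search starts at the crash site, nodes that cannot lead to the crash never enter the map, so following `route` from the start only ever walks along viable chronologies.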
20. Reconstruction With Crash Site Only
- One case, unmatched call, determines stack
- Unmatched parens ()()()(())
- Stack trace (
- But we probably have a specific stack trace in mind
23. Reconstruction With Crash Site and Stack Trace
- S: vector of call edges
- Build |S| + 1 clones of global flow graph
- Two types of call edge
  - (i must match )i
    - Stays on same layer
  - ci must be unmatched
    - Only way to next layer
    - Determined by S
[Figure: flow graph layers with call edges labeled c6, c3, c14]
24. Reconstruction With Crash Site and Stack Trace
- Possible histories
  - Start at a on top layer
  - End at ? on bottom layer
  - route(⟨a, 0⟩, ⟨?, |S|⟩)
- Backward, not forward
  - More deterministic
- Complexity
  - O(E) work, |S| + 1 times
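The layered search can be sketched as a backward walk over (node, layer) pairs. The sketch below makes a simplifying assumption that is not from the talk: matched call/return pairs are already summarized as ordinary same-layer edges from a call node to its return node, so only the unmatched calls in S need special handling. The function name and the representation of S as (call node, callee entry) pairs are hypothetical.

```python
from collections import defaultdict, deque


def layered_backward_search(same_layer_edges, stack, entry, crash):
    """Search backward from (crash, len(stack)) toward (entry, 0).

    same_layer_edges: (u, v) edges present on every layer (flow edges plus
                      call-to-return summaries for matched calls)
    stack:            unmatched calls, outermost first, as
                      (call_node, callee_entry) pairs
    Returns (route, reachable), where route maps each explored state to the
    states it can step to on the way to the crash.
    """
    preds = defaultdict(list)
    for u, v in same_layer_edges:
        preds[v].append(u)

    goal = (crash, len(stack))
    start = (entry, 0)
    route = defaultdict(list)
    seen = {goal}
    work = deque([goal])
    while work:
        node, layer = work.popleft()
        # Ordinary edges stay on the same layer.
        steps = [(u, layer) for u in preds[node]]
        # The unmatched call for this layer is the only way up a layer.
        if layer > 0:
            call_node, callee_entry = stack[layer - 1]
            if node == callee_entry:
                steps.append((call_node, layer - 1))
        for prev in steps:
            route[prev].append((node, layer))
            if prev not in seen:
                seen.add(prev)
                work.append(prev)
    return route, start in seen
```

Each layer is searched at most once, which is where the "O(E) work, |S| + 1 times" bound comes from in this simplified setting.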
25. Reconstruction With Crash Site and Event Trace
- V: vector of trace nodes
- Use |V| + 1 layered clones, as before
- Must report event when crossing trace node
- On each layer, knock out all trace nodes but one
- On bottommost layer, no trace nodes at all!
- Further restricts set of possible paths
- Complexity: O(E|V|)
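The knock-out rule can be sketched in the same layered style. This is an illustrative reading of the bullets above rather than the talk's implementation: layer k means "k events reported so far", entering a trace node is allowed only if it is the next observed event, and the names `feasible_states`, `trace_nodes`, and `observed` are assumptions. Call/return matching is left out to keep the sketch short.

```python
from collections import defaultdict, deque


def feasible_states(edges, trace_nodes, observed, entry, crash):
    """Return the (node, layer) states that can lead to the crash.

    edges:       (u, v) flow edges
    trace_nodes: set of nodes that report an event when crossed
    observed:    the recorded event trace V, oldest event first
    The entry node is assumed not to be a trace node.
    """
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)

    def valid(node, layer):
        # A trace node survives on a layer only if crossing it produced
        # the event that led onto that layer.
        if node not in trace_nodes:
            return True
        return layer > 0 and observed[layer - 1] == node

    goal = (crash, len(observed))
    if not valid(*goal):
        return set()
    seen = {goal}
    work = deque([goal])
    while work:
        node, layer = work.popleft()
        # Stepping backward across a trace node un-reports its event.
        prev_layer = layer - 1 if node in trace_nodes else layer
        for u in preds[node]:
            prev = (u, prev_layer)
            if prev not in seen and valid(*prev):
                seen.add(prev)
                work.append(prev)
    return seen
```

A crash is consistent with the trace when `(entry, 0)` appears in the result. On the last layer no trace node is valid to enter, so paths that would have reported an extra event are excluded.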
26. Reconstruction With:
- Stack trace and event trace
- Multiple event traces
- Ambiguous traces
- Incomplete event trace
- Recent-branch registers
- Program counter sampling
- Finite state machine of your choosing
27. Practical Considerations
- Dynamic dispatch / function pointers
  - Usual static techniques (points-to, receiver-class, etc.)
  - Event tracing can help
  - Note: stack trace is never dynamic
- Interactivity
  - Backward analysis is best: most bugs are close to crash
  - FIFO work list, demand-driven search
- Deterministic versus non-deterministic state machines
28. Areas For Future Exploration
- Sparsity of trace information
  - Identify state-preserving regions
  - Explore such regions only once
- Summarization / visualization
  - Basis: dominator tree walk-back
  - Opportunity for novel algorithms here
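One possible reading of a dominator walk-back is sketched below: compute dominators with the textbook iterative algorithm, then list the nodes every path to the crash must pass through, nearest first. The talk does not give an algorithm here, so this is only a guess at the basis it mentions. The names `dominators` and `walk_back` are hypothetical, and all nodes are assumed reachable from the entry.

```python
def dominators(succ, entry):
    """Iteratively compute the dominator set of every node.

    succ: dict mapping each node to a list of its successors.
    """
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    preds = {n: set() for n in nodes}
    for u, vs in succ.items():
        for v in vs:
            preds[v].add(u)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def walk_back(dom, crash):
    """List the crash site's dominators, starting at the crash itself."""
    # Along the dominator chain, each node has one more dominator than its
    # parent, so sorting by set size orders the chain.
    return sorted(dom[crash], key=lambda n: len(dom[n]), reverse=True)


graph = {"entry": ["a", "b"], "a": ["join"], "b": ["join"], "join": ["crash"]}
print(walk_back(dominators(graph, "entry"), "crash"))  # ['crash', 'join', 'entry']
```

Neither `a` nor `b` appears in the walk-back because either branch could have been taken, so a summary built this way shows only what is certain about the history.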
29. Areas For Future Exploration
- Adaptive Gap Reduction
- Programmer inquiries guide future annotation
- Which way did this branch really go?
- How many times did this loop really execute?
- Identification of key inflection points
- Insert lightweight event tracing nodes
- Related work in efficient path profiling
- More evidence for future reconstructions
30. Summary and Conclusions
- Program analysis in an imperfect world
  - Post-crash: unique challenges / leverage points
- CFL path recovery as basis for analysis
  - Efficient, demand-driven, adaptable
- Future work
  - Adaptive annotation to fill in gaps
  - Leveraging multiple runs
  - Data value modeling