Title: Building a Better Backtrace: Techniques for Postmortem Program Analysis
A Few Grim Realities
- Programs fail post-deployment
  - Ship with known bugs
  - Users discover new bugs
- Users are lousy testers
  - Never do the same thing twice
  - Wild variation in execution environment
  - Poor bug reporting, if any
- Users' bugs are the ones that really matter
Program Analysis for Pessimists
- Assume and prepare for postmortem analysis
  - Compile-time analysis, stashed away for later
  - Lightweight (deployable) instrumentation
- Analyze failed program instances
  - Mix of automated / interactive tools
  - Not quite static analysis, not quite dynamic
- Help humans find and fix the bugs that matter
This Talk: Reconstructing Execution Chronologies
- Control flow decision history captures important properties
- Fundamental questions
  - How in the world did I get here?
  - What happened just before this point?
  - How can I make this happen again?
- Broader interest than just crashes
This Talk: Reconstructing Execution Chronologies
- Heavyweight (academic) approaches
  - Replay debugging
  - Program tracing
- Lightweight (industrial) approaches
  - Examine stack trace in debugger
  - printf() debugging
- Middleweight (our) approach
  - How might we have gotten here, given ?
The Big Idea: "Gotten Here" Is Control Flow Reachability
- Interested in paths
  - How, not just yes/no
- Transitive paths within one function
- Multiple functions?
  - Matched call/return paths
  - This is a form of context-free language reachability
Global Control Flow Graph
- Split each function invocation site into
  - Call node
  - Return node
- Edge: call node → function entry node
- Edge: function exit node → return node
- No edge: call node → return node
[Figure: global control flow graph; each invocation site's call and return nodes linked to the callee's entry and exit nodes]
Enforcing Matched Call/Return
- Label call and return edges
  - ( for the first call edge, ) for its return
  - [ and ] for the next site's edges
  - { and } for the next
  - In practice, (ᵢ and )ᵢ for invocation site i
- Execution paths obey a context-free language of matched parentheses
[Figure: the same global control flow graph, with the call edge labeled ( and the return edge labeled )]
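A minimal Python sketch of the graph construction and labeling described above, assuming a hypothetical `Function` record that lists each function's entry node, exit node, intraprocedural edges, and call sites. Every invocation site gets a fresh index i; its call edge is labeled (ᵢ, its return edge )ᵢ, and no call → return edge is emitted.

```python
from dataclasses import dataclass, field

@dataclass
class Function:
    # Hypothetical record describing one function's slice of the global graph.
    entry: str
    exit: str
    flow: list                                  # intraprocedural edges (src, dst)
    calls: list = field(default_factory=list)   # (call node, return node, callee name)

def build_global_cfg(functions):
    """Return labeled edges (src, dst, label).

    label is None for intraprocedural flow, ("(", i) for the call edge of
    invocation site i, and (")", i) for the matching return edge.
    """
    edges = []
    site = 0
    for fn in functions.values():
        edges.extend((src, dst, None) for src, dst in fn.flow)
        for call_node, return_node, callee in fn.calls:
            site += 1                           # fresh index per invocation site
            target = functions[callee]
            edges.append((call_node, target.entry, ("(", site)))
            edges.append((target.exit, return_node, (")", site)))
            # Deliberately no call_node -> return_node edge.
    return edges
```

Two calls to the same callee get distinct label pairs, (₁/)₁ and (₂/)₂, so a path that enters through site 1 cannot legally leave through site 2's return edge.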
Variations in Matching Grammar
- Complete execution, e.g. ()()()(())
  - All calls and returns must be matched
- Aborted execution
  - Some calls without returns
  - We use a variant of this
- Arbitrary subinterval of execution
  - Prefix containing unmatched returns
  - Suffix containing unmatched calls
  - Used by context-sensitive points-to analyses
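The three grammar variants differ only in which unmatched parentheses they tolerate. A small Python recognizer makes this concrete; the token encoding ("(", i) / (")", i) and the function names are illustrative assumptions. In the aborted variant, the call sites left open at the end are exactly the unmatched calls, which is the link to the stack trace.

```python
def parse(text):
    """Turn "(1 )1 (2" into [("(", 1), (")", 1), ("(", 2)]."""
    return [(tok[0], int(tok[1:])) for tok in text.split()]

def unmatched_calls(tokens, mode):
    """Check `tokens` against one matching-grammar variant.

    mode is "complete", "aborted", or "subinterval". Returns the stack of
    call sites still open at the end (innermost last), or None if rejected.
    """
    pending = []                          # call sites awaiting their return
    for kind, site in tokens:
        if kind == "(":
            pending.append(site)
        elif pending:
            if pending[-1] != site:       # )i must close the most recent (i
                return None
            pending.pop()
        elif mode != "subinterval":       # unmatched return: legal only in the
            return None                   # prefix of an arbitrary subinterval
    if mode == "complete" and pending:    # complete runs leave nothing open
        return None
    return pending
```

For example, `unmatched_calls(parse("(1 (2 )2 (4"), "aborted")` returns `[1, 4]`: site 2 returned, while sites 1 and 4 are still on the stack.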
Implementation Notes
- Similar to transitive graph search
  - Use a work list to incrementally extend the frontier
  - Forward from the entry α or backward from the crash site ω
- Transitively adding flow edges is one case
  - Several additional cases for calls/returns
- Complexity
  - O(N³) for arbitrary grammar and graph
  - O(E) for our analyses (and many others)
Case 1: Transitive Flow
Case 2: Seeding a New Function
Case 3a: Bridge Discovery
Case 3b: Crossing a Known Bridge
Case 4: Unmatched Call
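The cases can be combined into a single FIFO work-list loop. The Python sketch below is our own reading of them, not the talk's implementation: it runs forward from the entry node under the aborted-execution grammar, tracks facts (e, n) meaning "a matched path runs from function entry e to node n", and treats a callee that can reach its own exit as a bridge. The input encoding (`flow`, `calls`, `exit_of`) and the mapping of comments to case numbers are assumptions.

```python
from collections import deque

def cfl_reach(alpha, flow, calls, exit_of):
    """Forward CFL reachability under the aborted-execution grammar.

    flow:    node -> list of intraprocedural successors
    calls:   call node -> (callee entry, return node); the (i / )i pairing is
             implicit because each waiting caller remembers its own return node
    exit_of: function entry -> function exit
    Returns the set of facts (e, n): a matched path runs from entry e to n.
    """
    facts = set()
    work = deque()
    waiting = {}      # callee entry -> [(caller entry, return node)]
    bridged = set()   # callee entries known to reach their own exit

    def add(e, n):
        if (e, n) not in facts:
            facts.add((e, n))
            work.append((e, n))

    add(alpha, alpha)
    while work:                                   # FIFO work list
        e, n = work.popleft()
        for m in flow.get(n, ()):                 # Case 1: transitive flow
            add(e, m)
        if n in calls:
            callee, ret = calls[n]
            waiting.setdefault(callee, []).append((e, ret))
            add(callee, callee)                   # seed callee; the call may stay
                                                  # unmatched (cf. Cases 2 and 4)
            if callee in bridged:                 # Case 3b: cross a known bridge
                add(e, ret)
        if exit_of.get(e) == n:                   # Case 3a: bridge discovered
            bridged.add(e)
            for caller_entry, ret in waiting.get(e, ()):
                add(caller_entry, ret)
    return facts
```

Nodes reachable by an aborted execution are `{n for _, n in facts}`; nodes reachable by a fully matched execution are the facts whose first component is `alpha`. Each fact is processed once, in line with the O(E) bound claimed for these analyses.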
Reconstruction with Crash Site Only
- Work backward from the crash site
- Remember why each edge is added
  - Record justifications in a route map
  - route(x, z) = {r₁, …, rₙ}
  - rᵢ: cross from x to y, then see route(y, z)
  - x and y must be adjacent under one of the four cases
- route(α, ω) defines the possible chronologies
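Recording justifications is easiest to see for Case 1 alone. The Python sketch below deliberately ignores calls and returns: it searches backward from the crash site over intraprocedural predecessor edges, records for each node x the successors y that justify it ("cross to y, then follow route(y)"), and then extracts one shortest chronology by walking that route map forward. All names are hypothetical; the full analysis would add justifications for the bridge and unmatched-call cases.

```python
from collections import deque

def route_map(preds, target):
    """Backward search from `target` over intraprocedural edges only (Case 1).

    preds maps a node to its predecessors. Returns route: x -> [y, ...],
    where each y justifies x: step x -> y, then follow route[y].
    """
    route = {target: []}
    work = deque([target])
    while work:
        y = work.popleft()
        for x in preds.get(y, ()):
            if x not in route:
                route[x] = []
                work.append(x)
            route[x].append(y)          # record why x can reach target
    return route

def one_chronology(route, start, target):
    """Return one shortest start -> target path allowed by the route map, or None."""
    if start not in route:
        return None
    parent = {start: None}
    work = deque([start])
    while work:
        x = work.popleft()
        if x == target:
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return path[::-1]
        for y in route[x]:
            if y not in parent:
                parent[y] = x
                work.append(y)
    return None
```

Because every node in the map is already known to reach the target, the forward walk never explores dead ends; the map as a whole encodes every possible chronology, and `one_chronology` simply picks one.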
- Case 4 (unmatched call) defines the stack trace
  - The calls left unmatched in a path's parenthesis string are its stack trace
- But we probably have a specific stack trace in mind
Reconstruction with Crash Site and Stack Trace
- S: vector of call edges
- Build |S| + 1 clones of the global flow graph
- Two types of call edge
  - (ᵢ can match )ᵢ
    - Stays on the same layer
  - cᵢ are unmatched
    - Only way to the next layer
    - Determined by S
[Figure: layered clones of the flow graph, linked by unmatched call edges c6, c3, c14]
- Possible histories
  - Start at entry α on the top layer
  - End at crash site ω on the bottom layer
  - route(⟨α, 0⟩, ⟨ω, |S|⟩)
- Backward, not forward
  - More deterministic
- Complexity
  - O(E) work, |S| + 1 times
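The layered clones need not be built explicitly: pairing each node with a layer index k gives the same graph. The Python sketch below checks whether some history fits a given stack trace. Within a layer, calls are crossed only through bridges (precomputed by a simple fixpoint); the only way down is the unmatched call whose site equals `stack[k]`. It searches forward for brevity, although the slide favors backward search, and it answers yes/no rather than returning the route. The input encoding and the parameter names `alpha` (entry node) and `omega` (crash-site node) are hypothetical.

```python
from collections import deque

def bridged_entries(flow, calls, exit_of):
    """Fixpoint: function entries whose exit is reachable by a matched path."""
    bridged, changed = set(), True
    while changed:
        changed = False
        for entry, exit_node in exit_of.items():
            if entry in bridged:
                continue
            seen, work = {entry}, deque([entry])
            while work:
                n = work.popleft()
                succs = list(flow.get(n, ()))
                if n in calls and calls[n][1] in bridged:
                    succs.append(calls[n][2])        # cross a known bridge
                for m in succs:
                    if m not in seen:
                        seen.add(m)
                        work.append(m)
            if exit_node in seen:
                bridged.add(entry)
                changed = True
    return bridged

def consistent_with_stack(alpha, omega, stack, flow, calls, exit_of):
    """Is <omega, len(stack)> reachable from <alpha, 0> in the layered graph?

    calls: call node -> (site i, callee entry, return node)
    stack: call sites from outermost to innermost
    """
    bridged = bridged_entries(flow, calls, exit_of)
    start, goal = (alpha, 0), (omega, len(stack))
    seen, work = {start}, deque([start])
    while work:
        n, k = work.popleft()
        if (n, k) == goal:
            return True
        nexts = [(m, k) for m in flow.get(n, ())]
        if n in calls:
            site, callee, ret = calls[n]
            if callee in bridged:
                nexts.append((ret, k))               # matched (i ... )i: same layer
            if k < len(stack) and site == stack[k]:
                nexts.append((callee, k + 1))        # unmatched c_i: next layer
        for state in nexts:
            if state not in seen:
                seen.add(state)
                work.append(state)
    return False
```

Each (node, layer) state is visited at most once, which matches the O(E) work repeated |S| + 1 times noted above.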
Reconstruction with Crash Site and Event Trace
- V: vector of trace nodes
- Use |V| + 1 layered clones, as before
- Must report an event when crossing a trace node
  - On each layer, knock out all trace nodes but one
  - On the bottommost layer, no trace nodes at all!
- Further restricts the set of possible paths
- Complexity: O(E·|V|)
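The event-trace layering can be sketched the same way, with layer k meaning "k events seen so far". On layer k every instrumented node except the next expected event is knocked out, crossing that one node drops to layer k + 1, and the bottom layer has no live trace nodes. To keep the Python sketch short it follows intraprocedural edges only and ignores the call/return cases; the names `instrumented`, `alpha`, and `omega` are hypothetical.

```python
from collections import deque

def consistent_with_events(alpha, omega, events, instrumented, flow):
    """Can some path alpha -> omega report exactly `events`, in order?

    State (n, k) means "at node n after k events". On layer k every
    instrumented node except events[k] is knocked out; crossing events[k]
    drops to layer k + 1. The bottom layer (k == len(events)) has no live
    trace nodes at all. Intraprocedural edges only, for brevity.
    """
    start, goal = (alpha, 0), (omega, len(events))
    seen, work = {start}, deque([start])
    while work:
        n, k = work.popleft()
        if (n, k) == goal:
            return True
        for m in flow.get(n, ()):
            if m not in instrumented:
                state = (m, k)                    # silent node: same layer
            elif k < len(events) and m == events[k]:
                state = (m, k + 1)                # the one live trace node
            else:
                continue                          # knocked-out trace node
            if state not in seen:
                seen.add(state)
                work.append(state)
    return False
```

Each of the E edges is examined at most once per layer, giving the O(E·|V|) bound. An empty trace is a strong constraint too: it rules out every path that crosses any instrumented node.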
Reconstruction with Whatever You've Got Handy
- Stack trace and event trace
- Multiple event traces
- Ambiguous traces
- Incomplete event trace
- Recent-branch registers
- Program counter sampling
- Finite state machine of your choosing
Practical Considerations
- Dynamic dispatch / function pointers
  - Usual static techniques (points-to, receiver-class, etc.)
  - Event tracing can help
  - Note: the stack trace is never dynamic
- Interactivity
  - Backward analysis is ideal: most bugs are close to the crash
  - FIFO work list, demand-driven search
  - Deterministic versus non-deterministic state machines
- Summarization / visualization
  - Dominator tree walk-back with progressive disclosure
Summary and Conclusions
- Program analysis in an imperfect world
  - Post-crash: unique challenges / leverage points
- CFL path recovery as basis for analysis
  - Efficient, demand-driven, adaptable
- Future work
  - Adaptive annotation to fill in gaps
  - Leveraging multiple runs
  - Data value modeling