Debugging Concurrent Software by Context-Bounded Analysis

1
Debugging Concurrent Software by Context-Bounded
Analysis
  • Shaz Qadeer
  • Microsoft Research
  • Joint work with
  • Jakob Rehof, Microsoft Research
  • Dinghao Wu, Princeton University

2
Concurrent software
[Figure: Threads 1 to 4 scheduled on Processors 1 and 2]
  • Operating systems, device drivers
  • Databases, web servers, browsers, GUIs, ...
  • Modern languages: C#, Java

3
Concurrency is increasingly important
  • Single-chip multiprocessors are an architectural
    inflexion point
  • Software running on these chips will be even more
    concurrent
  • Embedded systems
  • Airplanes, cars, PDAs, cellphones
  • Web services

4
Reliable concurrent software?
  • Correctness Problem
  • does program behave correctly for all inputs and
    all interleavings?
  • Bugs due to concurrency are insidious
  • nondeterministic, timing dependent
  • difficult to detect, reproduce, eliminate
  • coverage from testing very poor

5
Analysis of concurrent programs is difficult (1)
  • Finite-data single-procedure program
  • n lines
  • m states for global data variables
  • 1 thread
  • n × m states
  • K threads
  • n^K × m states
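
To make the growth concrete, here is a minimal sketch that evaluates the bound n^K × m for given sizes. The function name `state_bound` and the overflow convention (returning `UINT64_MAX`) are illustrative assumptions, not part of the talk.

```c
#include <stdint.h>

/* Upper bound on the number of states of a finite-data, single-procedure
 * program: each of k threads is at one of n lines and the globals take one
 * of m values, giving n^k * m. Returns UINT64_MAX if the bound overflows. */
uint64_t state_bound(uint64_t n, uint64_t m, unsigned k) {
    uint64_t total = m;
    for (unsigned i = 0; i < k; i++) {
        if (n != 0 && total > UINT64_MAX / n) return UINT64_MAX;
        total *= n;
    }
    return total;
}
```

For example, a 100-line program with 8 global states has a bound of 800 states with one thread and 8,000,000 with three.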

6
Analysis of concurrent programs is difficult (2)
  • Finite-data program with procedures
  • n lines
  • m states for global data variables
  • 1 thread
  • Infinite number of states
  • Can still decide assertions in O(n × m^3)
  • SLAM, ESP, BLAST implement this algorithm
  • K ≥ 2 threads
  • Undecidable! (Ramalingam 00)

7
Context-bounded verification of concurrent
software
[Figure: an execution divided into three contexts separated by two context switches]
Analyze all executions with a small number of
context switches!
8
Why context-bounded analysis?
  • Many subtle concurrency errors are manifested in
    executions with a small number of contexts
  • Context-bounded analysis can be performed
    efficiently

9
KISS: A static checker for concurrent software
  • An implementation of context-bounded analysis
  • Technique to use any sequential checker to
    perform context-bounded concurrency analysis
  • Has found a number of concurrency errors in NT
    device drivers

10
KISS: A static checker for concurrent software
[Diagram: Concurrent program P → KISS → Sequential program Q → Sequential Checker, which reports either "No error found" or an error]
Error in Q indicates error in P
11
KISS: A static checker for concurrent software
[Diagram: Concurrent program P → KISS → Sequential program Q → SDV, which reports either "No error found" or an error]
Error in Q indicates error in P
12
KISS: A static checker for concurrent software
[Diagram: Concurrent program P → KISS → Sequential program Q → PREfix, which reports either "No error found" or an error]
Error in Q indicates error in P
13
KISS: A static checker for concurrent software
[Diagram: Concurrent program P → KISS → Sequential program Q → ESP, which reports either "No error found" or an error]
Error in Q indicates error in P
14
Inside a static checker for sequential programs
    int x, y, z;
    void foo( ) {
        if (x > y) y = x;
        if (y > z) z = y;
        assert (x <= z);
    }
  • Symbolically analyze all paths
  • Check the assertion for each path
  • Interprocedural analysis
  • e.g., PREfix, ESP, SLAM, BLAST
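
As a rough illustration of "check the assertion for each path", the sketch below replaces symbolic analysis with brute-force enumeration over a small input range. It assumes the comparisons in foo read `x > y` and `y > z` and that the assertion is `x <= z`. The helper names are hypothetical, and real checkers such as those listed above reason symbolically rather than by enumeration.

```c
#include <stdbool.h>

/* Runs the body of foo on concrete inputs and reports whether the final
 * assertion x <= z holds. */
static bool foo_assertion_holds(int x, int y, int z) {
    if (x > y) y = x;
    if (y > z) z = y;
    return x <= z;
}

/* Tries every input in [lo, hi]^3. Any range with at least two values
 * exercises all four combinations of the two branches. Returns the number
 * of inputs on which the assertion fails. */
int count_violations(int lo, int hi) {
    int violations = 0;
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int z = lo; z <= hi; z++)
                if (!foo_assertion_holds(x, y, z)) violations++;
    return violations;
}
```

Under these assumptions every path leaves z at least as large as x, so `count_violations` returns 0 for any range.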

15
KISS strategy
  • Q encodes executions of P with small number of
    context switches
  • instrumentation introduces lots of extra paths to
    mimic context switches
  • Leverage all-path analysis of sequential checkers

16
    PnpStop( ) {
        int t;
        de->stopping = T;
        t = AtomicDecr(&de->count);
        if (t == 0) SetEvent(&de->stopEvent);
        WaitEvent(&de->stopEvent);
    }

    DispatchRoutine( ) {
        int t;
        if (!de->stopping) {
            AtomicIncr(&de->count);
            // do useful work
            // ...
            t = AtomicDecr(&de->count);
            if (t == 0) SetEvent(&de->stopEvent);
        }
    }
17
    DispatchRoutine( ) {
        int t;
        if (!de->stopping) {
            AtomicIncr(&de->count);
            // do useful work
            // ...
            t = AtomicDecr(&de->count);
            if (t == 0) SetEvent(&de->stopEvent);
        }
    }

    PnpStop( ) {
        int t;
        if (*) return;
        de->stopping = T;
        if (*) return;
        t = AtomicDecr(&de->count);
        if (*) return;
        if (t == 0) SetEvent(&de->stopEvent);
        if (*) return;
        WaitEvent(&de->stopEvent);
    }

18
    PnpStop( ) {
        int t;
        if (*) return;
        de->stopping = T;
        if (*) return;
        t = AtomicDecr(&de->count);
        if (*) return;
        if (t == 0) SetEvent(&de->stopEvent);
        if (*) return;
        WaitEvent(&de->stopEvent);
    }

19
    PnpStop( ) {
        int t;
        if (*) return;
        de->stopping = T;
        if (*) return;
        t = AtomicDecr(&de->count);
        if (*) return;
        if (t == 0) SetEvent(&de->stopEvent);
        if (*) return;
        WaitEvent(&de->stopEvent);
    }

    main( ) { DispatchRoutine( ); }
20
    PnpStop( ) {
        int t;
        CODE
        de->stopping = T;
        CODE
        t = AtomicDecr(&de->count);
        CODE
        if (t == 0) SetEvent(&de->stopEvent);
        CODE
        WaitEvent(&de->stopEvent);
        CODE
    }

    main( ) { PnpStop( ); }
21
KISS features
  • KISS trades off soundness for scalability
  • Cost of analyzing a concurrent program P = cost
    of analyzing a sequential program Q
  • Size of Q asymptotically same as size of P
  • Unsoundness is precisely quantifiable
  • for 2-thread program, explores all executions
    with up to two context switches
  • for n-thread program, explores up to 2n-2 context
    switches
  • Allows any sequential checker to analyze
    concurrency

23
Experimental Evaluation of KISS
24
Driver Stopping Error in Bluetooth Driver (1 KLOC)

    DispatchRoutine() {
        int t;
        if (!de->stopping) {
            AtomicIncr(&de->count);
            assert !driverStopped;
            // do useful work
            // ...
            t = AtomicDecr(&de->count);
            if (t == 0) SetEvent(&de->stopEvent);
        }
    }

    PnpStop() {
        int t;
        de->stopping = T;
        t = AtomicDecr(&de->count);
        if (t == 0) SetEvent(&de->stopEvent);
        WaitEvent(&de->stopEvent);
        driverStopped = T;
    }
25
    int t;
    if (!de->stopping)
Assertion fails!
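
The failing interleaving can be reproduced with a small sequential simulation in the spirit of KISS: run DispatchRoutine and, at one chosen preemption point, run PnpStop as an ordinary function call. This is a simplified sketch, not the actual KISS instrumentation. The struct layout, the function names, and the initial count of 1 are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the device extension; count is assumed to start at 1. */
struct DeviceExt { bool stopping; int count; bool stopEvent; };

static struct DeviceExt de;
static bool driverStopped;

static void reset_driver(void) {
    de.stopping = false;
    de.count = 1;
    de.stopEvent = false;
    driverStopped = false;
}

/* PnpStop run without interruption. If the event is not yet set, the thread
 * would block in WaitEvent, so it stops before setting driverStopped. */
static void pnp_stop(void) {
    de.stopping = true;
    if (--de.count == 0) de.stopEvent = true;
    if (!de.stopEvent) return;
    driverStopped = true;
}

/* DispatchRoutine with PnpStop scheduled at one preemption point:
 * 0 = before the stopping check, 1 = after the check, 2 = after the
 * increment. Returns true if "assert !driverStopped" would fail. */
bool dispatch_assertion_fails(int preempt_at) {
    bool failed = false;
    reset_driver();
    if (preempt_at == 0) pnp_stop();
    if (!de.stopping) {
        if (preempt_at == 1) pnp_stop();
        de.count++;                      /* AtomicIncr */
        if (preempt_at == 2) pnp_stop();
        if (driverStopped) failed = true;
        /* do useful work */
        if (--de.count == 0) de.stopEvent = true;
    }
    return failed;
}
```

In this model only preemption point 1 triggers the failure: the check of `stopping` passes, PnpStop then runs to completion, and the dispatch routine resumes in a stopped driver. That execution uses two context switches.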
26
IRP Cancellation Error in Packet Driver (2.5 KLOC)

    DispatchRoutine(IRP *irp) {
        irp->CancelRoutine = PacketCancelRoutine;
        Enqueue(irp);
        IoMarkIrpPending(irp);
    }

    IoCancelIrp(IRP *irp) {
        IoAcquireCancelSpinLock();
        if (irp->CancelRoutine) {
            (irp->CancelRoutine)(irp);
        }
    }

    PacketCancelRoutine(IRP *irp) {
        Dequeue(irp);
        IoCompleteRequest(irp);
        IoReleaseCancelSpinLock();
    }
27
    irp->CancelRoutine = PacketCancelRoutine;
    Enqueue(irp);
Error: An irp should not be marked pending after
it has been completed!
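
The same style of sequential simulation illustrates this error. The sketch below models an IRP with a few flags and runs the cancellation path at one chosen point inside the dispatch routine. The `Irp` struct, its fields, and the function names are hypothetical stand-ins rather than the Windows driver API, and the cancel spin lock is omitted because the simulation is single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>

struct Irp;
typedef void (*CancelFn)(struct Irp *);

/* Hypothetical, heavily simplified model of an IRP. */
struct Irp {
    CancelFn cancelRoutine;
    bool queued;
    bool completed;
    bool markedPendingAfterCompletion;
};

static void packet_cancel_routine(struct Irp *irp) {
    irp->queued = false;     /* Dequeue(irp) */
    irp->completed = true;   /* IoCompleteRequest(irp) */
}

static void io_cancel_irp(struct Irp *irp) {
    if (irp->cancelRoutine) irp->cancelRoutine(irp);
}

static void io_mark_irp_pending(struct Irp *irp) {
    if (irp->completed) irp->markedPendingAfterCompletion = true;
}

/* Dispatch routine with cancellation scheduled at one preemption point:
 * 0 = before the cancel routine is set, 1 = after it is set,
 * 2 = after Enqueue. Returns true if the rule is violated. */
bool marked_pending_after_completion(int preempt_at) {
    struct Irp irp = { NULL, false, false, false };
    if (preempt_at == 0) io_cancel_irp(&irp);
    irp.cancelRoutine = packet_cancel_routine;
    if (preempt_at == 1) io_cancel_irp(&irp);
    irp.queued = true;       /* Enqueue(irp) */
    if (preempt_at == 2) io_cancel_irp(&irp);
    io_mark_irp_pending(&irp);
    return irp.markedPendingAfterCompletion;
}
```

In this model, cancelling at any point after the cancel routine has been installed completes the IRP before it is marked pending.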
28
Data-race Conditions in DDK Sample Drivers
  • Device extension shared among threads
  • Data-races on device extension fields
  • 18 sample DDK drivers
  • Range: 0.5-9.2 KLOC
  • Total: 70 KLOC
  • Each field checked separately with resource limit
    of 20 minutes and 800MB
  • Two threads, each calling a nondeterministically
    chosen dispatch routine

29
Total: 30 races
30
DevicePnpState Field in Toaster/toastmon

    ToastMon_DispatchPnp(DEVICE_OBJECT *obj, IRP *irp) {
        IoAcquireRemoveLock();
        case IRP_MN_QUERY_STOP_DEVICE:
            // Race: write access
            deviceExt->DevicePnPState = StopPending;
            break;
        IoReleaseRemoveLock();
    }

    ToastMon_DispatchPower(DEVICE_OBJECT *obj, IRP *irp) {
        // Race: read access
        if (deviceExt->DevicePnpState == Deleted)
    }

31
Acknowledgments
  • Tom Ball
  • Byron Cook
  • John Henry
  • Doron Holan
  • Vladimir Levin
  • Jakob Lichtenberg
  • Adrian Oney
  • Sriram Rajamani
  • Peter Wieland

32
Keep It Simple and Sequential
  • Context-bounded analysis by leveraging existing
    sequential checkers
  • Validates the hypothesis that many concurrency
    errors require few context switches to show up

33
However
  • Hard limit on number of explored contexts
  • e.g., two context switches for concurrent program
    with two threads
  • Case study: Concurrent transaction management
    code written in C (Naik-Rehof 04)
  • Analyzed by the Zing model checker after
    automatically translating to the Zing input
    language
  • Found three bugs each requiring between three and
    four context switches

34
Is a tuning knob possible?
Given a concurrent boolean program P and a
positive integer c, does P go wrong by failing an
assertion via an execution with at most c contexts?
Decidable
Given a concurrent boolean program P with
unbounded fork-join parallelism and a positive
integer c, does P go wrong by failing an assertion
via an execution with at most c contexts?
Decidable
35
[Figure: an execution divided into three contexts separated by two context switches]
  • Problem
  • Unbounded computation possible within each
    context!
  • Unbounded execution depth and reachable state
    space
  • Different from bounded-depth model checking

36
Sequential boolean program
Global store g, valuation to global variables
Local store l, valuation to local variables
Stack s, sequence of local stores
State (g, s)
37
Example

    bool a = F;
    void main( ) {
    L1: a = T;
    L2: flip(a);
    L3: }
    void flip(bool x) {
    L4: a = !x;
    L5: }

States are written (a, ⟨x, pc⟩ ...), with the top of the stack on the right:

    (F, ⟨_, L1⟩)
    (T, ⟨_, L2⟩)
    (T, ⟨_, L3⟩ ⟨T, L4⟩)
    (F, ⟨_, L3⟩ ⟨T, L5⟩)
    (F, ⟨_, L3⟩)
    (F, ε)
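
The trace can be replayed with a small interpreter that keeps the global `a` and an explicit stack of frames. This is a sketch under the assumption that the example program reads `a = T` at L1, `flip(a)` at L2, and `a = !x` at L4. The type and function names are illustrative only.

```c
#include <stdbool.h>

enum Pc { L1, L2, L3, L4, L5 };

struct Frame { bool x; enum Pc pc; };      /* x is unused in main's frame */
struct State { bool a; struct Frame stack[2]; int depth; };

/* Executes one transition; returns false once the stack is empty. */
static bool step(struct State *st) {
    if (st->depth == 0) return false;
    struct Frame *top = &st->stack[st->depth - 1];
    switch (top->pc) {
    case L1:                               /* a = T */
        st->a = true;
        top->pc = L2;
        break;
    case L2:                               /* flip(a): push a frame for flip */
        top->pc = L3;
        st->stack[st->depth].x = st->a;
        st->stack[st->depth].pc = L4;
        st->depth++;
        break;
    case L4:                               /* a = !x */
        st->a = !top->x;
        top->pc = L5;
        break;
    case L3:                               /* return from main */
    case L5:                               /* return from flip */
        st->depth--;
        break;
    }
    return true;
}

/* Starts from (F, <_, L1>) and runs to the empty stack; returns the
 * number of transitions taken. */
int run_example(struct State *st) {
    st->a = false;
    st->depth = 1;
    st->stack[0].x = false;
    st->stack[0].pc = L1;
    int steps = 0;
    while (step(st)) steps++;
    return steps;
}
```

Running it takes five transitions and ends with `a` false and an empty stack, matching the six states listed in the trace.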
38
Sequential boolean program
Global store g, valuation to global variables
Local store l, valuation to local variables
Stack s, sequence of local stores
State (g, s)
39
Reachability problem for sequential boolean
program
Given (g, s), is there s' such that
(g, s) →* (error, s')?
40
Aggregate state
Set of stacks ss
Aggregate state (g, ss) = { (g, s) | s ∈ ss }
Reach(g, ss) = { (g', s') | exists s ∈ ss such that (g, s) →* (g', s') }
41
Aggregate transition relation
  • Observations
  • There is a unique smallest partition of Reach(g,
    ss) into aggregate states (g1, ss1) ∪ … ∪ (gn, ssn)
  • The number of elements in the partition is
    bounded by the number of global stores

(g, ss) ⇒ (g1, ss1)  . . .  (g, ss) ⇒ (gn, ssn)
42
Theorem (Büchi, Schwoon 00)
  • If ss is regular and (g, ss) ⇒ (g', ss'), then
    ss' is regular.
  • If ss is given as a finite automaton A, then a
    finite automaton A' for ss' can be constructed
    from A in polynomial time.

43
Algorithm
Problem: Given (g, s), is there s' such that
(g, s) →* (error, s')?
Solution: Compute an automaton for ss' such that
(g, {s}) ⇒ (error, ss') and check if ss' is nonempty.
44
Concurrent boolean program
Global store g, valuation to global variables
Local store l, valuation to local variables
Stack s, sequence of local stores
State (g, s1, s2)
45
Reachability problem for concurrent boolean
program
Given (g, s1, s2), are there s1' and s2' such
that (g, s1, s2) reaches (error, s1', s2') via an
execution with at most c contexts?
46
Aggregate transition relation
(g, ss1, ss2) = { (g, s1, s2) | s1 ∈ ss1, s2 ∈ ss2 }
47
Algorithm: 2 threads, c contexts
Compute the set of reachable aggregate states.
Report an error if (g, ss1, ss2) is reachable
and g = error, ss1 is nonempty, and ss2 is nonempty.
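
The idea of bounding contexts can be seen in a much simpler setting than the one in the talk. The sketch below is an explicit-state search over two finite-state threads without procedures, so it needs no stack automata and is not the aggregate-state algorithm itself. Each thread performs a non-atomic increment of a shared counter, and the "error" is a lost update. All names are illustrative assumptions.

```c
#include <stdbool.h>

/* Shared counter g plus, per thread, a program counter and a private temp. */
struct State { int g; int pc[2]; int tmp[2]; };

/* One step of thread t: pc 0 reads g, pc 1 writes tmp + 1, pc 2 is done.
 * Returns false if the thread has no step left. */
static bool step(struct State *s, int t) {
    if (s->pc[t] == 0) { s->tmp[t] = s->g; s->pc[t] = 1; return true; }
    if (s->pc[t] == 1) { s->g = s->tmp[t] + 1; s->pc[t] = 2; return true; }
    return false;
}

/* Error: both increments finished but one update was lost. */
static bool is_error(const struct State *s) {
    return s->pc[0] == 2 && s->pc[1] == 2 && s->g != 2;
}

/* Depth-first search. Scheduling a thread other than `running` opens a new
 * context and consumes one unit of the remaining budget. */
static bool search(struct State s, int running, int contexts_left) {
    if (is_error(&s)) return true;
    for (int t = 0; t < 2; t++) {
        int cost = (t == running) ? 0 : 1;
        if (cost > contexts_left) continue;
        struct State next = s;
        if (!step(&next, t)) continue;
        if (search(next, t, contexts_left - cost)) return true;
    }
    return false;
}

/* Is the error reachable via an execution with at most c contexts? */
bool error_within(int c) {
    struct State init = { 0, { 0, 0 }, { 0, 0 } };
    return search(init, -1, c);
}
```

In this toy program the lost update is unreachable with one or two contexts and becomes reachable with three: one thread reads, the other reads and writes, and the first then writes its stale value.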
48
Complexity: 2 threads, c contexts
[Figure: tree of aggregate states rooted at (g, s1, s2); each edge is labeled with the thread (1 or 2) that runs in that context]
Depth of tree = context bound c
Branching factor bounded by G × 2 (G = number of global stores)
Number of edges bounded by (G × 2)^(c+1)
Each edge computable in polynomial time
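
A quick way to get a feel for this bound is to evaluate it. The sketch below assumes the bound reads (G × 2)^(c+1), where G is the number of global stores and c is the context bound. The function name and the saturating overflow behaviour are illustrative choices.

```c
#include <stdint.h>

/* Evaluates (2 * G)^(c + 1), the assumed bound on the number of edges in
 * the tree of aggregate states. Saturates at UINT64_MAX on overflow. */
uint64_t edge_bound(uint64_t num_global_stores, unsigned c) {
    if (num_global_stores > UINT64_MAX / 2) return UINT64_MAX;
    uint64_t branching = 2 * num_global_stores;
    uint64_t bound = 1;
    for (unsigned i = 0; i <= c; i++) {
        if (branching != 0 && bound > UINT64_MAX / branching) return UINT64_MAX;
        bound *= branching;
    }
    return bound;
}
```

With 4 global stores and a context bound of 2, the bound is 8^3 = 512 edges. It grows exponentially in c but only polynomially in G for a fixed c.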
49
Context-bounded analysis of concurrent software
  • Many subtle concurrency errors are manifested in
    executions with few context switches
  • Experience with KISS on Windows drivers
  • Experience with Zing on transaction manager
  • Algorithms for context-bounded analysis are more
    efficient than those for unbounded analysis
  • Reducibility to sequential checking with KISS
  • Decidability of assertion checking for concurrent
    boolean programs

50
Applications of context-bounded analysis
  • Coverage metric for testing concurrent software
  • Analysis of computer protocols
  • networking
  • cache-coherence

51
Unbounded fork-join parallelism
  • Fork operation: x = fork
  • Join operation: join(x)
  • Copy thread identifier from one variable to
    another

52
Algorithm: unbounded fork-join parallelism, c
contexts
  • At most c threads may perform a transition
  • Reduce to previously solved problem with c
    threads and c contexts
  • Nondeterministically pick c forked threads for
    execution

53
start : {1, …, c} → boolean, initialized to λi. (i == 1)
end : {1, …, c} → boolean, initialized to λi. false
  • c statically created threads
  • thread i starts execution when start[i] is true
  • thread i sets end[i] to true on termination
count ∈ {1, …, c}, initialized to 1
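
One possible reading of this encoding is sketched below: fork hands out the next of the c static thread slots when a nondeterministic choice selects the forked thread, and join is enabled once that thread has set its end flag. The `pick` parameter, the use of id 0 for a forked thread that was not selected, the role of `count` as the number of slots handed out, and all function names are assumptions for illustration. The slide does not spell out these details.

```c
#include <stdbool.h>

#define C 3  /* context bound, and the number of statically created threads */

static bool start_flag[C + 1];  /* 1-based: thread i may run once start_flag[i] is true */
static bool end_flag[C + 1];    /* 1-based: thread i has terminated */
static int count;               /* assumed: number of thread slots handed out so far */

void init_threads(void) {
    for (int i = 1; i <= C; i++) {
        start_flag[i] = (i == 1);
        end_flag[i] = false;
    }
    count = 1;
}

/* x = fork. `pick` stands in for the nondeterministic choice of whether this
 * forked thread is one of the c threads that get to execute. Returns the
 * thread id, or 0 if the thread is not selected or no slot is left. */
int fork_thread(bool pick) {
    if (!pick || count == C) return 0;
    count++;
    start_flag[count] = true;
    return count;
}

/* Called by thread i on termination. */
void terminate_thread(int i) { end_flag[i] = true; }

bool is_started(int i) { return start_flag[i]; }

/* join(x) can proceed only once thread x has terminated; a thread that was
 * never selected (id 0) never terminates in this model. */
bool join_enabled(int x) { return x != 0 && end_flag[x]; }
```

The design keeps the number of threads fixed at c, which is what allows the unbounded fork-join problem to be reduced to the previously solved problem with c threads and c contexts.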