Title: A Case for an Interleaving Constrained Shared-Memory Multi-Processor
1. A Case for an Interleaving Constrained Shared-Memory Multi-Processor
- Jie Yu and Satish Narayanasamy
- University of Michigan
2. Why is Parallel Programming Hard?
- Is single-threaded programming relatively easy?
  - Verification is NP-hard
  - BUT, properties such as a function's pre/post-conditions and loop invariants are verifiable in polynomial time
- Parallel programming is harder
  - Verifying properties for even small code regions is NP-hard
  - Reason: unbounded number of legal thread interleavings exposed to the parallel runtime
  - Impractical to test/verify properties for all legal interleavings
3. Too much freedom given to parallel runtime?
[Figure labels: Legal Thread Interleavings; Tested Correct Interleavings; Incorrect interleavings found during testing]
4. Solution: Limit Freedom
- Programmer tests as many legal interleavings as practically possible
- Interleaving constraints from correct test runs are encoded in the program binary
- Runtime system avoids untested interleavings, i.e., avoids corner cases
5. Result of Constraining Interleavings
- A majority of the concurrency bugs are avoidable
  - Data races, atomicity violations, and also order violations
- Performance overhead is low
  - Untested interleavings in well-tested programs are likely to manifest rarely
  - Processor support helps reduce the cost of enforcing interleaving constraints
6. Challenges
- How to encode tested interleavings in a program's binary?
  - Predecessor Set (PSet) interleaving constraints
- How to efficiently enforce interleaving constraints at runtime?
  - Detect violations of PSet constraints using processor support
  - Avoid violations by stalling or using rollback-and-re-execution support
7. Outline
- Overview
- Encoding and enforcing tested interleavings
  - Predecessor Set (PSet) interleaving constraints
  - Processor support
- Results
- Conclusion
8. Encoding Tested Interleavings
- Interleaving constraints from test runs
  - Too specific to a test input → performance loss for a different input
  - Too generic → might allow untested interleavings
- Predecessor Set (PSet)
  - PSet(m) is defined for each static memory operation m
  - pred ∈ PSet(m) if m is immediately and remotely memory dependent on pred in at least one tested execution
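The PSet definition above can be sketched as a small trace-processing routine. This is only an illustration under stated assumptions, not the authors' tool: "immediately dependent" is read here as "the last write to the same location, plus, for a write, the reads since that write", and "remotely" as "issued by a different thread". The function name `add_test_run`, the trace tuple layout, and the thread names are hypothetical. The two sample runs mirror the `log_status` accesses (R1, W1, W2) used in the MySQL case study later in the deck.

```python
def add_test_run(psets, trace):
    """Fold one tested execution into psets (op id -> set of remote predecessors).

    trace is a list of (thread, op, kind, addr) tuples in observed global order,
    where kind is "R" or "W". The dependence model is an assumption (see lead-in).
    """
    last_write = {}    # addr -> (thread, op) of the most recent write
    readers = {}       # addr -> [(thread, op), ...] reads since that write
    for thread, op, kind, addr in trace:
        psets.setdefault(op, set())          # ops with no remote predecessor keep {}
        preds = []
        if addr in last_write:
            preds.append(last_write[addr])
        if kind == "W":
            preds.extend(readers.get(addr, []))
        for pred_thread, pred_op in preds:
            if pred_thread != thread:        # only remote dependences are recorded
                psets[op].add(pred_op)
        if kind == "W":
            last_write[addr] = (thread, op)
            readers[addr] = []
        else:
            readers.setdefault(addr, []).append((thread, op))
    return psets


# Two tested runs over one shared variable (thread names are made up).
run_a = [("T_insert", "R1", "R", "log_status"),
         ("T_log", "W1", "W", "log_status"),
         ("T_log", "W2", "W", "log_status")]
run_b = [("T_log", "W1", "W", "log_status"),
         ("T_log", "W2", "W", "log_status"),
         ("T_insert", "R1", "R", "log_status")]

psets = {}
add_test_run(psets, run_a)
add_test_run(psets, run_b)
for op in sorted(psets):
    print(op, sorted(psets[op]))
```

PSets accumulate across runs by set union, so testing more interleavings can only relax the constraints, never tighten them.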
9. A Test Run
[Figure: a test run across Thread 1, Thread 2, and Thread 3 over the memory operations W1, W2, W3, R1, R2, R3, and R4]
PSet(W1) = {}, PSet(R1) = {}, PSet(R2) = {W1}, PSet(R3) = {W1}, PSet(R4) = {}, PSet(W2) = {R3, R4}, PSet(W3) = {W2}
10. Enforcing Tested Interleavings
- Processor support for detecting and avoiding PSet constraint violations
- Detecting PSet constraint violations
  - For each memory location, track its last accessor
    - Cache extension
  - Detect PSet constraint violation
    - Piggyback cache coherence reply with last accessor
    - Processor executes PSet membership test by executing additional micro-ops
- Overcoming a PSet constraint violation
  - Stall
  - Re-execute using checkpoint-and-rollback support
    - E.g., SafetyNet, ReVive, etc.
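The detection step can be modeled in software to make the membership test concrete. The sketch below is a simplified model, not the hardware design: the class `PSetChecker` and its methods are hypothetical, the per-location metadata (last writer plus readers since that write) is an assumption standing in for the "last accessor" the slide mentions, and the PSets are written out by hand for a three-operation example.

```python
class PSetChecker:
    """Software model of per-location accessor tracking plus a PSet membership test."""

    def __init__(self, psets):
        self.psets = psets      # op id -> set of allowed remote predecessors
        self.last_write = {}    # addr -> (thread, op)
        self.readers = {}       # addr -> [(thread, op), ...] since the last write

    def violating_preds(self, thread, op, kind, addr):
        """Remote predecessors of this access that are missing from PSet(op)."""
        preds = []
        if addr in self.last_write:
            preds.append(self.last_write[addr])
        if kind == "W":
            preds.extend(self.readers.get(addr, []))
        allowed = self.psets.get(op, set())
        return {p for t, p in preds if t != thread and p not in allowed}

    def record(self, thread, op, kind, addr):
        """Update the metadata once the access is allowed to complete."""
        if kind == "W":
            self.last_write[addr] = (thread, op)
            self.readers[addr] = []
        else:
            self.readers.setdefault(addr, []).append((thread, op))


# Hand-written PSets for one shared variable x (illustrative values).
checker = PSetChecker({"W1": {"R1"}, "W2": set(), "R1": {"W2"}})

first = checker.violating_preds("T_a", "W1", "W", "x")    # nothing accessed x yet
checker.record("T_a", "W1", "W", "x")
second = checker.violating_preds("T_b", "R1", "R", "x")   # R1 would read from W1
print(first, second)
```

A non-empty result is the point at which the processor would stall the access or fall back to rollback and re-execution.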
11. Two Case Studies
- Case Study 1
  - An atomicity violation bug in MySQL
  - Avoided using stall
- Case Study 2
  - An order violation bug in Mozilla
  - Neither a data race nor an atomicity violation
  - Avoided using rollback and re-execution
13. An Atomicity Violation Bug in MySQL
Thread 1 and Thread 2 both access log_status:

  sql/sql_insert.cc
    mysql_insert() {
      if (log_status != LOG_CLOSED)    // R1
        // write into a log file
    }

  sql/log.cc
    MYSQL_LOG::new_file() {
      close();    // W1: log_status = LOG_CLOSED
      open();     // W2: log_status = LOG_OPEN
    }
14. Correct Interleaving 1 (frequent, therefore likely to be tested)
Order of accesses across Thread 1 and Thread 2:
  1. R1: log_status != LOG_CLOSED ?
  2. W1: log_status = LOG_CLOSED
  3. W2: log_status = LOG_OPEN
PSet(W1) = {R1}, PSet(W2) = {}, PSet(R1) = {}
15. Correct Interleaving 2 (frequent, therefore likely to be tested)
Order of accesses across Thread 1 and Thread 2:
  1. W1: log_status = LOG_CLOSED
  2. W2: log_status = LOG_OPEN
  3. R1: log_status != LOG_CLOSED ?
PSet(W1) = {R1}, PSet(W2) = {}, PSet(R1) = {W2}
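16. Incorrect Interleaving (rare, and therefore likely to be untested)
Order of accesses across Thread 1 and Thread 2:
  1. W1: log_status = LOG_CLOSED
  2. R1: log_status != LOG_CLOSED ?   Constraint violation: W1 is not in PSet(R1) = {W2}
  3. W2: log_status = LOG_OPEN
  4. R1: log_status != LOG_CLOSED ?   Allowed now: W2 is in PSet(R1)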
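The stall-based recovery for this bug can be played through with a toy scheduler. This is a sketch under assumptions, not the processor mechanism: `remote_preds` and `run_with_stall` are hypothetical helpers, the thread names are made up, the dependence model (last write, plus reads since it for a write) is an assumed reading of "immediately dependent", and the PSets are the ones from the two correct interleavings of `log_status`.

```python
def remote_preds(log, thread, kind, addr):
    """Ops from other threads that a new access to addr would immediately depend on."""
    last_write, reads = None, []
    for t, op, k, a in log:
        if a != addr:
            continue
        if k == "W":
            last_write, reads = (t, op), []
        else:
            reads.append((t, op))
    preds = [last_write] if last_write else []
    if kind == "W":
        preds += reads
    return {op for t, op in preds if t != thread}


def run_with_stall(psets, programs, attempted):
    """Replay the attempted thread schedule, stalling any access that violates its PSet."""
    pending = {t: list(ops) for t, ops in programs.items()}
    wanted = list(attempted)        # threads in the order the hardware tries them
    log = []
    while wanted:
        for i, thread in enumerate(wanted):
            op, kind, addr = pending[thread][0]
            if remote_preds(log, thread, kind, addr) <= psets[op]:
                pending[thread].pop(0)
                log.append((thread, op, kind, addr))
                del wanted[i]
                break
            # otherwise this thread stalls and the next scheduled thread is tried
        else:
            raise RuntimeError("all threads stalled; rollback would be needed")
    return [op for _, op, _, _ in log]


psets = {"W1": {"R1"}, "W2": set(), "R1": {"W2"}}
programs = {"T_log": [("W1", "W", "log_status"), ("W2", "W", "log_status")],
            "T_insert": [("R1", "R", "log_status")]}

# The buggy schedule tries W1, then R1, then W2.
executed = run_with_stall(psets, programs, ["T_log", "T_insert", "T_log"])
print(executed)
```

In this model R1 is held back until W2 has executed, so the run falls back to the tested order W1, W2, R1.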
18. Correct Test Run
Thread 1 and Thread 2 run two functions from TimerThread.cpp:

  TimerThread::Run() {
    ...
    Lock(lock);
    mProcessing = TRUE;
    while (mProcessing) {
      ...
      mWaiting = TRUE;        // W
      Wait(cond, lock);
      mWaiting = FALSE;
    }
    Unlock(lock);
    ...
  }

  TimerThread::Shutdown() {
    ...
    Lock(lock);
    mProcessing = FALSE;
    if (mWaiting)             // R
      Notify(cond, lock);
    Unlock(lock);
    ...
    mThread->Join();
    return NS_OK;
  }

Tested order: W (mWaiting = TRUE), then R (if (mWaiting) ?)
PSet(W) = {}, PSet(R) = {W}
19. Avoiding Order Violation
Same TimerThread::Run() and TimerThread::Shutdown() code from TimerThread.cpp, with the accesses arriving in the untested order:
  1. R: if (mWaiting) ? executes first
  2. W: mWaiting = TRUE   Constraint violation: R is not in PSet(W) = {}
  3. Rollback, then re-execution in the tested order: W (mWaiting = TRUE), then R (if (mWaiting) ?)
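Stalling cannot repair this case, because the offending read has already executed by the time the write is checked. The toy model below illustrates the rollback path under strong simplifications: `remote_preds` and `run_with_rollback` are hypothetical helpers, only the offending accesses are undone (a real checkpoint scheme such as SafetyNet or ReVive would restore a whole checkpoint), the dependence model is an assumed reading of "immediately dependent", and the thread names are made up. The op ids W and R and their PSets follow the correct test run of the TimerThread code.

```python
def remote_preds(log, thread, kind, addr):
    """Ops from other threads that a new access to addr would immediately depend on."""
    last_write, reads = None, []
    for t, op, k, a in log:
        if a != addr:
            continue
        if k == "W":
            last_write, reads = (t, op), []
        else:
            reads.append((t, op))
    preds = [last_write] if last_write else []
    if kind == "W":
        preds += reads
    return {op for t, op in preds if t != thread}


def run_with_rollback(psets, schedule):
    """Execute schedule; on a violation, undo the offending accesses and redo them later."""
    log, events = [], []
    for access in schedule:
        thread, op, kind, addr = access
        bad = remote_preds(log, thread, kind, addr) - psets[op]
        if not bad:
            log.append(access)
            continue
        events.append((op, sorted(bad)))
        undone = [a for a in log if a[1] in bad]
        log = [a for a in log if a[1] not in bad]   # roll back the offending accesses
        log.append(access)                          # let the violating access go first
        for redo in undone:                         # re-execute and re-check
            t, o, k, a = redo
            assert remote_preds(log, t, k, a) <= psets[o], "re-execution still violates"
            log.append(redo)
    return [a[1] for a in log], events


psets = {"W": set(), "R": {"W"}}
# Untested order: Shutdown's read of mWaiting runs before Run's write.
schedule = [("T_shutdown", "R", "R", "mWaiting"),
            ("T_run", "W", "W", "mWaiting")]
order, events = run_with_rollback(psets, schedule)
print(order, events)
```

The violation is reported at W, and after the rollback the accesses complete in the tested order W, R.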
21. Methodology
- Pin-based analysis
- 17 documented bugs analyzed
  - MySQL, Apache, Mozilla, pbzip2, aget, pfscan
- Parsec, Splash for performance study
- Applications tested using regression test suites when available, or random test input
22. PSet Constraints from Test Runs
- Concurrent workload
  - MySQL: run regression test suite in parallel with OSDB
  - FFT, pbzip2: random test input
23. Bug Avoidance Capability
- 17 bugs from MySQL, Apache, Mozilla, pbzip2, aget, pfscan
- 15/17 bugs avoided by enforcing PSet constraints
  - Including a bug that is neither a data race nor an atomicity violation bug
- 2/17 false negatives
  - A multi-variable atomicity violation
  - A context-sensitive deadlock bug
- 6 bugs are avoided using the stalling mechanism; the others require the rollback mechanism
24. PSet Violations in Bug-Free Execution
- 2 PSet constraint violations in MySQL not avoided
  - MySQL, bmove512 unrolls a loop 128 times
25. PSet Size of Instructions
- Over 95% of the instructions have PSets of size zero
- Less than 2% of static memory instructions have a PSet of size greater than two
26. Summary
- Multi-threaded programming is hard
  - Existing shared-memory programming model exposes too many legal interleavings to the runtime
  - Most interleavings remain untested in production code
- Interleaving constrained shared-memory multiprocessor
  - Avoids untested (rare) interleavings to avoid concurrency bugs
- Predecessor Set interleaving constraints
  - 15/17 concurrency bugs are avoidable
  - Acceptable performance and space overhead
27. Thanks
28. Memory Space Overhead
Program        App. Size   PSet Pairs   Overhead w.r.t. App. (%)
Pbzip2         39 KB       201          2.16
Aget           90 KB       365          1.69
Pfscan         17 KB       295          7.34
Apache         2435 KB     4119         0.69
MySQL          4284 KB     6604         0.64
FFT            24 KB       158          2.74
FMM            73 KB       1764         10.13
LU             24 KB       244          4.31
Radix          21 KB       255          5.00
Blackscholes   54 KB       41           0.32
Canneal        59 KB       752          5.24
- Space Overhead
  - In the worst case, 10% code size increase
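As a sanity check on the table, the overhead column can be turned back into an approximate storage cost per PSet pair. The calculation below is a back-of-the-envelope sketch, not a figure from the slides: it assumes the overhead is a percentage of application size, that it consists only of PSet pair storage, and that 1 KB is 1024 bytes. The helper name `implied_bytes_per_pair` is hypothetical.

```python
# (program, app size in KB, PSet pairs, overhead in % of app size), copied from the table.
TABLE = [
    ("Pbzip2", 39, 201, 2.16),
    ("Aget", 90, 365, 1.69),
    ("Pfscan", 17, 295, 7.34),
    ("Apache", 2435, 4119, 0.69),
    ("MySQL", 4284, 6604, 0.64),
    ("FFT", 24, 158, 2.74),
    ("FMM", 73, 1764, 10.13),
    ("LU", 24, 244, 4.31),
    ("Radix", 21, 255, 5.00),
    ("Blackscholes", 54, 41, 0.32),
    ("Canneal", 59, 752, 5.24),
]


def implied_bytes_per_pair(size_kb, pairs, overhead_pct, bytes_per_kb=1024):
    """Overhead in bytes divided by the number of PSet pairs (assumptions in lead-in)."""
    return size_kb * bytes_per_kb * overhead_pct / 100 / pairs


per_pair = {name: implied_bytes_per_pair(kb, pairs, pct)
            for name, kb, pairs, pct in TABLE}
worst = max(TABLE, key=lambda row: row[3])[0]

for name, value in per_pair.items():
    print(f"{name:13s} {value:.2f} bytes per pair")
print("largest relative overhead:", worst)
```

Under these assumptions every program comes out at a little over 4 bytes per pair, and the relative overhead is driven by how many pairs a program has compared with its size, which is why the small FMM binary shows the largest percentage.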