Title: On Building Reliable Concurrent Systems Vijay K. Garg Professor, Department of ECE and CS Director,
1 On Building Reliable Concurrent Systems
Vijay K. Garg
Professor, Department of ECE and CS
Director, PDSL
The University of Texas at Austin
Austin, TX 78712
2 Motivation: Reliable Software
- Multithreaded and distributed programs are prone to errors:
  - concurrency, nondeterminism, process and channel failures
- Techniques to ensure program correctness:
  - Before program development: model checking
  - During development: testing and debugging
  - After deployment: software fault tolerance
3 Paradise Approach
- Key abstraction: global properties
- Model checking and verification: check global properties against a model of the program (Promela)
- Testing and debugging: global breakpoints, trace analysis
- Software fault tolerance: monitoring for global properties, controlled re-execution
4 Talk Outline
- Motivation
- Monitoring Distributed Systems
- Clock: Tracking Dependency
- Camera: Global Snapshot (Checkpoint)
- Sensor: Detecting Global Properties
- Slicer: Computation Slicing
- Supervisor: Controlling Execution
- Other Projects at PDSL
5 Paradise Environment
[Figure: the Paradise environment (diagram; includes a Control component)]
6 Trace Model: Total Order vs. Partial Order
- Total order: an interleaving of the events in a trace
- Partial order: Lamport's happened-before model
[Figure: a successful trace and a faulty trace (total orders) of processes P1 and P2, with events e1, e2, f1, f2 and critical sections CS1, CS2, shown next to the corresponding partial-order trace and a specification over CS1 and CS2]
7 Tracking Dependency
- Computation: a set of events ordered by the happened-before relation
- Problem: timestamp events so as to answer
  - Did e happen before f?
  - Is e concurrent with f?
8 Clocks in a Distributed System
[Figure: vector timestamps of events on three processes. P1: (1,0,0), (2,1,0), (3,1,0); P2: (0,1,0), (0,2,0); P3: (0,0,1), (0,0,2), (2,1,3)]
- Result: s happened before t iff the vector at s is less than the vector at t.
- Vector clocks [Fidge 89, Mattern 89]
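A minimal Python sketch of the vector clock rules behind this result follows. It is illustrative only: `VectorClock`, `happened_before` and `concurrent` are hypothetical names. It assumes each process ticks its own component on every event and that a receive takes the componentwise maximum with the message's timestamp before ticking; "less than" means less than or equal in every component and different in at least one.

```python
from typing import List, Tuple

VC = Tuple[int, ...]


class VectorClock:
    """Vector clock of process `pid` in a system of `n` processes (illustrative)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v: List[int] = [0] * n

    def local_event(self) -> VC:
        # Every event ticks the process's own component.
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self) -> VC:
        # A send is an event; its timestamp is piggybacked on the message.
        return self.local_event()

    def receive(self, msg_ts: VC) -> VC:
        # Merge with the piggybacked timestamp (componentwise max), then tick.
        self.v = [max(a, b) for a, b in zip(self.v, msg_ts)]
        return self.local_event()


def happened_before(s: VC, t: VC) -> bool:
    # s -> t iff s <= t componentwise and s != t
    return all(a <= b for a, b in zip(s, t)) and s != t


def concurrent(s: VC, t: VC) -> bool:
    return not happened_before(s, t) and not happened_before(t, s)


# P2 sends to P1 after P1's first event.
p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
m = p2.send()            # (0, 1, 0)
p1.local_event()         # (1, 0, 0)
print(p1.receive(m))     # (2, 1, 0)
```

With the timestamps from this slide, (0,1,0) on P2 happened before (2,1,3) on P3, whereas (3,1,0) on P1 and (2,1,3) on P3 are concurrent: neither vector is less than the other.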
9 Dynamic Chain Clocks
- Problem with vector clocks: scalability, dynamic process structure
- Idea: compute the chains of relevant events in an online fashion [Aggarwal and Garg, PODC 05]
[Figure: a computation with 4 processes (P1, P2, P3, P4; events a to h) and its relevant subcomputation]
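The online chain idea can be sketched as follows. This is our own simplified reading, not the PODC 05 implementation: `DynamicChainClock` and `chain_less` are hypothetical names, relevant events are assumed to be timestamped one at a time against a shared vector `z` of chain lengths, and each event is handed the stamps of its direct relevant predecessors. An event extends any chain whose latest event it already depends on and opens a new chain only when none qualifies, so the vector has one component per chain rather than one per thread.

```python
from typing import List, Sequence, Tuple

Stamp = Tuple[int, ...]


class DynamicChainClock:
    """Hypothetical sketch: one component per chain instead of one per process.

    Assumes relevant events are timestamped one at a time against a shared
    vector `z`, where z[c] is the number of events already placed on chain c.
    """

    def __init__(self) -> None:
        self.z: List[int] = []

    def timestamp(self, preds: Sequence[Stamp]) -> Stamp:
        # Start from the componentwise max of the direct predecessors' stamps.
        v = [0] * len(self.z)
        for p in preds:
            for c, x in enumerate(p):
                v[c] = max(v[c], x)
        # Extend a chain whose latest event is already in this event's past.
        for c in range(len(self.z)):
            if v[c] == self.z[c]:
                v[c] += 1
                self.z[c] += 1
                return tuple(v)
        # Otherwise open a new chain: this is how the vector grows dynamically.
        v.append(1)
        self.z.append(1)
        return tuple(v)


def chain_less(s: Stamp, t: Stamp) -> bool:
    # Pad the shorter stamp with zeros (chains it has not seen), then compare.
    n = max(len(s), len(t))
    s = tuple(s) + (0,) * (n - len(s))
    t = tuple(t) + (0,) * (n - len(t))
    return all(a <= b for a, b in zip(s, t)) and s != t


clk = DynamicChainClock()
a = clk.timestamp([])      # (1,)
b = clk.timestamp([a])     # (2,)
c = clk.timestamp([b])     # (3,)
d = clk.timestamp([])      # (0, 1): concurrent with a, b, c, so a second chain opens
print(len(clk.z))          # 2
```

Comparison works as for vector clocks once the shorter stamp is padded with zeros for the chains it has not seen.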
10 Experimental Results
- Simulation of a computation with 1 relevant events
- Measured:
  - number of components vs. number of threads
  - total time overhead vs. number of threads
11 Talk Outline
- Motivation and Overview
- Instrumentation
- Clock: Tracking Dependency
- Property Checking
- Camera: Global Snapshot (Checkpoint)
- Sensor: Detecting Global Properties
- Slicer: Computation Slicing
12 Global Snapshot
- Problem: compute a global snapshot (checkpoint) of the system, i.e., the state of the processes and the channels
- Motivation:
  - checkpointing for fault tolerance
  - distributed debugging
  - detecting stable predicates
13 Key Difficulties: Taking Care of Messages
- Two sites A and B with $400 each. Site A sends a message carrying $100 to site B.
- Problem 1: inconsistent state
  - checkpoint (A) before the message is sent: $400
  - message received before checkpoint (B): $500
- Problem 2: messages in transit
  - message sent before checkpoint (A): $300
  - checkpoint (B) before the message is received: $400
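The dollar amounts show why neither snapshot is acceptable. Assuming the system should always hold $800 in total, a snapshot must neither double-count nor drop the $100; recording the in-transit message as channel state restores the total in Problem 2:

```latex
\begin{aligned}
\text{Problem 1:}\quad & \$400 + \$500 = \$900 \neq \$800 && \text{(the \$100 is counted at both sites)}\\
\text{Problem 2:}\quad & \$300 + \$400 = \$700 \neq \$800 && \text{(the \$100 in transit is missing)}\\
\text{With channel state:}\quad & \$300 + \$400 + \underbrace{\$100}_{\text{channel } A \to B} = \$800
\end{aligned}
```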
14 Current Algorithms
- Key idea: white/red processes and messages
  - A process must be red to act on a red message
  - Record white messages received by red processes
- [Chandy and Lamport 85]: a "marker" message on every channel
- [Mattern 89, SBF 04]: number of white messages sent on every channel
- O(N^2) messages for a completely connected topology
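The two coloring rules can be replayed on the $100 example in a small single-threaded Python simulation. All names are hypothetical, and both the marker messages of Chandy and Lamport and the per-channel counts of Mattern-style algorithms are replaced by a global check (`snapshot_complete`) that a real implementation would have to compute with control messages.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

WHITE, RED = "white", "red"


@dataclass
class Message:
    color: str
    amount: int


@dataclass
class Process:
    name: str
    balance: int
    color: str = WHITE
    recorded: Optional[int] = None                         # local state when turning red
    in_transit: List[int] = field(default_factory=list)   # recorded channel state
    white_sent: int = 0
    white_received: int = 0

    def take_snapshot(self) -> None:
        # Turning red means recording the local state (at most once).
        if self.color == WHITE:
            self.color = RED
            self.recorded = self.balance


def send(p: Process, amount: int) -> Message:
    p.balance -= amount
    if p.color == WHITE:
        p.white_sent += 1
    return Message(p.color, amount)   # a message takes its sender's color


def receive(q: Process, m: Message) -> None:
    if m.color == RED:
        q.take_snapshot()             # rule 1: be red before acting on a red message
    else:
        q.white_received += 1
        if q.color == RED:
            q.in_transit.append(m.amount)  # rule 2: white message at a red process
    q.balance += m.amount


def snapshot_complete(procs: Sequence[Process]) -> bool:
    # Stand-in for the counting step: everyone is red and every white
    # message that was sent has been received.
    return all(p.color == RED for p in procs) and (
        sum(p.white_sent for p in procs) == sum(p.white_received for p in procs))


def snapshot_total(procs: Sequence[Process]) -> int:
    return sum(p.recorded for p in procs) + sum(sum(p.in_transit) for p in procs)


A, B = Process("A", 400), Process("B", 400)
m = send(A, 100)          # white: sent before A's checkpoint
A.take_snapshot()         # A records $300
B.take_snapshot()         # B records $400
receive(B, m)             # white message reaches red B: recorded as in transit
print(snapshot_total([A, B]))   # 800
```

Because the white message reaches B after B has turned red, it is recorded as channel state, and the snapshot totals $800 (A: $300, B: $400, channel A to B: $100).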
15 Our Algorithms
- Joint work with Rahul Garg and Yogish Sabharwal, IBM IRL (ACM International Conference on Supercomputing 2006)
- Lower bound: O(N log w) messages, where w is the average number of messages in transit
- Implementation: Blue Gene/L with MPI
16 Global Property Detection
- Predicate: a global condition expressed using variables on processes, e.g., "more than one process is in the critical section", "there is no token in the system"
- Problem: find a global state that satisfies the given predicate
17 The Main Difficulty in Partial Order
- Algorithm for general predicates [Cooper and Marzullo 91]
[Figure: the lattice of global states for P1 (events e1, e2) and P2 (events f1, f2), each state labeled by the set of events executed, from the initial state up to {e1, e2, f1, f2}]
- Too many global states: a computation may contain as many as O(k^n) global states
  - k: maximum number of events on a process
  - n: number of processes
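The blow-up is easy to see by brute force. The hypothetical Python helpers below represent a global state (a consistent cut) as the number of events each process has executed and test all (k+1)^n candidates using vector clocks; the two-process computation, with a message from e1 to f2, is an assumed example.

```python
from itertools import product
from typing import List, Sequence, Tuple

VC = Tuple[int, ...]
Cut = Tuple[int, ...]


def is_consistent(clocks: Sequence[Sequence[VC]], cut: Cut) -> bool:
    # cut[p] = number of events of P_p in the cut;
    # clocks[p][i] = vector clock of the (i+1)-th event of P_p.
    for p, k in enumerate(cut):
        if k > 0:
            v = clocks[p][k - 1]                 # last included event of P_p
            if any(v[q] > cut[q] for q in range(len(cut))):
                return False                     # it depends on an excluded event
    return True


def consistent_cuts(clocks: Sequence[Sequence[VC]]) -> List[Cut]:
    # Brute force over all (k+1)^n candidate cuts.
    ranges = [range(len(events) + 1) for events in clocks]
    return [cut for cut in product(*ranges) if is_consistent(clocks, cut)]


# P1: e1, e2; P2: f1, f2.
with_msg = [[(1, 0), (2, 0)], [(0, 1), (1, 2)]]   # assumed message e1 -> f2
no_msg = [[(1, 0), (2, 0)], [(0, 1), (0, 2)]]
print(len(consistent_cuts(with_msg)), len(consistent_cuts(no_msg)))  # 8 9
```

Without the message all 3^2 = 9 candidates are consistent; the message rules out the state in which P2 has executed f2 but P1 has not executed e1, leaving 8.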
18 Efficient Predicate Detection for Special Cases
- Stable predicates [Chandy and Lamport 85]
  - once the predicate becomes true, it stays true, e.g., deadlock
- Unstable predicates
  - observer-independent predicates [Charron-Bost et al. 95]: the predicate occurs in one interleaving iff it occurs in all interleavings, e.g., any disjunction of local predicates
  - linear predicates [Chase and Garg 95], e.g., conjunctive predicates such as "there is no leader in the system"
  - relational predicates comparing x1 + x2 + ... + xn with a constant k [Chase and Garg 95], e.g., violation of k-mutual exclusion
19 Linear Predicates
- The set of consistent cuts that satisfy a linear predicate is closed under intersection [Chase and Garg 95].
- Examples:
  - conjunctive predicates: critical1 and critical2
  - channel predicates: "all channels are empty", "there are exactly k messages in the channel from process P to Q"
  - some relational predicates over x1, x2 and a constant k, when each xi is monotonically non-decreasing
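For a linear predicate, a false cut always has a process that must advance before the predicate can hold (the forbidden-process characterization, as we understand Chase and Garg 95), so the least satisfying consistent cut can be found by advancing one process at a time. The Python sketch below assumes the caller supplies both the predicate and a `forbidden` function; `detect_linear` and `missing_dependency` are hypothetical names, and the computation is an assumed two-process example with a message from e1 to f2.

```python
from typing import Callable, List, Optional, Sequence, Tuple

VC = Tuple[int, ...]


def missing_dependency(clocks: Sequence[Sequence[VC]], cut: List[int]) -> Optional[int]:
    # Return a process whose excluded events are needed by some included event.
    for q, k in enumerate(cut):
        if k > 0:
            v = clocks[q][k - 1]
            for p in range(len(cut)):
                if v[p] > cut[p]:
                    return p
    return None


def detect_linear(clocks: Sequence[Sequence[VC]],
                  pred: Callable[[List[int]], bool],
                  forbidden: Callable[[List[int]], int]) -> Optional[List[int]]:
    """Least consistent cut satisfying a linear predicate, or None.

    forbidden(cut) must return a process that has to advance whenever
    pred(cut) is false; such a process exists for linear predicates.
    """
    cut = [0] * len(clocks)
    while True:
        p = missing_dependency(clocks, cut)   # first repair consistency
        if p is None:
            if pred(cut):
                return cut
            p = forbidden(cut)
        cut[p] += 1
        if cut[p] > len(clocks[p]):
            return None   # P_p ran out of states: the predicate never holds


clocks = [[(1, 0), (2, 0)], [(0, 1), (1, 2)]]          # assumed: e1 -> f2
local = [[False, True, False], [False, False, True]]   # l_i in each local state


def conj(cut: List[int]) -> bool:
    return all(local[i][cut[i]] for i in range(2))


def forb(cut: List[int]) -> int:
    # For a conjunctive predicate, any process whose local predicate is false.
    return next(i for i in range(2) if not local[i][cut[i]])


print(detect_linear(clocks, conj, forb))   # [1, 2]
```

Because a forbidden process must advance in every satisfying cut above the current one, the search never steps past a solution, and each step consumes one event, so it stops after at most one step per event.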
20 Conjunctive Predicates
- A predicate that can be expressed as l1 ∧ l2 ∧ ... ∧ ln, where li is local to Pi
- Detects errors that may be hidden in some runs due to race conditions
- Examples:
  - mutual exclusion problem: (P1 in CS) and (P2 in CS)
  - missing primary: (P1 is secondary) and (P2 is secondary) and (P3 is secondary)
- Importance: sufficient for the detection of any boolean expression of local predicates
21 Conjunctive Predicates: Centralized Algorithm
- (l1 ∧ l2 ∧ ... ∧ ln) is true iff there exist states si in Pi such that li is true in state si, and si and sj are incomparable for distinct i, j.
22 Algorithms for Conjunctive Predicates
- Centralized algorithm [Garg and Waldecker 92]
  - Each non-checker process maintains its local vector and sends its chain clock to the checker process whenever the local predicate is true, at most once in each message interval.
  - Time complexity: the checker requires at most O(n^2 m) comparisons.
- Token-based algorithm [Garg and Chase 95]
- Completely distributed algorithm [Garg and Chase 95]
- Keeping queues shorter [Chiou and Korfhage 95]
- Avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]
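The checker's core loop can be sketched in Python as follows. This is a hypothetical sketch: it assumes each queue holds, in local order, the timestamps of the states in which the local predicate held, and that "happened before" between such states is the componentwise less-than of vector timestamps; how the published algorithm timestamps states (as opposed to events) is not reproduced here. A head that happened before another head can never belong to a pairwise-concurrent set, so it is discarded.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

VC = Tuple[int, ...]


def less(s: VC, t: VC) -> bool:
    return all(a <= b for a, b in zip(s, t)) and s != t


def conjunctive_checker(queues: Sequence[Sequence[VC]]) -> Optional[List[VC]]:
    """queues[i]: timestamps, in local order, of P_i's states where l_i holds."""
    n = len(queues)
    heads = [0] * n
    while all(heads[i] < len(queues[i]) for i in range(n)):
        for i, j in permutations(range(n), 2):
            if less(queues[i][heads[i]], queues[j][heads[j]]):
                # s_i happened before s_j and hence before every later candidate
                # of P_j, so s_i cannot be part of a pairwise-concurrent set.
                heads[i] += 1
                break
        else:
            # All heads are pairwise concurrent: the conjunction held.
            return [queues[i][heads[i]] for i in range(n)]
    return None   # some queue is exhausted


print(conjunctive_checker([[(1, 0), (3, 1)], [(1, 2)]]))   # [(3, 1), (1, 2)]
```

This naive loop re-compares all pairs after every elimination, so it does more work than a careful implementation would.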
23 Other Special Classes of Predicates
- Relational predicates
  - Let xi = number of tokens at Pi
  - Σ xi < k: loss of tokens
  - Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98]
- Dilworth's partition [Tomlinson and Garg 96]
24 Relational Predicates
- Let xi ≥ 0 be a variable at Pi. Predicates of the form Σ xi < k [Chase and Garg 95]
- Algorithm: the consistent cut with the minimum value of Σ xi corresponds to a min cut in the flow graph
25 Predicate Detection in General
[Figure: the lattice of global states of the two-process computation, each state labeled by the set of events executed (e1, e2, f1, f2)]
- Explore the state space (all global states must be examined) without constructing the graph:
  - breadth-first [Cooper and Marzullo 91]
  - depth-first [Alagar and Venkatesan 94]
  - lexical order [Garg 03]
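A breadth-first exploration in the spirit of Cooper and Marzullo can be sketched in Python as below (hypothetical helper names; an assumed two-process example with a message from e1 to f2). Level l holds the consistent cuts containing exactly l events, and only the current level is kept, so the lattice is never built explicitly; in the worst case the search still visits O(k^n) cuts.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

VC = Tuple[int, ...]
Cut = Tuple[int, ...]


def is_consistent(clocks: Sequence[Sequence[VC]], cut: Cut) -> bool:
    # The last included event of each process must not depend on excluded events.
    for p, k in enumerate(cut):
        if k > 0 and any(clocks[p][k - 1][q] > cut[q] for q in range(len(cut))):
            return False
    return True


def bfs_detect(clocks: Sequence[Sequence[VC]],
               pred: Callable[[Cut], bool]) -> Optional[Cut]:
    n = len(clocks)
    level: Set[Cut] = {(0,) * n}            # level 0: the initial cut
    while level:
        for cut in sorted(level):           # sorted only for deterministic output
            if pred(cut):
                return cut
        nxt: Set[Cut] = set()
        for cut in level:
            for p in range(n):
                if cut[p] < len(clocks[p]):
                    succ = cut[:p] + (cut[p] + 1,) + cut[p + 1:]
                    if is_consistent(clocks, succ):
                        nxt.add(succ)
        level = nxt                         # the previous level is discarded
    return None


clocks = [[(1, 0), (2, 0)], [(0, 1), (1, 2)]]          # assumed: e1 -> f2
print(bfs_detect(clocks, lambda cut: cut[1] == 2))     # (1, 2)
```

Every consistent cut is reached, because removing a maximal event from a non-empty consistent cut leaves a consistent cut one level lower.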
26 Talk Outline
- Motivation and Overview
- Instrumentation
- Clock: Tracking Dependency
- Property Checking
- Camera: Global Snapshot (Checkpoint)
- Sensor: Detecting Global Properties
- Slicer: Computation Slicing
- Supervisor: Controlling Execution
27 The Main Idea of Computation Slicing
[Figure: slicing a partial-order trace to avoid state explosion: keep all red global states; the result is the slice]
28 How Does Computation Slicing Help?
[Figure: to check b1 ∧ b2 on a partial-order trace, slice for b1 (retaining all global states that satisfy b1), then check b2 on the slice]
29 Example
- Detect the predicate (x*y + z < 5) ∧ (x ≥ 1) ∧ (z ≤ 3)
[Figure: a computation and its slice with respect to (x ≥ 1) ∧ (z ≤ 3)]
30 Slice
- Slice: a sub-trace such that
  - it contains all consistent cuts of the trace satisfying the given predicate
  - it contains the least number of consistent cuts
- [Garg and Mittal 01, Mittal and Garg 01]
[Figure: trace + predicate → slice]
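The definition can be illustrated by brute force, with hypothetical helpers that are exponential in the number of processes and are not the polynomial-time slicing algorithms cited here: collect the consistent cuts that satisfy the predicate and check whether they are closed under componentwise min (meet) and max (join). When they are, the set is already a lattice of consistent cuts, so, as we understand the definition, the slice contains exactly those cuts. The computation is an assumed two-process example with a message from e1 to f2.

```python
from itertools import product
from typing import Callable, Sequence, Set, Tuple

VC = Tuple[int, ...]
Cut = Tuple[int, ...]


def is_consistent(clocks: Sequence[Sequence[VC]], cut: Cut) -> bool:
    # The last included event of each process must not depend on excluded events.
    for p, k in enumerate(cut):
        if k > 0 and any(clocks[p][k - 1][q] > cut[q] for q in range(len(cut))):
            return False
    return True


def satisfying_cuts(clocks: Sequence[Sequence[VC]],
                    pred: Callable[[Cut], bool]) -> Set[Cut]:
    ranges = [range(len(events) + 1) for events in clocks]
    return {cut for cut in product(*ranges)
            if is_consistent(clocks, cut) and pred(cut)}


def is_sublattice(cuts: Set[Cut]) -> bool:
    # Closed under meet (componentwise min) and join (componentwise max)?
    for a in cuts:
        for b in cuts:
            meet = tuple(min(x, y) for x, y in zip(a, b))
            join = tuple(max(x, y) for x, y in zip(a, b))
            if meet not in cuts or join not in cuts:
                return False
    return True


clocks = [[(1, 0), (2, 0)], [(0, 1), (1, 2)]]                 # assumed: e1 -> f2
after_e1 = satisfying_cuts(clocks, lambda cut: cut[0] >= 1)   # "P1 has executed e1"
print(len(after_e1), is_sublattice(after_e1))                 # 6 True
```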
31 Results
- Efficient polynomial-time algorithms for computing the slice for:
  - linear predicates [Garg and Mittal 01]: time complexity O(n^2 m)
  - general predicates: Theorem: given a computation, if a predicate b can be detected efficiently, then the slice for b can also be computed efficiently [Mittal, Sen and Garg 03]
- Combining slices: Boolean operators
- Temporal logic operators: EF, AG, EG
- Approximate slice: for arbitrary boolean expressions
- n: number of processes; m: number of events
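32 POTA Architecture [Sen and Garg 04]
[Figure: POTA architecture. The Instrumentor turns the program into an instrumented program whose execution produces a trace. In the Analyzer, the Slicer computes the slice of the trace for the predicate (specification) and the Predicate Detector answers yes/witness or no/counterexample; alternatively, the Translator converts the trace to Promela for SPIN, which answers yes or no/counterexample.]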
33 Experiments: Dining Philosophers Trace Verification
- POTA: Partial Order Trace Analyzer, based on slicing [Sen and Garg 03]
- SPIN: a widely used model-checking tool [Holzmann 97]
- SPIN: 250 seconds for n = 6; runs out of memory for n > 6
- POTA: can handle n = 200, using 400 seconds
- Predicate: two neighboring dining philosophers do not eat concurrently
34 Supervisor: Motivation for Control
- Maintain global invariants or the proper order of events
- Examples: distributed debugging
  - ensure that busy1 ∨ busy2 is always true
  - ensure that m1 is delivered before m2
- Fault tolerance
  - on a fault, roll back and execute under control
35 Rollback Recovery for Software Faults
- Re-execution problem: re-execute so as to avoid a recurrence of a previously detected failure
- Progressive retry [Wang et al. 97]
- Controlled re-execution [Tarafdar and Garg 98]
36 Controlled Re-execution
- Add the synchronization necessary to maintain a safety property, e.g., mutual exclusion
37 Results
- Efficient algorithms for computing the synchronization for:
  - Locks [Tarafdar, Garg DISC 98]: O(nm) algorithm for various types of locks
  - Disjunctive predicates [Mittal, Garg 00], e.g., (n-1)-mutual exclusion: time complexity O(m^2); minimizes the number of synchronization arrows
  - Region predicates [Mittal, Garg PODC 00], e.g., virtual clocks of processes are approximately synchronized: time complexity O(n m^2); maximizes the concurrency in the controlled computation
- n: number of processes; m: number of events
38 Conclusions
- Efficient algorithms are possible for monitoring global properties
- Observation and control: a powerful abstraction
- Current execution engines are designed for performance rather than fault tolerance
39 Other Research Projects
- Distributed simulation: GVT algorithms, fault tolerance
- Recovery schemes: optimistic message logging, fault tolerance without replication
- Model checking: partial order methods
- Formal methods: Petri nets, lattice theory, max-plus algebra
40 Questions