Runtime Safety Analysis of Concurrent and Distributed Systems - PowerPoint PPT Presentation

About This Presentation

Description:

... my airplane is landing then the runway that the airport has allocated matches ... $landing \rightarrow (runway = @_{airport}\, allocRunway)$ 9/4/09. http://osl.cs.uiuc.edu ... – PowerPoint PPT presentation

Slides: 51
Provided by: ksen4

Transcript and Presenter's Notes



1
Runtime Safety Analysis of Concurrent and
Distributed Systems
  • Koushik Sen
  • University of Illinois at Urbana-Champaign, USA

Joint work with Gul Agha, Grigore Rosu and Abhay
Vardhan.
2
Increasing Software Reliability
  • Current solutions
  • Human review of code and testing
  • Most used in practice
  • Usually ad-hoc, intensive human support
  • (Advanced) Static analysis
  • Often scales up
  • False positives and negatives, annotations
  • (Traditional) Formal methods
  • Model checking and theorem proving
  • General, good confidence, do not always scale up

3
Runtime Verification
  • Merge testing and temporal logic specification
  • Specify safety properties in proper temporal
    logic.
  • Monitor safety properties against a run of the
    program.
  • Examples: JPaX (NASA Ames) and UPenn's Java MaC
    analyze the observed run.
  • Disadvantages
  • Lack of coverage.
  • Not suitable for Distributed Systems.

[Figure: a run checked by a naïve observer]
4
Our Approach
  • For Distributed Programs
  • Use Distributed Temporal Logic.
  • Use KnowledgeVector
  • Decentralize Monitoring by distributing monitors
    to all processes.
  • For MultiThreaded Programs
  • Use smart observers to predict safety violations
  • Vector Clock Algorithm for MultiThreaded Programs
  • Construct Computation Lattice
  • Analyze Lattice level by level
  • Use Causality Cone Heuristics to increase
    efficiency

5
Decentralized Distributed Program Monitoring
(DIANA)
http://osl.cs.uiuc.edu/
6
Centralized Monitoring Approach
  • Distributed Systems
  • Global state is distributed
  • To Monitor
  • For every state update send state to a central
    monitor
  • Central monitor assembles them to form consistent
    execution traces
  • Verify global safety property on these execution
    traces

7
An Example
  • Mobile node a requests certain value from node b
  • b computes the value and sends it to a
  • Property: no node receives a value from another
    node to which it has not sent a request

8
Centralized Monitoring Example
If a receives a value from b, then b calculated
the value after receiving a request from a
For a large number of nodes, the size of the property in LTL
may be large
A message is sent to the monitor for every state update
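$valRcv \rightarrow \Diamond(valComputed \wedge \Diamond valReq)$
[Figure: processes a and b with events valReq, valComputed and valRcv; the central monitor evaluates the subformulas $valReq$, $\Diamond valReq$, $valComputed \wedge \Diamond valReq$ and $\Diamond(valComputed \wedge \Diamond valReq)$ on the assembled trace.]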
9
Decentralized Approach
  • Distribute property
  • Properties expressed with respect to a process
  • Local properties at every process
  • Decentralize Monitoring
  • Maintain knowledge of global state at each
    process
  • Update knowledge with incoming messages
  • Attach knowledge with outgoing messages
  • At each process check safety property against
    local knowledge

10
Decentralized Monitoring Example
If a receives a value from b, then b calculated
the value after receiving a request from a
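$valRcv \rightarrow @_a(\Diamond(valComputed \wedge @_b(\Diamond valReq)))$
[Figure: processes a and b with events valReq, valComputed and valRcv; the subformulas $\Diamond valReq$, $@_b(\Diamond valReq)$, $valComputed \wedge @_b(\Diamond valReq)$ and $\Diamond(valComputed \wedge @_b(\Diamond valReq))$ are evaluated locally.]
  • Formulas are expressed with respect to processes
  • No extra messages
  • No need for a global snapshot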
11
Past time Distributed Temporal Logic (pt-DTL)
  • Based on epistemic logic
  • Properties with respect to a process, say p
  • Interpreted over a sequence of global states that
    the process p is aware of
  • Each process monitors the properties local to it
  • No need for extra messages to create a relevant
    portion of global state
  • KnowledgeVector keeps track of the relevant global
    state that can affect a property.

12
Remote Expressions in pt-DTL
  • Remote expressions: arbitrary expressions
    related to the state of a remote process
  • Propositions constructed from remote and local
    expressions
  • If my alarm is set, then at some point in the past
    the difference between my temperature and the temperature
    at process b exceeded the allowed value
  • $alarm \rightarrow \Diamond((myTemp - @_b\, temp) > allowed)$

13
Safety in Airplane Landing
  • If my airplane is landing then the runway that
    the airport has allocated matches the one that I
    am planning to use
  • $landing \rightarrow (runway = @_{airport}\, allocRunway)$

14
Leader Election Example
  • If a leader is elected, then if the current
    process is a leader, then, to its knowledge, none
    of the other processes is a leader
  • $elected \rightarrow (state = leader \rightarrow \bigwedge_{j \neq i} @_j(state \neq leader))$

15
pt-DTL syntax and semantics
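  • $F_i ::= true \mid false \mid P(E_i) \mid \neg F_i \mid F_i \; op \; F_i$  (propositional)
  • $\mid \odot F_i \mid \boxdot F_i \mid \Diamond F_i \mid F_i \, \mathcal{S} \, F_i$  (temporal)
  • $\mid @_j F_j$  (epistemic)
  • $E_i ::= c \mid v_i \in V_i \mid f(E_i)$  (functional)
  • $\mid @_j E_j$  (epistemic)
  • c: constant
  • $v_i$: variable at process i
  • $P(E_i)$: predicate on $E_i$
  • $f(E_i)$: function f applied to $E_i$
  • $@_j E_j$: expression $E_j$ at process j
  • $\odot F_i$: previously $F_i$
  • $\boxdot F_i$: always in the past $F_i$
  • $\Diamond F_i$: eventually in the past $F_i$
  • $F_i \, \mathcal{S} \, F_i'$: $F_i$ since $F_i'$
  • $@_j F_j$: $F_j$ at process j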

16
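Interpretation of $@_j E_j$ at process i
[Figure: space-time diagram of processes p1 (states s11, s12), p2 (states s21, s22, s23) and p3 (states s31, s32, s33) exchanging messages m1 to m4.]
Since at $s_{23}$ process $p_2$ is aware of state $s_{12}$ of $p_1$, the value of
$@_1 E$ in $s_{23}$ at $p_2$ is the value of $E$ in $s_{12}$ at $p_1$.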
17
Monitoring Algorithm
  • Requirements
  • Should be fast so that online monitoring is
    possible
  • Little memory overhead
  • Additional messages should be minimal,
    ideally zero
  • KnowledgeVector
  • Motivated by vector clocks
  • Unlike vector clocks, its size is independent
    of the number of processes

18
KnowledgeVector
  • KV is a vector
  • with one entry for each process appearing in the formula
  • $KV[j]$ denotes the entry for process j
  • $KV[j].seq$ is the sequence number of the last event
    seen at process j
  • $KV[j].values$ stores the values of j-expressions and
    j-formulae

19
KnowledgeVector Algorithm
  • internal event (at process i)
  • store $eval(E_i, s_i)$ and $eval(F_i, s_i)$ for each $@_i E_i$
    and $@_i F_i$ in $KV_i[i].values$
  • send m
  • $KV_i[i].seq \leftarrow KV_i[i].seq + 1$; send $KV_i$ with m as
    $KV_m$
  • receive m
  • for all j, if $KV_m[j].seq > KV_i[j].seq$ then
  • $KV_i[j].seq \leftarrow KV_m[j].seq$
  • $KV_i[j].values \leftarrow KV_m[j].values$
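The three KnowledgeVector rules above can be sketched in Python as follows. This is a minimal illustration, not DIANA's implementation: it assumes string process identifiers, keeps expression values in a dictionary keyed by a hypothetical expression name, and models a message as a deep copy of the sender's KV. The class and method names are illustrative.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Entry:
    seq: int = 0                                          # sequence number of the last event seen
    values: Dict[str, Any] = field(default_factory=dict)  # values of j-expressions / j-formulae


class KnowledgeVector:
    """KV of process `me`; entries exist only for processes named in the formula
    (`procs` must include `me`)."""

    def __init__(self, me: str, procs: List[str]):
        self.me = me
        self.kv: Dict[str, Entry] = {p: Entry() for p in procs}

    def internal(self, local_values: Dict[str, Any]) -> None:
        # Internal event: record eval(E_i, s_i) / eval(F_i, s_i) in KV_i[i].values.
        self.kv[self.me].values.update(local_values)

    def send(self) -> Dict[str, Entry]:
        # Send event: increment own sequence number, piggyback a snapshot of KV_i.
        self.kv[self.me].seq += 1
        return copy.deepcopy(self.kv)

    def receive(self, kv_m: Dict[str, Entry]) -> None:
        # Receive event: adopt every entry the sender has seen more recently.
        for j, entry in kv_m.items():
            if j in self.kv and entry.seq > self.kv[j].seq:
                self.kv[j].seq = entry.seq
                self.kv[j].values = dict(entry.values)
```

Because an older message carries a smaller seq for a given process, `receive` ignores it, so a process never overwrites newer knowledge about that process with older knowledge, and no extra monitoring messages are needed.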

20
Example
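[Figure: processes p1, p2 and p3; p1's states carry X = 5, X = 9 and X = 6, p2's states carry Y = 7 and Y = 3, and the run is annotated with KV1.seq and KV1.values; monitoring a property relating Y and $@_1 X$ at p2 detects a violation.]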
21
DIANA Architecture
[Figure: DIANA architecture, showing the pt-DTL Monitor]
22
MultiThreaded Program Analysis (JMPaX)
23
MultiThreaded Smart Observer
  • Ideas
  • A single execution trace contains more
    information than appears at first sight
  • Extract other possible runs from a single
    execution
  • Analyze all these runs intelligently.
  • A technique between model checking and testing.

[Figure: a run analyzed by a smart observer]
24
MultiPathExplorer JMPaX (Java)
  • Based on smart observers
  • Smartness obtained by proper instrumentation
    (vector clocks)
  • Possible global states are generated dynamically and
    form a lattice
  • Analysis is performed on a level-by-level basis
    in the lattice of global states

25
Motivating Example Safe Landing
Safe Landing: land the air/spacecraft only after
approval from the ground, and only if, since then, the
radio signal has not been lost
  • Three variables
  • Landing indicating air/space craft is landing
  • Approved indicating landing has been approved
  • Radio indicating radio signal is live

$\uparrow Landing \rightarrow [Approved, \downarrow Radio)_s$
26
Code of a Landing Controller
  • Two-threaded program to control landing
      int landing = 0, approved = 0, radio = 1;

      void thread1() {
        askLandingApproval();
        if (approved == 1) {
          print("Landing approved");
          landing = 1;
          print("Landing started");
        } else {
          print("Landing not approved");
        }
      }

      void askLandingApproval() {
        if (radio == 1) approved = 1;
        else approved = 0;
      }

      void thread2() {
        while (true) checkRadio();  // checkRadio() presumably updates radio; its body is not shown
      }

27
Landing Safety Violation
  • Suppose the plane has received approval for
    landing and just before it started landing the
    radio signal went off
  • the plane must abort landing!
  • A simple observer will most likely not detect the
    bug.
  • JMPaX can construct a possible run in which radio
    goes off between approval and landing

[Figure: a possible run in which the radio goes off between approved = 1 and landing = 1]
28
Events in Multithreaded Programs
  • Given n threads $p_1, p_2, \ldots, p_n$,
  • a multithreaded execution is a sequence of events
    $e_1 e_2 \ldots e_r$ of type
  • internal, or
  • read of a shared variable, or
  • write of a shared variable.
  • $e_i^j$ represents the j-th event generated by thread
    $p_i$ since the start of its execution.

29
Causality in Multithreaded Programs
  • Define the partial order $\prec$ on the set of events
    as follows:
  • $e_i^k \prec e_i^l$ if $k < l$
  • $e \prec e'$ if there is some $x \in S$ such that $e <_x e'$
    and at least one of $e, e'$ is a write
  • $e \prec e''$ if $e \prec e'$ and $e' \prec e''$
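A direct, quadratic-time way to compute this partial order from an observed trace is sketched below, mainly as a reference against which vector clocks can be checked. It assumes the trace is the observed total order of events, so that $e <_x e'$ means e occurs before e' and both access x; the event encoding (thread, kind, variable) and the function name are hypothetical.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

# (thread, kind, variable): kind is "internal", "read" or "write"; variable is None for internal events
Event = Tuple[int, str, Optional[str]]


def causality(trace: List[Event]) -> Set[Tuple[int, int]]:
    """Pairs (a, b) of trace positions such that trace[a] precedes trace[b]."""
    n = len(trace)
    rel: Set[Tuple[int, int]] = set()
    for a, b in product(range(n), repeat=2):
        if a >= b:
            continue
        ta, ka, xa = trace[a]
        tb, kb, xb = trace[b]
        program_order = ta == tb                                         # first rule
        conflict = xa is not None and xa == xb and "write" in (ka, kb)  # second rule
        if program_order or conflict:
            rel.add((a, b))
    for k in range(n):                                                 # third rule: transitive closure
        for a in range(n):
            if (a, k) in rel:
                for b in range(n):
                    if (k, b) in rel:
                        rel.add((a, b))
    return rel
```

Two reads of the same variable create no edge, which is why threads that only read shared data can be reordered freely in the predicted runs.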

30
Vector Clocks and Relevant Events
  • Consider a subset R of relevant events
  • (typically those writing specification
    variables)
  • R-relevant causality is a relation $\lhd \subseteq \prec$:
  • $\lhd$ is the projection of $\prec$ on $R \times R$.
  • We provide a technique based on vector clocks
    that correctly implements the relevant causality
    relation.

31
Vector Clock Algorithm
  • Let $V_i$ be an n-dimensional vector of natural
    numbers for each thread $p_i$.
  • Let $V^a_x$ (access) and $V^w_x$ (write) be vectors for each shared
    variable x.
  • if $e_i^k$ is relevant, i.e., if $e_i^k \in R$, then
  • $V_i[i] \leftarrow V_i[i] + 1$
  • if $e_i^k$ is a read of a variable x then
  • $V_i \leftarrow \max\{V_i, V^w_x\}$
  • $V^a_x \leftarrow \max\{V^a_x, V_i\}$
  • if $e_i^k$ is a write of a variable x then
  • $V^w_x \leftarrow V^a_x \leftarrow V_i \leftarrow \max\{V^a_x, V_i\}$
  • if $e_i^k$ is relevant then
  • send message $\langle e_i^k, i, V_i \rangle$ to the observer.
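The four vector-clock rules translate almost line by line into the Python sketch below. It assumes threads numbered 0 to n-1, shared variables identified by name, all clocks starting at zero, and that the caller states whether an event is relevant; the `VectorClocks` class and its `event` method are illustrative names, not JMPaX's API.

```python
from typing import Dict, List, Optional, Tuple


def vmax(a: List[int], b: List[int]) -> List[int]:
    """Componentwise maximum of two vector clocks."""
    return [max(x, y) for x, y in zip(a, b)]


class VectorClocks:
    """Sketch of the instrumentation rules above for n threads numbered 0..n-1."""

    def __init__(self, n: int):
        self.n = n
        self.V = [[0] * n for _ in range(n)]     # V_i for each thread
        self.VA: Dict[str, List[int]] = {}       # V^a_x: access clock of x
        self.VW: Dict[str, List[int]] = {}       # V^w_x: write clock of x
        self.sent: List[Tuple[str, int, List[int]]] = []  # messages <e, i, V_i> to the observer

    def _zero(self) -> List[int]:
        return [0] * self.n

    def event(self, i: int, name: str, kind: str = "internal",
              var: Optional[str] = None, relevant: bool = False) -> None:
        if relevant:                              # rule 1: tick own component
            self.V[i][i] += 1
        if kind == "read":                        # rule 2: learn from the last write of x
            self.V[i] = vmax(self.V[i], self.VW.get(var, self._zero()))
            self.VA[var] = vmax(self.VA.get(var, self._zero()), self.V[i])
        elif kind == "write":                     # rule 3: order after every earlier access of x
            new = vmax(self.VA.get(var, self._zero()), self.V[i])
            self.V[i], self.VA[var], self.VW[var] = new, list(new), list(new)
        if relevant:                              # rule 4: report to the observer
            self.sent.append((name, i, list(self.V[i])))
```

For instance, treating each statement as a read of x followed by a relevant write, the interleaving T1: x++, T2: z = x + 1, T1: y = x + 1, T2: x++ makes the instrumentation send the clocks [1, 0], [1, 1], [2, 0] and [1, 2], in that order.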

32
Correspondence with Standard Vector Clocks
33
Implementing Causality by Vector Clocks
  • Theorem: If $\langle e, i, V \rangle$ and $\langle e', j, V' \rangle$ are
    messages sent by our algorithm, then
  • $e \lhd e'$ iff $V[i] \le V'[i]$
  • If i and j are not given, then
  • $e \lhd e'$ iff $V < V'$
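Reading the comparison in the first form of the theorem as $V[i] \le V'[i]$, the two causality tests can be written as below. The message tuples in the check are sample values for one interleaving of two threads, and the function names are illustrative.

```python
from typing import List, Tuple

Message = Tuple[str, int, List[int]]  # <event, thread index i, vector clock V>


def precedes(m: Message, m2: Message) -> bool:
    """e <| e' for two distinct messages: compare only the entry of e's own thread."""
    _, i, v = m
    _, _, v2 = m2
    return v[i] <= v2[i]


def vc_less(v: List[int], v2: List[int]) -> bool:
    """V < V': componentwise <= and not equal (used when thread indices are unknown)."""
    return all(a <= b for a, b in zip(v, v2)) and v != v2
```

When the sender's index is known, the first test compares a single entry, so it costs O(1) per pair instead of the O(n) needed for the full comparison.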

34
Example with Two Threads
  • thread T1
      x++;
      ...
      y = x + 1;
  • thread T2
      z = x + 1;
      ...
      x++;

(initially x = -1)
35
Relevant Global State
  • The program state after the events
    $e_1^{k_1}, e_2^{k_2}, \ldots, e_n^{k_n}$ is called a relevant global
    multithreaded state, or simply a state.
  • A state $\Sigma^{k_1 k_2 \ldots k_n}$ is called consistent if and
    only if it can be seen in some possible run of
    the system.

36
MultiThreaded Run
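  • $e_1 e_2 \ldots e_R$ is a multithreaded run iff it
    generates a sequence of global states $\Sigma^{K_0} \Sigma^{K_1} \ldots
    \Sigma^{K_R}$ such that
  • each $\Sigma^{K_r}$ is consistent, and
  • for $1 \le r \le R$, $\Sigma^{K_{r-1}}$ after event $e_r$ becomes $\Sigma^{K_r}$
  • (consecutive states)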

37
Computation Lattice
  • We say $\Sigma \leadsto \Sigma'$ when there is some run in which $\Sigma$
    and $\Sigma'$ are consecutive states
  • Consistent global states together with the
    transitive closure of $\leadsto$ form a lattice
  • Multithreaded runs are paths in the lattice

38
Example Revisited
  • thread T1
      x++;
      ...
      y = x + 1;
  • thread T2
      z = x + 1;
      ...
      x++;

39
Monitoring Safety Formula
$(x > 0) \rightarrow [(y = 0), (y > z))_s$
40
Safety Violation in a Possible Run
$(x > 0) \rightarrow [(y = 0), (y > z))_s$
41
Past Time Linear Temporal Logic Syntax
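  • $F ::= true \mid false \mid a \in A \mid \neg F \mid F \; op \; F$
    (propositional ops)
  • $\mid \odot F \mid \Diamond F \mid \boxdot F \mid F \, \mathcal{S}_s \, F \mid F \, \mathcal{S}_w \, F$
    (standard ops)
  • $\mid \uparrow F \mid \downarrow F \mid [F, F)_s \mid [F, F)_w$
    (monitoring ops)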

42
Semantics
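  • For a trace $t = s_1 s_2 \ldots s_n$ of length n, $t_{n-1}$ denotes
    its prefix of length $n-1$
  • $t \models \Diamond F$ iff $t \models F$ or ($n > 1$ and $t_{n-1} \models \Diamond F$)
  • $t \models \boxdot F$ iff $t \models F$ and ($n > 1$ implies $t_{n-1} \models \boxdot F$)
  • $t \models F_1 \, \mathcal{S}_s \, F_2$ iff $t \models F_2$ or ($n > 1$ and $t \models F_1$
    and $t_{n-1} \models F_1 \, \mathcal{S}_s \, F_2$)
  • $t \models F_1 \, \mathcal{S}_w \, F_2$ iff $t \models F_2$ or ($t \models F_1$ and ($n > 1$
    implies $t_{n-1} \models F_1 \, \mathcal{S}_w \, F_2$))
  • $t \models [F_1, F_2)_s$ iff $t \not\models F_2$ and ($t \models F_1$ or ($n > 1$
    and $t_{n-1} \models [F_1, F_2)_s$))
  • $t \models [F_1, F_2)_w$ iff $t \not\models F_2$ and ($t \models F_1$ or ($n > 1$
    implies $t_{n-1} \models [F_1, F_2)_w$))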
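Because each operator's value on a trace of length n depends only on the current state and on values for the prefix of length n-1, the semantics above can be monitored online with one boolean per subformula, a dynamic-programming scheme. Below is a minimal Python sketch; the tuple encoding of formulas and operator names such as `once` and `interval_s` are our own illustration, $\uparrow$, $\downarrow$ and $\odot$ are omitted, and the example assumes y and z start at 0.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]
Formula = Tuple  # nested tuples: ("atom", predicate) or (operator, child, ...)


def atom(pred: Callable[[State], bool]) -> Formula:
    return ("atom", pred)


class PtLTLMonitor:
    """Online monitor for the past-time operators defined above; one step
    evaluates every subformula once, using only the current state and the
    subformula values from the previous step."""

    def __init__(self, formula: Formula):
        self.formula = formula
        self.n = 0                       # number of states seen so far
        self.prev: Dict[int, bool] = {}  # subformula id -> value at the previous state

    def step(self, s: State) -> bool:
        self.n += 1
        first = self.n == 1
        now: Dict[int, bool] = {}

        def ev(g: Formula) -> bool:
            key = id(g)
            if key in now:               # shared subformula already evaluated
                return now[key]
            op = g[0]
            if op == "atom":
                v = bool(g[1](s))
            else:
                a = [ev(c) for c in g[1:]]  # evaluate all children (no short-circuit)
                p = self.prev.get(key, False)
                if op == "not":
                    v = not a[0]
                elif op == "and":
                    v = a[0] and a[1]
                elif op == "or":
                    v = a[0] or a[1]
                elif op == "implies":
                    v = (not a[0]) or a[1]
                elif op == "once":        # eventually in the past
                    v = a[0] or (not first and p)
                elif op == "always":      # always in the past
                    v = a[0] and (first or p)
                elif op == "since_s":     # F1 S_s F2
                    v = a[1] or (not first and a[0] and p)
                elif op == "since_w":     # F1 S_w F2
                    v = a[1] or (a[0] and (first or p))
                elif op == "interval_s":  # [F1, F2)_s
                    v = (not a[1]) and (a[0] or (not first and p))
                elif op == "interval_w":  # [F1, F2)_w
                    v = (not a[1]) and (a[0] or first or p)
                else:
                    raise ValueError(f"unknown operator: {op!r}")
            now[key] = v
            return v

        result = ev(self.formula)
        self.prev = now
        return result


# (x > 0) -> [(y = 0), (y > z))_s
phi = ("implies", atom(lambda s: s["x"] > 0),
       ("interval_s", atom(lambda s: s["y"] == 0), atom(lambda s: s["y"] > s["z"])))
```

Starting from x = -1, y = z = 0, the run T1: x++, T2: z = x + 1, T1: y = x + 1, T2: x++ keeps phi true in every state, while the run T1: x++, T1: y = x + 1, T2: z = x + 1, T2: x++ makes it false in the last state: y > z held when y was written, which closed the interval before x became positive.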

43
Safety Against All Runs
  • Number of possible runs can be exponential
  • Traverse the state lattice level by level
  • Avoids analyzing an exponential number of runs
  • Maintain a queue of events
  • Enqueue an event as soon as it arrives
  • Construct a new level from the set of states in
    the previous level and the events in the queue
  • Monitor safety formula against all states in a
    level using dynamic programming and intelligent
    merging.

44
Algorithm Pseudocode
      for each (e ∈ Q)
        if exists s ∈ CurrentLevel s.t. isNextState(s, e) then
          NextLevel ← addToSet(NextLevel, createState(s, e))
          if isUnnecessary(s) then
            remove(s, CurrentLevel)
        if isEmpty(CurrentLevel) then
          monitorAll(NextLevel)
          CurrentLevel ← NextLevel; NextLevel ← ∅
          Q ← removeUnnecessaryEvents(CurrentLevel, Q)
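An offline simplification of the level-by-level construction is sketched below, assuming every thread's relevant events and their vector clocks are already known. The queue Q, the isUnnecessary pruning and monitorAll are left out, and a state is represented by how many relevant events of each thread it contains; `is_next_state` plays the role of isNextState using the usual consistent-cut test on vector clocks. Function names are illustrative.

```python
from typing import List, Set, Tuple

Cut = Tuple[int, ...]  # (k_1, ..., k_n): relevant events taken from each thread
Clock = List[int]


def is_next_state(cut: Cut, i: int, clocks: List[List[Clock]]) -> bool:
    """True if thread i's next relevant event extends `cut` to a consistent state.

    clocks[i][k] is the vector clock of thread i's (k+1)-th relevant event."""
    k = cut[i]
    if k >= len(clocks[i]):
        return False
    v = clocks[i][k]
    # Every relevant event the new one depends on must already be in the cut.
    return all(v[j] <= cut[j] for j in range(len(cut)) if j != i)


def lattice_levels(clocks: List[List[Clock]]) -> List[Set[Cut]]:
    """Consistent states grouped by level (level = total number of events consumed)."""
    n = len(clocks)
    current: Set[Cut] = {tuple([0] * n)}
    levels = [current]
    while True:
        nxt: Set[Cut] = set()
        for cut in current:
            for i in range(n):
                if is_next_state(cut, i, clocks):
                    succ = list(cut)
                    succ[i] += 1
                    nxt.add(tuple(succ))  # a set merges states reached along different paths
        if not nxt:
            return levels
        levels.append(nxt)
        current = nxt
```

With the clocks [1, 0], [2, 0] for T1 and [1, 1], [1, 2] for T2, this yields the levels {(0,0)}, {(1,0)}, {(2,0), (1,1)}, {(2,1), (1,2)}, {(2,2)}, so the lattice has width 2. Only two levels are alive at any time, which is what keeps memory proportional to the lattice width rather than to the number of runs.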

45
Complexity
  • Time complexity is $O(w \cdot 2^m \cdot n)$
  • w: width of the lattice
  • m: size of the formula
  • n: length of the run
  • Memory used is $O(w \cdot 2^m)$
  • w: width of the lattice
  • m: number of temporal operators in the formula
  • Further optimizations
  • Consider bounded width w of queue Q

46
Computation Lattice Width
  • The number of states in a level can be large
  • Observe that not all states are equally probable
  • Ignore lattice states that are formed by
    events far apart from each other
  • Distance can be measured
  • in terms of real time between the events
  • by some notion of distance between vector clocks
  • e.g., Euclidean distance
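The slide leaves the exact pruning rule open. One hypothetical reading is sketched below: keep a state only if the vector clocks of the events forming it (for example, the latest relevant event of each thread) are pairwise within a Euclidean distance `max_dist`, where `max_dist` is a tuning parameter introduced here for illustration only.

```python
import math
from itertools import combinations
from typing import List, Sequence


def euclidean(v: Sequence[int], w: Sequence[int]) -> float:
    """Euclidean distance between two vector clocks of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


def keep_state(frontier: List[Sequence[int]], max_dist: float) -> bool:
    """Hypothetical pruning rule: keep a lattice state only if the clocks of
    the events forming it are pairwise within max_dist of each other."""
    return all(euclidean(v, w) <= max_dist for v, w in combinations(frontier, 2))
```

Pruning this way trades coverage for speed: states built from events far apart in the run are skipped, on the assumption that they are unlikely to occur in practice.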

47
Causality Cone Heuristics
48
JMPaX Architecture
49
Further Applications
  • Security
  • Security policies as safety requirements
  • Predict safety violations efficiently!

$\uparrow communicate(A,B,K) \rightarrow \Diamond(sendKey(S,(A,B),K) \wedge \Diamond requestKey(S,A,B))$
50
Future Work
  • Evaluate JMPaX and DiAna on real, large
    applications
  • Investigate more programmer-friendly and
    expressive logics
  • Extend EAGLE logic (NASA Ames)
  • Add more epistemic operators
  • Find techniques to increase coverage of analysis
  • Use Machine Learning
  • Apply Statistical Analysis
  • Investigate efficient instrumentation techniques