SRC Task 1031.001: Verification of Shared Memory Consistency (Models and) Protocols. Start Date: September 2002. Third Year Annual Review, Boulder CO, March 29, 2005. Ganesh Gopalakrishnan (PI), Konrad Slind (Co-PI)

1
SRC Task 1031.001: Verification of Shared Memory
Consistency (Models and) Protocols
Start Date: September 2002
Third Year Annual Review, Boulder CO, March 29, 2005
Ganesh Gopalakrishnan (PI), Konrad Slind (Co-PI)
2
Released in 2000
  • Peak performance: 12.3 teraflops
  • Processors used: IBM RS/6000 SP Power3's, 375 MHz
  • There are 8,192 of these processors
  • The total amount of RAM is 6 TB
  • Two hundred cabinets, covering the area of two
    basketball courts
(Article by Nebojsa Novakovic, Thursday 16
October 2003) NOVA HAS been to the Microprocessor
Forum and captured this picture of POWER5 chief
scientist Balaram Sinharoy holding this eight-way
POWER5 MCM with a staggering 144MB of cache.
Sheesh Kebab! 8 x 2 cpus x 2-way SMT = 32
shared memory cpus on the palm
3
  • Cannot afford to do eager updates across large
    SMP systems
  • Delayed updates allow considerable latitude in
    memory consistency protocol design
  • → Weak memory models tend to reduce
    overall protocol complexity
  • → Future challenges:
  • Multiple protocols that cooperate
  • Multicore chips

[Figure axes: performance vs. time]
4
  • Verification of shared memory consistency
    protocols
  • Emphasis is on correctness in two broad areas
  • Shared Memory Consistency Models
  • Also known as Memory Models
  • Shared Memory Consistency Protocols
  • Subsumes Cache Coherence protocols
  • Demonstrate results on realistic memory models as
    well as consistency protocols

5
  • Protocols modeled at the asynchronous rule
    level
  • Assume that these are highly optimized
    hand-crafted protocols for which synthesis is not
    (yet) an option
  • We have yet to work on protocol engine (hardware)
    verification

6
  • 15-minute overview
  • Then 15-minutes of a few specific details
  • Finally an Appendix
  • Presented in Chronological Order, emphasizing
  • Year of task performance (Y1, Y2, or Y3) in slide
    heading
  • Student involvement, graduation, and employment
  • How much supported by SRC
  • Highlights (papers, ideas, and tool development)
  • Slides with Y3 in the title include new work
    since the last review

7
8
Overview of Results to Date
9
How our students have benefited from SRC Funding
(some students funded under SRC, others under a
contemporaneous NSF grant with complementary
goals)
10
  • See previous annual review slides for details
  • Work in progress now
  • Distributed Random-walk does not record error
    trails
  • Emphasis: high state-generation rate and bug
    location
  • Error trails will be generated using Bounded
    Model Checking
  • Symbolic techniques to reconstruct error trails,
    capitalizing on knowledge of the error state
  • Student involved: Xiaofang Chen (PhD student just
    shifting from systems research to FV)

11
  • Operational Approach (see appendix)
  • Abstract Machine Models
  • Axiomatic Approach (see appendix)
  • Constraints that Characterize Legal Executions
  • Theoretical understanding (see Sezgin's
    dissertation)
  • Definition of memory models using transducers;
    clarification of widely publicized undecidability
    results

12
2.3 A fully SAT-based approach (Y3)
  • Yu Yang (≠ Yue Yang), Hemanthkumar Sivaraj, the
    PI
  • A parser for value-annotated Itanium MP assembly
    programs
  • A tool called MPEC to check value-annotated
    Itanium assembly programs against the Itanium
    memory model
  • Generates SAT instances from the given execution
    trace and the memory model rules
  • If SAT, produce a witness (interleaving)
  • If UNSAT, extract evidence (UNSAT core: draw a
    cycle for the user, annotated with which memory
    model rules were violated)
  • Highlights
  • The MPEC tool (described next slide) was released
  • Papers: CAV04
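
The SAT / UNSAT question described above can be illustrated at toy scale. The sketch below is not MPEC: it assumes sequential consistency as a stand-in for the Itanium rules and replaces the SAT encoding with brute-force search over interleavings. The function name `find_witness` and the tuple encoding of operations are hypothetical.

```python
from itertools import permutations

def find_witness(threads, init=0):
    """Return an interleaving that explains every annotated load value under
    sequential consistency (an assumed stand-in model), or None if none exists.
    Each thread is a list of (kind, address, value) with kind 'st' or 'ld'."""
    ops = [(t, i) for t, prog in enumerate(threads) for i in range(len(prog))]
    for order in permutations(ops):
        next_index = [0] * len(threads)
        memory = {}
        legal = True
        for t, i in order:
            if i != next_index[t]:          # order violates program order
                legal = False
                break
            next_index[t] += 1
            kind, addr, value = threads[t][i]
            if kind == "st":
                memory[addr] = value
            elif memory.get(addr, init) != value:
                legal = False               # load saw a value this order cannot give
                break
        if legal:
            return [(t,) + threads[t][i] for t, i in order]
    return None

# Store-buffering litmus test: both loads are annotated with the value 0.
sb = [[("st", "x", 1), ("ld", "y", 0)],
      [("st", "y", 1), ("ld", "x", 0)]]

# Same program, but thread 1's load observes thread 0's store.
sb_ok = [[("st", "x", 1), ("ld", "y", 0)],
         [("st", "y", 1), ("ld", "x", 1)]]

no_witness = find_witness(sb)      # the "UNSAT" outcome: no legal interleaving
witness = find_witness(sb_ok)      # the "SAT" outcome: an interleaving is returned
```

Enumeration grows factorially with trace length, which is one way to see why longer executions call for a SAT encoding or a debugging-style approach instead.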

13
14
2.3 contd
  • What exactly has been achieved?
  • How to specify industrial memory models in higher
    order logic
  • How well does SAT-based Checking of MP Executions
    work/scale
  • How to extract Annotated Execution Cycles from
    UNSAT Cores
  • A Preliminary Assessment of QBF (yet to continue)
  • Code released to check Itanium Executions
  • 1700 lines of OCaml
  • 5600 lines of C

15
2.3 contd
  • Potential uses / impact
  • MPEC (MP Execution Checker) can help understand
    the Itanium memory model through litmus tests
  • Scaling to larger execution lengths will require
    either
  • Debugging approach (not full verification) --
    like the Sun TSO-Tool
  • Considerable SAT engineering needed (not
    justified now, unless a member company shows
    definite interest)
  • An MS student (Oystein Thorsen) at Michigan
    Technological University is porting MPEC to work
    for the Unified Parallel C memory model
  • UPC is used to write shared memory code for
    Scientific Programming
  • Collaboration with Lisa Higham (University of
    Calgary) initiated

16
3 Partial Order enabled Murphi (Bhattacharya's
PhD): much of the work during Y3
PO Reduction through SAT-based Symbolic Analysis
i 02, a : 0..3, b : array[0..2] of bool
Rule 1: True → a := (a + 1) % 4
Rule 2: True → a := (a + 2) % 4
Rule 3: i > 0 → b[i1] := True
Rule 4: i < 0 → b[i2] := False
17
3 POeM (..contd)
  • What exactly has been achieved ?
  • New tool POeM (Partial Order Enabled Murphi)
    developed and is available
  • 2900 lines of Common Lisp and 1400 lines of Perl
  • Technical Report detailing algorithm
  • One paper under submission (DAC04); others in
    preparation
  • A useful by-product of POeM
  • A tool Mu2SMV to translate Murphi to SMV (Uclid
    is similar to SMV, so would be able to re-target)
  • It can help create SMV / Uclid models
  • It also helps conduct more objective
    explicit-state vs. implicit-state
    model-checking studies

18
3 contd
19
3 Specific Contributions
  • New Ample-set Computation Algorithm for Murphi
  • Basically reported during Y2; novelties during Y3
    are
  • The POeM Tool has been released
  • being tried at Intel by Park and IBM by German
  • It employs a novel SAT Carry-over idea
  • A new C1 condition will soon be incorporated
  • In the details section, we will focus on
  • SAT Carry-over (discussed as topic 3.1), and
  • The new C1 condition (discussed as topic 3.2)

20
6 Parameterized Verification of Coherence
Protocols (work done entirely during Y3)
  • Sudhindra Pandav's MS work
  • Counterexample Guided Invariant Discovery method
  • Tailored for Cache Coherence Protocols
  • Takes advantage of the guard → action style of
    specification, and the syntax of the property
    being verified
  • Filtering Heuristics to eliminate irrelevant
    predicates are based on the nature of
    directory-based protocols
  • All experiments done in the UCLID framework
  • Can be performed in any system that decides over
    a similar fragment of logic and generates
    concrete counterexamples

21
6 ..contd
  • What exactly has been achieved?
  • Methodology for Parameterized Cache Protocol
    Verification based on Counter-example Guided
    Invariant Discovery
  • Worked straight out of the box on a new
    protocol called the German Ring protocol
  • Also applied to the German protocol and the FLASH
    protocol (modeling both mutual exclusion and data
    consistency)
  • 9 invariants for the mutual exclusion property of
    German; 2 more for data consistency (29 for
    Lahiri)
  • 7 invariants for mutex of FLASH; 15 more for
    data consistency (more frugal invariants than in
    Park's work using PVS)
  • 2 days to model and verify the new German Ring
    protocol

22
23
  • Accept a guarded command system of transitions
  • G1 → A1
  • G2 → A2
  • Gn → An
  • Build independence matrix by checking enabledness
    and commutativity via SAT-based analysis
  • At run-time, pick a transition and build its
    dependence closure
  • Check C0, C1, C2, C3 conditions of the CGP book
    and generate reduced state-space
  • C1 is approximated

Enabledness plus commutativity
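
A minimal sketch of the enabledness-plus-commutativity check, assuming a state space small enough that exhaustive enumeration can stand in for the SAT-based analysis. The three-rule system and the names `RULES`, `independent`, and `matrix` are hypothetical, chosen only to show one independent pair and two dependent pairs.

```python
from itertools import combinations, product

# Assumed toy state: (i, a) with i in 0..2 and a in 0..3.
STATES = list(product(range(3), range(4)))

# Each rule is a (guard, action) pair over a state.
RULES = {
    "inc1":  (lambda s: True,     lambda s: (s[0], (s[1] + 1) % 4)),
    "inc2":  (lambda s: True,     lambda s: (s[0], (s[1] + 2) % 4)),
    "reset": (lambda s: s[0] > 0, lambda s: (s[0] - 1, 0)),
}

def independent(r1, r2):
    """Two rules are independent if, wherever both are enabled, neither
    disables the other and both firing orders reach the same state."""
    (g1, a1), (g2, a2) = RULES[r1], RULES[r2]
    for s in STATES:
        if g1(s) and g2(s):
            if not (g2(a1(s)) and g1(a2(s))):   # enabledness
                return False
            if a2(a1(s)) != a1(a2(s)):          # commutativity
                return False
    return True

# Independence matrix over unordered pairs of rules.
matrix = {pair: independent(*pair) for pair in combinations(RULES, 2)}
```

Here `inc1` and `inc2` commute because modular addition is order-insensitive, while `reset` overwrites `a` and therefore depends on both.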
24
3.1 Carry-Over of Independence Relations
  • When does it work?
  • Systems parameterized over scalarset variables
  • Rulesets based on system parameters
  • Variables parameterized on system parameters
  • Proof, details - BGG05

25
3.2 Improved Ample Set Heuristic
  • Ideal: no path via the green triangle will ever
    wake up the disabled red transitions
  • Existing naïve C1
  • Make sure that the red triangle is empty
  • Causes too many C1 condition violations (16k in
    German)
  • Improved heuristic
  • Ensure that the red ones can be woken up ONLY by
    the firing of the blue triangle of transitions
  • Can pre-compute it, like independence
  • Experimental results are being awaited!

[Figure labels: transition t; t's dependency closure; disabled dependents; enabled independents]
26
The POeM Tool and Follow-on Efforts
  • The POeM tool will be engineered for higher
    efficiency
  • We will be studying what other property might
    carry over similar to independence

27
6 Counterexample Guided Invariant Discovery:
details of the work
[Flowchart labels: P_unproven; P_proven; done; P_unproven (Y / N test); Add P; Auxiliary Invariant; Pick a property P from P_unproven; Automated Decision Procedure D; Counterexample Analysis Procedure; System model M; counterexample]
28
6 Highlights of the work
  • Exploits structure of transition system
    specification
  • Rules of the form guard → action (or g → a)
    are assumed
  • Exploits nature of property to be proved
  • Properties of the form Antecedent ⇒ Consequent
  • How the method works
  • Start with most general state
  • Symbolically simulate a transition
  • Analyze failures exploiting structure of
    specifications
  • Construct auxiliary invariant
  • If strengthening involves too large a formula,
    use filtering heuristics to keep it small
  • Filtering heuristics exploit nature of directory
    protocols
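
The loop of proving a property, analyzing a failure, and adding an auxiliary invariant can be sketched on a toy system. Everything here is an assumption for illustration: the two-process lock system, exhaustive enumeration standing in for the decision procedure, and a placeholder analysis that simply excludes the offending pre-state instead of using the structure-based heuristics described above.

```python
from itertools import product

STATES = list(product((0, 1), repeat=3))            # (a, b, lock)
INIT = (0, 0, 0)
RULES = [                                           # (guard, action) pairs
    (lambda s: s[2] == 0, lambda s: (1, s[1], 1)),  # a enters
    (lambda s: s[2] == 0, lambda s: (s[0], 1, 1)),  # b enters
    (lambda s: s[0] == 1, lambda s: (0, s[1], 0)),  # a exits
    (lambda s: s[1] == 1, lambda s: (s[0], 0, 0)),  # b exits
]

def mutex(s):
    return not (s[0] and s[1])

def step_counterexample(prop, assumed):
    """Toy decision procedure: return a pre-state that satisfies every
    assumed invariant but has a successor violating prop, else None."""
    for s in STATES:
        if all(inv(s) for inv in assumed):
            for guard, action in RULES:
                if guard(s) and not prop(action(s)):
                    return s
    return None

def discover(prop):
    if not prop(INIT):
        raise ValueError("property fails in the initial state")
    unproven, proven = [prop], []
    while unproven:
        p = unproven.pop(0)
        bad = step_counterexample(p, proven + unproven + [p])
        if bad is None:
            proven.append(p)
            continue
        if bad == INIT:
            raise ValueError("counterexample starts in the initial state")
        # Placeholder analysis: rule out exactly the offending pre-state.
        unproven += [p, lambda s, bad=bad: s != bad]
    return proven

invariants = discover(mutex)
```

On this system the loop ends with `mutex` plus two auxiliary invariants, each inductive relative to the conjunction of all three.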

29
Structure of a counterexample
  • Formally, a counterexample C can be expressed as
    a tuple ⟨ss, d, st⟩, where
  • ss is the initial state interpretation
  • st is the next state interpretation
  • d is the transition rule of form g → a
  • Depending on the structure of the counterexample,
    we construct an auxiliary invariant from
  • those predicates in the property P, which have
    been violated
  • predicates from the guard of the rule involved,
    and ITE (if-then-else) conditions in the action

30
Counterexample cases
  • Counterexample C = ⟨ss, d, st⟩
  • Property P: ∀X. A(X) ⇒ C(X)
  • Since the initial state satisfies the property
    and the next state violates it, we can classify
    counterexamples into three different classes.
  • (ss ⊨ A, ss ⊨ C) → (st ⊨ A, st ⊨ ¬C)
  • (ss ⊨ ¬A, ss ⊨ ¬C) → (st ⊨ A, st ⊨ ¬C)
  • (ss ⊨ ¬A, ss ⊨ C) → (st ⊨ A, st ⊨ ¬C)
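
The three classes follow from two facts: the pre-state satisfies A ⇒ C, which rules out (A, ¬C), and the post-state violates it, which forces (A, ¬C). A small sketch of the classification, where the function name `classify` and the label "III" for the third class are assumptions:

```python
def classify(ss_A, ss_C, st_A, st_C):
    """Classify a counterexample to A => C from the truth values of the
    antecedent A and consequent C in the pre-state ss and post-state st."""
    if ss_A and not ss_C:
        raise ValueError("pre-state already violates A => C")
    if not (st_A and not st_C):
        raise ValueError("post-state does not violate A => C")
    if ss_A:                         # ss_C must hold here
        return "I"
    return "III" if ss_C else "II"

case_1 = classify(True, True, True, False)     # consequent was falsified
case_2 = classify(False, False, True, False)   # antecedent was made true
case_3 = classify(False, True, True, False)    # both changed
```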

31
Case I: (ss ⊨ A, ss ⊨ C) → (st ⊨ A, st ⊨ ¬C)
The transition assigned a variable of the consequent;
suppress this when the antecedent is initially
true.
[Diagram: state s (A, C) steps by rule g to state t (A, ¬C); SC(A, ss) and SC(g, ss) are extracted from s; goal A ⇒ ¬g]
  • Candidate invariant generated
  • SC(A, ss) ⇒ ¬SC(g, ss)
  • where SC is the satisfying core under
    interpretation ss.
  • SC can be easily computed from the structure
    of the formula
  • SC includes relevant predicates

32
Case II: (ss ⊨ ¬A, ss ⊨ ¬C) → (st ⊨ A, st ⊨ ¬C)
The transition assigned a variable of the antecedent;
suppress this when the consequent is initially
false.
[Diagram: state s (¬A, ¬C) steps by rule g with action a to state t (A, ¬C); the action retains ¬C; VC(C, ss) and SC(g, ss) are extracted from s; goal g ⇒ C]
  • Candidate invariant generated
  • VC(C, ss) ⇒ ¬SC(g, ss)
  • where VC is the violating core under
    interpretation ss.
  • VC can be easily computed from the structure
    of the formula
  • Pandav's thesis explains more general versions of
    these ideas

33
Example from the German protocol
  • Property P: cache(i) = ex ⇒ cache(j) = in
  • State s: cache(j) = ex, cache(i) = in,
    ch2(i) = grant_ex
  • Rule fired with cid = i: guard g is
    ch2(cid) = grant_ex, action a is cache(cid) := ex
  • State t: cache(i) = ex, cache(j) = ex
  • Aux. invariant generated:
    (cache(j) ≠ in) ⇒ (ch2(i) ≠ grant_ex)
34
Other Heuristics: Consistency Requirement
  • Verifying predicate of form (p = r) in the
    consequent.
  • Counterexample of form: in state s, guard g holds
    and q ≠ r; the action p := q leads to state t
    with p ≠ r
  • Candidate invariant of form g ⇒ (q = r)
  • This simple heuristic was very useful in data
    consistency verification of GERMAN and FLASH.
35
Filtering Heuristics
  • Guards of transition rules are complex,
    containing many predicates.
  • The more predicates in an auxiliary invariant,
    the more counterexamples
  • Need heuristics to filter out irrelevant
    predicates
  • Used in data consistency verification of FLASH
  • Heuristics are protocol dependent
  • Observed to be successful in handling three
    directory based protocols
  • German, German Ring, FLASH
  • Mutual Exclusion and Data Consistency verified

36
Filtering Heuristics (contd.)
  • Based on rule-type that triggered counter-example
  • Directory protocols employ P-rules and N-rules
  • P-rule: initiated by the requesting node
  • N-rule: initiated by messages from the network
    (remote node)
  • Further classified into Remote Requests and
    Grants
  • Our method describes heuristics based on
  • message-type involved in counter-examples, and
  • a ranking of variables

37
38
German Ring mutex verification
  • Property to be verified
  • (i ≠ j) ⇒ ¬(cache(i) = excl ∧ cache(j) = excl)
  • Result
  • 3 invariants.
  • UCLID time: 1.08s
  • User time: a day

39
Comparison: GERMAN
  • Mutual Exclusion (GERMAN-I)
  • Lahiri's earlier manual proof had 25 auxiliary
    invariants and took 47.93s of UCLID time.
  • Ours: 9 auxiliary invariants with 6.02s
  • Mutual exclusion (GERMAN-II)
  • Lahiri's indexed predicate discovery method to
    construct inductive invariant.
  • Automated: had 24 predicates, 143s UCLID time
  • Our method: manual, 13 predicates, 2.16s

40
Comparison: FLASH
  • Mutual Exclusion
  • Automated predicate discovery method based on
    computing weakest precondition (Lahiri) to
    generate inductive invariant
  • Initial set of predicates
  • From the property → failed to converge to a
    fixpoint
  • From our auxiliary invariants → converged to
    inductive invariant in 3 iterations.

41
Modeling tricks: divide rules
  • UCLID cannot model nested ITEs.
  • Transition rules contain nested ITEs
  • Trick: split the rules
  • R: if (c1) then b1 else if (c2) then b2
  • Split into two rules
  • R1: if (c1) then b1
  • R2: if (¬c1 ∧ c2) then b2
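
A quick truth-table check of the split, as a sketch. The encoding of rules as (guard, effect) pairs and the names `SPLIT`, `NAIVE`, and `matches` are hypothetical. It also shows why the second guard needs the negated first condition.

```python
from itertools import product

def nested_rule(c1, c2):
    # R: if c1 then b1 else if c2 then b2; None means "no assignment".
    return "b1" if c1 else ("b2" if c2 else None)

def fired(rules, c1, c2):
    # Set of assignments that some enabled rule could make.
    return {effect for guard, effect in rules if guard(c1, c2)}

SPLIT = [(lambda c1, c2: c1, "b1"),
         (lambda c1, c2: (not c1) and c2, "b2")]
NAIVE = [(lambda c1, c2: c1, "b1"),
         (lambda c1, c2: c2, "b2")]           # forgets the "not c1"

def matches(rules):
    """True if the rule set behaves like the nested ITE on every valuation."""
    for c1, c2 in product((False, True), repeat=2):
        expected = nested_rule(c1, c2)
        want = set() if expected is None else {expected}
        if fired(rules, c1, c2) != want:
            return False
    return True

split_ok = matches(SPLIT)
naive_ok = matches(NAIVE)
```

With the naive guards, both rules are enabled when c1 and c2 hold together, so the split system could assign b2 where the original rule assigns b1.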

42
Modeling tricks: quantified conditions in guards
  • UCLID doesn't allow quantifiers (∀, ∃) in the
    description.
  • Conditions like Qx. a(x) are present in the
    guards, where Q ∈ {∀, ∃}. Example: emptiness
    test of sh_list in GERMAN.
  • Consider a predicate of form ∃x. a(x)
  • To model this, we introduce an auxiliary boolean
    variable b. Replace ∃x. a(x) by b.
  • Then introduce an axiom of form ∀x. (a(x) ⇒ b)
  • Justification
  • b ⇔ ∃x. a(x)
  • ⇒ (∃x. a(x)) ⇒ b
  • ⇒ ∀x. (a(x) ⇒ b)
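
The last step of the justification, that (∃x. a(x)) ⇒ b and ∀x. (a(x) ⇒ b) say the same thing, can be checked by brute force. This sketch assumes a three-element index domain purely for illustration. The function names are hypothetical.

```python
from itertools import product

DOMAIN = range(3)   # assumed finite index domain, for illustration only

def exists_implies_b(a, b):
    # (exists x. a(x)) => b
    return (not any(a[x] for x in DOMAIN)) or b

def axiom(a, b):
    # forall x. (a(x) => b)
    return all((not a[x]) or b for x in DOMAIN)

# Every valuation of a(0..2) and b gives both forms the same truth value.
equivalent = all(
    exists_implies_b(a, b) == axiom(a, b)
    for a in product((False, True), repeat=len(DOMAIN))
    for b in (False, True)
)

# The axiom alone is weaker than b <=> exists x. a(x): it admits b = True
# even when no a(x) holds.
no_a = (False, False, False)
admits_spurious_b = axiom(no_a, True)
```

The axiom captures only one direction of the equivalence, so it over-approximates the original guard.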

43
Status of Invariant Discovery work
  • More experience is necessary with our invariant
    discovery approach
  • Assess how easy in practice on a variety of
    protocols
  • Make the process intuitive to a designer
  • Support front-ends such as tabular descriptions
    or scenarios
  • Reflect error-trails back onto these descriptions
  • Allow designers to understand verification state
    through controlled symbolic simulations

44
Concluding Remarks and Future Work
  • We are assembling a suite of examples including
    many public examples such as Grbic's multi-ring
    protocol
  • We will finish the work on the random-walk
    checker, coupling it with an error-trace
    generator.
  • We'd like to work on using scenarios in cache
    protocol design
  • We will begin our work on hierarchical cache
    protocols
  • We'd like to return to the post-Si verification
    problem of runtime verification under limited
    observability.
  • Checking Memory Orderings
  • Liveness (perhaps as bounded safety, focusing on
    architectural elements responsible for liveness
    violation?)
  • ABSTRACTION MECHANISMS!

45
46
2.1 The Operational Approach to memory model
specification (mostly finished Y1, Y2; one
dissertation finished Y3)
  • Developed by Yue Yang (PhD under the PI and
    Lindstrom)
  • Defended PhD June '04 and working for Microsoft
    since July '04
  • Developed the Uniform Memory Model based on
    abstract machines
  • Coded in Murphi (obtain executable memory model
    specs)
  • The UMM parameterized executable models can cover
    a whole range of memory models
  • Coherence, PRAM, and Manson / Pugh Java shared
    memory have all been specified in UMM
  • Papers
  • Concurrency and Computation: Practice and
    Experience (galley proofs of journal paper done)
  • Joint ACM Java Grande, 2002

47
2.1 contd
  • What exactly has been achieved?
  • Understanding of how to structure operational
    memory model specifications to cover wide range
    of memory models uniformly
  • Murphi code of 2800 lines demonstrating ideas
  • Potential uses
  • Starting-point for developers of new memory
    models
  • Our Itanium Operational model of ICCD99 could be
    specified in the UMM style if there is interest

48
2.2 contd
  • What exactly has been achieved?
  • Understanding of how to structure axiomatic
    memory model specifications to cover wide range
    of memory models uniformly
  • Constraint-Prolog code of 6000 lines
    demonstrating ideas
  • Potential uses
  • Understand memory models; experiment with them
    in the Constraint Logic-programming context
  • Can develop a memory-model sensitive race
    analyzer (ICFEM04)

49
2.2 The Axiomatic Approach (mostly finished Y1,
Y2; SAT approach, dissertation finished Y3)
  • Constraint Prolog programs for
  • Classical Memory Models (Processor Consistency,
    etc.)
  • Itanium (without semaphores, IO space ops, or
    partial writes)
  • Can add these if member companies interested
  • Feasibility of SAT-based solution demonstrated
    (coding by PI)
  • Highlights
  • The MPEC tool (described next slide) was a direct
    outcome
  • Papers: Charme03, ICFEM04, IPDPS04, CSJP04

50
3 A Symbolic PO Reduction Method for Rule-based
Specifications of Protocols (Y2, Y3)
  • Ritwik Bhattacharya's PhD work
  • The industry employs rule-based specifications
    for cache coherence protocols
  • Murphi (used by most industries)
  • TLA (used at least within Intel)
  • Rule-based specifications capture the parallel
    condition / action style specification of cache
    coherency engines in Protocol Tables
  • Specifies the concurrency in cache protocol
    engines naturally
  • Specifies the low-latency bursts of
    computation
  • Check local state and incoming messages
  • Produce outgoing messages and update local state

51
3 State Space Reduction Techniques in Murphi
  • Symmetry Reduction
  • Canonicalize based on Scalar Sets
  • Hash Compaction
  • Store only Hash Signatures of states
  • Partial-Order Reduction is not available in
    Murphi
  • In a nutshell
  • If N concurrent actions can all be done in any
    order,
  • then don't force the model checker to examine
    all N! interleavings
  • Difficult to syntactically determine when two
    rules commute
  • Existing PO Reduction algorithms are based on
  • Syntactic checking of commutation between two
    transitions
  • Exploiting Sequential Process Structure
  • These don't work with Murphi
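
The "in a nutshell" argument can be made concrete with a small count. This sketch assumes N = 4 fully independent actions, where action k sets bit k of the state. The names `run`, `finals`, and `one_path` are hypothetical.

```python
from itertools import permutations
from math import factorial

N = 4

def run(order):
    """Fire N independent actions (action k sets bit k) in the given order.
    Returns the final state and the list of states visited on the way."""
    state = (0,) * N
    visited = [state]
    for k in order:
        state = state[:k] + (1,) + state[k + 1:]
        visited.append(state)
    return state, visited

orders = list(permutations(range(N)))           # all N! interleavings
finals = {run(o)[0] for o in orders}            # distinct final states
all_states = {s for o in orders for s in run(o)[1]}
one_path = run(orders[0])[1]                    # a single representative order

interleavings = len(orders)
```

All 24 interleavings end in the same state. Full exploration visits 2^N = 16 states, while one representative order visits N + 1 = 5. That gap is what partial-order reduction aims to exploit when independence can be established.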

52
My favorite example explaining the C1 condition
[Diagram labels: Ch empty in this state; true; Ch!3; Ch?x; bug; some transition; Process P; Process Q]
  • Moving P then Q finds the bug (right ample set)
  • Moving Q then P will miss the bug! Reason:
    there is a transition outside of the ample set
    (namely Ch!3) whose move enables Ch?x, which is
    dependent on the true move (part of if/fi). So in
    the full state space there is a dependent move
    that occurs before the ample-set move.