SRC%20Task%201031.001%20%20Verification%20of%20Shared%20Memory%20Consistency%20(Models%20and)%20Protocols%20%20Start%20Date%20:%20September%202002%20%20Third%20Year%20Annual%20Review,%20Boulder%20CO,%20March%2029,%202005%20%20%20Ganesh%20Gopalakrishnan%20(PI)%20Konrad%20Slind%20(Co-PI%20)

About This Presentation

Title:

SRC%20Task%201031.001%20%20Verification%20of%20Shared%20Memory%20Consistency%20(Models%20and)%20Protocols%20%20Start%20Date%20:%20September%202002%20%20Third%20Year%20Annual%20Review,%20Boulder%20CO,%20March%2029,%202005%20%20%20Ganesh%20Gopalakrishnan%20(PI)%20Konrad%20Slind%20(Co-PI%20)

Description:

NOVA HAS been to the Microprocessor Forum and captured this ... Sheesh Kebab! 8 x 2 cpus x 2-way SMT = '32 shared memory cpus' on the palm. Released in 2000 ... – PowerPoint PPT presentation

Number of Views:107

Avg rating:3.0/5.0

Slides: 53

Provided by: csU51

Learn more at: https://www.cs.utah.edu

Category:

more less

Transcript and Presenter's Notes

Title: SRC%20Task%201031.001%20%20Verification%20of%20Shared%20Memory%20Consistency%20(Models%20and)%20Protocols%20%20Start%20Date%20:%20September%202002%20%20Third%20Year%20Annual%20Review,%20Boulder%20CO,%20March%2029,%202005%20%20%20Ganesh%20Gopalakrishnan%20(PI)%20Konrad%20Slind%20(Co-PI%20)

1
SRC Task 1031.001 Verification of Shared Memory
Consistency (Models and) Protocols Start Date
September 2002 Third Year Annual Review,
Boulder CO, March 29, 2005Ganesh
Gopalakrishnan (PI)Konrad Slind (Co-PI )
2
Released in 2000 -- Peak Performance 12.3
teraflops. -- Processors used IBM RS6000 SP
Power3's - 375 MHz. -- There are 8,192 of these
processors -- The total amount of RAM is 6Tb.
-- Two hundred cabinets - area of two basket
ball courts.
(Article by Nebojsa Novakovic Thursday 16
October 2003) NOVA HAS been to the Microprocessor
Forum and captured this picture of POWER5 chief
scientist Balaram Sinharoy holding this eight way
POWER5 MCM with a staggering 144MB of cache.
Sheesh Kebab! 8 x 2 cpus x 2-way SMT 32
shared memory cpus on the palm
2
3

Cannot afford to do eager updates across large
SMP systems
Delayed updates allow considerable latitude in
memory consistency
protocol design
? Weak memory models tend to reduce
overall protocol complexity
? Future challenges
Multiple protocols that cooperate
Multicore chips

performance
time
3
4

Verification of shared memory consistency
protocols
Emphasis is on correctness in two broad areas
Shared Memory Consistency Models
Also known as Memory Models
Shared Memory Consistency Protocols
Subsumes Cache Coherence protocols
Demonstrate results on realistic memory models as
well as consistency protocols

4
5

Protocols modeled at the asynchronous rule
level
Assume that these are highly optimized
hand-crafted protocols for which synthesis is not
(yet) an option
We are yet to work on protocol engine (hardware)
verification

5
6

15-minute overview
Then 15-minutes of a few specific details
Finally an Appendix
Presented in Chronological Order, emphasizing
Year of task performance (Y1, Y2, or Y3) in slide
heading
Student involvement, graduation, and employment
How much supported by SRC
Highlights (papers, ideas, and tool development)
Slides withY3 in the title include new work
since last review

6
7
7
8
Overview of Results to Date
8
9
How our students have benefited from SRC Funding
( students funded under SRC others under
contemporaneous NSF grant with complementary
goals )
9
10

See previous annual review slides for details
Work in progress now
Distributed Random-walk does not record error
trails
Emphasis high state-generation rate bug
location
Error-trails will be generated using Bounded
Model-checking
Symbolic techniques to reconstruct error trails,
capitalizing on knowledge of error state
Student involved Xiaofang Chen (PhD student just
shifting from systems research to FV)

10
11

Operational Approach (see appendix)
Abstract Machine Models
Axiomatic Approach (see appendix)
Constraints that Characterize Legal Executions
Theoretical understanding (see Sezgins
dissertation)
Definition of memory models using transducers
clarification of widely publicized undecidability
results

11
12
2.3 A fully SAT-based approach Y3

Yu Yang ( ! Yue Yang), Hemanthkumar Sivaraj, the
PI
A parser for value-annotated Itanium MP Assembly
Programs
A tool called MPEC to check value-annotated
Itanium assembly programs against the Itanium
memory model
Generates SAT instances from given execution
trace and the memory model rules
If SAT, produce witness (interleaving)
If UNSAT, extract evidence (UNSAT Core draw a
cycle for user annotated with which memory model
rules were violated)
Highlights
The MPEC tool (described next slide) was released
Papers CAV04

12
13
13
14
2.3 contd

What exactly has been achieved?
How to specify industrial memory models in higher
order logic
How well does SAT-based Checking of MP Executions
work/scale
How to extract Annotated Execution Cycles from
UNSAT Cores
A Preliminary Assessment of QBF (yet to continue)
Code released to check Itanium Executions
1700 lines of Ocaml
5600 lines of C

14
15
2.3 contd

Potential uses / impact
MPEC (MP Execution Checker) can help understand
the Itanium memory model through litmus tests
Scaling to larger execution lengths will require
either
Debugging approach (not full verification) --
like the Sun TSO-Tool
Considerable SAT engineering needed (not
justified now, unless a member company shows
definite interest)
An MS student (Oystein Thorsen) at Michigan
Technological University is porting MPEC to work
for the Unified Parallel C memory model
UPC is used to write shared memory code for
Scientific Programming
Collaboration with Lisa Higham (University of
Calgary) initiated

15
16
3 Partial Order enabled Murphi (Bhattacharyas
PhD) much of the work during Y3
PO Reduction through SAT-based Symbolic Analysis
i 02 a 0..3 b array0..2 of
bool Rule 1 True ? a (a1) 4 Rule 2
True ? a (a2) 4 Rule 3 i gt 0 ?
bi1 True Rule 4 i lt 0 ? bi2
False
16
17
3 POeM (..contd)

What exactly has been achieved ?
New tool POeM (Partial Order Enabled Murphi)
developed and is available
2900 lines of Common Lisp and 1400 lines of Perl
Technical Report detailing algorithm
One paper under submission (DAC04) others in
preparation
A useful by-product of POeM
A tool Mu2SMV to translate Murphi to SMV (Uclid
is similar to SMV, so would be able to re-target)
It can help create SMV / Uclid models
It also helps conduct more objective
explicit-state vs. implicit-state
model-checking studies

17
18
3 contd
18
19
3 Specific Contributions

New Ample-set Computation Algorithm for Murphi
Basically reported during Y2 novelty during Y3
are
The POeM Tool has been released
being tried at Intel by Park and IBM by German
It employs a novel SAT Carry-over idea
A new C1 condition will soon be incorporated
In the details section, we will focus on
SAT Carry-over (discussed as topic 3.1), and
The new C1 condition (discussed as topic 3.2)

19
20
6 Parameterized Verification of Coherence
Protocols work done entirely during Y3

Sudhindra Pandavs MS work
Counterexample Guided Invariant Discovery method
Tailored for Cache Coherence Protocols
Takes advantage of the guard ? action style of
specification, and the syntax of the property
being verified
Filtering Heuristics to eliminate irrelevant
predicates are based on the nature of
directory-based protocols
All experiments done in the UCLID framework
Can be performed in any system that decides over
a similar fragment of logic and generates
concrete counterexamples

20
21
6 ..contd

What exactly has been achieved?
Methodology for Parameterized Cache Protocol
Verification based on Counter-example Guided
Invariant Discovery
Worked straight out of the box on a new
protocol called the German Ring protocol
Also applied to the German protocol and the FLASH
protocol (modeling both mutual exclusion and data
consistency)
9 invariants for the mutual exclusion property of
German 2 more for data consistency (29 for
Lahiri)
7 invariants for Mutex of FLASH 15 more for
data consistency (more frugal invariants than in
Parks work using PVS)
2 days to model and verify the new German Ring
protocol

21
22
22
23

Accept a guarded command system of transitions
G1 ? A1
G2 ? A2
Gn ? An
Build independence matrix by checking enabledness
and commutativity via SAT-based analysis
At run-time, pick a transition and build its
dependence closure
Check C0, C1, C2, C3 conditions of the CGP book
and generate reduced state-space
C1 is approximated

Enabledness plus commutativity
23
24
3.1 Carry-Over of Independence Relations

When does it work?
Systems parameterized over scalarset variables
Rulesets based on system parameters
Variables parameterized on system parameters
Proof, details - BGG05

25
3.2 Improved Ample Set Heuristic

Ideal No path via the green triangle will ever
wake-up the disabled red transitions
Existing naïve C1
Make sure that the
red triangle is empty
- Causes too many C1 condition
Violations (16k in German)
Improved heuristic
Ensure that the red ones can be woken up ONLY by
the firing of the blue triangle of transitions
Can pre-compute it -- like independence
Experimental results are being awaited !!

Transition t
ts dependency closure
Disabled dependents
Enabled independents
26
The POeM Tool and Follow-on Efforts

The POeM tool will be engineered for higher
efficiency
We will be studying what other property might
carry over similar to independence

26
27
6 Counterexample Guided Invariant Discovery
details of the work
Pun-proven
Pproven
done
Pun-proven
Y
N
Add P
Auxiliary Invariant
Pick a property P from Pun-proven
Automated Decision Procedure D
Counterexample Analysis Procedure
System model M
counterexample
27
28
6 Highlights of the work

Exploits structure of transition system
specification
Rules of the form guard ? action or g ? a
are assumed
Exploits nature of property to be proved
Properties of the form Antecedent ? Consequent
How the method works
Start with most general state
Symbolically simulate a transition
Analyze failures exploiting structure of
specifications
Construct auxiliary invariant
If strengthening involves too large a formula,
use filtering heuristics to keep it small
Filtering heuristics exploit nature of directory
protocols

28
29
Structure of a counterexample

Formally, a counterexample C can be expressed as
a tuple ltss, d, st gt, where
ss is the initial state interpretation
st is the next state interpretation
d is the transition rule of form g gt a
Depending on the structure of the counterexample,
we construct an auxiliary invariant from
those predicates in the property P, which have
been violated
predicates from the guard of the rule involved,
and ITE (if-then-else) conditions in the action

30
Counterexample cases

Counterexample C ltss, d, st gt
Property P "X.A(X) gt C(X)
Since, the initial state satisfies the property
and the next state violates it, we can classify
counterexamples into three different classes.
(ss A, ss C) (st A, st ! C)
(ss ! A, ss ! C) (st A, st ! C)
(ss ! A, ss C) (st A, st ! C)

31
Case I (ss A, ss C) (st A, st ! C)
Transition assigned variable of consequent
suppress this when the antecedent is initially
true.
A gt g
s
t
g
SC(A,ss)
A C
A C
SC(g,ss)

Candidate invariant generated
SC(A, ss) gt SC(g,ss)
where SC is the satisfying core under
interpretation ss.
SC can be easily computed from the structure
of formula
SC includes relevant predicates

32
Case II (ss ! A, ss ! C) (st A, st !
C)
Transition assigned variable of antecedent
suppress this when the consequent is initially
false.
g gt C
SC(g, ss)
s
t
g
VC(C, ss)
a
A C
A C
retains

Candidate invariant generated
VC(C,ss) gt SC(g, ss)
where VC is the violating core under
interpretation ss.
VC can be easily computed from the structure
of formula
Pandavs thesis explains more general versions of
these ideas

33
Example from the German protocol

Property P cache(i) ex ? cache(j) in

g ch2(cid) grant_ex
t
s
Cache(j) ex
a cache(cid) ex
Cache(i) in
Ch2(i) grant_ex
Cache(i)ex Cache(j)ex
cid i
(cache(j) in) gt (ch2(i) grant_ex)
Aux. Invariant Generated
34
Other Heuristics Consistency Requirement

Verifying predicate of form (p r) in the
consequent.
Counterexample of form

g
s
t
p ! r
q ! r
Action p q
Candidate invariant of form g gt (q r)
This simple heuristic was very useful in data
consistency verification of GERMAN and FLASH.
35
Filtering Heuristics

Guards of transition rules are complex,
containing many predicates.
More the number of predicates in an auxiliary
invariant, more the number of counterexamples
Need heuristics to filter out irrelevant
predicates
Used in data consistency verification of FLASH
Heuristics are protocol dependent
Observed to be successful in handling three
directory based protocols
German, German Ring, FLASH
Mutual Exclusion and Data Consistency verified

36
Filtering Heuristics (contd.)

Based on rule-type that triggered counter-example
Directory protocols employ P-rules and N-rules
P-Rule Initiated by requesting node
N-Rule Initiated by messages from network
(remote node)
Further classified into Remote Requests and
Grants
Our method describes heuristics based on
message-type involved in counter-examples, and
a ranking of variables

37
37
38
German Ring mutex verification

Property to be verified
((i!j) cache(i)excl cache(j)excl))
Result
3 invariants.
Uclid time 1.08s
User time a day

39
Comparison GERMAN

Mutual Exclusion (GERMAN-I)
Lahiris earlier manual proof had 25 auxiliary
invariants and took 47.93s of uclid time.
Ours 9 auxiliary invariants with 6.02s
Mutual exclusion (GERMAN-II)
Lahiris indexed predicate discovery method to
construct inductive invariant.
Automated - had 24 predicates, 143s uclid time
Our method manual, 13 predicates, 2.16s

40
Comparison FLASH

Mutual Exclusion
Automated predicate discovery method based on
computing weakest precondition (Lahiri) to
generate inductive invariant
Initial set of predicates
From the property ? failed to converge to a
fixpoint
From our auxiliary invariants ? converged to
inductive invariant in 3 iterations.

41
Modeling tricks divide rules

UCLID cannot model nested ITEs.
Transition rules contain nested ITEs
Trick Split the rules
R if (c1) then b1 else if (c2) then b2
Split into two rules
R1 if (c1) then b2
R2 if (c1 c2) then b2

42
Modeling tricks quantified conditions in guards

UCLID doesnt allow quantifiers (", )in the
description.
Conditions like Qx. a(x) present in the guards,
where
Q ?", . Example emptiness test of
sh_list in GERMAN.
Consider a predicate of form x. a(x)
To model this, we introduce an auxiliary boolean
variable b. Replace x. a(x) by b.
Then introduce an axiom of form "x.(a(x) gt b)
Justification
b ? x. a(x)
gt (x. a(x) ) gt b
gt "x.(a(x) gt b)

43
Status of Invariant Discovery work

More experience is necessary with our invariant
discovery approach
Assess how easy in practice on a variety of
protocols
Make the process intuitive to a designer
Support front-ends such as tabular descriptions
or scenarios
Reflect error-trails back onto these descriptions
Allow designers to understand verification state
through controlled symbolic simulations

43
44
Concluding Remarks and Future Work

We are assembling a suite of examples including
many public examples such as Grbics multi-ring
protocol
We will finish the work on the random-walk
checker, coupling it with an error-trace
generator.
Wed like to work on using scenarios in cache
protocol design
We will begin our work on hierarchical cache
protocols
Wed like to return to the post-Si verification
problem of runtime verification under limited
observability.
Checking Memory Orderings
Liveness (perhaps as bounded safety, focussing on
architectural elements responsible for liveness
violation?)
ABSTRACTION MECHANISMS !!

44
45
45
46
2.1 The Operational Approach to memory model
specification (mostly finished Y1, Y2 one
dissertation finished Y3)

Developed by Yue Yang (PhD under the PI and
Lindstrom)
Defended PhD June04 and working for Microsoft
since July04
Developed the Uniform Memory Model based on
abstract machines
Coded in Murphi (obtain executable memory model
specs)
The UMM parameterized executable models can cover
a whole range of memory models
Coherence, PRAM, and Manson / Pugh Java shared
memory have all been specified in UMM
Papers
Concurrency and Computation Practice and
Experience (galley proofs of journal paper done)
Joint ACM Java Grande, 2002

46
47
2.1 contd

What exactly has been achieved?
Understanding of how to structure operational
memory model specifications to cover wide range
of memory models uniformly
Murphi code of 2800 lines demonstrating ideas
Potential uses
Starting-point for developers of new memory
models
Our Itanium Operational model of ICCD99 could be
specified in the UMM style if there is interest

47
48
2.2 contd

What exactly has been achieved?
Understanding of how to structure axiomatic
memory model specifications to cover wide range
of memory models uniformly
Constraint-Prolog code of 6000 lines
demonstrating ideas
Potential uses
Understand memory models experiment with them
in the Constraint Logic-programming Context
Can develop a memory-model sensitive race
analyzer (ICFEM04)

48
49
2.2 The Axiomatic Approach (mostly finished Y1,
Y2 SAT approach dissertation finished
Y3)

Constraint Prolog programs for
Classical Memory Models (Processor Consistency,
etc.)
Itanium (without semaphores, IO space ops, or
partial writes)
Can add these if member companies interested
Feasibility of SAT-based solution demonstrated
(coding by PI)
Highlights
The MPEC tool (described next slide) was a direct
an outcome
Papers Charme03 ICFEM04 IPDPS04 CSJP04

49
50
3 A Symbolic PO Reduction Method for Rule-based
Specifications of Protocols (Y2, Y3)

Ritwik Bhattacharyas PhD work
The industry employs rule-based specifications
for cache coherence protocols
Murphi (used by most industries)
TLA (used at least within Intel)
Rule-based specifications capture the parallel
condition / action style specification of cache
coherency engines in Protocol Tables
Specifies the concurrency in cache protocol
engines naturally
Specifies the low-latency bursts of
computation
Check local state and incoming messages
Produce outgoing messages and update local state