QB or not QB: An Efficient Execution Verification Tool for Memory Orderings. Ganesh Gopalakrishnan, School of Computing, University of Utah, Salt Lake City, UT; Yue Yang, Microsoft Research, Redmond, WA; Hemanthkumar Sivaraj, Intel Corporation


1
QB or not QB: An Efficient Execution Verification Tool for Memory Orderings
Ganesh Gopalakrishnan, School of Computing, University of Utah, Salt Lake City, UT
Yue Yang, Microsoft Research, Redmond, WA
Hemanthkumar Sivaraj, Intel Corporation, Bangalore, India
Work supported in part by SRC Contract 1031.001 and NSF Award 0219805
2
Efficient Multiprocessors must have Efficient Shared Memory Systems
CPU performance
Memory performance
3
Building Efficient Memory
Allow reorderings between loads / stores that fall on DIFFERENT addresses
Example
Program
  CPU 1: st c,1 ; st d,2
  CPU 2: ld d ; ld c
Execution
  CPU 1: st c,1 ; st d,2
  CPU 2: ld d, 2 ; ld c, 0
  • Helps hide latencies
  • Simplifies design of directory protocols
  • System programmers will bite the bullet :-)
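To see why the execution above counts as a reordering, the following minimal sketch enumerates every interleaving that a sequentially consistent memory would allow and collects the values the loads could return. It assumes memory starts at zero and uses hypothetical names (`P1`, `P2`, `sc_outcomes`); it is an illustration, not part of the tool described in the talk.

```python
from itertools import permutations

# Program from the slide: one CPU stores to c then d; the other loads d then c.
P1 = [("st", "c", 1), ("st", "d", 2)]
P2 = [("ld", "d"), ("ld", "c")]

def sc_outcomes(p1, p2):
    """Return the set of (d, c) values the loads can see under sequential consistency."""
    ops = [(0, i) for i in range(len(p1))] + [(1, i) for i in range(len(p2))]
    outcomes = set()
    for perm in permutations(ops):
        # Keep only interleavings that respect each processor's program order.
        if any(a[0] == b[0] and a[1] > b[1]
               for k, a in enumerate(perm) for b in perm[k + 1:]):
            continue
        mem = {"c": 0, "d": 0}   # assumption: memory is initially zero
        loaded = {}
        for proc, idx in perm:
            op = (p1, p2)[proc][idx]
            if op[0] == "st":
                mem[op[1]] = op[2]
            else:
                loaded[op[1]] = mem[op[1]]
        outcomes.add((loaded["d"], loaded["c"]))
    return outcomes

outcomes = sc_outcomes(P1, P2)
print(sorted(outcomes))
print((2, 0) in outcomes)   # is "ld d, 2 ; ld c, 0" possible without reordering?
```

The outcome d = 2, c = 0 never appears in the enumerated set, so it can only be explained by a memory model that lets the two stores (or the two loads) be observed out of program order.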

4
Permitted reorderings are specified by the
shared memory consistency model
A VERY complex specification for a real architecture (e.g. Itanium, PowerPC, ...)
Also of growing concern in software (e.g. Java Memory Model, Unified Parallel C model, ...)
5
MODULAR SPECIFICATION OF MEMORY MODELS
legal_itanium exec =        (exec: a given execution)
  ?order. requireLinearOrder exec order
       /\ requireWriteOperationOrder exec order
       /\ requireProgramOrder exec order
       /\ requireMemoryDataDependence exec order
       /\ requireDataFlowDependence exec order
       /\ requireCoherence exec order
       /\ requireReadValue exec order
       /\ requireAtomicWBRelease exec order
       /\ requireSequentialUC exec order
       /\ requireNoUCBypass exec order
See IPDPS 2004
6
A MEMORY MODEL RULE IN HOL
requireCoherence exec order =
  !i j. i IN exec /\ j IN exec ==>
    isWr i /\ isWr j /\ (i.var = j.var) /\ order i j /\
    ((attr_of i.var = WB) \/ (attr_of i.var = UC)) /\
    ((i.wrType = Local) /\ (j.wrType = Local) /\ (i.proc = j.proc) \/
     (i.wrType = Remote) /\ (j.wrType = Remote) /\ (i.wrProc = j.wrProc))
    ==> !p q. p IN exec /\ q IN exec ==>
          isWr p /\ isWr q /\
          (p.wrID = i.wrID) /\ (q.wrID = j.wrID) /\
          (p.wrType = Remote) /\ (q.wrType = Remote) /\ (p.wrProc = q.wrProc)
          ==> order p q
7
How do we know that the actual silicon matches the shared memory model?
  • Pray
  • Run tests and manually check results
  • What else?

8
FORMALLY VERIFY interesting EXECUTIONS
P1's exec
  st8 12ca20 7f869af546f2f14c
  ld8 r25 45180 <87b5e547172644a8>
  ld2 r26 2c2a2c <44a8>
  ld2 r27 45aa2a <c58e>
P2's exec
  st8 45180 87b5e547172644a8
  ld8 r25 45180 <87b5e547172644a8>
  st2 2c2a2c 44a8
  st2 45aa2a c58e

9
TWO APPROACHES: explicitly QB, implicitly QB
Given Execution
QBF
BOOLIFY
SPEC OF MEMORY MODEL IN HOL
CONVERT TO EXECUTION CHECKER PROGRAM
SAT PROBLEM
PROGRAM
Given Execution
10
AN EXAMPLE
requireMickeyMouse exec order =
  !i j. i IN exec /\ j IN exec ==>
    ( i.op = read /\ i.data = 35 /\ j.op = write /\ j.data = 46 ==> order j i )
GIVEN MP EXECUTION
PROCESSOR 1          PROCESSOR 2
-----------          -----------
read(ADDR, 35)       write(ADDR, 46)
11
requireMickeyMouse exec order =
  !i j. i IN exec /\ j IN exec ==>
    ( i.op = read /\ i.data = 35 /\ j.op = write /\ j.data = 46 ==> order j i )
Explicitly QB
  ! i j : Bool . BOOLIFIED MATRIX
Implicitly QB
  FOR i = 1 to 2 DO /\ ( FOR j = 1 to 2 DO /\ ( BOOLIFIED MATRIX ) )
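The "implicitly QB" idea can be sketched as follows: because the execution is finite, the universal quantifiers of the rule become ordinary nested loops, and each loop iteration either contributes nothing or forces one entry of the Boolean ordering matrix. The dictionary layout and the function name `require_mickey_mouse` are assumptions made for this illustration only.

```python
# The two-tuple execution from the slide (indices 0 and 1).
execution = [
    {"op": "read",  "data": 35},   # PROCESSOR 1: read(ADDR, 35)
    {"op": "write", "data": 46},   # PROCESSOR 2: write(ADDR, 46)
]

def require_mickey_mouse(execution):
    """Unroll '!i j.' into nested loops.

    Returns the pairs (j, i) whose matrix entry 'j is ordered before i'
    is forced to be true by the rule."""
    forced = []
    for i, ti in enumerate(execution):
        for j, tj in enumerate(execution):
            if (ti["op"] == "read" and ti["data"] == 35
                    and tj["op"] == "write" and tj["data"] == 46):
                forced.append((j, i))
    return forced

print(require_mickey_mouse(execution))
```

Each forced pair would become a unit clause in the SAT instance; pairs for which the rule's antecedent is false generate no clause at all.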
12
The Intel Itanium Processor memory model
  • Has these kinds of instructions

weak load or ordinary load -- ld
strong load or acquire-load -- ld.acq
weak store or ordinary store -- st
strong store or release store -- st.rel
memory fence (NOT barrier!) -- mf
A few semaphore-types
Allows sub-word writes, I/O spaces
We don't model these details for now
13
EVEN THIS EXAMPLE HAS A 1-page proof
A manual proof
P: st x 1 ; mf ; ld r1 y <0>
R: ld.acq r2 y <1> ; ld r3 x <0>
Q: st.rel y 1
Atomicity of st.rel
Load of initial value is before store of every other value
14
CONTRIBUTIONS
Wrote a formal description of Itanium in Higher Order Logic
  - modular
  - extensible
  - works for many architectures
As opposed to relying on concurrent data structures that pretend to be Itanium (the operational style)
Showed how, using SAT, executions can be formally verified against the spec
15
Our Approach
MP execution to be verified
Mechanical Program Derivation (to be automated)
Itanium Ordering rules in HOL

Checker Program
R: ld.acq r2 y <1> ; ld r3 x <0>
P: st x 1 ; mf ; ld r1 y <0>
Q: st.rel y 1
Satisfiability Problem with Clauses
carrying annotations
Sat Solver
RECENT WORK
Sat
Unsat
Unsat Core Extraction using Zcore
Explanation in the form of one possible interleaving
  • Find Offending Clauses
  • Trace their annotations
  • Determine ordering cycle

16
Largest example tried to date (courtesy S.
Zeisset, Intel)
Proc 2
  ld4 r24 733a74 <415e304>
  st4.rel 175984 96ab4e1f
  ... 67 more instructions ...
  ld8 r87 56460 <b5c113d7ce4783b1>
Proc 1
  st8 12ca20 7f869af546f2f14c
  ld r25 45180 <87b5e547172644a8>
  ... 58 more instructions ...
  st2 7c2a00 4bca
  • Initially the tool gave a trivial violation
  • Diagnosed to be forgotten memory initialization
  • Added method to incorporate memory
    initialization in our tool
  • Our tool found the exact same cycle as pointed
    out by author of test

Cycle found thru our tool: st.rel (line 18, P1) -> ld (line 22, P2) -> mf -> ld (line 30, P2) -> st (line 11, P1)
17
Statistics Pertaining to Case Study
  • 140 total instructions
  • All runs were on a 1.733 GHz, 1 GB Athlon running Redhat Linux V9
  • 1 minute to generate Sat instance
  • 9M clauses ( O(n^3) in terms of instructions )
  • 117,823 variables ( not a problem )
  • 1 minute to run Sat (unsat here); 0.2 sec to do real work
  • Zcore runs fast: gave 23 clauses in one iteration

18
The rest of the talk
  • An Intuitive presentation of the Itanium memory
    model
  • Example of how a HOL rule was turned into a SAT
    generator
  • How the SAT part was done

Throwing an efficient transitivity blanket
over a problem to cover it with whatever
transitivity it begs for !!
  • What more to expect
  • Related work

19
Itanium memory model thru examples
Ordinary store

Can freely slide in a sequential program
st x 2

Only rule is coherence
The same applies to an ordinary load

ld reg1 x

20
Itanium memory model thru examples
Release store

st.rel x 2
Things before it in sequential program order can't happen after it
Things after it in sequential program order may happen before it !!
21
Itanium memory model thru examples
Acquire load

ld.acq r3 y
Things before it in sequential program order may happen after it
Things after it in sequential program order can't happen before it !!
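The sliding rules from the last three slides can be condensed into a small predicate. This is a minimal sketch under stated assumptions: it only looks at two operations of the same processor, it ignores coherence and data dependences, and it treats `mf` as ordering everything on both sides, which the slides imply but do not spell out. The function name `must_stay_ordered` is hypothetical.

```python
def must_stay_ordered(first, second):
    """For two ops of one processor, with `first` earlier in program order,
    say whether the rules sketched on the slides force first before second."""
    if first == "mf" or second == "mf":
        return True      # assumption: a fence orders everything around it
    if second == "st.rel":
        return True      # things before a release store can't happen after it
    if first == "ld.acq":
        return True      # things after an acquire load can't happen before it
    return False         # ordinary ld / st may slide freely

for pair in [("st", "st.rel"), ("st.rel", "ld"), ("ld.acq", "st"), ("ld", "ld.acq")]:
    print(pair, must_stay_ordered(*pair))
```

Note the asymmetry: a release only pins what comes before it, and an acquire only pins what comes after it.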
22
But with these rules alone, we can't explain the following legal outcome in Itanium
st.rel y 1
st.rel x 2
Data dep.
ld.acq r4 x <2>
ld.acq r3 y <1>
ld.acq rule
ld reg1 x <0>
ld reg2 y <0>
Itanium specification DOES NOT try to explain outcomes in terms of shuffles of the original instructions!
23
Itanium rules explain execution outcomes in
terms of progenies of stores and loads
This has turned out to be an unspoken convention
in this area for other memory models also
A store generates (n+1) progenies: one local copy plus one remote copy per processor
Other instructions generate only one
st y 1
ld.acq r3 y
Local copy for P0
remote copy for P0
remote copy for P1
24
We wrote such a breeding assembler
P1: St a,1 ; Ld r1,a <1> ; St b,r1 <1>
P2: Ld.acq r2,b <1> ; Ld r3,a <0>
Tuple 1
id=0 proc=0 pc=0 op=St var=0 data=1 wrID=0 wrType=Local wrProc=0 reg=-1 useReg=false
id=1 proc=0 pc=0 op=St var=0 data=1 wrID=0 wrType=Remote wrProc=0 reg=-1 useReg=false
id=2 proc=0 pc=0 op=St var=0 data=1 wrID=0 wrType=Remote wrProc=1 reg=-1 useReg=false
id=3 proc=0 pc=1 op=Ld var=0 data=1 wrID=-1 wrType=DontCare wrProc=-1 reg=0 useReg=true
id=4 proc=0 pc=2 op=St var=1 data=1 wrID=4 wrType=Local wrProc=0 reg=0 useReg=true
id=5 proc=0 pc=2 op=St var=1 data=1 wrID=4 wrType=Remote wrProc=0 reg=0 useReg=true
id=6 proc=0 pc=2 op=St var=1 data=1 wrID=4 wrType=Remote wrProc=1 reg=0 useReg=true
id=7 proc=1 pc=0 op=LdAcq var=1 data=1 wrID=-1 wrType=DontCare wrProc=-1 reg=1 useReg=true
id=8 proc=1 pc=1 op=Ld var=0 data=0 wrID=-1 wrType=DontCare wrProc=-1 reg=2 useReg=true
...
Tuple 9
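The splitting step can be sketched in a few lines. The code below is a hypothetical reconstruction of what a "breeding assembler" might do, written to reproduce the nine tuples listed above; the function name `breed`, the input format, and the rule `useReg = (reg != -1)` are assumptions inferred from the listing, not the authors' code.

```python
def breed(program, num_procs):
    """Turn instructions into tuples, splitting every store into one local
    copy plus one remote copy per processor.

    program: list of (proc, pc, op, var, data, reg) in program order."""
    tuples = []
    for proc, pc, op, var, data, reg in program:
        base = dict(proc=proc, pc=pc, op=op, var=var, data=data,
                    reg=reg, useReg=(reg != -1))
        if op in ("St", "StRel"):
            wr_id = len(tuples)   # all copies share the id of the local copy
            tuples.append(dict(base, id=wr_id, wrID=wr_id,
                               wrType="Local", wrProc=proc))
            for p in range(num_procs):
                tuples.append(dict(base, id=len(tuples), wrID=wr_id,
                                   wrType="Remote", wrProc=p))
        else:
            tuples.append(dict(base, id=len(tuples), wrID=-1,
                               wrType="DontCare", wrProc=-1))
    return tuples

# The two-processor program from the slide (var 0 = a, var 1 = b).
program = [
    (0, 0, "St",    0, 1, -1),   # P1: St a,1
    (0, 1, "Ld",    0, 1, 0),    # P1: Ld r1,a <1>
    (0, 2, "St",    1, 1, 0),    # P1: St b,r1 <1>
    (1, 0, "LdAcq", 1, 1, 1),    # P2: Ld.acq r2,b <1>
    (1, 1, "Ld",    0, 0, 2),    # P2: Ld r3,a <0>
]
tuples = breed(program, num_procs=2)
for t in tuples:
    print(t["id"], t["op"], t["wrType"], t["wrProc"], t["wrID"])
```

With two processors each store yields three tuples, so the five instructions become nine tuples, matching "Tuple 1" through "Tuple 9".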
25
Itanium rules specify how to line-up the
tuples to explain the load-outcomes !!
P0                     P1
st y 1                 st x 2
ld.acq r3 y <1>        ld.acq r4 x <2>
ld reg1 x <0>          ld reg2 y <0>
Split copies (l = local, rp0 / rp1 = remote copy for P0 / P1)
st y 1 l
st x 2 l
st x 2 rp0
st y 1 rp0
st x 2 rp1
st y 1 rp1
Now, arrange the split copies
st y 1 l
Explanation
ld.acq r3 y <1>
Dependencies
st x 2 l
ld.acq r4 x <2>
st y 1 rp0
st x 2 rp1
ld reg1 x <0>
st x 2 rp0
Anti-dependencies
ld reg2 y <0>
st y 1 rp1
26
Gist of our method: Illustration on SC and on Itanium
The tuples to be ordered
legalItanium(exec) = Exists order. (
     requireStrictTotalOrder exec order
  /\ requireWriteOperationOrder exec order
  /\ requireItProgramOrder exec order
  /\ requireMemoryDataDependence exec order
  /\ requireDataFlowDependence exec order
  /\ requireCoherence exec order
  /\ requireAtomicWBRelease exec order
  /\ requireSequentialUC exec order
  /\ requireNoUCBypass exec order
  /\ requireReadValue exec order )
Find arrangement as per above constraints
The tuples to be ordered
SC(exec) = Exists order. (
     requireStrictTotalOrder exec order
  /\ requireProgramOrder exec order
  /\ requireReadValue exec order )
Find an arrangement under SC constraints
27
Gist of constraints
  • Some arrangements are statically known

Implies
and
  • Others are conditional
  • Some must form an atomic set

Everybody else Strictly before or Strictly after.
  • Many are unordered
  • Find a strict total order satisfying all
    the above !

28
Gist of constraint ENCODING
  • Use Boolean precedence matrix (rows i = 1..N, columns j = 1..N)
  • Capture i before j by m_ij

Statically known -> Unit clauses
Implies / and -> Boolean formula
Atomic set -> See how SAT-generator is derived
  • Spew out irreflexivity and totality axioms
  • Then throw a transitivity blanket on top of all tuples

Strict total order
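A minimal sketch of the strict-total-order axioms over the precedence matrix, emitted as DIMACS-style integer clauses. The row-major variable numbering in `var` and the function names are assumptions for illustration; the talk does not give the tool's actual numbering.

```python
from itertools import product

def var(i, j, n):
    """1-based variable number for m_ij, meaning 'tuple i is before tuple j'."""
    return i * n + j + 1

def strict_total_order_clauses(n):
    """Clauses forcing the n x n Boolean matrix to be a strict total order."""
    # Irreflexivity: not m_ii
    clauses = [[-var(i, i, n)] for i in range(n)]
    # Totality: m_ij or m_ji for every pair of distinct tuples
    for i in range(n):
        for j in range(i + 1, n):
            clauses.append([var(i, j, n), var(j, i, n)])
    # Transitivity blanket: m_ij and m_jk implies m_ik
    # (k == i is allowed, which together with irreflexivity gives asymmetry)
    for i, j, k in product(range(n), repeat=3):
        if i != j and j != k:
            clauses.append([-var(i, j, n), -var(j, k, n), var(i, k, n)])
    return clauses

clauses = strict_total_order_clauses(3)
print(len(clauses))
```

The transitivity loop is what makes the clause count cubic in the number of tuples, which is why the later slides concentrate on pruning it.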
29
  • Also tried E_ij method and some incremental SAT (see paper)

30
Approaches to transitivity blanket
Naïve: For all tuples i, j, and k, generate m_ij /\ m_jk ==> m_ik
Too many clauses (1B for a 1000-tuple program)
Better: Obtain transitive-closure of known orderings and then prune irrelevant parts of the blanket
E.g., if m_ij is known, don't generate m_ij /\ ... as well as ... /\ m_ij ...
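The difference between the naive and the pruned blanket can be made concrete with a small counting sketch. The pruning rule used here is one plausible reading and is an assumption: once "i before j" is known, "j before i" is known to be false, so any transitivity clause with m_ji in its antecedent is already satisfied and need not be generated. The function name and the sample known orderings are hypothetical.

```python
from itertools import product

def transitivity_triples(n, known_before=()):
    """Yield (i, j, k) for clauses 'm_ij and m_jk implies m_ik', skipping
    clauses whose antecedent contains a pair known to be false."""
    # i before j is known, so 'j before i' can never hold
    known_false = {(j, i) for (i, j) in known_before}
    for i, j, k in product(range(n), repeat=3):
        if i == j or j == k:
            continue
        if (i, j) in known_false or (j, k) in known_false:
            continue   # antecedent can never be true; clause is redundant
        yield (i, j, k)

naive = sum(1 for _ in transitivity_triples(4))
pruned = sum(1 for _ in transitivity_triples(
    4, known_before=[(0, 1), (1, 2), (0, 2)]))   # already transitively closed
print(naive, pruned)
print(1000 ** 3)   # order of magnitude behind "1B for a 1000-tuple program"
```

The more orderings are statically known (and the further their transitive closure reaches), the more of the cubic blanket disappears before the SAT solver ever sees it.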
31
Obtaining SAT-generator from HOL
Initial Spec
atomicWBRelease(exec,order) =
  forall (i in exec).(j in exec).(k in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) /\ (i.wrID = k.wrID)
    /\ order(i,j) /\ order(j,k)
    ==> (j.wrID = i.wrID)
Applying Contrapositive
atomicWBRelease(exec,order) =
  forall (i in exec).(j in exec).(k in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) /\ (i.wrID = k.wrID)
    /\ ~(j.wrID = i.wrID)
    ==> ~(order(i,j) /\ order(j,k))
After Reducing quantifier Scopes
atomicWBRelease(exec,order) =
  forall (i in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB)
    ==> forall (k in exec). (i.wrID = k.wrID)
      ==> forall (j in exec). ~(j.wrID = i.wrID)
        ==> ~(order(i,j) /\ order(j,k))
32
Obtaining SAT-generator from HOL
Transformed Spec
atomicWBRelease(exec,order) =
  forall (i in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB)
    ==> forall (k in exec). (i.wrID = k.wrID)
      ==> forall (j in exec). ~(j.wrID = i.wrID)
        ==> ~(order(i,j) /\ order(j,k))
Functional Program that generates the constraints (will be automated)
atomicWBRelease(exec) = forall(i,exec,wb(i))
wb(i) = if ~((attr_of i.var = WB) /\ (i.op = StRel) /\ (i.wrType = Remote))
        then true
        else forall(k,exec,wb1(i,k))
wb1(i,k) = if ~(i.wrID = k.wrID)
           then true
           else forall(j,exec,wb2(i,k,j))
wb2(i,k,j) = if (j.wrID = i.wrID)
             then true
             else ~(order(i,j) /\ order(j,k))
forall(i,S,e(i)) = for all i in S: e(i)
                 = foldr( map (fn i -> e(i)) (S), (/\), true )
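A hedged Python rendering of the functional program above, with the three nested `forall` folds written as nested loops. It returns each generated constraint as a pair of matrix positions that may not both be true. The tuple field names follow the slides; `attr_of` is passed in as a function because its definition is not shown, and the tiny example execution is made up for illustration.

```python
def atomic_wb_release_constraints(execution, attr_of):
    """Return pairs ((i, j), (j, k)), each meaning
    'not (order(i, j) and order(j, k))'."""
    constraints = []
    for i in execution:
        # wb(i): only remote copies of release stores to WB memory matter
        if not (attr_of(i["var"]) == "WB" and i["op"] == "StRel"
                and i["wrType"] == "Remote"):
            continue
        for k in execution:
            # wb1(i, k): k must be a copy of the same store as i
            if i["wrID"] != k["wrID"]:
                continue
            for j in execution:
                # wb2(i, k, j): members of the atomic set are exempt
                if j["wrID"] == i["wrID"]:
                    continue
                constraints.append(((i["id"], j["id"]), (j["id"], k["id"])))
    return constraints

# Hypothetical execution: one st.rel split into three copies, plus one load.
example = [
    {"id": 0, "op": "StRel", "var": 0, "wrID": 0, "wrType": "Local"},
    {"id": 1, "op": "StRel", "var": 0, "wrID": 0, "wrType": "Remote"},
    {"id": 2, "op": "StRel", "var": 0, "wrID": 0, "wrType": "Remote"},
    {"id": 3, "op": "Ld",    "var": 0, "wrID": -1, "wrType": "DontCare"},
]
constraints = atomic_wb_release_constraints(example, attr_of=lambda v: "WB")
print(constraints)
```

Each returned pair corresponds to one two-literal clause (not m_ij or not m_jk) in the precedence-matrix encoding.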
33
Clause annotations for the unsat core for example
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 10 op3 -1 op4 -1 rule ReadValue
op1 -1 op2 -1 op3 -1 op4 -1 rule NoRule
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 4 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 -1 op2 -1 op3 -1 op4 -1 rule NoRule
op1 10 op2 12 op3 -1 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 -1 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 10 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 9 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 1 op2 -1 op3 -1 op4 -1 rule Reflexive
op1 4 op2 5 op3 6 op4 -1 rule TransitiveOrder
op1 4 op2 5 op3 -1 op4 -1 rule ProgramOrder
op1 4 op2 6 op3 8 op4 -1 rule TransitiveOrder
op1 4 op2 11 op3 12 op4 -1 rule TransitiveOrder
op1 5 op2 6 op3 -1 op4 -1 rule ProgramOrder
op1 6 op2 8 op3 -1 op4 -1 rule TotalOrder
op1 10 op2 11 op3 -1 op4 -1 rule TotalOrder
op1 11 op2 4 op3 8 op4 -1 rule TransitiveOrder
op1 11 op2 4 op3 -1 op4 -1 rule TotalOrder
op1 11 op2 12 op3 -1 op4 -1 rule ProgramOrder
op1 -1 op2 -1 op3 -1 op4 -1 rule NoRule
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 8 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 -1 op2 -1 op3 -1 op4 -1 rule NoRule
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 10 op3 -1 op4 -1 rule ReadValue
34
Each node denotes an op; the numbers denote op numbers. A store has both local and remote exec.
1 2 3 4    st x 1
5          mf
6          ld r1 y <0>
7 8 9 10   st.rel y 1
11         ld.acq r2 y <1>
12         ld r3 x <0>
35
op1 4 op2 5 op3 -1 op4 -1 rule ProgramOrder
1 2 3 4    st x 1
5          mf
6          ld r1 y <0>
7 8 9 10   st.rel y 1
11         ld.acq r2 y <1>
12         ld r3 x <0>
36
op1 5 op2 6 op3 -1 op4 -1 rule ProgramOrder
1 2 3 4    st x 1
5          mf
6          ld r1 y <0>
7 8 9 10   st.rel y 1
11         ld.acq r2 y <1>
12         ld r3 x <0>
37
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 8 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
1 2 3 4    st x 1
5          mf
6          ld r1 y <0>
7 8 9 10   st.rel y 1
11         ld.acq r2 y <1>
12         ld r3 x <0>
38
op1 10 op2 12 op3 -1 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 -1 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 10 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 9 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
1 2 3 4    st x 1
5          mf
6          ld r1 y <0>
7 8 9 10   st.rel y 1
11         ld.acq r2 y <1>
12         ld r3 x <0>
39
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 10 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 10 op3 -1 op4 -1 rule ReadValue
1 2 3 4    st x 1
5          mf
6          ld r1 y <0>
7 8 9 10   st.rel y 1
11         ld.acq r2 y <1>
12         ld r3 x <0>
40
op1 11 op2 12 op3 -1 op4 -1 rule ProgramOrder
1 2 3 4    st x 1
5          mf
6          ld r1 y <0>
7 8 9 10   st.rel y 1
11         ld.acq r2 y <1>
12         ld r3 x <0>
41
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 4 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
1 2 3 4    st x 1
5          mf
6          ld r1 y <0>
7 8 9 10   st.rel y 1
11         ld.acq r2 y <1>
12         ld r3 x <0>
42
CONCLUSIONS
  • An execution verification method for real memory
    models
  • Convert HOL spec of memory model to
    SAT-generator
  • Given an execution, run SAT-generator, and
    generate a SAT-instance
  • Unsat core gives violating cycle
  • Works for a few hundred total assembly language
    instructions

43
What to expect
  • There is only so much engineering one can put in
    before making the checker code suspect
  • About 500 total instructions may be checkable
  • To scale beyond this size, we may need to
    sacrifice completeness (e.g. limited transitivity
    instantiation: good for bug-hunting)
  • Incremental SAT methods can definitely pay off
  • Worst-case (for exhaustive checking) is still
    bad

44
Related Work
  • Yuan Yu encoded Alpha axioms in FOL and solved
    using Simplify
  • TSOtool (ISCA04, Hangal et al.)
  • TSO much simpler than Itanium
  • They deliberately omit ordering rules to keep
    their checker polynomial (e.g. ordering unrelated
    stores)
  • Hence incomplete
  • Very long executions checked
  • Most industrial in-house checkers are similar

45
Extra Slides
46
A real example: Atomic WB Release
Informal statement: Store-Releases to write-back memory become visible to all processors in the same order
Implementation: All copies of a split st.rel are visible atomically
st.rel x 1
Atomic set
47
One standard way of specifying atomicity: All other events e are strictly before or strictly after the atomic set
Another standard way of specifying atomicity: If some event e is between two events in the atomic set, then e also belongs to the atomic set
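Both formulations can be checked against a concrete linear order, which also makes it easy to see that they accept and reject the same orders. The sketch below is illustrative only; the function names and the event labels are hypothetical.

```python
def atomic_outside(order, atomic):
    """Formulation 1: every event outside the set lies strictly before all
    members or strictly after all members."""
    pos = {e: n for n, e in enumerate(order)}
    lo = min(pos[a] for a in atomic)
    hi = max(pos[a] for a in atomic)
    return all(pos[e] < lo or pos[e] > hi for e in order if e not in atomic)

def atomic_between(order, atomic):
    """Formulation 2: any event lying between two members is itself a member."""
    pos = {e: n for n, e in enumerate(order)}
    return all(e in atomic
               for a in atomic for b in atomic for e in order
               if pos[a] < pos[e] < pos[b])

copies = {"s_local", "s_rp0", "s_rp1"}          # split copies of one st.rel
good = ["e", "s_local", "s_rp0", "s_rp1"]
bad = ["s_local", "s_rp0", "e", "s_rp1"]
print(atomic_outside(good, copies), atomic_between(good, copies))
print(atomic_outside(bad, copies), atomic_between(bad, copies))
```

The second formulation is the one that matches the shape of the atomicWBRelease rule: it talks about an event j between two members i and k.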
48
Constraint (Sat) Encoding Approach 1
  • n log n approach (small domain encoding)
  • Attach a word w_t of 2 bits to each tuple t
  • Tuple i before Tuple j --> Assert w_i < w_j
  • StrictTotalOrder --> Assert that the w_t
    words are distinct
  • Smaller # of Boolean Vars
  • Much Harder SAT instances (abandoned for now)

Illustration on 4 tuples
requireStrictTotalOrder exec order: for all i, j (i != j): (x_i1, x_i0) != (x_j1, x_j0)
requireOtherOrder exec order, requireReadValue exec order: a system of constraints with primitive constraint (x_i1, x_i0) < (x_j1, x_j0)
Bit variables, one row per tuple:
x00 x01
x10 x11
x20 x21
x30 x31
49
Constraint Encoding Approach 2
  • n x n approach (e_ij encoding)
  • Assign a matrix position m_ij for each pair of
    tuples t_i and t_j
  • Tuple i before Tuple j --> Assert m_ij true
  • StrictTotalOrder --> Assert
    Irreflexivity, Transitivity, Totality
  • Larger # of Boolean Vars
  • Easier SAT instances (being pursued now)

Illustration on 4 tuples
requireStrictTotalOrder exec order:
  Forall i: ~m_ii
  Forall i,j (i != j): m_ij \/ m_ji
  Forall i,j,k: m_ij /\ m_jk ==> m_ik
requireOtherOrder exec order, requireReadValue exec order: a system of constraints with primitive constraint m_ij
(Matrix figure: row i, column j, entry m_ij)
50
Table of Results (somewhat dated)
SAT-instance generation time for n log n method
Tuples   Total Order   Other Order
32       0.2           1.6
64       1.2           17.1
128      5.7           179.0
SAT-instance generation time for n x n method
Tuples   Total Order   Other Order
32       0.5           0.1
64       4.3           0.9
128      34.2          9.0
SAT-checking times
         n log n                           n x n
Tuples   Monolith   TotalOrd   OtherOrd    Monolith   TotalOrd   OtherOrd
32       9.6        0.6        4.3         0.33       0.69       0.05
64       247.17     29.53      37.6        2.73       6.17       0.5
128      abort      1341       abort       164.8      145.6      351.1
51
Example execution (Table 18, pg. 31 of App note)
  • The Sat instance generated for the above example
    is UNSAT.
  • Next few slides show automated approach to
    detect the root cause cycle.
  • We will ignore the reflexive and transitive
    rules in these slides (they are necessary to force
    unsat, but useless in building a cycle!!)

52
Good Case-study Illustrating Program Derivation
from Formal Specs
  • Initial specs: HOL
  • Formal derivation of tail-recursive functional
    programs
  • Code generation consists of generating Boolean
    clauses
  • Choose Boolean encoding method
  • Re-target code generation correspondingly
  • Source-level optimizations
  • Record known orderings (e.g., i before j):
    these manifest as unit clauses
  • Infer others (e.g., not j before i): generate
    unit clauses for these too
  • Prevent generating transitivity axioms that
    depend on j before i
  • The use of incremental SAT can perhaps be
    directed by functional scripts that are
    automatically generated
  • Use of Unsat cores to pinpoint errors

53
Concluding Remarks
  • Main source of complexity: the transitivity axiom
  • Lazy methods for handling transitivity must be
    investigated
  • Hybrid Sat encoding (partly n x n and partly n log
    n) can also help, as was the experience of Lahiri,
    Seshia, and Bryant
  • Analyzing larger programs
  • Somehow view program in terms of basic blocks
  • Treat each basic block as super instruction
  • If super-instruction unordered, no need to
    descend into basic block
  • Exploit incremental Sat when same litmus tests
    are rerun
  • Try modeling another weak memory model

54
Extra Slides
55
Unsat Core generation
  • The CNF file generated by the sat-generating
    program is solved using zchaff.
  • If SAT, then we get a satisfying assignment.
  • First n x n variables in the assignment correspond
    to the n x n variables in our ordering. Can be used
    to output a valid ordering of the exec.
  • If UNSAT, then need a way to find a root-cause
    for the illegality of the execution.
  • We use unsatisfiable core generation to get to
    the root cause.
  • An unsatisfiable core of an unsatisfiable Sat
    instance is a subset of clauses of the formula
    such that its conjunction is still UNSAT.

56
Generating Unsatisfiable Core
  • Zchaff can be told to generate resolution trace
    while checking for Sat.
  • Zcore: a tool that takes as input a CNF file and
    the resolution trace produced by zchaff and produces
    an unsatisfiable core.
  • Zcore available as part of zchaff.
  • Unsatisfiable core is another CNF file with the
    reduced set of clauses.
  • Can be fed back into zchaff/zcore to generate a
    potentially smaller unsatisfiable core.
  • Process repeated till fixed point reached.

57
Mapping back to root-cause
  • Clauses in the unsatisfiable core contain the
    ordering violation information in them
  • Tool to home in towards the root-cause for the
    violation
  • If the root cause is not something trivial, then
    the cause is usually a cycle of instructions.
    Each link in the cycle corresponds to an ordering
    requirement between the instructions involved.
  • If cycle exists, then Transitivity can be applied
    to show that Irreflexivity is not satisfied.
  • Input to the tool to generate root cause
  • The original set of annotated machine
    instructions for all processors
  • The default values stored in memory locations at
    the beginning of the execution
  • Clause annotations for the clauses that form the
    unsatisfiable core

58
Root-cause cycle analysis algorithm
  • Each ReadValue rule generates a set of clauses.
  • From the annotations, find the tuples that come
    from the same ReadValue rule (two different exec
    will be involved in a rule)
  • Extract the exec out of the annotations and get
    the corresponding instructions (using the proc
    and pc values)
  • From the data being used in the ld instruction
    and the default data value for the corresponding
    memory address, it can be seen if the effect of a
    store is being reflected in a load.
  • This way the dependency between a load and a
    store is established.
  • The above is done for all the ReadValue rules in
    the annotations
  • exec (and the corresponding instructions) on both
    sides of a mf that form a link in the cycle are
    inferred based on ProgramOrder rule annotations
    and the pc values involved.
  • The other missing links in the violating cycle
    are also inferred based on the remaining
    ProgramOrder rule annotations.
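Once the links have been inferred, finding the violating cycle is a standard directed-graph search. The sketch below uses depth-first search; the edge list is a hand-written illustration loosely based on the six-instruction example (it is an assumption, not output of the tool), and `find_cycle` is a hypothetical name.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes (first node repeated at the end),
    or None if the 'must come before' graph is acyclic."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {}     # node -> "active" (on the DFS stack) or "done"
    stack = []

    def dfs(node):
        state[node] = "active"
        stack.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                # Back edge: the cycle is the stack from nxt to the top
                return stack[stack.index(nxt):] + [nxt]
            if nxt not in state:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        state[node] = "done"
        return None

    for node in graph:
        if node not in state:
            found = dfs(node)
            if found:
                return found
    return None

# Illustrative links: program order around mf, load/store dependencies,
# and the ld.acq ordering (assumed for this example).
edges = [("st x", "mf"), ("mf", "ld r1 y"), ("ld r1 y", "st.rel y"),
         ("st.rel y", "ld.acq r2 y"), ("ld.acq r2 y", "ld r3 x"),
         ("ld r3 x", "st x")]
cycle = find_cycle(edges)
print(" -> ".join(cycle))
```

A cycle in this graph is exactly the situation where transitivity forces some tuple before itself, contradicting irreflexivity.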

59
A taxonomy of Formal methods to specify
industrial Relaxed Memory Models
  • Operational
  • Operational models of industrial memory models
    are complex
  • Running them inside a standard model-checker is
    too slow!
  • Utility for verification is limited
  • Provides limited insight
  • Axiomatic
  • Much more precise
  • Orderings must ideally be expressed thru an
  • ORTHOGONAL set of rules
  • No such prior axiomatic specs of industrial
    memory models

60
Post-Si verification of MP Orderings today
(oversimplified)
assembly program 1
assembly program n
Run repeatedly to catch one interleaving that
might reveal bug
...
New MP System
...
Check every execution against ordering rules
for compliance
assembly execution 1
assembly execution n
This is done ad hoc. How to make this formal and efficient? How to capitalize on repeated re-runs?
61
Explanation of Illegal Executions (p 31 of Itanium App Note; search 251429)
P: st x 1 ; mf ; ld r1 y <0>
R: ld.acq r2 y <1> ; ld r3 x <0>
Q: st.rel y 1
Labels: la, sr, us, mf, ul2, ul1
  • US >> MF, hence RVr(US) -> F(MF)
  • MF >> UL1, hence F(MF) -> R(UL1)
  • many reasons, hence R(UL1) -> RVp(SR)
  • If RVr(SR) -> R(UL1) and RVr(SR) -> UL1 ->
    RVp(SR), WB release atomicity of SR
    is violated, thus R(UL1) -> RVr(SR)
  • five lines of reasons. Hence RVr(SR) -> R(LA)
  • Since LA >> UL2, R(LA) -> R(UL2)
  • Another para of reasons: LV(Sr2) -> R(UL2) ->
    LV(SR1) -> RVp(SR1) -> RVq(SR1) ->
    F(MF1) -> R(UL1) -> RVq(SR2) -> RVp(SR2). But
    can't allow due to atomicity of SR.

62
Checking Executions and Providing Explanations
(present approach)
P: st x 1 ; mf ; ld r1 y <0>
R: ld.acq r2 y <1> ; ld r3 x <0>
Q: st.rel y 1
  • Published approaches are very labor-intensive
    paper-and-pencil proofs
  • Clearly this can't scale (a 6-instruction MP
    program takes 1 page of detailed
    mathematical proof)
  • What about the combinatorics of reasoning about
    200 instructions?
  • Approaches actually used within the industry
    involve the use of checkers
  • Details of these checkers are unknown (How
    complete? How scalable?)

63
The rest of the talk
  • Itanium memory model in Higher Order Logic
    (well, not so high actually :-) )
  • Our HOL specs -> translation -> sat-generating
    checker programs
  • Execution to be checked -> translation by above
    program to Sat
  • Each assembly instruction -> clauses it generates
    + annotations
  • When Sat, what interleaving explains?
  • When Unsat, how to get core (root-cause)
    + annotations on core
  • Translating annotations on core to cycle on
    original program

64
  • Itanium memory model in Higher Order Logic
    (well, not so high actually :-) )
  • The initial focus of our presentation
  • How to model an execution ?
  • Why use split stores in modeling ?

65
But, how do we check executions against such
specs?
legalItanium(exec) = Exists order. (
     requireStrictTotalOrder exec order
  /\ requireWriteOperationOrder exec order
  /\ requireItProgramOrder exec order
  /\ requireMemoryDataDependence exec order
  /\ requireDataFlowDependence exec order
  /\ requireCoherence exec order
  /\ requireAtomicWBRelease exec order
  /\ requireSequentialUC exec order
  /\ requireNoUCBypass exec order
  /\ requireReadValue exec order )
SC(exec) = Exists order. (
     requireStrictTotalOrder exec order
  /\ requireProgramOrder exec order
  /\ requireReadValue exec order )
Execution 1
  st c,1 ; st d,2
  ld d, 2 ; ld c, 1
Execution 2
  st c,1 ; st d,2
  ld d, 2 ; ld c, 0
e.g., which execution is legal under which memory model?
66
  • Itanium memory model in Higher Order Logic
    (well, not so high actually :-) )
  • Our HOL specs -> translation -> sat-generating
    checker programs

67
  • Itanium memory model in Higher Order Logic
    (well, not so high actually :-) )
  • Our HOL specs -> translation -> sat-generating
    checker programs
  • Execution to be checked -> translation by above
    program to Sat

68
How the SAT encoding is achieved...
Example Execution
  • Store c viewed at P1 for modeling bypassing
  • Store c viewed at P1 for modeling global
    visibility
  • Store c viewed at P2 for modeling global
    visibility
  • Store d viewed at P1 for modeling bypassing
  • Store d viewed at P1 for modeling global
    visibility
  • Store d viewed at P2 for modeling global
    visibility
  • Ld d viewed at P2 for modeling read value
  • Ld c viewed at P2 for modeling read value

P1: st c,1 ; st d,2
P2: ld d, 2 ; ld c, 0
Break it down into tuples
8 tuples obtained
legalItanium(exec) = Exists order. (
     requireStrictTotalOrder exec order
  /\ requireOtherOrderItanium exec order
  /\ requireReadValue exec order )
SC(exec) = Exists order. (
     requireStrictTotalOrder exec order
  /\ requireOtherOrderSC exec order
  /\ requireReadValue exec order )
69
Explaining the results of Sat
  • Itanium memory model in Higher Order Logic
    (well, not so high actually :-) )
  • Our HOL specs -> translation -> sat-generating
    checker programs
  • Execution to be checked -> translation by above
    program to Sat
  • Each assembly instruction -> clauses it generates
    + annotations
  • When Sat, what interleaving explains?
  • When Unsat, how to get core (root-cause)
    + annotations on core
  • Translating annotations on core to cycle on
    original program

70
Clause Annotations
  • Each clause generated by the sat-generating
    checker program also generates an associated
    tuple.
  • This tuple has information pertaining to the
    clause's source.
  • Each tuple has the following information
  • The exec involved in generating the clause (up to
    a maximum of 4 exec could generate a clause)
  • The proc value of the processor whose
    instructions were used to generate this clause
    (taken from the tuples generated by the gentuple
    program)
  • The pc value of the instruction that was the
    source for this tuple
  • The name of the memory ordering rule the
    application of which generated this tuple
    (ReadValue, ProgramOrder, Reflexive, etc.)
  • The clause annotation looks as follows
  • < proc, pc, op1, op2, op3, op4, RuleName >