Shared Memory Consistency Models: A Broad Survey. Ganesh Gopalakrishnan*, School of Computing, University of Utah, Salt Lake City, UT

1
Shared Memory Consistency Models: A Broad Survey
Ganesh Gopalakrishnan, School of Computing, University of Utah, Salt Lake City, UT
Past work supported in part by SRC Contract 1031.001, NSF Award 0219805, and an equipment grant from Intel Corporation
2
Shared Memory Hardware Realities
3
Shared Memory Software Realities
  • Must define the formal semantics of shared-memory concurrent programming while allowing for all reasonable optimizations
  • Defining the shared thread semantics for Java (the original Java book's Chapter 17 has essentially been ripped out)
  • Defining the shared memory model for new languages such as Unified Parallel C (UPC) for scientific programming
  • At a deeper level: must have a formal basis for automatic minimal fence insertion, to make programs appear to execute sequentially consistently

4
Topics
  • Motivations for strong and weak memory models
    - How they affect consistency protocol design
    - How they affect programming
  • Classical memory models
    - Their power
  • Fence insertion during compilation
    - Run on weak architectures but appear to run SC
  • Overview of some weak architectures
  • Itanium in a nutshell
  • SAT-based programs that check executions against memory model specs
    - Demo of MP Execution Checker (MPEC) tool for Itanium

5
Topics
  • Theoretical aspects of memory model specification
    - Specify using traces or specify using transducers
  • Why trace-based specification can allow one to talk about unrealizable machines
    - Hence undecidability of sequential consistency is not a solved problem
  • Why trace-based verification methods need to exert some care
    - Otherwise one can prove conniving machines to be SC!!
  • A brief taxonomy of recent results in this area
    - Mainly Alur et al., Qadeer, Bingham et al., and Sezgin

6
Sequential Consistency: The Most Basic Memory Consistency Model
  • Requirements
  1. There exists a common total order
  2. It respects program order
  3. Each read sees the latest write

Example
Initially, x = y = 0. Finally, can r1 = r2 = 0?
Thread 1: x = 1; r1 = y
Thread 2: y = 2; r2 = x
Under Sequential Consistency: No. Under many weak models: Yes.
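The "No" under SC can be checked by brute force. The sketch below is a minimal illustration, not part of the original slides: it assumes the example reads as Thread 1 = (x = 1; r1 = y) and Thread 2 = (y = 2; r2 = x), and the names `interleavings`, `run`, and the tuple encoding of instructions are made up for this example. It enumerates every interleaving that respects program order and collects the resulting (r1, r2) pairs.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Hypothetical encoding: ("st", location, value) or ("ld", register, location).
THREAD1 = [("st", "x", 1), ("ld", "r1", "y")]
THREAD2 = [("st", "y", 2), ("ld", "r2", "x")]

def run(schedule):
    """Execute one interleaving against a single shared memory (the SC view)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op in schedule:
        if op[0] == "st":
            mem[op[1]] = op[2]
        else:
            regs[op[1]] = mem[op[2]]
    return regs["r1"], regs["r2"]

sc_outcomes = {run(s) for s in interleavings(THREAD1, THREAD2)}
print(sorted(sc_outcomes))
print((0, 0) in sc_outcomes)
```

Every interleaving places at least one store before the other thread's load, so the pair (0, 0) never shows up in `sc_outcomes`.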
7
How to Think About Sequential Consistency
[Figure: processors P1, P2, ..., Pn all connected to a single shared Memory]
Initially, x = y = 0. Finally, can r1 = r2 = 0?
Thread 1: x = 1; r1 = y
Thread 2: y = 2; r2 = x
No! Not under SC! But it is possible under many weak memory models. An example of such a weak memory model is Sparc TSO.
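To see why a TSO-like machine admits r1 = r2 = 0, here is a rough sketch of a toy store-buffer machine. It is an assumption-laden illustration rather than the SPARC specification: each thread gets one FIFO store buffer, a load first checks its own buffer and otherwise reads memory, and buffered stores drain to memory at arbitrary times. The function name `tso_outcomes` and the instruction encoding are hypothetical.

```python
def tso_outcomes(threads, init):
    """Explore every execution of a toy TSO machine (one FIFO store buffer per thread)."""
    results = set()
    n = len(threads)

    def explore(pcs, bufs, mem, regs):
        progressed = False
        for t in range(n):
            # Choice 1: thread t executes its next instruction.
            if pcs[t] < len(threads[t]):
                progressed = True
                op = threads[t][pcs[t]]
                next_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op[0] == "st":
                    _, loc, val = op
                    grown = bufs[t] + ((loc, val),)
                    explore(next_pcs, bufs[:t] + (grown,) + bufs[t + 1:], mem, regs)
                else:
                    _, reg, loc = op
                    # A load sees its own newest buffered store, else memory.
                    pending = [v for (l, v) in bufs[t] if l == loc]
                    val = pending[-1] if pending else dict(mem)[loc]
                    explore(next_pcs, bufs, mem, regs + ((reg, val),))
            # Choice 2: the oldest store buffered by thread t drains to memory.
            if bufs[t]:
                progressed = True
                loc, val = bufs[t][0]
                new_mem = dict(mem)
                new_mem[loc] = val
                explore(pcs, bufs[:t] + (bufs[t][1:],) + bufs[t + 1:],
                        tuple(sorted(new_mem.items())), regs)
        if not progressed:
            results.add(tuple(sorted(regs)))

    explore((0,) * n, ((),) * n, tuple(sorted(init.items())), ())
    return results

THREAD1 = [("st", "x", 1), ("ld", "r1", "y")]
THREAD2 = [("st", "y", 2), ("ld", "r2", "x")]
outcomes = tso_outcomes([THREAD1, THREAD2], {"x": 0, "y": 0})
print((("r1", 0), ("r2", 0)) in outcomes)
```

The outcome r1 = r2 = 0 arises when both stores are still sitting in their buffers while both loads read memory.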
8
Coherence: Per-location Sequential Consistency
[Figure: processors P1, P2, ..., Pn connected to a 1-address Memory]
Initially, x = y = 0. Finally, can r1 = r2 = 0?
Thread 1: x = 1; r1 = y
Thread 2: y = 2; r2 = x
Notice that the same execution is Coherent!
9
Memory Consistency Models
Defines the legal orderings of memory operations that can be perceived at the user level
  • Processors intermittently throw colors onto memory cells and also intermittently look at their colors

[Figure: processors P1, P2, ..., Pi, ..., Pn above Memory Cell 1, Memory Cell 2, ..., Memory Cell n]
10
Memory Consistency Models
Defines the legal orderings of memory operations
that can be perceived at the user level
  • Many have been developed
  • Sequential Consistency (SC)
  • Coherence (per-location SC)
  • Parallel Random Access Memory (PRAM)
  • Causal Consistency
  • Processor Consistency (PC)
  • Release Consistency
  • Location Consistency
  • The Intel Itanium Memory Model
  • Java Memory Model (JMM)
  • and more!

11
Memory Consistency Model Specifications
A VERY complex specification for a real architecture (e.g. Itanium, PowerPC, ...). Also of growing concern in software (e.g. Java Memory Model, Unified Parallel C model, ...).
12
Motivation for (weak) Memory Consistency models: A Hardware Perspective
  • Cannot afford to do industrious updates across large MP systems
  • Delayed and re-orderable updates allow considerable latitude in memory consistency protocol design => fewer bugs in protocols!!

[Figure: protocol hierarchy with chip-level, intra-cluster, and inter-cluster protocols, directories (dir), and memories (mem)]
13
Price Paid for Delayed Updates: Bugs!
  • Algorithms such as Peterson's Mutual Exclusion cease to work!

Thread 1                                  Thread 2
------------                              -----------
Flags1 = BUSY                             Flags2 = BUSY
Turn = 2                                  Turn = 1
While (Flags2 == BUSY && Turn != 1) ;     While (Flags1 == BUSY && Turn != 2) ;
Critical section                          Critical section
Flags1 = FREE                             Flags2 = FREE

[Slide annotation on each thread's While loop:] CAN READ OLD VALUE!!
14
Scope of Tutorial
  • Survey of Classical Work
  • Survey of Current Activities (that this speaker is aware of)
  • Verification Challenges
  • Theoretical Questions
  • Justification for topic selection
  • Complement talks on Shared Memory Consistency Protocols
  • Intuitions more important than the detailzzz.
  • Knowing who's who in this area helps
  • Excuse for me to stick my neck out and learn something new

15
Organization
  1. Overview (mainly of classical works)
  2. Practical aspects of weak consistency models (more depth)
  3. What's not apparent at first glance (still more depth)
  4. Conclusions and references

16
Part 1: Overview of Classical Work
17
Memory Serves to Plumb Data
Uniprocessor: Write(address 2, data 33) ... Read(address 2, returns data 33)

Multiprocessor:
P1                P2
----              ----
Write(2, 33)      Read(2, 33)     ?

... but respecting Coherence!

Multiprocessor:
P1                P2
----              ----
Write(2, 33)      Write(2, 77)
Read(2, 77)       Read(2, 33)     ?

P1              P2              P3              P4
----            ----            ----            ----
Write(2, 33)    Write(2, 77)    Read(2, 33)     Read(2, 77)
                                Read(2, 77)     Read(2, 33)     ?
18
... but Coherence is not sufficient
From Shasha and Snir, Figure 1, p. 282 (ACM TOPLAS 10(2), 1988)

Processor 1               Processor 2
-------------             --------------
Test_and_set1(LOCK)       Test_and_set2(LOCK)
Read1(X)                  Read2(X)
Write1(X)                 Write2(X)
Reset1(LOCK)              Reset2(LOCK)

The following memory access sequence respects Coherence but breaks the critical section:
Test_and_set1(LOCK), Read1(X), Reset1(LOCK), Test_and_set2(LOCK), Read2(X), Write1(X), Write2(X), Reset2(LOCK)
  • A consistent view ACROSS THE ADDRESS SPACE is needed
  • The most intuitive such view: Sequential Consistency!

19
Basic understanding of SC
  • Execute AS IF the instructions in each thread were executed sequentially and atomically
    - respecting the program order in each thread
    - no constraints across sequential programs

Requires effort to achieve the above effect AS WELL AS high performance
CPU 1: Write(4, 66) MISSES; Read(2, 22) HITS
CPU n: Write(2, 55) MISSES; Read(4, 11) HITS
Which Read waits?

[Figure: CPU 1 ... CPU n connected to a Memory and Bus Controller]
20
Aggressive SC Implementations
From Adve, Pai, and Ranganathan (Proc. IEEE 87(3), March 1999, p. 448): If the accessed location does not change its value until the Read could have been non-speculatively issued, then the speculation is successful. Otherwise, roll back the speculation up to the incorrect load. (Similar schemes are used in HP PA-8000, Intel Pentium Pro, MIPS R10K.)
CPU 1: Write(4, 66) MISSES; Read(2, 22) HITS
CPU n: Write(2, 55) MISSES; Read(4, 11) HITS

[Figure: CPU 1 ... CPU n connected to a Memory and Bus Controller; the snoops seen by each CPU are Write(4,66) and Write(2,55)]
One way to implement this: if the bus-snoop for Write(4,..) arrives before that for Write(2,..), the Read(4, 11) is invalidated and reissued.
21
Unexpected Interactions: SC and Write-Update Protocols (from Grahn, Stenstrom, Dubois)
  • An important aspect of Sequential Consistency is Write Atomicity
  • Write-Invalidate protocols can easily guarantee Write Atomicity
  • However, Write-Update protocols are often recommended (read latency)
  • Ensuring Write Atomicity in Write-Update protocols is tricky
  • WEAK MEMORY MODELS TO THE RESCUE!
  • Don't care about Write Atomicity except at Acquire / Release points

[Figure: protocol hierarchy with chip-level, intra-cluster, and inter-cluster protocols, directories (dir), and memories (mem)]
22
A Deeper Look at Coherence
Checking coherence of executions is NP-complete (NPC)
Cantin's proof: reduction from SAT
The existence of a coherent schedule is what is tested
Example Consider (u1 \/ u2) /\ (u1 \/
u2) Create the following concurrent
processes h1 h2 h_u1
h_u1 h_u2 h_u2 h3 ---
--- ----- -------
----- ------- --- W(d_u1)
W(d_u1) R(d_u1) R(d_u1) R(d_u2)
R(d_u2) R(d_c1) W(d_u2)
W(d_u2) R(d_u1) R(d_u1) R(d_u2)
R(d_u2) R(d_c2)
W(d_c1) W(d_c2)
W(d_c1) W(d_u1)


W(d_c2) W(d_u2)


W(d_u1)


W(d_u2)


W(d_F)
Literal Gadget
Clause Gadget
23
A Deeper Look at Coherence
  • Memory models that relax coherence, and how useful they are
  • PRAM (pipelined RAM; Lipton and Sandberg) is of academic interest

[Figure: processors P1, P2, ..., Pn, each with its own memory]
One memory per processor. Program order is obeyed, but there is no Write Atomicity.
24
A Deeper Look at Coherence
  • Memory models that relax coherence, and how useful they are
  • PRAM: of academic interest
  • Location Consistency (LC)
    - Proposed by Gao and Sarkar
    - They tout its advantages in terms of scalability
    - They describe an LC protocol machine
  • Analysis by Wallace et al. (PDPTA 2002, 1542-1550)
    - Shows that this LC machine is stronger than the LC definition
    - Questions whether LC programs indeed appear to execute with sequentially consistent outcomes, assuming that they are properly labeled
  • I have not seen many publications on LC of late

25
Classical Weak Memory Models
  • Processor Consistency is widely known
  • Good discussion in Ahamad et al., "The Power of Processor Consistency"
  • First understand PRAM
    - For each processor p, there is a legal serialization $S_p$ of $H_{p+w}$ such that if $o_1$ and $o_2$ are in $H_{p+w}$ and $o_1 \xrightarrow{po} o_2$, then $o_1 \xrightarrow{S_p} o_2$
  • For PC_g, we add the following condition: for any two processors p and q, and for any location x, $S_p|(w,x) = S_q|(w,x)$
  • Processor Consistency according to Goodman (PC_g) is not the same as PC_d, processor consistency according to the DASH project

26
Execution that's PRAM and Coherent but not PC_g
P: w(x,0) w(y,0)
Q: r(y,0) w(x,1)
R: r(x,1) r(x,0)
Coherent! Just look at each color separately.
Not PC_g: construct a history per processor with all of the processor's actions and all of the others' writes in that history. PC_g requires the write-histories to agree per variable, but in our example:
History of Q: w(x,0) w(x,1)
History of R: w(x,1) w(x,0)
27
The power of Processor Consistency
  • Can handle Peterson (Ahamad)
  • Can't handle Bakery (Ahamad)
  • What else? (Kawash and Higham, "Bounds for mutual exclusion with only Processor Consistency")
    - Peterson is correct for PC-G (a multi-writer protocol)
    - Bakery is incorrect for PC-G (a single-writer protocol)
    - Kawash and Higham prove that for mutual exclusion under PC-G, one multi-writer and n single-writers are necessary

28
Observations
  • Weak shared memory consistency models allow consistency protocols to be efficient
  • Unfortunately, programmers find weak models non-intuitive
  • How can we have the best of both worlds?
    - Weak models to be supported by the hardware
    - Strong models to be presented by the software
  • This can be achieved through compilers that insert the minimal number of fence instructions to give the appearance of SC

29
Basics of Fence Insertion
  • Widely cited work is by Shasha and Snir
  • Recent work by Lee, Midkiff, and Padua extends the above
  • Let us go through some examples (initially all memory locations are 0)

P1                P2
----              ----
write(x,1)        read(y, yd)
write(y,1)        read(x, xd)

Under SC, if yd = 1, then xd = 1
30
Basics of Fence Insertion
P1                P2
----              ----
write(x,1)        read(y, yd)
write(y,1)        read(x, xd)

  • BUT if we allow instructions to re-order, then the guarantee "if yd = 1, then xd = 1" is lost!!
  • But often we CAN re-order without noticing an SC violation
  • When can we re-order??

31
Basics of Fence Insertion
  • Widely cited work is by Shasha and Snir (our examples are from their paper)
  • Recent work by Lee, Midkiff, and Padua extends the above
  • Let us go through some examples (initially all memory locations are 0)

P1                P2
----              ----
write(x,1)        read(y, yd)
write(y,1)        read(x, xd)

[Figure: the two program-order edges are labeled a and b]
  • Which program order edges in P = {a, b} must be respected in order to guarantee SC-compliant executions?
  • Preserving a alone: insufficient, as it can return xd = 0, yd = 1
  • Preserving b alone: insufficient, as it can return xd = 0, yd = 1
  • BOTH a and b need to be preserved. How to compute this in general?
  • Terminology: {a, b} in this example forms the Delay Set, D

32
Analysis is based on Critical Cycles
  • Locate all critical cycles in the concurrent program
  • Equate the Delay Set D to all the program-order edges in all critical cycles
  • Locating Critical Cycles
    - Locate all Conflict Edges C: locate two accesses that are concurrent and one of them is a write; these give the undirected Conflict Edges C
    - A critical cycle is a cycle in P ∪ C that has the following properties:
      . Contains at most two operations from the same thread, which are consecutive in it
      . Contains 0, 2, or 3 accesses to each shared variable, which are consecutive in it (further properties omitted)

33
Finding Critical Cycles: Example 1
P1                P2
----              ----
write(x,1)        read(y, yd)
write(y,1)        read(x, xd)

[Figure: the program drawn twice, first with the Program Order Edges P and the Conflict Edges C, then with the Critical Cycle highlighted]
Delay Set D = all the P edges in the Critical Cycle = P in our case
34
Finding Critical Cycles Example 2
P1 P2 ----
---- read(x, xd)
write(x,1) read(y, yd) write(y,1)
Basically a while loop
Conflict Edges
P1 P2 ----
---- read(x, xd)
write(x,1) read(y, yd) write(y,1)
Critical Cycle
b
c
a
Delay Set D = {b, c}, whereas P = {a, b, c}
35
Finding Critical Cycles Example 3
Thread 1: a1: read A; b1: read B; c1: read C; d1: read D
Thread 2: a2: write B; b2: write C; c2: write D; d2: write A
D = {(a1,b1), (a1,c1), (a1,d1), (a2,d2), (b2,d2), (c2,d2)} suffices to ensure SC! I.e., a1 is an acquire-read and d2 is a release-write!!
36
Basic Approach to Fence Insertion
  • Goal: discover the minimal set of fences to be inserted into a concurrent shared memory program
  • Suppose D is the delay set discovered by the previous analysis
  • Suppose the underlying (weak) architecture supports orderings D_o
  • Let D_m be the fences to be inserted to get the effect of D
  • $D_m = (D \cup D_o)^{tr} - D_o$, where tr is the transitive reduction

Example on nodes a, b, c, d:
  • Required Delay Set D = {(a,b), (b,c), (a,d)}
  • D_o = {(c,d)}
  • $(D \cup D_o)^{tr}$ = {(a,b), (b,c), (c,d)}
  • $(D \cup D_o)^{tr} - D_o$ = {(a,b), (b,c)}: fences are needed only here
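The computation of D_m on this slide can be reproduced mechanically. The sketch below is only an illustration under the assumption that the combined ordering relation is acyclic (so its transitive reduction is unique); the helper names `transitive_closure` and `transitive_reduction` are made up here and the closure is computed by a naive fixpoint.

```python
from itertools import product

def transitive_closure(edges):
    """Smallest transitive relation containing edges (naive fixpoint)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def transitive_reduction(edges):
    """For an acyclic relation: keep only edges not implied by a longer path."""
    closure = transitive_closure(edges)
    nodes = {x for edge in edges for x in edge}
    return {(a, c) for (a, c) in edges
            if not any((a, b) in closure and (b, c) in closure for b in nodes)}

D = {("a", "b"), ("b", "c"), ("a", "d")}   # required delay set
D_o = {("c", "d")}                         # ordering the hardware already provides
D_m = transitive_reduction(D | D_o) - D_o  # orderings that still need a fence
print(sorted(D_m))
```

The edge (a,d) disappears because it is implied by the path a, b, c, d, and (c,d) is dropped because the hardware already enforces it.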
37
Basic Approach to Fence Insertion
  • Required Delay Set D = {(a,b), (b,c), (a,d)}
  • D_o = {(c,d)}
  • $(D \cup D_o)^{tr}$ = {(a,b), (b,c), (c,d)}
  • $(D \cup D_o)^{tr} - D_o$ = {(a,b), (b,c)}: fences are needed only here

So, in a nutshell:
[Figure: the sequence a, fence, b, fence, c, d, together with the hardware-provided ordering between c and d, implements the desired delay set on a, b, c, d]
38
Deriving Fences from Correctness Proofs
Lamport's paper "How to Make a Correct Multiprocess Program Execute Correctly on a Multiprocessor," IEEE Trans. Computers 46(7), 1997, provides really good insight into deriving the required weak orderings through proofs
  • Notation
  • A -> B: every event in A precedes every event in B
  • A --> B: some event in A precedes some event in B

[Figure: two "Implies" diagrams relating these orderings]
39
Deriving Places to insert a Synch Instruction
There is a proof in Lamport's paper that with just these Synch instructions, mutual exclusion is guaranteed.

repeat forever
    noncritical section
    L: x_i := true
    for j := 1 until i-1 do
        if x_j then x_i := false
                    while x_j do od
                    goto L
        fi
    od
    for j := i+1 until N do
        while x_j do od
    od
    critical section
    x_i := false
end repeat

[Slide annotations: three Synch markers placed on the code above]
40
Part 2: A Detailed Look at a Practical Weak Memory Model: Itanium (I do mention three others briefly)
41
Well, let's look at the big picture first
  • Sparc TSO, PSO, RMO
    - Reads and Writes follow the TSO, PSO, or RMO semantics
    - Additional Fence instructions and others (e.g. semaphores)
    - I'm not up to speed on these
  • Alpha
    - Reads (only coherence)
    - Writes (only coherence)
    - Load-Locked
    - Store-Conditional
    - Membar

42
Well, let's look at the big picture
  • Power-4
    - Reads and Writes (don't know much)
    - Sync (Synchronize)
    - Lwsync (Lightweight Sync; new in Power4)
    - Eieio (Enforce In-Order Execution of I/O)
    - Lwarx (Load word and reserve)
    - Ldarx (Load doubleword and reserve)
    - Stwcx (Store word conditional)
    - Stdcx (Store doubleword conditional)
    - Isync (Instruction synchronize)

Perhaps Old McDonald knows more
43
IA-32, IA-64, AMD, ?
  • Generally thought to be Processor Consistency
  • Does it really help formally specify (or even
    reveal the details) ?
  • Intel thought so
  • The Itanium memory model is described next

44
The Intel Itanium Processor memory model
  • Has these kinds of instructions

weak load or ordinary load -- ld
strong load or acquire load -- ld.acq
weak store or ordinary store -- st
strong store or release store -- st.rel
memory fence (NOT barrier!) -- mf
A few semaphore types
Allows sub-word writes, I/O spaces
We don't model these
45
Itanium memory model thru examples
Ordinary store

Can freely slide in a sequential program
st x 2

Only rule is coherence
The same applies to an ordinary load

ld reg1 x

46
Itanium memory model thru examples
Release store

st.rel x 2
Things before it in sequential program order can't happen after it
Things after it in sequential program order may happen before it!!
47
Itanium memory model thru examples
Acquire load

ld.acq r3 y
Things before it in sequential program order may happen after it
Things after it in sequential program order can't happen before it!!
48
But with these rules alone, we can't explain the following legal outcome in Itanium

P0                      P1
st.rel y 1              st.rel x 2
ld.acq r3 y <1>         ld.acq r4 x <2>
ld reg1 x <0>           ld reg2 y <0>

[Figure annotations: "Data dep." and "ld.acq rule" label the ordering edges within each thread]
The Itanium specification DOES NOT try to explain outcomes in terms of shuffles of the original instructions!
49
Itanium rules explain execution outcomes in terms of progenies of stores and loads
This has turned out to be an unspoken convention in this area for other memory models also
A store generates (n+1) progenies
Other instructions generate only one
[Figure: st y 1 split into a local copy for P0, a remote copy for P0, and a remote copy for P1; ld.acq r3 y remains a single event]
50
We wrote such a breeding assembler
P1: St a,1; Ld r1,a <1>; St b,r1 <1>
P2: Ld.acq r2,b <1>; Ld r3,a <0>
Tuple 1 through Tuple 9:
id=0 proc=0 pc=0 op=St var=0 data=1 wrID=0 wrType=Local wrProc=0 reg=-1 useReg=false
id=1 proc=0 pc=0 op=St var=0 data=1 wrID=0 wrType=Remote wrProc=0 reg=-1 useReg=false
id=2 proc=0 pc=0 op=St var=0 data=1 wrID=0 wrType=Remote wrProc=1 reg=-1 useReg=false
id=3 proc=0 pc=1 op=Ld var=0 data=1 wrID=-1 wrType=DontCare wrProc=-1 reg=0 useReg=true
id=4 proc=0 pc=2 op=St var=1 data=1 wrID=4 wrType=Local wrProc=0 reg=0 useReg=true
id=5 proc=0 pc=2 op=St var=1 data=1 wrID=4 wrType=Remote wrProc=0 reg=0 useReg=true
id=6 proc=0 pc=2 op=St var=1 data=1 wrID=4 wrType=Remote wrProc=1 reg=0 useReg=true
id=7 proc=1 pc=0 op=LdAcq var=1 data=1 wrID=-1 wrType=DontCare wrProc=-1 reg=1 useReg=true
id=8 proc=1 pc=1 op=Ld var=0 data=0 wrID=-1 wrType=DontCare wrProc=-1 reg=2 useReg=true
...
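The expansion shown on this slide can be imitated in a few lines. This is a guess at the behavior from the printed tuples only, not the authors' assembler: the function name `breed`, the `(op, var, data, reg)` input encoding, and the choice of numbering variables and registers from 0 are assumptions made for illustration.

```python
def breed(program):
    """Expand each store into one local and n remote tuples; other ops yield one tuple."""
    n = len(program)  # number of processors
    tuples = []
    for proc, instrs in enumerate(program):
        for pc, (op, var, data, reg) in enumerate(instrs):
            base = {"proc": proc, "pc": pc, "op": op, "var": var, "data": data,
                    "reg": reg, "useReg": reg != -1}
            if op in ("St", "StRel"):
                wr_id = len(tuples)  # id of the local copy names the whole family
                copies = [("Local", proc)] + [("Remote", p) for p in range(n)]
            else:
                wr_id = -1
                copies = [("DontCare", -1)]
            for wr_type, wr_proc in copies:
                tuples.append({"id": len(tuples), **base, "wrID": wr_id,
                               "wrType": wr_type, "wrProc": wr_proc})
    return tuples

# The two-processor program from the slide, with a -> var 0, b -> var 1,
# r1/r2/r3 -> reg 0/1/2, and reg = -1 when no register is involved.
PROGRAM = [
    [("St", 0, 1, -1), ("Ld", 0, 1, 0), ("St", 1, 1, 0)],
    [("LdAcq", 1, 1, 1), ("Ld", 0, 0, 2)],
]

tuples = breed(PROGRAM)
for t in tuples:
    print(t)
```

With two processors each store yields three tuples, so the five instructions expand to nine tuples, matching the count on the slide.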
51
Itanium rules specify how to line up the tuples to explain the load outcomes!!
P0: st y 1; ld.acq r3 y <1>; ld reg1 x <0>
P1: st x 2; ld.acq r4 x <2>; ld reg2 y <0>
Split copies of the stores (l = local, rp0 / rp1 = remote copy for P0 / P1):
st y 1 l, st y 1 rp0, st y 1 rp1
st x 2 l, st x 2 rp0, st x 2 rp1
Now, arrange the split copies.
Explanation (in the order listed on the slide; the figure marks dependencies and anti-dependencies between entries):
  1. st y 1 l
  2. ld.acq r3 y <1>
  3. st x 2 l
  4. ld.acq r4 x <2>
  5. st y 1 rp0
  6. st x 2 rp1
  7. ld reg1 x <0>
  8. st x 2 rp0
  9. ld reg2 y <0>
  10. st y 1 rp1
52
Gist of our method: illustration on SC and on Itanium
The tuples to be ordered: find an arrangement under the SC constraints

SC(exec) = Exists order. (
     requireStrictTotalOrder exec order
  /\ requireProgramOrder exec order
  /\ requireReadValue exec order )

The tuples to be ordered: find an arrangement as per the Itanium constraints

legalItanium(exec) = Exists order. (
     requireStrictTotalOrder exec order
  /\ requireWriteOperationOrder exec order
  /\ requireItProgramOrder exec order
  /\ requireMemoryDataDependence exec order
  /\ requireDataFlowDependence exec order
  /\ requireCoherence exec order
  /\ requireAtomicWBRelease exec order
  /\ requireSequentialUC exec order
  /\ requireNoUCBypass exec order
  /\ requireReadValue exec order )
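The SC(exec) formula above can be read directly as a search problem. The following sketch is a brute-force stand-in for the SAT-based search described later, written only to make the three conjuncts concrete; the dictionary encoding of operations and the function name `is_sc` are assumptions for this example, and a permutation plays the role of the strict total order.

```python
from itertools import permutations

def is_sc(execution, init=0):
    """True if some total order respects program order and every read's observed value."""

    def respects_program_order(order):
        last_pc = {}
        for op in order:
            if op["pc"] < last_pc.get(op["proc"], -1):
                return False
            last_pc[op["proc"]] = op["pc"]
        return True

    def reads_see_latest_write(order):
        mem = {}
        for op in order:
            if op["kind"] == "st":
                mem[op["var"]] = op["val"]
            elif mem.get(op["var"], init) != op["val"]:
                return False
        return True

    # Each permutation is one candidate strict total order.
    return any(respects_program_order(o) and reads_see_latest_write(o)
               for o in permutations(execution))

def sb_execution(r1, r2):
    """Store-buffering litmus test with the values the two loads observed."""
    return [
        {"proc": 0, "pc": 0, "kind": "st", "var": "x", "val": 1},
        {"proc": 0, "pc": 1, "kind": "ld", "var": "y", "val": r1},
        {"proc": 1, "pc": 0, "kind": "st", "var": "y", "val": 2},
        {"proc": 1, "pc": 1, "kind": "ld", "var": "x", "val": r2},
    ]

print(is_sc(sb_execution(0, 0)))
print(is_sc(sb_execution(0, 1)))
```

Trying all n! orders is only viable for litmus-sized executions, which is one reason to hand the same constraints to a SAT solver instead.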
53
Our Itanium Formal Model (extracted from Intel documents, written as a HOL theory)
legal_itanium exec ( a given execution ) =
?order. requireStrictTotalOrder exec order
/\ requireWriteOperationOrder exec order
/\ requireProgramOrder exec order
/\ requireMemoryDataDependence exec order
/\ requireDataFlowDependence exec order
/\ requireCoherence exec order
/\ requireReadValue exec order
/\ requireAtomicWBRelease exec order
/\ requireSequentialUC exec order
/\ requireNoUCBypass exec order
See CHARME'03, IPDPS'04, CAV'04. Various contributions by Yue Yang, Gopalakrishnan, Lindstrom, Slind, Sivaraj, Yu Yang
54

requireStrictTotalOrder exec order
55

requireWriteOperationOrder exec order
Local Write before Local Global Write
Local Write before Remote Global Writes
56

requireProgramOrder exec order
Program Order is defined solely through
Acquires, Releases,
and Fences
57

requireMemoryDataDependence exec order
Order two accesses (Read or Write) under these conditions:
IF program-ordered AND to the same variable AND
  • Write is local and RAW (and the Read of course is local), OR
  • Write is local and WAR, OR
  • Both writes are local and WAW, OR
  • Both writes are remote and WAW and fall in the same processor
58

requireDataFlowDependence exec order
Data Dependence Thru the Register-Space
59

requireCoherence exec order
Just plain-old Coherence, but for TWO WRITES falling in the WB or UC space, and for EITHER two Local Writes OR two Remote Writes in the same processor
60

requireReadValue exec order
Reads return Most Recent Writes
61

requireAtomicWBRelease exec order
All Remote Events Stemming from the Same
Release-Write Instruction appear to be an Atomic
Set
62

requireSequentialUC exec order
In the UC Space, Program-Ordered UC Read and
Write Events, both of which are Local are
ordered as per program order (the two
operations in question could be RR, RW, WR, or WW)
63

requireNoUCBypass exec order
UC-space Operations Do Not Exhibit Read
Bypassing as in TSO
64
A MEMORY MODEL RULE IN HOL
requireCoherence exec order =
  !i j. i IN exec /\ j IN exec ==>
    isWr i /\ isWr j /\ (i.var = j.var) /\ order i j /\
    ((attr_of i.var = WB) \/ (attr_of i.var = UC)) /\
    ((i.wrType = Local) /\ (j.wrType = Local) /\ (i.proc = j.proc) \/
     (i.wrType = Remote) /\ (j.wrType = Remote) /\ (i.wrProc = j.wrProc))
    ==>
    !p q. p IN exec /\ q IN exec ==>
      isWr p /\ isWr q /\
      (p.wrID = i.wrID) /\ (q.wrID = j.wrID) /\
      (p.wrType = Remote) /\ (q.wrType = Remote) /\ (p.wrProc = q.wrProc)
      ==> order p q
65
One use we have put our Spec to: Post-Si Verification of MP Systems
How do we know that the actual silicon matches the shared memory model?
[Figure: the silicon on one side and the logical specification (quantifiers and connectives) on the other, with a "?" between them]
  • Pray
  • Run tests and manually check results
  • What else?

66
FORMALLY VERIFY interesting EXECUTIONS
P1's exec:
  st8 12ca20 7f869af546f2f14c
  ld8 r25 45180 <87b5e547172644a8>
  ld2 r26 2c2a2c <44a8>
  ld2 r27 45aa2a <c58e>
P2's exec:
  st8 45180 87b5e547172644a8
  ld8 r25 45180 <87b5e547172644a8>
  st2 2c2a2c 44a8
  st2 45aa2a c58e

67
TWO APPROACHES: explicitly QB, implicitly QB
[Figure: two tool flows]
  • Explicit: spec of memory model in HOL + given execution -> BOOLIFY -> QBF (prototyped this, but definitely need to re-code it)
  • Implicit: spec of memory model in HOL -> CONVERT TO EXECUTION CHECKER PROGRAM -> PROGRAM; program + given execution -> SAT PROBLEM
68
The alternative is to produce a manual proof
Even this simple litmus test has a 1-page detailed proof
P: st x 1; mf; ld r1 y <0>
Q: st.rel y 1
R: ld.acq r2 y <1>; ld r3 x <0>
Atomicity of st.rel
Load of the initial value is before the store of every other value
69
The MPEC Tool Flow
[Figure: tool flow]
  • Inputs: the MP execution to be verified (P: st x 1; mf; ld r1 y <0>   Q: st.rel y 1   R: ld.acq r2 y <1>; ld r3 x <0>) and the Itanium ordering rules in HOL
  • Mechanical program derivation (to be automated) turns the rules into a Checker Program
  • The Checker Program produces a satisfiability problem with clauses carrying annotations
  • The SAT solver answers Sat or Unsat
  • Sat: explanation in the form of one possible interleaving
  • Unsat (RECENT WORK): unsat core extraction using Zcore, then
    - Find offending clauses
    - Trace their annotations
    - Determine ordering cycle
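The last step, determining an ordering cycle, amounts to cycle detection in the graph of required "before" edges. The sketch below shows one conventional way to do it with a depth-first search; it is not the MPEC implementation. The edge list is hypothetical: its first four edges follow the cycle reported for the large Intel test elsewhere in this talk, and the closing edge from the st back to the st.rel is an assumed program-order edge added so the example forms a cycle.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes (first == last), or None if the graph is acyclic."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    stack = []

    def dfs(v):
        color[v] = GREY
        stack.append(v)
        for w in graph[v]:
            if color[w] == GREY:            # back edge: w is on the current path
                return stack[stack.index(w):] + [w]
            if color[w] == WHITE:
                found = dfs(w)
                if found:
                    return found
        stack.pop()
        color[v] = BLACK
        return None

    for v in graph:
        if color[v] == WHITE:
            cycle = dfs(v)
            if cycle:
                return cycle
    return None

# Hypothetical "must come before" edges recovered from clause annotations.
EDGES = [
    ("st.rel (line 18, P1)", "ld (line 22, P2)"),
    ("ld (line 22, P2)", "mf"),
    ("mf", "ld (line 30, P2)"),
    ("ld (line 30, P2)", "st (line 11, P1)"),
    ("st (line 11, P1)", "st.rel (line 18, P1)"),  # assumed program-order edge
]
print(" -> ".join(find_cycle(EDGES)))
```

A cycle in these edges is exactly what makes a strict total order impossible, which is why it serves as the human-readable explanation of an Unsat answer.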

70
Largest example tried to date (courtesy S. Zeisset, Intel)
Proc 1: st8 12ca20 7f869af546f2f14c; ld r25 45180 <87b5e547172644a8>; ... 58 more instructions ...; st2 7c2a00 4bca
Proc 2: ld4 r24 733a74 <415e304>; st4.rel 175984 96ab4e1f; ... 67 more instructions ...; ld8 r87 56460 <b5c113d7ce4783b1>
  • Initially the tool gave a trivial violation
  • Diagnosed to be forgotten memory initialization
  • Added a method to incorporate memory initialization in our tool
  • Our tool found the exact same cycle as pointed out by the author of the test

Cycle found through our tool: st.rel (line 18, P1) -> ld (line 22, P2) -> mf -> ld (line 30, P2) -> st (line 11, P1)
71
Statistics Pertaining to Case Study
  • 140 total instructions
  • All runs were on a 1.733 GHz, 1 GB Athlon running Redhat Linux V9
  • 1 minute to generate the SAT instance
  • 9M clauses (O(n^3) in terms of instructions)
  • 117,823 variables (not a problem)
  • 1 minute to run SAT (unsat here); 0.2 sec to do the real work
  • Zcore runs fast: gave 23 clauses in one iteration

72
Overview of MPEC
  • Example of how a HOL rule was turned into a SAT
    generator
  • How the SAT part was done

Throwing an efficient transitivity blanket
over a problem to cover it with whatever
transitivity it begs for !!
  • What more to expect
  • Related work

73
Gist of constraints
  • Some arrangements are statically known
  • Others are conditional (one ordering, or a conjunction of orderings, implies another)
  • Some must form an atomic set: everybody else is strictly before or strictly after
  • Many are unordered
  • Find a strict total order satisfying all of the above!

74
Gist of constraint ENCODING
[Figure: an N x N Boolean precedence matrix with rows i and columns j]
  • Use a Boolean precedence matrix
  • Capture "i before j" by m_ij
  • Statically known orderings -> unit clauses
  • Conditional orderings (implies, and) -> Boolean formula
  • Atomic set -> see how the SAT-generator is derived
  • Strict total order: spew out irreflexivity and totality axioms, then throw a transitivity blanket on top of all tuples
75
Other Approaches Tried
  • Small Domain method (n log n encoding)
    - Generates fantastically hard SAT problems!
    - Chokes many SAT solvers; Zchaff-II can handle it well
  • Incremental SAT (see CAV04)
  • QBF version: initial prototype needs lots of work
    - Can serve to provide good QBF benchmarks

76
Approaches to transitivity blanket
Naïve: for all tuples i, j, and k, generate m_ij /\ m_jk => m_ik. Too many clauses (1B for a 1000-tuple program).
Better: obtain the transitive closure of the known orderings and then prune irrelevant parts of the blanket.
E.g., if m_ij is known, don't generate m_ij /\ ... => ... as well as ... /\ m_ij => ...
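The "1B for a 1000-tuple program" figure follows from one clause per ordered triple of distinct tuples. Below is a small sketch of the naive blanket only (the pruning step is not shown); the DIMACS-style variable numbering in `var` is an assumption for illustration, as is writing each implication m_ij /\ m_jk => m_ik as the clause (-m_ij, -m_jk, m_ik).

```python
from itertools import permutations

def var(i, j, n):
    """Assumed 1-based variable number for m_ij ("tuple i is before tuple j")."""
    return i * n + j + 1

def naive_transitivity(n):
    """Yield one CNF clause per ordered triple of distinct tuples i, j, k."""
    for i, j, k in permutations(range(n), 3):
        yield (-var(i, j, n), -var(j, k, n), var(i, k, n))

def naive_clause_count(n):
    """Number of clauses naive_transitivity(n) produces: n(n-1)(n-2)."""
    return n * (n - 1) * (n - 2)

print(len(list(naive_transitivity(4))))
print(naive_clause_count(1000))
```

For n = 1000 the count is 997,002,000, which is why the blanket has to be pruned using orderings that are already known.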
77
Obtaining SAT-generator from HOL
Initial Spec:
atomicWBRelease(exec,order) =
  forall (i in exec).(j in exec).(k in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) /\ (i.wrID = k.wrID)
    /\ order(i,j) /\ order(j,k)
    ==> (j.wrID = i.wrID)

Applying Contrapositive:
atomicWBRelease(exec,order) =
  forall (i in exec).(j in exec).(k in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) /\ (i.wrID = k.wrID)
    /\ ~(j.wrID = i.wrID)
    ==> ~(order(i,j) /\ order(j,k))

After Reducing Quantifier Scopes:
atomicWBRelease(exec,order) =
  forall (i in exec). (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB)
    ==> forall (k in exec). (i.wrID = k.wrID)
      ==> forall (j in exec). ~(j.wrID = i.wrID)
        ==> ~(order(i,j) /\ order(j,k))
78
Obtaining SAT-generator from HOL
Transformed Spec:
atomicWBRelease(exec,order) =
  forall (i in exec). (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB)
    ==> forall (k in exec). (i.wrID = k.wrID)
      ==> forall (j in exec). ~(j.wrID = i.wrID)
        ==> ~(order(i,j) /\ order(j,k))

Functional program that generates the constraints (will be automated):
atomicWBRelease(exec) = forall(i, exec, wb(i))
wb(i) = if ~((attr_of i.var = WB) & (i.op = StRel) & (i.wrType = Remote))
        then true
        else forall(k, exec, wb1(i,k))
wb1(i,k) = if ~(i.wrID = k.wrID)
           then true
           else forall(j, exec, wb2(i,k,j))
wb2(i,k,j) = if (j.wrID = i.wrID)
             then true
             else ~(order(i,j) & order(j,k))
forall(i, S, e(i)) = for all i in S, e(i) = foldr( map (fn i -> e(i)) (S), (&), true )
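The nested-conditional shape of this generator carries over to other languages almost unchanged. The Python sketch below is an interpretation, not the MPEC code: it assumes the reading in which the rule forbids a tuple j from another store family from sitting between two copies i and k of the same release store, and it emits each such constraint as a two-literal clause. The names `forall`, `atomic_wb_release`, `attr_of`, and the variable-numbering function `m` are all hypothetical.

```python
def forall(items, body):
    """Conjunction over items: concatenate the clause lists that body produces."""
    clauses = []
    for item in items:
        clauses.extend(body(item))
    return clauses

def atomic_wb_release(execution, attr_of, m):
    """Clauses (-m(i,j), -m(j,k)) stating that j is not between copies i and k."""

    def wb(i):
        if not (attr_of(i["var"]) == "WB" and i["op"] == "StRel"
                and i["wrType"] == "Remote"):
            return []                       # guard fails: the constraint is 'true'
        return forall(execution, lambda k: wb1(i, k))

    def wb1(i, k):
        if i["wrID"] != k["wrID"]:
            return []
        return forall(execution, lambda j: wb2(i, k, j))

    def wb2(i, k, j):
        if j["wrID"] == i["wrID"]:
            return []
        return [(-m(i, j), -m(j, k))]       # not (order(i,j) and order(j,k))

    return forall(execution, wb)

# Tiny made-up execution: one release store (local copy + two remote copies) and one load.
EXEC = [
    {"id": 0, "op": "StRel", "var": "y", "wrID": 0, "wrType": "Local"},
    {"id": 1, "op": "StRel", "var": "y", "wrID": 0, "wrType": "Remote"},
    {"id": 2, "op": "StRel", "var": "y", "wrID": 0, "wrType": "Remote"},
    {"id": 3, "op": "Ld", "var": "y", "wrID": -1, "wrType": "DontCare"},
]
N = len(EXEC)
clauses = atomic_wb_release(EXEC, lambda v: "WB",
                            lambda a, b: a["id"] * N + b["id"] + 1)
print(clauses)
```

Because the guards are evaluated on concrete tuples while generating, only the triples that actually need a constraint produce clauses, which is the point of reducing the quantifier scopes first.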
79
Clause annotations for the unsat core for example
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 10 op3 -1 op4 -1 rule ReadValue
op1 -1 op2 -1 op3 -1 op4 -1 rule NoRule
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 4 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 -1 op2 -1 op3 -1 op4 -1 rule NoRule
op1 10 op2 12 op3 -1 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 -1 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 10 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 9 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 1 op2 -1 op3 -1 op4 -1 rule Reflexive
op1 4 op2 5 op3 6 op4 -1 rule TransitiveOrder
op1 4 op2 5 op3 -1 op4 -1 rule ProgramOrder
op1 4 op2 6 op3 8 op4 -1 rule TransitiveOrder
op1 4 op2 11 op3 12 op4 -1 rule TransitiveOrder
op1 5 op2 6 op3 -1 op4 -1 rule ProgramOrder
op1 6 op2 8 op3 -1 op4 -1 rule TotalOrder
op1 10 op2 11 op3 -1 op4 -1 rule TotalOrder
op1 11 op2 4 op3 8 op4 -1 rule TransitiveOrder
op1 11 op2 4 op3 -1 op4 -1 rule TotalOrder
op1 11 op2 12 op3 -1 op4 -1 rule ProgramOrder
op1 -1 op2 -1 op3 -1 op4 -1 rule NoRule
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 8 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 -1 op2 -1 op3 -1 op4 -1 rule NoRule
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 10 op3 -1 op4 -1 rule ReadValue
80
Building an Error-trail for UNSAT (infeasible
executions)
[Figure: the litmus test with each op numbered. A store has both local and remote copies, hence four op numbers per store.]
  ops 1, 2, 3, 4: st x 1
  op 5: mf
  op 6: ld r1 y <0>
  ops 7, 8, 9, 10: st.rel y 1
  op 11: ld.acq r2 y <1>
  op 12: ld r3 x <0>
81
Building an Error-trail
[Figure: ops 1-12 numbered as on slide 80 (st x 1 = ops 1-4, mf = 5, ld r1 y <0> = 6, st.rel y 1 = ops 7-10, ld.acq r2 y <1> = 11, ld r3 x <0> = 12). Highlighted clause annotation:]
op1 4 op2 5 op3 -1 op4 -1 rule ProgramOrder
82
Building an Error-trail
[Figure: ops 1-12 numbered as on slide 80. Highlighted clause annotation:]
op1 5 op2 6 op3 -1 op4 -1 rule ProgramOrder
83
Building an Error-trail
[Figure: ops 1-12 numbered as on slide 80. Highlighted clause annotations:]
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 6 op2 8 op3 -1 op4 -1 rule ReadValue
op1 6 op2 -1 op3 -1 op4 -1 rule ReadValue
84
Building an Error-trail
[Figure: ops 1-12 numbered as on slide 80. Highlighted clause annotations:]
op1 10 op2 12 op3 -1 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 -1 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 10 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 9 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
85
Building an Error-trail
1 2 3 4
st x 1
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 10 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 11 op2 10 op3 -1 op4 -1 rule ReadValue
5
mf
6
ld r1 y <0>
7 8 9 10
st.rel y 1
ld.acq r2 y <1>
11
12
ld r3 x <0>
86
Building an Error-trail
1 2 3 4
st x 1
5
mf
op1 11 op2 12 op3 -1 op4 -1 rule ProgramOrder
6
ld r1 y <0>
7 8 9 10
st.rel y 1
ld.acq r2 y <1>
11
12
ld r3 x <0>
87
Building an Error-trail
1 2 3 4
st x 1
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
op1 12 op2 4 op3 -1 op4 -1 rule ReadValue
op1 12 op2 -1 op3 -1 op4 -1 rule ReadValue
5
mf
6
ld r1 y <0>
7 8 9 10
st.rel y 1
ld.acq r2 y <1>
11
12
ld r3 x <0>
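The rule records shown on the error-trail slides all have the shape "op1 .. op2 .. op3 .. op4 .. rule Name". As a rough illustration only (this is not the MPEC tool's own code), the Python sketch below parses such text into records. The names RuleInstance, parse_trail and ops_involved are hypothetical, and reading -1 as "operand slot unused" is an assumption based on how the slides use it.

```python
import re
from collections import Counter
from typing import List, NamedTuple, Tuple


class RuleInstance(NamedTuple):
    ops: Tuple[int, int, int, int]  # op1..op4; -1 is assumed to mean "unused slot"
    rule: str                       # rule name, e.g. "ReadValue"


# One record: "op1 6 op2 8 op3 -1 op4 -1 rule ReadValue".
# \s+ also matches line breaks, so records wrapped across lines still parse.
RECORD = re.compile(
    r"op1\s+(-?\d+)\s+op2\s+(-?\d+)\s+op3\s+(-?\d+)\s+op4\s+(-?\d+)\s+rule\s+(\w+)"
)


def parse_trail(text: str) -> List[RuleInstance]:
    """Extract every rule record found in the text, in order of appearance."""
    return [
        RuleInstance(tuple(int(g) for g in m.groups()[:4]), m.group(5))
        for m in RECORD.finditer(text)
    ]


def ops_involved(inst: RuleInstance) -> List[int]:
    """Operand numbers actually used by a record (drops the -1 slots)."""
    return [op for op in inst.ops if op != -1]


# Records copied from the slides; the second one is deliberately line-wrapped.
trail = parse_trail("""
op1 4 op2 5 op3 -1 op4 -1 rule ProgramOrder
op1 5 op2 6 op3 -1 op4 -1 rule
ProgramOrder op1 6 op2 8 op3 -1 op4 -1 rule ReadValue
op1 10 op2 11 op3 8 op4 -1 rule AtomicWBRelease
""")
by_rule = Counter(inst.rule for inst in trail)
```

Grouping the records by rule name (by_rule) is one simple way to see which constraints take part in an UNSAT explanation.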
88
MPEC (MP Execution Checker) Tool Demo
  • HOL Rules For Itanium, in a HOL Theory File
  • Ganesh sitting down and coding
  • An MPECcable OCaml Program
  • Gentuple, Assembler, SAT Converter, Zchaff-II or other
  • SAT Result: SAT (Gives Interleaving) or UNSAT
  • Zcore CORE Extractor, Explainer And DOT file Generator (Explain Error), GhostView
  • Printout of Cycle Revealing Error
89
Other Tools Developed in UV Group
  • Yue (Jason) Yang's Dissertation webpage
  • Itanium Litmus-test Checker in Constraint Prolog
  • NemosFinder: Easily Parameterizable Litmus-Checker Suite in Constraint Prolog
  • UMM Tool: Easily Parameterizable Murphi Operational Model for writing Operational Specs of Memory Models
  • DefectFinder: Demo Prototype of Memory-model Aware Race Analyzer
  • Now at MSR (www.cs.utah.edu/yyang/) -- now jasony_at_microsoft.com

90
Part 3: What's not apparent at first glance
91
Topics
  • Formal verification approaches to memory consistency compliance
  • How to model the interface of the shared memory?
  • - Execution based
  • - IO mappings based
  • What is wrong if an Execution based approach is chosen?
  • Finite-state realizability
  • A transducer-based model of shared memory
  • - Highlights of results
  • Whither undecidability?

92
Formal Verification Approaches
Agreement
Imp of Shared Memory Consistency Model (a protocol)
Spec of Shared Memory Consistency Model
  • Several paper-and-pencil proofs
  • Arons (PVS-based)
  • McMillan (CTL model-checking based)
  • Nalumasu et al. (Test Automata based)
  • Qadeer (1. Finding a serializer. 2. Automated for simple write order)
  • Bingham et al. (Window observer based)

93
Other Formal Approaches
  • Park, Dill, Nowatzyk
  • Pong and Dubois (several papers)
  • Collier's work
  • Ghughal's adaptation of above for weak memory models
  • Chatterjee (CAV'02)
  • Yu, Tuttle, Lamport
  • Shen, Arvind
  • Ahamad, Neiger
  • (Check webpage of MPV'00: www.cs.utah.edu/mpv)
  • Steinke and Nutt
  • Gibbons, Gharachorloo
  • Adve, Pugh
  • (a survey will take too long)

94
Modeling the Interface of Shared Memory
Spec
Imp
  • Trace Based
  • - Most existing works
  • IO Mappings Based
  • - The original Lazy-caching paper (casual use)
  • - Kawash and Higham (defines Specs this way; Implementations not addressed)
  • - Sezgin et al. (defines Specs, Imps, and their Correspondence)

Read(proc, addr, data), Write(proc, addr, data),
Read_i(proc, addr), Write_i(proc, addr, data),
Spec
Imp
Read_o(proc, addr, data), Write_o(proc, addr, data),

95
What's wrong with trace-based approaches?
  • Permits making statements about uninteresting or unrealizable machines
  • Muddies the exact import of the famous undecidability result (Alur et al.)

96
Example 1: Finiteness cannot be adequately described through regular sets of executions alone
Consider the set of executions w(1,a,2) r(1,a,1)* r(2,a,2)* w(2,a,1) -- this defines the TEMPORAL order of events.
All these are considered SC because we can build a LOGICAL order w(1,a,2) r(2,a,2)* w(2,a,1) r(1,a,1)*.
But how can the above TEMPORAL order be generated by a FSM?
P1        P2
---       ---
w(a,2)    r(a,2)
r(a,1)    r(a,2)
r(a,1)    r(a,2)
r(a,1)    w(a,1)
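What it means for a LOGICAL order to justify a TEMPORAL order can be sketched as a small witness check in Python. This is an illustrative sketch, not code from the talk: the function name is_sc_witness and the tuple encoding (kind, proc, addr, value) are assumptions, and the sketch assumes memory has no initial value, so every read must follow some write to the same address.

```python
from collections import Counter

# An event is (kind, proc, addr, value); ("w", 1, "a", 2) stands for w(1,a,2).


def is_sc_witness(temporal, logical):
    """Check that `logical` is a valid sequentially consistent explanation
    of the observed `temporal` order."""
    # 1. The logical order must contain exactly the same events.
    if Counter(temporal) != Counter(logical):
        return False
    # 2. Each processor's own events must keep their program order.
    for p in {e[1] for e in temporal}:
        if [e for e in temporal if e[1] == p] != [e for e in logical if e[1] == p]:
            return False
    # 3. Replayed on a serial memory, every read returns the latest write.
    mem = {}
    for kind, _proc, addr, value in logical:
        if kind == "w":
            mem[addr] = value
        elif mem.get(addr) != value:
            return False
    return True


# One unravelling of the slide's example, with two reads per processor.
temporal = [("w", 1, "a", 2), ("r", 1, "a", 1), ("r", 1, "a", 1),
            ("r", 2, "a", 2), ("r", 2, "a", 2), ("w", 2, "a", 1)]
logical = [("w", 1, "a", 2), ("r", 2, "a", 2), ("r", 2, "a", 2),
           ("w", 2, "a", 1), ("r", 1, "a", 1), ("r", 1, "a", 1)]

ok = is_sc_witness(temporal, logical)          # True
not_serial = is_sc_witness(temporal, temporal)  # False: r(1,a,1) sees a == 2
```

The temporal order itself is not serial, yet a logical order exists. The check above says nothing about whether a finite-state machine could actually have produced that temporal order.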
97
Example 1 continued (take a specific unravelling of *)
Temporal Order: w(1,a,2) r(1,a,1)^{2N} r(2,a,2)^{2N} w(2,a,1)
Logical Order: w(1,a,2) r(2,a,2)^{2N} w(2,a,1) r(1,a,1)^{2N}
Program fed so far
Output generated so far
w(1,a,2)
A FSM Implementation of Seq Consistency with N Internal States
w(1,a,2)
w(1,a,2)
A FSM Implementation of Seq Consistency with N Internal States
w(1,a,2) r(1,a)^K, r(2,a)^L
w(1,a,2) r(1,a,1)
A FSM Implementation of Seq Consistency with N Internal States
w(1,a,2) r(1,a)^K, r(2,a)^L, NO w(2,a,1)
FAIL! O/P w/o Input!!
98
Example 1 continued (take a specific unravelling of *)
Temporal Order: w(1,a,2) r(1,a,1)^{2N} r(2,a,2)^{2N} w(2,a,1)
Logical Order: w(1,a,2) r(2,a,2)^{2N} w(2,a,1) r(1,a,1)^{2N}
Program fed so far
Output generated so far
w_o(1,a,2)
A FSM Implementation of Seq Consistency with N Internal States
w_i(1,a,2)
w_o(1,a,2)
A FSM Implementation of Seq Consistency with N Internal States
w_i(1,a,2) r_i(1,a)^K, r_i(2,a)^L
w_o(1,a,2)
A FSM Implementation of Seq Consistency with N Internal States
w_i(1,a,2) r_i(1,a)^K, r_i(2,a)^L w_i(2,a,1)
FAIL! Too many inputs w/o output
99
Example 1 continued (take a specific unravelling of *)
Temporal Order: w(1,a,2) r(1,a,1)^{2N} r(2,a,2)^{2N} w(2,a,1)
Logical Order: w(1,a,2) r(2,a,2)^{2N} w(2,a,1) r(1,a,1)^{2N}
w_o(1,a,2)
A FSM Implementation of Seq Consistency with N Internal States
w_i(1,a,2) r_i(1,a)^K, r_i(2,a)^L w_i(2,a,1)
FAIL! Too many inputs w/o output
w_i(1,a,2)
w_i(1,a,2)
Labeled by r_i(1,a)^K, r_i(2,a)^L
We can pump this loop, thus making it possible to generate the SAME execution for arbitrarily long programs!!
100
Restrictions in contemporary work that enable SC verification
  • Bingham, Condon, Hu
  • - Require Prefix Closure (no outputs w/o input)
  • e.g. the trace of length 1: r(1,a,1)
  • - Rule out Prophetic Inheritance

i.e. Temporal Orders of the form w(1,a,2) r(1,a,1)^{2N} r(2,a,2)^{2N} w(2,a,1)
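The "no outputs w/o input" idea can be illustrated on an IO-style trace that separates input events (Read_i, Write_i) from output events (Read_o, Write_o). The Python sketch below is only an assumption-laden illustration, not the definition used by Bingham, Condon and Hu: the tags "ri", "wi", "ro", "wo", the function name max_pending_inputs, and the rule that each processor's requests are answered in order are all choices made here for the example.

```python
from collections import deque

# Events: ("ri", proc, addr), ("wi", proc, addr, data) are inputs;
#         ("ro", proc, addr, data), ("wo", proc, addr, data) are outputs.


def max_pending_inputs(trace):
    """Return the largest number of inputs ever waiting for an output,
    or None if some output appears without a matching earlier input."""
    pending = {}  # proc -> queue of (kind, addr) requests not yet answered
    worst = 0
    for event in trace:
        tag, proc, addr = event[0], event[1], event[2]
        queue = pending.setdefault(proc, deque())
        if tag in ("ri", "wi"):
            queue.append((tag[0], addr))
        else:
            # Any other tag is treated as an output ("ro" or "wo"): it must
            # answer this processor's oldest pending request of the same kind.
            if not queue or queue[0] != (tag[0], addr):
                return None
            queue.popleft()
        worst = max(worst, sum(len(q) for q in pending.values()))
    return worst


# The length-1 trace r(1,a,1): an output with no input at all.
orphan = max_pending_inputs([("ro", 1, "a", 1)])

# One write answered, then three reads left waiting.
backlog = max_pending_inputs([
    ("wi", 1, "a", 2), ("wo", 1, "a", 2),
    ("ri", 1, "a"), ("ri", 1, "a"), ("ri", 2, "a"),
])
```

Here orphan is None and backlog is 3. A machine with N internal states cannot let such a backlog grow without bound, which is the intuition behind the "too many inputs w/o output" failure.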
101
Restrictions in contemporary work that enable SC verification
  • Qadeer
  • - Requires Simple Write Ordering
  • - The order of the writes to the same address in the temporal order and the logical order must be the same
  • - (But they provide an automated model-checking based verification method for this class of SC protocols)

Temporal Order: w(1,a,1) w(2,a,2) r(3,a,2) r(4,a,1)
Required Logical Order: w(2,a,2) r(3,a,2) w(1,a,1) r(4,a,1)
<diagram of Lazy Caching here>
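The simple write ordering condition can be checked mechanically on a pair of orders. The following Python sketch is an illustration of the condition as stated on the slide, not Qadeer's verification method. The helper names writes_by_address and has_simple_write_order, and the (kind, proc, addr, value) tuple encoding, are assumptions.

```python
# An event is (kind, proc, addr, value); ("w", 1, "a", 1) stands for w(1,a,1).


def writes_by_address(order):
    """Map each address to the sequence of writes to it, in the given order."""
    per_addr = {}
    for kind, proc, addr, value in order:
        if kind == "w":
            per_addr.setdefault(addr, []).append((proc, value))
    return per_addr


def has_simple_write_order(temporal, logical):
    """True when writes to each address occur in the same order in both."""
    return writes_by_address(temporal) == writes_by_address(logical)


# The example from the slide.
temporal = [("w", 1, "a", 1), ("w", 2, "a", 2), ("r", 3, "a", 2), ("r", 4, "a", 1)]
logical = [("w", 2, "a", 2), ("r", 3, "a", 2), ("w", 1, "a", 1), ("r", 4, "a", 1)]

swo = has_simple_write_order(temporal, logical)  # False
```

The slide's example fails the check because the two writes to a are swapped in the logical order. A protocol that can produce this temporal order therefore falls outside the class covered by simple write ordering.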
102
Taxonomy of formal SC modeling approaches
  • Alur et al.
  • - Not Necessarily Prefix Closed (NNPC) regular traces model the SC language
  • - Checking containment of the (regular) language of the Implementation is undecidable
  • Bingham, Condon, and Hu
  • - DSC trace set (Decisive Sequential Consistency)
  • Sezgin's work
  • - Models memory systems using regular transducers
  • - Defines EXACTLY what finite-state realizable SC systems are
  • - SC verification is language containment
  • - Provides a semi-decision procedure for SC verification in this setting

103
Example 2 (Sezgin): The dangers of trace-based modeling
  • Imagine a memory system implementation that does this:
  • - Accept reads and writes
  • - If the first |P|·|A| instructions are writes, and further these contain exactly one write by each processor to each address
  • - THEN go into malevolent mode (disconnect the shared memory)
  • - ELSE go into benevolent mode (behave like serial memory)

[Diagram: processors P1, P2, ..., Pn. Benevolent Mode Connections lead to a Single Serial Memory Unit M; Malevolent Mode Connections lead to separate memories M1, M2, ..., Mn.]
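The two-mode memory described above can be simulated in a few lines. The Python sketch below is a minimal model under stated assumptions, not Sezgin's formal construction: the class name TwoModeMemory, the helper run, the initial value 0, and the choice to keep a private copy per processor from the start are all made up for illustration.

```python
class TwoModeMemory:
    """Serial memory (benevolent) unless the first |P|*|A| instructions are
    writes with exactly one write per (processor, address) pair, in which
    case each processor only ever sees its own private memory (malevolent)."""

    def __init__(self, procs, addrs, initial=0):
        self.prefix_len = len(procs) * len(addrs)
        self.shared = {a: initial for a in addrs}
        self.private = {p: {a: initial for a in addrs} for p in procs}
        self.mode = "undecided"
        self.written = set()  # (proc, addr) pairs written during the prefix
        self.count = 0        # instructions seen while still undecided

    def _observe(self, is_write, proc, addr):
        if self.mode != "undecided":
            return
        self.count += 1
        if not is_write or (proc, addr) in self.written:
            self.mode = "benevolent"
            return
        self.written.add((proc, addr))
        if self.count == self.prefix_len:
            self.mode = "malevolent"

    def write(self, proc, addr, value):
        self._observe(True, proc, addr)
        # Update both copies; the mode decides which one reads consult.
        self.shared[addr] = value
        self.private[proc][addr] = value

    def read(self, proc, addr):
        self._observe(False, proc, addr)
        if self.mode == "malevolent":
            return self.private[proc][addr]
        return self.shared[addr]


def run(memory, program):
    """Feed ("w", proc, addr, value) / ("r", proc, addr) instructions to the
    memory and collect the values returned by the reads."""
    results = []
    for instr in program:
        if instr[0] == "w":
            memory.write(*instr[1:])
        else:
            results.append(memory.read(*instr[1:]))
    return results


benevolent = TwoModeMemory([1, 2, 3], ["a"])
good = run(benevolent, [("w", 1, "a", 2), ("r", 3, "a"),
                        ("w", 2, "a", 1), ("r", 1, "a")])

malevolent = TwoModeMemory([1, 2, 3], ["a"])
bad = run(malevolent, [("w", 1, "a", 1), ("w", 3, "a", 2), ("w", 2, "a", 0),
                       ("r", 1, "a"), ("r", 2, "a"), ("r", 3, "a"),
                       ("w", 1, "a", 2), ("w", 2, "a", 1),
                       ("r", 1, "a"), ("r", 2, "a"), ("r", 3, "a")])
```

With P = {1,2,3} and A = {a}, the first run returns [2, 1] from the shared memory, while the second returns [1, 0, 2, 2, 1, 2]: every processor reads back only its own writes.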
104
Example 2 (Sezgin)
Example: P = {1,2,3}, A = {a}, and D = {0,1,2}
Benevolent Mode from now on, since the second instruction is a read:
w(1,a,2) r(3,a,2) w(2,a,1) r(1,a,1)
Malevolent Mode from now on, as we have |P|·|A| writes:
w(1,a,1) w(3,a,2) w(2,a,0) r(1,a,1) r(2,a,0) r(3,a,2) w(1,a,2) w(2,a,1) r(1,a,2) r(2,a,1) r(3,a,2)
LOGICAL ORDER
w(1,a,1) r(1,a,1) w(1,a,2) r(1,a,2) w(2,a,0) r(2,a,0) w(2,a,1) r(2,a,1) w(3,a,2) r(3,a,2) r(3,a,2)
105
Whoa? Any Logical Order will do?!
TEMPORAL ORDER
w(1,a,1) w(3,a,2) w(2,a,0) r(1,a,1) r(2,a,0) r(3,a,2) w(1,a,2) w(2,a,1) r(1,a,2) r(2,a,1) r(3,a,2)
LOGICAL ORDER
w(1,a,1) r(1,a,1) w(1,a,2) r(1,a,2) w(2,a,0) r(2,a,0) w(2,a,1) r(2,a,1) w(3,a,2) r(3,a,2) r(3,a,2)
  • A Logical Order had better not be fiction: it should be a possible schedule in a "could have happened" sense
  • Viewed from that angle, the above logical order is nonsense because it allows certain actions to be postponed unboundedly
  • Sezgin's formal definition of Implementations builds in boundedness
  • BCH address an instance of this in their past-time SC idea
  • Sezgin's SC machines give the logical order out as the Commit Order
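That a purely trace-based check accepts the malevolent-mode trace can be reproduced with a brute-force search for a logical order. The Python sketch below is illustrative only: find_logical_order and the (kind, proc, addr, value) encoding are assumptions, memory is assumed to start with no value, and the search is exponential in the worst case, so it suits only tiny traces like this one.

```python
# An event is (kind, proc, addr, value); ("w", 1, "a", 1) stands for w(1,a,1).


def find_logical_order(temporal):
    """Search for an interleaving of the per-processor event sequences in
    which every read returns the latest write. Returns None if none exists."""
    procs = sorted({e[1] for e in temporal})
    programs = {p: [e for e in temporal if e[1] == p] for p in procs}

    def search(pos, mem):
        if all(pos[p] == len(programs[p]) for p in procs):
            return []
        for p in procs:
            if pos[p] == len(programs[p]):
                continue
            event = programs[p][pos[p]]
            kind, _proc, addr, value = event
            if kind == "r" and mem.get(addr) != value:
                continue  # this read cannot be scheduled next
            next_mem = {**mem, addr: value} if kind == "w" else mem
            rest = search({**pos, p: pos[p] + 1}, next_mem)
            if rest is not None:
                return [event] + rest
        return None

    return search({p: 0 for p in procs}, {})


# The malevolent-mode temporal order from the slide.
temporal = [("w", 1, "a", 1), ("w", 3, "a", 2), ("w", 2, "a", 0),
            ("r", 1, "a", 1), ("r", 2, "a", 0), ("r", 3, "a", 2),
            ("w", 1, "a", 2), ("w", 2, "a", 1),
            ("r", 1, "a", 2), ("r", 2, "a", 1), ("r", 3, "a", 2)]

witness = find_logical_order(temporal)
```

On this trace the search finds the same logical order as the slide: all of processor 1's events, then all of processor 2's, then all of processor 3's. Such an order exists for any run in malevolent mode, which is why each processor's events can be postponed unboundedly.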

106
Status of SC undecidability
  • Alur et al.: UNDECIDABLE under NNPC. NNPC is unrealistic.
  • Qadeer: Decidable under simple write order. Simple Write Order rules out some protocols.
  • Bingham, Condon, and Hu: Decidable under simple write order, also in DSC_k. These don't capture exactly those that are FS realizable.
  • Sezgin's work: Decidability open. Captures exactly the class of FS realizable protocols in a detailed manner (Input or programs explicitly modeled).
107
Concluding Remarks
  • Importance of topic unlikely to diminish
  • Platform compliance is a big deal
  • High-performance OS kernel writers need to know
  • Think of proving a distributed Garbage Collector running on a Weak Memory Model (would be a great PhD topic)
  • I've omitted too many important names I can't even remember
  • Partial list: Adve, Gharachorloo, Pugh, Arvind, Collier,

108
Acknowledgements (sorry for omissions)
  • Past students / postdoc: Nalumasu, Ghughal, Mokkedem, Hosabettu, Jones, Sivaraj, Yang, Yang, Kuramkote
  • Faculty colleagues: Lindstrom, Slind, Carter
  • Funding agencies: NSF, SRC
  • Industrial Liaisons: Corella, Chou, German, Vaid, Neiger, Zeisset, Park
  • Other favorable influences: Mathews, Tuttle, Yu, Joshi, Dill, Pong, Nowatzyk, Lamport, Hu, Condon, Higham, Kawash, Jackson
  • Who am I forgetting?