A Behavioral Memory Model for the UPC Language

Provided by: kathyy

Learn more at: https://people.eecs.berkeley.edu

1
A Behavioral Memory Model for the UPC Language

Kathy Yelick
Joint work with Dan Bonachea, Jason Duell,
Chuck Wallace

2
Proposal for UPC Spec

Replace wording in body of spec with prose that
Defines a data race as
Two concurrent memory operations from two
different threads to the same memory location in
which at least one is a write.
Defines a race-free program as one in which
All executions of the program are free of data
races (would be nice if the user could only worry
about naïve implementations)
And states that programs will behave as if all
operations from each thread execute in order if
one of the following holds
The program is race-free
The program contains no relaxed operations
Refers readers to an appendix for programs with
races
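The data-race definition on this slide can be sketched as a small predicate. This is only an illustration: the `MemOp` record and the caller-supplied `concurrent` predicate are hypothetical names, and the slide does not say how concurrency is decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """Hypothetical record for one memory operation."""
    thread: int
    location: str
    is_write: bool


def conflicts(a, b):
    """Two different threads, same location, at least one write."""
    return (a.thread != b.thread
            and a.location == b.location
            and (a.is_write or b.is_write))


def is_data_race(a, b, concurrent):
    """`concurrent(a, b)` is an assumed predicate: True when nothing
    orders the two operations with respect to each other."""
    return conflicts(a, b) and concurrent(a, b)
```

Under this sketch, a program is race-free when `is_data_race` is false for every pair of operations in every execution.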

3
Formalism

The appendix (or a later section) of the language
reference would contain something akin to the
following formalism
This can be done in a page or two
In addition, it would refer to an extended report
on
An operational (state machine) model of
semantics
A study of various optimization techniques and
whether or not they are correct
Caching (when to flush, problems with not
flushing)
Reordering by the compiler (should be allowed on
relaxed operations as long as there are no
dependencies)
Use of non-blocking operations or weak hw models
(fences)

4
Behavioral Approach

Problems with operational specifications
Implicit assumptions about implementation
strategy (e.g., caches)
May unnecessarily restrict implementations
Intuitive in principle, but complicated in
practice
A Behavioral Approach
Based on partial and total orders
Using Sequential Consistency definition as model
Processor order defines a total order on each
thread
Their union defines a partial order
$\exists$ a consistent total order that is correct as a
serial execution

5
Some Basic Notation

The set of operations is
$O = \{o_0, o_1, \ldots\}$
$O_t$ = the set of operations issued by thread $t$
The set of memory operations is
$M = \{m_0, m_1, \ldots\}$
$M_t$ = the set of memory operations from thread $t$
Each memory operation has properties
Thread($m_i$) is the thread that executed the
operation
Location($m_i$) is the memory location involved
Memory operations are partitioned into 6 sets,
given by
S = Strict, R = Relaxed, P = Private
W = Write, R = Read (in the 2nd position)
Some useful groups:
Strict($M$) = SW($M$) $\cup$ SR($M$)
W($M$) = SW($M$) $\cup$ RW($M$) $\cup$ PW($M$)
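A minimal sketch of the six-way partition and the two derived groups. It assumes each operation is tagged with a mode letter (S, R, or P) and a kind letter (W or R); the tuple layout and function names are illustrative only.

```python
from collections import defaultdict


def partition(ops):
    """ops: iterable of (name, mode, kind) with mode in 'SRP' and
    kind in 'WR'. Returns a mapping such as 'SW' -> set of names."""
    sets = defaultdict(set)
    for name, mode, kind in ops:
        sets[mode + kind].add(name)
    return sets


def strict(sets):
    """Strict(M): the union of SW(M) and SR(M)."""
    return sets["SW"] | sets["SR"]


def writes(sets):
    """W(M): the union of SW(M), RW(M), and PW(M)."""
    return sets["SW"] | sets["RW"] | sets["PW"]
```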

6
Compiler Assumption

For specification purposes, assume the code is
compiled by a naïve compiler to the ISO C abstract machine
Real compilers may do optimizations
E.g., reorder, remove, insert memory operations
Even strict operations may be reordered with
sufficient analysis (cycle detection)
These must produce an execution whose
input/output and volatile behavior is identical
to that of an unoptimized program (ISO C)

7
Orderings on Strict Operations

Threads must agree on an ordering of
For pairs of strict accesses, it will be total
For a strict/relaxed pair on the same thread,
they will all see the program order

8
Orderings on Local Operations

Conflicting accesses have the usual definition
Given a serial execution $S = [o_1, \ldots, o_n]$ defining $<_S$,
let $S_t$ be the subsequence of operations issued by
$t$
$S$ conforms to program order for thread $t$ iff
$S_t$ is consistent with the program text for $t$
(follows control flow)
$S$ conforms to program dependence order for $t$ iff
$\exists$ a permutation $P(S)$ such that
$P(S)$ conforms to program order for $t$
$\forall (m_1, m_2) \in \mathrm{Conflicting}(M)$: $m_1 <_S m_2 \Leftrightarrow m_1 <_{P(S)} m_2$
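The two conformance checks can be tried out on tiny examples with a brute-force sketch. It assumes straight-line code (no control flow), represents an operation as a hypothetical tuple `(thread, index_in_program, location, is_write)`, and takes "conflicting" to mean same location with at least one write. The permutation search is exponential and only meant for a handful of operations.

```python
from itertools import permutations


def conforms_to_program_order(S, t):
    """Thread t's operations appear in S in program-text order."""
    indices = [op[1] for op in S if op[0] == t]
    return indices == sorted(indices)


def conflicting(a, b):
    """Assumed 'usual definition': same location, at least one write."""
    return a != b and a[2] == b[2] and (a[3] or b[3])


def conforms_to_dependence_order(S, t):
    """Search for a permutation P of S that conforms to program order
    for t and orders every conflicting pair the same way S does."""
    pos = {op: i for i, op in enumerate(S)}
    for P in permutations(S):
        if not conforms_to_program_order(P, t):
            continue
        ppos = {op: i for i, op in enumerate(P)}
        if all((pos[a] < pos[b]) == (ppos[a] < ppos[b])
               for a in S for b in S if conflicting(a, b)):
            return True
    return False
```

For instance, swapping two writes to different locations breaks program order but still conforms to program dependence order, while moving a read of `x` ahead of the same thread's earlier write to `x` does not.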

9
UPC Consistency

An execution on T threads with memory ops M is
UPC consistent iff
$\exists$ a partial order $<_{strict}$ that orients all pairs in
allStrict($M$)
And for each thread $t$, $\exists$ a total order $<_t$ on
$O_t \cup W(M) \cup SR(M)$ such that
$<_t$ is consistent with $<_{strict}$
All threads agree on ordering of strict
operations
$<_t$ conforms to program dependence order
Local dependencies are observed
$<_t$ is a correct execution
Reads return most recent write values
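The last condition, that each thread's total order is a correct execution, can be sketched as a replay of the order against a simple memory. The `(kind, location, value)` tuple layout and the default initial value of 0 are assumptions made for illustration.

```python
def is_correct_execution(order, initial=0):
    """order: list of (kind, location, value) with kind 'W' or 'R'.
    Every read must return the most recent earlier write to its
    location, or the initial value if there is none."""
    memory = {}
    for kind, loc, value in order:
        if kind == "W":
            memory[loc] = value
        elif memory.get(loc, initial) != value:
            return False
    return True
```

Because each thread supplies its own order, two threads may each pass this check with orders that disagree on how two relaxed writes are arranged.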

10
Intuition on Strict Orderings

Each thread may build its own total order to
explain behavior
They all agree on the strict ordering shown above
in black, but
Different threads may see relaxed writes in
different orders
Allows non-blocking writes to be used in
implementations
Each thread sees own dependencies, but not those
of other threads
Weak, but otherwise this would place consistency
requirements on some relaxed operations (e.g.,
local cache control would be insufficient)
Preserving dependencies requires usual
compiler/hw analysis

11
Synchronization Operations

UPC has both global and pairwise synchronization
In addition to the synchronization properties,
they also have memory model implications
Locks
upc_lock is a strict read
upc_unlock is a strict write
Barriers (which may be split-phase)
upc_notify (begin barrier) is a strict write
upc_wait (end of barrier) is a strict read
upc_barrier = upc_notify; upc_wait
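The memory-model view of the synchronization operations can be written down as a lookup table. This sketch assumes `upc_barrier` is `upc_notify` followed by `upc_wait`, and the table name and `expand` helper are illustrative, not part of UPC.

```python
# Each synchronization operation, viewed as strict accesses:
# 'SR' = strict read, 'SW' = strict write.
SYNC_AS_MEMORY_OPS = {
    "upc_lock": ["SR"],
    "upc_unlock": ["SW"],
    "upc_notify": ["SW"],
    "upc_wait": ["SR"],
}
# Assumed equivalence: upc_barrier = upc_notify; upc_wait.
SYNC_AS_MEMORY_OPS["upc_barrier"] = (
    SYNC_AS_MEMORY_OPS["upc_notify"] + SYNC_AS_MEMORY_OPS["upc_wait"]
)


def expand(trace):
    """Replace synchronization operations in a trace with their
    strict-access classes; other entries pass through unchanged."""
    return [cls for op in trace for cls in SYNC_AS_MEMORY_OPS.get(op, [op])]
```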

12
Alternative Models

As specified, two relaxed writes to the same
location may be viewed differently by different
processors
Nothing to force eventual consistency (likely in
implementations)
May add this to barrier points, at least
So far it looks ad hoc
Adding directionality to reads/writes seems
reasonable
Strict reads fence things that follow
Strict writes fence things that precede
Simply replace the StrictOnThreads definition
Support user-defined synchronization primitives
built from strict operations

13
Some Bizarre Behavior

The following "out of thin air" behavior:
Given shared variables x and y, both
initially 0
t0: r1 = x; y = r1
t1: r2 = y; x = r2
x and y end with 42 (or any other arbitrary
value)
How does this happen?
t0 speculates that x is 42 and writes that value
to y
t1 sees 42 in y and writes it into x
This validates t0's speculative read
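For contrast, a small sketch that enumerates every sequentially consistent interleaving of the two threads, reading them as t0: `r1 = x; y = r1` and t1: `r2 = y; x = r2`. Each statement is encoded as a hypothetical `(destination, source)` copy. It suggests that 42 cannot appear unless some read is speculated.

```python
def interleavings(a, b):
    """All merges of sequences a and b that keep each one's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute (destination, source) copies; return final (x, y)."""
    state = {"x": 0, "y": 0, "r1": 0, "r2": 0}
    for dst, src in schedule:
        state[dst] = state[src]
    return state["x"], state["y"]


T0 = [("r1", "x"), ("y", "r1")]  # t0: r1 = x; y = r1
T1 = [("r2", "y"), ("x", "r2")]  # t1: r2 = y; x = r2

outcomes = {run(s) for s in interleavings(T0, T1)}
```

All six interleavings leave `x` and `y` at 0, so `outcomes` contains only `(0, 0)`.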

14
Atomicity Issues

Atomicity: Is there a word size (or type) such
that
A write of anything larger is defined as a set of
word-sized operations (so a user might see a
partial update)?
E.g., is writing a struct the same as writing
each field (or some maximum size)?
Tearing: Is there a word size (or type) such that
two writes to the same location can result in a
merged value?
Clobbering: Is there a word size (or type) such
that
If something smaller is written, it might clobber
writes to a neighboring value?
E.g., two processors write to two consecutive
bytes in an array; the processor does a
read-modify-write for each, and one can be lost
Conflicts: On what size are these defined?
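The clobbering example can be simulated with a two-byte "word" that can only be updated by reading and rewriting the whole word. The helper name and the interleaving chosen here are illustrative assumptions.

```python
def write_byte_via_word_rmw(snapshot, index, value):
    """Model a byte store implemented as a word-sized
    read-modify-write: start from a previously read copy of the
    word, patch one byte, and return the whole word to write back."""
    word = list(snapshot)
    word[index] = value
    return word


memory = [0, 0]                 # two consecutive bytes in one word
snap0 = list(memory)            # processor 0 reads the word
snap1 = list(memory)            # processor 1 reads the same word
memory = write_byte_via_word_rmw(snap0, 0, 1)  # p0 writes byte 0
memory = write_byte_via_word_rmw(snap1, 1, 2)  # p1 writes byte 1
```

With this interleaving the final word is `[0, 2]`: processor 1's write-back restores the stale byte 0, so processor 0's update is lost even though the two processors wrote different bytes.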

15
UPC Bulk Operation Semantics

Are upc_memput, upc_memget, upc_memcpy relaxed or
strict?
If relaxed, then the user can get strict behavior
by putting a strict operation (or operations in
the nonsymmetric case) before and after
Will this be surprising to users?
What do current implementations do?

16
UPC Fence Operations

Should UPC have separate functions for
read fence: prevents memory operations from
moving before it
write fence: prevents memory operations from
moving after it
Or let the programmer build these by doing a
strict read/write to some otherwise unused
variable?

17
Future Plans

Show that various implementations satisfy this
spec
Use of non-blocking writes for relaxed writes
with write fence/sync at strict points
Compiler-inserted prefetching of relaxed reads
Compiler-inserted message vectorization to
aggregate a set of small operations into one
larger one
A software caching implementation with cache
flushes at strict points
Develop an operational model and show equivalence
(or at least that it implements the spec)
Define the data unit of atomicity
Fundamental unit of interleaving, Data tearing,
Conflicts

18
Properties of UPC Consistency

A program containing only strict operations is
sequentially consistent
A program that produces only race-free executions
is sequentially consistent
A UPC consistent execution of a program is
race-free if for all threads $t$ and all enabling
orderings $<_t$
For all potential races $(m_1, m_2)$
If $m_1 <_t m_2$ then $\exists$ synchronization operations $o_1$,
$o_2$ such that $m_1 <_t o_1 <_t o_2 <_t m_2$ and Thread($o_1$) =
Thread($m_1$) and Thread($o_2$) = Thread($m_2$) and
either
$o_1$ is upc_notify and $o_2$ is upc_wait or
$o_1$ is upc_unlock and $o_2$ is upc_lock on the same
lock variable
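A sketch of the synchronization condition for a single potential race within one thread's order. It reads the condition as requiring Thread(o1) = Thread(m1) and Thread(o2) = Thread(m2), which is an interpretation of the slide; the `Op` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical operation record; `lock` is set only for lock ops."""
    name: str
    thread: int
    kind: str
    lock: str = None


def separated_by_sync(order, m1, m2):
    """True if some o1, o2 with m1 < o1 < o2 < m2 in `order` satisfy
    the notify/wait or unlock/lock (same lock) pattern, with o1 on
    m1's thread and o2 on m2's thread."""
    i, j = order.index(m1), order.index(m2)
    between = order[i + 1:j]
    for k, o1 in enumerate(between):
        for o2 in between[k + 1:]:
            if o1.thread != m1.thread or o2.thread != m2.thread:
                continue
            if o1.kind == "upc_notify" and o2.kind == "upc_wait":
                return True
            if (o1.kind == "upc_unlock" and o2.kind == "upc_lock"
                    and o1.lock == o2.lock):
                return True
    return False
```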