A Behavioral Memory Model for the UPC Language

1
A Behavioral Memory Model for the UPC Language
  • Kathy Yelick
  • Joint work with Dan Bonachea, Jason Duell, and
    Chuck Wallace

2
Proposal for UPC Spec
  • Replace wording in body of spec with prose that:
  • Defines a data race as two concurrent memory
    operations from two different threads to the same
    memory location, at least one of which is a write
  • Defines a race-free program as one in which all
    executions of the program are free of data races
    (it would be nice if the user only had to worry
    about naïve implementations)
  • States that programs will behave as if all
    operations from each thread execute in order if
    one of the following holds:
  • The program is race-free
  • The program contains no relaxed operations
  • Refers readers to an appendix for programs with
    races
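
The data race definition above can be sketched as a small check. This is only an illustrative Python model, not part of the proposal: the `MemOp` record, the `find_data_races` helper, and the `ordered_pairs` argument (pairs assumed to be already ordered by synchronization, so that every other pair counts as concurrent) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MemOp:
    name: str       # label used when reporting races (hypothetical)
    thread: int     # issuing thread
    location: str   # memory location touched
    is_write: bool  # True for a write, False for a read


def find_data_races(ops, ordered_pairs):
    """Return the pairs of operations that race under the slide's definition.

    ordered_pairs is an assumed input: (name_a, name_b) pairs that
    synchronization already orders. Any pair not listed in either
    direction is treated as concurrent.
    """
    races = []
    for a, b in combinations(ops, 2):
        concurrent = ((a.name, b.name) not in ordered_pairs
                      and (b.name, a.name) not in ordered_pairs)
        if (concurrent
                and a.thread != b.thread
                and a.location == b.location
                and (a.is_write or b.is_write)):
            races.append((a.name, b.name))
    return races


ops = [
    MemOp("w0", thread=0, location="x", is_write=True),
    MemOp("r1", thread=1, location="x", is_write=False),
    MemOp("r2", thread=1, location="y", is_write=False),
]
unsynchronized = find_data_races(ops, ordered_pairs=set())
synchronized = find_data_races(ops, ordered_pairs={("w0", "r1")})
print(unsynchronized)  # [('w0', 'r1')]
print(synchronized)    # []
```

Only the write/read pair on `x` from different threads is reported, and it disappears once the pair is declared ordered.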

3
Formalism
  • The appendix (or a later section) of the language
    reference would contain something akin to the
    following formalism
  • This can be done in a page or two
  • In addition, it would refer to an extended report
    on:
  • An operational (state machine) model of
    semantics
  • A study of various optimization techniques and
    whether or not they are correct:
  • Caching (when to flush, problems with not
    flushing)
  • Reordering by the compiler (should be allowed on
    relaxed operations as long as there are no
    dependencies)
  • Use of non-blocking operations or weak hardware
    models (fences)

4
Behavioral Approach
  • Problems with operational specifications
  • Implicit assumptions about implementation
    strategy (e.g., caches)
  • May unnecessarily restrict implementations
  • Intuitive in principle, but complicated in
    practice
  • A Behavioral Approach
  • Based on partial and total orders
  • Using the sequential consistency definition as a
    model:
  • Processor order defines a total order on each
    thread
  • Their union defines a partial order
  • $\exists$ a consistent total order that is correct
    as a serial execution
  • [Figure: threads P0 and P1]
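
To make the sequential consistency model concrete, here is a brute-force Python sketch. It enumerates every total order that respects each thread's program order and asks whether at least one of them is correct as a serial execution. The operation encoding `(kind, location, value)` and the assumption that memory starts at 0 are choices made for this illustration only.

```python
def interleavings(threads):
    """Yield every total order that preserves each thread's program order."""
    if all(len(t) == 0 for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail


def is_correct_serial(order, initial=0):
    """A serial execution is correct if each read returns the latest write."""
    memory = {}
    for kind, loc, val in order:
        if kind == "W":
            memory[loc] = val
        elif memory.get(loc, initial) != val:  # a read observing `val`
            return False
    return True


def is_sequentially_consistent(threads):
    """True if some program-order-preserving total order explains all reads."""
    return any(is_correct_serial(o) for o in interleavings(threads))


# P0 writes x then reads y; P1 writes y then reads x.
sc_example = [
    [("W", "x", 1), ("R", "y", 0)],
    [("W", "y", 1), ("R", "x", 1)],
]
non_sc_example = [
    [("W", "x", 1), ("R", "y", 0)],
    [("W", "y", 1), ("R", "x", 0)],
]
print(is_sequentially_consistent(sc_example))      # True
print(is_sequentially_consistent(non_sc_example))  # False
```

In the second example both reads return 0, which would require each read to precede the other thread's write; no single total order can satisfy that.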

5
Some Basic Notation
  • The set of operations is $O$
  • $O_t$: the set of operations issued by thread $t$
  • The set of memory operations is
    $M = \{m_0, m_1, \ldots\}$
  • $M_t$: the set of memory operations from thread $t$
  • Each memory operation has properties:
  • $\mathit{Thread}(m_i)$ is the thread that executed
    the operation
  • $\mathit{Location}(m_i)$ is the memory location
    involved
  • Memory operations are partitioned into 6 sets,
    named by two letters:
  • S = Strict, R = Relaxed, P = Private (in the 1st
    position)
  • W = Write, R = Read (in the 2nd position)
  • Some useful groups:
  • $\mathit{Strict}(M) = SW(M) \cup SR(M)$
  • $W(M) = SW(M) \cup RW(M) \cup PW(M)$
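
The partition into six sets and the two derived groups can be written down directly. The following Python sketch is only an illustration of the notation; the `MemOp` fields and the helper names `partition`, `strict`, and `writes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    thread: int    # Thread(m)
    location: str  # Location(m)
    mode: str      # 1st letter: "S" strict, "R" relaxed, "P" private
    access: str    # 2nd letter: "W" write, "R" read


def partition(ops):
    """Split memory operations into the six sets SW, SR, RW, RR, PW, PR."""
    sets = {mode + access: set() for mode in "SRP" for access in "WR"}
    for op in ops:
        sets[op.mode + op.access].add(op)
    return sets


def strict(sets):
    """Strict(M) = SW(M) ∪ SR(M)."""
    return sets["SW"] | sets["SR"]


def writes(sets):
    """W(M) = SW(M) ∪ RW(M) ∪ PW(M)."""
    return sets["SW"] | sets["RW"] | sets["PW"]


ops = [
    MemOp(0, "x", "S", "W"),
    MemOp(1, "x", "S", "R"),
    MemOp(0, "y", "R", "W"),
    MemOp(1, "z", "P", "R"),
]
sets = partition(ops)
print(sorted(name for name, members in sets.items() if members))
print(len(strict(sets)), len(writes(sets)))
```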

6
Compiler Assumption
  • For specification purposes, assume the code is
    compiled by a naïve compiler into the ISO C
    abstract machine
  • Real compilers may perform optimizations
  • E.g., reorder, remove, insert memory operations
  • Even strict operations may be reordered with
    sufficient analysis (cycle detection)
  • These must produce an execution whose
    input/output and volatile behavior is identical
    to that of an unoptimized program (ISO C)

7
Orderings on Strict Operations
  • Threads must agree on an ordering of pairs that
    involve strict operations:
  • For pairs of strict accesses, it will be total
  • For a strict/relaxed pair on the same thread,
    they will all see the program order
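
One way to read the two rules above is as a filter on pairs of operations: a pair must be ordered the same way by every thread if both accesses are strict, or if the two accesses come from the same thread and at least one is strict. The Python sketch below encodes that reading; it is an interpretation for illustration, and the tuple layout `(name, thread, mode)` and the name `agreed_pairs` are assumptions.

```python
from itertools import combinations


def agreed_pairs(ops):
    """Return pairs whose relative order all threads must agree on.

    ops: list of (name, thread, mode) with mode "S" (strict) or
    "R" (relaxed), listed in issue order within each thread.
    """
    pairs = []
    for a, b in combinations(ops, 2):
        both_strict = a[2] == "S" and b[2] == "S"
        same_thread_with_strict = a[1] == b[1] and "S" in (a[2], b[2])
        if both_strict or same_thread_with_strict:
            pairs.append((a[0], b[0]))
    return pairs


ops = [
    ("s0", 0, "S"),  # strict access on thread 0
    ("r0", 0, "R"),  # relaxed access on thread 0
    ("s1", 1, "S"),  # strict access on thread 1
    ("r1", 1, "R"),  # relaxed access on thread 1
]
pairs = agreed_pairs(ops)
print(pairs)  # [('s0', 'r0'), ('s0', 's1'), ('s1', 'r1')]
```

Relaxed accesses on different threads, such as `r0` and `r1`, are left unconstrained by these rules.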

8
Orderings on Local Operations
  • Conflicting accesses have the usual definition
  • Given a serial execution $S = o_1, \ldots, o_n$
    defining $<_S$, let $S_t$ be the subsequence of
    operations issued by $t$
  • $S$ conforms to program order for thread $t$ iff
  • $S_t$ is consistent with the program text for $t$
    (follows control flow)
  • $S$ conforms to program dependence order for $t$
    iff $\exists$ a permutation $P(S)$ such that
  • $P(S)$ conforms to program order for $t$
  • $\forall (m_1, m_2) \in \mathit{Conflicting}(M)$:
    $m_1 <_S m_2 \Leftrightarrow m_1 <_{P(S)} m_2$
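
The program dependence order condition can be tested by brute force on tiny executions. The Python sketch below tries every permutation $P(S)$ and checks the two requirements. It assumes the usual meaning of conflicting accesses (same location, at least one write) and an operation layout `(name, thread, kind, location)`; both are assumptions made for this illustration.

```python
from itertools import combinations, permutations


def conflicting(a, b):
    """Assumed usual definition: same location, at least one write."""
    return a[3] == b[3] and "W" in (a[2], b[2])


def conforms_to_program_order(seq, t, program_order):
    """The subsequence issued by thread t matches t's program text."""
    return [op[0] for op in seq if op[1] == t] == program_order


def conforms_to_dependence_order(S, t, program_order):
    """True if some permutation of S follows t's program order while
    keeping every conflicting pair in the same relative order as in S."""
    pos_S = {op[0]: i for i, op in enumerate(S)}
    conflicts = [(a, b) for a, b in combinations(S, 2) if conflicting(a, b)]
    for P in permutations(S):
        if not conforms_to_program_order(P, t, program_order):
            continue
        pos_P = {op[0]: i for i, op in enumerate(P)}
        if all((pos_S[a[0]] < pos_S[b[0]]) == (pos_P[a[0]] < pos_P[b[0]])
               for a, b in conflicts):
            return True
    return False


# Thread 0's program text issues "a" and then "b".
independent = [("b", 0, "W", "y"), ("a", 0, "W", "x")]  # swapped, no conflict
dependent = [("b", 0, "R", "x"), ("a", 0, "W", "x")]    # swapped, conflict on x
print(conforms_to_dependence_order(independent, 0, ["a", "b"]))  # True
print(conforms_to_dependence_order(dependent, 0, ["a", "b"]))    # False
```

Swapping two writes to different locations is harmless, while swapping a write and a read of the same location violates the dependence.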

9
UPC Consistency
  • An execution on T threads with memory ops $M$ is
    UPC consistent iff
  • $\exists$ a partial order $<_{strict}$ that orients
    all pairs in allStrict(M)
  • And for each thread $t$, $\exists$ a total order
    $<_t$ on $O_t \cup W(M) \cup SR(M)$ such that
  • $<_t$ is consistent with $<_{strict}$ (all threads
    agree on the ordering of strict operations)
  • $<_t$ conforms to program dependence order (local
    dependencies are observed)
  • $<_t$ is a correct execution (reads return most
    recent write values)
10
Intuition on Strict Orderings
  • [Figure: threads P0 and P1, with the strict
    ordering drawn in black]
  • Each thread may build its own total order to
    explain behavior
  • They all agree on the strict ordering shown above
    in black, but
  • Different threads may see relaxed writes in
    different orders
  • Allows non-blocking writes to be used in
    implementations
  • Each thread sees own dependencies, but not those
    of other threads
  • Weak, but anything stronger would place
    consistency requirements on some relaxed
    operations (e.g., local cache control would be
    insufficient)
  • Preserving dependencies requires the usual
    compiler/hardware analysis

11
Synchronization Operations
  • UPC has both global and pairwise synchronization
  • In addition to the synchronization properties,
    they also have memory model implications
  • Locks
  • upc_lock is a strict read
  • upc_unlock is a strict write
  • Barriers (which may be split-phase)
  • upc_notify (begin barrier) is a strict write
  • upc_wait (end of barrier) is a strict read
  • upc_barrier = upc_notify followed by upc_wait

12
Alternative Models
  • As specified, two relaxed writes to the same
    location may be viewed differently by different
    processors
  • Nothing to force eventual consistency (likely in
    implementations)
  • May add this to barrier points, at least
  • So far it looks ad hoc
  • Adding directionality to reads/writes seems
    reasonable
  • Strict reads fence things that follow
  • Strict writes fence things that precede
  • Simply replace the StrictOnThreads definition
  • Supports user-defined synchronization primitives
    built from strict operations

13
Some Bizarre Behavior
  • Consider the following out-of-thin-air behavior
  • Given shared variables x and y, both initially 0
  • t0: r1 = x; y = r1
  • t1: r2 = y; x = r2
  • x and y end with 42 (or any other arbitrary
    value)
  • How does this happen?
  • t0 speculates that x is 42 and writes that value
    to y
  • t1 sees 42 in y and writes it into x
  • This validates t0's speculative read
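
For contrast, the Python sketch below runs the same two-thread program under every sequentially consistent interleaving and collects the final values of x and y. It is a toy simulator written for this illustration (the instruction tuples and helper names are assumptions), and it shows that 42 can never appear without speculation of the kind described above.

```python
def interleavings(a, b):
    """Yield all merges of sequences a and b that keep each one's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for tail in interleavings(a[1:], b):
        yield [a[0]] + tail
    for tail in interleavings(a, b[1:]):
        yield [b[0]] + tail


# t0: r1 = x; y = r1        t1: r2 = y; x = r2
T0 = [("load", "r1", "x"), ("store", "y", "r1")]
T1 = [("load", "r2", "y"), ("store", "x", "r2")]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}  # both shared variables start at 0
    regs = {}
    for kind, dst, src in schedule:
        if kind == "load":
            regs[dst] = mem[src]   # register <- shared variable
        else:
            mem[dst] = regs[src]   # shared variable <- register
    return mem["x"], mem["y"]


outcomes = {run(s) for s in interleavings(T0, T1)}
print(outcomes)  # {(0, 0)}
```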

14
Atomicity Issues
  • Atomicity: Is there a word size (or type) such
    that a write of anything larger is defined as a
    set of word-sized operations (so a user might see
    a partial update)?
  • E.g., is writing a struct the same as writing
    each field (or some maximum size)?
  • Tearing: Is there a word size (or type) such that
    two writes to the same location can result in a
    merged value?
  • Clobbering: Is there a word size (or type) such
    that, if something smaller is written, it might
    clobber writes to a neighboring value?
  • E.g., two processors write to two consecutive
    bytes in an array; the processor does a
    read-modify-write for each, and one can be lost
  • Conflicts: on what size are these defined?
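
The clobbering scenario can be replayed deterministically. The Python sketch below models a hypothetical machine whose smallest hardware write is a two-byte word, so each byte store becomes a read-modify-write of the whole word. The function name and the byte values are made up for this illustration, and the schedule is fixed by hand rather than produced by real threads.

```python
def clobbering_schedule():
    """Replay one unlucky interleaving of two byte writes to the same word."""
    word = [0, 0]            # two neighboring bytes sharing one hardware word

    # Both processors read the whole word before either writes it back.
    copy_p0 = list(word)
    copy_p1 = list(word)

    copy_p0[0] = 0xAA        # P0 modifies byte 0 in its private copy
    copy_p1[1] = 0xBB        # P1 modifies byte 1 in its private copy

    word = copy_p0           # P0 writes the full word back
    word = copy_p1           # P1 writes the full word back, losing P0's byte
    return word


result = clobbering_schedule()
print(result)  # [0, 187]
```

Byte 0 ends at 0 even though P0 wrote 0xAA to it, which is the lost update the slide describes.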

15
UPC Bulk Operation Semantics
  • Are upc_memput, upc_memget, upc_memcpy relaxed or
    strict?
  • If relaxed, then the user can get strict behavior
    by putting a strict operation (or operations in
    the nonsymmetric case) before and after
  • Will this be surprising to users?
  • What do current implementations do?

16
UPC Fence Operations
  • Should UPC have separate functions for:
  • Read fence: prevents memory operations from
    moving before it
  • Write fence: prevents memory operations from
    moving after it
  • Or let the programmer build these by doing a
    strict read/write to some otherwise unused
    variable?

17
Future Plans
  • Show that various implementations satisfy this
    spec:
  • Use of non-blocking writes for relaxed writes
    with write fence/sync at strict points
  • Compiler-inserted prefetching of relaxed reads
  • Compiler-inserted message vectorization to
    aggregate a set of small operations into one
    larger one
  • A software caching implementation with cache
    flushes at strict points
  • Develop an operational model and show equivalence
    (or at least that it implements the spec)
  • Define the data unit of atomicity: fundamental
    unit of interleaving, data tearing, conflicts

18
Properties of UPC Consistency
  • A program containing only strict operations is
    sequentially consistent
  • A program that produces only race-free executions
    is sequentially consistent
  • A UPC consistent execution of a program is
    race-free if for all threads $t$ and all enabling
    orderings $<_t$:
  • For all potential races $(m_1, m_2)$:
  • If $m_1 <_t m_2$ then $\exists$ synchronization
    operations $o_1, o_2$ such that
    $m_1 <_t o_1 <_t o_2 <_t m_2$ and
    $\mathit{Thread}(o_1) = \mathit{Thread}(m_1)$ and
    $\mathit{Thread}(o_2) = \mathit{Thread}(m_2)$ and
    either
  • $o_1$ is upc_notify and $o_2$ is upc_wait, or
  • $o_1$ is upc_unlock and $o_2$ is upc_lock on the
    same lock variable
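
The race-freedom condition for a single potential race can be checked mechanically against one ordering $<_t$. The Python sketch below represents the ordering as a list and looks for a suitable pair of synchronization operations between the two accesses. The `Op` record, its field names, and the `is_separated` helper are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    name: str                     # unique label (hypothetical)
    thread: int                   # issuing thread
    kind: str                     # "read", "write", "upc_notify", "upc_wait",
                                  # "upc_lock", or "upc_unlock"
    target: Optional[str] = None  # variable or lock name, if any


def is_separated(order, m1, m2):
    """True if sync ops o1, o2 exist with m1 < o1 < o2 < m2 in `order`,
    o1 on m1's thread, o2 on m2's thread, and a matching
    notify/wait or unlock/lock (same lock) pairing."""
    i, j = order.index(m1), order.index(m2)
    between = order[i + 1:j]  # empty unless m1 precedes m2
    for k, o1 in enumerate(between):
        for o2 in between[k + 1:]:
            if o1.thread != m1.thread or o2.thread != m2.thread:
                continue
            if o1.kind == "upc_notify" and o2.kind == "upc_wait":
                return True
            if (o1.kind == "upc_unlock" and o2.kind == "upc_lock"
                    and o1.target == o2.target):
                return True
    return False


w = Op("w", 0, "write", "x")
r = Op("r", 1, "read", "x")
unlock = Op("u", 0, "upc_unlock", "L")
lock = Op("l", 1, "upc_lock", "L")
other_lock = Op("l2", 1, "upc_lock", "K")

print(is_separated([w, unlock, lock, r], w, r))        # True
print(is_separated([w, r], w, r))                      # False
print(is_separated([w, unlock, other_lock, r], w, r))  # False
```

An unlock followed by a lock on a different lock variable does not separate the two accesses, matching the "same lock variable" requirement.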