Concurrent Programming Without Locks
Transcript and Presenter's Notes
1
Concurrent Programming Without Locks
  • Based on the paper by Keir Fraser and Tim Harris

2
Motivation: What's wrong with mutual exclusion
locks?
  • Scalable locking strategies are hard to design,
    and therefore need special programming care.
  • Certain interactions between locks cause errors
    such as deadlock, livelock and priority inversion
    (a high-priority thread is forced to wait for a
    low-priority thread holding a resource it needs).
  • For good performance, programmers need to make
    sure software will not hold locks for longer
    than necessary.
  • Also for high performance, programmers must
    balance the granularity at which locking operates
    against the time that the application will spend
    acquiring and releasing locks.
  • Consider a blocked thread responsible for a
    real-time task: it can cause a lot of damage!

3
Solutions without using locks
  • Still safe for use in multi-threaded,
    multi-processor shared-memory machines.
  • Maintain several important characteristics.
  • Reduce the burden on the programmer.
  • The implementations we will see are non-blocking.

4
Definitions
  • Non-blocking: even if any set of threads is
    stalled, the remaining threads can still make
    progress.
  • Obstruction-free: the weakest guarantee. A thread
    is only guaranteed to make progress (and finish
    its operation in a bounded number of steps) so
    long as it doesn't contend with other threads for
    access to any location.

5
  • Lock-freedom adds the requirement that the
    system as a whole makes progress, even if there
    is contention. A lock-free algorithm can be
    developed from an obstruction-free one by adding
    a helping mechanism.
  • Helping mechanism: if a thread t2 encounters a
    thread t1 obstructing it, then t2 helps t1 to
    complete t1's operation, and then completes its
    own.
  • Alternatively, a thread can decide to wait for
    the other thread to complete its operation, or
    even cause it to abort.
  • An obstruction-free transactional API requires
    transactions to eventually commit successfully if
    run in isolation, but allows a set of transactions
    to livelock aborting one another if they contend.
    A lock-free transactional API requires some
    transaction to eventually commit successfully
    even if there is contention.

6
Overview
  • Goals of the design
  • Introducing the APIs
  • Programming with the APIs
  • Design methods
  • Practical design & implementation of the APIs
    (briefly)
  • Performance of data structures built over the
    APIs.

7
Goals
  • Concreteness: this means we build from atomic
    single-word read, write and compare-and-swap
    (CAS) operations.
  • Reminder: compare-and-swap (CAS)

atomically word CAS (word *a, word e, word n) {
  word x = *a;
  if ( x == e ) *a = n;
  return x;
}
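As a concrete illustration (not from the slides), here is a
minimal C sketch of the usual CAS retry loop, with the
GCC/Clang __atomic builtins standing in for the hardware CAS
instruction:

  #include <stdint.h>

  /* Minimal sketch: a lock-free counter increment built from
   * single-word CAS. */
  void counter_increment(uint64_t *counter) {
      uint64_t old = __atomic_load_n(counter, __ATOMIC_RELAXED);
      for (;;) {
          /* On failure, 'old' is refreshed with the value actually
           * seen, so the loop retries with up-to-date information. */
          if (__atomic_compare_exchange_n(counter, &old, old + 1,
                                          0 /* strong */,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_SEQ_CST))
              return;
      }
  }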

8
  • Linearizability: basically means that the
    function appears to occur atomically (to other
    threads) at some point between when it is called
    and when it returns.
  • Non-blocking progress: mentioned before.
  • Disjoint-access parallelism: operations that
    access disjoint sets of words in memory should be
    able to execute in parallel.
  • Dynamicity: our APIs should be able to support
    dynamically-sized data structures, such as lists
    and trees.
  • Practicable space cost: space costs should scale
    well with the number of threads and the volume of
    data managed using the API (reserve no more than
    2 bits in each word).

9
  • Composability: if multiple data structures
    separately provide operations built with one of
    our APIs, then these should be composable to form
    a single compound operation which still occurs
    atomically (and which can itself be composed with
    others).
  • Read parallelism: implementations of our APIs
    should allow shared data that is read on
    different CPUs to remain in shared mode in those
    CPUs' data caches (reduces overhead).

10
Common to all APIs
  • Linearizable
  • Non-blocking
  • Support dynamically-sized data structures

11
(No Transcript)
12
Overview
  • Goals of the design
  • Introducing the APIs
  • Programming with the APIs
  • Design methods
  • Practical design & implementation of the APIs
    (briefly)
  • Performance of data structures built over the
    APIs.

13
The APIs
  • Multi-word compare-and-swap (MCAS)
  • Word-based software transactional memory (WSTM)
  • Object-based software transactional memory (OSTM)

14
MCAS is defined to operate on N distinct memory
locations (a[i]), expected values (e[i]), and new
values (n[i]); each a[i] is updated to the value
n[i] if and only if each a[i] contains the expected
value e[i] before the operation. MCAS returns TRUE
if these updates are made and FALSE otherwise.

// Update locations a[0]..a[N-1] from e[0]..e[N-1]
// to n[0]..n[N-1]
bool MCAS (int N, word *a[], word e[], word n[]);

This action atomically updates N memory locations.
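As a usage illustration (a hypothetical sketch against the
MCAS signature above and the MCASRead operation introduced on
the next slide, not code from the paper), atomically
transferring an amount between two counters could look like
this:

  /* Hypothetical sketch: atomically move 'amount' from *src to
   * *dst.  Reads go through MCASRead; the MCAS fails and we
   * retry if either location changed in the meantime. */
  bool transfer(word *src, word *dst, word amount) {
      for (;;) {
          word s = MCASRead(src);
          word d = MCASRead(dst);
          if (s < amount) return FALSE;            /* nothing to move     */
          word *a[2] = { src, dst };
          word  e[2] = { s, d };                   /* expected old values */
          word  n[2] = { s - amount, d + amount }; /* proposed new values */
          if (MCAS(2, a, e, n)) return TRUE;       /* else: interference  */
      }
  }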
15
Heap accesses to words which may be subject to a
concurrent MCAS must be performed by calling
MCASRead. Reason: the MCAS implementation places
its own values in these locations while they are
being updated.

// Read the contents of location a
word MCASRead (word *a);
16
Advantages
  • Effective when we need to update a number of
    memory locations from one consistent state to
    another.
  • Eases the burden of ensuring correct
    synchronization of updates.

17
Disadvantages
  • It is a low-level API, so programmers must
    remember that a subsequent MCAS is not aware of
    locations that have only been read. Therefore the
    programmer needs to keep lists of such locations
    and their values, and to pass them to MCAS to
    confirm that the values are still consistent
    (unlike transactions, which have a read-check
    phase). A sketch of this pattern follows below.
  • This API will not allow us to reach our goal of
    composability.
  • No read parallelism, because locations that are
    only read must still be included in the update.
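A minimal sketch (hypothetical names, assuming the MCAS API
above) of the burden described in the first bullet: a location
that is only read must still be passed to MCAS, with equal
expected and new values, so that the operation fails if that
location changed after it was read.

  /* Hypothetical sketch: update *target only if *guard is
   * unchanged.  The guard is "updated" to its own value, which
   * makes MCAS check it. */
  bool update_if_guard_unchanged(word *guard, word *target,
                                 word new_val) {
      word g = MCASRead(guard);
      word t = MCASRead(target);
      word *a[2] = { guard, target };
      word  e[2] = { g, t };
      word  n[2] = { g, new_val };
      return MCAS(2, a, e, n);
  }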

18
What can be done to address these disadvantages?
Transactions!
19
Reminder: software transactional memory. A
concurrency control mechanism for controlling
access to shared memory. It functions as an
alternative to lock-based synchronization, and is
implemented in a lock-free way. Transaction:
code that executes a series of reads and writes
to shared memory. The operation in which the
changes of a transaction are validated and, if
validation is successful, made permanent, is
called commit. A transaction can abort at any
time, causing all of its prior changes to be
undone. If a transaction cannot be committed due
to conflicting changes, it is aborted and
re-executed until it succeeds.
20
The APIs related to STM that we will see follow
an optimistic style: the core sequential code is
wrapped in a loop which retries the operations
until the commit succeeds. Every thread
completes its modifications to shared memory
without regard for what other threads might be
doing. It is the reader's responsibility, after
completing the transaction, to make sure other
threads haven't concurrently made changes to
memory that it accessed in the past.
Advantage: increased concurrency. No threads need
to wait for access to a resource, and
disjoint-access parallelism is achieved.
Disadvantage: overhead in retrying transactions
that failed. However, in realistic programs
conflicts rarely arise.
21
The APIs
  • Multi-word compare-and-swap (MCAS)
  • Word-based software transactional memory (WSTM)
  • Object-based software transactional memory (OSTM)

22
// Transaction management
wstm_transaction *WSTMStartTransaction();
bool WSTMCommitTransaction(wstm_transaction *tx);
void WSTMAbortTransaction(wstm_transaction *tx);

// Data access
word WSTMRead(wstm_transaction *tx, word *a);
void WSTMWrite(wstm_transaction *tx, word *a, word d);

  • WSTMRead and WSTMWrite must be used when accessing
    words that may be subject to a concurrent
    WSTMCommitTransaction. (A usage sketch follows
    below.)
  • The implementation allows a transaction to commit
    as long as no other thread has committed an update
    to one of the locations we accessed.
  • Transactions succeed or fail atomically.
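A minimal usage sketch (hypothetical, written against the WSTM
API listed above): atomically incrementing a word that other
threads may also be updating.

  /* Hypothetical sketch: the whole read-modify-write is retried
   * until the commit succeeds, i.e. until no conflicting update
   * was committed to the counter. */
  void counter_increment_wstm(word *counter) {
      bool done;
      do {
          wstm_transaction *tx = WSTMStartTransaction();
          word v = WSTMRead(tx, counter);    /* logical read              */
          WSTMWrite(tx, counter, v + 1);     /* tentative, private update */
          done = WSTMCommitTransaction(tx);  /* publish atomically        */
      } while (!done);
  }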

23
What did we improve here?
  • All relevant reads and writes are grouped into a
    transaction that is applied to the heap atomically
    (there is a read-check phase during the commit
    operation).
  • Composability is easy (why?).
  • This implementation doesn't reserve space in each
    word, which allows acting on full word-size data
    rather than just on pointer-valued fields in which
    spare bits can be reserved.

24
The APIs
  • Multi-word compare-and-swap (MCAS)
  • Word-based software transactional memory (WSTM)
  • Object-based software transactional memory (OSTM)

25
// Transaction management
ostm_transaction *OSTMStartTransaction();
bool OSTMCommitTransaction(ostm_transaction *tx);
void OSTMAbortTransaction(ostm_transaction *tx);

// Data access
t *OSTMOpenForReading(ostm_transaction *tx, ostm_handle<t> *o);
t *OSTMOpenForWriting(ostm_transaction *tx, ostm_handle<t> *o);

// Storage management
ostm_handle<void> *OSTMNew(size_t size);
void OSTMFree(ostm_handle<void> *ptr);

  • We add a level of indirection: OSTM objects are
    accessed through OSTM handles, rather than by
    accessing words individually.
  • Before the data it contains can be accessed, an
    OSTM handle must be opened in order to obtain
    access to the underlying object. (A usage sketch
    follows below.)
  • OSTMOpenForWriting's return value refers to a
    shadow copy of the underlying object, that is, a
    private copy on which the thread can work before
    attempting to commit its updates.
  • The OSTM implementation may share an object's data
    between multiple threads when it has been opened
    only for reading.
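A minimal usage sketch (hypothetical, following the slides'
pseudocode conventions for the OSTM API above): incrementing a
counter stored in an OSTM object.

  /* Hypothetical sketch: 'counter' is a type of our own, not
   * part of the OSTM API. */
  typedef struct { word value; } counter;

  void counter_increment_ostm(ostm_handle<counter> *c_h) {
      bool done;
      do {
          ostm_transaction *tx = OSTMStartTransaction();
          counter *c = OSTMOpenForWriting(tx, c_h); /* private shadow copy */
          c->value = c->value + 1;                  /* direct data access  */
          done = OSTMCommitTransaction(tx);         /* publish atomically  */
      } while (!done);
  }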

26
  • The OSTM interface leads to a different cost
    profile from WSTM: OSTM introduces a cost on
    opening objects for access and potentially
    producing shadow copies to work on, but
    subsequent data access is made directly (rather
    than through functions like WSTMRead and
    WSTMWrite).
  • Furthermore, it admits a simplified non-blocking
    commit operation.

The API's implementation is responsible for
correctly ensuring that conflicting operations do
not proceed concurrently and for preventing
deadlock and priority inversion between
concurrent operations. The API's caller remains
responsible for ensuring scalability by making it
unlikely that concurrent operations will need to
modify overlapping sets of locations. However,
this is a performance problem rather than a
correctness or liveness one.
27
Some notes
  • While locks require thinking about overlapping
    operations and demand a locking policy, with STM
    things are much simpler:
  • each transaction can be viewed in isolation,
    as a single-threaded computation. The programmer
    does not need to worry about deadlock, for
    example.
  • However, transactions must always be able to
    detect invalidity, and then decide on an action
    (help, wait, abort and retry).

28
Overview
  • Goals of the design
  • Introducing the APIs
  • Programming with the APIs
  • Design methods
  • Practical design & implementation of the APIs
    (briefly)
  • Performance of data structures built over the
    APIs.

29
typedef struct { int key; struct node *next; } node;
typedef struct { node *head; } list;

void list_insert_mcas (list *l, int k) {
  node *n = new node(k);
  do {
    node *prev = MCASRead( &(l->head) );
    node *curr = MCASRead( &(prev->next) );
    while ( curr->key < k ) {
      prev = curr;
      curr = MCASRead( &(curr->next) );
    }
    n->next = curr;
  } while ( !MCAS (1, &prev->next, curr, n) );
}

Fig. 2. Insertion into a sorted list managed using MCAS.

typedef struct { int key; struct node *next; } node;
typedef struct { node *head; } list;

void list_insert_single_threaded (list *l, int k) {
  node *n = new node(k);
  node *prev = l->head;
  node *curr = prev->next;
  while ( curr->key < k ) {
    prev = curr;
    curr = curr->next;
  }
  n->next = curr;
  prev->next = n;
}

Fig. 1. Insertion into a sorted list.

30
typedef struct { int key; struct node *next; } node;
typedef struct { node *head; } list;

void list_insert_wstm (list *l, int k) {
  node *n = new node(k);
  do {
    wstm_transaction *tx = WSTMStartTransaction();
    node *prev = WSTMRead(tx, &(l->head));
    node *curr = WSTMRead(tx, &(prev->next));
    while ( curr->key < k ) {
      prev = curr;
      curr = WSTMRead(tx, &(curr->next));
    }
    n->next = curr;
    WSTMWrite(tx, &(prev->next), n);
  } while ( !WSTMCommitTransaction(tx) );
}

Fig. 3. Insertion into a sorted list managed using WSTM.

typedef struct { int key; ostm_handle<node> *next_h; } node;
typedef struct { ostm_handle<node> *head_h; } list;

void list_insert_ostm (list *l, int k) {
  node *n = new node(k);
  ostm_handle<node> *n_h = new ostm_handle(n);
  do {
    ostm_transaction *tx = OSTMStartTransaction();
    ostm_handle<node> *prev_h = l->head_h;
    node *prev = OSTMOpenForReading(tx, prev_h);
    ostm_handle<node> *curr_h = prev->next_h;
    node *curr = OSTMOpenForReading(tx, curr_h);
    while ( curr->key < k ) {
      prev_h = curr_h; prev = curr;
      curr_h = prev->next_h;
      curr = OSTMOpenForReading(tx, curr_h);
    }
    n->next_h = curr_h;
    prev = OSTMOpenForWriting(tx, prev_h);
    prev->next_h = n_h;
  } while ( !OSTMCommitTransaction(tx) );
}

Fig. 4. Insertion into a sorted list managed using OSTM.

31
Some notes
  • The APIs could be used directly by expert
    programmers.
  • They can help build a layer inside a complete
    system (like language features).
  • Runtime code generation can be added to support
    the level of indirection that OSTM objects need.

32
Overview
  • Goals of the design
  • Introducing the APIs
  • Programming with the APIs
  • Design methods
  • Practical design & implementation of the APIs
    (briefly)
  • Performance of data structures built over the
    APIs.

33
The key problem in the APIs: ensuring that a set
of memory accesses appears to occur atomically
when it is implemented by a series of individual
instructions accessing one word at a time. We
separate a location's physical contents in memory
from its logical contents when accessed through
one of the APIs. For each of the APIs there is
only one operation which updates the logical
contents of memory locations: MCAS,
WSTMCommitTransaction and OSTMCommitTransaction.
For each of the APIs we present our design in a
series of four steps.
34
Memory formats
  • Define the format of the heap and the temporary
    data structures that are used.
  • All three of our implementations introduce
    descriptors, which (i) set out the before and
    after versions of the memory accesses that a
    particular commit operation proposes to make, and
    (ii) provide a status field indicating how far
    the commit operation has progressed. (A sketch of
    this shape follows below.)
  • Descriptor properties: (i) managed by the garbage
    collector; (ii) a descriptor's contents are
    unchanged once it is made reachable from shared
    memory; (iii) once the outcome of a particular
    commit operation has been decided, the
    descriptor's status field remains constant.
  • The first two properties mean that there is
    effectively a one-to-one association between
    descriptor references and the intent to perform a
    given atomic update.
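A minimal sketch (assumed field names and bounds, not the
paper's exact layout) of the descriptor shape that all three
designs share:

  #include <stdint.h>

  typedef uintptr_t word;   /* heap word, as on the slides */
  typedef enum { UNDECIDED, READ_CHECK, FAILED, SUCCESSFUL } status_t;

  /* One proposed memory access: the location, the expected old
   * value and the new value to install if the commit succeeds. */
  typedef struct { word *addr; word old_val, new_val; } entry_t;

  #define MAX_ENTRIES 8     /* illustrative fixed bound */

  typedef struct {
      status_t status;      /* how far the commit has progressed */
      int      n_updates;   /* entries being written             */
      int      n_reads;     /* entries only read (unused by MCAS,
                               used by the STMs)                 */
      entry_t  updates[MAX_ENTRIES];
      entry_t  reads[MAX_ENTRIES];
  } descriptor_t;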

35
Logical contents
  • Each of our API implementations uses descriptors
    to define the logical contents of memory
    locations by providing a mechanism for a
    descriptor to own a set of memory locations.
  • The logical contents of all of the locations that
    a transaction needs to update are updated at once,
    even if the physical contents of these locations
    are not.

36
Uncontended commit operation
  • We'll see how the commit operation arranges to
    atomically update the logical contents of a set
    of locations when it executes without
    interference from concurrent commit operations.
  • A commit operation contains three stages:
  • a) Acquire exclusive ownership of the
    locations being updated.
  • b) Read-check phase: ensures that locations
    that have been read but not updated hold the
    values expected in them. This is followed by the
    decision point, at which the outcome of the commit
    operation is decided and made visible to other
    threads through the descriptor's status field.
  • c) Release phase, in which the thread
    relinquishes ownership of the locations being
    updated.

(Timeline: start commit; acquire a2; read a1; read-check a1;
decision point; release a2; finish commit. The thread has
exclusive access to location a2 from acquire to release, and
location a1 is guaranteed valid from its read until its
read-check.)
37
  • A descriptor's status field starts as
    UNDECIDED. If there is a read-check phase it
    becomes READ-CHECK. At the decision point it is
    set to SUCCESSFUL if all of the required
    ownerships were acquired and the read-checks
    succeeded; otherwise it is set to FAILED.
  • In order to show that an entire commit operation
    appears atomic, we identify a linearization point
    within its execution at which it appears to
    operate atomically on the logical contents of the
    heap from the point of view of other threads.
  • Unsuccessful commit operation: the linearization
    point is straightforward, at the point where
    failure is detected.
  • Successful commit operation:
  • a) No read-check phase: the linearization
    point and decision point coincide.
  • b) With a read-check phase: the linearization
    point occurs at the start of the read-check phase.

The linearization point comes before the decision
point!? Is that a problem?
38
Solution
  • The logical contents are dependent on the
    descriptor's status field, and so updates are not
    revealed to other threads until the decision
    point is reached. We reconcile this definition
    with the use of a read-check phase by ensuring
    that concurrent readers help commit operations to
    complete, retrying the read operation once the
    transaction has reached its decision point. This
    means that the logical contents do not need to be
    defined during the read-check phase, because they
    are never required.

39
Contended commit operation
Consider the case where thread t2 encounters a
location currently held by t1.

1) t1's status is already decided (successful or
failed). All of the designs rely on having t2 help
t1 complete its work, using the information in
t1's descriptor to do so.

2) t1's status is undecided and the algorithm does
not include a READ-CHECK phase.
Strategy a): t2 causes t1 to abort if it has not
yet reached its decision point, that is, if t1's
status is still UNDECIDED. This leads to an
obstruction-free progress property and the risk of
livelock, unless contention management is employed
to prevent t1 retrying its operation and aborting
t2.
Strategy b): threads sort the locations that they
require, and t2 helps t1 to complete its operation,
even if the outcome is currently UNDECIDED.

3) t1's status is undecided and the algorithm does
include a READ-CHECK phase. Here there is a
constraint on the order in which locations are
accessed, because a thread must acquire the
locations it wants to update before it enters the
read-check phase.
40
(Diagram: transactions A and B are both in their READ-CHECK
phase; one has updated x and is reading y, while the other has
updated y and is reading x, so each is read-checking a location
that the other has acquired for update, forming a cycle.)
Solution: abort at least one of the operations to
break the cycle. However, care must be taken not
to abort them all if we wish to ensure
lock-freedom rather than obstruction-freedom. (In
OSTM this can be done by imposing a total order on
all operations, based on the machine addresses of
the transactions' descriptors.)
41
Overview
  • Goals of the design
  • Introducing the APIs
  • Programming with the APIs
  • Design methods
  • Practical design & implementation of the APIs
    (briefly)
  • Performance of data structures built over the
    APIs.

42
MCAS
The MCAS function can be defined sequentially as:

atomically bool MCAS (int N, word *a[], word e[], word n[]) {
  for ( int i = 0; i < N; i++ )
    if ( *a[i] != e[i] ) return FALSE;
  for ( int i = 0; i < N; i++ )
    *a[i] = n[i];
  return TRUE;
}

The MCAS implementation uses CCAS (conditional
compare-and-swap):

atomically word CCAS (word *a, word e, word n, word *cond) {
  word x = *a;
  if ( (x == e) && (*cond == 0) ) *a = n;
  return x;
}
43
Memory formats
typedef struct word
status int N word
aMAX N, eMAX N, nMAX N
mcas descriptor The type of a value read from a
heap location can be tested using the IsMCASDesc
and IsCCASDesc predicates if either predicate
evaluates true then the tested value is a pointer
to the appropriate type of descriptor. These
function implementation require reserving 2 bits
in each word in order to distinguish between
descriptor and other values.
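One common way to implement such predicates (a sketch of an
assumed encoding, not necessarily the paper's exact scheme) is
to tag the two reserved low-order bits of word-aligned
pointers:

  #include <stdbool.h>
  #include <stdint.h>

  typedef uintptr_t word;

  /* Assumed encoding: ordinary values have 00 in the low bits;
   * descriptor references carry a distinguishing tag there. */
  #define MCAS_TAG ((word)0x1)
  #define CCAS_TAG ((word)0x2)
  #define TAG_MASK ((word)0x3)

  static inline bool IsMCASDesc(word v) { return (v & TAG_MASK) == MCAS_TAG; }
  static inline bool IsCCASDesc(word v) { return (v & TAG_MASK) == CCAS_TAG; }

  /* Strip the tag to recover the descriptor pointer. */
  static inline void *desc_ptr(word v)  { return (void *)(v & ~TAG_MASK); }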
44
Logical contents
  • The location holds an ordinary value: this value
    is the logical contents.
  • The location refers to an UNDECIDED descriptor:
    the descriptor's old value (e[i]) is the logical
    contents.
  • The location refers to a FAILED descriptor: as
    in the previous case.
  • The location refers to a SUCCESSFUL descriptor:
    the new value (n[i]) is the logical contents of
    the location.

word MCASRead (word *a) {
  word v;
retry_read:
  v = CCASRead(a);
  if ( IsMCASDesc(v) )
    for ( int i = 0; i < v->N; i++ )
      if ( v->a[i] == a ) {
        if ( v->status == SUCCESSFUL ) {
          if ( CCASRead(a) == v ) return v->n[i];
        } else {
          if ( CCASRead(a) == v ) return v->e[i];
        }
        goto retry_read;
      }
  return v;
}

The descriptor is searched for an entry relating
to the address being read. The re-check of
ownership ensures that the status field was not
checked too late, after the descriptor had lost
ownership of the location and was consequently no
longer determining its logical contents.
45
Commit operations
46
(No Transcript)
47
The conditional part of CCAS ensures that the
descriptor's status is still UNDECIDED, meaning
that e[i] is correctly defined as the logical
contents of a[i]; this is needed in case a
concurrent thread helps complete the MCAS
operation via a call to mcas_help at line 18.
The loop also exits if an unexpected
non-descriptor value is seen (line 17).
When the status field is updated (line 23), all
of the locations a[i] must hold references to
the descriptor, and consequently the single status
update changes the logical contents of all of the
locations. This is because the update is made by
the first thread to reach line 23 for the
descriptor, and so no threads can yet have reached
lines 25-27 and have started releasing the
addresses.
48
WSTM
  • In WSTM there are three improvements over MCAS:
  • Space doesn't need to be reserved in each heap
    location.
  • The WSTM implementation is responsible for
    tracking the locations accessed, instead of the
    caller (done in the read-check phase).
  • Improved read parallelism: locations that are
    read but not updated can remain cached in shared
    mode.

The cost: the implementation is more complex!
Therefore we will first see a lock-based
framework and then an obstruction-free one.
Nothing comes for free!
49
Memory formats
WSTM is based on associating version numbers with
locations in the heap. It uses these numbers to
detect conflicts between transactions.
A table of ownership records (orecs) holds the
information that WSTM uses to co-ordinate access
to the application heap. The orec table has a
fixed size. A hash function is used to map
addresses to orecs.
Each orec holds a version number or, when an
update is being committed, refers to the
descriptor of the transaction involved. We use the
notation a_i: o_i@vo_i -> n_i@vn_i to indicate
that address a_i is being updated from value o_i
at version number vo_i, to value n_i at version
number vn_i. For a read-only access, o_i = n_i and
vo_i = vn_i. For an update, vn_i = vo_i + 1.
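A minimal sketch (assumed table size, hash and field layout,
not the paper's code) of the orec table just described:

  #include <stdint.h>

  typedef uintptr_t word;
  typedef struct wstm_transaction wstm_transaction;

  /* Each orec is either a version number or, while a commit is
   * in progress, a reference to the owning transaction's
   * descriptor; IsWSTMDesc would distinguish the two cases
   * (e.g. via a tag bit). */
  typedef union {
      word              version;
      wstm_transaction *owner;
  } orec_t;

  #define N_ORECS 65536                /* fixed-size table (illustrative) */
  static orec_t orec_table[N_ORECS];

  /* Hash a heap address down to the orec that guards it; many
   * addresses share one orec. */
  static orec_t *orec_of(word *addr) {
      return &orec_table[((uintptr_t)addr >> 2) & (N_ORECS - 1)];
  }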
50
The transaction descriptor is used to co-ordinate
helping between transactions. We will use the
predicate IsWSTMDesc to determine whether an orec
holds a reference to a transaction descriptor.
51
Logical contents
Here there are 3 cases to consider:
1. The orec holds a version number. The logical
contents in this case come directly from the
application heap.
2. The orec refers to a descriptor that contains
an entry for the address. Here that entry gives
the logical contents. For instance, the logical
contents of a1 is 100 because the descriptor's
status is UNDECIDED.
3. The orec refers to a descriptor that does not
contain an entry for the address. Here the logical
contents come from the application heap. For
instance, the logical contents of a101 is 300,
because descriptor tx1 does not involve a101 even
though it owns r1.
Note: we cannot determine the logical contents
during the read-check phase, because the decision
point comes only afterwards. Therefore we rely on
threads encountering such a descriptor to help
decide its outcome.
A descriptor is well formed if, for each orec, it
either (i) contains at most one entry associated
with that orec, or (ii) contains multiple entries
associated with that orec, but the old version
number and new version number are the same in all
of them.
52
Lines 32-38 ensure that the descriptor will
remain well formed when the new entry is added to
it. This is done by searching for any other
entries relating to the same orec (line 33). If
there is an existing entry, then the old version
numbers must match (line 34). If the numbers do
not match, then a concurrent transaction has
committed an update to a location involved with
the same orec, and tx is doomed to fail (line 35).
Line 36 ensures the descriptor remains well
formed even if it is doomed to fail. Line 37
ensures that the new entry has the same new
version as the existing entries, e.g. if there
was an earlier WSTMWrite in the same transaction
made to another address associated with this
orec.
53
WSTMWrite updates the entry's new value (lines
48-50). It must ensure that the new version
number indicates that the orec has been updated
(lines 51-52), both in the entry for addr and, to
ensure well-formedness, in any other entries for
the same orec.
54
Uncontended commit operations
A read-check phase checks that the version
numbers in orecs associated with reads are still
current (lines 28-31), meaning that other
transactions haven't changed the values we have
read.
We can see here the steps taking place during a
commit operation that we mentioned earlier.
Notice how this function follows the requirement:
setting the status to SUCCESSFUL atomically
updates the logical contents of all of the
locations written by the transaction.
55
The read-check phase uses read_check_orec to
check that the current version number associated
with an orec matches the old version in the
entries in the transaction descriptor. If it
encounters another transaction's descriptor, then
it ensures that that transaction's outcome is
decided (line 12) before examining it.
56
Line 9: the orec is already owned by the current
transaction because its descriptor holds multiple
entries for the same orec. Note that the loop at
line 13 will spin while the value seen in the
orec is another transaction's descriptor.
  • This implementation uses orecs as mutual
    exclusion locks, allowing only one transaction to
    own an orec at a given time.
  • A transaction owns an orec when the orec holds a
    pointer to this transaction's descriptor.

57
Graphic description of a commit operation
58
Obstruction-free contended commit operations
Designing a non-blocking way to resolve
contention in WSTM is more complex than in MCAS.
The problem with WSTM is that a thread cannot be
helped while it is writing updates to the heap.
If a thread is pre-empted just before one of the
stores, then it can be re-scheduled at any time
and perform that delayed update, overwriting
updates from subsequent transactions. In order
to make an obstruction-free WSTM, we make delayed
updates safe by ensuring that an orec remains
owned by some transaction while it is possible
that delayed writes may occur to locations
associated with it. This means that the logical
contents of locations that may be subject to
delayed updates are taken from the transaction
owning the orec.
59
We will use a new field in the orec data
structure: a count field. An orec's count field is
increased each time a thread successfully
acquires ownership of it, and decreased each time
a thread releases ownership in the
obstruction-free variant of release_orec.
A count field of zero therefore means that no
thread is in its update phase for locations
related to the orec.
In this implementation we allow ownership to
transfer directly between two descriptors. The
release_orec function ensures that the most
recent updates are written back to the heap.
60
OSTM
  • The design organizes memory locations into
    objects which act as the unit of concurrency and
    update.
  • Another level of indirection: data structures
    contain references to OSTM handles.
  • As a transaction runs, the OSTM implementation
    maintains sets holding the handles it has
    accessed, depending on the mode they were opened
    in (read-only or read-write). The list of writable
    objects includes pointers to old and new versions
    of the data.
  • This organization makes it easier for a
    conflicting transaction (one that wishes to access
    a currently acquired object) to find the data it
    needs. However, the conflicting transaction must
    search the owning transaction's read-write list in
    order to find the current copy of a given object.
  • Meaning: concurrent readers can still determine
    the object's current value by searching the
    sorted write list and returning the appropriate
    data-block, depending on the transaction's status.

61
  • Since no hash function is used, there is
    no risk of false contention due to hash
    collisions.
  • OSTM implements a simple strategy for conflict
    resolution: if two transactions attempt to write
    the same object, the one that acquires the object
    first is considered the winner.
  • To ensure non-blocking progress, the
    later-arriving thread reads the winner's metadata
    and recursively helps it complete its commit.
  • This, in addition to sorting the transaction's
    read and write lists and acquiring objects in
    order, avoids circular dependencies and deadlock.
  • While WSTM supports obstruction-free
    transactions, OSTM supports lock-free progress.

62
Memory formats
  • The current contents of an OSTM object are stored
    within a data-block.
  • We assume that a pointer uniquely identifies a
    particular use of a particular block of memory.
  • Outside of a transaction context, shared
    references to an OSTM object point to a
    word-sized OSTM handle.
  • The state of an incomplete transaction is
    encapsulated within a per-transaction descriptor.
    It contains the current status of the transaction
    and lists of objects that have been opened in
    read-only mode and in read-write mode.
  • Ordinarily, OSTM handles refer to the current
    version of the object's data via a pointer to the
    current data-block.
  • If a transaction is in the process of
    committing an update to the object, then the
    handle refers to the descriptor of the owning
    transaction.

We will use the predicate IsOSTMDesc to
distinguish between references to a data-block
and references to a transaction descriptor.
(A sketch of this layout follows below.)
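A minimal sketch (assumed names and bounds, not the paper's
layout) of the memory format just described: handles pointing
either at a data-block or at the committing transaction's
descriptor, and a per-transaction descriptor holding its
read-only and read-write lists.

  #include <stdint.h>

  typedef enum { UNDECIDED, READ_CHECK, FAILED, SUCCESSFUL } ostm_status_t;

  /* Handle: normally points at the current data-block; during a
   * commit it holds a (tagged) reference to the owning
   * transaction's descriptor, which IsOSTMDesc would recognise. */
  typedef struct { void *ref; } ostm_handle;

  /* One opened object: its handle, the data-block seen when it
   * was opened and, for read-write mode, the private shadow copy. */
  typedef struct {
      ostm_handle *handle;
      void        *old_data;
      void        *new_data;   /* shadow copy; unused in read-only mode */
  } obj_entry_t;

  #define MAX_OBJS 16          /* illustrative fixed bound */

  typedef struct ostm_transaction {
      ostm_status_t status;
      int           n_read, n_write;
      obj_entry_t   read_list[MAX_OBJS];
      obj_entry_t   write_list[MAX_OBJS]; /* kept sorted, so handles are
                                             acquired in a global order  */
  } ostm_transaction;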
63
Figure (a) shows an example OSTM-based structure
which might be used by a linked list.
A transaction in progress
64
Logical contents
  • Simpler than WSTM, where there was a many-to-one
    relationship between orecs and heap words.
  • There are 2 cases:
  • 1. The OSTM handle refers to a data-block.
    That block forms the object's logical contents.
  • 2. The OSTM handle refers to a transaction
    descriptor. We then take the descriptor's new
    value for the block if it is SUCCESSFUL, and its
    old value for the block if it is UNDECIDED or
    FAILED.
  • As usual, if a thread encounters a descriptor
    in its read-check phase, we require it to help
    advance it to its decision point, where the
    logical contents are well defined.

65
Commit operations
As in WSTM, there are 3 stages in a transaction's
commit operation.
Acquire phase: handles of objects opened in
read-write mode are acquired in some global total
order (this is done using CAS to replace the
data-block pointer with a pointer to the
transaction's descriptor; see the sketch below).
Read-check phase: handles of objects opened in
read-only mode are checked to see whether they
have been changed by other transactions since
they were read.
Release phase: each updated object has its
data-block pointer set to point at the new
data.
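A minimal sketch of the acquire step just described (assumed
names and an assumed tagging scheme, not the paper's code):
each handle of an object opened read-write is CASed from the
data-block pointer the transaction saw to a tagged reference to
the transaction's descriptor.

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct { void *ref; } ostm_handle;         /* as sketched earlier */
  typedef struct ostm_transaction ostm_transaction;  /* opaque here         */

  /* The low tag bit marks the stored reference as a descriptor,
   * so that IsOSTMDesc can later recognise it. */
  static bool acquire_object(ostm_handle *h, void *old_data,
                             ostm_transaction *tx) {
      void *expected = old_data;
      void *desc_ref = (void *)((uintptr_t)tx | 1);
      return __atomic_compare_exchange_n(&h->ref, &expected, desc_ref,
                                         0, __ATOMIC_SEQ_CST,
                                         __ATOMIC_SEQ_CST);
  }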
66
A note
  • In both implementations transactions operate
    entirely in private, and descriptors are only
    revealed when they are ready to commit. Therefore
    contention, if it exists, is discovered only at
    the end of the transaction. This is the lazy
    approach mentioned before (remember?).
  • It allows many short-running readers to co-exist
    with a long-running writer.

67
Overview
  • Goals of the design
  • Introducing the APIs
  • Programming with the APIs
  • Design methods
  • Practical design & implementation of the APIs
    (briefly)
  • Performance of data structures built over the
    APIs.

68
Performance evaluation
  • The experiment compared 14 set
    implementations, 6 based on red-black trees and 8
    based on skip lists.
  • Some of them are lock-based, and others are
    implemented using CAS, MCAS, WSTM and OSTM.
  • The experiment compared the implementations both
    under low contention and under varying contention.
  • The threads run (and commit) operations such as
    lookup_key, add_key and remove_key.
  • The operation chosen by each thread is random,
    but the probabilities reflect the fact that reads
    dominate writes in most cases.

69
Performance under low contention
  • Parallel writers are extremely unlikely to update
    overlapping sections of the data structure.
  • A well-designed algorithm which provides
    disjoint-access parallelism will avoid
    introducing contention between these logically
    non-conflicting operations.
  • As expected, the STM-based implementations
    perform poorly compared with the other lock-free
    schemes.
  • The reason: there are significant overheads
    associated with the read and write operations
    (in WSTM), or with maintaining the lists of
    opened objects and constructing shadow copies of
    updated objects (in OSTM).
  • The lock-free CAS-based and MCAS-based designs
    perform extremely well because they add only
    minor overheads on each memory access.

70
Performance of the skip-list implementations.
Performance of the red-black tree implementations.
71
Performance under varying contention
  • In non-blocking algorithms, when conflicts occur
    they are handled using a mechanism such as
    recursive helping or interaction with the thread
    scheduler.
  • The poor performance of MCAS when contention is
    high is because many operations must retry
    several times before they succeed.
  • We can see here the weakness of locks: the
    optimal granularity of locking depends on the
    level of contention.
  • Lock-free techniques avoid the need to take this
    into consideration.
  • In the red-black tree implementations, both
    lock-based schemes suffer contention for cache
    lines at the root of the tree, where most
    operations must acquire the multi-reader lock.

72
Performance of the skip-list implementations.
Performance of the red-black tree implementations.
73
Conclusions
  • The non-blocking implementations that we have
    seen can match or surpass the performance of
    lock-based alternatives.
  • APIs like STM have benefits in ease of use
    compared with mutual exclusion locks.
  • STM avoids the need to consider problems like
    the granularity of locking (whose optimum changes
    dynamically with the level of contention) and the
    order of locking (which can cause deadlock).
  • Therefore, it is possible to use lock-free
    techniques in places where traditionally we would
    use lock-based synchronization.

74
Bibliography
  • Keir Fraser (University of Cambridge Computer
    Laboratory) and Tim Harris (Microsoft Research
    Cambridge). Concurrent Programming Without Locks.
    ACM Transactions on Computer Systems, 2007.
  • Virendra J. Marathe, Michael F. Spear,
    Christopher Heriot, Athul Acharya, David
    Eisenstat, William N. Scherer III, and Michael L.
    Scott. Lowering the Overhead of Nonblocking
    Software Transactional Memory. Technical Report
    893, Department of Computer Science, University
    of Rochester, March 2006.
  • en.wikipedia.org

75
The End