
1
Distributed Shared Memory: CIS825 Project
Presentation
  • Sathish R. Yenna Avinash Ponugoti
  • Rajaravi Kollarapu Yogesh Bharadwaj
  • Sethuraman Subramanian Nagarjuna Nagulapati
  • Manmohan Uttarwar

2
Distributed Shared Memory
  • Introduction
  • Consistency models
  • Sequential consistency
  • PRAM consistency
  • Release consistency
  • Final System
  • Performance Evaluation

3
Introduction
  • What is shared memory?
  • - A memory location or object accessed by two or
    more processes running on the same machine
  • - A mechanism must be defined for accessing the
    shared location; otherwise unpredictable states
    will result
  • - Many operating systems provide mechanisms to
    avoid simultaneous access to shared memory
  • For example, semaphores, monitors, etc.

4
  • Example: Consider the Reader/Writer problem
  • We have a shared buffer into which the writer
    writes and from which the reader reads values.
  • To avoid overwriting a value before it is read,
    or reading the same value twice, we need a
    mechanism.
  • We have semaphores/monitors provided by the OS to
    avoid simultaneous access.
  • What if the writer is writing from one machine and
    the reader is reading from another machine?

(Figure: a Writer and a Reader process sharing one Memory)
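For the single-machine case, a minimal Java sketch (our own illustration, not part of the original slides) of a one-slot buffer guarded by semaphores:

    import java.util.concurrent.Semaphore;

    // One-slot shared buffer: 'empty' admits the writer, 'full' admits the
    // reader, so a value is never overwritten before it is read, nor read twice.
    class SharedBuffer {
        private int value;
        private final Semaphore empty = new Semaphore(1); // slot starts empty
        private final Semaphore full  = new Semaphore(0); // nothing to read yet

        void write(int v) throws InterruptedException {
            empty.acquire();   // wait until the slot is free
            value = v;
            full.release();    // signal the reader
        }

        int read() throws InterruptedException {
            full.acquire();    // wait until a value is available
            int v = value;
            empty.release();   // free the slot for the writer
            return v;
        }
    }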
5
  • What is Distributed shared memory?
  • - Memory accessed by two or more processes
    running on different machines connected via a
    communication network
  • Formal Definition
    A Distributed Shared Memory system is a pair
    (P, M), where P is a set of n processors P1, P2,
    P3, ..., Pn and M is a shared memory.
  • Each process Pi sequentially executes read and
    write operations on data items in M in the order
    defined by the program running on it.

6
  • DSM improves the performance of the whole system
  • An abstraction like DSM simplifies the
    application programming
  • BUT
  • - The main problem is how to keep the memory
    consistent
  • We don't have traditional semaphores or monitors
    to control the accesses in DSM
  • We can implement it by keeping the memory at a
    central location and allowing processes at
    different machines to access it
  • We can only use message transmission as an aid to
    control the accesses

7
  • But networks are slow, so for better performance
    we have to keep copies of the same variable at
    several machines
  • Maintaining perfect consistency (i.e., any read
    of a variable x returns the value stored by the
    most recent write to x) of all the copies is hard
    and results in low performance, as the processes
    are on different machines communicating over a
    slow network
  • The solution is to accept less than perfect
    consistency as the price for better performance
  • Moreover, many application programs don't require
    strict consistency
  • For all these reasons, many consistency models
    have been defined

8
Consistency Models
  • A consistency model is essentially a contract
    between the software and the memory: if the
    software agrees to obey certain rules, the memory
    promises to work correctly.
  • In our project we are implementing three of them:
  • - Sequential consistency
  • - PRAM consistency
  • - Release consistency

9
Sequential Consistency
  • A system is sequentially consistent if the result
    of any execution is the same as if
  • - the operations of all the processors were
    executed in some sequential order, and
  • - the operations of each individual processor
    appear in this sequence in the order specified
    by its program.

10
  • - When processes run in parallel on different
    machines, any valid interleaving is acceptable
    behavior, but all processes must see the same
    sequence of memory references.
  • - Note that nothing is said about time; that is,
    there is no reference to the most recent store.
  • - It merely guarantees that all processes see all
    memory references in the same order.
  • - Two possible results of the same program:
  •   (a) P1: W(x)1             (b) P1: W(x)1
  •       P2: R(x)0 R(x)1           P2: R(x)1 R(x)1

11
Implementation
  • Brown's Algorithm
  • Each process Pi has a queue INi of invalidation
    requests
  • W(x)v: Perform all invalidations in the IN queue.
           Update the main memory and the cache.
           Place invalidation requests in the IN queue
           of each process.
  • R(x):  If x is in the cache, read it from the
           cache.
           Else
             Perform all invalidations in INi
             Read from the main memory
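A rough Java sketch of the per-process side of Brown's algorithm (class and method names are our own; main-memory access and the broadcast of invalidation requests are left abstract, and the runtime must still make W(x)v atomic):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Per-process view: a local cache plus the IN queue of invalidation
    // requests that other processes' writes append to.
    abstract class BrownProcess {
        private final Map<String, Integer> cache = new HashMap<>();
        final Queue<String> inQueue = new ConcurrentLinkedQueue<>(); // INi

        abstract int readMainMemory(String x);
        abstract void writeMainMemory(String x, int v);
        abstract void broadcastInvalidation(String x); // appends x to every IN queue

        private void applyInvalidations() {
            String x;
            while ((x = inQueue.poll()) != null) cache.remove(x);
        }

        // W(x)v: invalidate, update memory and cache, broadcast (must be atomic).
        void write(String x, int v) {
            applyInvalidations();
            writeMainMemory(x, v);
            cache.put(x, v);
            broadcastInvalidation(x);
        }

        // R(x): serve from the cache if present, else invalidate and fetch.
        int read(String x) {
            Integer v = cache.get(x);
            if (v != null) return v;
            applyInvalidations();
            int value = readMainMemory(x);
            cache.put(x, value);
            return value;
        }
    }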

12
Problems with Brown's Implementation
  • The three operations in W(x)v, i.e., updating the
    cache, updating main memory and broadcasting the
    invalidation message, must be done atomically.
  • Ensuring this atomicity requires a robust
    mechanism involving an agreement by all the
    processes; there is a lot of communication
    overhead involved.
  • For a single write, N invalidation messages are
    transmitted, where N is the number of processes.

13
Sequentially Consistent DSM Protocol - J. Zhou,
M. Mizuno, and G. Singh
  • The DSM system consists of a shared memory module
    (SMem manager) and a local manager (processor
    manager) at each machine.
  • Each processor manager
  • - handles requests from the user processes to
    read or write objects
  • - communicates with the SMem manager.
  • The SMem manager
  • - processes request messages from processor
    managers to read or write objects.

14
Protocol Description
  • SMem manages the following data structures:
  • - Object memory M[Object Range]
  • - Two-dimensional binary array
    Hold_Last_Write[Processor Range, Object Range]
  • At any time T,
  • - Hold_Last_Write[i][x] = 1: object x in the
    cache at processor i holds a value written by the
    last write with respect to T,
  • - Hold_Last_Write[i][x] = 0: object x in the
    cache at processor i does not hold a value
    written by the last write with respect to T.
  • Each element of Hold_Last_Write is initialized
    to 0.
  • Assume there are n processors and m objects.

15
  • Each processor i maintains the following data
    structures:
  • One-dimensional binary array
    Valid_i[Object Range]
  • -- Valid_i[x] = 1: object x in the cache is valid
  • -- Valid_i[x] = 0: object x in the cache is not
    valid
  • Each element of Valid_i is initialized to 0.
  • For each object x such that Valid_i[x] = 1,
    C_i[x] holds the value of x (C_i is the cache
    memory at processor i).

16
  • Operations at processor i:
  • Write(x, v):
  •   send <write, x, v> to SMem
  •   receive Invalid_array[1..m] message from SMem
  •   Valid_i[1..m] := Invalid_array[1..m]
      // element-wise assignment
  •   C_i[x] := v
  • Read(x):
  •   if Valid_i[x] = 0 then
  •     send <read, x> message to SMem
  •     receive <v, Invalid_array[1..m]> from SMem
  •     Valid_i[1..m] := Invalid_array[1..m]
  •     C_i[x] := v
  •   endif
  •   return C_i[x]

17
  • Operations at SMem:
  • Process <write, x, v> message from processor i:
  •   M[x] := v
  •   Hold_Last_Write[1..n][x] := 0
  •   Hold_Last_Write[i][x] := 1
  •   send Hold_Last_Write[i][1..m] to processor i
      /* send processor i's row of Hold_Last_Write to
         i; processor i receives the row in
         Invalid_array */
  • Process <read, x> message from processor i:
  •   Hold_Last_Write[i][x] := 1
  •   send <M[x], Hold_Last_Write[i][1..m]> to
      processor i
  • Each procedure is executed atomically.
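A compact Java sketch of both sides of the protocol. Direct method calls stand in for the <write,x,v> / <read,x> messages, and 'synchronized' stands in for executing each SMem procedure atomically; the class and field names are ours.

    // Shared memory module (SMem manager).
    class SMemManager {
        private final int[] m;                    // object memory M[1..numObjects]
        private final boolean[][] holdLastWrite;  // Hold_Last_Write[proc][obj]

        SMemManager(int numProcs, int numObjects) {
            m = new int[numObjects];
            holdLastWrite = new boolean[numProcs][numObjects];
        }

        static final class ReadReply {
            final int value; final boolean[] valid;
            ReadReply(int value, boolean[] valid) { this.value = value; this.valid = valid; }
        }

        // Process <write, x, v> from processor i: record the new last write
        // and return processor i's row of Hold_Last_Write.
        synchronized boolean[] write(int i, int x, int v) {
            m[x] = v;
            for (boolean[] row : holdLastWrite) row[x] = false;
            holdLastWrite[i][x] = true;
            return holdLastWrite[i].clone();
        }

        // Process <read, x> from processor i: return (M[x], row i).
        synchronized ReadReply read(int i, int x) {
            holdLastWrite[i][x] = true;
            return new ReadReply(m[x], holdLastWrite[i].clone());
        }
    }

    // Processor manager at machine i.
    class ProcessorManager {
        private final int id;
        private final SMemManager smem;
        private final int[] cache;   // C_i
        private boolean[] valid;     // Valid_i

        ProcessorManager(int id, SMemManager smem, int numObjects) {
            this.id = id; this.smem = smem;
            cache = new int[numObjects];
            valid = new boolean[numObjects];
        }

        void write(int x, int v) {
            valid = smem.write(id, x, v);  // returned row replaces Valid_i
            cache[x] = v;
        }

        int read(int x) {
            if (!valid[x]) {
                SMemManager.ReadReply r = smem.read(id, x);
                valid = r.valid;
                cache[x] = r.value;
            }
            return cache[x];
        }
    }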

18
Advantages of the SC-DSM Protocol by J. Zhou,
M. Mizuno, and G. Singh
  • The number of messages exchanged for read and
    write operations is the same, and is considerably
    smaller than with Brown's algorithm.
  • - A write operation requires one round of
    message exchange between the processor and
    the shared memory.
  • - A read operation at a processor also requires
    one round of message exchange between the
    processor and the shared memory if the object
    is not found in its local cache.
  • The protocol does not require an atomic
    broadcast.
  • The protocol does not require any broadcast of
    messages.

19
Release Consistency
  • Sequential and PRAM consistency are restrictive
    for the case when a process is reading or writing
    variables inside a critical section (CS).
  • Drawback:
  • The memory has no way to differentiate between
    entering and leaving a CS.
  • So release consistency is introduced.

20
Release Consistency
  • Three classes of variables
  • Ordinary variables
  • Shared data variables
  • Synchronization variables: Acquire and Release
    (CS)
  • DSM has to guarantee the consistency of the
    shared data variables. If a shared variable is
    read without an acquire, the memory has no
    obligation to return the current value.

21
Protected Variables
  • Acquire and release do not have to apply to all
    of the memory.
  • Only specific shared variables may be guarded, in
    which case these variables are kept consistent
    and called protected variables.
  • On acquire, the memory makes sure that all the
    local copies of protected variables are made
    consistent and changes are propagated to other
    machines on release.

22
P1: Acq(L) W(x)1 W(x)2 Rel(L)
P2:                          Acq(L) R(x)2 Rel(L)
P3:                          R(x)1
Fig: Valid event sequence for release consistency.
23
Rules for release consistency
  • Before an ordinary access to a shared variable is
    performed, all previous acquires done by the
    process must have completed successfully.
  • Before a release is allowed to be performed, all
    previous reads and writes done by the process
    must have completed.
  • The acquire and release accesses must be
    processor consistent (sequential consistency is
    not required).

24
Implementation of Release Consistency
  • Two types of implementation
  • Eager release consistency
  • Broadcast of modified data to all other
    processors is done at the time of release.
  • Lazy release consistency
  • A process gets the most recent values of the
    variables when it tries to acquire them.

25
Our Implementation
  • Eager release consistency
  • All the operations are done locally by the
    process and then sent to the DSM, which then
    broadcasts the updated values to all the other
    processes.

26
Data Structures
Each process Pi maintains the following data
structures:
  Cache array cache[1..n]   // cache memory
  Array valid[1..n]         // whether the value in the
                               cache is valid or not (0/1)
  Array locked[1..n]        // whether the variable is
                               locked or not (0/1)
  Array request[1..m]       // which variables it wants
                               to lock
The Distributed Shared Memory (DSM) maintains the
following data structures:
  Memory array M[1..n]      // central memory
  Array lock[1..n]          // which variables are
                               locked (0/1)
  Array whom[1..n]          // locked by which processor
  Array pending[1..m]       // processes who are yet to
                               be replied to
  Array invalidate[1..m]    // values processes need to
                               invalidate
27
Operations at Processor Pi

lock(list of variables):
  send(Pid, ACQUIRE, no_of_variables, request[1..m])
  receive(ACK and received_values)
  for i = 1 to m
    locked[i] = 1

read(i):
  if locked[i]
    return cache[i]
  else if valid[i]
    return cache[i]
  else
    send(Pid, READ, i)
    receive(x)
    cache[i] = x
    valid[i] = 1
28
Operations at Processor Pi

write(i, x):
  if locked[i]
    cache[i] = x
    valid[i] = 1
  else
    send(Pid, WRITE, i, x)
    cache[i] = x
    valid[i] = 1

unlock(list of variables):
  send(Pid, RELEASE, locked[1..m], cache[1..m])
  receive(ACK)
  for i = 1 to n
    locked[i] = 0
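A hedged Java sketch of this processor-side library. The Dsm interface is our own stand-in for the ACQUIRE/READ/WRITE/RELEASE messages, and only the variables named in the lock call are treated as protected:

    import java.util.Arrays;

    class RcProcessor {
        interface Dsm {                                   // message channel stand-in
            int[] acquire(int pid, int[] requested);      // replies with current values
            int read(int pid, int index);
            void write(int pid, int index, int value);
            void release(int pid, boolean[] locked, int[] cache);
        }

        private final int pid;
        private final Dsm dsm;
        private final int[] cache;
        private final boolean[] valid;
        private final boolean[] locked;

        RcProcessor(int pid, Dsm dsm, int numVars) {
            this.pid = pid; this.dsm = dsm;
            cache = new int[numVars];
            valid = new boolean[numVars];
            locked = new boolean[numVars];
        }

        // lock: one ACQUIRE round, then mark the requested variables as locked.
        void lock(int[] vars) {
            int[] values = dsm.acquire(pid, vars);
            for (int k = 0; k < vars.length; k++) {
                cache[vars[k]] = values[k];
                valid[vars[k]] = true;
                locked[vars[k]] = true;
            }
        }

        int read(int i) {
            if (locked[i] || valid[i]) return cache[i];
            cache[i] = dsm.read(pid, i);
            valid[i] = true;
            return cache[i];
        }

        // Writes inside the CS stay local; outside the CS they go to the DSM.
        void write(int i, int x) {
            if (!locked[i]) dsm.write(pid, i, x);
            cache[i] = x;
            valid[i] = true;
        }

        // unlock: one RELEASE message carrying the locked flags and cached values.
        void unlock() {
            dsm.release(pid, locked.clone(), cache.clone());
            Arrays.fill(locked, false);
        }
    }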
29
Operations at DSM

receive():
  switch(message)
    case READ:
      send(M[i])
      break
    case WRITE:
      M[i] = x
      break
30
case ACQUIRE:
  /* for all the variable indices in request[1..m],
     check in lock whether they are free */
  for i = 0 to no_of_variables
    if (lock[request[i]] == 0)
      lock[request[i]] = 1
      whom[request[i]] = Pid
      requested_variable_values[i] = M[request[i]]
      continue
    else
      for i = 0 to no_of_variables
        lock[request[i]] = 0
        whom[request[i]] = 0
      /* add request[i] to pending */
      pending[Pid, i] = request[i]
      break
  send(requested_variable_values)
  break
31
case RELEASE:
  /* has received arrays locked and cache */
  for i = 0 to no_of_variables
    M[locked[i]] = cache[i]
    invalidate[i] = locked[i]
  broadcast(invalidate)
  receive(ACK)
  for i = 0 to no_of_variables
    lock[locked[i]] = 0
    whom[locked[i]] = 0
  send(Pid, ACK)
  check(pending)

check():
  for i = 0 to n
    /* if all pending[i] == 1, send(ACK, Pid) */
  break
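A corresponding DSM-side sketch (also ours). To keep it short, an ACQUIRE that cannot be granted blocks the caller instead of being parked in the pending array, and the broadcast of invalidated values is left abstract:

    class RcDsm {
        private final int[] memory;      // central memory M
        private final boolean[] lock;    // which variables are locked
        private final int[] whom;        // locked by which processor

        RcDsm(int numVars) {
            memory = new int[numVars];
            lock = new boolean[numVars];
            whom = new int[numVars];
        }

        // ACQUIRE: grant all requested variables, or wait until they are free.
        synchronized int[] acquire(int pid, int[] requested) throws InterruptedException {
            while (!allFree(requested)) wait();
            int[] values = new int[requested.length];
            for (int k = 0; k < requested.length; k++) {
                lock[requested[k]] = true;
                whom[requested[k]] = pid;
                values[k] = memory[requested[k]];
            }
            return values;
        }

        // RELEASE: write back the cached values, broadcast them eagerly, free the locks.
        synchronized void release(int pid, int[] released, int[] values) {
            for (int k = 0; k < released.length; k++) {
                memory[released[k]] = values[k];
                lock[released[k]] = false;
                whom[released[k]] = 0;
            }
            broadcastInvalidate(released, values);  // eager push to all processes
            notifyAll();                            // wake blocked acquirers
        }

        private boolean allFree(int[] vars) {
            for (int v : vars) if (lock[v]) return false;
            return true;
        }

        void broadcastInvalidate(int[] vars, int[] values) { /* send to all; omitted */ }
    }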
32
Sample Execution

Code for P1:
  Lock(a, b, c)
  Write(a)
  Read(b)
  Write(c)
  Write(c)
  Unlock(a, b, c)

Message exchange between P1, the DSM and P2:
  P1  -> DSM: (ACQUIRE, request)
  DSM -> P1 : (ACK, values)
  P1 enters the CS: Write(a), Read(b), Write(c),
  Write(b)
  P1  -> DSM: (RELEASE, locked, cache); P1 exits the CS
  DSM -> all: BROADCAST (invalidate)
  P2  -> DSM: ACK
  DSM -> P1 : RELEASE_ACK; P1 leaves the CS
33
Performance Issues
  • Knowing the Execution History
  • Broadcast overhead can be reduced
  • No potential deadlocks
  • Operations inside the critical section are atomic

34
PRAM Consistency
  • The total ordering of requests leads to
    inefficiency due to more data movement and
    synchronization requirements than what a program
    may really call for.
  • A more relaxed version than Sequential
    consistency is PRAM.

35
PRAM (contd.)
  • PRAM stands for Pipelined RAM, i.e., pipelined
    random access memory.
  • Writes done by a single process are received by
    all the processes in the order in which they were
    issued, but writes from different processes may
    be seen in a different order by different
    processes.

36
Example
  • P1: W(x)1
  • P2:        R(x)1  W(x)2
  • P3:               R(x)1  R(x)2
  • P4:               R(x)2  R(x)1
  • Fig: Valid sequence of events for PRAM
    consistency.

37
Weak Restrictions
  • Only write operations performed by a single
    process are required to be viewed by other
    processes in the order that they were performed.
  • In other terms, all writes generated by different
    processes are concurrent.
  • Only the write order from the same process needs
    to be consistent, hence the name "pipelined".
  • This is a weaker model than the causal model.

38
System Architecture
Fig: Each process keeps a local Cache and
communicates through the middleware (JavaGroups)
with the DSM System, which holds the Central Memory.
39
Implementation
  • The operations by the processes are carried out
    as shown below:
  • Write(x):
  •   Update the local cache value.
  •   Send the updated value to all the processes.
  • Read(x):
  •   If present in the cache, read it from the
      cache.
  •   Else go to main memory for the variable.

40
continued
  • Whenever a write is carried out, the value is
    pushed to all the processes; thus writes done by
    a process are always seen in the order in which
    they appear in its program, as each write is
    broadcast right after it occurs.

41
Data Structures
  • Central Memory (CM)
  • - An array CM of shared variables var1..varn.
  • - We can do read and write operations on this
    array.
  • - Array implemented using a Vector.
  • Local Cache
  • - An array C of type int, of size equal to that
    of the Central Memory.
  • - A boolean one-dimensional array V for the
    validity of the ith variable.
  • - We can do read and write operations on the
    cache.
  • - Arrays implemented using a Vector.

42
Pseudo Code
  • At Process n:
  • Read(in):
  •   if (valid(in))
  •     fetch element in from the cache vector Vc
  •   else
  •     send read(in, n) to CM
  •     receive value(in, n) from CM
  •     update element in in the cache
  •     set valid(in) = true
  •   return value(in)

43
Continued
  • Write(in, valn):
  •   write value valn into element in of the cache
      vector
  •   send write(in, valn) to CM
  • Receive(in, valn):
  •   write value valn into element in of the cache
      vector

44
  • At Central Memory:
  • Write(index in, value vn):
  •   write value vn into element in of the vector
  •   send (in, vn) to all the n processes
  • Read(process n, index in):
  •   fetch element in from the vector
  •   send value(in) to process n
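A rough Java sketch of the PRAM cache and central memory (ours; direct calls replace the messages, and the FIFO channel of the final system is what would preserve each sender's write order):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Vector;

    // Central memory: applies a write, then pushes it to every process.
    class CentralMemory {
        private final Vector<Integer> mem = new Vector<>();   // slides use a Vector
        private final List<PramCache> processes = new ArrayList<>();

        CentralMemory(int size) { for (int i = 0; i < size; i++) mem.add(0); }

        void register(PramCache p) { processes.add(p); }

        synchronized void write(int index, int value) {
            mem.set(index, value);
            for (PramCache p : processes) p.push(index, value);  // send to all n processes
        }

        synchronized int read(int index) { return mem.get(index); }
    }

    // Local cache at a process.
    class PramCache {
        private final CentralMemory cm;
        private final Vector<Integer> cache = new Vector<>();
        private final boolean[] valid;

        PramCache(CentralMemory cm, int size) {
            this.cm = cm;
            for (int i = 0; i < size; i++) cache.add(0);
            valid = new boolean[size];
            cm.register(this);
        }

        int read(int i) {
            if (!valid[i]) { cache.set(i, cm.read(i)); valid[i] = true; }  // cache miss
            return cache.get(i);
        }

        void write(int i, int val) {
            cache.set(i, val);
            valid[i] = true;
            cm.write(i, val);          // CM re-broadcasts the value to everyone
        }

        // Receive(in, valn): update pushed by the central memory.
        void push(int i, int val) { cache.set(i, val); valid[i] = true; }
    }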

45
Issues
  • Easy to implement
  • - No guarantee about the order in which different
    processes see writes.
  • - Except that writes issued by a particular
    process must arrive in pipelined order.
  • A processor does not have to stall waiting for
    each write to complete before starting the next
    one.

46
Final System
  • We are using JavaGroups as the middleware.
  • We have a single group containing all the
    processes and the central DSM.
  • We are using the reliable, FIFO JChannel for the
    communication between the processes and the DSM.
  • We need only two types of communication, unicast
    and broadcast, both of which are efficiently
    provided by JChannel.
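A minimal sketch of joining the group and sending messages, assuming the JGroups 3.x API (the current form of JavaGroups); class and method names in the JavaGroups version used for the project may differ:

    import org.jgroups.Address;
    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    public class DsmPeer extends ReceiverAdapter {
        private JChannel channel;

        // Join the single group shared by all processes and the central DSM.
        public void start(String groupName) throws Exception {
            channel = new JChannel();      // default stack: reliable, FIFO
            channel.setReceiver(this);
            channel.connect(groupName);
        }

        @Override
        public void receive(Message msg) {
            System.out.println("from " + msg.getSrc() + ": " + msg.getObject());
        }

        // Broadcast to the whole group (null destination = all members).
        public void broadcast(java.io.Serializable payload) throws Exception {
            channel.send(new Message(null, payload));
        }

        // Unicast, e.g. to the central DSM's address.
        public void unicast(Address dsm, java.io.Serializable payload) throws Exception {
            channel.send(new Message(dsm, payload));
        }
    }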

47
  • DSM Initialization
  • DSM will be given an argument saying which
    consistency level it should provide for the
    processes.
  • Process Initialization
  • When a process starts execution, it
  • - sends a message to DSM inquiring about the
    consistency level provided by the DSM.
  • - waits for the response
  • - Initializes the variables related to the
    consistency level so as to use the
    corresponding library for communicating with
    the DSM.

48
  • In order to connect to the system each process
    should know
  • Group Address/Group Name
  • Central DSM Address
  • Scalable
  • Easy to connect, with just one round of messages
  • Less load on the network.

49
Performance Evaluation
  • We plan to test the performance of each
    consistency level with a large number of
    processes accessing the shared memory
  • We will calculate the write cycle time and read
    cycle time for each consistency level at the
    application level
  • We will compare our implementations of the
    consistency levels using the above criteria
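A sketch of how the write and read cycle times could be measured at the application level (the DsmClient interface here is hypothetical, standing for whichever consistency library a process links against):

    // Averages the latency of ITER writes and ITER reads against the DSM.
    public class CycleTimer {
        interface DsmClient {                 // hypothetical client API
            void write(int index, int value);
            int read(int index);
        }

        static void measure(DsmClient dsm) {
            final int ITER = 1000;

            long t0 = System.nanoTime();
            for (int i = 0; i < ITER; i++) dsm.write(0, i);
            double writeCycleUs = (System.nanoTime() - t0) / 1000.0 / ITER;

            long t1 = System.nanoTime();
            for (int i = 0; i < ITER; i++) dsm.read(0);
            double readCycleUs = (System.nanoTime() - t1) / 1000.0 / ITER;

            System.out.printf("write cycle: %.1f us, read cycle: %.1f us%n",
                              writeCycleUs, readCycleUs);
        }
    }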

50
References
  • Brown, G. Asynchronous multicaches. Distributed
    Computing, 4:31-36, 1990.
  • Mizuno, M., Raynal, M., and Zhou, J.Z. Sequential
    consistency in distributed systems.
  • Zhou, J., Mizuno, M., and Singh, G. A Sequentially
    Consistent Distributed Shared Memory.
  • Tanenbaum, Andrew S. Distributed Operating
    Systems.
  • www.javagroups.com
  • www.cis.ksu.edu/singh

51
  • Suggestions