Title: Distributed Shared Memory: A Survey of Issues and Algorithms
1 Distributed Shared Memory: A Survey of Issues and
Algorithms
- B. Nitzberg and V. Lo, University of Oregon
2 INTRODUCTION
- Distributed shared memory is a software
abstraction allowing a set of workstations
connected by a LAN to share a single paged
virtual address space
3 Why bother with DSM?
- Key idea is to build fast parallel computers that
are
- Cheaper than shared memory multiprocessor
architectures
- As convenient to use
4 Conventional parallel architecture
[Figure: four CPUs connected to a single shared memory]
5 Today's architecture
- Clusters of workstations are much more cost
effective
- No need to develop complex bus and cache
structures
- Can use off-the-shelf networking hardware
- Gigabit Ethernet
- Myrinet (1.5 Gb/s)
- Can quickly integrate newest microprocessors
6 Limitations of cluster approach
- Communication within a cluster of workstations is
through message passing
- Much harder to program than concurrent access to
a shared memory
- Many big programs were written for shared memory
architectures
- Converting them to a message passing architecture
is a nightmare
7 Distributed shared memory
[Figure: the main memories of the workstations, presented by DSM as one shared global address space]
8 Distributed shared memory
- DSM makes a cluster of workstations look like a
shared memory parallel computer
- Easier to write new programs
- Easier to port existing programs
- Key problem is that DSM only provides the
illusion of having a shared memory architecture
- Data must still move back and forth among the
workstations
9 Basic approaches
- Hardware implementations
- Use extensions of traditional hardware caching
architecture
- Operating system/library implementations
- Use virtual memory mechanisms
- Compiler implementations
- Compiler handles all shared accesses
10 Design Issues (I)
- 1. Structure and granularity
- Big units are more efficient
- Virtual memory pages
- Can have false sharing whenever page contains
different variables that are accessed at the same
time by different processors
11 False Sharing
[Figure: one processor accesses x while another accesses y; x and y lie on the same page]
The page containing x and y will move back and
forth between the main memories of the workstations.
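The ping-pong effect can be illustrated with a minimal Python sketch. It assumes a 4096-byte page and a model in which a page lives on exactly one node at a time; the names `page_of` and `count_page_transfers` and the byte addresses are hypothetical, chosen only for this illustration. The sketch counts how often a page must change hands when two nodes alternately write x and y.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def page_of(address):
    """Return the number of the page that contains this byte address."""
    return address // PAGE_SIZE


def count_page_transfers(accesses, addresses):
    """Count page moves for a sequence of (node, variable) write accesses."""
    holder = {}      # page number -> node currently holding the page
    transfers = 0
    for node, variable in accesses:
        page = page_of(addresses[variable])
        if page in holder and holder[page] != node:
            transfers += 1          # page must move to the writing node
        holder[page] = node
    return transfers


# Node A only writes x, node B only writes y: no data is truly shared.
accesses = [("A", "x"), ("B", "y")] * 4

same_page = {"x": 0, "y": 8}                  # x and y on page 0
separate_pages = {"x": 0, "y": PAGE_SIZE}     # y moved to page 1

print(count_page_transfers(accesses, same_page))       # 7
print(count_page_transfers(accesses, separate_pages))  # 0
```

Placing the two variables on different pages removes every transfer in this model, which is why the choice of sharing unit matters.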
12 Design Issues (II)
- Structure and granularity (cont'd)
- Shared objects can also be
- Objects from a distributed object-oriented
system
- Data types from an extant language
13 Design Issues (III)
- 2. Coherence semantics
- Strict consistency is not possible
- Various authors have proposed weaker consistency
models
- Cheaper to implement
- Harder to use in a correct fashion
14 Design Issues (IV)
- 3. Scalability
- Possibly very high but limited by
- Central bottlenecks
- Global knowledge operation and storage
15 Design Issues (V)
- 4. Heterogeneity
- Possible but complex to implement
16 Portability Issues
Not in paper
- Portability of programs
- Some DSMs allow programs written for a
multiprocessor architecture to run on a cluster
of workstations without any modifications (dusty
decks)
- More efficient DSMs require more changes
- Portability of DSM
- Some DSMs require specific OS features
17 Implementation Issues (I)
- 1. Data Location and Access
- Keep data in a single centralized location
- Let data migrate (better) but must have a way to
locate them
- Centralized server (bottleneck)
- Have a "home" node associated with each piece
of data
- Will keep track of its location
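The "home" node idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the scheme of any particular system: pages are assigned to homes by modulo hashing, and the class name `HomeDirectory` and its methods are hypothetical.

```python
class HomeDirectory:
    """Each page has a fixed home node that remembers where the page lives."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # One small location table per home node: page number -> current holder.
        self.tables = {node: {} for node in self.nodes}

    def home_of(self, page):
        # Assumption: pages are spread over the homes by simple modulo hashing.
        return self.nodes[page % len(self.nodes)]

    def locate(self, page):
        home = self.home_of(page)
        # A page that has never migrated still sits at its home node.
        return self.tables[home].get(page, home)

    def migrate(self, page, new_holder):
        # Only the home node's table changes; no central server is involved.
        self.tables[self.home_of(page)][page] = new_holder


directory = HomeDirectory(["n0", "n1", "n2"])
print(directory.home_of(4))   # n1
print(directory.locate(4))    # n1
directory.migrate(4, "n2")
print(directory.locate(4))    # n2
```

Because every node can compute `home_of` locally, a lookup costs at most one request to the home node, and the location tables are spread over the cluster instead of sitting on one server.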
18 Implementation Issues (II)
- Data Location and Access (cont'd)
- Can either
- Maintain a single copy of each piece of data
- Replicate it on demand
- Must either
- Propagate updates to all replicas
- Use an invalidation protocol
19 Invalidation protocol
[Figure: the replicas of a page before the update and at update time, when the other copies are marked INVALID]
20 Main advantage
- Locality of updates
- A page that is being modified has a high
likelihood of being modified again
- Invalidation mechanism minimizes consistency
overhead
- One single invalidation replaces many updates
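A rough message count makes the last point concrete. The Python sketch below assumes one writer performing several consecutive writes to a page that is replicated on every node, and it counts only consistency messages sent by the writer; the cost of readers fetching the page again later is deliberately ignored. Both function names are hypothetical.

```python
def update_protocol_messages(num_replicas, num_writes):
    """Every write is propagated to every other replica."""
    return num_writes * (num_replicas - 1)


def invalidation_protocol_messages(num_replicas, num_writes):
    """Only copies that are still valid need an invalidation message."""
    valid_remote_copies = num_replicas - 1
    messages = 0
    for _ in range(num_writes):
        messages += valid_remote_copies
        valid_remote_copies = 0     # after the first write, none remain
    return messages


print(update_protocol_messages(4, 10))        # 30
print(invalidation_protocol_messages(4, 10))  # 3
```

The gap grows with the number of consecutive writes, which is exactly the locality-of-updates argument.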
21 A realization: Munin
- Developed at Rice University
- Based on software objects (variables)
- Used the processor's virtual memory to detect
accesses to the shared objects
- Included several techniques for reducing
consistency-related communication
- Only ran on top of the V kernel
22 Munin main strengths
- Excellent performance
- Portability of programs
- Allowed programs written for a multiprocessor
architecture to run on a cluster of workstations
with a minimum number of changes (dusty decks)
23 Munin main weakness
- Very poor portability of Munin itself
- Depended on some features of the V kernel
- Not maintained since the late 80's
24 Consistency model
- Munin uses software release consistency
- Only requires the memory to be consistent at
specific synchronization points
25 SW release consistency (I)
- Well-written parallel programs use locks to
achieve mutual exclusion when they access shared
variables
- P(mutex) and V(mutex)
- lock(csect) and unlock(csect)
- acquire( ) and release( )
- Unprotected accesses can produce unpredictable
results
26 SW release consistency (II)
- SW release consistency will only guarantee
correctness of operations performed within an
acquire/release pair
- No need to export the new values of shared
variables until the release
- Must guarantee that workstation has received the
most recent values of all shared variables when
it completes an acquire
27 SW release consistency (III)
One process:
- shared int x;
- acquire( ); // wait for new value of x
- x++;
- release( );
- // export x = 2
Another process:
- shared int x;
- acquire( );
- x = 1;
- release( );
- // export x = 1
28 SW release consistency (IV)
- Must still decide how to release updated values
- Munin uses eager release
- New values of shared variables were propagated at
release time
29 SW release consistency (V)
Eager release
Each release forwards the update to the two
other processors.
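Eager release can be sketched in Python for a three-processor cluster. This is a minimal, single-threaded illustration, not Munin's implementation: the `Node` class and `make_cluster` helper are hypothetical, and the lock that acquire( ) would wait on is omitted because the steps run one after another. Writes stay local until release( ), which pushes them to every other processor.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}    # local copy of the shared variables
        self.pending = {}   # writes made since the last release
        self.peers = []     # all other nodes in the cluster

    def write(self, variable, value):
        # Writes are applied locally and buffered; nothing is sent yet.
        self.memory[variable] = value
        self.pending[variable] = value

    def release(self):
        """Eager release: forward buffered writes to every peer right now."""
        for peer in self.peers:
            peer.memory.update(self.pending)
        sent = len(self.peers)      # one update message per other node
        self.pending = {}
        return sent


def make_cluster(names):
    nodes = [Node(name) for name in names]
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
    return nodes


p1, p2, p3 = make_cluster(["p1", "p2", "p3"])
p1.write("x", 1)
print("x" in p2.memory)                # False: not exported before release
print(p1.release())                    # 2: the two other processors
p2.write("x", p2.memory["x"] + 1)      # p2 increments the value it received
p2.release()
print(p1.memory["x"], p3.memory["x"])  # 2 2
```

In this model p3 receives both updates although it never touches x, which is the cost of releasing eagerly.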
30 Multiple write protocol
- Designed to fight false sharing
- Uses a copy-on-write mechanism
- Whenever a process is granted access to
write-shared data, the page containing these data
is marked copy-on-write
- First attempt to modify the contents of the page
will result in the creation of a copy of the
page being modified (the twin).
31 Creating a twin
Not in paper
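32 Example
Not in paper
- Before the first write access: the page holds x = 1, y = 2
- First write access: a twin of the page is created, holding x = 1, y = 2
- After the writes: the page holds x = 3, y = 2
- Compare with twin: the new value of x is 3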
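The twin comparison can be sketched in Python. For illustration only, a page is modeled as a dictionary from variable names to values (a real implementation would compare memory words), and the helper names are hypothetical. The second writer, which changes y to 5, is an added assumption that shows why diffs help against false sharing.

```python
def make_twin(page):
    """Copy the page at the first write access."""
    return dict(page)


def diff_against_twin(page, twin):
    """Return only the entries whose value differs from the twin."""
    return {name: value for name, value in page.items()
            if twin.get(name) != value}


# Writer A: the example above.
page_a = {"x": 1, "y": 2}
twin_a = make_twin(page_a)
page_a["x"] = 3
diff_a = diff_against_twin(page_a, twin_a)
print(diff_a)   # {'x': 3}

# Writer B (hypothetical): modifies y in its own copy of the same page.
page_b = {"x": 1, "y": 2}
twin_b = make_twin(page_b)
page_b["y"] = 5
diff_b = diff_against_twin(page_b, twin_b)

# Both diffs merge cleanly because they touch different variables.
merged = {"x": 1, "y": 2}
merged.update(diff_a)
merged.update(diff_b)
print(merged)   # {'x': 3, 'y': 5}
```

Since each writer exports only what it changed, two processes can write to the same page concurrently without the page bouncing between them.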
33 Other DSM Implementations (I)
- Software release consistency with lazy release
(TreadMarks)
- Faster and designed to be portable
- Sequentially-Consistent Software DSM (IVY)
- Sends messages to other copies at each write
- Much slower
34 Other DSM Implementations (II)
- Entry consistency (Midway)
- Requires each variable to be associated with a
synchronization object (typically a lock)
- Acquire/release operations on a given
synchronization object only involve the variables
associated with that object
- Requires less data traffic
- Does not handle dusty decks well
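The binding between variables and synchronization objects can be sketched in Python. This is a simplified model and not Midway's API: the class `EntryConsistentStore`, its methods, and the lock and variable names are all hypothetical. An acquire brings up to date only the variables bound to the lock being acquired.

```python
class EntryConsistentStore:
    def __init__(self):
        self.guarded = {}   # lock name -> set of variables bound to that lock
        self.latest = {}    # most recently released value of each variable

    def bind(self, lock, *variables):
        self.guarded.setdefault(lock, set()).update(variables)

    def acquire(self, lock, local):
        """Refresh in `local` only the variables bound to `lock`."""
        transferred = 0
        for variable in self.guarded.get(lock, ()):
            if variable in self.latest and local.get(variable) != self.latest[variable]:
                local[variable] = self.latest[variable]
                transferred += 1
        return transferred

    def release(self, lock, local):
        """Publish only the variables bound to `lock`."""
        for variable in self.guarded.get(lock, ()):
            if variable in local:
                self.latest[variable] = local[variable]


store = EntryConsistentStore()
store.bind("queue_lock", "head", "tail")
store.bind("stats_lock", "hits")

writer = {"head": 1, "tail": 7, "hits": 42}
store.release("queue_lock", writer)         # hits is not published

reader = {}
print(store.acquire("queue_lock", reader))  # 2
print(sorted(reader.items()))               # [('head', 1), ('tail', 7)]
print(store.acquire("stats_lock", reader))  # 0
```

The explicit `bind` calls are also the reason dusty decks fit poorly: an existing program has to be annotated with these associations before it can run.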
35 Other DSM Implementations (III)
- Structured DSM Systems (Linda)
- Offer to the programmer a shared tuple space
accessed using specific synchronized methods
- Require a very different programming style
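To give a feel for the programming style, here is a minimal single-process Python sketch of a tuple space. It is not a Linda implementation: Linda's in and rd operations block until a matching tuple exists, whereas this sketch returns None, and `take` stands in for in because `in` is a reserved word in Python. Using None as the wildcard in patterns is also an assumption of the sketch.

```python
class TupleSpace:
    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Deposit a tuple into the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate):
        # None in the pattern matches any value in that position.
        return len(pattern) == len(candidate) and all(
            want is None or want == have
            for want, have in zip(pattern, candidate)
        )

    def rd(self, *pattern):
        """Return a matching tuple without removing it, or None."""
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def take(self, *pattern):
        """Remove and return a matching tuple, or None."""
        found = self.rd(*pattern)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.out("task", 1, "render")
space.out("task", 2, "encode")
print(space.rd("task", 2, None))       # ('task', 2, 'encode')
print(space.take("task", None, None))  # ('task', 1, 'render')
print(space.rd("task", 1, None))       # None
```

Data is addressed by content rather than by memory location, which is why programs must be restructured around tuples instead of shared variables.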
36 TODAY'S IMPACT
- Very low
- According to W. Zwaenepoel, the truth is that computer
clusters are "only suitable for coarse-grained
parallel computation" and this is "a fortiori
true for DSM"
- DSM competed with the OpenMP model, and the OpenMP model
won