1
CSS434 Distributed Shared Memory Textbook Ch18
Professor Munehiro Fukuda
2
Basic Concept
(Figure: processes on different computers issue write(address, data) and
read(address) operations on a Distributed Shared Memory that exists only
virtually; the computers are connected by a communication network.)
A cache line or a page is transferred to and
cached in the requested computer.
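As an illustration of the behavior described above, here is a minimal,
self-contained C sketch of what a read or write does: the block (page or
cache line) containing the address is fetched over the network and cached
locally on the first access. All names here (BLOCK_SIZE, cache, fetch_block,
dsm_read, dsm_write) are assumptions for illustration, not part of any
particular DSM system.

#include <string.h>

#define BLOCK_SIZE 4096
#define NUM_BLOCKS 16

static char cache[NUM_BLOCKS][BLOCK_SIZE];   /* blocks cached at this node     */
static int  present[NUM_BLOCKS];             /* 1 if the block is cached here  */

static void fetch_block(int block) {         /* stand-in for the network fetch */
    memset(cache[block], 0, BLOCK_SIZE);
    present[block] = 1;
}

int dsm_read(unsigned long address) {
    int block = (int)((address / BLOCK_SIZE) % NUM_BLOCKS);
    if (!present[block]) fetch_block(block);  /* block fault: fetch and cache  */
    return cache[block][address % BLOCK_SIZE];
}

void dsm_write(unsigned long address, int data) {
    int block = (int)((address / BLOCK_SIZE) % NUM_BLOCKS);
    if (!present[block]) fetch_block(block);  /* block fault: fetch and cache  */
    cache[block][address % BLOCK_SIZE] = (char)data;
}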
3
Writer Process on DSM
include "world.h" struct shared int a,b
Program Writer main() int x struct
shared p methersetup() / Initialize the
Mether run-time / p (struct shared
)METHERBASE / overlay structure on
METHER segment / p-gta p-gtb 0 /
initialize fields to zero / while(TRUE) /
continuously update structure fields / p gta
p gta 1 p gtb p gtb - 1
4
Reader Process on DSM
/* Program Reader */
main() {
    struct shared *p;
    methersetup();
    p = (struct shared *)METHERBASE;
    while (TRUE) {                     /* read the fields once every second */
        printf("a = %d, b = %d\n", p->a, p->b);
        sleep(1);
    }
}
5
Why DSM?
  • Simpler abstraction
  • Underlying tedious communication primitives are
    all shielded by memory accesses
  • Better portability of distributed application
    programs
  • Natural transition from sequential to distributed
    application
  • Better performance of some applications
  • Data locality, on-demand data movement, and
    large memory space reduce network traffic and
    paging/swapping activities.
  • Flexible communication environment
  • The sender and receiver need not know each
    other; they need not even coexist.
  • Ease of process migration
  • Migration is completed simply by transferring the
    corresponding PCB to the destination.

6
Main Issues
  • Granularity
  • Fine (less false sharing but more network
    traffic) to coarse (more false sharing but less
    network traffic): cache line (e.g., Dash and
    Alewife), object (e.g., Orca and Linda), or
    page (e.g., Ivy)?
  • Memory coherence and access synchronization
  • Strict, Sequential, Causal, Weak, and Release
    Consistency models
  • Data location and access
  • Broadcasting, centralized data locator, fixed
    distributed data locator, and dynamic distributed
    data locator
  • Replacement strategy
  • LRU or FIFO (The same issue as OS virtual memory)
  • Thrashing
  • How to prevent a block from being exchanged back
    and forth between two nodes.
  • Heterogeneity

7
Consistency Models: Two Processes Accessing Shared Variables
At the beginning, a = b = 0.
DSM needs a consistency model.
8
Consistency Models: Strict Consistency
  • Wi(x, a): Processor i writes a to variable x
    (i.e., x = a).
  • b←Ri(x): Processor i reads b from variable x
    (i.e., after y = x, y equals b).
  • Any read on x must return the value of the most
    recent write on x.

Strict Consistency (diagram): P2 performs W2(x, a); afterwards P1 reads
a←R1(x) and P3 reads a←R3(x), so every read returns the most recent write.
Not Strict Consistency (diagram): P2 performs W2(x, a), but P1's next read
still returns nil←R1(x) and only a later read returns a←R1(x); P3 reads
a←R3(x).
9
Consistency Models: Linearizability and Sequential Consistency
  • Linearizability: operations of each individual
    process appear to all processes in the same order
    as they actually happen (real-time order).
  • Sequential Consistency: operations of each
    individual process appear in the same order to
    all processes, but that common order need not
    match real time.

Linearizability (diagram): P2 performs W2(x, a) and, later in real time, P3
performs W3(x, b); P1 and P4 read the values in that real-time order
(a←R1(x) then b←R1(x), and a←R4(x) then b←R4(x)).
Sequential Consistency (diagram): with the same two writes W2(x, a) and
W3(x, b), all processes observe the writes in one agreed order (a read may
also return nil before any write is visible), but that order need not match
the real-time order in which the writes were issued.
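To make the difference concrete, the following flag-based sketch (not from
the slides; the variable names are illustrative) shows two processes sharing
the DSM variables x and y. Under sequential consistency at most one process
can enter the critical section, because every process sees the same
interleaving of the four operations; under the weaker models that follow,
both processes may read 0 and enter at the same time.

/* Shared DSM variables, both initially 0. */
int x = 0, y = 0;

/* Process 1 */
void process1(void) {
    x = 1;
    if (y == 0) {
        /* critical section: P1 believes P2 has not started */
    }
}

/* Process 2 */
void process2(void) {
    y = 1;
    if (x == 0) {
        /* critical section: P2 believes P1 has not started */
    }
}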
10
Consistency Models: FIFO and Processor Consistency
  • FIFO Consistency: writes by a single process are
    visible to all other processes in the order in
    which they were issued.
  • Processor Consistency: FIFO Consistency, plus all
    writes to the same memory location must be visible
    to all processes in the same order.

FIFO Consistency (diagram): P2 issues a sequence of writes (W2(x, a),
W2(x, b), W2(y, a)) and P3 issues its own sequence (writes to x, y, and z);
P1 observes each process's writes in the order they were issued, but may
interleave the two processes' write sequences differently.
Processor Consistency (diagram): in addition, every process observes the
writes to the same location (e.g., the writes to x) in a single agreed order.
11
Consistency Models: Causal Consistency
  • Causally related writes must be visible to all
    processes in the same order. Concurrent writes
    may be propagated in a different order.

Causal Consistency (diagram): P2 writes W2(x, a); another process reads that
value (a←R3(x)) and then writes W3(x, b), so W3(x, b) is causally related to
W2(x, a) and must be seen after it by everyone, while merely concurrent
writes (e.g., W2(x, c)) may be seen by P1 and P4 in different orders.
Not Causal Consistency (diagram): W3(x, b) is again causally dependent on
W2(x, a), yet the readers observe the two values in different orders, which
the model forbids.
12
Consistency Models: Weak Consistency
  • Accesses to synchronization variables must obey
    sequential consistency.
  • All previous writes must be completed before an
    access to a synchronization variable.
  • All previous accesses to synchronization
    variables must be completed before an access to a
    non-synchronization variable.

Weak Consistency (diagram): P2 writes W2(x, a), W2(x, b), W2(y, c) and then
accesses a synchronization variable (S2); a process that synchronizes
afterwards (S3) reads the final values (b from x, c from y).
Not Weak Consistency (diagram): a reader obtains a stale value (a from x, or
nil from y) despite the synchronization accesses, which the model forbids.
13
Consistency Models: Release Consistency
  • Accesses to acquire and release variables obey
    processor consistency.
  • Previous acquires requested by a process must be
    completed before the process performs a data
    access.
  • All previous data accesses performed by a process
    must be completed before the process performs a
    release.

(Diagram) P1: Acq1(L), W1(x, a), W1(x, b), Rel1(L).  P2: Acq2(L), b←R2(x),
Rel2(L); because P2 acquires the lock only after P1's release, it must see
the final value b.  P3 performs no acquire and may therefore still read the
old value, a←R3(x).
14
Consistency Models: Release Consistency (Example)
Process 1:
    acquireLock();    // enter critical section
    a = a + 1;
    b = b + 1;
    releaseLock();    // leave critical section

Process 2:
    acquireLock();    // enter critical section
    print("The values of a and b are ", a, b);
    releaseLock();    // leave critical section
15
Implementing Sequential Consistency: Replicated and Migrating Data Blocks
(Figure: data blocks such as x, m, and b are replicated and migrate among
nodes, e.g., Node 1 and Node 3 each cache a copy of x.)
Then what if Node 2 updates x?
16
Implementing Sequential Consistency: Write Invalidation
(Figure: a client that wants to write first has every other node's copy of
the block invalidated, then obtains its own new copy of the block and
performs the write.)
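A rough C sketch of the write-fault handling implied by the figure: every
other copy is invalidated before the writer proceeds. The table layout and
the send_invalidate() message primitive are illustrative assumptions.

#define MAX_NODES 8

struct block_entry {
    int owner;                   /* node that currently owns the block   */
    int copyset[MAX_NODES];      /* 1 if that node holds a copy          */
};

static void send_invalidate(int node) { (void)node; /* network send elided */ }

/* Called on the node that wants to write (node id 'me'). */
void write_fault(struct block_entry *e, int me) {
    for (int n = 0; n < MAX_NODES; n++) {
        if (n != me && e->copyset[n]) {
            send_invalidate(n);          /* invalidate every other copy  */
            e->copyset[n] = 0;
        }
    }
    e->owner = me;                       /* take ownership of the block  */
    e->copyset[me] = 1;                  /* keep the (now writable) copy */
}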
17
Implementing Sequential Consistency: Write Update
(Figure: a client that wants to write sends the update to every node holding
a copy of the block, so all copies remain valid and up to date.)
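For contrast, a write-update handler pushes the new value to every holder of
a copy instead of invalidating it, so all copies stay current; again, the
names and the send_update() primitive are illustrative assumptions.

#define MAX_NODES 8

static void send_update(int node, unsigned long addr, int data) {
    (void)node; (void)addr; (void)data;  /* network send elided */
}

/* copyset[n] is 1 if node n holds a copy of the block containing addr. */
void write_update(const int copyset[MAX_NODES], int me,
                  unsigned long addr, int data) {
    for (int n = 0; n < MAX_NODES; n++)
        if (n != me && copyset[n])
            send_update(n, addr, data);  /* all remote copies are refreshed */
    /* ...then apply the write to the local copy as well. */
}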
18
Implementing Sequential Consistency: Read/Write Request
(State-transition diagram for the state of a block at a node.)
States: Unused, Nil, Read-only, Read-owned, Writable.
Transitions labeled in the figure:
  • Read (read a copy from the owner) → Read-only
  • Read (read from memory and get an ownership) → Read-owned
  • Write (invalidate others if they have a copy and get an ownership) → Writable
  • Write (invalidate others if they have a copy) → Writable
  • Write invalidate (caused by another node's write) → Nil
  • Replacement → Unused
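The per-block state machine above can be sketched as follows; the transition
functions are only an illustrative reading of the labels in the figure, not a
definitive protocol.

enum block_state { UNUSED, NIL, READ_ONLY, READ_OWNED, WRITABLE };

/* Local read fault: either read a copy from the owner, or read from memory
 * and take ownership, as the figure's two "Read" transitions indicate. */
enum block_state on_local_read(enum block_state s, int got_ownership) {
    if (s == READ_ONLY || s == READ_OWNED || s == WRITABLE)
        return s;                                /* already readable       */
    return got_ownership ? READ_OWNED : READ_ONLY;
}

/* Local write fault: invalidate other copies (if any), take ownership. */
enum block_state on_local_write(enum block_state s) {
    (void)s;
    return WRITABLE;
}

/* "Write invalidate" arriving from another writer, or block replacement. */
enum block_state on_remote_invalidate(enum block_state s) { (void)s; return NIL; }
enum block_state on_replacement(enum block_state s)       { (void)s; return UNUSED; }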
19
Implementing Sequential Consistency: Locating Data
Fixed Distributed-Server Algorithms
(Figure: Processors 0, 1, and 2 each act as the fixed server for a statically
assigned subset of the addresses, recording each block's state: Addr0
writable, Addr1 read owned, Addr2 read owned, Addr3 read owned, Addr4 read
owned, Addr5 writable, Addr6 writable, Addr7 writable, Addr8 read owned.
A "Read addr2" request is sent to addr2's fixed server, and the requester
receives a read-only copy: Addr2 read only.)
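With the fixed distributed-server scheme, every address has a statically
known manager, so a request goes straight to that processor. The modulo
mapping below is just one possible fixed assignment, used here for
illustration.

#define NUM_PROCESSORS 3

/* Fixed, globally known mapping from an address to its server processor. */
int server_of(unsigned long addr) {
    return (int)(addr % NUM_PROCESSORS);
}

/* Example: a "Read addr2" request is sent directly to server_of(2), which
 * looks the block up in its ownership table and forwards the request (or a
 * read-only copy) to the requester. */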
20
Implementing Sequential Consistency: Locating Data
Dynamic Distributed-Server Algorithms
  • Breaking the chain of nodes
  • When the node receives an invalidation
  • When the node relinquishes ownership
  • When the node forwards a fault request
  • The node points to a new owner

(Figure: Processors 0, 1, and 2 each keep their own table of blocks (Addr0
through Addr8) in various states (writable, read owned, read only), together
with a hint about each block's probable owner.  A "Read addr2" request is
forwarded along the chain of probable owners until it reaches the node that
owns Addr2, and the requester ends up with a read-only copy of Addr2.)
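With the dynamic scheme, each node keeps only a hint about the probable owner
of a block; a request chases the hints until it reaches the real owner, and
the nodes on the path then point directly at that owner, which is the
chain-breaking behavior listed above. The single-block simulation below is
an illustrative sketch, not an actual implementation.

#define NUM_NODES 3

static int probable_owner[NUM_NODES];   /* probable_owner[n] = n's current hint   */
static int real_owner;                  /* ground truth, kept only for the sketch */

/* Follow the chain of hints from 'requester' to the owner, then shorten the
 * chain so every node on the path points at the owner directly. */
int locate_and_shorten(int requester) {
    int path[NUM_NODES], len = 0, n = requester;
    while (n != real_owner && len < NUM_NODES) {
        path[len++] = n;
        n = probable_owner[n];
    }
    for (int i = 0; i < len; i++)
        probable_owner[path[i]] = real_owner;
    return real_owner;
}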
21
Replacement Strategy
  • Which block to replace
  • Non-usage based (e.g. FIFO)
  • Usage based (e.g. LRU)
  • A mix of both (e.g., Ivy), sketched after this list:
  • Unused/Nil blocks are replaced with the highest priority
  • Read-only blocks are the second priority
  • Read-owned blocks are the third priority
  • Writable blocks have the lowest priority, and LRU is
    used among them.
  • Where to place a replaced block
  • Invalidating a block if other nodes have a copy.
  • Using secondary store
  • Using the memory space of other nodes
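A sketch of the mixed replacement policy above: the victim is chosen by state
priority first (Unused/Nil, then Read-only, then Read-owned), with LRU
breaking ties, most importantly among Writable blocks. The table layout and
timestamps are illustrative assumptions.

enum block_state { UNUSED, NIL, READ_ONLY, READ_OWNED, WRITABLE };

struct cached_block {
    enum block_state state;
    unsigned long    last_used;    /* logical timestamp for LRU */
};

static int rank(enum block_state s) {   /* lower rank = evict earlier */
    switch (s) {
    case UNUSED: case NIL: return 0;
    case READ_ONLY:        return 1;
    case READ_OWNED:       return 2;
    default:               return 3;    /* WRITABLE: lowest priority  */
    }
}

int choose_victim(const struct cached_block *blocks, int n) {
    int victim = 0;
    for (int i = 1; i < n; i++) {
        int ri = rank(blocks[i].state), rv = rank(blocks[victim].state);
        if (ri < rv || (ri == rv && blocks[i].last_used < blocks[victim].last_used))
            victim = i;
    }
    return victim;
}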

22
Thrashing
  • Thrashing
  • Two or more processes try to write the same
    shared block.
  • An owner keeps writing its block shared by two or
    more reader processes.
  • The larger the block, the higher the chance of
    false sharing, which causes thrashing.
  • Solutions
  • Allow a process to prevent a block from being
    accessed by the others, using a lock.
  • Allow a process to hold a block for a certain
    amount of time (see the sketch below).
  • Apply a different coherence algorithm to each
    block.
  • What do those solutions require users to do?
  • Are there any perfect solutions?
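The second solution above, letting a node hold a block for a minimum amount
of time before surrendering it, can be sketched as follows; the threshold and
the timing interface are illustrative assumptions.

#include <time.h>

#define MIN_HOLD_SECONDS 1          /* illustrative threshold */

struct held_block {
    time_t acquired_at;             /* when this node obtained the block */
};

/* Called when another node asks this node to give up (invalidate or transfer)
 * the block.  Refusing until the block has been held long enough keeps it
 * from ping-ponging between two writers on every access. */
int may_release(const struct held_block *b) {
    return difftime(time(NULL), b->acquired_at) >= MIN_HOLD_SECONDS;
}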

23
Paper Review by Students
  • IVY
  • Dash
  • Munin
  • Linda/Jini/JavaSpace
  • Discussions
  • Classify which system is based on sequential
    consistency, release consistency, and lazy
    release consistency.
  • Classify the shared data granularity of these
    systems: cache-line based, page-based, and
    object-based.
  • Classify the implementation of these systems:
    hardware implementation, OS implementation, and
    user-level implementation.

24
Non-Turn-In Exercises
  • Is the memory underlying the following execution
    of two processes sequentially consistent
    (assuming that, initially, all variables are set
    to zero)? (Textbook p780 Q18.6)
  • P1: R(x)1  R(x)2  W(y)1
  • P2: W(x)1  R(y)1  W(x)2
  • Show that the following history is not causally
    consistent. (Textbook p781 Q18.18)
  • P1: W(a)0  W(a)1
  • P2: R(a)1  W(b)2
  • P3: R(b)2  R(a)0
  • Explain the relationship between false sharing
    and data granularity in DSM.

25
Non-Turn-In Exercises
(Figure: the ownership tables of Processors 1, 2, and 3, each with addr /
owner / shared columns (e.g., addr 0 owned by P0, addr 3 owned by P2, addr 6
owned by P3), together with the data items (addr 0 through addr 8) currently
held at each processor, and a legend distinguishing events (arrows) from
copies of an address (circles).)
  • There is a DSM system that is based on the
    write-invalidation protocol, uses a fixed
    distributed-server algorithm for locating a given
    data item, and consists of three processors: 1, 2,
    and 3. Each processor has the data items and the
    ownership/sharing-processor table shown above.

26
Non-Turn-In Exercises
Given the following sequence of memory accesses, draw additional arrows and
circles in the above figure as instructed. To distinguish which arrow
corresponds to which operation, add the operation number (1-8) to each arrow.
Also, update the corresponding ownership table entries.
(1) Memory access 1: Processor 2 reads data from address 2.
    Add arrows in the above figure to indicate the operations required for
    memory access 1:
    1. Send a query to search for address 2.
    2. Send a request to read from address 2.
    3. Read data from address 2 to Processor 2.
    Update the corresponding ownership table entry. (Just add P2 in the
    shared field.)
    Draw a circle to indicate that a copy of address 2 was created on
    Processor 2.
(2) Memory access 2: Processor 1 reads data from address 2.
    Add arrows in the above figure to indicate the operations required for
    memory access 2:
    4. Send a query to search for address 2.
    5. Send a request to read from address 2.
    6. Read data from address 2 to Processor 1.
    Update the corresponding ownership table entry. (Just add P1 in the
    shared field.)
    Draw a circle to indicate that a copy of address 2 was created on
    Processor 1.
(3) Memory access 3: Processor 2 writes data to address 2.
    Add arrows in the above figure to indicate the operations required for
    memory access 3:
    7. Send a request to update the ownership information on address 2.
    8. Send a write invalidation to all non-owner processors sharing
       address 2.
    Update the corresponding ownership table entry. (Make Processor 2 the new
    owner of address 2 and cross out all other processor IDs in the entry.)
    Cross out all circles to indicate that the old copies of address 2 were
    all invalidated.