1
CSS434 Distributed Shared Memory Textbook Ch18
Professor Munehiro Fukuda
2
Basic Concept
(Figure: processes on different computers issue write(address, data) and
read(address) operations on a Distributed Shared Memory that exists only
virtually; the computers are connected by a communication network.)
A cache line or a page is transferred to and
cached in the requested computer.
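As an illustration of the behavior described above, here is a minimal,
self-contained C sketch of what a read or write does: the block (page or
cache line) containing the address is fetched over the network and cached
locally on the first access. All names here (BLOCK_SIZE, cache, fetch_block,
dsm_read, dsm_write) are assumptions for illustration, not part of any
particular DSM system.

#include <string.h>

#define BLOCK_SIZE 4096
#define NUM_BLOCKS 16

static char cache[NUM_BLOCKS][BLOCK_SIZE];   /* blocks cached at this node     */
static int  present[NUM_BLOCKS];             /* 1 if the block is cached here  */

static void fetch_block(int block) {         /* stand-in for the network fetch */
    memset(cache[block], 0, BLOCK_SIZE);
    present[block] = 1;
}

int dsm_read(unsigned long address) {
    int block = (int)((address / BLOCK_SIZE) % NUM_BLOCKS);
    if (!present[block]) fetch_block(block);  /* block fault: fetch and cache  */
    return cache[block][address % BLOCK_SIZE];
}

void dsm_write(unsigned long address, int data) {
    int block = (int)((address / BLOCK_SIZE) % NUM_BLOCKS);
    if (!present[block]) fetch_block(block);  /* block fault: fetch and cache  */
    cache[block][address % BLOCK_SIZE] = (char)data;
}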
3
Writer Process on DSM
include "world.h" struct shared int a,b
Program Writer main() int x struct
shared p methersetup() / Initialize the
Mether run-time / p (struct shared
)METHERBASE / overlay structure on
METHER segment / p-gta p-gtb 0 /
initialize fields to zero / while(TRUE) /
continuously update structure fields / p gta
p gta 1 p gtb p gtb - 1
4
Reader Process on DSM
/* Program Reader */
main() {
    struct shared *p;
    methersetup();
    p = (struct shared *)METHERBASE;
    while (TRUE) {                     /* read the fields once every second */
        printf("a = %d, b = %d\n", p->a, p->b);
        sleep(1);
    }
}
5
Why DSM?
  • Simpler abstraction
  • Underlying tedious communication primitives are
    all shielded by memory accesses
  • Better portability of distributed application
    programs
  • Natural transition from sequential to distributed
    application
  • Better performance of some applications
  • Data locality, on-demand data movement, and
    large memory space reduce network traffic and
    paging/swapping activities.
  • Flexible communication environment
  • The sender and receiver need not know each
    other; they need not even coexist.
  • Ease of process migration
  • Migration is completed simply by transferring the
    corresponding PCB to the destination.

6
Main Issues
  • Granularity
  • Fine (less false sharing but more network
    traffic) to coarse (more false sharing but less
    network traffic): cache line (e.g., Dash and
    Alewife), object (e.g., Orca and Linda), or
    page (e.g., Ivy)?
  • Memory coherence and access synchronization
  • Strict, Sequential, Causal, Weak, and Release
    Consistency models
  • Data location and access
  • Broadcasting, centralized data locator, fixed
    distributed data locator, and dynamic distributed
    data locator
  • Replacement strategy
  • LRU or FIFO (The same issue as OS virtual memory)
  • Thrashing
  • How to prevent a block from being exchanged back
    and forth between two nodes.
  • Heterogeneity

7
Consistency Models: Two Processes Accessing Shared Variables
At the beginning, a = b = 0.
DSM needs a consistency model.
8
Consistency Models: Strict Consistency
  • Wi(x, a): Processor i writes a to variable x
    (i.e., x = a).
  • b←Ri(x): Processor i reads b from variable x
    (i.e., after y = x, y equals b).
  • Any read on x must return the value of the most
    recent write on x.

Strict Consistency (diagram): P2 performs W2(x, a); afterwards P1 reads
a←R1(x) and P3 reads a←R3(x), so every read returns the most recent write.
Not Strict Consistency (diagram): P2 performs W2(x, a), but P1's next read
still returns nil←R1(x) and only a later read returns a←R1(x); P3 reads
a←R3(x).
9
Consistency Models: Linearizability and Sequential Consistency
  • Linearizability: operations of each individual
    process appear to all processes in the same order
    as they actually happen (real-time order).
  • Sequential Consistency: operations of each
    individual process appear in the same order to
    all processes, but that common order need not
    match real time.

Linearizability (diagram): P2 performs W2(x, a) and, later in real time, P3
performs W3(x, b); P1 and P4 read the values in that real-time order
(a←R1(x) then b←R1(x), and a←R4(x) then b←R4(x)).
Sequential Consistency (diagram): with the same two writes W2(x, a) and
W3(x, b), all processes observe the writes in one agreed order (a read may
also return nil before any write is visible), but that order need not match
the real-time order in which the writes were issued.
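To make the difference concrete, the following flag-based sketch (not from
the slides; the variable names are illustrative) shows two processes sharing
the DSM variables x and y. Under sequential consistency at most one process
can enter the critical section, because every process sees the same
interleaving of the four operations; under the weaker models that follow,
both processes may read 0 and enter at the same time.

/* Shared DSM variables, both initially 0. */
int x = 0, y = 0;

/* Process 1 */
void process1(void) {
    x = 1;
    if (y == 0) {
        /* critical section: P1 believes P2 has not started */
    }
}

/* Process 2 */
void process2(void) {
    y = 1;
    if (x == 0) {
        /* critical section: P2 believes P1 has not started */
    }
}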
10
Consistency Models: FIFO and Processor Consistency
  • FIFO Consistency: writes by a single process are
    visible to all other processes in the order in
    which they were issued.
  • Processor Consistency: FIFO Consistency, plus all
    writes to the same memory location must be visible
    to all processes in the same order.

FIFO Consistency (diagram): P2 issues a sequence of writes (W2(x, a),
W2(x, b), W2(y, a)) and P3 issues its own sequence (writes to x, y, and z);
P1 observes each process's writes in the order they were issued, but may
interleave the two processes' write sequences differently.
Processor Consistency (diagram): in addition, every process observes the
writes to the same location (e.g., the writes to x) in a single agreed order.
11
Consistency Models: Causal Consistency
  • Causally related writes must be visible to all
    processes in the same order. Concurrent writes
    may be propagated in a different order.

Causal Consistency (diagram): P2 writes W2(x, a); another process reads that
value (a←R3(x)) and then writes W3(x, b), so W3(x, b) is causally related to
W2(x, a) and must be seen after it by everyone, while merely concurrent
writes (e.g., W2(x, c)) may be seen by P1 and P4 in different orders.
Not Causal Consistency (diagram): W3(x, b) is again causally dependent on
W2(x, a), yet the readers observe the two values in different orders, which
the model forbids.
12
Consistency Models: Weak Consistency
  • Accesses to synchronization variables must obey
    sequential consistency.
  • All previous writes must be completed before an
    access to a synchronization variable.
  • All previous accesses to synchronization
    variables must be completed before an access to a
    non-synchronization variable.

Weak Consistency (diagram): P2 writes W2(x, a), W2(x, b), W2(y, c) and then
accesses a synchronization variable (S2); a process that synchronizes
afterwards (S3) reads the final values (b from x, c from y).
Not Weak Consistency (diagram): a reader obtains a stale value (a from x, or
nil from y) despite the synchronization accesses, which the model forbids.
13
Consistency Models: Release Consistency
  • Accesses to acquire and release variables obey
    processor consistency.
  • Previous acquires requested by a process must be
    completed before the process performs a data
    access.
  • All previous data accesses performed by a process
    must be completed before the process performs a
    release.

(Diagram) P1: Acq1(L), W1(x, a), W1(x, b), Rel1(L).  P2: Acq2(L), b←R2(x),
Rel2(L); because P2 acquires the lock only after P1's release, it must see
the final value b.  P3 performs no acquire and may therefore still read the
old value, a←R3(x).
14
Consistency Models: Release Consistency (Example)
Process 1:
    acquireLock();    // enter critical section
    a = a + 1;
    b = b + 1;
    releaseLock();    // leave critical section

Process 2:
    acquireLock();    // enter critical section
    print("The values of a and b are ", a, b);
    releaseLock();    // leave critical section
15
Implementing Sequential Consistency: Replicated and Migrating Data Blocks
(Figure: data blocks such as x, m, and b are replicated and migrate among
nodes, e.g., Node 1 and Node 3 each cache a copy of x.)
Then what if Node 2 updates x?
16
Implementing Sequential Consistency: Write Invalidation
(Figure: a client that wants to write first has every other node's copy of
the block invalidated, then obtains its own new copy of the block and
performs the write.)
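A rough C sketch of the write-fault handling implied by the figure: every
other copy is invalidated before the writer proceeds. The table layout and
the send_invalidate() message primitive are illustrative assumptions.

#define MAX_NODES 8

struct block_entry {
    int owner;                   /* node that currently owns the block   */
    int copyset[MAX_NODES];      /* 1 if that node holds a copy          */
};

static void send_invalidate(int node) { (void)node; /* network send elided */ }

/* Called on the node that wants to write (node id 'me'). */
void write_fault(struct block_entry *e, int me) {
    for (int n = 0; n < MAX_NODES; n++) {
        if (n != me && e->copyset[n]) {
            send_invalidate(n);          /* invalidate every other copy  */
            e->copyset[n] = 0;
        }
    }
    e->owner = me;                       /* take ownership of the block  */
    e->copyset[me] = 1;                  /* keep the (now writable) copy */
}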
17
Implementing Sequential Consistency: Write Update
(Figure: a client that wants to write sends the update to every node holding
a copy of the block, so all copies remain valid and up to date.)
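For contrast, a write-update handler pushes the new value to every holder of
a copy instead of invalidating it, so all copies stay current; again, the
names and the send_update() primitive are illustrative assumptions.

#define MAX_NODES 8

static void send_update(int node, unsigned long addr, int data) {
    (void)node; (void)addr; (void)data;  /* network send elided */
}

/* copyset[n] is 1 if node n holds a copy of the block containing addr. */
void write_update(const int copyset[MAX_NODES], int me,
                  unsigned long addr, int data) {
    for (int n = 0; n < MAX_NODES; n++)
        if (n != me && copyset[n])
            send_update(n, addr, data);  /* all remote copies are refreshed */
    /* ...then apply the write to the local copy as well. */
}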
18
Implementing Sequential Consistency: Read/Write Request
(State-transition diagram for the state of a block at a node.)
States: Unused, Nil, Read-only, Read-owned, Writable.
Transitions labeled in the figure:
  • Read (read a copy from the owner) → Read-only
  • Read (read from memory and get an ownership) → Read-owned
  • Write (invalidate others if they have a copy and get an ownership) → Writable
  • Write (invalidate others if they have a copy) → Writable
  • Write invalidate (caused by another node's write) → Nil
  • Replacement → Unused
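The per-block state machine above can be sketched as follows; the transition
functions are only an illustrative reading of the labels in the figure, not a
definitive protocol.

enum block_state { UNUSED, NIL, READ_ONLY, READ_OWNED, WRITABLE };

/* Local read fault: either read a copy from the owner, or read from memory
 * and take ownership, as the figure's two "Read" transitions indicate. */
enum block_state on_local_read(enum block_state s, int got_ownership) {
    if (s == READ_ONLY || s == READ_OWNED || s == WRITABLE)
        return s;                                /* already readable       */
    return got_ownership ? READ_OWNED : READ_ONLY;
}

/* Local write fault: invalidate other copies (if any), take ownership. */
enum block_state on_local_write(enum block_state s) {
    (void)s;
    return WRITABLE;
}

/* "Write invalidate" arriving from another writer, or block replacement. */
enum block_state on_remote_invalidate(enum block_state s) { (void)s; return NIL; }
enum block_state on_replacement(enum block_state s)       { (void)s; return UNUSED; }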
19
Implementing Sequential Consistency: Locating Data
Fixed Distributed-Server Algorithms
(Figure: Processors 0, 1, and 2 each act as the fixed server for a statically
assigned subset of the addresses, recording each block's state: Addr0
writable, Addr1 read owned, Addr2 read owned, Addr3 read owned, Addr4 read
owned, Addr5 writable, Addr6 writable, Addr7 writable, Addr8 read owned.
A "Read addr2" request is sent to addr2's fixed server, and the requester
receives a read-only copy: Addr2 read only.)
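With the fixed distributed-server scheme, every address has a statically
known manager, so a request goes straight to that processor. The modulo
mapping below is just one possible fixed assignment, used here for
illustration.

#define NUM_PROCESSORS 3

/* Fixed, globally known mapping from an address to its server processor. */
int server_of(unsigned long addr) {
    return (int)(addr % NUM_PROCESSORS);
}

/* Example: a "Read addr2" request is sent directly to server_of(2), which
 * looks the block up in its ownership table and forwards the request (or a
 * read-only copy) to the requester. */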
20
Implementing Sequential Consistency: Locating Data
Dynamic Distributed-Server Algorithms
  • Breaking the chain of nodes
  • When the node receives an invalidation
  • When the node relinquishes ownership
  • When the node forwards a fault request
  • The node points to a new owner

(Figure: Processors 0, 1, and 2 each keep their own table of blocks (Addr0
through Addr8) in various states (writable, read owned, read only), together
with a hint about each block's probable owner.  A "Read addr2" request is
forwarded along the chain of probable owners until it reaches the node that
owns Addr2, and the requester ends up with a read-only copy of Addr2.)
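With the dynamic scheme, each node keeps only a hint about the probable owner
of a block; a request chases the hints until it reaches the real owner, and
the nodes on the path then point directly at that owner, which is the
chain-breaking behavior listed above. The single-block simulation below is
an illustrative sketch, not an actual implementation.

#define NUM_NODES 3

static int probable_owner[NUM_NODES];   /* probable_owner[n] = n's current hint   */
static int real_owner;                  /* ground truth, kept only for the sketch */

/* Follow the chain of hints from 'requester' to the owner, then shorten the
 * chain so every node on the path points at the owner directly. */
int locate_and_shorten(int requester) {
    int path[NUM_NODES], len = 0, n = requester;
    while (n != real_owner && len < NUM_NODES) {
        path[len++] = n;
        n = probable_owner[n];
    }
    for (int i = 0; i < len; i++)
        probable_owner[path[i]] = real_owner;
    return real_owner;
}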
21
Replacement Strategy
  • Which block to replace
  • Non-usage based (e.g. FIFO)
  • Usage based (e.g. LRU)
  • A mix of both (e.g., Ivy), sketched after this list:
  • Unused/Nil blocks are replaced with the highest priority
  • Read-only blocks are the second priority
  • Read-owned blocks are the third priority
  • Writable blocks have the lowest priority, and LRU is
    used among them.
  • Where to place a replaced block
  • Invalidating a block if other nodes have a copy.
  • Using secondary store
  • Using the memory space of other nodes
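A sketch of the mixed replacement policy above: the victim is chosen by state
priority first (Unused/Nil, then Read-only, then Read-owned), with LRU
breaking ties, most importantly among Writable blocks. The table layout and
timestamps are illustrative assumptions.

enum block_state { UNUSED, NIL, READ_ONLY, READ_OWNED, WRITABLE };

struct cached_block {
    enum block_state state;
    unsigned long    last_used;    /* logical timestamp for LRU */
};

static int rank(enum block_state s) {   /* lower rank = evict earlier */
    switch (s) {
    case UNUSED: case NIL: return 0;
    case READ_ONLY:        return 1;
    case READ_OWNED:       return 2;
    default:               return 3;    /* WRITABLE: lowest priority  */
    }
}

int choose_victim(const struct cached_block *blocks, int n) {
    int victim = 0;
    for (int i = 1; i < n; i++) {
        int ri = rank(blocks[i].state), rv = rank(blocks[victim].state);
        if (ri < rv || (ri == rv && blocks[i].last_used < blocks[victim].last_used))
            victim = i;
    }
    return victim;
}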

22
Thrashing
  • Thrashing
  • Two or more processes try to write the same
    shared block.
  • An owner keeps writing its block shared by two or
    more reader processes.
  • The larger the block, the higher the chance of
    false sharing, which causes thrashing.
  • Solutions
  • Allow a process to prevent a block from being
    accessed by the others, using a lock.
  • Allow a process to hold a block for a certain
    amount of time (see the sketch below).
  • Apply a different coherence algorithm to each
    block.
  • What do those solutions require users to do?
  • Are there any perfect solutions?
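The second solution above, letting a node hold a block for a minimum amount
of time before surrendering it, can be sketched as follows; the threshold and
the timing interface are illustrative assumptions.

#include <time.h>

#define MIN_HOLD_SECONDS 1          /* illustrative threshold */

struct held_block {
    time_t acquired_at;             /* when this node obtained the block */
};

/* Called when another node asks this node to give up (invalidate or transfer)
 * the block.  Refusing until the block has been held long enough keeps it
 * from ping-ponging between two writers on every access. */
int may_release(const struct held_block *b) {
    return difftime(time(NULL), b->acquired_at) >= MIN_HOLD_SECONDS;
}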

23
Paper Review by Students
  • IVY
  • Dash
  • Munin
  • Linda/Jini/JavaSpace
  • Discussions
  • Classify which system is based on sequential
    consistency, release consistency, and lazy
    release consistency.
  • Classify the shared data granularity of these
    systems: cache-line based, page-based, and
    object-based.
  • Classify the implementation of these systems:
    hardware implementation, OS implementation, and
    user-level implementation.

24
Non-Turn-In Exercises
  • Is the memory underlying the following execution
    of two processes sequentially consistent
    (assuming that, initially, all variables are set
    to zero)? (Textbook p780 Q18.6)
  • P1: R(x)1  R(x)2  W(y)1
  • P2: W(x)1  R(y)1  W(x)2
  • Show that the following history is not causally
    consistent. (Textbook p781 Q18.18)
  • P1: W(a)0  W(a)1
  • P2: R(a)1  W(b)2
  • P3: R(b)2  R(a)0
  • Explain the relationship between false sharing
    and data granularity in DSM.

25
Non-Turn-In Exercises
(Figure: the ownership tables of Processors 1, 2, and 3, each with addr /
owner / shared columns (e.g., addr 0 owned by P0, addr 3 owned by P2, addr 6
owned by P3), together with the data items (addr 0 through addr 8) currently
held at each processor, and a legend distinguishing events (arrows) from
copies of an address (circles).)
  • There is a DSM system that is based on the
    write-invalidation protocol, uses a fixed
    distributed-server algorithm for locating a given
    data item, and consists of three processors: 1, 2,
    and 3. Each processor has the data items and the
    ownership/sharing-processor table shown above.

26
Non-Turn-In Exercises
Given the following sequence of memory accesses, draw additional arrows and
circles in the above figure as instructed. To distinguish which arrow
corresponds to which operation, add the operation number (1-8) to each arrow.
Also, update the corresponding ownership table entries.
(1) Memory access 1: Processor 2 reads data from address 2.
    Add arrows in the above figure to indicate the operations required for
    memory access 1:
    1. Send a query to search for address 2.
    2. Send a request to read from address 2.
    3. Read data from address 2 to Processor 2.
    Update the corresponding ownership table entry. (Just add P2 in the
    shared field.)
    Draw a circle to indicate that a copy of address 2 was created on
    Processor 2.
(2) Memory access 2: Processor 1 reads data from address 2.
    Add arrows in the above figure to indicate the operations required for
    memory access 2:
    4. Send a query to search for address 2.
    5. Send a request to read from address 2.
    6. Read data from address 2 to Processor 1.
    Update the corresponding ownership table entry. (Just add P1 in the
    shared field.)
    Draw a circle to indicate that a copy of address 2 was created on
    Processor 1.
(3) Memory access 3: Processor 2 writes data to address 2.
    Add arrows in the above figure to indicate the operations required for
    memory access 3:
    7. Send a request to update the ownership information on address 2.
    8. Send a write invalidation to all non-owner processors sharing
       address 2.
    Update the corresponding ownership table entry. (Make Processor 2 the new
    owner of address 2 and cross out all other processor IDs in the entry.)
    Cross out all circles to indicate that the old copies of address 2 were
    all invalidated.