Title: Introduction to MIMD Architectures (Sima, Fountain and Kacsuk, Chapter 15)
1 Introduction to MIMD Architectures
Sima, Fountain and Kacsuk, Chapter 15
2 Architectural Concepts
- Distributed Memory MIMD
- Replicate the processor/memory pairs
- Connect them via an interconnection network
- Shared Memory MIMD
- Replicate the processors
- Replicate the memories
- Connect them via an interconnection network
3 Distributed Memory Machine
- Access to a local memory module is much faster than to a remote one
- Hardware remote access via
  - Load/store primitives
  - A message-passing layer
- Cache memory for local memory traffic
- Messages can transfer data
  - Memory-to-memory
  - Cache-to-cache
4 Advantages of Distributed Memory
- Local memory traffic suffers less contention than in shared memory
- Highly scalable
- No need for sophisticated synchronization features like monitors or semaphores; message passing serves a dual purpose
  - To send the data
  - To provide synchronization
5 Problems of Distributed Memory
- Load balancing
- Message passing can lead to synchronization failures, including deadlock
  - BlockingSend -> BlockingReceive
  - BlockingReceive -> BlockingSend
- Intensive data copying of whole structures
- Overheads for small messages are high
6 Shared Memory Architecture
- All processors have equal access to the shared memory modules
- Local caches reduce
  - Memory traffic
  - Network traffic
  - Memory access time
- Interprocessor (IP) synchronization via
  - Indivisible load/store operations
7 Advantages of Shared Memory
- No need to partition code or data
  - Sharing occurs on the fly
- No need to move data explicitly
- No need for new programming languages or compilers
8 Disadvantages of Shared Memory
- Synchronization is difficult
- Lack of scalability
  - Interprocessor communication (IPC) becomes the bottleneck
- Scalability can be addressed by
  - A high-throughput, low-latency network
  - Cache memories
    - These cause a coherence problem
  - Distributed shared memory architectures
9 Distributed Shared Memory
- Three design choices
  - Non-uniform memory access (NUMA), e.g. the Cray T3D
  - Cache-coherent non-uniform memory access (CC-NUMA), e.g. the Convex SPP and Stanford DASH
  - Cache-only memory architecture (COMA), e.g. the KSR-1
10 Non-uniform memory access (NUMA)
[Figure: NUMA organization; processor/memory nodes connected by an interconnection network]
11 Cache-coherent non-uniform memory access (CC-NUMA)
[Figure: CC-NUMA organization; cached processor/memory nodes connected by an interconnection network]
12 Cache-only memory architecture (COMA)
[Figure: COMA organization; cache-only nodes connected by an interconnection network]
13 Classification of MIMD Computers
14 Problems of Scalable Computers
- Tolerating and hiding the latency of remote loads
  - Worse if the output of one computation relies on another completing
- Tolerating and hiding idling due to synchronization among processors
15 Tolerating Remote Loads
[Figure: processing elements PE0..PEn, each pairing a processor Pi with a memory module Mi over an interconnection network; one PE issues Load A and Load B to remote memories, receives the values into registers rA and rB, and computes Result from A and B]
16 Tolerating Latency
- Cache memory
  - Simply lowers the cost of remote access
  - Introduces the cache coherence problem
- Prefetching
  - The hardware is already present, so the cost is low
  - Increases network load
- Threads with fast context switching
  - Accept that remote access will take a long time and cover the overhead
- These solutions don't solve synchronization issues
- Latency-tolerant algorithms
17 Design Issues of Scalable MIMD
- Processor design
  - Pipelining, parallel instruction issue
  - Atomic data access, prefetching, cache memory, message passing, etc.
- Interconnection network design
  - Scalable, high bandwidth, low latency
- Memory design
  - Shared memory design
  - Cache coherence
- I/O subsystem
  - Parallel I/O