Chapter 9 Multiprocessors and Clusters - PowerPoint PPT Presentation

Slides: 23
Provided by: kevinsc5

1
Chapter 9: Multiprocessors and Clusters
Computer Organization
  • Kevin Schaffer
  • Department of Computer Science
  • Hiram College

2
Multiprocessors
  • Two or more processors that work together as a
    single computer system
  • Why?
  • Performance (throughput)
  • Memory
  • Redundancy

3
Design Issues
  • Number/type of processors
  • Memory
  • Shared memory
  • Message passing
  • Interconnection network
  • Coupling
  • Programming

4
Shared Memory

5
Shared Memory
  • Processors share a single address space
  • Communication between processors is implicit,
    using shared variables in memory
  • Have to synchronize access to shared data
  • Generally easier to program
  • Doesn't scale well
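To make the synchronization point concrete, here is a minimal sketch in which several threads update one shared counter. The function name count_even and its parameters are illustrative assumptions, not part of the original slides. If the compiler is run without OpenMP support, the pragmas are ignored and the loop simply runs serially with the same result.

```c
/* Count the even values in data[0..n). The counter `evens` lives in the
 * single shared address space, so every thread updates the same location. */
long count_even(const int *data, long n)
{
    long evens = 0;   /* shared by all threads */
    long i;           /* loop index: private to each thread */

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            /* Without synchronization, two threads could read the same old
             * value of evens and one of the increments would be lost. */
            #pragma omp atomic
            evens++;
        }
    }
    return evens;
}
```

The communication here is implicit: no thread sends anything, each one just writes to evens. The atomic directive is the only explicit step, and it is there purely to synchronize access.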

6
OpenMP Example
  • #define N 1000
  • int i, a[N], b[N], c[N];
  • #pragma omp parallel for
  • for (i = 0; i < N; i++)
  •     a[i] = b[i] + c[i];
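The loop on this slide can be wrapped in a function so that it compiles on its own. This is a sketch under two assumptions: the operator lost from the transcript is taken to be addition, and the function name vector_add and its parameters are hypothetical.

```c
/* Compute a[i] = b[i] + c[i] for i in [0, n).
 * Each iteration touches a different element, so the iterations are
 * independent and OpenMP may divide them among the available threads. */
void vector_add(int *a, const int *b, const int *c, int n)
{
    int i;   /* made private to each thread by the parallel for */

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

With GCC or Clang the pragma takes effect when the program is built with -fopenmp; without that flag the same source runs as an ordinary serial loop.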

7
UMA vs. NUMA
  • Uniform memory access (UMA): memory access time
    is the same for all processors and all addresses
  • UMA multiprocessors are also known as symmetric
    multiprocessors (SMPs)
  • Non-uniform memory access (NUMA): some memory
    accesses are faster than others, depending on
    the processor and the address accessed
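The difference can be expressed as a toy latency model. The latencies below (80 ns local, 140 ns remote) are made-up illustrative numbers, and the function names are assumptions; real values depend entirely on the machine.

```c
/* Hypothetical latencies in nanoseconds, chosen only for illustration. */
#define LOCAL_NS   80
#define REMOTE_NS 140

/* UMA: the answer does not depend on which processor asks
 * or on where the address lives. */
int uma_latency(int cpu_node, int mem_node)
{
    (void)cpu_node;
    (void)mem_node;
    return LOCAL_NS;
}

/* NUMA: memory attached to the requesting processor's node is faster
 * than memory attached to another node. */
int numa_latency(int cpu_node, int mem_node)
{
    return (cpu_node == mem_node) ? LOCAL_NS : REMOTE_NS;
}

/* Average NUMA latency when local_pct percent (0..100) of the accesses
 * go to local memory. */
int numa_average_latency(int local_pct)
{
    return (local_pct * LOCAL_NS + (100 - local_pct) * REMOTE_NS) / 100;
}
```

Under this model a NUMA program benefits from keeping its data on the node that uses it, which is why data placement matters on NUMA machines and not on UMA ones.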

8
Cache Coherency
  • To reduce accesses to shared memory, each
    processor has its own cache
  • Possible for multiple copies of the same data
  • If one processor updates data in its cache, the
    others need to be notified

9
Snooping
  • Cache controllers watch the bus
  • If there's a write to a block in the cache,
    invalidate that block or update it with the new
    data
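A small simulation shows the invalidate variant of snooping. It is a sketch under simplifying assumptions that are not in the slides: four processors, eight one-word blocks, a write-through policy, and hypothetical names (SnoopSystem, snoop_read, snoop_write).

```c
#include <string.h>

#define NUM_CPUS   4
#define NUM_BLOCKS 8

/* One valid bit and one data word per (cpu, block), plus main memory. */
typedef struct {
    int valid[NUM_CPUS][NUM_BLOCKS];
    int data[NUM_CPUS][NUM_BLOCKS];
    int memory[NUM_BLOCKS];
} SnoopSystem;

void snoop_init(SnoopSystem *s)
{
    memset(s, 0, sizeof *s);   /* all caches empty, memory zeroed */
}

/* Read: on a miss, fetch the block from memory into this CPU's cache. */
int snoop_read(SnoopSystem *s, int cpu, int block)
{
    if (!s->valid[cpu][block]) {
        s->data[cpu][block] = s->memory[block];
        s->valid[cpu][block] = 1;
    }
    return s->data[cpu][block];
}

/* Write: the write appears on the shared bus, and every other cache
 * controller watching the bus invalidates its copy of the block. */
void snoop_write(SnoopSystem *s, int cpu, int block, int value)
{
    int other;

    for (other = 0; other < NUM_CPUS; other++)
        if (other != cpu)
            s->valid[other][block] = 0;   /* snooped write: invalidate */

    s->data[cpu][block] = value;
    s->valid[cpu][block] = 1;
    s->memory[block] = value;             /* write-through keeps memory current */
}
```

After a write by one processor, the next read by any other processor misses and picks up the new value from memory. An update-based protocol would instead copy the new value into the other caches at the point where this sketch clears the valid bit.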

10
Directories
  • Snooping only works on a shared bus, where all
    processors can see all memory accesses
  • An alternative is directory-based cache coherence
    protocols
  • Information about each memory block and which
    caches contain those blocks is in the directory
  • The directory tells processors when to invalidate
    or write back cache blocks
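The directory's bookkeeping can be sketched with one bit vector per memory block. The layout and the names (Directory, dir_read, dir_write) are assumptions made for illustration; real protocols track more states and handle races that this sketch ignores.

```c
#include <string.h>

#define NUM_CPUS   4
#define NUM_BLOCKS 8

typedef struct {
    unsigned sharers[NUM_BLOCKS];  /* bit i set: CPU i caches the block */
    int      dirty[NUM_BLOCKS];    /* 1: the single sharer holds a modified copy */
} Directory;

void dir_init(Directory *d)
{
    memset(d, 0, sizeof *d);
}

/* Read miss by cpu. If another cache holds a modified copy, the directory
 * tells that owner to write the block back. Returns the owner's id, or -1
 * if no write-back is needed. */
int dir_read(Directory *d, int cpu, int block)
{
    int owner = -1;

    if (d->dirty[block] && d->sharers[block] != (1u << cpu)) {
        int i;
        for (i = 0; i < NUM_CPUS; i++)
            if (d->sharers[block] & (1u << i))
                owner = i;
        d->dirty[block] = 0;       /* memory is current after the write-back */
    }
    d->sharers[block] |= 1u << cpu;
    return owner;
}

/* Write by cpu. Returns a bit mask of the other CPUs that the directory
 * must tell to invalidate their copies. */
unsigned dir_write(Directory *d, int cpu, int block)
{
    unsigned invalidate = d->sharers[block] & ~(1u << cpu);

    d->sharers[block] = 1u << cpu; /* the writer becomes the sole holder */
    d->dirty[block] = 1;
    return invalidate;
}
```

Unlike snooping, only the caches named in the sharer mask receive a message, so the scheme does not rely on every processor seeing every memory access.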

11
Message Passing

12
Message Passing
  • Also known as distributed memory
  • Each processor has its own private address space,
    separate from the other processors
  • All communication is explicit, using send and
    receive operations
  • Generally more difficult to program
  • Scales well

13
MPI Example
  • if (id == 0) {
  •     MPI_Send(data, 1, MPI_INT, 1, ...);
  •     MPI_Recv(data, 1, MPI_INT, 1, ...);
  • } else if (id == 1) {
  •     MPI_Send(data, 1, MPI_INT, 0, ...);
  •     MPI_Recv(data, 1, MPI_INT, 0, ...);
  • }
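The exchange on this slide can be imitated without an MPI installation. The sketch below is a single-process simulation, not MPI: the Machine and Mailbox types and the sim_send and sim_recv functions are hypothetical stand-ins that only mirror the roles of the send and receive operations.

```c
#include <string.h>

#define NUM_PROCS 2

/* A one-slot buffer holding a message in flight from one processor to another. */
typedef struct {
    int full;      /* 1 while a message is waiting to be received */
    int payload;
} Mailbox;

typedef struct {
    int     data[NUM_PROCS];            /* data[p] is private to processor p */
    Mailbox box[NUM_PROCS][NUM_PROCS];  /* box[src][dst] */
} Machine;

void machine_init(Machine *m)
{
    memset(m, 0, sizeof *m);
}

/* Buffered send: returns 0 on success, -1 if the mailbox is still full. */
int sim_send(Machine *m, int src, int dst, int value)
{
    Mailbox *mb = &m->box[src][dst];
    if (mb->full)
        return -1;
    mb->payload = value;
    mb->full = 1;
    return 0;
}

/* Receive: returns 0 and stores the message in *out, or -1 if none is waiting. */
int sim_recv(Machine *m, int dst, int src, int *out)
{
    Mailbox *mb = &m->box[src][dst];
    if (!mb->full)
        return -1;
    *out = mb->payload;
    mb->full = 0;
    return 0;
}

/* Both processors send their value to the peer, then receive the peer's. */
void exchange(Machine *m)
{
    int id;
    for (id = 0; id < NUM_PROCS; id++)
        sim_send(m, id, 1 - id, m->data[id]);
    for (id = 0; id < NUM_PROCS; id++)
        sim_recv(m, id, 1 - id, &m->data[id]);
}
```

The send-then-receive order works here because each mailbox buffers one message. In real MPI, MPI_Send is allowed to block until the matching receive has been posted, so an exchange in which both sides send first can deadlock; MPI_Sendrecv is one standard way to avoid that.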

14
Clusters
  • Network of PCs that operate as a single computer
    system
  • Connected by a local area network (LAN)
  • Cheap to build, but can be expensive to maintain
  • Take advantage of advances in uniprocessor
    technology
  • Poor performance on code with a lot of
    interprocessor communication

15
Networks
  • Bus
  • Ring
  • 2D Mesh
  • Hypercube
  • Multistage network (Omega)
  • Crossbar
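These topologies trade cost against distance, which a few standard formulas make visible. The sketch assumes a bidirectional ring, a mesh without wraparound links, and an Omega network built from 2x2 switches with a power-of-two number of ports; the function names are illustrative.

```c
/* Diameter: the largest number of hops between any two nodes. */

int ring_diameter(int n)                 /* n nodes, bidirectional links */
{
    return n / 2;
}

int mesh2d_diameter(int rows, int cols)  /* no wraparound links */
{
    return (rows - 1) + (cols - 1);
}

int hypercube_diameter(int dim)          /* 2^dim nodes, one hop per differing bit */
{
    return dim;
}

/* Switch counts for the indirect networks connecting n ports. */

int crossbar_switches(int n)             /* one crosspoint per (input, output) pair */
{
    return n * n;
}

int omega_switches(int n)                /* n must be a power of two */
{
    int stages = 0;
    int ports;

    for (ports = n; ports > 1; ports /= 2)
        stages++;                        /* stages = log2(n) */
    return (n / 2) * stages;             /* n/2 two-by-two switches per stage */
}
```

For 8 ports the crossbar needs 64 crosspoints while the Omega network needs 12 switches, at the price that some pairs of messages cannot pass through the Omega network at the same time.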

16
Ring, Mesh, and Hypercube

17
Omega Network

18
Crossbar

19
Multithreading
  • Time-share a single processor in order to create
    the illusion of multiple virtual processors
  • Thread-specific state (PC, registers, etc.) has
    to be replicated
  • Expensive resources (functional units, cache,
    control logic) are shared between threads
  • Additional threads fill in stall cycles that
    would be present with a single thread
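How much the extra threads help can be estimated with an idealized model. The assumptions are not from the slides: every thread alternates between run_cycles of useful work and stall_cycles of waiting, and switching threads costs nothing. The function name utilization_pct is hypothetical.

```c
/* Percentage of cycles in which the processor does useful work.
 * In one period of run_cycles + stall_cycles, each thread contributes
 * run_cycles of work; the processor cannot be more than 100% busy. */
int utilization_pct(int threads, int run_cycles, int stall_cycles)
{
    int period = run_cycles + stall_cycles;
    int busy = threads * run_cycles;

    if (busy > period)
        busy = period;   /* all stall cycles are already filled */
    return 100 * busy / period;
}
```

With 10 cycles of work followed by a 30-cycle stall, one thread keeps the processor busy a quarter of the time and four threads are enough to fill every stall cycle under this model; threads beyond that add nothing.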

20
Multithreading
  • Fine-grain multithreading: switch threads after
    every instruction
  • Coarse-grain multithreading: switch threads on
    long-latency operations, such as an L2 cache miss
  • Simultaneous multithreading (SMT): execute
    multiple threads concurrently, using unused issue
    slots in a wide-issue superscalar processor

21
Multithreading

22
Multicore Processors
  • A multicore processor is a multiprocessor on a
    single chip
  • Also known as a chip multiprocessor (CMP)
  • Typically shared memory with a shared L2 cache