1
Chapter 6 Multiprocessor System
2
Introduction
  • Each processor in a multiprocessor system can be
    executing a different instruction at any time.
  • The major advantages of MIMD systems:
  • Reliability
  • High performance
  • The overheads involved with MIMD:
  • Communication between processors
  • Synchronization of the work
  • Waste of processor time if any processor runs out
    of work to do
  • Processor scheduling

3
Introduction (continued)
  • Task
  • An entity to which a processor is assigned
  • A program, a function, or a procedure in execution
  • Process
  • Another word for a task
  • Processor (or processing element)
  • The hardware resource on which tasks are executed

4
Introduction (continued)
  • Thread
  • The sequence of tasks performed in succession by
    a given processor
  • The path of execution of a processor through a
    number of tasks.
  • Multiprocessors provide for the simultaneous
    presence of a number of threads of execution in
    an application.
  • Refer to Example 6.1 (degree of parallelism = 3).

5
R-to-C ratio
  • A measure of how much overhead is produced per
    unit of computation.
  • R = the run time of the task (its computation
    time)
  • C = the communication overhead
  • This ratio signifies task granularity
  • A high R-to-C ratio implies that communication
    overhead is insignificant compared to computation
    time.
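  • A worked example (illustrative numbers, not from
    the text): a task that computes for R = 100 ms and
    spends C = 5 ms communicating has an R-to-C ratio
    of 20, so overhead is negligible; with R = 2 ms
    and C = 5 ms the ratio is 0.4, and communication
    dominates the useful work.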

6
Task granularity
  • Task granularity
  • Coarse grain parallelism
  • High R-to-C ratio
  • Fine grain parallelism
  • Low R-to-C ratio
  • The general tendency toward maximum performance is
    to resort to the finest possible granularity →
    providing for the highest degree of parallelism.
  • Maximum parallelism does not, however, lead to
    maximum performance, since overhead also grows →
    a trade-off is required to reach an optimum level.

7
6.1 MIMD Organization (Figure 6.2)
  • Two popular MIMD organizations
  • Shared memory (or tightly coupled ) architecture
  • Message passing (or loosely coupled) architecture
  • Shared memory architecture
  • UMA (uniform memory access)
  • Rapid memory access
  • Memory contention

8
6.1 MIMD Organization (continued)
  • Message-passing architecture
  • Distributed memory MIMD system
  • NUMA (nonuniform memory access)
  • Heavy communication overhead for remote memory
    access
  • No memory contention problem
  • Other models
  • A mix of the two

9
6.2 Memory Organization
  • Two parameters of interest in MIMD memory system
    design
  • Bandwidth
  • Latency
  • Memory latency is reduced by increasing the
    memory bandwidth.
  • By building the memory system with multiple
    independent memory modules (banked and interleaved
    memory architectures; a sketch follows this list)
  • By reducing the memory access and cycle times
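
A minimal sketch of low-order interleaving (the module count and
helper names below are illustrative, not from the text):

    /* Low-order interleaving: with N independent modules, consecutive
       addresses fall in different modules, so sequential accesses can
       overlap and the effective memory bandwidth rises. */
    #define NMODULES 4   /* illustrative module count */

    static inline int module_of(unsigned addr) { return addr % NMODULES; }
    static inline int offset_of(unsigned addr) { return addr / NMODULES; }

Consecutive words 0, 1, 2, 3 then land in modules 0, 1, 2, 3 and can
be fetched concurrently.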

10
Multi-port memories
  • Figure 6.3 (b)
  • Each memory module is a three-port memory device.
  • All three ports can be active simultaneously.
  • The only restriction is that only one port can
    write into a given memory location at a time.

11
Cache incoherence
  • The problem wherein the value of a data item is
    not consistent throughout the memory system.
  • Write-through
  • A processor updates the cache and also the
    corresponding entry in the main memory.
  • Updating protocol
  • Invalidating protocol
  • Write-back
  • An updated cache-block is written back to the
    main memory just before that block is replaced in
    the cache.

12
6.2 Memory Organization (continued)
  • Cache coherence schemes
  • Do not use private caches (Figure 6.4)
  • Use a private-cache architecture, but cache only
    non-sharable data items
  • Cache flushing
  • Shared data are allowed to be cached only when it
    is known that only one processor will be accessing
    the data

13
6.2 Memory Organization (continued)
  • Cache coherence schemes (continued)
  • Bus watching (or bus snooping) (Figure 6.5)
  • Bus watching schemes incorporate hardware in each
    processor's cache controller that monitors the
    shared bus for data LOAD and STORE operations (a
    minimal sketch follows this list).
  • Write-once
  • The first STORE causes a write-through to the
    main memory.
  • Ownership protocol
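
A minimal sketch in C of an invalidating write-through snoop, with a
toy direct-mapped cache per processor; the structure and names are
illustrative, not the text's Figure 6.5:

    #include <stdbool.h>
    #include <stdio.h>

    #define NCACHES 4    /* one toy cache per processor */
    #define NWORDS  8    /* toy address space           */

    typedef struct { int data; bool valid; } Line;

    static int  memory[NWORDS];           /* shared main memory */
    static Line cache[NCACHES][NWORDS];

    /* STORE, write-through plus invalidate: update the local cache and
       main memory, then let every other cache "watch the bus" and
       invalidate its copy of the addressed word. */
    static void store(int cpu, int addr, int value) {
        cache[cpu][addr] = (Line){ value, true };
        memory[addr] = value;                            /* write-through */
        for (int c = 0; c < NCACHES; c++)
            if (c != cpu) cache[c][addr].valid = false;  /* snoop */
    }

    /* LOAD: a hit returns the cached word; a miss, or a line that a
       snoop invalidated, re-fetches the fresh value from memory. */
    static int load(int cpu, int addr) {
        if (!cache[cpu][addr].valid)
            cache[cpu][addr] = (Line){ memory[addr], true };
        return cache[cpu][addr].data;
    }

    int main(void) {
        store(0, 3, 42);               /* CPU 0 writes word 3       */
        printf("%d\n", load(1, 3));    /* CPU 1 reads 42, not stale */
        return 0;
    }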

14
6.3 Interconnection Network
  • Bus (Figure 6.6)
  • Bus window (Figure 6.7(a))
  • Fat tree (Figure 6.7 (b))
  • Loop or ring
  • token ring standard
  • Mesh

15
6.3 Interconnection Network (continued)
  • Hypercube
  • Routing is straightforward (a sketch follows this
    list).
  • The number of nodes must be increased by powers
    of two.
  • Crossbar
  • It offers multiple simultaneous communications
    but at a high hardware complexity.
  • Multistage switching networks
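
As a hedged illustration of why hypercube routing is straightforward
(dimension-order, or "e-cube", routing; the function is mine, not the
text's):

    #include <stdio.h>

    /* XOR of the current and destination node numbers marks the
       differing address bits; correcting one bit per hop delivers the
       message in at most log2(N) hops. */
    static void route(unsigned src, unsigned dst, unsigned dims) {
        unsigned node = src;
        printf("%u", node);
        for (unsigned d = 0; d < dims; d++)
            if ((node ^ dst) & (1u << d)) {  /* bit d differs         */
                node ^= 1u << d;             /* hop along dimension d */
                printf(" -> %u", node);
            }
        printf("\n");
    }

    int main(void) {
        route(0, 5, 3);   /* 3-cube: prints 0 -> 1 -> 5 */
        return 0;
    }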

16
6.4 Operating System Considerations
  • The major functions of the multiprocessor
    operating system
  • Keeping track of the status of all the resources
    at all times
  • Assigning tasks to processors in a justifiable
    manner
  • Spawning and creating new processes such that
    they can be executed in parallel or independently
    of each other
  • Collecting their individual results when all the
    spawned processes are completed and passing them
    to other processors as required

17
6.4 Operating System Considerations (continued)
  • Synchronization mechanisms
  • Processes in an MIMD system operate in a
    cooperative manner, and a sequence control
    mechanism is needed to ensure the ordering of
    operations.
  • Processes compete with each other to gain access
    to shared data items.
  • An access control mechanism is needed to maintain
    orderly access.

18
6.4 Operating System Considerations (continued)
  • Synchronization mechanisms
  • The most primitive synchronization techniques (two
    are sketched after this list)
  • Test-and-set
  • Semaphores
  • Barrier synchronization
  • Fetch-and-add
  • Heavy-weight process and Light-weight process
  • Scheduling
  • Static
  • Dynamic load balancing
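
A minimal sketch, using C11 atomics, of two of the primitives above:
a test-and-set spin lock and a fetch-and-add arrival counter. This is
an illustration, not the text's implementation:

    #include <stdatomic.h>

    static atomic_flag lock    = ATOMIC_FLAG_INIT;
    static atomic_int  arrived = 0;

    /* Test-and-set spin lock: loop until the flag was previously
       clear, i.e. until this caller is the one that set it. */
    void acquire(void) {
        while (atomic_flag_test_and_set(&lock))
            ;                      /* busy-wait */
    }

    void release(void) {
        atomic_flag_clear(&lock);
    }

    /* Fetch-and-add: atomically bump the arrival count and return the
       old value; the caller that sees nprocs - 1 is the last process
       to reach the barrier point. */
    int arrive(void) {
        return atomic_fetch_add(&arrived, 1);
    }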

19
6.5 Programming
  • Four main structures of parallel programming
  • Parbegin / parend
  • Fork / join (a sketch follows this list)
  • Doall
  • Processes, tasks, procedures, and so on can be
    declared for parallel execution.
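
A hedged sketch of the fork/join structure using POSIX threads (a
doall over loop iterations looks the same, one thread per iteration);
the worker body is illustrative:

    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 4

    static void *body(void *arg) {
        long i = (long)arg;                  /* one parallel task */
        printf("worker %ld running\n", i);
        return NULL;
    }

    int main(void) {
        pthread_t t[NWORKERS];
        for (long i = 0; i < NWORKERS; i++)  /* fork */
            pthread_create(&t[i], NULL, body, (void *)i);
        for (int i = 0; i < NWORKERS; i++)   /* join: wait for all */
            pthread_join(t[i], NULL);
        return 0;
    }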

20
6.6 Performance Evaluation and Scalability
  • Performance evaluation
  • Speed-up S = Ts / Tp, where Ts is the serial run
    time and Tp the run time on P processors
  • Total overhead To = Tp·P - Ts → Tp = (To + Ts)/P
  • S = Ts·P / (To + Ts)
  • Efficiency E = S/P = Ts/(Ts + To) = 1/(1 + To/Ts)
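  • A worked example (illustrative numbers): with
    Ts = 100, P = 8, and To = 60, Tp = (60 + 100)/8 =
    20, so S = 100/20 = 5 and E = 5/8 =
    1/(1 + 60/100) = 0.625.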

21
Scalability
  • Scalability: the ability to increase speedup as
    the number of processors increases.
  • A parallel system is scalable if its efficiency
    can be maintained at a fixed value by increasing
    the number of processors as the problem size
    increases.
  • Time-constrained scaling
  • Memory-constrained scaling

22
Isoefficiency function
  • E = 1/(1 + To/Ts)
  • → To/Ts = (1 - E)/E
  • Hence, Ts = E·To/(1 - E)
  • For a given value of E, E/(1 - E) is a constant,
    K.
  • Then Ts = K·To (the isoefficiency function)
  • A small isoefficiency function indicates that
    small increments in problem size are sufficient to
    maintain efficiency when P is increased.
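  • A worked example (illustrative): to hold E = 0.8,
    K = E/(1 - E) = 4, so serial work must grow as
    Ts = 4·To; if an algorithm's total overhead grows
    as Θ(P log P), the problem size must grow at that
    same rate to keep E at 0.8.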

23
6.6 Performance Evaluation and Scalability
(continued)
  • Performance models
  • The basic model (a hedged sketch follows this
    list)
  • Each task is equal and takes R time units to be
    executed on a processor.
  • If two tasks on different processors wish to
    communicate with each other, they do so at a cost
    of C time units.
  • Model with linear communication overhead
  • Model with overlapped communication
  • Stochastic model
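
A heavily hedged sketch of evaluating the basic model; the accounting
(tasks spread evenly, k communications each costing C, not overlapped
with computation) is an assumption, and the text's exact model may
differ:

    /* Basic-model estimate: M equal tasks of R time units on P
       processors, plus k inter-processor communications of C time
       units each (assumed fully serialized). */
    double basic_model_time(int M, int P, double R, int k, double C) {
        int tasks_per_proc = (M + P - 1) / P;   /* ceiling division */
        return tasks_per_proc * R + k * C;
    }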

24
Examples
  • Alliant FX series
  • Figure 6.17
  • Parallelism
  • Instruction level
  • Loop level
  • Task level