1
Models of Parallel Processing
by Shietung Peng
2
SIMD Parallel Computers
  • In a SIMD computer, each processor can execute or
    ignore the instruction being broadcast, based on
    its local state or data-dependent conditions.
    However, this leads to some inefficiency in
    executing conditional computations, since every
    processor steps through both branches (see the
    sketch at the end of this slide). A possible cure
    is to use the asynchronous version of SIMD, known
    as SPMD.
  • A SIMD computer can be designed based on
    commodity (off-the-shelf) components or with
    custom chips.
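  • A minimal Python sketch (illustrative only, not
    from the slides) of the conditional-execution
    issue: under SIMD, every processor steps through
    both branches of the broadcast instruction stream
    and masks out the one that does not apply, whereas
    under SPMD each processor follows only its own
    branch.

      def simd_conditional(data):
          # One broadcast instruction stream; a mask decides which
          # processors actually apply each broadcast step.
          mask = [x % 2 == 0 for x in data]         # data-dependent condition
          # Step 1: "then" branch, applied only where the mask is True.
          data = [x // 2 if m else x for x, m in zip(data, mask)]
          # Step 2: "else" branch, applied only where the mask is False.
          data = [3 * x + 1 if not m else x for x, m in zip(data, mask)]
          return data                               # both branches cost time

      def spmd_conditional(x):
          # Each processor runs this same program on its own data and
          # takes only the branch it needs (asynchronous SIMD = SPMD).
          return x // 2 if x % 2 == 0 else 3 * x + 1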

3
MIMD Parallel Computers
  • MIMD computers are most effective for medium- to
    coarse-grain parallel applications, where the
    computation is divided into relatively large
    tasks whose executions are assigned to the
    various processors.
  • Within the MIMD class, there are three important
    design issues:
  • Massively or moderately parallel processors.
  • Tightly or loosely coupled MIMD.
  • Explicit message passing or virtual shared
    memory.

4
Global Versus Distributed Memory
  • A global-memory multiprocessor is characterized
    by the type and number p of processors, the
    capacity and number m of memory modules, and the
    network architecture.
  • Example networks include the crossbar, single or
    multiple buses, and multistage interconnection
    networks (MINs).

5
Global Versus Distributed Memory
  • A distributed-memory multicomputer is a
    collection of p processors, each with its own
    private memory, that communicate through an
    interconnection network.
  • Distributed-memory MIMD can be interconnected by
    a variety of direct networks. Examples of direct
    networks will be introduced later.

6
The PRAM Shared-memory Model
  • The type of interconnection network used affects
    the way in which efficient algorithms are
    developed. In order to free programmers from such
    tedious considerations, an abstract model of
    global-memory computers, known as the PRAM
    (parallel random-access machine), is defined.
  • The abstract PRAM model can be SIMD or MIMD; a
    sample PRAM-style algorithm is sketched below.
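  • As an illustration (not part of the original
    slides), the Python sketch below simulates how a
    simple PRAM algorithm is expressed: p processors
    share one memory and proceed in synchronous steps,
    here summing n values by recursive doubling in
    O(log n) parallel steps.

      def pram_sum(shared):
          # shared: list acting as the global memory, one value per cell
          n = len(shared)
          step = 1
          while step < n:
              # One synchronous PRAM step: each active processor i reads
              # shared[i + step] and adds it to its own cell shared[i].
              updates = {i: shared[i] + shared[i + step]
                         for i in range(0, n - step, 2 * step)}
              for i, value in updates.items():   # writes land "simultaneously"
                  shared[i] = value
              step *= 2
          return shared[0]

      # Example: pram_sum([1, 2, 3, 4, 5, 6, 7, 8]) returns 36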

7
The PRAM Shared-memory Model
  • The PRAM model is highly theoretical. If one were
    to build a physical PRAM, the processor-to-memory
    connectivity would have to be realized by an
    interconnection network.
  • The figure below shows PRAM with some hardware
    details.

8
The Graph (Distributed-memory) Model
  • The network used by a distributed-memory computer
    is usually represented as a graph.
  • The important parameters of an interconnection
    network include its diameter, bisection
    (band)width, and node degree (a sketch computing
    two of these appears after this list).
  • Network diameter: the longest of the shortest
    paths between pairs of nodes, which should be
    relatively small if network latency is to be
    minimized.
  • Bisection (band)width: the smallest number of
    links that must be cut in order to divide the
    network into two sub-networks of half the size.
  • Node degree: the number of communication ports
    required of each node, which should be a constant
    if the architecture is to be scalable to larger
    sizes.
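  • A small Python sketch (my own illustration, not
    from the slides) that derives two of these
    parameters, node degree and diameter, from an
    adjacency list; an 8-node ring serves as the
    example network. Bisection width is omitted, since
    computing it requires a minimum balanced cut.

      from collections import deque

      def diameter_and_degree(adj):
          # adj: dict mapping each node to the list of its neighbours
          max_degree = max(len(nbrs) for nbrs in adj.values())
          diameter = 0
          for src in adj:                      # BFS from every node
              dist = {src: 0}
              queue = deque([src])
              while queue:
                  u = queue.popleft()
                  for v in adj[u]:
                      if v not in dist:
                          dist[v] = dist[u] + 1
                          queue.append(v)
              diameter = max(diameter, max(dist.values()))
          return diameter, max_degree

      # 8-node ring: node i is linked to (i - 1) mod 8 and (i + 1) mod 8
      ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
      print(diameter_and_degree(ring))         # prints (4, 2)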

9
The Sea of Interconnection Networks
10
Topological Parameters
11
Associative Memory (AM) Model
  • A bit-serial associative memory can search, in one
    memory access cycle, a single bit slice of all
    active memory words for 0 or 1, and it provides
    the number of responding words in the form of an
    unsigned integer. For example, the instruction
    search(0, i) yields the number of active memory
    words that store the value 0 in bit position i. It
    also has instructions for activating or
    deactivating memory words based on the results of
    the latest search (a simulation sketch follows).
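  • The Python sketch below (an illustration only, not
    part of the slides) simulates this behaviour:
    search(b, i) counts the active words whose bit i
    equals b, and keep_responders narrows the activity
    flags to the words that responded.

      class AssociativeMemory:
          def __init__(self, words):
              self.words = list(words)              # unsigned integer words
              self.active = [True] * len(words)     # per-word activity flags

          def search(self, b, i):
              # Count active words whose bit i (0 = least significant) is b.
              return sum(1 for w, a in zip(self.words, self.active)
                         if a and (w >> i) & 1 == b)

          def keep_responders(self, b, i):
              # Deactivate every active word that did not respond.
              self.active = [a and (w >> i) & 1 == b
                             for w, a in zip(self.words, self.active)]

      am = AssociativeMemory([5, 2, 7, 4])
      print(am.search(0, 0))     # 2: the words 2 and 4 have a 0 in bit 0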

12
Scalable Parallel Computer Architectures
  • Key characteristics of scalable parallel
    computers

13
Pitfalls of Scaling up
14
The Cluster Computer Architecture
  • Cluster computer architecture

15
Hierarchical-bus architectures
  • A variety of hierarchical-bus architectures are
    available for reducing bus traffic by taking
    advantage of the locality of communication within
    small clusters of processors.
  • An example of a hierarchical interconnection
    network:

16
Abstract Models for Distributed-memory MIMD
  • The development of efficient algorithms suffers
    from the proliferation of available
    interconnection networks, for algorithm design
    must be done virtually from scratch for each new
    architecture.
  • It would be nice if we could abstract away the
    effects of the interconnection topology in order
    to free the algorithm designer from a lot of
    machine-specific details.
  • The idea is to replace the topological
    information with a small number of parameters
    that capture the effects of the interconnection
    topology with high accuracy.

17
The LogP Model
  • In the LogP model, the communication architecture
    of a parallel computer is captured by four
    parameters (a small worked estimate follows this
    list):
  • L: the latency, an upper bound on the time for a
    small message to travel from an arbitrary source
    node to an arbitrary destination node.
  • o: the overhead, defined as the length of time
    during which a processor is dedicated to the
    transmission or reception of a message.
  • g: the gap, defined as the minimum time that must
    elapse between consecutive message transmissions
    or receptions by a single processor.
  • P: the processor multiplicity, i.e., the number of
    processors.
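  • As a small worked illustration (the accounting
    below is a common first-order estimate, not taken
    from the slides), the function estimates the time
    for one processor to send n small messages to
    another under LogP: consecutive sends are spaced
    by max(g, o), and each message costs one send
    overhead, the latency L, and one receive overhead.

      def logp_send_time(n, L, o, g):
          # Time until the receiver has absorbed the n-th small message.
          last_send_starts = (n - 1) * max(g, o)   # injection rate limit
          return last_send_starts + o + L + o      # send overhead + latency
                                                   # + receive overhead

      # Example with L = 10, o = 2, g = 4: one message takes 14 time
      # units, ten pipelined messages take 50.
      print(logp_send_time(1, 10, 2, 4), logp_send_time(10, 10, 2, 4))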

18
Exercise 3
  • Associative processing
  • Devise an AM algorithm to find the largest number
    among the m unsigned integers in the memory.
  • Devise an AM algorithm to find the kth largest
    number among the m unsigned integers in the
    memory.
  • Extend the above algorithm to deal with signed
    integers in 2's-complement format (the sign bit
    carries a negative weight, so that 1010 represents
    -8 + 2 = -6).

19
Exercise 3
  • Topological parameters: add entries for the
    following topologies to the table shown in the
    lecture notes.
  • An X-tree: a complete binary tree with the nodes
    on each level connected as a linear array.
  • A hierarchical bus architecture with a maximum
    branching factor b.
  • A degree-4 chordal ring with skip distance s;
    i.e., a p-node ring in which processor i is also
    connected to processors (i + s) mod p and
    (i - s) mod p (a sketch building this topology's
    neighbour lists follows).
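  • To make the definition concrete (an illustration
    only; it does not answer the exercise), the
    neighbour lists of a degree-4 chordal ring can be
    built as below and fed to the diameter/degree
    sketch from slide 8.

      def chordal_ring(p, s):
          # p-node ring plus chords of skip distance s: node i is linked
          # to (i - 1), (i + 1), (i - s) and (i + s), all taken mod p.
          return {i: sorted({(i - 1) % p, (i + 1) % p,
                             (i - s) % p, (i + s) % p})
                  for i in range(p)}

      # Example: chordal_ring(8, 3)[0] == [1, 3, 5, 7]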

20
Exercise 3
  • Consider the hierarchical multilevel bus
    architecture with four processors in each of the
    low-level clusters. Consider the shear-sort
    algorithm and assume that each transfer over a
    shared bus to another processor or to a switch
    node takes unit time.
  • How long does this system take to emulate
    shear-sort on a 4-by-6 mesh if each processor
    holds a single data item and each cluster emulates
    a column of the mesh?
  • How long does this system take to emulate
    shear-sort on a 6-by-4 mesh?
  • Devise an algorithm for performing a parallel
    prefix computation on this architecture.