SIMD Architectures


1
SIMD Architectures

Laxmi Narayan Bhuyan
http://www.cs.ucr.edu/bhuyan

10
Data Parallel Model

Operations can be performed in parallel on each element of a large regular data structure, such as an array
1 control processor broadcasts to many PEs (see Ch. 1, Fig. 1-25, page 45 of CSG99)
When computers were large, the cost of the control portion could be amortized over many replicated PEs
A condition flag per PE lets individual PEs skip an operation
Data is distributed across the PEs' local memories
Early-1980s VLSI led to a SIMD rebirth: a chip holding 32 1-bit PEs plus memory was the PE
Data parallel programming languages lay out data across the processors
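A minimal Python sketch of the data parallel model, assuming a simple block layout: the array is split into contiguous blocks, one per PE's local memory, and one broadcast operation is applied by every PE to its own block. The function names (block_distribute, broadcast_op, gather) are hypothetical, and the PEs are simulated one after another rather than run in parallel.

```python
# Hypothetical sketch: one broadcast operation applied by every PE to the
# block of an array held in its own local memory (simulated sequentially).
from typing import Callable, List


def block_distribute(data: List[int], num_pes: int) -> List[List[int]]:
    """Lay the array out as contiguous blocks, one per PE local memory.

    The first len(data) % num_pes PEs get one extra element, so block
    sizes differ by at most one.
    """
    base, extra = divmod(len(data), num_pes)
    memories: List[List[int]] = []
    start = 0
    for pe in range(num_pes):
        size = base + (1 if pe < extra else 0)
        memories.append(data[start:start + size])
        start += size
    return memories


def broadcast_op(memories: List[List[int]], op: Callable[[int], int]) -> None:
    """Every PE applies the same broadcast operation to its local elements."""
    for local in memories:          # in hardware, all PEs would act at once
        for k, x in enumerate(local):
            local[k] = op(x)


def gather(memories: List[List[int]]) -> List[int]:
    """Concatenate the local blocks back into one global array."""
    return [x for local in memories for x in local]


if __name__ == "__main__":
    mems = block_distribute(list(range(10)), 4)
    print([len(m) for m in mems])   # [3, 3, 2, 2]
    broadcast_op(mems, lambda x: x * 2)
    print(gather(mems))             # [0, 2, 4, ..., 18]
```

A block layout keeps neighbouring array elements on the same PE; other layouts (for example cyclic) only change block_distribute and gather, not the broadcast step.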

11
Data Parallel Model

Vector processors have similar ISAs, but no data placement restriction
SIMD led to data parallel programming languages
Advancing VLSI led to single-chip FPUs and whole fast microprocessors, making SIMD less attractive
The SIMD programming model led to the Single Program Multiple Data (SPMD) model
All processors execute an identical program
Data parallel programming languages are still useful: they do communication all at once, in bulk synchronous phases in which all processors communicate after a global barrier
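The SPMD and bulk synchronous ideas can be sketched with Python threads standing in for processors (an assumption made only for illustration): every thread runs the same function, spmd_program, on its own rank; all communication happens in one phase, and threading.Barrier acts as the global barrier before any process reads what the others sent. NUM_PROCS and the square-then-add-neighbour computation are hypothetical.

```python
# Minimal SPMD / bulk-synchronous sketch: identical program on every "processor",
# with a global barrier separating the communication phase from the next compute phase.
import threading
from typing import List

NUM_PROCS = 4  # hypothetical processor count


def spmd_program(rank: int, values: List[int], mailbox: List[int],
                 barrier: threading.Barrier, results: List[int]) -> None:
    # Compute phase: each process works only on its own element.
    local = values[rank] * values[rank]
    # Communication phase: every process publishes its value ...
    mailbox[rank] = local
    barrier.wait()  # ... and no one reads until all have written.
    # Next compute phase: combine with the right-hand neighbour's value.
    results[rank] = local + mailbox[(rank + 1) % NUM_PROCS]


def run(values: List[int]) -> List[int]:
    """Launch NUM_PROCS copies of the same program; len(values) must equal NUM_PROCS."""
    mailbox = [0] * NUM_PROCS
    results = [0] * NUM_PROCS
    barrier = threading.Barrier(NUM_PROCS)
    threads = [threading.Thread(target=spmd_program,
                                args=(r, values, mailbox, barrier, results))
               for r in range(NUM_PROCS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))  # [5, 13, 25, 17]
```

Without the barrier, a fast thread could read a neighbour's mailbox slot before it was written; the barrier is what makes the phases bulk synchronous.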

12
SIMD Programming: High-Performance Fortran (HPF)

Single Program Multiple Data (SPMD)
FORALL construct, similar to fork:
    FORALL (I = 1:N)
       A(I) = B(I) + C(I)
    END FORALL
Data mapping in HPF has two goals:
1. Reducing interprocessor communication
2. Load balancing among processors
http://www.npac.syr.edu/hpfa/
http://www.crpc.rice.edu/HPFF/
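One property of FORALL is worth making concrete: unlike a sequential DO loop, a FORALL assignment evaluates the right-hand side for every index before it assigns any element, so no index sees another index's update. That is what allows each index to be handed to a different processor. The hedged Python sketch emulates FORALL (I = 2:N) A(I) = A(I-1) (a hypothetical example, not taken from the slide) next to the equivalent DO loop; Python's 0-based indices stand in for Fortran's 1-based ones.

```python
# Sketch of FORALL semantics versus a sequential DO loop.
from typing import List


def forall_shift(a: List[int]) -> List[int]:
    """Emulates FORALL (I = 2:N) A(I) = A(I-1).

    All right-hand sides are evaluated first, then all assignments happen,
    so every A(I) receives the *old* value of A(I-1).
    """
    rhs = [a[i - 1] for i in range(1, len(a))]   # evaluate every RHS first
    out = list(a)
    for i, v in zip(range(1, len(a)), rhs):       # then assign
        out[i] = v
    return out


def do_loop_shift(a: List[int]) -> List[int]:
    """Sequential DO I = 2, N: A(I) = A(I-1); each step reads the value
    the previous iteration just wrote."""
    out = list(a)
    for i in range(1, len(out)):
        out[i] = out[i - 1]
    return out


if __name__ == "__main__":
    print(forall_shift([1, 2, 3, 4]))   # [1, 1, 2, 3]
    print(do_loop_shift([1, 2, 3, 4]))  # [1, 1, 1, 1]
```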

13
How does an SIMD computer work?

A host computer is needed to perform the I/O operations
The user program is loaded into the control memory
The data is distributed across all the memory modules
The control unit (CU) decodes each instruction. If it is a scalar instruction, the CU executes it itself; if it is a vector instruction, the CU broadcasts the control signals to the PEs, which execute it
Before broadcasting the control signals, the CU broadcasts an enable vector that selects which PEs take part
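The steps above can be sketched as a toy interpreter, under clearly stated assumptions: the opcodes SADD (scalar add, executed by the CU) and VADD (vector add, broadcast to the PEs), the register names and the SIMDMachine class are all hypothetical. The enable vector is set before the broadcast, and PEs whose enable bit is off ignore it.

```python
# Hypothetical sketch of a SIMD control unit driving several PEs.
# Opcodes, register names and the instruction tuple format are illustrative only.
from typing import Dict, List, Tuple

Instr = Tuple[str, str, str, str]  # (opcode, dst, src1, src2)


class SIMDMachine:
    def __init__(self, num_pes: int) -> None:
        self.cu_regs: Dict[str, int] = {}                                   # scalar registers in the CU
        self.pe_regs: List[Dict[str, int]] = [{} for _ in range(num_pes)]  # one register set per PE
        self.enable: List[bool] = [True] * num_pes                         # current enable vector

    def set_enable(self, mask: List[bool]) -> None:
        """The CU broadcasts an enable vector before the control signals."""
        self.enable = list(mask)

    def step(self, instr: Instr) -> None:
        op, dst, a, b = instr
        if op == "SADD":                      # scalar instruction: executed by the CU
            self.cu_regs[dst] = self.cu_regs[a] + self.cu_regs[b]
        elif op == "VADD":                    # vector instruction: broadcast to all PEs
            for pe, regs in enumerate(self.pe_regs):
                if self.enable[pe]:           # disabled PEs ignore the broadcast
                    regs[dst] = regs[a] + regs[b]
        else:
            raise ValueError(f"unknown opcode {op}")
```

For example, with four PEs holding A = 0, 1, 2, 3 and B = 10, setting the enable vector to [True, False, True, False] and then issuing ("VADD", "C", "A", "B") leaves C = 10, 0, 12, 0 when C starts at 0.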

14
Masking and Data Routing Mechanisms

Registers in each PE i:
A, B, C: working registers
Si: status flag (1 = active, 0 = inactive)
Ri: data routing register
Di: address register
Ii: index register
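A hedged sketch of how the status flag Si masks an operation and how the routing register Ri moves data between PEs. The dataclass, the function names and the cyclic shift pattern are illustrative assumptions; Di is assumed here to hold the PE's own index, and Ii is carried along but not used.

```python
# Illustrative PE register set with masking (S) and data routing (R).
from dataclasses import dataclass
from typing import List


@dataclass
class PE:
    D: int        # address register: assumed to hold this PE's index
    A: int = 0    # working registers
    B: int = 0
    C: int = 0
    S: int = 1    # status flag: 1 = active, 0 = inactive
    R: int = 0    # data routing register
    I: int = 0    # index register (not used in this sketch)


def masked_add(pes: List[PE]) -> None:
    """C <- A + B, executed only by PEs whose status flag S is 1."""
    for pe in pes:
        if pe.S == 1:
            pe.C = pe.A + pe.B


def route_shift(pes: List[PE], k: int) -> None:
    """Every PE loads A into R; the network then delivers PE j's R to
    PE (j + k) mod N. All transfers happen simultaneously (snapshot)."""
    n = len(pes)
    for pe in pes:
        pe.R = pe.A
    sent = [pe.R for pe in pes]
    for pe in pes:
        pe.R = sent[(pe.D - k) % n]
```

Taking a snapshot before delivering models the fact that all PEs send at once; updating R in place would let a PE forward a value it had only just received.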

15
Example
16
Matrix Multiplication
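The worked matrix multiplication for this slide did not survive in the transcript. As a hedged sketch of one common SIMD scheme (an assumption, not necessarily the slide's method), let PE j hold column j of B and of C: for each (i, k) the control unit broadcasts the scalar A[i][k], and every PE does one multiply-add on its own column, so n PEs need n^2 broadcast steps instead of the n^3 steps of a sequential triple loop. The function name simd_matmul is hypothetical.

```python
# Sketch of column-distributed SIMD matrix multiplication, simulated in Python.
from typing import List

Matrix = List[List[float]]


def simd_matmul(a: Matrix, b: Matrix) -> Matrix:
    """C = A x B for n x n matrices on n simulated PEs.

    PE j's local memory holds column j of B and column j of C.
    """
    n = len(a)
    pe_b = [[b[k][j] for k in range(n)] for j in range(n)]  # column j of B in PE j
    pe_c = [[0.0] * n for _ in range(n)]                    # column j of C in PE j
    for i in range(n):
        for k in range(n):
            scalar = a[i][k]            # broadcast by the control unit
            for j in range(n):          # one SIMD step: all PEs at once in hardware
                pe_c[j][i] += scalar * pe_b[j][k]
    # Reassemble C (row-major) from the column-distributed PE memories.
    return [[pe_c[j][i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    print(simd_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```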
19
N X N Mesh
20
The Illiac IV Architecture

Distributed memory architecture
64 PEs connected as an 8X8 2-D mesh with end-around connections

LDB: Local Data Buffer, 64 words of 64 bits each
PEM: PE memory, 2K X 64 bits
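The Illiac IV's end-around connections are usually described as linking PE i to PEs i+1, i-1, i+8 and i-8 (all mod 64), so the last PE of one row connects to the first PE of the next. Assuming exactly that rule, this Python sketch lists each PE's neighbours and uses a breadth-first search to count the minimum number of routing steps between PEs. N and ROW are the only parameters.

```python
# Illiac IV-style routing: PE i links to i +/- 1 and i +/- 8, modulo 64.
from collections import deque
from typing import Dict, List

N = 64    # number of PEs
ROW = 8   # PEs per row of the 8X8 array


def neighbors(i: int) -> List[int]:
    """PEs directly connected to PE i, with end-around (mod N) links."""
    return [(i + 1) % N, (i - 1) % N, (i + ROW) % N, (i - ROW) % N]


def routing_steps(src: int) -> Dict[int, int]:
    """Breadth-first search: minimum number of routing steps from src to every PE."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


if __name__ == "__main__":
    print(max(routing_steps(0).values()))  # 7
```

The result, 7 = sqrt(64) - 1, is the worst-case number of routing steps; since every PE follows the same connection rule, the bound is the same for any source PE.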

21
The Illiac IV Network
22
MasPar MP-1 Architecture

Configurations with 1K to 16K PEs are available
Each PE has a 4-bit ALU, a 1-bit logic unit, a 64-bit mantissa unit, a 16-bit exponent unit, and communication input and output ports
Each PE has 40 32-bit registers available to the programmer
Each processor board has 1024 PEs arranged as 64 PE clusters (PECs), with 16 PEs per cluster
Each PEC is a chip connected to 8 neighbors via an octagonal mesh
A second network, the Multistage Crossbar Network, uses three router stages to provide the function of a 1024X1024 crossbar, routing from any PEC to any other PEC
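To make the clustering arithmetic concrete: with 16 PEs per PEC and 64 PECs per board, a global PE number splits into (board, PEC, PE within PEC) with two divisions, assuming PEs are numbered consecutively cluster by cluster (a hypothetical numbering, not taken from MasPar documentation). The octagonal mesh is modelled as the eight compass directions on a 2-D grid; the wraparound at the grid edges is also an assumption of this sketch.

```python
# Hypothetical MasPar-style PE addressing and 8-neighbour (octagonal) mesh.
from typing import List, Tuple

PES_PER_PEC = 16      # from the slide: 16 PEs per cluster
PECS_PER_BOARD = 64   # from the slide: 64 clusters per 1024-PE board

# The eight directions of the octagonal mesh: N, NE, E, SE, S, SW, W, NW.
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def locate(pe_id: int) -> Tuple[int, int, int]:
    """Map a global PE number to (board, PEC on that board, PE within the PEC).
    Assumes consecutive numbering, cluster by cluster."""
    board, on_board = divmod(pe_id, PES_PER_PEC * PECS_PER_BOARD)
    pec, local = divmod(on_board, PES_PER_PEC)
    return board, pec, local


def octagonal_neighbors(row: int, col: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """The 8 neighbours of grid position (row, col), assuming toroidal wraparound."""
    return [((row + dr) % rows, (col + dc) % cols) for dr, dc in DIRECTIONS]
```

For example, under this numbering PE 1059 is PE 3 of cluster 2 on the second board, since 1059 = 1 x 1024 + 2 x 16 + 3.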