Title: Configuration and Programming of Heterogeneous Multiprocessors on a Multi-FPGA System Using TMD-MPI
Slide 1: Configuration and Programming of Heterogeneous Multiprocessors on a Multi-FPGA System Using TMD-MPI
by Manuel Saldaña, Daniel Nunes, Emanuel Ramalho, and Paul Chow
University of Toronto, Department of Electrical and Computer Engineering
3rd International Conference on ReConFigurable Computing and FPGAs (ReConFig06), San Luis Potosí, Mexico, September 2006
Slide 2: Agenda
- Motivation
- Background
- TMD-MPI
- Classes of HPC
- Design Flow
- New Developments
- Example Application
- Heterogeneity test
- Scalability test
- Conclusions
Slide 3: Motivation
How Do We Program This?
- 64-MicroBlaze MPSoC
- (Ring, 2D-Mesh) topologies
- XC4VLX160, and not even the largest device!
Slide 4: Motivation
How Do We Program This?
512-MicroBlaze Multiprocessor System
Slides 5-7: Background - Classes of HPC Machines
- Class 1 Machines
  - Supercomputers or clusters of workstations
- Class 2 Machines
  - Hybrid network of CPU and FPGA hardware
  - The FPGA acts as an external co-processor to the CPU
- Class 3 Machines
  - FPGA-based multiprocessor
  - A recent area of academic and industrial focus
Slide 8: Background - MPSoC and MPI
- An MPSoC (Class 3) has many similarities to typical multiprocessor computers (Class 1), but also many special requirements
- Similar concepts, but different implementations
- MPI for MPSoC is desirable (TIMA labs, OpenFPGA, Berkeley BEE2, U. of Queensland, U. Rey Juan Carlos, UofT TMD, ...)
- MPI is a broad standard, designed for big machines
- Full MPI implementations are too big for embedded systems
Slide 9: Background - TMD-MPI
The same application code runs on the MPSoC (via TMD-MPI) and on a Linux cluster (via MPICH), as the sketch below illustrates.
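As an illustration of that portability claim, here is a minimal sketch of such a program, restricted to the MPI subset that TMD-MPI implements (see Slide 12). It is illustrative only, not code from the TMD-MPI distribution:

```c
/* Minimal portable message-passing program: the same source compiles
 * against MPICH on a Linux cluster or TMD-MPI on the MPSoC. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, data = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1) {
        /* Rank 0 sends one word to rank 1 (message tag 0). */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    }

    MPI_Finalize();
    return 0;
}
```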
Slide 10: Background - TMD-MPI
Use multiple chips to obtain massive resources; TMD-MPI hides the complexity.
Slide 11: Background - TMD-MPI
Implementation layers, from top to bottom (TMD-MPI spans the layers between the application and the hardware):
1. Application
2. MPI Application Interface
3. Point-to-Point MPI
4. Communication Functions
5. Hardware Access Functions
6. Hardware
Slide 12: Background - TMD-MPI
MPI Functions Implemented
- Point-to-Point: MPI_Send, MPI_Recv
- Miscellaneous: MPI_Init, MPI_Finalize, MPI_Comm_rank, MPI_Comm_size, MPI_Wtime
- Collective Operations: MPI_Barrier, MPI_Bcast, MPI_Gather, MPI_Reduce
A sketch using the collective subset follows.
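A brief, hedged sketch of the collective subset in use; the parameter value and the "work" are placeholders:

```c
/* Broadcast a parameter from the root, reduce local results back to it,
 * then synchronize: only collectives from the implemented subset. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, steps = 0, local, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        steps = 100;                      /* root picks the parameter  */
    MPI_Bcast(&steps, 1, MPI_INT, 0, MPI_COMM_WORLD);

    local = rank + steps;                 /* stand-in for real work    */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);          /* all ranks meet here       */
    MPI_Finalize();
    return 0;
}
```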
Slide 13: Background - Design Flow
Flexible Hardware-Software Co-design Flow
- Previous work:
  - Patel et al. [1] (FCCM 2006)
  - Saldaña et al. [2] (FPL 2006)
- This work: ReConFig06
Slide 14: New Developments
- TMD-MPI for the MicroBlaze
- TMD-MPI for the PowerPC405
- TMD-MPE for hardware engines
Slide 15: New Developments - TMD-MPE and TMD-MPI light
[Figure: each microprocessor (µP) runs either the full TMD-MPI or the lighter TMD-MPI light; a hardware engine gains message-passing capability through an attached TMD-MPE block]
Slide 16: New Developments
TMD-MPE uses the rendezvous message-passing protocol; a hedged sketch of the handshake appears below.
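A compile-able sketch of the rendezvous idea; the control-message names and helper functions below are hypothetical stand-ins for the TMD-MPE internals, which the slides do not detail:

```c
/* Illustrative rendezvous handshake. All helpers and message names are
 * hypothetical stand-ins for TMD-MPE's on-chip network operations. */
#include <stdio.h>

enum ctrl { REQ_TO_SEND, CLEAR_TO_SEND };

/* Stubs standing in for the network layer; they only trace the steps. */
static void send_ctrl(int peer, enum ctrl c) { printf("ctrl %d -> %d\n", c, peer); }
static void wait_ctrl(int peer, enum ctrl c) { printf("wait ctrl %d from %d\n", c, peer); }
static void send_data(int peer, const int *buf, int n) { printf("%d words -> %d\n", n, peer); }
static void recv_data(int peer, int *buf, int n) { printf("%d words <- %d\n", n, peer); }

/* Sender: no payload moves until the receiver has matched the message. */
void rendezvous_send(int dest, const int *buf, int n)
{
    send_ctrl(dest, REQ_TO_SEND);   /* 1. announce message (envelope)   */
    wait_ctrl(dest, CLEAR_TO_SEND); /* 2. block until receiver is ready */
    send_data(dest, buf, n);        /* 3. transfer the payload          */
}

/* Receiver: grants permission once a matching receive is posted. */
void rendezvous_recv(int src, int *buf, int n)
{
    wait_ctrl(src, REQ_TO_SEND);    /* 1. match the incoming request    */
    send_ctrl(src, CLEAR_TO_SEND);  /* 2. grant permission to send      */
    recv_data(src, buf, n);         /* 3. receive the payload           */
}
```

The appeal of rendezvous on memory-constrained FPGAs is that no payload is buffered before the receiver is ready; the price is the synchronization overhead shown on Slide 28.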
Slide 17: New Developments
TMD-MPE includes:
- message queues to keep track of unexpected messages
- packetizing/depacketizing logic to handle large messages (sketched below)
[Figure: message queue structure]
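A sketch of the packetizing idea only; the packet format and payload size are assumptions, since the slides do not give TMD-MPE's actual packet layout:

```c
/* Split a large message into network-sized packets, each with a small
 * header. PKT_PAYLOAD and the header fields are assumed, not TMD-MPE's
 * real format. */
#include <string.h>

#define PKT_PAYLOAD 256                 /* assumed max payload per packet */

struct packet {
    int dest, src, seq, len;            /* minimal header */
    char payload[PKT_PAYLOAD];
};

/* Split `len` bytes into packets and hand each to the network layer. */
void packetize(int dest, int src, const char *msg, int len,
               void (*send_pkt)(const struct packet *))
{
    struct packet p;
    int seq = 0, off = 0;

    while (off < len) {
        int chunk = (len - off > PKT_PAYLOAD) ? PKT_PAYLOAD : (len - off);
        p.dest = dest; p.src = src; p.seq = seq++; p.len = chunk;
        memcpy(p.payload, msg + off, chunk);
        send_pkt(&p);                   /* one packet onto the network */
        off += chunk;
    }
}
```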
Slides 18-19: Heterogeneity Test
Heat Equation Application / Jacobi Iterations
Observe the change of the temperature distribution over time.
[Figure: processing elements running TMD-MPI (processors) and TMD-MPE (hardware engines) cooperate on the same grid]
A sketch of one Jacobi step with halo exchange follows.
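A hedged sketch of one Jacobi relaxation step over a 1-D row decomposition, using only the MPI_Send/MPI_Recv subset TMD-MPI provides; the grid size, decomposition, and deadlock-avoiding send order are illustrative, not the paper's exact code:

```c
/* One Jacobi step: exchange halo rows with neighbours, then apply the
 * four-point average. `up`/`down` are neighbour ranks, -1 at the edges.
 * The send ordering is chosen so blocking rendezvous sends cannot
 * deadlock: all down-sends drain first, then all up-sends. */
#include <mpi.h>

#define NROWS 64            /* local rows, incl. two halo rows (assumed) */
#define NCOLS 64

void jacobi_step(double u[NROWS][NCOLS], double unew[NROWS][NCOLS],
                 int up, int down)
{
    MPI_Status st;
    int i, j;

    if (down >= 0)          /* pass my last interior row downward */
        MPI_Send(u[NROWS - 2], NCOLS, MPI_DOUBLE, down, 0, MPI_COMM_WORLD);
    if (up >= 0) {
        MPI_Recv(u[0], NCOLS, MPI_DOUBLE, up, 0, MPI_COMM_WORLD, &st);
        /* pass my first interior row upward */
        MPI_Send(u[1], NCOLS, MPI_DOUBLE, up, 1, MPI_COMM_WORLD);
    }
    if (down >= 0)
        MPI_Recv(u[NROWS - 1], NCOLS, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &st);

    /* Four-point average over interior points. */
    for (i = 1; i < NROWS - 1; i++)
        for (j = 1; j < NCOLS - 1; j++)
            unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                 u[i][j - 1] + u[i][j + 1]);
}
```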
Slide 20: Heterogeneity Test
MPSoC Heterogeneous Configurations (9 processing elements, single FPGA)
Slide 21: Heterogeneity Test
[Figure: execution time for each configuration]
Slide 22: Scalability Test
- Heat Equation Application
- 5 FPGAs (XC2VP100), each with 7 MicroBlazes and 2 PowerPC405s
- 45 processing elements in total (35 MicroBlazes and 10 PowerPC405s)
Slide 23: Scalability Test
[Figure: fixed-size speedup for up to 45 processors]
Slide 24: UofT TMD Prototype
Slide 25: Conclusions
- TMD-MPI and TMD-MPE enable the parallel programming of heterogeneous MPSoCs across multiple FPGAs, including hardware engines
- TMD-MPI hides the complexity of using heterogeneous links
- The heat equation application code was executed on a Linux cluster and on our multi-FPGA system with minimal changes
- TMD-MPI can be adapted to a particular architecture
- The TMD prototype is a good platform for further research on MPSoC
Slide 26: References
[1] Arun Patel, Christopher Madill, Manuel Saldaña, Christopher Comis, Régis Pomès, and Paul Chow. A Scalable FPGA-based Multiprocessor. In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'06), April 2006.
[2] Manuel Saldaña and Paul Chow. TMD-MPI: An MPI Implementation for Multiple Processors Across Multiple FPGAs. In IEEE International Conference on Field-Programmable Logic and Applications (FPL 2006), August 2006.
Slides 27-28: Rendezvous Overhead
[Figure: rendezvous synchronization overhead]
Slide 29: Testing the Functionality
TMD-MPIbench round-trip tests cover on-chip and off-chip communication, using both internal RAM (BRAM) and external RAM (DDR).
Slide 30: TMD-MPI Implementation
TMD-MPI communication protocols
Slide 31: Communication Tests
TMD-MPIbench.c measures:
- round trips
- bisection bandwidth
- round trips with congestion (worst-case traffic scenario)
- all-node broadcasts
- synchronization performance (barriers/sec)
A round-trip timing sketch follows.
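In the spirit of those tests, here is a minimal round-trip (ping-pong) timing loop using only implemented calls; the actual TMD-MPIbench source is not shown in the slides, so the repetition count and message length are assumptions:

```c
/* Ping-pong between ranks 0 and 1; rank 0 reports the mean round trip. */
#include <mpi.h>
#include <stdio.h>

#define REPS 1000
#define LEN  256

int main(int argc, char *argv[])
{
    int rank, i, buf[LEN] = {0};
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {                    /* ping */
            MPI_Send(buf, LEN, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, LEN, MPI_INT, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {             /* pong */
            MPI_Recv(buf, LEN, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, LEN, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("mean round trip: %f s\n", (t1 - t0) / REPS);

    MPI_Finalize();
    return 0;
}
```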
Slide 32: Communication Tests
Slide 33: Communication Tests
[Figure: MicroBlaze throughput limit with external RAM]
Slide 34: Communication Tests
[Figure: MicroBlaze throughput limits with internal and external RAM; annotation: memory access time]
Slide 35: Communication Tests
[Table: startup overhead, frequency, and measured bandwidth @ 40 MHz, compared against a P4 cluster and a P3 NOW]
Slide 36: Communication Tests
Slide 37: Many variables are involved
Slide 38: Background - TMD-MPI
TMD-MPI provides a parallel programming model for MPSoC in FPGAs with the following features:
- Portability: the application is unaffected by changes in hardware
- Flexibility: to move from generic to application-specific implementations
- Scalability: for large-scale applications
- Reusability: no need to learn a new API for similar applications
Slides 39-40: Testing the Functionality
Slide 41: New Developments - TMD-MPE
[Figure: TMD-MPE usage and the network]
Slide 42: Background - TMD-MPI
TMD-MPI:
- is a lightweight subset of the MPI standard
- is tailored to a particular application
- does not require an operating system
- has a small memory footprint (8.7 KB)
- uses a simple protocol
Slide 43: New Developments - TMD-MPE and TMD-MPI light
[Figure: TMD-MPI and TMD-MPI light blocks, as on Slide 15]