Transcript and Presenter's Notes

Title: Configuration and Programming of Heterogeneous Multiprocessors on a Multi-FPGA System Using TMD-MPI


1
Configuration and Programming of
Heterogeneous Multiprocessors on a Multi-FPGA
System Using TMD-MPI
  • by
  • Manuel Saldaña, Daniel Nunes, Emanuel Ramalho,
    and Paul Chow
  • University of Toronto
  • Department of Electrical and Computer Engineering

3rd International Conference on Reconfigurable
Computing and FPGAs (ReConFig'06), San Luis Potosí,
Mexico, September 2006
2
Agenda
  • Motivation
  • Background
  • TMD-MPI
  • Classes of HPC Machines
  • Design Flow
  • New Developments
  • Example Application
  • Heterogeneity test
  • Scalability test
  • Conclusions

3
Motivation
How Do We Program This?
  • 64-MicroBlaze MPSoC
  • (Ring, 2D-Mesh) topologies
  • XC4VLX160 (not the largest device!)

4
Motivation
How Do We Program This?
512-MicroBlaze Multiprocessor System
5
Background: Classes of HPC Machines
  • Class 1 Machines
  • Supercomputers or clusters of workstations

6
Background: Classes of HPC Machines
  • Class 1 Machines
  • Supercomputers or clusters of workstations
  • Class 2 Machines
  • Hybrid network of CPU and FPGA hardware
  • FPGA acts as external co-processor to CPU

7
Background: Classes of HPC Machines
  • Class 1 Machines
  • Supercomputers or clusters of workstations
  • Class 2 Machines
  • Hybrid network of CPU and FPGA hardware
  • FPGA acts as external co-processor to CPU
  • Class 3 Machines
  • FPGA-based multiprocessor
  • Recent area of academic and industrial focus

8
Background: MPSoC and MPI
  • MPSoC (Class 3) has many similarities to typical
    multiprocessor computers (Class 1), but also many
    special requirements
  • Similar concepts but different implementations
  • MPI for MPSoC is desirable (TIMA labs, OpenFPGA,
    Berkeley BEE2, U. of Queensland, U. Rey Juan
    Carlos, UofT TMD,...)
  • MPI is a broad standard, designed for big
    machines
  • Full MPI implementations are too big for embedded
    systems

9
Background: TMD-MPI
[Figure: the same application code runs on the MPSoC (via TMD-MPI) and on a Linux cluster (via MPICH)]
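A minimal sketch of what "the same code" means in practice: one C source, written against the MPI subset, that builds unchanged with MPICH on a Linux cluster or with TMD-MPI on the MPSoC. The two-rank exchange is illustrative and assumes TMD-MPI keeps the standard MPI signatures.

    /* Illustrative portable MPI program: the same source builds
     * against MPICH (cluster) or TMD-MPI (MPSoC); only the library
     * underneath changes. */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, value = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0 && size > 1) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        }

        MPI_Finalize();
        return 0;
    }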
10
Background: TMD-MPI
Use multiple chips to obtain massive resources;
TMD-MPI hides the complexity.
11
Background: TMD-MPI
Implementation Layers (top to bottom)
  • Application
  • MPI Application Interface
  • Point-to-Point MPI
  • Communication Functions
  • Hardware Access Functions
  • Hardware
TMD-MPI comprises the layers between the application and the hardware.
12
Background: TMD-MPI
MPI Functions Implemented (a usage sketch follows the list)
  • Point-to-Point
    • MPI_Send
    • MPI_Recv
  • Miscellaneous
    • MPI_Init
    • MPI_Finalize
    • MPI_Comm_rank
    • MPI_Comm_size
    • MPI_Wtime
  • Collective Operations
    • MPI_Barrier
    • MPI_Bcast
    • MPI_Gather
    • MPI_Reduce
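A hedged usage sketch restricted to the subset above: the root broadcasts a parameter, each rank computes a partial value, and a reduction collects the result. The per-rank work is a stand-in, not the application's real computation.

    /* Uses only functions from the implemented subset. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, n = 0;
        double t0, local, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
            n = 1000;                      /* root chooses the problem size */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        t0 = MPI_Wtime();
        local = (double)n / size;          /* placeholder for real work */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %f in %f s\n", total, MPI_Wtime() - t0);

        MPI_Finalize();
        return 0;
    }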

13
Background: Design Flow
Flexible hardware-software co-design flow
  • Previous work:
  • Patel et al. [1] (FCCM 2006)
  • Saldaña et al. [2] (FPL 2006)

14
New Developments
  • TMD-MPI for MicroBlaze
  • TMD-MPI for PowerPC405
  • TMD-MPE for hardware engines
15
New Developments: TMD-MPE and TMD-MPI light
[Figure: µP soft processors running TMD-MPI or TMD-MPI light; a TMD-MPE block provides message passing for a hardware engine]
16
New Developments
TMD-MPE uses the Rendezvous message-passing protocol
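In outline, a rendezvous send announces the message, waits for the receiver's grant, and only then moves the payload, so neither side needs to buffer the full message. The sketch below is illustrative: the packet types and the link_send/link_recv transport hooks are assumptions, not the actual TMD-MPE encodings or interfaces.

    #include <stdint.h>

    /* Hypothetical packet format; TMD-MPE's real encodings differ. */
    typedef enum { REQ_TO_SEND, CLEAR_TO_SEND, DATA } pkt_type;
    typedef struct { pkt_type type; uint32_t src, dest, tag, len; } hdr;

    /* Placeholders for the on-chip FIFO links (assumed, not a real API). */
    extern void link_send(const hdr *h, const void *payload);
    extern void link_recv(hdr *h, void *payload);

    /* Rendezvous send: no payload moves until the receiver grants it. */
    void rdv_send(uint32_t src, uint32_t dest, uint32_t tag,
                  const void *buf, uint32_t len)
    {
        hdr h = { REQ_TO_SEND, src, dest, tag, len };
        link_send(&h, 0);            /* 1. announce the envelope       */
        do {
            link_recv(&h, 0);        /* 2. wait for the matching grant */
        } while (h.type != CLEAR_TO_SEND || h.tag != tag);
        h.type = DATA;
        link_send(&h, buf);          /* 3. stream the payload          */
    }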
17
New Developments
  • TMD-MPE includes
  • message queues to keep track of unexpected
    messages
  • packetizing/depacketizing logic to handle large
    messages

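One way to picture the unexpected-message queue (a sketch; the depth and field names are invented): envelopes that arrive before the matching receive is posted are parked in a small table, and each receive first searches that table. Large messages are handled separately by splitting them into maximum-size packets and reassembling them at the destination.

    #include <stdint.h>

    /* Hypothetical queue of envelopes that arrived before their
     * receive was posted; depth and fields are illustrative. */
    #define QDEPTH 8
    typedef struct { uint32_t src, tag, len; int used; } envelope;
    static envelope uq[QDEPTH];

    /* Park an early envelope; returns 0 when the table is full. */
    int uq_push(uint32_t src, uint32_t tag, uint32_t len)
    {
        for (int i = 0; i < QDEPTH; i++)
            if (!uq[i].used) {
                uq[i] = (envelope){ src, tag, len, 1 };
                return 1;
            }
        return 0;
    }

    /* On a receive: check whether the message already arrived. */
    int uq_match(uint32_t src, uint32_t tag, envelope *out)
    {
        for (int i = 0; i < QDEPTH; i++)
            if (uq[i].used && uq[i].src == src && uq[i].tag == tag) {
                *out = uq[i];
                uq[i].used = 0;
                return 1;
            }
        return 0;
    }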
18
Heterogeneity Test
Heat Equation Application / Jacobi Iterations
Observe the change of the temperature distribution
over time (a sketch of one Jacobi sweep follows).
[Figure: processing elements labelled TMD-MPI and TMD-MPE]
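A sketch of one Jacobi sweep with a halo exchange between neighbouring ranks. The 1-D row decomposition and the grid sizes are assumptions for illustration; the slide does not detail the application's actual decomposition.

    #include <mpi.h>
    #define NX    64    /* illustrative grid width    */
    #define NROWS 16    /* illustrative rows per rank */

    /* One Jacobi iteration: exchange boundary rows with the
     * neighbours, then average the four nearest neighbours. */
    void jacobi_step(double u[NROWS + 2][NX], double unew[NROWS + 2][NX],
                     int rank, int size)
    {
        MPI_Status st;
        if (rank > 0) {
            MPI_Send(u[1], NX, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD);
            MPI_Recv(u[0], NX, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD, &st);
        }
        if (rank < size - 1) {
            MPI_Recv(u[NROWS + 1], NX, MPI_DOUBLE, rank + 1, 0,
                     MPI_COMM_WORLD, &st);
            MPI_Send(u[NROWS], NX, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);
        }
        for (int i = 1; i <= NROWS; i++)
            for (int j = 1; j < NX - 1; j++)
                unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                     u[i][j - 1] + u[i][j + 1]);
    }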
19
Heterogeneity Test
Heat Equation Application / Jacobi Iterations
[Figure: processing elements labelled TMD-MPI and TMD-MPE]
20
Heterogeneity Test
MPSoC Heterogeneous Configurations (9 Processing
Elements, single FPGA)
21
Heterogeneity Test
Execution Time
22
Scalability Test
  • Heat Equation Application
  • 5 FPGAs (XC2VP100), each with 7 MicroBlazes and
    2 PowerPC405s
  • 45 processing elements in total (35 MicroBlazes
    and 10 PowerPC405s)

23
Scalability Test
Fixed-size Speedup up to 45 Processors
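For reference, the fixed-size (strong-scaling) speedup plotted here is the standard ratio of one-processor time to p-processor time, with efficiency as speedup per processor (standard definitions, not figures from the slide):

    S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}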
24
UofT TMD Prototype
25
Conclusions
  • TMD-MPI and TMD-MPE enable the parallel
    programming of heterogeneous MPSoCs across
    multiple FPGAs, including hardware engines
  • TMD-MPI hides the complexity of using
    heterogeneous links
  • The heat equation application code was executed
    on a Linux cluster and on our multi-FPGA system
    with minimal changes
  • TMD-MPI can be adapted to a particular
    architecture
  • The TMD prototype is a good platform for further
    research on MPSoC

26
References
  • [1] Arun Patel, Christopher Madill, Manuel
    Saldaña, Christopher Comis, Régis Pomès, and Paul
    Chow. A Scalable FPGA-based Multiprocessor. In
    IEEE Symposium on Field-Programmable Custom
    Computing Machines (FCCM 2006), April 2006.
  • [2] Manuel Saldaña and Paul Chow. TMD-MPI: An
    MPI Implementation for Multiple Processors Across
    Multiple FPGAs. In IEEE International Conference
    on Field-Programmable Logic and Applications
    (FPL 2006), August 2006.

27
  • Thank you!
  • (Gracias!)

28
Rendezvous Overhead
Rendezvous Synchronization Overhead
29
Testing the Functionality
TMD-MPIbench round-trip tests:
  • on-chip and off-chip communication
  • internal RAM (BRAM) and external RAM (DDR)
30
TMD-MPI Implementation
TMD-MPI communication protocols
31
Communication Tests
  • TMD-MPIbench.c measures:
  • round-trip latency (a ping-pong sketch follows
    the list)
  • bisection bandwidth
  • round trips with congestion (worst-case traffic
    scenario)
  • all-node broadcasts
  • synchronization performance (barriers/sec)
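A sketch of the round-trip (ping-pong) measurement such a benchmark typically performs; the repetition count and reporting are illustrative, not taken from TMD-MPIbench.c.

    #include <mpi.h>
    #include <stdio.h>
    #define REPS 1000   /* illustrative repetition count */

    /* Time REPS round trips between ranks 0 and 1 and report the
     * one-way latency as half the average round-trip time. */
    void pingpong(int rank, char *buf, int nbytes)
    {
        MPI_Status st;
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)
            printf("%d bytes: %f us one-way\n", nbytes,
                   0.5e6 * (MPI_Wtime() - t0) / REPS);
    }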

32
Communication Tests
  • Latency

33
Communication Tests
MicroBlaze throughput limit with external RAM
34
Communication Tests
[Chart annotations: MicroBlaze throughput limits with internal and external RAM; memory access time]
35
Communication Tests
Startup Overhead
[Chart: operating frequency and measured bandwidth @ 40 MHz, compared with a P4 cluster and a P3 NOW]
36
Communication Tests
37
Many variables are involved
38
Background: TMD-MPI
  • TMD-MPI provides a parallel programming model for
    MPSoC in FPGAs with the following features:
  • Portability - the application is unaffected by
    changes in hardware
  • Flexibility - to move from generic to
    application-specific implementations
  • Scalability - for large-scale applications
  • Reusability - no need to learn a new API for
    similar applications

39
Testing the Functionality
40
Testing the Functionality
41
New Developments: TMD-MPE
TMD-MPE usage and the network
42
Background: TMD-MPI
  • TMD-MPI:
  • is a lightweight subset of the MPI standard
  • is tailored to a particular application
  • does not require an operating system
  • has a small memory footprint (8.7 KB)
  • uses a simple protocol

43
New Developments TMD-MPE and TMD-MPI light
TMD-MPI
TMD-MPI light
TMD-MPI
TMD-MPI light