1. Optimization of Parallel Task Execution on the Adaptive Reconfigurable Group Organized Computing System
Presenter: Lev Kirischian
Department of Electrical and Computer Engineering
Ryerson Polytechnic University, Toronto, Ontario, Canada
2. Applications of parallel computing systems for data-flow tasks
- Digital signal processing (DSP)
- High-performance control
- Data acquisition
- Digital communication and broadcasting
- Cryptography and data security
- Process modeling and simulation
3. Representation of a data-flow task as a data-flow graph
[Figure: data-flow graph in which data flow from Data In through macro-operators MO 1 ... MO n to Data Out.]
MO 1 ... MO n: macro-operators, e.g. digital filtering, FFT, matrix scaling.
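A data-flow graph of this kind can be sketched in software as a dictionary of nodes, each naming its macro-operator and its predecessors, evaluated in topological order. This is a minimal illustration only, assuming single-output macro-operators; the operator functions (`moving_average`, `scale`, `offset`, `add`) and node names are hypothetical stand-ins for MO 1 ... MO n, not part of the ARGO system. It uses `graphlib.TopologicalSorter` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical macro-operators standing in for filtering, scaling, etc.
def moving_average(x, k=3):
    """FIR low-pass filter: mean over a sliding window of k samples."""
    return [sum(x[i:i + k]) / k for i in range(len(x) - k + 1)]

def scale(x, factor=2.0):
    return [factor * v for v in x]

def offset(x, delta=1.0):
    return [v + delta for v in x]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

# Data-flow graph: node -> (macro-operator, predecessor nodes in argument order).
# "in" is the Data In source; "MO4" plays the role of Data Out.
DFG = {
    "MO1": (moving_average, ["in"]),
    "MO2": (scale, ["MO1"]),
    "MO3": (offset, ["MO1"]),
    "MO4": (add, ["MO2", "MO3"]),
}

def run_dfg(dfg, data_in):
    """Evaluate every macro-operator once all of its inputs are available."""
    deps = {node: preds for node, (_, preds) in dfg.items()}
    results = {"in": data_in}
    for node in TopologicalSorter(deps).static_order():
        if node in results:          # the source node is already populated
            continue
        op, preds = dfg[node]
        results[node] = op(*(results[p] for p in preds))
    return results

print(run_dfg(DFG, [1, 2, 3, 4, 5])["MO4"])
```

Because MO2 and MO3 depend only on MO1, they are independent branches of the graph; in a custom architecture that covers this graph they could run on separate functional units at the same time.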
4. Correspondence between the task and the computing system architecture
- If the data-flow task is processed on a conventional SISD architecture, the processing time often cannot satisfy the specification requirements.
- If the task is processed on SIMD or MIMD architectures, the cost-effectiveness of these parallel computers strongly depends on the task algorithm or data structure.
- One possible way to meet the required cost-performance targets is to develop a custom computing system whose architecture covers the data-flow graph of the task.
5. Limitations of custom computing systems with a fixed architecture
1. Performance decreases if the task algorithm or data structure changes.
2. No possibility for further modernization.
3. High cost of multi-task or multi-mode custom computing systems.
6. One possible solution: reconfigurable parallel computing systems
1. Ability to configure each processing (functional) unit for a specific macro-operator.
2. Ability to configure the information links between functional units.
These features allow the hardware to be customized for any data-flow graph and reconfigured once task processing is completed.
7. Example of an FPGA-based system with its architecture configured for the data-flow task
8. Concept of the Group Processor in the reconfigurable computing system
- Group Processor (GP): a group of computing resources dedicated to the task and configured to reflect the task requirements.
9. Group Processor life cycle
1. The links and functional units of the GP are configured before task processing begins.
2. The GP performs the task for as long as necessary, without interruption or time-sharing with any other task.
3. After the task is completed, all resources included in the GP can be reconfigured for any other task.
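The three life-cycle steps can be modelled as a small state machine over a shared pool of functional units. This is a minimal sketch under stated assumptions: `ResourcePool` and `GroupProcessor` are hypothetical names, each macro-operator is assumed to occupy exactly one FU, and "configuration" is reduced to reserving FUs. Because the pool hands out disjoint sets, two GPs never share an FU, which mirrors the "no time sharing" rule.

```python
class ResourcePool:
    """Hypothetical pool of functional units (FUs) on the reconfigurable platform."""

    def __init__(self, fu_ids):
        self.free = set(fu_ids)

    def acquire(self, count):
        if count > len(self.free):
            raise RuntimeError("not enough free functional units")
        taken = set(sorted(self.free)[:count])
        self.free -= taken
        return taken

    def release(self, fus):
        self.free |= fus


class GroupProcessor:
    """Sketch of the GP life cycle: configure -> process -> release."""

    def __init__(self, pool, macro_operators):
        self.pool = pool
        self.ops = macro_operators      # one macro-operator per FU, in data-flow order
        self.fus = set()
        self.state = "idle"

    def configure(self):
        # 1. Dedicate one FU per macro-operator before processing starts.
        self.fus = self.pool.acquire(len(self.ops))
        self.state = "configured"

    def process(self, stream):
        # 2. Process the whole stream on the dedicated FUs; no other GP holds them.
        if self.state != "configured":
            raise RuntimeError("GP must be configured before processing")
        out = []
        for sample in stream:
            for op in self.ops:
                sample = op(sample)
            out.append(sample)
        return out

    def release(self):
        # 3. After task completion, return every FU to the pool for reconfiguration.
        self.pool.release(self.fus)
        self.fus = set()
        self.state = "idle"


pool = ResourcePool(["FU1", "FU2", "FU3", "FU4"])
gp1 = GroupProcessor(pool, [lambda x: 2 * x, lambda x: x + 1])   # Task 1
gp2 = GroupProcessor(pool, [lambda x: x - 3])                     # Task 2
gp1.configure()
gp2.configure()
print(gp1.process([1, 2, 3]), gp2.process([10]))
gp1.release()
print(sorted(pool.free))
```

After `gp1.release()` its two FUs return to the pool and can be handed to a new GP, while `gp2` keeps running on its own FU.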
10. The concept of the Reconfigurable Group Organized computing system
[Figure: three Functional Units (FU), each paired with an I/O port and a Reconfigurable Interface Module (RIM); also shown are the data stream on the Input/Output data bus, the Virtual Bus, the Configuration Bus and the Host PC.]
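11. Parallel processing of different tasks on separate Group Processors
[Figure: three Group Processors, GP 1 (for Task 1), GP 2 and GP 3, built from functional units FU 1 to FU 4 with their I/O ports and connected through the Virtual Bus; labeled streams: Data in 1, Data out 1, Data in 2, Data out 2, Data out 3.]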
12. Concept of adapting the Group Processor architecture to the task
- Architecture-to-task adaptation for the GP is the selection of the resource configuration that:
  - satisfies all requirements for task processing (e.g. performance, data throughput, reliability), and
  - requires minimal hardware (i.e. logic gates).
[Figure: example GP built from two Memory blocks, a Multiplier, an Adder and a Filter processing Data in, shown along a time axis T0, T1, T2.]
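The adaptation criterion (meet the timing requirement, then minimize hardware) can be written as a brute-force search over all variant combinations. This sketch assumes that the task time is the sum of the chosen variants' times (one multiply followed by one add, as in the Y = A·X + B example later in the deck); the processing times come from that example, but the gate counts and the required time of 90 ns are invented for illustration, and `select_exhaustive` is a hypothetical name.

```python
from itertools import product

# Hypothetical variant tables: resource type -> [(processing time in ns, gate count)].
# Times follow the multiplier/adder example in this deck; gate counts are invented.
VARIANTS = {
    "Multiplier": [(80, 1000), (40, 2000), (20, 4000)],
    "Adder": [(40, 100), (20, 300)],
}

def select_exhaustive(variants, t_required):
    """Try every configuration; keep the cheapest one that meets t_required.

    Assumes task time = sum of the selected variants' processing times.
    Returns ((variant numbers, time, gates) or None, number of trials).
    """
    names = list(variants)
    best = None
    trials = 0
    for choice in product(*(range(len(variants[n])) for n in names)):
        trials += 1
        time = sum(variants[n][j][0] for n, j in zip(names, choice))
        gates = sum(variants[n][j][1] for n, j in zip(names, choice))
        if time <= t_required and (best is None or gates < best[2]):
            # Variant numbers are reported 1-based, as on the slides.
            best = (dict(zip(names, (j + 1 for j in choice))), time, gates)
    return best, trials

best, trials = select_exhaustive(VARIANTS, t_required=90)
print(best, trials)
```

Exhaustive search needs one task run per configuration, i.e. m1 × m2 × ... × mn runs, which becomes impractical as the number of resources grows; the arranged Architecture Configuration Graph on the following slides is aimed at reducing that count.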
13. Virtual Hardware Objects: the resource base of the reconfigurable computing system
- In FPGA-based systems, all architecture components (resources) can be represented as Virtual Hardware Objects (VHOs) described in a hardware description language (for example, VHDL or AHDL).
- Each resource can be represented by different variants $R_{i,j}$, where $i$ indicates the resource type (adder, multiplier, interface module, etc.) and $j$ indicates the variant of that resource in the architecture (for example, 8-bit adder, 16-bit adder, etc.).
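14. Concept of the Architecture Configuration Graph (ACG)
[Figure: example ACG with one level per resource type: the Multiplier root branches into 3 Adder nodes, each Adder node into 2 Bus nodes, and each Bus node into 2 leaves, giving 12 numbered architecture configurations.]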
15. Architecture Configuration Graph arrangement
Arranging the architecture graph requires two procedures:
1. Local arrangement
2. Hierarchical arrangement
Local arrangement of the variants of each type of system resource:
[Figure: adder variants ordered by processing time, 40 ns then 20 ns.]
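Local arrangement amounts to sorting the variants $R_{i,j}$ of every resource type by processing time. In this sketch the library contents and variant names (`adder_20ns`, `mult_80ns`, ...) are hypothetical; the slowest-first order is read from the slide (40 ns before 20 ns), and it assumes that slower variants are the ones using less hardware.

```python
# Hypothetical VHO library: resource type i -> unordered variants R(i,j),
# each given as (variant name, processing time in ns).
LIBRARY = {
    "Adder": [("adder_20ns", 20), ("adder_40ns", 40)],
    "Multiplier": [("mult_40ns", 40), ("mult_20ns", 20), ("mult_80ns", 80)],
}

def local_arrangement(library):
    """Order the variants of every resource type from slowest to fastest."""
    return {rtype: sorted(variants, key=lambda v: v[1], reverse=True)
            for rtype, variants in library.items()}

arranged = local_arrangement(LIBRARY)
for rtype, variants in arranged.items():
    # Number the variants j = 1..m_i in arranged order: R(i,1) is the slowest.
    print(rtype, [(j, name) for j, (name, _) in enumerate(variants, start=1)])
```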
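16. Hierarchical arrangement of system resources
Arrangement criterion:
$$K(R_i) = \frac{T_{max}(R_i) - T_{min}(R_i)}{m_i - 1}$$
where $m_i$ is the number of variants of resource $R_i$, and $T_{max}(R_i)$ and $T_{min}(R_i)$ are the longest and shortest task processing times obtained when only the variant of $R_i$ is changed.
Example with multiplier variants of 80, 40 and 20 ns and adder variants of 40 and 20 ns (task times in ns):
$$K(\text{Mult}) = \frac{120 - 60}{3 - 1} = 30, \qquad K(\text{Adder}) = \frac{120 - 100}{2 - 1} = 20$$
[Figure: two ACGs for this example. With the Adder on the top level, the leaf processing times of configurations 1 to 6 are 120, 80, 60, 100, 60, 40 ns; with the Multiplier (larger K) on the top level, they are 120, 100, 80, 60, 60, 40 ns.]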
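The K criterion and its effect on leaf order can be reproduced in a few lines. The sketch assumes an additive time model (one multiply followed by one add) and that $T_{max}$ and $T_{min}$ are measured while the other resources stay at their slowest variant; under those assumptions it reproduces the slide's values (K = 30 and 20) and both leaf sequences. Function names are hypothetical.

```python
from itertools import product

# Variant processing times (ns) after local arrangement (slowest first),
# taken from the multiplier/adder example on this slide.
VARIANTS = {"Multiplier": [80, 40, 20], "Adder": [40, 20]}

def task_time(config):
    """Assumed model: Y = A*X + B runs one multiply then one add, so times add."""
    return sum(config.values())

def k_criterion(variants):
    """K(R_i) = (Tmax(R_i) - Tmin(R_i)) / (m_i - 1), varying only R_i while the
    other resources stay at their slowest (first) variant."""
    base = {r: ts[0] for r, ts in variants.items()}
    k = {}
    for r, ts in variants.items():
        times = [task_time({**base, r: t}) for t in ts]
        k[r] = (max(times) - min(times)) / (len(ts) - 1)
    return k

def leaf_times(variants, order):
    """Processing times at the ACG leaves, with resource order[0] on the top level."""
    levels = [variants[r] for r in order]
    return [sum(combo) for combo in product(*levels)]

k = k_criterion(VARIANTS)
order = sorted(VARIANTS, key=k.get, reverse=True)   # larger K -> higher level
print(k, order)
print(leaf_times(VARIANTS, ["Adder", "Multiplier"]))
print(leaf_times(VARIANTS, order))
```

Placing the resource with the larger K on the higher level makes the leaf times non-increasing from left to right, which is what allows the ordered search on the next slide.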
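17. Selection of the Group Processor architecture based on the arranged ACG
The required processing time for the task $Y = A \cdot X + B$ is $T$.
[Figure: arranged ACG with the Multiplier (80, 40, 20 ns) on the top level and the Adder (40, 20 ns) below it; configurations 1 to 6 have processing times of 120, 100, 80, 60, 60, 40 ns. The configuration that meets the required performance gives the GP architecture Multiplier (2), Adder (1).]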
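Since the leaf times of the arranged ACG are non-increasing, the selection can be done as a binary search for the first leaf that meets $T$. This is a sketch under assumptions: the required time of 90 ns is hypothetical (the slide does not give $T$), `run_task` is a hypothetical stand-in for configuring the platform and measuring one task run, and the first qualifying leaf is taken to be the one with the least hardware because faster variants are assumed to cost more gates.

```python
from itertools import product

# Arranged ACG from the multiplier/adder example: Multiplier on the top level,
# variants listed slowest first (times in ns).
ORDER = ["Multiplier", "Adder"]
VARIANTS = {"Multiplier": [80, 40, 20], "Adder": [40, 20]}

# Leaves in ACG order: (1-based variant numbers, processing time in ns).
LEAVES = [
    (dict(zip(ORDER, (j + 1 for j in idx))),
     sum(VARIANTS[r][j] for r, j in zip(ORDER, idx)))
    for idx in product(*(range(len(VARIANTS[r])) for r in ORDER))
]

def select_gp(leaves, t_required, run_task):
    """Binary search for the first leaf whose measured time meets t_required.

    Each call to run_task counts as one experiment (one task run).
    """
    lo, hi, experiments = 0, len(leaves) - 1, 0
    found = None
    while lo <= hi:
        mid = (lo + hi) // 2
        experiments += 1
        if run_task(leaves[mid]) <= t_required:
            found, hi = mid, mid - 1    # meets T: look for a cheaper leaf to the left
        else:
            lo = mid + 1                # too slow: move towards faster leaves
    return (leaves[found][0] if found is not None else None), experiments

# Hypothetical requirement T = 90 ns; the "measurement" reads the model time.
choice, n_runs = select_gp(LEAVES, 90, run_task=lambda leaf: leaf[1])
print(choice, n_runs)
```

With six leaves the search needs at most three task runs instead of six, and for T = 90 ns it returns Multiplier (2), Adder (1), the same GP architecture shown on the slide.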
18. Number of experiments for GP-architecture selection
$$N(GP_{opt}) = (n + 1) + \log_2(m_1 m_2 \cdots m_n)$$
where $n$ is the number of resources (VHOs) included in the architecture of the Group Processor and $m_i$ is the number of variants of each type of resource.
Example: if $n = 16$ and $m_1 = m_2 = \cdots = m_n = 32$, the total number of experiments (task runs on an estimated GP architecture) is
$$N(GP_{opt}) = (16 + 1) + 16 \cdot 5 = 97$$
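The experiment count is easy to check numerically, together with the number of runs an exhaustive search over all configurations would need. The function names below are hypothetical; the formula and the example values are those on the slide.

```python
import math

def experiments_acg(m):
    """N(GP_opt) = (n + 1) + log2(m_1 * m_2 * ... * m_n) for variant counts m."""
    n = len(m)
    return (n + 1) + math.log2(math.prod(m))

def experiments_exhaustive(m):
    """Runs needed to try every configuration: m_1 * m_2 * ... * m_n."""
    return math.prod(m)

m = [32] * 16                         # n = 16 resources, 32 variants each
print(experiments_acg(m))
print(experiments_exhaustive(m) == 2 ** 80)
```

For the slide's example the arranged ACG needs 97 task runs, whereas trying every configuration would need $32^{16} = 2^{80}$ runs.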
19. Self-adaptation mechanism for FPGA-based reconfigurable data-flow computing systems
[Figure: block diagram with the components Host PC, Data Source, Architecture Generator, Library of Virtual Hardware Objects, Configuration Bus, Reconfigurable Platform, Architecture Selector and Performance Analyzer.]
20. First prototype of the Adaptive Reconfigurable Group Organized (ARGO) computing platform
21. Data-flow graph for DVB MPEG-2 processing
[Figure: data-flow graph with the nodes Input data stream (MPEG-2), Synchro-signal detection, PCR detection, Null-packet analysis and removal, Reference frequency, Output frequency adjustment, PCR re-stamping, and Output MPEG-2 data stream.]
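The first three nodes of this graph can be illustrated in software using standard MPEG-2 transport stream fields: 188-byte packets, the sync byte 0x47, the 13-bit PID (0x1FFF marks null packets) and the PCR carried in the adaptation field as a 33-bit base and a 9-bit extension (PCR = base × 300 + extension, in 27 MHz ticks). This is a minimal sketch, not the ARGO hardware implementation; the function names and the `make_packet` test helper are hypothetical, and output frequency adjustment and PCR re-stamping are omitted.

```python
TS_PACKET = 188      # MPEG-2 transport stream packet size in bytes
SYNC_BYTE = 0x47     # first byte of every TS packet
NULL_PID = 0x1FFF    # PID reserved for null (stuffing) packets

def find_sync(data, confirm=3):
    """Synchro-signal detection: first offset where `confirm` consecutive
    packet starts all hold the sync byte."""
    for off in range(TS_PACKET):
        if all(off + k * TS_PACKET < len(data)
               and data[off + k * TS_PACKET] == SYNC_BYTE
               for k in range(confirm)):
            return off
    return None

def pid_of(pkt):
    """13-bit packet identifier from header bytes 1-2."""
    return ((pkt[1] & 0x1F) << 8) | pkt[2]

def pcr_of(pkt):
    """PCR detection: PCR in 27 MHz ticks, or None if the packet carries none."""
    afc = (pkt[3] >> 4) & 0x3            # adaptation_field_control
    if afc not in (2, 3) or pkt[4] < 7 or not (pkt[5] & 0x10):   # PCR_flag
        return None
    b = pkt[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    ext = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + ext

def process_stream(data):
    """Return (stream without null packets, list of (PID, PCR) pairs)."""
    off = find_sync(data)
    if off is None:
        raise ValueError("no MPEG-2 TS sync found")
    kept, pcrs = [], []
    for start in range(off, len(data) - TS_PACKET + 1, TS_PACKET):
        pkt = data[start:start + TS_PACKET]
        if pkt[0] != SYNC_BYTE or pid_of(pkt) == NULL_PID:
            continue                     # skip out-of-sync and null packets
        pcr = pcr_of(pkt)
        if pcr is not None:
            pcrs.append((pid_of(pkt), pcr))
        kept.append(pkt)
    return b"".join(kept), pcrs

def make_packet(pid, pcr=None):
    """Hypothetical test helper: build a synthetic 188-byte TS packet."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF])
    if pcr is None:
        return header + bytes([0x10]) + bytes(TS_PACKET - 4)   # payload only
    base, ext = divmod(pcr, 300)
    af = bytes([7, 0x10,                                       # AF length, PCR_flag
                (base >> 25) & 0xFF, (base >> 17) & 0xFF,
                (base >> 9) & 0xFF, (base >> 1) & 0xFF,
                ((base & 1) << 7) | 0x7E | (ext >> 8), ext & 0xFF])
    return header + bytes([0x30]) + af + bytes(TS_PACKET - 4 - len(af))

stream = (b"\x00\x01\x02\x03\x04"        # junk before the first sync byte
          + make_packet(0x100, pcr=54_000_123) + make_packet(NULL_PID)
          + make_packet(0x101) + make_packet(NULL_PID))
out, pcrs = process_stream(stream)
print(len(out) // TS_PACKET, pcrs)
```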
22. Architecture selection time for the 6-mode DVB MPEG-2 stream processor
1. Average time for each architecture configuration: 7.18 ms
2. Average time for GP-architecture selection (for a specific mode): 175.6 ms
3. Total time for architecture selection for all modes: 1.054 s
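The three figures are mutually consistent: six modes at the average selection time give the stated total, and, assuming the selection time is dominated by configuring candidate architectures, each selection corresponds to roughly two dozen configuration trials (an inference from the figures, not a number reported on the slide):

$$6 \times 175.6\ \text{ms} = 1053.6\ \text{ms} \approx 1.054\ \text{s}, \qquad \frac{175.6\ \text{ms}}{7.18\ \text{ms}} \approx 24.5$$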
23. Hardware implementation of the DVB MPEG-2 stream processor for modes 1 and 4
- Input data: MPEG-2 stream
- FU 1 (8-bit input port): Synchro-signal detection, PCR detection, Null-packet analysis and removal
- Virtual bus (16 lines)
- FU 2 (output port): Output frequency adjustment, Reference frequency, PCR re-stamping
- Output MPEG-2 data stream
24. Hardware implementation of the DVB MPEG-2 stream processor for modes 2, 3, 5 and 6
- Input data: MPEG-2 stream
- FU 1 (8-bit input port): Synchro-signal detection, PCR detection, Null-packet analysis and removal
- Virtual bus (16 lines)
- FU 2: Reference frequency, Output frequency adjustment
- FU 3 (output port): PCR re-stamping
- Output MPEG-2 data stream
25. Summary
1. The Adaptive Reconfigurable Group Organized (ARGO) parallel computing system is an FPGA-based configurable system that can adapt to the task algorithm and data structure.
2. The ARGO system allows different data-flow tasks to be processed in parallel on dynamically configured Group Processors (GPs), where each GP-architecture configuration corresponds to the algorithm and data specifics of the task assigned to that processor.
3. These principles allow the development of cost-effective parallel computing systems with programmable performance and reliability, at minimum hardware cost and development time.