Title: Reconfigurable Architecture Exploration for Speeding up Execution of Control Code Generated from Hig
1. Reconfigurable Architecture Exploration for Speeding up Execution of Control Code Generated from High-Level Specifications
- Frank Gennari and Yatish Patel
- Mentors: William Y. Jiang, Luciano Lavagno, and Massimo Baleani
2. Project Goals
- Evaluate EFSM runtime performance on a traditional embedded processor
- Evaluate various reconfigurable architectures to improve performance
- Develop an algorithm to map an EFSM to the new architecture
- Simulate and profile the results
3. Ideas: FSM → C
- Reconfigurable co-processor architecture (FPGA + µ-controller) or custom integrated FPGA as a functional unit
- Automatic code generation (C code + FPGA control blocks)
4. Add a new functional unit
[Figure: processor datapath (PC, instruction memory, registers, ALU, data memory) with an FPGA added as a new functional unit]
Example: Tensilica Xtensa
5. FPGA as a Co-processor
[Figure: block diagram showing the processor, I-cache, D-cache, and memory]
Example: GARP
6. FPGA as Functional Unit
- Decreased communication overhead to transfer data to the FPGA
- A co-processor requires n cycles per value sent to the FPGA plus m cycles per value output from the FPGA, compared to 1 cycle to access all values directly from the register file
- Possible synchronization issues with multiple clock domains
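The overhead comparison above can be written as a small cost model. This is a minimal sketch, not something taken from the slides: the function names `coproc_transfer_cycles` and `fu_transfer_cycles` are hypothetical, and any values chosen for n and m are placeholders rather than measured numbers.

```c
/* Hypothetical cost model; names and any sample values are assumptions.
 * Co-processor: n cycles per value sent to the FPGA plus m cycles per
 * value read back from it. */
static unsigned coproc_transfer_cycles(unsigned n, unsigned m,
                                       unsigned inputs, unsigned outputs)
{
    return n * inputs + m * outputs;
}

/* Functional unit: all operands are read directly from the register
 * file, modeled here as a single cycle regardless of operand count. */
static unsigned fu_transfer_cycles(void)
{
    return 1;
}
```

For an operation with two inputs and one output, `coproc_transfer_cycles(n, m, 2, 1)` evaluates to 2n + m, while the functional-unit case stays at one cycle under this model.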
7. Modified GCC Tool Set
- Generates MIPS R3000 assembly
- Contains support for special FPGA instructions
- Cycle count specified in instruction
- Processor simulator to generate performance statistics
- Cycle counts for each line of C
8. Specifying FPGA Instructions
- Insert pragmas around blocks that will be implemented on the FPGA

    int bar (int a, int b)
    {
        int c;

        #pragma fpga shift_add 0x12 5 1 2 c a b
        c = (a << 2) + b;
        #pragma end

        return c + a;
    }
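The pragma header in the example above appears to follow a fixed field order. The sketch below parses such a header under an assumed layout: instruction name, opcode, cycle count, number of outputs, number of inputs, then the output names followed by the input names. That layout is inferred from the examples in these slides and is not documented in them. The struct `fpga_pragma`, the function `parse_fpga_pragma`, and the size limits are hypothetical, and the standard C `#pragma` spelling is assumed.

```c
#include <stdio.h>
#include <string.h>

#define MAX_OPERANDS 8
#define MAX_NAME 32

/* Assumed layout of the header line (inferred, not documented):
 *   #pragma fpga <name> <opcode> <cycles> <n_out> <n_in> <outs...> <ins...> */
struct fpga_pragma {
    char name[MAX_NAME];
    unsigned opcode;
    unsigned cycles;
    unsigned n_out, n_in;
    char operands[MAX_OPERANDS][MAX_NAME]; /* outputs first, then inputs */
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
static int parse_fpga_pragma(const char *line, struct fpga_pragma *p)
{
    int used = 0;

    /* Fixed fields; %x accepts an optional 0x prefix, %n records how
     * many characters were consumed so far. */
    if (sscanf(line, " #pragma fpga %31s %x %u %u %u%n",
               p->name, &p->opcode, &p->cycles, &p->n_out, &p->n_in,
               &used) != 5)
        return -1;

    /* Reject operand counts that would overflow the operand table. */
    if (p->n_out > MAX_OPERANDS || p->n_in > MAX_OPERANDS - p->n_out)
        return -1;

    /* Variable-length tail: one name per declared output and input. */
    const char *rest = line + used;
    for (unsigned i = 0; i < p->n_out + p->n_in; i++) {
        int n = 0;
        if (sscanf(rest, " %31s%n", p->operands[i], &n) != 1)
            return -1;
        rest += n;
    }
    return 0;
}
```

Under this reading, the `shift_add` header would describe opcode 0x12, a 5-cycle latency, one output (`c`), and two inputs (`a`, `b`).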
9. Optimal Mapping
- What do we map on the FPGA vs. the MCU?
- All control implemented on FPGA
- All data implemented on FPGA
- Some combination?
10. Tool Flow
[Figure: tool flow diagram with nodes Esterel, strl2shift, DC, Shift, Shift v6.0, POLIS, BLIF-MV, Cadence, MVSIS, GCC, and C]
11. MVSIS Node Types
[Figure: MVSIS node types, with edges labeled D and MV; nodes include Data, C, P, and Mux, with example expressions I + 1, I > J, C = Sel ? A : B, and C = A + B]
12. Clubbing Nodes Together
[Figure: graph of control nodes and data nodes between primary inputs and primary outputs, annotated with delay and area]
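The slides do not spell out the clubbing algorithm, so the following is only a hypothetical greedy sketch of the general idea: walk a chain of nodes in topological order and group consecutive nodes into one club while the club's total area stays within a LUT budget. The `node` record, its fields, and `club_chain` are assumptions made for illustration, not the authors' data model or method.

```c
#include <stddef.h>

/* Hypothetical node record; fields are assumptions for illustration. */
enum node_kind { CONTROL_NODE, DATA_NODE };

struct node {
    enum node_kind kind;
    unsigned area;   /* estimated LUT count */
    unsigned delay;  /* estimated delay; not used by this simple policy */
};

/* Greedy clubbing over a chain of nodes given in topological order.
 * A node joins the current club while the club's total area stays
 * within lut_budget; otherwise a new club is opened. A node larger
 * than the budget ends up alone in its own club.
 * club_of[i] receives the club index of node i; returns the club count. */
static size_t club_chain(const struct node *nodes, size_t count,
                         unsigned lut_budget, size_t *club_of)
{
    size_t clubs = 0;
    unsigned area = 0;

    for (size_t i = 0; i < count; i++) {
        if (clubs == 0 || area + nodes[i].area > lut_budget) {
            clubs++;     /* open a new club */
            area = 0;
        }
        area += nodes[i].area;
        club_of[i] = clubs - 1;
    }
    return clubs;
}
```

A more realistic policy would also weigh delay and the control/data distinction when deciding where to cut; this version uses area alone to keep the sketch short.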
13. Simulator Output
0 / I97 AUX_ACT_322_0_605_0_0_0 / 0
_N97_L0 / mid423 / 2737 if (!(0x1 I94)) 0 goto _N97_L2 0 0
_N97_L1 / mid734 / 0 I97 0 1000 goto _N97_END 0
_N97_L2 / mid426 / 1315 if (!(0x1 I96)) 0 goto _N97_L1 0 0
_N97_L3 / mid734 / 0 I97 1 0 goto _N97_END 0
_N97_END 0 / I98 AUX_ACT_320_0_600_0_0_0 /
14. MVSIS Output
#pragma fpga FPGA_OP_4 0x4 1 2 2 I8 I9 state_V__225 state_V__227
/ I8 exp859_out /
I8state_V__225 1
/ I9 exp886_out /
I9state_V__227 1
#pragma end
15. Source of Sample Example
- HW/SW Co-Design of a Multi-Injection Driver for Engine Control Systems
- Alessandra Nardi and Fan Mo, Fall 1999 EE249 Project
- Describes basic behavior of a driver for engine control systems
- Fuel injection profiles
16. Results
17. Improvements
- Bit Packing
- Possible better clubbing algorithm
- Constant propagation in control/data blocks
- Better automated LUT count generation
18. Conclusion
- Everything boils down to the cost/performance ratio
- An architecture using an FPGA functional unit fits well in this context
- Bottlenecks seem to be in the data portion; however, data nodes require more LUTs