Title: Designing a Runtime Reconfigurable Processor for General Purpose Applications
- Niyonkuru and Zeidler
- Universität der Bundeswehr Hamburg
Basic Goal
- An easy-to-use runtime reconfigurable processor based on dynamic reconfiguration
- General purpose: runs any application with comparable performance
Related Work
- Fixed conventional microarchitecture with reconfigurable processing units
- Act as a coprocessor
- Programs have to be partitioned
- One code portion executed on conventional path
- The other executed on reconfigurable path
- Programmers have to either
- Invoke precompiled HW libraries, or
- Implement the HW themselves
- PRISC, DISC, CoMPARE, GARP, OneChip, etc.
Related Work (contd.)
- The SCORE execution model
- Based on three essential components
- Compute page
- Memory segment
- Stream link
- An application is partitioned into consecutive operators (multipliers, FFT, FIR filter, etc.)
- A conventional processor is required to sequence the compute pages
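The SCORE idea of a conventional processor sequencing compute pages can be illustrated with a toy model. This is a simplified sketch, not the SCORE runtime. It assumes three things: each operator (compute page) maps one input token to one output token, the array holds only physical_pages pages at a time, and a memory segment buffers the whole intermediate stream between page groups. The function name run_score is hypothetical.

```python
from collections import deque
from typing import Callable, List, Tuple

def run_score(operators: List[Callable[[int], int]],
              data: List[int],
              physical_pages: int) -> Tuple[List[int], int]:
    """Toy SCORE-style sequencer (hypothetical, greatly simplified).

    Returns the output stream and how many times the sequencer loaded
    a new group of compute pages onto the array.
    """
    stream = deque(data)              # memory segment holding the input stream
    loads = 0
    for first in range(0, len(operators), physical_pages):
        # The conventional processor loads the next group of compute pages.
        resident = operators[first:first + physical_pages]
        loads += 1
        out = deque()
        while stream:
            token = stream.popleft()
            for page in resident:     # stream links between resident pages
                token = page(token)
            out.append(token)
        stream = out                  # buffered in a memory segment for the next group
    return list(stream), loads

result, loads = run_score([lambda x: x * 2, lambda x: x + 1, lambda x: x * x],
                          [1, 2, 3], physical_pages=2)
print(result, loads)  # [9, 25, 49] 2
```

With physical_pages=3, the same three-operator application fits on the array at once and needs only one load. A smaller array pays for its reduced area with extra page loads and buffering.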
Related Work (contd.)
- The MATRIX architecture
- A flexible architecture that allows a microarchitecture to be defined for each application
- Basic functional units (BFUs)
- Each contains local memory, an 8-bit ALU, and control logic
- Can be configured as instruction memory, data memory, a datapath element, or a control element
- Hierarchical BFU interconnection
- Hand-coded mapping of algorithms onto BFUs and connection setup
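Two MATRIX ideas can be sketched together: a BFU whose role is fixed by configuration, and the hand-coded mapping of a wider operation onto 8-bit BFUs. This is a behavioral toy, not MATRIX's actual interface. The class and function names are hypothetical, and the 256-word local memory is an assumed size.

```python
from enum import Enum

class Role(Enum):
    INSTRUCTION_MEMORY = 1
    DATA_MEMORY = 2
    DATAPATH = 3
    CONTROL = 4

class BFU:
    """Toy MATRIX-style basic functional unit (hypothetical model)."""

    def __init__(self, role: Role, mem_words: int = 256):  # memory size is an assumption
        self.role = role
        self.mem = [0] * mem_words    # local memory, one 8-bit value per word

    def add(self, a: int, b: int, carry_in: int = 0):
        """8-bit ALU add; returns (sum, carry_out)."""
        if self.role is not Role.DATAPATH:
            raise RuntimeError("BFU is not configured as a datapath element")
        total = (a & 0xFF) + (b & 0xFF) + carry_in
        return total & 0xFF, total >> 8

def add16(lo_bfu: BFU, hi_bfu: BFU, x: int, y: int) -> int:
    """Hand-coded mapping: a 16-bit add spread across two 8-bit BFUs."""
    lo, carry = lo_bfu.add(x & 0xFF, y & 0xFF)
    hi, _ = hi_bfu.add((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)
    return (hi << 8) | lo

lo, hi = BFU(Role.DATAPATH), BFU(Role.DATAPATH)
print(hex(add16(lo, hi, 0x12FF, 0x0001)))  # 0x1300
```

The carry wire between the two units stands in for a connection the programmer must set up by hand.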
Related Work (contd.)
- Flexible instruction processors (FIPs)
- Processor templates that allow processor types to be configured dynamically through predefined parameters
- Adapt the implementation to the application during execution
- Application behavior is characterized by runtime statistics, which are used to determine a suitable microarchitecture
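The step of turning runtime statistics into template parameters can be illustrated with a small mapping function. The parameter names, the thresholds, and FIPParams itself are hypothetical. They show the shape of the decision, not the parameters used by actual FIPs.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class FIPParams:
    """Hypothetical parameters of a processor template."""
    hw_multiplier: bool   # include a hardware multiplier?
    pipeline_depth: int   # number of pipeline stages
    data_cache_kb: int    # data cache size

def choose_params(stats: Dict[str, float]) -> FIPParams:
    """Map runtime statistics (fractions of executed instructions) to parameters.

    All thresholds are illustrative, not taken from the FIP work.
    """
    mul_frac = stats.get("mul", 0.0)
    mem_frac = stats.get("load", 0.0) + stats.get("store", 0.0)
    branch_frac = stats.get("branch", 0.0)
    return FIPParams(
        hw_multiplier=mul_frac > 0.05,
        pipeline_depth=3 if branch_frac > 0.2 else 5,   # shallower pipeline for branchy code
        data_cache_kb=16 if mem_frac > 0.3 else 4,
    )

profile = {"mul": 0.10, "load": 0.25, "store": 0.10, "branch": 0.05}
print(choose_params(profile))
# FIPParams(hw_multiplier=True, pipeline_depth=5, data_cache_kb=16)
```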
Related Work (contd.)
- Complexity Adaptive Processor (CAP)
- HW complexity and processor clock cycle adapted at runtime
- Augments conventional HW with partition-enable signals that turn HW partitions on/off
- Reduced power dissipation
- Improved performance
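The CAP trade-off is that enabling more partitions can save cycles, but it also lengthens the clock period. The best setting minimizes cycles × cycle time. The cycle counts and the cycle-time model below are hypothetical numbers chosen only to illustrate the calculation.

```python
from typing import Callable, Dict

# Hypothetical profile of one program phase: cycles needed when n partitions
# of an adaptive structure (e.g. a cache) are enabled.
CYCLES: Dict[int, int] = {1: 1000, 2: 700, 3: 600, 4: 580}

def cycle_time_ns(n: int) -> float:
    """Illustrative model: each extra enabled partition lengthens the clock period."""
    return 1.0 + 0.15 * (n - 1)

def best_partitions(cycles: Dict[int, int], cycle_time: Callable[[int], float]) -> int:
    """Number of enabled partitions that minimizes cycles x cycle time."""
    return min(cycles, key=lambda n: cycles[n] * cycle_time(n))

def enable_mask(n: int) -> int:
    """Partition-enable signals: bit k high means partition k is switched on."""
    return (1 << n) - 1

n = best_partitions(CYCLES, cycle_time_ns)
print(n, bin(enable_mask(n)))  # 3 0b111
```

In this made-up phase, the four settings take about 1000, 805, 780, and 841 ns. Enabling all four partitions saves cycles, but the slower clock loses more than it gains. The partition left switched off also reduces power dissipation.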
Related Work (contd.)
- All of the above use HW reconfiguration to enhance performance
- However, they require different HW/SW tools to map applications onto them
- The authors' approach: an enhanced runtime reconfigurable architecture compatible with existing general-purpose processors
Designing a Partial Runtime Reconfigurable Processor
- Choice of a suitable device
- SRAM-based programmable device: Xilinx Virtex-II FPGA
- Practical design flow: Xilinx module-based partial reconfiguration design flow
- Appropriate processor architecture
- 16-bit ARM Thumb ISA
- The existing software development tool chain can be used directly
Proposed Microarchitecture
Proposed Microarchitecture (contd.)
- Instruction Memory (IM)
- Uses dual-port block SelectRAM to get a 64-bit fetch bandwidth
- Four 16-bit instructions can be fetched per cycle
- Fetch Unit / Predecoder (FU/P)
- Provides a valid instruction address to the IM or trace cache
- Fetches from the IM at program start or on a trace cache miss
- Fetches from the trace cache on a trace cache hit
- After opcode predecode, notifies the configuration manager of the execution units needed
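The 64-bit fetch and the FU/P predecode step can be sketched as follows. Two things are assumptions rather than details from the slides: the instructions are packed little-endian within the 64-bit word, and predecode only needs a coarse EU class per instruction. The bit masks follow the ARMv4T Thumb encodings, but the classification is deliberately simplified (for example, BX is counted as ALU). The function names are hypothetical.

```python
from collections import Counter
from typing import List

def split_fetch_word(word64: int) -> List[int]:
    """Split one 64-bit IM fetch into four 16-bit Thumb instructions.

    Assumes little-endian packing: the lowest-addressed instruction is bits 15..0.
    """
    return [(word64 >> (16 * i)) & 0xFFFF for i in range(4)]

def predecode(insn: int) -> str:
    """Coarse opcode predecode into an EU class (simplified ARMv4T Thumb)."""
    if (insn & 0xFFC0) == 0x4340:            # MUL: ALU-operation format, op = 1101
        return "MDU"
    if ((insn & 0xF800) == 0x4800            # PC-relative load
            or (insn & 0xF000) == 0x5000     # load/store with register offset
            or (insn & 0xE000) == 0x6000     # load/store with immediate offset
            or (insn & 0xE000) == 0x8000     # halfword and SP-relative load/store
            or (insn & 0xF600) == 0xB400     # PUSH/POP
            or (insn & 0xF000) == 0xC000):   # LDMIA/STMIA
        return "LSU"
    if (insn >> 12) >= 0xD:                  # B, Bcc, BL and SWI, lumped together
        return "BR"
    return "ALU"                             # everything else, including BX

def units_needed(word64: int) -> Counter:
    """What the FU/P would report to the configuration manager for one fetch."""
    return Counter(predecode(i) for i in split_fetch_word(word64))

# ADD r0, r1, r2 / MUL r0, r1 / LDR r0, [r1] / B .  (lowest address first)
word = 0xE7FE680843481888
print([hex(i) for i in split_fetch_word(word)])  # ['0x1888', '0x4348', '0x6808', '0xe7fe']
print(dict(units_needed(word)))  # {'ALU': 1, 'MDU': 1, 'LSU': 1, 'BR': 1}
```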
Proposed Microarchitecture (contd.)
- Trace Cache (TC)
- Originally introduced to avoid the instruction-supply bottleneck
- Used in this paper to determine the HW resources required at runtime
- Decoder
- Acts as a conventional instruction decoder: decodes instructions, reads operands from the register file, and sends them to the RUU
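The trace cache here is used to determine the HW resources required at runtime. The hedged sketch below shows one way that could work: each trace line stores a count of the EU classes its instructions use, so a hit delivers the resource demand along with the instructions. The LRU replacement policy, the capacity, and all names are assumptions.

```python
from collections import Counter, OrderedDict
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Trace:
    insns: List[Tuple[str, str]]   # (mnemonic, EU class) along one dynamic path
    eu_demand: Counter             # how many instructions need each EU class

class TraceCache:
    """Hypothetical LRU trace cache keyed by trace start address."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.lines = OrderedDict()         # start address -> Trace

    def lookup(self, pc: int) -> Optional[Trace]:
        trace = self.lines.get(pc)
        if trace is not None:
            self.lines.move_to_end(pc)     # hit: mark as most recently used
        return trace                       # None signals a miss (fetch from IM)

    def fill(self, pc: int, insns: List[Tuple[str, str]]) -> Trace:
        """Build a trace after a miss and record its EU demand."""
        trace = Trace(insns, Counter(eu for _, eu in insns))
        self.lines[pc] = trace
        self.lines.move_to_end(pc)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False) # evict the least recently used trace
        return trace

tc = TraceCache(capacity=2)
print(tc.lookup(0x100))  # None (miss at program start)
trace = tc.fill(0x100, [("add", "ALU"), ("mul", "MDU"), ("ldr", "LSU"), ("add", "ALU")])
print(dict(trace.eu_demand))  # {'ALU': 2, 'MDU': 1, 'LSU': 1}
```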
Proposed Microarchitecture (contd.)
- Register Update Unit (RUU)
- Collects decoded instructions and dispatches them to the different execution units
- Instruction queue stores instructions from the decoder
- Dependency buffer keeps track of dependencies
- Allows out-of-order execution with in-order completion
- Input vectors from the CM indicate the number of execution units available
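The RUU behavior listed above can be illustrated with a toy cycle-level model: out-of-order issue, in-order completion, and EU counts supplied by the CM. The model makes these assumptions:
- The instruction window is unbounded.
- Execution units are fully pipelined.
- A result is usable latency cycles after issue.
- Only RAW dependences are modeled, since the RUU buffers results until in-order commit.

The names Entry, operands_ready, and simulate are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Entry:
    tag: int                  # program-order sequence number
    eu: str                   # execution-unit class, e.g. "ALU", "MDU", "LSU"
    dst: int                  # destination register
    srcs: Tuple[int, ...]     # source registers
    latency: int              # cycles until the result is available
    issued: bool = False
    done_at: int = -1         # cycle at which the result becomes available

def operands_ready(entries: List[Entry], i: int, head: int, cycle: int) -> bool:
    """RAW check against the nearest older, still-uncommitted producer."""
    for src in entries[i].srcs:
        for older in reversed(entries[head:i]):
            if older.dst == src:
                if not (older.issued and older.done_at <= cycle):
                    return False
                break         # only the most recent writer matters
    return True               # committed producers are already in the register file

def simulate(entries: List[Entry], eus_available: Dict[str, int]) -> List[Tuple[int, int]]:
    """Toy RUU: out-of-order issue, in-order commit.

    eus_available stands in for the CM input vector, e.g. {"ALU": 2, "MDU": 1}.
    Returns (cycle, tag) commit events.
    """
    if any(eus_available.get(e.eu, 0) == 0 for e in entries):
        raise ValueError("current configuration lacks a required execution unit")
    cycle, head, commits = 0, 0, []
    while head < len(entries):
        free = dict(eus_available)            # pipelined units: counts reset each cycle
        for i in range(head, len(entries)):   # oldest-first issue
            e = entries[i]
            if not e.issued and free[e.eu] > 0 and operands_ready(entries, i, head, cycle):
                e.issued, e.done_at = True, cycle + e.latency
                free[e.eu] -= 1
        # Commit finished instructions strictly in program order.
        while head < len(entries) and entries[head].issued and entries[head].done_at <= cycle:
            commits.append((cycle, entries[head].tag))
            head += 1
        cycle += 1
    return commits

prog = [Entry(0, "MDU", 1, (2, 3), 3),   # MUL r1 <- r2, r3
        Entry(1, "ALU", 4, (1, 5), 1),   # ADD r4 <- r1, r5 (depends on the MUL)
        Entry(2, "ALU", 6, (7, 8), 1)]   # ADD r6 <- r7, r8 (independent)
print(simulate(prog, {"ALU": 1, "MDU": 1}))  # [(3, 0), (4, 1), (4, 2)]
```

The independent ADD (tag 2) issues in cycle 0, ahead of the dependent ADD. It still commits only after tag 1, which is the out-of-order execution, in-order completion behavior.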
Proposed Microarchitecture (contd.)
- Config1 / Config2 / Config3
- Execution units: Int-ALU, Int-MDU, LSU, etc.
- Each configuration provides a set of EUs with a fixed number of each type
- The number of EUs is changed dynamically by loading different configurations
- Configuration Manager (CM)
- Stores predefined configurations
- Performs configuration swapping dynamically
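The slides do not say how the configuration manager chooses among its predefined configurations. The policy below is an assumption for illustration: estimate a resource-bound issue time for the demanded EU mix (for example, as reported after predecode) and swap only on a strict improvement. The EU counts in Config1..Config3 are hypothetical, and the returned counts play the role of the input vector sent to the RUU.

```python
import math
from typing import Dict, Tuple

# Hypothetical EU mixes: the slides name Config1..Config3 but not their contents.
CONFIGS: Dict[str, Dict[str, int]] = {
    "Config1": {"ALU": 4, "MDU": 1, "LSU": 1},
    "Config2": {"ALU": 2, "MDU": 2, "LSU": 1},
    "Config3": {"ALU": 2, "MDU": 1, "LSU": 2},
}

def issue_bound(demand: Dict[str, int], units: Dict[str, int]) -> float:
    """Resource lower bound on issue cycles: the busiest EU class dominates."""
    bound = 0.0
    for eu, n in demand.items():
        if n == 0:
            continue
        if units.get(eu, 0) == 0:
            return math.inf                   # configuration cannot run this code
        bound = max(bound, math.ceil(n / units[eu]))
    return bound

class ConfigurationManager:
    def __init__(self, configs: Dict[str, Dict[str, int]], initial: str):
        self.configs = configs
        self.current = initial
        self.swaps = 0

    def request(self, demand: Dict[str, int]) -> Tuple[str, Dict[str, int]]:
        """Pick the best predefined configuration for the demanded EU mix."""
        best = min(self.configs, key=lambda c: issue_bound(demand, self.configs[c]))
        # Swap only on a strict improvement, to avoid needless reconfiguration.
        if issue_bound(demand, self.configs[best]) < issue_bound(demand, self.configs[self.current]):
            self.current = best               # partial reconfiguration would happen here
            self.swaps += 1
        # The EU counts double as the input vector sent to the RUU.
        return self.current, self.configs[self.current]

cm = ConfigurationManager(CONFIGS, "Config1")
print(cm.request({"ALU": 2, "MDU": 4, "LSU": 1}))  # ('Config2', {'ALU': 2, 'MDU': 2, 'LSU': 1})
```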
Proposed Microarchitecture (contd.)
- Data Memory
- Harvard architecture: separate from the IM
- Bus Macros
- The Xilinx bus macros
- Only 4 bits wide each, so multiple instances are required
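Each bus macro carries only 4 bits, so the number needed at a module boundary is each signal group's width divided by 4, rounded up. The interface below (two 32-bit operands, a 32-bit result, and a 6-bit opcode) is a hypothetical example, not the authors' actual module boundary.

```python
import math
from typing import Dict

BUS_MACRO_WIDTH = 4  # bits carried by one Xilinx bus macro, per the slide

def bus_macros_needed(signal_widths: Dict[str, int]) -> Dict[str, int]:
    """Number of bus macros per signal group crossing a module boundary."""
    return {name: math.ceil(width / BUS_MACRO_WIDTH)
            for name, width in signal_widths.items()}

# Hypothetical boundary between the fixed RUU and one reconfigurable EU slot.
interface = {"operand_a": 32, "operand_b": 32, "result": 32, "opcode": 6}
counts = bus_macros_needed(interface)
print(counts)                # {'operand_a': 8, 'operand_b': 8, 'result': 8, 'opcode': 2}
print(sum(counts.values()))  # 26
```

Rounding up per group can waste pins: here the 6-bit opcode occupies two 4-bit macros.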
Conclusion
- Design of a general-purpose reconfigurable processor (ARM ISA) using the Xilinx modular design flow
- Functional units partitioned into fixed and configurable modules
- Future work
- Real hardware implementation
- A model to analyze power consumption
- Performance investigation
SysteMorph: Dynamic/Online/Adaptive System-Level Optimization for SoC
- Yoshimatsu et al.
- Institute of Systems Information Technologies, Kyushu
SysteMorph Concepts
- A feedback-directed dynamic SW / ISA / HW co-optimization technology
- Elemental technologies
- Online profiling
- Adaptive dynamic optimization
- Smart hardware
- VLIW execution units
- Reconfigurable functional units
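The feedback-directed loop, where online profiling drives adaptive dynamic optimization, can be sketched as a counter per basic-block address. Once a block becomes hot, it is handed to an optimizer callback, which might reschedule it for the VLIW units or map it onto a reconfigurable functional unit. The class, the threshold, and the callback are hypothetical and are not SysteMorph's actual mechanism.

```python
from collections import Counter
from typing import Callable, Set

class OnlineProfiler:
    """Hypothetical feedback loop: count entries per basic-block address and
    hand a block to the optimizer once it becomes hot."""

    def __init__(self, threshold: int, optimize: Callable[[int], None]):
        self.threshold = threshold
        self.optimize = optimize      # e.g. reschedule for VLIW or map onto an RFU
        self.counts: Counter = Counter()
        self.optimized: Set[int] = set()

    def on_block_entry(self, pc: int) -> None:
        if pc in self.optimized:
            return                    # already running the optimized version
        self.counts[pc] += 1
        if self.counts[pc] >= self.threshold:
            self.optimized.add(pc)
            self.optimize(pc)         # feedback: profile data triggers optimization

hot = []
prof = OnlineProfiler(threshold=3, optimize=hot.append)
for pc in [0x10, 0x20, 0x10, 0x10, 0x10, 0x20]:
    prof.on_block_entry(pc)
print([hex(pc) for pc in hot])  # ['0x10']
```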
Dynamic Optimization
- Dynamic software pipelining for VLIW
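The slides only name dynamic software pipelining for VLIW. As a hedged illustration of what such an optimizer computes for a hot loop, the sketch below performs simplified modulo scheduling. It takes the resource-constrained minimum initiation interval (ResMII) and places each operation in a modulo reservation table. It assumes the loop body is a DAG with no loop-carried dependences, so the recurrence bound is ignored. The loop, latencies, and unit counts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Op = Tuple[str, str, int, List[str]]   # (name, resource, latency, predecessors)

def modulo_schedule(ops: List[Op], units: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Simplified modulo scheduling for one loop body.

    Assumes ops are in topological order and the body has no loop-carried
    dependences, so only resources bound the initiation interval (II).
    """
    lat = {name: latency for name, _, latency, _ in ops}
    uses = Counter(res for _, res, _, _ in ops)
    ii = max(math.ceil(n / units[r]) for r, n in uses.items())   # ResMII
    mrt = [Counter() for _ in range(ii)]                           # modulo reservation table
    start: Dict[str, int] = {}
    for name, res, _, preds in ops:
        earliest = max((start[p] + lat[p] for p in preds), default=0)
        # II consecutive cycles visit every row of the table once, and ResMII
        # guarantees that some row still has a free unit of this type.
        slot = next(s for s in range(earliest, earliest + ii)
                    if mrt[s % ii][res] < units[res])
        mrt[slot % ii][res] += 1
        start[name] = slot
    return ii, start

# Hypothetical hot loop y[i] = a * x[i] + b on a VLIW with one unit of each kind.
loop = [("ld", "LSU", 2, []), ("mul", "MDU", 3, ["ld"]),
        ("add", "ALU", 1, ["mul"]), ("st", "LSU", 1, ["add"])]
ii, start = modulo_schedule(loop, {"LSU": 1, "MDU": 1, "ALU": 1})
print(ii, start)                                  # 2 {'ld': 0, 'mul': 2, 'add': 5, 'st': 7}
print({op: t // ii for op, t in start.items()})  # {'ld': 0, 'mul': 1, 'add': 2, 'st': 3}
```

A new iteration starts every II = 2 cycles, even though one iteration's operations span cycles 0 to 7. In the steady-state kernel, four iterations (stages 0 to 3) are in flight at once.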
Dynamic Optimization (contd.)
- Reconfigurable device DAP/DNA-HP
Dynamic Optimization (contd.)