Dynamic FPGA Routing for Just-in-Time Compilation - PowerPoint PPT Presentation

Slides: 27

Provided by: romanl5

Learn more at: http://www.cs.ucr.edu

1
Dynamic FPGA Routing for Just-in-Time Compilation

Roman Lysecky (a), Frank Vahid (a), Sheldon X.-D. Tan (b)
(a) Department of Computer Science and Engineering
(b) Department of Electrical Engineering
University of California, Riverside
Also with the Center for Embedded Computer
Systems at UC Irvine
This work was supported in part by the National
Science Foundation, the Semiconductor Research
Corporation, and a Department of Education GAANN
fellowship

2
Introduction: Just-in-Time Compilation has Become
Commonplace

Just-in-Time Compilation
Modern Pentium processors
Dynamically translate instructions onto
underlying RISC architecture
Transmeta Crusoe and Efficeon
Dynamic code morphing
Translate x86 instructions to underlying VLIW
processor
Interpreted languages
Distribute SW as processor-independent
bytecode/source
SW typically executed on a virtual machine
JIT compile bytecode to processor's native
instructions
Java, Python, etc.

3
Introduction: Just-in-Time Compilation also
Performs Optimization

Dynamic optimizations are increasingly common
Dynamically recompile binary during execution
Dynamo [Bala, et al., 2000] - Dynamic software
optimizations
Identify frequently executed code segments
(hotpaths)
Recompile with higher optimization
BOA [Gschwind, et al., 2000] - Dynamic optimizer
for PowerPC
Advantages
Transparent optimizations
No designer effort
No tool restrictions
Adapts to actual usage
Speedups of up to 20-30% (up to 1.3X)
JIT compilation operates on software binaries

4
Introduction: But Today's Binaries are More than
just Software
5
Introduction: Just-in-Time FPGA Compilation?

JIT FPGA compilation
Idea: standard binary for FPGA
Similar benefits as standard binary for
microprocessor
Portability, transparency, standard tools
Embedded JIT compilation tools optimized for each
FPGA

6
Introduction: One Use of JIT FPGA Compilation
CableTV Company
7
Introduction: One Use of JIT FPGA Compilation
CableTV Company
8
Introduction: One Use of JIT FPGA Compilation
CableTV Company
9
Introduction: Another Use - Warp Processors
(Dynamic HW/SW Partitioning)
[Figure: warp processor block diagram. Labels: Profiler, µP, I, D, Warp Config. Logic Architecture, Dynamic Part. Module (DPM)]
[Lysecky/Vahid, DATE04] [Stitt/Lysecky/Vahid, DAC03] [Stitt/Vahid, ICCAD02]
10
Introduction: Another Use - Warp Processors
(Dynamic HW/SW Partitioning)
[Figure: warp processor block diagram. Labels: Profiler, ARM, I, D, WCLA, DPM]
[Lysecky/Vahid, DATE04] [Stitt/Lysecky/Vahid, DAC03] [Stitt/Vahid, ICCAD02]
11
Introduction: All that CAD on-chip?

CAD people may first think Just-in-Time FPGA
compilation is absurd
CAD tools are extremely complex
Require long execution times on powerful desktop
workstations
Require very large memory resources
Usually require GBytes of hard drive space
Cost of a complete CAD tool package can exceed
$1 million
All that CAD on-chip?

12
Simultaneous FPGA/CAD Design

Careful simultaneous design of configurable logic
fabric and CAD tools
Analyze architectural features for their
impact on on-chip Just-in-Time CAD tools
Fast execution time
Very low data memory
Produce reasonable (good) hardware circuits

13
Simultaneous FPGA/CAD Design: Configurable Logic
Fabric

Array of configurable logic blocks (CLBs)
surrounded by switch matrices (SMs)
Each CLB is directly connected to a SM
Switch matrix connections
Four short wires connect adjacent SMs
Four long wires connect every other SM together

[Figure: configurable logic fabric, rows of SMs with CLBs between them]
[Lysecky/Vahid, DATE04]
14
Simultaneous FPGA/CAD Design: Combinational Logic
Block Design

Incorporate two 3-input 2-output LUTs
Corresponds to four 3-input LUTs
Allows for good quality circuit while reducing
on-chip CAD tools complexity
Provide routing resources between adjacent CLBs
to support carry chains
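
A 3-input, 2-output LUT can be pictured as two 8-entry truth tables that share the same three inputs, which is why two of them correspond to four 3-input LUTs. The sketch below is illustrative only: the class name `Lut3x2` and the full-adder configuration are assumptions chosen to show why such a LUT pairs naturally with carry chains, not a description of the actual CLB.

```python
class Lut3x2:
    """Hypothetical model of a 3-input, 2-output LUT:
    two 8-entry truth tables indexed by the same three input bits."""

    def __init__(self, table0, table1):
        if len(table0) != 8 or len(table1) != 8:
            raise ValueError("each truth table needs 8 entries")
        self.tables = (tuple(table0), tuple(table1))

    def evaluate(self, a, b, c):
        # The three input bits form the row index into both tables.
        index = (a << 2) | (b << 1) | c
        return self.tables[0][index], self.tables[1][index]


def full_adder_lut():
    """Configure one LUT as a 1-bit full adder: outputs (sum, carry_out)."""
    sums, carries = [], []
    for index in range(8):
        a, b, cin = (index >> 2) & 1, (index >> 1) & 1, index & 1
        sums.append(a ^ b ^ cin)
        carries.append((a & b) | (cin & (a ^ b)))
    return Lut3x2(sums, carries)


adder = full_adder_lut()
print(adder.evaluate(1, 1, 0))  # sum bit and carry-out for 1 + 1 + 0
```

In this picture the carry output of one LUT would feed the third input of the LUT in the adjacent CLB.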

[Lysecky/Vahid, DATE04]
15
Simultaneous FPGA/CAD Design: Switch Matrix

Switch Matrix
SM connected using eight channels per side
Four short channels
Four long channels
Routes wires from different sides using the same
channel
Each short channel is associated with a single long
channel
Wires are routed using a single pair of channels
through the configurable logic fabric

[Lysecky/Vahid, DATE04]
16
FPGA Routing

FPGA Routing
Find a path within the FPGA to connect the source and
sinks of each net within our hardware circuit
Typically use a form of maze routing [Lee, 1961]
Routes each net using Dijkstra's shortest path
algorithm
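
As a rough illustration of routing one net by shortest path, the sketch below runs Dijkstra's algorithm over a small grid. The function names `route_net` and `grid_graph`, the unit edge costs, and the single-sink net are all assumptions made for the example; a real router works on the FPGA's routing resources and handles multi-sink nets.

```python
import heapq


def route_net(graph, source, sink):
    """Dijkstra's shortest path. graph: {node: {neighbor: cost}}.
    Returns the list of nodes from source to sink, or None if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == sink:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if sink not in dist:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def grid_graph(width, height):
    """Hypothetical routing grid: 4-connected nodes with unit edge cost."""
    graph = {}
    for x in range(width):
        for y in range(height):
            nbrs = {}
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    nbrs[(nx, ny)] = 1
            graph[(x, y)] = nbrs
    return graph


path = route_net(grid_graph(3, 3), (0, 0), (2, 2))
print(path)
```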

17
FPGA Routing

Pathfinder [Ebeling, et al., 1995]
Introduced negotiated congestion
During each routing iteration, route nets using
shortest path
Allows overuse (congestion) of routing resources
If congestion exists (illegal routing)
Update cost of congested resources based on the
amount of overuse
Rip-up all routes and reroute all nets
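
The negotiated-congestion loop described above can be sketched as follows. This is a simplified illustration, not the published Pathfinder implementation: it puts costs on edges rather than on routing-resource nodes, handles only two-pin nets, and the cost formula and the `pres_fac` / `hist_fac` parameters are assumptions.

```python
import heapq
from collections import defaultdict


def shortest_path(adj, cost, src, dst):
    """Dijkstra over an undirected graph; assumes dst is reachable."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue
        for v in adj[u]:
            nd = d + cost(frozenset((u, v)))
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def negotiated_congestion(edges, nets, pres_fac=0.5, hist_fac=1.0, max_iters=20):
    """edges: {(u, v): capacity}; nets: {name: (source, sink)}.
    Returns (routes, iterations) or (None, max_iters) if no legal routing."""
    cap = {frozenset(e): c for e, c in edges.items()}
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    hist = defaultdict(float)  # historical congestion per edge
    for iteration in range(1, max_iters + 1):
        usage = defaultdict(int)  # rip up all routes
        routes = {}

        def cost(e):
            # Overuse is allowed, but it makes the edge more expensive.
            over = max(0, usage[e] + 1 - cap[e])
            return (1.0 + hist[e]) * (1.0 + pres_fac * over)

        for name, (src, dst) in nets.items():  # reroute all nets
            path = shortest_path(adj, cost, src, dst)
            routes[name] = path
            for u, v in zip(path, path[1:]):
                usage[frozenset((u, v))] += 1
        overused = {e: usage[e] - cap[e] for e in cap if usage[e] > cap[e]}
        if not overused:
            return routes, iteration  # legal routing found
        for e, amount in overused.items():
            hist[e] += hist_fac * amount  # penalize by amount of overuse
    return None, max_iters
```

In a small example where two nets compete for one capacity-1 edge and one of them has a longer detour available, the first iteration leaves the shared edge overused, and the raised historical cost pushes the second net onto the detour in the next iteration.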

18
FPGA Routing

VPR - Versatile Place and Route [Betz, et al.,
1997]
Uses modified Pathfinder algorithm
Increased performance over original Pathfinder
algorithm
Routability-driven routing
Goal: Use fewest tracks possible
Timing-driven routing
Goal: Optimize circuit speed

19
JIT FPGA Routing

Riverside On-Chip Router (ROCR)
Represent routing nets between CLBs as routing
between SMs
Resource Graph
Nodes correspond to SMs
Edges correspond to channels between SMs
Capacity of an edge equals the number of wires
within the channel
Requires much less memory than VPR as resource
graph is much smaller
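
A minimal sketch of such a resource graph, assuming SMs sit on a rectangular grid. The function name `build_resource_graph`, the coordinate scheme, and the reading of "every other SM" as a long channel that skips one SM are assumptions; the capacities of four short and four long wires come from the fabric description earlier in the talk.

```python
def build_resource_graph(cols, rows, short_wires=4, long_wires=4):
    """Return {(sm_a, sm_b, kind): capacity} for a cols x rows grid of SMs.

    Nodes are SM coordinates; each edge is a channel whose capacity is
    the number of wires in it. Individual wires are not represented,
    which is what keeps the graph small.
    """
    edges = {}
    for x in range(cols):
        for y in range(rows):
            for dx, dy in ((1, 0), (0, 1)):
                # Short channel to the adjacent SM.
                b = (x + dx, y + dy)
                if b[0] < cols and b[1] < rows:
                    edges[((x, y), b, "short")] = short_wires
                # Long channel to the SM two positions away (assumption).
                c = (x + 2 * dx, y + 2 * dy)
                if c[0] < cols and c[1] < rows:
                    edges[((x, y), c, "long")] = long_wires
    return edges


graph = build_resource_graph(3, 3)
print(len(graph), "channels for 9 switch matrices")
```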

20
JIT FPGA Routing

Riverside On-Chip Router (ROCR) - Global Routing
Based on VPR's routability-driven router
Utilizes similar cost model consisting of base,
historical congestion, and current congestion
costs
Routes nets between SMs using a greedy, depth-first
routing algorithm
Faster than VPR's traditional breadth-first
routing method
Requires the addition of an adjustment cost to direct
ROCR to re-route illegal nets using a different
initial routing path
Ignores illegal routing within SMs
If congestion exists, rip-up and re-route only
the illegal routes
Reduces computation time during successive
routing iterations
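
The selective rip-up step can be sketched as below. It covers only that step, not the greedy depth-first router or the cost model. The names `edge_usage` and `rip_up_illegal` are hypothetical, and treating every route that crosses an overused channel as illegal is an assumption about how ROCR classifies routes.

```python
from collections import Counter


def edge_usage(routes):
    """Count how many routes use each SM-to-SM channel."""
    usage = Counter()
    for path in routes.values():
        for u, v in zip(path, path[1:]):
            usage[frozenset((u, v))] += 1
    return usage


def rip_up_illegal(routes, capacity):
    """Split routes into (kept, ripped): only routes crossing an
    overused channel are ripped up; legal routes stay in place."""
    usage = edge_usage(routes)
    overused = {e for e, n in usage.items() if n > capacity[e]}
    kept, ripped = {}, []
    for name, path in routes.items():
        edges = {frozenset(pair) for pair in zip(path, path[1:])}
        if edges & overused:
            ripped.append(name)  # must be re-routed next iteration
        else:
            kept[name] = path
    return kept, ripped
```

Only the nets in `ripped` go back to the router, so later iterations touch a shrinking subset of the circuit instead of all nets.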

21
JIT FPGA Routing

Riverside On-Chip Router (ROCR) - Detailed
Routing
Assign specific channels to each route
Construct routing conflict graph
Routes conflict if assigning same channel results
in an illegal routing within any SM
Use Brelaz's greedy vertex coloring algorithm
[Brelaz, 1979]
If illegal routes exist, rip-up illegal routes
and repeat global routing
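
A sketch of channel assignment by greedy vertex coloring in the style of Brelaz's algorithm: repeatedly color the route whose neighbors already use the most distinct colors. Each color stands for a channel. The function names, the tie-breaking by total degree and then by name, and the `num_channels` parameter are assumptions for the example.

```python
def brelaz_coloring(conflicts):
    """conflicts: {route: set of routes it conflicts with}.
    Returns {route: color index}, colors numbered from 0."""
    color = {}
    uncolored = set(conflicts)

    def saturation(route):
        # Number of distinct colors already used by conflicting routes.
        return len({color[n] for n in conflicts[route] if n in color})

    while uncolored:
        # Highest saturation first; ties go to the higher degree, then
        # to the alphabetically first route (sorted makes this stable).
        route = max(sorted(uncolored),
                    key=lambda r: (saturation(r), len(conflicts[r])))
        used = {color[n] for n in conflicts[route] if n in color}
        c = 0
        while c in used:
            c += 1  # smallest color not used by any conflicting route
        color[route] = c
        uncolored.remove(route)
    return color


def assign_channels(conflicts, num_channels):
    """Color the conflict graph; routes whose color exceeds the available
    channels are reported as illegal and would go back to global routing."""
    color = brelaz_coloring(conflicts)
    illegal = sorted(r for r, c in color.items() if c >= num_channels)
    return color, illegal
```

If three routes conflict pairwise but only two channels are available, one of them ends up in `illegal`, which matches the rip-up and repeat step above.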

22
Experiments: Memory Usage
23
Experiments: Algorithm Performance
24
Experiments: Critical Path Results
But 10% shorter critical path than VPR (RD)
25
Experiments: Wire Segments
26
Conclusions