Title: Congestion Estimation and Localization in FPGAs: A Visual Tool for Interconnect Prediction. David Yeager, Darius Chiu, Guy Lemieux. The University of British Columbia, Department of Electrical and Computer Engineering
1 Congestion Estimation and Localization in FPGAs: A Visual Tool for Interconnect Prediction
David Yeager, Darius Chiu, Guy Lemieux
The University of British Columbia
Department of Electrical and Computer Engineering
2 Outline
- Motivations for congestion localization heuristics.
- Exploring heuristics
- Post-placement
- Pre-placement
- Results
- Future Work
3 FPGAs: FIXED Routing Architecture
- Fixed channel width.
- Over 80% of resources devoted to interconnect.
- Comprised of repeated tiles.
- Routing resources identical throughout.
- Can potentially have enough logic resources but not enough routing resources for a design.
4 FPGA Routing Architecture Design
- FPGA architecture design involves retargetable CAD flow.
- Cover large amount of customer benchmarks.
- Routing resources accommodate majority of customer designs that fit in FPGA's logic resources.
- Requires excessive amount of fixed interconnect.
- Explore different amounts of routing resources.
- Select routing that performs best across all circuits.
- Less fixed routing => higher density, performance.
- Less fixed routing => more unroutable designs.
- More fixed routing => more wastage.
- Can use 100% of logic resources.
- Can never use 100% of routing resources.
- Results in excess programmable interconnect.
- Congestion-aware CAD improves routability.
- Allows architects to get away with less excess programmable interconnect.
5 FPGA vs ASIC Congestion Impact
- Two CAD flows.
- All results are equal EXCEPT...
- Only one produces evenly distributed interconnect.
- ASIC world => No major advantage.
- FPGA world => Smaller channel width.
- Allows for denser FPGA architecture.
- Reduces interconnect wastage.
- Locating congestion can help with this balancing.
6 Balanced Routing
[Figure label: waste]
7 Balanced Routing: Denser FPGA
[Figure labels: Channel Width 7; Channel Width 3]
8 Further Motivations for Congestion Localization
- High quality congestion estimation can be slow.
- May not be realistic to constantly update with every move.
- Localization can give different weights to different nets, CLBs, LUTs.
- Update weights during intervals.
- Example applications: SA optimization, Un/DoPack.
9 Motivations for accurate congestion estimation: Un/DoPack
[Flowchart. Boxes: start with netlist; cluster; place; route; congestion calculator; depopulated clustering; incremental place; incremental route. Decisions (yes/no): channel width constraint met?; available area left? Outcomes: success; failure.]
10 Motivations: Un/DoPack
[Flowchart. Boxes: start with netlist; cluster; place; congestion calculator; route; depopulated clustering; incremental place; congestion calculator. Decisions (yes/no): channel width constraint met?; available area left? Outcomes: success; failure.]
11 Motivations: Un/DoPack
[Flowchart. Boxes: start with netlist; cluster; congestion calculator; place; route; depopulated clustering. Decisions (yes/no): channel width constraint met?; available area left? Outcomes: success; failure.]
12 Motivations for accurate congestion localization: Un/DoPack
- Identify regions to add white space.
13 Congestion Localization Measurement
- Requirements: applicable before and after placement, can integrate into Un/DoPack, can be easily displayed visually.
- Solution: assign a congestion value to each CLB.
- Allows for localization before and after placement.
- Assigning to specific routing resources not practical before placement.
- Quality measurement: perform full place and route. Real congestion = max tracks on each side of CLB. Compare to estimate.
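The "real congestion" definition on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the two input arrays (`h_use`, `v_use`) and their layout are hypothetical and do not correspond to any particular router's output format.

```python
def clb_congestion(h_use, v_use):
    """Per-CLB congestion: the largest track count among the four
    channel segments bordering the CLB.

    Assumed (hypothetical) layout for a rows x cols array of CLBs:
      h_use: (rows + 1) x cols  tracks used in horizontal channel segments
      v_use: rows x (cols + 1)  tracks used in vertical channel segments
    """
    rows = len(v_use)
    cols = len(h_use[0])
    return [
        [
            max(
                h_use[r][c],      # channel on one horizontal side
                h_use[r + 1][c],  # channel on the other horizontal side
                v_use[r][c],      # channel on one vertical side
                v_use[r][c + 1],  # channel on the other vertical side
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Small 2 x 2 example with made-up track counts.
h_use = [[1, 0], [3, 2], [0, 1]]
v_use = [[2, 4, 1], [0, 1, 5]]
congestion = clb_congestion(h_use, v_use)
```

The resulting per-CLB map is what an estimate would be compared against.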
14 Quality Measurement: Fidelity vs. Accuracy
- Previous work reports accuracy of estimate to actual peak channel widths.
- Does not report localization quality, or fidelity.
- Congestion estimation requires both accuracy and fidelity.
- Accuracy is well studied; therefore fidelity is the focus of this work.
- Fidelity can always be scaled to an accuracy heuristic.
- Good localization is required to balance congestion.
- Fidelity: FPGA-centric measurement.
[Figure labels: Higher Fidelity; Higher Accuracy; Actual congestion from router; Poor Localization; Good Localization]
15 Measuring Fidelity
- Linearly scale the estimated and real congestion maps so that the min and max congestion of both maps are equal.
- Take the difference between each CLB's congestion estimate and that CLB's actual congestion value after place and route.
- Error: average of the absolute value of the differences / peak CLB congestion.
- Average absolute normalized error (a.a.n.e.).
- M rows, M columns. E: Estimate, R: Real.
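A minimal Python sketch of the error measure described on this slide follows. Two details are assumptions, since the slide does not state them: the estimate is rescaled onto the real map's range (rather than both maps onto a common range), and "peak CLB congestion" is taken to be the maximum of the real map. The function names are illustrative.

```python
def rescale(grid, lo, hi):
    """Linearly map grid values so that its min becomes lo and its max becomes hi."""
    flat = [v for row in grid for v in row]
    g_min, g_max = min(flat), max(flat)
    if g_max == g_min:
        # A flat map has no spread to stretch; place every cell at lo.
        return [[float(lo) for _ in row] for row in grid]
    scale = (hi - lo) / (g_max - g_min)
    return [[lo + (v - g_min) * scale for v in row] for row in grid]


def aane(estimate, real):
    """Average absolute normalized error between estimate E and real map R."""
    flat_real = [v for row in real for v in row]
    r_min, r_max = min(flat_real), max(flat_real)
    if r_max == 0:
        raise ValueError("real congestion map has zero peak")
    scaled = rescale(estimate, r_min, r_max)  # assumption: scale E onto R's range
    diffs = [
        abs(e - r)
        for e_row, r_row in zip(scaled, real)
        for e, r in zip(e_row, r_row)
    ]
    return sum(diffs) / len(diffs) / r_max


real = [[2, 4], [6, 4]]
same_shape = [[0, 5], [10, 5]]   # same pattern as real, different units
inverted = [[10, 5], [0, 5]]     # peaks in the wrong place
print(aane(same_shape, real), aane(inverted, real))
```

Because of the rescaling step, an estimate with the right shape but the wrong units scores zero error, which is the sense in which the measure rewards fidelity rather than accuracy.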
16 Exploring heuristics: Local Rent Exponent
- Plot average cuts per partition size.
- Line of best fit: $\log T = p \cdot \log(G) + \log(t)$
- $T = t\,G^{p}$
- p: Rent exponent. We will use this as our congestion value.
[Plot: log(# of cuts) vs. log(# of CLBs); Window Size 5]
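The line of best fit can be sketched as an ordinary least-squares fit in log-log space. This is an illustration only: the input format (a list of `(G, T)` pairs, with G the partition size in CLBs and T the average number of cuts) and the function name are assumptions, and how the partitions inside a window are generated is left out.

```python
import math


def rent_exponent(points):
    """Fit log T = p * log G + log t by least squares.

    points: list of (G, T) pairs with G > 0 and T > 0 (hypothetical format).
    Returns (p, t): the Rent exponent and the fitted coefficient.
    """
    xs = [math.log(g) for g, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct partition sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                  # slope of the fitted line
    log_t = mean_y - p * mean_x    # intercept of the fitted line
    return p, math.exp(log_t)


# Synthetic data that follows T = 3 * G^0.6 exactly.
samples = [(g, 3.0 * g ** 0.6) for g in (1, 2, 4, 8, 16)]
p, t = rent_exponent(samples)
```

With few partition sizes available inside a small window, the fitted slope is sensitive to individual points.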
17 Demo
24 Exploring heuristics: Local Rent Exponent
- Benefit
- Characterizes wire length distribution.
- Downsides
- Requires a lot of data points.
- Better for characterizing entire circuits.
- Smaller window subject to anomalies.
- Larger window loses locality of estimation.
- Rate of change of cuts, not absolute value.
25 Exploring heuristics: Net cuts per region
- Rent exponent captures rate of change of cuts => wire length distribution.
- Absolute number of cuts may be better for locality.
- Example: region size of 3x3.
Count number of nets crossing this boundary.
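Counting nets that cross a region boundary can be sketched as below. The representation is an assumption made for illustration: each net is a list of `(x, y)` CLB coordinates of its pins, the region is a square window centred on each CLB, and a net is taken to cross the boundary when it has at least one pin inside the window and at least one outside.

```python
def cuts_per_region(nets, width, height, window=3):
    """For every CLB, count the nets cut by the window centred on it."""
    half = window // 2
    cuts = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for pins in nets:
                inside = [
                    abs(px - x) <= half and abs(py - y) <= half
                    for px, py in pins
                ]
                # Cut: the net has pins on both sides of the boundary.
                if any(inside) and not all(inside):
                    cuts[y][x] += 1
    return cuts


nets = [
    [(0, 0), (4, 4)],  # long diagonal net
    [(1, 1), (2, 2)],  # short local net
    [(2, 2), (4, 2)],  # net leaving the centre towards the right edge
]
cut_map = cuts_per_region(nets, width=5, height=5, window=3)
```

The brute-force loop is quadratic in practice (every CLB against every net); it is written for clarity, not speed.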
30 Post Processing Heuristic 1: Cartesian Blending
Blend Step 0
A0 B0 C0 D0
E0 F0 G0 H0
I0 J0 K0 L0
$F_1 = (1-a)F_0 + a\,(E_0 + B_0 + G_0 + J_0)/4$
$G_1 = (1-a)G_0 + a\,(F_0 + C_0 + H_0 + K_0)/4$
31 Post Processing Heuristic 1: Cartesian Blending
Blend Step 1
A1 B1 C1 D1
E1 F1 G1 H1
I1 J1 K1 L1
$F_2 = (1-a)F_1 + a\,(E_1 + B_1 + G_1 + J_1)/4$
$G_2 = (1-a)G_1 + a\,(F_1 + C_1 + H_1 + K_1)/4$
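The blend steps on these two slides can be sketched as a repeated four-neighbour average. The slides only show interior cells (F and G), so the treatment of border cells here, averaging over whichever neighbours exist, is an assumption, as are the function names.

```python
def blend_step(grid, a):
    """One Cartesian blend step: mix each cell with the mean of its
    up/down/left/right neighbours, weighted by a."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            nbrs = [
                grid[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            if nbrs:  # a 1 x 1 grid has no neighbours; leave it unchanged
                out[r][c] = (1 - a) * grid[r][c] + a * sum(nbrs) / len(nbrs)
    return out


def blend(grid, a, steps):
    """Apply blend_step repeatedly; every step reads only the previous map."""
    for _ in range(steps):
        grid = blend_step(grid, a)
    return grid


# 3 x 4 map laid out like cells A..L on the slide (values are made up).
step0 = [
    [0, 4, 0, 0],
    [8, 16, 4, 0],
    [0, 4, 8, 0],
]
step1 = blend_step(step0, a=0.5)
```

Each step writes into a fresh copy, so F1 and G1 are both computed from step-0 values, matching the subscripts in the formulas.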
39 Exploring heuristics: Bounding Box Overlap
- Assign each CLB a value equal to the number of net bounding boxes it resides in.
- Zhuo et al. use this during every SA swap in VPR's placer, yielding an average channel width reduction of 7.1%.
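A minimal Python sketch of the bounding box overlap count follows. The net representation (a list of `(x, y)` pin coordinates per net) and the function name are assumptions for illustration; this is not the code used by Zhuo et al. or by VPR.

```python
def bounding_box_overlap(nets, width, height):
    """For every CLB, count how many net bounding boxes contain it."""
    counts = [[0] * width for _ in range(height)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        # Every CLB inside the net's bounding box gets one more overlap.
        for y in range(min(ys), max(ys) + 1):
            for x in range(min(xs), max(xs) + 1):
                counts[y][x] += 1
    return counts


nets = [
    [(0, 0), (2, 1)],  # box spans x 0..2, y 0..1
    [(1, 1), (3, 2)],  # box spans x 1..3, y 1..2
]
overlap = bounding_box_overlap(nets, width=4, height=3)
```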
41 Exploring heuristics: Wire Length Per Area
- Expected wire length of net = ½ perimeter of bounding box.
42 Exploring heuristics: Bounding Box
- Probability of net routed at any given point in bounding box = expected length / bounding box area.
43 Exploring heuristics: Wire Length Per Area
- ½ perimeter bounding box not realistic for high fan-out nets.
$\text{extra pin factor} = \min(\text{BBWidth}, \text{BBHeight}) \cdot \max(0, \text{num\_pins} - 3)$
$\text{expected wire length} = \tfrac{1}{2}\,\text{BB perimeter} + (\text{extra pin factor}) \cdot a$
$\text{probability of wire} = \text{expected wire length} / \text{area}$
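The wire length per area heuristic can be sketched as below. Several choices are assumptions, because the slide text does not pin them down: bounding box width and height are measured in CLBs (so a net inside one CLB has a 1 x 1 box), the half perimeter is width + height, and `a` is treated as a multiplicative weight on the extra pin factor. The net format and function name are also illustrative.

```python
def wire_density(nets, width, height, a=1.0):
    """Spread each net's expected wire length uniformly over its bounding box."""
    density = [[0.0] * width for _ in range(height)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        bb_w = max(xs) - min(xs) + 1   # assumption: box size counted in CLBs
        bb_h = max(ys) - min(ys) + 1
        extra_pin_factor = min(bb_w, bb_h) * max(0, len(pins) - 3)
        # Half perimeter plus a correction for high fan-out nets
        # (assumption: a scales the correction linearly).
        expected_len = (bb_w + bb_h) + a * extra_pin_factor
        per_clb = expected_len / (bb_w * bb_h)
        for y in range(min(ys), max(ys) + 1):
            for x in range(min(xs), max(xs) + 1):
                density[y][x] += per_clb
    return density


two_pin = [[(0, 0), (1, 1)]]
five_pin = [[(0, 0), (2, 0), (0, 1), (2, 1), (1, 1)]]
d2 = wire_density(two_pin, width=2, height=2)
d5 = wire_density(five_pin, width=3, height=2, a=0.5)
```

Under these assumptions the per-CLB value can exceed 1 for small boxes, so it behaves as a wire density rather than a strict probability.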
50 Exploring heuristics: Bounding Box
- Blending helps spread probability distribution.
- Probability outside bounding box > 0.
[Figure labels: p(wire) > 0; p(wire) = 0]
51 Post Processing Heuristic 2: Saturated Congestion
- Ideal routing solution would have no channel width constraint.
- Congestion maps of an architecture without a channel width constraint would have sharper peaks.
- Channel width constraint places a ceiling on wire density.
- Forces routing in vicinity of ideal path.
- This ceiling and spreading of wire density can be emulated by saturating the congestion.
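Saturation can be sketched as clipping the congestion map at a ceiling. How the ceiling is chosen is not stated on the slide; the sketch below assumes it is a tunable fraction of the map's peak, and the function name and default value are illustrative only.

```python
def saturate(grid, fraction=0.8):
    """Clip every congestion value at fraction * peak (assumed ceiling rule)."""
    peak = max(max(row) for row in grid)
    ceiling = fraction * peak
    return [[min(v, ceiling) for v in row] for row in grid]


estimate = [[1, 2], [10, 4]]
clipped = saturate(estimate, fraction=0.5)  # ceiling is 5.0 for this map
```

Clipping alone only applies the ceiling; the spreading of wire density mentioned on the slide would come from a separate blending pass over the clipped map.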
58 Exploring Heuristics: Single Pass Route
- Pathfinder routes, calculates overuse, then reroutes.
- First routing attempt as congestion estimate.
- Each CLB assigned congestion value based on max # of tracks used on each side of CLB.
[Figure label: congestion = 4]
65 Congestion Estimation Before Placement?
- All previous heuristics require spatial information.
- No spatial information available before placement.
- How can we accurately predict congestion localization without a placement?
67 Exploring Heuristics: Blending Pin Count
- Cartesian blend (needs placement info)
- Logical/Net blend (does not require placement info)
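One possible reading of the logical/net blend is sketched below: each CLB starts with its pin count, and each blend step mixes that value with the mean of the CLBs it shares a net with, so no coordinates are needed. The slide gives no formula, so everything here is an assumption: the netlist format (a list of CLB-name lists), one pin per net connection, and the reuse of the weight `a` from the Cartesian blend.

```python
def net_blend(nets, a, steps):
    """Blend pin counts over logical (net) neighbours instead of grid neighbours."""
    clbs = sorted({c for net in nets for c in net})
    value = {c: 0.0 for c in clbs}
    neighbours = {c: set() for c in clbs}
    for net in nets:
        for c in net:
            value[c] += 1.0  # assumption: one pin per net connection
            neighbours[c].update(n for n in net if n != c)
    for _ in range(steps):
        nxt = {}
        for c in clbs:
            nb = neighbours[c]
            # CLBs with no logical neighbours keep their own value.
            avg = sum(value[n] for n in nb) / len(nb) if nb else value[c]
            nxt[c] = (1 - a) * value[c] + a * avg
        value = nxt
    return value


netlist = [["A", "B"], ["B", "C"], ["B", "D"]]
pin_counts = net_blend(netlist, a=0.5, steps=0)  # raw pin counts
blended = net_blend(netlist, a=0.5, steps=1)     # after one logical blend step
```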
75 Error Produced By Each Heuristic for MCNC 20
[Chart; axis: a.a.n.e.]
76 Error Before and After Saturation and Blending
[Chart; axis: a.a.n.e.]
77 Speed vs. Fidelity
[Chart; axes: a.a.n.e., Time (s)]
78 Conclusion
- Can quickly and accurately locate regions of high congestion.
- After placement
- Local Rent exponent
- Net cuts per region
- Bounding box overlap => improved => wire length per area
- Single pass route
- Before placement
- Blending pin count => localize congestion before placement
- Post processing improves all heuristics.
- Compare fidelity instead of accuracy.
- Necessary for balancing FPGA interconnect.
- Visual, tunable tool helpful for discovering / improving heuristics.
- Journey as important as destination.
79 Future Work
- Integrating into Un/DoPack.
- Congestion aware placement.
- Congestion aware clustering.
- Congestion estimation before clustering.