Title: Enforcing Long-Path Timing Closure for FPGA Routing with Path Searches on Clamped Lexicographic Spirals
1. Enforcing Long-Path Timing Closure for FPGA Routing with Path Searches on Clamped Lexicographic Spirals
- Keith So
- University of New South Wales, Sydney, Australia
- Feb 25 at FPGA08
2. Outline
- Problem Statement
- Related Work
- SpiralRoute Overview
- Budget Generation
- Clamped Lexicographic Search
- Some Performance Optimizations
- Experiments
- Conclusions and Future Work
3. Problem Statement & Assumptions
- Long-Path Timing-Driven Detailed Routing
- Given: placed circuit mapped into an RR graph; timing requirement D
- Find: mutually RR-vertex-disjoint routing trees s.t. max. long-path combinational delay < D
- Assumptions:
- D is achievable under the given placement
- Buffered switching (delays are summable)
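The timing requirement can be checked with a longest-path pass over the circuit's timing DAG. This works because buffered switching makes delays summable. The sketch below is a minimal illustration. It assumes each routed connection has already been reduced to one summed delay. The function names `max_long_path_delay` and `meets_timing` and the edge-list format are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (driver, load, summed connection delay), hypothetical format

def max_long_path_delay(edges: List[Edge]) -> float:
    """Longest source-to-sink delay in a DAG, assuming summable (buffered) delays."""
    succ: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    indeg: Dict[str, int] = defaultdict(int)
    nodes = set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))
    arrival = {n: 0.0 for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)  # sources: pads, clocked LE outputs
    visited = 0
    while ready:  # Kahn topological order, relaxing arrival times as we go
        u = ready.popleft()
        visited += 1
        for v, d in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("timing graph has a combinational cycle")
    return max(arrival.values(), default=0.0)

def meets_timing(edges: List[Edge], D: float) -> bool:
    # The slide's requirement: max. long-path combinational delay < D
    return max_long_path_delay(edges) < D

edges = [("in", "lut1", 1.0), ("lut1", "lut2", 2.0), ("in", "lut2", 2.5), ("lut2", "out", 1.5)]
print(max_long_path_delay(edges))  # 4.5
```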
4. Related Work
- [F92] Iterative slack allocation
- [AR95] Criticality-binned Steiner/arborescence routing
- [ME95] Negotiated congestion
- [BR97] VPR
- [LW03] Lagrangian relaxation weighting
- [ANC04] Automatic constraint generation
- [FBC04] RCV
5. SpiralRoute Overview
- Negotiated congestion routing over A*
- Paths are lexicographic-costed [S07, ISPD07]
- Major deltas:
- Optimal delay upper-bound generation for the FPGA routing domain
- Minimum-congestion bounded-delay searching (vs. a tradeoff using weights)
- Provable timing closure at completion
6. Connection Budget Generation: Optimization Component
- Weighted Budget Distribution Problem [Ghiasi et al., ICCAD04]
- Given: DAG G(V, A), min. delays d_ij, weights w_ij, long-path constraint T
- Find: delay budgets b_ij such that
- (d_ij + b_ij) summed along every path is at most T
- Sum of (w_ij · b_ij) over all edges is maximised
- Transforms into a min-cost flow problem
- Budgets are recovered from the dual of the flow solution
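One way to write the weighted budget distribution problem as a linear program is shown below. It is a sketch and may differ from Ghiasi et al.'s exact formulation. The arrival-time variables a_v are introduced here for illustration and do not appear on the slide. Every constraint bounds the difference of two variables. That structure is what allows the dual to be solved as a min-cost flow.

```latex
\begin{aligned}
\max_{a,\,b}\quad & \sum_{(i,j)\in A} w_{ij}\, b_{ij} \\
\text{s.t.}\quad  & a_j - a_i \ge d_{ij} + b_{ij} && \forall (i,j) \in A \\
                  & 0 \le a_v \le T               && \forall v \in V \\
                  & b_{ij} \ge 0                  && \forall (i,j) \in A
\end{aligned}
```

Summing the first constraint along any path gives a total of (d_ij + b_ij) that is at most T, because every a_v lies in [0, T]. Conversely, when delays are nonnegative, any budgets that meet the path condition admit such a_v. Take each a_v to be the longest (d + b) path ending at v.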
7. Connection Budget Generation: Mapping to FPGA Routing
- Represent LEs and pads as edges (split clocked LEs)
- Form a super-DAG
- d_ij = min. connection delay (from congestion-oblivious routing)
- Set T = D
- Set w_ij = 1 for real edges, 0 for virtual edges
- The solved (d_ij + b_ij) is the maximum delay allowed for each edge in our routing
8. Comparison with Iterative Minimax PERT (clma runtime 20 mins)
9. Search Design: n-Lex. Search
- 1-line A* search: f(v) = g(v) + h(v); expand the v with minimum f(v) until t is reached
- 2-component lexicographic search was used for the routability router (conceptually a·∞ + b)
- Need n components and custom comparison functions (proofs needed to avoid ∞^k values!)
- Theorem: A* over n-lexicographic costs is admissible if all components are totally-ordered monoids with order-preserving addition
- Monoids help avoid clutter from max()
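Here is a minimal sketch of the n-lexicographic idea, assuming costs are represented as Python tuples. Tuples already compare lexicographically. Componentwise addition of real-valued tuples is order-preserving under that order, so they form the kind of totally-ordered monoid the theorem asks for. `lex_astar` and `add` are hypothetical names, and the adjacency-list format is an assumption. This is not SpiralRoute's implementation.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Cost = Tuple[float, ...]
Graph = Dict[Hashable, List[Tuple[Hashable, Cost]]]  # v -> [(w, edge cost vector)]

def add(a: Cost, b: Cost) -> Cost:
    # Componentwise addition; if a < b lexicographically then add(a, c) < add(b, c).
    return tuple(x + y for x, y in zip(a, b))

def lex_astar(graph: Graph, s: Hashable, t: Hashable,
              h: Callable[[Hashable], Cost], zero: Cost
              ) -> Optional[Tuple[List[Hashable], Cost]]:
    """A* where g, h and f are cost vectors compared lexicographically."""
    tie = count()  # unique tie-breaker so the heap never compares vertices
    g: Dict[Hashable, Cost] = {s: zero}
    parent: Dict[Hashable, Optional[Hashable]] = {s: None}
    heap = [(add(zero, h(s)), next(tie), zero, s)]
    while heap:
        _, _, gv, v = heapq.heappop(heap)
        if gv != g[v]:
            continue  # stale entry: a cheaper path to v was found later
        if v == t:  # backtrace from the sink
            path = [v]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1], gv
        for w, c in graph.get(v, []):
            gw = add(gv, c)
            if w not in g or gw < g[w]:
                g[w], parent[w] = gw, v
                heapq.heappush(heap, (add(gw, h(w)), next(tie), gw, w))
    return None

# Components: (congestion, delay). Congestion dominates, so s-b-t wins despite more delay.
graph = {"s": [("a", (1, 2)), ("b", (0, 5))], "a": [("t", (0, 1))], "b": [("t", (0, 1))]}
print(lex_astar(graph, "s", "t", lambda v: (0, 0), (0, 0)))  # (['s', 'b', 't'], (0, 6))
```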
10. Search Design: Clamping Component
- 3-component vector:
- Delay, with pivot (x < y iff x < T and y > T)
- Congestion, regular <
- Delay, regular <
- Ex: f(w2) = (0, 2, 2), f(x1) = (1, 0, 4), f(w3) = (0, 1, 3)
- Assumption: h(v) is at least close to h*(v) for the clamping component
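The clamped ordering can be expressed as a sort key. This sketch makes two assumptions that the slide does not state. First, a delay exactly equal to the budget counts as within budget. Second, the example vectors read as (clamped delay, congestion, delay), with a budget of T = 3 chosen only for illustration. `Label` and `clamped_key` are hypothetical names.

```python
from typing import NamedTuple, Tuple

class Label(NamedTuple):
    name: str
    delay: float       # estimated total delay of the path through this vertex
    congestion: float  # estimated total congestion cost

def clamped_key(lbl: Label, budget: float) -> Tuple[int, float, float]:
    """Sort key for the 3-component clamped vector.

    Component 1 collapses delay to 0 (within budget, assumed <= budget) or
    1 (over budget), so all budget-compliant labels tie there; congestion
    then decides, and raw delay breaks any remaining ties.
    """
    over = 1 if lbl.delay > budget else 0
    return (over, lbl.congestion, lbl.delay)

labels = [Label("w2", 2, 2), Label("x1", 4, 0), Label("w3", 3, 1)]
order = sorted(labels, key=lambda l: clamped_key(l, budget=3))
print([l.name for l in order])  # ['w3', 'w2', 'x1']
```

Under this key, x1 has the least congestion but is ranked last, because it is the only label over budget.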
11. Search Design: Timing Closure
- The delay pivot element splits congestion-identical paths by budget
- The search will always choose a budget-compliant path (sums of finite congestion costs are finite)
- Applied over all connections => successful routing always yields timing closure!
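The closure argument can be written as one chain of inequalities. This reads the budget condition of slide 6 as an upper bound T on every path sum. Here delay_ij stands for the routed delay of connection (i, j), a symbol introduced for this sketch. If every connection is routed within its budget, then for every timing path P:

```latex
\text{delay}_{ij} \le d_{ij} + b_{ij}\ \ \forall (i,j)
\;\Longrightarrow\;
\sum_{(i,j)\in P} \text{delay}_{ij}
\;\le\; \sum_{(i,j)\in P} \bigl(d_{ij} + b_{ij}\bigr)
\;\le\; T = D .
```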
12. Performance: Low-Hanging Optimizations
- The original implementation was around 2-2.5x slower than the current one
- Introduced some low-hanging speed & quality optimizations:
- Index structure for lexicographic costs
- Greedy tree management to ameliorate pin-ordering effects
- A high-hanging optimization left for future work is congestion schedule handling (with many promising leads from the global routers at ICCAD07)
13. Trie-of-Stacks Index Structure
- Replaces the f(v) index structure
- Exploits FPGA routing symmetry
- Index operations are independent of size
- Reduces runtimes by 15%
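The slide does not describe the structure's internals, so what follows is one hypothetical reading. Each trie level is keyed by one cost component. Vertices with identical cost vectors share a single stack at the leaf, which is where FPGA routing symmetry would help. In this sketch, push and pop cost grows with the vector length and the number of distinct values per level, not with the number of queued vertices. `TrieOfStacks`, `push` and `pop_min` are hypothetical names.

```python
import heapq
from typing import Any, Dict, List, Tuple

class _Node:
    """One trie level: children keyed by the next cost component."""

    def __init__(self) -> None:
        self.children: Dict[Any, "_Node"] = {}
        self.keys: List[Any] = []   # min-heap of the keys present in children
        self.stack: List[Any] = []  # LIFO of items; used only at leaf depth

class TrieOfStacks:
    """Hypothetical priority index for fixed-length lexicographic cost vectors."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.root = _Node()
        self.size = 0

    def push(self, cost: Tuple[Any, ...], item: Any) -> None:
        if len(cost) != self.depth:
            raise ValueError("cost vector has the wrong length")
        node = self.root
        for c in cost:
            child = node.children.get(c)
            if child is None:  # first vertex with this prefix: open a new branch
                child = node.children[c] = _Node()
                heapq.heappush(node.keys, c)
            node = child
        node.stack.append(item)  # identical cost vectors share one stack
        self.size += 1

    def pop_min(self) -> Tuple[Tuple[Any, ...], Any]:
        if self.size == 0:
            raise IndexError("pop from empty index")
        path = []
        node = self.root
        for _ in range(self.depth):  # follow the smallest key at every level
            c = node.keys[0]
            path.append((node, c))
            node = node.children[c]
        item = node.stack.pop()
        self.size -= 1
        if not node.stack:  # prune branches that became empty, bottom-up
            for parent, c in reversed(path):
                del parent.children[c]
                heapq.heappop(parent.keys)  # c is this level's minimum key
                if parent.children:
                    break
        return tuple(c for _, c in path), item

idx = TrieOfStacks(depth=3)
for cost, v in [((0, 2, 2), "w2"), ((1, 0, 4), "x1"), ((0, 1, 3), "w3")]:
    idx.push(cost, v)
print(idx.pop_min())  # ((0, 1, 3), 'w3')
```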
14. Tree Topology Maintenance
15. Experiments - Setup
- Run against VPR 4.30 on an architecture similar to the single-segment challenge arch.
- (Researcher timing constraints)
- Routability comparison with unclamped lex-search
- Route at the Fmax allowed by the placement
- VPR pres_fac: 1.5/1.1
16. Routed Solution Timing Quality
17. Runtime Comparison
18. Effects of Budget Quality
19. Future Work
- Runtime improvements
- Schedule improvement
- Performance tuning
- Multi-CLB segments (see backup slide)
- Multi-objective routing
- Other domains (e.g. standard-cell global routing)
20. Conclusions
- Extended lexicographic search to timing-driven routing
- New budgeting component
- Clamped search design
- Supporting techniques for runtime
- Timing closure is guaranteed on routing success
- Solution quality is good, but runtime needs further improvement for the approach to be viable
21. Acknowledgements
- J. Rose, V. Betz, A. Marquardt (Toronto): VPR 4.30 source & benchmarks
- Australian Centre for Advanced Computing and Communications (ac3): High Performance Computing Support
- Advisor: Dr. Aleks Ignjatovic
22. Question Time
23. Issues with h(v) / h*(v)
- Node locking occurs when g(v) + h(v) < D but really g(v) + h*(v) > D
- Expansion downstream will be truncated
- But a subpath with less delay but more congestion cannot expand into it
- But if we re-expand on shorter delay, then backtrace will ignore congestion: not locally decidable!
- Quick fix: precompute h*(v) (only needed for sink pins t); only the bounding components need the accuracy
- Fancy on-the-fly handling?
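Only sink pins need exact estimates, so one run per sink is enough. This is a minimal sketch of the quick fix. It assumes the RR graph is available as a list of (from, to, delay) edges with nonnegative delays. `exact_delay_to_sink` is a hypothetical name. The resulting h*(v) would feed only the delay (bounding) components of the cost vector.

```python
import heapq
from collections import defaultdict
from itertools import count
from typing import Dict, Hashable, List, Tuple

def exact_delay_to_sink(edges: List[Tuple[Hashable, Hashable, float]],
                        t: Hashable) -> Dict[Hashable, float]:
    """Return h*(v), the minimum delay from each vertex v to sink t.

    Runs Dijkstra from t over reversed edges, so delays must be nonnegative.
    Vertices that cannot reach t are absent (their h* is infinite).
    """
    rev: Dict[Hashable, List[Tuple[Hashable, float]]] = defaultdict(list)
    for u, v, d in edges:
        rev[v].append((u, d))
    dist: Dict[Hashable, float] = {t: 0.0}
    tie = count()  # avoids comparing vertices when distances tie
    heap = [(0.0, next(tie), t)]
    while heap:
        dv, _, v = heapq.heappop(heap)
        if dv > dist[v]:
            continue  # stale heap entry
        for u, d in rev[v]:
            du = dv + d
            if u not in dist or du < dist[u]:
                dist[u] = du
                heapq.heappush(heap, (du, next(tie), u))
    return dist

edges = [("s", "a", 1.0), ("a", "t", 2.0), ("s", "t", 4.0), ("b", "a", 0.5)]
print(exact_delay_to_sink(edges, "t")["s"])  # 3.0
```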