Simultaneous Short-Path and Long-Path Timing Optimization for FPGAs

1
Simultaneous Short-Path and Long-Path Timing
Optimization for FPGAs
  • Ryan Fung, Vaughn Betz, William Chow
  • Altera Toronto Technology Centre

2
Terminology and Motivation
3
Long-Path Timing
  • Most CAD algorithms focus solely on reducing path
    delays to meet operating frequency requirements
  • Example long-path timing constraints include
    clock period, IO TSETUP, and IO TCLOCK-TO-OUTPUT
    requirements

Design Operating Period > Delay (Clock -> Src Reg)
+ Delay (Comb Logic) + TSETUP (Dst Reg)
- Delay (Clock -> Dst Reg)
4
Long-Path Timing
  • Most CAD algorithms focus solely on reducing path
    delays to meet operating frequency requirements

[Figure: source register -> combinational logic -> destination register, clocked from the clock IO; timing labels t = 0 and t < -TSETUP (Dst Reg)]
5
Short-Path Timing
  • Satisfying only long-path timing constraints does
    not guarantee design functionality
  • All register hold requirements must also be met
  • Short-path constraints express these requirements
  • Examples include THOLD for register-to-register
    transfers, IO THOLD, and IO minimum
    TCLOCK-TO-OUTPUT

[Figure: source register -> combinational logic -> destination register, clocked from the clock IO]
Register hold requirements met iff
Delay (Clock -> Src Reg) + Delay (Comb Logic)
- Delay (Clock -> Dst Reg) > THOLD (Dst Reg)
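
The long-path and short-path conditions for a register-to-register transfer can be checked with simple arithmetic. The sketch below is a minimal illustration and not the timing analyzer used in this work; the function names, parameter names, and delay values are hypothetical. It shows a transfer that meets its clock-period constraint while still violating hold because the destination clock arrives late.

```python
def setup_slack(period, clk_to_src, comb_max, t_setup, clk_to_dst):
    """Long-path slack in ns; positive when the period constraint is met."""
    return period - (clk_to_src + comb_max + t_setup - clk_to_dst)


def hold_slack(clk_to_src, comb_min, t_hold, clk_to_dst):
    """Short-path slack in ns; positive when the hold requirement is met."""
    return (clk_to_src + comb_min - clk_to_dst) - t_hold


# Hypothetical delays in ns. The destination clock arrives 1.0 ns later
# than the source clock (clock skew).
long_slack = setup_slack(period=10.0, clk_to_src=1.0, comb_max=4.0,
                         t_setup=0.5, clk_to_dst=2.0)
short_slack = hold_slack(clk_to_src=1.0, comb_min=0.5,
                         t_hold=0.2, clk_to_dst=2.0)

long_ok = long_slack > 0    # period constraint satisfied
short_ok = short_slack > 0  # hold requirement violated in this example
print(long_ok, short_ok)
```

Meeting the first check says nothing about the second, which is why both kinds of constraint have to be analyzed.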
6
Short-Path Timing
  • Satisfying only long-path timing constraints does
    not guarantee design functionality
  • All register hold requirements must also be met
  • Short-path constraints express these requirements

[Figure: source register -> combinational logic -> destination register, clocked from the clock IO; timing labels t = 0 and t > THOLD (Dst Reg)]
7
Short-Path Timing and FPGAs
  • Before this work, designers manually repaired
    many short-path violations
  • More painful as:
  • Designs get larger (more clocks than low-skew
    networks)
  • Process variation increases skew of clock
    networks
  • Clocking strategies increase in complexity
  • Fixing short-path violations may introduce
    long-path violations -> design iterations
  • Typical manual technique uses logic cell buffers
    -> wastes logic
  • Major appeal of FPGAs is fast time-to-market and
    low engineering costs
  • Need CAD algorithms that automatically optimize
    designs to meet all requirements, simultaneously

8
Programmable Delay Chains in FPGAs
  • CAD tools set delays to fix short-path problems
  • Short-path violations may persist, may create
    long-path violations
  • Programmable delay chains cost silicon area
  • Only used to slow down signals entering, leaving
    FPGA
  • Delay chains are not sufficient to fix all
    violations
  • Several paths may pass through the same delay
    chain
  • Below: IO TSETUP < 3 ns and IO THOLD < 0 ns cannot
    both be satisfied

[Figure: input IO and clock; Path A delay = 2 ns, Path B delay = 6 ns, clock delay = 3 ns]
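
The infeasibility in this example can be reproduced numerically. The sketch below assumes that Path A and Path B share one input delay chain, that the register's own setup and hold times are negligible, and that the delay chain can only add a non-negative delay; the function names and the list of candidate settings are hypothetical.

```python
PATH_A_NS = 2.0  # shortest data path from the input IO
PATH_B_NS = 6.0  # longest data path from the input IO
CLOCK_NS = 3.0   # clock delay


def io_tsetup(chain_delay):
    """Pin-level setup time: longest data delay minus clock delay."""
    return PATH_B_NS + chain_delay - CLOCK_NS


def io_thold(chain_delay):
    """Pin-level hold time: clock delay minus shortest data delay."""
    return CLOCK_NS - (PATH_A_NS + chain_delay)


def feasible_settings(settings):
    """Keep the settings that meet IO TSETUP < 3 ns and IO THOLD < 0 ns."""
    return [d for d in settings
            if io_tsetup(d) < 3.0 and io_thold(d) < 0.0]


# Hypothetical delay-chain settings from 0.0 ns to 4.0 ns in 0.25 ns steps.
settings = [0.25 * i for i in range(17)]
print(feasible_settings(settings))  # prints []
```

Under these assumptions, IO THOLD needs more than 1 ns of added delay, while IO TSETUP is already at its 3 ns limit with no added delay, so no single setting satisfies both.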
9
Solution
10
Overview of Overall Strategy
  • Attack the simultaneous short-path and long-path
    timing optimization problem in two phases
  • New slack allocation algorithm
  • Short/long-path constraints -> connection delay
    budgets
  • New FPGA routing algorithm
  • Guided by delay budgets
  • Overall algorithm name: RCV (Routing Cost
    Valleys)
  • Name inspired by the shape of the cost vs. delay
    curve of the new routing algorithm

11
Comments on Overall Strategy
  • Effective optimization of short-path timing
    constraints can be achieved by extending only the
    routing algorithm
  • Routing delay is a relatively large fraction of
    total delay
  • Router can model delays relatively accurately
  • Router has many options to insert delay
  • Using spirals of routing resources
  • Selecting delay chain settings (if modeled in
    routing graph)
  • Selecting different LUT inputs (if modeled in
    routing graph)

[Figure: LUT truth tables for f = not(A)B, shown with the A and B signals assigned to different LUT inputs]
12
Prior Work: Long-Path Slack Allocation
  • Explicitly monitoring all path-level timing
    constraints during optimization is highly
    inefficient (memory/run-time)
  • Path count can be exponential in circuit size
  • Long-path slack allocation produces a maximum
    delay budget for each connection in a design
  • Minimax-PERT algorithm [Youssef et al., ICCAD,
    1990]
  • Long-path constraints met if design is
    implemented so that for every connection, c,
    delay(c) <= DBUDGET_MAX(c)

[Figure: long-path slack allocation example; labels: desired period = 10 ns, slack = 8 ns, values 5 ns, 5 ns, 1 ns, 1 ns, 2 ns, 8 ns]
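
The idea of turning path slack into per-connection maximum delay budgets can be sketched for a single path. This is a deliberately simplified illustration with equal weights, not the Minimax-PERT algorithm, which works on the whole timing graph. The function name is hypothetical, and the example values (two 1 ns connections and a 10 ns period) are one assumed reading of the figure labels.

```python
def allocate_max_budgets(delays, required_period):
    """Spread the positive slack of one path equally over its connections."""
    slack = required_period - sum(delays)
    if slack < 0:
        raise ValueError("path already violates its long-path constraint")
    share = slack / len(delays)
    # Each connection may grow by its share without violating the path.
    return [d + share for d in delays]


budgets = allocate_max_budgets([1.0, 1.0], required_period=10.0)
print(budgets)  # prints [5.0, 5.0]
```

As long as every connection stays at or below its budget, the path meets its constraint without the router having to track the path explicitly.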
13
Basic Short-/Long-Path Slack Allocation
  • Slack allocation can also be used to produce
    minimum delay budgets from short-path slacks
  • Need to determine legal min and max delay budgets
  • DBUDGET_MIN(c) > DBUDGET_MAX(c) cannot be
    satisfied
  • Basic algorithm determines min delay budgets by
    allocating short-path slack to max delay budgets

[Figure: short-path slack allocation example; labels: desired THOLD = 4 ns, slack = 6 ns, values 2 ns, 2 ns, 5 ns, 5 ns, 3 ns, 1 ns]
14
Basic Algorithm
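  • Inputs: initial delays (lower-bound delays),
    long-path timing constraints, short-path timing
    constraints
  • Iterative Minimax-PERT positive slack allocation
    using long-path timing constraints -> maximum
    delay budgets
  • Iterative Minimax-PERT positive slack allocation
    using short-path timing constraints -> minimum
    delay budgets
  • Outputs: maximum delay budgets, minimum delay
    budgets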
15
Comments on Basic Algorithm
  • Algorithm sequence guarantees DBUDGET_MIN(c) <
    DBUDGET_MAX(c)
  • Considers lower/upper bounds on delay for each
    connection
  • Restrictions imposed on connections by the FPGA
    routing fabric
  • Problem: Algorithm sequence -> final DBUDGET_MIN
    may be too small to satisfy short-path timing
  • Solution: Need to consider short-path timing
    before finalizing maximum delay budgets

16
Illustration of Problem
[Figure: input IO feeding connection c and constant (negligible) delay resources; labels: 2.1 ns delay, 0.7 ns delay, 1.8 ns clock delay; IO TSETUP requirement = 3 ns, IO THOLD requirement = 0 ns]
  • c delay needs to increase by 1.1 ns for THOLD
  • c shares 2 ns of long-path slack with 7 other
    connections
  • Need to ensure 55% of long-path slack is
    allocated to c to leave room for
    DBUDGET_MIN(c) to satisfy short-path timing
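
The 55% figure follows directly from the numbers on this slide. The short calculation below reproduces it and, as an added assumption for comparison only, shows what c would receive if the shared slack were split equally among the eight connections; the variable names are hypothetical.

```python
needed_increase_ns = 1.1   # extra delay c needs to meet THOLD
shared_slack_ns = 2.0      # long-path slack shared by c and 7 other connections
connections_sharing = 8    # c plus the 7 others

# Fraction of the shared slack that must go to c.
required_fraction = needed_increase_ns / shared_slack_ns

# Assumed equal split, for comparison only.
equal_share_ns = shared_slack_ns / connections_sharing
shortfall_ns = needed_increase_ns - equal_share_ns

print(f"{required_fraction:.0%} of the slack is needed by c")
print(f"an equal split gives c {equal_share_ns} ns, short by {shortfall_ns:.2f} ns")
```

An allocation that ignores short-path timing would therefore leave DBUDGET_MAX(c) too low for any legal DBUDGET_MIN(c).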

17
Solution
  • Use a pre-processing step to find an initial
    (lower-bound) set of delays that satisfy
    short-path timing constraints
  • Pre-processing step iterates between short-path
    and long-path negative slack allocation
  • Short-path negative slack allocation adds delay
    to connections to fix short-path timing
    violations
  • Long-path negative slack allocation removes delay
    from connections to avoid long-path timing
    violations

18
Enhancement to Basic Algorithm
  • /* Adjust initial delays to provide short-path
    critical connections more delay. */
  • DTEMP(c) = DINITIAL(c) for every connection c
  • iterate until stopping condition met:
    • perform short-path timing analysis using
      DTEMP
    • allocate negative short-path slack using
      Minimax-PERT and update DTEMP
    • perform long-path timing analysis using
      DTEMP
    • allocate negative long-path slack using
      Minimax-PERT and update DTEMP
  • DINITIAL(c) = DTEMP(c) for every connection c
  • /* Continue with basic algorithm. */
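
The pre-processing loop can be sketched on a toy model in which each timing constraint is an explicit list of connections. This is only an illustration of the iteration structure: it replaces Minimax-PERT with an equal split of negative slack, enumerates paths explicitly (which the real algorithm avoids), and uses hypothetical names and values throughout.

```python
EPS = 1e-9


def path_delay(conns, delays):
    return sum(delays[c] for c in conns)


def preprocess(initial, short_paths, long_paths, max_iters=50):
    """Return adjusted lower-bound delays for each connection.

    initial:     {connection: lower-bound delay}
    short_paths: [(connections, minimum required path delay)]
    long_paths:  [(connections, maximum allowed path delay)]
    """
    temp = dict(initial)
    for _ in range(max_iters):
        changed = False
        # Short-path pass: add delay where short-path slack is negative.
        for conns, min_delay in short_paths:
            slack = path_delay(conns, temp) - min_delay
            if slack < -EPS:
                for c in conns:
                    temp[c] += -slack / len(conns)
                changed = True
        # Long-path pass: remove delay where long-path slack is negative,
        # never going below the original lower bound.
        for conns, max_delay in long_paths:
            slack = max_delay - path_delay(conns, temp)
            if slack < -EPS:
                for c in conns:
                    temp[c] = max(initial[c], temp[c] + slack / len(conns))
                changed = True
        if not changed:  # stopping condition assumed for this sketch
            break
    return temp


adjusted = preprocess(
    initial={"c": 0.7, "x": 0.5},
    short_paths=[(["c"], 1.8)],      # c alone must take at least 1.8 ns
    long_paths=[(["c", "x"], 3.0)],  # c plus x must take at most 3.0 ns
)
print(adjusted)
```

In this example c is raised by 1.1 ns to fix the short-path violation, and the long-path constraint still holds, so the loop stops after one more pass.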

19
Routing Algorithm Overview
  • Builds upon negotiated congestion routing
  • [Ebeling et al., IEEE Trans. on VLSI, Dec. 1995]
  • Negotiated congestion framework is excellent for
    FPGA routing where wiring is quite limited
  • This work modifies the delay cost and look-ahead
    function to achieve desirable routing delays
  • Consider min/max connection delay budgets

20
Routing Algorithm Background
  • Timing-driven negotiated congestion routers begin
    by picking a set of resources to implement each
    connection
  • Routing resources are initially selected to
    implement each connection for minimum delay
  • Electrical shorts (congestion) are ignored
  • Congestion is resolved over several re-routing
    iterations by encouraging connections to take
    detours
  • Router inner-loop does a directed routing-graph
    search using a cost to score the use of
    resources
  • Congestion component: gradually reduce
    congestion
  • Delay component: keep critical connection delays
    to a minimum

21
Delay Portion of Old Routing Cost
  • Linear long-path cost
  • Critical connections have steeper slope

[Figure: delay portion of routing cost vs. total estimated routing path delay]
22
Incorporating Delay Budgets in Routing
  • Minimum cost point is the target delay
  • Between minimum and maximum delay budgets

[Figure: delay portion of routing cost vs. total estimated routing path delay, marking the min delay budget, target delay, and max delay budget]
23
Incorporating Delay Budgets in Routing
  • Short-path linear region has slope
    -CRITSHORT-PATH
  • CRITSHORT-PATH = (1.0 - DLOWER_BOUND / DTARGET)^0.5

[Figure: delay portion of routing cost vs. total estimated routing path delay, showing the short-path linear region and the long-path linear region, with DLOWER_BOUND and DTARGET marked]
24
Incorporating Delay Budgets in Routing
  • Quadratic costs ensure delay budgets are
    respected unless significant congestion is
    encountered
  • Congestion will be resolved, sacrificing timing
    as little as possible

[Figure: delay portion of routing cost vs. total estimated routing path delay, showing the short-path quadratic region below the min delay budget and the long-path quadratic region above the max delay budget, with the target delay in between]
25
Incorporating Delay Budgets in Routing
  • Shape inspired overall algorithm name
  • Routing Cost Valleys (RCV)

[Figure: delay portion of routing cost vs. total estimated routing path delay, showing the short-path quadratic region below the min delay budget and the long-path quadratic region above the max delay budget, with the target delay in between]
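
The valley-shaped delay cost can be sketched as a piecewise function: linear on either side of the target delay, with an extra quadratic penalty outside the delay budgets. The slides give only the short-path slope, read here as the square root of (1 - DLOWER_BOUND / DTARGET); the long-path criticality, the quadratic form, and the `penalty` weight below are assumptions made for illustration, and all names are hypothetical.

```python
import math


def short_path_criticality(d_lower_bound, d_target):
    """Slope magnitude of the short-path linear region (assumed reading)."""
    return math.sqrt(1.0 - d_lower_bound / d_target)


def delay_cost(delay, d_min, d_target, d_max,
               crit_short, crit_long, penalty=10.0):
    """Delay portion of the routing cost; lowest at the target delay."""
    if delay <= d_target:
        # Short-path side: cost falls linearly toward the target.
        cost = crit_short * (d_target - delay)
        if delay < d_min:
            # Below the minimum budget: add an assumed quadratic penalty.
            cost += penalty * (d_min - delay) ** 2
    else:
        # Long-path side: cost rises linearly away from the target.
        cost = crit_long * (delay - d_target)
        if delay > d_max:
            # Above the maximum budget: add an assumed quadratic penalty.
            cost += penalty * (delay - d_max) ** 2
    return cost


# Hypothetical connection: budgets of 2 ns and 5 ns, target of 3 ns.
crit_s = short_path_criticality(d_lower_bound=0.75, d_target=3.0)
for d in (1.0, 2.0, 3.0, 4.0, 6.0):
    print(d, round(delay_cost(d, 2.0, 3.0, 5.0, crit_s, crit_long=0.5), 3))
```

Because the penalty is a cost rather than a hard limit, a congested connection can still leave its budget window when no legal route exists inside it.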
26
Results
27
Example Route with Inserted Delay
  • Enforcing THOLD = 0 ns

28
Example Route with Inserted Delay
  • Enforcing THOLD = 1 ns

29
Example Route with Inserted Delay
  • Enforcing THOLD = 2 ns

30
Example Route with Inserted Delay
  • Enforcing THOLD = 4 ns

31
Example Route with Inserted Delay
  • Enforcing THOLD = 8 ns

32
Example Route with Inserted Delay
  • Enforcing THOLD = 16 ns

33
Example Route with Inserted Delay
  • Enforcing THOLD = 32 ns

34
Example Route with Inserted Delay
  • Enforcing THOLD = 64 ns

35
Experimental Platform
  • 100 representative FPGA designs from Altera
    customers
  • User timing/placement/routing constraints removed
  • Targeting Stratix devices
  • 3,264 - 67,311 logic elements (median of 12,072
    elements)
  • Version 4.0 of Altera's Quartus II Software
  • No routing failures observed during experiments
  • Benefit of using costs to enforce delay budgets
  • Congestion penalization pushes router to
    sacrifice timing to achieve a legal solution

36
Long-Path Results
  • Quartus II Software was instructed to optimize
    clock frequency, FMAX

37
Long-Path Results
  • Results
  • 3.9% average FMAX improvement
  • Within 8% of an upper bound on FMAX (tolerating
    electrical shorts)
  • 35.6% router time increase
  • 9.3% placement-and-routing time increase
  • Includes delay budget computation time
  • Quadratic region (beyond delay budgets) of
    routing cost beneficial
  • Heavily penalize adding delay that will likely
    cause a timing violation
  • Linear region of routing cost beneficial
  • Delay budget violations inevitable when
    optimizing for maximum speed
  • Linear costs help cover delay budget violations
    by achieving margin on other connections

38
Internal THOLD Results
  • Quartus II Software was instructed to
  • Maximize clock frequency, FMAX
  • Fix THOLD violations related to internal register
    transfers
  • 18 of 100 circuits had internal THOLD problems
  • Results
  • Designs with THOLD violations reduced from 18 to
    5
  • 3.4% average FMAX improvement
  • 20.9% placement-and-routing time increase

39
Internal THOLD Results (Continued)
40
PCI Core Results
  • 66-MHz PCI cores represent a challenging combined
    short/long-path timing optimization problem
  • IO timing requirements (IO TSETUP and THOLD) are
    difficult to satisfy in FPGAs
  • 72 master-target 66-MHz PCI cores compiled into a
    range of Altera Stratix devices and packages
  • Two fastest speed grades

41
PCI Core Results (Before This Work)
42
PCI Core Results (After This Work)
43
Conclusions
  • RCV simultaneously optimizes short/long-path
    timing
  • Relatively small (14-21%) increase in
    placement-and-routing time
  • Greatly reduces short-path timing violations
  • RCV improves long-path timing optimization
  • 4% higher circuit speed for 9% increase in
    place-and-route time
  • Delay budgets guide the router
  • Alert it when a non-critical connection may
    become critical
  • RCV reduces the need for delay chains in FPGAs
  • Save area without sacrificing timing