Title: Efficient and Accurate Gate Sizing with Piecewise Convex Delay Models
1Efficient and Accurate Gate Sizing with Piecewise
Convex Delay Models
- Hiran Tennakoon
- Carl Sechen
University of Washington Department of
Electrical Engineering
2Overview
- Introduction to gate sizing
- Delay modeling
- Modeling gate delays
- Delay propagation
- Optimization issues with piecewise models
- Sizing Example
- Algorithmic Issues
- Results
- Delay Model
- Comparisons with a commercial tool
- Conclusions
3Introduction to gate sizing
- Considerable impact on delay, power, and area
- Generation of delay vs. area or power tradeoff curves
- Exploit readily available fluid standard cell libraries
- Scope
- Generation of delay vs. area tradeoff curve
- Area represented as the sum of transistor sizes
- Within the Static Timing Analysis framework
- Continuous sizing (fluid library)
4Delay Modeling
- Elmore model
- Delay expressed as the sum of first-order time constants
- Convex via variable transformation
- Highly amenable to mathematical programming techniques
- Low accuracy
- Logical Effort
- Reformulation of the Elmore model
- Fitted models
- Simulation data fitted to predetermined function forms
- e.g. K. Kasamsetty, M. Ketkar, and S. S. Sapatnekar, A New Class of Convex Functions for Delay Modeling, IEEE Transactions on CAD, July 2000
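To make the Elmore bullet above concrete, here is a minimal sketch of the delay as a sum of first-order time constants on an RC ladder: each resistance is multiplied by the total capacitance downstream of it. The function name and the component values are illustrative and do not come from the talk.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay to the far end of an RC ladder.

    resistances[i] is the series resistance of stage i (ohms) and
    capacitances[i] is the capacitance to ground at its output node (farads).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream  # stage i charges every capacitor at or past node i
        downstream -= c
    return delay


# Hypothetical two-stage ladder: 1 kOhm / 10 fF followed by 2 kOhm / 20 fF.
d = elmore_delay([1e3, 2e3], [10e-15, 20e-15])
```

The result is a sum of products of resistances and capacitances, which is why the model becomes convex after a variable transformation when both scale with device size.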
5Convex Delay Models
- Higher accuracy
- Captures input slew-rate effects
- Accounts for min and max beta ratio limits
- Globally optimum results
- Smaller range
- Model fit may be good for certain ranges in the design space
- e.g. input slew-rates 20 ps to 300 ps, output loads up to 600 fF, sizes 0.25 µm to 7 µm
- Increasing the range
- Piecewise model generation
6Piecewise Convex Delay Model
- Rise and fall delays and output slew-rates
- Includes all input to output combinations
- Functions of input rise and fall slew-rates, output loading, nMOS and pMOS device sizes
- Parameterized gates, one variable each for the nMOS and pMOS devices for a gate
- Min and max beta ratio limits
- Increase accuracy by subdividing the data into smaller regions
- Four variables to account for input slew-rate, nMOS size, pMOS size, output load
- Each region or piece is fitted to a convex function
7Dividing the Data Set
- Data is organized in terms of input slew-rate and output load ratio
- Load ratio analogous to electrical effort
- Electrical effort: ratio of the output capacitance to input gate capacitance
- Load ratio: ratio of the driven gate size to the driving gate size. Size is the sum of the transistor widths.
- Change in characterization paradigm
- Capacitive load vs. active load
8New Characterization Paradigm
- Accounts for nonlinear effects on the driving
gate due to the Miller effect kick-back from
the driven gate.
9Data Set Organization
- Input slew-rate range: 20 ps to 1.2 ns
- Load ratio range: 1 to 100
- Prune out bad region (may or may not)
- Non-monotonic delay behaviour, negative delay
10Data Set Granularity
- For the given input slew-rate range and load ratio range, each region contains all possible sizes and allowed beta ratios
- Uniform 80 ps step in input slew-rate
11Delay and Slew-rate functions
- Convex under variable transformation
- With a_i ≥ 0, e_i ≥ 0, and b_i, c_i, and d_i any real number
- K. Kasamsetty, M. Ketkar, and S. S. Sapatnekar, A New Class of Convex Functions for Delay Modeling, IEEE Transactions on CAD, July 2000
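The slide's own function form is not preserved in this text, so the following is only a sketch of the general idea behind "convex under variable transformation". It assumes a generic posynomial (a sum of terms a * x**b * y**c with a >= 0 and arbitrary real exponents) and checks numerically that it satisfies midpoint convexity after substituting x = exp(u), y = exp(v). All names and coefficients are hypothetical.

```python
import math
import random


def posynomial(x, y, terms):
    """Sum of a * x**b * y**c over (a, b, c) in terms, with every a >= 0."""
    return sum(a * x**b * y**c for a, b, c in terms)


def log_domain(u, v, terms):
    """The same function after substituting x = exp(u), y = exp(v)."""
    return posynomial(math.exp(u), math.exp(v), terms)


def midpoint_convex(terms, trials=1000, seed=0):
    """Test f((p + q) / 2) <= (f(p) + f(q)) / 2 at random pairs of points."""
    rng = random.Random(seed)
    for _ in range(trials):
        u1, v1, u2, v2 = (rng.uniform(-2.0, 2.0) for _ in range(4))
        mid = log_domain((u1 + u2) / 2, (v1 + v2) / 2, terms)
        avg = (log_domain(u1, v1, terms) + log_domain(u2, v2, terms)) / 2
        if mid > avg * (1 + 1e-12):  # small slack for floating-point rounding
            return False
    return True


# Illustrative coefficients only: (a, b, c) triples with a >= 0.
TERMS = [(1.5, -1.0, 1.0), (0.3, 0.5, -0.7), (2.0, 0.0, 0.0)]
ok = midpoint_convex(TERMS)
```

Each term becomes a * exp(b*u + c*v) in the transformed variables, which is convex for a >= 0 whatever the signs of b and c. This is why only the a-type coefficients need a sign restriction.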
12Delay Propagation in Static Timing Analysis
- Latest arrival propagation
- Signal causing the worst output delay propagated
- With that signal's slew-rate
- Optimistic delay estimation
- Max slew-rate propagation
- Signal causing the worst output delay propagated
- With the worst output slew-rate
- Not necessarily corresponding to the same input signal used to propagate the delay
- Pessimistic delay estimation
13Signal Bounding in STA
- Create a composite output waveform to account for signals with different slew-rates
- Jin-Fuw Lee, D. L. Ostapko, J. Soreff, C. K. Wong, On the signal bounding problem in timing analysis, Proceedings International Conference on CAD, Nov 2001
14Delay Propagation Scheme
- Propagation based on both arrival times and slew-rate
- Find the signal whose arrival time and slew-rate maximizes
- ksr = 0: latest arrival propagation
- ksr = 1: approaches the max slew-rate propagation
- ksr = 0.5: half-envelope method
- Jin-Fuw Lee, D. L. Ostapko, J. Soreff, C. K. Wong, On the signal bounding problem in timing analysis, Proceedings International Conference on CAD, Nov 2001
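The expression being maximized did not survive in this text. The sketch below assumes it has the form arrival + ksr * slew, which is consistent with the three ksr cases listed above but is not confirmed by the slides. The `Signal` type, the function name, and the numbers are hypothetical.

```python
from typing import NamedTuple, Sequence


class Signal(NamedTuple):
    name: str
    arrival: float  # arrival time at the gate output, in ps
    slew: float     # output slew-rate, in ps


def select_signal(signals: Sequence[Signal], ksr: float) -> Signal:
    """Pick the signal maximizing arrival + ksr * slew (assumed metric)."""
    return max(signals, key=lambda s: s.arrival + ksr * s.slew)


candidates = [Signal("a", 100.0, 20.0), Signal("b", 95.0, 60.0)]
latest = select_signal(candidates, ksr=0.0)         # pure latest-arrival choice
half_envelope = select_signal(candidates, ksr=0.5)  # slew-rate now counts too
```

With ksr = 0 the later but sharper signal "a" is propagated. With ksr = 0.5 the slightly earlier but much slower signal "b" wins, which shows how the weight moves the choice between the optimistic and the pessimistic extremes.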
15Optimization with Piecewise models
- Gradients undefined at boundaries of regions
- Successive iterates may get trapped by the
boundary
16Overlapping Regions
- Adjacent regions overlap by half
- Diagonally situated regions overlap by a quarter
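One plausible way to use the overlap, sketched along a single axis: regions of a given width are placed at a stride of half that width, and a query point is assigned to the region whose center is nearest. The point then sits away from that region's boundaries, where the gradient is defined. The region width, start value, and function names are assumptions for illustration and are not taken from the talk.

```python
def region_index(value, start, width, count):
    """Index of the half-overlapping region whose center is nearest to value.

    Region i spans [start + i * width / 2, start + i * width / 2 + width],
    so adjacent regions share half of their extent.
    """
    stride = width / 2.0
    # The center of region i is start + i * stride + width / 2.
    i = round((value - start - width / 2.0) / stride)
    return min(max(i, 0), count - 1)  # clamp at the ends of the data range


def region_bounds(i, start, width):
    """Lower and upper edge of region i."""
    lo = start + i * width / 2.0
    return lo, lo + width


# Hypothetical slew-rate axis: regions 160 ps wide, starting at 20 ps.
idx = region_index(180.0, start=20.0, width=160.0, count=14)
bounds = region_bounds(idx, start=20.0, width=160.0)
```

Applying the same rule independently to the input slew-rate axis and the load ratio axis gives regions that overlap by half with their side neighbours and by a quarter with their diagonal neighbours.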
17Sizing Example: minimize worst-case delay
18New Delay Propagation Scheme
19Delay Propagation Scheme
20Problem Simplification
- Non smooth problem
- Assume that the correct delay and slew-rate are
propagated
21Problem Simplification (contd)
- Kuhn-Tucker optimality conditions
- The Lagrange multipliers assigned to the primary output arrival times must sum to one
- Sum of the multipliers at the input of a gate must equal the sum of the multipliers at the output
- C. P. Chen, C. C. N. Chu, and D. F. Wong, Fast and Exact Simultaneous Gate and Wire Sizing by Lagrangian Relaxation, Proceedings International Conference on CAD, Nov 1998.
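The two conditions above behave like flow conservation on the timing graph. As a sketch only, the function below builds one set of edge multipliers that satisfies them by normalizing the primary-output weights and splitting each node's total equally among its fan-in edges. The equal split, the data layout, and the names are illustrative assumptions, not the scheme used in the talk or in the cited paper.

```python
def distribute_multipliers(fanin, order, po_weights):
    """Assign edge multipliers that satisfy the two conditions above.

    fanin[n] lists the nodes driving n, order is a topological order of all
    nodes (inputs first), and po_weights maps each primary output to a
    non-negative weight. Returns {(driver, node): multiplier}.
    """
    total = sum(po_weights.values())
    node_sum = {n: 0.0 for n in order}
    for po, w in po_weights.items():
        node_sum[po] = w / total  # primary-output multipliers sum to one
    edge = {}
    # Reverse topological order: every fan-out of n is handled before n.
    for n in reversed(order):
        drivers = fanin.get(n, [])
        for d in drivers:
            share = node_sum[n] / len(drivers)  # equal split (illustrative)
            edge[(d, n)] = share
            node_sum[d] += share  # accumulates the output-side sum of d
    return edge


# Hypothetical netlist: g1 = f(in1, in2), g2 = f(in2, in3), out = f(g1, g2).
fanin = {"g1": ["in1", "in2"], "g2": ["in2", "in3"], "out": ["g1", "g2"]}
order = ["in1", "in2", "in3", "g1", "g2", "out"]
edges = distribute_multipliers(fanin, order, {"out": 1.0})
```

For gate g1 the input-side multipliers are 0.25 + 0.25 and the output-side multiplier is 0.5, so the conservation condition holds by construction.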
22Lagrangian of the Problem
- Introduce one Lagrange multiplier per constraint
- Function of gate sizes x and Lagrange multipliers λ
23Minimizing Area given a Delay Target
- Sum of the multipliers at the input of a gate must equal the sum of the multipliers at the output
24Algorithmic Issues
- The gate sizing problem is solved with a primal-dual algorithm
- For a fixed set of multipliers satisfying the KT conditions, find the minimum with respect to the sizes x_i
- Update the multipliers using a sub-gradient technique
- Repeat until convergence
- A known problem with the sub-gradient scheme is how to choose a good step size control mechanism
- Theoretical optimum update per iteration k
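The loop described above can be sketched on a toy problem that is much simpler than gate sizing: minimize x1 + x2 subject to 1/x1 + 1/x2 <= T, where the sum stands in for area and the reciprocals for stage delays. This is an assumed stand-in, not the talk's formulation, and the constant step size is a simplification of exactly the step-size control issue the slide raises.

```python
import math


def solve_toy(target_delay, step=0.5, iterations=100):
    """Minimize x1 + x2 subject to 1/x1 + 1/x2 <= target_delay.

    Returns (x, lam): the common value of x1 and x2 and the multiplier.
    """
    lam = 0.25  # initial multiplier (arbitrary positive value)
    for _ in range(iterations):
        # Inner step: for fixed lam, x + lam / x is minimized at x = sqrt(lam).
        x = math.sqrt(lam)
        # The constraint violation is a sub-gradient of the dual at lam.
        violation = 2.0 / x - target_delay
        lam = max(lam + step * violation, 1e-12)  # keep the multiplier positive
    return math.sqrt(lam), lam


x, lam = solve_toy(target_delay=2.0)
```

For T = 2 the iterates approach x = 1 and lam = 1, where the delay constraint is active. A larger step can overshoot and a smaller one converges slowly, which illustrates why the update rule matters in practice.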
25Multiplier Update
- Practical multiplier update
- A: required arrival time at primary output
- a_i: arrival time at any node
- k: iteration
- H. Tennakoon and C. Sechen, Gate sizing using Lagrangian relaxation combined with a fast gradient-based pre-processing step, Proc. Intl. Conf. on Computer-Aided Design, pp. 395-402, Nov 2002.
26Scaling Issues
- Example of a poorly scaled problem
- The function is very sensitive to changes in x1
- Delay constrained area minimization can potentially have a scaling problem
- Dynamic scaling between the objective and the constraint functions
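The slide's example function is not preserved in this text. As a stand-in, the sketch below uses a textbook-style quadratic whose x1 term is weighted by 1e9, and shows that a change of variable removes the imbalance. The function and the scale factor are assumptions chosen for illustration.

```python
import math

SCALE = math.sqrt(1e9)  # change of variable z1 = SCALE * x1


def f(x1, x2):
    """Poorly scaled quadratic: the x1 term dominates by nine orders of magnitude."""
    return 1e9 * x1**2 + x2**2


def f_scaled(z1, x2):
    """The same function expressed in the rescaled variable z1 = SCALE * x1."""
    return z1**2 + x2**2


delta = 1e-3
sensitivity_ratio = f(delta, 0.0) / f(0.0, delta)            # about 1e9
scaled_ratio = f_scaled(delta, 0.0) / f_scaled(0.0, delta)   # exactly balanced
```

In the original variables the same small perturbation changes the function about a billion times more along x1 than along x2, which makes gradient-based steps badly conditioned. After rescaling, both directions are equally sensitive.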
27Duality
- The primal problem
- Delay constrained area minimization
- Minimization of the worst-case delay
- Lagrangian dual
- Maximizing L(x, λ) with respect to λ
- Delay constrained area minimization
- Minimization of the worst-case delay
28Optimality
- Relationship between the primal and the dual
- Global optimum point if
- In practice for all solutions
- Primal-dual tolerance less than 1%
- Active delay constraints satisfied within 10ps
tolerance
29Delay Model Accuracy
- Library composition
- 11 inverting gates
- 0.18 µm TSMC technology
- Min size 0.5 µm, max size 12 µm
30Comparison with a Leading Commercial Tool
- 31 benchmarks from ISCAS85 and ITC99
- Generated 11 points on the area vs. delay curve
- For each solution the transistor sizes are rounded to the nearest 1/10th of a micron
- Hspice simulation is run on the rise and fall critical paths for each solution point to compare the accuracy of the delay estimation
- The leading commercial transistor sizing tool (CTST) was given the same constraints
31
- 4757 cells, execution time 502.42 s, speedup 7.6X
- Forge finds a 1.14X faster design, and has 32.74% less transistor area.
32
- 5489 cells, execution time 1712 s, speedup 6.5X
- Forge finds a 1.22X faster design, and has 63.45% less transistor area.
33
- 21,920 cells, execution time 1 hr
- CTST failed to complete within 3 days
34
- 31,635 cells, execution time 1.73 hrs
- CTST failed to complete within 3 days
35
- 44,615 cells, execution time 2 hrs
- CTST failed to complete within 3 days
36Summary of Results
- Average area reduction over CTST: 29%
- Average improvement in runtime: 6.4X
- Average absolute error in delay estimation compared to Hspice simulation on rise and fall critical paths: 4.23%
- Three of the largest designs from ITC99 failed to complete with CTST after running for 3 days
37Conclusions
- Forge
- Combines a piecewise convex delay model with a new delay propagation scheme
- Fast generation of area vs. delay tradeoff curves
- Critical path delay estimation is on average within 4.23% of Hspice
- Compared to a leading commercial transistor sizing tool
- Forge produces solutions that on average consume 29% less area
- Forge is on average 6.4X faster