Title: Improved Algorithms for Link-Based Non-tree Clock Network for Skew Variability Reduction
1 Improved Algorithms for Link-Based Non-tree Clock Network for Skew Variability Reduction
- Anand Rajaram, David Z. Pan, Jiang Hu
- Dept. of ECE, UT-Austin
- Texas Instruments, Dallas
- Dept. of EE, TAMU
2 Outline
- Introduction
- Review of link-based non-tree clock network
- Improved algorithms (over Rajaram et al., DAC04)
  - Rule-based algorithm (δ-rule)
  - Graph-theoretical approach (MST-based)
- Experimental results
- Conclusions
3 Clock Distribution Network
- Signal transfer is coordinated by the clock signal
- All registers are supplied with the clock signal by the clock distribution network
- Skew = d1 - d2
- Zero skew: d1 = d2
- Useful skew: d1 - d2 = d12
[Figure: the clock network drives two registers (1 and 2), which catch signals, with clock delays d1 and d2; the path between the registers is labeled Dmax]
4 Clocks: Important Considerations and Objectives
- One of the biggest and most frequently switching nets
- Very sensitive to unwanted skew introduced by PVT
  - Manufacturing process variations (P)
  - Power supply voltage noise (V)
  - Temperature variations (T)
- Less clock skew variation is a MUST for nanometer VLSI designs
- Minimizing clock routing wire-length can
  - Reduce power consumption
5 Approaches for Reducing Skew Variability
- Buffer/wire sizing [Pullela et al., DAC93; Chung et al., ICCAD94; Wang et al., ISPD04]
- Variation-aware routing [Lin et al., ICCAD94; Lu et al., ISPD03]
- Non-tree clock networks
  - [McCoy et al., ETC94; Vandenberghe et al., ICCAD97; Xue et al., ICCAD95]
  - Link-based non-tree clock networks [Rajaram et al., DAC04]
6 Non-tree: 1-D Spine [Kurd et al., JSSC01]
- 1-D spine
- Applied in Intel Pentium processor design
- Variations between spines still exist
[Figure: spines driving clock sinks or local sub-networks]
7 Non-tree: 2-D Mesh
- Top-level mesh [Su et al., ICCAD01]
  - Less wire, less effective
- Leaf-level mesh [Restle et al., JSSC01]
  - Very effective, huge wire
  - Applied in IBM microprocessors
[Figure: top-level and leaf-level meshes, each driving clock sinks or local sub-networks]
8 Linked Non-tree = Tree + Links [Rajaram et al., DAC04]
- Non-tree = tree + links
- How to select link pairs is the key!
- Link = link_capacitors + link_resistor
[Figure: clock tree with a link between nodes u and w; node i is also labeled]
9 Skew Between Link Endpoints
10 Skew Between Any Two Nodes (i, j) with Link (u, w)
[Figure: clock tree with nodes P, g, h, u, and w labeled]
P: nearest common ancestor of u and w
Tx: sub-tree rooted at x
- Skew variation between any node pair (i, j)
  - Scenario 1: i ∈ Tg, j ∈ Th -> always smaller
  - Scenario 2: i, j ∈ Tg (or Th) -> could be worse
  - Scenario 3: i ∈ Tp, j ∉ Tp -> could be much worse
- Key idea: try to avoid Scenarios 3 and 2 for link insertion
11 Rule-Based Algorithms [Rajaram et al., DAC04]
γ-rule: the nearest common ancestor's depth from the root is < γmax
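The rule above bounds the depth of the nearest common ancestor of the two link endpoints. The following is a minimal sketch of such a check, assuming the clock tree is stored as a child-to-parent map. The names `parent`, `nca_depth`, `passes_depth_rule`, and the sample tree are hypothetical and are not taken from the original work.

```python
def path_to_root(parent, node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def nca_depth(parent, u, w):
    """Depth (root = 0) of the nearest common ancestor of u and w."""
    ancestors_of_u = set(path_to_root(parent, u))
    # Walk up from w until reaching a node that is also an ancestor of u.
    node = w
    while node not in ancestors_of_u:
        node = parent[node]
    # Depth is the number of edges between that node and the root.
    return len(path_to_root(parent, node)) - 1


def passes_depth_rule(parent, u, w, max_depth):
    """True if the nearest common ancestor of (u, w) is shallower than max_depth."""
    return nca_depth(parent, u, w) < max_depth


# Hypothetical tree: root -> a, b; a -> s1, s2; b -> s3, s4
parent = {"root": None, "a": "root", "b": "root",
          "s1": "a", "s2": "a", "s3": "b", "s4": "b"}

print(nca_depth(parent, "s1", "s3"))             # 0
print(nca_depth(parent, "s1", "s2"))             # 1
print(passes_depth_rule(parent, "s1", "s3", 1))  # True
print(passes_depth_rule(parent, "s1", "s2", 1))  # False
```

In this toy tree, the pair (s1, s3) meets only at the root, so it is hierarchically far apart and passes a bound of 1, while the sibling pair (s1, s2) does not.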
12 Guidelines for Node Pair Selection for Link Insertion
- Select nodes which are hierarchically far apart
- Select nodes physically close to each other
- Select nodes with equal nominal delay
- Select nodes closer to leaf nodes
- For zero skew routing, only select leaf nodes
13 Rule-Based Algorithms [Rajaram et al., DAC04]
- Merits
  - Physical characteristics of the links are considered, so bad links are avoided
  - Independent of the balanced nature of the clock structure
  - Efficient run time
- Demerits
  - No control over the distribution of links
  - Possibility of links getting added in the same region
- Solution
  - δ-rule: no two links should have the same pair of ancestors at depth δ from the clock source
  - Retains the merits of the previous rules and addresses the demerit
[Figure: example using δ = 2, with nodes A, B, C, and D labeled]
14 δ-Rule: An Example
[Figure: example clock tree with nodes A, B, C, and D labeled]
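The rule that no two links may share the same pair of ancestors at a given depth can be sketched as a simple filter over candidate links. This is an illustrative sketch only: the child-to-parent map `parent`, the helper names, the first-come-first-kept ordering, and the sample tree are all assumptions, not the authors' implementation.

```python
def ancestor_at_depth(parent, node, depth):
    """Ancestor of `node` at `depth` from the root (root = 0), or None if node is shallower."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    path.reverse()  # path[0] is the root, path[d] is the ancestor at depth d
    return path[depth] if depth < len(path) else None


def filter_links(parent, candidate_links, depth):
    """Keep a link only if no earlier kept link has the same ancestor pair at `depth`."""
    seen = set()
    kept = []
    for u, w in candidate_links:
        # Unordered pair of the two endpoints' ancestors at the given depth.
        key = frozenset((ancestor_at_depth(parent, u, depth),
                         ancestor_at_depth(parent, w, depth)))
        if key in seen:
            continue  # another link already connects these two regions
        seen.add(key)
        kept.append((u, w))
    return kept


# Hypothetical tree: A, B, C, D sit at depth 2; lowercase names are sinks.
parent = {"root": None, "L": "root", "R": "root",
          "A": "L", "B": "L", "C": "R", "D": "R",
          "a1": "A", "a2": "A", "b1": "B",
          "c1": "C", "c2": "C", "d1": "D"}
candidates = [("a1", "c1"), ("a2", "c2"), ("b1", "d1"), ("a1", "d1")]

print(filter_links(parent, candidates, depth=2))
# [('a1', 'c1'), ('b1', 'd1'), ('a1', 'd1')]
```

Here the second candidate is dropped because it would connect the same two depth-2 regions (A and C) as the first, which is the clustering of links that the rule is meant to prevent.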
15 Graph-Theoretical Approach
- The entire clock tree is recursively divided into two parts, and links are added between them
- This ensures distribution of links throughout the clock tree
- Select_Node_Pairs(Tv)
  - l ← v.left_child
  - r ← v.right_child
  - P ← Select_node_pair_between(Tl, Tr, k)
  - if Depth(v) ≥ depth_limit, return P
  - P ← P ∪ Select_Node_Pairs(Tl)
  - P ← P ∪ Select_Node_Pairs(Tr)
  - Return P
- Edge weight = minimum distance between sinks of Tli and Trj
[Figure: node v with children l and r; Tl is split into Tl1 and Tl2, Tr into Tr1 and Tr2, and the sub-trees are shown as a bipartite graph]
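The recursive Select_Node_Pairs procedure above can be sketched in runnable form as follows. The `Node` class, the Manhattan distance metric, and in particular `select_between` are assumptions made for illustration: `select_between` simply returns the k closest sink pairs across the two sub-trees and stands in for the matching-based or MST-based selectors discussed in the talk.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    pos: Tuple[float, float] = (0.0, 0.0)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sinks(t):
    """Leaf nodes (clock sinks) of the sub-tree rooted at t."""
    if t.left is None and t.right is None:
        return [t]
    out = []
    for child in (t.left, t.right):
        if child is not None:
            out.extend(sinks(child))
    return out


def manhattan(a, b):
    return abs(a.pos[0] - b.pos[0]) + abs(a.pos[1] - b.pos[1])


def select_between(tl, tr, k):
    """Stand-in selector (an assumption): the k closest sink pairs across tl and tr."""
    pairs = sorted(product(sinks(tl), sinks(tr)), key=lambda p: manhattan(*p))
    return [(a.name, b.name) for a, b in pairs[:k]]


def select_node_pairs(v, k, depth_limit, depth=0):
    """Recursively pick link candidates between the left and right halves of each sub-tree."""
    if v.left is None or v.right is None:
        return []  # nothing to split at a leaf
    pairs = select_between(v.left, v.right, k)
    if depth >= depth_limit:
        return pairs
    pairs += select_node_pairs(v.left, k, depth_limit, depth + 1)
    pairs += select_node_pairs(v.right, k, depth_limit, depth + 1)
    return pairs


# Hypothetical four-sink tree.
s1, s2 = Node("s1", (0, 0)), Node("s2", (0, 2))
s3, s4 = Node("s3", (1, 0)), Node("s4", (5, 5))
root = Node("root", left=Node("a", left=s1, right=s2),
            right=Node("b", left=s3, right=s4))

print(select_node_pairs(root, k=1, depth_limit=1))
# [('s1', 's3'), ('s1', 's2'), ('s3', 's4')]
```

Raising `depth_limit` spreads link candidates into deeper sub-trees, which mirrors the stated goal of distributing links throughout the clock tree.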
16 Graph-Theoretical Approach: Min-Matching [Rajaram et al., DAC04]
- Bipartite min-matching algorithm to select the node pairs
- Merits
  - Distributes links evenly through all regions of the clock network
- Demerits
  - Due to the nature of the min-matching algorithm, only one link per sub-tree is allowed
  - May result in some very lengthy links and increased wire length
  - Lengthy links might be difficult to route
  - Complexity of min-matching is O(n^3). Not scalable!
[Figure: node v with children l and r; lengthy links highlighted]
17 New Graph-Theoretical Approach: Minimum Spanning Tree Based
- MST algorithm allows more than one link per sub-tree
- More short links (cf. bipartite approach)
- Retains the merits of the min-matching based approach
  - Evenly distributes the links
- Complexity is O(n log n)
  - Much faster than the bipartite matching algorithm, O(n^3)
[Figure: node v with children l and r]
18 MST-Based Algorithm
- MST_node_pair_select(Tl, Tr, k)
  - Divide Tl into k sub-trees, Sl = {Tl1, Tl2, Tl3, ..., Tlk}
  - Divide Tr into k sub-trees, Sr = {Tr1, Tr2, Tr3, ..., Trk}
  - Find the MST of the completely connected bipartite graph between Sl and Sr
- After MST pair selection, iteratively delete edges violating the four rules (α, β, γ, and δ)
[Figure: node v with children l and r; sub-trees Tl1, Tl2 (set Sl) and Tr1, Tr2 (set Sr) shown as a bipartite graph]
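The MST step of MST_node_pair_select can be illustrated with Kruskal's algorithm on the complete bipartite graph between the two sets of sub-trees. This is a sketch under stated assumptions: each sub-tree is represented only by a name and a list of sink coordinates, distances are Manhattan, and the function names and sample data are hypothetical. It enumerates all k × k edges, so it does not attempt the O(n log n) bound quoted earlier, and it omits the subsequent rule-based edge deletion.

```python
from itertools import product


def min_sink_distance(sinks_a, sinks_b):
    """Edge weight: minimum Manhattan distance between any sink of A and any sink of B."""
    return min(abs(ax - bx) + abs(ay - by)
               for (ax, ay), (bx, by) in product(sinks_a, sinks_b))


def bipartite_mst(left, right):
    """Kruskal's algorithm on the complete bipartite graph between `left` and `right`.

    `left` and `right` map a (unique) sub-tree name to its list of sink coordinates.
    Returns the MST edges as (left_name, right_name, weight) tuples.
    """
    edges = sorted(
        (min_sink_distance(ls, rs), l, r)
        for (l, ls), (r, rs) in product(left.items(), right.items()))
    root = {name: name for name in list(left) + list(right)}

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    mst = []
    for w, l, r in edges:
        rl, rr = find(l), find(r)
        if rl != rr:  # the edge joins two different components
            root[rl] = rr
            mst.append((l, r, w))
    return mst


# Hypothetical sub-trees with k = 2 on each side.
left = {"Tl1": [(0, 0), (0, 1)], "Tl2": [(0, 5)]}
right = {"Tr1": [(2, 0)], "Tr2": [(2, 5), (3, 6)]}

print(bipartite_mst(left, right))
# [('Tl1', 'Tr1', 2), ('Tl2', 'Tr2', 2), ('Tl1', 'Tr2', 6)]
```

With k sub-trees on each side the spanning tree has 2k - 1 edges, so a sub-tree such as Tl1 can receive more than one link candidate, unlike in a one-to-one matching.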
19 Experimental Setup
- Benchmarks r1-r5 from bounded skew tree work [Cong et al., ICCAD95]
- Interconnect width variation
  - Smaller than thickness
  - More sensitive to variations
- Load capacitance variation
- Skew variability measure: standard deviation
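As a small worked illustration of using standard deviation as the skew variability measure, the sketch below computes the per-sample skew between two sinks and its standard deviation. The delay samples are made up, the function name `skew_variability` is hypothetical, and the choice of the population standard deviation is an assumption, since the slides do not specify which estimator was used.

```python
import statistics


def skew_variability(delays_1, delays_2):
    """Population standard deviation of the per-sample skew d1 - d2."""
    skews = [d1 - d2 for d1, d2 in zip(delays_1, delays_2)]
    return statistics.pstdev(skews)


# Made-up delay samples (arbitrary time units) for two sinks over five trials.
d1_samples = [102.0, 99.0, 100.5, 98.5, 100.0]
d2_samples = [100.0, 100.0, 100.0, 100.0, 100.0]

sigma = skew_variability(d1_samples, d2_samples)
print(round(sigma, 4))  # 1.2247
```

The skews here are 2.0, -1.0, 0.5, -1.5, and 0.0, whose mean is zero, so the result is the square root of 7.5 / 5 = 1.5.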
20 Experimental Result on Skew Variability
21 HSPICE Validation
22 Experimental Result on Wire-length
23 Wire-length Comparison Between Link Insertion Methods
24 Conclusions
- Two new efficient algorithms for link insertion have been proposed
- Significant skew variability reduction with very small wire-length increase
- Scale very well with the size of the clock network for both runtime and QoR
- Proposed methodology is independent of the nature of variability effects
- Friendly to incremental changes
25 Sources of the Unwanted Skew Variations
- Process variations (P)
  - Gate variations
    - Gate length variation
    - Tox variation
  - Interconnect variations
    - Significantly affect delay and skew [Liu et al., DAC00]
  - Load capacitance variations
- Supply voltage noise (V)
- Temperature variations (T)
[Figure: gate variations and interconnect width variations]
26 Skew Between Link Endpoints
27 General Flow of Non-tree Clock
- Obtain initial clock tree
- Find node pairs for link insertion
- Add link capacitances to selected nodes
- Tune merging node location to restore original skew
- Insert link resistance to selected node pairs
28 Run Time Comparison
Runtime comparison between the different methods
as a function of number of links at ? 1