ECE260B CSE241A Winter 2005 Clocking - PowerPoint PPT Presentation

1
ECE260B / CSE241A Winter 2005: Clocking
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
Slides courtesy of Prof. Andrew B. Kahng
2
Outline
  • Problem Statement
  • Clock Distribution Structures
  • Robustness / Signal Integrity Control
  • Clock Design
  • Skew Scheduling
  • Topology Construction
  • Embedding

3
Why Clocks?
  • Clocks provide the means to synchronize
  • By allowing events to happen at known timing
    boundaries, we can sequence these events
  • Greatly simplifies building of state machines
  • No need to worry about variable delay through
    combinational logic (CL)
  • All signals delayed until clock edge (clock
    imposes the worst case delay)

[Figure: dataflow and FSM examples, each built from combinational logic blocks separated by registers]
Courtesy K. Yang, UCLA
4
Clock Distribution Network
  • General goal of clock distribution
  • Deliver clock to all memory elements with
    acceptable skew
  • Deliver clock edges with acceptable sharpness
  • Clocking network design is one of the greatest
    challenges in the design of a large chip
  • Consume up to 1/3 of chip power
  • Accurate signal delay
  • Signal integrity
  • Subject to uncertainty / variation of different
    processes / operating conditions

5
Clock Design Components
  • Oscillator
  • Dividers
  • Buffers
  • Strong drivers
  • Reduce delay
  • Signal integrity / slew rate
  • Interconnects
  • Balanced trees, meshes, etc.
  • Shielding (e.g., for crosstalk reduction)
  • Non-tree links / feedback loops

6
Clock Distribution Objective
  • Minimum / bounded skew
  • performance / hold time requirements
  • Guaranteed slew rate / signal integrity
  • Small insertion delay
  • Robustness under process / operating condition
    variation
  • Minimum cell / routing area
  • Minimum power consumption

7
Clock Distribution Robustness Subject to
  • Radically different loading (flip-flop density)
  • Across the die
  • ECO (Engineering Change Order)
  • Interconnect coupling
  • Signal integrity
  • Delay variation
  • Process variation
  • From lot-to-lot
  • Across the die
  • Buffers
  • Metal width
  • Supply voltage variation across the die
  • Both static IR drop
  • Dynamic voltage drop
  • Temperature

8
Issues in Clock Distribution Network Design
  • Skew
  • Process, voltage, and temperature
  • Data dependence
  • Noise coupling
  • Load balancing
  • Power, $CV^2 f$ (consumes up to 1/3 of total chip
    power)
  • Clock gating
  • Flexibility/Tunability
  • Compactness: fit into existing layout/design
  • Facilitate ECO
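
The $CV^2 f$ power term above can be made concrete with a rough calculation. The sketch below is only illustrative: the capacitance, supply voltage, and frequency are assumed values, not figures from the slides.

```python
def dynamic_power(cap_farads, vdd_volts, freq_hz, activity=1.0):
    """Dynamic switching power P = activity * C * V^2 * f, in watts.

    A clock net charges and discharges once per cycle, so activity = 1.
    """
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

# Hypothetical clock network: 1 nF switched capacitance, 1.2 V supply, 1 GHz.
clock_power = dynamic_power(1e-9, 1.2, 1e9)
print(f"clock power = {clock_power:.2f} W")
```

Because the clock toggles every cycle, its activity factor is the highest on the chip, which is one reason the clock network can take such a large share of total power.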

9
Skew: Clock Delay Varies With Position
10
Clock Skew Causes
  • Designed (unavoidable) variations: mismatch in
    buffer load sizes, interconnect lengths
  • Process variation: process spread across die
    yielding different Leff, Tox, etc. values
  • Temperature gradients: change MOSFET
    performance across die
  • IR voltage drop in power supply: changes MOSFET
    performance across die
  • Note: delay from clock generator to fan-out
    points (clock latency) is not important by itself
  • BUT: increased latency leads to larger skew for
    same amount of relative variation

Sylvester / Shepard, 2001
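
The last point on this slide (larger latency gives larger skew for the same relative variation) can be shown with a small calculation. This is a simplified sketch that assumes two paths of equal nominal latency, one running slow and one running fast by the same fraction; the latencies and the 5% figure are made-up inputs.

```python
def skew_from_variation(latency_ps, relative_variation):
    """Worst-case skew between two paths with the same nominal latency
    when one is slow and the other fast by `relative_variation`."""
    slow = latency_ps * (1 + relative_variation)
    fast = latency_ps * (1 - relative_variation)
    return slow - fast

for latency in (500.0, 1000.0):
    skew = skew_from_variation(latency, 0.05)
    print(f"latency {latency:.0f} ps, 5% variation: skew {skew:.0f} ps")
```

Doubling the insertion delay doubles the skew, even though the relative variation is unchanged.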
11
Outline
  • Problem Statement
  • Clock Distribution Structures
  • Robustness / Signal Integrity Control
  • Clock Design
  • Skew Scheduling
  • Topology Construction
  • Embedding

12
Clock Distribution Structures
  • Grids
  • Reliable
  • Less data dependency
  • Tunable (late in design)
  • RC-Tree
  • Less capacitance
  • More accuracy
  • Flexible wiring
  • Shown here for final stage drivers driving F/F
    loads

13
Grids
  • Gridded clock distribution common on earlier DEC
    Alpha microprocessors
  • Advantages
  • Skew determined by grid density, not too
    sensitive to load position
  • Clock signals available everywhere
  • Tolerant to process variations
  • Usually yields extremely low skew values
  • Disadvantages
  • Huge amount of wiring and power
  • To minimize such penalties, need to make grid
    pitch coarser → lose the grid advantage

Pre-drivers
Global grid
Sylvester / Shepard, 2001
14
H-Tree
  • H-tree (Bakoglu)
  • One large central driver, recursive structure to
    match wirelengths
  • Halve wire width at branching points to reduce
    reflections
  • Disadvantages
  • Slew degradation along long RC paths
  • Unrealistically large central driver
  • Clock drivers can create large temperature
    gradients (e.g., Alpha 21064: 30 °C)
  • Non-uniform load distribution
  • Inherently non-scalable (wire R growth)
  • Partial solution: intermediate buffers at
    branching points

courtesy of P. Zarkesh-Ha
Sylvester / Shepard, 2001
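
The recursive, wirelength-matching structure of the H-tree can be sketched in a few lines. The function below is an illustrative geometry generator only (no wire widths, buffers, or RC effects); its name and parameters are assumptions made for this example.

```python
def h_tree_sinks(x, y, half, levels, path=0.0, horizontal=True):
    """Return (x, y, path_length) for each leaf of a recursive H-tree.

    Every branch point spawns two arms of length `half`. The direction
    alternates, and the arm length halves after each vertical branch.
    """
    if levels == 0:
        return [(x, y, path)]
    sinks = []
    for sign in (-1, 1):
        nx = x + sign * half if horizontal else x
        ny = y if horizontal else y + sign * half
        next_half = half if horizontal else half / 2
        sinks += h_tree_sinks(nx, ny, next_half, levels - 1,
                              path + half, not horizontal)
    return sinks

sinks = h_tree_sinks(0.0, 0.0, half=4.0, levels=4)
lengths = {p for _, _, p in sinks}
print(len(sinks), "sinks, distinct source-to-sink lengths:", lengths)
```

Every leaf ends up at the same wire distance from the root, which is what makes the ideal H-tree zero-skew when loads are uniform.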
15
Buffered H-tree
  • Advantages
  • Ideally zero-skew
  • Can be low power (depending on skew requirements)
  • Low area (silicon and wiring)
  • CAD tool friendly (regular)
  • Disadvantages
  • Sensitive to process variations
  • Devices → want same size buffers at each level of
    tree
  • Wires → want similar segment lengths on each
    layer in each source-sink path !!!
  • Local clocking loads inherently non-uniform

Sylvester / Shepard, 2001
16
Tree Balancing
Con: routing area is often more valuable than silicon
Some techniques: a) introduce dummy loads; b) snake
wires to match delays
Sylvester / Shepard, 2001
17
Examples From Processor Chips
Grids: DEC Alphas
Serpentines: Intel x86 [Young, ISSCC 97]
  • H-Tree, Asymmetric RC-Tree (IBM)

18
Example Skews From Processor Chips
DEC-Alpha 21064 clock spines
DEC-Alpha 21064 RC delays
DEC-Alpha 21164 RC local delays
DEC-Alpha 21164 RC delays for Global Distribution
(Spine Grid)
19
ReShape Clocks Example (High-End ASIC)
  • Balanced, shielded H-tree for pre-clock
    distribution
  • Mesh for block level distribution
  • All routes 5-6 µm on M6/M5, shielded with 1 µm grounds
  • 10 buffers per node
  • E.g., ganged BUFx20s
  • Output mesh must hit every sub-block

20
Block Level Mesh (0.18 µm)
21
Problems with Meshes
  • Burn more power at low frequencies
  • Blocks more routing resources (solution:
    integrated power distribution with ribs can
    provide shielding for free)
  • Difficult for spare clock domains that will not
    tolerate regioning
  • Post placement (and routing) tuning required
  • No beneficial skew possible
  • Clock gating only easy at root
  • Fighting tools to do analysis
  • Clumped buffers a problem in Static Timing
    Analysis tools
  • Large shorted meshes a problem for STA tools
  • What does Elmore delay calculation look like for
    a non-tree?
  • → Need full extraction and SPICE-like simulation
    to determine skew
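
For contrast with the mesh case, here is what the Elmore delay calculation looks like on a tree. This is a minimal sketch using a lumped RC model with arbitrary units; the array layout (`parent`, `res`, `cap`) is an assumption of this example. It relies on every node having exactly one path to the root, which is the property a shorted mesh lacks.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay from the root to every node of an RC tree.

    parent[i]: index of node i's parent (root is node 0, parent -1);
               nodes are ordered so that parent[i] < i.
    res[i]:    resistance of the edge from parent[i] to i.
    cap[i]:    capacitance lumped at node i.
    """
    n = len(parent)
    downstream = list(cap)
    for i in range(n - 1, 0, -1):        # accumulate subtree capacitance bottom-up
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    for i in range(1, n):                # add R * downstream C along each path
        delay[i] = delay[parent[i]] + res[i] * downstream[i]
    return delay

# Root 0 drives node 1, which branches to sinks 2 and 3.
delays = elmore_delays(parent=[-1, 0, 1, 1], res=[0, 1, 2, 1], cap=[0, 1, 1, 3])
print("delays:", delays, "skew between sinks:", abs(delays[2] - delays[3]))
```

The two linear passes work only because the structure is a tree; with loops there is no unique "downstream" capacitance per edge, hence the need for full extraction and simulation.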

22
Benefits of Meshes
  • Deterministic since shielded all the way down to
    rib distribution
  • No ECO placement required: all buffers preplaced
    before block placement
  • Low latency since uses shorted (ganged,
    parallel) drivers, therefore lower skew
  • ECO placements of FFs later do not require
    rebalancing of tree
  • Idealized clocking environment for concurrent
    dance of RTL design and timing convergence

23
Hybrid Structure
  • Balanced tree on the top
  • Mesh in the middle
  • Minimize skew
  • Steiner minimum tree at the bottom
  • Minimize cost
  • Facilitate ECO

24
Outline
  • Problem Statement
  • Clock Distribution Structures
  • Robustness / Signal Integrity Control
  • Clock Design
  • Skew Scheduling
  • Topology Construction
  • Embedding

25
Process Variation
  • Intra-die and inter-die variations
  • Intra-die variation is increasingly significant
    since 0.13um technology
  • Systematic and random variations
  • Systematic variation is due to equipment,
    process, etc.
  • Global lens aberration in lithography causes
    systematic variation
  • Pattern-dependent optical proximity, chemical
    mechanical polish (CMP)
  • Random variation is due to inherent variation
  • Spatial correlation across a chip
  • Fast vs. slow corners

26
Process Variation
  • Metal wires
  • Width variation can be estimated by LUT(width,
    spacing)
  • Thickness variation is driven by CMP, which
    depends on local density
  • Thickness variation also depends on wire width
    and spacing
  • Could be up to 30-40% in 90nm process
  • Transistors
  • Channel length variation (delay $\propto L^{1.5}$)
  • Thin gate oxide: tox variation → Vth variation
  • Up to 30% variation in terms of driving capability
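
Taking the channel-length dependence as delay proportional to L^1.5 (an assumption about the intended exponent on this slide), a quick calculation shows how a length variation translates into delay variation. The ±10% input is illustrative.

```python
def delay_ratio(length_ratio, exponent=1.5):
    """Relative gate delay when the channel length is scaled by
    `length_ratio`, assuming delay ~ L**exponent."""
    return length_ratio ** exponent

for dl in (-0.10, 0.10):
    change = delay_ratio(1 + dl) - 1
    print(f"L {dl:+.0%} -> delay {change:+.1%}")
```

The delay change is roughly 1.5 times the length change, so a modest channel-length spread becomes a noticeably larger buffer-delay spread.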

27
Process Variations SPICE model
  • Process variations are reflected into a
    statistical SPICE model
  • Usually only a few parameters have a statistical
    distribution (e.g., DL, DW, TOX, VTn, VTp) and
    the others are set to a nominal value
  • The nominal SPICE model is obtained by setting
    the statistical parameters to their nominal value

Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer
of UCB
28
Global Variations (Inter-die)
  • Process variations → performance variations
  • Critical path delay of a 16-bit adder

All devices have the same set of model
parameter values
Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer
of UCB
29
Local Variations (Intra-die)
  • Each device instance has a slightly different set
    of model parameter values (aka device mismatch)
  • The performance of some analog circuits strongly
    depends on the degree of matching of device
    properties
  • Digital circuits are in general more immune to
    mismatch, but clock distribution network is
    sensitive (clock skew)

Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer
of UCB
30
Statistical Design
  • Need to account for process variations during
    design phase
  • Statistical design
  • Nominal design
  • Yield optimization
  • Design centering

Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer
of UCB
31
Statistical Design
Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer
of UCB
32
Process Variation Tolerance Enhancement
  • Rule of thumb: balanced tree
  • Identical buffers at identical heights
  • Drive identical subtree loads
  • Can we do better than this?
  • Process variation tolerant clock design
  • Bounded-skew DME
  • Topology construction
  • With process variation tolerance in objective
  • Useful skew scheduling
  • To the center of permissible ranges

33
Signal Integrity
  • Crosstalk
  • Capacitive, inductive
  • Supply voltage drop
  • IR, L dI/dt, LC resonance
  • Temperature
  • Increased resistance with higher temperature
  • Substrate coupling
  • Parasitic resistance, capacitance in the
    substrate layer

34
Crosstalk
  • Due to the coupling capacitance between
    interconnections, a signal switching on a net
    (aggressor) may affect the voltage waveform on a
    neighboring net (victim)

Noise Propagation
Increased Delay
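
Both effects named on this slide can be estimated with standard first-order models. These formulas are not taken from the slides: the noise estimate assumes a floating (undriven) victim and simple charge sharing, and the delay estimate uses the usual Miller-factor approximation. Capacitance values are made up.

```python
def coupling_noise(vdd, c_couple, c_ground):
    """Peak noise on a floating victim when the aggressor swings by vdd
    (capacitive divider between coupling and ground capacitance)."""
    return vdd * c_couple / (c_couple + c_ground)

def effective_cap(c_ground, c_couple, miller_factor):
    """Load seen by a switching victim: factor 0 if the aggressor switches
    the same way, 1 if it is quiet, 2 if it switches the opposite way."""
    return c_ground + miller_factor * c_couple

print("noise:", coupling_noise(1.2, 1.0, 3.0), "V")
print("load when aggressor opposes:", effective_cap(3.0, 1.0, 2))
```

A driven victim sees less noise than this bound because its driver fights the disturbance; the opposite-switching case is the one that shows up as increased delay.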
35
Circuit Model for Crosstalk
36
Crosstalk Simulation
37
Design for Crosstalk
  • It can be both capacitive and inductive
  • Capacitive is dominant at current switching
    speeds
  • To reduce it
  • Use of shielding layer (inter-layer)
  • Use of shielding wire (intra-layer)

38
Clock Gating
  • Reduce power consumption by temporarily shutting
    down part of the circuit
  • Additional cost of enabling circuits

[Figure: two FFs (D, Q) separated by combinational logic, clocked by CLK1 and CLK2, with clock-enabling logic on CLK]
39
Outline
  • Problem Statement
  • Clock Distribution Structures
  • Robustness / Signal Integrity Control
  • Clock Design
  • Skew Scheduling
  • Topology Construction
  • Embedding

40
Skew: Local Constraint
  • Timing is correct as long as the clock signals of
    sequentially adjacent FFs arrive within a
    permissible skew range

W. Dai, UC Santa Cruz
41
Useful Skew → Design Robustness
  • Design will be more robust if clock signal
    arrival time is in the middle of permissible skew
    range, rather than on edge

[Figure: example with T = 6 ns; the skew assignment "0 0 0" is at the verge of violation]
W. Dai, UC Santa Cruz
42
Constraints on Skews
  • FF_i receives clock signal delayed by $x_i \ge \mathrm{MIN\_DEL}$
  • $0 < \alpha \le 1 \le \beta$: if nominal clock delay is $x_i$,
    then actual clock delay $x$ must fall within the interval
    $\alpha x_i \le x \le \beta x_i$
  • For an FF to operate correctly when the clock edge
    arrives at time $x$, the correct input data must be
    present and stable during the time interval
    $(x - \mathrm{SETUP},\ x + \mathrm{HOLD})$
  • For $1 \le i, j \le L$ (FFs), we compute lower and
    upper bounds MIN(i,j) and MAX(i,j) for the time
    that is required for a signal edge to propagate
    from FF_i to FF_j
  • Avoid double-clocking (race condition)
  • $\alpha x_i + \mathrm{MIN}(i,j) \ge \beta x_j + \mathrm{HOLD}$
  • Avoid zero-clocking
  • $\beta x_i + \mathrm{SETUP} + \mathrm{MAX}(i,j) \le \alpha x_j + P$,
    where $P$ is the clock period
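
The two conditions on this slide can be checked directly for one FF pair. The sketch below assumes the standard reading of the constraints: the earliest data launched from FF_i must not arrive before FF_j's hold window closes (no double-clocking), and the latest data must arrive a setup time before FF_j's next edge (no zero-clocking). The function name and the numbers are hypothetical.

```python
def check_pair(x_i, x_j, min_ij, max_ij, period, setup, hold,
               alpha=1.0, beta=1.0):
    """Return (no_double_clocking, no_zero_clocking) for the path FF_i -> FF_j.

    x_i, x_j: nominal clock delays; alpha <= 1 <= beta bound their variation.
    """
    no_double = alpha * x_i + min_ij >= beta * x_j + hold
    no_zero = beta * x_i + setup + max_ij <= alpha * x_j + period
    return no_double, no_zero

# Zero skew, path delay between 1.0 and 5.5, T = 6: setup is met with no slack.
print(check_pair(0.0, 0.0, 1.0, 5.5, period=6.0, setup=0.5, hold=0.5))
# Delaying the capturing clock by 1.0 relaxes setup but breaks hold.
print(check_pair(0.0, 1.0, 1.0, 5.5, period=6.0, setup=0.5, hold=0.5))
```

The second call shows the trade-off behind useful skew: moving x_j later helps the long path and hurts the short path, so each pair has a permissible range rather than a single best value.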

43
Optimal Useful Skews by Linear Programming
  • LP_SPEED (clock period reduction)
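  • minimize $P$ s.t.
  • $\alpha x_i - \beta x_j \ge \mathrm{HOLD} - \mathrm{MIN}(i,j)$
  • $\alpha x_j - \beta x_i + P \ge \mathrm{SETUP} + \mathrm{MAX}(i,j)$
  • $x_i \ge \mathrm{MIN\_DEL}$
  • LP_SAFETY (robustness)
  • maximize $M$ s.t.
  • $\alpha x_i - \beta x_j - M \ge \mathrm{HOLD} - \mathrm{MIN}(i,j)$
  • $\alpha x_j - \beta x_i - M \ge \mathrm{SETUP} + \mathrm{MAX}(i,j) - P$
  • $x_i \ge \mathrm{MIN\_DEL}$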
  • Notes
  • J. P. Fishburn, Clock Skew Optimization, IEEE
    Trans. Computers 39(7) (1990), pp. 945-951.
  • T. G. Szymanski, Computing Optimal Clock
    Schedules, Proc. DAC, June 1992, pp. 399-404.
  • Useful Skew optimization is similar to Retiming
    optimization
  • Peak current reductions are a side benefit
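
A small period-minimization example in the spirit of LP_SPEED follows. It is a sketch under a simplifying assumption, not the general LP: with α = β = 1 the constraints become difference constraints, so feasibility for a given period can be tested with Bellman-Ford and the smallest period found by binary search. MIN_DEL is ignored here because, in this special case, a constant can be added to every clock delay afterwards. All names and path delays are hypothetical.

```python
def feasible_schedule(n, paths, period, setup, hold):
    """paths: (i, j, min_ij, max_ij) for each FF_i -> FF_j path.
    Returns clock delays x (smallest is 0) or None if infeasible."""
    edges = []
    for i, j, dmin, dmax in paths:
        edges.append((i, j, dmin - hold))            # x_j - x_i <= MIN - HOLD
        edges.append((j, i, period - setup - dmax))  # x_i - x_j <= P - SETUP - MAX
    dist = [0.0] * n                                 # virtual source reaches all nodes
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:                              # converged: constraints satisfiable
            shift = min(dist)
            return [d - shift for d in dist]
    return None                                      # still relaxing: negative cycle

def min_period(n, paths, setup, hold, hi=100.0, tol=1e-6):
    """Binary search for the smallest feasible period below `hi`."""
    if feasible_schedule(n, paths, hi, setup, hold) is None:
        raise ValueError("no feasible schedule even at the upper bound")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible_schedule(n, paths, mid, setup, hold) is not None:
            hi = mid
        else:
            lo = mid
    return hi

# Three FFs in a ring; the longest path is 8.0, so zero skew needs P >= 8.5.
paths = [(0, 1, 2.0, 8.0), (1, 2, 2.0, 4.0), (2, 0, 2.0, 6.0)]
best = min_period(3, paths, setup=0.5, hold=0.5)
skews = feasible_schedule(3, paths, best, setup=0.5, hold=0.5)
print(f"minimum period ~ {best:.3f}, clock delays {skews}")
```

In this example the limit is set by the path FF0 → FF1: its setup and hold constraints together require P ≥ MAX − MIN + SETUP + HOLD, so skew scheduling cannot go below that even though the ring's average delay is smaller.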

44
Outline
  • Problem Statement
  • Clock Distribution Structures
  • Robustness / Signal Integrity Control
  • Clock Design
  • Skew Scheduling
  • Topology Design
  • Embedding
  • For zero skew (ZST-DME)
  • For bounded skew (BST-DME)

45
Zero-Skew Tree (ZST) Problem
  • Zero Skew Clock Routing Problem (S,G): Given a
    set S of sink locations and a connection topology
    G, construct a ZST T(S) with topology G and
    having minimum cost.
  • Skew: maximum value of $|t_d(s_0,s_i) - t_d(s_0,s_j)|$
    over all sink pairs $s_i, s_j$ in S.
  • $t_d$: signal delay (from source $s_0$)
  • Connection topology G: rooted binary tree with
    nodes of S as leaves
  • Edge $e_a$ in G is the edge from a to its parent
  • $|e_a|$ is the (assigned) length of edge $e_a$
  • Cost: total edge length
46
Zero-Skew Example (555 sinks, 40 obstacles)
47
A Zero-Skew Routing Algorithm
  • Finds a ZST under linear delay model with minimum
    cost over all ZSTs with topology G and sink set S
  • Terms
  • Manhattan arc: line segment with slope +1 or −1
  • Tilted Rectangular Region (TRR): collection of
    points within a fixed distance of a Manhattan arc
  • Core: the Manhattan arc
  • Radius: the distance
  • Merging segment: locus of feasible locations for
    a node v in the topology, consistent with minimum
    wirelength
  • If v is a sink, then $ms(v) = \{v\}$
  • If v is an internal node, then ms(v) is the set
    of all points within distance $|e_a|$ of ms(a), and
    within distance $|e_b|$ of ms(b)
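
One convenient way to work with TRRs is sketched below. It relies on a standard coordinate trick that the slides do not spell out: in rotated coordinates u = x + y, v = x − y, Manhattan distance becomes max(|Δu|, |Δv|), so a TRR is an ordinary axis-aligned rectangle and intersection is a pair of min/max operations. The class and method names are assumptions of this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TRR:
    """Tilted rectangular region stored as a rectangle in (u, v) = (x+y, x-y)."""
    u_lo: float
    u_hi: float
    v_lo: float
    v_hi: float

    @staticmethod
    def from_point(x, y, radius=0.0):
        """TRR whose core is the single point (x, y)."""
        u, v = x + y, x - y
        return TRR(u - radius, u + radius, v - radius, v + radius)

    def expand(self, radius):
        """All points within `radius` (Manhattan) of this region."""
        return TRR(self.u_lo - radius, self.u_hi + radius,
                   self.v_lo - radius, self.v_hi + radius)

    def intersect(self, other):
        """Intersection as a TRR, or None if the regions do not meet."""
        u_lo, u_hi = max(self.u_lo, other.u_lo), min(self.u_hi, other.u_hi)
        v_lo, v_hi = max(self.v_lo, other.v_lo), min(self.v_hi, other.v_hi)
        if u_lo > u_hi or v_lo > v_hi:
            return None
        return TRR(u_lo, u_hi, v_lo, v_hi)

    def corners_xy(self):
        """Corner points converted back to (x, y)."""
        return {((u + v) / 2, (u - v) / 2)
                for u in (self.u_lo, self.u_hi) for v in (self.v_lo, self.v_hi)}

# Sinks at (0, 0) and (4, 2) are 6 apart; with |e_a| = |e_b| = 3 the merging
# segment is the set of points at distance 3 from both.
ms = TRR.from_point(0, 0, 3).intersect(TRR.from_point(4, 2, 3))
print(ms, sorted(ms.corners_xy()))
```

The result is a degenerate rectangle (constant u), that is, a Manhattan arc of slope −1 running from (1, 2) to (3, 0).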

48
Phase 1 Tree of Merging Segments
  • Goal: construct a tree of merging segments
    corresponding to topology G
  • Merging segment of a node depends on merging
    segment of its children → bottom-up construction
  • Let a, b be children of v. We want placements of
    v that allow $TS_a$ and $TS_b$ to be merged with
    minimum added wire while preserving zero skew
  • Merging cost: $|e_a| + |e_b|$
  • Fact: the intersection of two TRRs is also a TRR
    and can be found in constant time
  • Constant time per each new merging segment →
    linear time (in size of S) to construct entire
    tree
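
How the edge lengths are chosen at each merge can be sketched for the linear delay model, where delay equals path length. The function below is an illustrative assumption-laden version: `t_a` and `t_b` are the delays already accumulated below ms(a) and ms(b), and `dist` is the distance between the two merging segments. When one subtree is much slower than the other, balancing requires extra (detour) wire on the faster side.

```python
def zero_skew_merge(t_a, t_b, dist):
    """Return (e_a, e_b, t_v): edge lengths to children a and b and the
    resulting delay at the merge node v, under the linear delay model."""
    e_a = (dist + t_b - t_a) / 2          # solves t_a + e_a == t_b + e_b, e_a + e_b == dist
    if e_a < 0:                           # a is too slow: merge on ms(a), detour to b
        e_a, e_b = 0.0, t_a - t_b
    elif e_a > dist:                      # b is too slow: merge on ms(b), detour to a
        e_a, e_b = t_b - t_a, 0.0
    else:
        e_b = dist - e_a
    return e_a, e_b, t_a + e_a

print(zero_skew_merge(0.0, 0.0, 6.0))     # balanced sinks: split the distance
print(zero_skew_merge(5.0, 1.0, 2.0))     # unbalanced: wire to b exceeds the distance
```

In the balanced case the merging cost equals the distance between the segments; in the detour case it is larger, which is the price of keeping skew at zero.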

49
Phase 2 Find Node Placements
  • Goal: find exact locations (embeddings) pl(v)
    of internal nodes v in the ZST topology
  • If v is the root node, then any point on ms(v)
    can be chosen as pl(v)
  • If v is an internal node other than the root, and
    p is the parent of v, then v can be embedded at
    any point in ms(v) that is at distance $|e_v|$ or
    less from pl(p)
  • Detail: create square TRR $trr_p$ with radius $|e_v|$
    and core equal to pl(p); placement of v can be
    any point in $ms(v) \cap trr_p$
  • Each instruction executed at most once for each
    node in G, and TRR intersection is O(1) time →
    Find_Exact_Placements is O(n) → DME is O(n)
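
The top-down placement step can be sketched as follows. This example is a simplification with hypothetical helper names: regions are stored as rectangles in rotated coordinates u = x + y, v = x − y (where Manhattan distance becomes max(|Δu|, |Δv|)), and since any point in the intersection is valid, one corner is picked arbitrarily.

```python
def trr_around(x, y, radius):
    """Square TRR with core (x, y), as (u_lo, u_hi, v_lo, v_hi)."""
    u, v = x + y, x - y
    return (u - radius, u + radius, v - radius, v + radius)

def intersect(a, b):
    """Intersection of two regions in (u, v) form, or None if empty."""
    u_lo, u_hi = max(a[0], b[0]), min(a[1], b[1])
    v_lo, v_hi = max(a[2], b[2]), min(a[3], b[3])
    if u_lo > u_hi or v_lo > v_hi:
        return None
    return (u_lo, u_hi, v_lo, v_hi)

def place_child(ms_v, parent_xy, e_v):
    """Pick pl(v): a point of ms(v) within distance e_v of the parent."""
    region = intersect(ms_v, trr_around(parent_xy[0], parent_xy[1], e_v))
    if region is None:
        raise ValueError("ms(v) is farther than e_v from the parent placement")
    u, v = region[0], region[2]           # any point works; take one corner
    return ((u + v) / 2, (u - v) / 2)

# ms(v) is the Manhattan arc from (1, 2) to (3, 0); the parent sits at (2, 3).
ms_v = (3, 3, -1, 3)
x, y = place_child(ms_v, (2, 3), e_v=2)
print("pl(v) =", (x, y), "distance to parent =", abs(x - 2) + abs(y - 3))
```

If the chosen point is closer to the parent than the assigned edge length, the remaining length is realized by snaking the wire, so the assigned lengths (and therefore zero skew) are preserved.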

50
Outline
  • Problem Statement
  • Clock Distribution Structures
  • Robustness / Signal Integrity Control
  • Clock Design
  • Skew Scheduling
  • Topology Design
  • Embedding
  • For zero skew (ZST-DME)
  • For bounded skew (BST-DME)

51
Non-Zero Skew Bounds
  • Given a skew bound, where can internal nodes of
    the given topology (e.g., a, b, v) be placed?

[Figure: feasible locations for a, b, and v, with skew values (0, 2, 4, 6) marked along the merging paths; topology with root s0, internal nodes v, a, b and sinks s1-s4]
52
BST-DME Bottom-Up Phase
Bottom-up: build tree of merging regions
corresponding to given topology
[Figure: topology with root s0, internal nodes v, a, b and sinks s1-s4; merging regions mr(a), mr(b), mr(v), labeled B = 4]
53
BST-DME Top-Down Phase
[Figure: topology with root s0, internal nodes v, a, b and sinks s1-s4; embedded locations of a, b, v, labeled B = 4]
54
Good Luck for the Mid-Term!