Design and Test Trends from Physical Design perspective

Interconnect Synthesis

Session III

Dr. Parthasarathi Dasgupta, MIS Group, Indian Institute of Management Calcutta

Outline

- Interconnect synthesis
- ITRS challenges (http://public.itrs.net)
- Delay models and estimators
- Routing tree construction methods
- Topological routing
- Delay reduction
- Buffer insertion
- Wire sizing
- Non-tree routing methods
- Clock tree routing
- Early interconnect planning
- Buffer block planning
- Interconnect architecture and metrics

ITRS02 Interconnect Projected Parameters

ITRS 2002 Update - I

Legend: Solution known / Solution exists / Solution unknown

Year of Production                                   2010     2016
DRAM tech node (nm)                                  45       22
No. of metal levels                                  10       11
Total interconnect length (m/cm2)
(active wiring only, excluding global levels)        16063    33508
Interconnect RC delay, 1 mm line (ns)                565      2008

ITRS02 Interconnect Projected Parameters

ITRS 2002 Update - II

Legend: Solution known / Solution exists / Solution unknown

Year of Production                                   2010     2016
Effective dielectric constant of inter-level
metal insulator                                      2.1      1.8
Local wiring pitch (nm)                              105      750
Minimum global wiring pitch (nm)                     205      100
Intermediate wiring pitch (nm)                       135      65
Conductor effective resistivity (microOhm-cm)        2.2      2.0

ITRS02 Some Grand Challenges - I

- Near Term (Through 2007)
- Mask-Making (Lithography)
- Process Control (Lithography)
- Integration of New Processes and Structures (Interconnect)
- Power Management (Design)

ITRS02 Relevant Grand Challenges - II

- Long Term (2008 Through 2016)
- Next-Generation Lithography (Lithography)
- Identify Solutions Addressing Global Wiring Issues (Interconnect)
- Error-Tolerant Design (Design)

Why are DSM Interconnects Important ?

- Signal Delay
- Reduction of chip size K times
- increases wire resistance K times
- increases wire capacitance K times, and hence
- increases global interconnect delay $K^2$ times
- reduces gate switching time K times
- local interconnect delay remains unchanged

Why bother about Signal Delay?

- Global routing trees often need to be constructed with the objective of minimizing circuit delay
- Minimum circuit delay is preferred to increase the speed of the circuit
- Accurate measurement of signal delay is thus very important
- Exact signal delay measurement is too complex and time consuming
- Hence there is a need for accurate delay estimation

Estimating Signal Delay

- Elmore Delay
- Delay through an on-path resistor = its resistance × downstream capacitance
- Delay through a path (driver to a sink pin) = sum of delays through individual edges on the path
- First moment of the interconnect under impulse response
- Based on the 50% delay

[Figure: a source drives a wire of resistance r, modeled with capacitance C1/2 at each end, into a load C2 (rest of circuit)]

Delay through interconnect $= r \cdot (C_1/2 + C_2)$

Elmore Delay Characteristics

- Fairly accurate estimate of delay at nodes far from source
- Expressible as a closed-form expression involving only resistors and capacitors
- Provable upper bound on actual delay for all inputs
- Additive

[Figure: path from source S through node A to node B]

Delay(S, B) = Delay(S, A) + Delay(A, B)

Elmore Delay - An Example

Elmore Delay Computation

The RC tree is traversed depth-first twice.

Pass 1: Compute the effective capacitance at each node of the RC tree.

Pass 2: At a node, compute the actual Elmore delay from the source as the sum of (a) the delay up to the predecessor node, and (b) the product of the resistance between the predecessor node and the current node and the effective capacitance at the current node obtained in Pass 1.

[Figure: RC chain from A to B with five 1 k ohm resistors and five 500 fF capacitors]

$\tau_{AB}$ = 1 k ohm × 500 fF × 5 + 1 k ohm × 500 fF × 4 + 1 k ohm × 500 fF × 3 + 1 k ohm × 500 fF × 2 + 1 k ohm × 500 fF × 1

= 2500 + 2000 + 1500 + 1000 + 500 ps = 7.5 ns
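The two-pass computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `elmore_delays` and the dictionary-based tree representation (`parent`, `resistance`, `capacitance`) are assumptions made for this example.

```python
def elmore_delays(parent, resistance, capacitance, root):
    """Return {node: Elmore delay from root} for an RC tree.

    parent[v]      -> predecessor of v (every node except the root)
    resistance[v]  -> resistance of the edge (parent[v], v), in ohms
    capacitance[v] -> capacitance lumped at node v, in farads
    (Hypothetical representation chosen for this sketch.)
    """
    children = {v: [] for v in capacitance}
    for v, p in parent.items():
        children[p].append(v)

    # Depth-first order with every parent listed before its children.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    # Pass 1: effective (downstream) capacitance, children before parents.
    downstream = dict(capacitance)
    for v in reversed(order):
        for ch in children[v]:
            downstream[v] += downstream[ch]

    # Pass 2: delay = delay at predecessor + edge resistance * downstream cap.
    delay = {root: 0.0}
    for v in order:
        if v != root:
            delay[v] = delay[parent[v]] + resistance[v] * downstream[v]
    return delay


# The five-stage chain from the slide: node 0 is A, node 5 is B.
parent = {i: i - 1 for i in range(1, 6)}
resistance = {i: 1e3 for i in range(1, 6)}        # 1 k ohm per stage
capacitance = {0: 0.0, **{i: 500e-15 for i in range(1, 6)}}  # 500 fF per node
delays = elmore_delays(parent, resistance, capacitance, root=0)
```

For the chain above, `delays[5]` comes out at about 7.5e-9 seconds, matching the hand calculation.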

Bounds on Signal Delay

Lower bound and upper bound computation. Define:

- $R_{ii}$ = resistance between the source and node $i$
- $R_{ki}$ = resistance of the subpath common to the path between the source and node $i$ and the path between the source and node $k$

The three T's, with the dimension of time, are artificially constructed to simplify bound computation:

$T_P = \sum_k R_{kk} C_k$, $T_{Di} = \sum_k R_{ki} C_k$, $T_{Ri} = \left(\sum_k R_{ki}^2 C_k\right) / R_{ii}$

Let the signal delay at node $i$ be $t$. Then

$T_{Di} - T_{Ri} + T_{Ri} \ln\frac{T_{Ri}}{T_P\,(1 - v_i(t))} \le t \le \frac{T_{Di}}{1 - v_i(t)} - T_{Ri}$

where $v_0 = 1$ and $v_i(t) = 0.5$.

J. Rubinstein, P. Penfield and M. A. Horowitz, "Signal Delay in RC Tree Networks," IEEE Trans. on Computer-Aided Design, CAD-2, 3, July 1983.

Bounds on Signal Delay - An Example

Other delay Metrics

Higher-order moments and models using R, L and C have also been tried by several researchers, but most of them are rarely used because they are difficult to compute, in spite of their better accuracy.

[Figure: chip package showing bonding wire, chip, mounting cavity, lead frame and pin, with inductances L]

- Fidelity of a delay estimator
- Degree to which an optimal or near-optimal solution according to a delay estimator correlates to a nearly optimal solution according to actual delay.
- For a set of possible solutions obtained using the estimator, how closely are the ranks correlated to those for the solutions obtained by the actual delay measurement?
- Measure of fidelity in the context of finding an optimal RST
- Portion of the pair-wise inequality relations among the optimal solutions that are correctly determined by the heuristic solution
- If there are $m$ instances of RST and $h_i$, $s_i$ are respectively the objective values of the heuristic and optimal solutions to instance $i$, then fidelity

$f = \left|\{(i, j) : 0 < i < j \le m,\ (h_i - h_j)(s_i - s_j) > 0 \text{ or } s_i = s_j\}\right| / \binom{m}{2}$
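As an illustration of the fidelity measure above, the following sketch counts the pairs of instances whose ordering is preserved and divides by the number of pairs. The function name `fidelity` and the list-based inputs are assumptions for this example, not part of the cited report.

```python
from itertools import combinations


def fidelity(h, s):
    """Fraction of instance pairs (i, j) ordered consistently by h and s.

    h[i] -> objective value of the heuristic solution to instance i
    s[i] -> objective value of the optimal solution to instance i
    A pair counts as correct when (h_i - h_j)(s_i - s_j) > 0 or s_i == s_j.
    """
    if len(h) != len(s) or len(h) < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    pairs = list(combinations(range(len(h)), 2))
    correct = sum(
        1
        for i, j in pairs
        if (h[i] - h[j]) * (s[i] - s[j]) > 0 or s[i] == s[j]
    )
    return correct / len(pairs)


same_order = fidelity([1, 2, 3], [10, 20, 30])      # every pair agrees
reversed_order = fidelity([3, 2, 1], [10, 20, 30])  # every pair disagrees
one_swap = fidelity([1, 3, 2], [10, 20, 30])        # one of three pairs disagrees
```

A value near 1 means the estimator ranks solutions almost as the reference does, even if its absolute values are off.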

How effective are Delay Estimators?

P. Dasgupta, et al., "Relative Accuracies of Estimators and their use in VLSI Routing," IIM-C Tech. Report.

Relative Accuracy of Delay Estimators

- Existing work
- Used in constructing near-optimal routing trees based on Elmore delay (Boese et al., ICCAD, 1993)
- Used for optimum wire sizing in routing trees (Cong et al., ACM TODAES, 1996)
- Major drawback of existing work
- Fidelity measured on all possible samples
- Main ideas
- Fidelity should be computed on a reasonably diverse set of relevant (near-optimal?) samples
- Should be dimensionless
- Preferably in the range (-1, 1)
- Relevant, i.e., act as a discriminator for good solutions, and not for the bad solutions
- Should be least affected by ties

New Delay Metric?

- Can we use the bounds to have a better delay metric?
- Preferable characteristics of this delay metric?
- Compact and closed-form expression
- Easily computable
- Efficient lower bound of actual delay (this helps!!)

Practical use of delay minimization

Required Arrival Times

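[Figure: source s0 driving sinks s1, s2, ..., sn]

$\mathrm{RAT}(s_0) \le \mathrm{RAT}(s_i) - \mathrm{delay}(s_0, s_i), \quad i = 1, \dots, n$

$\mathrm{RAT}(s_0) \le \min_{i = 1, \dots, n} \left(\mathrm{RAT}(s_i) - \mathrm{delay}(s_0, s_i)\right)$

$\mathrm{Slack}(s_0, s_i) = \left(\mathrm{RAT}(s_i) - \mathrm{delay}(s_0, s_i)\right) - \mathrm{RAT}(s_0)$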
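A small worked example may help with the required-arrival-time relations above. The sketch below takes the tightest value the source can have (the minimum over sinks) and derives per-sink slack from it; the function names and the sample numbers are made up for illustration.

```python
def source_rat(sink_rat, delay):
    """Latest required arrival time at the source s0.

    sink_rat[s] -> RAT at sink s
    delay[s]    -> delay(s0, s) from the source to sink s
    """
    return min(sink_rat[s] - delay[s] for s in sink_rat)


def slacks(sink_rat, delay, rat_s0):
    """Slack(s0, s) = (RAT(s) - delay(s0, s)) - RAT(s0) for every sink s."""
    return {s: (sink_rat[s] - delay[s]) - rat_s0 for s in sink_rat}


# Hypothetical two-sink net (arbitrary time units).
sink_rat = {"s1": 10, "s2": 8}
delay = {"s1": 3, "s2": 2}
rat0 = source_rat(sink_rat, delay)       # min(10 - 3, 8 - 2)
slack = slacks(sink_rat, delay, rat0)    # zero slack marks the critical sink
```

Here `s2` has zero slack and is therefore the sink that limits the source's required arrival time.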

Routing Tree Construction

- Mostly based on finding minimum-cost Steiner trees (SMT)
- Some are based on Rectilinear Steiner Arborescences, which are minimum-cost Steiner trees (RSMT) with shortest source-sink paths
- Algorithms exist for simultaneous cost minimization and tree-diameter reduction
- Extended Prim with bounded diameter also proposed
- In DSM range, driver resistance / unit wire resistance; hence, distributed interconnect structure / capacitance

Routing Tree Construction

- P-tree heuristics (Lillis et al.)
- Iterated 1-Steiner (Kahng and Robins)
- Geo-Steiner (Best Steiner tree construction method!)
- Bounded PRIM (BPRIM)
- Shallow-Light trees (BRBC)
- Rectilinear Steiner Arborescence (RSA) (A-tree construction of Cong et al.)

RSMT Problem - Key Results

- Reduction to discrete grid

- NP-hard

- Iterated 1-Steiner heuristic
- Greedily adds Steiner points to the tree
- Almost 11% improvement over MST on average
- Fast batched implementation (BI1S)

- Exact algorithm: GeoSteiner 3.0
- Branch-and-cut
- 11.5% improvement over MST on average
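The greedy idea behind the Iterated 1-Steiner heuristic can be sketched as follows. This is a simplified illustration under stated assumptions: candidates are taken from the Hanan grid (the discrete grid mentioned above), the MST is recomputed from scratch for every candidate, and the cleanup of low-degree Steiner points done by the published heuristic is omitted. It is not the batched BI1S implementation, and all function names are chosen for this example.

```python
from itertools import product


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mst_cost(points):
    """Cost of a rectilinear MST, via Prim's algorithm on the complete graph."""
    points = list(points)
    if len(points) < 2:
        return 0
    best = {p: manhattan(points[0], p) for p in points[1:]}
    total = 0
    while best:
        nxt = min(best, key=best.get)
        total += best.pop(nxt)
        for p in best:
            best[p] = min(best[p], manhattan(nxt, p))
    return total


def iterated_one_steiner(pins):
    """Greedily add the Hanan-grid point that most reduces the MST cost."""
    pins = list(dict.fromkeys(pins))  # drop duplicate pins, keep order
    xs = sorted({x for x, _ in pins})
    ys = sorted({y for _, y in pins})
    hanan = [c for c in product(xs, ys) if c not in pins]
    steiner = []
    cost = mst_cost(pins)
    while True:
        best_gain, best_point = 0, None
        for c in hanan:
            if c in steiner:
                continue
            gain = cost - mst_cost(pins + steiner + [c])
            if gain > best_gain:
                best_gain, best_point = gain, c
        if best_point is None:       # no candidate improves the tree
            return steiner, cost
        steiner.append(best_point)
        cost -= best_gain


# Four pins in a "plus" arrangement: MST cost 6, Steiner tree cost 4.
pins = [(0, 1), (1, 0), (2, 1), (1, 2)]
steiner_points, tree_cost = iterated_one_steiner(pins)
```

On this toy net the single Steiner point at the centre cuts the wirelength from 6 to 4; the roughly 11% figure quoted above is an average over many nets.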

A-Tree Construction

A Rectilinear Steiner Tree is an A-tree if every path from the source to any node in the tree is a shortest path.

A-Tree algorithm (in a nutshell):

- Start with a forest of n single-node arborescences
- Apply a sequence of moves
- Grows an existing arborescence
- Combines two arborescences to form a new one
- Stop when ONE arborescence is left
- Move may be safe (optimal) / heuristic (possibly sub-optimal)

J. Cong, K-S. Leung, D. Zhou, "Performance-Driven Interconnect Design based on Distributed RC Delay Model," Design Automation Conference, 1993.

Topological Routing - A new idea

Our goal: partitioning the routing area into zones of pins with geometric proximity for better area / topological routing, and finding ways of prioritizing zones. Why?

Sinha, Sur-Kolay, Dasgupta and Bhattacharya, "Partitioning Routing Areas into Zones with Distinct Pins," IEEE International Conference on VLSI Design, Bangalore, India, 2001.

Forming Zones in a Placement

Objective: all pins in a zone belong to distinct nets and are reachable through connected regions.

Rationale: First, connect nets among zones, then route each zone in detail within its connected region. Bus lines are likely to be routed together.

Graph for Zone Partitioning

- Pins in a placement
- => Point set
- => Voronoi diagram
- => Delaunay triangulation
- => Planar triangulated graph, G

Net name for pin => color of point, i.e., vertex in G

Formulation of the problem

- Input: planar triangulated graph G with vertices having different colours.
- MIN_ZONE_PART
- Find a minimum set of connected sub-graphs which partitions G such that vertices in each sub-graph have distinct colors.
- Proposed algorithm is based on Genetic Algorithm.
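To make the MIN_ZONE_PART constraints concrete, the sketch below checks whether a candidate partition is feasible: the zones must cover every vertex exactly once, each zone must be connected in G, and no colour (net) may repeat inside a zone. It does not search for a minimum partition and is not the genetic algorithm mentioned above; the function name and the adjacency-dictionary representation are assumptions for this example.

```python
from collections import deque


def is_valid_zone_partition(adj, color, zones):
    """Check feasibility of a zone partition of graph G.

    adj[v]   -> iterable of neighbours of vertex v (pin) in G
    color[v] -> net name of pin v
    zones    -> iterable of vertex collections, one per zone
    """
    seen = set()
    for zone in zones:
        zone = set(zone)
        if not zone or zone & seen:
            return False                 # empty zone or overlapping zones
        seen |= zone
        if len({color[v] for v in zone}) != len(zone):
            return False                 # two pins of the same net in one zone
        start = next(iter(zone))
        reached, queue = {start}, deque([start])
        while queue:                     # BFS restricted to the zone
            u = queue.popleft()
            for w in adj[u]:
                if w in zone and w not in reached:
                    reached.add(w)
                    queue.append(w)
        if reached != zone:
            return False                 # zone is not connected
    return seen == set(adj)              # every pin belongs to some zone


# Hypothetical 4-pin example: pins 1 and 3 belong to the same net "a".
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
color = {1: "a", 2: "b", 3: "a", 4: "c"}
ok = is_valid_zone_partition(adj, color, [{1, 2}, {3, 4}])
same_net_clash = is_valid_zone_partition(adj, color, [{1, 2, 3}, {4}])
disconnected = is_valid_zone_partition(adj, color, [{1, 4}, {2, 3}])
```

A checker like this is the kind of feasibility test a search procedure would call on each candidate partition.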

Can we reduce Interconnect delay?

- Buffer allocation
- New directions
- Wire sizing
- Non-tree routing

Why use Buffers in Routing Trees?

- Added buffer shields the heavy load downstream on the branch from the rest of the tree.
- Recover the slope of the signal's transition edge and screen out the noise.
- Boost driving power and reduce delay.

Buffer allocation schemes

- Classical technique of van Ginneken
- Permutation tree (P-tree)-based method to combine topology construction and buffer-insertion searches, with wire sizing
- Okamoto-Cong's work

van Ginneken's DP-based Method

Input: a) A routing tree. b) Required arrival times (RAT) at sinks. c) Legal buffer positions (at vertices of the routing tree).

Output: The optimal buffer placement such that the RAT at the source is maximum.

Method: Two-stage dynamic programming-based algorithm.

Stage 1 (bottom-up): For each vertex of the routing tree, find the best choices for buffer assignment giving larger RAT at the vertex.

Stage 2 (top-down): Traverse from root to leaves following the best choice for the root obtained in Stage 1, giving the actual buffer placement.

L.P.P.P. van Ginneken, "Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay," Int. Symp. on Circuits and Systems, 1990, pp. 865-868.

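van Ginneken's DP-based Method (contd.)

B = total number of legal buffer positions

Time complexity: $O(B^2)$

[Figure: routing tree with source s0, vertices s1 to s6, and a buffer]

Each option at vertex $k$ is a pair of required time $T_k$ and load $L_k$; $r$ and $c$ are the wire resistance and capacitance per unit length, and $l$ is the wire length.

Without buffer: $T'_k = T_k - r l L_k - 0.5\, r c l^2$, $L'_k = L_k + c l$

With buffer: $T'_k = T_k - D_{buf} - R_{buf} L_k$, $L'_k = C_{buf}$

- An option is strictly worse if (i) load is larger, and (ii) required time is earlier.
- At each vertex, the worst options are not saved. At root, the best option is chosen.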
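The bottom-up stage can be sketched as below. This is a simplified illustration, not van Ginneken's original code: it assumes a single buffer type, treats every internal vertex other than the root as a legal buffer position, and returns only the best required time at the source (the top-down traceback that recovers the actual buffer positions is omitted). The function names and the tree encoding are assumptions for this example.

```python
def prune(options):
    """Drop options that are strictly worse (larger load, no better RAT)."""
    kept, best_rat = [], float("-inf")
    for load, rat in sorted(options, key=lambda o: (o[0], -o[1])):
        if rat > best_rat:
            kept.append((load, rat))
            best_rat = rat
    return kept


def add_wire(options, length, r, c):
    """Propagate (load, RAT) options up a wire: T' = T - r*l*L - 0.5*r*c*l^2."""
    return [
        (load + c * length, rat - r * length * load - 0.5 * r * c * length ** 2)
        for load, rat in options
    ]


def add_buffer(options, d_buf, r_buf, c_buf):
    """Add the best buffered option: T' = T - D_buf - R_buf*L, L' = C_buf."""
    best = max(rat - d_buf - r_buf * load for load, rat in options)
    return options + [(c_buf, best)]


def options_at(node, tree, sinks, wire, buf, root):
    """Non-dominated (load, RAT) options at `node`.

    tree[node]  -> list of (child, wire_length); sinks[node] -> (load, RAT)
    wire = (r, c) per unit length; buf = (D_buf, R_buf, C_buf)
    """
    if node in sinks:
        return [sinks[node]]
    merged = None
    for child, length in tree[node]:
        below = add_wire(options_at(child, tree, sinks, wire, buf, root),
                         length, *wire)
        if merged is None:
            merged = below
        else:  # branches in parallel: loads add, the earlier RAT governs
            merged = [(la + lb, min(ta, tb))
                      for la, ta in merged for lb, tb in below]
    if node != root:
        merged = add_buffer(merged, *buf)
    return prune(merged)


def best_source_rat(root, tree, sinks, wire, buf, r_drv):
    """Best RAT at the source, for a driver with output resistance r_drv."""
    return max(rat - r_drv * load
               for load, rat in options_at(root, tree, sinks, wire, buf, root))


# Hypothetical two-segment net s0 -> v -> s1 in arbitrary consistent units.
tree = {"s0": [("v", 2.0)], "v": [("s1", 2.0)]}
sinks = {"s1": (1.0, 100.0)}
with_buffer = best_source_rat("s0", tree, sinks, (1.0, 1.0), (1.0, 1.0, 1.0), 2.0)
no_useful_buffer = best_source_rat("s0", tree, sinks, (1.0, 1.0),
                                   (1000.0, 1.0, 1.0), 2.0)
```

With the cheap buffer the source RAT is 82; making the buffer delay prohibitive forces the unbuffered solution, which gives 78.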

Okamoto-Cong's Method - I

- Existing techniques mostly work in two stages
- optimum Steiner tree construction
- optimum buffer insertion in this tree
- This method: DP-based simultaneous application of
- van Ginneken's buffer insertion
- Cong's A-tree construction

[Figure: two routings of a net with a source and sinks S1 (critical), S2, S3, S4: a minimum-delay buffered tree, and a minimum-delay tree followed by buffer insertion]

Okamoto-Cong's Method - II

- Characteristic features
- Critical path isolation: root gate drives critical sinks and a smaller additional load due to buffered non-critical paths
- If RATs at sinks are within a small range, balanced load decomposition is applied in order to decrease the load at output of root gate.

[Figure: critical signal isolation and balanced load decomposition]

T. Okamoto and J. Cong, "Interconnect Layout Optimization by Simultaneous Steiner tree construction and Buffer insertion," ICCAD, 1996.

Okamoto-Cong's Method - III

- Overview
- Critical path isolation (CPI): root gate drives critical sinks and a smaller additional load due to buffered non-critical paths
- If RATs at sinks are within a small range, balanced load decomposition (BLD) is applied in order to decrease the load at output of root gate.
- Bottom-up DP followed by top-down buffer placement
- CPI and BLD are taken into account when choosing two subtrees to be merged into a single root.
- For a given set of options of two nodes u, v, and for root node r, the distances dist(r, u), dist(r, v), and characteristics of the buffer to be placed at r, the set of options at r is computed
- Using a cost criterion for different roots in the A-tree, the best subtree is formed.
- In the 2nd phase, the option at the root which gives max RAT(root) is chosen, and the tree is traversed in top-down manner using the best nodes chosen in the previous phase.

Improving Delay by Wire Sizing

- Why wire sizing?
- In DSM, when wire resistance becomes significant, proper sizing of the interconnects can reduce the interconnect delay.
- First proposed by Cong, Leung and Zhou in 1993.
- When driver resistance is much larger than wire resistance of the interconnect, the interconnect can be modeled as a lumped capacitor without losing much accuracy, and the conventional minimum wire-width solution often leads to an optimal design.
- When driver resistance falls below unit wire resistance, optimal wire sizing can lead to substantial delay reduction.

J. Cong, K.S. Leung, "Optimal Wire sizing under the Distributed Elmore Delay Model," ICCAD, 1993.
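The dependence on driver resistance can be illustrated with a deliberately simple model: a single wire of one uniform width, a driver resistance, and a lumped sink load, evaluated with the Elmore delay. This is only a sketch under those assumptions (wire resistance inversely proportional to width, wire capacitance proportional to width, no fringe capacitance); the cited work sizes individual segments of a routing tree, which is not attempted here. All names and numbers are hypothetical.

```python
def wire_delay(width, length, r_drv, c_load, r_sq, c_area):
    """Elmore delay of driver + uniform wire + load.

    r_sq   -> wire resistance per unit length at unit width
    c_area -> wire capacitance per unit length at unit width
    """
    r_wire = r_sq * length / width
    c_wire = c_area * length * width
    return r_drv * (c_wire + c_load) + r_wire * (0.5 * c_wire + c_load)


def best_uniform_width(widths, length, r_drv, c_load, r_sq, c_area):
    """Pick the allowed width with the smallest Elmore delay."""
    return min(
        widths,
        key=lambda w: wire_delay(w, length, r_drv, c_load, r_sq, c_area),
    )


widths = [1, 2, 3, 4]
# Strong wire resistance relative to the driver: a wider wire wins.
weak_wire = best_uniform_width(widths, length=10, r_drv=1, c_load=16,
                               r_sq=1, c_area=1)
# Driver resistance dominates: the minimum width wins.
strong_driver_resistance = best_uniform_width(widths, length=10, r_drv=100,
                                              c_load=16, r_sq=1, c_area=1)
```

In this model the extra capacitance of a wide wire is paid for through the driver resistance, which is why minimum width is best when the driver resistance dominates.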

P-tree-based method

- Salient features
- Uses the notion of permutation of sinks
- Constructs binary search trees as the routing trees
- Finds an optimal sink permutation P based on minimum length of tour on the sinks, and searches for the optimal binary tree for P
- Based on DP as in van Ginneken's algorithm
- Uses load and RAT as cost parameters in DP
- Performs simultaneous wire sizing for the constructed tree

[Figure: two different trees induced by a sink permutation, each with source s0 and sinks s1 to s5]

Lillis, Cheng, Lin, "New Performance Driven Routing Techniques with Explicit Area/Delay Tradeoff and Simultaneous Wire Sizing," 33rd Design Automation Conference, pp. 395-400, 1996.

Battling with Manufacturing Defects

- Wire doubling
- Simple, easy to integrate in current design flows
- Can be applied to all nets

Can the use of graphs (with cycles) instead of (conventional) trees for routing topologies be useful?

- Non-tree routing (NTR)
- Still easy to integrate in current flows (post-processing approach)
- Appropriate for non timing-critical nets
- Potentially more effective

NTR increases Reliability

- Open fault: missing material (or extra oxide where via should be formed)
- Predominant for reduced feature size
- Manufacturing defects and electro-migration tend to be acute problems for DSM
- Reliability (ability of the interconnect to tolerate open faults) increases for NTR topology

Spot Defect Classification

(Source: Ion Mandoiu, Fujitsu Lab Talk)

Probability of Failures

NTR Problem formulation

Optimal Routing Graph (ORG) Problem: Given a signal net $N = (n_1, n_2, \dots, n_m)$ with source $s_0$, find a set $S$ of Steiner points and a routing graph $G = (N \cup S, E)$ such that the maximum source-sink signal propagation delay is minimum.

Result: The ORG problem is NP-hard.

B. A. McCoy and G. Robins, "Non-Tree Routing," IEEE Transactions on CAD/ICAS, Vol. 14, No. 6, June 1995.

Other uses of Non-tree Routing

- May reduce signal propagation delay
- Wire capacitance increases, wire resistance decreases
- Observation: often, for DSM designs, decrease in R > increase in C
- Capable of reducing signal skew
- Signal skew improved by an average of 63% over Steiner routing

Augmenting Paths for NTR Construction

(C) Paths connecting tree nodes or projections of tree nodes onto adjacent tree edges

(Source: Ion Mandoiu, Fujitsu Lab Talk)

Achieving High Clock Speed!

- Factors determining the operating speed of a circuit
- Delay
- Clock distribution
- Clock skew
- Measures the asymmetric clock distribution
- Clock skew = maximum clock delay - minimum clock delay
- Ideally should be zero (Zero Skew Trees)

Reducing Clock Skews

Zero Skew Routing

- Greedy Deferred Merge Embedding for zero skew
- Greedy bottom-up method
- Set of merging segments, initially each segment having a sink
- Iteratively finds the pair of closest segments
- Determines the position of the parent such that the delays from parent to the children are equal

M. Edahiro, "A Clustering-based Optimization Algorithm in Zero-Skew Routings," 30th Design Automation Conference, 1993.
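The last step above, placing the parent so that both children see equal delay, has a closed form under the Elmore model. The sketch below assumes two subtrees with known internal delays and capacitances joined by a wire of fixed total length; equating the two branch delays and solving for the split fraction $x$ gives $x = \frac{t_2 - t_1 + r l (C_2 + c l / 2)}{r l (C_1 + C_2 + c l)}$. The function names are chosen for this example, and the case where $x$ falls outside [0, 1] (which requires lengthening one branch) is only flagged, not handled.

```python
def zero_skew_split(t1, c1, t2, c2, length, r, c):
    """Fraction x of the connecting wire that goes to subtree 1.

    t1, t2 -> delay from each subtree root to its sinks
    c1, c2 -> total capacitance of each subtree
    length -> total wire length between the two subtree roots (must be > 0)
    r, c   -> wire resistance and capacitance per unit length
    If the result is outside [0, 1], the merge needs wire elongation,
    which this sketch does not handle.
    """
    num = t2 - t1 + r * length * (c2 + 0.5 * c * length)
    den = r * length * (c1 + c2 + c * length)
    return num / den


def branch_delay(t, cap, seg, r, c):
    """Elmore delay from the merge point through a wire of length seg."""
    return t + r * seg * (0.5 * c * seg + cap)


# Symmetric subtrees: the merge point sits in the middle.
x_sym = zero_skew_split(0.0, 1.0, 0.0, 1.0, 10.0, 1.0, 1.0)

# Subtree 1 is slower and heavier, so the merge point moves toward it.
x = zero_skew_split(5.0, 2.0, 0.0, 1.0, 10.0, 1.0, 1.0)
d1 = branch_delay(5.0, 2.0, x * 10.0, 1.0, 1.0)
d2 = branch_delay(0.0, 1.0, (1.0 - x) * 10.0, 1.0, 1.0)
```

In the second case `x` is below 0.5 and the two branch delays `d1` and `d2` agree, which is the zero-skew condition at that merge.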

Bounded-Skew Routing

- Problems with zero-skew tree construction
- Very difficult to achieve
- Increased wiring area
- Higher power dissipation
- Practical case: circuits operate correctly within some non-zero skew bound.
- BST/DME
- Form merge regions instead of merge segments
- Bottom-up region formation followed by top-down process to determine the exact location of the internal nodes.

Cong et al., "Bounded-Skew Clock and Steiner Routing," ACM TODAES, Vol. 3, 1998.

Semi-Synchronous Circuits

- Cluster-based method for semi-synchronous circuits
- A semi-synchronous circuit is one in which the clock is assumed to be distributed periodically to each individual register, though not necessarily simultaneously
- Clock period minimization is of prime importance
- Registers are partitioned into clusters depending on their geometric positions
- Registers within a cluster are in close proximity and have identical clock timing requirements
- Clusters are then modified to improve the clock period while keeping the radius small
- Each cluster of registers is driven by a buffer
- Clock period is 18% shorter than with the zero-skew method
- Wire length and power consumption are comparable to zero skew

Saitoh, Azuma and Takahashi, "A Clustering Based fast Clock Schedule Algorithm for Light Clock Trees," IEICE Trans. Fundamentals, Dec. 2002.

Early Design Planning Needs

- Interconnect Planning (Otten, others)
- Buffer Block Planning
- Interconnect Architecture (IA)
- Performance Prediction
- Others .

Buffer Block Planning for Interconnects

- Why planning for buffers?
- Early works were for one net at a time, and had no global planning for buffer placement for all the nets
- Buffers cannot be placed anywhere, as they will be in the silicon and require connections to power/ground networks.
- Arbitrary buffer placement may affect use/reuse of IP blocks

A Method for Buffer Block Planning

- Salient features of Cong's method
- Introducing the concept of feasible region (FR) for buffer placement under some delay constraint
- FR can be quite large and can be used to place buffer clusters
- An effective algorithm proposed for finding FRs, which can be used for subsequent floorplanning and interconnect planning

J. Cong, T. Kong and Z. Pan, "Buffer Block Planning for Interconnect Planning and Prediction," IEEE TCAD/ICAS, December 2001.

Routing Tree for Fixed Buffer Locations

- High performance design requires using a large number of buffers
- In practice, buffers are organized into buffer blocks which are pre-planned
- Buffer positions are fixed prior to routing tree construction

[Figure: routing graph for a floorplan with buffer block planning, showing obstacles, buffer blocks, logic blocks and pins]

Routing Tree for Fixed Buffers - A Method

- Summary of the method
- Given the RAT values at sinks and feasible regions of buffers, construct a routing tree and assign buffers such that the RAT at the source is maximum.
- Each node v is identified by a tree below it, and characterized by (i) load capacitance at v in its subtree, (ii) RAT(v) in the subtree, (iii) RE, the reachable sink set in the subtree, and (iv) a flag buf indicating if a buffer is inserted in the subtree at v.

J. Cong and Xin Yuan, "Routing Tree Construction under Fixed Buffer Locations," Design Automation Conference, 2000.

Routing Tree for Fixed Buffers - A Method

- Summary of the method (contd.)
- Expansion of nodes (subtrees) and merging of nodes (subtrees) are done at each node, and the corresponding labels are generated.
- A label queue stores all the labels at any stage of the algorithm; at each iteration, a new label with maximum RAT is selected.
- Subtrees are not deleted immediately on merging.
- Stops when the algorithm fetches from the queue a label for the whole tree.

Routing Tree for Fixed Buffers - Example

Can we Predict IA Performance?

- Performance of an Interconnect Architecture (IA) is traditionally predicted using delay, congestion, etc.
- Previous works do not consider several factors such as materials, use of vias, repeaters, number of layers, etc.
- Helps to choose the dynamic parameters related to process and materials for an IA

Interconnect Architecture

An Interconnect Architecture (IA) is a collection of pairs of wiring layers (tiers), with all wires in a given layer pair having uniform width (w), height (h), spacing (s) and thickness (t).

[Figure: schematic of an IA showing layer-pair j and layer-pair (j+1) connected by a via, with repeaters]

Proposing a Novel Metric

- Novel metric for IA performance evaluation
- Efficient dynamic programming method for exactly computing the metric
- for given Interconnect Architecture
- for given Wirelength Distribution (WLD)
- Studies of effects of materials, geometry, frequency, design parameters, repeater resources on IA metric

Dasgupta, Kahng and Muddu, "A Novel Metric for Interconnect Performance," Design, Automation and Test in Europe (DATE), 2003.

Performance-, WLD-Aware Routing Model

- All connections (wires) are two-pin, L-shaped
- Each segment of an L is assigned to one layer of a tier
- For a given WLD, longer wires are always routed on upper tiers, shorter wires always on lower tiers
- Every wire has a target delay (proportional to clock period)
- Repeaters inserted as needed to meet delay targets
- Starting from longer wires first
- All repeaters used in wires of a tier are of same size
- Repeater resource is specified as fraction of total die area
- Repeaters inserted until repeater area budget is exhausted

Rank Metric for IA

- Determines IA quality in terms of how completely target performance is met while embedding all wires
- Rank of a wire = its index in the WLD, where wires have been arranged in order of non-increasing length
- Rank of an IA = index of the highest-rank wire in the WLD that meets its target delay, subject to the constraints
- The given repeater area budget is not exceeded
- Lower-rank (= longer) wires in the WLD meet target delays
- All wires in the WLD can be assigned to the IA
- The rank of an IA is zero if not all the wires of the WLD can be assigned to the IA, even without meeting any delay targets

Rank of IA: Dependencies

[Figure: the rank of the IA depends on the WLD (number of wires vs. wirelength), the IA (layer pairs with W, H, S and T, tech node, gate count and gate parameters), target delays, repeater area budget AR, and via blockage]

The Rank Computation Problem

- Given
- IA with fixed number of layer-pairs with fixed geometry
- WLD W with n wires
- Available repeater area AR
- Upper bound $d_i$ = target delay for each wire
- Find
- An assignment of wires from the WLD to the IA
- using repeater insertion within the repeater area budget
- to meet target delays of wires
- such that rank of first wire failing to meet target delay is maximized

Rank Computation

- Rank of an IA is computed by assigning maximum number of wires from the WLD to tiers of the IA
- by making ActualDelay ≤ TargetDelay
- within AR
- Maximizing the rank requires optimum combination of
- wires assigned to tiers
- repeaters assigned to wires
- Exhaustive search over wires, tiers and repeaters is infeasible
- How to compute rank efficiently?
- Greedy approach or Dynamic Programming (DP)
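A heavily simplified greedy version of the rank computation is sketched below to make the definition concrete. It assumes the hard parts are already done: the tier assignment is fixed, and for each wire (taken in order of non-increasing length) the repeater area needed to meet its target delay is known, with `None` meaning the target cannot be met at all. These inputs and the function name are hypothetical; the dynamic programming method of the cited paper, which also optimizes the assignment of wires to tiers, is not reproduced here.

```python
def rank_greedy(repeater_area_needed, area_budget):
    """Number of leading (longest) wires that meet their target delays.

    repeater_area_needed[i] -> repeater area wire i needs to meet its target
                               delay (0 if none needed, None if impossible);
                               wires are sorted by non-increasing length
    area_budget             -> available repeater area AR
    """
    used = 0.0
    rank = 0
    for need in repeater_area_needed:
        if need is None or used + need > area_budget:
            break            # first wire that fails its target delay
        used += need
        rank += 1
    return rank


rank_a = rank_greedy([3.0, 2.0, 2.0, 1.0], area_budget=6.0)  # third wire fails
rank_b = rank_greedy([1.0, 1.0], area_budget=5.0)            # all wires pass
rank_c = rank_greedy([None, 1.0], area_budget=5.0)           # longest wire fails
```

Because longer wires must meet their targets before shorter ones count, the scan stops at the first failure even if later wires would fit in the remaining budget.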

Layout Enhancement for Manufacturability

Session III

Dr. Parthasarathi Dasgupta, MIS Group, Indian Institute of Management Calcutta

Outline

- Issues in lithography
- Resolution enhancement
- Optical process correction
- Phase Shift Masking
- Phase assignment
- Chemical mechanical polishing
- Dummy fill synthesis
- Layout data representation and compaction

Process flow for IC Manufacturing

Layout Design

Pattern Generation

Mask or Reticle

Chip Production

IC Manufacturing Terminology

Reticle - A photographic plate developed from a sequence of polygonal patterns for a single layer of an IC

Depth of focus - A plus or minus deviation from a defined reference plane wherein the required resolution for photolithography is still achievable

Photoresist - A radiation-sensitive material used as a coating on the wafer prior to doping

Photolithography

[Figure: photolithography cross-sections showing radiation, mask, resist, oxide and silicon layers]

IC Manufacturing in DSM - Problems?

- Feature dimensions (< 350 nm) < wavelength of the incident light
- Effects?
- Optical diffraction.
- Resist process effects.
- Distortion or disappearance of features.
- Rayleigh limit (resolution ∝ λ / NA)
- Compensation Schemes (Amp / Phase)
- Optical Proximity Correction.
- Phase-Shifting Masks.
- ...

Resolution Enhancement Techniques (RET)

Optical Proximity Correction (OPC)

- Perturb the shapes of transmitting apertures in the mask in a systematic manner to address optical lithographic distortions.
- OPC correction primitives
- Serif: small L-shaped geometry added to (subtracted from) a convex (concave) corner to nullify rounding
- Hammerhead: a U or inverted U to compensate for line-end shortening
- Notch: local thinning of a feature to compensate for linewidth variation
- Outrigger: disconnected, non-printing geometry that uses diffraction effects to enhance linewidth control

OPC Example

A. B. Kahng and Y. C. Pati, "Subwavelength Optical Lithography: Challenges and Impact on Physical Design," Proc. ISPD, April 1999, pp. 112-119.

Phase Shifting Mask (PSM)

Proposed in M.D. Levenson, et al., "Improving Resolution in Photolithography with a Phase-Shifting Mask," IEEE Trans. Electron Devices, 29, p. 1812, Dec. 1982.

By using a coating based on Chromium or Molybdenum Silicide (MoSi), two adjacent clear regions of a mask are enabled to transmit light with pre-defined phase-shifts. For a phase-shift of 180 degrees, diffracted light in the intermediate dark region interferes destructively.

Effect: better resolution and depth of focus (DOF).

PSM Example

Phase shifter

Light Intensity

Regions of Interference

Without Phase-shifting mask

With Phase-shifting mask

Phase Assignment Problem

Input: A given set of features in a mask layout

Objective: Assign phases to all the features of the layout such that no two conflicting features are assigned the same phase.
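When only two phases are available (0 and 180 degrees, an assumption made for this sketch), the phase assignment problem above amounts to 2-colouring a conflict graph whose vertices are features and whose edges join conflicting features. The following breadth-first sketch returns an assignment, or `None` when an odd cycle of conflicts makes one impossible. The function name and input format are hypothetical.

```python
from collections import deque


def assign_phases(features, conflicts):
    """Assign phase 0 or 180 to each feature, or return None if impossible.

    features  -> list of feature identifiers
    conflicts -> list of (a, b) pairs that must receive different phases
    """
    adj = {f: [] for f in features}
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)

    phase = {}
    for start in features:
        if start in phase:
            continue
        phase[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in phase:
                    phase[v] = 180 - phase[u]   # opposite phase
                    queue.append(v)
                elif phase[v] == phase[u]:
                    return None                 # odd cycle: phase conflict
    return phase


chain = assign_phases(["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle = assign_phases(["a", "b", "c"],
                         [("a", "b"), ("b", "c"), ("a", "c")])
```

A `None` result signals that the layout itself has to change, for example by moving or widening features, before a consistent two-phase assignment exists.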

New Thoughts?

Constrained Physical Design!!

Layout geometry, mask geometry, actual geometry on wafer

Chemical-Mechanical Polishing (CMP)

- Requirements for ULSI
- smaller feature size
- higher resolution
- multi-layer interconnects
- global planarity on ILD and metal layers for better depth of focus
- CMP
- can be performed on both ILD and metals
- polishes wafer surface flat
- uses chemical slurry and circular action

Problems with CMP

[Figure: CMP setup showing wafer, slurry and rotating pad]

- Associated Problems
- wearing out of polishing pad over metal features
- dishing in sparse regions of layout
- greater ILD thickness over dense regions of layout

Uniform Feature Density?

- The density of a layer in any particular region is the total area covered by the drawn features on that layer divided by the area of the region
- ILD thickness varies with local feature density
- Uniform (feature) density achieved by
- Post-processing: insertion of dummy (electrically inactive) features
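The density definition above can be written out directly. The sketch below assumes the drawn features on a layer are given as non-overlapping axis-aligned rectangles `(x1, y1, x2, y2)`; the function names are hypothetical and chosen only for illustration.

```python
def overlap_area(rect, window):
    """Area of the part of rect that lies inside window (0 if disjoint)."""
    width = min(rect[2], window[2]) - max(rect[0], window[0])
    height = min(rect[3], window[3]) - max(rect[1], window[1])
    return max(0, width) * max(0, height)


def window_density(features, window):
    """Feature density of one region: covered area / region area.

    Assumes the feature rectangles do not overlap each other, so their
    clipped areas can simply be summed.
    """
    window_area = (window[2] - window[0]) * (window[3] - window[1])
    covered = sum(overlap_area(rect, window) for rect in features)
    return covered / window_area
```

Evaluating `window_density` over many windows of a layer shows where the density is low (candidate regions for dummy features) and where it is already high.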

Uniform Feature Density - Tiling

[Figure: tiling shown in top view and cross-sectional view, distinguishing normal features from dummy features.]

(Dummy) Fill Synthesis Problem (Tiling)

- Foundry rules specify minimum and maximum feature densities to reduce the effect of CMP on yield (e.g., on each metal layer, every 2000 um x 2000 um window must be between 35% and 70% filled)
- Problem
- Input: A post-CMP feature distribution on a layout
- Objective: Determine the amount and location of dummy features to be placed into the layout
- Constraints: Feature density > a prescribed minimum, variation in feature density is within a prescribed range, electrical and physical design rules are observed

Solving Tiling Problem

- Outline of a generic approach
- For every prescribed window size, find the available area for dummy features
- fixed dissection
- arbitrary dissection
- Compute amounts and locations of dummy fills satisfying the constraints
- Generate fill geometry
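The "compute amounts" step can be illustrated for a fixed dissection. The sketch below makes a strong simplifying assumption that is not in the slides: each tile of the dissection is treated as its own density window, whereas practical flows check overlapping windows made of several tiles. The function name and the grid inputs are hypothetical.

```python
def fill_amounts(tile_area, feature_area, free_area, d_min, d_max):
    """Dummy-fill area per tile for a fixed dissection (simplified).

    tile_area    -- area of one tile
    feature_area -- 2D list, drawn feature area already present in each tile
    free_area    -- 2D list, area in each tile where dummies may legally go
    d_min, d_max -- prescribed minimum and maximum densities (fractions)
    Returns (fill, short): fill is a 2D list of fill areas, short lists the
    (row, col) tiles that cannot reach d_min with the available free area.
    """
    fill, short = [], []
    for i, (feat_row, free_row) in enumerate(zip(feature_area, free_area)):
        row = []
        for j, (feat, free) in enumerate(zip(feat_row, free_row)):
            needed = max(0.0, d_min * tile_area - feat)    # to reach minimum
            headroom = max(0.0, d_max * tile_area - feat)  # before maximum
            amount = min(needed, free, headroom)
            if amount < needed:
                short.append((i, j))
            row.append(amount)
        fill.append(row)
    return fill, short
```

Tiles that already exceed `d_max` receive no fill and are not flagged here; reducing density needs a layout change rather than dummy insertion. Turning the returned areas into actual geometry corresponds to the last step of the outline.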

Methods for Dummy Feature Placement - I

- rule-based (widely used in Industry)
- boolean operation
- tiles of a single prescribed density
- fills ONLY open space

Methods for Dummy Feature Placement - II

- model-based
- analytical expression of the relation between local density and ILD thickness
- more accurate than rule-based

R. Tian, D. F. Wong and R. Boone, "Model-Based Dummy Feature Placement for Oxide CMP Manufacturability", DAC 2000.


Dummy Fills: Is DENSITY the ONLY factor?

- Representing fill patterns (GDSII)
- Generating compressed fill patterns
- Compressing existing fill patterns

Defining Layouts - GDSII

- A standard (binary) file format
- Used for transferring / archiving 2D graphical design data
- Records
- Header (record type)
- Information (GDSII BNF)
- Contains hierarchy of structures
- Structure
- Boundary / polygon, path, text, box
- Structure references (SREF)
- Array of structure references (AREF)

GDSII AREFs

[Figure: an AREF of SREF / AREF elements, with the reference points (X1, Y1), (X2, Y2) and (X3, Y3) and the inter-row and inter-column spacing marked.]

GDSII File Description Example

Header
Bgnlib Lib1
  Bgnstr Struct1
    (<element>)
  Endstr
Endlib

Element - <boundary> <path> <aref> <text> <node> <box> Endel

Header
Bgnlib
Libname LIB1
  Bgnstr
  Strname CELL1
    Boundary
    Layer 0
    Datatype 0
    XY 6
      X -1000.000 Y 0.000
      X 163000.000 Y 0.000
      X 163000.000 Y 177000.000
      X 80000.000 Y 260000.000
      X -1000.000 Y 260000.000
      X -1000.000 Y 0.000
    Endel
  Endstr

GDSII File Description Example

  Bgnstr
  Strname AREF_SAMPLE1
    Aref
    Sname CELL1
    Strans 0,0,0
    Colrow 7 , 3
    XY 3
      X -5114000.000 Y -3006000.000
      X -3095600.000 Y -3006000.000
      X -5114000.000 Y -1891800.000
    Endel
  Endstr
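How an AREF record compactly stands for many placements can be sketched as follows. In GDSII the three XY points of an AREF are the array origin, the origin displaced by (number of columns x inter-column spacing), and the origin displaced by (number of rows x inter-row spacing). The function name `expand_aref` is hypothetical, and the sketch ignores the Strans transformation (it assumes no rotation, reflection or magnification).

```python
def expand_aref(columns, rows, origin, col_end, row_end):
    """List the origin of every instance described by one AREF.

    columns, rows -- the two Colrow values
    origin        -- first XY point (X1, Y1)
    col_end       -- second XY point: origin + columns * column spacing
    row_end       -- third XY point: origin + rows * row spacing
    """
    ox, oy = origin
    # Per-instance spacing vectors along the column and row directions.
    col_dx, col_dy = (col_end[0] - ox) / columns, (col_end[1] - oy) / columns
    row_dx, row_dy = (row_end[0] - ox) / rows, (row_end[1] - oy) / rows
    return [
        (ox + c * col_dx + r * row_dx, oy + c * col_dy + r * row_dy)
        for r in range(rows)
        for c in range(columns)
    ]
```

Applied to the AREF_SAMPLE1 record above (Colrow 7 , 3 and its three XY points), the call returns 21 instance origins of CELL1, all encoded by a single element in the file.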

GDSII File Description Example

  Bgnstr
  Strname SREF_SAMPLE1
    Sref
    Sname AREF_SAMPLE1
    XY 1
      X -7114000.000 Y -2006000.000
    Endel
  Endstr
  Bgnstr
  Strname LAYOUT
    Aref
    Sname SREF_SAMPLE1
    Strans 0,0,0
    Colrow 9 , 5
    XY 3
      X -114000.000 Y -2006000.000
      X -2095600.000 Y -2006000.000
      X -114000.000 Y -2891800.000
    Endel

GDSII File Description Example

    Aref
    Sname CELL1
    Strans 0,0,0
    Colrow 2 , 3
    XY 3
      X -3140000.000 Y -2006000.000
      X -3240000.000 Y -2006000.000
      X -3140000.000 Y -3891800.000
    Endel
  Endstr
Endlib


Why Layout Data Compaction is needed?

- Drastic increase in layout data
- pattern modification due to OPC and PSM
- fracturing of layout data (partitioning into primitive patterns, like triangles, trapezoids, etc.)
- fill pattern generation

A Method for Layout Data Compaction

- Basic figure: a rectangle or a trapezoid corresponding to a fractured mask pattern.
- Grouped figure: group of multiple basic figures
- Array: a reference to grouped figures with pitch and number of repetitions in each of X and Y directions.
- Cell: has some references and one definition. Cell includes grouped figures.

"Effective Data Compaction Algorithm for Vector Scan EB Writing System", S. Ueki et al., Proc. of SPIE, Vol. 4186, 2001.

Compaction Steps in Ueki's Algorithm

Step 1. Search arrays of same type of figure (array search algorithm)

Step 2. Classify arrays by array type and create cells that include multiple figures for each array type.

Step 3. Search cells from all figures that are not positioned in array. (cell search algorithm)
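The flavour of Steps 1 and 2 can be shown with a deliberately simplified sketch. It is not the array search algorithm of the cited paper: it assumes figures are given as `(shape, x, y)` tuples with distinct positions, looks only for one-dimensional arrays along X, and uses a greedy left-to-right scan. All names and the `min_count` threshold are hypothetical.

```python
from collections import defaultdict


def find_row_arrays(figures, min_count=3):
    """Step 1 (simplified): find runs of same-shape figures with constant X pitch.

    Returns (arrays, leftovers); each array is (shape, x0, y, pitch, count).
    """
    rows = defaultdict(list)
    for shape, x, y in figures:
        rows[(shape, y)].append(x)

    arrays, leftovers = [], []
    for (shape, y), xs in sorted(rows.items()):
        xs.sort()
        i = 0
        while i < len(xs):
            j, pitch = i, 0  # j is the index of the last figure in the run
            if i + 1 < len(xs):
                pitch = xs[i + 1] - xs[i]
                while j + 1 < len(xs) and xs[j + 1] - xs[j] == pitch:
                    j += 1
            count = j - i + 1
            if count >= min_count:
                arrays.append((shape, xs[i], y, pitch, count))
                i = j + 1
            else:
                leftovers.append((shape, xs[i], y))  # try again from next figure
                i += 1
    return arrays, leftovers


def classify_arrays(arrays):
    """Step 2 (simplified): group arrays that share pitch and repetition count."""
    groups = defaultdict(list)
    for shape, x0, y, pitch, count in arrays:
        groups[(pitch, count)].append((shape, x0, y))
    return dict(groups)
```

Arrays that land in the same group are candidates for being merged into one cell containing a grouped figure, so that a single array of that cell replaces several arrays of basic figures.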

Example - I

[Figure, three panels:]

- Figures classified by shape: Shape A (A1, A2, A3) and Shape B (B1, B2, B3)
- Array figures of same type: array of A1, A2, A3 and array of B1, B2, B3
- Cell extracted for arrays of same pitch and same number of figures: grouped figure AB, one array AB1, AB2, AB3

Example - II

Three cell references

5 x 1 array

Open Artwork System Interchange Standard

- OASIS: Salient Features
- Proposed in February, 2003 by SEMI TWG
- Achieve >10x (order of magnitude) file size reduction compared to GDSII
- Allow integers to extend to > 64 bits, when required
- Efficiently handle flat geometric data, including array of figures
- Make format publicly available to academic and commercial entities

"New Stream Format: Progress Report on Containing Data Size Explosion", DSM Technical Notes, Mentor Graphics

OASIS Repetition Types

[Figure: OASIS repetition types, Type 1 through Type 8.]

References

1. J. Rubinstein, P. Penfield and M. A. Horowitz, "Signal Delay in RC Tree Networks", IEEE Trans. on Computer-Aided Design, Vol. CAD-2, No. 3, July 1983.
2. J. Lillis, C. K. Cheng, T. Y. Lin and C. Y. Ho, "New Performance Driven Routing Techniques with Explicit Area/Delay Tradeoff and Simultaneous Wire Sizing", 33rd Design Automation Conference, pp. 395-400, 1996.
3. J. Cong and Xin Yuan, "Routing Tree Construction under Fixed Buffer Locations", Design Automation Conference, 2000.
4. P. Dasgupta, "Relative Accuracies of Estimators and their use in VLSI Routing", IIM-C Tech. Report, 2003.
5. K. Sinha, S. Sur-Kolay, P. Dasgupta and B. B. Bhattacharya, "Partitioning Routing Areas into Zones with Distinct Pins", Proc. International Conference on VLSI Design (IEEE-CS Press), Bangalore, India, 2001.
6. L.P.P.P. van Ginneken, "Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay", International Symposium on Circuits and Systems, 1990, pp. 865-868.
7. T. Okamoto and J. Cong, "Interconnect Layout Optimization by Simultaneous Steiner Tree Construction and Buffer Insertion", International Conference on Computer-Aided Design (ICCAD), 1996.
8. J. Cong and K.S. Leung, "Optimal Wiresizing Under the Distributed Elmore Delay Model", International Conference on Computer-Aided Design (ICCAD), 1993.
9. B. A. McCoy and G. Robins, "Non-Tree Routing", IEEE Transactions on CAD/ICAS, Vol. 14, No. 6, June 1995.
10. M. Edahiro, "A Clustering-based Optimization Algorithm in Zero-Skew Routings", 30th Design Automation Conference, 1993.
11. J. Cong, A. B. Kahng, C.-K. Koh and C.-W. A. Tsao, "Bounded-Skew Clock and Steiner Routing", ACM Transactions on Design Automation of Electronic Systems, Vol. 3, No. 3, 1998, pp. 341-388.
12. M. Saitoh, M. Azuma and A. Takahashi, "A Clustering Based Fast Clock Schedule Algorithm for Light Clock Trees", IEICE Transactions Fundamentals, Vol. E85-A, No. 12, Dec. 2002.
13. J. Cong, T. Kong and Z. Pan, "Buffer Block Planning for Interconnect Planning and Prediction", IEEE Transactions on CAD/ICAS, December 2001.
14. J. Cong and Xin Yuan, "Routing Tree Construction under Fixed Buffer Locations", DAC, 2000.
15. P. Dasgupta, A. B. Kahng and S. V. Muddu, "A Novel Metric for Interconnect Performance", Design, Automation and Test in Europe (DATE), 2003.
16. A. B. Kahng and Y. C. Pati, "Subwavelength Optical Lithography: Challenges and Impact on Physical Design", Proc. International Symposium on Physical Design, April 1999, pp. 112-119.
17. R. Tian, D. F. Wong and R. Boone, "Model-Based Dummy Feature Placement for Oxide CMP Manufacturability", Design Automation Conference, 2000.
18. S. Ueki, et al., "Effective Data Compaction Algorithm for Vector Scan EB Writing System", Proceedings of SPIE, Vol. 4186, 2001.
19. "New Stream Format: Progress Report on Containing Data Size Explosion", DSM Technical Notes, Mentor Graphics, 2003.
20. A. B. Kahng and G. Robins, "A New Class of Steiner Tree Heuristics with Good Performance: The Iterated 1-Steiner Approach", Proc. International Conference on Computer-Aided Design, pp. 428-431, 1990.
21. International Technology Roadmap for Semiconductors (ITRS), http://public.itrs.net.
22. J. Cong, K-S. Leung, D. Zhou, "Performance-Driven Interconnect Design based on Distributed RC Delay Model", Design Automation Conference, 1993.

Session IV: Analyzing Layout Databases for Improving Test Quality

Dr. Sujit T Zachariah, Intel India Development Centre, Bangalore (sujit.t.zachariah_at_intel.com)

Outline

- HVM Test Basics
- Case Studies
- Defect Based Testing
- Shorts
- Bridge defects (Random two-node and multi-node)
- Opens
- Open defects (Random and systematic)
- Circuit Marginality Testing
- Power Supply Droop
- Q & A

High Volume Manufacturing (HVM) Test Basics

HVM Testing Approaches: An Overview

- Functional testing
- Exorbitant cost of testers (at-speed application)
- Need for frequent tester upgrades
- Cost of manual test generation
- Structural testing
- Low cost, re-usable structural testers
- Automated approaches for test pattern generation
- Use of fault models
- Classical approach stuck-at fault model

But

- Most manufacturing defects behave electrically as shorts or opens
- Marginality issues introduced by design tool approximations and process variations are on the rise with device scaling
- Stuck-at fault model inadequate for both cases
- We need to rethink fault models!
- Adequately model failing behavior
- Simple enough for targeting test generation

Also

- For realistic fault models
- Number of possible faults is extremely large
- Current ATPG techniques limit target size
- Implies need for fault extraction prior to fault modeling
- Enumerate all failure sites
- Prioritize failure sites as a ranked list (probability)
- Analysis at lower level of design abstraction
- Circuit (schematic). Example: cross talk analysis
- Physical (layout): Layout Analysis for Test

Layout Databases: Assumptions

- All standard industry formats converted to standard hierarchical database format
- Rectilinear polygons converted to set of non-overlapping rectangles
- Non-Manhattan geometry approximated as rectilinear polygons

Case Study 1: Defect Based Testing - Extraction of Random Bridge Defects

Bridge Fault Extraction: Overview

- Identify potential bridge failure sites in a layout
- Useful for yield estimation, test generation and failure analysis
- Approaches
- Capacitance Extraction Based Approaches [Stroud, Emmert et al. 00]
- Inductive Fault Analysis (IFA) Based Approaches [Ferguson, Shen 88]
- Uses defect information from manufacturing sources
- Likelihood of occurrence modeled using Weighted Critical Area (WCA)

Inductive Fault Analysis (IFA) Overview

Bridge Faults Types

Multi-Node Bridge Faults: <n2,n3> 1.8, <n1,n2> 0.7, <n2,n3> 0.6, <n1,n2,n3> 0.4

Two-Node Bridge Faults: <n2,n3> 2.2, <n1,n2> 1.1, <n2,n3> 1.0

- Why Multi-Node Bridge Analysis?
- Accuracy of extracted bridge list
- Impact on test quality and yield estimation

IFA Based Approaches

- CARAFE [Jee, Ferguson 92]
- CREST [Nag, Maly 95]
- LOBS [Gonçalves, Teixeira, Teixeira 96, 97]
- Eiffel [Chakravarty, Zachariah 00]
- FedEx [Walker, Stanojevic 01]

IFA Based Approaches: CARAFE

- Straightforward implementation of the WCA definition
- For each layer L (or layer pair)
- For each defect size S
- Expand each feature by the defect size S
- Determine critical area rectangles (CARs) as the intersection area of the expanded rectangles
- Annotate CARs with net name pair and collect them into a global list
- Find union of CARs by selectively merging the rectangles from the global list
- Repeat computations for each given defect size
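The expand-and-intersect step for a single layer and a single defect size can be sketched as below. This is an illustration of the idea only, not the CARAFE implementation: it compares every pair of features by brute force instead of using a line sweep, and it assumes features are `(net_name, (x1, y1, x2, y2))` tuples and that the expansion amount `s` is applied on every side. All names are hypothetical.

```python
from itertools import combinations


def expand(rect, s):
    """Grow a rectangle by s on every side."""
    x1, y1, x2, y2 = rect
    return (x1 - s, y1 - s, x2 + s, y2 + s)


def intersect(a, b):
    """Intersection rectangle of a and b, or None if it has no area."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def critical_area_rects(features, s):
    """CARs for one layer and one defect size, keyed by net-name pair."""
    cars = {}
    for (net_a, rect_a), (net_b, rect_b) in combinations(features, 2):
        if net_a == net_b:
            continue  # a defect touching one net twice is not a bridge
        car = intersect(expand(rect_a, s), expand(rect_b, s))
        if car is not None:
            cars.setdefault(tuple(sorted((net_a, net_b))), []).append(car)
    return cars
```

Running this once per defect size, as the slide describes, makes the cost grow with the number of defect sizes processed.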

IFA Based Approaches: CARAFE

- Sources of inefficiency
- Linear increase in run time with the number of defect sizes processed
- Sub-optimal line-sweeping rectangle intersection algorithm
- Overhead due to global processing of CARs
- Limits use to very small layouts

IFA Based Approaches: CREST

- Uses layout hierarchy (no flattening)
- WCA computations performed one instance at a time (bottom-up approach)
- Through-the-cell routing and net name propagation issues
- Accuracy issues with generated fault list (WCA values ranking)

IFA Based Approaches: LOBS

- Uses sliding window algorithm for computing CARs based on maximum defect size
- Algorithm for determining union of CARs based on the cube generation of the intersections
- When two CARs A and B overlap,
- CA computed as Area(A) + Area(B) - Area(A intersection B)
- Potential explosion in number of computations if number of overlapping CARs is large
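The two-rectangle formula generalises to inclusion-exclusion over all subsets of overlapping CARs, which is where the growth in computation comes from. The sketch below is a generic inclusion-exclusion illustration with hypothetical names, not the LOBS code; for k rectangles it evaluates 2^k - 1 intersection terms.

```python
from itertools import combinations


def area(rect):
    """Area of (x1, y1, x2, y2); 0 for an empty or inverted rectangle."""
    return max(0, rect[2] - rect[0]) * max(0, rect[3] - rect[1])


def intersect_all(rects):
    """Common intersection of several rectangles (may come out inverted)."""
    return (
        max(r[0] for r in rects),
        max(r[1] for r in rects),
        min(r[2] for r in rects),
        min(r[3] for r in rects),
    )


def union_area(rects):
    """Area of the union via inclusion-exclusion over all non-empty subsets."""
    total = 0
    for k in range(1, len(rects) + 1):
        sign = 1 if k % 2 else -1  # add odd-sized subsets, subtract even-sized
        for subset in combinations(rects, k):
            total += sign * area(intersect_all(subset))
    return total
```

For two CARs this reduces to the formula on the slide; with many mutually overlapping CARs the number of subsets, and hence the work, grows exponentially.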

IFA Based Approaches: Eiffel

- Process multiple defect sizes
- Results deduced for all defect sizes from the calculations for maximum defect size
- Interval tree based algorithm to determine rectangle intersections
- Novel algorithm for finding the union of rectangles constituting the critical area for a bridge
- Resulting algorithm is
- Able to process large number of defect sizes
- Able to handle larger layout databases

Algorithm Outline

- For each layer L (or layer pair)
- Step 1: Determine CAR for the maximum defect size
- Expand each feature by the maximum defect size Smax
- Determine max_CARs as the intersection area of the expanded rectangles
- Efficient computation using interval trees
- Annotate CARs with net name pair and collect them into buckets, with each net pair having its own bucket

Algorithm Outline

- Step 2: Process each bucket of max_CARs
- For each net name pair <N1,N2> (bridge)
- For each defect size S
- Shrink max_CARs by (Smax-S) to obtain CARs for the size S
- Merge CARs to obtain CA(N1,N2,S,L)
- (Efficient merging using novel algorithm)
- Weigh CA(N1,N2,S,L) with pL(S) and update WCA<N1,N2>
- (Bridges and their associated WCA maintained in balanced AVL tree for efficiency of update process)
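Step 2 for one bucket (one net pair on one layer) can be sketched as follows. The shrink step mirrors the slide: if features were expanded by Smax on every side, shrinking a max_CAR by (Smax - S) on every side gives the CAR for size S. The merge uses a plain coordinate-compression union, which stands in for the novel merging algorithm the slides mention and is not that algorithm. The names and the `defect_prob` dictionary (mapping defect size S to pL(S)) are assumptions made for illustration.

```python
def shrink(rect, d):
    """Shrink a rectangle by d on every side; None if nothing is left."""
    x1, y1, x2, y2 = rect[0] + d, rect[1] + d, rect[2] - d, rect[3] - d
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def union_area(rects):
    """Area of the union of rectangles via coordinate compression."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    total = 0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            # Count a grid cell once if any rectangle fully covers it.
            if any(r[0] <= xa and xb <= r[2] and r[1] <= ya and yb <= r[3]
                   for r in rects):
                total += (xb - xa) * (yb - ya)
    return total


def weighted_critical_area(max_cars, s_max, defect_prob):
    """WCA contribution of one net pair on one layer.

    max_cars    -- CARs computed once, for the maximum defect size s_max
    defect_prob -- {defect size S: probability weight pL(S)} (assumed format)
    """
    wca = 0.0
    for s, p in defect_prob.items():
        cars = [c for c in (shrink(r, s_max - s) for r in max_cars) if c]
        wca += p * union_area(cars)  # CA(N1,N2,S,L) weighted by pL(S)
    return wca
```

Because the rectangle intersections are computed only once, for Smax, adding more defect sizes costs only additional shrink-and-merge passes over each bucket.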

Experimental Results

300X improvement

Experimental Results

IFA Based Approaches: FedEx

- Algorithm targeted for fast results
- Capable of handling large VLSI layout databases
- Accuracy traded to achieve speed

Multi-Node Bridges

- Computation more challenging than two-node analysis
- Eiffel Algorithm
- Compute two-node critical area rectangles
- Performed only for the maximum defect size
- Efficient interval tree based solution
- Resulting critical areas collected into a global list
- Critical area rectangles for all defect sizes deduced from critical areas corresponding to the maximum defect size

Multi-Node Bridges (Continued)

- Compute multi-node WCA value increments from critical area rectangles
- Novel line sweep based solution

Experimental Resul