A Highly Testable Pass Transistor Based Structured ASIC Design Methodology

Provided by: eceTamuE7

Learn more at: http://www.ece.tamu.edu

1
A Highly Testable Pass Transistor Based
Structured ASIC Design Methodology

Kanupriya Gulati
Nikhil Jayakumar
Sunil P. Khatri

2
Motivation for Structured ASICs
Process (microns)      2.0   0.8   0.6   0.35  0.25  0.18  0.13  0.1
Single mask cost ($K)  1.5   1.5   2.5   4.5   7.5   12    40    60
Number of masks        12    12    12    16    20    26    30    34
Mask set cost ($K)     18    18    30    72    150   312   1000  2000

A full set of lithography masks can cost between
$1M and $3M.
Roughly 25% reduction in ASIC design starts in
the past 7 years.
Sources: Sematech Annual Report 2002;
A. Sangiovanni-Vincentelli, "The Tides of EDA",
keynote talk, DAC 2003.

3
Our Solution

Use a regular array of pass transistor logic
based if-then-else (ITE) cells with flip-flops
along the edges of the die as the underlying
circuit structure.
Stock such arrays pre-processed up to the
metallization step.
Or, use previously generated masks for all other
layers and use new masks for only the METAL and VIA
layers.
To create an ASIC for a given design,
technology-map the design to the smallest
available array.
Only the METAL and VIA masks require changes.

4
Advantages

Can share masks for several layers.
Reduces NRE.
No need for the designer to worry about DFM
issues.
Improved yield.
New designs can be implemented faster.
Engineering changes are simplified: a design
modification requires only METAL and VIA mask
changes.
Generating test patterns for such a design is
easy.
100% test coverage in time linear in the size of
the network.
No redundant faults in the design.

5
The Gap between FPGA and ASIC
FPGA
Low speed
High power
Cost-effective for low volume products

ASIC
High speed
Low power
Cost-effective for high volume products
Necessary for products requiring high performance
or low power.

What bridges the gap?

6
Taxonomy of Regular Logic Fabrics

As we move further away from standard cell
(ASIC), we lose:
Area
Speed
Power
As we move closer to FPGAs, we gain:
Flexibility
Lower NRE

[Figure: taxonomy of regular logic fabrics, with
"Our Approach" marked]

Source: L. Pileggi et al., "Exploring Regular
Fabrics to Optimize the Performance-Cost Trade-off".

7
Overview

Convert a logic netlist to a partitioned Reduced
Ordered Binary Decision Diagram (ROBDD).
Each ROBDD node is implemented as an ITE cell.
Place these ITE cells in an area and delay
efficient manner on a pre-fabricated array of ITE
cells.
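
The mapping from ROBDD nodes to ITE cells can be illustrated with a minimal Python sketch. Each node acts as a 2:1 multiplexer that passes its T ("then") input when its variable is 1 and its E ("else") input otherwise. The `Node` class and the `evaluate` helper are hypothetical names chosen for this illustration; they are not part of VIS or of the authors' flow.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """One ROBDD node, i.e. one ITE cell: out = T if var else E."""
    var: str                     # select variable i
    then: Union["Node", bool]    # T input, taken when var is 1
    els: Union["Node", bool]     # E input, taken when var is 0


def evaluate(f, assignment):
    """Walk from the root to a terminal, one ITE decision per node."""
    while isinstance(f, Node):
        f = f.then if assignment[f.var] else f.els
    return f


# ROBDD of (x1 AND x2) OR x3 under the variable order x1, x2, x3.
# The x3 node is shared by two parents, as in a reduced diagram.
x3 = Node("x3", True, False)
x2 = Node("x2", True, x3)
root = Node("x1", x2, x3)

print(evaluate(root, {"x1": 1, "x2": 1, "x3": 0}))
```

In the fabric, every such node would occupy one ITE cell, and the shared x3 node would be a single cell driving two parents.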

8
ITE Cell Structure
[Figure: ITE cell schematic; labels: i, T, E, out]

Uses an NMOS pass-gate based structure.
Each ITE cell generates a buffered output and its
complement.
Delay of the NMOS pass-gate ITE cell was found to be
similar to that of a CMOS pass-gate based ITE cell,
with a smaller area.
Probably due to the increased diffusion
capacitance in CMOS pass-gates.

9
ITE Cell Design

MUX control signals run along the length of the
cell.
Each ITE cell has 3 variable signals and 3
complemented variable signals running
horizontally in metal 3.
Appropriate placement of stacked vias at the
horizontal metal 3 wires allows the ITE cell to
be connected to any one of the 3 variables in the
corresponding row of the array.
Metal layers 1 and 2 are used for most of the
layout; metal layer 3 is used to route variables
and their complements.

[Figure: ITE cell layout; labels: VDD, GND]

10
Synthesis: Partitioned ROBDD

Synthesis of the logic netlist into a partitioned
ROBDD structure is done in VIS.
Primary input variables are ordered using a DFS
ordering.
Enable dynamic variable ordering before building
ROBDDs.
Do bottom-up construction of ROBDDs.
Let the set of variables in the ROBDD manager be V
(initially the PIs).
If the size of any ROBDD > user-specified
threshold B:
Introduce a new variable v (intermediate ROBDD
variable) and continue building ROBDDs on the set
of variables V ∪ {v}.
Results in a series of ROBDDs.
Size of each ROBDD is bounded by B.
The output of each of these ROBDDs represents
either a primary output or an intermediate ROBDD
variable.
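
The thresholded bottom-up construction can be sketched in Python as follows. This is an illustration under stated assumptions, not the VIS implementation: the tiny `BDD` manager, the `build_partitioned` function, the gate-list format and the choice to cut at the fanins of the gate that overflows are all hypothetical, and dynamic variable reordering is left out.

```python
from operator import and_, or_, xor


class BDD:
    """Tiny ROBDD manager: 0 and 1 are terminals, other nodes are ints >= 2."""

    def __init__(self):
        self.table = {}   # (level, lo, hi) -> node id (unique table)
        self.node = {}    # node id -> (level, lo, hi)
        self.names = []   # level -> variable name

    def new_var(self, name):
        self.names.append(name)
        return self.mk(len(self.names) - 1, 0, 1)

    def mk(self, level, lo, hi):
        if lo == hi:                       # redundant test, no node needed
            return lo
        key = (level, lo, hi)
        if key not in self.table:
            nid = len(self.node) + 2
            self.table[key] = nid
            self.node[nid] = key
        return self.table[key]

    def level(self, f):
        return self.node[f][0] if f > 1 else len(self.names)

    def apply(self, op, f, g, memo=None):
        memo = {} if memo is None else memo
        if f <= 1 and g <= 1:
            return op(f, g)
        if (f, g) not in memo:
            top = min(self.level(f), self.level(g))
            f0, f1 = self.node[f][1:] if self.level(f) == top else (f, f)
            g0, g1 = self.node[g][1:] if self.level(g) == top else (g, g)
            memo[(f, g)] = self.mk(top, self.apply(op, f0, g0, memo),
                                   self.apply(op, f1, g1, memo))
        return memo[(f, g)]

    def size(self, f, seen=None):
        seen = set() if seen is None else seen
        if f > 1 and f not in seen:
            seen.add(f)
            self.size(self.node[f][1], seen)
            self.size(self.node[f][2], seen)
        return len(seen)


def build_partitioned(inputs, gates, outputs, B):
    """gates: topologically ordered (name, op, a, b). Returns (mgr, partitions)."""
    mgr = BDD()
    f = {x: mgr.new_var(x) for x in inputs}
    parts = {}

    def cut(sig):
        # Freeze sig's ROBDD as a partition and replace it by a new variable v.
        if f[sig] > 1 and mgr.node[f[sig]][1:] != (0, 1):
            parts[sig] = f[sig]
            f[sig] = mgr.new_var(sig)

    for name, op, a, b in gates:
        f[name] = mgr.apply(op, f[a], f[b])
        if mgr.size(f[name]) > B:          # over threshold: cut at the fanins
            cut(a)
            cut(b)
            f[name] = mgr.apply(op, f[a], f[b])
    for o in outputs:
        parts.setdefault(o, f[o])
    return mgr, parts


gates = [("a", and_, "x1", "x2"), ("b", and_, "x3", "x4"), ("c", or_, "a", "b"),
         ("d", xor, "x5", "x6"), ("z", xor, "c", "d")]
mgr, parts = build_partitioned(["x1", "x2", "x3", "x4", "x5", "x6"],
                               gates, ["z"], B=4)
for name, root in parts.items():
    print(name, mgr.size(root))
```

With B = 4 the monolithic ROBDD for z would be too large, so c and d become intermediate variables and z is rebuilt in terms of them. Cutting at the fanins keeps every partition within B as long as B is at least 3.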

11
Example
[Figure: multi-level network on x1, x2, x3, x4 and
its partitioned ROBDDs for y1, y2 and z]

Given a multi-level logic network with primary
inputs x1, x2, x3, x4.
As bottom-up ROBDD construction proceeds, new
variables y1 and y2 are created.
z is built in terms of y1 and y2.

12
Placement

First, replicate ITE cells whose outputs are
heavily loaded, in order to limit fanout.
These correspond to ROBDD nodes with high in-degrees.
If the in-degree of an ROBDD node [...] k, then
replicate this node [...] times.
We use K = 3.
Compute an initial estimate of the number of ITE
cells n in any row of the ITE array and the number
of rows m of the ITE array as follows:
[equations for n and m not preserved in the transcript]
where x = width of each ITE cell,
y = height of each ITE cell,
N = total number of ITE cells.

13
Placement

Sort the N ITE cells in increasing order of their
ROBDD variable index.
Variable index is a measure of closeness of
variable to the root of ROBDD.
A variable closer to the root has smaller index
than one further from the root.
Assign ITE cells to rows of the ITE array

14
Assigning ITE cells to rows

If there are nj ITE cells with variable index vj
such that nj > n (n = number of ITE cells that
can fit in one row):
ITE cells need to span rows.
Sort these nj cells in decreasing order of cost
C.
[cost equation not preserved in the transcript]
ci: children of node c
cj: parents of node c
Helps keep routes short.

[Figure: nodes a and b on ROBDD levels 2 to 6]
Cost(a) = 5 - 2 = 3
Cost(b) = 3 - 3 = 0

15
Assigning ITE cells to rows

If there are nj ITE cells with variable index vj
such that nj < n:
Attempt to populate the corresponding row of the
ITE array with additional ITE cells with variable
index vj+1.
If the row is still not full, add ITE cells with
variable index vj+2 as well.
Each row can hold ITE cells which depend on at
most 3 variables, since the number of variables
that can be routed over any ITE cell is 3.
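
A simplified Python sketch of the row-filling rule follows. It sorts cells by variable index and fills each row greedily until the row is full or a fourth distinct variable would be needed. The function name `assign_rows` and the `(cell_name, variable_index)` input format are assumptions made for this example, and the cost-based ordering of cells that span rows is omitted.

```python
def assign_rows(cells, n, max_vars=3):
    """cells: list of (cell_name, variable_index); n: ITE cells per row."""
    rows, row, row_vars = [], [], set()
    for name, idx in sorted(cells, key=lambda c: c[1]):
        full = len(row) == n
        too_many_vars = idx not in row_vars and len(row_vars) == max_vars
        if row and (full or too_many_vars):
            rows.append(row)              # close the current row
            row, row_vars = [], set()
        row.append(name)
        row_vars.add(idx)
    if row:
        rows.append(row)
    return rows


cells = [("a", 0), ("b", 1), ("c", 1), ("d", 2), ("e", 3), ("f", 3), ("g", 4)]
print(assign_rows(cells, n=4))
print(assign_rows(cells, n=10))
```

Both calls give the same two rows: in the first the row closes because it is full, in the second because cell e would bring in a fourth variable.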

16
Placement of ITE cells within rows

ITE cells are arranged within rows to reduce
crossings in the induced circuit graph (after
planarization of the array of ITE cells).
Use DOT (graphviz.org) to do this.
DOT only re-arranges cells in each ITE row in a
manner that minimizes graph crossings.
DOT is not allowed to modify the assignment of
ITE cells to rows.
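
One way to hand a fixed row assignment to DOT is to emit each row as a `rank=same` group, so that the tool may reorder cells within a rank but keeps each group on one rank. The Python sketch below only generates the DOT text; the function name `rows_to_dot` and the graph name are hypothetical, and this encoding is an assumption rather than the authors' actual input file.

```python
def rows_to_dot(rows, edges):
    """rows: list of lists of cell names; edges: (src, dst) pairs."""
    lines = ["digraph ite_array {", "  rankdir=TB;"]
    for row in rows:
        members = " ".join(f'"{c}";' for c in row)
        # All cells of one ITE row are pinned to the same rank.
        lines.append(f"  {{ rank=same; {members} }}")
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = rows_to_dot([["a", "b"], ["c"]], [("a", "c"), ("b", "c")])
print(dot_text)
```

The resulting text can be passed to the `dot` command, whose layout output then gives the left-to-right order of the cells in each row.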

17
Implementing Sequential Designs

Each row of ITE cells has a bank of 3 flip-flops.
Outputs of the flops can drive one of the inputs
by means of a METAL and VIA mask change.

18
Route

Use WROUTE (in Cadence's Silicon Ensemble for
DSM) to route the ITE cell array.
Use 4 metal layers for the route.

[Figure: routed example, alu2]

19
Summary of Design Flow

Convert netlist to partitioned ROBDD in VIS.
Perform cell replication if required to limit
fanout.
Perform ITE cell assignment to rows.
Re-arrange ITE cells within rows using DOT to
minimize crossings in the graph induced by the
interconnections among the ITE cells.
Use the result of DOT as the final placement and
perform routing using WROUTE (or any other
routing tool).

20
Ease of Testability

In traditional scanned standard-cell based
circuits:
The ATPG problem is NP-complete.
In our scanned ITE cell based approach:
In functional mode:
Partitioned ROBDD outputs are regular inputs to
other partitions.
In test mode:
Primary inputs and the outputs of each partition
are scanned in to allow independent testability
of the different partitions.

21
Abstract View of Partitioned ROBDDs
[Figure: abstract view of partitioned ROBDDs; labels:
PIs (x1, x2, x3, x4, ...), additional scan-able nodes
(y1, y2), PO (z)]

22
Ease of Testability - Excitation
[Figure: ROBDD with the excitation path highlighted;
node labels not preserved in the transcript]
Linear-time BDD operation.

23
Ease of Testability - Propagation
[Figure: ROBDD with the propagation path highlighted;
node labels not preserved in the transcript]
Again a linear-time BDD operation.

Support variables for both conditions are
non-overlapping.
Circuit is guaranteed irredundant.
100% stuck-at fault coverage guaranteed in time
linear in the size of the circuit.
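
The labels on the two testability slides did not survive in this transcript, so the following Python sketch rests on an assumed reading: propagation uses a path from the ROBDD root to the faulty node, and excitation uses a path from that node to the terminal opposite to the stuck value. Under that reading the first path only assigns variables above the node and the second only variables at or below it, which is why the two supports cannot overlap. The names `path` and `make_test_vector` and the node-table format are hypothetical.

```python
def path(nodes, src, dst):
    """Depth-first search for one path src -> dst.

    nodes maps id -> (var, else_child, then_child); 0 and 1 are terminals.
    Returns the variable assignment along the path, or None.
    Each node is expanded at most once, so the search is linear in size.
    """
    failed = set()

    def dfs(u):
        if u == dst:
            return {}
        if u in (0, 1) or u in failed:
            return None
        failed.add(u)
        var, lo, hi = nodes[u]
        for value, child in ((0, lo), (1, hi)):
            sub = dfs(child)
            if sub is not None:
                return {var: value, **sub}
        return None

    return dfs(src)


def make_test_vector(nodes, root, target, stuck_at):
    propagate = path(nodes, root, target)         # variables above target
    excite = path(nodes, target, 1 - stuck_at)    # variables at or below target
    return {**propagate, **excite}


# ROBDD of (x1 AND x2) OR x3 with order x1, x2, x3; node 4 is the root.
nodes = {2: ("x3", 0, 1), 3: ("x2", 2, 1), 4: ("x1", 2, 3)}
print(make_test_vector(nodes, root=4, target=3, stuck_at=0))
```

For the x2 node stuck at 0, the vector selects that node at the root (x1 = 1) and drives its fault-free value to 1 (x2 = 0, x3 = 1), so the fault shows at the output.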

24
Experiments

To compare with standard-cell based design, the
circuits were mapped to a library of 20 gates.
Used SIS for optimization (script.rugged) and
mapping.
Placement and routing done using SEDSM with a
0.1um process and 4 metal layers.
Delay of standard-cell based designs:
Pre-characterized the library using SPICE (0.1um
BPTM).
Used the sense package in SIS.
sense returns the longest sensitizable path (false
paths implicitly ignored).

25
Experiments

Partitioned ROBDD construction done using the
frontier method in VIS.
Tried the following partitioning thresholds (B):
5, 10, 15, 20 and 1000.
For each circuit, the result that yielded the
smallest number of ROBDD nodes was selected.
This partitioned ROBDD structure was then taken
through our design flow.

26
Experiments

Delay of ITE cell array:
Found by traversing the longest topological path
(in terms of number of ITE cells) between any
circuit PI and PO.
Delay at each ITE cell is given by:
If the variable is a primary input:
D(cell) = MAX(D(leftchild), D(rightchild))
          + D(ITE block)
If the variable is an internal node:
D(cell) = MAX(D(variable), D(leftchild),
          D(rightchild)) + D(ITE block)
D(ITE block) found from SPICE simulations (0.1um
BPTM).
Assumed that the ITE cell drove the maximum load
allowed; hence delay estimates are conservative.
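
The delay recurrence can be written as a short recursive function. The Python sketch below is illustrative only: the function name `array_delay`, the node-table format, the zero arrival time for constant terminals and the value 50.0 used for D(ITE block) are all assumptions, not figures from the experiments.

```python
from functools import lru_cache


def array_delay(nodes, root, primary_inputs, var_delay, d_ite):
    """nodes: id -> (var, left_child, right_child); 0 and 1 are terminals.

    var_delay gives D(variable) for intermediate (non-PI) variables.
    """
    @lru_cache(maxsize=None)
    def d(u):
        if u in (0, 1):
            return 0.0                    # assumption: constants arrive at t = 0
        var, left, right = nodes[u]
        arrivals = [d(left), d(right)]
        if var not in primary_inputs:     # internal node: wait for D(variable)
            arrivals.append(var_delay[var])
        return max(arrivals) + d_ite

    return d(root)


# Partition 1: (x1 AND x2) OR x3, driven by primary inputs only.
p1 = {2: ("x3", 0, 1), 3: ("x2", 2, 1), 4: ("x1", 2, 3)}
d_y1 = array_delay(p1, 4, {"x1", "x2", "x3"}, {}, d_ite=50.0)

# Partition 2: a single ITE cell controlled by the intermediate variable y1.
p2 = {2: ("y1", 0, 1)}
d_z = array_delay(p2, 2, {"x1", "x2", "x3"}, {"y1": d_y1}, d_ite=50.0)
print(d_y1, d_z)
```

The longest chain in the first partition is three cells deep, and the second partition adds one more cell delay on top of the arrival time of y1.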

27
Results (Combinational designs)
Ckt.       Evaluation Delay       Area
           StdCell  ITE   Ovh     StdCell  ITE      Ovh
alu2       770      500   0.65    1314.1   2560     1.95
alu4       1020     527   0.52    2500     5068.8   2.03
apex6      500      1310  2.57    2678.1   14585.6  5.45
apex7      440      1030  2.34    885.1    4608     5.21
C1908      880      2590  2.91    1827.6   8288     4.53
C3540      1250     3050  2.44    4323.1   29491.2  6.82
C432       930      3070  3.3     715.6    4640     6.48
C499       600      1070  1.78    1827.6   3974.4   2.17
C880       1210     2750  2.27    1463.1   8985.6   6.14
dalu       1110     2460  2.22    3164.1   39916.8  12.62
frg2       810      1700  2.1     2575.6   24441.6  9.49
i8         880      1560  1.77    4064.1   40320    9.92
i9         850      810   0.95    2383.2   14035.2  5.89
t481       720      600   0.83    2626.6   6080     2.31
term1      320      730   2.28    663.1    2355.2   3.55
too_large  510      1550  3.04    1105.6   10560    9.55
vda        650      600   0.92    1508.03  6080     4.03
x1         380      950   2.5     1105.6   9625.6   8.71
x3         510      1660  3.25    2756.25  16844.8  6.11
x4         440      650   1.48    1314.1   11264    8.57
Avg                       2.01                      6.08

Delay penalty is about 2X.
Area penalty is about 6X.
FPGAs typically have a 25X delay penalty and a
10X area penalty.

28
Results (Sequential designs)

Delay penalty is about 1.6X.
Area penalty is about 3.4X.
FPGAs typically have a 25X delay penalty and a
10X area penalty.

Ckt.       Evaluation Delay       Area
           StdCell  ITE   Ovh     StdCell  ITE      Ovh
s1488      630      650   1.03    3277.6   6240     1.9
s1494      650      600   0.92    3108.1   6400     2.06
s208       270      550   2.04    105.1    1459.2   13.88
s344       390      650   1.67    715.6    2649.6   3.7
s349       410      650   1.59    742.6    2649.6   3.57
s386       290      550   1.9     885.1    2060.8   2.33
s444       380      700   1.84    1105.6   2880     2.6
s510       390      400   1.03    1105.6   3161.6   2.86
s526       330      700   2.12    1314.1   2355.2   1.79
s526n      330      700   2.12    1314.1   2457.6   1.87
s820       560      650   1.16    1827.6   3968     2.17
s832       570      650   1.14    1827.6   3968     2.17
Avg                       1.55                      3.41

29
Speed-up of ATPG
Ckt        Regular ATPG (SIS)  ATPG for ITE  Improve
C1908      0.78                0.02          39.00
C3540      4.84                0.02          242.00
C432       0.1                 0.52          0.19
C499       0.32                0.01          32.00
C880       0.16                0.01          16.00
frg2       17.21               0.45          38.24
i8         16.26               0.16          101.63
i9         0.6                 0.03          20.00
apex7      0.05                0.04          1.25
x3         1.95                0.19          10.26
apex6      0.94                0.27          3.48
term1      0.56                0.02          28.00
alu2       0.3                 0.02          15.00
alu4       1.47                0.47          3.13
too_large  8.83                0.41          21.54
vda        3.42                4.37          0.78
x1         0.26                0.43          0.60
x4         0.32                0.28          1.14
Avg.                                         31.90

ATPG is about 30X faster for ITE cell based
circuits.
ITE based circuits are guaranteed irredundant and
100% testable in linear time.
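
The "Improve" column is the ratio of the regular ATPG time to the ITE ATPG time, and the quoted average is the arithmetic mean of those per-circuit ratios. A short Python check using the times tabulated on this slide (the variable names are chosen for this example):

```python
# circuit: (regular ATPG time in SIS, ATPG time for ITE), as tabulated
atpg_times = {
    "C1908": (0.78, 0.02), "C3540": (4.84, 0.02), "C432": (0.1, 0.52),
    "C499": (0.32, 0.01), "C880": (0.16, 0.01), "frg2": (17.21, 0.45),
    "i8": (16.26, 0.16), "i9": (0.6, 0.03), "apex7": (0.05, 0.04),
    "x3": (1.95, 0.19), "apex6": (0.94, 0.27), "term1": (0.56, 0.02),
    "alu2": (0.3, 0.02), "alu4": (1.47, 0.47), "too_large": (8.83, 0.41),
    "vda": (3.42, 4.37), "x1": (0.26, 0.43), "x4": (0.32, 0.28),
}

improve = {ckt: regular / ite for ckt, (regular, ite) in atpg_times.items()}
average = sum(improve.values()) / len(improve)
slower = sorted(ckt for ckt, ratio in improve.items() if ratio < 1.0)

print(f"average improvement: {average:.2f}")
print("circuits where ITE ATPG was slower:", slower)
```

Because the mean is taken over ratios, a few circuits with very large speed-ups (C3540, i8) dominate it, while three circuits (C432, vda, x1) show a ratio below 1.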

30
Conclusions

We have a method that can implement circuits
more quickly, with NRE amortized over a large
number of designs.
Strikes a reasonable compromise between ASICs and
FPGAs.
An ITE cell based design is easily testable:
100% testable in linear time
Guaranteed irredundant
Testability gains arise from the use of a
partitioned ROBDD based PTL design approach.
Same gains can be reaped in a regular PTL design
approach.
Can be modified to efficiently test for other
faults:
Delay faults, stuck-open faults, etc.