Title: A Highly Testable Pass Transistor Based Structured ASIC Design Methodology
- Kanupriya Gulati
- Nikhil Jayakumar
- Sunil P. Khatri
Motivation for Structured ASICs
Process (microns)       2.0   0.8   0.6   0.35  0.25  0.18  0.13  0.1
Single mask cost ($K)   1.5   1.5   2.5   4.5   7.5   12    40    60
Number of masks         12    12    12    16    20    26    30    34
Mask set cost ($K)      18    18    30    72    150   312   1000  2000
- A full set of lithography masks can cost between $1M and $3M.
- ASIC design starts have fallen by roughly 25% over the past 7 years.
Sources: Sematech Annual Report 2002; A. Sangiovanni-Vincentelli, "The Tides of EDA," keynote talk, DAC 2003.
Our Solution
- Use a regular array of pass-transistor-logic (PTL) based if-then-else (ITE) cells, with flip-flops along the edges of the die, as the underlying circuit structure.
- Stock such arrays pre-processed up to the metallization step.
- Alternatively, reuse previously generated masks for all other layers and create new masks only for the METAL and VIA layers.
- To create an ASIC for a given design, technology-map the design to the smallest available array.
- Only the METAL and VIA masks need to change.
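A minimal sketch of the "smallest available array" selection, assuming a hypothetical catalogue of stocked arrays described only by their number of rows, ITE cells per row, and the bank of 3 flip-flops per row that the sequential-design slide gives each row. The names `ArrayTemplate` and `smallest_fitting_array`, and all array sizes, are made up for illustration. A capacity check like this is only a necessary condition: the row-assignment constraints of the placement step may still force a larger array.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ArrayTemplate:
    """A stocked, pre-processed ITE array (hypothetical catalogue entry)."""
    name: str
    rows: int
    cells_per_row: int
    ffs_per_row: int = 3  # each row has a bank of 3 flip-flops

    @property
    def ite_capacity(self) -> int:
        return self.rows * self.cells_per_row

    @property
    def ff_capacity(self) -> int:
        return self.rows * self.ffs_per_row


def smallest_fitting_array(templates: Sequence[ArrayTemplate], n_ite_cells: int,
                           n_flip_flops: int = 0) -> Optional[ArrayTemplate]:
    """Pick the template with the fewest ITE cells that can still hold the design."""
    fitting = [t for t in templates
               if t.ite_capacity >= n_ite_cells and t.ff_capacity >= n_flip_flops]
    return min(fitting, key=lambda t: t.ite_capacity, default=None)


# Made-up array sizes, for illustration only.
CATALOGUE = [
    ArrayTemplate("small", rows=16, cells_per_row=16),
    ArrayTemplate("medium", rows=32, cells_per_row=32),
    ArrayTemplate("large", rows=64, cells_per_row=64),
]

print(smallest_fitting_array(CATALOGUE, n_ite_cells=400, n_flip_flops=20).name)  # medium
```

Only the METAL and VIA personalization differs between designs mapped onto the same template, which is where the NRE saving comes from.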
Advantages
- Masks for several layers can be shared.
- Reduces NRE.
- The designer need not worry about DFM issues.
- Improved yield.
- New designs can be implemented faster.
- Engineering changes are simplified: a design modification requires only METAL and VIA mask changes.
- Generating test patterns for such a design is easy.
- 100% test coverage in time linear in the size of the network.
- No redundant faults in the design.
The Gap between FPGA and ASIC
FPGA:
- Low speed
- High power
- Cost-effective for low-volume products
ASIC:
- High speed
- Low power
- Cost-effective for high-volume products
- Necessary for products requiring high performance or low power
What bridges the gap?
Taxonomy of Regular Logic Fabrics
- As we move further away from standard cells (ASIC), we lose in:
  - Area
  - Speed
  - Power
- As we move closer to FPGAs, we gain:
  - Flexibility
  - Lower NRE
[Figure: taxonomy of regular logic fabrics between standard cells and FPGAs, with our approach marked.]
Source: L. Pileggi et al., "Exploring Regular Fabrics to Optimize the Performance-Cost Trade-off."
Overview
- Convert a logic netlist to a partitioned Reduced Ordered Binary Decision Diagram (ROBDD).
- Implement each ROBDD node as an ITE cell.
- Place these ITE cells in an area- and delay-efficient manner on a pre-fabricated array of ITE cells.
ITE Cell Structure
- Uses an NMOS pass-gate based structure.
- Each ITE cell generates a buffered output and its complement.
- The delay of the NMOS pass-gate ITE cell was found to be similar to that of a CMOS pass-gate based ITE cell, with a smaller area.
- This is probably due to the increased diffusion capacitance in CMOS pass-gates.
[Figure: ITE cell schematic with select inputs i and its complement, data inputs T and E, and outputs out and its complement.]
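At the logic level, each ITE cell computes out = i ? T : E and also provides the complement of out. The sketch below models only that behaviour (transistor-level details such as the NMOS pass gates and the output buffer are abstracted away) and shows how ordinary two-input gates map onto ITE cells; the helper names are hypothetical. XOR illustrates why the complemented output is useful: a cell that passes b through also supplies b', so a ? b' : b needs no separate inverter.

```python
from itertools import product


def ite_cell(i: bool, t: bool, e: bool) -> tuple[bool, bool]:
    """Logic-level model of one ITE cell: out = T if i else E, plus its complement."""
    out = t if i else e
    return out, not out


def and2(a: bool, b: bool) -> bool:
    return ite_cell(a, b, False)[0]          # a ? b : 0


def or2(a: bool, b: bool) -> bool:
    return ite_cell(a, True, b)[0]           # a ? 1 : b


def xor2(a: bool, b: bool) -> bool:
    # A cell computing b ? 1 : 0 (= b) also provides b', so XOR = a ? b' : b.
    b_out, b_bar = ite_cell(b, True, False)
    return ite_cell(a, b_bar, b_out)[0]


for a, b in product([False, True], repeat=2):
    print(a, b, and2(a, b), or2(a, b), xor2(a, b))
```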
ITE Cell Design
- MUX control signals run along the length of the cell.
- Each ITE cell has 3 variable signals and 3 complemented variable signals running horizontally in metal 3.
- Appropriate placement of stacked vias on the horizontal metal 3 wires allows the ITE cell to be connected to any one of the 3 variables in the corresponding row of the array.
- Metal layers 1 and 2 are used for most of the layout; metal layer 3 is used to route the variables and their complements.
[Figure: ITE cell layout with VDD and GND rails.]
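Because the 3 variables and their complements are already present over every cell, choosing which variable drives a cell is purely a matter of where the stacked vias go. The sketch below computes those via positions for one row. The wire numbering (wires 0 to 2 carry the variables, wires 3 to 5 their complements) and the assumption that each cell taps both polarities of its select variable are illustrative, not taken from the actual layout; `via_sites` is a hypothetical name.

```python
from typing import Sequence

TRACKS_PER_ROW = 3  # 3 variables (plus complements) run over each row in metal 3


def via_sites(row_vars: Sequence[str],
              cell_select_vars: Sequence[str]) -> list[tuple[int, int]]:
    """For each ITE cell in a row, return the (true, complement) wire indices
    on which stacked vias would be dropped to connect its select inputs.

    Hypothetical numbering: wires 0..2 carry the row variables and wires
    3..5 carry their complements, in the same order.
    """
    if len(row_vars) > TRACKS_PER_ROW:
        raise ValueError(f"at most {TRACKS_PER_ROW} variables can run over a row")
    position = {var: k for k, var in enumerate(row_vars)}
    if len(position) != len(row_vars):
        raise ValueError("a variable appears twice on the row tracks")
    sites = []
    for var in cell_select_vars:
        if var not in position:
            raise ValueError(f"variable {var!r} is not routed over this row")
        k = position[var]
        sites.append((k, k + TRACKS_PER_ROW))
    return sites


print(via_sites(["x1", "x2", "x3"], ["x2", "x1", "x2"]))  # [(1, 4), (0, 3), (1, 4)]
```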
Synthesis of Partitioned ROBDDs
- Synthesis of the logic netlist into a partitioned ROBDD structure is done in VIS.
- Primary input variables are ordered using a DFS ordering.
- Dynamic variable ordering is enabled before building the ROBDDs.
- ROBDDs are constructed bottom-up:
  - Let the set of variables in the ROBDD manager be V (initially the PIs).
  - If the size of any ROBDD exceeds a user-specified threshold B, introduce a new variable v (an intermediate ROBDD variable) and continue building ROBDDs on the set of variables V ∪ {v}.
- This results in a series of ROBDDs:
  - The size of each ROBDD is bounded by B.
  - The output of each ROBDD represents either a primary output or an intermediate ROBDD variable.
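The bounded bottom-up construction can be illustrated with a small self-contained sketch. It is not VIS or its frontier method: the tiny ROBDD manager, the AND/OR/XOR gate set, the netlist encoding, and the cutting rule are assumptions. Here, when a gate's ROBDD exceeds B, each non-trivial fanin ROBDD is frozen as a partition and replaced by a fresh variable named after that signal (appended at the bottom of the order), and the gate is rebuilt. No dynamic reordering is done. With B ≥ 3, every partition and every output ROBDD stays within B nodes.

```python
class BDD:
    """Tiny ROBDD manager: ids 0/1 are terminals, other ids map to (level, low, high)."""

    def __init__(self):
        self.nodes = [None, None]  # slots for the two terminals
        self.unique = {}           # (level, low, high) -> node id
        self.order = []            # variable names; list position = level
        self.memo = {}             # computed table for ite()

    def var(self, name):
        if name not in self.order:
            self.order.append(name)  # new variables go below all existing ones
        return self.mk(self.order.index(name), 0, 1)

    def mk(self, level, low, high):
        if low == high:
            return low
        key = (level, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def level(self, u):
        return self.nodes[u][0] if u > 1 else len(self.order)

    def cofactor(self, u, level, bit):
        if u <= 1 or self.nodes[u][0] != level:
            return u
        return self.nodes[u][2] if bit else self.nodes[u][1]

    def ite(self, f, g, h):
        if f == 1 or g == h:
            return g
        if f == 0:
            return h
        if g == 1 and h == 0:
            return f
        key = (f, g, h)
        if key not in self.memo:
            top = min(self.level(f), self.level(g), self.level(h))
            low = self.ite(*(self.cofactor(u, top, 0) for u in (f, g, h)))
            high = self.ite(*(self.cofactor(u, top, 1) for u in (f, g, h)))
            self.memo[key] = self.mk(top, low, high)
        return self.memo[key]

    def size(self, u):
        seen, stack = set(), [u]
        while stack:
            n = stack.pop()
            if n > 1 and n not in seen:
                seen.add(n)
                stack.extend(self.nodes[n][1:])
        return len(seen)

    def evaluate(self, u, env):
        while u > 1:
            level, low, high = self.nodes[u]
            u = high if env[self.order[level]] else low
        return u == 1


OPS = {
    "AND": lambda m, a, b: m.ite(a, b, 0),
    "OR": lambda m, a, b: m.ite(a, 1, b),
    "XOR": lambda m, a, b: m.ite(a, m.ite(b, 0, 1), b),
}


def build_partitioned(netlist, inputs, bound):
    """netlist: [(out, op, a, b), ...] in topological order; returns (manager, bdds, partitions)."""
    assert bound >= 3, "a 2-input gate over two variables can need 3 nodes"
    m = BDD()
    bdd = {x: m.var(x) for x in inputs}
    partitions = {}  # intermediate variable -> ROBDD it stands for
    for out, op, a, b in netlist:
        f = OPS[op](m, bdd[a], bdd[b])
        if m.size(f) > bound:
            # Cut at the fanins: freeze each non-trivial fanin ROBDD as a partition
            # and continue with a fresh variable for it (V becomes V U {v}).
            for sig in (a, b):
                if m.size(bdd[sig]) > 1:
                    partitions[sig] = bdd[sig]
                    bdd[sig] = m.var(sig)
            f = OPS[op](m, bdd[a], bdd[b])
        bdd[out] = f
    return m, bdd, partitions


def evaluate_partitioned(m, partitions, f, pi_values):
    env = dict(pi_values)
    for var, g in partitions.items():  # creation order is a valid evaluation order
        env[var] = m.evaluate(g, env)
    return m.evaluate(f, env)
```

For a hypothetical network n1 = x1 AND x2, n2 = x3 XOR x4, n3 = n1 OR n2, n4 = n1 XOR x3, z = n3 AND n4 built with B = 3, intermediate signals become partition variables, playing the role of y1 and y2 in the example that follows; with B = 1000 no partitioning is needed.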
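Example
[Figure: a multi-level logic network with primary inputs x1, x2, x3, x4, and the corresponding partitioned ROBDDs for intermediate variables y1, y2 and output z.]
- Given a multi-level logic network with primary inputs x1, x2, x3, x4:
- As bottom-up ROBDD construction proceeds, new variables y1 and y2 are created.
- z is built in terms of y1 and y2.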
Placement
- First, replicate ITE cells whose outputs are heavily loaded, in order to limit fanout.
  - These correspond to ROBDD nodes with high in-degree.
  - If an ROBDD node has in-degree k greater than K, replicate it so that each copy drives at most K fanouts; we use K = 3.
- Compute an initial estimate of the number of ITE cells n in each row of the ITE array, and of the number of rows m of the ITE array, from:
  - x = width of each ITE cell
  - y = height of each ITE cell
  - N = total number of ITE cells
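A sketch of both steps under explicit assumptions. The replication rule (a node with in-degree k becomes ⌈k/K⌉ copies, each driving at most K fanouts) is an assumed reading of the fanout limit. The row and column estimate assumes a roughly square die: requiring n·x ≈ m·y and n·m ≥ N gives n = ⌈sqrt(N·y/x)⌉ and m = ⌈N/n⌉. Both formulas are illustrative, not necessarily the ones used in the flow; the function names are hypothetical.

```python
import math

K = 3  # fanout limit used in the slides


def copies_needed(in_degree: int, k_max: int = K) -> int:
    """Assumed rule: a node with in-degree k is split into ceil(k / K) copies."""
    return max(1, math.ceil(in_degree / k_max))


def split_fanouts(fanouts: list, k_max: int = K) -> list[list]:
    """Distribute a node's fanout list over its copies, at most k_max per copy."""
    return [fanouts[i:i + k_max] for i in range(0, len(fanouts), k_max)] or [[]]


def array_dimensions(n_cells: int, cell_w: float, cell_h: float) -> tuple[int, int]:
    """Assumed square-die estimate: choose n, m with n*x ~= m*y and n*m >= N,
    i.e. n = ceil(sqrt(N*y/x)) cells per row and m = ceil(N/n) rows."""
    n = max(1, math.ceil(math.sqrt(n_cells * cell_h / cell_w)))
    m = math.ceil(n_cells / n)
    return n, m


print(copies_needed(7), split_fanouts(list("abcdefg")))
print(array_dimensions(100, cell_w=1.0, cell_h=4.0))  # (20, 5)
```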
Placement
- Sort the N ITE cells in increasing order of their ROBDD variable index.
  - The variable index measures how close a variable is to the root of the ROBDD: a variable closer to the root has a smaller index than one further from the root.
- Assign ITE cells to rows of the ITE array.
Assigning ITE cells to rows
- If there are n_j ITE cells with variable index v_j such that n_j > n (where n is the number of ITE cells that fit in one row), the ITE cells need to span rows.
  - Sort these n_j cells in decreasing order of a cost C, computed from c_i, the children of node c, and c_j, the parents of node c.
  - This helps keep routes short.
[Figure: example ROBDD spanning Levels 2 to 6, with nodes a and b; Cost(a) = 2 + 3 = 5 and Cost(b) = 3 + 0 = 3.]
Assigning ITE cells to rows
- If there are n_j ITE cells with variable index v_j such that n_j < n:
  - Attempt to populate the corresponding row of the ITE array with additional ITE cells with variable index v_{j+1}.
  - If the row is still not full, add ITE cells with variable index v_{j+2} as well.
  - Each row can hold ITE cells that depend on at most 3 variables, since only 3 variables can be routed over any ITE cell.
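The two row-assignment rules can be sketched as one greedy pass over the cells, grouped by variable index and taken closest-to-root first. Assumptions: a group larger than a row starts on a fresh row and is ordered by a caller-supplied `cost` function (a hypothetical stand-in for the cost C); smaller groups are packed into the current row until it is full or would need a fourth variable. `assign_rows` is a hypothetical name, and the flow's actual tie-breaking may differ.

```python
from collections import defaultdict
from typing import Callable, Hashable

MAX_VARS_PER_ROW = 3  # only 3 variables can be routed over any ITE cell


def assign_rows(cells: list[tuple[Hashable, int]], n: int,
                cost: Callable[[Hashable], float]) -> list[list[Hashable]]:
    """Greedy row assignment.

    cells: (cell_id, variable_index) pairs; n: ITE cells per row;
    cost: stand-in for the cost C used to order cells of a group that spans rows.
    """
    groups = defaultdict(list)
    for cell, var_index in cells:
        groups[var_index].append(cell)

    rows: list[list[Hashable]] = []
    row: list[Hashable] = []
    row_vars: set[int] = set()

    def close_row():
        nonlocal row, row_vars
        if row:
            rows.append(row)
        row, row_vars = [], set()

    for var_index in sorted(groups):          # closest to the root first
        members = groups[var_index]
        if len(members) > n:                  # n_j > n: the group spans rows
            close_row()
            members = sorted(members, key=cost, reverse=True)
        for cell in members:                  # otherwise keep filling with v_{j+1}, v_{j+2}
            new_var = var_index not in row_vars
            if len(row) == n or (new_var and len(row_vars) == MAX_VARS_PER_ROW):
                close_row()
            row.append(cell)
            row_vars.add(var_index)
    close_row()
    return rows
```

For example, with n = 4, one cell at index 0, five at index 1 and one each at indices 2, 3 and 4, the index-1 group spans two rows and the leftover row is topped up with the index-2 and index-3 cells before a fourth variable forces a new row.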
Placement of ITE cells within rows
- ITE cells are arranged within rows to reduce crossings in the induced circuit graph (after planarization of the array of ITE cells).
- DOT (graphviz.org) is used to do this.
  - DOT only re-arranges the cells within each ITE row so as to minimize graph crossings.
  - DOT is not allowed to modify the assignment of ITE cells to rows.
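One way to express "DOT may reorder cells within a row but not move them between rows" is a DOT file in which each row is an anonymous subgraph with `rank=same`. The sketch below generates such a file; the invisible edges that stack the rows in array order, the function name `rows_to_dot`, and the assumption that real edges point from earlier rows to later ones (ROBDD parent to child) are illustrative choices. The left-to-right order within each row can then be read from the node coordinates DOT produces, for example with `dot -Tplain`.

```python
def rows_to_dot(rows: list[list[str]], edges: list[tuple[str, str]],
                name: str = "ite_array") -> str:
    """Build a Graphviz DOT graph in which each ITE row is a rank=same group,
    so DOT can only reorder cells within a row, not move them between rows."""
    lines = [f"digraph {name} {{"]
    for r, row in enumerate(rows):
        members = " ".join(f'"{cell}";' for cell in row)
        lines.append(f"  {{ rank=same; {members} }}  // row {r}")
    # Invisible edges between the first cells of consecutive rows keep the
    # rows stacked in array order.
    for upper, lower in zip(rows, rows[1:]):
        lines.append(f'  "{upper[0]}" -> "{lower[0]}" [style=invis];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


print(rows_to_dot([["a", "b"], ["c", "d"]], [("a", "d"), ("b", "c")]))
```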
Implementing Sequential Designs
- Each row of ITE cells has a bank of 3 flip-flops.
- The output of each flip-flop can be connected to drive one of the inputs by means of a METAL and VIA mask change.
Route
- WROUTE (in Cadence's Silicon Ensemble for DSM) is used to route the ITE cell array.
- 4 metal layers are used for the route.
[Figure: routed ITE cell array for example circuit alu2.]
Summary of Design Flow
- Convert the netlist to a partitioned ROBDD in VIS.
- Perform cell replication, if required, to limit fanout.
- Assign ITE cells to rows.
- Re-arrange ITE cells within rows using DOT to minimize crossings in the graph induced by the interconnections among the ITE cells.
- Use the result of DOT as the final placement and perform routing using WROUTE (or any other routing tool).
Ease of Testability
- In traditional scanned standard-cell based circuits:
  - The ATPG problem is NP-complete.
- In our scanned ITE cell based approach:
  - In functional mode, partitioned ROBDD outputs are regular inputs to other partitions.
  - In test mode, primary inputs and the outputs of each partition are scanned in, so that the different partitions can be tested independently.
Abstract View of Partitioned ROBDDs
[Figure: partitioned ROBDDs stacked between the PIs (x1, x2, x3, x4, ...) and the PO z; intermediate variables such as y1 and y2 act as additional scannable nodes.]
Ease of Testability - Excitation
ROBDD of
- Path from to
- Linear-time BDD operation
Ease of Testability - Propagation
ROBDD of
- Path from to
- Again a linear-time BDD operation
- The support variables for the two conditions are non-overlapping!
- The circuit is guaranteed irredundant.
- 100% stuck-at fault coverage is guaranteed in time linear in the size of the circuit.
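A sketch of one plausible reading of the excitation and propagation steps, for a stuck-at fault on the output of the ITE cell implementing ROBDD node u. Propagation is any path from the root to u: it fixes only variables above u in the order, and along it the output equals u's function. Excitation is any path from u to the terminal opposite the stuck value: it fixes only variables below u. Both are single depth-first searches, hence linear in the ROBDD size, and their supports cannot overlap. The dictionary encoding of the ROBDD, the fault model (the cell's output is forced to a constant), and all function names are assumptions for illustration.

```python
from typing import Optional

# Hypothetical ROBDD encoding: node id -> (variable, low child, high child);
# "0" and "1" are the terminals. Each non-terminal node is one ITE cell.
Bdd = dict[str, tuple[str, str, str]]
TERMINALS = ("0", "1")


def find_path(bdd: Bdd, start: str, goal: str) -> Optional[dict[str, int]]:
    """Depth-first search for any path start -> goal; returns the variable
    assignment along it. Each node is expanded at most once (linear time)."""
    seen: set[str] = set()

    def dfs(u: str) -> Optional[dict[str, int]]:
        if u == goal:
            return {}
        if u in TERMINALS or u in seen:
            return None
        seen.add(u)
        var, low, high = bdd[u]
        for value, child in ((0, low), (1, high)):
            sub = dfs(child)
            if sub is not None:
                sub[var] = value
                return sub
        return None

    return dfs(start)


def stuck_at_test(bdd: Bdd, root: str, node: str, stuck: int) -> dict[str, int]:
    """Test vector for the output of ITE cell `node` stuck at `stuck`."""
    propagate = find_path(bdd, root, node)          # variables above `node`
    excite = find_path(bdd, node, str(1 - stuck))   # variables below `node`
    assert propagate is not None and excite is not None
    assert not propagate.keys() & excite.keys()     # disjoint supports
    return {**propagate, **excite}


def evaluate(bdd: Bdd, root: str, inputs: dict[str, int], fault=None) -> int:
    """Follow the selected path; `fault=(node, value)` forces that cell's output."""
    u = root
    while u not in TERMINALS:
        if fault is not None and u == fault[0]:
            return fault[1]
        var, low, high = bdd[u]
        u = high if inputs.get(var, 0) else low     # unassigned inputs default to 0
    return int(u)


# f = x1 x2 + x3 with variable order x1 < x2 < x3
EXAMPLE: Bdd = {"n1": ("x1", "n3", "n2"), "n2": ("x2", "n3", "1"), "n3": ("x3", "0", "1")}
print(stuck_at_test(EXAMPLE, "n1", "n2", 0))
```

Because every node of a reduced BDD represents a non-constant function, both terminals are reachable from it, so a test exists for each cell output and each stuck value; this is the sense in which no fault is redundant.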
Experiments
- To compare with standard-cell based design, the circuits were mapped to a library of 20 gates.
  - SIS was used for optimization (script.rugged) and mapping.
  - Placement and routing were done using SEDSM, with a 0.1 µm process and 4 metal layers.
- Delay of the standard-cell based designs:
  - The library was pre-characterized using SPICE (0.1 µm BPTM).
  - The sense package in SIS was used; sense returns the longest sensitizable path (false paths are implicitly ignored).
Experiments
- Partitioned ROBDD construction was done using the frontier method in VIS.
- The following partitioning thresholds (B) were tried: 5, 10, 15, 20 and 1000.
  - For each circuit, the result that yielded the smallest number of ROBDD nodes was selected.
  - This partitioned ROBDD structure was then taken through our design flow.
Experiments
- Delay of the ITE cell array:
  - Found by traversing the longest topological path (in terms of number of ITE cells) between any circuit PI and PO.
  - The delay at each ITE cell is given by:
    - If the cell's variable is a primary input: $D(\text{cell}) = \max\{D(\text{left child}), D(\text{right child})\} + D(\text{ITE block})$
    - If the cell's variable is an internal node: $D(\text{cell}) = \max\{D(\text{variable}), D(\text{left child}), D(\text{right child})\} + D(\text{ITE block})$
  - D(ITE block) was found from SPICE simulations (0.1 µm BPTM).
  - Each ITE cell was assumed to drive the maximum allowed load, so the delay estimates are conservative.
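The two delay rules amount to a memoized longest-path computation over the ITE cells. In the sketch below, constants and primary inputs arrive at time 0, an internal variable arrives when the root cell of the partition computing it settles, and every cell adds the same D(ITE block). The node encoding and the names `array_delay` and `partition_root` are hypothetical.

```python
TERMINALS = ("0", "1")


def array_delay(nodes, partition_root, outputs, d_ite):
    """Longest-path delay of an ITE array (conservative: every cell costs d_ite).

    nodes:          node id -> (variable, left child, right child); "0"/"1" are constants.
    partition_root: intermediate variable -> root node of the ROBDD that computes it.
    outputs:        root nodes of the primary-output ROBDDs.
    d_ite:          delay of one ITE block, e.g. from SPICE.
    """
    memo = {}

    def delay(u):
        if u in TERMINALS:
            return 0.0
        if u not in memo:
            var, left, right = nodes[u]
            arrival = max(delay(left), delay(right))
            if var in partition_root:             # internal variable: wait for it too
                arrival = max(arrival, delay(partition_root[var]))
            memo[u] = arrival + d_ite             # primary inputs arrive at time 0
        return memo[u]

    return max(delay(r) for r in outputs)


# y1 = x1 AND x2 in one partition; z = y1 OR x3 in another.
NODES = {
    "a": ("x2", "0", "1"), "b": ("x1", "0", "a"),   # y1
    "c": ("x3", "0", "1"), "d": ("y1", "c", "1"),   # z
}
print(array_delay(NODES, {"y1": "b"}, ["d"], d_ite=10.0))  # 30.0
```

Here z waits for the two-cell partition computing y1 and then adds one more ITE block, giving three block delays.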
Results (Combinational designs)
           Evaluation delay        Area
Ckt.       StdCell  ITE   Ovh      StdCell  ITE      Ovh
alu2       770      500   0.65     1314.1   2560     1.95
alu4       1020     527   0.52     2500     5068.8   2.03
apex6      500      1310  2.57     2678.1   14585.6  5.45
apex7      440      1030  2.34     885.1    4608     5.21
C1908      880      2590  2.91     1827.6   8288     4.53
C3540      1250     3050  2.44     4323.1   29491.2  6.82
C432       930      3070  3.3      715.6    4640     6.48
C499       600      1070  1.78     1827.6   3974.4   2.17
C880       1210     2750  2.27     1463.1   8985.6   6.14
dalu       1110     2460  2.22     3164.1   39916.8  12.62
frg2       810      1700  2.1      2575.6   24441.6  9.49
i8         880      1560  1.77     4064.1   40320    9.92
i9         850      810   0.95     2383.2   14035.2  5.89
t481       720      600   0.83     2626.6   6080     2.31
term1      320      730   2.28     663.1    2355.2   3.55
too_large  510      1550  3.04     1105.6   10560    9.55
vda        650      600   0.92     1508.03  6080     4.03
x1         380      950   2.5      1105.6   9625.6   8.71
x3         510      1660  3.25     2756.25  16844.8  6.11
x4         440      650   1.48     1314.1   11264    8.57
Avg                       2.01                       6.08
(Ovh is the ITE/StdCell ratio.)
- The average delay penalty is about 2X.
- The average area penalty is about 6X.
- FPGAs typically have a 25X delay penalty and a 10X area penalty.
Results (Sequential designs)
- The average delay penalty is about 1.6X.
- The average area penalty is about 3.4X.
- FPGAs typically have a 25X delay penalty and a 10X area penalty.
           Evaluation delay        Area
Ckt.       StdCell  ITE   Ovh      StdCell  ITE      Ovh
s1488      630      650   1.03     3277.6   6240     1.9
s1494      650      600   0.92     3108.1   6400     2.06
s208       270      550   2.04     105.1    1459.2   13.88
s344       390      650   1.67     715.6    2649.6   3.7
s349       410      650   1.59     742.6    2649.6   3.57
s386       290      550   1.9      885.1    2060.8   2.33
s444       380      700   1.84     1105.6   2880     2.6
s510       390      400   1.03     1105.6   3161.6   2.86
s526       330      700   2.12     1314.1   2355.2   1.79
s526n      330      700   2.12     1314.1   2457.6   1.87
s820       560      650   1.16     1827.6   3968     2.17
s832       570      650   1.14     1827.6   3968     2.17
Avg                       1.55                       3.41
(Ovh is the ITE/StdCell ratio.)
Speed-up of ATPG
Ckt.       Regular ATPG (SIS)  ATPG for ITE  Improvement (Regular/ITE)
C1908      0.78                0.02          39.00
C3540      4.84                0.02          242.00
C432       0.1                 0.52          0.19
C499       0.32                0.01          32.00
C880       0.16                0.01          16.00
frg2       17.21               0.45          38.24
i8         16.26               0.16          101.63
i9         0.6                 0.03          20.00
apex7      0.05                0.04          1.25
x3         1.95                0.19          10.26
apex6      0.94                0.27          3.48
term1      0.56                0.02          28.00
alu2       0.3                 0.02          15.00
alu4       1.47                0.47          3.13
too_large  8.83                0.41          21.54
vda        3.42                4.37          0.78
x1         0.26                0.43          0.60
x4         0.32                0.28          1.14
Avg.                                         31.90
- On average, ATPG is about 30X faster for ITE cell based circuits.
- ITE cell based circuits are guaranteed irredundant and 100% testable in linear time!
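The Avg. entry of the ATPG table is the arithmetic mean of the per-circuit ratios Regular/ITE, so a single large ratio such as C3540's 242X pulls it up considerably. The short script below recomputes the Improvement column and its average from the table's own numbers as a consistency check.

```python
# (circuit, regular ATPG in SIS, ATPG for ITE, reported improvement), from the ATPG table.
ATPG = [
    ("C1908", 0.78, 0.02, 39.00), ("C3540", 4.84, 0.02, 242.00),
    ("C432", 0.1, 0.52, 0.19), ("C499", 0.32, 0.01, 32.00),
    ("C880", 0.16, 0.01, 16.00), ("frg2", 17.21, 0.45, 38.24),
    ("i8", 16.26, 0.16, 101.63), ("i9", 0.6, 0.03, 20.00),
    ("apex7", 0.05, 0.04, 1.25), ("x3", 1.95, 0.19, 10.26),
    ("apex6", 0.94, 0.27, 3.48), ("term1", 0.56, 0.02, 28.00),
    ("alu2", 0.3, 0.02, 15.00), ("alu4", 1.47, 0.47, 3.13),
    ("too_large", 8.83, 0.41, 21.54), ("vda", 3.42, 4.37, 0.78),
    ("x1", 0.26, 0.43, 0.60), ("x4", 0.32, 0.28, 1.14),
]

speedups = {name: regular / ite for name, regular, ite, _ in ATPG}
mean_speedup = sum(speedups.values()) / len(speedups)  # arithmetic mean of the ratios

for name, _, _, reported in ATPG:
    assert abs(speedups[name] - reported) < 0.006, name  # matches the table to rounding
print(f"{mean_speedup:.2f}")  # 31.90
```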
Conclusions
- We have a method that can implement circuits quicker, with NRE amortized over a large number of designs.
- It strikes a reasonable compromise between ASICs and FPGAs.
- An ITE cell based design is easily testable:
  - 100% testable in linear time
  - Guaranteed irredundant
- The testability gains arise from the use of a partitioned ROBDD based PTL design approach:
  - The same gains can be reaped in a regular PTL design approach.
  - The approach can be modified to efficiently test for other faults, such as delay faults and stuck-open faults.
Questions?