Title: CSE241 VLSI Digital Circuits Winter 2003 Lecture 03: ASIC Flow and Design Convergence
1CSE241 VLSI Digital Circuits, Winter 2003
Lecture 03: ASIC Flow and Design Convergence
2This Class Logistics
- Overview of flow (preparation for Smith Chapters 12-17)
- Read Smith Chapter 12 (Synthesis), 13.7 (Static timing)
- Lab 1 revised due date: Monday, January 20
- Near-term schedule
- Ben has reserved the lab (EBU I, Room 3329) for this Friday, January 17, noon-1:20pm, for a running start into synthesis
- Recitation 2 tomorrow (noon-12:50pm): not on RTL design, but on datapaths and memories
- Lab tomorrow (3:30-5pm): really Lab 1
Slide courtesy of S. P. Levitan, U. Pittsburgh
3Review
- Scaling of gates vs. Scaling of wires
- What happens when you make a gate bigger?
- What happens when you make a wire taller? Wider?
- Coupling
- Inductance
- How does power/ground distribution affect inductance?
- RC delay
- Dynamic (useful) power vs. Static (useless) power
- How do these issues impact estimates and design approaches?
Slide courtesy of S. P. Levitan, U. Pittsburgh
4Outline
- Design types and cost / complexity drivers
- Basic flow
- On convergence and hierarchy
5IC Design Methodologies
- Full-Custom (high effort, leading-edge performance, high-volume)
- Semi-Custom (strong infrastructure, economical in lower volumes)
- ASIC (Application-Specific Integrated Circuit)
- COT (Customer-Owned Tooling)
- ASIC vs. COT: Who pays for the scrap?
- FPGA
- System-on-a-Chip
- Larger components, often from outside of design team
- Special
- Analog (custom layout, I/Os and sense amps)
- Mixed-Signal / RF (unique to each process, no scaling)
Slide courtesy of S. P. Levitan, U. Pittsburgh
6Acceleration of Gate Length Scaling
- What are some implications?
- Slide courtesy of Numerical Technologies, Inc.
7Mask NRE Cost (1999)
8Design Technology Crises, ITRS-2001
[Figure labels: Incremental Cost Per Transistor; Test; Manufacturing; Turnaround Time; SW Design; NRE Cost; Verification; HW Design]
- 2-3X more verification engineers than designers on microprocessor teams
- Software: 80% of system development cost (and analog design hasn't scaled)
- Design NRE > 10s of $M vs. manufacturing NRE $1M
- Design TAT: months or years vs. manufacturing TAT: weeks
- Without DFT, test cost per transistor grows exponentially relative to mfg cost
9Silicon Complexity Challenges
- Silicon Complexity: impact of process scaling, new materials, new device/interconnect architectures
- Non-ideal scaling (leakage, power management, circuit/device innovation, current delivery)
- Coupled high-frequency devices and interconnects (signal integrity analysis and management)
- Manufacturing variability (library characterization, analog and digital circuit performance, error-tolerant design, layout reusability, static performance verification methodology/tools)
- Scaling of global interconnect performance (communication, synchronization)
- Decreased reliability (SEU, gate insulator tunneling and breakdown, joule heating and electromigration)
- Complexity of manufacturing handoff (reticle enhancement and mask writing/inspection flow, manufacturing NRE cost)
10System Complexity Challenges
- System Complexity: exponentially increasing transistor counts, with increased diversity (mixed-signal SOC, ...)
- Reuse (hierarchical design support, heterogeneous SOC integration, reuse of verification/test/IP)
- Verification and test (specification capture, design for verifiability, verification reuse, system-level and software verification, AMS self-test, noise-delay fault tests, test reuse)
- Cost-driven design optimization (manufacturing cost modeling and analysis, quality metrics, die-package co-optimization, ...)
- Embedded software design (platform-based system design methodologies, software verification/analysis, codesign w/HW)
- Reliable implementation platforms (predictable chip implementation onto multiple fabrics, higher-level handoff)
- Design process management (team size / geog distribution, data mgmt, collaborative design, process improvement)
11Outline
- Design types and cost / complexity drivers
- Basic flow
- On convergence and hierarchy
12Sylvester-Keutzer Classic Picture
Sylvester-Keutzer, Computer Nov. 99
13Traditional Flow
Front End
Back End
14Block-Level Design Methodology
- Architectural optimization (timing)
- Inter-group buses, bandwidth
- Clock, SI, test validation
Design Specs
Fnl. Design
Constraints
Synthesis
Lib.CWLM
- Floorplanning and custom WLM
- Power distribution (Internal, I/O)
- I/O driver, padring design
- Board-level timing, SI
Floor-plan PG
Lib.CWLM
Placement
Physical re-synth
- Row definitions
- Placement of cells
- Congestion analysis
Clock distribution
Route, scan re-order
- Placement-based re-synthesis
- Noise minimization, isolation
- Clock distribution
Timing analysis, IPO
Fnl., pwr., SI ECO
- Full routing
- Scan stitching, re-ordering
A. Khan, Simplex/Altius
Reqmts.
ERC, DRC, LVS
- Full RC back-annotation
- Hierarchical timing, electrical and SI analysis
and IPO/ECO
Tape-out
15Generic Flow Steps
- Preparation
- Library data preparation
- Design data preparation
- Logic design
- Specification to RTL
- RTL simulation
- Hierarchical floorplanning
- Synthesis
- Formal verification
- Gate level simulation
- Static timing analysis
- Physical design
- Physical floorplanning
- Place and route
- RC extraction
- Formal verification
- Physical verification
- Release to manufacturing
- Design for test
- Engineering change order
16Library and Design Data
- Models and technology data required to execute the design flow
- Power, timing: ALF, DCL, OLA, .lib, STAMP
- Layout: LEF, DEF, GDSII
- Delays and path timing, parasitics: SDF, GCF, SDC, DSPF, RSPF, SPEF, SPICE
- Layout rules: Dracula, Calibre deck
17Specification to RTL
- Defines the logic and fundamental structure of the chip at the RTL level in either the Verilog or VHDL language
- Requires considerable interaction with the customer, plus specs such as the architecture, system, design, test and block specs
- May include RTL from the customer or third party IP providers
- Coding guidelines should be established and adhered to, and the code must be compatible with the chosen synthesis tool
- Special design considerations such as multiple clock frequencies, asynchronous logic, high speed logic, race conditions, gated clocks, etc. must be addressed
18RTL Simulation
- RTL code, written in Verilog, VHDL or a combination of both, is simulated to verify functional correctness
- Testbenches apply input stimulus to the design
- Several methods are used to verify the outputs
- Self-checking testbenches automatically verify output correctness and report mismatches
- Results can be stored in a file and compared to previous results
- Waveform displays can be used to interactively verify the outputs
- Verification-specific tools: Verisity Specman, Synopsys Vera
- Functional verification
- Mostly Modelsim
- Cadence's Verilog-XL or NC-Verilog also used
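The self-checking testbench idea above can be sketched in a few lines of Python. This is only an illustration, not the course's Verilog flow: `reference_adder`, `dut_adder` and `run_testbench` are hypothetical names, and the "design under test" is a bitwise ripple-carry model standing in for real RTL.

```python
import random

def reference_adder(a, b):
    """Golden model: 8-bit addition returning (sum, carry_out)."""
    total = a + b
    return total & 0xFF, total >> 8

def dut_adder(a, b):
    """Stand-in for the design under test: a bitwise ripple-carry adder."""
    carry, result = 0, 0
    for i in range(8):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry

def run_testbench(dut, reference, num_vectors=1000, seed=0):
    """Apply random stimulus to both models and collect every mismatch."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(num_vectors):
        a, b = rng.randrange(256), rng.randrange(256)
        expected, actual = reference(a, b), dut(a, b)
        if expected != actual:
            mismatches.append((a, b, expected, actual))
    return mismatches

if __name__ == "__main__":
    errors = run_testbench(dut_adder, reference_adder)
    print("PASS" if not errors else f"FAIL: {len(errors)} mismatches")
```

As with gate-level simulation, the check is only as strong as the stimulus: random vectors can miss corner cases that directed tests would catch.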
19Hierarchical Floorplanning
- Decide on the physical layout strategy: flat or hierarchical?
- Advantages of a flat implementation are generally a smaller die size, and a more straightforward approach to clock and power distribution and RC generation
- Advantages of a hierarchical design
- better runtimes,
- better ability to control timing within localized areas of the design, and concurrent design
- For hierarchical design, issues
- physical partitioning of the logic into blocks
- assignment of the physical locations for the block pins
- timing budgeting,
- distribution of clocks, power
- signal bus routing
- RC generation
- Tool example: Cadence's Design Planner
20Floorplanning
- Give placement initial clues
- Cells that are interconnected want to be close together
- Take advantage of RTL hierarchy
- Generate a physical hierarchy
- Is the RTL hierarchy the best physical hierarchy?
- Place big blocks on chip (memories)
- Allow space for power/clk/busses
- Reduce complexity of placement
21Synthesis
- Conversion of RTL to gate level netlist
- Target foundry specific library
- Timing driven methodology
- clock information
- input arrival times, output required times
- Input driving cells, output loading
- False paths, multi-cycle paths
- Interconnect delay is calculated based on a wireload model which uses fanout to calculate delay
- Clock parameters (insertion delay, skew, jitter, etc.) are assumed to be attainable later in place and route
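To make the fanout-based wireload idea concrete, here is a minimal Python sketch. All numbers (the table, the extrapolation slope, the drive resistance and pin capacitance) are made-up illustrative values, not data from any real library, and the lumped `R * C` delay is a simplifying assumption.

```python
# Hypothetical wireload table: fanout -> estimated net capacitance in pF.
WIRELOAD_CAP_PF = {1: 0.010, 2: 0.018, 3: 0.027, 4: 0.037}
# Assumed extrapolation slope (pF per extra fanout) beyond the table.
SLOPE_PF_PER_FANOUT = 0.012

def wire_cap_pf(fanout):
    """Look up the estimated wire capacitance for a net with this fanout."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if fanout in WIRELOAD_CAP_PF:
        return WIRELOAD_CAP_PF[fanout]
    max_fanout = max(WIRELOAD_CAP_PF)
    return WIRELOAD_CAP_PF[max_fanout] + SLOPE_PF_PER_FANOUT * (fanout - max_fanout)

def net_delay_ns(fanout, drive_res_kohm, pin_cap_pf):
    """Lumped estimate: driver resistance times (wire cap + pin caps).

    kOhm * pF gives ns directly.
    """
    total_cap_pf = wire_cap_pf(fanout) + fanout * pin_cap_pf
    return drive_res_kohm * total_cap_pf

if __name__ == "__main__":
    for fo in (1, 2, 4, 8):
        print(fo, round(net_delay_ns(fo, drive_res_kohm=2.0, pin_cap_pf=0.005), 4))
```

Note that the estimate depends only on fanout, not on where the cells end up, which is why extracted RC data later replaces it.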
22Synthesis contd.
- Hierarchical synthesis
- Block-by-block basis
- Minimizes runtimes
- Functional blocks
- Tools
- Cadence Buildgates
- Synopsys Design Compiler (used for this course)
23Formal Verification
- RTL description and gate level netlist are compared to verify functional equivalence, thereby verifying the synthesis results
- An emerging technology that supplements the more traditional approach of gate level simulation
- Tools
- Verplex Tuxedo-lec
- Design Verifier (Chrysalis), Mentor FormalPro
- Synopsys Formality (will be used in-class)
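The notion of functional equivalence between an RTL description and a gate-level netlist can be illustrated on a toy 2:1 mux. This sketch simply enumerates all input combinations, which only works for tiny functions; it is not how the tools listed above operate internally. The names `rtl_mux`, `gate_mux` and `find_counterexample` are hypothetical.

```python
from itertools import product

def rtl_mux(a, b, sel):
    """Behavioral ("RTL") description of a 2:1 mux."""
    return b if sel else a

def gate_mux(a, b, sel):
    """Gate-level version built from NOT, AND and OR."""
    nsel = 1 - sel
    return (a & nsel) | (b & sel)

def find_counterexample(spec, impl, num_inputs):
    """Return the first input vector where the two functions differ, else None."""
    for bits in product((0, 1), repeat=num_inputs):
        if spec(*bits) != impl(*bits):
            return bits
    return None

if __name__ == "__main__":
    result = find_counterexample(rtl_mux, gate_mux, 3)
    print("equivalent" if result is None else f"mismatch at {result}")
```

Unlike simulation with selected test vectors, the comparison covers every input, which is the property that makes equivalence checking a verification of the synthesis result itself.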
24Gate Level Simulation
- Another method to verify the synthesis process, which covers both the functionality and timing
- Correctness is only as good as the test vectors that are used
- Especially critical for non-synchronous designs, verification of false path and multi-cycle path constraints
- Cell timing is included in the simulation models and interconnect delay is passed from the synthesis run
- Worst case PVT conditions are used to analyze for setup violations, and best case PVT conditions are used to analyze for hold violations
- PVT: Process, Voltage, Temperature
- Popular tools are Cadence's Verilog-XL or NC-Verilog
25Static Timing Analysis
- Verifies that design operates at desired frequency
- Implicitly assumes correct timing constraints (!), e.g., boundary conditions
- Timing constraints are similar to those used in synthesis
- Verifies setup and hold times at FF inputs; can also check timing from and to PIs and POs; can also check point-to-point delay values (with blocking of pins, etc.)
- As with gate-level simulation, both best- and worst-case analysis is performed
- Typically performed on full-chip (not block) basis
- May require modified constraints for inter-block issues: multiple clock domains, multi-cycle paths, etc.
- For compatibility with timing-driven layout flow, helps to have simple / single set of constraints
- Other issues: incremental analysis, ...
26Physical Floorplanning
- Defines the basic chip layout architecture
- Define the standard cell rows and I/O placement locations
- Place RAMs and other macro cells
- Define power bus structures such as power rings and stripes
- Often performed using the standard place and route tool
- Rules of thumb for cell density are used to initially calculate design size
- Popular standalone tools are Cadence's Design Planner and Avanti's Planet
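A cell-density rule of thumb for initial design size might look like the sketch below. The 70% utilization default, the square-core assumption, and the function name `estimate_core_side_um` are illustrative assumptions, not figures from the lecture.

```python
import math

def estimate_core_side_um(cell_area_um2, macro_area_um2=0.0, utilization=0.7):
    """Estimate the side of a square core, in microns.

    Standard cell area is inflated by 1/utilization to leave room for
    routing; macro area (RAMs, etc.) is added as-is.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    core_area = cell_area_um2 / utilization + macro_area_um2
    return math.sqrt(core_area)

if __name__ == "__main__":
    side = estimate_core_side_um(cell_area_um2=700_000, macro_area_um2=440_000)
    print(f"core is roughly {side:.0f} um on a side")
```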
27Place and Route
- Automatically place the standard cells
- Generate clock trees
- Add any remaining power bus connections
- Route clock lines
- Route signal interconnects
- Design rule checks on the routes and cell placements
- Timing driven tools
- Require timing constraints and analysis algorithms similar to those used during the static timing analysis step
- Tools
- Cadence Silicon Ensemble, Synopsys Apollo, Magma Blast Fusion
28RC Extraction
- Calculates the resistance and capacitance of interconnects
- Based on placement of cells
- Routing segments
- Calculates capacitive effects of adjacent segments
- Extracts capacitance between metal segments
- RC data is transferred to
- Static timing analysis (back annotation)
- Gate level simulation
- Replaces wire load model used in synthesis
- Tools used
- Cadence Hyperextract, Magma's Blast Fusion
- Sequence Columbus, Synopsys Star-RC, Mentor X-Calibre
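Once per-segment R and C values have been extracted, one common first-order way to turn them into a wire delay is the Elmore delay. The sketch below assumes a single unbranched wire modeled as a chain of segments, each a resistance followed by a capacitance lumped at its far end; the name `elmore_delay` and the example values are illustrative, and real delay calculators use more accurate models.

```python
def elmore_delay(segments, load_cap=0.0):
    """Elmore delay of an RC ladder from driver to sink.

    segments: list of (resistance_ohm, capacitance_farad) ordered from the
    driver to the sink. load_cap is added at the last node.
    Each node capacitance is multiplied by the total resistance upstream of it.
    """
    if not segments:
        return 0.0
    caps = [c for _, c in segments]
    caps[-1] += load_cap
    delay, upstream_r = 0.0, 0.0
    for (r, _), c in zip(segments, caps):
        upstream_r += r
        delay += upstream_r * c
    return delay

if __name__ == "__main__":
    # Four identical segments of 50 ohm and 20 fF, driving a 10 fF pin.
    wire = [(50.0, 20e-15)] * 4
    print(f"{elmore_delay(wire, load_cap=10e-15) * 1e12:.2f} ps")
```

Because each capacitance is weighted by all the resistance before it, delay grows roughly with the square of wire length.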
29Signal Integrity
- SI
- Crosstalk issues
- Inductance
- Interference
- Need new tools
- Calculate and estimate SI
- New delay models with SI estimates
- SI aware routing
30Formal Verification
- Compares golden netlist to current netlist
- Logic equivalence
- Comparison of pre- and post-layout netlist
- Similar to the formal verification step after synthesis, but clock tree insertions, drive strength changes, etc. have been made
- Buffer insertion or logic optimization may have been performed
31Physical Verification
- DRC: Design Rule Check
- Polygon/Layer spacing rules
- Verifies the design rules (DRC)
- LVS: Layout Versus Schematic
- Verifies that layout and netlist are equivalent at the transistor level
- Antenna
- Manufacturing check for long nets
- Net can accumulate charge during plasma etch and damage gate oxide
- GDSII
- Final merge of layout, routing and placement data for mask production
- Example tools
- Mentor Graphics Calibre (DRC, LVS)
- Cadence Dracula, Diva
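A spacing rule, the simplest kind of DRC check, can be sketched as follows. The assumptions are loud ones: shapes are axis-aligned rectangles `(x1, y1, x2, y2)` on a single layer, spacing is measured as Euclidean edge-to-edge distance, and touching or overlapping shapes are treated as one merged shape. Real rule decks are far richer, and the function names here are hypothetical.

```python
from itertools import combinations

def spacing(r1, r2):
    """Edge-to-edge distance between two rectangles; 0 if they touch or overlap."""
    dx = max(r1[0] - r2[2], r2[0] - r1[2], 0)
    dy = max(r1[1] - r2[3], r2[1] - r1[3], 0)
    return (dx * dx + dy * dy) ** 0.5

def spacing_violations(rects, min_spacing):
    """Return index pairs of separate shapes that are closer than min_spacing."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rects), 2)
        if 0 < spacing(a, b) < min_spacing
    ]

if __name__ == "__main__":
    metal1 = [(0, 0, 10, 10), (12, 0, 20, 10), (30, 0, 40, 10)]
    print(spacing_violations(metal1, min_spacing=3))
```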
32Release to Manufacturing
- Final edits to the layout are made
- Metal fill and metal stress relief rules are checked
- Manufacturing information such as scribe lanes, seal rings, mask shop data, part numbers, logos and pin 1 identification information for assembly are also added
- DRC and LVS are run to verify the correctness of the modified database
- Tapeout documentation is prepared prior to release of the GDSII to the foundry
- Pad location information is prepared, typically in a spreadsheet
- Cadence's Virtuoso is used for custom-manual edits of the mask layers
- Manufacturing steps
- generation of masks
- silicon processing
- wafer testing
- assembly and packaging
- manufacturing test
33Outline
- Design types and cost / complexity drivers
- Basic flow
- On convergence and hierarchy
34 35Aristo, DAC-2000
TYPICAL DESIGN FLOW
Design Constraints
IP Blocks
Library
Design Netlist
Gate-Level Verilog
Concurrent Block Partitioning, Clustering
Placement
Early Planning
Gate-Level Optimization
Design Refinement
Gate-Level Place Route
Top-Level Routing
Chip Assembly
RC Extraction
Timing Analysis
PREDICTABLE HIERARCHICAL DESIGN CONVERGENCE
36Monterey, DAC-2000
Design Signoff
Physical Prototyping
GDSII
37Design Closure
- Input
- RT-level HDL, technology, constraints
- Output
- go: recipe for invocation and composition of SPR results
- no go: diagnosis of RTL code problems
- Logical and physical hierarchies co-evolve
- spatial: top-down coarse placement -> physical hierarchy
- logic/timing: implementable RTL -> logical hierarchy
- limits of human fanout, organizations -> always have hierarchy
- Have seen a natural sequence of no-floorplanning, physical-floorplanning, RTL-floorplanning... as chip complexities increase
- Details (must construct, predict, ignore, eliminate, ...)
- pin optimizations, interconnect planning, hierarchy reconciliations, budgeting mechanisms, compatibility with downstream SPR, ...
38Logical and Physical Hierarchies
- Two hierarchies: logical/functional, and physical
- (schematic hierarchy also typical in structured-custom)
- RTL design: logical/functional hierarchy
- provides valuable clues for physical embedding: datapath structure, timing structure, etc.
- can be incredibly misleading (e.g., all clock buffers in a single hierarchy block)
- Main issues
- how to leverage logical/functional hierarchy during embedding
- when to deviate from designer's hierarchy
- methodology for hierarchy reconciliation (buffers, repartitioning / reclustering, etc.)
39Functional Partitioning
- Subblocks in A connected with subblocks in B result in 600 top level nets.
Source: ReShape
40Physical Partitioning
Physical partitioning reduced the number of top level nets from 600 to 0
Source: ReShape
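The effect of partitioning on top level net count can be reproduced in miniature. In this Python sketch a net is "top level" if its cells fall in more than one block; the cell, net and block names are invented for illustration and have nothing to do with the ReShape example beyond mirroring its shape.

```python
def count_top_level_nets(nets, block_of):
    """Count nets whose cells are spread over more than one block.

    nets: dict mapping net name -> list of cell names on that net.
    block_of: dict mapping cell name -> block name.
    """
    return sum(
        1 for cells in nets.values()
        if len({block_of[cell] for cell in cells}) > 1
    )

if __name__ == "__main__":
    # Subblocks a1, a2 belong to function A; b1, b2 to function B.
    nets = {"n1": ["a1", "b1"], "n2": ["a2", "b2"]}

    functional = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    physical = {"a1": "X", "b1": "X", "a2": "Y", "b2": "Y"}

    print("functional partition:", count_top_level_nets(nets, functional))
    print("physical partition:  ", count_top_level_nets(nets, physical))
```

Grouping connected subblocks together, rather than following the functional hierarchy, makes every net in this toy case local to a block.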
41Unconstrained Placement
42Floorplanned Placement
43Thermal Map of Routing Congestion
44Natural Block Shapes
- Are not disjoint rectangles, e.g., intersecting timing paths all want to be embedded as straight paths
- Traditional chip floorplan dissection into rectangles may not be optimum for wirelength and timing, but has compensating advantages (convenience)
Blk A
Blk B
1.0
0.5,0.5
1.0
45Physical Hierarchy
- Physical hierarchy: hierarchical, very structured organization of the core layout region
- Potentially, little relation to high-quality (e.g., w.r.t. timing, routability) embedding of logic
- Some obvious exceptions
- regular structures (memories, PLAs, datapaths)
- hard IP blocks
- And, physical hierarchy helps to define and plan global interconnects
- Recent trend: try to avoid artifactual physical hierarchy created by top-down recursive bipartitioning-based placement approach
46Convergence and Predictability
- We seek a predictable, estimatable back end (physical implementation after some handoff level of design)
- Predictability: regression models? (e.g., wireload models)
- Predictability: an enforceable assumption? (correct by construction)
- constant-delay paradigm (logical effort, DEC, IBM, Magma, ...)
- Predictability: fast constructive prediction? (also correct by construction)
- RT-level (Tera Systems), gate-level flat full-chip (Silicon Perspective Corp. FirstEncounter)
- Predictability: remove the need for predictability?
- GALS, LIS (global-asynchronous/local-synchronous, latency-independent synchronization)
- protocol- / communication-based system-level design
- Or, just make the loops tighter and easier (construct by correction)
47Planning Technology
- RTL partitioning
- understand interaction b/w block definition and placement quality
- recognize and cure a physically challenged logic hierarchy
- Global interconnect planning and optimization
- symbolic route representations to support block plan ECOs
- Controllable SPR back end (including power/clock/scan)
- Incremental / ECO optimizations, and optimizations that are robust under partial or imperfect design knowledge
- Estimators (initial wireload models)
- to account for resource, topological heterogeneity
- to account for optimizations (placement, ripup/reroute, timing)
- -> earliest RTL signoff with detailed PR knowledge
48Extra Slides
49Sequence, DAC-2000
[Flow diagram labels: RTL Synthesis; Prepare Database; Place; Route; Timing Analysis; Interconnect Driven Optimization; Delay Calculation; 3D Extraction; True-3D Parasitics; Timing Sign-off; Driver sizing, topology-based optimization]
50Cadence, DAC-2000
RTL, chip constraints
Partitioning Log/Phys Mapping
Block Area/Performance Estimation
Block Placement
Inter-block Routing and Buffering
Communication Logic Synthesis
Concurrent Placement, Synthesis And Route of
Cells in Blocks
Finalize Route/Extract/Back Ann.
51Magma, DAC-2000 fixed timing
0.6ns
0.6ns
0.6ns
0.6ns
FF
- Actively managing wire delay
- Through automatic sizing (sizing-driven placement)
- Through buffer insertion
52Interconnect Complexities
- Interconnect effects play a major role in the increasing costs for large hard-block or rectilinear-outline based design styles
- Probabilistic wireload models fail
- Without new capabilities for soft IP design and assembly, interconnect problems will significantly impact performance and cost for emerging IC technologies
Local wires
blocks
Occurrence Rate (Normalized)
global wires
Global wires
Courtesy Pileggi, MARCO GSRC
0.5
53Technology Scaling
- Block sizes cannot grow as rapidly as chip sizes since block design becomes increasingly more difficult; each block is a chip design over multiple configurations
- If the blocks are inflexible, the global wiring problems begin to dominate all aspects of performance quality and system cost
Occurrence Rate (Normalized)
Courtesy Pileggi, MARCO GSRC
Larger chip with finer feature sizes
0.5
54Soft Blocks
- With soft, flexible blocks, the system assembly can more thoroughly exploit the available technology
- Interconnect problem is controlled via soft boundaries for area re-shaping; re-synthesis and re-mapping for timing; smart wires; and top-down specified block synthesis
- Cf. Amoeba placement, "coloring" analysis of good placements with respect to original logic hierarchy, etc.
Occurrence Rate (Normalized)
Courtesy Pileggi, MARCO GSRC
Superior timing, power and cost
0.5
55Taxonomy of Planning / Implementation
- Centered on logic design (logic synthesis drives)
- wire-planning methodology with block/cell global placement
- global routing directives passed forward to chip finishing
- constant-delay methodology may be used to guide sizing
- Synopsys, (Magma)
- Centered on physical design (layout synthesis drives)
- placement-driven or placement-knowledgeable logic synthesis
- Cadence, Avant!
- Buffer between logic and layout synthesis (thin layer)
- placement, timing, sizing optimization tools
- Sequence
- Centered on SOC, chip-level planning
- interface synthesis between blocks
- communications protocol, protocol implementation decisions guide logic and physical implementation