Title: Hybrid Hierarchical Timing Closure Methodology for a High Performance and Low Power DSP
1. Hybrid Hierarchical Timing Closure Methodology for a High Performance and Low Power DSP
- Dr. Kaijian Shi
- Professional Services, Synopsys Inc.
- Graig Godwin
- Texas Instruments, Inc.
2. Agenda
- Challenges in timing closure of high performance VDSM designs
- Issues in subchip based hierarchical timing closure methods
- Hybrid hierarchical timing closure methodology
  - Principle
  - Pseudo-subchip introduction, optimization and integration
  - Hybrid hierarchical design optimization
- An application example: A high performance low power DSP
- Conclusions
3. Challenges in timing closure of high performance VDSM designs
- Interconnect dominant timing
- Tight timing constraints
- Synthesis and layout combined design optimization is a necessity for high performance VDSM designs
- Flat design timing closure
  - All placement and routing resources are visible and usable
  - Better quality of optimization result
  - Capacity limit
  - Excessively long run time in the synthesis and layout combined design optimization
4. Hierarchical design optimization
- R&D perspective
  - Under-the-hood partitioning or clustering
  - Algorithms based on cutsize, relative location, delay, retiming, etc.
- Engineering perspective
  - Partition a design into sub-designs
  - Sub-design timing and layout constraints
  - Optimize sub-designs of different characteristics by suitable methods and in parallel
  - Sub-design integration and top level optimization
5. Subchip based hierarchical timing closure methods
- Partition a chip into subchips
- Derive timing and layout constraints for every subchip
- Apply the synthesis and layout combined optimization to each subchip in parallel to close timing.
- Assemble the optimized subchips at chip level.
- Fix timing and design rule violations in inter-subchip connections and global paths.
6. Issues in subchip based hierarchical timing closure methods
- Cross-subchip nets have to be routed around the subchips
- Repeaters in around-subchip nets may not be able to fix violations in critical paths
- Channels between the subchips may cause congestion
- Large margins have to be applied to subchips to cover pre-layout budget estimation errors and top level introduced IO timing degradation
- High performance designs do not have the luxury of large margins!
7. Buffer insertions in hierarchical timing closure
[Figure: subchip]
8. Hybrid hierarchical timing closure
- Combine strengths of subchip-based hierarchical optimization and synthesis/layout combined flat design optimization
- Partition a design into subchip, pseudo-subchip and glue logic blocks
- Individually optimize subchips and pseudo-subchips
- Abstract subchips and integrate optimized pseudo-subchips into top level optimization to reduce design complexity and run time
- Allow cross pseudo-subchip routing and repeater insertion in top level optimization
9. Pseudo-subchip
- Design partitioning
  - Timing critical
  - Global paths cross the block
- Timing closure
  - Individually and in parallel
- Modeling
  - By optimized netlist and partial or all fixed cell placement
  - Cell placement and routing resources are visible at top level
  - Free cell placement and routing resources can be used at top level for repeater insertions
  - Exclude internal cells and timing paths from top level optimization to speed up top level timing closure
  - Optionally unfix the pseudo-subchip IO paths to allow further optimization at top level where global path timing is available
10. Pseudo-subchip optimization
- Floorplan generation
  - 50-75% utilization, considering the number of cross-block nets and potential global buffer insertions
  - Pin assignment: side based; positions are not critical
- IO budget
  - Auto-budgeting with manual adjustment
- Internal constraints
  - Block synthesis constraints, OR
  - Derived from top level using characterization
- Synthesis and layout combined optimization
  - Extra margin to absorb timing variations introduced by wire coupling of cross-block routing and repeater insertions at top level optimization
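As a rough illustration of the utilization guideline above, the sketch below sizes a pseudo-subchip floorplan from its total standard cell area. The function name, the square default aspect ratio and the numbers are assumptions made for illustration; none of them come from the presentation.

```python
import math

def floorplan_size(cell_area, utilization, aspect_ratio=1.0):
    """Return (width, height) of a block whose cells fill `utilization` of its area.

    cell_area    -- total standard cell area (hypothetical units, e.g. um^2)
    utilization  -- target fraction of the block area occupied by cells
    aspect_ratio -- width / height of the block (assumed, default square)
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    block_area = cell_area / utilization
    height = math.sqrt(block_area / aspect_ratio)
    width = aspect_ratio * height
    return width, height

# Lower utilization leaves more free area for cross-block routes and repeaters.
for u in (0.50, 0.75):
    w, h = floorplan_size(cell_area=1_000_000.0, utilization=u)
    free_area = w * h * (1.0 - u)
    print(f"utilization {u:.0%}: {w:.0f} x {h:.0f}, free area {free_area:.0f}")
```

The lower end of the range would suit blocks with many cross-block nets, since the free area is what the top level can later use for repeaters.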
11. Pseudo-subchip integration
- Design integration
  - Read in either the optimized netlist or db file
- Cell placement integration
  - Convert local cell names into top level cell names, e.g. cell1 → block1/cell1
  - Convert local cell placement coordinates into top level placement coordinates
[Figure: block1 inside Top; block1 at (100,30), local cell position (20,10) → top level position (120,40)]
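The name and coordinate conversion above can be sketched in a few lines. This is a minimal illustration, assuming the block is placed without rotation or mirroring and that (100,30) in the figure is the block's origin in top level coordinates; the function name and data layout are hypothetical.

```python
def to_top_level(block_name, block_origin, local_cells):
    """Map local cell names and placements into the top level.

    block_origin -- (x, y) of the block's origin in top level coordinates
    local_cells  -- dict of local cell name -> (x, y) inside the block
    Assumes the block is placed without rotation or mirroring.
    """
    ox, oy = block_origin
    return {
        f"{block_name}/{name}": (x + ox, y + oy)
        for name, (x, y) in local_cells.items()
    }

top_cells = to_top_level("block1", (100, 30), {"cell1": (20, 10)})
print(top_cells)  # {'block1/cell1': (120, 40)}
```

A rotated or mirrored block would need an orientation transform before the offset is added.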
12. Hybrid hierarchical top level design optimization
- Subchips are modeled by IO timing and layout abstraction
- Pseudo-subchip modeling
  - Fix optimized cells and their placement
  - Turn off timing and design rule checks on optimized pseudo-subchip internal paths
- Apply synthesis and layout combined optimization to the integrated top level design
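One way to picture the internal path masking is as a filter over timing paths. The sketch below is an assumption-laden illustration, not the tool flow used in the presentation: it treats a path as internal when its startpoint and endpoint sit in the same pseudo-subchip hierarchy, and all names and the path representation are hypothetical.

```python
def owning_block(pin, pseudo_subchips):
    """Return the pseudo-subchip whose hierarchy contains `pin`, else None."""
    for block in pseudo_subchips:
        if pin.startswith(block + "/"):
            return block
    return None

def is_internal_path(startpoint, endpoint, pseudo_subchips):
    """Assumed rule: internal means both ends lie in the same pseudo-subchip."""
    start_block = owning_block(startpoint, pseudo_subchips)
    return (start_block is not None
            and start_block == owning_block(endpoint, pseudo_subchips))

def paths_to_check(paths, pseudo_subchips, check_internal=False):
    """Select the (startpoint, endpoint) pairs to analyze at top level.

    check_internal=False mirrors the first top level pass (internal paths masked);
    check_internal=True mirrors an incremental pass with the checks turned back on.
    """
    if check_internal:
        return list(paths)
    return [p for p in paths if not is_internal_path(p[0], p[1], pseudo_subchips)]

paths = [
    ("block1/ff1/Q", "block1/ff2/D"),   # internal to block1
    ("block1/ff3/Q", "block2/ff1/D"),   # crosses two pseudo-subchips
    ("glue/ff/Q", "block2/ff9/D"),      # glue logic into block2
]
print(paths_to_check(paths, {"block1", "block2"}))
```

Masking shrinks the set of paths the top level optimizer has to analyze, which is where the run time saving would come from.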
13. Hybrid hierarchical top level design optimization
- Optional incremental optimization (if violations were introduced in the pseudo-subchips)
  - Free pseudo-subchip cell placement
  - Turn on timing and design rule checks on pseudo-subchip internal paths
  - Apply an incremental optimization
- Advantages
  - High capacity and efficiency due to subchip abstraction and pre-optimized pseudo-subchips
  - Quality timing closure due to cross pseudo-subchip routing and repeater insertion
14. Hybrid hierarchical timing closure flow
[Flow diagram]
- Inputs: floorplan, netlist
- Design partitioning into subchips, pseudo-subchips and glue logic blocks
- Subchip timing closure flow; outputs: netlist, layout, timing and physical models
- Pseudo-subchip timing closure; outputs: netlist, placement PDEF, internal path mask
- Top level integration
- Outputs: integrated top level netlist and constraints; pseudo-subchip adjusted cell placement
15. Hybrid hierarchical timing closure flow
[Flow diagram, continued]
- Inputs: integrated top level netlist and constraints; pseudo-subchip adjusted cell placement; floorplan
- Top level optimization: don't touch pseudo-subchip cells; don't check pseudo-subchip internal path timing
- Incremental top level optimization: free pseudo-subchip cells; selectively check pseudo-subchip internal path timing
- Outputs: netlist, SDF, load, PDEF
16. An example: A high performance and low power DSP
- 24 subchips: RAM macros, hardened IP, 0-routing impact blocks
- 7 pseudo-subchips: critical, cross-over routes
- 11 glue-logic blocks: remaining small and scattered logic
17. Post-layout timing results
- Initial placement: standard cells were placed by Apollo
- Pseudo-subchip integrated: timing-closed pseudo-subchips were integrated; glue blocks were placed by Apollo
- Hybrid hierarchical optimization: applied hybrid hierarchical optimization to the pseudo-subchip integrated design (10.3h)
- -0.4ns violations were caused by top/leaf clock skews on cross chip clock gating paths. Fixed manually.
18. Conclusions
- High performance VDSM designs require synthesis and layout combined optimization
- Subchip based hierarchical optimization methods have issues in dealing with inter-subchip and cross-subchip connections
- Hybrid hierarchical timing closure methodology
  - Combine strengths of subchip-based hierarchical timing closure and synthesis/layout combined flat design timing optimization
  - Pseudo-subchip partitioning, optimization and integration
  - Hybrid hierarchical design optimization
- A successful application
19. Early buffer planning methods
- Better timing than post-route buffer insertion
- Feasible region (Jason Cong, 2001)
  - A maximum area where buffers can be inserted anywhere to meet timing constraints
- Divide inter-subchip channels into tiles
- Calculate area slack of each tile
- For each net that requires buffer insertion, DO
  - Find a tile that has max area slack and is within the feasible region (FR)
  - Insert a buffer in the tile and update tile area slack
- Mitigate but do not resolve the issues caused by around-subchip detour routes
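The tile loop above can be sketched as a simple greedy assignment. This is a minimal illustration under stated assumptions: one buffer per net, a fixed buffer area, and a feasible region already given as a set of tile ids. Computing the feasible region itself is not shown, and every name and number here is hypothetical.

```python
def plan_buffers(tile_slack, nets, buffer_area):
    """Greedy tile assignment for buffers.

    tile_slack  -- dict of tile id -> free area still available in that tile
    nets        -- dict of net name -> set of tile ids inside the net's feasible region
    buffer_area -- area consumed by one buffer (one buffer per net is assumed)
    Returns (assignment, remaining_slack); a net maps to None when no tile in
    its feasible region has room for a buffer.
    """
    slack = dict(tile_slack)  # work on a copy so the caller's data is untouched
    assignment = {}
    for net, feasible_tiles in nets.items():
        candidates = sorted(t for t in feasible_tiles
                            if slack.get(t, 0.0) >= buffer_area)
        if not candidates:
            assignment[net] = None
            continue
        # Pick the tile with the largest area slack; ties go to the smallest id.
        best = max(candidates, key=slack.__getitem__)
        assignment[net] = best
        slack[best] -= buffer_area  # update the tile's area slack
    return assignment, slack

tiles = {"t0": 10.0, "t1": 6.0, "t2": 4.0}
nets = {"n1": {"t0", "t1"}, "n2": {"t0", "t1"}, "n3": {"t2"}}
assignment, remaining = plan_buffers(tiles, nets, buffer_area=5.0)
print(assignment)  # {'n1': 't0', 'n2': 't1', 'n3': None}
print(remaining)   # {'t0': 5.0, 't1': 1.0, 't2': 4.0}
```

Net n3 shows the limitation noted in the last bullet: when the only tiles in a net's feasible region are full, tile planning alone cannot place the buffer.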