Hybrid Hierarchical Timing Closure Methodology for a High Performance and Low Power DSP

1
Hybrid Hierarchical Timing Closure Methodology
for a High Performance and Low Power DSP

Dr. Kaijian Shi
Professional Services, Synopsys Inc.
Graig Godwin
Texas Instruments, Inc.

2
Agenda

Challenges in timing closure of high performance VDSM designs
Issues in subchip based hierarchical timing closure methods
Hybrid hierarchical timing closure methodology
  Principle
  Pseudo-subchip introduction, optimization and integration
  Hybrid hierarchical design optimization
An application example: a high performance, low power DSP
Conclusions

3
Challenges in timing closure of high performance VDSM designs

Interconnect dominant timing
Tight timing constraints
Synthesis and layout combined design optimization is a necessity for high performance VDSM designs
Flat design timing closure
  All placement and routing resources are visible and usable
  Better quality of optimization result
  Capacity limit
  Excessively long run time in the synthesis and layout combined design optimization

4
Hierarchical design optimization

R&D perspective
  Under-the-hood partitioning or clustering
  Algorithms based on cut size, relative location, delay, retiming, etc.
Engineering perspective
  Partition a design into sub-designs
  Sub-design timing and layout constraints
  Optimize sub-designs of different characteristics by suitable methods and in parallel
  Sub-design integration and top level optimization

5
Subchip based hierarchical timing closure methods

Partition a chip into subchips
Derive timing and layout constraints for every subchip
Apply the synthesis and layout combined optimization to each subchip in parallel to close timing
Assemble the optimized subchips at chip level
Fix timing and design rule violations in inter-subchip connections and global paths

6
Issues in Subchip based hierarchical timing closure methods

Cross-subchip nets have to be routed around the subchips
Repeaters on nets routed around subchips may not be able to fix violations in critical paths
Channels between the subchips may cause congestion
Large margins have to be applied to subchips to cover pre-layout budget estimation errors and IO timing degradation introduced at top level
High performance designs do not have the luxury of large margins!

7
Buffer insertions in hierarchical timing closure

[Figure: subchip]

8
Hybrid hierarchical timing closure

Combine strengths of subchip-based hierarchical optimization and synthesis/layout combined flat design optimization
Partition a design into subchip, pseudo-subchip and glue logic blocks
Individually optimize subchips and pseudo-subchips
Abstract subchips and integrate optimized pseudo-subchips into top level optimization to reduce design complexity and run time
Allow cross pseudo-subchip routing and repeater insertion in top level optimization

9
Pseudo-subchip

Design partitioning
  Timing critical
  Global paths cross the block
Timing closure
  Individually and in parallel
Modeling
  By optimized netlist and partially or fully fixed cell placement
  Cell placement and routing resources are visible at top level
  Free cell placement and routing resources can be used at top level for repeater insertions
  Exclude internal cells and timing paths from top level optimization to speed up top level timing closure
  Optionally unfix the pseudo-subchip IO paths to allow further optimization at top level, where global path timing is available

10
Pseudo-subchip optimization

Floorplan generation
  50-75% utilization, considering the number of cross block nets and potential global buffer insertions
  Pin assignment: side-based; positions are not critical
IO budget
  Auto-budgeting with manual adjustment
Internal constraints
  Block synthesis constraints, OR
  Derived from top level using characterization
Synthesis and layout combined optimization
  Extra margin to absorb timing variations introduced by wire coupling of cross block routing and repeater insertions at top level optimization
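
The utilization target above fixes how much free area a pseudo-subchip leaves for top level repeaters. A minimal sketch of that arithmetic follows; the cell area value and the function names are assumptions made for illustration, not figures from the presentation.

```python
def block_area_for_utilization(cell_area: float, utilization: float) -> float:
    """Floorplan area needed so that cells occupy `utilization` of the block."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cell_area / utilization


def free_area(cell_area: float, utilization: float) -> float:
    """Area left unoccupied, e.g. for cross block routing and repeaters."""
    return block_area_for_utilization(cell_area, utilization) - cell_area


# Hypothetical block with 300,000 square units of standard cell area.
cell_area = 300_000.0
for utilization in (0.50, 0.75):
    total = block_area_for_utilization(cell_area, utilization)
    spare = free_area(cell_area, utilization)
    print(f"utilization {utilization:.0%}: block area {total:.0f}, free area {spare:.0f}")
```

Under these assumed numbers, dropping the target from 75% to 50% triples the free area, which is the trade-off the slide weighs against the number of cross block nets.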

11
Pseudo-subchip integration

Design integration
  Read in either the optimized netlist or db file
Cell placement integration
  Convert local cell names into top level cell names
    e.g. cell1 -> block1/cell1
  Convert local cell placement coordinates into top level placement coordinates
    e.g. with block1 placed at (100,30) in Top, local (20,10) -> top level (120,40)
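
The name and coordinate conversion can be sketched in a few lines. This is an illustration only: it assumes that (100,30) on the slide is the block1 origin in Top, that the block is placed without rotation or mirroring, and the PlacedCell type and integrate_block function are hypothetical names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedCell:
    name: str
    x: float
    y: float


def integrate_block(block_name, origin, local_cells):
    """Prefix cell names with the block name and shift coordinates by the block origin.

    Assumes a pure translation (no rotation or mirroring of the block).
    """
    origin_x, origin_y = origin
    return [
        PlacedCell(f"{block_name}/{cell.name}", cell.x + origin_x, cell.y + origin_y)
        for cell in local_cells
    ]


# The slide's example: cell1 at local (20,10), block1 at (100,30) in Top.
top_cells = integrate_block("block1", (100, 30), [PlacedCell("cell1", 20, 10)])
print(top_cells[0])
```
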
12
Hybrid hierarchical top level design optimization

Subchips are modeled by IO timing and layout abstraction
Pseudo-subchip modeling
  Fix optimized cells and their placement
  Turn off timing and design rule checks on optimized pseudo-subchip internal paths
Apply synthesis and layout combined optimization to the integrated top level design
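
One way to picture turning off checks on internal paths is a filter that drops any path starting and ending in the same pseudo-subchip. The sketch below is a simplification under stated assumptions: paths are identified only by hierarchical startpoint and endpoint names, and TimingPath, owning_block and paths_to_check are hypothetical names. A real tool would make this decision on its timing graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingPath:
    startpoint: str  # hierarchical name, e.g. "block1/ff_a"
    endpoint: str
    slack_ns: float


def owning_block(name, pseudo_subchips):
    """Return the pseudo-subchip that contains `name`, or None for top level logic."""
    for block in pseudo_subchips:
        if name.startswith(block + "/"):
            return block
    return None


def is_internal(path, pseudo_subchips):
    """True when both ends of the path lie inside the same pseudo-subchip."""
    start = owning_block(path.startpoint, pseudo_subchips)
    end = owning_block(path.endpoint, pseudo_subchips)
    return start is not None and start == end


def paths_to_check(paths, pseudo_subchips):
    """Keep only the paths that top level optimization still has to check."""
    return [p for p in paths if not is_internal(p, pseudo_subchips)]


paths = [
    TimingPath("block1/ff_a", "block1/ff_b", 0.12),   # internal, already closed
    TimingPath("block1/ff_b", "block2/ff_c", -0.05),  # crosses blocks
    TimingPath("glue/ff_d", "block2/ff_c", 0.30),     # glue logic to block
]
checked = paths_to_check(paths, ["block1", "block2"])
print([(p.startpoint, p.endpoint) for p in checked])
```

The optional incremental step would simply skip this filter for the pseudo-subchips whose internal timing was disturbed.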

13
Hybrid hierarchical top level design optimization

Optional incremental optimization (if violations were introduced in the pseudo-subchips)
  Free pseudo-subchip cell placement
  Turn on timing and design rule checks on pseudo-subchip internal paths
  Apply an incremental optimization
Advantages
  High capacity and efficiency due to subchip abstraction and pre-optimized pseudo-subchips
  Quality timing closure due to cross pseudo-subchip routing and repeater insertion

14
Hybrid hierarchical timing closure flow

[Flow diagram]
Inputs: floorplan, netlist
Design Partitioning: subchips, pseudo-subchips, glue logic blocks
Subchip Timing Closure Flow: netlist, layout, timing and physical models
Pseudo-Subchip Timing Closure: netlist, placement PDEF, internal path mask
Top Level Integration: integrated top level netlist and constraints; pseudo-subchip adjusted cell placement

15
Hybrid hierarchical timing closure flow

[Flow diagram, continued]
Inputs: integrated top level netlist and constraints; pseudo-subchip adjusted cell placement; floorplan
Top Level Optimization: don't touch pseudo-subchip cells; don't check pseudo-subchip internal path timing
Incremental Top Level Optimization: free pseudo-subchip cells; selectively check pseudo-subchip internal path timing
Outputs: netlist, SDF, load, PDEF

16
An example: a high performance and low power DSP

24 subchips: RAM macros, hardened IP, zero-routing-impact blocks
7 pseudo-subchips: timing critical, with cross-over routes
11 glue-logic blocks: remaining small and scattered logic

17
Post-layout timing results

Initial placement: standard cells were placed by Apollo
Pseudo-subchip integrated: timing-closed pseudo-subchips were integrated; glue blocks were placed by Apollo
Hybrid hierarchical optimization: applied hybrid hierarchical optimization to the pseudo-subchip integrated design (10.3h)

The -0.4 ns violations were caused by top/leaf clock skews on cross-chip clock gating paths and were fixed manually.

18
Conclusions

High performance VDSM designs require synthesis and layout combined optimization
Subchip based hierarchical optimization methods have issues in dealing with inter-subchip and cross-subchip connections
Hybrid hierarchical timing closure methodology
  Combine strengths of subchip-based hierarchical timing closure and synthesis/layout combined flat design timing optimization
  Pseudo-subchip partitioning, optimization and integration
  Hybrid hierarchical design optimization
  A successful application
A successful application

19
Early buffer planning methods

Better timing than post-route buffer insertion
Feasible region (FR) (Jason Cong, 2001)
  A maximum area where buffers can be inserted anywhere to meet timing constraints
Divide inter-subchip channels into tiles
Calculate area slack of each tile
For each net that requires buffer insertion, DO
  Find the tile that has the max area slack and is within the FR
  Insert a buffer in the tile and update the tile area slack
Mitigate but do not resolve the issues caused by around-subchip detour routes
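
The tile loop above can be sketched as a greedy assignment. This is a simplified illustration, not the published algorithm: it assumes one buffer per net, a single buffer size, and a feasible region given as a list of tile names; it also skips tiles whose slack is smaller than the buffer, which the slide does not state. Tile and plan_buffers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    name: str
    area_slack: float  # free area still available in this channel tile


def plan_buffers(nets, tiles, buffer_area):
    """Greedy buffer planning.

    nets maps a net name to the tile names inside its feasible region.
    Returns a dict from net name to the chosen tile name, or None if no
    tile in the feasible region can still hold a buffer.
    """
    by_name = {tile.name: tile for tile in tiles}
    assignment = {}
    for net, feasible_tiles in nets.items():
        candidates = [
            by_name[name]
            for name in feasible_tiles
            if name in by_name and by_name[name].area_slack >= buffer_area
        ]
        if not candidates:
            assignment[net] = None
            continue
        best = max(candidates, key=lambda tile: tile.area_slack)
        best.area_slack -= buffer_area  # update the tile area slack
        assignment[net] = best.name
    return assignment


tiles = [Tile("t0", 10.0), Tile("t1", 7.0), Tile("t2", 4.0)]
nets = {
    "n1": ["t0", "t1"],
    "n2": ["t0", "t1"],
    "n3": ["t2"],
    "n4": ["t2"],
}
plan = plan_buffers(nets, tiles, buffer_area=4.0)
print(plan)
```

In this made-up case n4 ends up without a tile because n3 consumed the only slack in its feasible region, which illustrates why channel-only buffer planning mitigates the detour problem without resolving it.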