Hybrid Hierarchical Timing Closure Methodology for a High Performance and Low Power DSP

1
Hybrid Hierarchical Timing Closure Methodology
for a High Performance and Low Power DSP
  • Dr. Kaijian Shi
  • Professional Services, Synopsys Inc.
  • Graig Godwin
  • Texas Instruments, Inc.

2
Agenda
  • Challenges in timing closure of high performance
    VDSM designs
  • Issues in subchip based hierarchical timing
    closure methods
  • Hybrid hierarchical timing closure methodology
  • Principle
  • Pseudo-subchip introduction, optimization and
    integration
  • Hybrid hierarchical design optimization
  • An application example: a high performance, low
    power DSP
  • Conclusions

3
Challenges in timing closure of high performance
VDSM designs
  • Interconnect dominant timing
  • Tight timing constraints
  • Synthesis and layout combined design
    optimization is a necessity for high performance
    VDSM designs
  • Flat design timing closure
  • All placement and routing resources are visible
    and usable
  • Better quality of optimization result
  • Capacity limit
  • Excessively long run time in the synthesis and
    layout combined design optimization

4
Hierarchical design optimization
  • R&D perspective
  • Under-the-hood partitioning or clustering
  • Algorithms based on cutsize, relative location,
    delay, retiming etc.
  • Engineering perspective
  • Partition a design into sub-designs
  • Sub-design timing and layout constraints
  • Optimize sub-designs of different
    characteristics by suitable methods and in
    parallel
  • Sub-design integration and top level optimization

5
Subchip based hierarchical timing closure methods
  • Partition a chip into subchips
  • Derive timing and layout constraints for every
    subchip
  • Apply the synthesis and layout combined
    optimization to each subchip in parallel to
    close timing.
  • Assemble the optimized subchips at chip level.
  • Fix timing and design rule violations in
    inter-subchip connections and global paths.

6
Issues in Subchip based hierarchical timing
closure methods
  • Cross-subchip nets have to be routed around the
    subchips
  • Repeaters inserted in around-subchip nets may not
    be able to fix violations in critical paths
  • Channels between the subchips may cause
    congestion
  • Large margins have to be applied to subchips to
    cover pre-layout budget estimation errors and
    IO timing degradation introduced at top level
  • High performance designs do not have the luxury of
    large margins!
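
To make the first issue concrete, the following is a rough sketch of how much wire a net gains when it must be routed around a rectangular subchip instead of across it. The geometry, function names and numbers are hypothetical and not taken from the presentation; the sketch only covers the simple case of one pin on each side of the block, both within the block's vertical span.

```python
def direct_length(p, q):
    """Manhattan distance between two pins, ignoring any blockage."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def detour_length(p, q, block):
    """Shortest rectilinear route from p to q that stays outside the block.

    block is (x_min, y_min, x_max, y_max). Assumes p lies to the left of the
    block, q to the right, and both pins are within the block's vertical span,
    so the route must pass over the top edge or under the bottom edge.
    """
    x_min, y_min, x_max, y_max = block
    (px, py), (qx, qy) = p, q
    if not (px <= x_min and qx >= x_max
            and y_min <= py <= y_max and y_min <= qy <= y_max):
        raise ValueError("sketch only handles pins on opposite sides of the block")
    over_top = (y_max - py) + (y_max - qy)
    under_bottom = (py - y_min) + (qy - y_min)
    return (qx - px) + min(over_top, under_bottom)


# Hypothetical subchip 10 units wide and 100 units tall.
subchip = (10, 0, 20, 100)
pin_a, pin_b = (5, 50), (25, 50)
print(direct_length(pin_a, pin_b))           # 20
print(detour_length(pin_a, pin_b, subchip))  # 120
```

In this made-up case the detour is six times the direct length, which is the kind of added interconnect delay that repeaters alone may not recover on a critical path.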

7
Buffer insertions in hierarchical timing closure
[Figure: subchip]

8
Hybrid hierarchical timing closure
  • Combine strengths of subchip-based hierarchical
    optimization and synthesis/layout combined flat
    design optimization
  • Partition a design into subchip, pseudo-subchip
    and glue logic blocks
  • Individually optimize subchips and
    pseudo-subchips
  • Abstract subchips and integrate optimized
    pseudo-subchips into top level optimization to
    reduce design complexity and run time
  • Allow cross pseudo-subchip routing and repeater
    insertion in top level optimization

9
Pseudo-subchip
  • Design partitioning
  • Timing critical
  • Global paths cross the block
  • Timing closure
  • Individually and in parallel
  • Modeling
  • By the optimized netlist and partially or fully
    fixed cell placement
  • Cell placement and routing resources are visible
    at top level
  • Free cell placement and routing resources can be
    used at top level for repeater insertions
  • Exclude internal cells and timing paths from top
    level optimization to speed up top level timing
    closure
  • Optionally unfix the pseudo-subchip IO paths to
    allow further optimization at top level where
    global path timing is available

10
Pseudo-subchip optimization
  • Floorplan generation
  • 50-75% utilization, considering the number of
    cross block nets and potential global buffer
    insertions
  • Pin assignment: side based; positions are not
    critical
  • IO budget
  • Auto-budgeting with manual adjusting
  • Internal constraints
  • Block synthesis constraints, OR
  • Derived from top level using characterization
  • Synthesis and layout combined optimization
  • Extra margin to absorb timing variations
    introduced during top level optimization by wire
    coupling from cross block routing and by repeater
    insertions

11
Pseudo-subchip integration
  • Design integration
  • Read in either the optimized netlist or db file
  • Cell placement integration
  • Convert local cell names into top level cell
    names
  • e.g. cell1 → block1/cell1
  • Convert local cell placement coordinates into top
    level placement coordinates

[Figure: block1 placed in Top at (100,30); local coordinate (20,10) → top level coordinate (120,40)]
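
The name and coordinate conversion can be sketched as below. This is a minimal illustration, assuming block1 is placed with its origin at (100,30) in the top level frame (which is how the figure's numbers add up) and that the block is only translated, not rotated or mirrored. `PlacedCell` and `to_top_level` are hypothetical names, not part of any tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedCell:
    name: str
    x: float
    y: float


def to_top_level(cell, block_name, block_origin):
    """Prefix the cell name with its block instance and shift its
    coordinates by the block's origin in the top level frame."""
    origin_x, origin_y = block_origin
    return PlacedCell(
        name=f"{block_name}/{cell.name}",
        x=cell.x + origin_x,
        y=cell.y + origin_y,
    )


local_cell = PlacedCell("cell1", 20, 10)
top_cell = to_top_level(local_cell, "block1", (100, 30))
print(top_cell)  # PlacedCell(name='block1/cell1', x=120, y=40)
```

A real flow would also have to handle block orientation and the placement file format, which this sketch leaves out.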
12
Hybrid hierarchical top level design optimization
  • Subchips are modeled by IO timing and layout
    abstraction
  • Pseudo-subchips modeling
  • Fix optimized cells and their placement
  • Turn off timing and design rule checks on
    optimized pseudo-subchip internal paths
  • Apply synthesis and layout combined optimization
    to the integrated top level design
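
One way to picture the internal path masking is as a filter over timing paths: a path whose startpoint and endpoint both sit inside the same pseudo-subchip is skipped at top level, while paths that enter, leave or cross a pseudo-subchip are still checked. The sketch below is only an illustration under that assumption; the data model and names are hypothetical, it identifies a block by the hierarchical prefix of the pin name, and it does not represent how any particular tool implements the mask.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingPath:
    startpoint: str  # hierarchical pin name, e.g. "block1/reg_a/Q"
    endpoint: str


def owning_block(pin, pseudo_subchips):
    """Return the pseudo-subchip whose instance name prefixes the pin, else None."""
    for block in pseudo_subchips:
        if pin.startswith(block + "/"):
            return block
    return None


def is_internal(path, pseudo_subchips):
    """True when both ends of the path lie in the same pseudo-subchip."""
    start = owning_block(path.startpoint, pseudo_subchips)
    end = owning_block(path.endpoint, pseudo_subchips)
    return start is not None and start == end


def paths_to_check(paths, pseudo_subchips, include_internal=False):
    """Select the paths the top level optimization should analyze.

    include_internal=True models turning the internal checks back on,
    as in an incremental pass.
    """
    return [p for p in paths
            if include_internal or not is_internal(p, pseudo_subchips)]


paths = [
    TimingPath("block1/reg_a/Q", "block1/reg_b/D"),  # internal to block1
    TimingPath("block1/reg_b/Q", "block2/reg_c/D"),  # crosses two blocks
    TimingPath("glue/reg_d/Q", "block2/reg_e/D"),    # glue logic into block2
]
checked = paths_to_check(paths, ["block1", "block2"])
print(len(checked))  # 2
```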

13
Hybrid hierarchical top level design optimization
  • Optional incremental optimization
  • (if there were violations introduced in the
    pseudo-subchips)
  • Free pseudo-subchip cell placement
  • Turn on timing and design rule checks on
    pseudo-subchip internal paths
  • Apply an incremental optimization
  • Advantages
  • High capacity and efficiency due to subchip
    abstraction and pre-optimized pseudo-subchips
  • Quality timing closure due to cross
    pseudo-subchip routing and repeater insertion

14
Hybrid hierarchical timing closure flow
  • Inputs: floorplan, netlist
  • Design partitioning into subchips, pseudo-subchips
    and glue logic blocks
  • Subchip timing closure flow
  • Outputs: netlist, layout, timing and physical
    models
  • Pseudo-subchip timing closure
  • Outputs: netlist, placement PDEF, internal path
    mask
  • Top level integration
  • Outputs: integrated top level netlist and
    constraints, pseudo-subchip adjusted cell placement

15
Hybrid hierarchical timing closure flow
  • Inputs: integrated top level netlist and
    constraints, pseudo-subchip adjusted cell
    placement, floorplan
  • Top level optimization
  • Don't touch pseudo-subchip cells
  • Don't check pseudo-subchip internal path timing
  • Incremental top level optimization
  • Free pseudo-subchip cells
  • Selectively check pseudo-subchip internal path
    timing
  • Outputs: netlist, SDF, load, PDEF

16
An example: a high performance and low power DSP
  • 24 subchips: RAM macros, hardened IP, blocks with
    zero routing impact
  • 7 pseudo-subchips: timing critical, with cross-over
    routes
  • 11 glue-logic blocks: the remaining small and
    scattered logic

17
Post-layout timing results
  • Initial placement: standard cells were placed by
    Apollo
  • Pseudo-subchip integrated: timing-closed
    pseudo-subchips were integrated; glue blocks were
    placed by Apollo
  • Hybrid hierarchical optimization: applied hybrid
    hierarchical optimization to the pseudo-subchip
    integrated design (10.3h)
  • The -0.4ns violations were caused by top/leaf clock
    skews on cross chip clock gating paths and were
    fixed manually.

18
Conclusions
  • High performance VDSM designs require synthesis
    and layout combined optimization
  • Subchip based hierarchical optimization methods
    have issues in dealing with inter-subchip and
    cross-subchip connections
  • Hybrid hierarchical timing closure methodology
  • Combine strengths of subchip-based hierarchical
    timing closure and synthesis/layout combined flat
    design timing optimization
  • Pseudo-subchip partitioning, optimization and
    integration
  • Hybrid hierarchical design optimization
  • A successful application

19
Early buffer planning methods
  • Better timing than post-route buffer insertion
  • Feasible region (Jason Cong, 2001)
  • A maximum area where buffers can be inserted
    anywhere to meet timing constraints
  • Divide inter-subchip channels into tiles
  • Calculate area slack of each tile
  • For each net that requires buffer insertion, DO
  • Find a tile that has the maximum area slack and
    lies within the feasible region (FR)
  • Insert a buffer in the tile and update tile area
    slack
  • Mitigates but does not resolve the issues caused
    by around-subchip detour routes
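
The tile-based loop above can be sketched as a simple greedy assignment. This is an illustration only: it assumes each net's feasible region has already been computed and is given as a set of tile names, that each net needs one buffer of a fixed area, and that a tile is skipped when its remaining area slack cannot hold a buffer. Those details, along with the names `Tile` and `plan_buffers`, are assumptions and are not stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    name: str
    area_slack: float  # free area still available for buffers


def plan_buffers(nets, tiles, buffer_area):
    """Greedy buffer planning.

    nets:  dict mapping net name -> set of tile names in its feasible region
    tiles: dict mapping tile name -> Tile
    Returns a dict mapping net name -> chosen tile name, or None when no
    tile in the feasible region has enough area slack left.
    """
    assignment = {}
    for net, feasible_region in nets.items():
        candidates = [t for t in tiles.values()
                      if t.name in feasible_region and t.area_slack >= buffer_area]
        if not candidates:
            assignment[net] = None
            continue
        best = max(candidates, key=lambda t: t.area_slack)
        best.area_slack -= buffer_area  # update the tile's area slack
        assignment[net] = best.name
    return assignment


tiles = {"t0": Tile("t0", 10.0), "t1": Tile("t1", 7.0)}
nets = {"n1": {"t0", "t1"}, "n2": {"t0", "t1"}, "n3": {"t1"}}
plan = plan_buffers(nets, tiles, buffer_area=4.0)
print(plan)  # {'n1': 't0', 'n2': 't1', 'n3': None}
```

The result depends on the order in which nets are visited; here n3 is left without a tile because n2 already consumed most of t1, which is one reason a greedy pass like this is only a starting point.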