Design Methodology for Semi Custom Processor Cores

1
Design Methodology for Semi Custom Processor Cores
  • Victor Zyuban
  • Sameh Asaad
  • Thomas Fox
  • Anne-Marie Haen
  • Daniel Littrell
  • Jaime Moreno
  • IBM T.J. Watson Research Center, Yorktown Heights,
    NY

2
Introduction
  • We describe the methodology used in the
    implementation of a DSP core whose requirements
    did not allow for a typical soft-core or hard-core
    approach
  • 500 MHz WC, 350 mW @ 1.5 V, 105 °C, 0.13 µm foundry
    technology
  • Objectives
  • Exceed the performance and power characteristics
    of designs built with a standard ASIC flow (a
    typical ASIC runs at 300 MHz in this technology)
    without compromising the flow's productivity and
    generality
  • Enable integration of custom components
  • Enable application of power reduction techniques
    not provided by ASIC flow, such as power gating,
    reverse bias, and data retention
  • Allow optimizations across design phases
  • Quick turn-around time, reproducible results

3
Methodology Overview
[Flow diagram: design stages in order, followed by the feedback paths]
  • ISA/uA
  • VHDL: define hierarchy, clock gating and latch
    grouping; instantiate custom components
  • Hiasynth/Booledozer: define assertions and
    synthesis directives; logic synthesis;
    optimization with pseudo-latches
  • Scan clock: clock splitter insertion; scan
    insertion; hierarchical Verilog; porting the
    design to Cadence
  • PD: pre-placement / pre-routing; place; route;
    extract; timing / clock skew / scan order; power
    analysis
  • Feedback paths: modify Arch/uA (change latencies,
    redefine resource usage); re-arrange logic,
    re-group latches; adjust synthesis constraints;
    rewire scan

4
Overview of main design techniques
  • Hierarchical VHDL and synthesis; pre-placement
    of components
  • Grouping of latches for clock splitters in VHDL;
    pre-placement of latches and clock splitters
  • Enforcing bit ordering in the datapath (bit stack
    seeding)
  • Instantiation of decoupling buffers in VHDL;
    pre-placement of decoupling buffers
  • Pre-routing clock grid and power-ground grid

5
Hierarchical Synthesis and Pre-placement of
Components: methodology
  • Every unit is broken up into components (a few
    thousand gates each)
  • Components are synthesized independently
  • Layout of the unit is organized into a set of
    overlapping boxes. Gates constituting components
    are assigned to the appropriate boxes, leaving
    sufficient flexibility for the place and route
    tools
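
The box assignment described above can be sketched in a few lines of Python. This is only an illustration, not the flow's actual tooling: the `Box` class, the `assign_gates` helper, the coordinates, and the treatment of "dust" as glue logic that falls back to a unit-wide box are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned pre-placement region, in arbitrary layout units (hypothetical)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        # Two rectangles overlap when their extents intersect on both axes.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def assign_gates(gate_to_component, component_to_box, default_box):
    """Map every gate to the box of its component.

    Gates whose component has no box of its own (assumed here to be the
    "dust" logic) are assigned to the unit-wide default box.
    """
    return {gate: component_to_box.get(component, default_box)
            for gate, component in gate_to_component.items()}


# Illustrative data: two components whose boxes deliberately overlap,
# which leaves the placer some freedom near the shared boundary.
unit_box = Box("box_unit", 0, 0, 110, 40)
boxes = {"FU1": Box("box_FU1", 0, 0, 60, 40),
         "FU2": Box("box_FU2", 50, 0, 110, 40)}
gates = {"g1": "FU1", "g2": "FU2", "g3": "dust"}

placement = assign_gates(gates, boxes, unit_box)
```

Each gate ends up constrained to a region rather than to a fixed site, so the place and route tools still choose the exact location inside the box.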

[Figure: components FU1 to FU5 and dust, shown in the VHDL entry and in the layout]
6
Hierarchical Synthesis and Pre-placement of
Components: benefits
  • Different components get their best
    power/performance/area characteristics when
    synthesized with different directives
  • Gates inside components are sized for smaller
    area; only gates constituting dust use high-power
    books
  • Most of the wires are restricted within smaller
    areas and are therefore short
  • Most of the gates in the design use low power
    books
  • Both area and power are saved

[Figure: layout with slices 0 to 3, control slice, pointer update unit (3), and bit reverse unit (0)]
7
Latch grouping: fine-grain clock gating
[Figure: VHDL entry vs. post-synthesis processing. Labels: single or multi-bit behavioral latches; Gate1, Gate2; clk1, clk2; cclk; C; grid clock. Caption: CG-OR instances define gated latch groups]
  • Gating granularity is controlled down to the
    group of latches driven by the same splitter
  • Performs early (L1 and L2) gating, similar to
    the ASIC Clock-OR methodology
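
A small behavioral model may help picture OR-style gating. It is a sketch under stated assumptions: the local clock of a latch group is taken to be the grid clock ORed with the gate signal, the waveforms are sampled once per half cycle, and the function names are invented for the example. It does not model the real splitter or the LSSD latches.

```python
def gated_clock(grid_clock, gate):
    """OR-style gating (assumed model): the local clock is held high while gate is 1."""
    return [c | g for c, g in zip(grid_clock, gate)]


def count_rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


# One sample per half cycle. The gate changes only while the grid clock is
# high, so the ORed output never glitches in this example.
grid = [1, 0, 1, 0, 1, 0, 1, 0]
gate = [0, 0, 1, 1, 1, 1, 0, 0]   # latch group idle for the middle two cycles

local = gated_clock(grid, gate)
```

In this model the gated group sees one rising edge instead of three, which is where the clock power saving comes from. The requirement that the gate settle while the clock is still high is why this style depends on the gating condition being available early.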

8
Latch Grouping without clock gating
[Figure: VHDL entry vs. post-synthesis processing. Labels: single or multi-bit behavioral latches; clk1, clk2; grid clock. Caption: buffer instances with special names define clock/latch groups]
  • Designer controls latch grouping by inserting
    special placeholder buffers to form latch groups
  • Post-synthesis script replaces buffers with
    splitters and behavioral latches with LSSD L1/L2
    latches
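
The post-synthesis replacement step can be illustrated with a toy netlist rewrite. Everything specific here is hypothetical: the `clkgrp_` name prefix, the cell names `BUF`, `BEH_LATCH`, `CLK_SPLITTER` and `LSSD_L1L2`, and the dictionary-based netlist format are assumptions for the sketch, not the names used by the actual script.

```python
PLACEHOLDER_PREFIX = "clkgrp_"   # hypothetical naming convention for placeholder buffers


def post_synthesis_rewrite(instances):
    """Return a copy of the netlist with placeholders swapped for real cells.

    - buffers whose instance name starts with PLACEHOLDER_PREFIX become splitters
    - behavioral latches become LSSD L1/L2 latches
    - every other instance is left untouched
    """
    rewritten = []
    for inst in instances:
        new_inst = dict(inst)
        if inst["cell"] == "BUF" and inst["name"].startswith(PLACEHOLDER_PREFIX):
            new_inst["cell"] = "CLK_SPLITTER"
        elif inst["cell"] == "BEH_LATCH":
            new_inst["cell"] = "LSSD_L1L2"
        rewritten.append(new_inst)
    return rewritten


def latch_groups(instances):
    """Group latch names by the instance that drives their clock pin."""
    groups = {}
    for inst in instances:
        if inst["cell"] in ("BEH_LATCH", "LSSD_L1L2"):
            groups.setdefault(inst["clock_driver"], []).append(inst["name"])
    return groups


netlist = [
    {"name": "clkgrp_a", "cell": "BUF", "clock_driver": None},
    {"name": "data_buf", "cell": "BUF", "clock_driver": None},
    {"name": "r0", "cell": "BEH_LATCH", "clock_driver": "clkgrp_a"},
    {"name": "r1", "cell": "BEH_LATCH", "clock_driver": "clkgrp_a"},
]

result = post_synthesis_rewrite(netlist)
```

Because the grouping is carried by instance names, the VHDL stays generic and the latch-to-splitter mapping survives synthesis unchanged.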

9
Pre-placement of latches and clock splitters
  • Clock wires are short, which reduces power
  • The length of clock wires is under control,
    resulting in small clock skew, higher frequency
    and faster convergence on timing
  • Bit-precise placement of dataflow latches
    enforces bit ordering in the datapath, resulting
    in improved routability and savings in power and
    area
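
As a rough illustration of bit-precise placement, the sketch below generates one placement site per latch bit so that bit order in the register matches physical order in the datapath. The function name, the single-row arrangement, and the pitch value are assumptions made for the example. The 40-bit width matches the latch width mentioned elsewhere in the slides.

```python
def seed_bit_stack(width, bit_pitch, x_origin=0.0, y=0.0):
    """Return {bit index: (x, y)} with bits laid out in ascending order.

    bit_pitch is the horizontal distance between neighboring bits, in
    arbitrary layout units (hypothetical value chosen by the designer).
    """
    return {bit: (x_origin + bit * bit_pitch, y) for bit in range(width)}


# Seed a 40-bit dataflow latch with an illustrative pitch of 2.0 units.
sites = seed_bit_stack(width=40, bit_pitch=2.0)
x_positions = [sites[bit][0] for bit in range(40)]
```

With every bit pinned to its own column, wires between neighboring datapath stages run straight instead of crossing, which is the routability benefit the slide refers to.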

[Figure: clock distribution, latches, and clock splitters, compared with the clock without pre-placement]
10
Instantiation and pre-placement of decoupling
buffers: methodology
  • Used when a long wire or non-critical block of
    logic needs to be decoupled from the critical
    path
  • Decoupling buffers are instantiated in VHDL, and
    preserved in the synthesis and post-synthesis
    steps
  • Overlapping pre-placement boxes are created in
    the layout, and decoupling buffers are assigned
    to the appropriate boxes

[Figure: latches, a decoupling buffer, FU1 and FU2, shown as VHDL entry (case 1), VHDL entry (case 2), and layout]
11
Instantiation and pre-placement of decoupling
buffers: benefits
  • The power level of the decoupling buffers is
    precisely controlled, without impacting the gates
    constituting the components (FUs)
  • Keeps the power level of most books inside the
    unit small, with high-power books used only where
    they need to drive long wires or high fan-out
    (FO)
  • Decoupling high capacitance nodes from critical
    paths improves speed

[Figure: decoupling buffers, 40-bit latch, output wires]
12
Core assembly and timing methodology overview:
1) generation of abstracts (unit level)
[Flow diagram labels, in order: unit layout; Chipbench, extraction of global wiring (pd file); EinsTimer; unit layout abstract; unit timing abstract]
13
Core assembly and timing methodology overview:
2) final step (core level)
[Flow diagram labels, in order: top schematic; generate physical hierarchy; unit layout abstract; unit timing abstract; top floorplan; placement (Cadence Preview, SKILL scripts); routing (CCAR); top routed floorplan; Chipbench, extraction of global wiring; EinsTimer]
14
Placed and routed eLite core
[Figure: core floorplan. Labeled blocks: custom instruction memory (32 kB); custom vector register file (256 x 16 bit, 8 read / 4 write); custom data memory (32 kB); VPU; DEC; AU; IU; BU; BIU; CR; VMU; reduct unit; vector control; slices 0 to 3, with four pairs of 16-bit and 40-bit labels; X bus control; SD bus control]
15
Conclusion
  • Significant improvements
  • speed improvement compared with the standard ASIC
    flow (critical path reduced from 3 ns to 2 ns in
    some units)
  • area reduction (> 30%) due to dominant usage of
    low-power cells
  • power reduction (in the range of 50%)
  • Careful pre-placement of clock splitters and
    clock gating circuitry allows more time for
    calculating the clock gating conditions
  • The available time increased from 0.1 ns to
    0.6 ns for the 500 MHz WC design with highly
    efficient OR-style (early) clock gating, making
    it possible to clock gate 90% of eligible latches
  • Generic VHDL: easy to maintain, port and
    simulate
  • Short time from VHDL to layout, fast turn-around
    time to close on timing, with consistent
    convergence
  • up to 3 VHDL-to-layout iterations per unit per
    day by 2 to 3 designers
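
The timing figures quoted above can be put in relation with some simple arithmetic. The helper names are invented for this sketch, and expressing the gating window as a fraction of the cycle is an interpretation added here, not a number stated in the slides.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def freq_mhz(period):
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / period


cycle = period_ns(500.0)            # cycle time of the 500 MHz WC design
asic_freq = freq_mhz(3.0)           # frequency implied by a 3 ns critical path
speedup = 3.0 / 2.0                 # critical path reduced from 3 ns to 2 ns

# Share of the cycle available for computing the clock gating condition,
# before and after the pre-placement of splitters and gating circuitry.
window_before = 0.1 / cycle
window_after = 0.6 / cycle
```

Under these assumptions a 2 ns critical path corresponds to the 500 MHz target, a 3 ns path to roughly 333 MHz, and the gating window grows from about 5% to about 30% of the cycle.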