Design Productivity Crisis - PowerPoint PPT Presentation

Provided by: bwrcEecs
1
Design Techniques
Borivoje Nikolic, bora@eecs.berkeley.edu
BWRC Winter Research Retreat
January 13, 2003
2
Outline
  • Power-constrained design
      • Students: D. Markovic, V. Stojanovic (Stanford)
      • B. Brodersen, M. Horowitz
  • Update on various designs
      • Dual-supply ALU (Y. Shimazaki, R. Zlatanovici)
      • Datapaths for maskless lithography (B. Wild, B. Warlick)
      • Iterative decoding (E. Yeo, E. Liao)
      • Background-calibrated ADC (Y. Chiu, B. Tsang)
      • Transistor modeling (J. Garrett)
      • Robust design (R. Zlatanovici, S. Vamvakos)

3
Power limited operation
[Figure: Energy/op vs. Delay; unoptimized design point; axis markers Emin, Emax, Dmin, Dmax]
Achieve the highest performance under the power cap.
4
Power limited operation
[Figure: Energy/op vs. Delay; unoptimized design point; design optimization curve for Var1; axis markers Emin, Emax, Dmin, Dmax]
Achieve the highest performance under the power cap.
5
Power limited operation
[Figure: Energy/op vs. Delay; unoptimized design point; design optimization curves for Var1 and Var2; axis markers Emin, Emax, Dmin, Dmax]
Achieve the highest performance under the power cap.
6
Power limited operation
[Figure: Energy/op vs. Delay; unoptimized design point; design optimization curves for Var1, Var2, and Var1 + Var2 jointly; axis markers Emin, Emax, Dmin, Dmax]
How far away are we from the optimal solution?
7
Power limited operation
[Figure: Energy/op vs. Delay; unoptimized design point; design optimization curves for Var1, Var2, Var1 + Var2, and the global optimum; axis markers Emin, Emax, Dmin, Dmax]
Global optimum: best performance
8
Power limited operation
[Figure: Energy/op vs. Delay; unoptimized design point; axis markers Emin, Emax, Dmin, Dmax]
Maximize throughput for a given energy,
or minimize energy for a given throughput.
9
Design optimization
  • There are many sets of parameters to adjust
  • Tuning variables
      • Circuit (sizing, supply, threshold)
      • Logic style (domino, pass-gate, ...)
      • Block topology (adder: CLA, CSA, ...)
      • Micro-architecture (parallel, pipelined)

10
Design optimization
  • There are many sets of parameters to adjust
  • Tuning variables
      • Circuit (sizing, supply, threshold)
      • Logic style (domino, pass-gate, ...)
      • Block topology (adder: CLA, CSA, ...)
      • Micro-architecture (parallel, pipelined)
  • Globally optimal boundary: pieces of the E-D curves of the different topologies
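The globally optimal boundary can be pictured as the pointwise minimum (lower envelope) of the optimized E-D curves of the candidate topologies. A minimal sketch, assuming each topology's curve is sampled on a common delay grid; the topology names and curve shapes below are hypothetical and made up for illustration, not data from the talk.

```python
from typing import Dict, List, Tuple

def optimal_boundary(curves: Dict[str, List[float]],
                     delays: List[float]) -> List[Tuple[float, float, str]]:
    """Return (delay, energy, topology) points on the lower envelope.

    curves maps a topology name to its energy samples, one per delay.
    """
    boundary = []
    for i, d in enumerate(delays):
        # At each delay target, keep whichever topology needs the least energy.
        name, energy = min(((n, e[i]) for n, e in curves.items()),
                           key=lambda item: item[1])
        boundary.append((d, energy, name))
    return boundary

# Made-up E(D) curves: one fast but energy-hungry, one lean but slower.
delays = [12.0, 15.0, 20.0, 30.0, 45.0]
curves = {
    "fast_topology": [40.0 / (d - 10.0) + 10.0 for d in delays],
    # infinite energy where this topology cannot meet the delay at all
    "lean_topology": [10.0 / (d - 14.0) + 4.0 if d > 14.0 else float("inf")
                      for d in delays],
}
for d, e, name in optimal_boundary(curves, delays):
    print(f"D={d:5.1f}  E={e:6.2f}  {name}")
```

The boundary switches from one topology's curve to the other's as the delay target is relaxed, which is why it is made of pieces of several curves.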

11
Energy-delay sensitivity
  • $\Delta E = \mathrm{Sens}(A)\cdot(-\Delta D) + \mathrm{Sens}(B)\cdot\Delta D$

At the optimal point, all sensitivities should be the same.
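The equal-sensitivity condition can be checked numerically. A minimal sketch, assuming two hypothetical tuning knobs A and B, each with an energy cost E(D) = a/D for the delay D allotted to it, and a fixed total delay D_A + D_B. Sens is taken here as -dE/dD (energy saved per unit of delay given up), so the ΔE expression is the net energy change when a delay ΔD is shifted from knob B to knob A. For this model the optimum has D_A : D_B = √a_A : √a_B, exactly where a_A/D_A² = a_B/D_B².

```python
def energy(a: float, d: float) -> float:
    """Energy of a knob with the hypothetical trade-off E(D) = a / D."""
    return a / d

def sens(a: float, d: float) -> float:
    """Sensitivity -dE/dD of that knob: energy saved per unit delay given up."""
    return a / d**2

def optimal_split(a_A: float, a_B: float, total: float, steps: int = 100_000):
    """Grid-search the split D_A + D_B = total that minimizes E_A + E_B.

    Returns (total energy, D_A, D_B).
    """
    best = None
    for k in range(1, steps):
        d_A = total * k / steps
        d_B = total - d_A
        e = energy(a_A, d_A) + energy(a_B, d_B)
        if best is None or e < best[0]:
            best = (e, d_A, d_B)
    return best

e_tot, d_A, d_B = optimal_split(a_A=4.0, a_B=1.0, total=3.0)
print(f"D_A={d_A:.3f}  D_B={d_B:.3f}  E={e_tot:.3f}")
print(f"Sens(A)={sens(4.0, d_A):.3f}  Sens(B)={sens(1.0, d_B):.3f}")
```

Away from the optimum, moving delay toward the knob with the larger sensitivity lowers total energy, so the search settles where the two sensitivities match.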
12
Alpha-power based delay model
  • Fitting parameters
      • Von, αd, Kd

13
Alpha-power based delay model
  • Fitting parameters
      • Von, αd, Kd
  • Effective fanout, heff
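The model's equation itself is not recoverable from this transcript. As a minimal sketch, assuming the common alpha-power form t_p = Kd · Vdd / (Vdd − Von − ΔVth)^αd · heff, the per-gate and path delays could be computed as below; the functional form and every parameter value are illustrative assumptions, not fitted data from the talk.

```python
from typing import Sequence

def gate_delay(vdd: float, h_eff: float, k_d: float = 1.0,
               v_on: float = 0.35, alpha_d: float = 1.5,
               dvth: float = 0.0) -> float:
    """Assumed alpha-power delay: K_d * Vdd / (Vdd - V_on - dVth)**alpha_d * h_eff.

    vdd: supply [V]; h_eff: effective fanout of the gate;
    k_d, v_on, alpha_d: fitting parameters (hypothetical values here);
    dvth: threshold shift from its nominal value [V].
    """
    overdrive = vdd - v_on - dvth
    if overdrive <= 0:
        raise ValueError("supply too low for this model")
    return k_d * vdd / overdrive**alpha_d * h_eff

def path_delay(vdd: float, fanouts: Sequence[float], **params: float) -> float:
    """Delay of a chain of gates as the sum of the per-stage delays."""
    return sum(gate_delay(vdd, h, **params) for h in fanouts)

nominal = path_delay(1.8, [4.0, 4.0, 4.0])
scaled = path_delay(1.2, [4.0, 4.0, 4.0])
print(f"delay ratio at 1.2 V vs. 1.8 V: {scaled / nominal:.2f}")
```

Delay is linear in heff and grows as the supply approaches Von + ΔVth, which is what makes sizing and supply interchangeable knobs in the sensitivity analysis.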

14
Energy model
  • Switching energy
  • Leakage energy
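A minimal sketch of the two energy terms, assuming the usual forms E_sw = activity · C · Vdd² for switching and E_lk = Ioff · Vdd · Tcycle for leakage, with Ioff falling exponentially as Vth rises. The activity, capacitance, slope factor n, and current values below are hypothetical.

```python
import math

def switching_energy(activity: float, cap: float, vdd: float) -> float:
    """Dynamic energy per operation: activity * C * Vdd^2 [J]."""
    return activity * cap * vdd**2

def leakage_energy(i_off0: float, vth: float, vdd: float, t_cycle: float,
                   vth0: float = 0.4, n: float = 1.5,
                   v_t: float = 0.026) -> float:
    """Leakage energy per cycle: I_off * Vdd * T_cycle [J].

    I_off = i_off0 * exp(-(Vth - Vth0) / (n * kT/q)), i.e. i_off0 at Vth0.
    """
    i_off = i_off0 * math.exp(-(vth - vth0) / (n * v_t))
    return i_off * vdd * t_cycle

e_sw = switching_energy(activity=0.1, cap=100e-15, vdd=1.8)
e_lk = leakage_energy(i_off0=1e-6, vth=0.4, vdd=1.8, t_cycle=1e-9)
print(f"E_sw = {e_sw * 1e15:.1f} fJ, E_lk = {e_lk * 1e15:.2f} fJ")
```

Note that the leakage term grows with the cycle time, so slowing a design down makes leakage a larger share of the energy per operation.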

15
Sensitivity to sizing and supply
  • Gate sizing (Wi)
      • Sensitivity is ∞ for equal heff (Dmin)
  • Supply voltage (Vdd)
      • xv = (Von + ΔVth)/Vdd
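Assuming a delay of the alpha-power form D = K · Vdd / (Vdd − Von − ΔVth)^αd and a switching energy E = C · Vdd², differentiating both with respect to Vdd gives a supply sensitivity −(∂E/∂Vdd)/(∂D/∂Vdd) = 2 (E/D) (1 − xv)/(αd − 1 + xv), written in terms of xv. The sketch below checks that closed form against finite differences; the model forms and parameter values are assumptions for illustration.

```python
K, V_ON, DVTH, ALPHA, CAP = 1.0, 0.3, 0.1, 1.5, 1.0  # hypothetical values

def delay(vdd: float) -> float:
    """Assumed alpha-power delay D = K * Vdd / (Vdd - V_on - dVth)**alpha."""
    return K * vdd / (vdd - V_ON - DVTH) ** ALPHA

def energy(vdd: float) -> float:
    """Switching energy E = C * Vdd**2 (leakage ignored in this sketch)."""
    return CAP * vdd**2

def sens_vdd_numeric(vdd: float, h: float = 1e-6) -> float:
    """-(dE/dVdd) / (dD/dVdd) from central differences."""
    de = (energy(vdd + h) - energy(vdd - h)) / (2 * h)
    dd = (delay(vdd + h) - delay(vdd - h)) / (2 * h)
    return -de / dd

def sens_vdd_closed(vdd: float) -> float:
    """2 (E/D) (1 - x_v) / (alpha - 1 + x_v), with x_v = (V_on + dVth) / Vdd."""
    x_v = (V_ON + DVTH) / vdd
    return 2 * energy(vdd) / delay(vdd) * (1 - x_v) / (ALPHA - 1 + x_v)

for vdd in (1.0, 1.4, 1.8):
    print(f"Vdd={vdd}: numeric={sens_vdd_numeric(vdd):.4f} "
          f"closed={sens_vdd_closed(vdd):.4f}")
```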
16
Sensitivity to Vth
  • Threshold voltage (Vth)

Low initial leakage ⇒ speedup comes for free
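The "speedup for free" remark follows from the same sensitivity definition. A sketch, assuming leakage energy proportional to exp(−Vth/(n · kT/q)) and an alpha-power delay with overdrive Vdd − Vth: lowering Vth costs energy in proportion to the leakage already present, so a design that starts with little leakage buys delay at almost no energy cost. All values below are hypothetical.

```python
import math

N_VT = 1.5 * 0.026     # n * kT/q in volts (hypothetical slope factor n = 1.5)
VDD, ALPHA = 1.2, 1.5  # hypothetical supply and alpha-power exponent

def delay(vth: float) -> float:
    """Assumed alpha-power delay; raising Vth shrinks the overdrive Vdd - Vth."""
    return VDD / (VDD - vth) ** ALPHA

def leakage(vth: float, e_lk0: float, vth0: float = 0.35) -> float:
    """Leakage energy equal to e_lk0 at vth0, growing exponentially as Vth drops."""
    return e_lk0 * math.exp(-(vth - vth0) / N_VT)

def sens_vth(vth: float, e_lk0: float, h: float = 1e-6) -> float:
    """-(dE/dVth) / (dD/dVth) by central differences.

    Switching energy does not depend on Vth in this model, so only
    the leakage term contributes to dE/dVth.
    """
    de = (leakage(vth + h, e_lk0) - leakage(vth - h, e_lk0)) / (2 * h)
    dd = (delay(vth + h) - delay(vth - h)) / (2 * h)
    return -de / dd

for e_lk0 in (0.001, 0.1):  # leakage energy at the nominal Vth, arbitrary units
    print(f"initial leakage {e_lk0}: Sens(Vth) = {sens_vth(0.35, e_lk0):.6f}")
```

In this model Sens(Vth) scales linearly with the initial leakage, so a hundredfold less leakage makes a Vth reduction a hundred times cheaper per unit of delay gained.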
17
Optimization setup
  • Reference/nominal circuit
      • sized for Dmin at Vddnom, Vthnom
      • known average activity
  • Set delay constraint
  • Minimize energy under delay constraint
      • gate sizing
      • Vdd, Vth scaling
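A minimal sketch of the last two steps for a single knob, assuming a hypothetical alpha-power delay and a Vdd² switching energy: relax the delay constraint by dinc relative to the nominal design, then find by bisection the lowest Vdd that still meets it. Because energy falls monotonically with Vdd in this model, the lowest feasible supply is the energy minimum for this one knob; joint sizing and Vth tuning would need a real optimizer.

```python
def delay(vdd: float, v_on: float = 0.4, alpha: float = 1.5) -> float:
    """Normalized alpha-power style delay (hypothetical parameters)."""
    return vdd / (vdd - v_on) ** alpha

def energy(vdd: float) -> float:
    """Normalized switching energy, proportional to Vdd^2."""
    return vdd**2

def min_vdd_for_delay(d_max: float, lo: float, hi: float,
                      iters: int = 60) -> float:
    """Bisection for the lowest Vdd in [lo, hi] with delay(Vdd) <= d_max.

    Assumes delay decreases monotonically with Vdd on this interval
    and that hi already meets the constraint.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delay(mid) <= d_max:
            hi = mid  # feasible: try a lower supply
        else:
            lo = mid  # too slow: need a higher supply
    return hi

vdd_nom = 1.8
d_inc = 0.10  # allow 10% more delay than the nominal design
d_target = (1 + d_inc) * delay(vdd_nom)
vdd_opt = min_vdd_for_delay(d_target, lo=0.5, hi=vdd_nom)
saving = 1 - energy(vdd_opt) / energy(vdd_nom)
print(f"Vdd = {vdd_opt:.3f} V, energy saving = {saving:.1%}")
```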

18
Circuit Examples
  • Inverter chain
      • No off-path load or reconvergence
  • Memory decoder
      • Off-path load without reconvergence
  • Adder
      • Off-path load with reconvergence

19
SRAM Decoder Energy Profile
[Figure: normalized energy (0 to 100) of the SRAM decoder, with labels m4, m2, m1, m8; an internal energy peak is marked]
20
W vs. Vdd for Reducing Energy Peak
[Figure: reference design vs. optimized design]
  • Vdd less effective than W optimization
  • Buffering also reduces energy peak

also B. Amrutur, M. Horowitz, JSSC 10/01
21
Kogge-Stone Tree Adder Topology
  • Off-path load (gates + wires)
  • Reconvergence (inside ?-block)

22
Tree Adder Optimization Results
  • Reference: all paths are critical
[Figure: tree adder energy vs. delay; nominal design at D = Dmin, with sizing and 2Vdd optimization curves]
  • Internal energy ⇒ W more effective than Vdd
  • W: E (-54%), 2Vdd: E (-27%) at dinc = 10%

23
Joint optimization: sizing and Vdd
[Figure: Energy/op vs. Delay around the nominal design]
$\Delta E = \mathrm{Sens}(V_{dd})\cdot(-\Delta D) + \mathrm{Sens}(W)\cdot\Delta D$
24
Results of joint optimization
[Sensitivity table]
80% of energy saved without delay penalty
25
Reducing the number of dimensions
Threshold and sizing nearly optimal around the
nominal point
26
Scope of circuit optimization
Effective region: ±30% around nominal delay
27
Power-Limited Design Conclusions
  • All design levels need to be optimized jointly
  • Equal marginal costs ⇒ energy-efficient design
  • Peak performance is VERY power inefficient
  • Today's designs are not leaky enough to be truly power-optimal
  • Pipelining starts to gain advantage over parallelism

28
Power-Limited Design Directions
  • Expand the analysis across the pipeline stages
  • Better compact models (preferably convex)
  • Exploring E-D bounds for various functions
  • Design qualification in E-D space
  • Design optimization, jointly with Kurt Keutzer's and Jan Rabaey's groups
  • Robust optimization
  • Robustness through adjustments (supply, body bias)

29
Compact Delay Models
  • Unified transistor model
  • Maps into a convex delay model
  • J. Garrett, R. Zlatanovici

30
Adders in E-D Space
[Figure: total transistor width per bitslice (unit widths, 0 to 2500) vs. delay (FO4, 0 to 150) for radix-2 Kogge-Stone, radix-4 Kogge-Stone, radix-4 2-sparse, radix-4 4-sparse, CLA, and ripple-carry (RCA) adders]
31
Robust Optimization
  • Example: robust linear programming (LP)
      • LP with uncertainty on some parameters
      • Uncertain parameters lie in given ellipsoids Ei
      • Worst case: require the constraints to be satisfied for all values of the parameters
      • Can be formulated as a convex second-order cone program
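The key step behind the cone formulation: if a constraint's coefficient vector a is only known to lie in an ellipsoid {ā + P u : ‖u‖₂ ≤ 1}, the worst case of aᵀx over that set is āᵀx + ‖Pᵀx‖₂, so requiring aᵀx ≤ b for every such a becomes the second-order cone constraint āᵀx + ‖Pᵀx‖₂ ≤ b. The sketch below checks this identity by sampling; ā, P, and x are made-up numbers.

```python
import math
import random

def worst_case(a_bar, P, x):
    """Closed form a_bar^T x + ||P^T x||_2 over {a_bar + P u : ||u|| <= 1}."""
    n = len(x)
    nominal = sum(a_bar[i] * x[i] for i in range(n))
    # (P^T x)_j = sum_i P[i][j] * x[i]
    pt_x = [sum(P[i][j] * x[i] for i in range(n)) for j in range(len(P[0]))]
    return nominal + math.sqrt(sum(v * v for v in pt_x))

def sampled_max(a_bar, P, x, samples=20000, seed=0):
    """Largest a^T x found over random points u on the unit sphere."""
    rng = random.Random(seed)
    n, m = len(x), len(P[0])
    best = -math.inf
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]  # the maximum is attained on the boundary
        a = [a_bar[i] + sum(P[i][j] * u[j] for j in range(m)) for i in range(n)]
        best = max(best, sum(a[i] * x[i] for i in range(n)))
    return best

a_bar = [1.0, 2.0]
P = [[0.3, 0.0], [0.1, 0.2]]
x = [1.0, -1.0]
print(worst_case(a_bar, P, x), sampled_max(a_bar, P, x))
```

Sampling approaches the closed-form value from below but never exceeds it, which is what lets the "for all parameters" requirement collapse into a single norm term.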

32
Optimization with Random Parameters
  • Suppose parameters are random variables with
    known distribution
  • Require that each constraint holds with a probability of at least η
  • Can be formulated as a second order cone program
  • R. Zlatanovici
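For Gaussian parameters the probabilistic constraint becomes a cone constraint: if a ~ N(ā, Σ) and the required probability η is at least 0.5, then Prob(aᵀx ≤ b) ≥ η is equivalent to āᵀx + Φ⁻¹(η) ‖Σ^{1/2} x‖₂ ≤ b, where Φ is the standard normal CDF. A small Monte Carlo check of that equivalence, assuming independent coefficients (diagonal Σ) and made-up numbers; b is placed exactly on the boundary, so the estimated probability should come out close to η.

```python
import math
import random
from statistics import NormalDist

a_bar = [1.0, 2.0]   # mean of the uncertain coefficients (made up)
sigma = [0.3, 0.5]   # standard deviations, independent (diagonal Sigma)
x = [1.5, -0.5]      # a fixed candidate design vector (made up)
eta = 0.95           # required probability

mean = sum(m * xi for m, xi in zip(a_bar, x))
std = math.sqrt(sum((s * xi) ** 2 for s, xi in zip(sigma, x)))  # ||Sigma^(1/2) x||
b = mean + NormalDist().inv_cdf(eta) * std  # constraint holds with equality

rng = random.Random(1)
trials = 100_000
hits = 0
for _ in range(trials):
    a = [rng.gauss(m, s) for m, s in zip(a_bar, sigma)]
    if sum(ai * xi for ai, xi in zip(a, x)) <= b:
        hits += 1
print(f"estimated Prob(a^T x <= b) = {hits / trials:.4f} (target {eta})")
```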

33
Dual-Supply ALU Design
Y. Shimazaki, R. Zlatanovici
[Figure: Power vs. Delay; max power, power limit, and min delay marked; frequency scaling, optimized single-supply design, and dual-supply design]
Optimal design achieves the smallest delay under power constraints.
34
Dual-Supply Designs
  • Dual-Supply-Voltage Technique expands the
    power-delay optimization space.
  • Layout complications and level conversion make it
    impractical for high-speed datapaths in
    conventional implementation.
  • A shared N-well technique is explored on an ALU, a performance-critical path with the highest power density.
  • To be presented at ISSCC'03

35
Shared-Well Dual-Supply-Voltage
[Figure: conventional vs. shared N-well dual-supply circuits; a VDDH circuit and a VDDL circuit with inputs i1, i2 and outputs o1, o2, powered from VDDH, VDDL, and VSS]
36
Conventional Dual-Supply Layout
[Figure: conventional dual-supply layouts with N-well isolation between VDDH and VDDL circuits: (a) dedicated rows (alternating VDDH and VDDL rows), (b) dedicated regions (VDDH region and VDDL region)]
37
Shared-Well Dual-Supply Layout
[Figure: (a) floor plan image of the shared-well layout: VDDH and VDDL circuits in a shared N-well; VDDH, VDDL, and VSS rails]
38
FO4-INV Delay and Leakage
25 °C, VDDH = 1.8 V
[Figure: normalized FO4-INV delay (0 to 3.0) and normalized PMOS Ioff (0.001 to 10, log scale) vs. VDDL (1.0 to 2.0 V), for shared N-well and conventional designs]
39
ALU Block Diagram
[Figure: ALU block diagram: clock gen. (clk), gp gen., carry gen., sum sel., partial sum, logical unit, 5:1 MUX, 9:1 MUXes, 2:1 MUX (s0/s1), INV1, INV2, and a 0.5 pF load on sumb (long loop-back bus); inputs ain, ain0, bin; outputs carry, sum; VDDH and VDDL circuits marked]
40
Sparse Radix-4 Carry Tree
[Figure: sparse radix-4 carry tree from cin and bit0 to bit63: G/P, G4/P4, G16/P16, and G64 levels, SUMSEL, carry c63, sums s0 to s63]
  • There are fewer logic gates, because only every
    fourth carry signal is calculated.
  • Partial sum gates can be placed in empty slots.
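A functional (not timing or circuit) sketch of the sparse idea, assuming a 64-bit adder split into 4-bit groups: group generate/propagate signals give the carry into every fourth bit only, and each group precomputes its sums for carry-in 0 and 1 so the group carry just selects between them, in the spirit of the SUMSEL stage. The carries here come from a simple serial scan over groups rather than the radix-4 tree, and all function names are hypothetical.

```python
from typing import List

WIDTH, GROUP = 64, 4

def bits(v: int, n: int) -> List[int]:
    """Little-endian bit list of v (bit 0 first)."""
    return [(v >> i) & 1 for i in range(n)]

def group_gp(a_bits: List[int], b_bits: List[int]):
    """Group generate/propagate for one slice (bit 0 is the LSB)."""
    g, p = 0, 1
    for ai, bi in zip(a_bits, b_bits):
        gi, pi = ai & bi, ai ^ bi
        g = gi | (pi & g)  # carry generated here or propagated from below
        p = p & pi
    return g, p

def group_sum(a_bits: List[int], b_bits: List[int], cin: int) -> int:
    """Ripple sum of one slice for a fixed carry-in (precomputed per cin)."""
    out, c = 0, cin
    for i, (ai, bi) in enumerate(zip(a_bits, b_bits)):
        out |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return out

def sparse_add(a: int, b: int, cin: int = 0) -> int:
    """Add two WIDTH-bit ints computing only every GROUP-th carry."""
    ab, bb = bits(a, WIDTH), bits(b, WIDTH)
    result, carry = 0, cin  # carry into the current group: c0, c4, c8, ...
    for k in range(0, WIDTH, GROUP):
        sa, sb = ab[k:k + GROUP], bb[k:k + GROUP]
        s0, s1 = group_sum(sa, sb, 0), group_sum(sa, sb, 1)
        result |= (s1 if carry else s0) << k  # sum select by the sparse carry
        g, p = group_gp(sa, sb)
        carry = g | (p & carry)
    return result | (carry << WIDTH)  # carry-out appears as bit WIDTH

print(hex(sparse_add(0xFFFF_FFFF_FFFF_FFFF, 1)))
```

Only 16 carries are produced for 64 bits; the per-group conditional sums absorb the rest, which is where the gate-count savings come from.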

41
Adder Performance Tradeoffs
  • Many choices available to the designer
  • Tradeoff curves generated using optimization
  • Radix-4 4-sparse is best in this case

42
Low Swing Bus Level Converter
[Figure: low-swing bus and domino level converter (9:1 MUX) between VDDL and VDDH: keeper, pc, input ain0, bus sumb, output sum, INV1, INV2, VSS]
43
Test Chip Micrograph
[Micrograph: annotated dimensions 2.0 mm, 1.5 mm, 760 μm, 200 μm]
  • Technology summary
      • 0.18 μm general-purpose bulk
      • 5 metal layers (Al), local interconnect
44
Measured Results Energy Delay
[Figure: measured energy (pJ, 200 to 1000) vs. TCYCLE (ns, 0.6 to 1.6) at room temperature, for VDDH = VDDL and for VDDH = 2.0, 1.8, and 1.6 V as VDDL decreases]
45
Dual-Supply ALU Summary
  • Shared-Well Dual-Supply-Voltage Technique
      • Appropriate for datapath design
      • 30% less area
  • Low Power ALU Design Techniques
      • Sparse radix-4 carry tree
      • Low swing bus and domino level converter
  • Test Chip Measurement
      • 1.16 GHz 64-bit ALU in GP 0.18 μm bulk
      • 33.3% energy saving with 8.3% delay increase
      • 42% leakage current reduction

46
Maskless Lithography Datapaths
Maskless writing using micromirrors.
47
Maskless Lithography Datapaths
[Figure: FIFOs, Huffman decoder with lookup tables (address, RD/WR, table select), Lempel-Ziv decoder (literal/offset, length), CRC check, SRAM writer interface, control, and synch.; compressed data in, decompressed data out; bus widths 8 and 10]
Decompressor row block diagram.
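The Lempel-Ziv stage expands literal and (offset, length) tokens by copying from data it has already produced. A minimal LZ77-style sketch, assuming a hypothetical token stream as it might come out of the Huffman stage; the chip's actual token format, history-buffer size, and bus widths are not described here, so the format and window below are assumptions.

```python
from typing import List, Tuple, Union

# A token is either a literal byte value or an (offset, length) back-reference.
Token = Union[int, Tuple[int, int]]

def lz_decode(tokens: List[Token], window: int = 1024) -> bytes:
    """Expand literals and back-references into the output stream.

    offset counts backwards from the end of the output; length may exceed
    offset, in which case the copy overlaps the bytes it is producing.
    window is a hypothetical history-buffer size.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)  # literal
        else:
            offset, length = tok
            if not 0 < offset <= min(window, len(out)):
                raise ValueError(f"bad offset {offset}")
            start = len(out) - offset
            for i in range(length):  # byte-by-byte copy allows overlap
                out.append(out[start + i])
    return bytes(out)

print(lz_decode([ord("a"), ord("b"), (2, 6), ord("c")]))
```

The byte-at-a-time copy is what a hardware decoder with a single history-buffer read port would naturally do, and it handles overlapping copies (length > offset) without special cases.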
48
Maskless Lithography Datapaths
Performance of test chip vs. full-scale chip
49
Maskless Lithography Datapaths
[Micrograph: Huffman decoder, lookup memory, Lempel-Ziv decoder, FIFO array, SRAM writer interface; single decompression path]
Last SSHAFT 0.18 μm design; fully functional first time
B. Warlick is working on next generation
50
Iterative Decoders
  • Turbo codes comprising convolutional codes concatenated through an interleaver
  • LDPC codes based on finite field geometries
      • Cyclic connectivity between nodes
  • LDPC codes based on Ramanujan graphs
      • Hierarchical connectivity with regular local interconnect





[Figure: turbo encoder with Convolutional Encoder 1, Convolutional Encoder 2, and an interleaver (π)]
E. Yeo, E. Liao
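A minimal sketch of the parallel concatenation behind turbo codes: the same information bits feed two recursive systematic convolutional (RSC) encoders, the second through an interleaver π, giving a rate-1/3 output. The generator (feedback 7, feedforward 5 in octal), the random interleaver, and the lack of trellis termination are simplifying assumptions, not the codes discussed in the talk.

```python
import random
from typing import List, Tuple

def rsc_encode(bits: List[int]) -> List[int]:
    """Parity of an RSC encoder with feedback 1+D+D^2 (octal 7) and
    feedforward 1+D^2 (octal 5); the systematic output is the input itself."""
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # feedback sum (taps of 7 = 111)
        parity.append(a ^ s2)  # feedforward taps of 5 = 101
        s1, s2 = a, s1         # shift the register
    return parity

def turbo_encode(bits: List[int],
                 perm: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Rate-1/3 parallel concatenation: systematic, parity 1, parity 2."""
    interleaved = [bits[i] for i in perm]  # interleaver pi
    return list(bits), rsc_encode(bits), rsc_encode(interleaved)

rng = random.Random(0)
msg = [rng.randint(0, 1) for _ in range(16)]
perm = list(range(len(msg)))
rng.shuffle(perm)  # hypothetical random interleaver
sys_bits, p1, p2 = turbo_encode(msg, perm)
print(sys_bits, p1, p2, sep="\n")
```

The recursive (feedback) encoders give an infinite impulse response, so a single 1 at the input keeps producing parity bits; the interleaver makes it unlikely that both encoders see a low-weight input pattern at the same time.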
51
Background Calibrated ADC
  • Pipeline ADC calibrated by a ??.
  • High speed, 10-bit.

Yun Chiu, Bill Tsang, Prof. Gray