Low-Power Design Techniques in Digital Systems - PowerPoint PPT Presentation


Title: Low-Power Design Techniques in Digital Systems


1
Low-Power Design Techniques in Digital Systems
  • Prof. Vojin G. Oklobdzija
  • University of California

2
Outline of the Talk
  • Power trends in VLSI
  • Scaling theory and predictions
  • Research efforts in power reduction
  • Efficiency measures and design guidelines
  • Latches and Flip-Flops for Low-Power
  • Dual-Edge FFs
  • SOI
  • Conclusion: Low-Power perspective

3
Power trends in VLSI
4
"CMOS circuits dissipate little power by nature." So believed circuit designers (Kuroda-Sakurai, '95).
[Figure: power (W), log scale from 0.01 to 100, versus year, '80 to '95; trend: x4 / 3 years]
By the year 2000, power dissipation of high-end ICs will exceed the practical limits of ceramic packages, even if the supply voltage can be feasibly reduced. (Taken from Sakurai's ISSCC 2001 presentation)
5
Gloom and Doom predictions
Source: Shekhar Borkar, Intel
6
Source: Shekhar Borkar, Intel
7
Power versus year (taken from ISSCC, uP Report, Hot-Chips)
High-end growing at 25% / year
RISC @ 12% / yr
X86 @ 15% / yr
Consumer (low-end) at 13% / year
8
VDD, Power and Current Trend
[Figure: voltage (V), power per chip (W), and VDD current (A) versus year, 1998 to 2014]
International Technology Roadmap for Semiconductors, 1999 update, sponsored by the Semiconductor Industry Association in cooperation with the European Electronic Component Association (EECA), the Electronic Industries Association of Japan (EIAJ), the Korea Semiconductor Industry Association (KSIA), and the Taiwan Semiconductor Industry Association (TSIA). (Taken from Sakurai's ISSCC 2001 presentation)
9
Power Delivery Problem (not just California)
Your car starter!
Source: Shekhar Borkar, Intel
10
Trend in L di/dt
  • di/dt is roughly proportional to $I \cdot f$, where I is the chip's current and f is the clock frequency
  • or $I \cdot V_{dd} \cdot f / V_{dd} = P \cdot f / V_{dd}$, where P is the chip's power.
  • The trend is that $P \cdot f / V_{dd}$ increases: P and f rise while Vdd falls
  • on-chip L and package L decrease only slightly
  • Therefore, L di/dt fluctuation increases significantly.
  • (Taken from Norman Chang, HP)
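
As a rough illustration of the proportionality on this slide, the sketch below compares a quantity proportional to L di/dt at two operating points. The function name is hypothetical, the power, frequency and voltage values are borrowed from the Alpha table later in the talk, and the 10% drop in inductance is an assumption made only to show the direction of the trend.

```python
def relative_ldidt(power_w, freq_hz, vdd_v, inductance=1.0):
    """Return a quantity proportional to L * di/dt, using di/dt ~ P * f / Vdd.

    The result has no absolute meaning; only ratios between two
    operating points are interpretable.
    """
    if vdd_v <= 0:
        raise ValueError("Vdd must be positive")
    return inductance * power_w * freq_hz / vdd_v


# Older point: 50 W, 300 MHz, 3.3 V. Newer point: 72 W, 600 MHz, 2.0 V.
# inductance=0.9 is an assumed "slight decrease" in L, not a measured value.
old = relative_ldidt(power_w=50.0, freq_hz=300e6, vdd_v=3.3)
new = relative_ldidt(power_w=72.0, freq_hz=600e6, vdd_v=2.0, inductance=0.9)
growth = new / old
print(f"L di/dt grows by about {growth:.1f}x")
```

Even with the inductance allowed to fall, the P f / Vdd term dominates, which is the point the slide makes.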

11
Saving Grace !
Energy-Delay product is improving more than 2x /
generation
12
X86 efficiency improving dramatically: 4X / generation
Average improving 3X / generation
High-end processors' efficiency not improving
13
Scaling theory and predictions
14
The power dissipation has increased 1000 times
over the 15 years and is exceeding 70 Watts
  • Scaling principles
  • 1. Constant-field scaling theory (Dennard) assumes that device voltages as well as device dimensions are scaled by a scaling factor x (> 1), resulting in a constant electric field in the device
  • power density remains constant
  • circuit performance can be improved in terms of
  • density: $x^2$
  • speed: $x$
  • power: $1/x^2$
  • power-delay product: $1/x^3$
  • Limitless progress in CMOS is promised with this
    scaling scenario

15
In practice, neither the supply voltage nor the threshold voltage was scaled until 1990, leading to the theory of
  • Constant-voltage scaling, which assumes a constant supply voltage
  • This assumption yields
  • speed improvement by $x^2$
  • power density increasing rapidly by $x^3$
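
The first-order factors quoted for the two scaling scenarios can be tabulated with a small sketch. The function names and dictionary keys are hypothetical, and the exponents are simply the ones listed on the slides, read as powers of the scaling factor x.

```python
def constant_field_scaling(x):
    """First-order gains when dimensions and voltages both scale by x > 1."""
    return {
        "density": x ** 2,
        "speed": x,
        "power_per_circuit": 1 / x ** 2,
        "power_delay_product": 1 / x ** 3,
        "power_density": 1.0,  # density * power per circuit = x^2 * 1/x^2
    }


def constant_voltage_scaling(x):
    """First-order gains when only dimensions scale and voltages stay fixed."""
    return {
        "speed": x ** 2,
        "power_density": x ** 3,
    }


cf = constant_field_scaling(2.0)
cv = constant_voltage_scaling(2.0)
print("constant field:  ", cf)
print("constant voltage:", cv)
```

For x = 2, constant-field scaling keeps power density flat while constant-voltage scaling raises it eightfold, which is why the second scenario could not continue indefinitely.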

16
Constant-field scaling is not realistic; $x^{0.5}$ is satisfactory. However, even with that, the power dissipation would exceed that of ECL by 2001: a new philosophy is required!
(Taken from Sakurai and Kuroda, IEICE '95 paper)
17
High-Performance View Point on Power (taken from Ron Preston, DEC Alpha)
  • $P = k \cdot C \cdot V^2 \cdot f$
  • Shrinking to the new technology (30% reduction in λ)
  • C decreases by 30%
  • f increases by 1/0.7, i.e. 43%
  • $P_{new} = 0.7 \times (1/0.7) \times P_{old} = P_{old}$ (No change in power!)
  • New design
  • Double the no. of devices
  • $P_{new} = 2 \times 0.7 \times (1/0.7) \times P_{old} = 2 \times P_{old}$ (Power doubles!)
  • Scale Vdd by 30% in the new design
  • $P_{new} = 2 \times 0.7 \times (1/0.7) \times (0.7)^2 \times P_{old} \approx P_{old}$ (Power stays constant!)
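
The three cases on this slide can be reproduced numerically. This is a minimal sketch; the function and argument names are hypothetical, and each argument is a new/old ratio applied to P = k C V^2 f.

```python
def relative_power(cap_scale=1.0, vdd_scale=1.0, freq_scale=1.0, device_scale=1.0):
    """Return P_new / P_old for P = k * C * V^2 * f.

    cap_scale, vdd_scale and freq_scale are new/old ratios per device;
    device_scale is the ratio of device counts (total C scales with it).
    """
    return device_scale * cap_scale * vdd_scale ** 2 * freq_scale


shrink = 0.7  # 30% reduction in feature size

same_design = relative_power(cap_scale=shrink, freq_scale=1 / shrink)
doubled = relative_power(cap_scale=shrink, freq_scale=1 / shrink, device_scale=2)
doubled_low_vdd = relative_power(
    cap_scale=shrink, freq_scale=1 / shrink, device_scale=2, vdd_scale=shrink
)

print(f"shrink only:            {same_design:.2f} x P_old")
print(f"double the devices:     {doubled:.2f} x P_old")
print(f"double and scale Vdd:   {doubled_low_vdd:.2f} x P_old")
```

The last case comes out at 0.98 rather than exactly 1, since 2 x 0.7^2 = 0.98; the slide rounds this to "power stays constant".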

18
High-Performance View Point on Power (taken from Ron Preston, DEC Alpha)
  • Reality
  • Paradigm changes: more aggressive circuits, toggle rate increasing, out-of-order and speculative execution
  • What to expect: power will be limited by the package and cooling techniques
  • Frequency will be determined by the power: as high as the package can take!

Chip     λ       Vdd     Freq.    Power
21164    0.5µ    3.3V    300MHz   50W
21264    0.35µ   2.0V    600MHz   72W
Change   -30%    -39%    +100%    +44%
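
The "Change" row can be checked from the two chip rows, taking the 21164 feature size as 0.5 µm. The sketch below also backs out the growth in effective switched capacitance implied by P = k C V^2 f; that last figure is an inference under this model, not a number from the slides, and all names are hypothetical.

```python
def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


alpha_21164 = {"feature_um": 0.5, "vdd_v": 3.3, "freq_mhz": 300, "power_w": 50}
alpha_21264 = {"feature_um": 0.35, "vdd_v": 2.0, "freq_mhz": 600, "power_w": 72}

changes = {
    key: percent_change(alpha_21164[key], alpha_21264[key]) for key in alpha_21164
}

# Under P = k * C * V^2 * f, the product k * C must have grown by this factor
# to explain the measured power ratio (an inference, not a measured value).
power_ratio = alpha_21264["power_w"] / alpha_21164["power_w"]
freq_ratio = alpha_21264["freq_mhz"] / alpha_21164["freq_mhz"]
vdd_ratio = alpha_21264["vdd_v"] / alpha_21164["vdd_v"]
implied_kc_ratio = power_ratio / (freq_ratio * vdd_ratio ** 2)

for key, value in changes.items():
    print(f"{key:11s} {value:+.0f}%")
print(f"implied growth in k*C: {implied_kc_ratio:.2f}x")
```

The implied factor of roughly two is in line with the "double the number of devices" case on the previous slide.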
19
Research Efforts in Low-Power Design
  • Technology scaling
  • The highest win
  • Thresholds should scale
  • Leakage starts to bite
  • Dynamic voltage scaling
  • Reduce the active load
  • Minimize the circuits
  • Use more efficient design
  • Charge recycling
  • More efficient layout

$P_{sw} = k \cdot C_L \cdot V_{cc}^2 \cdot f_{CLK}$
  • Reduce Switching Activity
  • Conditional clock
  • Conditional precharge
  • Switching-off inactive blocks
  • Conditional execution
  • Run it slower
  • Use parallelism
  • Fewer pipeline stages
  • Use double-edge flip-flop

20
Reducing the Power Dissipation
  • The power dissipation can be minimized by
    reducing
  • supply voltage
  • load capacitance
  • switching activity
  • Reducing the supply voltage brings a quadratic
    improvement
  • Reducing the load capacitance contributes to the
    improvement of both power dissipation and circuit
    speed.
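
To make the three levers concrete, the sketch below evaluates the switching-power expression from the previous slide while halving one factor at a time. The function name and the baseline values (activity 0.5, 1 nF, 2 V, 100 MHz) are hypothetical.

```python
def switching_power(activity, load_cap_f, vdd_v, freq_hz):
    """Dynamic switching power P = k * C_L * V^2 * f, in watts."""
    return activity * load_cap_f * vdd_v ** 2 * freq_hz


# Hypothetical baseline operating point.
baseline = switching_power(activity=0.5, load_cap_f=1e-9, vdd_v=2.0, freq_hz=100e6)
half_vdd = switching_power(activity=0.5, load_cap_f=1e-9, vdd_v=1.0, freq_hz=100e6)
half_cap = switching_power(activity=0.5, load_cap_f=0.5e-9, vdd_v=2.0, freq_hz=100e6)
half_act = switching_power(activity=0.25, load_cap_f=1e-9, vdd_v=2.0, freq_hz=100e6)

print(f"baseline:        {baseline:.3f} W")
print(f"half Vdd:        {half_vdd:.3f} W ({baseline / half_vdd:.0f}x lower)")
print(f"half C_L:        {half_cap:.3f} W ({baseline / half_cap:.0f}x lower)")
print(f"half activity:   {half_act:.3f} W ({baseline / half_act:.0f}x lower)")
```

Halving the supply voltage gives a fourfold reduction, while halving the capacitance or the activity gives only a twofold one.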

21
Voltage Scaling
  • There are three means to maintain the throughput
  • Reduce Vth to improve circuit speed
  • Introduce parallel and pipelined architecture while using slower device speeds (assumes a limitless number of transistors; in reality the transistor density is only increasing by 60% per year)
  • Prepare multiple supply voltages and, for each cluster of circuits, choose the lowest supply voltage that satisfies the speed requirement. (A good level converter is necessary, one that exhibits small delay and consumes little power and small area)
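
The third option, choosing the lowest adequate supply for each cluster, can be sketched as follows. The delay model is an assumption (an alpha-power-law form, delay proportional to Vdd / (Vdd - Vth)^alpha), and the supply values, Vth, alpha, cluster names and slack factors are all hypothetical. Level-converter delay is ignored.

```python
def relative_delay(vdd, vth=0.4, alpha=1.3):
    """Gate delay up to a constant factor, alpha-power-law model (assumed)."""
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return vdd / (vdd - vth) ** alpha


def lowest_vdd(slack_factor, supplies, vdd_nominal):
    """Pick the lowest supply whose delay fits within the allowed slowdown.

    slack_factor is the allowed delay relative to the delay at vdd_nominal
    (1.0 means the cluster is on the critical path).
    """
    budget = slack_factor * relative_delay(vdd_nominal)
    feasible = [v for v in supplies if relative_delay(v) <= budget]
    if not feasible:
        raise ValueError("no supply meets the timing budget")
    return min(feasible)


supplies = [3.3, 2.5, 1.9]  # hypothetical available supply voltages
clusters = {"critical": 1.0, "moderate": 1.3, "relaxed": 2.0}
assignment = {
    name: lowest_vdd(slack, supplies, vdd_nominal=3.3)
    for name, slack in clusters.items()
}
print(assignment)
```

Only the cluster on the critical path keeps the full supply; clusters with timing slack drop to a lower rail and save power quadratically.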

22
(No Transcript)
23
Is there an optimal design point ?
24
Power Dissipation and Circuit Delay
[Figure: power (W, scale x 10^-4) plotted against VDD (V) and Vth (V)]
( Taken from T. Sakurai)
25
Sensitivity to Vth fluctuation
[Figure: sensitivity curves; labels recoverable from the plot: VDD = 1.0 V, VTH, 0.15V, 0.05V, 0.5]
( Taken from T. Sakurai)
26
Power-Delay Product, Energy-Delay Product
Lowest voltage, highest threshold: no optimum
(from Sakurai, Kuroda, IEICE '95 paper)
  • Power-Delay Product is a misleading measure: it will always favor a processor that operates at a lower frequency
  • Energy-Delay is more adequate, but Energy-Delay² should be used
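
The difference between the three metrics shows up in a toy model. The sketch below assumes energy per operation E = C V^2 and delay = k / V (threshold voltage ignored); these assumptions, the constants, and the function names are illustrative and not taken from the slides.

```python
def power_delay_product(power_w, delay_s):
    """PDP = P * t: energy per operation, in joules."""
    return power_w * delay_s


def energy_delay_product(power_w, delay_s):
    """EDP = E * t = P * t^2."""
    return power_w * delay_s ** 2


def energy_delay2_product(power_w, delay_s):
    """ED^2 = E * t^2 = P * t^3."""
    return power_w * delay_s ** 3


def scaled_design(vdd, c_eff=1e-9, k_delay=2e-9):
    """Toy model (assumed): E = C * V^2 per operation, delay = k / V."""
    energy = c_eff * vdd ** 2
    delay = k_delay / vdd
    return energy / delay, delay  # (power, delay)


slow = scaled_design(vdd=1.0)  # same design, low voltage, low frequency
fast = scaled_design(vdd=2.0)  # same design, high voltage, high frequency

for label, metric in [
    ("PDP", power_delay_product),
    ("EDP", energy_delay_product),
    ("ED^2", energy_delay2_product),
]:
    print(f"{label:5s} slow/fast = {metric(*slow) / metric(*fast):.2f}")
```

Under these assumptions PDP and EDP both rate the slower operating point as better, while ED² rates the two points equally, so it compares designs rather than supply-voltage settings.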

27
Power-Delay Product, Energy-Delay Product
Horowitz, Indermaur, Gonzales argue against Power-Delay (SLPE '94)
28
Energy-Delay²
(courtesy of Prof. T. Sakurai)
29
Energy-Delay Product vs. Energy-Delay²
Nowka, Hofstee, Carpenter of IBM argue against
Energy-Delay as a design efficiency measure
(private communication)
30
Energy-Delay Product vs. Energy-Delay²
The same design should have relatively the same efficiency
Optimal point (due to Vth being fixed?)
Nowka, Hofstee, Carpenter of IBM argue against
Energy-Delay as a design efficiency measure
(private communication)
31
Example: PowerPC
32
(No Transcript)
33
Use of Different Circuit Families
34
Capacitance Reduction
  • The load capacitance is the sum of
  • gate capacitance
  • diffusion capacitance
  • routing capacitance
  • Using a small number of transistors, or small transistors, contributes to the reduction in the gate capacitance and the diffusion capacitance.
  • Pass-transistor logic may have an advantage because it comprises fewer transistors and exhibits smaller stray capacitance than conventional static CMOS logic.

35
Pass-Transistor Logic
36
Pass-Transistor Logic: CVSL, CPL, SRPL, DSL, DPL, DCVSPG
37
SAPL: Sense-Amplifying Pass-transistor Logic
All nodes are first discharged and then evaluated by inputs. Outputs are 100mV above GND.
38
Where does the power go ?
39
Power use is different from chip to chip
(from Sakurai, Kuroda, IEICE '95 paper)
MPU1 is a low-end microprocessor; MPU2 is a high-end CPU with a large cache; ASSP1 is an MPEG-2 decoder; ASSP2 is an ATM switch
40
Design Example: Strong Arm 110
Two power modes: idle and sleep
Power: 0.5W using 1.1V internal PS, 184 Dhrystone MIPS @ 162MHz; 1.1W using 2V internal PS, 245 Dhrystone MIPS @ 215MHz
Power breakdown: I-Cache 27%, D-Cache 16%, I-Unit 18%, Exec-Unit 8%, I-MMU 9%, D-MMU 8%, Clock 10%, Others 4% (PLL < 1%)
from D. Dobberpuhl
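
The figures on this slide can be cross-checked with a short sketch. It reads the breakdown numbers as percentages of total power and the benchmark figures as Dhrystone MIPS, both of which are interpretations of the flattened slide text; the variable names are hypothetical.

```python
breakdown_percent = {
    "I-Cache": 27, "D-Cache": 16, "I-Unit": 18, "Exec-Unit": 8,
    "I-MMU": 9, "D-MMU": 8, "Clock": 10, "Others": 4,
}
operating_points = [
    {"vdd_v": 1.1, "freq_mhz": 162, "power_w": 0.5, "dhrystone_mips": 184},
    {"vdd_v": 2.0, "freq_mhz": 215, "power_w": 1.1, "dhrystone_mips": 245},
]

total_percent = sum(breakdown_percent.values())
cache_percent = breakdown_percent["I-Cache"] + breakdown_percent["D-Cache"]
mips_per_watt = [p["dhrystone_mips"] / p["power_w"] for p in operating_points]

print(f"breakdown total: {total_percent}%")
print(f"caches together: {cache_percent}%")
for point, efficiency in zip(operating_points, mips_per_watt):
    print(f"{point['vdd_v']} V @ {point['freq_mhz']} MHz: {efficiency:.0f} MIPS/W")
```

The breakdown adds up to 100%, the two caches account for 43% of it, and the lower-voltage operating point delivers noticeably more MIPS per watt.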
41
Design Example: Strong Arm 110
from D. Dobberpuhl
42
Design Example: Strong Arm 110
from D. Dobberpuhl
However, leakage current starts to affect stand-by power
43
Controlling both VDD and VTH for low power
44
Controlling VDD and VTH for low power
Low power → Low VDD → Low speed → Low VTH → High leakage → VDD-VTH control
Software-hardware cooperation
Technology-circuit cooperation
MTCMOS: Multi-Threshold CMOS
VTCMOS: Variable Threshold CMOS
Multiple: spatial assignment
Variable: temporal assignment
(from Prof. T. Sakurai)
45
( from Prof. T. Sakurai)
46
Clustered Voltage Scaling for Multiple VDDs
[Figure: CVS structure versus conventional design, showing flip-flops (FF), level-shifting F/Fs, and the critical path in each; the lower-VDD portion is shown as shaded]
Once VL is applied to a logic gate, VL is applied to subsequent logic gates until the F/Fs, to eliminate DC current paths. F/Fs restore VH.
M. Takahashi et al., "A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme," ISSCC, pp. 36-37, Feb. 1998.
( from Prof. T. Sakurai)
47
If you don't need to hustle, VDD should be as low as possible
[Figure: normalized power versus normalized workload (both 0.0 to 1.0), comparing variable Vdd with fixed Vdd]
( from Prof. T. Sakurai)
48
Measured voltage waveforms
( from Prof. T. Sakurai)
49
Measured power characteristics
Total power: 0.8W x 0.08 + 0.16W x 0.86 + 0.07W x 0.06 ≈ 0.2W
[Figure: power P (W) versus supply voltage VDD (V); VDDmax: f = 200MHz, 0.8W, 8% of the time; VDDmin: f = 100MHz, 0.16W (down to 1/5), 86% of the time; sleep: 0.07W, 6% of the time]
VDD hopping can cut down power consumption to 1/4
( from Prof. T. Sakurai)
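
The total-power figure on this slide is a time-weighted average over the three operating modes. A minimal sketch of that arithmetic, using the power levels and time fractions from the slide (the variable names are hypothetical):

```python
modes = [
    # (label, power in watts, fraction of time)
    ("VDDmax, 200MHz", 0.80, 0.08),
    ("VDDmin, 100MHz", 0.16, 0.86),
    ("sleep", 0.07, 0.06),
]

# The time fractions should cover the whole run.
assert abs(sum(frac for _, _, frac in modes) - 1.0) < 1e-9

average_power = sum(power * frac for _, power, frac in modes)
always_max = modes[0][1]  # power if the chip stayed at VDDmax all the time
ratio = average_power / always_max

print(f"average power: {average_power:.3f} W")
print(f"ratio to always running at VDDmax: {ratio:.2f}")
```

The average comes out just above 0.2 W, about a quarter of the 0.8 W drawn at VDDmax, which matches the slide's "1/4" claim.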
50
Simulation results
[Figure: two panels, MPEG-2 video decoding and VSELP speech encoding, each plotted against transition delay T_TD (ms) from 0.0 to 1.0; curves: RPC 2 levels (f, f/2), RPC 3 levels (f, f/2, f/3), RPC 4 levels (f, f/2, f/3, f/4), RPC infinite levels, and post-simulation analysis]
( from Prof. T. Sakurai)
51
Aggressive Voltage Scaling
Taken from Kuroda
If we can dynamically scale Vdd and Vth the
advantage is obvious
52
Example
53
TransMeta Example
Taken from Doug Laird's presentation, January 19th, 2000
54
TransMeta Example
Taken from Doug Laird's presentation, January 19th, 2000
55
TransMeta Example
Taken from Doug Laird's presentation, January 19th, 2000
  • Code Morphing is another contributor to power reduction, since it eliminates unnecessary external memory access

56
TransMeta Example
57
Latches and Flip-Flops for Low-Power
58
Simulation Condition and Testbench
  • Timing
  • Total FF overhead is setup + clock-to-output time
  • Circuit optimization towards td-q
  • Clock skew robustness obtained from observing DQ
    curve
  • Power-Delay Product
  • Overall performance parameter at fixed frequency

59
Flip-Flop Performance Comparison
Test bench
  • Total power consumed
  • internal power
  • data power
  • clock power
  • Measured for four cases
  • no activity (0000 and 1111)
  • maximum activity (0101010..)
  • average activity (random sequence)

Delay is (minimum D-Q) = Clk-Q + Setup time
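
The delay definition on this slide can be sketched as a search over a timing sweep: for each data-to-clock offset, D-Q is the offset plus the measured Clk-Q, and the reported delay is the minimum. The sweep values below are hypothetical, chosen only to show Clk-Q degrading as data arrives closer to the clock edge.

```python
# Hypothetical sweep: (data-to-clock offset in ps, measured Clk-Q delay in ps).
# As data arrives closer to the clock edge, Clk-Q degrades.
sweep = [(200, 150), (150, 150), (100, 152), (60, 160), (40, 175), (20, 210), (10, 290)]


def minimum_d_to_q(points):
    """Return (offset, clk_q, d_q) for the point minimizing D-Q = offset + Clk-Q."""
    best = min(points, key=lambda p: p[0] + p[1])
    return best[0], best[1], best[0] + best[1]


setup, clk_q, d_q = minimum_d_to_q(sweep)
print(f"minimum D-Q = {d_q} ps at offset {setup} ps (Clk-Q = {clk_q} ps)")
```

In this made-up sweep the optimum sits at a point where Clk-Q is already somewhat degraded, which is why quoting the nominal Clk-Q alone would understate the real flip-flop overhead.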
60
  • OLD TEST BENCH
  • Total Power = Drivers Power + Test Unit Power
  • PDP-optimized: equal trade-off on power and delay
  • Improper load on drivers
  • NEW TEST BENCH
  • Drivers: fixed gain and driving test unit only
  • Data-to-output delay
  • PD²P-optimized: best for constant-field scaling

[Figures: old test bench and new test bench]
61
Comparison in terms of speed and EDPtot
Technology 0.2µ, Vdd = 2V, T = 20°C, measured @ 100MHz
  • Delay below 200ps
  • SDFF 187ps
  • HLFF 199ps
  • K-6 ETL 200ps
  • 200-300ps
  • PowerPC latch 266ps
  • 21264 Alpha FF 272ps
  • Strong Arm FF 275ps
  • mC2MOS latch 292ps
  • above 500ps
  • SSTC latch 592ps
  • DSTC latch 629ps
  • SSTC latch 898ps
  • DSTC latch 1060ps
  • PDPtot @ 100MHz
  • below 30fJ
  • PowerPC latch 28fJ
  • 30 - 50fJ
  • HLFF 29fJ
  • SDFF 39fJ
  • mC2MOS latch 40fJ
  • 21264 Alpha FF 43fJ
  • Strong Arm FF 45fJ
  • 50 - 70fJ
  • K-6 ETL 70fJ
  • above 70fJ
  • SSTC latch 95fJ
  • DSTC latch 125fJ
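
One way to combine the two lists is to multiply each structure's delay by its PDPtot, giving an energy-delay style figure. The sketch below does this for the structures that appear once in each list; treating PDPtot as the energy term is an assumption, the SSTC and DSTC entries are left out because their delays are listed twice with different values, and the resulting order is not necessarily the ranking shown on the later slides.

```python
# (delay in ps, PDPtot in fJ), as listed for structures with both values.
flip_flops = {
    "SDFF": (187, 39),
    "HLFF": (199, 29),
    "K-6 ETL": (200, 70),
    "PowerPC latch": (266, 28),
    "21264 Alpha FF": (272, 43),
    "Strong Arm FF": (275, 45),
    "mC2MOS latch": (292, 40),
}


def energy_delay(delay_ps, pdp_fj):
    """Energy-delay figure in fJ*ps, treating PDPtot as the energy term (assumed)."""
    return delay_ps * pdp_fj


ranking = sorted(flip_flops, key=lambda name: energy_delay(*flip_flops[name]))
for name in ranking:
    delay_ps, pdp_fj = flip_flops[name]
    figure = energy_delay(delay_ps, pdp_fj)
    print(f"{name:15s} {delay_ps:4d} ps {pdp_fj:3d} fJ {figure:6d} fJ*ps")
```

Sorting on the product rewards structures that are both fast and frugal, so a very fast but power-hungry design can still end up near the bottom.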

62
Delay comparison
  • F-F design brings the fastest structures

63
Delay comparison
  • F-F design brings the fastest structures

64
Overall ranking
@ 100MHz
  • EDPtot accepted as the overall cost function
  • Proposed low-power latches from Yuan and Svensson, compared with the other presented structures, do not show an advantage (the optimization was not properly done; it is yet to be repeated under a different setup)

65
Overall ranking, zoomed
  • Real signals have activity between 0 and 1.0 (α)
  • Precharged hybrid structures are the fastest but
    their power consumption strongly depends on the
    probability of ones
  • More ones above the ? point

66
Overall performance
  • Real signals have activity between 0 and 1.0 (α)
  • Precharged hybrid structures are the fastest but
    their power consumption strongly depends on the
    probability of ones
  • More ones above the ? point

67
Conventional Clk-Q vs. minimum D-Q
  • Hidden positive setup time
  • Degradation of Clk-Q

68
Internal Power distribution
  • Four sequences characterize the boundaries for internal power consumption
  • 010101: maximum
  • random, equal transition probability: average
  • 111111: precharge activity
  • 000000: leakage and internal clock processing
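
The data activity of the four test sequences can be computed directly. This sketch defines activity as the fraction of consecutive bit pairs that differ, which is one common definition and an assumption here; the function name, the sequence length of 1000 bits, and the random seed are arbitrary.

```python
import random


def transition_activity(bits):
    """Fraction of consecutive bit pairs that differ (0.0 to 1.0)."""
    if len(bits) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(bits, bits[1:]))
    return changes / (len(bits) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
sequences = {
    "alternating": "01" * 500,
    "all ones": "1" * 1000,
    "all zeros": "0" * 1000,
    "random": "".join(rng.choice("01") for _ in range(1000)),
}
for name, bits in sequences.items():
    print(f"{name:12s} activity = {transition_activity(bits):.2f}")
```

The all-ones and all-zeros sequences have the same data activity (none), so any difference in internal power between them comes from the circuit itself, such as precharge activity in the all-ones case.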

69
Comparison of Clock power consumption
70
Using Dual-Edge Flip-Flop (run at ½ of the frequency; save on the power consumed in the clock distribution tree)
71
Dual-Edge vs. Single-Edge Flip-Flops Comparison
Delay (ps)
Total Power (µW)
  • Fujitsu 0.18µ process, clock frequency 500MHz (250MHz for dual-edge FFs)
  • Data activity ratio α = 0.5
  • VDD = 1.8V
  • Temp = 25°C

72
Dual-Edge vs. Single-Edge Flip-Flops Comparison
Internal Power (µW)
Clock Power (µW)
Data Power (µW)
  • Fujitsu 0.18µ process, clock frequency 500MHz (250MHz for dual-edge FFs)
  • Data activity ratio α = 0.5
  • VDD = 1.8V
  • Temp = 25°C

73
Silicon on Insulator (SOI) Technology
74
SOI Comparison
f = 1GHz, α = 0.5, Le = 0.08 µm, VDD = 1.3V, T = 25°C
75
In conclusion.
  • What can we expect low power to bring us?

76
Wearable Computer
77
Wearable Computer
78
Wearable Computer
79
Digital Ink
80
Implantable Computer
81
Bluetooth
82
Year 2110
Extrapolation of the trend with some saturation
Many important, interesting applications: home, entertainment, office, translation, health care
Year 2120???
More assembly technique: 3D
Year 2110
Combination of bio and semiconductor
Long lifetime by DNA manipulation: bio-computer
[Figure: mosquito, annotated: ultra small volume; brain; small number of neuron cells; sensor; extremely low power; infrared; real time image processing; humidity; (artificial) intelligence; CO2; 3D flight control]