Title: ELEC 5970-003/6970-003 (Fall 2004) Advanced Topics in Electrical Engineering Designing VLSI for Low-Power and Self-Test Power Consumption in a CMOS Circuit
1ELEC 5970-003/6970-003 (Fall 2004) Advanced Topics in Electrical Engineering
Designing VLSI for Low-Power and Self-Test
Power Consumption in a CMOS Circuit
- Vishwani D. Agrawal
- James J. Danaher Professor
- Department of Electrical and Computer Engineering
- Auburn University
- http://www.eng.auburn.edu/vagrawal
- vagrawal_at_eng.auburn.edu
2Motivation
- Low power applications
- Remote systems (e.g., satellite)
- Portable systems (e.g., mobile phone)
- Methods of low power design
- Reduced supply voltage
- Adiabatic switching
- Clock suppression
- Logic design for reduced activity
- Reduce hazards (40% in arithmetic logic)
- Software techniques
- Reference: Chandrakasan and Brodersen
3Low-Power Design
- Design practices that reduce power consumption by at least one order of magnitude; in practice, a 50% reduction is often acceptable.
- General topics
- High-level and software techniques
- Gate and circuit-level methods
- Power estimation techniques
- Test power
4VLSI Chip Power Density
Source: Intel
5Specific Topics on Low-Power
- Power dissipation in CMOS circuits
- Low-power CMOS technologies
- Dynamic reduction techniques
- Leakage power
- Power estimation
6Components of Power
- Dynamic
- Signal transitions
- Logic activity
- Glitches
- Short-circuit
- Static
- Leakage
7Power of a Transition
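[Figure: circuit model of a gate transition, labeled with the input Vi, output Vo, resistances R, load capacitance CL, and short-circuit current isc.]
Power = $C_L V_{DD}^2/2 + P_{sc}$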
8Short Circuit Current, isc(t)
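[Figure: input voltage Vi(t) and output voltage Vo(t) in volts (from 0 to VDD, with the levels VTn and VDD - |VTp| marked) and short-circuit current isc(t) in amps (peak Iscmaxr = 45 µA), plotted against time (0 to 1 ns); the times tB and tE are marked on the time axis.]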
9Peak Short Circuit Current
- Increases with the size (or gain, β) of transistors
- Decreases with load capacitance, CL
- Largest when CL = 0
- Reference: M. A. Ortega and J. Figueras, "Short Circuit Power Modeling in Submicron CMOS," PATMOS'96, Aug. 1996, pp. 147-166.
10Short-Circuit Energy per Transition
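- $E_{scr} = \int_{t_B}^{t_E} V_{DD}\, i_{sc}(t)\, dt = (t_E - t_B)\, I_{scmaxr} V_{DD}/2$
- $E_{scr} = t_r (V_{DD} - |V_{Tp}| - V_{Tn})\, I_{scmaxr}/2$
- $E_{scf} = t_f (V_{DD} - |V_{Tp}| - V_{Tn})\, I_{scmaxf}/2$
- $E_{scf} = 0$ when $V_{DD} = |V_{Tp}| + V_{Tn}$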
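The per-transition estimate can be tried numerically. The following is a minimal sketch, assuming the triangular current pulse implied by the factor 1/2 and treating the pMOS threshold as a magnitude; the function name and the example values (other than the 45 µA peak shown on the earlier plot) are hypothetical.

```python
def short_circuit_energy(t_transition, vdd, vtn, vtp_mag, i_sc_max):
    """Triangular-pulse estimate of short-circuit energy for one input transition.

    t_transition: input rise (or fall) time in seconds
    vdd, vtn, vtp_mag: supply voltage and threshold magnitudes in volts
    i_sc_max: peak short-circuit current in amperes
    """
    # Part of the input swing during which both transistors conduct.
    overlap = vdd - vtp_mag - vtn
    if overlap <= 0:
        return 0.0  # the transistors never conduct together
    return t_transition * overlap * i_sc_max / 2.0


# Assumed example: 1 ns rise time, 5 V supply, 0.8 V thresholds, 45 uA peak.
e_rise = short_circuit_energy(1e-9, 5.0, 0.8, 0.8, 45e-6)
print(f"E_scr = {e_rise:.3e} J")

# With the supply scaled below the sum of the thresholds the energy vanishes.
print(short_circuit_energy(1e-9, 1.5, 0.8, 0.8, 45e-6))
```

The guard for a non-positive overlap mirrors the last bullet: once VDD no longer exceeds the sum of the threshold magnitudes, the estimate is zero.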
11Short-Circuit Energy
- Increases with rise and fall times of input
- Decreases for larger output load capacitance
- Decreases and eventually becomes zero when VDD is
scaled down but the threshold voltages are not
scaled down
12Short-Circuit Power Calculation
- Assume equal rise and fall times
- Model input-output capacitive coupling (Miller capacitance)
- Use a SPICE model for transistors
- T. Sakurai and A. Newton, "Alpha-power Law MOSFET Model and Its Application to a CMOS Inverter," IEEE J. Solid State Circuits, vol. 25, April 1990, pp. 584-594.
13Psc vs. C
[Figure: Psc/Ptotal (axis from 0 to 45) versus load capacitance C (35 to 75 fF) in 0.7µ CMOS, with curves for input rise times of 3 ns and 0.5 ns.]
14Technology Scaling
- Scale down by factors of 2 and 4, i.e., model 0.7, 0.35 and 0.17 micron technologies
- Constant electric field assumed
- Capacitance scaled down by the technology scale-down factor
15Technology Scaling Results
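[Figure: Psc/Ptotal (axis ticks at 0, 10 and 70) versus input rise time tr (0.4 to 1.6 ns), with curves for L = 0.7µ, C = 40 fF; L = 0.35µ, C = 20 fF; and L = 0.17µ, C = 10 fF.]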
16Effects of Scaling Down
- 1-16% short-circuit power at 0.7 micron
- 4-37% at 0.35 micron
- 12-60% at 0.17 micron
- Reference: S. R. Vemuru and N. Steinberg, "Short Circuit Power Dissipation Estimation for CMOS Logic Gates," IEEE Trans. on Circuits and Systems I, vol. 41, Nov. 1994, pp. 762-765.
17Summary Short-Circuit Power
- Short-circuit power is consumed by each transition (increases with input transition time).
- Reduction requires that gate output transition should not be faster than the input transition (faster gates can consume more short-circuit power).
- Scaling down of supply voltage with respect to threshold voltages reduces short-circuit power.
18Components of Power
- Dynamic
- Signal transitions
- Logic activity
- Glitches
- Short-circuit
- Static
- Leakage
19Leakage Power
[Figure: transistor connected between VDD and Ground, with labels R and n, showing the leakage currents IG, Isub, IPT, ID and IGIDL.]
20Leakage Current Components
- Subthreshold conduction, Isub
- Reverse bias pn junction conduction, ID
- Gate induced drain leakage, IGIDL, due to tunneling at the gate-drain overlap
- Drain source punchthrough, IPT, due to short channel and high drain-source voltage
- Gate tunneling, IG, through thin oxide
21Subthreshold Current
$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH})/(n V_t)\}$
- $\mu_0$: carrier surface mobility
- $C_{ox}$: gate oxide capacitance per unit area
- L: channel length
- W: gate width
- $V_t = kT/q$: thermal voltage
- n: a technology parameter
22IDS for Short Channel Device
$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH} + \eta V_{DS})/(n V_t)\}$
- $V_{DS}$: drain to source voltage
- $\eta$: a proportionality factor
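The exponential dependence on the threshold voltage is easier to appreciate with numbers. Below is a minimal sketch of the subthreshold expression, assuming the drain-voltage term enters as +ηVDS; the function name, default values and device parameters are hypothetical and chosen only for illustration.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant divided by electron charge, V/K


def subthreshold_current(mu0, cox, w, l, vgs, vth, vds=0.0, eta=0.0,
                         n=1.5, temp_k=300.0):
    """Subthreshold current; eta = 0 gives the long-channel expression."""
    vt = K_OVER_Q * temp_k  # thermal voltage kT/q
    exponent = (vgs - vth + eta * vds) / (n * vt)
    return mu0 * cox * (w / l) * vt ** 2 * math.exp(exponent)


# Assumed device: mobility 0.04 m^2/Vs, Cox 0.01 F/m^2, W/L = 2, gate at 0 V.
i_high_vth = subthreshold_current(0.04, 0.01, 2.0, 1.0, vgs=0.0, vth=0.4)
i_low_vth = subthreshold_current(0.04, 0.01, 2.0, 1.0, vgs=0.0, vth=0.3)
print(f"leakage ratio for a 0.1 V lower threshold: {i_low_vth / i_high_vth:.1f}")
```

In this sketch a lower threshold or a non-zero drain-voltage term both raise the exponent, which is the mechanism behind the increased leakage of scaled devices.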
23Increased Subthreshold Leakage
[Figure: log Isub versus gate voltage, with the curve for the scaled device, the current level Ic, and two threshold voltages VTH marked.]
24Summary Leakage Power
- Leakage power as a fraction of the total power increases as clock frequency drops. Turning supply off in unused parts can save power.
- For a gate it is a small fraction of the total power; it can be significant for very large circuits.
- Scaling down features requires lowering the threshold voltage, which increases leakage power; it roughly doubles with each shrinking.
- Multiple-threshold devices are used to reduce leakage power.
25Components of Power
- Dynamic
- Signal transitions
- Logic activity
- Glitches
- Short-circuit
- Static
- Leakage
26Power of a Transition
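[Figure: circuit model of a gate transition between VDD and Ground, labeled with the input Vi, output Vo, resistances R, load capacitance CL, and short-circuit current isc.]
Power = $C_L V_{DD}^2/2 + P_{sc}$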
27Dynamic Power
- Each transition of a gate consumes $CV^2/2$.
- Methods of power saving
- Minimize load capacitances
- Transistor sizing
- Library-based gate selection
- Reduce transitions
- Logic design
- Glitch reduction
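As a rough illustration of why both levers matter, the sketch below totals the CV²/2 cost over a number of transitions. The function names and the example numbers are assumptions made for illustration only.

```python
def dynamic_energy(transitions, c_load, vdd):
    """Energy in joules for a number of transitions, each costing C * V^2 / 2."""
    return transitions * c_load * vdd ** 2 / 2.0


def average_dynamic_power(transitions, c_load, vdd, duration):
    """Average power in watts when the transitions occur within `duration` seconds."""
    return dynamic_energy(transitions, c_load, vdd) / duration


# Assumed example: 1000 transitions on a 40 fF load at 5 V within 1 microsecond.
e_total = dynamic_energy(1000, 40e-15, 5.0)
p_avg = average_dynamic_power(1000, 40e-15, 5.0, 1e-6)
print(f"energy = {e_total:.3e} J, average power = {p_avg:.3e} W")

# Halving the transition count (for example, by removing glitches) halves the energy.
print(dynamic_energy(500, 40e-15, 5.0) / e_total)
```

Energy scales linearly with the load capacitance and with the number of transitions, which is why the slide lists capacitance minimization and transition reduction as separate methods.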
28Glitch Power Reduction
- Design a digital circuit for minimum transient
energy consumption by eliminating hazards
29Theorem 1
- For correct operation with minimum energy
consumption, a Boolean gate must produce no more
than one event per transition
30Theorem 2
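- Given that events occur at the input of a gate (inertial delay = d) at times $t_1 < \ldots < t_n$, the number of events at the gate output cannot exceed
$\min\left(n,\ 1 + \left\lfloor \frac{t_n - t_1}{d} \right\rfloor\right)$
[Figure: time axis with input events at t1, t2, t3, ..., tn and the interval tn - t1 marked.]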
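The bound of Theorem 2 is simple to evaluate. The sketch below assumes it reads min(n, 1 + floor((tn - t1)/d)); the floor and the function name are assumptions of this illustration.

```python
import math


def max_output_events(arrival_times, inertial_delay):
    """Upper bound on output events for input events at the given times."""
    if not arrival_times:
        return 0
    n = len(arrival_times)
    span = max(arrival_times) - min(arrival_times)  # tn - t1
    return min(n, 1 + math.floor(span / inertial_delay))


# Four input events spread over 3 time units, gate inertial delay of 2 units.
print(max_output_events([0, 1, 2, 3], 2))

# All events within less than one gate delay: at most a single output event.
print(max_output_events([0, 0.5], 1))
```

When the whole spread of input events is shorter than the gate delay, the bound collapses to one event, which is the condition exploited on the following slides.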
31Minimum Transient Design
- Minimum transient energy condition for a Boolean gate:
$|t_i - t_j| < d$
where $t_i$ and $t_j$ are arrival times of input events and d is the inertial delay of the gate.
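A direct check of this condition for one gate might look as follows. This is a minimal sketch; the function name and the example arrival times are hypothetical.

```python
def satisfies_min_transient(arrival_times, inertial_delay):
    """True when every pair of input arrival times differs by less than the gate delay."""
    if len(arrival_times) < 2:
        return True  # a single input cannot create a hazard by itself
    # The largest pairwise difference is the spread between latest and earliest arrival.
    return max(arrival_times) - min(arrival_times) < inertial_delay


# Inputs arriving at times 2, 2.5 and 2.9 at a gate with inertial delay 1.
print(satisfies_min_transient([2, 2.5, 2.9], 1))

# A spread equal to the gate delay violates the strict inequality.
print(satisfies_min_transient([1, 3], 2))
```

Checking only the earliest and latest arrivals is sufficient because the largest pairwise difference bounds all others.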
32Balanced Delay Method
- All input events arrive simultaneously
- Overall circuit delay not increased
- Delay buffers may have to be inserted
[Figure: example circuit annotated with gate and buffer delays of 1, 3 and 4 units.]
33Hazard Filter Method
- Gate delay is made greater than maximum input path delay difference
- No delay buffers needed (least transient energy)
- Overall circuit delay may increase
[Figure: example circuit annotated with gate delays of 1, 2 and 3 units.]
34Linear Program
- Variables: gate and buffer delays
- Objective: minimize number of buffers
- Subject to: overall circuit delay
- Subject to: minimum transient condition for multi-input gates
- AMPL, MINOS 5.5 (Fourer, Gay and Kernighan)
35Variables: Full Adder add1b
[Figure: full adder add1b circuit with each element labeled 0 or 1.]
36Objective Function
- Ideal: minimize the number of non-zero delay buffers
- Actual: sum of buffer delays
37Specify Critical Path Delay
[Figure: full adder add1b circuit with each element labeled 0 or 1.]
Sum of delays on critical path $\le$ maxdel
38Multi-Input Gate Condition
[Figure: multi-input gate with delay d, with the delays d1 and d2 marked at its inputs.]
$d_1 - d_2 \le d$, $d_2 - d_1 \le d$
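39AMPL Solution: maxdel = 6
[Figure: add1b circuit annotated with the delays of the AMPL solution (values of 1 and 2 units).]
40AMPL Solution: maxdel = 7
[Figure: add1b circuit annotated with the delays of the AMPL solution (values of 1 to 3 units).]
41AMPL Solution: maxdel = 11
[Figure: add1b circuit annotated with the delays of the AMPL solution (values of 1 to 5 units).]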
42Power Estimates for add1b
Hsiao et al., ICCAD-97
43Power Calculation in Spice
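[Figure: circuit powered from a large capacitor C between node V and Ground; the connection to VDD is opened at t = 0, and the energy E(t) is obtained from the capacitor voltage V over time t.]
$E(t) = \frac{1}{2} C V_{DD}^2 - \frac{1}{2} C V^2 \approx C V_{DD} (V_{DD} - V)$
Ref.: M. Shoji, CMOS Digital Circuit Technology, Prentice Hall, 1988, p. 172.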
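The quality of the linear approximation can be checked with a quick calculation. The sketch below compares the exact capacitor energy loss with the approximation C VDD (VDD - V); the function name and the component values are assumptions, not taken from the slides.

```python
def energy_from_supply_capacitor(c_supply, vdd, v_now):
    """Return (exact, approximate) energy drawn from a capacitor that has drooped to v_now."""
    exact = 0.5 * c_supply * (vdd ** 2 - v_now ** 2)
    approx = c_supply * vdd * (vdd - v_now)
    return exact, approx


# Assumed example: 1 uF capacitor, 5 V supply, 1 mV droop during simulation.
exact, approx = energy_from_supply_capacitor(1e-6, 5.0, 4.999)
print(f"exact = {exact:.4e} J, approximate = {approx:.4e} J")
print(f"relative error = {(approx - exact) / exact:.2e}")
```

The approximation holds when the capacitor is large enough that V stays close to VDD, which also keeps the circuit operating at an almost constant supply voltage.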
44Power Dissipation of ALU4
1 micron CMOS, 57 gates, 14 PI, 8 PO; 100 random vectors simulated in Spice.
[Figure: energy in nanojoules (0 to 7) versus time in microseconds (0.0 to 2.0) for the original ALU (delay 3.5 ns) and the minimum energy ALU (delay 10 ns).]
45F0 Output of ALU4
[Figure: F0 output signal amplitude (0 to 5 volts) versus time (0 to 160 nanoseconds) for the original ALU, delay 7 units (3.5 ns), and the minimum energy ALU, delay 21 units (10 ns).]
46References
- E. Jacobs and M. Berkelaar, "Using Gate Sizing to Reduce Glitch Power," Proc. ProRISC/IEEE Workshop on Circuits, Systems and Signal Processing, Nov. 1996, pp. 183-188; also Int. Workshop on Logic Synthesis, May 1997.
- V. D. Agrawal, "Low-Power Design by Hazard Filtering," Proc. 10th Int. Conf. VLSI Design, Jan. 1997, pp. 193-197.
- V. D. Agrawal, M. L. Bushnell, G. Parthasarathy, and R. Ramadoss, "Digital Circuit Design for Minimum Transient Energy and a Linear Programming Method," Proc. 12th Int. Conf. VLSI Design, Jan. 1999, pp. 434-439.
- Last two papers are available at website http://www.eng.auburn.edu/vagrawal
47A Limitation
- Constraints are written by path enumeration.
- Since the number of paths in a circuit can be exponential in circuit size, the formulation is infeasible for large circuits.
- Example: c880 has 6.96M constraints.
48Timing Window
- Define two timing window variables per gate output:
- ti: Earliest time of signal transition at gate i.
- Ti: Latest time of signal transition at gate i.
[Figure: gate i with input timing windows (t1, T1) . . . (tn, Tn) and output timing window (ti, Ti).]
Ref.: T. Raja, Master's Thesis, Rutgers Univ., 2002
49Linear Program
- Gate variables: d4 . . . d12
- Buffer variables: d15 . . . d29
- Corresponding window variables: t4 . . . t29 and T4 . . . T29.
50Multiple-Input Gate Constraints
- For Gate 7:
- $T_7 > T_5 + d_7$; $t_7 < t_5 + d_7$; $d_7 > T_7 - t_7$
- $T_7 > T_6 + d_7$; $t_7 < t_6 + d_7$
51Single-Input Gate Constraints
Buffer 19
52Overall Delay Constraints
- $T_{11} <$ maxdelay
- $T_{12} <$ maxdelay
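The timing-window constraints can be checked for a given set of delays without an LP solver. The sketch below propagates the tightest windows (earliest time t, latest time T) through a small hypothetical circuit and tests the filtering condition d > T - t at every gate. The circuit, the names, the assumption that primary inputs switch at time 0, and the inclusive treatment of the overall delay bound are all assumptions of this illustration; it verifies a candidate solution rather than solving the LP.

```python
def timing_windows(fanin, delay, order):
    """Earliest (t) and latest (T) transition times at each node.

    fanin: node -> list of driving nodes (empty list for a primary input)
    delay: node -> delay of that gate or buffer
    order: nodes in topological order
    """
    t, T = {}, {}
    for node in order:
        drivers = fanin[node]
        if not drivers:
            t[node] = T[node] = 0  # assumed: primary inputs switch at time 0
        else:
            t[node] = min(t[d] for d in drivers) + delay[node]
            T[node] = max(T[d] for d in drivers) + delay[node]
    return t, T


def meets_constraints(fanin, delay, order, outputs, maxdelay):
    """Check d > T - t at every gate and the overall delay bound at the outputs."""
    t, T = timing_windows(fanin, delay, order)
    filtered = all(delay[g] > T[g] - t[g] for g in order if fanin[g])
    on_time = all(T[o] <= maxdelay for o in outputs)
    return filtered and on_time


# Hypothetical circuit: g1 = NOT(a), g2 = AND(g1, b).
fanin = {"a": [], "b": [], "g1": ["a"], "g2": ["g1", "b"]}
order = ["a", "b", "g1", "g2"]
unit = {"a": 0, "b": 0, "g1": 1, "g2": 1}    # unit delays: g2 can glitch
slowed = {"a": 0, "b": 0, "g1": 1, "g2": 2}  # g2 slowed to filter the hazard
print(meets_constraints(fanin, unit, order, ["g2"], 3))
print(meets_constraints(fanin, slowed, order, ["g2"], 3))
```

Each node contributes a fixed number of checks, so the work grows with the number of gates and connections rather than with the number of paths.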
53Advantage of Timing Window
- Path constraints (exponential in n):
- 2 × 2 × . . . × 2 = $2^n$ paths between I/O pair
- A single variable specifies I/O delay. Total variables, O(n).
- LP constraint set is linear in the size of circuit.
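The gap between the two formulations can be seen by counting paths in a small example. The sketch below builds a hypothetical n-stage circuit in which every stage offers two parallel branches and counts the input-to-output paths by memoized traversal; the circuit shape and all names are assumptions made for illustration.

```python
def count_paths(fanout, source, sink):
    """Number of distinct paths from source to sink in a directed acyclic graph."""
    memo = {}

    def paths_from(node):
        if node == sink:
            return 1
        if node not in memo:
            memo[node] = sum(paths_from(nxt) for nxt in fanout.get(node, []))
        return memo[node]

    return paths_from(source)


def two_branch_chain(n):
    """Hypothetical circuit: n stages, each with two parallel branches u and v."""
    fanout = {}
    for i in range(n):
        fanout[f"s{i}"] = [f"u{i}", f"v{i}"]
        fanout[f"u{i}"] = [f"s{i + 1}"]
        fanout[f"v{i}"] = [f"s{i + 1}"]
    return fanout


n = 10
chain = two_branch_chain(n)
node_count = len(chain) + 1  # every node with fanout, plus the final sink
print(f"{node_count} nodes, {count_paths(chain, 's0', f's{n}')} paths")
```

The node count grows as 3n + 1 while the path count grows as 2 to the power n, so per-node window variables stay manageable where per-path constraints do not.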
54Comparison of Constraints
[Figure: number of constraints versus number of gates in circuit.]
55Results 1-Bit Adder
56Estimation of Power
- Circuit is simulated by an event-driven simulator for both optimized and un-optimized gate delays.
- All transitions at a gate are counted as Events[gate].
- Power consumed ∝ Events[gate] × number of fanouts.
- Ref.: "Effects of Delay Model on Peak Power Estimation of VLSI Circuits," Hsiao, et al. (ICCAD'97).
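The fanout-weighted event count can be computed as below. This is a minimal sketch of the measure, summed over all gates; the event counts and fanouts are made-up values for illustration, not results from the slides.

```python
def weighted_activity(events, fanouts):
    """Sum over gates of (number of output events) x (number of fanouts)."""
    return sum(events[gate] * fanouts[gate] for gate in events)


# Hypothetical event counts from simulating one vector pair.
fanouts = {"g1": 2, "g2": 1, "g3": 1}
events_unit_delay = {"g1": 2, "g2": 3, "g3": 1}  # glitches present
events_optimized = {"g1": 1, "g2": 1, "g3": 1}   # one event per gate

ratio = weighted_activity(events_optimized, fanouts) / weighted_activity(events_unit_delay, fanouts)
print(f"normalized power estimate = {ratio:.2f}")
```

The fanout count stands in for the load capacitance driven by each gate, so the measure is proportional to switched capacitance rather than an absolute power figure.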
57Original 1-Bit Adder
Color codes for number of transitions
58Optimized 1-Bit Adder
Color codes for number of transitions
59Results 1-Bit Adder
- Simulated over all possible vector transitions
- Average power, optimized/unit delay:
- 244 / 308 = 0.792
- Peak power, optimized/unit delay:
- 6 / 10 = 0.60
Power savings: peak 40%, average 21%
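The savings follow from the ratios as one minus the normalized power. A short check, with a hypothetical helper name:

```python
def savings_percent(optimized, reference):
    """Percentage saved by the optimized design relative to the reference design."""
    return 100.0 * (1.0 - optimized / reference)


average_saving = savings_percent(244, 308)  # average power, optimized vs. unit delay
peak_saving = savings_percent(6, 10)        # peak power, optimized vs. unit delay
print(f"average {average_saving:.1f}%, peak {peak_saving:.1f}%")
```

Rounded to whole percentages, these reproduce the figures quoted on the slide.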
60Results 4-Bit ALU
maxdelay    Buffers inserted
7           5
10          2
12          1
15          0
Power savings: peak 33%, average 21%
61Benchmark Circuits
Circuit   Maxdel. (gates)   No. of Buffers   Normalized Power (Average)   Normalized Power (Peak)
C432      17                95               0.72                         0.67
C432      34                66               0.62                         0.60
C880      24                62               0.68                         0.54
C880      48                34               0.68                         0.52
C6288     47                294              0.40                         0.36
C6288     94                120              0.36                         0.34
c7552     43                366              0.38                         0.34
c7552     86                111              0.36                         0.32
62Physical Design
[Figure: circuit in which each gate is labeled with its l/w.]
- Gate delay modeled as a linear function of gate size, total load capacitance, and fanout gate sizes (Berkelaar and Jacobs, 1996).
- Lay out the circuit with some nominal gate sizes.
- Enter extracted routing delays in LP as constants and solve for gate delays.
- Change gate sizes as determined from a linear system of equations.
- Iterate if routing delays change.
63Power Dissipation of ALU4
64References
- R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
- M. Berkelaar and E. Jacobs, "Using Gate Sizing to Reduce Glitch Power," Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.
- V. D. Agrawal, "Low Power Design by Hazard Filtering," Proc. 10th Int'l Conf. VLSI Design, Jan. 1997, pp. 193-197.
- V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, "Digital Circuit Design for Minimum Transient Energy and Linear Programming Method," Proc. 12th Int'l Conf. VLSI Design, Jan. 1999, pp. 434-439.
- M. Hsiao, E. M. Rudnick and J. H. Patel, "Effects of Delay Model in Peak Power Estimation of VLSI Circuits," Proc. ICCAD, Nov. 1997, pp. 45-51.
- T. Raja, "A Reduced Constraint Set Linear Program for Low Power Design of Digital Circuits," Master's Thesis, Rutgers Univ., New Jersey, 2002.
65Conclusion
- Glitch-free design through LP: constraint-set is linear in the size of the circuit.
- LP solution:
- Eliminates glitches at all gate outputs,
- Holds I/O delay within specification, and
- Combines path-balancing and hazard-filtering to minimize the number of delay buffers.
- Linear constraint set LP produces results exactly identical to the LP requiring exponential constraint-set.
- Results show peak power savings up to 68% and average power savings up to 64%.