Title: ELEC 5970-001/6970-001 (Fall 2005) Special Topics in Electrical Engineering: Low-Power Design of Electronic Circuits. Low-Power Logic Design and Parallelism
- Vishwani D. Agrawal
- James J. Danaher Professor
- Department of Electrical and Computer Engineering
- Auburn University
- http://www.eng.auburn.edu/vagrawal
- vagrawal_at_eng.auburn.edu
State Encoding
- Two-bit binary counter
  - State sequence: 00 → 01 → 10 → 11 → 00
  - Six bit transitions in four clock cycles
  - 6/4 = 1.5 transitions per clock
- Two-bit Gray-code counter
  - State sequence: 00 → 01 → 11 → 10 → 00
  - Four bit transitions in four clock cycles
  - 4/4 = 1.0 transition per clock
- The Gray-code counter is more power efficient.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers (now Springer), 1998.
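A minimal Python sketch of the two-bit comparison above. It assumes the standard binary-reflected Gray code g = b XOR (b >> 1), since the slide gives only the state sequence, not how it is generated. Toggles are counted as the Hamming distance between consecutive states, including the wrap back to 00.

```python
def gray(b: int) -> int:
    """Binary-reflected Gray code of b (assumed encoding)."""
    return b ^ (b >> 1)


def transitions_per_clock(states: list[int]) -> float:
    """Average number of toggled bits per clock over one full counting cycle.

    The last state wraps back to the first, as in 00 -> ... -> 00.
    """
    total = 0
    for cur, nxt in zip(states, states[1:] + states[:1]):
        total += bin(cur ^ nxt).count("1")  # Hamming distance between states
    return total / len(states)


binary_states = list(range(4))               # 00, 01, 10, 11
gray_states = [gray(b) for b in range(4)]    # 00, 01, 11, 10
print(transitions_per_clock(binary_states))  # 1.5
print(transitions_per_clock(gray_states))    # 1.0
```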
Three-Bit Counters
Binary state   Toggles   Gray-code state   Toggles
000            -         000               -
001            1         001               1
010            2         011               1
011            1         010               1
100            3         110               1
101            1         111               1
110            2         101               1
111            1         100               1
000            3         000               1
N-Bit Counter Toggles in Counting Cycle
- Binary counter: $T(\text{binary}) = 2(2^N - 1)$
- Gray-code counter: $T(\text{gray}) = 2^N$
- $T(\text{gray})/T(\text{binary}) = 2^{N-1}/(2^N - 1) \to 0.5$ as $N \to \infty$

Bits (N)   T(binary)   T(gray)   T(gray)/T(binary)
1          2           2         1.0
2          6           4         0.6667
3          14          8         0.5714
4          30          16        0.5333
5          62          32        0.5161
6          126         64        0.5079
∞          -           -         0.5000
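The closed forms can be checked by brute force. This sketch simulates one full counting cycle of an N-bit binary counter and an N-bit Gray-code counter (again assuming the binary-reflected Gray code) and compares the toggle totals with $2(2^N - 1)$ and $2^N$; the printed rows should match the table.

```python
def cycle_toggles(states: list[int]) -> int:
    """Total bit toggles over one counting cycle, including the wrap-around."""
    return sum(bin(a ^ b).count("1") for a, b in zip(states, states[1:] + states[:1]))


def counter_toggles(n: int) -> tuple[int, int]:
    """(T(binary), T(gray)) for an n-bit counter, found by simulation."""
    binary = list(range(2 ** n))
    gray = [s ^ (s >> 1) for s in binary]  # binary-reflected Gray code (assumed)
    return cycle_toggles(binary), cycle_toggles(gray)


for n in range(1, 7):
    t_bin, t_gray = counter_toggles(n)
    # The simulation agrees with the closed forms for each N.
    assert t_bin == 2 * (2 ** n - 1) and t_gray == 2 ** n
    print(n, t_bin, t_gray, round(t_gray / t_bin, 4))
```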
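Bus Encoding
- Example: four-bit bus
  - 0000 → 1110 has three transitions.
  - If the bits of the second pattern are inverted, then 0000 → 0001 has only one transition.
- Bit-inversion encoding for an N-bit bus: when more than N/2 bits would toggle, the inverted pattern is sent instead, so at most N/2 data lines toggle.
[Figure: number of bit transitions after inversion encoding (vertical axis, 0 to N) versus number of bit transitions without encoding (horizontal axis, 0 to N).]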
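A minimal sketch of one bus-invert step, assuming the bus currently holds a known value and counting only the N data lines (toggles on the extra polarity line are ignored here). With h raw transitions, inverting leaves N - h, so the encoded count is min(h, N - h), which traces the shape in the figure.

```python
def bus_invert_step(prev_bus: int, data: int, n: int) -> tuple[int, int, int]:
    """Send `data` on an n-bit bus currently holding `prev_bus`.

    Returns (value driven on the data lines, polarity bit, data-line transitions).
    The word is inverted when more than n/2 lines would otherwise toggle.
    """
    mask = (1 << n) - 1
    h = bin((prev_bus ^ data) & mask).count("1")  # transitions without encoding
    if h > n / 2:
        return data ^ mask, 1, n - h               # inverted word: n - h toggles
    return data, 0, h


print(bus_invert_step(0b0000, 0b1110, 4))  # (1, 1, 1): 0001 sent, one transition

# Transitions after encoding versus before, for an 8-bit bus starting at 0:
for h in range(9):
    _, _, after = bus_invert_step(0, (1 << h) - 1, 8)
    print(h, after)  # equals min(h, 8 - h), peaking at N/2 = 4
```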
Bus-Inversion Encoding Logic
[Figure: block diagram with sent data, polarity decision logic, bus register, polarity bit, and received data.]
M. Stan and W. Burleson, "Bus-Invert Coding for Low-Power I/O," IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
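The encoder/decoder pair can be modeled in a few lines. In this sketch (function names are hypothetical), the variable `bus` stands in for the bus register holding the last value driven, assumed to start at 0; the polarity decision compares it with the next word, and the receiver re-inverts any word whose polarity bit is 1.

```python
def encode(words: list[int], n: int) -> list[tuple[int, int]]:
    """Bus-invert encoder: returns (bus value, polarity bit) per word.

    `bus` plays the role of the bus register (the value last driven);
    it is assumed to start at 0.
    """
    mask = (1 << n) - 1
    bus, frames = 0, []
    for w in words:
        toggles = bin(bus ^ w).count("1")
        polarity = 1 if toggles > n / 2 else 0  # polarity decision logic
        bus = w ^ mask if polarity else w
        frames.append((bus, polarity))
    return frames


def decode(frames: list[tuple[int, int]], n: int) -> list[int]:
    """Receiver: re-invert any word sent with polarity bit 1."""
    mask = (1 << n) - 1
    return [bus ^ mask if polarity else bus for bus, polarity in frames]


def line_transitions(values: list[int]) -> int:
    """Total toggles on a group of lines that start at 0."""
    prev, total = 0, 0
    for v in values:
        total += bin(prev ^ v).count("1")
        prev = v
    return total


words = [0b1110, 0b0001, 0b1111, 0b0000]
frames = encode(words, 4)
assert decode(frames, 4) == words
print(line_transitions(words))                   # 14 toggles unencoded
print(line_transitions([b for b, _ in frames]))  # 2 toggles on the data lines
print(line_transitions([p for _, p in frames]))  # 4 toggles on the polarity line
```

For this particular stream, the encoded bus plus polarity line toggles 6 times instead of 14; the saving depends on the data.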
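FSM State Encoding
Transition probabilities are based on primary-input (PI) statistics.
[Figure: the same three-state transition graph drawn with two different two-bit state assignments; edges are labeled with transition probabilities.]
Expected number of state-bit transitions:
- First assignment: $2(0.3 + 0.4) + 1(0.1 + 0.1) = 1.6$
- Second assignment: $1(0.3 + 0.4 + 0.1) + 2(0.1) = 1.0$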
State encoding can be selected using a
power-based cost function.
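A sketch of such a cost function on a hypothetical three-state machine (not the one in the figure). Its edge probabilities were chosen so that two assignments give the same weighted sums as the slide, 1.6 and 1.0. The cost is the probability-weighted Hamming distance over all edges, and an exhaustive search picks the cheapest assignment, which is only practical for small machines.

```python
from itertools import permutations

# Hypothetical three-state machine (NOT the one in the figure). Each edge is
# (from_state, to_state, probability that the edge is taken in a given cycle);
# self-loops are omitted because they toggle no state bits.
EDGES = [("A", "B", 0.3), ("B", "A", 0.4), ("A", "C", 0.1), ("C", "B", 0.1)]


def expected_transitions(code: dict[str, int], edges=EDGES) -> float:
    """Power-based cost: probability-weighted Hamming distance over all edges."""
    return sum(p * bin(code[s] ^ code[t]).count("1") for s, t, p in edges)


def best_encoding(states=("A", "B", "C"), bits=2, edges=EDGES):
    """Exhaustively try every assignment of distinct codes; keep the cheapest."""
    best_cost, best_code = float("inf"), None
    for codes in permutations(range(2 ** bits), len(states)):
        code = dict(zip(states, codes))
        cost = expected_transitions(code, edges)
        if cost < best_cost:
            best_cost, best_code = cost, code
    return best_cost, best_code


first = {"A": 0b00, "B": 0b11, "C": 0b01}
second = {"A": 0b00, "B": 0b01, "C": 0b11}
print(round(expected_transitions(first), 2))   # 1.6
print(round(expected_transitions(second), 2))  # 1.0
print(round(best_encoding()[0], 2))            # 1.0
```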
FSM Clock-Gating
- Moore machine: outputs depend only on the state variables.
- If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever the self-loop is to be executed.
[Figure: states Si and Sj enter state Sk on inputs Xi and Xj; Sk has a self-loop on input Xk; all three edges are labeled with output Zk.]
The clock can be stopped when the (Xk, Sk) combination occurs.
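A behavioral sketch of the idea on a hypothetical Moore machine (the next-state table below is made up, with input 0 playing the role of Xk at Sk). The clock-activation function is false exactly when the next state equals the current state. Because Moore outputs depend only on the state, suppressing those clock pulses changes neither the state sequence nor the outputs.

```python
# Hypothetical Moore machine: NEXT[state][input] gives the next state.
# Sk has a self-loop on input 0 (playing the role of Xk).
NEXT = {
    "Si": {0: "Si", 1: "Sk"},
    "Sj": {0: "Sk", 1: "Sj"},
    "Sk": {0: "Sk", 1: "Si"},
}


def clock_enable(state: str, x: int) -> bool:
    """Clock-activation function: False exactly when a self-loop would execute."""
    return NEXT[state][x] != state


def run_gated(inputs: list[int], state: str = "Si") -> tuple[str, int]:
    """Simulate with clock gating; return (final state, number of gated cycles)."""
    gated = 0
    for x in inputs:
        if clock_enable(state, x):
            state = NEXT[state][x]  # flip-flops clocked, state updates
        else:
            gated += 1              # clock stopped: state and Moore outputs hold
    return state, gated


def run_ungated(inputs: list[int], state: str = "Si") -> str:
    """Reference simulation with the clock always running."""
    for x in inputs:
        state = NEXT[state][x]
    return state


seq = [1, 0, 0, 0, 1]
print(run_gated(seq))    # ('Si', 3): three of five clock pulses suppressed
print(run_ungated(seq))  # Si: same final state without gating
```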
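Clock-Gating in Moore FSM
[Figure: combinational logic with primary inputs (PI) and primary outputs (PO) feeding flip-flops; clock activation logic and a latch gate the clock CK to the flip-flops.]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.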
Clock-Gating in Low-Power Flip-Flop
[Figure: D flip-flop with data input D, output Q, and clock CK.]
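The slide shows only the D, Q, and CK labels. This sketch assumes one common gating rule for a single flip-flop: the clock edge is delivered only when D differs from Q (an XOR enable). Skipping an edge when D equals Q leaves Q unchanged, so the behavior is identical while fewer clock pulses reach the flip-flop.

```python
def clock_pulses(d_stream: list[int], q0: int = 0) -> tuple[int, int, int]:
    """Return (ungated pulses, gated pulses, final Q) for a D flip-flop.

    Assumed gating rule: the clock reaches the flip-flop only when D != Q.
    Skipping an edge when D == Q leaves Q unchanged, so behavior is identical.
    """
    q, gated = q0, 0
    for d in d_stream:
        if d != q:     # enable = D XOR Q
            gated += 1
            q = d      # clock edge delivered: Q samples D
    return len(d_stream), gated, q


print(clock_pulses([0, 0, 1, 1, 1, 0, 0, 0]))  # (8, 2, 0)
```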
Low-Power Datapath Architecture
- Lower the supply voltage.
  - This slows down the circuit.
  - Use parallel computing to regain the speed.
- Works well when the threshold voltage is also lowered.
- About 60% reduction in power is obtainable.
- Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
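A Reference Datapath
[Figure: input register, combinational logic, and output register, all clocked by CK; $C_{\text{ref}}$ is the total switched capacitance.]
- Supply voltage: $V_{\text{ref}}$
- Total capacitance switched per cycle: $C_{\text{ref}}$
- Clock frequency: $f$
- Power consumption: $P_{\text{ref}} = C_{\text{ref}} V_{\text{ref}}^2 f$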
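A quick numeric check of $P_{\text{ref}} = C_{\text{ref}} V_{\text{ref}}^2 f$. The 10 pF, 5 V, and 20 MHz values are hypothetical, chosen only for illustration. The second line shows the quadratic voltage dependence that the parallel architecture exploits.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Switching power P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz


# Hypothetical reference datapath: 10 pF switched per cycle, 5 V, 20 MHz.
p_ref = dynamic_power(10e-12, 5.0, 20e6)
print(f"P_ref = {p_ref * 1e3:.2f} mW")  # 5.00 mW
# Halving the supply at the same frequency cuts power by a factor of four.
print(dynamic_power(10e-12, 2.5, 20e6) / p_ref)
```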
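A Parallel Architecture
Supply voltage: $V_N \le V_1 = V_{\text{ref}}$, where N is the degree of parallelism.
Each copy processes every Nth input and operates at reduced voltage.
[Figure: the input feeds N copies of the combinational logic (Copy 1 to Copy N), each clocked at f/N; an N-to-1 multiplexer clocked at f produces the output; a multiphase clock generator driven by CK supplies the clocks and the multiplexer control.]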
Control Signals, N = 4
[Figure: timing diagram of CK and clock phases 1 to 4.]
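Power
$$P_N = P_{\text{proc}} + P_{\text{overhead}}$$
$$P_{\text{proc}} = N(C_{\text{inreg}} + C_{\text{comb}})V_N^2\frac{f}{N} + C_{\text{outreg}}V_N^2 f = (C_{\text{inreg}} + C_{\text{comb}} + C_{\text{outreg}})V_N^2 f = C_{\text{ref}}V_N^2 f$$
$$P_{\text{overhead}} = C_{\text{overhead}}V_N^2 f \approx \delta\, C_{\text{ref}}(N - 1)V_N^2 f$$
$$P_N = [1 + \delta(N - 1)]\,C_{\text{ref}}V_N^2 f$$
$$\frac{P_N}{P_1} = \frac{V_N^2}{V_{\text{ref}}^2}\,[1 + \delta(N - 1)]$$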
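The final ratio is easy to evaluate directly. This sketch uses $N = 2$ with $V_2 = 2.9$ V and $V_{\text{ref}} = 5$ V from the 1.2µ CMOS voltage-versus-speed plot; the overhead factors δ are hypothetical, since the slides give no value.

```python
def power_ratio(n: int, v_n: float, v_ref: float, delta: float) -> float:
    """P_N / P_1 = (V_N / V_ref)^2 * (1 + delta * (N - 1))."""
    return (v_n / v_ref) ** 2 * (1 + delta * (n - 1))


# N = 2 at V_2 = 2.9 V with V_ref = 5 V; the delta values are hypothetical.
for delta in (0.0, 0.1, 0.2):
    print(delta, round(power_ratio(2, 2.9, 5.0, delta), 3))
```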
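Voltage vs. Speed
Delay of a gate:
$$T = \frac{C_L V_{\text{ref}}}{I} = \frac{C_L V_{\text{ref}}}{k(W/L)(V_{\text{ref}} - V_t)^2}$$
where I is the saturation current, k is a technology parameter, W/L is the width-to-length ratio of the transistor, and $V_t$ is the threshold voltage.
Voltage reduction slows down as we get closer to $V_t$: each additional factor of delay buys a smaller drop in supply voltage.
[Figure: normalized gate delay T (0 to 4.0) versus supply voltage for 1.2µ CMOS; the delay levels for N = 1, 2, 3 mark $V_{\text{ref}} = 5$ V, $V_2 = 2.9$ V, and $V_3$, with $V_t$ marked on the voltage axis.]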
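A sketch of how $V_N$ can be found from the delay formula: solve $T(V_N) = N \cdot T(V_{\text{ref}})$ with $T \propto V/(V - V_t)^2$. Bisection is used here because $T$ decreases monotonically for $V > V_t$. This idealized model is not guaranteed to reproduce the slide's 2.9 V, which comes from its 1.2µ CMOS curve.

```python
def delay(v: float, vt: float) -> float:
    """Gate delay up to a constant factor: T ∝ V / (V - Vt)^2."""
    return v / (v - vt) ** 2


def voltage_for_slowdown(n: float, v_ref: float, vt: float, tol: float = 1e-9) -> float:
    """Find V_N in (Vt, V_ref] whose delay is n times the delay at V_ref.

    Uses bisection; valid because delay() decreases monotonically for V > Vt.
    """
    target = n * delay(v_ref, vt)
    lo, hi = vt + 1e-12, v_ref  # delay(lo) is huge, delay(hi) <= target
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if delay(mid, vt) > target:
            lo = mid            # too slow: the answer lies at a higher voltage
        else:
            hi = mid
    return (lo + hi) / 2


print(voltage_for_slowdown(2, 5.0, 0.0))  # Vt = 0: close to V_ref / N = 2.5 V
print(voltage_for_slowdown(2, 5.0, 0.8))  # Vt > 0: a smaller voltage drop
```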
Increasing Multiprocessing
[Figure: $P_N/P_1$ (0 to 1.0) versus N (1 to 12) for 1.2µ CMOS, $V_{\text{ref}} = 5$ V, with curves for $V_t = 0.8$ V, $V_t = 0.4$ V, and $V_t = 0$ V (extreme case).]
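Extreme Case: $V_t = 0$
- Delay: $T \propto 1/V_{\text{ref}}$
- For N processing elements, allowing delay $NT$ gives $V_N = V_{\text{ref}}/N$, so
$$\frac{P_N}{P_1} = \frac{1 + \delta(N - 1)}{N^2} \propto \frac{1}{N} \quad \text{for large } N$$
- For negligible overhead, $\delta \to 0$: $P_N/P_1 = 1/N^2$
- For $V_t > 0$, the power reduction is smaller and there is an optimum value of N.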
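Combining the delay model with the power ratio reproduces both behaviors. With $V_t = 0$ and $\delta = 0$ the ratio is $1/N^2$. With $V_t > 0$ and nonzero overhead, the ratio bottoms out at an interior N. The overhead factor δ = 0.3 below is hypothetical, chosen only to show the effect, and the search range 1 to 12 mirrors the plot.

```python
def delay(v: float, vt: float) -> float:
    """Gate delay up to a constant factor: T ∝ V / (V - Vt)^2."""
    return v / (v - vt) ** 2


def v_n(n: int, v_ref: float, vt: float) -> float:
    """Supply voltage at which one copy may be n times slower (bisection)."""
    target = n * delay(v_ref, vt)
    lo, hi = vt + 1e-12, v_ref
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if delay(mid, vt) > target else (lo, mid)
    return hi


def power_ratio(n: int, v_ref: float, vt: float, delta: float) -> float:
    """P_N / P_1 = (V_N / V_ref)^2 * (1 + delta * (N - 1))."""
    return (v_n(n, v_ref, vt) / v_ref) ** 2 * (1 + delta * (n - 1))


def optimum_n(v_ref: float, vt: float, delta: float, n_max: int = 12) -> int:
    """Degree of parallelism in 1..n_max that minimizes P_N / P_1."""
    return min(range(1, n_max + 1), key=lambda n: power_ratio(n, v_ref, vt, delta))


# Extreme case Vt = 0 with no overhead: P_N / P_1 = 1 / N^2.
print(round(power_ratio(4, 5.0, 0.0, 0.0), 4))  # 0.0625
# Vt > 0 with a hypothetical overhead factor delta = 0.3: an interior optimum.
print(optimum_n(5.0, 0.8, 0.3))
```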
Reduced-Power Shift Register
[Figure: input D feeds D flip-flops clocked by CK(f/2); an output multiplexer combines their outputs.]
The flip-flops operate at full voltage and half the clock frequency.
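A behavioral sketch of the interleaving idea, generalized to n banks to match the degrees of parallelism 1, 2, and 4 in the next table; the exact structure in the figure is assumed. Each bank has length/n stages and is clocked only every nth cycle (rate f/n). The output multiplexer selects the bank being clocked, so the output equals that of one full-length register clocked at f.

```python
from collections import deque


def serial_shift(xs: list[int], length: int) -> list[int]:
    """Single shift register clocked at f: output is the input delayed by `length`."""
    reg = deque([0] * length, maxlen=length)
    out = []
    for x in xs:
        out.append(reg[-1])  # last stage, read before the clock edge
        reg.appendleft(x)    # shift; maxlen drops the oldest bit
    return out


def parallel_shift(xs: list[int], length: int, n: int) -> list[int]:
    """n interleaved registers of length/n stages, each clocked at f/n.

    In cycle t only bank t % n is clocked, and the output multiplexer
    selects that same bank, so the overall delay is still `length` cycles.
    """
    assert length % n == 0
    banks = [deque([0] * (length // n), maxlen=length // n) for _ in range(n)]
    out = []
    for t, x in enumerate(xs):
        bank = banks[t % n]
        out.append(bank[-1])
        bank.appendleft(x)
    return out


bits = [(7 * i + 3) % 5 % 2 for i in range(64)]
assert serial_shift(bits, 16) == parallel_shift(bits, 16, 2) == parallel_shift(bits, 16, 4)
```

Each flip-flop in the n-bank version is clocked once every n cycles, which is where the f/n in the power expression comes from.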
Power Consumption of Shift Register
16-bit shift register, 2µ CMOS
$P = C V_{DD}^2 f/n$

Degree of parallelism (n)   Frequency (MHz)   Power (µW)
1                           33.0              1535
2                           16.5              887
4                           8.25              738

[Figure: normalized power (0 to 1.0) versus degree of parallelism n = 1, 2, 4.]
C. Piguet, "Circuit and Logic Level Design," pages 103-133 in W. Nebel and J. Mermet (eds.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
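Normalizing the measured powers in the table to n = 1 makes them directly comparable with the 1/n trend that $P = C V_{DD}^2 f/n$ predicts when C and $V_{DD}$ are unchanged. This sketch uses only the three measured values from the table.

```python
# Measured values from the table above (16-bit shift register, 2µ CMOS).
MEASURED_UW = {1: 1535, 2: 887, 4: 738}


def normalized_power(measured: dict[int, float]) -> dict[int, float]:
    """Power at each degree of parallelism n, relative to n = 1."""
    base = measured[1]
    return {n: p / base for n, p in measured.items()}


for n, ratio in normalized_power(MEASURED_UW).items():
    # 1/n is what P = C V_DD^2 f / n predicts with C and V_DD unchanged.
    print(f"n = {n}: measured {ratio:.3f}, formula 1/n = {1 / n:.3f}")
```

The measured ratios (about 0.578 for n = 2 and 0.481 for n = 4) fall short of 0.5 and 0.25, so the returns from increasing n diminish in this experiment.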
Multicore Processors
- D. Geer, "Chip Makers Turn to Multicore Processors," Computer, vol. 38, no. 5, pp. 11-13, May 2005.
- A. Jerraya, H. Tenhunen and W. Wolf, "Multiprocessor Systems-on-Chips," Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
Multicore Processors
[Figure (Computer, May 2005, p. 12): performance of single-core and multicore processors from 2000 to 2008, based on SPECint2000 and SPECfp2000 benchmarks.]