Title: ELEC 5970-001/6970-001 (Fall 2005) Special Topics in Electrical Engineering: Low-Power Design of Electronic Circuits
1. ELEC 5970-001/6970-001 (Fall 2005) Special Topics in Electrical Engineering
Low-Power Design of Electronic Circuits
Power Aware Microprocessors
- Vishwani D. Agrawal
- James J. Danaher Professor
- Department of Electrical and Computer Engineering
- Auburn University
- http://www.eng.auburn.edu/~vagrawal
- vagrawal@eng.auburn.edu
2. SIA Roadmap for Processors (1999)
Source: http://www.semichips.org
3. Power Reduction in Processors
- Just about everything is used.
- Hardware methods
  - Voltage reduction for dynamic power
  - Dual-threshold devices for leakage reduction
  - Clock gating, frequency reduction
  - Sleep mode
- Architecture
  - Instruction set
  - Hardware organization
- Software methods
4. SPEC CPU2000 Benchmarks
- Twelve integer and 14 floating-point programs, CINT2000 and CFP2000.
- Each program's run time is normalized to obtain a SPEC ratio with respect to the run time on a Sun Ultra 5_10 with a 300 MHz processor.
- The CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
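The normalization and averaging described above can be sketched as follows. This is a minimal illustration, not SPEC's tooling: the program names and run times are made up, and the scale factor of 100 (so that the reference machine itself scores 100) is an assumption that is not stated on the slide.

```python
import math

def spec_ratio(reference_time_s, measured_time_s, scale=100.0):
    """Run time normalized to the reference machine (scale is an assumed convention)."""
    return scale * reference_time_s / measured_time_s

def geometric_mean(values):
    """Geometric mean computed through logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical run times in seconds on the reference machine and on the machine under test.
reference_times = {"prog_a": 1400.0, "prog_b": 1800.0, "prog_c": 1100.0}
measured_times = {"prog_a": 140.0, "prog_b": 100.0, "prog_c": 220.0}

ratios = [spec_ratio(reference_times[p], measured_times[p]) for p in reference_times]
summary = geometric_mean(ratios)
print(f"SPEC ratios: {ratios}")
print(f"Summary (geometric mean): {summary:.1f}")
```

The geometric mean is used rather than the arithmetic mean so that the summary does not depend on which machine is chosen as the reference.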
5. Reference CPU: Sun Ultra 5_10 300 MHz Processor
6. CINT2000: 3.4 GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base: 1341; SPECint2000: 1389
Source: www.spec.org
7. Two Benchmark Results
- Baseline: A uniform configuration, not optimized for a specific program
  - Same compiler with the same settings and flags used for all benchmarks
  - Other restrictions
- Peak: Run is optimized for obtaining the peak performance for each benchmark program.
8. CFP2000: 3.6 GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base: 1627; SPECfp2000: 1630
Source: www.spec.org
9. CINT2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base: 579; SPECint2000: 588
Source: www.spec.org
10. CFP2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base: 648; SPECfp2000: 659
Source: www.spec.org
11. Energy SPEC Benchmarks
- Energy efficiency mode: Besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by
$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$
12. Energy Efficiency
- Efficiency averaged on n benchmark programs:
$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$
  where $\text{Efficiency}_i$ is the efficiency for program i.
- Relative efficiency:
$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
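As a rough sketch of how these definitions combine, the snippet below computes per-program energy efficiency, the geometric-mean efficiency over n programs, and the efficiency relative to a reference computer. All measurements are hypothetical (execution time in seconds, energy in joules) and the function names are illustrative only.

```python
import math

def energy_efficiency(execution_time_s, energy_joules):
    """(1 / execution time) divided by the joules consumed."""
    return (1.0 / execution_time_s) / energy_joules

def mean_efficiency(efficiencies):
    """Geometric mean of the per-program efficiencies."""
    n = len(efficiencies)
    return math.exp(sum(math.log(e) for e in efficiencies) / n)

def relative_efficiency(machine_eff, reference_eff):
    """Efficiency of a computer relative to the reference computer."""
    return machine_eff / reference_eff

# Hypothetical (execution time, energy) pairs for two benchmark programs.
machine_runs = [(10.0, 50.0), (20.0, 80.0)]
reference_runs = [(40.0, 100.0), (50.0, 200.0)]

machine_eff = mean_efficiency([energy_efficiency(t, e) for t, e in machine_runs])
reference_eff = mean_efficiency([energy_efficiency(t, e) for t, e in reference_runs])
rel = relative_efficiency(machine_eff, reference_eff)
print(f"Relative energy efficiency: {rel:.2f}")
```

Note that this metric rewards both shorter run time and lower energy, so a slow but very low-power configuration can still score well.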
13. SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency for three settings: always max. clock; laptop adaptive clock; min. power min. clock]
14. Voltage Scaling
- Dynamic: Reduce voltage and frequency during idle or low-activity periods.
- Static: Clustered voltage scaling
  - Logic on non-critical paths is given a lower voltage
  - 47% power reduction with 10% area increase reported.
- M. Igarashi et al., "Clustered Voltage Scaling Techniques for Low-Power Design," Proc. IEEE Symp. Low Power Design, 1997.
15. Pipeline Gating
- A pipeline processor uses speculative execution.
- Incorrect branch prediction results in pipeline stalls and wasted energy.
- Idea: Stop fetching instructions if a branch hazard is expected
- If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend fetching instructions for some k cycles.
- Ref.: S. Manne, A. Klauser and D. Grunwald, "Pipeline Gating: Speculation Control for Energy Reduction," Proc. 25th Annual International Symp. Computer Architecture, June 1998.
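The gating rule above (count M, threshold N, stall of k cycles) can be sketched as a small behavioral model. This is only an illustration of the rule as stated on the slide, not the mechanism from the cited paper: the class name, the per-cycle interface, and the choice to reset M once gating triggers are all assumptions.

```python
class PipelineGate:
    """Behavioral sketch of fetch gating driven by a misprediction counter."""

    def __init__(self, n_threshold, k_cycles):
        self.n_threshold = n_threshold  # N: allowed count before gating
        self.k_cycles = k_cycles        # k: cycles for which fetch is suspended
        self.m_count = 0                # M: incorrect predictions seen so far
        self.stall_remaining = 0

    def record_misprediction(self):
        """Increment M and start a stall when M exceeds N."""
        self.m_count += 1
        if self.m_count > self.n_threshold:
            self.stall_remaining = self.k_cycles
            self.m_count = 0  # assumption: the counter restarts after gating

    def fetch_allowed(self):
        """Call once per cycle; returns False while fetch is suspended."""
        if self.stall_remaining > 0:
            self.stall_remaining -= 1
            return False
        return True


# Usage with hypothetical parameters N = 2 and k = 3.
demo_gate = PipelineGate(n_threshold=2, k_cycles=3)
for _ in range(3):
    demo_gate.record_misprediction()
print([demo_gate.fetch_allowed() for _ in range(5)])
```

Larger N gates less often (less energy saved, less performance lost), while larger k trades performance for fewer wasted fetches.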
16. Slack Scheduling
- Application: Superscalar, out-of-order execution
  - An instruction is executed as soon as the data and resources it needs become available.
  - A commit unit reorders the results.
- Delay the execution of instructions whose result is not immediately needed.
- Example of RISC instructions:
  - add r0, r1, r2 (A)
  - sub r3, r4, r5 (B)
  - and r9, x1, r9 (C)
  - or r5, r9, r10 (D)
  - xor r2, r10, r11 (E)
- J. Casmira and D. Grunwald, "Dynamic Instruction Scheduling Slack," Proc. ACM Kool Chips Workshop, Dec. 2000.
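To make the notion of slack concrete, the sketch below derives the data dependences among instructions A to E and computes how many cycles each one could be delayed without lengthening the schedule. It rests on several assumptions that the slide does not state: the last operand is taken as the destination register, every instruction has a latency of one cycle, only true (read-after-write) dependences are considered, and issue width is unlimited.

```python
# (label, source registers, destination register); destination-last order is assumed.
program = [
    ("A", ("r0", "r1"), "r2"),
    ("B", ("r3", "r4"), "r5"),
    ("C", ("r9", "x1"), "r9"),
    ("D", ("r5", "r9"), "r10"),
    ("E", ("r2", "r10"), "r11"),
]

def compute_slack(instructions):
    """Return {label: slack in cycles} from earliest and latest start cycles."""
    last_writer = {}
    producers = {}
    for label, sources, dest in instructions:
        # An instruction depends on the most recent writer of each source register.
        producers[label] = {last_writer[r] for r in sources if r in last_writer}
        last_writer[dest] = label

    earliest = {}
    for label, _, _ in instructions:
        earliest[label] = max((earliest[p] + 1 for p in producers[label]), default=0)

    last_cycle = max(earliest.values())
    latest = {}
    for label, _, _ in reversed(instructions):
        consumers = [c for c in producers if label in producers[c]]
        latest[label] = min((latest[c] - 1 for c in consumers), default=last_cycle)

    return {label: latest[label] - earliest[label] for label, _, _ in instructions}

slack = compute_slack(program)
print(slack)
```

Under these assumptions only A has nonzero slack: its result feeds E, which must also wait for D, so A is a candidate for delayed execution on a slower, lower-power unit.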
17. Slack Scheduling Example
18. Slack Scheduling
[Figure: block diagram with re-order buffer, scheduling logic, low-power execution units, and slack bit]
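19. Parallel Architecture
[Figure: a single processor clocked at f, compared with two parallel processors each clocked at f/2]
- Single processor: capacitance = C, voltage = V, frequency = f, power = $CV^2 f$
- Parallel processors: capacitance = 2.2C, voltage = 0.6V, frequency = 0.5f, power = $0.396\,CV^2 f$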
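The power figure for the parallel design follows directly from the dynamic power relation (power proportional to capacitance times voltage squared times frequency). A minimal sketch, with a hypothetical helper that works in units of the single-processor power:

```python
def relative_power(cap_factor, voltage_factor, freq_factor):
    """Dynamic power relative to a baseline with capacitance C, voltage V, frequency f."""
    return cap_factor * voltage_factor ** 2 * freq_factor

baseline = relative_power(1.0, 1.0, 1.0)
parallel = relative_power(2.2, 0.6, 0.5)
print(f"Parallel design power: {parallel:.3f} x baseline")
```

The capacitance more than doubles, but the quadratic dependence on voltage outweighs it: throughput is preserved by running two units at half the clock rate, which is what allows the supply voltage to be lowered.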
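20. Pipeline Architecture
[Figure: a single processor clocked at f, compared with a pipeline of two half-processor stages separated by registers, clocked at f]
- Single processor: capacitance = C, voltage = V, frequency = f, power = $CV^2 f$
- Pipelined processor: capacitance = 1.2C, voltage = 0.6V, frequency = f, power = $0.432\,CV^2 f$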
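As a worked check of the pipelined figure, substituting the scaled capacitance and voltage into the dynamic power relation gives:

```latex
P_{\text{pipeline}} = (1.2\,C)\,(0.6\,V)^2\,f
                    = 1.2 \times 0.36 \times C V^2 f
                    = 0.432\, C V^2 f
```

Here the clock frequency is unchanged; the voltage can be lowered because each pipeline stage contains roughly half the logic, and the extra registers account for the added capacitance.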
21. Approximate Trend
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
22. Clock Distribution
[Figure: clock distribution network]
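23. Clock Power
$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$
where $C_L$ = total load capacitance and $\lambda$ = constant fanout at each stage in the distribution network.
Clock consumes about 40% of total processor power.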
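The clock power expression is a finite geometric series over the stages of the distribution network, which is easy to evaluate numerically. The sketch below assumes a constant fanout greater than one at every stage; the parameter values are hypothetical and chosen only to show the arithmetic.

```python
def clock_power(c_load, v_dd, freq, fanout, stages):
    """Clock network power: load power times the sum of 1/fanout**n for n = 0..stages-1."""
    load_power = c_load * v_dd ** 2 * freq
    return load_power * sum(1.0 / fanout ** n for n in range(stages))

# Hypothetical values: 1 nF total load, 2.0 V supply, 600 MHz clock, fanout 4, 3 stages.
p_clk = clock_power(c_load=1e-9, v_dd=2.0, freq=600e6, fanout=4, stages=3)
print(f"Clock power: {p_clk:.2f} W")
```

Because the series converges quickly, the final load dominates: with many stages the multiplier approaches fanout / (fanout - 1), so the driver tree adds only a modest fraction on top of the load power.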
24. Clock Network Examples
D. W. Bailey and B. J. Benschneider, "Clocking Design and Analysis for a 600-MHz Alpha Microprocessor," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.
25. Power Reduction Example
- Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
- Reduce voltage to 1.5 V, power (5.3x) = 4.9 W
- Eliminate FP, power (3x) = 1.6 W
- Scale 0.75 → 0.35 µm, power (2x) = 0.8 W
- Reduce clock load, power (1.3x) = 0.6 W
- Reduce frequency 200 → 160 MHz, power (1.25x) = 0.5 W
- J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor," IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
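The chain of reductions can be replayed step by step to confirm the arithmetic. The sketch below uses only the factors listed above; the variable names are illustrative, and the check that the voltage step matches a quadratic dependence on supply voltage is an added observation rather than something stated on the slide.

```python
# (description, reduction factor) taken from the list above.
steps = [
    ("Reduce voltage to 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]

power_w = 26.0  # starting point: Alpha 21064 at 200 MHz and 3.45 V
history = []
for description, factor in steps:
    power_w /= factor
    history.append(round(power_w, 1))
    print(f"{description}: {power_w:.1f} W")

# The voltage factor is consistent with power scaling as voltage squared.
voltage_factor = (3.45 / 1.5) ** 2
print(f"(3.45 / 1.5)^2 = {voltage_factor:.2f}")
```

The frequency step is likewise consistent with a linear dependence on clock rate, since 200 / 160 = 1.25.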