Design challenges in sub-100nm high performance microprocessors

1
Design challenges in sub-100nm high performance
microprocessors
  • Nitin Borkar, Siva Narendra, James Tschanz,
    Vasantha Erraguntla
  • Circuit Research, Intel Labs
  • nitin.borkar@intel.com
  • siva.g.narendra@intel.com
  • james.w.tschanz@intel.com
  • vasantha.erraguntla@intel.com


2
Outline
  • Section 1 Challenges for low power and high
    performance (90 mins)
  • Historical device and system scaling trends
  • Sub-100nm device scaling challenges
  • Power delivery and dissipation challenges
  • Power efficient design choices
  • Section 2a Circuit techniques for variation
    tolerance (90 mins)
  • Short channel effects
  • Adaptive circuit techniques for variation
    tolerance

3
Outline (contd.)
  • Section 2b Circuit techniques for leakage
    control (90 mins)
  • Leakage power components
  • Leakage power prediction
  • Leakage reduction and control techniques
  • Section 3 Full-chip power reduction techniques
    (90 mins)
  • Micro-architecture innovations
  • Coding techniques for interconnect power
    reduction
  • CMOS compatible dense memory design
  • Special purpose hardware
  • Design methodologies challenges for CAD

4
Section 1
  • Challenges for low power and high performance

5
Moore's Law on scaling
6
Scaling of dimensions
7
Transistors on a chip
[Figure: transistors (MT, log scale) vs. year (1970-2010) for lead microprocessors, 4004 through Pentium 4; 2X growth in 1.96 years]
Transistors on lead microprocessors double every 2 years
8
Die size growth
[Figure: die size (mm, log scale) vs. year (1970-2010), 4004 through Pentium 4; ~7% growth per year, 2X growth in 10 years]
Die size grows by 14% to satisfy Moore's Law
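The growth figures on this slide are compound rates, so a short sketch can show how "2X in 10 years" maps to roughly 7% per year. The function names and the 2-year generation length are assumptions made for illustration; they are not from the slides.

```python
import math


def annual_growth_rate(factor: float, years: float) -> float:
    """Compound annual growth rate implied by `factor` growth over `years`."""
    return factor ** (1.0 / years) - 1.0


def doubling_time(rate_per_year: float) -> float:
    """Years needed to double at a given compound annual rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


# Die size: 2X growth in 10 years (from the slide).
die_rate = annual_growth_rate(2.0, 10.0)

# Assumption: one technology generation lasts about 2 years.
per_generation = (1.0 + die_rate) ** 2 - 1.0

print(f"{die_rate:.1%} per year, {per_generation:.1%} per 2-year generation")
```

Under these assumptions the per-generation figure comes out close to the 14% quoted on the slide.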
9
Frequency
[Figure: frequency (MHz, log scale) vs. year (1970-2010), 4004 through Pentium 4]
Lead microprocessor frequency doubles every 2 years
10
Performance
Applications will demand TIPS performance
11
Power
[Figure: power (Watts, log scale) vs. year (1971-2000), 4004 through Pentium 4, with a "Future" projection]
Lead microprocessor power continues to increase
12
Obeying Moore's Law...
200M--1.8B transistors on the Lead Microprocessor
13
Vcc will continue to reduce
[Figure: supply voltage (V, log scale) vs. year (1970-2010); labeled points 1.35, 1.15, 1 and 0.9 V]
Only 15% Vcc reduction to meet frequency demand
14
Constant Electric Field Scaling
15
Active capacitance density
Active capacitance grows 30-35% each technology
generation
16
Power will be a problem
[Figure: power (Watts, log scale) vs. year (1971-2008), 4004 through P4, with projected points at 500W, 1.5KW, 5KW and 18KW]
Power delivery and dissipation will be prohibitive
17
Closer look at the power
[Figure: power (Watts, log scale) vs. year (2002-2008). "Will be...": 500W, 1.5KW, 5KW, 18KW. "Should be...": 135W, 225W, 375W, 623W]
18
Advanced transistor design
[Figure: transistor cross-section labeled with shallow highly doped source/drain extension, thin TOX, halo/pocket, retrograde well, shallow trench isolation, n-well and deep source/drain]
19
Intel's 15 nm bulk transistor
R. Chau et al., IEDM 2000
20
Transistor scaling trends - SCE
[Figure: aspect ratio defined in terms of Le, Tox, Dj and D]
  • Short channel effect (SCE), as measured by aspect
    ratio, has been worsening with scaling

21
Transistor scaling challenges - Dj
  • Junction depth reduction
  • Device channel length decrease for same SCE
  • - Series resistance to the channel increases

22
Transistor scaling challenges - Tox
  • Thinning gate oxide
  • Increased gate tunneling leakage
  • Electrical thickness is 2X physical thickness
  • Gate stress now limits max VCC
  • Solutions
  • New decoupling caps
  • Modified oxides/gate materials
  • Model gate leakage in circuit simulation

23
VCC and VT scaling
24
Vcc scaling Soft errors
  • Vcc and cap scaling with technology reduces
    charge stored
  • Soft errors prominent in logic circuits
  • No error correction in logic circuits
  • Storage nodes per chip increasing
  • Higher soft errors at the chip level

25
Motivation
  • Soft error rate (SER) per bit staying constant in
    future processes
  • T. Karnik et al, 2001 VLSI Circuits Symposium
  • Need to reduce SER/bit

Goal Reduce chip-level SER with no performance
penalty and minimum power penalty
26
Measured Latch Data
SERX
2.25
7,000
2
5,250
Original
Errors
3,500
SER ImprovementX
1,750
Hardened
0
1
0.5
0.7
0.9
1.1
1.3
Supply Voltage (V)
T. Karnik et al, 2001 VLSI Circuits Symposium
  • Will need 2X SER improvement in latches with no
    performance loss.

27
VT vs. leakage
  • Leakage rises as the VT is lowered
  • MOS has a sub-threshold slope of 110mV/decade
  • Lower VT by 50mV -> 3X leakage
  • Solutions
  • Dual VT
  • Stacking of off gates
  • Controlled back gate bias?
  • Multiple process technologies Mobile vs.
    Performance?
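The "50mV lower VT gives about 3X leakage" figure follows from the 110mV/decade sub-threshold slope quoted above. A minimal sketch of that arithmetic, assuming leakage scales as 10^(-ΔVT/S) and nothing else changes (the function name is hypothetical):

```python
def leakage_multiplier(delta_vt_mv: float,
                       slope_mv_per_decade: float = 110.0) -> float:
    """Relative sub-threshold leakage after shifting VT by delta_vt_mv.

    Assumes leakage changes by one decade per `slope_mv_per_decade`
    of VT shift; a negative shift (lower VT) increases leakage.
    """
    return 10.0 ** (-delta_vt_mv / slope_mv_per_decade)


# Lowering VT by 50 mV at 110 mV/decade: a bit under 3X more leakage.
print(f"{leakage_multiplier(-50.0):.2f}X")
```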

28
Sub-threshold Leakage
Sub-threshold leakage current will increase
exponentially
Assumptions: 0.25µm, Ioff = 1nA/µm, 5X increase each
generation at 30ºC
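The stated assumption (5X more Ioff per generation) compounds quickly. The sketch below projects it forward; the list of generations is borrowed from the later "Leakage Power increases" chart and the helper name is hypothetical.

```python
# Assumption: generation sequence starting at the slide's 0.25 um baseline.
GENERATIONS_UM = [0.25, 0.18, 0.13, 0.10, 0.07, 0.05]


def projected_ioff(base_na_per_um: float = 1.0,
                   factor_per_generation: float = 5.0) -> dict:
    """Ioff per generation, growing by a fixed factor each step."""
    return {
        node: base_na_per_um * factor_per_generation ** step
        for step, node in enumerate(GENERATIONS_UM)
    }


for node, ioff in projected_ioff().items():
    print(f"{node:.2f} um: {ioff:8.0f} nA/um")
```

Five generations at 5X each multiply the baseline by 5^5 = 3125.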
29
Leakage Power
Excessive sub-threshold leakage power
30
Leakage Power increases
[Figure: Ioff (nA/µ, log scale) vs. temperature (30-100 C) for the 0.18µ, 0.13µ, 0.1µ, 0.07µ and 0.05µ generations]
Drain leakage will have to increase to meet frequency
demand. This results in excessive leakage power.
31
Wide Domino Functionality
[Figure: static gate driving the D1 and D2 domino gates; signals CLK, A, B, C, Q1 and Q2]
  • Lower AC noise margin Vt
  • Ioff could limit NOR fan-in
  • High activity, higher power, 2X
  • Irreversible logic evaluation
  • Scalability is not good
  • High performance: 30% over static
  • High fan-in NOR, less logic gates
  • High fan-in complex gates possible
  • Smaller area

32
Bitline Delay Scaling Problem
  • Bit line swing limited by parameter mismatch
    differential noise
  • Cell stability degrades with Vt lowering
  • Bit line delay ∝ (Cap/W) × Vswing / (Ion/W -
    rows × Ioff/W)
  • Reducing # of rows per bitline approaching limit
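The bit line delay relation on this slide can be sketched directly, reading it as delay ∝ (Cap/W) × Vswing / (Ion/W - rows × Ioff/W). The function name and the numbers in the example are hypothetical and only illustrate how leakage from unselected rows eats into the read current.

```python
def bitline_delay(cap_per_w: float, v_swing: float,
                  ion_per_w: float, ioff_per_w: float, rows: int) -> float:
    """Relative bit line delay (arbitrary units).

    The net discharge current is the on current of the accessed cell
    minus the leakage of the `rows` cells sharing the bit line.
    """
    net_current = ion_per_w - rows * ioff_per_w
    if net_current <= 0:
        raise ValueError("leakage from unselected rows exceeds read current")
    return cap_per_w * v_swing / net_current


# Hypothetical values: same bit line, 10X higher Ioff per cell.
low_leak = bitline_delay(1.0, 0.1, 100.0, 0.01, rows=128)
high_leak = bitline_delay(1.0, 0.1, 100.0, 0.10, rows=128)
print(f"delay ratio = {high_leak / low_leak:.3f}")
```

Fewer rows per bit line shrink the leakage term, which is the lever the last bullet says is approaching its limit.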

33
Restrict transistor leakage
[Figure: frequency (MHz, log scale) vs. year (1985-2010), 386 through Pentium 4, with projected points at 2.5, 4, 5.5 and 7 GHz]
Reduce leakage -> frequency will not double every
2 years
34
Interconnect scaling trends
35
Interconnect performance
R increases faster at lower levels. C increases
faster at higher levels. RC increases 40-60%.
36
Interconnect distribution
Interconnect distribution does not change
significantly
37
Wire Scaling
  • Uarch for short wires
  • Repeaters

38
Optimum Repeater
  • Vary
  • N size, P size
  • Repeater distance
  • Metal width, space

39
P, V, T Variations
40
Frequency & SD Leakage
[Figure: normalized frequency vs. normalized leakage (Isb), 0.18 micron, 1000 samples; annotated spreads of 30% and 20X]
41
Vt Distribution
[Figure: histogram of # of chips vs. ΔVTn (mV), 0.18 micron, 1000 samples; x-axis from -39.71 to 32.49 mV, with a 30mV annotation]
42
Frequency Distribution
[Figure: histogram of # of chips vs. normalized frequency, 1.00 to 1.37]
43
Isb Distribution
[Figure: histogram of # of chips vs. normalized Isb, 1.00 to 20.11]
44
Supply Voltage Variation
  • Activity changes
  • Current delivery: RI and L(di/dt) drops
  • Dynamic: ns to 10-100us
  • Within-die variation
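The two supply-drop terms named above, RI and L(di/dt), simply add. A minimal sketch follows; every number in the example is a hypothetical value chosen for illustration, not a figure from the slides.

```python
def supply_droop(current_a: float, resistance_ohm: float,
                 inductance_h: float, di_dt_a_per_s: float) -> float:
    """Supply voltage drop (V) as the sum of resistive and inductive terms."""
    resistive = current_a * resistance_ohm       # R * I
    inductive = inductance_h * di_dt_a_per_s     # L * di/dt
    return resistive + inductive


# Hypothetical: 50 A through 1 mOhm, plus 10 pH seeing a 1 A/ns current step.
droop = supply_droop(50.0, 1e-3, 10e-12, 1e9)
print(f"droop = {droop * 1000:.0f} mV")
```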

45
Handling di/dt
[Figure: di/dt handling stages: bulk decoupling, high frequency decoupling, VRM response, local decoupling, silver box response, on-die decoupling]
46
Vcc Variation Reduction
  • On die decoupling capacitors reduce ΔVcc
  • Cost: area, and gate oxide leakage concerns
  • On die voltage down converters & regulators

47
Temperature Variation
Cache
70ºC
Core
120ºC
  • Activity & ambient change
  • Dynamic: 100-1000us
  • Within-die variation

48
Major Paradigm Shift
  • From deterministic design to probabilistic and
    statistical design
  • A path delay estimate is probabilistic (not
    deterministic)
  • Multi-variable design optimization for
  • Yield and bin splits
  • Parameter variations
  • Active and leakage power
  • Performance

49
Performance Efficiency of µArch
Pollack's Rule
[Figure: growth (X) in area and in performance, lead vs. compaction, across technology generations 1.5, 1, 0.7, 0.5, 0.35 and 0.18]
Note: Performance measured using SpecINT and
SpecFP
  • Implications (in the same technology)
  • New microarchitecture 2-3X die area of the last
    uArch
  • Provides 1.5-1.7X performance of the last uArch

We are on the wrong side of a Square Law
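Pollack's Rule is commonly stated as performance growing with the square root of microarchitecture area (or complexity). The sketch below uses that common form, which is an assumption here since the slide only gives the 2-3X area and 1.5-1.7X performance figures; the function name is hypothetical.

```python
import math


def pollack_performance(area_ratio: float) -> float:
    """Expected performance ratio for a given area ratio, same technology.

    Assumes the square-root form of Pollack's Rule.
    """
    return math.sqrt(area_ratio)


for area in (2.0, 3.0):
    print(f"{area:.0f}X area -> {pollack_performance(area):.2f}X performance")
```

The square-root results for 2X and 3X area land near the performance range quoted on the slide, which is what being on the wrong side of a square law means: area, and with it power, grows faster than performance.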
50
Frequency Performance
  • Frequency increased 61X
  • 18.3X -> process technology
  • Additional 3.3X -> uArch
  • Performance increased 100X
  • 14X -> process technology
  • Additional 7X -> uArch, design

51
Design Efficiency: µArch
In the same process technology, compare Scalar ->
Super-scalar -> Dynamic -> Netburst
  • 2-3X growth in area
  • 1.4X growth in integer performance
  • 1.7X growth in total performance
  • 2-2.5X growth in power
Pollack's Rule in action: power inefficiency
52
Power Efficiency - Circuits
Assumptions: activity is 0.2 for static and 0.5 for
domino. Clock consumes 40% of full chip power.
High power circuits contribute to power
inefficiency
53
Power density will increase
Power density too high to keep junctions at low
temp
54
Thermal Solutions
[Figure: thermal resistance stack from junction to ambient: Tj (mounting), θjc junction-to-case (package) resistance, Tc (package), θcs case-to-sink (interface) resistance, Ts (heat sink), θsa sink-to-ambient (heat sink) resistance, Ta (attachment)]
55
Thermal Capability: Today
  • Package: polymer thermal interface, 1.5mm Cu heat
    spreader, 0.35 °C/W (typical)
  • Thermal Interface Material: thermal grease, phase
    change material, 0.12 °C/W
  • Heat Sink: Al folded fin, Cu base, 3.5 x 2.5 x 2 at
    400g, 0.38 °C/W, 5 (for RM fan)
[Figure: stacked thermal resistance (°C/W): package (0.35), TIM (0.12), heat sink (0.38); θJA = 0.82 °C/W]
TJ = 90 °C, TA = 45 °C, θJA = 0.82 °C/W: P =
(90 - 45) / 0.82 = 55W
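The power budget on this slide is the junction-to-ambient temperature difference divided by the total thermal resistance. A minimal sketch of that calculation, using the slide's values (the function name is hypothetical):

```python
def max_power_w(tj_c: float, ta_c: float, theta_ja_c_per_w: float) -> float:
    """Maximum dissipated power (W) that keeps the junction at tj_c."""
    return (tj_c - ta_c) / theta_ja_c_per_w


# Values quoted on the slide: TJ = 90 C, TA = 45 C, theta_JA = 0.82 C/W.
budget = max_power_w(90.0, 45.0, 0.82)
print(f"power budget = {budget:.1f} W")
```

Because the budget is inversely proportional to the total resistance, a higher power target at the same temperatures needs a proportionally lower junction-to-ambient resistance.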
56
Thermal Capability: Future
Must improve on all fronts: no silver bullet
57
Shrinking Size & Quieter
[Figure: system volume (cubic inches, 0-3000) for PC tower, mini tower, µ-tower, slim line and small PC]
Small & quiet, yet high performance
58
Thermal Budget
[Figure: desktop PC ASP (US$, 200-2200) for Performance PC and Value PC, 1995-2000. Source: Dataquest Personal Computers]
Shrinking ASP, and shrinking budget for thermals
59
Thermal
  • Throttling / clock gating
  • Circuits and sizing
  • 10% performance gain at same power can be
    translated into 25% power reduction by changing
    VCC
  • Improved die attach / package
  • Can affect new uArch / floor planning
  • Spread and reduce power

60
Thermal Envelope Cost
[Figure: thermal solution unit cost ($, log scale) vs. power (W, log scale): extrusions, HAR HS, HAR HS with heat pipe, liquid immersion, liquid spray, refrigeration; product points for Celeron, Pentium 4 proc, Itanium proc and mobile high perf]
61
The Odds
[Figure: heat-sink volume (in3), thermal budget (°C/W) and air flow rate (CFM) vs. power (0-250 W), with Pentium III and Pentium 4 points and projected heat dissipation volume and air flow rate]
Power -> Thermals -> Higher heat sink volume ->
Higher air flow. Is this cheaper, smaller, and
quieter?
62
What's next
  • Circuit techniques for variation tolerance
  • Circuit techniques for leakage control
  • Full-chip power reduction techniques
  • 30 min quiz

63
Section 2a
  • Circuit techniques for variation tolerance

64
Moore's law on scaling
65
Scaling of dimensions
66
Requires die size growth or same die size
67
(No Transcript)
68
[Figure: drain current on linear and log scales, marking VT and IOFF]
69
Barrier Lowering (BL)
[Figure: NMOS band diagram with source (n), p channel of length L and drain (n), depletion widths Xd, barrier height, and the direction of increasing electron energy]
70
Drain Induced BL (DIBL)
71
Impact of variation in L
[Figure: VT (Volts) vs. channel length (um) for BL (VDS ≈ 0) and DIBL (VDS = VDD)]
ΔL -> ΔVT -> ΔION & ΔIOFF
72
180nm measurements
Necessary to make circuits less sensitive to VT
(ION & IOFF) variation
73
Transistor scaling
[Figure: transistor aspect ratio in terms of L, Tox, Dj and D]
Short channel effects increase with scaling
74
Transistor scaling challenges - Dj
[Figure: NMOS IDN and PMOS IDP drive current (mA/µm) vs. junction depth (0-200 nm). Sources: S. Thompson et al., 1998; S. Asai et al., 1997]
75
Transistor scaling challenges - Tox
76
High-K Gate Dielectric
  • Lower gate leakage
  • Higher Cox at a given gate leakage

77
Parameter variation
Device and chip level parameters
Parameter variations increase with scaling.
Adaptive VDD, VT to reduce chip level variation.
78
Scaling challenges summary
  • L, VDD, VT scaling
  • -> Increasing parameter variation
  • -> Increasing sub-threshold leakage power
  • -> Increasing gate leakage power
  • Product life cycle reduced from 3.6 years to 2
    years
  • -> Concurrent engineering
  • -> Better prediction models

79
VT variation categories
80
Adaptive Body Bias (ABB)
81
Side effects of ABB
(2) Apply reverse bias
Determine impact of adaptive body bias on
within-die VT variation.
82
Short Channel MOS VT
BL↑ -> λb↑, DIBL↑ -> λd↑
83
Within-die VT Variation
Within-die VT variation is primarily due to CD
variation
84
Solutions
  • Bi-directional adaptive body bias
  • Several separate bias generators on-chip

85
Testchip die micrograph
  • 150nm CMOS
  • 21 subsites per die
  • Microprocessor critical path
  • Frequency = Min(F1..F21)
  • Power = Sum(P1..P21)
  • Separate VBS for each subsite
  • 62 dies per wafer
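The die-level bookkeeping above (frequency is the minimum over subsites, power is the sum) can be sketched as follows. The function name and the three-subsite sample values are hypothetical; the test chip itself has 21 subsites.

```python
def die_metrics(subsite_freqs, subsite_powers):
    """Return (die frequency, die power) from per-subsite measurements.

    The slowest subsite sets the die frequency; every subsite
    contributes to the die power.
    """
    return min(subsite_freqs), sum(subsite_powers)


# Hypothetical normalized measurements for three subsites.
freq, power = die_metrics([1.02, 0.97, 1.05], [0.30, 0.35, 0.28])
print(f"die frequency = {freq:.2f}, die power = {power:.2f}")
```

One slow subsite therefore limits the whole die, which is why the later slides bias each subsite separately.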

5.3 mm
4.5 mm
86
Sub-site micrograph
21 sub-sites with separate body bias for each
sub-site
87
CUT schematics
88
Simple Adaptive Body Bias (S-ABB)
Neglects WID variation
Area overhead 2%
89
Effectiveness of S-ABB
Frequency variation σ/µ: NBB 4.1%, S-ABB 1.0%
[Figure: frequency distributions for NBB and S-ABB]
90
Adaptive Body Bias (ABB)
Accounts for WID variation
Area overhead 2-3%
91
Effectiveness of ABB
Frequency variation σ/µ: NBB 4.1%, ABB 0.69%
[Figure: frequency distributions for NBB and ABB]
92
Adaptive Bias Distribution
[Figure: number of dies by NMOS/PMOS body bias combination (N FBB or N RBB with P FBB or P RBB): 1 die, 13 dies, 38 dies and 10 dies]
93
Frequency vs. Critical Path Count (NCP)
  • Frequency µ and σ reduce as NCP increases
  • Frequency distribution unchanged for NCP > 14
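A small Monte Carlo sketch can illustrate why both the mean and the spread of die frequency fall as the number of critical paths grows: the die is limited by its slowest path, and the maximum of many delays is larger and more tightly clustered than a single delay. This is a simplified model, not the measurement behind the slide. It assumes independent, identically distributed Gaussian path delays, and the names and the 5% sigma are hypothetical.

```python
import random
import statistics


def frequency_stats(ncp: int, sigma: float = 0.05,
                    dies: int = 4000, seed: int = 1):
    """Mean and standard deviation of die frequency for `ncp` critical paths.

    Assumption: each path delay is an independent Gaussian with mean 1.0;
    die frequency is the reciprocal of the slowest path delay.
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(dies):
        worst_delay = max(rng.gauss(1.0, sigma) for _ in range(ncp))
        freqs.append(1.0 / worst_delay)
    return statistics.mean(freqs), statistics.stdev(freqs)


for ncp in (1, 2, 14):
    mean, std = frequency_stats(ncp)
    print(f"NCP={ncp:2d}: mean={mean:.3f}, sigma={std:.4f}")
```

In this idealized model the distribution keeps shifting slowly as NCP grows; the saturation beyond 14 paths reported on the slide is a measured result that the sketch does not attempt to reproduce.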

94
WID Delay Variation vs. Logic Depth
NMOS σ/µ = 5.6%, PMOS σ/µ = 3.0%, delay σ/µ = 4.2%
[Figure: number of samples (%) vs. variation (%)]
                 Miyazaki, ISSCC 2000   This work
Path depth       49                     16
Device σ/µ       2.4%                   4.27%
Frequency σ/µ    0.55%                  4.17%
95
Within-Die Adaptive Body Bias (WID-ABB)
Compensates for WID variation
Area overhead Similar to ABB
96
Effectiveness of WID-ABB
97% in highest bin
Frequency variation σ/µ: ABB 0.69%, WID-ABB 0.21%
[Figure: frequency distributions for ABB and WID-ABB]
97
Within-Die Bias Distributions
[Figure: circuit block count vs. PMOS body bias (V) and NMOS body bias (V), each from RBB to FBB; regions: P FBB / N RBB, P,N FBB, P RBB / N FBB and P,N RBB]
98
Bias Resolution
Bias         ABB                     WID-ABB
resolution   dies, F > 1    σ/µ      dies, F > 1.075   σ/µ
500mV        79%            2.87%    2%                1.89%
300mV        100%           1.47%    66%               0.50%
100mV        100%           0.69%    97%               0.21%
  • 300mV bias resolution sufficient for ABB
  • WID-ABB requires 100mV bias resolution

99
ABB summary
  • D2D and WID variations impact microprocessor
    frequency and leakage
  • ABB improves die acceptance rate from 50% to 100%
  • ABB is most effective when WID variations are
    considered
  • Compensating for WID variations by WID-ABB
    increases number of high frequency dies from 32%
    to 97%

100
Adaptive VDD VT
For iso-frequency Decrease VDD Increase
VT Increase VDD Decrease VT
Fast die
Slow die
Psw ∝ VDD^2, Pleak ∝ 10^(-VT/S)
101
Testchip goals
  • Body bias (VBS) for VT modulation
  • Measure frequency improvement with
  • Adaptive VDD
  • Adaptive VBS
  • Adaptive VDD + VBS
  • Adaptive VDD + within-die VBS
  • Subject to total active and standby power
    constraints

102
Baseline measurements
103
Adaptive VDD vs. Fixed VDD
Active power limit: 10W/cm2. Standby power
limit: 0.5W/cm2.
Fixed VDD: 1.05V. Frequency reduced to meet
power limit.
Adaptive VDD: 20mV resolution. VDD & frequency
changed simultaneously.
104
VDD resolution requirement
[Figure: die count vs. frequency bin (0.9-1.05) for fixed VDD (1.05V), adaptive VDD with 50mV resolution and adaptive VDD with 20mV resolution]
Minimum of 20mV resolution in VDD is required
105
Adaptive VDD vs. Adaptive VBS
[Figure: die count vs. frequency bin (0.9-1.05) for adaptive VDD (20mV resolution) and adaptive VBS (100mV resolution)]
Target frequency bin: 6% fixed VDD, 10% adaptive
VDD, 16% adaptive VBS
106
Adaptive VDD + VBS
[Figure: die count vs. frequency bin (0.9-1.05) for adaptive VBS and adaptive VDD + VBS]
Adaptive VDD + VBS more effective than adaptive
VDD or adaptive VBS
107
VDD distribution
[Figure: distribution of dies over VDD (0.99-1.07 V)]
Adaptive VDD + VBS results in lower VDD than in
adaptive VDD
108
VBS distribution
[Figure: VBS distributions for adaptive VBS and adaptive VDD + VBS]
Adaptive VDD + VBS results in more dies with FBB
than in adaptive VBS
109
Adaptive VDD + Within-die VBS
Adaptive VDD + within-die VBS is most effective
110
AVDD + ABB Summary
  • 150nm CMOS with 10W/cm2 active & 0.5W/cm2 standby
    power density limits result in
  • 20mV resolution in VDD is required
  • 100mV resolution for VBS is required

111
Neighborhood VT variation
The devices of interest that are in close
proximity can be either of the same or different
polarity.
Impacts sense amps, diff amps, current mirrors
etc.
Impacts clock generation circuits, switching
thresholds etc.
Voltage biasing
Current biasing
112
Voltage biasing
Linear threshold voltage mismatch of matched
device pair for 500 mV forward body bias, zero
body bias and 500 mV reverse body bias.
113
Application to sense-amp
Traditional sense-amplifier
New sense-amplifier
114
Simulation results
1.5 V, 1 mV/pS ramp rate, and 110 C
115
Current biasing
Basic iso-current biasing
116
Application
Non-overlapping 2φ clock generation
117
Current biasing
Process insensitive current biasing
118
Iref existing techniques
  • Reference voltage to reference current conversion
  • Bandgap circuit with off-chip resistor
  • MOS reference voltage with off-chip
    resistor
  • Direct reference current generation
  • MOS based temperature compensation only

119
Objective
Generate process compensated current
  • with thin tox digital CMOS devices
  • without external resistors
120
Device measurement
0.18 µm CMOS technology, 30 °C, uncompensated
current
n = 77, σ = 235.8 µA, µ = 1.6 mA, σ/µ = 15%
121
Subtraction method
(at x Xmid)
(around Xmid)
y1 & y2 vary with x, but yΔ is insensitive to x
122
Example
Choose m1 ≠ m2 and n2. This will provide non-zero
yΔ insensitive to x around xd for proper n1
123
Illustration
n2 = 1, m1 = 4.2, m2 = 2, xd = 15 -> n1 = 0.13
[Figure: y1 (35), y2 (47) and yΔ (6) vs. x around xd]
124
MOS devices in saturation
Using long-channel wide devices
?
?
125
Compensation by subtraction
126
VT generation circuit
VDD
½ VDD
1VT
20/2
15/2
5VT
15/2
20/2
2/20
127
Subtraction circuit
VDD
Vsg2 ? 2VT
z2
z1
Iref = I1 - I2
z1/z2 = 1/8
I2
I1
128
Device measurement
0.18 µm digital CMOS technology, 30 °C
[Figure: measured I1 and I2, n = 112]
129
Compensated current
0.18 µm digital CMOS technology, 30 °C
n = 112, σ = 17.4 µA, µ = 305 µA, σ/µ = 5.7%
130
Sub-1 V operation
b, a   Vddmin (V)   Temp (°C)   Iref variation   Vdd sensitivity
5, 2   0.9          30          5.0%             0.3% per 100 mV
3, 2   0.6          30          5.2%             0.4% per 100 mV
Low voltage operation enabled by redesigning Vt
generation circuit
131
Process corner simulation results
0.18 µm digital CMOS technology, 30 °C, VDD = 0.9
V, z1/z2 = 1/6
[Figure: normalized current vs. process corner (slow to fast); uncompensated Iu varies from -16% to 22%, compensated Iref within 5%; bar labels range from 0.84 to 1.22]
7.6X smaller variation than uncompensated current
132
Summary on Iref
  • Subtraction technique for compensation
  • Compensation technique reduces reference current
    variation to 5% at Vdd of 0.9 V from 38%
  • Variation remains at 5% at Vdd of 0.6 V
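The variation figures in this part of the deck are ratios of standard deviation to mean, and the "7.6X" figure is the ratio of the uncompensated to the compensated spread. The sketch below redoes that arithmetic. It assumes the garbled units on the measurement slides read as 235.8 µA on 1.6 mA (uncompensated) and 17.4 µA on 305 µA (compensated); the helper name is hypothetical.

```python
def coefficient_of_variation(sigma: float, mean: float) -> float:
    """Standard deviation expressed as a fraction of the mean."""
    return sigma / mean


# Assumed unit readings of the measured-current slides (see lead-in).
uncompensated = coefficient_of_variation(235.8e-6, 1.6e-3)
compensated = coefficient_of_variation(17.4e-6, 305e-6)

# Corner simulation summary from the slides: 38% spread reduced to 5%.
improvement = 38.0 / 5.0

print(f"uncompensated sigma/mu = {uncompensated:.1%}")
print(f"compensated   sigma/mu = {compensated:.1%}")
print(f"corner spread improvement = {improvement:.1f}X")
```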

133
Section 2a Summary
  • Device parameter variation increases with scaling
    -> design margins increase
  • Adaptive schemes required to minimize impact of
    device variation on design margin of digital
    circuits
  • Voltage and current biasing schemes to minimize
    impact of variation on analog circuits

134
Section 2b
  • Circuit techniques for leakage control

135
Outline
  • Leakage sources & impact of variations
  • Leakage estimation with variations
  • Static leakage reduction techniques
  • Dynamic leakage reduction techniques
  • Leakage-tolerant circuits

136
Sources of Leakage
137
Transistor leakage mechanisms
From Keshavarzi, Roy, Hawkins (ITC 1997)
1. PN junction leakage
2. Weak inversion SD leakage
3. DIBL and contribution from SCE
4. GIDL
5. Punchthrough current
6. Narrow width effects
7. Gate oxide leakage
8. Hot carrier injection
138
Components of leakage
139
Subthreshold leakage trends
  • Historic Vt scaling: 15% per generation
  • S-D and gate leakage impact: 3-5X increase
  • Significant component of total power
  • Serious dynamic circuit robustness penalty

140
Leakage vs. switching power
Leakage > 50% of total power!
[Figure: leakage vs. switching power for the 250nm, 180nm, 130nm, 100nm and 70nm generations]
  • Key requirements
  • Accurate prediction of chip leakage power
  • Techniques to reduce chip leakage power

141
DIBL impact on leakage
[Figure: VT (Volts) vs. channel length (um) for BL (VDS ≈ 0) and DIBL (VDS = VDD)]
Higher IOFF due to DIBL
142
Variation impact on leakage
[Figure: intrinsic IOFF (A, log scale, 1.0E-11 to 1.0E-05) vs. 1/IDlin (1500-3500) at 110C, VD = 1V, NBB = 0V, for 150 nm technology and 0.18 µm CMOS; Lnom and Lwc marked, with IOFF rising toward shorter L]
Shorter L transistors contribute more to chip
leakage
143
Transistor scaling challenges - Tox
144
High-K Gate Dielectric
  • Lower gate leakage
  • Higher Cox at a given gate leakage

145
Source/Drain Tunneling Leakage
146
Leakage Estimation and Modeling
147
Leakage estimation
Prior techniques
148
New model
Includes within-die variation
After simplification using error function
properties,
149
Applications
150
Measurement results
0.18 um 32-bit microprocessors (n = 960)
50% of the samples within 20% of the measured
leakage, compared to 11% and 0.2% of the samples
using other techniques
151
Static Leakage Reduction: 1) Transistor Stacks
152
Leakage of Stacks
Stack leakage is 5-10X smaller
153
Scalability: Stack Effect
Stack effect becomes stronger with scaling
154
Exploiting natural stacks
32-bit Kogge-Stone adder
                      High VT    Low VT
Energy overhead       1.64 nJ    1.84 nJ
Savings               2.2 µA     38.4 µA
Min time in standby   84 µs      5.4 µs

Reduction   Avg    Worst
High VT     1.5X   2.5X
Low VT      1.5X   2X
155
Stack forcing
Delay Penalty
Leakage Reduction
Equal Loading
Low-Vt stack-forcing reduces leakage power by 3X
156
Static Leakage Reduction: 2) Dual-Vt Process
157
Dual VT design technique
Leakage 3X smaller (active & standby). No
performance loss.
158
Optimum choices of high & low Vt
75-100mV VT difference is optimal
159
Dual-VT and sizing
  • Techniques
  • DVT
  • min-lvt
  • min-area
  • min-pwr

Optimize design with concurrent dual-VT
allocation and sizing
160
Results total power
[Figure: total power (normalized), split into switching and leakage, for DVT, min-lvt, min-area and min-pwr at 1.96 GHz (high-VT target), 2.21 GHz and 2.30 GHz (low-VT target)]
  • Total power reduced by 6-8% over DVT-only
  • Leakage power reduced by 20% over DVT-only

161
Results total device width
[Figure: total width (normalized), split into high-VT and low-VT, for DVT, min-lvt, min-area and min-pwr at 1.96 GHz (high-VT target), 2.21 GHz and 2.30 GHz (low-VT target)]
  • Less low-VT usage than DVT-only
  • Trade-off between area low-VT usage

162
Results area comparison
[Figure: area comparison at 2.3GHz for DVT, min-lvt, min-pwr and min-area]
min-lvt: 15% area overhead
20% burn-in power reduction
163
Effect of leakage change
  • Push leakage in manufacturing to increase
    frequency
  • Dual-VT design ideally push low-VT only

[Figure: path count (x1000) vs. path delay (ps) for DVTS original (2.2 GHz) and DVTS with low-VT leakage push (2.76 GHz)]
High-VT paths do not speed up
164
Enhanced dual-VT design
  • Allow for efficient frequency change
  • Insert additional low-VT devices

[Figure: path count (x1000) vs. path delay (ps) for EDVTS,20 (2.2 GHz) and EDVTS with low-VT leakage push (2.76 GHz)]
Dual-VT insertion should consider process scaling
165
Dual-VT sizing summary
  • Dual-VT sizing reduces low-VT usage by 30-60%
    compared with DVT-only
  • Leakage power reduced by 20%
  • Dual-VT designs offer 9% frequency improvement
    over single-VT
  • Enhanced design allows frequency increase through
    low-VT leakage push

166
Dynamic Leakage Reduction: 1) Body bias
167
Reverse body bias
Total leakage power measured on 0.18µm test chip
Tech        0.35 µm   0.18 µm
Opt. RBB    2V        0.5V
Ioff red.   1000X     10X
RBB reduces SD leakage. Less effective with
shorter L, lower VT, scaling.
168
Impact of scaling on RBB effectiveness
RBB becomes less effective with technology scaling
169
Switching leakage reduction forward body bias
20% power reduction at 1GHz; 8% higher frequency at
iso-power; 20X lower idle-mode leakage
170
Router chip with forward body bias
150nm technology
Digital core with on-chip PMOS body bias
generator (BG).
171
Power and performance gain by FBB
33% performance gain at 1.1V! 25% power
reduction at 1GHz!!
172
Standby leakage control by FBB
173
Dynamic Leakage Reduction: 2) Dynamic sleep
transistor
174
Active leakage control
Sleep transistor
Body bias
175
32-bit ALU overview
Technology    130nm dual-VT CMOS
Die area      1.61 x 1.44 mm2
Transistors   160K
Frequency     4.05GHz @ 1.28V, 450mV FBB, 75C
CBG: central bias generator. LBG: local bias
generator.
176
Sleep transistor layout
ALU
Sleep transistor cells
177
Body bias layout
[Figure: ALU layout showing ALU core LBGs and sleep transistor LBGs]
Number of ALU core LBGs: 30
Number of sleep transistor LBGs: 10
PMOS device width: 13mm
Area overhead: 8%
178
Frequency leakage impact
Reference: no sleep transistor, 450mV FBB to core, 1.35V, 75C

                                           Frequency     Leakage     Area
                                           degradation   reduction   increase
PMOS sleep transistor
  No over/under drive or sleep body bias   2.3%          37X         6%
  200mV over/under drive                   1.8%          44X         7%
  Sleep body bias, FBB/RBB                 1.8%          64X         8%
PMOS body bias
  Dynamic body bias, FBB/ZBB               0%            1.9X        8%
179
Virtual supply convergence
Convergence > 1ms
  • Convergence time is dependent on capacitance

Convergence < 1ms
Leaky MOS decap on virtual VCC: better leakage
savings for > 1ms idle time
180
Total power equal frequency
TON = 100 cycles, 75C, α = 0.05, F = 4.05GHz
[Figure: total power (mW), split into switching, leakage, LBG and overhead, for clock gating only, clock gating + body bias and clock gating + sleep transistor; annotated changes of 77, 45 and 3, with total savings of 8% and 15%]
181
Leakage-Tolerant Circuits: 1) Dynamic register
file
182
Impact of increasing leakage
  • Leakage disturbs the local bit line (LBL)
  • Noise can result in erroneous evaluation
  • Wider addressing exacerbates problem

183
Dual-Vt design for robustness
  • High-Vt and stronger keepers mitigate leakage and
    improve robustness
  • Contention causes severe penalty in delay

184
Source-follower NMOS (SFN)
  • As leakage charges the output node, feedback
    reduces the leakage

Automatic Vgs reduction and reverse Vbs
185
Leakage bypass w/ stack forcing
  • Extra PMOSs supply leakage currents
  • Leakage is bypassed away from LBL
  • Extra NMOS device forces stack node

Stack node
186
Better robustness vs. delay
Larger keeper smaller skew
  • DVT: much better than LVT

187
Energy vs. delay for SFN
  • Robustness fixed at 10% across all points
  • Leakage-tolerant techniques not only improve
    robustness, but reduce energy as well
  • SFN width not as competitive because of PMOS
    pull-up

188
Energy vs. delay for LBSF
  • LBSF faster despite 3-stack pull-down in LBL,
    2-stack in GBL
  • Comparable total width in pull-down stacks yield
    similar capacitance

189
Summary of LBSF and SFN
Full LBSF SFN DVT SFN LBSF
Delay improvement 33% 10% 31%
Energy reduction 37% 24% 38%
Total width reduction 47% -3% 26%
  • Improved RF robustness without delay penalty
  • Advantages of LBSF and SFN improve as leakage
    increases

190
Leakage-Tolerant Circuits: 2) L1 cache using
bitline leakage reduction (BLR)
191
Bitline leakage reduction
  • Memory cell HVT and Lmax
  • Solution: larger, dual-Vt cell for L1 cache
  • 3 types of cells
  • HVT Lmax
  • HVT Lmin
  • DVT Lmin

192
Intrinsic and effective read current
  • DVT Lmin cell: IINT is 35% larger, IEFF is smaller
  • 128 rows per bitline
  • 100 nm technology

193
Bitline leakage reduction
WL -100mV ? Vmax Vvc Vmax 100mV
194
BLR test chip results
  • 2Kb bank of 16Kb L1 cache

BLR: 25% higher read current, 3% larger cell area
195
BLR performance
1.2V, 110 °C
  • Bitline delay improved from 91ps to 75ps
  • Read delay reduced from 159ps to 132ps
  • Bitline development rate improved by 8

196
Leakage-Tolerant Circuits: 3) Conditional keeper
for burn-in
197
Leakage at burn-in (BI)
  • BI conditions (elevated voltage and temperature)
    further challenge the leakage issue
  • Higher leakage, higher temperature
  • Thermal runaway issue and positive feedback
    effect
  • Impact of leakage (especially at BI) on circuit
    functionality
  • Stability of IDDQ measurement with BI stress

198
Keepers need to be upsized for burn-in
  • Larger keepers increase delay at normal
    condition

199
Burn-in conditional keeper
[Figure: burn-in conditional keeper: normal mode keeper PK1 (min. sized) and effective burn-in keeper PKB enabled by the burn-in signal (BI), with clocked precharge and pull down NMOS]
200
Burn-in keeper 100nm comparison
[Figure: normalized delay (normal condition) and delay improvement vs. NOR fan-in (number of inputs) for STD and BI-CKP; burn-in keeper size relative to pull down]
Larger delay improvement for wider dynamic gates
201
Summary
  • Control of leakage power becoming crucial
  • Leakage estimation is necessary during design
    phase
  • Static and dynamic techniques can be used for
    leakage control
  • Dual-VT process and stack effect
  • Dynamic sleep transistor and body bias
  • Leakage-tolerant circuits
  • Cache and memory leakage techniques
  • Burn-in leakage reduction

202
Section 3
Full-chip power reduction techniques and design
methodologies
203
Micro architecture innovations
204
µArchitecture Tradeoffs
  • Higher target frequency with
  • Shallow logic depth
  • Larger number of critical paths
  • But with lower probability

205
Improve µArch Efficiency
Thermals & power delivery designed for full HW
utilization
[Figure: execution timelines. Single thread (ST) stalls on "Wait for Mem"; multi-threading interleaves threads MT1, MT2 and MT3 to fill the wait periods]
Multi-threading improves performance without
impacting thermals & power delivery
206
Still obey Moore's Law!
[Figure: transistors (MT, log scale) vs. year (2000-2008), actual vs. Moore's Law]
Total transistors meet Moore's Law
207
Fred's Rule
[Figure: growth (X) in area and in performance, lead vs. compaction, across technology generations 1.5, 1, 0.7, 0.5, 0.35 and 0.18]
In the same process technology: 2X area -> 1.4X
performance
208
Reduced die size causes Performance gap
30-60% performance loss even after meeting
Moore's Law
209
Exploit Memory: Low PD
  • Large on die caches provide
  • Increased data bandwidth & reduced latency
  • Hence, higher performance for much lower power

210
Memory has lower power density
Exploit memory !
211
Increase memory area
[Figure: memory area as % of total die area vs. year (2000-2008); labeled values 29, 41, 54, 55 and 57]
Use > 50% die area in memory
212
Memory trend
[Figure: on-die memory (KB, log scale) vs. year (1980-2010); labeled points 8, 16 and 16 KB, then 1M, 2.5M, 5.5M, 12M and 24M]
213
Power density is reduced
Full chip power density is reduced, but local
power density will be high
214
Can DRAM help?
  • Transistor perf not critical for DRAM
  • Don't need large retention time
  • 10X more storage in same area & power
  • TB/sec bandwidth, at < 10ns latency

215
Embedded DRAM on logic
Provides 10X memory--same area, same power as SRAM
216
Embedded DRAM could improve performance
Source: Glenn Hinton, '99
  • Embedded DRAM provides
  • 10X increase in on-die Memory
  • 1,000 X increase in Bandwidth
  • 10X reduction in Latency

217
On-die DRAM Applications
(1)
(2)
218
130nm test chip
[Figure: dimension labels 0.52µm, 0.52µm, 0.73µm, 1.61µm, 1.05µm, 1.10µm; capacitor types: N/P Inversion, P/N Accumulation, P/P Depletion]
219
Area and Power Comparison
  • P/P the best from power and area perspective

220
Interconnect power reduction
221
Motivation: Cc Multiplier (CCM)
[Figure: coupled wires with ground cap Cg and coupling caps Cc, shown for CCM = 0, CCM = 1, and CCM = 2]
CCM = Cc Multiplier
  • RintCint delay of long busses is a key speed
    limiter
  • Coupling cap (Cc) is a large component of Cint

Cint = Cg + CCM × (2Cc)
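To make the relation Cint = Cg + CCM × (2Cc) concrete, here is a minimal sketch that evaluates it for the three CCM cases (commonly read as neighbors switching with the wire, neighbors quiet, and neighbors switching against it). The capacitance values and the function name `interconnect_cap` are hypothetical, chosen only for illustration.

```python
def interconnect_cap(cg, cc, ccm):
    """Effective wire capacitance: Cint = Cg + CCM * (2 * Cc)."""
    return cg + ccm * (2.0 * cc)

# Hypothetical per-unit-length values (fF/mm), not taken from the slides.
CG, CC = 40.0, 80.0

caps = {ccm: interconnect_cap(CG, CC, ccm) for ccm in (0, 1, 2)}
for ccm, c in caps.items():
    print(f"CCM = {ccm}: Cint = {c:.0f} fF/mm")
```

With these assumed numbers, the worst case (CCM = 2) presents nine times the capacitance of the best case, which is why schemes that bound CCM at 1 matter for delay.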
222
Coupling Capacitance Scaling
metal-4
  • Coupling capacitance remains a large fraction of
    Cint despite moving from Al to Cu.

223
Static Bus (SB)
  • Simple scheme with no timing constraints
  • Minimize delay through optimal repeater insertion
  • CCM of 2 has negative impact on delay

224
Dynamic Bus
  • Domino timing applied to interconnect
  • Monotonic transitions
  • Reduced collinear capacitance
  • Static (worst case) 2X
  • Dynamic (worst case) 1X
  • F2 repeater required; susceptible to noise
  • Higher transition activity when input = 1
  • Static CMOS inverters drive all segments

225
Dynamic Bus Advantages
  • Capacitance effects reduced
  • Collinear capacitance reduced 2X
  • Orthogonal capacitance unchanged
  • Inductance effects reduced
  • Can oppose transition for static bus
  • Can reduce capacitive effects for dynamic bus

226
Static Pulsed Bus (SPB)
  • Static PG generates a pulse on a data transition
  • Toggle FF (TFF) restores correct data at bus end
  • Leading edge is critical; repeaters are skewed

227
SPB Benefits
[Figure: coupled wires with Cg and Cc, shown for CCM = 0 and CCM = 1]
  • In SPB, data transitions are monotonic
  • worst case CCM = 1 and repeaters can be skewed
  • Similar to dynamic bus but (1) has no clock
    overhead and (2) its energy scales with switching
    activity

228
SB vs. SPB Delay
SPB Delay Breakdown
RC + Rep.: 77%
Other: 23%
  • SPB reduces delay by 22% as a result of
  • Repeater skewing
  • CCM < 1 due to useful noise coupling

229
SB vs. SPB Energy
  • SPB reduces energy by 12% due to
  • Smaller skewed repeater sizes
  • Smaller CCM

230
SB vs. SPB: Different Bus Lengths
at iso-energy
at iso-delay
  • At iso-energy, SPB improves delay by 15-25%
  • At iso-delay, SPB reduces energy by 12-25%
  • At iso-delay, SPB reduces current/width by 26-34%

231
SPB summary
  • SPB has monotonic data transitions
  • → worst case CCM = 1
  • → repeaters can be skewed
  • Unlike dynamic bus
  • → no clock precharge-evaluate energy and routing
  • → energy consumption is data activity dependent
  • For 1500µm-4500µm metal-4 line, SPB
  • → improves delay by 15-25%
  • → reduces energy by 12-25%
  • → reduces width by 34-42%
  • → reduces peak-current by 26-34%

232
Transition-Encoded Bus (TEB)
  • Encoder circuit
  • XOR of previous and current input
  • Domino compatible output
  • Decoder circuit
  • XOR of previous output and bus state
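The encoder and decoder described above can be sketched behaviorally as follows. This is only a bit-level model under the assumption that both sides start from a known state of 0; it does not model domino timing, and the function names are hypothetical.

```python
def teb_encode(bits, prev_in=0):
    """Encoder: bus carries a 1 only when the input changes."""
    bus = []
    for b in bits:
        bus.append(b ^ prev_in)  # XOR of previous and current input
        prev_in = b
    return bus

def teb_decode(bus, prev_out=0):
    """Decoder: XOR of previous output and bus state recovers the data."""
    out = []
    for s in bus:
        prev_out ^= s
        out.append(prev_out)
    return out

data = [0, 1, 1, 1, 0, 0, 1]
bus = teb_encode(data)
decoded = teb_decode(bus)
print(bus, decoded)
```

In this model a run of constant 1s on the input produces 0s on the bus, so bus activity follows the number of input transitions rather than the number of 1s.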

233
TEB Advantages
  • Dynamic bus performance improvement
  • Collinear capacitance reduction
  • Static bus energy
  • Transition dependent switching activity
  • Noise-insensitive F2 repeater required
  • Regains noise immunity of CMOS inverter

234
Energy Comparison
Static
Transition-encoded
9mm metal3, 130nm process, 1.2V, 30°C
235
Results
  • Averaged over 3-9mm buses
  • Metal3 in 130nm technology, 1.2V, 30°C

236
TEB summary
  • Transition-encoded bus
  • High performance, energy efficient on-chip
    interconnect technique
  • 32% active area reduction
  • 49% peak current reduction
  • Transition dependent energy consumption
  • → Energy savings at aggressive delay targets
  • Enables 10-35% performance improvement on 79% of
    full-chip Pentium 4 buses

237
Special purpose hardware
238
Special-Purpose HW
  • Special-purpose performance → more MIPS/mm²
  • SIMD integer and FP instructions in several ISAs
  • Integration of other platform components, e.g.
    memory controller, graphics
  • Special-purpose logic, programmable logic, and
    separately programmable engines

                     Die Area   Power   Performance
General Purpose      2X         2X      1.4X
Multimedia Kernels   <10%       <10%    1.5-4X
Improve power efficiency with Valued Performance
239
TCP/IP challenges
Saturated 1GbE
1GbE: 1.48M pkts/sec → 672 ns per packet
10GbE: 14.8M pkts/sec → 67.2 ns per packet
General purpose MIPS will not keep up!
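The packet rates and per-packet times quoted above can be reproduced with a short calculation. The sketch assumes minimum-size 64-byte Ethernet frames plus the standard 8-byte preamble and 12-byte inter-frame gap; the slide does not spell out these framing details, and the function names are hypothetical.

```python
# Assumed framing overhead per packet: preamble/SFD (8 B) + inter-frame gap (12 B).
OVERHEAD_BYTES = 8 + 12

def packet_time_ns(frame_bytes, gbps):
    """Time one frame occupies the wire; bits / (Gbit/s) gives ns."""
    bits = (frame_bytes + OVERHEAD_BYTES) * 8
    return bits / gbps

def packets_per_sec(frame_bytes, gbps):
    return 1e9 / packet_time_ns(frame_bytes, gbps)

t_1g = packet_time_ns(64, 1)     # 1GbE, minimum-size frame
t_10g = packet_time_ns(64, 10)   # 10GbE, minimum-size frame
print(f"1GbE:  {t_1g:.1f} ns, {packets_per_sec(64, 1) / 1e6:.2f}M pkts/sec")
print(f"10GbE: {t_10g:.1f} ns, {packets_per_sec(64, 10) / 1e6:.2f}M pkts/sec")
```

At 10GbE the processing budget per minimum-size packet is only tens of nanoseconds, which is the motivation for a dedicated engine.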
240
Compute power required for TCP/IP
TCP/IP Engine will provide required MIPS
241
A sample approach
  • A programmable hardware engine for offloading TCP
    processing
  • Focus on
  • Most complex part: TCP inbound processing
  • Handle 10Gbps Ethernet traffic with sufficient
    headroom for outbound processing
  • Aggressive wire speed goal - minimum packet size
    on saturated wire
  • Simple, scalable, flexible design enabling fast
    time to market

242
Key features
  • Special purpose processor
  • Dual frequency, low latency, buffer-free design
  • High frequency execution core
  • Accelerated context lookup and loading
  • Programmability for ever-changing protocols
  • Programmable design with special instructions
  • Rapid validation and debug
  • Scalable solution
  • Across bandwidth and packet sizes
  • Extendable to multi-core solution

243
Packet size vs. core frequency
[Chart: core frequency vs. packet size; labeled points at packet size 64: 1Gbps at 1GHz, 10Gbps at 10GHz]
Increase packet size → reduce frequency
244
Chip characteristics
Chip Area       2.23 x 3.54 mm²
Process         90nm dual-VT CMOS
Interconnect    1 poly, 7 metal
Transistors     460K
Pad count       306
245
Standard FP MAC
FB
MA
Critical path: 26 logic stages @ 30ps per stage,
Fmax 1.2GHz (P860, 1.1V)
246
Prototype FP MAC
[Block diagram labels: M(CS), FB(CS), MP(CS), ZD, 4:2 compressor, M > F, Shift by 32, Overflow detector, ME, FBE]
Critical path: 12 logic stages @ 30ps per stage,
Fmax 3GHz (P860, 1.1V)
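As a rough cross-check of the two Fmax figures, the sketch below estimates frequency as the reciprocal of (logic stages × delay per stage). This ignores latch, clock skew, and other overheads, so it is only a first-order assumption; the helper name `fmax_ghz` is hypothetical.

```python
def fmax_ghz(stages, ps_per_stage=30.0):
    """First-order estimate: cycle time = stages * stage delay (in ps)."""
    return 1000.0 / (stages * ps_per_stage)

standard = fmax_ghz(26)    # standard FP MAC: 26 stages on the critical path
prototype = fmax_ghz(12)   # prototype FP MAC: 12 stages on the critical path
print(f"standard:  {standard:.2f} GHz")
print(f"prototype: {prototype:.2f} GHz")
print(f"speedup:   {prototype / standard:.2f}X")
```

The estimates land near the quoted 1.2GHz and 3GHz, and the ratio of stage counts (26/12) accounts for most of the speedup.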
247
Accumulator Algorithm
  • Key: Minimize interaction between incoming
    operand and accumulator result
  • Floating point number converted to base 32
  • Exponent subtraction no longer necessary
  • Exponent comparison reduced from 8 to 3 bits

248
Die photograph and characteristics
MULTIPLIER
FIFOs SCAN
ALIGNER
CLK
ACCUMULATE
NORMALIZE

Clock Grid Buffers
249
Design methodologies
250
Motivation
  • Parameter variations will become worse with
    technology scaling
  • Robust variation tolerant circuits and
    microarchitectures needed
  • Multi-variable design optimizations considering
    parameter variations
  • Major shift from deterministic to probabilistic
    design

251
Impact on Design Methodology
Due to variations in Vdd, Vt, and Temp
[Figure: deterministic vs. probabilistic design; axes and labels: Probability, Delay, Frequency, # of Paths, Delay Target, Leakage Power]
Probabilistic: 10X variation, 50% total power
252
Tool Complexity
  • Problems
  • Far too many tools and tool interfaces
  • Data is not easily extractable
  • Circuit reuse is minimal
  • Solutions
  • Common tool interfaces
  • Standard databases
  • Parameterized design

253
Designer Cockpit
  • Everything on the menu bar

254
Designer Cockpit
  • Selection in either view

255
Designer Cockpit
Menu bar: File, Edit, View, Select, Synthesize, Parasitics, Sizing, Analyze, Experiment, Checks, Options
Menu entries: AMPS; Tune to target; sense amp; memory cell; Set restrictions → Autosize; Speed power curve; Optimize with sensitivity; Optimize metal line; Pick VT; Delay vs size; Cell characterize; Sense amp characterize; Memory cell stability; Setup hold characterize; User specified → New Script →
  • Tools work with partial or full selection
  • Designer intervention allowed anywhere
  • Layout planner provides wiring parasitics
  • Not the route of the week
  • All tools callable from user programs
  • Experiment organizer
  • Optimization and experiments built in

256
Optimization Example
  • Imagine
  • Select gates from schematic editor or layout
    planner to optimize
  • Select optimization for PD3
  • Include a metal width and space
  • Include VT range optimization
  • Force a metal line length as a function of
    transistor sizes in a cell
  • Select Pathmill analysis
  • Run with sensitivity turned on

257
Optimization Example
258
Evolve a Macro Library
Feasibility studies & estimation → RTL → Circuit design → Layout → Tapeout
  • Executable on-line documentation
  • Designs must be easily absorbed into the library

259
Tools and productivity
  • Functional uarch modules
  • Investigation tools and libraries
  • Cross discipline optimization Monte Carlo
  • Easy database access
  • Designer has same access as developer
  • Full chip path extraction and visualization
  • Productive design requires
  • Innovation to be early
  • Early innovation enabled by
  • Flexible and open tools

260
Development: CAD and DA
Research: Core technologies
Tool Vendors
CAD Development: Productize modules and sample
flow
Design DA Groups: Interface, flows and
adaptations
Designers: Special features
261
Examples
262
Chip with bias generator (BG)
150nm Communications router (ISSCC 01)
Digital core with on-chip PMOS body bias
generator (BG). 1.5 million PMOS devices
263
Distributed biasing scheme
Central Bias Generator (CBG) and Local Bias
Generator (LBG)
264
Bias generation & distribution
265
Routing details
Global routing
Vcca
Vcca − 450mV
Vcca
To LBGs
FBB / ZBB control bit
Local routing
Vcc
From LBGs
Vcc − 450mV
266
Router chip summary
267
Dual-VT Motivation
  • Low-VT used in critical paths
  • Achieve same frequency as all low-VT design
  • Leakage power much smaller than all low-VT design

268
Dual-VT Options
  1. DVT
  2. H-SDVT
  3. L-SDVT
  4. DVTS

269
Dual-VT Allocation Only (DVT)
  • Transistors sized for original target
  • Insert low-VT to meet new target frequency

270
Selective LVT Insertion (H-SDVT)
  • Size at target frequency
  • Insert low-VT to fix critical paths
  • Size to optimize slack (down-size)

271
Selective HVT Insertion (L-SDVT)
  • Convert netlist to all low-VT
  • Size at target frequency
  • Insert high-VT on non-critical paths
  • Size to optimize slack
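A toy sketch of the high-VT insertion step in this flow is shown below. It starts from an all-low-VT netlist that already meets the target, then greedily switches a gate to high-VT whenever every path through it still meets timing. The delay and leakage numbers, the netlist, and all names are hypothetical, and the sizing steps of the flow are not modeled.

```python
# Hypothetical per-gate numbers: high-VT is slower but leaks less.
LOW = {"delay": 1.0, "leak": 10.0}
HIGH = {"delay": 1.3, "leak": 1.0}

def path_delay(path, vt):
    return sum(vt[g]["delay"] for g in path)

def insert_high_vt(gates, paths, target):
    vt = {g: LOW for g in gates}  # all-low-VT starting point
    # Sizing to the target frequency is assumed to be done already.
    assert all(path_delay(p, vt) <= target for p in paths)
    for g in gates:
        vt[g] = HIGH  # tentatively insert high-VT
        if any(path_delay(p, vt) > target for p in paths if g in p):
            vt[g] = LOW  # would break timing on some path, so revert
    return vt

gates = ["a", "b", "c", "d", "e"]
paths = [["a", "b", "c"], ["d", "e"]]  # first path is critical
vt = insert_high_vt(gates, paths, target=3.0)
high_vt_gates = [g for g in gates if vt[g] is HIGH]
leak = sum(v["leak"] for v in vt.values())
print(high_vt_gates, leak)
```

In this example only the gates on the non-critical path become high-VT, and total leakage falls from 50 to 32 in the toy units while the critical path keeps its all-low-VT delay.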

272
Dual-VT and Sizing (DVTS)
  • Iterative DVT flow
  • Use different amounts of sizing, low-VT to reach
    target
  • Pick best iteration

273
Tutorial summary
  • Challenges for low power and high performance
  • Historical device and system scaling trends
  • Sub-100nm device scaling challenges
  • Power delivery and dissipation challenges
  • Power efficient design choices
  • Circuit techniques for variation tolerance
  • Short channel effects
  • Adaptive circuit techniques for variation
    tolerance

274
Tutorial summary (contd.)
  • Circuit techniques for leakage control
  • Leakage power components
  • Leakage power prediction and control techniques
  • Full-chip power reduction techniques
  • Micro-architecture innovations
  • Coding techniques for interconnect power
    reduction
  • CMOS compatible dense memory design
  • Special purpose hardware
  • Design methodologies challenges for CAD

275
Power limited microprocessor integration choices
Special purpose processing DSP Network processing
(wired/wireless)
Adapt to Process
Present
Next decade
Adaptive general purpose units
Special purpose units
General purpose units
Dense Memory
Memory
Power (active and standby) management
276
Acknowledgements
  • The presenters would like to thank all the CRL
    team members and Intel design and manufacturing
    teams for their contribution towards the contents
    of this tutorial.
