Title: Power/Performance/Cost Efficiency of Adiabatic Circuits, as a function of Device On/Off Power Ratios
1Power/Performance/Cost Efficiency of Adiabatic Circuits, as a Function of Device On/Off Power Ratios
- Michael P. Frank, CISE Department / ECE Dept. Brown Bag Seminar - Tue., Mar. 26
3Source: ITRS '99
4Across Multiple Technologies
[Figure: data series labeled Mechanical, Electromechanical Relays, Vacuum Tubes, Discrete Transistors, and Integrated Circuits.]
Source: Kurzweil, The Age of Spiritual Machines, pp. 22-25
5$\tfrac{1}{2}CV^2$ based on ITRS '99 figures for $V_{dd}$ and minimum transistor gate capacitance. $T = 300$ K
6Information Entropy
Example: System with 3 two-state subsystems, such as quantum spins: $2^3 = 8$ states, some ruled out by some knowledge.
Informational status by spin label:
- Spin 1: Entropy
- Spin 2: Known information
- Spin 3: Entropy
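7Illustrating Landauer's principle
[Figure: Before bit erasure, the bit to be erased is 0 with the rest of the system (thermal modes, etc.) in one of $N$ states $s_0, \dots, s_{N-1}$, or 1 with the rest of the system in one of $N$ states $s'_0, \dots, s'_{N-1}$: $2N$ states in all. Unitary (1-1) evolution maps these to $2N$ distinct states $s''_0, \dots, s''_{2N-1}$ of the rest of the system after bit erasure, with the bit equal to 0 in every case.]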
8Conventional Gates are Irreversible
- Logic gate behavior (on receiving new input):
- Many-to-one transformation of local state!
- Required to dissipate at least $bT$ (one bit $b$ of entropy at temperature $T$) by Landauer's principle
- Incurs $\tfrac{1}{2}CV^2$ dissipation in 2 out of 4 cases.
Example: Static CMOS Inverter
[Figure: transformation of local state for an inverter with terminals in and out.]
10Adiabatic Charging in CMOS
Exact formula (if $R$ const.) for frequency factor $f = RC/t$
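The slide's own equation was not captured in this transcript. As a hedged sketch, the code below uses the standard closed-form result for an ideal linear voltage ramp of duration $t$ charging a capacitance $C$ through a constant resistance $R$, written in terms of $f = RC/t$. The formula and the name `ramp_dissipation` are assumptions rather than a transcription of the slide.

```python
import math

def ramp_dissipation(C, V, R, t):
    """Energy dissipated charging C to V through constant R with a linear ramp.

    Assumed model (not taken from the slide): an ideal source ramps from 0 to V
    over time t, then holds at V. With f = RC/t the total loss is
        E = C * V^2 * f * (1 - f + f * exp(-1/f)).
    """
    f = R * C / t
    # 1 - f + f*exp(-1/f) is rewritten with expm1 for better accuracy at large f.
    return C * V * V * f * (1.0 + f * math.expm1(-1.0 / f))

# Illustrative values only: 1 fF load, 1 V swing, 10 kOhm effective resistance.
C, V, R = 1e-15, 1.0, 1e4
tau = R * C
slow = ramp_dissipation(C, V, R, 1000 * tau)  # t >> RC: close to C*V^2*(RC/t)
fast = ramp_dissipation(C, V, R, tau / 1000)  # t << RC: close to (1/2)*C*V^2
print(slow / (C * V * V), fast / (C * V * V))
```

In the slow limit the loss falls in proportion to $RC/t$, and in the fast limit it returns to the conventional $\tfrac{1}{2}CV^2$.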
11Adiabaticity is Fundamental
- Adiabatic (dissipation $\propto$ quickness) processes can occur in any type of system.
- Cf. Adiabatic theorem of quantum mechanics.
- Specific adiabatic logics have been described for many proposed future device technologies:
- Superconducting (Likharev '82, Averin et al. '01)
- Nanomechanical (Drexler '92, Merkle mid-'90s)
- Quantum-dot (Lent & Tougaw, mid-'90s-present)
- Quantum computing implementations (inherently)
- Claim: Work on architectures & analysis for adiabatic CMOS will still apply post-CMOS!
12Adiabatic Rules for Transistors
- Rule 1: Never turn on a transistor if it has a nonzero voltage across it!
- I.e., between its source & drain terminals.
- Why: This erases info. & causes $\tfrac{1}{2}CV^2$ dissipation.
- Rule 2: Never apply a nonzero voltage across a transistor even during any on$\leftrightarrow$off transition!
- Why: When partially turned on, the transistor has relatively low $R$ and gets relatively high $P = V^2/R$ dissipation.
- Corollary: Never turn off a transistor when it has a nonzero current going through it!
- Why: As $R$ gradually increases, the $V = IR$ voltage drop will build, and then rule 2 will be violated.
13Adiabatic Rules continued
- Transistor Rule 3: Never apply a large voltage across any on transistor.
- Why: So the transition will be more reversible; dissipation will approach $CV^2(RC/t)$, not $\tfrac{1}{2}CV^2$.
- Adiabatic rules for other components:
- Diodes: Don't use them at all!
- There is always a built-in voltage drop across them!
- Resistors: Avoid moderate network resistances.
- E.g., stay away from the range >10 kΩ and <1 MΩ.
- Capacitors: Minimize, reliability permitting.
- Note: Adiabatic dissipation scales with $C^2$!
14Transistor Rules Summarized
Legal transitions in green. (For n- or p-FETs.) Dissipative states and transitions in red.
[Figure: transition diagram over transistor states, each either off or on with its terminals high or low.]
15?
Transformation of local state
16Simple Reversible CMOS Latch
- Uses a standard CMOS transmission gate
- Sequence of operation:
- (1) input initially matches latch contents (output),
- (2) input changes $\rightarrow$ output changes,
- (3) latch closes,
- (4) input removed.
[Figure: transmission gate controlled by P between in and out, carrying values a and b.]
Table columns: Before input (in, out), Input arrived (in, out), Input removed (in, out). Cell values as extracted: a a a a a a b b a b
17Generic Frictional Coefficients
- Normal defs. of friction (coeff. of sliding friction, viscosity, etc.) may not apply to all processes.
- For a given mechanism executing a specified process (i.e., following a specified desired trajectory or trajectories) adiabatically over a time $t$:
- Energy coefficient $c_E = E_{lost}\,t = E_{lost}/q$
- Energy dissipated from traj. per unit of quickness
- Note: quickness $q = 1/t$ has units like Hz
- Entropy coefficient $c_S = S_{made}\,t = S_{made}/q$
- New entropy generated per unit of quickness
- Note that $c_E = c_S T$ at temperature $T$.
What matters!
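18Energy Coefficient in Electronics
- For charging capacitive load $C$ by voltage $V$ through effective resistance $R$: $c_E = E_{lost}\,t = (CV^2 \cdot RC/t)\,t = C^2V^2R$
- If the resistances are voltage-controlled switches with gain factor $k$ controlled by the same voltage $V$, then effective $R \approx 1/(kV)$, so $c_E = C^2V/k$
- In constant-field-scaled CMOS with linear scale $\ell$: $k \propto 1/h_{ox} \propto 1/\ell$, $C \propto \ell$, and $V \propto \ell$, so $c_E \propto \ell^3/(1/\ell) = \ell^4$; with $t \propto \ell$, $E_{lost} = c_E/t \propto \ell^4/\ell = \ell^3$ (like $CV^2$ energy)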
19Entropy coefficients of some reversible logic gate operations
- From Frank '98, "Ultimate theoretical models of nanocomputers" (Nanotechnology journal):
- SCRL, circa 1997: 1 b/Hz
- Optimistic reversible CMOS: 10 b/kHz
- Merkle's quantum FET: 1.2 b/GHz
- Nanomechanical rod logic: 0.07 b/GHz
- Superconducting PQ gate: 25 b/THz
- Helical logic: 0.01 b/THz
How low can you go? We don't really know!
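To make these coefficients concrete, the sketch below converts each listed value to bits per Hz and evaluates the entropy and energy generated per operation at a chosen quickness. It assumes that "b" denotes one bit of entropy, $k_B \ln 2$, and that $c_E = c_S T$ holds at $T = 300$ K. The function names are illustrative and not from the talk.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
BIT = K_B * math.log(2)   # assumption: 1 b of entropy equals k_B ln 2

# Entropy coefficients as listed on the slide, converted to bits per Hz.
ENTROPY_COEFF_B_PER_HZ = {
    "SCRL, circa 1997": 1.0,
    "Optimistic reversible CMOS": 10 / 1e3,
    "Merkle's quantum FET": 1.2 / 1e9,
    "Nanomechanical rod logic": 0.07 / 1e9,
    "Superconducting PQ gate": 25 / 1e12,
    "Helical logic": 0.01 / 1e12,
}

def entropy_per_op_bits(c_s, q_hz):
    """Entropy generated per operation: S_made = c_S * q."""
    return c_s * q_hz

def energy_per_op_joules(c_s, q_hz, temp_k=300.0):
    """Energy dissipated per operation: E_lost = c_S * q * T."""
    return entropy_per_op_bits(c_s, q_hz) * BIT * temp_k

for name, c_s in ENTROPY_COEFF_B_PER_HZ.items():
    bits = entropy_per_op_bits(c_s, 1e9)
    print(f"{name}: {bits:.3g} b per op at 1 GHz")
```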
20Quantifying Leakage
- For a given structured system:
- Leakage power $P_{leak} = dE_{leak}/dt$
- Spontaneous entropy generation rate $\dot{S}_{leak} = dS_{leak}/dt$
- Again, note $P_{leak} = \dot{S}_{leak}\,T$ at temperature $T$.
21Minimum Losses w. Leakage
$E_{tot} = E_{adia} + E_{leak}$
$E_{leak} = P_{leak}\,t_r$
$E_{adia} = c_E / t_r$
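22Min. energy & Roff/Ron ratio
- Note that $c_E = C^2V^2R_{on}$, and if dominant leakage is source/drain, $P_{leak} = V^2/R_{off}$
- So $c_E P_{leak} = C^2V^4/(R_{off}/R_{on})$, and $E_{min} = 2(c_E P_{leak})^{1/2} = 2CV^2(R_{off}/R_{on})^{-1/2}$
- So $Q_{max} = \tfrac{1}{2}CV^2 / \left(2CV^2(R_{off}/R_{on})^{-1/2}\right) = \tfrac{1}{4}(R_{off}/R_{on})^{1/2} = \tfrac{1}{4}(I_{on}/I_{off})^{1/2}$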
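The minimum follows from setting the derivative of $E_{tot}(t_r) = c_E/t_r + P_{leak}\,t_r$ to zero, which gives $t_r = (c_E/P_{leak})^{1/2}$. The sketch below checks this numerically and reproduces $Q_{max} = \tfrac{1}{4}(R_{off}/R_{on})^{1/2}$. The function names and device values are illustrative assumptions.

```python
import math

def total_energy(c_e, p_leak, t_r):
    """E_tot = E_adia + E_leak = c_E / t_r + P_leak * t_r."""
    return c_e / t_r + p_leak * t_r

def optimal_transition(c_e, p_leak):
    """Transition time that minimizes E_tot, and the minimum energy itself."""
    t_opt = math.sqrt(c_e / p_leak)
    e_min = 2.0 * math.sqrt(c_e * p_leak)
    return t_opt, e_min

def q_max(C, V, r_on, r_off):
    """Best ratio of (1/2) C V^2 to the minimum energy per transition."""
    c_e = C**2 * V**2 * r_on   # energy coefficient of the on-switch
    p_leak = V**2 / r_off      # source/drain leakage power
    _, e_min = optimal_transition(c_e, p_leak)
    return 0.5 * C * V**2 / e_min

# Hypothetical device: 1 fF, 1 V, 10 kOhm on, 1 TOhm off (Roff/Ron = 1e8).
C, V, r_on, r_off = 1e-15, 1.0, 1e4, 1e12
t_opt, e_min = optimal_transition(C**2 * V**2 * r_on, V**2 / r_off)
print(t_opt, e_min, q_max(C, V, r_on, r_off))
```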
23Clock/Power Supply Desiderata
- Requirements for an adiabatic timing signal / power supply:
- Generate trapezoidal waveform with very flat high/low regions
- Flatness limits $Q$ of logic.
- Waveform during transitions is ideally linear,
- But this does not affect maximum $Q$, only energy coefficient.
- Operate resonantly with logic, with high $Q$.
- Power supply $Q$ will limit overall system $Q$
- Reasonable cost, compared to logic it powers.
- If possible, scale $Q \propto t$ (cycle time)
- Required to be considered an adiabatic mechanism.
- May conflict w. inductor scaling laws!
- At the least, $Q$ should be high at leakage-limited speed
(Ideally, independent of $t$.)
24Supply concepts in my research
- Superpose several sinusoidal signals from phase-synchronized oscillators at harmonics of fundamental frequency
- Weight these frequency components as per Fourier transform of desired waveform
- Create relatively high-L integrated inductors via vertical, helical metal coils
- Only thin oxide layers between turns
- Use mechanically oscillating, capacitive MEMS structures in vacuo as high-Q (10k) oscillator
- Use geometry to get desired wave shape directly
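The first supply concept, superposing weighted harmonics, can be sketched numerically. The code below is an illustration only: it assumes a symmetric trapezoid with equal ramp and flat intervals, estimates the Fourier weights by sampling one period, and measures how closely a few harmonics reproduce the target. None of the names or waveform proportions come from the talk.

```python
import math

def trapezoid(phase):
    """Assumed target over one period: ramp up, flat high, ramp down, flat low."""
    p = phase % 1.0
    if p < 0.25:
        return 4.0 * p
    if p < 0.5:
        return 1.0
    if p < 0.75:
        return 1.0 - 4.0 * (p - 0.5)
    return 0.0

def fourier_weights(wave, n_harmonics, n_samples=1024):
    """DC term and (cosine, sine) weights of the first harmonics, by sampling."""
    xs = [i / n_samples for i in range(n_samples)]
    ys = [wave(x) for x in xs]
    dc = sum(ys) / n_samples
    weights = []
    for h in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * h
        a = 2.0 / n_samples * sum(y * math.cos(w * x) for x, y in zip(xs, ys))
        b = 2.0 / n_samples * sum(y * math.sin(w * x) for x, y in zip(xs, ys))
        weights.append((a, b))
    return dc, weights

def synthesize(dc, weights, phase):
    """Superpose the weighted harmonics at the given phase."""
    total = dc
    for h, (a, b) in enumerate(weights, start=1):
        w = 2.0 * math.pi * h * phase
        total += a * math.cos(w) + b * math.sin(w)
    return total

def max_error(n_harmonics, n_check=400):
    """Largest deviation of the synthesized wave from the target trapezoid."""
    dc, weights = fourier_weights(trapezoid, n_harmonics)
    return max(abs(synthesize(dc, weights, i / n_check) - trapezoid(i / n_check))
               for i in range(n_check))

print(max_error(1), max_error(9))
```

Because the trapezoid is continuous, its harmonic amplitudes fall off as the inverse square of the harmonic number, so a handful of oscillators already gives fairly flat high and low regions.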
25A MEMS Supply Concept
- Energy stored mechanically.
- Variable coupling strength -> custom wave shape.
- Can reduce losses through balancing, filtering.
- Issue: How to adjust frequency?
26Summary of Limiting Factors
- When considering adiabaticizing a system:
- What fraction of system power is in logic? $f_L$
- Vs. displays, transmitters, propulsion.
- What fraction of logic is done adiabatically? $f_a$
- Can be all, but w. cost-efficiency overheads.
- How large is the $I_{on}/I_{off}$ ratio of switches?
- Affects leakage & minimum adiabatic energy.
- What is the $Q_{sup}$ of the resonant power supply?
- What is the relative cost of power & logic? $r$ (power premium)
- E.g., decreasing power cost by a factor of $r$ by increasing HW cost by a factor $\ge r$ will not help.
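27Minimizing cost/performance
- $\$_P$ = Cost of power in original system
- $\$_H$ = Cost of logic HW in original system
- $\$_P = r\,\$_H$, so $\$_H = \$_P/r$
- For cost-efficiency inverse to energy savings:
- $\$_{tot,min} = \$_P\,r^{-1/2} + \$_H\,r^{1/2} = 2\,\$_P\,r^{-1/2}$
- $\$_{tot,orig} = \$_P + \$_H = (1+r)\,\$_H = ((1+r)/r)\,\$_P$
- $\$_{tot,orig}/\$_{tot,min} = \tfrac{1}{2}(1+r)\,r^{-1/2} \approx \tfrac{1}{2}r^{1/2}$ for large $r$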
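The minimum above comes from trading an energy savings factor $s$ against $s$ times more hardware, which is the slide's "cost-efficiency inverse to energy savings" assumption. A small sketch with hypothetical costs follows; the names `total_cost` and `best_savings` are illustrative.

```python
def total_cost(power_cost, hw_cost, s):
    """Cost when energy is cut by a factor s at the price of s times more hardware."""
    return power_cost / s + hw_cost * s

def best_savings(power_cost, hw_cost):
    """Savings factor minimizing total cost: s = sqrt(r), with r = power/hardware."""
    r = power_cost / hw_cost
    s_opt = r ** 0.5
    return s_opt, total_cost(power_cost, hw_cost, s_opt)

# Hypothetical costs: power premium r = 100.
P, H = 100.0, 1.0
s_opt, cost_min = best_savings(P, H)
advantage = (P + H) / cost_min   # equals (1/2) * (1 + r) * r**-0.5
print(s_opt, cost_min, advantage)
```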
28Summary of adiabatic limits
- Cost-effective adiabatic energy savings factor:
- $S_a = E_{conv}/E_{adia}$ in cost-effective adiabatic system
- Some rough upper bounds on $S_a$ (worse than these for non-ideal computations): $S_a \lesssim 1/(1-f_L)$, $S_a \lesssim 1/(1-f_a)$, $S_a \lesssim \tfrac{1}{4}(I_{on}/I_{off})^{1/2}$, $S_a \lesssim Q_{sup}$, $S_a \lesssim r^{1/2}$
- Discussion ignores benefits from adiabatics of denser packing & smaller communications delays in parallel algorithms.
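As a rough illustration of how these bounds combine, the sketch below evaluates each one for a hypothetical system and reports which is tightest. The parameter values and function names are assumptions chosen only to show the comparison.

```python
def savings_bounds(f_logic, f_adiabatic, on_off_ratio, q_supply, power_premium):
    """Rough upper bounds on the savings factor S_a, one per limiting factor."""
    inf = float("inf")
    return {
        "logic fraction f_L": inf if f_logic >= 1 else 1 / (1 - f_logic),
        "adiabatic fraction f_a": inf if f_adiabatic >= 1 else 1 / (1 - f_adiabatic),
        "on/off ratio": 0.25 * on_off_ratio ** 0.5,
        "supply Q": float(q_supply),
        "power premium r": power_premium ** 0.5,
    }

def limiting_factor(bounds):
    """Name and value of the tightest bound."""
    name = min(bounds, key=bounds.get)
    return name, bounds[name]

# Hypothetical system: 90% of power in logic, fully adiabatic logic,
# Ion/Ioff = 1e8, supply Q = 1000, power premium r = 1e4.
bounds = savings_bounds(0.9, 1.0, 1e8, 1000, 1e4)
name, value = limiting_factor(bounds)
print(name, value)
```

In this example the non-logic power (displays, transmitters, and so on) caps the achievable savings long before the device on/off ratio does.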
29Motivation for this study
- We want to know how to carry out any arbitrary computation in a way that is reversible to an arbitrarily high degree.
- Up to limits set by leakage, power supply, etc.
- We want to do this as efficiently as possible:
- Using as few device ticks as possible (spacetime)
- Minimizes HW cost, leakage losses
- Using as few adiabatic transitions as possible (ops)
- Minimizes frictional losses
- But, a desired computation may be originally spec'd in terms of irreversible primitives.
30General-Case vs. Special-Case
- We'd like to know two kinds of things:
- For arbitrary general-purpose computations,
- How to automatically emulate them in a fairly efficient reversible way,
- w/o needing new intelligent/creative design work in each case?
- For various specific computations of interest,
- What are the most efficient reversible algorithms?
- Or at least, the most efficient that we can find?
- Note: These may not look anything like the most efficient irreversible algorithms!
31The Landauer embedding
- The obvious embedding of irreversible ops into expanding reversible ones leads to a linear increase in space through time. (Landauer '61)
- Or, increase in width of an input-consuming circuit
[Figure: circuit of expanding operations (e.g., AND) taking the input to the desired output plus garbage bits; horizontal axis: circuit depth, or time.]
32Lecerf Reversal
- Lecerf ('63) was interested in the group-theory question of whether an iterated permutation of items would eventually return to the initial item.
- Proved undecidable by reducing Turing's halting problem to this question, w. a reversible TM.
- Reversible TM reverses direction instead of halting.
- Returns to initial state iff irreversible TM would halt.
- Only problem: No useful output data!
[Figure: blocks $f$ and $f^{-1}$ with labels Input, Desired output, Garbage, and Copy of Input.]
33The Bennett Trick
- Bennett ('73) pointed out that you could simply fan-out (reversibly copy) the desired output before reversing.
- Note: $O(T)$ storage is still temporarily needed!
[Figure: blocks $f$ and $f^{-1}$ with labels Input, Desired output, Garbage, and Copy of Input.]
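The compute, copy, uncompute pattern can be sketched in ordinary code. This is a software analogy rather than a circuit: the saved history stands in for the garbage bits, and the step function `square_mod` is a hypothetical many-to-one example that does not come from the talk.

```python
def bennett_embed(step, x0, n_steps):
    """Run an irreversible step function forward, copy the result, then undo.

    Returns (restored input, copy of output, peak number of saved states).
    """
    history = []                 # garbage: one saved predecessor per step
    state = x0
    peak = 0
    for _ in range(n_steps):     # forward phase
        history.append(state)
        state = step(state)
        peak = max(peak, len(history))
    output_copy = 0 ^ state      # reversible copy: XOR into a cleared register
    while history:               # reverse phase: uncompute the garbage
        prev = history.pop()
        assert step(prev) == state   # each saved state really is a predecessor
        state = prev
    return state, output_copy, peak

def square_mod(x, m=1009):
    """Hypothetical irreversible step: squaring modulo m is many-to-one."""
    return (x * x) % m

restored, result, peak = bennett_embed(square_mod, 5, 10)
print(restored, result, peak)
```

The peak history length equals the number of steps, which is the $O(T)$ temporary storage noted on the slide.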
34Triangle Representation
- Represents any use of Bennett '73 embedding
[Figure: triangle plotted against time in reversible system (horizontal) and time in irreversible system (vertical), with a forward phase and a reverse phase. An adiabatic process of duration $\Delta t_i$ takes the state of irrev. comp. at time $t_i$ to the state at time $t_i + \Delta t_i$. Mass on any vertical line = space usage at that time.]
35Improving Spacetime Efficiency
- Bennett '73 transforms a computation taking spacetime $ST$ to one taking $\Theta(ST^2)$ in the worst case.
- Can we do better?
- Bennett '89: Described a technique that takes spacetime
- Actually, can generalize slightly and arrange for exponent on $T$ to be $1+\epsilon$, where $\epsilon \to 0$ (very slowly)
- Lange, McKenzie, Tapp '97: Space $\Theta(S)$ is possible, if you use time $\Theta(\exp(\Theta(S)))$
- Not any more spacetime-efficient than Bennett.
36Pebble Game Representation
37Triangle representation
$k = 2$, $n = 3$
$k = 3$, $n = 2$
38Analysis of Bennett Algorithm
- $n$ = # of recursive levels of algorithm
- $k$ = # of lower-level iterations to go forward 1 higher-level step
- $T_r$ = # of reversible lowest-level steps executed = $c(2k-1)^n$ ($c$ a small constant, e.g. 2)
- $T_i$ = # of irreversible steps emulated = $k^n$
- So, $n = \log_k T_i$, and so $T_r = c(2k-1)^{\log T_i/\log k} = c\,e^{\log(2k-1)\log(T_i)/\log k} = c\,T_i^{\log(2k-1)/\log k}$
($n+1$ spikes)
E.g., $k = 2$: $T_r = 2\,T_i^{\log 3/\log 2}$
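The step count can be checked with a small pebble-game simulation of the recursion. The sketch below is an assumed rendering of the Bennett '89 schedule: each level goes forward over $k$ segments and then undoes the first $k-1$ checkpoints. It takes $c = 1$, so the step count should equal $(2k-1)^n$. All names are illustrative.

```python
def bennett_pebble(n, k):
    """Simulate the recursive schedule as a pebble game with c = 1.

    Position 0 holds the input. A pebble at position p may be toggled only
    when position p - 1 holds a pebble. Returns (steps, peak pebbles excluding
    the input, final pebble set).
    """
    pebbles = {0}
    stats = {"steps": 0, "peak": 0}

    def toggle(pos):
        assert pos - 1 in pebbles            # predecessor state must be stored
        pebbles.symmetric_difference_update({pos})
        stats["steps"] += 1
        stats["peak"] = max(stats["peak"], len(pebbles) - 1)

    def run(level, start, forward):
        if level == 0:
            # Forward must add a pebble; reverse must remove one.
            assert (start + 1 in pebbles) != forward
            toggle(start + 1)
            return
        seg = k ** (level - 1)
        if forward:
            for j in range(k):                   # advance over k segments
                run(level - 1, start + j * seg, True)
            for j in range(k - 2, -1, -1):       # clear k - 1 checkpoints
                run(level - 1, start + j * seg, False)
        else:
            for j in range(k - 1):               # rebuild k - 1 checkpoints
                run(level - 1, start + j * seg, True)
            for j in range(k - 1, -1, -1):       # undo all k segments
                run(level - 1, start + j * seg, False)

    run(n, 0, True)
    return stats["steps"], stats["peak"], pebbles

steps, peak, final = bennett_pebble(2, 2)
print(steps, peak, sorted(final))
```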
39Cost-Efficiency Analysis
- Total cost of doing a computation includes:
- Spacetime costs (storage used, integrated over time)
- Includes time-amortized manufacturing cost
- Includes cost of total energy leakage
- Leakage from any in-use storage element
- Irreversibility costs (energy loss from irrev. ops)
- Total number of irreversible bit-erasures, $CV^2 > kT$ each.
- Adiabatic costs (energy loss from reversible ops)
- Proportional to number $n_a$ of adiabatic ops performed, times $c_E$, divided by time $t_{op}$ of a single op.
40Bennett '89 alg. is not optimal
$k = 2$, $n = 3$
$k = 3$, $n = 2$
Just look at all the spacetime it wastes!!!
41Parallel Frank '02 algorithm
- We can simply squish the triangles closer together to eliminate the wasted spacetime!
- Resulting algorithm is linear time for all $n$ and $k$ and dominates Bennett '89 for time, spacetime, & energy!
[Figure: triangle diagrams for $k=3, n=2$; $k=2, n=3$; and $k=4, n=1$, plotted as emulated time vs. real time.]
42Setup for Analysis
- For energy-dominated limit,
- let cost $\$$ equal energy.
- $\$_c$ = energy coefficient, $\$_r = \$_{r(min)}$ = leakage power
- $\$_i$ = energy dissipation per irreversible state-change
- Let the on/off ratio $R_{on/off} = \$_{r(max)}/\$_{r(min)} = P_{max}/P_{min}$.
- Note that $\$_c \approx \$_i\,t_{min} = \$_i\,(\$_i/\$_{r(max)})$, so $\$_{r(max)} \approx \$_i^2/\$_c$
- So $R_{on/off} \approx \$_i^2/(\$_c\,\$_{r(min)}) = \$_i^2/(\$_c\,\$_r)$
43Time Taken
- There are $n$ levels of recursion.
- Each multiplies the width of the base of the triangle by $k$.
- Lowest-level triangles take time $c\,t_{op}$.
- Total time is thus $c\,t_{op}\,k^n$.
$k = 4$, $n = 1$: width 4 sub-units
44Number of Adiabatic Ops
- Each triangle contains $k + (k-1) = 2k-1$ immediate sub-triangles.
- There are $n$ levels of recursion.
- Thus number of adiabatic ops is $c(2k-1)^n$
$k = 3$, $n = 2$: $5^2 = 25$ little triangles (adiabatic operations)
45Spacetime Usage
- Each triangle includes the spacetime usage of all $2k-1$ of its subtriangles,
- Plus $(k-1)(k-2)/2$ additional spacetime units, each consisting of 1 storage unit (1 state of irrev. mach. being stored) for time $t_{op}\,k^{n-1}$
$k = 5$, $n = 1$: $1 + 2 + 3$ units
Resulting recurrence relation:
$ST(k,0) = 1$ (or $c$)
$ST(k,n) = (2k-1)\,ST(k,n-1) + (k^2-3k+2)\,k^{n-1}/2$
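The recurrence is easy to evaluate directly. The sketch below takes $ST(k,0) = 1$ and measures time in units of $t_{op}$; the function name `spacetime` is illustrative.

```python
def spacetime(k, n):
    """ST(k, n) from the recurrence, with ST(k, 0) = 1 and time in units of t_op."""
    st = 1
    for level in range(1, n + 1):
        # (k^2 - 3k + 2) = (k - 1)(k - 2) is always even, so // 2 is exact.
        extra = (k * k - 3 * k + 2) * k ** (level - 1) // 2
        st = (2 * k - 1) * st + extra
    return st

for k, n in [(2, 3), (3, 2), (5, 1)]:
    print(k, n, spacetime(k, n))
```

For $k = 5$, $n = 1$ this gives 9 sub-triangle units plus the $1 + 2 + 3 = 6$ additional storage units.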
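46Reversible Cost
- Adiabatic cost plus spacetime cost, writing $\$_{rev}$ for the total reversible cost and $\$_{st}$ for its spacetime part to keep them distinct from the leakage rate $\$_r$: $\$_{rev} = \$_a + \$_{st} = (2k-1)^n\,\$_c/t + ST(k,n)\,\$_r\,t$
- Minimizing over $t$ gives $\$_{rev} = 2\,[(2k-1)^n\,ST(k,n)\,\$_c\,\$_r]^{1/2}$
- But, in energy-dominated limit, $\$_c\,\$_r \approx \$_i^2/R_{on/off}$,
- So $\$_{rev} = 2\,\$_i\,[(2k-1)^n\,ST(k,n)/R_{on/off}]^{1/2}$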
47Tot. Cost, Orig. Cost, Advantage
- Total cost: $\$_i$ for irreversible operation performed at end of algorithm, plus reversible cost, gives $\$_{tot} = \$_i\,\{1 + 2\,[(2k-1)^n\,ST(k,n)/R_{on/off}]^{1/2}\}$
- Original irreversible machine performing $k^n$ ops would use cost $\$_{orig} = \$_i\,k^n$, so,
- Advantage ratio between reversible & irreversible cost: $R_{\$(i/r)} = \$_{orig}/\$_{tot} = k^n / \{1 + 2\,[(2k-1)^n\,ST(k,n)/R_{on/off}]^{1/2}\}$
48Optimization Algorithm
- For any given value of $R_{on/off}$,
- Scan the possible values of $n$ (up to some limit),
- For each of those, scan the possible values of $k$,
- Until the maximum $R_{\$(i/r)}$ for that $n$ is found
- (the function only has a single local maximum)
- And return the max $R_{\$(i/r)}$ over all $n$ tried.
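The scan can be sketched as follows. This is a hedged reconstruction, not the author's code: it assumes the advantage ratio is $k^n / \{1 + 2[(2k-1)^n\,ST(k,n)/R_{on/off}]^{1/2}\}$ with $ST(k,0) = 1$, and it relies on the slide's claim of a single local maximum in $k$ to stop each inner scan. The names and the default limit on $n$ are illustrative.

```python
def spacetime(k, n):
    """ST(k, n) from the recurrence, with ST(k, 0) = 1 (assumed normalization)."""
    st = 1
    for level in range(1, n + 1):
        st = (2 * k - 1) * st + (k * k - 3 * k + 2) * k ** (level - 1) // 2
    return st

def advantage(k, n, on_off):
    """Assumed advantage ratio: irreversible cost over total reversible cost."""
    ops = (2 * k - 1) ** n
    return k ** n / (1.0 + 2.0 * (ops * spacetime(k, n) / on_off) ** 0.5)

def best_advantage(on_off, n_limit=8):
    """Scan n up to n_limit; for each n, climb in k until the advantage stops rising."""
    best = (1.0, None, None)   # (advantage, n, k); 1.0 means no gain
    for n in range(1, n_limit + 1):
        k = 2
        current = advantage(k, n, on_off)
        # Relies on the single-local-maximum property stated on the slide.
        while advantage(k + 1, n, on_off) > current:
            k += 1
            current = advantage(k, n, on_off)
        if current > best[0]:
            best = (current, n, k)
    return best

print(best_advantage(1e9))
```

The printed triple is the best advantage found together with the $n$ and $k$ that achieve it under these assumptions.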
49[Figure: curves labeled Spacetime blowup, Energy saved, $k$, and $n$.]
50Asymptotic Scaling
- The potential energy savings factor scales as $R_{\$(i/r)} \sim R_{on/off}^{0.4}$,
- while the spacetime overhead goes only as $R_{\$(i/r)}^{0.45}$, or $R_{on/off}^{0.18}$.
- E.g., with an $R_{on/off}$ of $10^9$, you can do worst-case computation in an adiabatic circuit with:
- An energy savings of up to a factor of 1,200!
- But, this point is 700,000 times less hardware-efficient!
51Conclusions
- A new, more spacetime-efficient & energy-efficient algorithm for doing arbitrary computations adiabatically has been described.
- The energy savings in worst-case computations goes as the 0.4th power of device on/off ratio.
- Best-case computations: 0.5th power.
- However, the reduction in spacetime efficiency scales with energy savings to the 1.6th power.
- Still much faster than we would like!
- Adiabatics can be generally cost-effective, but still only for heavily energy-dominated apps.