Title: Low Power, Fixed Throughput 12-bit Multiplier Design with 0.18um Dual Threshold
1. Low Power, Fixed Throughput 12-bit Multiplier Design with 0.18um Dual Threshold
- Chen Chang, Changchun Shi, and Prof. Bora Nikolic
- Berkeley Wireless Research Center
2. Motivation
- Low-power design extends battery life in portable applications and reduces heat dissipation in high-performance applications.
- Virtually any DSP design requires a multiplier.
- As technology scales down, the tradeoff between performance and power becomes more pronounced.
3. Problem Statement
- How to identify each component of the power
- How to reduce each component of the power
- How to reduce power at different levels of the design
  - System level
  - Architectural level
  - Circuit style level
  - Device technology level
4. Possible Solutions
- Algorithm and system level
- Architecture level
  - Array, Wallace tree, split array, Booth encoded
  - Pipelined, parallel structure
- Circuit level
  - Static CMOS vs. CPL
- Device technology level
  - Dual threshold vs. single threshold
  - MTCMOS
  - Balancing critical path
  - Dual-threshold domino logic
5. Proposed Comparison
- With a carry-save array multiplier, compare power and performance using single- and dual-threshold devices at various Vdd.
- With a Wallace-tree multiplier, compare power and performance using single- and dual-threshold devices at various Vdd.
- Compare the results from the array and Wallace-tree multipliers.
- Apply a delay-balancing technique, adding delay components to minimize spurious transitions, and compare.
6. Conditions and Assumptions
- 100 MHz non-pipelined multiplier for wireless communication systems
- 0.18um dual-threshold static CMOS circuits
- Supply voltage choices of 1.0 V, 1.1 V, 1.2 V, 1.3 V, 1.4 V, and 1.5 V
- No layout, but long wires are modeled as capacitors
7. Wallace Tree Multiplier Algorithm using 4-to-2 Compressor
- Stick-dot view, similar to Dadda's notation
- Our Excel model, which works on actual numbers
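As a rough illustration of the dot view for the 12-bit case, the sketch below counts the dots in each column of the partial-product matrix and estimates how many compressor stages are needed. It assumes an unsigned 12 x 12 multiply and that each 4-to-2 stage roughly halves the number of rows; the function names are hypothetical and this is not the Excel model from the slides.

```python
import math

def column_heights(n_bits):
    """Dots per column in the partial-product matrix of an n x n unsigned multiply."""
    # Column k collects every product bit a[i] & b[j] with i + j == k.
    return [min(k, n_bits - 1) - max(0, k - n_bits + 1) + 1
            for k in range(2 * n_bits - 1)]

def compressor_stages(rows):
    """Stages needed, assuming each 4-to-2 stage roughly halves the row count."""
    stages = 0
    while rows > 2:
        rows = math.ceil(rows / 2)  # 4 rows -> 2 rows, 3 rows -> 2 rows
        stages += 1
    return stages

heights = column_heights(12)
print("column heights:", heights)
print("4-to-2 stages before the final adder:", compressor_stages(max(heights)))
```

Under these assumptions the tallest column has 12 dots and the rows reduce as 12, 6, 3, 2 before the final carry-propagate adder.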
8. Component Building Blocks
[Figure: a) AOI unit, b) XOR, c) HA (half adder), d) FA (full adder)]
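For reference, the logic functions of the half adder and full adder can be written as a small behavioral sketch. This models only the Boolean behavior, not the transistor-level circuits in the figure; the function names are hypothetical.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Return (sum, carry) for three input bits."""
    p = a ^ b                      # propagate term, shared by sum and carry
    return p ^ cin, (a & b) | (p & cin)

# Exhaustive check: the outputs encode the arithmetic sum of the inputs.
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        assert a + b == s + 2 * c
        for cin in (0, 1):
            s, c = full_adder(a, b, cin)
            assert a + b + cin == s + 2 * c
```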
9. 4-to-2 Compressor
- The critical path is about 3 XORs in both cases.
- Above is the 4-to-2 compressor with 4 inputs; the critical path is In-to-Sum.
- Below is the 4-to-2 compressor with only 3 inputs; the critical path is Cin-to-Sum, also about 3 XORs (including Cout generation from the previous bit).
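A functional sketch of a 4-to-2 compressor follows. It assumes the textbook construction from two cascaded full adders, which gives the same arithmetic behavior as the slide's circuit but not its gate-level structure or its 3-XOR critical path. The names are hypothetical.

```python
def full_adder(a, b, cin):
    """Return (sum, carry) for three input bits."""
    p = a ^ b
    return p ^ cin, (a & b) | (p & cin)

def compressor_4to2(x1, x2, x3, x4, cin):
    """Return (sum, carry, cout); sum has weight 1, carry and cout weight 2."""
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

# Exhaustive check of the defining identity over all 32 input patterns.
for n in range(32):
    x1, x2, x3, x4, cin = [(n >> k) & 1 for k in range(5)]
    s, carry, cout = compressor_4to2(x1, x2, x3, x4, cin)
    assert x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)
```

Because cout is independent of cin, the horizontal carry does not ripple along the row. In this model, the 3-input variant corresponds to tying x4 to 0.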
10. Array Multiplier with Carry-Save Adders
- Three versions of the array adders were built:
  - All low leakage (LL)
  - All high speed (HS)
  - Mixed, with the devices in the red box high speed
- The red box indicates devices that are HS.
- The yellow line indicates the critical path.
11. Wallace Tree Adder
- Estimated size is 0.13 mm × 0.13 mm. This implies that about 2 inverter gate capacitances are needed on some of the long wires.
- Again, three versions were made; in the mixed version, the middle-range devices and the final adder are HS.
12. Test Vectors
- 10 carefully chosen input vector transitions are used (about 2 hours of simulation time for each run; 10 is the maximum).
- Among the 10, three transitions are shown below:
  - a) triggers the critical-path delay for the array multiplier
  - b) triggers the critical-path delay for the Wallace-tree multiplier
  - c) has a large number of transitions
[Figure: input vector transitions a), b), and c)]
13. Delay of the Multipliers with Various Vdd
- Wallace-tree results with and without the wire-capacitance model show a performance difference of about 15%.
- Delay versus Vdd follows the analytic model $t_d \propto V_{dd} / (V_{dd} - V_t - V_{dsat}/2)$.
- The Wallace tree is about 20% faster than the array.
- Keeping a 10 ns delay (100 MHz) as the margin is reasonable:
  - The lowest Vdd for the mixed tree multiplier is 1.1 V.
  - The lowest Vdd for the mixed array multiplier is 1.3 V.
- Mixed and HS have the same delay!
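The shape of the analytic delay model can be explored with a short sketch over the supply voltages used in this study. The threshold and saturation voltages below are placeholder assumptions, not values from the slides, so only the trend is meaningful.

```python
def relative_delay(vdd, vt, vdsat):
    """Delay up to a constant factor: Vdd / (Vdd - Vt - Vdsat / 2)."""
    overdrive = vdd - vt - vdsat / 2
    if overdrive <= 0:
        raise ValueError("Vdd too low for this model")
    return vdd / overdrive

# ASSUMED placeholder device parameters (not taken from the slides).
VT = 0.4      # threshold voltage in volts
VDSAT = 0.3   # saturation voltage in volts

supplies = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
reference = relative_delay(1.5, VT, VDSAT)
for vdd in supplies:
    ratio = relative_delay(vdd, VT, VDSAT) / reference
    print(f"Vdd = {vdd:.1f} V: delay = {ratio:.2f}x the 1.5 V delay")
```

With any positive Vt and Vdsat the model predicts delay rising steadily as Vdd drops, which is why each architecture has a lowest Vdd that still meets the 10 ns target.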
14. Power Analysis: Leakage
- Leakage power is a factor of $10^{-3}$ smaller than active power, so only when at ...
  - Concern: leakage power in active mode
- Leakage power differs by about a factor of 10 between HS and LL, which agrees with a $\Delta V_{th}$ of 0.1 V.
- Leakage power scales as $V_{dd}^2$.
- The mixed structure can usually save about 50% of the leakage power versus the HS structure with no performance penalty; the actual saving depends on the percentage of HS devices.
- Array vs. tree: the tree gives about 20% less leakage than the array at the same performance.
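The factor-of-10 gap and the dependence on the share of HS devices can be sketched with a simple subthreshold model. Two assumptions are made here and are not from the slides: a subthreshold swing of 100 mV/decade, and equal leakage weight per device, so that total leakage is a plain weighted average. The function names are hypothetical.

```python
def leakage_ratio(delta_vth, swing=0.1):
    """HS-to-LL leakage ratio for a threshold gap delta_vth (volts).

    swing is the ASSUMED subthreshold swing in volts per decade.
    """
    return 10 ** (delta_vth / swing)

def mixed_leakage(hs_fraction, ll_over_hs=0.1):
    """Leakage of a mixed design relative to an all-HS design.

    ASSUMES every device contributes equally, so the result is a weighted average.
    """
    return hs_fraction + (1 - hs_fraction) * ll_over_hs

print("HS/LL leakage ratio for 0.1 V:", leakage_ratio(0.1))
for fraction in (0.2, 0.4, 0.6, 0.8):
    saving = 1 - mixed_leakage(fraction)
    print(f"{fraction:.0%} HS devices: about {saving:.0%} leakage saved vs. all-HS")
```

In this simplified model a saving near 50% corresponds to somewhat under half of the devices being HS.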
15. Power Analysis: Active
- While we expected the mixed structure to offer an active power reduction by balancing the paths better, our two structures are already quite parallel, so there is only a 2% power reduction from HS to mixed mode, due to fewer spurious transitions with dual threshold.
- The Wallace tree has fewer transitions than the array at the same Vdd, because it has fewer spurious transitions.
- The Wallace tree can further reduce Vdd to cut active (useful transition) power, which scales as $V_{dd}^2$.
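To put a number on the Vdd advantage, the sketch below applies the standard switching-power relation P = alpha * C * Vdd^2 * f to the two lowest supply voltages reported for the mixed designs (1.1 V for the tree, 1.3 V for the array). It assumes equal activity factor and switched capacitance for both, which the slides do not claim, so it isolates the voltage effect only.

```python
def dynamic_power(alpha, c_switched, vdd, freq):
    """Switching power: activity factor * capacitance * Vdd^2 * frequency."""
    return alpha * c_switched * vdd ** 2 * freq

FREQ = 100e6  # 100 MHz throughput target from the slides

# ASSUMED: identical activity factor and switched capacitance (normalized to 1).
p_tree = dynamic_power(1.0, 1.0, 1.1, FREQ)
p_array = dynamic_power(1.0, 1.0, 1.3, FREQ)
ratio = p_tree / p_array
print(f"Switching power at 1.1 V relative to 1.3 V: {ratio:.2f}")
```

The ratio is (1.1/1.3)^2, a reduction of roughly 28% from voltage scaling alone.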
16. Power Analysis: Spurious Transitions
About half of the transitions are spurious!
- An array structure is fully studied in Excel, with delay modeled.
- Many spurious transitions come from the early availability of partial products; dual Vth cannot help much.
- Adding a delay component to the partial-product generator output should help!
- The Wallace tree shows a much less pronounced improvement, because only a few bits are not processed in parallel.
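The effect of early inputs can be seen in a toy unit-delay model. This is an illustrative sketch, not the authors' Excel model: it assumes a chain of XOR gates y[i] = y[i-1] ^ x[i], each with one unit of delay, and counts every output toggle after the inputs change. The function name and parameters are hypothetical.

```python
def simulate_xor_chain(old_bits, new_bits, input_delays, steps=None):
    """Count output toggles of a unit-delay XOR chain y[i] = y[i-1] ^ x[i]."""
    n = len(old_bits)
    steps = steps or 2 * n + 2          # long enough for the chain to settle
    # Start from the settled outputs for the old input vector.
    y, prev = [], 0
    for bit in old_bits:
        prev ^= bit
        y.append(prev)
    toggles = 0
    for t in range(steps):
        # Input i switches to its new value once t reaches its added delay.
        x = [new_bits[i] if t >= input_delays[i] else old_bits[i]
             for i in range(n)]
        # Every gate evaluates on the values visible one time unit earlier.
        nxt = [(y[i - 1] if i else 0) ^ x[i] for i in range(n)]
        toggles += sum(a != b for a, b in zip(y, nxt))
        y = nxt
    return toggles

old, new = [0, 0, 0, 0], [1, 1, 1, 1]
unbalanced = simulate_xor_chain(old, new, [0, 0, 0, 0])
balanced = simulate_xor_chain(old, new, [0, 1, 2, 3])
print("toggles without added delay:", unbalanced)
print("toggles with inputs delayed to match the chain:", balanced)
```

In this toy case only two output toggles are actually needed; delaying each input so it arrives together with the signal from the previous gate removes the extra ones, which is the idea behind adding delay components.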
17. Power: Adding Delay Components
- The Excel simulation predicts a 45% active power reduction.
- Actual simulation gives only an 11% reduction for the array and 1% for the tree.
- The discrepancy comes from oversimplification in the Excel model.
18. Conclusion
- Using dual threshold can save power with minimal impact on performance (leakage reduced by 50% in both structures).
- The amount of power saved improves with fewer critical paths and a more balanced architecture.
- Dual threshold saves only 2% on spurious transitions in both cases.
- Delay components help the array save another 11% on spurious transitions, but only 1.4% for the tree.
- The architectural advantage of the Wallace tree is clear, as it can also reduce Vdd to save active power.
- We would like to fully extend the Simulink model if time permits.