Low Power, Fix Throughput 12bit Multiplier Design with 0.18um Dual Threshold

Provided by: ChenC5

1
Low Power, Fix Throughput 12-bit Multiplier
Design with 0.18um Dual Threshold
  • Chen Chang, Changchun Shi
  • and Prof. Bora Nikolic
  • Berkeley Wireless Research Center

2
Motivation
  • Low power design promotes longer battery life in
    portable applications and reduces heat
    dissipation in high performance applications.
  • Virtually any DSP design requires a multiplier
  • With technology miniaturization, the tradeoff between
    performance and power becomes more pronounced

3
Problem Statement
  • How to identify each component of the power
  • How to reduce each component of the power
  • How to reduce power at different levels of the
    design:
      • System level
      • Architectural level
      • Circuit style level
      • Device technology level

4
Possible Solutions
  • Algorithm and system level
  • Architecture level
      • Array, Wallace tree, split array, Booth-encoded
      • Pipelined, parallel structure
  • Circuit level
      • Static CMOS vs. CPL
  • Device technology level
      • Dual threshold vs. single threshold
      • MTCMOS
      • Balancing the critical path
      • Dual-threshold domino logic

5
Proposed Comparison
  • With carry save array multiplier, compare power
    and performance using single and dual threshold
    devices, with various Vdd
  • With Wallace-tree multiplier, compare power and
    performance using single and dual threshold
    devices, with various Vdd
  • Compare between the results from array and
    Wallace-tree multiplier
  • Apply a delay-balancing technique, adding
    delay components to minimize spurious
    transitions, and compare the results

6
Conditions and Assumptions
  • 100MHz non-pipelined multiplier for wireless
    communication systems
  • 0.18um dual threshold Static CMOS circuits
  • Supply voltage choices of 1.0 V, 1.1 V, 1.2 V, 1.3 V,
    1.4 V, 1.5 V
  • No layout; long wires are modeled as capacitors

7
Wallace Tree Multiplier Algorithm using 4-to-2
Compressor
  • Stick-dot view, similar to Dadda's notation
  • Our Excel model works on actual numbers
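
The reduction idea on this slide can be tried on actual numbers with a short Python sketch. This is not the authors' Excel model: it works on whole words rather than individual dots, builds each 4-to-2 step from two cascaded 3-to-2 carry-save steps, and the function names (`partial_products`, `csa`, `compress_4_2`, `reduce_tree`, `multiply`) are hypothetical. Only the 12-bit width comes from the slides.

```python
WIDTH = 12  # operand width taken from the slides


def partial_products(a, b, width=WIDTH):
    """One shifted copy of `a` for every set bit of `b` (the AND-gate array)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def csa(x, y, z):
    """3-to-2 carry-save step on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def compress_4_2(w, x, y, z):
    """4-to-2 step as two cascaded 3-to-2 steps: w + x + y + z == s + c."""
    s1, c1 = csa(w, x, y)
    return csa(s1, c1, z)


def reduce_tree(rows):
    """Compress rows level by level until two remain; returns (rows, levels)."""
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows), 4):
            group = rows[i:i + 4]
            if len(group) == 4:
                nxt.extend(compress_4_2(*group))
            elif len(group) == 3:
                nxt.extend(csa(*group))
            else:
                nxt.extend(group)  # one or two rows pass through unchanged
        rows = nxt
        levels += 1
    return rows, levels


def multiply(a, b, width=WIDTH):
    """Reduce the partial products, then do the final carry-propagate add."""
    rows, _ = reduce_tree(partial_products(a, b, width))
    return sum(rows)


if __name__ == "__main__":
    print(multiply(1234, 4095), 1234 * 4095)
```

With this grouping, the 12 partial-product rows shrink to 6, then 4, then 2, so three compressor levels precede the final adder.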

8
Component Building Blocks
a) AOI unit, b) XOR, c) half adder (HA), d) full adder (FA)

9
4 to 2 Compressor
  • The critical path is about 3 XORs in both cases
  • Above: 4-2 compressor with 4 inputs; the critical path is
    In-to-Sum
  • Below: 4-2 compressor with only 3 inputs; the critical path is
    Cin-to-Sum, also about 3 XORs (including Cout
    generation from the previous bit)
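
A 4-2 compressor can be checked functionally with a few lines of Python. The sketch below assumes the textbook construction from two cascaded full adders, which is not necessarily the gate-level structure on the slide (that one reaches a critical path of about 3 XORs); it only verifies the arithmetic identity the cell must satisfy. The function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """Single-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """4-2 compressor built from two full adders.

    cout depends only on x1..x3, never on cin, so carries do not ripple
    along a row of compressors.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def identity_holds():
    """Check x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout) for all inputs."""
    for bits in product((0, 1), repeat=5):
        total, carry, cout = compressor_4_2(*bits)
        if sum(bits) != total + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(identity_holds())
```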

10
Array Multiplier with Carry Save adders
  • Three versions of the array adders were built:
      • All low leakage (LL)
      • All high speed (HS)
      • Mixed, with the devices in the red box high speed
  • The red box indicates devices that are HS
  • The yellow line indicates the critical path

11
Wallace Tree Adder
  • Simulink model
  • Cadence schematic
  • Estimated size 0.13 mm × 0.13 mm. This implies about
    2 inverter gate caps are needed on some of the long
    wires
  • Again three versions were made; the middle-range
    devices and the final adder are HS in the mixed
    version

12
test vectors
  • 10 carefully chosen input vector transitions are
    used (about 2 hours of simulation time for each run,
    so 10 is the maximum)
  • Among the 10, three transitions are shown below:
  • a) triggers the critical-path delay for the array
    multiplier
  • b) triggers the critical-path delay for the Wallace
    tree multiplier
  • c) has a large number of transitions

13
Delay of the multipliers with various Vdd
  • Wallace tree results with and without the wire cap model:
    a performance difference of 15%
  • Delay versus Vdd follows the analytic model
    $t_d \propto V_{dd} / (V_{dd} - V_t - V_{dsat}/2)$
  • The Wallace tree is about 20% faster than the array
  • Keeping a 10 ns delay (100 MHz) as the margin is reasonable
  • The lowest Vdd for the mixed tree multiplier is 1.1 V
  • The lowest Vdd for the mixed array multiplier is 1.3 V
  • Mixed and HS have the same delay!
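
To get a feel for the analytic delay model on this slide, the sketch below evaluates Vdd / (Vdd - Vt - Vdsat/2) over the supply voltages listed under Conditions and Assumptions. The threshold and saturation voltages (`VT_ASSUMED`, `VDSAT_ASSUMED`) are hypothetical placeholders, not values from the presentation, so only the shape of the curve is meaningful.

```python
VT_ASSUMED = 0.4     # hypothetical threshold voltage in volts
VDSAT_ASSUMED = 0.6  # hypothetical saturation voltage in volts
SUPPLIES = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]  # supply choices from the slides


def relative_delay(vdd, vt=VT_ASSUMED, vdsat=VDSAT_ASSUMED):
    """Delay up to a constant factor: Vdd / (Vdd - Vt - Vdsat/2)."""
    return vdd / (vdd - vt - vdsat / 2.0)


def normalized_delays(supplies=SUPPLIES):
    """Delay at each supply, normalized to the delay at the highest supply."""
    ref = relative_delay(max(supplies))
    return [relative_delay(v) / ref for v in supplies]


if __name__ == "__main__":
    for vdd, d in zip(SUPPLIES, normalized_delays()):
        print(f"Vdd = {vdd:.1f} V  relative delay = {d:.2f}")
```

Under these assumed parameters the delay rises steeply as Vdd approaches Vt + Vdsat/2, which is why each structure has a lowest usable supply for a fixed 10 ns target.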

14
Power analysis--Leakage
  • Leakage power is about $10^{-3}$ times the active power,
    so only when at
  • Concern leakage power in active mode
  • Leakage power differs by a factor of about $10^{-1}$
    between HS and LL, in agreement with $\Delta V_{th} \approx 0.1$ V
  • Leakage power $\propto V_{dd}^2$
  • The mixed structure can usually save 50% of the leakage power
    versus the HS structure while suffering no
    performance penalty; the actual saving depends on the
    percentage of HS devices
  • Array vs. tree: the tree gives about 20% less leakage than
    the array at the same performance
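
The link between the threshold difference and the HS/LL leakage gap can be sketched with the usual subthreshold relation, in which leakage changes by one decade per subthreshold swing. The swing of 100 mV/decade used below is an assumed, illustrative value and is not stated in the slides; `leakage_ratio` is a hypothetical helper.

```python
SWING_ASSUMED = 0.1  # hypothetical subthreshold swing, volts per decade


def leakage_ratio(delta_vth, swing=SWING_ASSUMED):
    """Ratio of leakage currents for two devices whose Vth differ by delta_vth."""
    return 10.0 ** (delta_vth / swing)


if __name__ == "__main__":
    # A 0.1 V threshold difference, as quoted on the slide.
    print(leakage_ratio(0.1))
```

With that assumed swing, a 0.1 V threshold difference corresponds to about one decade of leakage, in line with the order-of-magnitude gap reported between HS and LL devices.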

15
Power analysis--active
  • While we expected the mixed structure to
    offer an active power reduction by
    balancing the paths better, our
    two structures are already quite
    parallel, so there is only a 2% power reduction
    from HS to mixed mode, due to fewer
    spurious transitions when using dual threshold
  • The Wallace tree has fewer transitions than the array at
    the same Vdd: fewer spurious transitions
  • The Wallace tree can further reduce Vdd to reduce
    active power, which scales as $V_{dd}^2$ (useful
    transition power consumption)
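
The benefit of the lower supply can be estimated with the standard dynamic-power relation P = a * C * Vdd^2 * f. The sketch below compares the lowest supplies quoted on the delay slide (1.1 V for the mixed tree, 1.3 V for the mixed array) at 100 MHz. The activity factor and switched capacitance are hypothetical and held equal for both structures, so this isolates the Vdd effect only and ignores the tree's lower transition count.

```python
FREQ_HZ = 100e6         # 100 MHz target from the slides
ALPHA_ASSUMED = 0.5     # hypothetical activity factor
C_LOAD_ASSUMED = 1e-12  # hypothetical switched capacitance in farads


def active_power(vdd, alpha=ALPHA_ASSUMED, c_load=C_LOAD_ASSUMED, freq=FREQ_HZ):
    """Dynamic switching power: alpha * C * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


def power_ratio(vdd_low, vdd_high):
    """Active power at vdd_low relative to vdd_high, all else equal."""
    return active_power(vdd_low) / active_power(vdd_high)


if __name__ == "__main__":
    print(f"{power_ratio(1.1, 1.3):.3f}")
```

Because the assumed factors cancel in the ratio, the result depends only on (1.1 / 1.3)^2, roughly 0.72, or about 28% less active power from the supply reduction alone.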

16
Power Analysis--Spurious Transitions
About half of the transitions are spurious!
  • An array structure is fully studied in Excel,
    with delay modeled
  • Many spurious transitions come from the early
    availability of partial products; dual Vth cannot help much
  • Adding delay components to the partial product
    generator outputs should help!
  • The Wallace tree shows a much less pronounced improvement
    because only a few bits are not processed in parallel
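
How unequal arrival times create spurious transitions can be illustrated with a tiny unit-delay simulation. This is not the authors' Excel model of the array multiplier: it uses a ripple chain of full adders as a stand-in, assumes every adder takes exactly one time unit, and all function names are hypothetical. It counts every node toggle while the circuit settles and compares that with the toggles that are actually needed.

```python
def full_adder(a, b, c):
    """Single-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def to_bits(x, n):
    """Least-significant-bit-first list of the low n bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def step(a_bits, b_bits, carries):
    """One unit delay: every adder re-evaluates from last step's carries."""
    sums, new_carries = [], []
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        cin = carries[i - 1] if i > 0 else 0
        s, c = full_adder(a, b, cin)
        sums.append(s)
        new_carries.append(c)
    return sums, new_carries


def settle(a_bits, b_bits, n):
    """Run n steps so that all sums and carries reach their final values."""
    sums, carries = [0] * n, [0] * n
    for _ in range(n):
        sums, carries = step(a_bits, b_bits, carries)
    return sums, carries


def count_transitions(old, new, n=4):
    """Return (total toggles, useful toggles, final value) for old -> new inputs."""
    sums, carries = settle(to_bits(old[0], n), to_bits(old[1], n), n)
    start = sums + carries
    prev, total = start, 0
    a_bits, b_bits = to_bits(new[0], n), to_bits(new[1], n)
    for _ in range(n):
        sums, carries = step(a_bits, b_bits, carries)
        cur = sums + carries
        total += sum(p != c for p, c in zip(prev, cur))
        prev = cur
    useful = sum(s != e for s, e in zip(start, prev))
    value = sum(bit << i for i, bit in enumerate(sums)) + (carries[-1] << n)
    return total, useful, value


if __name__ == "__main__":
    total, useful, value = count_transitions((0, 0), (0b1111, 0b0001))
    print(total, useful, total - useful, value)
```

For the transition in the usage example, the upper sum bits first rise on the early inputs and then fall again once the late carry arrives, so 6 of the 10 node toggles are spurious. Delaying the early inputs to match the carry arrival is the kind of balancing the slide proposes.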

17
Power--Adding Delay Components
  • We expected a 45% active power reduction from the Excel
    simulation
  • The actual simulation gives only an 11% reduction for
    the array and 1% for the tree
  • The discrepancy comes from oversimplification
    in the Excel model

18
Conclusion
  • Using dual threshold can save power with
    minimal impact on performance (leakage reduced by 50% in
    both structures)
  • The amount of power saved improves with fewer
    critical paths and a more balanced architecture
  • Dual threshold saves only 2% on spurious
    transitions in both cases
  • Delay components help the array save another 11% on
    spurious transitions, but only 1.4% for the tree
  • The architectural advantage of the Wallace tree is clear,
    as it can also reduce Vdd to save active power
  • We would like to fully extend the Simulink model if
    time permits