
1
Approaches to Low-Power Implementations of DSP
Systems
  • Class advisor: Dr. Fakhraie
  • Presenter: Nariman Moezi
  • DSP Design Implementation Course Seminar
  • Spring 2004

2
Outline
  • Reduced two's complement representation
  • Low-power scheduling techniques for embedded DSP
    software
  • Low-power multipliers
      - Mitchell-based logarithmic multiplier
      - Power-aware pipelined multiplier

3
Reduced two's complement representation
  • Two's complement representation is widely used in
    the implementation of arithmetic operations.
  • If X has a small magnitude and switches between
    positive and negative values, its sign extension
    toggles between strings of zeros and strings of ones,
    which causes high switching activity in the MSBs.
  • If X has magnitude less than $2^{m-1}$ ($m < N$), we can
    represent this number as the sum of an m-bit
    vector and a constant vector
    having a string of ones from bit N-1 to bit m-1
    at the MSB side.

(Zhan Yu et al., 2002)
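Here is a minimal Python sketch of this split. It assumes X arrives as a Python integer with $-2^{m-1} \le X < 2^{m-1}$. The helper names `to_reduced` and `toggles` are hypothetical and are not taken from the paper. The constant vector of ones from bit N-1 down to bit m-1 equals $-2^{m-1}$. The m-bit vector is therefore $X + 2^{m-1}$, which is the low m bits of X with bit m-1 inverted. The example counts bit toggles for a sign-alternating sequence, once in the full 16-bit form and once in the reduced 4-bit vector.

```python
def to_reduced(x: int, n: int, m: int) -> tuple[int, int]:
    """Split n-bit two's complement x into (m-bit vector, constant vector).

    Assumes 0 < m < n and -2**(m-1) <= x < 2**(m-1). Both results are n-bit
    unsigned bit patterns whose sum modulo 2**n is the two's complement of x.
    """
    if not (0 < m < n and -(1 << (m - 1)) <= x < (1 << (m - 1))):
        raise ValueError("x does not fit the reduced m-bit form")
    # Constant vector: ones from bit n-1 down to bit m-1 (value -2**(m-1)).
    const = ((1 << n) - 1) ^ ((1 << (m - 1)) - 1)
    # m-bit vector: x + 2**(m-1) lies in [0, 2**m), so bits n-1..m stay zero.
    vec = x + (1 << (m - 1))
    return vec, const


def toggles(a: int, b: int) -> int:
    """Number of bit positions that switch between patterns a and b."""
    return bin(a ^ b).count("1")


N, M = 16, 4
mask = (1 << N) - 1
prev_full = prev_vec = None
total_full = total_reduced = 0
for x in [3, -2, 5, -7, 1, -1]:
    vec, const = to_reduced(x, N, M)
    assert (vec + const) & mask == x & mask  # the split is exact modulo 2**N
    if prev_full is not None:
        total_full += toggles(prev_full, x & mask)
        total_reduced += toggles(prev_vec, vec)
    prev_full, prev_vec = x & mask, vec
print(total_full, total_reduced)
```

For this sequence it prints `72 12`. Only the m-bit vector changes, because the constant vector is the same for every value.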
4
  • Application: low-power FIR filter using reduced
    two's complement representation
  • Consider a hybrid-form adaptive FIR filter
    whose inputs are 5-level
    data symbols d taking values in {-2, -1, 0, 1, 2}.
  • The coefficients are assumed to be N-bit two's complement
    numbers.
  • Multiplying a coefficient by such a symbol is therefore
    just a shift and/or complement (negation) operation.
  • Assume that we detect that the maximum magnitude
    of a coefficient H is less than $2^{m-2}$. Since
    $|d| \le 2$, the corresponding partial product
    $P = H \cdot d$ has a magnitude less than $2^{m-1}$,
    so it fits the reduced m-bit representation.
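The shift-and-complement multiplication and the magnitude bound can both be checked with a short Python sketch. The names `partial_product` and `bound_holds` are hypothetical, and d stands for the 5-level data symbol.

```python
SYMBOLS = (-2, -1, 0, 1, 2)


def partial_product(h: int, d: int) -> int:
    """Multiply coefficient h by a 5-level data symbol d using only shift and negate."""
    if d not in SYMBOLS:
        raise ValueError("d must be one of -2, -1, 0, 1, 2")
    if d == 0:
        return 0
    p = h << 1 if abs(d) == 2 else h  # |d| == 2: shift left by one bit
    return -p if d < 0 else p         # negative symbol: two's complement negation


def bound_holds(m: int) -> bool:
    """Check that |h| < 2**(m-2) implies |h*d| < 2**(m-1) for every symbol d."""
    limit_h = 1 << (m - 2)
    return all(
        abs(partial_product(h, d)) < (1 << (m - 1))
        for h in range(-limit_h + 1, limit_h)
        for d in SYMBOLS
    )


print(partial_product(-5, 2), partial_product(7, -1), bound_holds(6))
```

This prints `-10 -7 True`. The bound is tight in one direction only. The largest allowed $|H|$ is $2^{m-2} - 1$, which gives $|2H| = 2^{m-1} - 2$.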

5
Coefficient maximum-magnitude detection (an
example with two taps and 6-bit coefficients)
Partial-product generation using reduced two's
complement representation
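In software, the detection step can be modelled as choosing the smallest m that satisfies $\max |H| < 2^{m-2}$ over all taps. This is only a behavioural sketch under that assumption. The gate-level detector in the figure is not reproduced, and `reduced_width` is a hypothetical name.

```python
def reduced_width(coeffs: list[int], n: int) -> int:
    """Smallest m with max |H| < 2**(m-2) over all taps (behavioural model).

    Assumes the coefficients are n-bit two's complement integers. With this m,
    every partial product H*d for d in {-2, ..., 2} fits the reduced m-bit form.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if any(not lo <= h <= hi for h in coeffs):
        raise ValueError("coefficient does not fit in n bits")
    max_mag = max(abs(h) for h in coeffs)
    # max_mag < 2**(m-2)  <=>  max_mag.bit_length() <= m - 2
    return max_mag.bit_length() + 2


print(reduced_width([5, -3], 6), reduced_width([12, -3], 6))
```

This prints `5 6`. The adaptive filter would re-run the detection whenever the coefficients are updated.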
6
As the adaptive filter updates the coefficients,
the word-length m of the reduced representation
changes, and so does the error introduced by
using the reduced representation. We can therefore build a
compensation-vector correction path that imitates
the error propagation in the accumulation path.
  • A test chip was implemented in 0.25 µm CMOS
    technology. It contains a 160-tap hybrid-form filter
    with 8 taps per hybrid section and 10-bit
    coefficients. Operating at 2.5 V with a 100 MHz clock,
    it showed a measured power saving of 32%, as
    summarized in this table.
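The following is a behavioural sketch of the compensation idea. It assumes every partial product in the section uses the same m and is split as $P = V + C$, with $V = P + 2^{m-1}$ and $C = -2^{m-1}$. The accumulation path adds only the V vectors, and a separate correction path adds the C vectors. The paper's hardware correction path may be organised differently, and `mac_with_compensation` is a hypothetical name.

```python
def mac_with_compensation(coeffs: list[int], symbols: list[int], m: int) -> int:
    """Accumulate reduced m-bit vectors, then add the compensation term."""
    offset = 1 << (m - 1)
    acc = 0           # accumulation path: sums only the non-negative m-bit vectors V
    compensation = 0  # correction path: sums the dropped constant vectors C
    for h, d in zip(coeffs, symbols):
        p = h * d  # in hardware this is the shift/complement partial product
        if not -offset <= p < offset:
            raise ValueError("partial product does not fit the reduced m-bit form")
        v = p + offset
        assert 0 <= v < (1 << m)  # V really is an m-bit unsigned vector
        acc += v
        compensation -= offset    # C = -2**(m-1) for every tap
    return acc + compensation


coeffs, symbols = [5, -3, 7, 0], [2, -1, -2, 1]
print(mac_with_compensation(coeffs, symbols, m=5))
```

This prints `-1`, which equals the exact sum 10 + 3 - 14 + 0. The compensation term depends only on the number of taps and on m. That is why it has to track m as the coefficients adapt.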

7
Low-Power Scheduling Techniques for Embedded DSP
Software
  • This section describes an instruction-level
    power model for a Fujitsu DSP processor and
    techniques to reduce the power of this processor.
  • The DSP processor has a special architecture that
    allows instructions to be packed into pairs.
  • The Booth multiplier on this processor is a major
    source of energy consumption for DSP programs.
  • Therefore, a micro-architectural power model for the
    on-chip Booth multiplier is developed and analyzed
    for further power minimization.
  • Based on this model, an effective technique of
    local code modification by operand swapping is
    used to further reduce power consumption.

(S. Malik, IEEE Trans., 1997)
8
An example of a sequence of four instructions where
the overhead cost between instructions 1 and 3 cannot be
ignored
  • The sum of the measured currents for the four
    instructions is 204 mA.
  • The sum of the base costs (37.2 + 14.4 + 36.6 + 14.4)
    and the overhead costs of adjacent instructions
    (18.4 + 18.4 + 18.4 + 18.4) is only 176.2 mA, which
    underestimates the actual cost by 13.6%.
  • The 27.8 mA difference between the two estimates comes
    from the circuit-state overhead between the
    non-adjacent instructions 1 and 3.
  • This is due to a special design at the inputs of
    the multiplier: there is a latch between each
    operand and the multiplier that retains the old
    values until the next multiply instruction is
    executed.
  • This overhead therefore depends on the previous and
    current values of the input latches for each multiply
    operation.
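The arithmetic on this slide follows an instruction-level model. In that model, a program's cost is the sum of the per-instruction base costs plus the overhead costs of adjacent instruction pairs. Below is a short Python check using the slide's numbers, with currents in mA. The transcript does not name the four instructions.

```python
# Instruction-level estimate for the four-instruction example (currents in mA).
base_costs = [37.2, 14.4, 36.6, 14.4]          # base cost of instructions 1..4
adjacent_overheads = [18.4, 18.4, 18.4, 18.4]  # overheads of adjacent pairs, as listed
measured = 204.0

estimate = sum(base_costs) + sum(adjacent_overheads)
missing = measured - estimate           # attributed to the non-adjacent pair 1 and 3
underestimate_pct = 100 * missing / measured

print(f"estimate = {estimate:.1f} mA, missing = {missing:.1f} mA, "
      f"underestimate = {underestimate_pct:.1f}%")
```

This prints `estimate = 176.2 mA, missing = 27.8 mA, underestimate = 13.6%`, which matches the figures above.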

9
Instruction packing for low power
  • A special feature of the target DSP processor's
    architecture is the ability to pack an
    ALU-type instruction and a data-transfer
    instruction into one codeword for simultaneous execution.
  • The average current for packed instructions is
    only slightly more than the average current for a
    sequence of the two unpacked instructions.

Comparison of energy consumed by packed and
unpacked instructions
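A packed pair executes simultaneously, so it occupies one instruction slot instead of two. If energy is taken as proportional to average current times the number of cycles (an assumption of this sketch), packing nearly halves the energy of that pair. The sketch greedily packs adjacent, independent ALU and data-transfer instructions. The mnemonics, register sets, dependence rule and normalized currents (1.0 unpacked, 1.1 packed) are hypothetical placeholders. They are not the Fujitsu instruction set or measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str          # hypothetical mnemonic
    kind: str          # "alu" or "xfer" (data transfer)
    reads: frozenset
    writes: frozenset


def independent(a: Instr, b: Instr) -> bool:
    """True if neither instruction writes a register the other reads or writes."""
    return not (a.writes & (b.reads | b.writes) or b.writes & a.reads)


def pack(seq: list[Instr]) -> list[tuple[Instr, ...]]:
    """Greedily pair adjacent, independent ALU + data-transfer instructions."""
    slots, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq):
            a, b = seq[i], seq[i + 1]
            if {a.kind, b.kind} == {"alu", "xfer"} and independent(a, b):
                slots.append((a, b))
                i += 2
                continue
        slots.append((seq[i],))
        i += 1
    return slots


def energy(slots, i_single: float, i_packed: float) -> float:
    """Current x cycles, assuming one cycle per slot and the given average currents."""
    return sum(i_packed if len(s) == 2 else i_single for s in slots)


seq = [
    Instr("ADD", "alu", frozenset({"A", "B"}), frozenset({"C"})),
    Instr("LD", "xfer", frozenset(), frozenset({"D"})),
    Instr("MUL", "alu", frozenset({"C", "D"}), frozenset({"E"})),
    Instr("ST", "xfer", frozenset({"E"}), frozenset()),  # reads MUL's result: not packed
]
slots = pack(seq)
print([tuple(ins.name for ins in s) for s in slots])
print(energy(slots, 1.0, 1.1), energy([(ins,) for ins in seq], 1.0, 1.1))
```

Here ADD and LD share a slot, and MUL and ST stay separate because ST reads MUL's result. The estimate therefore drops from 4.0 to about 3.1 normalized units.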
10
  • As for the overhead cost of MAC instructions: when
    MAC is packed with a data-transfer instruction,
    especially LAB, which changes the data values in
    registers A and B that MAC uses as inputs, a
    wide variation in overhead cost is
    observed (from 1.4 mA to 33.0 mA).
  • This wide variation is mainly due to the complex
    Booth multiplier implemented in the MAC unit.
  • The fundamental idea behind the Booth multiplier is
    to recode B using the "skipping over 1s" technique.
  • For example, the 7-digit value B = 0011110 (weight 4),
    which would need four additions of shifted A, can be
    recoded as 0 1 0 0 0 -1 0 (weight 2), which requires
    only one addition and one subtraction ($30 = 2^5 - 2^1$).

Micro-architectural model for the Booth multiplier
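Below is a Python sketch of radix-2 Booth recoding, the "skipping over 1s" scheme used in the slide's example. The transcript does not say which radix the processor's multiplier actually uses. Digit i is $b_{i-1} - b_i$, and the weight factor counts the nonzero digits, i.e. the additions and subtractions of shifted A. The function names are hypothetical.

```python
def booth_recode(b: int, n: int) -> list[int]:
    """Radix-2 Booth digits of the n-bit two's complement value b, LSB first.

    Digit i is b[i-1] - b[i] (with b[-1] = 0), so each digit is -1, 0 or +1:
    a run of 1s becomes +1 just above its top bit and -1 at its bottom bit.
    """
    digits, prev = [], 0
    for i in range(n):
        bit = (b >> i) & 1
        digits.append(prev - bit)
        prev = bit
    return digits


def weight(b: int, n: int) -> int:
    """Weight factor: nonzero digits = additions/subtractions of shifted A."""
    return sum(1 for d in booth_recode(b, n) if d != 0)


b = 0b0011110  # 30, the slide's example
print(booth_recode(b, 7)[::-1], bin(b).count("1"), weight(b, 7))
```

This prints `[0, 1, 0, 0, 0, -1, 0] 4 2` with the MSB first. The output shows the four plain additions collapsing into one addition and one subtraction.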
11
  • We can reduce the number of additions and
    subtractions simply by swapping the operands in
    registers A and B, which can reduce the current.
    The table gives three experiments in which the
    operands are swapped.

Variation of measured current when swapping
operands op1 and op2 in registers A and B for
packed MAC and LAB instructions.
  • Another factor that determines the power consumption
    of the multiplier is switching activity.
  • For the Booth multiplier, the relevant characteristic of A
    is its switching activity; for B, both the weight
    factor and the switching activity matter.
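For a single multiply, the swap decision can be sketched as a cost comparison. The cost counts toggles in A plus the Booth weight and toggles in B, all weighted equally. That weighting and the function names are hypothetical, since the real relative costs come from current measurements. The Booth weight uses the identity that digit i is nonzero exactly when bit i differs from bit i-1.

```python
def booth_weight(v: int, n: int) -> int:
    """Nonzero radix-2 Booth digits of n-bit v: digit i is nonzero iff bit i != bit i-1."""
    return bin((v ^ (v << 1)) & ((1 << n) - 1)).count("1")


def toggles(prev: int, cur: int, n: int) -> int:
    """Bits that switch when an n-bit register goes from prev to cur."""
    return bin((prev ^ cur) & ((1 << n) - 1)).count("1")


def should_swap(prev_a: int, prev_b: int, op1: int, op2: int, n: int) -> bool:
    """True if loading (A, B) = (op2, op1) looks cheaper than (A, B) = (op1, op2).

    Hypothetical cost: switching in A + weight factor of B + switching in B,
    all weighted equally.
    """
    def cost(a: int, b: int) -> int:
        return toggles(prev_a, a, n) + booth_weight(b, n) + toggles(prev_b, b, n)

    return cost(op2, op1) < cost(op1, op2)


print(should_swap(0, 0, 3, 0b01010101, 8), should_swap(0, 0, 0b01010101, 3, 8))
```

This prints `True False`. The alternating pattern 01010101 has Booth weight 8, so it is cheaper in A than in B.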

12
  • Average current drawn by packed MAC and LAB
    instructions for different characteristics of
    consecutive values in A and B.
  • In a typical DSP application, packed MAC and LAB
    instructions are applied to a sequence of data
    for filter operations such as $Y = \sum_i C_i X_i$.
  • Since only the coefficients C are known in advance
    and there is no information about the data X, we
    assign C to B. If the switching activity or weight
    factor of the C values is high, we swap the operands.

Comparison of power consumption for 5 DSP
programs by different scheduling techniques
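The following is a compile-time version of the heuristic. It assumes only the coefficient sequence C is known. C goes to B by default and moves to A when its average Booth weight or its average switching activity between consecutive values is high. The thresholds `w_max` and `s_max` are hypothetical tuning parameters, not values from the paper.

```python
def booth_weight(v: int, n: int) -> int:
    """Nonzero radix-2 Booth digits of n-bit v (digit i nonzero iff bit i != bit i-1)."""
    return bin((v ^ (v << 1)) & ((1 << n) - 1)).count("1")


def put_coefficients_in_a(coeffs: list[int], n: int, w_max: float, s_max: float) -> bool:
    """Decide whether the known coefficients C should be loaded into A instead of B."""
    mask = (1 << n) - 1
    avg_weight = sum(booth_weight(c, n) for c in coeffs) / len(coeffs)
    pairs = list(zip(coeffs, coeffs[1:]))
    avg_switch = (
        sum(bin((a ^ b) & mask).count("1") for a, b in pairs) / len(pairs) if pairs else 0.0
    )
    return avg_weight > w_max or avg_switch > s_max


print(put_coefficients_in_a([1, 2, 4, 2, 1], 8, 4, 4),
      put_coefficients_in_a([85, -86, 85, -86], 8, 4, 4),
      put_coefficients_in_a([1, -2, 1, -2], 8, 4, 4))
```

This prints `False True True`. The second sequence triggers the swap through its weight factor, and the third triggers it only through switching activity.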
13
Improved Mitchell-Based Logarithmic Multiplier
for Low-power DSP Applications
  • The technique of multiplying two numbers using
    logarithms is simple. Take the logarithms of two
    multiplicands, add the logarithms together and
    then take the antilogarithm of the resulting
    summation.
  • Mitchell's method of calculating logarithms:
  • Assume $N = 25_{10} = 11001_2$.
  • The MSB is bit 4, which gives a characteristic of
    $100_2$, and the remaining bits ($1001_2$) give the
    fraction. This gives a value for the logarithm of
    $100.1001_2$ ($4.5625_{10}$).
  • The correct value of $\log_2(25)$ is 4.6439.

(Duncan J. McLaren et al., IEEE 2003)
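Mitchell's approximation reads the logarithm directly off the bit pattern. Here is a Python sketch; the function name is hypothetical.

```python
import math


def mitchell_log2(n: int) -> float:
    """Mitchell's approximation of log2(n) for a positive integer n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1         # characteristic: position of the leading 1
    x = (n - (1 << k)) / (1 << k)  # remaining bits read as a binary fraction
    return k + x


print(mitchell_log2(25), round(math.log2(25), 4))
```

This prints `4.5625 4.6439`. The approximation never exceeds the true value, because $x \le \log_2(1 + x)$ for $0 \le x \le 1$.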
14
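  • A binary number N can be written as
    $N = 2^k (1 + x)$
    where k represents the characteristic and x the binary fraction, with x in the range $0 \le x < 1$.
  • The true logarithm and the approximation using the Mitchell method are
    $\log_2 N = k + \log_2(1 + x)$ and $\log_2 N \approx k + x$
  • The logarithm of a product is equal to the sum of the logarithms of the multiplicands:
    $\log_2(N_1 N_2) = k_1 + k_2 + \log_2(1 + x_1) + \log_2(1 + x_2) \approx k_1 + k_2 + x_1 + x_2$
  • The antilogarithms of these two equations are the true product and the Mitchell product:
    $N_1 N_2 = 2^{k_1 + k_2} (1 + x_1 + x_2 + x_1 x_2)$
    $\tilde{P} = 2^{k_1 + k_2} (1 + x_1 + x_2)$ if $x_1 + x_2 < 1$, and $\tilde{P} = 2^{k_1 + k_2 + 1} (x_1 + x_2)$ if $x_1 + x_2 \ge 1$
  • To correct the error, the following is used (the correction terms $x_1 x_2$ and $(1 - x_1)(1 - x_2)/2$ complete the Mitchell summation):
    $N_1 N_2 = 2^{k_1 + k_2} (1 + x_1 + x_2 + x_1 x_2)$ if $x_1 + x_2 < 1$
    $N_1 N_2 = 2^{k_1 + k_2 + 1} \left( x_1 + x_2 + \frac{(1 - x_1)(1 - x_2)}{2} \right)$ if $x_1 + x_2 \ge 1$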
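The Mitchell product and its exact correction can be evaluated directly. Each operand is written as $2^k(1 + x)$. The Mitchell product is $2^{k_1+k_2}(1 + x_1 + x_2)$, or $2^{k_1+k_2+1}(x_1 + x_2)$ when $x_1 + x_2 \ge 1$. The correction adds $x_1 x_2$ or $(1 - x_1)(1 - x_2)/2$, respectively, before the shift. The sketch uses Python floats for readability; all values here are dyadic, so the arithmetic is exact. A hardware multiplier would use fixed-point fractions and shifts instead. The function names are hypothetical.

```python
def split(n: int) -> tuple[int, float]:
    """Write a positive integer n as 2**k * (1 + x) with 0 <= x < 1."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1
    return k, (n - (1 << k)) / (1 << k)


def mitchell_mul(a: int, b: int) -> float:
    """Mitchell product: antilogarithm of k1 + k2 + x1 + x2."""
    (k1, x1), (k2, x2) = split(a), split(b)
    s = x1 + x2
    if s < 1:
        return 2.0 ** (k1 + k2) * (1 + s)
    return 2.0 ** (k1 + k2 + 1) * s  # carry into the characteristic


def corrected_mul(a: int, b: int) -> float:
    """Add the exact correction term to the summation before the shift."""
    (k1, x1), (k2, x2) = split(a), split(b)
    s = x1 + x2
    if s < 1:
        return 2.0 ** (k1 + k2) * (1 + s + x1 * x2)
    return 2.0 ** (k1 + k2 + 1) * (s + (1 - x1) * (1 - x2) / 2)


print(mitchell_mul(25, 25), corrected_mul(25, 25))
```

This prints `576.0 625.0`. For 25 × 25 the Mitchell product underestimates, and the corrected form is exact.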
15
  • This shows that, to provide the correct answer, an
    error correction factor should be added to the
    summation before the antilogarithm is calculated.
  • However, computing this factor exactly would be
    impractical, since it requires the product $x_1 x_2$,
    which is the very multiplication the logarithmic
    method avoids. The approach is instead to average
    the value of the correction factor over a range of
    x values and add this average to the summation.
    This results in a multiplier of improved accuracy.
  • The two fractional parts are each split into
    8 ranges, from 0 to 1 in steps of 0.125. This means
    that the 3 most significant bits of each x can be used
    to select the (precalculated) error correction factor.
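Here is a sketch of the averaged correction, built on three assumptions. First, the correction is expressed in units of $2^{k_1+k_2}$ and added to the Mitchell mantissa before the final shift. Second, each of the 64 table entries is the average of the exact residual over its $0.125 \times 0.125$ cell. Third, that average is estimated by midpoint sampling. The transcript does not give the paper's table values or its precise definition of the correction factor, and the function names are hypothetical.

```python
def split(n: int) -> tuple[int, float]:
    """Write a positive integer n as 2**k * (1 + x) with 0 <= x < 1."""
    k = n.bit_length() - 1
    return k, (n - (1 << k)) / (1 << k)


def residual(x1: float, x2: float) -> float:
    """Exact product minus Mitchell product, in units of 2**(k1 + k2)."""
    return x1 * x2 if x1 + x2 < 1 else (1 - x1) * (1 - x2)


def build_table(samples: int = 16) -> list[list[float]]:
    """8 x 8 table of the residual averaged over each 0.125 x 0.125 cell."""
    table = [[0.0] * 8 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            total = 0.0
            for u in range(samples):
                for v in range(samples):
                    x1 = (i + (u + 0.5) / samples) / 8  # midpoint samples inside cell (i, j)
                    x2 = (j + (v + 0.5) / samples) / 8
                    total += residual(x1, x2)
            table[i][j] = total / samples**2
    return table


TABLE = build_table()


def log_mul(a: int, b: int, table=None) -> float:
    """Mitchell product; with a table, add the averaged correction before the shift."""
    (k1, x1), (k2, x2) = split(a), split(b)
    s = x1 + x2
    mantissa = 1 + s if s < 1 else 2 * s  # Mitchell antilogarithm, units of 2**(k1 + k2)
    if table is not None:
        mantissa += table[int(x1 * 8)][int(x2 * 8)]  # 3 MSBs of each fraction
    return 2.0 ** (k1 + k2) * mantissa


def mean_rel_error(table=None, limit: int = 256) -> float:
    """Mean relative error over all operand pairs 1..limit-1."""
    errs = [abs(a * b - log_mul(a, b, table)) / (a * b)
            for a in range(1, limit) for b in range(1, limit)]
    return sum(errs) / len(errs)


print(f"Mitchell: {mean_rel_error():.4f}  averaged correction: {mean_rel_error(TABLE):.4f}")
```

The table costs only 64 stored constants and an adder. The comparison printed at the end shows how much of Mitchell's systematic underestimate it removes on 8-bit operands.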

16
  • To test the multiplier further, it was used as
    part of a real application, in this case a Finite
    Impulse Response (FIR) Filter. The filter was an
    11-tap low-pass FIR, with a normalized cut-off
    frequency of 0.25. The filter was implemented in
    Verilog using the standard multiplier, the
    un-modified Mitchell multipliers and the Improved
    Mitchell multipliers. The input was 16-bit and
    the output was 32-bit. The figure below shows the
    magnitude response from each of the three
    implementations.

17
Power-aware Pipelined Multiplier Design Based
on 2-Dimensional Pipeline Gating
  • Although Boolean multipliers are naturally power
    aware of changes in input precision,
    deeply pipelined designs do not have this
    benefit.
  • In Boolean unpipelined multipliers, a low-precision
    calculation (such as 0001 × 0001) dissipates
    much less power than a high-precision
    calculation (such as 1111 × 1111). So Boolean
    unpipelined multipliers are naturally power aware
    of changes in input precision.
  • In deeply pipelined designs, the number of
    registers is much larger than that of other
    elements. Since the registers are clocked every
    cycle regardless of the data, these designs do not
    have the natural power awareness of changes in
    input precision.

(Jia Di, J. S. Yuan et al., GLSVLSI 2003)
18
  • To solve this problem and improve the power
    awareness of deeply pipelined multipliers, a
    novel technique, 2-dimensional pipeline gating, is
    proposed. This technique gates the clock to
    the registers in both the vertical and horizontal
    directions.

19
  • In a 4×4 multiplier, when the input precision is
    4 (for example, when calculating 1111 × 1111), S is
    generated from all inner partial products. If
    the input precision is 2 (for example, when
    calculating 0011 × 0011), the partial products
    containing X2 or Y2 (the ones enclosed by a
    rectangle) can also be disabled.
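The following behavioural sketch shows which partial products remain active at a given input precision, assuming unsigned operands as in the slide's examples. It does not model how this enable mask maps onto clock gating of the pipeline registers in the vertical and horizontal directions. The function names are hypothetical.

```python
def partial_product_enables(n: int, precision: int) -> list[list[bool]]:
    """enable[i][j] is True when partial product x_i * y_j is still needed.

    With unsigned inputs of the given precision, bits x_i and y_j with
    i >= precision or j >= precision are zero, so those partial products
    (and the pipeline registers holding them) can be disabled.
    """
    return [[i < precision and j < precision for j in range(n)] for i in range(n)]


def multiply_gated(x: int, y: int, n: int, precision: int) -> int:
    """Sum only the enabled partial products (exact if x, y < 2**precision)."""
    enable = partial_product_enables(n, precision)
    total = 0
    for i in range(n):
        for j in range(n):
            if enable[i][j]:
                total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return total


active = sum(row.count(True) for row in partial_product_enables(4, 2))
print(active, multiply_gated(0b0011, 0b0011, 4, 2), multiply_gated(0b1111, 0b1111, 4, 4))
```

This prints `4 9 225`. At precision 2, only 4 of the 16 partial products of the 4×4 array are needed, yet 0011 × 0011 still comes out exact.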

22
References
  • M. T. Lee, V. Tiwari, S. Malik, and M. Fujita,
    "Power analysis and minimization techniques for
    embedded DSP software," IEEE Trans. VLSI Syst.,
    vol. 5, pp. 123-135, Mar. 1997.
  • Jia Di, J. S. Yuan et al., "Power-aware Pipelined
    Multiplier Design Based On 2-Dimensional Pipeline
    Gating," GLSVLSI '03, April 28-29, 2003.
  • Zhan Yu et al., "A Low Power Adaptive Filter Using
    Dynamic Reduced 2SC Representation," IEEE Custom
    Integrated Circuits Conference, 2002.
  • Duncan J. McLaren et al., "Improved Mitchell-Based
    Logarithmic Multiplier for Low Power DSP
    Applications," IEEE, 2003.