Title: ICASSP 2004 Poster Slides
2. Abstract
- This poster presents an automatic approach for
minimizing the number of additions required for a
multiplierless implementation of a signal
transform, under an arbitrary quality constraint.
- Multiplierless implementation means
multiplications by constants are realized as
networks of shifts and additions (e.g., y = 5x →
y = (x << 2) + x).
- Cost of implementation = number of additions.
- Higher precision = higher output quality but also
higher cost: there is a tradeoff.
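As a minimal sketch of the y = 5x example (Python is used here only for illustration, and the function name is hypothetical), a constant multiplication can be written with one shift and one addition:

```python
def times5(x: int) -> int:
    """Multiply by 5 without a multiplier: 5x = 4x + x."""
    return (x << 2) + x  # one shift, one addition


print(times5(7))  # prints 35
```

The cost counted on this poster would be 1 for this constant, since only the addition is counted.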
3. Design Flow Challenges
- Choosing a robust algorithm requires
understanding DSP concepts and literature.
- An exponentially large number of different
precision configurations exist.
- Reducing the precision of a constant
impacts the output unpredictably.
Given a transform:
- We automatically select a numerically robust
algorithm.
- We automatically tune the constant precisions.
4. DSP Transform Algorithms
- We consider the following transforms
5. SPIRAL
- Automatically implements and optimizes DSP
algorithms.
- Searches across many formulas, finding one with
minimal runtime.
- We use it to generate robust algorithms.
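[Figure: SPIRAL flow diagram. Stages: transform, algorithm generation, algorithm, algorithm compilation, C code, runtime measurement, platform-adapted implementation. A search engine receives the measured runtime and sends controls to algorithm generation and algorithm compilation.]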
6. Robust Algorithm Example: DCT-II of Size 8
- Formula generated by SPIRAL
- Rotation-based algorithms are selected for
robustness.
7. Increasing Algorithm Robustness
- Automatically convert rotations to lifting steps
(LS).
Targets for approximation
- Rounding error in 1st LS (3rd LS analogous):
not magnified
ε is magnified unless α ∈ [0, π/2] or [3π/2, 2π]
Solution: angle manipulation
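The slide's matrices did not survive extraction. The sketch below uses the standard textbook factorization of a plane rotation into three lifting steps, which is an assumption about what the poster showed; the rotation sign convention and all function names are hypothetical. It only checks numerically that the three steps multiply back to the rotation.

```python
import math


def rotation(alpha):
    """2x2 rotation matrix (counterclockwise convention, assumed here)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lifting_steps(alpha):
    """Three lifting steps whose product is rotation(alpha); needs sin(alpha) != 0."""
    p = (math.cos(alpha) - 1.0) / math.sin(alpha)  # equals -tan(alpha / 2)
    u = math.sin(alpha)
    upper = [[1.0, p], [0.0, 1.0]]  # 1st and 3rd lifting step share the constant p
    lower = [[1.0, 0.0], [u, 1.0]]  # 2nd lifting step uses the constant u
    return upper, lower, upper


alpha = math.pi / 8
first, second, third = lifting_steps(alpha)
product = matmul2(first, matmul2(second, third))
```

In this factorization only the constants p and u are multiplied with data, so they are the natural candidates for reduced-precision approximation.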
8. Multiplierless Implementation
Constant multiplies are converted to shifts and
additions:
A_robust → A_multiplierless
- An algorithm constant c is approximated as a
fixed-point value (n denotes the number of
fractional bits).
- Example:
  - Direct: c = 0.10011100100101, 6 adds (6 shifts)
  - Canonical Signed Digit (CSD): c = 0.101001̄00100101
    (1̄ denotes the digit -1), 5 adds (5 shifts)
  - Addition Chains (our method): 4 adds (5 shifts)
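The addition counts for the direct and CSD forms can be checked with a short sketch. It treats the 14 fractional bits of the example constant as an integer and uses the usual non-adjacent-form recoding for CSD; the function names are hypothetical, and the addition-chain result (4 adds) is not reproduced here because the poster does not describe that search.

```python
def csd_digits(k: int) -> list:
    """Signed digits (-1, 0, 1) of k > 0, least significant first, no two adjacent nonzeros."""
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k & 3)  # +1 if k % 4 == 1, -1 if k % 4 == 3
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def adds_needed(digits) -> int:
    """Additions/subtractions needed to sum the shifted terms: nonzero digits minus one."""
    return max(sum(1 for d in digits if d != 0) - 1, 0)


k = 0b10011100100101  # fractional bits of the example constant
binary = [int(b) for b in reversed(bin(k)[2:])]
csd = csd_digits(k)
print(adds_needed(binary), adds_needed(csd))  # prints 6 5
```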
9. Converting Constant Multiplies to Shifts and Adds
Addition chains outperform CSD.
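A small illustration of why reusing intermediate results can beat a digit-by-digit expansion. The constant 45 is chosen here only as an example and does not come from the poster; the chain shown is hand-picked, not the output of the authors' method.

```python
def times45_csd(x: int) -> int:
    # 45 = 64 - 16 - 4 + 1: four nonzero signed digits, three additions/subtractions
    return (x << 6) - (x << 4) - (x << 2) + x


def times45_chain(x: int) -> int:
    # 45 = 9 * 5: reuse the intermediate t = 5x, two additions in total
    t = (x << 2) + x     # t = 5x
    return (t << 3) + t  # 9t = 45x
```

Both functions compute 45x, but the second needs one addition fewer because the partial result 5x is shared.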
10. Quality Measures
- Transform quality must be maintained when
minimizing cost.
- Measures we consider:
  - Coding gain
  - MP3 decoder compliance rating (Non-Compliant,
    Limited Accuracy, or Fully Compliant)
  - Peak signal-to-noise ratio (PSNR) of the JPEG
    decompressed image, D, to the original image, O
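The PSNR formula on the slide was lost in extraction. A minimal sketch using the common definition, assuming 8-bit pixels with a peak value of 255 (both the peak value and the function name are assumptions):

```python
import math


def psnr(decoded, original, peak=255.0):
    """PSNR in dB between two equally long sequences of pixel values."""
    mse = sum((d - o) ** 2 for d, o in zip(decoded, original)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

For example, `psnr([1, 1, 1, 1], [0, 0, 0, 0])` has a mean squared error of 1 and evaluates to roughly 48.13 dB.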
11. Search Space
- A precision list is associated with an algorithm
having n constants.
- If the max. bitwidth, B, is 19, at most 5 adds are
needed per constant.
- Goal: find a precision list such that the number
of additions is minimized and the quality
threshold is met.
- Size of search space: 6^n. A 32-point DCT-II has
80 constants, so the size is 6^80; exhaustive
search is infeasible.
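To give a sense of scale, the size of the search space for the 32-point DCT-II can be computed directly (variable names are illustrative):

```python
choices_per_constant = 6  # 0 to 5 additions per constant
n_constants = 80          # 32-point DCT-II, as stated on the slide
size = choices_per_constant ** n_constants
print(len(str(size)))     # number of decimal digits in the search-space size
```

The result has 63 decimal digits, i.e. it is on the order of 10^62 configurations.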
12. Global and Greedy Search
- Global:
  - The same bitwidth is assumed for all constants.
  - Exhaustive search over all B+1 possibilities.
- Greedy:
  - Set each constant to max. precision.
  - Each constant is reduced in turn to require one
    fewer addition; quality is evaluated (n
    evaluations).
  - Choose the config. whose quality is highest.
  - Continue until Q_thresh is not satisfied by any
    config.
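The greedy steps can be sketched as follows. This is a simplification under stated assumptions: a configuration is modeled directly as a list of addition counts per constant (rather than precisions), and `toy_quality` is a made-up stand-in for a real quality measure such as PSNR. All names are hypothetical.

```python
from typing import Callable, List


def greedy_search(start: List[int],
                  quality: Callable[[List[int]], float],
                  q_thresh: float) -> List[int]:
    """Repeatedly drop one addition from the constant that hurts quality least."""
    config = list(start)
    while True:
        # One candidate per constant: that constant uses one fewer addition.
        candidates = []
        for i, adds in enumerate(config):
            if adds > 0:
                cand = config.copy()
                cand[i] = adds - 1
                candidates.append(cand)
        if not candidates:
            return config
        best = max(candidates, key=quality)
        if quality(best) < q_thresh:
            return config  # no candidate meets the threshold: stop
        config = best


def toy_quality(config: List[int]) -> float:
    """Made-up quality model: each constant contributes with a fixed weight."""
    weights = [4.0, 2.0, 1.0]
    return sum(w * a for w, a in zip(weights, config))


result = greedy_search([5, 5, 5], toy_quality, q_thresh=25.0)
print(result)  # prints [5, 3, 0]
```

Each round costs one quality evaluation per constant, which matches the "n evaluations" noted on the slide.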
13. Evolutionary Search
- Mimics the natural process of evolution.
- Random configs. are chosen.
- For a set number of generations, random members
are introduced, mutated, and crossbred. Only the
fittest proceed to the next generation.
- Mutation: randomly change a precision.
- Crossbreeding: swap precisions.
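A minimal sketch of such an evolutionary loop. The poster does not give population sizes, the fitness function, or the exact crossover rule, so everything below is assumed for illustration: configurations are lists of addition counts, crossover is one-point, infeasible configurations rank below all feasible ones, and `toy_quality` is a made-up quality model.

```python
import random
from typing import Callable, List


def evolve(n: int, max_adds: int, quality: Callable[[List[int]], float],
           q_thresh: float, generations: int = 50, pop_size: int = 20,
           seed: int = 0) -> List[int]:
    """Evolutionary search for a low-cost configuration; requires n >= 2."""
    rng = random.Random(seed)

    def random_config() -> List[int]:
        return [rng.randint(0, max_adds) for _ in range(n)]

    def fitness(cfg: List[int]) -> int:
        # Feasible configs: fewer additions is fitter. Infeasible: worst score.
        if quality(cfg) >= q_thresh:
            return -sum(cfg)
        return -(n * max_adds + 1)

    def mutate(cfg: List[int]) -> List[int]:
        child = cfg.copy()
        child[rng.randrange(n)] = rng.randint(0, max_adds)  # change one entry
        return child

    def crossbreed(a: List[int], b: List[int]) -> List[int]:
        cut = rng.randrange(1, n)  # swap the tails of two parents
        return a[:cut] + b[cut:]

    # Include the max-precision config, assumed here to meet the threshold.
    population = [[max_adds] * n] + [random_config() for _ in range(pop_size - 1)]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population)) for _ in range(pop_size)]
        offspring += [crossbreed(rng.choice(population), rng.choice(population))
                      for _ in range(pop_size)]
        offspring += [random_config() for _ in range(pop_size // 4)]
        # Only the fittest proceed to the next generation.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return population[0]


def toy_quality(config: List[int]) -> float:
    """Made-up quality model: each constant contributes with a fixed weight."""
    weights = [4.0, 2.0, 1.0]
    return sum(w * a for w, a in zip(weights, config))


best = evolve(n=3, max_adds=5, quality=toy_quality, q_thresh=25.0)
```

Because parents compete with their offspring, the best feasible configuration found so far is never lost between generations.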
14. Experiments
15. Experimental Results
The number of additions is significantly reduced
while transform quality is maintained.
16. Experimental Results
Progress of the evolutionary algorithm during E1.
17. Applying Global Search to the DCT-II within MP3
Experimental Results
- Quality measure: MP3 compliance rating
  - Limited Accuracy (RMS < 1.4e-4, MaxDiff < ∞):
    achieved at global bitwidth 9
  - Fully Compliant (RMS < 8.8e-6, MaxDiff < 6.1e-5):
    achieved at global bitwidth 13
- The reference input is decoded and compared to
the reference output. RMS (root mean squared)
error and MaxDiff error are computed.
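The two error measures can be sketched as below. The thresholds are the ones quoted on the slide; applying only the RMS bound for Limited Accuracy is a simplification made in this sketch, and the function names are hypothetical.

```python
import math


def rms_and_maxdiff(decoded, reference):
    """RMS error and maximum absolute difference between two sample sequences."""
    diffs = [d - r for d, r in zip(decoded, reference)]
    rms = math.sqrt(sum(e * e for e in diffs) / len(diffs))
    maxdiff = max(abs(e) for e in diffs)
    return rms, maxdiff


def compliance_rating(rms: float, maxdiff: float) -> str:
    """Map the two error measures to a rating using the slide's thresholds."""
    if rms < 8.8e-6 and maxdiff < 6.1e-5:
        return "Fully Compliant"
    if rms < 1.4e-4:  # only the RMS bound is checked here (sketch assumption)
        return "Limited Accuracy"
    return "Non-Compliant"


reference = [0.0, 0.5, -0.5, 0.25]
decoded = [r + 1e-4 for r in reference]
print(compliance_rating(*rms_and_maxdiff(decoded, reference)))  # prints Limited Accuracy
```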
18. Searching to Find a Low-Cost DCT-II for JPEG
Experimental Results
- Quality measure: PSNR (dB)
- Thresholds: 32, 32.5, ..., 34.5 (dB)
- Global is best for the lowest thresholds; Greedy
and Evol are best for larger thresholds.
- Greedy and Evol often produce the same result for
a given threshold.
19. Acknowledgments
- This work was supported by NSF through awards
0234293, 0310941, and 0325687.
Primary References
- J. Liang and T. D. Tran, "Fast Multiplierless
Approximations of the DCT with the Lifting
Scheme," IEEE Trans. on Sig. Proc., vol. 49, no.
12, pp. 3032-3044, 2001.
- M. Pueschel et al., "SPIRAL: A Generator for
Platform-Adapted Libraries of Signal Processing
Algorithms," Journal of High Performance
Computing and Applications, special issue on
Automatic Performance Tuning, 18(1), pp. 21-45,
2004. http://www.spiral.net