Title: Reducing Complexity in Signal Processing Algorithms for Communication Receiver and Image Display Software
1Reducing Complexity in Signal Processing Algorithms for Communication Receiver and Image Display Software
Wireless Networking and Communications Group
Prof. Brian L. Evans
Seminar at the American University of Beirut
27 July 2010
2Outline
- Embedded digital systems
- Generating sinusoidal waveforms
- Discrete-time filters
- Multicarrier equalizers
- Image halftoning algorithms
- Conclusion
3Embedded Digital Systems
- Often work on application-specific tasks
- In consumer products (2008 units)
- 1200M cell phones
- 300M PCs
- 100M digital cameras
- 100M DVD players
- 70M DSL modems
- 55M cars/light trucks
- 30M gaming consoles (2007)
- iPhone has six programmable processors (2008)
- Embedded programmable processors
- Inexpensive with small area and volume
- Predictable off-chip input/output (I/O) rates
- Low power (TI C5504: 45 mW at 300 MHz)
- Limited on-chip memory
- Fixed-point arithmetic
4Embedded Digital Systems
- Memory access in processors
- External I/O: block data transfers to/from on-chip memory
- Internal I/O: on-chip memory to CPU registers using data buses (e.g. TI C6000 processor has two 32-bit data buses)
- Common word sizes for signal processing software
- 64-bit floating-point for desktop computing (e.g. Matlab)
- 32-bit floating-point for pro-audio and sonar beamforming
- 16-bit fixed-point for speech, consumer audio, image proc.
- IEEE floating-point operations
- Handles many special cases (e.g. +∞, -∞ and not a number)
- Add, multiply, divide have comparable hardware complexity
5Embedded Digital Systems
- Fixed-point operations
- Multiplication based on addition operations
- Division takes 1-2 instructions per bit of accuracy
- Multiplication can consume much dynamic power
- Truncate constants for power savings
[Figure: Multiplier used in TI C64 processors (Han, Evans & Swartzlander, 2005)]
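The link between truncating a constant and doing less multiplier work can be illustrated with a shift-and-add multiply. This is a minimal Python sketch and not the TI C64 multiplier design. The function names and the 16-bit example coefficient are assumptions made for illustration, and the count of additions is only a rough proxy for dynamic power.

```python
def shift_add_multiply(x, c):
    """Multiply integer x by a non-negative integer constant c using only
    shifts and adds. Returns (product, number_of_adds)."""
    product = 0
    adds = 0
    shift = 0
    while c:
        if c & 1:                    # each nonzero bit of c costs one addition
            product += x << shift
            adds += 1
        c >>= 1
        shift += 1
    return product, adds


def truncate_constant(c, keep_bits, word_bits=16):
    """Zero all but the keep_bits most significant bits of a word_bits-wide constant."""
    mask = ((1 << keep_bits) - 1) << (word_bits - keep_bits)
    return c & mask


full = 0b0101101101101101            # hypothetical 16-bit coefficient
short = truncate_constant(full, 8)   # keep only the top 8 bits

p_full, adds_full = shift_add_multiply(1234, full)
p_short, adds_short = shift_add_multiply(1234, short)
print(adds_full, adds_short)         # additions needed before and after truncation
```

The truncated constant has fewer nonzero bits, so fewer partial products are added, at the cost of a small coefficient error.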
6Generating Sinusoidal Waveforms
- Sample continuous-time cosine signal at rate fs
- Discrete-time fixed frequency ω0 = 2π f0 / fs
- Example: f0 = 1200 Hz and fs = 8000 Hz, ω0 = (3/10) π
- Discrete-time realization drops fs term in front of cosine
- Math library call to cos function in C
- Uses double-precision floating-point arithmetic
- No standard in C for internal implementation
- Generally meant for high-accuracy desktop calculations
- Call to gsl_sf_cos_e in GNU scientific library 1.8
- 20 multiply, 30 add, 2 divide, 2 power calculations/output
7Generating Sinusoidal Waveforms
- Difference equation with input x[n] and output y[n]
- y[n] = (2 cos ω0) y[n-1] - y[n-2] + x[n] - (cos ω0) x[n-1]
- From inverse z-transform of z-transform of cos(ω0 n) u[n]
- Impulse response gives cos(ω0 n) u[n]
- Initial conditions are all zero
- 2 multiplications and 3 adds per output value
- Buildup in error as n increases due to feedback
- Lookup table: pre-compute samples offline
- Discrete-time frequency ω0 = 2π f0 / fs = 2π N / L
- All common factors between integers N and L removed
- Period: 2π k = 2π (N / L) n ⇒ n = L (with k = N) ⇒ store L samples
- Entries in either floating-point or fixed-point format
- Table would contain N periods of the cosine
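The two low-complexity generators on this slide can be sketched as follows. This is an illustrative double-precision Python version with assumed function names, not the fixed-point implementation a target processor would use. It uses the earlier example f0 = 1200 Hz and fs = 8000 Hz, which gives N = 3 and L = 20.

```python
import math


def cos_difference_equation(w0, num_samples):
    """Generate cos(w0 n) u[n] as the impulse response of
    y[n] = 2 cos(w0) y[n-1] - y[n-2] + x[n] - cos(w0) x[n-1]
    with zero initial conditions."""
    c = math.cos(w0)                 # constant computed once, offline
    y1 = y2 = 0.0                    # y[n-1], y[n-2]
    x1 = 0.0                         # x[n-1]
    out = []
    for n in range(num_samples):
        x0 = 1.0 if n == 0 else 0.0  # unit impulse input
        y0 = 2.0 * c * y1 - y2 + x0 - c * x1
        out.append(y0)
        y2, y1, x1 = y1, y0, x0
    return out


def cos_lookup_table(N, L, num_samples):
    """Generate cos(2 pi (N / L) n) by indexing a table of L stored samples."""
    table = [math.cos(2.0 * math.pi * N * n / L) for n in range(L)]
    return [table[n % L] for n in range(num_samples)]


N, L = 3, 20                         # 1200 / 8000 = 3 / 20 in lowest terms
w0 = 2.0 * math.pi * N / L
recursive = cos_difference_equation(w0, 2000)
table = cos_lookup_table(N, L, 2000)
exact = [math.cos(w0 * n) for n in range(2000)]
err_recursive = max(abs(a - b) for a, b in zip(recursive, exact))
err_table = max(abs(a - b) for a, b in zip(table, exact))
print(err_recursive, err_table)
```

The recursion needs only the stored constant cos ω0 and three state values, while the table trades L words of memory for zero multiplications per sample.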
8Generating Sinusoidal Waveforms
- Signal quality vs. implementation complexity in generating cos(ω0 n) u[n] with ω0 = 2π N / L
Method | MACs/sample | ROM (words) | RAM (words) | Quality in floating pt. | Quality in fixed point
C math library call | 30 | 22 | 1 | Second Best | N/A
Difference equation | 2 | 2 | 3 | Worst | Second Best
Lookup table | 0 | L | 0 | Best | Best
MAC: multiplication-accumulation
RAM: random access memory (writeable)
ROM: read-only memory
9Discrete-Time Filters
- Finite impulse response (FIR) filter
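- Impulse response h[k] has finite extent k = 0, ..., M-1
[Figure: tapped delay line. Input x[k] passes through a chain of unit delays z^-1 (x[k-1], ...); the taps are weighted by h[0], h[1], h[2], ..., h[M-1] and summed (Σ) to give y[k].]
Discrete-time convolution: y[k] = h[0] x[k] + h[1] x[k-1] + ... + h[M-1] x[k-(M-1)]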
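The tapped delay line maps directly onto a short loop. The sketch below is a minimal Python illustration with an assumed function name and a hypothetical 3-tap coefficient set. An embedded implementation would typically use a circular buffer and MAC instructions instead of list copies.

```python
def fir_filter(h, x):
    """Direct-form FIR filter: y[k] = sum over m = 0..M-1 of h[m] x[k-m],
    with x[k] taken as zero for k < 0."""
    M = len(h)
    delay_line = [0.0] * M                        # x[k], x[k-1], ..., x[k-M+1]
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]   # shift in the newest sample
        y.append(sum(hm * xm for hm, xm in zip(h, delay_line)))
    return y


h = [0.25, 0.5, 0.25]                             # hypothetical 3-tap lowpass
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir_filter(h, impulse))                     # first M outputs reproduce h
```

Each output costs M multiply-accumulate operations, and the response to an impulse dies out after M samples.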
10Discrete-Time Filters
- Infinite impulse response (IIR) filter
- Biquad building block: 2 poles and 0-2 zeros
- Generally, coefficients a1, a2, b0, b1, b2 are real-valued
Biquad is short for biquadratic: transfer function is ratio of two quadratic polynomials
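A biquad can be sketched in a few lines. The slide's transfer function did not survive extraction, so the sign convention here is an assumption: H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2). The function name and example coefficients are also illustrative.

```python
def biquad(b, a, x):
    """Direct form I biquad, assuming
    H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0          # zero initial conditions
    y = []
    for x0 in x:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y


# Hypothetical example: double pole at z = 0.5, b1 = b2 = 0
# Denominator (1 - 0.5 z^-1)^2 = 1 - z^-1 + 0.25 z^-2
print(biquad((1.0, 0.0, 0.0), (-1.0, 0.25), [1.0, 0.0, 0.0, 0.0]))
```

Five multiplications and four state values per biquad explain why a cascade of biquads can be cheaper than an FIR filter that meets the same magnitude specification.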
11Discrete-Time Filters
 | FIR Filters | IIR Filters
Implementation complexity (1) | Higher | Lower (sometimes by factor of four)
Minimum order design | Parks-McClellan (Remez exchange) algorithm (2) | Elliptic design algorithm
Stable? | Always | May become unstable when implemented (3)
Linear phase | If impulse response is symmetric or anti-symmetric about midpoint | No, but phase may be made approximately linear over passband (or other band)
(1) For same piecewise constant magnitude specification
(2) Algorithm to estimate minimum order for Parks-McClellan algorithm by Kaiser may be off by 10%. Search for minimum order is often needed.
(3) Algorithms can tune design to implementation target to minimize risk
12Discrete-Time Filters
- Keep roots computed by filter design algorithms
- Polynomial deflation (rooting) reliable in floating-point
- Polynomial inflation (expansion) may degrade roots
- Choice of IIR filter structure matters
- Direct form IIR structures expand zeros and poles, and may become unstable for large order filters (order > 12)
- Cascade of biquads expands zeros and poles in each biquad
- Minimum order design not always most efficient
- Efficiency depends on target implementation
- Consider power-of-two coefficient design
- Efficient designs may require search of ∞ design space
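Power-of-two coefficients are attractive because each multiplication becomes a shift. The sketch below shows one simple way to round coefficients to signed powers of two. The rounding rule (nearest in the log2 sense), the function names and the example coefficients are assumptions. A practical design would search the coefficient space against the filter specification and would not round each coefficient independently.

```python
import math


def nearest_power_of_two(c):
    """Round a nonzero coefficient to a signed power of two, nearest in the
    log2 sense. Returns (sign, exponent) meaning sign * 2**exponent."""
    sign = 1 if c > 0 else -1
    exponent = round(math.log2(abs(c)))
    return sign, exponent


def multiply_by_power_of_two(x, sign, exponent):
    """Multiply integer x by sign * 2**exponent using only a shift."""
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return sign * shifted


coeffs = [0.24, -0.51, 0.13]         # hypothetical filter coefficients
quantized = [nearest_power_of_two(c) for c in coeffs]
print(quantized)
print(multiply_by_power_of_two(1024, *quantized[0]))
```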
13Halftime AUB Summer 2005
- EECE 503 Real-Time DSP Lab
- Embedded digital systems
- Generating sinusoidal waveforms
- Discrete-time filters
- Multicarrier equalizers
- Image halftoning algorithms
- Conclusion
14Channel Equalization
- Channel degrades transmitted signal
- Nonlinear distortion, e.g. amplitude nonlinearities
- Linear distortion, e.g. convolution by channel impulse response
- Additive noise, e.g. thermal (Gaussian) and impulsive
- Equalization compensates linear distortion
- Spreading/attenuation in time
- Magnitude/phase distortion in frequency
15Multicarrier Modulation
- Divide channel into narrowband subchannels
- Discrete multitone modulation
- Baseband transmission based on fast Fourier transform (FFT)
- Each subchannel carries single-carrier transmission
- Standardized for digital subscriber line (DSL) communication
[Figure: magnitude vs. frequency, showing the channel divided into subchannels, each with its own carrier]
Subchannels are 4.3 kHz wide in DSL systems
16Channel Equalization
- Equalizer
- Shortens channel impulse response (time domain eq.)
- Compensates phase/magnitude distortion (freq. domain eq.)
- Single carrier system: g is scalar constant
- FIR filter w performs time and frequency domain equalization
- Multicarrier system: g is FIR filter of length ν+1
- Time domain equalizer (w), then FFT, then freq. domain equalizer
[Figure: Discretized baseband system. Training signal x[k] feeds channel h with additive noise n[k], followed by equalizer w. The receiver generates x[k] for the ideal channel path (delay z^-Δ, then g). Signals y[k] and r[k] and the error e[k] between the two paths are labeled.]
Equalization in DSL receivers increases bit rate by 10x
17Multicarrier Equalization
- Maximum shortening SNR time domain equalizer
- Minimize energy leakage outside shortened channel length
- For each position of window Δ [Melsa, Younce & Rohrs, 1996]
- Cholesky decomposition of B leads to optimal eigensolution
- Computationally intensive: O(Lw^3)
- Floating-point multiplications/divisions
- Restricts TEQ length to be less than ν+1
18Time Domain Equalizer Design
[Figure: bit rate (Mbps) vs. training complexity in log10(multiply-add operations)]
TEQ length of 17. Data rates averaged over eight standard DSL test lines [Martin et al., 2006]
Most efficient floating-point versions of algorithms used
19Time Domain Equalizer Design
- Unified framework [Martin et al., 2006]
- A and B are square (Lw × Lw) and depend on choice of Δ
- Constraint prevents trivial non-practical solution w = 0
- Find eigenvector for largest generalized eigenvalue
- Formulation
- Iterative methods (division-free)
- Power method
- Alternating
- Lagrangian
20 iterations to converge for 17-tap MSSNR TEQ design
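Finding the eigenvector for the largest generalized eigenvalue can be illustrated with plain power iteration on A w = λ B w. This is a textbook sketch and not the division-free iterations of Martin et al., since the linear solve and the normalization below both divide. The 2 × 2 matrices, the function names and the choice of A as the numerator matrix are assumptions for illustration only.

```python
def mat_vec(M, v):
    """Matrix-vector product for lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve_2x2(B, r):
    """Solve B z = r for a 2 x 2 matrix B by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(r[0] * B[1][1] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - B[1][0] * r[0]) / det]


def generalized_power_method(A, B, iterations=20):
    """Estimate the largest generalized eigenvalue of A w = lambda B w
    and its eigenvector by iterating w <- B^-1 A w."""
    w = [1.0, 1.0]
    for _ in range(iterations):
        w = solve_2x2(B, mat_vec(A, w))
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]     # unit norm rules out the trivial w = 0
    Aw, Bw = mat_vec(A, w), mat_vec(B, w)
    lam = sum(x * y for x, y in zip(w, Aw)) / sum(x * y for x, y in zip(w, Bw))
    return lam, w


A = [[3.0, 1.0], [1.0, 2.0]]          # hypothetical symmetric matrices
B = [[2.0, 0.0], [0.0, 1.0]]
lam, w = generalized_power_method(A, B)
print(lam, w)
```

For this example the generalized eigenvalues are 2.5 and 1, so 20 iterations are ample. The convergence rate in general depends on the ratio of the two largest eigenvalues.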
20Digital Image Halftoning
- For display on devices with fewer bits of gray/color resolution than original image
- Grayscale: 8-bit image to 1-bit image
- Color: 24-bit RGB image to 12-bit RGB display
- Produces artifacts
Each pixel in original image is 8-bit unsigned intensity in [0, 255]. For display, 0 is black and 255 is white.
21Quantization with Feedback
- Consider 4-bit data on 2-bit display (unsigned)
- Feedback quantization error
- For constant input 1001 = 9
- Average output value
- ¼ (10 + 10 + 10 + 11) = 10.01 in binary, i.e. 1001 = 9 on the 4-bit scale
- 4-bit resolution at DC!
- Noise shaping
- Truncating from 4 to 2 bits increases noise by 12 dB
- Feedback removes noise at DC and increases HF noise
Time | Adder input: upper | Adder input: lower | Sum | Output to display
1 | 1001 | 00 | 1001 | 10
2 | 1001 | 01 | 1010 | 10
3 | 1001 | 10 | 1011 | 10
4 | 1001 | 11 | 1100 | 11
Periodic
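The adder table above can be reproduced with a few lines of integer arithmetic. This is a minimal Python sketch with an assumed function name. It does not handle adder overflow, which a real design would need for inputs near full scale.

```python
def feedback_quantizer(samples, in_bits=4, out_bits=2):
    """Truncate in_bits-wide unsigned samples to out_bits, feeding the
    discarded lower bits (the quantization error) back into the next sum."""
    drop = in_bits - out_bits
    low_mask = (1 << drop) - 1
    feedback = 0
    outputs = []
    for x in samples:
        total = x + feedback           # adder: input plus previous error
        outputs.append(total >> drop)  # upper bits go to the display
        feedback = total & low_mask    # lower bits are fed back
    return outputs


out = feedback_quantizer([0b1001] * 4)
print([format(v, "02b") for v in out])   # ['10', '10', '10', '11']
print(sum(out) / len(out) * 4)           # 9.0, the original 4-bit value
```

Without the feedback every output would be 10, and the average would be 8 on the 4-bit scale where the input was 9.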
22Error Diffusion Halftoning
- Quantize each pixel
- Diffuse filtered quantization error to future pixels
[Figure: error diffusion block diagram and error filter weights around the current pixel (Floyd & Steinberg, 1976)]
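Error diffusion with the standard Floyd-Steinberg weights (7/16 right, 3/16 below-left, 5/16 below, 1/16 below-right) can be sketched as follows. The function name, the raster scan order, the threshold at 128 and the constant gray test patch are illustrative assumptions. Error that would fall outside the image is simply discarded.

```python
def floyd_steinberg(image):
    """Halftone a grayscale image (list of rows, values in [0, 255]) to
    0/255 using raster-scan Floyd-Steinberg error diffusion."""
    rows, cols = len(image), len(image[0])
    work = [[float(v) for v in row] for row in image]   # modified input
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            old = work[r][c]
            new = 255 if old >= 128 else 0              # 1-bit quantizer
            out[r][c] = new
            err = old - new                             # quantization error
            if c + 1 < cols:
                work[r][c + 1] += err * 7 / 16
            if r + 1 < rows:
                if c > 0:
                    work[r + 1][c - 1] += err * 3 / 16
                work[r + 1][c] += err * 5 / 16
                if c + 1 < cols:
                    work[r + 1][c + 1] += err * 1 / 16
    return out


gray = [[64] * 32 for _ in range(32)]    # constant 25% gray test patch
halftone = floyd_steinberg(gray)
mean = sum(map(sum, halftone)) / (32 * 32)
print(mean)                              # close to the input gray level
```

Because the weights sum to one, the local average of the halftone tracks the input gray level, apart from the error lost at the image borders.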
23Error Diffusion Halftoning
Artifact | Model | Compensation | Added Complexity
Sharpening | Linear | Sharpness control | 1 multiplication and 1 addition per pixel
False textures | Nonlinear | Deterministic bit flipping quantization | 1 comparison per pixel
- Deterministic bit flipping (DBF) quantizer [Damera-Venkata & Evans, 2001]
- Thresholds input to black (0) or white (255)
- Flip quantized value about mid-gray (128)
- Reduces false textures in mid-grays
- Implemented with two comparisons
[Figure: DBF(x) vs. x, with output level 255 and input breakpoints x1, 128 and x2]
24Sharpness Control
- Model quantizer as gain plus noise [Kite, Evans & Bovik, 1997]
- Signal transfer function models sharpening
- Ks ≈ 2 for Floyd-Steinberg
- Noise transfer function models noise-shaping
- Kn = 1
[Figure: plots for ideal lowpass H(ω) with Ks = 2]
25Sharpness Control
- Adjust by threshold modulation [Eschbach & Knox, 1991]
- Scale image by gain L and add it to quantizer input
- Flatten signal transfer function [Kite, Evans & Bovik, 2000]
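The gain L that flattens the signal transfer function can be worked out from the linear gain model. The derivation below is a sketch under stated assumptions, since the slide's own equations did not survive: X is the input image, U the error-diffused quantizer input before modulation, B the halftone on the signal path, E the quantization error, and H the error filter, all in the frequency domain.

```latex
% Assumed model: threshold modulation adds L X to the quantizer input,
% the quantizer acts as gain K_s on the signal path, and the error is
% measured against the unmodulated input U.
\begin{align}
U' &= U + L X, \qquad B = K_s U', \qquad E = B - U, \qquad U = X - H E \\
\frac{B}{X} &= \frac{K_s \left( 1 + L - L H \right)}{1 + (K_s - 1) H} \\
L &= \frac{1 - K_s}{K_s}
  \;\Rightarrow\; K_s (1 + L - L H) = 1 + (K_s - 1) H
  \;\Rightarrow\; \frac{B}{X} = 1
\end{align}
```

With Ks = 2 for Floyd-Steinberg this gives L = -1/2, which costs one multiplication and one addition per pixel. With L = 0 the second line reduces to the unmodulated signal transfer function Ks / (1 + (Ks - 1) H).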
26Results
[Figure panels: Original, Floyd-Steinberg, Unsharpened, DBF quantizer]
27Conclusion
- Processor architecture
- Decrease data sizes to reduce on-chip memory usage and increase data bus efficiency
- Truncate multiplicand constants to reduce power
- Compute signal values by recursion or lookup table
- Algorithm design
- Keep offline design results in full precision until end
- Order of calculations matters in implementation
- Exploit problem structure in developing fixed-point algorithms
- Linearize nonlinear systems to leverage linear system methods
- Many other ways to reduce complexity exist
28Invitations
- Panel discussion on graduate studies
- Tomorrow (Wednesday) 1:30 - 2:30 pm in this room (RCR)
- Panelists: Prof. Zaher Dawy (AUB), Prof. Imad El-Hajj (AUB) and Prof. Brian Evans (UT Austin)
- IEEE Workshop on Signal Processing Systems
- Early October 2011
- Short walk from the AUB campus
- Organizers include Prof. Magdy Bayoumi (Univ. of Louisiana at Lafayette), Prof. Brian Evans (UT Austin), Dean Ibrahim Hajj (AUB) and Prof. Mohammad Mansour (AUB)
29Thank You!
30Digital Signal Processors
DSP Processor Market
- Market
- 1/3 of $25B embedded digital signal processing market
- For comparison, 2007 sales of Pfizer's cholesterol-lowering Lipitor: $13B
- Applications (2007)
Source: Forward Concepts
31Screening (Masking) Methods
Introduction
- Periodic thresholds to binarize image
- Periodic application leads to aliasing (gridding effect)
- Clustered dot screening is more resistant to ink spread
- Dispersed dot screening has higher spatial resolution
- Blue larger masks (e.g. 1 by 1)
[Figure: clustered dot mask and dispersed dot mask, each shown as an indexed threshold lookup table]
32Linear Gain Model for Quantizer
- Extend sigma-delta modulation analysis to 2-D
- Linear gain model for quantizer in 1-D [Ardalan and Paulos, 1988]
- Linear gain model for grayscale image [Kite, Evans, Bovik, 1997]
- Error diffusion is modeled as linear, shift-invariant
- Signal transfer function (STF): quantizer acts as scalar gain
- Noise transfer function (NTF): quantizer acts as additive noise
[Figure: quantizer with input u(m) and output b(m). Signal path: us(m) maps to Ks us(m). Noise path: un(m) maps to un(m) + n(m).]
33Spatial Domain
34Magnitude Spectra
35Human Visual System Modeling
- Contrast at particular spatial frequency for visibility
- Bandpass: non-dim backgrounds [Mannos & Sakrison, 1974; 1978]
- Lowpass: high-luminance office settings with low-contrast images [Georgeson & G. Sullivan, 1975]
- Exponential decay [Näsänen, 1984]
- Modified lowpass version, e.g. [J. Sullivan, Ray & Miller, 1990]
- Angular dependence: cosine function [Sullivan, Miller & Pios, 1993]
36Linear Gain Model for Quantizer
Analysis and Modeling
- Best linear fit for Ks between quantizer input u(m) and halftone b(m)
- Stable for Floyd-Steinberg
- Can use average value to estimate Ks from only error filter
- Sharpening: proportional to Ks [Kite, Evans & Bovik, 2000]
- Value of Ks: Floyd-Steinberg < Stucki < Jarvis
- Weighted SNR using unsharpened halftone
- Floyd-Steinberg > Stucki > Jarvis at all viewing distances