High Speed FIR Filter Implementation Using Add and Shift Method

1
High Speed FIR Filter Implementation Using Add
and Shift Method

Shahnam Mirzaei, Anup Hosangadi, Ryan Kastner
University of California, Santa Barbara
ICCD 2006
San Jose, California
October 2006

UC Santa Barbara
ICCD 2006
2
Outline

Introduction
FIR filter implementation
Traditional Methods
MAC (Multiply Accumulate) implementation
DA (Distributed Arithmetic) implementation
New method
Add and Shift method and CSE (Common
Subexpression Elimination)
Experiments and results
Resource utilization
Power consumption
Conclusion

3
Introduction

Extensive use of FPGAs in computationally
intensive applications such as DSP
More available logic resources in current FPGAs
Broad applications of FIR filters in multimedia
and communications
Need for efficient design methods to save
area/power
Research motivation
Develop a more efficient implementation method
for FIR filters that consumes less area at
comparable performance.
Develop a unified tool for performing redundancy
elimination, scheduling and module assignment.
Perform physically aware optimizations.
Architecture design exploration for ASIC and FPGA
implementations (Distributed Arithmetic based,
adder-shifter based, multiplier-adder based).

4
FIR Filter: MAC Implementation

L-tap FIR filter
Convolution of the latest L input samples. L is
the number of coefficients h(k) of the filter,
and x(n) represents the input time series.
$$y(n) = \sum_{k=0}^{L-1} h(k)\, x(n-k)$$

Disadvantages
Large area on FPGA due to multipliers, even though
the full flexibility of general-purpose
multipliers is not required
Limited number of embedded resources such as MAC
engines, multipliers, etc. in FPGAs
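As a point of reference for the methods that follow, the MAC formulation can be modeled in a few lines of Python: one multiply-accumulate per tap per output sample. This is a minimal behavioral sketch, assuming integer samples and a zero initial state (x(n) = 0 for n < 0); `fir_mac` is a hypothetical name, not part of any tool mentioned in the slides.

```python
def fir_mac(h, x):
    """Direct-form L-tap FIR: y(n) = sum_{k=0}^{L-1} h(k) * x(n-k).

    Samples x(n-k) with n-k < 0 are treated as zero (zero initial state).
    """
    L = len(h)
    y = []
    for n in range(len(x)):
        acc = 0  # accumulator of the MAC engine
        for k in range(L):
            if n - k >= 0:
                acc += h[k] * x[n - k]  # one multiply-accumulate per tap
        y.append(acc)
    return y


print(fir_mac([1, 2, 3], [1, 0, 0, 0]))  # impulse response: [1, 2, 3, 0]
```

Every tap uses a general-purpose multiply here, which is the cost the add-and-shift method aims to avoid when the coefficients are constants.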

5
FIR Filter: DA (Distributed Arithmetic)
Implementation

An alternative to the MAC implementation. DA is
the most common FPGA FIR implementation because of
the LUT-rich architecture of FPGAs.
$$y = \sum_{n=0}^{N-1} c(n)\, x(n)$$
Variable x(n) can be represented by
$$x(n) = \sum_{b=0}^{B-1} x_b(n)\, 2^b, \qquad x_b(n) \in \{0, 1\}$$
where x_b(n) is the b-th bit of x(n) and B is the
input width. The inner product can be rewritten
as follows

6
FIR Filter: DA (Distributed Arithmetic)
Implementation (cont'd)

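$$
\begin{aligned}
y &= \sum_{n=0}^{N-1} c(n) \sum_{b=0}^{B-1} x_b(n)\, 2^b \\
  &= c(0)\left(x_{B-1}(0)\, 2^{B-1} + x_{B-2}(0)\, 2^{B-2} + \cdots + x_0(0)\, 2^0\right) \\
  &\quad + c(1)\left(x_{B-1}(1)\, 2^{B-1} + x_{B-2}(1)\, 2^{B-2} + \cdots + x_0(1)\, 2^0\right) \\
  &\quad + \cdots \\
  &\quad + c(N-1)\left(x_{B-1}(N-1)\, 2^{B-1} + x_{B-2}(N-1)\, 2^{B-2} + \cdots + x_0(N-1)\, 2^0\right) \\
  &= \left(c(0)\, x_{B-1}(0) + c(1)\, x_{B-1}(1) + \cdots + c(N-1)\, x_{B-1}(N-1)\right) 2^{B-1} \\
  &\quad + \left(c(0)\, x_{B-2}(0) + c(1)\, x_{B-2}(1) + \cdots + c(N-1)\, x_{B-2}(N-1)\right) 2^{B-2} \\
  &\quad + \cdots \\
  &\quad + \left(c(0)\, x_0(0) + c(1)\, x_0(1) + \cdots + c(N-1)\, x_0(N-1)\right) 2^0 \\
  &= \sum_{b=0}^{B-1} 2^b \sum_{n=0}^{N-1} c(n)\, x_b(n)
\end{aligned}
$$
where n = 0, 1, ..., N-1 and b = 0, 1, ..., B-1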
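The last line of the derivation is what makes DA LUT-friendly: for each bit position b, the inner sum over n depends only on the N bits x_b(0), ..., x_b(N-1), so it can be precomputed for all 2^N bit patterns. The sketch below checks the identity numerically. It assumes unsigned B-bit inputs (the slides do not treat the two's-complement sign bit), and `build_da_lut` and `da_inner_product` are hypothetical names.

```python
def build_da_lut(c):
    """LUT[addr] = sum of c(n) for every n whose bit is set in addr."""
    N = len(c)
    lut = [0] * (1 << N)
    for addr in range(1 << N):
        lut[addr] = sum(c[n] for n in range(N) if (addr >> n) & 1)
    return lut


def da_inner_product(c, x, B):
    """y = sum_b 2^b * sum_n c(n) x_b(n), for unsigned B-bit inputs x(n)."""
    lut = build_da_lut(c)
    y = 0
    for b in range(B):
        # LUT address formed from the b-th bit of every input sample.
        addr = 0
        for n, xn in enumerate(x):
            addr |= ((xn >> b) & 1) << n
        y += lut[addr] << b  # weight the partial sum by 2^b
    return y


c = [3, -1, 4, 2]
x = [5, 12, 7, 9]  # unsigned 4-bit samples
assert da_inner_product(c, x, B=4) == sum(ci * xi for ci, xi in zip(c, x))  # 49
```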

7
DA (Distributed Arithmetic) Implementation: Serial

A Serial DA Filter Block Diagram

n+1 clock cycles are needed for an n-bit
input symmetrical filter to
generate the output.
Performance is limited by the fact
that the next input sample can be
processed only after every bit of the
current input sample has been processed.
The tradeoff here is performance for
area.
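A bit-serial DA filter consumes one bit of every input sample per clock cycle, which is why a new input can only be accepted after all of its bits have been processed. The sketch below models that behavior and counts cycles. It assumes unsigned B-bit inputs and a non-symmetric filter, so it needs B cycles; the extra cycle the slide mentions for symmetric filters is not modeled. It processes bits MSB first (Horner form); a hardware design may instead go LSB first with a right-shifting accumulator. `serial_da` is a hypothetical name.

```python
def serial_da(c, x, B):
    """Bit-serial DA model: one bit of every sample per clock cycle, MSB first.

    Assumes unsigned B-bit samples and a non-symmetric filter, so the output
    is ready after exactly B cycles. Returns (y, cycles).
    """
    N = len(c)
    # 2^N-entry LUT: entry addr holds the sum of c(n) over the bits set in addr.
    lut = [sum(c[n] for n in range(N) if (addr >> n) & 1) for addr in range(1 << N)]
    acc, cycles = 0, 0
    for b in reversed(range(B)):  # one clock cycle per bit position
        addr = sum(((x[n] >> b) & 1) << n for n in range(N))
        acc = (acc << 1) + lut[addr]  # scaling accumulator: shift, then add
        cycles += 1
    return acc, cycles


y, cycles = serial_da([3, -1, 4, 2], [5, 12, 7, 9], B=4)
print(y, cycles)  # 49 4
```

The expected value is 3*5 - 12 + 4*7 + 2*9 = 49, reached after B = 4 cycles.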

8
DA (Distributed Arithmetic) Implementation: Parallel

The performance of the circuit can
be improved by modifying the
architecture to a parallel architecture
that processes the data bits in
groups.
Increasing the number of bits
processed per clock cycle significantly
increases resource utilization on the FPGA:
More LUTs
A larger scaling accumulator

A 2-bit Parallel DA Filter Block Diagram
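Processing k bits per cycle, as in the 2-bit parallel diagram, cuts the cycle count to ceil(B/k) at the price of one LUT copy per bit lane and an accumulator that shifts by k bits each cycle. The behavioral sketch below makes that tradeoff visible under the same unsigned-input assumption; `parallel_da` and `bits_per_cycle` are hypothetical names, and the model says nothing about actual FPGA resource counts.

```python
import math


def parallel_da(c, x, B, bits_per_cycle):
    """DA model that consumes `bits_per_cycle` bits of every sample per clock.

    Assumes unsigned B-bit samples. Returns (y, cycles) with cycles = ceil(B / k).
    """
    N, k = len(c), bits_per_cycle
    lut = [sum(c[n] for n in range(N) if (a >> n) & 1) for a in range(1 << N)]
    cycles = math.ceil(B / k)
    acc = 0
    for group in reversed(range(cycles)):  # most significant bit group first
        partial = 0
        for j in range(k):  # bit lane j; in hardware each lane has its own LUT copy
            b = group * k + j
            if b >= B:
                continue  # lane beyond the input width (only when k does not divide B)
            addr = sum(((x[n] >> b) & 1) << n for n in range(N))
            partial += lut[addr] << j  # weight lane j by 2^j within the group
        acc = (acc << k) + partial  # scaling accumulator now shifts by k bits
    return acc, cycles


for k in (1, 2, 4):
    print(k, parallel_da([3, -1, 4, 2], [5, 12, 7, 9], B=4, bits_per_cycle=k))
# 1 (49, 4)
# 2 (49, 2)
# 4 (49, 1)
```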
9
CSE (Common Subexpression Elimination)

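Linear systems can be modeled using polynomials.
Expressions consist of +, -, << operators.
Polynomial formulation
$$C \times X = \sum_i X L^i$$
where L^i denotes a left shift of X by i bits, and
the sum runs over the nonzero bits of C.
$$(14)_{10} \times X = (1110)_2 \times X = (X \ll 3) + (X \ll 2) + (X \ll 1) = XL^3 + XL^2 + XL^1$$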
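The polynomial formulation amounts to replacing a constant multiplication by shifted copies of X, one term per nonzero bit of C, summed by adders. A minimal Python sketch for non-negative constants; `shift_add_terms` and `multiply_by_constant` are hypothetical helper names, and the signed (subtracted) terms used later in the CSE example are not generated here.

```python
def shift_add_terms(C):
    """Return the shift amounts i such that C is the sum of 2**i (binary digits of C >= 0)."""
    if C < 0:
        raise ValueError("this sketch handles non-negative constants only")
    return [i for i in range(C.bit_length()) if (C >> i) & 1]


def multiply_by_constant(X, C):
    """C * X computed as a sum of left-shifted copies of X (the X L^i terms)."""
    return sum(X << i for i in shift_add_terms(C))


print(shift_add_terms(14))          # [1, 2, 3], i.e. XL^1 + XL^2 + XL^3
print(multiply_by_constant(5, 14))  # 70
```

For C = 14 there are three terms, so two adders replace the multiplier; the shifts themselves are just wiring in hardware.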
10
CSE Example
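$$
\begin{aligned}
Y_0 &= X_0 + X_1 + X_2 + X_3 \\
Y_1 &= 2X_0 + X_1 - X_2 - 2X_3 \\
Y_2 &= X_0 - X_1 - X_2 + X_3 \\
Y_3 &= X_0 - 2X_1 + 2X_2 - X_3
\end{aligned}
$$
$$
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
$$
$$
\begin{aligned}
Y_0 &= X_0 + X_1 + X_2 + X_3 \\
Y_1 &= X_0L + X_1 - X_2 - X_3L \\
Y_2 &= X_0 - X_1 - X_2 + X_3 \\
Y_3 &= X_0 - X_1L + X_2L - X_3
\end{aligned}
$$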
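Before any sharing, each output is evaluated term by term, reading L as a left shift by one bit. The sketch below evaluates the polynomial form directly and checks it against an ordinary matrix-vector product; `transform_naive` and `transform_matrix` are hypothetical names, and the operation counts in the docstring are tallied by hand from the expressions.

```python
M = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def transform_matrix(X):
    """Reference: Y = M X as an ordinary matrix-vector product."""
    return [sum(m * x for m, x in zip(row, X)) for row in M]


def transform_naive(X):
    """Polynomial form, with L implemented as a left shift by one bit.

    Costs 12 additions/subtractions (3 per output) and 4 shifts.
    """
    X0, X1, X2, X3 = X
    Y0 = X0 + X1 + X2 + X3
    Y1 = (X0 << 1) + X1 - X2 - (X3 << 1)
    Y2 = X0 - X1 - X2 + X3
    Y3 = X0 - (X1 << 1) + (X2 << 1) - X3
    return [Y0, Y1, Y2, Y3]


X = [3, -7, 10, 2]
assert transform_naive(X) == transform_matrix(X)  # both give [8, -15, 2, 35]
```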
11
CSE Example

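$$D_0 = X_0 + X_3 \qquad D_1 = X_1 - X_2$$

$$
\begin{aligned}
Y_0 &= X_0 + X_1 + X_2 + X_3 \\
Y_1 &= X_0L + X_1 - X_2 - X_3L \\
Y_2 &= X_0 - X_1 - X_2 + X_3 \\
Y_3 &= X_0 - X_1L + X_2L - X_3
\end{aligned}
$$
$$
\begin{aligned}
Y_0 &= D_0 + X_1 + X_2 \\
Y_1 &= X_0L + X_1 - X_2 - X_3L \\
Y_2 &= D_0 - X_1 - X_2 \\
Y_3 &= X_0 - X_1L + X_2L - X_3
\end{aligned}
$$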
12
CSE Example

$$D_2 = X_1 + X_2 \qquad D_3 = X_0 - X_3$$

$$
\begin{aligned}
Y_0 &= D_0 + D_2 \\
Y_1 &= X_0L + D_1 - X_3L \\
Y_2 &= D_0 - D_2 \\
Y_3 &= X_0 - D_1L - X_3
\end{aligned}
$$
13
CSE Example
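12 additions, 4 shifts
$$
\begin{aligned}
Y_0 &= X_0 + X_1 + X_2 + X_3 \\
Y_1 &= X_0L + X_1 - X_2 - X_3L \\
Y_2 &= X_0 - X_1 - X_2 + X_3 \\
Y_3 &= X_0 - X_1L + X_2L - X_3
\end{aligned}
$$
$$
\begin{aligned}
D_0 &= X_0 + X_3 &\qquad Y_0 &= D_0 + D_2 \\
D_1 &= X_1 - X_2 &\qquad Y_1 &= D_1 + D_3L \\
D_2 &= X_1 + X_2 &\qquad Y_2 &= D_0 - D_2 \\
D_3 &= X_0 - X_3 &\qquad Y_3 &= D_3 - D_1L
\end{aligned}
$$
8 additions, 2 shifts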
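The shared-subexpression version can be checked against the matrix form directly. This sketch implements the eight additions/subtractions and two shifts listed above and compares the result with a plain matrix-vector product for every input vector with entries in -4..4; `transform_cse` is a hypothetical name.

```python
import itertools

M = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def transform_cse(X):
    """Y = M X using the extracted subexpressions D0..D3.

    4 additions/subtractions for D0..D3, 4 more for Y0..Y3, and 2 shifts.
    """
    X0, X1, X2, X3 = X
    D0 = X0 + X3
    D1 = X1 - X2
    D2 = X1 + X2
    D3 = X0 - X3
    Y0 = D0 + D2
    Y1 = D1 + (D3 << 1)
    Y2 = D0 - D2
    Y3 = D3 - (D1 << 1)
    return [Y0, Y1, Y2, Y3]


# Exhaustive check against the matrix form for small signed inputs.
for X in itertools.product(range(-4, 5), repeat=4):
    expected = [sum(m * x for m, x in zip(row, X)) for row in M]
    assert transform_cse(list(X)) == expected
```

The structure is a butterfly: two sums and two differences of input pairs feed every output, which is where the savings from 12 to 8 additions come from.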
14
FIR Filter Add/Shift Implementation: Replacing
Constant Multiplication by Multiplier Block
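In a transposed-form FIR filter every tap multiplies the same sample x(n) by a different constant h(k), so all the constant multiplications can be merged into a single multiplier block that shares work between coefficients. The sketch below illustrates the idea with a deliberately simple sharing rule, not the CSE algorithm from the slides: each distinct odd part of a coefficient is built once from shifted copies of x, and coefficients that differ only by a power of two reuse it through a free shift. It assumes positive integer coefficients; `odd_part`, `multiplier_block`, and `naive_adders` are hypothetical names.

```python
def odd_part(c):
    """Split c > 0 into (odd, shift) with c == odd << shift."""
    shift = 0
    while c % 2 == 0:
        c //= 2
        shift += 1
    return c, shift


def multiplier_block(x, coeffs):
    """Compute [h * x for h in coeffs] with shifts and adds, sharing odd parts.

    Each distinct odd part is built once from shifted copies of x; even
    coefficients reuse it with a shift. Returns (products, adders).
    """
    cache = {}  # odd constant -> odd * x
    adders = 0
    products = []
    for h in coeffs:
        if h <= 0:
            raise ValueError("this sketch handles positive coefficients only")
        odd, shift = odd_part(h)
        if odd not in cache:
            terms = [x << i for i in range(odd.bit_length()) if (odd >> i) & 1]
            cache[odd] = sum(terms)
            adders += len(terms) - 1  # k terms need k - 1 adders
        products.append(cache[odd] << shift)
    return products, adders


def naive_adders(coeffs):
    """Adders needed if each coefficient is built independently (popcount - 1)."""
    return sum(bin(h).count("1") - 1 for h in coeffs)


h = [7, 14, 28, 3, 6]
products, adders = multiplier_block(5, h)
print(products, adders, naive_adders(h))  # [35, 70, 140, 15, 30] 3 8
```

A full CSE pass would also find shared two-term patterns inside the odd parts themselves (for example reusing 3x when building 7x); the odd-part rule only captures the easiest case.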
15
FIR Filter Add/Shift Implementation: Registered
Adder at No Additional Cost
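In the transposed form, each product h(k)x(n) enters an adder whose output is stored in a register, and that chain of registered adders also serves as the filter's delay line. The "no additional cost" point presumably relies on each FPGA LUT being paired with a flip-flop, so registering an adder output uses a flip-flop that is already there. The cycle-by-cycle sketch below assumes integer arithmetic and a zero initial state; the products are computed with `*` for brevity where the slides' design would use the shift/add multiplier block, and `fir_transposed` is a hypothetical name.

```python
def fir_transposed(h, x):
    """Cycle-by-cycle model of a transposed-form FIR filter.

    Computes y(n) = sum_k h(k) x(n-k) with zero initial state. regs[k] models
    the register after the adder of tap k+1; updating in ascending order makes
    each register read its neighbour's value from the previous cycle.
    """
    L = len(h)
    regs = [0] * (L - 1)
    y = []
    for xn in x:
        products = [hk * xn for hk in h]  # stands in for the shift/add multiplier block
        y.append(products[0] + (regs[0] if L > 1 else 0))
        for k in range(L - 1):
            upstream = regs[k + 1] if k + 1 < L - 1 else 0
            regs[k] = products[k + 1] + upstream  # registered adder of tap k+1
    return y


print(fir_transposed([1, 2, 3], [4, 0, -1, 5]))  # [4, 8, 11, 3]
```

The result matches the direct-form convolution; the difference is that every adder output is registered, so the critical path is a single adder rather than an adder chain.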
16
Extracting Common Subexpressions
$$F_1 = A + B + C + D \qquad F_2 = A + B + C + E$$
Unoptimized Expression Trees
Optimization: Extracting Common Expression (A + B)
Optimization: Extracting Common Expression (A + B + C)
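Counting two-input adders makes the three expression-tree variants concrete. The sketch assumes the operators in F1 and F2 are additions and routes every addition through a counter; `AdderCounter`, `unshared`, `shared_ab`, and `shared_abc` are hypothetical names, and the counts are simple bookkeeping, not synthesis results.

```python
class AdderCounter:
    """Counts two-input additions performed through it."""

    def __init__(self):
        self.count = 0

    def add(self, a, b):
        self.count += 1
        return a + b


def unshared(A, B, C, D, E):
    ops = AdderCounter()
    F1 = ops.add(ops.add(ops.add(A, B), C), D)
    F2 = ops.add(ops.add(ops.add(A, B), C), E)
    return (F1, F2), ops.count


def shared_ab(A, B, C, D, E):
    ops = AdderCounter()
    T1 = ops.add(A, B)  # common expression A + B, computed once
    F1 = ops.add(ops.add(T1, C), D)
    F2 = ops.add(ops.add(T1, C), E)
    return (F1, F2), ops.count


def shared_abc(A, B, C, D, E):
    ops = AdderCounter()
    T = ops.add(ops.add(A, B), C)  # common expression A + B + C, computed once
    F1 = ops.add(T, D)
    F2 = ops.add(T, E)
    return (F1, F2), ops.count


for f in (unshared, shared_ab, shared_abc):
    print(f.__name__, f(1, 2, 3, 4, 5))
# unshared ((10, 11), 6)
# shared_ab ((10, 11), 5)
# shared_abc ((10, 11), 4)
```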
17
Synchronization

Extra registers are needed to
synchronize the intermediate values,
such that new values for A,B,C,D,E,F
can be read in every clock cycle

Calculating registers required for fastest
evaluation
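One way to count these synchronization registers, assuming every adder output is registered (one pipeline stage per adder) and all primary inputs arrive in the same cycle: schedule each adder as early as possible, then delay every signal until its latest consumer needs it, letting all consumers of a signal tap one shared delay chain. The sketch applies this to the shared (A + B + C) version of F1 and F2. The netlist encoding and the name `balancing_registers` are hypothetical, and this is not claimed to be the authors' tool.

```python
def balancing_registers(inputs, adders):
    """Count delay registers so every adder sees operands from the same sample.

    inputs: names of primary inputs (stage 0).
    adders: dict name -> (operand1, operand2), listed in topological order.
    A registered adder's output is ready at stage max(operand stages) + 1.
    An operand ready at stage r, consumed by an adder at stage s, needs
    s - r - 1 delays; consumers of one signal share a delay chain, so the
    chain length is the maximum over its consumers.
    """
    stage = {name: 0 for name in inputs}
    for name, (a, b) in adders.items():
        stage[name] = max(stage[a], stage[b]) + 1
    chain = {}  # signal -> length of its delay chain
    for name, (a, b) in adders.items():
        for src in (a, b):
            need = stage[name] - stage[src] - 1
            chain[src] = max(chain.get(src, 0), need)
    return stage, chain, sum(chain.values())


adders = {
    "T1": ("A", "B"),
    "T2": ("T1", "C"),  # T2 = A + B + C, the shared subexpression
    "F1": ("T2", "D"),
    "F2": ("T2", "E"),
}
stage, chain, total = balancing_registers(["A", "B", "C", "D", "E"], adders)
print({k: v for k, v in chain.items() if v}, total)  # {'C': 1, 'D': 2, 'E': 2} 5
```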
18
Experiment Results: Resource Utilization/Performance
Filter Implementation Using Add and Shift Method
Filter Implementation Using Xilinx Coregen (PDA)
19
Experiment Results: Resource Utilization
20
Experiment Results: Power Consumption
21
Creating MAC Filters Using Xilinx Coregen
22
Experiment Results: Comparison with MAC Filters
Using Multiplier Blocks
23
Experiment Results: Comparison with MAC Filters
Using Multiplier Blocks - Resource Utilization
24
Experiment Results: Comparison with MAC Filters
Using Multiplier Blocks - Performance
25
Conclusion/Observations

Presented a multiplierless technique, based on
the add and shift method and common subexpression
elimination for low area, low power and high
speed implementations of FIR filters.
Validated our techniques on Virtex II/IV devices
where we observed significant area and power
reductions over traditional Distributed
Arithmetic based techniques.
an average reduction of 58.7% in the number of
LUTs, and about 25% reduction in the number of
slices and FFs.
Better performance in most cases, even
though our algorithm does not optimize for
performance
Observed up to 50% reduction in dynamic power
consumption
Higher performance than MAC filters as the filter
size increases: the critical path in our design
consists of adders, while in the MAC method the
critical path consists of multipliers and adders.