Is F Better than D - PowerPoint PPT Presentation

1
Is F Better than D

David Hansen and James Michelussi

2
Introduction

Discrete Fourier Transform (DFT)
Fast Fourier Transform (FFT)
FFT Algorithm: Applying the Mathematics
Implementations of DFT and FFT
Hardware Benchmarks
Conclusion

3
DFT

Based on the Fourier analysis that Jean Baptiste Joseph Fourier introduced in 1807.
Allows a sampled (discrete), periodic signal to be transformed from the time domain to the frequency domain.
Computed as a correlation between the time-domain signal and N cosine and N sine waves.

X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}, where W_N = e^{-j 2\pi / N}

X(k): DFT frequency signal
N: number of sample points
x(n): time-domain signal
W_N: twiddle factor
4
DFT (Walking Speed)

Why is this important? Where is this used?
Allows machines to compute the frequency-domain representation of a signal
Allows signals to be convolved by simply multiplying their transforms together
Used in digital spectral analysis for speech, imaging and pattern recognition, as well as in signal manipulation using filters
But the DFT requires N^2 multiplications!

5
FFT (Jet Speed)

J. W. Cooley and J. W. Tukey are credited with bringing the FFT to the world in the 1960s
Simply an algorithm for calculating the DFT more efficiently
Takes advantage of symmetry and periodicity in the twiddle factors, and uses a divide-and-conquer method
Symmetry: W_N^{r+N/2} = -W_N^r
Periodicity: W_N^{r+N} = W_N^r
Requires only (N/2)log2(N) multiplications!
Faster computation times
More precise results due to less round-off error

6
FFT Algorithm

Several different types of FFT algorithms (Radix-2, Radix-4, DIT and DIF)
Focus on Radix-2 using the Decimation in Time (DIT) method
Breaks the DFT calculation down into a number of 2-point DFTs
Each 2-point DFT uses an operation called the butterfly
These groups are then recombined with another group of two, and so on, for log2(N) stages
With the DIT method, the input time-domain points must be reordered using bit reversal

7
Butterfly Operation
8
Bit Reversal
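
Bit reversal reorders the input so that the sample at index i moves to the index whose binary digits are those of i read backwards. The following is a sketch only; `reverse_bits` and `bit_reverse_permute` are hypothetical names, not the routines used in the presentation.

```c
/* Reverse the lowest num_bits bits of index. */
static unsigned reverse_bits(unsigned index, unsigned num_bits)
{
    unsigned reversed = 0;
    for (unsigned i = 0; i < num_bits; i++) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

/* Reorder n = 2^num_bits values in place into bit-reversed order. */
static void bit_reverse_permute(float *data, unsigned n, unsigned num_bits)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned j = reverse_bits(i, num_bits);
        if (j > i) {                 /* swap each pair only once */
            float tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For 8 points (3 bits) the index order 0, 1, 2, 3, 4, 5, 6, 7 becomes 0, 4, 2, 6, 1, 5, 3, 7.
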
9
8-Point Radix-2 FFT Example
10
8-Point Radix-2 FFT Example
11
Implementations of DFT and FFT

David Hansen

12
DFT Implementation
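for (r = 0; r < samples / 2; r++) {
    float re = 0.0f, im = 0.0f;
    float part = (float)r * -2.0f * PI / (float)samples;
    for (k = 0; k < samples; k++) {
        float theta = part * (float)k;
        re += data_in[k] * cos(theta);   // correlate with the cosine wave
        im += data_in[k] * sin(theta);   // correlate with the sine wave
    }
}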

Nested for loop: (N/2) × N iterations, O(N^2)
63,027.41 cycles / sample (123 cycles per inner loop iteration)
Obvious inefficiencies: the cos and sin functions from math.h
Efficient assembly coding could reduce the inner loop to 3 cycles per iteration (1,536 cycles / sample)

13
C FFT Implementation
void fft_float(unsigned NumSamples, float *RealIn, float *ImagIn,
               float *RealOut, float *ImagOut)
{
    for (i = 0; i < NumSamples; i++) {
        // Iterate over the samples and perform the bit-reversal
        j = ReverseBits(i, NumBits);
    }
    BlockEnd = 1;
    // Following loop iterates Log2(NumSamples)
    for (BlockSize = 2; BlockSize <= NumSamples; BlockSize <<= 1) {
        // Perform Angle Calculations (Using math.h sin/cos)
        // Following 2 loops iterate over NumSamples/2
        for (i = 0; i < NumSamples; i += BlockSize) {
            for (j = i, n = 0; n < BlockEnd; j++, n++) {
                // Perform butterfly calculations
            }
        }
        BlockEnd = BlockSize;
    }
}
14
C FFT Implementation

Bit-reverse for loop: N iterations
Nested for loops
First outer loop: log2(N) iterations
Makes use of the sin/cos math.h functions
Second outer loop: N / BlockSize iterations
Inner loop: BlockSize / 2 iterations
O(N + log2(N) × (N / BlockSize) × (BlockSize / 2)) = O(N + N log2(N))
193.84 cycles / sample
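
The iteration counts can be confirmed by running the same loop nest with a counter in place of the real work. This is an illustrative sketch (the function name `count_fft_iterations` is hypothetical) that mirrors the BlockSize / BlockEnd loop structure of the C implementation and assumes the number of samples is a power of two.

```c
/* Count loop iterations of the radix-2 loop structure:
   N for the bit-reversal pass plus N/2 butterflies per stage. */
static unsigned long count_fft_iterations(unsigned num_samples)
{
    unsigned long count = 0;
    unsigned i, n, block_size, block_end;

    for (i = 0; i < num_samples; i++)            /* bit-reversal pass */
        count++;

    block_end = 1;
    for (block_size = 2; block_size <= num_samples; block_size <<= 1) {
        for (i = 0; i < num_samples; i += block_size)
            for (n = 0; n < block_end; n++)
                count++;                         /* one butterfly */
        block_end = block_size;
    }
    return count;
}
```

For 1024 samples this returns 1024 + 512 × 10 = 6,144 iterations, compared with 512 × 1024 = 524,288 inner-loop iterations for the direct DFT.
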

15
Assembly FFT Implementation

Bit-reverse address generation
Hides the bit-reverse operation inside the first and second FFT stages
Sin and cos values stored in a look-up table (LUT)
256 Kbyte LUT added to Data1
Needed to grow the Data1 memory space using the LDF file
Interleaved real and imaginary arrays
Quad reads load 2 complex points per cycle
Supports the real FFT for input signals with no imaginary component
40% algorithm-based savings

16
Assembly FFT Implementation

Special Butterfly Instruction
Can perform addition/subtraction in parallel in
one compute block
Speeds up the inner-most loop
VLIW and SIMD Operations
Performs simultaneous operations in both compute
blocks
Loop unrolling and instruction scheduling keep the entire processor busy with instructions
11.35 Cycles per Sample

17
Assembly FFT Implementation
_BflyLoop:
    q[j2+=4]=r27:26;   k5=k5+k9;         fr6=r30*r12;   fr16=r6-r7;;
    yr3:0=q[j0+=4];    k3=k5 and k4;     fr15=r23*r4;   fr24=r8+r18, fr26=r8-r18;;
    xr3:0=q[j0+=4];    r5:4=l[k7+k3];    fr7=r31*r13;   fr25=r9+r19, fr27=r9-r19;;
    q[j1+=4]=r25:24;                     fr14=r30*r13;  fr17=r14+r15;;

    q[j2+=4]=r27:26;   k5=k5+k9;         fr6=r2*r4;     fr18=r6-r7;;
    yr11:8=q[j0+=4];   k3=k5 and k4;     fr15=r31*r12;  fr24=r20+r16, fr26=r20-r16;;
    xr11:8=q[j0+=4];   r13:12=l[k7+k3];  fr7=r3*r5;     fr25=r21+r17, fr27=r21-r17;;
    q[j1+=4]=r25:24;                     fr14=r2*r5;    fr19=r14+r15;;

    q[j2+=4]=r27:26;   k5=k5+k9;         fr6=r10*r12;   fr16=r6-r7;;
    yr23:20=q[j0+=4];  k3=k5 and k4;     fr15=r3*r4;    fr24=r28+r18, fr26=r28-r18;;
    xr23:20=q[j0+=4];  r5:4=l[k7+k3];    fr7=r11*r13;   fr25=r29+r19, fr27=r29-r19;;
    q[j1+=4]=r25:24;                     fr14=r10*r13;  fr17=r14+r15;;

    q[j2+=4]=r27:26;   k5=k5+k9;         fr6=r22*r4;    fr18=r6-r7;;
    yr31:28=q[j0+=4];  k3=k5 and k4;     fr15=r11*r12;  fr24=r0+r16, fr26=r0-r16;;
    xr31:28=q[j0+=4];  r13:12=l[k7+k3];  fr7=r23*r5;    fr25=r1+r17, fr27=r1-r17;;
.align_code 4;
    if NLC0E, jump _BflyLoop;
18
DC FFT Test

FFT Source Array

FFT Output Magnitude

19
Audio FFT Test

FFT Source Array

FFT Output Magnitude

20
1024 Point DFT / FFT Comparison
Implementation                 Cycles per sample
DFT implemented in C           63,027.41
DFT implemented in assembly    1,536 (estimated)
FFT implemented in C           193.85
FFT implemented in assembly    11.35
21
1024 Point Radix-2 FFT Hardware Comparison
Processor architecture      Cycles per sample    Processor frequency    Execution time
ADSP-21369 (SHARC)          8.98                 400 MHz                22.99 µs
TigerSHARC (website)        9.16                 600 MHz                15.63 µs
TigerSHARC (our results)    11.35                600 MHz                19.37 µs
TMS320C6000                 14.125               350 MHz                41.33 µs
TMS320DM644x                7.59                 594 MHz                13.08 µs
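
The execution times in this table follow from the other two columns: total cycles are cycles per sample times the 1024 points, and dividing by the clock rate in MHz gives microseconds. A small sketch of that arithmetic (the function name `execution_time_us` is hypothetical):

```c
/* Execution time in microseconds for an n-point transform.
   clock_mhz is the processor frequency in MHz, i.e. cycles per microsecond. */
static double execution_time_us(double cycles_per_sample, unsigned n_points,
                                double clock_mhz)
{
    double total_cycles = cycles_per_sample * (double)n_points;
    return total_cycles / clock_mhz;
}
```

For example, 11.35 cycles per sample × 1024 samples at 600 MHz gives about 19.37 µs, matching the "our results" row.
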
22
Conclusion

The FFT algorithm is very useful when computing the frequency-domain representation of a signal on a DSP.
The FFT is much faster than a regular DFT algorithm.
The FFT is more precise because it accumulates fewer round-off errors.
The timed coding examples further support this claim and demonstrate how to code the algorithm.
The Radix-2 FFT isn't the fastest, but it uses a less complex addressing and twiddle factor routine.
In this case (unlike in school) F is better than D.