The FFT on a GPU

1
The FFT on a GPU
  • Graphics Hardware 2003
  • July 27, 2003
  • Kenneth Moreland, Sandia National Labs
  • Edward Angel, U. of New Mexico

Sandia is a multiprogram laboratory operated by
Sandia Corporation, a Lockheed Martin
Company, for the United States Department of
Energy's National Nuclear Security
Administration under contract DE-AC04-94AL85000.
2
Overview
  • Introduction
      • Motivation, FFT review.
  • FFT Techniques
      • Exploitable FFT properties.
  • Implementation
  • Results
      • Performance, applications, conclusions.

3
Motivation
  • The Fourier transform is a principal tool for
    digital image processing:
      • Filtering.
      • Correction.
      • Compression.
      • Classification.
      • Generation.
  • So shouldn't our graphics hardware support
    such a tool?

4
The Discrete Fourier Transform
  • Converts data in the spatial or temporal domain
    into the frequencies that the data comprise.
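As a reference point, here is a minimal Python sketch of the forward DFT, assuming the common convention $F(u) = \sum_{x=0}^{N-1} f(x)\,e^{-2\pi j u x/N}$ (sign and normalization conventions vary). It is a direct O(N²) sum, not an FFT and not the authors' GPU code.

```python
import cmath
import math


def dft(f):
    """Naive O(N^2) DFT: F(u) = sum over x of f(x) * exp(-2*pi*j*u*x/N)."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


# Two full cycles of a cosine over 8 samples: the spectrum should show
# magnitude N/2 = 4 at u = 2 and at its mirror u = N - 2 = 6, and ~0 elsewhere.
signal = [math.cos(2 * math.pi * 2 * x / 8) for x in range(8)]
spectrum = dft(signal)
print([round(abs(c), 6) for c in spectrum])
```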

5
The Discrete Fourier Transform
  • A 2D transform can be computed by applying the
    1D transform in one direction, then the other.
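Separability can be checked with a short Python sketch in which a naive O(N²) DFT stands in for the FFT: transforming every row and then every column gives the same result as the direct 2D double sum. The helper names are illustrative.

```python
import cmath


def dft(f):
    """Naive O(N^2) 1D DFT."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


def dft2_separable(image):
    """2D DFT as 1D transforms of every row, then of every column."""
    rows = [dft(row) for row in image]             # transform along x
    cols = [dft(list(col)) for col in zip(*rows)]  # transform along y
    return [list(r) for r in zip(*cols)]           # transpose back to [v][u]


def dft2_direct(image):
    """Direct 2D double sum, for comparison."""
    m, n = len(image), len(image[0])
    return [[sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * x / n + v * y / m))
                 for y in range(m) for x in range(n))
             for u in range(n)]
            for v in range(m)]


image = [[1.0, 2.0, 3.0, 4.0],
         [0.0, 1.0, 0.0, 1.0],
         [2.0, 2.0, 0.0, 0.0]]
a, b = dft2_separable(image), dft2_direct(image)
err = max(abs(a[v][u] - b[v][u]) for v in range(3) for u in range(4))
print(err)
```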

6
The Fast Fourier Transform
  • Divide-and-conquer algorithm.
      • The input sequence is divided into two
        subsequences, consisting of the values at even
        and odd indices, respectively.
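A textbook recursive radix-2 decimation-in-time FFT in Python, assuming power-of-two lengths, shows the even/odd split; the combining step multiplies the odd half by the twiddle factor $e^{-2\pi j u/N}$. This is the classic recursive form, shown only to make the split concrete.

```python
import cmath


def fft_recursive(f):
    """Recursive radix-2 decimation-in-time FFT; len(f) must be a power of two."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    even = fft_recursive(f[0::2])  # transform of values at even indices
    odd = fft_recursive(f[1::2])   # transform of values at odd indices
    out = [0j] * n
    for u in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * u / n) * odd[u]  # twiddle factor times odd term
        out[u] = even[u] + t
        out[u + n // 2] = even[u] - t
    return out


result = fft_recursive([1, 2, 3, 4])
print(result)
```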

7
Index Magic
  • Do not use recursion.
      • Use dynamic programming: iterate over the
        entire array, computing all values for each
        recursive depth together, like mergesort.
  • Indexing is non-obvious.
      • Unlike mergesort, the recursive step does not
        divide the array into contiguous chunks.
      • At any iteration, which partition does a given
        index belong to, and where can one find the
        applicable values of the sub-partitions?
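Assuming the standard radix-2 decimation-in-time split, the partition question has a compact answer: each even/odd split separates indices by one more low-order bit, so after d splits an index's subsequence is named by its low d bits and its position within that subsequence is `index >> d`. The hypothetical helper below groups indices this way; note that the groups are strided, not contiguous. It illustrates the indexing problem, not the authors' specific in-place solution.

```python
def partitions_at_depth(n, depth):
    """Group indices 0..n-1 by the subsequence they belong to after `depth`
    even/odd splits of a radix-2 decimation-in-time FFT.

    The low `depth` bits of an index select its subsequence; the index's
    position inside that subsequence is index >> depth, which matches the
    ascending order of each list below.
    """
    groups = {}
    for i in range(n):
        key = i & ((1 << depth) - 1)  # low `depth` bits pick the subsequence
        groups.setdefault(key, []).append(i)
    return groups


for d in range(4):
    print(d, partitions_at_depth(8, d))
```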

8
Index Magic
  • Common solution: rearrange the data by reversing
    the bits of the indices.
      • The FFT can then operate on contiguous
        partitions.
      • Requires an extra data copy.
  • Our solution: determine the indexing in place.
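For comparison, a minimal Python sketch of the common bit-reversal approach (not the authors' in-place indexing): one reordering pass copies the data into bit-reversed order, after which each butterfly stage operates on contiguous blocks of size 2, 4, 8, and so on. Power-of-two lengths are assumed.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_bit_reversed(f):
    """Iterative radix-2 FFT using the bit-reversal reordering."""
    n = len(f)
    bits = n.bit_length() - 1
    assert 1 << bits == n, "length must be a power of two"
    a = [complex(f[bit_reverse(i, bits)]) for i in range(n)]  # the extra data copy
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):  # each contiguous block of this stage
            w = 1 + 0j
            for k in range(half):
                t = w * a[start + k + half]
                a[start + k + half] = a[start + k] - t
                a[start + k] = a[start + k] + t
                w *= w_step
        size *= 2
    return a


print(fft_bit_reversed([1, 2, 3, 4]))
```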

Note that the paper has a typo.
9
Fourier Symmetry of Real Sequences
  • In general, the frequency spectra of real
    functions contain imaginary values.
      • These capture the magnitude and phase shift of
        sinusoids.
      • A brute-force FFT doubles computation and
        storage costs.
  • But Fourier transforms of real functions have
    symmetry: $F(N-u) = F^*(u)$.
      • Values at $u = 0$ and $u = N/2$ are real
        (because they are conjugates of themselves).
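The symmetry is easy to confirm numerically. This Python sketch transforms an arbitrary real sequence (made-up values) with a naive DFT and checks that $F(N-u) = F^*(u)$ and that $F(0)$ and $F(N/2)$ have zero imaginary parts up to rounding error.

```python
import cmath


def dft(f):
    """Naive O(N^2) DFT."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


real_signal = [3.0, -1.0, 4.0, 1.5, -5.0, 9.0, 2.0, -6.0]  # arbitrary real data
F = dft(real_signal)
N = len(F)

# Conjugate symmetry: F(N - u) is the complex conjugate of F(u).
assert all(abs(F[N - u] - F[u].conjugate()) < 1e-9 for u in range(1, N))

# F(0) and F(N/2) equal their own conjugates, so they are real.
assert abs(F[0].imag) < 1e-9 and abs(F[N // 2].imag) < 1e-9
print(F[0].real, F[N // 2].real)
```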

10
Fourier Transform of Real Functions
  • Pick two real functions, f(x) and g(x).
  • Let $h(x) = f(x) + j\,g(x)$.
      • Note that there is no loss of information.
  • We can perform the FFT of h in half the time of
    performing brute-force FFTs of f and g
    individually.
  • Simply point to one row of the image as the real
    components and another as the imaginary components.

[Figure: two image rows, labeled f and g]
11
Untangling Fourier Transform Pairs
  • The Fourier transform is linear.
      • $H(u) = F(u) + j\,G(u)$
  • We can untangle using the symmetry of F and G.
      • Add and subtract H(u) and $H^*(N-u)$, the
        conjugate of H(N - u): the sum cancels the G
        terms and the difference cancels the F terms.
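Written out term by term, with indices taken modulo $N$ so that $H(N)$ means $H(0)$, the untangling uses the conjugate symmetry of the real-input transforms $F$ and $G$:

```latex
\begin{aligned}
H(N-u)   &= F(N-u) + j\,G(N-u) = F^*(u) + j\,G^*(u) \\
H^*(N-u) &= F(u) - j\,G(u) \\
F(u)     &= \tfrac{1}{2}\left[H(u) + H^*(N-u)\right] \\
G(u)     &= \tfrac{1}{2j}\left[H(u) - H^*(N-u)\right]
\end{aligned}
```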

12
Untangling Fourier Transform Pairs
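A minimal Python sketch of tangling and untangling, with a naive DFT standing in for the GPU FFT: two real sequences are combined into one complex sequence $h = f + j\,g$, transformed once, and separated with $F(u) = [H(u) + H^*(N-u)]/2$ and $G(u) = [H(u) - H^*(N-u)]/(2j)$. The function name `two_real_ffts` is hypothetical.

```python
import cmath


def dft(h):
    """Naive O(N^2) DFT standing in for one complex FFT."""
    n = len(h)
    return [sum(h[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


def two_real_ffts(f, g):
    """Transform real sequences f and g with a single complex transform."""
    n = len(f)
    H = dft([a + 1j * b for a, b in zip(f, g)])  # tangle: h = f + j*g
    # Untangle; -u % n maps u = 0 to 0 and u > 0 to N - u.
    F = [(H[u] + H[-u % n].conjugate()) / 2 for u in range(n)]
    G = [(H[u] - H[-u % n].conjugate()) / 2j for u in range(n)]
    return F, G


f = [1.0, 2.0, 3.0, 4.0]
g = [0.0, 1.0, 0.0, -1.0]
F, G = two_real_ffts(f, g)
err = max(abs(a - b) for a, b in zip(F + G, dft(f) + dft(g)))
print(err)
```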
13
Packing Transforms of Real Functions
  • We can store the Fourier transform in an array
    the same size as the input.
      • Throw away conjugate duplicates.
      • Throw away imaginary values known to be zero.
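The exact packed layout used in the slides is not recoverable here, so this Python sketch assumes one plausible arrangement: slots $0..N/2$ hold the real parts of $F(0)..F(N/2)$ and slots $N/2+1..N-1$ hold the imaginary parts of $F(1)..F(N/2-1)$. That is exactly $N$ reals; the discarded conjugate duplicates are rebuilt on unpacking. The layout and function names are assumptions.

```python
import cmath


def dft(f):
    """Naive O(N^2) DFT."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


def pack_real_spectrum(F):
    """Pack the spectrum of a real length-N sequence into N real numbers.

    Hypothetical layout: Re F(0..N/2) followed by Im F(1..N/2-1).
    """
    half = len(F) // 2
    return [F[u].real for u in range(half + 1)] + [F[u].imag for u in range(1, half)]


def unpack_real_spectrum(p):
    """Rebuild the full complex spectrum from the packed array."""
    n = len(p)
    half = n // 2
    F = [0j] * n
    F[0] = complex(p[0], 0.0)        # imaginary part known to be zero
    F[half] = complex(p[half], 0.0)  # imaginary part known to be zero
    for u in range(1, half):
        F[u] = complex(p[u], p[half + u])
        F[n - u] = F[u].conjugate()  # restore the discarded conjugate duplicate
    return F


signal = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
F = dft(signal)
packed = pack_real_spectrum(F)
restored = unpack_real_spectrum(packed)
print(len(packed), max(abs(a - b) for a, b in zip(restored, F)))
```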

14
Column-wise FFT
  • We have two columns with real values.
      • Use the same "tangled" approach.
  • All other columns are complex numbers.
      • Use a regular FFT.
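Assuming the rows have already been transformed (each real row gives a conjugate-symmetric spectrum), the non-redundant columns $u = 0..N/2$ split as this slide describes: columns $0$ and $N/2$ are purely real and can share one tangled complex FFT, while the rest each need a regular complex FFT. This Python sketch checks the result against a direct 2D DFT; a naive DFT stands in for the FFT, power-of-two sizes of at least 2 are assumed, and all names are hypothetical.

```python
import cmath


def dft(f):
    """Naive O(N^2) DFT standing in for a GPU FFT pass."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


def untangled_pair(a, b):
    """Transform two real sequences with one complex transform."""
    m = len(a)
    H = dft([x + 1j * y for x, y in zip(a, b)])
    A = [(H[v] + H[-v % m].conjugate()) / 2 for v in range(m)]
    B = [(H[v] - H[-v % m].conjugate()) / 2j for v in range(m)]
    return A, B


def column_pass(row_spectra):
    """Column FFTs over the non-redundant columns u = 0..N/2 of the row spectra.

    Columns 0 and N/2 are real (each image row is real), so they share one
    tangled transform; columns 1..N/2-1 are complex and get a regular one.
    Returns a dict mapping column index u to that column's spectrum over v.
    """
    n = len(row_spectra[0])
    half = n // 2

    def column(u):
        return [row[u] for row in row_spectra]

    out = {}
    out[0], out[half] = untangled_pair([z.real for z in column(0)],
                                       [z.real for z in column(half)])
    for u in range(1, half):
        out[u] = dft(column(u))
    return out


image = [[1.0, 2.0, 0.0, 3.0],
         [4.0, 0.0, 1.0, 1.0],
         [2.0, 5.0, 3.0, 0.0],
         [0.0, 1.0, 6.0, 2.0]]
M, N = len(image), len(image[0])
cols = column_pass([dft(row) for row in image])


def direct_2d(v, u):
    """Reference 2D DFT value at row frequency v, column frequency u."""
    return sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * x / N + v * y / M))
               for y in range(M) for x in range(N))


err = max(abs(cols[u][v] - direct_2d(v, u)) for u in cols for v in range(M))
print(err)
```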

[Figure: columns labeled Real, Real, and Paired for Complex]
15
Packing 2D Transforms of Real Functions
  • Rows transformed from complex values are already
    packed appropriately.
  • The two rows transformed from real values are
    untangled and packed to follow suit.

[Figure: packed layout of Real Values and Imaginary Values]
16
Available Resources
  • nVidia GeForce FX 5800 Ultra.
      • Full 32-bit floating-point pipeline and frame
        buffers.
      • Fully programmable vertex and fragment units.
  • Cg
      • High-level language for vertex and fragment
        programs.
  • Traditional CPU: 1.7 GHz Intel Xeon.
      • Freely available high-performance FFT
        implementations.

17
Implementation
  • Using a SIMD model for parallel computation.
      • Draw a quadrilateral parallel to the screen.
      • The rasterizer invokes the same fragment program
        in parallel over all pixels covered by the
        quadrilateral.
      • Inputs and outputs depend on the location of the
        pixel on which the fragment program is running.
  • We require many rendering passes.
      • Use the render-to-texture extension.
      • Use two frame buffers: one for retrieving values
        of the last pass and one for storing results of
        the current computation.
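The pass structure can be emulated on the CPU. In this Python sketch, which is hypothetical and not the authors' Cg code, a `fragment` function computes one output element of a butterfly stage purely by reading the previous pass's buffer, as a fragment program must, and the two buffers swap roles after every pass. For simplicity it starts with a bit-reversal reordering pass rather than the in-place indexing the authors describe; power-of-two lengths are assumed.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fragment(src, i, size):
    """Hypothetical 'fragment program' for one butterfly stage.

    Computes output element i using only reads from the previous pass's
    buffer `src`; it writes nothing except its own return value.
    """
    half = size // 2
    p = i % size                        # position within this butterfly block
    if p < half:                        # upper output: a + w * b
        w = cmath.exp(-2j * cmath.pi * p / size)
        return src[i] + w * src[i + half]
    w = cmath.exp(-2j * cmath.pi * (p - half) / size)
    return src[i - half] - w * src[i]   # lower output: a - w * b


def fft_ping_pong(f):
    """Radix-2 FFT as a sequence of gather-only passes over two buffers."""
    n = len(f)
    bits = n.bit_length() - 1
    read = [complex(f[bit_reverse(i, bits)]) for i in range(n)]  # reordering pass
    write = [0j] * n
    size = 2
    while size <= n:                    # one rendering pass per stage
        for i in range(n):              # these would run in parallel on a GPU
            write[i] = fragment(read, i, size)
        read, write = write, read       # swap: results become the next input
        size *= 2
    return read


print(fft_ping_pong([1, 2, 3, 4]))
```

Because each output is a pure function of the read buffer, the inner loop has no ordering dependencies, which is what allows a rasterizer to evaluate it for all pixels at once.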

18
Implementation
[Figure: FFT and Untangle passes between Images and Frequency Spectra]
19
Fragment Programs
  • Written in Cg, compiled for GeForce FX.

20
Applications
  • Digital image filtering.
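Filtering in the frequency domain relies on the convolution theorem: multiplying spectra pointwise and inverting is equivalent to circular convolution. This one-dimensional Python sketch (a single image row, a naive DFT, and made-up data and kernel) checks that equivalence; a 2D image filter applies the same idea with 2D transforms.

```python
import cmath


def dft(f, inverse=False):
    """Naive DFT; with inverse=True computes the inverse transform (with 1/N)."""
    n = len(f)
    sign = 1 if inverse else -1
    out = [sum(f[x] * cmath.exp(sign * 2j * cmath.pi * u * x / n) for x in range(n))
           for u in range(n)]
    return [c / n for c in out] if inverse else out


def filter_in_frequency(signal, kernel):
    """Circular convolution computed as a pointwise product of spectra."""
    S, K = dft(signal), dft(kernel)
    return [c.real for c in dft([s * k for s, k in zip(S, K)], inverse=True)]


def circular_convolve(signal, kernel):
    """Direct circular convolution, for comparison."""
    n = len(signal)
    return [sum(signal[(i - k) % n] * kernel[k] for k in range(n)) for i in range(n)]


row = [0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 3.0, 0.0]
blur = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]  # 3-tap smoothing kernel centered at index 0
filtered = filter_in_frequency(row, blur)
print([round(v, 6) for v in filtered])
```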

21
Applications
  • Texture generation.
  • Volume rendering.

22
Performance
  • Computation speed: 2.5 GigaFLOPS.
  • Texture read rate: 3.4 GB/sec.

23
Conclusions
  • The Fourier transform on the GPU has many
    potential applications.
  • A well-established FFT on the CPU (FFTW) still
    has an edge over the GPU implementation.
  • Both the software and the hardware of the GPU are
    first generation.
      • Room for improvement.

24
Get the Cg Code
  • http://www.cgshaders.org ?
  • http://www.cs.unm.edu/kmorel/documents/fftgpu
  • kmorel_at_sandia.gov

25
Questions?