An Implementation of a FIR Filter on a GPU

1
An Implementation of a FIR Filter on a GPU
  • Alexey Smirnov and Tzi-cker Chiueh
  • ECSL Research Seminar
  • 9/13/05

2
Outline
  • Introduction
  • GPU Computing Overview
  • Related Work
  • FIR Filter Definition
  • FIR Filter Implementation on GPU
  • Performance Evaluation
  • Conclusion

3
Introduction
  • Numerical algorithms often perform repeated
    computations on vectors of elements.
  • Parallel computation improves performance.
  • On x86, the SIMD extensions MMX, SSE, SSE2, and SSE3
    provide this kind of parallelism.
  • Video cards are now programmable.

4
Computation and Bandwidth Rates
  • Video cards have a higher GFLOPS rate and higher
    memory bandwidth than CPUs.
  • However, data copying between main memory and
    video memory can reduce performance.

5
GPU Computing Background
  • Rendering pipeline
  • User program defines vertex and texture
    coordinates.
  • Vertex processor converts vertex attributes from
    world coordinate system into screen coordinate
    system.
  • Fragment processor computes color of each output
    pixel using textures and color.
  • Interpolation defines coordinates and color for
    each pixel.
  • Vertex and fragment processors are programmable,
    for example in the C-like language Cg.

6
Rendering APIs
  • OpenGL (Linux, Windows, MacOS) and DirectX
    (Windows).
  • OpenGL extensions expose the advanced features
    of a video card.
  • NV_float_buffer supports floating-point textures.
  • ARB_render_texture allows rendering to a texture
    instead of the screen.

7
GPU Program Architecture
  1. Create floating-point textures that contain the input
     data and load them into video memory.
  2. Load the fragment program and enable
     multi-texturing.
  3. Define vertex and texture coordinates.
  4. Draw the figure to an off-screen buffer.
  5. If the results were rendered to an off-screen
     buffer, copy the image to a texture using
     glCopyTexSubImage2D().
  6. Go to step 3 if more iterations are needed.
  7. Use glGetTexImage() to copy data from video
     memory to main memory.
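
The render-and-copy loop above can be modeled on the CPU. The following is a minimal sketch, not the presenters' code: it makes no OpenGL calls, a "texture" is a list of rows of floats, and the names render_pass, run_passes, and double_texel are hypothetical.

```python
def render_pass(texture, fragment_fn):
    """Model one draw call: run fragment_fn once per output texel."""
    height, width = len(texture), len(texture[0])
    return [[fragment_fn(texture, x, y) for x in range(width)]
            for y in range(height)]


def run_passes(texture, fragment_fn, iterations):
    """Model the loop: render, feed the result back as input, repeat."""
    current = texture
    for _ in range(iterations):
        off_screen = render_pass(current, fragment_fn)  # draw to off-screen buffer
        current = off_screen  # stands in for the copy into a texture
    return current  # stands in for the final read-back to main memory


def double_texel(texture, x, y):
    # Hypothetical fragment function: read one texel and scale it.
    return 2.0 * texture[y][x]


result = run_passes([[1.0, 2.0], [3.0, 4.0]], double_texel, iterations=3)
```

Each pass only reads the previous pass's output, which is why the real pipeline needs the copy from the off-screen buffer into a texture between iterations.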

8
Input Data Representation
  • Matrices map naturally onto textures, with four
    elements per pixel (R, G, B, A).
  • Vectors are wrapped into matrices, because textures
    have maximum dimensions.
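
One way to picture this wrapping is the sketch below. It assumes row-major order and zero padding, neither of which the slides specify, and the names pack_vector, locate, and max_width are hypothetical.

```python
def pack_vector(values, max_width):
    """Wrap a flat vector into rows of RGBA texels, zero-padding the tail."""
    texels_needed = -(-len(values) // 4)           # ceil(len / 4)
    width = min(max_width, max(texels_needed, 1))  # respect the texture size limit
    height = max(-(-texels_needed // width), 1)    # ceil(texels / width)
    padded = list(values) + [0.0] * (width * height * 4 - len(values))
    return [[tuple(padded[(row * width + col) * 4:(row * width + col) * 4 + 4])
             for col in range(width)]
            for row in range(height)]


def locate(index, width):
    """Map a vector index to (row, column, channel) in the packed texture."""
    texel, channel = divmod(index, 4)
    row, col = divmod(texel, width)
    return row, col, channel


texture = pack_vector([float(i) for i in range(10)], max_width=2)
row, col, channel = locate(9, width=2)
```

Here ten values occupy a 2 x 2 texture, and element 9 lands in the G channel of the first texel of the second row.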

9
Related Work
  • Four papers describing matrix multiplication
  • Linear algebra operations
  • Array sorting
  • FFT
  • Earlier papers concluded that the CPU is more
    efficient than the GPU.
  • Recent video cards, e.g. GeForce 7800 and ATI
    X800 XT, do better than the CPU.

10
FIR Filter Definition
  • A Finite Impulse Response (FIR) filter is used in
    audio processing.
  • We modified GNU Radio, an open-source software
    implementation of Software Defined Radio.
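
For reference, the textbook direct form of a FIR filter is y[n] = sum over k of h[k] * x[n - k]. The sketch below is a plain CPU version of that definition, not GNU Radio's implementation. It assumes samples before the start of the input are zero, and the name fir_filter is hypothetical.

```python
def fir_filter(taps, samples):
    """Direct-form FIR: y[n] = sum over k of taps[k] * samples[n - k]."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += tap * samples[n - k]
        output.append(acc)
    return output


# Two-tap moving average over a short ramp.
smoothed = fir_filter([0.5, 0.5], [1.0, 3.0, 5.0])
```

Every output sample depends only on the input and the taps, never on other outputs, which is what makes the computation a candidate for per-pixel evaluation.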

11
Other Relevant Transformations
  • Hilbert transformation
  • Frequency translation FIR filter
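
As an illustration of the second item, a frequency-translating FIR filter is commonly described as three steps: mix the input down by a center frequency, apply FIR taps, and optionally decimate. The sketch below follows that general description only. It is not GNU Radio's code, and the function name and parameters are assumptions.

```python
import cmath
import math


def freq_xlating_fir(taps, samples, center_freq, sample_rate, decimation=1):
    """Shift center_freq to 0 Hz, apply the FIR taps, keep every Nth output."""
    out = []
    for n in range(0, len(samples), decimation):
        acc = 0j
        for k, tap in enumerate(taps):
            m = n - k
            if m >= 0:  # samples before the start are treated as zero
                phase = -2.0 * math.pi * center_freq * m / sample_rate
                acc += tap * samples[m] * cmath.exp(1j * phase)
        out.append(acc)
    return out


# A complex tone at 0.25 cycles/sample becomes a constant after translation.
tone = [cmath.exp(2j * math.pi * 0.25 * n) for n in range(8)]
baseband = freq_xlating_fir([1.0], tone, center_freq=0.25, sample_rate=1.0)
```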

12
FIR Filter on a GPU
13
FIR Filters Loop
  • Initialization
  • Loop iteration

14
FIR Filters Loop
  • O(j+1) = O(j) + M · I
  • Final output value is computed as

15
Fragment Program
16
Optimizations
  • Break the loop into two to get rid of the
    conditional expression
  • Unroll the loop body with and without the
    conditional expression
  • Process two rows of input and textures
  • Use different texture units in unrolled loops
  • None of the above improved performance.
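
The first optimization, splitting a loop to remove a conditional, can be shown on a CPU-style FIR loop. This is a sketch of the general technique under the assumption that the conditional is a boundary check. The actual fragment program is not reproduced in this transcript, and both function names are hypothetical.

```python
def fir_with_branch(taps, samples):
    """Single loop: every inner iteration tests the boundary condition."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k in range(len(taps)):
            if n - k >= 0:
                acc += taps[k] * samples[n - k]
        out.append(acc)
    return out


def fir_split_loops(taps, samples):
    """Same result with the boundary case moved into its own loop."""
    out = []
    warmup = max(0, min(len(taps) - 1, len(samples)))
    # Loop 1: leading outputs, where only taps 0..n have a sample to read.
    for n in range(warmup):
        out.append(sum((taps[k] * samples[n - k] for k in range(n + 1)), 0.0))
    # Loop 2: steady state, every tap has a sample, so no test is needed.
    for n in range(warmup, len(samples)):
        out.append(sum((taps[k] * samples[n - k] for k in range(len(taps))), 0.0))
    return out


taps = [0.25, 0.5, 0.25]
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
same = fir_with_branch(taps, samples) == fir_split_loops(taps, samples)
```

On a CPU this kind of split usually removes a branch from the hot loop. The slide reports that it gave no measurable gain on the GPU.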

17
Performance Evaluation: FIR Filter
18
Performance of FreqXlating FIR Filter
19
Performance of Hilbert Transformation
20
Conclusion
  • Not every computation benefits from GPU optimization.
  • CPU optimization tricks do not work on the GPU.
  • Texture upload/download takes up to 60% of total
    time.
  • A GPU computation can take several seconds, compared
    with the milliseconds needed to render a frame in a
    game.
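
The transfer figure limits how much a faster fragment program can help. The sketch below applies Amdahl's law, which the slides do not invoke; it is used here only as an illustration. It assumes transfer time stays fixed while compute speeds up, and overall_speedup is a hypothetical name.

```python
def overall_speedup(transfer_fraction, compute_speedup):
    """Amdahl's law: only the non-transfer share of the run time gets faster."""
    compute_fraction = 1.0 - transfer_fraction
    return 1.0 / (transfer_fraction + compute_fraction / compute_speedup)


# With 60% of the time spent on transfers, a 10x faster compute stage
# yields well under 2x overall.
gain = overall_speedup(0.6, 10.0)
ceiling = 1.0 / 0.6  # limit as compute_speedup grows without bound
```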

21
Future Work
  • QoS for the GPU: can an application specify a maximum
    latency or a share of GPU resources?
  • Work offload from CPU to GPU: is it possible to
    build a compiler that can automatically decide
    what is worth GPU optimization?
  • Debugging support: a lot of tools for Windows,
    none for Linux.