An Implementation of a FIR Filter on a GPU

Slides: 22

Provided by: alexeys

1
An Implementation of a FIR Filter on a GPU

Alexey Smirnov and Tzi-cker Chiueh
ECSL Research Seminar
9/13/05

2
Outline

Introduction
GPU Computing Overview
Related Work
FIR Filter Definition
FIR Filter Implementation on GPU
Performance Evaluation
Conclusion

3
Introduction

Numerical algorithms often perform repeated
computations on vectors of elements.
Parallel computation improves performance.
x86 SIMD extensions: MMX, SSE, SSE2, SSE3.
Video cards are now programmable.

4
Computation and Bandwidth Rates

Video cards offer a higher GFLOPS rate and higher
memory bandwidth than CPUs.
However, data copying between main memory and
video memory can reduce performance.

5
GPU Computing Background

Rendering pipeline
User program defines vertex and texture
coordinates.
Vertex processor converts vertex attributes from
world coordinate system into screen coordinate
system.
Fragment processor computes color of each output
pixel using textures and color.
Interpolation defines coordinates and color for
each pixel.
Vertex and fragment processors are programmable,
for example in Cg, a C-like language.

6
Rendering APIs

OpenGL (Linux, Windows, MacOS) and DirectX
(Windows).
OpenGL extensions expose advanced features of a
video card.
NV_float_buffer supports floating-point textures.
ARB_render_texture allows rendering to a texture
instead of the screen.

7
GPU Program Architecture

1. Create floating-point textures that contain the
input data and load them into video memory.
2. Load the fragment program and enable
multi-texturing.
3. Define vertex and texture coordinates.
4. Draw the figure to an off-screen buffer.
5. If the results were rendered to an off-screen
buffer, copy the image to a texture using
glCopyTexSubImage2D().
6. Go to step 3 if more iterations are needed.
7. Use glGetTexImage() to copy data from video
memory to main memory.
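The control flow on this slide can be modeled on the CPU to make the data movement explicit. The following is a minimal sketch, not the presenters' code: textures are plain nested lists, and the names draw, run_passes and program are hypothetical. No OpenGL call is made; the comments only indicate which stage of the slide each line stands in for.

```python
def draw(fragment_program, width, height, textures):
    """Evaluate the fragment program once per output pixel.

    A GPU runs these evaluations in parallel; here they run in a loop.
    """
    return [[fragment_program(x, y, textures) for x in range(width)]
            for y in range(height)]


def run_passes(input_tex, passes):
    """Model of a multi-pass computation (hypothetical accumulate example)."""
    height, width = len(input_tex), len(input_tex[0])

    # "Upload": create the textures that hold the input and running result.
    textures = {
        "input": input_tex,
        "acc": [[0.0] * width for _ in range(height)],
    }

    # Stand-in for a fragment program that reads two textures
    # (multi-texturing) and produces one output value per pixel.
    def program(x, y, tex):
        return tex["acc"][y][x] + tex["input"][y][x]

    for _ in range(passes):
        # Draw into an off-screen buffer.
        offscreen = draw(program, width, height, textures)
        # Copy the buffer into a texture so the next pass can read it
        # (the role glCopyTexSubImage2D() plays on the slide).
        textures["acc"] = [row[:] for row in offscreen]

    # "Download": read the final texture back (the role of glGetTexImage()).
    return textures["acc"]


if __name__ == "__main__":
    print(run_passes([[1.0, 2.0], [3.0, 4.0]], passes=3))
```

The point of the model is that every pass reads only textures and writes only the off-screen buffer, so any state carried between iterations has to go through the copy step.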

8
Input Data Representation

Matrices map naturally onto textures.
Four elements per pixel (R, G, B, A).
Vectors are wrapped into matrices because textures
have maximum dimensions.
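One way to picture the wrapping is the sketch below. It is an illustration under assumptions, not the layout used in the talk: the row-major order, the zero padding, and the names pack_vector, texel_address and max_width are all hypothetical.

```python
CHANNELS = 4  # one texel stores four values: R, G, B, A


def pack_vector(values, max_width):
    """Wrap a 1-D vector into rows of RGBA texels, zero-padding the tail."""
    texel_count = -(-len(values) // CHANNELS)        # ceiling division
    width = max(1, min(max_width, texel_count))      # respect the width limit
    height = -(-texel_count // width)
    padded = list(values) + [0.0] * (width * height * CHANNELS - len(values))
    texels = [tuple(padded[i:i + CHANNELS])
              for i in range(0, len(padded), CHANNELS)]
    return [texels[r * width:(r + 1) * width] for r in range(height)]


def texel_address(index, width):
    """Map a vector index to (row, column, channel) in the packed texture."""
    texel, channel = divmod(index, CHANNELS)
    row, col = divmod(texel, width)
    return row, col, channel


if __name__ == "__main__":
    texture = pack_vector([float(i) for i in range(10)], max_width=2)
    row, col, channel = texel_address(9, width=2)
    print(texture[row][col][channel])
```

With ten values and a width limit of two texels, the vector occupies a 2 x 2 texture whose last six channels are padding.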

9
Related Work

Four papers describing matrix multiplication
Linear algebra operations
Array sorting
FFT
Earlier papers concluded that the CPU is more
efficient than the GPU.
Recent video cards, e.g. GeForce 7800 and ATI
X800 XT, outperform the CPU.

10
FIR Filter Definition

A Finite Impulse Response (FIR) filter is used in
audio processing.
We modified GNU Radio, an open-source
implementation of Software Defined Radio.
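For reference, a direct-form FIR filter computes each output sample as a weighted sum of the most recent input samples, with the filter taps as weights. The sketch below is a plain CPU version, assuming samples before the start of the input are zero; it is not the GNU Radio code, and the name fir_filter is hypothetical.

```python
def fir_filter(taps, samples):
    """Direct-form FIR: out[n] = sum over k of taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:            # treat samples before the start as zero
                acc += tap * samples[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Two-tap moving average over a short ramp.
    print(fir_filter([0.5, 0.5], [1.0, 3.0, 5.0]))
```

Each output sample depends only on the inputs and the taps, never on another output, which is what makes the filter a candidate for per-pixel parallel evaluation.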

11
Other Relevant Transformations

Hilbert transformation

Frequency translation FIR filter
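The formulas for this slide did not survive in the transcript. As a rough guide, a frequency-translating FIR filter is commonly described as three steps: shift the band of interest to baseband by multiplying with a complex exponential, apply an FIR filter, and decimate. The sketch below follows that textbook description, assuming zero initial history; it is not taken from the slides or from GNU Radio, and all names and parameters are hypothetical.

```python
import cmath
import math


def freq_xlating_fir(samples, taps, center_freq, sample_rate, decimation=1):
    """Shift center_freq down to 0 Hz, FIR-filter, and keep every Nth output."""
    # Step 1: multiply by a complex exponential to translate the spectrum.
    shifted = [s * cmath.exp(-2j * math.pi * center_freq * n / sample_rate)
               for n, s in enumerate(samples)]

    # Steps 2 and 3: compute the FIR sum only at the outputs that are kept.
    out = []
    for n in range(0, len(shifted), decimation):
        acc = 0j
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * shifted[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    # A tone at one quarter of the sample rate, translated to baseband.
    tone = [cmath.exp(2j * math.pi * 0.25 * n) for n in range(8)]
    print(freq_xlating_fir(tone, [1.0], center_freq=0.25,
                           sample_rate=1.0, decimation=2))
```

Computing the sum only at the decimated positions avoids work on outputs that would be discarded.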

12
FIR Filter on a GPU

13
FIR Filters Loop

Initialization

Loop iteration

14
FIR Filters Loop

O(j1)O(j)MI

Final output value is computed as

15
Fragment Program

16
Optimizations

Break the loop into two to get rid of the
conditional expression.
Unroll the loop body, with and without the
conditional expression.
Process two rows of input and textures.
Use different texture units in unrolled loops.
None of the above improved performance.
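The first optimization can be illustrated on a CPU-side FIR loop. The sketch assumes the conditional is a boundary check on the sample index, which the transcript does not confirm; the function names are hypothetical. Both versions produce the same output, and only the second is free of a per-tap branch in its main loop.

```python
def fir_with_branch(taps, samples):
    """Single loop: every tap access is guarded by a boundary check."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k in range(len(taps)):
            if n - k >= 0:
                acc += taps[k] * samples[n - k]
        out.append(acc)
    return out


def fir_split(taps, samples):
    """Same filter, split into a start-up loop and a branch-free main loop."""
    m = len(taps)
    out = []
    # Start-up region: fewer than m samples are available, so use n + 1 taps.
    for n in range(min(m - 1, len(samples))):
        out.append(sum(taps[k] * samples[n - k] for k in range(n + 1)))
    # Steady state: all m taps are always in range, so no check is needed.
    for n in range(m - 1, len(samples)):
        out.append(sum(taps[k] * samples[n - k] for k in range(m)))
    return out


if __name__ == "__main__":
    taps, samples = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]
    print(fir_with_branch(taps, samples))
    print(fir_split(taps, samples))
```

On a CPU this transformation usually helps because it removes a branch from the hot loop; the slide reports that the corresponding change brought no gain on the GPU.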

17
Performance Evaluation: FIR Filter

18
Performance of FreqXlating FIR Filter

19
Performance of Hilbert Transformation

20
Conclusion

Not every computation benefits from GPU
optimization.
CPU optimization tricks do not work on the GPU.
Texture upload/download takes up to 60% of total
time.
A GPU computation can take several seconds,
compared to the milliseconds needed to render a
frame in a game.
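The effect of the transfer share on end-to-end speedup can be worked through with a few lines of arithmetic. The timings below are hypothetical and chosen only to show the calculation; they are not measurements from the talk, and the function names are made up for this example.

```python
def gpu_time_breakdown(total_time, transfer_fraction):
    """Split a total GPU run time into (transfer, compute) parts."""
    transfer = total_time * transfer_fraction
    return transfer, total_time - transfer


def speedup(cpu_time, gpu_time):
    """Ratio above 1 means the GPU path is faster."""
    return cpu_time / gpu_time


if __name__ == "__main__":
    # Hypothetical numbers: 10 ms on the GPU in total, 60% of it in
    # texture upload/download, against 8 ms for the same work on the CPU.
    transfer, compute = gpu_time_breakdown(10.0, 0.6)
    print("compute-only speedup:", speedup(8.0, compute))
    print("end-to-end speedup:", speedup(8.0, transfer + compute))
```

With these made-up numbers the GPU wins on computation alone but loses once the copies are counted, which is the situation the transfer figure on this slide warns about.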

21
Future Work

QoS for GPU: can an application specify a maximum
latency or a share of GPU resources?
Work offload from CPU to GPU: is it possible to
build a compiler that can automatically decide
what is worth GPU optimization?
Debugging support: many tools for Windows,
none for Linux.
