Enhancing GPU for Scientific Computing - PowerPoint PPT Presentation

About This Presentation

Title:

Enhancing GPU for Scientific Computing

Slides: 27

Provided by: duderoc

Transcript and Presenter's Notes

Title: Enhancing GPU for Scientific Computing

1
Enhancing GPU for Scientific Computing

Some thoughts

2
Outline

Motivation
Related work
BLAS Library
Execution Model
Benchmarks
Recommendations

3
Motivation

GPU Computing
Vertex and fragment processors
Streaming (super)computers
Enormous performance!
ATI 9700, NV30
They have become programmable
Emerging application areas
Numerical simulation [Schroder03], sorting, genomics, etc.
Goal: scientific computing

4
Motivation

Most software is built from small, efficient parts
Scientific apps are built on top of software library routines
Harnessing GPU resources
Arithmetic-intensive
Data-parallel
BLAS library

5
Related work

Using non-programmable GPUs
[Erik01]: programmable vertex engine for lighting/morphing
[Oskin02]: vector processing using the vertex processor (VP)
[Ian03]: stream processing using the fragment processor (FP)
Problems
Monolithic, big programs
Only one of VP or FP is used
CPU in passive mode
No cascading loop-backs (parallelism, setup times)

6
BLAS Library

BLAS (Basic Linear Algebra Subprograms)
Building blocks for vector and matrix operations
Used to develop highly efficient linear algebra software
e.g., LINPACK and LAPACK
Operations
Scalar-vector
Vector-vector
Vector-matrix
Matrix-matrix
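
As a rough illustration of the four operation classes listed above, here is a minimal CPU reference sketch in plain Python. The function names borrow the familiar BLAS routine names (scal, axpy, gemv, gemm), but the signatures are simplified assumptions for this sketch (no strides, no beta term, list-of-rows matrices) and are not the deck's GPU implementation.

```python
def scal(alpha, x):
    """Scalar-vector: return alpha * x."""
    return [alpha * xi for xi in x]


def axpy(alpha, x, y):
    """Vector-vector: return alpha * x + y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(a, x):
    """Matrix-vector: return A * x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gemm(a, b):
    """Matrix-matrix: return A * B, with both given as lists of rows."""
    cols_b = list(zip(*b))  # columns of B
    return [[sum(p * q for p, q in zip(row, col)) for col in cols_b]
            for row in a]
```

Each routine is a loop over independent elements, which is what makes this family arithmetic-intensive and data-parallel.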

7
Mapping

Operation to processor
CPU/FP: all ops
VP: no memory access
Restricted data-flows
CPU FP
VP CPU

8
Execution graph: Vector-Scalar Add Operation

In this example, a vector of length n is segmented into m vectors of length 4 in the CPU function vsAdd.
The vertex program vsAdd.cg is loaded onto the vertex processor, and the scalar value is passed as a parameter.
The CPU function vsAdd then streams the set of m vectors to the GPU as OpenGL point primitives (GL_POINTS). The vertex program vsAdd.cg adds the scalar value to every field of the m vertices.
These vertices then proceed to the fragment processor and are written to the framebuffer memory.
The CPU function vsAdd then reads the color value of each pixel that represents a vertex. These color values contain the result of the vector-scalar add.
Lastly, the CPU function concatenates the sequence of color values into a vector of length n as the result.
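
The steps above can be emulated on the CPU to make the data flow concrete. The following is a minimal Python sketch, not the actual vsAdd or vsAdd.cg code: the names `segment`, `vertex_program`, and `vs_add` are hypothetical, and zero-padding the last chunk when n is not a multiple of 4 is an assumption the slide does not state.

```python
def segment(vec, width=4, pad=0.0):
    """Split vec into chunks of `width`, padding the last chunk (assumption)."""
    chunks = []
    for i in range(0, len(vec), width):
        chunk = list(vec[i:i + width])
        chunk += [pad] * (width - len(chunk))
        chunks.append(chunk)
    return chunks


def vertex_program(vertex, scalar):
    """Stand-in for vsAdd.cg: add the scalar to every field of one vertex."""
    return [field + scalar for field in vertex]


def vs_add(vec, scalar):
    """Emulate the vector-scalar add pipeline on the CPU."""
    n = len(vec)
    vertices = segment(vec)                                  # m vectors of length 4
    shaded = [vertex_program(v, scalar) for v in vertices]   # vertex stage
    framebuffer = shaded                                     # fragment stage passes values through
    flat = [c for pixel in framebuffer for c in pixel]       # read back color values
    return flat[:n]                                          # concatenate, drop padding
```

For example, `vs_add([1, 2, 3, 4, 5], 10)` segments the input into two 4-component vertices, adds 10 to every field, and trims the padded tail after readback.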

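[Diagram: vAdd (CPU) sends the vertices (labeled Vertexm) as GL_POINTS to the GPU. The vertex processor runs vAdd.cg; the fragment processor runs no program. Output goes to the PBuffer in texture memory (TextureDatam) and returns to vAdd (CPU) as texture color values (vectors).]

9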
Execution graph: Vector-Vector Add Operation

In this example, two vectors of length s are transformed into texture data in the CPU function vAdd.
The fragment program vAdd.cg and the texture data are loaded onto the fragment processor and GPU memory, respectively.
The CPU function vAdd then draws a quadrilateral primitive covering s pixels.
The vertex processor does nothing and passes the vertices on to the rasterizer.
The rasterizer creates the s pixels for fragment processing.
For each pixel, the fragment processor looks up the values from both textures and determines the color value of that pixel. These pixels are written to the PBuffer memory.
The CPU function vAdd then reads the color value of each pixel. These color values contain the result of the vector-vector add.
The output in the PBuffer is then converted into a texture entry.
Lastly, the CPU function reads the texture entry and concatenates the sequence of color values into a vector of length s as the result.
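
A CPU emulation of this flow may help show where the work happens. The sketch below is illustrative only: `to_texture`, `fragment_program`, `draw_quad`, and `v_add` are hypothetical names, and storing one vector element per texel is an assumption made to match the "s pixels" wording (a real implementation might pack four values into each RGBA texel).

```python
def to_texture(vec):
    """Turn a vector into texture data, one element per texel (assumption)."""
    return list(vec)


def fragment_program(tex_a, tex_b, pixel):
    """Stand-in for vAdd.cg: look up both textures at `pixel` and add."""
    return tex_a[pixel] + tex_b[pixel]


def draw_quad(num_pixels, program, *textures):
    """Emulate rasterizing a quad into `num_pixels` fragments.

    Runs `program` once per fragment and returns the resulting PBuffer.
    """
    return [program(*textures, pixel) for pixel in range(num_pixels)]


def v_add(x, y):
    """Emulate the vector-vector add pipeline on the CPU."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    tex_x, tex_y = to_texture(x), to_texture(y)                   # upload inputs
    pbuffer = draw_quad(len(x), fragment_program, tex_x, tex_y)   # one fragment per element
    return list(pbuffer)                                          # read back and concatenate
```

The vertex stage has no counterpart here because, as the slide notes, it only passes the quad's vertices through.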

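[Diagram: vAdd (CPU) sends the quad vertices (labeled Vertex4m) as a GL_QUAD to the GPU. The vertex processor runs no program; the fragment processor runs vAdd.cg on TextureData1m and TextureData2m from texture memory. Output goes to the PBuffer (TextureData3m) and returns to vAdd (CPU) as texture color values (vectors).]

10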
Execution graph: Two Vector-Vector Add Operations

In this example, we perform two separate vector-vector add operations.
The first operation proceeds as described earlier for the vector-vector add.
The output of the first operation is used as input for the second operation.
Since it is the same operation, we do not load a new vertex or fragment program. However, we do load new texture data.
The second operation proceeds as normal.
Lastly, the CPU function concatenates the sequence of color values into a vector of length s as the result.
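
The saving from chaining can be sketched with a toy model that counts program loads and texture uploads. Everything here is hypothetical (the `FakeGpu` class, `add_texels`, and `add_three` are invented for illustration); it only mirrors the idea that the second pass reuses the loaded program and the first pass's output, and uploads just one new texture.

```python
class FakeGpu:
    """Toy GPU model that counts program loads and texture uploads."""

    def __init__(self):
        self.program = None
        self.program_loads = 0
        self.texture_uploads = 0

    def load_program(self, program):
        """Load a fragment program only if it is not already resident."""
        if self.program is not program:
            self.program = program
            self.program_loads += 1

    def upload(self, vec):
        """Copy a CPU vector into (emulated) texture memory."""
        self.texture_uploads += 1
        return list(vec)

    def draw_quad(self, tex_a, tex_b):
        """Run the loaded program per texel; the PBuffer output is reused as a texture."""
        return [self.program(a, b) for a, b in zip(tex_a, tex_b)]


def add_texels(a, b):
    """Stand-in for the vAdd.cg fragment program."""
    return a + b


def add_three(gpu, x, y, z):
    """Compute (x + y) + z as two chained passes."""
    gpu.load_program(add_texels)
    tex_x, tex_y = gpu.upload(x), gpu.upload(y)
    tex_sum = gpu.draw_quad(tex_x, tex_y)    # first pass; result stays on the GPU
    gpu.load_program(add_texels)             # same program, so no reload
    tex_z = gpu.upload(z)                    # only the new operand is uploaded
    result = gpu.draw_quad(tex_sum, tex_z)   # second pass
    return list(result)                      # single readback at the end
```

In this model, adding three vectors costs one program load and three uploads, with no intermediate readback between the two passes.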

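[Diagram: second pass of the chained add. The quad vertices (labeled Vertex4m) pass through the vertex processor, which runs no program. The fragment processor runs vAdd.cg, with TextureData4m as the newly loaded input. Output goes to the PBuffer in texture memory (TextureData3m) and returns to vAdd (CPU) as texture color values (vectors).]

11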
Performance Issues