A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware - PowerPoint PPT Presentation

About This Presentation

Title:

A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware

Description:

A Multigrid Solver for Boundary Value Problems Using Programmable ... General-purpose GPU. Mark Harris. Aaron Lefohn. Ian Buck. Funding. NSF Award #0092793 ... – PowerPoint PPT presentation

Number of Views:62

Avg rating:3.0/5.0

Slides: 36

Provided by: csSu5

Learn more at: https://www3.cs.stonybrook.edu

Category:

more less

Transcript and Presenter's Notes

Title: A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware

1
A Multigrid Solver for Boundary Value Problems
Using Programmable Graphics Hardware
Nolan Goodnight Cliff Woolley Gregory
LewinDavid Luebke Greg Humphreys
University of Virginia
augmented by Klaus Mueller, Stony Brook University
2
General-Purpose GPU Programming

Why do we port algorithms to the GPU?
How much faster can we expect it to be, really?
What is the challenge in porting?

3
Case Study

Problem Implement a Boundary Value Problem (BVP)
solver using the GPU
Could benefit an entire class of scientific and
engineering applications, e.g.
Heat transfer
Fluid flow

4
Related Work

Krüger and Westermann Linear Algebra Operators
for GPU Implementation of Numerical Algorithms
Bolz et al. Sparse Matrix Solvers on the GPU
Conjugate Gradients and Multigrid
Very similar to our system
Developed concurrently
Complementary approach

5
Driving problem Fluid mechanics sim

Problem domain is a warped disc

regular grid
regular grid
6
BVPs Background

Boundary value problems are sometimes governedby
PDEs of the form
L? f
L is some operator
? is the problem domain
f is a forcing function (source term)
Given L and f, solve for ?.

7
BVPs Example

Heat Transfer
Find a steady-state temperature distribution T in
a solid of thermal conductivity k with thermal
source S
This requires solving a Poisson equation of the
form
k?2T -S
This is a BVP where L is the Laplacian operator
?2
All our applications require a Poisson solver.

8
BVPs Solving

Most such problems cannot be solved analytically
Instead, discretize onto a grid to form a set of
linear equations, then solve
Direct elimination
Gauss-Seidel iteration
Conjugate-gradient
Strongly implicit procedures
Multigrid method

9
Multigrid method

Iteratively corrects an approximation to the
solution
Operates at multiple grid resolutions
Low-resolution grids are used to correct
higher-resolution grids recursively
Very fast, especially for large grids O(n)

10
Multigrid method

Use coarser grid levels to recursively correct an
approximation to the solution
may converge slowly on fine grid -gt restrict to
course grid
push out long wavelength errors quickly (single
grid solvers only smooth out high frequency
errors)
Algorithm
smooth
residual ?
restrict
recurse
interpolate

? L?i - f
11
Implementation - Overview

For each step of the algorithm
Bind as texture maps the buffers that contain the
necessary data (current solution, residual,
source terms, etc.)
Set the target buffer for rendering
Activate a fragment program that performs the
necessary kernel computation (smoothing, residual
calculation, restriction, interpolation)
Render a grid-sized quad with multitexturing

source buffer texture
source buffer texture
render target buffer
render target buffer
fragment program
12
Implementation - Overview
13
Input buffers

Solution buffer four-channel floating point
pixel buffer (p-buffer)
one channel each for solution, residual, source
term, and a debugging term
toggle front and back surfaces used to hold old
and new solution
Operator map contains the discretized operator
(e.g., Laplacian)
Red-black map accelerate odd-even tests (see
later)

14
Smoothing

Jacobi method
one matrix row
calculate new value for each solution vector
element
in our application, the aij are the Laplacian
(sparse matrix)

15
Smoothing

Also factor in the source term
Use Red-black map to update only half of the grid
cells in each pass
converges faster in practice
known as red-black iteration
requires two passes per iteration

16
Calculate residual

Apply operator (Laplacian) and source term to the
current solution
residual ? k?2T S
Store result in the target surface
Use occlusion query to determine if all solution
fragments are below threshold (e lt threshold)
occlusion query true means all fragments are
below threshold
this is an L? norm, which may be too strict
less strict norms L1, L2, will require reduction
or fragment accumulation register (not available
yet), could run in CPU instead

17
Multigrid reduction and refinement

Average (restrict) current residual into coarser
grid
Iterate/smooth on coarser grid, solving k?2 ?
-S
Interpolate correction back into finer grid
or restrict once more -gt recursion
use bilinear interpolation
Update grid with this correction
Iterate/smooth on fine grid

18
Boundary conditions

Dirichlet (prescribed)
Neumann (prescribed derivative)
Mixed (coupled value and derivative)
Uk value at grid point k
nk normal at grid point k
Periodic boundaries result in toroidal mapping
Apply boundary conditions in smoothing pass

19
Boundary conditions

Only need to compute at boundaries
boundaries need significantly more computations
restrict computations to boundaries
GPUs do not allow branching
or better, both branches are executed and the
invalid fragment is discarded
even more wasteful
decompose domain into boundary and interior areas
use general (boundary) and fastpath (interior)
shaders
run these in two separate passes, on respective
domains

20
Optimizing the Solver

Detect steady-state natively on GPU
Minimize shader length
Use special-case whenever possible
Limit context switches

21
Optimizing the Solver Steady-state

How to detect convergence?
L1 norm - average error
L2 norm RMS error (common in visual sim)
L? norm max error (common in sci/eng apps)
Can use occlusion query!

secs to steady statevs. grid size
22
Optimizing the Solver Shader length

Minimize number of registers used
Vectorize as much as possible
Use the rasterizer to perform computations of
linearly-varying values
Pre-compute invariants on CPU
Compute texture coodinate offsets in vertex shader

23
Optimizing the Solver Special-case

Fast-path vs. slow-path
write several variants of each fragment program
to handle boundary cases
eliminates conditionals in the fragment program
equivalent to avoiding CPU inner-loop branching

fast path, no boundaries
slow path with boundaries
24
Optimizing the Solver Special-case

Fast-path vs. slow-path
write several variants of each fragment program
to handle boundary cases
eliminates conditionals in the fragment program
equivalent to avoiding CPU inner-loop branching

secs per v-cyclevs. grid size
25
Optimizing the Solver Context-switching

Find best packing data of multiple grid
levelsinto the pbuffer surfaces - many p-buffers

26
Optimizing the Solver Context-switching

Find best packing data of multiple grid
levelsinto the pbuffer surfaces - two p-buffers

27
Optimizing the Solver Context-switching

Find best packing data of multiple grid
levelsinto the pbuffer surfaces - a single
p-buffer
Still one front- and one back surface for
iterative smoothing

28
Optimizing the Solver Context-switching