Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid

1
Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid

Jeffrey Bolz, Ian Farmer, Eitan Grinspun, Peter Schröder
Caltech ASCI Center

2
Why Use the GPU?

Semiconductor trends
  cost
  wires vs. compute
  Stanford streaming supercomputer
Parallelism
  many functional units
  graphics is the prime example
Harvesting this power
  which applications are suitable?
  which abstractions are useful?
History
  massively parallel SIMD machines
  media processing

Figures: chart courtesy of Bill Dally; Imagine stream processor (Bill Dally, Stanford); Connection Machine CM-2 (Thinking Machines)
3
Contributions and Related Work

Contributions
  numerical algorithms on the GPU
  unstructured grids: conjugate gradients
  regular grids: multigrid
  what abstractions are needed?
Related work: numerical algorithms
  Goodnight et al. 2003 (multigrid)
  Hall et al. 2003 (cache)
  Harris et al. 2002 (finite-difference simulation)
  Hillesland et al. 2003 (optimization)
  Krüger and Westermann 2003 (numerical linear algebra)
  Strzodka (PDEs)

4
Streaming Model
Abstract model (Purcell et al. 2002)
  data structures: streams
  algorithms: kernels
Concrete model
  render a rectangle
  data structures: textures
  algorithms: fragment programs

Figure: input record stream, globals, output record stream
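The abstract streaming model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `run_kernel` and `scale_and_offset` are hypothetical names, and a Python list stands in for a stream. Each output record depends only on its own input record and on the read-only globals, which is what lets the GPU evaluate every record in parallel as a fragment of the rendered rectangle.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")
G = TypeVar("G")


def run_kernel(kernel: Callable[[In, G], Out], records: Sequence[In], globals_: G) -> List[Out]:
    """Apply `kernel` to every input record (hypothetical helper).

    Each output depends only on its own input record and on the shared,
    read-only globals, so all calls are independent and could run in
    parallel, like the fragments of one rendered rectangle.
    """
    return [kernel(record, globals_) for record in records]


def scale_and_offset(x: float, g: Tuple[float, float]) -> float:
    # Hypothetical kernel: the globals g = (a, b) are the same for every record.
    a, b = g
    return a * x + b


out = run_kernel(scale_and_offset, [1.0, 2.0, 3.0], (2.0, 0.5))
```

In the concrete model, `records` would be the texels of a texture and `kernel` a fragment program executed once per pixel.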
5
Sparse Matrices: Geometric Flow

Ubiquitous in numerical computing
  discretization of PDEs
  animation
  finite elements, differences, volumes
  optimization, editing, etc.
Example here
  processing of surfaces
Canonical non-linear problem
  mean curvature flow
  implicit time discretization
  solve a sequence of SPD (symmetric positive definite) systems
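To make the last three points concrete, here is one standard semi-implicit scheme. It is an assumption, since the slides do not give the discretization: M is a mass matrix and L a symmetric positive semidefinite cotangent stiffness matrix, both evaluated on the current surface x^n, and τ is the time step.

```latex
\begin{aligned}
M(x^n)\,\frac{x^{n+1}-x^{n}}{\tau} &= -L(x^n)\,x^{n+1} \\
\bigl(M(x^n) + \tau\,L(x^n)\bigr)\,x^{n+1} &= M(x^n)\,x^{n}
\end{aligned}
```

Because M is SPD and τL is symmetric positive semidefinite for τ > 0, each system matrix M + τL is SPD, so conjugate gradients applies. Since M and L change as the surface moves, every time step produces a new system, but with the same sparsity pattern.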

6
Conjugate Gradients

High-level code
  inner loop:
    matrix-vector multiply
    sum-reduction
    scalar-vector MAD (multiply-add)
Inner product
  fragment-wise multiply
  followed by sum-reduction
  odd dimensions can be handled
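A minimal NumPy sketch of the conjugate gradient loop, written so that each step maps onto one of the three primitives listed on this slide. The function names are hypothetical. `sum_reduce` models a GPU reduction by repeated pairwise halving and pads an odd length with one zero; on the GPU the same idea would be applied to a 2D texture rather than a 1D array.

```python
import numpy as np


def sum_reduce(v) -> float:
    """Sum-reduction by repeated pairwise halving (models a GPU reduction).

    Each pass adds the second half of the array onto the first half; an
    odd length is padded with one zero so the array can always be halved.
    """
    v = np.asarray(v, dtype=float)
    if v.size == 0:
        return 0.0
    while v.size > 1:
        if v.size % 2 == 1:
            v = np.append(v, 0.0)      # handle an odd dimension
        half = v.size // 2
        v = v[:half] + v[half:]
    return float(v[0])


def inner(a, b) -> float:
    # Inner product: fragment-wise multiply, then sum-reduction.
    return sum_reduce(np.asarray(a) * np.asarray(b))


def conjugate_gradient(matvec, b, x0, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for an SPD operator given as a matvec function."""
    x = np.array(x0, dtype=float)
    r = b - matvec(x)
    p = r.copy()
    rr = inner(r, r)
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        Ap = matvec(p)                 # matrix-vector multiply
        alpha = rr / inner(p, Ap)      # sum-reduction
        x = x + alpha * p              # scalar-vector MAD
        r = r - alpha * Ap             # scalar-vector MAD
        rr_new = inner(r, r)           # sum-reduction
        p = r + (rr_new / rr) * p      # scalar-vector MAD
        rr = rr_new
    return x


A = np.array([[4.0, 1.0], [1.0, 3.0]])  # small SPD test matrix
b = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, b, np.zeros(2))
```

Passing the matrix as a `matvec` function keeps the solver independent of how A is stored, which matches the GPU setting where the product is a separate fragment-program pass.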

7
y = Ax
A_j: off-diagonal matrix elements
R: pointers to segments
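A sketch of how the arrays named on this slide might be built, assuming a compressed-row layout with the diagonal A_i kept separately from the off-diagonal entries A_j. `pack_rows` is a hypothetical helper, and a dense NumPy matrix is used as input only for brevity; on the GPU these arrays would be stored in textures.

```python
import numpy as np


def pack_rows(A: np.ndarray):
    """Split a square matrix into the arrays used for y = Ax (hypothetical layout).

    Returns (Ai, Aj, J, R):
      Ai: diagonal entries, one per row
      Aj: non-zero off-diagonal entries, stored row after row
      J:  column index of each Aj entry, i.e. a pointer to x_j
      R:  R[i]..R[i+1] bounds the segment of Aj/J that belongs to row i
    """
    n = A.shape[0]
    Ai = np.diag(A).copy()
    Aj, J, R = [], [], [0]
    for i in range(n):
        for j in range(n):
            if j != i and A[i, j] != 0.0:
                Aj.append(A[i, j])
                J.append(j)
        R.append(len(Aj))              # end of row i's segment
    return Ai, np.array(Aj, dtype=float), np.array(J, dtype=int), np.array(R, dtype=int)


A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
Ai, Aj, J, R = pack_rows(A)
```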
8
Row-Vector Product
X: vector elements
J: pointers to x_j
R: pointers to segments
Fragment program
A_j: off-diagonal matrix elements
A_i: diagonal matrix elements
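The per-row fragment program can be mimicked in Python, assuming the arrays follow the layout named on this slide: A_i holds the diagonal, A_j the off-diagonal entries, J the column pointers to x_j, and R the segment bounds. `row_times_vector` and `spmv` are hypothetical names, and the arrays are filled in by hand for a small 3x3 matrix.

```python
import numpy as np


def row_times_vector(i, Ai, Aj, J, R, X):
    """Value one 'fragment' writes for row i: A_ii x_i plus row i's off-diagonal terms."""
    acc = Ai[i] * X[i]
    for k in range(R[i], R[i + 1]):    # walk row i's segment of off-diagonals
        acc += Aj[k] * X[J[k]]         # J[k] points at x_j
    return acc


def spmv(Ai, Aj, J, R, X):
    # Run the per-row program for every row, as if for every pixel.
    return np.array([row_times_vector(i, Ai, Aj, J, R, X) for i in range(len(Ai))])


# Hand-built arrays for the matrix [[4, -1, 0], [-1, 4, -1], [0, -1, 4]].
Ai = np.array([4.0, 4.0, 4.0])
Aj = np.array([-1.0, -1.0, -1.0, -1.0])
J = np.array([1, 0, 2, 1])
R = np.array([0, 1, 3, 4])
X = np.array([1.0, 2.0, 3.0])
y = spmv(Ai, Aj, J, R, X)
```

Rows with longer segments run more loop iterations, which is why the cost of processing many rows together depends on the longest row.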
9
Apply to All Pixels

Two extremes
  one row at a time: setup overhead
  all rows at once: limited by the worst row
Middle ground
  organize batches of work
How to arrange batches?
  order rows by number of non-zero entries
  optimal packing is NP-hard
We choose fixed-size rectangles
  the fragment pipe is quantized
  simple experiments reveal the best size
  26 x 18: 91% efficient
  wasted fragments on the diagonal
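A toy cost model, which is an assumption and not the authors' measurement, shows why the middle ground can beat both extremes. In this model each batch pays a fixed setup cost, and every row in a batch costs as much as the batch's longest row. `batch_cost`, `efficiency`, `setup`, and the row counts are all hypothetical; the 26 x 18 size and the 91% figure come from hardware experiments that this model does not reproduce.

```python
def batch_cost(nnz_per_row, batch_rows, setup=0):
    """Hypothetical cost of processing rows in fixed-size batches.

    Rows are sorted by non-zero count and cut into batches of `batch_rows`.
    Each batch pays `setup`, plus batch_rows * (its longest row), because
    every fragment in the batch runs as long as the worst row; a partial
    last batch is still charged as a full batch.
    """
    rows = sorted(nnz_per_row, reverse=True)
    cost = 0
    for start in range(0, len(rows), batch_rows):
        worst = rows[start]            # sorted descending: first row is longest
        cost += setup + batch_rows * worst
    return cost


def efficiency(nnz_per_row, batch_rows, setup=0):
    # Useful work (total non-zeros) divided by the modeled cost.
    return sum(nnz_per_row) / batch_cost(nnz_per_row, batch_rows, setup)


nnz = [8, 7, 7, 6, 5, 5, 4, 3]         # made-up non-zero counts per row
for size in (1, 4, 8):                  # one row at a time, middle ground, all rows
    print(size, round(efficiency(nnz, size, setup=10), 3))
```

With `setup=10`, single-row batches reach 0.36, batches of four rows 0.625, and one batch of all eight rows about 0.61: setup overhead dominates at one extreme, and the worst row dominates at the other.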

10
Packing (Greedy)

Figure: rows packed greedily into rectangles, each labeled with its number of non-zero entries per row
All this setup is done only once, at the beginning of the computation, since it depends only on mesh connectivity.
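One plausible reading of the greedy step, sketched in Python under stated assumptions: rows are sorted by non-zero count and poured into consecutive width x height rectangles, and each row receives a fixed pixel position. `pack_greedy` is a hypothetical helper. Because the result depends only on the non-zero counts, that is, on mesh connectivity, it would be computed once and reused for every solve.

```python
def pack_greedy(nnz_per_row, width, height):
    """Greedily assign rows to fixed-size rectangles of width x height pixels.

    Rows are visited in order of decreasing non-zero count (ties keep their
    original order) and poured into the current rectangle until it is full.
    Returns a list of rectangles, each with its row indices and its maximum
    non-zero count, plus a dict mapping each row to (rectangle, x, y).
    """
    order = sorted(range(len(nnz_per_row)), key=lambda i: -nnz_per_row[i])
    per_rect = width * height
    rects = []
    position = {}
    for slot, row in enumerate(order):
        r, offset = divmod(slot, per_rect)
        if r == len(rects):
            # First row placed in a new rectangle is its longest row.
            rects.append({"rows": [], "max_nnz": nnz_per_row[row]})
        rects[r]["rows"].append(row)
        position[row] = (r, offset % width, offset // width)
    return rects, position


rects, position = pack_greedy([4, 9, 7, 9, 5, 8], width=2, height=1)
```

With real mesh data the call would use the measured rectangle size, e.g. `pack_greedy(nnz, 26, 18)`.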
11
Recomputing Matrix