Sequential vs. Parallel solvers for systems of linear equations

- PhD student: Taras Grytsenko
- Supervisor: Dr. Andres Peratta
- Wessex Institute of Technology, March 2006

Outline

- Introduction
- Parallel solver structure
- Krylov subspace methods
- Packages
- Matrix partitioning
- Data formats
- Parallel platforms
- Software structure
- Test results
- Conclusions and future work

Applications

- Solving very large systems of linear equations is central to many numerical simulations, and is the most time-consuming part of the computation
- Discretisation (and linearisation) of partial differential equations of elliptic and parabolic type
- Design and computer analysis of:
- circuits
- power system networks
- chemical engineering processes
- macroeconomics models
- queuing systems

- Example of the results

[Diagram. Labels: input formats (Harwell-Boeing, CSR, Coordinate format, MSR, VBR); Modified Mondriaan; Internal mechanism; Extended Aztec library]

About the solvers

- There are very simple solvers for systems of linear equations (SOLE, SLES), such as Cramer's rule, but these direct methods can only produce an exact solution.
- What if we can't find an exact solution?
- Iterative methods. In this case we can solve an n-dimensional problem Ax = b up to a residual of ||Ax - b|| < eps * ||b||
- where
- A is the matrix
- x, b are vectors
- eps is the stopping criterion (tolerance).
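The stopping criterion above can be illustrated with a minimal sketch. It uses Jacobi iteration, which is chosen here only because it is short; it is not one of the methods listed in these slides. The function names (`jacobi`, `matvec`, `norm2`) and the small test matrix are assumptions made for illustration, and b is assumed to be nonzero.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))

def jacobi(A, b, eps=1e-8, max_iter=1000):
    """Iterate until ||Ax - b|| < eps * ||b|| (assumes nonzero diagonal and b != 0)."""
    n = len(b)
    x = [0.0] * n
    b_norm = norm2(b)
    for k in range(max_iter):
        # Residual r = b - Ax for the current iterate
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if norm2(r) < eps * b_norm:
            return x, k
        # Jacobi update: x <- x + D^{-1} r, with D the diagonal of A
        x = [x[i] + r[i] / A[i][i] for i in range(n)]
    raise RuntimeError("no convergence within max_iter iterations")

# Illustrative, diagonally dominant 2x2 system (exact solution: 13/9, 29/9)
A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 19.0]
x, iterations = jacobi(A, b)
```

A smaller eps gives a more accurate answer at the cost of more iterations; the returned iteration count makes that trade-off visible.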

Krylov subspace methods

- CG: conjugate gradient (not yet implemented)
- GMRES: restarted generalised minimal residual
- CGS: conjugate gradient squared
- TFQMR: transpose-free quasi-minimal residual
- BICGSTAB: bi-conjugate gradient with stabilisation
- These methods have sequential as well as parallel implementations.
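As an illustration of how a Krylov subspace method proceeds, here is a minimal sequential sketch of the textbook conjugate gradient iteration. It is not the Aztec implementation; it assumes a small dense symmetric positive definite matrix stored as a list of rows, a nonzero right-hand side, and the same relative residual test as before. All names are chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, eps=1e-10, max_iter=None):
    """Textbook CG for symmetric positive definite A (assumes b != 0)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)            # residual b - Ax for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if math.sqrt(rs_new) < eps * b_norm:
            return x, k
        beta = rs_new / rs_old               # makes the new direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    raise RuntimeError("no convergence within max_iter iterations")

# Illustrative SPD system (exact solution: 1/11, 7/11)
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = conjugate_gradient(A, b)
```

Each iteration needs one matrix-vector product and a few dot products, which is why the cost of the sparse matrix-vector product dominates in the parallel setting.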

Different solvers

- Why do we have so many solvers instead of a single universal one?
- CG (conjugate gradient) is only applicable to symmetric positive definite matrices
- BICGSTAB (bi-conjugate gradient with stabilisation) solves not only the original system Ax = b but also a dual linear system A^T x = b
- LU is a general direct solver for non-singular sparse systems
- In principle, a matrix could be:
- Symmetric, Unsymmetric, Hermitian, Skew-symmetric, Rectangular

List of the libraries

- Aztec
- http://www.cs.sandia.gov/CRF/aztec1.html
- BlockSolve95
- http://www.mcs.anl.gov/blocksolve95/
- BPKIT2
- http://www.cs.umn.edu/Echow/bpkit.html/
- IML
- http://math.nist.gov/iml/
- Itpack 2C / ItpackV 2D
- ftp://ftp.ma.utexas.edu/pub/CNA/ITPACK
- Laspack
- http://www.math.tu-dresden.de/skalicky/laspack/index.html
- ParPre
- http://www.cs.utk.edu/eijkhout/
- and about 20-30 more

Comparison chart of features

About partitioning

- For large-scale computing, the calculation of Ax = b is very time-consuming when computed without paying particular attention to data structure and computer architecture.
- Partitioning the original problem into blocks
- Distributing them to different processors and performing the computation in parallel.
- Criteria
- Should partition the vectors x and b and the nonzero entries of A
- Each block should contain almost the same number of entries and be as independent as possible from other blocks
- When the blocks are distributed, communication between them should be low.
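The two criteria (balanced blocks, low communication) can be made concrete with a small sketch. It uses a simple contiguous row-block partition, which is much cruder than the hypergraph-based partitioners discussed in these slides; it only shows what is being balanced and what is being counted. The function names, the tridiagonal test matrix, and the assumption that x is distributed in the same way as the rows are all illustrative.

```python
def partition_rows(nnz_per_row, p):
    """Split rows into p contiguous blocks with roughly equal nonzero counts."""
    n = len(nnz_per_row)
    total = sum(nnz_per_row)
    bounds = [0]
    running = 0
    for i, nnz in enumerate(nnz_per_row):
        running += nnz
        # Close a block each time the running count reaches the next share of the total
        while len(bounds) < p and running >= total * len(bounds) / p:
            bounds.append(i + 1)
    # Pad so that there are always exactly p blocks (some may be empty)
    bounds += [n] * (p + 1 - len(bounds))
    return [(bounds[k], bounds[k + 1]) for k in range(p)]

def communication_volume(cols_per_row, blocks):
    """Count the x entries each block needs from other blocks for y = Ax."""
    volume = 0
    for lo, hi in blocks:
        needed = {j for i in range(lo, hi)
                  for j in cols_per_row[i] if not lo <= j < hi}
        volume += len(needed)
    return volume

# Illustrative 8x8 tridiagonal matrix, described by its nonzero column indices per row
n = 8
cols_per_row = [[j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)]
nnz_per_row = [len(cols) for cols in cols_per_row]

blocks = partition_rows(nnz_per_row, 2)
nnz_per_block = [sum(nnz_per_row[lo:hi]) for lo, hi in blocks]
volume = communication_volume(cols_per_row, blocks)
```

For this banded example the two blocks hold 11 nonzeros each and only the two entries next to the cut must be exchanged; for matrices without such structure a contiguous split can communicate far more, which is the motivation for dedicated partitioners.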

Matrix partitioning

- Block distribution of a 59x59 matrix with 312 nonzeros, for p = 4
- Nonzeros per processor: 76, 76, 80, 80
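The balance of this distribution can be checked with a few lines of arithmetic. The sketch below uses one common load-imbalance measure, the largest block size relative to the average; the function name is an assumption made for illustration.

```python
def load_imbalance(nnz_per_processor):
    """Relative excess of the most loaded processor over the average load."""
    average = sum(nnz_per_processor) / len(nnz_per_processor)
    return max(nnz_per_processor) / average - 1.0

nonzeros = [76, 76, 80, 80]        # figures from the slide
total = sum(nonzeros)              # 312, matching the matrix
imbalance = load_imbalance(nonzeros)
```

Here the average is 78 nonzeros per processor, so the most loaded processor carries about 2.6% more than its fair share.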

Popular partitioners

- PaToH
- hMeTiS
- These partitioners use the hypergraph model to partition the initial matrix.
- Mondriaan
- It partitions both the matrix A and the vectors x and b. It partitions both the rows and the columns of the matrix. Mondriaan implements a recursive bipartitioning algorithm.
- Monet
- Monet is used to partition and reorder an unsymmetric matrix into singly bordered block diagonal (SBBD) form and is commonly used with direct methods such as the LU method. It implements a multilevel recursive bisection algorithm and the Kernighan-Lin refinement method.

About the data formats

- If we store a sparse matrix without modification, we waste memory on zero entries, which carry no information. That is why several advanced data formats have been proposed to avoid storing zero entries.
- Coordinate format
- CSR (compressed sparse row)
- VBR (variable block row)
- MSR (modified sparse row)
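To show how two of these formats relate, here is a minimal sketch that converts a matrix from coordinate format (a list of row, column, value triples) to CSR and then multiplies it by a vector. It is not the converter described in these slides; the function names and the 3x3 example are assumptions made for illustration, and duplicate entries are assumed not to occur.

```python
def coo_to_csr(n_rows, entries):
    """Convert coordinate triples (row, col, value) to CSR arrays."""
    # Count nonzeros per row, then turn the counts into row start offsets
    row_ptr = [0] * (n_rows + 1)
    for i, _, _ in entries:
        row_ptr[i + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]
    col_idx = [0] * len(entries)
    values = [0.0] * len(entries)
    next_slot = row_ptr[:-1]           # next free position in each row (a copy)
    for i, j, v in entries:
        k = next_slot[i]
        col_idx[k] = j
        values[k] = v
        next_slot[i] += 1
    return row_ptr, col_idx, values

def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = Ax touching only the stored nonzeros."""
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]

# Illustrative matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in coordinate format, any order
entries = [(0, 0, 4.0), (2, 2, 5.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 2.0)]
row_ptr, col_idx, values = coo_to_csr(3, entries)
y = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Only the five nonzeros are stored, plus one offset per row; the four zero entries of the dense matrix take no memory and no arithmetic.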

Parallel platforms

- MPI (C, Fortran)
- HPF (new language)
- OpenMP (directives)
- DVM (C-DVM, Fortran-DVM)
- Test: NPB 2.3 (BT, CG, FT, LU, MG, SP), class ?
- Speedup = Ts / Tp
- Efficiency = Speedup / Np
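The two metrics can be written out as a short sketch, reading Ts as the sequential run time, Tp as the parallel run time, and Np as the number of processors. The timings in the example are hypothetical values chosen for illustration, not measurements from these tests.

```python
def speedup(t_seq, t_par):
    """Speedup = Ts / Tp."""
    return t_seq / t_par

def efficiency(t_seq, t_par, n_procs):
    """Efficiency = Speedup / Np."""
    return speedup(t_seq, t_par) / n_procs

# Hypothetical timings in seconds, for illustration only
s = speedup(120.0, 40.0)
e = efficiency(120.0, 40.0, 4)
```

With these made-up numbers the run is 3 times faster on 4 processors, so each processor is used at 75% efficiency.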

MPI time / HPF time on Origin 2000

MPI time / OpenMP+MPI time on IBM SP

MPI time / DVM time on MVS-1000m

MPI speedup on Origin 2000

MPI Parallel Efficiency on Origin 2000

Test results

Conclusions

- Developed a test platform based on MPI and the Aztec library, so that new ideas can be tested and compared with existing methods
- Developed a converter between different matrix storage formats (Harwell-Boeing, CSR, Coordinate format, MSR, VBR). This makes it possible to use matrices from different sources
- Investigated existing partitioning schemes and algorithms. Adopted one of them (Mondriaan) for our test platform
- Compared popular Krylov subspace methods for solving SLES.

Future work

- We are working on a new solver based on hierarchical matrices, which promises to be a very efficient solver of linear complexity for the kind of matrices we are dealing with, i.e. those arising from BEM and FEM
- To make the necessary corrections to the existing software and test it on a 4-processor cluster
- To develop an approach for analysing the initial information about the matrix structure and hardware platform, and for deciding which partitioner, preconditioner and solver to apply in each case

