Title: Multigrid on 2:1 Balance-constrained Octrees for Finite Element Calculations with Billions of Unknowns
1. Multigrid on 2:1 Balance-constrained Octrees for Finite Element Calculations with Billions of Unknowns
- Rahul S. Sampath
- George Biros
- 13th SIAM Conference on Parallel Processing for Scientific Computing
- 14th March, 2008
2. Acknowledgements
- Department of Energy
- DE-FG02-04ER25646
- National Science Foundation
- CCF, CNS, DMS, OCI
- TeraGrid/PSC/NCSA
- MCA04T026
- ASC070050N
- PETSc
- SuperLU
3. Outline
- Motivation and Review of Octrees
- FEM on Octrees (DENDRO)
- Handling Hanging Nodes
- Geometric Multigrid (DENDRO-M)
- Global Coarsening
- Inter-grid Transfer Operations
- Scalability Results
4. Motivation and Review of Octrees
5. Adaptivity vs. Simplicity
- Structured grids
- Simple, Fast, Limited Adaptivity
- Generic Unstructured Meshes
- Very Flexible, Bulky, Require a lot of Memory
- Octrees: Good balance between the two approaches
- Allow local refinements
- Support matrix-free implementations
6. Linear Octree Data Structure
- Tree data structure used to store hierarchical information
- Binary trees: 1D, Quadtrees: 2D, Octrees: 3D
- It is sufficient to store only the leaves: Linear Octrees
- Leaves can serve as elements of a finite element mesh
- Morton ordering (pre-order traversal): a way to sort leaves
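The Morton ordering mentioned above can be sketched as follows. This is only an illustrative sketch, not the DENDRO implementation: the octant layout `(x, y, z, level)`, the constant `MAX_DEPTH`, and the function names are assumptions made for this example. Each octant is identified by its anchor (lowest corner) in integer coordinates, and the key is obtained by interleaving the bits of the anchor.

```python
MAX_DEPTH = 3  # hypothetical maximum depth; coordinates lie in [0, 2**MAX_DEPTH)

def morton_key(x, y, z):
    """Interleave the bits of (x, y, z); x supplies the lowest bit of each triple."""
    key = 0
    for bit in range(MAX_DEPTH):
        key |= ((x >> bit) & 1) << (3 * bit)
        key |= ((y >> bit) & 1) << (3 * bit + 1)
        key |= ((z >> bit) & 1) << (3 * bit + 2)
    return key

def morton_sort(octants):
    """Sort octants (x, y, z, level) by anchor key, then by level.

    Breaking ties by level places an ancestor before its descendants,
    which matches a pre-order traversal of the tree.
    """
    return sorted(octants, key=lambda o: (morton_key(o[0], o[1], o[2]), o[3]))

# Three level-2 leaves (edge length 2) and one level-1 leaf (edge length 4).
leaves = [(4, 0, 0, 1), (0, 0, 0, 2), (2, 0, 0, 2), (0, 2, 0, 2)]
ordered = morton_sort(leaves)
```

Because leaves never share an anchor, the level tie-break only matters when interior octants are also present in the list.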
7. Example of an Octree Mesh
8. Finite Elements on Octrees
- 2:1 Balance Condition
- Handling Hanging Nodes
9. 2:1 Balance Constraint
- Adjacent octants must not differ by more than 1 level
- A kind of smoothing
- Inherently iterative process: Ripple effect
- 1 split → cascade of splits across multiple processors
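The ripple effect can be illustrated on a 1D analogue (a linear binary tree). The sketch below is a naive, sequential, repeat-until-stable balancing loop written for illustration only; it is not the hybrid or parallel algorithm referred to in this talk. The leaf layout `(anchor, level)`, `MAX_DEPTH`, and all function names are assumptions of this example.

```python
MAX_DEPTH = 6  # hypothetical maximum depth; the domain is [0, 2**MAX_DEPTH)

def size(level):
    """Length of a leaf at the given level."""
    return 1 << (MAX_DEPTH - level)

def split(leaf):
    """Replace a leaf by its two children."""
    anchor, level = leaf
    return [(anchor, level + 1), (anchor + size(level + 1), level + 1)]

def is_balanced(leaves):
    """True if adjacent leaves differ by at most one level."""
    return all(abs(a[1] - b[1]) <= 1 for a, b in zip(leaves, leaves[1:]))

def balance(leaves):
    """Split leaves until the 2:1 constraint holds; return (leaves, passes)."""
    leaves = sorted(leaves)
    passes = 0
    while True:
        out, changed = [], False
        last = len(leaves) - 1
        for i, leaf in enumerate(leaves):
            level = leaf[1]
            left = leaves[i - 1][1] if i > 0 else level
            right = leaves[i + 1][1] if i < last else level
            if max(left, right) - level > 1:  # a neighbour is too fine
                out.extend(split(leaf))
                changed = True
            else:
                out.append(leaf)
        leaves = out
        if not changed:
            return leaves, passes
        passes += 1

# One coarse leaf next to a finely refined region: each split exposes a new violation.
unbalanced = [(0, 1), (32, 5), (34, 5), (36, 4), (40, 3), (48, 2)]
balanced, passes = balance(unbalanced)
```

In this example the single coarse leaf has to be split in three successive passes before the constraint holds, which is the 1D version of one split triggering a cascade.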
10. Example of the Ripple Effect
11. Some References for 2:1 Balancing
- Past Approaches
- Search free approach
- M. W. Bern, D. Eppstein, S.-H. Teng, 1999
- Prioritized ripple propagation (PRP)
- Tiankai Tu, D. R. O'Hallaron, O. Ghattas, 2005
- Our Approach
- Hybrid Balancing Algorithm
- H. Sundar, R. S. Sampath, G. Biros, 2007
12. Handling Hanging Nodes
- Nodes at the center of faces and edges
- Do not represent independent degrees of freedom
- Mapped to the parent's nodes
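One way to read "mapped to the parent's nodes" is that the value at a hanging node is interpolated from the vertices of the parent octant. The sketch below assumes trilinear shape functions on a unit reference cube, which the slides do not state explicitly; the function names and the vertex labelling by `(i, j, k)` are hypothetical.

```python
from itertools import product

def trilinear_weights(x, y, z):
    """Non-zero trilinear shape function values at (x, y, z) in the unit cube.

    Keys are parent vertices (i, j, k) with i, j, k in {0, 1}.
    """
    weights = {}
    for i, j, k in product((0, 1), repeat=3):
        wx = x if i else 1.0 - x
        wy = y if j else 1.0 - y
        wz = z if k else 1.0 - z
        w = wx * wy * wz
        if w != 0.0:
            weights[(i, j, k)] = w
    return weights

def hanging_value(parent_values, point):
    """Interpolate parent vertex values at a hanging node location."""
    return sum(w * parent_values[v] for v, w in trilinear_weights(*point).items())

edge_weights = trilinear_weights(0.5, 0.0, 0.0)  # node at the middle of an edge
face_weights = trilinear_weights(0.5, 0.5, 0.0)  # node at the center of a face
```

Under this assumption an edge-hanging node depends on the two end points of the parent edge with weight 1/2 each, and a face-hanging node depends on the four corners of the parent face with weight 1/4 each.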
13. Geometric Multigrid on Octrees
- Coarsening
- Inter-grid Transfers
- V-cycle Schedule
- R/P Matvec
14. Global Coarsening
- Requires 2:1 balancing at each level
- Regular coarse nodes are preserved in all finer levels
- Results in a sequence of nested finite element spaces
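A minimal sketch of one coarsening pass, under the assumption that coarsening replaces every complete family of sibling leaves by its parent and leaves all other octants unchanged. It is written dimension-independently and demonstrated on a small quadtree; the leaf layout `(anchor, level)`, `MAX_DEPTH`, and the function names are hypothetical. The 2:1 balancing that must follow each pass is not shown here.

```python
from collections import defaultdict

MAX_DEPTH = 2  # hypothetical maximum depth; coordinates lie in [0, 2**MAX_DEPTH)

def parent(octant):
    """Parent of an octant given as (anchor_tuple, level), level >= 1."""
    anchor, level = octant
    parent_size = 1 << (MAX_DEPTH - level + 1)
    return (tuple(c - c % parent_size for c in anchor), level - 1)

def coarsen(leaves):
    """Replace each complete set of sibling leaves by their parent."""
    dim = len(leaves[0][0])
    families = defaultdict(list)
    for leaf in leaves:
        if leaf[1] > 0:  # the root has no parent
            families[parent(leaf)].append(leaf)
    merged, result = set(), []
    for par, kids in families.items():
        if len(kids) == 2 ** dim:  # all siblings are leaves
            result.append(par)
            merged.update(kids)
    result.extend(leaf for leaf in leaves if leaf not in merged)
    return sorted(result)

# Quadtree: one quadrant refined to level 2, the other three at level 1.
fine = [((0, 0), 2), ((1, 0), 2), ((0, 1), 2), ((1, 1), 2),
        ((2, 0), 1), ((0, 2), 1), ((2, 2), 1)]
coarser = coarsen(fine)
coarsest = coarsen(coarser)
```

Because octants are only ever replaced by their parents, every vertex of the coarser tree is also a vertex of the finer tree, which is consistent with the nested sequence of grids described above.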
15. Inter-grid Transfer Operations
- $P : V_{k-1} \to V_k$ (Prolongation)
- $P v = v \quad \forall v \in V_{k-1} \subset V_k$
- $P(i, j) = \phi_j^{k-1}(p_i)$
- $R : V_k \to V_{k-1}$ (Restriction)
- $R = P^T$
- Need to identify the regular fine grid nodes within the support of each coarse grid shape function
- Need to align the coarse and fine octrees
- Perform Matvecs using pre-computed stencils
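To make the definition of P concrete, here is a small 1D analogue in which entry (i, j) of P is the value of coarse shape function j at fine node i, and R is the transpose of P. It assumes uniform grids on [0, 1] and linear (hat) shape functions, and it builds P as an explicit dense matrix purely for illustration; the talk instead applies these operators through pre-computed stencils. All function names are hypothetical.

```python
def hat(j, x, h):
    """Coarse 1D linear shape function centred at the node j * h."""
    return max(0.0, 1.0 - abs(x - j * h) / h)

def prolongation(n_coarse):
    """Dense P with P[i][j] = phi_j(p_i) for n_coarse coarse elements on [0, 1]."""
    h = 1.0 / n_coarse
    fine_nodes = [i * h / 2 for i in range(2 * n_coarse + 1)]
    return [[hat(j, p, h) for j in range(n_coarse + 1)] for p in fine_nodes]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

P = prolongation(2)   # 3 coarse nodes, 5 fine nodes
R = transpose(P)      # restriction as the transpose of prolongation

coarse_linear = [0.0, 0.5, 1.0]          # nodal values of v(x) = x on the coarse grid
fine_linear = matvec(P, coarse_linear)   # the same function on the fine grid
```

Prolonging the coarse nodal values of v(x) = x reproduces v at the fine nodes, which is the 1D version of the property P v = v for every coarse-space function.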
16. V-cycle Schedule
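The structure of a V-cycle can be sketched on a 1D model problem. This is a generic textbook-style sketch, not the DENDRO-M implementation: it assumes a 1D linear finite element discretization of -u'' = f with zero boundary values, a damped Jacobi smoother, matrix-free operator application, restriction as the transpose of linear interpolation, and an exact solve on the one-unknown coarsest grid. All names and parameter values are choices made for this example.

```python
def apply_A(u, h):
    """Matrix-free stiffness matvec: (1/h) * tridiag(-1, 2, -1) on interior nodes."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / h)
    return out

def smooth(u, b, h, sweeps=2, omega=2.0 / 3.0):
    """Damped Jacobi; the diagonal of A is 2 / h."""
    for _ in range(sweeps):
        Au = apply_A(u, h)
        u = [ui + omega * (bi - ai) * h / 2.0 for ui, bi, ai in zip(u, b, Au)]
    return u

def restrict(r):
    """R = P^T: coarse node j gathers fine interior nodes 2j, 2j+1, 2j+2."""
    return [0.5 * r[2 * j] + r[2 * j + 1] + 0.5 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]

def prolong(e, n_fine):
    """Linear interpolation of a coarse correction to the fine grid."""
    u = [0.0] * n_fine
    for j, ej in enumerate(e):
        u[2 * j] += 0.5 * ej
        u[2 * j + 1] += ej
        u[2 * j + 2] += 0.5 * ej
    return u

def v_cycle(u, b, h):
    n = len(u)
    if n == 1:
        return [b[0] * h / 2.0]            # exact solve on the coarsest grid
    u = smooth(u, b, h)                    # pre-smoothing
    r = [bi - ai for bi, ai in zip(b, apply_A(u, h))]
    e = v_cycle([0.0] * ((n - 1) // 2), restrict(r), 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(e, n))]
    return smooth(u, b, h)                 # post-smoothing

n, h = 31, 1.0 / 32
b = [h] * n                                # load vector for f = 1
u = [0.0] * n
for _ in range(10):
    u = v_cycle(u, b, h)
```

For f = 1 the exact solution is u(x) = x(1 - x)/2, so the value at the midpoint should approach 0.125 as the cycles are repeated.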
17. Restriction/Prolongation Matvec
- Coarse and fine grids share the same partition
- Loop over coarse and fine grid elements simultaneously
- Choose from the various pre-computed stencils
- Child number of the coarse octant
- Child number of the underlying fine octant
- Hanging configuration of the coarse octant
- Two or more elements can share the same vertices
- Dummy matvec to identify and store these cases
- Special data structures (masks) to avoid repetitions
- Only 8 bytes per fine grid node
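The three quantities used to choose a stencil can be combined into a lookup key. The sketch below shows one possible way to do this and is an assumption throughout: the octant layout `(x, y, z, level)`, the Morton-style child numbering, `MAX_DEPTH`, the integer `hanging_mask` encoding, and both function names are hypothetical, and the slides do not describe how the stencil table is actually indexed.

```python
MAX_DEPTH = 4  # hypothetical maximum depth; coordinates lie in [0, 2**MAX_DEPTH)

def child_number(x, y, z, level):
    """Index (0..7) of an octant among its parent's children (level >= 1).

    Uses the anchor bit that distinguishes siblings at this level,
    with x as the lowest bit (Morton-style numbering).
    """
    shift = MAX_DEPTH - level
    return (((x >> shift) & 1)
            | (((y >> shift) & 1) << 1)
            | (((z >> shift) & 1) << 2))

def stencil_key(coarse, fine, hanging_mask):
    """Key into a hypothetical table of pre-computed stencils.

    hanging_mask is assumed to encode which coarse vertices are hanging.
    """
    return (child_number(*coarse), child_number(*fine), hanging_mask)

key = stencil_key((8, 0, 0, 1), (12, 0, 4, 2), 0)
```

Since each component of the key takes only a small number of values, a table indexed this way stays small and can be computed once and reused for every element.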
18. Scalability Results
- Problem Description
- Fixed Size
- Iso-granular
- Comparison with BoomerAMG
19. 3-D, Scalar, Linear, Elliptic Problem
20. Architecture and Software Details
- TeraGrid's NCSA Intel 64 Linux Cluster (Abe)
- 1200 Nodes, 9600 CPUs
- 2.33 GHz dual socket, quad core processor
- 2 MB L2 cache per core
- 8 GB/16 GB RAM per node
- Peak Performance: 89.47 Tflops
- Libraries used: C++ STL, MPI, PETSc, SuperLU_DIST
21. Fixed Size (Strong) Scalability
22. Iso-granular (Weak) Scalability (0.25M elements/processor)
23. Comparison with BoomerAMG (Hypre) (60K elements/processor)
24. Future Work
- Non-linear Problems
- Full Approximation Schemes
- Newton Multigrid
- Higher-order convergence
- Higher-order discretizations
25. Related Publications
- A parallel geometric multigrid method for finite elements on octree meshes
- (in review)
- Bottom-up construction and 2:1 balance refinement of linear octrees in parallel
- SISC, 2008 (to appear)
- Low-constant parallel algorithms for finite element simulations using linear octrees
- Supercomputing, November 2007
- Preprints
- www.seas.upenn.edu/rahulss
26. Questions?