Multigrid on 2:1 Balance-constrained Octrees for Finite Element Calculations with Billions of Unknowns

About This Presentation

Slides: 27

Provided by: rahuls

Transcript and Presenter's Notes

1
Multigrid on 2:1 Balance-constrained Octrees for
Finite Element Calculations with Billions of
Unknowns

Rahul S. Sampath
George Biros
13th SIAM Conference on Parallel Processing for
Scientific Computing
14th March, 2008

2
Acknowledgements

Department of Energy
DE-FG02-04ER25646
National Science Foundation
CCF, CNS, DMS, OCI
TeraGrid/PSC/NCSA
MCA04T026
ASC070050N
PETSc
SuperLU

3
Outline

Motivation and Review of Octrees
FEM on Octrees (DENDRO)
Handling Hanging Nodes
Geometric Multigrid (DENDRO-M)
Global Coarsening
Inter-grid Transfer Operations
Scalability Results

4
Motivation and Review of Octrees
5
Adaptivity vs. Simplicity

Structured grids
Simple, Fast, Limited Adaptivity
Generic Unstructured Meshes
Very Flexible, Bulky, Require a lot of Memory
Octrees: Good balance between the two approaches
Allow local refinements
Support matrix-free implementations

6
Linear Octree Data Structure

Tree data structure used to store hierarchical
information
Binary trees (1D), quadtrees (2D), octrees (3D)
It is sufficient to store only the leaves: linear
octrees
Leaves can serve as elements of a finite element
mesh
Morton ordering (pre-order traversal): a way to
sort leaves
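
The Morton ordering mentioned above can be illustrated with a short sketch. It assumes each leaf is stored as (x, y, z, level), where (x, y, z) is the integer anchor (the corner closest to the origin) on a grid of 2^max_depth cells per side, and that x varies fastest; the function names and this representation are illustrative choices, not taken from the slides.

```python
def morton_key(x, y, z, max_depth):
    """Interleave the bits of the integer anchor (x, y, z); x is the fastest index."""
    key = 0
    for bit in range(max_depth):
        key |= ((x >> bit) & 1) << (3 * bit)
        key |= ((y >> bit) & 1) << (3 * bit + 1)
        key |= ((z >> bit) & 1) << (3 * bit + 2)
    return key


def morton_sort(octants, max_depth):
    """Sort (x, y, z, level) octants by Morton key.

    Ties (same anchor) are broken by level, so an ancestor would come
    before its descendants, as in a pre-order traversal.
    """
    return sorted(
        octants,
        key=lambda o: (morton_key(o[0], o[1], o[2], max_depth), o[3]),
    )


# Hypothetical linear octree of depth 2: the first level-1 octant is refined
# into eight level-2 leaves, the remaining seven level-1 octants are leaves.
MAX_DEPTH = 2
fine = [(x, y, z, 2) for z in (0, 1) for y in (0, 1) for x in (0, 1)]
coarse = [(x, y, z, 1) for z in (0, 2) for y in (0, 2) for x in (0, 2)][1:]
leaves = morton_sort(coarse + fine, MAX_DEPTH)
```

After sorting, the eight fine leaves come first, followed by the seven coarse leaves in the order in which a pre-order traversal of the tree would visit them.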

7
Example of an Octree Mesh
8
Finite Elements on Octrees

2:1 Balance Condition
Handling Hanging Nodes

9
2:1 Balance Constraint

Adjacent octants must not differ by more than 1
level
A kind of smoothing
Inherently iterative process: ripple effect
1 split → cascade of splits across multiple
processors
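
The ripple effect can be reproduced in one dimension with a naive serial loop. This is only a sketch for illustration: it is not the hybrid balancing algorithm referenced in the talk, and the (anchor, level) leaf representation and function names are assumptions.

```python
def split(leaf, max_depth):
    """Split a 1D leaf (anchor, level) into its two children."""
    anchor, level = leaf
    half = 1 << (max_depth - level - 1)
    return [(anchor, level + 1), (anchor + half, level + 1)]


def balance_1d(leaves, max_depth):
    """Enforce the 2:1 constraint on a linear binary tree.

    Repeatedly finds two adjacent leaves whose levels differ by more
    than 1 and splits the coarser one. Returns the balanced leaves and
    the number of extra splits that were needed.
    """
    leaves = sorted(leaves)
    extra_splits = 0
    changed = True
    while changed:
        changed = False
        for i in range(len(leaves) - 1):
            a, b = leaves[i], leaves[i + 1]
            if abs(a[1] - b[1]) > 1:
                j = i if a[1] < b[1] else i + 1  # index of the coarser leaf
                leaves[j:j + 1] = split(leaves[j], max_depth)
                extra_splits += 1
                changed = True
                break  # rescan: the split may unbalance another neighbour
    return leaves, extra_splits


# Domain [0, 16), max depth 4. The tree with levels 3, 3, 2, 1 is balanced;
# refining the second leaf once breaks the constraint further to the right.
MAX_DEPTH = 4
tree = [(0, 3), (2, 3), (4, 2), (8, 1)]
refined = [(0, 3)] + split((2, 3), MAX_DEPTH) + [(4, 2), (8, 1)]
balanced, extra = balance_1d(refined, MAX_DEPTH)
```

Here a single user-requested split forces two further splits, each in a leaf farther from the original refinement. In a distributed octree those leaves may live on other processors, which is why balancing needs communication.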

10
Example of the Ripple Effect
11
Some References for 2:1 Balancing

Past Approaches
Search free approach
M. W. Bern, D. Eppstein, S.-H. Teng, 1999
Prioritized ripple propagation (PRP)
Tiankai Tu, D. R. O'Hallaron, O. Ghattas, 2005
Our Approach
Hybrid Balancing Algorithm
H. Sundar, R. S. Sampath, G. Biros, 2007

12
Handling Hanging Nodes

Nodes at the center of faces and edges
Do not represent independent degrees of freedom
Mapped to the parent's nodes
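
A small sketch of what mapping a hanging node to the parent's nodes can mean numerically. It assumes trilinear shape functions on the parent octant, which the slides do not state explicitly; the function names and the reference-cube coordinates are illustrative.

```python
def trilinear_weights(xi, eta, zeta):
    """Weights of the 8 corners of the reference cube [0, 1]^3 at a point."""
    weights = {}
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                wx = xi if i else 1.0 - xi
                wy = eta if j else 1.0 - eta
                wz = zeta if k else 1.0 - zeta
                weights[(i, j, k)] = wx * wy * wz
    return weights


def hanging_value(corner_values, point):
    """Value at a hanging node, interpolated from the parent's corner nodes."""
    w = trilinear_weights(*point)
    return sum(w[c] * corner_values[c] for c in w)


# Hypothetical nodal values on the parent octant: u = i + 2j + 4k.
corners = {(i, j, k): float(i + 2 * j + 4 * k)
           for i in (0, 1) for j in (0, 1) for k in (0, 1)}

edge_mid = hanging_value(corners, (0.5, 0.0, 0.0))     # mean of 2 edge nodes
face_centre = hanging_value(corners, (0.5, 0.5, 0.0))  # mean of 4 face nodes
```

Under this assumption an edge-hanging node takes the average of the two parent nodes on that edge and a face-hanging node takes the average of the four parent nodes on that face, so neither carries an independent degree of freedom.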

13
Geometric Multigrid on Octrees

Coarsening
Inter-grid Transfers
V-cycle Schedule
R/P Matvec

14
Global Coarsening

Requires 2:1 balancing at each level
Regular coarse nodes are preserved in all finer
levels
Results in a sequence of nested finite element
spaces

15
Inter-grid Transfer Operations

$P : V_{k-1} \to V_k$ (Prolongation)
$P v = v \quad \forall\, v \in V_{k-1} \subset V_k$
$P(i, j) = \phi_j^{k-1}(p_i)$
$R : V_k \to V_{k-1}$ (Restriction)
$R = P^T$
Need to identify the regular fine grid nodes
within the support of each coarse grid shape
function
Need to align the coarse and fine octrees
Perform Matvecs using pre-computed stencils
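
The definitions above can be checked on a one-dimensional analogue. The sketch assumes a uniform coarse grid with linear hat functions and a fine grid obtained by halving every element; the entry P(i, j) is the coarse shape function j evaluated at fine node i, and R is the transpose. Dense lists are used only for readability; the talk itself applies these operators as matrix-free matvecs with pre-computed stencils.

```python
def hat(j, x, H):
    """Linear shape function of coarse node j (at position j * H) evaluated at x."""
    return max(0.0, 1.0 - abs(x - j * H) / H)


def prolongation_matrix(n_coarse_elems, H=1.0):
    """P(i, j) = phi_j(p_i) for fine nodes p_i = i * H / 2."""
    n_coarse = n_coarse_elems + 1
    n_fine = 2 * n_coarse_elems + 1
    h = H / 2.0
    return [[hat(j, i * h, H) for j in range(n_coarse)] for i in range(n_fine)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


P = prolongation_matrix(2)  # 3 coarse nodes, 5 fine nodes
R = transpose(P)            # restriction as the transpose of prolongation

fine_from_coarse = matvec(P, [1.0, 2.0, 3.0])
coarse_from_fine = matvec(R, [1.0, 1.0, 1.0, 1.0, 1.0])
```

Prolonging the coarse nodal values 1, 2, 3 reproduces the same piecewise-linear function on the fine grid, which is the property P v = v for functions in the coarse space.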

16
V-cycle Schedule
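
A minimal V-cycle sketch on a one-dimensional model problem, -u'' = 1 with zero Dirichlet boundary values, discretized with linear finite elements on uniform nested grids. The weighted Jacobi smoother, the two pre- and post-smoothing sweeps, and all function names are assumptions made for illustration; the slides do not specify them. Restriction is the transpose of linear prolongation, consistent with R = P^T.

```python
def apply_A(u, h):
    """Matrix-free stiffness matvec: (1/h) * tridiag(-1, 2, -1), zero Dirichlet ends."""
    n = len(u)
    return [(2.0 * u[i]
             - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / h
            for i in range(n)]


def smooth(u, b, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi; the diagonal of A is 2 / h."""
    for _ in range(sweeps):
        Au = apply_A(u, h)
        u = [ui + omega * (bi - ai) * h / 2.0 for ui, bi, ai in zip(u, b, Au)]
    return u


def restrict(r):
    """R = P^T: coarse interior node j sits at fine interior index 2j + 1."""
    m = (len(r) - 1) // 2
    return [0.5 * r[2 * j] + r[2 * j + 1] + 0.5 * r[2 * j + 2] for j in range(m)]


def prolong(e, n_fine):
    """Linear interpolation of a coarse correction to the fine grid."""
    out = [0.0] * n_fine
    for j, ej in enumerate(e):
        out[2 * j] += 0.5 * ej
        out[2 * j + 1] += ej
        out[2 * j + 2] += 0.5 * ej
    return out


def v_cycle(u, b, h):
    n = len(u)
    if n == 1:
        return [b[0] * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, b, h, 2)                                   # pre-smoothing
    r = [bi - ai for bi, ai in zip(b, apply_A(u, h))]        # residual
    ec = v_cycle([0.0] * ((n - 1) // 2), restrict(r), 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]       # correction
    return smooth(u, b, h, 2)                                # post-smoothing


def solve(levels, cycles):
    n = 2 ** levels - 1          # interior nodes on the finest grid
    h = 1.0 / (n + 1)
    b = [h] * n                  # load vector for f = 1
    u = [0.0] * n
    for _ in range(cycles):
        u = v_cycle(u, b, h)
    return u, b, h
```

Calling solve(6, 10) runs ten V-cycles on a grid with 63 interior nodes; the midpoint value should approach 0.125, the exact solution x(1 - x)/2 evaluated at x = 0.5.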
17
Restriction/Prolongation Matvec

Coarse and fine grids share the same partition
Loop over coarse and fine grid elements
simultaneously
Choose from the various pre-computed stencils, based on:
Child number of the coarse octant
Child number of the underlying fine octant
Hanging configuration of the coarse octant
Two or more elements can share the same vertices
Dummy matvec to identify and store these cases
Special data structures (masks) to avoid
repetitions
Only 8 bytes per fine grid node
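
The child number used to select a stencil can be computed directly from an octant's anchor and level. The sketch below assumes the same (x, y, z, level) representation on a grid of 2^max_depth cells per side and an x-fastest Morton convention; both are illustrative assumptions rather than details given in the slides.

```python
def child_number(x, y, z, level, max_depth):
    """Index (0..7) of an octant among the eight children of its parent."""
    if level == 0:
        raise ValueError("the root octant has no parent")
    size = 1 << (max_depth - level)  # edge length of this octant in grid cells
    return (((x // size) & 1)
            | (((y // size) & 1) << 1)
            | (((z // size) & 1) << 2))


# Hypothetical octants in a tree of maximum depth 2.
MAX_DEPTH = 2
first_child = child_number(0, 0, 0, 1, MAX_DEPTH)
last_child = child_number(2, 2, 2, 1, MAX_DEPTH)
```

A stencil table could then be indexed by the tuple (coarse child number, fine child number, hanging configuration), matching the three criteria listed above.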

18
Scalability Results

Problem Description
Fixed Size
Iso-granular
Comparison with BoomerAMG

19
3-D, Scalar, Linear, Elliptic Problem
20
Architecture and Software Details

TeraGrid's NCSA Intel 64 Linux Cluster (Abe)
1200 Nodes, 9600 CPUs
2.33 GHz dual socket, quad core processor
2 MB L2 cache per core
8 GB/16 GB RAM per node
Peak performance: 89.47 Tflops
Libraries used: C++ STL, MPI, PETSc, SuperLU_DIST

21
Fixed Size (Strong) Scalability
22
Iso-granular (Weak) Scalability (0.25M
elements/processor)
23
Comparison with BoomerAMG (Hypre) (60K
elements/processor)
24
Future Work

Non-linear Problems
Full Approximation Schemes
Newton Multigrid
Higher-order convergence
Higher order discretizations

25
Related Publications

A parallel geometric multigrid method for finite
elements on octree meshes
(in review)
Bottom-up construction and 2:1 balance refinement
of linear octrees in parallel
SISC, 2008 (to appear)
Low-constant parallel algorithms for finite
element simulations using linear octrees
Supercomputing, November 2007
Preprints
www.seas.upenn.edu/rahulss

26
Questions?
