Multigrid on 2:1 Balance-constrained Octrees for Finite Element Calculations with Billions of Unknowns


1
Multigrid on 2:1 Balance-constrained Octrees for Finite Element Calculations with Billions of Unknowns
  • Rahul S. Sampath
  • George Biros
  • 13th SIAM Conference on Parallel Processing for
    Scientific Computing
  • 14th March, 2008

2
Acknowledgements
  • Department of Energy
      • DE-FG02-04ER25646
  • National Science Foundation
      • CCF, CNS, DMS, OCI
  • TeraGrid/PSC/NCSA
      • MCA04T026
      • ASC070050N
  • PETSc
  • SuperLU

3
Outline
  • Motivation and Review of Octrees
  • FEM on Octrees (DENDRO)
      • Handling Hanging Nodes
  • Geometric Multigrid (DENDRO-M)
      • Global Coarsening
      • Inter-grid Transfer Operations
  • Scalability Results

4
Motivation and Review of Octrees
5
Adaptivity vs. Simplicity
  • Structured grids
      • Simple and fast, but limited adaptivity
  • Generic unstructured meshes
      • Very flexible, but bulky and memory-intensive
  • Octrees: a good balance between the two approaches
      • Allow local refinement
      • Support matrix-free implementations

6
Linear Octree Data Structure
  • Tree data structure used to store hierarchical information
  • Binary trees (1D), quadtrees (2D), octrees (3D)
  • It is sufficient to store only the leaves: linear octrees
  • Leaves can serve as elements of a finite element mesh
  • Morton ordering (pre-order traversal): a way to sort the leaves
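To make Morton ordering concrete, here is a minimal Python sketch, assuming each leaf is stored as an integer anchor (x, y, z) on a 2**MAX_DEPTH grid plus its level. MAX_DEPTH, Octant, morton_key, children and sort_leaves are hypothetical names for illustration, not DENDRO's API. Interleaving the anchor bits gives a key whose sorted order matches a pre-order (depth-first) traversal of the tree.

```python
from typing import List, NamedTuple

MAX_DEPTH = 3  # hypothetical maximum level; anchor coordinates lie in [0, 2**MAX_DEPTH)


class Octant(NamedTuple):
    x: int
    y: int
    z: int
    level: int  # 0 is the root; a level-l octant has side length 2**(MAX_DEPTH - l)


def morton_key(o: Octant) -> int:
    """Interleave the anchor bits as ... z1 y1 x1 z0 y0 x0."""
    key = 0
    for bit in range(MAX_DEPTH):
        key |= ((o.x >> bit) & 1) << (3 * bit)
        key |= ((o.y >> bit) & 1) << (3 * bit + 1)
        key |= ((o.z >> bit) & 1) << (3 * bit + 2)
    return key


def children(o: Octant) -> List[Octant]:
    """The 8 children of o, generated in Morton order (x varies fastest)."""
    half = 2 ** (MAX_DEPTH - o.level - 1)
    return [Octant(o.x + dx * half, o.y + dy * half, o.z + dz * half, o.level + 1)
            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]


def sort_leaves(leaves: List[Octant]) -> List[Octant]:
    # Leaves of a linear octree never overlap, so their anchors are distinct;
    # the level tie-breaker only matters for lists mixing ancestors and descendants.
    return sorted(leaves, key=lambda o: (morton_key(o), o.level))
```

For example, refining the root once and then refining its first child gives 15 leaves; sorting them places the 8 grandchildren (inside the first child) before the other 7 children of the root, exactly as a pre-order traversal would visit them.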

7
Example of an Octree Mesh
8
Finite Elements on Octrees
  • 2:1 Balance Condition
  • Handling Hanging Nodes

9
2:1 Balance Constraint
  • Adjacent octants must not differ by more than 1 level
  • A kind of mesh smoothing
  • Inherently iterative process: the ripple effect
  • One split can trigger a cascade of splits across multiple processors
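The ripple effect is easy to reproduce in one dimension. The sketch below is a naive iterative balancer on a 1D binary tree (an illustrative simplification, not the hybrid algorithm of Sundar, Sampath and Biros): each pass splits every leaf that has a neighbor more than one level finer, and passes repeat until nothing changes. Leaves are hypothetical (level, index) pairs on a domain of 2**depth unit cells.

```python
from typing import List, Tuple

Leaf = Tuple[int, int]  # (level, index): covers [index, index + 1) * 2**(depth - level)


def leaf_start(leaf: Leaf, depth: int) -> int:
    level, index = leaf
    return index << (depth - level)


def balance_1d(leaves: List[Leaf], depth: int) -> List[Leaf]:
    """Naive iterative 2:1 balancing of a 1D binary-tree partition (illustration only).

    Each pass splits every leaf that has a neighbor more than one level finer.
    A split can create a new violation further away, hence the loop (the ripple effect).
    """
    leaves = sorted(leaves, key=lambda lf: leaf_start(lf, depth))
    changed = True
    while changed:
        changed = False
        refined: List[Leaf] = []
        for i, (level, index) in enumerate(leaves):
            neighbor_levels = [leaves[j][0] for j in (i - 1, i + 1) if 0 <= j < len(leaves)]
            if any(nl > level + 1 for nl in neighbor_levels):
                # Children replace the parent in place, so the list stays sorted.
                refined += [(level + 1, 2 * index), (level + 1, 2 * index + 1)]
                changed = True
            else:
                refined.append((level, index))
        leaves = refined
    return leaves
```

With depth 4, refining the leaf just left of the midpoint down to level 4 gives `[(2, 0), (3, 2), (4, 6), (4, 7), (1, 1)]`; balancing it forces the untouched level-1 leaf on the right half to split twice, yielding `[(2, 0), (3, 2), (4, 6), (4, 7), (3, 4), (3, 5), (2, 3)]`. In a distributed octree those neighboring leaves may live on different processes.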

10
Example of the Ripple Effect
11
Some References for 2:1 Balancing
  • Past Approaches
      • Search-free approach: M. W. Bern, D. Eppstein, S.-H. Teng, 1999
      • Prioritized ripple propagation (PRP): Tiankai Tu, D. R. O'Hallaron, O. Ghattas, 2005
  • Our Approach
      • Hybrid Balancing Algorithm: H. Sundar, R. S. Sampath, G. Biros, 2007

12
Handling Hanging Nodes
  • Nodes at the centers of faces and edges of coarser neighboring octants
  • Do not represent independent degrees of freedom
  • Mapped to the parent's nodes
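For the usual trilinear (8-node hexahedral) elements, mapping a hanging node to the parent's nodes amounts to interpolation: a hanging node at an edge midpoint takes the mean of the two edge endpoints, and one at a face center takes the mean of the four face corners, because trilinear shape functions restrict to linear and bilinear functions there. A minimal sketch, assuming nodal values are kept in a dictionary keyed by integer coordinates (hanging_value is a hypothetical helper):

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[int, int, int]


def hanging_value(values: Dict[Point, float], parents: Sequence[Point]) -> float:
    """Interpolate a hanging node from the independent nodes it is mapped to.

    Trilinear elements: 2 parent nodes for an edge midpoint, 4 for a face center.
    """
    if len(parents) not in (2, 4):
        raise ValueError("expected 2 (edge) or 4 (face) parent nodes")
    return sum(values[p] for p in parents) / len(parents)
```

Since any trilinear field on the coarse octant is reproduced exactly at these points, the hanging node carries no information of its own, which is why it is not an independent degree of freedom.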

13
Geometric Multigrid on Octrees
  • Coarsening
  • Inter-grid Transfers
  • V-cycle Schedule
  • R/P Matvec

14
Global Coarsening
  • Requires 2:1 balancing at each level
  • Regular coarse nodes are preserved in all finer
    levels
  • Results in a sequence of nested finite element
    spaces
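The slide does not spell out the coarsening rule; a common choice, assumed here, is to replace every complete family of 8 sibling leaves by their parent and then rebalance. The sketch below performs one such global step on a linear octree (hypothetical names MAX_DEPTH, Octant, parent, uniform, coarsen_once) and leaves the 2:1 rebalancing to a separate pass.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

MAX_DEPTH = 3  # hypothetical maximum level


class Octant(NamedTuple):
    x: int
    y: int
    z: int
    level: int  # side length is 2**(MAX_DEPTH - level)


def parent(o: Octant) -> Octant:
    size = 2 ** (MAX_DEPTH - o.level + 1)  # side length of the parent octant
    return Octant(o.x - o.x % size, o.y - o.y % size, o.z - o.z % size, o.level - 1)


def uniform(level: int) -> List[Octant]:
    """All octants of a uniformly refined octree at the given level."""
    side, n = 2 ** (MAX_DEPTH - level), 2 ** level
    return [Octant(i * side, j * side, k * side, level)
            for k in range(n) for j in range(n) for i in range(n)]


def coarsen_once(leaves: List[Octant]) -> List[Octant]:
    """One global coarsening step: merge every complete family of 8 sibling leaves.

    The output may violate the 2:1 constraint, so a balancing pass would follow.
    """
    families: Dict[Octant, List[Octant]] = defaultdict(list)
    for o in leaves:
        if o.level > 0:
            families[parent(o)].append(o)
    merged = {p for p, kids in families.items() if len(kids) == 8}
    kept = [o for o in leaves if o.level == 0 or parent(o) not in merged]
    return sorted(kept + list(merged))
```

A uniform level-2 octree coarsens to the uniform level-1 octree, and then to the root. If only the first level-1 octant is refined, one step merges its 8 children back, while the 7 remaining level-1 leaves stay put because their family is incomplete. Applying such steps repeatedly, with rebalancing in between, produces the successively coarser octrees used as multigrid levels.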

15
Inter-grid Transfer Operations
  • $P: V_{k-1} \to V_k$ (Prolongation)
      • $Pv = v \quad \forall\, v \in V_{k-1} \subset V_k$
      • $P(i, j) = \phi_j^{k-1}(p_i)$
  • $R: V_k \to V_{k-1}$ (Restriction)
      • $R = P^T$
  • Need to identify the regular fine grid nodes
    within the support of each coarse grid shape
    function
  • Need to align the coarse and fine octrees
  • Perform Matvecs using pre-computed stencils
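In 1D with piecewise-linear elements, the definition P(i, j) = φ_j^{k-1}(p_i) can be evaluated directly: p_i is the i-th fine-grid node and φ_j^{k-1} the j-th coarse hat function. The toy sketch below builds P densely and takes R = P^T, assuming a uniform grid on [0, 1] refined once (hat, prolongation, transpose and matvec are hypothetical helpers). The octree code instead applies these operators element by element with pre-computed stencils.

```python
from typing import List

Matrix = List[List[float]]


def hat(j: int, h: float, x: float) -> float:
    """Coarse piecewise-linear shape function: 1 at node j*h, 0 outside [(j-1)h, (j+1)h]."""
    return max(0.0, 1.0 - abs(x - j * h) / h)


def prolongation(n_coarse: int) -> Matrix:
    """P(i, j) = phi_j^{k-1}(p_i) for a uniform coarse grid on [0, 1], refined once."""
    h_coarse = 1.0 / n_coarse
    h_fine = h_coarse / 2
    fine_nodes = [i * h_fine for i in range(2 * n_coarse + 1)]
    return [[hat(j, h_coarse, p) for j in range(n_coarse + 1)] for p in fine_nodes]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
```

With 4 coarse elements, prolonging the coarse nodal values of 2x + 1 reproduces 2x + 1 at all 9 fine nodes (Pv = v, since the spaces are nested), and R = P^T applied to a vector of ones gives `[1.5, 2.0, 2.0, 2.0, 1.5]`: each coarse node sums the weights 1/2, 1, 1/2 of the fine nodes in its support.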

16
V-cycle Schedule
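A V-cycle visits the levels from finest to coarsest and back: smooth, restrict the residual, recurse, prolong the correction, smooth again. The sketch below shows that schedule on a toy problem, assuming a uniform 1D grid, linear finite elements for -u'' = f with homogeneous Dirichlet data, weighted Jacobi smoothing, R = P^T, and an exact solve on the coarsest (one-unknown) grid. These choices are illustrative and are not claimed to be DENDRO-M's settings.

```python
from typing import List

Vec = List[float]  # nodal values including both Dirichlet boundary nodes (kept at 0)


def residual(u: Vec, b: Vec, h: float) -> Vec:
    """r = b - K u for the linear-element stiffness matrix K = tridiag(-1, 2, -1) / h."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = b[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / h
    return r


def jacobi(u: Vec, b: Vec, h: float, sweeps: int, omega: float = 2 / 3) -> Vec:
    """Weighted Jacobi smoothing; diag(K) = 2 / h."""
    for _ in range(sweeps):
        r = residual(u, b, h)
        u = [ui + omega * (h / 2) * ri for ui, ri in zip(u, r)]
    return u


def restrict(r: Vec) -> Vec:
    """R = P^T for linear interpolation: weights 1/2, 1, 1/2 (boundary rows stay 0)."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.5 * r[2 * j - 1] + r[2 * j] + 0.5 * r[2 * j + 1]
    return rc


def prolong(ec: Vec) -> Vec:
    """Linear interpolation from the coarse grid to the once-refined fine grid."""
    e = [0.0] * (2 * (len(ec) - 1) + 1)
    for j, value in enumerate(ec):
        e[2 * j] = value
    for j in range(len(ec) - 1):
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return e


def v_cycle(u: Vec, b: Vec, h: float, nu: int = 2) -> Vec:
    if len(u) == 3:  # coarsest grid: a single unknown, solved exactly
        return [0.0, b[1] * h / 2, 0.0]
    u = jacobi(u, b, h, nu)                       # pre-smoothing
    rc = restrict(residual(u, b, h))              # restrict the residual
    ec = v_cycle([0.0] * len(rc), rc, 2 * h, nu)  # coarse-grid correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec))]
    return jacobi(u, b, h, nu)                    # post-smoothing
```

Because linear elements give nested spaces, the rediscretized coarse stiffness matrix equals the Galerkin product P^T K P here, so R = P^T is the consistent restriction. For f = 1 on 64 elements (load vector entries h), a handful of V-cycles recovers the nodal values x(1 - x)/2, which linear elements reproduce exactly in 1D.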
17
Restriction/Prolongation Matvec
  • Coarse and fine grids share the same partition
  • Loop over coarse and fine grid elements simultaneously
  • Choose from the various pre-computed stencils, selected by:
      • Child number of the coarse octant
      • Child number of the underlying fine octant
      • Hanging configuration of the coarse octant
  • Two or more elements can share the same vertices
      • Dummy matvec to identify and store these cases
      • Special data structures (masks) to avoid repetitions
      • Only 8 bytes per fine grid node
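The duplicate-contribution problem and the dummy-matvec fix can be seen in 1D. When R = P^T is applied element by element over coarse elements, the fine node shared by two neighboring coarse elements contributes to the shared coarse node twice. In the sketch below (hypothetical names W, build_masks, restrict_by_element), a dummy pass records which element first touches each (fine node, coarse node) pair and stores one small bitmask per coarse element; the real matvec skips masked-out pairs. This per-element layout is a simplification, not the authors' 8-bytes-per-fine-node structure.

```python
from typing import List, Optional

# W[i][a]: value of local coarse shape function a (0 = left, 1 = right node)
# at local fine node i (0 = left end, 1 = midpoint, 2 = right end) of a coarse element.
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]


def build_masks(n_coarse: int) -> List[int]:
    """Dummy pass: keep only the first element touching each (fine node, coarse node) pair."""
    seen = set()
    masks = []
    for e in range(n_coarse):
        mask = 0
        for i in range(3):
            for a in range(2):
                pair = (2 * e + i, e + a)  # global (fine node, coarse node)
                if pair not in seen:
                    seen.add(pair)
                    mask |= 1 << (2 * i + a)
        masks.append(mask)
    return masks


def restrict_by_element(r_fine: List[float], n_coarse: int,
                        masks: Optional[List[int]] = None) -> List[float]:
    """Element-by-element R = P^T; with masks=None shared vertices are counted repeatedly."""
    r_coarse = [0.0] * (n_coarse + 1)
    for e in range(n_coarse):
        for i in range(3):
            for a in range(2):
                if masks is not None and not (masks[e] >> (2 * i + a)) & 1:
                    continue  # already added by an earlier element
                r_coarse[e + a] += W[i][a] * r_fine[2 * e + i]
    return r_coarse
```

On 4 coarse elements, restricting a vector of ones without masks gives 3.0 at interior coarse nodes (the shared fine vertex is counted twice); with the masks from the dummy pass it gives the correct `[1.5, 2.0, 2.0, 2.0, 1.5]`, matching the dense R = P^T.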

18
Scalability Results
  • Problem Description
  • Fixed Size
  • Iso-granular
  • Comparison with BoomerAMG

19
3-D, Scalar, Linear, Elliptic Problem
20
Architecture and Software Details
  • TeraGrid's NCSA Intel 64 Linux Cluster (Abe)
  • 1200 nodes, 9600 cores (8 per node)
  • Dual-socket nodes with 2.33 GHz quad-core processors
  • 2 MB L2 cache per core
  • 8 GB/16 GB RAM per node
  • Peak performance: 89.47 Tflop/s
  • Libraries used: C++ STL, MPI, PETSc, SuperLU_DIST
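The quoted peak follows from the node specification, assuming 4 double-precision floating-point operations per cycle per core (typical of SSE-capable Intel Xeon quad-core processors of that generation; the slide does not state this figure):

```latex
\begin{aligned}
1200\ \text{nodes} \times 2\ \text{sockets} \times 4\ \text{cores} &= 9600\ \text{cores},\\
9600\ \text{cores} \times 2.33\times 10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 4\ \tfrac{\text{flop}}{\text{cycle}} &= 8.9472\times 10^{13}\ \text{flop/s} \approx 89.47\ \text{Tflop/s}.
\end{aligned}
```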

21
Fixed Size (Strong) Scalability
22
Iso-granular (Weak) Scalability (0.25M elements/processor)
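Fixed-size and iso-granular studies are conventionally summarized by parallel efficiencies. With T(p) the wall-clock time on p processors and p_0 the smallest processor count in a study (notation assumed here, not taken from the slides), perfect scaling gives E = 1 in both cases:

```latex
E_{\text{strong}}(p) = \frac{p_0\, T(p_0)}{p\, T(p)} \quad \text{(fixed total problem size)}, \qquad
E_{\text{weak}}(p) = \frac{T(p_0)}{T(p)} \quad \text{(fixed work per processor, e.g. 0.25M elements)}.
```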
23
Comparison with BoomerAMG (Hypre) (60K elements/processor)
24
Future Work
  • Non-linear Problems
      • Full Approximation Schemes
      • Newton Multigrid
  • Higher-order convergence
      • Higher-order discretizations

25
Related Publications
  • A parallel geometric multigrid method for finite elements on octree meshes
      • (in review)
  • Bottom-up construction and 2:1 balance refinement of linear octrees in parallel
      • SISC, 2008 (to appear)
  • Low-constant parallel algorithms for finite element simulations using linear octrees
      • Supercomputing, November 2007
  • Preprints
      • www.seas.upenn.edu/rahulss

26
Questions?