M.S. Shephard, K.E. Jansen, A. Ovcharenko, O. Sahni, Ting Xie and Min Zhou

Slides: 38

Transcript and Presenter's Notes

1
Dynamic Load Balancing Needs of Parallel
Adaptive Analysis

M.S. Shephard, K.E. Jansen, A. Ovcharenko, O.
Sahni, Ting Xie and Min Zhou
Scientific Computation Research Center
Rensselaer Polytechnic Institute

Outline
Introduction to approach
Parallel adaptive finite element analysis
Scaling the solver
Adaptive mesh control
Parallel mesh structure and parallel mesh
migration
Some applications
More parallel applications
Mesh generation
Adaptive multiscale

This work is supported by the DOE SciDAC program
as part of the Interoperable Technologies for
Advanced Petascale Simulations, and NSF through a
petascale application grant
2
Status and Issues in Parallel Adaptive Simulation

Components needed
Automatic mesh generators
General mesh adaptation
Mesh correction indication
Parallel adaptive mesh control
Dynamic load balancing
Issues going forward
Dealing with more complex parallel control
functions
New demands on dynamic load-balancing
Development of parallel adaptive applications
Operation in a petascale environment

initial mesh
adapted mesh
3
Dealing with New Application Development

Want to reuse methods and tools - provide
enabling technology.
A partition-based parallel computation model
is assumed.
In this case the key components can be abstracted
as:
The System Model
To account for the characteristics of the
computing system - becoming more complicated at
the petascale
The Partition Model
A simple model to which application models can be
mapped for purposes of parallel adaptive
computations
The Application Model
Accounts for how computations can be done in parallel
Must focus on the entities associated with the
computations and their interactions
Structure of computational entities and their
interactions must map to the partition model

4
System Model

Petascale Computers
A key driver: number of kilowatts to run and cool
the computer
Cannot afford to construct them like the
clusters we all love
There will be many cores per node, and cores
and even nodes will not all be the same
Substantial differences between machines -
contrast the IBM BlueGene to Ranger
(Sun machine at U. Texas)

The machine going into Argonne, as of an August 2006
presentation by Rick Stevens
5
Partition Model

The partition is a collection of parts
Parts
Have a given amount of computational load
Need to communicate with other parts in
prescribed way during the computation
Parts are collections of objects that are of
meaning to the applications
Needs effective interactions with dynamic load
balancing

6
Dynamic Load Balancing

Zoltan Dynamic Services (http://www.cs.sandia.gov/Zoltan/)
Supports multiple dynamic partitioners
General control of the definition of part objects
and weights supported
Under active development with emphasis on going
to petascale machines
Focused on graph-based (or hypergraph-based)
partitioners (intend to even use them when the
interactions are spatial - like potential contact
- simply define graph edges when things might
touch in the next set of steps!)
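The idea of turning a spatial interaction such as potential contact into graph edges can be sketched as follows. This is a minimal illustration, not Zoltan code: the bounding-box test, the `margin` parameter and all function names are assumptions introduced here.

```python
from itertools import combinations


def boxes_may_touch(a, b, margin):
    """Return True if two axis-aligned boxes come within `margin` of each
    other along every axis (a conservative proximity test).

    Each box is (min_corner, max_corner); both corners are tuples of the
    same length (2D or 3D).
    """
    (a_min, a_max), (b_min, b_max) = a, b
    return all(
        lo_a - margin <= hi_b and lo_b - margin <= hi_a
        for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max)
    )


def potential_contact_edges(boxes, margin):
    """Graph edges (i, j) for every pair of objects that might touch soon."""
    return [
        (i, j)
        for i, j in combinations(sorted(boxes), 2)
        if boxes_may_touch(boxes[i], boxes[j], margin)
    ]


# Hypothetical objects: 0 and 1 are 0.5 apart in x, 2 is far away.
boxes = {
    0: ((0.0, 0.0), (1.0, 1.0)),
    1: ((1.5, 0.0), (2.5, 1.0)),
    2: ((5.0, 5.0), (6.0, 6.0)),
}
contact_edges = potential_contact_edges(boxes, margin=0.6)
```

Here `margin` stands in for how far objects could move in the next set of steps; the resulting edge list would be added to the graph handed to the partitioner.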

7
Application Model

Must
Define the objects that will make up the parts
Quantification of the object computation
Determination of object-to-object dependencies
Defining and controlling the entities
Have to be defined at the appropriate level and
be related to the data structures and control of
the application
Entities considered in the applications covered
today
Mesh entities in non-manifold FE mesh
Collection of mesh entities to be kept in the
same part
Integration points for which unit cell
evaluations are performed
Atoms
Chunks of space containing material

8
Petascale Adaptive FE Analysis

Steps to get there
1. Be sure the fixed mesh solver scales to
100,000s of processors
2. Provide parallel distributed support for mesh
adaptation
3. Construct adaptive loops in which all
components run on petascale machines
4. Get scalability on all of it
Status Summary
Good progress with an implicit FE flow code
Tools for supporting parallel mesh adaptation
Constructing initial adaptive loops
A way to go to petascale

9
Introduction to PHASTA

Parallel finite element flow solver that solves
both compressible and incompressible flow.
Implicit time integration - requires the solution
of very large systems of linear algebraic equations
at each time step using iterative solvers.
PHASTA and its predecessor have been parallel for
over 15 years, 10 of which have been at RPI.
Breaks the total domain into parts with roughly
the same number of elements on each processor.
Work can be characterized as requiring
Substantial floating point operations to form
system of equations,
Organized, substantial, and regular communication
between partitions that touch each other,
For each iteration (typically O(10) iterations
per solve), there is a required ALL-REDUCE
communication.

10
Patient-Specific Abdominal Aortic Aneurysm

Mesh had > 50M dof
Must be solved in 10 min.
Implicit FE flow solve scales

Proc.    t (sec)   scale
16384      60.6    1.04
 8192     131.7    0.957
 4096     241.6    1.04
 2048     502.3    1.00
 1024    1008.7    1.00
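The slide does not define the scale column. The numbers are consistent with a strong-scaling factor relative to the 1024-processor run, (1024 x t_1024) / (p x t_p); that reading is an assumption, and the short check below reproduces the column under it.

```python
def scale_factors(timings, base_procs):
    """Strong-scaling factor of each run relative to the run on `base_procs`.

    A value of 1.0 means the time dropped exactly in proportion to the
    processor count; values above 1.0 are better than proportional.
    """
    t_base = timings[base_procs]
    return {p: (base_procs * t_base) / (p * t) for p, t in timings.items()}


# Timings (seconds) taken from the table above.
timings = {1024: 1008.7, 2048: 502.3, 4096: 241.6, 8192: 131.7, 16384: 60.6}
factors = scale_factors(timings, base_procs=1024)

for procs in sorted(factors):
    print(procs, round(factors[procs], 3))
```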
11
Implementation of Adaptive Mesh Control

Given the mesh size field
Mesh modification loop
Look at element edge lengths and shape
If both satisfactory, continue to next element
If not, select best modification
Elements with edges that are too long must have
edges split or swapped
Short edges eliminated
Continue until size and shape are satisfied or no
more improvement is possible
Determination of best mesh modification
Select mesh modifications based on element shape
properties
Appropriate considerations of neighboring
elements
Choosing the best mesh modification
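A much reduced sketch of the modification loop, in 1D only so that it stays short. Element shape does not exist in 1D, so only the edge-length part of the loop is shown; the thresholds `long_ratio` and `short_ratio` and the function name are assumptions, not values from the presentation.

```python
def adapt_1d(nodes, size_field, long_ratio=1.5, short_ratio=0.5, max_passes=20):
    """Split edges that are too long and collapse edges that are too short.

    nodes      : node coordinates of a 1D mesh
    size_field : function x -> desired edge length at x
    An edge's ratio is its length divided by the size field at its midpoint.
    """
    nodes = sorted(nodes)
    for _ in range(max_passes):
        changed = False
        out = [nodes[0]]
        for right in nodes[1:]:
            left = out[-1]
            mid = 0.5 * (left + right)
            ratio = (right - left) / size_field(mid)
            if ratio > long_ratio:
                # Too long: split the edge at its midpoint.
                out.extend([mid, right])
                changed = True
            elif ratio < short_ratio and right != nodes[-1]:
                # Too short: collapse by dropping the right node, which
                # merges this edge into the next one. The last node is a
                # boundary node and is never dropped.
                changed = True
            else:
                out.append(right)
        nodes = out
        if not changed:
            break  # size satisfied, or no more improvement possible
    return nodes


refined = adapt_1d([0.0, 1.0], lambda x: 0.25)
coarsened = adapt_1d([0.0, 0.01, 0.5, 1.0], lambda x: 0.5)
```

In 2D and 3D the same loop also checks element shape and chooses among splits, collapses and swaps, taking neighboring elements into account.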

12
Mesh Information

A piece-wise domain decomposition over which the
simulation is to be run
A mesh data structure provides services to create
and/or use the mesh data
Each application has its own needs of mesh
representation in terms of levels of entities
and adjacencies used
=> flexibility in mesh representations
3 approaches for mesh data structure design:
Fixed, specific mesh representation
Reduced model: store only needed entities
Fixed, general mesh representation
Full model: store all entities
Flexible mesh representation
Flexible Mesh Data structure
Switch between various representations for
different needs of applications
Application specifies which entities and
adjacencies it needs.
Achieves good performance in both memory and
computational cost for a wide range of applications.

13
Distributed Mesh Data Structure

Distributed Mesh
Mesh divided into parts for distribution on
parallel computers
Part $P_i$ consists of the set of mesh entities
assigned to the $i$th part.
Part Object
The basic unit to which a part ID is assigned:
A mesh entity to be partitioned
Mesh entities to be partitioned in the
example mesh are $M_1^3$, $M_1^2$, $M_2^3$, $M_2^2$, $M_1^1$
An EntityGroup
Residence Part
Operator $\mathcal{P}[M_i^d]$ returns the set of part IDs
where $M_i^d$ exists (e.g., $\mathcal{P}[M_1^0] = \{P_0, P_1, P_2\}$)
Residence part of $M_i^d$ on $P_i$:
If $M_i^d$ is a partition object, $\mathcal{P}[M_i^d] = \{P_i\}$
Otherwise, $\mathcal{P}[M_i^d] = \bigcup \mathcal{P}[M_j^q]$ over all $M_j^q$ with $M_i^d \in \{\partial(M_j^q)\}$
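The residence-part rule above can be sketched for a small 2D case in which triangles are the partition objects and vertices are their bounding entities. The dictionary layout and names are assumptions for illustration, not the FMDB interface.

```python
def residence_parts(boundary_of, part_of):
    """Residence part sets for partition objects and the entities bounding them.

    boundary_of : partition object -> iterable of entities on its boundary
    part_of     : partition object -> id of the part it is assigned to
    """
    residence = {}
    for obj, boundary in boundary_of.items():
        # A partition object resides only on its own part.
        residence[obj] = {part_of[obj]}
        # A bounding entity resides on the union of the parts of every
        # higher-dimension entity it bounds.
        for ent in boundary:
            residence.setdefault(ent, set()).add(part_of[obj])
    return residence


# Two triangles sharing the edge (v1, v2), assigned to different parts.
boundary_of = {"T0": ("v0", "v1", "v2"), "T1": ("v1", "v2", "v3")}
part_of = {"T0": 0, "T1": 1}
residence = residence_parts(boundary_of, part_of)

# Entities with more than one residence part lie on the part boundary.
part_boundary = sorted(e for e, parts in residence.items() if len(parts) > 1)
```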

14
Entity Group

EntityGroup
A group of mesh entities that needs to stay
together in a part during the lifetime of the
EntityGroup as defined by the needs of an
application.
Example - stack of prismatic elements in a
boundary layer to support the adaptation of the
layer
Entity group rules
Mesh entities in a group stay as a group during
the lifetime of the EntityGroup
A mesh entity can only be in a single
EntityGroup, and is defined once in the
EntityGroup
EntityGroup information maintained before and
after migration
EntityGroup is dynamic as defined by the
application which can create and destroy an
EntityGroup, or add/remove mesh entities in an
EntityGroup
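The entity group rules can be read as invariants on a small registry. The class below is a hypothetical sketch of that bookkeeping (the class and method names are invented here and are not the FMDB API).

```python
class EntityGroupRegistry:
    """Hypothetical bookkeeping for EntityGroups."""

    def __init__(self):
        self._members = {}   # group name -> set of mesh entities
        self._group_of = {}  # mesh entity -> group name

    def create(self, group):
        if group in self._members:
            raise ValueError(f"group {group!r} already exists")
        self._members[group] = set()

    def add(self, group, entity):
        current = self._group_of.get(entity)
        if current is not None and current != group:
            # Rule: a mesh entity can only be in a single EntityGroup.
            raise ValueError(f"{entity!r} already belongs to {current!r}")
        self._members[group].add(entity)  # a set stores each entity once
        self._group_of[entity] = group

    def remove(self, group, entity):
        self._members[group].discard(entity)
        if self._group_of.get(entity) == group:
            del self._group_of[entity]

    def destroy(self, group):
        for entity in self._members.pop(group):
            del self._group_of[entity]

    def members(self, group):
        return frozenset(self._members[group])

    def partition_object(self, entity):
        """The unit that migrates: the entity's group, or the entity itself."""
        return self._group_of.get(entity, entity)


# A stack of prismatic boundary-layer elements kept together in one part.
reg = EntityGroupRegistry()
reg.create("layer_stack_0")
for prism in ("prism_0", "prism_1", "prism_2"):
    reg.add("layer_stack_0", prism)
reg.add("layer_stack_0", "prism_1")  # adding again does not duplicate it
```

Because `partition_object` returns the group for grouped entities, a migration request naturally moves the whole stack together.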

15
Mesh Partition through Zoltan

Perform mesh partition through Zoltan
graph-based partitioning
In mesh partitioning, a partition object can be
either a mesh entity to be partitioned, or an
EntityGroup.

Different colors represent different EntityGroups
(3 EntityGroups in the 2D mesh).
Construct graph for mesh partition:
Graph nodes: objects to be partitioned (partition objects).
Graph edges: mesh edge-based dependencies between two objects.
Weights: set graph node and graph edge weights.
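A sketch of the graph construction for a 2D triangle mesh. The weighting choices (node weight = number of elements in the partition object, edge weight = number of shared mesh edges) are assumptions; the slides only say that weights are set. The output is plain dictionaries, not a call into Zoltan.

```python
from collections import defaultdict
from itertools import combinations


def build_partition_graph(elements, group_of=None):
    """Build graph nodes, node weights and edge weights for partitioning.

    elements : element name -> tuple of vertex ids (triangles here)
    group_of : element name -> EntityGroup name, for grouped elements only
    """
    group_of = group_of or {}

    def partition_object(elem):
        return group_of.get(elem, elem)

    node_weight = defaultdict(float)
    for elem in elements:
        node_weight[partition_object(elem)] += 1.0

    # Map every mesh edge (vertex pair) to the elements that use it.
    users = defaultdict(list)
    for elem, verts in elements.items():
        for a, b in combinations(sorted(verts), 2):
            users[(a, b)].append(elem)

    # Two partition objects depend on each other when they share a mesh edge.
    edge_weight = defaultdict(float)
    for elems in users.values():
        for e1, e2 in combinations(elems, 2):
            o1, o2 = partition_object(e1), partition_object(e2)
            if o1 != o2:
                edge_weight[tuple(sorted((o1, o2)))] += 1.0
    return dict(node_weight), dict(edge_weight)


# A strip of four triangles; t0 and t1 form one EntityGroup "g0".
elements = {"t0": (0, 1, 2), "t1": (1, 2, 3), "t2": (2, 3, 4), "t3": (3, 4, 5)}
node_w, edge_w = build_partition_graph(elements, {"t0": "g0", "t1": "g0"})
```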
16
Distributed Mesh Representation

Functional Requirements
Communication links
Remote part: a non-self part where an entity is
duplicated
Remote copy: the memory location of the entity
duplicated on the remote part
Efficient mechanisms to update mesh partitioning
and keep the links between partitions are
mandatory
Entity ownership
Used for operation control
Static ownership
Owner part of an entity is fixed to the specific
partition regardless of mesh partitioning
Not suitable for adaptive analysis due to severe
load imbalance
Dynamic ownership
Owner part of an entity is determined dynamically
depending on mesh partitioning
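The slide does not state the rule used to pick the owner. As one possible rule, assumed here purely for illustration, the owner of a shared entity is the residence part carrying the smallest load, with ties going to the lowest part ID, so ownership follows the current partitioning.

```python
def choose_owners(residence, load):
    """Pick an owner part for each entity from its residence parts.

    residence : entity -> set of part ids where the entity exists
    load      : part id -> current load (e.g. number of partition objects)
    Assumed rule: least-loaded residence part, ties to the lowest part id.
    """
    return {
        ent: min(parts, key=lambda p: (load[p], p))
        for ent, parts in residence.items()
    }


residence = {"v0": {0}, "v1": {0, 1}, "v2": {0, 1}}

owners_before = choose_owners(residence, load={0: 5, 1: 3})
# After repartitioning the loads change, and so may the owners.
owners_after = choose_owners(residence, load={0: 2, 1: 6})
```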

17
Distributed Mesh Data Structure

Mesh Migration with Full Complete Representations
Given a list of pairs <partition object,
destination part id>
STEP 1: Collect entities to be updated and reset
their P and partition classification
STEP 2: Determine P of partition objects and
downward entities
STEP 3: Based on P, update the partition model
and collect entities to remove from each part
STEP 4: Exchange entities and update remote
copies
STEP 5: Remove unnecessary entities collected in
STEP 3
STEP 6: Update the owner part of partition model
entities
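A serial emulation of the planning part of these steps (roughly STEP 2 to STEP 5) for triangles and their vertices. It only works out which vertices each part must create and which it can remove; real migration also exchanges data between processes and updates remote copies and ownership. The function and variable names are hypothetical.

```python
def plan_migration(part_of, elements, moves):
    """Work out per-part entity creation and removal for a migration request.

    part_of  : element -> current part id
    elements : element -> tuple of vertex ids
    moves    : list of (element, destination part id) pairs
    """
    def vertices_held(assignment):
        held = {}
        for elem, part in assignment.items():
            held.setdefault(part, set()).update(elements[elem])
        return held

    new_part_of = dict(part_of)
    new_part_of.update(dict(moves))

    before = vertices_held(part_of)
    after = vertices_held(new_part_of)
    parts = set(before) | set(after)

    # Vertices a part needs after migration but does not hold yet.
    to_create = {p: after.get(p, set()) - before.get(p, set()) for p in parts}
    # Vertices a part holds now but no longer needs afterwards.
    to_remove = {p: before.get(p, set()) - after.get(p, set()) for p in parts}
    return new_part_of, to_create, to_remove


elements = {"T0": (0, 1, 2), "T1": (1, 2, 3), "T2": (2, 3, 4)}
part_of = {"T0": 0, "T1": 0, "T2": 1}
new_part_of, to_create, to_remove = plan_migration(
    part_of, elements, moves=[("T1", 1)]
)
```

In this example only vertex 1 has to be created on part 1, because vertices 2 and 3 already exist there as part-boundary copies, and part 0 can drop vertex 3.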

18
Flexible Distributed Mesh Data Structure

Mesh Migration with reduced complete
representation
STEP A: collect neighboring part objects ()
STEP B: restore downward interior entities ()
STEP 1: collect entities to update and clear
their partition classification and P
STEP 2: determine P
STEP 3: update partition classification and
collect entities to remove
STEP 4: create only necessary migrate-in entities
in representation and update remote copies
Do not send interior entities that will not be on
the partition boundary ()
STEP 5: remove unnecessary migrate-out entities
STEP 6: update entity ownership
STEP C: remove unnecessary interior entities and
adjacencies ()
+ savings in migration time with flexible mesh
representation in parallel
- losses in migration time with flexible mesh
representation in parallel

Figures: Serial 2D mesh with MSR; Partitioned 2D mesh with MSR
19
Flexible Distributed Mesh Data Structure

Example: 2-D mesh migration with the reduced
representation

(a) Mark destination part IDs
(b) Step A: Get neighboring partition objects
(c) Step B: Restore internal entities
20
Parallel Mesh Adaptation

Parallelization of refinement: perform on each
part and synchronize at inter-part boundaries
Parallelization of coarsening and swapping:
migrate cavity (on-the-fly) and perform operation
locally on one part.
Requires update of evolving communication-links
between parts and dynamic mesh partitioning

21
Mesh Adaptation - Uniform Refinement

Tests run on IBM Blue Gene/L
Slow processors
Fast communication

initial (265k tets)

At 8 processors
Initial mesh - 33K/processor
Final mesh - 226K/processor
At 128 processors
Initial mesh - 2.0K/processor
Final mesh - 16.7K/processor

adapted (2,127k tets)

Scalability for one iteration
of mesh adaptation

22
Mesh Adaptation - Refinement

Communication time for one iteration of mesh
adaptation

Total time for one iteration of mesh adaptation

Communication to total time ratio for one
iteration of mesh adaptation

23
Mesh Adaptation - Refinement and Coarsening

Scalability for five iterations
of mesh adaptation

initial mesh (1,528k tets)

At 8 processors
Initial mesh - 191K/processor
Final mesh - 241K/processor
At 128 processors
Initial mesh - 11.9K/processor
Final mesh - 15.0K/processor

adapted mesh (1,926k tets)
24
Mesh Adaptation - Refinement and Coarsening

Communication time for
five iterations

Total time for five iterations

Communication to total time ratio for five
iterations

25
Mesh Adaptation for 1 Billion Element Mesh
Mesh size field of air bubbles distributing in a
tube (segment of the model)
Number of regions of adapted mesh among 16k parts

Initial mesh: uniform, 17,179,836 mesh regions
Adapted mesh: 160 air bubbles, 1,064,284,042 mesh
regions
Multiple predictive load balancing steps are used to make
the adaptation possible
Larger meshes are possible (not out of memory) but
this element count is appropriate for the solver

Initial and adapted mesh at one bubble - colored
by magnitude of mesh size field
26
Predictive Load Balancing

Refinement of mesh before load balancing can lead
to memory problems
Employ predictive load balancing to avoid the
problem
Assign weights based on what will be refined
Apply dynamic load balancing
Refinement
May want to do some local migration
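The effect of weighting by predicted refinement can be sketched with a toy partitioner. The greedy heaviest-first assignment below is a stand-in chosen for brevity, not the graph-based partitioner used in the work, and the child counts are made-up illustration data.

```python
import heapq


def greedy_partition(weights, nparts):
    """Assign elements to parts, heaviest first, always to the lightest part."""
    heap = [(0.0, part) for part in range(nparts)]
    heapq.heapify(heap)
    assignment = {}
    for elem in sorted(weights, key=lambda e: (-weights[e], e)):
        load, part = heapq.heappop(heap)
        assignment[elem] = part
        heapq.heappush(heap, (load + weights[elem], part))
    return assignment


def peak_elements(assignment, children, nparts):
    """Largest per-part element count once refinement has been carried out."""
    loads = [0] * nparts
    for elem, part in assignment.items():
        loads[part] += children[elem]
    return max(loads)


# Hypothetical prediction: even-numbered elements each refine into 8
# children, odd-numbered elements are left unchanged.
children = {e: (8 if e % 2 == 0 else 1) for e in range(8)}

# Without prediction: balance the current mesh (every element weighs 1).
current = greedy_partition({e: 1 for e in children}, nparts=2)
peak_without = peak_elements(current, children, nparts=2)

# With prediction: weight each element by its predicted number of children.
predicted = greedy_partition(children, nparts=2)
peak_with = peak_elements(predicted, children, nparts=2)
```

In this toy case the unweighted partition happens to place every element marked for refinement on the same part, so that part's memory use spikes during refinement; weighting by predicted children spreads them out beforehand.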

with predictive load balancing
without predictive load balancing
27
Nodal Balance by Local Modification

For a lightly loaded mesh (small number of regions
per process), a mesh that is well distributed by
number of regions can still have poor nodal
balance.
Local modification method is used to balance the
number of nodes on each part.
Region (node) ratio = number of regions (nodes) on a
part / average number of regions (nodes) per part
Average number of regions for the test: 2434
Number of parts: 1024
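The ratio defined above is a one-line computation. The per-part counts below are invented illustration values, not data from the test on the slide.

```python
def balance_ratios(counts):
    """Per-part ratio: count on the part divided by the average over parts."""
    average = sum(counts) / len(counts)
    return [count / average for count in counts]


# Hypothetical four-part mesh: regions are perfectly balanced, nodes are not.
regions_per_part = [100, 100, 100, 100]
nodes_per_part = [30, 50, 40, 40]

region_ratios = balance_ratios(regions_per_part)
node_ratios = balance_ratios(nodes_per_part)
worst_node_ratio = max(node_ratios)
```

A part with a node ratio well above 1.0 is a candidate for handing a few entities to a neighboring part through local modification.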

Node ratio before and after node balance
Region/node ratio before node balance
Before node balance
Region ratio before and after node balance
28
Dynamic Load Balancing Needs

Basic needs for graph-based partitioner
Executed many times during - needs to scale and
be efficient
Abstraction of a graph important - definition of
graph nodes and edges is application dependent
Not hard to create edges near contact - need to
determine information for mesh adaptation anyway
Real-valued weights (not integers)
Some additional functions needed
Moving small numbers of entities for specific
needs
Consideration of multiple criteria (e.g., edges
and nodes)
Possible multiple levels of interacting graphs -
graphs and interactions defined by the
applications
Important for multiphysics and/or multiscale
Current area of research at RPI.

29
FMDB is Part of ITAPS

ITAPS tools (http://www.itaps-scidac.org)
A core functionality of the Interoperable
Technologies for Advanced Petascale Simulations
(ITAPS) meshing tools
The only ITAPS component thus far that supports
Geometry-based adaptive analysis
Distributed mesh operations in parallel
Flexible mesh representation

30
Adaptive Loop for Accelerator Design

Complex CAD geometry
Physics modeling by the SLAC Omega3P
High level modeling accuracy needed
E.g., 0.1 error in frequency predictions
Parallel adaptive mesh control needed to provide
accuracy needed

Initial mesh (1,595 tets)
Adapted mesh (23,082,517 tets)
31
Patient Specific Vascular Surgical Planning
(Stanford, RPI)

Virtual flow facility for patient specific
surgical planning
High quality patient specific flow simulations
needed quickly
Image patient, create model, apply adaptive flow
simulation

Mesh generation
Path planning
Segmentation solid modeling
Simulation
Load CT image
32
Reliable In-Time Cardiovascular Flow Simulations

Requirements
Must execute from image data
Geometric domains and meshes automatically
constructed
Inflow BC from image data
Realistic representation of outflows and
materials
Simulations must provide reliable flow results
Meshes with millions of elements typically
required
Adaptive mesh control required to ensure accuracy
Simulations must execute in time needed (15
minutes)
Highly effective parallel computation required

Meshes by Simmetrix MeshSim
33
Example of Entire Process: Image, Solid, Mesh
Wilson et al., Lect. Notes Comp. Sci. 2208, 449-456 (2001)
Stanford University Cardiovascular Biomechanics Research Laboratory
34
Example of Entire Process: Adaptive Meshes
35
Parallel Mesh Generation

Consider parallel mesh generation
Computation effort related to number of elements, but
boundary elements have variable load
Only structure known at start is the geometric
model
Calculations evolve during mesh generation
All mesh generation steps operate in parallel
Meshes starting from solid model
Both structures created by the mesh generator are
distributed
Octree - used for mesh control, localizing
searches, interior templates
Mesh - topological hierarchy distributed over
parts
Mesh generation steps
Surface mesh generation
Octree refinement
Template meshing of interior octants
Meshing boundary octants

36
Abstraction of Multiscale Simulation

Practical multiscale simulations will
Take advantage of existing single scale tools
Automatically execute processes
Communicate information, accounting for
transformations, between scales
Functional components needed
Specification and interaction of physical,
mathematical and computational models
Definition, construction and transformation of
domains
Specification of physical parameters in the form
of tensors
Specification and execution of scale linking
Adaptive multiscale requires petascale computing
Just starting to consider doing it in parallel
Scaling multiscale in parallel makes scaling
adaptive FEM look easy

37
Closing Remarks

Making progress on moving adaptive simulations to
petascale machines
Solver scaling well on some machines
Can adapt mesh in parallel on large numbers of
processors
ITAPS is developing tools to support parallel
mesh-based applications
Substantial challenges ahead of us
Dealing with the new machines - so far our stuff
does not scale as well on Ranger as it did on the
Blue Gene - it appears we have to code to each
Dynamic load balancing is critical to our
applications
Need it working well on all machines
Have additional requirements (some defined, some
we are trying to define)