M.S. Shephard, K.E. Jansen, A. Ovcharenko, O. Sahni, Ting Xie and Min Zhou
1
Dynamic Load Balancing Needs of Parallel
Adaptive Analysis
  • M.S. Shephard, K.E. Jansen, A. Ovcharenko, O.
    Sahni, Ting Xie and Min Zhou
  • Scientific Computation Research Center
  • Rensselaer Polytechnic Institute
  • Outline
  • Introduction to approach
  • Parallel adaptive finite element analysis
  • Scaling the solver
  • Adaptive mesh control
  • Parallel mesh structure and parallel mesh
    migration
  • Some applications
  • More parallel applications
  • Mesh generation
  • Adaptive multiscale

This work is supported by the DOE SciDAC program
as part of the Interoperable Technologies for
Advanced Petascale Simulations (ITAPS) project,
and by NSF through a petascale application grant
2
Status and Issues in Parallel Adaptive Simulation
  • Components needed
  • Automatic mesh generators
  • General mesh adaptation
  • Mesh correction indication
  • Parallel adaptive mesh control
  • Dynamic load balancing
  • Issues going forward
  • Dealing with more complex parallel control
    functions
  • New demands on dynamic load balancing
  • Development of parallel adaptive applications
  • Operation in a petascale environment

initial mesh
adapted mesh
3
Dealing with New Application Development
  • Want to reuse methods and tools - provide
    enabling technology.
  • A partitioned-based parallel computation mode
    is assumed.
  • In this case the key components can be abstracted
    as
  • The System Model
  • To account for the characteristics of the
    computing system - becoming more complicated at
    the petascale
  • The Partition Model
  • A simple model to which application models can be
    mapped for purposes of parallel adaptive
    computations
  • The Application Model
  • Accounts for how computations can be done in parallel
  • Must focus on the entities associated with the
    computations and their interactions
  • Structure of computational entities and their
    interactions must map to the partition model

4
System Model
  • Petascale Computers
  • A key driver: the number of kilowatts needed to
    run and cool the computer
  • Cannot afford to construct them like the
    clusters we all love
  • There will be many cores per node, and cores
    and even nodes will not all be the same
  • Substantial differences between machines -
    contrast the IBM Blue Gene to Ranger
    (Sun machine at U. Texas)

The machine going into Argonne, as of an August 2006
presentation by Rick Stevens
5
Partition Model
  • The partition is a collection of parts
  • Parts
  • Have a given amount of computational load
  • Need to communicate with other parts in
    prescribed way during the computation
  • Parts are collections of objects that are of
    meaning to the applications
  • Needs effective interactions with dynamic load
    balancing

6
Dynamic Load Balancing
  • Zoltan Dynamic Services (http://www.cs.sandia.gov/
    Zoltan/)
  • Supports multiple dynamic partitioners
  • General control of the definition of part objects
    and weights supported
  • Under active development with emphasis on going
    to petascale machines
  • Focused on graph-based (or hypergraph-based)
    partitioners (intend to even use them when the
    interactions are spatial - like potential contact
    - simply define graph edges when things might
    touch in the next set of steps!)
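The spatial "might touch" idea above can be sketched as follows. This is an illustrative stand-alone example, not Zoltan's API: graph edges are added both for regular mesh dependencies and between objects whose bounding spheres could overlap within the next set of steps.

```python
# Sketch: build a partitioning graph with extra edges for potential contact.
# All names here are illustrative, not a real partitioner's interface.

def build_graph(objects, mesh_edges, motion_margin):
    """objects: {id: (center, radius)}; mesh_edges: set of (id, id) pairs."""
    graph = {oid: set() for oid in objects}
    for a, b in mesh_edges:                      # regular mesh-based dependencies
        graph[a].add(b); graph[b].add(a)
    ids = list(objects)
    for i, a in enumerate(ids):                  # spatial "might touch" edges
        ca, ra = objects[a]
        for b in ids[i + 1:]:
            cb, rb = objects[b]
            dist = sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5
            if dist <= ra + rb + motion_margin:  # could come into contact soon
                graph[a].add(b); graph[b].add(a)
    return graph
```

With spheres of radius 1 at x = 0, 2.5 and 10 and a margin of 1, only the first pair gets a contact edge.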

7
Application Model
  • Must
  • Define the objects that will make up the parts
  • Quantification of the object computation
  • Determination of object-to-object dependencies
  • Defining and controlling the entities
  • Have to be defined at the appropriate level and
    be related to the data structures and control of
    the application
  • Entities considered in the applications covered
    today
  • Mesh entities in non-manifold FE mesh
  • Collection of mesh entities to be kept in the
    same part
  • Integration points for which unit cell
    evaluations are performed
  • Atoms
  • Chunks of space containing material

8
Petascale Adaptive FE Analysis
  • Steps to get there
  • 1. Be sure the fixed mesh solver scales to
    100,000s of processors
  • 2. Provide parallel distributed support for mesh
    adaptation
  • 3. Construct adaptive loops in which all
    components run on petascale machines
  • 4. Get scalability on all of it
  • Status Summary
  • Good progress with an implicit FE flow code
  • Tools for supporting parallel mesh adaptation
  • Constructing initial adaptive loops
  • A way to go to petascale

9
Introduction to PHASTA
  • Parallel finite element flow solver that solves
    both compressible and incompressible flow.
  • Implicit time integration - requires the solution
    of very large systems of linear algebra equations
    at each time step using iterative solvers.
  • PHASTA and its predecessor have been parallel for
    over 15 years, 10 of which have been at RPI.
  • Breaks the total domain into parts with roughly
    the same number of elements on each processor.
  • Work can be characterized as requiring
  • Substantial floating point operations to form
    system of equations,
  • Organized, substantial, and regular communication
    between partitions that touch each other,
  • For each iteration (typically O(10) iterations
    per solve), there is a required ALL-REDUCE
    communication.
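The role of the ALL-REDUCE can be illustrated with a toy example: the global dot products an iterative solver needs each iteration are sums of per-part contributions, so every iteration requires one collective sum. `all_reduce` below is a serial stand-in for MPI_Allreduce; the part layout is illustrative.

```python
# Toy sketch: why each solver iteration needs an ALL-REDUCE. Each "part"
# computes a local dot product; the collective sum gives the global value.

def all_reduce(local_values):          # serial stand-in for MPI_Allreduce(SUM)
    return sum(local_values)

def global_dot(parts_x, parts_y):
    """parts_x, parts_y: per-part lists of vector entries."""
    locals_ = [sum(a * b for a, b in zip(px, py))
               for px, py in zip(parts_x, parts_y)]
    return all_reduce(locals_)         # one collective per dot product
```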

10
Patient-Specific Abdominal Aortic Aneurysm
  • Mesh had > 50M dof
  • Must be solved in 10 min.
  • Implicit FE flow solve scales

Procs    t (sec)   scale factor
16384       60.6   1.04
 8192      131.7   0.957
 4096      241.6   1.04
 2048      502.3   1.00
 1024     1008.7   1.00
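The scale-factor column can be reproduced from the timings: relative to the 1024-processor run, perfect strong scaling means doubling the processor count halves the time. The formula below is the standard strong-scaling efficiency, written as a sketch.

```python
# Strong-scaling factor relative to a reference run: 1.0 means doubling the
# processor count exactly halves the solve time.

def scale_factor(p, t, p_ref=1024, t_ref=1008.7):
    return t_ref / (t * (p / float(p_ref)))
```

For example, 8192 processors at 131.7 s gives 1008.7 / (131.7 * 8) ≈ 0.957, matching the table.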
11
Implementation of Adaptive Mesh Control
  • Given the mesh size field
  • Mesh modification loop
  • Look at element edge lengths and shape
  • If both satisfactory, continue to next element
  • If not, select best modification
  • Elements with edges that are too long must have
    edges split or swapped
  • Short edges eliminated
  • Continue until size and shape is satisfied or no
    more improvement possible
  • Determination of best mesh modification
  • Select mesh modifications based on element shape
    properties
  • Appropriate considerations of neighboring
    elements
  • Choosing the best mesh modification
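The modification loop above can be made concrete in a minimal runnable sketch, reduced to a 1-D mesh (a sorted list of node coordinates): edges longer than the desired size h are split, edges shorter than h/2 are collapsed, and the loop continues until the size field is satisfied or nothing changes. Names and thresholds are illustrative, not the adaptation library's API.

```python
# 1-D caricature of the size-driven mesh modification loop.

def adapt_1d(nodes, h, max_passes=20):
    for _ in range(max_passes):
        changed = False
        out = [nodes[0]]
        for x in nodes[1:]:
            length = x - out[-1]
            if length > h:                          # too long: split the edge
                out.append(out[-1] + length / 2.0)
                out.append(x)
                changed = True
            elif length < 0.5 * h and x != nodes[-1]:
                changed = True                      # too short: collapse (drop node)
            else:
                out.append(x)                       # satisfactory: keep as-is
        nodes = out
        if not changed:                             # no more improvement possible
            break
    return nodes
```

For instance, `adapt_1d([0.0, 1.0], 0.6)` splits the single long edge once and stops.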

12
Mesh Information
  • A piece-wise domain decomposition over which the
    simulation is to be run
  • A mesh data structure provides services to create
    and/or use the mesh data
  • Each application has its own needs of mesh
    representation in terms of levels of entities
    and adjacencies used
  • ⇒ flexibility in mesh representations
  • 3 approaches for mesh data structure design
  • Fixed, specific mesh representation
  • Reduced model: store only needed entities
  • Fixed, general mesh representation
  • Full model: store all entities
  • Flexible mesh representation
  • Flexible Mesh Data structure
  • Switch between various representations for
    different needs of applications
  • Application specifies which entities and
    adjacencies it needs.
  • Achieves good performance in both memory and
    computational cost for a wide range of applications

13
Distributed Mesh Data Structure
  • Distributed Mesh
  • Mesh divided into parts for distribution on
    parallel computers
  • Part Pi consists of a set of mesh entities
    assigned to the ith part.
  • Part Object
  • The basic unit to which a part ID is assigned.
  • A mesh entity to be partitioned
  • Mesh entities to be partitioned in the
    example mesh are M_1^3, M_1^2, M_2^3, M_2^2, M_1^1
  • An EntityGroup
  • Residence Part
  • Operator P[M_i^d] returns a set of part IDs
    where M_i^d exists (e.g., P[M_1^0] = {P0, P1,
    P2})
  • Residence part of M_i^d on Pi
  • If a partition object, P[M_i^d] = {Pi}
  • Otherwise, P[M_i^d] = U { P[M_j^q] : M_i^d ∈ ∂(M_j^q) }
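The residence-part operator can be sketched directly from its two cases: a partition object resides on its assigned part; any other entity resides wherever the higher-order entities it bounds reside. The data layout below is illustrative, not FMDB's interface.

```python
# Sketch of the residence-part operator P[M]:
#   partition object      -> {its assigned part}
#   bounding entity       -> union over entities whose boundary contains it

def residence_parts(entity, part_of, bounded_by):
    """part_of: partition object -> part id.
    bounded_by: entity -> entities whose boundary contains it."""
    if entity in part_of:                       # partition object: P[M] = {Pi}
        return {part_of[entity]}
    parts = set()
    for higher in bounded_by.get(entity, ()):   # union of P[M_j^q] with
        parts |= residence_parts(higher, part_of, bounded_by)
    return parts                                # M in boundary(M_j^q)
```

A vertex bounding two faces on parts 0 and 1 thus resides on {0, 1}, like the P[M_1^0] example above.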

14
Entity Group
  • EntityGroup
  • A group of mesh entities that needs to stay
    together in a part during the lifetime of the
    EntityGroup as defined by the needs of an
    application.
  • Example - stack of prismatic elements in a
    boundary layer to support the adaptation of the
    layer
  • Entity group rules
  • Mesh entities in a group stay as a group during
    the life time of EntityGroup
  • A mesh entity can only be in a single
    EntityGroup, and is defined once in the
    EntityGroup
  • EntityGroup information maintained before and
    after migration
  • EntityGroup is dynamic as defined by the
    application which can create and destroy an
    EntityGroup, or add/remove mesh entities in an
    EntityGroup
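The entity group rules above translate naturally into a small class invariant: an entity belongs to at most one group, is stored only once within it, and groups can be created, modified, and destroyed by the application. The class is an illustrative sketch, not FMDB's API.

```python
# Sketch of the EntityGroup rules: single membership, defined once, dynamic.

class EntityGroup:
    _group_of = {}                     # entity -> owning group (module-wide)

    def __init__(self):
        self.entities = set()

    def add(self, entity):
        owner = EntityGroup._group_of.get(entity)
        if owner is not None and owner is not self:
            raise ValueError("entity already in another EntityGroup")
        self.entities.add(entity)      # set: entity defined once in the group
        EntityGroup._group_of[entity] = self

    def remove(self, entity):
        self.entities.discard(entity)
        EntityGroup._group_of.pop(entity, None)

    def destroy(self):                 # groups are dynamic: the application
        for e in list(self.entities):  # may destroy one at any time
            self.remove(e)
```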

15
Mesh Partition through Zoltan
  • Perform mesh partition through Zoltan
    graph-based partitioning
  • In mesh partitioning, a partition object can be
    either a mesh entity to be partitioned, or an
    EntityGroup.

Different colors represent different EntityGroups
(3 EntityGroups in the 2D mesh).
Constructing the graph for mesh partitioning:
  • Graph nodes: objects to be partitioned
    (partition objects)
  • Graph edges: mesh edge-based dependencies
    between two objects
  • Weights: set graph node and graph edge weights
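The graph construction just described can be sketched as follows: each partition object (a lone entity or a whole EntityGroup) becomes one graph node whose weight is the summed load of its entities, and mesh-edge dependencies between entities in different objects become weighted graph edges. Names are illustrative.

```python
# Sketch: collapse mesh entities into partition-object graph nodes.

def build_partition_graph(object_of, entity_weight, dependencies):
    """object_of: entity -> partition-object id (entity id or group id).
    dependencies: iterable of (entity, entity) mesh-edge dependencies."""
    node_weight, edge_weight = {}, {}
    for ent, obj in object_of.items():           # node weight = summed load
        node_weight[obj] = node_weight.get(obj, 0.0) + entity_weight[ent]
    for a, b in dependencies:
        oa, ob = object_of[a], object_of[b]
        if oa != ob:                             # keep inter-object edges only
            key = tuple(sorted((oa, ob)))
            edge_weight[key] = edge_weight.get(key, 0.0) + 1.0
    return node_weight, edge_weight
```

Dependencies internal to one EntityGroup vanish, which is exactly what keeps the group together in one part.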
16
Distributed Mesh Representation
  • Functional Requirements
  • Communication links
  • Remote part: a non-self part where an entity is
    duplicated
  • Remote copy: the memory location of the entity
    duplicated on the remote part
  • Efficient mechanisms to update the mesh partitioning
    and maintain the links between partitions are
    mandatory
  • Entity ownership
  • Used for operation control
  • Static ownership
  • Owner part of an entity is fixed to the specific
    partition regardless of mesh partitioning
  • Not suitable for adaptive analysis due to severe
    load imbalance
  • Dynamic ownership
  • Owner part of an entity is determined dynamically
    depending on mesh partitioning

17
Distributed Mesh Data Structure
  • Mesh Migration with Full Complete Representations
  • Given a list of pairs <partition object,
    destination part id>
  • STEP 1: Collect entities to be updated and reset
    their P[M_i^d] and partition classification
  • STEP 2: Determine P[M_i^d] of partition objects and
    downward entities
  • STEP 3: Based on P[M_i^d], update the partition model
    and collect entities to remove from each part
  • STEP 4: Exchange entities and update remote
    copies
  • STEP 5: Remove unnecessary entities collected in
    STEP 3
  • STEP 6: Update the owner part of partition model
    entities
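A highly simplified sketch of these migration steps, for a 2-D mesh of triangles stored as vertex triples: listed elements move to their destination parts, then the per-vertex residence sets (which drive the remote-copy links) are rebuilt. All names and the data layout are illustrative.

```python
# Toy migration: move elements, then recompute vertex residence sets.

def migrate(parts, moves):
    """parts: part id -> set of elements (frozensets of vertex ids).
    moves: list of (element, destination part id) pairs."""
    for elem, dest in moves:                    # STEPs 1-4: exchange entities
        for elems in parts.values():
            elems.discard(elem)
        parts[dest].add(elem)
    vertex_parts = {}                           # STEPs 4-6: rebuild the links
    for pid, elems in parts.items():            # (which parts hold copies of
        for elem in elems:                      # each shared vertex)
            for v in elem:
                vertex_parts.setdefault(v, set()).add(pid)
    return vertex_parts
```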

18
Flexible Distributed Mesh Data Structure
  • Mesh Migration with reduced complete
    representation
  • STEP A: collect neighboring part objects (-)
  • STEP B: restore downward interior entities (-)
  • STEP 1: collect entities to update and clear
    their partition classification and P[M_i^d]
  • STEP 2: Determine P[M_i^d]
  • STEP 3: Update partition classification and
    collect entities to remove
  • STEP 4: create only the necessary migrate-in entities
    in the representation and update remote copies
  • Do not send interior entities that will not be on
    the partition boundary (+)
  • STEP 5: remove unnecessary migrate-out entities
  • STEP 6: update entity ownership
  • STEP C: remove unnecessary interior entities and
    adjacencies (-)
  • + savings in migration time with the flexible mesh
    representation in parallel
  • - losses in migration time with the flexible mesh
    representation in parallel

Figures: serial 2D mesh with MSR (minimum
sufficient representation); partitioned 2D mesh
with MSR (part P0)
19
Flexible Distributed Mesh Data Structure
  • Examples 2-D mesh migration with the reduced
    representation

Figure: (a) mark destination part ids;
(b) Step A: get neighboring part objects;
(c) Step B: restore interior entities
20
Parallel Mesh Adaptation
  • Parallelization of refinement: perform on each
    part and synchronize at inter-part boundaries
  • Parallelization of coarsening and swapping:
    migrate the cavity (on-the-fly) and perform the
    operation locally on one part
  • Requires updating the evolving communication links
    between parts and dynamic mesh partitioning

21
Mesh Adaptation - Uniform Refinement
  • Tests run on IBM Blue Gene/L
  • Slow processors
  • Fast communication

initial (265k tets)
  • At 8 processors
  • Initial mesh - 33K/processor
  • Final mesh - 226K/processor
  • At 128 processors
  • Initial mesh 2.0K/processor
  • Final mesh 16.7K/processor

adapted (2,127k tets)
  • Scalability for one iteration of mesh adaptation

22
Mesh Adaptation - Refinement
  • Communication time for one iteration of mesh
    adaptation
  • Total time for one iteration of mesh adaptation
  • Communication to total time ratio for one
    iteration of mesh adaptation

23
Mesh Adaptation - Refinement and Coarsening
  • Scalability for five iterations of mesh adaptation

initial mesh (1,528k tets)
  • At 8 processors
  • Initial mesh - 191K/processor
  • Final mesh - 241K/processor
  • At 128 processors
  • Initial mesh 11.9K/processor
  • Final mesh 15.0K/processor

adapted mesh (1,926k tets)
24
Mesh Adaptation - Refinement and Coarsening
  • Communication time for five iterations
  • Total time for five iterations
  • Communication to total time ratio for five
    iterations

25
Mesh Adaptation for 1 Billion Element Mesh
Mesh size field of air bubbles distributed in a
tube (segment of the model)
Number of regions of adapted mesh among 16k parts
  • Initial mesh: uniform, 17,179,836 mesh regions
  • Adapted mesh: 160 air bubbles, 1,064,284,042 mesh
    regions
  • Multiple predictive load balancing steps are used
    to make the adaptation possible
  • Larger meshes possible (not out of memory) but
    this element count is appropriate for solver

Initial and adapted mesh at one bubble - colored
by magnitude of mesh size field
26
Predictive Load Balancing
  • Refinement of mesh before load balancing can lead
    to memory problems
  • Employ predictive load balancing to avoid the
    problem
  • Assign weights based on what will be refined
  • Apply dynamic load balancing
  • Refinement
  • May want to do some local migration

with predictive load balancing
without predictive load balancing
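The predictive weighting above can be sketched as follows: before refining, weight each element by the number of elements it is expected to become, e.g. (current size / desired size)^dim for isotropic refinement, and hand those weights to the dynamic load balancer so parts are balanced for the post-refinement mesh. The formula is illustrative.

```python
# Sketch: predicted per-element weights for predictive load balancing.

def predicted_weights(current_sizes, desired_sizes, dim=3):
    """Estimate how many elements each element becomes after refinement."""
    return [max(1.0, (hc / hd) ** dim)          # never below one element
            for hc, hd in zip(current_sizes, desired_sizes)]
```

An element at twice its desired size in 3-D counts as roughly eight future elements.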
27
Nodal Balance by Local Modification
  • For a lightly loaded mesh (small number of regions
    per process), a well-distributed mesh (based
    on the number of regions) can still have poor
    nodal balance.
  • A local modification method is used to balance the
    number of nodes on each part.
  • Region (node) ratio = number of regions
    (nodes) / average number of regions (nodes)
  • Average number of regions for the test: 2434
  • Number of parts: 1024

Node ratio before and after node balance
Region/node ratio before node balance
Before node balance
Region ratio before and after node balance
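The balance metric defined above is simply each part's count divided by the average count per part, so 1.0 everywhere means perfect balance. A one-line sketch:

```python
# Region (or node) ratio per part: count / average count per part.

def balance_ratios(counts_per_part):
    avg = sum(counts_per_part) / float(len(counts_per_part))
    return [c / avg for c in counts_per_part]
```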
28
Dynamic Load Balancing Needs
  • Basic needs for graph-based partitioner
  • Executed many times during a simulation - needs to
    scale and be efficient
  • Abstraction of a graph important - definition of
    graph nodes and edges is application dependent
  • Not hard to create edges near contact - need to
    determine information for mesh adaptation anyway
  • Real-valued weights (not integers)
  • Some additional functions needed
  • Moving small numbers of entities for specific
    needs
  • Consideration of multiple criteria (e.g., edges
    and nodes)
  • Possible multiple levels of interacting graphs
  • graphs and interactions defined by the
    applications
  • Important for multiphysics and/or multiscale
  • Current area of research at RPI.

29
FMDB is Part of ITAPS
  • ITAPS tools (http://www.itaps-scidac.org)
  • A core functionality of the Interoperable
    Technologies for Advanced Petascale Simulations
    (ITAPS) meshing tools
  • The only ITAPS component thus far that supports
  • Geometry-based adaptive analysis
  • Distributed mesh operations in parallel
  • Flexible mesh representation

30
Adaptive Loop for Accelerator Design
  • Complex CAD geometry
  • Physics modeling by the SLAC Omega3P
  • High level modeling accuracy needed
  • E.g., 0.1% error in frequency predictions
  • Parallel adaptive mesh control needed to provide
    accuracy needed

Initial mesh (1,595 tets)
Adapted mesh (23,082,517 tets)
31
Patient Specific Vascular Surgical Planning
(Stanford, RPI)
  • Virtual flow facility for patient specific
    surgical planning
  • High quality patient specific flow simulations
    needed quickly
  • Image patient, create model, apply adaptive flow
    simulation

Mesh generation
Path planning
Segmentation / solid modeling
Simulation
Load CT image
32
Reliable In-Time Cardiovascular Flow Simulations
  • Requirements
  • Must execute from image data
  • Geometric domains and meshes automatically
    constructed
  • Inflow BC from image data
  • Realistic representation of outflows and
    materials
  • Simulations must provide reliable flow results
  • Meshes with millions of elements typically
    required
  • Adaptive mesh control required to ensure accuracy
  • Simulations must execute in the time needed (15
    minutes)
  • Highly effective parallel computation required

Meshes by Simmetrix MeshSim
33
Example of Entire Process Image, Solid, Mesh
Wilson et al., Lect. Notes Comp. Sci., 2001,
2208:449-456
Stanford University Cardiovascular Biomechanics
Research Laboratory
34
Example of Entire Process Adaptive Meshes
35
Parallel Mesh Generation
  • Consider parallel mesh generation
  • Computational effort is related to the number of
    elements, but boundary elements have variable load
  • Only structure known at the start is the geometric
    model
  • Calculations evolve during mesh generation
  • All mesh generation steps operate in parallel
  • Meshes starting from solid model
  • Both structures created by the mesh generator are
    distributed
  • Octree - used for mesh control, localizing
    searches, interior templates
  • Mesh - topological hierarchy distributed over
    parts
  • Mesh generation steps
  • Surface mesh generation
  • Octree refinement
  • Template meshing of interior octants
  • Meshing boundary octants
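The octree that drives mesh control can be sketched with a simple size-field-driven refinement rule: an octant splits into eight children until its edge length is no larger than the desired mesh size at its center. This is an illustrative sketch, not the mesh generator's actual octree.

```python
# Sketch: size-field-driven octree refinement.

def refine(octant, size_field, leaves):
    """octant: ((x, y, z) origin, edge length); size_field: point -> size."""
    (x, y, z), h = octant
    center = (x + h / 2, y + h / 2, z + h / 2)
    if h <= size_field(center):                  # small enough: keep as leaf
        leaves.append(octant)
        return
    for dx in (0, h / 2):                        # otherwise split into 8
        for dy in (0, h / 2):
            for dz in (0, h / 2):
                refine(((x + dx, y + dy, z + dz), h / 2), size_field, leaves)
```

A uniform size field of 0.5 on a unit octant yields exactly eight leaves of edge 0.5.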

36
Abstraction of Multiscale Simulation
  • Practical multiscale simulations will
  • Take advantage of existing single scale tools
  • Automatically execute processes
  • Communicate information, accounting for
    transformations, between scales
  • Functional components needed
  • Specification and interaction of physical,
    mathematical and computational models
  • Definition, construction and transformation of
    domains
  • Specification of physical parameters in the form
    of tensors
  • Specification and execution of scale linking
  • Adaptive multiscale simulation requires petascale
    computing
  • Just starting to consider doing this in parallel
  • Scaling multiscale simulations in parallel makes
    scaling adaptive FEM look easy

37
Closing Remarks
  • Making progress on moving adaptive simulations to
    petascale machines
  • Solver scaling well on some machines
  • Can adapt mesh in parallel on large numbers of
    processors
  • ITAPS is developing tools to support parallel
    mesh-based applications
  • Substantial challenges ahead of us
  • Dealing with the new machines - so far our code
    does not scale as well on Ranger as it did on the
    Blue Gene - we appear to have to tune for each machine
  • Dynamic load balancing is critical to our
    applications
  • Need it working well on all machines
  • Have additional requirements (some defined, some
    we are trying to define)