Title: A Framework for Parallel Finite Element Method Codes With Charm
1. A Framework for Parallel Finite Element Method Codes With Charm
M. Bhandarkar, T. Hinrichs, O. Lawlor, K. Mahesh, L.V. Kale, and J.H. Jeong, J. Dantzig
2. Objectives
- Assist in development of parallel FEM codes
- Support parallel implementation
- Provide capabilities of adaptive load balancing
- Yet, keep the application code free of
parallelization issues
3. Dendritic Growth
- Studies the evolution of solidification microstructures using a phase-field model computed on an adaptive finite element grid
- Adaptive refinement and coarsening of the grid involves re-partitioning
4. Crack Propagation
- Explicit FEM code
- Zero-volume cohesive elements inserted near the crack
- As the crack propagates, more cohesive elements are added near the crack, leading to severe load imbalance

Decomposition into 16 chunks (left) and 128 chunks, 8 per PE (right). The middle area contains cohesive elements. Pictures: S. Breitenfeld and P. Geubelle
5. FEM Framework Responsibilities

FEM Application (Initialize, Registration of Nodal Attributes, Loops Over Elements, Finalize)
FEM Framework (Update of Nodal Properties, Reductions over Nodes or Partitions)
Partitioner
Combiner
Charm (Dynamic Load Balancing, Communication)
METIS
I/O
6. Structure of an FEM Program
- Serial init() and finalize() subroutines
- Do serial I/O: read the serial mesh and call FEM_Set_Mesh
- Parallel driver() main routine
- One driver per partitioned mesh chunk
- Runs in a thread; the time loop looks like the serial version
- Does computation and calls FEM_Update_Field
- Framework handles partitioning, parallelization, and communication
7. Structure of an FEM Application

Diagram: a serial init() runs first; driver() routines then run concurrently, one per chunk, exchanging field updates across shared nodes; a serial finalize() runs last.
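The shared-node exchange on this slide can be illustrated with a small self-contained simulation (plain C, no framework calls; the chunk layout and field values are invented for illustration). Each chunk holds its own copy of a shared node, and the update step combines the copies so every chunk sees the same total, which is the role FEM_Update_Field plays for registered fields.

```c
#include <assert.h>
#include <stddef.h>

#define NCHUNKS 2
#define NNODES  3   /* nodes per chunk; node 2 of chunk 0 == node 0 of chunk 1 */

/* Local copy of a nodal field on each chunk. */
static double field[NCHUNKS][NNODES];

/* Each entry names two (chunk, node) copies of one physical shared node. */
struct shared { int chunk_a, node_a, chunk_b, node_b; };
static const struct shared shared_nodes[] = { { 0, 2, 1, 0 } };

/* Combine local contributions on shared nodes so all copies agree.
   This mimics the sum-and-broadcast step a parallel update performs. */
void update_field(void) {
    for (size_t i = 0; i < sizeof shared_nodes / sizeof shared_nodes[0]; i++) {
        const struct shared *s = &shared_nodes[i];
        double total = field[s->chunk_a][s->node_a] + field[s->chunk_b][s->node_b];
        field[s->chunk_a][s->node_a] = total;
        field[s->chunk_b][s->node_b] = total;
    }
}
```

In the real framework the two copies live on different processors and the combine step involves messages; here both sides are in one address space so the semantics are easy to check.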
8. Framework Calls
- FEM_Set_Mesh
- Called from initialization to set the serial mesh
- Framework partitions the mesh into chunks
- FEM_Create_Field
- Registers a node data field with the framework; supports user data types
- FEM_Update_Field
- Updates a node data field across all processors
- Handles all parallel communication
- Other parallel calls (reductions, etc.)
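Putting these calls together, an application might be structured as in the following C-like pseudocode. Only the function names come from the framework; the argument lists are elided or assumed here, not the framework's actual signatures.

```
/* init(): serial, runs once */
read_serial_mesh("mesh.dat");        /* hypothetical helper */
FEM_Set_Mesh(...);                   /* framework partitions into chunks */

/* driver(): one per chunk, runs in its own thread */
int f = FEM_Create_Field(...);       /* register a nodal field */
for (step = 0; step < nsteps; step++) {
    compute_over_local_elements();   /* serial-looking element loop */
    FEM_Update_Field(f, ...);        /* combine shared-node values */
}

/* finalize(): serial, runs once */
write_output();
```

The point of this structure is that the time loop in driver() reads like the sequential code; all partitioning and communication is behind the two framework calls.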
9. Implementation
- Requirements
- Multi-partitioning
- Latency-tolerant, better use of cache, flexibility for load balancing
- Migration-based adaptive load balancing
- Handles dynamic load variations in irregular applications
- Threads
- Parallel code structure similar to sequential codes
- Charm supports all of these
- Data-driven parallel language and runtime system
10. Charm
- Adaptive latency tolerance
- Message-driven execution
- No blocking receives
- Interoperability with other parallel languages
- Run-time system allows modules using Charm, Threads, PVM, MPI to be combined
- Runs on a variety of machines
- Clusters of workstations (Unix, Linux, Windows NT)
- Massively parallel machines (ASCI Red, Cray T3E)
- SMP machines (IBM SP, SGI Origin)
11Load Balancing Framework
- Based on object migration and measurement of load
information - Partition problem more finely than the number of
available processors - Partitions implemented as objects (or threads)
and mapped to available processors by LB
Framework - Runtime system measures actual computation times
of every partition, as well as communication
patterns - Variety of plug-in LB strategies available
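The benefit of partitioning more finely than the processor count can be sketched with a simple greedy rebalancer (self-contained C; the load numbers in the test are invented). Objects are assigned, heaviest first, to the currently least-loaded processor, which is the flavor of strategy a measurement-based framework can plug in; with many small objects per processor, the resulting maximum load lands close to the average.

```c
#include <stddef.h>

/* Assign each object (partition) to the least-loaded processor so far,
   scanning objects from heaviest to lightest; `load` must be sorted in
   descending order. Returns the resulting maximum processor load. */
double greedy_map(const double *load, size_t nobj, double *pe_load, size_t npes) {
    for (size_t p = 0; p < npes; p++) pe_load[p] = 0.0;
    for (size_t i = 0; i < nobj; i++) {
        size_t best = 0;
        for (size_t p = 1; p < npes; p++)
            if (pe_load[p] < pe_load[best]) best = p;
        pe_load[best] += load[i];
    }
    double max = pe_load[0];
    for (size_t p = 1; p < npes; p++)
        if (pe_load[p] > max) max = pe_load[p];
    return max;
}
```

A real strategy would also weigh the measured communication patterns when choosing a destination processor; this sketch balances computation load only.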
12. Load Balancing Framework
13. Crack Propagation

Decomposition into 16 chunks (left) and 128 chunks, 8 per PE (right). The middle area contains cohesive elements. Both decompositions obtained using METIS. Pictures: S. Breitenfeld and P. Geubelle
14. Overhead of Multipartitioning
15. Load balancer in action

Automatic load balancing in crack propagation:
1. Elements added
2. Load balancer invoked
3. Chunks migrated
16. Charm Threads
- A single scheduler for both objects and threads (based on generalized messages)
- Non-preemptive
- Complete control over scheduling
- No locking overhead
- Allow migration
- Isomalloc thread stacks balance creation overhead with the ability to migrate
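A non-preemptive scheduler of the kind described can be reduced to a toy version (self-contained C; all names are illustrative, not the Charm scheduler's API). Work units wait in a single queue and each runs to completion before the next is dequeued, which is why handlers can touch shared state without locks.

```c
#include <stddef.h>

/* A "message" pairs a handler with its argument; the scheduler runs
   handlers one at a time, to completion, in FIFO order. */
typedef void (*handler_t)(int);

#define QCAP 16
static handler_t q_fn[QCAP];
static int q_arg[QCAP];
static size_t q_head = 0, q_tail = 0;

void enqueue(handler_t fn, int arg) {
    q_fn[q_tail % QCAP] = fn;
    q_arg[q_tail % QCAP] = arg;
    q_tail++;
}

/* Drain the queue; handlers may enqueue further messages. Because
   execution is non-preemptive, handlers need no locks around `total`. */
void scheduler_run(void) {
    while (q_head < q_tail) {
        size_t i = q_head++ % QCAP;
        q_fn[i](q_arg[i]);
    }
}

/* Example handler updating shared state with no synchronization. */
static int total = 0;
static void add(int x) { total += x; }
```

The real system schedules both message-driven objects and threads through one such loop; the generalized-message idea is that a thread resumption is just another queue entry.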
17. Migrating Threads
- Problem: references to local variables on the stack are undefined after migration
- Solution: force threads to use identical virtual addresses on any processor
- Stack-copying
- Isomalloc
18. Migrating Threads
- Preliminary implementation: stack-copy on context-switch
- All threads execute on the process stack
- Stack copied on every context-switch
- Expensive
- Current implementation: isomalloc
- Threads use identical virtual addresses on any processor
- Stacks are locally allocated, but globally reserved
- Sync-less algorithm using memory slotting
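One way the slotted reservation can work is with address arithmetic alone (self-contained C; the base address and slot size are invented constants, not the framework's values). The globally reserved range is divided into fixed-size slots, and processor p owns exactly the slots whose index is equal to p modulo the processor count, so each processor can hand out addresses that no other processor will ever use, without any synchronization.

```c
#include <stdint.h>

#define SLOT_SIZE  0x10000u        /* 64 KB per slot (illustrative) */
#define BASE_ADDR  0x80000000ull   /* start of the reserved range (illustrative) */

/* Address of the k-th slot owned by processor `pe` out of `npes`.
   Interleaving slots by processor means no two processors can ever
   compute the same address, so no coordination is required. */
uint64_t slot_addr(unsigned pe, unsigned npes, unsigned k) {
    uint64_t slot = (uint64_t)k * npes + pe;
    return BASE_ADDR + slot * SLOT_SIZE;
}
```

Because every processor reserves the whole range but only touches its own slots, a migrating stack can be mapped at the same virtual address on the destination processor, keeping pointers into it valid.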
19. Overhead of Context-switch
20. Scalability of FEM Framework
21. Future Work
- More support for dynamic FEM applications
- Adaptive refinement/coarsening
- Insertion/deletion of elements