The Zoltan Toolkit - Partitioning a Linear Accelerator for Tau3P

1
The Zoltan Toolkit - Partitioning a Linear
Accelerator for Tau3P
  • E. Boman, K. Devine, R. Heaphy, B. Hendrickson
    Sandia National Labs, NM
  • N. Folwell, K. Ko, M. Wolf, SLAC
  • A. Pinar, LBL
  • Sandia is a multiprogram laboratory operated by
    Sandia Corporation, a Lockheed Martin
    Company, for the United States Department of
    Energy's National Nuclear Security
    Administration under contract DE-AC04-94AL85000.

2
The Zoltan Toolkit
  • Parallel, dynamic, adaptive computations need
    many services to obtain peak performance.
  • Processor work loads change during computation.
  • Communication patterns are complicated.
  • Memory usage is dynamic.
  • Application developers wrote their own solutions.
  • Little expertise in such parallel algorithms.
  • No capability to compare approaches.
  • No code reuse.

Zoltan: Toolkit of data services for dynamic,
unstructured, adaptive computations
3
Zoltan Data Services
4
Support for Many Applications
  • Different applications, requirements, data
    structures.

5
Zoltan Interface
  • Simple, easy-to-use interface.
  • Small number of callable Zoltan functions.
  • Callable from C, C++, Fortran.
  • Data-structure neutral design.
  • Supports wide range of applications and data
    structures.
  • Imposes no restrictions on application's data
    structures.
  • Application does not have to build Zoltan's data
    structures.
  • Only requirement: unique global IDs for objects.
  • Application interface:
  • Zoltan queries the application for needed info.
  • IDs of objects, coordinates, relationships to
    other objects.
  • Application provides simple functions to answer
    queries.
  • Small extra costs in memory and function-call
    overhead.
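
The query-based design described above can be illustrated with a minimal Python sketch. This is not Zoltan's actual API: the class `ToyPartitioner`, the setters `set_obj_list_fn` and `set_geom_fn`, and the sample data `mesh_nodes` are hypothetical names chosen for illustration. The point is only that the library never sees the application's data structures; it calls back into the application for global IDs and coordinates.

```python
from typing import Callable, Dict, List, Tuple

Coords = Tuple[float, ...]


class ToyPartitioner:
    """Hypothetical stand-in for a partitioning library (not Zoltan's API)."""

    def __init__(self, num_parts: int) -> None:
        self.num_parts = num_parts
        # Defaults describe an empty application until callbacks are registered.
        self._obj_list_fn: Callable[[], List[int]] = lambda: []
        self._geom_fn: Callable[[int], Coords] = lambda gid: (0.0,)

    def set_obj_list_fn(self, fn: Callable[[], List[int]]) -> None:
        """Register the query that returns the application's global IDs."""
        self._obj_list_fn = fn

    def set_geom_fn(self, fn: Callable[[int], Coords]) -> None:
        """Register the query that returns coordinates for one global ID."""
        self._geom_fn = fn

    def partition(self) -> Dict[int, int]:
        """Sort objects by first coordinate and cut into equal-sized parts."""
        ids = self._obj_list_fn()
        ordered = sorted(ids, key=lambda gid: self._geom_fn(gid)[0])
        return {
            gid: (rank * self.num_parts) // len(ordered)
            for rank, gid in enumerate(ordered)
        }


# Application side: its own data structure, keyed by unique global IDs.
mesh_nodes = {101: (0.0, 1.0), 205: (3.0, 0.5), 7: (1.0, 2.0), 42: (2.0, 0.0)}

toy = ToyPartitioner(num_parts=2)
toy.set_obj_list_fn(lambda: list(mesh_nodes))
toy.set_geom_fn(lambda gid: mesh_nodes[gid])
parts = toy.partition()
print(parts)
```

The application pays one function call per query, which corresponds to the small function-call overhead noted in the last bullet.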

6
Partitioning and Dynamic Load Balancing
  • Goals for static partitioning:
  • Distribute work evenly among processors.
  • Minimize interprocessor communication.
  • Desirable characteristics for dynamic load
    balancing:
  • Keep data movement costs low.
  • Incremental partitioning: small changes in
    workloads produce only small changes in
    decomposition.
  • Parallel, scalable implementation.
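
The incremental property can be made concrete with a small Python sketch. It assumes a simple weighted one-dimensional cut along a line of objects, which is an illustrative simplification and not Zoltan's implementation; the function name `weighted_1d_partition` and the sample weights are hypothetical. After the workload of two objects doubles, only objects near the cut positions change parts.

```python
from typing import Dict


def weighted_1d_partition(
    coords: Dict[int, float], weights: Dict[int, float], num_parts: int
) -> Dict[int, int]:
    """Cut objects, ordered by coordinate, into parts of roughly equal weight."""
    order = sorted(coords, key=coords.get)
    total = sum(weights[gid] for gid in order)
    part = {}
    running = 0.0
    for gid in order:
        # Place each object by the midpoint of its cumulative-weight interval.
        midpoint = running + weights[gid] / 2
        part[gid] = min(int(midpoint * num_parts / total), num_parts - 1)
        running += weights[gid]
    return part


# 16 objects on a line, unit weight each, split over 4 parts.
coords = {i: float(i) for i in range(16)}
weights = {i: 1 for i in range(16)}
before = weighted_1d_partition(coords, weights, 4)

# Workload change: objects 0 and 1 become twice as expensive.
weights[0] = 2
weights[1] = 2
after = weighted_1d_partition(coords, weights, 4)

moved = [gid for gid in coords if before[gid] != after[gid]]
print(moved)
```

In this example 4 of the 16 objects migrate, each to a neighboring part, while the rest of the decomposition is unchanged.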

7
No One-Size-Fits-All Solutions
  • No single partitioner works best for all
    applications.
  • Trade-offs:
  • Quality vs. speed.
  • Geometric locality vs. data dependencies.
  • Low data-movement costs vs. tolerance for
    remapping.
  • Application developers may not know which
    partitioner is best for their application.
  • Zoltan contains a suite of partitioning methods.
  • Application changes only one parameter to switch
    methods.
  • Allows experimentation/comparisons to find most
    effective partitioner for application.
  • Advantage of toolkit approach.
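
The idea of switching methods through a single parameter can be sketched as a lookup table of partitioners. The method labels `"BLOCK"` and `"COORD"` and all function names below are hypothetical and chosen for illustration; they are not Zoltan parameter values.

```python
from typing import Callable, Dict, Tuple

Coords = Dict[int, Tuple[float, ...]]
Partition = Dict[int, int]


def block_partition(coords: Coords, num_parts: int) -> Partition:
    """Split objects into equal chunks by global ID order."""
    ids = sorted(coords)
    return {gid: (i * num_parts) // len(ids) for i, gid in enumerate(ids)}


def coordinate_partition(coords: Coords, num_parts: int) -> Partition:
    """Split objects into equal chunks by their first coordinate."""
    ids = sorted(coords, key=lambda gid: coords[gid][0])
    return {gid: (i * num_parts) // len(ids) for i, gid in enumerate(ids)}


# One table entry per method; the application only changes the key it passes.
METHODS: Dict[str, Callable[[Coords, int], Partition]] = {
    "BLOCK": block_partition,
    "COORD": coordinate_partition,
}


def partition(coords: Coords, num_parts: int, method: str) -> Partition:
    if method not in METHODS:
        raise ValueError(f"unknown partitioning method: {method!r}")
    return METHODS[method](coords, num_parts)


coords = {0: (3.0, 0.0), 1: (0.0, 0.0), 2: (2.0, 0.0), 3: (1.0, 0.0)}

# Run every available method on the same data to compare the results.
results = {name: partition(coords, 2, name) for name in METHODS}
for name, result in results.items():
    print(name, result)
```

Because every method shares one call signature, comparing partitioners on the same data is a loop over method names rather than a rewrite of the application.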

8
Zoltan Suite of Partitioning Algorithms
  • Recursive Coordinate Bisection (Berger, Bokhari)
  • Recursive Inertial Bisection
  • ParMETIS (Karypis, Schloegel, Kumar)
  • Jostle (Walshaw)
  • Space Filling Curves (Peano, Hilbert)
  • Refinement-tree Partitioning (Mitchell)
  • Octree Partitioning (Loy, Flaherty)
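
As an illustration of the first method in the list, here is a minimal serial Python sketch of recursive coordinate bisection. It assumes unit weights and cuts along the dimension of longest extent; it is a textbook-style sketch, not Zoltan's parallel implementation, and the function name `rcb` is a hypothetical choice.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def rcb(
    points: Dict[int, Point], ids: List[int], num_parts: int, first_part: int = 0
) -> Dict[int, int]:
    """Assign each ID a part number in [first_part, first_part + num_parts)."""
    if not ids:
        return {}
    if num_parts == 1:
        return {gid: first_part for gid in ids}

    # Choose the coordinate direction with the largest extent.
    dims = len(points[ids[0]])
    extents = [
        max(points[g][d] for g in ids) - min(points[g][d] for g in ids)
        for d in range(dims)
    ]
    cut_dim = extents.index(max(extents))

    # Cut so that object counts are proportional to the parts on each side.
    ordered = sorted(ids, key=lambda g: points[g][cut_dim])
    left_parts = num_parts // 2
    split = (len(ordered) * left_parts) // num_parts

    result = rcb(points, ordered[:split], left_parts, first_part)
    result.update(
        rcb(points, ordered[split:], num_parts - left_parts, first_part + left_parts)
    )
    return result


# A 4 x 2 grid of points; the ID of point (x, y) is 2 * x + y.
points = {2 * x + y: (float(x), float(y)) for x in range(4) for y in range(2)}
parts = rcb(points, list(points), num_parts=4)
print(parts)
```

Restricting the choice of `cut_dim` to a single fixed axis would give a one-dimensional variant that produces slabs along that axis.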
9
SLAC SciDAC project
55-cell Linear Accelerator with couplers:
1,122,445 elements (H60VG3). Courtesy of Michael
Wolf, SLAC.
  • Tau3P: Electromagnetic field solver (SLAC)
  • Kwok Ko, N. Folwell, M. Wolf (SLAC); K. Devine
    (SNL); A. Pinar (LBL).
  • Long simulation times
  • Tens of thousands of CPU hours
  • Communication cost dominates
  • Need high-quality static partitioning

10
Several Partitioning Methods
11
RCB-1D Partitioning
12
5 Cell RDDS (32 processors)

Partitioning   Tau3P Runtime   Max Adj. Procs   Sum Adj. Procs   Max Bound. Objs   Sum Bound. Objs
ParMETIS       165.5 s          8               134               731              16405
RCB-1D (z)      67.7 s          3                66              2683              63510
RCB-3D         373.2 s         10               208              1404              24321
RIB-3D         266.8 s          8               162               808              20156
HSFC-3D        272.2 s         10               202              1279              26684

2.0 ns runtime, IBM SP3 (NERSC)
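
The four quality columns in the table can be computed from a neighbor graph and a part assignment. The slides do not define the metrics, so the Python sketch below assumes that a part is adjacent to another when any of its objects has a neighbor there, and that a boundary object is one with at least one off-part neighbor. The function name `partition_metrics` and the 4 x 4 grid example are hypothetical.

```python
from typing import Dict, Hashable, List


def partition_metrics(
    neighbors: Dict[Hashable, List[Hashable]], part: Dict[Hashable, int]
) -> Dict[str, int]:
    """Adjacent-part and boundary-object counts under the assumed definitions."""
    adjacent = {p: set() for p in set(part.values())}
    boundary = {p: set() for p in set(part.values())}
    for obj, nbrs in neighbors.items():
        for other in nbrs:
            if part[other] != part[obj]:
                adjacent[part[obj]].add(part[other])
                boundary[part[obj]].add(obj)
    return {
        "max_adj_procs": max(len(s) for s in adjacent.values()),
        "sum_adj_procs": sum(len(s) for s in adjacent.values()),
        "max_bound_objs": max(len(s) for s in boundary.values()),
        "sum_bound_objs": sum(len(s) for s in boundary.values()),
    }


# 4 x 4 grid of cells, each connected to its edge neighbors.
N = 4
cells = [(x, y) for x in range(N) for y in range(N)]
steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
neighbors = {
    (x, y): [
        (x + dx, y + dy)
        for dx, dy in steps
        if 0 <= x + dx < N and 0 <= y + dy < N
    ]
    for x, y in cells
}

# Four parts as 1D slabs (cut in x only) versus 2 x 2 blocks.
slabs = {(x, y): x for x, y in cells}
blocks = {(x, y): 2 * (x // 2) + (y // 2) for x, y in cells}

slab_metrics = partition_metrics(neighbors, slabs)
block_metrics = partition_metrics(neighbors, blocks)
print("slabs ", slab_metrics)
print("blocks", block_metrics)
```

On this toy grid the slabs have fewer adjacent parts in total but more boundary objects than the blocks, which is the same qualitative pattern as the RCB-1D row of the table.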
13
Coupler Port Grouping Complication
14
U Partitioning of 5 cell (32 processors)
15
Tau3P Speedup
16
Summary
55-cell Linear Accelerator with couplers:
1,122,445 elements. Courtesy of Michael Wolf,
SLAC.
  • Don't blindly use a graph partitioner.
  • In this case, 1D RCB is much better.
  • Performance is sensitive to the number of adjacent
    processors (not the edge cut in the graph).
  • Zoltan toolkit:
  • Provides easy access to several algorithms.
  • Zoltan's 1D geometric partitioner reduced runtime
    by up to 68% on a 512-processor IBM SP3.
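
For comparison with the 512-processor figure, the relative runtime reduction on 32 processors can be worked out from the 5-cell RDDS table on slide 12. The Python sketch below uses only the runtimes listed there; the helper name `percent_reduction` is a hypothetical choice, and the 512-processor result itself cannot be reproduced from the data in this transcript.

```python
# Tau3P runtimes in seconds, copied from the 5-cell RDDS table (32 processors).
runtimes = {
    "ParMETIS": 165.5,
    "RCB-1D (z)": 67.7,
    "RCB-3D": 373.2,
    "RIB-3D": 266.8,
    "HSFC-3D": 272.2,
}


def percent_reduction(baseline: float, improved: float) -> float:
    """Runtime saved relative to the baseline, as a percentage."""
    return 100.0 * (baseline - improved) / baseline


# Compare every other method against the fastest one.
best = min(runtimes, key=runtimes.get)
reductions = {
    name: round(percent_reduction(seconds, runtimes[best]), 1)
    for name, seconds in runtimes.items()
    if name != best
}
for name, value in reductions.items():
    print(f"{best} vs {name}: {value}% less runtime")
```

On 32 processors the 1D partitioning runs in about 59% less time than the ParMETIS partitioning.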

17
For More Zoltan Information...
  • Zoltan Home Page
  • http://www.cs.sandia.gov/Zoltan
  • User's and Developer's Guides
  • Download Zoltan software under GNU LGPL.
  • Email
  • zoltan_at_cs.sandia.gov

19
Applications: Adaptive Mesh Refinement
  • Dynamic load balancing.
  • Redistribute elements after mesh refinement.
  • Keep data movement costs low.
  • Recursive Coordinate Bisection
  • Parent and child elements assigned to same
    processor.
  • Inexpensive.
  • Incremental.

Using RCB with AMR in SIERRA (Edwards, Rath,
Lober, et al., Sandia)
20
U Partitioning vs. Z Partitioning
[Plot: run time (s) and max. adjacent processors versus number of processors, for RCB-1D-Z and RCB-1D-U partitionings.]