Parallel-FFTs in 3D: Testing different implementation schemes
Luis Acevedo-Arreguin, Benjamin Byington, Erinna Chen, Adrienne Traxler
Department of Applied Mathematics and Statistics, UC Santa Cruz
Introduction
  • Motivation
  • Many problems in geophysics or astrophysics can
    be modeled using the Navier-Stokes equations,
    e.g. the Earth's magnetic field, mantle
    convection, the solar convection zone, and
    ocean circulation.
  • Various numerical methods can be employed to
    solve these problems. Because of geometry and
    symmetry, many numerical codes employ a spectral
    method to solve the fluid dynamics.
  • Solutions can be written as sums of periodic
    functions and their corresponding weights
    (spectral coefficients).
  • This provides solutions that are continuous
    (values can be obtained at positions not on the
    grid).
  • Particularly computationally challenging is the
    calculation of the non-linear terms in the
    equations.
  • Using spectral coefficients, the calculation of
    non-linear terms for a 3-D space is O(n³).
  • If non-linear terms are calculated in grid
    (real) space, the calculation is reduced to O(n²).
  • Fast Fourier transforms (FFTs), at O(n² log n),
    make it numerically tractable to perform frequent
    transforms from real to spectral space (forward)
    and from spectral to real space (backward); a
    minimal sketch of this pattern follows below.
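
As a concrete illustration of this forward/backward pattern, the sketch below computes a single non-linear product pseudo-spectrally using the serial FFTW 3 interface. It is a minimal 1D example of our own (the poster's solver is 3D); the normalization convention and the absence of de-aliasing are simplifying assumptions.

/* A minimal 1D sketch (ours, not from the poster) of the pseudo-spectral
 * pattern: transform spectral coefficients to grid space, form the
 * non-linear product pointwise, and transform back with FFTW 3.
 * Assumes u(x) = sum_k u_hat[k] e^{ikx}, so only the final forward
 * transform needs a 1/n normalization; de-aliasing is omitted. */
#include <fftw3.h>

void nonlinear_term(int n, fftw_complex *u_hat, fftw_complex *v_hat,
                    fftw_complex *uv_hat)
{
    fftw_complex *u  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *v  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *uv = fftw_malloc(sizeof(fftw_complex) * n);

    /* Backward (spectral -> real) transforms. */
    fftw_plan pu = fftw_plan_dft_1d(n, u_hat, u, FFTW_BACKWARD, FFTW_ESTIMATE);
    fftw_plan pv = fftw_plan_dft_1d(n, v_hat, v, FFTW_BACKWARD, FFTW_ESTIMATE);
    fftw_execute(pu);
    fftw_execute(pv);

    /* Pointwise complex product in real space: O(n) work instead of an
     * O(n^2) convolution over spectral coefficients. */
    for (int i = 0; i < n; i++) {
        uv[i][0] = u[i][0] * v[i][0] - u[i][1] * v[i][1];  /* real part */
        uv[i][1] = u[i][0] * v[i][1] + u[i][1] * v[i][0];  /* imag part */
    }

    /* Forward (real -> spectral) transform of the product; FFTW is
     * unnormalized, so divide by n. */
    fftw_plan puv = fftw_plan_dft_1d(n, uv, uv_hat, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(puv);
    for (int i = 0; i < n; i++) { uv_hat[i][0] /= n; uv_hat[i][1] /= n; }

    fftw_destroy_plan(pu); fftw_destroy_plan(pv); fftw_destroy_plan(puv);
    fftw_free(u); fftw_free(v); fftw_free(uv);
}

In 3D the same three-step pattern applies, but the forward and backward transforms become the parallel 3D FFTs discussed in the schemes below.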

Figure 3: The starting setup. Data is initially
decomposed into a stack of x-y planes, so each
processor can do a local FFT in two dimensions.
The data are then transposed so that each
processor holds a set of y-z slices, allowing
the final transform (in the z direction) to be
completed. The reverse procedure requires a
similar sequence (transform, transpose, complete
the transform).
Figure 1: (left) Magnetic field polarity reversal
caused by fluid motion in the Earth's core (G.
Glatzmaier, UCSC). (right) Layering phenomenon
caused by double-diffusion, similar to
thermohaline staircases in the ocean (S.
Stellmach, UCSC).
2. Transpose-Based parallel-FFTs
  • The transpose-based parallel-FFT utilizes FFTW
    to perform a 2-D FFT (an optimized in-processor
    implementation) and then redistributes all data
    for the FFT in the third dimension.
  • The optimal communication scheme for transposing
    the third dimension to perform the final FFT is
    not clear:
  • a) Complete all 2-D planes in-processor, call
    MPI_ALLTOALL to transpose the data, block until
    all of the transposed data has arrived, then
    perform the final FFT (a sketch of this scheme
    follows below).
  • b) Complete a single 2-D plane in-processor,
    call MPI_ALLTOALL to transpose each plane
    (overlapping communication with computation?),
    block until all of the transposed data has
    arrived, then perform the final FFT.
  • c) Complete a single 2-D plane in-processor,
    use an explicit asynchronous communication
    scheme, and perform the final FFT as data become
    available.
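
The sketch below outlines scheme (a) under simplifying assumptions of our own: a cubic NxNxN complex grid, P MPI ranks with N divisible by P, and a single blocking MPI_ALLTOALL. The names (fft3d_scheme_a, planes, slices, and so on) are ours, not from the poster's code.

/* Scheme (a) sketch: 2-D FFTs on all local x-y planes, one blocking
 * MPI_Alltoall to transpose, then 1-D FFTs along z.  Assumes a cubic
 * N^3 grid with N divisible by the number of ranks P. */
#include <fftw3.h>
#include <mpi.h>
#include <string.h>

/* `planes` holds this rank's N/P x-y planes as planes[z][y][x], x fastest.
 * Returns newly allocated data in slices[x][y][z] layout, fully transformed. */
fftw_complex *fft3d_scheme_a(int N, fftw_complex *planes, MPI_Comm comm)
{
    int P, rank;
    MPI_Comm_size(comm, &P);
    MPI_Comm_rank(comm, &rank);
    int nz_loc = N / P;              /* x-y planes held before the transpose */
    int nx_loc = N / P;              /* x values held after the transpose    */

    /* (1) Local 2-D FFTs over every x-y plane this rank owns. */
    int n2[2] = { N, N };
    fftw_plan p2d = fftw_plan_many_dft(2, n2, nz_loc,
                                       planes, NULL, 1, N * N,
                                       planes, NULL, 1, N * N,
                                       FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p2d);

    /* (2) Pack one contiguous block per destination rank and transpose. */
    size_t chunk = (size_t)nz_loc * N * nx_loc;  /* complex values per rank pair */
    fftw_complex *sendbuf = fftw_malloc(sizeof(fftw_complex) * chunk * P);
    fftw_complex *recvbuf = fftw_malloc(sizeof(fftw_complex) * chunk * P);
    for (int d = 0; d < P; d++)      /* rank d owns x in [d*nx_loc, (d+1)*nx_loc) */
        for (int z = 0; z < nz_loc; z++)
            for (int y = 0; y < N; y++)
                memcpy(&sendbuf[d * chunk + ((size_t)z * N + y) * nx_loc],
                       &planes[((size_t)z * N + y) * N + d * nx_loc],
                       sizeof(fftw_complex) * nx_loc);
    MPI_Alltoall(sendbuf, (int)(2 * chunk), MPI_DOUBLE,
                 recvbuf, (int)(2 * chunk), MPI_DOUBLE, comm);

    /* (3) Unpack so that z is contiguous, then finish with 1-D FFTs along z. */
    fftw_complex *slices = fftw_malloc(sizeof(fftw_complex) * chunk * P);
    for (int s = 0; s < P; s++)      /* source rank held global z = s*nz_loc + z */
        for (int z = 0; z < nz_loc; z++)
            for (int y = 0; y < N; y++)
                for (int x = 0; x < nx_loc; x++)
                    memcpy(&slices[((size_t)x * N + y) * N + s * nz_loc + z],
                           &recvbuf[s * chunk + ((size_t)z * N + y) * nx_loc + x],
                           sizeof(fftw_complex));
    int n1[1] = { N };
    fftw_plan p1d = fftw_plan_many_dft(1, n1, nx_loc * N,
                                       slices, NULL, 1, N,
                                       slices, NULL, 1, N,
                                       FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p1d);

    fftw_destroy_plan(p2d); fftw_destroy_plan(p1d);
    fftw_free(sendbuf); fftw_free(recvbuf);
    return slices;                   /* slices[x][y][z], x local to this rank */
}

Schemes (b) and (c) replace the single blocking MPI_Alltoall with per-plane collectives or explicit non-blocking point-to-point communication, so that communication can overlap the remaining 2-D transforms.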

Schemes
  1. A naïve parallel-FFT

Figure 2: Schematic representation of a naïve
implementation of a 3D FFT. Rows, columns, and
finally stacks are sent to slave processors,
which perform 1D FFTs on the received vectors.
  • The Parallel Problem
  • If the code is serial, FFTW takes all the work
    out of optimizing the FFT.
  • In general, we want large domains in order to
    resolve the fluid dynamics for realistic
    geophysical/astrophysical constants.
  • Large domains mean large distributed clusters.
  • We would like to use FFTW because it is the
    fastest algorithm (in the West): autotuning is
    done for us.
  • Multiple parallelization schemes can be employed:
  • Transpose-based: data in a single direction is
    in-processor; once the FFT is completed,
    re-decompose the domain in another direction.
  • Parallel: leave data in place and send pieces to
    other processors as necessary.
  • Which is faster? It is not obvious at the outset.
  • The data are decomposed into vectors along the
    three directions. Each vector, whether a row,
    column, or stack, is sent to a slave processor.
    A master processor distributes tasks, receives
    the 1D FFT performed on each vector by the slave
    processors, and sends more work to those
    processors that have finished their previous
    assignments (a sketch of this dispatch loop
    follows below).
  • This scheme aims to compute 3D FFTs on datasets
    that exhibit clustering, i.e. no homogeneity in
    density, which consequently requires more
    intensive computational work in some areas than
    in others.
  • This scheme is tested on NERSC's Franklin using
    the fftw 3.1.1 module.
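
A minimal sketch of this master/slave dispatch follows, under assumptions of our own: complex vectors sent as pairs of doubles, the vector index carried in the MPI tag, and at least as many vectors as slaves. None of these names come from the tested code.

/* Master/slave 1-D FFT dispatch sketch.  The MPI tag carries the vector
 * index (0 .. nvec-1); tag == nvec means "no more work".  Assumes
 * nvec >= nworkers and nvec below the MPI tag limit. */
#include <stddef.h>
#include <fftw3.h>
#include <mpi.h>

void slave(int n, int nvec)
{
    fftw_complex *v = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_plan p = fftw_plan_dft_1d(n, v, v, FFTW_FORWARD, FFTW_ESTIMATE);
    MPI_Status st;
    for (;;) {
        MPI_Recv(v, 2 * n, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == nvec) break;          /* termination message      */
        fftw_execute(p);                        /* 1-D FFT in place         */
        MPI_Send(v, 2 * n, MPI_DOUBLE, 0, st.MPI_TAG, MPI_COMM_WORLD);
    }
    fftw_destroy_plan(p); fftw_free(v);
}

void master(int n, int nvec, fftw_complex *data, int nworkers)
{
    int next = 0;
    double dummy = 0.0;
    MPI_Status st;
    /* Prime every slave with one vector. */
    for (int w = 1; w <= nworkers; w++, next++)
        MPI_Send(&data[(size_t)next * n], 2 * n, MPI_DOUBLE, w, next,
                 MPI_COMM_WORLD);
    /* Collect results and resend work to whichever slave just finished. */
    for (int done = 0; done < nvec; done++) {
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        MPI_Recv(&data[(size_t)st.MPI_TAG * n], 2 * n, MPI_DOUBLE,
                 st.MPI_SOURCE, st.MPI_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (next < nvec)
            MPI_Send(&data[(size_t)next * n], 2 * n, MPI_DOUBLE,
                     st.MPI_SOURCE, next++, MPI_COMM_WORLD);
        else                                    /* nothing left: send tag nvec */
            MPI_Send(&dummy, 0, MPI_DOUBLE, st.MPI_SOURCE, nvec, MPI_COMM_WORLD);
    }
}

Because the master hands out one vector at a time, faster slaves simply come back for more, which is what makes this scheme attractive for the clustered, non-homogeneous datasets mentioned above.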

3. Parallel version of FFTW 3.2, alpha
  • The most recent version of FFTW includes an
    implementation of a 3-D parallel FFT.
  • It is unclear how this is implemented: whether
    it performs a fully-parallel FFT (3-D domain
    decomposition) or a transpose-based FFT.
  • However, if it autotunes the implementation of
    the 3-D parallel FFT, it allows for greater
    portability and hides the cluster topology.
  • It potentially provides a parallel implementation
    where the end-user needs no knowledge of
    parallelism in order to calculate the FFT (see
    the sketch below).
  • This mirrors the serial case: use FFTW on any
    computer.
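
For comparison, the sketch below shows what handing the whole problem to FFTW's MPI interface looks like. The calls follow the fftw3-mpi API as documented for later releases (3.3); the exact 3.2-alpha interface tested here may differ, so treat the names as assumptions.

/* Letting FFTW's MPI interface do everything: the decomposition, local
 * FFTs, and communication/transposes are hidden behind one plan call.
 * API names follow the documented fftw3-mpi interface of later releases. */
#include <fftw3-mpi.h>

void fft3d_fftw_mpi(ptrdiff_t N, MPI_Comm comm)
{
    fftw_mpi_init();

    /* FFTW chooses the slab decomposition: this rank gets local_n0
     * planes (along the first dimension) starting at local_0_start. */
    ptrdiff_t local_n0, local_0_start;
    ptrdiff_t alloc = fftw_mpi_local_size_3d(N, N, N, comm,
                                             &local_n0, &local_0_start);
    fftw_complex *data = fftw_malloc(sizeof(fftw_complex) * alloc);

    /* One plan call; FFTW_ESTIMATE could be replaced by FFTW_MEASURE to
     * let FFTW autotune the parallel transform as well. */
    fftw_plan plan = fftw_mpi_plan_dft_3d(N, N, N, data, data, comm,
                                          FFTW_FORWARD, FFTW_ESTIMATE);

    /* ... fill data[0 .. local_n0*N*N - 1] with this rank's planes ... */

    fftw_execute(plan);

    fftw_destroy_plan(plan);
    fftw_free(data);
    fftw_mpi_cleanup();
}

Compared with schemes 1 and 2, the decomposition, packing, and MPI calls are no longer visible to the end-user.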

Acknowledgments: Thanks to Professor Nicholas
Brummell from UC Santa Cruz for his help on FFTs
after class, thanks to Professor James Demmel
from UC Berkeley for his teaching on parallel
computing, and thanks to Vasily Volkov for his
feedback on our homework.