Implementation of Fast Fourier Transform for Three Dimensional Cubic Matrices

1
Implementation of Fast Fourier Transform for
Three Dimensional Cubic Matrices
Alex C. Szatmary
Department of Mechanical Engineering
University of Maryland, Baltimore County
al1@umbc.edu
December 18, 2006
Sponsored by NSF GRFP
2
Purpose
  • Implement a three-dimensional fast Fourier
    transform on a Beowulf cluster
  • Application: numerical PDEs
Alex C. Szatmary
3
Introduction to FFT
  • Fourier transform (FT) represents continuous
    functions as a superposition (infinite sum or
    integral) of sinusoidal basis functions
  • Discrete Fourier transform (DFT) represents a
    discrete series of data as a finite sum of
    sinusoidal basis functions
  • Fast Fourier transform (FFT): O(n log(n))
    implementation of the DFT (Cooley-Tukey, etc.),
    versus O(n^2) for direct evaluation
  • Simplest implementation for n = 2^m elements
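To illustrate the last two points, here is a minimal recursive radix-2 Cooley-Tukey FFT, checked against a direct O(n^2) DFT. It is written in Python purely for illustration (the project itself used Fortran 90); the function names `fft` and `dft` are hypothetical, and the sign convention exp(-2*pi*i*k*t/n) is an assumption.

```python
import cmath


def dft(x):
    """Direct O(n^2) discrete Fourier transform, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be n = 2^m."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples (length n/2)
    odd = fft(x[1::2])   # DFT of the odd-indexed samples (length n/2)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]  # n = 8 = 2^3
    err = max(abs(a - b) for a, b in zip(fft(data), dft(data)))
    print(f"max |fft - dft| = {err:.1e}")
```

Each level of the recursion does O(n) work and there are log2(n) levels, which gives the O(n log(n)) cost; the requirement n = 2^m is what lets the even/odd halving continue down to length 1.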

4
Introduction to FFT
5
Terminology
  • n = 2^m
  • N = n^3 elements total
  • A(i,j,k): the first index changes fastest
    (Fortran column-major order)
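With this storage order, the memory offset of A(i,j,k) follows directly from n. The small Python sketch below (the function name `offset` is hypothetical; it takes 1-based indices as Fortran does and returns a 0-based offset) shows that stepping i moves 1 element, stepping j moves n elements, and stepping k moves n^2 elements.

```python
def offset(i, j, k, n):
    """0-based memory offset of A(i,j,k) in an n x n x n array stored in
    Fortran column-major order, with 1-based indices i, j, k."""
    return (i - 1) + n * (j - 1) + n * n * (k - 1)


if __name__ == "__main__":
    n = 4  # n = 2^2, so N = n^3 = 64
    print(offset(1, 1, 1, n))                       # 0  (first element)
    print(offset(2, 1, 1, n) - offset(1, 1, 1, n))  # 1  (stride in i)
    print(offset(1, 2, 1, n) - offset(1, 1, 1, n))  # 4  (stride in j = n)
    print(offset(1, 1, 2, n) - offset(1, 1, 1, n))  # 16 (stride in k = n^2)
    print(offset(n, n, n, n))                       # 63 (last element, N - 1)
```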
6
3DFFT Algorithm
Scale (cheap)
do k = 1, n
  do j = 1, n
    FFT in i-direction on A(:,j,k)
do k = 1, n
  do i = 1, n
    FFT in j-direction on A(i,:,k)
do j = 1, n
  do i = 1, n
    FFT in k-direction on A(i,j,:)
Cost: 3 directions × n^2 FFTs of length n × O(n log(n)) each
= O(n^3 log(n))
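The loop structure above can be sketched in Python on a flat list holding A in column-major order (offset i + n*j + n^2*k with 0-based indices). This is an illustrative sketch, not the project's Fortran 90 code: the names `fft`, `fft3d`, and `dft3d` are hypothetical, and the `scale` argument stands in for the slide's unspecified "Scale" step (for example 1/N for a normalized transform). The j- and k-direction sweeps read the array with strides n and n^2.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT of a list whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fft3d(a, n, scale=1.0):
    """In-place 3D FFT of flat list `a` (n x n x n array, column-major order)."""
    # Scale (cheap): one multiplication per element
    for p in range(n ** 3):
        a[p] *= scale
    # FFT in i-direction on A(:,j,k): contiguous, stride 1
    for k in range(n):
        for j in range(n):
            s = n * j + n * n * k
            a[s:s + n] = fft(a[s:s + n])
    # FFT in j-direction on A(i,:,k): stride n
    for k in range(n):
        for i in range(n):
            s = i + n * n * k
            a[s:s + n * n:n] = fft(a[s:s + n * n:n])
    # FFT in k-direction on A(i,j,:): stride n^2
    for j in range(n):
        for i in range(n):
            s = i + n * j
            a[s::n * n] = fft(a[s::n * n])
    return a


def dft3d(a, n):
    """Direct O(N^2) 3D DFT, returned in the same column-major layout."""
    def w(m):
        return cmath.exp(-2j * cmath.pi * m / n)
    return [sum(a[i + n * j + n * n * k] * w(u * i + v * j + q * k)
                for k in range(n) for j in range(n) for i in range(n))
            for q in range(n) for v in range(n) for u in range(n)]


if __name__ == "__main__":
    n = 4
    a = [complex(p % 5, (3 * p) % 7) for p in range(n ** 3)]
    err = max(abs(x - y) for x, y in zip(fft3d(list(a), n), dft3d(a, n)))
    print(f"max |fft3d - dft3d| = {err:.1e}")
```

Because the 3D DFT kernel factors into one exponential per index, three sweeps of 1D FFTs give the same result as the direct triple sum, at O(n^3 log(n)) instead of O(n^6) cost.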
7
Parallelization
  • Do 2 loops in parallel, 1 in serial
  • Costs:
      • Floating point operations
      • Data transfer between memory and cache
      • Network communication
      • Array copy operations

8
Machine Specifications
  • Kali Beowulf Cluster, Math Department at UMBC
  • 30 nodes
  • 2 Intel Xeon 2.0 GHz processors per node
  • 1 GB RAM each
  • Fortran 90
  • MPI
  • http://www.math.umbc.edu/~gobbert/kali/

9
Optimal split of array
  • Why split in k-direction?
      • Natural
      • Maintains element contiguity
      • Can use MPI_Scatter/MPI_Gather
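The contiguity argument can be checked with a few lines of Python. Assuming p processes, p dividing n, and a block of n/p consecutive k-planes per process (the names `k_slab_offsets` and `is_contiguous` are hypothetical), each process's share of the column-major array is one unbroken run of n^3/p elements. That matches what MPI_Scatter sends: consecutive, equal-sized segments of the send buffer, one per rank in rank order, with MPI_Gather reassembling them the same way.

```python
def k_slab_offsets(n, p, rank):
    """0-based offsets owned by `rank` when an n x n x n column-major array
    is split along k into p equal slabs (assumes p divides n)."""
    planes = n // p
    k_lo = rank * planes
    return [i + n * j + n * n * k
            for k in range(k_lo, k_lo + planes)
            for j in range(n)
            for i in range(n)]


def is_contiguous(offsets):
    """True if the offsets form one unbroken run in memory order."""
    return offsets == list(range(offsets[0], offsets[0] + len(offsets)))


if __name__ == "__main__":
    n, p = 8, 4
    for rank in range(p):
        offs = k_slab_offsets(n, p, rank)
        print(rank, offs[0], len(offs), is_contiguous(offs))
```

For n = 8 and p = 4 this prints start offsets 0, 128, 256, and 384, each slab holding 128 elements and reported as contiguous.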

10
Tried k-direction split
11
Serial Timing Results by Part
12
Why did k-direction split fail?
  • Same flops needed as i- or j-direction split
  • Same amount of information to be communicated
    across the network through MPI
  • Fewer array copy operations
  • Higher cost of memory access for FFT in j and k
    directions (strided access: stride n in j,
    n^2 in k)!
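A rough model of why strided access costs more: count the distinct cache lines touched while reading one line of A in each direction. The sketch assumes 16-byte double-complex elements, 64-byte cache lines, and an aligned array; these figures are illustrative assumptions, not measurements from the Kali cluster, and the function names `line_offsets` and `cache_lines_touched` are hypothetical.

```python
def line_offsets(direction, n, a=0, b=0):
    """0-based column-major offsets of one line of A through fixed indices
    (a, b): A(:,a,b) for 'i', A(a,:,b) for 'j', A(a,b,:) for 'k'."""
    if direction == "i":
        return [i + n * a + n * n * b for i in range(n)]
    if direction == "j":
        return [a + n * j + n * n * b for j in range(n)]
    if direction == "k":
        return [a + n * b + n * n * k for k in range(n)]
    raise ValueError("direction must be 'i', 'j', or 'k'")


def cache_lines_touched(offsets, elem_bytes=16, line_bytes=64):
    """Number of distinct cache lines holding the given elements."""
    return len({off * elem_bytes // line_bytes for off in offsets})


if __name__ == "__main__":
    n = 64
    for d in "ijk":
        print(d, cache_lines_touched(line_offsets(d, n)))
```

Under these assumptions a 64-element line in the i-direction occupies 16 cache lines, while a line in the j- or k-direction needs 64, one per element, so four times as many cache lines are fetched for the same amount of useful data.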

13
Disadvantages of i-direction split
  • Disrupts element contiguity
  • Need to copy each block into a buffer before
    sending it with MPI
  • Need MPI_Send, MPI_Recv for each process
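The copy step for an i-direction split can be sketched as follows. Assuming p processes, p dividing n, and n/p consecutive i-indices per process (the names `i_slab_runs`, `pack`, and `unpack` are hypothetical), one process's block is scattered over n^2 separate runs of n/p elements in column-major memory, so it has to be packed into a contiguous buffer before MPI_Send and unpacked after MPI_Recv.

```python
def i_slab_runs(n, p, rank):
    """(start, length) runs of a column-major n x n x n array that hold
    rank's slab of i-indices (0-based; assumes p divides n)."""
    width = n // p
    i_lo = rank * width
    return [(i_lo + n * j + n * n * k, width) for k in range(n) for j in range(n)]


def pack(a, runs):
    """Copy the scattered runs of flat array `a` into one contiguous buffer."""
    buf = []
    for start, length in runs:
        buf.extend(a[start:start + length])
    return buf


def unpack(buf, a, runs):
    """Inverse of pack: copy a contiguous buffer back into the runs of `a`."""
    pos = 0
    for start, length in runs:
        a[start:start + length] = buf[pos:pos + length]
        pos += length


if __name__ == "__main__":
    n, p = 8, 4
    a = list(range(n ** 3))
    runs = i_slab_runs(n, p, rank=1)
    buf = pack(a, runs)
    print(len(runs), len(buf), buf[:4])  # 64 128 [2, 3, 10, 11]
    b = [None] * n ** 3
    unpack(buf, b, runs)
    print(pack(b, runs) == buf)  # True
```

In real MPI code, a derived datatype such as MPI_Type_vector or MPI_Type_create_subarray can describe a strided block like this so that MPI gathers the elements itself; the slides do not say whether that alternative was tried.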

14
i-direction Split Results
15
i-direction Split Results
16
Conclusion
  • A parallel 3DFFT algorithm was successfully
    implemented
  • Speedup of up to 12
  • Split in i-direction
  • Nonsequential memory access during computation is
    expensive
