"GPU Programming and Performance" Kevin R. Tubbs - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: "GPU Programming and Performance" Kevin R. Tubbs


1
"GPU Programming and Performance" Kevin R. Tubbs
IGERT Fellow, Donald W. Clayton Graduate Program
in Engineering Science, Louisiana State
University, 3507 CEBA Building, Baton Rouge, LA
70803-6405. PH: (225) 578-4246, FAX: (225)
578-6782, E-mail: ktubbs2_at_lsu.edu
  • IGERT Student Seminar Series

2
Background
  • Computational Power

Source: heise.de, tomshardware.com, wikipedia.org
3
Background
  • CUDA Programming Model

Source: nvidia.com
4
Overview: CUDA Zone for CFD
http://www.nvidia.com/object/cuda_home.html#state=home
5
Applications
  • Elegant Mathematics
    • Iterative Solvers
    • Linear System Solvers
    • CFD and EM Solvers
  • AccelerEyes
    • Jacket GPU Engine
    • Ports MATLAB code to GPU
    • Supports functions friendly to GPU

6
Journal Papers
  • Methods
    • 3D Euler
    • Lattice Boltzmann Method (LBM), 2D and 3D
    • Discontinuous Galerkin (DG)

7
Presentations
  • Smoothed Particle Hydrodynamics
  • 23X, Meshless (Mesh-Free)
  • Lagrangian
  • LES Navier-Stokes Code
  • 17X, Cartesian
  • Red-Black SOR Poisson Solver
  • 3D Incompressible Navier-Stokes Code
  • 100X on 4 GPUs
  • Cartesian
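
The Red-Black SOR Poisson solver listed above suits GPUs because every "red" grid point depends only on "black" neighbours and vice versa, so each colour can be updated in parallel. The following is a minimal serial sketch of that ordering, not the code from the cited presentation. The test problem, the relaxation factor `omega=1.7`, and the function name `red_black_sor` are illustrative assumptions.

```python
import math

def red_black_sor(f, h, omega=1.7, tol=1e-8, max_sweeps=1000):
    """Solve -laplace(u) = f on a square grid with u = 0 on the boundary.

    f is an n x n list of lists, h is the grid spacing.
    Returns (u, number_of_sweeps_used).
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        # Colour 0 ("red") then colour 1 ("black"): points of one colour
        # only read points of the other colour, so on a GPU each half-sweep
        # could be done by independent threads.
        for colour in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != colour:
                        continue
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1]
                                           + h * h * f[i][j])
                    change = omega * (gauss_seidel - u[i][j])
                    u[i][j] += change
                    max_change = max(max_change, abs(change))
        if max_change < tol:
            return u, sweep
    return u, max_sweeps

# Assumed test problem: exact solution u = sin(pi x) sin(pi y) on the unit square.
n = 17
h = 1.0 / (n - 1)
f = [[2 * math.pi ** 2 * math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
      for j in range(n)] for i in range(n)]
u, sweeps = red_black_sor(f, h)
print(f"centre value {u[n // 2][n // 2]:.4f} after {sweeps} sweeps")
```

The centre value should land close to 1, up to the second-order discretisation error of the 5-point stencil.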

8
Applications: AccelerEyes
  • AccelerEyes
    • Jacket GPU Engine
    • Maximum Reported Speedup Using Jacket: 50X
    • LBM Speedup: 10X
  • Graphics Toolbox
    • OpenGL
    • Basic Plot open source functions

9
Applications: Elegant Mathematics
  • EMGPUIter
    • Avg. Performance, 100K unknowns
    • 7 GFlop/s on a Single NVIDIA 260 (100 MFlop/s on a QC Xeon
      2.6 GHz)
    • Block Versions: 70 GFlop/s (3 GFlop/s)
  • EMGPUHMatrix
    • Linear System, 163,840 unknowns
    • Generated and Solved w/ 25X Speedup (QC Xeon)
  • EMGPUPartBoltzmann
    • Solves Boltzmann Eq. via Particle Method
    • Collision: 30X (QC Xeon)
    • Stream: 120X (QC Xeon)
    • Problem Size
      • 15 M particles on 1 GPU
      • 1 B particles on GPU + CPU w/ 64 GB of RAM Using
        Out-of-Core

10
Method Performance: LBM
  • LBM
    • Good Locality in Memory Access
  • 2D LBM for N-S
    • D2Q9
    • 10X
  • 3D LBM
    • D3Q15
    • 28X; 100X (4 GPUs)
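
To make the "good locality" point concrete, here is a minimal D2Q9 sketch in plain Python: each cell relaxes using only its own nine populations, then pushes them to its immediate neighbours. It assumes the common single-relaxation-time (BGK) collision and periodic boundaries, neither of which the slides specify. The names `collide_and_stream` and `total_mass` are illustrative.

```python
# D2Q9 lattice: 9 discrete velocities and their standard weights.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4

def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for one cell."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq

def collide_and_stream(f, tau):
    """One BGK collision plus periodic streaming; f is indexed f[y][x][k]."""
    ny, nx = len(f), len(f[0])
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            cell = f[y][x]
            # Collision is purely local: only this cell's data is read.
            rho = sum(cell)
            ux = sum(fk * c[0] for fk, c in zip(cell, VELOCITIES)) / rho
            uy = sum(fk * c[1] for fk, c in zip(cell, VELOCITIES)) / rho
            feq = equilibrium(rho, ux, uy)
            for k, (cx, cy) in enumerate(VELOCITIES):
                post = cell[k] - (cell[k] - feq[k]) / tau
                # Streaming touches nearest neighbours only.
                new[(y + cy) % ny][(x + cx) % nx][k] = post
    return new

def total_mass(f):
    return sum(sum(sum(cell) for cell in row) for row in f)

# Assumed demo: fluid at rest with one slightly denser, moving cell.
grid = [[equilibrium(1.0, 0.0, 0.0) for _ in range(8)] for _ in range(8)]
grid[3][4] = equilibrium(1.1, 0.05, 0.0)
mass_before = total_mass(grid)
for _ in range(20):
    grid = collide_and_stream(grid, tau=0.8)
mass_after = total_mass(grid)
```

Because collision conserves the cell density and streaming only moves populations between cells, `mass_after` matches `mass_before` up to rounding.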

11
Method Performance: 2D LBM
  • 2D Shallow Water and Mass Transport
  • 10X
  • Same Code Cast to GPU
  • No Optimization

12
Method Performance: DG
  • Maxwell's Eq.
    • DG, 3D Unstructured Mesh
  • DG
    • Good Locality in Memory Access (element local)
    • High-order methods require fewer data points

A. Klöckner, T. Warburton, J. Bridge, J. S.
Hesthaven (2009), Nodal Discontinuous Galerkin
Methods on Graphics Processors.
http://www.citebase.org/abstract?id=oai:arXiv.org:0901.1024
13
Method Performance: DG
  • Maxwell's Eq.
    • DG, 3D Unstructured Mesh
    • Single $400 GTX 280 GPU
    • 40X to 60X relative to serial
    • 200 GFlop/s

Source: Klöckner et al. (2009), arXiv:0901.1024.
14
Method Performance: 3D N-S Solver
  • Lid-Driven Cavity
    • Re = 1000; Mesh 32³, 256³
    • 2D flow physics, 3D computations
  • Numerical Method
    • Incompressible flow, Cartesian Method
    • Projection Method, Press/Vel on a Staggered Grid
    • 1st order in time, 2nd order diff.; Poisson solved with Jacobi
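
As a rough illustration of why Jacobi iteration is a natural fit for the pressure Poisson step on a GPU, the sketch below updates every interior point from the previous iterate only, so all points could be computed independently. It is not the solver from the cited paper: it is 2D, uses zero Dirichlet boundaries rather than the boundary conditions a projection method would impose, and the name `jacobi_poisson` and the test problem are assumptions.

```python
import math

def jacobi_poisson(rhs, h, tol=1e-8, max_iters=20000):
    """Solve laplace(p) = rhs with p = 0 on the boundary by Jacobi iteration.

    rhs is an n x n list of lists, h is the grid spacing.
    Returns (p, number_of_iterations_used).
    """
    n = len(rhs)
    p = [[0.0] * n for _ in range(n)]
    for it in range(1, max_iters + 1):
        new = [row[:] for row in p]  # second buffer: reads never see new values
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (p[i - 1][j] + p[i + 1][j]
                                    + p[i][j - 1] + p[i][j + 1]
                                    - h * h * rhs[i][j])
                max_change = max(max_change, abs(new[i][j] - p[i][j]))
        p = new
        if max_change < tol:
            return p, it
    return p, max_iters

# Assumed test problem: exact solution p = sin(pi x) sin(pi y) on the unit square.
n = 17
h = 1.0 / (n - 1)
rhs = [[-2 * math.pi ** 2 * math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
        for j in range(n)] for i in range(n)]
p, iters = jacobi_poisson(rhs, h)
print(f"centre value {p[n // 2][n // 2]:.4f} after {iters} iterations")
```

Jacobi needs many more iterations than SOR on the same grid, but each iteration has no ordering constraints, which is the trade-off that makes it attractive for massively parallel hardware.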

Julien C. Thibault and Inanc Senocak (2009),
CUDA Implementation of a Navier-Stokes Solver on
Multi-GPU Desktop Platforms for Incompressible
Flows, 47th AIAA Aerospace Sciences Meeting,
Orlando, FL, paper no. AIAA-2009-758.
15
Method Performance: 3D N-S Solver
  • Workstation GPU Speedup
    • Intel Core 2 Duo 3.0 GHz (Max 21X)
    • AMD Opteron 2.4 GHz (Max 100X)

Source: Thibault and Senocak (2009), AIAA-2009-758.
16
Method Performance: 3D N-S Solver
  • Server GPU Speedup
  • Scales at 75% Efficiency with the Number of GPUs
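
For readers unfamiliar with the term, the efficiency figure above can be read as the multi-GPU speedup divided by the number of GPUs. The short sketch below uses that standard definition; whether the cited paper measures efficiency relative to a single GPU is an assumption here, and the function names are illustrative.

```python
def parallel_efficiency(speedup_vs_one_gpu, n_gpus):
    """Efficiency = achieved speedup / ideal (linear) speedup."""
    return speedup_vs_one_gpu / n_gpus

def expected_speedup(efficiency, n_gpus):
    """Speedup over one GPU implied by a given efficiency."""
    return efficiency * n_gpus

# At 75% efficiency, 4 GPUs would run about 3 times faster than 1 GPU.
four_gpu_speedup = expected_speedup(0.75, 4)
print(four_gpu_speedup)
```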

Source: Thibault and Senocak (2009), AIAA-2009-758.
17
IGERT Hardware Options
  • Tesla
    • $1300
    • 240 Streaming Processor Cores
    • Frequency of processor cores: 1.3 GHz
    • Single Precision floating point performance
      (peak): 933 GFlop/s
    • Double Precision floating point performance
      (peak): 78 GFlop/s
    • Dedicated Memory: 4 GB GDDR3
  • GTX 280
    • $350
    • 240 Processor Cores
    • Processor Clock: 1.3 GHz
    • Dedicated Memory: 1 GB GDDR3
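
The peak figures of 933 and 78 quoted for the Tesla card can be reproduced with a back-of-the-envelope calculation: units times clock times floating point operations per cycle. The inputs below are assumptions drawn from NVIDIA's published specifications for this GPU generation rather than from the slides: a 1.296 GHz core clock (rounded to 1.3 GHz above), 3 single-precision operations per core per cycle (a multiply-add plus a multiply), and 30 double-precision units each doing one fused multiply-add (2 operations) per cycle.

```python
def peak_gflops(units, clock_ghz, flops_per_cycle):
    """Theoretical peak = execution units x clock (GHz) x flops per cycle."""
    return units * clock_ghz * flops_per_cycle

# Assumed inputs (see lead-in); not stated on the slide.
single_precision = peak_gflops(units=240, clock_ghz=1.296, flops_per_cycle=3)
double_precision = peak_gflops(units=30, clock_ghz=1.296, flops_per_cycle=2)

print(round(single_precision), round(double_precision))
```

These are theoretical ceilings. The sustained rates reported on the earlier slides (for example 200 GFlop/s for DG) are well below them, as expected for real codes.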

18
Questions
  • Thank You