Evaluating GPU Passthrough in Xen for High Performance Cloud Computing - PowerPoint PPT Presentation


1
Evaluating GPU Passthrough in Xen for High
Performance Cloud Computing
  • Andrew J. Younge1, John Paul Walters2, Stephen
    P. Crago2, and Geoffrey C. Fox1
  • 1 Indiana University
  • 2 USC / Information Sciences Institute

2
Where are we in the Cloud?
  • Cloud computing spans many areas of expertise
  • Today, we focus only on IaaS and the underlying
    hardware
  • Things we do here affect the entire pyramid!

3
Motivation
  • Need for GPUs on Clouds
  • GPUs are becoming commonplace in scientific
    computing
  • Great performance-per-watt
  • Two competing methods for virtualizing GPUs:
  • Remote API for CUDA calls
  • Direct GPU usage within a VM
  • Advantages and disadvantages to both solutions

4
Front-end GPU API
  • Translate all CUDA calls into remote method
    invocations
  • Users share GPUs across a node or cluster
  • Can run within a VM, as no hardware is needed,
    only a remote API
  • Many implementations for CUDA:
  • rCUDA, gVirtus, vCUDA, GViM, etc.
  • Many desktop virtualization technologies do the
    same for OpenGL and DirectX

5
Front-end GPU API
6
Front-end API Limitations
  • Can use remote GPUs, but all data goes over the
    network
  • Can be very inefficient for applications with
    non-trivial memory movement
  • Usually doesn't support CUDA extensions in C
  • Have to separate CPU and GPU code
  • Requires a special decoupling mechanism
  • Not a drop-in solution for existing
    applications

7
Direct GPU Passthrough
  • Allows VMs to directly access GPU hardware
  • Enables CUDA and OpenCL code
  • Uses PCI passthrough of the device to the guest VM
  • Uses hardware-assisted directed I/O virtualization
    (Intel VT-d or AMD IOMMU)
  • Provides direct isolation and security of the device
  • Removes host overhead entirely
  • Similar to what Amazon EC2 uses
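As a rough illustration (not taken from the slides), assigning a GPU to a Xen HVM guest with the `xl` toolstack typically looks like the sketch below; the PCI address `0000:05:00.0` and the guest config path are placeholders for your own system.

```shell
# Make the GPU assignable to guests (detaches it from the host driver).
# The BDF address 0000:05:00.0 is a placeholder -- find yours with lspci.
xl pci-assignable-add 0000:05:00.0

# In the guest's HVM config file, hand the device to the VM:
#   pci = [ '05:00.0' ]

# Start the guest; the GPU then appears as an ordinary PCI device
# inside the VM, usable by the stock NVIDIA driver and CUDA runtime.
xl create /etc/xen/gpu-guest.cfg
```

This requires VT-d/IOMMU enabled in firmware and in the hypervisor; without it, `xl pci-assignable-add` will refuse the device.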

8
Direct GPU Passthrough
9
Hardware Setup

                   Sandy Bridge / Kepler     Westmere / Fermi
  CPU (cores)      2x E5-2670 (16)           2x X5660 (12)
  Clock Speed      2.6 GHz                   2.6 GHz
  RAM              48 GB                     192 GB
  NUMA Nodes       2                         2
  GPU              1x Nvidia Tesla K20m      2x Nvidia Tesla C2075

  Type             Linux Kernel              Linux Distro
  Native Host      2.6.32-279                CentOS 6.4
  Xen Dom0         3.4.53-8 (Xen 4.2.2)      CentOS 6.4
  DomU Guest VM    2.6.32-279                CentOS 6.4
10
SHOC Benchmark Suite
  • Developed by the Future Technologies Group
    @ Oak Ridge National Laboratory
  • Provides 70 benchmarks
  • Synthetic micro-benchmarks
  • 3rd-party applications
  • OpenCL and CUDA implementations
  • Represents a well-rounded view of GPU performance
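For context, a SHOC run is normally driven from the suite's tools directory after a CUDA-enabled build; the sketch below follows SHOC's standard driver script (size class 4 is the largest standard problem size), though exact paths and flags depend on the SHOC version installed.

```shell
# Build SHOC with CUDA support, then run the full benchmark suite
# at problem-size class 4 (largest standard size); results are
# collected per-benchmark by the driver script.
./configure
make
cd tools
perl driver.pl -cuda -s 4
```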

11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
Initial Thoughts
  • Raw GPU computational abilities impacted less
    than 1% in VMs compared to the base system
  • Excellent sign for supporting GPUs in the Cloud
  • However, overhead occurs during large transfers
    between CPU and GPU
  • Much higher overhead for the Westmere/Fermi test
    architecture
  • Around 15% overhead in the worst-case benchmark
  • Sandy Bridge/Kepler overhead is lower
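The relative overhead quoted above is just (native − VM) / native; a minimal sketch with illustrative bandwidth numbers (not the measured values from the slides):

```shell
# Hypothetical host-to-device bandwidths in GB/s (illustrative only,
# not the paper's measurements).
native=6.1
vm=5.2

# Relative overhead of the VM versus the native host, as a percentage:
# (native - vm) / native * 100, rounded to one decimal place.
overhead=$(awk -v n="$native" -v v="$vm" \
    'BEGIN { printf "%.1f", (n - v) / n * 100 }')
echo "PCIe transfer overhead: ${overhead}%"   # -> 14.8%
```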

16
(No Transcript)
17
(No Transcript)
18
Discussion
  • GPU passthrough is possible in Xen!
  • Results show high-performance GPU computation is
    a reality with Xen
  • Overhead is minimal for GPU computation
  • Sandy Bridge/Kepler has < 1.2% overall overhead
  • Westmere/Fermi has < 1% computational overhead,
    7-25% PCIe overhead
  • PCIe overhead is not likely due to VT-d mechanisms,
    but to the NUMA configuration of the Westmere CPU
    architecture
  • GPU PCI passthrough performs better than
    front-end remote API solutions

19
Future Work
  • Support PCI passthrough in a Cloud IaaS framework:
    OpenStack Nova
  • Works for both GPUs and other PCI devices
  • Show performance better than EC2
  • Resolve NUMA issues with the Westmere architecture
    and Fermi GPUs
  • Evaluate GPU possibilities in other hypervisors
  • Support large-scale distributed CPU+GPU
    computation in the Cloud
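For the OpenStack direction, PCI passthrough in Nova (of this era) is driven by whitelist and alias options in nova.conf; a hedged fragment is sketched below. Vendor ID `10de` is NVIDIA's, but the `product_id` and alias name here are placeholders, not values from the slides.

```shell
# nova.conf fragment (sketch; the product_id is a placeholder --
# take the real one from `lspci -nn` on the compute host)
[DEFAULT]
pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "1028"}
pci_alias = {"name": "teslaK20", "vendor_id": "10de", "product_id": "1028"}
```

A flavor referencing the alias then lets the scheduler place GPU-requesting instances only on hosts that expose a matching assignable device.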

20
Conclusion
  • GPUs are here to stay in scientific computing
  • Many Petascale systems use GPUs
  • Expected GPU Exascale machine (2020-ish)
  • Providing HPC in the Cloud is key to the
    viability of scientific cloud computing.
  • OpenStack provides an ideal architecture to
    enable HPC in clouds.

21
Thanks!
  • Acknowledgements
  • NSF FutureGrid project
  • GPU cluster hardware
  • FutureGrid team @ IU
  • USC/ISI APEX research group
  • Persistent Systems Graduate Fellowship
  • Xen open source community
  • About Me
  • Andrew J. Younge
  • Ph.D. Candidate
  • Indiana University
  • Bloomington, IN USA
  • Email: ajyounge@indiana.edu
  • Website: http://ajyounge.com
  • http://portal.futuregrid.org

22
Extra Slides
23
FutureGrid: A Distributed Testbed
24
(No Transcript)
25
OpenStack GPU Cloud Prototype
26
(No Transcript)
27
(No Transcript)
28
Overhead in Bandwidth