GPU????lattice??

Slides: 15
Provided by: yusu8
Tags: gpu | hiroshima | lattice

Transcript and Presenter's Notes

1
GPU????lattice??
  • ???? ????? ?? ??
  • ?? ??

2
Outline
  • 1. Graphics Processing Unit (GPU)
  • 2. Nvidia CUDA programming model
  • 3. GPU????
  • 4. QCD with CUDA
  • 5. Results
  • 6. Summary

3
Graphics Processing Unit
  • ?????????PC???
  • ??????
  • ????????
  • 100-200 processors
  • ???????

GPU????? Graphic Card
  • ?? 510??
  • ?? ??GFLOPS (???)

??????
O(a)????Wilson-Dirac quark? solver?CUDA???????? GPU????????????????
????
Gyözö I. Egri, hep-lat/0611022, "Lattice QCD as a video game"
? ???
????????
4
Mixed Precision Solver
???solver??????? ???????? ?????? ?????(Wilson-Dirac) Dx = b ??????? ???? Dx = b
?????? ?????????????????? GPU???????????
(300-900GFlops) ??????????GPU?????????????????!
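One common way to combine a fast single-precision GPU solve with a double-precision result is defect correction (iterative refinement). The sketch below is an illustration under stated assumptions, not the presenters' code: the operator is a toy tridiagonal matrix standing in for the Wilson-Dirac operator D, the inner solver is plain Jacobi rather than Bi-CGStab, and the names `jacobi_sweep`, `residual` and `mixed_precision_solve` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <utility>
#include <vector>

// Toy operator standing in for D (an assumption for illustration):
// (A x)_i = 4 x_i - x_{i-1} - x_{i+1}, with zero boundary values.

// One single-precision Jacobi sweep for A e = r, one thread per element.
__global__ void jacobi_sweep(const float *r, const float *e_old, float *e_new, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float left  = (i > 0)     ? e_old[i - 1] : 0.0f;
    float right = (i < n - 1) ? e_old[i + 1] : 0.0f;
    e_new[i] = (r[i] + left + right) * 0.25f;
}

// Double-precision residual r = b - A x on the host; returns its max norm.
double residual(const std::vector<double> &b, const std::vector<double> &x,
                std::vector<double> &r)
{
    int n = static_cast<int>(b.size());
    double norm = 0.0;
    for (int i = 0; i < n; ++i) {
        double left  = (i > 0)     ? x[i - 1] : 0.0;
        double right = (i < n - 1) ? x[i + 1] : 0.0;
        r[i] = b[i] - (4.0 * x[i] - left - right);
        norm = std::fmax(norm, std::fabs(r[i]));
    }
    return norm;
}

// Outer loop in double precision on the CPU, inner solve in single precision
// on the GPU. Returns the final residual norm, or -1.0 on a CUDA error.
double mixed_precision_solve(const std::vector<double> &b, std::vector<double> &x,
                             double tol = 1e-12, int max_outer = 20,
                             int inner_sweeps = 40)
{
    int n = static_cast<int>(b.size());
    x.assign(n, 0.0);
    if (n == 0) return 0.0;
    std::vector<double> r(n);
    std::vector<float> rf(n), ef(n);
    size_t bytes = n * sizeof(float);
    float *d_r = nullptr, *d_e0 = nullptr, *d_e1 = nullptr;
    bool ok = cudaMalloc(&d_r, bytes) == cudaSuccess
           && cudaMalloc(&d_e0, bytes) == cudaSuccess
           && cudaMalloc(&d_e1, bytes) == cudaSuccess;
    int threads = 128, blocks = (n + threads - 1) / threads;
    double norm = residual(b, x, r);
    for (int outer = 0; ok && outer < max_outer && norm > tol; ++outer) {
        for (int i = 0; i < n; ++i) rf[i] = static_cast<float>(r[i]);
        ok = cudaMemcpy(d_r, rf.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
          && cudaMemset(d_e0, 0, bytes) == cudaSuccess;
        for (int s = 0; ok && s < inner_sweeps; ++s) {
            jacobi_sweep<<<blocks, threads>>>(d_r, d_e0, d_e1, n);
            std::swap(d_e0, d_e1);              // newest correction is now in d_e0
        }
        ok = ok && cudaMemcpy(ef.data(), d_e0, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
        if (!ok) break;
        for (int i = 0; i < n; ++i) x[i] += ef[i];   // correction added in double
        norm = residual(b, x, r);                    // true residual in double
    }
    cudaFree(d_r); cudaFree(d_e0); cudaFree(d_e1);
    return ok ? norm : -1.0;
}
```

Calling `mixed_precision_solve(b, x)` with a right-hand side `b` fills `x`. Each outer pass can only gain roughly single-precision accuracy, so several passes are needed, but the residual that decides convergence is always computed in double precision.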

5
CUDA Programming Model
Sum of two vectors of length L (L = N x M): c = a + b
// GPU code
__global__ void vadd_kernel(float *a, float *b, float *c)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    c[idx] = a[idx] + b[idx];
}
// host code
int main()
{
    // allocate GPU memory
    cudaMalloc((void**)&a, ...); ...
    // compute c = a + b on the GPU
    // threads per block = N, number of blocks = M
    vadd_kernel<<<M, N>>>(a, b, c);
}


[Figure: a grid is made of block 1, block 2, ..., block M; each block is made of thread 1, thread 2, ..., thread N.]
  • thread: the smallest unit of execution (max 512/block)
  • thread block: a group of threads that runs on one multiprocessor (max 65535)
  • grid: a set of thread blocks
  • ?????????????
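The vector sum from this slide can be written out as a complete sketch. The host helper `vadd_on_gpu`, the bounds check on `idx`, and the default of 128 threads per block are assumptions added for illustration; the slide itself only shows the kernel and the launch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread adds one element. The guard skips threads past the end
// when L is not a multiple of the block size.
__global__ void vadd_kernel(const float *a, const float *b, float *c, int L)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < L) c[idx] = a[idx] + b[idx];
}

// Hypothetical host helper: fills c = a + b and returns true on success.
bool vadd_on_gpu(const std::vector<float> &a, const std::vector<float> &b,
                 std::vector<float> &c, int threads_per_block = 128)
{
    if (a.size() != b.size()) return false;
    int L = static_cast<int>(a.size());
    c.assign(L, 0.0f);
    if (L == 0) return true;
    size_t bytes = L * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMalloc(&d_c, bytes) == cudaSuccess
           && cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // M blocks of N threads, with M rounded up so that M * N >= L.
        int blocks = (L + threads_per_block - 1) / threads_per_block;
        vadd_kernel<<<blocks, threads_per_block>>>(d_a, d_b, d_c, L);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return ok;
}
```

With `L = N x M` exactly, as on the slide, the guard is never taken; it only matters for lengths that do not divide evenly.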

6
???????
From the Nvidia CUDA Programming Guide
  • ????????
  • ? 1thread ? 1 ??????
  • ???????????????
  • ? GPU?????????????????

7
Memories on GPU
  • Shared Memory
  • fast access (4 clock cycles)
  • read-write access
  • shared by the threads of the same block
  • 16KB/block
  • global Memory
  • located in device memory
  • slow access (400-600 clock cycles)
  • read-write access
  • shared by all threads

Shared Memory ?????
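A small sketch of how shared memory is typically used: each thread reads global memory once, and all further traffic stays in the per-block shared buffer. The kernel `block_sum_kernel`, the helper `sum_on_gpu` and the block size `kThreads` are hypothetical names chosen for this example, and a block sum is used only because it is short, not because the slides use it.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 128;  // threads per block (assumed; must be a power of two)

// Each block stages kThreads inputs in shared memory, sums them there,
// and writes a single partial sum back to global memory.
__global__ void block_sum_kernel(const float *in, float *partial, int n)
{
    __shared__ float buf[kThreads];           // 512 Byte, well below 16KB/block
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    buf[tid] = (idx < n) ? in[idx] : 0.0f;    // the only global read per thread
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();                      // every thread reaches this barrier
    }
    if (tid == 0) partial[blockIdx.x] = buf[0];
}

// Hypothetical host helper: sums `in` on the GPU, returns true on success.
bool sum_on_gpu(const std::vector<float> &in, float &out)
{
    out = 0.0f;
    int n = static_cast<int>(in.size());
    if (n == 0) return true;
    int blocks = (n + kThreads - 1) / kThreads;
    std::vector<float> partial(blocks, 0.0f);

    float *d_in = nullptr, *d_partial = nullptr;
    bool ok = cudaMalloc(&d_in, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_partial, blocks * sizeof(float)) == cudaSuccess
           && cudaMemcpy(d_in, in.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        block_sum_kernel<<<blocks, kThreads>>>(d_in, d_partial, n);
        ok = cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in); cudaFree(d_partial);
    for (float p : partial) out += p;         // final sum over blocks on the host
    return ok;
}
```

The buffer is visible only to threads of the same block, which is why the per-block results still have to be combined outside the kernel.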
8
CUDA with QCD, programming strategy
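  • Data needed to compute one lattice site
  • fermion: 8 (+1) fields
  • 3 x 4 x 2 x 4 Byte = 96 Byte
  • gauge link: 2 sets
  • 3 x (3-1) x 2 x 4 Byte x 4 = 48 Byte x 4
  • SU(3) reconstruction method.
  • clover term: 1
  • 21 x 2 x 2 x 4 Byte = 336 Byte

One CUDA block handles 4^3 x 2 lattice sites, so the number of threads is 128.
Data loaded: 1584 Byte; computation: 1896 Flop.
Byte/Flop = 0.83. With the G80 bandwidth of 80GB/s, the expected performance is about 100 GFlops!!
fermion fields are placed in shared memory: 4 x 4 x 4 x 2 x 96 Byte = 12.3KB (max 16KB/block)
gauge link and clover term are accessed from device memory.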
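The totals on this slide can be checked by adding up the per-site data. The split into 8 + 1 fermion fields and 2 x 4 gauge links is an assumption; it is the decomposition that reproduces the 1584 Byte quoted on the slide.

```latex
\begin{align*}
\text{fermion}     &: (8+1) \times 96\ \text{Byte} = 864\ \text{Byte} \\
\text{gauge link}  &: 2 \times 4 \times 48\ \text{Byte} = 384\ \text{Byte} \\
\text{clover term} &: 336\ \text{Byte} \\
\text{total}       &: 864 + 384 + 336 = 1584\ \text{Byte} \\
\text{Byte/Flop}   &= \frac{1584}{1896} \approx 0.835 \\
\text{bandwidth-limited rate} &\approx \frac{80\ \text{GB/s}}{0.835\ \text{Byte/Flop}} \approx 96\ \text{GFlops}
\end{align*}
```

The result, about 96 GFlops, is the figure the slide rounds to 100 GFlops; it is an upper bound set by memory bandwidth alone.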
9
???????solver???
  • ?????
  • solver

GPU: NVIDIA GeForce 8800 GTX, 354.6GFLOPS
CPU: Intel Core 2 @ 2.66GHz, 21.3GFLOPS
  • O(a)-improved Wilson-Dirac quark solver
  • Bi-CGStab method
  • ?????
  • ??????GPU???
  • even-odd preconditioning

10
Results Calculation Time
  • lattice size: 16^3 x 32
  • quench
  • 0.15fm
  • quark mass (MeV): 23, 52, 81

[Figure: calculation time for quark masses 23MeV, 52MeV and 81MeV, with and without the GPU; bars labeled 10^-6, 10^-12 and 10^-15.]
Using the GPU, the calculation time is 1/7.
11
Performance (Volume)
  • quark mass: 23MeV
  • lattice size:
  • 4^3 x 8
  • 8^3 x 16
  • 16^3 x 32

GPU
CPU
?????
???? 17GFLOPS
???????????? ? coalesced access
12
Coalesced Access
[Figure: memory layout for coalesced access, with words of 4, 8, or 16 Byte. One layout is ordered 0 0 0 1 1 1 2 2 2 ...; the other is ordered 0 1 2 ... 0 1 2 ..., so that thread 0, thread 1, thread 2, ... read consecutive words.]
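The two layouts in the figure can be sketched as index functions plus a kernel for each. This is an illustration only: the names `site_major_index`, `comp_major_index`, `scale_site_major` and `scale_comp_major` are hypothetical, and the kernels simply scale every word by `f` so that the memory access pattern is the only thing that differs.

```cuda
#include <cuda_runtime.h>

// Site-major layout (0 0 0 1 1 1 2 2 2 ...):
// all components of one site are adjacent in memory.
__host__ __device__ inline int site_major_index(int site, int comp, int ncomp)
{
    return site * ncomp + comp;
}

// Component-major layout (0 1 2 ... 0 1 2 ...):
// the same component of consecutive sites is adjacent in memory.
__host__ __device__ inline int comp_major_index(int site, int comp, int nsites)
{
    return comp * nsites + site;
}

// One thread per site. For a fixed comp, neighbouring threads touch words
// that are ncomp apart, so their reads are strided (not coalesced).
__global__ void scale_site_major(const float *in, float *out,
                                 int nsites, int ncomp, float f)
{
    int site = blockIdx.x * blockDim.x + threadIdx.x;
    if (site >= nsites) return;
    for (int comp = 0; comp < ncomp; ++comp) {
        int i = site_major_index(site, comp, ncomp);
        out[i] = f * in[i];
    }
}

// One thread per site. For a fixed comp, neighbouring threads touch
// neighbouring words, which is the pattern the hardware can coalesce.
__global__ void scale_comp_major(const float *in, float *out,
                                 int nsites, int ncomp, float f)
{
    int site = blockIdx.x * blockDim.x + threadIdx.x;
    if (site >= nsites) return;
    for (int comp = 0; comp < ncomp; ++comp) {
        int i = comp_major_index(site, comp, nsites);
        out[i] = f * in[i];
    }
}
```

Both kernels do the same arithmetic on the same number of words; only the order of the data in memory changes, which is why a layout change alone can raise the measured GFLOPS.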
13
?????
Nvidia GeForce GTX 280 Core 2 Duo 3.0GHz (6MB)
  • non coalesced access
  • on shared memory

20GFLOPS
????solver
  • coalesced access
  • on texture cache

40-50GFLOPS
hopping term: 89GFLOPS, clover term: 100GFLOPS
???solver
GPU solver
22
10?
220?
14
Summary
  • GPU???????????????? ? ??QCD??
  • GPU??????????
  • ??????????GPU solver ?????? ? ??????
  • ????solver?O(a)??????clover?????????
  • With the GeForce 8800 GTX:
  • solver performance: 17GFLOPS
  • calculation time: 1/7 of the Core 2 Duo 2.66GHz CPU
  • With the GeForce GTX 280:
  • coalesced access gives 40-50GFLOPS
  • 22 times faster than the Core 2 Duo 3.0GHz
  • ???????coalesced access ????
  • ???GPU??????