Computed Tomography Reconstruction

Provided by: coursesEc
Transcript and Presenter's Notes

1
Computed Tomography Reconstruction
2
Background
  • FDK Algorithm on planar detector array
  • Double precision not used
  • 512x512x370 volume
  • 750 projection slices, 366x304 each

3
Key Optimizations
  • Sine/cosine in constant memory
  • -use_fast_math
  • Computation Reduction
  • Block size
  • Texture memory
  • Storage
  • Interpolation

4
-use_fast_math
  • Reduction in float division time
  • __fdividef(x,y)
  • Problematic if
      • 2^126 < y < 2^128
      • x = ∞
  • Output is identical
  • Uses more registers
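
The range caveat on this slide can be checked on the host before trusting a fast-math build. The following is a minimal sketch in plain C++, assuming single-precision denominators and assuming the range applies to the magnitude of y. The helper name `in_fdividef_risky_range` is hypothetical and is not part of CUDA; it only flags values in the interval the slide lists.

```cpp
#include <cmath>

// Hypothetical helper (not a CUDA API): returns true when a float denominator
// lies in the interval 2^126 < |y| < 2^128 flagged on the slide.
bool in_fdividef_risky_range(float y) {
    const float lower = std::ldexp(1.0f, 126);  // 2^126, exactly representable
    const float mag = std::fabs(y);
    // 2^128 is above the largest finite float, so any finite magnitude
    // greater than 2^126 falls inside the interval.
    return mag > lower && std::isfinite(mag);
}
```

A reconstruction whose denominators are bounded far below 2^126 never triggers this check, which is consistent with the slide's observation that the output is identical.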

5
(No Transcript)
6
Computation Reduction
  • Value of a does not vary in z
  • Compute multiple output voxels in each thread to
    reuse a
  • Tradeoff: additional register usage for fewer
    overall computations

7
Computation Reduction cont.
Original loop order:
  for x
    for y
      for z
        for p
          calculate a
          calculate b
          sum += proj(a,b)
        end
        voxel(x,y,z) = sum
      end
    end
  end

Reordered to reuse a across z:
  for x
    for y
      for p
        calculate a
        for z
          calculate b
          sum(z) += proj(a,b)
        end
      end
      for z
        voxel(x,y,z) = sum(z)
      end
    end
  end
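
The two loop orders can be compared on the CPU. This is a sketch under stated assumptions: `calc_a`, `calc_b`, and `proj` are hypothetical stand-ins for the real geometry and projection lookup, not the presenters' code. A counter records how often a is evaluated in each version.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Result {
    std::vector<float> voxels;  // nx * ny * nz values, z varies fastest
    std::size_t a_evaluations;  // number of times "a" was computed
};

// Hypothetical stand-ins; the real a and b come from the scanner geometry.
float calc_a(int x, int y, int p) { return 0.125f * static_cast<float>(x + 2 * y + 3 * p); }
float calc_b(int z, int p) { return 0.25f * static_cast<float>(z + p); }
float proj(float a, float b) { return std::sin(a + b); }

// Original order: a is recomputed for every (x, y, z, p).
Result backproject_baseline(int nx, int ny, int nz, int np) {
    Result r{std::vector<float>(static_cast<std::size_t>(nx) * ny * nz), 0};
    for (int x = 0; x < nx; ++x)
        for (int y = 0; y < ny; ++y)
            for (int z = 0; z < nz; ++z) {
                float sum = 0.0f;
                for (int p = 0; p < np; ++p) {
                    const float a = calc_a(x, y, p);
                    ++r.a_evaluations;
                    sum += proj(a, calc_b(z, p));
                }
                r.voxels[(static_cast<std::size_t>(x) * ny + y) * nz + z] = sum;
            }
    return r;
}

// Reordered: a is computed once per (x, y, p) and reused for all z.
// The per-z accumulators play the role of the extra registers per thread.
Result backproject_reuse(int nx, int ny, int nz, int np) {
    Result r{std::vector<float>(static_cast<std::size_t>(nx) * ny * nz), 0};
    std::vector<float> sum(nz);
    for (int x = 0; x < nx; ++x)
        for (int y = 0; y < ny; ++y) {
            std::fill(sum.begin(), sum.end(), 0.0f);
            for (int p = 0; p < np; ++p) {
                const float a = calc_a(x, y, p);
                ++r.a_evaluations;
                for (int z = 0; z < nz; ++z)
                    sum[z] += proj(a, calc_b(z, p));
            }
            for (int z = 0; z < nz; ++z)
                r.voxels[(static_cast<std::size_t>(x) * ny + y) * nz + z] = sum[z];
        }
    return r;
}
```

Both versions accumulate each voxel over p in the same order, so they produce the same voxel values; the reordered version evaluates a a factor of nz fewer times at the cost of nz accumulators.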

8
Slices per Thread
9
Block size
10
Texture Memory
  • Filtered projections stored in 2D texture memory
  • Lack of 3D textures forced tiling in host
  • Fetching from texture memory is faster than
    global memory due to caching
  • Takes advantage of 2D locality because threads in
    a block access projection data in close proximity

11
Texture Memory
  • Interpolation
  • Linear filtering
  • Fewer registers
  • Less computation
  • Fewer memory accesses
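
For comparison, manual bilinear interpolation looks roughly like the sketch below. It is a CPU illustration, assuming a row-major image, clamped borders, and integer coordinates that land on pixel centers; the function name `bilinear` is illustrative only. It makes the four reads and the weight arithmetic explicit, which is the work that hardware linear filtering folds into one texture fetch.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Manual bilinear interpolation on a row-major image with clamped borders.
float bilinear(const std::vector<float>& img, int width, int height, float x, float y) {
    const float fx0 = std::floor(x);
    const float fy0 = std::floor(y);
    const float wx = x - fx0;  // fractional weight along x, in [0, 1)
    const float wy = y - fy0;  // fractional weight along y, in [0, 1)

    auto clampi = [](int v, int hi) { return std::min(std::max(v, 0), hi); };
    const int x0 = clampi(static_cast<int>(fx0), width - 1);
    const int x1 = clampi(static_cast<int>(fx0) + 1, width - 1);
    const int y0 = clampi(static_cast<int>(fy0), height - 1);
    const int y1 = clampi(static_cast<int>(fy0) + 1, height - 1);

    // Four memory reads and three linear blends per sample.
    const float top = img[y0 * width + x0] * (1.0f - wx) + img[y0 * width + x1] * wx;
    const float bot = img[y1 * width + x0] * (1.0f - wx) + img[y1 * width + x1] * wx;
    return top * (1.0f - wy) + bot * wy;
}
```

CUDA's hardware filtering stores the interpolation weights in a low-precision fixed-point format, so small differences against a full-precision version like this one are to be expected when the two are compared.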

12
Interpolation
13
Verification
  • Visual verification
  • Difference images
  • Intensity profiles
  • RMS error (when appropriate)
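
Two of these checks are easy to script. The sketch below assumes the GPU result and the reference ("gold") result are available as flat float arrays of equal length; the function names are illustrative, not taken from the presentation.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Per-element difference (test minus reference), suitable for viewing as an image.
std::vector<float> difference_image(const std::vector<float>& test,
                                    const std::vector<float>& reference) {
    if (test.size() != reference.size())
        throw std::invalid_argument("size mismatch");
    std::vector<float> diff(test.size());
    for (std::size_t i = 0; i < test.size(); ++i)
        diff[i] = test[i] - reference[i];
    return diff;
}

// Root-mean-square error between two equally sized, non-empty arrays.
double rms_error(const std::vector<float>& test, const std::vector<float>& reference) {
    if (test.size() != reference.size() || test.empty())
        throw std::invalid_argument("arrays must be non-empty and equal in size");
    double sum_sq = 0.0;  // accumulate in double to limit rounding error
    for (std::size_t i = 0; i < test.size(); ++i) {
        const double d = static_cast<double>(test[i]) - static_cast<double>(reference[i]);
        sum_sq += d * d;
    }
    return std::sqrt(sum_sq / static_cast<double>(test.size()));
}
```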

14
Visual Verification
  • Texture filtering
  • Difference Images
  • Manual filtering

15
(No Transcript)
16
(No Transcript)
17
Performance
18
OP Rate
  • Theoretical OP rate
  • Calculated OP rate

19
Extension: Micro CT
  • 1000x1000x466 volume (1.77GB)
  • 321 projection slices, 1000x520 each (636MB)
  • Input and output do not fit into device memory
  • Scheme: place all projection data into texture
    memory and compute 16 slices at a time
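
The sizing behind this scheme can be sketched as follows, assuming 4-byte single-precision values (the deck states that double precision is not used). The helper names are hypothetical. Under that assumption the projection set comes to 667,680,000 bytes, which matches the 636MB figure when read as MiB, and a 16-slice slab of the output is 64,000,000 bytes.

```cpp
#include <cstdint>

constexpr std::uint64_t kBytesPerValue = 4;  // assumption: single-precision float

// Bytes needed for an nx * ny * nz array of floats.
constexpr std::uint64_t array_bytes(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz) {
    return nx * ny * nz * kBytesPerValue;
}

// Number of slabs needed to cover nz slices, slices_per_slab at a time.
constexpr std::uint64_t slab_count(std::uint64_t nz, std::uint64_t slices_per_slab) {
    return (nz + slices_per_slab - 1) / slices_per_slab;  // ceiling division
}

// Slices in slab `index`; the last slab may be shorter than the rest.
constexpr std::uint64_t slices_in_slab(std::uint64_t nz, std::uint64_t slices_per_slab,
                                       std::uint64_t index) {
    const std::uint64_t start = index * slices_per_slab;
    if (start >= nz) return 0;
    const std::uint64_t remaining = nz - start;
    return remaining < slices_per_slab ? remaining : slices_per_slab;
}
```

With 466 output slices and 16 slices per pass, this gives 30 passes, the last of which covers only 2 slices.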

20
(No Transcript)
21
Improvements and Future Work
  • 3D Texture Memory
  • Characterize bottleneck
  • Need better documentation
  • maxrregcount
  • Texture cache
  • Big data
  • Find balance between amount of projection data in
    memory vs slices computed per kernel

22
Any Thoughts?
23
Questions?
24
Appendix - Gold Code (no SSE)
25
Fast Math