A Coherent Grid Traversal Algorithm for Volume Rendering - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

A Coherent Grid Traversal Algorithm for Volume Rendering

Description:

Exceptional FLOPS / cost ratio and more powerful than the Itanium! ... if the bundles are large enough for A and D to fit in the cache/local store. ... – PowerPoint PPT presentation

Number of Views:36
Avg rating:3.0/5.0
Slides: 25
Provided by: ioannis
Category:

less

Transcript and Presenter's Notes

Title: A Coherent Grid Traversal Algorithm for Volume Rendering


1
A Coherent Grid Traversal Algorithm for Volume
Rendering
UCL Department of Computer Science
  • Ioannis Makris
  • Supervisors Philipp Slusallek, Céline Loscos
  • Computer Graphics Lab, Universität des
    Saarlandes

2
Overview
UCL Department of Computer Science
  • Introduction
  • Previous work in software Direct Volume Rendering
  • Introduction to the Cell Broadband Engine
  • The Coherent Grid Traversal Algorithm
  • Parallelisation Schemes

3
Introduction to Direct Volume Rendering
  • Technique of displaying a 2D projection of a 3D
    sampled dataset (volume), by accumulating samples
    across lines of sight with some transfer
    function.
  • Several types of sampled data. We will only deal
    with rectilinear grids.

4
Direct Volume Rendering
UCL Department of Computer Science
  • Ray Casting (Levoy 1988, 1990)
  • Image order algorithm
  • Splatting (Westover 1990)
  • Object order
  • Shear Warp (Lacroute 1994, 1996)
  • Hybrid order

5
Ray Casting
UCL Department of Computer Science
  • Cast a ray from the viewpoint to the volume for
    all pixels
  • Obtain samples from the volume in equal
    intervals, by trilinearly interpolating
    neighbouring voxels. Accumulate with some
    operator to get final colour.
  • Several acceleration techniques have been
    suggested (early ray termination (Levoy 1990),
    adaptive sampling, octrees (Ogata et al. 1998),
    kd-trees(Wald et al 2005)

6
Shear-Warp
UCL Department of Computer Science
  • Considered the fastest known Direct Volume
    Rendering algorithm.
  • Steps
  • Transform volume to sheared object space
  • Project sheared slices on an intermediate image
  • Transform the intermediate image to image space
  • Requires 3 copies of the data, for every
    principal axis, but RLE compression can help.

7
Characteristics of modern x86 processors
  • Deep instruction pipeline.
  • Very sophisticated hardware branch prediction
  • 2 levels of cache, supports software prefetching
  • Rich SIMD instruction set

8
The CELL processor
UCL Department of Computer Science
  • Developed jointly by IBM, Sony and Toshiba
  • Combines a PowerPC general purpose processor with
    8 separate SIMD execution units (SPUs).
  • Exceptional FLOPS / cost ratio and more powerful
    than the Itanium!
  • Needs fast memory, which is relatively expensive

9
Notable Characteristics of the SPUs
UCL Department of Computer Science
  • Software managed local store (i.e. no caches)
  • No branch prediction, expensive branch misses
  • SIMD loads/stores ONLY
  • Favors streaming code

10
Motivation for a new algorithm
UCL Department of Computer Science
  • Ray Casting algorithms are typically not cache
    friendly. Performance depends on viewing axis.
  • Acceleration structures may produce non-streaming
    code and several overheads.
  • Shear Warp may require too much memory for
    certain data.

11
A Coherent Grid Traversal Algorithm for Volume
Rendering (1)
UCL Department of Computer Science
  • Original idea from Ray Tracing Animated Scenes
    using Coherent Grid Traversal (Wald et al,
    SIGGRAPH 2006).
  • Bundles (frustums) of coherent rays are traced in
    grid space, by incrementaly computing the overlap
    with grid slices. The overlap of the frustum is
    computed with a SIMD addition and a SIMD
    truncation only

12
A Coherent Grid Traversal Algorithm for Volume
Rendering (2)
UCL Department of Computer Science
  • The volume rendering version of the algorithm
    uses a bricked volume (Sakas et al 1994),
    bricks replace the grid elements.
  • Bricks are referenced by 3 maps, one for each
    principal axis.
  • Compression is achieved by not storing empty
    bricks.

13
A Coherent Grid Traversal Algorithm for Volume
Rendering (3)
14
A Coherent Grid Traversal Algorithm for Volume
Rendering (4)
UCL Department of Computer Science
  • Traversal is performed on the principal axis,
    using the corresponding map.
  • Indices are computed incrementally.
  • If all the overlapping bricks of a slice are
    empty, the slice is skipped.
  • If some bricks are empty, they are associated
    with a locally stored empty brick and processed
    redundantly (but not fetched).

15
A Coherent Grid Traversal Algorithm for Volume
Rendering (examples)
UCL Department of Computer Science
16
Bundle Parallelisation
UCL Department of Computer Science
  • Bundle Parallelisation is trivial. On a x86 C
    OpenMP implementation, it only required 1 line of
    code.
  • It is possible to have some blocks fetched
    multiple times from neighbouring bundles.

17
Slice Parallelisation
UCL Department of Computer Science
  • A slice parallelisation is less likely to exhibit
    this problem, but traversal of brick slices is
    not incremental!
  • So, how would the processing element know which
    bundles to process for a given slice?

18
Slice Parallelisation
UCL Department of Computer Science
  • Most bundles will start on k0, or end on kkmax
    (or both).
  • During tracing, we create 2 vectors of references
    to bundles, we shall call them A and D, along
    with 2 index tables for the corresponding slices
    we shall call P and Q .
  • The bundles that run through a given slice s can
    be expressed as
  • Only 2 memory reads are required for that, or no
    memory reads if the bundles are large enough for
    A and D to fit in the cache/local store.

19
Slice Parallelisation
UCL Department of Computer Science
  • Remaining bundles can take up to 33 (they are
    about 14 average).
  • We use two more lists, we shall call S and E with
    index tables M and N. S holds references to the
    remaining bundles sorted by the first slice they
    intersect, and E sorted by the last.
  • Remaining bundles that run through s are
  • We need to run through both these lists to find
    that out, but this does not hit performance.

20
A notable problem of the CGT algorithm as
described in Wald 2006
  • When the roll angle of the bundles to the
    respective angle of the volume is close to p/4,
    the number of blocks fetches can be double than
    the number required.
  • There is a good solution to that (not yet
    published).

21
Results
UCL Department of Computer Science
  • First results demonstrated an speed increase of
    up to 2 orders of magnitude from ray-casting.
  • This may increase with further optimisations

22
Conclusion
  • We have developed a scalable algorithm for
    coherent volume traversal with performance on-par
    with the Shear Warp, with reduced memory
    requirements.
  • We demonstrated parallel implementations.

23
Future Work
  • Investigate mixed parallelisation schemes
  • Optimise the computation performed per brick.

24
The End
UCL Department of Computer Science
  • Thank you for your attention
  • Questions?
Write a Comment
User Comments (0)
About PowerShow.com