Title: A Coherent Grid Traversal Algorithm for Volume Rendering
UCL Department of Computer Science
- Ioannis Makris
- Supervisors: Philipp Slusallek, Céline Loscos
- Computer Graphics Lab, Universität des Saarlandes
Overview
- Introduction
- Previous work in software Direct Volume Rendering
- Introduction to the Cell Broadband Engine
- The Coherent Grid Traversal Algorithm
- Parallelisation Schemes
Introduction to Direct Volume Rendering
- Technique of displaying a 2D projection of a 3D sampled dataset (a volume), by accumulating samples along lines of sight with some transfer function.
- There are several types of sampled data; we will only deal with rectilinear grids.
Direct Volume Rendering
- Ray Casting (Levoy 1988, 1990)
- Image order algorithm
- Splatting (Westover 1990)
- Object order
- Shear Warp (Lacroute 1994, 1996)
- Hybrid order
Ray Casting
- Cast a ray from the viewpoint into the volume for every pixel.
- Obtain samples from the volume at equal intervals by trilinearly interpolating neighbouring voxels; accumulate them with some operator to get the final colour.
- Several acceleration techniques have been suggested: early ray termination (Levoy 1990), adaptive sampling, octrees (Ogata et al. 1998), kd-trees (Wald et al. 2005).
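The sampling and accumulation step above can be sketched as follows. This is a minimal illustration, not the talk's implementation: the demo volume, the greyscale transfer function, and all names are hypothetical.

```c
#include <stddef.h>

typedef struct { float r, g, b, a; } RGBA;

/* hypothetical demo volume: a 2x2x2 grid with voxel values 0..7 */
static const float demo_vol[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* Trilinearly interpolate vol (nx*ny*nz voxels) at point (x, y, z). */
float sample_trilinear(const float *vol, int nx, int ny, int nz,
                       float x, float y, float z)
{
    int ix = (int)x, iy = (int)y, iz = (int)z;
    float fx = x - ix, fy = y - iy, fz = z - iz;
    (void)nz;  /* extent is assumed checked by the caller in this sketch */
#define V(i, j, k) vol[((size_t)(k) * ny + (j)) * nx + (i)]
    float c00 = V(ix, iy,     iz    ) * (1 - fx) + V(ix + 1, iy,     iz    ) * fx;
    float c10 = V(ix, iy + 1, iz    ) * (1 - fx) + V(ix + 1, iy + 1, iz    ) * fx;
    float c01 = V(ix, iy,     iz + 1) * (1 - fx) + V(ix + 1, iy,     iz + 1) * fx;
    float c11 = V(ix, iy + 1, iz + 1) * (1 - fx) + V(ix + 1, iy + 1, iz + 1) * fx;
#undef V
    float c0 = c00 * (1 - fy) + c10 * fy;
    float c1 = c01 * (1 - fy) + c11 * fy;
    return c0 * (1 - fz) + c1 * fz;
}

/* Accumulate one greyscale sample with opacity `a`, front to back;
   the caller stops once acc->a nears 1 (early ray termination). */
void composite(RGBA *acc, float intensity, float a)
{
    float w = (1.0f - acc->a) * a;
    acc->r += w * intensity;
    acc->g += w * intensity;
    acc->b += w * intensity;
    acc->a += w;
}
```

A ray caster calls `sample_trilinear` at equal intervals along each ray and feeds the samples to `composite` until the accumulated opacity saturates.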
Shear-Warp
- Considered the fastest known Direct Volume Rendering algorithm.
- Steps:
  - Transform the volume to sheared object space
  - Project the sheared slices onto an intermediate image
  - Warp the intermediate image to image space
- Requires 3 copies of the data, one for every principal axis, but RLE compression can help.
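The shear step above can be sketched for a parallel projection. This uses one common sign convention and illustrative names; it is not Lacroute's code, and the final 2D warp is omitted.

```c
/* Shear factors for a parallel projection with z as the principal
   axis: after shifting slice k by (sx * k, sy * k), all viewing rays
   are parallel to the z axis, so each slice projects axis-aligned
   onto the intermediate image. */

typedef struct { float sx, sy; } Shear;

Shear shear_from_view(float dx, float dy, float dz)
{
    Shear s;
    s.sx = -dx / dz;   /* per-slice shift in x */
    s.sy = -dy / dz;   /* per-slice shift in y */
    return s;
}
```

Because the shifts are the same for every slice, the projection loop becomes a simple 2D resample-and-composite per slice, which is what makes Shear-Warp fast.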
Characteristics of modern x86 processors
- Deep instruction pipeline.
- Very sophisticated hardware branch prediction
- 2 levels of cache, supports software prefetching
- Rich SIMD instruction set
The Cell processor
- Developed jointly by IBM, Sony and Toshiba.
- Combines a PowerPC general-purpose processor with 8 separate SIMD execution units (SPUs).
- Exceptional FLOPS/cost ratio, and more powerful than the Itanium!
- Needs fast memory, which is relatively expensive.
Notable Characteristics of the SPUs
- Software managed local store (i.e. no caches)
- No branch prediction, expensive branch misses
- SIMD loads/stores ONLY
- Favours streaming code
Motivation for a new algorithm
- Ray casting algorithms are typically not cache friendly, and performance depends on the viewing axis.
- Acceleration structures may produce non-streaming code and several overheads.
- Shear-Warp may require too much memory for certain data.
A Coherent Grid Traversal Algorithm for Volume Rendering (1)
- Original idea from Ray Tracing Animated Scenes using Coherent Grid Traversal (Wald et al., SIGGRAPH 2006).
- Bundles (frusta) of coherent rays are traced in grid space by incrementally computing their overlap with grid slices. The overlap of the frustum is computed with a single SIMD addition and a single SIMD truncation.
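The incremental overlap computation can be sketched in scalar C over four lanes; in the SIMD version the loop body collapses into one vector add followed by one vector truncation. The struct layout and names are illustrative, after Wald et al. 2006.

```c
typedef struct {
    float u[4], v[4];    /* corner-ray crossing points on the current slice */
    float du[4], dv[4];  /* per-slice increments (dir / dir_major)          */
} Frustum4;

/* Advance the four corner rays one slice along the major axis and
   return the range of grid cells [*u0, *u1] x [*v0, *v1] that the
   frustum overlaps on the new slice. */
void next_slice_overlap(Frustum4 *f, int *u0, int *u1, int *v0, int *v1)
{
    float umin = 1e30f, umax = -1e30f, vmin = 1e30f, vmax = -1e30f;
    for (int i = 0; i < 4; ++i) {       /* one SIMD add in practice */
        f->u[i] += f->du[i];
        f->v[i] += f->dv[i];
        if (f->u[i] < umin) umin = f->u[i];
        if (f->u[i] > umax) umax = f->u[i];
        if (f->v[i] < vmin) vmin = f->v[i];
        if (f->v[i] > vmax) vmax = f->v[i];
    }
    *u0 = (int)umin; *u1 = (int)umax;   /* one SIMD truncation */
    *v0 = (int)vmin; *v1 = (int)vmax;
}
```

Because only the four corner rays are advanced, the per-slice cost is independent of how many rays the bundle contains.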
A Coherent Grid Traversal Algorithm for Volume Rendering (2)
- The volume rendering version of the algorithm uses a bricked volume (Sakas et al. 1994); bricks replace the grid elements.
- Bricks are referenced by 3 maps, one for each principal axis.
- Compression is achieved by not storing empty bricks.
A Coherent Grid Traversal Algorithm for Volume Rendering (3)
A Coherent Grid Traversal Algorithm for Volume Rendering (4)
- Traversal is performed along the principal axis, using the corresponding map.
- Indices are computed incrementally.
- If all the overlapping bricks of a slice are empty, the slice is skipped.
- If only some bricks are empty, they are associated with a locally stored empty brick and processed redundantly (but not fetched).
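The per-slice brick handling above can be sketched as follows. The assumption (labelled here, not stated on the slide) is that the per-axis map stores NULL for empty bricks; every name is illustrative.

```c
#include <stddef.h>

#define BRICKS_U 8
#define BRICKS_V 8

static float empty_brick[16] = { 0 };  /* one shared, locally stored empty brick */

/* Gather the bricks a frustum overlaps on one slice. Returns 0 if every
   overlapped map entry is NULL (the whole slice can be skipped); else
   fills `bricks` with one pointer per overlapped brick, aliasing the
   empty ones to `empty_brick` so the sampling loop can stream over them
   uniformly without fetching anything. */
int gather_slice_bricks(float *map[BRICKS_V][BRICKS_U],
                        int u0, int u1, int v0, int v1,
                        const float *bricks[])
{
    int any = 0, n = 0;
    for (int v = v0; v <= v1; ++v)
        for (int u = u0; u <= u1; ++u) {
            const float *b = map[v][u];
            if (b) any = 1;
            bricks[n++] = b ? b : empty_brick;
        }
    return any;
}
```

Processing the aliased empty bricks redundantly trades a few wasted samples for branch-free, streaming inner loops, which suits the SPUs described earlier.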
A Coherent Grid Traversal Algorithm for Volume Rendering (examples)
Bundle Parallelisation
- Bundle parallelisation is trivial: in an x86 C OpenMP implementation, it only required 1 line of code.
- It is possible for some blocks to be fetched multiple times by neighbouring bundles.
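The "1 line of code" would look roughly like this in a C/OpenMP renderer; `render_bundle`, the flag array, and the bundle count are stand-ins, not the talk's code.

```c
static int rendered[64];  /* per-bundle completion flags (demo only) */

/* Stand-in for tracing one frustum of coherent rays. */
static void render_bundle(int b)
{
    rendered[b] = 1;
}

void render_frame(int num_bundles)
{
    /* the single added line: bundles are independent, so a plain
       parallel-for distributes them over the available cores */
    #pragma omp parallel for schedule(dynamic)
    for (int b = 0; b < num_bundles; ++b)
        render_bundle(b);
}
```

Dynamic scheduling is a reasonable choice here because bundles that traverse empty regions finish much earlier than dense ones.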
Slice Parallelisation
- A slice parallelisation is less likely to exhibit this problem, but the traversal of brick slices is no longer incremental!
- So, how does a processing element know which bundles to process for a given slice?
Slice Parallelisation
- Most bundles will start on the first slice or end on the last slice (or both).
- During tracing, we create 2 vectors of references to bundles, which we shall call A and D, along with 2 index tables for the corresponding slices, which we shall call P and Q.
- The bundles that run through a given slice s can then be expressed in terms of these tables.
- Only 2 memory reads are required for that, or no memory reads if the bundles are large enough for A and D to fit in the cache/local store.
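A hedged reconstruction of this lookup (the slide's exact formula is not reproduced here): assume A holds the bundles entering on the first slice, sorted by the last slice they intersect in descending order, so the bundles of A still active at slice s form a prefix A[0 .. P[s]-1]; D is the symmetric list for bundles leaving on the last slice, with table Q. Reading P[s] and Q[s] would then be the "only 2 memory reads". All names follow the slide; the sorting convention is an assumption.

```c
typedef struct { int first, last; } Bundle;  /* slice span of a bundle */

/* Build P for A (A sorted by descending `last`): P[s] is the number of
   leading entries of A whose span still covers slice s, i.e. the length
   of the active prefix. */
void build_prefix_table(const Bundle *A, int nA, int num_slices, int *P)
{
    for (int s = 0; s < num_slices; ++s) {
        int n = 0;
        while (n < nA && A[n].last >= s)
            ++n;
        P[s] = n;
    }
}
```

With the tables precomputed during tracing, a processing element assigned slice s touches only P[s] and Q[s] at render time.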
Slice Parallelisation
- The remaining bundles can make up to 33% of the total (about 14% on average).
- We use two more lists, which we shall call S and E, with index tables M and N. S holds references to the remaining bundles sorted by the first slice they intersect, and E sorted by the last.
- The remaining bundles that run through s are then found analogously.
- We need to run through both these lists to find that out, but this does not hurt performance.
A notable problem of the CGT algorithm as described in Wald 2006
- When the roll angle of the bundles relative to the volume is close to π/4, the number of block fetches can be double the number required.
- There is a good solution to that (not yet published).
Results
- First results demonstrated a speed increase of up to 2 orders of magnitude over ray casting.
- This may increase with further optimisations.
Conclusion
- We have developed a scalable algorithm for coherent volume traversal, with performance on par with Shear-Warp but with reduced memory requirements.
- We demonstrated parallel implementations.
Future Work
- Investigate mixed parallelisation schemes
- Optimise the computation performed per brick.
The End
- Thank you for your attention
- Questions?