Title: Memory Efficient Acceleration Structures and Techniques for CPU-based Volume Raycasting of Large Data
1. Memory Efficient Acceleration Structures and Techniques for CPU-based Volume Raycasting of Large Data
- S. Grimm, S. Bruckner, A. Kanitsar and E. Gröller
- Institute of Computer Graphics and Algorithms
- Vienna University of Technology
- Vienna, Austria
2. Motivation (1/3)
- Direct Volume Rendering
- Important tool in medical environments
- CT angiography run-offs (> 1000 slices) are clinical practice
- Scanner resolutions are getting higher (1024x1024 per slice)
Efficient data memory layout essential!
3. Motivation (2/3) - Goals
- Interactive rendering/handling of large datasets up to 1 GB
- Support of heterogeneous PC hardware environments
- Test hardware specifications
- Notebook
- Pentium M 1.6 GHz
- 1 GB RAM
- GeForce 4 GO (32 MB)
Smart combination and modification of known methods!
4. Motivation (3/3)
- Hierarchy of successively larger but slower memory technology
- Avoid frequent access to slower levels
- Exploit spatial and temporal locality
Memory hierarchy (figure), from slowest to fastest level: hard disk, main memory, L2 cache, L1 cache, CPU
5. Outline
- Memory Layout & Data Processing Scheme
- Gradient Caching
- Empty Space Skipping
- Parallelization & Features
- Results
6. Linear Memory Layout (1/2)
2D slice (voxel indices):
12 13 14 15
 8  9 10 11
 4  5  6  7
 0  1  2  3
Memory storage order: 0, 1, 2, 3, 4, 5, 6, 7
7. Linear Memory Layout (2/2)
Store volume as a stack of 2D images (slices)
View-dependent cache behavior!
[Figure: volume and rays]
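The view dependence can be made concrete with a small sketch. It assumes a slice-by-slice layout with x varying fastest; the function names and the volume size are hypothetical and chosen only for illustration.

```python
def linear_index(x, y, z, dim_x, dim_y):
    """Offset of voxel (x, y, z) when the volume is stored slice by slice."""
    return x + dim_x * (y + dim_y * z)

# Hypothetical volume size (assumption, not taken from the slides).
DIM_X, DIM_Y, DIM_Z = 512, 512, 1000

def stride(direction):
    """Distance in voxels between two consecutive samples of an axis-aligned ray."""
    dx, dy, dz = direction
    return linear_index(dx, dy, dz, DIM_X, DIM_Y) - linear_index(0, 0, 0, DIM_X, DIM_Y)

stride_x = stride((1, 0, 0))  # ray along x: neighbours are adjacent in memory
stride_y = stride((0, 1, 0))  # ray along y: one row apart
stride_z = stride((0, 0, 1))  # ray along z: one full slice apart
```

Under these assumptions a ray along x touches consecutive addresses, while a ray along z jumps a whole slice per step, so cache behavior depends on the viewing direction.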
8. Bricked Memory Layout (1/2)
2D slice (voxel indices):
12 13 14 15
 8  9 10 11
 4  5  6  7
 0  1  2  3
Memory storage order: 0, 1, 4, 5, 2, 3, 6, 7
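The storage order on this slide can be reproduced with a short sketch. It assumes a 4x4 slice split into 2x2 bricks, with bricks and the voxels inside each brick both visited in row-major order; `bricked_order` is a hypothetical helper, not a function from the presented system.

```python
def bricked_order(width, height, brick):
    """Return row-major voxel indices in the order they are stored brick by brick."""
    order = []
    for by in range(0, height, brick):        # rows of bricks
        for bx in range(0, width, brick):     # bricks within one row
            for y in range(by, by + brick):   # voxels inside one brick
                for x in range(bx, bx + brick):
                    order.append(y * width + x)
    return order

# 4x4 slice with 2x2 bricks, as in the 2D example.
order = bricked_order(4, 4, 2)
```

The first eight entries of `order` are 0, 1, 4, 5, 2, 3, 6, 7, matching the memory storage order shown for the bricked layout.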
9. Bricked Memory Layout (2/2)
Store volume as a set of equally sized cubes (bricks)
Constant cache behavior!
[Figure: volume and rays]
10. Brick-wise Processing
Processing of all resample locations is done brick-wise
High cache coherence!
[Figure: volume and rays]
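One simplified way to picture brick-wise processing is to bucket resample locations by the brick that contains them, so that all samples of one brick are handled together. The brick edge length, the helper names, and the sample data below are assumptions for illustration; the actual traversal order of bricks is not described on the slide and is not modeled here.

```python
from collections import defaultdict

BRICK = 32  # hypothetical brick edge length in voxels

def brick_id(pos):
    """Integer brick coordinates of a resample position given in voxel units."""
    x, y, z = pos
    return (int(x) // BRICK, int(y) // BRICK, int(z) // BRICK)

def group_by_brick(samples):
    """Bucket (ray_id, position) pairs so each brick's samples are processed together."""
    groups = defaultdict(list)
    for ray_id, pos in samples:
        groups[brick_id(pos)].append((ray_id, pos))
    return groups

# Made-up resample locations of two rays.
samples = [(0, (1.5, 2.0, 3.0)), (1, (40.0, 2.0, 3.0)), (0, (2.5, 2.0, 3.0))]
groups = group_by_brick(samples)
```

Processing one bucket at a time keeps the working set limited to a single brick, which is the property the slide refers to as high cache coherence.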
12. Gradient Caching (1/3)
- Pre-computed gradients
  - Pro: high performance
  - Con: for sufficient quality, memory requirements are at least doubled
- Compute gradients on-the-fly
  - Con: calculation expensive
  - Pro: no additional storage requirement
13. Gradient Caching (2/3)
- To accelerate the calculation: caching of gradients
- Brick-wise traversal allows the use of a brick-sized gradient cache, which can be reused for each brick
[Figure: cell]
14. Gradient Caching (3/3)
- One brick-sized gradient cache
- Constant, very small memory requirement
[Figure: volume, gradient cache, rays]
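A minimal sketch of a reusable gradient cache follows. It assumes central differences as the gradient estimator and uses a dictionary in place of a fixed brick-sized array; the class name, the method names, and the test volume are hypothetical.

```python
class BrickGradientCache:
    """Caches central-difference gradients for the voxels of the current brick."""

    def __init__(self, volume):
        self.volume = volume  # volume[z][y][x], nested lists of scalars
        self.cache = {}       # a real cache would be a fixed brick-sized array
        self.computed = 0     # counts gradient computations, for illustration

    def start_brick(self):
        """Reuse the same storage for the next brick."""
        self.cache.clear()

    def gradient(self, x, y, z):
        key = (x, y, z)
        if key not in self.cache:
            v = self.volume
            self.cache[key] = (
                (v[z][y][x + 1] - v[z][y][x - 1]) / 2.0,
                (v[z][y + 1][x] - v[z][y - 1][x]) / 2.0,
                (v[z + 1][y][x] - v[z - 1][y][x]) / 2.0,
            )
            self.computed += 1
        return self.cache[key]

# Synthetic 4x4x4 volume with value x + 2y + 3z, so the gradient is (1, 2, 3).
volume = [[[x + 2 * y + 3 * z for x in range(4)] for y in range(4)] for z in range(4)]
cache = BrickGradientCache(volume)
cache.start_brick()
first = cache.gradient(1, 1, 1)
second = cache.gradient(1, 1, 1)  # served from the cache, no recomputation
```

Because the cache is cleared and reused per brick, its size stays bounded by one brick regardless of the volume size.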
16. Empty Space Skipping (brick-level)
- Min-max information stored per brick is used for discarding empty regions
- Template-based brick projection to rasterize depth values
- In software, very fast for orthographic projections
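The min-max test can be sketched as follows. The sketch assumes integer data values and an opacity lookup table indexed by data value; the function names, the table, and the brick contents are made up for illustration, and the projection step is not covered.

```python
def brick_min_max(values):
    """Minimum and maximum data value inside one brick."""
    return min(values), max(values)

def brick_is_empty(vmin, vmax, opacity):
    """A brick is empty if every value in [vmin, vmax] maps to zero opacity."""
    return all(opacity[v] == 0.0 for v in range(vmin, vmax + 1))

# Hypothetical opacity table for 8-bit data: values below 100 are transparent.
opacity = [0.0] * 100 + [0.5] * 156

air_brick = [10, 20, 30, 40]        # made-up brick contents
vessel_brick = [90, 120, 200, 95]

air_empty = brick_is_empty(*brick_min_max(air_brick), opacity)
vessel_empty = brick_is_empty(*brick_min_max(vessel_brick), opacity)
```

Here `air_empty` is true and `vessel_empty` is false, so only the second brick would need to be projected and traversed.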
17. Empty Space Skipping (octree-level)
- Each brick contains a three-level octree
- Caching of classification information
- Stored in a linearized octree using hierarchy compression
- Octree goes down to 4x4x4 voxels
- Template-based projection
Min-max and classification caching increase the memory requirements by approx. 4%
18. Cell Invisibility Cache (1/2)
[Figure: example ray, with regions skipped by the octree and regions not skipped by the octree]
19. Cell Invisibility Cache (2/2)
[Flowchart: Advance ray, Re-sampling, Classification, Gradient Estimation, Shading, Compositing; decision nodes "CIC" and "Visible", each with YES/NO branches]
The CIC increases the memory requirements by approx. 6%
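The idea of the cache can be sketched under stated assumptions: a cell is treated as invisible when all eight of its corner samples classify as fully transparent, and that result is remembered so later rays skip the cell without resampling. The class, the function, and the visibility criterion below are illustrative assumptions, not the exact pipeline of the flowchart.

```python
class CellInvisibilityCache:
    """Remembers which cells were already found to be invisible."""

    def __init__(self):
        self.invisible = set()  # a compact implementation might use one bit per cell

    def is_known_invisible(self, cell):
        return cell in self.invisible

    def mark_invisible(self, cell):
        self.invisible.add(cell)

    def reset(self):
        """Must be called when the classification (transfer function) changes."""
        self.invisible.clear()

def sample_cell(cell, corner_opacities, cic, stats):
    """Return True if the cell needs gradient estimation, shading, and compositing."""
    if cic.is_known_invisible(cell):
        stats["skipped"] += 1       # cache hit: advance the ray immediately
        return False
    stats["classified"] += 1        # cache miss: resample and classify
    if all(a == 0.0 for a in corner_opacities):
        cic.mark_invisible(cell)
        return False
    return True

cic = CellInvisibilityCache()
stats = {"classified": 0, "skipped": 0}
r1 = sample_cell((3, 4, 5), [0.0] * 8, cic, stats)          # classified, found invisible
r2 = sample_cell((3, 4, 5), [0.0] * 8, cic, stats)          # skipped via the cache
r3 = sample_cell((3, 4, 6), [0.0] * 7 + [0.3], cic, stats)  # visible cell
```

In this sketch the second visit to the same transparent cell costs only a cache lookup, which is the saving the cache is meant to provide for regions the octree does not skip.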
20. Empty Space Skipping
- Project all non-transparent bricks onto the image plane to find the first entry points of rays
- For finer resolution, use a min-max octree per brick and project the octree
- Cell Invisibility Cache
All these acceleration techniques increase the memory requirements by just 10%
22. Parallelization / Hyper-Threading
Law and Yagel 1996
[Figure: advancing ray-front over a 1D screen, with work assigned to physical CPUs 1 and 2 and logical CPUs 1 and 2]
23. Features
View-aligned and axis-aligned cutting planes
High quality
Multiple segmented objects and transfer functions
Transfer functions on clipping planes
25. Results (1/3) - Bricking
Linear vs. bricked memory layout
[Plot: speedup factor (1 to 4) over brick size in KB (1, 8, 64, 512, 4096, 32768), relative to the linear volume layout. Annotations: optimal brick size, speedup 2.8, bricking overhead, cache thrashing.]
Cache size: 512 KB
26. Results (2/3) - Gradient Caching
Speedup 3.4
Speedup 2.7
Pentium M 1.6 GHz, 1 GB RAM
27. Results (3/3) - Performance
Pentium M 1.6 GHz, 1 GB RAM
28. Conclusions
- Sub-second frame rates for large datasets on a standard notebook
- Fully interactive volume visualization of large data on commodity hardware is within reach
- Alternative memory layouts are the key to handling large datasets
29. Questions?
Visible Male (587 x 341 x 1878)
Intel Pentium M 1600 MHz (software capture)
30. Thank you for your attention
Sponsored by
Institute of Computer Graphics and Algorithms
Tiani MedGraph AG