Title: Anjul Patney and John D. Owens
2Real-Time Reyes-Style Adaptive Surface Subdivision
Anjul Patney and John D. Owens, University of California, Davis
3Introduction
- Real-time pipelines constrained by performance
- Quality often suffers
- Restricted flexibility
- We explore an alternative
- Reyes-Style Subdivision
- Task can be broken into fundamental algorithms
- Programmable subdivision in real-time
4Motivation
- Polygon-based rendering
- Geometric artifacts, view dependence
- Inconvenient for dynamic geometry
- Two approaches
- Improve the existing pipeline
- Explore non-standard pipelines
- Geometry processing is an important first step
Image courtesy www.ign.com
5Related Work
- Pixar's RenderMan
- Industry Standard in high-quality rendering
- Permits only offline operation
- Reyes on a Stream Processor (Owens et al. 2002)
- Conclusion: Geometry processing is a bottleneck
- Reyes on a PVM cluster (Lazzarino et al. 2002)
- Not practical for commodity applications
6The Reyes Pipeline
- Input higher-order surfaces
- Generate micropolygons from input
- Shade micropolygons in parallel
- Perform stochastic sampling
- Compose using A-buffer
7Reyes Geometry Stages
- Recursively split a surface until its parts are small enough
- Uniformly dice each part to form micropolygons
8Split
[Flowchart: Bound/Cull, then Diceable? If no, Split and repeat; if yes, Dice]
11Dice
[Flowchart: Bound/Cull, then Diceable? If no, Split and repeat; if yes, Dice. Dicing produces 1 grid of micropolygons]
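The bound, split, and dice loop in the flowchart can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses a 2D cubic Bézier curve as a stand-in for a bicubic patch, and the names `split`, `bound`, `dice`, `subdivide`, and the `max_extent` threshold are hypothetical, not taken from the talk.

```python
# Depth-first bound / split / dice on a 2D cubic Bezier curve.
# The curve is a simplified stand-in for a bicubic patch; all names
# and the size threshold are illustrative assumptions.

def lerp(p, q, t):
    """Linear interpolation between two points."""
    return tuple(a + (b - a) * t for a, b in zip(p, q))

def split(ctrl):
    """de Casteljau subdivision at t = 0.5; returns two 4-point curves."""
    p0, p1, p2, p3 = ctrl
    a, b, c = lerp(p0, p1, 0.5), lerp(p1, p2, 0.5), lerp(p2, p3, 0.5)
    d, e = lerp(a, b, 0.5), lerp(b, c, 0.5)
    m = lerp(d, e, 0.5)
    return (p0, a, d, m), (m, e, c, p3)

def bound(ctrl):
    """Box around the control points; it encloses the curve (convex hull property)."""
    xs, ys = zip(*ctrl)
    return min(xs), min(ys), max(xs), max(ys)

def evaluate(ctrl, t):
    """Point on the curve at parameter t, by repeated interpolation."""
    p0, p1, p2, p3 = ctrl
    a, b, c = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    return lerp(lerp(a, b, t), lerp(b, c, t), t)

def dice(ctrl, n):
    """Uniformly sample n points along the curve (one 'grid')."""
    return [evaluate(ctrl, i / (n - 1)) for i in range(n)]

def subdivide(ctrl, screen, max_extent, n, out):
    """Cull, then either dice (small enough) or split and recurse."""
    x0, y0, x1, y1 = bound(ctrl)
    sx0, sy0, sx1, sy1 = screen
    if x1 < sx0 or x0 > sx1 or y1 < sy0 or y0 > sy1:
        return                          # Bound/Cull: entirely off screen
    if max(x1 - x0, y1 - y0) <= max_extent:
        out.append(dice(ctrl, n))       # Diceable? Yes: dice
    else:
        for child in split(ctrl):       # Diceable? No: split and recurse
            subdivide(child, screen, max_extent, n, out)
```

The recursion here is depth-first, which is the form the following slides describe as hard to parallelize.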
12Challenges
- Split
- Inherently recursive (hard to parallelize)
- Dynamic generation/destruction of primitives
- Dice
- High memory usage
13Split - Native
14Split - Parallel
- Many independent primitives (5k patches for teapot): highly parallel
- Similar computation for all elements: SPMD-friendly
15Challenges
- Split
- Inherently recursive (hard to parallelize)
- Dynamic generation/destruction of primitives
- Dice
- High memory usage
16Split - Work-queue analogy
[Figure: a work queue holding primitives A to I; the next queue holds A, B, C, E, F, H and second copies of A, C, E]
17Step 1: Simple Allocation
[Figure: input queue A B C D E F G H I; output A B C E F H, followed by A C E]
A child primitive is offset by the queue length
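A minimal sketch of this allocation rule, written sequentially for clarity. It assumes each primitive is handled by one thread with index `i`, that a survivor (or the first child of a split) stays in slot `i`, and that the second child goes to slot `i + n`, where `n` is the queue length. The function name and the three callbacks are hypothetical.

```python
def allocate_children(queue, is_culled, needs_split, split):
    """One allocation step: the output has 2 * n slots, so no two writes collide."""
    n = len(queue)
    out = [None] * (2 * n)
    for i, prim in enumerate(queue):      # each i would be one parallel thread
        if is_culled(prim):
            continue                      # slot i stays empty
        if needs_split(prim):
            first, second = split(prim)
            out[i] = first                # first child keeps the parent's slot
            out[i + n] = second           # second child is offset by the queue length
        else:
            out[i] = prim                 # assumption: unsplit survivors stay in place
    return out
```

Because every thread writes only to slots derived from its own index, no synchronization is needed; the cost is the empty slots, which the compaction step removes.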
18Step 2: Efficient Compaction
[Figure: sparse queue A B C E F H, followed by A C E, with empty slots]
Fast scan-based compaction (Sengupta et al. 2007)
[Figure: compacted queue A B C E F H A C E]
Contiguous work-queue
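Scan-based compaction can be sketched as follows: flag the occupied slots, take an exclusive prefix sum of the flags to get each survivor's output address, then scatter. This sketch runs the scan sequentially with `itertools.accumulate`; on a GPU the scan itself would be a parallel primitive. The function name `compact` and the use of `None` for empty slots are assumptions made for illustration.

```python
from itertools import accumulate

def compact(slots):
    """Remove empty (None) slots while preserving order."""
    flags = [0 if s is None else 1 for s in slots]
    inclusive = list(accumulate(flags))
    # Exclusive prefix sum: number of occupied slots strictly before i.
    offsets = [total - flag for total, flag in zip(inclusive, flags)]
    count = inclusive[-1] if inclusive else 0
    out = [None] * count
    for i, item in enumerate(slots):      # each i would be one parallel thread
        if flags[i]:
            out[offsets[i]] = item        # scatter to the scanned address
    return out
```

For the queue in the figure, `compact` applied to the 18-slot sparse output would return the nine primitives A B C E F H A C E in that order.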
19One Complete Iteration
[Figure: input queue A B C D E F G H I; after allocation: A B C E F H, then A C E offset by the queue length; after compaction: A B C E F H A C E]
20Challenges
- Split
- Inherently recursive (hard to parallelize)
- Dynamic generation/destruction of primitives
- Dice
- High memory usage
21Dice: Screen-space Buckets
- Very few micropolygons get rejected early
- Render in buckets
- Reduces workload
- But restricts parallelism
- Empirical results
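One way to picture bucketed rendering is to bin primitives by their screen-space bounding boxes and then process one bucket at a time. The sketch below shows only the binning; the function name, the bounding-box format `(x0, y0, x1, y1)`, and the bucket size are assumptions, not details from the talk.

```python
def assign_to_buckets(bboxes, width, height, bucket_size):
    """Map each bucket (bx, by) to the indices of primitives overlapping it."""
    cols = (width + bucket_size - 1) // bucket_size
    rows = (height + bucket_size - 1) // bucket_size
    buckets = {(bx, by): [] for by in range(rows) for bx in range(cols)}
    for index, (x0, y0, x1, y1) in enumerate(bboxes):
        if x1 < 0 or y1 < 0 or x0 >= width or y0 >= height:
            continue                      # entirely off screen
        # Clamp the covered bucket range to the screen.
        bx0 = max(0, int(x0) // bucket_size)
        bx1 = min(cols - 1, int(x1) // bucket_size)
        by0 = max(0, int(y0) // bucket_size)
        by1 = min(rows - 1, int(y1) // bucket_size)
        for by in range(by0, by1 + 1):
            for bx in range(bx0, bx1 + 1):
                buckets[(bx, by)].append(index)
    return buckets
```

Processing buckets one after another keeps fewer grids alive at once, which is the memory saving; it also means fewer primitives are available to process in parallel at any moment, which is the trade-off the slide notes.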
22Implementation
- Hardware: NVIDIA GeForce 8800 GTX
- Input: bicubic Bézier patches
- CUDA: Split, Dice
- OpenGL (VBO): Display
23Implementation Details
- Split
- 16 threads / primitive (1 for each control point)
- Intra-primitive parallelism
- Less divergence
- Dice
- 256 threads / primitive (1 for each grid vertex)
- SIMD efficient
- Control points in shared memory
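The dice step can be illustrated by evaluating a bicubic Bézier patch at a uniform 16 x 16 grid of parameter values, giving the 256 grid vertices mentioned above. In this sequential sketch each `(i, j)` pair plays the role of one thread, and the 4 x 4 control-point array `ctrl` stands in for the data held in shared memory. The function names are hypothetical.

```python
def bernstein3(t):
    """The four cubic Bernstein basis values at parameter t."""
    s = 1.0 - t
    return (s * s * s, 3 * s * s * t, 3 * s * t * t, t * t * t)

def dice_patch(ctrl, n=16):
    """Evaluate a bicubic Bezier patch on an n x n grid.

    ctrl is a 4 x 4 nested sequence of (x, y, z) control points,
    indexed ctrl[row][col]; the result is indexed grid[j][i].
    """
    grid = []
    for j in range(n):
        bv = bernstein3(j / (n - 1))
        row = []
        for i in range(n):                # one (i, j) pair per grid vertex
            bu = bernstein3(i / (n - 1))
            point = tuple(
                sum(bv[r] * bu[c] * ctrl[r][c][k]
                    for r in range(4) for c in range(4))
                for k in range(3)
            )
            row.append(point)
        grid.append(row)
    return grid
```

Every vertex runs the same arithmetic on the same 16 control points with no branching, which is consistent with the slide's point that dicing is SIMD efficient.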
24Results - Performance (Teapot)
- 512x512 pixels
- 32 patches → 4823 grids
- 11 levels of subdivision
          CUDA 1.1    CUDA 2.0
Split     3.46 ms     2.69 ms
Dice      2.42 ms     1.27 ms
Render    12.4 fps    60.07 fps
25Teapot Demo
26Results - Performance (Killeroo)
- 512x512 pixels
- 11532 patches → 14426 grids
- 5 levels of subdivision
          CUDA 1.1    CUDA 2.0
Split     6.99 ms     6.30 ms
Dice      7.21 ms     3.46 ms
Render    4.06 fps    29.69 fps
Killeroo Model Courtesy Headus Inc.
27Killeroo Demo
28Results - Performance (Random scenes)
[Plot: Total, Split, and Dice performance for random scenes]
29Results - Screen-space buckets
30Limitations
- Cannot split and dice together
- Uniform dicing is wasteful
- Subdivision Cracks
31Conclusions
- Recursive Subdivision in real-time
- Breadth-first formulation
- Maps well to GPUs
- Fast programmable tessellation (dicing)
- 500M micropolygons/second
- First step towards a real-time Reyes pipeline
32Future Work
- Cracks
- Displacement mapping
- Subdivision surfaces
- Implement Reyes pipeline
- Shading
- Stochastic sampling
- A-buffer
33Acknowledgments
- Anonymous reviewers
- Per Christensen, Charles Loop, Dave Luebke, Matt Pharr, Daniel Wexler, Shubho Sengupta
- Financial Support
- US Department of Energy
- National Science Foundation
- SciDAC Institute for Ultrascale Visualization
- Equipment support from NVIDIA
35Adaptive Tessellation
- Goal of adaptive tessellation
- Remove artifacts with minimum polygons
- Expected to be much faster
- Reyes approach
- Generate micropolygons for every primitive
- Necessary for correct shading
Image courtesy Eisenacher et al. (I3D 2009)
36Geometry Shader
- Inefficient for fine subdivision
- Large magnification of data
- Performance overhead for large data output
- Limited number of output values per primitive (1024)
37Dedicated Tessellation Unit
- Closest to a dicer
- Uniform
- Fast due to fixed-function processing
- Inner-patch adaptivity is hard
- Unless input is pre-split