Title: Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination
1. Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination
- Ingo Wald, Carsten Benthin, Philipp Slusallek
- Saarland University
2. First: What is Ray Tracing?
- Ray generation
- Ray traversal
- Intersection
- Shading
- Framebuffer
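The five stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's engine: the one-sphere scene, the pinhole camera, the Lambert shading term and all function names are hypothetical, and "traversal" is a plain loop over objects rather than a real acceleration structure.

```python
import math

# Hypothetical scene: spheres as (center, radius). A real system would
# traverse an acceleration structure instead of looping over all objects.
SPHERES = [((0.0, 0.0, -3.0), 1.0)]
LIGHT_DIR = (0.0, 0.0, 1.0)  # unit vector from the surface towards the light


def generate_ray(x, y, width, height):
    """Ray generation: pinhole camera at the origin looking down -z."""
    px = (x + 0.5) / width * 2.0 - 1.0
    py = 1.0 - (y + 0.5) / height * 2.0
    length = math.sqrt(px * px + py * py + 1.0)
    return (0.0, 0.0, 0.0), (px / length, py / length, -1.0 / length)


def intersect_sphere(origin, direction, center, radius):
    """Intersection: nearest positive hit distance, or None (unit direction)."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None


def traverse(origin, direction):
    """Traversal: find the closest hit among all candidate objects."""
    best = None
    for center, radius in SPHERES:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, center, radius)
    return best


def shade(origin, direction, hit):
    """Shading: simple Lambert term; the background is black."""
    if hit is None:
        return 0.0
    t, center, radius = hit
    point = tuple(o + t * d for o, d in zip(origin, direction))
    normal = tuple((p - c) / radius for p, c in zip(point, center))
    return max(0.0, sum(n * l for n, l in zip(normal, LIGHT_DIR)))


def render(width, height):
    """Framebuffer: one shaded value per pixel."""
    framebuffer = []
    for y in range(height):
        row = []
        for x in range(width):
            origin, direction = generate_ray(x, y, width, height)
            row.append(shade(origin, direction, traverse(origin, direction)))
        framebuffer.append(row)
    return framebuffer
```

Calling `render(5, 5)` fills a 5x5 framebuffer; the centre pixel looks straight at the sphere, the corner pixels miss it.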
3. Agenda
- Introduction and Motivation
  - Why Interactive Ray Tracing at all?
- Part I: Interactive Ray Tracing Architectures
  - Software Ray Tracing
  - Ray Tracing on Programmable GPUs
  - Dedicated Ray Tracing Hardware
- Part II: Advanced Ray Tracing Issues
  - Handling Dynamic Scenes
  - The OpenRT Interactive Ray Tracing API
- Part III: New Applications
  - Industrial Application: Interactive Visualization of Car Headlights
  - Interactive Global Illumination
- Summary and Conclusions
4. Why Interactive Ray Tracing?
5. We have NVidia, so what do we need Ray Tracing for?
- Because it is high quality
  - Fully programmable and arbitrary shading operations
  - All operations performed in floating point
- Flexibility: can shoot arbitrary rays
  - Shadows, reflections, refractions, ...
  - Even suitable for global illumination
- Simple programming model
  - No need for multiple passes or OpenGL tricks
  - For indirect effects (like shadows), just shoot a ray!
- Automatic correctness
  - No need for approximations (like reflection maps)
- => Ray tracing is a much more flexible and powerful rendering algorithm than classical triangle rasterization
6. We have NVidia, so what do we need Ray Tracing for?
- But not only that: it's also efficient!
- Logarithmic in scene complexity
  - Useful for increasingly complex scenes (1 Mtri, no problem!)
  - No multiple rendering passes
- Automatic visibility culling and occlusion culling
  - Hidden geometry not even touched
- Depth complexity not an issue
  - No overdraw, shading performed exactly once per ray
  - Very useful for increasingly costly shading
- Small bandwidth requirements (if you do it right)
  - Memory access coherence + culling + single shading
7. We have NVidia, so what do we need Ray Tracing for?
- To summarize
  - It's highly flexible
  - It's high-quality
  - It's efficient
- And all of that combines automatically
  - Can do some of that sometimes in HW, but usually not all together
8. If it's so good, then why isn't it real?
- 1.) Better asymptotic complexity, but huge constants
  - 1 ray ~ 1000 CPU cycles
  - Runs on hardware that it doesn't really fit
  - Uses only a tiny fraction of today's CPUs, no parallelism, ...
  - Need many rays/sec for full interactivity
  - 1 Mpix/frame x 4-fold antialiasing x 25 frames/sec x 10 rays/pixel => one billion rays per second
- 2.) Graphics users don't have the choice
  - Rasterization has highly sophisticated HW implementations
  - => HW technology for rasterization is 10 years ahead of RT HW
  - There is no interactive ray tracing chip (yet), no matter the cost
  - All applications are designed for OpenGL
  - => There is no market for interactive ray tracing (really?)
  - Still more money/time/effort spent on improving rasterization
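The ray budget quoted above is a plain product of four factors, and combining it with the per-ray cost gives the required cycle rate. The sketch below only restates the slide's numbers; the function and variable names are illustrative.

```python
def required_rays_per_second(pixels_per_frame, samples_per_pixel,
                             frames_per_second, rays_per_sample):
    """Total rays per second needed for full interactivity."""
    return (pixels_per_frame * samples_per_pixel
            * frames_per_second * rays_per_sample)


# Numbers from the slide: 1 Mpix, 4-fold antialiasing, 25 fps, 10 rays/pixel.
RAY_BUDGET = required_rays_per_second(1_000_000, 4, 25, 10)

# With roughly 1000 CPU cycles per ray (also from the slide), this is the
# number of CPU cycles per second the whole system would have to deliver.
CYCLES_PER_RAY = 1000
CYCLES_PER_SECOND = RAY_BUDGET * CYCLES_PER_RAY
```

Under these figures the budget is 10^9 rays per second, or 10^12 cycles per second, far beyond what a single CPU of the time could deliver.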
9. Why is there no Ray Tracing Hardware?
- Because graphics hardware evolved 20 years ago!
- And rasterization was the better choice back then
  - Small scenes
    - => (asymptotic) complexity doesn't matter for small N
  - Large triangles
    - Coherence: incremental ops, interpolation, low bandwidth
  - Simple (integer) operations, highly pipelined
    - FPU requirements of ray tracing unthinkable 10 years ago
  - No fragment ops except interpolation
    - Programmability not an issue
    - => Very deep pipelines: no dependencies, no branches, no nothing, ...
  - Can be built in HW: very efficient, very fast, very cheap
- Note: all of this is changing today!
  - E.g. today, a GeForce 3 already has more FPU power than any CPU
10. Today's State of the Art in Realtime Ray Tracing
- Software implementations are slowly becoming available
  - Michael Muuss, Army Research Labs
    - Huge cluster of SGI machines
  - Parker et al., University of Utah
    - 32-128 CPU SGI Origin
  - Saarland University
    - 4 dual PIIIs in 2000, up to 24 dual Athlon 1800 today
- Hardware architectures are already being designed
  - SaarCOR (Schmittler et al., HWWS 2002)
  - Ray Tracing on Programmable GPUs (Purcell, SIGGRAPH 2002)
  - Hybrid software/GPU system (Hart, HWWS 2002)
- Several alternatives for future realtime ray tracing
  - Can't yet decide which is best, only know: it'll come
11. Today's State of the Art in Realtime Ray Tracing
- Even today, IRT solves tasks that even high-end graphics hardware still cannot handle!
  - Highly complex models (Muuss, Utah, Saarland RW2001)
  - High-quality isosurface and volume visualization (Utah)
  - Shadows, reflections, arbitrary shading (Saarland, Utah)
  - High-quality reflection simulation of car headlights (PGV2002)
  - Interactive global illumination (RW2002)
12. Today's State of the Art: Some Snapshots
13. Video
14. Part I: Different Approaches to Realtime Ray Tracing
15. Different Approaches to Realtime Ray Tracing
- Basically three choices
- Pure software implementations
  - Today: highly parallel
    - Shared memory (Utah), or PC clusters (Saarland)
  - Future: single PC?
    - Moore's Law also holds for CPUs!
    - Perhaps with streaming co-processors (e.g. SSE)
- Mixed SW/HW: RT on programmable GPUs
  - Purcell et al., Stanford
  - Converges to the coprocessor approach
- Pure HW
  - Dedicated RT hardware (Schmittler et al., SaarCOR)
- Summarize all three approaches
16. Alternative I: Software Ray Tracing (exemplary on the Saarland engine)
17. The OpenRT Interactive Ray Tracing Engine
- Features of OpenRT
- Highly efficient implementation of RT kernels
  - On a single Athlon MP 1800 CPU: 500,000 to 1.5 million rays per second for average models (100 ktri to 1 Mtri)
  - Up to the 10 million rps (rays/sec) range (no shading, simple scenes)
- Sophisticated parallelization on a cluster of PCs
  - Dynamic load balancing
  - Using up to 24 dual Athlon MP 1800 or 25 dual P4 Xeon 2.4 GHz
- Dynamically loadable, fully programmable shaders
  - Arbitrary C-code shading, arbitrary rays
  - RenderMan-like shading language
- Can handle dynamic scenes (later)
- OpenGL-like API (later)
18. Where does the speed come from?
- Speed depends on several factors
- Using fastest available hardware
  - Fast CPUs, and many CPUs
- Good algorithms: avoid operations in the first place
  - Fast intersection and traversal (kd-trees)
  - Minimize intersections and traversal steps with high-quality BSPs
- Just as important: make sure you're using your silicon correctly!
  - Highly efficient implementation
  - Machine-dependent code, if necessary (SSE)
19. Where does the speed come from?
- Keep the computational units busy!
  - Make sure the CPU doesn't stall
  - Avoiding pipeline stalls has top priority
- Look at memory, caches and bandwidth!
  - Example: a cache miss during triangle intersection costs about 4 times as much as the computations themselves!
  - Packing, aligning, cache-friendly data layout, prefetching, ...
- But no details here
  - Already covered that at Afrigraph 2001
  - It's not one single method, it's more a principle
20. Distributed Ray Tracing
- One CPU still not fast enough
  - 1 Mray/sec is fast, but not enough
  - Need more CPUs => clusters are cheap (20k-50k)
- Many approaches
  - Static vs dynamic load balancing
  - Object-space vs image-space vs ray-based task partitioning, ...
  - Pixel-interleaved (load balancing) vs tiles (coherence)
  - ...
- Problem: interactivity constraint
  - Have to finish the whole frame in 1/10th of a second
  - Little time for sophisticated reordering/scheduling
21. Distributed Ray Tracing
- Our approach (mostly Carsten Benthin)
- Image-based task partitioning
  - => Break image up into tiles (usually 16x16 or 32x32)
  - Since API: can dynamically change task partitioning scheme
- Strongly varying workload
  - => Need dynamic load balancing: let clients ask for work
- Have to care about network latencies
  - (10 ms network latency = 10,000 rays!)
- Highly efficient networking/communication code
  - Double-buffering, prefetching, packing, streaming, asynchronous sending and rendering, interleaving of different tasks, multithreading, ...
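The "let clients ask for work" scheme can be sketched with a shared tile queue. This is a single-process illustration under stated assumptions: threads stand in for cluster nodes, `render_tile` is a placeholder, and none of the names come from the actual system. The last helper restates the latency figure, assuming the 1 Mray/sec per-CPU rate mentioned earlier.

```python
import queue
import threading


def make_tiles(width, height, tile_size):
    """Image-based task partitioning: split the image into tile rectangles."""
    return [(x, y, min(tile_size, width - x), min(tile_size, height - y))
            for y in range(0, height, tile_size)
            for x in range(0, width, tile_size)]


def render_tile(tile):
    """Placeholder for the real work; returns the number of pixels covered."""
    _, _, w, h = tile
    return w * h


def render_frame(width, height, tile_size, num_clients):
    """Dynamic load balancing: each client pulls the next tile when idle."""
    work = queue.Queue()
    for tile in make_tiles(width, height, tile_size):
        work.put(tile)
    done = []                      # (client id, tile, pixel count)
    lock = threading.Lock()

    def client(client_id):
        while True:
            try:
                tile = work.get_nowait()   # "ask for work"
            except queue.Empty:
                return                     # frame finished
            pixels = render_tile(tile)
            with lock:
                done.append((client_id, tile, pixels))

    threads = [threading.Thread(target=client, args=(i,))
               for i in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


def rays_during_latency(latency_ms, rays_per_second):
    """Rays a client could have traced while waiting on the network."""
    return latency_ms * rays_per_second // 1000
```

Because fast clients simply come back for more tiles, strongly varying per-tile cost balances out without any up-front cost estimate; the price is one request per tile, which is why latency hiding matters.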
22. Distributed Ray Tracing: Results
- Can efficiently use many CPUs
  - 32x32 tiles at 640x480: 150 tiles => enough for many CPUs
- Usually limiting factor: pixels/second (not rays/sec)
  - Bandwidth limited at server: 640x480 at 10-15 frames/sec
  - For < 10 fps: usually achieve 90-99% client utilization
  - Client bandwidth usually not an issue (100 Mbit)
- Rendering complexity helps!
  - More costly tiles: better compute/BW ratio, fewer pixels/sec
  - Can use more CPUs without hitting the bandwidth limit
- Doubling rays/pixel is easier than doubling framerate
  - Framerate scales linearly only up to max framerate
  - But always scales linearly in rays/pixel
- Better networking hardware would definitely help
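The server-side framerate limit can be estimated from the pixel traffic alone. The sketch assumes uncompressed 24-bit RGB pixels, a 100 Mbit/s link at the server, and no protocol overhead; these are assumptions for illustration, not measurements from the system.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame (assumed 24-bit RGB)."""
    return width * height * bytes_per_pixel


def max_framerate(link_bits_per_second, width, height, bytes_per_pixel=3):
    """Upper bound on frames/sec if every pixel crosses the server link."""
    return link_bits_per_second / (8 * frame_bytes(width, height,
                                                   bytes_per_pixel))


# Assumed 100 Mbit/s server link at 640x480.
FPS_LIMIT = max_framerate(100_000_000, 640, 480)
```

Under these assumptions the limit is between 13 and 14 frames per second, which is consistent with the 10-15 frames/sec range stated above and explains why more rays per pixel scale better than more frames.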
23. Realtime Ray Tracing, Approach II: Ray Tracing on Programmable GPUs
24. Ray Tracing on Programmable GPUs
- Graphics hardware today
- GPUs are extremely powerful
  - Already more transistors than a P4
  - Full IEEE floating point!
  - Many, many, many parallel FPUs
  - Moore's Law: faster growth than for CPUs
- GPUs become more and more programmable
  - First: register combiners
  - Then: vertex shaders
    - Programmable per vertex
    - Linear interpolation between the vertices
  - Today: pixel shaders, fragment programs
    - Fully programmable for each fragment
25. Ray Tracing on Programmable GPUs
- GPU programmability today
  - Full IEEE
  - SIMD computations
  - Access to memory (textures) in every instruction
  - Multiple indirections (pointer chasing) now possible
    - Dependent texture reads
- Still: several restrictions
  - Conditionals, loops, recursion, dependent texture writes
- Typically programmed in GPU assembler
  - Most recent: high-level meta languages
  - E.g. Cg (C for graphics)
26. Streaming Computations on Programmable GPUs
- Idea: use the GPU as a streaming co-processor
  - Don't use it for rasterizing at all
- Pixels form a stream of elements
- Apply a small program (kernel) to the whole stream
  - Render a screen-aligned quad with a fragment shader
  - Fragment program executed for each screen pixel
- Each pixel operates on different data
  - Read data from textures
  - Screen-aligned textures: 1 texel for each pixel
- Output to framebuffer: 1 pixel for each fragment program
- Feedback loop: copy framebuffer to textures
  - Future: directly write into textures
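The streaming model above can be mimicked on the CPU: one kernel is applied to every pixel, reads one texel per input texture, writes one output pixel, and the output is fed back as input for the next pass. Nested lists stand in for textures and the framebuffer; the kernel and all names are hypothetical.

```python
def run_kernel(kernel, *textures):
    """'Render a screen-aligned quad': apply kernel to each pixel position.

    Every texture is screen-aligned, so pixel (x, y) reads texel (x, y)
    from each input and produces exactly one output value.
    """
    height = len(textures[0])
    width = len(textures[0][0])
    return [[kernel(*(tex[y][x] for tex in textures)) for x in range(width)]
            for y in range(height)]


def step_kernel(value, increment):
    """Toy fragment program: advance a per-pixel value by a per-pixel step."""
    return value + increment


def iterate(state, increment, passes):
    """Feedback loop: the framebuffer of one pass is the texture of the next."""
    for _ in range(passes):
        framebuffer = run_kernel(step_kernel, state, increment)
        state = framebuffer        # "copy framebuffer to textures"
    return state
```

For example, `iterate([[0.0, 1.0], [2.0, 3.0]], [[1.0, 1.0], [0.5, 0.5]], 3)` runs three passes over a 2x2 "screen", each pixel evolving independently on its own data.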
27. Ray Tracing on Programmable GPUs
[Diagram, built up step by step over slides 27-36. Labels: Screen-aligned quad; Memory (textures); Fragment; Kernel (fragment shader); Data (texels); Output; Frame buffer; Feedback!]
37. Ray Tracing on Programmable GPUs
- Mapping ray tracing to the GPU
- Use textures for storing the variables
  - Ray origin and direction: 2D textures (3 floats each)
  - Hit: 2D texture (3 floats: u, v, id)
  - Vertices: 1D texture of vertex positions (3 floats each)
  - Triangles: 1D texture of vertex ids (1 float each)
  - Acceleration structure: e.g. 3D texture for a simple grid
- Multiple indirections no problem
  - E.g. use triangle[i] as texture coordinate into the vertex texture
  - Up to 4 indirections (grid => triangle list => triangle => vertex)
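The chain of indirections can be illustrated with flat arrays standing in for textures. This is a sketch under stated assumptions: the tiny scene, the dictionary used in place of a 3D grid texture, the (offset, count) cell layout, and the `fetch` helper are all hypothetical. Ids are stored as floats, as on the GPU, and rounded on read.

```python
# "Textures" as flat arrays. Ids are floats because the hardware has no ints.
VERTICES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
TRIANGLES = [0.0, 1.0, 2.0,  1.0, 3.0, 2.0]   # 3 vertex ids per triangle
TRIANGLE_LIST = [0.0, 1.0, 1.0]               # triangle ids referenced by cells
GRID = {(0, 0, 0): (0, 2), (1, 0, 0): (2, 1)}  # cell -> (offset, count)


def fetch(texture, coord):
    """Dependent texture read; round because ids are stored as floats."""
    return texture[int(round(coord))]


def triangles_in_cell(cell):
    """Follow grid -> triangle list -> triangle -> vertex for one grid cell."""
    offset, count = GRID.get(cell, (0, 0))
    result = []
    for i in range(count):
        tri = fetch(TRIANGLE_LIST, offset + i)       # grid -> triangle list
        corners = []
        for k in range(3):
            vid = fetch(TRIANGLES, 3 * tri + k)      # triangle list -> triangle
            corners.append(fetch(VERTICES, vid))     # triangle -> vertex
        result.append(tuple(corners))
    return result
```

Each `fetch` whose coordinate came from a previous `fetch` corresponds to one dependent texture read in a fragment program.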
38. Ray Tracing on Programmable GPUs
- Write kernels for different ray tracing ops
- Ray generation
  - Get pixel position from texture coordinates
  - Somehow get camera settings (e.g. from quad color, or texture)
  - Compute corresponding ray
  - Write to origin, direction, state textures
- Triangle intersection
  - Read triangle ID to be intersected from state
  - Get triangle vertices from textures
  - Intersect
  - Update state texture
- Similar for traversal, triangle list intersection, shading, ...
39. Ray Tracing on Programmable GPUs
- Have kernels for ray generation, traversal, intersection, etc.
- Each ray is in exactly one state
  - E.g. in intersection state
- Make sure only rays in the correct state are processed
  - E.g. apply intersection kernel only to rays in intersect state
  - Usual GL masking methods, e.g. stencil bits, early pixel kill etc.
  - => Can generate overhead, but usually ok
- Fragment program can change the state of a ray
  - E.g. change from traversal to intersection in a non-empty voxel
- Combine different kernels by just calling them in turn
  - E.g. rendering an intersection quad will do one intersection step
  - (but only for rays in intersect state!)
- Secondary rays relatively easy for the shader kernel
  - Update origin and direction textures, go back to traversal state
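The per-ray state machine can be sketched as follows. The states, the dictionary-based rays and the trivial kernels are hypothetical stand-ins; the state test inside `apply_kernel` plays the role that stencil masking or early pixel kill plays on the GPU.

```python
TRAVERSE, INTERSECT, SHADE, DONE = range(4)


def make_ray(empty_voxels_ahead):
    """Toy ray: only tracks how many empty voxels lie before geometry."""
    return {"state": TRAVERSE, "empty_voxels_ahead": empty_voxels_ahead,
            "hit": False, "color": None}


def apply_kernel(kernel, required_state, rays):
    """'Render one quad': run kernel only on rays in the required state."""
    processed = 0
    for ray in rays:
        if ray["state"] == required_state:   # masking
            kernel(ray)
            processed += 1
    return processed


def traverse_kernel(ray):
    ray["empty_voxels_ahead"] -= 1
    if ray["empty_voxels_ahead"] <= 0:
        ray["state"] = INTERSECT             # reached a non-empty voxel


def intersect_kernel(ray):
    ray["hit"] = True
    ray["state"] = SHADE


def shade_kernel(ray):
    ray["color"] = 1.0
    ray["state"] = DONE


def trace_all(rays):
    """Call the kernels in turn until every ray is done; returns pass count."""
    passes = 0
    while any(ray["state"] != DONE for ray in rays):
        apply_kernel(traverse_kernel, TRAVERSE, rays)
        apply_kernel(intersect_kernel, INTERSECT, rays)
        apply_kernel(shade_kernel, SHADE, rays)
        passes += 1
    return passes
```

With `rays = [make_ray(1), make_ray(3)]`, the first ray is already shaded while the second is still traversing, which is the sense in which different pixels can be in different states within one pass.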
40. Ray Tracing on Programmable GPUs
- Results
  - Easy to exploit parallelism in the GPU
    - Many more pixels than fragment pipelines
  - Comparable performance to a single CPU
    - Even though it's only a prototype implementation
  - Limited by fragment pipeline very soon
- Main limitations
  - Fragment processing speed
  - Texture memory
    - Need many textures for each pixel
    - Also need to store the whole scene in textures
  - Bandwidth
  - Number of different states must be small!
41. Ray Tracing on Programmable GPUs
- Additional limitations of current GPUs
- Bandwidth problems due to missing loops
  - Often have to write data just to save it for the next iteration
  - Overhead due to missing write capability
- Accuracy problems: no ints, all floats
  - E.g. rounding modes when reading IDs from a texture
- Problems due to missing dependent writes
  - Many textures for input, but only one framebuffer for output
  - Need multiple passes for computing more than 3 values per pixel
  - Each fragment shader writes to exactly one predetermined position
  - Hard to do recursive operations with that limitation
  - Kd-tree construction?
42. Ray Tracing on Programmable GPUs
- Ray tracing on GPUs in the future?
- Many limitations will (probably) change
  - Loops, branches, dependent writes, int textures, texture memory, early pixel kill
- Performance will increase faster than for CPUs
- => Might soon be faster than, and similarly flexible as, ray tracing on a CPU!
43. Realtime Ray Tracing, Approach III: Dedicated Ray Tracing Hardware
44. Dedicated Ray Tracing Hardware
- Relatively low efficiency when using a GPU for RT
  - Many units not needed at all (rasterization, z-buffer, clipping, lighting, ...)
  - Lots of overhead
  - Programmable units can never be as efficient as dedicated HW
- Dedicated ray tracing HW should be more efficient
- Building RT HW is feasible today
  - FPU power not a problem any more
    - (see GeForce3 FPU performance)
  - Die size/number of transistors not a problem any more
- Main problem: off-chip bandwidth!
  - Already between chip and cache
45. Dedicated Ray Tracing Hardware
- Bandwidth: same problem as in SW
  - Approach in SW: bandwidth reduction by coherent ray tracing (packet traversal)
- HW: much larger packets (64x64 vs 2x2!)
  - Much bigger bandwidth saving
  - Target: realtime full-screen resolutions
  - Larger packet sizes not a problem => lots of coherence
- Avoiding overhead is simple in HW
  - Much simpler than with SSE
46. SaarCOR Architecture
- Features
  - Based on interactive software ray tracer
    - Exactly the same data structures
    - Kd-trees as acceleration structure
    - Packets of rays to reduce bandwidth
  - Fixed OpenGL-like shading
    - plus shadow and reflection rays
- Goals
  - Simple low-bandwidth memory interface
  - Half the floating point requirements of GeForce3
  - Achieves frame rates comparable to today's gfx cards
47. SaarCOR Architecture
System overview
48. SaarCOR Architecture
Features
- Scalable
- Fully pipelined
- Multi-threading for latency hiding
- Simple communication pattern (no routing)
- Highly asynchronous
49. SaarCOR Current Status
- Simulation on register-transfer level
  - Core @ 533 MHz, memory 64 bit @ 133 MHz (simple SDRAM, no DDR!)
  - Each pipeline uses 36 FP units
- Standard SaarCOR
  - 4 pipelines
  - 16 threads per pipe
  - 1 GB/s bandwidth to memory (!)
  - 272 KB for caches (!)
- Four pipes: ½ the FP resources of a GeForce 3
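The 1 GB/s figure follows from the memory interface stated above, assuming one transfer per clock (single data rate, since the slide excludes DDR). The helper name is illustrative; the FP unit count is simply pipelines times units per pipeline.

```python
def peak_bandwidth_bytes(bus_width_bits, clock_hz, transfers_per_clock=1):
    """Peak memory bandwidth in bytes per second."""
    return bus_width_bits // 8 * clock_hz * transfers_per_clock


# 64-bit bus at 133 MHz, single data rate (simple SDRAM, no DDR).
MEMORY_BANDWIDTH = peak_bandwidth_bytes(64, 133_000_000)

# 4 pipelines with 36 FP units each in the standard configuration.
TOTAL_FP_UNITS = 4 * 36
```

This gives 1.064 x 10^9 bytes per second, i.e. roughly the quoted 1 GB/s, and 144 FP units for the standard four-pipeline configuration.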
50. Issues
- On-chip memory of standard SaarCOR
  - Caches: 272 KB
  - RF for rays: 288 KB
  - RF for stack: 535 KB
- Register-level simulations only
- Simple shading only
51. Benchmarks: Scenes
- OpenGL-like shading
  - No shadow rays
  - No reflection rays
- Full screen resolution
  - 1024 x 768 pixels
52. Benchmarks: Scenes (2)
53. Benchmarks: Results
Today's CPUs: 0.5-0.8 Mrays/s => factor of 100-200!
54. Benchmarks: Results (3)
- Efficiency of standard SaarCOR
16 threads => 32 threads: 10
Performance scales with number of pipelines, threads, cache size and bandwidth.
55. What about shading?
- Right now: shading only coarsely approximated
  - Fixed Phong shader w/ bilinear texturing
  - Programmable shading currently evaluated
- Shading packets of rays exploits coherence
  - BQD scene with bilinear textures
  - 14 MB for shading data per frame
  - 300-600 MB/s bandwidth
- Shading BW Ray Tracing BW
56. Conclusions
- SaarCOR architecture
  - Scales well in the number of pipelines
  - Highly efficient
  - Uses half the FP power of GeForce3
  - Requires very low bandwidth
  - Provides full-featured ray tracing
  - Same frame rates as today's graphics cards
57. Current Work
- Programmable shading
- API: OpenRT (Wald02)
- Virtual memory management
- Incorporate features and algorithms from SW system
  - Large models (Wald01)
  - Dynamic scenes (Wald02)
  - Global illumination (Wald02)
- Building a prototype
58. Realtime Ray Tracing, Approaches I-III: Summary and Conclusions
60. Realtime Ray Tracing
- Summary
  - Different upcoming (and competing!) architectures
  - All of these have different advantages/disadvantages
    - PC clusters: most flexible, but not useful for the consumer market
    - GPUs: better performance growth, cheap, but awkward to use
    - HW: best performance, best efficiency, but costly
  - => Cannot yet predict which one will win
- But
  - The question is not "will realtime ray tracing ever come?"
  - The question rather is how and when it will come
61. End of Part I: Questions?
62. Part II: Advanced Ray Tracing Issues
63. Advanced Ray Tracing Issues
- Conclusion from Part I: realtime ray tracing will come
- Problem: all these architectures mostly focus only on the core ray tracing algorithms, i.e. traversal and intersection
- Ubiquitous realtime ray tracing opens new problems
  - Dynamic scenes?
  - Suitable API(s)?
  - Implications for future applications / scene graph libraries?
64. Interactive Ray Tracing
- So far
  - Interactive RT possible even today, can already beat SGI/NVidia
    - Complex models
    - High-quality applications
  - Can do high-quality, interactive walkthroughs
- But: a walkthrough is not really interactive
  - Not if the scene remains static
65. Issue I: Dynamic Scenes
- Fact: ray tracing needs an acceleration structure
  - Building it is very costly
  - Precomputation only works for static scenes
- But: real scenes usually aren't static
  - => What is "interactive" if I cannot interact with it?
- Problem: little research on this topic
  - Just wasn't interesting before interactive ray tracing
- Previous work: usually on special cases
  - Utah: hack, keep dynamic objects out of the accel structure
  - Reinhard, RW2001: incremental updates of uniform grid
    - Costly, not hierarchical
  - Moeller, EG2001: only rigid-body animation
67. Handling Dynamic Scenes
- Different kinds of dynamic behavior
  - Hierarchical, rigid-body motion vs unstructured motion
  - Constrained unstructured motion (e.g. maximum displacement)
  - All triangles animated vs few triangles animated
  - Amortized over many rays/frames or over few rays
  - ...
- Inherently different problems need different solutions
  - One single algorithm will hardly do the job
68. Handling Dynamic Scenes
- Alternative approach
- Offer a suite of different techniques
  - Hierarchical animation of whole objects
  - Fast rebuild of objects for unstructured motion (with sacrifices in traversal speed)
  - High-quality BSPs for often-used static objects (with relatively long rebuild time)
- Let the application decide which one is best for what!
  - If anybody knows what's best, it's the application programmer
  - Just like OpenGL: applications build display lists, not the drivers!
- Allow combination of techniques
  - E.g. some unstructured motion but otherwise hierarchically animated
  - => App needs a good API to do that!
69. Handling Dynamic Scenes
- Combining techniques in a hierarchical way
- Application groups geometry into objects
  - Similar to building display lists (=> API)
  - Each object has a separate BSP (just like PowerPlant)
  - Hints can be given to control the quality/speed tradeoff
    - E.g. whether the object will be static or unstructured
- Objects can be instantiated
  - Just like calling a display list (=> API)
  - Hierarchical animation: just re-instantiate with a new transform
- Objects are kept in an additional hierarchy level
  - With separate, fast and high-quality BSP
  - During traversal, just transform the rays when they hit an object
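The two-level idea can be sketched as follows. To stay short, the sketch makes several simplifying assumptions that are not from the slides: each object is a unit sphere instead of a triangle mesh with its own BSP, an instance transform is only a translation plus uniform scale, and the top level is a linear loop instead of a top-level BSP. All names are hypothetical.

```python
import math


def intersect_unit_sphere(origin, direction):
    """Object-space geometry: unit sphere at the origin.

    The direction need not be normalized, so the returned ray parameter t
    stays valid in world space after the ray has been transformed.
    """
    a = sum(d * d for d in direction)
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - 1.0
    disc = b * b - a * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / a
    return t if t > 0.0 else None


class Instance:
    """One instantiation of an object with its own transform."""

    def __init__(self, intersect_object, translation, scale=1.0):
        self.intersect_object = intersect_object
        self.translation = translation
        self.scale = scale

    def intersect(self, origin, direction):
        # Transform the ray into object space instead of moving the geometry.
        local_origin = tuple((o - t) / self.scale
                             for o, t in zip(origin, self.translation))
        local_direction = tuple(d / self.scale for d in direction)
        return self.intersect_object(local_origin, local_direction)


def trace(instances, origin, direction):
    """Top level: closest hit as (t, instance index), or None."""
    hits = []
    for index, instance in enumerate(instances):
        t = instance.intersect(origin, direction)
        if t is not None:
            hits.append((t, index))
    return min(hits) if hits else None
```

Animating an object then only means changing `translation` or `scale` on its instance; the object-space geometry, and in the real system its BSP, is left untouched, which is also why many instances of one object cost almost nothing extra.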
70. Handling Dynamic Scenes: Results
- Side effect: instantiation is for free
  - Terrain: 1000 instances of a 20 ktri tree = 20 Mtri (and dynamic!)
  - Sunflowers: 36,000 x 24 ktri sunflowers, about 1 GigaTri (dynamic!)
- Top-level BSP reconstruction tolerable
  - Some milliseconds even for a few thousand objects
  - But scalability bottleneck (redundant computation on each client)
- Hierarchical animation is cheap
  - Transformations are cheap (compared with the rest)
- But: unstructured motion still costly
  - Especially for big objects (=> have to use low(er)-quality BSPs)
  - High bandwidth requirements for sending data over the network!
  - Tolerable for moderately complex objects (16k-64k tri)
- In practice: total overhead usually 10-20%
71. Handling Dynamic Scenes: Conclusions
- Works for many different scenes (BART benchmark suite)
  - Robots: game-like scene, hierarchical animation of 161 objects
  - Kitchen: mostly static, with many secondary effects
  - Museum: completely unstructured motion
  - Correct (inter-)reflections, shadows, etc. also on moving triangles!
- Also works for all applications we have built so far
  - OpenRT-based VRML97 viewer with VRML animations
  - Inventor port under way
  - Dynamic scenes in interactive global illumination application
72. Handling Dynamic Scenes: Results
73. Handling Dynamic Scenes: Results
Video
74. Handling Dynamic Scenes: Remaining Problems
- Lots of potential for future research!
  - Faster kd-tree generation?
  - Kd-tree generation in HW?
  - On-demand generation of kd-trees?
  - More efficient solutions for special problems
    - Skinning, morphing, progressive meshes, ...
76. Issue II: API Issues
- So far
  - Fast, cheap, efficient, ...
  - Flexible, powerful shading
  - Can do big models and dynamic scenes, ...
- So why is nobody using it?
Because without a proper API, you can't!
77. Issue II: API Issues
- Why do we need an API for interactive ray tracing?
- Side effect: an API helps to divide-and-conquer problems (e.g. shaders, global illumination, ray tracing kernels, ...)
  - E.g., can work separately on frontend and backend
  - Can abstract from dynamic scene issues in the global illumination shader, and so on
- It helps to create a critical mass of users
  - Rasterization only really took off after OpenGL
  - Enables code portability
- Without an API, nobody will (or can) use it, except insiders
  - Not everybody has their own realtime ray tracer
  - Not everybody wants to, or should, know all implementation details
- => For widespread realtime ray tracing, we do need an API
78. Issue II: API Issues
- Problem: there are no suitable APIs
  - API has to support both "interactive" and "ray tracing"
  - OpenGL: interactive, but not suitable for ray tracing
  - RenderMan/Rayshade/POV-Ray: ray tracing capable, but inherently offline
- => Need to find new API(s)
79. Issue II: API Issues
- Goals for an interactive ray tracing API
  - As easy to learn and use as (standard) OpenGL
    - Leverage existing programmers' experience with OpenGL
  - As powerful in shading as RenderMan
- Our approach (OpenRT): combine the best of both
  - Application API much like OpenGL/GLUT
    - With necessary modifications for ray tracing (shaders, objects)
  - Shader API like RenderMan
80. The OpenRT Interactive Ray Tracing API
- Application API very OpenGL-like
  - Geometry: rtVertex3f, rtNormal3f, ...
  - Primitives: rtBegin/End(RT_TRIANGLES, RT_QUAD, ...)
  - Transformation: rtPushMatrix(), rtMatrixMode(), ...
- Geometry objects
  - Just like display lists (except no side effects)
  - rtNewObjects(), rtBeginObject(), rtEndObject(), rtInstantiate(), ...
- Shader objects
  - Surface, light, and pixel shaders, exchangeable renderer object
- Even supports GLUT-like functionality
  - Porting GL/GLUT applications relatively easy
  - (except multi-pass, of course, ...)
81. The OpenRT Interactive Ray Tracing API
- Shader objects
  - Similar to Stanford Programmable Shading API
  - Dynamically loaded from DLLs/.so files
  - rtShaderFile(), rtCreateShader(), rtBindShader()
  - Light shaders: rtCreateShader(), rtUseLight()
- Application-to-shader communication via shader parameters
  - rtDeclareParam(), rtParameterHandle(), rtParameter3f(), ...
  - Parameters can be per vertex, per triangle, per shader, ...
- Retained-mode / frame semantics
  - Rendering uses shader parameters active at end of frame
    - NOT at the time that the shader/triangle was created
  - Actual rendering triggered at rtSwapBuffers
  - Rendering always done asynchronously
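The frame semantics can be illustrated with a toy recorder. This is deliberately not the OpenRT API: the class, its methods and the string-valued "parameters" are hypothetical, and the asynchronous rendering is not modeled. It only shows that geometry refers to parameters by name and that the values are read when the frame is swapped.

```python
class ToyFrame:
    """Hypothetical retained-mode recorder (not the OpenRT API)."""

    def __init__(self):
        self.params = {}       # shader parameters; the latest value wins
        self.triangles = []    # (triangle name, parameter key)

    def set_param(self, key, value):
        self.params[key] = value

    def add_triangle(self, name, param_key):
        # Only the reference to the parameter is stored, not its value.
        self.triangles.append((name, param_key))

    def swap_buffers(self):
        """Rendering is triggered here and sees the values active now."""
        return [(name, self.params[key]) for name, key in self.triangles]


def demo():
    frame = ToyFrame()
    frame.set_param("color", "red")
    frame.add_triangle("t0", "color")
    frame.set_param("color", "blue")     # changed after the triangle was made
    return frame.swap_buffers()
```

In this toy model `demo()` renders triangle `t0` in blue, not red, because the parameter is looked up at the end of the frame rather than when the triangle was created.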
82. The OpenRT Interactive Ray Tracing API
- Shader API, or: how to write a shader
- Declare and export shader parameters
  - Store as member variables
- Write callback functions
  - Shade(), Illuminate(), ...
- Access scene data with RenderMan-like API
  - Geometry: rtsShadingNormal(), ...
  - Lights: rtsIlluminate(), rtsOccluded(), rtsLightTransparency(), ...
- Shoot arbitrary secondary rays
  - rtsTrace()
- => Porting RenderMan shaders relatively easy, too
83. The OpenRT Interactive Ray Tracing API
- OpenRT summary
  - Fast and interactive rendering
  - Dynamic scenes
  - Very powerful shading
  - API for using it
- OpenRT is a complete 3D rendering engine
- Kernel behind OpenRT: Saarland RTRT
  - Might be changed to e.g. SaarCOR as soon as available
84. OpenRT Example 1: VRML97 @ OpenRT
- Example 1: VRML97 viewer ported from OpenGL
  - Porting relatively easy, almost all functionality was there
  - Only modification: have to gather small objects into fewer bigger objects for performance reasons
- Results
  - Can render all of VRML97
    - Almost no matter how big
  - Can put any kind of shader on any triangle (e.g. GlobIllum)
  - Can do VRML animations, move objects, edit shaders and lights
Car headlight, 800,000 tris
Soda Hall floor, 400,000 tris
85. OpenRT Example 2: The BART Benchmark
- Example 2: the BART benchmark scenes
  - To our knowledge, the only system so far to render those at all
  - All different kinds of dynamic behavior, including reflections, refractions, shadows, ...
  - With GL shader: > 10 frames per second
  - With ray tracing shader: 2-5 frames per second
86. OpenRT Example 3: Complex Outdoor Scene
- Example 3: massive instantiation for outdoor scenes
  - Pixel-accurate shadows!
87. OpenRT Example 3: Complex Outdoor Scene
88. OpenRT Example 4: Massive Model Visualization
- Example 4: the PowerPlant
  - 12.5-37.5 million triangles
  - Currently: with replication, without demand-loading/reordering
  - Just recently: can now also move the furnace :-)
89. OpenRT Example 5: Complex Shading Stress Test
90. OpenRT Example 5: Complex Shading Stress Test
- Example 5: shading stress test
  - Volume shader (CT head)
    - Applied to a box of geometry
  - Lightfield shader on simple quad
  - Procedural wood and marble
  - Procedural bump-mapping on mirror
    - => Procedurally bump-mapped reflections
- Result: everything combines perfectly
  - Transparent shadow from volume on procedural wood shader
  - Lightfield reflected in procedurally bump-mapped mirror
    - attenuated by semi-transparent volume
  - Multiple interreflections
  - Of course, everything is interactive and fully dynamic
91. OpenRT Example 5: Complex Shading Stress Test
92. OpenRT Example 6: Interactive Global Illumination
Implementation: not now
93. OpenRT Example 6: Interactive Global Illumination
- Fully implemented in OpenRT
  - GlobIllum application is a shader like any other
    - Automatically inherits capability for handling dynamic scenes, distribution, ...
  - Same frontend as e.g. BART/Office
    - Automatically inherits parser, user interface, etc.
  - Can be used from different applications (e.g. VRML viewer)
- Algorithms and implementation: later (Part III)
94. Questions?
- For more info, also visit
  - http://www.OpenRT.de
95. Part III: New Applications Enabled by Realtime Ray Tracing
96. For more information on OpenRT, see http://www.OpenRT.de
97. The Saarland Interactive Ray Tracing Project
- Started Jan 1st, 2000
- (Original) goal
  - Evaluate practicability of RT as an interactive rendering engine
  - Do a fair comparison and analysis of RT vs GL
    - What are the advantages and disadvantages?
    - Compare on common ground: OpenGL-like + shadows + reflections
    - No global illumination, no shading, no advanced features, ...
  - And: find out why it is so slow
- Therefore, needed to build a fast ray tracer
98. The Saarland Interactive Ray Tracing Project
- Goals have constantly changed since then
  - It worked, so continue working on it
  - One CPU not fast enough, so distribute
  - Great for many triangles, so work on really large models
  - People demand high quality, so build a full-featured ray tracer
  - If it's good in software, why not build it in hardware?
  - Static scenes too limiting, so make it dynamic
  - Others want to use it, so build an API
  - And, if we have it anyway, why not do global illumination?
  - ...
99. Ray Tracing on Programmable GPUs
- Application program relatively easy
  - Just render many screen-aligned quads with different fragment shaders
  - Need some way of load balancing
    - Want to avoid executing the shade kernel if no ray is in shade state
- Important: the approach is not SIMD
  - 1 quad (1 fragment program) for the whole screen, but
    - Different rays can be in different states
    - Different pixels in fact behave differently
    - No problem to already shade pixel 2 while still intersecting pixel 1