Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination


Provided by: ingowaldin

1
Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Ingo Wald Carsten Benthin Philipp Slusallek
Saarland University

2
First: What is Ray Tracing?
Ray-Generation
Ray-Traversal
Intersection
Shading
Framebuffer
3
Agenda

Introduction and Motivation
Why Interactive Ray Tracing at all?
Part I: Interactive Ray Tracing Architectures
Software Ray Tracing
Ray Tracing on Programmable GPUs
Dedicated Ray Tracing Hardware
Part II: Advanced Ray Tracing Issues
Handling Dynamic Scenes
The OpenRT Interactive Ray Tracing API
Part III: New Applications
Industrial Application: Interactive Visualization of Car Headlights
Interactive Global Illumination
Summary and Conclusions

4
Why Interactive Ray Tracing ?
5
We have NVidia, so what do we need Ray Tracing for?

Because it is high quality
Fully programmable and arbitrary shading operations
All operations performed in floating point
Flexibility: Can shoot arbitrary rays
Shadows, reflections, refractions, ...
Even suitable for global illumination
Simple programming model
No need for multiple passes or OpenGL tricks
For indirect effects (like shadows): just shoot a ray!
Automatic correctness
No need for approximations (like reflection maps)
=> Ray tracing is a much more flexible and powerful rendering algorithm than classical triangle rasterization

6
We have NVidia, so what do we need Ray Tracing for?

But not only that: It's also efficient!
Logarithmic in scene complexity
Useful for increasingly complex scenes (1 Mtri, no problem!)
No multiple rendering passes
Automatic visibility culling and occlusion culling
Hidden geometry not even touched
Depth complexity not an issue
No overdraw, shading performed exactly once per ray
Very useful for increasingly costly shading
Small bandwidth requirements (if you do it right)
Memory access coherence, culling, single shading

7
We have NVidia, so what do we need Ray Tracing for?

To summarize:
It's highly flexible
It's high-quality
It's efficient
And: All of that combines automatically
Can do some of that sometimes in HW, but usually not all together

8
If it's so good, then why isn't it real?

1.) Better asymptotic complexity, but huge constants
1 ray: about 1000 CPU cycles
Runs on hardware that it doesn't really fit
Uses only a tiny fraction of today's CPUs, no parallelism, ...
Need many rays/sec for full interactivity
1 Mpix/frame x 4-fold antialiasing x 25 frames/sec x 10 rays/pixel => one billion rays per second
2.) Graphics users don't have the choice
Rasterization has highly sophisticated HW implementations
=> HW technology for rasterization is 10 years ahead of RT HW
There is no interactive ray tracing chip (yet), no matter the cost
All applications are designed for OpenGL
=> There is no market for interactive ray tracing (really?)
Still more money/time/effort spent on improving rasterization
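The ray budget quoted on this slide can be checked with a few lines of arithmetic. The sketch below only multiplies the figures given above; the variable names are illustrative, and the cycle estimate simply reuses the "about 1000 CPU cycles per ray" figure as a rough constant.

```python
# Ray budget for "full interactivity", using the figures from the slide.
pixels_per_frame = 1_000_000    # 1 Mpix/frame
samples_per_pixel = 4           # 4-fold antialiasing
frames_per_second = 25
rays_per_sample = 10            # 10 rays/pixel (shadows, reflections, ...)

rays_per_second = (pixels_per_frame * samples_per_pixel
                   * frames_per_second * rays_per_sample)

# Rough CPU cost if one ray takes about 1000 cycles.
cycles_per_ray = 1000
cycles_per_second = rays_per_second * cycles_per_ray

print(f"{rays_per_second:.1e} rays/s, {cycles_per_second:.1e} cycles/s")
```

Under these assumptions the cycle demand is several orders of magnitude above what a single GHz-class CPU delivers, which is the gap the rest of Part I addresses.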

9
Why is there no Ray Tracing Hardware?

Because: Graphics hardware evolved 20 years ago!
And: Rasterization was the better choice back then
Small scenes
=> (Asymptotic) complexity doesn't matter for small N
Large triangles
Coherence: incremental ops, interpolation, low bandwidth
Simple (integer) operations, highly pipelined
FPU requirements of ray tracing unthinkable 10 years ago
No fragment ops except interpolation
Programmability not an issue
=> Very deep pipelines: no dependencies, no branches, no nothing, ...
Can be built in HW: very efficient, very fast, very cheap
Note: All of this is changing today!
E.g. today, the GeForce 3 already has more FPU power than any CPU

10
Today's State of the Art in Realtime Ray Tracing

Software implementations are slowly becoming available
Michael Muuss, Army Research Labs
Huge cluster of SGI machines
Parker et al., University of Utah
32-128 CPU SGI Origin
Saarland University
4 dual PIIIs in 2000, up to 24 dual Athlon 1800 today
Hardware architectures are already being designed
SaarCOR (Schmittler et al., HWWS 2002)
Ray Tracing on Programmable GPUs (Purcell, SIGGRAPH 2002)
Hybrid software/GPU system (Hart, HWWS 2002)
Several alternatives for future realtime ray tracing
Can't yet decide which is best, only know: It'll come

11
Today's State of the Art in Realtime Ray Tracing

Even today, IRT solves tasks that even high-end graphics hardware still cannot handle!
Highly complex models (Muuss, Utah, Saarland [RW2001])
High-quality isosurface and volume visualization (Utah)
Shadows, reflections, arbitrary shading (Saarland, Utah)
High-quality reflection simulation of car headlights [PGV2002]
Interactive Global Illumination [RW2002]

12
Today's State of the Art: Some Snapshots
13
Video
14
Part I: Different Approaches to Realtime Ray Tracing
15
Different Approaches to Realtime Ray Tracing

Basically three choices
Pure Software Implementations
Today: Highly parallel
Shared memory (Utah), or PC clusters (Saarland)
Future: Single PC?
Moore's Law also holds for CPUs!
Perhaps with streaming co-processors (e.g. SSE)
Mixed SW/HW: RT on programmable GPUs
Purcell et al., Stanford
Converges to the "coprocessor" approach
Pure HW
Dedicated RT hardware (Schmittler et al., SaarCOR)
Summarize all three approaches

16
Alternative I: Software Ray Tracing (exemplified by the Saarland engine)
17
The OpenRT Interactive Ray Tracing Engine

Features of OpenRT
Highly efficient implementation of RT kernels
On a single Athlon MP 1800 CPU: 500,000 to 1.5 million rays per second for average models (100 ktri to 1 Mtri)
Up to the 10 million rps (rays/sec) range (no shading, simple scenes)
Sophisticated parallelization on a cluster of PCs
Dynamic load balancing
Using up to 24 dual Athlon MP 1800 or 25 dual P4 Xeon 2.4 GHz
Dynamically loadable, fully programmable shaders
Arbitrary C-code shading, arbitrary rays
RenderMan-like shading language
Can handle dynamic scenes (later)
OpenGL-like API (later)

18
Where does the speed come from ?

Speed depends on several factors
Using fastest available hardware
Fast CPUs, and many CPUs
Good algorithms: Avoid operations in the first place
Fast intersection and traversal (kd-trees)
Minimize intersections and traversal steps with high-quality BSPs
Just as important: Make sure you're using your silicon correctly!
Highly efficient implementation
Machine-dependent code, if necessary (SSE)

19
Where does the speed come from ?

Keep the computational units busy!
Make sure the CPU doesn't stall
Avoiding pipeline stalls has top priority
Look at memory, caches and bandwidth!
Example: A cache miss during triangle intersection costs about 4 times as much as the computations themselves!
Packing, aligning, cache-friendly data layout, prefetching, ...
But: no details here
Already covered that at Afrigraph 2001
It's not one single method, it's more a principle

20
Distributed Ray Tracing

One CPU is still not fast enough
1 Mray/sec is fast, but not enough
Need more CPUs => Clusters are cheap (20k-50k)
Many approaches
Static vs. dynamic load balancing
Object-space vs. image-space vs. ray-based task partitioning, ...
Pixel-interleaved (load balancing) vs. tiles (coherence)
Problem: Interactivity constraint
Have to finish the whole frame in 1/10th of a second
Little time for sophisticated reordering/scheduling

21
Distributed Ray Tracing

Our approach (mostly Carsten Benthin)
Image-based task partitioning
=> Break the image up into tiles (usually 16x16 or 32x32)
Since the API: Can dynamically change the task partitioning scheme
Strongly varying workload
=> Need dynamic load balancing: Let clients ask for work
Have to care about network latencies
(10 ms network latency ~ 10,000 rays!)
Highly efficient networking/communication code
Double-buffering, prefetching, packing, streaming, asynchronous sending and rendering, interleaving of different tasks, multithreading, ...
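The "let clients ask for work" idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' networking code: `make_tiles`, `render_frame`, the relative client speeds and the per-tile cost function are all hypothetical, and network latency is ignored.

```python
from collections import deque

def make_tiles(width, height, tile):
    """Split the image into tile x tile pixel blocks (edge tiles may be smaller)."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def render_frame(width, height, tile, client_speeds, cost):
    """Simulate clients pulling tiles from a shared queue.

    client_speeds: hypothetical relative speed of each client.
    cost: hypothetical function mapping a tile to its rendering cost.
    Returns, per client, the list of tiles it ended up rendering.
    """
    queue = deque(make_tiles(width, height, tile))
    busy_until = [0.0] * len(client_speeds)
    assigned = [[] for _ in client_speeds]
    while queue:
        # The client that becomes idle first asks for the next tile.
        c = min(range(len(busy_until)), key=busy_until.__getitem__)
        t = queue.popleft()
        busy_until[c] += cost(t) / client_speeds[c]
        assigned[c].append(t)
    return assigned

tiles = make_tiles(640, 480, 16)
work = render_frame(640, 480, 16, client_speeds=[1.0, 2.0],
                    cost=lambda t: 1.0)
print(len(tiles), [len(w) for w in work])
```

Because each client pulls a new tile only when it is idle, faster clients (or clients that happen to receive cheap tiles) automatically process more tiles, without any up-front estimate of the workload.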

22
Distributed Ray Tracing: Results

Can efficiently use many CPUs
32x32 tiles at 640x480: 150 tiles => enough for many CPUs
Usually limiting factor: Pixels/second (not rays/sec)
Bandwidth limited at server: 640x480 at 10-15 frames/sec
For < 10 fps: Usually achieve 90-99% client utilization
Client bandwidth usually not an issue (100 Mbit)
Rendering complexity helps!
More costly tiles: better compute/BW ratio, fewer pixels/sec
Can use more CPUs without hitting the bandwidth limit
Doubling rays/pixel is easier than doubling the framerate
Framerate scales linearly only up to the max framerate
But always scales linearly in rays/pixel
Better networking hardware would definitely help
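The server-side bandwidth limit can be estimated with simple arithmetic. The sketch below assumes uncompressed 24-bit RGB pixels and a 100 Mbit/s link at the server with no protocol overhead; both are assumptions for illustration, since the slide only mentions 100 Mbit for the clients.

```python
width, height = 640, 480
bytes_per_pixel = 3                  # assumption: uncompressed 24-bit RGB
link_bits_per_second = 100e6         # assumption: 100 Mbit/s link at the server

bits_per_frame = width * height * bytes_per_pixel * 8
max_fps = link_bits_per_second / bits_per_frame

print(f"{bits_per_frame / 1e6:.2f} Mbit per frame, at most {max_fps:.1f} frames/s")
```

Under these assumptions the link saturates at roughly 13 to 14 frames per second, which is consistent with the 10-15 frames/sec quoted above and explains why pixels/second, not rays/second, is usually the limiting factor.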

23
Realtime Ray Tracing Approach II: Ray Tracing on Programmable GPUs
24
Ray Tracing on Programmable GPUs

Graphics Hardware today
GPUs are extremely powerful
Already more transistors than a P4
Full IEEE floating point!
Many, many, many parallel FPUs
Moore's Law: Faster growth than for CPUs
GPUs become more and more programmable
First: Register combiners
Then: Vertex shaders
Programmable per vertex
Linear interpolation between the vertices
Today: Pixel shaders, fragment programs
Fully programmable for each fragment

25
Ray Tracing on Programmable GPUs

GPU programmability today
Full IEEE
SIMD computations
Access to memory (textures) in every instruction
Multiple indirections (pointer chasing) now possible
Dependent texture reads
Still: Several restrictions
Conditionals, loops, recursion, dependent texture writes
Typically programmed in GPU assembler
Most recent: High-level meta languages
E.g. Cg (C for Graphics)

26
Streaming Computations on Programmable GPUs

Idea: Use the GPU as a streaming co-processor
Don't use it for rasterizing at all
Pixels form a stream of elements
Apply a small program (kernel) to the whole stream
Render a screen-aligned quad with a fragment shader
Fragment program executed for each screen pixel
Each pixel operates on different data
Read data from textures
Screen-aligned textures: 1 texel for each pixel
Output to framebuffer: 1 pixel for each fragment program
Feedback loop: Copy framebuffer to textures
Future: Directly write into textures
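The streaming model described above can be emulated on the CPU to make the data flow explicit. In this sketch, nested Python lists stand in for screen-aligned textures and the framebuffer; `run_pass` and `step` are hypothetical names, and the toy kernel (advancing a value by a per-pixel increment) is only a placeholder for a real ray tracing kernel.

```python
def run_pass(kernel, textures):
    """Emulate rendering one screen-aligned quad.

    The kernel runs once per pixel, reads only the texels at its own
    position, and produces exactly one output value.
    """
    height, width = len(textures[0]), len(textures[0][0])
    framebuffer = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            texels = [tex[y][x] for tex in textures]
            framebuffer[y][x] = kernel(*texels)
    return framebuffer

def step(position, velocity):
    # Toy kernel: advance a per-pixel value by a per-pixel increment.
    return position + velocity

positions = [[0.0, 1.0], [2.0, 3.0]]
velocities = [[1.0, 1.0], [0.5, 0.5]]
for _ in range(3):
    # Feedback loop: the framebuffer becomes an input texture of the next pass.
    positions = run_pass(step, [positions, velocities])
print(positions)
```

The restriction that each kernel invocation writes one value to its own pixel is what later forces multiple passes and the copy-back step.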

27
Ray Tracing on Programmable GPUs
[Diagram: a screen-aligned quad produces fragments; the kernel (fragment shader) reads data (texels) from memory (textures) and writes its output to the frame buffer]
36
Ray Tracing on Programmable GPUs
[Diagram: a screen-aligned quad produces fragments; the kernel (fragment shader) reads data (texels) from memory (textures) and writes its output to the frame buffer; feedback from the frame buffer to the textures]
37
Ray Tracing on Programmable GPUs

Mapping Ray Tracing to the GPU
Use textures for storing the variables
Ray origin and direction: 2D textures (3 floats each)
Hit: 2D texture (3 floats: u, v, id)
Vertices: 1D texture of vertex positions (3 floats each)
Triangles: 1D texture of vertex ids (1 float each)
Acceleration structure: e.g. 3D texture for a simple grid
Multiple indirections are no problem
E.g. use triangle[i] as texture coordinate into the vertex texture
Up to 4 indirections (grid => triangle list => triangle => vertex)
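The chain of indirections listed above can be sketched with plain Python containers standing in for textures. The tiny scene, the `(offset, count)` grid cell layout and all names are hypothetical; only the idea of storing ids as floats and chasing them from grid cell to vertex positions comes from the slide.

```python
# "Textures" modelled as plain Python containers (hypothetical toy scene).
vertex_tex = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
triangle_tex = [0.0, 1.0, 2.0,  1.0, 3.0, 2.0]   # vertex ids, stored as floats
triangle_list_tex = [0.0, 1.0]                   # triangle ids referenced by cells
grid_tex = {(0, 0, 0): (0, 2)}   # cell -> (offset, count) into triangle_list_tex

def fetch_index(texture, i):
    """Read an id that is stored as a float; round to guard against float error."""
    return int(round(texture[i]))

def triangles_in_cell(cell):
    offset, count = grid_tex.get(cell, (0, 0))            # 1: grid cell
    result = []
    for k in range(offset, offset + count):
        tri = fetch_index(triangle_list_tex, k)           # 2: triangle list
        ids = [fetch_index(triangle_tex, 3 * tri + j)     # 3: triangle
               for j in range(3)]
        result.append([vertex_tex[i] for i in ids])       # 4: vertex positions
    return result

print(triangles_in_cell((0, 0, 0)))
```

The explicit rounding in `fetch_index` hints at the accuracy issue that arises when integer ids have to be stored in floating-point textures.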

38
Ray Tracing on Programmable GPUs

Write kernels for the different ray tracing ops
Ray generation
Get pixel position from texture coordinates
Somehow get camera settings (e.g. from quad color, or texture)
Compute corresponding ray
Write to origin, direction, and state textures
Triangle intersection
Read the ID of the triangle to be intersected from the state
Get triangle vertices from textures
Intersect
Update state texture
Similar for traversal, triangle list intersection, shading, ...
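The four steps of the triangle intersection kernel can be sketched for a single ray as follows. The slide does not say which intersection test is used; the sketch swaps in the well-known Möller-Trumbore test, stores ids as integers for brevity, and uses a hypothetical `state` dictionary in place of the state and hit textures.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_kernel(origin, direction, state, vertex_tex, triangle_tex, eps=1e-9):
    """One intersection step for one ray; returns the updated state.

    state: {'tri': id of the triangle to test, 'hit': (t, u, v, id) or None}
    """
    tri = state["tri"]                                   # read triangle id
    v0, v1, v2 = (vertex_tex[triangle_tex[3 * tri + k]]  # fetch its vertices
                  for k in range(3))
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:                # ray is parallel to the triangle plane
        return state
    s = sub(origin, v0)
    u = dot(s, p) / det
    q = cross(s, e1)
    v = dot(direction, q) / det
    t = dot(e2, q) / det
    if u < 0 or v < 0 or u + v > 1 or t <= eps:
        return state                  # miss: state stays unchanged
    best = state["hit"]
    if best is None or t < best[0]:   # keep only the closest hit
        return {**state, "hit": (t, u, v, tri)}
    return state
```

The stored hit carries the distance `t` in addition to the `u, v, id` triple named on the earlier slide, because the kernel needs it to decide whether a new hit is closer than the previous one.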

39
Ray Tracing on Programmable GPUs

Have kernels for ray generation, traversal,
intersection, etc.
Each ray is in exactly one state
E.g. in the "intersection" state
Make sure only rays in the correct state are processed
E.g. apply the intersection kernel only to rays in the "intersect" state
Usual GL masking methods, e.g. stencil bits, early pixel kill, etc.
=> Can generate overhead, but usually OK
Fragment program can change the state of a ray
E.g. change from "traversal" to "intersection" in a non-empty voxel
Combine different kernels by just calling them in turn
E.g. rendering an "intersection quad" will do one intersection step
(but only for rays in the "intersect" state!)
Secondary rays are relatively easy for the shader kernel
Update origin and direction textures, go back to the traversal state
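The per-ray state machine can be sketched as a loop that calls each kernel in turn and masks out rays in the wrong state. Everything here is a hypothetical stand-in: the voxel lists replace a real grid traversal, the string `"hit"` replaces a real intersection test, and the `if` in `masked_pass` plays the role of stencil bits or early pixel kill.

```python
TRAVERSE, INTERSECT, SHADE, DONE = "traverse", "intersect", "shade", "done"

def traverse_step(ray):
    # Stand-in traversal: advance to the next voxel along the ray.
    ray["voxel"] += 1
    if ray["voxel"] >= len(ray["voxels"]):
        ray["state"] = SHADE              # left the grid without a hit
    elif ray["voxels"][ray["voxel"]]:
        ray["state"] = INTERSECT          # non-empty voxel

def intersect_step(ray):
    # Stand-in intersection: the voxel content says whether the ray hits.
    hit = ray["voxels"][ray["voxel"]] == "hit"
    ray["state"] = SHADE if hit else TRAVERSE

def shade_step(ray):
    ray["state"] = DONE

def masked_pass(kernel, required_state, rays):
    """Emulates rendering one quad: only rays in the right state are touched."""
    for ray in rays:
        if ray["state"] == required_state:
            kernel(ray)

def trace(rays):
    passes = 0
    while any(r["state"] != DONE for r in rays):
        masked_pass(traverse_step, TRAVERSE, rays)
        masked_pass(intersect_step, INTERSECT, rays)
        masked_pass(shade_step, SHADE, rays)
        passes += 3
    return passes

rays = [
    {"state": TRAVERSE, "voxel": -1, "voxels": [None, "miss", "hit"]},
    {"state": TRAVERSE, "voxel": -1, "voxels": [None, None]},
]
print(trace(rays), [r["state"] for r in rays])
```

Every pass runs over all rays even when only a few are in the required state, which is the overhead the slide refers to.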

40
Ray Tracing on Programmable GPUs

Results
Easy to exploit parallelism in the GPU
Many more pixels than fragment pipelines
Comparable performance to a single CPU
Even though it's only a prototype implementation
Limited by the fragment pipeline very soon
Main limitations
Fragment processing speed
Texture memory
Need many textures for each pixel
Also need to store the whole scene in textures
Bandwidth
Number of different states must be small !

41
Ray Tracing on Programmable GPUs

Additional limitations of current GPUs
Bandwidth problems due to missing loops
Often have to write data just to save it for the next iteration
Overhead due to missing write capability
Accuracy problems: no ints, all floats
E.g. rounding modes when reading IDs from a texture
Problems due to missing dependent writes
Many textures for input, but only one framebuffer for output
Need multiple passes when computing more than 3 values per pixel
Each fragment shader writes to exactly one predetermined position
Hard to do recursive operations with that limitation
Kd-tree construction ?

42
Ray Tracing on Programmable GPUs

Ray tracing on GPUs in the future ?
Many limitations will (probably) change
Loops, branches, dependent writes, int textures, texture memory, early pixel kill
Performance will increase faster than for CPUs
=> Might soon be faster than, and similarly flexible as, ray tracing on a CPU!

43
Realtime Ray Tracing Approach III: Dedicated Ray Tracing Hardware
44
Dedicated Ray Tracing Hardware

Relatively low efficiency when using a GPU for RT
Many units not needed at all (rasterization, z-buffer, clipping, lighting, ...)
Lots of overhead
Programmable units can never be as efficient as dedicated HW
Dedicated ray tracing HW should be more efficient
Building RT HW is feasible today
FPU power not a problem any more
(see GeForce 3 FPU performance)
Die size/number of transistors not a problem any more
Main problem: Off-chip bandwidth!
Already between chip and cache

45
Dedicated Ray Tracing Hardware

Bandwidth: Same problem as in SW
Approach in SW: Bandwidth reduction by Coherent Ray Tracing (packet traversal)
HW: Much larger packets (64x64 vs. 2x2!)
Much bigger bandwidth savings
Target: realtime full-screen resolutions
Larger packet sizes not a problem => Lots of coherence
Avoiding overhead is simple in HW
Much simpler than with SSE

46
SaarCOR Architecture

Features
Based on the interactive software ray tracer
Exactly the same data structures, ...
Kd-trees as acceleration structure
Packets of rays to reduce bandwidth
Fixed OpenGL-like shading
plus shadow and reflection rays
Goals
Simple low-bandwidth memory interface
Half the floating point requirements of the GeForce 3
Achieves frame rates comparable to today's graphics cards

47
SaarCOR Architecture
System overview
48
SaarCOR Architecture
Features

Scalable
Fully pipelined
Multithreading for latency hiding
Simple communication pattern (no routing)
Highly asynchronous

49
SaarCOR Current Status

Simulation on register-transfer level
Core @ 533 MHz, memory 64 bit @ 133 MHz (simple SD-RAM, no DDR!)
Each pipeline uses 36 FP units
Standard SaarCOR
4 pipelines
16 threads per pipe
1 GB/s bandwidth to memory (!)
272 KB for caches (!)
Four pipes = 1/2 the FP resources of a GeForce 3
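The headline numbers of the standard configuration follow directly from the figures on this slide. The sketch below assumes one 64-bit transfer per memory clock (plausible for plain SDRAM without DDR, but an assumption); the variable names are illustrative.

```python
memory_bus_bits = 64
memory_clock_hz = 133e6          # assumption: one transfer per clock (no DDR)
peak_bandwidth = memory_bus_bits / 8 * memory_clock_hz   # bytes per second

pipelines = 4
fp_units_per_pipeline = 36
threads_per_pipeline = 16
total_fp_units = pipelines * fp_units_per_pipeline
total_threads = pipelines * threads_per_pipeline

print(f"{peak_bandwidth / 1e9:.3f} GB/s, "
      f"{total_fp_units} FP units, {total_threads} threads")
```

Under that assumption the peak memory bandwidth comes out just above 1 GB/s, matching the figure quoted for the standard SaarCOR.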

50
Issues

On-chip memory of standard SaarCOR
Caches: 272 KB
RF for rays: 288 KB
RF for stack: 535 KB
Register-level simulations only
Simple shading only

51
Benchmarks: Scenes

OpenGL-like shading
No shadow rays
No reflection rays
Full-screen resolution
1024 x 768 pixels

52
Benchmarks: Scenes (2)
53
Benchmarks: Results
Today's CPUs: 0.5-0.8 Mrays/s => factor of 100-200!
54
Benchmarks: Results (3)

Efficiency of standard SaarCOR

16 threads => 32 threads: 10%
Performance scales with the number of pipelines, threads, cache size and bandwidth.
55
What about shading?

Right now: Shading only coarsely approximated
Fixed Phong shader with bilinear texturing
Programmable shading is currently being evaluated
Shading packets of rays exploits coherence
BQD scene with bilinear textures
14 MB of shading data per frame
300-600 MB/s bandwidth
Shading BW Ray Tracing BW

56
Conclusions

SaarCOR architecture
Scales well in the number of pipelines
Highly efficient
Uses half the FP power of a GeForce 3
Requires very low bandwidth
Provides full-featured ray tracing
Same frame rates as today's graphics cards

57
Current Work

Programmable shading
API: OpenRT [Wald02]
Virtual memory management
Incorporate features and algorithms from the SW system
Large models [Wald01]
Dynamic scenes [Wald02]
Global illumination [Wald02]
Building a prototype

58
Realtime Ray Tracing Approaches I-III: Summary and Conclusions
60
Realtime Ray Tracing

Summary
Different upcoming (and competing!) architectures
All of these have different advantages/disadvantages
PC clusters: most flexible, but not useful for the consumer market
GPUs: better performance growth, cheap, but awkward to use
HW: best performance, best efficiency, but costly
=> Cannot yet predict which one will win
But:
The question is not "Will realtime ray tracing ever come?"
The question rather is how and when it will come.

61
End of Part I - Questions ?
62
Part II: Advanced Ray Tracing Issues
63
Advanced Ray Tracing Issues

Conclusion from Part I: Realtime ray tracing will come
Problem: All these architectures mostly focus only on the core ray tracing algorithms, i.e. traversal and intersection
Ubiquitous realtime ray tracing opens new problems
Dynamic scenes?
Suitable API(s)?
Implications for future applications / scene graph libraries?

64
Interactive Ray Tracing

So far
Interactive RT is possible even today, can already beat SGI/NVidia
Complex models
High-quality applications
Can do high-quality, interactive walkthroughs
But: A walkthrough is not really interactive
Not if the scene remains static

65
Issue I: Dynamic Scenes

Fact: Ray tracing needs an acceleration structure
Building it is very costly
Precomputation only works for static scenes
But: Real scenes usually aren't static
=> What is "interactive" if I cannot interact with it?
Problem: Little research on this topic
Just wasn't interesting before interactive ray tracing
Previous work: Usually on special cases
Utah: Hack: Keep dynamic objects out of the acceleration structure
Reinhard [RW2001]: Incremental updates of a uniform grid
Costly, not hierarchical
Moeller [EG2001]: Only rigid-body animation

67
Handling Dynamic Scenes

Different kinds of dynamic behavior
Hierarchical, rigid-body motion vs. unstructured motion
Constrained unstructured motion (e.g. maximum displacement)
All triangles animated vs. few triangles animated
Amortized over many rays/frames or over few rays
Inherently different problems need different solutions
One single algorithm will hardly do the job

68
Handling Dynamic Scenes

Alternative approach
Offer a suite of different techniques
Hierarchical animation of whole objects
Fast rebuild of objects for unstructured motion (with sacrifices in traversal speed)
High-quality BSPs for often-used static objects (with relatively long rebuild time)
Let the application decide which one is best for what!
If anybody knows what's best, it's the application programmer
Just like OpenGL: Applications build display lists, not the drivers!
Allow combination of techniques
E.g. some unstructured motion but otherwise hierarchically animated
=> App needs a good API to do that!

69
Handling Dynamic Scenes

Combining techniques in a hierarchical way
Application groups geometry into objects
Similar to building display lists (=> API)
Each object has a separate BSP (just like PowerPlant)
Hints can be given to control the quality/speed tradeoff
E.g. whether the object will be static or unstructured
Objects can be instantiated
Just like calling a display list (=> API)
Hierarchical animation: Just re-instantiate with a new transform
Objects are kept in an additional hierarchy level
With a separate, fast and high-quality BSP
During traversal, just transform the rays when they hit an object
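The last point, transforming rays instead of geometry, can be sketched with a toy two-level scene. This is an illustration under simplifying assumptions rather than the actual system: the transform is a pure translation, the "object" is a unit sphere instead of a triangle mesh with its own BSP, the top level is a linear list instead of a BSP over instances, and all names are hypothetical.

```python
def ray_sphere(origin, direction, radius=1.0):
    """Object-space geometry: sphere at the origin; returns nearest t or None."""
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    t = (-b - disc ** 0.5) / (2.0 * a)
    return t if t > 0.0 else None

class Instance:
    """Shared object geometry plus a per-instance translation."""
    def __init__(self, intersect_object, translation):
        self.intersect_object = intersect_object
        self.translation = translation

    def intersect(self, origin, direction):
        # Move the ray into object space; the object itself is never rebuilt.
        local_origin = tuple(o - t for o, t in zip(origin, self.translation))
        return self.intersect_object(local_origin, direction)

def intersect_scene(instances, origin, direction):
    """Top level: return (t, instance index) of the closest hit, or None."""
    best = None
    for index, instance in enumerate(instances):
        t = instance.intersect(origin, direction)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best

scene = [Instance(ray_sphere, (0.0, 0.0, 5.0)),
         Instance(ray_sphere, (0.0, 0.0, 10.0))]
first = intersect_scene(scene, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))

# Hierarchical animation: only the transform changes.
scene[0].translation = (3.0, 0.0, 5.0)
second = intersect_scene(scene, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(first, second)
```

Both instances share the same object-space geometry, which is why instantiation costs almost nothing in memory and why moving an instance only requires updating its transform and the top-level structure.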

70
Handling Dynamic Scenes: Results

Side effect: Instantiation is for free
Terrain: 1000 instances of a 20 ktri tree = 20 Mtri (and dynamic!)
Sunflowers: 36,000 x 24 ktri sunflowers, about 1 GigaTri (dynamic!)
Top-level BSP reconstruction is tolerable
Some milliseconds even for a few thousand objects
But: scalability bottleneck (redundant computation on each client)
Hierarchical animation is cheap
Transformations are cheap (compared with the rest)
But: Unstructured motion is still costly
Especially for big objects (=> have to use low(er)-quality BSPs)
High bandwidth requirements for sending data over the network!
Tolerable for moderately complex objects (16k-64k tri)
In practice: Total overhead usually 10-20%
71
Handling Dynamic Scenes: Conclusions

Works for many different scenes (BART benchmark suite)
Robots: Game-like scene, hierarchical animation of 161 objects
Kitchen: Mostly static, with many secondary effects
Museum: Completely unstructured motion
Correct (inter-)reflections, shadows, etc., also on moving triangles!
Also works for all applications we have built so far
OpenRT-based VRML97 viewer with VRML animations
Inventor port under way
Dynamic scenes in the Interactive Global Illumination application

72
Handling Dynamic Scenes: Results
73
Handling Dynamic Scenes: Results
Video
74
Handling Dynamic Scenes: Remaining Problems

Lots of potential for future research !
Faster kd-tree generation ?
Kd-tree generation in HW ?
On-demand generation of kd-trees ?
More efficient solutions for special problems
Skinning, morphing, progressive meshes, ...

76
Issue II: API Issues

So far
Fast, cheap, efficient, ...
Flexible, powerful shading
Can do big models and dynamic scenes, ...
So why is nobody using it?

Because without a proper API, you can't!
77
Issue II: API Issues

Why do we need an API for interactive ray tracing?
Side effect: An API helps to divide and conquer problems (e.g. shaders, global illumination, ray tracing kernels, ...)
E.g., can work separately on frontend and backend
Can abstract from dynamic scene issues in the global illumination shader, and so on
It helps to create a critical mass of users
Rasterization only really took off after OpenGL
Enables code portability
Without an API, nobody will (or can) use it, except insiders
Not everybody has their own realtime ray tracer
Not everybody wants to (or should) know all implementation details
=> For widespread realtime ray tracing, we do need an API

78
Issue II: API Issues

Problem: There are no suitable APIs
The API has to support both "interactive" and "ray tracing"
OpenGL: interactive, but not suitable for ray tracing
RenderMan/Rayshade/POV-Ray: ray tracing capable, but inherently offline
=> Need to find new API(s)

79
Issue II: API Issues

Goals for an interactive ray tracing API
As easy to learn and use as (standard) OpenGL
Leverage existing programmers' experience with OpenGL
As powerful in shading as RenderMan
Our approach (OpenRT): Combine the best of both
Application API much like OpenGL/GLUT
With the necessary modifications for ray tracing (shaders, objects)
Shader API like RenderMan

80
The OpenRT Interactive Ray Tracing API

Application API very OpenGL-like
Geometry: rtVertex3f, rtNormal3f, ...
Primitives: rtBegin/End(RT_TRIANGLES, RT_QUAD, ...)
Transformations: rtPushMatrix(), rtMatrixMode(), ...
Geometry objects
Just like display lists (except no side effects)
rtNewObjects(), rtBeginObject(), rtEndObject(), rtInstantiate(), ...
Shader objects
Surface, light, and pixel shaders, exchangeable
Renderer object
Even supports GLUT-like functionality
Porting GL/GLUT applications is relatively easy (except multi-pass, of course, ...)

81
The OpenRT Interactive Ray Tracing API

Shader Objects
Similar to the Stanford Programmable Shading API
Dynamically loaded from DLLs/.so files
rtShaderFile(), rtCreateShader(), rtBindShader()
Light shaders: rtCreateShader(), rtUseLight()
Application-to-shader communication via shader parameters
rtDeclareParam(), rtParameterHandle(), rtParameter3f(), ...
Parameters can be per vertex, per triangle, per shader, ...
Retained-mode / frame semantics
Rendering uses the shader parameters active at the end of the frame
NOT those at the time the shader/triangle was created
Actual rendering is triggered at rtSwapBuffers
Rendering is always done asynchronously
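The frame semantics described above differ from immediate-mode rendering and are easy to get wrong, so a toy model may help. The class below is purely hypothetical and is not the OpenRT API: `ToyRenderer`, `set_param`, `add_triangle` and `swap_buffers` are made-up names, and rendering is synchronous here for simplicity, whereas the slide states that the real system renders asynchronously.

```python
class ToyRenderer:
    """Hypothetical toy renderer illustrating frame semantics (not OpenRT)."""

    def __init__(self):
        self.params = {}       # current shader parameter values
        self.triangles = []    # each triangle only remembers its shader
        self.frames = []

    def set_param(self, shader, name, value):
        self.params[(shader, name)] = value

    def add_triangle(self, shader):
        self.triangles.append(shader)

    def swap_buffers(self):
        # Rendering is triggered here and sees the parameters as they are
        # now, not as they were when each triangle was created.
        frame = [self.params.get((shader, "color")) for shader in self.triangles]
        self.frames.append(frame)
        return frame

renderer = ToyRenderer()
renderer.set_param("matte", "color", "red")
renderer.add_triangle("matte")                 # created while the color is red
renderer.set_param("matte", "color", "blue")   # changed before the frame ends
print(renderer.swap_buffers())
```

In this model a triangle created while the parameter was "red" is still rendered "blue", because only the value active when the frame is swapped matters.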

82
The OpenRT Interactive Ray Tracing API

Shader API: Or how to write a shader
Declare and export shader parameters
Store as member variables
Write callback functions
Shade(), Illuminate(), ...
Access scene data with a RenderMan-like API
Geometry: rtsShadingNormal(), ...
Lights: rtsIlluminate(), rtsOccluded(), rtsLightTransparency(), ...
Shoot arbitrary secondary rays
rtsTrace()
=> Porting RenderMan shaders is relatively easy, too

83
The OpenRT Interactive Ray Tracing API

OpenRT Summary
Fast and Interactive Rendering
Dynamic Scenes
Very Powerful Shading
API for using it
OpenRT is a complete 3D Rendering Engine
Kernel behind OpenRT: Saarland RTRT
Might be changed to e.g. SaarCOR as soon as
available

84
OpenRT Example 1: VRML97 @ OpenRT

Example 1: VRML97 viewer ported from OpenGL
Porting relatively easy, almost all functionality was there
Only modification: Have to gather small objects into fewer, bigger objects for performance reasons
Results
Can render all of VRML97
Almost no matter how big
Can put any kind of shader on any triangle (e.g. global illumination)
Can do VRML animations, move objects, edit shaders and lights

Car headlight, 800,000 tris
Soda Hall floor, 400,000 tris
85
OpenRT Example 2: The BART Benchmark

Example 2: The BART benchmark scenes
To our knowledge, the only system so far to render those at all
All different kinds of dynamic behavior, including reflections, refractions, shadows, ...
With GL shader: > 10 frames per second
With ray tracing shader: 2-5 frames per second

86
OpenRT Example 3: Complex Outdoor Scene

Example 3: Massive instantiation for outdoor scenes
Pixel-accurate shadows!

87
OpenRT Example 3: Complex Outdoor Scene
88
OpenRT Example 4: Massive Model Visualization

Example 4: The PowerPlant
12.5-37.5 million triangles
Currently: With replication, without demand-loading/reordering
Just recently: Can now also move the furnace :-)

89
OpenRT Example 5: Complex Shading Stress Test
90
OpenRT Example 5: Complex Shading Stress Test

Example 5: Shading stress test
Volume shader (CT head)
Applied to a box of geometry
Lightfield shader on a simple quad
Procedural wood and marble
Procedural bump mapping on a mirror
=> Procedurally bump-mapped reflections
Result: Everything combines perfectly
Transparent shadow from the volume on the procedural wood shader
Lightfield reflected in the procedurally bump-mapped mirror, attenuated by the semi-transparent volume
Multiple interreflections
Of course, everything is interactive and fully dynamic

91
OpenRT Example 5: Complex Shading Stress Test
92
OpenRT Example 6: Interactive Global Illumination
Implementation: Not now
93
OpenRT Example 6: Interactive Global Illumination

Fully implemented in OpenRT
The global illumination application is a shader like any other
Automatically inherits the capability for handling dynamic scenes, distribution, ...
Same frontend as e.g. BART/Office
Automatically inherits parser, user interface, etc.
Can be used from different applications (e.g. VRML viewer)
Algorithms and implementation: Later (Part III)

94
Questions ?

For more info, also visit
http://www.OpenRT.de

95
Part III: New Applications enabled by Realtime Ray Tracing
96
For more information on OpenRT, see http://www.OpenRT.de
97
The Saarland Interactive Ray Tracing Project