Afrigraph 2003 Course on Advanced Interactive Ray Tracing and Interactive Global Illumination

Provided by: ingowaldin
1
Afrigraph 2003 Course on Advanced Interactive
Ray Tracing and Interactive Global Illumination
  • Ingo Wald, Carsten Benthin, Philipp Slusallek
  • Saarland University

2
First: What is Ray Tracing?
Ray-Generation
Ray-Traversal
Intersection
Shading
Framebuffer
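The five stages named on this slide can be sketched as one small loop. This is only an illustrative toy, not the Saarland engine: the pinhole camera, the sphere primitive, the brute-force "traversal" without any acceleration structure, and all function names are assumptions made for the example.

```python
import math

def generate_ray(x, y, width, height):
    """Ray generation: pinhole camera at the origin looking down -z."""
    u = (x + 0.5) / width * 2.0 - 1.0
    v = 1.0 - (y + 0.5) / height * 2.0
    d = (u, v, -1.0)
    n = math.sqrt(sum(c * c for c in d))
    return (0.0, 0.0, 0.0), tuple(c / n for c in d)

def intersect_sphere(origin, direction, center, radius):
    """Intersection: nearest positive hit distance, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(a * d for a, d in zip(oc, direction))
    c = sum(a * a for a in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None

def trace(origin, direction, spheres):
    """Traversal: brute-force loop over all objects (no kd-tree here)."""
    best = None
    for center, radius, color in spheres:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, color)
    return best

def shade(hit):
    """Shading: flat object color, black background."""
    return hit[1] if hit else (0, 0, 0)

def render(width, height, spheres):
    """Run all stages per pixel and write the result to the framebuffer."""
    framebuffer = [[(0, 0, 0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            origin, direction = generate_ray(x, y, width, height)
            framebuffer[y][x] = shade(trace(origin, direction, spheres))
    return framebuffer

scene = [((0.0, 0.0, -3.0), 1.0, (255, 0, 0))]  # one red sphere
image = render(8, 8, scene)
```

A real interactive system would replace the brute-force loop in `trace` with a hierarchical acceleration structure, which is where the later slides spend most of their effort.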
3
Agenda
  • Introduction and Motivation
  • Why Interactive Ray Tracing at all?
  • Part I: Interactive Ray Tracing Architectures
  • Software Ray Tracing
  • Ray Tracing on Programmable GPUs
  • Dedicated Ray Tracing Hardware
  • Part II: Advanced Ray Tracing Issues
  • Handling Dynamic Scenes
  • The OpenRT Interactive Ray Tracing API
  • Part III: New Applications
  • Industrial Application: Interactive Visualization
    of Car Headlights
  • Interactive Global Illumination
  • Summary and Conclusions

4
Why Interactive Ray Tracing?
5
We have NVidia, so what do we need Ray Tracing
for?
  • Because it is high quality
  • Fully Programmable and Arbitrary Shading
    Operations
  • All operations performed in floating point
  • Flexibility: Can shoot arbitrary rays
  • Shadows, reflections, refractions, ...
  • Even suitable for global illumination
  • Simple Programming Model
  • No need for multiple passes or OpenGL tricks
  • For indirect effects (like shadows), just shoot a
    ray!
  • Automatic correctness
  • No need for approximations (like reflection maps)
  • → Ray Tracing is a much more flexible and powerful
    rendering algorithm than classical triangle
    rasterization

6
We have NVidia, so what do we need Ray Tracing
for?
  • But not only that: It's also efficient!
  • Logarithmic in scene complexity
  • Useful for increasingly complex scenes (1 Mtri,
    no problem!)
  • No multiple rendering passes
  • Automatic Visibility Culling and Occlusion
    Culling
  • Hidden geometry not even touched
  • Depth complexity not an issue
  • No overdraw, shading performed exactly once per
    ray
  • Very useful for increasingly costly shading
  • Small bandwidth requirements (if you do it
    right)
  • Memory access coherence + culling + single
    shading

7
We have NVidia, so what do we need Ray Tracing
for?
  • To summarize:
  • It's highly flexible
  • It's high-quality
  • It's efficient
  • And: All of that combines automatically
  • Can do some of that sometimes in HW, but usually
    not all together

8
If it's so good, then why isn't it real?
  • 1.) Better asymptotic complexity, but huge
    constants
  • 1 ray: 1000 CPU cycles
  • Runs on hardware that it doesn't really fit
  • Uses only a tiny fraction of today's CPUs, no
    parallelism, ...
  • Need many rays/sec for full interactivity
  • 1 Mpix/frame × 4-fold antialiasing × 25
    frames/sec × 10 rays/pixel → one billion rays
    per second
  • 2.) Graphics users don't have the choice
  • Rasterization has highly sophisticated HW
    implementations
  • → HW technology for rasterization is 10 years ahead
    of RT HW
  • There is no interactive ray tracing chip (yet),
    no matter the cost
  • All applications are designed for OpenGL
  • → There is no market for interactive ray tracing
    (really?)
  • Still more money/time/effort spent on improving
    rasterization
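The ray budget quoted on this slide is a plain product of four factors. The sketch below redoes that arithmetic and, combined with the slide's figure of 1000 cycles per ray, estimates how many CPUs it would take. The 1.5 GHz clock rate is an assumption chosen for illustration, not a number from the slides.

```python
pixels_per_frame = 1_000_000   # 1 Mpix/frame
samples_per_pixel = 4          # 4-fold antialiasing
frames_per_second = 25
rays_per_sample = 10           # "10 rays/pixel" on the slide

rays_per_second = (pixels_per_frame * samples_per_pixel
                   * frames_per_second * rays_per_sample)

cycles_per_ray = 1000          # figure from the slide
cycles_per_second = rays_per_second * cycles_per_ray

assumed_clock_hz = 1.5e9       # assumption: one CPU at roughly 1.5 GHz
cpus_needed = cycles_per_second / assumed_clock_hz

print(rays_per_second)         # one billion rays per second
print(round(cpus_needed))      # several hundred CPUs under these assumptions
```

Under these assumptions a single CPU falls short by almost three orders of magnitude, which is the gap the three architectures in Part I try to close.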

9
Why is there no Ray Tracing Hardware?
  • Because graphics hardware evolved 20 years ago!
  • And rasterization was the better choice back
    then
  • Small scenes
  • → (Asymptotic) complexity doesn't matter for
    small N
  • Large triangles
  • Coherence: incremental ops, interpolation, low
    bandwidth
  • Simple (integer) operations, highly pipelined
  • FPU requirements of ray tracing unthinkable 10
    years ago
  • No fragment ops except interpolation
  • Programmability not an issue
  • → Very deep pipelines: no dependencies, no
    branches, no nothing, ...
  • Can be built in HW: very efficient, very fast,
    very cheap
  • Note: All of this is changing today!
  • E.g. today, the GeForce 3 already has more FPU
    power than any CPU

10
Today's State of the Art in Realtime Ray Tracing
  • Software implementations are slowly becoming
    available
  • Michael Muuss, Army Research Labs
  • Huge cluster of SGI machines
  • Parker et al., University of Utah
  • 32-128 CPU SGI Origin
  • Saarland University
  • 4 dual PIIIs in 2000, up to 24 dual Athlon 1800
    today
  • Hardware architectures are already being
    designed
  • SaarCOR (Schmittler et al., HWWS 2002)
  • Ray Tracing on Programmable GPUs (Purcell,
    SIGGRAPH 2002)
  • Hybrid Software/GPU system (Hart, HWWS 2002)
  • Several alternatives for future realtime ray
    tracing
  • Can't yet decide which is best, only know: It'll
    come

11
Today's State of the Art in Realtime Ray Tracing
  • Even today, interactive ray tracing (IRT) solves
    tasks that even high-end graphics hardware still
    cannot handle!
  • Highly complex models (Muuss, Utah, Saarland
    [RW2001])
  • High-quality Isosurface and Volume Visualization
    (Utah)
  • Shadows, reflections, arbitrary shading
    (Saarland, Utah)
  • High-quality reflection simulation of car
    headlights [PGV2002]
  • Interactive Global Illumination [RW2002]

12
Today's State of the Art: Some Snapshots
13
Video
14
Part I: Different Approaches to Realtime Ray
Tracing
15
Different Approaches to Realtime Ray Tracing
  • Basically three choices
  • Pure Software Implementations
  • Today: Highly parallel
  • Shared Memory (Utah), or PC Clusters (Saarland)
  • Future: Single PC?
  • Moore's Law also holds for CPUs!
  • Perhaps with streaming co-processors (e.g.
    SSE)
  • Mixed SW/HW: RT on Programmable GPUs
  • Purcell et al., Stanford
  • Converges to the co-processor approach
  • Pure HW
  • Dedicated RT hardware (Schmittler et al.,
    SaarCOR)
  • Summarize all three approaches

16
Alternative I: Software Ray Tracing (exemplified by
the Saarland engine)
17
The OpenRT Interactive Ray Tracing Engine
  • Features of OpenRT
  • Highly efficient implementation of RT kernels
  • On a single Athlon MP 1800 CPU: 500,000 to 1.5
    million rays per second for average models
    (100 ktri to 1 Mtri)
  • Up to the 10 million rps (rays/sec) range (no
    shading, simple scenes)
  • Sophisticated parallelization on cluster of PCs
  • Dynamic load-balancing
  • Using up to 24 dual-Athlon MP 1800 or 25 dual P4
    Xeon 2.4 GHz
  • Dynamically loadable, fully programmable Shaders
  • Arbitrary C-code shading, arbitrary rays
  • RenderMan-like Shading Language
  • Can handle dynamic scenes (later)
  • OpenGL-like API (later)

18
Where does the speed come from?
  • Speed depends on several factors
  • Using fastest available hardware
  • Fast CPUs, and many CPUs
  • Good algorithms: Avoid operations in the first
    place
  • Fast Intersection and Traversal (kd-trees)
  • Minimize intersections and traversal steps with
    high-quality BSPs
  • Just as important: Make sure you're using your
    silicon correctly!
  • Highly efficient implementation
  • Machine-dependent code, if necessary (SSE)

19
Where does the speed come from?
  • Keep the computational units busy!
  • Make sure the CPU doesn't stall
  • Avoiding pipeline stalls has top priority
  • Look at memory, caches and bandwidth!
  • Example: A cache miss during triangle intersection
    costs about 4 times as much as the computations
    themselves!
  • Packing, aligning, cache-friendly data layout,
    prefetching, ...
  • But no details here
  • Already covered that at Afrigraph 2001
  • It's not one single method, it's more a principle

20
Distributed Ray Tracing
  • One CPU still not fast enough
  • 1 Mray/sec is fast, but not enough
  • Need more CPUs → Clusters are cheap (20k-50k)
  • Many approaches
  • Static vs. dynamic load balancing
  • Object-space vs. image-space vs. ray-based task
    partitioning, ...
  • Pixel-interleaved (load balancing) vs. tiles
    (coherence)
  • Problem: Interactivity constraint
  • Have to finish whole frame in 1/10th of a second
  • Little time for sophisticated reordering/scheduling

21
Distributed Ray Tracing
  • Our approach (mostly Carsten Benthin)
  • Image-based task partitioning
  • → Break image up into tiles (usually 16x16 or
    32x32)
  • Since API: Can dynamically change task
    partitioning scheme
  • Strongly varying workload
  • → Need dynamic load balancing: Let clients ask
    for work
  • Have to care about network latencies
  • (10 ms network latency = 10,000 rays!)
  • Highly efficient networking/communication code
  • Double-buffering, prefetching, packing,
    streaming, asynchronous sending and rendering,
    interleaving of different tasks, multithreading, ...
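The "let clients ask for work" scheme above can be sketched with a shared queue of image tiles. This is a single-machine toy under stated assumptions: threads stand in for cluster clients, `render_tile` is a hypothetical placeholder for the real ray tracer, and none of the networking optimizations from the slide are modeled.

```python
import queue
import threading

def make_tiles(width, height, tile):
    """Image-based task partitioning: split the frame into tiles."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def render_tile(t):
    """Hypothetical stand-in for ray tracing one tile."""
    x0, y0, w, h = t
    return [[(x * 31 + y * 17) % 256 for x in range(x0, x0 + w)]
            for y in range(y0, y0 + h)]

def client(work, results):
    """A client keeps asking for the next tile until none are left."""
    while True:
        try:
            t = work.get_nowait()
        except queue.Empty:
            return
        results.put((t, render_tile(t)))

def render_frame(width, height, tile, n_clients):
    work, results = queue.Queue(), queue.Queue()
    for t in make_tiles(width, height, tile):
        work.put(t)
    threads = [threading.Thread(target=client, args=(work, results))
               for _ in range(n_clients)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    # The server assembles the finished tiles into the frame.
    frame = [[0] * width for _ in range(height)]
    while not results.empty():
        (x0, y0, w, h), pixels = results.get()
        for dy, row in enumerate(pixels):
            frame[y0 + dy][x0:x0 + w] = row
    return frame

frame = render_frame(64, 48, 16, n_clients=4)
```

Because a client only takes a new tile once it has finished the previous one, expensive tiles do not hold up the rest of the frame, which is the point of dynamic over static load balancing.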

22
Distributed Ray Tracing: Results
  • Can efficiently use many CPUs
  • 32x32 tiles at 640x480: 150 tiles → enough for
    many CPUs
  • Usually limiting factor: Pixels/second (not
    rays/sec)
  • Bandwidth limited at server: 640x480 at 10-15
    frames/sec
  • For < 10 fps: Usually achieve 90-99% client
    utilization
  • Client bandwidth usually not an issue (100 Mbit)
  • Rendering complexity helps!
  • More costly tiles: better compute/BW ratio, fewer
    pixels/sec
  • Can use more CPUs without hitting bandwidth limit
  • Doubling rays/pixel easier than doubling
    framerate
  • Framerate scales linearly only up to max
    framerate
  • But always scales linearly in rays/pixel
  • Better networking hardware would definitely help
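The server-side limit of 10-15 frames/sec at 640x480 is roughly what a back-of-the-envelope bandwidth estimate gives. The sketch below assumes uncompressed 24-bit RGB pixels and a 100 Mbit/s link at the server; both are assumptions for illustration, since the slides state neither the pixel format nor the server's link speed.

```python
def max_frames_per_second(width, height, bytes_per_pixel, link_mbit):
    """Upper bound on frame rate if every pixel must cross the link."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return link_mbit * 1e6 / bits_per_frame

# Assumed: 3 bytes per pixel, 100 Mbit/s, no protocol overhead.
fps_limit = max_frames_per_second(640, 480, 3, 100)
print(round(fps_limit, 1))
```

The same function also shows why more rays per pixel scale better than a higher frame rate: extra rays add compute on the clients without adding pixels on the wire.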

23
Realtime Ray Tracing Approach II: Ray Tracing on
Programmable GPUs
24
Ray Tracing on Programmable GPUs
  • Graphics Hardware today
  • GPUs are extremely powerful
  • Already more transistors than a P4
  • Full IEEE floating point!
  • Many, many, many parallel FPUs
  • Moore's Law: Faster growth than for CPUs
  • GPUs become more and more programmable
  • First: Register Combiners
  • Then: Vertex Shaders
  • Programmable per vertex
  • Linear interpolation between the vertices
  • Today: Pixel Shaders, Fragment Programs
  • Fully programmable for each fragment

25
Ray Tracing on Programmable GPUs
  • GPU programmability today
  • Full IEEE floating point
  • SIMD computations
  • Access to memory (textures) in every
    instruction
  • Multiple indirections (pointer chasing) now
    possible
  • Dependent texture reads
  • Still: Several restrictions
  • Conditionals, loops, recursion, dependent texture
    writes
  • Typically programmed in GPU assembler
  • Most recent: High-level meta languages
  • E.g. Cg (C for Graphics)

26
Streaming Computations on Programmable GPUs
  • Idea: Use GPU as streaming co-processor
  • Don't use it for rasterizing at all
  • Pixels form a stream of elements
  • Apply small program (kernel) to whole stream
  • Render screen-aligned quad with a fragment shader
  • Fragment program executed for each screen pixel
  • Each pixel operates on different data
  • Read data from textures
  • Screen-aligned textures: 1 texel for each pixel
  • Output to framebuffer: 1 pixel for each
    fragment program
  • Feedback loop: Copy framebuffer to textures
  • Future: Directly write into textures
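The streaming model above can be mimicked on the CPU to make the data flow concrete. In this sketch, nested lists stand in for screen-aligned textures, `run_kernel` plays the role of rendering a screen-aligned quad, and the `step` kernel is an arbitrary example; all of these names are illustrative assumptions, not GPU API calls.

```python
def run_kernel(kernel, *textures):
    """Run the kernel once per pixel, reading one texel from each input
    texture and producing exactly one output pixel."""
    height, width = len(textures[0]), len(textures[0][0])
    return [[kernel(*(t[y][x] for t in textures)) for x in range(width)]
            for y in range(height)]

def step(position, velocity):
    """Example kernel: each pixel advances its own value."""
    return position + velocity

# Two screen-aligned "textures" of size 4x2.
position = [[0.0] * 4 for _ in range(2)]
velocity = [[float(x) for x in range(4)] for _ in range(2)]

for _ in range(3):
    framebuffer = run_kernel(step, position, velocity)
    position = framebuffer  # feedback loop: framebuffer becomes a texture
```

Every iteration is one full pass over the screen, so state that must survive between passes has to be written out and read back, which is the overhead the later "limitations" slides refer to.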

27
Ray Tracing on Programmable GPUs
Screen aligned Quad
Memory (Textures)
Fragment
Kernel (Fragment Shader)
Data (Texels)
Output
Frame Buffer
36
Ray Tracing on Programmable GPUs
Screen aligned Quad
Memory (Textures)
Fragment
Kernel (Fragment Shader)
Data (Texels)
Output
Frame Buffer
Feedback !
37
Ray Tracing on Programmable GPUs
  • Mapping Ray Tracing to the GPU
  • Use textures for storing the variables
  • Ray origin and direction: 2D textures (3
    floats each)
  • Hit: 2D texture (3 floats: u, v, id)
  • Vertices: 1D texture of vertex positions (3
    floats each)
  • Triangles: 1D texture of vertex ids (1 float
    each)
  • Acceleration structure: e.g. 3D texture for
    simple grid
  • Multiple indirections no problem
  • E.g. use triangle[i] as texture coordinate into
    vertex texture
  • Up to 4 indirections (grid → triangle list →
    triangle → vertex)
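The chain of four indirections can be illustrated with flat arrays standing in for the textures. The concrete layout below (a sentinel-terminated triangle list, three consecutive vertex ids per triangle, a two-cell grid) is an assumption made for this sketch; the slides only name the textures, not their exact encoding.

```python
END = -1  # assumed sentinel that terminates a triangle list

# "Vertex texture": one position per entry.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
# "Triangle texture": three consecutive vertex ids per triangle.
triangles = [0, 1, 2,   1, 3, 2]
# "Triangle list texture": triangle ids per grid cell, END-terminated.
triangle_lists = [0, 1, END, 1, END]
# "Grid texture": per cell, the offset of its list in triangle_lists.
grid = [0, 3]

def triangles_in_cell(cell):
    """Follow grid -> triangle list -> triangle -> vertex."""
    i = grid[cell]                               # indirection 1
    found = []
    while triangle_lists[i] != END:
        tri = triangle_lists[i]                  # indirection 2
        ids = triangles[3 * tri:3 * tri + 3]     # indirection 3
        found.append(tuple(vertices[v] for v in ids))  # indirection 4
        i += 1
    return found
```

On the GPU each of these lookups would be a dependent texture read, with the value fetched in one step used as the texture coordinate of the next.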

38
Ray Tracing on Programmable GPUs
  • Write kernels for different ray tracing ops
  • Ray Generation
  • Get pixel position from texture coordinates
  • Somehow get camera settings (e.g. from quad
    color, or texture)
  • Compute corresponding ray
  • Write to origin, direction, state textures
  • Triangle Intersection
  • Read ID of triangle to be intersected from state
  • Get triangle vertices from textures
  • Intersect
  • Update state texture
  • Similar for traversal, triangle list
    intersection, shading, ...
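The slides do not say which intersection test the kernel uses. As one plausible stand-in, the sketch below uses the well-known Möller-Trumbore test, which naturally yields the barycentric coordinates (u, v) that the hit texture stores alongside the triangle id. Function and parameter names are illustrative.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return (t, u, v) for a hit in front of the origin, else None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return (t, u, v) if t > eps else None

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = intersect_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri)
miss = intersect_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri)
```

The early returns here are branches, which the fragment hardware of the time could not execute directly; a GPU version would compute all terms and select the result at the end.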

39
Ray Tracing on Programmable GPUs
  • Have kernels for ray generation, traversal,
    intersection, etc.
  • Each ray is in exactly one state
  • E.g. in intersection state
  • Make sure only rays in correct state are
    processed
  • E.g. apply intersection kernel only to rays in
    intersect state
  • Usual GL masking methods, e.g. stencil bits,
    early pixel kill, etc.
  • → Can generate overhead, but usually OK
  • Fragment program can change state of ray
  • E.g. change from traversal to intersection in
    non-empty voxel
  • Combine different kernels by just calling them in
    turn
  • E.g. rendering an intersection quad will do one
    intersection step
  • (but only for rays in intersect state !)
  • Secondary rays relatively easy for the shader
    kernel
  • Update origin and direction textures, go back to
    traversal state
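The per-ray state machine described above can be sketched as repeated masked passes. This is a CPU toy under heavy assumptions: the "scene" is a one-dimensional row of voxels, each non-empty voxel simply lists which ray ids it blocks, and the `if` in `run_pass` stands in for stencil masking or early pixel kill.

```python
TRAVERSE, INTERSECT, SHADE, DONE = range(4)

def traverse_kernel(ray, voxels):
    if ray["cell"] >= len(voxels):
        ray["state"] = SHADE              # left the grid: shade background
    elif voxels[ray["cell"]] is not None:
        ray["state"] = INTERSECT          # non-empty voxel reached
    else:
        ray["cell"] += 1                  # step to the next voxel

def intersect_kernel(ray, voxels):
    name, blocked_rays = voxels[ray["cell"]]
    if ray["id"] in blocked_rays:
        ray["hit"] = name
        ray["state"] = SHADE
    else:
        ray["cell"] += 1
        ray["state"] = TRAVERSE

def shade_kernel(ray, voxels):
    ray["color"] = ray["hit"] or "background"
    ray["state"] = DONE

KERNELS = [(TRAVERSE, traverse_kernel),
           (INTERSECT, intersect_kernel),
           (SHADE, shade_kernel)]

def run_pass(state, kernel, rays, voxels):
    """One 'quad': the kernel only touches rays in the matching state."""
    for ray in rays:
        if ray["state"] == state:
            kernel(ray, voxels)

def render(rays, voxels):
    rounds = 0
    while any(ray["state"] != DONE for ray in rays):
        for state, kernel in KERNELS:     # call the kernels in turn
            run_pass(state, kernel, rays, voxels)
        rounds += 1
    return rounds

voxels = [None, ("A", {0}), None, ("B", {0, 1})]
rays = [{"id": i, "cell": 0, "state": TRAVERSE, "hit": None, "color": None}
        for i in range(3)]
render(rays, voxels)
colors = [ray["color"] for ray in rays]
```

Note that rays finish after different numbers of rounds, so some passes touch only a few pixels; that is the masking overhead the slide mentions.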

40
Ray Tracing on Programmable GPUs
  • Results
  • Easy to exploit parallelism in the GPU
  • Many more pixels than fragment pipelines
  • Comparable performance to single CPU
  • Even though it's only a prototype implementation
  • Limited by fragment pipeline very soon
  • Main Limitations
  • Fragment processing speed
  • Texture memory
  • Need many textures for each pixel
  • Also need to store whole scene in texture
  • Bandwidth
  • Number of different states must be small !

41
Ray Tracing on Programmable GPUs
  • Additional limitations of current GPUs
  • Bandwidth problems due to missing loops
  • Often have to write data just to save it for next
    iteration
  • Overhead due to missing write capability
  • Accuracy problems: no ints, all floats
  • E.g. rounding modes when reading IDs from a
    texture
  • Problems due to missing dependent writes
  • Many textures for input, but only one framebuffer
    for output
  • Need multiple passes when computing more than
    3 values per pixel
  • Each fragment shader writes to exactly one
    predetermined position
  • Hard to do recursive operations with that
    limitation
  • Kd-tree construction ?

42
Ray Tracing on Programmable GPUs
  • Ray tracing on GPUs in the future?
  • Many limitations will (probably) change
  • Loops, branches, dependent writes, int textures,
    texture memory, early pixel kill
  • Performance will increase faster than for CPUs
  • → Might soon be faster than, and as flexible as,
    ray tracing on a CPU!

43
Realtime Ray Tracing Approach III: Dedicated Ray
Tracing Hardware
44
Dedicated Ray Tracing Hardware
  • Relatively low efficiency when using GPU for RT
  • Many units not needed at all (rasterization,
    z-buffer, clipping, lighting, ...)
  • Lots of overhead
  • Programmable units can never be as efficient as
    dedicated HW
  • Dedicated ray tracing HW should be more efficient
  • Building RT HW is feasible today
  • FPU power not a problem any more
  • (see GeForce3 FPU performance)
  • Die size/number of transistors not a problem
    any more
  • Main problem: Off-chip bandwidth!
  • Already between chip and cache

45
Dedicated Ray Tracing Hardware
  • Bandwidth: Same problem as in SW
  • Approach in SW: Bandwidth reduction by Coherent
    Ray Tracing (packet traversal)
  • HW: Much larger packets (64x64 vs. 2x2!)
  • Much bigger bandwidth saving
  • Target: realtime at full-screen resolutions
  • Larger packet sizes not a problem → Lots of
    coherence
  • Avoiding overhead simple in HW
  • Much simpler than with SSE

46
SaarCOR Architecture
  • Features
  • Based on interactive software ray tracer
  • Exactly the same data structures, ...
  • Kd-trees as acceleration structure
  • Packets of rays to reduce bandwidth
  • Fixed OpenGL-like shading
  • Plus shadow and reflection rays
  • Goals
  • Simple, low-bandwidth memory interface
  • Half the floating point requirements of GeForce3
  • Achieve frame rates comparable to today's
    graphics cards

47
SaarCOR Architecture
System overview
48
SaarCOR Architecture
Features
  • Scalable
  • Fully pipelined
  • Multi threading for latency hiding
  • Simple communication pattern (no routing)
  • Highly asynchronous

49
SaarCOR: Current Status
  • Simulation on register-transfer level
  • Core @ 533 MHz, Memory 64 bit @ 133 MHz (simple
    SDRAM, no DDR!)
  • Each pipeline uses 36 FP units
  • Standard SaarCOR
  • 4 pipelines
  • 16 threads per pipe
  • 1 GB/s bandwidth to memory (!)
  • 272 KB for caches (!)
  • Four pipes: ½ the FP resources of a GeForce 3

50
Issues
  • On-chip memory of standard SaarCOR
  • Caches: 272 KB
  • RF for rays: 288 KB
  • RF for stack: 535 KB
  • Register level simulations only
  • Simple shading only

51
Benchmarks: Scenes
  • OpenGL-Like Shading
  • No shadow rays
  • No reflection rays
  • Full screen resolution
  • 1024 x 768 pixel

52
Benchmarks: Scenes (2)
53
Benchmarks: Results
Today's CPUs: 0.5-0.8 Mrays/s → factor of
100-200!
54
Benchmarks: Results (3)
  • Efficiency of standard SaarCOR

16 threads → 32 threads: 10
Performance scales with number of pipelines,
threads, cache size and bandwidth.
55
What about shading?
  • Right now: Shading only coarsely approximated
  • Fixed Phong shader w/ bilinear texturing
  • Programmable shading currently being evaluated
  • Shading packets of rays exploits coherence
  • BQD scene with bilinear textures
  • 14 MB for shading data per frame
  • 300-600 MB/s bandwidth
  • Shading BW Ray Tracing BW

56
Conclusions
  • SaarCOR architecture
  • Scales well in the number of pipelines
  • Highly efficient
  • Uses half the FP power of GeForce3
  • Requires very low bandwidth
  • Provides full-featured ray tracing
  • Same frame rates as today's graphics cards

57
Current Work
  • Programmable shading
  • API: OpenRT [Wald02]
  • Virtual Memory Management
  • Incorporate Features and Algorithms from SW
    system
  • Large Models [Wald01]
  • Dynamic scenes [Wald02]
  • Global Illumination [Wald02]
  • Building a prototype

58
Realtime Ray Tracing Approaches I-III: Summary
and Conclusions
60
Realtime Ray Tracing
  • Summary
  • Different upcoming (and competing!)
    architectures
  • All of these have different advantages /
    disadvantages
  • PC clusters: most flexible, but not useful for
    consumer market
  • GPUs: better performance growth, cheap, but
    awkward to use
  • HW: best performance, best efficiency, but costly
  • → Cannot yet predict which one will win
  • But
  • The question is not whether realtime ray tracing
    will ever come
  • The question rather is how and when it will come

61
End of Part I - Questions?
62
Part II: Advanced Ray Tracing Issues
63
Advanced Ray Tracing Issues
  • Conclusions from Part I: Realtime Ray Tracing
    will come
  • Problem: All these architectures mostly focus
    only on the core ray tracing algorithms, i.e.
    traversal and intersection
  • Ubiquitous Realtime Ray Tracing opens new
    problems
  • Dynamic Scenes?
  • Suitable API(s)?
  • Implications for future Applications / SceneGraph
    libraries?

64
Interactive Ray Tracing
  • So far
  • Interactive RT possible even today, can already
    beat SGI/NVidia
  • Complex models
  • High-Quality Applications
  • Can do high-quality, interactive walkthroughs
  • But: A walkthrough is not really interactive
  • Not if the scene remains static

65
Issue I: Dynamic Scenes
  • Fact: Ray Tracing needs an acceleration structure
  • Building it is very costly
  • Precomputation only works for static scenes
  • But: Real scenes usually aren't static
  • → What is interactive if I cannot interact
    with it?
  • Problem: Little research on this topic
  • Just wasn't interesting before interactive ray
    tracing
  • Previous work: Usually on special cases
  • Utah hack: Keep dynamic objects out of accel
    structure
  • Reinhard [RW2001]: Incremental updates of Uniform
    Grid
  • Costly, not hierarchical
  • Moeller [EG2001]: Only rigid-body animation

67
Handling Dynamic Scenes
  • Different kinds of dynamic behavior
  • Hierarchical, rigid-body motion vs unstructured
    motion
  • Constrained unstructured motion (e.g. maximum
    displacement)
  • All triangles animated vs few triangles animated
  • Amortized over many rays/frames or over few rays
  • Inherently different problems need different
    solutions
  • One single algorithm will hardly do the job

68
Handling Dynamic Scenes
  • Alternative approach
  • Offer suite of different techniques
  • Hierarchical animation of whole objects
  • Fast Rebuild of objects for unstructured motion
    (with sacrifices in traversal speed)
  • High-quality BSPs for often-used static objects
    (with relatively long rebuild time)
  • Let the application decide which one is best for
    what!
  • If anybody knows what's best, it's the
    application programmer
  • Just like OpenGL: Applications build display
    lists, not the drivers!
  • Allow combination of techniques
  • E.g. some unstructured motion, but otherwise
    hierarchically animated
  • → App needs a good API to do that!

69
Handling Dynamic Scenes
  • Combining techniques in a hierarchical way
  • Application groups geometry into objects
  • Similar to building display lists (→ API)
  • Each object has a separate BSP (just like
    PowerPlant)
  • Hints can be given to control quality/speed
    tradeoff
  • E.g. whether the object will be static or
    unstructured
  • Objects can be instantiated
  • Just like calling a display list (→ API)
  • Hierarchical animation: Just re-instantiate with
    new transform
  • Objects are kept in additional hierarchy level
  • With separate, fast and high-quality BSP
  • During traversal, just transform the rays when
    they hit an object
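The idea of transforming rays instead of geometry can be sketched as follows. The example makes several simplifying assumptions: instances carry only a translation (stored directly as the inverse, world-to-object matrix), the shared object is a unit sphere rather than a triangle mesh with its own BSP, and the top level is a plain loop instead of a top-level BSP.

```python
import math

def apply(m, p, w):
    """Apply a 3x4 affine matrix to a point (w=1) or a direction (w=0)."""
    return tuple(sum(m[r][c] * p[c] for c in range(3)) + m[r][3] * w
                 for r in range(3))

def inverse_translation(tx, ty, tz):
    """World-to-object matrix of an instance placed at (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, -tx],
            [0.0, 1.0, 0.0, -ty],
            [0.0, 0.0, 1.0, -tz]]

def hit_unit_sphere(origin, direction):
    """Shared object geometry, defined once in its own object space."""
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - 1.0
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None

def trace(origin, direction, instances):
    """Transform the ray into each instance's space and intersect there."""
    best = None
    for index, (world_to_object, intersect) in enumerate(instances):
        o = apply(world_to_object, origin, 1.0)
        d = apply(world_to_object, direction, 0.0)
        t = intersect(o, d)  # rigid transforms keep t unchanged
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best

ray = ((0.0, 0.0, 0.0), (0.0, 0.0, -1.0))
instances = [(inverse_translation(0.0, 0.0, -5.0), hit_unit_sphere),
             (inverse_translation(3.0, 0.0, -5.0), hit_unit_sphere)]
before = trace(*ray, instances)

# Hierarchical animation: re-instantiate object 1 with a new transform.
instances[1] = (inverse_translation(0.0, 0.0, -3.0), hit_unit_sphere)
after = trace(*ray, instances)
```

Moving an instance only replaces one small matrix; the object's own geometry and acceleration structure are untouched, which is why this kind of animation and massive instantiation are cheap.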

70
Handling Dynamic Scenes: Results
  • Side effect: Instantiation comes for free
  • Terrain: 1000 instances of a 20 ktri tree = 20 Mtri
    (and dynamic!)
  • Sunflowers: 36,000 x 24 ktri sunflowers ≈ 1 Gtri
    (dynamic!)
  • Top-level BSP reconstruction tolerable
  • Some milliseconds even for a few thousand objects
  • But: scalability bottleneck (redundant
    computation on each client)
  • Hierarchical animation is cheap
  • Transformations are cheap (compared with the
    rest)
  • But: Unstructured motion still costly
  • Especially for big objects (→ have to use
    low(er)-quality BSPs)
  • High bandwidth requirements for sending data over
    the network!
  • Tolerable for moderately complex objects
    (16k-64k tri)
  • In practice: Total overhead usually 10-20%

71
Handling Dynamic Scenes: Conclusions
  • Works for many different scenes (BART benchmark
    suite)
  • Robots: Game-like scene, hierarchical animation
    of 161 objects
  • Kitchen: Mostly static, with many secondary
    effects
  • Museum: Completely unstructured motion
  • Correct (inter-)reflections, shadows, etc. also
    on moving triangles !
  • Also works for all applications we have built so
    far
  • OpenRT based VRML97 viewer with VRML animations
  • Inventor-port under way
  • Dynamic scenes in Interactive Global Illumination
    application

72
Handling Dynamic Scenes: Results
73
Handling Dynamic Scenes: Results
Video
74
Handling Dynamic Scenes: Remaining Problems
  • Lots of potential for future research!
  • Faster kd-tree generation?
  • Kd-tree generation in HW?
  • On-demand generation of kd-trees?
  • More efficient solutions for special problems
  • Skinning, morphing, progressive meshes, ...

76
Issue II: API Issues
  • So far
  • Fast, cheap, efficient, ...
  • Flexible, powerful shading
  • Can do big models and dynamic scenes, ...
  • So why is nobody using it?

Because without a proper API, you can't!
77
Issue II: API Issues
  • Why do we need an API for Interactive Ray
    Tracing?
  • Side effect: An API helps to divide and conquer
    problems (e.g. shaders, global illumination, ray
    tracing kernels, ...)
  • E.g., can work separately on frontend and
    backend
  • Can abstract from dynamic scene issues in the
    global illumination shader, and so on
  • It helps to create a critical mass of users
  • Rasterization only really took off after OpenGL
  • Enables code portability
  • Without an API, nobody will (or can) use it,
    except insiders
  • Not everybody has their own realtime ray tracer
  • Not everybody wants to (or should) know all
    implementation details
  • → For widespread Realtime Ray Tracing, we do need
    an API

78
Issue II: API Issues
  • Problem: There are no suitable APIs
  • API has to support both interactivity and ray
    tracing
  • OpenGL: interactive, but not suitable for ray
    tracing
  • RenderMan/Rayshade/POV-Ray: ray tracing capable,
    but inherently offline
  • → Need to find new API(s)

79
Issue II: API Issues
  • Goals for an Interactive Ray Tracing API
  • As easy to learn and use as (standard) OpenGL
  • Leverage existing programmers' experience with
    OpenGL
  • As powerful in shading as RenderMan
  • Our approach (OpenRT): Combine the best of both
  • Application API much like OpenGL/GLUT
  • With necessary modifications for ray tracing
    (Shaders, Objects)
  • Shader API like RenderMan

80
The OpenRT Interactive Ray Tracing API
  • Application API very OpenGL-like
  • Geometry: rtVertex3f, rtNormal3f, ...
  • Primitives: rtBegin/End(RT_TRIANGLES, RT_QUAD, ...)
  • Transformation: rtPushMatrix(), rtMatrixMode(), ...
  • Geometry Objects
  • Just like Display Lists (except no side effects)
  • rtNewObjects(), rtBeginObject(), rtEndObject(),
    rtInstantiate(), ...
  • Shader Objects
  • Surface, Light, and Pixel Shaders, exchangeable
    Renderer Object
  • Even supports GLUT-like functionality
  • Porting GL/GLUT applications relatively easy
  • (except multi-pass, of course)

81
The OpenRT Interactive Ray Tracing API
  • Shader Objects
  • Similar to Stanford Programmable Shading API
  • Dynamically loaded from DLLs/.so files
  • rtShaderFile(), rtCreateShader(), rtBindShader()
  • Light shaders: rtCreateShader(), rtUseLight()
  • Application-to-Shader communication via Shader
    Parameters
  • rtDeclareParam(), rtParameterHandle(),
    rtParameter3f(), ...
  • Parameters can be per vertex, per triangle, per
    shader, ...
  • Retained-Mode / Frame Semantics
  • Rendering uses Shader Parameters active at end
    of frame
  • NOT at the time that shader/triangle was created
  • Actual rendering triggered at rtSwapBuffers
  • Rendering always done asynchronously
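The retained-mode frame semantics can be illustrated with a small mock. The class and method names below are hypothetical and deliberately do not mirror the OpenRT functions listed above, whose signatures the slides do not give; the sketch only shows the rule that rendering reads the parameter values active at the end of the frame.

```python
class RetainedScene:
    """Hypothetical mock of a retained-mode scene, not the OpenRT API."""

    def __init__(self):
        self.params = {}      # (shader, parameter name) -> current value
        self.triangles = []   # (triangle id, shader) pairs

    def set_param(self, shader, name, value):
        # Only records the value; nothing is rendered here.
        self.params[(shader, name)] = value

    def add_triangle(self, triangle_id, shader):
        # The triangle stores a reference to its shader, not a copy
        # of the parameter values active right now.
        self.triangles.append((triangle_id, shader))

    def swap_buffers(self):
        # Rendering is triggered here and uses the values active now.
        return {tri: self.params.get((shader, "color"))
                for tri, shader in self.triangles}

scene = RetainedScene()
scene.set_param("plastic", "color", "red")
scene.add_triangle(0, "plastic")           # created while color is red
scene.set_param("plastic", "color", "blue")
frame = scene.swap_buffers()               # rendered with blue
```

This differs from immediate-mode OpenGL habits, where state set after a draw call does not affect what was already drawn, and is one of the things to watch for when porting.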

82
The OpenRT Interactive Ray Tracing API
  • Shader API: Or how to write a shader
  • Declare and export shader parameters
  • Store as member variables
  • Write callback functions
  • Shade(), Illuminate(), ...
  • Access scene data with RenderMan-like API
  • Geometry: rtsShadingNormal(), ...
  • Lights: rtsIlluminate(), rtsOccluded(),
    rtsLightTransparency(), ...
  • Shoot arbitrary secondary rays
  • rtsTrace()
  • → Porting RenderMan shaders relatively easy, too

83
The OpenRT Interactive Ray Tracing API
  • OpenRT Summary
  • Fast and Interactive Rendering
  • Dynamic Scenes
  • Very Powerful Shading
  • API for using it
  • OpenRT is a complete 3D Rendering Engine
  • Kernel behind OpenRT: Saarland RTRT
  • Might be changed to e.g. SaarCOR as soon as it
    is available

84
OpenRT Example 1: VRML97 @ OpenRT
  • Example 1: VRML97 viewer ported from OpenGL
  • Porting relatively easy, almost all functionality
    was there
  • Only modification: Have to gather small objects
    into fewer, bigger objects for performance
    reasons
  • Results
  • Can render all of VRML97
  • Almost no matter how big
  • Can put any kind of shader on any triangle (e.g.
    GlobIllum)
  • Can do VRML animations, move objects, edit
    shaders and lights

Car headlight: 800,000 tris
Soda Hall floor: 400,000 tris
85
OpenRT Example 2: The BART Benchmark
  • Example 2: The BART benchmark scenes
  • To our knowledge, the only system so far to render
    those at all
  • All different kinds of dynamic behavior, including
    reflections, refractions, shadows, ...
  • With GL shader: > 10 frames per second
  • With ray tracing shader: 2-5 frames per second

86
OpenRT Example 3: Complex Outdoor Scene
  • Example 3: Massive instantiation for outdoor
    scenes
  • Pixel-accurate shadows!

87
OpenRT Example 3: Complex Outdoor Scene
88
OpenRT Example 4: Massive Model Visualization
  • Example 4: The PowerPlant
  • 12.5-37.5 million triangles
  • Currently: With replication, without
    demand-loading/reordering
  • Just recently: Can now also move the furnace :-)

89
OpenRT Example 5: Complex Shading Stress Test
90
OpenRT Example 5: Complex Shading Stress Test
  • Example 5: Shading stress test
  • Volume Shader (CT Head)
  • Applied to a box of geometry
  • Lightfield Shader on simple quad
  • Procedural Wood and Marble
  • Procedural Bump-Mapping on mirror
  • → Procedurally bump-mapped reflections
  • Result: Everything combines perfectly
  • Transparent shadow from Volume on Procedural Wood
    Shader
  • Lightfield reflected in procedurally bump-mapped
    mirror
  • Attenuated by semi-transparent volume
  • Multiple interreflections
  • Of course, everything is interactive and fully
    dynamic

91
OpenRT Example 5: Complex Shading Stress Test
92
OpenRT Example 6: Interactive Global Illumination
Implementation: Not now
93
OpenRT Example 6: Interactive Global Illumination
  • Fully implemented in OpenRT
  • GlobIllum application is a shader like any other
  • Automatically inherits capability for handling
    dynamic scenes, distribution, ...
  • Same frontend as e.g. BART/Office
  • Automatically inherits parser, user interface,
    etc.
  • Can be used from different applications (e.g.
    VRML viewer)
  • Algorithms and implementation: Later (Part III)

94
Questions?
  • For more info, also visit
  • http://www.OpenRT.de

95
Part III: New Applications enabled by Realtime
Ray Tracing
96
For more information on OpenRT,
see http://www.OpenRT.de
97
The Saarland Interactive Ray Tracing Project
  • Started Jan 1st, 2000
  • (Original) Goal
  • Evaluate practicability of RT as an interactive
    rendering engine
  • Do a fair comparison and analysis of RT vs. GL
  • What are the advantages and disadvantages?
  • Compare on common ground: OpenGL-like
    + shadows + reflections
  • No global illumination, no shading, no advanced
    features, ...
  • And: Find out why it is so slow
  • Therefore, needed to build a fast ray tracer

98
The Saarland Interactive Ray Tracing Project
  • Goals have constantly changed since then
  • It worked, so continue working on it
  • One CPU not fast enough, so distribute
  • Great for many triangles, so work on really large
    models
  • People demand high quality, so build full-featured
    ray tracer
  • If it's good in software, why not build it in
    hardware
  • Static scenes too limiting, make it dynamic
  • Others want to use it, so build an API
  • And, if we have it anyway, why not do global
    illumination

99
Ray Tracing on Programmable GPUs
  • Application program relatively easy
  • Just render many screen-aligned quads with
    different fragment shaders
  • Need some way of load balancing
  • Don't want to execute the shade kernel if no ray
    is in the shade state
  • Important: Approach is not SIMD
  • 1 quad (1 fragment program) for whole screen,
    but
  • Different rays can be in different states
  • Different pixels in fact behave differently
  • No problem to already shade pixel 2 while still
    intersecting pixel 1
