RealTime Parallel Radiosity - PowerPoint PPT Presentation

1 / 42
About This Presentation
Title:

RealTime Parallel Radiosity

Description:

Specular lighting is ... The specular exponent may be evaluated using a power ... Specular makes things much more difficult. I will only discuss ... – PowerPoint PPT presentation

Number of Views:73
Avg rating:3.0/5.0
Slides: 43
Provided by: mattcra
Category:

less

Transcript and Presenter's Notes

Title: RealTime Parallel Radiosity


1
Real-Time Parallel Radiosity
  • Matt Craighead
  • May 8, 2002
  • 6.338J/18.337J, Course 6 AUP

2
What is Radiosity?
  • A computer graphics technique for lighting
  • Two types of lighting algorithms
  • Local easy, fast, but not realistic
  • Global slow, difficult, but highest quality
  • Radiosity is a global algorithm
  • Global algorithms try to take into account
    interreflections in scenes.

3
Radiosity in Real Time?
  • Local algorithms can easily run in real time,
    either in software or hardware.
  • Computational demands grow with number of
    surfaces and number of lights. 106 surfaces is
    not unreasonable!
  • Most global algorithms take quadratic time in
    number of surfaces (all interactions!).

4
Local Doesnt Mean Bad
  • Id Softwares Doom 3 (video capture)

5
but Global is Better
  • State-of-the-art radiosity rendering from 1988 (5
    hours render time!)

6
Local Lighting Math
  • Consider a light source at some point in
    three-dimensional space, and a surface at some
    other location.
  • How much does the light directly contribute to
    the brightness of the surface?
  • Note that global algorithms still need to do
    local lighting as a first step.

7
Local Lighting Math
  • Set up a 3-dimensional coordinate system centered
    on the surface. Define unit vectors L (light), N
    (normal), E (eye)
  • L points towards the light.
  • N is perpendicular to the surface.
  • E points towards the viewer.

N
E
L
8
Local Lighting Math
  • If an object sits between the light and surface,
    the lighting contribution is zero.
  • Otherwise
  • Surfaces generally reflect light in two ways
    diffuse (dull) and specular (shiny).
  • We can see three colors red, green, blue.
  • So let Md, Ms be 3-vectors indicating how much of
    R,G,B are reflected in each way.

9
Local Lighting Math
  • Diffuse lighting is independent of view angle.
    It is brightest when N and L are most closely
    aligned, and falls off with the cosine of the
    angle between them.
  • All lighting also falls off with the square of
    the distance from the light source.
  • So, the diffuse term is d-2Md(N L).

10
Local Lighting Math
  • Specular lighting is view-dependent. In one
    simple formulation (Blinn shading), we determine
    the vector H halfway between E and L and evaluate
    d-2Ms(N H)s, where s indicates the shininess of
    the surface.

H
N
E
L
11
Local Lighting Math
  • The easiest way to think of H, the half-angle
    vector, is that if H and N were aligned, and the
    surface were a mirror, then the light would
    reflect straight to the eye.
  • (N H)s can be thought of as representing a
    probability distribution of microfacets, whose
    normals are clustered around N but do vary.
    Smoother surfaces have higher s.

12
Local Lighting Math
  • So, our full contribution from a light source is
    d-2(Md(N L) Ms(N H)s).
  • This may also be multiplied by a light color.
  • We may also add in an emissive term Me for
    glowing objects.
  • If the lack of interreflection makes things too
    dark, we may add an ambient term LaMd.

13
Approximations
  • We may not compute this formula at every pixel on
    the screen, but only at the vertices of the
    object instead.
  • The specular exponent may be evaluated using a
    power approximation rather than a real power
    function.
  • The specular formula is already a cheesy hack

14
More Approximations
  • Shadows are hard. At the cost of much realism,
    they can be omitted or faked.
  • Draw a little dark spot under an object
  • Ambient is itself an approximation of real global
    illumination.
  • N L and (N H)s are idealizationsin reality,
    this can be an arbitrary 4-dimensional function
    called a BRDF.

15
Radiosity
  • Radiosity is a global algorithm that handles
    diffuse lighting only.
  • The term global illumination refers to global
    algorithms that handle specular lighting also.
  • Specular makes things much more difficult.
  • I will only discuss plain radiosity.

16
Radiosity in a Nutshell
  • Suppose there are n surfaces in the scene.
  • Let Ai be the area of surface i.
  • Let Ei be the amount of light energy emitted from
    surface i per unit time and area.
  • Let Bi be the amount of light energy emitted
    and/or reflected from surface i per unit time and
    area.
  • Let ?i be the diffuse albedo of surface i.

17
Radiosity in a Nutshell
  • Now, let Fij (called a form factor) be the
    fraction of light from surface i that reaches
    surface j.
  • Then, for all i, we must have
  • AiBi AiEi ?i ?j ? 1,n AjBjFji
  • This is just a linear system with n equations and
    n unknowns. Solve and you get the Bs.

18
That Easy?
  • Well, not quite that easy.
  • Solving the system of equations is O(n3).
  • So well iterate instead.
  • We still have to do local lighting.
  • These become the Eis.
  • We have to compute the Fij terms somehow.
  • This turns out to be expensive.
  • If the scene is static, precompute!

19
Computing Form Factors
  • Fij turns out to be a big ugly integral over the
    area of both i and j.
  • Worse, one of the terms in the integral is
    whether the two dAs can see one another!
  • So, no closed-form solution.
  • Standard numerical integration is no good. A
    raycast per sample takes too long.

20
Computing Form Factors
  • The usual solution is called the hemicube
    algorithm.
  • Render the scene from the point of view of the
    surface, in all directions. In effect, you are
    projecting onto a hemicube.
  • Count up the number of times you can see each
    surface (weighted appropriately).
  • Takes advantage of 3D acceleration!

21
Hemicube Algorithm in Action
22
Simplified Radiosity Equation
  • It so happens that FjiAj FijAi.
  • This is a simple property of the integral for F.
  • So we can simplify the radiosity equation
  • AiBi AiEi ?i ?j ? 1,n AjBjFji
  • AiBi AiEi ?i ?j ? 1,n AiBjFij
  • Bi Ei ?i ?j ? 1,n BjFij
  • B E RB (where Rij ?iFij)

23
Solving the Radiosity Equation
  • B E RB is just a matrix/vector equation.
  • Direct solution B (I ? R)-1E
  • Iterative solution
  • If E is a local lighting solution, then call it
    B0.
  • Now let Bi1 B0 RBi.
  • Then, Bi is simply the lighting solution after i
    bounces! Since ?i lt 1 for all i (conservation of
    energy), the Bis converge to B.

24
Iterative vs. Direct Radiosity
  • If F is a dense matrix, then direct solution
    takes time O(n3), while k steps of iteration
    takes time O(kn2).
  • Realistically, k is just a constant say, 5.
  • Iterative solution is practical with n ranging up
    to the hundreds of thousands!
  • Iteration time is proportional to the number of
    nonzero entries of F.

25
Sparsity of Form Factors
  • In a simple cube-shaped room, all form factors
    will be nonzero except for pairs on the same
    wall.
  • So F is 5/6 nonzero. Not very encouraging
  • As more objects are added, F becomes sparser.
  • As the scene expands beyond one room, F becomes
    much sparser.
  • So iterative radiosity scales extremely well!

26
Storage and Precision
  • Radiosity hogs memory.
  • If you grid a cube-shaped room 100x100 on each
    wall, and store F as a dense matrix of floats,
    thats 14.4 GB. (!!!)
  • Storing F as sparse helps a lot.
  • Good for iteration speed too.
  • Using smaller values than floats helps too.
  • 16-bit fixed-point is good enough.

27
RLE Encoding of Form Factors
  • F tends to have runs of zeros and nonzeros.
  • Smart traversal order of grids makes the runs
    longer.

Bad
Good
28
RLE Encoding of Form Factors
  • My disk storage format
  • 1 byte run length, run type
  • Length up to 84, type is zero, ?255, or ?65535
  • Then, variable bytes with run data
  • Compression ratio for my scene 5.971
  • My memory storage format
  • 2 bytes run length up to 65535
  • 2 bytes run type (zero or ?65535)
  • 2N bytes run data
  • Compression ratio for my scene 2.491

29
Parallelization
  • Split up the surfaces among the CPUs.
  • Each CPU owns those rows of the form factor
    matrix.
  • Each CPU computes its surfaces local lighting.
  • Every iteration requires an all-to-all
    communication, so that each CPU has the full B
    vector.
  • At present, my storage of F is unbalanced
    compression ranges from 4.31 to 1.61.

30
Radiosity Iteration Kernel
  • Hand-written MMX assembly code (only main inner
    loop shown)

inner_loop prefetchnta ebx128
prefetchnta eax128 movq mm4, eax
pshufw mm7, mm4, 0xFF pshufw mm6, mm4, 0xAA
pshufw mm5, mm4, 0x55 pshufw mm4, mm4,
0x00 pmulhuw mm4, ebx pmulhuw mm5,
ebx6 pmulhuw mm6, ebx12 pmulhuw
mm7, ebx18
paddw mm0, mm4 paddw mm1, mm5 paddw
mm2, mm6 paddw mm3, mm7 add eax, 8
add ebx, 24 dec esi jnz inner_loop
31
Radiosity Kernel Performance
  • Timed 8 CPUs doing 500 iterations.
  • Portable C fixed-point kernel 38.04 s
  • Optimized MMX kernel 21.64 s
  • On most loaded CPU, works out to 528 million
    multiply-adds/s for the C version, 1.24 billion
    multiply-adds/s for MMX.
  • But MMX code wastes 25 of them, so real rate is
    928 million.

32
Local Lighting Implementation
  • One raycast is required for each quad, for each
    light source! This can be expensive.
  • To accelerate raycasts, I made a simplified
    version of my scene that was virtually
    indistinguishable for raycasting purposes.
  • 13028 quads reduced to 120 polys, 110 cylinders
  • Some cylinders used as geometry, some as bounding
    volumes

33
Overall Performance
  • Again, 8 CPUs on 500 iterations
  • Iteration only 21.64 s
  • Communication only 5.96 s
  • Iteration plus communication 26.84 s
  • All computation 65.7 s
  • All computation plus communication 64.38 s

34
Remarks on Performance
  • The communication overlaps very well with the
    computation, to the point that it is actually a
    speedup. (!)
  • MPI_Isend, MPI_Irecv are essential to achieving
    this.
  • The O(n) local lighting computation is actually
    taking much longer than the O(n2) radiosity
    computation.
  • Local lighting is only pseudo-O(n), because of
    the raycast costalthough for large scenes,
    raycast cost should still be much less than
    O(n2), due to other optimizations.

35
Radiosity Frontend
  • Separate client application that runs on a
    Windows PC with OpenGL acceleration.
  • Radiosity solver running on cluster is server.
  • Original plan was that the frontend would send
    the scene to the server, and the server would use
    the scene provided.
  • Since the cluster has no OpenGL acceleration, I
    was reluctantly forced to precompute form
    factors.
  • All aspects of scene except form factors still
    sent to the server by the client form factors
    are read from disk.

36
Client/Server Architecture
  • User precomputes form factors and FTPs them to
    the Beowulf.
  • Server listens on beowulf.lcs.mit.edu5353.
  • Client connects and sends scene information.
  • Server reads form factors off of disk.
  • Both open a network thread.
  • Server streams radiosity to client via TCP/IP.
  • Computation, rendering, and communication are
    completely decoupled.

37
Frontend Features
  • Per-vertex or per-pixel local lighting, with
    local viewer. Optional specular.
  • Shadows implemented using stencil buffer.
  • Display radiosity input/output (E and B).
  • Bilinear filtering of radiosity solutions on
    grid-shaped surfaces.
  • Ultra-high-quality mode where radiosity is used
    for indirect lighting only.

38
Demonstration of Full System
39
Demonstration of Full System
40
Demonstration of Full System
41
Work Yet to be Done
  • More complex scene would be nice. This one is
    13K quads, and I should be able to do 50K.
  • More optimization work on raycasting.
  • Better load balancing.
  • Optimization of some modes on frontend, so they
    run reasonably on my laptops GeForce2 Go, not
    just on a GeForce4 Ti
  • Alleviation of ugly banding on certain lighting
    modes caused by 8-bit-per-component precision.

42
Conclusions
  • Real-time radiosity is feasible.
  • Not tomorrow, but today.
  • If todays cluster is tomorrows desktop,
    real-time radiosity could start showing up in
    real applications not too many years from now.
  • Biggest limitation may be the ability to compute
    form factors efficiently.
  • Faster graphics hardware will make this happen.
Write a Comment
User Comments (0)
About PowerShow.com