Title: Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD1
1. Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD-1
United States Naval Academy
Department of Electrical Engineering
United States Naval Academy
105 Maryland Avenue, Stop 14B
Annapolis, Maryland 21402-5025
- Research supported by
  - NASA Goddard Space Flight Center (Code 586)
  - NRL Applied Optics Branch (Code 5630)
  - DoD High Performance Computing Modernization Program at NRL (Code 5593)
  - United States Naval Academy
  - Xilinx, Inc.
2. Topics
- Ray tracing
- Conventional parallel processing
- Modulo scheduling
- Coordination of sequential and parallel processing
- Expected performance
3. Ray tracing
- MODIS (Moderate-resolution Imaging Spectroradiometer)
- The Intersection Problem
- Finding the Perpendicular
- Refraction
- Reflection
4. MODIS Optical System (Moderate-resolution Imaging Spectroradiometer)
5. MODIS Optical System
- 485 pinholes
- 400 rays per pinhole
- 241 × 121 rays reflected from the diffuser
- 5.66 × 10^9 rays
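The total on this slide can be checked with a few lines of arithmetic. The sketch below assumes that "241 121" denotes a 241 × 121 fan of rays leaving the diffuser for each incident ray; that reading is an assumption, supported by the fact that the product matches the quoted total of about 5.66 × 10^9 rays.

```python
# Ray budget implied by the slide's figures.
pinholes = 485
rays_per_pinhole = 400
# Assumed reading of "241 121": a 241 x 121 grid of rays leaving the diffuser
# for every ray that reaches it.
diffuser_rays = 241 * 121

total_rays = pinholes * rays_per_pinhole * diffuser_rays
print(f"total rays: {total_rays:,}")
```

The product is 5,657,234,000, which rounds to the 5.66 × 10^9 rays quoted on the slide.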
6. Ray Directed to a Surface
- MODIS (Moderate-resolution Imaging Spectroradiometer)
- The Intersection Problem
- Finding the Perpendicular
- Refraction
- Reflection
- Coordinate Transformation
7. Calculate the Intercept Point
8. Find the Normal
9. Find the Refracted Ray
10. Find the Reflected Ray
11. Coordinate Transformation
- (Hard to visualize this!)
12. Topics
- Ray tracing
- Conventional parallel processing
- Modulo scheduling
- Coordination of sequential and parallel processing
- Expected performance
13. Parallelism
14. Performance (5.66 × 10^9 rays)
99.998
5,857
Rate based on a linear regression of results obtained using varying numbers of processors.
15. Performance (5.66 × 10^9 rays)
16. Efficiency
17. Topics
- Ray tracing
- Conventional parallel processing
- Modulo scheduling
- Coordination of sequential and parallel processing
- Expected performance
18. Operations Required as a Function of Surface, Aperture, and Interaction Types
Lots of these
Not too many of these
19. Quadratic Equation
Latency
Critical path (data-flow limit): 88 cycles
20. Modulo Scheduling: One Multiplier
21. Modulo Scheduling: One Multiplier
22. Modulo Scheduling: One Multiplier
23. Modulo Scheduling: One Multiplier
24. Modulo Scheduling: One Multiplier
25. Modulo Scheduling: One Multiplier
26. Modulo Scheduling: One Multiplier
27. Modulo Scheduling: One Multiplier
Equal to the Data-Flow Limit
28. Modulo Scheduling: Filling the Pipeline
One collective computation
29. Modulo Scheduling: Filling the Pipeline
30. Modulo Scheduling: Filling the Pipeline
Multipliers are 100% utilized
No schedule conflicts
31. Modulo Scheduling: Two Multipliers
Two multipliers with two multiplications each
32. Modulo Scheduling: Two Multipliers
One adder with two additions
Two cycles
Maximum efficiency
33. Modulo Scheduling: Two Multipliers
Improved efficiency: up from 25%
34. Modulo Scheduling: Two Multipliers
35. Modulo Scheduling: Two Multipliers
36. Modulo Scheduling: Two Multipliers
Less than the Data-Flow Limit
37. Modulo Scheduling: Two Multipliers
Less than the Data-Flow Limit, but double the
throughput.
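The effect of adding a multiplier can be illustrated with the resource-constrained lower bound on the initiation interval (the number of cycles between successive loop iterations entering the pipeline). The sketch below is an illustration, not the schedule from the slides: the operation counts (four multiplications, two additions) are read from the two-multiplier slides, applying the same counts to a one-multiplier configuration is an assumption, and the function names are hypothetical. Recurrence constraints and operator latencies are ignored.

```python
import math

def resource_min_ii(op_counts, unit_counts):
    """Resource-constrained lower bound on the initiation interval (cycles)."""
    return max(math.ceil(op_counts[r] / unit_counts[r]) for r in op_counts)

def utilization(op_counts, unit_counts, ii):
    """Fraction of issue slots each resource type uses at initiation interval ii."""
    return {r: op_counts[r] / (unit_counts[r] * ii) for r in op_counts}

# Operation counts per iteration (assumed from the two-multiplier slides).
ops = {"mul": 4, "add": 2}

one_multiplier = {"mul": 1, "add": 1}
two_multipliers = {"mul": 2, "add": 1}

ii_one = resource_min_ii(ops, one_multiplier)    # 4 cycles per iteration
ii_two = resource_min_ii(ops, two_multipliers)   # 2 cycles per iteration

print(ii_one, utilization(ops, one_multiplier, ii_one))
print(ii_two, utilization(ops, two_multipliers, ii_two))
```

Under these assumptions, halving the initiation interval doubles the throughput and leaves both the multipliers and the adder fully occupied, which matches the "two cycles" and "maximum efficiency" remarks on the slides.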
38. Topics
- Ray tracing
- Conventional parallel processing
- Modulo scheduling
- Coordination of sequential and parallel processing
- Expected performance
39. Cray XD-1
- MPI (Message Passing Interface)
- Master node
  - Reads file
  - Distributes file
  - Collates results
40. One Node of the Cray XD-1
- OpenMP (Open Multi-Processing)
- 144 of 220 nodes have a Xilinx Virtex-II Pro FPGA
- Opteron processors
  - Sequential program
  - Depth first
- FPGA
  - Pipelined hardware
  - Breadth first
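The depth-first and breadth-first orderings can be contrasted with a small sketch. It is only an illustration of the two traversal orders: the `step` callback (one ray interacting with one surface) and both function names are hypothetical, and rays that miss a surface or terminate early are not modeled.

```python
def trace_depth_first(rays, surfaces, step):
    """Sequential style: follow each ray through every surface before starting the next ray."""
    results = []
    for ray in rays:
        for surface in surfaces:
            ray = step(ray, surface)
        results.append(ray)
    return results

def trace_breadth_first(rays, surfaces, step):
    """Pipeline-friendly style: push the whole batch of rays through one surface at a time."""
    batch = list(rays)
    for surface in surfaces:
        # Every ray in the batch needs the same operation here, so the work
        # can be streamed through a fixed hardware pipeline.
        batch = [step(ray, surface) for ray in batch]
    return batch
```

Both orderings produce the same results. The breadth-first form keeps a deep pipeline filled because consecutive inputs are independent rays undergoing the same computation.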
41. Topics
- Ray tracing
- Conventional parallel processing
- Modulo scheduling
- Coordination of sequential and parallel processing
- Expected performance
42. Performance
43. Performance
44. Performance
45. Performance
46. Summary
- Modulo scheduling produces 100% efficiency of critical resources.
- Sequential processors get a boost from supplemental FPGA processing.
- Deep pipelines are efficient only if filled much of the time.
- FPGAs beat ASICs only if they can take advantage of special problem knowledge.
- Opteron uses 55 W.
- Virtex-II Pro FPGA uses 4 W to 45 W.
47. Equations
- Intersection of a Ray with a Plane
- Intersection of a Ray with a Sphere
- Intersection of a Ray with a Conicoid
- Finding the Perpendicular
- Interaction of a Ray with an Optical Surface
- Coordinate Transformations
48. Intersection of a Ray with a Plane
Point in the plane
Initial direction
Final point
Initial point
Normal to the plane
List of equations
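The slide's equations did not survive extraction. As a stand-in, the sketch below uses the standard ray-plane intersection: for a ray p0 + t·d and a plane through point q with normal n, t = n·(q − p0) / (n·d). The function and parameter names are hypothetical, and this is not necessarily the formulation used in the original slides.

```python
def intersect_plane(p0, d, q, n, eps=1e-12):
    """Return the point where the ray p0 + t*d meets the plane through q with normal n.

    Returns None if the ray is parallel to the plane or the plane lies behind the ray.
    """
    denom = sum(ni * di for ni, di in zip(n, d))           # n . d
    if abs(denom) < eps:
        return None                                        # ray parallel to the plane
    t = sum(ni * (qi - pi) for ni, qi, pi in zip(n, q, p0)) / denom
    if t < 0:
        return None                                        # intersection behind the start point
    return tuple(pi + t * di for pi, di in zip(p0, d))     # final point
```

For example, `intersect_plane((0, 0, 0), (0, 0, 1), (0, 0, 5), (0, 0, 1))` yields the point five units along the z axis.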
49. Intersection of a Ray with a Sphere
Initial direction
Initial point
Final point
List of equations
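The ray-sphere case reduces to a quadratic in the ray parameter t, which is the quadratic equation referred to in the modulo-scheduling slides. The sketch below uses the textbook form with hypothetical names; returning the nearest intersection in front of the start point is an assumption, since an optical code may instead select the root according to the surface's curvature.

```python
import math

def intersect_sphere(p0, d, center, radius):
    """Nearest intersection (smallest t >= 0) of the ray p0 + t*d with a sphere, or None."""
    oc = [p - c for p, c in zip(p0, center)]
    a = sum(x * x for x in d)
    b = 2.0 * sum(x * y for x, y in zip(oc, d))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None                       # the ray misses the sphere
    root = math.sqrt(disc)
    for t in sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))):
        if t >= 0:
            return tuple(p + t * di for p, di in zip(p0, d))
    return None                           # both intersections lie behind the start point
```

The square root and the division sit on the critical path of this computation.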
50. Intersection of a Ray with a Conicoid
Final point
Initial point
Initial direction
List of equations
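A conicoid can be handled the same way as the sphere. The sketch below assumes the common optical-design form c(x² + y² + (1 + k)z²) − 2z = 0, with the vertex at the origin, the axis along z, curvature c, and conic constant k; this surface form, the names, and the choice of the smaller-magnitude root (the branch through the vertex for near-axis rays) are all assumptions rather than the slide's own equations.

```python
import math

def intersect_conicoid(p0, d, c, k):
    """Intersect the ray p0 + t*d with c*(x^2 + y^2 + (1+k)*z^2) - 2*z = 0, or return None."""
    x0, y0, z0 = p0
    dx, dy, dz = d
    g = 1.0 + k
    a = c * (dx * dx + dy * dy + g * dz * dz)
    b = 2.0 * c * (x0 * dx + y0 * dy + g * z0 * dz) - 2.0 * dz
    cc = c * (x0 * x0 + y0 * y0 + g * z0 * z0) - 2.0 * z0
    disc = b * b - 4.0 * a * cc
    if disc < 0:
        return None                               # the ray misses the surface
    # Numerically stable form for the smaller-magnitude root; it also
    # covers a == 0 (for example, a paraboloid hit parallel to its axis).
    denom = -b + math.copysign(math.sqrt(disc), -b)
    if denom == 0:
        return None
    t = 2.0 * cc / denom
    return (x0 + t * dx, y0 + t * dy, z0 + t * dz)
```

With k = 0 the surface is a sphere of radius 1/c, and with k = −1 it is a paraboloid, so the same routine covers both.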
51. Finding the Perpendicular
Unit Vector Normal to a Sphere
Unit Vector Normal to a Conicoid
List of equations
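Unit normals follow from the gradient of each surface's implicit equation. The sketch below is a minimal illustration with hypothetical names. The conicoid is assumed to have the form c(x² + y² + (1 + k)z²) − 2z = 0, and the sign convention (sphere normal pointing away from the center, conicoid normal along the gradient) is an assumption.

```python
import math

def unit(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def sphere_normal(p, center):
    """Unit normal at point p on a sphere, pointing away from the center."""
    return unit(tuple(pi - ci for pi, ci in zip(p, center)))

def conicoid_normal(p, c, k):
    """Unit normal at p on c*(x^2 + y^2 + (1+k)*z^2) - 2*z = 0, taken along the gradient."""
    x, y, z = p
    return unit((c * x, c * y, c * (1.0 + k) * z - 1.0))
```

At the vertex of the conicoid the gradient is (0, 0, −1), so the normal lies along the optical axis as expected.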
52. Interaction of a Ray with an Optical Surface
Refraction
Reflection
Initial index of refraction
Final index of refraction
Normal to the plane
Initial direction
List of equations
Final direction
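The quantities labeled on this slide (initial direction, surface normal, initial and final indices of refraction, final direction) are the inputs and outputs of the vector forms of the laws of reflection and refraction. The sketch below uses those standard forms with hypothetical names. It assumes unit-length direction and normal vectors, and returning None for total internal reflection is a design choice of the sketch.

```python
import math

def reflect(d, n):
    """Reflect unit direction d about unit normal n: r = d - 2 (d . n) n."""
    dn = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2.0 * dn * ni for di, ni in zip(d, n))

def refract(d, n, n1, n2):
    """Refract unit direction d at a surface with unit normal n (Snell's law, vector form).

    n1 is the initial index of refraction and n2 the final one.
    Returns None on total internal reflection.
    """
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    if cos_i < 0:                        # make the normal oppose the incoming ray
        n = tuple(-ni for ni in n)
        cos_i = -cos_i
    mu = n1 / n2
    sin2_t = mu * mu * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None                      # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(mu * di + (mu * cos_i - cos_t) * ni for di, ni in zip(d, n))
```

At normal incidence the refracted direction equals the incident direction, and with equal indices the ray passes through undeviated.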
53. Coordinate Transformations
Position in Frame of Reference k
Position in Frame of Reference k1
Rotation and Translation
Rotation
Direction in Frame of Reference k1
Rotation Matrix
Direction in Frame of Reference k
Translation Vector
List of equations
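The labels above suggest that positions are both rotated and translated between frames, while directions are only rotated. The sketch below assumes the convention p' = R (p − T) for positions and d' = R d for directions, taking "k1" to mean the frame that follows frame k. The convention, the order of rotation and translation, and the function names are assumptions; the original equations were not recoverable.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (sequence of rows) by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def transform_position(p, rotation, translation):
    """Position in the next frame: translate to the new origin, then rotate."""
    shifted = tuple(pi - ti for pi, ti in zip(p, translation))
    return mat_vec(rotation, shifted)

def transform_direction(d, rotation):
    """Direction in the next frame: directions are rotated but not translated."""
    return mat_vec(rotation, d)

# Example: a 90-degree rotation about the z axis and a unit shift along x.
ROT_Z_90 = ((0, -1, 0),
            (1, 0, 0),
            (0, 0, 1))

print(transform_position((2, 0, 0), ROT_Z_90, (1, 0, 0)))
print(transform_direction((1, 0, 0), ROT_Z_90))
```

Each position transform costs nine multiplications and nine additions or subtractions, and each ray needs one at every surface.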