Transcript and Presenter's Notes

Title: Sort-First, Distributed Memory Parallel Visualization and Rendering. Wes Bethel, R3vis Corporation and Lawrence Berkeley National Laboratory


1
Sort-First, Distributed Memory Parallel Visualization and Rendering
Wes Bethel, R3vis Corporation and Lawrence Berkeley National Laboratory
  • Parallel Visualization and Graphics Workshop
  • Sunday October 18, 2003,
  • Seattle, Washington

2
The Actual Presentation Title
  • Why Distributed Memory Parallel Rendering is a
    Challenge Combining OpenRM Scene Graph and
    Chromium for General Purpose Use on Distributed
    Memory Clusters
  • Outline
  • Problem Statement, Desired Outcomes
  • Sort-First Parallel Architectural Overview
  • The Many Ways I Broke Chromium
  • Scene Graph Considerations
  • Demo?
  • Conclusions

3
Motivation and Problem Statement
  • The allure of COTS solutions:
  • Performance of COTS GPUs exceeds that of custom
    silicon.
  • Attractive price/performance of COTS platforms
    (e.g., x86 PCs).
  • Gigabit Ethernet is cheap: $100 per NIC, $500 for
    an 8-port switch.
  • Can build a screamer cluster for about $2K/node.
  • We're accustomed to nice, friendly software
    infrastructure, e.g., hardware-accelerated
    Xinerama.
  • Enter Chromium: the means to use a bunch of PCs
    to do parallel rendering.
  • Parallel submission of graphics streams is a
    custom solution, and presents challenges.
  • Want a flexible, resilient API to interface
    between parallel visualization/rendering
    applications and Chromium.

4
 
 
5
Our Approach
  • Distributed memory parallel visualization
    application design amortizes expensive data I/O
    and visualization across many nodes.
  • The scene graph layer mediates interaction
    between the application and the rendering
    subsystem: it provides portability, hides the icky
    parallel rendering details, and provides an
    infrastructure for accelerating rendering.
  • Chromium provides routing of graphics commands to
    support hardware-accelerated rendering on a
    variety of platforms.
  • Focus on COTS solutions: all hardware and
    software we used is cheap (PC cluster) or free
    (software).
  • Focus on simplicity: our sample applications are
    straightforward in implementation, easily
    reproducible by others, and highly portable.
  • Want an infrastructure that is suitable for use
    regardless of the type of parallel programming
    model used by the application.
  • No parallel objects in the scene graph!

6
Our Approach, ctd.
7
The Many Ways I Broke Chromium
  • Retained-mode object namespace collisions in
    parallel submission.
  • Broadcasting: how to burn up bandwidth without
    even trying!
  • Scene graph issues: to be discussed in our PVG
    paper presentation on Monday.

8
The Collision Problem
  • Want to use OpenGL retained-mode semantics and
    structures to realize performance gains in a
    distributed memory environment.
  • Problem:
  • Namespace collision of retained-mode
    identifiers during parallel submission of
    graphics commands.
  • Example (see the code below):
  • The problem exists for all OpenGL retained-mode
    objects: display lists, texture object ids, and
    programs.

Process A:
GLuint n = glGenLists(1);
printf("id = %d\n", n);    // prints the same id in every process
// build list with glNewList(n, ...), draw with list

Process B:
GLuint n = glGenLists(1);
printf("id = %d\n", n);    // same id as Process A: the collision
// build list with glNewList(n, ...), draw with list
9
Manifestation of Collision Problem
  • Show image of four textured quads when the
    problem is present.

10
Desired Result
  • Show image of four textured quads when the
    problem is fixed.

11
Resolving the Collision Problem
  • New CR configuration file options:
  • shared_textures, shared_display_lists,
    shared_programs
  • When set to 1, beware of collisions in parallel
    submission.
  • When set to 0, collisions are resolved in parallel
    submission.
  • Setting the shared_* options to zero enforces
    unique retained-mode identifiers across all
    parallel submitters.
  • Thanks, Brian!

12
The Broadcast Problem
  • What's the problem?
  • Geometry and textures from N application PEs are
    replicated across M crservers. We bumped into
    limits of memory and bandwidth.
  • To Chromium, a display list is an opaque blob of
    stuff. Tilesort doesn't peek inside a display
    list to see where it should be sent.
  • Early performance testing showed two types of
    broadcasting:
  • Display lists being broadcast from one tilesort
    to all servers.
  • Textures associated with textured geometry in
    display lists were being broadcast from one
    tilesort to all servers.

13
Broadcast Workarounds (Short Term)
  • Help Chromium decide how to route textures with
    the GL_OBJECT_BBOX_CR extension.
  • Don't use display lists (for now).
  • Immediate-mode geometry isn't broadcast.
    Sorting/routing is accelerated by using
    GL_OBJECT_BBOX_CR to give tilesort hints, so it
    doesn't have to look at all vertices in a
    geometry blob to make routing decisions (see the
    sketch after this list).
  • For scientific visualization, which generates
    lots of geometry, this is clearly a problem.
  • Our volume rendering application uses 3D textures
    (N³ data) and textured geometry (N² data), so
    the heavy payload data isn't broadcast. The
    cost is immediate-mode transmission of geometry
    (approximately 36 KB/frame of geometry as compared
    to 160 MB/frame of texture data).
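As a concrete illustration of the bounding-box hint, here is a minimal sketch of tagging an immediate-mode geometry blob with its object-space bounds before submission. The glChromiumParametervCR() entry point, its (target, type, count, values) argument order, and the six-float (xmin, ymin, zmin, xmax, ymax, zmax) layout for GL_OBJECT_BBOX_CR are assumptions about the Chromium extension and should be checked against the Chromium headers.

#include <GL/gl.h>
/* plus Chromium's extension header for GL_OBJECT_BBOX_CR
   and glChromiumParametervCR (assumed) */

/* Sketch: give tilesort an object-space bounding box so it can route
   this quad to the right crserver(s) without scanning its vertices. */
static void draw_quad_with_bbox_hint(const GLfloat v[4][3])
{
    GLfloat bbox[6];   /* xmin, ymin, zmin, xmax, ymax, zmax */
    for (int k = 0; k < 3; ++k) {
        bbox[k] = bbox[3 + k] = v[0][k];
        for (int i = 1; i < 4; ++i) {
            if (v[i][k] < bbox[k])     bbox[k]     = v[i][k];
            if (v[i][k] > bbox[3 + k]) bbox[3 + k] = v[i][k];
        }
    }

    /* Assumed Chromium call: hint the bounds of the geometry that follows. */
    glChromiumParametervCR(GL_OBJECT_BBOX_CR, GL_FLOAT, 6, bbox);

    glBegin(GL_QUADS);                 /* immediate mode: not broadcast */
    for (int i = 0; i < 4; ++i)
        glVertex3fv(v[i]);
    glEnd();
}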

14
Broadcast Workarounds (Long Term)
  • Funding for Chromium developers to implement
    display list caching and routing, similar to
    existing capabilities to manage texture objects.
  • Lurking problems:
  • Aging of display lists: the crserver is like a
    roach motel; display lists check in, but they
    never check out.
  • Adding retained-mode object aging and management
    to applications is an unreasonable burden (IMO).
  • There exists no commonly accepted mechanism for
    LRU aging, etc. in the graphics API. Such an
    aging mechanism will probably show up as an
    extension.
  • Better as an extension with tunable parameters
    than requiring applications to reach deeply
    into graphics API implementations.

15
Scene Graph Issues and Performance Analysis
  • Discussed in our 2003 PVG Paper (Monday
    afternoon).
  • Our parallel scene graph implementation can be
    used by any parallel application, regardless of
    the type of parallel programming model used by
    the application developer.
  • The big issue in sort-first is how much data is
    duplicated. What we've seen so far: about 1.8x
    duplication was required for the first frame (in
    a hardware-accelerated volume rendering
    application).
  • While the scene graph supports any type of
    parallel operation, certain types of
    synchronization are required to ensure correct
    rendering. These can be achieved using only
    Chromium barriers; no parallel objects in the
    scene graph are required.

16
Some Performance Graphs
  • Bandwidth vs. Traffic

17
Parallel Scene Graph API Stuff
  • Collectives:
  • rmPipeSetCommSize(), rmPipeGetCommSize()
  • rmPipeSetMyRank(), rmPipeGetMyRank()
  • Chromium-specific:
  • rmPipeBarrierCreateCR()
  • Creates a Chromium barrier; the number of
    participants is the value specified with
    rmPipeSetCommSize().
  • rmPipeBarrierExecCR()
  • Doesn't block application code execution.
  • Used to synchronize rendering across the
    rmPipeGetCommSize() streams of graphics commands
    (a usage sketch follows this list).
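For orientation, here is a minimal sketch of how a per-PE render loop might use these calls. The exact OpenRM signatures (an RMpipe argument, integer rank and size, and rmFrame() as the draw call) are assumptions here, not the verified API; consult the OpenRM headers.

#include <rm/rm.h>   /* OpenRM scene graph header (assumed path) */

/* Hypothetical per-PE setup and frame loop; the rmPipe* argument lists
   are guesses based on the function names on this slide. */
void render_frames(RMpipe *pipe, RMnode *scene_root,
                   int my_rank, int num_pes, int nframes)
{
    /* Describe the parallel configuration to the pipe. */
    rmPipeSetMyRank(pipe, my_rank);
    rmPipeSetCommSize(pipe, num_pes);

    /* Create a Chromium barrier sized to the comm size set above. */
    rmPipeBarrierCreateCR(pipe);

    for (int frame = 0; frame < nframes; ++frame) {
        /* Synchronize the num_pes graphics streams at the crservers;
           this does not block the application itself. */
        rmPipeBarrierExecCR(pipe);

        /* Each PE renders only its local portion of the scene. */
        rmFrame(pipe, scene_root);
    }
}

Each PE runs the same loop; the Chromium barrier keeps the parallel streams' frames aligned without introducing any parallel objects into the scene graph itself.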

18
Demo Application: Parallel Isosurface
19
Demo Application: Parallel Volume Rendering
20
Demo Application: Parallel Volume Rendering with LOD Volumes
21
Conclusions
  • We met our objectives:
  • A general-purpose infrastructure for parallel
    visualization and hardware-accelerated rendering
    on PC clusters.
  • The infrastructure can be used by any parallel
    application, regardless of the parallel
    programming model.
  • The architecture scaled well from one to 24
    displays, supporting extremely high-resolution
    output (e.g., 7860x4096).
  • We bumped into network bandwidth limits (not a
    big surprise).
  • Display lists are still broadcast in Chromium.
    Please fund the Chromium developers to add this
    much-needed routing capability, which is
    fundamental for efficient sort-first operation
    of clusters.

22
Sources of Software
  • OpenRM Scene Graph: www.openrm.org
  • Source code for OpenRM/Chromium applications:
    www.openrm.org
  • Chromium: chromium.sourceforge.net

23
Acknowledgement
  • This work was supported by the U. S. Department
    of Energy, Office of Science, Office of Advanced
    Scientific Computing Research under SBIR grant
    DE-FE03-02ER83443.
  • The authors wish to thank Randall Frank of the
    ASCI/VIEWS program at Lawrence Livermore National
    Laboratory and the Scientific Computing and
    Imaging Institute at the University of Utah for
    use of computing facilities during the course of
    this research.
  • The Argon shock bubble dataset was provided
    courtesy of John Bell and Vince Beckner at the
    Center for Computational Sciences and
    Engineering, Lawrence Berkeley National
    Laboratory.