Transcript and Presenter's Notes

Title: Sort-First, Distributed Memory Parallel Visualization and Rendering. Wes Bethel, R3vis Corporation and Lawrence Berkeley National Laboratory


1
Sort-First, Distributed Memory Parallel Visualization and Rendering
Wes Bethel, R3vis Corporation and Lawrence Berkeley National Laboratory
  • Parallel Visualization and Graphics Workshop
  • Sunday October 18, 2003,
  • Seattle, Washington

2
The Actual Presentation Title
  • Why Distributed Memory Parallel Rendering is a
    Challenge Combining OpenRM Scene Graph and
    Chromium for General Purpose Use on Distributed
    Memory Clusters
  • Outline
  • Problem Statement, Desired Outcomes
  • Sort-First Parallel Architectural Overview
  • The Many Ways I Broke Chromium
  • Scene Graph Considerations
  • Demo?
  • Conclusions

3
Motivation and Problem Statement
  • The allure of COTS solutions:
  • Performance of COTS GPUs exceeds that of custom
    silicon.
  • Attractive price/performance of COTS platforms
    (e.g., x86 PCs).
  • Gigabit Ethernet is cheap: $100 per NIC, $500 for
    an 8-port switch.
  • Can build a screamer cluster for about $2K/node.
  • We're accustomed to nice, friendly software
    infrastructure, e.g., hardware-accelerated
    Xinerama.
  • Enter Chromium: the means to use a bunch of PCs
    to do parallel rendering.
  • Parallel submission of graphics streams is a
    custom solution, and presents challenges.
  • Want a flexible, resilient API to interface
    between parallel visualization/rendering
    applications and Chromium.

4
 
 
5
Our Approach
  • Distributed memory parallel visualization
    application design amortizes expensive data I/O
    and visualization across many nodes.
  • The scene graph layer mediates interaction
    between the application and the rendering
    subsystem: it provides portability, hides the icky
    parallel rendering details, and provides an
    infrastructure for accelerating rendering.
  • Chromium provides routing of graphics commands to
    support hardware-accelerated rendering on a
    variety of platforms.
  • Focus on COTS solutions: all hardware and
    software we used is cheap (PC cluster) or free
    (software).
  • Focus on simplicity: our sample applications are
    straightforward in implementation, easily
    reproducible by others, and highly portable.
  • Want an infrastructure that is suitable for use
    regardless of the type of parallel programming
    model used by the application.
  • No parallel objects in the scene graph!

6
Our Approach, ctd.
7
The Many Ways I Broke Chromium
  • Retained-mode object namespace collisions in
    parallel submission.
  • Broadcasting: how to burn up bandwidth without
    even trying!
  • Scene graph issues: to be discussed in our PVG
    paper presentation on Monday.

8
The Collision Problem
  • Want to use OpenGL retained-mode semantics and
    structures to realize performance gains in a
    distributed memory environment.
  • Problem:
  • Namespace collision of retained-mode
    identifiers during parallel submission of
    graphics commands.
  • Example (see the code below):
  • The problem exists for all OpenGL retained-mode
    objects: display lists, texture object ids, and
    programs.

Process A:
GLuint n = glGenLists(1);
printf("id = %d\n", n);    // prints the same id in every process
// build list with glNewList(n, ...), draw with list

Process B:
GLuint n = glGenLists(1);
printf("id = %d\n", n);    // same id as Process A: the collision
// build list with glNewList(n, ...), draw with list
9
Manifestation of Collision Problem
  • Show image of four textured quads when the
    problem is present.

10
Desired Result
  • Show image of four textured quads when the
    problem is fixed.

11
Resolving the Collision Problem
  • New CR configuration file options:
  • shared_textures, shared_display_lists,
    shared_programs
  • When set to 1, beware of collisions in parallel
    submission.
  • When set to 0, collisions are resolved in parallel
    submission.
  • Setting the shared_* options to zero enforces
    unique retained-mode identifiers across all
    parallel submitters.
  • Thanks, Brian!

12
The Broadcast Problem
  • What's the problem?
  • Geometry and textures from N application PEs are
    replicated across M crservers. We bumped into
    limits of memory and bandwidth.
  • To Chromium, a display list is an opaque blob of
    stuff. Tilesort doesn't peek inside a display
    list to see where it should be sent.
  • Early performance testing showed two types of
    broadcasting:
  • Display lists being broadcast from one tilesort
    to all servers.
  • Textures associated with textured geometry in
    display lists were being broadcast from one
    tilesort to all servers.

13
Broadcast Workarounds (Short Term)
  • Help Chromium decide how to route textures with
    the GL_OBJECT_BBOX_CR extension.
  • Don't use display lists (for now).
  • Immediate-mode geometry isn't broadcast.
    Sorting/routing is accelerated by using
    GL_OBJECT_BBOX_CR to give tilesort hints, so it
    doesn't have to look at all vertices in a
    geometry blob to make routing decisions (see the
    sketch after this list).
  • For scientific visualization, which generates
    lots of geometry, this is clearly a problem.
  • Our volume rendering application uses 3D textures
    (N³ data) and textured geometry (N² data), so
    the heavy payload data isn't broadcast. The
    cost is immediate-mode transmission of geometry
    (approximately 36 KB/frame of geometry as compared
    to 160 MB/frame of texture data).
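As a concrete illustration of the bounding-box hint, here is a minimal sketch of tagging an immediate-mode geometry blob with its object-space bounds before submission. The glChromiumParametervCR() entry point, its (target, type, count, values) argument order, and the six-float (xmin, ymin, zmin, xmax, ymax, zmax) layout for GL_OBJECT_BBOX_CR are assumptions about the Chromium extension and should be checked against the Chromium headers.

#include <GL/gl.h>
/* plus Chromium's extension header for GL_OBJECT_BBOX_CR
   and glChromiumParametervCR (assumed) */

/* Sketch: give tilesort an object-space bounding box so it can route
   this quad to the right crserver(s) without scanning its vertices. */
static void draw_quad_with_bbox_hint(const GLfloat v[4][3])
{
    GLfloat bbox[6];   /* xmin, ymin, zmin, xmax, ymax, zmax */
    for (int k = 0; k < 3; ++k) {
        bbox[k] = bbox[3 + k] = v[0][k];
        for (int i = 1; i < 4; ++i) {
            if (v[i][k] < bbox[k])     bbox[k]     = v[i][k];
            if (v[i][k] > bbox[3 + k]) bbox[3 + k] = v[i][k];
        }
    }

    /* Assumed Chromium call: hint the bounds of the geometry that follows. */
    glChromiumParametervCR(GL_OBJECT_BBOX_CR, GL_FLOAT, 6, bbox);

    glBegin(GL_QUADS);                 /* immediate mode: not broadcast */
    for (int i = 0; i < 4; ++i)
        glVertex3fv(v[i]);
    glEnd();
}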

14
Broadcast Workarounds (Long Term)
  • Funding for Chromium developers to implement
    display list caching and routing, similar to
    existing capabilities to manage texture objects.
  • Lurking problems:
  • Aging of display lists: the crserver is like a
    roach motel; display lists check in, but they
    never check out.
  • Adding retained-mode object aging and management
    to applications is an unreasonable burden (IMO).
  • There exists no commonly accepted mechanism for
    LRU aging, etc. in the graphics API. Such an
    aging mechanism will probably show up as an
    extension.
  • Better as an extension with tunable parameters
    than requiring applications to reach deeply
    into graphics API implementations.

15
Scene Graph Issues and Performance Analysis
  • Discussed in our 2003 PVG Paper (Monday
    afternoon).
  • Our parallel scene graph implementation can be
    used by any parallel application, regardless of
    the type of parallel programming model used by
    the application developer.
  • The big issue in sort-first is how much data is
    duplicated. What we've seen so far: about 1.8x
    duplication was required for the first frame (in
    a hardware-accelerated volume rendering
    application).
  • While the scene graph supports any type of
    parallel operation, certain types of
    synchronization are required to ensure correct
    rendering. These can be achieved using only
    Chromium barriers; no parallel objects in the
    scene graph are required.

16
Some Performance Graphs
  • Bandwidth vs. Traffic

17
Parallel Scene Graph API Stuff
  • Collectives:
  • rmPipeSetCommSize(), rmPipeGetCommSize()
  • rmPipeSetMyRank(), rmPipeGetMyRank()
  • Chromium-specific:
  • rmPipeBarrierCreateCR()
  • Creates a Chromium barrier; the number of
    participants is the value specified with
    rmPipeSetCommSize().
  • rmPipeBarrierExecCR()
  • Doesn't block application code execution.
  • Used to synchronize rendering across the
    rmPipeGetCommSize() streams of graphics commands
    (a usage sketch follows this list).
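For orientation, here is a minimal sketch of how a per-PE render loop might use these calls. The exact OpenRM signatures (an RMpipe argument, integer rank and size, and rmFrame() as the draw call) are assumptions here, not the verified API; consult the OpenRM headers.

#include <rm/rm.h>   /* OpenRM scene graph header (assumed path) */

/* Hypothetical per-PE setup and frame loop; the rmPipe* argument lists
   are guesses based on the function names on this slide. */
void render_frames(RMpipe *pipe, RMnode *scene_root,
                   int my_rank, int num_pes, int nframes)
{
    /* Describe the parallel configuration to the pipe. */
    rmPipeSetMyRank(pipe, my_rank);
    rmPipeSetCommSize(pipe, num_pes);

    /* Create a Chromium barrier sized to the comm size set above. */
    rmPipeBarrierCreateCR(pipe);

    for (int frame = 0; frame < nframes; ++frame) {
        /* Synchronize the num_pes graphics streams at the crservers;
           this does not block the application itself. */
        rmPipeBarrierExecCR(pipe);

        /* Each PE renders only its local portion of the scene. */
        rmFrame(pipe, scene_root);
    }
}

Each PE runs the same loop; the Chromium barrier keeps the parallel streams' frames aligned without introducing any parallel objects into the scene graph itself.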

18
Demo Application: Parallel Isosurface
19
Demo Application: Parallel Volume Rendering
20
Demo Application: Parallel Volume Rendering with LOD Volumes
21
Conclusions
  • We met our objectives:
  • A general-purpose infrastructure for parallel
    visualization and hardware-accelerated rendering
    on PC clusters.
  • The infrastructure can be used by any parallel
    application, regardless of the parallel
    programming model.
  • The architecture scaled well from one to 24
    displays, supporting extremely high-resolution
    output (e.g., 7860x4096).
  • We bumped into network bandwidth limits (not a
    big surprise).
  • Display lists are still broadcast in Chromium.
    Please fund the Chromium developers to add this
    much-needed routing capability, which is
    fundamental for efficient sort-first operation
    of clusters.

22
Sources of Software
  • OpenRM Scene Graph: www.openrm.org
  • Source code for OpenRM/Chromium applications:
    www.openrm.org
  • Chromium: chromium.sourceforge.net

23
Acknowledgement
  • This work was supported by the U. S. Department
    of Energy, Office of Science, Office of Advanced
    Scientific Computing Research under SBIR grant
    DE-FE03-02ER83443.
  • The authors wish to thank Randall Frank of the
    ASCI/VIEWS program at Lawrence Livermore National
    Laboratory and the Scientific Computing and
    Imaging Institute at the University of Utah for
    use of computing facilities during the course of
    this research.
  • The Argon shock bubble dataset was provided
    courtesy of John Bell and Vince Beckner at the
    Center for Computational Sciences and
    Engineering, Lawrence Berkeley National
    Laboratory.