Computing Architectures for Virtual Reality - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Computing Architectures for Virtual Reality
Electrical and Computer Engineering Dept.
2

Computer (rendering pipeline)
System architecture
3

Computing Architectures
The VR Engine
Definition: A key component of the VR system
which reads its input devices, accesses
task-dependent databases, updates the state of
the virtual world, and feeds the results to the
output displays. It is an abstraction: it can
mean one computer, several co-located cores in
one computer, several co-located computers, or
many remote computers collaborating in a
distributed simulation
4

Computing Architectures
  • The real-time characteristic of VR requires a VR
    engine which is powerful in order to assure
  • fast graphics and haptics refresh rates (30 fps
    for graphics and hundreds of Hz for haptics)
  • low latencies (<100 ms to avoid simulation
    sickness)
  • At the core of such an architecture is the
    rendering pipeline.
  • Within the scope of this course, rendering is
    extended to include haptics

5

Computing Architectures
The Graphics Rendering Pipeline
The process of creating a 2-D image from a 3-D
model is called rendering. The rendering
pipeline has three functional stages. The speed
of the pipeline is that of its slowest stage.
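This constraint can be expressed directly: the frame rate the pipeline delivers is the minimum over its stages. A minimal sketch (the per-stage rates are hypothetical numbers for illustration):

```python
# Per-stage throughput in frames/sec (hypothetical numbers for illustration).
stage_rates = {"application": 90.0, "geometry": 60.0, "rasterizer": 75.0}

def pipeline_fps(rates):
    """The pipeline runs only as fast as its slowest stage."""
    return min(rates.values())

# The slowest stage is, by definition, the bottleneck.
bottleneck = min(stage_rates, key=stage_rates.get)
print(pipeline_fps(stage_rates), bottleneck)  # 60.0 geometry
```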
6

The Graphics Rendering Pipeline
Old rendering pipelines were implemented in
software (slow). Modern pipeline architectures use
parallelism and buffering. The application stage
is implemented in software, while the other
stages are hardware-accelerated.
7
  • Modern pipelines also do anti-aliasing for
    points, lines or the whole scene

Aliased polygons (jagged edges)
Anti-aliased polygons
8
  • How is anti-aliasing done? Each pixel is
    subdivided (sub-sampled) into n regions, and each
    sub-pixel has a color

The anti-aliased pixel is given a shade of
green-blue (5/16 blue, 11/16 green). Without
sub-sampling the pixel would have been entirely
green, the color of the center of the pixel
(from Wildcat manual)
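The sub-sampling described above amounts to averaging the sub-pixel colors. A small sketch (colors and sample counts chosen to match the 5/16 blue, 11/16 green example):

```python
def antialias_pixel(subsamples):
    """Average the sub-sample colors to get the final pixel color.
    Each sub-sample is an (r, g, b) tuple with components in [0, 1]."""
    n = len(subsamples)
    return tuple(sum(c[i] for c in subsamples) / n for i in range(3))

# 16 sub-samples: 5 blue, 11 green, as in the slide's example pixel.
blue, green = (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)
samples = [blue] * 5 + [green] * 11
print(antialias_pixel(samples))  # (0.0, 0.6875, 0.3125)
```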
9
  • More samples produce better anti-aliasing

8 sub-samples/pixel
16 sub-samples/pixel
From Wildcat SuperScene manual
http://62.189.42.82/product/technology/superscene_antialiasing.htm
10

Ideal vs. real pipeline output (fps) vs. scene
complexity (Influence of pipeline
bottlenecks)
HP 9000 workstation
11

Computing Architectures
The Rendering Pipeline
12
  • The application stage
  • Is done entirely in software by the CPU
  • It reads input devices (such as gloves, mouse)
  • It changes the coordinates of the virtual
    camera
  • It performs collision detection and collision
    response (based on object properties) for
    haptics
  • One form of collision response is force feedback.

13
  • Application stage optimization
  • Reduce model complexity (models with fewer
    polygons mean less to feed down the pipe)

Higher-resolution model: 134,754 polygons.
Low-resolution model: 600 polygons
14
  • Application stage optimization
  • Reduce floating-point precision (single
    precision instead of double precision)
  • Minimize the number of divisions
  • Since all is done by the CPU, a dual-processor
    (super-scalar) architecture is recommended to
    increase speed.

15

Computing Architectures
The Rendering Pipeline
Rendering pipeline
16
  • The geometry stage
  • Is done in hardware
  • Consists first of model and view transforms
    (to be discussed in Chapter 5)
  • Next the scene is shaded based on light models
  • Finally the scene is projected, clipped, and
    mapped to screen coordinates.

17
  • The lighting sub-stage
  • It calculates the surface color based on
  • the type and number of simulated light sources
  • the lighting model
  • the reflective surface properties
  • atmospheric effects such as fog or smoke.
  • Lighting results in object shading, which makes
    the scene more realistic.

18

Computing architectures
I_λ = I_aλ K_a O_dλ + f_att I_pλ [K_d O_dλ cos θ + K_s O_sλ cos^n α]

where I_λ is the intensity of light of wavelength λ
      I_aλ is the intensity of ambient light
      K_a is the surface ambient reflection coefficient
      O_dλ is the object diffuse color
      f_att is the atmospheric attenuation factor
      I_pλ is the intensity of the point light source of wavelength λ
      K_d is the diffuse reflection coefficient
      K_s is the specular reflection coefficient
      O_sλ is the object specular color
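As an illustration, this illumination model can be evaluated per wavelength in a few lines (variable names follow the slide's symbols; the sample coefficients are made up):

```python
import math

def illumination(Ia, Ka, Od, f_att, Ip, Kd, Ks, Os, theta, alpha, n):
    """Single-wavelength intensity per the slide's lighting model:
    I = Ia*Ka*Od + f_att*Ip*(Kd*Od*cos(theta) + Ks*Os*cos(alpha)**n)
    theta: angle between surface normal and light direction (radians)
    alpha: angle between reflection and viewing directions (radians)
    n:     specular (shininess) exponent
    """
    ambient = Ia * Ka * Od
    diffuse = Kd * Od * math.cos(theta)
    specular = Ks * Os * math.cos(alpha) ** n
    return ambient + f_att * Ip * (diffuse + specular)

# Light shining straight down the normal, viewed along the reflection:
print(round(illumination(0.2, 0.3, 1.0, 1.0, 1.0, 0.7, 0.5, 1.0, 0.0, 0.0, 50), 2))  # 1.26
```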
19
  • The lighting sub-stage optimization
  • It takes less computation for fewer lights
    in the scene
  • The simpler the shading model, the fewer
    computations (and the less realism)
  • Wire-frame models
  • Flat-shaded models
  • Gouraud-shaded
  • Phong-shaded.

20
  • The lighting models
  • Wire-frame is simplest: it only shows the
    polygons' visible edges
  • The flat-shaded model assigns the same color to
    all pixels on a polygon (or side) of the object
  • Gouraud or smooth shading interpolates colors
    inside the polygons based on the colors of the
    edges
  • Phong shading interpolates the vertex normals
    before calculating the light intensity based on
    the model described; it is the most realistic
    shading model.

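The difference between flat and Gouraud shading can be shown with a toy color interpolation (a sketch along a single polygon edge; real Gouraud shading interpolates across scanlines as well):

```python
def flat_shade(vertex_colors):
    """Flat shading: the whole polygon gets one color (here, the first vertex's)."""
    return vertex_colors[0]

def gouraud_shade(c0, c1, t):
    """Gouraud shading: linearly interpolate vertex colors along an edge,
    t in [0, 1] from vertex 0 to vertex 1."""
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(flat_shade([red, blue]))        # (1.0, 0.0, 0.0)
print(gouraud_shade(red, blue, 0.5))  # (0.5, 0.0, 0.5)
```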
21

Computing architectures
Wire-frame model
Flat shading model
Gouraud shading model
22
  • Rendering speed vs. surface polygon type
  • The way surfaces are described influences
    rendering speed.
  • If surfaces are described by triangle meshes,
    the rendering will be faster than for the same
    object described by independent quadrangles or
    higher-order polygons. This is due to the
    graphics board architecture, which may be
    optimized to render triangles.
  • Example: the rendering speed of the SGI Reality
    Engine.

23

SGI Onyx 2 with Infinite Reality
24

Computing Architectures
The Rendering Pipeline
25
  • The Rasterizer Stage
  • Performs operations in hardware for speed
  • Converts 2-D vertex information from the
    geometry stage (x, y, z, color, texture) into
    pixel information on the screen
  • The pixel color information is in the color buffer
  • The pixel z-value is stored in the Z-buffer
    (which has the same size as the color buffer)
  • Assures that the primitives visible from
    the point of view of the camera are displayed.

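A minimal sketch of the Z-buffer test described above (a tiny software-rasterizer fragment; buffer size and colors are arbitrary):

```python
W, H = 4, 4
color_buffer = [[(0, 0, 0)] * W for _ in range(H)]
z_buffer = [[float("inf")] * W for _ in range(H)]  # same size as color buffer

def write_fragment(x, y, z, color):
    """Keep a fragment only if it is closer to the camera than what is
    already stored at that pixel (smaller z wins here)."""
    if z < z_buffer[y][x]:
        z_buffer[y][x] = z
        color_buffer[y][x] = color

write_fragment(1, 1, 5.0, (255, 0, 0))  # red fragment at depth 5
write_fragment(1, 1, 2.0, (0, 255, 0))  # closer green fragment overwrites it
write_fragment(1, 1, 9.0, (0, 0, 255))  # farther blue fragment is discarded
print(color_buffer[1][1])  # (0, 255, 0)
```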
26
  • The Rasterizer Stage - continued
  • The scene is rendered in the back buffer
  • It is then swapped with the front buffer which
  • stores the current image being displayed
  • This process eliminates flicker and is called
  • double buffering
  • All the buffers on the system are grouped into
    the
  • frame buffer.

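Double buffering can be sketched as a draw-then-swap loop (the render function below is a stand-in for the real rasterizer; buffers are tiny lists for illustration):

```python
# front: image currently displayed; back: image being drawn off-screen.
front, back = [0] * 4, [0] * 4

def render(buffer, frame_id):
    """'Draw' a new frame into the given buffer (stand-in for rasterization)."""
    for i in range(len(buffer)):
        buffer[i] = frame_id

for frame_id in (1, 2, 3):
    render(back, frame_id)     # draw into the hidden back buffer
    front, back = back, front  # swap: the finished frame goes on screen

print(front)  # [3, 3, 3, 3]  -> the viewer never sees a half-drawn frame
```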
27
  • Testing for pipeline bottlenecks
  • If the CPU operates at 100%, then the pipeline is
    CPU-limited (bottleneck in the application stage)
  • If the performance increases when all light
    sources are removed, then the pipeline is
    transform-limited (bottleneck in the geometry
    stage)
  • If the performance increases when the resolution
    of the display window, or its size, is reduced,
    then the pipeline is fill-limited (bottleneck in
    the rasterizer stage).

28

Transform-limited (reduce level of detail)
Fill-limited (increase realism)
29

The Pipeline Balancing
Single buffering



Application (75%)
Geometry (75%)
Rasterizer (100%)
Double buffering, balanced pipeline



Application (90%)
Geometry (95%)
Rasterizer (100%)
30

Computing Architectures
The Haptics Rendering Pipeline
The process of computing the forces and
mechanical textures associated with haptic
feedback. It is done in software and in hardware.
It also has three stages.
31
PC graphics architecture PC is King!
  • Went from a 66 MHz Intel 486 in 1994 to a 3.6 GHz
    Pentium IV today
  • Newer PC CPUs are dual (or quad) core, which
    improves performance by 50%
  • Went from 7,000 Gouraud-shaded poly/sec (Spea
    Fire board) in 1994 to 27 million Gouraud-shaded
    poly/sec (Fire GL 2, used to be in our lab)
  • Today PCs are used for single or multiple users,
    single or tiled displays
  • Intensely competitive industry.

32
PC bus architecture just as important
  • Went from the 33 MHz Peripheral Component
    Interconnect (PCI) bus to the 266 MHz Accelerated
    Graphics Port (AGP4x) bus, and doubled again in
    AGP8x
  • Larger throughput and lower latency, since
    address bus lines are decoupled from data lines.
    AGP uses sideband lines.

33

Intel 820/850 chipset
Graphics Accelerator (memory, processors)
AGP 8x rate: 2 GBps unidirectional (533 MHz x 32 bits)
PCI transfer rate: 133 MBps (33 MHz x 32 bits)
PCI Express rate: 4 GBps bidirectional
Today's PC system architecture
34

PC system architecture for the VR Teaching Lab
35

PC system architecture for VR Teaching Lab
36

Fire GL 2
Stereo glasses connector
Passive coolers
AGP bus connector
37

Fire GL 2 architecture
38
  • Fire GL 2 features
  • 27 million Gouraud-shaded, non-textured
    polygons/sec
  • Fill rate is 410 MPixels/sec.
  • supports up to 16 light sources
  • has a 300 MHz D/A converter

39
Stereo glasses connector
Fire GL X3 256
Passive coolers
DVI-I video output
AGP bus connector
40

Fire GL X3-256 architecture
  • 24-bit pixel processing, 12 pixel pipes
  • dual 10-bit DAC and dual DVI-I connections
  • does not have Genlock
  • anti-aliased points and lines
  • quad-buffered stereo 3D support (2 front and 2
    back buffers)

41

NVIDIA Quadro FX 4000
500 MHz DDR Memory
Graphics Processing Unit (GPU)
42

NVIDIA Quadro FX 4000 architecture
  • dual DVI-I connections
  • 32-bit pixel processing, 16 pixel pipes
  • has Genlock
  • anti-aliased points and lines
  • quad-buffered stereo 3D support

43

FireGL X3-256 vs. NVIDIA Quadro vs 3DLabs
44
CPU Evolution to Multi-Core
  • Places several processors on a single chip.
  • It has faster communication between cores than
    between separate processors
  • Each core has its own resources (L1 and L2
    caches) unlike multi-threads on a single core.
  • It is more energy efficient and results in higher
    performance

45
Multi-core details
46
AMD64 x2 Architecture
47
Guts of Native Quad Core (Next Gen)
48
  • Aims at a balance between hardware, software and
    services
  • Has a flexible design, abandoning the
    nVidia-only deal of the Xbox
  • Uses a multi-core design on a single die, like
    having three PowerPC CPUs running at 3.2 GHz
  • Each of the three cores can process two threads
    at a time (like 6 conventional processors)
  • Each core has a SIMD unit, which exploits
    real-time graphics data parallelism

The X-Box 360
49

The X-Box 360
  • The GPU has a Unified Shader Architecture,
    meaning one unit that does both geometry and
    rasterization stage (vs. separate vertex and
    pixel shaders)
  • The Arbiter retrieves commands from the
    Reservation Stations and delivers them to the
    appropriate Processing Engine
  • The xBox 360 has several Arbiters and 48 ALUs

50

The X-Box 360
  • The GPU has embedded 10 MB DRAM for use as a
    frame buffer
  • Resolution up to 1920x1080 with full-screen
    anti-aliasing
  • The GPU has the memory controller connecting to
    the 3 cores at 22 GB/sec
  • Renders 500 million triangles/sec and fill rate
    of 16 Gsamples/sec

51
PlayStation 3 Information
  • Two simultaneous High-definition television
    streams for use on a title screen for a HD
    Blu-ray Movie.
  • High-definition IP video conferencing.
  • EyeToy interactive reality game.
  • EyeToy voice command recognition.
  • EyeToy virtual object manipulation.
  • Digital photograph display (JPEG).
  • MP3 and ATRAC download and playback.
  • Simultaneous World Wide Web access and
    gameplay.
  • Hub/Home Ethernet Gaming Network.
  • The Ability to Have 7 Controllers at Once

52
PS3 Specs
  • PS3 CPU: Cell Processor
  • Developed by IBM
  • PowerPC-based core @ 3.2 GHz
  • 1 VMX vector unit per core
  • 512 KB L2 cache
  • 7 x SPE @ 3.2 GHz
  • 7 x 128b 128 SIMD GPRs
  • 7 x 256 KB SRAM for SPE
  • 1 of 8 SPEs reserved for redundancy
  • Total floating-point performance: 218 GFLOPS

53
Cell Processor Architecture
  • The PowerPC core present in the system is a
    general-purpose 64-bit PowerPC processor that
    handles the Cell BE's general-purpose workload
    (or, the operating system) and manages
    special-purpose workloads for the SPEs.
  • The SPEs are SIMD units capable of operating on
    128-bit vectors consisting of four 32-bit operand
    types at a time. Each SPE has a large register
    file of 128x128-bit registers for operating on
    128-bit vector data types and has an instruction
    set heavily biased towards vector computation.
    The SPEs have a fairly simple implementation to
    save power and silicon area.

54
Element Interconnect Bus(the communication path)
  • It turns out that the physical center of the
    processor is not any of the processor
    elements, but the bus which connects them.
  • Main memory bandwidth about 25.6GB/s
  • I/O bandwidth 35GB/s inbound and another 40GB/s
    outbound
  • and a fair amount of bandwidth left over for
    moving data within the processor.

55
PlayStation 3 use of the multi-core processor
(IEEE Spectrum 2006)
56
PS3 chip Physical Layout
57
Screenshot -Resident Evil
58
Screenshot -Gran Turismo
59
PlayStation 3 Videos
FFVII Tech Demo
Madden Nextgen Demo
60
Other I/O Components
  • Audio/video output
  • - Supported screen sizes 480i, 480p, 720p,
    1080i, 1080p
  • - Two HDMI (Type A) outputs (Dual-screen HD
    outputs)
  • - S/PDIF optical output for digital audio
  • - Multiple analog outputs (Composite, S-Video,
    Component video)
  • Sound
  • - Dolby Digital 5.1, DTS, LPCM (DSP
    functionality handled by the Cell processor)

61
The Nintendo Wii
  • Nintendo's fifth video game console; 1.2 million
    sold by February 1, 2007.
  • The concept involved focusing on a new form of
    player interaction: accelerometer and IR
    tracking
  • Contains solid-state accelerometers and
    gyroscopes.
  • Tilting and rotation up and down, left and right,
    and along the main axis (as with a screwdriver).
  • Acceleration up/down, left/right, toward the
    screen and away.
  • Dramatically improved interface for video games.
  • Innovative controller, integrates vibration
    feedback.
  • Uses Bluetooth technology, 30-foot range.
  • As a pointing device, can send a signal up to 15
    feet away. Up to 4 Wii Remotes connected at once.

62
Playing tennis with Nintendo Wii

63
http://www.winsupersite.com/showcase/xbox360_vs_ps3.asp
64
  • Graphics Benchmarks
  • Benchmarks established by an independent
    organization
  • Allow comparison of graphics card performance
    based on standardized application cases.
  • Can be application-specific, like SPECapc
    (Application Performance Characterization)
  • Or general-purpose for OpenGL architectures, like
    SPECviewperf

65

for OpenGL-based systems
66

Accelerator boards viewperf 8.0.1 comparison
  • SPECviewperf is a portable OpenGL performance
    benchmark program written in C. SPECviewperf
    reports performance in frames per second.
  • The tests include:
  • 3ds max, a graphics design application.
  • CATIA (DX), a CAD design application.
  • EnSight (DRV), a 3D visualization package.
  • Maya, an animation application.
  • ProEngineer
  • Lightscape, a radiosity application for large
    data sets.
  • SolidWorks
  • Unigraphics

for OpenGL-based systems
67
Accelerator boards viewperf 9.1
  • larger, more complex viewsets that place greater
    stress on graphics hardware
  • memory and list allocation improvements that
    allow data to be reused and shared in the same
    manner as within actual applications
  • better compression, enabling the inclusion of
    larger viewsets
  • mixing of primitive types and graphics modes,
    helping to ensure that optimizations for a
    viewset will be reflected in real-world
    performance.

68

Accelerator boards viewperf comparison
  • Updated regularly at www.spec.org
  • SPECviewperf uses a weighted geometric mean
    formula to determine scores:
  • Geometric mean (fps) = (test1 ^ weight1) x
    (test2 ^ weight2) x ... x (testN ^ weightN)

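The weighted geometric mean can be computed directly (the fps values and weights below are made up for illustration):

```python
import math

def viewperf_score(fps_results, weights):
    """Weighted geometric mean: score = prod(fps_i ** w_i),
    with the weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return math.prod(f ** w for f, w in zip(fps_results, weights))

# Two equally weighted tests at 40 fps and 90 fps (made-up numbers):
print(viewperf_score([40.0, 90.0], [0.5, 0.5]))  # ~60 fps
```

Unlike an arithmetic mean, the geometric mean keeps one unusually fast test from dominating the composite score.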
69

Accelerator boards Viewperf comparison
70

Accelerator boards Viewperf comparison
71

72
  • Workstation-based architectures
  • Second-largest computation base
  • Unix system is well suited for VR multi-tasking
    needs
  • Multi-processor, superscalar architecture is
    also appropriate for VR real-time needs
  • Example: SGI InfiniteReality

73
The SGI InfiniteReality computer
  • A massively parallel architecture based on
    proprietary ASIC technology. It was long
    considered the crème-de-la-crème of VR computers.
  • Can have up to 24 R10,000 CPUs in the
    application stage
  • The geometry board consists of a host interface
    processor (HIP), a geometry distributor and
    geometry engines (with a FIFO queue)
  • The HIP's task is to pull data from main memory
    (using DMA); it also has its own 16 MB cache, so
    that the need to pull data is reduced.

74
Influence of HIP Display List caching
75

76
The SGI InfiniteReality - continued
  • The HIP sends data to the geometry distributor,
    which distributes the load to the geometry
    engines in a least-busy fashion (with a FIFO
    queue)
  • Each Geometry Engine uses SIMD
    (single-instruction-multiple-data), processing
    the three coordinates of a vertex in parallel on
    three floating-point cores.
  • The GE floating-point core has its own ALU,
    multiplier and 32-word register file in a
    four-stage pipeline
  • The FIFO holds the results of the GEs' output and
    writes the merged stream to the vertex bus

77

SGI Infinite Reality system architecture
  • Data from the vertex bus are received by the
    fragment generators on the raster memory board
  • The fragment generator performs the texturing,
    color, depth pixel interpolation and
    anti-aliasing (4 to 8 sub-samples/pixel)
  • Their output is then distributed equally among
    80 image engines on the raster board
  • The image engine tiling pattern is 320x80
    pixels
  • The display hardware has dynamic video resize,
    video timing and D/A conversion

78
  • Distributed VR architectures
  • Single-user systems
  • multiple side-by-side displays
  • multiple LAN-networked computers
  • Multi-user systems
  • client-server systems
  • peer-to-peer systems
  • hybrid systems

79

Single-user, multiple displays
(3DLabs Inc.)
80
  • Side-by-side displays.
  • Used in VR workstations (desktop), or in
    large-volume displays (CAVE or the Wall)
  • One solution is to use one PC with a graphics
    accelerator for every projector
  • This results in a rack-mounted architecture,
    such as the MetaVR Channel Surfer used in
    flight simulators, or the Princeton Display Wall

81
  • Side-by-side displays.
  • Another (cheaper) solution is to use one PC
    only with several graphics accelerator cards
    (one for every monitor). Windows 2000 allows this
    option, while Windows NT allowed only one
    accelerator per system
  • Accelerators need to be installed on a PCI bus

82
  • Genlock..
  • If the output of two or more graphics pipes is
    used to drive monitors placed side-by-side, then
    the display channels need to be synchronized
    pixel-by-pixel
  • Moreover, the edges have to be blended, by
    creating a region of overlap.

83

(Courtesy of Quantum3D Inc.)
84
  • Problems with non-synchronized displays...
  • CRTs that are side-by-side induce fields in each
    other, resulting in electron beam distortion and
    flicker; they need to be shielded
  • Image artifacts reduce simulation realism,
    increase latencies, and induce simulation
    sickness.

85

Problems with non-synchronized CRT displays...
86

(Courtesy of Quantum3D Inc.)
87
  • Synchronization of displays
  • Software-synchronized: the system commands that
    frame processing start at the same time on the
    different rendering pipes
  • Does not work if one pipe is overloaded: one
    image finishes first

Synchronization command
88
  • Synchronization of displays
  • Frame-buffer-synchronized: the system commands
    that frame buffer swapping start at the same time
    on the different rendering pipes
  • Does not work because swapping depends on the
    electron gun refresh: one buffer will swap up
    to 1/72 sec before the other.


CRT
Synchronization command
Buffer
89
  • Synchronization of displays
  • Video-synchronized: the system commands that the
    CRT vertical beams start at the same time; one
    CRT becomes the master
  • Does not work if the horizontal beam is not
    synchronized too (one line too many or too few).

Master CRT
Buffer
Synchronization command
Buffer
Slave CRT
90
  • Synchronization of displays
  • Best method is to have software buffer video
    synchronization of the two (or more) rendering
    pipes

Master CRT
Buffer
Synchronization command
Synchronization command
Synchronization command
Buffer
Slave CRT
91

Video synchronized displays (three PCs)
done
release
(Digital Video Interface- Video out)
Wildcat 4210
92

(Courtesy of Quantum3D Inc.)
93
  • Graphics and Haptics Pipeline Synchronization
  • Has to be done at the application stage, to allow
    decoupling of the rendering stages (they have
    vastly different output rates)

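The decoupling can be sketched as one application loop dispatching to the two pipelines at their own rates (the 30 Hz / 300 Hz rates and the tick resolution are illustrative; no real device I/O):

```python
GRAPHICS_HZ, HAPTICS_HZ = 30, 300  # haptics must refresh much faster

def simulation_ticks(duration_s, tick_hz=3000):
    """One application-stage loop dispatching to two decoupled rendering
    pipelines at their own refresh rates (counts updates, no real I/O)."""
    graphics_frames = haptics_updates = 0
    for tick in range(duration_s * tick_hz):
        if tick % (tick_hz // HAPTICS_HZ) == 0:
            haptics_updates += 1   # compute forces for the haptic interface
        if tick % (tick_hz // GRAPHICS_HZ) == 0:
            graphics_frames += 1   # draw one graphics frame
    return graphics_frames, haptics_updates

print(simulation_ticks(1))  # (30, 300)
```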
94

Haptic Interface Controller (embedded Pentium)
Graphics pipe and Haptics pipe
Pentium II Dual-processor Host computer
Haptic Interface
95

Physics Processing Unit (PPU)
  • The first Physics Processing Unit, made by Ageia
    Inc., is called PhysX
  • PhysX is available as an add-on card (see above).
  • Helps the CPU do computations related to
    material properties (elasticity, friction,
    density)
  • Better smog and fog effects and more realistic
    clothing simulation (characters' clothes will
    react differently based on the material and other
    factors like rain and wind)
  • Better fluid dynamics simulation and collision
    effects. Cost: $160

96

Physics Processing Unit (PPU)
97
  • Co-located Rendering Pipelines
  • Another, cheaper, solution is to use a single
    multi-pipe graphics accelerator with one output
    channel for every monitor.

Wildcat II 5110
98

Wildcat II 5110
99
  • Wildcat4 7210 features
  • 38 million Gouraud-shaded, Z-buffered
    triangles/sec
  • 400 Megapixel/sec texture fill rate
  • 32 light sources in hardware
  • Independent dual display support
  • 1529x856 frame-sequential stereo @ 120 Hz.

100
  • Wildcat Realizm 800 features
  • Uses a Visual Processing Unit (VPU)
  • Uses OpenGL Shading Language

101
  • Wildcat Realizm 800 features
  • Texture sizes up to 4K x 4K
  • 32 light sources in hardware
  • Independent dual 400 MHz 10-bit DAC
  • 3D textures are applied throughout the volume of
    a model, not just on the external surfaces

102

Computing architectures
  • PC Clusters
  • multiple LAN-networked computers
  • used for multiple-PC video output
  • used for multiple-computer collaboration (when
    computing power is insufficient on a single
    machine); an older approach.

103
Chromium cluster of 32 rendering servers and four
control servers
104

Chromium networking architecture
105

Frame refresh rate comparison
106

Princeton display wall using eight LCD rear
projectors (1998)
107

Princeton display wall eight 4-way Pentium-Pro
SMPs with ES graphics accelerators. They drive
8 Proxima 9200 LCD projectors. (1998)
108

VRX Rack - Ciara Technologies: 256 Xeon processors
and 1.T Terabytes of DDR memory. Best
price/performance ratio; Linux and Windows OS
Ciara VRX
109

Computing architectures
  • Multi-User distributed remote system
    architecture
  • Multiple modem-networked computers
  • multiple LAN-networked computers
  • multiple WAN-networked computers
  • what is the network topology and influence on
    number of users?

110

Network connections
111
  • Two-User Shared Virtual Environments
  • These were the first multi-user environments to
    be introduced (they are the simplest)
  • Communicate over LAN using unicast packets with
    TCP/IP protocols

112

Server-mediated communication, unicast
mode. The server is a bottleneck on the allowable
number of clients
Server
Client 1
Client 2
Client n

(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)
113
Client 2,1
Client 2,2
Client 2,n
Server-mediated communication Allows more
clients to be networked over LANs
Server 2
LAN
LAN
Server 1
LAN
Client 1,1
Client 1,2
Client 1,n
(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)

114
Peer-to-peer communication: Allows more clients
to be networked over LANs. Can use broadcast or
multicast. Reduces network traffic, BUT... it is
more vulnerable to viruses, and does not work well
over WAN.
LAN
Multicast packets
Area of interest management
AOIM 1
AOIM 3
AOIM n
User 1
User 3
User n
(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)
115
Hybrid network using multiple servers
communicating through multicast allows
deployment over WAN - no broadcasting allowed
WAN
Unicast packets
Unicast packets
Proxy Server 1
Proxy Server 2
Proxy Server 3
Proxy Server n
Multicast packets
LAN
User 1,1
User 1,2
User 1,n
For very large DVEs: current WANs do not support
multicasting
(adapted from Avatars in Networked Virtual
Environments, Capin, Pandzic, Magnenat-Thalmann
and Thalmann, 1999)
116
Example of a distributed Virtual Environment
(connection between Geneva and Lausanne in
Switzerland): Cybertennis