Computing Architectures for Virtual Reality - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Computing Architectures for Virtual Reality
Electrical and Computer Engineering Dept.
2

Computer (rendering pipeline)
System architecture
3

Computing Architectures
The VR Engine
Definition: A key component of the VR system
which reads its input devices, accesses
task-dependent databases, updates the state of
the virtual world, and feeds the results to the
output displays. It is an abstraction: it can
mean one computer, several co-located cores in
one computer, several co-located computers, or
many remote computers collaborating in a
distributed simulation
4

Computing Architectures
  • The real-time characteristic of VR requires a VR
    engine which is powerful in order to assure
  • fast graphics and haptics refresh rates (30 fps
    for graphics and hundreds of Hz for haptics)
  • low latencies (<100 ms to avoid simulation
    sickness)
  • At the core of such an architecture is the
    rendering pipeline.
  • Within the scope of this course, rendering is
    extended to include haptics

5

Computing Architectures
The Graphics Rendering Pipeline
The process of creating a 2-D image from a 3-D
model is called rendering. The rendering
pipeline has three functional stages. The speed
of the pipeline is that of its slowest stage.
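This constraint can be expressed directly: the frame rate the pipeline delivers is the minimum over its stages. A minimal sketch (the per-stage rates are hypothetical numbers for illustration):

```python
# Per-stage throughput in frames/sec (hypothetical numbers for illustration).
stage_rates = {"application": 90.0, "geometry": 60.0, "rasterizer": 75.0}

def pipeline_fps(rates):
    """The pipeline runs only as fast as its slowest stage."""
    return min(rates.values())

# The slowest stage is, by definition, the bottleneck.
bottleneck = min(stage_rates, key=stage_rates.get)
print(pipeline_fps(stage_rates), bottleneck)  # 60.0 geometry
```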
6

The Graphics Rendering Pipeline
Old rendering pipelines were implemented in
software (slow). Modern pipeline architectures use
parallelism and buffering. The application stage
is implemented in software, while the other
stages are hardware-accelerated.
7
  • Modern pipelines also do anti-aliasing for
    points, lines or the whole scene

Aliased polygons (jagged edges)
Anti-aliased polygons
8
  • How is anti-aliasing done? Each pixel is
    subdivided (sub-sampled) into n regions, and each
    sub-pixel has a color

The anti-aliased pixel is given a shade of
green-blue (5/16 blue, 11/16 green). Without
sub-sampling the pixel would have been entirely
green, the color of the center of the pixel
(from Wildcat manual)
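The sub-sampling described above amounts to averaging the sub-pixel colors. A small sketch (colors and sample counts chosen to match the 5/16 blue, 11/16 green example):

```python
def antialias_pixel(subsamples):
    """Average the sub-sample colors to get the final pixel color.
    Each sub-sample is an (r, g, b) tuple with components in [0, 1]."""
    n = len(subsamples)
    return tuple(sum(c[i] for c in subsamples) / n for i in range(3))

# 16 sub-samples: 5 blue, 11 green, as in the slide's example pixel.
blue, green = (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)
samples = [blue] * 5 + [green] * 11
print(antialias_pixel(samples))  # (0.0, 0.6875, 0.3125)
```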
9
  • More samples produce better anti-aliasing

8 sub-samples/pixel
16 sub-samples/pixel
From Wildcat SuperScene manual
http://62.189.42.82/product/technology/superscene_antialiasing.htm
10

Ideal vs. real pipeline output (fps) vs. scene
complexity (Influence of pipeline
bottlenecks)
HP 9000 workstation
11

Computing Architectures
The Rendering Pipeline
12
  • The application stage
  • Is done entirely in software by the CPU
  • It reads input devices (such as gloves, mouse)
  • It changes the coordinates of the virtual
    camera
  • It performs collision detection and collision
    response (based on object properties) for
    haptics
  • One form of collision response is force feedback.

13
  • Application stage optimization
  • Reduce model complexity (models with fewer
    polygons mean less to feed down the pipe)

Higher-resolution model: 134,754 polygons.
Low-resolution model: 600 polygons
14
  • Application stage optimization
  • Reduce floating-point precision (single
    precision instead of double precision)
  • Minimize the number of divisions
  • Since all is done by the CPU, a dual-processor
    (super-scalar) architecture is recommended to
    increase speed.

15

Computing Architectures
The Rendering Pipeline
Rendering pipeline
16
  • The geometry stage
  • Is done in hardware
  • Consists first of model and view transforms
    (to be discussed in Chapter 5)
  • Next the scene is shaded based on light models
  • Finally the scene is projected, clipped, and
    mapped to screen coordinates.

17
  • The lighting sub-stage
  • It calculates the surface color based on
  • the type and number of simulated light sources
  • the lighting model
  • the reflective surface properties
  • atmospheric effects such as fog or smoke.
  • Lighting results in object shading, which makes
    the scene more realistic.

18

Computing architectures
I_λ = I_aλ K_a O_dλ + f_att I_pλ [K_d O_dλ cos θ + K_s O_sλ cos^n α]

where I_λ is the intensity of light of wavelength λ
      I_aλ is the intensity of ambient light
      K_a is the surface ambient reflection coefficient
      O_dλ is the object diffuse color
      f_att is the atmospheric attenuation factor
      I_pλ is the intensity of the point light source of wavelength λ
      K_d is the diffuse reflection coefficient
      K_s is the specular reflection coefficient
      O_sλ is the object specular color
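As an illustration, this illumination model can be evaluated per wavelength in a few lines (variable names follow the slide's symbols; the sample coefficients are made up):

```python
import math

def illumination(Ia, Ka, Od, f_att, Ip, Kd, Ks, Os, theta, alpha, n):
    """Single-wavelength intensity per the slide's lighting model:
    I = Ia*Ka*Od + f_att*Ip*(Kd*Od*cos(theta) + Ks*Os*cos(alpha)**n)
    theta: angle between surface normal and light direction (radians)
    alpha: angle between reflection and viewing directions (radians)
    n:     specular (shininess) exponent
    """
    ambient = Ia * Ka * Od
    diffuse = Kd * Od * math.cos(theta)
    specular = Ks * Os * math.cos(alpha) ** n
    return ambient + f_att * Ip * (diffuse + specular)

# Light shining straight down the normal, viewed along the reflection:
print(round(illumination(0.2, 0.3, 1.0, 1.0, 1.0, 0.7, 0.5, 1.0, 0.0, 0.0, 50), 2))  # 1.26
```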
19
  • The lighting sub-stage optimization
  • It takes less computation for fewer lights
    in the scene
  • The simpler the shading model, the fewer
    computations (and the less realism)
  • Wire-frame models
  • Flat-shaded models
  • Gouraud-shaded
  • Phong-shaded.

20
  • The lighting models
  • Wire-frame is simplest: it only shows the
    polygons' visible edges
  • The flat-shaded model assigns the same color to
    all pixels on a polygon (or side) of the object
  • Gouraud or smooth shading interpolates colors
    inside the polygons based on the colors of the
    edges
  • Phong shading interpolates the vertex normals
    before calculating the light intensity based on
    the model described; it is the most realistic
    shading model.

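The difference between flat and Gouraud shading can be shown with a toy color interpolation (a sketch along a single polygon edge; real Gouraud shading interpolates across scanlines as well):

```python
def flat_shade(vertex_colors):
    """Flat shading: the whole polygon gets one color (here, the first vertex's)."""
    return vertex_colors[0]

def gouraud_shade(c0, c1, t):
    """Gouraud shading: linearly interpolate vertex colors along an edge,
    t in [0, 1] from vertex 0 to vertex 1."""
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(flat_shade([red, blue]))        # (1.0, 0.0, 0.0)
print(gouraud_shade(red, blue, 0.5))  # (0.5, 0.0, 0.5)
```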
21

Computing architectures
Wire-frame model
Flat shading model
Gouraud shading model
22
  • Rendering speed vs. surface polygon type
  • The way surfaces are described influences
    rendering speed.
  • If surfaces are described by triangle meshes,
    the rendering will be faster than for the same
    object described by independent quadrangles or
    higher-order polygons. This is due to the
    graphics board architecture, which may be
    optimized to render triangles.
  • Example: the rendering speed of the SGI Reality
    Engine.

23

SGI Onyx 2 with Infinite Reality
24

Computing Architectures
The Rendering Pipeline
25
  • The Rasterizer Stage
  • Performs operations in hardware for speed
  • Converts 2-D vertex information from the
    geometry stage (x, y, z, color, texture) into
    pixel information on the screen
  • The pixel color information is in the color buffer
  • The pixel z-value is stored in the Z-buffer
    (which has the same size as the color buffer)
  • Assures that the primitives visible from
    the point of view of the camera are displayed.

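A minimal sketch of the Z-buffer test described above (a tiny software-rasterizer fragment; buffer size and colors are arbitrary):

```python
W, H = 4, 4
color_buffer = [[(0, 0, 0)] * W for _ in range(H)]
z_buffer = [[float("inf")] * W for _ in range(H)]  # same size as color buffer

def write_fragment(x, y, z, color):
    """Keep a fragment only if it is closer to the camera than what is
    already stored at that pixel (smaller z wins here)."""
    if z < z_buffer[y][x]:
        z_buffer[y][x] = z
        color_buffer[y][x] = color

write_fragment(1, 1, 5.0, (255, 0, 0))  # red fragment at depth 5
write_fragment(1, 1, 2.0, (0, 255, 0))  # closer green fragment overwrites it
write_fragment(1, 1, 9.0, (0, 0, 255))  # farther blue fragment is discarded
print(color_buffer[1][1])  # (0, 255, 0)
```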
26
  • The Rasterizer Stage - continued
  • The scene is rendered in the back buffer
  • It is then swapped with the front buffer which
  • stores the current image being displayed
  • This process eliminates flicker and is called
  • double buffering
  • All the buffers on the system are grouped into
    the
  • frame buffer.

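Double buffering can be sketched as a draw-then-swap loop (the render function below is a stand-in for the real rasterizer; buffers are tiny lists for illustration):

```python
# front: image currently displayed; back: image being drawn off-screen.
front, back = [0] * 4, [0] * 4

def render(buffer, frame_id):
    """'Draw' a new frame into the given buffer (stand-in for rasterization)."""
    for i in range(len(buffer)):
        buffer[i] = frame_id

for frame_id in (1, 2, 3):
    render(back, frame_id)     # draw into the hidden back buffer
    front, back = back, front  # swap: the finished frame goes on screen

print(front)  # [3, 3, 3, 3]  -> the viewer never sees a half-drawn frame
```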
27
  • Testing for pipeline bottlenecks
  • If the CPU operates at 100%, then the pipeline is
    CPU-limited (bottleneck in the application stage)
  • If the performance increases when all light
    sources are removed, then the pipeline is
    transform-limited (bottleneck in the geometry
    stage)
  • If the performance increases when the resolution
    of the display window, or its size, is reduced,
    then the pipeline is fill-limited (bottleneck in
    the rasterizer stage).

28

Transform-limited (reduce level of detail)
Fill-limited (increase realism)
29

The Pipeline Balancing
Single buffering



Application (75%)
Geometry (75%)
Rasterizer (100%)
Double buffering, balanced pipeline



Application (90%)
Geometry (95%)
Rasterizer (100%)
30

Computing Architectures
The Haptics Rendering Pipeline
The process of computing the forces and
mechanical textures associated with haptic
feedback. It is done in software and in hardware.
It also has three stages.
31
PC graphics architecture PC is King!
  • Went from a 66 MHz Intel 486 in 1994 to a 3.6 GHz
    Pentium IV today
  • Newer PC CPUs are dual (or quad) core, which
    improves performance by 50%
  • Went from 7,000 Gouraud-shaded poly/sec (Spea
    Fire board) in 1994 to 27 million Gouraud-shaded
    poly/sec (Fire GL 2, used to be in our lab)
  • Today PCs are used for single or multiple users,
    single or tiled displays
  • Intensely competitive industry.

32
PC bus architecture just as important
  • Went from the 33 MHz Peripheral Component
    Interconnect (PCI) bus to the 266 MHz Accelerated
    Graphics Port (AGP4x) bus, and doubled again in
    AGP8x
  • Larger throughput and lower latency, since
    address bus lines are decoupled from data lines.
    AGP uses sideband lines.

33

Intel 820/850 chipset
Graphics Accelerator (memory, processors)
AGP 8x rate: 2 GBps unidirectional (533 MHz x 32 bits)
PCI transfer rate: 133 MBps (33 MHz x 32 bits)
PCI Express rate: 4 GBps bidirectional
Today's PC system architecture
34

PC system architecture for the VR Teaching Lab
35

PC system architecture for VR Teaching Lab
36

Fire GL 2
Stereo glasses connector
Passive coolers
AGP bus connector
37

Fire GL 2 architecture
38
  • Fire GL 2 features
  • 27 million Gouraud-shaded, non-textured
    polygons/sec
  • Fill rate is 410 MPixels/sec.
  • supports up to 16 light sources
  • has a 300 MHz D/A converter

39
Stereo glasses connector
Fire GL X3 256
Passive coolers
DVI-I video output
AGP bus connector
40

Fire GL X3-256 architecture
  • 24-bit pixel processing, 12 pixel pipes
  • dual 10-bit DAC and dual DVI-I connections
  • does not have Genlock
  • anti-aliased points and lines
  • quad-buffered stereo 3D support (2 front and 2
    back buffers)

41

NVIDIA Quadro FX 4000
500 MHz DDR Memory
Graphics Processing Unit (GPU)
42

NVIDIA Quadro FX 4000 architecture
  • dual DVI-I connections
  • 32-bit pixel processing, 16 pixel pipes
  • has Genlock
  • anti-aliased points and lines
  • quad-buffered stereo 3D support

43

FireGL X3-256 vs. NVIDIA Quadro vs 3DLabs
44
CPU Evolution to Multi-Core
  • Places several processors on a single chip.
  • It has faster communication between cores than
    between separate processors
  • Each core has its own resources (L1 and L2
    caches) unlike multi-threads on a single core.
  • It is more energy efficient and results in higher
    performance

45
Multi-core details
46
AMD64 x2 Architecture
47
Guts of Native Quad Core (Next Gen)
48
  • Aims at a balance between hardware, software and
    services
  • Has a flexible design, abandoning the
    nVidia-only deal of the Xbox
  • Uses a multi-core design on a single die, like
    having three PowerPC CPUs running at 3.2 GHz
  • Each of the three cores can process two threads
    at a time (like 6 conventional processors)
  • Each core has a SIMD unit, which exploits
    real-time graphics data parallelism

The X-Box 360
49

The X-Box 360
  • The GPU has a Unified Shader Architecture,
    meaning one unit that does both geometry and
    rasterization stage (vs. separate vertex and
    pixel shaders)
  • The Arbiter retrieves commands from the
    Reservation Stations and delivers them to the
    appropriate Processing Engine
  • The xBox 360 has several Arbiters and 48 ALUs

50

The X-Box 360
  • The GPU has embedded 10 MB DRAM for use as a
    frame buffer
  • Resolution up to 1920x1080 with full-screen
    anti-aliasing
  • The GPU has the memory controller connecting to
    the 3 cores at 22 GB/sec
  • Renders 500 million triangles/sec and fill rate
    of 16 Gsamples/sec

51
PlayStation 3 Information
  • Two simultaneous High-definition television
    streams for use on a title screen for a HD
    Blu-ray Movie.
  • High-definition IP video conferencing.
  • EyeToy interactive reality game.
  • EyeToy voice command recognition.
  • EyeToy virtual object manipulation.
  • Digital photograph display (JPEG).
  • MP3 and ATRAC download and playback.
  • Simultaneous World Wide Web access and
    gameplay.
  • Hub/Home Ethernet Gaming Network.
  • The Ability to Have 7 Controllers at Once

52
PS3 Specs
  • PS3 CPU: Cell Processor
  • Developed by IBM
  • PowerPC-based core @ 3.2 GHz
  • 1 VMX vector unit per core
  • 512 KB L2 cache
  • 7 x SPE @ 3.2 GHz
  • 7 x 128b 128 SIMD GPRs
  • 7 x 256 KB SRAM for SPE
  • 1 of 8 SPEs reserved for redundancy
  • Total floating-point performance: 218 GFLOPS

53
Cell Processor Architecture
  • The PowerPC core present in the system is a
    general-purpose 64-bit PowerPC processor that
    handles the Cell BE's general-purpose workload
    (or, the operating system) and manages
    special-purpose workloads for the SPEs.
  • The SPEs are SIMD units capable of operating on
    128-bit vectors consisting of four 32-bit operand
    types at a time. Each SPE has a large register
    file of 128x128-bit registers for operating on
    128-bit vector data types and has an instruction
    set heavily biased towards vector computation.
    The SPEs have a fairly simple implementation to
    save power and silicon area.

54
Element Interconnect Bus(the communication path)
  • It turns out that the physical center of the
    processor is not any of the processor
    elements, but the bus which connects them.
  • Main memory bandwidth about 25.6GB/s
  • I/O bandwidth 35GB/s inbound and another 40GB/s
    outbound
  • and a fair amount of bandwidth left over for
    moving data within the processor.

55
PlayStation 3 use of the multi-core processor
(IEEE Spectrum 2006)
56
PS3 chip Physical Layout
57
Screenshot -Resident Evil
58
Screenshot -Gran Turismo
59
PlayStation 3 Videos
FFVII Tech Demo
Madden Nextgen Demo
60
Other I/O Components
  • Audio/video output
  • - Supported screen sizes 480i, 480p, 720p,
    1080i, 1080p
  • - Two HDMI (Type A) outputs (Dual-screen HD
    outputs)
  • - S/PDIF optical output for digital audio
  • - Multiple analog outputs (Composite, S-Video,
    Component video)
  • Sound
  • - Dolby Digital 5.1, DTS, LPCM (DSP
    functionality handled by the Cell processor)

61
The Nintendo Wii
  • Nintendo's fifth video game console; 1.2 million
    sold by February 1, 2007.
  • The concept involved focusing on a new form of
    player interaction: accelerometer and IR
    tracking
  • Contains solid-state accelerometers and
    gyroscopes.
  • Tilting and rotation up and down, left and right,
    and along the main axis (as with a screwdriver).
  • Acceleration up/down, left/right, toward the
    screen and away.
  • Dramatically improved interface for video games.
  • Innovative controller, integrates vibration
    feedback.
  • Uses Bluetooth technology, 30-foot range.
  • As a pointing device, can send a signal up to 15
    feet away. Up to 4 Wii Remotes connected at once.

62
Playing tennis with Nintendo Wii

63
http://www.winsupersite.com/showcase/xbox360_vs_ps3.asp
64
  • Graphics Benchmarks
  • Benchmarks established by an independent
    organization
  • Allow comparison of graphics card performance
    based on standardized application cases.
  • Can be application-specific, like SPECapc
    (Application Performance Characterization)
  • Or general-purpose for OpenGL architectures, like
    SPECviewperf

65

for OpenGL-based systems
66

Accelerator boards viewperf 8.0.1 comparison
  • SPECviewperf is a portable OpenGL performance
    benchmark program written in C. SPECviewperf
    reports performance in frames per second.
  • The tests include:
  • 3ds max, a graphics design application.
  • CATIA (DX), a CAD design application.
  • EnSight (DRV), a 3D visualization package.
  • Maya, an animation application.
  • ProEngineer
  • Lightscape, a radiosity application for large
    data sets.
  • SolidWorks
  • Unigraphics

for OpenGL-based systems
67
Accelerator boards viewperf 9.1
  • larger, more complex viewsets that place greater
    stress on graphics hardware
  • memory and list allocation improvements that
    allow data to be reused and shared in the same
    manner as within actual applications
  • better compression, enabling the inclusion of
    larger viewsets
  • mixing of primitive types and graphics modes,
    helping to ensure that optimizations for a
    viewset will be reflected in real-world
    performance.

68

Accelerator boards viewperf comparison
  • Updated regularly at www.spec.org
  • SPECviewperf uses a weighted geometric mean
    formula to determine scores:
  • Geometric mean (fps) = (test1 ^ weight1) x
    (test2 ^ weight2) x ... x (testN ^ weightN)

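The weighted geometric mean can be computed directly (the fps values and weights below are made up for illustration):

```python
import math

def viewperf_score(fps_results, weights):
    """Weighted geometric mean: score = prod(fps_i ** w_i),
    with the weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return math.prod(f ** w for f, w in zip(fps_results, weights))

# Two equally weighted tests at 40 fps and 90 fps (made-up numbers):
print(viewperf_score([40.0, 90.0], [0.5, 0.5]))  # ~60 fps
```

Unlike an arithmetic mean, the geometric mean keeps one unusually fast test from dominating the composite score.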
69

Accelerator boards Viewperf comparison
70

Accelerator boards Viewperf comparison
71

72
  • Workstation-based architectures
  • Second-largest computation base
  • Unix system is well suited for VR multi-tasking
    needs
  • Multi-processor, superscalar architecture is
    also appropriate for VR real-time needs
  • Example: SGI InfiniteReality

73
The SGI InfiniteReality computer
  • A massively parallel architecture based on
    proprietary ASIC technology. It was long
    considered the crème-de-la-crème of VR computers.
  • Can have up to 24 R10,000 CPUs in the
    application stage
  • The geometry board consists of a host interface
    processor (HIP), a geometry distributor and
    geometry engines (with a FIFO queue)
  • The HIP's task is to pull data from main memory
    (using DMA); it also has its own 16 MB cache, so
    that the need to pull data is reduced.

74
Influence of HIP Display List caching
75

76
The SGI InfiniteReality - continued
  • The HIP sends data to the geometry distributor,
    which distributes the load to the geometry
    engines in a least-busy fashion (with a FIFO
    queue)
  • Each Geometry Engine uses SIMD
    (single-instruction-multiple-data), processing
    the three coordinates of a vertex in parallel on
    three floating-point cores.
  • The GE floating-point core has its own ALU,
    multiplier and 32-word register file in a
    four-stage pipeline
  • The FIFO holds the results of the GEs' output and
    writes the merged stream to the vertex bus

77

SGI Infinite Reality system architecture
  • Data from the vertex bus are received by the
    fragment generators on the raster memory board
  • The fragment generator performs the texturing,
    color, depth pixel interpolation and
    anti-aliasing (4 to 8 sub-samples/pixel)
  • Their output is then distributed equally among
    80 image engines on the raster board
  • The image engine tiling pattern is 320x80
    pixels
  • The display hardware has dynamic video resize,
    video timing and D/A conversion

78
  • Distributed VR architectures
  • Single-user systems
  • multiple side-by-side displays
  • multiple LAN-networked computers
  • Multi-user systems
  • client-server systems
  • peer-to-peer systems
  • hybrid systems

79

Single-user, multiple displays
(3DLabs Inc.)
80
  • Side-by-side displays.
  • Used in VR workstations (desktop), or in
    large-volume displays (CAVE or the Wall)
  • One solution is to use one PC with a graphics
    accelerator for every projector
  • This results in a rack-mounted architecture,
    such as the MetaVR Channel Surfer used in
    flight simulators, or the Princeton Display Wall

81
  • Side-by-side displays.
  • Another (cheaper) solution is to use one PC
    only with several graphics accelerator cards
    (one for every monitor). Windows 2000 allows this
    option, while Windows NT allowed only one
    accelerator per system
  • Accelerators need to be installed on a PCI bus

82
  • Genlock..
  • If the output of two or more graphics pipes is
    used to drive monitors placed side-by-side, then
    the display channels need to be synchronized
    pixel-by-pixel
  • Moreover, the edges have to be blended, by
    creating a region of overlap.

83

(Courtesy of Quantum3D Inc.)
84
  • Problems with non-synchronized displays...
  • CRTs that are side-by-side induce fields in each
    other, resulting in electron beam distortion and
    flicker; they need to be shielded
  • Image artifacts reduce simulation realism,
    increase latencies, and induce simulation
    sickness.

85

Problems with non-synchronized CRT displays...
86

(Courtesy of Quantum3D Inc.)
87
  • Synchronization of displays
  • Software-synchronized: the system commands that
    frame processing start at the same time on the
    different rendering pipes
  • Does not work if one pipe is overloaded: one
    image finishes first

Synchronization command
88
  • Synchronization of displays
  • Frame-buffer-synchronized: the system commands
    that frame buffer swapping start at the same time
    on the different rendering pipes
  • Does not work because swapping depends on the
    electron gun refresh: one buffer will swap up
    to 1/72 sec before the other.


CRT
Synchronization command
Buffer
89
  • Synchronization of displays
  • Video-synchronized: the system commands that the
    CRT vertical beams start at the same time; one
    CRT becomes the master
  • Does not work if the horizontal beam is not
    synchronized too (one line too many or too few).

Master CRT
Buffer
Synchronization command
Buffer
Slave CRT
90
  • Synchronization of displays
  • Best method is to have software buffer video
    synchronization of the two (or more) rendering
    pipes

Master CRT
Buffer
Synchronization command
Synchronization command
Synchronization command
Buffer
Slave CRT
91

Video synchronized displays (three PCs)
done
release
(Digital Video Interface- Video out)
Wildcat 4210
92

(Courtesy of Quantum3D Inc.)
93
  • Graphics and Haptics Pipeline Synchronization
  • Has to be done at the application stage, to allow
    decoupling of the rendering stages (they have
    vastly different output rates)

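The decoupling can be sketched as one application loop dispatching to the two pipelines at their own rates (the 30 Hz / 300 Hz rates and the tick resolution are illustrative; no real device I/O):

```python
GRAPHICS_HZ, HAPTICS_HZ = 30, 300  # haptics must refresh much faster

def simulation_ticks(duration_s, tick_hz=3000):
    """One application-stage loop dispatching to two decoupled rendering
    pipelines at their own refresh rates (counts updates, no real I/O)."""
    graphics_frames = haptics_updates = 0
    for tick in range(duration_s * tick_hz):
        if tick % (tick_hz // HAPTICS_HZ) == 0:
            haptics_updates += 1   # compute forces for the haptic interface
        if tick % (tick_hz // GRAPHICS_HZ) == 0:
            graphics_frames += 1   # draw one graphics frame
    return graphics_frames, haptics_updates

print(simulation_ticks(1))  # (30, 300)
```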
94

Haptic Interface Controller (embedded Pentium)
Graphics pipe and Haptics pipe
Pentium II Dual-processor Host computer
Haptic Interface
95

Physics Processing Unit (PPU)
  • The first Physics Processing Unit, made by Ageia
    Inc., is called PhysX
  • PhysX is available as an add-on card (see above).
  • Helps the CPU do computations related to
    material properties (elasticity, friction,
    density)
  • Better smog and fog effects and more realistic
    clothing simulation (characters' clothes will
    react differently based on the material and other
    factors like rain and wind)
  • Better fluid dynamics simulation and collision
    effects. Cost: $160

96

Physics Processing Unit (PPU)
97
  • Co-located Rendering Pipelines
  • Another, cheaper, solution is to use a single
    multi-pipe graphics accelerator with one output
    channel for every monitor.

Wildcat II 5110
98

Wildcat II 5110
99
  • Wildcat4 7210 features
  • 38 million Gouraud-shaded, Z-buffered
    triangles/sec
  • 400 Megapixel/sec texture fill rate
  • 32 light sources in hardware
  • Independent dual display support
  • 1529x856 frame-sequential stereo @ 120 Hz.

100
  • Wildcat Realizm 800 features
  • Uses a Visual Processing Unit (VPU)
  • Uses OpenGL Shading Language

101
  • Wildcat Realizm 800 features
  • Texture sizes up to 4K x 4K
  • 32 light sources in hardware
  • Independent dual 400 MHz 10-bit DAC
  • 3D textures are applied throughout the volume of
    a model, not just on the external surfaces

102

Computing architectures
  • PC Clusters
  • multiple LAN-networked computers
  • used for multiple-PC video output
  • used for multiple-computer collaboration (when
    computing power is insufficient on a single
    machine); an older approach.

103
Chromium cluster of 32 rendering servers and four
control servers
104

Chromium networking architecture
105

Frame refresh rate comparison
106

Princeton display wall using eight LCD rear
projectors (1998)
107

Princeton display wall eight 4-way Pentium-Pro
SMPs with ES graphics accelerators. They drive
8 Proxima 9200 LCD projectors. (1998)
108

VRX Rack - Ciara Technologies: 256 Xeon processors
and 1.T Terabytes of DDR memory. Best
price/performance ratio; Linux and Windows OS
Ciara VRX
109

Computing architectures
  • Multi-User distributed remote system
    architecture
  • Multiple modem-networked computers
  • multiple LAN-networked computers
  • multiple WAN-networked computers
  • what is the network topology and influence on
    number of users?

110

Network connections
111
  • Two-User Shared Virtual Environments
  • These were the first multi-user environments to
    be introduced (they are the simplest)
  • Communicate over LAN using unicast packets with
    TCP/IP protocols

112

Server-mediated communication, unicast
mode. The server is a bottleneck on the allowable
number of clients
Server
Client 1
Client 2
Client n

(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)
113
Client 2,1
Client 2,2
Client 2,n
Server-mediated communication Allows more
clients to be networked over LANs
Server 2
LAN
LAN
Server 1
LAN
Client 1,1
Client 1,2
Client 1,n
(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)

114
Peer-to-peer communication: Allows more clients
to be networked over LANs. Can use broadcast or
multicast. Reduces network traffic, BUT... it is
more vulnerable to viruses, and does not work well
over WAN.
LAN
Multicast packets
Area of interest management
AOIM 1
AOIM 3
AOIM n
User 1
User 3
User n
(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)
115
Hybrid network using multiple servers
communicating through multicast allows
deployment over WAN - no broadcasting allowed
WAN
Unicast packets
Unicast packets
Proxy Server 1
Proxy Server 2
Proxy Server 3
Proxy Server n
Multicast packets
LAN
User 1,1
User 1,2
User 1,n
For very large DVEs: current WANs do not support
multicasting
(adapted from Avatars in Networked Virtual
Environments, Capin, Pandzic, Magnenat-Thalmann
and Thalmann, 1999)
116
Example of a distributed Virtual Environment
(connection between Geneva and Lausanne in
Switzerland): Cybertennis