Title: Advanced Physics for Next-Gen, Multi-Core and PhysX Scalability
1. Advanced Physics for Next-Gen, Multi-Core and PhysX Scalability
- Philipp Hatt
- Field Applications Engineer
- AGEIA Technologies, Inc.
2. Topics
- Introduction
  - Platforms
  - Programming API
- Game and engine design
  - Uses for multiple processors
  - Game loop design
  - Scaling needs and strategies
- Programming
  - API, conventions, threading, code samples
3. Platforms
- The past: single-processor architectures
  - PC
  - Xbox
  - Some parallelism with PS2 and GPU programmability
- The present: it's a multi-processor world!
  - Xbox 360, PS3, Revolution
  - Multi-core processors from Intel and AMD
  - PC equipped with the AGEIA PhysX chip
4. Processor
- Definition (for this talk)
  - A piece of hardware that can execute code in parallel to other processors in a system
- Examples
  - Additional CPUs on a motherboard
  - Additional cores on a single die (dual-core)
  - Additional cores on a single die, shared cache (Xbox 360)
  - HT hardware (some dual-core, Xbox 360)
  - PS3 SPEs
  - PhysX add-in hardware
5. Programming API
- Increased physics is a great way to use this extra power
  - Processing needs
  - Physics significantly lags graphics in realism
- AGEIA's NovodeX PhysX SDK
  - Multi-threaded architecture
  - Available on all current multi-processor platforms
  - Still runs great on a single-core PC
- Results
  - Hollywood-like effects
  - Reduced data (e.g., physics can replace animation)
6. Game Engine Design
7. Physics Running on Multiple Processors
- Application physics (processor 1)
  - Physics that must run in-line with game logic
  - Custom FPS character control
  - Low simulation requirements free time for more game code
- Game-play physics (processor 2)
  - Typically, current-generation physics
  - 3rd-person character control, FPS bounding-volume (BV) representation
  - Vehicle control
  - Quest items, crates, etc.
8. Physics Running on Multiple Processors
- Special-effects physics (processor 3)
  - Particle systems, crash debris
  - Grass and trees
  - Low-interaction physics islands
  - Smoke and fog
  - Cloth and hair
- Physics preparation
  - Deformable terrain BV tree generation
  - Prediction / correction (networked physics, smart AI)
9. Physics Running on Multiple Processors
10. Game Loop Design
- In-frame physics
  - Synchronous with game loop, runs on main processor
  - Custom FPS character control
  - Immediate feedback from raycasts
- Full-frame physics
  - Parallel to game loop
  - Results are available in time for rendering preparation
  - Game-play physics
  - Will not make full use of parallelism
11. Game Loop Design
- Frame-delay physics
  - Parallel to game loop
  - One frame rendering lag
  - Batched raycasts, effects physics
  - 3rd-person and AI character control
  - Great parallelism
- Just-in-time physics
  - Feed transforms and meshes directly to VRAM or shared memory
  - Parallel to game loop
  - Great parallelism
12. Traditional Physics Game Loop
[Timeline diagram with CPU, GPU and PPU lanes. Labels: start physics, fetch physics, user input, animation/AI, update graphics, asynchronous GPU rendering.]
13. Full-Frame Physics Game Loop
[Timeline diagram with CPU, GPU and PPU lanes. Labels: start physics, fetch physics, asynchronous game-play physics processing, user input, animation/AI, update graphics, asynchronous GPU rendering.]
14. Just-in-Time Physics Game Loop
[Timeline diagram with CPU, GPU and PPU lanes. Labels: start physics, fetch physics, asynchronous fluid / particle physics processing, user input, animation/AI, update graphics, asynchronous GPU rendering.]
15. Frame-Delay Physics Game Loop
[Timeline diagram with CPU, GPU and PPU lanes. Labels: start physics, fetch physics, asynchronous effects physics processing, user input, animation/AI, update graphics, asynchronous GPU rendering.]
16. Tying it all Together
[Timeline diagram with CPU, GPU and PPU lanes. Labels: start physics, fetch physics, asynchronous effects physics processing, asynchronous fluid / particle physics processing, asynchronous game-play physics processing, user input, animation/AI, update graphics, asynchronous GPU rendering.]
17. Efficient Use of Physics Processors
- Think about simulation locality
  - Game-play physics is always running, perhaps sleep-paged
  - Effects physics, however, only needed in local area of player
- Think in terms of physics systems
  - Traditionally, particle or fluid systems
  - Grass system for each 10x10 area
  - More processing power? Add systems or increase complexity
- Use LOD or PVS scheme to enable systems
  - Fallback LOD implementations are traditional graphics FX
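The locality idea on this slide can be sketched as a per-frame pass that keeps effects systems simulated only near the player and leaves the rest to the graphics fallback. This is a minimal, SDK-independent sketch: `EffectsSystem`, `updateActiveSystems` and the radius parameter are hypothetical names chosen for illustration, not part of the NovodeX API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical effects-physics system (e.g. one grass patch or particle emitter).
struct EffectsSystem {
    float x, y;      // centre of the area the system covers
    bool simulated;  // true: run physics; false: fall back to plain graphics FX
};

// Enable physics only for systems within `radius` of the player.
// Returns the number of systems left running on the physics processor.
inline std::size_t updateActiveSystems(std::vector<EffectsSystem>& systems,
                                       float playerX, float playerY, float radius) {
    assert(radius >= 0.0f);
    const float r2 = radius * radius;
    std::size_t active = 0;
    for (EffectsSystem& s : systems) {
        const float dx = s.x - playerX;
        const float dy = s.y - playerY;
        s.simulated = (dx * dx + dy * dy) <= r2;  // squared distances, no sqrt needed
        if (s.simulated) ++active;
    }
    return active;
}
```

A real engine would more likely drive this from its existing LOD or PVS data than from a plain radius test; the radius is only the simplest stand-in.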
18. Efficient Use of Physics Processors
- Physics simulation interactions
  - Limit scene-scene interactions where possible
  - One-way, keyframed interactions
  - Non-interactive simulations (possibly on a GPU)
19. Scaling for Advanced Hardware--Capabilities
- PC, single-core
  - Minimum spec target for some time to come
    - many developers consider PS2 / Xbox separate versions of game
  - Developers willing to commit only 10-15% of CPU cycles
    - allows simulation of dozens of active objects and constraints
    - hundreds of sleeping objects in simulation world
  - Game-play physics likely to be the only target
20. Scaling for Advanced Hardware--Capabilities
- PC, dual-core
  - Higher performance cores than min-spec PC
  - Using 10-30% of first core and 100% of second core
    - allows simulation of hundreds of active objects and constraints
    - thousands of sleeping objects in simulation world
  - Effects physics possible, but no fluid simulation
- Xbox 360
  - Like dual-core PC, but three, higher-performance cores
  - Shared L2 cache penalties offset by third core and hardware HTs
  - Cores possibly shared with other Xbox 360 libraries
21. Scaling for Advanced Hardware--Capabilities
- Xbox 360
  - Like dual-core PC, but three, higher-performance PPC cores
  - Shared L2 cache penalties offset by third core and hardware HTs
  - Cores possibly shared with other Xbox 360 libraries
- PS3
  - High performance PPC processor
  - Several Synergistic Processing Elements (SPEs)
  - More simulation potential than PC or Xbox 360
22. Scaling for Advanced Hardware--Capabilities
- AGEIA PhysX-enabled PC
  - Brings next-gen console power to the PC
  - Thousands of active objects
  - Tens of thousands of sleeping objects
  - Effects physics and fluid simulation
  - Might result in more traditional physics than you can render
- How to harness this capability continuum effectively?
23. Scaling for Advanced Hardware--Strategies
- Enable physics features per platform
  - Consistent game-play physics on all platforms
  - Effects physics on dual-core, Xbox 360, PS3, and PhysX
  - Fluid simulations (water, fog, smoke) on PS3 and PhysX
- Use more complex object representations
  - More detailed level geometry
  - Convex hulls instead of boxes and spheres
  - Multi-shape objects
  - Bone-accurate ragdolls
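The per-platform feature split listed on this slide could be captured in one small lookup so that game code never branches on the platform directly. The sketch below encodes only what the slide states (game-play physics everywhere, effects from dual-core upward, fluids on PS3 and PhysX); the `Platform` and `PhysicsFeatures` types and the `featuresFor` function are hypothetical names.

```cpp
#include <cassert>

// Hypothetical platform tiers, taken from the capability slides.
enum class Platform { PcSingleCore, PcDualCore, Xbox360, Ps3, PhysXPc };

struct PhysicsFeatures {
    bool gameplay;  // consistent on every platform
    bool effects;   // particles, debris, cloth, ...
    bool fluids;    // water, fog, smoke
};

// Map a platform to the physics features enabled on it.
inline PhysicsFeatures featuresFor(Platform p) {
    PhysicsFeatures f{true, false, false};  // game-play physics is always on
    switch (p) {
        case Platform::PcSingleCore:
            break;
        case Platform::PcDualCore:
        case Platform::Xbox360:
            f.effects = true;
            break;
        case Platform::Ps3:
        case Platform::PhysXPc:
            f.effects = true;
            f.fluids = true;
            break;
    }
    return f;
}
```

Keeping game-play physics identical across tiers means the extra features change only how the game looks, not how it plays.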
24. Scaling for Advanced Hardware--Strategies
[Figure labelled "Graphic model". Caption: Which model do you prefer?]
25. Scaling for Advanced Hardware--Strategies
- Use more sophisticated simulations
  - Tons of physics computations, but only one rendered mesh
  - Multi-constraint systems (ragdoll, cloth)
  - Fluid simulations
- Explore new worlds
  - Network physics prediction / correction
  - Batched raycasts for more intelligent AI
  - Use as part of LOD or PVS system
  - Use simulation instead of animation
  - Deformable objects
26. Programming
27. NovodeX Naming Conventions
- Actor
  - A rigid body
  - Position, velocity, mass, etc.
- Shape
  - The collision representation for a rigid body
  - Box, sphere, convex/concave mesh, composite (list of shapes)
- Scene
  - A set of actors simulated together (the world)
28. Creating Multiple Physics Scenes
- One thread per scene
  - Happens automatically when you create a scene
  - Transient worker threads may be used internally
- Actors are unique per scene
- Actors can live in multiple scenes
  - Create a version for each scene
  - Static level geometry
  - Dynamic actors
    - one scene simulates dynamic actor
    - other scenes move key-framed representations
29. Creating Multiple Physics Scenes
- Actors can share complex shapes
  - Triangle meshes, convex hulls
  - Greatly reduces memory requirements
  - Aids in cache use (shared L2 on Xbox 360, PC caches)
  - Use for instanced geometry within a scene
  - Use for duplicate copies of same actor in other scenes
- Double-buffered physics system
  - We do the bookkeeping for you
  - Don't need to store previous frame info yourself
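Shape sharing can be illustrated outside the SDK with reference-counted, read-only mesh data. The sketch below is an assumption-laden stand-in: `TriangleMesh`, `Actor` and `instantiate` are hypothetical types and functions, and the real SDK manages its mesh objects through its own interfaces rather than `std::shared_ptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an expensive collision mesh.
struct TriangleMesh {
    std::vector<float> vertices;
    std::vector<int> indices;
};

// Hypothetical actor: owns its pose, shares its collision data.
struct Actor {
    float x, y, z;
    std::shared_ptr<const TriangleMesh> shape;  // shared, read-only
};

// Create `count` actors that all reference one mesh instead of copying it.
inline std::vector<Actor> instantiate(std::shared_ptr<const TriangleMesh> mesh, int count) {
    assert(mesh && count >= 0);
    std::vector<Actor> actors;
    actors.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i)
        actors.push_back(Actor{static_cast<float>(i) * 2.0f, 0.0f, 0.0f, mesh});
    return actors;
}
```

Because every actor points at the same vertex and index arrays, memory grows with the number of distinct meshes rather than the number of actors, which is the saving the slide refers to.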
30. NovodeX PhysX SDK Threading API
- Thread-related simulation calls
  - NxScene::simulate
    - blocking or non-blocking flavors
  - NxScene::fetchResults
    - blocking or non-blocking flavors
    - performs the buffer swap
    - fires callbacks in a batch just before the swap
  - NxScene::checkResults
    - checks for simulation completion
    - no swap or callbacks
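The division of labour between the three calls can be mimicked with standard C++ futures. The class below is a toy model written for this explanation, not the NovodeX `NxScene`: `MockScene`, its "physics" (every value drops at one unit per second) and its signatures are all assumptions. It only shows the pattern the slide describes: start a step without blocking, poll without swapping, and swap buffers on fetch.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scene; this is NOT the NovodeX NxScene class.
class MockScene {
public:
    explicit MockScene(std::vector<float> heights)
        : front_(std::move(heights)), back_(front_) {}

    // Non-blocking: start computing the next state on another thread.
    void simulate(float dt) {
        assert(!pending_.valid());  // only one step may be in flight
        back_ = front_;
        pending_ = std::async(std::launch::async, [this, dt] {
            for (float& h : back_) h -= dt;  // toy physics: fall at 1 unit/s
        });
    }

    // Poll for completion; never swaps buffers.
    bool checkResults() const {
        return pending_.valid() &&
               pending_.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

    // Swap buffers once the step is done. With block == false, returns
    // false immediately if the step is still running.
    bool fetchResults(bool block) {
        if (!pending_.valid()) return false;
        if (!block && !checkResults()) return false;
        pending_.get();            // waits if the step has not finished yet
        std::swap(front_, back_);  // the buffer swap
        return true;
    }

    // Readers always see the last completed state.
    const std::vector<float>& state() const { return front_; }

private:
    std::vector<float> front_, back_;
    std::future<void> pending_;  // declared last, so it is destroyed (and joined) first
};
```

Between `simulate` and a successful `fetchResults`, `state()` keeps returning the old buffer, which is why game code can keep reading while the next state is computed.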
31. NovodeX PhysX SDK Threading API
- Sub-stepping capable
  - Single step versus multi-sub-step
    - NxScene::setTiming((1/60.0f), 1, NX_TIMESTEP_FIXED)
    - vs.
    - NxScene::setTiming((1/60.0f)/4.0f, 4, NX_TIMESTEP_FIXED)
  - More precise simulation
  - Reduces inter-thread communication overhead
- Predictor-corrector implementation
  - simplified and faster
  - networking extrapolation
  - smarter AI
  - simulate ahead for 5 seconds with one call, not 300
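The arithmetic behind these timing settings can be checked with a generic fixed-timestep accumulator. This is a sketch of the general technique only; whether the SDK's fixed-timestep mode handles leftover time the same way is an assumption, and `FixedStepper` is a hypothetical class, not SDK code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Generic fixed-timestep accumulator (hypothetical; not SDK code).
class FixedStepper {
public:
    FixedStepper(double stepSeconds, int maxStepsPerCall)
        : step_(stepSeconds), maxSteps_(maxStepsPerCall) {
        assert(stepSeconds > 0.0 && maxStepsPerCall > 0);
    }

    // Returns how many sub-steps of `stepSeconds` to run for this call.
    int advance(double elapsedSeconds) {
        accumulated_ += elapsedSeconds;
        // Small tolerance so that 5.0 / (1.0 / 60.0) counts as 300, not 299.
        const int due = static_cast<int>(std::floor(accumulated_ / step_ + 1e-9));
        const int steps = std::min(due, maxSteps_);
        // This sketch keeps leftover time for the next call.
        accumulated_ = std::max(0.0, accumulated_ - steps * step_);
        return steps;
    }

private:
    double step_;
    int maxSteps_;
    double accumulated_ = 0.0;
};
```

With a step of 1/60 s and a limit of one step, a 1/60 s frame yields one step; quartering the step and allowing four gives four finer steps for the same frame. Raising the limit to 300 lets a single 5-second request cover what would otherwise take 300 separate calls at 60 Hz.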
32. Game-Loop Sample Code
- Using NovodeX PhysX SDK
- Slide-show friendly formatting
- Formatting conventions
  - //-- Comment text
  - StandardLoopFunction()
  - novodexRelatedFunction()
  - opportunityForParallelismFunction()
33. Traditional Synchronous Game-Loop
    //-- Read/write to current physics state, A
    fiddleWithScene();
    doAmazingAIAndMore();

    //-- Compute next physics state, B (multi-threaded, blocked)
    scene->simulate(1.0f/60.0f, true);
    //-- Subtle timing state issues (gloss over for now)
    handleBatchedCallbacks();
    //-- Render the new physics state, B
    prepareGeometryAndSendToGPU();
34. Loop With Some Parallelism
    //-- Read/write to current physics state, A
    fiddleWithScene();

    //-- Compute next physics state, B (non-blocking)
    scene->simulate(1.0f/60.0f);

    //-- Further reads are on A, writes are lost
    doAmazingAIAndMore();

    //-- A done. Now, like OpenGL swapBuffers call (true -> block)
    scene->fetchResults(NX_RIGID_BODY_FINISHED, true);
    handleBatchedCallbacks();
    //-- Render the new physics state, B
    prepareGeometryAndSendToGPU();
35. Loop With Even More Parallelism
    //-- Read/write to current physics state, A
    fiddleWithScene();
    //-- Compute next physics state, B (non-blocking)
    scene->simulate(1.0f/60.0f);

    //-- Further reads from A, writes lost. Render is frame-delayed
    doAmazingAIAndMore(); prepareGeometryAndSendToGPU();

    //-- A done. Do more processing while waiting for B
    while (!scene->fetchResults(NX_RIGID_BODY_FINISHED, false)) {
        readSomePackets(); pumpWindowsSome();
        sleep(0); //-- Let some other threads work, perhaps
    }

    handleBatchedCallbacks();
36. Multi-Scene Loop
    //-- Read/write to current physics states, A[i]
    fiddleWithScenes();
    //-- Compute next physics states, B[i] (non-blocking)
    for (int i = 0; i < numScenes; i++) scenes[i]->simulate(1.0f/60.0f);

    //-- Further reads from A[i] scenes, writes lost. Render is frame-delayed
    doAmazingAIAndMore(); prepareGeometryAndSendToGPU();
    //-- Do work while waiting for all scenes to finish processing
    int j; do {
        for (j = 0; j < numScenes; j++) //-- checkResults doesn't swap buffers
            if (!scenes[j]->checkResults(NX_RIGID_BODY_FINISHED, false)) break;

        readSomePackets_pumpWindowsSome_orSleep();
    } while (j != numScenes);
    //-- Finally, fetch all physics states B[i]
    for (int k = 0; k < numScenes; k++) scenes[k]->fetchResults(NX_RIGID_BODY_FINISHED, true);
37. Multi-Scene, Multi-Strategy Loop
    //-- Read/write to current physics states, A[i]
    fiddleWithScenes();
    //-- Start simulation on all scenes at once
    for (int i = 0; i < numScenes; i++) scenes[i]->simulate(1.0f/60.0f);

    //-- Wait for results of synchronous physics
    scenes[SYNCH]->fetchResults(NX_RIGID_BODY_FINISHED, true);
    handleBatchedCallbacks(SYNCH);

    //-- Except for SYNCH scene, further reads from A scenes, writes are lost
    doAmazingAIAndMore();
    //-- Wait for results of full-frame physics
    scenes[FRAME]->fetchResults(NX_RIGID_BODY_FINISHED, true);
    handleBatchedCallbacks(FRAME);
    //-- Render new states, B[SYNCH] and B[FRAME], and old state, A[DELAY]
    prepareGeometryAndSendToGPU();
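The ordering in this loop (wait for the in-frame scene first, overlap game logic with the others, render the delayed scene one step behind) can be reproduced with standard C++ futures. The sketch below does not use the SDK: `stepScene`, `FrameResult` and `MultiStrategyLoop` are hypothetical, and each "scene state" is reduced to a step counter so the one-frame lag is easy to see.

```cpp
#include <cassert>
#include <future>

// Hypothetical per-scene work: the "state" is just a step counter.
inline int stepScene(int currentState) { return currentState + 1; }

struct FrameResult { int sync, frame, delayed; };

// Hypothetical three-scene loop; not SDK code.
class MultiStrategyLoop {
public:
    // Runs one frame and returns the states a renderer would draw.
    FrameResult tick() {
        // Collect the frame-delayed step started during the previous tick.
        if (delayJob_.valid()) delay_ = delayJob_.get();

        // Start all scenes at once.
        std::future<int> syncJob  = std::async(std::launch::async, stepScene, sync_);
        std::future<int> frameJob = std::async(std::launch::async, stepScene, frame_);
        delayJob_                 = std::async(std::launch::async, stepScene, delay_);

        sync_ = syncJob.get();    // in-frame physics: needed immediately
        // AI and game logic would run here, overlapping the other two scenes.
        frame_ = frameJob.get();  // full-frame physics: needed before rendering

        return FrameResult{sync_, frame_, delay_};  // delay_ is one step old
    }

private:
    int sync_ = 0, frame_ = 0, delay_ = 0;
    std::future<int> delayJob_;
};
```

After three ticks the in-frame and full-frame counters read 3 while the delayed one reads 2, which is the one-frame rendering lag traded for the extra parallelism.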
38. Multi-Scene Loops
- You can get even fancier than that
  - Scenes that handle adjacent cells of the game world
    - MMO server implementations, allows load balancing
    - border interactions somewhat tricky
  - Additional, temporary worker scenes
    - prepare dynamic geometry
    - fracture simulation
- Lots of options
  - Depends on processing requirements / game-design
  - People still argue over Windows message loops!
39. Handling Callbacks
- Callbacks are now generally avoided
  - They provide flexibility for single-threaded physics, but...
  - Inter-thread talk causes stalls for multi-threaded SDK
  - Embedded callbacks are an option for ones that require immediate attention (tire contacts, collision filtering)
- Callbacks are batched by SDK
  - Current state is buffered and reported to callback function
  - Can't write to scene within callback function
    - can't create or delete actors
    - can't move or apply forces to actors
    - need to store info and process in application code after fetchResults() is called
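One common way to honour the "store info, process later" rule is a deferred-edit queue: callbacks only record what they want changed, and the game applies the queue once the scene is writable again. The sketch below is a generic pattern, not SDK code; `DeferredSceneEdits` is a hypothetical helper, and it assumes callbacks run on the thread that calls `fetchResults()`, so no locking is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: callbacks record requests, the game applies them later.
class DeferredSceneEdits {
public:
    // Safe to call from inside a batched callback: only records the request.
    void defer(std::function<void()> edit) { pending_.push_back(std::move(edit)); }

    // Call after fetchResults() has returned. Returns the number of edits applied.
    std::size_t flush() {
        std::vector<std::function<void()>> edits;
        edits.swap(pending_);  // an edit may defer further edits while running
        for (std::function<void()>& e : edits) e();
        return edits.size();
    }

private:
    std::vector<std::function<void()>> pending_;
};
```

In use, a contact callback would call `defer` with a lambda that deletes an actor or applies a force, and the game loop would call `flush()` right after its callback handling.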
40. Use of Callbacks for Parallelism
- Collision results
  - Collisions of game-play objects in one scene could be used to spawn effects physics in a separate scene
  - No timing issues, as callbacks processed outside of simulation
- Trigger events
  - Trigger around player / camera
    - flag objects to remove from effects scene
    - determine level geometry to load and process asynchronously
  - Triggers around scenes
    - when to pass actors between adjacent scenes
    - release actors when they leave area of interest
    - load / simulate actors when required (save memory)
41. Conclusion
- Multi-processor hardware
  - Inescapable, wave of the future
  - Delivers more immersive environments via increased physics
- Game and engine design
  - Multiple physics strategies should be explored
  - Game loop redesign required
  - Scalability needs to be addressed
- Programming
  - Multi-threaded physics now required
  - NovodeX PhysX SDK here to help!
42. Conclusion
- For more information, contact
  - AGEIA Technologies, Inc.
  - http://www.ageia.com
  - devrel@ageia.com