Title: Flexible Agent Based Simulation for Pedestrian Modelling on GPU Hardware
1. Flexible Agent Based Simulation for Pedestrian Modelling on GPU Hardware
- Paul Richmond
- The Department of Computer Science
- University of Sheffield, UK
- paul_at_dcs.shef.ac.uk
- www.dcs.shef.ac.uk/paul
- Richmond Paul, Coakley Simon, Romano Daniela, "Cellular Level Agent Based Modelling on the Graphics Processing Unit (with FLAME GPU)", to appear in the special issue "Parallel and Ubiquitous Methods and Tools in Systems Biology" of the international journal Briefings in Bioinformatics, 2010.
- Richmond Paul, Coakley Simon, Romano Daniela (2009), "Cellular Level Agent Based Modelling on the Graphics Processing Unit", Proc. of HiBi09: High Performance Computational Systems Biology, 14-16 October 2009, Trento, Italy.
- Richmond Paul, Coakley Simon, Romano Daniela (2009), "A High Performance Agent Based Modelling Framework on Graphics Card Hardware with CUDA", Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), May 10-15, 2009, Budapest, Hungary.
- Richmond Paul, Romano Daniela (2008), "A High Performance Framework For Agent Based Pedestrian Dynamics On GPU Hardware", Proceedings of EUROSIS ESM 2008 (European Simulation and Modelling), October 27-29, 2008, Université du Havre, Le Havre, France.
2. Introduction and Scope
- Agent Based Modelling (ABM)
  - Emergence of complex natural behaviour from simple rules
  - Individuals are agents with memory
  - Agents update their own memory by considering their neighbours
- Of Pedestrian Behaviour
  - Continuous space mobile agents
  - Discrete time steps
- On the GPU
  - Why? Performance and real time visualisation
- Aim is flexibility: harness the GPU's power without modellers having to understand GPU programming
- Not continuum based (Treuille 06) or using mobile discrete agents (D'Souza 07)
3. FLAME and FLAME GPU
- What is FLAME (and what FLAME is not)?
  - Flexible Large-scale Agent Modelling Environment
  - XML model specification based on the X-Machine (state based agents)
  - Template system for generating simulation code
- Why extend FLAME to the GPU?
  - Complete modelling environment (beyond that of simple swarms)
  - Formal and portable specification technique based on the X-Machine
  - Many existing models to be used for benchmarking
- What is FLAME GPU?
  - Data parallel implementation of FLAME using CUDA (with real time visualisation)
  - Cost effective solution for high performance ABM
  - XSLT driven templates (rather than the XParser)
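An X-machine agent pairs a set of states with an internal memory; each transition function moves the agent from one state to another and may update that memory. The C sketch below illustrates the idea only: the states, memory variables, guard and transition table are hypothetical, and this is not the code FLAME or FLAME GPU generates from the XML model.

```c
#include <stddef.h>

/* Hypothetical states of a pedestrian X-machine. */
typedef enum { STATE_DEFAULT, STATE_MOVING, STATE_EXITED } agent_state;

/* The X-machine's internal memory (hypothetical variables). */
typedef struct {
    float x, y;
    float vel_x, vel_y;
} agent_memory;

typedef int (*guard_fn)(const agent_memory *mem);
typedef void (*transition_fn)(agent_memory *mem);

/* One edge of the state graph: from -> to, enabled when the guard holds. */
typedef struct {
    agent_state from, to;
    guard_fn guard;   /* NULL: always enabled */
    transition_fn fn; /* NULL: state change only */
} transition;

static void start_moving(agent_memory *mem) { mem->vel_x = 1.0f; }

static void move(agent_memory *mem) {
    mem->x += mem->vel_x;
    mem->y += mem->vel_y;
}

/* Hypothetical exit condition: the exit lies at x >= 3. */
static int past_exit(const agent_memory *mem) { return mem->x >= 3.0f; }

/* The first enabled transition from the current state fires, so order matters. */
static const transition pedestrian_transitions[] = {
    { STATE_DEFAULT, STATE_MOVING, NULL,      start_moving },
    { STATE_MOVING,  STATE_EXITED, past_exit, NULL },
    { STATE_MOVING,  STATE_MOVING, NULL,      move },
};

/* Fire one transition; returns 1 if a transition fired, 0 if none was enabled. */
static int step_agent(agent_state *state, agent_memory *mem,
                      const transition *table, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        const transition *t = &table[i];
        if (t->from != *state) continue;
        if (t->guard != NULL && !t->guard(mem)) continue;
        if (t->fn != NULL) t->fn(mem);
        *state = t->to;
        return 1;
    }
    return 0;
}
```

Keeping the transitions in a table mirrors the idea of specifying the model as data (in FLAME, XML) rather than as hand-written control flow.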
4. Programming the GPU
- Purpose of the GPU
  - Data parallel device for operating on streams of data
- Programming for general purpose use
  - Graphics API technique
    - Not ideal
  - High level alternatives
    - Brook GPU (Buck 04): SIMD stream programming extension for C
    - Sh (McCool 02): metaprogramming language embedded in C++, with a compiler for GPU backends
  - Hardware specific
    - Stream SDK: low level ATI specific native instruction set, with high level support through Brook
    - CUDA: NVIDIA GPU programming using a compiler and C syntax with extensions
  - OpenCL: new but growing standard, with limited support
- CUDA
  - The GPU is a coprocessor to the CPU (with its own global memory)
  - Many lightweight parallel threads grouped into regular sized blocks (execution units)
  - Threads in the same execution unit perform the same instructions (SIMD)
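The thread and block arrangement can be illustrated on the CPU. The sketch below assumes a one-dimensional launch and emulates each CUDA thread with a loop iteration; the global index formula matches CUDA's `blockIdx.x * blockDim.x + threadIdx.x`, but `launch_1d`, `blocks_for` and `scale_by_two` are hypothetical helpers, not CUDA API.

```c
/* Number of blocks needed so that blocks * block_size >= n (ceiling division). */
static unsigned int blocks_for(unsigned int n, unsigned int block_size) {
    return (n + block_size - 1) / block_size;
}

/* Host-side emulation of a 1D kernel launch. On the GPU every (block, thread)
   pair runs concurrently; here the pairs are visited in order. */
static void launch_1d(unsigned int n, unsigned int block_size,
                      void (*body)(unsigned int index, void *data), void *data) {
    unsigned int grid = blocks_for(n, block_size);
    for (unsigned int block_idx = 0; block_idx < grid; ++block_idx) {
        for (unsigned int thread_idx = 0; thread_idx < block_size; ++thread_idx) {
            /* Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA. */
            unsigned int index = block_idx * block_size + thread_idx;
            /* The last block may be only partly used, so guard against overrun. */
            if (index < n) body(index, data);
        }
    }
}

/* Example body: each "thread" scales one element. */
static void scale_by_two(unsigned int index, void *data) {
    float *values = (float *)data;
    values[index] *= 2.0f;
}
```

With 1000 agents and a hypothetical block size of 256, four blocks are launched and the last 24 threads of the final block do nothing, which is why the bounds check is needed.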
5. Mapping Agent Functions to the GPU
__FLAME_GPU_FUNC__ int input_function(xmachine_memory_pedestrian* xmemory,
    xmachine_message_pedestrian_location_list* location_messages)
{
    /* Get the first message */
    xmachine_message_pedestrian_location* location_message =
        get_first_pedestrian_location_message(location_messages);

    /* Repeat until there are no more messages */
    while (location_message)
    {
        /* Process the message */
        if (distance_check(xmemory, location_message))
        {
            updateSteerVelocity(xmemory, location_message);
        }

        /* Get the next message */
        location_message = get_next_pedestrian_location_message(location_message,
            location_messages);
    }

    /* Update any other xmemory variables */
    xmemory->x += xmemory->vel_x * TIME_STEP;
    /* ... */

    return 0;
}
- Each transition function is wrapped by a GPU kernel
- Each agent is a thread performing the function
- Functions can input and output messages
- Functions can output new agents (agent birth)
- An agent can be removed (agent death) by returning a non-zero value
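Each agent function is wrapped by a kernel in which one thread processes one agent. The sequential C sketch below assumes that model: loop iteration `i` stands in for thread `i`, and a non-zero return value is recorded as a death flag for a later compaction pass. The `pedestrian` struct (a plain array of structs, for brevity), the `TIME_STEP` value and the `EXIT_X` rule are hypothetical, and FLAME GPU's generated wrapper kernels are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical model constants. */
#define TIME_STEP 0.5f
#define EXIT_X 2.0f

/* Hypothetical agent memory. */
typedef struct {
    float x, vel_x;
} pedestrian;

/* An agent function: returns non-zero to request the agent's removal. */
typedef int (*agent_fn)(pedestrian *agent);

/* Sequential stand-in for the wrapping kernel: iteration i plays the role
   of GPU thread i and records whether agent i asked to be removed. */
static void run_agent_function(agent_fn fn, pedestrian *agents,
                               int *death_flags, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        death_flags[i] = fn(&agents[i]) != 0;
    }
}

/* Example agent function: move, then die once past the (hypothetical) exit. */
static int move_and_exit(pedestrian *agent) {
    agent->x += agent->vel_x * TIME_STEP;
    return agent->x > EXIT_X;
}
```

Because each iteration touches only its own agent, the loop body can run as independent threads, which is what makes the one-agent-per-thread mapping possible.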
6. Implementation Techniques used within FLAME GPU
- Avoiding divergent execution across agents in the same execution block
  - Agents are stored and processed in state lists to avoid conditional branching
  - Sparse lists are compacted during births, filters and optional message outputs
- Ensuring data access is performed efficiently
  - Lists are stored using a Structure of Arrays (SoA) rather than an Array of Structures (AoS)
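Compacting a sparse list, and the Structure of Arrays layout it operates on, can be sketched as follows. This assumes a fixed capacity `MAX_AGENTS` and a keep flag per agent (for example the inverse of a death flag); the exclusive prefix sum is written as a sequential loop, whereas a GPU implementation would use a parallel scan. The names are illustrative, not FLAME GPU's.

```c
#include <stddef.h>

#define MAX_AGENTS 1024

/* Structure of Arrays: each variable is a contiguous array, so consecutive
   threads reading x[i], x[i + 1], ... access consecutive memory. */
typedef struct {
    float x[MAX_AGENTS];
    float y[MAX_AGENTS];
    size_t count;
} agent_list_soa;

/* Exclusive prefix sum: out[i] = keep[0] + ... + keep[i - 1].
   Returns the total number of kept agents. */
static size_t exclusive_scan(const int *keep, size_t *out, size_t n) {
    size_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += keep[i] ? 1 : 0;
    }
    return running;
}

/* Scatter kept agents to their scanned positions. Each agent could do this
   independently, because the scan gives every survivor a unique slot. */
static void compact(const agent_list_soa *in, const int *keep,
                    agent_list_soa *out) {
    size_t offsets[MAX_AGENTS];
    out->count = exclusive_scan(keep, offsets, in->count);
    for (size_t i = 0; i < in->count; ++i) {
        if (keep[i]) {
            out->x[offsets[i]] = in->x[i];
            out->y[offsets[i]] = in->y[i];
        }
    }
}
```

The same scan-and-scatter pattern applies to births and optional message outputs: only the meaning of the keep flag changes.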
7. Message Communication
- Brute force communication
  - Tile blocks of the message list into shared memory to reduce global memory access (Nyland 07)
  - Using shared memory gives roughly an order of magnitude performance improvement
- Spatially partitioned communication
  - Split the environment into a uniform grid based on the message radius
  - Each agent reads all messages from each neighbouring partition
  - Requires the use of a parallel sort and a boundary matrix
  - Roughly 2/3 of the messages read lie outside the message radius (in 2D, a circle of radius r covers only about 35% of the 3x3 block of cells of width r), but this is much better than O(n²)
- Discrete agent message communication (CA)
  - A large block of messages is loaded into shared memory
  - Or use the texture cache to minimise global reads
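The spatially partitioned scheme can be sketched in C as follows. It assumes a 2D square environment covered by `GRID_DIM` x `GRID_DIM` cells whose width equals the message radius, with all message positions inside that domain. A counting sort stands in for the parallel sort, and the per-cell `start`/`end` arrays play the role of the boundary matrix; all names and sizes are hypothetical.

```c
#include <stddef.h>

/* Hypothetical environment: GRID_DIM x GRID_DIM cells, each RADIUS wide, so
   every message within RADIUS lies in the 3x3 block around an agent's cell. */
#define GRID_DIM 4
#define RADIUS 1.0f
#define NUM_CELLS (GRID_DIM * GRID_DIM)

typedef struct { float x, y; } message;

/* Cell coordinate along one axis, clamped to the grid. */
static int cell_coord(float v) {
    int c = (int)(v / RADIUS);
    if (c < 0) c = 0;
    if (c >= GRID_DIM) c = GRID_DIM - 1;
    return c;
}

static int cell_of(message m) { return cell_coord(m.y) * GRID_DIM + cell_coord(m.x); }

/* Sort messages by cell and record each cell's [start, end) range. */
static void partition(const message *msgs, size_t n, message *sorted,
                      size_t start[NUM_CELLS], size_t end[NUM_CELLS]) {
    size_t counts[NUM_CELLS] = {0};
    for (size_t i = 0; i < n; ++i) counts[cell_of(msgs[i])]++;
    size_t offset = 0;
    for (int c = 0; c < NUM_CELLS; ++c) {
        start[c] = offset;
        offset += counts[c];
        end[c] = start[c]; /* advanced below as messages are placed */
    }
    for (size_t i = 0; i < n; ++i) sorted[end[cell_of(msgs[i])]++] = msgs[i];
}

/* Count messages within RADIUS of (x, y), scanning only the 3x3 block of
   cells around the agent's own cell. */
static size_t count_neighbours(float x, float y, const message *sorted,
                               const size_t start[NUM_CELLS],
                               const size_t end[NUM_CELLS]) {
    int cx = cell_coord(x), cy = cell_coord(y);
    size_t found = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = cx + dx, ny = cy + dy;
            if (nx < 0 || ny < 0 || nx >= GRID_DIM || ny >= GRID_DIM) continue;
            int c = ny * GRID_DIM + nx;
            for (size_t i = start[c]; i < end[c]; ++i) {
                float ddx = sorted[i].x - x, ddy = sorted[i].y - y;
                if (ddx * ddx + ddy * ddy <= RADIUS * RADIUS) found++;
            }
        }
    }
    return found;
}
```

Only the 3x3 block of cells around an agent is scanned, so the work per agent depends on local density rather than on the total number of messages.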
8. A Pedestrian Model Example
- Inter-agent interaction (using spatially partitioned messaging) is based on a hybrid of Reynolds' steering and social force models
- Social repulsion force
  - Steers pedestrians towards areas of low concentration
  - Limited forward vision
  - Preference for agents in direct line of sight
  - Scaled depending on distance to neighbour
- Close range interaction force
  - Very short range, with no vision limit
  - Acts as collision avoidance
9. Visualisation and Animation Technique
- Agent data is already on the GPU for visualisation
- Need to draw a copy of the agent model for each agent in the simulation (instancing)
  - The model geometry can be stored on the GPU to reduce draw calls
  - Only requires a single draw call per agent
  - Each instance is displaced and oriented using its agent's data
- Use levels of detail (LOD) to avoid rendering highly detailed models for every agent
  - On the GPU, so must remain parallel
  - Sort the agents by LOD level and render in groups
- Animation: very simple
  - Interpolate between 2 key frames
  - Rotate the model depending on velocity direction
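Grouping agents by level of detail amounts to a sort on a small integer key. The sketch below assumes three LOD levels chosen by distance to the camera, with hypothetical thresholds, and uses a counting sort so each level's agents end up contiguous and can be drawn as one group. On the GPU the counting and scattering would be done with a parallel sort or scan rather than these sequential loops.

```c
#include <stddef.h>

#define NUM_LODS 3

/* Hypothetical thresholds: distance < 10 -> LOD 0 (full detail),
   distance < 30 -> LOD 1, otherwise LOD 2. */
static const float lod_threshold[NUM_LODS - 1] = { 10.0f, 30.0f };

static int lod_level(float dist_to_camera) {
    int level = 0;
    while (level < NUM_LODS - 1 && dist_to_camera >= lod_threshold[level]) level++;
    return level;
}

/* Counting sort of agent indices by LOD. Afterwards the agents of level l are
   order[group_start[l]] .. order[group_start[l + 1] - 1]. */
static void group_by_lod(const float *dist, size_t n, size_t *order,
                         size_t group_start[NUM_LODS + 1]) {
    size_t count[NUM_LODS] = {0};
    size_t next[NUM_LODS];
    for (size_t i = 0; i < n; ++i) count[lod_level(dist[i])]++;
    group_start[0] = 0;
    for (int l = 0; l < NUM_LODS; ++l) group_start[l + 1] = group_start[l] + count[l];
    for (int l = 0; l < NUM_LODS; ++l) next[l] = group_start[l];
    for (size_t i = 0; i < n; ++i) order[next[lod_level(dist[i])]++] = i;
}
```

Each contiguous group can then be rendered with the geometry for its level, which keeps the per-agent work uniform within a group.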
10. Demo: Agents coloured by LOD
11. Performance Results
- Observables
  - Performance depends on the communication radius
    - Larger communication radius: fewer partitions, more agents considered per update
  - The LOD technique has a cost
    - Don't use it for small populations
  - Very large population sizes are possible in real time
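The effect of the communication radius can be estimated with a back-of-the-envelope calculation. Assume a 2D environment of area A with a uniform density of ρ agents per unit area, partition cells of width equal to the communication radius r, and each agent reading its own cell plus its eight neighbours. Then the number of partitions, the messages read per agent, and the fraction of those reads actually within range are approximately:

```latex
N_{\mathrm{cells}} = \frac{A}{r^{2}}, \qquad
N_{\mathrm{read}} \approx 9 r^{2} \rho, \qquad
\frac{N_{\mathrm{in\ range}}}{N_{\mathrm{read}}} \approx \frac{\pi r^{2} \rho}{9 r^{2} \rho} = \frac{\pi}{9} \approx 0.35
```

Under these assumptions, doubling r quarters the number of partitions but quadruples the messages each agent reads, while about two thirds of those reads still fall outside the radius.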
12. Environment Collision Avoidance
- A discrete grid of agents encodes the environment
- Static discrete agents
  - Repulsive forces direct agents away from walls
  - Automatically generated in advance
- Continuous pedestrian agents read discrete messages
  - Apply a collision force
  - Displace pedestrian agents by the height value
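Reading the environment is a direct lookup, because each static discrete agent owns exactly one grid cell. The sketch below assumes a square grid of hypothetical size, each cell holding a precomputed repulsive force and a floor height; a pedestrian reads the cell under its position, adds the force to its velocity over the time step, and takes the cell's height as its vertical position. The struct and function names are illustrative.

```c
/* Hypothetical environment grid: ENV_DIM x ENV_DIM cells, each CELL_SIZE wide. */
#define ENV_DIM 4
#define CELL_SIZE 1.0f

/* One static discrete agent per cell. */
typedef struct {
    float force_x, force_y; /* precomputed repulsion away from nearby walls */
    float height;           /* floor height of the cell */
} env_cell;

typedef struct {
    float x, y, z;
    float vel_x, vel_y;
} pedestrian;

/* Read the message of the cell under the pedestrian, apply its collision
   force over time step dt, and displace the pedestrian by the cell height.
   Pedestrians outside the grid are left unchanged. */
static void apply_environment(pedestrian *p, const env_cell *grid, float dt) {
    if (p->x < 0.0f || p->y < 0.0f) return;
    int cx = (int)(p->x / CELL_SIZE);
    int cy = (int)(p->y / CELL_SIZE);
    if (cx >= ENV_DIM || cy >= ENV_DIM) return;
    const env_cell *c = &grid[cy * ENV_DIM + cx];
    p->vel_x += c->force_x * dt;
    p->vel_y += c->force_y * dt;
    p->z = c->height;
}
```

Because the lookup needs no search over neighbours, reading the environment costs the same for every pedestrian regardless of how many walls the scene contains.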
13. Long Range Navigation
- Many agents follow similar paths, so a global solution is used
- Fluid flow route for each path through the environment
  - Calculated offline in advance by backtracking from the exit point
  - Smooth movement around obstacles
- Discrete agents are also responsible for allocating pedestrian births
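Backtracking from the exit can be sketched as a breadth-first search on the environment grid: distances grow outward from the exit, and each cell's flow direction points to the neighbour nearest the exit. This is an assumed stand-in for the offline flow calculation, using a hypothetical 4-connected grid where `#` marks walls; it yields a stepwise route rather than the smooth fluid-like flow described above, which would need further smoothing.

```c
/* Hypothetical environment size. */
#define GRID_W 5
#define GRID_H 3

/* Neighbour offsets: +x, -x, +y, -y. */
static const int DX[4] = { 1, -1, 0, 0 };
static const int DY[4] = { 0, 0, 1, -1 };

/* BFS outward from the exit (ex, ey). dist[y * GRID_W + x] becomes the number
   of steps to the exit, or -1 for walls and unreachable cells. */
static void distances_from_exit(const char *rows[GRID_H], int ex, int ey,
                                int dist[GRID_W * GRID_H]) {
    int queue[GRID_W * GRID_H];
    int head = 0, tail = 0;
    for (int i = 0; i < GRID_W * GRID_H; ++i) dist[i] = -1;
    dist[ey * GRID_W + ex] = 0;
    queue[tail++] = ey * GRID_W + ex;
    while (head < tail) {
        int cell = queue[head++];
        int x = cell % GRID_W, y = cell / GRID_W;
        for (int k = 0; k < 4; ++k) {
            int nx = x + DX[k], ny = y + DY[k];
            if (nx < 0 || ny < 0 || nx >= GRID_W || ny >= GRID_H) continue;
            if (rows[ny][nx] == '#' || dist[ny * GRID_W + nx] != -1) continue;
            dist[ny * GRID_W + nx] = dist[cell] + 1;
            queue[tail++] = ny * GRID_W + nx; /* each cell is queued at most once */
        }
    }
}

/* Flow direction of cell (x, y): index k into DX/DY of the neighbour closest
   to the exit, or -1 at the exit itself and in unreachable cells. */
static int flow_direction(const int *dist, int x, int y) {
    int best = -1, best_dist = dist[y * GRID_W + x];
    if (best_dist <= 0) return -1;
    for (int k = 0; k < 4; ++k) {
        int nx = x + DX[k], ny = y + DY[k];
        if (nx < 0 || ny < 0 || nx >= GRID_W || ny >= GRID_H) continue;
        int d = dist[ny * GRID_W + nx];
        if (d >= 0 && d < best_dist) { best = k; best_dist = d; }
    }
    return best;
}
```

Since the field is computed once per exit, every pedestrian heading for that exit shares it, which matches the idea of a global solution for many agents following similar paths.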
15. Conclusions and Future Work
- Summary
  - Flexible agent architecture for the GPU, suitable for force models
  - Easily extendible
  - Massive performance/cost benefits
- Scope for future work
  - Multi GPU
    - Would enable extremely large agent populations to be simulated
    - For spatial partitioning, only partition boundaries would need to be communicated between GPU devices
  - Improve pedestrian models
    - More accurate collision detection
    - Long range individual path planning without flow grids
    - Physically accurate animation and movement
  - Much larger models (need appropriate scenarios)
16. References
- A. Treuille, S. Cooper, and Z. Popović. "Continuum crowds." In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, pp. 1160-1168. New York, NY, USA: ACM, 2006.
- R. M. D'Souza, M. Lysenko, and K. Rahmani. "Sugarscape on steroids: simulating over a million agents at interactive rates." In Proceedings of Agent2007, 2007.
- Samuel Eilenberg. Automata, Languages, and Machines. Orlando, FL, USA: Academic Press, Inc., 1974.
- T. Balanescu, A. J. Cowling, H. Georgescu, M. Gheorghe, M. Holcombe, and C. Vertan. "Communicating stream X-machines systems are no more than X-machines." Journal of Universal Computer Science, 5(9):494-507, 1999. http://www.jucs.org/jucs_5_9/communicating_stream_x_machines
- Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan. "Brook for GPUs: stream computing on graphics hardware." ACM Trans. Graph., 23(3):777-786, 2004.
- Michael D. McCool, Zheng Qin, and Tiberiu S. Popa. "Shader metaprogramming." In HWWS '02: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pp. 57-68. Aire-la-Ville, Switzerland: Eurographics Association, 2002.
- Lars Nyland, Mark Harris, and Jan Prins. "Fast N-body simulation with CUDA." In Hubert Nguyen, editor, GPU Gems 3, chapter 31. Addison Wesley Professional, August 2007.