Title: The Cactus Code: A Parallel, Collaborative Framework for Large Scale Computing
1. The Cactus Code: A Parallel, Collaborative Framework for Large Scale Computing
- Gabrielle Allen
- Max Planck Institute for Gravitational Physics,
- (Albert Einstein Institute)
2. Outline
- CACTUS is a freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multi-dimensional simulations
- THE GRID: dependable, consistent, pervasive access to high-end resources
3. History
- Cactus originated in 1997 as a code for numerical relativity, following a long line of codes developed in Ed Seidel's research groups, at NCSA and more recently at the AEI.
- Numerical relativity: complicated 3D hyperbolic/elliptic PDEs, dozens of equations, thousands of terms, many people from very different disciplines working together, needing a fast, portable, flexible, easy-to-use code which can incorporate new technologies without disrupting users.
- Original authors: Paul Walker, Joan Masso, John Shalf, Ed Seidel.
- Cactus 4.0, August 1999: total rewrite and redesign of the code, learning from experiences with previous versions.
4. Gravitational Wave Astronomy: New Field, Fundamental New Information about the Universe
5. Numerical Relativity with Cactus
- Biggest computations ever: 256-processor O2K at NCSA, 225,000 SUs, 1 TByte of output data in a few weeks
- Black holes (prime source for GW)
  - Increasingly complex collisions: now doing full 3D grazing collisions
- Gravitational waves
  - Study linear waves as testbeds
  - Move on to fully nonlinear waves
  - Interesting physics: BH formation in full 3D!
- Neutron stars
  - Developing capability to do full GR hydro
  - Now can follow full orbits!
6. What is Cactus?
- Flesh (ANSI C) provides code infrastructure (parameter, variable and scheduling databases, error handling, APIs, make, parameter parsing, ...)
- Thorns (F77/F90/C/C++/Java/Perl/Python) are plug-in and swappable modules or collections of subroutines providing both the computational infrastructure and the physical application. Well-defined interface through 3 config files (a sketch follows below)
- Just about anything can be implemented as a thorn: driver layer (MPI, PVM, SHMEM, ...), black hole evolvers, elliptic solvers, reduction operators, interpolators, web servers, grid tools, IO, ...
- User driven: easy parallelism, no new paradigms, flexible
- Collaborative: thorns borrow concepts from OOP, thorns can be shared, lots of collaborative tools
- Computational Toolkit: existing thorns for (parallel) IO, elliptic, MPI unigrid driver, ...
- Integrates other common packages and tools: HDF5, Globus, PETSc, PAPI, Panda, FlexIO, GrACE, Autopilot, LCAVision, OpenDX, Amira, ...
- Trivially Grid-enabled!
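To make the flesh/thorn split concrete, here is a minimal sketch of what a thorn evolution routine looks like in C, following the Cactus 4.0 conventions; the routine name and the grid function mentioned in the comments are invented for illustration:

    /* Sketch of a thorn source file (WaveToy-style evolution routine).
       The flesh generates the argument and parameter macros from the
       thorn's interface.ccl, param.ccl and schedule.ccl files. */
    #include "cctk.h"
    #include "cctk_Arguments.h"
    #include "cctk_Parameters.h"

    void MyThorn_Evolve(CCTK_ARGUMENTS)
    {
      DECLARE_CCTK_ARGUMENTS   /* grid variables, sizes, coordinates */
      DECLARE_CCTK_PARAMETERS  /* parameters parsed by the flesh */

      int i, j, k;

      /* Loop over the local (processor-owned) part of the grid; the
         driver thorn (e.g. PUGH) has already decomposed the grid and
         filled ghost zones, so no explicit MPI appears here. */
      for (k = 1; k < cctk_lsh[2]-1; k++)
        for (j = 1; j < cctk_lsh[1]-1; j++)
          for (i = 1; i < cctk_lsh[0]-1; i++)
          {
            int idx = CCTK_GFINDEX3D(cctkGH, i, j, k);
            /* ... update phi[idx] from its past value and neighbours ... */
          }
    }

The point to notice: the routine only ever touches the processor-local part of the grid; decomposition, ghost zones and communication are the driver thorn's business.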
7. Current Version: Cactus 4.0
- Cactus 4.0 beta 1 released September 1999
- Community code: distributed under the GNU GPL
- Currently Cactus 4.0 beta 8
- Supported Architectures
- SGI Origin
- SGI 32/64
- Cray T3E
- DEC Alpha
- Intel Linux IA32/IA64
- Windows NT
- HP Exemplar
- IBM SP2
- Sun Solaris
- Hitachi SR8000-F
- NEC SX-5
- Mac Linux
- ...
8. Cactus Computational Toolkit: Parallel utilities (thorns) for computational scientists
- CactusBase
  - Boundary, IOUtil, IOBasic, CartGrid3D, IOASCII, Time
- CactusBench
  - BenchADM
- CactusConnect
  - HTTPD, HTTPDExtra
- CactusExample
  - WaveToy1DF77, WaveToy2DF77
- CactusElliptic
  - EllBase, EllPETSc, EllSOR, EllTest
- CactusPUGH
  - Interp, PUGH, PUGHSlab, PUGHReduce
- CactusPUGHIO
  - IOFlexIO, IOHDF5, IsoSurfacer
- CactusTest
  - TestArrays, TestCoordinates, TestInclude1, TestInclude2, TestComplex, TestInterp, TestReduce
- CactusWave
  - IDScalarWave, IDScalarWaveC, IDScalarWaveCXX, WaveBinarySource, WaveToyC, WaveToyCXX, WaveToyF77, WaveToyF90, WaveToyFreeF90
- external
  - IEEEIO, RemoteIO, TCPXX, jpeg6b
- BetaThorns (in development)
  - IOStreamedHDF5, IOJpeg, IOHDF5Util, ... many more
9. How To Use Cactus
- Application scientist usually concentrates on the application: physics, performance, algorithms
  - Logically: operations on a grid (structured or unstructured)
  - Program in any language
- Then takes advantage of parallel API features enabled by Cactus
  - IO, data streaming, remote visualization/steering, AMR, MPI/PVM, checkpointing, Grid computing, interpolations, reductions, etc.
  - Abstraction allows one to switch between different MPI or PVM layers, different I/O layers, etc., with no or minimal changes to the application!
- (Nearly) all architectures supported and autoconfigured
  - Common to develop on a laptop (with/without MPI), run on anything
- Metacode concept
  - Very, very lightweight, not a huge framework
  - User specifies desired code modules in configuration files (see the sketch after this list)
  - Desired code generated, automatic routine calling sequences, syntax checking, etc.
  - You can actually read the code it creates...
10. Cactus Community
11. Grid Computing
- The AEI Numerical Relativity Group has access to high-end resources in over ten centers in Europe/USA
- They want
  - Bigger simulations, more simulations and faster throughput
  - Intuitive IO at the local workstation
  - No new systems/techniques to master!!
- How to make best use of these resources?
  - Provide easier access: no one can remember ten usernames, passwords, batch systems, file systems; a great start!!!
  - Combine resources for larger production runs (more resolution badly needed!)
  - Dynamic scenarios: automatically use what is available
- Many other reasons for Grid computing: for computer scientists, funding agencies, supercomputer centers ...
12. Grid-Enabled Cactus
- Cactus and its ancestor codes have been using Grid infrastructure since 1993
- Support for Grid computing was part of the design requirements for Cactus 4.0 (experiences with Cactus 3)
- Cactus compiles out-of-the-box with Globus, using the globus device of MPICH-G(2)
- The design of Cactus means that applications are unaware of the underlying machine(s) that the simulation is running on: applications become trivially Grid-enabled
- Infrastructure thorns (I/O, driver layers) can be enhanced to make most effective use of the underlying Grid architecture
13. Cactus + Globus
[Layer diagram]
- Cactus Application Thorns: initial data, evolution, analysis, etc.; distribution information hidden from the programmer
- Grid-Aware Application Thorns: drivers for parallelism, IO, communication, data mapping (PUGH: parallelism via MPI)
- Grid-Enabled Communication Library: MPICH-G2 implementation of MPI; can run MPI programs across heterogeneous computing resources (see the sketch below)
- Standard MPI / Single Proc
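The practical consequence of this layering: application code speaks only standard MPI, so relinking against MPICH-G2 distributes the same unmodified source across Globus-managed resources. A minimal (non-Cactus) sketch of the kind of code that stays unchanged:

    /* Sketch: an application that speaks only standard MPI.  Linked
       against MPICH-G2 instead of a vendor MPI, the same unmodified
       program runs across Globus-managed machines. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
      int rank, size, namelen;
      char name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Get_processor_name(name, &namelen);

      /* Under MPICH-G2 the "processors" may live in different centres;
         the application cannot tell, and does not need to. */
      printf("rank %d of %d on %s\n", rank, size, name);

      MPI_Finalize();
      return 0;
    }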
14. Grid Experiments
- SC93: remote CM-5 simulation with live viz in a CAVE
- SC95: heroic I-Way experiments lead to the development of Globus; Cornell SP-2 and Power Challenge, with live viz in the San Diego CAVE
- SC97: Garching 512-node T3E, launched, controlled and visualized in San Jose
- SC98: HPC Challenge; SDSC, ZIB and Garching T3Es compute the collision of 2 neutron stars, controlled from Orlando
- SC99: colliding black holes using the Garching and ZIB T3Es, with remote collaborative interaction and viz at the ANL and NCSA booths
- 2000: single simulation across LANL, NCSA, NERSC, SDSC, ZIB, Garching, ...
- Dynamic distributed computing: spawning new simulations!!
15. Grand Picture
[Diagram: Grid-enabled Cactus runs on distributed machines (T3E Garching, Origin NCSA) connected via Globus; simulations are launched from the Cactus Portal; data flows out over HTTP, HDF5 streams, isosurfaces and DataGrid/DPSS downsampling to remote viz and steering from Berlin, remote viz in St Louis, remote steering and monitoring from an airport, and viz of data from previous simulations in an SF café.]
16. Demo: Remote Computing
- Have most of this working now
- Need to make it commonplace, and trivially available to users
  - Requires development of readers/networks for viz clients too
- Remote simulation
  - Monitor and steer using thorn HTTPD
  - Display live isosurfaces with thorn IsoSurfacer and the IsoView GUI
  - Display full live viz with the HDF5 thorns and OpenDX
17. Remote Visualization
[Screenshots: grid functions streamed via HDF5 into OpenDX and Amira; contour plots (download); isosurfaces and geodesics in Amira; LCA Vision.]
18. Remote Visualization
- Streaming data from a Cactus simulation to a viz client
- Clients: OpenDX, Amira, LCA Vision, ...
- Protocols
  - Proprietary
    - Isosurfaces, geodesics
  - HTTP
    - Parameters, xgraph data, JPegs
  - Streaming HDF5
    - HDF5 provides downsampling and hyperslabbing (see the reader sketch below)
    - All of the above data, and all possible HDF5 data (e.g. 2D/3D)
    - Two different technologies:
      - Streaming Virtual File Driver (I/O rerouted over a network stream)
      - XML wrapper (HDF5 calls wrapped and translated into XML)
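As an illustration of the downsampling/hyperslabbing idea, here is a sketch of a reader that pulls every 4th grid point of a 3D dataset through the standard HDF5 hyperslab API (HDF5 1.8+ calling conventions; the file and dataset names are invented, and error checking is omitted):

    /* Sketch: read a downsampled 3D grid function from an HDF5 file
       via a strided hyperslab selection. */
    #include <stdlib.h>
    #include <hdf5.h>

    int main(void)
    {
      hsize_t start[3]  = {0, 0, 0};
      hsize_t stride[3] = {4, 4, 4};      /* keep every 4th point */
      hsize_t count[3]  = {32, 32, 32};   /* downsampled extent   */
      double *buf = malloc(32*32*32 * sizeof *buf);

      hid_t file = H5Fopen("output.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
      hid_t dset = H5Dopen(file, "/phi", H5P_DEFAULT);
      hid_t fspc = H5Dget_space(dset);
      hid_t mspc = H5Screate_simple(3, count, NULL);

      /* Select every 4th point in each dimension: the client pulls
         only the downsampled data, never the full grid. */
      H5Sselect_hyperslab(fspc, H5S_SELECT_SET, start, stride, count, NULL);
      H5Dread(dset, H5T_NATIVE_DOUBLE, mspc, fspc, H5P_DEFAULT, buf);

      H5Sclose(mspc); H5Sclose(fspc); H5Dclose(dset); H5Fclose(file);
      free(buf);
      return 0;
    }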
19. Remote Visualization (2)
- Clients
- Proprietary
- Amira
- HTTP
- Any browser (+ xgraph helper application)
- HDF5
- Any HDF5 aware application
- h5dump
- Amira
- OpenDX
- LCA Vision (soon)
- XML
- Any XML aware application
- Perl/Tk GUI
- Future browsers (need XSL-Stylesheets)
20. Remote Visualization - Issues
- Parallel streaming
  - Cactus can do this, but readers are not yet available on the client side
- Handling of port numbers
  - Clients currently have no method for finding the port number that Cactus is using for streaming
  - Development of an external meta-data server needed (ASC/TIKSL)
- Generic protocols
- Data server
  - Cactus should pass data to a separate server that will handle multiple clients without interfering with the simulation
  - TIKSL provides middleware (streaming HDF5) to implement this
  - Output parameters for each client
21. Remote Steering
[Diagram: a Cactus simulation streams remote viz data to any viz client over HTTP, as XML, and as HDF5 (e.g. to Amira); steering commands travel back along the same channels.]
22. Remote Steering
- Stream parameters from a Cactus simulation to a remote client, which changes parameters (GUI, command line, viz tool) and streams them back to Cactus, where they change the state of the simulation.
- Cactus has a special STEERABLE tag for parameters, indicating that it makes sense to change them during a simulation and that there is support for them to be changed (see the sketch below).
- Examples: IO parameters, frequency, fields, timestep, debugging flags
- Current protocols:
  - XML (HDF5) to a standalone GUI
  - HDF5 to viz tools (Amira)
  - HTTP to web browser (HTML forms)
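A sketch of what such a declaration looks like in a thorn's param.ccl, following the Cactus 4.0 conventions from memory; the parameter itself is illustrative:

    # Sketch of a param.ccl entry: the STEERABLE tag tells the flesh
    # that this parameter may be changed while the simulation runs.
    private:

    INT out_every "How often (in iterations) to produce output" STEERABLE = ALWAYS
    {
      1:* :: "Any positive number"
    } 10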
23. Thorn HTTPD
- Thorn which allows any simulation to act as its own web server (a toy sketch of the idea follows below)
- Connect to the simulation from any browser anywhere
  - Monitor run parameters, basic visualization, ...
  - Change steerable parameters
- See running example at www.CactusCode.org
- Wireless remote viz, monitoring and steering
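The following toy sketch illustrates the idea only; it is not the HTTPD thorn's code. A simulation loop answers plain HTTP requests itself, so any browser can poll it (the real thorn serves its socket non-blocking from inside the Cactus schedule rather than blocking each iteration):

    /* Toy sketch: a "simulation" that is its own web server. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
      int srv = socket(AF_INET, SOCK_STREAM, 0);
      struct sockaddr_in addr = {0};
      addr.sin_family = AF_INET;
      addr.sin_addr.s_addr = htonl(INADDR_ANY);
      addr.sin_port = htons(8080);          /* connect with any browser */
      bind(srv, (struct sockaddr *)&addr, sizeof addr);
      listen(srv, 8);

      for (int it = 0; ; it++)              /* stands in for the evolution loop */
      {
        /* ... one evolution step would go here ... */
        int cli = accept(srv, NULL, NULL);  /* the real thorn polls non-blocking */
        char body[128], resp[256];
        snprintf(body, sizeof body, "Simulation alive, iteration %d\n", it);
        snprintf(resp, sizeof resp,
                 "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n%s", body);
        write(cli, resp, strlen(resp));
        close(cli);
      }
    }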
24. Remote Steering - Issues
- Same kinds of problems as remote visualization
  - Generic protocols
  - Handling of port numbers
  - Broadcasting of active Cactus simulations
- Security
  - Logins
  - Who can change parameters?
- Lots of issues still to resolve ...
25. Remote Offline Visualization
[Diagram: a visualization client in Berlin requests downsampled data and hyperslabs (only what is needed) from a remote data server holding 4 TB at NCSA.]
26. Remote Offline Visualization
- Accessing remote data for local visualization
- Should allow downsampling, hyperslabbing, etc.
- Access via DPSS is working (TIKSL)
- Waiting for DataGrid support for HTTP and FTP to
remove dependency on the DPSS file systems.
27. New Grid Applications
- Dynamic staging: move to a faster/cheaper/bigger machine
  - Cactus Worm
- Multiple universe
  - Create a clone to investigate a steered parameter (Cactus Virus)
- Automatic convergence testing
  - From initial data, or initiated during a simulation
- Look ahead
  - Spawn off and run at coarser resolution to predict the likely future
- Spawn independent/asynchronous tasks
  - Send to a cheaper machine; the main simulation carries on
- Thorn profiling
  - Best machine/queue
  - Choose resolution parameters based on queue
- ...
28. New Grid Applications (2)
- Dynamic load balancing
  - Inhomogeneous loads
  - Multiple grids
- Portal
  - Resource choosing
  - Simulation launching
  - Management
- Intelligent parameter surveys
  - Farm out to different machines
- Make use of
  - Running with management tools such as Condor, Entropia, etc.
  - Scripting thorns (management, launching new jobs, etc.)
  - Dynamic use of e.g. MDS for finding available resources
29. Dynamic Grid Computing
[Diagram: a simulation moves between RZG, SDSC, LRZ and NCSA as it runs: find the best resources, go!; add more resources; free CPUs!!; queue time over, find a new machine; clone job with steered parameter; calculate/output invariants and gravitational waves; look for a horizon; found a horizon, try out excision; archive data.]
30. User's View
31. Cactus Worm
- Egrid Test Bed: 10 sites
- The simulation starts on one machine, seeks out new resources (faster/cheaper/bigger) and migrates there, etc., etc.
- Uses Cactus and Globus
- Protocols: gsissh, gsiftp; streams or copies data (an illustrative sketch follows below)
- Queries the Egrid GIIS at each site
- Publishes simulation information to the Egrid GIIS
- Demonstrated at Dallas SC2000
- Development proceeding with KDI ASC (USA), TIKSL/GriKSL (Germany), GrADS (USA), Application Group of Egrid (Europe)
- Fundamental dynamic Grid application!!!
- Leads directly to many more applications
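An illustrative sketch of a single migration step using the protocols named above; the host name and the recovery command are hypothetical, and the actual Worm logic (GIIS query, fault handling, logging) is omitted:

    /* Sketch only, not the Worm's code: checkpoint, copy, restart. */
    #include <stdlib.h>

    int main(void)
    {
      /* 1. The simulation has written a checkpoint, e.g. checkpoint.h5. */

      /* 2. Copy checkpoint and parameter file to the chosen new host
            (selected by querying the Egrid GIIS) over gsiftp. */
      system("globus-url-copy file:///data/checkpoint.h5 "
             "gsiftp://next.host.example.org/data/checkpoint.h5");

      /* 3. Restart remotely with GSI-authenticated ssh; the simulation
            recovers from the checkpoint and repeats the cycle there. */
      system("gsissh next.host.example.org 'cactus_sim recover.par'");

      return 0;
    }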
32. Demo: Cactus Worm
- Worm running around 10 sites of the Egrid testbed
- Currently developing more features/fault tolerance/logging
- Will run for around 1000 generations (1 day), then dies!
33. Dynamic Grid Computing
- Fundamental issues (all needed for the Cactus Worm)
  - Dynamic resource selection (query information server)
  - Authentication (how to move files, issue remote shell commands)
  - Executable staging (build on demand, or maintain a database?)
  - Data migration (copy, stream, which protocol?)
  - Fault tolerance (essential!!!!)
  - Book-keeping (essential!!!! where did the output go, what actually happened?)
  - Publishing of simulation information (information should be available to you and your collaborators)
34. User Portal
- Find resources
  - Automatically finds machines where the user has an allocation (group aware!)
  - Continuously monitor resources, network, etc.
- Authentication
  - Single login; don't need to remember lots of usernames/passwords
- Launch simulation
  - Automatically create the executable on the chosen machine
  - Write data to an appropriate storage location
  - Negotiate local queue structures
- Monitor/steer simulations
  - Access remote visualization and steering while the simulation is running
  - Collaborative: choose who else can look in and/or steer
  - Performance: how efficient is the simulation?
- Archiving
  - Store thorn lists, parameter files, output locations, configurations, ...
35. Cactus Portal
- KDI ASC project
- Technology: Globus, GSI, Java Beans, DHTML, Java CoG, MyProxy, GPDK, TomCat, Stronghold
- Allows submission of distributed runs
- Accesses the ASC Grid Testbed (SDSC, NCSA, Argonne, ZIB, LRZ, AEI)
- Undergoing testing by users now!
- Main difficulty now is that it requires everything to work: robustness!!
- But it is going to revolutionise our use of computing resources
36. Grid Related Projects
- ASC: Astrophysics Simulation Collaboratory
  - NSF funded (WashU, Rutgers, Argonne, U. Chicago, NCSA)
  - Collaboratory tools, Cactus Portal
  - Starting to use the Portal for production runs
- E-Grid: European Grid Forum (GGF: Global Grid Forum)
  - Working Group for Testbeds and Applications (chair: Ed Seidel)
  - Test application: Cactus + Globus
  - Demos at Dallas SC2000
- GrADS: Grid Application Development Software
  - NSF funded (Rice, NCSA, U. Illinois, UCSD, U. Chicago, U. Indiana, ...)
  - Application driver for grid software
37. Grid Related Projects (2)
- Distributed runs
  - AEI, Argonne, U. Chicago
  - Working towards running on several computers, 1000s of processors (different processors, memories, OSs, resource management, varied networks, bandwidths and latencies)
- TIKSL/GriKSL
  - German DFN funded: AEI, ZIB, Garching
  - Remote online and offline visualization, remote steering/monitoring
- Cactus Team
  - Dynamic distributed computing
  - Testing of alternative communication protocols: MPI, PVM, SHMEM, pthreads, OpenMP, Corba, RDMA, ...
  - Developing the Grid Application Development Toolkit
38. Grid Application Development Toolkit
- The application developer should be able to build simulations with tools that easily enable dynamic grid capabilities
- Want to build a programming API to easily allow (a hypothetical sketch follows below):
  - Query information server (e.g. GIIS)
    - What's available for me? What software? How many processors?
  - Network monitoring
  - Decision thorns
    - How to decide? Cost? Reliability? Size?
  - Spawning thorns
    - Now start this up over here, and that up over there
  - Authentication server
    - Issues commands, moves files on your behalf (can't pass on a Globus proxy)
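Since this API is still to be built, the following is a purely hypothetical sketch of what its C bindings could look like; every name here is invented to mirror the bullets above:

    /* Hypothetical header sketch; none of these functions exist yet. */
    typedef struct grid_resource grid_resource;

    /* Query an information server (e.g. a GIIS): what is available to
       me, what software, how many processors? */
    int grid_find_resources(const char *giis_url, const char *user,
                            grid_resource **out, int *n_out);

    /* Decision: rank candidate resources by cost, reliability, size. */
    int grid_choose(grid_resource *candidates, int n,
                    grid_resource **best);

    /* Spawning: start this task over here, and that one over there. */
    int grid_spawn(const grid_resource *where, const char *executable,
                   const char *parfile);

    /* Authentication server acts on the user's behalf (since a Globus
       proxy cannot simply be passed on): run commands, move files. */
    int grid_auth_exec(const char *host, const char *command);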
39. Grid Application Development Toolkit (2)
- Information server
  - What is running where? Where to connect for viz/steering? What and where are other people in the group running?
- Spawn hierarchies
- Distribute/load-balance
- Data transfer
  - Use whatever method is desired (gsissh, gsiftp, streamed HDF5, scp, GASS, etc.)
- LDAP routines for simulation codes
  - Write simulation information in LDAP format
  - Publish to an LDAP server
- Stage executables
  - CVS checkout of new codes that become connected, etc.
- Etc.
- If we build this, we can get developers and users!
40. More Information ...
- Cactus
  - Web site: www.CactusCode.org (documentation/tutorials etc.)
  - Cactus Worm: www.CactusCode.org/Development/Egrid.html
- Global Grid Forum (Egrid)
  - www.egrid.org
  - www.gridforum.org
- ASC Portal
  - www.ascportal.org
- TIKSL Gigabit Computing
  - www.zib.de/Visual/projects/TIKSL/
- Black hole and neutron star pictures and movies
  - jean-luc.aei.mpg.de
- Any questions: cactus@cactuscode.org