1
The Cactus Code: A Parallel, Collaborative Framework
for Large Scale Computing
  • Gabrielle Allen
  • Max Planck Institute for Gravitational Physics,
  • (Albert Einstein Institute)

2
Outline
  • CACTUS is a freely available, modular, portable and
    manageable environment for collaboratively developing
    parallel, high-performance multi-dimensional simulations
  • THE GRID: dependable, consistent, pervasive access to
    high-end resources
3
History
  • Cactus originated in 1997 as a code for numerical
    relativity, following a long line of codes developed in
    Ed Seidel's research groups, at NCSA and more recently
    the AEI.
  • Numerical relativity: complicated 3D hyperbolic/elliptic
    PDEs, dozens of equations, thousands of terms, many
    people from very different disciplines working together,
    needing a fast, portable, flexible, easy-to-use code
    which can incorporate new technologies without
    disrupting users.
  • Original developers: Paul Walker, Joan Masso, John
    Shalf, Ed Seidel.
  • Cactus 4.0 (August 1999): total rewrite and redesign of
    the code, learning from experiences with previous
    versions.

4
Gravitational Wave Astronomy: A New Field, Fundamental
New Information about the Universe
5
Numerical Relativity With Cactus
  • Biggest computations ever: 256-processor Origin 2000 at
    NCSA, 225,000 SUs, 1 TByte of output data in a few weeks
  • Black Holes (prime source for gravitational waves)
  • Increasingly complex collisions: now doing full 3D
    grazing collisions
  • Gravitational Waves
  • Study linear waves as testbeds
  • Move on to fully nonlinear waves
  • Interesting physics: BH formation in full 3D!
  • Neutron Stars
  • Developing the capability to do full GR hydro
  • Can now follow full orbits!

6
What is Cactus?
  • Flesh (ANSI C) provides the code infrastructure
    (parameter, variable and scheduling databases, error
    handling, APIs, make system, parameter parsing, ...)
  • Thorns (F77/F90/C/C++/Java/Perl/Python) are plug-in,
    swappable modules or collections of subroutines
    providing both the computational infrastructure and the
    physical application. Well-defined interface through 3
    configuration files (see the sketch below).
  • Just about anything can be implemented as a thorn:
    driver layer (MPI, PVM, SHMEM, ...), black hole
    evolvers, elliptic solvers, reduction operators,
    interpolators, web servers, grid tools, IO, ...
  • User driven: easy parallelism, no new paradigms,
    flexible
  • Collaborative: thorns borrow concepts from OOP, thorns
    can be shared, lots of collaborative tools
  • Computational Toolkit: existing thorns for (parallel)
    IO, elliptic solvers, MPI unigrid driver, ...
  • Integrates other common packages and tools: HDF5,
    Globus, PETSc, PAPI, Panda, FlexIO, GrACE, Autopilot,
    LCAVision, OpenDX, Amira, ...
  • Trivially Grid enabled!
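  • For illustration, a minimal sketch of the three
    configuration files for a hypothetical WaveToy-style
    thorn is given below (names, groups and ranges are
    illustrative only; see the Cactus documentation for the
    exact CCL grammar):

      # interface.ccl -- what the thorn provides and inherits
      implements: wavetoy
      inherits: grid
      CCTK_REAL scalarfield TYPE=GF TIMELEVELS=2
      {
        phi
      } "The evolved scalar field"

      # param.ccl -- parameters, with allowed ranges and defaults
      REAL amplitude "Amplitude of the initial data"
      {
        0.0:* :: "any non-negative value"
      } 1.0

      # schedule.ccl -- when the flesh calls the thorn's routines
      schedule WaveToy_Evolve AT evol
      {
        LANG: C
      } "Evolve the scalar field one timestep"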

7
Current Version: Cactus 4.0
  • Cactus 4.0 beta 1 released September 1999
  • Community code: distributed under the GNU GPL
  • Currently at Cactus 4.0 beta 8
  • Supported Architectures
  • SGI Origin
  • SGI 32/64
  • Cray T3E
  • DEC Alpha
  • Intel Linux IA32/IA64
  • Windows NT
  • HP Exemplar
  • IBM SP2
  • Sun Solaris
  • Hitachi SR8000-F
  • NEC SX-5
  • Mac Linux
  • ...

8
Cactus Computational Toolkit: parallel utilities
(thorns) for the computational scientist
  • CactusBase
  • Boundary, IOUtil, IOBasic, CartGrid3D, IOASCII,
    Time
  • CactusBench
  • BenchADM
  • CactusConnect
  • HTTPD, HTTPDExtra
  • CactusExample
  • WaveToy1DF77, WaveToy2DF77
  • CactusElliptic
  • EllBase, EllPETSc, EllSOR, EllTest
  • CactusPUGH
  • Interp, PUGH, PUGHSlab, PUGHReduce
  • CactusPUGHIO
  • IOFlexIO, IOHDF5, IsoSurfacer
  • CactusTest
  • TestArrays, TestCoordinates, TestInclude1,
    TestInclude2, TestComplex, TestInterp, TestReduce
  • CactusWave
  • IDScalarWave, IDScalarWaveC, IDScalarWaveCXX,
    WaveBinarySource, WaveToyC, WaveToyCXX,
    WaveToyF77, WaveToyF90, WaveToyFreeF90
  • external
  • IEEEIO, RemoteIO, TCPXX, jpeg6b
  • BetaThorns (In Development)
  • IOStreamedHDF5, IOJpeg, IOHDF5Util, and many more

9
How To Use Cactus
  • The application scientist usually concentrates on the
    application
  • Physics, performance, algorithms
  • Logically: operations on a grid (structured or
    unstructured)
  • Program in any supported language
  • Then take advantage of the parallel API features enabled
    by Cactus
  • IO, data streaming, remote visualization/steering, AMR,
    MPI/PVM, checkpointing, Grid Computing, interpolations,
    reductions, etc.
  • Abstraction allows one to switch between different MPI
    or PVM layers, different I/O layers, etc., with no or
    minimal changes to the application!
  • (Nearly) all architectures supported and autoconfigured
  • Common to develop on a laptop (with/without MPI) and run
    on anything
  • Metacode concept
  • Very, very lightweight, not a huge framework
  • User specifies the desired code modules in configuration
    files (example below)
  • Desired code generated, automatic routine calling
    sequences, syntax checking, etc.
  • You can actually read the code it creates...
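  • For example, the modules are picked in a ThornList and a
    run is driven by a parameter file; the fragment below is
    a hedged sketch (thorn names follow the toolkit on the
    previous slides, but parameter names may differ between
    releases):

      # ThornList -- modules compiled into this configuration
      CactusBase/CartGrid3D  CactusBase/Time  CactusBase/IOBasic
      CactusPUGH/PUGH        CactusWave/WaveToyC

      # wavetoy.par -- runtime choice of thorns and parameters
      ActiveThorns = "CartGrid3D PUGH Time IOBasic WaveToyC"
      driver::global_nx   = 64
      driver::global_ny   = 64
      driver::global_nz   = 64
      cactus::cctk_itlast = 100

  • Swapping the unigrid PUGH driver or an IO thorn for
    another implementation is then a change to the ThornList
    and the ActiveThorns line, not to the application code.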

10
Cactus Community
11
Grid Computing
  • The AEI Numerical Relativity Group has access to
    high-end resources at over ten centers in Europe and the
    USA
  • They want
  • Bigger simulations, more simulations and faster
    throughput
  • Intuitive IO at the local workstation
  • No new systems/techniques to master!!
  • How to make the best use of these resources?
  • Provide easier access: no one can remember ten
    usernames, passwords, batch systems and file systems; a
    great start!!!
  • Combine resources for larger production runs (more
    resolution badly needed!)
  • Dynamic scenarios: automatically use what is available
  • Many other reasons for Grid Computing: for computer
    scientists, funding agencies, supercomputer centers ...

12
Grid-Enabled Cactus
  • Cactus and its ancestor codes have been using Grid
    infrastructure since 1993
  • Support for Grid computing was part of the design
    requirements for Cactus 4.0 (based on experiences with
    Cactus 3)
  • Cactus compiles out-of-the-box with Globus, using the
    globus device of MPICH-G(2) (see the sketch below)
  • The design of Cactus means that applications are unaware
    of the underlying machine(s) that the simulation is
    running on: applications become trivially Grid-enabled
  • Infrastructure thorns (I/O, driver layers) can be
    enhanced to make the most effective use of the
    underlying Grid architecture
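  • A hedged sketch of how such a build might be requested
    (the MPI-related option names are assumptions based on
    the Cactus 4.0 configuration mechanism and may vary
    between releases):

      # create a configuration whose MPI is MPICH-G's globus device
      gmake wavetoy-config MPI=MPICH MPICH_DEVICE=globus \
                           MPICH_DIR=/path/to/mpich-g2
      # then build the executable for that configuration
      gmake wavetoy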

13
Cactus + Globus
  • Cactus Application Thorns (initial data, evolution,
    analysis, etc.): distribution information hidden from
    the programmer
  • Grid Aware Infrastructure Thorns (drivers for
    parallelism, IO, communication, data mapping): PUGH
    provides parallelism via MPI (MPICH-G2, a grid enabled
    message passing library)
  • Grid Enabled Communication Library: MPICH-G2
    implementation of MPI, can run MPI programs across
    heterogeneous computing resources
  • Below that: standard MPI, or a single processor
14
Grid Experiments
  • SC93
  • remote CM-5 simulation with live viz in a CAVE
  • SC95
  • Heroic I-Way experiments lead to the development of
    Globus. Cornell SP-2 and Power Challenge, with live viz
    in the San Diego CAVE
  • SC97
  • Garching 512-node T3E run launched, controlled and
    visualized in San Jose
  • SC98
  • HPC Challenge: the SDSC, ZIB and Garching T3Es compute
    the collision of 2 neutron stars, controlled from
    Orlando
  • SC99
  • Colliding black holes using the Garching and ZIB T3Es,
    with remote collaborative interaction and viz at the ANL
    and NCSA booths
  • 2000
  • Single simulation across LANL, NCSA, NERSC, SDSC, ZIB,
    Garching, ...
  • Dynamic distributed computing: spawning new
    simulations!!

15
Grand Picture
  • Simulations launched from the Cactus Portal; Grid
    enabled Cactus runs on distributed machines (T3E at
    Garching, Origin at NCSA, connected via Globus)
  • Remote viz and steering from Berlin; remote steering and
    monitoring from an airport; remote viz in St Louis; viz
    of data from previous simulations in a San Francisco
    cafe
  • Data moved via DataGrid/DPSS with downsampling,
    isosurfaces over HTTP, full data via HDF5
16
Demo: Remote Computing
  • Have most of this working now
  • Need to make it commonplace, and trivially available to
    users
  • Requires development of readers/networks for viz clients
    too
  • Remote simulation
  • Monitor and steer using thorn HTTPD
  • Display live isosurfaces with thorn IsoSurfacer and the
    IsoView GUI
  • Display full live viz with the HDF5 thorns and OpenDX

17
Remote Visualization
  • (Diagram) Clients: OpenDX, Amira, LCA Vision
  • Data: grid functions (streaming HDF5), isosurfaces and
    geodesics, contour plots (download)
18
Remote Visualization
  • Streaming data from the Cactus simulation to a viz
    client
  • Clients: OpenDX, Amira, LCA Vision, ...
  • Protocols
  • Proprietary
  • Isosurfaces, geodesics
  • HTTP
  • Parameters, xgraph data, JPEGs
  • Streaming HDF5
  • HDF5 provides downsampling and hyperslabbing (see the
    sketch after this list)
  • all of the above data, and any other HDF5 data (e.g.
    2D/3D)
  • two different technologies
  • Streaming Virtual File Driver (I/O rerouted over a
    network stream)
  • XML wrapper (HDF5 calls wrapped and translated into XML)
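  • The downsampling and hyperslabbing above are ordinary
    HDF5 selections; the C fragment below is a minimal
    client-side sketch, assuming the streamed data has
    arrived as a file containing a 3D dataset called "phi"
    (names and sizes are illustrative):

      #include <hdf5.h>

      /* Read every 4th point of a 3D grid function: the strided
         hyperslab selection is what implements the downsampling. */
      int read_downsampled(const char *fname, double *buf)
      {
          hid_t file   = H5Fopen(fname, H5F_ACC_RDONLY, H5P_DEFAULT);
          hid_t dset   = H5Dopen(file, "phi", H5P_DEFAULT);
          hid_t fspace = H5Dget_space(dset);

          hsize_t start[3]  = {0, 0, 0};
          hsize_t stride[3] = {4, 4, 4};     /* downsampling factor */
          hsize_t count[3]  = {16, 16, 16};  /* points kept per dim */
          H5Sselect_hyperslab(fspace, H5S_SELECT_SET,
                              start, stride, count, NULL);

          /* buf must hold 16*16*16 doubles */
          hid_t  mspace = H5Screate_simple(3, count, NULL);
          herr_t status = H5Dread(dset, H5T_NATIVE_DOUBLE, mspace,
                                  fspace, H5P_DEFAULT, buf);

          H5Sclose(mspace); H5Sclose(fspace);
          H5Dclose(dset);   H5Fclose(file);
          return (status < 0) ? -1 : 0;
      }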

19
Remote Visualization (2)
  • Clients
  • Proprietary
  • Amira
  • HTTP
  • Any browser (plus xgraph as a helper application)
  • HDF5
  • Any HDF5 aware application
  • h5dump
  • Amira
  • OpenDX
  • LCA Vision (soon)
  • XML
  • Any XML aware application
  • Perl/Tk GUI
  • Future browsers (need XSL-Stylesheets)

20
Remote Visualization - Issues
  • Parallel streaming
  • Cactus can do this, but readers are not yet available on
    the client side
  • Handling of port numbers
  • clients currently have no method for finding the port
    number that Cactus is using for streaming
  • development of an external meta-data server is needed
    (ASC/TIKSL)
  • Generic protocols
  • Data server
  • Cactus should pass data to a separate server that
    will handle multiple clients without interfering
    with simulation
  • TIKSL provides middleware (streaming HDF5) to
    implement this
  • Output parameters for each client

21
Remote Steering
  • (Diagram) Remote viz data and parameters flow between
    the simulation and the clients: any viz client via HTTP,
    a standalone GUI via XML, Amira via HDF5
22
Remote Steering
  • Stream parameters from the Cactus simulation to a remote
    client, which changes parameters (GUI, command line, viz
    tool) and streams them back to Cactus, where they change
    the state of the simulation.
  • Cactus has a special STEERABLE tag for parameters,
    indicating that it makes sense to change them during a
    simulation and that there is support for them to be
    changed (see the param.ccl sketch below).
  • Examples: IO parameters, output frequency, fields,
    timestep, debugging flags
  • Current protocols
  • XML (HDF5) to a standalone GUI
  • HDF5 to viz tools (Amira)
  • HTTP to a web browser (HTML forms)
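  • In param.ccl the tag is just an extra attribute on the
    parameter declaration; a minimal sketch (the parameter
    name is illustrative):

      INT out_every "How often (in iterations) to write output" STEERABLE = ALWAYS
      {
        1:* :: "any positive number of iterations"
      } 10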

23
Thorn HTTPD
  • Thorn which allows any simulation to act as its own web
    server (activated from the parameter file, see the
    sketch below)
  • Connect to the simulation from any browser anywhere
  • Monitor run parameters, basic visualization, ...
  • Change steerable parameters
  • See a running example at www.CactusCode.org
  • Wireless remote viz, monitoring and steering
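  • Activating it is a parameter-file matter; a minimal
    sketch, assuming the thorn's port parameter is named
    httpd::port (check the thorn documentation for the exact
    name):

      ActiveThorns = "HTTPD HTTPDExtra ..."  # plus the usual simulation thorns
      httpd::port  = 5555                    # then browse to http://<host>:5555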

24
Remote Steering - Issues
  • Same kinds of problems as remote visualization
  • generic protocols
  • handling of port numbers
  • broadcasting of active Cactus simulations
  • Security
  • Logins
  • Who can change parameters?
  • Lots of issues still to resolve ...

25
Remote Offline Visualization
  • (Diagram) A visualization client in Berlin connects to a
    remote data server holding 4 TB of data at NCSA;
    downsampling and hyperslabs mean only what is needed is
    transferred
26
Remote Offline Visualization
  • Accessing remote data for local visualization
  • Should allow downsampling, hyperslabbing, etc.
  • Access via DPSS is working (TIKSL)
  • Waiting for DataGrid support for HTTP and FTP to
    remove dependency on the DPSS file systems.

27
New Grid Applications
  • Dynamic Staging: move to a faster/cheaper/bigger machine
  • Cactus Worm
  • Multiple Universe
  • create a clone to investigate a steered parameter
    (Cactus Virus)
  • Automatic Convergence Testing
  • from initial data, or initiated during the simulation
  • Look Ahead
  • spawn off and run a coarser resolution to predict the
    likely future
  • Spawn Independent/Asynchronous Tasks
  • send to a cheaper machine, main simulation carries on
  • Thorn Profiling
  • best machine/queue
  • choose resolution parameters based on the queue
  • ...

28
New Grid Applications (2)
  • Dynamic Load Balancing
  • inhomogeneous loads
  • multiple grids
  • Portal
  • resource choosing
  • simulation launching
  • management
  • Intelligent Parameter Surveys
  • farm out to different machines
  • Make use of
  • Running with management tools such as Condor,
    Entropia, etc.
  • Scripting thorns (management, launching new jobs,
    etc)
  • Dynamic use of e.g. MDS for finding available resources

29
Dynamic Grid Computing
  • (Diagram) A simulation hops between SDSC, RZG, LRZ and
    NCSA: find the best resources and go; add more
    resources; queue time over, find a new machine; free
    CPUs appear, so clone the job with a steered parameter;
    found a horizon, try out excision
  • Along the way: calculate/output invariants,
    calculate/output gravitational waves, look for a
    horizon, archive data
30
User's View
31
Cactus Worm
  • Egrid Test Bed: 10 sites
  • Simulation starts on one machine, seeks out new
    resources (faster/cheaper/bigger) and migrates there,
    and so on
  • Uses Cactus and Globus
  • Protocols: gsissh, gsiftp; streams or copies data
  • Queries the Egrid GIIS at each site
  • Publishes simulation information to the Egrid GIIS
  • Demonstrated at SC2000 in Dallas
  • Development proceeding with KDI ASC (USA), TIKSL/GriKSL
    (Germany), GrADS (USA) and the Application Group of
    Egrid (Europe)
  • A fundamental dynamic Grid application!!!
  • Leads directly to many more applications

32
Demo: Cactus Worm
  • Worm running around 10 sites of the Egrid testbed
  • Currently developing more features, fault tolerance and
    logging
  • Will run for around 1000 generations (1 day) and then
    die!

33
Dynamic Grid Computing
  • Fundamental issues (all needed for the Cactus Worm)
  • Dynamic resource selection (query an information server)
  • Authentication (how to move files, issue remote shell
    commands)
  • Executable staging (build on demand, or maintain a
    database?)
  • Data migration (copy, stream, which protocol?)
  • Fault tolerance (essential!!!!)
  • Book-keeping (essential!!!! where did the output go,
    what actually happened?)
  • Publishing of simulation information (information should
    be available to you and your collaborators)

34
User Portal
  • Find resources
  • automatically finds machines where the user has an
    allocation (group aware!)
  • continuously monitor resources, network, etc.
  • Authentication
  • single login, don't need to remember lots of
    usernames/passwords
  • Launch simulation
  • automatically create the executable on the chosen
    machine
  • write data to an appropriate storage location
  • negotiate local queue structures
  • Monitor/steer simulations
  • access remote visualization and steering while the
    simulation is running
  • collaborative: choose who else can look in and/or steer
  • performance: how efficient is the simulation?
  • Archiving
  • store thorn lists, parameter files, output locations,
    configurations, ...

35
Cactus Portal
  • KDI ASC Project
  • Technology: Globus, GSI, Java Beans, DHTML, Java CoG,
    MyProxy, GPDK, TomCat, Stronghold
  • Allows submission of distributed runs
  • Accesses the ASC Grid Testbed (SDSC, NCSA, Argonne, ZIB,
    LRZ, AEI)
  • Undergoing testing by users now!
  • Main difficulty now is that it requires everything to
    work: robustness!!
  • But it is going to revolutionise our use of computing
    resources

36
Grid Related Projects
  • ASC: Astrophysics Simulation Collaboratory
  • NSF funded (WashU, Rutgers, Argonne, U. Chicago, NCSA)
  • Collaboratory tools, Cactus Portal
  • Starting to use the Portal for production runs
  • E-Grid: European Grid Forum (GGF: Global Grid Forum)
  • Working Group for Testbeds and Applications (Chair: Ed
    Seidel)
  • Test application: Cactus + Globus
  • Demos at SC2000 in Dallas
  • GrADS: Grid Application Development Software
  • NSF funded (Rice, NCSA, U. Illinois, UCSD, U. Chicago,
    U. Indiana, ...)
  • Application driver for grid software

37
Grid Related Projects (2)
  • Distributed Runs
  • AEI, Argonne, U. Chicago
  • Working towards running on several computers, with
    1000s of processors (different processors, memories,
    OSs, resource management, varied networks, bandwidths
    and latencies)
  • TIKSL/GriKSL
  • German DFN funded: AEI, ZIB, Garching
  • Remote online and offline visualization, remote
    steering/monitoring
  • Cactus Team
  • Dynamic distributed computing
  • Testing of alternative communication protocols: MPI,
    PVM, SHMEM, pthreads, OpenMP, Corba, RDMA, ...
  • Developing the Grid Application Development Toolkit

38
Grid Application Development Toolkit
  • Application developers should be able to build
    simulations with tools that easily enable dynamic grid
    capabilities
  • Want to build a programming API that easily allows
    (illustrative sketch after this list):
  • Query information server (e.g. GIIS)
  • What's available for me? What software? How many
    processors?
  • Network Monitoring
  • Decision Thorns
  • How to decide? Cost? Reliability? Size?
  • Spawning Thorns
  • Now start this up over here, and that up over there
  • Authentication Server
  • Issues commands, moves files on your behalf (can't pass
    on a Globus proxy)
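  • No such API exists yet; purely as an illustration, a
    hypothetical C-style interface covering the items above
    might look like the following (every name here is
    invented):

      /* Hypothetical Grid Application Development Toolkit interface */
      typedef struct GADT_Resource GADT_Resource;

      /* Query an information server (e.g. a GIIS) for matching resources */
      int GADT_QueryResources(const char *giis_url, const char *requirements,
                              GADT_Resource **found, int *nfound);

      /* Monitor the network between two resources */
      int GADT_NetworkBandwidth(const GADT_Resource *a, const GADT_Resource *b,
                                double *mbit_per_sec);

      /* Decide where to run, based on cost, reliability, size */
      int GADT_SelectResource(GADT_Resource *candidates, int n,
                              const char *policy, GADT_Resource **chosen);

      /* Spawn part of the simulation on another resource */
      int GADT_Spawn(const GADT_Resource *where, const char *executable,
                     const char *parameter_file);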

39
Grid Application Development Toolkit (2)
  • Information Server
  • What is running where? Where to connect for
    viz/steering? What and where are other people in the
    group running?
  • Spawn hierarchies
  • Distribute/load-balance
  • Data Transfer
  • Use whatever method is desired
  • gsi-ssh, gsi-ftp, streamed HDF5, scp, GASS, etc.
  • LDAP routines for simulation codes
  • Write simulation information in LDAP format (example
    entry after this list)
  • Publish to an LDAP server
  • Stage Executables
  • CVS checkout of new codes that become connected, etc.
  • Etc.
  • If we build this, we can get developers and users!
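  • As an illustration of simulation information in LDAP
    format, a hypothetical LDIF entry that a running
    simulation might publish (all attribute names and values
    are invented):

      dn: simulation=wavetoy-run-042,ou=cactus,o=egrid
      objectclass: cactusSimulation
      simulationName: wavetoy-run-042
      host: some-machine.example.org
      httpdPort: 5555
      owner: gallen
      currentIteration: 4200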

40
More Information ...
  • Cactus
  • Web site: www.CactusCode.org (documentation, tutorials,
    etc.)
  • Cactus Worm:
    www.CactusCode.org/Development/Egrid.html
  • Global Grid Forum (Egrid)
  • www.egrid.org
  • www.gridforum.org
  • ASC Portal
  • www.ascportal.org
  • TIKSL: Gigabit Computing
  • www.zib.de/Visual/projects/TIKSL/
  • Black hole and neutron star pictures and movies
  • jean-luc.aei.mpg.de
  • Any questions: cactus@cactuscode.org