Terascale Numerical Relativity using Cactus - PowerPoint PPT Presentation


1
Terascale Numerical Relativity using Cactus
  • John Shalf
  • LBNL/NERSC
  • Ed Seidel, Gabrielle Allen, and the Cactus Team
  • Max Planck Institute for Gravitational Physics
    (Albert Einstein Institute)

2
The Story in 5 Chapters
  • The Science
  • Cactus: A Community Code
  • The Grid: Pervasive Access to Distributed Resources
  • Portals: Spacetime Superglue to Put it All
    Together
  • What's Next? Dynamic Computing on the Grid?

3
Chapter I
  • The Science

4
Gravitational Wave Astronomy: New Field,
Fundamental New Information about the Universe
5
Motivation for Grand Challenge Simulations
  • NSF Black Hole Grand Challenge
  • 8 US Institutions, 5 years
  • Towards colliding black holes
  • Examples of the Future of Science & Engineering
  • Require Large Scale Simulations, beyond reach of
    any single machine
  • Require Large Geo-Distributed Cross-Disciplinary
    Collaborations
  • Require Grid Technologies, but not yet using
    them!
  • Both Apps and Grids Dynamic

6
Working Towards the Big Splash
  • Finite Difference Evolution of Einstein's
    Equations (ADM-BSSN method; the standard
    evolution equations are sketched after this list)
  • Schwarzschild (1916 solution!)
  • Kerr (spinning, 1963!)
  • Misner (head-on collision)
  • Good for calibration, but not a likely event
  • 16 GB of Memory for 190³ Octant Symmetry in 3D
    on a 512-CPU CM5 in '95
  • Grazing Collisions / Full In-Spiral
  • This is astrophysically relevant!
  • No analytic solution
  • 1.5 TByte / 5 TFlops for bitant symmetry on NERSC-3
  • 3 TByte required for full 3D
  • 10 TBytes for wave extraction
  • Initial Conditions (the next big thing)
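As a reminder of what is being evolved, here is the standard vacuum ADM 3+1 form of the evolution equations (the BSSN variant referenced above adds a conformal decomposition of the metric and extrinsic curvature); this is the textbook form, not material from the talk itself:

    \partial_t \gamma_{ij} = -2\alpha K_{ij} + D_i \beta_j + D_j \beta_i

    \partial_t K_{ij} = -D_i D_j \alpha
        + \alpha \left( R_{ij} + K K_{ij} - 2 K_{ik} K^k{}_j \right)
        + \beta^k D_k K_{ij} + K_{ik} D_j \beta^k + K_{kj} D_i \beta^k

where \gamma_{ij} is the spatial metric, K_{ij} the extrinsic curvature, \alpha the lapse, \beta^i the shift, and D_i the covariant derivative compatible with \gamma_{ij}.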

7
2002 Big Splash on Seaborg (NERSC)
  • The Splash (Recipe)
  • The Cactus Code
  • 5 TFlop supercomputer system at NERSC Oakland
    Scientific Computing Facility (OSF)
  • 1.5 TBytes of RAM (1024x1024x768 double-precision
    points x 250 grid functions; see the arithmetic
    sketch after this list)
  • Set aside 5 TB of disk space (2 TB for checkpoints
    alone)
  • Two Deployment Scenarios
  • 184 nodes
  • 64 fat nodes
  • Consumed over 1M CPU hours in 2 months (114 CPU
    years!)
  • Results?
  • Followed closely the predictions of the Meudon Model
    (counter to the Cook-Baumgarte Model for
    coalescences). More analysis to come!
  • Visualization of BH Merger in April Scientific
    American Article
  • Discovery Channel Movie
  • Vis by Werner Benger, production by Tom Lucas,
    Donna Cox and Bob Patterson
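A quick sanity check of the memory figure quoted above; a minimal sketch, using only the numbers from this slide (grid size, 250 grid functions, the 184-node deployment) plus the standard 8 bytes per double:

    #include <stdio.h>

    int main(void)
    {
        /* Global grid from the slide: 1024 x 1024 x 768 points */
        double points = 1024.0 * 1024.0 * 768.0;
        /* ~250 double-precision (8-byte) grid functions held in memory */
        double bytes  = points * 250.0 * 8.0;

        printf("total           = %.2f TB\n", bytes / 1.0e12);   /* ~1.6 TB  */
        printf("per node (x184) = %.1f GB\n",
               bytes / 1.0e9 / 184.0);                            /* ~8.8 GB */
        return 0;
    }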

8
2D BBH Spacetime Splashes Circa 1992 (vis by
Mark Bajuk)
9
3D Big Splash in Scientific American (Image by
Werner Benger)
10
Evaluation of Apparent Horizon Boundary
Conditions
11
Uncovering Painfully Obvious Numerical Nonsense
12
AMR Diagnostics
Debug a Clustering Algorithm
Convergence Testing
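For reference, the standard three-level convergence test behind diagnostics like these (generic numerical-analysis practice, not specific to this code): run the same problem at resolutions h, h/2 and h/4 and estimate the convergence order

    p \approx \log_2 \frac{\lVert u_h - u_{h/2} \rVert}{\lVert u_{h/2} - u_{h/4} \rVert}

A second-order finite-difference scheme should give p \approx 2; a drifting or collapsing p flags exactly the kind of "numerical nonsense" the previous slide refers to.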
13
Role of Visualization?
  • "Research is what I'm doing when I don't know
    what I'm doing" (Wernher von Braun)
  • Data Mining? (maybe?)
  • Drill Down (yes)
  • Larger simulations mean larger spatial dynamic
    range
  • Understanding connection between large scale and
    small scale features is critical
  • New datastructures (tensor vis, geodesics, AMR
    hierarchies, multidimensional analysis)
  • Computational Monitoring is important
  • Rapid visual inspection for quick turn-around
    during development
  • Shepherd/protect very costly hero runs
  • Best way to deal with big data is to move it as
    little as possible
  • Offline analysis is also important, but may
    involve a completely different set of tools and
    methods (even serial raytracing)
  • Physicists have little tolerance for complexity
    or software installation
  • Motivates need for vis portals and thin-client
    interfaces to vis tools.
  • More dims = more qualitative (1D vis is still
    critical!)
  • Need vis tools customized for the domain
  • General-purpose tools have too many options;
    confusing and unwieldy

14
Multidisciplinary Scientific Communities
  • Nature is fundamentally multidisciplinary. As
    we strive to understand its complexity,
    researchers from different fields and different
    locations must become engaged in large
    multinational teams to tackle these Grand
    Challenge problems
  • Need a software infrastructure to support this:
    the multidisciplinary Virtual Organization (VO)
  • Community code (open/modular/shared simulation
    codes)
  • Tools that support collaboration and data sharing
  • Location-independent, equal access to shared
    resources (visualization, supercomputers,
    experiments, telescopes, etc.)

15
Chapter II
  • Cactus
  • A Community Code

16
Cactus
  • CACTUS is a freely available, modular, portable
    and manageable environment for collaboratively
    developing parallel, high-performance
    multi-dimensional simulations

THE GRID: Dependable, consistent, pervasive
access to high-end resources
www.CactusCode.org
17
History
  • Cactus originated in 1997 as a code for numerical
    relativity, following a long line of codes
    developed in Ed Seidel's research groups at NCSA
    and, more recently, the AEI.
  • Numerical Relativity: complicated 3D
    hyperbolic/elliptic PDEs, dozens of equations,
    thousands of terms; many people from very
    different disciplines working together, needing a
    fast, portable, flexible, easy-to-use code which
    can incorporate new technologies without
    disrupting users.
  • Originally: Paul Walker, Joan Masso, John Shalf,
    Ed Seidel.
  • Cactus 4.0, August 1999: Total rewrite and
    redesign of the code, learning from experiences
    with previous versions.

18
What Is Cactus?
  • Modular Component Architecture for Simulation
    Code Development
  • Multi-Language: C, C++, F90/F77
  • Tightly Integrated with Revision Control System
    (CVS)
  • Trivially Grid-Enabled
  • Open Source Community Code Distributed under GNU
    GPL and Actively Supported/Documented
  • Current Release: Cactus 4.0 B12
  • Supported Architectures
  • IBM SP2
  • Cray T3E
  • Hitachi SR8000-F
  • NEC SX-5
  • Intel Linux IA32/IA64
  • Windows NT
  • MacOS-X
  • HP Exemplar
  • Sun Solaris
  • SGI Origin (n32/64)
  • Dec Alpha
  • ...

19
Modularity of Cactus...
Symbolic Manip App
Sub-app
Legacy App 2
Application 2
Application 1
...
User selects desired functionality; code is
created...
Cactus Flesh
Abstractions...
Unstructured...
AMR (GrACE, Carpet, etc)
I/O layer 2
MPI layer 3
MDS/Remote Spawn
Remote Steer 2
Globus Metacomputing Services
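To make the flesh/thorn split concrete, here is a purely illustrative C sketch of the underlying pattern: a central "flesh" owns the main loop and calls whatever routines independently developed modules register with it. This is not the real Cactus API (actual thorns are described through CCL configuration files and CCTK_* macros); it only shows the shape of the abstraction.

    #include <stdio.h>

    /* Illustrative sketch only: a tiny "flesh" that evolves registered
       "thorns". The real Cactus flesh also manages parameters, grid
       variables, drivers, I/O, and more. */

    typedef void (*thorn_routine)(double t);

    #define MAX_THORNS 16
    static thorn_routine evolve_list[MAX_THORNS];
    static int n_thorns = 0;

    /* Thorns register the routines they want scheduled at the EVOL step. */
    void flesh_register_evol(thorn_routine f) {
        if (n_thorns < MAX_THORNS) evolve_list[n_thorns++] = f;
    }

    /* Two independent "thorns" */
    static void einstein_evol(double t) { printf("Einstein step at t=%g\n", t); }
    static void hydro_evol(double t)    { printf("Hydro step at t=%g\n", t); }

    int main(void) {
        /* The user's choice of functionality = which thorns register. */
        flesh_register_evol(einstein_evol);
        flesh_register_evol(hydro_evol);

        /* The flesh owns the main loop and calls whatever was registered. */
        for (double t = 0.0; t < 3.0; t += 1.0)
            for (int i = 0; i < n_thorns; i++)
                evolve_list[i](t);
        return 0;
    }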
20
Supported Simulation Types
  • Unigrid
  • Numerics: Einstein, Hydro (Valencia, EOS), MHD
    (Zeus), PETSc
  • Features: Computational Monitoring, Vis, Parallel
    I/O (PANDA, HDF5, FlexIO)
  • Metacomputing: MPICH-G2, SC2001 Gordon Bell
    Award
  • AMR (Berger-Oliger, Berger-Colella)
  • DAGH (the Framework), '97
  • Carpet
  • PAGH/GrACE
  • Unstructured Grids
  • '99 Unstructured Grids summit @ LBL (AEI, Cornell,
    LLNL, Stanford, SDSC)
  • PPPL: PIM, curvilinear meshes
  • Chemistry
  • U. Kansas (Karen Camarda)
  • Cornell: Crack Propagation

21
Cactus Community
22
Chapter III
  • The Grid
  • Pervasive Access to Distributed Resources

23
Why Grid Computing?
  • Cactus Numerical Relativity Community has access
    to high-end resources in over ten centers in
    Europe/USA
  • They want
  • Bigger simulations, more simulations and faster
    throughput
  • Intuitive IO at local workstation
  • No new systems/techniques to master!!
  • How to make best use of these resources?
  • Provide easier access: no one can remember ten
    usernames, passwords, batch systems, file
    systems; a great start!!!
  • Combine resources for larger production runs
    (more resolution badly needed!)
  • Dynamic scenarios: automatically use what is
    available
  • Remote/collaborative visualization, steering,
    monitoring
  • Many other motivations for Grid computing ...

24
Grid Applications: Some Examples
  • Dynamic Staging: move to faster/cheaper/bigger
    machine
  • Cactus Worm
  • Multiple Universes
  • create clone to investigate steered parameter
  • Automatic Convergence Testing
  • from initial data or initiated during simulation
  • Look Ahead
  • spawn off and run coarser resolution to predict
    likely future
  • Spawn Independent/Asynchronous Tasks
  • send to cheaper machine, main simulation carries
    on (see the spawning sketch after this list)
  • Thorn Profiling
  • best machine/queue, choose resolution parameters
    based on queue
  • Dynamic Load Balancing
  • inhomogeneous loads, multiple grids
  • Intelligent Parameter Surveys
  • farm out to different machines
  • Must get application community to rethink
    algorithms
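One way such spawning can be expressed is with MPI-2 dynamic process management; a hedged sketch (the helper binary "analysis_task" and its arguments are hypothetical, and the projects described here did not necessarily use this exact mechanism):

    #include <mpi.h>

    /* Sketch: the main simulation farms an independent analysis task out
       to whatever resources the spawned job lands on, then carries on. */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Comm children;
        /* Hypothetical helper executable and arguments. */
        char *task_argv[] = { "--checkpoint", "step_0042.h5", NULL };
        int spawn_err[4];

        /* Launch 4 copies of the helper on whatever MPI can find. */
        MPI_Comm_spawn("analysis_task", task_argv, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &children, spawn_err);

        /* Main evolution loop continues without waiting for the children. */
        for (int step = 0; step < 100; step++) {
            /* ... evolve Einstein equations ... */
        }

        MPI_Comm_disconnect(&children);
        MPI_Finalize();
        return 0;
    }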

25
Grand Picture
Viz of data from previous simulations in SF café
Remote steering and monitoring from airport
Remote Viz in St Louis
Remote Viz and steering from Berlin
DataGrid/DPSS Downsampling
IsoSurfaces
http
HDF5
SP2 NERSC
Origin AEI
Globus
Simulations launched from Cactus Portal
Grid enabled Cactus runs on distributed machines
26
Remote Visualization
OpenDX
OpenDX
Amira
Amira
All Remote Files VizLauncher (download)
IsoSurfaces and Geodesics
LCA Vision
Grid Functions: Streaming HDF5 (downsampling to
match bandwidth; see the hyperslab sketch after
this slide)
Use a variety of local clients to view remote
simulation data. Collaborative: colleagues can
access from anywhere. Now adding matching of data
to network characteristics.
Amira
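A minimal sketch of the downsampled read that makes this practical over a slow link, using the standard HDF5 hyperslab-selection API (HDF5 1.8+ calls; the file name, dataset path and stride of 4 are assumptions for illustration):

    #include <hdf5.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Open a staged or remotely served file (names are illustrative). */
        hid_t file   = H5Fopen("bbh_run.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t dset   = H5Dopen2(file, "/grid_function_rho", H5P_DEFAULT);
        hid_t fspace = H5Dget_space(dset);

        /* Read every 4th point in each dimension: a 64x downsampled view. */
        hsize_t start[3]  = {0, 0, 0};
        hsize_t stride[3] = {4, 4, 4};
        hsize_t count[3]  = {256, 256, 192};   /* 1024/4 x 1024/4 x 768/4 */
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, NULL);

        hid_t   mspace = H5Screate_simple(3, count, NULL);
        double *buf    = malloc(sizeof(double) * 256 * 256 * 192);
        H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

        /* ... hand buf to the local viz client ... */

        free(buf);
        H5Sclose(mspace); H5Sclose(fspace); H5Dclose(dset); H5Fclose(file);
        return 0;
    }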
27
Remote Monitoring/Steering: Thorn HTTPD
  • Thorn which allows any simulation to act as its
    own web server
  • Connect to the simulation from any browser,
    anywhere; collaborate
  • Monitor run parameters, basic visualization, ...
  • Change steerable parameters
  • See running example at www.CactusCode.org
  • Wireless remote viz, monitoring and steering
    (a minimal embedded-server sketch follows this list)
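For flavor, a stripped-down sketch of the idea of a simulation answering HTTP requests from inside its own process (plain POSIX sockets, single-threaded, no steering or authentication; this is not the actual Cactus HTTPD thorn):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Serve a tiny status page on port 8080 while a simulation runs. */
    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port        = htons(8080);
        bind(srv, (struct sockaddr *)&addr, sizeof(addr));
        listen(srv, 8);

        int    iteration = 0;
        double phys_time = 0.0;

        for (;;) {
            /* ... one evolution step would go here ... */
            iteration++; phys_time += 0.125;

            /* Check for a browser between steps (blocking here for brevity;
               a real thorn would poll without stalling the evolution). */
            int client = accept(srv, NULL, NULL);
            if (client < 0) continue;

            char req[1024], body[256], resp[512];
            read(client, req, sizeof(req));         /* ignore request text */
            int n = snprintf(body, sizeof(body),
                             "<html><body>iteration %d, t = %g</body></html>",
                             iteration, phys_time);
            snprintf(resp, sizeof(resp),
                     "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n"
                     "Content-Length: %d\r\n\r\n%s", n, body);
            write(client, resp, strlen(resp));
            close(client);
        }
    }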

28
Remote Steering
Any Viz Client
HTTP
Remote Viz data
XML
HDF5
Amira
Remote Viz data
29
Vis Launcher
  • VizLauncher: Output data (remote files/streamed
    data) automatically launched into the appropriate
    local Viz Client (extending to include
    application-specific networks)
  • Debugging information (individual thorns can
    easily provide their own information)
  • Timing information (thorns, communications, IO)
    allows users to steer their simulation for better
    performance (switch off analysis/IO)

30
Remote File Access
Viz in Berlin
VisualizationClient
Downsampling, hyperslabs
Only what is needed
Web Server
FTP Server
DPSS Server
Remote Data Server
4TB at NCSA
31
Remote File Access
HDF5 VFD / GridFTP: Clients use a file
URL (downsampling, hyperslabbing)
More Bandwidth Available
NCSA (USA)
32
Chapter IV
  • Portal Architecture
  • Spacetime Superglue to make these components work
    together for the Virtual Organization

33
Cactus/ASC Portal
  • KDI ASC Project (Argonne, NCSA, AEI, LBL, WashU)
  • Technology: Web-based (end-user requirement)
  • Globus, GSI, DHTML, Java CoG, MyProxy, GPDK,
    TomCat, Stronghold/Apache, SQL/RDBMS
  • Portal should hide/simplify the Grid for users
  • Single access, locates resources, builds/finds
    executables, central management of parameter
    files/job output, submit jobs to local batch
    queues, tracks active jobs. Submission/management
    of distributed runs
  • Accesses the ASC Grid Testbed

34
Portal Client Layers
  • Thin Client: Slow interaction, but you know it's
    going to work!
  • Delivery: DHTML to any ol' web browser
  • Users: No time investment
  • Slender Client: Faster interaction, but primary
    work on remote server. Download on every
    invocation!
  • Delivery: Java applet, signed applications, DCOM,
    tiny binaries
  • Users: Some time investment in acquiring a
    compliant JVM.
  • Fat Clients: Portal merely a data broker between
    distributed resources and your helper
    application.
  • Delivery: Standalone applications of any sort (or
    even a veneer)
  • Users: More significant time investment to
    install the helper app.

35
Computational Physics: Complex Workflow
[Workflow diagram: Code Dev → Sim Dev → Production → Analysis.
Steps shown include: acquire code modules; set params and initial data;
configure and build; run many test jobs; regression tests; remote vis
and steering; report/fix bugs; select the largest resource and run for
a week; steer, kill, or restart; archive TBs of data; select and stage
data to a storage array; data mine; check novel results against
observation; and, finally, papers and Nobel Prizes.]
36
Dynamic Grid Computing
Add more resources
Queue time over, find new machine
Free CPUs!!
Clone job with steered parameter
Physicist has new idea!
37
Chapter V
  • What's Next?
  • Distributed Applications with Intelligent
    Adaptation? Nomadic Grid Entities?

38
New Paradigms
  • Dynamically reDistributed Applications
  • Code should be aware of its environment
  • What resources are out there NOW, and what is
    their current state?
  • What is my allocation?
  • What is the bandwidth/latency between sites?
  • Code should be able to make decisions on its own
  • A slow part of my simulation can run
    asynchronously: spawn it off!
  • New, more powerful resources just became
    available: migrate there!
  • Machine went down: reconfigure and recover!
  • Need more memory: get it by adding more machines!
  • Code should be able to publish this information
    to Portal for tracking, monitoring, steering
  • Unexpected event: notify users!
  • Collaborators from around the world all connect,
    examine simulation.
  • Two prototypical examples
  • Dynamic, Adaptive Distributed Computing
  • Cactus Worm: Intelligent Simulation Migration

39
Distributed Computing
  • Why do this?
  • Capability: Need more memory than a single
    machine has
  • Throughput: For smaller jobs, can still be
    quicker than queues
  • Technology
  • Globus GRAM for job submission/authentication
  • MPICH-G2 for communications (Native MPI/TCP)
  • Cactus simply compiled with MPICH-G2
    implementation of MPI
  • gmake cactus MPI=globus
  • New Cactus Communication Technologies
  • Overlap communication/computation (see the
    sketch after this list)
  • Simulation dynamically adapts to WAN network
  • Compression/Buffer size for communication
  • Extra ghostzones, communicate across WAN every N
    timesteps
  • Available generically for all applications/grid
    topologies
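A hedged sketch of the communication/computation overlap idea using standard non-blocking MPI (the 1-D layout, ghost-zone width, neighbor ranks and stencil are illustrative, not Cactus's actual driver code):

    #include <mpi.h>

    #define NI 128   /* local interior points per rank (1-D for brevity)   */
    #define G  4     /* ghost width; widening it lets you exchange less often */

    /* One step of a domain-decomposed update that hides WAN latency:
       post the ghost exchange, update the interior while messages are in
       flight, then finish the boundary cells once the ghosts arrive.
       Layout of u: [0..G-1] left ghosts, [G..G+NI-1] interior,
       [G+NI..G+NI+G-1] right ghosts. */
    void step(double *u, double *unew, int left, int right, MPI_Comm comm)
    {
        MPI_Request req[4];

        /* 1. Start ghost-zone exchange with both neighbors (non-blocking). */
        MPI_Irecv(u,          G, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Irecv(u + G + NI, G, MPI_DOUBLE, right, 1, comm, &req[1]);
        MPI_Isend(u + G,      G, MPI_DOUBLE, left,  1, comm, &req[2]);
        MPI_Isend(u + NI,     G, MPI_DOUBLE, right, 0, comm, &req[3]);

        /* 2. Update points that do not depend on ghost data. */
        for (int i = 2 * G; i < NI; i++)
            unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

        /* 3. Wait for the slow WAN messages, then do the boundary points. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        for (int i = G;  i < 2 * G;    i++) unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
        for (int i = NI; i < NI + G;   i++) unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
    }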

40
Dynamic Adaptive Distributed Computation (with
Argonne/U. Chicago)
Large Scale Physics Calculation: For accuracy,
need more resolution than the memory of one machine
can provide
OC-12 line (but only 2.5 MB/sec)
  • This experiment
  • Einstein Equations (but could be any Cactus
    application)
  • Achieved
  • First runs: 15% scaling
  • With new techniques: 70-85% scaling, 250 GFlops

41
Dynamic Adaptation
  • Automatically adapt to bandwidth/latency issues
    (a toy adaptation loop is sketched after this slide)
  • Application has NO KNOWLEDGE of machine(s) it is
    on, networks, etc.
  • Adaptive techniques make NO assumptions about
    network
  • Issues
  • More intelligent adaptation algorithm
  • E.g. if network conditions change faster than the
    adaptation
  • Next: Real BH run across Linux Clusters for
    high-quality data for viz.

Adapt
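A toy sketch of the adaptation loop described here: time each ghost-zone exchange and, if communication dominates, widen the ghost zones so exchanges happen less often. The thresholds and adjustment rule are invented for illustration; the real adaptive logic in these runs was more involved.

    /* Grow or shrink the ghost-zone width based on measured exchange cost.
       comm_t / comp_t are seconds spent in the last exchange / compute phase. */
    int adapt_ghost_width(int ghost_width, double comm_t, double comp_t)
    {
        const int    min_width = 1, max_width = 8;   /* illustrative bounds     */
        const double high = 0.5,    low = 0.1;       /* illustrative thresholds */

        double comm_fraction = comm_t / (comm_t + comp_t);

        if (comm_fraction > high && ghost_width < max_width)
            ghost_width++;      /* latency-bound: exchange less often       */
        else if (comm_fraction < low && ghost_width > min_width)
            ghost_width--;      /* compute-bound: shrink redundant work     */

        return ghost_width;
    }

    /* Usage inside the evolution loop (sketch):
       time the exchange and the compute phase separately, and every
       'width' steps call:
           width = adapt_ghost_width(width, exchange_seconds, compute_seconds);
    */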
42
Cactus Worm Basic Scenario (Live Demo at
http://www.cactuscode.org)
  • Cactus simulation starts
  • Queries a Grid Information Server, finds
    resources
  • Makes intelligent decision to move
  • Locates new resource and migrates
  • Registers new location with the GIS
  • Continues around Europe
  • Basic prototypical example of
    many things we want to do!
    (a sketch of the migration loop follows)
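In pseudocode-like C, the worm's decision loop might look as follows; every helper here (query_information_server, write_checkpoint, submit_remote_job, register_location) is hypothetical, standing in for the GIS/Globus calls the real prototype used:

    #include <string.h>
    #include <stdlib.h>

    /* Hypothetical helpers standing in for GIS queries, checkpointing,
       GRAM job submission and re-registration. */
    typedef struct { char host[64]; double free_cpus, est_queue_time; } resource_t;

    extern int  query_information_server(resource_t *list, int max);
    extern void write_checkpoint(const char *path);
    extern int  submit_remote_job(const char *host, const char *checkpoint);
    extern void register_location(const char *host);

    void maybe_migrate(const char *self_host, const char *ckpt_path)
    {
        resource_t candidates[32];
        int n = query_information_server(candidates, 32);

        /* "Intelligent decision": pick the resource with the shortest wait. */
        int best = -1;
        for (int i = 0; i < n; i++)
            if (best < 0 || candidates[i].est_queue_time
                              < candidates[best].est_queue_time)
                best = i;

        if (best >= 0 && strcmp(candidates[best].host, self_host) != 0) {
            write_checkpoint(ckpt_path);                       /* freeze state   */
            submit_remote_job(candidates[best].host, ckpt_path); /* restart there */
            register_location(candidates[best].host);          /* tell GIS/portal */
            exit(0);                                           /* worm moves on   */
        }
    }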

43
Migration due to Contract Violation (Foster,
Angulo, Cactus Team)
44
GridLab: Enabling Dynamic Grid Applications
  • Large EU Project under negotiation with EC
  • Members AEI, ZIB, PSNC, Lecce, Athens, Cardiff,
    Amsterdam, SZTAKI, Brno, ISI, Argonne, Wisconsin,
    Sun, Compaq
  • Grid Application Toolkit for application
    developers and infrastructure (APIs/Tools)
  • Will be around 20 new Grid positions in Europe!!
    Look at www.gridlab.org for details

45
More Information
  • The Science of Numerical Relativity (Chpt 1)
  • http://jean-luc.ncsa.uiuc.edu/
  • http://www.nersc.gov/
  • http://dsc.discovery.com/schedule/episode.jsp?episode=23428000
  • Cactus Community Code (Chpt 2)
  • http://www.cactuscode.org/
  • The Grid/Globus (Chpt 3)
  • http://www.gridforum.org
  • http://www.globus.org/
  • http://www.zib.de/Visual/projects/TIKSL/ (the
    TIKSL project at ZIB)
  • The ASC Portal (Chpt 4)
  • http://www.ascportal.org
  • http://www-itg.lbl.gov/grid/projects/GPDK/
  • What's Next: Dynamic Grid Computing (Chpt 5)
  • http://www.gridlab.org/