Title: Requirements for an end-to-end solution for the Center for Plasma Edge Simulation (FSP)
1. Requirements for an end-to-end solution for the Center for Plasma Edge Simulation (FSP)
- SDM AHM
- October 5, 2005
- Scott A. Klasky
- ORNL
2. Perhaps not just the CPES FSP
- Can we form the CAFÉ solution?
- Combustion, Astrophysics, Fusion End-to-end framework
- Combustion SciDAC
- Astrophysics TSI SciDAC
- Fusion SciDACs (CPES, SWIM, GPS, CEMM)
- SNS: follow closely, and try to exchange technology.
3. Center for Plasma Edge Simulation (a Fusion Simulation Project SciDAC)
How can a particular plasma edge condition dramatically improve the confinement of fusion plasma, as observed in experiments? The physics of the transitional edge plasma that connects the hot core (of order 100 million degrees C, or tens of keV) with the material walls is the subject of this research question.
5-year goal: predict the edge pedestal behavior for ITER and existing devices. This question must be answered for the success of ITER.
We are developing a testable pedestal simulation framework which incorporates the relevant spectrum of physics processes (e.g., transport, kinetic and magnetohydrodynamic stability and turbulence, flows, and atomic physics in realistic geometry) that span the range of plasma parameters relevant to ITER.
- Use Kepler for the end-to-end solution, with autonomic, high-performance NxM data transfers for code coupling, code monitoring, and saving results.
[Figure: M3D simulation depicting edge localized modes (ELMs).]
[Workflow diagram: input files, data interpolation, job submission, XGC-ET simulation on a leadership-class computer, an MHD linear stability monitor with a STABLE? (true/false) branch, a noise monitor, M3D simulation, XGC-ET compute SOL, distributed storage, a portal, and out-of-core isosurface visualization.]
- Codes used in this project
- XGC-ET
- A fully kinetic PIC code which will solve turbulence, neoclassical, and neutral dynamics self-consistently.
- High velocity-space resolution and an arbitrarily shaped wall are necessary to solve this research problem.
- Will acquire the gyrokinetic machinery from the GTC code, part of the GPS SciDAC.
- Will include Degas-2 for more accurate neutral atomic physics around the boundary.
- M3D-edge
- An edge-modified version of the M3D MHD/two-fluid code, part of the CEMM SciDAC.
- For nonlinear MHD ELM crashes.
- Linear solvers
- Simple preconditioners for diagonally dominant systems
- Multigrid for scalable elliptic solves; perfect weak scaling.
- Investigation of tree-code methods (e.g., fast multipole) for direct calculation of electrostatic forces (i.e., PIC without cells).
4. Code coupling: forming a computational pipeline
- 2 computers (or more)
- 1 computer runs in batch.
- The other system(s) are for interactive parallel use.
- Security issues can be bypassed if we can have all computers at ORNL.
[Pipeline diagram: XGC on 1,024 processors of the Cray XT3; 10 MB moved in under 1 second to an interactive cluster running Mhd-L (linear stability) on 4 processors and to an interactive cluster running M3D on 32 processors; 30 GB/minute to an interactive cluster running the noise monitor on 80 processors.]
5. Interfaces must be designed to couple codes
- What variables are to be moved, and in what units?
- What is the data decomposition on the sending side? On the receiving side?
- Intercomm (Sussman) seems very interesting (PVM).
- Development of algorithms and techniques for effectively solving key problems in software support for coupled simulations.
- Concentrate on three main issues:
- Comprehensive support for determining at runtime what data is to be moved between simulations
- Flexibly and efficiently determining when the data should be moved
- Effectively deploying coupled simulation codes in a Grid computing environment.
- A major goal is to minimize the changes that must be made to each individual simulation code.
- Accomplished by having an individual simulation model only specify what data will be made available for a potential data transfer, not when an actual data transfer will take place (see the sketch after this list).
- Decisions about when data transfers take place are made through a separate coordination specification that generally will be provided by the person building the complete coupled simulation.
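A minimal Python sketch of that separation of concerns. The names (SimulationSide, Coordinator, export_field) are illustrative assumptions, not the Intercomm API: each code only registers what it can export or import, and a coordination rule kept outside both codes decides when a transfer fires.

    # Hypothetical sketch: a simulation only registers what it can export/import;
    # a separate coordination rule decides when a transfer actually happens.
    import numpy as np

    class SimulationSide:
        def __init__(self, name):
            self.name = name
            self.exports = {}   # field name -> callable returning the current array
            self.imports = {}   # field name -> callable accepting an array

        def export_field(self, field, getter):
            self.exports[field] = getter

        def import_field(self, field, setter):
            self.imports[field] = setter

    class Coordinator:
        """Coordination specification, kept outside both codes."""
        def __init__(self, rules):
            self.rules = rules  # list of (field, src, dst, predicate(step))

        def step(self, step):
            for field, src, dst, should_move in self.rules:
                if should_move(step):
                    data = src.exports[field]()           # pull from the sender
                    dst.imports[field](np.asarray(data))  # push to the receiver

    # Toy usage: XGC-ET exports a pressure profile every 10 steps; M3D receives it.
    xgc = SimulationSide("XGC-ET")
    m3d = SimulationSide("M3D-edge")
    pressure = np.zeros(64)
    xgc.export_field("edge_pressure", lambda: pressure)
    m3d.import_field("edge_pressure", lambda a: print(f"M3D got {a.size} values"))

    coord = Coordinator([("edge_pressure", xgc, m3d, lambda s: s % 10 == 0)])
    for s in range(30):
        pressure += 1.0          # stand-in for real physics
        coord.step(s)

The point of the sketch is that swapping the transfer schedule means editing only the Coordinator rules, not either simulation code.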
6. Look at Mb/s, not total data sizes
- Hawkes (SciDAC 2005)
- INCITE calculation
- 2,000 Seaborg processors, 2.5 million hours total
- 5 TB of data, 9.3 Mb/s.
- Blondin (SciDAC 2005)
- 4 TB, 30 hours: 310 Mb/s
- CPES: code coupling 1.3 Mb/s; data saving (3D) 30(0) GB per 10 minutes (the arithmetic behind such rates is sketched after this list).
- The future is difficult to predict for data generation rates.
- Codes add more physics, which slows the code down; algorithms speed the code up; new variables are generated; computers speed up.
- This is also true for analysis of the data.
- Do we need all of the data at all of the timesteps before we can analyze?
- Can we do analysis and data movement together?
- Analysis/visualization systems might have to be changed.
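A small sketch of the sustained-rate arithmetic behind the numbers above, assuming decimal terabytes and, for the Hawkes case, that 2.5 million processor-hours on 2,000 processors means roughly 1,250 wall-clock hours; the slide's slightly higher figures likely use binary terabytes.

    # Sustained rate in megabits per second for a dataset written over a wall-clock run.
    def sustained_mbps(terabytes, wall_hours):
        bits = terabytes * 1e12 * 8          # decimal TB -> bits
        seconds = wall_hours * 3600.0
        return bits / seconds / 1e6

    # Blondin (SciDAC 2005): 4 TB over 30 hours -> roughly 300 Mb/s.
    print(round(sustained_mbps(4, 30)))          # ~296

    # Hawkes (SciDAC 2005): 5 TB over ~1,250 wall-clock hours
    # (2.5 million processor-hours on 2,000 processors) -> roughly 9 Mb/s.
    print(round(sustained_mbps(5, 2.5e6 / 2000), 1))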
7. What happens when the Mb/s gets too large?
- Must understand the features in the data.
- Use an AMR-like scheme to save the data (a minimal sketch follows this list).
- Does the data change dramatically everywhere?
- Is the data smooth in some regions?
- Can save 100x with compression techniques, but must still be able to use the data.
- New viz/analysis tools?
- Could just stitch up the grid, and use old tools.
- Useful for level-of-detail visualization (more detail in regions which change).
- Use in combination with smart data caching/data compression (see below).
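One way to read the "save full resolution only where the data changes" idea, sketched in Python; the block size, tolerance, and record format are made-up assumptions, and a real writer would store the records in HDF5 rather than returning a list.

    import numpy as np

    def adaptive_save(field, previous, block=16, tol=1e-3):
        """Keep full resolution only in blocks that changed; elsewhere store the block mean.
        Returns a list of (origin, payload) records."""
        records = []
        nx, ny = field.shape
        for i in range(0, nx, block):
            for j in range(0, ny, block):
                cur = field[i:i+block, j:j+block]
                old = previous[i:i+block, j:j+block]
                if np.max(np.abs(cur - old)) > tol:
                    records.append(((i, j), cur.copy()))        # detailed block
                else:
                    records.append(((i, j), float(cur.mean()))) # coarse summary
        return records

    # Toy use: only the blocks containing the moving blob are stored in full.
    prev = np.zeros((128, 128))
    new = prev.copy()
    new[40:60, 40:60] = 1.0
    recs = adaptive_save(new, prev)
    full = sum(isinstance(p, np.ndarray) for _, p in recs)
    print(f"{full} of {len(recs)} blocks stored at full resolution")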
8. End-to-end/workflow requirements
- Easy to install
- Good examples (MPI, netCDF, HDF5, LN, bbcp)
- Easy to use
- Ensight-Gold
- Must have value added over simple approaches.
- Value added is discussed in the following slides.
- Must be robust/fault-tolerant.
- The workflow cannot crash our simulations/nodes!
9. Need a data model
- Allows the CS community to design modules which can understand the data.
- Allow for netCDF, HDF5.
- Develop interfaces to extract portions of the data from the files/memory.
- Must come from the application areas teaming up with the CS community.
- HDF5/netCDF is not a data model.
- Can we use the data model in SciRun/AVS Express/EnSight as a start?
- Meshes (uniform, rectilinear, structured, unstructured).
- Hierarchy in meshes (AMR).
- Cell-centered, vertex-centered, edge-centered data.
- Multiple variables on a mesh.
- Can we use simple APIs in the codes which can write the data out? (A sketch of such an API follows this list.)
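A sketch of what a simple, mesh-aware write API in the codes might look like, assuming h5py is available; the function names and HDF5 layout are assumptions for illustration, not an existing library.

    # Minimal sketch of a mesh-aware write API on top of HDF5 (via h5py).
    import numpy as np
    import h5py

    def write_mesh(path, coords, connectivity, mesh_type="unstructured"):
        with h5py.File(path, "w") as f:
            g = f.create_group("mesh")
            g.attrs["type"] = mesh_type
            g.create_dataset("coords", data=coords)
            g.create_dataset("connectivity", data=connectivity)

    def write_variable(path, name, values, centering="vertex"):
        """centering is one of 'cell', 'vertex', 'edge'."""
        with h5py.File(path, "a") as f:
            d = f.require_group("variables").create_dataset(name, data=values)
            d.attrs["centering"] = centering

    # Toy usage: a two-triangle mesh with one vertex-centered and one cell-centered field.
    coords = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
    tris = np.array([[0, 1, 2], [0, 2, 3]])
    write_mesh("edge_plasma.h5", coords, tris)
    write_variable("edge_plasma.h5", "temperature", np.array([1., 2., 3., 4.]), "vertex")
    write_variable("edge_plasma.h5", "density", np.array([0.5, 0.7]), "cell")

Because the mesh type and centering are recorded alongside the arrays, a downstream viz or analysis module can interpret the file without code-specific knowledge; that is the difference between a data model and a bare HDF5/netCDF file.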
10. Monitoring
- We want to watch portions of the data from the simulation, as the simulation progresses.
- Want the ability to play back from t0 to the current frame, i.e., snapshot movies (a minimal sketch appears after this list).
- Want this information presented so that we can collaborate during/after the simulation.
- Highlight parts of the data to discuss with other users.
- Draw on the figures.
- Mostly 1D plots, some 2D (surface/contour) plots, some 3D plots.
- Example: http://w3.pppl.gov/transp/ElVis/121472A03_D3D.html
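A minimal stand-in for the snapshot-movie idea (not ElVis): the monitor keeps cheap per-timestep snapshots as they arrive and can replay them from t0 up to the current frame; a real front end would plot rather than print.

    # Hypothetical monitor: keep 1D snapshots as they arrive and replay from t0.
    class SnapshotMonitor:
        def __init__(self):
            self.frames = []                 # list of (timestep, profile)

        def record(self, step, profile):
            self.frames.append((step, list(profile)))

        def playback(self, upto=None):
            """Yield frames from t0 up to the current (or requested) frame."""
            last = len(self.frames) if upto is None else upto
            for step, profile in self.frames[:last]:
                yield step, profile

    mon = SnapshotMonitor()
    for t in range(5):
        mon.record(t, [t * 0.1] * 4)         # stand-in for a 1D pressure profile
    for step, prof in mon.playback():
        print(step, prof)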
11. Portal to launch workflow/monitor jobs
- Use the portal as a front-end to the workflow.
- Would like to see the workflow, but not monitor it.
- Perhaps it will allow us to choose among different workflows which were created?
- Would like to launch the workflow, and have automatic job submission for known clusters/HPC.
- Submit to all, kill all when one starts running (sketched below).
12. Users want to write their own analysis
- Requires that they can do this in F90, C/C++, Python.
- Need wizards to allow users to describe their input/output.
- Similar to AVS/Express, SciRun, OpenDX.
- Common scenario (a Python sketch follows this list):
- Users want the main data field (field_in), they want a string (temperature), they want a condition (gt), and they want an output field. They also want this to run on their cluster with M processors. They also want to change the inputs at any given time.
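A sketch of the common scenario above in Python. The field name, condition string, and output follow the slide; the descriptor format, threshold value, and processor count are assumptions standing in for what a wizard would capture from the user.

    import numpy as np

    # What a wizard might capture from the user: inputs, a condition, and an output.
    descriptor = {
        "field_in": "temperature",     # main data field
        "condition": "gt",             # gt / lt / eq
        "threshold": 1.0e3,            # hypothetical value
        "field_out": "hot_region",
        "processors": 8,               # "M" processors on the user's cluster (unused here)
    }

    _OPS = {"gt": np.greater, "lt": np.less, "eq": np.equal}

    def run_analysis(data, desc):
        """Apply the user's condition to the chosen field and return the output field."""
        field = data[desc["field_in"]]
        mask = _OPS[desc["condition"]](field, desc["threshold"])
        return {desc["field_out"]: np.where(mask, field, 0.0)}

    data = {"temperature": np.array([200.0, 1500.0, 3000.0, 800.0])}
    print(run_analysis(data, descriptor))

Because the descriptor is plain data, the user can change the inputs at any time without touching the analysis code, and the workflow can ship the same descriptor to a cluster run.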
13. Efficient Data Movement
- On the same node
- Use a memory reference.
- On the same cluster
- Use MPI communication.
- On different clusters (NxM communication)
- 2 approaches: memory-to-memory vs. files.
- The file approach is not always usable.
- It will break the solution for code-coupling approaches, since I/O can become the bottleneck (open/close/read/write).
- Working with Parashar/Kohl to look into the NxM problem.
- Do we make this part of Kepler? (A transport-selection sketch follows this list.)
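A small sketch of "use the cheapest mechanism locality allows"; the dispatcher and labels are hypothetical, and the actual NxM redistribution machinery (the Parashar/Kohl work) is not reproduced here.

    # Hypothetical dispatcher: pick the cheapest transfer mechanism the layout allows.
    def choose_transport(src, dst):
        if src["node"] == dst["node"]:
            return "memory"          # same node: pass a reference, no copy
        if src["cluster"] == dst["cluster"]:
            return "mpi"             # same cluster: MPI point-to-point / collective
        return "nxm"                 # different clusters: NxM redistribution service

    def move(field, src, dst):
        transport = choose_transport(src, dst)
        print(f"moving '{field}' via {transport}")
        # real implementations: shared memory, MPI send/recv, or a remote NxM transfer
        return transport

    a = {"cluster": "xt3", "node": 0}
    b = {"cluster": "xt3", "node": 3}
    c = {"cluster": "viz-cluster", "node": 0}
    move("edge_pressure", a, a)      # memory
    move("edge_pressure", a, b)      # mpi
    move("edge_pressure", a, c)      # nxm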
14. Distributed Data Storage - 1
- Users do NOT want to know where their data is stored.
- Users want the FASTEST possible method to get to their data.
- Users seldom look at all of their data at once.
- Usually, we look at a handful of variables at a time, with only a few time slices at a time. (We DON'T need 4 TB in a second.)
- Users require that the solution works on their laptop when traveling! (Results must be cached on the local disk.)
- Users do NOT want to change their mode of operation during travel.
15. Distributed Data Storage - 2
- LN is a good example of an almost usable system.
- Needs to directly understand HDF5/netCDF.
- Needs to be able to cache information on local disks, and modify the eXnodes.
- Needs to be able to work with HPSS.
- But this is NOT enough!
16. Smart data cache
- Users typically access their data in similar patterns.
- Look at timestep 1 for variables A and B, look at timestep 2 for A and B, ...
- If we know what the user wants, and when he/she wants it, then we can use smart technologies (a simple prefetching sketch follows this list).
- In a collaboration, the data access gets more complicated.
- Neural networks to the rescue!
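A much simpler stand-in for the neural-network idea: remember which variables were read at the current timestep and prefetch the same ones for the next. fetch_from_storage is a placeholder for an LN/HPSS/HDF5 read.

    # Simple stand-in for the "smart cache": assume the user will want the same
    # variables at the next timestep and prefetch them into a local cache.
    def fetch_from_storage(timestep, var):
        # placeholder for an LN / HPSS / HDF5 read
        return f"data({var}, t={timestep})"

    class PrefetchingCache:
        def __init__(self):
            self.cache = {}          # (timestep, var) -> data
            self.last_vars = set()

        def get(self, timestep, var):
            key = (timestep, var)
            if key not in self.cache:
                self.cache[key] = fetch_from_storage(timestep, var)   # cache miss
            self.last_vars.add(var)
            return self.cache[key]

        def end_of_timestep(self, timestep):
            # Prefetch the variables the user just looked at, for the next timestep.
            for var in self.last_vars:
                self.cache[(timestep + 1, var)] = fetch_from_storage(timestep + 1, var)
            self.last_vars = set()

    c = PrefetchingCache()
    for t in (1, 2, 3):
        c.get(t, "A"); c.get(t, "B")
        c.end_of_timestep(t)
    print(len(c.cache), "entries cached")   # timesteps 1-3 read, 2-4 prefetched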
17. Need data mining technology integrated into the solution
- We must understand the features of the data.
- Requires a working relationship between application scientists and computer scientists.
- Want to detect features on the fly, from the current and previous timesteps (a toy detector is sketched after this list).
- Could feature-based analysis be done by the end of the simulation?
- Pre-compute everything possible by the end of the simulation. DO NOT REQUIRE the end user to wait for anything that we know we want.
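A toy sketch of on-the-fly feature detection by comparing the current and previous timesteps; the change threshold and the bounding-box output are illustrative assumptions, not a real ELM detector.

    import numpy as np

    def detect_features(current, previous, tol=0.5):
        """Flag regions where the field changed sharply between timesteps and
        return a cell count plus bounding box, so the result can be stored
        alongside the run instead of being recomputed afterwards."""
        changed = np.abs(current - previous) > tol
        if not changed.any():
            return None
        rows = np.where(changed.any(axis=1))[0]
        cols = np.where(changed.any(axis=0))[0]
        return {"cells": int(changed.sum()),
                "bbox": (int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1]))}

    prev = np.zeros((64, 64))
    cur = prev.copy()
    cur[10:20, 30:40] = 2.0              # stand-in for a localized burst
    print(detect_features(cur, prev))    # {'cells': 100, 'bbox': (10, 19, 30, 39)}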
18. Security
- Users do NOT want to deal with this.
- But of course, they have to.
- Will DOE require single sign-on?
- Can trusted sites talk to other trusted sites via ports being opened from A to B?
- Will this be the death of workflow automation?
- We cannot automate data movement if we must sign on each time with unique passwords.
19. Conclusions
- We need Kepler in order for the CPES project to be successful.
- We need efficient NxM data movement, and monitoring of it.
- We need to be able to provide feedback to the simulation(s).
- Codes must be coupled, and we need an efficient mechanism to couple the data.
- What do we do about single logins?
- ORNL tells me that we can have ports open from one site to another without violating the security model. What about other sites?
- Are we prepared for new architectures?
- The Cray XT3 has only one small pipe out to the world.