Progress on Release, API Discussions, Vote on APIs, and Quarterly Report

Slides: 29
Provided by: AlGe6
Learn more at: https://www.csm.ornl.gov
1
Progress on Release, API Discussions, Vote on APIs, and Quarterly Report
Al Geist, May 6-7, 2004, Chicago, IL
2
Participating Organizations
Coordinator: Al Geist
ORNL, ANL, LBNL, PNNL
PSC, SDSC, IBM, SGI
SNL, LANL, Ames, NCSA
Cray, Intel
How do we position ourselves for the DOE Ultrascale facility? The winner is to be announced May 12. Regardless of who is chosen, we should try to be in a position to help with the system software needs of the facility.
3
Scalable Systems Software
Participating Organizations
IBM, Cray, Intel, SGI
ORNL, ANL, LBNL, PNNL
SNL, LANL, Ames
NCSA, PSC, SDSC
Problem
  • Computer centers use incompatible, ad hoc set of
    systems tools
  • Present tools are not designed to scale to
    multi-Teraflop systems

Goals
  • Collectively (with industry) define standard
    interfaces between systems components for
    interoperability
  • Create scalable, standardized management tools
    for efficiently running our large computing
    centers

To learn more, visit www.scidac.org/ScalableSystems
4
Scalable Systems Software Suite
Updates to this diagram
Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software Suite.
[Diagram labels: Standard XML interfaces; authentication, communication]
  • Grid Interfaces
  • Meta Services: Meta Scheduler, Meta Monitor, Meta Manager
  • Accounting
  • Scheduler
  • System & Job Monitor
  • Node State Manager
  • Service Directory
  • Node Configuration & Build Manager
  • Event Manager
  • Allocation Management
  • Usage Reports
  • Packaging & Install
  • Process Manager
  • Job Queue Manager
  • Hardware Infrastructure Manager
  • Validation & Testing
  • Checkpoint / Restart
5
Review of Last Meeting
Scalable Systems Software Center
January 15-16, Argonne
Details in main project notebook
6
Highlights from Jan. mtg
  • Craig: the 1280-node dual-Xeon cluster Titanium is available this evening to test the scalability of the SSS suite. One node will be used as head node to install our suite and run on the entire cluster. Could build everything but Bamboo and ssslib, due to Xerces. Will begin to be available at 6 pm.
  • Late-night session on the 1280-node testbed: PM ran at 1280, worked at 4000, hung at 6000. Warehouse had a problem at 1280 and took out the head node. RM components ran on the head node OK until Warehouse crashed it.
  • Scott Jackson: Gold running on the 11 TF PNNL cluster.
  • Thomas Naughton: 2nd release in March. Discussion of how many orgs in our group could shake down the tarball. Group feels it is better to have a few very reliable components than all components.
7
Highlights from Jan. mtg (cont.)
  • Rusty Lusk: Process Manager spec for first vote. Presentation and discussion.
  • Who is responsible for limit enforcement, PM or QM? I.e., a job may use only a certain amount of memory, must not execute OS commands (in general, things that happen after fork). Rusty says the question is good and he needs to think about how this may affect the interface.
  • Other items to think about:
    - use of wildcard as a "to be returned" operator: OK
    - inclusion but "don't show me"
    - dynamic jobs and PM
    - improve readability
  • Delay vote until we have a written proposal.
8
Highlights from Jan. mtg
  • Discussion of having two XML syntax styles (functional, object). Al says he would like to see one common style across the suite; he didn't care which one as long as the whole group could agree.
  • Narayan: Restriction Syntax overview. An issue of uniqueness was brought up and is to be taken into consideration by Narayan.
  • Rusty Lusk: Restriction Syntax on Chiba City. David would like to see a paper on the requirements that came out of the Chiba effort.
  • Andrew, Paul, and Craig offer to investigate a prototype translator to see how, or if, it is possible. Investigate standardization of tokens across the two syntaxes.
Progress Since Last Meeting
Scalable Systems Software Center
January-May
10
SciDAC PI mtg March 22-24, 2004
In Charleston, SC, with several attending for Scalable Systems
  • 2-page project summary report
  • Annual report for Fred
  • 20-minute talk presented by Rusty (Fred asked each ISIC to use a new speaker)
  • Poster presentation by Stephen/John
11
Systems Software Suite 2nd Release
  • Target date: March 04, so we could announce it at the PI meeting. Real status?
  • SSS-OSCAR: will hear more in next talk
  • Need a way to test that the suite is installed correctly
12
Five Project Notebooks
  • A main notebook for general information
  • And individual notebooks for each working group
  • Over 300 total pages
  • BC and PM groups need to get specs into their
    notebooks
  • Add Telecom meeting notes even if short (Kudos to
    RM group)

Get to all notebooks through the main web site, www.scidac.org/ScalableSystems. Click on the side bar or on "project notebooks" at the bottom of the page.
13
Bi-Weekly Working Group Telecoms
RM notes are the only ones I see in the notebook
  • Resource management, scheduling, and accounting: Tuesday 3:00 pm (Eastern), 1-800-664-0771, keyword "SSS mtg"
  • Process management, monitoring, and checkpointing: Thursday 1:00 pm (Eastern), 1-877-252-5250, mtg code 160910
  • Node build, configuration, and information service: Thursday 3:00 pm (Eastern), 1-888-469-1934, mtg code (changes)
14
This Meeting
Scalable Systems Software Center
May 6-7, 2004
15
Major Topics this Meeting
  • Stability of Systems Software Suite: second release is out. Are we ready for outside users?
  • Quarterly report due: would like to get one to Fred by end of May. Will need text from WG leaders.
  • Formal API presentations and voting: we left several things hanging last meeting.
  • MICS PI mtg: August 9-12 at Argonne. A good time to have a highlight of outside user(s).
  • SC04 mtg: November in Pittsburgh. Talks? Tutorial? Birds of a feather?
16
Agenda May 6
8:30 Al Geist: Project status
9:15 Thomas Naughton: SSS-OSCAR software suite release
Working Group Reports (progress report on what their group has done; API proposals for adoption by the group; progress on software suite improvements)
9:30 Narayan Desai: Node Build, Configure
10:30 Break
11:30 Will McClendon: Validation and Testing
12:30 Lunch (on own, cafeteria)
1:30 Ron Oldfield: ASAP testing, and formalism issues
2:00 Paul Hargrove: Process Management; Craig and Rusty
3:00 Scott Jackson: Resource Management
4:00 Paul/Craig: findings about trying to build a syntax translator
4:30 Group discussion on getting outside users of 2nd release
5:00 Al: Discussion on SC04, other conferences, papers, etc.
5:30 Adjourn
17
Agenda May 7
8:30 Discussion, proposals, votes
  • Craig: discussion
  • Paul: straw vote on two syntaxes
  • Rusty: Process Manager proposal (deferred)
  • Scott: Allocation Manager proposal (deferred)
  • Al: Quarterly report, papers, SC04, other meetings
10:30 Break
11:00 Al Geist: Release 2 and outside users (Jazz? Ram? NCSA? SNL?)
  • MICS PI mtg, August at Argonne (news to come)
  • Next meeting date: August 26-27, 2004; location: Argonne
12:00 Meeting ends
18
Meeting notes
Al Geist presents project overview and goals for this meeting.
Thomas Naughton: SSS-OSCAR
  • In the tarball: Bamboo, BLCR, Gold, LAM/MPI, MAUI-SSS, SSSLib, Warehouse, MPD2
  • SSSLib contains SD, EM, PM, BCM, NSM, NHw, plus communication
  • Todo: bug tracker; test sss-oscar-v2a6-v3.0 for pre-release; documentation (use SciDAC review 1-pager); add license-sss to directory
  • Need a test suite and a few test machines to test on
  • Discussion on APItest and who creates tests, etc. Each does individual tests.
  • Establish release schedule through SC04
  • Add an easier way for authors to test just their stuff
  • SC04: fully tested release v1.0 with all SSS components; code freeze Friday, September 3
19
Meeting notes
Narayan Desai: Build and Configure
  • Library improvements: bugfixes, testing of Java support, SSL testing
  • Infrastructure improvements: SSS Python library improvements, EM bugfixes
  • BCM component usage experience
  • Hardware infrastructure: still seeking purpose
  • Restriction Syntax examples given and discussed. Craig is thankful that !d (don't display this field) now works.
  • Uniqueness issue: default is to return all duplicates; new flag unique=true removes duplicates. Much discussion. Rusty suggests removing only duplicate lines. Paul brings up the problem with action commands, i.e. killing jobs twice. Al says the problem is not solvable in general in restriction syntax. Scott asked whether RMAP syntax can handle this. Much work on the board, and a question about the atomicity of queries that require multiple SQL queries to complete.
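The uniqueness discussion above contrasts two behaviours: returning every matching row (the default) and collapsing duplicates when unique=true is set, with Rusty's suggestion that only fully identical lines be removed. A minimal Python sketch of that distinction follows. The function name `apply_unique` and the sample rows are hypothetical and are not taken from the restriction syntax implementation.

```python
def apply_unique(rows, unique=False):
    """Return rows unchanged by default; with unique=True drop rows that
    are identical in every field, keeping the first occurrence."""
    if not unique:
        return list(rows)
    seen = set()
    result = []
    for row in rows:
        key = tuple(sorted(row.items()))  # whole-row identity, field order ignored
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Hypothetical query result: two identical lines plus one that differs in "state".
rows = [
    {"node": "n1", "state": "up"},
    {"node": "n1", "state": "up"},
    {"node": "n1", "state": "down"},
]
default_result = apply_unique(rows)              # all three rows
unique_result = apply_unique(rows, unique=True)  # identical line collapsed
```

Paul's concern about action commands maps onto the default case here: an action applied to `default_result` would touch the same row twice.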
20
Meeting notes
Will McClendon: Component Interface Testing
  • APITest v0.1.2: now available by FTP, having been put under the GPL Cplant license. ftp://ftp.sandia.gov/outgoing/apitest (also in notebook). Not integrated back into ssslib.
  • HTTP interface development: Twisted Python framework. Info and www.effbot.org
  • Scott helped find a bug in Python popen3; now uses Twisted SpawnProcess
  • Better support for browsing test data within a session
  • Batch and test data stored in memory in XML file format; writing out data to file available soon
  • Shows an XML example that runs a test. Several questions answered.
  • Shows an XML batch file example. Runs live demo; works fine. Discussion follows.
Ron Oldfield: replacing Eric DeBenedictis, who is moving to other SNL jobs
  • ORNL help set up a testing environment
  • Testing for correct installation and individual tests, then whole-suite test
21
Meeting notes
Ron Oldfield (cont.)
  • Simulating real workloads
  • Performance and scalability testing needed in the future
  • Portability is important for our reference implementation; discussion of code portability vs. feature portability
  • Authorization also needs testing
  • What are the issues in a lightweight OS?
  • Standard naming conventions, both format and semantics: someone really needs to go through the existing schemas. The RMAP dictionary makes a good starting point.
Paul Hargrove: process management
  • Development continues on all three components
  • Syntax translation effort to be discussed later today
  • Checkpoint: pre-emption (suspend and resume) works; checkpointing (ckpt works, restart in progress)
  • Todo: migration; checkpoint file management so that checkpoints do not overflow disks (list, delete)
  • Query: can I restart here?
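The checkpoint file management to-do above names two operations, list and delete, with the goal of not overflowing disks. A minimal sketch of those two operations follows, assuming age-based expiration is the cleanup policy. The `.ckpt` suffix, the flat directory layout, and both function names are hypothetical and do not describe the checkpoint manager's actual interface.

```python
import os
import time


def list_checkpoints(directory, suffix=".ckpt"):
    """Return (path, size_bytes, mtime) for each checkpoint file, oldest first."""
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.endswith(suffix) and os.path.isfile(path):
            info = os.stat(path)
            entries.append((path, info.st_size, info.st_mtime))
    return sorted(entries, key=lambda entry: entry[2])


def delete_expired(directory, max_age_seconds, now=None, suffix=".ckpt"):
    """Delete checkpoint files older than max_age_seconds; return deleted paths."""
    now = time.time() if now is None else now
    deleted = []
    for path, _size, mtime in list_checkpoints(directory, suffix):
        if now - mtime > max_age_seconds:
            os.remove(path)
            deleted.append(path)
    return deleted
```

Passing `now` explicitly keeps the expiration rule easy to exercise in a test without waiting for files to age.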
22
Meeting notes
Paul Hargrove: process management (cont.)
  • Suspend/resume works with Bamboo, SD, EM, OM, PM components
  • Still need to design restart-time interactions with RM group
  • Open files support under testing
  • Bug fix releases as needed
  • Checkpoint manager outstanding issues: implement full interface using restriction syntax, event generation, error reporting. Must implement file management (think ls and rm, expiration).
Craig Steffen (no slides)
  • Tried to run on 1280 nodes on Tungsten and failed; did run on 128. Can now run on 1024 nodes. Being stopped by sockets limit.
  • Harvesting of other info can now be done, e.g. Myrinet HW
  • Next: adding support for job management; start interfacing with Build group; help to get it on Chiba
23
Meeting notes
Rusty Lusk: process manager update
  • PM component: added limits interface and dynamic jobs (mpi_comm_spawn). Can spawn lots of nodes and then use unused ones as needed. Showed limits spec.
  • MPD2: improvements found by production use on Chiba; support for limits; support for mpi_comm_spawn; interactive debugging via mpigdb; allows control of stdin, stderr, stdout
  • Future: need to work more closely with QM; QM interface for requesting dynamic jobs
24
Meeting notes
Scott Jackson: resource manager update
  • Diagram on board
  • Released SSSRMAP v3 spec. New things: wire protocol, message format, job groups
  • Latest software release (in OSCAR) uses SSSRMAP v2
  • Second release of Bamboo in March, with epilogue and prologue support
  • Gold: now fully SSSRMAP v2; second alpha release due June, which will be in Perl (first release in Java ran into memory size limits); user guide done; first release running on PNNL's SGI Altix
  • Testing using APITest begun
  • Silver: several, various improvements in XML
  • Future work: implement SSSRMAP v3 in the components; merger of Maui 3.2 and SSS. Integrate chkpt/restart. Limit enforcement: now SSS affects all Maui users. Ability to handle dynamic jobs.
[Diagram: "Multi-step job", showing a job group of three jobs and a task group of tasks (T)]
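The diagram labels above (job group, job, task group, T) suggest a containment hierarchy for multi-step jobs. The sketch below is one possible reading, assuming a job group holds jobs, a job holds task groups, and a task group holds tasks. All class and field names are hypothetical; none of them comes from the SSSRMAP v3 spec.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    task_id: str


@dataclass
class TaskGroup:
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Job:
    job_id: str
    task_groups: List[TaskGroup] = field(default_factory=list)

    def task_count(self) -> int:
        return sum(len(group.tasks) for group in self.task_groups)


@dataclass
class JobGroup:
    """A multi-step job modelled as an ordered group of jobs."""
    group_id: str
    jobs: List[Job] = field(default_factory=list)

    def task_count(self) -> int:
        return sum(job.task_count() for job in self.jobs)


# Mirror the diagram: three jobs in a group; one job carries a four-task group.
step = Job("job-1", [TaskGroup([Task(f"t{i}") for i in range(4)])])
group = JobGroup("multi-step", [step, Job("job-2"), Job("job-3")])
```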
25
Meeting notes
Paul: translator report (no slides)
  • Looked at the two syntaxes to see whether we could automate translation between SSSRMAP and restriction syntax (RS)
  • Found SSSRMAP could say 4<proc<16 but RS could not; RS band-aid: special operators to handle ranges
  • For nested multiple-table queries, RS syntax doesn't have the information (primary data type) to know how to combine multiple SQL results
  • There is no way to translate between these cases. Paul discourages the implementation of a translator.
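The range case from Paul's report, 4<proc<16, can be read as two comparisons on the same field joined by AND. The Python sketch below illustrates only that reading. The condition tuples, operator names, and job records are hypothetical and reproduce neither SSSRMAP nor restriction syntax.

```python
import operator

# Comparison operators a condition may carry (names are hypothetical).
OPS = {"lt": operator.lt, "gt": operator.gt, "eq": operator.eq}


def matches(record, conditions):
    """True when the record satisfies every (field, op, value) condition."""
    return all(OPS[op](record[field], value) for field, op, value in conditions)


# 4 < proc < 16 expressed as two conditions on the same field.
proc_range = [("proc", "gt", 4), ("proc", "lt", 16)]

jobs = [{"id": "a", "proc": 2}, {"id": "b", "proc": 8}, {"id": "c", "proc": 16}]
selected = [job["id"] for job in jobs if matches(job, proc_range)]
```

If a syntax allows only one value per field, there is no slot for the second bound, which may be why a dedicated range operator was needed as a workaround.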
26
Meeting notes Day 2
Craig: general thoughts on official V1.0 (no slides)
  • Released at SC04, this will be the first time many people will see it
  • Our orthogonal directions in syntax are damaging. If we don't make a decision soon, project progress towards V1.0 suffers.
  • Brett, who works with both, favors SSSRMAP. He likes its more descriptive nature and its OO nature.
  • Rusty says that we need two written proposals for a component that we can compare and vote on; otherwise we are just all talk.
  • Paul says that one is better but two is not too bad. Scott doesn't think we can reconcile.
  • Paul asks for a straw vote for a preference; Scott seconds.
    - SSSRMAP: 7, from 5 institutions (but one is Al)
    - Restriction Syntax: 3, all ANL
    - Abstain: 3, from 2 institutions
  • Craig says he will do whatever it takes to make either work. He is going to make ssslib SSSRMAP work.
  • Neil says users are the guiding factor and RMAP is better there.
  • Paul says understandability and acceptability are key and RMAP is better.
  • Both say that RS is more compact and elegant.
27
Meeting notes Day 2 (cont)
  • Narayan asks: does it just need documentation and tutorials? Paul says no. There is a closer match for SOAP et al. The OO was not a factor in his choice, but it is more popular today.
  • Neil says potential users won't have a Narayan to figure this out. Components are both client and server, so the developer has to know the syntax.
  • Rusty asks whether something else could be added to RS to make it easier to use or understand. He is not sure it is a good idea.
  • Will: documentation is better in RMAP and he has looked at RMAP more. Would all this stuff be more abstracted? Users do as little as they can and read the manual only after they get stuck. Doesn't care as long as we pick ONE! Need to have the same look and feel across the project.
  • Rick: I don't care which. I don't like XML. What about the SD and EM that are already accepted?
  • Al says he feels that RMAP would be more acceptable to vendors and that this would be critical to the long-term success of the project.
  • Paul says that the Process Manager document is not complete enough to vote on at this time.
28
Meeting notes Day 2 (cont)
Discussion -