A View from the Top: Preparing for Review

Slides: 26
Provided by: AlGe6
Learn more at: https://www.csm.ornl.gov
Transcript and Presenter's Notes



1
A View from the Top: Preparing for Review
Al Geist, February 24-25, Chicago, IL
2
Participating Organizations
Coordinator: Al Geist
ORNL, ANL, LBNL, PNNL
PSC, SDSC, IBM
SNL, LANL, Ames, NCSA
Cray, Intel, Unlimited Scale
Main Web Site: www.scidac.org/ScalableSystems
3
Scalable Systems Software
Participating Organizations: IBM, Cray, Intel, Unlimited Scale,
ORNL, ANL, LBNL, PNNL, SNL, LANL, Ames, NCSA, PSC, SDSC
Problem
  • Computer centers use an incompatible, ad hoc set of
    system tools
  • Present tools are not designed to scale to
    multi-Teraflop systems

Goals
  • Collectively (with industry) define standard
    interfaces between systems components for
    interoperability
  • Create scalable, standardized management tools
    for efficiently running our large computing
    centers

Impact
  • Reduced facility mgmt costs.
  • More effective use of machines by scientific
    applications.

To learn more, visit www.scidac.org/ScalableSystems
4
Progress so far on Integrated Suite
Grid Interfaces
Meta Scheduler
Meta Monitor
Meta Manager
Working Components and Interfaces (bold)
Meta Services
Accounting
Scheduler
System & Job Monitor
Node State Manager
Service Directory
Standard XML interfaces
Node Configuration & Build Manager
Authentication, communication
Event Manager
Allocation Management
Usage Reports
Validation & Testing
Process Manager
Job Queue Manager
Components written in any mixture of C, C++,
Java, Perl, and Python can be integrated into the
Scalable Systems Software Suite
Hardware Infrastructure Manager
Checkpoint / Restart
5
Review of Last Meeting
Scalable Systems Software Center
October 10-11, Houston, TX
Details in main project notebook
6
Progress Reports at Oct. mtg
  • Al Geist: preparation for Supercomputing 2002
    (booth space, posters, demos)
  • Working group leaders: what areas their working
    group is addressing, progress report on what their
    group has done, present problems being addressed,
    next steps for the group, and discussion items for
    the larger group to consider
  • Demonstrations of prototype components (prep for
    SC demo)
Slides can be found in Main Notebook page 29
7
Consensus and Voting
8
Grid Interfaces
Meta Scheduler
Meta Monitor
Meta Manager
Meta Services
Accounting
Scheduler
System & Job Monitor
Node Configuration & Build Manager
These interface to all
Service Directory
File System
Event Manager
Allocation Management
User DB
Process Manager
Job Queue Manager
Usage Reports
High Performance Communication & I/O
User Utilities
Checkpoint / Restart
Application Environment
9
Progress Since Last Meeting
Scalable Systems Software Center
November-February
10
SciDAC Booth
11
SC2002 Systems Posters
12
Five Project Notebooks filling up
  • A main notebook for general information
  • And individual notebooks for each working group
  • Over 216 total pages; 20 added since last
    meeting
  • A lot of XML schema to comment on
  • New subscription feature

Get to all notebooks through the main web site,
www.scidac.org/ScalableSystems: click on the side
bar or on "project notebooks" at the bottom of the page
13
Weekly Working Group Telecoms
  • Resource management, scheduling, and accounting:
    Tuesday 3:00 pm (Eastern), 1-800-664-0771,
    keyword SSS mtg
  • Validation and testing (hasn't met since last year):
    Wednesday 1:00 pm (Eastern), 1-877-540-9892,
    mtg code 999157
  • Process management, system monitoring, and
    checkpointing: Thursday 1:00 pm (Eastern),
    1-877-252-5250, mtg code 160910
  • Node build, configuration, and information service:
    Thursday 3:00 pm (Eastern), 1-888-469-1934,
    mtg code (changes)
14
This Meeting
Scalable Systems Software Center
February 24-25, 2003
15
Agenda February 24
  • 8:30 Al Geist: Project status, SciDAC PI mtg and
    external project review
  • 9:00 Matt Sottile: Science Appliance Project
Working Group Reports
  • 9:30 Scott Jackson: Resource Management
  • 10:30 Break
  • 11:00 Erik Debenedictis: Validation and Testing
  • 12:00 Lunch (on own; walk to cafeteria)
  • 1:00 Paul Hargrove: Process Management
  • 2:00 Narayan Desai: Node Build, Configure
  • 3:00 Break
  • 3:30 Large-scale run on Chiba, debugging components
  • 5:00 Open discussion of review report
  • 5:30 Adjourn
Working groups may wish to hack in the evening
16
Agenda February 25
  • 8:30 Discussion, proposals, straw votes: write a
    paper on each component; draft report in main
    notebook; comments on restricted interface XML
    shown by Rusty; external review demo: can we?
  • 10:30 Break
  • 11:00 Al Geist: Summary: PI mtg talk and poster,
    external review agenda, next meeting date
    (June 5-6 at Argonne), thank our hosts (ANL)
  • 12:00 Meeting ends
17
SciDAC PI mtg (all 50 projects)
March 10-11, 2003, Napa, California
  • Attending for Scalable Systems: Al Geist, Brett Bode
  • 20-minute talk presented by Al
  • Scalable Systems, CCA, PERC, SDM: poster presentation
18
External SciDAC Review mtg
March 12-13, 2003, Napa, California
  • Attending for Scalable Systems: Al Geist, Brett Bode,
    Paul Hargrove, Narayan Desai, Mike Showerman (Rusty)
  • Four ISIC projects are reviewed separately:
    Scalable Systems, CCA, PERC, SDM
  • External review panel (8 members): Bob Lucas,
    Jim McGraw, Jose Munoz, Lauren Smith, Richard Mount,
    Ricky Kendall, Rod Oldehoeft, and Tony Mezzacappa
    (John Grosh?)
  • We owe them a review report
  • Day 1: each project gets 1¾ hours to present
  • Day 2: each project gets grilled by the panel for 1½ hours
19
External Review mtg Agenda
Wednesday, March 12
  • 7:45 Welcome, charge to reviewers
  • 8:15 Plenary session for Common Component Architecture ISIC
  • 10:00 Break
  • 10:15 Plenary session for Scalable Systems Software ISIC
  • 12:00 Reviewer caucus
  • 12:15 Lunch
  • 1:15 Plenary session for Scientific Data Management ISIC
  • 3:00 Break
  • 3:15 Plenary session for Performance Engineering ISIC
  • 5:00 Reviewer caucus
  • 5:30 Adjourn
20
External Review mtg Agenda
Thursday, March 13
  • 8:00 Meetings between reviewers and ISIC members:
    A. Common Component Architecture
    B. Scalable Systems Software
  • 9:45 Break
  • 10:00 Meetings between reviewers and ISIC members:
    C. Scientific Data Management
    D. Performance Engineering
  • 11:45 Reviewer caucus/end of ISIC reviews
  • 12:15 Lunch (on your own)
  • 1:15 Programming Models Review Session I
  • 3:00 Break
  • 3:15 Programming Models Review Session II
  • 5:00 Programming Models Reviewer Caucus
  • 5:30 Meeting adjourns
21
Meeting Notes
Matt: Pink, a 1024-node science appliance
  • Provide a pseudo SSI that scales to 1024 nodes;
    tolerates failure; single point for management
  • Reduce boot and install time by 100x; reduce number
    of FTP per number of nodes
  • Science Appliance has very little in common with
    older Linux
  • Software is called Clustermatic: LinuxBIOS, BProc,
    V9fs, Supermon, Panasas or Lustre (parallel file
    system by someone else), Beoboot, asymmetric SSI,
    private name spaces from Plan 9, BJS (BProc Job
    Scheduler)
  • Other work: ZPL (automatic checkpoint), debuggers
    (parallel, relative debugging: Guard), port of
    TotalView, latency-tolerant applications
  • Users: SNL/CA, U Penn, Clemson
  • What are overlap opportunities? Each piece can be
    separated out: Supermon, BProc
  • Remy will be sending more material on collaboration
    soon
22
Meeting Notes
Scott: RM update
  • Diagram of architecture and infrastructure services
  • SC02 demo: which components were working. They used
    polling; now moving to event-driven components
  • Release of initial RM suite from website
    http://sss.scl.ameslab.gov/software/: OpenPBS-sss
    2.3.15-1, Maui scheduler 3.2.6, Qbank 2.10.4
    (accounting system)
  • SSSRMAP protocol using HTTP validated
  • Scalability testing performed on all components
  • Progress on the scheduler, queue manager, accounting
    and allocation manager (Qbank and Gold prototype),
    and meta-scheduler (Globus interface, Gold
    information service)
  • Next work: Release 2 of RM interface; implement and
    test SSSRMAP security authentication (XML digital
    signatures); discuss need to have SSS wrappers on
    initial RM suite
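The move from polling to event-driven components can be illustrated with a small publish/subscribe hub. This is a minimal in-process Python sketch under stated assumptions, not the SSS Event Manager or its interface: the `EventManager` class, its `subscribe`/`publish` methods, and the event names `job-state-change` and `node-down` are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical sketch; not the SSS Event Manager interface.
# An event payload is a plain dict such as {"job": "42", "state": "running"}.
Event = Dict[str, Any]
Handler = Callable[[str, Event], None]


class EventManager:
    """Minimal in-process publish/subscribe hub (illustrative only)."""

    def __init__(self) -> None:
        # Map each event type to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register handler to be called for every event of event_type."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: Event) -> int:
        """Deliver payload to each subscriber of event_type; return how many got it."""
        handlers = list(self._subscribers.get(event_type, []))
        for handler in handlers:
            handler(event_type, payload)
        return len(handlers)


if __name__ == "__main__":
    em = EventManager()
    seen: List[Event] = []

    # A scheduler-like component reacts when a job changes state,
    # instead of polling a queue manager on a timer.
    em.subscribe("job-state-change", lambda kind, ev: seen.append(ev))

    em.publish("job-state-change", {"job": "42", "state": "running"})
    em.publish("node-down", {"node": "ccn17"})  # no subscribers: delivered to nobody
    print(seen)  # [{'job': '42', 'state': 'running'}]
```

Compared with polling, a subscriber does no work until an event it cares about is published. A real component would also need network transport, authentication, and delivery guarantees, none of which is modeled here.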
23
Meeting Notes
Will: Validation and Testing update
  • Users expect a high degree of quality in today's HPC
  • Strategies: QMTest (www.codesourcery.com); the RM
    group is using it and likes it (easy); app test
    packages
  • APITEST, growing out of the October discussion:
    C driven, XML schema, scriptable test of network
    components; black-box testing; TCP, ssslib, Portals
    support, fault injection; white-box testing: try to
    exercise all paths in a known suite; v0.1a underway,
    75% done
  • Discussion: how this could be useful to Scalable
    Systems
  • Cluster Integration Toolkit (CIT), James Laros
    (jhlaros_at_sandia.gov): management tasks on Cplant,
    scalable to 1800 nodes, done in Perl. Creating a
    Scalable Systems interface to CIT would be a good
    test of the implementation and of the flexibility
    of the standard. USI, IBM, and Linux Networx are
    looking at it.
24
Meeting Notes
Paul: Process management report
  • Moving beyond prototypes
  • Checkpoint manager beta code: April release awaiting
    legal OK; will do scalability test today; working on
    XML interface for checkpoint/restart (draft in May)
Mike: Monitoring job, system, node, and meta-version
  • What data is needed; an extensible framework; defined
    stream and single item; working on scalability now
Rusty: Process Manager
  • Schematic of PM component
  • MPD-2, written in Python and distributed with MPICH-2,
    supports separate executables, arguments, and
    environment variables
  • New XML for PM (with queries that allow wildcards and
    ranges)
  • Combining published interfaces, XML, and a
    communication library gives us power greater than
    the sum of its parts
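The notes do not give the PM query syntax, so the following Python sketch only illustrates the general idea of selecting processes with wildcards and ranges. The conventions are assumptions: shell-style wildcards (via the standard `fnmatch` module) for string fields, inclusive `LO-HI` ranges for non-negative integer fields, and the hypothetical field names `host`, `rank`, and `exe`.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Union

Value = Union[str, int]


def field_matches(pattern: str, value: Value) -> bool:
    """Match one query field against one process field.

    Hypothetical conventions: a non-negative integer field matches "N" or an
    inclusive range "LO-HI"; a string field matches a shell-style wildcard
    such as "ccn*".
    """
    if isinstance(value, int):
        if "-" in pattern:
            lo, hi = (int(part) for part in pattern.split("-", 1))
            return lo <= value <= hi
        return value == int(pattern)
    return fnmatchcase(value, pattern)


def select(processes: List[Dict[str, Value]], query: Dict[str, str]) -> List[Dict[str, Value]]:
    """Return the processes whose fields satisfy every constraint in query."""
    return [
        proc
        for proc in processes
        if all(key in proc and field_matches(pat, proc[key]) for key, pat in query.items())
    ]


if __name__ == "__main__":
    procs = [
        {"host": "ccn01", "rank": 0, "exe": "cpi"},
        {"host": "ccn02", "rank": 1, "exe": "cpi"},
        {"host": "ccn02", "rank": 2, "exe": "cpi"},
        {"host": "login1", "rank": 3, "exe": "monitor"},
    ]
    # Ranks 1 through 3 running on any host whose name starts with "ccn".
    for p in select(procs, {"host": "ccn*", "rank": "1-3"}):
        print(p["host"], p["rank"])
```

Running it prints `ccn02 1` and `ccn02 2`: the `login1` process fails the wildcard, and rank 0 falls outside the range.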
25
Meeting Notes
Narayan: Build and configure report
  • Tests suggest scalability to 2000-host clusters
  • Communication infrastructure: more protocol support,
    high-availability option
  • Build and configuration: complete implementation on
    Chiba City; second implementation (OSCAR) underway
  • Three components: hardware manager (needs a more
    modular, extensible design), build system, node
    manager (admin control panel for a cluster)
  • System diagnostics
  • Restriction-based syntax for XML interfaces
  • API augmentation: APIs need more documentation to
    describe the event handling protocol
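The slide names a restriction-based syntax for XML interfaces without spelling it out. The Python sketch below assumes one plausible convention purely for illustration: in a query, an attribute with a non-empty value restricts which records match, and an attribute left empty asks for that field to be filled in the response. The element names `node-query` and `node` and the record fields are hypothetical; the sketch uses only the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def answer(query_xml: str, records: List[Dict[str, str]]) -> str:
    """Answer a restriction-style query (hypothetical convention, see lead-in).

    Each child of the query root is a template. Non-empty attribute values
    restrict which records match; empty attribute values name fields to return.
    """
    query = ET.fromstring(query_xml)
    response = ET.Element(query.tag)
    for template in query:
        restrictions = {k: v for k, v in template.attrib.items() if v != ""}
        for rec in records:
            if all(rec.get(k) == v for k, v in restrictions.items()):
                # Emit one element per matching record, filling every
                # attribute named in the template from that record.
                match = ET.SubElement(response, template.tag)
                for key in template.attrib:
                    match.set(key, rec.get(key, ""))
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    nodes = [
        {"name": "ccn01", "state": "up", "image": "rh73"},
        {"name": "ccn02", "state": "down", "image": "rh73"},
        {"name": "ccn03", "state": "up", "image": "rh80"},
    ]
    # "Which nodes are up, and what image does each run?"
    print(answer('<node-query><node name="" state="up" image=""/></node-query>', nodes))
```

With the sample data the response contains `ccn01` and `ccn03` with their images filled in; `ccn02` is excluded by the `state="up"` restriction. One appeal of this style is that a single template serves as both the filter and the shape of the reply.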
26
Meeting Notes
John Dawson asks about license. Al says: like MPI.
Don (Cray) asks about license (not GNU) and about
holding a workshop for industry.
  • Talk with Remy about Science Appliance collaboration
  • Talk with Rusty about writing a paper on each component
  • Groups: work on large scalability test on Chiba City
    and XTORC