Scalable Systems Software for Terascale Computer Centers


Title: Scalable Systems Software for Terascale Computer Centers


1
Scalable Systems Software for Terascale Computer Centers
Coordinator: Al Geist
Participating Organizations
ORNL, ANL, LBNL, PNNL,
PSC, SDSC, IBM, Compaq,
SNL, LANL, Ames, NCSA,
SGI, Scyld, Intel, Unlimited Scale
www.scidac.org/ScalableSystems
2
The Problem Today
System administrators and managers of terascale
computer centers are facing a crisis:
  • Computer centers use an incompatible, ad hoc set
    of systems tools
  • Present tools are not designed to scale to
    multi-teraflop systems
  • Commercial solutions are not emerging because
    business forces drive industry toward servers,
    not HPC

3
Scope of the Effort
  • Submit jobs to batch queue
  • Resource and queue management
  • Accounting and user management
  • Allocation management
  • Fault tolerance
  • Checkpoint/restart
  • Security
  • Job monitoring
  • System monitoring
  • System build and configure
  • Start parallel processes
  • Job management

4
Goals
  • Collectively (with industry) agree on and specify
    standardized interfaces between system components
    in order to promote interoperability,
    portability, and long-term usability. The
    specification will proceed through a series of
    open meetings following a format similar to that
    used by the MPI Forum.
  • Produce a fully integrated suite of systems
    software and tools for the effective management
    and utilization of terascale computational
    resources, particularly those at the DOE
    facilities.
  • Research and develop more advanced versions of
    the components required to support the
    scalability, fault tolerance, and performance
    requirements of large science applications.
  • Carry out a software lifecycle plan for support
    and maintenance of the systems software suite.

5
Impact
  • Fundamentally change the way future high-end
    systems software is developed and distributed
  • Reduced facility management costs
      • reduce the need to support ad hoc software
      • better systems tools available
      • able to get machines up and running faster
        and keep them running
  • More effective use of machines by scientific
    applications
      • scalable launch of jobs and checkpoint/restart
      • job monitoring and management tools
      • allocation management interface

6
Four Working Groups to interact with
  • Node build, configuration, and information
    service
  • Resource management, scheduling, and allocation
  • Process management, system monitoring, and
    checkpointing
  • Validation and Integration

Electronic notebooks keep the working groups on track
A main notebook for general information and meeting
notes, and individual notebooks for each working
group
  • Allows groups to keep track of other groups'
    progress and comment on the items of overlap
  • Allows Center members and interested parties to
    see what is being defined and implemented

7
Interactions
  • Principal customers are system administrators and
    supercomputer managers.
  • CCA looks to Scalable Systems to provide services
    to launch parallel components on large systems
    and to provide event services for fault detection
    and monitoring.
  • DOE Science Grid will be involved with Scalable
    Systems through its integration of Grid tools
    with the monitoring and resource management
    services layer of the systems software.
  • Applications using the terascale SciDAC
    resources, including climate, accelerator design,
    and astrophysics, will use the job submission,
    job monitoring, user-assisted checkpointing, and
    allocation tools developed by the Center.
  • Other organizations and vendors are participating
    in the Scalable Systems effort even though they
    are not funded by SciDAC.

8
ORNL Electronic Notebook
Shared electronic notebook, accessible with a
password through a secure web site

Advantages and features
  • look and feel of a paper notebook
  • access from any web browser
  • no software to install
  • can be shared across a group
  • or set up as a personal notebook
  • can run stand-alone on a laptop

Reading entries
Drag and drop notes from private to shared
notebooks
Annotation by remote colleagues

Input from
  • keyboard
  • files
  • images
  • voice
  • instruments
  • sketchpad

Personal (stand-alone) notebook
www.epm.ornl.gov/~geist/