Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting February 24-25, 2003

Provided by: scottmj
Learn more at: https://www.csm.ornl.gov

1
Scalable Systems Software Center
Resource Management and Accounting Working Group
Face-to-Face Meeting
February 24-25, 2003
2
Resource Management and Accounting Working Group
  • Working group scope
  • Progress over last quarter
  • Next steps
  • Topics for group consideration

3
Working Group Scope
  • The Resource Management Working Group is involved
    in the areas of resource management, scheduling
    and accounting.
  • This working group will focus on the following
    software components:
      • Queue Manager
      • Scheduler
      • Allocation Manager (and accounting)
      • Meta Scheduler
  • Other critical resource management components are
    being developed in the Process Management and
    Monitoring Working Group:
      • Process Manager
      • Cluster Monitor

4
Proposed Component Architecture
[Diagram: proposed component architecture. Components shown: Meta Scheduler, Discovery Service, Allocation Manager, Local Scheduler, Information Service, Queue Manager, Node Monitor, Event Manager, Security System, Node Manager, Process Manager, and Infrastructure Services. Color key (working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure.]
5
Resource Management Prototype Demonstration
This demo runs a simple end-to-end test in which a
submitted job runs past its wallclock limit.
[Diagram: numbered messages exchanged among the Job Submission Client, Queue Manager, Local Scheduler, Node Monitor, Allocation Manager, Process Manager, and Discovery Service, color-coded by working group (Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure).]
Message sequence:
  0 Service-Lookup
  1 Submit-Job
  2 Query-Job
  3 Query-Node
  4 Create-Reservation
  5 Run-Job
  6 Exec-Process
  7 Query-Job
  8 Delete-Job
  9 Withdraw-Allocation
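
The numbered messages above can be read as a single pass through the prototype. The Python sketch below replays that sequence for one job. The target component of each message is an assumption inferred from the component names on the slide, because the diagram's arrows did not survive extraction. `Job`, `run_demo`, and the wallclock check are hypothetical and are not SSS code.

```python
from dataclasses import dataclass


# Hypothetical model of the single job used in the demo.
@dataclass
class Job:
    job_id: str
    wallclock_limit: int  # seconds requested at submission
    state: str = "Queued"


def run_demo(job: Job, runtime: int) -> list:
    """Replay the demo's message sequence for one job.

    Returns (step, message, assumed_target) tuples in send order.
    Targets are inferred from the slide's component names, not confirmed.
    """
    steps = [
        (0, "Service-Lookup", "Discovery Service"),
        (1, "Submit-Job", "Queue Manager"),
        (2, "Query-Job", "Queue Manager"),
        (3, "Query-Node", "Node Monitor"),
        (4, "Create-Reservation", "Allocation Manager"),
        (5, "Run-Job", "Queue Manager"),
        (6, "Exec-Process", "Process Manager"),
    ]
    job.state = "Running"
    # Step 7: the running job is queried again to see how long it has run.
    steps.append((7, "Query-Job", "Queue Manager"))
    if runtime > job.wallclock_limit:
        # Step 8: the job overran its wallclock limit, so it is deleted.
        steps.append((8, "Delete-Job", "Queue Manager"))
        job.state = "Deleted"
    else:
        job.state = "Completed"
    # Step 9: the allocation is charged for the job.
    steps.append((9, "Withdraw-Allocation", "Allocation Manager"))
    return steps
```

When `runtime` exceeds the limit, step 8 (Delete-Job) fires, which matches the demo scenario. With a shorter runtime the sketch skips step 8 and still withdraws the allocation.
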
6
General Progress
  • Released v1.0 of the initial SSS Resource
    Management Suite:
      • OpenPBS-SSS 2.3.15-1
      • Maui Scheduler 3.2.6
      • QBank 2.10.4 (accounting system)
  • Website created and software available for
    download (intended for friendly beta testers)
  • SSSRMAP protocol (over HTTP) validated in the Maui
    Scheduler, Queue Manager, PBS front-end, and Gold
    Allocation Manager (complex query support
    validated, and its utility demonstrated across a
    variety of usage scenarios)
  • Scalability testing performed on all components

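SSSRMAP carries XML request messages over HTTP. As a rough illustration, the sketch below builds a query message and wraps it in an HTTP POST without sending it. The element and attribute names (`Envelope`, `Body`, `Request`, `action`, `actor`, `Object`, `Get`, `Where`) are modeled loosely on SSSRMAP-style messages and should be treated as assumptions; the SSSRMAP specification defines the actual schema. The URL and helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_query(actor: str, obj: str, fields: list, conditions: dict) -> str:
    """Build an SSSRMAP-style query message (element names are assumptions)."""
    envelope = ET.Element("Envelope")
    body = ET.SubElement(envelope, "Body")
    request = ET.SubElement(body, "Request", action="Query", actor=actor)
    ET.SubElement(request, "Object").text = obj
    for name in fields:
        # Each Get element names a field to return.
        ET.SubElement(request, "Get", name=name)
    for name, value in conditions.items():
        # Each Where element restricts the results to matching objects.
        ET.SubElement(request, "Where", name=name).text = value
    return ET.tostring(envelope, encoding="unicode")


def make_http_request(url: str, payload: str) -> urllib.request.Request:
    """Wrap the message in an HTTP POST; nothing is sent until urlopen()."""
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


message = build_query("alice", "Job", ["JobId", "State"], {"User": "alice"})
req = make_http_request("http://rm.example.org/sssrmap", message)
```

Building the message with a real XML library, instead of formatting strings by hand, guarantees that the output is well-formed and that field values are escaped.
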
7
Scheduler Progress
  • Scheduler implemented interfaces for the system
    monitor, the event manager, and the service
    directory, as well as a scheduling extension
    interface (allowing plug-ins to add scheduling
    algorithms and capabilities)
  • Enhanced native support for LoadLeveler, PBS,
    SGE, LSF, and BProc-based systems
  • Significantly enhanced the web-based scheduler
    documentation and added man pages for selected
    scheduler commands
  • SSS Requirements document completed

8
Scheduler Progress
  • Security improvements
      • Support for DES, HMAC, MD5, and external-source
        secret-key-based algorithms has been implemented
        for client/server authentication
      • Improved buffer overflow protection has been
        added to critical scheduler interfaces
      • A generalized secret key management facility has
        been implemented for secure multi-party
        communication
  • Scalability improvements
      • decreasing memory consumption by over 80%
      • enabling support for up to 8,000 nodes
      • enabling support for up to 32,000 processors
      • enabling support for up to 2,000 simultaneous
        active jobs
      • enabling support for jobs requesting up to 16,000
        hosts

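HMAC is one of the secret-key schemes listed above. The sketch shows the general idea of secret-key client/server authentication using Python's standard `hmac` module. Both sides hold the same key, and the client sends a digest of its request. The server then recomputes the digest and compares the two in constant time. The sketch uses SHA-256 instead of MD5 as a safer modern default. `sign` and `verify` are hypothetical helper names and do not represent the scheduler's actual interface.

```python
import hashlib
import hmac
import secrets


def sign(message: bytes, key: bytes) -> str:
    """Client side: compute an HMAC digest of the request with the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, signature: str, key: bytes) -> bool:
    """Server side: recompute the digest and compare in constant time."""
    return hmac.compare_digest(sign(message, key), signature)


# A real deployment would typically load the key from a protected file.
shared_key = secrets.token_bytes(32)
request = b'<Request action="Query"/>'
digest = sign(request, shared_key)
```

The constant-time comparison in `hmac.compare_digest` prevents an attacker from recovering the digest byte by byte through response-timing differences.
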
9
Scheduler Progress
  • Fault Tolerance
      • migration of all Resource Manager calls to a
        threaded Resource Manager interface (allowing
        the scheduler to survive interface hangs and
        crashes)
      • incorporation of Resource Manager and Allocation
        Manager diagnostics and failure-tracking
        statistics
      • implementation of improved data checking and
        handling routines to detect and correct corrupt
        Resource Manager data
  • Dynamic job support interfaces have been designed
  • Limited support for generic resources has been
    enabled (e.g., software licenses, network
    bandwidth, global disk caches)

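A threaded Resource Manager interface means that a hung or crashing call costs the scheduler at most a timeout, instead of stalling its main loop. The following is a minimal sketch that assumes a Python `concurrent.futures` thread pool stands in for the scheduler's own threading. `call_resource_manager` and the `failures` counter (a stand-in for failure-tracking statistics) are hypothetical.

```python
import collections
import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
failures = collections.Counter()  # per-outcome failure statistics


def call_resource_manager(func, *args, timeout=2.0):
    """Run one resource-manager call in a worker thread.

    Returns ("ok", result), ("timeout", None), or ("error", description),
    so a hang or crash in the call never propagates into the caller.
    """
    future = _pool.submit(func, *args)
    try:
        return ("ok", future.result(timeout=timeout))
    except concurrent.futures.TimeoutError:
        failures["timeout"] += 1
        return ("timeout", None)
    except Exception as exc:  # the call itself raised
        failures["error"] += 1
        return ("error", repr(exc))


# Simulate a hung call: the caller gets control back after 0.05 s.
status, _ = call_resource_manager(time.sleep, 0.5, timeout=0.05)
```

Python cannot forcibly stop a thread that is stuck in a hung call. The worker stays occupied until the call returns, so a real implementation would also need to limit how many such calls can be outstanding at once.
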
10
Queue Manager Progress
  • Both Ames Queue Manager and PNNL PBS front-end
    have implemented and validated SSSRMAP HTTP
    interface
  • Replaced third-party XML parser with SSS-created
    routines
  • Created Resource Management Suite Software
    website
  • PNNL created and tested patches for PBS
    scalability improvements and packaged as RPMs
    (and tarball patch) for beta distribution
  • Requirements document completed
  • Updated Process-Manager interface for new XML
    schema
  • Ames Queue Manager has implemented a nearly
    complete PBS-like command line interface

11
Accounting and Allocation Manager Progress
  • QBank
      • A test harness was installed, test suites were
        created, significant testing was performed, and
        bugs were fixed
      • Security was strengthened (the new qauth uses
        libcrypto and keeps the key in a separate file,
        for greater stability and so that binary versions
        can be distributed)
      • The QBank install process was streamlined and
        made non-interactive
      • Packaged in RPMs and tarballs for Linux and
        released in the v1.0 SSS Resource Management
        System
      • Documentation was significantly improved,
        including the creation of a user guide, a
        deployment guide, man pages, and updated online
        documentation

12
Accounting and Allocation Manager Progress
  • Gold
      • Time-travel implemented
      • Initial support for object-joined queries
      • Implemented reservations
      • Implemented balance checking
      • Scalability testing
          • Component-level testing was done to measure
            the time needed to perform barrages of common
            accounting and allocation operations (charges,
            reservations, balance checks, etc.)
          • Simulations were performed with the Maui
            Scheduler to measure transaction times through
            the allocation manager interface

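Reservations and balance checking work together. A reservation places a hold against an account before a job starts (compare Create-Reservation and Withdraw-Allocation in the slide 5 demo), and a balance check counts outstanding holds as unavailable. The class below is a hypothetical sketch of that bookkeeping. It does not represent Gold's data model or API.

```python
class Account:
    """Hypothetical allocation account: the funds a new job may draw on
    are the balance minus all outstanding reservations."""

    def __init__(self, balance: float):
        self.balance = balance
        self.reservations = {}  # job_id -> amount held

    def available(self) -> float:
        return self.balance - sum(self.reservations.values())

    def check_balance(self, amount: float) -> bool:
        return self.available() >= amount

    def reserve(self, job_id: str, amount: float) -> bool:
        # Refuse the hold if it would overcommit the account.
        if not self.check_balance(amount):
            return False
        self.reservations[job_id] = amount
        return True

    def charge(self, job_id: str, actual: float) -> None:
        # Release the hold and debit only what the job actually used.
        self.reservations.pop(job_id, None)
        self.balance -= actual
```

Because `available()` subtracts holds, two jobs started at the same time cannot both spend the same funds. This is the main reason to reserve funds before a job runs instead of charging only afterwards.
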
13
Meta-Scheduler Progress
  • SSS Requirements document completed
  • Support has been added for Globus 2.0 and 2.2
    based job staging
  • The initial information service interface has
    been designed
  • Security has been enhanced by adding Globus
    credential caching and enabling generalized
    secret session key management
  • Support has been added for retrying resources
  • Additional functionality includes the basic data
    management interface and an initial file staging
    capability

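The slide does not say what "retrying resources" involves. Assuming it means retrying a failed submission a few times and then falling back to the next candidate resource, a sketch might look like the following. `submit_with_retry` and the `submit` callback are hypothetical.

```python
from typing import Callable, Iterable, Optional


def submit_with_retry(
    job: str,
    resources: Iterable[str],
    submit: Callable[[str, str], bool],
    attempts_per_resource: int = 2,
) -> Optional[str]:
    """Try each candidate resource in order, retrying a limited number of
    times per resource; return the resource that accepted the job, or None."""
    for resource in resources:
        for _ in range(attempts_per_resource):
            if submit(job, resource):
                return resource
    return None
```

The order of `resources` encodes the meta-scheduler's preference. The caller decides how to rank sites and handles a `None` result, for example by requeueing the job later.
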
14
Next Work
  • Release v2 SSS Resource Management and Accounting
    interface specification
  • Implement and test SSSRMAP security
    authentication
  • Try to get more components under a testing
    framework
  • Portability enhancements (AIX, Tru64, possibly
    Cray)

15
Next Work
  • Local Scheduler
      • Test interaction with checkpoint/restart
        mechanisms when interfaces are ready
      • Virtual partitioning through resource limit
        enforcement and tracking
      • Quality of service support for completion time
        guarantees
      • Security integration
      • Progress on graphical interfaces

16
Next Work
  • Queue Manager
      • Implement persistence via a database (replacing
        flat files)
      • Add Epilogue/Prologue support and a job
        submission verification script
      • Interface with Node Monitor
      • Full PBS qsub compatibility (nearly complete)
      • Implement full input/output handling (need to
        define PM interfaces, if any)
      • Add interface with Node Manager to support
        job-dependent node OS image installation

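Moving from flat files to a database mainly provides atomic updates and queryable state. The following is a minimal sketch that uses Python's built-in `sqlite3` module. The `jobs` table layout and the helper names are hypothetical, and the slide does not say which database the Queue Manager will use.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the job store; ':memory:' keeps the sketch self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, owner TEXT NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def save_job(conn: sqlite3.Connection, job_id: str, owner: str, state: str) -> None:
    # "with conn" commits on success and rolls back if an exception is
    # raised; SQLite's journaling keeps each committed write atomic.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO jobs (job_id, owner, state) VALUES (?, ?, ?)",
            (job_id, owner, state),
        )


def load_jobs(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT job_id, owner, state FROM jobs ORDER BY job_id"
    ).fetchall()
```

Passing a file path instead of `":memory:"` makes the store persistent, so a restarted queue manager can recover its job list with `load_jobs`.
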
17
Next Work
  • Accounting and Allocation Manager
      • Quotations (Gold)
      • Flexible charging (Gold)
      • Continuing effort to open-source the new and old
        Allocation Managers
      • SSSRMAP XML Security integration (Gold)
      • Support for operations on returned fields (sort,
        sum, max, unique, group by, etc.)
      • Begin portability testing for Gold and QBank

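The planned field operations amount to post-processing query results. The sketch applies sort, sum, max, unique, and group-by to result rows represented as dictionaries. The row format, field names (`Project`, `Amount`), and function names are assumptions. The slide does not say whether Gold would perform these operations on the server or the client.

```python
from collections import defaultdict


def sort_rows(rows, field, descending=False):
    return sorted(rows, key=lambda r: r[field], reverse=descending)


def total(rows, field):
    return sum(r[field] for r in rows)


def maximum(rows, field):
    return max(r[field] for r in rows)


def unique(rows, field):
    return sorted({r[field] for r in rows})


def group_sum(rows, key, field):
    """Sum `field` within each distinct value of `key` (SQL: SUM ... GROUP BY)."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r[field]
    return dict(sorted(totals.items()))


rows = [
    {"Project": "chemistry", "Amount": 10.0},
    {"Project": "biology", "Amount": 5.0},
    {"Project": "chemistry", "Amount": 2.5},
]
```

For example, `group_sum(rows, "Project", "Amount")` returns `{'biology': 5.0, 'chemistry': 12.5}`.
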
18
Issues requiring inter-group discussion