Title: Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting
1 Scalable Systems Software Center
Resource Management and Accounting Working Group
Face-to-Face Meeting
Jan 25-26, 2005
Washington D.C.
2 Resource Management and Accounting Working Group
- Working group scope
- Progress since last face-to-face
- Future Work
3 Working Group Scope
- The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting.
- This working group will focus on the following software components:
  - Queue Manager
  - Scheduler
  - Accounting and Allocation Manager
  - Meta Scheduler
- Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
  - Process Manager
  - Cluster Monitor
4 Resource Management Component Architecture
[Figure: component architecture diagram. Components shown: Grid Scheduler, Infrastructure Services, Allocation Manager, Cluster Scheduler, Discovery Service, Queue Manager, Node Monitor, Event Manager, Security System, Node Manager, Process Manager.]
5 Resource Management Prototype Demonstration
This demo runs a simple end-to-end test in which a job is submitted and runs past its wallclock limit.
[Figure: message flow among the Job Submission Client, Discovery Service, Queue Manager, Cluster Scheduler, Node Monitor, Allocation Manager, and Process Manager. Numbered messages, in order:]
0 Service-Lookup
1 Submit-Job
2 Query-Job
3 Query-Node
4 Create-Reservation
5 Run-Job
6 Exec-Process
7 Query-Job
8 Delete-Job
9 Withdraw-Allocation
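The numbered messages in the demo diagram can be read as an ordered trace. The Python sketch below replays that trace. The step numbers and message names come from the slide; the sender and receiver attached to each message are assumptions inferred from the component names, because the extracted slide does not preserve the arrows.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    number: int
    message: str
    sender: str    # assumed, not stated on the slide
    receiver: str  # assumed, not stated on the slide


# Numbers and message names are from the slide; endpoints are assumptions.
DEMO_TRACE = [
    Step(0, "Service-Lookup", "Job Submission Client", "Discovery Service"),
    Step(1, "Submit-Job", "Job Submission Client", "Queue Manager"),
    Step(2, "Query-Job", "Cluster Scheduler", "Queue Manager"),
    Step(3, "Query-Node", "Cluster Scheduler", "Node Monitor"),
    Step(4, "Create-Reservation", "Cluster Scheduler", "Allocation Manager"),
    Step(5, "Run-Job", "Cluster Scheduler", "Queue Manager"),
    Step(6, "Exec-Process", "Queue Manager", "Process Manager"),
    Step(7, "Query-Job", "Cluster Scheduler", "Queue Manager"),
    Step(8, "Delete-Job", "Cluster Scheduler", "Queue Manager"),
    Step(9, "Withdraw-Allocation", "Cluster Scheduler", "Allocation Manager"),
]


def ordered_trace(steps: List[Step]) -> List[Step]:
    """Return the steps sorted by their slide number."""
    return sorted(steps, key=lambda s: s.number)


def render(steps: List[Step]) -> List[str]:
    """Format each step as 'N Message: sender -> receiver'."""
    return [
        f"{s.number} {s.message}: {s.sender} -> {s.receiver}"
        for s in ordered_trace(steps)
    ]


if __name__ == "__main__":
    for line in render(DEMO_TRACE):
        print(line)
```

Steps 7 through 9 are where the wallclock overrun would show up in this reading: the job is queried again, deleted, and its allocation hold withdrawn.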
6 General Progress
- Protocol has stabilized: very little change in SSSRMAP Wire Protocol or Message Format
- Scott: wrote a good deal of the SSSRMAP Message Format SDK (Python classes)
  - all that is left is Data integration into Request and Response
- Craig: initial efforts on SSSRMAP Wire Level integration into ssslib
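As a rough picture of what a message format SDK with Request, Response, and Data pieces might wrap, here is a minimal Python sketch. The class names, field names, and XML element layout are hypothetical; they are not taken from the actual SDK or from the SSSRMAP specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Request:
    """Hypothetical request: an action applied to an object, plus optional data."""
    action: str
    object_name: str
    data: Dict[str, str] = field(default_factory=dict)

    def to_xml(self) -> str:
        root = ET.Element("Request", {"action": self.action})
        ET.SubElement(root, "Object").text = self.object_name
        if self.data:
            data_el = ET.SubElement(root, "Data")
            # Keys must be valid XML element names in this simple layout.
            for name, value in self.data.items():
                ET.SubElement(data_el, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


@dataclass
class Response:
    """Hypothetical response: a status string plus an optional data payload."""
    status: str
    data: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_xml(cls, text: str) -> "Response":
        root = ET.fromstring(text)
        status = root.findtext("Status", default="")
        data_el = root.find("Data")
        data = {}
        if data_el is not None:
            data = {child.tag: (child.text or "") for child in data_el}
        return cls(status=status, data=data)


if __name__ == "__main__":
    print(Request("Query", "Job", {"JobId": "42"}).to_xml())
```

Keeping the Data payload as a separate child element lets the same Request and Response wrappers carry different object types without changing the envelope.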
7 General Progress
- SC2004 release of RMWG components
  - System tested and bundled with SSS-OSCAR 1.0
  - Bamboo Queue Manager v1.0.0
  - Maui Scheduler v3.2.6p10
  - Gold Accounting and Allocation Manager v2.0.b1.1
  - Warehouse System Monitor v0.7.0
8 General Progress
- Starting to see evidence of adoption and value add of the SSS components
- Bamboo Queue Manager
  - built-in support for checkpoint/restart
  - PBS or LoadLeveler job submission syntax
  - interfaces with ANL process manager
  - has been in production use on the Ames cluster for over a year now
9 General Progress
- Adoption and value add (continued)
- Gold Allocation Manager
  - very successful in ensuring that the right work gets done
  - very successful in establishing a project cycle and managing capacity
  - Gold is in production use on multiple PNNL systems, including the 11.8TF Linux cluster
  - Dozens of sites have downloaded it
  - about 3 other sites are currently evaluating Gold (discussions have also begun with DOD HPCMP sites)
10 General Progress
- Adoption and value add (continued)
- Maui Scheduler
  - implemented support for checkpoint/restart
  - sites are using the new resource utilization tracking and enforcement capabilities to advantage
  - because of SSS-directed work in enhanced prioritization, throttling policies and quality of service, sites are better able to dial in their preferences for:
    - improved fairness
    - higher system utilization
    - improved response time
    - targeted cycle delivery
11 General Progress
- Maui Scheduler (continued)
  - Maui has been installed on over 2,500 clusters and was downloaded over 100,000 times last year
  - Maui is running on more supercomputers than any other scheduler in the world
  - In early 2003 it was found to be running on the following systems from the Top 500 list:
    - 15 out of the top 20
    - 75 out of the top 100
12 Queue Manager Progress
- v1.0 (and v1.0.1) release of Bamboo made available
- Full support for SSSRMAP v3 message format
- Submission clients support PBS in addition to LoadLeveler style job scripts
- Checkpoint/Restart manager interfaces tested and debugged
- Job output now correct for suspended jobs
- SSS suite was updated on the cluster in Ames in November with the full SC code release
13 Accounting and Allocation Manager Progress
- Released Gold Beta at SC2004
- Included in SSS-OSCAR 1.0 distribution
- Beta version of Gold in production on PNNL's 11.8TF Linux cluster
- Full-featured Web-based Graphical User Interface
- Performance testing and tuning carried out
- Improved robustness (a select timeout in non-blocking read/write loops prevents client and server communication hangs)
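The robustness item above describes a general pattern: wait on the socket with a bounded `select` before each non-blocking read, so a stalled peer produces a timeout instead of a hang. The following Python sketch shows that pattern. It is an illustration only and is not Gold's code; the function name and the default timeout are assumptions.

```python
import select
import socket


def recv_with_timeout(sock: socket.socket, nbytes: int, timeout: float = 5.0) -> bytes:
    """Read up to nbytes from a non-blocking socket.

    Raises TimeoutError if the peer sends nothing for `timeout` seconds.
    Returns early (possibly with fewer bytes) if the peer closes the connection.
    """
    sock.setblocking(False)
    chunks = []
    received = 0
    while received < nbytes:
        # Bounded wait: select returns an empty list if nothing arrives in time.
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            raise TimeoutError(f"no data received within {timeout} seconds")
        try:
            chunk = sock.recv(nbytes - received)
        except BlockingIOError:
            # Spurious wakeup: nothing to read after all, so wait again.
            continue
        if not chunk:
            break  # peer closed the connection
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    left, right = socket.socketpair()
    right.sendall(b"hello")
    print(recv_with_timeout(left, 5, timeout=1.0))
    left.close()
    right.close()
```

The same idea applies to the write side by passing the socket in the second argument of `select.select` and waiting for it to become writable.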
14 Accounting and Allocation Manager Progress
- Ported Gold to Tier 1 and Tier 2 OSs
- Added support for SQLite embedded database
- Added support for encryption/decryption (in Perl)
- Support for variable decimal precision currency
- New reservation design improves handling of charges that span allocation boundaries
- Created a project usage report
- New User Guide chapters on Allocations, Installation, Roles, gold shell, Passwords
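One way to picture variable decimal precision currency is a site-configurable number of decimal places applied to every amount. The Python sketch below uses the standard `decimal` module for this. It is a hypothetical illustration and does not reflect how Gold stores or rounds amounts; the function name and the rounding mode are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_currency(amount, precision: int) -> Decimal:
    """Round `amount` to `precision` decimal places (precision >= 0).

    `amount` may be a str, int, or Decimal; it is converted via str() so
    that binary floating point error is not introduced.
    """
    # precision=2 gives a quantum of Decimal('0.01'); precision=0 gives Decimal('1').
    quantum = Decimal(1).scaleb(-precision)
    return Decimal(str(amount)).quantize(quantum, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # The same charge expressed at two different site-configured precisions.
    charge = Decimal("3600") * Decimal("0.000875")
    print(to_currency(charge, 0))
    print(to_currency(charge, 2))
```

Using exact decimal arithmetic rather than floats keeps repeated debits and refunds from drifting, whichever precision a site chooses.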
15 Cluster Scheduler Progress
- Peer Diagnostics - added service health checks
- SSS Interface - added support for numerous job attributes
- Packaging - enhanced packaging for pre-req auto-detection
- Security - added interface buffer overflow prevention
- Allocation Manager Interface - extended support for allocation debit/reservation attributes
- Added end-to-end support for Bamboo/Berkeley Checkpoint Manager based suspend/resume
- General - numerous stability and usability enhancements
16 Grid Scheduler Progress
- Cluster Service API - rewrote Cluster Service interface to use SSS job object and message layer communication protocols
- Usability - added node monitoring, job monitoring, statistics, and job management client commands
- Submission - significantly enhanced job submission client and Globus job staging infrastructure
- Data Staging - improved performance and reliability of gridFTP, GASS, and SCP based data staging
- Grid Fairness - added initial support for grid level usage policies, fairshare, and priority
- General - enhanced multi-cluster job co-allocation, improved packaging, documentation, and internal diagnostics of Globus, network, job, and resource failures
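Fairshare policies generally compare a credential's recent share of usage against a target share, with older usage windows counting for less. The Python sketch below illustrates that general idea with a simple exponential decay. It is not the grid scheduler's implementation; the function names, the decay default, and the way the result would feed into priority are all assumptions.

```python
from typing import Sequence


def decayed_usage(window_usage: Sequence[float], decay: float) -> float:
    """Sum usage over windows, newest first, discounting window i by decay**i."""
    return sum(u * decay ** i for i, u in enumerate(window_usage))


def fairshare_delta(
    user_windows: Sequence[float],
    total_windows: Sequence[float],
    target_share: float,
    decay: float = 0.8,
) -> float:
    """Return target share minus decayed actual share.

    Positive means the user is under target (priority could be raised);
    negative means over target (priority could be lowered).
    """
    total = decayed_usage(total_windows, decay)
    if total == 0:
        return target_share  # no usage recorded yet, so the user is fully under target
    actual = decayed_usage(user_windows, decay) / total
    return target_share - actual


if __name__ == "__main__":
    # A user with a 25% target who consumed half of the most recent window.
    print(fairshare_delta([50, 0], [100, 100], target_share=0.25, decay=0.5))
```

At grid level the same comparison could be applied per cluster or per credential; the sketch leaves that choice open.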
17 MCOM Progress (common library used by the cluster scheduler and grid scheduler)
- XML - added failure logging and exception handling for corrupt XML
- Compression - added inline socket data compression
- Encryption - added initial key based data encryption (not full SSS standard)
- General - made general improvements in socket communication, XML processing, SSS job processing, and node resource monitoring
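The XML and compression items above combine naturally: a payload is compressed before it goes on the socket, and the receiver logs and rejects anything that fails to decompress or parse. The Python sketch below shows that combination with zlib and the standard XML parser. It is an illustration only, not MCOM's code; the function names and the choice of zlib are assumptions.

```python
import logging
import zlib
import xml.etree.ElementTree as ET
from typing import Optional

log = logging.getLogger("mcom_sketch")


def pack(xml_text: str) -> bytes:
    """Compress an XML document for transmission."""
    return zlib.compress(xml_text.encode("utf-8"))


def unpack(payload: bytes) -> Optional[ET.Element]:
    """Decompress and parse a payload.

    Returns the root element, or None (after logging the failure) if the
    payload is not valid compressed data or not well-formed XML.
    """
    try:
        text = zlib.decompress(payload).decode("utf-8")
        return ET.fromstring(text)
    except (zlib.error, UnicodeDecodeError, ET.ParseError) as exc:
        log.warning("rejecting corrupt message: %s", exc)
        return None


if __name__ == "__main__":
    root = unpack(pack('<job id="1"/>'))
    print(root.tag, root.get("id"))
    print(unpack(b"not compressed data"))
```

Returning None instead of raising keeps one corrupt message from taking down a long-running service loop, while the log entry preserves the evidence.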
18 Future Work
- General release of all components
  - Including new Silver Meta-scheduler
- Increase deployment base
- Portability testing for new components
  - Tier 1: Linux RedHat (9.0)
  - Tier 2: Linux SuSE, AIX, Tru-64
  - Tier 3: OS-X, Unicos
  - Tier 4: HP-UX, IRIX, Solaris
- Fault Tolerance supporting 25 cluster loss
19 Future Work
- Queue manager
  - Add job group support (mainly for submission)
  - Add Task Group support / multi-requirement job support to submission clients
  - Add Job Submission filter
  - Finish final missing portions of PBS style job language support
20 Future Work
- Accounting and Allocation manager
  - General release to be made available by mid-year
  - Production deployment of Gold on additional sites
  - Port Gold to other OSs (Tiers 3 and 4) and databases
  - Complete and test design for distributed accounting and multi-organizational involvement in job startup
  - Add support for multi-site authentication/authorization (each site having its own symmetric key)
  - Improvements in the web-based GUI
  - Documentation to include object customization
21 Future Work
- Cluster Scheduler
  - Peer Diagnostics - add auto-recovery to failed service interfaces
  - Resource Utilization - complete development of all resource utilization objectives
  - Resource Limits - complete development of all resource limits objectives
  - Checkpoint Restart - optimize resource management for suspended jobs
22 Future Work
- Grid Scheduler
  - Reliability - complete Globus failure diagnostics and auto-recovery
  - Data Staging - complete Globus/Non-Globus data staging failure auto-recovery
  - Optimization - add network co-allocation reservation
  - Fairness - complete Priority, Fairshare, and Usage Limit based policy enforcement
  - Statistics - add credential, job, and cluster based usage statistics
  - General - mature client commands to provide status reporting in a more intuitive manner