SciDAC SSS Infrastructure and BCM Update 10/2002

Slides: 13
Provided by: Naraya
Learn more at: https://www.csm.ornl.gov

1
SciDAC SSS Infrastructure and BCM Update 10/2002
  • Narayan Desai
  • desai_at_mcs.anl.gov
  • Argonne National Laboratory

2
Overview
  • Infrastructure
    • Service Directory
    • Event Manager
    • SSSlib
      • Bindings
      • Wire Protocol Modules
  • Build and Configuration Management
    • Cluster Hardware Infrastructure
    • Build System
    • Node State Manager
    • Abstraction

3
Service Directory Status
  • Static Schemas
  • Complete Error Handling
  • Deployed
  • Reliable at Moderate Scale

4
Event Manager
  • Usage Model
  • Stable Schemas
  • Relatively Complete Error Handling
  • Completely Rewritten
  • Received Moderate Testing
  • Potential Subscription Type Differentiation

5
SSSlib
  • Robust
  • New C implementation
  • Bindings
    • C
    • Java
    • Python
    • Perl
  • Wire protocol modules
    • Basic
    • Challenge
    • http-rm (in development)
    • http (in development)

6
BCM Abstraction
  • 2nd try Abstraction
    • Components
      • Node Manager
      • Cluster Control
    • Didn't Work
  • Third time's the charm
    • Components
      • Cluster Hardware Infrastructure
      • Build System
      • Node State Manager
    • Why we think it will work

7
Cluster Hardware Infrastructure
  • Handles new node integration
  • Abstracts cluster hardware infrastructure
  • Power Controllers
  • Serial Consoles
  • BIOS Issues
  • Node Inventory
  • Node Identification
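
One way to picture "abstracts cluster hardware infrastructure" is a common interface that hides the differences between power controllers. The sketch below is illustrative only: the class names `PowerController` and `FakePowerController` and the helper `power_cycle` are hypothetical and do not come from the SSS code.

```python
from abc import ABC, abstractmethod


class PowerController(ABC):
    """Hypothetical interface that callers use instead of vendor-specific tools."""

    @abstractmethod
    def power_on(self, node):
        """Turn power on for the named node."""

    @abstractmethod
    def power_off(self, node):
        """Turn power off for the named node."""


class FakePowerController(PowerController):
    """In-memory stand-in for real hardware, used here for illustration."""

    def __init__(self):
        self.powered = set()  # names of nodes that are currently powered

    def power_on(self, node):
        self.powered.add(node)

    def power_off(self, node):
        self.powered.discard(node)


def power_cycle(controller, node):
    """Works with any controller that implements the interface."""
    controller.power_off(node)
    controller.power_on(node)
```

Under these assumptions, other components would call `power_cycle` without knowing which controller model sits behind the interface; serial consoles could be wrapped the same way.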

8
Node State Manager
  • Node State Tracking
    • Basic state monitoring
  • Node Administrative State
    • Online/Offline
  • Corrective Action Facility
    • Pull the plug on bad nodes
    • Unknown criterion
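
The three responsibilities above can be sketched as a small class. This is a minimal illustration, not the actual Node State Manager: the class and method names are hypothetical, and because the slide lists the bad-node criterion as unknown, the sketch takes it as a caller-supplied function rather than fixing one.

```python
class NodeStateManager:
    """Illustrative sketch: tracks observed and administrative node state."""

    def __init__(self, is_bad, corrective_action):
        self.is_bad = is_bad                        # assumed pluggable criterion
        self.corrective_action = corrective_action  # e.g. cut power to the node
        self.observed = {}  # node name -> last monitored state
        self.admin = {}     # node name -> "online" or "offline"

    def set_admin_state(self, node, state):
        if state not in ("online", "offline"):
            raise ValueError("admin state must be 'online' or 'offline'")
        self.admin[node] = state

    def report(self, node, observed_state):
        """Record a monitoring result; act if the criterion flags the node."""
        self.observed[node] = observed_state
        self.admin.setdefault(node, "offline")
        if self.is_bad(observed_state):
            self.admin[node] = "offline"
            self.corrective_action(node)
            return True
        return False
```

A usage example under the same assumptions: `NodeStateManager(is_bad=lambda s: s == "unresponsive", corrective_action=print)` takes a node offline and prints its name the first time `report(node, "unresponsive")` is called.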

9
Build System
  • Disk Setup
  • Software Configuration
  • Software Deployment

10
Example Node Introduction
  • New node added
  • CHI identifies node
  • CHI hands off control of node to BS
  • BS builds node into proper configuration
  • BS hands off control of node to NSM
  • NSM can set node administrative state
  • In case of errors, node can be rebuilt or other actions can be taken
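
The hand-off sequence above can be sketched as a short pipeline. Everything here is hypothetical: the `Owner` enum, the `Node` class, and `introduce_node` are names invented for illustration, the `build` callable stands in for the Build System, and the retry limit and the "leave the node offline" fallback are assumptions standing in for the unspecified "other actions".

```python
from enum import Enum


class Owner(Enum):
    CHI = "Cluster Hardware Infrastructure"
    BS = "Build System"
    NSM = "Node State Manager"


class Node:
    def __init__(self, name):
        self.name = name
        self.owner = None        # component currently in control of the node
        self.admin_state = None  # set once the introduction finishes
        self.history = []        # human-readable trace of the hand-offs


def introduce_node(name, build=lambda node: True, max_builds=2):
    """Walk a new node through CHI -> BS -> NSM, rebuilding on build errors."""
    node = Node(name)

    node.owner = Owner.CHI
    node.history.append("CHI identified node")

    node.owner = Owner.BS
    node.history.append("CHI handed off to BS")

    for attempt in range(1, max_builds + 1):
        if build(node):
            node.history.append(f"build {attempt} succeeded")
            break
        node.history.append(f"build {attempt} failed")
    else:
        # Assumed fallback: every build failed, so the node stays with BS, offline.
        node.admin_state = "offline"
        return node

    node.owner = Owner.NSM
    node.history.append("BS handed off to NSM")
    node.admin_state = "online"
    return node
```

In this sketch exactly one component owns the node at each step, which mirrors the explicit "hands off control" wording on the slide.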

11
Soon
  • Start to standardize events
  • Enhance event data format?
  • Implement more wire protocols
  • Complete hot-swap tests for BCM components
  • Logic for the node state manager
  • Implement a modular cluster hardware infrastructure

12
Issues
  • Abstraction problems
  • BCM model unknown
  • Implementation feedback
  • Multiple implementations help