1
Managing Configuration of Computing Clusters
with Kickstart and XML using NPACI Rocks
  • Philip M. Papadopoulos
  • Program Director, Grid and Cluster Computing
  • San Diego Supercomputer Center
  • University of California, San Diego

2
The Rocks Guys
  • Philip Papadopoulos
  • Parallel message passing expert (PVM and Fast
    Messages)
  • Mason Katz
  • Network protocol expert (x-kernel, Scout and Fast
    Messages)
  • Greg Bruno
  • 10 years experience with NCR's Teradata Systems
  • Builders of clusters which drive very large
    commercial databases
  • All three of us have worked together for the past
    3 years building NT and Linux clusters

3
Computing Clusters
  • Background
  • Overview of the Rocks Methodology and Toolkit
  • Description based configuration
  • Taking the administrator out of cluster
    administration
  • XML-based assembly instructions
  • What's next

4
Scoping Rules
  • Focused on computing clusters
  • Large number of nodes that need similar system
    software footprints
  • MPI-style parallelism is the dominant application
    model
  • Not assuming homogeneity of hardware
    configurations
  • Do assume the same OS
  • Even homogeneous systems exhibit hardware
    differences
  • Not high-availability clusters
  • Our techniques can help here, but we don't
    address the specific software needs of HA

5
Many variations on a basic layout
Diagram: front-end node(s) on the public Ethernet; compute nodes
connected through a Fast-Ethernet switching complex and a gigabit
network switching complex; power distribution units (net-addressable
units as an option).
6
Current Configuration of the Meteor
  • Rocks v2.2 (RedHat 7.2)
  • 2 Frontends, 4 NFS Servers
  • 100 nodes
  • Compaq
  • 800, 933, IA-64
  • SCSI, IDA
  • IBM
  • 733, 1000
  • SCSI
  • 50 GB RAM
  • Ethernet
  • For management
  • Myrinet 2000

7
NPACI Rocks Toolkit (rocks.npaci.edu)
  • Techniques and software for easy installation,
    management, monitoring and update of Linux
    clusters
  • Installation
  • Bootable CD + floppy which contains all the
    packages and site configuration info to bring up
    an entire cluster
  • Management and update philosophies
  • Trivial to completely reinstall any (all) nodes.
  • Nodes are 100% automatically configured
  • Use of DHCP, NIS for configuration
  • Use RedHat's Kickstart to define the set of
    software that defines a node.
  • All software is delivered in a RedHat Package
    (RPM)
  • Encapsulate configuration for a package (e.g.,
    Myrinet)
  • Manage dependencies
  • Never try to figure out if node software is
    consistent
  • If you ever ask yourself this question, reinstall
    the node

8
Goals
  • Simplify cluster management (Make clusters easy)
  • Remove the system administrator
  • Make software available to a wide audience
  • Build on de facto standards
  • Allow contributors to solve specific problems and
    package software components
  • Track the rapid pace of Linux development
  • RedHat 6.2: one update every 3 days
  • RedHat 7.x: two updates every 3 days
  • Leverage and remain open source
  • Unlikely that computational cluster management is
    a long-term commercial business
  • Some components should be purchased! (compilers,
    debuggers, ...)

9
Who is Using It?
  • Growing list of users that we know about
  • SDSC, SIO, UCSD (8 Clusters, including CMS
    (GriPhyN) prototype)
  • Caltech
  • Burnham Cancer Institute
  • PNNL (several clusters, small, medium, large)
  • University of Texas
  • University of North Texas
  • Northwestern University
  • University of Hong Kong
  • Compaq (Working relationship with their Intel
    Standard Servers Group)
  • Singapore Bioinformatics Institute
  • Myricom (Their internal development cluster)

10
What we thought we Learned
  • Clusters are phenomenal price/performance
    computational engines, but are hard to manage
  • Cluster management is a full-time job which gets
    linearly harder as one scales out.
  • Heterogeneous Nodes are a bummer (network,
    memory, disk, MHz, current kernel version, PXE,
    CDs).

11
You Must Unlearn What You Have Learned
12
Installation/Management
  • Need to have a strategy for managing cluster
    nodes
  • Pitfalls
  • Installing each node by hand
  • Difficult to keep software on nodes up to date
  • Disk imaging techniques (e.g., VA Disk Imager)
  • Difficult to handle heterogeneous nodes
  • Treats OS as a single monolithic system
  • Specialized installation programs (e.g., IBM's
    LUI, or RWCP's multicast installer)
  • Let OS packaging vendors do their job
  • Penultimate
  • RedHat Kickstart
  • Define packages needed for OS on nodes, kickstart
    gives a reasonable measure of control.
  • Need to fully automate to scale out (Rocks gets
    you there)

13
Networks
  • High-performance networks
  • Myrinet, Giganet, Servernet, Gigabit Ethernet,
    etc.
  • Ethernet only → Beowulf-class
  • Management Networks (Light Side)
  • Ethernet 100 Mbit
  • Management network used to manage compute nodes
    and launch jobs
  • Nodes are in Private IP (192.168.x.x) space,
    front-end does NAT
  • Ethernet 802.11b
  • Easy access to the cluster via laptops
  • Plus, wireless will change your life
  • Evil Management Networks (Dark Side)
  • A serial console network is not necessary
  • A KVM (keyboard/video/mouse) switching system
    adds too much complexity, cables, and cost

14
How to Build Your Rocks Cluster
  • Get and burn ISO CD image from Rocks.npaci.edu
  • Fill out the form to build the initial kickstart
    file for your first front-end machine
  • Kickstart naked frontend with CD and kickstart
    file
  • Reboot frontend machine
  • Integrate compute nodes with insert-ethers
  • Ready to go!

15
insert-ethers
  • Used to populate the nodes table in MySQL (a
    minimal parsing sketch follows this slide)
  • Parses a file (e.g., /var/log/messages) for
    DHCPDISCOVER messages
  • Extracts MAC addr and, if not in table, adds MAC
    addr and hostname to table
  • For every new entry
  • Rebuilds /etc/hosts and /etc/dhcpd.conf
  • Reconfigures NIS
  • Restarts DHCP and PBS
  • Hostname is
  • <basename>-<cabinet>-<chassis>
  • Configurable to change hostname
  • E.g., when adding new cabinets
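
A minimal sketch of the discovery step described above, assuming dhcpd
writes syslog lines of the form "DHCPDISCOVER from <MAC> via <interface>";
the function and hostname scheme below are illustrative, not the actual
insert-ethers code.

    import re

    # Matches dhcpd syslog lines such as:
    #   Oct  1 12:00:00 frontend dhcpd: DHCPDISCOVER from 00:50:8b:aa:bb:cc via eth0
    DISCOVER_RE = re.compile(r"DHCPDISCOVER from ([0-9a-f:]{17})", re.IGNORECASE)

    def discover_macs(logfile="/var/log/messages", known=None):
        """Return MAC addresses seen in DHCPDISCOVER messages that are not
        already in `known` (the set of MACs stored in the nodes table)."""
        known = set(known or [])
        found = []
        with open(logfile) as log:
            for line in log:
                match = DISCOVER_RE.search(line)
                if match:
                    mac = match.group(1).lower()
                    if mac not in known and mac not in found:
                        found.append(mac)
        return found

    # Each new MAC would get a generated hostname (e.g. compute-<cabinet>-<chassis>)
    # and be inserted into the MySQL nodes table before /etc/hosts and
    # /etc/dhcpd.conf are rebuilt.
    if __name__ == "__main__":
        for i, mac in enumerate(discover_macs(known=["00:50:8b:00:00:01"])):
            print(f"compute-0-{i}  {mac}")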

16
Configuration Derived from Database
Diagram: automated node discovery. As nodes 0..N boot, insert-ethers
populates the MySQL DB; makehosts, makedhcp, and pbs-config-sql then
regenerate /etc/hosts, /etc/dhcpd.conf, and the PBS node list from the
database. (A small generation sketch follows this slide.)
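
A rough sketch of that regeneration idea, assuming the node rows have
already been read out of the MySQL nodes table into (hostname, IP, MAC)
tuples; the real makehosts and makedhcp query the database directly, and
the sample data below is hypothetical.

    # Hypothetical rows pulled from the MySQL nodes table.
    NODES = [
        ("compute-0-0", "192.168.254.254", "00:50:8b:aa:bb:01"),
        ("compute-0-1", "192.168.254.253", "00:50:8b:aa:bb:02"),
    ]

    def make_hosts(nodes):
        """Render /etc/hosts entries for every node in the database."""
        return "\n".join(f"{ip}\t{name}.local {name}" for name, ip, _ in nodes)

    def make_dhcp(nodes):
        """Render dhcpd.conf host stanzas tying each MAC to a fixed address."""
        stanza = ("host {name} {{\n"
                  "\thardware ethernet {mac};\n"
                  "\tfixed-address {ip};\n"
                  "}}")
        return "\n".join(stanza.format(name=n, mac=m, ip=i) for n, i, m in nodes)

    if __name__ == "__main__":
        print(make_hosts(NODES))
        print(make_dhcp(NODES))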
17
Remote re-installation: Shoot-node and eKV
  • Rocks provides a simple method to remotely
    reinstall a node
  • CD/Floppy used to install the first time
  • By default, hard power cycling will cause a node
    to reinstall itself.
  • Addressable PDUs can do this on generic hardware
  • With no serial (or KVM) console, we are able to
    watch a node as it installs (eKV), but
  • Can't see BIOS messages at boot up
  • Syslog for all nodes sent to a log host (and to
    local disk)
  • Can look at what a node was complaining about
    before it went offline

18
Remote re-installation: Shoot-node and eKV
Screenshot: remotely starting reinstallation on two nodes
(192.168.254.254 and 192.168.254.253).
19
Key Ideas
  • No difference between OS and application software
  • OS installation is disposable
  • Unique state that is kept only at a node is bad
  • Identical mechanisms used to install both
  • Single-step installation of updated software and OS
  • Security patches pre-applied to the distribution,
    not post-applied on the node
  • Inheritance of software configurations
  • Distribution
  • Configuration
  • Description-based configuration rather than
    image-based

20
Don't Differentiate OS and Application SW
  • All software delivered in RPM packages
  • Use a package manager to handle conflicts
  • RPM is not totally complete, but
  • Packages will not overwrite each other without
    explicit override
  • Tracking what has changed between the software as
    packaged and what is on disk
  • rpm --verify (a drift-check sketch follows this
    slide)
  • We install a complete system from a selected list
    of packages and associated configuration
  • latest security patches applied before
    installation.
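
The drift check alluded to above can be scripted around rpm's verify
mode (rpm -Va); the wrapper below is only a sketch of that idea, not
Rocks tooling.

    import subprocess

    def drifted_files():
        """Run `rpm -Va` and return the lines reporting files whose size,
        checksum, permissions, etc. no longer match the installed packages.
        (The Rocks answer to a non-empty list is simply: reinstall the node.)"""
        result = subprocess.run(["rpm", "-Va"], capture_output=True, text=True)
        return [line for line in result.stdout.splitlines() if line.strip()]

    if __name__ == "__main__":
        for line in drifted_files():
            print(line)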

21
System State?
  • What is the installed state of a system?
  • Software bits on disk
  • Configuration information (files, registry,
    database)
  • OR
  • Software bits in memory
  • Configuration in memory
  • How you answer this question is fundamental to
    how one moves (updates) a system from one state
    to the next.
  • If the first, then you can update an installation
    and configuration in a single (re)install/reboot
    step
  • If the second, you may have to make several state
    changes (ordering dependencies) to update state.

22
Rocks Hierarchy
Diagram: RPMs make up the collection of all possible software packages
(AKA the distribution); the kickstart file holds the descriptive
information to configure a node; appliances (compute node, IO server,
web server) are built from both.
23
Description-based Configuration
Diagram (repeated build-up): the distribution (RPMs) plus the
descriptive kickstart information yield the configuration for each
appliance: compute node, IO server, web server.
24
Building Distributions: rocks-dist
  • Integrate packages from
  • RedHat (mirror): base distribution + updates
  • Contrib directory
  • Locally produced packages
  • Local contrib (e.g., commercially bought code)
  • Packages from rocks.npaci.edu
  • Produces a single updated distribution that
    resides on front-end
  • This is a RedHat Distribution with patches and
    updates pre-applied

25
NPACI / SDSC
  • rocks-dist mirror
  • Red Hat mirror
  • Red Hat 7.2 release
  • Red Hat 7.2 updates
  • rocks-dist dist
  • Rocks 2.2 release
  • Red Hat 7.2 release
  • Red Hat 7.2 updates
  • Rocks software
  • Contributed software

26
Your Site
  • rocks-dist mirror
  • Rocks mirror
  • Rocks 2.2 release
  • Rocks 2.2 updates
  • rocks-dist dist
  • Kickstart distribution
  • Rocks 2.2 release
  • Rocks 2.2 updates
  • Local software
  • Contributed software
  • This is the same procedure NPACI Rocks uses.
  • Organizations can customize Rocks for their site.
  • Departments can customize further

27
Rocks-dist Summary
  • Created for us to build our software releases
  • Modifies a stock Red Hat release
  • Applies all updates
  • Adds local and contributed software
  • Patches boot images
  • eKV → allows us to monitor a remote installation
    without a KVM
  • URL kickstart → description and RPMs transferred
    over HTTP
  • Inheritance hierarchy allows customization of
    software collection at many levels
  • End-user
  • Group
  • Department
  • Company
  • Community → important for distributed science
    groups

28
Description-based Configuration
Diagram (repeated): the distribution (RPMs) plus the descriptive
kickstart information yield the configuration for each appliance:
compute node, IO server, web server.
29
Description-based Configuration
  • Built an infrastructure that "describes" the
    roles of cluster nodes
  • Nodes are installed using Red Hat's kickstart
  • ASCII file with names of packages to install and
    "post-processing" commands
  • Rocks builds kickstart on-the-fly, tailored for
    each node
  • NPACI Rocks kickstart is general configuration +
    local node configuration
  • General configuration is described by modules
    linked in a configuration graph
  • Local node configuration (applied during post
    processing) is stored in a MySQL database

30
What are the Challenges?
  • Kickstart file is ASCII
  • There is some structure
  • Pre-configuration
  • Package list
  • Post-configuration
  • Not a programmable format
  • Most complicated section is post-configuration
  • Usually this is handcrafted
  • Want to be able to build sections of the
    kickstart file from pieces

31
Break down configuration of appliances into
small compositional pieces
32
Cluster Description Appliances
33
Allows small differences in configuration to be
easily described
34
Architecture Dependencies
  • Allows users to focus only on the differences
  • Architecture type is passed from the top

35
XML Used to Describe Modules
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@"
      [<!ENTITY ssh "openssh">]>
    <kickstart>
      <description> Enable SSH </description>
      <package>&ssh;</package>
      <package>&ssh;-clients</package>
      <package>&ssh;-server</package>
      <package>&ssh;-askpass</package>
      <!-- include XFree86 packages for xauth -->
      <package>XFree86</package>
      <package>XFree86-libs</package>
      <post>
    cat > /etc/ssh/ssh_config << 'EOF'  <!-- default client setup -->
    Host *
        CheckHostIP           no
        ForwardX11            yes
        ForwardAgent          yes
        StrictHostKeyChecking no
        UsePrivilegedPort     no
  • Abstract package names (no versions or
    architectures)
  • ssh-client
  • Not ssh-client-2.1.5.i386.rpm
  • Allow an administrator to encapsulate a logical
    subsystem
  • Node-specific configuration can be retrieved from
    a database
  • IP Address
  • Firewall policies
  • Remote access policies

36
Creating the Kickstart file
  • Node makes HTTP request to get configuration
  • Can be online or captured to a file
  • Node reports architecture type, IP address,
    appliance type, options
  • Kpp preprocessor
  • Start at appliance type (node) and make a single
    large XML file by traversing the graph
  • Kgen generation
  • Translation to kickstart format; other formats
    could be supported (a simplified kpp/kgen sketch
    follows this slide)
  • Node-specific configuration looked up in a
    database
  • Graph visualization using dot (AT&T)
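
A much-simplified sketch of the kpp/kgen flow just described, assuming
the graph edges live in a plain Python dict and each module is a
flattened XML file containing <package> and <post> elements (entities
already expanded); the real tools read a DTD-validated graph and
consult the MySQL database for node-specific values.

    import xml.etree.ElementTree as ET

    # Hypothetical configuration graph: module -> modules it includes.
    GRAPH = {
        "compute": ["base", "ssh"],
        "base":    [],
        "ssh":     [],
    }

    def traverse(node, graph, seen=None):
        """kpp step: depth-first walk of the graph, visiting each module once."""
        seen = set() if seen is None else seen
        if node in seen:
            return
        seen.add(node)
        yield node
        for child in graph.get(node, []):
            yield from traverse(child, graph, seen)

    def load_module(name):
        """Read <package> and <post> elements from a module file such as ssh.xml."""
        root = ET.parse(f"{name}.xml").getroot()
        packages = [p.text.strip() for p in root.findall("package") if p.text]
        post = [p.text for p in root.findall("post") if p.text]
        return packages, post

    def kgen(appliance):
        """kgen step: emit the %packages and %post sections of a kickstart file."""
        packages, post = [], []
        for module in traverse(appliance, GRAPH):
            pkgs, cmds = load_module(module)
            packages += pkgs
            post += cmds
        return "\n".join(["%packages"] + sorted(set(packages)) + ["", "%post"] + post)

    if __name__ == "__main__":
        print(kgen("compute"))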

37
HTTP as Transport
  • Kickstart file is retrieved via HTTP (a minimal
    fetch sketch follows this slide)
  • Rocks web site provides a form to build the
    configuration for a remote site's frontend
    (bootstrap, captured to a file)
  • Cluster frontend as server for cluster nodes
    (online, bootstrap nodes)
  • RPMs transported via HTTP
  • Web infrastructure is very scalable and robust
  • Managing configurations can go beyond a cluster
  • We've installed/configured our home machines from
    SDSC over a cable modem
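
To make the transport concrete, here is a hedged sketch of the
node-side fetch; the frontend address, CGI path, and parameter names
are illustrative assumptions, not the actual Rocks interface.

    from urllib.parse import urlencode
    from urllib.request import urlopen

    def fetch_kickstart(frontend="192.168.254.1", arch="i386",
                        appliance="compute"):
        """Ask the cluster frontend to generate a kickstart file for this node.
        The path and parameter names are assumptions for illustration only."""
        params = urlencode({"arch": arch, "appliance": appliance})
        url = f"http://{frontend}/install/kickstart.cgi?{params}"
        with urlopen(url) as response:   # plain HTTP on the private network
            return response.read().decode()

    if __name__ == "__main__":
        # Capture to a file, mirroring the "captured to a file" bootstrap case.
        with open("ks.cfg", "w") as out:
            out.write(fetch_kickstart())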

38
Payoff: Never-before-seen hardware
  • Dual Athlon, white box, 20 GB IDE, 3Com Ethernet
  • 3:00 PM: In cardboard box
  • Shook out the loose screws
  • Dropped in a Myrinet card
  • Inserted it into cabinet 0
  • Cabled it up
  • 3:25 PM: Inserted the NPACI Rocks CD
  • Ran insert-ethers (assigned node name
    compute-0-24)
  • 3:40 PM: Ran Linpack

39
Futures
  • Improve Monitoring, debugging, self-diagnosis of
    cluster-specific software
  • Improve documentation!
  • Continue Tracking RedHat updates/releases
  • Prepare for Infiniband Interconnect
  • Global file systems, I/O is an Achilles heel of
    clusters
  • Grid Tools (Development and Testing)
  • Globus
  • Grid research tools (APST)
  • GridPort toolkit
  • Integration with other SDSC projects
  • SRB
  • MiX - data mediation
  • Visualization Cluster - Display Wall

40
Summary
  • Rocks significantly lowers the bar for users to
    deploy usable compute clusters
  • Very simple hardware assumptions
  • XML module descriptions allow encapsulation
  • Graph interconnection allows appliances to share
    configuration
  • Deltas among appliances are easily visualized
  • HTTP transport is scalable in
  • Performance
  • Distance