1
Managing Linux Clusters with Rocks
Tim Carlson - PNNL
tim@pnl.gov
2
Introduction
  • Cluster Design
  • The ins and outs of designing compute solutions
    for scientists
  • Rocks Cluster Software
  • What it is and some basic philosophies of Rocks
  • Midrange computing with Rocks at PNNL
  • How PNNL uses Rocks to manage 25 clusters ranging
    from 32 to 1500 compute cores

3
I Need a Cluster!
  • Can you make use of existing resources?
  • chinook
  • 2310 Barcelona CPUs with DDR Infiniband
  • Requires EMSL proposal
  • superdome
  • 256 core Itanium 2 SMP machine
  • Short proposal required
  • Department clusters
  • HPCaNS manages 25 clusters. Does your department
    have one of them?
  • Limited amount of PNNL general purpose compute
    cycles

4
I Really Need a Cluster!
  • Why?
  • Run bigger models?
  • Maybe you need a large-memory deskside machine.
    72 GB in a deskside box is doable (dual Nehalem
    with 18 x 4 GB DIMMs)
  • Do you need/want to run parallel code?
  • Again, maybe a deskside machine is appropriate:
    8 cores in a single machine

5
You Need a Cluster
  • What software do you plan to run?
  • WRF/MM5 (atmospheric/climate)
  • May benefit from low latency network
  • Quad core scaling?
  • NWChem (molecular chemistry)
  • Usually requires a low latency network
  • Need an interconnect that is fully supported by
    ARMCI/GA
  • Fast local scratch required; fast global scratch
    is a good idea
  • Home Grown
  • Any idea of the profile of your code?
  • Can we have a test case to run on our test
    cluster?

6
Processor choices
  • Intel
  • Harpertown or Nehalem
  • Do you need the Nehalem memory bandwidth?
  • AMD
  • Barcelona or Shanghai
  • Shanghai is a better Barcelona
  • Disclaimer
  • This talk was due 4 weeks early. All of the above
    could have changed in that time.

7
More Hardware Choices
  • Memory per core
  • Be careful configuring Nehalem
  • Interconnect
  • GigE, DDR, QDR
  • Local disk I/O
  • Do you even use this?
  • Global file system
  • At any reasonable scale you probably aren't using
    NFS
  • Lustre/PVFS2/Panasas

8
Rocks Software Stack
  • Red Hat based
  • PNNL is mostly Red Hat, so the environment is
    familiar
  • NSF funded since 2000
  • Several HPCwire awards
  • Our choice since 2001
  • Originally based on Red Hat 6.2, now based on
    RHEL 5.3

9
Rocks is a Cluster Framework
  • Customizable
  • Not locked into a vendor solution
  • Modify default disk partitioning
  • Use your own custom kernel
  • Add software via RPMs or Rolls
  • Need to make more changes?
  • Update an XML file, rebuild the distribution,
    reinstall all the nodes (see the sketch below)
  • Rocks is not system-imager based
  • All nodes are installed, not imaged
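A minimal sketch of that customize-and-rebuild cycle on a Rocks 5.x frontend; the site-profiles path is the standard layout, and the version directory (5.3) and any packages you add are whatever your install actually uses:

    # add site-specific packages/config for compute nodes
    cd /export/rocks/install/site-profiles/5.3/nodes
    cp skel.xml extend-compute.xml   # then add <package> and <post> sections
    # rebuild the distribution that nodes install from
    cd /export/rocks/install
    rocks create distro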

10
Rocks Philosophies
  • Quick to install
  • It should not take a month (or even more than a
    day) to install a thousand-node cluster
  • Nodes are 100% configured
  • No after-the-fact tweaking
  • If a node is out of configuration, just reinstall
    it (see the sketch below)
  • Don't spend time on configuration management of
    nodes
  • Just reinstall
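A sketch of the "just reinstall" step, assuming a Rocks 5.x frontend and compute-0-0 as an example node name:

    # flag the node for a PXE reinstall on its next boot, then bounce it
    rocks set host boot compute-0-0 action=install
    ssh compute-0-0 /sbin/reboot
    # or trigger an immediate reinstall from the node itself:
    # ssh compute-0-0 /boot/kickstart/cluster-kickstart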

11
What is a Roll
  • A Roll is a collection of software packages and
    configuration information
  • Rolls provide more specific tools (adding one to
    a frontend is sketched after these bullets)
  • Commercial compiler Rolls (Intel, Absoft,
    Portland Group)
  • Your choice of scheduler (Sun Grid Engine,
    Torque)
  • Science specific (Bio Roll)
  • Many others (Java, Xen, PVFS2, TotalView, etc)
  • Users can build their own Rolls
  • https://wiki.rocksclusters.org/wiki/index.php/Main_Page
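Adding a Roll to a running frontend looks roughly like this with the Rocks 5.x command set (the SGE ISO filename here is illustrative):

    rocks add roll /tmp/sge-5.3-0.x86_64.disk1.iso
    rocks enable roll sge
    (cd /export/rocks/install && rocks create distro)
    rocks run roll sge | bash   # apply the roll's configuration to the frontend
    # compute nodes pick up the roll the next time they reinstall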

12
Scalable
  • Not system imager based
  • Non-homogeneous hardware makes system-imager-style
    installation problematic
  • Nodes install from kickstart files generated from
    a database (see the sketch after these bullets)
  • Several clusters registered with over 500 nodes
  • Avalanche installer removes pressure from any
    single installation server
  • Introduced in Rocks 4.1
  • Torrent based
  • Nodes share packages during installation
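To see what a given node will get, the frontend can dump the installation profile it generates from the database; a sketch, assuming the Rocks 5.x command set and an example hostname:

    # print the generated installation profile for one node
    rocks list host profile compute-0-0 | less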

13
Community and Commercial Support
  • Active mailing list averaging over 700 posts per
    month
  • Annual Rocks-A-Palooza meeting for community
    members
  • Talks, tutorials, working groups
  • Rocks cluster register has over 1100 clusters
    registered representing more than 720 Teraflops
    of computational power
  • ClusterCorp sells Rocks support based on open
    source Rocks

14
PNNL Midrange Clusters
  • Started in 2001
  • 8-node VA Linux cluster
  • Dual PIII 500 MHz with 10/100 Ethernet
  • Chose Rocks as the software stack
  • Built our first big cluster that same year
  • 64 dual Pentium III nodes at 1 GHz
  • Rebuilt all the nodes with Rocks in under 30
    minutes
  • Parts of this system are still in production
  • Currently manage 25 clusters
  • Range in size from 16 to 1536 cores
  • Infiniband is the primary interconnect
  • Attached storage ranges from 1 to 100 Terabytes

15
HPCaNS Management Philosophy
  • Create service center to handle money
  • Charge customers between $300 and $800/month
    based on size and complexity
  • Covers account management, patching, minimal
    backups (100 GB), compiler licenses, Big Brother
    monitoring, general sysadmin
  • Use 0.75 FTE to manage all the clusters
  • Non-standard needs are charged by time and
    materials
  • Adding new nodes
  • Rebuilding to a new OS
  • Software porting or debugging
  • Complex queue configurations

16
Support Methods
  • Big Brother alerts
  • Hooks into Ganglia, checking for (a simplified
    check is sketched after these bullets)
  • Node outages
  • Disk usage
  • Emails problems to cluster sysadmins
  • See the next slide for what this looks like after
    a bad power outage!
  • Support queue
  • Users pointed to central support queue
  • 5 UNIX admins watching the queue for cluster
    items
  • Try to teach users to use the support queue
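The Big Brother hooks themselves aren't shown in the talk; below is a much-simplified stand-in for the node-outage check, polling gmond's default XML port. The collector hostname, expected node count, and alert address are all assumptions:

    #!/bin/bash
    # count hosts currently reporting to the Ganglia collector (gmond, port 8649)
    EXPECTED=64
    SEEN=$(nc frontend-0 8649 | grep -c '<HOST NAME=')
    if [ "$SEEN" -lt "$EXPECTED" ]; then
        echo "only $SEEN of $EXPECTED nodes reporting to Ganglia" |
            mail -s "cluster node check" cluster-admins@example.com
    fi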

17
(Image-only slide: Big Brother monitoring view after a bad power outage)
18
Typical Daily Questions
  • Can you add application X, Y, Z?
  • My job doesn't seem to be running in the queue?
  • The compiler gives me this strange error!
  • Do you have space/power/cooling for this new
    cluster I want to buy?
  • This code runs on cluster X, but doesn't run on
    cluster Y. Why is that? Aren't they the same?
  • Can I add another 10T of disk storage?
  • The cluster is broken!

19
Always Room for Improvement
  • Clusters live in 4 different computer rooms
  • Can we consolidate?
  • Never enough user documentation
  • Standardize on resource managers
  • Currently have various versions of Torque and
    SLURM
  • Should we be upgrading older OSes?
  • Still have RHEL 3 based clusters
  • Do we need to be doing shared/grid/cloud
    computing?
  • Why in the world do you have 25 clusters?

20
Questions, comments, discussion!