1
Alliance Clusters, Cluster in a Box
How to stuff a penguin in a box and make
everyone happy, even the penguin.
Rob Pennington Acting Associate
Director Computing and Communications
Division NCSA
2
Where Do Clusters Fit?
(Diagram: a spectrum of parallel computing platforms, annotated with "1 TF/s delivered" and "15 TF/s delivered" at its two ends.
  Distributed systems: Internet, SETI@home, Condor, Legion/Globus
  Middle ground: Beowulf, Berkeley NOW, superclusters
  MP systems: ASCI Red Tflops)
Distributed systems
  • Gather (unused) resources
  • System SW manages resources
  • System SW adds value
  • 10-20% overhead is OK
  • Resources drive applications
  • Time to completion is not critical
  • Time-shared
  • Commercial: PopularPower, United Devices, Centrata, ProcessTree, Applied Meta, etc.

MP systems
  • Bounded set of resources
  • Apps grow to consume all cycles
  • Application manages resources
  • System SW gets in the way
  • 5% overhead is maximum
  • Apps drive purchase of equipment
  • Real-time constraints
  • Space-shared

Src: B. Maccabe, UNM; R. Pennington, NCSA
3
Alliance Clusters Overview
  • Major Alliance cluster systems
  • NT-based cluster at NCSA
  • Linux-based clusters
  • University of New Mexico - Roadrunner, LosLobos
  • Argonne National Lab - Chiba City
  • Develop Locally, Run Globally
  • Local clusters used for development and parameter
    studies
  • Issues
  • Compatible Software Environments
  • Compatible Hardware
  • Evaluate Technologies at Multiple Sites
  • OS, Processors, Interconnect, Middleware
  • Computational resource for users

4
Cluster in a Box Rationale
  • Conventional wisdom: building a cluster is easy
  • Recipe
  • Buy hardware from Computer Shopper, Best Buy, or
    Joe's place
  • Find a grad student not making enough progress on
    thesis work and distract him/her with the
    prospect of playing with the toys
  • Allow to incubate for a few days to weeks
  • Install your application, run and be happy
  • Building it right is a little more difficult
  • Multi-user cluster, security, performance tools
  • Basic question - what works reliably?
  • Building it to be compatible with
    Grid/Alliance...
  • Compilers, libraries
  • Accounts, file storage, reproducibility
  • Hardware configs may be an issue

5
Alliance Cluster Growth: 1 TFLOP in 2 Years
(Chart: growth of Alliance cluster capacity, reaching 1600 Intel CPUs by Oct-00.)
6
Alliance Cluster Status
  • UNM Los Lobos
  • Linux
  • 512 processors
  • May 2000
  • operational system
  • first performance tests
  • friendly users
  • Argonne Chiba City
  • Linux
  • 512 processors
  • Myrinet interconnect
  • November 1999
  • deployment
  • NCSA NT Cluster
  • Windows NT 4
  • 256 processors
  • Myrinet
  • December 1999
  • Review Board Allocations
  • UNM Road Runner
  • Linux, 128 processors
  • Myrinet
  • September 1999
  • Review Board Allocations

7
NT Cluster Usage - Large, Long Jobs
8
A Pyramid Scheme (Involve Your Friends and Win Big)
(Diagram: a pyramid of cluster resources, top to bottom:
  • Full production resources at major site(s)
  • Alliance resources at partner sites
  • Small, private systems in labs/offices)
This is a non-exclusive club at all levels!
Can a Cluster in a Box support all of the different configs at all of the sites? No, but it can provide an established, tested base configuration.
9
Cluster in a Box Goals
  • Open source software kit for scientific computing
  • Surf the ground swell
  • Some things are going to be add-ons
  • Invest in compilers; vendors have spent big
    optimizing them
  • Integration of commonly used components
  • Minimal development effort
  • Time to delivery is critical
  • Initial target is small to medium clusters
  • Up to 64 processors
  • 1 interconnect switch
  • Compatible environment for development and
    execution across different systems (Grid,
    anyone?)
  • Common libraries, compilers

10
Key Challenges and Opportunities
  • Technical and Applications
  • Development Environment
  • Compilers, Debuggers
  • Performance Tools
  • Storage Performance
  • Scalable Storage
  • Common Filesystem
  • Admin Tools
  • Scalable Monitoring Tools
  • Parallel Process Control
  • Node Size
  • Resource Contention
  • Shared Memory Apps
  • Few Users -> Many Users
  • 600 Users/month on O2000
  • Heterogeneous Systems
  • New generations of systems
  • Integration with the Grid
  • Organizational
  • Integration with Existing Infrastructure
  • Accounts, Accounting
  • Mass Storage
  • Training
  • Acceptance by Community
  • Increasing Quickly
  • Software environments

11
Cluster Configuration
(Diagram: cluster configuration - front-end nodes for user logins, management nodes, debug nodes, compute nodes, I/O nodes, visualization nodes, a systems testbed, storage, HSM, and the network. Green = present generation clusters.)
12
Space Sharing Example on 64 Nodes
Users own the nodes allocated to them.
(Diagram: the 64 nodes partitioned into disjoint blocks, one per application, App1 through App6.)
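Below is a minimal sketch of the space-sharing idea: each application is handed its own disjoint block of nodes and owns it until the job finishes. The application names App1-App6 come from the slide; the per-application node counts and the contiguous-block allocation policy are illustrative assumptions, not taken from the original.

/* Space sharing on a 64-node cluster: each app owns a disjoint,
 * contiguous block of nodes (sizes below are hypothetical). */
#include <stdio.h>

#define TOTAL_NODES 64

struct app {
    const char *name;
    int nodes_requested;
};

int main(void)
{
    struct app apps[] = {
        {"App1", 16}, {"App2", 8}, {"App3", 8},
        {"App4", 16}, {"App5", 8}, {"App6", 8},
    };
    int next_free = 0;   /* index of the first unallocated node */

    for (int i = 0; i < (int)(sizeof apps / sizeof apps[0]); i++) {
        if (next_free + apps[i].nodes_requested > TOTAL_NODES) {
            printf("%s: waits until nodes are freed\n", apps[i].name);
            continue;
        }
        printf("%s owns nodes %d-%d\n", apps[i].name,
               next_free, next_free + apps[i].nodes_requested - 1);
        next_free += apps[i].nodes_requested;
    }
    return 0;
}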
13
OSCAR: A(nother) Package for Linux Clustering
OSCAR (Open Source Cluster Application Resources) is a snapshot of the best-known methods for building and using cluster software.
14
The OSCAR Consortium
  • OSCAR is being developed by:
  • NCSA/Alliance
  • Oak Ridge National Laboratory
  • Intel
  • IBM
  • Veridian Systems
  • Additional supporters are:
  • SGI, HP, Dell, MPI Software Technology, MSC

15
OSCAR Components Status
  • OS: Core validation OSs selected (Red Hat, Turbo, and SuSE). Integration support issues being worked.
  • Installation & Cloning: Configuration database design is complete. LUI is complete and awaiting integration with the database.
  • Job Management: PBS validated and awaiting integration. Long-term replacement for PBS under consideration.
  • Packaging: Integration underway. Documentation under development.

Src: N. Gorsuch, NCSA
16
Open Source Cluster Application Resources
  • Open source cluster on a CD
  • Integration meeting v0.5 - September 2000
  • Integration meeting at ORNL, October 24-25 -
    v1.0
  • v1.0 to be released at Supercomputing 2000
    (November 2000)
  • Research and industry consortium
  • NCSA, ORNL, Intel, IBM, MSC Software, SGI, HP,
    Veridian, Dell
  • Components
  • OS layer: Linux (Red Hat, TurboLinux, SuSE,
    etc.)
  • Installation and cloning: LUI
  • Security: OpenSSH for now
  • Cluster management: C3/M3C
  • Job management: OpenPBS
  • Programming environment: gcc, etc.
  • Packaging: OSCAR

Src: N. Gorsuch, NCSA
17
OSCAR Cluster Installation Process
KEEP IT SIMPLE!
  • Install Linux on cluster master or head node
  • Copy the contents of the OSCAR CD onto the cluster head node
  • Collect cluster information and enter it into the LUI
    database
  • This is a manual phase right now (see the sketch after this list)
  • Run the pre-client installation script
  • Boot the clients and let them install themselves
  • Can be done over the net or from a floppy
  • Run the post-client installation script
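As a rough illustration of the manual "collect cluster information" step, the sketch below shows the kind of per-node data gathered before it is entered into the LUI database. The record layout, field names, and sample values are assumptions for illustration only, not the actual LUI schema.

/* Hypothetical per-node record of the sort collected by hand before
 * entry into the LUI database; fields and values are illustrative. */
#include <stdio.h>

struct node_info {
    const char *hostname;
    const char *ip_addr;
    const char *mac_addr;   /* used when clients boot over the net */
};

int main(void)
{
    struct node_info nodes[] = {
        {"node01", "10.0.0.11", "00:A0:C9:00:00:01"},
        {"node02", "10.0.0.12", "00:A0:C9:00:00:02"},
        {"node03", "10.0.0.13", "00:A0:C9:00:00:03"},
        {"node04", "10.0.0.14", "00:A0:C9:00:00:04"},
    };

    for (int i = 0; i < (int)(sizeof nodes / sizeof nodes[0]); i++)
        printf("%s  %s  %s\n",
               nodes[i].hostname, nodes[i].ip_addr, nodes[i].mac_addr);
    return 0;
}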

18
Testbeds
  • Basic cluster configuration for prototyping at
    NCSA
  • Interactive node + 4 compute nodes
  • Development site for OSCAR contributors
  • 2nd set of identical machines for testbed
  • Rolling development between the two testbeds
  • POSIC - Linux
  • 56 dual processor nodes
  • Mixture of ethernet and Myrinet
  • User accessible testbed for apps porting and
    testing

19
IA-64 Itanium Systems at NCSA
  • Prototype systems
  • Early hardware
  • Not running at production spec
  • Code porting and validation
  • Community codes
  • Required software infrastructure
  • Running 64 bit Linux and Windows
  • Dual boot capable
  • Usually one OS for extended periods
  • Clustered IA-64 systems
  • Focused on MPI applications porting/testing
  • Myrinet, Ethernet, Shared Memory

20
HPC Applications Running on Itanium
(Diagram: IA-64 test cluster)
  • IA-64 compute nodes: one 2-processor and four 4-processor systems, running Linux or Win64
  • IA-32 compile nodes: IA-32 Win32 and IA-32 Linux, with compilers for C/C++/F90
  • Interconnects: shared memory, Fast Ethernet (MPICH), Myrinet (GM, VMI, MPICH)
  • Applications/Packages: PUPI, ASPCG, HDF4/5, PBS, FFTW, Globus, Cactus, MILC, ARPI-3D, ATLAS, sPPM, WRF
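As a concrete example of the MPI porting and testing done on these nodes, here is a minimal, generic MPI program (not one of the application codes listed above). It assumes an MPI implementation such as MPICH is installed, so it would typically be built with mpicc and launched across the nodes with mpirun.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char procname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(procname, &namelen);

    printf("rank %d of %d running on %s\n", rank, size, procname);

    MPI_Finalize();
    return 0;
}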
21
Future
  • Scale up Current Cluster Efforts
  • Capability computing at NCSA and Alliance sites
  • NT and Linux clusters expand
  • Scalable Computing Platforms
  • Commodity turnkey systems
  • Current technology has 1 TF Within Reach
  • <1000 IA-32 processors
  • Teraflop Systems Integrated With the Grid
  • Multiple Systems Within the Alliance
  • Complement to current SGI SMP Systems at NCSA
  • Next generation of technologies
  • Itanium at 3 GFLOP, 1 TF is 350 Processors