Title: Building Beowulfs for High Performance Computing
1. Building Beowulfs for High Performance Computing
- Duncan Grove
- Department of Computer Science
- University of Adelaide
- http://dhpc.adelaide.edu.au/projects/beowulf
2. Anatomy of a Beowulf
- Cluster of networked PCs
- Intel PentiumII or Compaq Alpha
- Switched 100Mbit/s Ethernet or Myrinet
- Linux
- Parallel and batch software support
[Diagram: compute nodes n1, n2, ..., nN connected by switching infrastructure to a front-end node, which links to the outside world]
3. Why build Beowulfs?
- Science
  - Some problems take lots of processing
  - Many supercomputers are used as batch processing engines
  - Traditional supercomputers are wasteful for high throughput computing
- Beowulfs
  - Useful computational cycles at the lowest possible price
  - Suited to high throughput computing
  - Effective at an increasingly large set of parallel problems
4. Three Computational Paradigms
- Data Parallel
  - Regular grid based problems
  - Parallelising compilers, e.g. HPF
  - E.g. physicists running lattice gauge calculations
- Message Passing
  - Unstructured parallel problems
  - MPI, PVM (see the sketch after this list)
  - E.g. chemists running molecular dynamics simulations
- Task Farming
  - High throughput computing: batch jobs
  - Queuing systems
  - E.g. chemists running Gaussian
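To make the message-passing paradigm concrete, here is a minimal MPI sketch in C. It is illustrative, not from the talk, though it would build against either MPICH or LAM/MPI, the implementations that appear later for Perseus; it passes a single integer from one process to another:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);                  /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which process am I?   */

        if (rank == 0) {
            int msg = 42;
            /* send one int to rank 1, message tag 0 */
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Status status;
            /* receive one int from rank 0, message tag 0 */
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d\n", msg);
        }

        MPI_Finalize();                          /* shut down cleanly */
        return 0;
    }

Compiled with mpicc and launched with mpirun -np 2, the runtime places the two processes on separate nodes and the send/receive pair crosses the cluster network.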
5. A Brief Cluster History
- Caltech Prehistory
- Berkeley NOW
- NASA Beowulf
- Stone SouperComputer
- USQ Topcat
- UIUC NT Supercluster
- LANL Avalon
- SNL Cplant
- AU Perseus?
6. Beowulf Wishlist
- Single System Image (SSI)
- Unified process space
- Distributed shared memory
- Distributed file system
- Performance easily extensible
  - Just add more bits
- Fault tolerant
- Simple to administer and use
7. Current Sophistication?
- Shrinkwrapped solutions or do-it-yourself
- Not much more than a nicely installed network of PCs
- A few kernel hacks to improve performance
- No magical software for making the cluster transparent to the user
- Queuing software and parallel programming software can create the appearance of a more unified machine
8. Stone SouperComputer
9. Iofor
- Learning platform
- Program development
- Simple benchmarking
- Simple performance evaluation of real applications
- Teaching machine
- Money lever
10. iMacwulf
- Student lab by day, Beowulf by night?
- MacOS with Appleseed
- LinuxPPC 4.0, soon LinuxPPC 5.0
- MacOS/X
11. Gigaflop harlotry

  Machine                     Cost               Processors   Peak Speed
  Cray T3E                    10s of millions    1084         1300 Gflop/s
  SGI Origin 2000             10s of millions    128          128 Gflop/s
  IBM SP2                     10s of millions    512          400 Gflop/s
  Sun HPC                     1s of millions     64           50 Gflop/s
  TMC CM5                     5 million (1992)   128          20 Gflop/s
  SGI PowerChallenge          1 million (1995)   20           20 Gflop/s
  Beowulf cluster (Myrinet)   1 million          256          120 Gflop/s
  Beowulf cluster             300K               256          120 Gflop/s
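Taking the table's rough costs at face value, the point is price/performance: the plain Beowulf delivers 120 Gflop/s / $0.3M ≈ 400 Gflop/s per million dollars and the Myrinet Beowulf 120, while the Cray T3E manages somewhere between roughly 40 and 130 depending on where in the "tens of millions" its price falls.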
12. The obvious, but important
- In the past
  - Commodity processors way behind supercomputer processors
  - Commodity networks way, way, way behind supercomputer networks
- In the now
  - Commodity processors only just behind supercomputer processors
  - Commodity networks still way, way behind supercomputer networks
  - More exotic networks still way behind supercomputer networks
- In the future
  - Commodity processors will be supercomputer processors
  - Will the commodity networks catch up?
13. Hardware possibilities
14. OS possibilities
15. Open Source
- The good...
  - Lots of users, active development
  - Easy access to make your own tweaks
  - Aspects of Linux are still immature, but recently:
    - SGI has released XFS as open source
    - Sun has released its HPC software as open source
- And the bad...
  - There's a lot of bad code out there!
16. Network technologies
- So many choices!
  - Interfaces, cables, switches, hubs
  - ATM, Ethernet, Fast Ethernet, Gigabit Ethernet, FireWire, HiPPI, Serial HiPPI, Myrinet, SCI
- The important issues (see the model after this list)
  - latency
  - bandwidth
  - availability
  - price
  - price/performance
  - application type!
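A standard first-order model, not from the slides but useful for weighing these issues against one another: the time to deliver an n-byte message is roughly

  t(n) ≈ α + n/β

where α is the per-message latency and β the sustained bandwidth. With illustrative Fast Ethernet figures of α ≈ 100 µs and β ≈ 12.5 MB/s, a 1 KB message costs about 180 µs: small messages are latency-bound, large transfers bandwidth-bound, which is why the "right" network really does depend on the application type.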
17. Disk subsystems
- I/O is a problem in parallel systems
  - Data not local to compute nodes is a performance hit
- Distributed file systems
  - CacheFS
  - CODA
- Parallel file systems
  - PVFS
- On-line bulk data is interesting in itself
  - Beowulf Bulk Data Server
  - cf. slow, expensive tape silos...
18. Perseus
- Machine for chemistry simulations
  - Mainly high throughput computing
- RIEF grant in excess of 300K
- 128 nodes, for < 2K per node (a budget check follows this list)
  - Dual processor PII450
  - At least 256MB RAM (some nodes up to 1GB)
  - 6GB local disk each
- 5x24 (2x4) port Intel 100Mbit/s switches
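A quick sanity check on those figures, assuming the per-node bound covers each complete node: 128 nodes x $2K/node = $256K, which sits inside the 300K grant and leaves some headroom for the switches and racking.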
19. Perseus Phase 1
- Prototype
- 16 dual processor PII
- 100Mbit/s switched Ethernet
20. Perseus: installing a node
[Diagram: compute nodes n1, n2, ..., nN connected by switching infrastructure to the front-end node, which links to the outside world]
- Front-end node: user node, administration, compilers, queues, nfs, dns, NIS, /etc/, bootp/dhcp, kickstart, ...
- Compute nodes are installed by booting from floppy disk or bootrom
21. Software on Perseus
- Software to support the three computational paradigms
- Data Parallel
  - Portland Group HPF
- Message Passing
  - MPICH, LAM/MPI, PVM
- High throughput computing
  - Condor, GNU Queue
  - Gaussian94, Gaussian98
22. Expected parallel performance
- Loki, 1996
  - 16 Pentium Pro processors, 10Mbit/s Ethernet
  - 3.2 Gflop/s peak; achieved 1.2 real Gflop/s on the Linpack benchmark
- Perseus, 1999
  - 256 PentiumII processors, 100Mbit/s Ethernet
  - 115 Gflop/s peak
  - 40 Gflop/s on the Linpack benchmark? (see the estimate after this list)
- Compare with the Top 500!
  - Would get us to about 200 currently
- Other Australian machines?
  - NEC SX/4 @ BOM at 102
  - Sun HPC at 181, 182, 255
  - Fujitsu VPP @ ANU at 400
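The Perseus figures are consistent with a back-of-envelope estimate, assuming one floating-point result per cycle per processor: 256 x 450 MHz x 1 flop/cycle ≈ 115 Gflop/s peak. The 40 Gflop/s Linpack guess is then about 35% of peak, in line with the 1.2/3.2 ≈ 38% of peak that Loki achieved.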
27. Reliability in a large system
- Build it right!
- Is the operating system and software running ok?
- Is heat dissipation going to be a problem?
- Monitoring daemon (a minimal sketch follows this list)
- Normal features
- CPU, network, memory, disk
- More exotic features
- Power supply and CPU fan speeds
- Motherboard and CPU temperatures
- Do we have any heisen-cabling?
- Racks and lots of cable ties!
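To give a flavour of such a daemon, here is a minimal sketch in C, assuming a Linux /proc filesystem. Reading the one-minute load average from /proc/loadavg is standard; the temperature and fan-speed files exposed by lm_sensors vary with the sensor chip and driver version, so that part is left as a comment, and the load threshold is an arbitrary placeholder:

    #include <stdio.h>
    #include <unistd.h>

    /* Read the 1-minute load average from /proc/loadavg. */
    static double read_load(void)
    {
        double load = -1.0;
        FILE *f = fopen("/proc/loadavg", "r");
        if (f) {
            fscanf(f, "%lf", &load);
            fclose(f);
        }
        return load;
    }

    int main(void)
    {
        for (;;) {
            double load = read_load();
            if (load > 4.0)   /* placeholder threshold for a dual-CPU node */
                fprintf(stderr, "warning: load average %.2f\n", load);
            /* CPU temperature, fan speed and power supply checks would
               read the lm_sensors /proc entries here; the exact paths
               depend on the motherboard and driver version. */
            sleep(60);        /* poll once a minute */
        }
        return 0;
    }

In practice each node would run something like this and report to a collector on the front-end node rather than to stderr.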
28. The limitations...
- Scalability
- Load balancing
- Effects of machine capabilities
- Desktop machines vs. dedicated machines
- Resource allocation
- Task Migration
- Distributed I/O
- System monitoring and control tools
- Maintenance requirements
- Installation, upgrading, versioning
- Complicated scripts
- Parallel interactive shell?
29. ...and the opportunities
- A large proportion of the current limitations compared with traditional HPC solutions are merely systems integration problems
- Some contributions to be made in:
  - HOWTOs
  - Monitoring and maintenance
  - Performance modelling and real benchmarking