Title: Scalability on Linux Clusters, ASCI/ASAP Scalability Workshop in Santa Fe

1. Scalability on Linux Clusters
ASCI/ASAP Scalability Workshop in Santa Fe
- Rolf Riesen, Ron Brightwell, and the Cplant™ Team
- Sandia National Laboratories
- Scalable Computing Systems Department
- May 11, 2000
2. Machines are not Scalable
- Size is not a measure of scalability
  - Modem-connected i386 PCs run SETI@home just fine, but they won't run MPLINPACK
- Network speed is not a measure of scalability
  - Bus-connected processors (SMPs) run MPLINPACK just fine, but won't grow to 9000 nodes
- Neither are topology, architecture, or CPU type
- The best a machine or architecture can hope for is to not inhibit scalability
3. System Software is not Scalable
- The data transport layer is not a measure of scalability
  - TCP/IP is just fine for cracking RSA challenges, but it won't do for an FFT
- The OS is not a measure of scalability
  - Windows runs the SETI@home screen saver just fine, but it won't run on ASCI Red
- Neither are point-to-point latency or bandwidth
- The best system software can hope for is to not inhibit scalability
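To make the FFT example concrete: a parallel FFT needs a global transpose in which every node exchanges a block with every other node, whereas a SETI@home work unit needs essentially no inter-node traffic. Below is a minimal MPI sketch of that transpose pattern; the block size is illustrative and not from the slides.

    /* Minimal sketch of the communication pattern behind a parallel
     * FFT: every rank exchanges a block with every other rank (a
     * global transpose). BLOCK is a made-up size for illustration. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK 1024  /* doubles exchanged with each peer */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *send = malloc((size_t)size * BLOCK * sizeof(double));
        double *recv = malloc((size_t)size * BLOCK * sizeof(double));
        for (int i = 0; i < size * BLOCK; i++)
            send[i] = (double)rank;

        /* All n ranks talk to all n ranks at once: n*(n-1) messages
         * per transpose step. This pattern, not point-to-point
         * latency, is what exposes a non-scalable transport. */
        MPI_Alltoall(send, BLOCK, MPI_DOUBLE,
                     recv, BLOCK, MPI_DOUBLE, MPI_COMM_WORLD);

        if (rank == 0)
            printf("transpose across %d ranks done\n", size);
        free(send); free(recv);
        MPI_Finalize();
        return 0;
    }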
4. What is Scalability?
- Whether a machine or system software is scalable depends on the intended application
- Designing and building scalable systems requires leaving out features that prevent the intended application from running
- Applications range from distributed to tightly coupled parallel
5. Distributed and Parallel Systems
(Tech Report SAND98-2221)

[Figure: spectrum from heterogeneous distributed systems to homogeneous massively parallel systems. Examples along the spectrum: Internet, SETI@home, Legion/Globus, Berkeley NOW, Beowulf, Cplant, ASCI Red Tflops]

Distributed systems (heterogeneous):
- Gather (unused) resources
- Steal cycles
- System SW manages resources
- System SW adds value
- 10-20% overhead is OK
- Resources drive applications
- Time to completion is not critical
- Time-shared

Massively parallel systems (homogeneous):
- Bounded set of resources
- Apps grow to consume all cycles
- Application manages resources
- System SW gets in the way
- 5% overhead is maximum
- Apps drive purchase of equipment
- Real-time constraints
- Space-shared
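One hedged way to read the overhead numbers above (the models are added here, not from the slide): overhead that stays parallel, adding a fraction o to each node's work, only costs efficiency,

    E = \frac{1}{1+o}

about 95% at o = 0.05 and 83-91% at o = 0.10 to 0.20. But any overhead that serializes a fraction s of the run is fatal at scale, since Amdahl's law caps the speedup on p nodes:

    S(p) = \frac{1}{s + (1-s)/p} \le \frac{1}{s}

so 20% serialized overhead limits any machine to a 5x speedup and even 5% to 20x, which is why a parallel machine must hold overhead to a few percent and keep it out of the critical path.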
6. Cplant Approach to Scalability
- Build a scalable system out of COTS parts that runs high-performance scientific applications on machines with up to 8192 nodes
- Distributed applications can run on parallel machines (at a reduced cost efficiency)
- Parallel applications cannot run on distributed machines (no matter how much money is involved)
- Core pieces we are working on:
  - Scalable app load, boot, and maintenance
  - Message passing (Portals 3.0)
  - I/O
7. Cplant Goals
- Scalable :-)
- Production system
- Multiple users
- General purpose for scientific applications (not a Beowulf dedicated to a single user)
- 1st step: Tflops look and feel for users
8. Cplant Strategy
- Hybrid approach combining commodity cluster technology with MPP technology
- Build on the design of the Tflops:
  - Large systems should be built from independent building blocks
  - Large systems should be partitioned to provide specialized functionality
  - Large systems should have significant resources dedicated to system maintenance
9. Cplant Approach
- Emulate the ASCI Red environment:
  - Partition model (functional decomposition)
  - Space sharing (reduce turnaround time)
  - Scalable services (allocator, loader, launcher), as sketched below
  - Ephemeral user environment
  - Complete resource dedication
- Use existing software when possible:
  - Red Hat distribution, Linux/Alpha
  - Software developed for ASCI Red
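A minimal sketch of what "scalable" means for the launcher service, assuming a logarithmic fan-out; the tree shape and node numbering are hypothetical, not Cplant's actual protocol. Each node forwards the launch request to a few children, so 8192 nodes start in O(log n) steps instead of 8192 sequential contacts from one service node.

    /* Hypothetical sketch of logarithmic fan-out for a scalable job
     * launcher. This program just traces the tree; a real launcher
     * would send an rsh/portal message at each edge. */
    #include <stdio.h>

    #define FANOUT 4   /* illustrative k-ary fan-out */

    static void forward_launch(int id, int n)
    {
        for (int c = 1; c <= FANOUT; c++) {
            int child = id * FANOUT + c;
            if (child < n) {
                /* carry the executable and environment here */
                printf("node %d -> launch node %d\n", id, child);
                forward_launch(child, n);
            }
        }
    }

    int main(void)
    {
        forward_launch(0, 32);  /* trace the fan-out over 32 nodes */
        return 0;
    }

With FANOUT = 4, an 8192-node launch completes in about seven levels of forwarding, which is what keeps app load from becoming the serialized overhead slide 5 warns about.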
10. Phase II Production (Alaska)
- 400 Digital PWS 500a (Miata)
  - 500 MHz Alpha 21164 CPU
  - 2 MB L3 cache, 192 MB RAM
- 16-port Myrinet switch
  - 32-bit, 33 MHz LANai-4 NIC
- 6 DEC AS1200, 12 RAID (0.75 TB) file server
- 1 DEC AS4100 compile/user file server
- Integrated by Compaq
- 125.2 GFLOPS on MPLINPACK (350 nodes)
  - Would place 53rd on the June 1999 Top 500
11. Phase III Production (Siberia)
- 624 Compaq XP1000 (Monet)
  - 500 MHz Alpha 21264 CPU
  - 4 MB L3 cache
  - 256 MB ECC SDRAM
- 16-port Myrinet switch
  - 64-bit, 33 MHz LANai-7 NIC
- 1.73 TB disk I/O
- Integrated by Compaq and Abba Technologies
- 247.6 GFLOPS on MPLINPACK (572 nodes)
  - Would place 40th on the Nov 1999 Top 500
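Assuming both Alpha generations peak at two floating-point operations per cycle (one add plus one multiply; an assumption, not stated on the slides), a 500 MHz node peaks at 1.0 GFLOPS, so the MPLINPACK efficiencies work out to:

    \eta_{\mathrm{Alaska}} = \frac{125.2}{350 \times 1.0} \approx 36\%, \qquad
    \eta_{\mathrm{Siberia}} = \frac{247.6}{572 \times 1.0} \approx 43\%

Siberia improves per-node efficiency as well as raw size, presumably helped by the 21264 and the wider 64-bit LANai-7 NIC.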
12. CTH Grind Time
[Figure: CTH grind-time plot; data not recoverable from the extraction]
13. Phase IV (Antarctica, Zermatt?)
- 1350 DS10 Slates (NM/CA)
  - 466 MHz EV6, 256 MB RAM
  - Myrinet, 33 MHz 64-bit LANai 7.x
- Will be combined with Siberia for a 1600-node system
- Red, black, green switchable
14. Myrinet Switch
[Figure: the 16-port switch building block and a 4-node group]
- Based on 64-port Clos switch
- 8x2 16-port switches in a 12U rack-mount case
- 64 LAN cables to nodes
- 32 SAN cables (64 links) to mesh
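The port counts above are self-consistent if "8x2" is read as two stages of eight 16-port switches (this reading is an assumption): each lower-stage switch spends 8 ports on nodes and 8 on the upper stage, and each upper-stage switch spends 8 ports down and 8 out to the mesh:

    \underbrace{8 \times 8}_{\text{lower stage}} = 64 \text{ LAN ports to nodes}, \qquad
    \underbrace{8 \times 8}_{\text{upper stage}} = 64 \text{ mesh links} = 32 \text{ SAN cables (2 links/cable)}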
15. One Switch Rack = One Plane
- 4 Clos switches in one rack
- 256 nodes per plane (8 racks)
- Wrap-around in x and y direction
- 128 links in z direction
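A small sketch of the addressing this topology implies, assuming row-major numbering and illustrative dimensions (not Cplant's actual geometry): x and y neighbors wrap around, z neighbors do not.

    /* Hypothetical neighbor addressing for a mesh that wraps around
     * in x and y (a torus in those dimensions) but not in z. The
     * dimensions are illustrative placeholders. */
    #include <stdio.h>

    #define DX 16  /* illustrative plane width */
    #define DY 16  /* illustrative plane height */
    #define DZ 4   /* illustrative number of planes */

    static int node_id(int x, int y, int z) { return (z * DY + y) * DX + x; }

    int main(void)
    {
        int x = 0, y = 5, z = 2;  /* some switch position */

        /* x and y wrap: (x-1) from column 0 lands in column DX-1. */
        printf("-x neighbor: %d\n", node_id((x + DX - 1) % DX, y, z));
        printf("+x neighbor: %d\n", node_id((x + 1) % DX, y, z));
        printf("-y neighbor: %d\n", node_id(x, (y + DY - 1) % DY, z));
        printf("+y neighbor: %d\n", node_id(x, (y + 1) % DY, z));

        /* z does not wrap: planes at the ends have no neighbor beyond. */
        if (z + 1 < DZ) printf("+z neighbor: %d\n", node_id(x, y, z + 1));
        if (z - 1 >= 0) printf("-z neighbor: %d\n", node_id(x, y, z - 1));
        return 0;
    }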
16. Cplant 2000
[Figure: Cplant 2000 layout, with sections connected to the classified, unclassified, and open networks; compute nodes swing between red, black, or green. Wrap-around and z links and nodes not shown]
17. Cplant 2000 cont.
- 1056 + 256 + 256 nodes ≈ 1600 nodes ≈ 1.5 TFlops
- 320 64-port switches + 144 16-port switches from Siberia
- 40 + 16 system support stations
18. MPP Network: Paragon and Tflops
- Network interface is on the memory bus
[Diagram: two processors, memory, and the network interface all attached to the memory bus; the second processor acts as a message passing or computational co-processor]
19. Commodity Myrinet
- Network is far from the memory
[Diagram: processor and memory on the memory bus; a bridge leads to the PCI bus, where the NIC attaches to the network. An OS-bypass path connects the application directly to the NIC]
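A hedged sketch of what OS bypass buys on this commodity path: after one setup system call maps the NIC's control region into user space, each send is a plain store, so the kernel stays out of the per-message path even though the NIC sits across the bridge on the PCI bus. The device name and descriptor layout below are invented for illustration; they are not Myrinet's real interface.

    /* Hypothetical OS-bypass send path: map the NIC's control region
     * once, then trigger DMA with ordinary stores, no per-message
     * system call or kernel copy. Device and layout are imaginary. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct send_desc {              /* imaginary NIC send descriptor */
        uint64_t user_buf_addr;     /* source buffer (pinned by driver) */
        uint32_t length;
        uint32_t target_node;
        volatile uint32_t doorbell; /* store here starts the DMA */
    };

    int main(void)
    {
        static char msg[64] = "hello over the bypass path";

        int fd = open("/dev/nic0", O_RDWR);   /* hypothetical device */
        if (fd < 0) { perror("open"); return 1; }

        /* One syscall at setup time; none afterwards. */
        struct send_desc *d = mmap(NULL, sizeof *d,
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
        if (d == MAP_FAILED) { perror("mmap"); return 1; }

        d->user_buf_addr = (uintptr_t)msg; /* NIC DMAs straight from here */
        d->length = sizeof msg;
        d->target_node = 42;
        d->doorbell = 1;                   /* the store the OS never sees */

        munmap(d, sizeof *d);
        close(fd);
        return 0;
    }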
20. http://www.cs.sandia.gov/cplant