Scalability on Linux Clusters
ASCI/ASAP Scalability Workshop in Santa Fe (transcript)

1
Scalability on Linux Clusters
ASCI/ASAP Scalability Workshop in Santa Fe
  • Rolf Riesen, Ron Brightwell, and the Cplant™ Team
  • Sandia National Laboratories
  • Scalable Computing Systems Department
  • May 11, 2000

2
Machines are not Scalable
  • Size is not a measure of scalability
  • Modem-connected i386 PCs run SETI@home just
    fine, but they won't run MPLINPACK
  • Network speed is not a measure of scalability
  • Bus-connected processors (SMPs) run MPLINPACK
    just fine, but won't grow to 9000 nodes
  • Neither are topology, architecture, or CPU type
  • The best a machine or architecture can hope for
    is to not inhibit scalability

3
System Software is not Scalable
  • The data transport layer is not a measure of
    scalability
  • TCP/IP is just fine for cracking RSA challenges,
    but it won't do for an FFT
  • The OS is not a measure of scalability
  • Windows runs the SETI@home screen saver just
    fine, but it won't run on ASCI Red
  • Neither are point-to-point latency or bandwidth
  • The best system software can hope for is to not
    inhibit scalability

4
What is Scalability?
  • Whether a machine or system software is scalable
    depends on the intended application
  • Designing and building scalable systems requires
    leaving out features that prevent the intended
    application from running
  • Applications range from distributed to tightly
    coupled parallel
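
A common way to make this quantitative (standard definitions, not from
the slides) is parallel speedup and efficiency for a fixed problem:

  S(N) = \frac{T(1)}{T(N)}, \qquad E(N) = \frac{S(N)}{N} = \frac{T(1)}{N\,T(N)}

A system is scalable for a given application if E(N) stays close to 1
as N grows (or, for scaled workloads, if time per unit of work stays
flat as the problem grows with N).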

5
Distributed and Parallel Systems
Tech Report SAND98-2221
Systems range from distributed (heterogeneous) to massively parallel
(homogeneous): Internet, SETI@home, Legion/Globus, Berkeley NOW,
Beowulf, Cplant, ASCI Red Tflops

Distributed systems (heterogeneous)
  • Gather (unused) resources
  • Steal cycles
  • System SW manages resources
  • System SW adds value
  • 10-20% overhead is OK
  • Resources drive applications
  • Time to completion is not critical
  • Time-shared

Massively parallel systems (homogeneous)
  • Bounded set of resources
  • Apps grow to consume all cycles
  • Application manages resources
  • System SW gets in the way
  • 5% overhead is the maximum
  • Apps drive purchase of equipment
  • Real-time constraints
  • Space-shared

6
Cplant Approach to Scalability
  • Build a scalable system out of COTS parts that
    runs high-performance scientific applications on
    machines with up to 8192 nodes
  • Distributed applications can run on parallel
    machines (at reduced cost efficiency)
  • Parallel applications cannot run on distributed
    machines (no matter how much money is involved)
  • Core pieces we are working on
  • Scalable app load, boot, and maintenance
  • Message passing (Portals 3.0)
  • I/O
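
The difference between distributed and tightly coupled parallel codes
above comes down to the communication pattern. The sketch below is a
minimal nearest-neighbor halo exchange in plain MPI C; it is
illustrative only (generic MPI, not Cplant- or Portals-specific code),
but it is the kind of per-timestep, every-node-talks pattern that the
message passing layer has to sustain on thousands of nodes.

    /* halo.c - minimal 1-D nearest-neighbor exchange (illustrative only).
     * Every rank swaps one boundary value with each neighbor every step,
     * so progress is gated by the slowest node and the network: exactly
     * the coupling that SETI@home-style codes never need. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank - 1 + size) % size;    /* periodic neighbors */
        int right = (rank + 1) % size;
        double send_l = rank, send_r = rank, recv_l = 0.0, recv_r = 0.0;

        for (int step = 0; step < 100; step++) {
            MPI_Sendrecv(&send_r, 1, MPI_DOUBLE, right, 0,
                         &recv_l, 1, MPI_DOUBLE, left,  0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&send_l, 1, MPI_DOUBLE, left,  1,
                         &recv_r, 1, MPI_DOUBLE, right, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        if (rank == 0)
            printf("halo exchange done on %d ranks\n", size);
        MPI_Finalize();
        return 0;
    }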

7
Cplant Goals
  • Scalable :-)
  • Production system
  • Multiple users
  • General purpose for scientific applications (not
    a Beowulf dedicated to a single user)
  • 1st step: Tflops look and feel for users

8
Cplant Strategy
  • Hybrid approach combining commodity cluster
    technology with MPP technology
  • Build on the design of the Tflops
  • large systems should be built from independent
    building blocks
  • large systems should be partitioned to provide
    specialized functionality
  • large systems should have significant resources
    dedicated to system maintenance

9
Cplant Approach
  • Emulate the ASCI Red environment
  • Partition model (functional decomposition)
  • Space sharing (reduce turnaround time)
  • Scalable services (allocator, loader, launcher)
  • Ephemeral user environment
  • Complete resource dedication
  • Use Existing Software when possible
  • Red Hat distribution, Linux/Alpha
  • Software developed for ASCI Red
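
One reason the launcher is on the list of scalable services: starting a
job from a single service node takes O(N) sequential loads, while
fanning the request out over a spawn tree takes O(log N) hops. The
sketch below is a generic illustration of that idea (it only computes a
node's children in a k-ary spawn tree); it is not the Cplant launcher
itself, and the fan-out of 4 is an arbitrary assumption.

    /* spawn_tree.c - children of a node in a k-ary spawn tree.
     * A launch request received by `rank' is forwarded to these children,
     * so a job on nprocs nodes starts in about log_k(nprocs) hops instead
     * of nprocs sequential loads from one service node. */
    #include <stdio.h>

    #define FANOUT 4                 /* assumed fan-out; any k >= 2 works */

    static int children(int rank, int nprocs, int out[FANOUT])
    {
        int n = 0;
        for (int i = 1; i <= FANOUT; i++) {
            int c = rank * FANOUT + i;
            if (c < nprocs)
                out[n++] = c;
        }
        return n;
    }

    int main(void)
    {
        int kids[FANOUT];
        for (int rank = 0; rank < 3; rank++) {  /* first few nodes of a 1024-node job */
            int n = children(rank, 1024, kids);
            printf("rank %d forwards the launch to:", rank);
            for (int i = 0; i < n; i++)
                printf(" %d", kids[i]);
            printf("\n");
        }
        return 0;
    }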

10
Phase II Production (Alaska)
  • 400 Digital PWS 500a (Miata)
  • 500 MHz Alpha 21164 CPU
  • 2 MB L3 Cache, 192 MB RAM
  • 16-port Myrinet switch
  • 32-bit, 33 MHz LANai-4 NIC
  • 6 DEC AS1200, 12 RAIDs (0.75 TByte) file servers
  • 1 DEC AS4100 compile/user file server
  • Integrated by Compaq
  • 125.2 GFLOPS on MPLINPACK (350 nodes)
  • would place 53rd on June 1999 Top 500

11
Phase III Production (Siberia)
  • 624 Compaq XP1000 (Monet)
  • 500 MHz Alpha 21264 CPU
  • 4 MB L3 Cache
  • 256 MB ECC SDRAM
  • 16-port Myrinet switch
  • 64-bit, 33 MHz LANai-7 NIC
  • 1.73 TB disk I/O
  • Integrated by Compaq and Abba Technologies
  • 247.6 GFLOPS on MPLINPACK (572 nodes)
  • would place 40th on Nov 1999 Top 500
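
As a rough cross-check on both MPLINPACK numbers (assuming the usual
peak of 2 floating-point operations per clock on the 21164 and 21264,
i.e. 1 GFLOPS per 500 MHz node; the slides do not state peak figures):

  \text{Alaska: } \frac{125.2}{350 \times 1.0} \approx 0.36, \qquad
  \text{Siberia: } \frac{247.6}{572 \times 1.0} \approx 0.43

i.e. roughly 36% and 43% of peak.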

12
CTH Grind Time
13
Phase IV (Antarctica, Zermatt?)
  • 1350 DS10 Slates (NMCA)
  • 466 MHz EV6, 256 MB RAM
  • Myrinet 33 MHz, 64-bit LANai 7.x
  • Will be combined with Siberia for a 1600-node
    system
  • Red, black, green switchable

14
Myrinet Switch
  • Based on a 64-port Clos switch
  • 8x2 16-port switches in a 12U rack-mount case
  • 64 LAN cables to nodes
  • 32 SAN cables (64 links) to mesh
(Diagram: the 16-port switch building block and a group of 4 nodes)
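
One consistent reading of these port counts (an interpretation, not
stated on the slide): 8 of the 16-port switches act as line switches
with 8 node ports and 8 uplinks each, and the other 8 act as a spine.

  8 \times 8 = 64 \text{ node (LAN) links}, \qquad
  8 \times 16 - 8 \times 8 = 64 \text{ spare spine ports}
    = 64 \text{ mesh (SAN) links} = 32 \text{ dual-link cables}
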
15
One Switch Rack = One Plane
  • 4 Clos switches in one rack
  • 256 nodes per plane (8 racks)
  • Wrap-around in x and y direction
  • 128 + 128 links in z direction

16
Cplant 2000
(Diagram: partitions connected to the classified, unclassified, and
open networks; compute nodes swing between red, black, or green;
wrap-around and z links and nodes not shown)
17
Cplant 2000 cont.
  • 1056 + 256 + 256 nodes → ~1600 nodes → ~1.5 TFlops
  • 320 64-port switches + 144 16-port switches
    from Siberia
  • 40 + 16 system support stations
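
The peak figure is consistent with these counts if one assumes 2
flops/clock per Alpha, 466 MHz for the 1056 new nodes and 500 MHz for
the remaining 2 × 256 Siberia nodes (an assumed breakdown; the slide
does not give one):

  1056 \times 0.932 + 512 \times 1.0 \approx 1496 \text{ GFLOPS} \approx 1.5 \text{ TFLOPS peak}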

18
MPP Network: Paragon and Tflops
Network interface is on the memory bus
(Diagram: two processors, memory, and the network interface all on the
memory bus; the second processor serves as a message passing or
computational co-processor)
19
Commodity Myrinet
Network is far from the memory
(Diagram: processor and memory on the memory bus; a bridge leads to
the PCI bus, where the NIC sits; an OS-bypass path connects the
application directly to the NIC and the network)
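
"OS bypass" means the application (or its message passing library)
drives the NIC directly through memory-mapped registers and pinned
buffers, with no system call or interrupt on the send/receive fast
path. Below is a minimal sketch of that shape with entirely
hypothetical device and register names (the real Myrinet/Portals
interfaces differ); it is meant only to contrast with a TCP/IP path
that traps into the kernel for every message.

    /* osbypass_sketch.c - the shape of an OS-bypass send path.
     * Hypothetical NIC, device file, and register layout throughout. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct nic_regs {                /* hypothetical register layout    */
        volatile uint64_t doorbell;  /* write a descriptor address here */
        volatile uint64_t status;
    };

    int main(void)
    {
        /* One privileged setup step maps the NIC into user space...    */
        int fd = open("/dev/hypothetical_nic", O_RDWR);   /* assumption */
        if (fd < 0) { perror("open"); return 1; }

        struct nic_regs *nic = mmap(NULL, sizeof *nic,
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);
        if (nic == MAP_FAILED) { perror("mmap"); return 1; }

        /* ...after which each send is just user-level stores: build a
         * descriptor in pinned memory and ring the doorbell. No kernel
         * entry, no extra copy; the NIC DMAs the buffer onto the wire. */
        static uint64_t descriptor[4];    /* dest, length, buffer, tag */
        nic->doorbell = (uint64_t)(uintptr_t)descriptor;

        munmap(nic, sizeof *nic);
        close(fd);
        return 0;
    }
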
20
http://www.cs.sandia.gov/cplant