Transcript and Presenter's Notes

Title: SOS8


1
Big and Not so Big Iron at SNL
2
SNL CS R&D Accomplishment: Pathfinder for MPP
Supercomputing
  • Sandia successfully led the DOE/DP revolution
    into MPP supercomputing through CS R&D
  • nCUBE-10
  • nCUBE-2
  • iPSC/860
  • Intel Paragon
  • ASCI Red
  • Cplant
  • and gave DOE a strong, scalable parallel
    platforms effort

Computing at SNL is an applications success (i.e.,
uniquely high scalability & reliability among
FFRDCs) because CS R&D paved the way
Cplant
Note: There was considerable skepticism in the
community that MPP computing would be a success
3
Our Approach
  • Large systems with a few processors per node
  • Message passing paradigm (a minimal MPI sketch
    follows this list)
  • Balanced architecture
  • Efficient systems software
  • Critical advances in parallel algorithms
  • Real engineering applications
  • Vertically integrated technology base
  • Emphasis on scalability & reliability in all
    aspects
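
The message-passing bullet above is the programming model these
systems expose through MPI (and, on Cplant and Red Storm, MPI layered
over Portals). Below is a minimal, generic MPI sketch of a
nearest-neighbor exchange; it illustrates the paradigm and is not code
from any Sandia application.

    /* Minimal MPI message-passing sketch (generic illustration, not SNL
     * code): each rank sends a value to its right neighbor and receives
     * from its left neighbor -- the basic pattern behind halo exchanges. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double send = (double)rank, recv = -1.0;
        int right = (rank + 1) % size;         /* neighbor to send to      */
        int left  = (rank + size - 1) % size;  /* neighbor to receive from */

        /* Combined send/receive avoids deadlock regardless of ordering. */
        MPI_Sendrecv(&send, 1, MPI_DOUBLE, right, 0,
                     &recv, 1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d received %.0f from rank %d\n", rank, recv, left);
        MPI_Finalize();
        return 0;
    }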

4
A Scalable Computing Architecture
5
ASCI Red
  • 4,576 compute nodes
  • 9,472 Pentium II processors
  • 800 MB/sec bi-directional interconnect
  • 3.21 Peak TFlops
  • 2.34 TFlops on Linpack
  • 74% of peak (see the arithmetic check below)
  • 9,632 processors in total
  • TOS on Service Nodes
  • Cougar LWK on Compute Nodes
  • 1.0 GB/sec Parallel File System
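
A rough consistency check of the figures above (a sketch only; the
333 MHz clock and one floating-point result per cycle are assumptions,
since the slide does not state them):

    /* Back-of-the-envelope check of the ASCI Red numbers above.
     * Assumed (not stated on the slide): 333 MHz Pentium II Xeon,
     * one floating-point result per clock. */
    #include <stdio.h>

    int main(void)
    {
        int    procs    = 9632;               /* full-system processor count */
        double clock_hz = 333e6;              /* assumed clock rate          */
        double peak     = procs * clock_hz;   /* ~3.21e12 flop/s             */
        double linpack  = 2.34e12;            /* quoted Linpack rate         */

        printf("peak  ~ %.2f TFlops\n", peak / 1e12);              /* ~3.21 */
        /* ~73%, vs. the quoted 74% (which presumably uses unrounded rates) */
        printf("ratio ~ %.0f%% of peak\n", 100.0 * linpack / peak);
        return 0;
    }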

6
Computational Plant
  • Antarctica - 2,376 Nodes
  • Antarctica has 4 heads with a switchable
    center section
  • Unclassified Restricted Network
  • Unclassified Open Network
  • Classified Network
  • Compaq (HP) DS10L Slates
  • 466 MHz EV6, 1 GB RAM
  • 600 MHz EV67, 1 GB RAM
  • Re-deployed Siberia XP1000 Nodes
  • 500 MHz EV6, 256 MB RAM
  • Myrinet
  • 3D Mesh Topology
  • 33 MHz, 64-bit PCI
  • A mix of 1,280 and 2,000 Mbit/sec technology
  • LANai 7.x and 9.x
  • Runtime Software
  • Yod - Application loader
  • Pct - Compute node process control
  • Bebopd - Allocation
  • OpenPBS - Batch scheduling
  • Portals Message Passing API
  • Red Hat Linux 7.2 w/2.4.x Kernel
  • Compaq (HP) Fortran, C, C++
  • MPICH over Portals

7
Institutional Computing Clusters
  • Two 256-node clusters (one classified, one
    unclassified) in NM
  • 236 compute nodes
  • Dual 3.06 GHz Xeon processors, 2 GB memory
  • Myricom Myrinet PCI NIC (XP, Rev D, 2 MB)
  • 2 Admin nodes
  • 4 Login nodes
  • 2 MetaData Server (MDS) nodes
  • 12 Object Store Target (OST) nodes
  • 256 port Myrinet Switch
  • A 128-node (unclassified) and a 64-node
    (classified) cluster in CA
  • Compute nodes
  • RedHat Linux 7.3
  • Application Directory
  • MKL math library
  • TotalView client
  • VampirTrace client
  • MPICH-GM
  • OpenPBS client
  • PVFS client
  • Myrinet GM
  • Login nodes
  • RedHat Linux 7.3
  • Kerberos
  • Intel Compilers
  • C, C++
  • Fortran
  • Open Source Compilers
  • Gcc
  • Java
  • TotalView
  • VampirTrace
  • Myrinet GM
  • Administrative Nodes
  • Red Hat Linux 7.3
  • OpenPBS
  • Myrinet GM w/Mapper
  • SystemImager
  • Ganglia
  • Mon
  • CAP
  • Tripwire

8
Usage
9
Red Squall Development Cluster
  • Hewlett Packard Collaboration
  • Integration, Testing, System SW support
  • Lustre and Quadrics Expertise
  • RackSaver BladeRack Nodes
  • High Density Compute Server Architecture
  • 66 Nodes (132 processors) per Rack
  • 2.0 GHz AMD Opteron
  • Same as Red Storm but w/commercial Tyan
    motherboards
  • 2 Gbytes of main memory per node (same as RS)
  • Quadrics QsNetII (Elan4) Interconnect
  • Best-in-class performance among commercial
    cluster interconnects
  • I/O subsystem uses DDN S2A8500 couplets with
    Fibre Channel disk drives (same as Red Storm)
  • Best in Class Performance
  • Located in the new JCEL facility

10
(No Transcript)
11
Red Storm Goals
  • Balanced System Performance - CPU, Memory,
    Interconnect, and I/O.
  • Usability - Functionality of hardware and
    software meets needs of users for Massively
    Parallel Computing.
  • Scalability - System hardware and software scale
    from a single cabinet system to a 20,000
    processor system.
  • Reliability - Machine stays up long enough
    between interrupts to make real progress on
    completing an application run (at least 50 hours
    MTBI); requires full system RAS capability.
  • Upgradability - System can be upgraded with a
    processor swap and additional cabinets to 100T or
    greater.
  • Red/Black Switching - Capability to switch major
    portions of the machine between classified and
    unclassified computing environments.
  • Space, Power, Cooling - High density, low power
    system.
  • Price/Performance - Excellent performance per
    dollar; use high volume commodity parts where
    feasible.

12
Red Storm Architecture
  • True MPP, designed to be a single system.
  • Distributed memory MIMD parallel supercomputer.
  • Fully connected 3-D mesh interconnect. Each
    compute node and service and I/O node processor
    has a high bandwidth, bi-directional connection
    to the primary communication network.
  • 108 compute node cabinets and 10,368 compute node
    processors (AMD Opteron @ 2.0 GHz).
  • 10 TB of DDR memory @ 333 MHz.
  • Red/Black switching - 1/4, 1/2, 1/4.
  • 8 Service and I/O cabinets on each end (256
    processors for each color).
  • 240 TB of disk storage (120 TB per color).
  • Functional hardware partitioning - service and
    I/O nodes, compute nodes, and RAS nodes.
  • Partitioned Operating System (OS) - LINUX on
    service and I/O nodes, LWK (Catamount) on compute
    nodes, stripped down LINUX on RAS nodes.
  • Separate RAS and system management network
    (Ethernet).
  • Router table based routing in the interconnect.
  • Less than 2 MW total power and cooling.
  • Less than 3,000 square feet of floor space.

13
Red Storm Layout
  • Less than 2 MW total power and cooling.
  • Less than 3,000 square feet of floor space.
  • Separate RAS and system management network
    (Ethernet).
  • 3D Mesh 27 x 16 x 24 (x, y, z)
  • Red/Black split 2,688 / 4,992 / 2,688 nodes
  • Service & I/O section 2 x 8 x 16 (see the mesh
    coordinate sketch below)
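
A minimal sketch of how a node index can be mapped onto the
27 x 16 x 24 mesh above; the x-fastest ordering is an assumption chosen
for illustration, not Red Storm's actual router-table layout.

    /* Map a node index to (x, y, z) coordinates in a 27 x 16 x 24 mesh.
     * Count check: 27 * 16 * 24 = 10,368 positions, which matches the
     * 10,368 compute node processors and the 2,688 + 4,992 + 2,688
     * Red/Black split. The ordering here is assumed, not the machine's
     * real routing tables. */
    #include <stdio.h>

    #define NX 27
    #define NY 16
    #define NZ 24

    static void node_to_xyz(int node, int *x, int *y, int *z)
    {
        *x = node % NX;
        *y = (node / NX) % NY;
        *z = node / (NX * NY);
    }

    int main(void)
    {
        int x, y, z;
        printf("mesh positions: %d\n", NX * NY * NZ);    /* 10368 */
        node_to_xyz(5000, &x, &y, &z);
        printf("node 5000 -> (%d, %d, %d)\n", x, y, z);
        return 0;
    }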

14
Red Storm Cabinet Layout
  • Compute Node Cabinet
  • 3 Card Cages per Cabinet
  • 8 Boards per Card Cage
  • 4 Processors per Board
  • 4 NIC/Router Chips per Board
  • N+1 Power Supplies
  • Passive Backplane
  • Service and I/O Node Cabinet
  • 2 Card Cages per Cabinet
  • 8 Boards per Card Cage
  • 2 Processors per Board
  • 2 NIC/Router Chips per Board
  • PCI-X for each Processor
  • N+1 Power Supplies
  • Passive Backplane
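
A quick consistency check of the per-cabinet counts above against the
system totals on the Red Storm Architecture slide (arithmetic only,
using the numbers as given):

    /* Cabinet arithmetic for the layout above:
     *   compute cabinet: 3 card cages x 8 boards x 4 processors = 96
     *   service/I/O cabinet: 2 card cages x 8 boards x 2 processors = 32 */
    #include <stdio.h>

    int main(void)
    {
        int compute_per_cab = 3 * 8 * 4;   /* 96 processors */
        int service_per_cab = 2 * 8 * 2;   /* 32 processors */

        /* 108 compute cabinets -> 10,368 compute processors */
        printf("compute processors: %d\n", 108 * compute_per_cab);

        /* 8 service/I/O cabinets per end -> 256 processors per color */
        printf("service/I/O processors per color: %d\n", 8 * service_per_cab);
        return 0;
    }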

15
Red Storm Software
  • Operating Systems
  • LINUX on service and I/O nodes
  • LWK (Catamount) on compute nodes
  • LINUX on RAS nodes
  • File Systems
  • Parallel File System - Lustre (PVFS)
  • Unix File System - Lustre (NFS)
  • Run-Time System
  • Logarithmic loader (tree fan-out; see the sketch
    after this list)
  • Node allocator
  • Batch system - PBS
  • Libraries - MPI, I/O, Math
  • Programming Model
  • Message Passing
  • Support for Heterogeneous Applications
  • Tools
  • ANSI Standard Compilers - Fortran, C, C++
  • Debugger - TotalView
  • Performance Monitor
  • System Management and Administration
  • Accounting
  • RAS GUI Interface
  • Single System View
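
The logarithmic loader listed above fans the application image out
over a spanning tree, so load time grows roughly as log2(N) in the
number of nodes rather than linearly. The sketch below illustrates the
idea with a binomial-tree broadcast written against MPI; it is an
illustration of the technique, not Red Storm's actual loader.

    /* Binomial-tree fan-out: in round k, every rank that already holds
     * the data forwards it to rank + 2^k, so N ranks are covered in
     * about log2(N) rounds. Illustrative only. */
    #include <mpi.h>
    #include <string.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        char image[64] = "";            /* stand-in for the executable image */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
            strcpy(image, "application image");

        int have = (rank == 0);         /* does this rank hold the image yet? */
        for (int step = 1; step < size; step *= 2) {
            if (have && rank + step < size) {
                MPI_Send(image, (int)sizeof image, MPI_CHAR, rank + step, 0,
                         MPI_COMM_WORLD);
            } else if (!have && rank < 2 * step) {
                MPI_Recv(image, (int)sizeof image, MPI_CHAR, rank - step, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                have = 1;
            }
        }
        printf("rank %d loaded: %s\n", rank, image);
        MPI_Finalize();
        return 0;
    }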

16
Red Storm Performance
  • Based on application code testing on production
    AMD Opteron processors, we now expect Red Storm to
    deliver around a 10X performance improvement over
    ASCI Red on Sandia's suite of application codes.
  • Expected MP-Linpack performance - 30 TF.
  • Processors
  • 2.0 GHz AMD Opteron (Sledgehammer)
  • Integrated dual DDR memory controllers @ 333 MHz
  • Page miss latency to local processor memory is
    80 nanoseconds.
  • Peak bandwidth of 5.3 GB/s for each processor.
  • Three integrated HyperTransport interfaces @ 3.2
    GB/s each direction (see the arithmetic sketch
    below)
  • Interconnect performance
  • Latency <2 µs (neighbor), <5 µs (full machine)
  • Peak Link bandwidth 3.84 GB/s each direction
  • Bi-section bandwidth 2.95 TB/s Y-Z, 4.98 TB/s
    X-Z, 6.64 TB/s X-Y
  • I/O system performance
  • Sustained file system bandwidth of 50 GB/s for
    each color.
  • Sustained external network bandwidth of 25 GB/s
    for each color.
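
Several of the figures above can be cross-checked with simple
arithmetic. In the sketch below, the two flops per clock per Opteron
and the two 8-byte DDR channels are assumptions not stated on the
slide.

    /* Back-of-the-envelope checks of the Red Storm performance figures.
     * Assumed (not on the slide): 2 double-precision flops per clock per
     * Opteron, and two 8-byte-wide DDR-333 channels per processor. */
    #include <stdio.h>

    int main(void)
    {
        /* Machine peak: 10,368 compute processors x 2.0 GHz x 2 flops/clock */
        double peak_tf = 10368 * 2.0e9 * 2 / 1e12;         /* ~41.5 TF */
        printf("machine peak ~ %.1f TF (30 TF Linpack ~ %.0f%% of peak)\n",
               peak_tf, 100.0 * 30.0 / peak_tf);

        /* Memory bandwidth: dual DDR-333 controllers, 8 bytes per channel */
        double mem_gbs = 2 * 8 * 333e6 / 1e9;              /* ~5.3 GB/s */
        printf("memory bandwidth ~ %.1f GB/s per processor\n", mem_gbs);
        return 0;
    }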

17
HPC R&D Efforts at SNL
  • Advanced Architectures
  • Next Generation Processor & Interconnect
    Technologies
  • Simulation and Modeling of Algorithm Performance
  • Message Passing
  • Portals
  • Application characterization of message passing
    patterns
  • Light Weight Kernels
  • Project to design a next-generation lightweight
    kernel (LWK) for compute nodes of a distributed
    memory massively parallel system
  • Assess the performance, scalability, and
    reliability of a lightweight kernel versus a
    traditional monolithic kernel
  • Investigate efficient methods of supporting
    dynamic operating system services
  • Light Weight File System
  • only critical I/O functionality (storage,
    metadata mgmt, security)
  • special functionality implemented in I/O
    libraries (above LWFS)
  • Light Weight OS
  • Linux configuration to eliminate the need for a
    remote /root
  • Trimming the kernel to eliminate unwanted and
    unnecessary daemons
  • Cluster Management Tools
  • Diskless Cluster Strategies and Techniques
  • Operating Systems Distribution and Initialization

18
More Information
  • Computation, Computers, Information and
    Mathematics Center
  • http://www.cs.sandia.gov