Transcript and Presenter's Notes

Title: Brief presentation of Earth Simulation Center


1
Brief presentation of Earth Simulation Center
  • Jang, Jae-Wan

2
Hardware configuration
  • Highly parallel vector supercomputer of the
    distributed-memory type
  • 640 Processor nodes (PNs)
  • Each PN contains
  • 8 vector-type arithmetic processors (APs)
  • 16 GB main memory
  • Remote control and I/O parts

3
Arithmetic processor
4
Processor node
5
Processor node
6
Interconnection network
7
Interconnection Network
8
Earth Simulator Research and Development Center
(Building footprint approximately 65 m × 50 m)
9
Software
  • OS
  • NEC's UNIX-based OS SUPER-UX
  • Programming model
  • Supported languages
  • Fortran 90, C, C++ (modified for ES)

           Hybrid                   Flat
Inter-PN   HPF / MPI                HPF / MPI
Intra-PN   Microtasking / OpenMP    HPF / MPI
AP         Automatic vectorization  Automatic vectorization
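The table above describes a three-level programming model: message passing
(HPF/MPI) between PNs, microtasking/OpenMP within a PN, and automatic
vectorization on each AP. Below is a minimal sketch in C of that hybrid
structure; it is illustrative only (not ES code) and omits any ES-specific
compiler directives.

    /* Minimal hybrid-model sketch (illustrative only, not ES source code):
       MPI between processor nodes, OpenMP threads within a node, and a
       simple loop the compiler can auto-vectorize on each AP. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank;
        static double a[N], b[N];
        double local = 0.0, global = 0.0;

        MPI_Init(&argc, &argv);                 /* inter-PN level: MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* intra-PN level: OpenMP threads across the APs of one node */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;                     /* AP level: vectorizable loop body */
            b[i] = i * 2.0;
            local += a[i] * b[i];
        }

        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("dot product = %e\n", global);

        MPI_Finalize();
        return 0;
    }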
10
Earth Simulator Center
First results from the Earth Simulator
Resolution ≈ 300 km
11
Earth Simulator Center
First results from the Earth Simulator
Resolution ≈ 120 km
12
Earth Simulator Center
First results from the Earth Simulator
Resolution ≈ 20 km
13
Earth Simulator Center
First results from the Earth Simulator
Resolution ≈ 10 km
14
First results from the Earth Simulator
  • Resolution: 0.1° × 0.1° (≈ 10 km)
  • Initial condition: Levitus data (1982)
  • Computer resources: number of nodes 175, elapsed
    time ≈ 8,100 hours
15
First results from the Earth Simulator
16
Terascale Cluster: System X
  • Virginia Tech, Apple, Mellanox, Cisco, and
    Liebert
  • 2003. 3. 16
  • Daewoo Lee

17
Terascale Cluster System X
  • A groundbreaking supercomputer cluster built with
    industrial assistance
  • Apple, Mellanox, Cisco, and Liebert
  • $5.2 million for hardware
  • 10,280 / 17,600 GFlops of performance with 1,100 nodes
    (ranked 3rd on the TOP500 supercomputer list)

18
Goals
Dual usage mode (90% of computational cycles
devoted to production use)
19
Hardware Architecture
Node: Apple G5 platform, dual IBM PowerPC 970 (64-bit CPUs)
Primary communication: InfiniBand by Mellanox (20 Gbps full duplex, fat-tree topology)
Secondary communication: Gigabit Ethernet by Cisco
Cooling: system by Liebert
20
Software
  • Mac OS X (FreeBSD-based)
  • MPI-2 (MPICH-2); a minimal MPI-2 sketch follows this
    list
  • Supports C/C++/Fortran compilation
  • Déjà vu transparent fault-tolerance system
  • Maintains system stability by transparently moving a
    failed application to another node, keeping the
    application intact

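As a hedged illustration of the MPI-2 support listed above, the sketch below
uses MPI-2 one-sided communication (MPI_Put). It is illustrative only, not
System X code, and assumes any MPI-2 implementation such as MPICH-2.

    /* Sketch: MPI-2 one-sided communication (MPI_Put into a window on rank 0).
       Illustrative only. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* rank 0 exposes an array of 'size' ints; other ranks expose nothing */
        int *table = NULL;
        MPI_Aint bytes = (rank == 0) ? size * sizeof(int) : 0;
        if (rank == 0)
            table = calloc(size, sizeof(int));

        MPI_Win_create(table, bytes, sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        /* every rank writes its rank number into slot [rank] on rank 0 */
        MPI_Put(&rank, 1, MPI_INT, 0, rank, 1, MPI_INT, win);
        MPI_Win_fence(0, win);

        if (rank == 0) {
            for (int i = 0; i < size; i++)
                printf("slot %d = %d\n", i, table[i]);
            free(table);
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }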
21
Reference
  • Terascale Cluster web site
  • http://computing.vt.edu/research_computing/terascale

22
4th fastest supercomputer: Tungsten
PAK, EUNJI
23
4th NCSA Tungsten
  • Top500.org
  • National Center for Supercomputing Applications
    (NCSA)
  • University of Illinois at Urbana-Champaign

24
Tungsten Architecture 1/3
  • Tungsten
  • Xeon 3.0 GHz Dell cluster
  • 2,560 processors
  • 3 GB memory per node
  • Peak performance: 15.36 TF
  • Top 500 list debut: 4th (9.819 TF, November 2003)
  • Currently the 4th-fastest supercomputer in the world

25
Tungsten Architecture 2/3
  • Components

26
Tungsten Architecture 3/3
  • 1,450 nodes
  • Dell PowerEdge 1750 servers
  • Intel Xeon 3.06 GHz; peak performance 6.12 GFLOPS
  • 1,280 compute nodes, 104 I/O nodes
  • Parallel I/O
  • 11.1 gigabytes per second (GB/s) of I/O
    throughput
  • Complements the cluster's 9.8 TFLOPS of
    computational capability
  • 104-node I/O sub-cluster with more than 120 TB
  • Node-local: 73 GB; shared: 122 TB

27
Applications on Tungsten 1/3
  • PAPI and PerfSuite (a brief PAPI sketch follows this
    list)
  • PAPI: portable interface to hardware performance
    counters
  • PerfSuite: a set of tools for performance analysis
    on Linux platforms

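A hedged sketch of what using PAPI looks like, via its (historical) high-level
counter API; this is illustrative only and not taken from the Tungsten or
PerfSuite documentation.

    /* Sketch of reading hardware counters with PAPI's high-level API
       (illustrative only; not taken from Tungsten/PerfSuite material). */
    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        int events[2] = { PAPI_TOT_CYC, PAPI_FP_OPS };  /* cycles, FP operations */
        long long counters[2];
        double s = 0.0;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI init failed\n");
            return 1;
        }
        if (PAPI_start_counters(events, 2) != PAPI_OK) {
            fprintf(stderr, "PAPI_start_counters failed\n");
            return 1;
        }

        for (int i = 1; i < 10000000; i++)              /* work to be measured */
            s += 1.0 / i;

        PAPI_stop_counters(counters, 2);
        printf("cycles = %lld, fp ops = %lld (sum = %f)\n",
               counters[0], counters[1], s);
        return 0;
    }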
28
Applications on Tungsten 2/3
  • PAPI and PerfSuite

29
Applications on Tungsten 3/3
  • CHARMM (Harvard version)
  • Chemistry at Harvard Macromolecular Mechanics
  • General-purpose molecular mechanics, molecular
    dynamics, and vibrational analysis package
  • Amber 7.0
  • A set of molecular mechanical force fields for
    the simulation of biomolecules
  • Package of molecular simulation programs

30
MPP2 Supercomputer: the world's largest Itanium2
cluster
  • Molecular Science Computing Facility
  • Pacific Northwest National Laboratory
  • 2004. 3. 16
  • Presentation: Kim SangWon

31
Contents
  • MPP2 Supercomputer Overview
  • Configuration
  • HP rx2600 (Longs Peak) Node
  • QsNet ELAN Interconnect Network
  • System/Application Software
  • File System
  • Future Plan

32
MPP2 Overview
  • MPP2
  • The High Performance Computing System-2
  • At the Molecular Science Computing Facility in
    the William R. Wiley Environmental Molecular
    Sciences Laboratory at Pacific Northwest National
    Laboratory
  • The fifth-fastest supercomputer in the world on
    the November 2003 TOP500 list

33
MPP2 Overview
  • System name: MPP2
  • Linux supercomputer cluster
  • 11.8 (8.633) teraflops
  • 6.8 terabytes of memory
  • Purpose: production
  • Platform: HP Integrity rx2600,
    dual Itanium2 1.5 GHz
  • Nodes: 980 (processors: 1,960)
  • ¾ megawatt of power
  • 220 tons of air conditioning
  • 4,000 sq. ft.
  • Cost: $24.5 million (estimated)

(Facility diagram: UPS, generator)
34
Configuration (Phase 2b)
(System diagram) Operational September 2003: 1,900 next-generation
Itanium processors; 11.4 TF peak, 6.8 TB memory; 1,856 Madison batch
CPUs on 928 compute nodes; Elan3 interconnect operational, Elan4 not
yet operational; Lustre SAN (53 TB); 2 system management nodes;
4 login nodes with 4 Gb Ethernet
35
HP rx2600 Longs Peak Node Architecture
  • Each node has
  • 2 Intel Itanium2 processors (1.5 GHz)
  • 6.4 GB/s system bus
  • 8.5 GB/s memory bus
  • 12 GB of RAM
  • 1 1000T connection
  • 1 100T connection
  • 1 serial connection
  • 2 Elan3 connections

(Node block diagram: two Elan3 adapters on PCI-X (1 GB/s), dual SCSI-160 channels)
36
QsNet ELAN Interconnect Network
  • High bandwidth, ultra-low latency, and scalability
  • 900 MB/s user-space to user-space bandwidth
  • 1,024 nodes in the standard QsNet configuration,
    rising to 4,096 in QsNetII systems
  • Optimized libraries for common distributed-memory
    programming models exploit the full capabilities
    of the base hardware

37
Software on MPP2 (1/2)
  • System software
  • Operating system: Red Hat Linux 7.2 Advanced
    Server
  • NWLinux, tailored to IA-64 clusters (2.4.18
    kernel with various patches)
  • Cluster management: Resource Management
    System (RMS) by Quadrics
  • A single point of interface to the system for
    resource management
  • Monitoring, fault diagnosis, data collection,
    allocating CPUs, parallel job execution
  • Job management software
  • LSF (Load Sharing Facility) batch scheduler
  • QBank: controls and manages CPU resources
    allocated to projects or users
  • Compiler software
  • C/C++ (ecc), F77/F90/F95 (efc), G++
  • Code development
  • Etnus TotalView
  • A parallel and multithreaded application debugger
  • Vampir
  • A GUI-driven frontend used to visualize the
    profile data from a program run
  • gdb

38
Software on MPP2 (2/2)
  • Application software
  • Quantum chemistry codes
  • GAMESS (The General Atomic and Molecular
    Electronic Structure System)
  • Performs a variety of ab initio molecular
    orbital (MO) calculations
  • MOLPRO
  • An advanced ab initio quantum chemistry software
    package
  • NWChem
  • Computational chemistry software developed by
    EMSL
  • ADF (Amsterdam Density Functional) 2000
  • Software for first-principles electronic structure
    calculations via density functional theory (DFT)
  • General molecular modeling software
  • Amber
  • Unstructured mesh modeling codes
  • NWGrid (grid generator)
  • Hybrid mesh generation, mesh optimization, and
    dynamic mesh maintenance
  • NWPhys (unstructured mesh solvers)
  • A 3D, full-physics, first-principles,
    time-domain, free-Lagrange code for parallel
    processing using hybrid grids

39
File System on MPP2
  • Four file systems available on the cluster
  • Local filesystem (/scratch)
  • On each of the compute nodes
  • Non-persistent storage area provided to a
    parallel job running on that node
  • NFS filesystem (/home)
  • Where user home directories and files are located
  • Uses RAID-5 for reliability
  • Lustre global filesystem (/dtemp)
  • Designed for the world's largest high-performance
    compute clusters
  • Aggregate write rate of 3.2 GB/s
  • Holds restart files and files needed for post-analysis
  • Long-term global scratch space
  • AFS filesystem (/msrc)
  • On the front-end (non-compute) nodes

40
Future Plan
  • MPP2 will be upgraded with the faster Quadrics
    QsNetII interconnect in early 2004

(Upgraded system diagram: 928 compute nodes, 1,856 Madison batch CPUs,
Elan4 interconnect, Lustre SAN (53 TB), 4 login nodes with 4 Gb
Ethernet, 2 system management nodes)
41
Bluesky Supercomputer
  • Top 500 Supercomputers
  • CS610 Parallel Processing
  • Donghyouk Lim
  • (Dept of Computer Science, KAIST)

42
Contents
  • Introduction
  • National Center for Atmospheric Research
  • Scientific Computing Division
  • Hardware
  • Software
  • Recommendations for usage
  • Related Link

43
Introduction
  • Bluesky
  • 13th-fastest supercomputer in the world
  • Clustered symmetric multiprocessing (SMP) system
  • 1,600 IBM Power4 processors
  • Peak of 8.7 TFLOPS

44
National Center for Atmospheric Research
  • Established in 1960
  • Located in Boulder, Colorado
  • Research area
  • Earth system
  • Climate change
  • Changes in atmospheric composition

45
Scientific Computing Division
  • Research on high-performance supercomputing
  • Computing resources
  • Bluesky (IBM Cluster 1600 running AIX): 13th
    place
  • blackforest (IBM SP RS/6000 running AIX): 80th
    place
  • Chinook complex: Chinook (SGI Origin3800 running
    IRIX) and Chinook (SGI Origin2100 running IRIX)

46
Hardware
  • Processor
  • 1,600 Power4 processors at 1.3 GHz
  • Each can perform up to 4 floating-point operations
    per cycle
  • Peak of 8.7 TFLOPS
  • Memory
  • 2 GB of memory per processor
  • Memory on a node is shared between the processors on
    that node
  • Memory caches
  • L1 cache: 64 KB I-cache, 32 KB D-cache, direct
    mapped
  • L2 cache: shared by a pair of processors, 1.44 MB,
    8-way set associative
  • L3 cache: 32 MB, 512-byte cache line, 8-way set
    associative

47
Hardware
  • Computing nodes
  • 8-way processor nodes: 76
  • 32-way processor nodes: 25
  • 32-processor nodes for running interactive jobs: 4
  • Separate nodes for user logins
  • System support nodes
  • 12 nodes dedicated to the General Parallel File
    System (GPFS)
  • Four nodes dedicated to HiPPI communications to
    the Mass Storage System
  • Two master nodes dedicated to controlling
    LoadLeveler operations
  • One dedicated system monitoring node
  • One dedicated test node for system
    administration, upgrades, and testing

48
Hardware
  • Storage
  • RAID disk storage capacity: 31.0 TB total
  • Each user application can access 120 GB of
    temporary space
  • Interconnect fabric
  • SP Switch2 (Colony switch)
  • Two full-duplex network paths to increase
    throughput
  • Bandwidth: 1.0 GB per second bidirectional
  • Worst-case latency: 2.5 microseconds
  • HiPPI (High-Performance Parallel Interface) to the
    Mass Storage System
  • Gigabit Ethernet network

49
Software
  • Operating system: AIX (IBM-proprietary UNIX)
  • Compilers: Fortran (95/90/77), C, C++
  • Batch subsystem: LoadLeveler
  • Manages serial and parallel jobs over a cluster
    of servers
  • File system: General Parallel File System (GPFS)
  • System information commands: spinfo for general
    information, lslpp for information about
    libraries

50
Related Links
  • NCAR: http://www.ncar.ucar.edu/ncar/
  • SCD: http://www.scd.ucar.edu/
  • Bluesky: http://www.scd.ucar.edu/computers/bluesky/
  • IBM p690: http://www-903.ibm.com/kr/eserver/pseries/highend/p690.html

51
About Cray X1
  • Kim, SooYoung (sykim@camars.kaist.ac.kr)
  • (Dept of Computer Science, KAIST)

52
Features (1/2)
  • Contributing areas
  • Weather and climate prediction, aerospace
    engineering, automotive design, and a wide
    variety of other applications important in
    government and academic research
  • Army High Performance Computing Research Center
    (AHPCRC), Boeing, Ford, Warsaw Univ., U.S.
    Government, Department of Energy's Oak Ridge
    National Laboratory (ORNL)
  • Operating system: UNICOS/mp (from UNICOS,
    UNICOS/mk)
  • True single system image (SSI)
  • Scheduling algorithms for parallel applications
  • Accelerated application mode and migration
  • Variable processor utilization: each CPU has four
    internal processors
  • Used together as a closely coupled, multistreaming
    processor (MSP)
  • Or individually as four single-streaming processors
    (SSPs)
  • Flexible system partitioning

53
Features (2/2)
  • Scalable system architecture
  • Distributed shared memory (DSM)
  • Scalable cache coherence protocol
  • Scalable address translation
  • Parallel programming models (see the SHMEM sketch
    after this list)
  • Shared-memory parallel models
  • Traditional distributed-memory parallel models:
    MPI and SHMEM
  • Up-and-coming global distributed-memory parallel
    models: Unified Parallel C (UPC)
  • Programming environments
  • Fortran compiler, C and C++ compilers
  • High-performance scientific library (LibSci),
    language support libraries, system libraries
  • Etnus TotalView debugger, CrayPat (Cray
    Performance Analysis Tool)

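As a hedged illustration of the SHMEM distributed-memory model listed above,
the sketch below uses OpenSHMEM-style names (an assumption; the Cray SHMEM
library of that era provided similar but not identical calls). It is not taken
from Cray documentation.

    /* Sketch: one-sided put with SHMEM (OpenSHMEM-style API, assumed here). */
    #include <stdio.h>
    #include <shmem.h>

    int main(void)
    {
        static long value = 0;            /* symmetric (remotely accessible) variable */

        shmem_init();
        int me   = shmem_my_pe();
        int npes = shmem_n_pes();

        long mine = me;
        /* each PE writes its id into 'value' on its right-hand neighbour */
        shmem_long_put(&value, &mine, 1, (me + 1) % npes);
        shmem_barrier_all();              /* complete the puts before reading */

        printf("PE %d of %d received %ld\n", me, npes, value);

        shmem_finalize();
        return 0;
    }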
54
Node Architecture
Figure 1. Node, Containing Four MSPs
55
System Conf. Examples
Cabinets   CPUs    Memory            Peak Performance
1 (AC)     16      64-256 GB         204.8 Gflops
1          64      256-1024 GB       819.0 Gflops
4          256     1024-4096 GB      3.3 Tflops
8          512     2048-8192 GB      6.6 Tflops
16         1024    4096-16384 GB     13.1 Tflops
32         2048    8192-32768 GB     26.2 Tflops
64         4096    16384-65536 GB    52.4 Tflops
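A quick consistency check on the table, derived from the 12.8 Gflops per-CPU
rating on the next slide: peak performance = CPUs × 12.8 Gflops, e.g.
16 × 12.8 = 204.8 Gflops and 4096 × 12.8 ≈ 52.4 Tflops; the memory column
follows from 16-64 GB per 4-CPU node (16 CPUs → 64-256 GB).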
56
Technical Data (1/2)
Technical specifications
Peak performance: 52.4 Tflops in a 64-cabinet configuration
Architecture: scalable vector MPP with SMP nodes

Processing element
Processor: Cray custom-design vector CPU; 16 vector floating-point operations per clock cycle; 32- and 64-bit IEEE arithmetic
Memory size: 16 to 64 GB per node
Data error protection: SECDED
Vector clock speed: 800 MHz
Peak performance: 12.8 Gflops per CPU
Peak memory bandwidth: 34.1 GB/s per CPU
Peak cache bandwidth: 76.8 GB/s per CPU
Packaging: 4 CPUs per node; up to 4 nodes per AC cabinet, up to 4 interconnected cabinets; up to 16 nodes per LC cabinet, up to 64 interconnected cabinets
57
Technical Data (2/2)
Memory
Technology: RDRAM with 204 GB/s peak bandwidth per node
Architecture: cache coherent, physically distributed, globally addressable
Total system memory size: 32 GB to 64 TB

Interconnect network
Topology: modified 2D torus
Peak global bandwidth: 400 GB/s for a 64-CPU liquid-cooled (LC) system

I/O
I/O system port channels: 4 per node
Peak I/O bandwidth: 1.2 GB/s per channel