Title: Scalable Scientific Computing at Compaq
1Scalable Scientific Computing at Compaq
- CAS 2001
- Annecy, France
- October 29 - November 1, 2001
- Dr. Martin Walker
- Compaq Computer EMEA
- martin.walker_at_compaq.com
2Agenda of the entertainment
- From EV4 to EV7: four implementations of the Alpha microprocessor over ten years
- Performance on a few applications, including numerical weather forecasting
- The Terascale Computing System at the Pittsburgh Supercomputing Center
- Marvel: the next (and last) AlphaServer
- Grid Computing
3Scientific basis for vector processor choice for Earth Simulator project
- Comparison of Cray T3D and Cray Y-MP/C90: J.J. Hack, et al., Computational design of the NCAR community climate model, J. Parallel Computing 21 (1995) 1545-1569
- Fraction of peak performance achieved
- 1-7% on Cray T3D
- 30% on Cray Y-MP/C90
- Cray T3D used the Alpha EV4 processor from 1992
4Key ratios that determine sustained application performance (U.S. DoD/DoE)
5Alpha EV6 Architecture
[Block diagram: pipeline stages 0-6, labeled FETCH, MAP, QUEUE, REG, EXEC, DCACHE. Blocks shown: Branch Predictors, Next-Line Address, L1 Ins. Cache (64KB, 2-Set), Int Reg Map, Int Issue Queue (20), two integer Reg Files (80) each with Exec and Addr units, FP Reg Map, FP Issue Queue (15), FP Reg File (72), FP ADD Div/Sqrt, FP MUL, L1 Data Cache (64KB, 2-Set), Victim Buffer, Miss Address. Annotations: 4 instructions / cycle; 80 in-flight instructions plus 32 loads and 32 stores.]
6Weather Forecasting Benchmark
- LM: local model, German Weather Service (DWD)
- Current version is RAPS 2.0
- Grid size is 325 × 325 × 35
- Predefined INPUT set dwd used for all benchmarks
- First forecast hour timed (contains more I/O than subsequent forecast hours)
- Machines
- Cray T3E/1200 (EV5/600 MHz) in Jülich, Germany
- AlphaServer SC40 (EV67/667 MHz) in Marlboro, MA
- Study performed by Pallas GmbH (www.pallas.com)
7Total time (AS SC40 vs. Cray T3E)
8Performance comparisons
- Alpha EV67/667 MHz in AS SC40 delivers about 3 times the performance of EV5/600 MHz in Cray T3E on the LM application
- EV5 is running at about 6.7% of peak
- EV67 is running at about 18.5% of peak
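The two fractions of peak can be cross-checked against the "about 3 times" figure with a little arithmetic. The sketch below assumes that both processors peak at two floating-point operations per clock cycle; that value is not stated on the slide, and the function and variable names are invented for this illustration.

```python
# Cross-check of the slide's figures: sustained rate = fraction of peak * peak rate.
# ASSUMPTION (not on the slide): both chips peak at 2 flops per clock cycle.
FLOPS_PER_CYCLE = 2


def sustained_mflops(clock_mhz, fraction_of_peak):
    """Sustained MFLOPS for one processor at the given fraction of peak."""
    peak_mflops = clock_mhz * FLOPS_PER_CYCLE
    return peak_mflops * fraction_of_peak


ev5 = sustained_mflops(600, 0.067)    # EV5/600 MHz in the Cray T3E
ev67 = sustained_mflops(667, 0.185)   # EV67/667 MHz in the AlphaServer SC40
ratio = ev67 / ev5

print(f"EV5:  {ev5:.1f} MFLOPS sustained")
print(f"EV67: {ev67:.1f} MFLOPS sustained")
print(f"EV67 / EV5 = {ratio:.2f}")
```

Under that assumption the ratio comes out slightly above 3, which agrees with the first bullet.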
9Compilation Times
- Cray T3E
- Flags: -O3 -O aggress,unroll2,split1,pipeline2
- Compilation time: 41 min 37 sec
- Compaq EV6/500 MHz (EV67 is faster)
- Flags: -fast -O4
- Compilation time: 5 min 15 sec
- IBM SP3
- Flags: -O4 -qmaxmem=-1
- Compilation time: 40 min 19 sec
- Note: numeric_utilities.f90 had to be compiled with -O3 in order to avoid crashes
10SWEEP3D
- 3D discrete ordinates (Sn) neutron transport
- Implicit wavefront algorithm
- Convergence to stable solution
- Target System - multitasked PVP / MPP
- Vector style code
- High ratio of loads and stores to flops
- memory bandwidth and latency sensitive
- performance is sensitive to grid size
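To illustrate what a wavefront ordering means, here is a minimal sketch on a simplified 2D recurrence. It is not the SWEEP3D kernel: the update formula, the function names, and the grid are made up for the example. Each cell depends on its neighbours in the previous column and previous row, so all cells on one anti-diagonal are independent and could be processed together.

```python
# Simplified 2D stand-in for a wavefront sweep (not the SWEEP3D kernel).
# Each cell depends on its west (j - 1) and south (i - 1) neighbours, so all
# cells on the same anti-diagonal (i + j == d) are independent of one another.


def sweep_row_major(source):
    """Reference sweep: visit cells in plain row-major order."""
    n, m = len(source), len(source[0])
    phi = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            west = phi[i][j - 1] if j > 0 else 0.0
            south = phi[i - 1][j] if i > 0 else 0.0
            phi[i][j] = source[i][j] + 0.5 * (west + south)
    return phi


def sweep_wavefront(source):
    """Same recurrence, visited one anti-diagonal (wavefront) at a time."""
    n, m = len(source), len(source[0])
    phi = [[0.0] * m for _ in range(n)]
    for d in range(n + m - 1):
        # Every cell on diagonal d only needs values from diagonal d - 1.
        for i in range(max(0, d - m + 1), min(n, d + 1)):
            j = d - i
            west = phi[i][j - 1] if j > 0 else 0.0
            south = phi[i - 1][j] if i > 0 else 0.0
            phi[i][j] = source[i][j] + 0.5 * (west + south)
    return phi


source = [[float(i + j) for j in range(4)] for i in range(3)]
assert sweep_wavefront(source) == sweep_row_major(source)
```

The inner loop over a diagonal is the part a parallel or vector machine can spread across processors or vector lanes. Its length changes with the diagonal, which is one reason performance is sensitive to grid size.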
11SWEEP3D "as is" Performance
12Optimizations to SWEEP3D
- Fuse inner loops
- demote temporary vectors to scalars
- reduce load/store count
- Separate loops with explicit values for i2 = -1, 1
- allows prefetch code to be generated
- Fixup code moved outside loop
- loop unrolling, pipelining
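The first group of optimizations can be sketched as follows. The arithmetic is a made-up stand-in rather than the actual SWEEP3D inner loop, and the function names are hypothetical; the point is only the transformation from two loops with a temporary vector to one loop with a scalar temporary.

```python
# Illustration of "fuse inner loops" and "demote temporary vectors to scalars".
# The arithmetic is a made-up stand-in, not the actual SWEEP3D inner loop.


def unfused(a, b, c):
    """Two loops with a temporary vector t between them."""
    n = len(a)
    t = [0.0] * n                  # temporary vector: n stores, then n loads
    for i in range(n):
        t[i] = a[i] * b[i]
    out = [0.0] * n
    for i in range(n):
        out[i] = t[i] + c[i]
    return out


def fused(a, b, c):
    """One loop; the temporary is a scalar that compiled code can keep in a register."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t = a[i] * b[i]            # scalar temporary replaces the vector t
        out[i] = t + c[i]
    return out


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]
assert fused(a, b, c) == unfused(a, b, c) == [4.5, 10.5, 18.5]
```

In a compiled language the fused form removes one store and one load per iteration, which matters for a code with a high ratio of memory operations to flops.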
13Instruction counts/iteration (measured cycles on EV6)
14Optimized SWEEP3D Performance
15AlphaServer ES45 (EV68/1.001 GHz)
[Block diagram: four processors (labeled Alpha 264), four L2 Cache blocks, a Crossbar Switch (Typhoon chipset) with Quad Ctl, Data Slices (8), PA and PP, four SDRAM memory banks (Bank 0 to Bank 3; 133 MHz, 128MB - 32GB), and PCI 0. Link labels: Each @ 64b (4.2GB/s); Each @ 128b 8.0GB/s; 256b 4.2GB/s (twice); 32b@133MHz 512MB/s; 64b 266MB/s; 64b@66MHz 512MB/s (twice); 64b@66MHz 256MB/s.]
16Pittsburgh Supercomputing Center (PSC)
- Cooperative effort of
- Carnegie Mellon University
- University of Pittsburgh
- Westinghouse Electric
- Offices in Mellon Institute
- On CMU campus
- Adjacent to UofP campus
17Westinghouse Electric
- Energy Center, Monroeville, PA
- Major computing systems
- High-speed network connections
18Terascale Computing System at Pittsburgh Supercomputing Center
- Sponsored by the U.S. National Science Foundation
- Integrated into the PACI program (Partnerships for Advanced Computational Infrastructure)
- Serving the very high end for academic computational science and engineering
- The largest open facility in the world
- PSC in collaboration with Compaq and with
- Application scientists and engineers
- Applied mathematicians
- Computer Scientists
- Facilities staff
- Compaq AlphaServer SC technology
19System Block Diagram
- 3040 CPUs
- Tru64 UNIX
- 3 TB memory
- 41 TB disk
- 152 CPU cabs
- 20 switch cabs
20- ES45 nodes
- 5 per cabinet
- 3 local disks
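The counts quoted for the system are mutually consistent, as the short check below shows. It assumes four CPUs per ES45 node (the ES45 diagram shows four processors) and treats 3 TB as 3 × 1024 GB; both are assumptions of this sketch, as are the variable names.

```python
# Consistency check of the system counts quoted on the slides.
# ASSUMPTIONS: four CPUs per ES45 node; 1 TB counted as 1024 GB.
CPUS_TOTAL = 3040
CPUS_PER_NODE = 4
NODES_PER_CABINET = 5
MEMORY_TB = 3

nodes = CPUS_TOTAL // CPUS_PER_NODE
cpu_cabinets = nodes // NODES_PER_CABINET
memory_per_node_gb = MEMORY_TB * 1024 / nodes

print(nodes, "nodes")
print(cpu_cabinets, "CPU cabinets")
print(f"{memory_per_node_gb:.1f} GB of memory per node")
```

With these inputs the node count matches the 760 nodes reported for the installation, the cabinet count matches the 152 CPU cabinets, and memory works out to roughly 4 GB per node.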
21Row upon row
22Quadrics Switches
23Middle Aisle, Switches in Center
24QSW switch chassis
- Fully wired switch chassis
- 1 of 42
25Control nodes and concentrators
26The Front Row
27Installation from 0 to 3.465 TFLOPS in 29 days (Latest: 4.059 TFLOPS on 3024 CPUs)
- Deliveries and continual integration
- 44 nodes arrived at PSC on Saturday, 9-1-2001
- 50 nodes arrived on Friday, 9-7-2001
- 30 nodes arrived on Saturday, 9-8-2001
- 50 nodes arrived on Monday, 9-10-2001
- 180 nodes arrived on Wednesday, 9-12-2001
- 130 nodes arrived on Sunday, 9-16-2001
- 180 nodes arrived on Thursday, 9-20-2001
- To have shipped 12 September!
- Federated switch cabled/operational by 9-23-01
- 760 nodes clustered by 9-24-01
- 3.465 TFLOPS Linpack by 9-29-01
- 4.059 TFLOPS in Dongarra's list dated Mon Oct 22 (67% of peak performance)
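The 67% figure can be reproduced from the numbers on the slides. The sketch assumes a 1.001 GHz clock (from the ES45 slide title) and a peak of two floating-point operations per cycle per CPU; the latter is not stated in the presentation, and the function name is invented for the example.

```python
# Fraction of peak for the Linpack result quoted above.
# ASSUMPTIONS: 1.001 GHz clock (ES45 slide) and 2 flops per cycle per CPU.
CLOCK_GHZ = 1.001
FLOPS_PER_CYCLE = 2


def fraction_of_peak(tflops, cpus):
    """Measured TFLOPS divided by the theoretical peak of `cpus` processors."""
    peak_tflops = cpus * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0
    return tflops / peak_tflops


latest = fraction_of_peak(4.059, 3024)
print(f"4.059 TFLOPS on 3024 CPUs: {latest:.1%} of peak")
```

Under these assumptions the peak of 3024 CPUs is about 6.05 TFLOPS, so 4.059 TFLOPS corresponds to roughly 67% of peak.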
28http://www.mmm.ucar.edu/mm5/mpp/helpdesk/20011023.html
MM5
29http://www.mmm.ucar.edu/mm5/mpp/helpdesk/20011023.html
30Alpha Microprocessor Summary
- EV6
- .35 µm, 600 MHz
- 4-wide superscalar
- Out-of-order execution
- High memory BW
- EV67
- .25 µm, up to 750 MHz
- EV68
- .18 µm, ≥1000 MHz
- EV7
- .18 µm, 1250 MHz
- L2 cache on-chip
- Memory control on-chip
- I/O control on-chip
- Cache-coherent inter-processor communication on-chip
- EV79
- .13 µm, 1600 MHz
31EV7: The System is the Silicon.
SMP CPU interconnect used to be external logic. Now it's on the chip.
- EV68 core with enhancements
- Integrated L2 cache
- 1.75 MB (ECC)
- 20 GB/s cache bandwidth
- Integrated memory controllers
- Direct RAMbus (ECC)
- 12.8 GB/s memory bandwidth
- Optional RAID in memory
- Integrated network interface
- Direct processor-processor interconnects
- 4 links
- 25.6 GB/s aggregate bandwidth
- ECC (single error correct, double error detect)
- 3.2 GB/s I/O interface per processor
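The bandwidth figures above are easier to compare as ratios to peak floating-point rate. The sketch assumes a 1250 MHz clock (from the microprocessor summary slide) and two floating-point operations per cycle; the latter is an assumption of this example, not a figure from the presentation.

```python
# Ratios derived from the EV7 bandwidth figures on this slide.
# ASSUMPTIONS: 1250 MHz clock (microprocessor summary slide) and 2 flops
# per cycle; the second value is not stated anywhere in the slides.
CLOCK_GHZ = 1.25
FLOPS_PER_CYCLE = 2
MEMORY_BW_GBS = 12.8
CACHE_BW_GBS = 20.0
LINK_AGGREGATE_GBS = 25.6
LINKS = 4

peak_gflops = CLOCK_GHZ * FLOPS_PER_CYCLE
memory_bytes_per_flop = MEMORY_BW_GBS / peak_gflops
cache_bytes_per_flop = CACHE_BW_GBS / peak_gflops
per_link_gbs = LINK_AGGREGATE_GBS / LINKS

print(f"peak:            {peak_gflops:.2f} GFLOPS")
print(f"memory bandwidth: {memory_bytes_per_flop:.2f} bytes/flop")
print(f"L2 bandwidth:     {cache_bytes_per_flop:.2f} bytes/flop")
print(f"per link:         {per_link_gbs:.1f} GB/s")
```

Bytes per flop is one common way to express the kind of key ratio that determines sustained application performance on memory-bound codes.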
32Alpha EV7
33EV7: The System is the Silicon.
Electronics to do cache-coherent communications gets placed within the EV7 chip
34Alpha EV7 Core
[Block diagram: pipeline stages 0-6, labeled FETCH, MAP, QUEUE, REG, EXEC, DCACHE. Blocks shown: Branch Predictors, Next-Line Address, L1 Ins. Cache (64KB, 2-Set), Int Reg Map, Int Issue Queue (20), two integer Reg Files (80) each with Exec and Addr units, FP Reg Map, FP Issue Queue (15), FP Reg File (72), FP ADD Div/Sqrt, FP MUL, L1 Data Cache (64KB, 2-Set), L2 cache (1.75MB, 7-Set), Victim Buffer, Miss Address. Annotations: 4 instructions / cycle; 80 in-flight instructions plus 32 loads and 32 stores.]
35Virtual Page Size
- Current virtual page size
- 8K
- 64K
- 512K
- 4M
- New virtual page size (boot time selection)
- 64K
- 2M
- 64M
- 512M
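One way to see why larger pages matter is TLB reach, the amount of memory that can be addressed without a TLB miss. The sketch below assumes a 128-entry data TLB purely for illustration; the slide does not give a TLB size, and the function name is hypothetical.

```python
# TLB reach = number of TLB entries * page size.
# ASSUMPTION: a 128-entry data TLB, chosen for illustration only; the slide
# does not give the TLB size.
TLB_ENTRIES = 128
KB, MB = 1024, 1024 * 1024

current_pages = {"8K": 8 * KB, "64K": 64 * KB, "512K": 512 * KB, "4M": 4 * MB}
new_pages = {"64K": 64 * KB, "2M": 2 * MB, "64M": 64 * MB, "512M": 512 * MB}


def tlb_reach_mb(page_bytes, entries=TLB_ENTRIES):
    """Memory (in MB) that can be mapped by a full TLB of the given page size."""
    return entries * page_bytes / MB


for label, table in (("current", current_pages), ("new", new_pages)):
    for name, size in table.items():
        print(f"{label:8s} {name:>5s} page: reach {tlb_reach_mb(size):g} MB")
```

Whatever the real entry count, reach scales linearly with page size, so the largest new page size covers 128 times as much memory per entry as the largest current one.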
36Performance
- SPEC95
- SPECint95 75
- SPECfp95 160
- SPEC2000
- CINT2000 800
- CFP2000 1200
- 59% higher than EV68/1GHz
37Building Block Approach to System Design
- Key Components
- EV7 Processor
- IO7 I/O Interface
- Dual Processor Module
- Systems Grow by Adding
- Processors
- Memory
- I/O
41Two complementary views of the Grid
- The hierarchy of understanding
- Data are uninterpreted signals
- Information is data equipped with meaning
- Knowledge is information applied in practice to accomplish a task
- The Internet is about information
- The Grid is about knowledge
- Tony Hey, Director, UK eScience Core Program
- Main technologies developed by man
- Writing captures knowledge
- Mathematics enables rigorous understanding, prediction
- Computing enables prediction of complex phenomena
- The Grid enables intentional design of complex systems
- Rick Stevens, ANL
42What is the Grid?
- A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computing capabilities.
- Ian Foster and Carl Kesselman, editors, The GRID: Blueprint for a New Computing Infrastructure (Morgan-Kaufmann Publishers, SF, 1999) 677 pp. ISBN 1-55860-8
- The Grid is an infrastructure to enable virtual communities to share distributed resources to pursue common goals
- The Grid infrastructure consists of protocols, application programming interfaces, and software development kits to provide authentication, authorization, and resource location and access
- Foster, Kesselman, Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations, http://www.globus.org/research/papers.html
43Compaq and The Grid
- Sponsor of the Global Grid Forum (www.globalgridforum.org)
- Founding member of the New Productivity Initiative for Distributed Resource Management (www.newproductivity.org)
- Industrial member of the GridLab consortium (www.gridlab.org)
- 20 leading European and US institutions
- Infrastructure, applications, testbed
- Cactus worm demo at SC2001 (www.cactuscode.org)
- Intra-Grid within Compaq firewall
- Nodes in Annecy, Galway, Nashua, Marlboro, Tokyo
- Globus, Cactus, GridLab infrastructure and applications
- iPAQ Pocket PC (www.ipaqlinux.com)
44Potential dangers for the Grid
- Solution in search of a problem
- Shell game for cheap (free) computing
- Plethora of unsupported, incompatible,
non-standard tools and interfaces
45Big Science
- As with the Internet, scientific computing will be the first to benefit from the Grid. Examples:
- GriPhyN (US Grid Physics Network for Data-intensive Science)
- Elementary particle physics, gravitational wave astronomy, optical astronomy (digital sky survey)
- www.griphyn.org
- DataGrid (led by CERN)
- Analysis of data from scientific exploration
- www.eu-datagrid.org
- There are also compute-intensive applications
that can benefit from the Grid
46Final Thoughts: all this will not be easy
- How good have we been as a community at making parallel computing easy and transparent?
- There are still some things we can't do
- predict the El Niño phenomenon correctly
- plate tectonics and earth mantle convection
- failure mechanisms in new materials
- Validation and verification of numerical
simulation are crying needs
47Thank You!
Please visit our HPTC Web Site: http://www.compaq.com/hpc
49Stability and Continuity for AlphaServer customers
- Commitment to continue implementing the Alpha Roadmap according to the current plan-of-record
- EV68, EV7 and EV79
- Marvel systems
- Tru64 UNIX support
- AlphaServer systems, running Tru64 UNIX, will be sold as long as customers demand, at least several years after EV79 systems arrive in 2004, with support continuing for a minimum of 5 years beyond that
50Microprocessor and System Roadmaps
[Roadmap chart. Labels recovered from the chart: Alpha Processor (EV68, EV7, EV79); Itanium Processor Family (Itanium, McKinley, Madison, Next Generation); Alpha Servers (EV68 Product Family: GS 1-32P, ES 1-4P, DS 1-2P); Next Generation Server Family (8-64P, Blades, 2P, 4P, 8P); ProLiant Servers (Itanium 1-4P, McKinley family, Madison; 1-4P, 1-8P, 1-32P).]
51The New HP
- Chairman and CEO Carly Fiorina
- President Michael Capellas
- Imaging and Printing ($20B): Vyomesh Joshi
- Access Devices ($29B): Duane Zitzner
- IT Infrastructure ($23B): Peter Blackmore
- Services ($15B): Ann Livermore