Scalable Scientific Computing at Compaq - PowerPoint PPT Presentation

1
Scalable Scientific Computing at Compaq
  • CAS 2001
  • Annecy, France
  • October 29 - November 1, 2001
  • Dr. Martin Walker
  • Compaq Computer EMEA
  • martin.walker@compaq.com

2
Agenda of the entertainment
  • From EV4 to EV7: four implementations of the
    Alpha microprocessor over ten years
  • Performance on a few applications, including
    numerical weather forecasting
  • The Terascale Computing System at the Pittsburgh
    Supercomputing Center
  • Marvel: the next (and last) AlphaServer
  • Grid Computing

3
Scientific basis for vector processor choice for
Earth Simulator project
  • Comparison of Cray T3D and Cray Y-MP/C90: J.J.
    Hack, et al., Computational design of the NCAR
    community climate model, Parallel Computing
    21 (1995) 1545-1569
  • Fraction of peak performance achieved:
  • 1-7% on Cray T3D
  • 30% on Cray Y-MP/C90
  • Cray T3D used the Alpha EV4 processor from 1992

4
Key ratios that determine sustained application
performance (U.S. DoD/DoE)
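
The chart for this slide did not survive transcription. One such key ratio is machine balance: bytes of memory bandwidth per peak floating-point operation. A minimal sketch, using the EV7 figures quoted later in this deck; the 2 flops/cycle peak (one FP add plus one FP multiply per cycle) is my assumption, not a figure from the slide:

```python
def machine_balance(mem_bw_gb_s, clock_ghz, flops_per_cycle=2):
    """Bytes of memory bandwidth available per peak floating-point op.

    flops_per_cycle=2 is an assumed Alpha peak (FP add + FP multiply);
    it is not stated on this slide.
    """
    peak_gflops = clock_ghz * flops_per_cycle
    return mem_bw_gb_s / peak_gflops

# EV7 figures quoted later in this deck: 12.8 GB/s memory bandwidth, 1.25 GHz
balance = machine_balance(12.8, 1.25)
print(f"EV7 balance: {balance:.2f} bytes/flop")
```

A balance of several bytes per flop is what allows memory-bound codes (like the SWEEP3D example later in the talk) to sustain a meaningful fraction of peak.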
5
Alpha EV6 Architecture
(Pipeline diagram: 7 stages (FETCH, MAP, QUEUE, REG, EXEC, DCACHE),
4 instructions/cycle, integer issue queue (20 entries), FP issue queue
(15 entries), integer register files (80 entries) and FP register file
(72 entries), 80 in-flight instructions plus 32 loads and 32 stores,
64KB 2-set L1 instruction and data caches, branch predictors, victim
buffer, next-line address and miss address logic)
6
Weather Forecasting Benchmark
  • LM local model, German Weather Service (DWD)
    Current version is RAPS 2.0
  • Grid size is 325 × 325 × 35; predefined INPUT
    set "dwd" used for all benchmarks
  • First forecast hour timed (contains more I/O than
    subsequent forecast hours)
  • Machines
  • Cray T3E/1200 (EV5/600 MHz) in Jülich, Germany
  • AlphaServer SC40 (EV67/667 MHz) in Marlboro, MA
  • Study performed by Pallas GmbH (www.pallas.com)

7
Total time (AS SC40 vs. Cray T3E)
8
Performance comparisons
  • Alpha EV67/667 MHz in AS SC40 delivers about 3
    times the performance of EV5/600 MHz in Cray T3E
    to the LM application
  • EV5 is running at about 6.7% of peak
  • EV67 is running at about 18.5% of peak
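
The factor of 3 above can be sanity-checked from the clock rates and the quoted percentages of peak. A hedged sketch, assuming 2 flops/cycle peak on both processors (my assumption, not the slide's):

```python
def sustained_gflops(clock_mhz, flops_per_cycle, pct_of_peak):
    """Sustained rate implied by a clock rate and a percent-of-peak figure.

    flops_per_cycle=2 (FP add + FP multiply per cycle) is assumed here
    for both the EV5 and the EV67; the slide does not state it.
    """
    peak = clock_mhz * 1e-3 * flops_per_cycle   # peak in GFLOPS
    return peak * pct_of_peak / 100.0

ev5  = sustained_gflops(600, 2, 6.7)    # Cray T3E node
ev67 = sustained_gflops(667, 2, 18.5)   # AlphaServer SC40 node
print(f"EV5: {ev5:.3f} GF, EV67: {ev67:.3f} GF, ratio: {ev67/ev5:.2f}")
```

The implied ratio comes out close to 3, consistent with the measured LM result.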

9
Compilation Times
  • Cray T3E. Flags: -O3 -O aggress,unroll2,split1,pipeline2.
    Compilation time: 41 min 37 sec
  • Compaq EV6/500 MHz (EV67 is faster). Flags:
    -fast -O4. Compilation time: 5 min 15 sec
  • IBM SP3. Flags: -O4 -qmaxmem=-1. Compilation
    time: 40 min 19 sec. Note: numeric_utilities.f90
    had to be compiled with -O3 in order to avoid
    crashes

10
SWEEP3D
  • 3D discrete ordinates (Sn) neutron transport
  • Implicit wavefront algorithm
  • Convergence to stable solution
  • Target System - multitasked PVP / MPP
  • Vector style code
  • High ratio of loads/stores to flops
  • memory bandwidth and latency sensitive
  • performance is sensitive to grid size

11
SWEEP3D as is Performance
12
Optimizations to SWEEP3D
  • Fuse inner loops
  • demote temporary vectors to scalars
  • reduce load/store count
  • Separate loops with explicit values for i2 =
    -1, +1
  • allows prefetch code to be generated
  • Fixup code moved outside loop
  • loop unrolling, pipelining
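
The first two optimizations above can be sketched schematically. This is illustrative Python, not the actual SWEEP3D Fortran; it shows why fusing loops and demoting a vector temporary to a scalar helps a code whose performance is bound by loads and stores:

```python
def unfused(a, b, c):
    """As-is style: two separate loops and a full-length temporary vector."""
    n = len(a)
    tmp = [0.0] * n
    for i in range(n):
        tmp[i] = a[i] * b[i]       # temporary vector written out to memory...
    out = [0.0] * n
    for i in range(n):
        out[i] = tmp[i] + c[i]     # ...then read back in a second loop
    return out

def fused(a, b, c):
    """Fused loop: the temporary lives in a register, not in memory."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        t = a[i] * b[i]            # scalar temporary, no extra load/store stream
        out[i] = t + c[i]
    return out
```

Both versions compute the same result; the fused form eliminates one full store stream and one full load stream per iteration, which is exactly the load/store count reduction the slide describes.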

13
Instruction counts per iteration (measured cycles
on EV6)
14
Optimized SWEEP3D Performance
15
AlphaServer ES45 (EV68/1.001 GHz)
(Block diagram: four Alpha 21264 CPUs, each with L2 cache and a 64b,
4.2 GB/s port into a crossbar switch (Typhoon chipset, quad controller
plus 8 data slices, each 128b at 8.0 GB/s); four banks of 133 MHz
SDRAM memory, 128MB - 32GB, each bank 256b at 4.2 GB/s; PCI I/O at
32b/133MHz (512 MB/s), 64b/66MHz (512 MB/s and 256 MB/s), and 64b
(266 MB/s))
16
Pittsburgh Supercomputing Center (PSC)
  • Cooperative effort of
  • Carnegie Mellon University
  • University of Pittsburgh
  • Westinghouse Electric
  • Offices in Mellon Institute
  • On CMU campus
  • Adjacent to UofP campus

17
Westinghouse Electric
  • Energy Center, Monroeville, PA
  • Major computing systems
  • High-speed network connections

18
Terascale Computing System at Pittsburgh
Supercomputing Center
  • Sponsored by the U.S. National Science Foundation
  • Integrated into the PACI program (Partnerships
    for Advanced Computational Infrastructure)
  • Serving the very high end for academic
    computational science and engineering
  • The largest open facility in the world
  • PSC in collaboration with Compaq and with
  • Application scientists and engineers
  • Applied mathematicians
  • Computer Scientists
  • Facilities staff
  • Compaq AlphaServer SC technology

19
System Block Diagram
  • 3040 CPUs
  • Tru64 UNIX
  • 3 TB memory
  • 41 TB disk
  • 152 CPU cabs
  • 20 switch cabs

20
  • ES45 nodes
  • 5 per cabinet
  • 3 local disks

21
Row upon row
22
Quadrics Switches
  • Rail 1
  • Rail 0

23
Middle Aisle, Switches in Center
24
QSW switch chassis
  • Fully wired switch chassis
  • 1 of 42

25
Control nodes and concentrators
26
The Front Row
27
Installation from 0 to 3.465 TFLOPS in 29 days
(latest: 4.059 TFLOPS on 3024 CPUs)
  • Deliveries and continual integration
  • 44 nodes arrived at PSC on Saturday, 9-1-2001
  • 50 nodes arrived on Friday, 9-7-2001
  • 30 nodes arrived on Saturday, 9-8-2001
  • 50 nodes arrived on Monday, 9-10-2001
  • 180 nodes arrived on Wednesday, 9-12-2001
  • 130 nodes arrived on Sunday, 9-16-2001
  • 180 nodes arrived on Thursday, 9-20-2001
  • To have shipped 12 September!
  • Federated switch cabled/operational by 9-23-01
  • 760 nodes clustered by 9-24-01
  • 3.465 TFLOPS Linpack by 9-29-01
  • 4.059 TFLOPS in Dongarra's list dated Mon Oct 22
    (67% of peak performance)
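
The quoted 67% efficiency can be checked against the CPU count and clock rate, assuming a peak of 2 flops/cycle per EV68 (my assumption about the FP pipelines, not stated on the slide):

```python
# Linpack efficiency check: 3024 EV68 CPUs at 1.001 GHz,
# assumed peak of 2 flops/cycle per CPU (FP add + FP multiply).
cpus, clock_ghz, flops_per_cycle = 3024, 1.001, 2
peak_tflops = cpus * clock_ghz * flops_per_cycle / 1000.0
efficiency = 100.0 * 4.059 / peak_tflops
print(f"peak: {peak_tflops:.3f} TFLOPS, Linpack efficiency: {efficiency:.0f}%")
```

The implied peak is about 6.05 TFLOPS, and 4.059 TFLOPS against it reproduces the slide's 67% figure.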

28
http://www.mmm.ucar.edu/mm5/mpp/helpdesk/20011023.html
MM5
29
http://www.mmm.ucar.edu/mm5/mpp/helpdesk/20011023.html
30
Alpha Microprocessor Summary
  • EV6
  • 0.35 µm, 600 MHz
  • 4-wide superscalar
  • Out-of-order execution
  • High memory BW
  • EV67
  • 0.25 µm, up to 750 MHz
  • EV68
  • 0.18 µm, ≥1000 MHz
  • EV7
  • 0.18 µm, 1250 MHz
  • L2 cache on-chip
  • Memory control on-chip
  • I/O control on-chip
  • Cache-coherent inter-processor communication on-chip
  • EV79
  • 0.13 µm, 1600 MHz

31
EV7: The System is the Silicon.
SMP CPU interconnect used to be external
logic; now it's on the chip
  • EV68 core with enhancements
  • Integrated L2 cache
  • 1.75 MB (ECC)
  • 20 GB/s cache bandwidth
  • Integrated memory controllers
  • Direct RAMbus (ECC)
  • 12.8 GB/s memory bandwidth
  • Optional RAID in memory
  • Integrated network interface
  • Direct processor-processor interconnects
  • 4 links - 25.6 GB/s aggregate bandwidth
  • ECC (single error correct, double error detect)
  • 3.2 GB/s I/O interface per processor

32
Alpha EV7
33
EV7: The System is the Silicon.
EV7
Electronics to do cache-coherent communications
gets placed within the EV7 chip
34
Alpha EV7 Core
(Pipeline diagram: the same 7-stage core as the EV6 (FETCH, MAP,
QUEUE, REG, EXEC, DCACHE), 4 instructions/cycle, integer issue queue
(20 entries), FP issue queue (15 entries), 80 in-flight instructions
plus 32 loads and 32 stores, 64KB 2-set L1 instruction and data
caches, with the addition of an integrated 1.75MB 7-set L2 cache)
35
Virtual Page Size
  • Current virtual page size
  • 8K
  • 64K
  • 512K
  • 4M
  • New virtual page size (boot time selection)
  • 64K
  • 2M
  • 64M
  • 512M
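
A larger page size increases TLB reach, the amount of memory addressable without a TLB miss. A minimal sketch; the 128-entry TLB below is a hypothetical figure for illustration, not an Alpha specification:

```python
def tlb_reach_mb(page_bytes, entries=128):
    """Memory covered by a fully populated TLB, in MB.

    entries=128 is a hypothetical TLB size for illustration only.
    """
    return page_bytes * entries / 2**20

old = tlb_reach_mb(8 * 1024)      # old default: 8K pages
new = tlb_reach_mb(64 * 1024)     # new default: 64K pages
print(f"8K pages: {old} MB reach, 64K pages: {new} MB reach")
```

Reach scales linearly with page size, so moving from 8K to 512M pages multiplies the memory covered per TLB entry by 65536, which matters for terabyte-scale working sets.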

36
Performance
  • SPEC95
  • SPECint95: 75
  • SPECfp95: 160
  • SPEC2000
  • CINT2000: 800
  • CFP2000: 1200
  • 59% higher than EV68/1GHz

37
Building Block Approach to System Design
  • Key Components
  • EV7 Processor
  • IO7 I/O Interface
  • Dual Processor Module
  • Systems Grow by Adding
  • Processors
  • Memory
  • I/O

38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
Two complementary views of the Grid
  • The hierarchy of understanding
  • Data are uninterpreted signals
  • Information is data equipped with meaning
  • Knowledge is information applied in practice to
    accomplish a task
  • The Internet is about information
  • The Grid is about knowledge
  • Tony Hey, Director, UK e-Science Core Programme
  • Main technologies developed by man
  • Writing captures knowledge
  • Mathematics enables rigorous understanding,
    prediction
  • Computing enables prediction of complex phenomena
  • The Grid enables intentional design of complex
    systems
  • Rick Stevens, ANL

42
What is the Grid?
  • A computational grid is a hardware and software
    infrastructure that provides dependable,
    consistent, pervasive, and inexpensive access to
    high-end computing capabilities.
  • Ian Foster and Carl Kesselman, editors, The
    Grid: Blueprint for a New Computing
    Infrastructure (Morgan-Kaufmann Publishers, SF,
    1999) 677 pp. ISBN 1-55860-8
  • The Grid is an infrastructure to enable virtual
    communities to share distributed resources to
    pursue common goals
  • The Grid infrastructure consists of protocols,
    application programming interfaces, and software
    development kits to provide authentication,
    authorization, and resource location and access
  • Foster, Kesselman, Tuecke, The Anatomy of the
    Grid: Enabling Scalable Virtual Organizations,
    http://www.globus.org/research/papers.html

43
Compaq and The Grid
  • Sponsor of the Global Grid Forum
    (www.globalgridforum.org)
  • Founding member of the New Productivity
    Initiative for Distributed Resource Management
    (www.newproductivity.org)
  • Industrial member of the GridLab consortium
    (www.gridlab.org)
  • 20 leading European and US institutions
  • Infrastructure, applications, testbed
  • Cactus worm demo at SC2001 (www.cactuscode.org)
  • Intra-Grid within Compaq firewall
  • Nodes in Annecy, Galway, Nashua, Marlboro, Tokyo
  • Globus, Cactus, GridLab infrastructure and
    applications
  • iPAQ Pocket PC (www.ipaqlinux.com)

44
Potential dangers for the Grid
  • Solution in search of a problem
  • Shell game for cheap (free) computing
  • Plethora of unsupported, incompatible,
    non-standard tools and interfaces

45
Big Science
  • As with the Internet, scientific computing will
    be the first to benefit from the Grid. Examples:
  • GriPhyN (US Grid Physics Network for
    Data-intensive Science)
  • Elementary particle physics, gravitational wave
    astronomy, optical astronomy (digital sky survey)
  • www.griphyn.org
  • DataGrid (led by CERN)
  • Analysis of data from scientific exploration
  • www.eu-datagrid.org
  • There are also compute-intensive applications
    that can benefit from the Grid

46
Final Thoughts: all this will not be easy
  • How good have we been as a community at making
    parallel computing easy and transparent?
  • There are still some things we can't do
  • predict the El Niño phenomenon correctly
  • plate tectonics and Earth mantle convection
  • failure mechanisms in new materials
  • Validation and verification of numerical
    simulation are crying needs

47
Thank You!
Please visit our HPTC Web Site: http://www.compaq.com/hpc
48
(No Transcript)
49
Stability and Continuity for AlphaServer customers
  • Commitment to continue implementing the Alpha
    Roadmap according to the current plan-of-record
  • EV68, EV7, EV79
  • Marvel systems
  • Tru64 UNIX support
  • AlphaServer systems, running Tru64 UNIX, will be
    sold as long as customers demand, at least
    several years after EV79 systems arrive in 2004,
    with support continuing for a minimum of 5 years
    beyond that

50
Microprocessor and System Roadmaps
(Roadmap chart: Alpha processors EV68, EV7, EV79; Itanium Processor
Family McKinley, Madison, and next generation. AlphaServer systems:
EV68 product family (GS 1-32P, ES 1-4P, DS 1-2P), followed by the
next-generation server family (8-64P, blades, 2P, 4P, 8P);
Itanium-based servers (McKinley family, Madison, Itanium 1-4P);
ProLiant servers, 1-8P and 1-32P)
51
The New HP
  • Chairman and CEO: Carly Fiorina
  • President: Michael Capellas
  • Imaging and Printing ($20B): Vyomesh Joshi
  • Access Devices ($29B): Duane Zitzner
  • IT Infrastructure ($23B): Peter Blackmore
  • Services ($15B): Ann Livermore