Experiences With a Rx4610 Itanium Server

Slides: 34
Provided by: DENN267

1
Experiences With a Rx4610 Itanium Server
  • John Dennis
  • February 14, 2002

2
Introduction
  • Evaluation of Itanium 4-way server
  • System Description
  • Software environment
  • HPUX
  • Linux
  • Benchmark Results
  • Synthetic
  • Applications
  • Conclusions

3
System Description
  • HP rx4610 4-way server
  • 733 MHz Itanium with 2 MB cache
  • 8 GB memory
  • 512 MB parts
  • 2 memory cards
  • ¼ of total slots populated
  • System purchased under-configured with 2 GB
    memory → poor memory bandwidth
  • 7U form factor mounted in 19-inch rack

4
HPUX versions
  • HPUX 11i V1.5
  • With the following patches
  • PHKL_24814,24872,24856,24923,24879,24877,25011,
    24974
  • PHCO_24924,24927,25241,25173
  • PHSS_24468,24648,25860,24461,24463,24464,24465,
    24466,24467,24469,24470,24471,24472,24653,24654,
    25454,25464,25715,26068,26080

5
Linux Versions
  • RH7.1
  • Glibc: glibc-2.2.4-19
  • Kernel: kernel-2.4.9-6
  • Intel Compilers
  • C/C++: intel-ecc64-5.0.1-88 (v5)
  • Fortran: intel-efc64-5.0.1-49 (v5)
  • intel-efc6-6.0b-111 (v6)

6
Benchmarks
  • Subset from a recent NCAR procurement plus
    internally developed code
  • Synthetic/Kernels
  • Streams
  • Shallow Water
  • RADABS
  • ELEFUNT
  • Application
  • 3D FFT
  • SEAM3D
  • CCM3

7
Streams/Pstreams
  • Pstreams: a slight modification of McCalpin's
    original benchmark → uses OpenMP for parallelism
  • Sweep out a curve to test cache and memory
    effects. (arrays 1-16 Mbytes)
  • IA64 shows good performance despite smaller cache
    than IBM Power 3
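
The sweep described above can be sketched roughly as follows. This is an illustration only, not the Pstreams source: it shows the STREAM "triad" kernel and the way a bandwidth figure is derived from it (three 8-byte words moved per element). The function names, the use of Python lists, and the array lengths are assumptions made for the sketch; interpreted Python will not produce bandwidth numbers comparable to compiled Fortran or C.

```python
import time


def triad(a, b, c, scalar):
    """STREAM-style triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, word_bytes=8):
    """Bytes moved by one triad pass: read b and c, write a."""
    return 3 * n * word_bytes


def bandwidth_mb_per_s(n, seconds, word_bytes=8):
    """Sustained bandwidth in MB/s, with 1 MB taken as 10**6 bytes."""
    return triad_bytes(n, word_bytes) / seconds / 1.0e6


def sweep(sizes, scalar=3.0):
    """Time one triad pass per array length; return (n, MB/s) pairs."""
    results = []
    for n in sizes:
        a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
        start = time.perf_counter()
        triad(a, b, c, scalar)
        # Guard against a zero reading from the timer on very small arrays.
        elapsed = max(time.perf_counter() - start, 1.0e-9)
        results.append((n, bandwidth_mb_per_s(n, elapsed)))
    return results
```

Calling `sweep` with a list of increasing array lengths gives one point per length; plotted against array size, those points form the kind of curve the slide refers to, with a drop expected once the arrays no longer fit in cache.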

8
Streams Single Processor

9
Pstreams IA64 (HPUX)

10
Pstreams IA64 (Linux v6)

11
Shallow Water Model
  • Shallow Water Model is part of SPECFP
  • Good test of maximum FP performance
  • Code run with varying problem size to check:
  • Cache effects
  • Memory effects
  • Other magic numbers
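
One way to see where cache effects should appear when sweeping an NxN domain is to estimate the working set. The sketch below is a back-of-the-envelope aid, not part of the benchmark: it assumes the model keeps `num_arrays` full NxN arrays of 8-byte reals, and the array count is a hypothetical parameter because the slides do not state it. The 2 MB cache size comes from the system description.

```python
def working_set_bytes(n, num_arrays, word_bytes=8):
    """Memory touched per time step for num_arrays arrays of size n x n."""
    return num_arrays * n * n * word_bytes


def largest_n_in_cache(cache_bytes, num_arrays, word_bytes=8):
    """Largest n whose working set still fits in cache_bytes."""
    n = 0
    while working_set_bytes(n + 1, num_arrays, word_bytes) <= cache_bytes:
        n += 1
    return n


# Hypothetical example: 14 arrays (an assumed count) and a 2 MB cache.
CACHE_BYTES = 2 * 1024 * 1024
n_max = largest_n_in_cache(CACHE_BYTES, num_arrays=14)
```

Under these assumptions, domains larger than `n_max` points on a side would be expected to run from main memory rather than cache.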

12
Shallow Water Model Performance on NxN Domain

13
Intrinsic Functions
  • Speed of intrinsic functions has a large impact on
    our codes
  • Some codes spend 20% of their time calculating EXP
    or LOG
  • HPUX on IA64 gives the best numbers we have seen
  • PWR needs a bit of work since a^b =
    exp(b*log(a))
  • All numbers in millions of calls per second
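
The identity behind the PWR remark, and the unit used for the results, can be illustrated with a short sketch. This is not the ELEFUNT test code; `pow_via_exp_log` and `mcalls_per_second` are hypothetical helper names, and the call count is arbitrary.

```python
import math
import time


def pow_via_exp_log(a, b):
    """Compute a**b as exp(b * log(a)); only valid for a > 0."""
    if a <= 0.0:
        raise ValueError("a must be positive for the exp/log identity")
    return math.exp(b * math.log(a))


def mcalls_per_second(func, args, calls=200_000):
    """Rate of func(*args) in millions of calls per second."""
    start = time.perf_counter()
    for _ in range(calls):
        func(*args)
    # Guard against a zero reading from the timer.
    elapsed = max(time.perf_counter() - start, 1.0e-9)
    return calls / elapsed / 1.0e6
```

Because a power computed this way costs one LOG, one multiply, and one EXP, its call rate cannot exceed that of the slower of the two intrinsics, which is consistent with PWR lagging EXP and LOG in the measurements.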

14
Intrinsic Function Performance

15
RADABS
  • Single most expensive routine in the most used code
    at NCAR (CCM3)
  • Good indicator of maximum performance of CCM3
  • Tests ability of compiler to optimize large
    complex loops and math intrinsics
  • do i=1,NX
  • do j=1,NY → NX x NY
  • Smaller loops (4x8, 2x4, 1x2)
  • Linux compiler was trying to unroll and hit
    register allocation problems → turn off unrolling
    to get better performance

16
RADABS Performance

17
3D-FFT
  • 3D-FFT is part of NCAR's Spectral Toolkit (STK)
  • Written in C and pthreads
  • Tests
  • C optimization
  • pthread implementation
  • memory bandwidth
  • Smaller problem (64^3) → thread overhead and
    cache performance
  • Larger problem (256^3) → total system bandwidth
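
A quick footprint estimate shows why the two problem sizes stress different parts of the system. The sketch assumes one double-precision complex value (16 bytes) per grid point, which the slides do not state; the function names are illustrative.

```python
MIB = 1024 * 1024


def grid_bytes(n, bytes_per_point=16):
    """Storage for one n x n x n grid (default: complex double)."""
    return n ** 3 * bytes_per_point


def per_thread_bytes(n, threads, bytes_per_point=16):
    """Share of the grid each thread handles under an even split."""
    return grid_bytes(n, bytes_per_point) / threads


small_mib = grid_bytes(64) / MIB    # 64^3 grid
large_mib = grid_bytes(256) / MIB   # 256^3 grid
```

Under this assumption the 64^3 grid occupies 4 MiB, so an even split over four threads gives each one about 1 MiB, which is within a single 2 MB cache. The 256^3 grid occupies 256 MiB and must stream from main memory.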

18
3-D FFT 64^3

19
3-D FFT 256^3

20
SEAM3D C56L16
  • Spectral Element Atmospheric Model (SEAM3D)
  • F90 code with MPI, OpenMP or OpenMP/MPI
    parallelism
  • Tests
  • Optimization of complex F90 code
  • Quality of OpenMP implementation

21
SEAM3D OMP C56L16

22
OpenMP Overhead (SEAM)
  • What impact does OpenMP have on a single-threaded
    executable?
  • SEAM has OMP at a VERY high level → low overhead
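
The overhead question can be made concrete by comparing a build without OpenMP against the OpenMP build run on one thread. The sketch below only shows the arithmetic; the function name and the timings in the usage line are hypothetical, not measurements from the talk.

```python
def openmp_overhead(serial_seconds, omp_one_thread_seconds):
    """Fractional slowdown of a 1-thread OpenMP run versus a serial build."""
    if serial_seconds <= 0.0:
        raise ValueError("serial_seconds must be positive")
    return (omp_one_thread_seconds - serial_seconds) / serial_seconds


# Hypothetical timings: 100 s serial, 102 s with OpenMP on one thread.
example = openmp_overhead(100.0, 102.0)
```

A value near zero is what one would expect when the parallel regions sit at a very high level in the code, since few fork/join events occur per unit of work.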

23
SEAM3D MPI C56L16

24
CCM3
  • Community Climate Model 3
  • Consumes 50% of all computer cycles at NCAR
  • Tests
  • FORTRAN optimization of large complex code
  • Large subset of FORTRAN language features
  • Performance is highly dependent on memory
    bandwidth
  • Optimization of OpenMP codes
  • Compilers on HPUX and Linux could not generate a
    valid OpenMP executable

25
CCM3 MPI T42L18

26
Why does CCM3 run so slowly on HPUX?
  • Based on RADABS performance → should beat Linux
    but does not
  • Total system time → is allocate running slowly?
  • IBM 6 sec
  • IA64 (HPUX) 105 sec, 5% of total time
  • IA64 (Linux) 36 sec
  • Work in progress

27
Conclusions
  • IA64 architecture
  • offers performance competitive with IBM Power 3
    at an early stage of development
  • Intrinsic performance shows potential of
    architecture
  • Operating Systems
  • HPUX
  • Compilers usually generate faster code (except
    OMP)
  • Linux
  • Better development environment
  • Better compiler support

28
Conclusions (cont)
  • Summation
  • IA64 is not yet ready for production computing

29
RADABS Performance in Mflops

30
Intrinsic Function (Mfunc/sec)

31
3-D FFT

32
SEAM3D C56L16

33
CCM3 T42L16