The performance of NAMD on a large Power4 system - PowerPoint PPT Presentation

Learn more at: http://www.spscicomp.org


1
The performance of NAMD on a large Power4 system
  • Joachim Hein
  • EPCC, The University of Edinburgh

2
Measurement-based load balancing
  • NAMD measures its performance for the first 200 steps
  • It then redistributes the workload to optimise performance
  • Performance benefit for larger numbers of processors
  • Reported "Benchmark time": a better estimate of production-job performance from short jobs
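The measure-then-redistribute loop on slide 2 can be sketched with a simple greedy heuristic. Each unit of work is timed over the first steps. The units are then handed out heaviest first, each to the processor with the smallest load so far. This is a generic illustration under that assumption and not the strategy NAMD actually uses. The function `greedy_rebalance` and the sample timings are hypothetical.

```python
import heapq


def greedy_rebalance(measured_loads, n_procs):
    """Generic greedy (longest-processing-time-first) heuristic, NOT NAMD's
    own load balancer: assign each work unit, heaviest first, to the
    processor with the smallest accumulated load.

    measured_loads: dict mapping a work-unit id to its measured cost
    (e.g. averaged over the first steps of a run).
    Returns (assignment, per_proc_load).
    """
    # Min-heap of (accumulated load, processor id)
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {}
    per_proc_load = [0.0] * n_procs
    # Place the most expensive units first so small ones can fill the gaps
    for unit, load in sorted(measured_loads.items(),
                             key=lambda kv: kv[1], reverse=True):
        current, proc = heapq.heappop(heap)
        assignment[unit] = proc
        per_proc_load[proc] = current + load
        heapq.heappush(heap, (current + load, proc))
    return assignment, per_proc_load


if __name__ == "__main__":
    # Hypothetical per-unit timings (seconds per step) for eight work units
    loads = {"u0": 4.0, "u1": 3.0, "u2": 3.0, "u3": 2.0,
             "u4": 2.0, "u5": 1.0, "u6": 1.0, "u7": 1.0}
    assignment, per_proc = greedy_rebalance(loads, n_procs=3)
    print(per_proc)  # prints [6.0, 6.0, 5.0]
```

The resulting loads of 6, 6 and 5 are close to the ideal of 17/3 ≈ 5.67 per processor. Without measurements, an even split by unit count could leave one processor with far more work.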

3
Measurement-based load balancing
4
Load balance
  • Example: 128 CPUs, 96769 atoms, 32000 iterations
  • All but one CPU lie within a narrow window
  • The effect of the one slow CPU is negligible
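The picture on slide 4 can be reduced to two numbers. The first is the ratio of the slowest CPU's time to the mean, where 1.0 means perfect balance. The second is the list of CPUs lying outside a narrow window around the mean. The helper `imbalance_report`, the 2% window and the sample timings are hypothetical. They mimic the situation described (all but one CPU close together) rather than reproduce the measured data.

```python
from statistics import mean


def imbalance_report(cpu_times, window=0.02):
    """Summarise per-CPU busy times (hypothetical helper).

    Returns the max/mean ratio (1.0 = perfect balance) and the indices of
    CPUs whose time deviates from the mean by more than `window`
    (a fraction of the mean).
    """
    avg = mean(cpu_times)
    ratio = max(cpu_times) / avg
    outliers = [i for i, t in enumerate(cpu_times)
                if abs(t - avg) > window * avg]
    return ratio, outliers


if __name__ == "__main__":
    # Hypothetical: 127 CPUs at 100 s and one slightly slower CPU at 104 s
    times = [100.0] * 127 + [104.0]
    ratio, outliers = imbalance_report(times)
    print(f"max/mean = {ratio:.4f}, outliers = {outliers}")
```

In this example the slow CPU is flagged as an outlier. It is only about 4% slower than the rest, so the max/mean ratio stays close to 1.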

5
MP_EAGER_LIMIT
  • The environment variable MP_EAGER_LIMIT changes the behaviour of MPI
  • Messages smaller than MP_EAGER_LIMIT are sent immediately (eager protocol)
  • Messages larger than MP_EAGER_LIMIT are sent using a handshake (rendezvous protocol)
  • The default value is small and not optimal for NAMD

Tune it!
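A toy cost model illustrates why the limit matters. An eager message pays one network latency plus the transfer time. A rendezvous message first exchanges a request and an acknowledgement, modelled here as two extra one-way latencies. The function name, the latency and bandwidth values, and the 4096-byte "small" limit are illustrative assumptions. They are not HPCx measurements or the IBM default. The model treats a message exactly at the limit as eager.

```python
def transfer_time(size_bytes, eager_limit,
                  latency_s=20e-6, bytes_per_s=350e6):
    """Toy model of one point-to-point MPI message.

    latency_s and bytes_per_s are illustrative assumptions, not measured
    values. Returns (protocol, estimated time in seconds).
    """
    payload_s = size_bytes / bytes_per_s
    if size_bytes <= eager_limit:
        # Eager: the data is sent straight away
        return "eager", latency_s + payload_s
    # Rendezvous: request + acknowledgement before the data moves,
    # modelled as two extra one-way latencies
    return "rendezvous", 3 * latency_s + payload_s


if __name__ == "__main__":
    msg = 16 * 1024  # a hypothetical 16 KB message
    for limit in (4096, 65536):
        proto, t = transfer_time(msg, limit)
        print(f"MP_EAGER_LIMIT={limit}: {proto}, {t * 1e6:.1f} us")
```

Raising the limit removes the handshake for mid-sized messages. The usual price of eager sends is receive-side buffer space for messages that arrive before the matching receive is posted.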
6
MP_EAGER_LIMIT
7
Sample LoadLeveler script
  # @ shell = /bin/ksh
  # @ job_type = parallel
  # @ network.MPI = csss,shared,us
  # @ account_no = z001
  # @ output = namd_run.$(schedd_host)_$(jobid).out
  # @ error = namd_run.$(schedd_host)_$(jobid).err
  # @ wall_clock_limit = 00:30:00
  # @ node = 1
  # @ tasks_per_node = 8
  # @ queue
  export MP_SHARED_MEMORY=yes
  export MP_EAGER_LIMIT=65536
  poe path/namd2 inputfile
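One way to tune MP_EAGER_LIMIT is to submit the same short run several times with different limits and compare the reported benchmark time. The sketch below generates one LoadLeveler job file per limit, based on the sample script. The template, the `.ll` file names, the per-limit output names and the function name `make_job_scripts` are hypothetical. `path/namd2` and `inputfile` remain placeholders, as in the sample script.

```python
# Job template based on the sample script; {limit} also tags the output
# file names (a hypothetical naming scheme) so runs can be told apart.
TEMPLATE = """\
# @ shell = /bin/ksh
# @ job_type = parallel
# @ network.MPI = csss,shared,us
# @ account_no = {account}
# @ output = namd_eager{limit}.$(schedd_host)_$(jobid).out
# @ error = namd_eager{limit}.$(schedd_host)_$(jobid).err
# @ wall_clock_limit = {wall}
# @ node = {nodes}
# @ tasks_per_node = {tasks_per_node}
# @ queue
export MP_SHARED_MEMORY=yes
export MP_EAGER_LIMIT={limit}
poe {namd} {config}
"""


def make_job_scripts(limits, account="z001", wall="00:30:00", nodes=1,
                     tasks_per_node=8, namd="path/namd2", config="inputfile"):
    """Return {file name: job script text}, one job per eager limit."""
    return {
        f"namd_eager{limit}.ll": TEMPLATE.format(
            account=account, wall=wall, nodes=nodes,
            tasks_per_node=tasks_per_node, limit=limit,
            namd=namd, config=config)
        for limit in limits
    }


if __name__ == "__main__":
    scripts = make_job_scripts([4096, 16384, 65536])
    print(scripts["namd_eager65536.ll"])
```

Each script can be written to its own file and submitted with `llsubmit`. With `node = 1` and `tasks_per_node = 8`, each job fills one 8-processor LPAR of the present HPCx configuration.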

8
Benchmark
  • Joint AMBER-CHARMM (JAC) benchmark
  • Dihydrofolate reductase in water, 23558 atoms
  • www.scripps.edu/brooks/Benchmarks
  • Apo A-1 benchmark
  • Apolipoprotein A-1, 92224 atoms
  • www.ks.uiuc.edu/Research/apoa1
  • TCR peptide-MHC
  • 96796 atoms
  • www.hpcx.ac.uk/about/newsletter/HPCxNews02.pdf
  • F1-ATP synthase
  • F1 subunit of ATP synthase, 327506 atoms
  • www.sc-2002.org/paperpdfs/pap.pap277.pdf

9
The HPCx system
  • Presently
  • 40 IBM p690 Regatta H frames
  • 32 POWER4 processors per frame (1.3 GHz)
  • Frames subdivided into LPARs of 8 processors
  • 8 GB of main memory per LPAR
  • IBM SP Switch2 (Colony) network
  • 2 switch adapters per LPAR
  • Dual plane
  • Future (Summer 2004)
  • Upgrade to p690+ frames (1.7 GHz)
  • LPARs of 32 processors
  • IBM HPS (Federation) network
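The figures above imply a few totals that are useful when choosing job sizes. This is plain arithmetic on the numbers quoted for the present configuration. The frame count after the upgrade is not given, so no post-upgrade totals are computed.

```python
# Present HPCx configuration, as quoted on the slide
frames = 40
cpus_per_frame = 32
cpus_per_lpar = 8
mem_per_lpar_gb = 8
adapters_per_lpar = 2

lpars_per_frame = cpus_per_frame // cpus_per_lpar      # 4
total_cpus = frames * cpus_per_frame                   # 1280
total_lpars = frames * lpars_per_frame                 # 160
mem_per_cpu_gb = mem_per_lpar_gb / cpus_per_lpar       # 1.0
cpus_per_adapter = cpus_per_lpar // adapters_per_lpar  # 4

print(total_cpus, total_lpars, mem_per_cpu_gb, cpus_per_adapter)
```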

10
Time per step for 32 processors
  • dhf reductase (23558 atoms): NAMD 2.4 0.051 s, NAMD 2.5 0.032 s (too small for 32 CPUs)
  • APO A-1 (92224 atoms): NAMD 2.4 0.28 s, NAMD 2.5 0.19 s
  • TCR MHC (96796 atoms): NAMD 2.4 0.30 s, NAMD 2.5 0.21 s
  • F1-ATP (327506 atoms): 0.58 s
  • NAMD 2.5 is substantially faster than NAMD 2.4: 30-37% less time per step on the three benchmarks run with both versions
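The speed-up of NAMD 2.5 can be checked directly from the per-step times above. The same calculation also gives the number of atoms each of the 32 processors handles. F1-ATP is left out because only one time is given for it. The function name `compare_versions` is hypothetical.

```python
# Seconds per step on 32 processors, from the slide:
# benchmark -> (atoms, NAMD 2.4 time, NAMD 2.5 time)
ROWS = {
    "dhf reductase": (23558, 0.051, 0.032),
    "APO A-1": (92224, 0.28, 0.19),
    "TCR MHC": (96796, 0.30, 0.21),
}


def compare_versions(rows, n_cpus=32):
    """Return {benchmark: (speed-up, fraction of step time saved, atoms per CPU)}."""
    result = {}
    for name, (atoms, t24, t25) in rows.items():
        result[name] = (t24 / t25, 1 - t25 / t24, atoms / n_cpus)
    return result


if __name__ == "__main__":
    for name, (speedup, saved, per_cpu) in compare_versions(ROWS).items():
        print(f"{name:14s} {speedup:.2f}x faster, {saved:.0%} less time, "
              f"{per_cpu:.0f} atoms/CPU")
```

For dhf reductase each processor handles only about 736 atoms. This is consistent with the remark that this benchmark is too small for 32 CPUs.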

11
Large number of processors
12
Further Reading
  • Full technical report
  • The performance of NAMD on HPCx
  • Joachim Hein
  • www.hpcx.ac.uk/research/hpc/technical_reports/HPCxTR0310.pdf