Title: Computing Workshop for Users of NCAR's SCD machines
1. Computing Workshop for Users of NCAR's SCD machines
Christiane Jablonowski (cjablono@ucar.edu), NCAR ASP/SCD, 31 January 2006
ML: Mesa Lab, Chapman Room (video conference facilities); FL: EOL Atrium; CG1 3150
2. Overview
- Current machine architectures at NCAR (SCD)
- Some basics on parallel computing
- Batch queuing systems at NCAR
- GAU resources: how to obtain a GAU account
- Insights into GAU charges
- The Mass Storage System
- How to monitor the GAUs
- Some practical tips on benchmarks, debugging tools, restarts
- ???
3. Computer architectures
- SCD's machines are UNIX-based parallel computing architectures
- Two types:
  - Hybrid (shared and distributed memory) machines like bluesky (IBM Power4), bluevista (IBM Power5), lightning (AMD Opteron system running Linux)
  - Shared memory system like tempest (SGI, 128 CPUs), predominantly used for post-processing jobs
4. Parallel Programming
- Parallel machines require parallel programming techniques in the user application
- MPI (Message Passing Interface) for distributed memory systems; can also be used on shared memory systems
- OpenMP for shared memory systems
- Hybrid (MPI + OpenMP) programming technique common on the IBMs at NCAR
- Pure MPI parallelization is often the fastest option; the computational domain is split into pieces that communicate over the network (via messages)
- OpenMP: parallelization of (mostly) loops via compiler directives
- Parallelization provided in CAM/CCSM/WRF
5. Most common hybrid hardware architectures
- Combined shared and distributed memory architecture
- Shared-memory symmetric multiprocessor (SMP) nodes; processors on a node have direct access to memory
- Nodes are connected via the network (distributed memory)
6. MPI example
Processors communicate via messages
7. MPI Example
- Initialize and finalize MPI in your program via function/subroutine calls to the MPI library. Examples include MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
- Example from the previous page in C notation (unoptimized)
- Important to note: such an operation (computing a global sum) is very common, therefore MPI provides a highly optimized function, a so-called reduction operation, MPI_Reduce(), that can replace the example above
8. Example domain decompositions for MPI
Each color represents a processor
9. OpenMP Example
- Parallel loops via compiler directives (here in Fortran notation)
- Before the program is called, set:
  setenv OMP_NUM_THREADS proc
- Add compiler directives in your code:
  !$OMP PARALLEL DO
  DO i = 1, n
    a(i) = b(i) + c(i)
  END DO
  !$OMP END PARALLEL DO
- (Fork-join diagram: master thread forks a team of threads, which join back into the master thread)
- Assume n = 1000 and proc = 4: the loop will be split into 4 threads that run in parallel with loop indices 1-250, 251-500, 501-750, 751-1000
10. SCD's machines
- Bluesky (web page)
  - Oldest machine at NCAR (2002)
  - Lots of user experience at NCAR, easy access to help
  - CAM/CCSM/WRF are set up for this architecture (Makefiles)
  - Batch queuing system: LoadLeveler; short interactive runs possible
  - Batch queues are listed under http://www.cisl.ucar.edu/computers/bluesky/queue.charge.html
  - Lots of additional software available, e.g. math libraries, graphics packages, Totalview debugger
11. SCD's machines
- Bluevista (web page)
  - Newest machine on the floor (Jan. 2006)
  - CAM/CCSM/WRF are (probably) set up for this architecture
  - Batch queuing system: LSF (Load Sharing Facility)
  - Queue names different from bluesky: premium, regular, economy, standby, debug, share
    http://www.cisl.ucar.edu/computers/bluevista/queue.charge.html
  - Some additional software available, e.g. math libraries, Totalview debugger
12. SCD's machines
- Lightning (web page)
  - Linux cluster
  - Compilers different from the IBMs: Portland Group or Pathscale
  - Batch queuing system: LSF
  - Same queue names as bluevista
  - Some support software
- Tempest (web page)
  - For data post-processing, with yet another batch queuing system: NQS
  - Lots of support software
  - Interactive use possible
13. Example of a LoadLeveler job script
Parallel job with 32 MPI processes, com_reg32 queue (32-way node):

# @ class            = com_reg32
# @ node             = 1
# @ tasks_per_node   = 32
# @ output           = out.$(jobid)
# @ error            = out.$(jobid)
# @ job_type         = parallel
# @ wall_clock_limit = 00:20:00
# @ network.MPI      = csss,not_shared,us
# @ node_usage       = not_shared
# @ account_no       = 54042108
# @ ja_report        = yes
# @ queue
setenv OMP_NUM_THREADS 1

- com_reg32: regular queue, 32 MPI processes per 32-way node
- Submit the job via: llsubmit job_script
14. Example of a LoadLeveler job script
Hybrid parallel job with 8 MPI processes and 4 OpenMP threads:

# @ class            = com_ec32
# @ node             = 1
# @ tasks_per_node   = 8
# @ output           = out.$(jobid)
# @ error            = out.$(jobid)
# @ job_type         = parallel
# @ wall_clock_limit = 00:20:00
# @ network.MPI      = csss,not_shared,us
# @ node_usage       = not_shared
# @ account_no       = 54042108
# @ ja_report        = yes
# @ queue
setenv OMP_NUM_THREADS 4

- com_ec32: economy queue, 8 MPI processes per 32-way node
- Submit the job via: llsubmit job_script
15. Example of an LSF job script (lightning)
Parallel job with 8 MPI processes (on 4 2-way nodes):

#!/bin/csh
#BSUB -a 'mpich_gm'          # select mpich_gm on lightning
#BSUB -P 54042108            # ASP project/account number
#BSUB -q regular             # regular queue
#BSUB -W 0:30                # wallclock limit 30 min
#BSUB -x                     # exclusive use (not shared)
#BSUB -n 8                   # 8 MPI processes (total)
#BSUB -R "span[ptile=2]"     # 2 MPI processes per node
#BSUB -o fvcore_amr.out.%J
#BSUB -e fvcore_amr.err.%J
#BSUB -J test0.path          # name of the job (listed in the SCD Portal)

mpirun.lsf -v ./dycore

Submit the job via: bsub < job_script
16. Example of an LSF job script (bluevista)
Parallel job with 8 MPI processes (on 1 8-way node):

#!/bin/csh
#BSUB -a poe                 # select poe on bluevista
#BSUB -P 54042108            # ASP project/account number
#BSUB -q economy             # economy queue
#BSUB -W 0:30                # wallclock limit 30 min
#BSUB -x                     # exclusive use (not shared)
#BSUB -n 8                   # 8 MPI processes (total)
#BSUB -R "span[ptile=8]"     # allows up to 8 MPI processes on a node
#BSUB -o fvcore_amr.out.%J
#BSUB -e fvcore_amr.err.%J
#BSUB -J test0.path          # name of the job (listed in the SCD Portal)

mpirun.lsf -v ./dycore

Submit the job via: bsub < job_script
17. More information on SCD's machines
- Web page: SCD's Support and Consulting services
- SCD's customer support: sometimes you even get help on the weekends or in the evenings
- Email: consult1@ucar.edu
- Phone: 303 497 1278
- Walk-in support at the Mesa Lab
- Check out SCD's Daily Bulletin (scheduled machine downtimes, etc.)
- Subscribe to the hpcstatus mailing list (short e-mails about machine status, system updates)
18. GAU resources
- ASP has a monthly allocation of 3850 GAUs (General Accounting Units)
- A GAU is a measure of compute time on the supercomputers maintained by NCAR's Scientific Computing Division (SCD): http://www.cisl.ucar.edu/
- Access to these machines requires:
  - an SCD login account (dbs@ucar.edu or 303-497-1225)
  - a GAU account (for ASP contact Maura, otherwise contact your division / apply for a university account)
  - an ssh environment
  - and a crypto card (for secure access)
- SCD contacts: Dick Valent, Mike Page (here today), Juli Rew, Siddhartha Gosh, Ginger Caldwell (GAUs)
19. GAU resources
- GAUs: a "use it or lose it" strategy
- In ASP: we share the resource among the ASP postdocs and graduate fellows
- Distribution is flexible and will be discussed occasionally, e.g. monthly, either via meetings or e-mail discussions: email asp-gau-users@asp.ucar.edu
- GAUs are also charged for:
  - storing files in the Mass Storage System (MSS)
  - file transfers from the MSS to other machines
20. ASP GAU account
- ASP GAU account number: 54042108 (also the project number)
- Needs to be specified in the batch job scripts
- The ASP account number is not your default account number
- Therefore everybody needs a second (default) GAU account:
  - divisional GAU account
  - so-called University account (small request form for 1500 GAUs: http://www.cisl.ucar.edu/resources/compServ.html); these GAUs do not expire every month, one-time allocation
- The second GAU account should be used for the accumulating MSS charges
  - automatic when using the CAM / CCSM MSS option
21. GAU charges on SCD's supercomputers
- You are charged GAUs for how much time you use a processor (on bluesky, bluevista, lightning, tempest)
- On bluesky, there are actually two formulas:
  - Shared-node usage:
    GAUs charged = CPU hours used × computer factor × class charging factor
  - Dedicated-node usage:
    GAUs charged = wallclock hours used × number of nodes used × number of processors in that node × computer factor × class charging factor

Slides on GAU charges modified from an earlier presentation by George Bryan, NCAR MMM
22. Number of nodes used and number of processors in that node
- Self-explanatory (?)
- Bluesky:
  - 76 8-way (processor) nodes
  - 25 32-way (processor) nodes
- Bluevista:
  - 78 8-way (processor) nodes
- Lightning:
  - 128 2-way (processor) nodes
23. CPU hours used and wallclock hours used
- A measure of how long you used a processor
- NOTE: this includes all time you were allocated the use of a processor, whether you actually used it or not
- Example: you used two 8-processor nodes on bluesky. The job started at 1:00 PM and finished at 2:30 PM. You are charged for 1.5 hrs
24. Computer factor
- A measure of how powerful a computer is:
  - Bluesky: 0.24
  - Bluevista: 0.5
  - Lightning: 0.34
- This levels the playing field
25. Class charging factor
- Tied to the queuing system: how quickly do you want your results, and how much are you willing to pay for it?
- Current setting on all SCD supercomputers:
  - Premium: 1.5 (highest priority, fastest turnaround)
  - Regular: 1.0
  - Economy: 0.5
  - Standby: 0.1 (lowest priority, slow turnaround)
26. Example
- Recall dedicated-node usage on bluesky:
  GAUs charged = wallclock hours used × number of nodes used × number of processors in that node × computer factor × class charging factor
- 1.5 hours using two 8-processor nodes
- Bluesky, regular queue:
  GAUs used = 1.5 × 2 × 8 × 0.24 × 1.0 = 5.76 GAUs
- In the premium queue, this would be 8.64 GAUs
- In the standby queue, this would be 0.576 GAUs
27. Recommendations: queuing systems
- Check the queue before you submit any job
- If the queue is not busy, try using the standby or economy queues
- The queues tend to be emptier evenings, weekends, and holidays
- A job will start sooner when you specify a wallclock limit in the job script (the scheduler tries to squeeze in short jobs)
- The fewer processors you request, the sooner you start
- Use the premium queue sparingly:
  - Short debug jobs (there is also a special debug queue on lightning)
  - When that conference paper is due
28. Recommendations: number of processors vs. run times
- If you use more processors, you might wait longer in the queue, but usually the actual runtime of your job is reduced
- Caveat: it usually costs more GAUs
- Example: you run the same job, but
  - using 8 processors, the job ran in 24 hours
  - using 64 processors, the job ran in 4 hours
  - the 1st example used 46 GAUs
  - the 2nd example used 61 GAUs
29. The Mass Storage System
- MSS: Mass Storage System (disks and cartridges) for your big data sets
- The MSS is connected to the SCD machines, sometimes also to divisional computers
- MSS users have directories like mss/LOGIN_NAME/
- Quick online reference (MSS commands): http://www.cisl.ucar.edu/docs/mss/mss-commandlist.html
- You are charged GAUs for using the MSS
- The GAU equation for the MSS is more complicated ....
30. MSS Charges
- GAUs charged = 0.0837 × R + 0.0012 × A + N × (0.1195 × W + 0.2050 × S)
- where:
  - R = gigabytes read
  - W = gigabytes created or written
  - A = number of disk drive or tape cartridge accesses
  - S = data stored, in gigabyte-years
  - N = number of copies of the file: 1 if economy reliability is selected, 2 if standard reliability is selected
31. Recommendations: the MSS
- MSS charges seem small, but they add up!
- Examples (FY04 MSS usage):
  - ACD: 24,000 of 60,000 GAUs
  - CGD: 94,500 of 181,000 GAUs
  - HAO: 22,000 of 122,000 GAUs
  - MMM: 34,000 of 139,000 GAUs
  - RAP: 32,000 of 35,000 GAUs
32. Recommendations: the MSS
- Recommendation for ASP users:
  - use an account in your home division or your so-called university account (1500 GAUs for postdocs, you need to apply) for MSS charges
  - leave ASP GAUs for supercomputing
33. GAU usage strategy: 30-day and 90-day averages
- The allocation actually works through 30-day and 90-day averages
- Limits: 120% for 30-day use, 105% for 90-day use
- It is helpful to spread usage out evenly
- How to check GAU usage:
  - Type charges on the command line of a supercomputer
  - Check the daily summary output (next page)
  - SCD Portal: look for the link on SCD's main page http://www.cisl.ucar.edu/
34. Web page: http://www.cisl.ucar.edu/dbsg/dbs/ASP/
Example daily summary output:

  ASP 30 Day Percent:  57.0      ASP 90 Day Percent:  48.3
  30 Day Allocation:   3850      90 Day Allocation:   11550
  30 Day Use:          2193      90 Day Use:          5575
  90 DAY ST: 01-NOV-05    30 DAY ST: 31-DEC-05    LAST DAY: 29-JAN-06

  ASP GAUs Used by Day:
  01-NOV-05      9.36
  03-NOV-05       .03
  04-NOV-05    141.45
  22-JAN-06       .04
  23-JAN-06     44.29
  24-JAN-06    170.83
  25-JAN-06    120.30
  26-JAN-06     91.67
  27-JAN-06     41.97
  28-JAN-06     15.59
  29-JAN-06     16.95
35. What happens when we use too many GAUs?
- Your jobs will be thrown into a very low priority: the dreaded hold queue
- It will be hard to get work done
- But jobs will still run
- ASP users: you can use more than 3850 GAUs / month
- Experience says it's better to use too many than not enough
36. What happens when we use too many/too few GAUs?
- Too many:
  - Recommendation: when the 30- and 90-day averages are running high, use the economy or standby queue ... conserve GAUs
  - But don't worry about going over
- Too few:
  - ASP's allocation will be cut in the long run if the 3850 GAUs per month allocation is not used
37. How to catch up when behind
- Be wasteful:
  - Use the premium queue
  - Use more processors than you need
- Have fun:
  - Try something you always wanted to do, but never had the resources
38. How to conserve GAUs
- Be frugal:
  - Use the economy and standby queues
  - Use fewer processors
  - Use divisional GAUs (if possible) or your university GAU account
39. How to share and monitor GAUs in ASP
- Communicate!
- Occasionally, we (ASP postdocs) use the e-mail list asp-gau-users@asp.ucar.edu
  - to announce a busy GAU period
- Keep watching the ASP GAU usage on the webpage http://www.cisl.ucar.edu/dbsg/dbs/ASP/ or in the SCD Portal
- Look for the SCD Portal link on the SCD page http://www.cisl.ucar.edu/
40. SCD Portal
- Online tool that helps you monitor the GAU charges and the current machine status (e.g. batch queues); the display can be customized
- Information on the machine status requires a setup command on roy.scd.ucar.edu via crypto-card access: just enter scdportalkey hostname (e.g. lightning) after logging on with the crypto card
- At this time (Jan/31/2006) the GAU charges on bluevista are not itemized; this will be included in the next release in Spring 2006
41. Other IBM resources
- Sources of information on the IBM machine bluesky (from the command line); batchview also works on bluevista and lightning:
  - batchview: overview of jobs with their rankings
  - llq: list of all submitted jobs, no ranking
  - spinfo: queue limits, memory quotas on the home file system and the temporary file system /ptmp
- Useful IBM LoadLeveler keywords in the script:
  - @ account_no = 54042108  -> ASP account
  - @ ja_report = yes        -> job report (see example on the next page)
- Useful LoadLeveler commands: llsubmit script_file, llcancel job_id
42. Example IBM Job Report
- If selected, one email per job is sent to you at midnight. Output on the IBM machines, here blackforest (meanwhile decommissioned):

  Job Accounting - Summary Report
  Operating System:               blackforest AIX51
  User Name (ID):                 cjablono (7568)
  Group Name (ID):                ncar (100)
  Account Name:                   54042108
  Job Name:                       bf0913en.26921
  Job Sequence Number:            bf0913en.26921
  Job Starts:                     12/20/04 17:56:33
  Job Ends:                       12/20/04 23:26:34
  Elapsed Time (Wall-Clock CPU):  633632 s
  Number of Nodes (not_shared):   8
  Number of CPUs:                 32
  Number of Steps:                1
43. IBM Job Report (continued)

  Charge Components:
  Wall-clock Time:                      5:30:01
  Wall-clock CPU hours:                 176.00889 hrs
  Multiplier for com_ec Queue:          0.50
  Charge before Computer Factor:        88.00444 GAUs
  Multiplier for computer blackforest:  0.10
  Charged against Allocation:           8.80044 GAUs
  Project GAUs Allocated:               5000.00 GAUs
  Project GAUs Used, as of 12/16/04:    1889.20 GAUs
  Division GAUs 30-Day Average:         103.3
  Division GAUs 90-Day Average:         58.6
44. How to increase efficiency
- Get a feel for the GAUs: for long jobs, benchmark the application on the target machine
- Run a short but relevant test problem and measure the run time (wall-clock time) via MPI commands (function MPI_WTIME) or UNIX timing commands like time or timex (output formats are shell-dependent)
- Vary the number of processors to assess the scaling
- If the application scales poorly, avoid using a large number of processors (a waste of GAUs); instead use a smaller number with numerous restarts
- Make sure your job fits into the queue (finishes before the max. time is up)
- Use compiler options, especially the optimization options
- In case of programming problems the Totalview debugger can save you days, weeks or even months; on the IBMs, compile your program with the compiler options -g -qfullpath -d
45. Restarts
- Restart files are important for long simulations
- Queue limits are up to 6 wallclock hours (hard limit, the job fails afterwards); then a restart becomes necessary
- Get information on the queue limits (SCD web page) and select the job's integration time accordingly
- Restarts are built into CAM/CCSM/WRF and must only be activated
- Restarts for other user applications must probably be programmed
46. Questions?