Title: Getting Started with HPC Platforms at U.Va., October 2003. Presented by the ITC Research Computing Support Group
1 Getting Started with HPC Platforms at U.Va.
October 2003
Presented by the ITC Research Computing Support Group
Kathy Gerber, Ed Hall, Katherine Holcomb, Tim F. Jost Tolson
- Using HPC Platforms at UVa: Wednesday, November 5 at 3:30 PM
- Introduction to Mathematica 5.0: Wednesday, November 12 at 3:30 PM
2 Getting Started With HPC Platforms at UVa
- Katherine Holcomb
- ITC Research Computing Support Group
- res-consult@virginia.edu
3 Topics
- Setting up Your Environment
- File Management in Unix
- Program Development
- High Performance Platforms
- Parallel Programming
4 Setting Up Your Local Environment Under Windows
- SecureCRT, SecureFX for shell access
- Notetab Light for local editing
- X server to run X applications remotely (optional)
- Hummingbird eXceed -- some support
- Cygwin/XFree86 -- free, no support
5 Unix Editors: Screen-oriented Editors
- Vi (Vim on Linux) www.thomer.com/vi/vi.html
- Emacs
- www.lib.uchicago.edu/keith/tcl-course/emacs-tutorial.html
6 Unix Editors: GUI
- nedit (requires X server)
- www.itc.virginia.edu/research/nedit.html
- Pico
- www.itd.umich.edu/itcsdocs/r1168/
- Jove www.itc.virginia.edu/desktop/unix/docs/u003.jove.html
7 File Management in Unix
- Primarily command-line
- Inverted tree hierarchy
[Figure: directory tree. / (root) at the top, with /home, /bin, /usr, and /uva beneath it; a user directory "you" containing dir1, dir2, and dir3]
8 Listing Files
- . represents the current directory
- .. represents the parent directory (one level up in the tree)
- ls -- list files
- ls -a -- also lists invisible files (names starting with a dot)
- ls -l -- lists with more information
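The listing commands above can be tried safely in a scratch directory. This is only a minimal sketch; the names notes.txt, .hidden, and dir1 are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical files: one ordinary, one "invisible" (name starts with a dot).
touch notes.txt .hidden
mkdir dir1

plain=$(ls)       # ordinary listing: dot files are left out
all=$(ls -a)      # also shows ., .., and .hidden
long=$(ls -l)     # one line per entry: permissions, owner, size, date

echo "$plain"
echo "$all"
echo "$long"
```

Comparing the first two listings shows which entries ls hides by default.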
9 Moving and Copying Files
- mv -- move (or rename) a file
- mv file1 file2
- cp -- copy a file
- cp file1 file2
- cp dir2/file1 dir3/file
- cp -R dir1 dir2/newdir
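A short sketch of the move and copy commands, run in a scratch directory. The directory layout follows the slide; the file contents and inner.txt are hypothetical.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical layout: three directories and one file.
mkdir dir1 dir2 dir3
echo "data" > file1
echo "more" > dir1/inner.txt

mv file1 file2              # rename: file1 is gone, file2 holds its contents
cp file2 dir2/file1         # copy into dir2 under a different name
cp dir2/file1 dir3/file     # copy between directories
cp -R dir1 dir2/newdir      # recursive copy of a directory and its contents

ls -R
```

Note that mv leaves only one copy behind, while cp leaves the original in place.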
10 Deleting Files
- rm file1 file2
- Warning! This command does not automatically ask whether you want to delete the file(s)!
- rm -i file1
- will ask for confirmation
- rm file*.txt -- wildcard
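The difference between plain rm, rm -i, and a wildcard can be seen in a scratch directory. This is a sketch with made-up file names; the answer to the rm -i prompt is piped in here only so the example runs unattended.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch file1 file2 file3 fileA.txt fileB.txt keep.txt

rm file1 file2            # deleted immediately, no questions asked

# rm -i prompts before each removal. The answer "n" is piped in,
# so file3 survives. Interactively you would type y or n yourself.
echo n | rm -i file3

rm file*.txt              # wildcard: matches fileA.txt and fileB.txt only

ls
```

Running ls with the same pattern first (ls file*.txt) is a cheap way to preview what a wildcard will delete.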
11 More Detailed Unix Information
- http://www.itc.virginia.edu/research/unixbasics.html
- http://www.itc.virginia.edu/research/unixtools.html
- http://incres.anu.edu.au/manuals/korn.html
- http://www.kitebird.com/csh-tcsh-book
12 Program Development
- Compilation
- Makefiles
- Debugging
- Checkpointing
13 Compiling Your Program
- Example using PGI Fortran
- pgf90 -O mycode.f90
- Produces an executable named a.out
- pgf90 -g mycode.f90
- Adds symbols for debugger
- pgf90 -O -o myexec mycode.f90
- Executable is named myexec
14 Compiling with Separate Linking
- pgf90 -c -O mycode.f90
- pgf90 -c -O mysub.f90
- To link:
- pgf90 -o myexec mycode.o mysub.o
15 Using Libraries
- pgf90 -O -I/usr/local/somecode/inc -c mycode.f90
- pgf90 -O -c mysub.f90
- pgf90 -L/usr/local/somecode/lib -o myexec mycode.o mysub.o -lsomelib
- In this example, the library linked must be of the form /usr/local/somecode/lib/libsomelib.a
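The naming rule in the last bullet can be illustrated with a small shell function. This is a sketch, not what the linker actually runs: find_static_lib is a hypothetical helper, a temporary directory stands in for /usr/local/somecode/lib, and only the static (.a) case from the slide is covered.

```shell
# Stand-in for /usr/local/somecode/lib (the real path exists only on the cluster).
libdir=$(mktemp -d)
touch "$libdir/libsomelib.a"

# find_static_lib NAME DIR... prints the first DIR/libNAME.a that exists,
# mimicking how -lNAME is matched against the -L directories in order.
find_static_lib() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/lib$name.a" ]; then
            echo "$dir/lib$name.a"
            return 0
        fi
    done
    return 1
}

found=$(find_static_lib somelib /nonexistent "$libdir")
echo "-lsomelib resolves to $found"
```

If no directory contains a matching file, the function returns a non-zero status, much as the link step would fail with an unresolved library.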
16 Available Compilers
- ITC Supported Unix Compilers
- www.itc.virginia.edu/research/compilers.html
- ITC Supported Linux Compilers
- www.itc.virginia.edu/research/pgi/
- www.itc.virginia.edu/research/intel/
17 Makefiles
- Automates compilation of programs
- Shortens compilation by keeping track of what has been changed and needs recompiling
- Simplifies inclusion of multiple compiler flags, e.g. for debugging and optimization
- www.itc.virginia.edu/research/make.html
18 Example Makefile
# Each command line below a target must begin with a tab.
prog: main.o sub1.o sub2.o
	ifc -o prog main.o sub1.o sub2.o

main.o: main.f90 defs.h
	ifc -c -O main.f90

sub1.o: sub1.f90
	ifc -c -O sub1.f90

sub2.o: sub2.f90
	ifc -c -O sub2.f90
19 Using make
- make target
- Rebuilds the target and any of its dependencies that are out of date
- make
- Builds the first target encountered
- make -f makefile1
- Uses makefile1 instead of the default makefile
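To see make's dependency tracking in action without a Fortran compiler, the sketch below uses cp and cat as stand-ins for the compile and link steps. The stand-in recipes and the file contents are assumptions of this demo; only the make commands themselves come from the slide.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Stand-in "sources". cp and cat play the role of compiler and linker
# so the sketch runs anywhere make is installed.
echo "main" > main.f90
echo "sub1" > sub1.f90

# Recipe lines in a makefile must start with a tab, written here as \t.
printf 'prog: main.o sub1.o\n\tcat main.o sub1.o > prog\n\n' > makefile1
printf 'main.o: main.f90\n\tcp main.f90 main.o\n\n' >> makefile1
printf 'sub1.o: sub1.f90\n\tcp sub1.f90 sub1.o\n' >> makefile1

if command -v make >/dev/null 2>&1; then
    make -f makefile1            # first target (prog): builds everything
    second=$(make -f makefile1)  # nothing changed, so nothing is rebuilt
    sleep 1
    touch sub1.f90               # pretend sub1.f90 was edited
    third=$(make -f makefile1)   # only sub1.o and prog are rebuilt
    echo "$second"
    echo "$third"
else
    echo "make is not installed; skipping the demo"
fi
```

The third run repeats the sub1 and link steps but leaves main.o alone, which is the time saving the Makefiles slide describes.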
20 Debugging Software
- Must compile with debug flag (usually -g), which on many compilers disables optimization
- dbx or similar is found on most Unix systems
- www.itc.virginia.edu/research/debug.html
- gdb is the GNU counterpart of dbx for gcc, g++ and g77
- TotalView for serial and parallel programs
- www.itc.virginia.edu/research/totalview/
21 Checkpointing
- Prevents losing computation results due to premature program termination resulting from a machine crash or CPU time limits.
- Periodically save program state (variables) to a file that can later be read back into the program so that computation can proceed from that state.
- Allows monitoring progress of running program
- Especially useful for parallel programs
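The save-and-resume idea can be sketched in a few lines of shell. This toy example sums the integers 1 to 100 and checkpoints every 10 steps; the file name state.chk, the function run_sum, and the step counts are all illustrative, and a real code would save its own variables in whatever format suits it.

```shell
workdir=$(mktemp -d)
chk="$workdir/state.chk"

# run_sum LAST_STEP: resume from the checkpoint if there is one,
# then continue up to LAST_STEP, checkpointing every 10 steps.
run_sum() {
    last=$1
    i=0
    total=0
    if [ -f "$chk" ]; then
        read -r i total < "$chk"          # restore the saved variables
        echo "resuming from step $i" >&2
    fi
    while [ "$i" -lt "$last" ]; do
        i=$((i + 1))
        total=$((total + i))
        if [ $((i % 10)) -eq 0 ]; then
            # Write to a temporary file, then rename, so a crash during
            # the write cannot leave a half-written checkpoint.
            echo "$i $total" > "$chk.tmp"
            mv "$chk.tmp" "$chk"
        fi
    done
    echo "$total"
}

run_sum 57 > /dev/null     # first run "dies" at step 57; last checkpoint is step 50
result=$(run_sum 100)      # second run restarts from step 50, not from 0
echo "$result"
```

Looking at the checkpoint file while the job runs (cat state.chk) also gives the progress monitoring mentioned above.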
22 HPC Hardware
- Single processor
- Shared memory (SMP)
- Distributed memory
23 Single Processor System
24 Shared Memory System
25 Distributed Memory System
[Figure: Main Memory, Cache, and CPU blocks (three of each) joined by a High-Speed Interconnect]
26 ITC HPC Resources
- IBM SMP: 4 Power3 nodes with 12 GB total memory.
- http://www.itc.virginia.edu/research/ibm/smp/
- Unixlab Cluster: 54 SGI and Sun workstations.
- http://www.itc.virginia.edu/research/unixlab-account.html
27 ITC Linux Clusters: Aspen
- Consists of 48 dual-processor nodes: AMD K7 Athlon MP 1800+, 1.533 GHz.
- Each node has 1 GB RAM
- Gigabit Ethernet interconnect
- www.itc.virginia.edu/research/linux-clusters/aspen
28 ITC Linux Clusters: Birch
- Consists of 32 dual-processor nodes: Intel Xeon Pentium 4, 2.4 GHz.
- Each node has 2 GB RAM
- Gigabit Ethernet interconnect
- Low-latency Myrinet interconnect
- www.itc.virginia.edu/research/linux-clusters/birch
29 Portable Batch System
- Queueing system and resource manager
- Job scripts are prepared on the frontend and submitted to the queue manager with the command qsub
30 Sample PBS Script
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m bea
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
./myexec args
31 Submitting to PBS
- Script can be called by any name; the convention is to end with .sh or .sub, e.g. script.sh
- Submit with
- >lc0 qsub script.sh
- Check status
- >lc0 qstat -a
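When many similar jobs are needed, the job script itself can be written by a few lines of shell and checked before submission. The sketch below is one possible approach, not part of PBS: the variable names and values are hypothetical, and sh -n only checks shell syntax, not whether the PBS directives are accepted.

```shell
# Hypothetical settings; adjust to your own job.
nodes=1
ppn=1
walltime=12:00:00
script=$(mktemp -d)/script.sh

# The here-document fills in the resource values now, while the
# backslash keeps $PBS_O_WORKDIR unexpanded until the job runs.
cat > "$script" <<EOF
#!/bin/sh
#PBS -l nodes=${nodes}:ppn=${ppn}
#PBS -l walltime=${walltime}
#PBS -j oe
cd \$PBS_O_WORKDIR
./myexec args
EOF

sh -n "$script" && echo "syntax OK: $script"   # parse only, run nothing

# On the cluster frontend you would then submit and check status with:
#   qsub script.sh
#   qstat -a
```

Printing the generated file with cat before submitting is a quick sanity check on the directives.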
32 Using Local Storage
- Aspen and Birch have /bigtmp partitions on their frontends
- Each node has local storage
- PBS assigns a local directory to each job
- /state/partition1/pbstmp.<jobid>
- At the end of the job, PBS copies all files in /state/partition1/pbstmp.<jobid> to /bigtmp/pbstmp.<jobid>
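33 PBS Script Using Local Disk
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o output_filename
# Per-job directory on the node's local disk
LS=/state/partition1/pbstmp.$PBS_JOBID
cd $LS
# Copy from the submission directory (assumed to hold myexec and input)
cp $PBS_O_WORKDIR/myexec .
cp $PBS_O_WORKDIR/input .
./myexec input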
34 PBS Script Using Local Disk: Another Method
#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -o output_filename
LS=/state/partition1/pbstmp.$PBS_JOBID
# The job starts in your home directory, so myexec and input are copied from there
cp myexec $LS
cp input $LS
cd $LS
./myexec input
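The stage-in, run, and copy-back flow can be rehearsed off the cluster. In this sketch, temporary directories stand in for /state/partition1 and /bigtmp, the job id is made up, and a two-line shell script plays the part of myexec; on Aspen or Birch, PBS itself creates the local directory and performs the final copy.

```shell
# Stand-ins for this demo only.
local_root=$(mktemp -d)      # plays /state/partition1 (node-local disk)
bigtmp=$(mktemp -d)          # plays /bigtmp (frontend)
submit_dir=$(mktemp -d)      # where myexec and input live
PBS_JOBID=${PBS_JOBID:-1234.demo}    # hypothetical job id

# A toy "program" that counts the lines of its input file.
printf '#!/bin/sh\nwc -l < "$1" > result.out\n' > "$submit_dir/myexec"
chmod +x "$submit_dir/myexec"
printf 'a\nb\nc\n' > "$submit_dir/input"

# What PBS does before the job: create the per-job local directory.
LS=$local_root/pbstmp.$PBS_JOBID
mkdir -p "$LS"

# What the job script does: stage in, then run on local disk.
cp "$submit_dir/myexec" "$submit_dir/input" "$LS"
cd "$LS" || exit 1
./myexec input

# What PBS does after the job: copy everything to the bigtmp area.
cp -R "$LS" "$bigtmp/"
ls "$bigtmp/pbstmp.$PBS_JOBID"
```

The output file ends up next to the executable and input in the copied directory, which is where to look for results after a real job finishes.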
35 Parallel Programming
- Simultaneous use of multiple compute resources
- Saves wall-clock time, solves bigger problems
- www.llnl.gov/computing/tutorials/workshops/workshop/parallel_comp/
36 MPI
- MPI is a library of functions that can be called by a user's code to pass information between processes.
- The MPI library consists of over 200 functions; in general, only a small subset of these is used in any code.
- MPI can be used with Fortran, C and C++.
37 PBS Script for MPI Job
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=12:00:00
#PBS -o output_filename
#PBS -j oe
#PBS -m abe
#PBS -M userid@virginia.edu
cd $PBS_O_WORKDIR
source /opt/Modules/default/init/sh
module add mpich-eth-intel
mpiexec -comm mpich-p4 pmyexec
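A request of nodes=4:ppn=2 asks for 4 nodes with 2 processors each. Inside a job, PBS lists the allocation in the file named by $PBS_NODEFILE, one line per processor, and mpiexec is assumed here to start one process per line. The sketch below builds a stand-in node file with made-up node names so the counting can be tried outside a job; inside a real job you would read $PBS_NODEFILE instead.

```shell
# Stand-in for $PBS_NODEFILE as written for nodes=4:ppn=2:
# each node name appears once per allocated processor.
nodefile=$(mktemp)
for node in node01 node02 node03 node04; do
    echo "$node" >> "$nodefile"
    echo "$node" >> "$nodefile"
done

nprocs=$(wc -l < "$nodefile" | tr -d ' ')          # total processors
nnodes=$(sort -u "$nodefile" | wc -l | tr -d ' ')  # distinct nodes

echo "$nnodes nodes x $((nprocs / nnodes)) processors = $nprocs MPI processes"
```

Echoing these counts at the top of a job script is an easy way to confirm that the job received the resources it asked for.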
38 Upcoming Talk
- Introduction to Mathematica 5.0: Wednesday, November 12 at 3:30 PM
- Talks are online at
- www.itc.virginia.edu/research/talks