Transcript and Presenter's Notes

Title: Running jobs on SDSC Resources


1
Running jobs on SDSC Resources
  • Krishna Muriki
  • Oct 31, 2006
  • kmuriki@sdsc.edu
  • SDSC User Services

2
Agenda!!!
  • Using DataStar
  • Using the IA64 cluster
  • Using the HPSS resource

3
DataStar Overview
  • P655 (8-way, 16GB): 176 nodes
  • P655 (8-way, 32GB): 96 nodes
  • P690 (32-way, 64GB): 2 nodes
  • P690 (32-way, 128GB): 4 nodes
  • P690 (32-way, 256GB): 2 nodes
  • Total: 280 nodes, 2,432 processors

4
Batch/Interactive computing
  • Batch job queues
  • Job queue manager: LoadLeveler (tool from IBM)
  • Job queue scheduler: Catalina (SDSC internal tool)
  • Job queue monitoring: various tools (commands)
  • Job accounting: job filter (SDSC internal Perl scripts)

5
DataStar Access
  • Three login nodes (platforms) and their usage modes (an SSH example
    follows this list):
  • dslogin.sdsc.edu (P690, 32-way, 64GB): production runs
  • dspoe.sdsc.edu (P655, 8-way, 16GB): test/debug runs
  • dsdirect.sdsc.edu (P690, 32-way, 256GB): special needs
  • Note: the usage-mode division above is not strictly enforced.
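  • For reference, access is via ordinary SSH; a minimal sketch (substitute
    your own SDSC username):
    ssh username@dslogin.sdsc.edu     # production login node
    ssh username@dspoe.sdsc.edu       # test/debug login node
    ssh username@dsdirect.sdsc.edu    # special-needs login node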

6
(No Transcript)
7
Test/debug runs (Usage from dspoe)
  • dspoe.sdsc.edu: P655, 8-way, 16GB
  • Access to two queues:
  • P655 nodes, shared
  • P655 nodes, not shared
  • Job queues pass through the job filter and LoadLeveler only (very fast)
  • Special command-line submission (along with a job script)

8
Production runs (Usage from dslogin)
  • dslogin.sdsc.edu: P690, 32-way, 64GB
  • Data transfer, source editing, compilation, etc.
  • Two queues:
  • Onto P655 nodes: not shared
  • Onto P690 nodes: shared
  • Job queues pass through the job filter, LoadLeveler, and Catalina (slow
    updates)

9
All Special needs (Usage from dsdirect)
  • dsdirect.sdsc.edu: P690, 32-way, 256GB
  • All visualization needs
  • All post-run data analysis needs
  • Shared node (with 256 GB of memory)
  • Process accounting in place
  • Fully interactive usage (run your a.out directly)
  • No job filter, no LoadLeveler, no Catalina

10
Suggested usage model
  • Start with dspoe (test/debug queues)
  • Do production runs from dslogin (normal and normal32 queues)
  • Use the express queues from dspoe when you need results right away
  • Use dsdirect for special needs

11
Accounting
  • reslist -u user_name
  • reslist -a account_name

12
Now let's do it!
  • Example files are located here:
  • /gpfs/projects/workshop/running_jobs
  • Copy the whole directory (tcsh), as sketched below
  • Use the Makefile to compile the source code
  • Edit the parameters in the job submission scripts
  • Communicate with the job manager in its own language
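  • A minimal shell sketch of the steps above; the destination path is an
    assumption for illustration:
    # copy the workshop examples into your home directory
    cp -r /gpfs/projects/workshop/running_jobs ~/running_jobs
    cd ~/running_jobs
    # build the example source with the provided Makefile
    make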

13
Job Manager language
  • Ask it to show the queue: llq
  • Ask it to submit your job to the queue: llsubmit (see the sketch after
    this list)
  • Ask it to cancel your job in the queue: llcancel
  • Special (more useful) commands from SDSC's in-house tool Catalina
    (please bear with me, I'm slow):
  • showq: look at the status of the queue
  • show_bf: look at backfill window opportunities
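  • A hedged sketch of a LoadLeveler job script plus the commands above; the
    class name, node counts, limits, and executable are illustrative
    assumptions, not SDSC's exact settings:
    # example.cmd -- illustrative LoadLeveler job script
    # @ job_type         = parallel
    # @ class            = normal           # queue name from the earlier slides
    # @ node             = 2
    # @ tasks_per_node   = 8
    # @ wall_clock_limit = 00:30:00
    # @ output           = example.out
    # @ error            = example.err
    # @ queue
    poe ./parallel-test

    llsubmit example.cmd    # hand the script to LoadLeveler
    llq -u $USER            # show your jobs in the queue
    llcancel <job_id>       # cancel a queued or running job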

14
Access to HPSS - 1
  • What is HPSS?
  • The centralized, long-term data storage system at SDSC is the High
    Performance Storage System (HPSS).
  • It currently stores more than 3 PB of data (as of June 2006).
  • Total system capacity is 7.2 PB.
  • Data is added at an average rate of 100 TB per month (between Aug 05
    and Feb 06).

15
Access to HPSS - 2
  • First, set up your authentication:
  • run the get_hpss_keytab script
  • Know HPSS's language to talk to it (examples after this list):
  • hsi
  • htar
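  • A few illustrative hsi and htar invocations; file and archive names are
    placeholders:
    hsi put mydata.dat                # store a file into HPSS
    hsi get mydata.dat                # retrieve it back
    hsi ls                            # list your HPSS home directory
    htar -cvf results.tar results/    # bundle a directory into an archive in HPSS
    htar -xvf results.tar             # extract the archive again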

16
SDSC IA64 cluster
17
IA64 cluster overview
  • Around 265 nodes
  • 2-way nodes
  • 4GB memory per node
  • Batch job environment:
  • Job manager: PBS (open-source tool)
  • Job scheduler: Catalina (SDSC internal tool)
  • Job monitoring: various commands and Clumon

18
IA64 Access
  • IA64 login nodes
  • tg-login1.sdsc.edu (alias of tg-login.sdsc.edu)
  • tg-login2.sdsc.edu
  • tg-c127.sdsc.edu, tg-c128.sdsc.edu,
  • tg-c129.sdsc.edu, tg-c130.sdsc.edu

19
Queues and Nodes
  • Total: around 260 nodes
  • With 2 processors each
  • All in a single batch queue: dque
  • That's sufficient; now let's do it!
  • Example files in
  • /gpfs/projects/workshop/running_jobs
  • PBS commands: qstat, qsub, qdel (a script sketch follows below)
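  • A minimal PBS batch script sketch for dque; the job name, script file
    name, and executable are placeholders:
    #!/bin/sh
    #PBS -N parallel-test            # job name (placeholder)
    #PBS -q dque                     # the single batch queue named above
    #PBS -l nodes=4:ppn=2            # 4 nodes, 2 processors per node
    #PBS -l walltime=00:30:00        # 30-minute limit
    cd $PBS_O_WORKDIR                # run from the submission directory
    mpirun -np 8 -machinefile $PBS_NODEFILE ./parallel-test

    qsub job.pbs        # submit the script (job.pbs is a placeholder name)
    qstat -u $USER      # check your jobs
    qdel <job_id>       # delete a job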

20
Running Interactive
  • Interactive use is via PBS
  • qsub -I -V -l walltime=00:30:00 -l nodes=4:ppn=2
  • This requests 4 nodes for interactive use (using 2 CPUs per node) for a
    maximum wall-clock time of 30 minutes. Once the scheduler can honor the
    request, PBS responds with "ready" and gives the node names.
  • Once nodes are assigned, the user can run any interactive command. For
    example, to run an MPI program, parallel-test, on the 4 nodes (8 CPUs):
    mpirun -np 8 -machinefile $PBS_NODEFILE parallel-test

21
References
  • See all web links at
  • http://www.sdsc.edu/user_services
  • Reach us at consult@sdsc.edu