Computing on Demand - PowerPoint PPT Presentation

Title:

Computing on Demand

Provided by: awc5

Transcript and Presenter's Notes

Title: Computing on Demand


1
Computing on Demand
  • February 5, 2004

2
Status of Condor at RACF
  • Condor 6.6.0 installed on all 1097 Linux Farm
    nodes in RACF.
  • Standard batch policy in place (fair-share, job
    suspension, etc).
  • Multiple queues for CAS and CRS, sub-divided by
    experiment.

3
Changes Since Initial Install
  • Graphical package (condorview) installed and
    running. Needs external Web server access.
  • Strong authentication for Condor in progress
    (Kerberos and GLOBUS certificates).
  • Fail-over feature to be available soon?
  • Computing on Demand (COD) feature enabled.

4
Computing on Demand
  • COD allows users to submit high-priority jobs
    with interactive-like response.
  • COD jobs do not follow standard Condor batch
    policies (jobs run immediately after submission).
  • COD jobs suspend standard Condor and LSF jobs.
  • COD job description file (jdf) is different from
    standard Condor jdfs.

5
Computing on Demand (cont.)
  • COD only available to pre-defined list of
    privileged users.
  • Currently only the reco accounts are in the
    list.
  • Request for additional users with COD privilege
    must originate from RACF Liaison.

6
Computing on Demand (cont.)
  • Privileged users must first request COD resources
    (see below):
  • > condor_cod request -name <hostname> -pool
    condor01.rcf.bnl.gov:<port> -classad cod.id
  • > Successfully sent CA_REQUEST_CLAIM to startd at
    <130.199.206.51:32787>
  • > Result ClassAd written to cod.id
  • > ID of new claim is:
    <130.199.206.51:32787>#1071856759#1
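
The request step can be sketched in Python as below. This is a minimal sketch, not the RACF tooling: it assumes the claim ID has the shape `<ip:port>#<timestamp>#<sequence>` and that the optional port is appended to the pool name after a colon (the separators do not survive in this transcript, so both are assumptions). The helper names and the hostname `node.example.org` are hypothetical.

```python
import re
from typing import List, Optional

# Assumed claim ID shape: <ip:port>#<timestamp>#<sequence>
CLAIM_ID_RE = re.compile(r"<\d{1,3}(?:\.\d{1,3}){3}:\d+>#\d+#\d+")


def build_request_command(hostname: str, pool: str,
                          port: Optional[int] = None,
                          classad_file: str = "cod.id") -> List[str]:
    """Build the argument list for `condor_cod request` (hypothetical helper)."""
    # Assumption: the optional port is written as pool:port.
    pool_arg = f"{pool}:{port}" if port is not None else pool
    return ["condor_cod", "request", "-name", hostname,
            "-pool", pool_arg, "-classad", classad_file]


def extract_claim_id(output: str) -> Optional[str]:
    """Return the first claim ID found in the tool's output, or None."""
    match = CLAIM_ID_RE.search(output)
    return match.group(0) if match else None
```

Passing the returned list to `subprocess.run` avoids shell quoting problems, since the claim ID contains `<`, `>` and `#`.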

7
Computing on Demand (cont.)
  • Hostname is FQDN (Fully Qualified Domain Name).
  • Ports (optional parameter) are as follows:
  • 9660 → ATLAS
  • 9661 → BRAHMS
  • 9662 → PHENIX
  • 9663 → PHOBOS
  • 9664 → STAR
  • 9665 → RCF
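
The port assignments above can be kept in a small lookup table. The sketch below is illustrative only: the function name `pool_address` is hypothetical, and the `host:port` form of the pool argument is an assumption, since the separator is not legible in this transcript.

```python
from typing import Optional

# Port per experiment, as listed on the slide.
COD_PORTS = {
    "ATLAS": 9660,
    "BRAHMS": 9661,
    "PHENIX": 9662,
    "PHOBOS": 9663,
    "STAR": 9664,
    "RCF": 9665,
}

POOL_HOST = "condor01.rcf.bnl.gov"


def pool_address(experiment: Optional[str] = None) -> str:
    """Return the -pool argument; the port is optional, as the slide notes."""
    if experiment is None:
        return POOL_HOST
    # Raises KeyError for an experiment that is not in the table.
    port = COD_PORTS[experiment.upper()]
    return f"{POOL_HOST}:{port}"
```

For example, `pool_address("star")` yields the pool name with port 9664 appended.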

8
Computing on Demand (cont.)
  • Next, the privileged user must activate the COD
    claim (which runs the COD job contained in
    cod.job):
  • > condor_cod activate -id
    "<130.199.206.51:32787>#1071856759#1" -jobad cod.job
  • > Successfully sent CA_ACTIVATE_CLAIM to startd
    at <130.199.206.51:32787>

9
Computing on Demand (cont.)
  • Users can check on the progress of their COD jobs
    with:
  • > condor_status -pool condor01.rcf.bnl.gov:<port>
    -cod
  • Name          ID    ClaimState  TimeInState  RemoteUser  JobId  Keyword
  • vm2@rcas6001  COD1  Running     0+00:06:46   starreco    1.0
  •              Total  Idle  Running  Suspended  Vacating  Killing
  • INTEL/LINUX      1     0        1          0         0        0
  • TOTAL            1     0        1          0         0        0
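
The status listing above can be turned into records with a few lines of Python. This is a rough sketch under stated assumptions: columns are whitespace-separated, the claim table starts at a header line beginning with `Name`, and the summary block starts at a line beginning with `Total`. The function name `parse_cod_status` is hypothetical, and the sample values `vm2@rcas6001` and `0+00:06:46` are reconstructions of the garbled transcript.

```python
from typing import Dict, List


def parse_cod_status(text: str) -> List[Dict[str, str]]:
    """Return one dict per claim row from `condor_status -cod`-style text."""
    claims: List[Dict[str, str]] = []
    header: List[str] = []
    for raw in text.splitlines():
        # Drop a leading ">" prompt marker, if any, and surrounding blanks.
        line = raw.strip().lstrip(">").strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "Name":
            header = fields
            continue
        if not header:
            continue  # ignore anything before the claim table
        if fields[0] == "Total":
            break  # the per-platform summary starts here
        # zip() stops at the shorter list, so trailing empty columns
        # (such as Keyword) are simply left out of the record.
        claims.append(dict(zip(header, fields)))
    return claims
```

Because the split is purely positional, an empty column in the middle of a row would shift later values; a fixed-width parser would be needed for that case.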

10
Computing on Demand (cont.)
  • Privileged users can release COD resources upon
    completion of their COD jobs:
  • > condor_cod release -pool
    condor01.rcf.bnl.gov:<port> -id
    "<130.199.206.51:32787>#1071856759#1"
  • > Successfully sent CA_RELEASE_CLAIM to startd
    at <130.199.206.51:32787>
  • > Status of claim when it was released: Idle
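
The activate and release steps both take the claim ID returned by the request. The sketch below builds the two command lines with the flags shown on the slides (`-id`, `-jobad`, `-pool`); the helper names are hypothetical, and the claim ID and `host:port` formats are assumed reconstructions of the garbled transcript. `shlex.join` (Python 3.8+) is used only to show a shell-safe rendering, because an unquoted claim ID would be read by the shell as redirections and a comment.

```python
import shlex
from typing import List


def activate_command(claim_id: str, job_ad: str = "cod.job") -> List[str]:
    """Argument list that activates a claim with the given COD job file."""
    return ["condor_cod", "activate", "-id", claim_id, "-jobad", job_ad]


def release_command(claim_id: str, pool: str) -> List[str]:
    """Argument list that releases a claim once the COD job is done."""
    return ["condor_cod", "release", "-pool", pool, "-id", claim_id]


if __name__ == "__main__":
    # Assumed claim ID format: <ip:port>#<timestamp>#<sequence>
    claim_id = "<130.199.206.51:32787>#1071856759#1"
    print(shlex.join(activate_command(claim_id)))
    print(shlex.join(release_command(claim_id, "condor01.rcf.bnl.gov:9664")))
```

To run the commands rather than print them, the same lists can be passed to `subprocess.run(..., check=True)`, which bypasses the shell and needs no quoting.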

11
Computing on Demand (cont.)
  • Examples of COD jdfs can be found in:
  • /usatlas/u/atlareco/condor/cod.job
  • /brahms/u/bramreco/condor/cod.job
  • /phenix/u/phnxreco/condor/cod.job
  • /phobos/u/phobreco/condor/cod.job
  • /star/u/starreco/condor/cod.job

12
What's next?
  • Plan is to implement COD this week on fastest CAS
    servers within each experiment.
  • Maybe also implement on CRS servers for new CRS
    Batch software?
  • Attend Condor Week (April 15-16, 2004) in Madison
    to learn how to improve our Condor pool. We have
    many questions.