First ideas for a Resource Management Architecture for Productions

Slides: 19
Provided by: massimosg

1
First ideas for a Resource Management
Architecture for Productions
  • Massimo Sgaravatto
  • INFN Padova

2
First step
[Diagram. Labels: Submit jobs (using globusrun); GRAM (three instances); CONDOR, LSF, PBS; Site1, Site2, Site3]

3
Overview
  • GRAM as a uniform interface to different resource
    management systems
  • Job submission from a single location
  • Users must explicitly specify on which Globus
    resources (Condor pool, LSF cluster, ...) the jobs
    must be executed
  • Usage of Globus tools (globusrun,
    globus-job-status, ...) to manage the jobs
  • Are these robust tools with all the required
    capabilities?

4
Usage examples
  • globusrun -b -r lxpd.pd.infn.it/jobmanager-lsf
    -f file.rsl
  • file.rsl:
      (executable=$(CMS)/startcmsim.sh)
      (stdin=$(CMS)/Pythia/run.1)
      (stdout=$(CMS)/Cmsim/log.1)
      (count=1)
      (queue=cmsprod)
  • globusrun -b -r lxbo.bo.infn.it/jobmanager-condor
    -f file.rsl
  • file.rsl:
      (executable=$(CMS)/startcmsim.sh)
      (stdin=$(CMS)/Pythia/run.1)
      (stdout=$(CMS)/Cmsim/log.1)
      (count=1)
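
The two submissions on this slide differ only in the GRAM contact string and, for LSF, the queue. A minimal Python sketch of that idea follows. The helper names `build_rsl` and `globusrun_command` are hypothetical and not part of Globus; the sketch assumes the characters lost from the slide text were `=` and `$`, and it reproduces only the relations listed on the slide.

```python
# Sketch only: helper names are hypothetical, not part of the Globus toolkit.

def build_rsl(attributes):
    """Render a dict as a sequence of RSL (attribute=value) relations."""
    return "".join(f"({name}={value})" for name, value in attributes.items())


def globusrun_command(contact, rsl_file):
    """Return the argv list for a batch submission to one GRAM contact."""
    return ["globusrun", "-b", "-r", contact, "-f", rsl_file]


# Job description shared by both examples on the slide.
job = {
    "executable": "$(CMS)/startcmsim.sh",
    "stdin": "$(CMS)/Pythia/run.1",
    "stdout": "$(CMS)/Cmsim/log.1",
    "count": 1,
}

# Same job, two resources: the LSF variant adds a queue, the Condor one does not.
lsf_rsl = build_rsl({**job, "queue": "cmsprod"})
condor_rsl = build_rsl(job)

lsf_cmd = globusrun_command("lxpd.pd.infn.it/jobmanager-lsf", "file.rsl")
condor_cmd = globusrun_command("lxbo.bo.infn.it/jobmanager-condor", "file.rsl")
```

The command lists are only built here, not executed; passing one to `subprocess.run` would require a working Globus installation and valid credentials.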

5
What has been tested so far
  • http://www.pd.infn.it/sgaravat/INFN-GRID/Globus/gram-report.pdf
  • Tests only with simple programs (just to evaluate
    the capabilities and functionality)
  • No tests with real applications
  • No stress tests (to evaluate reliability,
    robustness, ...)
  • GRAM with LSF tested
  • Seems to work

6
What has been tested so far
  • GRAM with Condor tested
  • GRAM assumes that the underlying environment is a
    uniform Condor pool (in particular for Vanilla
    jobs)
  • Difficult to treat the INFN WAN Condor pool as a
    Globus resource
  • Use local uniform Condor pools instead?
  • GRAM with PBS not tested

7
Second step
[Diagram. Labels: Submit jobs (using condor_submit and Globus Universe); Personal Condor; globusrun; GRAM (three instances); CONDOR, LSF, PBS; Site1, Site2, Site3]

8
Overview
  • Personal Condor is able to provide robustness and
    reliability
  • Job submission from a single location
  • Users must still explicitly specify on which
    Globus resources the jobs must be executed
  • Usage of the Condor interface and tools
    (condor_submit, condor_q, ...) to manage the jobs
  • Robust tools with all the required capabilities
    (monitoring, logging, ...)

9
Usage examples
  • condor_submit file.cnd
  • file.cnd:
      Universe = globus
      executable = $(CMS)/startcmsim.sh
      input = $(CMS)/Pythia/run.1
      output = $(CMS)/Cmsim/log.1
      GlobusScheduler = lxpd.pd.infn.it/jobmanager-lsf
      queue 1
  • condor_submit file.cnd
  • file.cnd:
      Universe = globus
      executable = $(CMS)/startcmsim.sh
      input = $(CMS)/Pythia/run.1
      output = $(CMS)/Cmsim/log.1
      GlobusScheduler = lxbo.bo.infn.it/jobmanager-condor
      queue 1
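
Since the two submit files above differ only in the GlobusScheduler line, they can be generated from one template. The following Python sketch illustrates this under assumptions: the function `submit_description` is hypothetical (not a Condor API), and the `=` and `$` characters are assumed to be what the slide text lost.

```python
# Sketch only: submit_description is a hypothetical helper, not a Condor API.

def submit_description(globus_scheduler, executable, stdin, stdout, count=1):
    """Return the text of a Globus Universe submit file like file.cnd."""
    lines = [
        "Universe = globus",
        f"executable = {executable}",
        f"input = {stdin}",
        f"output = {stdout}",
        f"GlobusScheduler = {globus_scheduler}",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# Values taken from the slide; only the scheduler contact changes per site.
common = {
    "executable": "$(CMS)/startcmsim.sh",
    "stdin": "$(CMS)/Pythia/run.1",
    "stdout": "$(CMS)/Cmsim/log.1",
}
lsf_file = submit_description("lxpd.pd.infn.it/jobmanager-lsf", **common)
condor_file = submit_description("lxbo.bo.infn.it/jobmanager-condor", **common)
```

Writing either string to `file.cnd` and running `condor_submit file.cnd` would correspond to the usage shown on the slide.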

10
Second step (option 2)
[Diagram. Labels: Submit jobs (using condor_submit and Globus Universe); Personal Condor; Condor Flocking; globusrun; condor_submit; GRAM (two instances); CONDOR, LSF, PBS; Site1, Site2, Site3]

11
Second step (option 3)
[Diagram. Labels: Submit jobs (using condor_submit and Globus Universe); Personal Condor; globusrun; condor_submit; GRAM (two instances); CONDOR, LSF, PBS; Site1, Site2, Site3; Single Condor Pool]

12
Problems
  • The Globus Universe architecture is only a
    prototype
  • Only best-effort support from the Condor team
  • Tests not completed
  • Tests are ongoing (with the fork system call
    as the underlying resource management system)
  • Tests of the Globus Universe with LSF or
    Condor as the underlying resource management system
    have not yet been performed
  • PBS
  • Is it supported by the Globus Universe mechanisms?
  • Do we need it?

13
Third step
[Diagram. Labels: Resource Discovery; Master; GIS; Submit jobs; condor_submit (Globus Universe); Information on characteristics and status of local resources; Personal Condor; globusrun; GRAM (three instances); CONDOR, LSF, PBS; Site1, Site2, Site3]

14
Overview
  • Master smart enough to decide to which Globus
    resources the jobs must be submitted
  • The Master uses the information on the
    characteristics and status of resources published
    in the GIS
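
The selection step described above could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the slides do not define the Master's policy or the GIS schema, so the `ResourceInfo` fields, the `choose_resource` function, and the sample numbers are all hypothetical.

```python
# Sketch only: the record fields and the selection policy are hypothetical;
# the presentation does not specify what the GIS publishes or how the
# Master decides.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ResourceInfo:
    contact: str       # GRAM contact string, e.g. host/jobmanager-lsf
    free_cpus: int     # assumed status attribute
    queued_jobs: int   # assumed status attribute


def choose_resource(resources: Sequence[ResourceInfo]) -> Optional[ResourceInfo]:
    """Pick a resource with free CPUs, preferring the shortest queue."""
    candidates = [r for r in resources if r.free_cpus > 0]
    if not candidates:
        return None
    # Fewest queued jobs first; ties broken by the larger number of free CPUs.
    return min(candidates, key=lambda r: (r.queued_jobs, -r.free_cpus))


# Invented snapshot of what the Master might read from the GIS.
gis_snapshot = [
    ResourceInfo("lxpd.pd.infn.it/jobmanager-lsf", free_cpus=4, queued_jobs=10),
    ResourceInfo("lxbo.bo.infn.it/jobmanager-condor", free_cpus=2, queued_jobs=3),
]
best = choose_resource(gis_snapshot)
```

The chosen `contact` would then take the place of the resource that users currently have to name by hand at submission time.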

15
Problems and work needed
  • The Master doesn't exist
  • → We have to implement it
  • It is necessary to define the GIS architecture
  • The local GRAMs do not provide the GIS with enough
    information
  • → The default schema must be extended

16
GRAM Condor GIS
17
GRAM LSF GIS
18
Fourth step
[Diagram. Labels: Information on characteristics and status of local resources]