Grape for analysis - PowerPoint PPT Presentation


Title: Grape for analysis


1
  • Grape for analysis
  • M.Corvo, F.Fanzago, N.Smirnov
  • INFN Padova

2
Goals
  • To show how we plan to implement real analysis
    jobs on Grape.
  • Grape has already been used to run some analysis
    jobs, but some functionality still has to be
    added (such as data discovery via PubDB and
    automatic retrieval of the output), and the
    architecture has to be evaluated and tested.
  • Grape was developed to run production, but now
    we want to concentrate only on analysis tasks.

3
What the user should provide...
... as information written into the grape.cfg file:
a) The analysis input parameters: dataset and owner
b) The number of events to analyze for each job (job splitting)
c) The name of the ORCA executable to run on the WN
d) The name of the output file produced by the executable (ROOT file)
e) The user orcarc card
... and ...
f) GRAPE finds the executable and the libraries in the user SCRAM area, packs them, and includes them in the jdl InputSandbox
g) GRAPE modifies the orcarc card according to the job splitting and includes it in the jdl InputSandbox
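
As an illustration only, the user-provided items could be read and validated along these lines. The slides do not show the grape.cfg syntax, so the INI layout, the `[analysis]` section and all key names below are assumptions made for this sketch, not the real GRAPE schema.

```python
import configparser

# Hypothetical grape.cfg content: section and key names are illustrative
# assumptions, not the real GRAPE configuration schema.
EXAMPLE_CFG = """
[analysis]
dataset = my_dataset
owner = my_owner
events_per_job = 500
executable = MyAnalysis
output_file = MyHisto.root
orcarc = .orcarc
"""

# Items a) to e) that the user is expected to provide.
REQUIRED_KEYS = ("dataset", "owner", "events_per_job",
                 "executable", "output_file", "orcarc")


def load_user_config(text):
    """Parse the configuration text and check that every required item is present."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["analysis"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError("missing configuration keys: " + ", ".join(missing))
    config = dict(section)
    # The number of events per job drives the job splitting, so store it as an int.
    config["events_per_job"] = int(config["events_per_job"])
    return config


config = load_user_config(EXAMPLE_CFG)
print(config["dataset"], config["owner"], config["events_per_job"])
```

Failing early on a missing key keeps a misconfigured task from reaching the submission step.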
4
GRAPE workflow
1) Read the grape.cfg file
2) Create the scripts to submit:
   a) data discovery (querying PubDB)
   b) packaging of the user code
   c) modification of the orcarc card
   d) creation of the shell script to run on the WN (wrapper of the ORCA executable)
3) Create the jdl files
4) Submit the jobs to the Grid (without BOSS in the first prototype)
5) Automatic retrieval of the job output
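
Step 3 of the workflow can be sketched as a small function that assembles the jdl text for one job. The attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox, Requirements) are standard JDL attributes; the function names, the wrapper script name and the file names are hypothetical and chosen only for this example.

```python
def jdl_list(items):
    """Format a Python list as a JDL list of quoted strings."""
    return "{" + ", ".join('"%s"' % item for item in items) + "}"


def build_jdl(wrapper, job_index, input_files, output_files, requirements=None):
    """Return the text of a jdl file for one analysis job.

    wrapper       shell script that runs the ORCA executable on the WN
    job_index     job number, passed to the wrapper as its only argument
    input_files   extra files shipped in the InputSandbox
    output_files  files returned in the OutputSandbox besides std.out/std.err
    requirements  optional JDL expression restricting the matching resources
    """
    lines = [
        'Executable = "%s";' % wrapper,
        'Arguments = "%d";' % job_index,
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = %s;" % jdl_list([wrapper] + list(input_files)),
        "OutputSandbox = %s;" % jdl_list(list(output_files) + ["std.out", "std.err"]),
    ]
    if requirements:
        lines.append("Requirements = %s;" % requirements)
    return "\n".join(lines) + "\n"


# Hypothetical file names, used only to show the shape of the result.
example_jdl = build_jdl(
    "grape_wrapper.sh",
    1,
    ["user_code.tgz", ".orcarc", "catalogs_file"],
    ["MyHisto_1.root"],
)
print(example_jdl)
```

One jdl file per job keeps the job splitting visible at submission time: each job gets its own index and its own output file name.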
5
How GRAPE uses user information (1)
  • Data discovery
  • Query the CERN PubDB to discover where the data
    are stored (by RC name field).
  • Possibly more than one site.
  • The sites storing the data are written as a
    requirement in the jdl file, so that the Resource
    Broker (RB) is driven to match one of them as the
    resource where the analysis job is submitted.
  • The RB decides where to send the job.
  • The same query also returns the location (and
    access protocol) of the local catalogs for all
    sites.
  • Local catalog
  • This information is sent with the jobs via the
    InputSandbox (catalogs_file).
  • On the WN, catalogs_file is used to get the correct
    POOL catalog, depending on the site, and to put it
    into the orcarc card.
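
The two uses of the PubDB answer described above can be sketched as follows. Several points are assumptions made for the example: that each site has already been mapped to a Computing Element host name, that the requirement is expressed on the Glue attribute `other.GlueCEInfoHostName`, and that catalogs_file holds one "site catalog-URL" pair per line. The slides do not specify any of these, and the host names and URLs are placeholders.

```python
def build_requirements(ce_hosts):
    """Build a JDL Requirements expression that matches any of the given CE hosts."""
    if not ce_hosts:
        raise ValueError("no site publishes the requested data")
    clauses = ['other.GlueCEInfoHostName == "%s"' % host for host in ce_hosts]
    return "(" + " || ".join(clauses) + ")"


def parse_catalogs(text):
    """Parse the assumed catalogs_file layout: one 'site catalog_url' pair per line."""
    catalogs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        site, url = line.split(None, 1)
        catalogs[site] = url.strip()
    return catalogs


def catalog_for_site(catalogs, site):
    """Return the POOL catalog location for the site where the job is running."""
    try:
        return catalogs[site]
    except KeyError:
        raise KeyError("no catalog published for site %r" % site)


# Placeholder values, for illustration only.
EXAMPLE_CATALOGS = """
# site      catalog location
site-a      xmlcatalog_http://site-a.example/PoolFileCatalog.xml
site-b      xmlcatalog_file:/data/site-b/PoolFileCatalog.xml
"""

requirements = build_requirements(["ce.site-a.example", "ce.site-b.example"])
catalogs = parse_catalogs(EXAMPLE_CATALOGS)
print(requirements)
print(catalog_for_site(catalogs, "site-b"))
```

The requirement is built on the submission side, while the catalog lookup runs on the WN once the RB has chosen the site.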

6
How GRAPE uses user information (2)
Packaging of the code and modification of the card
The name of the analysis executable is needed to package the code and the related libraries into a tgz archive to be sent with the InputSandbox. The environment variable LOCALRT provides the path of the user SCRAM area.
The orcarc provided by the user is modified by Grape according to the job splitting (will PubDB publish the total number of events of the dataset-owner?), which means changing FirstEvent and MaxEvents.
Creation of the jdl to submit to the Grid
The InputSandbox is filled with:
1) the tgz archive of the user code
2) the orcarc card
3) the catalogs_file obtained from PubDB
The OutputSandbox is defined with:
1) the ROOT output file
2) std.out and std.err of the Grid job
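
A minimal sketch of the job splitting, assuming the total number of events is known (the slide leaves open whether PubDB will publish it). It also assumes that the orcarc card uses "Key = value" lines and that FirstEvent counts from 0; both are assumptions of this example, as are the function names.

```python
import re


def split_events(total_events, events_per_job, first_event=0):
    """Return one (FirstEvent, MaxEvents) pair per job, covering total_events."""
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    jobs = []
    start = first_event
    remaining = total_events
    while remaining > 0:
        count = min(events_per_job, remaining)  # the last job may be shorter
        jobs.append((start, count))
        start += count
        remaining -= count
    return jobs


def set_card_value(card_text, key, value):
    """Replace the 'key = ...' line of the card, or append it if it is absent."""
    pattern = re.compile(r"^[ \t]*%s[ \t]*=.*$" % re.escape(key), re.MULTILINE)
    new_line = "%s = %s" % (key, value)
    if pattern.search(card_text):
        # A function replacement avoids backslash interpretation in the value.
        return pattern.sub(lambda match: new_line, card_text)
    return card_text.rstrip("\n") + "\n" + new_line + "\n"


def numbered_output(name, job_index):
    """Insert the job index before the extension, e.g. MyHisto.root -> MyHisto_3.root."""
    stem, dot, extension = name.rpartition(".")
    if not dot:
        return "%s_%d" % (name, job_index)
    return "%s_%d.%s" % (stem, job_index, extension)


user_card = "FirstEvent = 0\nMaxEvents = -1\n"
for index, (first, count) in enumerate(split_events(1000, 300), start=1):
    card = set_card_value(user_card, "FirstEvent", first)
    card = set_card_value(card, "MaxEvents", count)
    print(index, numbered_output("MyHisto.root", index), card.split("\n")[:2])
```

With 1000 events and 300 events per job this produces four jobs, the last one covering the remaining 100 events.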
7
How GRAPE uses user information (3)
Creation of the script to run on the WN, which:
1) sets the CMS environment to run ORCA in the LCG environment
2) creates the SCRAM area
3) unpacks the user code into the SCRAM area
4) overwrites InputFileCatalogURL in the orcarc card with the correct POOL file to use, selected from catalogs_file according to the site where the job is running; if needed (e.g. for the RFIO protocol), it also copies the local catalog
5) runs the executable
6) renames the output file according to the job splitting (mv MyHisto.root MyHisto_n.root)
7) returns the produced output (ROOT file) to the user via the OutputSandbox (no staging of the output to an SE and no registration in RLS)
Submission of the job to the Grid
Jobs are submitted via the edg-job-submit command, possibly with BOSS. Monitoring is done via the Grid command edg-job-status. In the future we are thinking of using BOSS or GridICE (with application monitoring implementation).
Retrieval of the output
A wrapper script around the edg-job-get-output command retrieves the output automatically when the job is finished and puts the files into a directory predefined by the user.
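
The retrieval wrapper could look roughly like the sketch below, which polls edg-job-status and calls edg-job-get-output once the job is done. It assumes that the status output contains a line beginning with "Current Status:" and that finished, failed and cancelled jobs report states starting with "Done", "Aborted" and "Cancelled"; these should be checked against the installed middleware. The function names and the injectable `run` parameter are choices made for this example.

```python
import subprocess
import time


def run_command(args):
    """Run a command and return its standard output as text."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def job_status(job_id, run=run_command):
    """Extract the status string from the edg-job-status output (format assumed)."""
    for line in run(["edg-job-status", job_id]).splitlines():
        line = line.strip()
        if line.startswith("Current Status:"):
            return line.split(":", 1)[1].strip()
    return "Unknown"


def retrieve_when_done(job_id, output_dir, run=run_command,
                       poll_seconds=60, max_polls=1000, sleep=time.sleep):
    """Poll the job and fetch its output into output_dir once it has finished.

    Returns True if the output was retrieved, False if the job failed,
    was cancelled, or did not finish within max_polls polls.
    """
    for _ in range(max_polls):
        status = job_status(job_id, run)
        if status.startswith("Done"):
            run(["edg-job-get-output", "--dir", output_dir, job_id])
            return True
        if status.startswith(("Aborted", "Cancelled")):
            return False
        sleep(poll_seconds)
    return False
```

A call such as `retrieve_when_done(job_id, "/path/to/results")` would block until the job ends. Passing `run` and `sleep` as parameters lets the loop be exercised without a Grid installation.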
8
What is done and what is still to do
The general architecture is already in place.
Grape has already been used to run analysis in the LCG environment.
We are implementing:
- connection with PubDB and modification of the shell scripts (Nikolai and Federica)
- software packaging and automatic output retrieval (Marco)
- monitoring: still to do
We expect to have a running prototype by the end of next week.
We would be happy if people tried it and provided feedback!