Title: Use of the European Data Grid software in the framework of the BaBar distributed computing model
T. Adye (1), R. Barlow (2), B. Bense (3), D. Boutigny (4), D. Colling (5), B. Cowles (3), A. Forti (2), D. Smith (6), G. Grosdidier (7), A. Hasan (3), J. Martyniak (5), A. McNab (2), R. Walker (5)
On behalf of the BaBar computing group
(1) Rutherford Appleton Laboratory
(2) University of Manchester
(3) Stanford Linear Accelerator Center
(4) Laboratoire d'Annecy-le-Vieux de Physique des Particules, CNRS / IN2P3
(5) University of London, Imperial College
(6) University of Birmingham
(7) Laboratoire de l'Accélérateur Linéaire, CNRS / IN2P3
2. Motivations for BaBar-Grid: BaBar Specificities (1)
- Distributed computing is one of the main axes of the BaBar computing model
  - Tier A: main computing centers
    - Hold all or a large fraction of the data
    - Currently SLAC, IN2P3, RAL and FZK/GridKa
    - INFN Padova is specialized in data reprocessing. It will probably become an analysis Tier A later
    - INFN Ferrara (with SLAC) is looking into MC production on the Grid
  - Tier B: does not really exist
  - Tier C: smaller centers, holding only small chunks of data or n-tuples
3. Motivations for BaBar-Grid: BaBar Specificities (2)
- Special configuration in the UK
  - Large center at RAL
  - Several smaller centers with significant computing and data storage resources
- Main motivation for BaBar-Grid
  - Need a simple and reliable tool for remote job submission
  - Data may be spread between several sites
  - Need a metadata catalog and a tool that automatically splits the jobs and submits them to the centers holding the data
- BaBar is taking data, so the introduction of Grid tools should not disrupt physics production
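The automatic splitting mentioned above can be pictured as grouping the requested data by a site that holds it. The following is only a minimal sketch under stated assumptions: the catalog is modelled as a plain dictionary, the collection names are made up, and `split_by_site` is a hypothetical helper, not a BaBar tool. Picking the first listed site is an arbitrary choice for illustration.

```python
from collections import defaultdict

def split_by_site(requested, catalog):
    """Group requested collections by a site that holds them.

    `catalog` is a hypothetical mapping {collection: [site, ...]}.
    The first listed site is chosen, purely for illustration.
    """
    jobs = defaultdict(list)
    missing = []
    for name in requested:
        sites = catalog.get(name)
        if not sites:
            # No site is known to hold this collection.
            missing.append(name)
            continue
        jobs[sites[0]].append(name)
    return dict(jobs), missing

# Hypothetical catalog content; the collection names are invented.
catalog = {
    "run1-a": ["SLAC", "RAL"],
    "run1-b": ["IN2P3"],
    "run2-a": ["RAL"],
}
jobs, missing = split_by_site(["run1-a", "run1-b", "run2-a", "run3-a"], catalog)
print(jobs)
print(missing)
```

Each key of `jobs` would then correspond to one sub-job submitted to that site, while `missing` lists data that no known site can serve.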
4. Short-Term Goals for BaBarGrid Developments
- Set up a Grid system able to submit analysis jobs to the major Tier-A centers
  - Proof of concept
  - Demonstrate usage in real analysis applications
- Test various Grid implementations and their interoperability (EDG, LCG-1, VDT, ...)
- Have to handle 2 data formats: Objectivity and Root
- Data distribution → "Distributing BaBar Data using the Storage Resource Broker (SRB)", W. Kroeger (previous talk)
- BdbServer: a user-driven data location and retrieval tool (poster)
- Metadata catalog and automatic job splitting → "BaBar WEB job submission with Globus authentication and AFS access", A. Forti
5. The BaBar Grid as of March 2003
[Diagram: five blocks, each labelled CE, SE and WN, together with the VO, RC and RB services]
6. European Data Grid (EDG) Setup
- BaBar benefits from the EDG test bed installations at the European sites
  - We just had to add a dedicated Virtual Organization (VO) and a Replica Catalog (RC), located in Manchester
- An automatic system has been developed so that any BaBar user can register their certificate with the VO
  - The existence of a special file on the SLAC AFS proves that the user is registered in BaBar
- We use the Resource Broker (RB) installed at Imperial College, which is shared with other experiments using the EDG test bed
- We decided to restrict ourselves to basic RC usage
  - We don't use GDMP
  - We are looking forward to testing RLS
7. SLAC Setup
- The EDG software has been deployed at SLAC
  - Version 1.3.4, compatible with RB 1.4.x
- Some special adaptations were needed
  - The WNs run LSF
  - The WNs are located behind a firewall, so they can't communicate directly with the RB
    - Solved by splitting the submission scripts in such a way that all communication goes through the Gatekeeper
- SLAC accepts both EDG and DOE certificates
- AFS
  - gssklog has been installed in order to get AFS tokens
- The fact that EDG 1.3 / 1.4 needs RH 6.2 is a real problem and requires a special arrangement with the Computing Services
8. RB Specificities
- One major problem with EDG 1.4.x is related to the Meta Directory Service (MDS)
  - Resources disappear at random from the Information Index (II)
- 2 solutions
  - Replace the dynamic Information Index by a static one (BDII) → the solution tested and recommended by EDG
  - Install monitoring scripts which automatically detect disappearing and reappearing resources and restart the II accordingly
    - This sometimes gives a flaky II, oscillating as resources come in and out
    - If this happens, the resource matching process fails
- Both solutions have been tested at Imperial College
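The detection step of such monitoring scripts can be reduced to comparing two consecutive snapshots of the resources published in the II. This is a minimal sketch, not the scripts used at Imperial College: the resource names are hypothetical, how the snapshots are obtained is left out, and the restart policy (restart on any change) is an assumption.

```python
def diff_resources(previous, current):
    """Compare two polls of the II and report what vanished and what appeared."""
    previous, current = set(previous), set(current)
    return sorted(previous - current), sorted(current - previous)

def needs_restart(previous, current):
    """Assumed policy: restart the II whenever the published set changed."""
    vanished, appeared = diff_resources(previous, current)
    return bool(vanished or appeared)

# Hypothetical CE identifiers from two consecutive polls.
poll_1 = ["ce.site-a.example", "ce.site-b.example", "ce.site-c.example"]
poll_2 = ["ce.site-a.example", "ce.site-c.example", "ce.site-d.example"]
print(diff_resources(poll_1, poll_2))
print(needs_restart(poll_1, poll_2))
```

A policy this eager would restart often when resources oscillate, which is consistent with the flaky behaviour reported for this solution.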
9. The Analysis Job Use Case
- The user has an executable and a configuration file (tcl)
- The executable needs input data in Objectivity or Root format, depending on the running site
- The result of the analysis job is a Root-tuple and a log file
- We assume that a suitable BaBar release is available at the target site
  - In the future, we may package the BaBar release so that it can be installed before actually running the job
- We want the executable to be stored in a Storage Element (SE) close to the Computing Element (CE)
- The input tcl file is sent through the input sandbox (OK as it is relatively small)
- The output log file is returned through the sandbox
- The Root-tuple is stored in an SE close to the CE
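The sandbox handling in this use case maps naturally onto a job description. The sketch below renders a small JDL text from Python; it assumes the usual JDL attributes (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`), and the wrapper script and file names are hypothetical. It does not reproduce the job descriptions actually used by BaBar.

```python
def render_jdl(wrapper, tcl_file, log_file):
    """Render a minimal JDL text for a job driven by a wrapper script.

    The wrapper and the tcl file travel in the input sandbox;
    the log file comes back in the output sandbox.
    """
    def quoted_list(items):
        return "{" + ", ".join('"' + item + '"' for item in items) + "}"

    lines = [
        'Executable = "' + wrapper + '";',
        'Arguments = "' + tcl_file + '";',
        'StdOutput = "' + log_file + '";',
        'StdError = "' + log_file + '";',
        "InputSandbox = " + quoted_list([wrapper, tcl_file]) + ";",
        "OutputSandbox = " + quoted_list([log_file]) + ";",
    ]
    return "\n".join(lines)

# Hypothetical file names, for illustration only.
jdl = render_jdl("run_analysis.sh", "analysis.tcl", "analysis.log")
print(jdl)
```

The analysis executable and the Root-tuple are deliberately absent from the sandboxes, since the use case keeps them on an SE close to the CE.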
10. The Machinery
[Diagram: labels include Ntuple and Executable]
11. Getting a Generic Script Able to Run Everywhere
- Make use of the edg-brokerinfo commands
  - Discover the CE and SE parameters
- For instance
  - edg-brokerinfo getCloseSEs returns the closest SE hostnames
  - edg-brokerinfo getSEMountPoint returns the mount point of the SE file system
- → The EDG API makes it possible to build a fully generic script in a very simple way
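The discovery step above could be wrapped as follows. This is a hedged sketch: it assumes that `edg-brokerinfo getCloseSEs` prints one hostname per line and that `getSEMountPoint` takes the SE hostname as its argument, neither of which is stated on the slide. The helper names and the fake runner are hypothetical; the fake runner only exists so the sketch can be tried away from a worker node.

```python
import subprocess

def _run_command(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def brokerinfo(*args, run=_run_command):
    """Call `edg-brokerinfo <args>` and return its non-empty output lines."""
    output = run(["edg-brokerinfo", *args])
    return [line.strip() for line in output.splitlines() if line.strip()]

def close_se_mount_points(run=_run_command):
    """Map every close SE hostname to its mount point (None if not reported)."""
    mounts = {}
    for se in brokerinfo("getCloseSEs", run=run):
        reply = brokerinfo("getSEMountPoint", se, run=run)
        mounts[se] = reply[0] if reply else None
    return mounts

# Dry run with a fake runner; hostnames and paths are invented.
def fake_run(cmd):
    if cmd[1] == "getCloseSEs":
        return "se1.example.org\nse2.example.org\n"
    return "/flatfiles/babar\n"

print(close_se_mount_points(run=fake_run))
```

On a real worker node the default runner would be used instead of `fake_run`, so the same script needs no site-specific configuration.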
12. Results
- Success rate: "submission OK and n-tuple and log files recovered"
- With the dynamic MDS equipped with the control scripts
  - Success rate: 55% to 75%
  - 98% of the failing jobs are due to the RB being unable to match the requested resources with any CE
- With the static MDS
  - Success rate: 99%
  - A few jobs have been lost by the RB!
- During the test we also hit a limit of 512 jobs present at the same time in the RB → a serious limitation, but it should be removed in future versions
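One possible client-side way to live with the 512-job limit is to submit work in batches and wait for each batch to finish before sending the next. The slides do not describe such a workaround; the sketch below is an assumption, shows only the batching arithmetic, and uses invented job identifiers.

```python
def batches(jobs, limit=512):
    """Yield successive slices of `jobs`, each holding at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(jobs), limit):
        yield jobs[start:start + limit]

# Hypothetical workload of 1200 jobs.
job_ids = ["job-" + str(n) for n in range(1200)]
sizes = [len(batch) for batch in batches(job_ids)]
print(sizes)
```

With 1200 jobs this gives two full batches and a remainder, so no more than 512 jobs from this submitter would be in the RB at once. Other users of a shared RB would still count against the same limit.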
13. Monte-Carlo Production
- Very active work to "grid-ify" BaBar MC production
  - Similar to the analysis application previously described, but with a stable and controlled environment
  - Store the MC executable on the SE(s)
  - Produce output files (in Root format) → store them in the SE
  - Send the data back to SLAC or a Tier-A
- Need to package MC production in order to be able to run at any institute, even those not maintaining BaBar software
- One difficulty: even if we produce data in Root format, we still need Objectivity for conditions data
- See the Poster Session for more details
14. Conclusions
- Grid technology is of prime importance for BaBar to fully exploit its distributed computing model
- Many Grid activities, related to
  - Data distribution
  - MC production
  - Analysis
- We have demonstrated that EDG has all the necessary functionality for running analysis jobs on the Grid
  - Reliability is much better with the static MDS, but several issues remain open on the scalability of the system
- We look forward to testing EDG 2.0 and are open to other Grid implementations
  - Will test VDT soon
  - Will move to LCG-1 as soon as it is available
- We do not expect to have the same Grid software implemented everywhere
  - We need to work on the interoperability of the various systems