Title: LHCb Distributed Computing and the Grid (Nick Brook, University of Bristol)
1. LHCb Distributed Computing and the Grid
Nick Brook, University of Bristol
- D. Galli, U. Marconi, V. Vagnoni INFN Bologna
- N. Brook Bristol
- K. Harrison Cambridge
- E. Van Herwijnen, J. Closier, P. Mato CERN
- A. Khan Edinburgh
- A. Tsaregorodtsev Marseille
- H. Bulten, S. Klous Nikhef
- F. Harris, I. McArthur, A. Soroko Oxford
- G. N. Patrick, G. Kuznetsov RAL
2. Overview of presentation
- Current organisation of LHCb distributed computing
- UK facilities and support through GridPP
- Current use of Globus and EDG middleware
- Planning for data challenges and the use of the Grid
- Current LHCb Grid/applications R&D
- Conclusions
3. History of distributed MC production
- The distributed system has been running for 3 years and has processed many millions of events for the LHCb design.
- Main production sites:
  - CERN, Bologna, Liverpool, Lyon, NIKHEF, RAL
- Globus already used for job submission to RAL and Lyon.
- System interfaced to the Grid and demonstrated at the EU-DG Review and the NeSC/UK Opening.
- For the 2002 Data Challenges, adding new institutes:
  - Bristol, Cambridge, Oxford, ScotGrid
- In 2003, add:
  - Barcelona, Moscow, Germany, Switzerland, Poland
4Current Architecture
Production Manager Create no. of jobs (500 events
each) Determine configuration Run
executable Check data Copy data/logs
Physics Coordinator
Physicist
Job Creation/Submission via Web Identify
outstanding requests Select workflow Create
scripts via Java servlets.
Monitoring via PVSS Submit jobs to distributed
sites See what jobs are running Check
configuration Kill jobs, etc
Bookkeeping Database
5. Logical flow
(Diagram) Submit jobs remotely via Web -> Execute on farm -> Data quality check -> Update bookkeeping database -> Transfer data to mass store -> Analysis
6. Monitoring and Control of MC jobs
- LHCb has adopted PVSS II as prototype control and monitoring system for MC production.
- PVSS is a commercial SCADA (Supervisory Control And Data Acquisition) product developed by ETM.
- Adopted as control framework for the LHC Joint Controls Project (JCOP).
- Available for Linux and Windows platforms.
7. (No transcript)
8. UK Tier 1 - RAL
- New computing farm: 4 racks holding 156 dual 1.4 GHz Pentium III CPUs. Each box has 1 GB of memory, a 40 GB internal disk and 100 Mb ethernet.
- Tape robot upgraded last year: uses 60 GB STK 9940 tapes; 45 TB current capacity, could hold 330 TB.
- 50 TByte disk-based mass storage unit after RAID 5 overhead.
- PCs are clustered on network switches with up to 8 x 1000 Mb ethernet out of each rack.
- 2004 scale: 1000 CPUs, 0.5 PBytes
9UK Regional Centres
Local Perspective Consolidate Research
Computing Optimisation of Number of
Nodes? 4 Relative size dependent on funding
dynamics
10. UK Prototype Tier 2 - ScotGrid
- ScotGrid processing nodes at Glasgow:
  - 59 IBM X Series 330 dual 1 GHz Pentium III with 2 GB memory
  - 2 IBM X Series 340 dual 1 GHz Pentium III with 2 GB memory and dual ethernet
  - 3 IBM X Series 340 dual 1 GHz Pentium III with 2 GB memory and 100/1000 Mbit/s ethernet
  - 1 TB disk
  - LTO/Ultrium tape library
  - Cisco ethernet switches
- ScotGrid storage at Edinburgh:
  - IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
  - 70 x 73.4 GB IBM FC hot-swap HDD
- 2004 scale: 300 CPUs, 0.1 PBytes
11. GridPP support
- 2 LHCb posts:
  - to work on Gaudi (software framework) persistency services
  - to work on MC monitoring and control software
- 2 ATLAS/LHCb Gaudi/GANGA posts:
  - interface between the software framework and Grid services
12. Current use of Grid middleware in the development system
- Authentication
- grid-proxy-init
- Job submission to DataGrid
- dg-job-submit
- Monitoring and control
- dg-job-status
- dg-job-cancel
- dg-job-get-output
- Data publication and replication
- globus-url-copy, GDMP
- Resource scheduling and use of CERN MSS
- JDL, sandboxes, storage elements
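As a rough illustration of how these tools chain together (not part of the production system itself), the sketch below drives the same command-line tools from a short Python script, reusing the JDL file and output directory from Example 1. The way the job identifier is parsed from the dg-job-submit output is an assumption and may differ between EDG releases.

import subprocess

def run(cmd):
    # Run a Grid command-line tool and return its standard output.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Authentication: create a Grid proxy. This prompts for the certificate
# passphrase, so it is run interactively rather than with captured output.
subprocess.run(["grid-proxy-init"], check=True)

# Job submission to DataGrid, using the JDL file from Example 1.
submit_out = run(["dg-job-submit",
                  "/home/evh/sicb/sicb/bbincl1600061.jdl",
                  "-o", "/home/evh/logsub/"])

# The broker prints the assigned job identifier; here it is assumed to be the
# last non-empty line of the output (the exact format varies by EDG release).
job_id = [line for line in submit_out.splitlines() if line.strip()][-1]

# Monitoring and control.
print(run(["dg-job-status", job_id]))       # query current state
# run(["dg-job-cancel", job_id])            # would cancel the job

# Once the job has finished, fetch the output sandbox (logs, small files).
print(run(["dg-job-get-output", job_id]))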
13. Example 1: Job Submission
- dg-job-submit /home/evh/sicb/sicb/bbincl1600061.jdl -o /home/evh/logsub/
- bbincl1600061.jdl:
  Executable = "script_prod";
  Arguments = "1600061,v235r4dst,v233r2";
  StdOutput = "file1600061.output";
  StdError = "file1600061.err";
  InputSandbox = {"/home/evhtbed/scripts/x509up_u149", "/home/evhtbed/sicb/mcsend", "/home/evhtbed/sicb/fsize", "/home/evhtbed/sicb/cdispose.class", "/home/evhtbed/v235r4dst.tar.gz", "/home/evhtbed/sicb/sicb/bbincl1600061.sh", "/home/evhtbed/script_prod", "/home/evhtbed/sicb/sicb1600061.dat", "/home/evhtbed/sicb/sicb1600062.dat", "/home/evhtbed/sicb/sicb1600063.dat", "/home/evhtbed/v233r2.tar.gz"};
  OutputSandbox = {"job1600061.txt", "D1600063", "file1600061.output", "file1600061.err", "job1600062.txt", "job1600063.txt"};
14. Example 2: Data Publishing and Replication
(Diagram) On the CERN testbed, a job running on a Compute Element writes its data to local disk, copies it to a Storage Element / MSS with globus-url-copy, then registers the file (register-local-file) and publishes it to the Replica Catalogue at NIKHEF, Amsterdam. Elsewhere on the Grid, a job obtains the data from its local Storage Element via replica-get.
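The same flow can be sketched as a script. In the minimal sketch below, only globus-url-copy is a command taken directly from the diagram; the file paths and Storage Element URL are hypothetical, and the GDMP registration/publication commands are placeholders whose real names and options depend on the installed GDMP release.

import subprocess

def run(cmd):
    # Execute one command-line step of the publish/replicate flow.
    subprocess.run(cmd, check=True)

# The job has written its output to local disk on the Compute Element
# (hypothetical paths/URLs, for illustration only).
local_file = "file:///home/evhtbed/data/output1600061.dat"
se_file = "gsiftp://some-storage-element.cern.ch/data/output1600061.dat"

# Copy the data from local disk to the Storage Element / MSS.
run(["globus-url-copy", local_file, se_file])

# Register the file locally and publish it to the Replica Catalogue at
# NIKHEF, Amsterdam. GDMP-style command names are used as placeholders;
# consult the installed GDMP release for the actual commands and options.
run(["gdmp_register_local_file", "/data/output1600061.dat"])  # placeholder
run(["gdmp_publish_catalogue"])                               # placeholder

# At a rest-of-Grid site, a job would then pull the data from its nearest
# Storage Element, corresponding to the "replica-get" step in the diagram.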
15. LHCb Data Challenge 1 (July-September 2002)
- Physics Data Challenge (PDC) for detector, physics and trigger evaluations:
  - based on the existing MC production system; small amount of Grid technology to start with
  - generate 3x10^7 events (signal, specific background, generic b and c, minimum bias)
- Computing Data Challenge (CDC) for checking and developing software:
  - will make more extensive use of Grid middleware
  - components will be incorporated into the PDC once proven in the CDC
16. LHCb software framework - Gaudi
17. GANGA: Gaudi ANd Grid Alliance
Joint ATLAS (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint ATLAS/LHCb research posts at Cambridge and Oxford.
- Application facilitating the use of Grid services by end-user physicists and production managers for running Gaudi/Athena jobs.
- A GUI-based application that should help throughout the complete job lifetime:
  - job preparation and configuration
  - resource booking
  - job submission
  - job monitoring and control
(Diagram) GANGA provides a GUI on top of collective and resource Grid services; it passes JobOptions and Algorithms to the GAUDI program and receives histograms, monitoring information and results back.
18. Required functionality
- Before the Gaudi/Athena program starts:
  - Security (obtaining certificates and credentials)
  - Job configuration (algorithm configuration, input data selection, ...)
  - Resource booking and policy checking (CPU, storage, network)
  - Installation of required software components
  - Job preparation and submission
- While the Gaudi/Athena program is running:
  - Job monitoring (generic and specific)
  - Job control (suspend, abort, ...)
- After the program has finished:
  - Data management (registration)
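To make the grouping concrete, the sketch below outlines a hypothetical Python job class covering the three phases listed above; the class and method names are invented for illustration and are not the actual GANGA interface.

class GridJob:
    """Hypothetical wrapper grouping the functionality listed above;
    not the actual GANGA interface."""

    def __init__(self, executable, job_options, input_data):
        self.executable = executable      # e.g. a Gaudi/Athena application
        self.job_options = job_options    # algorithm configuration
        self.input_data = input_data      # selected input datasets
        self.job_id = None

    # --- before the program starts ---
    def authenticate(self):
        """Obtain Grid certificates/credentials (e.g. a proxy)."""

    def book_resources(self, cpu_hours, storage_gb):
        """Check policies and reserve CPU, storage and network resources."""

    def prepare(self):
        """Install required software components and build the input sandbox."""

    def submit(self):
        """Hand the prepared job to the Grid scheduler; store the job id."""

    # --- while the program is running ---
    def status(self):
        """Return generic and application-specific monitoring information."""

    def control(self, action):
        """Suspend, resume or abort the running job."""

    # --- after the program has finished ---
    def register_output(self):
        """Register produced data in the bookkeeping/replica catalogues."""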
19. Python Bus Design (a possible model for implementation)
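A minimal sketch of what such a Python bus could look like is given below: components (GUI panels, Grid-service wrappers, Gaudi job handlers) subscribe to named topics and exchange messages through the bus. The topic and message names are invented for illustration.

from collections import defaultdict

class PythonBus:
    """Minimal publish/subscribe bus: components register callbacks for
    named topics and post messages to them (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)

# Example wiring: a GUI panel listens for monitoring updates published by a
# Grid-services module (topic name is hypothetical).
bus = PythonBus()
bus.subscribe("job.status", lambda msg: print("GUI update:", msg))
bus.publish("job.status", {"job": "bbincl1600061", "state": "Running"})

The point of such a bus is that GUI, Grid and application components stay decoupled: each only needs to know the bus and the topics it uses.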
20. Conclusions
- LHCb already has distributed MC production using Grid facilities for job submission.
- We are embarking on large-scale data challenges commencing July 2002, and we are developing our analysis model.
- Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE).
- R&D projects are in place:
  - for interfacing users (production and analysis) and the Gaudi/Athena software framework to Grid services
  - for putting the production system into an integrated Grid environment with monitoring and control
- All work is being conducted in close participation with the EDG and LCG projects:
  - ongoing evaluations of EDG middleware with physics jobs
  - participation in LCG working groups, e.g. report on common use cases for a HEP Common Application Layer (http://cern.ch/fca/HEPCAL.doc)