1
LHCb Distributed Computing and the Grid
Nick Brook, University of Bristol
ACAT'02, 27th June 2002
  • D. Galli, U. Marconi, V. Vagnoni INFN Bologna
  • N. Brook Bristol
  • K. Harrison Cambridge
  • E. Van Herwijnen, J. Closier, P. Mato CERN
  • A. Khan Edinburgh
  • A. Tsaregorodtsev Marseille
  • H. Bulten, S. Klous Nikhef
  • F. Harris, I. McArthur, A. Soroko Oxford
  • G. N. Patrick, G. Kuznetsov RAL

2
Overview of presentation
  • Current organisation of LHCb distributed
    computing
  • UK facilities and support through GridPP
  • Current use of Globus and EDG middleware
  • Planning for data challenges and the use of Grid
  • Current LHCb Grid/applications R&D
  • Conclusions

3
History of distributed MC production
  • The distributed system has been running for 3 years and has processed many millions of events for the LHCb design.
  • Main production sites: CERN, Bologna, Liverpool, Lyon, NIKHEF and RAL.
  • Globus already used for job submission to RAL and Lyon.
  • System interfaced to the Grid and demonstrated at the EU-DG Review and the NeSC/UK opening.
  • For the 2002 Data Challenges, adding new institutes: Bristol, Cambridge, Oxford, ScotGrid.
  • In 2003, add: Barcelona, Moscow, Germany, Switzerland and Poland.

4
Current Architecture
  • Production Manager: creates a number of jobs (500 events each), determines the configuration, runs the executable, checks the data, copies data/logs.
  • Job creation/submission via the Web (Physics Coordinator, Physicist): identify outstanding requests, select a workflow, create scripts via Java servlets.
  • Monitoring via PVSS: submit jobs to distributed sites, see which jobs are running, check the configuration, kill jobs, etc.
  • Bookkeeping database.
5
LOGICAL FLOW
  • Submit jobs remotely via the Web
  • Execute on farm
  • Data quality check
  • Update bookkeeping database
  • Transfer data to mass store
  • Analysis
6
Monitoring and Control of MC jobs
  • LHCb has adopted PVSS II as the prototype control and monitoring system for MC production.
  • PVSS is a commercial SCADA (Supervisory Control And Data Acquisition) product developed by ETM.
  • It has also been adopted as the control framework for the LHC Joint Controls Project (JCOP).
  • Available for Linux and Windows platforms.

7
(No Transcript)
8
UK Tier 1 - RAL
New computing farm: 4 racks holding 156 dual 1.4 GHz Pentium III CPUs. Each box has 1 GB of memory, a 40 GB internal disk and 100 Mbit/s ethernet.
Tape robot upgraded last year; uses 60 GB STK 9940 tapes. Current capacity 45 TB, could hold 330 TB.
50 TByte disk-based mass storage unit after RAID 5 overhead.
PCs are clustered on network switches with up to 8 x 1000 Mbit/s ethernet out of each rack.
2004 scale: 1000 CPUs, 0.5 PBytes.
9
UK Regional Centres
Local perspective: consolidate research computing.
Optimisation of the number of nodes? 4
Relative size dependent on funding dynamics.
10
UK Prototype Tier2 - ScotGrid
  • ScotGrid Processing nodes at Glasgow
  • 59 IBM X Series 330 dual 1 GHz Pentium III with
    2GB memory
  • 2 IBM X Series 340 dual 1 GHz Pentium III with
    2GB memory and dual ethernet
  • 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and 100 and 1000 Mbit/s ethernet
  • 1TB disk
  • LTO/Ultrium Tape Library
  • Cisco ethernet switches
  • ScotGrid Storage at Edinburgh
  • IBM X Series 370 PIII Xeon with 512 MB memory, 32 x 512 MB RAM
  • 70 x 73.4 GB IBM FC Hot-Swap HDD

2004 scale: 300 CPUs, 0.1 PBytes.
11
GridPP support
  • 2 LHCb posts
  • to work on Gaudi (software framework) persistency
    services
  • to work on MC monitoring and control software
  • 2 ATLAS/LHCb Gaudi/GANGA posts
  • Interface between software framework and Grid
    services

12
Current Use of Grid Middleware in the development system
  • Authentication
  • grid-proxy-init
  • Job submission to DataGrid
  • dg-job-submit
  • Monitoring and control
  • dg-job-status
  • dg-job-cancel
  • dg-job-get-output
  • Data publication and replication
  • globus-url-copy, GDMP
  • Resource scheduling and use of the CERN MSS
  • JDL, sandboxes, storage elements
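
The fragment below is a minimal sketch of how these commands are typically chained in the development system; the JDL file name, output directory and the <job-id> placeholder are illustrative rather than the actual production values (a real JDL example follows on the next slide).

    # Authenticate: create a Grid proxy certificate for the session
    grid-proxy-init

    # Submit a job described by a JDL file to the EDG testbed;
    # -o stores the returned job identifier under the given directory
    dg-job-submit myjob.jdl -o /home/user/logsub/

    # Monitor (and, if necessary, cancel) the job using that identifier
    dg-job-status <job-id>
    dg-job-cancel <job-id>

    # Once the job has finished, retrieve the files listed in its OutputSandbox
    dg-job-get-output <job-id>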

13
Example 1: Job Submission
  • dg-job-submit /home/evh/sicb/sicb/bbincl1600061.jdl -o /home/evh/logsub/
  • bbincl1600061.jdl:

    Executable    = "script_prod";
    Arguments     = "1600061,v235r4dst,v233r2";
    StdOutput     = "file1600061.output";
    StdError      = "file1600061.err";
    InputSandbox  = {"/home/evhtbed/scripts/x509up_u149", "/home/evhtbed/sicb/mcsend",
                     "/home/evhtbed/sicb/fsize", "/home/evhtbed/sicb/cdispose.class",
                     "/home/evhtbed/v235r4dst.tar.gz", "/home/evhtbed/sicb/sicb/bbincl1600061.sh",
                     "/home/evhtbed/script_prod", "/home/evhtbed/sicb/sicb1600061.dat",
                     "/home/evhtbed/sicb/sicb1600062.dat", "/home/evhtbed/sicb/sicb1600063.dat",
                     "/home/evhtbed/v233r2.tar.gz"};
    OutputSandbox = {"job1600061.txt", "D1600063", "file1600061.output",
                     "file1600061.err", "job1600062.txt", "job1600063.txt"};

14
Example 2: Data Publishing and Replication
(Diagram) On the CERN testbed, a job running on a compute element writes its data to local disk, copies it to a storage element (backed by the MSS) with globus-url-copy, then registers it (register-local-file) and publishes it to the Replica Catalogue at NIKHEF, Amsterdam. On the rest of the Grid, a job retrieves the published data to its local storage element with replica-get.
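
An illustrative sketch of the copy step only: host names and paths below are invented placeholders, and the register/publish/replica-get operations are named as on the diagram rather than by their exact GDMP commands.

    # Copy the job's output from local disk on the compute element
    # to a storage element over GridFTP
    globus-url-copy file:///data/job1600061/output.dst \
        gsiftp://testbed-se.cern.ch/flatfiles/lhcb/output.dst

    # The file is then registered ("register-local-file") and published
    # ("publish") to the Replica Catalogue via GDMP, after which a job at
    # another site can pull a copy with "replica-get".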
15
LHCb Data Challenge 1 (July-September 2002)
  • Physics Data Challenge (PDC) for detector, physics and trigger evaluations
  • based on the existing MC production system, with a small amount of Grid technology to start with
  • Generate 3 x 10^7 events (signal, specific background, generic b and c, minimum bias)
  • Computing Data Challenge (CDC) for checking and developing software
  • will make more extensive use of Grid middleware
  • Components will be incorporated into the PDC once proven in the CDC

16
LHCb software framework - Gaudi
17
GANGA: Gaudi ANd Grid Alliance
Joint ATLAS (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint ATLAS/LHCb research posts at Cambridge and Oxford
  • An application that makes it easier for end-user physicists and production managers to use Grid services for running Gaudi/Athena jobs.
  • A GUI-based application that should help throughout the complete job lifetime:
    - job preparation and configuration
    - resource booking
    - job submission
    - job monitoring and control

(Diagram) The GANGA GUI sits between the user and the GAUDI program: job options and algorithms go in, histograms, monitoring information and results come back, and GANGA talks to the collective and resource Grid services on the user's behalf.
18
Required functionality
  • Before Gaudi/Athena program starts
  • Security (obtaining certificates and credentials)
  • Job configuration (algorithm configuration, input
    data selection, ...)
  • Resource booking and policy checking (CPU,
    storage, network)
  • Installation of required software components
  • Job preparation and submission
  • While Gaudi/Athena program is running
  • Job monitoring (generic and specific)
  • Job control (suspend, abort, ...)
  • After program has finished
  • Data management (registration)

19
Python Bus Design (a possible model for implementation)
20
Conclusions
  • LHCb already has distributed MC production using
    GRID facilities for job submission
  • We are embarking on large scale data challenges
    commencing July 2002, and we are developing our
    analysis model
  • Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE)
  • R&D projects are in place
  • for interfacing users (production and analysis) and the Gaudi/Athena software framework to Grid services
  • for putting the production system into an integrated Grid environment with monitoring and control
  • All work being conducted in close participation
    with EDG and LCG projects
  • Ongoing evaluations of EDG middleware with
    physics jobs
  • Participation in LCG working groups, e.g. the report on common use cases for a HEP common application layer (http://cern.ch/fca/HEPCAL.doc)