1
LHCb on-line/off-line computing
  • Domenico Galli, Bologna

INFN CSN1 Pisa, 23.6.2004
2
DC04 (May-August 2004) Physics Goals
  • Demonstrate performance of HLT (needed for
    computing TDR)
  • Large minimum bias and signal samples.
  • Improve/confirm S/B estimates of reoptimisation
    TDR
  • Large bb and signal samples.
  • Validation of Gauss/Geant 4 and Generators
  • EVTGEN has replaced QQ
  • Inclusion of new processes in generation (e.g.
    prompt J/ψ)
  • Vincenzo Vagnoni from Bologna, as a member of the
    Physics Panel, coordinates the MC generator group.

3
DC04 Computing Goals
  • Main goal: gather information to be used for
    writing the LHCb computing TDR
  • Robustness test of the LHCb software and
    production system
  • Test of the LHCb distributed computing model
  • Including distributed analyses
  • Incorporation of the LCG software into the LHCb
    production environment
  • Use of LCG resources as a substantial fraction of
    the production capacity.

4
Goal: Robustness Test of the LHCb Software and
Production System
  • First use of the simulation program Gauss based
    on Geant4.
  • Introduction of the new digitisation program,
    Boole.
  • Robustness of the reconstruction program, Brunel
  • Including any new tuning or other available
    improvements
  • Not including mis-alignment/calibration
    (discussion now going on).

5
Goal Robustness Test of the LHCb Software and
Production System (II)
  • Pre-selection of events based on physics criteria
    (DaVinci)
  • AKA stripping
  • Performed by production system after the
    production
  • One job for all the physics channels
  • 1/1000 reduction for each physics channel
  • 10 physics channels → 1/100 total reduction
  • 25 TB → 250 GB.

6
Goal: Test of the LHCb Computing Model
  • Distributed data production
  • As in 2003, will be run on all available
    production sites
  • Including LCG2
  • Controlled by the production manager at CERN
  • In close collaboration with the LHCb production
    site managers.
  • Distributed data sets
  • CERN
  • Complete DST (copied from production centres)
  • Master copies of pre-selections (stripped DST)
  • Tier1
  • Complete replica of pre-selections
  • Master copy of DST produced at associated sites.

7
DC04 Production System
[Diagram: DC04 production system. Data flows between Tier-0 CASTOR, Tier-1 CASTOR, Tier-1 DIRAC, Tier-1 LCG, Tier-1 disk and Tier-1-associated LCG sites, using GRIDFTP, BBFTP and RFIO transfers.]
8
DC04 Production Share
  • LHCb Italy is participating in DC04 with on the
    order of 400 processors (200k SPECint) at the
    INFN Tier-1.
  • At this moment it is the most important regional
    centre, with an amount of resources comparable to
    CERN's.

[Chart: DC04 production share, with the INFN Tier-1 and CERN contributions labelled.]
9
Migration to LCG
  • We started using DIRAC as the main production
    system, because it is urgent for LHCb to produce
    samples for physics.
  • LCG production is now under test. The LCG quota
    is growing.
  • We hope to have most of the production under LCG
    at the end of DC04.

10
Problems
  • Manpower dedicated to the hardware and software
    infrastructure at the Tier-1 is largely
    insufficient.
  • The people involved are working hard, but the
    service coverage is nevertheless lacking.
  • Patching a problem takes, on average, 1-2 days.
  • The duty cycle is about 20-30%, and the situation
    is currently getting worse and worse.
  • Problems are still not solved; we cohabit with
    them.
  • Main problems:
  • Disk storage (both hardware hangs and NFS client
    hangs)
  • Instability of the PBS-Maui queues.

11
On-line computing and trigger
  • The most challenging aspect of LHCb on-line
    computing is the use of a software trigger for L1
    as well (not only for the HLT), with a 1 MHz
    input rate.
  • Cheaper than other solutions (hardware, Digital
    Signal Processors).
  • More configurable.
  • Data flow:
  • L1: 45-88 Gb/s.
  • HLT: 13 Gb/s.
  • Latency:
  • L1: < 2 ms.
  • HLT: 1 s.

12
L1 & HLT Architecture
[Diagram: HLT traffic and Level-1 traffic from the front-end electronics (FE boards and TRM): 323 links at 4 kHz (1.6 GB/s) and 126-224 links at 44 kHz (5.5-11.0 GB/s), through a multiplexing layer (29 and 62-87 switches; 64-137 links at 88 kHz; 32 links), to 94-175 SFCs (94-175 links, 7.1-12.6 GB/s) and a CPU farm of 1800 CPUs; the L1-Decision Sorter, the TFC system and the storage system are also shown.]
13
L1 & HLT Data Flow
[Diagram: L1 & HLT data flow through the front-end electronics (FE boards and TRM), the TFC system, the L1-Decision Sorter and the CPU farm (94 SFCs, 94 links at 7.1 GB/s, 1800 CPUs) to the storage system, with the L0 Yes, L1 Yes and HLT Yes decision points marked.]
14
First Sub-Farm Prototype Built in Bologna
  • 2 Gigabit Ethernet switches (3Com 2824, 24 ports)
  • 16 1U rack-mounted PCs:
  • Dual-processor Intel Xeon, 2.4 GHz
  • Motherboard: SuperMicro X5DPL-iGM
  • 533 MHz FSB (front side bus)
  • 2 GB ECC RAM
  • Chipset: Intel E7501 (8 Gb/s hub interface)
  • Bus Controller Hub: Intel P64H2 (2 x PCI-X, 64
    bit, 66/100/133 MHz)
  • 3 1000Base-T interfaces (1 x Intel 82545EM, 2 x
    Intel 82546EB)

15
Farm Configuration
  • 16 nodes running Red Hat 9, with a 2.6.5 kernel.
  • 1 Gateway, acting as bastion host and NAT to the
    external network
  • 1 Service PC, providing network boot services,
    central syslog, time synchronization, NFS
    exports, etc.
  • 1 diskless Sub-Farm Controller (SFC), with 3
    Gigabit Ethernet links (2 for data and 1 for
    control traffic)
  • 13 diskless Sub-Farm Nodes (SFNs) (26 physical,
    52 logical processors with HT) with 2 Gigabit
    Ethernet links (1 for data and 1 for control
    traffic).

16
Bootstrap Procedure
  • Little disks, little problems
  • The hard disk is the PC component most subject to
    failure.
  • Disk-less (and swap-less) system already
    successfully tested in Bologna off-line cluster.
  • Network bootstrap using DHCP + PXE + MTFTP
  • NFS-mounted disks
  • Root filesystem on NFS
  • New scheme (proposed by the Bologna group) already
    tested (build sketched below):
  • Root filesystem on a 150 MB RAM disk (instead of
    NFS); the compressed image is downloaded together
    with the kernel from the network at boot time
    (Linux initrd).
  • More robust against temporary congestion
    conditions.
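A minimal sketch of how such a compressed RAM-disk root image could be built on the Service PC, assuming a prepared root tree and standard Linux tools (dd, mke2fs, loop mounts, gzip); the paths and the 150 MB size are illustrative placeholders, not the actual production scripts.

```python
import os
import subprocess

def build_initrd(root_tree, image="initrd.img", size_mb=150):
    """Build a compressed RAM-disk image holding the node root filesystem."""
    # Create an empty file of the requested size and put an ext2 filesystem on it.
    subprocess.run(["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={size_mb}"],
                   check=True)
    subprocess.run(["mke2fs", "-F", "-m", "0", image], check=True)
    # Loop-mount the image and copy the prepared root tree into it.
    mnt = "/mnt/initrd"
    os.makedirs(mnt, exist_ok=True)
    subprocess.run(["mount", "-o", "loop", image, mnt], check=True)
    try:
        subprocess.run(f"cp -a {root_tree}/. {mnt}/", shell=True, check=True)
    finally:
        subprocess.run(["umount", mnt], check=True)
    # Compress the image; it is served next to the kernel and loaded at boot.
    subprocess.run(["gzip", "-9", "-f", image], check=True)
    return image + ".gz"

if __name__ == "__main__":
    print(build_initrd("/srv/diskless-root"))   # placeholder root tree path
```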

17
Studies on Throughput and Datagram Loss in
Gigabit Ethernet Links
  • Reliable protocols (TCP, or any reliable level-4
    protocol) can't be used, because retransmission
    introduces an unpredictable latency.
  • A dropped IP datagram means 25 events lost.
  • It's mandatory to verify that the IP datagram loss
    is acceptable for the task.
  • The limit value for the BER specified in IEEE
    802.3 (10^-10 for 100 m cables) is not enough.
  • Measurements performed at CERN show a BER < 10^-14
    for 100 m cables (small enough).
  • However, we had to verify that the following are
    also acceptable (a measurement sketch follows this
    list):
  • Datagram loss in the IP stack of the operating
    system.
  • Ethernet frame loss in the level-2 Ethernet switch.
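A minimal sketch of the kind of datagram-loss measurement this requires: the sender numbers each UDP datagram, the receiver counts what arrives and infers the loss fraction from the missing sequence numbers. The host, port, datagram size and count are placeholders, not the parameters of the actual LHCb test.

```python
import socket
import struct

HOST, PORT = "192.168.1.2", 5001    # placeholder receiver address
PAYLOAD, COUNT = 4096, 1_000_000    # datagram size and number of datagrams

def send():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(COUNT):
        # First 4 bytes carry the sequence number; the rest is padding.
        s.sendto(struct.pack("!I", seq).ljust(PAYLOAD, b"\0"), (HOST, PORT))

def receive():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", PORT))
    s.settimeout(5.0)               # stop once the sender has finished
    received, max_seq = 0, -1
    try:
        while True:
            data, _ = s.recvfrom(PAYLOAD)
            max_seq = max(max_seq, struct.unpack("!I", data[:4])[0])
            received += 1
    except socket.timeout:
        pass
    sent = max_seq + 1              # assumes the last datagram was not lost
    lost = sent - received
    print(f"received={received} lost={lost} fraction={lost / max(sent, 1):.2e}")
```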

18
Studies on Throughput and Datagram Loss in
Gigabit Ethernet Links (II)
  • Concerning the PCs, the best performance reached
    is:
  • Total throughput (4096 B datagrams): 999.90 Mb/s.
  • Lost datagram fraction (4096 B): 7.1 x 10^-10.
  • Obtained in the following configuration (socket
    buffer settings sketched below):
  • OS: Linux, kernel 2.6.0-test11, compiled with the
    preemption flag.
  • NAPI-compliant network driver.
  • FIFO scheduling.
  • Tx/Rx ring descriptors: 4096.
  • qdisc queue (pfifo discipline) size: 1500.
  • IP socket send buffer size: 512 KiB.
  • IP socket receive buffer size: 1 MiB.
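For illustration, the two socket buffer sizes quoted above can be requested through the standard socket API as in the sketch below; the kernel must also permit them (net.core.wmem_max / net.core.rmem_max), and the other items in the list are kernel and driver settings outside the scope of this sketch.

```python
import socket

SEND_BUF = 512 * 1024       # 512 KiB, as listed above
RECV_BUF = 1024 * 1024      # 1 MiB, as listed above

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SEND_BUF)

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, RECV_BUF)

# The kernel typically reports a doubled effective value.
print(tx.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
      rx.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```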

19
Studies on Throughput and Datagram Loss in
Gigabit Ethernet Links (III)
[Figure: per-datagram overheads on the wire, besides the payload: UDP header (8 B), IP header (20 B), Ethernet header (14 B), Ethernet preamble (7 B), Ethernet Start Frame Delimiter (1 B), Ethernet Frame Check Sequence (4 B), Ethernet Inter-Frame Gap (12 B).]
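These per-frame overheads set an upper bound on the useful UDP throughput of a 1 Gb/s link; the sketch below computes it, under the additional assumption of a standard 1500 B Ethernet MTU (not stated on the slide), so that large datagrams are IP-fragmented into several frames.

```python
import math

# Per-frame overheads listed above (bytes).
UDP_HDR, IP_HDR = 8, 20
ETH_OVERHEAD = 14 + 7 + 1 + 4 + 12    # header + preamble + SFD + FCS + inter-frame gap
MTU = 1500                            # assumed Ethernet MTU (not stated on the slide)
LINE_RATE = 1e9                       # 1 Gb/s

def wire_bytes(payload):
    """Bytes occupying the wire for one UDP datagram of the given payload size."""
    ip_payload = payload + UDP_HDR                       # UDP header in the first fragment
    fragments = math.ceil(ip_payload / (MTU - IP_HDR))   # IP fragmentation above the MTU
    return ip_payload + fragments * (IP_HDR + ETH_OVERHEAD)

def max_payload_rate(payload):
    """Maximum useful payload rate (bit/s) at full link load."""
    return LINE_RATE * payload / wire_bytes(payload)

for size in (512, 1472, 4096):
    print(f"{size:5d} B payload -> {max_payload_rate(size) / 1e6:6.1f} Mb/s useful")
```

Under these assumptions a 4096 B datagram occupies about 4278 B on the wire, so roughly 96% of the raw link capacity is available as useful payload.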
20
Studies on Throughput and Datagram Loss in
Gigabit Ethernet Links (IV)
21
Studies on Throughput and Datagram Loss in
Gigabit Ethernet Links (V)
Frame Loss in the Gigabit Ethernet Switch HP
ProCurve 6108
22
Studies on Throughput and Datagram Loss in
Gigabit Ethernet Links (VI)
  • An LHCb public note has been published:
  • A. Barczyk, A. Carbone, J.-P. Dufey, D. Galli,
    B. Jost, U. Marconi, N. Neufeld, G. Peco,
    V. Vagnoni, "Reliability of Datagram Transmission
    on Gigabit Ethernet at Full Link Load", LHCb note
    2004-030, DAQ.

23
Studies on Port Trunking
  • In several tests performed at CERN, AMD Opteron
    CPUs show better performance than Intel Xeon in
    serving IRQs.
  • The use of Opteron PCs, together with port
    trunking (i.e. splitting data across more than one
    Ethernet cable), could help simplify the online
    farm design by reducing the number of sub-farm
    controllers.
  • Every SFC could support more computing nodes.
  • We plan to investigate Linux kernel performance
    in port trunking in the different configurations
    (balance-rr, balance-xor, 802.3ad, balance-tlb,
    balance-alb); a possible test setup is sketched
    below.
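A minimal sketch of how one of these bonding modes could be brought up for a test, assuming the standard Linux bonding driver and the ifenslave tool; the interface names, IP address and chosen mode are placeholders.

```python
import subprocess

# Placeholders: bonding mode under test, slave interfaces, and the bond address.
MODE = "balance-rr"            # also: balance-xor, 802.3ad, balance-tlb, balance-alb
SLAVES = ["eth1", "eth2"]      # e.g. the two data-traffic Gigabit links of an SFC
BOND_IP = "10.0.0.10"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Load the bonding driver in the requested mode, bring up bond0, enslave the links.
run(["modprobe", "bonding", f"mode={MODE}", "miimon=100"])
run(["ifconfig", "bond0", BOND_IP, "netmask", "255.255.255.0", "up"])
for eth in SLAVES:
    run(["ifenslave", "bond0", eth])
```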

24
On-line Farm Monitoring, Configuration and Control
  • One critical issue in administering the event
    filter farm is how to monitor, keep configured
    and up-to-date, and control each node.
  • A stringent requirement of such a control system
    is that it necessarily has to be interfaced to
    the general DAQ framework.
  • PVSS provides a runtime DB, automatic archiving
    of data to permanent storage, alarm generation,
    easy realization of graphical panels, and various
    protocols to communicate over the network.

25
On-line Farm Monitoring, Configuration and
Control (II)
  • The DIM network communication layer, already
    integrated with PVSS, is very suitable for our
    needs:
  • It is simple and efficient.
  • It allows bi-directional communication.
  • The idea is to run light agents on the farm
    nodes, providing information to a PVSS project,
    which publishes it through GUIs; the agents can
    also receive arbitrarily complex commands to be
    executed on the farm nodes, passing back the
    output.

26
On-line Farm Monitoring, Configuration and
Control (III)
  • All the relevant quantities useful to diagnose
    hardware or configuration problems should be
    traced (a collection sketch follows this list):
  • CPU fan speeds and temperatures
  • Memory occupancy
  • RAM disk filesystem occupancy
  • CPU load
  • Network interface statistics, counters, errors
  • TCP/IP stack counters
  • Status of relevant processes
  • Network switch statistics (via the SNMP-PVSS
    interface).
  • Information should be viewable as actual values
    and/or historical trends.
  • Alarms should be issued whenever relevant
    quantities don't fit in the allowed ranges:
  • PVSS naturally allows this, and can even start
    feedback procedures.
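A minimal sketch of the node-side collection of a few of these quantities from the Linux /proc filesystem; the DIM/PVSS publishing step is omitted, and the interface name and paths are the standard Linux ones, used here only for illustration.

```python
import os

def cpu_load():
    """1-minute load average from /proc/loadavg."""
    return float(open("/proc/loadavg").read().split()[0])

def memory_occupancy():
    """Fraction of physical memory in use, from /proc/meminfo."""
    info = {}
    for line in open("/proc/meminfo"):
        key, value = line.split(":")
        info[key] = int(value.split()[0])            # values in kB
    return 1.0 - info["MemFree"] / info["MemTotal"]

def ramdisk_occupancy(path="/"):
    """Fraction of the RAM-disk root filesystem in use."""
    st = os.statvfs(path)
    return 1.0 - st.f_bavail / st.f_blocks

def nic_counters(iface="eth0"):
    """Receive/transmit byte and error counters from /proc/net/dev."""
    for line in open("/proc/net/dev"):
        if line.strip().startswith(iface + ":"):
            fields = line.split(":")[1].split()
            return {"rx_bytes": int(fields[0]), "rx_errs": int(fields[2]),
                    "tx_bytes": int(fields[8]), "tx_errs": int(fields[10])}

if __name__ == "__main__":
    print(cpu_load(), memory_occupancy(), ramdisk_occupancy(), nic_counters())
```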

27
On-line Farm Monitoring, Configuration and
Control (IV)
  • Concerning configuration and control, the idea is
    to embed in the framework every common operation
    usually needed by the system administrator, to be
    performed by means of GUIs:
  • On the Service PC side:
  • Upgrade of operating systems
  • Upgrade of application software
  • Automatic setup of configuration files
  • dhcpd table, NFS exports table, etc.
  • On the farm node side:
  • Inspection and modification of files
  • Broadcast of commands to the entire farm (e.g.,
    reboot)
  • Fast logon by means of a shell-like environment
    embedded inside a PVSS GUI (e.g., commands, stdout
    and stderr passed back and forth by DIM; see the
    sketch after this list)
  • (Re)start of online processes.
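A minimal sketch of the node-side half of such a shell-like round trip: a received command line is executed and its stdout/stderr are returned to the caller. The transport here is a plain function return; in the real scheme the command and its output would travel over DIM, which is not shown.

```python
import subprocess

def execute(command, timeout=30):
    """Run one command line received from the control GUI and return its output."""
    # shell=True gives the shell-like behaviour; the timeout protects the agent
    # against commands that hang.
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=timeout)
    return {"rc": result.returncode,
            "stdout": result.stdout,
            "stderr": result.stderr}

if __name__ == "__main__":
    # Example round trip: this dictionary would be passed back to the GUI.
    print(execute("uname -a && uptime"))
```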