The LHCb Way of Computing

Transcript and Presenter's Notes

Title: The LHCb Way of Computing
1
The LHCb Way of Computing - The approach to its
organisation and development
John Harvey, CERN/LHCb
DESY Seminar, Jan 15th, 2001
2
Talk Outline
  • Brief introduction to the LHCb experiment
  • Requirements on data rates and CPU capacities
  • Scope and organisation of the LHCb Computing
    Project
    • Importance of reuse and a unified approach
  • Data processing software
    • Importance of architecture-driven development
      and software frameworks
  • DAQ system
    • Simplicity and maintainability of the
      architecture
    • Importance of industrial solutions
  • Experimental Control System
    • Unified approach to controls
    • Use of commercial software
  • Summary

3
Overview of LHCb Experiment
4
The LHCb Experiment
  • Special purpose experiment to measure precisely
    CP asymmetries and rare decays in B-meson
    systems
  • Operating at the most intensive source of Bu, Bd,
    Bs and Bc, i.e. the LHC at CERN
  • LHCb plans to run with an average luminosity of
    2×10³² cm⁻²s⁻¹
  • Events dominated by single pp interactions - easy
    to analyse
  • Detector occupancy is low
  • Radiation damage is reduced
  • High performance trigger based on
  • High pT leptons and hadrons (Level 0)
  • Detached decay vertices (Level 1)
  • Excellent particle identification for charged
    particles
  • K/π separation for 1 GeV/c < p < 100 GeV/c

5
The LHCb Detector
  • At high energies b- and b̄-hadrons are produced in
    the same forward cone
  • Detector is a single-arm spectrometer with one
    dipole
  • θmin ≈ 15 mrad (beam pipe and radiation)
  • θmax ≈ 300 mrad (cost optimisation)

Polar angles of b- and b̄-hadrons calculated using
PYTHIA
6
LHCb Detector Layout
7
(No Transcript)
8
Typical Interesting Event
9
The LHCb Collaboration
49 institutes, 513 members
10
LHCb in numbers
  • Expected rate from inelastic p-p collisions is
    15 MHz
  • Total b-hadron production rate is 75 kHz
  • Branching ratios of interesting channels range
    between 10⁻⁵ and 10⁻⁴, giving an interesting
    physics rate of 5 Hz (see the check below)
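A quick back-of-envelope check of these numbers (arithmetic added
here, not on the original slide):

    \[
    75\ \mathrm{kHz} \times (10^{-5}\ \text{to}\ 10^{-4})
    \approx 0.75\ \text{to}\ 7.5\ \mathrm{Hz},
    \]

which brackets the quoted 5 Hz before trigger and selection
efficiencies are taken into account.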

11
Timescales
  • LHCb experiment approved in September 1998
  • Construction of each component scheduled to start
    after approval of corresponding Technical Design
    Report (TDR)
  • Magnet, Calorimeter and RICH TDRs submitted in
    2000
  • Trigger and DAQ TDRs expected January 2002
  • Computing TDR expected December 2002
  • Expect nominal luminosity (2×10³² cm⁻²s⁻¹)
    soon after LHC turn-on
  • Exploit physics potential from day 1
  • Smooth operation of the whole data acquisition
    and data processing chain will be needed very
    quickly after turn-on
  • Locally tuneable luminosity → long physics
    programme
  • Cope with long life-cycle of 15 years

12
LHCb Computing Scope and Organisation
13
Requirements and Resources
  • More stringent requirements
  • Enormous number of items to control -
    scalability
  • Inaccessibility of detector and electronics
    during data-taking - reliability
  • Intense use of software in triggering (Levels 1,
    2, 3) - quality
  • Many orders of magnitude more data and CPU -
    performance
  • Experienced manpower very scarce
  • Staffing levels falling
  • Technology evolving very quickly (hardware and
    software)
  • Rely very heavily on very few experts (1 or 2) -
    bootstrap approach
  • The problem: a more rigorous approach is needed,
    but this is more manpower-intensive and must be
    undertaken under conditions of dwindling resources

14
Importance of Reuse
  • Put extra effort into building high quality
    components
  • Become more efficient by extracting more use out
    of these components (reuse)
  • Many obstacles to overcome
  • too broad functionality / lack of flexibility in
    components
  • proper roles and responsibilities not defined
    (e.g. architect)
  • organisational - reuse requires a broad overview
    to ensure a unified approach
  • we tend to split into separate domains, each
    independently managed
  • cultural
  • don't trust others to deliver what we need
  • fear of dependency on others
  • failure to share information with others
  • developers fear loss of creativity
  • Reuse is a management activity - need to provide
    the right organisation to make it happen

15
Traditional Project Organisation
[Organisation chart: separate teams per sub-system, with components
such as DAQ hardware and a message system developed independently]
16
A Process for reuse
  • Manage: plan, initiate, track, coordinate; set
    priorities and schedules; resolve conflicts
  • Build: develop architectural models; choose
    integration standards; engineer reusable
    components
  • Support: support development; manage and maintain
    components; validate, classify, distribute;
    document, give feedback
  • Assemble: design the application; find and
    specialise components; develop missing components;
    integrate components

[Flow: requirements and existing software and hardware in → systems out]
17
LHCb Computing Project Organisation
[Organisation chart: a Computing Steering Group, advised by a
Technical Review and the National Computing Board, oversees teams
organised around the reuse-process activities Manage, Build, Support
and Assemble]
18
Data Processing Software
19
Software architecture
  • Definition of software architecture [1]
  • The set of significant decisions about the
    organization of the software system
  • The selection of the structural elements and
    their interfaces by which the system is composed
  • Their behavior - collaboration among the
    structural elements
  • The composition of these structural and
    behavioral elements into progressively larger
    subsystems
  • The architectural style that guides this
    organization
  • The architecture is the blueprint (architecture
    description document)

[1] I. Jacobson et al., The Unified Software
Development Process, Addison-Wesley, 1999
20
Software Framework
  • Definition of software framework [2,3]
  • A kind of micro-architecture that codifies a
    particular domain
  • Provides the suitable knobs, slots and tabs that
    permit clients to customise it for specific
    applications within a given range of behaviour
    (see the sketch below)
  • A framework realizes an architecture
  • A large O-O system is constructed from several
    cooperating frameworks
  • The framework is real code
  • The framework should be easy to use and should
    provide a lot of functionality

[2] G. Booch, Object Solutions, Addison-Wesley, 1996
[3] E. Gamma et al., Design Patterns, Addison-Wesley, 1995
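To make the "knobs, slots and tabs" idea concrete, here is a minimal,
self-contained C++ sketch of inversion of control, the mechanism
behind such frameworks (all names are illustrative, not GAUDI's
actual API): the framework owns the flow of control and calls back
into client-supplied code at well-defined customisation points.

    #include <iostream>
    #include <memory>
    #include <vector>

    // Customisation point: clients specialise the framework by
    // deriving from this interface (one of the "slots" it exposes).
    struct EventProcessor {
        virtual ~EventProcessor() = default;
        virtual void process(int event) = 0;
    };

    // The framework is real code: it owns the event loop and calls
    // the client code at fixed points (inversion of control).
    class Framework {
    public:
        void addProcessor(std::unique_ptr<EventProcessor> p) {
            processors_.push_back(std::move(p));
        }
        void run(int nEvents) {
            for (int e = 0; e < nEvents; ++e)
                for (auto& p : processors_) p->process(e);
        }
    private:
        std::vector<std::unique_ptr<EventProcessor>> processors_;
    };

    // Client code: only this part is application-specific.
    struct PrintProcessor : EventProcessor {
        void process(int event) override {
            std::cout << "processing event " << event << '\n';
        }
    };

    int main() {
        Framework fw;
        fw.addProcessor(std::make_unique<PrintProcessor>());
        fw.run(3); // the framework drives; the client is driven
    }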
21
Benefits
  • Having an architecture and a framework
  • Common vocabulary, better specifications of what
    needs to be done, better understanding of the
    system.
  • Low coupling between concurrent developments.
    Smooth integration. Organization of the
    development.
  • Robustness, resilient to change
    (change-tolerant).
  • Fostering code re-use

architecture
framework
applications
22
What's the scope?
  • Each LHC experiment needs a framework to be used
    in its event data processing applications
  • physics/detector simulation
  • high-level triggers
  • reconstruction
  • analysis
  • event display
  • data quality monitoring, etc.
  • The experiment framework will incorporate other
    frameworks: persistency, detector description,
    event simulation, visualization, GUI, etc.

23
Software Structure
Applications: built on top of frameworks and
implementing the required physics algorithms -
reconstruction, simulation, high-level triggers,
analysis
Frameworks / Toolkits: one main framework plus
various specialized frameworks - visualization,
persistency, interactivity, simulation, etc.
Foundation Libraries: a series of basic libraries
in wide use - STL, CLHEP, etc.
24
GAUDI Object Diagram
[Object diagram: the Application Manager coordinates the Algorithms
and a set of services - Event Selector, JobOptions Service, Message
Service, Particle Property Service and others. Event, detector and
histogram data live in transient stores (Transient Event Store,
Transient Detector Store, Transient Histogram Store), each accessed
through its own service (Event Data Service, Detector Data Service,
Histogram Service) and connected to data files via a Persistency
Service and Converters.]
25
GAUDI Architecture Design Criteria
  • Clear separation between data and algorithms
  • Three basic types of data: event, detector,
    statistics
  • Clear separation between persistent and transient
    data
  • Computation-centric architectural style
  • User code encapsulated in a few specific places:
    algorithms and converters (see the sketch below)
  • All components with well-defined interfaces and
    as generic as possible
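The following self-contained C++ sketch illustrates these criteria -
user code in an algorithm with fixed hooks, exchanging data only
through a transient store. StatusCode, Algorithm and the
initialize/execute/finalize hooks mirror GAUDI's documented
structure, but the classes below are simplified stand-ins, not the
real GAUDI API.

    #include <iostream>
    #include <map>
    #include <string>

    // Simplified stand-ins for GAUDI concepts (not the real API).
    enum class StatusCode { SUCCESS, FAILURE };

    // Transient event store mock: algorithms exchange data only
    // through the store, keeping data and algorithms separated.
    using TransientEventStore = std::map<std::string, double>;

    // User code is encapsulated in algorithms with three fixed hooks.
    class Algorithm {
    public:
        virtual ~Algorithm() = default;
        virtual StatusCode initialize() { return StatusCode::SUCCESS; }
        virtual StatusCode execute(TransientEventStore& store) = 0;
        virtual StatusCode finalize() { return StatusCode::SUCCESS; }
    };

    class TrackFit : public Algorithm {
    public:
        StatusCode execute(TransientEventStore& store) override {
            // Read input from the store, write output back to it.
            double hits = store.at("nHits");
            store["chi2"] = 1.5 * hits; // placeholder computation
            return StatusCode::SUCCESS;
        }
    };

    int main() {
        TransientEventStore store{{"nHits", 12.0}};
        TrackFit fit;
        fit.initialize();
        fit.execute(store);
        fit.finalize();
        std::cout << "chi2 = " << store["chi2"] << '\n';
    }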

26
Status
  • Sept '98: project started, GAUDI team assembled
  • Nov 25 '98: one-day architecture review
  • goals, architecture design document, URD,
    scenarios
  • chair, recorder, architect, external reviewers
  • Feb 8 '99: GAUDI first release (v1)
  • first software week with presentations and
    tutorial sessions
  • plan for second release
  • expand GAUDI team to cover new domains (e.g.
    analysis toolkits, visualisation)
  • Nov '00: GAUDI v6
  • Nov '00: BRUNEL v1
  • New reconstruction program based on GAUDI
  • Supports C++ algorithms (tracking) and wrapped
    FORTRAN
  • FORTRAN gradually being replaced

27
Collaboration with ATLAS
  • Now ATLAS is also contributing to the development
    of GAUDI
  • Open-Source style, experiment-independent web and
    release area
  • Other experiments are also using GAUDI
  • HARP, GLAST, OPERA
  • Since we cannot provide all the functionality
    ourselves, we rely on contributions from others
  • Examples: scripting interface, data dictionaries,
    interactive analysis, etc.
  • Encouragement to put more quality into the
    product
  • Better testing in different environments
    (platforms, domains, ...)
  • Shared long-term maintenance
  • Gaudi developers mailing list
  • tilde-majordom.home.cern.ch/~majordom/news/gaudi-developers/index.html

28
Data Acquisition System
29
Trigger/DAQ Architecture
30
Event Building Network
  • Requirements
  • 6 GB/s sustained bandwidth
  • Scalable
  • 120 inputs (RUs)
  • 120 outputs (SFCs)
  • Commercial and affordable (COTS, commodity?)
  • Readout protocol
  • Pure push-through protocol of complete events to
    one CPU of the farm
  • Destination assignment following an identical
    algorithm in all RUs (belonging to one partition),
    based on the event number (see the sketch after
    this list)
  • Simple hardware and software
  • No central control → perfect scalability
  • Full flexibility for high-level trigger
    algorithms
  • Larger bandwidth needed (~50% more) compared with
    phased event-building
  • Buffer overflows avoided via a throttle to the
    trigger
  • Only static load balancing between RUs and SFCs
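A minimal C++ sketch of such a static destination rule (the
modulo-based choice and all names are illustrative assumptions; the
slide specifies only that every RU applies the same
event-number-based algorithm):

    #include <cstdint>
    #include <iostream>

    // Every RU in a partition evaluates the same pure function,
    // so no central event manager is needed.
    std::uint32_t destinationSFC(std::uint64_t eventNumber,
                                 std::uint32_t nSFCs /* e.g. 120 */) {
        return static_cast<std::uint32_t>(eventNumber % nSFCs);
    }

    int main() {
        // All RUs agree that event 4242 goes to the same SFC.
        std::cout << "event 4242 -> SFC "
                  << destinationSFC(4242, 120) << '\n';
    }

Because the function is pure and identical in every RU, all RUs of a
partition independently route fragments of the same event to the same
SFC, which is what makes the scheme scale without central control.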

31
Readout Unit using Network Processors
  • IBM NP4GS3
  • 4 × 1 Gb full-duplex Ethernet MACs
  • 16 RISC processors @ 133 MHz
  • Up to 64 MB external RAM
  • Used in routers
  • RU functions
  • Event-building and formatting
  • 7.5 µs/event
  • 200 kHz event rate

32
Sub Farm Controller (SFC)
  • Alteon Tigon 2
  • Dual R4000-class processors running at 88 MHz
  • Up to 2 MB memory
  • GigE MAC/link-level interface
  • PCI interface
  • 90 kHz event-fragment rate
  • Development environment
  • GNU C cross-compiler with a few special features
    to support the hardware
  • Source-level remote debugger

[Diagram: standard PC hosting a smart NIC; the smart NIC connects the
readout network (GbE, 50 MB/s) through the PCI bus and PCI bridge to
the CPU and memory on the local bus; a separate NIC feeds the subfarm
network (GbE), and a control NIC the controls network (FEth, 0.5 MB/s)]
33
Control Interface to Electronics
  • Select a reduced number of solutions to interface
    front-end electronics to LHCb's control system
  • No radiation (counting room): Ethernet to
    credit-card PC on modules
  • Low-level radiation (cavern): 10 Mbit/s custom
    serial LVDS twisted pair; SEU-immune
    antifuse-based FPGA interface chip
  • High-level radiation (inside detectors): CCU
    control system made for the CMS tracker; radiation
    hard, SEU immune, bypass
  • Provide support (HW and SW) for the integration
    of the selected solutions

34
Experiment Control System
35
Control and Monitoring
[Dataflow diagram with data rates; Control and Monitoring attaches to
every level:]
  • LHCb detector (VDET, TRACK, ECAL, HCAL, MUON,
    RICH) read out at 40 MHz - 40 TB/s into the
    Level-0 front-end electronics
  • Level-0 trigger (fixed latency 4.0 µs) reduces
    40 MHz to 1 MHz - 1 TB/s into the Level-1
    electronics
  • Level-1 trigger (variable latency < 1 ms) reduces
    1 MHz to 40 kHz
  • Front-End Multiplexers (FEM) and front-end links
    feed the Read-out Units (RU) - 6 GB/s, with a
    throttle signal back to the trigger
  • Read-out Network (RN) carries 6 GB/s to the
    Sub-Farm Controllers (SFC)
  • CPU farm runs trigger Levels 2 and 3 (event
    filter; variable latency: L2 ~10 ms, L3 ~200 ms) -
    50 MB/s to storage
  • Timing and Fast Control distributes the clock and
    trigger decisions; Control and Monitoring spans
    the whole chain over the LAN
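Dividing the quoted bandwidths by the corresponding rates (a
consistency check added here, not on the original slide) gives the
implied average event size at each stage:

    \[
    \frac{40\ \mathrm{TB/s}}{40\ \mathrm{MHz}} = 1\ \mathrm{MB}, \qquad
    \frac{1\ \mathrm{TB/s}}{1\ \mathrm{MHz}} = 1\ \mathrm{MB}, \qquad
    \frac{6\ \mathrm{GB/s}}{40\ \mathrm{kHz}} = 150\ \mathrm{kB},
    \]

i.e. roughly 1 MB per bunch crossing at the front end, reduced to
~150 kB per accepted event by the time data reach the read-out units.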
36
Experimental Control System
  • The Experiment Control System will be used to
    control and monitor the operational state of the
    detector, of the data acquisition and of the
    experimental infrastructure.
  • Detector controls
  • High and Low voltages
  • Crates
  • Cooling and ventilation
  • Gas systems etc.
  • Alarm generation and handling
  • DAQ controls
  • RUN control
  • Setup and configuration of all readout components
    (FE, Trigger, DAQ, CPU Farm, Trigger
    algorithms,...)

37
System Requirements
  • Common control services across the experiment
  • System configuration services: coherent
    information in a database
  • Distributed information system: control data
    archival and retrieval
  • Error reporting and alarm handling
  • Data presentation: status displays, trending
    tools, etc.
  • Expert system to assist the shift crew
  • Objectives
  • Easy to operate: 2-3 person shift crew to run the
    complete experiment
  • Easy to adapt to new conditions and requirements
  • Implies integration of DCS with the control of
    DAQ and data quality monitoring

38
Integrated system - trending charts
[Screenshot: DAQ and slow-control quantities displayed on the same
trending chart]
39
Integrated system error logger
[Screenshot: ALEPH error logger, ERRORS MONITOR - DAQ and
slow-control messages in a single stream, e.g.:]
ALARM 2-JUN 11:30 ALEP R_ALEP_0 RUNC_DAQ ALEPH>> DAQ Error
2-JUN 11:30 ALEP TPEBAL MISS_SOURCE TPRP13 <1_missing_Source(s)>
2-JUN 11:30 ALEP TS TRIGGERERROR Trigger protocol error (TMO_Wait_No_Busy)
2-JUN 11:30 TPC SLOWCNTR SECTR_VME VME CRATE fault in SideA Low
40
Scale of the LHCb Control system
  • Parameters
  • Detector control: O(10⁵) parameters
  • FE electronics: few parameters × 10⁶ readout
    channels
  • Trigger and DAQ: O(10³) DAQ objects × O(10²)
    parameters
  • Implies a high-level description of control
    components (devices/channels)
  • Infrastructure
  • 100-200 control PCs
  • Several hundred credit-card PCs
  • By itself a sizeable network (Ethernet)

41
LHCb Controls Architecture
[Layered architecture diagram; technologies shown alongside each layer:]
  • Supervision layer: SCADA - users, servers,
    configuration DB, archives, log files; connected
    via LAN/WAN
  • Process management layer: communication via OPC
    over the LAN; links to other systems (LHC,
    safety, ...)
  • Field management layer: controllers/PLCs, VME,
    fieldbuses
  • Devices: the experimental equipment
42
Supervisory Control And Data Acquisition
  • Used virtually everywhere in industry including
    very large and mission critical applications
  • Toolkit including
  • Development environment
  • Set of basic SCADA functionality (e.g. HMI,
    Trending, Alarm Handling, Access Control,
    Logging/Archiving, Scripting, etc.)
  • Networking/redundancy management facilities for
    distributed applications
  • Flexible Open Architecture
  • Multiple communication protocols supported
  • Support for major Programmable Logic Controllers
    (PLCs) but not VME
  • Powerful Application Programming Interface (API)
  • Open Database Connectivity (ODBC)
  • OLE for Process Control (OPC)

43
Benefits/Drawbacks of SCADA
  • Standard framework → homogeneous system
  • Support for large distributed systems
  • Buffering against changes in technology,
    operating systems, platforms, etc.
  • Saving of development effort (50-100 man-years)
  • Stability and maturity available immediately
  • Support and maintenance, including documentation
    and training
  • Reduction of work for the end users
  • Not tailored exactly to the end application
  • Risk of the company going out of business
  • Company's development of unwanted features
  • Have to pay

44
Commercial SCADA system chosen
  • Major evaluation effort
  • technology survey looked at 150 products
  • PVSS II chosen, from the Austrian company ETM
  • Device-oriented; Linux and NT support
  • The contract foresees
  • Unlimited usage by members of all institutes
    participating in LHC experiments
  • 10 years maintenance commitment
  • Training provided by the company - to be paid by
    institutes
  • Licenses available from CERN from October 2000
  • PVSS II will be the basis for the development of
    the control systems of all four LHC experiments
    (Joint Controls Project, JCOP)

45
Controls Framework
  • LHCb aims to distribute a framework together with
    the SCADA system
  • Reduce to a minimum the work to be performed by
    the sub-detector teams
  • Ensure work can be easily integrated despite
    being performed in multiple locations
  • Ensure a consistent and homogeneous DCS
  • Engineering tasks for the framework
  • Definition of the system architecture
    (distribution of functionality)
  • Modelling of standard device behaviour
  • Development of configuration tools
  • Templates, symbol libraries, e.g. power supply,
    rack, etc.
  • Support for system partitioning (uses FSM; see
    the sketch after this list)
  • Guidelines on use of colours, fonts, page layout,
    naming, ...
  • Guidelines for alarm priority levels, access
    control levels, etc.
  • First prototype released end 2000
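Since standard device behaviour and partitioning are modelled as
finite state machines, here is a minimal illustrative C++ sketch of
the idea (the states, commands and device are invented for
illustration; the actual PVSS/JCOP FSM tooling is not shown):

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>

    // Invented example states/commands for a generic device.
    enum class State { OFF, RAMPING, ON, ERROR };

    class DeviceFSM {
    public:
        // Transition table: (current state, command) -> next state.
        DeviceFSM() {
            table_[{State::OFF,     "switch_on"}]  = State::RAMPING;
            table_[{State::RAMPING, "ramped"}]     = State::ON;
            table_[{State::ON,      "switch_off"}] = State::OFF;
        }
        void handle(const std::string& cmd) {
            auto it = table_.find({state_, cmd});
            state_ = (it != table_.end()) ? it->second : State::ERROR;
        }
        State state() const { return state_; }
    private:
        State state_ = State::OFF;
        std::map<std::pair<State, std::string>, State> table_;
    };

    int main() {
        DeviceFSM hv; // e.g. a high-voltage channel
        hv.handle("switch_on");
        hv.handle("ramped");
        std::cout << "device is ON? "
                  << (hv.state() == State::ON) << '\n';
    }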

46
Application Architecture
[Tree diagram: the ECS supervises the LHC, the DCS, the DAQ and
SAFETY; the DCS branches into sub-detectors (Vertex, Tracker,
Muon, ...) with domains such as GAS, HV and Temp; the DAQ branches
into the same sub-detectors with their FE and RU components]
47
Run Control
48
Summary
  • Organisation has important consequences for
    cohesion, maintainability and the manpower needed
    to build the system
  • Architecture driven development maximises common
    infrastructure and results in systems more
    resilient to change
  • Software frameworks maximise the level of reuse
    and simplify distributed development by many
    application builders
  • Use of industrial components (hardware and
    software) can reduce development effort
    significantly
  • DAQ is designed with simplicity and
    maintainability in mind
  • Maintain a unified approach e.g. same basic
    infrastructure for detector controls and DAQ
    controls

49
Extra Slides
50
(No Transcript)
51
Typical Interesting Event
52
(No Transcript)
53
LHCb Collaboration
France: Clermont-Ferrand, CPPM Marseille, LAL Orsay
Germany: Tech. Univ. Dresden, KIP Univ. Heidelberg,
Phys. Inst. Univ. Heidelberg, MPI Heidelberg
Italy: Bologna, Cagliari, Ferrara, Firenze,
Frascati, Genova, Milano, Univ. Roma I (La
Sapienza), Univ. Roma II (Tor Vergata)
Netherlands: NIKHEF
Poland: Cracow Inst. Nucl. Phys., Warsaw Univ.
Spain: Univ. Barcelona, Univ. Santiago de Compostela
Switzerland: Univ. Lausanne, Univ. Zürich
UK: Univ. Bristol, Univ. Cambridge, Univ. Edinburgh,
Univ. Glasgow, IC London, Univ. Liverpool, Univ.
Oxford, RAL
CERN
Brazil: UFRJ
China: IHEP (Beijing), Tsinghua Univ. (Beijing)
Romania: IFIN-HH Bucharest
Russia: BINR (Novosibirsk), INR, ITEP, Lebedev
Inst., IHEP, PNPI (Gatchina)
Ukraine: Inst. Phys. Tech. (Kharkov), Inst. Nucl.
Research (Kiev)
54
Requirements on Data Rates and Computing
Capacities
55
LHCb Technical Design Reports
The three TDRs submitted so far (Magnet, Calorimeter, RICH):
  • Submitted January 2000; recommended by LHCC
    March 2000; approved by the RB April 2000
  • Submitted September 2000; recommended November
    2000
  • Submitted September 2000; recommended November
    2000
56
Defining the architecture
  • Issues to take into account
  • Object persistency
  • User interaction
  • Data visualization
  • Computation
  • Scheduling
  • Run-time type information
  • Plug-and-play facilities
  • Networking
  • Security

57
Architectural Styles
  • General categorization of systems [2]
  • user-centric: focus on the direct visualization
    and manipulation of the objects that define a
    certain domain
  • data-centric: focus upon preserving the integrity
    of the persistent objects in a system
  • computation-centric: focus on the transformation
    of objects that are interesting to the system
  • Our applications have elements of all three.
    Which one dominates?

58
Getting Started
  • First crucial step was to appoint an architect -
    ideally with skills as
  • OO mentor, domain specialist, leadership,
    visionary
  • Started with a small design team (6 people),
    including
  • developers, librarian, use-case analyst
  • Control activities through visibility and
    self-discipline
  • meet regularly - in the beginning every day, now
    once per week
  • Collect URs and scenarios, use them to validate
    the design
  • Establish the basic design criteria for the
    overall architecture
  • architectural style, flow of control,
    specification of interfaces

59
Development Process
  • Incremental approach to development
  • new release every few (~4) months
  • software workshop timed to coincide with each new
    release
  • Development cycle is user-driven
  • Users define the priority of what goes in the
    next release
  • Ideally they use what is produced and give rapid
    feedback
  • Frameworks must do a lot and be easy to use
  • Strategic decisions taken following thorough
    review (~1/year)
  • Releases accompanied by complete documentation
  • presentations, tutorials
  • URD, reference documents, user guides, examples

60
Possible migration strategies
[Diagram: three possible migration strategies from the Fortran-based
SICb to the C++-based Gaudi framework - (1) fast translation of the
Fortran into C++, (2) wrapping the Fortran, (3) a gradual migration -
shown against a timeline of framework development, transition, hybrid
and consolidation phases]
61
How to proceed?
  • Physics goal
  • To be able to run new tracking pattern
    recognition algorithms written in C++ in
    production with standard FORTRAN algorithms, in
    time to produce useful results for the RICH TDR
  • Software goal
  • To allow software developers to become familiar
    with GAUDI and to encourage the development of
    new software algorithms in C++
  • Approach
  • choose strategy 3
  • start with migration of reconstruction and
    analysis code
  • simulation will follow later

62
New Reconstruction Program - BRUNEL
  • Benefits of the approach
  • A unified development and production environment
  • As soon as C++ algorithms are proven to do the
    right thing, they can be brought into production
    in the official reconstruction program
  • Early exposure of all developers to the Gaudi
    framework
  • Increasing functionality of the OO DST
  • As more and more of the event data become
    available in Gaudi, it will become more and more
    attractive to perform analysis with Gaudi
  • A smooth transition to a C++-only reconstruction
    (a sketch of the Fortran-wrapping technique
    follows)
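The "wrapping FORTRAN" technique usually rests on C++'s extern "C"
linkage. A minimal illustrative sketch (the routine name and the
trailing-underscore symbol mangling are assumptions typical of
Fortran compilers of that era, not LHCb's actual code; the Fortran
side must be compiled and linked separately):

    // track_fit.f (hypothetical legacy routine):
    //       SUBROUTINE TRKFIT(NHITS, CHI2)
    //       INTEGER NHITS
    //       REAL CHI2
    //       CHI2 = 1.5 * NHITS
    //       END

    #include <iostream>

    // Fortran passes all arguments by reference; many Fortran
    // compilers of the time exported symbols with a trailing
    // underscore.
    extern "C" void trkfit_(int* nhits, float* chi2);

    // Thin C++ wrapper giving the legacy routine a
    // framework-friendly interface.
    float fitTrack(int nHits) {
        float chi2 = 0.f;
        trkfit_(&nHits, &chi2);
        return chi2;
    }

    int main() {
        std::cout << "chi2 = " << fitTrack(12) << '\n';
    }
    // One possible build line of that era (assumption):
    //   g77 -c track_fit.f && g++ main.cpp track_fit.o -lg2c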

63
Integrated System - databases
[Diagram: answering a question like "the power supply on that VME
crate" requires linking the readout system database, the slow control
database and the detector description]
64
Frontend Electronics
  • Data Buffering for Level-0 latency
  • Data Buffering for Level-1 latency
  • Digitization and Zero Suppression
  • Front-end Multiplexing onto Front-end links
  • Push of data to next higher stage of the readout
    (DAQ)

65
Timing and Fast Control
  • Provide common and synchronous clock to all
    components needing it
  • Provide Level-0 and Level-1 trigger decisions
  • Provide commands synchronous in all components
    (Resets)
  • Provide Trigger hold-off capabilities in case
    buffers are getting full
  • Provide support for partitioning (Switches, ORs)

66
IBM NP4GS3
  • Features
  • 4 × 1 Gb full-duplex Ethernet MACs
  • 16 special-purpose RISC processors @ 133 MHz with
    2 hw threads each
  • 4 processors (8 threads) share 3 co-processors
    for special functions
  • Tree search
  • Memory move
  • Etc.
  • Integrated 133 MHz PowerPC processor
  • Up to 64 MB external RAM

67
Event Building Network Simulation
  • Simulated technology: Myrinet
  • Nominal 1.28 Gb/s
  • Xon/Xoff flow control
  • Switches
  • ideal cross-bar
  • 8x8 maximum size (currently)
  • wormhole routing
  • source routing
  • No buffering inside switches
  • Software used: Ptolemy discrete-event framework
    (a toy discrete-event loop is sketched after this
    list)
  • Realistic traffic patterns
  • variable event sizes
  • event-building traffic
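For illustration, the core of any discrete-event simulation of this
kind is a time-ordered event queue; a toy C++ sketch (this is not
Ptolemy, and the fragment delays are invented):

    #include <functional>
    #include <iostream>
    #include <queue>
    #include <vector>

    // Minimal discrete-event engine: a time-ordered queue of
    // (time, action) pairs.
    struct Event {
        double time;
        std::function<void(double)> action;
        bool operator>(const Event& o) const { return time > o.time; }
    };

    int main() {
        std::priority_queue<Event, std::vector<Event>,
                            std::greater<Event>> q;
        // Toy event-building traffic: fragments of one event arrive
        // at an SFC from 4 RUs with different link delays; the event
        // is complete when the last fragment arrives.
        int pending = 4;
        for (double delay : {1.0, 1.5, 2.0, 3.5})
            q.push({delay, [&pending](double t) {
                if (--pending == 0)
                    std::cout << "event complete at t=" << t << " us\n";
            }});
        while (!q.empty()) { // advance simulated time event by event
            Event e = q.top(); q.pop();
            e.action(e.time);
        }
    }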

68
Event Building Activities
  • Studied Myrinet
  • Tested NIC event-building
  • Simulated a switching fabric of the size suitable
    for LHCb. Results show that the switching network
    could be implemented (provided buffers are added
    between levels of switches)
  • Currently focussing on xGb Ethernet
  • Studying smart NICs (→ Niko's talk)
  • Possible switch configuration for LHCb with
    today's technology (to be simulated...)

[Figure: Myrinet simulation - multiple paths between
sources and destinations!]
69
Network Simulation Results
Results don't depend strongly on the specific
technology (Myrinet), but rather on its
characteristics (flow control, buffering,
internal speed, etc.)
FIFO buffers between switching levels allow
scalability to be recovered; ~50% efficiency seems
to be a law of nature for these characteristics
70
Alteon Tigon 2
  • Features
  • Dual R4000-class processors running at 88 MHz
  • Up to 2 MB memory
  • GigE MAC/link-level interface
  • PCI interface
  • Development environment
  • GNU C cross-compiler with a few special features
    to support the hardware
  • Source-level remote debugger

71
Controls System
  • Common integrated controls system
  • Detector controls
  • High voltage
  • Low voltage
  • Crates
  • Alarm generation and handling
  • etc.
  • DAQ controls
  • Run control
  • Setup and configuration of all components (FE,
    Trigger, DAQ, CPU farm, trigger algorithms, ...)
  • Consistent and rigorous separation of the
    controls and DAQ paths

Same system for both functions! Scale: 100-200
control PCs, many hundreds of credit-card PCs.
By itself a sizeable network! Most likely Ethernet.