1
CMS Software Architecture
  • An experience in OO C++
  • Vincenzo Innocente
  • CERN

2
CMS (offline) Software
[Data-flow diagram: the Event Filter Objectivity
Formatter, quasi-online reconstruction, online
monitoring, slow control (environmental data),
data quality, calibrations, group analysis,
simulation (G3 and/or G4) and on-demand user
analysis all store objects into, and request
parts of events from, the Persistent Object Store
Manager (Object Database Management System).]
3
Requirements (from the CTP, Dec. 96)
  • Multiple Environments
  • Various software modules must be able to run in a
    variety of environments from level 3 triggering,
    to individual analysis
  • Migration between environments
  • Physics modules should move easily from one
    environment to another (from individual analysis
    to level 3 triggering)
  • Migration to new technologies
  • Should not affect physics software module

4
Requirements (from the CTP)
  • Dispersed code development
  • The software will be developed by
    organizationally and geographically dispersed
    groups of part-time non-professional programmers
  • Flexibility
  • Not all software requirements will be fully known
    in advance
  • Not only performance
  • Also modularity, flexibility, maintainability,
    quality assurance and documentation.

5
Technologies
  • Do not always jump on next year's buzzword
  • Do not limit yourselves to technologies
    standardized before today's graduate students
    were born
  • There is no Silver Bullet
  • Any single technical issue can be solved with a
    few thousand lines of code by any of us.
  • This is not the point
  • What is needed is a coherent Software
    Architecture for an experiment which will last
    longer than a decade

6
C++
  • C++ is a very advanced language which supports
  • C-style coding (for algorithms it is OK)
  • but it is the source of all evils (such as
    memory leaks)
  • Object Oriented programming
  • Data Hiding
  • Encapsulation
  • Polymorphism
  • Multiple and Virtual inheritance
  • Generic Programming
  • Templates
  • Parametric Polymorphism
  • All this makes it complex but powerful!
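As a concrete illustration of the features listed above (a minimal sketch with invented Track and Reconstructor classes, not CMS code), encapsulation, virtual polymorphism and a function template combine like this in modern C++:

#include <iostream>
#include <memory>
#include <vector>

// Encapsulation and data hiding: state is private, exposed via methods.
class Track {
public:
  explicit Track(double pt) : pt_(pt) {}
  double pt() const { return pt_; }
private:
  double pt_;  // hidden from clients
};

// Polymorphism: an abstract interface with virtual dispatch.
class Reconstructor {
public:
  virtual ~Reconstructor() = default;
  virtual Track reconstruct() const = 0;
};

class TrackerReconstructor : public Reconstructor {
public:
  Track reconstruct() const override { return Track(11.5); }
};

// Generic programming: a function template (parametric polymorphism)
// works with any container of Reconstructor pointers.
template <typename Container>
void runAll(const Container& recos) {
  for (const auto& r : recos)
    std::cout << "pt = " << r->reconstruct().pt() << '\n';
}

int main() {
  std::vector<std::unique_ptr<Reconstructor>> recos;
  recos.push_back(std::make_unique<TrackerReconstructor>());
  runAll(recos);  // virtual dispatch inside a template
}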

7
Use Cases
  • Simulated Hits Formatting
  • Digitization of Piled-up Events
  • Test-Beam DAQ Analysis
  • L1 Trigger Simulation
  • Track Reconstruction
  • Calorimeter Reconstruction
  • Global Reconstruction
  • Physics Analysis

8
Track Reconstruction
Local measurements belong to a detector element
and are affected by the detector element's state
(calibrations, alignments).
Pattern recognition navigates in the detector to
associate local measurements into a track.
9
Global Reconstruction
  • Global reconstruction is performed in an absolute
    reference frame
  • 4-vector-like objects are built out of
    trajectories and localized energy deposits
  • A wide range of particle identification, jet,
    vertex, etc. algorithms can be applied to
    produce other 4-vector-like objects
  • Access to the original detector data may be
    required

10
Reconstruction Scenario
  • Reproduce Detector Status at the moment of the
    interaction
  • front-end electronics signals (digis)
  • calibrations
  • alignments
  • Perform local reconstruction as a continuation of
    the front-end data reduction until objects
    detachable from the detectors are obtained
  • Use these objects to perform global
    reconstruction and physics analysis of the Event
  • Store/Retrieve results of computing-intensive
    processes

11
Reconstruction Sources
12
Components
  • Reconstruction Algorithms
  • Event Objects
  • Physics Analysis modules
  • Other services (detector objects, environmental
    data, parameters, etc)
  • Legacy not-OO data (GEANT3)
  • The instances of these components must be
    properly orchestrated to produce the results
    specified by the user

13
CARF: CMS Analysis Reconstruction Framework
[Diagram: physics modules (Reconstruction
Algorithms, Event Filter, Data Monitoring,
Physics Analysis) plug into the Application
Framework, which builds on Calibration Objects,
Event Objects, Visualization Objects and a
Utility Toolkit.]
14
Architecture structure
  • An application framework CARF (CMS Analysis
    Reconstruction Framework),
  • customisable for each of the computing
    environments
  • Physics software modules
  • with clearly defined interfaces that can be
    plugged into the framework
  • A service and utility Toolkit
  • that can be used by any of the physics modules
  • Nothing terribly new, but...
  • Traditional architectures cannot cope with
    LHC-collaboration complexity

15
Problems with traditional architectures
  • A traditional framework schedules a priori the
    sequence of operations required to bring a given
    task to completion
  • Major management problems are produced by changes
    in the dependencies among the various operations
  • Example 1
  • Reconstruction of track type T1 requires only
    tracker hits
  • Reconstruction of track type T2 uses calorimetric
    clusters as seeds
  • If a user switches from T1 to T2, the framework
    must now determine that calorimeter
    reconstruction should run first
  • Example 2
  • The global initialization sequence must be
    changed because, for one detector, some condition
    changes more often than foreseen

16
Framework Basic Dynamics
  • Avoid a monolithic structure
  • A collection of loosely coupled mechanisms which
    implement, in the abstract, the tasks of HEP
    reconstruction and analysis software
  • Implicit Invocation Architecture
  • No central ordering of actions, no explicit
    control of data flow, only implicit dependencies
  • External dependencies managed through an Event
    Driven Notification to subscribers
  • Internal dependencies through an Action on Demand
    mechanism

17
Event Driven Notification
Observers are instantiated by static factories
residing in shared libraries. These are loaded
on demand during application configuration.
Detector elements observe physics events;
factories observe user requests.
[Diagram: a Dispatcher notifies observers
(Obs1-Obs4), which act as clients or providers.]
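A minimal sketch of such an event-driven dispatcher (the names Dispatcher, subscribe and dispatch are illustrative assumptions, not the actual CARF API):

#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Observers subscribe to named events; dispatching an event notifies
// them all. No central ordering of actions, no explicit data flow.
class Dispatcher {
public:
  using Callback = std::function<void()>;
  void subscribe(const std::string& event, Callback cb) {
    observers_[event].push_back(std::move(cb));
  }
  void dispatch(const std::string& event) {
    for (auto& cb : observers_[event]) cb();
  }
private:
  std::unordered_map<std::string, std::vector<Callback>> observers_;
};

int main() {
  Dispatcher d;
  // A detector element observes physics events...
  d.subscribe("newEvent", [] { std::cout << "detector element: update\n"; });
  // ...while a factory observes user requests.
  d.subscribe("userRequest", [] { std::cout << "factory: build observer\n"; });
  d.dispatch("newEvent");
}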
18
Action on Demand
Compare the results of two different track
reconstruction algorithms.
[Diagram: the Event provides Hits to the Detector
Elements, which produce RecHits; Rec T1 builds T1
tracks from RecHits, Rec CaloCl builds CaloCl
clusters, Rec T2 uses RecHits and CaloCl to build
T2 tracks, and Analysis compares T1 and T2.]
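A minimal sketch of the action-on-demand idea, assuming invented provider classes (not the CARF implementation): asking for T2 tracks implicitly triggers calorimeter-cluster reconstruction first, with no a-priori schedule.

#include <iostream>
#include <optional>
#include <vector>

// Each provider computes its product only when first asked for it,
// pulling in its own dependencies as needed.
class CaloClusterProvider {
public:
  const std::vector<double>& get() {
    if (!clusters_) {
      std::cout << "reconstructing calorimeter clusters\n";
      clusters_ = std::vector<double>{1.0, 2.0};  // placeholder work
    }
    return *clusters_;
  }
private:
  std::optional<std::vector<double>> clusters_;  // empty until demanded
};

class TrackT2Provider {
public:
  explicit TrackT2Provider(CaloClusterProvider& calo) : calo_(calo) {}
  void get() {
    // T2 uses calorimetric clusters as seeds, so asking for T2 tracks
    // triggers cluster reconstruction first, automatically.
    const auto& seeds = calo_.get();
    std::cout << "reconstructing T2 tracks from " << seeds.size()
              << " seeds\n";
  }
private:
  CaloClusterProvider& calo_;
};

int main() {
  CaloClusterProvider calo;
  TrackT2Provider t2(calo);
  t2.get();  // dependencies resolve themselves on demand
}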
19
HEP Data
  • Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
  • Event-Collection Meta-Data
  • (luminosity, selection criteria, …)
  • Event Data, User Data

20
Do I need a DBMS? (a self-assessment)
  • Do I encode meta-data (run number, version id) in
    file names?
  • How many files and logbooks should I consult to
    determine the luminosity corresponding to a
    histogram?
  • How easily can I determine whether two events
    have been reconstructed with the same version of
    a program and using the same calibrations?
  • How many lines of code should I write, and what
    fraction of the data should I read, to select all
    events with two μ's with pT > 11.5 GeV and
    η < 2.7?
  • The same at generator level?
  • If the answers scare you, you need a DBMS!

21
A major challenge for LHC: the scale
  • Event output rate: 100 events/sec
    (10^9 events/year)
  • Data written to tape: 100 MBytes/sec (1 PB/yr)
  • Processing capacity: > 10 TIPS (~10^13 instr./s)
  • Typical networks: hundreds of Mbits/second
  • Lifetime of experiment: 2-3 decades
  • Users: ~1700 physicists
  • Software developers: ~100
  • ~100 Petabytes total for the LHC

22
Can CMS do without a DBMS?
  • An experiment lasting 20 years cannot rely just
    on ASCII files and file systems for its
    production bookkeeping, condition database,
    etc.
  • Even today at LEP, the management of all real and
    simulated data-sets (from raw-data to n-tuples)
    is a major enterprise.
  • A DBMS is the modern answer to such a problem
    and, given the choice of OO technology for the
    CMS software, an ODBMS (or a DBMS with an OO
    interface) is the natural solution.

23
A BLOB Model
[Diagram: the Event is stored as database objects,
with RecEvent and RawEvent Blobs.]
A Blob is a sequence of bytes; decoding it is the
user's responsibility.
Why should Blobs not be stored in the DBMS?
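A minimal sketch of what "decoding is a user responsibility" means (the Header layout is a hypothetical example; the DBMS sees only bytes):

#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical user-defined layout; the DBMS knows nothing about it.
struct Header {
  std::uint32_t runNumber;
  std::uint32_t nHits;
};

// Decoding the raw byte sequence is entirely up to the user.
Header decode(const std::vector<std::uint8_t>& blob) {
  assert(blob.size() >= sizeof(Header));
  Header h{};
  std::memcpy(&h, blob.data(), sizeof(h));
  return h;
}

int main() {
  std::vector<std::uint8_t> blob(sizeof(Header), 0);  // stand-in for a DB read
  return static_cast<int>(decode(blob).nHits);
}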
24
Raw Event
RawData are identified by the corresponding
ReadOut. RawData belonging to different detectors
are clustered into different containers. The
granularity will be adjusted to optimize I/O
performance. An index at RawEvent level is used
to avoid accessing all containers when searching
for a given RawData. A range index at RawData
level could be used for fast random access in
complex detectors.
[Diagram: a RawEvent points to ReadOut objects,
each associated with its RawData.]
Index implemented as an ordered vector of pairs
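A minimal sketch of such an index, assuming invented ReadOutId and ContainerId types: keeping the pairs sorted by readout id lets a binary search locate the right container without touching the others.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

using ReadOutId = std::uint32_t;    // assumed id types
using ContainerId = std::uint32_t;
using RawIndex = std::vector<std::pair<ReadOutId, ContainerId>>;

// Binary search in the ordered vector: only the matching container
// needs to be opened, not all of them.
ContainerId findContainer(const RawIndex& index, ReadOutId id) {
  auto it = std::lower_bound(
      index.begin(), index.end(), id,
      [](const std::pair<ReadOutId, ContainerId>& e, ReadOutId key) {
        return e.first < key;
      });
  return (it != index.end() && it->first == id) ? it->second : 0;  // 0 = not found
}

int main() {
  RawIndex index = {{10, 1}, {42, 2}, {99, 3}};  // kept sorted by readout id
  std::cout << "readout 42 -> container " << findContainer(index, 42) << '\n';
}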
27
Physical clustering
28
Can every object have its own persistency?
  • Data size
  • Data complexity
  • Self-Description: at which granularity?
  • Meta-Data vs Data
  • logical vs physical organization
  • Flexibility vs Efficiency
  • Interface with standard tools (like GUIs)
  • Fast prototyping vs formal/controlled design
  • User knowledge and training

29
Is an ODBMS an overkill for Histograms?
  • Maybe, if histograms are your sole I/O.
  • (I use my Sun Ultra 5 to read mail with pine,
    even if a line-mode terminal would be more than
    adequate)
  • N-tuples are user event-data and, for any
    serious use, require a level of management and
    book-keeping similar to the experiment-wide
    event data.
  • What counts is the efficiency and reliability of
    the analysis
  • The most sophisticated histogramming package is
    useless if you are unable to determine the
    luminosity corresponding to a given histogram!

30
Objectivity Features CMS (really) uses
  • Persistent objects are real C++ (and Java)
    objects
  • I/O cache (memory) management
  • no explicit read and write
  • no need to delete previous event
  • Smart-pointers (automatic id to pointer
    conversion)
  • Efficient containers by value (VArray)
  • Full direct navigation in the complete federation
  • Flexible object physical-clustering
  • Object Naming
  • as top level entry point (at collection level)
  • as rapid prototyping tool

31
Additional ODBMS (Objy) Advantages
  • Novel access methods
  • A collection of electrons with no reference to
    events
  • Direct reference from event-objects to condition
    database
  • Direct reference to event-data from user-data
  • Flexible run-time clustering of
    heterogeneous-type objects
  • cluster together all tracks or all objects
    belonging to the same event
  • Real DB management of reconstructed objects
  • add or modify in place and on demand parts of an
    event

32
CMS Experience (Pro)
  • Designing and implementing persistent classes is
    no harder than doing it for native C++ classes.
  • Easy and transparent distinction between logical
    associations and physical clustering.
  • Fully transparent I/O with performance
    essentially limited by the disk speed (random
    access).
  • File size overhead (~3% for realistic CMS object
    sizes) no larger than for other products such
    as ZEBRA or BOS.
  • Objectivity/DB (compared to other products we are
    used to) is robust, stable and well documented.
    It also provides many additional useful
    features.
  • All our tests show that Objectivity/DB can
    satisfy CMS requirements in terms of performance,
    scalability and flexibility

33
CMS Experience (Cons)
  • Objectivity (and the compilers it supports) does
    not implement the latest C++ features
    (changing: fast convergence toward the ANSI
    standard)
  • There are additional configuration elements to
    care about: DDL files, schema-definition
    databases, database catalogs
  • Organized software development: rapid prototyping
    is not impossible, but its integration in a
    product should be done with care
  • Performance degradations often wait for you
    around the corner
  • monitoring of running applications is essential;
    off-the-shelf solutions often exist (BaBar,
    Compass)
  • Objectivity/DB is a bare product
  • integration into a framework is our
    responsibility
  • Objectivity is slow to apply OUR changes to their
    product
  • Is this a real con? Do you really want a product
    whose kernel is changed at each user request?

34
CMS Experience (missing features)
  • Scalability: 64K files are not enough (Objy is
    working on it)
  • containers are the natural Objectivity units,
    yet there are still tasks for which the OS (and
    files) is preferred:
  • bulk data transfer (to mass-storage, among
    sites)
  • access control, space allocation to users, etc.
  • Efficient and secure AMS (OK in 5.2!!!)
  • with MSS and WAN support
  • Activator/de-activator (part of the ODMG
    standard)
  • to implement transient parts of persistent
    objects
  • Support for private user classes and user data
    (w.r.t. experiment-wide ones)
  • many custom solutions based on multi-federations
  • Active schema
  • User Application Layer
  • like a rapid prototyping environment

35
ODBMS Summary
  • A DBMS is required to manage the large data set
    of CMS
  • (including user data)
  • An ODBMS is the natural choice if OO is used in
    all SW
  • There is no reason NOT to store event-data in the
    DB
  • as a Blob or as a real object system
  • Once an ODBMS is deployed to manage the
    experiment data, it will be very natural to use
    it to manage any kind of data related to detector
    studies and physics analysis
  • Objectivity/DB is a robust and stable kernel,
    ideal as the base on which to build a custom
    storage framework

36
Conclusions
  • Object Oriented technologies have proven to be
    required to develop flexible software
    architectures
  • C++ is the natural choice for a large project
  • today Java can be a realistic alternative
  • OO and C++ have been easily adopted by all
    detector developers
  • (see the C. Grandi, T. Todorov and A. Vitelli
    CHEP talks)
  • ODBMS is a robust technology, ideal for building
    a large coherent object store
  • The CMS MC production 2000 will exercise all of
    this at a real scale

37
Object Model
38
Reconstructed Objects
Reconstructed Objects produced by a given
algorithm are managed by a Reconstructor.
A Reconstructed Object (Track) is split into
several independent persistent objects to allow
their clustering according to their access
requirements (physics analysis, reconstruction,
detailed detector studies, etc.). The top-level
object acts as a proxy. Intermediate
reconstructed objects (Hits) are transient and
are cached by value into the final objects.
[Diagram: the RecEvent points to an S-Track
Reconstructor, which manages S-Tracks split into
Track SecInfo and Track Constituents.]
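A minimal sketch of the proxy idea (invented class names, not the CMS persistent classes): the top-level object is cheap to load, and the detailed constituents are fetched and cached only on first use.

#include <iostream>
#include <optional>
#include <vector>

// Detailed constituents, persisted separately by access pattern.
struct TrackConstituents {
  std::vector<double> hits;  // rebuilt and cached by value in the final object
};

// The top-level Track acts as a proxy: loading it is cheap, and the
// constituents are fetched from the store only when requested.
class Track {
public:
  const TrackConstituents& constituents() {
    if (!cache_) {
      std::cout << "loading constituents from the object store\n";
      cache_ = TrackConstituents{{0.1, 0.2}};  // stand-in for a DB read
    }
    return *cache_;
  }
private:
  std::optional<TrackConstituents> cache_;  // transient cache
};

int main() {
  Track t;            // only the proxy is in memory
  t.constituents();   // detailed data fetched on first use
}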