1
CMS Software Architecture
  • An experience in OO C++
  • Vincenzo Innocente
  • CERN

2
CMS (offline) Software
[Data-flow diagram: the Event Filter Objectivity
Formatter, quasi-online reconstruction, online
monitoring, slow control (environmental data),
data quality, calibrations, group analysis,
simulation (G3 and/or G4) and on-demand user
analysis all store objects into, and request
parts of events from, the Persistent Object Store
Manager (Object Database Management System).]
3
Requirements (from the CTP, Dec. 96)
  • Multiple Environments
  • Various software modules must be able to run in a
    variety of environments from level 3 triggering,
    to individual analysis
  • Migration between environments
  • Physics modules should move easily from one
    environment to another (from individual analysis
    to level 3 triggering)
  • Migration to new technologies
  • Should not affect physics software module

4
Requirements (from the CTP)
  • Dispersed code development
  • The software will be developed by
    organizationally and geographically dispersed
    groups of part-time non-professional programmers
  • Flexibility
  • Not all software requirements will be fully known
    in advance
  • Not only performance
  • Also modularity, flexibility, maintainability,
    quality assurance and documentation.

5
Technologies
  • Do not always jump on next year's buzzword
  • Do not limit yourselves to technologies
    standardized before today's graduate students
    were born
  • There is no Silver Bullet
  • Any single technical issue can be solved with a
    few thousand lines of code by any of us.
  • This is not the point
  • What is needed is a coherent Software
    Architecture for an experiment which will last
    longer than a decade

6
C++
  • C++ is a very advanced language which supports
  • C-style coding (for algorithms it is OK)
  • but it is the source of all evils (such as
    memory leaks)
  • Object Oriented programming
  • Data Hiding
  • Encapsulation
  • Polymorphism
  • Multiple and Virtual inheritance
  • Generic Programming
  • Templates
  • Parametric Polymorphism
  • All this makes it complex but powerful!
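As a concrete illustration of the features listed above (a minimal sketch with invented Track and Reconstructor classes, not CMS code), encapsulation, virtual polymorphism and a function template combine like this in modern C++:

#include <iostream>
#include <memory>
#include <vector>

// Encapsulation and data hiding: state is private, exposed via methods.
class Track {
public:
  explicit Track(double pt) : pt_(pt) {}
  double pt() const { return pt_; }
private:
  double pt_;  // hidden from clients
};

// Polymorphism: an abstract interface with virtual dispatch.
class Reconstructor {
public:
  virtual ~Reconstructor() = default;
  virtual Track reconstruct() const = 0;
};

class TrackerReconstructor : public Reconstructor {
public:
  Track reconstruct() const override { return Track(11.5); }
};

// Generic programming: a function template (parametric polymorphism)
// works with any container of Reconstructor pointers.
template <typename Container>
void runAll(const Container& recos) {
  for (const auto& r : recos)
    std::cout << "pt = " << r->reconstruct().pt() << '\n';
}

int main() {
  std::vector<std::unique_ptr<Reconstructor>> recos;
  recos.push_back(std::make_unique<TrackerReconstructor>());
  runAll(recos);  // virtual dispatch inside a template
}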

7
Use Cases
  • Simulated Hits Formatting
  • Digitization of Piled-up Events
  • Test-Beam DAQ Analysis
  • L1 Trigger Simulation
  • Track Reconstruction
  • Calorimeter Reconstruction
  • Global Reconstruction
  • Physics Analysis

8
Track Reconstruction
Local measurements belong to a detector element
and are affected by the detector element's state
(calibrations, alignments).
Pattern recognition navigates in the detector to
associate local measurements into a track.
9
Global Reconstruction
  • Global reconstruction is performed in an absolute
    reference frame
  • 4-vector-like objects are built out of
    trajectories and localized energy deposits
  • A wide range of particle identification, jet,
    vertex, etc. algorithms can be applied to
    produce other 4-vector-like objects
  • Access to the original detector data may be
    required

10
Reconstruction Scenario
  • Reproduce Detector Status at the moment of the
    interaction
  • front-end electronics signals (digis)
  • calibrations
  • alignments
  • Perform local reconstruction as a continuation of
    the front-end data reduction until objects
    detachable from the detectors are obtained
  • Use these objects to perform global
    reconstruction and physics analysis of the Event
  • Store/Retrieve results of computing-intensive
    processes

11
Reconstruction Sources
12
Components
  • Reconstruction Algorithms
  • Event Objects
  • Physics Analysis modules
  • Other services (detector objects, environmental
    data, parameters, etc)
  • Legacy not-OO data (GEANT3)
  • The instances of these components must be
    properly orchestrated to produce the results
    specified by the user

13
CARF: CMS Analysis Reconstruction Framework
[Diagram: physics modules (Reconstruction
Algorithms, Event Filter, Data Monitoring,
Physics Analysis) plug into the Application
Framework, which builds on Calibration Objects,
Event Objects, Visualization Objects and a
Utility Toolkit.]
14
Architecture structure
  • An application framework CARF (CMS Analysis
    Reconstruction Framework),
  • customisable for each of the computing
    environments
  • Physics software modules
  • with clearly defined interfaces that can be
    plugged into the framework
  • A service and utility Toolkit
  • that can be used by any of the physics modules
  • Nothing terribly new, but...
  • Traditional architectures cannot cope with
    LHC-collaboration complexity

15
Problems with traditional architectures
  • A traditional framework schedules a priori the
    sequence of operations required to bring a given
    task to completion
  • Major management problems are produced by changes
    in the dependencies among the various operations
  • Example 1
  • Reconstruction of track type T1 requires only
    tracker hits
  • Reconstruction of track type T2 uses calorimetric
    clusters as seeds
  • If a user switches from T1 to T2, the framework
    must now determine that calorimeter
    reconstruction should run first
  • Example 2
  • The global initialization sequence must be
    changed because, for one detector, some condition
    changes more often than foreseen

16
Framework Basic Dynamics
  • Avoid a monolithic structure
  • A collection of loosely coupled mechanisms which
    implement, in the abstract, the tasks of HEP
    reconstruction and analysis software
  • Implicit Invocation Architecture
  • No central ordering of actions, no explicit
    control of data flow, only implicit dependencies
  • External dependencies managed through an Event
    Driven Notification to subscribers
  • Internal dependencies through an Action on Demand
    mechanism

17
Event Driven Notification
Observers are instantiated by static factories
residing in shared libraries. These are loaded
on demand during application configuration.
Detector elements observe physics events;
factories observe user requests.
[Diagram: a Dispatcher notifies observers
(Obs1-Obs4), which act as clients or providers.]
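A minimal sketch of such an event-driven dispatcher (the names Dispatcher, subscribe and dispatch are illustrative assumptions, not the actual CARF API):

#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Observers subscribe to named events; dispatching an event notifies
// them all. No central ordering of actions, no explicit data flow.
class Dispatcher {
public:
  using Callback = std::function<void()>;
  void subscribe(const std::string& event, Callback cb) {
    observers_[event].push_back(std::move(cb));
  }
  void dispatch(const std::string& event) {
    for (auto& cb : observers_[event]) cb();
  }
private:
  std::unordered_map<std::string, std::vector<Callback>> observers_;
};

int main() {
  Dispatcher d;
  // A detector element observes physics events...
  d.subscribe("newEvent", [] { std::cout << "detector element: update\n"; });
  // ...while a factory observes user requests.
  d.subscribe("userRequest", [] { std::cout << "factory: build observer\n"; });
  d.dispatch("newEvent");
}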
18
Action on Demand
Compare the results of two different track
reconstruction algorithms.
[Diagram: the Event provides Hits to the Detector
Elements, which produce RecHits; Rec T1 builds T1
tracks from RecHits, Rec CaloCl builds CaloCl
clusters, Rec T2 uses RecHits and CaloCl to build
T2 tracks, and Analysis compares T1 and T2.]
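A minimal sketch of the action-on-demand idea, assuming invented provider classes (not the CARF implementation): asking for T2 tracks implicitly triggers calorimeter-cluster reconstruction first, with no a-priori schedule.

#include <iostream>
#include <optional>
#include <vector>

// Each provider computes its product only when first asked for it,
// pulling in its own dependencies as needed.
class CaloClusterProvider {
public:
  const std::vector<double>& get() {
    if (!clusters_) {
      std::cout << "reconstructing calorimeter clusters\n";
      clusters_ = std::vector<double>{1.0, 2.0};  // placeholder work
    }
    return *clusters_;
  }
private:
  std::optional<std::vector<double>> clusters_;  // empty until demanded
};

class TrackT2Provider {
public:
  explicit TrackT2Provider(CaloClusterProvider& calo) : calo_(calo) {}
  void get() {
    // T2 uses calorimetric clusters as seeds, so asking for T2 tracks
    // triggers cluster reconstruction first, automatically.
    const auto& seeds = calo_.get();
    std::cout << "reconstructing T2 tracks from " << seeds.size()
              << " seeds\n";
  }
private:
  CaloClusterProvider& calo_;
};

int main() {
  CaloClusterProvider calo;
  TrackT2Provider t2(calo);
  t2.get();  // dependencies resolve themselves on demand
}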
19
HEP Data
  • Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
  • Event-Collection Meta-Data
  • (luminosity, selection criteria, …)
  • Event Data, User Data

20
Do I need a DBMS? (a self-assessment)
  • Do I encode meta-data (run number, version id) in
    file names?
  • How many files and logbooks should I consult to
    determine the luminosity corresponding to a
    histogram?
  • How easily can I determine whether two events
    have been reconstructed with the same version of
    a program and using the same calibrations?
  • How many lines of code should I write, and what
    fraction of the data should I read, to select all
    events with two μ's with pT > 11.5 GeV and
    η < 2.7?
  • The same at generator level?
  • If the answers scare you, you need a DBMS!

21
A major challenge for LHC: the scale
  • Event output rate: 100 events/sec
    (10^9 events/year)
  • Data written to tape: 100 MBytes/sec (1 PB/yr)
  • Processing capacity: > 10 TIPS (~10^13 instr./s)
  • Typical networks: hundreds of Mbits/second
  • Lifetime of experiment: 2-3 decades
  • Users: ~1700 physicists
  • Software developers: ~100
  • ~100 Petabytes total for the LHC

22
Can CMS do without a DBMS?
  • An experiment lasting 20 years cannot rely just
    on ASCII files and file systems for its
    production bookkeeping, condition database,
    etc.
  • Even today at LEP, the management of all real and
    simulated data-sets (from raw-data to n-tuples)
    is a major enterprise.
  • A DBMS is the modern answer to such a problem
    and, given the choice of OO technology for the
    CMS software, an ODBMS (or a DBMS with an OO
    interface) is the natural solution.

23
A BLOB Model
[Diagram: the Event is stored as database objects,
with RecEvent and RawEvent Blobs.]
A Blob is a sequence of bytes; decoding it is the
user's responsibility.
Why should Blobs not be stored in the DBMS?
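A minimal sketch of what "decoding is a user responsibility" means (the Header layout is a hypothetical example; the DBMS sees only bytes):

#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical user-defined layout; the DBMS knows nothing about it.
struct Header {
  std::uint32_t runNumber;
  std::uint32_t nHits;
};

// Decoding the raw byte sequence is entirely up to the user.
Header decode(const std::vector<std::uint8_t>& blob) {
  assert(blob.size() >= sizeof(Header));
  Header h{};
  std::memcpy(&h, blob.data(), sizeof(h));
  return h;
}

int main() {
  std::vector<std::uint8_t> blob(sizeof(Header), 0);  // stand-in for a DB read
  return static_cast<int>(decode(blob).nHits);
}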
24
Raw Event
RawData are identified by the corresponding
ReadOut. RawData belonging to different detectors
are clustered into different containers. The
granularity will be adjusted to optimize I/O
performance. An index at RawEvent level is used
to avoid accessing all containers when searching
for a given RawData. A range index at RawData
level could be used for fast random access in
complex detectors.
[Diagram: a RawEvent points to ReadOut objects,
each associated with its RawData.]
Index implemented as an ordered vector of pairs
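A minimal sketch of such an index, assuming invented ReadOutId and ContainerId types: keeping the pairs sorted by readout id lets a binary search locate the right container without touching the others.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

using ReadOutId = std::uint32_t;    // assumed id types
using ContainerId = std::uint32_t;
using RawIndex = std::vector<std::pair<ReadOutId, ContainerId>>;

// Binary search in the ordered vector: only the matching container
// needs to be opened, not all of them.
ContainerId findContainer(const RawIndex& index, ReadOutId id) {
  auto it = std::lower_bound(
      index.begin(), index.end(), id,
      [](const std::pair<ReadOutId, ContainerId>& e, ReadOutId key) {
        return e.first < key;
      });
  return (it != index.end() && it->first == id) ? it->second : 0;  // 0 = not found
}

int main() {
  RawIndex index = {{10, 1}, {42, 2}, {99, 3}};  // kept sorted by readout id
  std::cout << "readout 42 -> container " << findContainer(index, 42) << '\n';
}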
27
Physical clustering
28
Can every object have its own persistency?
  • Data size
  • Data complexity
  • Self-Description: at which granularity?
  • Meta-Data vs Data
  • logical vs physical organization
  • Flexibility vs Efficiency
  • Interface with standard tools (like GUIs)
  • Fast prototyping vs formal/controlled design
  • User knowledge and training

29
Is an ODBMS an overkill for Histograms?
  • Maybe, if histograms are your sole I/O.
  • (I use my Sun Ultra 5 to read mail with pine,
    even if a line-mode terminal would be more than
    adequate)
  • N-tuples are user event-data and, for any
    serious use, require a level of management and
    book-keeping similar to the experiment-wide
    event data.
  • What counts is the efficiency and reliability of
    the analysis
  • The most sophisticated histogramming package is
    useless if you are unable to determine the
    luminosity corresponding to a given histogram!

30
Objectivity Features CMS (really) uses
  • Persistent objects are real C++ (and Java)
    objects
  • I/O cache (memory) management
  • no explicit read and write
  • no need to delete previous event
  • Smart-pointers (automatic id to pointer
    conversion)
  • Efficient containers by value (VArray)
  • Full direct navigation in the complete federation
  • Flexible object physical-clustering
  • Object Naming
  • as top level entry point (at collection level)
  • as rapid prototyping tool

31
Additional ODBMS (Objy) Advantages
  • Novel access methods
  • A collection of electrons with no reference to
    events
  • Direct reference from event-objects to condition
    database
  • Direct reference to event-data from user-data
  • Flexible run-time clustering of
    heterogeneous-type objects
  • cluster together all tracks or all objects
    belonging to the same event
  • Real DB management of reconstructed objects
  • add or modify in place and on demand parts of an
    event

32
CMS Experience (Pro)
  • Designing and implementing persistent classes is
    no harder than doing it for native C++ classes.
  • Easy and transparent distinction between logical
    associations and physical clustering.
  • Fully transparent I/O with performance
    essentially limited by the disk speed (random
    access).
  • File size overhead (~3% for realistic CMS object
    sizes) no larger than for other products such
    as ZEBRA or BOS.
  • Objectivity/DB (compared to other products we are
    used to) is robust, stable and well documented.
    It also provides many additional useful
    features.
  • All our tests show that Objectivity/DB can
    satisfy CMS requirements in terms of performance,
    scalability and flexibility

33
CMS Experience (Cons)
  • Objectivity (and the compilers it supports) does
    not implement the latest C++ features
    (changing: fast convergence toward the ANSI
    standard)
  • There are additional configuration elements to
    care about: DDL files, schema-definition
    databases, database catalogs
  • Organized software development: rapid prototyping
    is not impossible, but its integration in a
    product should be done with care
  • Performance degradations often wait for you
    around the corner
  • monitoring of running applications is essential;
    off-the-shelf solutions often exist (BaBar,
    Compass)
  • Objectivity/DB is a bare product
  • integration into a framework is our
    responsibility
  • Objectivity is slow to apply OUR changes to their
    product
  • Is this a real con? Do you really want a product
    whose kernel is changed at each user request?

34
CMS Experience (missing features)
  • Scalability: 64K files are not enough (Objy is
    working on it)
  • containers are the natural Objectivity units,
    yet there are still tasks for which the OS (and
    files) is preferred:
  • bulk data transfer (to mass-storage, among
    sites)
  • access control, space allocation to users, etc.
  • Efficient and secure AMS (OK in 5.2!!!)
  • with MSS and WAN support
  • Activator/de-activator (part of the ODMG
    standard)
  • to implement transient parts of persistent
    objects
  • Support for private user classes and user data
    (w.r.t. experiment-wide ones)
  • many custom solutions based on multi-federations
  • Active schema
  • User Application Layer
  • like a rapid prototyping environment

35
ODBMS Summary
  • A DBMS is required to manage the large data set
    of CMS
  • (including user data)
  • An ODBMS is the natural choice if OO is used in
    all SW
  • There is no reason NOT to store event-data in the
    DB
  • as a Blob or as a real object system
  • Once an ODBMS is deployed to manage the
    experiment data, it will be very natural to use
    it to manage any kind of data related to detector
    studies and physics analysis
  • Objectivity/DB is a robust and stable kernel,
    ideal as the base on which to build a custom
    storage framework

36
Conclusions
  • Object Oriented technologies have proven to be
    required to develop flexible software
    architectures
  • C++ is the natural choice for a large project
  • today Java can be a realistic alternative
  • OO and C++ have been easily adopted by all
    detector developers
  • (see the C. Grandi, T. Todorov and A. Vitelli
    CHEP talks)
  • ODBMS is a robust technology, ideal for building
    a large coherent object store
  • The CMS MC production 2000 will exercise all of
    this at a real scale

37
Object Model
38
Reconstructed Objects
Reconstructed Objects produced by a given
algorithm are managed by a Reconstructor.
A Reconstructed Object (Track) is split into
several independent persistent objects to allow
their clustering according to their access
requirements (physics analysis, reconstruction,
detailed detector studies, etc.). The top-level
object acts as a proxy. Intermediate
reconstructed objects (Hits) are transient and
are cached by value into the final objects.
[Diagram: the RecEvent points to an S-Track
Reconstructor, which manages S-Tracks split into
Track SecInfo and Track Constituents.]
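A minimal sketch of the proxy idea (invented class names, not the CMS persistent classes): the top-level object is cheap to load, and the detailed constituents are fetched and cached only on first use.

#include <iostream>
#include <optional>
#include <vector>

// Detailed constituents, persisted separately by access pattern.
struct TrackConstituents {
  std::vector<double> hits;  // rebuilt and cached by value in the final object
};

// The top-level Track acts as a proxy: loading it is cheap, and the
// constituents are fetched from the store only when requested.
class Track {
public:
  const TrackConstituents& constituents() {
    if (!cache_) {
      std::cout << "loading constituents from the object store\n";
      cache_ = TrackConstituents{{0.1, 0.2}};  // stand-in for a DB read
    }
    return *cache_;
  }
private:
  std::optional<TrackConstituents> cache_;  // transient cache
};

int main() {
  Track t;            // only the proxy is in memory
  t.constituents();   // detailed data fetched on first use
}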