Transcript and Presenter's Notes

Title: Software Architecture and Data Model


1
Software Architecture and Data Model
Software framework, services and persistency in
high level trigger, reconstruction and analysis
  • Vincenzo Innocente
  • CERN/EP/CMC

2
CMS (offline) Software
[Dataflow diagram: quasi-online reconstruction, simulation (G3 and/or
G4), online monitoring, and environmental data from slow control all
store into a Persistent Object Store Manager (an Object Database
Management System); the Event Filter Objectivity Formatter stores
reconstructed objects and calibrations; data quality, calibrations,
group analysis, and on-demand user analysis clients request parts of
events from the same store.]
3
Requirements (from the CTP)
  • Multiple Environments
  • Various software modules must be able to run in a
    variety of environments, from Level-3 triggering
    to individual analysis
  • Migration between environments
  • Physics modules should move easily from one
    environment to another (from individual analysis
    to Level-3 triggering)
  • Migration to new technologies
  • A change of technology should not affect the
    physics software modules

4
Requirements (from the CTP)
  • Dispersed code development
  • The software will be developed by
    organizationally and geographically dispersed
    groups of part-time non-professional programmers
  • Flexibility
  • Not all software requirements will be fully known
    in advance
  • Not only performance
  • Also modularity, flexibility, maintainability,
    quality assurance and documentation.

5
CMS Data Model R&D
  • 95-96 RD41 --- OO Detector Reconstruction
  • Detector model, Local hit cache, Pattern
    recognition
  • 95-97 RD45 --- OO Event Model (persistent)
  • Event structure, Raw data, Reconstructed objects
  • 95-97 RD45 --- Calibration Database
  • Time dependent data, Versioning, Experience with
    Objectivity
  • 12/96 CTP decision to use OO and ODBMS
  • 97- present GIOD
  • Many clients access over LAN and WAN
  • 97-98 Test-Beam (H2, X5)
  • OO DAQ, online filtering, ODB population
  • 99-00 ORCA production
  • MetaData, concurrent jobs, multi-threading,
    run-time dynamic loading
  • 2001 Milestone on ODBMS vendor choice

6
Use Cases (current functionality in ORCA)
  • Simulated Hits Formatting
  • Digitization of Piled-up Events
  • Test-Beam DAQ & Analysis
  • L1 Trigger Simulation
  • Track Reconstruction
  • Calorimeter Reconstruction
  • Global Reconstruction
  • Physics Analysis

7
Reconstruction Scenario
  • Reproduce Detector Status at the moment of the
    interaction
  • front-end electronics signals (digis)
  • calibrations
  • alignments
  • Perform local reconstruction as a continuation of
    the front-end data reduction, until objects
    detachable from the detectors are obtained
  • Use these objects to perform global
    reconstruction and physics analysis of the Event
  • Store & retrieve results of computing-intensive
    processes

8
Reconstruction Sources
9
Components
  • Reconstruction Algorithms
  • Event Objects
  • Physics Analysis modules
  • Other services (detector objects, environmental
    data, parameters, etc)
  • Legacy non-OO data (GEANT3)
  • The instances of these components need to be
    properly orchestrated to produce the results
    specified by the user

10
CARF: CMS Analysis Reconstruction Framework
[Layer diagram: the Application Framework hosts the physics modules
(Reconstruction Algorithms, Event Filter, Data Monitoring, Physics
Analysis) and rests on Calibration Objects, Event Objects, MetaData
Objects and a Utility Toolkit.]
11
Architecture structure
  • An application framework CARF (CMS Analysis
    Reconstruction Framework),
  • customisable for each of the computing
    environments
  • Physics software modules
  • with clearly defined interfaces that can be
    plugged into the framework
  • Persistency Service
  • integrated into the framework to provide a
    transparent interface
  • to physics modules
  • A service and utility Toolkit
  • that can be used by any of the physics modules
  • The framework (and the utility Toolkit)
    effectively shields the physics modules from the
    underlying technology without penalizing
    performance; a sketch of the plug-in idea follows
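
A minimal sketch of such a pluggable interface, assuming hypothetical
names (PhysicsModule, TrackCounter are illustrative, not actual CARF
classes):

// Hypothetical sketch of a framework plug-in interface, not actual CARF code.
#include <string>

class Event {};  // stand-in for the framework's event interface

// A physics module implements this interface; the framework decides when
// and in which environment (L3 trigger, production, user analysis) to run it.
class PhysicsModule {
public:
  virtual ~PhysicsModule() {}
  virtual std::string name() const = 0;
  virtual void beginJob() = 0;             // acquire services, book histograms
  virtual void analyze(const Event&) = 0;  // per-event work; no direct I/O here
  virtual void endJob() = 0;               // summaries, release resources
};

// Example module: the same code could run in the event filter or in an
// individual analysis, because all I/O stays inside the framework.
class TrackCounter : public PhysicsModule {
public:
  TrackCounter() : n_(0) {}
  std::string name() const { return "TrackCounter"; }
  void beginJob() { n_ = 0; }
  void analyze(const Event&) { ++n_; }
  void endJob() {}
private:
  long n_;
};

int main() {
  TrackCounter m;
  m.beginJob();
  Event ev;
  m.analyze(ev);  // in reality the framework drives this loop
  m.endJob();
  return 0;
}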

12
Persistency Services
  • Persistent Object Management is fully integrated
    in CARF using an ODBMS
  • CARF manages
  • multi-threaded transactions
  • creation of databases and containers
  • meta data and event collections
  • physical clustering of event objects
  • persistent event structure and its relations with
    transient objects
  • Use of Database is transparent to detector
    developers
  • users access persistent objects through C++
    pointers (see the sketch below)
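
As an illustration of that transparency (the classes are hypothetical
and the "database" is faked with a raw pointer; a real ODMG-style
binding would resolve a persistent object id and fault the object in):

// Illustrative sketch only: an ODMG-style smart reference that converts a
// persistent object id into a C++ pointer on dereference. Track/RecEvent
// are hypothetical classes, not the actual CMS schema.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Track    { double pt; };
struct RecEvent { std::vector<Track> tracks; };

template<class T>
class d_Ref {
public:
  explicit d_Ref(T* obj = 0) : obj_(obj) {}
  // In a real binding this is where the id-to-pointer conversion (and any
  // disk read) happens, invisibly to the physics code.
  T* operator->() const { return obj_; }
private:
  T* obj_;  // toy stand-in for a persistent object id
};

// Physics code sees only ordinary C++ objects and pointers: no explicit
// read or write, and no need to delete the previous event.
void printTracks(const d_Ref<RecEvent>& ev) {
  for (std::size_t i = 0; i != ev->tracks.size(); ++i)
    std::printf("track %u: pt = %.2f\n", unsigned(i), ev->tracks[i].pt);
}

int main() {
  RecEvent e;
  Track t; t.pt = 12.3;
  e.tracks.push_back(t);
  d_Ref<RecEvent> ref(&e);
  printTracks(ref);
  return 0;
}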

13
Software Architecture and Data Model: Data Model
  • Vincenzo Innocente
  • CERN/EP/CMC

14
HEP Data
  • Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
  • Event-Collection Meta-Data
  • (luminosity, selection criteria, ...)
  • Event Data, User Data

Navigation is essential for an effective physics
analysis. Complexity requires coherent access
mechanisms.
15
Do I need a DBMS? (a self-assessment)
  • Do I encode meta-data (run number, version id) in
    file names?
  • How many files and logbooks should I consult to
    determine the luminosity corresponding to a
    histogram?
  • How easily can I determine whether two events have
    been reconstructed with the same version of a
    program and using the same calibrations?
  • How many lines of code should I write, and what
    fraction of the data should I read, to select all
    events with two μ's with pT > 11.5 GeV and
    |η| < 2.7 (see the sketch after this list)?
  • The same at generator level?
  • If the answers scare you, you need a DBMS!
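
With event objects in a database, that last selection reduces to a
predicate over a collection, and only the muon data need to be read. A
toy sketch (Muon and Event are illustrative classes, not the CMS
schema):

#include <cmath>
#include <cstddef>
#include <vector>

struct Muon  { double pt, eta; };
struct Event { std::vector<Muon> muons; };

// True if the event has at least two muons with pT > 11.5 GeV, |eta| < 2.7.
bool passes(const Event& ev) {
  int n = 0;
  for (std::size_t i = 0; i != ev.muons.size(); ++i)
    if (ev.muons[i].pt > 11.5 && std::fabs(ev.muons[i].eta) < 2.7)
      ++n;
  return n >= 2;
}

int main() {
  Event ev;
  Muon m1 = {15.0, 1.1};
  Muon m2 = {20.0, -0.5};
  ev.muons.push_back(m1);
  ev.muons.push_back(m2);
  return passes(ev) ? 0 : 1;  // this toy event is selected
}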

16
Can CMS do without a DBMS?
  • An experiment lasting 20 years cannot rely just
    on ASCII files and file systems for its
    production bookkeeping, condition database,
    etc.
  • Even today at LEP, the management of all real and
    simulated data-sets (from raw-data to n-tuples)
    is a major enterprise
  • Multiple models used (DST, N-tuple, HEPDB,
    FATMAN, ASCII)
  • A DBMS is the modern answer to such a problem
    and, given the choice of OO technology for the
    CMS software, an ODBMS (or a DBMS with an OO
    interface) is the natural solution for a coherent
    and scalable approach.

17
A BLOB Model
[Diagram: database objects RecEvent and RawEvent each reference an
Event stored as a Blob.]
A Blob is a sequence of bytes; decoding it is a user responsibility.
Why should Blobs not be stored in the DBMS?
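
A minimal sketch of what a Blob amounts to in code (illustrative only):

#include <vector>

// The database sees only an opaque byte sequence: it can store, locate and
// replicate the blob, but it cannot navigate inside it, and only user code
// knows how to decode the bytes back into RawEvent/RecEvent objects.
struct Blob {
  std::vector<unsigned char> bytes;  // no schema, no associations
};

int main() {
  Blob b;
  b.bytes.push_back(0x2a);  // whatever the user's private encoding produces
  return 0;
}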
18
Raw Event
RawData are identified by the corresponding ReadOut.
RawData belonging to different detectors are
clustered into different containers; the granularity
will be adjusted to optimize I/O performance. An
index at RawEvent level is used to avoid accessing
all containers in search of a given RawData. A range
index at RawData level could be used for fast random
access in complex detectors.
[Diagram: a RawEvent points to its ReadOuts, and each ReadOut to its
RawData; the RawEvent-level index is implemented as an ordered vector
of pairs, as sketched below.]
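
A sketch of such an index, assuming illustrative types (ReadOutId,
ContainerRef are hypothetical): an ordered vector of pairs searched
with binary search, so only the matching container has to be opened:

#include <algorithm>
#include <utility>
#include <vector>

typedef unsigned int ReadOutId;
typedef int          ContainerRef;  // stand-in for a database container ref

typedef std::pair<ReadOutId, ContainerRef> IndexEntry;

// Entries are kept sorted by ReadOutId, so lookup is O(log n).
ContainerRef findContainer(const std::vector<IndexEntry>& index,
                           ReadOutId id) {
  std::vector<IndexEntry>::const_iterator it =
      std::lower_bound(index.begin(), index.end(), IndexEntry(id, 0));
  if (it != index.end() && it->first == id)
    return it->second;
  return -1;  // no RawData for this ReadOut
}

int main() {
  std::vector<IndexEntry> index;
  index.push_back(IndexEntry(3, 0));
  index.push_back(IndexEntry(7, 1));  // already ordered by ReadOutId
  return findContainer(index, 7) == 1 ? 0 : 1;
}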
19
CMS Reconstructed Objects
Reconstructed Objects produced by a given algorithm
are managed by a Reconstructor. A Reconstructed
Object (Track) is split into several independent
persistent objects to allow their clustering
according to their access patterns (physics
analysis, reconstruction, detailed detector studies,
etc.). The top-level object acts as a proxy.
Intermediate reconstructed objects (RHits) are
cached by value into the final objects.
[Diagram: the RecEvent references an S-Track Reconstructor; each
S-Track's pieces (Track SecInfo, Track Constituents, a vector of
RHits) are clustered into esd, rec and aod layers.]
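
A rough sketch of the split (names are hypothetical, loosely following
the esd/rec/aod labels above): the top-level track is a cheap proxy
with references to the independently clustered pieces and a by-value
cache of its hits:

#include <vector>

struct RHit { float x, y, z; };                     // intermediate rec object

struct TrackKinematics { double pt, eta, phi; };    // "aod"-like analysis data
struct TrackSecInfo    { double chi2; int ndof; };  // "rec"-like detailed data

template<class T>
struct Ref {                   // toy stand-in for a persistent reference
  T* obj;
  T* operator->() const { return obj; }
};

// The top-level object acts as a proxy: cheap to read for physics analysis,
// with the detailed pieces fetched only when their references are followed.
struct Track {
  TrackKinematics   kine;      // stored with the track itself
  Ref<TrackSecInfo> secInfo;   // clustered separately, by access pattern
  std::vector<RHit> hits;      // RHits cached by value into the final object
};

int main() {
  TrackSecInfo sec = {1.2, 10};
  Track t;
  t.kine.pt = 25.0; t.kine.eta = 0.3; t.kine.phi = 1.0;
  t.secInfo.obj = &sec;        // in reality: a persistent object id
  return t.secInfo->ndof;      // detailed info read only on demand
}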
20
Physical clustering
21
User Data
  • Histograms and N-tuples are user event-data
    and, for any serious use, require a level of
    management and book-keeping similar to the
    experiment-wide event data.
  • The same tools can be used with the advantage of
    keeping the interface and the user environment
    consistent.
  • What counts is the efficiency and reliability of
    the analysis
  • The most sophisticated histogramming package is
    useless if you are unable to determine the
    luminosity corresponding to a given histogram!

22
Objectivity
  • CMS adopted the object paradigm in the CTP
  • At the same time, in close collaboration with
    RD45, an evaluation of various object storage
    solutions was undertaken; Objectivity/DB was
    chosen as the baseline product for further
    evaluation, tests and prototypes, in particular
    for CMS data-related milestones.
  • Objectivity/DB provides
  • a scalable architecture in the PB range
  • full multi-platform support
  • data distribution and MSS interface through a
    customizable slim data server (AMS)
  • a very efficient C++ binding, close to the ODMG
    standard, with minimal proprietary parsing

23
Objectivity Features CMS (really) uses
  • Persistent objects are real C++ (and Java)
    objects
  • coherent access to any kind of object
  • I/O cache (memory) management
  • no explicit read and write
  • no need to delete previous event
  • Smart-pointers (automatic id to pointer
    conversion)
  • Efficient containers by value (VArray)
  • Full direct navigation in the complete federation
  • from MetaData to Event-Data
  • from Event-Data back to Meta-Data
  • Flexible object physical-clustering
  • Object Naming
  • as a top-level entry point (at collection level)
  • as a rapid prototyping tool (see the sketch below)
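
For illustration, naming as an entry point might be used along these
lines (NameScope is a toy stand-in for the federation's name scope;
lookup_object echoes the ODMG-style call):

#include <map>
#include <string>

class PersistentObject { public: virtual ~PersistentObject() {} };

class NameScope {
public:
  void bind(const std::string& name, PersistentObject* obj) {
    names_[name] = obj;
  }
  // From a name the user reaches a top-level object (e.g. an event
  // collection) and from there navigates to everything else.
  PersistentObject* lookup_object(const std::string& name) const {
    std::map<std::string, PersistentObject*>::const_iterator it =
        names_.find(name);
    return it == names_.end() ? 0 : it->second;
  }
private:
  std::map<std::string, PersistentObject*> names_;
};

int main() {
  NameScope scope;
  PersistentObject top;
  scope.bind("MyEventCollection", &top);  // name a top-level entry point
  return scope.lookup_object("MyEventCollection") == &top ? 0 : 1;
}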

24
More ODBMS (Objy) Advantages
  • Novel access methods
  • a collection of electrons with no reference to
    events (sketched after this list)
  • Direct reference from event-objects to condition
    database
  • Direct reference to event-data from user-data
  • Flexible run-time clustering of
    heterogeneous-type objects
  • cluster together all tracks or all objects
    belonging to the same event
  • Real DB management of reconstructed objects
  • add or modify in place and on demand parts of an
    event
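
A toy sketch of the first item (illustrative types): electrons stored
in their own collection, each keeping a persistent back-reference to
its event, so a scan touches only electron data:

#include <vector>

struct Event;                        // the full event, stored elsewhere

template<class T>
struct Ref { T* obj; };              // toy stand-in for a persistent reference

struct Electron {
  double pt;
  Ref<Event> event;                  // back-navigation, used only if needed
};

// The analysis walks the electron collection directly; events (and their
// raw data) are touched only for the few electrons that pass the cuts.
typedef std::vector<Electron> ElectronCollection;

int main() {
  ElectronCollection electrons;
  Electron e = {31.4, {0}};          // event reference left null in this toy
  electrons.push_back(e);
  return 0;
}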

25
CMS Experience
  • Designing and implementing persistent classes is
    no harder than doing it for native C++ classes.
  • Easy and transparent distinction between logical
    associations and physical clustering.
  • Fully transparent I/O in a distributed
    environment, with performance essentially
    limited by disk and network speed (random
    access).
  • File-size overhead (5% for realistic CMS object
    sizes) no larger than for other products such
    as ZEBRA, BOS, etc.
  • Objectivity/DB (compared to other products we are
    used to) is robust, stable and well documented.
    It also provides many additional useful
    features.
  • All our tests show that Objectivity/DB can
    satisfy CMS requirements in terms of performance,
    scalability and flexibility

26
CMS Experience
  • There are additional configuration elements to
    care about: DDL files, schema-definition
    databases, database catalogs
  • organized software development: rapid prototyping
    is still possible, but its integration in a
    product should be done with care
  • Now fully integrated in CMS cvs and SCRAM
    environments
  • The system requires tuning to avoid performance
    degradation
  • monitoring of running applications is essential,
    off-the-shelf solutions often exist (BaBar,
    Compass)
  • CMS HLT production is now at the leading edge of
    monitoring and tuning
  • Objectivity/DB is a bare product. It does not
    impose a framework
  • integration into a framework (CARF) is our
    responsibility
  • Objectivity is slow to apply OUR changes to their
    product
  • Is this a real problem? Do we really want a
    product whose kernel is changed at each user
    request?

27
CMS Experience (missing features 99)
  • Scalability: 64K files are not enough (fix
    scheduled for Dec 2000)
  • containers are the natural Objectivity units, yet
    there are still things for which the OS (and
    files) is preferred
  • bulk data transfer (to mass-storage, among
    sites)
  • access control, space allocation to users, etc.
  • Efficient and secure data-server (AMS ok in
    5.2!!!)
  • with MSS and WAN support
  • Support for private user classes and user data
    (w.r.t. experiment-wide ones)
  • many custom solutions based on multi-federation
  • Active schema
  • User Application Layer
  • like a rapid prototyping environment

28
Objy-HEP: Building a Partnership
  • Objectivity recognizes that HEP requirements
    anticipate the future requirements of other clients
  • the next versions will include solutions to
    almost all our improvement requests
  • The New AMS has been essentially developed at
    SLAC
  • CERN has built version 5.2.1 for Linux RH6.1
  • CERN will help in building a full port to Solaris
    CC 5
  • CERN will prototype a new lockserver monitor
  • It is essential to continue to develop this
    partnership and to increase the trust of both
    partners in each other.

29
Alternatives: ODBMS
  • Versant is a viable commercial alternative to
    Objectivity
  • do we have time to build an effective partnership
    (e.g. an MSS interface)?
  • Espresso (by IT/DB)
  • IT/DB would need to produce a fully fledged
    ODBMS within a couple of years, once the
    proof-of-concept prototype is ready
  • CMS will test Espresso in the context of CARF
    this summer
  • Migrate CARF from Objectivity to another ODBMS
  • We expect that it would take about one year
  • Such a transition would not affect the basic
    principles of the CMS software architecture and
    Data Model
  • It would involve only the core CARF development
    team
  • It would not disrupt production and physics
    analysis

30
Alternatives: ORDBMS
  • ORDBMSs (relational DBs with an OO interface) are
    appearing on the market
  • The first products look targeted at those who
    already have a relational system and wish to make
    a transition to OO
  • More realistic Object Oriented products could
    appear in the near future
  • Evaluation of their usage in HEP will start soon.
  • No experiment is using (or planning to use) them
  • IT/DB is in contact with Oracle and is planning
    to evaluate their OO product.
  • It is still too early to assess the impact of
    ORDBMSs on the CMS Data Model and on the
    migration effort

31
Fallback Solution (less functionality): Hybrid Models
  • (R)DBMS for MetaData, Calibration, etc
  • Object-Stream files for event data
  • Ad-hoc networked dataserver and MSS interface
  • Less flexible
  • Rigid split between DBMS and event data
  • One way navigation from DBMS to event data
  • More complex
  • Two different I/O systems
  • More effort to learn and maintain
  • This approach will be used by several experiments
    at BNL and Fermilab
  • (RDBMS not directly accessible from user
    applications)
  • CMS and IT/DB are following these experiences
    closely.
  • We believe that this solution could seriously
    compromise our ability to perform our physics
    program competitively

32
ODBMS Summary
  • A DBMS is required to manage the large data set
    of CMS
  • (including user data)
  • An ODBMS provides a coherent and scalable
    solution for managing data in an OO software
    environment
  • Once an ODBMS is deployed to manage the
    experiment data, it will be very natural to use
    it to manage any kind of data related to detector
    studies and physics analysis
  • Objectivity/DB is a robust and stable kernel,
    ideal as the base on which to build a custom
    storage framework
  • Objectivity is starting to respond to our
    particular requirements