CHEP 2000 - PowerPoint PPT Presentation

Description:

CP violation study. other interesting fields: kaon form factors. kaon rare decays ... passed directly to the RDBMS. select run_nr from run_logger where status = 'OK' ...


Transcript and Presenter's Notes

1
CHEP 2000
Data Handling in KLOE
I. Sfiligoi, INFN LNF, Frascati, Italy
2
The KLOE experiment
KS → π+π−, KL → π+π− (CP not conserved)
  • at DAΦNE φ-factory
  • main goal
  • CP violation study
  • other interesting fields
  • kaon form factors
  • kaon rare decays
  • radiative φ decays

KS → π+π−, KL → 3π0 → 6γ
3
KLOE Requirements
  • Data acquisition (at full DAFNE luminosity)
  • 10^11 events per year acquired
  • 50 MB/s sustained throughput
  • Computing power
  • ALL the events need to be reconstructed
  • Storage requirements
  • one petabyte of raw and reconstructed events
  • hundreds of megabytes of related data
    (configurations, slow control data,
    calibration parameters, etc.)
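
To put the figures above in perspective, here is a rough back-of-the-envelope calculation. It assumes continuous running over a full calendar year and 1 MB = 10^6 bytes; neither assumption comes from the slides. Real running time is shorter than a calendar year, so the peak event rate is higher and the yearly volume lower than these figures.

```python
# Back-of-the-envelope numbers for the stated requirements.
# Assumption (not from the slides): data taking runs non-stop for a full year.
SECONDS_PER_YEAR = 365 * 24 * 3600           # 31,536,000 s

events_per_year = 1e11                       # from the slide
throughput_bytes_per_s = 50e6                # 50 MB/s, taking 1 MB = 10^6 bytes

# Average event rate if the events were spread evenly over the year.
avg_event_rate_hz = events_per_year / SECONDS_PER_YEAR

# Data volume if 50 MB/s were sustained for the whole year.
bytes_per_year = throughput_bytes_per_s * SECONDS_PER_YEAR
petabytes_per_year = bytes_per_year / 1e15

if __name__ == "__main__":
    print(f"average event rate: {avg_event_rate_hz:.0f} Hz")
    print(f"yearly volume at 50 MB/s: {petabytes_per_year:.2f} PB")
```

Under these assumptions the average rate is a few kHz and the yearly volume is of the order of one petabyte, consistent with the storage requirement quoted on the slide.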

4
KLOE computing environment
  • Based on a set of medium-sized servers
  • Connected using commercial switched networks
    (Fast Ethernet and Gigabit Ethernet)
  • Heterogeneous environment, several platforms
  • IBM AIX on PowerPC
  • Sun Solaris on Sparc
  • Compaq Tru64 Unix on Alpha
  • HP-UX on PA-RISC

5
KLOE storage pool
  • Different policies for different types of data
  • raw and reconstructed events on tape libraries,
    with big disk pools for data caching
  • related data managed by a disk based database
    system
  • analysis output on disk pools

6
Disk pools
  • Four categories of disk pools are present
  • each data acquisition node in the farm has its
    own small disk pool
  • computing nodes write their output to
    centralized, NFS mounted disk pools
  • separate disk pools are used as a cache for the
    events on tape
  • analysis output is written to its own, central
    AFS mounted disk pool

7
Tape library
  • Several automated tape libraries supported (at
    the moment the 5500-slot tape library is
    partitioned between two tape servers)
  • Accessed using commercial software
  • IBM ADSM with the current tape library

8
KLOE software
  • Three distinct categories
  • DAQ (or online): ANSI C
  • reconstruction and analysis (or offline): FORTRAN inside A_C
  • Monte Carlo: FORTRAN
  • The interface to the Data Handling System must be
    compatible with all of them

9
KLOE Data Handling System
  • Composed of four elements
  • Database System
  • Archiving System
  • Spy System
  • KLOE Integrated Dataflow (KID)

10
KLOE Data Handling System
  • A mix of commercial and custom software
  • the dependency on commercial software is
    minimized by the layers of custom software
  • commercial software carries on all the vital
    functions
  • custom software mostly extends and coordinates
    the functionality of the commercial software

11
KLOE Data Handling System
  • Based on a set of multi-threaded non-privileged
    daemons and related libraries
  • Distributed across several nodes
  • Communication by means of TCP/IP sockets on high
    ports
  • bypasses TCP/IP filtering
  • flexible, programming language and operating
    system independent
  • no configuration needed on the client side
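
A minimal sketch of the daemon pattern described above: a multi-threaded, non-privileged server that talks to clients over plain TCP sockets. The one-line text protocol, the class names and the `OK` reply format are illustrative assumptions; the slides do not describe the actual KLOE wire protocol.

```python
import socket
import socketserver
import threading


class CommandHandler(socketserver.StreamRequestHandler):
    """Serves one client connection: read one line, send one line back."""

    def handle(self):
        command = self.rfile.readline().decode("utf-8").strip()
        reply = f"OK {command.upper()}\n"    # placeholder for real work
        self.wfile.write(reply.encode("utf-8"))


class ThreadedDaemon(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """Each connection is handled in its own thread."""
    daemon_threads = True
    allow_reuse_address = True


def start_daemon(host="127.0.0.1", port=0):
    # Port 0 lets the OS pick a free unprivileged port, so no root is needed.
    server = ThreadedDaemon((host, port), CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(address, command):
    """Client side: only the daemon's (host, port) address is required."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((command + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()
```

Because the client only needs a socket and a line of text, the same daemon could be reached from C or FORTRAN wrappers just as easily, which is the language independence the slide refers to.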

12
KLOE Data Handling System
  • Composed of four elements
  • Database System
  • Archiving System
  • Spy System
  • KLOE Integrated Dataflow (KID)

13
Database System
  • Two distinct database systems are used
  • offline database system
  • based on HepDB
  • data stored as ZEBRA banks
  • online database system
  • based on a Relational DBMS
  • data are structured in fields
  • extended for distributed environments

14
Online Database System
  • data stored in a Relational DBMS
  • IBM DB2 Universal Database at the moment
  • communication between the clients (user
    applications) and the RDBMS through a database
    daemon

15
Database Daemon
  • The database daemon is the only link between the
    applications and the RDBMS
  • if the RDBMS is changed in the future, only the
    database daemon will need to be changed
  • Different kinds of commands are managed by the
    daemon
  • general SQL commands
  • KLOE specific commands

16
Database Daemon
  • Different kinds of commands are managed by the
    daemon
  • general SQL commands, passed directly to the RDBMS
    (e.g. select run_nr from run_logger where status = 'OK')
  • KLOE specific commands

17
Database Daemon
  • The use of KLOE specific commands has several
    advantages
  • additional checks and restrictions are possible
  • data consistency management is centralized
  • fast central caches can be implemented
  • for example, the DAQ configuration cache reduces
    the typical access time from 4 to 0.1 s
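
The two kinds of commands and the central cache can be sketched as follows. This is only an illustration: SQLite stands in for the RDBMS, and the `daq_config` table, the `get_daq_config` command and the class name are hypothetical. Only the `run_logger` query is taken from the presentation.

```python
import sqlite3


class DatabaseDaemonSketch:
    """Single access point to the RDBMS, with one cached specific command."""

    def __init__(self, connection):
        self._db = connection
        self._daq_config_cache = {}
        self.backend_queries = 0      # counts round trips to the RDBMS

    def execute_sql(self, statement, params=()):
        """General SQL command: handed to the RDBMS as is."""
        self.backend_queries += 1
        return self._db.execute(statement, params).fetchall()

    def get_daq_config(self, name):
        """Experiment-specific command: served from the cache when possible."""
        if name not in self._daq_config_cache:
            rows = self.execute_sql(
                "select value from daq_config where name = ?", (name,))
            if not rows:
                raise KeyError(name)
            self._daq_config_cache[name] = rows[0][0]
        return self._daq_config_cache[name]


def make_demo_database():
    """Builds a small in-memory database with made-up contents."""
    db = sqlite3.connect(":memory:")
    db.execute("create table run_logger (run_nr integer, status text)")
    db.executemany("insert into run_logger values (?, ?)",
                   [(101, "OK"), (102, "BAD"), (103, "OK")])
    db.execute("create table daq_config (name text, value text)")
    db.execute("insert into daq_config values ('trigger', 'default')")
    return db
```

Since applications only ever call the daemon, replacing the RDBMS would mean rewriting `execute_sql` and the specific commands, while client code stays unchanged.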

18
A light version
  • The RDBMS is used to ensure flexibility,
    reliability and performance
  • Demanding in terms of computing resources and
    management effort
  • stand-alone environments often cannot afford it
  • An RDBMS-independent version of the database
    daemon is under development

19
A light version
  • An RDBMS-independent version of the database
    daemon is under development
  • limited to KLOE specific and the most frequently
    used SQL commands
  • based on use of flat files containing a small
    portion of the data
  • not suitable for a production environment, but
    enough for home use

20
KLOE Data Handling System
  • Composed of four elements
  • Database System
  • Archiving System
  • Spy System
  • KLOE Integrated Dataflow (KID)

21
KLOE Archiving System
  • Expected event data managed by KLOE
  • 1 PB
  • Tape libraries needed
  • data storage and retrieval non-trivial
  • random access to data very inefficient
  • Disk-based intermediate buffers used

22
KLOE Archiving System
  • Two types of intermediate buffers
  • DAQ, offline and Monte Carlo output are
    structured as YBOS files and written on their
    disk output areas
  • event data needed by offline as input are read
    from the archiving system disk-cache

23
KLOE Archiving System
  • Data needs to be migrated
  • from output areas to the tape library
  • as soon as possible (taking into account also
    efficiency concerns)
  • from the tape library to the disk cache
  • when an application needs it (or even better, a
    bit earlier)
  • Migration is totally automated and transparent to
    the applications

24
KLOE Archiving System
  • The Archiving System is made of four components
  • storage managers (archADSM)
  • disk space managers, one kind for output areas
    (spacekeeper) and one for cache areas (filekeeper)
  • archival director (archiver)
  • cache manager (retrieve)
  • Communication by means of TCP/IP sockets
  • Coordinated by the online database

25
Storage Managers
  • One for each logical tape library
  • Allows
  • queries about tape library content
  • file archival
  • file retrieval
  • Transaction oriented (if the underlying tape
    library software supports it)

26
Storage Managers
  • The only link between the tape library and the
    rest of the system
  • interface independent of the underlying archiving
    software
  • IBM ADSM is used with the current tape library
  • if another product is used in the future, only a
    specific storage manager will need to be developed
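
One way to picture a backend-independent storage manager is an abstract interface with one concrete class per archiving product. The sketch below is an assumption about the shape of such an interface, not the KLOE code: the method names are invented, and an in-memory dictionary stands in for a tape library.

```python
from abc import ABC, abstractmethod


class StorageManager(ABC):
    """Operations the rest of the system relies on, whatever the backend."""

    @abstractmethod
    def list_files(self):
        """Return the names of the archived files."""

    @abstractmethod
    def archive(self, name, data):
        """Store a file under the given name."""

    @abstractmethod
    def retrieve(self, name):
        """Return the content of an archived file."""


class InMemoryStorageManager(StorageManager):
    """Toy backend used here in place of real tape library software."""

    def __init__(self):
        self._files = {}

    def list_files(self):
        return sorted(self._files)

    def archive(self, name, data):
        self._files[name] = bytes(data)

    def retrieve(self, name):
        return self._files[name]
```

Supporting a different archiving product would then mean adding one more subclass, with no change to the callers.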

27
Disk Space Managers
  • One for each disk pool
  • Create and delete files
  • unused files get deleted to make space for new
    ones
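
A possible reading of "unused files get deleted to make space for new ones" is sketched below. The least-recently-released eviction order, the explicit `release` call and all names are assumptions made for illustration; the slides do not say which unused files are deleted first.

```python
from collections import OrderedDict


class DiskSpaceManagerSketch:
    """Tracks files in one disk pool and evicts unused ones on demand."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self._files = OrderedDict()   # name -> size, least recently released first
        self._in_use = set()          # files currently held by an application

    def used_bytes(self):
        return sum(self._files.values())

    def files(self):
        return list(self._files)

    def create(self, name, size):
        """Reserve space for a new file, deleting unused files if needed."""
        if name in self._files:
            raise ValueError(f"{name} already exists")
        needed = self.used_bytes() + size - self.capacity
        victims = []
        for candidate, candidate_size in self._files.items():
            if needed <= 0:
                break
            if candidate not in self._in_use:
                victims.append(candidate)
                needed -= candidate_size
        if needed > 0:
            raise RuntimeError("not enough reclaimable space")
        for victim in victims:        # delete only once success is certain
            del self._files[victim]
        self._files[name] = size
        self._in_use.add(name)

    def release(self, name):
        """Mark a file as unused; it becomes the last eviction candidate."""
        self._in_use.discard(name)
        self._files.move_to_end(name)
```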

28
Archival Director
  • Fully automated
  • Works in polling mode
  • from time to time looks for files ready to be
    archived
  • starts archiving only when enough data is
    available
  • Files are ordered and grouped to minimize the
    expected retrieve time
  • Several groups of files can be archived in
    parallel
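
The decision taken on each polling cycle could look like the sketch below. Grouping by run number and ordering by file name are assumptions chosen for illustration; the slides only say that files are ordered and grouped to minimize the expected retrieve time.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadyFile(NamedTuple):
    name: str
    run_nr: int
    size_bytes: int


def plan_archival(ready_files, min_batch_bytes):
    """Return groups of files to archive, or [] if too little data is ready."""
    total = sum(f.size_bytes for f in ready_files)
    if total < min_batch_bytes:
        return []                     # wait for the next polling cycle
    groups = defaultdict(list)
    for f in ready_files:
        groups[f.run_nr].append(f)    # keep files of the same run together
    # Each returned group could be handed to a separate archiving thread.
    return [sorted(groups[run], key=lambda f: f.name) for run in sorted(groups)]
```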

29
Cache Manager
  • User driven
  • when a file is needed, the application asks the
    cache manager where it is located
  • a retrieve is performed by the manager if needed
  • Several requests can be issued at the same time
  • the manager reorders them internally to minimize
    the tape mounts
  • Communication by means of TCP/IP sockets
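
The reordering step can be illustrated with a small sketch. It assumes the manager knows, for each requested file, which tape holds it and at which position; that catalogue and the names below are hypothetical.

```python
from typing import NamedTuple


class Request(NamedTuple):
    file_name: str
    tape: str
    position: int


def order_requests(requests):
    """Serve all requests for one tape together, in on-tape order."""
    return sorted(requests, key=lambda r: (r.tape, r.position))


def count_mounts(requests):
    """Number of tape mounts needed to serve the requests in the given order."""
    mounts = 0
    current = None
    for r in requests:
        if r.tape != current:
            mounts += 1
            current = r.tape
    return mounts
```

For four requests that alternate between two tapes, serving them in arrival order needs four mounts, while the reordered sequence needs two.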

30
KLOE Archival System
[Diagram: archiver and retrieve, with several archADSM, spacekeeper and filekeeper instances (counts labelled n, m, k), the tape libraries, the disk pools and the DB; connections are labelled NFS mount, local file system and TCP/IP socket.]
31
KLOE Data Handling System
  • Composed of four elements
  • Database System
  • Archiving System
  • Spy System
  • KLOE Integrated Dataflow (KID)

32
Spy System
  • KLOE data acquisition software allows the event
    data to be read out before they get written to
    disk
  • The mechanism that reads those data is called Spy
  • Based on use of shared memory buffers
  • DAQ processes are piped using this mechanism
  • the spy system reads data from the buffers
    without interfering with the DAQ
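
The idea of reading from a buffer without disturbing the writer can be sketched with a circular buffer in which the writer never waits for readers. This is an in-process illustration only: a Python list stands in for the shared memory segment, and the overwrite-oldest policy is an assumption, not something stated on the slides.

```python
class SpyBufferSketch:
    """Fixed-size circular buffer: the writer overwrites, readers only look."""

    def __init__(self, slots):
        self._slots = [None] * slots
        self._written = 0             # total number of events written so far

    def write(self, event):
        """Called by the DAQ side; never blocks and never checks for readers."""
        self._slots[self._written % len(self._slots)] = event
        self._written += 1

    def snapshot(self):
        """Called by the spy side; returns the buffered events, oldest first,
        without removing anything from the buffer."""
        count = min(self._written, len(self._slots))
        start = self._written - count
        return [self._slots[i % len(self._slots)]
                for i in range(start, self._written)]
```

A slow spy client simply misses the events that were overwritten in the meantime; the data acquisition itself is never delayed.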

33
KLOE Data Handling System
  • Composed of four elements
  • Database System
  • Archiving System
  • Spy System
  • KLOE Integrated Dataflow (KID)

34
KLOE Integrated Dataflow (KID)
  • Integration library
  • database accesses and retrieve operations hidden
  • Offers a single point of access to all the
    services
  • URI-based selection
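
URI-based selection can be pictured as a dispatch on the URI scheme. The slides do not show the KID URI syntax, so the schemes (`file`, `kloedb`), the query parameter and the handler signature below are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def open_source(uri, handlers):
    """Pick a data source handler from the URI scheme and call it."""
    parts = urlsplit(uri)
    try:
        handler = handlers[parts.scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}") from None
    # Keep only the first value of each query parameter.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return handler(parts.netloc, parts.path, query)


# Hypothetical handlers; a real library would open files, query the
# database or ask the cache manager for a retrieve here.
EXAMPLE_HANDLERS = {
    "file": lambda host, path, query: ("local file", path),
    "kloedb": lambda host, path, query: ("database selection", host, query),
}
```

With such a layer, an application names what it wants in one string, and the database accesses and retrieve operations stay hidden behind the handlers.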

35
Management effort
  • The entire system is managed by only a few
    people
  • 3 people (2 full time) are engaged in KLOE
    computing system management (including storage)
  • 1 person is engaged in the development and
    management of the online database and the
    archiving system
  • 2 people spend a few percent of their time on the
    maintenance of the offline database
