Navigation Requirements - PowerPoint PPT Presentation

About This Presentation
Title:

Navigation Requirements

Description:

Ability to locate a persistent object even if relocated or multiply located ... Ensure full event navigability at every stage of analysis. Typical query ...

Slides: 12
Provided by: ygap

Transcript and Presenter's Notes

1
Navigation Requirements
CMS View of OIDs and Refs
  • Vincenzo Innocente
  • Lassi Tuura
  • CMS

2
Driving Requirements
  • Ability to locate a persistent object even if
    relocated or multiply located
    • Scattering: write in one file, relocate to many
    • Gathering: collect interesting objects in one
      file
  • The solution should, if possible, address all
    kinds of data
    • Event Data
    • Calibrations, Conditions, Geometry
    • MetaData themselves

3
Scenarios
  • Production (simplify it)
    • Write all data from a single process in a single
      file
    • Later, split the data according to a clustering
      strategy
  • Access (transfer as little as possible)
    • Select and reprocess events from a large sample
    • Get a local persistent copy of just what is
      needed
    • Ensure full event navigability at every stage of
      analysis
  • Typical query
    • "Give me the closest (actually, the fastest to
      access) collection of tracks compatible with
      this configuration and belonging to the events
      satisfying these criteria"

4
Event Model
  • (Event) Data Product (100-1000 per event)
  • chunk of (event-) data managed as a single unit
  • Collection of Digis belonging to a part of a
    detector
  • Collection of RecObj (track, calo-clusters, jets)
    produced by a given algorithm
  • Currently identified by (its objy OID and
    federation)
  • Event
  • ASCII string
  • metadata describing how it was produced
  • Its transient type
  • Physical location
  • Inter Data Product dependency tracked
  • Consistency ensured

5
Production 2002, Complexity
Number of Regional Centers: 11
Number of Computing Centers: 21
Number of CPUs: 1000
Largest Local Center: 176 CPUs
Number of Production Passes for each Dataset (including analysis group processing done by production): 6-8
Number of Files: 11,000
Data Size (not including fz files from Simulation): 17 TB
File Transfer by GDMP and by Perl scripts over scp/bbcp: 7 TB toward T1, 4 TB toward T2

6
HEP Data
  • Event-Collection Meta-Data
  • Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
  • (luminosity, selection criteria, ...)
  • Event Data, User Data

Navigation is essential for an effective physics
analysis. Complexity requires coherent access
mechanisms.

7
Re-Reconstruction Clones
[Diagram. Labels: Production; User; Run and Config (twice); Id-2; Tracker; Local Replica; Ecal (twice); Hcal (twice)]

8
CMS Reconstructed Objects
Reconstructed Objects produced by a given
algorithm are managed by a Reconstructor.
A Reconstructed Object (Track) is split into
several independent persistent objects to allow
their clustering according to their access
patterns (physics analysis, reconstruction,
detailed detector studies, etc.). The top-level
object acts as a proxy. Intermediate
reconstructed objects (RHits) are cached by value
in the final objects. It is possible to
recalibrate the aod (and generate a new version
without modifying or copying the esd and rec).
[Diagram. Labels: RecEvent; S-Track Reconstructor; S Track (twice); Track SecInfo; Track Constituents; Vector of RHits; esd; rec; aod; "calibration dependent"; "CPU intensive"]
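The proxy idea on this slide can be illustrated with a minimal Python sketch. It is not CMS code: the class name `TrackProxy`, the loader callables, and the payload placed in each tier are assumptions made only for illustration. It shows a top-level object that fetches each separately clustered part on first access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TrackProxy:
    """Top-level object: knows how to reach each separately stored part."""
    loaders: Dict[str, Callable[[], dict]]  # tier name -> loader (hypothetical)
    _cache: Dict[str, dict] = field(default_factory=dict)

    def part(self, tier: str) -> dict:
        # Load a tier only on first access, so a job that reads only the
        # aod never touches the containers holding the rec or esd parts.
        if tier not in self._cache:
            self._cache[tier] = self.loaders[tier]()
        return self._cache[tier]


loaded: List[str] = []  # records which tiers were actually read


def make_loader(tier: str, payload: dict) -> Callable[[], dict]:
    def load() -> dict:
        loaded.append(tier)
        return payload
    return load


# The contents of each tier below are invented for the example.
track = TrackProxy(loaders={
    "aod": make_loader("aod", {"pt": 12.5}),
    "rec": make_loader("rec", {"rhits": [1, 2, 3]}),
    "esd": make_loader("esd", {"sec_info": "placeholder"}),
})
pt = track.part("aod")["pt"]
```

After the last line only the aod loader has run; the rec and esd parts stay unread until something asks for them.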

9
Raw Event
RawData are identified by the corresponding
ReadOut. RawData belonging to different detectors
are clustered into different containers. The
granularity will be adjusted to optimize I/O
performance. An index at RawEvent level is
used to avoid accessing all containers in
search of a given RawData. A range index at
RawData level could be used for fast
random access in complex detectors.
[Diagram. Labels: RawEvent; ReadOut (twice); ...; RawData (twice); "Index implemented as an ordered vector of pairs"]
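An index kept as an ordered vector of pairs can be searched by bisection. The sketch below is only an illustration in Python: the integer ReadOut identifiers, the container slots, and the function names `build_index` and `find` are assumptions, not part of the CMS design.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Tuple

# Ordered vector of (readout_id, container_slot) pairs (hypothetical layout).
Index = List[Tuple[int, int]]


def build_index(pairs: Iterable[Tuple[int, int]]) -> Index:
    """Sort the pairs once so that lookups can use binary search."""
    return sorted(pairs)


def find(index: Index, readout_id: int) -> Optional[int]:
    """Return the container slot for readout_id, or None if it is absent."""
    # The 1-tuple (readout_id,) sorts before any (readout_id, slot) pair,
    # so bisect_left lands on the first entry with that id, if there is one.
    i = bisect_left(index, (readout_id,))
    if i < len(index) and index[i][0] == readout_id:
        return index[i][1]
    return None


index = build_index([(30, 2), (10, 0), (20, 1)])
slot = find(index, 20)
```

A lookup costs O(log n) comparisons and touches no container, which is the point of keeping the index at RawEvent level.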

10
An OID proposal
  • We propose to use an object identifier composed
    of three fields
  • Navigation-Scope (Sea??) identifier
    • Always implicit; explicit use is limited to
      cross-references among disjoint stores (for
      instance, event toward calibration)
    • Nothing prevents using a context for a dataset
      or even an event
    • The concrete implementation of the Sea is a file
      catalog
  • Data Product Id (dp-id)
    • Unique and immutable identifier (in a given sea)
      of a data product
    • To simplify lookup in case of
      scattering-gathering, we suggest it include a
      field identifying the logical file (lf-id)
    • When writing, one can easily stream all
      logical files into the same physical file
    • For a given sea, a physical file can map to
      multiple logical files
    • In small seas (lakes), such as a local replica
      of selected events, even an m-to-n mapping could
      be affordable
  • Object index
    • Used to identify single objects in the data
      product
    • If the data product is WORM (write once, read
      many), indexing will work whatever data structure
      is used below a data product
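As a rough illustration of the three-field identifier, here is a minimal Python sketch. The field names (`serial`, `object_index`, `sea`), their types, and the helper `scope_of` are assumptions. The slides specify only that the scope is normally implicit, that the dp-id carries an lf-id, and that an index selects the object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataProductId:
    lf_id: int    # logical-file field suggested on the slide
    serial: int   # rest of the dp-id (hypothetical name and type)


@dataclass(frozen=True)
class Oid:
    dp_id: DataProductId
    object_index: int          # position of the object inside the data product
    sea: Optional[str] = None  # None means implicit: use the current scope


def scope_of(oid: Oid, current_sea: str) -> str:
    """Return the navigation scope in which an OID should be resolved."""
    return oid.sea if oid.sea is not None else current_sea


# A reference inside the event store leaves the scope implicit; a
# cross-reference toward calibration data names its scope explicitly.
track_ref = Oid(DataProductId(lf_id=7, serial=42), object_index=5)
calib_ref = Oid(DataProductId(lf_id=1, serial=3), object_index=0, sea="calibration")
```

Freezing the dataclasses mirrors the requirement that the dp-id be immutable, and makes the identifiers usable as dictionary keys.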

11
Data Product id resolution
  • A possible implementation
    • The Sea is responsible for mapping an lf-id to a
      given strategy to resolve a data-product-id
    • The same dp-id can be resolved differently
      depending on which sea we are navigating
  • Physical resolution strategy
    • Lf-id identifies a file; the rest of the dp-id
      is a physical location (objy-like)
  • Local mapping
    • Lf-id identifies a file (not necessarily the
      original one); the rest of the dp-id is used to
      look up in a table contained in the file itself
  • Global mapping
    • Lf-id identifies a table; the rest of the dp-id
      is used to look up in the table the physical
      location of the data product
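The three strategies can be sketched in Python as interchangeable resolver functions selected per lf-id by the sea. This is an illustration under stated assumptions, not the proposed implementation: the dp-id is modelled as an `(lf_id, rest)` tuple, a location as `(file name, position)`, and all file names and table contents are invented.

```python
from typing import Callable, Dict, Tuple

DpId = Tuple[int, int]      # (lf_id, rest of the dp-id); layout is an assumption
Location = Tuple[str, int]  # (physical file name, position in that file)
Strategy = Callable[[DpId], Location]


def physical(file_name: str) -> Strategy:
    """Physical strategy: the rest of the dp-id is already the location."""
    return lambda dp_id: (file_name, dp_id[1])


def local_mapping(file_name: str, table_in_file: Dict[int, int]) -> Strategy:
    """Local mapping: look the rest of the dp-id up in a table kept in the file."""
    return lambda dp_id: (file_name, table_in_file[dp_id[1]])


def global_mapping(table: Dict[int, Location]) -> Strategy:
    """Global mapping: the lf-id selects a table that yields the full location."""
    return lambda dp_id: table[dp_id[1]]


class Sea:
    """Navigation scope: maps each lf-id to the strategy used to resolve it."""

    def __init__(self, strategies: Dict[int, Strategy]) -> None:
        self._strategies = strategies

    def resolve(self, dp_id: DpId) -> Location:
        return self._strategies[dp_id[0]](dp_id)


dp_id = (7, 42)
production = Sea({7: physical("prod_007.db")})
replica = Sea({7: local_mapping("replica.db", {42: 3})})
catalog = Sea({7: global_mapping({42: ("tier1_a.db", 900)})})
```

The same `dp_id` resolves to a different location in each of the three seas, which is the behaviour the slide asks for.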