Meta Data (reloaded) - PowerPoint PPT Presentation

Slides: 15
Provided by: ygap
Transcript and Presenter's Notes


1
Meta Data (reloaded)
An Introduction to non-Event Data in CARF
  • Vincenzo Innocente
  • CERN/EP/CMC

2
Meta-data as attributes
  • Traditionally, metadata are data that describe
    other data
  • schema, protocols, type-dictionaries
  • Its meaning has been extended to
  • Attributes given to an object in a given context
    (including self)
  • Proxy/cache of some object-attributes for fast
    retrieval
  • Neutral protocol among domains not sharing a
    common data-model
  • Work-around for mistakes in the data-model
  • Everything that looks like a super-structure in
    your domain
  • In OO they can be considered as attributes of the
    relation between two objects
  • They suffer from many problems related to object
    identity
  • Copy semantics (shallow-deep)
  • Delete semantics (roll-back)
  • Play-back semantics (restore from backup,
    regeneration)
  • Examples
  • ls -l, phone innocente, the formatting of this
    presentation
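
The view of meta-data as attributes of a relation, and the copy problem it runs into, can be sketched as follows. This is a minimal illustration, not CARF code: the `Relation` and `Event` classes and the attribute name and value are all hypothetical.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Hypothetical: meta-data stored on the link between two objects."""
    source: object
    target: object
    attributes: dict = field(default_factory=dict)


class Event:
    """Hypothetical stand-in for any object described by meta-data."""


collection, event = [], Event()
tag = Relation(collection, event, {"selected_by": "muon-trigger"})

shallow = copy.copy(tag)      # new Relation, same target object and same dict
deep = copy.deepcopy(tag)     # new Relation, cloned target and cloned dict

# After a shallow copy the attributes still describe the original object.
assert shallow.target is event
# After a deep copy they describe a clone: which object do they belong to now?
assert deep.target is not event

# A shallow copy also shares the attribute dict, so edits leak across copies.
shallow.attributes["selected_by"] = "changed"
print(tag.attributes["selected_by"])   # shared with the shallow copy
print(deep.attributes["selected_by"])  # independent of it
```

Neither copy is wrong in itself; the point is that the meaning of the attributes depends on which object identity the copy preserves.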

3
Reconstruction Sources
4
HEP Data
  • Event-Collection Meta-Data
  • Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
  • (luminosity, selection criteria, ...)
  • Event Data, User Data

Event Collection
Collection Meta-Data
Navigation is essential for an effective physics
analysis. Complexity requires coherent access
mechanisms.
5
Top Level Event Structure (ORCA4)
The Run object is the entry point for provenance
and configuration information
Run
Crossing
Trigger
Pile-up
Run
SimEvent
6
Re-Reconstruction Clones
Run
Run
Id-1
Local Replica
Crossing
Trigger
Pile-up
7
Dataset Collection
MetaData User Tag
Run Collection
An example of Meta-Mess: due to the transition to
winter mode, a collection ended up having attributes
in three contexts: self, dataset and the master
collection
Rec Event
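
One way to picture a collection with attributes in three contexts is a layered lookup. The sketch below uses Python's `collections.ChainMap`. The attribute names and values, and the lookup order (self, then dataset, then master collection), are assumptions made for illustration and are not taken from CARF.

```python
from collections import ChainMap

# Hypothetical attributes of one collection, held in three separate contexts.
self_attrs = {"n_events": 1000}
dataset_attrs = {"n_events": 1200, "owner": "production"}
master_attrs = {"owner": "reco-group", "geometry": "v1"}

# Assumed precedence: self first, then dataset, then the master collection.
attrs = ChainMap(self_attrs, dataset_attrs, master_attrs)


def conflicts(chain):
    """Return keys that carry different values in different contexts."""
    found = {}
    for key in chain:
        values = [m[key] for m in chain.maps if key in m]
        if len(set(values)) > 1:
            found[key] = values
    return found


print(attrs["n_events"])   # value seen through the layered view
print(conflicts(attrs))    # keys on which the contexts disagree
```

The layered view always returns an answer, which is exactly what hides the mess: two of the three attributes have a second, different value in another context.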
8
Top Level MetaData Structure (spring)
System Collection
RunList
Owner (Transformation?)
Specific to DS type
Run
DataSet
SetUp
EVDFilePool
Event Collection
Persistent Algorithms
EVDFile
Configuration
Specific to DS type
Container
Specific to DS type
Location of event data
9
Top Level MetaData Structure (winter)
System Collection
RunList
Owner (Transformation?)
Specific to DS type
Run
DataSet
PoolCatalog
SetUp
Event Collection
Persistent Algorithms
EVDFile
Configuration
Specific to DS type
Container
Specific to DS type
Location of event data
10
Interface
11
Interface
12
Publication, Distribution and Replica
  • Sharing of data samples produced in a private
    environment must be supported
  • Local data sources
  • Local data products
  • Work in isolation without prior registration to a
    larger scope
  • Make it available from a local scope
  • Requires a change of scope to access it (ssh, cd,
    change file/db-server)
  • Make it accessible from other scopes
  • Usually implies publication in a global scope
    (/afs/, http://, RLS, DSN)
  • Make it available from a different scope
  • Implies a physical replica
  • Data sharing cannot be supported just with a
    centralized dbms or replica service
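
The three ways of sharing a sample listed above can be sketched with a toy catalog. Everything here is hypothetical (the `Scope` class, the scope names, the sample name and the paths); it only illustrates the difference between publishing a reference and making a physical replica.

```python
class Scope:
    """Hypothetical scope: a namespace mapping logical names to locations."""

    def __init__(self, name):
        self.name = name
        self.entries = {}

    def register(self, logical_name, location):
        self.entries[logical_name] = location

    def resolve(self, logical_name):
        return self.entries.get(logical_name)


local = Scope("my-laptop")
global_scope = Scope("global")
remote = Scope("other-site")

# 1. Work in isolation: the sample is known only in the local scope.
local.register("my-sample", "/home/user/my-sample.db")
assert global_scope.resolve("my-sample") is None

# 2. Publication: the global scope points at the single physical copy.
global_scope.register("my-sample", local.resolve("my-sample"))

# 3. Replica: another scope gets its own physical copy at a new location.
remote.register("my-sample", "/data/replicas/my-sample.db")

print(global_scope.resolve("my-sample"))  # same location as the local copy
print(remote.resolve("my-sample"))        # a different physical location
```

Step 1 works without any central service, which is why a centralized catalog alone cannot cover all three cases.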

13
Obj id in a distributed environment
  • Unique Object id
  • Easy to obtain, not human friendly
  • Related to physical location (pool oid, /afs/)
  • Fast direct access, ensures consistency, makes
    replica management difficult
  • Location independent (pool file id)
  • Supports replicas at object level, access requires
    an additional scope, makes update (and delete)
    difficult
  • Difficult to turn into a logical identity
  • Add checksum?
  • Attribute-based Object id
  • Human best friend
  • Does not guarantee uniqueness, supports
    relational-algebra and fuzziness
  • Global vs local scope (namespace)
  • Performance requires turning it into a unique id in
    a restricted scope (index)
  • Hybrid-Store should be understood more as
    supporting various navigation and access
    paradigms, rather than as implemented using
    different technologies
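
The trade-off between the two kinds of object id can be sketched as below. This is a minimal illustration under stated assumptions: the in-memory `store`, the `add` and `find` helpers, and the attribute names and values are hypothetical, and `uuid4` merely stands in for whatever unique id a real store would issue.

```python
import uuid

# Hypothetical store: unique id -> attributes of the object.
store = {}


def add(attributes):
    """Store an object and return its unique (not human-friendly) id."""
    oid = uuid.uuid4().hex
    store[oid] = attributes
    return oid


def find(**query):
    """Attribute-based lookup: a linear scan that may match several objects."""
    return [oid for oid, attrs in store.items()
            if all(attrs.get(k) == v for k, v in query.items())]


a = add({"run": 1, "type": "digis"})
b = add({"run": 1, "type": "hits"})
c = add({"run": 2, "type": "digis"})

# Attributes alone do not guarantee uniqueness.
assert sorted(find(run=1)) == sorted([a, b])

# Inside a restricted scope (here: run 1) "type" is unique, so an index
# can turn the attribute into a unique id with a single dictionary lookup.
index = {store[oid]["type"]: oid for oid in find(run=1)}
assert index["hits"] == b
```

The index is only valid inside its scope: the same key `"digis"` would be ambiguous if runs 1 and 2 were indexed together.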

14
Open issues
  • Configuration should be moved into its own database
  • Is it the same as the Condition DB?
  • How do we identify versions and variants?
  • Should we refer to configuration items by a
    unique id or through their attributes?
  • Owner, Originator, Transformation, Dataset
  • Do we need to distinguish between these concepts?
  • What is the relationship among them and w.r.t.
    the configuration?
  • Naming policy
  • Can we afford multiple naming policies?
  • At which level should naming policies be
    enforced?
  • Can we really implement a unique consistent
    naming policy in a fully distributed environment?