CMS Data Analysis Current Status and Future Strategy - PowerPoint PPT Presentation

Provided by: lassia

Transcript and Presenter's Notes


1
CMS Data Analysis: Current Status and Future
Strategy
  • On behalf of CMS Collaboration
  • Lassi A. Tuura
  • Northeastern University, Boston

2
Overview
  • The Context: CMS Analysis Today
  • Data Analysis Environment Architecture
  • Overview
  • COBRA
  • IGUANA
  • GRID/Production
  • Tomorrow and Beyond
  • Leveraging current frameworks in the
    Grid-enriched analysis environment
  • Clarens client-server prototype
  • Other prototype activities

3
Context
  • Challenges: Complexity, Geographic Dispersion,
    Direct Access To Data, Migration from
    Reconstruction to Trigger
  • Environments: Real-Time Event Filter, Online
    Monitoring; Pre-emptive Simulation,
    Reconstruction, Analysis; Interactive Statistical
    Analysis
4
Current CMS Production
5
Complexity of Production 2002
6
Interactive Analysis
Lizard Qt plotter
7
Behind the Scenes: Frameworks
Data Browser
Generic analysis Tools
GRID
Distributed Data Store Computing Infrastructure
Analysis job wizards
Objy tools
ORCA
COBRA
OSCAR
FAMOS
Detector/Event Display
CMS tools
Federation wizards
  • Consistent User Interface

Coherent basic tools and mechanisms
8
Frameworks Dissected
Specific Frameworks
Grid-Uploadable
Physics modules
Calibration Objects
Generic Application Framework
Configuration Objects
Event Objects
Adapters and Extensions
ODBMS
GEANT 3 / 4
CLHEP
PAW Replacement
C++ Standard Library, Extension Toolkits
Basic Services
9
Framework Design Basis
  • Several frameworks provide the environment
    together
  • Open: No central framework with all functionality
  • Frameworks are designed to be extensible
  • and to collaborate with other software
  • Coherent: User sees final smooth interface
  • Achieved by integrating the frameworks together
  • but the user does not do this work him/herself!
  • Design applied at both framework and object
    design level
  • Successfully applied in many parts of CMS
    software
  • Applications, persistency sub-frameworks,
    visualisation
  • No loss of usability, functionality or
    performance
  • Has made it easy to integrate directly with many
    existing tools
  • This is nothing novel: it is part of the
    standard risk-mitigation strategy of any modern
    industrial solution

10
Frameworks: COBRA
(Framework overview diagram, same labels as on slide 7)
11
COBRA Main Components
  • Push- and pull-mode execution, and any mixture
  • Reconstruction-on-demand is a key concept in
    COBRA
  • Detector-centric reconstruction: push data from
    event
  • Reconstruction-unit-centric reconstruction:
    pull/create data as needed
  • Event data and related structures
  • Basic support for commonly needed objects (hits,
    digis, containers, ...)
  • Application environments
  • Basic application frameworks, various
    semi-specialised applications
  • Lots of error-handling and recovery code
    (automatic recovery after crash, ...)
  • Meta data: a key component
  • Data chunking, system and user collections, data
    streams, file management, job concepts,
    configuration and setup records, redirected
    navigation after reprocessing, ...
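
A minimal sketch of the pull-mode ("reconstruction-on-demand") idea from this slide, written in Python for brevity. This is not COBRA code: the `Event` class, its `get` method and the toy producers `make_clusters` and `make_tracks` are hypothetical names chosen for illustration. A data product is created only when something asks for it, and is then cached.

```python
class Event:
    """Raw data plus lazily reconstructed data products (hypothetical API)."""

    def __init__(self, raw_hits, producers):
        self.raw_hits = raw_hits
        self._producers = producers  # product name -> function(event)
        self._cache = {}             # products reconstructed so far

    def get(self, name):
        # Pull mode: run the producer only on first request, then reuse the result.
        if name not in self._cache:
            self._cache[name] = self._producers[name](self)
        return self._cache[name]


calls = []  # records which producers actually ran, and in what order


def make_clusters(event):
    calls.append("clusters")
    # Toy clustering: group hit positions that are adjacent integers.
    clusters, current = [], []
    for hit in sorted(event.raw_hits):
        if current and hit - current[-1] > 1:
            clusters.append(current)
            current = []
        current.append(hit)
    if current:
        clusters.append(current)
    return clusters


def make_tracks(event):
    calls.append("tracks")
    # Requesting clusters here triggers their reconstruction if needed.
    return [(c[0], c[-1]) for c in event.get("clusters") if len(c) >= 2]


event = Event([1, 2, 3, 7, 10, 11],
              {"clusters": make_clusters, "tracks": make_tracks})
tracks = event.get("tracks")        # runs make_tracks, which pulls clusters
tracks_again = event.get("tracks")  # served from the cache, nothing re-runs
```

Asking only for `"tracks"` is enough: the cluster producer runs because the track producer pulls it, which is the dependency-driven behaviour the slide describes.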

12
COBRA Main Strengths
  • Algorithms in plug-ins
  • Publish-yourself plug-ins: self-describing data
    producers
  • Strong meta-data facilities
  • Reconstruction-on-demand matches data product
    concept very well
  • Grid virtual data products concept: really just an
    extension
  • Convenient mapping of data products to chunks:
    files, containers, ...
  • Scatter / gather: decompose jobs, gather data
  • One logical job can be chopped into many physical
    processes; we still know it is logically the same
    job no matter which process it is running in
  • Adapts automatically to many environments without
    special configuration: interactive, batch, farm,
    stand-alone, trigger, ...
  • Through appropriate use of enabling techniques
    (transactions, locking, refs)
  • No data post-processing required
  • Well-matched to production tools (IMPALA)
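
The scatter / gather point above can be illustrated with a small Python sketch. It is an assumption-laden toy, not COBRA or IMPALA code: `SubJob`, `scatter`, `run` and `gather` are hypothetical names, and the "processing" is a stand-in. The point it shows is that every physical piece carries the identity of the logical job, so results can be merged back under that identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubJob:
    logical_job: str  # identity shared by all pieces of one logical job
    index: int        # which physical piece this is
    events: tuple     # event numbers assigned to this piece


def scatter(logical_job, events, n_pieces):
    """Split one logical job into n_pieces sub-jobs (round-robin over events)."""
    return [SubJob(logical_job, i, tuple(events[i::n_pieces]))
            for i in range(n_pieces)]


def run(sub_job):
    # Stand-in for a physical process; "processing" just doubles the event number.
    return {"job": sub_job.logical_job,
            "results": {e: 2 * e for e in sub_job.events}}


def gather(outputs):
    """Merge outputs, refusing to mix pieces of different logical jobs."""
    jobs = {out["job"] for out in outputs}
    if len(jobs) != 1:
        raise ValueError("outputs belong to more than one logical job")
    merged = {}
    for out in outputs:
        merged.update(out["results"])
    return jobs.pop(), merged


pieces = scatter("reco-demo", list(range(10)), 3)
job_name, merged = gather([run(p) for p in pieces])
```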

13
(No Transcript)
14
Queries
Refs Navigation
Cache Management
15
Collections
Configurations (Data Sets)
Object Naming
Run Resume Crash Recovery
16
File Size Control
System Management
Farm Management
17
Frameworks: IGUANA
(Framework overview diagram, same labels as on slide 7)
18
User Interface and Visualisation
  • IGUANA: a generic toolkit for user interfaces and
    visualisation
  • Builds on existing high-quality libraries (Qt,
    OpenInventor, Anaphe, ...)
  • Used to implement specific visualisation
    applications in other projects
  • Main technical focus: provide a platform that
    makes it easy to integrate GUIs as a coherent
    whole, to provide application services and to
    visualise any application object
  • Many categories / layers: GUI gadgets support,
    application environment, data visualisers, data
    representation methods, control panels, ...
  • Designed to integrate with and into other
    applications
  • Virtually everything is in plug-ins (can still be
    statically linked)

(Diagram: plug-in architecture. Labels: Component Database, Plug-In Cache, Object Factory, attached and unattached Plug-Ins)
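
A rough Python sketch of the plug-in idea on this slide, under stated assumptions. It is not the IGUANA API: `PluginDatabase`, `register`, `create` and the two example classes are hypothetical. Plug-ins are registered by name without being loaded, and a plug-in becomes "attached" only the first time an object from it is requested.

```python
class PluginDatabase:
    """Maps plug-in names to loaders; loads ("attaches") them on first use."""

    def __init__(self):
        self._loaders = {}   # name -> zero-argument function returning a factory
        self._attached = {}  # name -> factory, for plug-ins already loaded

    def register(self, name, loader):
        # Registration is cheap: nothing is loaded yet.
        self._loaders[name] = loader

    def attached(self):
        return sorted(self._attached)

    def create(self, name, *args):
        # Attach the plug-in on first use, then let its factory build the object.
        if name not in self._attached:
            self._attached[name] = self._loaders[name]()
        return self._attached[name](*args)


class HistogramView:
    def __init__(self, title):
        self.title = title


class EventDisplay:
    def __init__(self):
        self.visible = True


db = PluginDatabase()
db.register("histogram", lambda: HistogramView)
db.register("event-display", lambda: EventDisplay)

before = db.attached()                         # nothing loaded yet
view = db.create("histogram", "pT spectrum")   # attaches only "histogram"
after = db.attached()
```

In a real C++ toolkit the loader would typically open a shared library; here a lambda returning a class plays that role.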
19
Illustration: 3D Visualisation
20
IGUANA GUI Integration
Integration
Action
Visualise Results, Modify Objects, Further
Interaction
21
Tomorrow and Beyond
  • Leverage the current frameworks on the grid
  • Many native COBRA concepts match well with grid
  • (Virtual) data products: reconstruction-on-demand
  • Recording and matching configuration and setup
    information
  • Production interfaces: catalogs, redirection, MSS
    hooks
  • Scatter/gather: job decomposition, production
    environment
  • COBRA-based applications can be encapsulated for
    distributed analysis
  • IGUANA already separates application objects,
    model and viewer
  • Many possibilities for introducing distributed
    links
  • IGUANA + COBRA provides a platform for a coherent,
    well-integrated interface no matter where the
    code runs and where the data comes from and goes
  • Both have loads of knobs and hooks for
    integration
  • Aiming at adapting the existing software where
    possible
  • Adapt and work within CMS software (COBRA, ORCA,
    ...) and existing analysis tools (ROOT, Lizard,
    ...): don't replace them

22
Prototypes: Clarens Web Portals
  • Grid-enabling the working environment for
    physicists' data analysis
  • Communication with clients via the commodity
    XML-RPC protocol → implementation independence
  • Server implemented in C++: access to the CMS OO
    analysis toolkit
  • Server provides a remote API to Grid tools
  • The Virtual Data Toolkit: object collection
    access
  • Data movement between tier centres using GSI-FTP
  • CMS analysis software (ORCA/COBRA)
  • Security services provided by the Grid (GSI)
  • No Globus needed on client side, only certificate
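
To illustrate the XML-RPC client-server pattern named above, here is a minimal sketch using only Python's standard library. It is not the Clarens API: the method names `list_collections` and `count_events`, the collection names and the event counts are all made up for the example, and the Grid security layer (GSI, certificates) is left out entirely.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Made-up catalogue standing in for server-side access to analysis data.
COLLECTIONS = {"demo/jets": 1200, "demo/muons": 450}


def list_collections():
    return sorted(COLLECTIONS)


def count_events(name):
    return COLLECTIONS[name]


# Server side: expose the two functions over XML-RPC on a free local port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(list_collections)
server.register_function(count_events)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: only an XML-RPC library is needed, in any language.
host, port = server.server_address
client = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")
names = client.list_collections()
n_muons = client.count_events("demo/muons")

server.shutdown()
server.server_close()
```

Because the wire format is plain XML over HTTP, the client could equally be written in another language, which is the implementation independence the slide refers to.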

(Diagram labels: Client, RPC over http/https, Web Server, Clarens, Service)
23
Prototypes: Clarens Web Portals
(Diagram labels: Tier 0/1/2, Tier 1/2, Tier 3/4/5, User; Production data flow, TAGs/AODs data flow, Physics Query flow)
24
Other Prototypes
  • Tag database optimisation
  • Fast sample selection is crucial
  • Various models already tried
  • Experimenting with RDBMS
  • MOP: distributed job submission system
  • Allows CMS production jobs to be submitted from a
    central location, run at remote locations, and
    have their results returned
  • Job specification: IMPALA
  • Replication: GDMP
  • Globus GRAM
  • Job scheduling: Condor-G and local systems