Title: CMS Dashboard. The ARDA Team: Julia Andreeva, Tao-Sheng Chen, Craig Munro, Shih-Chun Shiu, Juha Herrala, in collaboration with the MonAlisa Team. CMS Computing Technical Project Workshop, CERN, 4th November 2005
1 CMS Dashboard
The ARDA Team: Julia Andreeva, Tao-Sheng Chen, Craig Munro, Shih-Chun Shiu, Juha Herrala
In collaboration with the MonAlisa Team
CMS Computing Technical Project Workshop
CERN, 4th November 2005
2 ARDA contributions on CMS Dashboard
- Dashboard start page
  - http://www-asap.cern.ch/dashboard
- Job Monitoring
  - Derived and complemented from the experiences gathered with the ASAP MyFriend job monitoring service.
  - First version currently available at http://www-asap.cern.ch/dashboard/dashboard.php
- I/O monitoring of the jobs
  - Initial attempt with SC3 jobs sending summary information containing I/O measurements.
  - First version currently available at (history views) http://www-asap.cern.ch/dashboard/transfer_history.php
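3 Dashboard CMS Job Monitoring
[Architecture diagram; labels recovered from the slide:]
- Information sources: RB, WNs, submission tools
- Monitoring systems: Monalisa, R-GMA
- Interfaces: Web Service Interface, R-GMA Client API
- Collectors: R-GMA collector, Monalisa collector (constantly retrieve job information)
- Storage: Dashboard DB (PostgreSQL, SQLite, Oracle?)
- Views (PHP): snapshot, statistics, job info, plots
- Other clients: MyFriend services, etc.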
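The data flow on this slide (collectors feeding a common database, from which views such as the snapshot are produced) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the table layout, the column names, the `feed` and `snapshot` helpers and the sample records are hypothetical and are not the actual Dashboard schema or collector code. SQLite is used only because it runs without a server.

```python
import sqlite3

# Hypothetical, simplified schema: one row per job, overwritten as reports arrive.
SCHEMA = """
CREATE TABLE job (
    grid_job_id TEXT PRIMARY KEY,
    site        TEXT,
    status      TEXT,
    source      TEXT
)
"""

def feed(db, records, source):
    """Insert or update job rows from one collector's batch of records."""
    for rec in records:
        db.execute(
            "INSERT OR REPLACE INTO job (grid_job_id, site, status, source) "
            "VALUES (?, ?, ?, ?)",
            (rec["grid_job_id"], rec.get("site"), rec["status"], source),
        )
    db.commit()

def snapshot(db):
    """Return {status: number of jobs}, the kind of aggregate a snapshot view might show."""
    rows = db.execute("SELECT status, COUNT(*) FROM job GROUP BY status")
    return dict(rows.fetchall())

db = sqlite3.connect(":memory:")
db.execute(SCHEMA)

# Made-up sample batches standing in for the two collectors.
feed(db, [{"grid_job_id": "job-1", "site": "site-a", "status": "Running"},
          {"grid_job_id": "job-2", "site": "site-b", "status": "Running"}],
     source="monalisa")
feed(db, [{"grid_job_id": "job-2", "site": "site-b", "status": "Done"}],
     source="rgma")

print(snapshot(db))
```

In this sketch a later report simply overwrites the earlier row for the same job; a real collector would probably need to keep the history and reconcile conflicting reports from different sources.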
4 Dashboard development cycle
- Agree on Dashboard views
  - Input mainly from CMS community
- Modify tools to report the required data
  - New structure for JobStatusRaw data/table in R-GMA
  - ARDA/ASAP job submission tool as the first prototype
  - McRunjob as the first CMS pilot tool
  - CRAB submission procedure and job wrapper experimental in SC3
    - Reports submission and job summary to Monalisa
    - Data cards for ORCA/COBRA
- Collect the data
  - Monalisa collector
    - CMS tools and ORCA jobs report to Monalisa
  - R-GMA collector (for LCG Logging & Bookkeeping data)
    - Job state/status transition in WMS
  - GridIce collector under investigation
- Feed the data in the Dashboard DB
  - Current implementation with PostgreSQL
  - Oracle as the next step
- Create the Dashboard views
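As an illustration of the "collect the data" step for job state transitions, the sketch below folds a stream of transition events into the most recent state per job. It is a hedged example, not the R-GMA collector itself: the `Transition` record, the `latest_states` helper, the state names and the integer timestamps are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    """One hypothetical job state transition event."""
    grid_job_id: str
    state: str
    timestamp: int  # seconds; the unit is an assumption of this sketch

def latest_states(transitions):
    """Fold a stream of transitions into the most recent state per job.

    Events may arrive out of order, so an event replaces the stored one
    only if its timestamp is strictly newer.
    """
    latest = {}
    for t in transitions:
        current = latest.get(t.grid_job_id)
        if current is None or t.timestamp > current.timestamp:
            latest[t.grid_job_id] = t
    return {job_id: t.state for job_id, t in latest.items()}

# Made-up events; state names are illustrative only.
events = [
    Transition("job-1", "Submitted", 100),
    Transition("job-1", "Running", 160),
    Transition("job-2", "Submitted", 120),
    Transition("job-1", "Scheduled", 130),  # late arrival, older than "Running"
]
print(latest_states(events))
```

Comparing timestamps instead of trusting arrival order is one simple way to tolerate delayed messages from a monitoring source.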
5 Dashboard status and plans
- The first and most important development cycle finished
  - Initial Dashboard DB schema defined
  - Data collectors created and made to work as reliably as possible (not always in our power). Collectors should also scale to the rate of the incoming information.
  - Initial problems in the data sources have been communicated to developers/maintainers.
  - First views have been created.
- Next step: get feedback from the CMS community
  - Which information to gather, which level of detail to maintain
  - Opinions of the current state of the functionality/views
  - Priority list for next features
- Important to agree which tools are involved
  - For example, the information sent by production and analysis has much in common
  - Format and content should be discussed among all interested parties: framework developers, submission tool developers, grid site managers.
- Main objective is to make Dashboard a real CMS-wide tool
  - Currently works with SC3-patched CRAB version (Lassi & co.), modified version of McRunjob (used by Olga and Pablo) and ASAP on LCG.
  - A plan is needed to migrate with the mainstream development of CMS job submission tools (analysis & production).
  - Involve other grid backends like OSG and EGEE.
  - Oracle service for the data.
- Additional information needed to increase the reliability and to enhance the functionality. A few examples on the next slide.
6 Examples of requirements for additional data
- At the submission time
  - In addition to what we have now: grid certificate subject, set of CEs where the job can be allocated.
- Synchronization message when the job starts
  - Grid job ID, CE name, WN name
  - Reason for immediate failure (sanity check failure): catalog cannot be downloaded, software (tag) is missing
  - If staging in is required (production): time stamp when staging started and finished, exit code of staging out
- Additional runtime information
  - User executable started running: time stamp
  - User executable finished: time stamp
- At the end of the user executable
  - Send as much information as possible from the WN, so that the real execution status of the job can be analyzed without opening a log file
  - User executable exit code; in ORCA the CARF errors line, total events processed
  - Length of the output file
  - If the output should be saved to the SE
    - name of the SE
    - exit code of the staging out
    - exit code of the catalogue update
- The submission tool could send the job status whenever requested from the grid.
- Complementing the incomplete data by collecting it from yet another system, like GridIce.
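One possible way to picture the end-of-job information listed on this slide is a single report record whose fields mirror the bullets. This is only a sketch: the class name `JobEndReport`, the field names, the types and the helpers `missing_fields` and `to_params` are hypothetical and do not describe the format actually sent by any CMS tool.

```python
from dataclasses import dataclass, fields, asdict
from typing import Optional

@dataclass
class JobEndReport:
    """Hypothetical end-of-job report; field names are illustrative."""
    grid_job_id: str
    ce_name: Optional[str] = None
    wn_name: Optional[str] = None
    exe_start_time: Optional[int] = None
    exe_end_time: Optional[int] = None
    exe_exit_code: Optional[int] = None
    events_processed: Optional[int] = None
    output_file_length: Optional[int] = None
    se_name: Optional[str] = None
    stage_out_exit_code: Optional[int] = None
    catalogue_update_exit_code: Optional[int] = None

def missing_fields(report):
    """List the fields the job did not fill in, i.e. data still to be
    complemented from another source."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]

def to_params(report):
    """Flatten the report to key/value pairs, dropping unset fields."""
    return {k: v for k, v in asdict(report).items() if v is not None}

# Made-up values for illustration.
report = JobEndReport("job-1", ce_name="ce.example.org", exe_exit_code=0,
                      events_processed=500)
print(to_params(report))
print(missing_fields(report))
```

Keeping unset fields explicit makes it easy to see which parts of a job record are incomplete and would have to be collected from yet another system.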
7 Dashboard demo