Title: LCG Monitoring and Accounting
1LCG Monitoring and Accounting
- Dave Kant
- CCLRC e-Science Centre, UK
- HEPSYSMAN
- April 2005
2Introduction
- Overview of some of the monitoring tools in
action in the LHC Computing Grid
- GOCDB
- GPPMON
- GRIDICE
- GSTAT
- CERTIFICATION TESTING
- REAL TIME GRID MONITOR
- Accounting Use Case
- Future Plans
3Monitoring the LCG Grid is a Challenge!
Number of participating sites is growing every
day:
- August 2003 > 12 sites, 100 CPUs
- October 2004 > 83 sites, 8000 CPUs
- April 2005 > 138 sites, 14000 CPUs, 4TB Disk
Grid Operations Centre:
- Monitor the operational status of sites
- Fault detection
- Problem management: identify problems, escalate, track
4Monitoring Challenges
- With so many sites participating, there is a
requirement for operational information in order
to manage a grid environment.
- What are the core grid services,
e.g. RBs/SEs/BDIIs, the VOs are using for data
challenges?
- Who do we contact when there is a security
incident?
- Require a toolkit to test specific core services.
- We have to concentrate on functional behaviour of
services, e.g. if an RB sends your job to a CE,
then we must assume the RB is working fine. Is
this the only test of an RB?
- Not all the tests that we perform are effective
at finding problems.
- We must develop tests which simulate the life
cycle of real applications in a Grid environment.
- and lots more
5GOC Configuration Database
http://goc.grid-support.ac.uk/gridsite/gocdb
- Secure database management via HTTPS / X.509
- Stores a subset of the Grid Information System:
people, contact information, resources
- Scheduled maintenance
- Monitoring Services
- Operations Maps
- Configure other Tools
- Organisation Structure
- - People/Institutes/Projects
- Secure services
- - News
- Self Certification
- Accounting
[Diagram: Resource Centre (RC) resources and site information (EDG,
LCG-1, LCG-2; bdii, ce, se, rb) connected via https to the GOC
GridSite server, backed by MySQL (SQL).]
GOC DB can also contain information that is not
present in the IS, such as:
- Scheduled maintenance
- News
- Organisational structures
- Geographic coordinates for maps
6Operations Map Job Submission Tests
GPPMON displays the results of tests
against sites.
Test: Job Submission. The job is a
simple test of the grid middleware components,
e.g. Gatekeeper service, RB service, and the
Information System via JDL requirements.
This kind of test deals with the functional
behaviour of core grid services: do simple jobs
run? They are lightweight tests which run hourly.
However, they have certain limitations, e.g. Dteam
VO, WN reach (specialised monitoring queues).
7Operations Map Certificate Lifetime
GPPMON displays the results of tests
against sites.
Test: Certificate Lifetime. Many
grid services require a valid certificate for
security.
By probing the host certificates on CEs and SEs
at sites with a simple SSL client service, we can
identify certificates which are due to expire and
send an early warning to the sites. A predictive tool!
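As an illustration only, the expiry check could be sketched as below. This is not the GPPMON code: the host names, the 30-day warning threshold and the function names are assumptions, and fetching the certificate from the CE or SE is left out. The sketch only shows how a certificate's notAfter string can be turned into a "days remaining" figure with the Python standard library.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now_epoch: float) -> float:
    """Return the number of days from now_epoch until the notAfter time.

    not_after is a certificate time string in the format understood by
    the standard library, e.g. "Apr 11 00:00:00 2005 GMT".
    """
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return (expiry_epoch - now_epoch) / SECONDS_PER_DAY


def needs_warning(not_after: str, now_epoch: float, warn_days: int = 30) -> bool:
    """True when the certificate expires within warn_days (assumed threshold)."""
    return days_until_expiry(not_after, now_epoch) <= warn_days


# Hypothetical probe results: host name -> notAfter string.
probed = {
    "ce.example.org": "Apr 11 00:00:00 2005 GMT",
    "se.example.org": "Dec 31 00:00:00 2005 GMT",
}
now = ssl.cert_time_to_seconds("Apr  1 00:00:00 2005 GMT")
expiring = sorted(host for host, t in probed.items() if needs_warning(t, now))
print(expiring)
```

Passing the current time in as a parameter keeps the check easy to test against fixed dates.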
8GRIDICE Architecture
A different kind of monitoring tool: processes /
low level metrics / grid metrics. Developed by the
INFN-GRID Team: http://infnforge.cnaf.infn.it/gridice
- Data harvest via discovery service (PostgreSQL)
- Publication service
- Measurement service: monitoring sensor agents probe
process table, memory, cpu
9GRIDICE Global View
Different views of the data: Site / VO /
Geographic
Resource usage: CPU, Load, Storage, Job Info
List of Sites
Display shows the processes belonging to the
Broker service. Problems are flagged
10GridIce Job Monitoring
- Recently deployed version 1.6.3 on to LCG, which
features job monitoring: Queued, Running,
Finished, organised in different ways (site, VO
etc)
- XML views of data
11GRIDICE Expert View
Node
Processes
Display shows the processes belonging to the
Broker service. Problems are flagged
12Ganglia Monitoring
- http://gridpp.ac.uk/ganglia
- Can use Ganglia to monitor a cluster
Scalable distributed monitoring system for
clusters and grids using RRD for storage and
visualisation.
RAL Tier-1 Centre: LCG PBS Server
displays job status for each VO.
Get a lot for little effort.
13Federating Cluster Information
- Can also use Ganglia to monitor clusters of
clusters
Ganglia/R-GMA integration through Ranglia.
14GIIS Monitor
- Developed by Min Tsai (GOC Taipei)
- Tool to display and check information published
by the site GIIS (sanity checks, fault detection)
- http://goc.grid.sinica.edu.tw/gstat/
15Regional Monitoring
Dealing with the complexities of
managing a grid.
- EGEE is made up of regions.
- Each region contains many computing centres.
- A Regional Operations Centre (ROC) is a focus for
operations.
16Regional Monitoring Maps
- http://goc.grid-support.ac.uk/roc_map/map.php
- Provide ROCs with a package to monitor the
resources in the region
- Tailored Monitoring
- GUIs to create organisations and populate them
with sites
- Hierarchical view of Resources
- Example: UK Particle Physics GridPP
- Materialised Path encoding
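A minimal sketch of materialised path encoding follows. Each node of the organisation tree stores the full path from the root, so a subtree can be selected with a simple prefix match. The hierarchy shown here, the "/" separator and the function names are assumptions for illustration, not the GOC schema.

```python
# Hypothetical hierarchy: every node stores its full path from the root.
PATHS = {
    "EGEE": "EGEE",
    "UKI": "EGEE/UKI",
    "GridPP": "EGEE/UKI/GridPP",
    "RAL-LCG2": "EGEE/UKI/GridPP/RAL-LCG2",
    "IC-LCG2": "EGEE/UKI/GridPP/IC-LCG2",
    "France": "EGEE/France",
    "IN2P3-CC": "EGEE/France/IN2P3-CC",
}


def descendants(paths, node):
    """All nodes below `node`: those whose path starts with node's path + '/'."""
    prefix = paths[node] + "/"
    return sorted(name for name, path in paths.items() if path.startswith(prefix))


def depth(paths, node):
    """Depth in the tree, counted as the number of separators in the path."""
    return paths[node].count("/")


print(descendants(PATHS, "UKI"))
print(depth(PATHS, "RAL-LCG2"))
```

In a relational database the same subtree selection becomes a `LIKE 'EGEE/UKI/%'` query on the path column, which avoids recursive queries.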
17Site Functional Tests (SFT)
- In terms of middleware, the installation and
configuration of a site is quite a complicated
procedure.
- When there is a new release, sites don't upgrade
at the same time
- Some upgrades don't always go smoothly
- Unexpected things happen (who turned off the
power?)
- Day-to-day problems: robustness of service under
load?
- It's necessary to actively hunt for problems
- Site certification testing is run by the CERN deployment
team on a daily basis. The first step toward
providing this service involves running a series
of replica manager tests which register files
onto the grid, move them around, delete them and
make 3rd party copies from a remote SE.
- Unlike the simple job submission tests
implemented in GPPMON, these tests are more
heavyweight and attempt to simulate the life cycle of
real applications.
18Certification Test Results
http://lcg-testzone-reports.web.cern.ch/lcg-testzone-reports/cgi-bin/listreports.cgi
19Syndication of Monitoring Information
GOC generates RSS feeds which clients can pull
using an RSS aggregator. How can we integrate
feeds and ticketing systems?
Aggregator: RSSReader (Windows client)
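As a rough sketch of the client side, an RSS 2.0 feed can be read with the Python standard library alone. The feed content, titles and links below are invented for illustration; the real GOC feed layout is not shown in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; not taken from the real GOC feeds.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>GOC monitoring (hypothetical sample)</title>
    <item>
      <title>RAL-LCG2: job submission test failed</title>
      <link>http://example.org/report/1</link>
    </item>
    <item>
      <title>IC-LCG2: host certificate expires soon</title>
      <link>http://example.org/report/2</link>
    </item>
  </channel>
</rss>"""


def feed_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```

Each (title, link) pair could then be handed to a ticketing system as the summary and reference of a new ticket, which is one possible answer to the integration question above.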
20Real Time Grid Monitor
http://www.hep.ph.ic.ac.uk/e-science/projects/demo/index.html
Why are jobs failing? Why are jobs queued at
sites while others are empty?
A Visualisation tool to track jobs currently
running on the grid. Applet queries the logging
and bookkeeping service to get information about
grid jobs.
21Problems with existing tools
- Lots of monitoring tools have been described; they
have a few things in common:
- - all the information which they generate is
hidden away or difficult to access
- - limited interfaces: the data can only be
accessed in specific ways
- Therefore, it's difficult to build on-demand
services to allow communities (players) to
interact with the data.
- Examples include:
- Job Accounting service to allow an organisation
to compare resource usage for each VO
- Certification Testing service: a secure service to
allow a site administrator to run the
certification test suite against their site
through an RB of their choice
- The idea is for the services to collect
information and put it into a common repository
such as an RGMA Archiver. In this way, the
information can be shared and accessible to all.
- Services (in EGEE parlance, ROC and CIC services)
munch the data and present it to the community.
- Example: the problem with GIIS is that it's hard to drill down to
the information you want, e.g. How much CPU in
GridPP today? How much disk in the UKI ROC? The
new paradigm solves this problem by allowing the
data to be aggregated in different ways.
22Monitoring Paradigm
A Better way to unify monitoring information. GOC
Services collect information and publish into an
archiver. ROC/CIC Services provide a means for
the community to interact with this information
on-demand. GOC provides services tailored to the
requirements of the community.
23GOC UseCase Job Accounting
- An accounting package for LCG has been developed
by the GOC at RAL
- There are two main parts:
- the accounting data-gathering infrastructure
based on R-GMA, which brings the data to a central
point
- a web portal to allow on-demand reports for a
variety of players.
24Requirements
- A historical record of grid usage to identify the
use of individual sites by VOs as a function of
time
- To demonstrate the total delivery of resources by
that site to the Grid
- Aggregated views of the collected data by
- VO
- Country: a requirement of LCG, which has a
country-based structure
- EGEE Region: for use by EGEE Regional Operations
Centre (ROC)
- A presentation front-end to the data to allow the
selection on-demand of the views described above
for different VOs and periods of time.
- To present the data as
- A graphical view for interpretation
- A tabular view for precision
- To support sites that already had their own
methods of data collection by allowing arbitrary
data collection techniques and insertion of the
data in the standard schema into the central
database.
25Requirements
- It was not an explicit requirement that user
information be captured, but we included this in
the design as we were sure this would be a
secondary requirement
- This is a reporting system, not a charging
mechanism.
- The information is under the control of the site,
so it does not meet the requirement of a charging
system to be digitally signed and irrefutable.
- Information is gathered centrally, not under the
control of the VO
26Design
- Information collected at each site from batch
logs, gatekeeper logs etc
- Information joined at site level to select grid
jobs and stored in database on R-GMA MON box at
site.
- Information published through R-GMA and collected
centrally in an R-GMA archive at GOC
- Web site presents various views of this data for
presentation
- Structure of Grid taken from GOC DB, the grid
configuration database.
- Only normalised cpu time collected
27APEL Accounting Processor for Event Logs
28How APEL Works?
- PBS/LSF log processed daily on site CE to extract
required data; filter acts as R-GMA DBProducer ->
PbsRecords table
- Gatekeeper log processed daily on site CE to
extract required data; filter acts as R-GMA
DBProducer -> GkRecords table
- Message log processed daily on site CE to extract
required data; filter acts as R-GMA DBProducer ->
MessageRecords table
- Site GIIS interrogated daily on site CE to obtain
SpecInt and SpecFloat values for CE; acts as
DBProducer -> SpecRecords table, one dated record
per day
- These three tables joined daily on MON to produce
LcgRecords table. As each record is produced,
program acts as StreamProducer to send the
entries to the LcgRecords table on the GOC site.
- Site now has table containing its own accounting
data; GOC has aggregated table over whole of LCG.
- Interactive and regular reports produced by site
or at GOC site as required.
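The join step can be illustrated with a small in-memory SQLite database. This is a sketch under stated assumptions, not the APEL implementation: the table names come from the slides, but the column names, sample rows and the use of SQLite instead of the site MySQL/R-GMA database are all hypothetical. The inner joins keep only batch jobs that can be traced back to a gatekeeper entry, which is how grid jobs are separated from local jobs.

```python
import sqlite3


def make_demo_db():
    """Create hypothetical PbsRecords, MessageRecords and GkRecords tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PbsRecords (LocalJobID TEXT, UnixGroup TEXT, CpuSeconds INTEGER);
        CREATE TABLE MessageRecords (GramScriptJobID TEXT, LocalJobID TEXT);
        CREATE TABLE GkRecords (GramScriptJobID TEXT, UserDN TEXT);
        INSERT INTO PbsRecords VALUES ('101.ce', 'atlas', 3600);
        INSERT INTO PbsRecords VALUES ('102.ce', 'localusers', 500);
        INSERT INTO PbsRecords VALUES ('103.ce', 'dteam', 60);
        INSERT INTO MessageRecords VALUES ('gram-1', '101.ce');
        INSERT INTO MessageRecords VALUES ('gram-2', '103.ce');
        INSERT INTO GkRecords VALUES ('gram-1', '/C=UK/CN=alice');
        INSERT INTO GkRecords VALUES ('gram-2', '/C=UK/CN=bob');
    """)
    return conn


def build_lcg_records(conn):
    """Join the three tables; local job 102.ce has no gatekeeper match and drops out."""
    conn.execute("""
        CREATE TABLE LcgRecords AS
        SELECT g.UserDN, p.LocalJobID, p.UnixGroup, p.CpuSeconds
        FROM PbsRecords p
        JOIN MessageRecords m ON m.LocalJobID = p.LocalJobID
        JOIN GkRecords g ON g.GramScriptJobID = m.GramScriptJobID
    """)
    return conn.execute(
        "SELECT UserDN, LocalJobID, UnixGroup, CpuSeconds "
        "FROM LcgRecords ORDER BY LocalJobID"
    ).fetchall()


rows = build_lcg_records(make_demo_db())
for row in rows:
    print(row)
```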
29GOC
- Job records in via RGMA (RGMA MON): 1 record per grid job
(millions of records expected)
- SQL query to accounting server: 1 query / hour
- Summary data refreshed every hour (max records
about 100K per year)
- Home page: on-demand accounting pages based on SQL queries
to summary data
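The reason for a summary table can be shown with a short sketch: millions of per-job records collapse into one row per (site, VO, month), which is cheap to query on demand. The field names and sample records below are assumptions for illustration, not the GOC schema.

```python
from collections import defaultdict


def summarise(job_records):
    """Collapse per-job records into totals keyed by (site, VO, YYYY-MM)."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_seconds": 0})
    for rec in job_records:
        key = (rec["site"], rec["vo"], rec["end_date"][:7])  # keep YYYY-MM only
        totals[key]["jobs"] += 1
        totals[key]["cpu_seconds"] += rec["cpu_seconds"]
    return dict(totals)


# Hypothetical job records with ISO-formatted end dates.
records = [
    {"site": "RAL-LCG2", "vo": "atlas", "end_date": "2005-03-02", "cpu_seconds": 3600},
    {"site": "RAL-LCG2", "vo": "atlas", "end_date": "2005-03-15", "cpu_seconds": 1800},
    {"site": "RAL-LCG2", "vo": "dteam", "end_date": "2005-03-15", "cpu_seconds": 60},
    {"site": "IC-LCG2", "vo": "atlas", "end_date": "2005-04-01", "cpu_seconds": 7200},
]
summary = summarise(records)
for key in sorted(summary):
    print(key, summary[key])
```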
30Description
- Web allows information to be selected by
- VO, time range, (Whole Grid, Country, EGEE
Region, site)
- Also shows information on data collected
31Select date range
Select VOs (Default All)
Web form to apply selection criteria on the data
Aggregate data across an organisation structure
(Default All ROCs)
32 Summed CPU (Seconds) consumed by resources in
selected Region
VO Index
Selected Date Range
33 List of Sites Belonging to the Selected ROC
A breakdown of the resource usage per Site, per
VO, per Month
34http://goc.grid-support.ac.uk//
- 65 sites publishing data to GOC (April 2005)
- Over 1.3 million job records
- 50K records per week
35GOC Accounting Services
http://goc.grid-support.ac.uk/gridsite/accounting/index.html
On Demand Services to EGEE Community
Simple interface to customise views of data: VO,
time frame and Region (default EGEE)
BaseCpuSeconds aggregated across EGEE:
each Region, per VO, per Month
Other distributions: Normalised CPU, Jobs
36 Provide Interface to the Data Driven
by User Requirements
Materialised Path Library
Tier-1 View Regional View Country
View
37 Including Graphing Features
38Number of Sites per Country Publishing Accounting
Records to GOC
39GridPP Accounting Status April 2005
- Sites that have never published or have not
published recently.
- CAVENDISH-LCG2 -- never published
- Dublin-CSTCDIE -- never published
- DURHAM -- last published 18th Feb 2005
- IC-LCG2 -- last published 9th April 2005
- RAL-LCG2 -- last published 16th March 2005
- HP-Bristol -- never published
- Lancs-LCG2 -- never published
- LivHep-LCG2 -- never published
- QMUL-eScience -- never published
- RHUL-LCG2 -- never published
- ScotGrid-Glas -- last published 17th Jan 2005
- UCL-CCC -- last published 12th Feb 2005
- UCL-HEP -- never published
- Contact Dave if you need advice about
installing Apel
- D.Kant_at_rl.ac.uk Tel 01235 778178
40Batch System Support
- APEL supports PBS (Released) and LSF (Testing)
- Implementations are separate and independent of
one another. Currently LCG2_4_0 has PBS support
only.
- Re-factoring to a single package with plug-in
batch specific components is currently in
progress.
- What is the current status of LSF support?
- LSF currently comes in three flavours (version 4,
5 and 6); each has a different usage record
format
- New RPM edg-rgma-apel-lsf has been released to
CERN for testing.
- Expect a release in the 2_4_1 tag next month.
41Issues
- Which RPM Version?
- Latest version on http://goc.grid-support.ac.uk/gridsite/accounting
- 3.4.44 for LCG2_4_0
- Change Log 3.4.37 to 3.4.43
- Apel 3.4.43 (April 6th): Startup script modified
for RGMA 2_4_0 s/w release
- Apel 3.4.42 (Mar 20th): Improved core
functionality
- Better handling of dn suppression
- Check flexible archiver on-line before attempting
to send job records
- Apel 3.4.41 (Feb 2nd): Minor fix to SQL script
- Apel 3.4.40 (Jan 17th)
- Normalisation issue (see later)
- CatchAll specInt/specFloat set to value in GIIS
rather than 0
- Apel 3.4.39 (Dec 16th): Current PBS log excluded
from archive
- Apel 3.4.38 (Nov 19th)
- Bug in reprocess option during Join
- Added cleanAll option
- Apel 3.4.37 (Oct 14th)
- grant mechanism to allow GK and CE to connect to
MySQL database
42Issues
- VO Filtering
- National Grid VO activities running on same
infrastructure as EGEE/LCG
- Privacy reasons why sites don't want to publish
National VO data to GOC
- APEL does not discriminate between the VOs
- Develop a solution? What can we suggest today?
- GOCDB can hide resources
- APEL meets the requirement to exclude local work
(not published), but non-LCG work does come through.
- What's the model: 1 CE per VO? What do people do?
- Don't need to install Apel on non-LCG VO CEs
- SARA-LCG2, IISAS-Bratislava
- GridPP?
43Issues
- Development of Tests to Check the Accounting
Service
- Is site accounting working?
- Is the GOC listening for new data?
- Is the RGMA Registry working?
- GSTAT
- GOC Flexible archiver service listens for
accounting producers
- If the service is down, no data can be sent to
the GOC!
- Use the service every 5 minutes to update a
timestamp in a test record in the accounting
database. GSTAT can query table, look at the
timestamp and compare with the current date/time.
- 3rd party to use the flexi service.
- Use RGMA to compare records in the site database
and GOC
- Site Functional Tests
- Can check the RPM version installed on the CE
- Testing the Whole Thing instead of the Pieces
- Investigate an Apel heart-beat
- Site cron writes a test record every hour and
publishes to GOC
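The timestamp comparison described for GSTAT might look like the sketch below. The function name and the tolerance of two missed updates are assumptions; only the 5-minute update interval comes from the slides.

```python
from datetime import datetime, timedelta


def archiver_is_alive(last_update, now,
                      interval=timedelta(minutes=5), missed_allowed=2):
    """Treat the service as alive if the test record is recent enough.

    last_update: timestamp stored in the test record.
    missed_allowed: assumed tolerance, in update intervals, before raising an alarm.
    """
    return now - last_update <= interval * missed_allowed


now = datetime(2005, 4, 20, 12, 0, 0)
print(archiver_is_alive(datetime(2005, 4, 20, 11, 56, 0), now))
print(archiver_is_alive(datetime(2005, 4, 20, 11, 30, 0), now))
```

The same check with a one-hour interval would cover the site-side heart-beat, where a cron job writes a test record every hour.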
44Issues
- Which Log Files Should Site Administrators
Back Up?
- To build accounting records, we need to process
data from THREE log file sources. This is a
mandatory requirement in order to reconstruct
what has been done during the 2004 period.
- /var/log/globus-gatekeeper
- Match between grid-user dn and GramScriptJobId
- /var/spool/pbs/server_priv/accounting/
- Local jobID and details of resources consumed
- No distinction between grid jobs and non-grid
jobs.
- /var/log/messages
- Map GramScriptJobID to local JobID
- This is how we separate grid jobs from local user
jobs which run on the local fabric.
- If the site has deleted its messages files, we
may be able to work around this by matching local
unix groups in the batch logs. Accounting records
formed in this way will not contain the dn of the
grid-user.
45Issues
- siteName Changes
- Recent problem with presenting data from the
French ROC where CCIN2P3 was renamed to IN2P3-CC
via GOCDB portal
- All records associated with the site are updated
in order for SQL queries to match the new
siteName.
- Namespace Convention?
- Naming scheme to identify data belonging to large
sites which provide services for different
communities etc.
- NIKHEF: lcgprod.nikhef.nl, lcg2prod.nikhef.nl,
edgapptb.nikhef.nl
- SiteName is a bad choice because we get
multiple hits
- IC-LCG2 gives multiple matches: PIC-LCG2 and
IFIC-LCG2
- Request sites stick to the convention .SiteName
- h1.desy.de, zeus.desy.de
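The multiple-hit problem can be reproduced in a few lines. The sketch assumes that the queries behave like substring matches on the site name, and it reads the ".SiteName" convention as "the name ends with a dot followed by the site name"; both are interpretations of the slide, and the function names are invented.

```python
SITES = ["IC-LCG2", "PIC-LCG2", "IFIC-LCG2", "RAL-LCG2"]
HOSTS = ["h1.desy.de", "zeus.desy.de", "lcgprod.nikhef.nl"]


def substring_matches(pattern, names):
    """Wildcard-style match: any name containing the pattern."""
    return [n for n in names if pattern in n]


def exact_matches(pattern, names):
    """Only names equal to the pattern."""
    return [n for n in names if n == pattern]


def dotted_suffix_matches(site, names):
    """Names ending in '.<site>' (assumed reading of the .SiteName convention)."""
    return [n for n in names if n.endswith("." + site)]


print(substring_matches("IC-LCG2", SITES))
print(exact_matches("IC-LCG2", SITES))
print(dotted_suffix_matches("desy.de", HOSTS))
```

Anchoring the match on the leading dot groups the sub-sites of one organisation without picking up unrelated names that merely contain the same letters.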
46Issues
- Normalisation
- We want to perform a reasonably sensible first
order estimate to account for the differences in
worker node performance.
- Homogeneous vs Heterogeneous
- PBS Job Records don't have any information about
the worker node benchmarks, so we must insert one
manually
- PBS Farms set up in different ways can lead to an
error in the normalisation calculation (Blindman
vs internal normalisation)
- Histories - What SpecInts do we use in order to
process archived Job Records?
- LSF Job Records have a CPU_FACTOR (1 - 4) in the
Job Record.
- What does a value of 1 correspond to?
- Different calibration value at each site
- Conversion table?
- Can the site publish a weighted specInt2000 for
the farm?
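One way to picture the first-order estimate is sketched below. The formula is an assumption for illustration rather than the one used by APEL: raw CPU time is scaled by the farm's SpecInt2000 rating relative to a reference value (taken here as 1000), and the farm rating is a CPU-count-weighted average over node types.

```python
REFERENCE_SI2K = 1000.0  # assumed reference rating, not taken from the slides


def weighted_specint(node_types):
    """CPU-weighted average SpecInt2000 over (cpu_count, si2k) pairs."""
    total_cpus = sum(count for count, _ in node_types)
    return sum(count * si2k for count, si2k in node_types) / total_cpus


def normalised_cpu_seconds(cpu_seconds, si2k, reference=REFERENCE_SI2K):
    """Scale raw CPU seconds to reference-machine seconds."""
    return cpu_seconds * si2k / reference


# Hypothetical farm: 10 CPUs rated 800 SI2K and 30 CPUs rated 1200 SI2K.
farm = [(10, 800.0), (30, 1200.0)]
farm_si2k = weighted_specint(farm)
print(farm_si2k)
print(normalised_cpu_seconds(3600, farm_si2k))
```

For a heterogeneous farm the weighted value is only an average: a job that ran on the slower nodes is over-credited and one on the faster nodes under-credited, which is the error the slide warns about.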
47Issues
- Service Reliability Hardening
- If flexible archiver is down, sites are unable to
publish data to GOC
- Update 3.4.42/43: Apel core checks if flexible
archiver service is available before attempting
to publish data.
- GOC publishes a test record every 5 minutes to
check the service is alive; automatic service
recovery mechanism now in place
- Investigate running multiple flexible archiver
services
- 1 per GOC or 1 per ROC?
- At the moment, the archiver service listens for
all producers rather than producers belonging to
a ROC.
- Single point of failure if registry is down?
- Multiple registry replicas supported in the RC1
(gLite) release?
- Update: Multiple registries supported in LCG2_4_0?
48Future Plans
- Integration into gLite Framework?
- Work started
- Apel for Storage
- Capturing billing information for dcache
- Cron runs, publish recent data into R-GMA
- SE snapshot, e.g. df of filesystem
- Use of disk and tape
- Cron job on the SE runs a script, tailored
for each kind of SE, e.g. dcache, tapestore
etc
- Web Services Interface to accounting data
- How would such a thing work?
- Any UseCases?
49Accounting Issues
- A stable release of the accounting package has been
certified and tested at CERN. Should sites wait
for the official release or press ahead
independently?
- Package supports PBS only; initial implementation
for LSF.
- 80 sites advertising 313 Job managers
- - 300 PBS (91 of sites)
- - 3 CONDOR (KFKI, FNAL, TRIUMF)
- - 7 LSF (GSI, LNL, CERN).
- Accounting requires the R-GMA infrastructure to
be deployed at the site.
- The VO associated with a user's DN is not
available in the batch or gatekeeper logs. It
will be assumed that the group ID used to execute
user jobs, which is available, is the same as the
VO name.
- The global jobID assigned by the Resource Broker
is not available in the batch or gatekeeper logs.
This global jobID cannot therefore appear in the
accounting reports. The RB Events Database
contains this, but that is not accessible nor is
it designed to be easily processed. (Andrea
Guarise JRA1 proposal)
50Accounting Issues
- Most sites keep GK/Batch logs but throw away
message log files after 9 weeks due to default
log rotation.
- At present the logs provide no means of
distinguishing sub-clusters of a CE which have
nodes of differing processing power. Changes to
the information logged by the batch system will
be required before such heterogeneous sites can
be accounted properly. At present it is believed
all sites are homogeneous.
51Summary
- Accounting Information gathering infrastructure
has been developed
- It has been through the CT cycle and should be
deployed in the next release.
- A web portal for display of this information has
been developed (work in progress)
- This is an EGEE deliverable (DSA1.3)
- The display infrastructure can be deployed for
other monitoring information.
- Development towards on-demand services to provide
the community with up-to-date information,
aggregated at different levels.
- Development of Visualisation tools to enhance our
understanding of the grid.