Title: Configuration Monitoring Tool for Large Scale Distributed Computing
1 Configuration Monitoring Tool for Large Scale Distributed Computing
Y. Wu (1), G. Graham (1), X. Lu (2), A. Afaq (1), B.J. Kim (3), and I. Fisk (1)
(1) Fermi National Accelerator Laboratory
(2) University of Iowa
(3) University of Florida
2 Outline
- Introduction to CMS computing
- Why a configuration monitoring tool
- Design considerations and approach
- Configuration monitoring tool architecture and components
- Current status of the configuration monitoring tool
- Future development plan and summary
3 Introduction to CMS Computing
- The CMS (Compact Muon Solenoid) experiment, which will run at the Large Hadron Collider (LHC), is expected to have the following computing characteristics:
- - It will produce petabytes of data
- - It will need very large scale distributed computing systems to analyze the data
- - Grid computing will likely be used to meet much of its offline data analysis needs
- - The computing systems used in CMS data analysis will be heterogeneous and dynamic
4 CMS Data Grid Hierarchy
[Figure: the tiered CMS data grid. Labels recoverable from the slide, in order from top to bottom:
- 1 TIPS = 25,000 SpecInt95; PC (2000) = 20 SpecInt95
- Online System, with link labels PBytes/sec (in) and 100 MBytes/sec (out). A bunch crossing occurs every 25 ns, with 100 triggers per second; each event is 1 MByte in size
- Tier 0+1, with link label 0.6-2.5 Gbits/sec or air freight
- Tier 1: FNAL, France, Italy, and UK Regional Centers, with link label 2.4 Gbits/sec
- Tier 2, with link label 622 Mbits/sec
- Tier 3: institutes, each with a physics data cache, with link label 100 - 1000 Mbits/sec
- Tier 4
Physicists work on analysis channels. Each institute has 10 physicists working on one or more channels. Data for these channels should be cached by the institute server.]
5 Why a Configuration Monitoring Tool?
- To meet the CMS distributed computing challenges, we need a monitoring system that tracks site configuration information and makes it queryable for large-scale distributed CMS applications
- A few selected use cases:
- - Job generators, e.g., MOP, need to know the configuration of a computing resource (e.g., CMS software location, scratch area) in order to generate and submit jobs
- - A general user needs to know which services are available within an organization (e.g., USCMS) and their corresponding configurations, e.g., gatekeeper port number and available job managers
- - Users also want to know the status of services: critical services need to be available even before job submission
6 Design Considerations and Approach
- The goal of the configuration monitoring system is to meet the needs of CMS production and user analysis across the US CMS resources
- The following features are desirable (based on a user survey):
- - The information in the configuration monitoring system should be highly available
- - Historical configuration information should be archived and retrievable
- - The configuration information should be available only to authorized users and/or groups
- Use existing tools as much as possible
7 Design Considerations and Approach (2)
- The Globus Toolkit and the Tomcat servlet container are chosen as the building blocks for the configuration monitoring tool
- A relational database server is used to store the configuration information. This has the advantage of logging the information for future queries
- The Grid Security Infrastructure (GSI), together with the EDG Java Security package, is used for secure authentication and transparent access to the configuration information across the USCMS grid
8 Design Considerations and Approach (3)
- A layered structure is used to develop the whole system. Its advantage is that one layer can be replaced without interfering with the others. Tentatively, the system is divided into the following layers:
- Site info provider layer
- - The module in this layer is deployed at each computing resource. It collects and publishes resource configuration info.
- Configuration database server layer
- - It tracks the hosts and services to be monitored, and stores all the collected configuration info.
- Tomcat service layer
- - Through Tomcat, a user can view the info in a web browser and/or query the info in the database through a web service
- User interface
- - These are provided for the convenience of users
9 A Prototype Architecture
[Figure: prototype architecture. Boxes: Tomcat Server, VOMS, Configuration Database Server (MySQL), and three Site Info Providers; two of the connecting arrows are labeled "query".]
10 Site Information Provider Layer
- This layer is responsible for collecting and publishing site configuration information at each resource. It does so through Globus MDS, using our own information provider and the standard GLUE (Grid Laboratory Uniform Environment) schema
- The information provider can publish information from the following sources:
- - Configuration information in a text file
- - Output from a user command
- - Special scripts, written as plug-ins, that generate other configuration information
- The resource configuration info published in MDS can be queried directly using standard Globus commands or through a set of client scripts provided by the configuration monitoring tool.
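As a rough illustration of the first two sources listed above (a text file and the output of a command), the sketch below turns them into a single LDIF-style entry of the kind an LDAP-based directory such as MDS can serve. It is not the tool's actual information provider: the NAME=VALUE file format, the function names, the DN, and the attribute names are all assumptions made for the example.

```python
import subprocess
import sys


def parse_config_lines(lines):
    """Parse 'NAME=VALUE' lines (an assumed format); skip blanks and '#' comments."""
    entries = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            entries[name.strip()] = value.strip()
    return entries


def read_command_output(argv):
    """Run a command and return its standard output, stripped of surrounding whitespace."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def to_ldif(dn, attributes):
    """Render the attributes as one LDIF-style entry, sorted by attribute name."""
    lines = ["dn: " + dn]
    for name in sorted(attributes):
        lines.append(name + ": " + attributes[name])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inputs: two configuration lines and one command whose output is published.
    attrs = parse_config_lines(["CMS_SOFTWARE_DIR=/opt/cms", "SCRATCH_AREA=/scratch"])
    attrs["PYTHON_VERSION"] = read_command_output(
        [sys.executable, "-c", "import platform; print(platform.python_version())"]
    )
    # The DN below is a placeholder, not a real MDS or GLUE name.
    print(to_ldif("cn=site-config,o=example", attrs), end="")
```

A plug-in in this picture would simply be another function that returns a dictionary of attributes to merge in before rendering.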
11 Configuration Database Server Layer
- The database server layer consists of a relational database server and cron job scripts that track and update the information in the database. It is the core component of the whole configuration monitoring architecture:
- - Provides a mechanism for controlling which hosts and services are monitored
- - Tracks the availability of the services within a Virtual Organization (VO); some services are supposed to be available all the time
- - Archives the collected configuration information for later use
- Currently, we are using MySQL, an open source product, as the relational database server. It fits our current needs while the number of hosts and services to be monitored is relatively small.
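The slides do not say how service availability is determined. One simple possibility, sketched here purely as an assumption, is for a cron-driven script to try a TCP connection to each monitored host and port and record the result. The host name is a placeholder; ports 2119 and 2811 are used only as examples (they are the customary defaults for a Globus gatekeeper and GridFTP).

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and name lookup failures
        return False


def check_services(services, probe=tcp_probe):
    """Map each (host, port) pair to True or False using the given probe function."""
    return {(host, port): probe(host, port) for host, port in services}


def unavailable(status):
    """Return the sorted list of (host, port) pairs whose probe failed."""
    return sorted(key for key, up in status.items() if not up)


if __name__ == "__main__":
    # Placeholder host; a real list would come from the monitored-services table.
    monitored = [("gatekeeper.example.org", 2119), ("gatekeeper.example.org", 2811)]
    for host, port in unavailable(check_services(monitored)):
        print("unavailable:", host, port)
```

Passing the probe as a parameter keeps the bookkeeping testable without network access and leaves room for a stricter check than a bare TCP connect.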
12 Configuration Database Server Layer (2)
- The configuration information in the database is collected through the site information providers and refreshed at a scheduled interval by the cron job scripts
- Old configuration information is archived, and the database is updated only when a resource configuration changes. In other words: no change in information, no update!
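The "no change, no update" rule can be sketched as an append-only history table that receives a new timestamped row only when a value differs from the most recent one. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table layout, column names, and host name are assumptions, not the tool's actual schema.

```python
import sqlite3

# Assumed schema: one row per observed change, so older rows form the archive.
SCHEMA = """
CREATE TABLE IF NOT EXISTS config_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    host TEXT NOT NULL,
    name TEXT NOT NULL,
    value TEXT NOT NULL,
    recorded_at TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the history database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def latest_value(conn, host, name):
    """Return the most recently recorded value for host/name, or None if absent."""
    row = conn.execute(
        "SELECT value FROM config_history "
        "WHERE host = ? AND name = ? ORDER BY id DESC LIMIT 1",
        (host, name),
    ).fetchone()
    return row[0] if row else None


def record_if_changed(conn, host, name, value, recorded_at):
    """Insert a new row only when the value differs from the latest one."""
    if latest_value(conn, host, name) == value:
        return False  # no change in information, no update
    conn.execute(
        "INSERT INTO config_history (host, name, value, recorded_at) "
        "VALUES (?, ?, ?, ?)",
        (host, name, value, recorded_at),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    conn = open_store()
    host = "site.example.org"  # placeholder host name
    print(record_if_changed(conn, host, "MOP_MAX_JOBS", "100", "2004-01-01T00:00:00"))
    print(record_if_changed(conn, host, "MOP_MAX_JOBS", "100", "2004-01-01T01:00:00"))
    print(record_if_changed(conn, host, "MOP_MAX_JOBS", "200", "2004-01-01T02:00:00"))
```

Because rows are never overwritten, the earlier values remain available for history queries.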
13 Tomcat Service Layer
- Tomcat plays an important role in our configuration monitoring system.
- Tomcat servlet technology is used to provide a web interface through which users can:
- - Browse the available hosts
- - Browse the available services and their configurations
- - Make a specific query on a host and/or service
- The same technology is used to authorize a person to perform administration tasks securely:
- - Update the resources/services to be monitored
- - Reset the availability of services
14 Tomcat Service Layer (2)
- In the future, we plan to provide a web service through Tomcat for both users and administrators
- - Users may query the information in the configuration database through command-line scripts. This will include the available resources, services, and their configuration info in the central database (or databases). Still, a user who wants the newest information has to retrieve it directly from a local info provider.
- - Administrators can update their site information through the web service mechanism, e.g., when a service must be shut down immediately.
15 Web Interface Screenshot (1)
16 Web Interface Screenshot (2)
17 Web Interface Screenshot (3)
18 Security Features of the Configuration Monitoring System
- Keeping the configuration information accessible only to authorized users is one of our top priorities
- As the site info provider is part of Globus MDS, it has the same security mechanism as the standard Globus Toolkit
- The web interface enforces strong authentication and authorization using digital certificates. This requires the client web browser to be able to:
- - manage client certificates
- - perform SSL mutual authentication
19 Security Features of the Configuration Monitoring System (2)
- On the server side, all the web pages and servlets are put behind an authorization servlet filter; currently, we are using a filter package developed by EDG.
- The authorization filter examines every incoming request and tries to extract the client certificate from it. It then passes the extracted client DN to an authorization manager for verification. If the authorization manager can verify the client DN, it gives the user permission to view the web info; otherwise, it terminates the request and informs the user that authorization failed.
- Currently, the authorization manager is configured to examine a standard grid-mapfile to see whether the requesting user's DN can be found in it.
- Furthermore, the user DN entries in the grid-mapfile are extracted from a VOMS (Virtual Organization Membership Service) server
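The grid-mapfile lookup described above can be illustrated with a small sketch. The real filter is a Java servlet filter from EDG; the Python below only mirrors the decision it makes and is not that package's API. It assumes the usual grid-mapfile layout of a quoted certificate DN followed by one or more comma-separated local account names, and the DNs and accounts shown are made up.

```python
def parse_gridmap(lines):
    """Return a dict mapping each certificate DN to its list of local accounts."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith('"'):
            end = line.find('"', 1)
            if end == -1:
                continue  # malformed entry: opening quote without a closing one
            dn = line[1:end]
            rest = line[end + 1:].strip()
        else:
            # An unquoted DN cannot contain spaces, so split at the first one.
            dn, _, rest = line.partition(" ")
            rest = rest.strip()
        if not rest:
            continue  # no local account listed for this DN
        mapping[dn] = rest.split(",")
    return mapping


def is_authorized(dn, mapping):
    """A request is authorized when its DN appears in the grid-mapfile."""
    return dn in mapping


if __name__ == "__main__":
    # Made-up entries in the assumed grid-mapfile format.
    gridmap = parse_gridmap([
        '"/DC=org/DC=example/OU=People/CN=Jane Doe" uscms01',
        '"/DC=org/DC=example/OU=People/CN=John Roe" uscms02',
    ])
    for dn in ("/DC=org/DC=example/OU=People/CN=Jane Doe", "/DC=org/DC=example/CN=Unknown"):
        print(dn, "->", "authorized" if is_authorized(dn, gridmap) else "authorization failed")
```

Since only membership is checked, regenerating the grid-mapfile from VOMS changes who is admitted without touching the filter itself.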
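20 Security Features of the Configuration Monitoring System (3)
[Figure: authorization flow. A user request (DN, etc.) arrives at Tomcat, where the servlet filter asks the authorization manager whether the user is authorized. The authorization manager consults the grid-mapfile, whose entries come from VOMS. Authorized requests reach the configuration info (.html, .jsp, servlets).]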
21 The Current Status of the Configuration Monitoring Tool
- We have finished the initial development of the major components of the configuration monitoring tool and tested it using USCMS grid resources
- The information provided by the configuration monitoring tool has been used in the USCMS distributed Monte Carlo production, its first customer (details on the next slide)
- Other applications, such as GridServ, under development at the University of Florida, have also shown interest in using the info published by the configuration monitoring tool.
- - More info on this can be found at https://gdsuf.phys.ufl.edu:8443/gridmon/admin/gridserv/dpeclient
22 The Current Status of the Configuration Monitoring Tool (2)
- MOP is a system for distributing CMS production jobs over the distributed grid environment. Currently, it is the main production system used in the USCMS grid testbeds.
- In order to generate and submit MOP jobs, the MOP job submitter needs to know a set of parameters at each remote site intended to run jobs:
    MOP_MAX_JOBS                             100
    MOP_REMOTE_JOB_MANAGER_FOR_RUN           garlic.hep.wisc.edu/jobmanager
    MOP_REMOTE_JOB_MANAGER_FOR_STAGE_IN      garlic.hep.wisc.edu/jobmanager
    MOP_REMOTE_JOB_MANAGER_FOR_STAGE_OUT     garlic.hep.wisc.edu/jobmanager
    MOP_REMOTE_JOB_MANAGER_FOR_PUBLISH       garlic.hep.wisc.edu/jobmanager
    MOP_REMOTE_JOB_MANAGER_FOR_CLEANUP       garlic.hep.wisc.edu/jobmanager
    MOP_REMOTE_RUNTIME_AREA                  /afs/hep.wisc.edu/grid3/shared-tmp
    MOP_EXPORT_DIR                           /afs/hep.wisc.edu/grid3/shared-tmp
    MOP_REMOTE_VDT_LOCATION                  /data/grid/GRID3/
    MOP_REMOTE_DAR_ROOT                      /afs/hep.wisc.edu/grid3/app/uscms01
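A job submitter that consumes these parameters might first check that a site's record is complete and then write it out as a configuration file. The sketch below shows one way to do that. The parameter names and values come from the slide, but the choice of which parameters are required, the NAME=VALUE output format, and the function names are assumptions rather than MOP's actual behavior.

```python
# Parameters treated as mandatory in this sketch (an assumption, not MOP's rule).
REQUIRED = (
    "MOP_MAX_JOBS",
    "MOP_REMOTE_JOB_MANAGER_FOR_RUN",
    "MOP_REMOTE_RUNTIME_AREA",
    "MOP_REMOTE_DAR_ROOT",
)


def missing_parameters(site_params, required=REQUIRED):
    """Return the required parameter names that are absent or empty, in order."""
    return [name for name in required if not site_params.get(name)]


def render_site_config(site_params):
    """Render the parameters as NAME=VALUE lines, sorted by name."""
    return "".join(f"{name}={site_params[name]}\n" for name in sorted(site_params))


if __name__ == "__main__":
    # Values as shown on the slide for one remote site.
    site = {
        "MOP_MAX_JOBS": "100",
        "MOP_REMOTE_JOB_MANAGER_FOR_RUN": "garlic.hep.wisc.edu/jobmanager",
        "MOP_REMOTE_RUNTIME_AREA": "/afs/hep.wisc.edu/grid3/shared-tmp",
        "MOP_REMOTE_DAR_ROOT": "/afs/hep.wisc.edu/grid3/app/uscms01",
    }
    problems = missing_parameters(site)
    if problems:
        print("cannot generate jobs, missing:", ", ".join(problems))
    else:
        print(render_site_config(site), end="")
```

Checking completeness before job generation surfaces a stale or partial site record at the submitter instead of as a failed job at the remote site.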
23 The Current Status of the Configuration Monitoring Tool (3)
- Before using the configuration monitoring tool
- - Remote system administrators had to mail this information to the person who generated MOP jobs, who would then put it into a configuration file.
- - This model was very prone to failure. If a site configuration changed, there was a potential for job failure even before the jobs were submitted: sometimes a system administrator forgot to mail the change, or a MOP user forgot to check the e-mail and modify the submitter-side file.
- After using the tool
- - The site system administrator just needs to modify a local copy of the configuration file. The configuration monitoring tool takes care of the rest.
24 Future Development
- We think further development is needed in the following areas:
- Provide web services through Tomcat to query and/or update the info in the database
- Collect more resource configuration information from other monitoring tools, such as MonALISA and Ganglia
- Provide a web interface to view history data
- - History data are already archived in the database with timestamps. We need an interface to view them.
25 Summary
- A configuration monitoring tool has been developed on top of Globus technology and web services to allow users/sites to publish site configuration info, archive the collected info, and query it
- The Grid Security Infrastructure, together with the EDG Java Security packages, is used for secure authentication and transparent access to the configuration information across the USCMS grid
- The configuration monitoring tool has been installed on the USCMS Grid testbeds and tested in USCMS grid production jobs
- Further improvements have been identified and will be available in the near future