Title: Distributed Monitoring and Information Services for the Grid
1Distributed Monitoring and Information Services
for the Grid
- Jennifer M. Schopf
- Argonne National Laboratory
- NeSC
- Dec 6, 2005
2What is a Grid
- Resource sharing
- Computers, storage, sensors, networks, ...
- Sharing always conditional: issues of trust, policy, negotiation, payment, ...
- Coordinated problem solving
- Beyond client-server: distributed data analysis, computation, collaboration, ...
- Dynamic, multi-institutional virtual orgs
- Community overlays on classic org structures
- Large or small, static or dynamic
3Why is this hard/different?
- Lack of central control
- Where things run
- When they run
- Shared resources
- Contention, variability
- Communication
- Different sites imply different sys admins, users, institutional goals, and often strong personalities
4So why do it?
- Computations that need to be done with a time limit
- Data that can't fit on one site
- Data owned by multiple sites
- Applications that need to be run bigger, faster,
more
5What Is Grid Monitoring?
- Sharing of community data between sites using a standard interface for querying and notification
- A way to discover what services and resources are available to use
- A way to understand the status/attributes of those services
- A system to warn you when things fail
6Monitoring Use cases
- PPDG/GriPhyN/iVDGL monitoring group (2002-2004) found roughly 4 categories
- Health of system (network, servers, CPUs, etc.)
- Resource selection
- System upgrade evaluation (have systems reached capacity?)
- Application-specific progress tracking
- First three types need roughly the same information
- Fourth is user-specific and application-specific; no general solution yet
- http://www.mcs.anl.gov/jms/pg-monitoring
7Health of the System: Is the Grid up?
- Brief Description
- User of a grid replication service finds actions are much slower than normal
- Not sure if problem is with network, disk, CPU end points, or something in between
- Need archived data for historical comparison as well as current streaming data
- Performance events/sensors required
- Host monitoring: CPU, memory, disk
- Network path monitoring: bandwidth, latency, traceroute
- GridFTP monitoring
- TCP stack monitoring (Web100)
- Possibly switch/router monitoring
- May want different data for users vs. sys admins
8Resource Selection
- Brief Description
- User/Broker wants to decide where to run a job
- Sites advertise cluster information for grid-level scheduling decisions
- Also need data about storage locations and access speeds
- Information must be summarized for advertising to the Grid; scalability is the key issue
- Performance events/sensors required
- Static: number of compute nodes, CPU type and speed, OS, installed software, available storage systems
- Dynamic: queue lengths, large file transfer times
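The static/dynamic split on this slide can be sketched as a filter-then-rank step. This is only an illustration of the idea, not MDS4 or broker code: the `Site` fields, the site names, the numbers, and the cost formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Static attributes: change rarely, so they can be cached for a long time.
    name: str
    nodes: int
    os: str
    # Dynamic attributes: change constantly, so they need frequent refresh.
    queue_length: int
    transfer_secs_per_gb: float


def select_site(sites, min_nodes, required_os, data_gb):
    """Filter on static attributes, then rank on dynamic ones."""
    eligible = [s for s in sites if s.nodes >= min_nodes and s.os == required_os]
    if not eligible:
        return None

    # Hypothetical cost: 60 s of expected wait per queued job plus staging time.
    def cost(s):
        return 60.0 * s.queue_length + data_gb * s.transfer_secs_per_gb

    return min(eligible, key=cost)


sites = [
    Site("site-a", nodes=128, os="Linux", queue_length=40, transfer_secs_per_gb=20.0),
    Site("site-b", nodes=64, os="Linux", queue_length=5, transfer_secs_per_gb=90.0),
    Site("site-c", nodes=256, os="AIX", queue_length=0, transfer_secs_per_gb=10.0),
]
best = select_site(sites, min_nodes=32, required_os="Linux", data_gb=10)
print(best.name)  # site-b: shorter queue outweighs the slower transfer
```

The static filter could run against cached, summarized data, while the ranking step is where fresh dynamic data matters.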
9What should monitoring systems look like?
- All sensors must be non-intrusive
- All data is small, and must be as timely as possible
- All data must be kept for a long time (years), and must be accessible in many ways
- No one really knows how many sensors will be accessed at one time (or reporting to a higher level service), or how often they will be accessed
- Security isn't of concern YET, except for job data
10Monitoring Systems (2)
- Line between monitoring system and higher level services isn't always clear
- Archiving
- Summary statistics
- Predictions
- Error detection
- Alarms/notification
11OUTLINE
- Grid Monitoring and Use Cases
- MDS4
- Index Service
- Trigger Service
- Information Providers
- Deployments
- Metascheduling data for TeraGrid
- Service failure warning for ESG
- Performance Numbers
12What is MDS4?
- Grid-level monitoring system used most often for resource selection
- Aid user/agent to identify host(s) on which to run an application
- Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification
- WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup
- Part of the Globus Toolkit v4
- Functions as an hourglass to provide a common interface to lower-level monitoring tools
13Information Users: Schedulers, Portals, Warning Systems, etc.
WS standard interfaces for subscription, registration, notification
GLUE Schema Attributes (cluster info, queue info, FS info)
14Web Service Resource Framework (WS-RF)
- Defines standard interfaces and behaviors for distributed system integration, especially (for us):
- Standard XML-based service information model
- Standard interfaces for push and pull mode access to service data
- Notification and subscription
15MDS4 Uses Web Service Standards
- WS-ResourceProperties
- Defines a mechanism by which Web Services can describe and publish resource properties, or sets of information about a resource
- Resource property types defined in the service's WSDL
- Resource properties can be retrieved using WS-ResourceProperties query operations
- WS-BaseNotification
- Defines a subscription/notification interface for accessing resource property information
- WS-ServiceGroup
- Defines a mechanism for grouping related resources and/or services together as service groups
16MDS4 Components
- Higher level services
- Index Service: a way to aggregate data
- Trigger Service: a way to be notified of changes
- Both built on common aggregator framework
- Information providers
- Monitoring is a part of every WSRF service
- Non-WS services can also be used
- Clients
- WebMDS
- All of the tools are schema-agnostic, but interoperability needs a well-understood common language
17MDS4 Index Service
- Index Service is both registry and cache
- Subscribes to information providers
- Publishes (as resource properties)
- Datatype and data provider info, like a registry
- Last value of data, like a cache
- In-memory by default; a DB backing store is currently being developed to allow for very large indexes
- Soft-state registration
- Can be set up for a site or set of sites, a specific set of project data, or for user-specific data only
- Can be a multi-rooted hierarchy
18Index Service Facts 1
- No single global Index provides information about every resource on the Grid
- No person in the world is part of every VO!
- Hierarchies or special-purpose indexes are common
- Each virtual organization will have different policies on who can access its resources
- The presence of a resource in an Index makes no guarantee about the availability of the resource for users of that Index
- Ultimate decision about whether to use the resources is left to direct negotiation between user and resource
- MDS does not need to keep track of policy information (something that is hard to do concisely)
- Resources do not need to reveal their policies publicly
19Index Service Facts 2
- MDS has a soft consistency model
- Published information is recent, but not guaranteed to be the absolute latest
- Load caused by information updates is reduced at the expense of having slightly older information
- Example: free disk space on a system 5 minutes ago rather than 2 seconds ago
- Each registration into an Index Service is subject to soft-state lifetime management
- All registrations have expiry times and must be periodically renewed
- Index is self-cleaning, since outdated entries disappear automatically
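Soft-state lifetime management as described on this slide can be sketched with a toy registry. This is an illustration under stated assumptions, not the MDS4 implementation: the class name `SoftStateIndex`, its methods, the resource names, and the 300-second lifetime are all hypothetical.

```python
import time


class SoftStateIndex:
    """Toy registry in which every entry expires unless it is renewed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be simulated
        self._entries = {}    # name -> (expiry_time, last_value)

    def register(self, name, value, lifetime_secs):
        # Registering again under the same name acts as a renewal.
        self._entries[name] = (self._clock() + lifetime_secs, value)

    def query(self, name):
        self._purge()
        entry = self._entries.get(name)
        return entry[1] if entry else None

    def names(self):
        self._purge()
        return sorted(self._entries)

    def _purge(self):
        # Self-cleaning step: drop every entry whose lifetime has run out.
        now = self._clock()
        expired = [n for n, (expiry, _) in self._entries.items() if expiry <= now]
        for n in expired:
            del self._entries[n]


# Simulated clock so the example runs instantly.
now = [0.0]
index = SoftStateIndex(clock=lambda: now[0])
index.register("rsc-1a", {"free_disk_gb": 120}, lifetime_secs=300)
index.register("rsc-2b", {"free_disk_gb": 40}, lifetime_secs=300)

now[0] = 200.0
index.register("rsc-1a", {"free_disk_gb": 115}, lifetime_secs=300)  # renewal

now[0] = 400.0  # rsc-2b was never renewed, so it has expired
print(index.names())  # ['rsc-1a']
```

Note that a query returns the last published value, which may be minutes old; that is the soft consistency trade-off the slide describes.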
20MDS4 Trigger Service
- Subscribe to a set of resource properties
- Evaluate that data against a set of pre-configured conditions (triggers)
- When a condition matches, email is sent to pre-defined address
- Similar functionality in Hawkeye
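The evaluate-then-notify loop on this slide might look roughly like the sketch below. It is not the Trigger Service's code or configuration format: the function `evaluate_triggers`, the property names, the thresholds, and the addresses are invented, and a list stands in for sending email.

```python
def evaluate_triggers(properties, triggers, notify):
    """Run each (name, condition, address) trigger against the latest data."""
    fired = []
    for name, condition, address in triggers:
        if condition(properties):
            notify(address, f"trigger '{name}' matched: {properties}")
            fired.append(name)
    return fired


# Pre-configured conditions; thresholds and addresses are made up.
triggers = [
    ("disk-nearly-full", lambda p: p["free_disk_gb"] < 10, "admin@example.org"),
    ("service-down", lambda p: not p["service_up"], "oncall@example.org"),
]

sent = []  # stands in for an outgoing mail queue
fired = evaluate_triggers(
    {"free_disk_gb": 4, "service_up": True},
    triggers,
    notify=lambda address, message: sent.append((address, message)),
)
print(fired)  # ['disk-nearly-full']
```

Passing `notify` in as a parameter keeps the condition check separate from the action, so the same loop could send email or run some other action.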
21Aggregator Framework
- General framework for building services that collect and aggregate data
- Index and Trigger service both use this
- 1) Common interface implementation
- Java class that implements an interface to collect XML-formatted data from information providers
- Implements WS-RP and WS-N for query and subscription
- 2) Common configuration mechanism
- Maintain information about which information providers to use and their associated parameters
- Specify what data to get, and from where
- 3) Services are self-cleaning
- Each registration has a lifetime
- If a registration expires without being refreshed, it and its associated data are removed from the server
22Aggregator Framework
- General framework for building services that collect and aggregate data
- Index and Trigger service both use this
- 1) Collect information via aggregator sources (information providers)
- Java class that implements an interface to collect XML-formatted data
- Query source uses WS-ResourceProperty mechanisms to poll a WSRF service
- Subscription source collects data from a service via WS-Notification subscription/notification
- Execution source executes an administrator-supplied program to collect information
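The three aggregator source types differ mainly in who initiates the data transfer. The sketch below shows that difference behind one common interface. The real framework is a set of Java classes speaking WS-RP and WS-N; this Python version, including every class and method name in it, is a hypothetical stand-in that uses plain callables and a child process instead of Web service calls.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class AggregatorSource(ABC):
    @abstractmethod
    def collect(self):
        """Return the most recent XML document as a string."""


class QuerySource(AggregatorSource):
    """Pull: asks the provider for its data each time collect() is called."""

    def __init__(self, poll):
        self._poll = poll

    def collect(self):
        return self._poll()


class SubscriptionSource(AggregatorSource):
    """Push: the provider calls on_notify(); collect() returns the last value."""

    def __init__(self):
        self._latest = None

    def on_notify(self, xml):
        self._latest = xml

    def collect(self):
        return self._latest


class ExecutionSource(AggregatorSource):
    """Runs an administrator-supplied program and captures its stdout."""

    def __init__(self, argv):
        self._argv = argv

    def collect(self):
        result = subprocess.run(self._argv, capture_output=True, text=True, check=True)
        return result.stdout.strip()


query = QuerySource(lambda: "<load>0.7</load>")
sub = SubscriptionSource()
sub.on_notify("<queue>12</queue>")
# The "administrator-supplied program" here is just a one-line Python script.
script = ExecutionSource([sys.executable, "-c", "print('<disk>80</disk>')"])

collected = [source.collect() for source in (query, sub, script)]
print(collected)
```

Because all three expose the same `collect()` call, a higher-level service does not need to know how a given piece of data was obtained.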
23Aggregator Framework (cont)
- 2) Common configuration mechanism
- Maintain information about which aggregator sources to use and their associated parameters
- Specify what data to get, and from where
- 3) Aggregator services are self-cleaning
- Each registration has a lifetime
- If a registration expires without being
refreshed, it and its associated data are removed
from the server.
24Aggregator Framework
25Information Providers
- Data sources for the higher level services (e.g. Index, Trigger)
- WSRF-compliant service
- WS-ResourceProperty for Query source
- WS-Notification mechanism for Subscription source
- Other services/data sources
- Executable program that obtains data via some domain-specific mechanism for Execution source
26Information Providers: Cluster and Queue Data
- Interfaces to Hawkeye, Ganglia, CluMon
- Not WS so these are Execution Sources
- Basic host data (name, ID), processor information, memory size, OS name and version, file system data, processor load data
- Some Condor/cluster-specific data
- Interfaces to PBS, Torque, LSF queue systems
- Queue information, number of CPUs available and free, job count information, some memory statistics and host info for head node of cluster
27Information Providers: GT4 Services
- Every WS built using GT4 core
- ServiceMetaDataInfo element includes start time, version, and service type name
- Reliable File Transfer Service (RFT)
- Service status data, number of active transfers, transfer status, information about the resource running the service
- Community Authorization Service (CAS)
- Identifies the VO served by the service instance
- Replica Location Service (RLS)
- Note: not a WS
- Location of replicas on physical storage systems (based on user registrations) for later queries
28Sample Deployment
29WebMDS User Interface
- Web-based interface to WSRF resource property information
- User-friendly front-end to the Index Service
- Uses standard resource property requests to query resource property data
- XSLT transforms to format and display them
- Customized pages are simply done by using HTML form options and creating your own XSLT transforms
- Sample page
- http://mds.globus.org:8080/webmds/webmds?info=indexinfo&xsl=servicegroupxsl
30WebMDS Service
34Any questions before I walk through two current
deployments?
- Grid Monitoring and Use Cases
- MDS4
- Index Service
- Trigger Service
- Information Providers
- Deployments
- Metascheduling Data for TeraGrid
- Service Failure warning for ESG
- Performance Numbers
35Working with TeraGrid
- Large US project across 9 different sites
- Different hardware, queuing systems and lower level monitoring packages
- Starting to explore MetaScheduling approaches
- GRMS (Poznan)
- W. Smith (TACC)
- K. Yoshimoto (SDSC)
- User Portal
- Need a common source of data with a standard
interface for basic scheduling info
36Cluster Data
- Provide data at the subcluster level
- Sys admin defines a subcluster, we query one node of it to dynamically retrieve relevant data
- Can also list per-host details
- Interfaces to Ganglia, Hawkeye and CluMon available now
- Nagios should be set by late January
37Cluster Info
- UniqueID
- Benchmark/Clock speed
- Processor
- MainMemory
- OperatingSystem
- Architecture
- Number of nodes in a cluster/subcluster
- TG specific Node properties
- StorageDevice
- Disk names, mount point, space available
38Data to collect: Queue info
- Interface to PBS (Pro, Open, Torque), LSF
- LRMSType
- LRMSVersion
- DefaultGRAMVersion and port and host
- TotalCPUs
- Status (up/down)
- TotalJobs (in the queue)
- RunningJobs
- WaitingJobs
- FreeCPUs
- MaxWallClockTime
- MaxCPUTime
- MaxTotalJobs
- MaxRunningJobs
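A client that receives queue data as XML might turn it into typed values along these lines. The field names come from the list on this slide, but the element layout, the `<Queue>` wrapper, and all values are invented for illustration; this is not the GLUE schema or actual MDS4 output.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: field names are from the slide, but the element
# layout and the values are made up.
QUEUE_XML = """
<Queue>
  <LRMSType>PBS</LRMSType>
  <Status>up</Status>
  <TotalCPUs>256</TotalCPUs>
  <FreeCPUs>64</FreeCPUs>
  <RunningJobs>30</RunningJobs>
  <WaitingJobs>12</WaitingJobs>
</Queue>
"""

INT_FIELDS = ("TotalCPUs", "FreeCPUs", "RunningJobs", "WaitingJobs")


def parse_queue(xml_text):
    """Parse one queue record into a dict, converting counters to int."""
    root = ET.fromstring(xml_text)
    info = {child.tag: child.text for child in root}
    for field in INT_FIELDS:
        info[field] = int(info[field])
    # Derived value a scheduler might want: fraction of CPUs in use.
    info["Utilization"] = 1 - info["FreeCPUs"] / info["TotalCPUs"]
    return info


queue = parse_queue(QUEUE_XML)
print(queue["LRMSType"], queue["Status"], queue["Utilization"])  # PBS up 0.75
```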
39How will the data be accessed?
- Java and command line APIs to a common TG-wide Index server
- Alternatively each site can be queried directly
- One common web page for TG
- http://snipurl.com/j24r
- Query page is next!
41Status
- Currently have a demo system up
- Queuing data from SDSC and NCSA
- Cluster data using CluMon interface at NCSA
- Basic WebMDS interface
- Being deployed more widely for TeraGrid this week
- General patch for 4.0.1 deployments should be available next week; let me know if you're interested!
42ESG use of MDS4 Trigger Service
- Need a way to notify system administrators and users of the status of their services
- In particular, interested in
- Replica Location Service (RLS)
- Storage Resource Manager service (SRM)
- OpenDAP
- Web Server (HTTP)
- GridFTP fileservers
43Trigger Service and ESG Cont.
- The Trigger service periodically checks to see if services are up and running
- If a service has gone down or is unavailable for any reason, an action script is executed
- Sends email to administrators
- Updates portal status page
- Been in use for over a year (used GT3 version previously)
45OUTLINE
- Grid Monitoring and Use Cases
- MDS4
- Index Service
- Trigger Service
- Information Providers
- Deployments
- Metascheduling Data for TeraGrid
- Service Failure warning for ESG
- Performance Numbers
46Index Server Stability 4.0.0
- Zero-entry index on same server
- Ran queries against it for 8,338,435 seconds (just over 96 days)
- Server machine needed to be rebuilt for patches
- Processed 623,395,877 requests
- Avg 74 per second
- Average query round-trip time of 13ms
- No noticeable performance or usability
degradation over the entire duration of the test
47 4.0.1 Index Stability
- 100-entry index on same server, running just over 47 days
- 190K of data has been retrieved
- Processed over 20 million requests, averaging 5 per second
- No noticeable performance or usability degradation
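The figures on the two stability slides can be cross-checked with a little arithmetic. The 4.0.0 inputs are the exact numbers quoted; the 4.0.1 inputs ("just over 47 days", "over 20 million") are rounded on the slide, so that rate is only an estimate.

```python
SECONDS_PER_DAY = 86_400

# 4.0.0 run: exact figures from the slide.
seconds_400 = 8_338_435
requests_400 = 623_395_877
days_400 = seconds_400 / SECONDS_PER_DAY
rate_400 = requests_400 / seconds_400

# 4.0.1 run: both inputs are rounded on the slide, so this is approximate.
rate_401 = 20_000_000 / (47 * SECONDS_PER_DAY)

print(f"4.0.0: {days_400:.1f} days, {rate_400:.1f} requests/s")  # 96.5 days, 74.8 requests/s
print(f"4.0.1: about {rate_401:.1f} requests/s")                 # about 4.9 requests/s
```

Both results agree with the quoted values: roughly 96.5 days at just under 75 requests per second (the slide rounds down to 74), and roughly 5 requests per second for the 100-entry index.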
48Scalability Experiments
- MDS index
- Dual 2.4GHz Xeon processors, 3.5 GB RAM
- Sizes: 1, 10, 25, 50, 100
- Clients
- 20 nodes, also dual 2.6 GHz Xeon, 3.5 GB RAM
- 1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256, 384, 512, 640, 768, 800
- Nodes connected via 1 Gb/s network
- Each data point is an average over 8 minutes
- Ran for 10 minutes, but the first 2 were spent getting clients up and running
- Error bars are SD over 8 minutes
- Experiments by Ioan Raicu, U of Chicago, using DiPerf
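The way each data point is formed (10-minute run, first 2 minutes discarded, mean and SD over the remaining 8) can be sketched as follows. The per-minute sampling, the sample values, and the use of the sample (n-1) standard deviation are assumptions; the slide does not say how DiPerf bins the data or which SD formula was used.

```python
import statistics

WARMUP_MINUTES = 2  # clients are still starting up, so these are discarded


def summarize(per_minute_throughput):
    """Return (mean, standard deviation) over the post-warm-up minutes."""
    steady = per_minute_throughput[WARMUP_MINUTES:]
    return statistics.mean(steady), statistics.stdev(steady)


# Made-up queries/second for one 10-minute run, one sample per minute.
run = [12, 55, 98, 102, 100, 97, 103, 99, 101, 100]
mean, sd = summarize(run)
print(f"{mean:.1f} +/- {sd:.1f} queries/s over {len(run) - WARMUP_MINUTES} minutes")
# 100.0 +/- 2.0 queries/s over 8 minutes
```

Dropping the warm-up minutes matters: including the first two samples here would pull the mean down and inflate the error bar.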
51Performance
- Is this enough?
- We don't know!
- Currently gathering up usage statistics to find out what people need
- Bottleneck examination
- In the process of doing in-depth performance analysis of what happens during a query
- MDS code, implementation of WS-N, WS-RP, etc.
- Goal: HPDC submission (early January)
52Summary
- MDS4 is a WS-based Grid monitoring system that uses current standards for interfaces and mechanisms
- Available as part of the GT4 release
- Currently in use for resource selection and fault notification
- Initial performance results aren't awful; we need to do more work to determine bottlenecks
53Where do we go next?
- Extend MDS4 information providers
- More data from GT4 WS
- GRAM, RFT, CAS
- More data from GT4 non-WS components
- RLS, GridFTP
- Interface to other data sources
- Inca, GRASP
- Interface to archivers
- PingER, NetLogger
- Additional scalability testing and development
- Additional clients
54Other Possible Higher-Level Services
- Archiving service
- The next high-level service we'll build
- Looking at Xindice as a possibility
- Site Validation Service (ala Inca)
- Prediction service (ala NWS)
- What else do you think we need?
55Contributing to MDS4
- Globus is opening up its development environment, similar to Apache Jakarta
- MDS4 will be a project in the new scheme
- Contact me for more details
- jms_at_mcs.anl.gov
- http://dev.globus.org
56Thanks
- MDS4 Team: Mike D'Arcy (ISI), Laura Pearlman (ISI), Neill Miller (UC), Jennifer Schopf (ANL)
- Students: Ioan Raicu, Xuehai Zhang
- This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under contract W-31-109-Eng-38, and NSF NMI Award SCI-0438372. This work was also supported by the DOESG SciDAC Grant, iVDGL from NSF, and others.
57For More Information
- Jennifer Schopf
- jms_at_mcs.anl.gov
- http://www.mcs.anl.gov/jms
- Globus Toolkit MDS4
- http://www.globus.org/toolkit/mds
- Monitoring and Discovery in a Web Services Framework: Functionality and Performance of the Globus Toolkit's MDS4
- www.mcs.anl.gov/jms/Pubs/mds-sc05.pdf
58CLADE 2006: Challenges of Large Applications in Distributed Environments Workshop
- In conjunction with HPDC 2006
- June 19 or 20, Paris (papers due Feb 1)
- If you have a large scale application and would like to report on
- Results on the development, deployment, management and evaluation of apps
- Ways your application has benefited from
- Innovative resource management or scheduling
- The use of extremely large data sets and data mgmt
- Runtime support for intelligent, adaptive systems
- Portability, quality of service, or fault-tolerance
- Performance analysis, evaluation, and prediction of adaptive systems
- http://www.mcs.anl.gov/bair/CLADE2006/
60[Diagram: 1. Resources at Sites (labels: Site 1, Site 3; Rsc 1.a, 1.d, 2.a, 2.b, 3.a)]
61[Diagram: 2. Site Index Setup]
62[Diagram: 3. VO Index Setup]
63[Diagram: WebMDS and Trigger Service]
64[Diagram: 4. Application Index Setup, with WebMDS]
65With this deployment, the project can
- Discover needed data from services in order to make job submission or replica selection decisions by querying the VO-wide Index
- Evaluate the status of Grid services by looking at the VO-wide WebMDS setup
- Be notified when disks are full or other error conditions happen by being on the list of administrators
- Individual projects can examine the state of the resources and services of interest to them