GridMonitor: Integration of Large Scale Facility Monitoring With MDS

Slides: 17
Provided by: bruce278
Learn more at: https://chep03.ucsd.edu

Transcript and Presenter's Notes

1
GridMonitor: Integration of Large Scale Facility Monitoring With MDS
  • Richard Baker, Antonio Chan
  • Jason Smith, Dantong Yu
  • USATLAS/RHIC Computing Facility
  • Brookhaven National Lab

2
Outline
  • Requirements
  • System Framework, Structure and Characteristics
  • I. Ganglia and Its Information Provider
  • II. Relational Database Based Archiving and Its
    Information Provider
  • Gridview and GStat, Front End System
  • http://heppc1.uta.edu/atlas/grid-status/mds.gremlin.usatlas.bnl.gov.html
  • Current Status and Future Work

3
Requirements
  • Modularity and Extensibility: Make Use of
    Existing Monitoring Pieces
  • Flexibility: Adjustable to the Dynamics of the
    Monitored Systems
  • Overhead: Non-intrusive
  • Scalability
  • Security, Consistency, Inter-operability,
    Etc-bility

4
What Needs to Be Monitored
  • Linux Farm Monitoring
  • Description
  • About 1100 Dual-CPU Linux Nodes
  • Performance Data Must Be Summarized for
    Advertising to the Grid
  • Performance Events Required
  • Configuration Information
  • Status Information: CPU Load (1, 5, 10, 15),
    Memory Load, Disk Load, and Network Load
  • Example Usage: A Resource Broker Might Ask About
    the Availability of Linux Farm System Resources
    in Order to Plan the Efficient Execution of Tasks

5
More
  • Network Monitoring
  • Description
  • 8 USATLAS Testbeds
  • Publish the Connectivity of These Testbeds,
    Monitor the Health of the USATLAS Network
  • Archived Performance Data Can Be Used to Predict
    Network Behavior, So a User Can Choose the Source
    and Destination for File Replication
  • Performance Events Required
  • Bandwidth, Delay (Round Trip Time), Traceroute
6
Monitoring Framework

7
Monitoring System Components
  • Four Tier Structure
  • Sensors
  • Host: Ganglia, top, /proc and LSF Host Load
  • Archive System (Database System)
  • Round Robin Database (RRD)
  • Relational Database: unixODBC + MyODBC + MySQL
    Database
  • Information Providers
  • Monitoring and Discovery Service (MDS 2.2), GLUE
    Schema, Customized Ganglia Client Tool Reporting
    the Latest Monitoring Data, and Database Client
    Tools Reporting the Summary Information
  • Front-end Browsing System
  • Gridview, GStat (Grid Visualization Tool
    Developed at Univ. of Texas at Arlington)

8
Advantages
  • Information Provider Provides a Cache for the
    Newest Value From the MySQL Database
  • Non-intrusiveness: Information Provider Can
    Eliminate Users' Random Accesses to the
    Database Server
  • Scalability Can Be Significantly Increased
  • 1000 Linux Nodes Are Being Monitored
  • Network Connectivity of Eight USATLAS Testbeds:
    Each Site Monitors the Paths From Itself to the
    Other Seven. Network Topology and Traffic Can Be
    Easily Constructed
  • Flexibility
  • Independent of Sensors. Many Sensors Can Be
    Easily Plugged In As Long As Each Has a
    Well-Defined Protocol and API. We Could Switch
    Among Ganglia, top, /proc
  • Archive System Is Independent of the Underlying
    Database
  • Can Be Any RDBMS (Oracle, MySQL, Sybase,
    Informix), Flat Files, or Objectivity, As Long As
    an ODBC Driver Is Available
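The caching idea in the first bullets can be illustrated with a small time-based cache. This is a minimal sketch under stated assumptions: the class name `CachedProvider`, the `fetch` callable standing in for the database query, and the injected clock are all hypothetical and do not describe the actual MDS information provider.

```python
import time


class CachedProvider:
    """Serve the newest value from a cache; query the backend only when stale."""

    def __init__(self, fetch, cachetime, clock=time.monotonic):
        self._fetch = fetch          # callable standing in for the database query
        self._cachetime = cachetime  # seconds a cached value is considered fresh
        self._clock = clock          # injectable clock, handy for testing
        self._value = None
        self._stamp = None
        self.backend_hits = 0        # how often the backend was really queried

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._cachetime:
            self._value = self._fetch()
            self._stamp = now
            self.backend_hits += 1
        return self._value


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
provider = CachedProvider(
    fetch=lambda: {"load_5": 0.42},  # hypothetical query result
    cachetime=600,
    clock=lambda: fake_now[0],
)
provider.get()
provider.get()         # served from the cache, no backend access
fake_now[0] = 601.0
provider.get()         # cache expired, backend queried again
print(provider.backend_hits)  # 2
```

However many clients call `get()`, the database sees at most one query per `cachetime` interval, which is the property the non-intrusiveness bullet relies on.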

9
I. Ganglia Monitoring with MDS
  • Ganglia Information Provider
  • Front-end: GLUE Schema, http://www.cnaf.infn.it/~sergio/datatag/glue/
  • Back-end: XML

[Figure: Layered Gmetad. Diagram labels: Gmond nodes on the Cluster A multicast channel; XML; Gmetad (filtered); Ganglia IP; GLUE; MDS.]

10
I. Ganglia Monitoring with MDS
  • gremlin grid-info-search -x -h spider.usatlas.bnl.gov -s one
  • # ATLAS Linux Cluster, local, grid
  • dn: cl=ATLAS Linux Cluster, mds-vo-name=local, o=grid
  • objectClass: GlueClusterTop
  • objectClass: GlueCluster
  • GlueClusterName: ATLAS Linux Cluster
  • GlueClusterUniqueID: ATLAS_Linux_Cluster-RCF_and_ACF_Linux_Farm_Group
  • GlueClusterService: compute
  • # PHOBOS CAS Linux Cluster, local, grid
  • dn: cl=PHOBOS CAS Linux Cluster, mds-vo-name=local, o=grid
  • objectClass: GlueClusterTop
  • objectClass: GlueCluster
  • GlueClusterName: PHOBOS CAS Linux Cluster
  • GlueClusterUniqueID: PHOBOS_CAS_Linux_Cluster-RCF_and_ACF_Linux_Farm_Group
  • GlueClusterService: compute
  • # STAR CAS Linux Cluster, local, grid
  • dn: cl=STAR CAS Linux Cluster, mds-vo-name=local, o=grid
  • objectClass: GlueClusterTop
  • objectClass: GlueCluster
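The XML-to-GLUE step behind this output might look roughly like the sketch below. It is an assumption-laden illustration: the sample XML is a heavily trimmed stand-in for real Ganglia output (the version string and host names are made up), and the unique-ID rule (spaces replaced by underscores) is a simplification of the IDs shown above, not the provider's actual logic.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for Ganglia XML; real output carries many more
# attributes and per-host METRIC elements.
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.3" SOURCE="gmond">
  <CLUSTER NAME="ATLAS Linux Cluster">
    <HOST NAME="node001"/>
    <HOST NAME="node002"/>
  </CLUSTER>
</GANGLIA_XML>"""


def clusters_to_ldif(xml_text, suffix="mds-vo-name=local, o=grid"):
    """Emit one GLUE-style LDIF entry per CLUSTER element in the XML."""
    root = ET.fromstring(xml_text)
    entries = []
    for cluster in root.iter("CLUSTER"):
        name = cluster.get("NAME")
        lines = [
            f"dn: cl={name}, {suffix}",
            "objectClass: GlueClusterTop",
            "objectClass: GlueCluster",
            f"GlueClusterName: {name}",
            # Assumed rule: derive the ID by replacing spaces with underscores.
            f"GlueClusterUniqueID: {name.replace(' ', '_')}",
            "GlueClusterService: compute",
        ]
        entries.append("\n".join(lines))
    return "\n\n".join(entries)


print(clusters_to_ldif(SAMPLE_XML))
```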

11
II. Farm Monitoring
  • Linux Farm Is Divided Into Different Sub-clusters
    Based on Site Policy, Different Experiments, OS
    and Version, CPU Speed. A Sub-cluster Contains
    the Hosts With the Same Configuration
  • BNL ATLAS Farm Is Partitioned Into Sub-clusters
    by CPU Speed: Cpu400mhz, Cpu700mhz, Cpu1ghz,
    Cpu1.4ghz and Cpu2.4ghz
  • The Status Information of a Sub-cluster Is
    Summarized From All Nodes in This Sub-cluster
  • Grid Resource Broker Schedules at the Level of
    Farm Sub-clusters
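Summarizing per-node status into per-sub-cluster status could be sketched as below. The node records, field names, and the choice of a plain average are assumptions made for illustration; only the sub-cluster names come from the slide.

```python
from collections import defaultdict

# Hypothetical per-node status records.
nodes = [
    {"host": "n1", "subcluster": "Cpu1ghz", "cpus": 2, "load_5": 0.5},
    {"host": "n2", "subcluster": "Cpu1ghz", "cpus": 2, "load_5": 1.5},
    {"host": "n3", "subcluster": "Cpu2.4ghz", "cpus": 2, "load_5": 0.25},
]


def summarize(nodes):
    """Group nodes by sub-cluster and reduce each group to summary figures."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["subcluster"]].append(node)

    summary = {}
    for name, members in groups.items():
        summary[name] = {
            "nodes": len(members),
            "cpus": sum(m["cpus"] for m in members),
            "avg_load_5": sum(m["load_5"] for m in members) / len(members),
        }
    return summary


for name, stats in summarize(nodes).items():
    print(name, stats)
```

A resource broker scheduling at sub-cluster level would then only need these few figures per sub-cluster rather than one record per node.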

12
Information Schema (Linux Farm Monitoring)
  • Queue-Info
  • objectclass ( 1.3.6.1.4.1.3536.2.6.0.0.0.0
    NAME 'Queue-Info' SUP 'Mds' STRUCTURAL
    MUST ( MdsQueueNumberOfCpu $
    MdsQueueSpeed $
    MdsQueueAverageLoad $
    MdsQueueAverageUserPercent $
    MdsQueueAverageSysPercent ))
  • Needs to Be Replaced by the GLUE Schema

13
Backend Data Structure
  • Node Status Information
  • mysql> describe node_load
    +------------+------------------+------+-----+---------+----------------+
    | Field      | Type             | Null | Key | Default | Extra          |
    +------------+------------------+------+-----+---------+----------------+
    | load_index | int(10) unsigned |      | PRI | NULL    | auto_increment |
    | sampletime | timestamp(14)    | YES  | MUL | NULL    |                |
    | machine_id | varchar(31)      |      |     |         |                |
    | owner      | varchar(8)       |      |     |         |                |
    | load_5     | float(10,2)      |      |     | 0.00    |                |
    | user_cpu   | float(10,2)      |      |     | 0.00    |                |
    | sys_cpu    | float(10,2)      |      |     | 0.00    |                |
    +------------+------------------+------+-----+---------+----------------+
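A query that pulls the newest sample per machine from such a table might look like the sketch below. To keep it self-contained it uses an in-memory SQLite database instead of MySQL, with the column types adapted accordingly; the sample rows and the query itself are illustrative assumptions, not the provider's real SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite approximation of the node_load columns shown on the slide.
conn.execute(
    """CREATE TABLE node_load (
           load_index INTEGER PRIMARY KEY AUTOINCREMENT,
           sampletime TEXT,
           machine_id TEXT,
           owner      TEXT,
           load_5     REAL DEFAULT 0.00,
           user_cpu   REAL DEFAULT 0.00,
           sys_cpu    REAL DEFAULT 0.00)"""
)
# Hypothetical samples; timestamps use a fixed-width YYYYMMDDHHMMSS string.
conn.executemany(
    "INSERT INTO node_load (sampletime, machine_id, owner, load_5, user_cpu, sys_cpu)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("20030301120000", "node001", "atlas", 0.40, 30.0, 2.0),
        ("20030301121000", "node001", "atlas", 0.80, 60.0, 4.0),
        ("20030301121000", "node002", "atlas", 0.10, 5.0, 1.0),
    ],
)


def newest_samples(conn):
    """Return (machine_id, load_5) for the most recent sample of each machine."""
    query = """
        SELECT n.machine_id, n.load_5
        FROM node_load AS n
        JOIN (SELECT machine_id, MAX(sampletime) AS latest
              FROM node_load
              GROUP BY machine_id) AS m
          ON n.machine_id = m.machine_id AND n.sampletime = m.latest
        ORDER BY n.machine_id"""
    return conn.execute(query).fetchall()


print(newest_samples(conn))
```

The index on `sampletime` (the `MUL` key in the table description) is what would keep the `MAX(sampletime)` lookup cheap on a large archive.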

14
Information Provider (Linux Farm Monitoring)
  • # generate Farm information every 10 minutes
    dn: MdsFarmQueueName=1000, MdsHostNodeDomainName=usatlas.bnl.gov, Mds-Host-hn=gremlin.usatlas.bnl.gov, Mds-Vo-name=local, o=grid
    objectclass: GlobusTop
    objectclass: GlobusActiveObject
    objectclass: GlobusActiveSearch
    type: exec
    path: /usr/local/globus-new/customize
    base: mds-farm-batch-info.pl
    args: -dn MdsFarmQueueName=1000,MdsHostNodeDomainName=usatlas.bnl.gov,Mds-Host-hn=gremlin.usatlas.bnl.gov,Mds-Vo-name=local,o=grid -ttl 900
    cachetime: 600
    timelimit: 20
    sizelimit: 400
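The program named in `base` is invoked with the `-dn` and `-ttl` arguments and is expected to print LDIF. The real provider is a Perl script whose contents are not shown; the Python sketch below only illustrates the general shape, and the summary values, the function names, and the unused handling of `-ttl` are all assumptions.

```python
import argparse


def build_entry(dn, summary):
    """Format one Queue-Info entry as LDIF text."""
    lines = [f"dn: {dn}", "objectclass: Queue-Info"]
    lines += [f"{attr}: {value}" for attr, value in summary.items()]
    return "\n".join(lines) + "\n"


def main(argv):
    parser = argparse.ArgumentParser(description="Hypothetical farm info provider")
    parser.add_argument("-dn", required=True, help="DN of the entry to publish")
    parser.add_argument("-ttl", type=int, default=900,
                        help="time to live in seconds (parsed but unused here)")
    args = parser.parse_args(argv)

    # Made-up summary values; a real provider would read them from the archive.
    summary = {
        "MdsQueueNumberOfCpu": 4,
        "MdsQueueSpeed": 1000,
        "MdsQueueAverageLoad": 1.0,
        "MdsQueueAverageUserPercent": 45.0,
        "MdsQueueAverageSysPercent": 3.0,
    }
    return build_entry(args.dn, summary)


print(main(["-dn", "MdsFarmQueueName=1000,Mds-Vo-name=local,o=grid", "-ttl", "900"]))
```

The attribute names match the MUST list of the Queue-Info object class, so an entry of this shape would carry every required attribute.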

15
Observation from Gridview

16
Current Status and Future Work
  • Current Status
  • Sensors: Local Monitoring Tools Add Less Than 1
    Percent CPU Load (Non-intrusive)
  • Improved the Ganglia Information Provider; It Can
    Now Obtain Information From Both Gmond and Gmetad
  • Multiple Hierarchical Clusters Are Supported
  • Future Work
  • Merge the Ganglia RRD Information Provider and
    the Archive DB Information Provider
  • Work With the Ganglia Team and GLUE Schema, Help
    Define Requirements for What Information Should
    Be Monitored for Job Scheduling
  • Automate the Mapping From XML to LDIF (via GLUE
    Schema?), Provide Flexibility
  • Continue to Optimize the Information Provider to
    Deliver Data Faster
  • Scalability Test