Information Modeling and Monitoring in Grid Systems

1
Information Modeling and Monitoring in Grid Systems
  • 23 Nov 2004, Ferrara

Sergio Andreozzi, INFN-CNAF, Bologna (Italy)
sergio.andreozzi_at_cnaf.infn.it
2
OUTLINE
  • Problem Statement
  • Information Modeling of Grid resources
  • GLUE Schema
  • Computing Resources
  • Storage Resources
  • Network Resources
  • Common Information Model (CIM)
  • Monitoring a Grid
  • GridICE

3
PART I
  • Problem Statement

4
Grid basic principles
  • Grid systems make it possible to share resources
    across administrative domains (e.g., computing
    power, storage space, databases)
  • Shared resources:
  • are geographically dispersed
  • are heterogeneous
  • belong to different administrative domains
  • have a dynamic composition
  • can be remotely accessed by users

5
Grid basic principles
  • Virtualization of users and resources
  • mapping from virtual resources to physical
    resources
  • mapping from virtual users to physical users

[Figure: Grid system] [1]
6
Problem Statement: Information Modeling
  • Resources available in Grid systems must be
    described in a precise and systematic manner if
    they are to be discovered for subsequent
    management or use
  • A shared description allows multiple experts to
    contribute to the problem and serves as a means
    of communication between different knowledge
    domains

7
Problem Statement: Grid Monitoring
  • How do we measure significant parameters to
    analyze the usage, behavior and performance of a
    Grid system?
  • How do we detect and notify fault situations,
    contract violations, and user-defined events?

8
PART II
  • Information Modeling
  • and the GLUE Schema

9
Information Model: definition
  • Abstraction of the real world into constructs that
    can be represented in computer systems (e.g.,
    objects, properties, behavior, and relationships)
  • Not tied to any particular implementation
  • Used to exchange information among different
    domains

10
Problem Statement: Information Modeling
  • Main Use Cases
  • Discovery for brokering and access
  • Which Computing Elements are available to the
    VO CMS and offer the SL3 operating system with
    the CMKIN software package installed?
  • Which Storage Elements offer 20 gigabytes of
    disk space to the VO ATLAS?
  • Discovery for monitoring
  • How many CPUs is the site XYZ offering to the
    EGEE Grid?
  • What is the success rate of submitted jobs per
    site?
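
The first brokering question on this slide can be pictured as a filter over resource descriptions. The sketch below is only an illustration under stated assumptions: the record layout and the field names (`vos`, `os`, `software`) are hypothetical and are not GLUE Schema attributes, and the sample catalog is invented.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Hypothetical, simplified description of a Computing Element."""
    ce_id: str
    vos: set = field(default_factory=set)        # VOs allowed to use the CE
    os: str = ""                                 # offered operating system
    software: set = field(default_factory=set)   # installed software packages


def discover(ces, vo, os_name, package):
    """Return the ids of the CEs that satisfy all three constraints."""
    return [
        ce.ce_id
        for ce in ces
        if vo in ce.vos and ce.os == os_name and package in ce.software
    ]


# Invented sample data, for illustration only.
catalog = [
    ComputingElement("ce01.example.org", {"cms", "atlas"}, "SL3", {"CMKIN"}),
    ComputingElement("ce02.example.org", {"cms"}, "SL3", set()),
    ComputingElement("ce03.example.org", {"atlas"}, "SL3", {"CMKIN"}),
]

matches = discover(catalog, vo="cms", os_name="SL3", package="CMKIN")
print(matches)
```

Only the first entry satisfies all three constraints; a real broker would evaluate the same kind of predicate against the Grid information service rather than an in-memory list.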

11
Information Model: how it can be represented
  • Typically, graphical languages are preferred
  • Several solutions are available
  • We have selected the Unified Modeling Language
    (UML)
  • It is a widely accepted international standard
    (Object Management Group, OMG)
  • It is often used for information and conceptual
    modeling
  • It has become well established in many
    communities with extensive tool support from both
    commercial and open source vendors

12
Unified Modeling Language (UML)
  • The Unified Modeling Language (UML) is a
    graphical language for visualizing, specifying,
    constructing, and documenting the artifacts of a
    software-intensive system.
  • The UML offers a standard way to write a system's
    blueprints, including conceptual things such as
    business processes and system functions as well
    as concrete things such as programming language
    statements, database schemas, and reusable
    software components.
  • (Object Management Group)

13
Unified Modeling Language
  • First Specification in 1997
  • Current Specification version 1.5 (12 different
    diagrams)
  • Finalizing Specification version 2.0 (13
    different diagrams)
  • Each diagram type has
  • Semantics: what does the diagram type do?
  • Notation: what graphical symbols can the diagram
    type contain?
  • Diagram groups
  • Structural: model the static aspects of a system
  • Behavioral: model the behavior of a system
    (dynamic model)
  • We use Class diagrams: they show the static
    structure of the model, in particular, the things
    that exist (such as classes and types), their
    internal structure, and their relationships to
    other things

14
UML Class Diagram elements
  • Class: represents a concept within the system
    being modeled. It has data structure, behavior
    and relationships to other elements
  • Generalization: a taxonomic relationship between
    a more general element (the parent) and a more
    specific element (the child) that is fully
    consistent with the first element and that adds
    additional information. It is used for classes,
    packages, use cases, and other elements

15
UML Class Diagram elements
  • Binary association: an association between
    exactly two classes (possibly from a class to
    itself)
  • Aggregation: denotes weak ownership (i.e., the
    part may be included in several aggregates, and
    its owner may also change over time). Deleting
    the aggregate does not imply deletion of the
    parts it references
  • Composition: a strong form of aggregation. A part
    instance may be included in at most one composite
    at a time, and the composite object has sole
    responsibility for the disposition of its parts
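
As a rough analogy, the class diagram elements from the last two slides can be mirrored in code. The following is a minimal sketch with hypothetical class names (`Resource`, `Host`, `Queue`, `Cluster`); note that Python does not enforce ownership, so aggregation and composition are expressed here only by convention.

```python
class Resource:
    """General element (the parent in a generalization)."""

    def __init__(self, name):
        self.name = name


class Host(Resource):
    """Generalization: a Host is a more specific Resource that adds data."""

    def __init__(self, name, cpus):
        super().__init__(name)
        self.cpus = cpus


class Queue:
    """Aggregation: a queue refers to hosts but does not own them."""

    def __init__(self, name, hosts):
        self.name = name
        self.hosts = list(hosts)  # the same host may sit in several queues


class Cluster:
    """Composition: the cluster creates its hosts and is their only owner."""

    def __init__(self, name, host_specs):
        self.name = name
        self._hosts = [Host(host_name, cpus) for host_name, cpus in host_specs]

    @property
    def hosts(self):
        return tuple(self._hosts)


cluster = Cluster("farm", [("wn1", 2), ("wn2", 4)])
fast = Queue("fast", cluster.hosts[:1])
background = Queue("background", cluster.hosts)

# One host is part of two aggregates at the same time.
shared = set(fast.hosts) & set(background.hosts)

# Deleting an aggregate leaves its parts untouched.
del fast
print(len(cluster.hosts), len(shared))
```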

16
GLUE Schema [2]
  • An approach to the information modeling of Grid
    resources, started in April 2002 by the DataTAG
    and iVDGL projects
  • Contributions from DataGrid, Globus, PPDG, GriPhyN

[Figure: GLUE Schema (UML) and related schemas: DataGrid Schema (LDAP); Globus Schema (LDAP); GLUE Schema (LDAP), GT MDS 2; GLUE Schema (XML), GT MDS 4; GLUE Schema (Relational), R-GMA]
17
GLUE Schema: modeling guidelines
  • Focus on the virtual abstraction given by the
    Grid paradigm
  • Virtual pool of resources
  • Generalization
  • capture common aspects of different entities
    providing the same functionality (e.g., uniform
    view over different batch services)
  • Deal with both monitoring needs and discovery
    needs
  • Monitoring concerns those attributes that are
    meaningful for describing the status of resources
    (e.g., useful to detect fault situations)
  • Discovery concerns those attributes that are
    meaningful for locating resources on the basis of
    a set of preferences/constraints (e.g., useful
    during the matchmaking process)

18
GLUE Computing resources: warm up
  • What is the core offered functionality?
  • Computing power
  • What do I need to know in order to use it?
  • Offered execution environment (e.g., OS type,
    available software libraries)
  • Offered Quality of Service (e.g., estimated
    response time)
  • Status (e.g., number of running jobs)
  • Policy (e.g., max execution time, assigned CPUs)
  • Access rights (e.g., can I use it?)
  • Location (e.g., Uniform Resource Locator or URL)

19
GLUE Computing resources: some more thought about
the service
  • The computing power is typically offered by
    cluster systems
  • Requests are typically staged into queues for
    efficient system usage
  • Queue policies enable service differentiation
    (e.g., dedicated CPUs vs. shared CPUs assignment,
    differentiated max CPU time, differentiated queue
    service strategy)
  • A service has quality aspects

20
GLUE Schema: an example
  • Site A has 6 worker nodes (WNs)
  • (3 brand new and fast, 3 old and slow)
  • The farm is configured as follows
  • a high-end queue to the 3 fast WNs
  • a slow queue to the 3 slow WNs
  • a background queue to the 6 WNs (lower priority)

[Figure: Queue, Host - Fast, Host - Slow, Cluster]
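
The Site A configuration can be written down as plain data to make the relationships explicit. This is a sketch only: the host names, the dictionary layout, and the `"normal"` priority label are assumptions made for illustration (the slide only states that the background queue has lower priority).

```python
# Six worker nodes: three fast, three slow (names are hypothetical).
fast_wns = [f"fast-wn{i}" for i in range(1, 4)]
slow_wns = [f"slow-wn{i}" for i in range(1, 4)]

cluster = {"name": "site-a-cluster", "hosts": fast_wns + slow_wns}

# Each queue maps to a subset of the cluster's hosts.
queues = {
    "high-end": {"hosts": fast_wns, "priority": "normal"},
    "slow": {"hosts": slow_wns, "priority": "normal"},
    "background": {"hosts": cluster["hosts"], "priority": "lower"},
}


def queues_serving(host):
    """Return the names of the queues that can dispatch jobs to `host`."""
    return sorted(name for name, q in queues.items() if host in q["hosts"])


for name, q in queues.items():
    print(name, len(q["hosts"]), q["priority"])
```

Every worker node is reachable through two queues, which is exactly the situation a flat "one queue, one set of hosts" description cannot express.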
21
GLUE Computing Element (NG)
  • Possible evolution of GLUE Schema
  • CE is a site cluster
  • Queues are used to differentiate the service
  • The service offers access and management of
    available execution environments
  • ServiceClass
  • HighEnd, Slow, Background
  • ExecutionEnvironment
  • characteristics of node A, characteristics of
    node B

22
GLUE Schema CE - example
23
GLUE Storage resources: warm up
  • What is the core offered functionality?
  • Storage Space usage
  • What do I need to know in order to use it?
  • Storage Service manager type (e.g., srmv2)
  • Available data access protocols (e.g., gridftp,
    rfio)
  • Offered Quality of Service (e.g., availability,
    reliability)
  • State (e.g., available space)
  • Policy (e.g., file life time, MaxFileSize)
  • Access rights (e.g., can I use it?)
  • Location (e.g., Uniform Resource Locator or URL)

24
GLUE: Storage Element
  • Storage Element
  • Refers to a group of services responsible for the
    management of storage areas and for access to
    them
  • Storage resources contributed to a Grid system
    can vary from simple disk servers to complex
    massive storage systems

25
GLUE: Storage Space
  • Storage Space: a portion of a logical storage
    extent that
  • is assigned to Grid users (e.g., a VO, a group of
    a VO)
  • is associated with a directory of the underlying
    file system (e.g., /permanent/CMS)
  • has a set of policies (MaxFileSize, MinFileSize,
    MaxData, MaxNumFiles, MaxPinDuration, Quota, ACL)
  • has a state (available space, used space)
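
To make the split between policy and state concrete, here is a minimal sketch of a Storage Space record with an admission check. The class, its field names, and the interpretation of the policies (sizes in bytes, `quota` as the total space assigned) are assumptions for illustration and do not reproduce the GLUE Schema definitions; only a subset of the listed policies is modeled.

```python
from dataclasses import dataclass

GB = 10**9


@dataclass
class StorageSpace:
    """Hypothetical, simplified Storage Space description."""
    path: str            # directory of the underlying file system
    vo: str              # Grid users the space is assigned to
    max_file_size: int   # policy, in bytes
    min_file_size: int   # policy, in bytes
    quota: int           # policy, total bytes assigned (assumed meaning)
    used_space: int = 0  # state, in bytes

    @property
    def available_space(self):
        """State derived from the quota and the used space."""
        return self.quota - self.used_space

    def can_store(self, size):
        """Check a file size against both the policies and the state."""
        within_policy = self.min_file_size <= size <= self.max_file_size
        return within_policy and size <= self.available_space


space = StorageSpace(
    path="/permanent/CMS",
    vo="cms",
    max_file_size=2 * GB,
    min_file_size=1,
    quota=20 * GB,
    used_space=19 * GB,
)

print(space.available_space // GB, space.can_store(GB // 2))
```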

26
GLUE Storage Element
27
Expressing relationships among Computing and
Storage Services
  • A typical job execution request involves certain
    properties for the computing element and for a
    permanent storage area
  • SiteAdmins may want to specify preferences on
    which Storage Areas should be used by jobs
    executed by certain computing elements
  • Mount point information, where available, and a
    weight for choosing among the alternatives are
    provided

28
Network Resources [9]
  • (not yet in the GLUE Schema)
  • Definition of a network model that enables an
    efficient and scalable way of representing the
    communication capabilities between Grid services
  • Partition the Grid into Domains and limit the
    monitoring activity to the observation of
    Domain-to-Domain paths
  • Underlying assumption: for any two Domains D1 and
    D2, communication characteristics measured within
    the boundaries of D1 or of D2 are negligible with
    respect to the same characteristics measured
    between the boundaries of D1 and D2

29
Partitioning the Grid into Domains
  • A Domain is a set of elements identified by URIs
    (referred in the model as edge services)
  • Connectivity is a metric that reflects the
    quality of communication through a link between
    two Edge Services
  • A Domain communicates with other domains using
    Network Services
  • A Network Service offers a unidirectional
    communication service between two Domains
  • Each Domain has a Theodolite Service that gathers
    Network Service related metrics towards other
    Domains
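
The scalability argument behind the Domain partition can be checked with simple arithmetic. Since a Network Service is unidirectional, one is needed per ordered pair of Domains. The sketch below compares that count with the number of ordered pairs of edge services; the figures of 1000 edge services and 20 Domains are invented for illustration.

```python
def directed_pairs(n):
    """Number of ordered pairs (source, destination) among n distinct items."""
    return n * (n - 1)


edge_services = 1000  # hypothetical number of edge services in the Grid
domains = 20          # hypothetical number of Domains they are grouped into

paths_without_domains = directed_pairs(edge_services)
paths_with_domains = directed_pairs(domains)

print(paths_without_domains)  # 999000
print(paths_with_domains)     # 380
```

Under these assumed numbers, observing Domain-to-Domain paths requires monitoring 380 paths instead of 999000.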

30
GLUE Network Element
31
GLUE Network an example scenario
32
Common Information Model [8, 10]
  • CIM: Common Information Model
  • Conceptual view of the managed environment for IT
    resources that attempts to unify and extend the
    existing instrumentation and management standards
  • Targeted at management of resources, where
    management is defined as the active process of
    monitoring, modifying, and making decisions about
    a resource
  • Maintained by Distributed Management Task Force
    (DMTF), a worldwide industry organization
  • It uses UML class diagrams as its modeling
    language

33
CIM related activities at GGF
  • CIM Grid Schema WG (CGS WG)
  • Started at GGF 5
  • Goal: define CIM extensions for the Job
    Submission Service Model, i.e.,
  • managed objects and their relationships for
    managing the execution and monitoring of batch
    jobs in a grid environment
  • Defined extensions will be submitted to DMTF for
    inclusion in the official CIM standard
  • Common Resource Model WG (CRM WG)
  • BOF at GGF 7
  • Goal: define CIM extensions to describe
    manageable resources as OGSA services

34
PART III
  • Grid Monitoring

35
Grid Monitoring: Definition
  • Measuring the activity of significant Grid
    resource-related parameters to analyze usage,
    behavior and performance of a Grid system, and to
    detect and notify fault situations, contract
    violations, and user-defined events

36
Grid Monitoring: Requirements
  • dynamically partition resources and service usage
    using three criteria: site ownership, operations
    domain, and virtual organization accessibility
  • collect data in order to enable retrospective
    analysis
  • deal with a large volume of data by carefully
    introducing reduction mechanisms
  • collect both fine-grained and coarse-grained
    monitoring data
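
One simple reduction mechanism is to keep fine-grained samples for a short time and store only window averages for the long term. The sketch below illustrates that idea; it is an assumption about what a reduction step could look like, not a description of how any particular monitoring tool implements it, and the sample values are invented.

```python
from statistics import mean


def reduce_series(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples.

    The last window may be shorter if the series length is not a multiple
    of the window size.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        mean(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]


# Invented fine-grained CPU load samples (percent), e.g. one per minute.
fine_grained = [10, 20, 30, 40, 50, 60, 70]

# Coarse-grained view: one value for every three samples.
coarse_grained = reduce_series(fine_grained, 3)
print(coarse_grained)
```

Averaging hides short spikes, so a real system would typically keep additional summaries (for instance minimum and maximum per window) when fault detection depends on them.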

37
Grid Monitoring: Requirements
  • help to detect fault situations and possibly
    prevent them
  • provide general visualization and analysis
    functionalities
  • rely on a common information model of the Grid
    resources
  • adopt interfaces and protocols that are standard
    within the Grid community
  • integrate with local monitoring systems, when
    available
  • track which machines are running the VO
    applications, the status and behavior of each
    machine, and the behavior of the software

38
GridICE architectural view [10]
[Figure: architecture components: Presentation Service; Detection/Notification; Data Collector Service (new resources detector, scheduler, persistent storage); Information Service; Measurement Service]
39
Measurement Service
Service able to probe resources for certain
parameters
  • Grid Service: gatekeeper, gsiftp, gris,
    workload-mgr
  • VO view (aggregation): number of total CPUs,
    number of free CPUs, number of running jobs,
    number of waiting jobs, SE free disk space
  • Machine: CPU load, memory usage, disk usage (per
    partition), network activity, number of
    processes, system uptime
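
The VO view is an aggregation of per-resource measurements. A minimal sketch of that aggregation follows; the record layout, the metric names, and the sample values are hypothetical, and simply summing per-VO figures is a simplification (CPUs shared between VOs would be counted once per VO).

```python
from collections import defaultdict

METRICS = ("total_cpus", "free_cpus", "running_jobs", "waiting_jobs")

# Invented per-CE, per-VO measurements.
measurements = [
    {"ce": "ce-a", "vo": "cms", "total_cpus": 10, "free_cpus": 4,
     "running_jobs": 6, "waiting_jobs": 2},
    {"ce": "ce-b", "vo": "cms", "total_cpus": 20, "free_cpus": 5,
     "running_jobs": 15, "waiting_jobs": 0},
    {"ce": "ce-b", "vo": "atlas", "total_cpus": 8, "free_cpus": 8,
     "running_jobs": 0, "waiting_jobs": 0},
]


def vo_view(records):
    """Sum each metric over all records that belong to the same VO."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for record in records:
        for metric in METRICS:
            totals[record["vo"]][metric] += record[metric]
    return dict(totals)


view = vo_view(measurements)
print(view["cms"])
```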

40
Information Service
  • Service that offers measured values to potential
    consumers
  • Relying on GIS (MDS 2.x)
  • Decoupling wide-discoverable information from
    periodical mobut not all nodes have direct access
    (writing) to it
  • Using the GLUE Schema mapping to the LDAP data
    model

[7] [6]
41
Grid Information Services overview
[3] [4] [5]
42
Discovery with MDS 2
  • Considering a specific customization (LCG
    project)
  • Hierarchical structure of aggregators
  • Asynchronous
  • The BDII (root of the information service tree)
    contains ALL the information published by the
    resource information services

[Figure: CLIENT, BDII, GRIS]
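
The hierarchy of aggregators can be pictured as a tree in which every inner node merges what its children publish, so that the root ends up holding everything. The sketch below is a deliberately simplified model: the classes `Gris` and `Aggregator`, the entry layout, and the synchronous `collect()` call are assumptions for illustration, whereas the slide describes the real system as asynchronous.

```python
class Gris:
    """Leaf of the tree: publishes the entries of one resource."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def collect(self):
        return dict(self.entries)


class Aggregator:
    """Inner node: merges the entries published by its children."""

    def __init__(self, children):
        self.children = list(children)

    def collect(self):
        merged = {}
        for child in self.children:
            merged.update(child.collect())
        return merged


# Invented two-site hierarchy; the root plays the role of the BDII.
site_a = Aggregator([
    Gris({"ce-a": {"free_cpus": 4}}),
    Gris({"se-a": {"free_space_gb": 120}}),
])
site_b = Aggregator([Gris({"ce-b": {"free_cpus": 0}})])
bdii = Aggregator([site_a, site_b])

everything = bdii.collect()
print(sorted(everything))
```

A client that queries only the root sees the entries of all three resources without contacting any of them directly.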
43
MDS2-based Information Service: example
44
Data Collector Service
  • Service allowing the collection of historical
    monitoring data
  • It consists of three main components
  • New resources detection service: scans the GIS
    to detect new sources of monitoring data that
    should be observed
  • Scheduler: fires periodic observations whose task
    is to discover which resource metrics are being
    offered and to store the read data in the third
    component, a persistent storage
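
The interplay of the three components can be sketched in a few lines: detection is a set difference between what the GIS publishes and what is already known, and an observation reads the offered metrics and appends them to storage. All names here (`detect_new_resources`, `observe`, `fake_read_metrics`) and the sample data are hypothetical; the periodic triggering and the real storage backend are left out.

```python
def detect_new_resources(known, published):
    """Return the sources published in the GIS that are not yet observed."""
    return sorted(set(published) - set(known))


def observe(sources, read_metrics, storage):
    """Read the metrics each source offers and append them to storage."""
    for source in sources:
        for metric, value in read_metrics(source).items():
            storage.append((source, metric, value))


def fake_read_metrics(source):
    """Stand-in for a real query to the information service."""
    if source.startswith("ce"):
        return {"free_cpus": 4}
    return {"free_space_gb": 120}


known = {"ce-a"}
published_in_gis = ["ce-a", "ce-b", "se-a"]

new_sources = detect_new_resources(known, published_in_gis)

storage = []  # stand-in for the persistent storage
observe(new_sources, fake_read_metrics, storage)
print(new_sources, storage)
```

In a running system the scheduler would call `observe` at regular intervals, and the detection step would be repeated so that resources joining the Grid are picked up automatically.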

45
Data Collector Service (2)
  • From the viewpoint of observability of Grid
    resources, we distinguish the following states

46
Data Analyzer
  • Service providing performance analysis, usage
    level and general reports and statistics
  • It can be configured to generate and send
    periodical reports of the Grid activity, and also
    of the Grid structure

47
Detection Notification Service
  • Service providing a flexible and configurable
    means for event detection and notification
    actions.
  • It provides for timely notification of the
    configured events.
  • When fault situations are detected, the right
    person or group of persons must be notified.
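
A configurable detection step can be modeled as a list of rules, each pairing a condition on a measurement sample with the people to notify. The sketch below assumes a hypothetical `Rule` structure, invented thresholds, and invented recipient names; delivering the notification (mail, SMS, and so on) is out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Rule:
    """Hypothetical event definition: a named condition plus recipients."""
    name: str
    condition: Callable[[dict], bool]
    recipients: Tuple[str, ...]


def detect(rules, sample):
    """Return (event name, recipients) for every rule the sample triggers."""
    return [
        (rule.name, rule.recipients)
        for rule in rules
        if rule.condition(sample)
    ]


# Invented thresholds and recipients.
rules = [
    Rule("high-cpu-load", lambda s: s["cpu_load"] > 0.9, ("site-admin",)),
    Rule("disk-almost-full", lambda s: s["disk_used_fraction"] > 0.95,
         ("site-admin", "storage-team")),
]

sample = {"cpu_load": 0.5, "disk_used_fraction": 0.97}
notifications = detect(rules, sample)
print(notifications)
```

Keeping the recipients inside the rule is one way to satisfy the requirement that each fault reaches the right person or group.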

48
Presentation Service
  • Service providing the user interface to the
    monitoring information and control
  • Designed around a role-based strategy, providing
    different views depending on the type of
    consumer
  • Physical view, whose target is the whole set of
    Grid resources being managed
  • Virtual Organization view, whose target is the
    whole set of Grid resources that can be accessed
    by users of a certain Virtual Organization

49
Conclusion
  • Information Modeling of Grid resources
  • Characteristics of Grid systems require a shared
    information model of resources to be used as a
    base for the Information Service
  • An important approach to the information modeling
    of Grid resources has been presented
  • Monitoring a Grid
  • Requirements of Grid monitoring have been
    presented
  • GridICE has been presented as an example of a
    monitoring tool for a Grid system

50
REFERENCES
  • [1] Z. Németh, V. Sunderam. Characterizing Grids:
    Attributes, Definitions, and Formalisms. Journal
    of Grid Computing, 2003, volume 1, number 1,
    pages 9-23.
    http://ipsapp009.kluweronline.com/IPS/frames/toc.aspx?J6160I1
  • [2] GLUE Schema Official documents.
    http://www.cnaf.infn.it/sergio/datatag/glue
  • [3] Globus Toolkit Monitoring and Discovery
    Service 2.
    http://www.globus.org/mds/mds2/
  • [4] Globus Toolkit Monitoring and Discovery
    Service 4.
    http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/info/WSMDSFacts.html
  • [5] R-GMA: Relational Grid Monitoring
    Architecture.
    http://www.r-gma.org
  • [6] S. Andreozzi. GLUE Schema implementation for
    the LDAP model. INFN Technical report.
    http://www.cnaf.infn.it/sergio/publications/Glue4LDAP.pdf

51
REFERENCES
  • [7] K. Czajkowski, S. Fitzgerald, I. Foster,
    and C. Kesselman. Grid Information Services for
    Distributed Resource Sharing. In Proceedings of
    the 10th IEEE International Symposium on
    High-Performance Distributed Computing (HPDC-10).
    http://www.globus.org/research/papers.htmlMDS-HPDC
  • [8] GGF CIM Grid Schema WG.
    https://forge.gridforum.org/projects/cgs-wg/
  • [9] S. Andreozzi, A. Ciuffoletti, A. Ghiselli, C.
    Vistoli. Monitoring the Connectivity of a Grid.
    In Proc. of the 2nd International Workshop
    on Middleware for Grid Computing (MGC 2004) in
    conjunction with the 5th ACM/IFIP/USENIX
    International Middleware Conference, Toronto,
    Canada, October 2004.
    http://www.cnaf.infn.it/sergio/publications/MGC2004.pdf
  • [10] GridICE Homepage.
    http://grid.infn.it/gridice
  • [11] Common Information Model (CIM).
    http://www.dmtf.org

52
  • Q&A
  • or Querying the MDS 2 or GridICE demo