1
PEM status report
  • Large-Scale Cluster Computing Workshop
  • FNAL, May 24 2001
  • Olof Bärring, CERN

2
Outline
  • History
  • Design
  • First prototype
  • DataGrid fabric mgmt monitoring task
  • Conclusions

3
History
  • The Performance and Exception Monitoring (PEM)
    project has been a CERN IT project since 1999
  • Leader: Tim Smith; Bernd Panzer-Steindel from
    2001
  • Goal (and innovation): monitor and alarm on
    services rather than servers
  • Long requirements phase with input from many IT
    groups
  • Design settled in mid-2000

4
Design
  • [Diagram: Agent (with sensors and actuators),
    Broker, Measurement repository (MR),
    Configuration Repository (CR), Correlation Engine
    (CE), User Interface and Access control,
    connected by data and control flows with
    1-to-1..n multiplicities]

5
Design: agent
  • The agent forwards data from monitoring sensors
    to the broker
  • Buffering of data for transfer efficiency and
    fault tolerance
  • The configuration of local sensors and actuators
    is received via the broker
  • Firing of actuators is triggered by MR (or CE)
    via broker to the agent
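
The buffering described above can be sketched as follows. This is a minimal illustration only, not the PEM implementation: the names `Measurement` and `AgentBuffer`, the fixed capacity, and the drop-oldest policy are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Measurement:
    metric: str       # hypothetical metric name, e.g. "cpu_system"
    value: float
    timestamp: float  # seconds since the epoch


class AgentBuffer:
    """Bounded local buffer that holds samples until they are collected."""

    def __init__(self, capacity: int = 1000) -> None:
        # Assumed policy: when the buffer is full, the oldest sample is dropped.
        self._samples: Deque[Measurement] = deque(maxlen=capacity)

    def record(self, sample: Measurement) -> None:
        """Called by a sensor each time it produces a value."""
        self._samples.append(sample)

    def drain(self) -> List[Measurement]:
        """Return everything buffered since the last request, then clear."""
        batch = list(self._samples)
        self._samples.clear()
        return batch


# Five samples into a buffer of three: only the newest three survive.
buf = AgentBuffer(capacity=3)
for i in range(5):
    buf.record(Measurement("cpu_system", float(i), 1000.0 + 30 * i))
batch = buf.drain()
```

Sending one batch per request is what gives the transfer efficiency; keeping samples locally while the broker is unreachable is what gives the fault tolerance, within the limit of the buffer size.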

6
Design: broker
  • The broker was introduced for scalability
  • Transmitter of configuration information to
      • Agents
      • Measurement repository
  • Transmitter of measurement data from agent to
    measurement repository
  • Transmitter of requests for firing actuators from
    MR to the agents
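
The relay role of the broker for measurement data can be sketched as below. This is a hedged illustration: the class `Broker`, the `fetch` and `store` callables, and the `Sample` tuple layout are hypothetical names chosen for the example, and the pull-style polling is borrowed from the prototype description rather than from the design itself.

```python
from typing import Callable, Dict, List, Tuple

# Assumed sample layout: (metric name, value, timestamp).
Sample = Tuple[str, float, float]


class Broker:
    """Relays measurements from its assigned agents to the repository."""

    def __init__(
        self,
        agents: Dict[str, Callable[[], List[Sample]]],
        store: Callable[[str, Sample], None],
    ) -> None:
        self._agents = agents  # agent name -> function returning new samples
        self._store = store    # function that writes one sample to the MR

    def poll_once(self) -> int:
        """Collect pending samples from every agent; return how many were forwarded."""
        forwarded = 0
        for name, fetch in self._agents.items():
            for sample in fetch():
                self._store(name, sample)
                forwarded += 1
        return forwarded


# Two stand-in agents and an in-memory list playing the role of the MR.
received: List[Tuple[str, Sample]] = []
broker = Broker(
    agents={
        "node001": lambda: [("cpu_system", 12.0, 1000.0)],
        "node002": lambda: [("cpu_system", 55.0, 1000.0), ("load", 1.5, 1000.0)],
    },
    store=lambda name, sample: received.append((name, sample)),
)
count = broker.poll_once()
```

Because each broker only handles its own set of agents, capacity can be added by starting more brokers, which is the scalability argument made above.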

7
Design: measurement repository
  • The MR is the central archive of all monitoring
    measurements
  • Data compression (e.g. averaging old data)
  • Not just a passive database: it actively notifies
    subscribed event listeners when a measurement
    falls outside its configured limits, triggering
    recovery actions
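
The two behaviours above, limit-based notification and compression of old data by averaging, can be sketched as follows. This is an illustrative sketch under assumptions: `MeasurementRepository`, `subscribe`, `store`, and `average_old_data` are hypothetical names, and the in-memory list stands in for the real archive.

```python
from typing import Callable, Dict, List, Tuple

Listener = Callable[[str, float], None]
Limits = Dict[str, Tuple[float, float]]  # metric -> (low, high)


class MeasurementRepository:
    """Archive that notifies listeners when a value leaves its limits."""

    def __init__(self, limits: Limits) -> None:
        self._limits = limits
        self._listeners: Dict[str, List[Listener]] = {}
        self.archive: List[Tuple[str, float, float]] = []

    def subscribe(self, metric: str, listener: Listener) -> None:
        self._listeners.setdefault(metric, []).append(listener)

    def store(self, metric: str, value: float, timestamp: float) -> None:
        self.archive.append((metric, value, timestamp))
        # Metrics without configured limits never raise a notification.
        low, high = self._limits.get(metric, (float("-inf"), float("inf")))
        if not (low <= value <= high):
            for listener in self._listeners.get(metric, []):
                listener(metric, value)


def average_old_data(values: List[float], window: int) -> List[float]:
    """Compress a series by replacing each run of `window` values with its mean."""
    chunks = [values[i:i + window] for i in range(0, len(values), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


alarms: List[Tuple[str, float]] = []
mr = MeasurementRepository({"cpu_system": (0.0, 90.0)})
mr.subscribe("cpu_system", lambda metric, value: alarms.append((metric, value)))
mr.store("cpu_system", 42.0, 1000.0)  # within limits: archived only
mr.store("cpu_system", 97.5, 1030.0)  # outside limits: listener is called
compressed = average_old_data([1.0, 2.0, 3.0, 4.0, 5.0], window=2)
```

In this sketch a listener could be the component that asks the broker to fire an actuator, which is how a notification would lead to a recovery action.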

8
Design: configuration repository
  • The configuration repository contains the
    configuration for all other components and their
    relationships, e.g.
  • Agent
      • Metrics and measurement frequency
      • Actuators
  • Broker
      • What agents to control
  • Measurement repository
      • Metric limits
      • Subscribed event listeners
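
One way to picture the contents listed above is as a nested structure with one section per component. The layout, key names, node names, metric names, and values below are all invented for illustration; the slides do not describe the actual storage format.

```python
from typing import Any, Dict

# Hypothetical configuration: every name and number here is an assumption.
CONFIG: Dict[str, Any] = {
    "agents": {
        "node001": {
            # metric name -> measurement interval in seconds
            "metrics": {"cpu_system": 30, "daemons_running": 30},
            "actuators": ["restart_daemon"],
        },
        "node002": {
            "metrics": {"cpu_system": 30},
            "actuators": [],
        },
    },
    "brokers": {
        "broker01": {"agents": ["node001", "node002"]},
    },
    "measurement_repository": {
        "limits": {"cpu_system": (0.0, 90.0)},
        "listeners": {"cpu_system": ["operator_console"]},
    },
}


def agent_configs_for_broker(config: Dict[str, Any], broker: str) -> Dict[str, Any]:
    """Select the agent configurations a given broker has to deliver."""
    return {
        agent: config["agents"][agent]
        for agent in config["brokers"][broker]["agents"]
    }


to_deliver = agent_configs_for_broker(CONFIG, "broker01")
```

The lookup reflects the relationship stated in the design: agents receive their sensor and actuator configuration via the broker that controls them.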

9
First prototype
  • The first PEM prototype was ready for deployment
    in early 2001
  • Agent status
      • Each agent measures 30 parameters (from system
        CPU to running daemons)
      • Frequency: one measurement every 30 seconds
      • Deployed on 400 nodes for the past 7 weeks;
        soon to be extended to about 1000 nodes

10
First prototype
  • Broker status
      • The multithreaded broker contacts its assigned
        agents once per minute and retrieves
          • Configuration: which metrics have been
            monitored
          • The measurements since the last request
      • The broker uses JDBC to write the measurements
        into an ORACLE database
          • Each measurement value and its timestamp
      • Current configuration: 50 agents per broker
      • Data rate: 1 GB/day
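
The prototype figures can be combined into a rough volume estimate. The inputs come from the slides (30 parameters per agent, one measurement every 30 seconds, one broker request per minute, 50 agents per broker, 400 nodes, 1 GB/day). Two points are assumptions: that the 1 GB/day figure covers all 400 nodes, and that 1 GB means 10^9 bytes.

```python
METRICS_PER_AGENT = 30
SAMPLE_INTERVAL_S = 30
POLL_INTERVAL_S = 60
AGENTS_PER_BROKER = 50
NODES = 400
SECONDS_PER_DAY = 86_400

# Samples of one metric per day on one agent: 86400 / 30 = 2880.
samples_per_metric_per_day = SECONDS_PER_DAY // SAMPLE_INTERVAL_S

# Measurements per agent per day: 30 * 2880 = 86,400.
per_agent_per_day = METRICS_PER_AGENT * samples_per_metric_per_day

# Measurements one agent hands over per broker request: 30 * 2 = 60.
per_poll_per_agent = METRICS_PER_AGENT * (POLL_INTERVAL_S // SAMPLE_INTERVAL_S)

# Measurements one broker collects per polling round: 60 * 50 = 3000.
per_poll_per_broker = per_poll_per_agent * AGENTS_PER_BROKER

# Brokers needed for 400 nodes at 50 agents each (ceiling division): 8.
brokers_needed = -(-NODES // AGENTS_PER_BROKER)

# Measurements per day across all nodes: 86,400 * 400 = 34,560,000.
total_per_day = per_agent_per_day * NODES

# ASSUMPTION: 1 GB/day is the total for all 400 nodes, with 1 GB = 10**9 bytes.
bytes_per_measurement = 10**9 / total_per_day  # roughly 29 bytes
```

Under those assumptions each stored value with its timestamp costs about 29 bytes, and growing to about 1000 nodes would raise the daily volume by a factor of 2.5 if nothing else changes.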

11
First prototype
  • Measurement repository status
      • Some scalability problems in the beginning,
        related to threading in JDBC
      • The ORACLE installation is currently being
        upgraded to cope with increasing load
        (concurrent read and write)
          • Dual CPU PIII 800 MHz
          • 512 MB memory
          • Gigabit Ethernet
          • 750 GB mirrored EIDE disk server
      • Plan to have a cluster of database nodes to
        cope with anticipated load in the future

12
DataGrid fabric mgmt (WP4)
  • PEM prototype will most likely be adopted
  • WP4 promotes
      • High node autonomy: monitoring hierarchies
        where the lowest level can be entirely confined
        to a node (tight sensor-actuator loops)
  • PEM configuration repository will be replaced by
    the WP4 configuration management system
  • Possibly use the transport layer from the WP3
    framework, based on the GMA (Grid Monitoring
    Architecture) producer-consumer model

13
Monitoring hierarchy
  • [Diagram labels: GRID; fabric view, cluster view
    and node view; GUI, MR, Correlation Engine (CE)
    and Configuration Management system; inside a
    node: config cache, sensors, CE, agent, actuator
    and MR cache]

14
Service view
  • How to translate measured simple metrics into a
    service view?
  • Some preliminary plans in WP4
      • Create probes that act as user programs
        (configurable for different CPU, memory and I/O
        characteristics)
      • Run probes on idle systems -> benchmarking
      • Run probes under different load conditions and
        concurrently measure a set of simple metrics
        (e.g. CPU load, memory usage, I/O rates,
        bandwidth to home directory) -> matrix for
        mapping of expected performance
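
The mapping matrix mentioned above could take a form like the sketch below. This is speculative: the slides describe only preliminary plans, so the class `PerformanceMatrix`, the use of CPU load as the single indexing metric, the bucket thresholds, and the "fraction of idle performance" measure are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def load_bucket(cpu_load: float) -> str:
    """Map a measured CPU load onto a coarse bucket (thresholds are invented)."""
    if cpu_load < 0.5:
        return "low"
    if cpu_load < 2.0:
        return "medium"
    return "high"


class PerformanceMatrix:
    """Expected probe performance, relative to an idle system, per load bucket."""

    def __init__(self) -> None:
        self._baseline: Dict[str, float] = {}
        self._cells: Dict[Tuple[str, str], float] = {}

    def record_baseline(self, probe: str, runtime_s: float) -> None:
        """Store the probe runtime measured on an idle system (benchmark)."""
        self._baseline[probe] = runtime_s

    def record(self, probe: str, cpu_load: float, runtime_s: float) -> None:
        """Store the probe runtime observed together with a measured CPU load."""
        relative = self._baseline[probe] / runtime_s
        self._cells[(probe, load_bucket(cpu_load))] = relative

    def expected(self, probe: str, cpu_load: float) -> float:
        """Look up the expected fraction of idle performance for a load."""
        return self._cells[(probe, load_bucket(cpu_load))]


matrix = PerformanceMatrix()
matrix.record_baseline("cpu_probe", runtime_s=10.0)       # idle benchmark
matrix.record("cpu_probe", cpu_load=4.0, runtime_s=25.0)  # run under load
service_level = matrix.expected("cpu_probe", cpu_load=3.2)
```

Once filled, such a table would let the simple metrics that are monitored continuously be read as an estimate of what a user program would experience, without running the probes all the time.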

15
Conclusions
  • After a long requirements and design phase, PEM
    has now reached a working prototype
  • PEM will be adopted by WP4 with slight
    modifications
      • Node autonomy, monitoring hierarchy
      • Use central fabric configuration mgmt
      • May implement GMA interfaces provided by WP3
        for monitoring transport and publication
        mechanisms