Lemon Monitoring - PowerPoint PPT Presentation

1
Lemon Monitoring
  • Miroslav Siket, German Cancio, David Front,
  • Maciej Stepniewski
  • CERN-IT/FIO-FS
  • LCG Operations Workshop
  • Bologna, 24-26 May 2005

2
Outline
  • Lemon
  • Structure and design
  • How it works, deployment
  • Use cases, web interface
  • Installation and setup
  • Summary

3
Lemon: LHC Era Monitoring
  • Lemon is a system containing tools for monitoring
    status and performance of computers
  • Distributed monitoring system scalable to 10k
    nodes
  • Provides active monitoring of software and
    hardware in the Computer Center on centrally
    managed clusters
  • Facilitates early error detection and problem
    prevention
  • Executes corrective actions and sends
    notifications
  • Provides persistent storage of the monitoring
    data
  • Offers a framework for further creation of
    sensors for monitoring
  • Site independent functionality
  • Link: http://cern.ch/lemon
  • Part of the ELFms toolsuite: http://cern.ch/elfms

4
Lemon Use
  • It is used inside and outside CERN by:
  • System administrators, service managers, people
    responsible for clusters
  • Developers and service/data challenges
  • Managers and general users
  • Deployments outside CERN
  • EDG testbeds
  • Accelerator (AB) department at CERN
  • CMS online
  • GridICE
  • BARC India (development partner)

5
Lemon architecture

6
Components
  • Lemon is a typical server/client application with
    the following components:
  • MSA - Monitoring Sensor Agent (Lemon Agent)
  • Daemon on a client machine that spawns multiple
    Monitoring Sensors to measure data in defined
    intervals and sends data to Monitoring Repository
  • MS - Monitoring Sensor
  • Uses a standard C/Perl API, so it is easy to
    write your own sensor
  • Several sensors exist for performance, process,
    hw and sw monitoring, grid VOs job reporting,
    database monitoring, security, alarms (total 260
    metrics)
  • MR - Monitoring Repository
  • Server application that receives samples and
    processes/validates them
  • Stores the full monitoring history data
  • Two implementations - flat files or Oracle DB
    based
  • LRF - Lemon RRD Framework
  • Pre-processes data into rrd files and creates
    cluster summaries
  • These are used for web graphics
  • Provides service and cluster overview in its web
    displays
  • LAG - Lemon Alarm Gateway
  • Generic gateway for alarms (in development)
  • Gateways to MonALISA and GridICE exist
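
The agent/sensor/repository split described above can be illustrated with a minimal sketch. Everything named here (Metric, Agent, tick, the in-memory repository list) is hypothetical and is not the Lemon sensor API; it only shows the idea of an agent measuring each metric at its own interval and forwarding samples to a repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical names throughout: this is not the Lemon sensor API.
Sample = Tuple[str, float, float]  # (metric name, timestamp, value)


@dataclass
class Metric:
    name: str
    interval: float               # seconds between two measurements
    measure: Callable[[], float]  # sensor function producing one value
    next_due: float = 0.0         # time at which the metric is measured next


@dataclass
class Agent:
    metrics: List[Metric]
    send: Callable[[Sample], None]  # stands in for the link to the repository

    def tick(self, now: float) -> int:
        """Measure every metric that is due at `now`; return how many were sent."""
        sent = 0
        for metric in self.metrics:
            if now >= metric.next_due:
                self.send((metric.name, now, metric.measure()))
                metric.next_due = now + metric.interval
                sent += 1
        return sent


repository: List[Sample] = []  # in-memory stand-in for the Monitoring Repository
agent = Agent(
    metrics=[
        Metric("load", 60.0, lambda: 0.42),      # made-up metric and value
        Metric("uptime", 3600.0, lambda: 12.0),  # made-up metric and value
    ],
    send=repository.append,
)
agent.tick(0.0)   # both metrics are due at start-up
agent.tick(60.0)  # only the 60-second metric is due again
```

A real agent would call tick from a timer loop and send samples over the network; both are left out to keep the sketch self-contained.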

7
Lemon at CERN
  • Lemon monitors about 2200 computers in 100
    clusters
  • On average it collects about 70 metrics from each
    host
  • Integrated with Sure alarm system
  • Collecting about 1.5 GB/day
  • LEAF (LHC-Era Automated Fabric) for high-level
    intervention scheduling

[Diagram labels: Node, Configuration Management, Node Management]
  • Configuration
  • Derived from the Quattor Configuration Database
    (CDB)
  • individual configuration per cluster/host
  • hierarchical structure
  • Alarm system
  • Sure: legacy system receiving alarms from Lemon
  • Integration with new LASER system (LHC alarm
    system) via LAG is ongoing
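
As a rough cross-check of the deployment figures on this slide (about 2200 hosts, about 70 metrics per host, about 1.5 GB/day), the arithmetic below derives the data volume per host and per metric. It assumes 1 GB = 10^9 bytes and treats the rounded figures as exact, so the results are order-of-magnitude estimates only.

```python
HOSTS = 2200
METRICS_PER_HOST = 70
BYTES_PER_DAY = 1.5e9  # assumption: 1 GB = 10**9 bytes

series = HOSTS * METRICS_PER_HOST    # distinct host/metric pairs
per_host = BYTES_PER_DAY / HOSTS     # bytes per host per day
per_series = BYTES_PER_DAY / series  # bytes per metric per host per day

print(series, round(per_host), round(per_series))
```

This gives about 154,000 host/metric pairs, roughly 680 kB per host per day, and just under 10 kB per metric per host per day.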

8
Web interface
  • Cluster view displays accumulated statistics and
    status for all machines in the cluster
  • Host view gives overview of the host status with
    basic metrics
  • Other views available
  • Rack view
  • Hardware type view
  • Other views can be added, working on user defined
    views
  • With the newest version (to be released soon)
  • Generic entry page displaying status overview of
    the key services
  • Configurable views
  • In development: database services monitoring with
    a database-specific view

9
Use(ful) case
Reboot occurrence history graph
  • Kernel upgrade
  • Kernel version is measured on the boot of the
    machine
  • Automatic tools for upgrading the kernel on a
    cluster retrieve information from Lemon and
    schedule reboot of a machine based on this info
  • Web interface allows monitoring of the progress

10
Computer Center display
  • Lemon Web Interface can be interfaced with a
    Computer Center database of objects (racks,
    silos, ...)
  • Provides search of objects as well as listing
  • Interfaced through an XML-defined geometry of the
    computer center
  • Generic design that can be used anywhere

<?xml version="1.0" ?>
<CC>
  <ROOM ID="0513-S-0034" DESCRIPTION="Tape Vault" R="0" G="0" B="0">
    <DOORS R="0" G="255" B="0">
      <DOOR X="63" Y="39" LX="64" LY="39" />
      <DOOR X="34" Y="0" LX="36" LY="0" />
    </DOORS>
    <RACKS R="0" G="0" B="203">
      <RACK ID="EA01" X="73" Y="9" LX="75" LY="10" PLANNED="0"/>
      <RACK ID="EA03" X="73" Y="8" LX="75" LY="9" PLANNED="0"/>
    </RACKS>
    <WALLS R="0" G="0" B="0">
      <WALL X="0" Y="0" LX="0" LY="60" />
      <WALL X="0" Y="0" LX="76" LY="0" />
    </WALLS>
    <STEPS R="255" G="163" B="0">
      <STEP X="47" Y="36" LX="52" LY="37" />
      <STEP X="47" Y="37" LX="52" LY="38" />
    </STEPS>
  </ROOM>
</CC>
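
A geometry file of this kind can be searched with a few lines of standard-library Python. The sketch below assumes the file is well-formed XML using the element and attribute names from the slide (CC, ROOM, RACKS, RACK, ID, X, Y, LX, LY). The slide does not say what X, Y, LX and LY mean, so they are kept as raw integers, and find_racks is a hypothetical helper, not part of Lemon.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the geometry from the slide (racks only).
GEOMETRY = """<?xml version="1.0" ?>
<CC>
  <ROOM ID="0513-S-0034" DESCRIPTION="Tape Vault" R="0" G="0" B="0">
    <RACKS R="0" G="0" B="203">
      <RACK ID="EA01" X="73" Y="9" LX="75" LY="10" PLANNED="0"/>
      <RACK ID="EA03" X="73" Y="8" LX="75" LY="9" PLANNED="0"/>
    </RACKS>
  </ROOM>
</CC>"""


def find_racks(xml_text):
    """Map each rack ID to its room ID and its raw integer coordinates."""
    root = ET.fromstring(xml_text)
    racks = {}
    for room in root.iter("ROOM"):
        for rack in room.iter("RACK"):
            entry = {"room": room.get("ID")}
            for attr in ("X", "Y", "LX", "LY"):
                entry[attr] = int(rack.get(attr))
            racks[rack.get("ID")] = entry
    return racks


racks = find_racks(GEOMETRY)
print(racks["EA01"])
```
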
11
Service challenges, GRID VOs
  • Lemon allows for
  • Virtual clusters
  • clusters defined on request by service managers
  • or defined by scripts updated dynamically on
    demand
  • or defined for specific purpose
  • Examples: Alice MDC, network challenges, ...
  • Clusters defined dynamically
  • Example: hosts running GRID jobs on the batch
    cluster belonging to the given Virtual
    Organization
  • hooks in Lemon for defining any dynamic grouping
    of hosts
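
The idea of a dynamically defined cluster can be sketched as a grouping function over the jobs currently running. The record shape (host name, virtual organization), the host names and the dynamic_clusters helper are all assumptions made for illustration; the slide does not describe how Lemon's grouping hooks are implemented.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Job = Tuple[str, str]  # hypothetical record: (host name, virtual organization)


def dynamic_clusters(jobs: Iterable[Job], key: Callable[[Job], str]) -> Dict[str, List[str]]:
    """Group hosts into clusters named by key(job); a host may be in several."""
    groups = defaultdict(set)
    for job in jobs:
        groups[key(job)].add(job[0])
    return {name: sorted(hosts) for name, hosts in groups.items()}


running_jobs = [
    ("lxb001", "alice"),
    ("lxb002", "cms"),
    ("lxb001", "cms"),
    ("lxb003", "alice"),
]
clusters = dynamic_clusters(running_jobs, key=lambda job: job[1])
```

Passing a different key function gives a different grouping, which is the sense in which any dynamic grouping of hosts can be defined.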

12
Automatic recovery actions and Alarms
  • Alarm Sensor
  • For defined values of measured metrics an
    actuator is called with predefined action
  • An example: ssh daemon dead, action:
    /sbin/service sshd start
  • Definition: metric X, field Y <op> reference
    value Z => call actuator
  • <op> can be =, <, >, regexp, range, etc.
  • If success, log only; else call action up to max
    times
  • Each occurrence is logged in the Monitoring
    Repository
  • Already about 70 predefined alarms with automatic
    recovery actions
  • After the first month of deployment, it reduced
    the number of problem tickets by half
  • Correlation engine (CMDaemon)
  • Allows global correlations, and in the future
    client/server alarms and recovery actions
  • Lemon Alarm gateway (LAG)
  • Lemon's LAG can be used to feed alarms into
    arbitrary alarm systems (under development)
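
The alarm definition above (metric, operator, reference value, actuator, maximum number of tries) can be sketched as a small rule evaluator. The Alarm class, the OPS table and the metric name sshd_processes are hypothetical and do not reproduce Lemon's configuration syntax. The actuator here is a stub that fails once and then succeeds; a real one would run a command such as /sbin/service sshd start.

```python
import operator
import re
from dataclasses import dataclass
from typing import Callable, List

# Comparison operators named on the slide; the table itself is an assumption.
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "regexp": lambda value, pattern: re.search(pattern, str(value)) is not None,
    "range": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


@dataclass
class Alarm:
    metric: str
    op: str
    reference: object
    actuator: Callable[[], bool]  # returns True when the recovery action worked
    max_tries: int = 3

    def raised(self, value) -> bool:
        """True when the measured value matches the alarm condition."""
        return OPS[self.op](value, self.reference)

    def handle(self, value, log: List[str]) -> bool:
        """Run the actuator up to max_tries times; log every attempt."""
        if not self.raised(value):
            return True
        for attempt in range(1, self.max_tries + 1):
            ok = self.actuator()
            log.append(f"{self.metric}: attempt {attempt} {'ok' if ok else 'failed'}")
            if ok:
                return True
        return False


outcomes = iter([False, True])  # stub actuator: first try fails, second works
log: List[str] = []
sshd_dead = Alarm("sshd_processes", "=", 0, actuator=lambda: next(outcomes))
recovered = sshd_dead.handle(0, log)
```

In this sketch the log list plays the role of the Monitoring Repository, where each occurrence is recorded.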

13
Installation and setup (I)
  • Lemon installation consists of three steps
  • Server installation
  • Client installation
  • Web interface installation
  • 1. Server installation
  • Install edg-fabricMonitoring-server rpm (flat
    file server)
  • Configure receiving port in
    /etc/edg-fmon-server.conf
  • Start the server daemon
  • 2. Client installation
  • Install edg-fabricMonitoring-agent rpm (comes
    with default metric configuration)
  • Configure server and its port in
    /etc/edg-fmon-agent.conf
  • Start the client daemon on all monitored hosts

14
Installation and setup (II)
  • 3. Web interface installation
  • Install and start apache server (with php) on
    your server
  • Install rrdtool and lrf (lemon rrd framework)
    rpms
  • Configure your clusters in clusters.conf file and
    start lemonmrd daemon
  • Drink Champagne: you have Lemon up and running!
    :-)
  • You can do all this on your laptop!
  • Possible additional components
  • Computer center synoptic view through xml file
  • Problem tracking system integration (through php
    plug-in to your DB/application)
  • Quattor CDB configuration view through CDB xml
    profiles
  • Oracle based Repository (for very large
    installations with high scalability and increased
    functionality)
  • Other, new components are easy to add
  • View detailed instructions at
    http://cern.ch/lemon/doc/installation/installation.html

15
Summary
  • Lemon serves to provide monitoring information
    about the farms in Computer Centers (or your
    laptop).
  • Lemon provides framework for recovery actions and
    alarms.
  • Lemon is easy to install (and it is easy to add
    your own metrics and visualize them).
  • It is flexible with respect to your needs: you
    can add clusters and views, and specify your own
    definition of virtual and dynamic clusters.
  • It has been a useful tool for general monitoring
    of performance and also for system administrators
    in debugging problems.
  • For more information check http://cern.ch/lemon