Online Monitoring with MonALISA

1
Online Monitoring with MonALISA
  • Dan Protopopescu
  • Glasgow, UK

2
MonALISA
  • Is a distributed service able to
      • collect any type of information from different
        systems
      • analyze this information in real time
      • take automated decisions and perform actions
        based on it
      • optimize workflows in complex environments
  • Read more at http://monalisa.caltech.edu

3
Uses
  • Monitoring distributed computing, i.e. Grids
  • Optimizing flow in complex systems (VRVS, optical
    cable networks)
  • ALICE also uses ML for monitoring online
    reconstruction
  • Some benchmark figures for the service
      • 800k monitored parameters at 50k
        updates/second
      • > 10k running (AliEn) jobs monitored
        simultaneously
      • > 100 WAN links
  • We are proposing ML as a high-level monitoring
    and possibly control system alongside (or on top
    of) existing slow control systems such as EPICS,
    PVSS etc.

4
Advantages
  • MonALISA is simple to install, configure and use
  • ApMon APIs are available in C, C++, Java, Python
    and Perl
  • ROOT plugin allows macros to send data directly
    to MonALISA
  • Can easily interface with (or sit on top of) any
    existing or future slow controls subsystem
    (EPICS, PVSS)
  • Data is stored in a standard PgSQL (or MySQL)
    database that can be accessed by other
    applications, independently of ML
  • Automatic data summarizing
  • Several data repositories (and hence DBs) can
    exist (local and remote)
  • Easy access via WebService (WS) from service
    and/or repository
  • Fully supported by the development team; work is
    being done in this direction
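
The slides do not say how MonALISA summarizes data, so the following is only an illustrative sketch of one common approach: averaging raw samples into fixed-width time bins. The function name `summarize`, the 60-second bin width and the sample values are assumptions made for this example, not part of MonALISA.

```python
from collections import defaultdict

def summarize(samples, bin_seconds=60):
    """Average (timestamp, value) samples into fixed-width time bins.

    Returns a list of (bin_start, mean_value) pairs sorted by bin_start.
    """
    bins = defaultdict(list)
    for timestamp, value in samples:
        # Every sample falls into the bin that starts at a multiple of bin_seconds.
        bin_start = int(timestamp // bin_seconds) * bin_seconds
        bins[bin_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(bins.items())]

# Made-up samples: three in the first minute, two in the second.
raw = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 5.0), (80, 15.0)]
print(summarize(raw))  # [(0, 20.0), (60, 10.0)]
```

Older data could be re-binned with a larger `bin_seconds` to keep long histories compact.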

5
Capabilities
  • Based on monitored information, actions can be
    taken in
      • ML Service
      • ML Repository
  • Actions can be triggered by
      • Values above/below given thresholds
      • Absence/presence of values
      • Correlations between several values
  • Possible action types
      • External command
      • Plain event logging
      • Annotation of repository charts, RSS feeds
      • Email
      • Instant messaging
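
The three trigger kinds listed above can be sketched as simple predicates. This illustrates the idea only and is not MonALISA's implementation: the function names, the parameter names (`load`, `free_mem_mb`) and all numbers are hypothetical.

```python
def above_threshold(values, name, limit):
    """Threshold trigger: fires when the latest value of `name` exceeds `limit`."""
    return name in values and values[name] > limit

def is_absent(last_seen, name, now, timeout):
    """Absence trigger: fires when `name` has not been reported for `timeout` seconds."""
    return name not in last_seen or now - last_seen[name] > timeout

def correlated(values, conditions):
    """Correlation trigger: fires only when every (name, predicate) condition holds."""
    return all(name in values and pred(values[name]) for name, pred in conditions)

# Made-up latest values and the times (in seconds) they were last reported.
values = {"load": 9.5, "free_mem_mb": 120}
last_seen = {"load": 1000, "free_mem_mb": 940}

print(above_threshold(values, "load", 8.0))
print(is_absent(last_seen, "free_mem_mb", now=1010, timeout=60))
print(correlated(values, [("load", lambda v: v > 8.0),
                          ("free_mem_mb", lambda v: v < 200)]))
```

Each predicate returns a boolean, so a dispatcher could map a firing trigger to any of the action types above (external command, log entry, email and so on).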

6
Components
[Architecture diagram. Labelled components: GUI, LUS/Proxies, Web Server, Service (x2), Repository, ApMon (x4). Annotations: "Actions based on local information", "Actions based on aggregated information", "Quick actions".]
7
Service setup
ML Service setup:
    wget http://nuclear.gla.ac.uk/protopop/ML/MonaLisa.tar.gz
    tar -zxvf MonaLisa.tar.gz
    cd MonaLisa/
    ./install.sh
    cd ../MonaLisa/Service/CMD/
    ./MLD start
8
Repository setup
ML Repository setup:
    wget http://nuclear.gla.ac.uk/protopop/ML/MLrepository.tgz
    tar -zxvf MLrepository.tgz
    # configure it
    cd MLrepository
    ./start.sh
9
ApMon setup
ApMon setup:
    wget http://nuclear.gla.ac.uk/protopop/ML/ApMon_perl.tar.gz
    tar -xzvf ApMon_perl.tar.gz
    cd ApMon_perl
    # create your script, say mysend.pl
    perl mysend.pl
10
Simple monitoring script
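monalisa@glasgow$ cat mysend.pl
    use ApMon;

    # Destination ML service; ApMon's own system monitoring is switched off
    my $apm = new ApMon({"glasgow.jlab.org:8884" =>
        {"sys_monitoring" => 0, "general_info" => 0}});
    my @pair;

    while (1) {    # loop forever
        # get values from somewhere (getmypar is user-supplied)
        @pair = getmypar("pspec_logic_ai_0");
        # cluster "Detector", node "MOR", then the parameter name/value pair
        $apm->sendParameters("Detector", "MOR", @pair);
        sleep(20);
    }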
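
Since ApMon APIs are also listed for Python, the same polling loop can be sketched in that language. To keep the sketch runnable without ApMon installed, the sender is passed in as a callable. `monitor_loop`, `read_value` and `send` are hypothetical names, and the argument shape handed to `send` is an assumption; with the real ApMon Python module one would presumably pass its send method instead, after checking its documentation for the exact signature.

```python
import time

def monitor_loop(read_value, send, cluster="Detector", node="MOR",
                 param="pspec_logic_ai_0", interval=20, iterations=None,
                 sleep=time.sleep):
    """Poll one parameter and hand it to `send` every `interval` seconds.

    iterations=None loops forever, like the Perl script; a number limits
    the loop, which is useful for a dry run. Returns the number of sends.
    """
    count = 0
    while iterations is None or count < iterations:
        value = read_value(param)            # get the value from somewhere
        send(cluster, node, {param: value})  # forward it to the monitoring service
        count += 1
        sleep(interval)
    return count

# Dry run with stand-ins: a fake reading and a sender that only records calls.
sent = []
monitor_loop(read_value=lambda name: 42.0,
             send=lambda c, n, p: sent.append((c, n, p)),
             iterations=2, sleep=lambda seconds: None)
print(sent)
```

Injecting `send` and `sleep` keeps the loop testable without a network connection or real delays.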
11
Time history
Time history example:
monalisa@glasgow$ cat mor.properties
    page=hist
    Farms=JlabML
    Clusters=Detector
    Nodes=MOR
    Functions=pspec_logic_ai_0
    ylabel=Tagger rate
    title=MOR
    annotation.groups=2
12
Web interface
13
Java GUI
14
Application control
Your custom Java client
  • ML Clients
      • TCP-based subscribe mechanism: serialized,
        compressed objects with optional encryption
  • ML Proxies
      • Application commands are encrypted
  • ML Services
      • Standard and/or user sensors and/or
        application modules

[Application control diagram. Labels: GUI client, ML Repository, Your custom view, Key, LUS, Keystore, ML Service, Your mon module, Your app module, App MonC, ApMon, Your application, bash.]
15
Alert-based Actions
  • MySQL daemon is automatically restarted when it
    runs out of memory
      • Trigger: threshold on VSZ memory usage
  • ALICE production jobs queue is automatically kept
    full by automatic resubmission
      • Trigger: threshold on the number of aliprod
        waiting jobs
  • Administrators are kept up to date on the status
    of services
      • Trigger: presence/absence of monitored
        information
      • Via instant messaging, RSS feeds, toolbar
        alerts etc.
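
The first two alerts reduce to simple threshold decisions, sketched below. This is an illustration only: the function names, the low-water/target scheme for the job queue, the restart command and all numbers are assumptions, since the slides do not give the actual thresholds or commands.

```python
def vsz_action(vsz_kb, limit_kb, restart_cmd):
    """Return the external command to run if VSZ exceeds the limit, else None."""
    return restart_cmd if vsz_kb > limit_kb else None

def jobs_to_submit(waiting_jobs, low_water, target):
    """Return how many jobs to resubmit to bring the queue back up to `target`.

    Nothing is submitted while at least `low_water` jobs are still waiting.
    Assumes target >= low_water.
    """
    if waiting_jobs >= low_water:
        return 0
    return target - waiting_jobs

# Placeholder command; the real restart command depends on the host setup.
restart = ["/etc/init.d/mysql", "restart"]

print(vsz_action(vsz_kb=3_500_000, limit_kb=3_000_000, restart_cmd=restart))
print(vsz_action(vsz_kb=1_000_000, limit_kb=3_000_000, restart_cmd=restart))
print(jobs_to_submit(waiting_jobs=150, low_water=200, target=1000))
print(jobs_to_submit(waiting_jobs=500, low_water=200, target=1000))
```

Returning the command rather than executing it keeps the decision logic separate from the side effect, so the same check can be logged, tested or wired to a different action.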
16
Summary
  • MonALISA is a very promising tool for online
    experiment monitoring and interfacing with a
    variety of slow control subsystems; GlueX are
    seriously considering ML for this task
  • Easy to configure, understand and use
  • Experience from Grid monitoring and more
  • Support from the developers group for
    implementation of new modules/features
  • Online experiment monitoring tests of CLAS@JLab
    were recently carried out; the demo repository is
    at http://mlr1.gla.ac.uk:7002

17
More examples / Extras
18
Integrated Pie Charts
19
History Plots, Annotations
20
AliEn Services Monitoring
  • AliEn services
  • Periodically checked
  • PID check, SOAP call
  • Simple functional tests
  • SE space usage
  • Efficiency

21
Job Network Traffic Monitoring
  • Based on the xrootd transfer from every job
  • Aggregated statistics for
      • Sites (incoming, outgoing, site to site,
        internal)
      • Storage Elements (incoming, outgoing)
  • Of
      • Read and written files
      • Transferred MB/s
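
The per-site aggregation described above can be sketched as follows. This is a minimal illustration, not the AliEn/MonALISA code: the function name, the record layout `(source_site, destination_site, megabytes)` and the site names are made up for the example.

```python
from collections import defaultdict

def aggregate_by_site(transfers):
    """Sum transferred MB per site, split into incoming, outgoing and internal.

    `transfers` is an iterable of (source_site, destination_site, megabytes).
    Also returns a site-to-site matrix keyed by (source, destination).
    """
    per_site = defaultdict(lambda: {"incoming": 0.0, "outgoing": 0.0, "internal": 0.0})
    site_to_site = defaultdict(float)
    for src, dst, mb in transfers:
        if src == dst:
            # Traffic that never leaves the site counts as internal only.
            per_site[src]["internal"] += mb
        else:
            per_site[src]["outgoing"] += mb
            per_site[dst]["incoming"] += mb
            site_to_site[(src, dst)] += mb
    return dict(per_site), dict(site_to_site)

# Made-up transfer records between two hypothetical sites.
transfers = [("SiteA", "SiteB", 100.0), ("SiteB", "SiteA", 40.0),
             ("SiteA", "SiteA", 25.0), ("SiteA", "SiteB", 60.0)]
per_site, matrix = aggregate_by_site(transfers)
print(per_site["SiteA"])
print(matrix[("SiteA", "SiteB")])
```

Dividing each total by the length of the reporting interval would give the MB/s rates mentioned on the slide.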

22
Individual Job Tracking
  • Based on AliEn shell commands
      • top, ps, spy, jobinfo, masterjob
  • Using the GUI ML Client
      • Status, resource usage, per job

23
Head Node Monitoring
  • Machine parameters, real-time history, load,
    memory and swap usage, processes, sockets

24
MonALISA in AliEn
  • The MonALISA framework has been used as a primary
    monitoring tool for the ALICE Grid since 2004
  • Presently the system is used for monitoring
    all (identified) services, jobs and network
    parameters necessary for Grid operation and
    debugging
  • The number of concurrently monitored and stored
    parameters today is 300,000 in 75 ML Services
  • The add-on tools for automatic event
    notification allow for a more efficient reaction
    to problems
  • The framework's design and flexibility meet all
    requirements for a monitoring system
  • The accumulated information makes it possible to
    construct and implement automated decision-making
    algorithms, thus further increasing the
    efficiency of Grid operations