Title: PARMON A Comprehensive Cluster Monitoring System A Single System Image Case Study Developer: PARMON Team Centre for Development of Advanced Computing, Bangalore, India http://www.cdacindia.com Project Leader: Rajkumar Buyya (buyya@computer.org)
1PARMON: A Comprehensive Cluster Monitoring System
A Single System Image Case Study
Developer: PARMON Team, Centre for Development of Advanced Computing, Bangalore, India
http://www.cdacindia.com
Project Leader: Rajkumar Buyya (buyya@computer.org)
2Topics of Discussion
- PARMON System Model and Architecture
- PARMON Server
- PARMON Client
- PARMON Features and Services
- PARMON Installation and its Usage
- Monitoring with PARMON
- PARMON Integration with other products
- Conclusions and Future Directions
3Motivations
- Workstation clusters have of late become a cost-effective solution for HPC.
- C-DAC's PARAM 10000 is a large cluster of more than 40 Ultra-4 workstations interconnected through low-latency, high-bandwidth communication networks.
- Monitoring such large systems is a tedious and challenging task, since typical workstations are designed to work as standalone systems rather than as part of a workstation cluster.
- System administrators require tools to monitor such large systems effectively. PARMON provides a solution to this problem.
4C-DAC HPCC Software Architecture
5PARMON Capabilities
- PARMON allows the user to monitor system activities and resource utilization of various components of workstation clusters.
- It monitors the machine at various levels: component, node, and the entire system, exhibiting a single system image.
- It allows the system administrator to monitor the following:
  - Aggregate utilization of system resources.
  - Process activities.
  - System log activities.
  - Kernel activities.
  - Multiple instances of the same resource.
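The component, node, and system levels listed above can be illustrated with a small roll-up sketch. This is not PARMON's code: the class name, the sample CPU figures, and the use of a simple unweighted mean are all assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: roll per-CPU utilization up to node and cluster level.
// Names and data are illustrative and are not taken from PARMON.
class UtilizationRollup {

    // Component level -> node level: mean utilization (percent) of one node's CPUs.
    static double nodeUtilization(List<Double> cpuPercents) {
        return cpuPercents.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
    }

    // Node level -> system level: mean of the per-node figures.
    static double clusterUtilization(Map<String, List<Double>> cpusByNode) {
        return cpusByNode.values().stream()
                .mapToDouble(UtilizationRollup::nodeUtilization)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        Map<String, List<Double>> cluster = new LinkedHashMap<>();
        cluster.put("node1", List.of(20.0, 40.0, 60.0, 80.0)); // made-up samples
        cluster.put("node2", List.of(10.0, 10.0, 10.0, 10.0));

        cluster.forEach((node, cpus) ->
                System.out.printf("%s: %.1f%%%n", node, nodeUtilization(cpus)));
        System.out.printf("cluster: %.1f%%%n", clusterUtilization(cluster));
    }
}
```

Presenting the same samples at each level is what gives the administrator a single-system view while still allowing drill-down to one node or one component.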
6PARMON - Salient Features
- Online creation of node and group databases
- Allows monitoring of system activities at the component, node, group, or entire cluster level
- Designed using state-of-the-art Java technology
- Monitoring of system components
  - CPU, memory, disk, and network
- Allows monitoring of multiple instances of the same component
- Facility for defining events and automatic notification
- Miscellaneous facilities: message broadcast, invocation of system management commands (halt, reboot, etc.), system information and configuration
- PARMON provides a GUI for initiating activities/requests and presents results graphically.
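The event definition and automatic notification facility could look roughly like the following. The slides do not describe PARMON's actual event mechanism, so the `ThresholdEvent` class, its fields, and the message format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "define an event, get notified automatically".
// PARMON's real event mechanism is not described in the slides.
class ThresholdEvent {
    private final String resource;          // e.g. "cpu" or "memory"
    private final double limitPercent;      // fire when a sample exceeds this value
    private final Consumer<String> notifier; // where notifications are delivered

    ThresholdEvent(String resource, double limitPercent, Consumer<String> notifier) {
        this.resource = resource;
        this.limitPercent = limitPercent;
        this.notifier = notifier;
    }

    // Check one sample; notify and return true when the limit is exceeded.
    boolean check(String node, double samplePercent) {
        if (samplePercent > limitPercent) {
            notifier.accept(node + ": " + resource + " at " + samplePercent
                    + "% exceeds " + limitPercent + "%");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        ThresholdEvent highCpu = new ThresholdEvent("cpu", 90.0, sent::add);
        highCpu.check("node1", 42.0); // below the limit, no notification
        highCpu.check("node2", 97.5); // above the limit, one notification
        sent.forEach(System.out::println);
    }
}
```

Passing the notifier in as a function keeps the event definition separate from the delivery channel, which might be a dialog, a log entry, or a broadcast message.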
7PARMON System Model
[Figure: the PARMON client (parmon) running on a JVM and the PARMON server (parmond) running on a Solaris node.]
8PARMON Implementation
- Server
  - Multithreaded using POSIX and Solaris threads
  - Developed in C, as it needs to access system internals
  - It is a stateless server
- Client
  - Developed in Java
  - Java features are used extensively
  - A new window is created for each client request, which interacts with the server
  - Threads are used extensively when creating online resource utilization meters
  - Dynamically reconfigures itself with changes to the node database
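To make the pairing of a stateless server with threaded client meters concrete, here is a minimal sketch. It assumes the server can be modelled as a function of (node, resource), so every request carries all the information needed to answer it. The `UtilizationMeter` class and the fake server are hypothetical and do not reflect PARMON's actual protocol or classes.

```java
import java.util.function.BiFunction;

// Hypothetical sketch of a client-side utilization meter driven by its own thread.
// The "server" is any function of (node, resource); each call is self-contained,
// mirroring a stateless request/response exchange. None of this is PARMON's API.
class UtilizationMeter implements Runnable {
    private final String node;
    private final String resource;
    private final BiFunction<String, String, Double> server;
    private final int samples;
    private final long intervalMillis;
    private volatile double latest = Double.NaN; // last value shown by the meter

    UtilizationMeter(String node, String resource,
                     BiFunction<String, String, Double> server,
                     int samples, long intervalMillis) {
        this.node = node;
        this.resource = resource;
        this.server = server;
        this.samples = samples;
        this.intervalMillis = intervalMillis;
    }

    // One stateless request: everything the server needs travels with the call.
    double pollOnce() {
        latest = server.apply(node, resource);
        return latest;
    }

    double latest() {
        return latest;
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < samples; i++) {
                System.out.printf("%s %s: %.1f%%%n", node, resource, pollOnce());
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly when interrupted
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a real server: returns a fixed value per resource.
        BiFunction<String, String, Double> fakeServer =
                (n, r) -> r.equals("cpu") ? 35.0 : 60.0;

        Thread cpu = new Thread(new UtilizationMeter("node1", "cpu", fakeServer, 3, 100));
        Thread mem = new Thread(new UtilizationMeter("node1", "memory", fakeServer, 3, 100));
        cpu.start();
        mem.start();
        cpu.join();
        mem.join();
    }
}
```

Because the server keeps no per-client state in this model, each meter thread can start, stop, or be restarted independently of the others.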
9Setting up of PARMON
- Server installation and invocation
  - Binding to port
  - Rights (requires root permission for full functionality)
  - parmond or parmond <port-no> (either at boot time or online)
  - Needs to be loaded on all nodes to be monitored
- Client installation and invocation
  - Java-based client (client machine can be any PC/workstation supporting a JVM)
  - CLASSPATH (pointing to classes.zip, parmon.jar)
  - jar file (parmon.jar)
  - java parmon or java parmon <port-no>
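Both the server and the client accept an optional port number on the command line. A minimal sketch of handling such an argument follows. The `PortArgument` class is hypothetical, and `DEFAULT_PORT` is a made-up placeholder, since the slides do not state which port PARMON uses when none is given.

```java
// Hypothetical sketch of handling an optional <port-no> argument.
// DEFAULT_PORT is a made-up placeholder; the slides do not state PARMON's default.
class PortArgument {
    static final int DEFAULT_PORT = 5000; // assumption, not PARMON's real default

    // Return the port from the first argument, or the default when none is given.
    static int resolvePort(String[] args) {
        if (args.length == 0) {
            return DEFAULT_PORT;
        }
        int port;
        try {
            port = Integer.parseInt(args[0]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port must be a number: " + args[0]);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println("Using port " + resolvePort(args));
    }
}
```

Whatever value is chosen, the client and every server it monitors need to agree on it.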
10Monitoring System Activities and Resource Utilization
11PARMON Launcher
12Creation of Node Database
13Node Deletion
14Group Creation
15Group Modification/Deletion
16Resource Utilization at a Glance
17Selection of Nodes/Group
18CPU Usage Monitoring
19Memory Usage monitoring
20Disk/Network Usage Monitoring
21Message Viewer (System logs)
22Process activities
23Kernel Data Catalog - CPU
24Kernel Data Catalog - Memory
25Kernel Data Catalog - Disk
26Kernel Data Catalog - Network
27Catalog of CPU Parameters
28Component View - Physical
29Component View - Logical
30Message Broadcast
31System Configuration
32System Information
33Issuing Commands: halt, shutdown, etc.
34Node Diagnostics - Online (SunVTS)
35Online Help
36PARMON Integration with other Products
- PARMON can send resource utilization information to any other product if protocols are made available
[Figure: parmond running on Node 1 through Node N, connected to the PARAM online bulletin board.]
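One way to picture this kind of integration is an adapter per external product, each implementing a common interface. The sketch below is an assumption about how such forwarding might be structured. The `Sink` interface, `TextLineSink`, and the line format are hypothetical and do not describe PARMON's actual integration code or the bulletin board's protocol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical adapter sketch: one Sink implementation per external product.
// Nothing here reflects PARMON's real integration code.
class UtilizationForwarding {

    // What any receiving product must accept, whatever its wire protocol.
    interface Sink {
        void publish(String node, String resource, double percent);
    }

    // Example sink that formats samples as text lines (a stand-in for a real feed).
    static class TextLineSink implements Sink {
        final List<String> lines = new ArrayList<>();

        @Override
        public void publish(String node, String resource, double percent) {
            lines.add(String.format(Locale.ROOT, "%s %s %.1f", node, resource, percent));
        }
    }

    // Fan one sample out to every registered product.
    static void forward(List<Sink> sinks, String node, String resource, double percent) {
        for (Sink sink : sinks) {
            sink.publish(node, resource, percent);
        }
    }

    public static void main(String[] args) {
        TextLineSink board = new TextLineSink();
        forward(List.of(board), "node1", "cpu", 35.0);
        forward(List.of(board), "node2", "memory", 60.0);
        board.lines.forEach(System.out::println);
    }
}
```

Supporting a new product then means adding one more `Sink` implementation once its protocol is known, with no change to the monitoring side.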
37Summary and Recent Works
- PARMON has been used successfully to monitor the PARAM OpenFrame supercomputer, a cluster of 48 Ultra-4 workstations running the Sun Solaris operating system.
- Portable across platforms supporting Java
- Comprehensive monitoring support and GUI
- PARMON supports Solaris and Linux clusters, and support for NT clusters is planned (one such implementation was carried out at UPC, Barcelona).
- It has been extended to support web-based monitoring of clusters by creating an interface server (running on a web server) between the client and the PARMON servers running on cluster nodes.
38References
- Project Team
  - Rajkumar Buyya
  - Krishna Mohan
  - Bindu Gopal
- R. Buyya, "PARMON: A Portable and Scalable Monitoring System for Clusters," International Journal on Software: Practice & Experience (SPE), John Wiley & Sons, Inc., USA, June 2000.
- Further info: http://www.buyya.com/parmon
- C-DAC: http://www.cdacindia.com
39Thank YOU