(2009-06-16, DanieleCesini)
---+++ WMSMonitor DB-Analyzer (Available from release 2.0)

The DB analyzer is a daemon that periodically checks the WMSMonitor database for new data, keeps track of the status of every monitored instance and notifies that status to a NAGIOS server, which must be configured to accept WMSMonitor notifications. The main purpose of the DB analyzer is to send notifications to NAGIOS, which in turn can send email and interact with the SMS gateway while keeping a database of the alarm history. <br />This is a cheap way to implement a robust notification service for WMSMonitor. We are however working on a standalone notification service for the DB analyzer.

It is also possible to specify groups of instances, so that special notifications are sent to NAGIOS about the whole group and not only about the single instances.<br />This is particularly convenient when a site has multiple instances dedicated to one VO.

All the executables needed to start the DB analyzer are already present on the data_collector, under the usual directory /root/wmsmon<br />It uses the same wmsmon_site-info.def file to obtain the DB connection parameters.<br />

---++++++ Configuration and start of the DB Analyzer

The DB analyzer is implemented in Python and many parameters are still hardcoded in the Python executable, so they must be modified by editing the executable itself. We will provide an installation script in the next WMSMonitor releases.

The analyzer sends NAGIOS notifications for a service named MON-WMS or MON-LB, which should be configured in NAGIOS as a service of the WMS (LB) host. Typical notifications look like the following (where gstore.cnaf.infn.it is the NAGIOS server):

In case of an LB:

_echo "lb010;MON-LB;0;lb010.cnaf.infn.it STATUS is OK - " | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg_

In case of a WMS:

_echo "devel14;MON-WMS;2;devel14.cnaf.infn.it STATUS is CRITICAL - At least daemon LM is dead!" | send_nsca -H gstore.cnaf.infn.it -d ';' -c /etc/nagios/send_nsca.cfg_

Four kinds of notification can be sent for any single instance: *OK, WARNING, CRITICAL, UNKNOWN*, defined as follows:

   * *OK*: no problem found in the DB for that specific instance
   * *WARNING*: problems were found but they are not critical, e.g. the queues of internal WMS/LB components are growing but are not yet too long, or a file system occupancy is between 80% and 90%
   * *CRITICAL*: something bad was found in the DB about the instance, e.g. the queues of internal WMS/LB components hold more than 3000 entries, or a file system occupancy is greater than 90%
   * *UNKNOWN*: the latest data about an instance are too old to give a reliable status

_NOTE that the analyzer is able to associate an LB to a WMS from the information stored in the DB. The status of the LB affects the status of the WMS, but not vice versa. If the LB is in WARNING and the WMS itself is OK, the notification for the WMS will be WARNING. The worse of the WMS status and the LB status is notified for the WMS._

NAGIOS should be configured to handle all these notifications. For example, the CNAF NAGIOS is configured to notify via mail every status change on any instance.

The DB analyzer also sends notifications about *groups of instances.* *Groups* are discovered from the WMSMonitor DB and reflect the group reported in the third column of the wmslist.conf file. Notifications are sent to NAGIOS for each group following these rules:

   * *OK*: no problem found in the DB for that specific group
   * *WARNING*: less than 50% of the group instances are in critical status
   * *CRITICAL*: more than 50% of the group instances are in critical status
   * *UNKNOWN*: the latest data about an instance are too old to have a reliable status

It is possible to configure subgroups for any group by editing the file /root/groupfile. For example, to create the subgroups ANALYSIS and PROD for the CMS VO, the groupfile looks like:

#cat /root/groupfile <br />wms001.your_domain cms PROD<br />wms002.your_domain cms ANALYSIS<br />wms003.your_domain cms ANALYSIS<br />wms004.your_domain cms PROD<br />wms005.your_domain cms ANALYSIS

In this way notifications are sent for each subgroup and not for the group itself. By default they are sent for NAGIOS services called GROUP-SUBGROUP-WMS belonging to the WMSMonitor server host. As for single instances, NAGIOS should be configured to handle (sub)group notifications.

Before starting the analyzer you should configure the hostname of the NAGIOS server. This must be done by hand, editing the file /root/wmsmon/bin/analyzer-utils.py and replacing the string "gstore.cnaf.infn.it" with your NAGIOS server hostname.

Now you are ready to start the analyzer as a normal Linux background process:

#/root/wmsmon/bin/wmsmon-db-analyzer.py > /var/log/wmsmon-db-analyzer.log 2>&1 &

NOTE that the analyzer logs to stdout. It is advisable to set up log rotation for /var/log/wmsmon-db-analyzer.log. You just need to add the following lines to /etc/wmsmon_logrotate.conf:

/var/log/wmsmon-db-analyzer.log {<br /> copytruncate<br /> rotate 10<br /> size = 100M<br /> missingok<br /> nomail<br />}

In case of problems running the analyzer please contact wmsmon<at>cnaf.infn.it.
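
The status rules and the notification format described above can be sketched in Python as follows. This is only an illustration and not the code shipped in /root/wmsmon/bin: the function names (=fs_status=, =wms_status=, =group_status=, =nsca_line=, =send_nsca_argv=) are hypothetical, the short host name is assumed to be the first label of the FQDN (as in the lb010 and devel14 examples), and the handling of boundary cases (exactly 80%, 90% or 50%) is an assumption, since this page does not define it. The numeric codes 0 and 2 appear in the examples above; 1 and 3 are the usual Nagios codes for WARNING and UNKNOWN.

```python
# Nagios status codes: 0 and 2 appear in the examples on this page,
# 1 (WARNING) and 3 (UNKNOWN) are the standard Nagios values.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING",
                CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def fs_status(occupancy_percent):
    """Classify a file system occupancy with the 80% / 90% thresholds.

    Assumption: exactly 80% and exactly 90% are both treated as WARNING.
    """
    if occupancy_percent > 90:
        return CRITICAL
    if occupancy_percent >= 80:
        return WARNING
    return OK


def wms_status(own_status, lb_status):
    """Return the worse of the WMS status and its LB status.

    The page does not say how UNKNOWN combines, so this sketch
    only accepts OK, WARNING and CRITICAL.
    """
    for status in (own_status, lb_status):
        if status not in (OK, WARNING, CRITICAL):
            raise ValueError("unsupported status: %r" % (status,))
    return max(own_status, lb_status)


def group_status(instance_statuses):
    """Apply the 50% rule to the statuses of a group of instances.

    Assumptions: exactly 50% critical counts as WARNING, and any
    non-OK instance is enough to raise the group to WARNING.
    """
    statuses = list(instance_statuses)
    critical = sum(1 for status in statuses if status == CRITICAL)
    if critical * 2 > len(statuses):
        return CRITICAL
    if any(status != OK for status in statuses):
        return WARNING
    return OK


def nsca_line(fqdn, service, status, detail=""):
    """Build the ';'-separated line that is piped to send_nsca."""
    short_name = fqdn.split(".")[0]  # assumed from the examples above
    return "%s;%s;%d;%s STATUS is %s - %s" % (
        short_name, service, status, fqdn, STATUS_NAMES[status], detail)


def send_nsca_argv(nagios_host):
    """Argument list matching the send_nsca invocation shown above."""
    return ["send_nsca", "-H", nagios_host, "-d", ";",
            "-c", "/etc/nagios/send_nsca.cfg"]


if __name__ == "__main__":
    status = wms_status(fs_status(85), OK)
    print(nsca_line("devel14.cnaf.infn.it", "MON-WMS", status))
    print(" ".join(send_nsca_argv("gstore.cnaf.infn.it")))
```

To actually deliver a notification, the line returned by =nsca_line= would be written to the standard input of the =send_nsca= command, exactly as the =echo ... | send_nsca= examples do.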
Topic revision: r5 - 2009-06-16
-
DanieleCesini