Nagios on certification testbed
This page describes the services monitored by the Nagios server installed on
cream-17.pd.infn.it.
Install NRPE client
To monitor a service on a host (from the
Nagios server on cream-17.pd.infn.it), install the NRPE client on that host with this procedure:
- yum install xinetd
- yum install nrpe
Nagios plugins
The PLUGINPATH usually depends on the architecture (in the command definitions on this page, replace <PLUGINPATH> with the value that matches your host):
- PLUGINPATH=/usr/lib/nagios/plugins on SL4 (i386)
- PLUGINPATH=/usr/lib64/nagios/plugins on SL5 (x86_64)
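The choice between the two paths could be automated along these lines. This is only a minimal sketch: plugin_path is a hypothetical helper (not part of NRPE or the Nagios plugins), and falling back to /usr/lib for every architecture other than x86_64 is an assumption.

```shell
#!/bin/sh
# plugin_path: hypothetical helper that maps an architecture name to the
# plugin directory listed above. With no argument it uses `uname -m`.
plugin_path() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64) echo "/usr/lib64/nagios/plugins" ;;
        # Assumption: every other architecture uses the 32-bit layout.
        *)      echo "/usr/lib/nagios/plugins" ;;
    esac
}

PLUGINPATH="$(plugin_path)"
echo "PLUGINPATH=$PLUGINPATH"
```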
These checks are active on all hosts:
- command[check_users]=<PLUGINPATH>/check_users -w 5 -c 10
- command[check_load]=<PLUGINPATH>/check_load -w 15,10,5 -c 30,25,20
- command[check_disk]=<PLUGINPATH>/check_disk -w 20% -c 10% -p /dev/hda1
- command[check_zombie_procs]=<PLUGINPATH>/check_procs -w 5 -c 10 -s Z
- command[check_total_procs]=<PLUGINPATH>/check_procs -w 150 -c 200
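As an illustration, the <PLUGINPATH> placeholder in the definitions above could be expanded before the lines are copied into the NRPE configuration. The sketch below assumes a hypothetical helper named render_commands; it also drops the leading "- " list marker, which is part of this page's formatting and not of the configuration syntax.

```shell
#!/bin/sh
# render_commands: hypothetical helper. Reads command definitions on stdin,
# strips the "- " list marker and replaces <PLUGINPATH> with the given path.
render_commands() {
    path="$1"
    sed -e 's/^- *//' -e "s|<PLUGINPATH>|$path|g"
}

# Example with two of the definitions listed above, for an x86_64 host.
render_commands /usr/lib64/nagios/plugins <<'EOF'
- command[check_users]=<PLUGINPATH>/check_users -w 5 -c 10
- command[check_zombie_procs]=<PLUGINPATH>/check_procs -w 5 -c 10 -s Z
EOF
```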
Custom checks
We assume that these custom plugins are installed in the <PLUGINPATH>/gLite directory.
The following checks are based on the output of a shell monitoring script installed by default on a CREAM CE. All these checks require root permission, so instead of adding the nagios user to the sudoers file we prefer to add a cron job that runs the monitoring script and saves its output to the file /tmp/cream.mon. This cron job must be installed on the client before activating these plugins:
*/1 * * * * root /opt/glite/bin/glite_cream_load_monitor --show > /tmp/cream.mon
- command[check_tom_fd]=<PLUGINPATH>/gLite/check_tom_fd -w 300 -c 600
- command[check_cream_cmd]=<PLUGINPATH>/gLite/check_cream_cmd -w 300 -c 600
- command[check_cream_jobs]=<PLUGINPATH>/gLite/check_cream_jobs -w 1000 -c 3000
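Because these plugins read /tmp/cream.mon rather than running the monitoring script themselves, a stale file would go unnoticed if the cron job stopped. A freshness check could look like the sketch below. check_mon_fresh is a hypothetical helper and not one of the gLite plugins, the 180-second limit is an assumption, and the return values follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```shell
#!/bin/sh
# check_mon_fresh: hypothetical helper, not part of the gLite plugins.
# Returns 2 (CRITICAL) when the file is missing or older than max_age
# seconds, and 0 (OK) otherwise.
check_mon_fresh() {
    file="${1:-/tmp/cream.mon}"
    max_age="${2:-180}"   # assumed limit: three missed one-minute cron runs
    if [ ! -f "$file" ]; then
        echo "CRITICAL: $file not found"
        return 2
    fi
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")   # GNU stat syntax (Linux)
    age=$((now - mtime))
    if [ "$age" -gt "$max_age" ]; then
        echo "CRITICAL: $file is $age seconds old"
        return 2
    fi
    echo "OK: $file updated $age seconds ago"
    return 0
}
```

It would be called as check_mon_fresh /tmp/cream.mon 180, for example from an additional NRPE command definition.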
You have to choose between PBS/Torque and LSF:
- command[check_lsf_jobs]=<PLUGINPATH>/gLite/check_lsf_jobs -w 1000 -c 5000
or
- command[check_pbs_jobs]=<PLUGINPATH>/gLite/check_pbs_jobs -w 1000 -c 5000
The value of the -b option depends on the installed batch system:
- command[check_wn_jobs]=<PLUGINPATH>/gLite/check_wn_jobs -b [ pbs || lsf ]
- command[check_host]=<PLUGINPATH>/gLite/check_glite_host
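The two batch-system-dependent entries above (the job count check and the -b option of check_wn_jobs) can be generated together, so that both always refer to the same batch system. This is a sketch only: batch_commands is a hypothetical helper, and it assumes that -b takes the literal value pbs or lsf.

```shell
#!/bin/sh
# batch_commands: hypothetical helper that prints the NRPE entries which
# depend on the batch system. Arguments: batch system (pbs or lsf) and
# the plugin path.
batch_commands() {
    batch="$1"
    path="$2"
    case "$batch" in
        pbs|lsf) ;;
        *) echo "unknown batch system: $batch" >&2; return 1 ;;
    esac
    echo "command[check_${batch}_jobs]=$path/gLite/check_${batch}_jobs -w 1000 -c 5000"
    # Assumption: the -b option accepts the same name, "pbs" or "lsf".
    echo "command[check_wn_jobs]=$path/gLite/check_wn_jobs -b $batch"
}

batch_commands pbs /usr/lib64/nagios/plugins
```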
The following check is useful to test a CREAM CE through direct submission with the CREAM CLI. To run this plugin you need to modify the NRPE configuration file of the host that runs it (i.e. the UI):
-dont_blame_nrpe=0
+dont_blame_nrpe=1
-command_timeout=600
+command_timeout=1200
- command[creamCLI_submit]=<PLUGINPATH>/gLite/CreamCLI_Submit -H $ARG1$
- command[check_wms_services]=<PLUGINPATH>/gLite/check_wms_services
- command[check_wms_queues]=<PLUGINPATH>/gLite/check_wms_queues
- command[check_wms_jobs]=<PLUGINPATH>/gLite/check_wms_jobs
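The NRPE configuration change required by the creamCLI_submit check (dont_blame_nrpe and command_timeout) could be applied with a small script such as the one below. It is a sketch under assumptions: enable_nrpe_args is a hypothetical helper, the configuration file location varies between installations, and both settings are assumed to be already present and uncommented in the file.

```shell
#!/bin/sh
# enable_nrpe_args: hypothetical helper that edits an NRPE configuration
# file in place (GNU sed), allowing command arguments and raising the
# command timeout to the values given on this page.
enable_nrpe_args() {
    cfg="$1"
    sed -i \
        -e 's/^dont_blame_nrpe=.*/dont_blame_nrpe=1/' \
        -e 's/^command_timeout=.*/command_timeout=1200/' \
        "$cfg"
}

# Assumed usage on the UI (the path is an assumption, adjust as needed):
#   enable_nrpe_args /etc/nagios/nrpe.cfg
# The Nagios server can then pass the CE host name as $ARG1$, e.g.:
#   check_nrpe -H <UI host> -t 1200 -c creamCLI_submit -a <CE host>
```

NRPE has to be restarted (or xinetd reloaded) for the change to take effect, and argument passing only works if the NRPE daemon was built with command argument support.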
Useful links
-- AlessioGianelle - 2011-01-12