Nagios on certification testbed

This page describes the services monitored by the Nagios server installed on cream-17.pd.infn.it.

Install NRPE client

To monitor a service from the Nagios server on cream-17.pd.infn.it, install the NRPE client on the monitored host using this procedure:
  • yum install xinetd
    • Configure /etc/xinetd.d/nrpe
      # description: NRPE (Nagios Remote Plugin Executor)
      service nrpe
      {
              flags           = REUSE
              type            = UNLISTED
              port            = 5666
              socket_type     = stream
              wait            = no
              user            = nagios
              group           = nagios
              server          = /usr/sbin/nrpe
              server_args     = -c /etc/nagios/nrpe.cfg --inetd
              log_on_failure  += USERID
              disable         = yes
              only_from       = 127.0.0.1 193.206.210.82
      }
    • chkconfig xinetd on
    • service xinetd restart
  • yum install nrpe
    • Add to /etc/services
      # Local services
      nrpe            5666/tcp                        # NRPE
    • Changes in /etc/nagios/nrpe.cfg
      -allowed_hosts=127.0.0.1
      +allowed_hosts=193.206.210.82
      -command_timeout=60
      +command_timeout=600
    • Add the required plugins (this step depends on the service; see below)
    • chkconfig nrpe on
    • service nrpe restart

Nagios plugins

The PLUGINPATH usually depends on the architecture. In the command definitions below, replace <PLUGINPATH> with the appropriate value:

  • PLUGINPATH=/usr/lib/nagios/plugins on SL4 (i386)
  • PLUGINPATH=/usr/lib64/nagios/plugins on SL5 (x86_64)
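
The substitution can be scripted. This is a minimal sketch, assuming the architecture string comes from `uname -m` and that every machine that is not x86_64 uses the /usr/lib path; both helper names are hypothetical.

```shell
#!/bin/sh
# plugin_path_for_arch (hypothetical helper): print the plugin directory
# for a given architecture string, following the two cases listed above.
plugin_path_for_arch() {
    case "$1" in
        x86_64) echo /usr/lib64/nagios/plugins ;;
        *)      echo /usr/lib/nagios/plugins ;;   # assumption: i386 and others
    esac
}

# expand_pluginpath (hypothetical helper): replace the literal
# <PLUGINPATH> token in the lines read from standard input.
expand_pluginpath() {
    sed "s|<PLUGINPATH>|$1|g"
}
```

A command definition could then be expanded with `expand_pluginpath "$(plugin_path_for_arch "$(uname -m)")"`, feeding the lines with the <PLUGINPATH> token on standard input.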

Default checks

These checks are active on all hosts:
  • command[check_users]=<PLUGINPATH>/check_users -w 5 -c 10
  • command[check_load]=<PLUGINPATH>/check_load -w 15,10,5 -c 30,25,20
  • command[check_disk]=<PLUGINPATH>/check_disk -w 20% -c 10% -p /dev/hda1
  • command[check_zombie_procs]=<PLUGINPATH>/check_procs -w 5 -c 10 -s Z
  • command[check_total_procs]=<PLUGINPATH>/check_procs -w 150 -c 200
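
In the definitions above, -w and -c set the warning and critical thresholds, and each plugin reports its result through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below illustrates the simple "greater than" case used by check_users and check_procs. It is only an illustration with a hypothetical function name, not the plugins' own code; check_disk works the other way round and alerts when free space falls below the threshold.

```shell
#!/bin/sh
# nagios_state (hypothetical helper): compare a value with warning and
# critical thresholds and return the matching Nagios exit code.
nagios_state() {
    value="$1"; warn="$2"; crit="$3"
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}
```

With the check_users thresholds, `nagios_state 7 5 10` reports a warning because 7 logged-in users exceed the warning level of 5 but not the critical level of 10.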

Custom checks

We assume that these custom plugins are installed in the <PLUGINPATH>/gLite directory.

CREAM checks

The following checks are based on the output of a shell monitoring script installed by default on a CREAM CE. All these checks require root permission, so instead of adding the nagios user to the sudoers file we prefer to add a cron job that runs the monitoring script and saves its output to the file /tmp/cream.mon. This cron job must be installed on the client before activating these plugins:

*/1 * * * * root /opt/glite/bin/glite_cream_load_monitor --show > /tmp/cream.mon

  • command[check_tom_fd]=<PLUGINPATH>/gLite/check_tom_fd -w 300 -c 600
  • command[check_cream_cmd]=<PLUGINPATH>/gLite/check_cream_cmd -w 300 -c 600
  • command[check_cream_jobs]=<PLUGINPATH>/gLite/check_cream_jobs -w 1000 -c 3000
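
The cron line for the CREAM checks includes a user field (root), which is the format used by /etc/crontab and by files under /etc/cron.d. The following sketch installs it as a separate file; the helper name and the target file name (for example /etc/cron.d/cream-mon) are assumptions, not something this setup prescribes.

```shell
#!/bin/sh
# install_cream_cron (hypothetical helper): write the monitoring cron job
# to the file given as first argument.
install_cream_cron() {
    target="$1"    # assumed location, e.g. /etc/cron.d/cream-mon
    cat > "$target" <<'EOF'
# Dump the CREAM load figures every minute for the Nagios plugins to read.
*/1 * * * * root /opt/glite/bin/glite_cream_load_monitor --show > /tmp/cream.mon
EOF
    chmod 644 "$target"
}
```

It would be run as root on the CREAM CE, for instance with `install_cream_cron /etc/cron.d/cream-mon`.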

Batch System checks

You have to choose between PBS/Torque and LSF:

  • command[check_lsf_jobs]=<PLUGINPATH>/gLite/check_lsf_jobs -w 1000 -c 5000
or
  • command[check_pbs_jobs]=<PLUGINPATH>/gLite/check_pbs_jobs -w 1000 -c 5000

Worker Node checks

The option depends on the installed batch system:

  • command[check_wn_jobs]=<PLUGINPATH>/gLite/check_wn_jobs -b [ pbs || lsf ]
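
If the value for -b has to be chosen automatically, one possible heuristic is to look for the batch system's client commands. This sketch assumes that bjobs on the PATH means LSF and qstat means PBS/Torque; that assumption does not hold everywhere (other batch systems also ship a qstat), and the function name is hypothetical.

```shell
#!/bin/sh
# detect_batch_system (hypothetical helper): print "lsf" or "pbs"
# depending on which client command is found on the PATH.
detect_batch_system() {
    if command -v bjobs >/dev/null 2>&1; then
        echo lsf
    elif command -v qstat >/dev/null 2>&1; then
        echo pbs
    else
        echo unknown
        return 1
    fi
}
```

The result could be passed to the plugin as `check_wn_jobs -b "$(detect_batch_system)"`.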

Generic checks

  • command[check_host]=<PLUGINPATH>/gLite/check_glite_host

Cream CLI checks

This check is useful for testing a CREAM CE through direct submission with CreamCLI. To run this plugin you need to modify the NRPE configuration file on the host that runs it (i.e. the UI):

-dont_blame_nrpe=0
+dont_blame_nrpe=1
-command_timeout=600
+command_timeout=1200

  • command[creamCLI_submit]=<PLUGINPATH>/gLite/CreamCLI_Submit -H $ARG1$
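
With dont_blame_nrpe=1 the NRPE daemon accepts arguments sent by the server, and $ARG1$ is replaced by the first one (NRPE must also have been built with command-argument support). The sketch below shows how the call from the Nagios server might look, assuming check_nrpe is installed there. The helper name and the host names in the usage note are hypothetical, and the function only prints the command line so that it can be inspected before running it.

```shell
#!/bin/sh
# creamcli_check_cmd (hypothetical helper): build the check_nrpe command
# line for the creamCLI_submit check and print it without running it.
creamcli_check_cmd() {
    ui_host="$1"    # host running NRPE with the CreamCLI_Submit plugin
    ce_host="$2"    # CREAM CE to test, passed to the plugin as $ARG1$
    timeout="$3"    # seconds check_nrpe waits for the answer
    echo "check_nrpe -H ${ui_host} -t ${timeout} -c creamCLI_submit -a ${ce_host}"
}
```

For example, `creamcli_check_cmd ui.example.org ce.example.org 1200` prints a command whose -t value matches the raised command_timeout. Without -t, check_nrpe gives up after its much shorter default timeout.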

WMS checks

  • command[check_wms_services]=<PLUGINPATH>/gLite/check_wms_services
  • command[check_wms_queues]=<PLUGINPATH>/gLite/check_wms_queues
  • command[check_wms_jobs]=<PLUGINPATH>/gLite/check_wms_jobs

Useful links

-- AlessioGianelle - 2011-01-12

Topic revision: r4 - 2011-01-25 - AlessioGianelle
 