Difference: NagiosforWeNMR (1 vs. 9)

Revision 9, 2014-02-19 - MarcoVerlato

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios for WeNMR

Line: 33 to 33
 

Management instructions

Changed:
<
<
For its probes, Nagios uses Badoer's certificate, which must be renewed before it expires (it lasts one week):
 [badoer@grid-monitor03]# myproxy-init --voms enmr.eu:/enmr.eu/ops -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
>
>
To renew the proxy used by Nagios (it lasts 100 days), run the following from an EMI UI:
 [emi-ui]$  myproxy-init -l nagios -s prod-ui-02.pd.infn.it -k  NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -c 2400 -x -Z  "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
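
The value 2400 passed to -c is the credential lifetime in hours, which is where the 100 days come from. As a small sketch (the helper name days_to_hours is hypothetical, and the command is only printed, not executed), the conversion can be made explicit:

```shell
#!/bin/sh
# Convert a proxy lifetime in days to the hours expected by myproxy-init -c.
# days_to_hours is a hypothetical helper, not part of any grid tool.
days_to_hours() {
    echo $(( $1 * 24 ))
}

lifetime_hours=$(days_to_hours 100)

# Dry run: print the renewal command instead of executing it.
echo "myproxy-init -l nagios -s prod-ui-02.pd.infn.it" \
     "-k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu" \
     "-c ${lifetime_hours} -x" \
     "-Z '/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it'"
```

Changing the argument of days_to_hours is enough to request a different lifetime, subject to whatever maximum the MyProxy server allows.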
  After a yaim reconfiguration, follow these instructions:

Revision 8, 2014-02-19 - MarcoVerlato

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios for WeNMR

Line: 14 to 14
 

How to quickly add new WeNMR probes

Changed:
<
<
  • 1. In /etc/ncg-metric-config.d create the file wenmr-probes.conf with JSON-formatted directives
  • 2. In /usr/libexec/grid-monitoring/probes/wenmr/wnjob/etc/wn.d/wenmr/ edit the services.cfg and commands.cfg files
  • 3. Implement the probe in the file /usr/libexec/grid-monitoring/probes/wenmr/wnjob/probes/wenmr/
>
>
  • 1. In /etc/ncg-metric-config.d create the file wenmr-probes.conf with JSON-formatted directives
  • 2. In /usr/libexec/grid-monitoring/probes/wenmr/wnjob/etc/wn.d/wenmr/ edit the services.cfg and commands.cfg files
  • 3. Implement the probe in the file /usr/libexec/grid-monitoring/probes/wenmr/wnjob/probes/wenmr/probe_name (see e.g. gromacs probe)
 
  • 4. Ensure the following:
cat /etc/ncg/ncg-localdb.d/wenmr-custom.conf
MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--add-wntar-nag!/usr/libexec/grid-monitoring/probes/wenmr/wnjob/
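
Step 3 leaves the probe body open. As a minimal sketch, assuming the probe follows the standard Nagios plugin convention (one status line on stdout, exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN), a probe could look like the following. The function name check_executable and the tool being checked are hypothetical, and the attached gromacs probe may be structured differently.

```shell
#!/bin/sh
# Hypothetical probe skeleton: reports whether a required executable is on PATH.
# Exit codes follow the Nagios plugin convention.
STATE_OK=0
STATE_CRITICAL=2
STATE_UNKNOWN=3

check_executable() {
    # $1: name of the executable to look for (hypothetical parameter)
    if [ -z "$1" ]; then
        echo "UNKNOWN: no executable name given"
        return $STATE_UNKNOWN
    fi
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK: $1 found"
        return $STATE_OK
    fi
    echo "CRITICAL: $1 not found in PATH"
    return $STATE_CRITICAL
}

# Demonstration with a tool that is always present.
# A real probe would end with: check_executable <tool>; exit $?
check_executable sh
```

Keeping the check in a function that returns the state makes it easy to exercise the probe by hand before it is shipped to the worker nodes.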
Line: 357 to 357
 
  • set variable NAGIOS_HTTPD_ENABLE_CONFIG=false in the yaim configuration file, to prevent the HTTPS configuration from being reset after every reconfiguration

Changed:
<
<
-- MarcoVerlato - 2012-02-15
>
>
-- MarcoVerlato - 2014-02-19

META FILEATTACHMENT attachment="gromacs.txt" attr="h" comment="gromacs probe" date="1392807622" name="gromacs.txt" path="gromacs.txt" size="2600" user="MarcoVerlato" version="1"
META FILEATTACHMENT attachment="commands.cfg.txt" attr="h" comment="probes configurations" date="1392808579" name="commands.cfg.txt" path="commands.cfg.txt" size="1072" user="MarcoVerlato" version="1"
META FILEATTACHMENT attachment="services.cfg.txt" attr="h" comment="probes configurations" date="1392808579" name="services.cfg.txt" path="services.cfg.txt" size="1969" user="MarcoVerlato" version="1"
META FILEATTACHMENT attachment="wenmr-probes.conf.txt" attr="h" comment="probes configurations" date="1392808579" name="wenmr-probes.conf.txt" path="wenmr-probes.conf.txt" size="3226" user="MarcoVerlato" version="1"

Revision 7, 2013-11-22 - MarcoVerlato

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios for WeNMR

Line: 6 to 6
  Access is permitted only for enmr.eu members using a personal certificate (authorized DNs are retrieved by /etc/cron.d/voms-htpasswd and listed in the files /etc/nagios/htpasswd.users and /etc/httpd/httpd.users)
Changed:
<
<
This Nagios monitors hosts belonging to ROCs/NGIs "Italy NGI_IT NGI_NL NGI_DE UKI NGI_IBERGRID ROC_IGALC NGI_FRANCE AsiaPacific" where enmr.eu probes can be run (a real VO-Nagios is not yet available).
>
>
This Nagios monitors hosts belonging to the NGIs of Belgium, Germany, France, Italy, Spain, Portugal, Netherlands, UK, Poland, Malaysia, Taiwan and Brazil, where enmr.eu probes can be executed.
 
Changed:
<
<
Also the site ZA-UJ, uncertified or not present in GOCDB, is monitored.
>
>
Sites from South Africa and OSG will soon be monitored too.
 
Changed:
<
<
Detailed documentation and instructions about Nagios can be found here
>
>
Detailed documentation about Nagios can be found here

How to quickly add new WeNMR probes

  • 1. In /etc/ncg-metric-config.d create the file wenmr-probes.conf with JSON-formatted directives
  • 2. In /usr/libexec/grid-monitoring/probes/wenmr/wnjob/etc/wn.d/wenmr/ edit the services.cfg and commands.cfg files
  • 3. Implement the probe in the file /usr/libexec/grid-monitoring/probes/wenmr/wnjob/probes/wenmr/
  • 4. Ensure the following:
cat /etc/ncg/ncg-localdb.d/wenmr-custom.conf
MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--add-wntar-nag!/usr/libexec/grid-monitoring/probes/wenmr/wnjob/
MODIFY_METRIC_PARAMETER!org.sam.CE-JobState!--add-wntar-nag!/usr/libexec/grid-monitoring/probes/wenmr/wnjob/
/opt/glite/yaim/bin/yaim -s siteinfo/site-info.def -d 6 -c -n glite-UI -n glite-NAGIOS && service nagios restart
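
Before running yaim, the two MODIFY_METRIC_PARAMETER lines shown above can be checked mechanically. The following is a sketch under the assumption that each directive sits on its own line exactly as printed; the function name check_wntar_lines is hypothetical.

```shell
#!/bin/sh
# Verify that a localdb file carries the --add-wntar-nag directive
# for both job-state metrics. check_wntar_lines is a hypothetical helper.
PROBE_DIR=/usr/libexec/grid-monitoring/probes/wenmr/wnjob/

check_wntar_lines() {
    # $1: path to the ncg-localdb file to inspect
    missing=0
    for metric in org.sam.CREAMCE-JobState org.sam.CE-JobState; do
        line="MODIFY_METRIC_PARAMETER!${metric}!--add-wntar-nag!${PROBE_DIR}"
        # -x: whole-line match, -F: fixed string, -q: quiet
        if ! grep -qxF -e "$line" "$1"; then
            echo "missing: $line"
            missing=1
        fi
    done
    return $missing
}

# Usage on the Nagios host:
#   check_wntar_lines /etc/ncg/ncg-localdb.d/wenmr-custom.conf
```

The function returns 0 only when both directives are present, so it can gate the yaim run in a wrapper script.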
 

Revision 6, 2012-04-05 - MarcoVerlato

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios for WeNMR

Line: 8 to 8
  This Nagios monitors hosts belonging to ROCs/NGIs "Italy NGI_IT NGI_NL NGI_DE UKI NGI_IBERGRID ROC_IGALC NGI_FRANCE AsiaPacific" where enmr.eu probes can be run (a real VO-Nagios is not yet available).
Changed:
<
<
Also the sites BCBR, ZA-UCT-ICTS, ZA-UJ, uncertified or not present in GOCDB, are monitored.
>
>
Also the site ZA-UJ, uncertified or not present in GOCDB, is monitored.
  Detailed documentation and instructions about Nagios can be found here

Revision 5, 2012-03-08 - MarcoVerlato

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios for WeNMR

Line: 19 to 19
 

Management instructions

For its probes, Nagios uses Badoer's certificate, which must be renewed before it expires (it lasts one week):

Changed:
<
<
  • [badoer@grid-monitor03]# myproxy-init --voms enmr.eu:/enmr.eu/ops -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
>
>
 [badoer@grid-monitor03]# myproxy-init --voms enmr.eu:/enmr.eu/ops -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
  After a yaim reconfiguration, follow these instructions:
Line: 31 to 31
 
    • add the following line to /etc/ncg/ncg-localdb.d/uncert.conf
      • MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!ldap://bdii-wenmr.pd.infn.it:2170
    • or equally:
Changed:
<
<
      • cp /etc/ncg/ncg-localdb.d/uncert.conf.GOOD /etc/ncg/ncg-localdb.d/uncert.conf
>
>
 cp /etc/ncg/ncg-localdb.d/uncert.conf.GOOD /etc/ncg/ncg-localdb.d/uncert.conf 
 
  • Configure ncg to find sites not belonging to EGI-GOCDB on their site-BDII
    • add the following lines to /etc/ncg/ncg.conf.d/uncert.conf for each site outside EGI-GOCDB
Line: 39 to 39
 
      • ADD_HOSTS=1
      • LDAP_ADDRESS=<siteBDII>
    • or equally:
Changed:
<
<
      • cp /etc/ncg/ncg.conf.d/uncert.conf-OK-OutOfGOCDB_sites /etc/ncg/ncg.conf.d/uncert.conf
>
>
 cp /etc/ncg/ncg.conf.d/uncert.conf-OK-OutOfGOCDB_sites /etc/ncg/ncg.conf.d/uncert.conf 
  To add a site:
  • if the site is certified in EGI-GOCDB:
Line: 75 to 75
 Installation

Installed SL5 x86_64

Changed:
<
<
  • service yum stop
  • chkconfig yum off
>
>
   service yum stop
   chkconfig yum off
 
  • host certificates (''hostkey.pem'' and ''hostcert.pem'') installed in ''/etc/grid-security/''
Changed:
<
<

  • vi egi-sam.repo
    • [egi-sam]
    • name=EGI SAM repo
    • baseurl=http://repository.egi.eu/sw/production/sam/1/$basearch
    • enabled=1
    • gpgcheck=0
    • protect=1
    • priority=10

  • mv sl.repo sl.repo.disable
  • mv sl-security.repo sl-security.repo.disable
  • mv sl-fastbugs.repo sl-fastbugs.repo.disable
  • mv sl-contrib.repo sl-contrib.repo.disable

  • yum clean all
  • yum install lcg-CA
  • yum install httpd
  • yum groupinstall ig_UI_noafs
  • yum install yum-priorities
  • yum remove mysql-server-5.0.77-4.el5_5.4 mysql-5.0.77-4.el5_5.4 mysql-devel-5.0.77-4.el5_5.4 [necessary because yaim configuration wants a newer version of MySQL]
>
>
   cd /etc/yum.repos.d/
   wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo
   wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.X/glite-BDII.repo
   wget http://grid-it.cnaf.infn.it/mrepo/repos/sl5/x86_64/dag.repo
   wget http://grid-it.cnaf.infn.it/mrepo/repos/sl5/x86_64/ig.repo
   wget http://grid-it.cnaf.infn.it/mrepo/repos/sl5/x86_64/glite-ui.repo

   vi egi-sam.repo
      [egi-sam]
      name=EGI SAM repo
      baseurl=http://repository.egi.eu/sw/production/sam/1/$basearch
      enabled=1
      gpgcheck=0
      protect=1
      priority=10

   mv sl.repo sl.repo.disable
   mv sl-security.repo sl-security.repo.disable
   mv sl-fastbugs.repo sl-fastbugs.repo.disable
   mv sl-contrib.repo sl-contrib.repo.disable

   yum clean all
   yum install lcg-CA
   yum install httpd
   yum groupinstall ig_UI_noafs
   yum install yum-priorities
   yum remove mysql-server-5.0.77-4.el5_5.4 mysql-5.0.77-4.el5_5.4 mysql-devel-5.0.77-4.el5_5.4 [necessary because yaim configuration wants a newer version of MySQL]
 
  • edited /etc/yum.repos.d/dag.repo because of missing dependencies (why??) with perl-DBD-mysql-4.014-1.el5.rfx (needed by egee-NAGIOS)
Changed:
<
<

  • yum install egee-NAGIOS
  • yum install 'perl(Class::Inspector)' [needed to let Nagios update file /etc/nagios/htpasswd.users, where authorized users are listed]
>
>
   [root@grid-monitor03 ~]# cat /etc/yum.repos.d/dag.repo
      [dag]
      name=DAG rpms
      baseurl=http://ftp.scientificlinux.org/linux/extra/dag/redhat/el5/en/$basearch/dag/
      http://ftp1.scientificlinux.org/linux/extra/dag/redhat/el5/en/$basearch/dag/
      http://ftp2.scientificlinux.org/linux/extra/dag/redhat/el5/en/$basearch/dag/
      ftp://ftp.scientificlinux.org/linux/extra/dag/redhat/el5/en/$basearch/dag/
      enabled=1
      # To use priorities you must have yum-priorities installed
      priority=30
      [dag-extra]
      name=DAG extras
      baseurl=http://ftp.scientificlinux.org/linux/extra/dag/redhat/el5/en/$basearch/extras/
      enabled=1

   yum install egee-NAGIOS
   yum install 'perl(Class::Inspector)' [needed to let Nagios update file /etc/nagios/htpasswd.users, where authorized users are listed]
  Configuration

  • edit file <yaim-conf-dir>/3_2/nodes/grid-monitor03
Changed:
<
<
    • VOS="enmr.eu"

    • NAGIOS_HOST=grid-monitor03.$MY_DOMAIN
    • NAGIOS_ADMIN_DNS="/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Cristina Aiftimiei,/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi,/C=IT/O=INFN/OU=Personal Certificate/L=LNL/CN=Simone Badoer,/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Marco Verlato"
    • NCG_NAGIOS_ADMIN=simone.badoer@pd.infn.it
    • NAGIOS_ROLE=vo
    • NCG_PROBES_TYPE=all
    • NCG_VO=enmr.eu
    • NAGIOS_HTTPD_ENABLE_CONFIG=true
    • NAGIOS_NCG_ENABLE_CONFIG=true
    • NAGIOS_SUDO_ENABLE_CONFIG=true
    • NAGIOS_NAGIOS_ENABLE_CONFIG=true
    • NAGIOS_CGI_ENABLE_CONFIG=true
    • NAGIOS_NSCA_PASS=xxx

    • NAGIOS_NCG_ENABLE_CRON=true

    • NCG_TOPOLOGY_USE_SAM=true
    • NCG_TOPOLOGY_USE_GOCDB=false
    • NCG_TOPOLOGY_USE_ENOC=false
    • NCG_TOPOLOGY_USE_LDAP=false
>
>
      VOS="enmr.eu"
 
Changed:
<
<
    • NCG_REMOTE_USE_SAM=false
    • NCG_REMOTE_USE_NAGIOS=false
    • NCG_REMOTE_USE_ENOC=false

    • MYSQL_ADMIN="xxx"
    • DB_PASS="xxx"

    • MYEGI_ADMIN_NAME="Simone Badoer"
    • MYEGI_ADMIN_EMAIL="simone.badoer@pd.infn.it"
    • MYEGI_DEFAULT_PROFILE="ROC"

    • NCG_MDDB_SUPPORTED_PROFILES="ROC,ROC_CRITICAL,ROC_OPERATORS"
    • NCG_NOTIFICATION_HEADER="WeNMR Nagios"
    • NCG_INCLUDE_EMPTY_HOSTS=0
    • # Found from GOCDB:
    • NCG_GOCDB_ROC_NAME="Italy NGI_IT NGI_NL NGI_DE UKI NGI_IBERGRID ROC_IGALC"

    • # Needed for uncertified sites:
    • UNCERTIFIED_SITES="BCBR"
    • UNCERTIFIED_WMS=wms-enmr.chem.uu.nl
    • UNCERTIFIED_BDII=bdii-enmr.chem.uu.nl
>
>
      NAGIOS_HOST=grid-monitor03.$MY_DOMAIN
      NAGIOS_ADMIN_DNS="/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Cristina Aiftimiei,/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi,/C=IT/O=INFN/OU=Personal Certificate/L=LNL/CN=Simone Badoer,/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Marco Verlato"
      NCG_NAGIOS_ADMIN=simone.badoer@pd.infn.it
      NAGIOS_ROLE=vo
      NCG_PROBES_TYPE=all
      NCG_VO=enmr.eu
      NAGIOS_HTTPD_ENABLE_CONFIG=true
      NAGIOS_NCG_ENABLE_CONFIG=true
      NAGIOS_SUDO_ENABLE_CONFIG=true
      NAGIOS_NAGIOS_ENABLE_CONFIG=true
      NAGIOS_CGI_ENABLE_CONFIG=true
      NAGIOS_NSCA_PASS=xxx

      NAGIOS_NCG_ENABLE_CRON=true

      NCG_TOPOLOGY_USE_SAM=true
      NCG_TOPOLOGY_USE_GOCDB=false
      NCG_TOPOLOGY_USE_ENOC=false
      NCG_TOPOLOGY_USE_LDAP=false

      NCG_REMOTE_USE_SAM=false
      NCG_REMOTE_USE_NAGIOS=false
      NCG_REMOTE_USE_ENOC=false

      MYSQL_ADMIN="xxx"
      DB_PASS="xxx"

      MYEGI_ADMIN_NAME="Simone Badoer"
      MYEGI_ADMIN_EMAIL="simone.badoer@pd.infn.it"
      MYEGI_DEFAULT_PROFILE="ROC"

      NCG_MDDB_SUPPORTED_PROFILES="ROC,ROC_CRITICAL,ROC_OPERATORS"
      NCG_NOTIFICATION_HEADER="WeNMR Nagios"
      NCG_INCLUDE_EMPTY_HOSTS=0
      # Found from GOCDB:
      NCG_GOCDB_ROC_NAME="Italy NGI_IT NGI_NL NGI_DE UKI NGI_IBERGRID ROC_IGALC"

      # Needed for uncertified sites:
      UNCERTIFIED_SITES="BCBR"
      UNCERTIFIED_WMS=wms-enmr.chem.uu.nl
      UNCERTIFIED_BDII=bdii-enmr.chem.uu.nl

 
Changed:
<
<
  • /opt/glite/yaim/bin/ig_yaim -c -d 6 -s /usr/local/nfs/3_2/ig-site-info.def.current -n ig_UI_noafs -n glite-NAGIOS 2>&1 | tee /root/conf_ig_UI_noafs__glite-NAGIOS.`hostname -s`.`date +mHS`.log
>
>
/opt/glite/yaim/bin/ig_yaim -c -d 6 -s /usr/local/nfs/3_2/ig-site-info.def.current -n ig_UI_noafs -n glite-NAGIOS 2>&1 | tee /root/conf_ig_UI_noafs__glite-NAGIOS.`hostname -s`.`date +mHS`.log
 
  • in the yaim configuration file of prod-ui-02, changed these variables and reconfigured prod-ui-02:
Changed:
<
<
    • GRID_AUTHORIZED_RETRIEVERS="'/C=IT/O=INFN/OU=Host/L=Padova/CN=prod-ui-02.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=cert-30.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it'"
    • GRID_TRUSTED_RETRIEVERS="'/C=IT/O=INFN/OU=Host/L=Padova/CN=prod-ui-02.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=cert-30.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it'"

  • useradd badoer
  • [badoer@grid-monitor03]#myproxy-init -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
>
>
      GRID_AUTHORIZED_RETRIEVERS="'/C=IT/O=INFN/OU=Host/L=Padova/CN=prod-ui-02.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=cert-30.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it'"
      GRID_TRUSTED_RETRIEVERS="'/C=IT/O=INFN/OU=Host/L=Padova/CN=prod-ui-02.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=cert-30.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it'"

   useradd badoer
   [badoer@grid-monitor03]#myproxy-init -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
 
  • cp /etc/nagios/plugins/send_to_db.ini /etc/nagios/plugins/send_to_db.ini
  • edit /etc/nagios/plugins/send_to_db.ini changing:
Changed:
<
<
    • db_user=mrs
    • db_pwd=xxx
>
>
      db_user=mrs
      db_pwd=xxx
 

Information about old update 07

Line: 333 to 340
 
    • edited ''/etc/httpd/conf.d/ssl.conf'' changing from 443 to 50080

  • set variable NAGIOS_HTTPD_ENABLE_CONFIG=false in the yaim configuration file, to prevent the HTTPS configuration from being reset after every reconfiguration
Changed:
<
<
>
>

  -- MarcoVerlato - 2012-02-15

Revision 4, 2012-02-20 - MarcoVerlato

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios for WeNMR

Line: 64 to 64
 To authorize a user whose DN isn't automatically retrieved from VOMS to /etc/nagios/htpasswd.users:
  • copy the user's DN into a file /etc/voms2htpasswd-static.d/*.conf
Added:
>
>
To add a custom Nagios probe see here
 

Installation and configuration instructions

This documentation was followed: VO SAM

Revision 3, 2012-02-15 - MarcoVerlato

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

Nagios for WeNMR

>
>

Nagios for WeNMR

 
Changed:
<
<
WeNMR Nagios web page: https://grid-monitor03.pd.infn.it:50080/nagios/
>
>
WeNMR Nagios web page: https://grid-monitor03.pd.infn.it:50080/nagios/
  Access is permitted only for enmr.eu members using a personal certificate (authorized DNs are retrieved by /etc/cron.d/voms-htpasswd and listed in the files /etc/nagios/htpasswd.users and /etc/httpd/httpd.users)
Changed:
<
<
This Nagios monitors hosts belonging to ROCs/NGIs "Italy NGI_IT NGI_NL NGI_DE UKI NGI_IBERGRID ROC_IGALC NGI_FRANCE AsiaPacific" where enmr.eu probes can be run (a real VO-Nagios is not yet available).
>
>
This Nagios monitors hosts belonging to ROCs/NGIs "Italy NGI_IT NGI_NL NGI_DE UKI NGI_IBERGRID ROC_IGALC NGI_FRANCE AsiaPacific" where enmr.eu probes can be run (a real VO-Nagios is not yet available).
  Also the sites BCBR, ZA-UCT-ICTS, ZA-UJ, uncertified or not present in GOCDB, are monitored.
Line: 19 to 19
 

Management instructions

For its probes, Nagios uses Badoer's certificate, which must be renewed before it expires (it lasts one week):

Changed:
<
<
  • [badoer@grid-monitor03]# myproxy-init --voms enmr.eu:/enmr.eu/ops -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
>
>
  • [badoer@grid-monitor03]# myproxy-init --voms enmr.eu:/enmr.eu/ops -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
  After a yaim reconfiguration, follow these instructions:
Line: 27 to 27
 
    • /opt/glite/etc/enmr.eu/glite_wms.conf
    • /opt/glite/etc/enmr.eu/glite_wmsui.conf
Changed:
<
<
  • Add WeNMR BDII for SRM probes in sites not belonging to EGI-GOCDB
>
>
  • Add WeNMR BDII for SRM probes in sites not belonging to EGI-GOCDB
 
    • add the following line to /etc/ncg/ncg-localdb.d/uncert.conf
      • MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!ldap://bdii-wenmr.pd.infn.it:2170
    • or equally:
Line: 35 to 35
 
  • Configure ncg to find sites not belonging to EGI-GOCDB on their site-BDII
    • add the following lines to /etc/ncg/ncg.conf.d/uncert.conf for each site outside EGI-GOCDB
Changed:
<
<
      • #
>
>
      • # <GOCDB/>
 
      • ADD_HOSTS=1
Changed:
<
<
      • LDAP_ADDRESS=
>
>
      • LDAP_ADDRESS=<siteBDII>
 
    • or equally:
      • cp /etc/ncg/ncg.conf.d/uncert.conf-OK-OutOfGOCDB_sites /etc/ncg/ncg.conf.d/uncert.conf
Line: 46 to 46
 
    • edit yaim configuration file adding the NGI of the site (if not already present) in variable NCG_GOCDB_ROC_NAME.
    • reconfigure with yaim
  • if the site is present in EGI-GOCDB but not certified:
Changed:
<
<
    • edit yaim configuration file adding the site name on variable UNCERTIFIED_SITES (it should be present in TopBDII bdii-wenmr.pd.infn.it)
>
>
    • edit yaim configuration file adding the site name on variable UNCERTIFIED_SITES (it should be present in TopBDII bdii-wenmr.pd.infn.it)
 
    • reconfigure with yaim
  • if the site is not present in EGI-GOCDB
Changed:
<
<
    • edit yaim configuration file adding the site name on variable UNCERTIFIED_SITES (it should be present in TopBDII bdii-wenmr.pd.infn.it)
>
>
    • edit yaim configuration file adding the site name on variable UNCERTIFIED_SITES (it should be present in TopBDII bdii-wenmr.pd.infn.it)
 
    • reconfigure with yaim
    • edit grid-monitor03:/etc/ncg/ncg.conf.d/uncert.conf changing:
      • <NCG::SiteInfo SITE_NAME>
Changed:
<
<
      • #
>
>
      • # <GOCDB/>
      • <LDAP>
 
      • LDAP_ADDRESS=SITE_BDII
      • # ADD_HOSTS=0
      • ADD_HOSTS=1
Changed:
<
<
>
>
      • </LDAP>
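
The LDAP sub-block shown in the list above can be generated per site rather than typed by hand. This is only a sketch: the function name ldap_block and the example address are hypothetical, the exact value format expected by LDAP_ADDRESS (host only, or host and port) is not stated here, and the output is meant to be pasted inside the site's existing NCG::SiteInfo block.

```shell
#!/bin/sh
# Print the <LDAP> sub-block for one site outside EGI-GOCDB.
# ldap_block is a hypothetical helper; it only writes to stdout.
ldap_block() {
    # $1: site-BDII address, copied verbatim into LDAP_ADDRESS
    site_bdii=$1
    cat <<EOF
<LDAP>
LDAP_ADDRESS=${site_bdii}
ADD_HOSTS=1
</LDAP>
EOF
}

# Example with a placeholder address.
ldap_block site-bdii.example.org
```

Printing to stdout keeps the helper harmless: nothing under /etc/ncg is touched until the output is reviewed and copied in.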
 

To authorize a user whose DN isn't automatically retrieved from VOMS to /etc/nagios/htpasswd.users:

Line: 104 to 104
 
  • yum install httpd
  • yum groupinstall ig_UI_noafs
  • yum install yum-priorities
Changed:
<
<
  • yum remove mysql-server-5.0.77-4.el5_5.4 mysql-5.0.77-4.el5_5.4 mysql-devel-5.0.77-4.el5_5.4 [necessary because yaim configuration wants a newer version of MySQL]
>
>
  • yum remove mysql-server-5.0.77-4.el5_5.4 mysql-5.0.77-4.el5_5.4 mysql-devel-5.0.77-4.el5_5.4 [necessary because yaim configuration wants a newer version of MySQL]
 
  • edited /etc/yum.repos.d/dag.repo because of missing dependencies (why??) with perl-DBD-mysql-4.014-1.el5.rfx (needed by egee-NAGIOS)
  • [root@grid-monitor03 ~]# cat /etc/yum.repos.d/dag.repo
Line: 127 to 127
  Configuration
Changed:
<
<
  • edit file <yaim-conf-dir>/3_2/nodes/grid-monitor03
>
>
  • edit file <yaim-conf-dir>/3_2/nodes/grid-monitor03
 
    • VOS="enmr.eu"

    • NAGIOS_HOST=grid-monitor03.$MY_DOMAIN
Line: 179 to 179
 
    • GRID_TRUSTED_RETRIEVERS="'/C=IT/O=INFN/OU=Host/L=Padova/CN=prod-ui-02.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=cert-30.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it'"

  • useradd badoer
Changed:
<
<
  • [badoer@grid-monitor03]#myproxy-init -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
>
>
  • [badoer@grid-monitor03]#myproxy-init -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"
 
  • cp /etc/nagios/plugins/send_to_db.ini /etc/nagios/plugins/send_to_db.ini
  • edit /etc/nagios/plugins/send_to_db.ini changing:
Line: 234 to 234
 
    • ''ig.repo''
    • ''dag.repo'' (different versions for ig and for glite)
  • disabled (renamed with different extension) ''dag.repo'', ''sl.repo'' and ''sl-security.repo'' because it's used the option
Changed:
<
<
"event-scheduler=1" in file ''/etc/my.cnf'' available only for MySQL > 5.1.6 (default mysql was 5.0.77)
>
>
"event-scheduler=1" in file ''/etc/my.cnf'' available only for MySQL > 5.1.6 (default mysql was 5.0.77)
 
  • ''yum install httpd''
  • ''yum install lcg-CA''

Revision 2, 2012-02-15 - MarcoVerlato

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Added:
>
>

Nagios for WeNMR

 
Changed:
<
<
Test
>
>
WeNMR Nagios web page: https://grid-monitor03.pd.infn.it:50080/nagios/
 
Added:
>
>
Access is permitted only for enmr.eu members using a personal certificate (authorized DNs are retrieved by /etc/cron.d/voms-htpasswd and listed in the files /etc/nagios/htpasswd.users and /etc/httpd/httpd.users)
 
Deleted:
<
<
-- AndreaCristofori - 2012-02-15
Added:
>
>
This Nagios monitors hosts belonging to ROCs/NGIs "Italy NGI_IT NGI_NL NGI_DE UKI NGI_IBERGRID ROC_IGALC NGI_FRANCE AsiaPacific" where enmr.eu probes can be run (a real VO-Nagios is not yet available).

Also the sites BCBR, ZA-UCT-ICTS, ZA-UJ, uncertified or not present in GOCDB, are monitored.

Detailed documentation and instructions about Nagios can be found here


Information about latest update 09

Management instructions

For its probes, Nagios uses Badoer's certificate, which must be renewed before it expires (it lasts one week):

  • [badoer@grid-monitor03]# myproxy-init --voms enmr.eu:/enmr.eu/ops -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"

After a yaim reconfiguration, follow these instructions:

  • Keep only the CERM WMS, removing the other WMSes from
    • /opt/glite/etc/enmr.eu/glite_wms.conf
    • /opt/glite/etc/enmr.eu/glite_wmsui.conf

  • Add WeNMR BDII for SRM probes in sites not belonging to EGI-GOCDB
    • add the following line to /etc/ncg/ncg-localdb.d/uncert.conf
      • MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!ldap://bdii-wenmr.pd.infn.it:2170
    • or equally:
      • cp /etc/ncg/ncg-localdb.d/uncert.conf.GOOD /etc/ncg/ncg-localdb.d/uncert.conf

  • Configure ncg to find sites not belonging to EGI-GOCDB on their site-BDII
    • add the following lines to /etc/ncg/ncg.conf.d/uncert.conf for each site outside EGI-GOCDB
      • #
      • ADD_HOSTS=1
      • LDAP_ADDRESS=
    • or equally:
      • cp /etc/ncg/ncg.conf.d/uncert.conf-OK-OutOfGOCDB_sites /etc/ncg/ncg.conf.d/uncert.conf

To add a site:

  • if the site is certified in EGI-GOCDB:
    • edit yaim configuration file adding the NGI of the site (if not already present) in variable NCG_GOCDB_ROC_NAME.
    • reconfigure with yaim
  • if the site is present in EGI-GOCDB but not certified:
    • edit the yaim configuration file, adding the site name to the variable UNCERTIFIED_SITES (it should be present in the Top BDII bdii-wenmr.pd.infn.it)
    • reconfigure with yaim
  • if the site is not present in EGI-GOCDB
    • edit the yaim configuration file, adding the site name to the variable UNCERTIFIED_SITES (it should be present in the Top BDII bdii-wenmr.pd.infn.it)
    • reconfigure with yaim
    • edit grid-monitor03:/etc/ncg/ncg.conf.d/uncert.conf changing:
      • <NCG::SiteInfo SITE_NAME>
      • #
      • LDAP_ADDRESS=SITE_BDII
      • # ADD_HOSTS=0
      • ADD_HOSTS=1

To authorize a user whose DN isn't automatically retrieved from VOMS to /etc/nagios/htpasswd.users:

  • copy the user's DN into a file /etc/voms2htpasswd-static.d/*.conf

Installation and configuration instructions

This documentation was followed: VO SAM

Here are the steps executed on grid-monitor03.pd.infn.it.

Installation

Installed SL5 x86_64

  • service yum stop
  • chkconfig yum off
  • host certificates (''hostkey.pem'' and ''hostcert.pem'') installed in ''/etc/grid-security/''

  • vi egi-sam.repo
    • [egi-sam]
    • name=EGI SAM repo
    • baseurl=http://repository.egi.eu/sw/production/sam/1/$basearch
    • enabled=1
    • gpgcheck=0
    • protect=1
    • priority=10

  • mv sl.repo sl.repo.disable
  • mv sl-security.repo sl-security.repo.disable
  • mv sl-fastbugs.repo sl-fastbugs.repo.disable
  • mv sl-contrib.repo sl-contrib.repo.disable

  • yum clean all
  • yum install lcg-CA
  • yum install httpd
  • yum groupinstall ig_UI_noafs
  • yum install yum-priorities
  • yum remove mysql-server-5.0.77-4.el5_5.4 mysql-5.0.77-4.el5_5.4 mysql-devel-5.0.77-4.el5_5.4 [necessary because yaim configuration wants a newer version of MySQL]

  • yum install egee-NAGIOS
  • yum install 'perl(Class::Inspector)' [needed to let Nagios update file /etc/nagios/htpasswd.users, where authorized users are listed]
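
The egi-sam.repo file from the steps above can also be written non-interactively instead of through vi. This is a sketch: the function name write_egi_sam_repo is hypothetical, and the demonstration writes into a scratch directory rather than /etc/yum.repos.d.

```shell
#!/bin/sh
# Write the egi-sam repo definition into a target directory.
# write_egi_sam_repo is a hypothetical helper.
write_egi_sam_repo() {
    # $1: target directory (/etc/yum.repos.d on the real host)
    # The quoted 'EOF' keeps $basearch literal so that yum expands it, not the shell.
    cat > "$1/egi-sam.repo" <<'EOF'
[egi-sam]
name=EGI SAM repo
baseurl=http://repository.egi.eu/sw/production/sam/1/$basearch
enabled=1
gpgcheck=0
protect=1
priority=10
EOF
}

# Dry run into a scratch directory, then show the result.
scratch=$(mktemp -d)
write_egi_sam_repo "$scratch"
cat "$scratch/egi-sam.repo"
```

On the real host the call would be write_egi_sam_repo /etc/yum.repos.d, run as root before yum clean all.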

Configuration

  • edit file <yaim-conf-dir>/3_2/nodes/grid-monitor03
    • VOS="enmr.eu"

    • NAGIOS_HOST=grid-monitor03.$MY_DOMAIN
    • NAGIOS_ADMIN_DNS="/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Cristina Aiftimiei,/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Sergio Traldi,/C=IT/O=INFN/OU=Personal Certificate/L=LNL/CN=Simone Badoer,/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Marco Verlato"
    • NCG_NAGIOS_ADMIN=simone.badoer@pd.infn.it
    • NAGIOS_ROLE=vo
    • NCG_PROBES_TYPE=all
    • NCG_VO=enmr.eu
    • NAGIOS_HTTPD_ENABLE_CONFIG=true
    • NAGIOS_NCG_ENABLE_CONFIG=true
    • NAGIOS_SUDO_ENABLE_CONFIG=true
    • NAGIOS_NAGIOS_ENABLE_CONFIG=true
    • NAGIOS_CGI_ENABLE_CONFIG=true
    • NAGIOS_NSCA_PASS=xxx

    • NAGIOS_NCG_ENABLE_CRON=true

    • NCG_TOPOLOGY_USE_SAM=true
    • NCG_TOPOLOGY_USE_GOCDB=false
    • NCG_TOPOLOGY_USE_ENOC=false
    • NCG_TOPOLOGY_USE_LDAP=false

    • NCG_REMOTE_USE_SAM=false
    • NCG_REMOTE_USE_NAGIOS=false
    • NCG_REMOTE_USE_ENOC=false

    • MYSQL_ADMIN="xxx"
    • DB_PASS="xxx"

    • MYEGI_ADMIN_NAME="Simone Badoer"
    • MYEGI_ADMIN_EMAIL="simone.badoer@pd.infn.it"
    • MYEGI_DEFAULT_PROFILE="ROC"

    • NCG_MDDB_SUPPORTED_PROFILES="ROC,ROC_CRITICAL,ROC_OPERATORS"
    • NCG_NOTIFICATION_HEADER="WeNMR Nagios"
    • NCG_INCLUDE_EMPTY_HOSTS=0
    • # Found from GOCDB:
    • NCG_GOCDB_ROC_NAME="Italy NGI_IT NGI_NL NGI_DE UKI NGI_IBERGRID ROC_IGALC"

    • # Needed for uncertified sites:
    • UNCERTIFIED_SITES="BCBR"
    • UNCERTIFIED_WMS=wms-enmr.chem.uu.nl
    • UNCERTIFIED_BDII=bdii-enmr.chem.uu.nl

  • /opt/glite/yaim/bin/ig_yaim -c -d 6 -s /usr/local/nfs/3_2/ig-site-info.def.current -n ig_UI_noafs -n glite-NAGIOS 2>&1 | tee /root/conf_ig_UI_noafs__glite-NAGIOS.`hostname -s`.`date +mHS`.log

  • in the yaim configuration file of prod-ui-02, changed these variables and reconfigured prod-ui-02:
    • GRID_AUTHORIZED_RETRIEVERS="'/C=IT/O=INFN/OU=Host/L=Padova/CN=prod-ui-02.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=cert-30.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it'"
    • GRID_TRUSTED_RETRIEVERS="'/C=IT/O=INFN/OU=Host/L=Padova/CN=prod-ui-02.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=cert-30.pd.infn.it' '/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it'"

  • useradd badoer
  • [badoer@grid-monitor03]#myproxy-init -k NagiosRetrieve-grid-monitor03.pd.infn.it-enmr.eu -s prod-ui-02.pd.infn.it -l nagios -x -Z "/C=IT/O=INFN/OU=Host/L=Padova/CN=grid-monitor03.pd.infn.it"

  • cp /etc/nagios/plugins/send_to_db.ini /etc/nagios/plugins/send_to_db.ini
  • edit /etc/nagios/plugins/send_to_db.ini changing:
    • db_user=mrs
    • db_pwd=xxx


Information about old update 07

This Nagios monitors hosts published by Top BDII bdii-enmr.cerm.unifi.it

Detailed documentation and instructions about Nagios can be found here


Management instructions

Nothing needs to be done about the published information; a cron job keeps it up to date.

After a reconfiguration

  • ''service nagios restart''


Installation and configuration instructions

Initially this documentation was followed:

After a series of problems due to database name errors (only one database, named 'mrs', must be used, whereas in the previous links the database names are defined by the user as yaim variables; see ticket https://gus.fzk.de/ws/ticket_info.php?ticket=65594), the following documentation was used to correctly complete the first installation:

"All on one box" configuration (Nagios + NRPE + ig_UI) was installed.

Here are the steps executed on grid-monitor03.pd.infn.it.

Installation

  • host certificates (''hostkey.pem'' and ''hostcert.pem'') installed in ''/etc/grid-security/''

  • copied repo files in ''/etc/yum.repos.d/'' as described in documentation
    • ''lcg-CA.repo''
    • ''glite-BDII.repo''
    • rpmforge (''rpmforge-release-0.5.1-1.el5.rf.x86_64.rpm'')
    • sa1-release (''sa1-release-3-1.el5.noarch.rpm'')
    • ''ig.repo''
    • ''dag.repo'' (different versions for ig and for glite)
  • disabled (renamed with a different extension) ''dag.repo'', ''sl.repo'' and ''sl-security.repo'', because the option "event-scheduler=1" is used in file ''/etc/my.cnf'' and is available only for MySQL > 5.1.6 (the default MySQL was 5.0.77)

  • ''yum install httpd''
  • ''yum install lcg-CA''
  • ''yum groupinstall ig_UI_noafs''
  • ''yum install egee-NAGIOS''

Configuration

Nagios-specific variables were defined in /opt/nfs_install/3_2/nodes/grid-monitor03.pd.infn.it. In particular:

  • NAGIOS_ROLE=vo
    • it creates some databases (ATP, MDDB, MS)... don't know if really necessary...
    • it searches voms for the specified VO
  • NCG_VO=enmr.eu
  • BDII_HOST=bdii-enmr.cerm.unifi.it
    • sets the Top BDII where the sites to be monitored are found
  • NCG_LDAP_FILTER="GlueSiteUniqueID=*"
    • this is a "false" filter (* matches every site), but this variable must have a value so that yaim (config_ncg) creates the correct file ''/etc/ncg/ncg.conf'' on its own, in such a way that ncg considers only the Top BDII; if this variable is not set, ncg searches for all sites matching other parameters, for example ROC=Italy
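
To see which sites the GlueSiteUniqueID=* filter would match, the Top BDII can be queried directly with ldapsearch. The sketch below only builds and prints the command. The function name bdii_sites_query is hypothetical, and both the port 2170 and the search base o=grid are assumptions based on common BDII deployments, not values confirmed for this host.

```shell
#!/bin/sh
# Build (but do not run) an ldapsearch query listing the site IDs
# published by a Top BDII. bdii_sites_query is a hypothetical helper.
bdii_sites_query() {
    # $1: Top BDII host; port 2170 and base "o=grid" are assumptions
    echo "ldapsearch -x -LLL -H ldap://$1:2170 -b o=grid '(GlueSiteUniqueID=*)' GlueSiteUniqueID"
}

bdii_sites_query bdii-enmr.cerm.unifi.it
```

Running the printed command from a host with the OpenLDAP client tools should list one GlueSiteUniqueID per published site, which is the set ncg will consider.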

A lot of bugs had to be resolved before having a good configuration.

  • hardcoded parameter in ''/usr/share/doc/atp-1.15.6/mysql_schema/ver_1_6/increase_version.sql''
    • line 1: removed ''USE `mrs`;''
      • ATP_DB_NAME is a variable defined in ''site-info.def'' with value 'atp', not hardcoded with value 'mrs', so yaim couldn't create table atp.schema_details
    • no problems using only one DB named 'mrs' (ATP_DB_NAME=`mrs`)

  • wrong comment in ''/opt/glite/yaim/functions/config_mddb_mysql''
    • line 112: uncomment ''#mysqladmin -u root --password=${MYSQL_ADMIN} create $MDDB_DB_NAME > /dev/null 2>&1''
      • MDDB database couldn't be created
    • no problems using only one DB named 'mrs' (MDDB_DB_NAME=`mrs`)
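
The uncommenting step above could also be done non-interactively. The sketch below matches the line by its content instead of by line number, which is a choice made here so that the edit still applies if the line moves; the function name is hypothetical, and ''sed -i.bak'' is assumed to be available (it leaves a ''.bak'' copy of the original file).

```shell
# Sketch: uncomment the "mysqladmin ... create $MDDB_DB_NAME" line.
# Only the leading '#' of that line is removed; other comments are untouched.
uncomment_mddb_create() {
    file=$1
    sed -i.bak \
        's/^\([[:space:]]*\)#\(mysqladmin .* create \$MDDB_DB_NAME\)/\1\2/' \
        "$file"
}

# Example (as root, on the real host):
# uncomment_mddb_create /opt/glite/yaim/functions/config_mddb_mysql
```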

  • wrong parameter in function tableName in file ''/usr/libexec/mddb/synchronizer.php''
    • line 22: changed ''vo'' to ''atp.vo''
      • a test on a nonexistent table was attempted
    • no problems using only one DB named 'mrs'

  • hardcoded parameters in each file in directory ''/usr/share/doc/nagios2metricstore-1.0.29/DBScripts/initial/1.4/mysql/''
    • removed every instance of ''USE `mrs`;'' on every file
      • MS_DB_NAME is a variable defined in ''site-info.def'' with value 'metricstore', not hardcoded with value 'mrs', so yaim couldn't go on
    • no problems using only one DB named 'mrs' (MS_DB_NAME=`mrs`)
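
Removing the hardcoded ''USE `mrs`;'' statements from every script can be sketched as a loop over the directory. The function name and the ''*.sql'' glob are assumptions (the notes only say "every file"), and ''sed -i.bak'' is assumed to be available. The same deletion would apply to the single ATP file mentioned earlier.

```shell
# Sketch: delete every line consisting only of  USE `mrs`;  from the
# SQL scripts in a directory. A .bak copy of each file is left behind.
strip_use_mrs() {
    dir=$1
    for f in "$dir"/*.sql; do
        # The glob stays literal when nothing matches; skip that case.
        [ -f "$f" ] || continue
        sed -i.bak '/^[[:space:]]*USE `mrs`;[[:space:]]*$/d' "$f"
    done
}

# Example (as root, on the real host):
# strip_use_mrs /usr/share/doc/nagios2metricstore-1.0.29/DBScripts/initial/1.4/mysql
```

The ''.bak'' copies end up next to the originals; it may be safer to move them elsewhere if the directory is processed as a whole.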

  • undefined tables in ''/usr/share/doc/nagios2metricstore-1.0.29/DBScripts/initial/1.4/mysql/create_structure.sql''
    • created tables ''vo, metrics, service, profile'' and their dependencies, copying their definitions from the ATP database (is it wrong??)
      • these tables are required by other tables declared in the same file, because of foreign keys, for example:
        • FOREIGN KEY (vo_id )
        • REFERENCES vo (id )
    • no problems using only one DB named 'mrs'

  • undefined field in ''/usr/share/doc/nagios2metricstore-1.0.29/DBScripts/initial/1.4/mysql/create_structure.sql''
    • added field 'db_name' on table 'schema_details' copying from its definition in MDDB database
      • it's used by file ''/usr/share/doc/nagios2metricstore-1.0.29/DBScripts/initial/1.4/mysql/increase_version.sql''
    • no problems using only one DB named 'mrs'

  • LDAP error with some TOPOLOGY definition:
    • set these variables:
      • NCG_TOPOLOGY_USE_SAM=true
      • NCG_TOPOLOGY_USE_GOCDB=false
      • NCG_TOPOLOGY_USE_ENOC=false
      • NCG_TOPOLOGY_USE_LDAP=false
    • initially the values were inverted, but ncg then stopped with a blocking LDAP error whenever a host could not be reached:
      • Invoking NCG::SiteInfo::LDAP.
      • DEBUG: in NCG::SiteDB::siteName with args:
      • DEBUG: in NCG::SiteDB::siteLDAP with args:
      • Getting info from LDAP: inaf-ce-01.ct.pi2s2.it:2170/Mds-Vo-Name=GRISU-COMETA-INAF-CT, O=Grid
      • ERROR: Cannot connect to inaf-ce-01.ct.pi2s2.it:2170
      • Module NCG::SiteInfo::LDAP hit critical error, stopping NCG

  • exit with error in ''/usr/sbin/ncg.reload.sh''
    • moved 'exit 0' from line 18 to line 19, outside the innermost 'if'.
      • if the nagios service is stopped (as it is at the first configuration), ''service nagios reload'' gives an error (exit 7: reload implies stop and start, and /etc/init.d/nagios treats stopping an already stopped service as an error); yaim fails whenever the exit status is not 0

  • wrong directory in ''/opt/glite/yaim/functions/config_nagios''
    • line 266: changed ''lock_file=/var/run/nagios.pid'' to ''lock_file=/var/run/nagios/nagios.pid''
      • there was a permission denied error because the nagios daemon is executed by user nagios, but the pid file was not being created in a directory with write permission for that user

  • short(?) timeout in ''/opt/glite/yaim/functions/config_ncg''
    • lines 299 and 448: changed from ''TIMEOUT=600'' to ''#TIMEOUT=600''
      • error starting ncg; the log in /var/log/ncg.log:
        • ERROR: Could not get results from SAM: 500 Server closed connection without sending any data back
        • ERROR: Could not get list of critical metrics from SAM: 500 Server closed connection without sending any data back

After correcting the bugs, the yaim configuration command was finally run:

  • /opt/glite/yaim/bin/ig_yaim -c -d 6 -s /usr/local/nfs/3_2/ig-site-info.def.current -n ig_UI_noafs -n glite-NAGIOS 2>&1 | tee /root/yaim38.log

Post configuration

  • changed https port to make site visible outside pd.infn.it
    • edited ''/etc/httpd/conf.d/ssl.conf'' changing from 443 to 50080

  • set variable NAGIOS_HTTPD_ENABLE_CONFIG=false in the yaim configuration file, to prevent the https configuration from being reset after every reconfiguration
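
The port change in ''/etc/httpd/conf.d/ssl.conf'' can be sketched as below. The notes only say that 443 was changed to 50080; this sketch assumes the stock mod_ssl layout, where the port appears in a ''Listen 443'' line and in a ''<VirtualHost ...:443>'' line. The function name is hypothetical, and ''sed -i.bak'' is assumed to be available.

```shell
# Sketch: replace port 443 with another port in an ssl.conf-style file.
# Only the Listen directive and the VirtualHost address are rewritten;
# a .bak copy of the original file is left behind.
set_https_port() {
    conf=$1
    port=$2
    sed -i.bak \
        -e "s/^\([[:space:]]*Listen[[:space:]]\{1,\}\)443\$/\1$port/" \
        -e "s/^\(<VirtualHost[[:space:]]\{1,\}[^:>]*:\)443>/\1$port>/" \
        "$conf"
}

# Example (as root, on the real host), followed by an httpd restart:
# set_https_port /etc/httpd/conf.d/ssl.conf 50080
```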

-- MarcoVerlato - 2012-02-15

Revision 12012-02-15 - AndreaCristofori

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"

Test

-- AndreaCristofori - 2012-02-15

 