Notes about Installation and Configuration of a WN - EMI-2 - SL6 (torque, mpi, glexec)

  • These notes are provided by site admins on a best-effort basis as a contribution to the IGI communities and MUST NOT be considered a substitute for the official IGI documentation.
  • This document is addressed to site administrators responsible for middleware installation and configuration.
  • The goal of this page is to provide some hints and examples on how to install and configure an EMI-2 WN service, with TORQUE as the batch system installed on a different host, ARGUS and GLEXEC for user authorization, and MPI enabled.

WN         BATCH SYSTEM  ARGUS   MPI      WNODES
EMI-2 SL6  torque        glexec  enabled  TODO

References

  1. About IGI - Italian Grid infrastructure
    1. About IGI Release
    2. IGI Official Installation and Configuration guide
  2. EMI-2 Release
    1. EMI-WN
    2. glite-MPI
  3. Yaim Guide
    1. site-info.def yaim variables
    2. MPI yaim variables
    3. WN yaim variables
    4. TORQUE Yaim variables
  4. MPI-Start Installation and Configuration
  5. Troubleshooting Guide for Operational Errors on EGI Sites
  6. Grid Administration FAQs page

Service installation

O.S. and Repos

  • Start from a fresh installation of Scientific Linux 6.x (x86_64).
# cat /etc/redhat-release 
Scientific Linux release 6.2 (Carbon)

  • Install the additional repositories: EPEL, Certification Authority, EMI-2

# yum install yum-priorities yum-protectbase epel-release
# rpm -ivh http://emisoft.web.cern.ch/emisoft/dist/EMI/2/sl6/x86_64/base/emi-release-2.0.0-1.sl6.noarch.rpm

# cd /etc/yum.repos.d/
# wget http://repo-pd.italiangrid.it/mrepo/repos/egi-trustanchors.repo

  • Make sure that SELinux is disabled (or permissive). The current mode can be checked with getenforce:

# getenforce 
Disabled
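
If getenforce reports Enforcing instead, the mode has to be changed both at runtime (setenforce 0) and in the persistent configuration. The following is only a minimal sketch, assuming the standard /etc/selinux/config layout of SL6; the helper name set_selinux_mode is hypothetical and not part of any package.

```shell
# Hypothetical helper: rewrite the SELINUX= line of an SELinux config file.
# Usage: set_selinux_mode <config-file> <enforcing|permissive|disabled>
set_selinux_mode() {
    local cfg="$1" mode="$2"
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 2 ;;
    esac
    [ -w "$cfg" ] || { echo "cannot write $cfg" >&2; return 1; }
    # Replace only the SELINUX= line; SELINUXTYPE= is left untouched.
    sed -i "s/^SELINUX=.*/SELINUX=$mode/" "$cfg"
}
```

A possible use is set_selinux_mode /etc/selinux/config permissive followed by setenforce 0 for the running system; switching to disabled only takes effect after a reboot.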

yum install

# yum clean all 

#  yum install ca-policy-egi-core emi-wn emi-torque-utils glite-mpi emi-glexec_wn openmpi openmpi-devel mpich2 mpich2-devel emi-torque-client

Service configuration

Copy the yaim example configuration files to another path, for example /root (the command below copies them into the current directory), and edit them as described in the following sections:
# cp -vr /opt/glite/yaim/examples/siteinfo .

host certificate

# ll /etc/grid-security/host*
-rw-r--r-- 1 root root 1440 Oct 18 09:31 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 18 09:31 /etc/grid-security/hostkey.pem
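
The listing above shows the expected modes (644 for the certificate, 400 for the key). As a rough sketch, a check like the following could be scripted before running yaim; check_mode is a hypothetical helper name, and GNU stat is assumed.

```shell
# Hypothetical helper: succeed only if <file> exists with exactly the octal mode <mode>.
# Usage: check_mode <file> <mode>
check_mode() {
    local file="$1" want="$2" have
    [ -e "$file" ] || { echo "MISSING $file"; return 1; }
    have=$(stat -c '%a' "$file")
    if [ "$have" = "$want" ]; then
        echo "OK $file ($have)"
    else
        echo "BAD $file (mode $have, expected $want)"
        return 1
    fi
}
```

For example: check_mode /etc/grid-security/hostcert.pem 644 and check_mode /etc/grid-security/hostkey.pem 400.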

vo.d directory

Create the directory siteinfo/vo.d and fill it with one file for each supported VO. You can download them from HERE; an example for some VOs follows. Information about the individual VOs is available at the CENTRAL OPERATIONS PORTAL.
# cat /root/siteinfo/vo.d/comput-er.it
SW_DIR=$VO_SW_DIR/computer
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/computer
VOMS_SERVERS="'vomss://voms2.cnaf.infn.it:8443/voms/comput-er.it?/comput-er.it'"
VOMSES="'comput-er.it voms2.cnaf.infn.it 15007 /C=IT/O=INFN/OU=Host/L=CNAF/CN=voms2.cnaf.infn.it comput-er.it' 'comput-er.it voms-02.pd.infn.it 15007 /C=IT/O=INFN/OU=Host/L=Padova/CN=voms-02.pd.infn.it comput-er.it'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA' '/C=IT/O=INFN/CN=INFN CA'"

# cat /root/siteinfo/vo.d/dteam
SW_DIR=$VO_SW_DIR/dteam
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/dteam
VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
VOMSES="'dteam lcg-voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch dteam 24' 'dteam voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch dteam 24' 'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"

# cat /root/siteinfo/vo.d/gridit
SW_DIR=$VO_SW_DIR/gridit
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/gridit
VOMS_SERVERS="'vomss://voms.cnaf.infn.it:8443/voms/gridit?/gridit' 'vomss://voms-01.pd.infn.it:8443/voms/gridit?/gridit'"
VOMSES="'gridit voms.cnaf.infn.it 15008 /C=IT/O=INFN/OU=Host/L=CNAF/CN=voms.cnaf.infn.it gridit' 'gridit voms-01.pd.infn.it 15008 /C=IT/O=INFN/OU=Host/L=Padova/CN=voms-01.pd.infn.it gridit'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA' '/C=IT/O=INFN/CN=INFN CA'"

# cat /root/siteinfo/vo.d/igi.italiangrid.it
SW_DIR=$VO_SW_DIR/igi
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/igi
VOMS_SERVERS="'vomss://vomsmania.cnaf.infn.it:8443/voms/igi.italiangrid.it?/igi.italiangrid.it'"
VOMSES="'igi.italiangrid.it vomsmania.cnaf.infn.it 15003 /C=IT/O=INFN/OU=Host/L=CNAF/CN=vomsmania.cnaf.infn.it igi.italiangrid.it'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA'"

# cat /root/siteinfo/vo.d/infngrid
SW_DIR=$VO_SW_DIR/infngrid
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/infngrid
VOMS_SERVERS="'vomss://voms.cnaf.infn.it:8443/voms/infngrid?/infngrid' 'vomss://voms-01.pd.infn.it:8443/voms/infngrid?/infngrid'"
VOMSES="'infngrid voms.cnaf.infn.it 15000 /C=IT/O=INFN/OU=Host/L=CNAF/CN=voms.cnaf.infn.it infngrid' 'infngrid voms-01.pd.infn.it 15000 /C=IT/O=INFN/OU=Host/L=Padova/CN=voms-01.pd.infn.it infngrid'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA' '/C=IT/O=INFN/CN=INFN CA'"

# cat /root/siteinfo/vo.d/ops
SW_DIR=$VO_SW_DIR/ops
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/ops
VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
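
Since every supported VO needs its own file in vo.d, it can help to compare the directory against the VOS list used in site-info.def. The sketch below assumes that each file is named exactly like the VO, as in the examples above; missing_vo_files is a hypothetical helper name.

```shell
# Hypothetical helper: print the VOs named in <vos> that lack a file in <vo.d dir>.
# Returns non-zero if at least one file is missing.
# Usage: missing_vo_files <vo.d dir> "<space-separated VO list>"
missing_vo_files() {
    local vo_dir="$1" vos="$2" vo rc=0
    for vo in $vos; do
        if [ ! -f "$vo_dir/$vo" ]; then
            echo "$vo"
            rc=1
        fi
    done
    return $rc
}
```

For example: missing_vo_files /root/siteinfo/vo.d "comput-er.it dteam igi.italiangrid.it infngrid ops gridit" prints nothing when all six files are present.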

users and groups

You can download them from HERE.

Munge

Copy the key /etc/munge/munge.key from the Torque server to every host of your cluster, adjust its ownership and permissions, and start the service:
# chown munge:munge /etc/munge/munge.key
# chmod 400 /etc/munge/munge.key

# ls -ltr /etc/munge/
total 4
-r-------- 1 munge munge 1024 Jan 13 14:32 munge.key

# chkconfig munge on
# /etc/init.d/munge restart
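
One way to confirm that the key was copied intact is to compare its checksum with the one computed on the Torque server. This is only a sketch; key_matches is a hypothetical helper name, and sha256sum from coreutils is assumed to be available on both hosts.

```shell
# Hypothetical helper: succeed if the SHA-256 checksum of <file> equals <expected>.
# Usage: key_matches <file> <expected-sha256>
key_matches() {
    local file="$1" expected="$2" actual
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}
```

On the server, run sha256sum /etc/munge/munge.key and pass the resulting hash to key_matches on each WN. Once the daemon is running, a local encode/decode round trip with munge -n | unmunge is another common check.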

site-info.def

KISS: Keep it simple, stupid! For your convenience there is an explanation of each yaim variable. For more details look HERE.

SUGGESTION: use the same site-info.def for CREAM and the WNs. For this reason the example file below contains yaim variables used by CREAM, TORQUE or emi-WN.

# cat site-info.def 
CE_HOST=cream-01.cnaf.infn.it
SITE_NAME=IGI-BOLOGNA

BATCH_SERVER=batch.cnaf.infn.it
BATCH_LOG_DIR=/var/torque

BDII_HOST=egee-bdii.cnaf.infn.it

CE_BATCH_SYS=torque
JOB_MANAGER=pbs
BATCH_VERSION=torque-2.5.7
#CE_DATADIR=

CE_INBOUNDIP=FALSE
CE_OUTBOUNDIP=TRUE
CE_OS="ScientificSL"
CE_OS_RELEASE=6.2
CE_OS_VERSION="Carbon"

CE_RUNTIMEENV="IGI-BOLOGNA"

CE_PHYSCPU=8
CE_LOGCPU=16
CE_MINPHYSMEM=16000
CE_MINVIRTMEM=32000
CE_SMPSIZE=8
CE_CPU_MODEL=Xeon
CE_CPU_SPEED=2493
CE_CPU_VENDOR=intel
CE_CAPABILITY="CPUScalingReferenceSI00=1039 glexec"
CE_OTHERDESCR="Cores=1,Benchmark=4.156-HEP-SPEC06"
CE_SF00=951
CE_SI00=1039
CE_OS_ARCH=x86_64

CREAM_PEPC_RESOURCEID="http://cnaf.infn.it/cremino"

USERS_CONF=/root/siteinfo/ig-users.conf
GROUPS_CONF=/root/siteinfo/ig-groups.conf

VOS="comput-er.it dteam igi.italiangrid.it infngrid ops gridit"
QUEUES="cert prod"
CERT_GROUP_ENABLE="dteam infngrid ops /dteam/ROLE=lcgadmin /dteam/ROLE=production /ops/ROLE=lcgadmin /ops/ROLE=pilot /infngrid/ROLE=SoftwareManager /infngrid/ROLE=pilot"
PROD_GROUP_ENABLE="comput-er.it gridit igi.italiangrid.it /comput-er.it/ROLE=SoftwareManager /gridit/ROLE=SoftwareManager /igi.italiangrid.it/ROLE=SoftwareManager"
VO_SW_DIR=/opt/exp_soft

WN_LIST="/root/siteinfo/wn-list.conf"
MUNGE_KEY_FILE=/etc/munge/munge.key
CONFIG_MAUI="no"

MYSQL_PASSWORD=*********************************
APEL_DB_PASSWORD=not_used
APEL_MYSQL_HOST=not_used
SE_LIST="darkstorm.cnaf.infn.it"
SE_MOUNT_INFO_LIST="none"

For your convenience there is an explanation of each yaim variable. For more details look at [6, 7, 8, 9]
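
Before invoking yaim, a quick scan for variables that are easy to forget can save a round trip. The sketch below only looks for NAME= at the start of a line and does not validate values, so it is no replacement for the yaim check itself; missing_vars is a hypothetical helper name.

```shell
# Hypothetical helper: print each variable name from the argument list that has
# no "NAME=" assignment at the start of a line in <file> (commented-out lines
# do not count). Returns non-zero if at least one is missing.
# Usage: missing_vars <file> NAME...
missing_vars() {
    local file="$1" name rc=0
    shift
    for name in "$@"; do
        if ! grep -q "^${name}=" "$file"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}
```

For example: missing_vars site-info.def BATCH_SERVER CE_HOST VOS QUEUES USERS_CONF GROUPS_CONF WN_LIST.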

services/glite-mpi

In the following example, support for MPICH2 and OPENMPI is enabled; moreover, the WNs are configured to use shared homes.

MPI_MPICH_ENABLE="no"
MPI_MPICH2_ENABLE="yes"
MPI_OPENMPI_ENABLE="yes"
MPI_LAM_ENABLE="no"

#MPI_MPICH_PATH="/opt/mpich-1.2.7p1/"
#MPI_MPICH_VERSION="1.2.7p1"
MPI_MPICH2_PATH="/usr/lib64/mpich2/bin"
MPI_MPICH2_VERSION="1.2.1"
MPI_OPENMPI_PATH="/usr/lib64/openmpi/bin/"
MPI_OPENMPI_VERSION="1.5.4"
#MPI_LAM_VERSION="7.1.2"

# Most versions of MPI now distribute their own versions of mpiexec
# However, I had some problems with the MPICH2 version - so use standard mpiexec
MPI_MPICH_MPIEXEC="/usr/bin/mpiexec"
MPI_MPICH2_MPIEXEC="/usr/bin/mpiexec"
MPI_OPENMPI_MPIEXEC="/usr/lib64/openmpi/bin/mpiexec"

#########  MPI_SHARED_HOME section
# Set this variable to one of the following:
# MPI_SHARED_HOME="no" if a shared directory is not used
# MPI_SHARED_HOME="yes" if the HOME directory area is shared
# MPI_SHARED_HOME="/Path/to/Shared/Location" if a shared area other
#    than the HOME directory is used.
# If you do provide a shared home and Grid jobs normally start in that area,
# set MPI_SHARED_HOME to "yes".
MPI_SHARED_HOME="yes"

######## Intra WN authentication
MPI_SSH_HOST_BASED_AUTH=${MPI_SSH_HOST_BASED_AUTH:-"no"}
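
The MPI_*_MPIEXEC paths above depend on the packages actually installed on the WN, so it may be worth verifying them. This is a minimal sketch, assuming one double-quoted assignment per line and paths without spaces; check_mpiexec is a hypothetical helper name.

```shell
# Hypothetical helper: for every uncommented MPI_<FLAVOUR>_MPIEXEC="<path>" line
# in <file>, report whether <path> is an executable file on this host.
# Returns non-zero if at least one path is not executable.
# Usage: check_mpiexec <file>
check_mpiexec() {
    local file="$1" line name path rc=0
    for line in $(grep -E '^MPI_[A-Z0-9]+_MPIEXEC=' "$file" | tr -d '"'); do
        name=${line%%=*}
        path=${line#*=}
        if [ -x "$path" ]; then
            echo "OK $name $path"
        else
            echo "MISSING $name $path"
            rc=1
        fi
    done
    return $rc
}
```

For example: check_mpiexec /root/siteinfo/services/glite-mpi. Note that it also checks flavours that are disabled, since it looks only at the MPIEXEC lines.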

services/glite-mpi_wn

# Set up configuration variables that are common to both the CE and WN.
# Most variables are common to CE and WN. It is easier to define
# these in a common file ${config_dir}/services/glite-mpi

if [ -r ${config_dir}/services/glite-mpi ]; then
 source ${config_dir}/services/glite-mpi
fi

services/glite-glexec_wn

GLEXEC_WN_SCAS_ENABLED="no"
GLEXEC_WN_ARGUS_ENABLED="yes"
GLEXEC_WN_OPMODE="setuid"           

yaim check

# /opt/glite/yaim/bin/yaim -v -s site-info_batch.def -n MPI_WN -n WN_torque_noafs -n GLEXEC_wn

yaim config

# /opt/glite/yaim/bin/yaim -d 6 -c -s site-info_batch.def -n MPI_WN -n WN_torque_noafs -n GLEXEC_wn
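
The check and config steps above can be chained so that the configuration runs only when the check passes. The wrapper below is a sketch under the assumption that yaim exits with a non-zero status when the check fails; yaim_check_then_config and the YAIM variable are hypothetical names introduced here.

```shell
# Path to yaim as used on this page; can be overridden through the environment.
YAIM=${YAIM:-/opt/glite/yaim/bin/yaim}

# Hypothetical wrapper: run the yaim check (-v) and, only if it succeeds,
# the configuration (-c) with the same site-info file and node types.
# Usage: yaim_check_then_config <site-info file> <node type>...
yaim_check_then_config() {
    local siteinfo="$1" nodes="" n
    shift
    for n in "$@"; do
        nodes="$nodes -n $n"
    done
    # $nodes is intentionally unquoted so that it splits into separate arguments.
    "$YAIM" -v -s "$siteinfo" $nodes || {
        echo "yaim check failed, not configuring" >&2
        return 1
    }
    "$YAIM" -d 6 -c -s "$siteinfo" $nodes
}
```

For example: yaim_check_then_config site-info_batch.def MPI_WN WN_torque_noafs GLEXEC_wn.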

N.B.: To change the BATCH_SERVER value on an already configured WN via yaim, first remove the file /var/lib/torque/mom_priv/config, as reported by yaim itself:

DEBUG: configuring pbs
   WARNING: /var/lib/torque/mom_priv/config already exists, YAIM will not touch it
   WARNING: Batch server defined in BATCH_SERVER variable is different
   WARNING: from the batch server defined under /var/lib/torque/mom_priv/config
   WARNING: Remove /var/lib/torque/mom_priv/config and reconfigure again to use the new value! 
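
The situation described by these warnings can be detected before running yaim. The sketch below assumes that the batch server is recorded in the mom config as a line of the form "$pbsserver <hostname>", and it moves the stale file aside instead of deleting it; both function names are hypothetical.

```shell
# Hypothetical helper: print the server named by the $pbsserver directive
# of a pbs_mom config file (assumed format: "$pbsserver <hostname>").
mom_server() {
    awk '$1 == "$pbsserver" { print $2; exit }' "$1"
}

# If the mom config points at a server other than <expected>, move it aside
# (as <file>.bak) so that yaim can regenerate it on the next run.
# Usage: reset_mom_config_if_stale <config file> <expected batch server>
reset_mom_config_if_stale() {
    local cfg="$1" expected="$2" current
    [ -f "$cfg" ] || return 0
    current=$(mom_server "$cfg")
    if [ "$current" = "$expected" ]; then
        echo "unchanged: $cfg already uses $expected"
    else
        mv "$cfg" "$cfg.bak"
        echo "moved $cfg to $cfg.bak (was '$current', want '$expected')"
    fi
}
```

For example: reset_mom_config_if_stale /var/lib/torque/mom_priv/config batch.cnaf.infn.it, followed by the yaim config command.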

-- PaoloVeronesi - 2012-05-30

Topic revision: r4 - 2012-06-04 - PaoloVeronesi
 