System Administrator Guide for WMS for EMI
1 Installation and configuration
1.1 Prerequisites
1.1.1 Operating System
A standard x86_64 SL(C)5 distribution must be properly installed, and the EPEL repository must be configured on the machine.
1.1.2 Node synchronization
A general requirement for the Grid nodes is that their clocks are synchronized. This requirement may be fulfilled in several ways; one of the most common is to use the NTP protocol with a time server.
1.1.3 Cron and logrotate
Many components deployed on the WMS rely on the presence of cron (including support for the /etc/cron.* directories) and logrotate. Make sure these utilities are available on your system.
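The availability check described above can be sketched in the shell as follows. This is only an illustration: the helper name require_cmds is hypothetical and not part of the WMS distribution, and the binary name crond used in the usage note is an assumption about how cron is packaged on the node.

```shell
#!/bin/sh
# require_cmds is a hypothetical helper, not shipped with the WMS.
# It prints every command that cannot be found in PATH and returns
# non-zero if at least one of them is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible invocation, run as root so that the sbin directories are in PATH, is: require_cmds crond logrotate && ls -d /etc/cron.d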
1.2 Installation
1.2.1 Repositories
For a successful installation, you will need to configure your package manager to reference a number of repositories (in addition to those of your OS):
- the EPEL repository
- the EMI middleware repository
- the CA repository
and to
REMOVE () or
DEACTIVATE (!!!)
1.2.1.1 The EPEL repository
You can install the EPEL repository by issuing:
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
1.2.1.2 The EMI middleware repository
The EMI-1 repository can be found under
http://emisoft.web.cern.ch/emisoft/dist/EMI/1/sl5/x86_64/
To use yum, the yum repo to be installed in /etc/yum.repos.d can be found at
http://emisoft.web.cern.ch/emisoft/
The packages are signed with the EMI GPG key, which can be downloaded from
http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RPM-GPG-KEY-emi
To import it:
[root@emi-demo11 ~]# wget http://emisoft.web.cern.ch/emisoft/dist/EMI/1/RPM-GPG-KEY-emi -O /tmp/emi-key_gd.asc
[root@emi-demo11 ~]# rpm --import /tmp/emi-key_gd.asc
1.2.1.3 The Certification Authority repository
The most up-to-date version of the list of trusted Certification Authorities (CA) is needed on your node. The relevant yum repo can be installed by issuing:
wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo -O /etc/yum.repos.d/egi-trustanchors.repo
1.2.1.4 Important note on automatic updates
An update of an RPM not followed by configuration can cause problems. Therefore WE STRONGLY RECOMMEND NOT TO USE AUTOMATIC UPDATE PROCEDURE OF ANY KIND.
Running the script available at
http://forge.cnaf.infn.it/frs/download.php/101/disable_yum.sh
(implemented by Giuseppe Platania, INFN Catania) disables yum autoupdate.
1.2.2 Installation of a WMS node
First of all, install the yum-protectbase rpm:
yum install yum-protectbase.noarch
Then proceed with the installation of the CA certificates.
1.2.2.1 Installation of the CA certificates
The CA certificates can be installed by issuing:
yum install ca-policy-egi-core
1.2.2.2 Installation of the WMS software
Install the WMS metapackage:
yum install emi-wms
1.3 Configuration
1.3.1 Using the YAIM configuration tool
For a detailed description of how to configure the middleware with YAIM, please check the YAIM guide.
The YAIM modules needed to configure a given node type are automatically installed with the middleware.
1.3.2 Configuration of a WMS node
1.3.2.1 Install host certificate
The WMS node requires the host certificate/key files to be installed. Contact your national Certification Authority (CA) to understand how to obtain a host certificate if you do not have one already.
Once you have obtained a valid certificate:
- hostcert.pem - containing the machine public key
- hostkey.pem - containing the machine private key
make sure to place the two files on the target node in the /etc/grid-security directory. Then set the proper ownership and mode by running:
chown root:root /etc/grid-security/hostcert.pem
chown root:root /etc/grid-security/hostkey.pem
chmod 600 /etc/grid-security/hostcert.pem
chmod 400 /etc/grid-security/hostkey.pem
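After setting the permissions, the result can be verified with a small check such as the one sketched below. The helper check_mode is hypothetical, and it assumes GNU stat (the -c '%a' option), which is what Scientific Linux provides.

```shell
#!/bin/sh
# check_mode is a hypothetical helper: it compares the octal permission
# bits of a file with the expected value. It relies on GNU stat.
check_mode() {
    file=$1
    expected=$2
    actual=$(stat -c '%a' "$file") || return 2
    if [ "$actual" != "$expected" ]; then
        echo "$file: mode $actual, expected $expected" >&2
        return 1
    fi
}
```

For the host credentials this would be used as: check_mode /etc/grid-security/hostcert.pem 600 && check_mode /etc/grid-security/hostkey.pem 400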
1.3.2.2 Configure the siteinfo.def file
Set your siteinfo.def file, which is the input file used by yaim. The yaim variables relevant for WMS are the following:
- $WMS_HOST -> the WMS hostname, e.g. 'egee-rb-01.$MY_DOMAIN'
- $PX_HOST -> the hostname of a MyProxy server, e.g. 'myproxy.$MY_DOMAIN'
- $BDII_HOST -> the hostname of the site bdii to be used, e.g. 'sitebdii.$MY_DOMAIN'
- $LB_HOST -> the hostname of the LB server to be used, e.g. 'lb-server.$MY_DOMAIN:9000'. This variable is set as a service specific variable in the file services/glite-wms, located one directory below the one where the 'site-info.def' file is located.
If an LB has to be colocated on the same server (LBProxy = both), the following parameters have to be set in the siteinfo/services/glite-wms file:
- LB_HOST = "WMS hostname:port" e.g. "devel11.cnaf.infn.it:9000"
- GLITE_LB_TYPE = both
and the following parameters have to be set in the siteinfo.def file:
- GLITE_LB_AUTHZ_REGISTER_JOBS = ".*"
- GLITE_LB_WMS_DN = "the WMS DN" e.g. "/C=IT/O=INFN/OU=Host/L=CNAF/CN=devel09.cnaf.infn.it"
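Putting the variables above together, a colocated WMS+LB setup might look like the following sketch. All values are placeholders: the domain example.org, the host names and the DN are assumptions for illustration only and must be replaced with the real ones for your site. The two parts belong to two different files, as indicated in the comments.

```shell
# --- siteinfo.def (placeholder values, to be adapted to the site) ---
MY_DOMAIN=example.org
WMS_HOST=egee-rb-01.$MY_DOMAIN
PX_HOST=myproxy.$MY_DOMAIN
BDII_HOST=sitebdii.$MY_DOMAIN
GLITE_LB_AUTHZ_REGISTER_JOBS=".*"
# Placeholder DN: use the subject of the real WMS host certificate.
GLITE_LB_WMS_DN="/C=IT/O=INFN/OU=Host/L=CNAF/CN=egee-rb-01.example.org"

# --- services/glite-wms (colocated LB: the LB host is the WMS host) ---
LB_HOST="$WMS_HOST:9000"
GLITE_LB_TYPE=both
```

Since yaim reads these files as shell variable assignments, no spaces are allowed around the = sign.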
1.3.2.3 Configure SELinux
In order for the httpd section of yaim to be run correctly, SELinux should be disabled, or, as an alternative, the certificates should be correctly labelled, see:
http://docs.fedoraproject.org/en-US/Fedora/13/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Working_with_SELinux-SELinux_Contexts_Labeling_Files.html
In WMS EMI 3 SL6 (glite-wms-interface-3.5.0-6.sl6.x86_64), in order for the httpd section of yaim to run correctly, SELinux should be disabled or, as an alternative, the workaround described below must be applied.
With SELinux enabled:
# getenforce
Enforcing
# ls -Z /var/lib/glite/.certs/host*.pem
-rw-r--r--. glite glite user_u:object_r:var_lib_t:s0 /var/lib/glite/.certs/hostcert.pem
-r--------. glite glite user_u:object_r:var_lib_t:s0 /var/lib/glite/.certs/hostkey.pem
WORKAROUND:
/usr/sbin/setsebool httpd_can_network_connect=1
/usr/sbin/semanage port -a -t http_port_t -p tcp 7443
# /etc/init.d/glite-wms-wmproxy restart
Restarting /usr/bin/glite_wms_wmproxy_server... ok
1.3.2.4 Run yaim
After having filled in the siteinfo.def file, run yaim:
/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n WMS
1.3.3 Configuration of the WMS CLI
The WMS CLI is part of the EMI-UI. To configure it please refer to xxx.
2 Operating the system
2.1 How to start the WMS service
A system administrator can start the WMS service by issuing:
service gLite start
A system administrator can stop the WMS service by issuing:
service gLite stop
2.2 Daemons
Scripts to check the status of the daemons and to start or stop them are located in the ${GLITE_WMS_LOCATION}/etc/init.d/ directory (e.g. ${GLITE_WMS_LOCATION}/etc/init.d/glite-wms-wm start/stop/status). gLite production installations also provide a more generic service, called gLite, to manage all of them simultaneously: try service gLite status/start/stop.
On a typical WMS node the following services must be running:
- glite-lb-locallogger:
glite-lb-logd running
glite-lb-interlogd running
- glite-lb-proxy:
glite-lb-proxy running as 4137
- glite-proxy-renewald:
glite-proxy-renewd running
- globus-gridftp:
globus-gridftp-server (pid 3107) is running...
- glite-wms-jc:
JobController running in pid: 10008
CondorG master running in pid: 10063 10062
CondorG schedd running in pid: 10070
- glite-wms-lm:
Logmonitor running...
- glite-wms-wm:
/opt/glite/bin/glite-wms-workload_manager (pid 9957) is running...
- glite-wms-wmproxy:
WMProxy httpd listening on port 7443
httpd (pid 22223 22222 22221 22220 22219 22218 22217) is running ....
===
WMProxy Server running instances:
UID PID PPID C STIME TTY TIME CMD
- glite-wms-ice:
/opt/glite/bin/glite-wms-ice-safe (pid 10103) is running...
2.3 Init scripts
The init scripts are located under /etc/init.d and are the following:
/etc/init.d/globus-gridftp
/etc/init.d/glite-wms-wmproxy
/etc/init.d/glite-wms-wm
/etc/init.d/glite-wms-lm
/etc/init.d/glite-wms-jc
/etc/init.d/glite-wms-ice
/etc/init.d/glite-proxy-renewald
/etc/init.d/glite-lb-locallogger
/etc/init.d/glite-lb-bkserverd
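To query several of these init scripts in one go, a loop such as the following sketch can be used. The helper check_services is hypothetical, and it assumes that each init script returns a non-zero exit status from its status action when the daemon is not running, which has not been verified for every script listed above.

```shell
#!/bin/sh
# check_services is a hypothetical helper. It runs "<dir>/<name> status"
# for each listed init script and reports those exiting non-zero.
check_services() {
    dir=$1
    shift
    failed=0
    for name in "$@"; do
        if ! "$dir/$name" status >/dev/null 2>&1; then
            echo "not running or unknown: $name" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example: check_services /etc/init.d glite-wms-wmproxy glite-wms-wm glite-wms-lm glite-wms-jc glite-wms-ice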
2.4 Configuration Files
The configuration files are located under /etc/glite-wms and are the following:
/etc/glite-wms/glite_wms.conf
/etc/glite-wms/glite_wms_wmproxy.gacl
/etc/glite-wms/glite_wms_wmproxy_httpd.conf
/etc/glite-wms/wmproxy_logrotate.conf
/etc/glite-wms/.drain
For configuration files related to other services running on the WMS node, please refer to the Service Reference Card.
2.4.1 glite_wms.conf
This is the general configuration file for the WMS. The syntax is based on the ClassAd language. The parameter names are case insensitive. It is organised in sections: one for every running service plus a common section.
[
Common = [...];
JobController = [...];
LogMonitor = [...];
NetworkServer = [...];
WorkloadManager = [...];
WorkloadManagerProxy = [...];
ICE = [...]
]
The value of a parameter can be expressed in terms of environment variables, with the typical UNIX shell syntax: a $ sign followed by the name of the variable in braces (e.g. ${HOME}).
2.4.1.1 Common section
In general there is no need to change this section.
DGUser: the user under which a WMS process runs
LBProxy: Boolean attribute to switch between LB and LBProxy. If the value of this attribute is true, LBProxy is used for logging and query operations about jobs
HostProxyFile (no default): the host proxy certificate file
2.4.1.2 Workload Manager Proxy section
Very important parameters are those that configure the so-called limiter (OperationLoadScripts), used to inhibit submission when some system load limits are hit.
Since the suggested deployment is to have the WMS and the LB on two separate physical machines, a fundamental parameter is also LBServer.
The relevant parameters available in this section are the following:
MaxInputSandboxSize = 10000000; this puts a PER FILE limit on the size of the input sandbox of the JDL. Units are bytes
LogFile: String attribute containing the path of the WMProxy log file
LogLevel: Integer attribute containing a value from 0 to 6 (Optional). The integer value represents the WMProxy log file verbosity level: from 0 (fatal) to 6 (debug: maximum verbosity)
SandboxStagingPath: Root directory where job sandboxes are stored. It MUST be in the form <DocumentRoot>/<single directory name>, where DocumentRoot is as set inside the glite_wms_wmproxy_httpd.conf configuration file. The directory MUST be accessible by the user under which WMProxy is running (usually the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, unless set differently with the User directive inside the glite_wms_wmproxy_httpd.conf configuration file
ListMatchRootPath: Directory path where temporary pipes for list-match operations are created. The directory MUST be accessible by the user under which WMProxy is running (usually it is the "glite" user). The user running WMProxy is determined by the value of the environment variable GLITE_USER, if not differently set with User directive inside glite_wms_wmproxy_httpd.conf configuration file
GridFTPPort: Port number where gridFTP server is listening
MinPerusalTimeInterval: Integer value representing the time interval (in seconds) between two savings of job partial execution output. This attribute affects the behaviour of WMProxy and other components only if the perusal functionality is explicitly requested by the user via the JDL (see the EnableFilePerusal JDL attribute)
LBServer: Address or list of addresses of the LB Server[s] to be used for storing job information, in the format <host>[:<port>] (default value for port is 9000). For each service request, WMProxy selects the LB Server to use randomly from the list. WMProxy maintains a list of weights associated with the available LB Servers, so that failing LB Servers have a decreasing probability of being selected. If the Service Discovery is enabled, the LB Servers found using the Service Discovery are added to the list.
Note that the following lines have the same meaning:
LBServer = "ghemon.cnaf.infn.it:9000";
LBServer = {"ghemon.cnaf.infn.it:9000"};
WeightsCacheValidity: Time in seconds (n) indicating the validity of the weights (i.e. probability to be selected) associated to the available LB Servers. When last weights update (i.e. last received request) has occurred more than n seconds ago then the weights are restored to the same value for all LB Servers
DISMISSED IN EMI2 LBLocalLogger: address of LB Local Logger in the format of <host>[:<port>] (default value for port is 9002). This attribute is needed only if LB Local Logger runs on another host and LBProxy is not enabled. Removed starting from EMI2 releases.
AsyncJobStart: Boolean attribute used to switch between synchronous and asynchronous job start behaviour. When set to true, during a job start operation control is returned to the user immediately after the request has been received, while the actual execution of the operation (which could be quite time consuming) is performed asynchronously
EnableServiceDiscovery: Boolean attribute to enable Service Discovery. If the value of this attribute is true, the Service Discovery is enabled, i.e. WMProxy invokes Service Discovery for finding available LB Servers
ServiceDiscoveryInfoValidity: Time in seconds (n) indicating the validity of the information provided by the Service Discovery. A call to Service Discovery for updated information is done every n seconds
LBServiceDiscoveryType: Type key for LB Servers to be discovered by Service Discovery
MaxServedRequests: Long attribute limiting the number of operations served by each WMProxy instance before exiting and releasing possibly allocated memory. This value is overridden by the GLITE_WMS_WMPROXY_MAX_SERVED_REQUESTS environment variable, if set. This feature can be disabled by setting a value lower than or equal to zero
OperationsLoadScripts: ClassAd type attribute where an internal attribute can be specified for any operation provided by WMProxy. The names of these attributes are equal to the names of the server operations (e.g. for the jobSubmit operation the attribute name to use is "jobSubmit"). These internal attributes provide the path and the name of the script to be executed to verify the load of the WMProxy server for the given operation. If the server load is too high the requested operation is refused. The path and the name of the script can be followed by user-defined options and parameters, depending on the arguments the specific script needs.
WMProxy provides a load script that can be used for any of the provided operations. The template load script glite_wms_wmproxy_load_monitor.template is installed by the glite-wms-wmproxy rpm in the directory ${WMS_LOCATION_SBIN}.
To call the script glite_wms_wmproxy_load_monitor, when the operation jobSubmit is requested, with the options:
--load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500
add the attribute:
OperationsLoadScripts = [
jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobSubmit --load1 10 --load5 10 --load15 10 --memusage 95 --diskusage 95 --fdnum 500";
]
Any kind of load script file can be used. If a user custom script is used, the only rule to follow is that the script exit value must be 0 in the case the operation can continue the execution, 1 in the opposite case (operation refused - Server load too high).
The script files must be executable and must have the proper access permissions
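The exit-code rule for custom load scripts stated above can be illustrated with the following minimal sketch. It is not the script shipped with WMProxy: the function names are hypothetical, only the 1-minute load average is checked, the default limit of 10 is an arbitrary choice, and reading /proc/loadavg assumes a Linux node.

```shell
#!/bin/sh
# Hypothetical custom load script following the rule above:
# status 0 = the operation may proceed, 1 = refuse (load too high).

# Return 0 when the load value ($1) is within the limit ($2), 1 otherwise.
load_within_limit() {
    awk -v cur="$1" -v max="$2" 'BEGIN { if (cur + 0 <= max + 0) exit 0; exit 1 }'
}

# Read the 1-minute load average (first field of /proc/loadavg, Linux only).
current_load1() {
    cut -d' ' -f1 /proc/loadavg
}

# Entry point: the limit is the first argument (default 10, arbitrary).
check_load() {
    load_within_limit "$(current_load1)" "${1:-10}"
}
```

To use it as a load script, the file would end with a call to check_load "$@" as its last command, so that the exit status of the script is the one returned by the check.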
2.4.1.3 Workload Manager section
Important parameters in this section are:
DISMISSED IN EMI2 EnableBulkMM = true; //enable the bulk matchmaking for collection
NEW IN EMI2 EnableReplanner =
; // The job replanner can now be toggled by configuration. The replanning feature is not always used, and in some cases it can show problems with queries to the LB, in case of high load. For this reason it is now disabled by default.
IsmUpdateRate = 600; // information supermarket update rate (in seconds)
WorkerThreads = 5; // enables multithreading in the WM component, which speeds up the matchmaking process. 5 is a good compromise between machine load and speed.
CeForwardParameters: the parameters forwarded by the WM to the CE
CeMonitorAsynchPort: the port used to listen to notification arriving from CEMon's. A value of -1 means that listening is disabled
CeMonitorServices: the list of CEMon's the WM listens to
DISMISSED IN EMI2 DispatcherType: the WM can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir". Removed starting from EMI2 releases, only jobdir is supported.
EnableRecovery: specifies if at startup the WM should perform a special recovery procedure for the requests that it finds already in its input.
ExpiryPeriod: the maximum time, expressed in seconds, a submitted job is kept in the overall system, from the time it arrives for the first time at the WM
Input: the input source of new requests. If DispatcherType is "filelist" the source is a file; if DispatcherType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the WM starts. A JobDir structure consists of a base dir under which lie other three subdirectories, named tmp, new, old
IsmBlackList: a list of CEs that have to be excluded in the ISM
IsmDump: if the ISM dump is enabled, the dump, in ClassAd format, will be written to this file. In order to avoid file corruption, the contents of a dump are built in a temporary file, whose name is the value of this parameter with the prefix ".tmp", and which is renamed to the specified file only at the end of the operation
IsmIiPurchasingRate: the period between two ISM purchases from the BDII, in seconds
IsmThreads: whether the threads related to ISM management are taken from the thread pool or created separately
IsmUpdateRate: the period between two updates of the ISM, in seconds. Note that conceptually purchasing just retrieves the list of available resources, whereas an ISM update gathers the resource information for each resource.
JobWrapperTemplateDir: the job wrapper sent to the CE and then executed on Worker Node is based on a bash template which is augmented by the WM with job-specific information. This is the location where all the templates - one at the moment - are stored
LogFile: the name of the file where messages are logged
LogLevel: each logging statement in the code specifies a log level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The levels go from 1 (minimum verbosity) to 6 (maximum verbosity)
MatchRetryPeriod: once a job becomes pending, meaning that there are no resources available, this parameter represents the period between successive match-making attempts, in seconds
MaxOutputSandboxSize: the maximum size of the output sandbox, in bytes. The limit is currently enforced by the job wrapper running on the Worker Node, which does not upload more data than specified here. If the value is -1 there is no limit.
MaxRetryCount: the system limit to the number of deep resubmissions for a job. The actual limit is the minimum between this value and the one specified in the job description
QueueSize: (def=1000) Size of the queue of events "ready" to be managed by the workers thread pool
RuntimeMalloc: allows the use of an alternative malloc library (examples are nedmalloc, google performance tools, ccmalloc) by specifying the path to the shared object, which is loaded with LD_PRELOAD. Example: RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so".
WorkerThreads: the number of request handler threads
ReplanGracePeriod (3600): the minimum time a job should be in status 'scheduled' after being evaluated for replanning
MaxReplansCount (5): the maximum number of replans a job should undergo before being terminated
NEW IN EMI2 SbRetryDifferentProtocols (false): if different protocols should be used when retrying failed or hanging transfers of ISB or OSB in the jobwrapper. See also bug https://savannah.cern.ch/bugs/?48479
WmsRequirements: This expression is appended with a logical AND (&&) to the user requirements. It contains both WMS typical requirements and queue requirements, such as authZ checks.
Example:
requirements = (userrequirements) && (wmsrequirements);
The default value for this attribute (set by yaim) is:
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ? RegExp(".sdj$", other.GlueCEUniqueID) : !RegExp(".sdj$", other.GlueCEUniqueID)) && (other.GlueCEPolicyMaxTotalJobs == 0 || other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) && (EnableWmsFeedback =?= TRUE ? RegExp("cream", other.GlueCEImplementationName, "i") : true);
PropagateToLRMS: this expression is propagated to the LRMS
2.4.1.4 Log Monitor section
Usually there is no need to change the default parameters, with the exception of RemoveJobFiles, which is set to false by default:
RemoveJobFiles = true;
Setting it to true forces Condor to remove unused internal files when the jobs are in a final state.
The relevant parameters available in this section are the following:
LogFile: String attribute containing the path of the LogMonitor log file
LogLevel: Log verbosity level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)
LockFile: Path of the lock file for the service
CondorLogDir: Path of the directory where LogMonitor stores the CondorG log files
CondorLogRecycleDir: Path of the directory where old CondorG log files are saved
JobsPerCondorLog: Maximum number of jobs logged in the same CondorG log file
GlobusDownTimeout: Log monitor waits this number of seconds before considering as failed a condor job which has lost contact with the CE and so resubmitting it if possible
MainLoopDuration: LogMonitor loops between the CondorLog files every this number of seconds
MonitorInternalDir: Path of the directory where LogMonitor stores its own files
IdRepositoryName: Name of the file containing pieces of information about the jobs used by LogMonitor and JobController
ExternalLogFile: Path of the directory where extra log files are stored
RemoveJobFiles: If set to true, all files used to submit jobs to condor are removed when they are no longer necessary. Set it to "false" only for debugging purposes. Files are stored in the SubmitFileDir directory, as set in the JobController section
2.4.1.5 Job Controller section
Usually there is no need to change the default parameters.
The relevant parameters available in this section are the following:
LogFile: String attribute containing the path of the JobController log file
LogLevel: Log verbosity level. If that level is less than or equal to LogLevel the message is actually logged, otherwise it is ignored. The level goes from 1 (minimum verbosity) to 6 (maximum verbosity)
LockFile: Path of the lock file for the service
CondorSubmit: Path of the "condor_submit" command
CondorRemove: Path of the "condor_remove" command
CondorDagman: Path of the "condor_dagman" command
DagmanMaxPre: Sets the maximum number of PRE scripts within the DAG that may be running at one time; it is the "-maxpre" parameter of the condor_dagman command
MaximumTimeAllowedForCondorMatch: Sets the number of seconds that a job can wait in the condor queue to be matched before being resubmitted
ContainerRefreshThreshold: Number of jobs that JobController can keep in memory before resynchronizing its container with the one physically saved in the file "IdRepositoryName" (see LM section)
DISMISSED IN EMI2 InputType: The JobController can read its input using different mechanisms. Currently supported types are "filelist" and "jobdir". Removed starting from EMI2 releases, only jobdir is supported.
Input: The input source of new requests. If InputType is "filelist" the source is a file; if InputType is "jobdir" the source is the base dir of a JobDir structure, which is supposed to be already in place when the JobController starts. A JobDir structure consists of a base dir under which lie other three subdirectories, named tmp, new, old
SubmitFileDir: Path of the directory where the submit files for condor are stored
OutputFileDir: Path of the directory where the standard error/output files of the jobs (e.g. the JobWrapper) are stored
2.4.1.6 ICE section
The parameters for ICE are available in the ICE Configuration Guide
2.4.1.7 Network Server section
Although the Network Server is no longer installed on WMS nodes, some configuration parameters in its section of the global conf file are still needed.
The important parameters in this section are those regarding the contact with the information system. In particular:
- II_Contact = "egee-bdii.cnaf.infn.it"; set the hostname of the bdii to be contacted
- II_Port = 2170; set the port on which the bdii is contacted
- Gris_DN = "mds-vo-name=local, o=grid"; set the path where the bdii is publishing information
- II_Timeout = 100; set the timeout for the bdii query. It is important that this value is not too small: it is very dangerous if many bdii queries fail because of timeouts. The risk is that all the information in the InformationSupermarket expires, so that jobs in Waiting status cannot match any CE (they remain in Waiting status for a long time, until a query to the bdii is successful). By default the value is set to 30, but 100 is safer.
- MaxInputSandboxSize = 10000000; # NOT USED
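To get a feeling for how long a BDII query takes, and therefore whether II_Timeout is adequate, the BDII can be queried by hand with the OpenLDAP client. The following is a sketch under assumptions: the helpers bdii_url and bdii_probe are hypothetical, the ldapsearch client is assumed to be installed, and the query used here (the unique IDs of the published CEs) is only an example, not the query issued by the WMS.

```shell
#!/bin/sh
# bdii_url and bdii_probe are hypothetical helpers for a manual check.

# Build the LDAP URL from host ($1) and port ($2, default 2170).
bdii_url() {
    printf 'ldap://%s:%s\n' "$1" "${2:-2170}"
}

# Query the BDII: arguments are host, port and base DN.
# Prints the GlueCEUniqueID attribute of every gluece object.
bdii_probe() {
    ldapsearch -x -LLL -H "$(bdii_url "$1" "$2")" -b "$3" \
        '(objectclass=gluece)' GlueCEUniqueID
}
```

With the values of the list above this would be run as: time bdii_probe egee-bdii.cnaf.infn.it 2170 "mds-vo-name=local,o=grid"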
2.4.2 glite_wms_wmproxy.gacl
WMS User Authentication is performed by the WMProxy component based on a GACL module.
The fundamental file used to manage WMS authentication is the /etc/glite-wms/glite_wms_wmproxy.gacl file.
This file contains the names of the VOs that are allowed to use the WMS. An example .gacl file that allows the dteam and ops VOs is the following:
<gacl version='0.0.1'>
<entry>
<voms>
<fqan>/ops/Role=NULL/Capability=NULL</fqan>
</voms>
<allow>
<exec/>
</allow>
</entry>
<entry>
<voms>
<fqan>/dteam/Role=NULL/Capability=NULL</fqan>
</voms>
<allow>
<exec/>
</allow>
</entry>
</gacl>
There must be an exact match between the fqan expressed in the gacl file and the one in the user proxy.
The gacl file can also contain the DNs of single users allowed to use the WMS resources. The following entry in the .gacl file will allow Daniele Cesini to use the WMS even if he is not in the VOs allowed to use the WMS:
<entry>
<person>
<dn>/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Daniele Cesini/Email=daniele.cesini@cnaf.infn.it/</dn>
</person>
<allow><exec/></allow>
</entry>
An entry with a DENY tag can be used to ban users or VOs:
<entry>
<voms>
<fqan>/dteam/Role=admin/Capability=NULL</fqan>
</voms>
<deny><exec/></deny>
</entry>
As the previous examples show, it is possible to allow/ban users and VOs on the basis of their FQAN (i.e. those returned by the voms-proxy-info --fqan command).
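As an illustration of the file structure shown in the examples above, the following sketch prints a GACL document with one allow entry per VO. The helper gacl_allow_vos is hypothetical and not part of the WMS; it only reproduces the VO-wide FQAN form (Role=NULL/Capability=NULL) used above. It is meant for producing a scratch file to compare with the installed one, since the real file is normally generated by yaim.

```shell
#!/bin/sh
# gacl_allow_vos is a hypothetical generator: it prints a GACL document
# with one <allow><exec/></allow> entry per VO name given as argument.
gacl_allow_vos() {
    echo "<gacl version='0.0.1'>"
    for vo in "$@"; do
        printf '<entry>\n<voms>\n<fqan>/%s/Role=NULL/Capability=NULL</fqan>\n</voms>\n' "$vo"
        printf '<allow>\n<exec/>\n</allow>\n</entry>\n'
    done
    echo "</gacl>"
}
```

For example: gacl_allow_vos ops dteam > /tmp/wmproxy.gacl.example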
User mapping on a WMS node is done through lcmaps, as in any other gLite service, so the fundamental places to look in case of mapping problems are:
- the gridmap file: /etc/grid-security/grid-mapfile;
- the lcmaps log: /var/log/glite/lcmaps.log;
- the gridmapdir: /etc/grid-security/gridmapdir/;
- the existing pool accounts for a VO or a VO group/role
2.4.3 glite_wms_wmproxy_httpd.conf
This file is a WMProxy specific configuration file configuring the HTTP daemon and Fast CGI
2.4.4 wmproxy_logrotate.conf
This file configures the logrotate tool, that performs the rotation of httpd-wmproxy-access and httpd-wmproxy-errors HTTPD daemon log files.
DISMISSED IN EMI2 Starting from EMI2 releases, all the log files will be rotated in the same way and by the same tool. In this case it will remain logrotate, but the configuration will be handled by yaim in /etc/logrotate.d
2.4.5 /var/.drain
This file is used to put the WMS in draining mode, so that it does not accept new submission requests but allows any other operations like output retrieval.
The content should be the following:
<gacl>
<entry>
<any-user/>
<deny><exec/></deny>
</entry>
</gacl>
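Entering and leaving draining mode can be sketched as follows. The helpers drain_on and drain_off are hypothetical; the path of the drain file is passed explicitly because it depends on the installation, and the sketch assumes that removing the file is enough to leave draining mode.

```shell
#!/bin/sh
# drain_on / drain_off are hypothetical helpers, not shipped with the WMS.

# Write the drain file ($1) with the content shown above.
drain_on() {
    cat > "$1" <<'EOF'
<gacl>
<entry>
<any-user/>
<deny><exec/></deny>
</entry>
</gacl>
EOF
}

# Assumption: removing the file re-enables submission.
drain_off() {
    rm -f "$1"
}
```

The file must be readable by the user running WMProxy (usually the "glite" user).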
2.5 Log files
The WMS log files, located under /var/log/wms, are the following:
- workload_manager_events.log: contains logs of the workload manager component
- wmproxy.log: contains logs of the wmproxy component
- httpd-wmproxy-access.log: wmproxy httpd access log
- httpd-wmproxy-errors.log: wmproxy httpd error log
- glite-wms-wmproxy.restart.cron.log: log of the /etc/cron.d/glite-wms-wmproxy.restart.cron cron
- glite-wms-wmproxy-purge-proxycache.log: log of the /etc/cron.d/glite-wms-wmproxy-purge-proxycache.cron cron
- wmproxy_logrotate.log: contains logs about the rotation of wmproxy httpd log files (DISMISSED IN EMI2)
- renewal.log: proxy renewal service log
- logmonitor_events.log: contains logs of the logmonitor component
- jobcontroller_events.log: contains logs of the jobcontroller component
- ice.log: contains logs of the ice component
- glite-wms-purgeStorage.log: log of the /etc/cron.d/glite-wms-purger.cron cron
For information on log files of other services running on the WMS node, please refer to the Service Reference Card.
2.6 Network ports
Information about network ports is available in the Service Reference Card
2.7 Cron jobs
Information about cron jobs is available in the Service Reference Card
2.8 Security related operations
2.8.1 How authorization works
In the WMS, two different authorization mechanisms can be utilized. The default is local, based on the Gridsite GACL http://www.gridsite.org/wiki/GACL. The relevant entries are basically DN and FQAN, that can be used to set permission on single users and roles (i.e. user banning and so on). FQANs support wildcards to allow for easier handling. The GACL file is "${WMS_LOCATION_ETC}/glite_wms_wmproxy.gacl". It is a structurally simple xml file where policies are specified, either directly or through the siteinfo.def.
Another way to perform authorization is to use Argus as a site service. Argus is typically enabled via siteinfo.def, through the following variables:
USE_ARGUS=<boolean>
ARGUS_PEPD_ENDPOINTS="list_of_space_separated_URLs" # i.e.: "https://argus01.lcg.cscs.ch:8154/authz https://argus02.lcg.cscs.ch:8154/authz https://argus03.lcg.cscs.ch:8154/authz"
On the Argus server side, the policies to be defined will have to specify an action and a resource id. The WMS automatically sets the resource id to its service endpoint. The actions are the following:
getVersion
getJDL
getMaxInputSandboxSize
getSandboxDestURI
getSandboxBulkDestURI
getQuota
getFreeQuota
getOutputFileList
getJobTemplate
getDAGTemplate
getCollectionTemplate
getIntParametricJobTemplate
getStringParametricJobTemplate
getDelegationVersion
getProxyReq
putProxy
renewProxyReq
getNewProxyReq
destroyProxy
getProxyTerminationTime
getACLItems
addACLItems
removeACLItem
getProxyInfo
enableFilePerusal
getPerusalFiles
getTransferProtocols
getJobStatusOp
jobStart
jobSubmit
jobSubmitJSDL
jobRegister
jobListMatch
jobCancel
jobPurge
The profile used for creating the request complies with the gLite profile.
2.8.2 How to filter out unwanted VOs
It can be useful to force the WMS to select only resources specific to a certain VO, as the matchmaking time is consequently reduced.
This can be enabled by providing an additional ldap clause, which will be added to the search filter at purchasing time.
The default search filter is:
(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)(objectclass=gluevoview)(objectclass=gluece))
The idea is to give system administrators the possibility to specify an additional ldap clause, which will be added in logical AND to the last two clauses of the default filter in order to match attributes specific to the gluece/gluevoview objectclasses.
To this aim the configuration file provides the IsmIILDAPCEFilterExt attribute, which handles the additional search filter used while purchasing information about CEs from the BDII.
As an example, by specifying the following:
IsmIILDAPCEFilterExt = "(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))"
the search filter during the purchasing would be:
(|(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)(&(|(objectclass=gluevoview)(objectclass=gluece))(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))))
and thus the WMS would select only resources (i.e. CE/Views) belonging to CMS.
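The way the extension is combined with the default filter can be reproduced with the sketch below, which is useful to preview the resulting filter before changing the configuration. The helper ism_ce_filter is hypothetical; it only mimics the composition described above and is not the code used by the WMS.

```shell
#!/bin/sh
# ism_ce_filter is a hypothetical helper. Given the value intended for
# IsmIILDAPCEFilterExt ($1), it prints the resulting search filter;
# with an empty argument it prints the default filter.
ism_ce_filter() {
    base='(objectclass=gluecesebind)(objectclass=gluecluster)(objectclass=gluesubcluster)'
    ce='(|(objectclass=gluevoview)(objectclass=gluece))'
    if [ -z "$1" ]; then
        printf '(|%s(objectclass=gluevoview)(objectclass=gluece))\n' "$base"
    else
        # The extension is ANDed with the gluevoview/gluece clauses only.
        printf '(|%s(&%s%s))\n' "$base" "$ce" "$1"
    fi
}
```

Calling ism_ce_filter with the cms extension of the example above prints the same filter shown there.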
2.8.3 How to block/ban a VO
To ban a VO, it is suggested to reconfigure the service via yaim without that VO in the siteinfo.def
2.9 Job purging
Purging a WMS job means removing from the WMS node any relevant information about the job (e.g. the job sandbox area).
A job can be purged:
- Explicitly by the administrator by invoking the command
/usr/sbin/glite-wms-purgeStorage
- Automatically by the wms services when a job is aborted and when the output of a completed job is retrieved by the user
- Automatically by the following cron:
3 */6 * * mon-sat glite . /usr/libexec/grid-env.sh ; /usr/sbin/glite-wms-purgeStorage.sh -l /var/log/wms/glite-wms-purgeStorage.log -p /var/SandboxDir -t 604800 > /dev/null 2>&1
0 1 * * sun glite . /usr/libexec/grid-env.sh ; /usr/sbin/glite-wms-purgeStorage.sh -l /var/log/wms/glite-wms-purgeStorage.log -p /var/SandboxDir -o -s -t 1296000 > /dev/null 2>&1
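The first cron entry above runs at minute 3 of every sixth hour from Monday to Saturday, the second at 01:00 on Sunday. The values passed with -t are assumed here to be age thresholds expressed in seconds (an assumption based on their magnitude). Under that assumption, the hypothetical helper below converts them to days.

```shell
#!/bin/sh
# seconds_to_days is a hypothetical helper for reading the -t values
# of the cron entries above (assumed to be expressed in seconds).
seconds_to_days() {
    echo $(( $1 / 86400 ))
}
```

With this reading, 604800 corresponds to 7 days and 1296000 to 15 days.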
For jobs submitted to a CREAM CE through the WMS, the purging is done by the ICE component of the WMS when it detects that the job has reached a terminal status. The purging operation is not done if, in the WMS conf file (/etc/glite-wms/glite_wms.conf), the attribute purge_jobs in the ICE section is set to false.
3 Service Migration
During operation, the WMS retains information about the following:
1) Job requests (submit, cancel, etc.)
2) Job sandboxes (user data)
3) Job tracking metadata (log events, statuses, statistics)
Provided that this data is consistently preserved, a WMS instance can be migrated to another machine with no extra overhead. For this process to work, the old and new instances must not be running at the same time, and the host certificate must be present at the standard location (/etc/grid-security). Unless otherwise specified, the new instance might even install an updated version of both WMS and L&B.
1) Job requests are stored in the form of ASCII files inside maildir-like directories. As taken from the configuration, they are typically:
Input = "${WMS_LOCATION_VAR}/workload_manager/jobdir";
Input = "${WMS_LOCATION_VAR}/jobcontrol/jobdir/";
Input = "${WMS_LOCATION_VAR}/ice/jobdir";
2) User sandbox directories are stored in a directory hierarchy whose root path is SandboxStagingPath, as set in the WorkloadManagerProxy section of the configuration (typically /var/SandboxDir).
3) The L&B service, which is always installed on a WMS node (though in two different modes, 'proxy' and 'both'), stores job tracking information about both processed and unprocessed jobs in a MySQL database named lbserver20. This database typically resides in /var/lib/mysql/lbserver20.
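Saving the file-based part of this state (points 1 and 2) can be sketched as follows. The helper archive_wms_state is hypothetical; the paths in the usage note assume that ${WMS_LOCATION_VAR} is /var, and the services are assumed to be stopped while the archive is taken. The L&B database of point 3 is not covered by this helper and has to be saved separately with the MySQL tools.

```shell
#!/bin/sh
# archive_wms_state is a hypothetical helper: it packs the given
# directories into one gzip-compressed tar archive (first argument).
# Any further arguments are passed to tar unchanged.
archive_wms_state() {
    dest=$1
    shift
    tar -czf "$dest" "$@"
}
```

For example, assuming the default locations: archive_wms_state /root/wms-state.tar.gz /var/workload_manager/jobdir /var/jobcontrol/jobdir /var/ice/jobdir /var/SandboxDir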
-- FabioCapannini - 2011-04-28