How the Job Wrapper is Generated
The job wrapper is a shell script that wraps the user's job execution on the Worker Node.
In the LCG RB the full script generation was hard-coded in the RB itself.
In the gLite WMS, instead, the body of the shell script comes from a template file distributed with the WMS packages and installed in ${GLITE_LOCATION}/etc/templates/template.sh. For each job the template is augmented by the WMS with a few instructions in order to generate the final job wrapper. These instructions set a number of variables that are used inside the template. The value of these variables depends on the specific job.
The template has the following structure:
#!/bin/sh
<empty line>
<body>
In place of the empty line, the WMS adds a header consisting of a number of instructions of the form:
<variable>=<job-specific value>
For example, the template contains the following excerpt of code:
if [ ${__job_type} -eq 0 ]; then # normal
  cmd_line="${__job} ${__arguments}"
elif [ ${__job_type} -eq 1 -o ${__job_type} -eq 2 ]; then # MPI LSF, PBS
  cmd_line="mpirun -np ${__nodes} -machinefile ${HOSTFILE} ${__job} ${__arguments}"
elif [ ${__job_type} -eq 3 ]; then # interactive
  cmd_line="./glite-wms-job-agent $BYPASS_SHADOW_HOST $BYPASS_SHADOW_PORT '${__job} ${__arguments}'"
fi
where a number of reserved variables are used: __job_type, __job, __arguments, __nodes. The WMS inserts the assignments of these variables just after the #!/bin/sh line at the beginning of the script, so that they are well defined during the script execution.
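The substitution described above can be sketched in a few lines of shell. This is only an illustration: the WMS performs this step internally, and the function name make_wrapper, the miniature template and the header values below are hypothetical. The sketch copies the #!/bin/sh line, puts the job-specific assignments in place of the empty second line, and then appends the template body.

```shell
#!/bin/sh
# Hypothetical helper, not part of the WMS: build a job wrapper from a
# template file and a file holding the job-specific assignments.
# Usage: make_wrapper TEMPLATE HEADER > wrapper.sh
make_wrapper() {
    template=$1
    header=$2
    # Line 1 of the template is the "#!/bin/sh" line: copy it unchanged.
    sed -n '1p' "$template"
    # The job-specific assignments take the place of the empty second line.
    cat "$header"
    # Everything from line 3 onwards is the template body.
    sed -n '3,$p' "$template"
}

# Demo with a miniature template (illustrative content only).
tmpdir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' '' \
    'cmd_line="${__job} ${__arguments}"' \
    'echo "$cmd_line"' > "$tmpdir/template.sh"
printf '%s\n' '__job_type=0' '__job="/bin/echo"' '__arguments="hello"' > "$tmpdir/header"
make_wrapper "$tmpdir/template.sh" "$tmpdir/header" > "$tmpdir/wrapper.sh"
sh "$tmpdir/wrapper.sh"
rm -rf "$tmpdir"
```

Because the assignments come before the body, every reference to a reserved variable in the body sees the job-specific value.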
Why the change?
We changed the approach to generating the job wrapper script because the earlier hard-coded generation proved to be rather inflexible: any change required a change to the C++ code, with all the obvious consequences (rebuilding other components depending on this one, testing, certification, etc.). With the current approach, instead, changing the way the job wrapper is generated most of the time requires only a change in the template file.
For example, to fix bug #29604 ("WMS job wrapper must not reset umask") while waiting for an official patch, a WMS sysadmin could simply remove the following line
umask 022
from the template file.
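One possible way to apply this workaround from the command line is sketched below. The helper name drop_umask is hypothetical, and the sketch assumes the line appears in the template exactly as "umask 022", possibly indented; check the installed template before relying on it. A backup copy is kept next to the original.

```shell
#!/bin/sh
# Hypothetical helper: strip the "umask 022" line from a template file,
# keeping a backup copy with the suffix ".orig".
drop_umask() {
    template=$1
    cp -p "$template" "$template.orig" || return 1
    # Delete only lines consisting of "umask 022" (optionally indented);
    # writing back into the existing file keeps its permissions.
    sed '/^[[:space:]]*umask 022[[:space:]]*$/d' "$template.orig" > "$template"
}

# Example invocation (path as given above):
#   drop_umask "${GLITE_LOCATION}/etc/templates/template.sh"
```

Since the job wrapper is generated from the template for each job, a change of this kind needs no rebuild of the WMS.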
As another example, the limit mechanism on the output sandbox does not work in the slc3 release because of bug #27215; applying the following patch to the template would fix it:
diff -u -r1.37.2.45 template.sh
--- template.sh 15 Jun 2007 10:47:55 -0000 1.37.2.45
+++ template.sh 13 Dec 2007 14:40:04 -0000
@@ -870,7 +870,7 @@
if [ ${max_osb_size} -ge 0 ]; then
# TODO
#if hostname=wms
- file_size=`stat -t $f | awk '{print $2}'`
+ file_size=`stat -t $s | awk '{print $2}'`
file_size_acc=`expr $file_size_acc + $file_size`
#fi
if [ $file_size_acc -le ${max_osb_size} ]; then
@@ -883,9 +883,11 @@
fi
else
jw_echo "OSB quota exceeded for $s, truncating needed"
- remaining_files=`expr $total_files \- $current_file + 2`
+ file_size_acc=`expr $file_size_acc - $file_size`
+ remaining_files=`expr $total_files \- $current_file`
remaining_space=`expr $max_osb_size \- $file_size_acc`
- trunc_len=`expr $remaining_space / $remaining_files`||0
+ trunc_len=`expr $remaining_space / $remaining_files || echo 0`
+ file_size_acc=`expr $file_size_acc + $trunc_len`
if [ $trunc_len -lt 10 ]; then # non trivial truncation
jw_echo "Not enough room for a significant truncation on file ${f}, not sending"
else
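The arithmetic introduced by the second hunk can be followed more easily in isolation. The sketch below is a stand-alone rewrite, not the template code: the function name trunc_len_for is hypothetical, the variable names are taken from the patch, and how the template counts current_file is an assumption. It uses shell arithmetic expansion with an explicit guard instead of expr; note that expr exits with status 1 whenever its result is 0, so a fallback of the form `expr ... || echo 0` also fires on a legitimate zero result.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the patched quota arithmetic.
# Arguments: max_osb_size file_size_acc file_size total_files current_file
# Prints the number of bytes the current file may keep after truncation.
trunc_len_for() {
    max_osb_size=$1; file_size_acc=$2; file_size=$3
    total_files=$4; current_file=$5
    # Take the current file back out of the running total.
    file_size_acc=$((file_size_acc - file_size))
    remaining_files=$((total_files - current_file))
    remaining_space=$((max_osb_size - file_size_acc))
    # Share the remaining space evenly; avoid dividing by zero.
    if [ "$remaining_files" -gt 0 ]; then
        echo $((remaining_space / remaining_files))
    else
        echo 0
    fi
}

# Quota 1000, running total 1200 after adding a 500-byte file, 2 files left:
trunc_len_for 1000 1200 500 4 2   # prints 150
```

In the patched template the resulting trunc_len is then added back to file_size_acc, so that the running total reflects the truncated file.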
-- FrancescoGiacomini - 15 Nov 2007