-- AlessandroPaolini - 2010-11-26
How to test a site before putting it into the production grid

Be sure that its GIIS URL is contained in the BDII we use for certification (gridit-bdii-01.cnaf.infn.it). If it is missing, please open a ticket on ticketing.cnaf.infn.it.

1) Check the consistency of the published information

The main branches of the LDAP tree are covered in subsections 1.1 to 1.6 below.

1.1) GlueSiteUniqueID branch

Under the branch GlueSiteUniqueID, check the values of the following parameters:

$ ldapsearch -x -LLL -H ldap://gridit-bdii-01.cnaf.infn.it:2170 -b mds-vo-name=local,o=grid 'objectClass=GlueSite' GlueSiteName GlueSiteUserSupportContact GlueSiteSysAdminContact GlueSiteSecurityContact GlueSiteOtherInfo

1.2) GlueSubClusterUniqueID branch

Under the branch GlueSubClusterUniqueID, check the values of the following parameters:

* GlueHostApplicationSoftwareRunTimeEnvironment, which should contain:
   * site name
   * current version of the middleware
   * R-GMA
   * SI00MeanPerCPU_<value> and SF00MeanPerCPU_<value>
   * if the site supports MPI jobs, MPICH (and other related tags)
   * AFS, if applicable (and verify that the WNs mount /afs)
* GlueHostProcessorOtherDescription (for instance: Cores=2,Benchmark=7.92-HEP-SPEC06)
* GlueHostOperatingSystemName (e.g. ScientificSL)
* GlueHostOperatingSystemVersion (e.g. Beryllium)
* GlueHostOperatingSystemRelease (e.g. 4.5)

EXAMPLE:

$ ldapsearch -x -LLL -H ldap://virgo-ce.roma1.infn.it:2170 -b mds-vo-name=resource,o=grid 'objectClass=GlueSubCluster' GlueHostOperatingSystemName GlueHostOperatingSystemVersion GlueHostOperatingSystemRelease GlueHostProcessorOtherDescription

1.3) GlueCEUniqueID branch

Under the branch GlueCEUniqueID, check the values of the following parameters:

$ ldapsearch -x -LLL -H ldap://virgo-ce.roma1.infn.it:2170 -b mds-vo-name=INFN-ROMA1-VIRGO,o=grid 'objectClass=GlueCE' GlueCEInfoTotalCPUs GlueCEStateWaitingJobs GlueCEInfoJobManager GlueCEImplementationName GlueCEInfoLRMSType GlueCEStateStatus GlueCEAccessControlBaseRule GlueCECapability

1.4) GlueCESEBindSEUniqueID branch

For each SE, the following must be defined:

$ ldapsearch -x -LLL -H ldap://prod-ce-02.pd.infn.it:2170 -b mds-vo-name=resource,o=grid 'objectClass=GlueCESEBind' GlueCESEBindSEUniqueID GlueCESEBindCEUniqueID GlueCESEBindMountInfo

1.5) GlueSEUniqueID branch

Under the branch GlueSEUniqueID, check the values of the following parameters:

$ ldapsearch -x -LLL -H ldap://prod-bdii-02.pd.infn.it:2170 -b mds-vo-name=INFN-PADOVA,o=grid 'objectClass=GlueSE'
$ ldapsearch -x -LLL -H ldap://prod-bdii-02.pd.infn.it:2170 -b mds-vo-name=INFN-PADOVA,o=grid 'objectClass=GlueSA' GlueSAAccessControlBaseRule

1.6) GlueServiceUniqueID branch

There is a GlueServiceUniqueID branch for each service published by the site (WMS, LFC, DPM, GRIDICE, LB, MYPROXY, BDII,…). The services are distinguished by the value of GlueServiceType, e.g.:

$ ldapsearch -x -LLL -H ldap://gridit-ce-001.cnaf.infn.it:2170 -b mds-vo-name=INFN-CNAF,o=grid 'objectClass=GlueService' GlueServiceType GlueServiceEndpoint GlueServiceName

Check the functionality of the grid elements

lcg-CE checks

Verify the authentication and authorization on the CE:

$ globus-job-run inaf-ce-01.ct.pi2s2.it /bin/hostname

(or /usr/bin/whoami, or any other command you like)

In case of PBS, check the WNs, e.g.:

$ globus-job-run pbs-enmr.cerm.unifi.it /usr/bin/pbsnodes -a

Verify that the batch system works. Make sure that the queue you are querying really exists and that your VO is enabled on it. For example:

$ globus-job-run ce-cyb.ca.infn.it/jobmanager-lcglsf -queue poncert /bin/pwd

Check the DGAS processes on the CE (with ps ax | grep dgas).

Cream-CE checks

Open your browser to https://<hostname-of-cream-ce>:8443/ce-cream/services

A page with a link to the CREAM WSDL should be shown.

Try a gsiftp transfer (e.g. using globus-url-copy or uberftp) towards that CREAM CE. E.g.:

$ globus-url-copy gsiftp://<hostname-of-cream-ce>/opt/glite/yaim/etc/versions/ig-yaim file:/tmp/ig-version-test

Try the following command:

$ glite-ce-allowed-submission <hostname-of-cream-ce>:8443

It should report:

Job Submission to this CREAM CE is enabled

Try a submission to the Cream-CE using the glite-ce-job-submit command, e.g.:

$ /bin/cat sleep.jdl
[
executable="/bin/sleep";
arguments="1";
]
$ glite-ce-job-submit -a -r <hostname-of-cream-ce>:8443/<queue> sleep.jdl
$ glite-ce-job-submit -a -r ce-cr-02.ts.infn.it:8443/cream-lsf-cert sleep.jdl
https://ce-cr-02.ts.infn.it:8443/CREAM127814374

Check the status of that job, which should eventually be DONE-OK:

$ glite-ce-job-status https://ce-cr-02.ts.infn.it:8443/CREAM127814374
2010-07-27 11:55:37,986 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[https://ce-cr-02.ts.infn.it:8443/CREAM127814374]
Status = [DONE-OK]
ExitCode = [0]

Try another submission to that CE using the glite-ce-job-submit command, and then try to cancel it (using the glite-ce-job-cancel command):

$ /bin/cat sleep2.jdl
[
executable="/bin/sleep";
arguments="1000";
]
$ glite-ce-job-submit -a -r cecream-cyb.ca.infn.it:8443/cream-lsf-poncert sleep2.jdl
https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-cancel https://cecream-cyb.ca.infn.it:8443/CREAM126335182
$ glite-ce-job-status https://cecream-cyb.ca.infn.it:8443/CREAM126335182
2010-07-27 12:18:26,973 WARN - No configuration file suitable for loading. Using built-in configuration
****** JobID=[https://cecream-cyb.ca.infn.it:8443/CREAM126335182]
Status = [CANCELLED]
ExitCode = []
Description = [Cancelled by user]

SE checks

Check that the gridftp server on the SE works (NOTE: this command is no longer available on the SL5 UI):

$ edg-gridftp-ls gsiftp://inaf-se-01.ct.pi2s2.it/

Check that the SRM client works (the published information tells you the right port to use):

$ clientSRM ping -e httpg://sunstorm.cnaf.infn.it:8444
============================================================
Sending Ping request to: httpg://sunstorm.cnaf.infn.it:8444
============================================================
Request status:
statusCode="SRM_SUCCESS"(0)
explanation="SRM server successfully contacted"
============================================================
SRM Response:
versionInfo="v2.2"
otherInfo (size=2)
[0] key="backend_type"
[0] value="StoRM"
[1] key="backend_version"
[1] value="<FE:1.5.0-1.sl4><BE:1.5.3-4.sl4>"
============================================================

If you want, try to write to the SE. Be sure your UI is pointing to an information system (IS) that contains the SE (you may use our certification BDII), for example:

$ export LCG_GFAL_INFOSYS=gridit-bdii-01.cnaf.infn.it:2170
$ lcg-cr -v --vo glast.org -d storm-fe-cg.cr.cnaf.infn.it -l lfn:/grid/glast.org/wfug.jdl file:/home/paolini/rank.jdl
$ lcg-del -v --vo glast.org -a <guid>

Job submission

Submit a test job to either the lcg-CE or the Cream-CE through the WMS, i.e. using the glite-wms-job-submit command. If applicable, also submit an MPI test job. Our certification WMS is gridit-cert-wms.cnaf.infn.it

Registration into 1st level HLR

After the site has entered production, its resources need to be registered in the HLR. Ask the site admins to open a ticket towards the HLR administrators, passing them the following information:

Certification Job

The test job checks several things, such as the environment on the WN and the installed RPMs. It also performs some replica management tests. With a "grep TEST" you can get a summary of the results; in case of errors, you have to look in detail at what went wrong.

As already said, if the site supports any flavour of MPI, launch an MPI test job. Don't forget to set a reasonable value for CPUNumber: the important thing is that your job starts running soon.
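
As a rough illustration of such an MPI test job, the sketch below writes a JDL file and submits it through the WMS. The file names (mpi-test.jdl, mpi-hooks.sh, hello.c), the CPUNumber value and the MPICH flavour are assumptions chosen for this example, not values taken from the certification procedure; adapt them to what the site actually publishes.

```shell
#!/bin/sh
# Hypothetical file names and values: adjust them to the site under test.
JDL=mpi-test.jdl
CPUS=6   # keep it small so that the job starts running soon

# Write a JDL that asks for an MPI-START/MPICH enabled resource.
cat > "$JDL" <<EOF
JobType = "Normal";
CPUNumber = $CPUS;
Executable = "mpi-start-wrapper.sh";
Arguments = "hello MPICH";
StdOutput = "mpi-test.out";
StdError = "mpi-test.err";
InputSandbox = {"mpi-start-wrapper.sh", "mpi-hooks.sh", "hello.c"};
OutputSandbox = {"mpi-test.out", "mpi-test.err"};
Requirements = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
            && Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment);
EOF

# Submit only where the gLite UI tools are installed.
if command -v glite-wms-job-submit >/dev/null 2>&1; then
    glite-wms-job-submit -a -o mpi-test.jobid "$JDL"
else
    echo "glite-wms-job-submit not found: $JDL written but not submitted"
fi
```

The Requirements expression restricts the match to resources whose GlueHostApplicationSoftwareRunTimeEnvironment carries the MPI tags checked in section 1.2.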

If you want less output in the .out and .err files, comment out the following line in the file mpi-start-wrapper.sh:

export I2G_MPI_START_DEBUG=1

A successful output will look like the following (extract):

[...]
mpi-start [DEBUG ]: using user supplied startup : '/opt/mpich-1.2.7p1/bin/mpirun '
mpi-start [DEBUG ]: => MPI_SPECIFIC_PARAMS=
mpi-start [DEBUG ]: => I2G_MPI_PRECOMMAND=
mpi-start [DEBUG ]: => MPIEXEC=/opt/mpich-1.2.7p1/bin/mpirun
mpi-start [DEBUG ]: => I2G_MACHINEFILE_AND_NP=-machinefile /tmp/tmp.iBypc12521 -np 6
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION=/home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
mpi-start [DEBUG ]: => I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG ]: /opt/mpich-1.2.7p1/bin/mpirun -machinefile /tmp/tmp.iBypc12521 -np 6 /home/dteam022/globus-tmp.t3-wn-13.11955.0/https_3a_2f_2falbalonga.cnaf.infn.it_3a9000_2fI06uWaKi1evxL3tTF-DTOg/hello
Process 4 on t3-wn-37.pn.pd.infn.it out of 6
Process 3 on t3-wn-34.pn.pd.infn.it out of 6
Process 1 on t3-wn-13.pn.pd.infn.it out of 6
Process 2 on t3-wn-34.pn.pd.infn.it out of 6
Process 5 on t3-wn-37.pn.pd.infn.it out of 6
Process 0 on t3-wn-13.pn.pd.infn.it out of 6
[...]
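
One possible way to check such an output automatically is to count the "Process N on <host> out of M" lines and verify that every rank from 0 to M-1 appears exactly once. The sketch below assumes the test program prints exactly that line format, as in the extract above; the function name check_mpi_output is made up for this example.

```shell
#!/bin/sh
# check_mpi_output FILE: succeed if every rank 0..M-1 reported exactly once.
# Assumes lines of the form "Process <rank> on <host> out of <M>".
check_mpi_output() {
    awk '
        /^Process [0-9]+ on .* out of [0-9]+/ {
            seen[$2]++      # $2 is the rank
            total = $NF     # last field is the number of processes
            lines++
        }
        END {
            if (lines == 0) { print "no MPI process lines found"; exit 1 }
            for (r = 0; r < total; r++)
                if (seen[r] != 1) { print "rank " r " missing or duplicated"; exit 1 }
            print "OK: " total " processes reported"
        }
    ' "$1"
}
```

For instance, after retrieving the job output, `check_mpi_output mpi-test.out` (with a hypothetical output file name) returns a non-zero exit status if some rank never reported.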