# WMS Test Plan

NOTICE: missing tests:

- Unit tests: N/A
- Deployment tests: generic repository
## Installation test

First of all, install the yum-protectbase rpm:

```shell
yum install yum-protectbase.noarch
```

Then proceed with the installation of the CA certificates by issuing:

```shell
yum install ca-policy-egi-core
```

Install the WMS metapackage:

```shell
yum install emi-wms
```

After defining the site-info.def file, configure the WMS:

```shell
/opt/glite/yaim/bin/yaim -c -s site-info.def -n WMS
```

At the end of the installation the various init scripts should be checked with all parameters (start | stop | restart | status | version). (TBD)
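The init-script checks above can be turned into a simple checklist generator; note the service names are assumptions based on a typical EMI WMS node and should be matched against what is actually installed under /etc/init.d:

```shell
# Generate the full matrix of init-script checks to run by hand.
# Service names (glite-wms-*) are assumed, not taken from the source.
services="glite-wms-wmproxy glite-wms-wm glite-wms-jc glite-wms-lm glite-wms-ice"
actions="start stop restart status version"
count=0
for svc in $services; do
  for act in $actions; do
    echo "service $svc $act"
    count=$((count + 1))
  done
done
echo "checks to run: $count"
```

Running each printed command and recording its exit status gives a complete record of the post-install check.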
## Update test

Starting from a production WMS, add the patch repository, then issue:

```shell
yum update
```

If necessary, reconfigure the WMS:

```shell
/opt/glite/yaim/bin/yaim -c -s site-info.def -n WMS
```

At the end of the update the various init scripts should be checked with all parameters (start | stop | restart | status | version). (TBD)
## Service configuration tests
## Functionality tests

### Features/Scenarios to be tested

The WMS can be deployed in two modes:
### Test job cycle (from submission to output retrieval)

Submit a job to the WMS service and, when finished, retrieve the output; at the end the final status of the job should be Cleared. Submission can be tested using different types of proxy:

#### Normal Job
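For the normal-job case, a minimal description might look as follows (the executable and file names are illustrative assumptions, not taken from the source):

```jdl
[
  Type = "Job";
  JobType = "Normal";
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
]
```

The cycle can then be driven with the standard UI commands: glite-wms-job-submit -a to submit, glite-wms-job-status to poll, and glite-wms-job-output to retrieve the sandbox (after which the status should move to Cleared).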
#### Perusal Job

Job perusal is the ability to view output from a job while it is running. Implemented.

#### DAG Job

Directed Acyclic Graphs (a set of jobs where the input/output/execution of one or more jobs may depend on one or more other jobs). Implemented.
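A minimal DAG description for the test above might look as follows (node payloads are illustrative assumptions; nodeB runs only after nodeA completes):

```jdl
[
  Type = "dag";
  nodes = [
    nodeA = [ description = [ Executable = "/bin/hostname";
                              StdOutput = "a.out";
                              OutputSandbox = {"a.out"}; ]; ];
    nodeB = [ description = [ Executable = "/bin/date";
                              StdOutput = "b.out";
                              OutputSandbox = {"b.out"}; ]; ];
  ];
  dependencies = { { nodeA, nodeB } };
]
```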
#### Parametric Job

Multiple jobs with one parametrised description. Implemented.

#### Collection Job

Multiple jobs with a common description. There are two ways to submit a collection:
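For the parametric case above, a sketch of a description might be (payload and parameter values are illustrative; the assumed semantics are that _PARAM_ takes the values from ParameterStart up to, but excluding, Parameters, stepping by ParameterStep):

```jdl
JobType = "Parametric";
Executable = "/bin/echo";
Arguments = "_PARAM_";
ParameterStart = 1;
ParameterStep = 1;
Parameters = 5;                     // expands to jobs with _PARAM_ = 1..4
StdOutput = "out_PARAM_.txt";
OutputSandbox = {"out_PARAM_.txt"};
```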
#### Parallel Job

Jobs that can run on one or more CPUs in parallel. Implemented.
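A parallel job is requested through the CpuNumber JDL attribute; a sketch, assuming an MPI-style payload (the wrapper script name is hypothetical):

```jdl
JobType = "Normal";
CpuNumber = 4;                        // CPUs requested in parallel (illustrative)
Executable = "mpi-start-wrapper.sh";  // hypothetical wrapper for the MPI payload
InputSandbox = {"mpi-start-wrapper.sh"};
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
```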
#### Delegation
### Shallow and deep re-submission

There are two types of re-submission. The first, called deep, occurs when the user's job has started running on the WN and then the job itself or the WMS JobWrapper has failed. The second, called shallow, occurs when the WMS JobWrapper has failed before starting the actual user's job. Implemented.

### Job list-match testing

Test various matching requests. Implemented.

#### With data

Test matchmaking using data requests. TBD
```jdl
###########################################
# JDL with Data Requirements              #
###########################################
Executable = "calc-pi.sh";
Arguments = "1000";
StdOutput = "std.out";
StdError = "std.err";
Prologue = "prologue.sh";
InputSandbox = {"calc-pi.sh", "fileA", "fileB", "prologue.sh"};
OutputSandbox = {"std.out", "std.err", "out-PI.txt", "out-e.txt"};
Requirements = true;
DataRequirements = {
  [
    DataCatalogType = "DLI";
    DataCatalog = "http://lfcserver.cnaf.infn.it:8085";
    InputData = {"lfn:/grid/infngrid/cesini/PI_1M.txt",
                 "lfn:/grid/infngrid/cesini/e-2M.txt"};
  ]
};
DataAccessProtocol = "gsiftp";
```

The listed CEs should be the ones "close" to the used SE.

#### Gang-matching

Consider, for example, a job that requires a CE and a given amount of free space on a close SE to run successfully: the matchmaking solution to this problem requires three participants in the match (i.e. job, CE and SE), which cannot be accommodated by conventional (bilateral) matchmaking. The gangmatching feature of the classads library provides a multilateral matchmaking formalism to address this deficiency. Try some list-matches using different Requirements expressions that use the anyMatch() function: TBD
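The shallow and deep re-submission limits described earlier can also be bounded per job in the JDL; a sketch with illustrative values:

```jdl
RetryCount = 3;         // maximum number of deep re-submissions
ShallowRetryCount = 5;  // maximum number of shallow re-submissions
```

Submitting a job that is forced to fail in each mode and counting the recorded resubmissions is a simple way to exercise both limits.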
### WMS job cancel testing

Test the cancellation of these types of jobs (final status should be Cancelled):
### Prologue and epilogue jobs

In the JDL you can specify two attributes, Prologue and Epilogue, which are scripts executed respectively before and after the user's job. Implemented.

### Proxy renewal
### WMS feedback

This mechanism prevents a job from remaining stuck for a long time in a queue while waiting to be assigned to a worker node for execution. There are three parameters in the JDL that can be used to manage this mechanism:
### CREAM CE
### Limiter mechanism

The WMS implements a limiter mechanism to protect itself from overload. The mechanism is based on several parameters and can be configured in the WMS configuration file. All these parameters should be checked and tested. (TBD)

```
Usage: /usr/sbin/glite_wms_wmproxy_load_monitor [OPTIONS]...
  --load1      threshold for load average (1min)
  --load5      threshold for load average (5min)
  --load15     threshold for load average (15min)
  --memusage   threshold for memory usage (%)
  --swapusage  threshold for swap usage (%)
  --fdnum      threshold for used file descriptors
  --diskusage  threshold for disk usage (%)
  --flsize     threshold for input filelist size (KB)
  --flnum      threshold for number of unprocessed jobs (for filelist)
  --jdsize     threshold for input jobdir size (KB)
  --jdnum      threshold for number of unprocessed jobs (for jobdir)
  --ftpconn    threshold for number of FTP connections
  --oper       operation to monitor (can be listed with --list)
  --list       list supported operations
  --show       show all the current values
```

### Purging

There are different purging mechanisms on the WMS:
### Configuration file

The file /etc/glite-wms/glite_wms.conf is used to configure all the daemons running on a WMS. Many parameters can be set in this file, and almost all of them should be checked. (TBD)

It should be verified that the configuration file /etc/glite-wms/glite_wms.conf contains these hard-coded values:
For the Common section:

```
DGUser = "${GLITE_WMS_USER}"
HostProxyFile = "${WMS_LOCATION_VAR}/glite/wms.proxy"
LBProxy = true
```

For the JobController section:

```
CondorSubmit = "${CONDORG_INSTALL_PATH}/bin/condor_submit"
CondorRemove = "${CONDORG_INSTALL_PATH}/bin/condor_rm"
CondorQuery = "${CONDORG_INSTALL_PATH}/bin/condor_q"
CondorRelease = "${CONDORG_INSTALL_PATH}/bin/condor_release"
CondorDagman = "${CONDORG_INSTALL_PATH}/bin/condor_dagman"
DagmanMaxPre = 10
SubmitFileDir = "${WMS_LOCATION_VAR}/jobcontrol/submit"
OutputFileDir = "${WMS_LOCATION_VAR}/jobcontrol/condorio"
InputType = "jobdir"
Input = "${WMS_LOCATION_VAR}/jobcontrol/jobdir/"
LockFile = "${WMS_LOCATION_VAR}/jobcontrol/lock"
LogFile = "${WMS_LOCATION_LOG}/jobcontoller_events.log"
LogLevel = 5
MaximumTimeAllowedForCondorMatch = 1800
ContainerRefreshThreshold = 1000
```

For the NetworkServer section:

```
II_Port = 2170
Gris_Port = 2170
II_Timeout = 100
Gris_Timeout = 20
II_DN = "mds-vo-name=local, o=grid"
Gris_DN = "mds-vo-name=local, o=grid"
BacklogSize = 64
ListeningPort = 7772
MasterThreads = 8
DispatcherThreads = 10
SandboxStagingPath = "${WMS_LOCATION_VAR}/SandboxDir"
LogFile = "${WMS_LOCATION_LOG}/networkserver_events.log"
LogLevel = 5
EnableQuotaManagement = false
MaxInputSandboxSize = 10000000
EnableDynamicQuotaAdjustment = false
QuotaAdjustmentAmount = 10000
QuotaInsensibleDiskPortion = 2.0
DLI_SI_CatalogTimeout = 60
ConnectionTimeout = 300
```

For the LogMonitor section:

```
JobsPerCondorLog = 1000
LockFile = "${WMS_LOCATION_VAR}/logmonitor/lock"
LogFile = "${WMS_LOCATION_LOG}/logmonitor_events.log"
LogLevel = 5
ExternalLogFile = "${WMS_LOCATION_LOG}/logmonitor_external.log"
MainLoopDuration = 5
CondorLogDir = "${WMS_LOCATION_VAR}/logmonitor/CondorG.log"
CondorLogRecycleDir = "${WMS_LOCATION_VAR}/logmonitor/CondorG.log/recycle"
MonitorInternalDir = "${WMS_LOCATION_VAR}/logmonitor/internal"
IdRepositoryName = "irepository.dat"
AbortedJobsTimeout = 600
GlobusDownTimeout = 7200
RemoveJobFiles = true
ForceCancellationRetries = 2
```

For the WorkloadManager section:

```
PipeDepth = 200
WorkerThreads = 5
DispatcherType = "jobdir"
Input = "${WMS_LOCATION_VAR}/workload_manager/jobdir"
LogLevel = 5
LogFile = "${WMS_LOCATION_LOG}/workload_manager_events.log"
MaxRetryCount = 10
CeMonitorServices = {}
CeMonitorAsynchPort = 0
IsmBlackList = {}
IsmUpdateRate = 600
IsmIiPurchasingRate = 480
JobWrapperTemplateDir = "${WMS_JOBWRAPPER_TEMPLATE}"
IsmThreads = false
IsmDump = "${WMS_LOCATION_VAR}/workload_manager/ismdump.fl"
SiServiceName = "org.glite.SEIndex"
DliServiceName = "data-location-interface"
DisablePurchasingFromGris = true
EnableBulkMM = true
CeForwardParameters = {"GlueHostMainMemoryVirtualSize",
                       "GlueHostMainMemoryRAMSize",
                       "GlueCEPolicyMaxCPUTime"}
MaxOutputSandboxSize = -1
EnableRecovery = true
QueueSize = 1000
ReplanGracePeriod = 3600
MaxReplansCount = 5
WmsRequirements = ((ShortDeadlineJob =?= TRUE) ?
                   RegExp(".*sdj$", other.GlueCEUniqueID) :
                   !RegExp(".*sdj$", other.GlueCEUniqueID)) &&
                  (other.GlueCEPolicyMaxTotalJobs == 0 ||
                   other.GlueCEStateTotalJobs < other.GlueCEPolicyMaxTotalJobs) &&
                  (EnableWmsFeedback =?= TRUE ?
                   RegExp("cream", other.GlueCEImplementationName, "i") : true)
```

For the WorkloadManagerProxy section:

```
SandboxStagingPath = "${WMS_LOCATION_VAR}/SandboxDir"
LogFile = "${WMS_LOCATION_LOG}/wmproxy.log"
LogLevel = 5
MaxInputSandboxSize = 100000000
ListMatchRootPath = "/tmp"
GridFTPPort = 2811
LBLocalLogger = "localhost:9002"
MinPerusalTimeInterval = 1000
AsyncJobStart = true
EnableServiceDiscovery = false
LBServiceDiscoveryType = "org.glite.lb.server"
ServiceDiscoveryInfoValidity = 3600
WeightsCacheValidity = 86400
MaxServedRequests = 50
OperationLoadScripts = [
  jobRegister = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobRegister --load1 22 --load5 20 --load15 18 --memusage 99 --diskusage 95 --fdnum 1000 --jdnum 1500 --ftpconn 300"
  jobSubmit = "${WMS_LOCATION_SBIN}/glite_wms_wmproxy_load_monitor --oper jobSubmit --load1 22 --load5 20 --load15 18 --memusage 99 --diskusage 95 --fdnum 1000 --jdnum 1500 --ftpconn 300"
  RuntimeMalloc = "/usr/lib64/libtcmalloc_minimal.so"
]
```

For the ICE section:

```
start_listener = false
start_lease_updater = false
logfile = "${WMS_LOCATION_LOG}/ice.log"
log_on_file = true
creamdelegation_url_prefix = "https://"
listener_enable_authz = true
poller_status_threshold_time = 30*60
ice_topic = "CREAM_JOBS"
subscription_update_threshold_time = 3600
lease_delta_time = 0
notification_frequency = 3*60
start_proxy_renewer = true
max_logfile_size = 100*1024*1024
ice_host_cert = "${GLITE_HOST_CERT}"
Input = "${WMS_LOCATION_VAR}/ice/jobdir"
job_cancellation_threshold_time = 300
poller_delay = 2*60
persist_dir = "${WMS_LOCATION_VAR}/ice/persist_dir"
lease_update_frequency = 20*60
log_on_console = false
cream_url_postfix = "/ce-cream/services/CREAM2"
subscription_duration = 86400
bulk_query_size = 100
purge_jobs = false
InputType = "jobdir"
listener_enable_authn = true
ice_host_key = "${GLITE_HOST_KEY}"
start_poller = true
creamdelegation_url_postfix = "/ce-cream/services/gridsite-delegation"
cream_url_prefix = "https://"
max_ice_threads = 10
cemon_url_prefix = "https://"
start_subscription_updater = true
proxy_renewal_frequency = 600
ice_log_level = 700
soap_timeout = 60
max_logfile_rotations = 10
cemon_url_postfix = "/ce-monitor/services/CEMonitor"
max_ice_mem = 2096000
ice_empty_threshold = 600
```

It should then be verified that:
## Performance tests

### Collection of multiple nodes

Submit a collection of n nodes (a good compromise should be n = 1000). (TBD)

### Stress test

Stress tests can be parametrised over several features: (partially implemented)
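For the multi-node collection test above, the bulk JDL can be generated rather than written by hand; a sketch, where the node payload (/bin/hostname) is an illustrative choice:

```shell
# Generate a collection JDL with N independent nodes for the bulk test.
N=1000
out=collection.jdl
{
  echo '[ Type = "collection"; nodes = {'
  for i in $(seq 1 "$N"); do
    sep=","
    [ "$i" -eq "$N" ] && sep=""
    echo "  [ Executable = \"/bin/hostname\"; StdOutput = \"node${i}.out\"; OutputSandbox = {\"node${i}.out\"}; ]$sep"
  done
  echo '} ]'
} > "$out"
echo "nodes generated: $(grep -c Executable "$out")"
```

The resulting file can then be submitted as a single collection and the time from submission to the last Done status recorded.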
## Regression tests

Complete list of RfC tests.

## Nagios probe test

For tests about Nagios probes see here.

## Note

"Implemented." means that an automatic test exists. Otherwise the test must be developed and/or executed by hand.