General Info box: General hardware and system status for the WMS and LB machines, as measured the last time the sensor ran. The LB daemon status is also reported. Follow this link for details on the general info boxes: GENERAL INFO BOXES

Components Details BOX

This box presents information about each individual WMS component.
As elsewhere, a short automatic help text appears when the mouse pointer hovers over the main buttons and variables.
A flag (Green=OK / Red=Error) beside each component's label reports the status of the corresponding daemon.

Components Details from t1 to t2 (Reports the exact time interval in the LB query used to collect data)

WM Proxy [WMproxy daemon status]

Jobs -> WMProxy : number of jobs submitted within reported time interval
Collections submitted : number of collections of jobs submitted within reported time interval
Mean nodes per coll. : mean number of nodes per collection within reported time interval
Std nodes per coll.: standard deviation from the mean of the number of nodes per collection in reported time interval
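
As an illustration of the last two values, the following minimal sketch computes the mean and standard deviation of the number of nodes per collection. The function name and the sample sizes are hypothetical, and the use of the population standard deviation is an assumption, since the page does not say which estimator the sensor uses.

```python
from statistics import mean, pstdev


def collection_stats(nodes_per_collection):
    """Return (mean, std) of nodes per collection, or (None, None) if empty.

    Assumption: population standard deviation (pstdev); the sensor may
    use the sample estimator instead.
    """
    if not nodes_per_collection:
        return None, None
    return mean(nodes_per_collection), pstdev(nodes_per_collection)


# Hypothetical sizes of three collections submitted between t1 and t2.
sizes = [10, 20, 30]
mean_nodes, std_nodes = collection_stats(sizes)
print(mean_nodes, std_nodes)
```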

Proxy Renewal [PX daemon status]

Workload Manager [WM daemon status]

WM file descriptors: Number of file descriptors opened by the Workload Manager process at t2 time
WM queue: Number of entries in input.fl at t2 time
Jobs -> WM: Number of jobs enqueued to Workload Manager from WMproxy within reported time interval
Jobs Resub -> WM: Number of jobs enqueued to Workload Manager from Job Controller, i.e. resubmitted after failure
VO Views: Number of VO views available for WMS matchmaking. The value is taken from workload_manager.log, as the number of VO views used in the last matchmaking performed by the Workload Manager, or (if the last matchmaking is older than 1 hour) from the number of entries in ismdump.fl. None is returned if neither measure succeeds.
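
The fallback order described for VO Views can be sketched as follows. This is only an illustration of the selection logic: the function and parameter names are hypothetical, the inputs are assumed to have been extracted already from workload_manager.log and ismdump.fl, and falling back to ismdump.fl when the log parse fails outright is an assumption.

```python
def vo_views(last_mm_views, last_mm_age_s, ismdump_entries, max_age_s=3600):
    """Pick the VO Views value following the order described above.

    last_mm_views   -- views used in the last matchmaking, or None if the
                       log could not be parsed (hypothetical input)
    last_mm_age_s   -- age of that matchmaking in seconds, or None
    ismdump_entries -- number of entries in ismdump.fl, or None if unreadable
    """
    log_is_fresh = (
        last_mm_views is not None
        and last_mm_age_s is not None
        and last_mm_age_s <= max_age_s
    )
    if log_is_fresh:
        return last_mm_views
    # Last matchmaking older than 1 hour (or not found): use ismdump.fl.
    if ismdump_entries is not None:
        return ismdump_entries
    # Neither measure succeeded.
    return None


print(vo_views(12, 600, 15))     # recent matchmaking: log value is used
print(vo_views(12, 7200, 15))    # stale matchmaking: ismdump.fl count is used
print(vo_views(None, None, None))
```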

Log Monitor [LM daemon status]

Job Controller [JC daemon status]

JC queue: Number of entries in queue.fl at t2 time
JC file descriptors: Number of file descriptors opened by process Job Controller at t2 time
Jobs -> JC : Number of jobs enqueued to Job Controller from Workload Manager within reported time interval
Jobs JC -> Condor: Number of jobs enqueued to Condor from Job Controller within reported time interval
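
The page does not state how the sensor counts file descriptors. One possible approach on Linux, shown here as a sketch only, is to count the entries under /proc/<pid>/fd. The function name is hypothetical, and the pid of the daemon is assumed to be known already.

```python
import os


def count_open_fds(pid):
    """Count open file descriptors of a process via /proc (Linux only).

    Returns None when the directory cannot be read, for example if the
    process does not exist, permissions are missing, or /proc is absent.
    """
    fd_dir = f"/proc/{pid}/fd"
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        return None


# Demonstration on the current process; a sensor would pass the daemon's pid.
own_fds = count_open_fds(os.getpid())
print(own_fds)
```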

Local Logger [LL daemon status]
LB events queue: Number of dg20logd_* files in directory /var/tmp at t2 time. These are events not yet stored in the LB server DB, hence not available to LB queries, which may be affected by this.
LL file descriptor: Number of file descriptors opened by the Local Logger process at t2 time
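
The LB events queue value amounts to counting files that match a name pattern. A minimal sketch, assuming a simple glob over the directory is sufficient (the function name is hypothetical, and the demonstration runs in a temporary directory rather than /var/tmp):

```python
import glob
import os
import tempfile


def lb_events_queue(directory="/var/tmp", pattern="dg20logd_*"):
    """Count regular files in `directory` whose names match `pattern`."""
    matches = glob.glob(os.path.join(directory, pattern))
    return sum(1 for path in matches if os.path.isfile(path))


# Demonstration with made-up file names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dg20logd_a", "dg20logd_b", "unrelated.log"):
        open(os.path.join(tmp, name), "w").close()
    queued = lb_events_queue(tmp)

print(queued)
```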

LB Proxy [LBPX daemon status]

Transfers [FTPD daemon status]

gftp: number of gftp sessions opened at t2 time

LOAD BALANCING box

A metric is calculated every time the sensors run on a WMS server. It is a kind of estimated response time, in arbitrary units, that load balancing systems can use to select the least loaded WMS in a cluster. The higher the metric value, the higher the WMS load.

Negative values are very bad, since they indicate a dead or drained WMS.
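
A load balancing system could use the metric along these lines. This is a minimal sketch, not the actual selection code: the function name, host names and metric values are hypothetical, and a metric of zero is assumed to be valid.

```python
def pick_wms(metrics):
    """Return the host with the lowest non-negative metric, or None.

    Negative metrics mark a dead or drained WMS, so those hosts are skipped.
    """
    usable = {host: value for host, value in metrics.items() if value >= 0}
    if not usable:
        return None
    return min(usable, key=usable.get)


# Hypothetical metric values published by three WMS servers.
cluster = {
    "wms01.example.org": 42.0,
    "wms02.example.org": 7.5,
    "wms03.example.org": -1.0,  # dead or drained
}
chosen = pick_wms(cluster)
print(chosen)
```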

Topic revision: r4 - 2008-09-16 - DanieleCesini
 