634302 – RunningJobs/IdleJobs are slowly updated


Summary: RunningJobs/IdleJobs are slowly updated

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	condor
Sub Component:
Version:	Development
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	2.0
Target Release:	---
Assignee:	Matthew Farrellee
QA Contact:	MRG Quality Engineering
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	673179

Reported:	2010-09-15 18:38 UTC by Luigi Toscano
Modified:	2011-01-31 21:58 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-01-31 21:58:10 UTC
Target Upstream Version:
Embargoed:

Description Luigi Toscano 2010-09-15 18:38:46 UTC

Description of problem:
RunningJobs/IdleJobs advertised by condor_collector are updated more slowly than other properties (CurrentJobsRunningAll, HostsTotal, HostsClaimed, ...).
It would be nice to have their update rate aligned with that of the other properties (COLLECTOR_UPDATE_INTERVAL).

Version-Release number of selected component (if applicable):
condor-7.4.4-0.9, all supported architectures.

How reproducible:
Always.

Comment 1 Matthew Farrellee 2010-09-16 02:53:15 UTC

The Jobs attributes are aggregated from Schedd updates and the Host attributes from Startds. The information arrives at the Collector at different rates. It isn't desirable to create a sync point for the statistics.

What problem is this causing?
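A rough model of the aggregation described above may help show why the two groups of attributes can disagree. This is a hypothetical sketch, not Condor's collector code: the class `CollectorModel` and the per-ad attribute names `TotalRunningJobs`, `TotalIdleJobs` and `State` are assumptions made for illustration. It only assumes that the Collector keeps the latest ad from each daemon and derives job counts from Schedd ads and host counts from Startd ads.

```python
# Hypothetical model of pool-wide aggregates in a collector.
# CollectorModel and the per-ad attribute names (TotalRunningJobs,
# TotalIdleJobs, State) are illustrative assumptions, not Condor's code.
from dataclasses import dataclass, field


@dataclass
class CollectorModel:
    # Latest ad received from each daemon, keyed by daemon/slot name.
    schedd_ads: dict = field(default_factory=dict)
    startd_ads: dict = field(default_factory=dict)

    def update_schedd(self, name, running, idle):
        # A newer ad from the same Schedd replaces the previous one.
        self.schedd_ads[name] = {"TotalRunningJobs": running, "TotalIdleJobs": idle}

    def update_startd(self, slot, state):
        self.startd_ads[slot] = {"State": state}

    def aggregates(self):
        # Job counts come only from Schedd ads...
        running = sum(ad["TotalRunningJobs"] for ad in self.schedd_ads.values())
        idle = sum(ad["TotalIdleJobs"] for ad in self.schedd_ads.values())
        # ...while host counts come only from Startd ads.
        claimed = sum(1 for ad in self.startd_ads.values() if ad["State"] == "Claimed")
        return {
            "RunningJobs": running,
            "IdleJobs": idle,
            "HostsTotal": len(self.startd_ads),
            "HostsClaimed": claimed,
        }


c = CollectorModel()
c.update_schedd("schedd@host", running=0, idle=10)  # jobs just submitted
c.update_startd("slot1@host", "Unclaimed")
c.update_startd("slot1@host", "Claimed")  # the Startd reports the claim promptly
print(c.aggregates())
# {'RunningJobs': 0, 'IdleJobs': 10, 'HostsTotal': 1, 'HostsClaimed': 1}
```

The printed result shows a claimed host with no running job: each group of attributes is correct with respect to its own source, but the two sources have not reported at the same moment.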

Comment 2 Luigi Toscano 2010-09-16 16:58:17 UTC

The provided information is not coherent. I don't think a sync point should be created, but at least the update rates of all the sources should be closer. Moreover, the current update rate is a bit strange.

Configure Condor with:
----
CONDOR_DEVELOPERS_COLLECTOR = localhost
COLLECTOR_UPDATE_INTERVAL = 10
----

and submit this simple job:
----
universe = vanilla
executable = /bin/sleep
arguments = 10
Queue 10
----

IdleJobs is (almost) immediately updated to 10, and so are HostsClaimed/HostsUnclaimed and CurrentJobsRunningAll. Subsequent updates are quite strange: IdleJobs and RunningJobs do not change (with the example above, they are _never_ updated, even though there is only one slot), but if the jobs are held or removed, IdleJobs changes to 0 when Host* and CurrentJobsRunningAll are updated.

Comment 3 Matthew Farrellee 2011-01-31 21:58:10 UTC

COLLECTOR_UPDATE_INTERVAL defines when the Collector calculates the aggregates, so a delayed publish from a Schedd (governed by SCHEDD_UPDATE_INTERVAL) could easily result in the inconsistency seen here.

I'm inclined to close this as NOTABUG, though the behavior may be confusing. I welcome an RFE covering a means to rationalize the data published by the Collector, possibly something similar to bug 673179.
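The timing effect described in Comment 3 can be illustrated with a small simulation. This is a sketch under stated assumptions, not Condor's behavior verbatim: the helper names `published_snapshots` and `idle_jobs` are hypothetical, the Schedd is modelled as publishing only at multiples of its update interval (a real Schedd may also publish on some job events, such as the submit, hold and remove cases in Comment 2), and 300 s is an assumed SCHEDD_UPDATE_INTERVAL. COLLECTOR_UPDATE_INTERVAL = 10 and the 10 x `sleep 10` jobs on one slot come from Comment 2.

```python
# Hypothetical timing model: the Collector recomputes aggregates every
# collector_interval seconds but can only use the job counts that the
# Schedd last published. The Schedd is ASSUMED to publish only at
# multiples of schedd_interval (a simplification).


def published_snapshots(true_idle, collector_interval, schedd_interval, horizon):
    """Return (tick, published_idle, actual_idle) for each Collector tick."""
    rows = []
    for tick in range(0, horizon + 1, collector_interval):
        # Time of the most recent Schedd report the Collector could have seen.
        last_report = (tick // schedd_interval) * schedd_interval
        rows.append((tick, true_idle(last_report), true_idle(tick)))
    return rows


def idle_jobs(t):
    # 10 jobs of 10 s each on a single slot: the idle count drops by one
    # every 10 s (scheduling delays are ignored).
    return max(0, 10 - t // 10)


# Collector interval 10 s (Comment 2); Schedd interval 300 s (assumed).
for tick, published, actual in published_snapshots(idle_jobs, 10, 300, 120):
    print(f"t={tick:3d}s  IdleJobs published={published:2d}  actual={actual:2d}")
```

With these values the published IdleJobs stays at 10 for the whole two minutes while the actual count falls to 0, which matches the "never updated" observation. Passing `schedd_interval=10` instead makes the two columns agree at every tick, which is the alignment the description asks for; in this model the lag depends only on the Schedd's publish rate, not on COLLECTOR_UPDATE_INTERVAL.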
