Bug 634302

Summary:	RunningJobs/IdleJobs are slowly updated
Product:	Red Hat Enterprise MRG	Reporter:	Luigi Toscano <ltoscano>
Component:	condor	Assignee:	Matthew Farrellee <matt>
Status:	CLOSED NOTABUG	QA Contact:	MRG Quality Engineering <mrgqe-bugs>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	Development	CC:	matt
Target Milestone:	2.0
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-01-31 21:58:10 UTC	Type:	---
Bug Blocks:	673179

Description Luigi Toscano 2010-09-15 18:38:46 UTC

Description of problem:
RunningJobs/IdleJobs advertised by condor_collector are updated more slowly than other properties (CurrentJobsRunningAll, HostsTotal, HostsClaimed, ...).
It would be nice if their update rate were aligned with that of the other properties (COLLECTOR_UPDATE_INTERVAL).

Version-Release number of selected component (if applicable):
condor-7.4.4-0.9, all supported architectures.

How reproducible:
always

Comment 1 Matthew Farrellee 2010-09-16 02:53:15 UTC

The Jobs attributes are aggregated from Schedd updates and the Host attributes from Startds. The information arrives at the Collector at different rates. It isn't desirable to create a sync point for the statistics.
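To illustrate the point above, here is a minimal toy model in Python, not Condor code. The class `Collector`, its methods, and the daemon names are hypothetical and exist only for this sketch. It assumes the Collector keeps the most recent ad from each daemon and sums over whatever it currently holds.

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Toy model: stores the latest ad per daemon, keyed by daemon name."""
    schedd_ads: dict = field(default_factory=dict)
    startd_ads: dict = field(default_factory=dict)

    def update_schedd(self, name, running, idle):
        # A Schedd publishes its job counts whenever it chooses to.
        self.schedd_ads[name] = {"RunningJobs": running, "IdleJobs": idle}

    def update_startd(self, name, claimed):
        # A Startd publishes its slot state on its own schedule.
        self.startd_ads[name] = {"Claimed": claimed}

    def aggregate(self):
        # Job totals come only from Schedd ads, host totals only from Startd ads.
        schedds = self.schedd_ads.values()
        startds = self.startd_ads.values()
        return {
            "RunningJobs": sum(ad["RunningJobs"] for ad in schedds),
            "IdleJobs": sum(ad["IdleJobs"] for ad in schedds),
            "HostsTotal": len(self.startd_ads),
            "HostsClaimed": sum(1 for ad in startds if ad["Claimed"]),
        }


collector = Collector()
collector.update_schedd("schedd@example", running=0, idle=10)
collector.update_startd("slot1@example", claimed=False)

# The Startd reports a claim; the Schedd has not published again yet.
collector.update_startd("slot1@example", claimed=True)
snapshot = collector.aggregate()
```

In this sketch `snapshot` reports one claimed host while RunningJobs is still 0, because the two totals are derived from ads that arrived at different times.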

What problem is this causing?

Comment 2 Luigi Toscano 2010-09-16 16:58:17 UTC

The information provided is not coherent. I don't think a sync point should be created, but the update rates of all the sources should at least be closer to each other. Moreover, the current update behavior is a bit strange.

Configure condor with:
CONDOR_DEVELOPERS_COLLECTOR = localhost
COLLECTOR_UPDATE_INTERVAL = 10

and submit this simple job:
----
universe = vanilla
executable = /bin/sleep
arguments = 10
Queue 10
----

IdleJobs is updated to 10 (almost) immediately, as are HostsClaimed/HostsUnclaimed and CurrentJobsRunningAll. Subsequent updates are quite strange: IdleJobs and RunningJobs do not change (with the example above, they are _never_ updated, even when there is only one slot); but if the jobs are held or removed, IdleJobs changes to 0 when Host* and CurrentJobsRunningAll are updated.

Comment 3 Matthew Farrellee 2011-01-31 21:58:10 UTC

The COLLECTOR_UPDATE_INTERVAL defines when the Collector calculates the aggregates. So a delayed publish from a Schedd (SCHEDD_UPDATE_INTERVAL) could easily result in the inconsistency seen.
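As a rough sketch of the timing, the Python below computes how old the Schedd's data is each time the Collector recalculates. The function `stale_ticks` is hypothetical, and the 300-second Schedd interval is an illustrative assumption, not a statement about any default. The sketch also assumes strictly periodic publishing starting at t=0 and ignores any event-triggered updates.

```python
def stale_ticks(collector_interval, schedd_interval, horizon):
    """Return (tick_time, age_of_schedd_data) for each Collector recalculation.

    Assumes the Schedd publishes at t = 0, schedd_interval, 2 * schedd_interval, ...
    and the Collector recalculates at t = 0, collector_interval, ... up to horizon.
    All times are in seconds.
    """
    rows = []
    for t in range(0, horizon + 1, collector_interval):
        # Most recent Schedd publish at or before this recalculation.
        last_publish = (t // schedd_interval) * schedd_interval
        rows.append((t, t - last_publish))
    return rows


# Collector recalculates every 10 s; Schedd assumed to publish every 300 s.
rows = stale_ticks(collector_interval=10, schedd_interval=300, horizon=100)
for tick, age in rows:
    print(f"t={tick:3d}s  schedd data is {age:3d}s old")
```

Under these assumptions every recalculation in the first 100 seconds reuses the Schedd ad from t=0, so the job totals would stay flat while the host totals keep changing.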

I'm inclined to close this as NOTABUG, though it may be confusing behavior. I welcome an RFE that may cover a means to rationalize the data published by the Collector. Possibly something similar to bug 673179.