Bug 634302 - RunningJobs/IdleJobs are slowly updated
Summary: RunningJobs/IdleJobs are slowly updated
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 2.0
Target Release: ---
Assignee: Matthew Farrellee
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks: 673179
 
Reported: 2010-09-15 18:38 UTC by Luigi Toscano
Modified: 2011-01-31 21:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-31 21:58:10 UTC
Target Upstream Version:
Embargoed:



Description Luigi Toscano 2010-09-15 18:38:46 UTC
Description of problem:
RunningJobs/IdleJobs advertised by condor_collector are updated more slowly than other properties (CurrentJobsRunningAll, HostsTotal, HostsClaimed, ...).
It would be nice to have their update rate aligned with that of the other properties (COLLECTOR_UPDATE_INTERVAL).

Version-Release number of selected component (if applicable):
condor-7.4.4-0.9, all supported architectures.

How reproducible:
always

Comment 1 Matthew Farrellee 2010-09-16 02:53:15 UTC
The Jobs attributes are aggregated from Schedd updates and the Host attributes from Startds. The information arrives at the Collector at different rates. It isn't desirable to create a sync point for the statistics.

What problem is this causing?

Comment 2 Luigi Toscano 2010-09-16 16:58:17 UTC
The information provided is not coherent. I don't think that a sync point should be created, but the update rates of all the sources should at least be closer. Moreover, the current update behavior is a bit strange.

Configure condor with:
CONDOR_DEVELOPERS_COLLECTOR = localhost
COLLECTOR_UPDATE_INTERVAL = 10

and submit this simple job:
----
universe = vanilla
executable = /bin/sleep
arguments = 10
Queue 10
-----

IdleJobs is updated (almost) immediately to 10, as are HostsClaimed/HostsUnclaimed and CurrentJobsRunningAll. Subsequent updates are quite strange: IdleJobs and RunningJobs do not change (with the example above, they are _never_ updated, even if there is only one slot); but if the jobs are held or removed, IdleJobs changes to 0 when Host* and CurrentJobsRunningAll are updated.

Comment 3 Matthew Farrellee 2011-01-31 21:58:10 UTC
The COLLECTOR_UPDATE_INTERVAL defines when the Collector calculates the aggregates. So a delayed publish from a Schedd (SCHEDD_UPDATE_INTERVAL) could easily result in the inconsistency seen.
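
To illustrate the timing described above, here is a minimal simulation sketch. It is not Condor code: the function names, the toy job model (a count that starts at 10 and drops by one every 10 seconds), and the 300-second Schedd publish interval are all assumptions chosen for illustration. It only shows that a Collector recomputing aggregates every 10 seconds keeps advertising a stale value until the Schedd publishes again.

```python
def simulate(schedd_interval, collector_interval, true_count, horizon):
    """Toy model of two independent timers (hypothetical, not Condor code).

    Every `schedd_interval` seconds the Schedd publishes its current count.
    Every `collector_interval` seconds the Collector recomputes its aggregate
    from the last value it received. Returns a list of
    (time, true_count, advertised_count) tuples, one per Collector tick.
    """
    last_published = None
    samples = []
    for t in range(horizon + 1):
        if t % schedd_interval == 0:
            last_published = true_count(t)  # Schedd update reaches the Collector
        if t % collector_interval == 0:
            samples.append((t, true_count(t), last_published))
    return samples


def true_idle(t):
    # Assumed toy workload: 10 jobs at t=0, one fewer every 10 seconds.
    return max(0, 10 - t // 10)


# Assumed values: Schedd publishes every 300 s, Collector aggregates every 10 s.
for t, actual, advertised in simulate(300, 10, true_idle, 100):
    print(f"t={t:3d}s  actual={actual:2d}  advertised={advertised:2d}")
```

In this model the advertised value never moves during the 100-second run because no Schedd publish falls inside it. Setting both intervals to 10 makes the advertised value track the actual one at every Collector tick.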

I'm inclined to close this as NOTABUG, though it may be confusing behavior. I welcome an RFE that may cover a means to rationalize the data published by the Collector. Possibly something similar to bug 673179.

