|Summary:||QMF: Job status stats incorrect on scheduler and submitter objects|
|Product:||Red Hat Enterprise MRG||Reporter:||Pete MacKinnon <pmackinn>|
|Component:||condor||Assignee:||Pete MacKinnon <pmackinn>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:||MRG Quality Engineering <mrgqe-bugs>|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2010-07-22 17:18:47 UTC||Type:||---|
Description Pete MacKinnon 2010-05-06 15:57:49 UTC
The src/management/qmfprobe.py script, which queries all plugin objects, does not report the expected job counts from the scheduler and submitter objects.
Comment 1 Pete MacKinnon 2010-05-19 14:28:39 UTC
Seems to be a problem with:
a) idle counts - we can get -1 to start from the schedd-set attr after a py submit
b) submitter thinks there is still 1 job running after all have completed
Need to test this with a restart followed by condor_submit instead of a py submit.
Comment 2 Pete MacKinnon 2010-06-07 22:11:53 UTC
These counts are actually coming from the UPDATE_SCHEDD_ADS and UPDATE_SUBMITTOR_ADS. QMF plugin just directly updates whatever it gets from the schedd. The counts are off by 1 at both ends...

~/personal-condor/log $ grep -e "IdleJobs" -e "RunningJobs" SchedLog
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 3
TotalRunningJobs = 0
06/07 17:42:04 Changed attribute: RunningJobs = 0
06/07 17:42:04 Changed attribute: IdleJobs = 3
RunningJobs = 0
IdleJobs = 3
TotalIdleJobs = 1
TotalRunningJobs = 2
06/07 17:47:05 Changed attribute: RunningJobs = 2
06/07 17:47:05 Changed attribute: IdleJobs = 1
RunningJobs = 2
IdleJobs = 1
TotalIdleJobs = 0
TotalRunningJobs = 1
06/07 17:47:25 Changed attribute: RunningJobs = 1
06/07 17:47:25 Changed attribute: IdleJobs = 0
RunningJobs = 1
IdleJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 1
06/07 17:52:25 Changed attribute: RunningJobs = 1
06/07 17:52:25 Changed attribute: IdleJobs = 0
RunningJobs = 1
IdleJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0

When we are really 2R/1I the update doesn't change from 3I. Then for a period of time we are 3C and it still thinks 1R. Matt, thoughts?
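For anyone repeating this check, the "Changed attribute" lines in the grep output can be turned into a timeline to see how long each count stays stale. This is only a rough sketch: the regular expression is inferred from the log lines quoted in this comment, and job_count_timeline is a hypothetical helper, not part of condor or qmfprobe.py.

```python
import re

# Matches lines such as "06/07 17:42:04 Changed attribute: RunningJobs = 0".
# The pattern is an assumption based on the SchedLog excerpt in this comment.
CHANGED = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) Changed attribute: "
    r"(RunningJobs|IdleJobs) = (-?\d+)$"
)


def job_count_timeline(lines):
    """Group 'Changed attribute' lines by timestamp, in log order.

    Returns a list of (timestamp, {attribute: value}) tuples.
    Lines that do not match (e.g. 'TotalIdleJobs = 3') are ignored.
    """
    timeline = []
    for line in lines:
        match = CHANGED.match(line.strip())
        if not match:
            continue
        stamp, attr, value = match.group(1), match.group(2), int(match.group(3))
        if timeline and timeline[-1][0] == stamp:
            # Same update as the previous line: merge into one entry.
            timeline[-1][1][attr] = value
        else:
            timeline.append((stamp, {attr: value}))
    return timeline


# A few lines copied from the excerpt above.
sample = [
    "TotalIdleJobs = 3",
    "06/07 17:42:04 Changed attribute: RunningJobs = 0",
    "06/07 17:42:04 Changed attribute: IdleJobs = 3",
    "06/07 17:47:05 Changed attribute: RunningJobs = 2",
    "06/07 17:47:05 Changed attribute: IdleJobs = 1",
]

for stamp, counts in job_count_timeline(sample):
    print(stamp, counts)
```

Comparing the timestamps of consecutive entries against condor_q at the same moments should show where the published counts lag the real queue state.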
Comment 3 Matthew Farrellee 2010-06-08 14:58:52 UTC
Thought - You didn't wait long enough for an update that showed 0R,0I. Does condor_status -sched already report the 1R after all are complete (shown via condor_q | tail -n1)? The SCHEDD&SUBMITTER updates may be delayed when there are no jobs to report, which may be the wrong semantic, e.g. don't report on no change instead.
Comment 4 Pete MacKinnon 2010-06-08 21:58:08 UTC
Lowering the SCHEDD_INTERVAL from the 5 min default certainly improved this. However, we never see a final updated submitter ad (i.e., 0 jobs running). The last one claims there is 1 job running and that is what we are left with.
Comment 5 Matthew Farrellee 2010-06-09 10:29:19 UTC
If that can be verified by looking at condor_status -submitter then it's a candidate for fixing. IIRC, submitter ads are generated from jobs in the queue. If there are no jobs for a submitter (all completed) I could imagine the Schedd just wouldn't know to send a final update (an invalidate!).
Comment 6 Pete MacKinnon 2010-06-17 16:22:25 UTC
Fixed for incorrect idle job stats on the scheduler and submitter (needed to augment the inbound classad a bit). Now we need a solution for the missing UPDATE_SUBMITTER_AD to update the submitter objects. Also I see this:

~/personal-condor/log $ condor_status -submitter
Name Machine Running IdleJobs HeldJobs
email@example.com localhost. 0 0 [???????]

RunningJobs IdleJobs HeldJobs
firstname.lastname@example.org 0 0 0
Total 0 0 0
(Omitted 1 malformed ads in computed attribute totals)
Comment 7 Pete MacKinnon 2010-06-18 02:15:51 UTC
We were missing a plugin update when we walk the owner list and there are no jobs. The collectors were getting this submitter update already - just needed to do the same for the schedd plugins also. FH 29c3f20c2ea
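A rough illustration of the behaviour described in this comment, written in Python rather than the schedd's own code. Everything here is hypothetical (publish_submitter_ads, the dict-based ads, the callback sinks, the owner name); it only shows the idea that an owner with no remaining jobs still gets a final zero-count update sent to the plugins as well as the collectors.

```python
def publish_submitter_ads(owner_counts, collectors, plugins):
    """Send one submitter update per owner to every collector and plugin.

    owner_counts: dict mapping owner name -> dict of job counts (hypothetical shape).
    collectors, plugins: lists of callables that accept an ad (a plain dict here).
    """
    for owner, counts in owner_counts.items():
        ad = {"Name": owner}
        ad.update(counts)

        # Collectors already received this update, even with zero jobs.
        for send in collectors:
            send(dict(ad))

        # The missing piece as described above: plugins must also see the
        # update when the owner has no jobs, otherwise they keep stale counts.
        for update in plugins:
            update(dict(ad))


# Example: one owner whose jobs have all completed (made-up name).
collector_seen = []
plugin_seen = []
publish_submitter_ads(
    {"alice": {"RunningJobs": 0, "IdleJobs": 0, "HeldJobs": 0}},
    collectors=[collector_seen.append],
    plugins=[plugin_seen.append],
)
print(plugin_seen)
```

With this shape the QMF submitter object would end at 0 running jobs instead of staying at the last non-zero value it was sent.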