Bug 589660 - QMF: Job status stats incorrect on scheduler and submitter objects
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: 1.3
Target Release: ---
Assigned To: Pete MacKinnon
QA Contact: MRG Quality Engineering
Reported: 2010-05-06 11:57 EDT by Pete MacKinnon
Modified: 2010-07-22 13:18 EDT
Doc Type: Bug Fix
Last Closed: 2010-07-22 13:18:47 EDT
Description Pete MacKinnon 2010-05-06 11:57:49 EDT
The src/management/qmfprobe.py script, which queries all plugin objects, does not show the expected job counts on the scheduler and submitter objects.
Comment 1 Pete MacKinnon 2010-05-19 10:28:39 EDT
Seems to be a problem with:
a) idle counts - we can get -1 to start from the schedd-set attr after a py submit
b) submitter thinks there is still 1 job running after all have completed

Need to retest this with a restart followed by condor_submit instead of a py submit.
Comment 2 Pete MacKinnon 2010-06-07 18:11:53 EDT
These counts actually come from the UPDATE_SCHEDD_AD and UPDATE_SUBMITTOR_AD updates. The QMF plugin just directly publishes whatever it gets from the schedd. The counts are off by 1 at both ends...

~/personal-condor/log  $ grep -e "IdleJobs" -e "RunningJobs" SchedLog 
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 3
TotalRunningJobs = 0
06/07 17:42:04 Changed attribute: RunningJobs = 0
06/07 17:42:04 Changed attribute: IdleJobs = 3
RunningJobs = 0
IdleJobs = 3
TotalIdleJobs = 1
TotalRunningJobs = 2
06/07 17:47:05 Changed attribute: RunningJobs = 2
06/07 17:47:05 Changed attribute: IdleJobs = 1
RunningJobs = 2
IdleJobs = 1
TotalIdleJobs = 0
TotalRunningJobs = 1
06/07 17:47:25 Changed attribute: RunningJobs = 1
06/07 17:47:25 Changed attribute: IdleJobs = 0
RunningJobs = 1
IdleJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 1
06/07 17:52:25 Changed attribute: RunningJobs = 1
06/07 17:52:25 Changed attribute: IdleJobs = 0
RunningJobs = 1
IdleJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 0
TotalRunningJobs = 0

When we are really at 2 running / 1 idle (2R/1I), the update still shows 3 idle (3I). Then, for a period of time, all 3 jobs are completed (3C) and it still reports 1 running (1R).
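
To make the lag easier to see, the schedd-wide totals can be pulled out of the grep output as (idle, running) pairs. This is only a rough sketch: the LOG string is a trimmed excerpt of the output above, and the helper names snapshots and collapse are made up for illustration, not part of Condor or the QMF plugin.

```python
import re

# Trimmed excerpt of the SchedLog grep output quoted above.
LOG = """\
TotalIdleJobs = 0
TotalRunningJobs = 0
TotalIdleJobs = 3
TotalRunningJobs = 0
06/07 17:42:04 Changed attribute: RunningJobs = 0
06/07 17:42:04 Changed attribute: IdleJobs = 3
RunningJobs = 0
IdleJobs = 3
TotalIdleJobs = 1
TotalRunningJobs = 2
TotalIdleJobs = 0
TotalRunningJobs = 1
TotalIdleJobs = 0
TotalRunningJobs = 1
TotalIdleJobs = 0
TotalRunningJobs = 0
"""

# Only the schedd-wide Total* lines are of interest here.
PATTERN = re.compile(r"^(TotalIdleJobs|TotalRunningJobs) = (-?\d+)$")


def snapshots(text):
    """Pair each TotalIdleJobs line with the TotalRunningJobs line after it."""
    result = []
    idle = None
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if not m:
            continue  # skips "Changed attribute" and per-submitter lines
        name, value = m.group(1), int(m.group(2))
        if name == "TotalIdleJobs":
            idle = value
        elif idle is not None:
            result.append((idle, value))
            idle = None
    return result


def collapse(pairs):
    """Drop consecutive repeats so only the changes in the totals remain."""
    out = []
    for pair in pairs:
        if not out or out[-1] != pair:
            out.append(pair)
    return out


if __name__ == "__main__":
    for idle, running in collapse(snapshots(LOG)):
        print(f"idle={idle} running={running}")
```

Comparing the collapsed sequence against what condor_q shows at the same timestamps would show how far each update trails the real queue state.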

Matt, thoughts?
Comment 3 Matthew Farrellee 2010-06-08 10:58:52 EDT
Thought -
 You didn't wait long enough for an update that showed 0R,0I. Does condor_status -sched already report the 1R after all are complete (shown via condor_q | tail -n1)? The SCHEDD&SUBMITTER updates may be delayed when there are no jobs to report, which may be the wrong semantic; e.g. the schedd could instead skip a report only when nothing has changed.
Comment 4 Pete MacKinnon 2010-06-08 17:58:08 EDT
Lowering SCHEDD_INTERVAL from the 5 min default certainly improved this. However, we never see a final updated submitter ad (i.e., 0 jobs running). The last one claims there is 1 job running, and that is what we are left with.
Comment 5 Matthew Farrellee 2010-06-09 06:29:19 EDT
If that can be verified by looking at condor_status -submitter then it's a candidate for fixing. IIRC, submitter ads are generated from jobs in the queue. If there are no jobs for a submitter (all completed) I could imagine the Schedd just wouldn't know to send a final update (an invalidate!).
Comment 6 Pete MacKinnon 2010-06-17 12:22:25 EDT
Fixed the incorrect idle job stats on the scheduler and submitter (needed to augment the inbound classad a bit). Now we need a solution for the missing UPDATE_SUBMITTOR_AD to update the submitter objects. Also I see this:

~/personal-condor/log  $ condor_status -submitter

Name                 Machine      Running IdleJobs HeldJobs

nobody@redhat.com    localhost.         0        0 [???????]
                           RunningJobs           IdleJobs           HeldJobs

   nobody@redhat.com                 0                  0                  0

               Total                 0                  0                  0

                    (Omitted 1 malformed ads in computed attribute totals)
Comment 7 Pete MacKinnon 2010-06-17 22:15:51 EDT
We were missing a plugin update when we walk the owner list and an owner has no jobs. The collectors were already getting this submitter update; we just needed to do the same for the schedd plugins.
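
The effect of the missing update can be illustrated with a toy model. This is a Python sketch only, not the actual schedd code: walk_owners, publish, and include_empty are hypothetical names, and the counting is a simplified assumption about how per-owner totals are built.

```python
def walk_owners(owners, jobs, publish, include_empty=True):
    """Publish one submitter update per known owner.

    owners: owner names the schedd has seen.
    jobs: (owner, status) tuples still in the queue, with status one of
        "Running", "Idle", or "Held".
    publish: callback receiving (owner, counts).
    include_empty: when False, owners with no queued jobs are skipped,
        which models the behaviour before the fix (an assumption).
    """
    for owner in owners:
        counts = {"RunningJobs": 0, "IdleJobs": 0, "HeldJobs": 0}
        for job_owner, status in jobs:
            if job_owner == owner:
                counts[status + "Jobs"] += 1
        if sum(counts.values()) == 0 and not include_empty:
            continue  # no update is sent, so the consumer keeps old values
        publish(owner, counts)


def demo(include_empty):
    """Return the plugin's final view of the submitter after all jobs finish."""
    plugin_view = {}

    def publish(owner, counts):
        plugin_view[owner] = counts

    owners = ["nobody@redhat.com"]
    # First walk: one job is still running.
    walk_owners(owners, [("nobody@redhat.com", "Running")], publish)
    # Second walk: the queue is empty because every job has completed.
    walk_owners(owners, [], publish, include_empty=include_empty)
    return plugin_view["nobody@redhat.com"]


if __name__ == "__main__":
    print("without final update:", demo(include_empty=False))
    print("with final update:   ", demo(include_empty=True))
```

With include_empty=False the plugin's view stays at 1 running job after the queue drains, which matches the stale submitter object described in the earlier comments.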

FH 29c3f20c2ea
