Bug 753829 - Dag submissions have incorrect job totals from plugin publisher
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-qmf
Version: 2.1
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 2.1.1
Target Release: ---
Assigned To: Pete MacKinnon
QA Contact: Lubos Trilety
Depends On:
Blocks: 765607
Reported: 2011-11-14 11:11 EST by Pete MacKinnon
Modified: 2012-03-28 05:43 EDT

See Also:
Fixed In Version: condor-7.6.5-0.10
Doc Type: Bug Fix
Doc Text:
Cause: Monitoring a DAG-based submission's job totals when the schedd QMF plug-in is used for job publishing. Consequence: The job totals are incorrect and do not properly accumulate as the DAG submission progresses through its node job execution. Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts appeared incorrect. Result: DAG submission job state totals increase, decrease and accumulate consistently as viewed by a QMF client.
Last Closed: 2012-02-06 13:17:59 EST
External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2012:0100
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: MRG Grid security, bug fix, and enhancement update
Last Updated: 2012-02-06 18:15:47 EST

Description Pete MacKinnon 2011-11-14 11:11:10 EST
SCHEDD.PLUGINS = $(LIBEXEC)/MgmtScheddPlugin-plugin.so
QMF_PUBLISH_SUBMISSIONS = True

The above config tells the schedd plugin to publish job submission objects for QMF. When a DAG job is submitted, the job counts (idle, running, etc.) are correct at a given point in time but do not persist. This is because the internal C++ submission object in the plugin is recreated on each DAG node job submit, which wipes out the overall job totals.

The same test using the condor_job_server job publisher shows correct totals (when the schedd update interval is accounted for).
Comment 1 Pete MacKinnon 2011-12-05 08:26:43 EST
The comparator for the std::set that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why the job counts were off.

UW commit a80cf51
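
For readers unfamiliar with how a std::set comparator can cause this, here is a minimal sketch. It is not the plug-in's code and does not reproduce the actual change in the commit above. JobId, ByProcOnly, ByClusterAndProc and activeAfterFirstNodeCompletes are all hypothetical names, and the specific flaw shown (ordering by proc only) is an assumption chosen purely for illustration. The point is that std::set treats two keys as the same element whenever the comparator cannot order them, so node jobs that the comparator cannot tell apart collapse into a single entry, and the set empties as soon as the first of them completes.

```cpp
#include <cstddef>
#include <set>
#include <tuple>

// Hypothetical job identifier; not the plug-in's actual type.
struct JobId {
    int cluster;
    int proc;
};

// Insufficient ordering (assumed for illustration): only proc is compared,
// so jobs 1.0, 2.0 and 3.0 are all equivalent as far as std::set is concerned.
struct ByProcOnly {
    bool operator()(const JobId& a, const JobId& b) const {
        return a.proc < b.proc;
    }
};

// Sufficient ordering: compares cluster first, then proc.
struct ByClusterAndProc {
    bool operator()(const JobId& a, const JobId& b) const {
        return std::tie(a.cluster, a.proc) < std::tie(b.cluster, b.proc);
    }
};

// Tracks three node jobs (each in its own cluster, proc 0), then removes the
// first one as if it had completed. Returns how many jobs the set still
// considers active.
template <typename Compare>
std::size_t activeAfterFirstNodeCompletes() {
    std::set<JobId, Compare> active;
    active.insert(JobId{1, 0});
    active.insert(JobId{2, 0});  // dropped by ByProcOnly: equivalent to 1.0
    active.insert(JobId{3, 0});  // dropped by ByProcOnly: equivalent to 1.0
    active.erase(JobId{1, 0});
    return active.size();
}
```

With these definitions, activeAfterFirstNodeCompletes<ByProcOnly>() returns 0 while activeAfterFirstNodeCompletes<ByClusterAndProc>() returns 2. If the owner of the set treats "no active jobs" as "submission finished", the first case tears the submission down while two node jobs are still outstanding, which matches the premature destroy and recreate behaviour described above.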
Comment 6 Pete MacKinnon 2011-12-12 12:43:35 EST
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Monitoring a DAG-based submission's job totals when the schedd QMF plug-in is used for job publishing.

Consequence: The job totals are incorrect and do not properly accumulate as the DAG submission progresses through it's node job execution.

Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts were appeared incorrect.

Result: DAG submission ob state totals increase, decrease and accumulate consistently as viewed by a QMF client.
Comment 7 Pete MacKinnon 2011-12-12 12:44:37 EST
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -4,4 +4,4 @@
 
 Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts were appeared incorrect.
 
-Result: DAG submission ob state totals increase, decrease and accumulate consistently as viewed by a QMF client.
+Result: DAG submission job state totals increase, decrease and accumulate consistently as viewed by a QMF client.
Comment 8 Pete MacKinnon 2011-12-13 09:09:34 EST
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -2,6 +2,6 @@
 
 Consequence: The job totals are incorrect and do not properly accumulate as the DAG submission progresses through it's node job execution.
 
-Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts were appeared incorrect.
+Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts appeared incorrect.
 
 Result: DAG submission job state totals increase, decrease and accumulate consistently as viewed by a QMF client.
Comment 10 Lubos Trilety 2012-01-06 10:07:43 EST
Successfully reproduced on:
$CondorVersion: 7.6.3 Jul 27 2011 BuildID: RH-7.6.3-0.3.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

The number of submissions in QMF doesn't correspond to the condor_q statistics.
Comment 11 Lubos Trilety 2012-01-06 10:25:53 EST
Tested on:
$CondorVersion: 7.6.5 Dec 16 2011 BuildID: RH-7.6.5-0.11.el5 $
$CondorPlatform: I686-RedHat_5.7 $

$CondorVersion: 7.6.5 Dec 16 2011 BuildID: RH-7.6.5-0.11.el5 $
$CondorPlatform: X86_64-RedHat_5.7 $

$CondorVersion: 7.6.5 Dec 16 2011 BuildID: RH-7.6.5-0.11.el6 $
$CondorPlatform: I686-RedHat_6.2 $

$CondorVersion: 7.6.5 Dec 16 2011 BuildID: RH-7.6.5-0.11.el6 $
$CondorPlatform: X86_64-RedHat_6.2 $

The number of submissions corresponds better with the condor_q statistics, and it ends with 5 completed jobs after the dagman job ends.

>>> VERIFIED
Comment 12 errata-xmlrpc 2012-02-06 13:17:59 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0100.html