Bug 753829
Summary: | Dag submissions have incorrect job totals from plugin publisher | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Pete MacKinnon <pmackinn> |
Component: | condor-qmf | Assignee: | Pete MacKinnon <pmackinn> |
Status: | CLOSED ERRATA | QA Contact: | Lubos Trilety <ltrilety> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 2.1 | CC: | ltoscano, ltrilety, matt, mkudlej, tstclair |
Target Milestone: | 2.1.1 | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | condor-7.6.5-0.10 | Doc Type: | Bug Fix |
Doc Text: |
Cause: Monitoring a DAG-based submission's job totals when the schedd QMF plug-in is used for job publishing.
Consequence: The job totals are incorrect and do not properly accumulate as the DAG submission progresses through its node job execution.
Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts appeared incorrect.
Result: DAG submission job state totals increase, decrease and accumulate consistently as viewed by a QMF client.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2012-02-06 18:17:59 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 765607 |
Description
Pete MacKinnon
2011-11-14 16:11:10 UTC
The comparator for std::set that tracks active jobs in a submission was insufficient for the dag case. Thus, dag submissions were being prematurely destroyed and recreated. This is why the job counts were off.
UW commit a80cf51

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause: Monitoring a DAG-based submission's job totals when the schedd QMF plug-in is used for job publishing.
Consequence: The job totals are incorrect and do not properly accumulate as the DAG submission progresses through it's node job execution.
Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts were appeared incorrect.
Result: DAG submission ob state totals increase, decrease and accumulate consistently as viewed by a QMF client.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -4,4 +4,4 @@
 Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts were appeared incorrect.
-Result: DAG submission ob state totals increase, decrease and accumulate consistently as viewed by a QMF client.
+Result: DAG submission job state totals increase, decrease and accumulate consistently as viewed by a QMF client.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -2,6 +2,6 @@
 Consequence: The job totals are incorrect and do not properly accumulate as the DAG submission progresses through it's node job execution.
-Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts were appeared incorrect.
+Fix: A comparator for an internal collection that tracks active jobs in a submission was insufficient for the DAG case. Thus, DAG submissions were being prematurely destroyed and recreated. This is why job counts appeared incorrect.
 Result: DAG submission job state totals increase, decrease and accumulate consistently as viewed by a QMF client.

Successfully reproduced on:
$CondorVersion: 7.6.3 Jul 27 2011 BuildID: RH-7.6.3-0.3.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $
number of submissions in qmf doesn't correspond to condor_q statistics

Tested on:
$CondorVersion: 7.6.5 Dec 16 2011 BuildID: RH-7.6.5-0.11.el5 $
$CondorPlatform: I686-RedHat_5.7 $
$CondorVersion: 7.6.5 Dec 16 2011 BuildID: RH-7.6.5-0.11.el5 $
$CondorPlatform: X86_64-RedHat_5.7 $
$CondorVersion: 7.6.5 Dec 16 2011 BuildID: RH-7.6.5-0.11.el6 $
$CondorPlatform: I686-RedHat_6.2 $
$CondorVersion: 7.6.5 Dec 16 2011 BuildID: RH-7.6.5-0.11.el6 $
$CondorPlatform: X86_64-RedHat_6.2 $
The number of submissions now corresponds better with the condor_q statistics, and the submission ends with 5 completed jobs after the DAGMan job finishes.
>>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0100.html
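
For readers unfamiliar with how a std::set comparator can cause the symptom in the description, here is a minimal sketch. It is not the code from UW commit a80cf51. The JobId struct, both comparators, and the helper function are hypothetical names, and the "proc only" ordering is an assumed example of an insufficient comparator, chosen because DAG node jobs are typically submitted as separate clusters. Under that assumption, the set treats every node job as equivalent, keeps one entry, and looks empty as soon as the first node job leaves, which is one way a submission could appear finished too early.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>

// Hypothetical job identifier; illustrative only, not Condor's actual type.
struct JobId {
    int cluster;
    int proc;
};

// ASSUMED insufficient ordering: looks at proc only. Adequate when all jobs
// share one cluster, but node jobs in different clusters with the same proc
// compare as equivalent, so std::set stores only one of them.
struct ProcOnlyLess {
    bool operator()(const JobId& a, const JobId& b) const {
        return a.proc < b.proc;
    }
};

// Ordering on (cluster, proc): every distinct job is a distinct set element.
struct ClusterProcLess {
    bool operator()(const JobId& a, const JobId& b) const {
        return std::tie(a.cluster, a.proc) < std::tie(b.cluster, b.proc);
    }
};

// Track three DAG-style node jobs (one per cluster), then remove the first
// one as if it had completed. Returns how many jobs the set still holds.
template <typename Compare>
std::size_t active_after_first_node_completes() {
    std::set<JobId, Compare> active;
    active.insert(JobId{101, 0});
    active.insert(JobId{102, 0});
    active.insert(JobId{103, 0});
    active.erase(JobId{101, 0});
    return active.size();
}
```

With ProcOnlyLess the helper returns 0 even though two node jobs are still outstanding; with ClusterProcLess it returns 2. The general point is that a std::set comparator must distinguish every element the caller considers distinct, otherwise insertions are silently dropped.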