Bug 709873 - Plugins report JobStatus of held job as "Running" in result of GetJobSummaries method
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-qmf
Version: Development
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: 2.0
: ---
Assignee: Pete MacKinnon
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-01 20:16 UTC by Trevor McKay
Modified: 2011-06-27 14:13 UTC
3 users

Fixed In Version: condor-7.6.1-0.10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 14:13:26 UTC
Target Upstream Version:
Embargoed:


Attachments
qmf patch (1.10 KB, patch)
2011-06-02 18:15 UTC, Pete MacKinnon
aviary patch (1.18 KB, patch)
2011-06-02 18:15 UTC, Pete MacKinnon

Description Trevor McKay 2011-06-01 20:16:44 UTC
Description of problem:

A job in the held state was reported as held by condor_q and in its job ClassAd, but the data returned by the GetJobSummaries method on its enclosing Submission showed it with JobStatus 2 (Running).

The Held/Running/Idle counts for the Submission itself were correct.
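For reference, HTCondor encodes JobStatus as an integer; the values that matter for this report are 1 through 5 (the same states named in the test procedure later in this bug). The minimal C++ sketch below maps those raw codes to names, showing that the summary's 2 means Running where 5 (Held) was expected. The `JobStatus` enum and `jobStatusName` function are illustrative names, not taken from the condor source.

```cpp
#include <cassert>
#include <string>

// HTCondor's integer encoding of the JobStatus job attribute (values 1-5).
enum class JobStatus : int {
    Idle = 1,
    Running = 2,
    Removed = 3,
    Completed = 4,
    Held = 5,
};

// Hypothetical helper: map a raw JobStatus value to a readable name.
std::string jobStatusName(int code) {
    switch (static_cast<JobStatus>(code)) {
        case JobStatus::Idle:      return "Idle";
        case JobStatus::Running:   return "Running";
        case JobStatus::Removed:   return "Removed";
        case JobStatus::Completed: return "Completed";
        case JobStatus::Held:      return "Held";
        default:                   return "Unknown";
    }
}
```

With this mapping, the buggy summary value `jobStatusName(2)` reads "Running", while the job ClassAd's value `jobStatusName(5)` reads "Held".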

Version-Release number of selected component (if applicable):

How reproducible:

unknown

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Pete MacKinnon 2011-06-02 15:07:09 UTC
Fixed upstream on V7_6-branch:

~/repos/uw/condor/CONDOR_SRC  (V7_6-branch)$ git show d69c77708a7e97b5c7a47b21fd05fe66bf034ea1 1f7623883552c6acf9b87967419327545cc42aed
commit d69c77708a7e97b5c7a47b21fd05fe66bf034ea1
Author: Peter MacKinnon <pmackinn>
Date:   Wed Jun 1 16:43:46 2011 -0400

    Fix to ensure job status is up-to-date even if summary has been cached
    in qmf contrib job server

diff --git a/src/condor_contrib/mgmt/qmf/daemons/Job.cpp b/src/condor_contrib/mgmt/qmf/daemons/Job.cpp
index 6983e0e..eb219f7 100644
--- a/src/condor_contrib/mgmt/qmf/daemons/Job.cpp
+++ b/src/condor_contrib/mgmt/qmf/daemons/Job.cpp
@@ -337,6 +337,16 @@ const ClassAd* LiveJobImpl::GetSummary ()
                }
        }
 
+    // make sure we're up-to-date with status even if we've cached the summary
+       m_summary_ad->Assign(ATTR_JOB_STATUS,this->GetStatus());
+    int i;
+    if ( m_full_ad->LookupInteger ( ATTR_ENTERED_CURRENT_STATUS, i ) ) {
+        m_summary_ad->Assign(ATTR_ENTERED_CURRENT_STATUS,i);
+    }
+    else {
+        dprintf(D_ALWAYS,"Unable to get ATTR_ENTERED_CURRENT_STATUS\n");
+    }
+
        return m_summary_ad;
 }
 

commit 1f7623883552c6acf9b87967419327545cc42aed
Author: Peter MacKinnon <pmackinn>
Date:   Wed Jun 1 17:14:52 2011 -0400

    Ensure live job status is accurate in job summaries
    from aviary contrib query server

diff --git a/src/condor_contrib/aviary/src/Job.cpp b/src/condor_contrib/aviary/src/Job.cpp
index bb4112e..1b2c294 100644
--- a/src/condor_contrib/aviary/src/Job.cpp
+++ b/src/condor_contrib/aviary/src/Job.cpp
@@ -307,11 +307,21 @@ const ClassAd* LiveJobImpl::getSummary ()
                                                m_summary_ad->Assign(ATTRS[i], attr->getValue());
                                }
                        }
-                       delete attr;
+               delete attr;
                i++;
         }
        }
 
+    // make sure we're up-to-date with status even if we've cached the summary
+    m_summary_ad->Assign(ATTR_JOB_STATUS,this->getStatus());
+    int i;
+    if ( m_full_ad->LookupInteger ( ATTR_ENTERED_CURRENT_STATUS, i ) ) {
+        m_summary_ad->Assign(ATTR_ENTERED_CURRENT_STATUS,i);
+    }
+    else {
+        dprintf(D_ALWAYS,"Unable to get ATTR_ENTERED_CURRENT_STATUS\n");
+    }
+
        return m_summary_ad;
 }
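Both commits apply the same pattern: the summary ClassAd is built once and cached, so any volatile attribute copied into it goes stale unless it is refreshed on every call. The following is a simplified C++ sketch of that pattern, assuming a hypothetical `std::map`-based `SimpleAd` in place of a real ClassAd and a hypothetical `LiveJob` class; it is not the condor-qmf or aviary code, and for brevity it copies the whole ad into the summary rather than a selected attribute list.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a ClassAd: attribute name -> integer value.
using SimpleAd = std::map<std::string, int>;

// Hypothetical, simplified live job that caches its summary ad.
class LiveJob {
public:
    explicit LiveJob(int status) : m_status(status) {
        m_full_ad["JobStatus"] = status;
        m_full_ad["EnteredCurrentStatus"] = 0;
        m_full_ad["QDate"] = 0;
    }

    // Simulate a status change (e.g. a hold) arriving at time `now`.
    void setStatus(int status, int now) {
        m_status = status;
        m_full_ad["JobStatus"] = status;
        m_full_ad["EnteredCurrentStatus"] = now;
    }

    int getStatus() const { return m_status; }

    const SimpleAd& getSummary() {
        if (!m_summary_ad) {
            // Build the summary once and cache it (the expensive path).
            m_summary_ad = std::make_unique<SimpleAd>(m_full_ad);
        }
        // The fix: refresh volatile attributes even on a cache hit,
        // so a stale JobStatus is never returned.
        (*m_summary_ad)["JobStatus"] = getStatus();
        auto it = m_full_ad.find("EnteredCurrentStatus");
        if (it != m_full_ad.end()) {
            (*m_summary_ad)["EnteredCurrentStatus"] = it->second;
        } else {
            std::cerr << "Unable to get EnteredCurrentStatus\n";
        }
        return *m_summary_ad;
    }

private:
    int m_status;
    SimpleAd m_full_ad;
    std::unique_ptr<SimpleAd> m_summary_ad;
};
```

Without the two refresh statements, a job summarized while Running (2) would keep reporting 2 after being held, which is the symptom described in this bug.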

Comment 3 Pete MacKinnon 2011-06-02 18:15:14 UTC
Created attachment 502595
qmf patch

Comment 4 Pete MacKinnon 2011-06-02 18:15:52 UTC
Created attachment 502596
aviary patch

Comment 5 Pete MacKinnon 2011-06-03 13:34:39 UTC
QMF condor_job_server test procedure:

1) submit new job (either via qmf or cmd line)
2) use qpid-tool to get the submission summary while the job is still active
(i.e., not COMPLETED or REMOVED) -> "call XXX GetJobSummaries"
3) make note of the JobStatus (IDLE or RUNNING)
4) put job on hold (qmf or cmd line)
5) get summary again and note the new JobStatus (HELD)
6) release job (qmf or cmd line)
7) get summary again and note the new JobStatus (IDLE or RUNNING)

This test needs to allow for the combined latency of condor and QMF updates (10-30 seconds?).
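Because of that latency, a scripted version of steps 5 and 7 should poll the summary rather than read it once. A minimal C++ sketch, assuming a hypothetical `fetchStatus` callback that wraps the GetJobSummaries call and returns the job's JobStatus (this callback is not a real QMF or qpid-tool API):

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Poll `fetchStatus` until it returns `expected` or `timeout` elapses.
// `fetchStatus` is a hypothetical hook for "call GetJobSummaries and
// read this job's JobStatus".
bool waitForStatus(const std::function<int()>& fetchStatus, int expected,
                   std::chrono::milliseconds timeout,
                   std::chrono::milliseconds interval) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (fetchStatus() == expected) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(interval);
    }
}
```

For step 5, a call such as `waitForStatus(fetch, 5 /* Held */, std::chrono::seconds(30), std::chrono::seconds(2))` uses the upper end of the estimated latency as the timeout, so a slow update is not mistaken for the bug.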

Comment 6 Martin Kudlej 2011-06-09 14:15:22 UTC
Tested on RHEL 5.6/6.1 on i386/x86_64:
the bug reproduces with condor-7.6.1-0.9
and is fixed in condor-7.6.1-0.10. --> VERIFIED

