Description of problem:
A job in the held state was reported as held by condor_q and in its job ClassAd, but was reported with JobStatus 2 (Running) in the data returned by the GetJobSummaries method called on its enclosing Submission. The Held/Running/Idle counts for the Submission itself were correct.

How reproducible:
Unknown
Upstream at V7_6-branch:

~/repos/uw/condor/CONDOR_SRC (V7_6-branch)$ git show d69c77708a7e97b5c7a47b21fd05fe66bf034ea1 1f7623883552c6acf9b87967419327545cc42aed
commit d69c77708a7e97b5c7a47b21fd05fe66bf034ea1
Author: Peter MacKinnon <pmackinn>
Date:   Wed Jun 1 16:43:46 2011 -0400

    Fix to ensure job status is up-to-date even if summary has been cached in qmf contrib job server

diff --git a/src/condor_contrib/mgmt/qmf/daemons/Job.cpp b/src/condor_contrib/mgmt/qmf/daemons/Job.cpp
index 6983e0e..eb219f7 100644
--- a/src/condor_contrib/mgmt/qmf/daemons/Job.cpp
+++ b/src/condor_contrib/mgmt/qmf/daemons/Job.cpp
@@ -337,6 +337,16 @@ const ClassAd* LiveJobImpl::GetSummary ()
         }
     }
 
+    // make sure we're up-to-date with status even if we've cached the summary
+    m_summary_ad->Assign(ATTR_JOB_STATUS,this->GetStatus());
+    int i;
+    if ( m_full_ad->LookupInteger ( ATTR_ENTERED_CURRENT_STATUS, i ) ) {
+        m_summary_ad->Assign(ATTR_ENTERED_CURRENT_STATUS,i);
+    }
+    else {
+        dprintf(D_ALWAYS,"Unable to get ATTR_ENTERED_CURRENT_STATUS\n");
+    }
+
     return m_summary_ad;
 }

commit 1f7623883552c6acf9b87967419327545cc42aed
Author: Peter MacKinnon <pmackinn>
Date:   Wed Jun 1 17:14:52 2011 -0400

    Ensure live job status is accurate in job summaries from aviary contrib query server

diff --git a/src/condor_contrib/aviary/src/Job.cpp b/src/condor_contrib/aviary/src/Job.cpp
index bb4112e..1b2c294 100644
--- a/src/condor_contrib/aviary/src/Job.cpp
+++ b/src/condor_contrib/aviary/src/Job.cpp
@@ -307,11 +307,21 @@ const ClassAd* LiveJobImpl::getSummary ()
                     m_summary_ad->Assign(ATTRS[i], attr->getValue());
                 }
             }
-            delete attr;
+            delete attr;
             i++;
         }
     }
 
+    // make sure we're up-to-date with status even if we've cached the summary
+    m_summary_ad->Assign(ATTR_JOB_STATUS,this->getStatus());
+    int i;
+    if ( m_full_ad->LookupInteger ( ATTR_ENTERED_CURRENT_STATUS, i ) ) {
+        m_summary_ad->Assign(ATTR_ENTERED_CURRENT_STATUS,i);
+    }
+    else {
+        dprintf(D_ALWAYS,"Unable to get ATTR_ENTERED_CURRENT_STATUS\n");
+    }
+
     return m_summary_ad;
 }
Created attachment 502595: qmf patch
Created attachment 502596: aviary patch
QMF condor_job_server test procedure:
1) Submit a new job (via QMF or the command line).
2) While the job is still active (i.e., not COMPLETED or REMOVED), use qpid-tool to get the submission summary: "call XXX GetJobSummaries".
3) Note the JobStatus (IDLE or RUNNING).
4) Put the job on hold (via QMF or the command line).
5) Get the summary again and note the new JobStatus (HELD).
6) Release the job (via QMF or the command line).
7) Get the summary again and note the new JobStatus (IDLE or RUNNING).

This test needs to allow for the combined latency of Condor and QMF updates (10-30 seconds?) between each state change and the summary that reflects it.
Tested on RHEL 5.6 and 6.1, on both i386 and x86_64: the bug reproduces with condor-7.6.1-0.9 and is fixed in condor-7.6.1-0.10. --> VERIFIED