Bug 692741 - Incomplete Job and Submission Details from JobServer (probably Aviary too)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-qmf
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 2.0
Target Release: ---
Assignee: Pete MacKinnon
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On: 697503
Blocks: 693778
 
Reported: 2011-04-01 01:20 UTC by Matthew Farrellee
Modified: 2011-06-23 15:36 UTC (History)

Fixed In Version: condor-7.6.1-0.2
Doc Type: Bug Fix
Doc Text:
Cause: A proc entry appears before its cluster entry in the job queue log.
Consequence: When ATTR_OWNER is set from the log before the submission name, there is no opportunity to update the internal SubmissionObject.
Fix: The code was changed so that the internal Job object updates its associated SubmissionObject with an owner or name regardless of order.
Result: Submission name and owner data appear correctly through QMF and Aviary queries.
Clone Of:
Environment:
Last Closed: 2011-06-23 15:36:55 UTC
Target Upstream Version:
Embargoed:


Attachments
Test job_queue.log (5.34 KB, application/octet-stream)
2011-04-01 01:20 UTC, Matthew Farrellee


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:0889 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Grid 2.0 Release 2011-06-23 15:35:53 UTC

Description Matthew Farrellee 2011-04-01 01:20:56 UTC
Created attachment 489281
Test job_queue.log

Definitely an issue in 7.6.0-0.4; presumably also an issue in 1.3.2.

Using the attached job_queue.log,

(a) Query the JobServer for the details of 246.0. On failure, attributes found only in the 0246.-1 ad, for instance Owner, are not returned. On success, all attributes are returned, and where an attribute appears in both ads the value comes from the 246.0 ad, for instance JobPrio = -1 (not 0).

(b) Also look at the Submission, eeyore.local#246. A failure will show Owner = Unknown. Success is Owner = matt.
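
The success criterion in (a) amounts to treating the cluster ad as a set of defaults that the proc ad overrides. A minimal Python sketch of that lookup rule follows. It is an illustration only, not the JobServer code: the function name `effective_job_ad` is hypothetical, and the attribute values are taken from the expected results stated above rather than read from the attached job_queue.log.

```python
def effective_job_ad(cluster_ad, proc_ad):
    """Return the job's full attribute view.

    The cluster (parent) ad supplies defaults; any attribute that also
    appears in the proc ad takes its value from the proc ad.
    """
    merged = dict(cluster_ad)   # start from the parent ad's attributes
    merged.update(proc_ad)      # proc ad wins for duplicate attributes
    return merged


# Illustrative values only (assumed, not parsed from the attachment).
cluster_ad = {"Owner": "matt", "JobPrio": 0}   # stands in for the 0246.-1 ad
proc_ad = {"JobPrio": -1}                      # stands in for the 246.0 ad

job = effective_job_ad(cluster_ad, proc_ad)
```

With these inputs, `job` carries `Owner` from the cluster ad and `JobPrio` from the proc ad, which is the behaviour (a) asks for.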

For (b), the issue is eager creation of the Job in JobServerJobLogConsumer. If the parent (cluster) ad has not already been created, a placeholder is made. When that placeholder is later populated, the Job is never given a chance to update the Submission it belongs to with the proper owner.

For (a), the code that iterates over the job ad for details does not also iterate over the job's parent ad.
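
The placeholder problem described above can be illustrated with a small, order-independent consumer. This is a hedged sketch, not the condor-qmf implementation: the `LogConsumer` and `Submission` classes, the `replay` helper, and the choice of which ad carries `Owner` or `Submission` are all assumptions made for the example. The idea is to re-synchronise the submission after every attribute update, so it does not matter whether the proc entry or the cluster entry is seen first.

```python
UNKNOWN_OWNER = "Unknown"


def parse_key(key):
    """Split an ad key such as '246.0' or '0246.-1' into (cluster, proc)."""
    cluster, proc = key.split(".")
    return int(cluster), int(proc)


class Submission:
    def __init__(self, name):
        self.name = name
        self.owner = UNKNOWN_OWNER


class LogConsumer:
    """Hypothetical stand-in for a job queue log consumer."""

    def __init__(self):
        self.ads = {}          # (cluster, proc) -> attributes; created lazily
        self.submissions = {}  # submission name -> Submission

    def set_attribute(self, key, name, value):
        cluster, proc = parse_key(key)
        # Eager creation: a missing ad becomes an empty placeholder.
        self.ads.setdefault((cluster, proc), {})[name] = value
        # Re-sync on every update so arrival order does not matter.
        self._sync_submission(cluster)

    def _find(self, cluster, name):
        # Proc ads sort after the cluster ad (proc id -1), so reversing
        # the order checks proc ads first and the parent ad last.
        for (c, _proc), attrs in sorted(self.ads.items(), reverse=True):
            if c == cluster and name in attrs:
                return attrs[name]
        return None

    def _sync_submission(self, cluster):
        sub_name = self._find(cluster, "Submission")
        if sub_name is None:
            return  # nothing to attach an owner to yet
        submission = self.submissions.setdefault(sub_name, Submission(sub_name))
        owner = self._find(cluster, "Owner")
        if owner is not None:
            submission.owner = owner


def replay(records):
    consumer = LogConsumer()
    for key, name, value in records:
        consumer.set_attribute(key, name, value)
    return consumer.submissions


# Assumed placement of attributes, for illustration only.
proc_first = [
    ("246.0", "Submission", "eeyore.local#246"),
    ("0246.-1", "Owner", "matt"),
]
cluster_first = list(reversed(proc_first))
```

Replaying either `proc_first` or `cluster_first` leaves the `eeyore.local#246` submission with owner `matt` rather than `Unknown`.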

Comment 1 Pete MacKinnon 2011-04-13 17:05:20 UTC
aviary:
fixed at FH V7_6-aviary-branch 55b2689cd12f0c2c755f1df5952595e6d662c465

qmf:
still needs work...

Comment 2 Pete MacKinnon 2011-04-19 13:28:14 UTC
Even better fixes... :-)

aviary:
FH at 3658ae5  V7_6-aviary-branch

qmf:
UW at 1083848, 93798c5  V7_6-branch

Comment 3 Pete MacKinnon 2011-04-27 21:06:40 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Proc entry appears before cluster entry in job queue log.
Consequence: No opportunity to update the internal SubmissionObject once ATTR_OWNER set from log before the submission name. 
Fix: Code changes so that the internal Job object updates its associated SubmissionObject with an owner or name regardless of order.
Result: What now happens when the actions or circumstances above occur.

Comment 4 Pete MacKinnon 2011-04-28 15:02:12 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1,4 @@
 Cause: Proc entry appears before cluster entry in job queue log.
 Consequence: No opportunity to update the internal SubmissionObject once ATTR_OWNER set from log before the submission name. 
 Fix: Code changes so that the internal Job object updates its associated SubmissionObject with an owner or name regardless of order.
-Result: What now happens when the actions or circumstances above occur.
+Result: Submission name and owner data appear correctly through QMF and Aviary queries.

Comment 6 Tomas Rusnak 2011-05-12 14:26:47 UTC
Reproduced on RHEL5/x86_64 with:

$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $

qpid-tool is broken in this version, so the JobServer must be queried directly from Python code:

qpid: call 102 GetJobAd 246.0
qpid: invalid conversion: Variant is not a string; use asString() if conversion is required. (qpid/types/Variant.cpp:569) (7) - {}

qpid: call 102 GetJobAd "246.0"
qpid: Invalid Job Id (65536) - {}

My script returned the relevant info:

JobPrio = 0
Owner = Unknown

Comment 8 Tomas Rusnak 2011-05-12 16:44:52 UTC
Retested on all supported platforms (x86 and x86_64, RHEL5 and RHEL6) with:

condor-7.6.1-0.4

The job_queue.log was read as expected by the JobServer:
qpid: call 220 GetJobAd "246.0"
....
'Owner': 'matt'
'JobPrio': -1
....


>>> VERIFIED

Comment 9 errata-xmlrpc 2011-06-23 15:36:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html

