Bug 476996 - JobRouter's status hook output not saved to Schedd
Summary: JobRouter's status hook output not saved to Schedd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 1.1.1
Target Release: ---
Assignee: Robert Rati
QA Contact: Jeff Needle
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-12-18 15:21 UTC by Matthew Farrellee
Modified: 2009-04-21 16:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-21 16:19:07 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHEA-2009:0434
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Messaging and Grid Version 1.1.1
Last Updated: 2009-04-21 16:15:50 UTC

Description Matthew Farrellee 2008-12-18 15:21:36 UTC
The JobRouter's status hook outputs a dest_ad, but only pieces of that ad are written back to the Schedd, specifically not the EC2JobSuccessful attribute. This means a JobRouter crash could cause the loss of the EC2JobSuccessful information, if it isn't output by every invocation of the status hook.

The deeper issue is what to send to the Schedd. You don't want multiple writers for any given attribute.

Comment 1 Matthew Farrellee 2008-12-18 15:28:09 UTC
Even if EC2JobSuccessful is output on every invocation of the status hook, there may not be a guarantee that the status hook will be invoked when the JR returns.

Comment 2 Robert Rati 2009-01-19 15:19:52 UTC
The status hook's output will now be updated in the routed job's stats and sent to the schedd.  Updates should be seen in condor_q and condor_history.

Note: Only attributes that condor doesn't normally write should be returned from the status hook.  Currently, only MyType and TargetType are prevented from being updated; all other returned attributes are the responsibility of the hook writer.
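
For illustration only, here is a minimal Python sketch of the behavior described in the note above. It is not the JobRouter's actual code; the function names, the dict-based ad representation, and the "Attr = Value" line parsing are assumptions made for this example. The only rule taken from this report is that MyType and TargetType are never updated from hook output.

```python
# Hypothetical sketch; not the JobRouter's actual implementation.
# Rule taken from this report: MyType and TargetType from the status hook
# are ignored, and every other returned attribute is applied as-is.

# ClassAd attribute names are case-insensitive, so compare in lower case.
PROTECTED_ATTRS = {"mytype", "targettype"}


def parse_hook_output(text):
    """Parse 'Attr = Value' lines into a dict; values stay as raw strings."""
    ad = {}
    for line in text.splitlines():
        # Split on the first '=' only, so expressions such as
        # 'PeriodicHold = EC2RunAttempts >= 5' keep their full value.
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            continue  # skip blank or malformed lines
        ad[name.strip()] = value.strip()
    return ad


def merge_status_update(job_ad, hook_ad):
    """Return a copy of job_ad updated with the non-protected hook attributes."""
    updated = dict(job_ad)
    for name, value in hook_ad.items():
        if name.lower() in PROTECTED_ATTRS:
            continue  # never overwritten from hook output
        updated[name] = value
    return updated
```

Since nothing else is filtered in this sketch, a hook that returns an attribute condor also writes would create a second writer for that attribute, which is the concern raised in the description.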

Fixed in:
condor-7.2.0-4

Comment 5 Robert Rati 2009-03-06 21:13:54 UTC
This can be observed by running condor_[q|history] -l <cluster.job> for the source/dest job.  Attrs to look for:

EC2JobSuccessful
EC2RunAttempts

Comment 6 Jan Sarenik 2009-03-11 09:18:19 UTC
I can see EC2RunAttempts in both the source and dest (routed to EC2) jobs.
No EC2JobSuccessful though, maybe because I am not able to successfully
run any job in EC2 at the moment.

Is this proof sufficient to mark this bug as verified?

Comment 7 Robert Rati 2009-03-11 14:08:01 UTC
EC2JobSuccessful will only be in the source ad if the job completed running in EC2.  EC2RunAttempts will be incremented each time an attempt is made to run the job in EC2.  So, a successful test of this would be:

EC2RunAttempts > 0 and/or EC2JobSuccessful = TRUE in the source job once the EC2 job has shut down.
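
As a rough aid for checking this condition, here is a small Python sketch that applies it to text saved from condor_[q|history] -l. The function names and the parsing of "Attr = Value" lines are assumptions made for this example, and "and/or" is read as a plain "or".

```python
# Hypothetical helper for the check above; the names are illustrative only.

def parse_long_ad(text):
    """Parse 'Attr = Value' lines (long-format ad output) into a dict."""
    ad = {}
    for line in text.splitlines():
        # Split on the first '=' only so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if sep and name.strip():
            ad[name.strip()] = value.strip()
    return ad


def status_update_saved(ad):
    """True if EC2RunAttempts > 0 or EC2JobSuccessful is TRUE in the ad."""
    try:
        attempts = int(ad.get("EC2RunAttempts", "0"))
    except ValueError:
        attempts = 0  # non-integer value: treat as no recorded attempts
    successful = ad.get("EC2JobSuccessful", "").upper() == "TRUE"
    return attempts > 0 or successful
```

To use it, save the long-format output for the source job to a file, pass the file's contents to parse_long_ad, and pass the resulting dict to status_update_saved.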

Comment 8 Jan Sarenik 2009-03-11 15:02:45 UTC
The job ran successfully (just some echo(1)es and sleep(1)).
Here are parts of the "condor_history -l" outputs for source and dest:

----------------------------------------------
$ cat hist-6644 | grep EC2
PeriodicHold = EC2RunAttempts >= 5
EC2JobSuccessful = TRUE
EC2RunAttempts = 0
----------------------------------------------
$ cat hist-6645 | grep EC2
Cmd = "EC2: Amazon Small: /mnt/sharedfs/testmonkey/north-14/ec2e/jasan/multi_output.sh"
EC2RunAttempts = 1
EC2JobSuccessful = TRUE
----------------------------------------------

Comment 10 errata-xmlrpc 2009-04-21 16:19:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html

