Bug 476996

Summary: JobRouter's status hook output not saved to Schedd
Product: Red Hat Enterprise MRG
Reporter: Matthew Farrellee <matt>
Component: grid
Assignee: Robert Rati <rrati>
Status: CLOSED ERRATA
QA Contact: Jeff Needle <jneedle>
Severity: medium
Priority: medium
Version: 1.0
CC: jsarenik
Target Milestone: 1.1.1
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-04-21 16:19:07 UTC

Description Matthew Farrellee 2008-12-18 15:21:36 UTC
The JobRouter's status hook outputs a dest_ad, but only some attributes of that ad are written back to the Schedd; in particular, EC2JobSuccessful is not. This means that if the JobRouter crashes, the EC2JobSuccessful information could be lost unless every invocation of the status hook outputs it.

The deeper issue is deciding which attributes to send to the Schedd: no attribute should have more than one writer.

Comment 1 Matthew Farrellee 2008-12-18 15:28:09 UTC
Even if EC2JobSuccessful is output on every invocation of the status hook, there is no guarantee that the status hook will be invoked again when the JobRouter (JR) comes back up.

Comment 2 Robert Rati 2009-01-19 15:19:52 UTC
The status hook's output is now stored with the routed job and sent to the Schedd. The updates should be visible in condor_q and condor_history.

Note: The status hook should return only attributes that Condor does not normally write. Currently, only MyType and TargetType are protected from being updated; every other returned attribute is the hook writer's responsibility.
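The update rule described in this comment can be illustrated with a minimal Python sketch. It is not Condor's JobRouter code: plain dicts stand in for ClassAds, and the names `build_schedd_update` and `PROTECTED_ATTRS` are hypothetical. The only filtering it applies is the one stated above (MyType and TargetType are skipped); everything else the hook returns is applied and forwarded.

```python
# Hypothetical sketch: plain dicts stand in for ClassAds. This only
# illustrates the rule described above; it is not the JobRouter source.
PROTECTED_ATTRS = frozenset({"MyType", "TargetType"})


def build_schedd_update(routed_job_ad, hook_output):
    """Apply a status hook's output to the routed job's ad.

    Returns the attributes that would be sent to the Schedd. MyType and
    TargetType are skipped; every other returned attribute is accepted,
    so avoiding attributes Condor itself writes is left to the hook author.
    Names are compared exactly here, although real ClassAd attribute
    names are case-insensitive.
    """
    update = {}
    for name, value in hook_output.items():
        if name in PROTECTED_ATTRS:
            continue  # protected: never overwritten by the hook
        routed_job_ad[name] = value
        update[name] = value
    return update


# Illustrative values only.
example_job = {"MyType": "Job", "EC2RunAttempts": 0}
example_hook = {"MyType": "Machine", "EC2RunAttempts": 1, "EC2JobSuccessful": True}
print(build_schedd_update(example_job, example_hook))
# prints {'EC2RunAttempts': 1, 'EC2JobSuccessful': True}
```

Because nothing beyond the two protected names is filtered, a hook that returned an attribute Condor also maintains would create exactly the multiple-writer situation the original description warns about.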

Fixed in:
condor-7.2.0-4

Comment 5 Robert Rati 2009-03-06 21:13:54 UTC
This can be observed by running condor_q -l <cluster.job> or condor_history -l <cluster.job> for the source and destination jobs. Attributes to look for:

EC2JobSuccessful
EC2RunAttempts
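As a minimal sketch, assuming the `-l` output has been saved to text (as with the hist-6644 and hist-6645 files later in this report) and consists of `Name = value` lines, the EC2 attributes can be extracted in Python instead of with grep. The helper names `parse_long_output` and `ec2_attrs` are hypothetical, and values are kept as raw, unevaluated ClassAd text.

```python
def parse_long_output(text):
    """Parse 'Name = value' lines from saved condor_q -l / condor_history -l output.

    Values are returned as raw strings; ClassAd expressions are not evaluated.
    Only the first ' = ' on a line splits name from value, so an expression
    such as 'EC2RunAttempts >= 5' survives intact.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank or malformed lines
            ad[name.strip()] = value.strip()
    return ad


def ec2_attrs(ad):
    """Keep only attributes whose names start with 'EC2'."""
    return {name: value for name, value in ad.items() if name.startswith("EC2")}


sample = """PeriodicHold = EC2RunAttempts >= 5
EC2JobSuccessful = TRUE
EC2RunAttempts = 0
"""
print(ec2_attrs(parse_long_output(sample)))
# prints {'EC2JobSuccessful': 'TRUE', 'EC2RunAttempts': '0'}
```

Unlike `grep EC2`, this matches on attribute names only, so an expression that merely mentions EC2RunAttempts (such as PeriodicHold) is not reported as an EC2 attribute.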

Comment 6 Jan Sarenik 2009-03-11 09:18:19 UTC
I can see EC2RunAttempts in both the source job and the destination job (routed to EC2).
There is no EC2JobSuccessful, though, perhaps because I am currently unable to
run any job successfully in EC2.

Is this proof sufficient to mark this bug as verified?

Comment 7 Robert Rati 2009-03-11 14:08:01 UTC
EC2JobSuccessful will appear in the source ad only if the job finished running in EC2. EC2RunAttempts is incremented each time an attempt is made to run the job in EC2. So a successful test would show:

EC2RunAttempts > 0 and/or EC2JobSuccessful = TRUE in the source job once the EC2 job has shut down.
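The pass condition above can be written as a small Python check, as a sketch only. It assumes the source job's attributes are available as a dict of raw string values (for example, parsed from condor_history -l output), and the function name `verify_source_job` is hypothetical. "and/or" is read as a logical OR, and a missing attribute is treated as 0 or FALSE.

```python
def verify_source_job(ad):
    """Return True if the source job ad satisfies the test stated above.

    ad maps attribute names to raw string values, e.g. {"EC2RunAttempts": "1"}.
    Missing attributes count as EC2RunAttempts = 0 and EC2JobSuccessful = FALSE.
    """
    attempts = int(ad.get("EC2RunAttempts", "0"))
    succeeded = ad.get("EC2JobSuccessful", "FALSE").upper() == "TRUE"
    # "and/or": either condition alone is enough.
    return attempts > 0 or succeeded


print(verify_source_job({"EC2RunAttempts": "0", "EC2JobSuccessful": "TRUE"}))
# prints True
```

Under this reading, a source ad with EC2RunAttempts = 0 still passes when EC2JobSuccessful = TRUE, since the success flag alone satisfies the condition.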

Comment 8 Jan Sarenik 2009-03-11 15:02:45 UTC
The job ran successfully (just a few echo(1) and sleep(1) calls).
Here are excerpts of the "condor_history -l" output for the source and destination jobs:

----------------------------------------------
$ cat hist-6644 | grep EC2
PeriodicHold = EC2RunAttempts >= 5
EC2JobSuccessful = TRUE
EC2RunAttempts = 0
----------------------------------------------
$ cat hist-6645 | grep EC2
Cmd = "EC2: Amazon Small: /mnt/sharedfs/testmonkey/north-14/ec2e/jasan/multi_output.sh"
EC2RunAttempts = 1
EC2JobSuccessful = TRUE
----------------------------------------------

Comment 10 errata-xmlrpc 2009-04-21 16:19:07 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html