Bug 476996 - JobRouter's status hook output not saved to Schedd
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
All Linux
medium Severity medium
: 1.1.1
: ---
Assigned To: Robert Rati
Jeff Needle
Depends On:
Reported: 2008-12-18 10:21 EST by Matthew Farrellee
Modified: 2009-04-21 12:19 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-04-21 12:19:07 EDT

Attachments: None
Description Matthew Farrellee 2008-12-18 10:21:36 EST
The JobRouter's status hook outputs a dest_ad, but only parts of that ad are written back to the Schedd; in particular, the EC2JobSuccessful attribute is not. As a result, a JobRouter crash could lose the EC2JobSuccessful information unless every invocation of the status hook outputs it.

The deeper issue is what to send to the Schedd. You don't want multiple writers for any given attribute.
Comment 1 Matthew Farrellee 2008-12-18 10:28:09 EST
Even if EC2JobSuccessful is output on every invocation of the status hook, there may not be a guarantee that the status hook will be invoked when the JR returns.
Comment 2 Robert Rati 2009-01-19 10:19:52 EST
The status hook's output will now be updated in the routed job's stats and sent to the schedd.  Updates should be seen in condor_q and condor_history.

Note: Only attributes that condor doesn't normally write should be returned from the status hook.  Currently, only MyType and TargetType are prevented from being updated and the returned attributes are the responsibility of the hook writer.
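A minimal sketch of the hook-writer side of that note, in Python. It assumes the hook produces its result as one "Attr = value" line per attribute; that format, the HOOK_OWNED set, and the function name are illustrative assumptions, not something this report specifies. The idea is to return only attributes the hook itself owns, so no attribute ends up with two writers.

```python
# Attributes this hypothetical hook owns (assumption for illustration);
# Condor itself is assumed not to write them.
HOOK_OWNED = {"EC2JobSuccessful", "EC2RunAttempts"}

# Per Comment 2, these two are never updated from hook output anyway.
PROTECTED = {"MyType", "TargetType"}


def filter_hook_output(lines):
    """Keep only 'Attr = value' lines whose attribute the hook owns."""
    kept = []
    for line in lines:
        name, sep, _value = line.partition("=")
        name = name.strip()
        if sep and name in HOOK_OWNED and name not in PROTECTED:
            kept.append(line.strip())
    return kept


# Candidate output a hook might be tempted to return (made-up values).
candidate = [
    'MyType = "Job"',
    "JobStatus = 4",
    "EC2JobSuccessful = TRUE",
    "EC2RunAttempts = 1",
]

for line in filter_hook_output(candidate):
    print(line)
```

Here MyType and JobStatus are dropped because the hook does not own them, and only the two EC2 attributes are returned.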

Fixed in:
Comment 5 Robert Rati 2009-03-06 16:13:54 EST
This can be observed by running condor_[q|history] -l <cluster.job> for the source/dest job.  Attrs to look for:

Comment 6 Jan Sarenik 2009-03-11 05:18:19 EDT
I can see EC2RunAttempts in both source and dest (routed to EC2) job.
No EC2JobSuccessful though, maybe because I am not able to successfully
run any job in EC at the moment.

Is this proof sufficient to mark this bug as verified?
Comment 7 Robert Rati 2009-03-11 10:08:01 EDT
EC2JobSuccessful will only be in the source ad if the job completed running in EC2.  EC2RunAttempts will be incremented each time an attempt is made to run the job in EC2.  So, a successful test of this would be:

EC2RunAttempts > 0 and/or EC2JobSuccessful = TRUE in the source job once the EC2 job has shut down.
Comment 8 Jan Sarenik 2009-03-11 11:02:45 EDT
The job ran successfully (just some echo(1)es and sleep(1)).
Here are parts of the "condor_history -l" outputs for source and dest:

$ cat hist-6644 | grep EC2
PeriodicHold = EC2RunAttempts >= 5
EC2JobSuccessful = TRUE
EC2RunAttempts = 0
$ cat hist-6645 | grep EC2
Cmd = "EC2: Amazon Small: /mnt/sharedfs/testmonkey/north-14/ec2e/jasan/multi_output.sh"
EC2RunAttempts = 1
EC2JobSuccessful = TRUE
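
For repeat verification, the check described in Comment 7 can be sketched in Python. This is only an illustration: it assumes the "condor_history -l" output is plain "Attr = value" lines like the grep output above, and the function names are made up for this example.

```python
def parse_ad(text):
    """Parse 'Attr = value' lines into a dict of strings."""
    ad = {}
    for line in text.splitlines():
        # Split on the first " = " so expressions such as
        # "EC2RunAttempts >= 5" stay intact in the value.
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


def status_hook_output_saved(ad):
    """Comment 7 criterion: EC2RunAttempts > 0 and/or EC2JobSuccessful = TRUE."""
    attempts = ad.get("EC2RunAttempts", "0")
    run_attempted = attempts.isdigit() and int(attempts) > 0
    successful = ad.get("EC2JobSuccessful", "").upper() == "TRUE"
    return run_attempted or successful


# The EC2 lines from hist-6644 as shown above.
hist_6644 = parse_ad(
    "PeriodicHold = EC2RunAttempts >= 5\n"
    "EC2JobSuccessful = TRUE\n"
    "EC2RunAttempts = 0\n"
)

print(status_hook_output_saved(hist_6644))
```

With the hist-6644 values the check passes on EC2JobSuccessful alone, since EC2RunAttempts is 0 in that ad.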
Comment 10 errata-xmlrpc 2009-04-21 12:19:07 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

