The JobRouter's status hook outputs a dest_ad, but only parts of that ad are written back to the Schedd, and the EC2JobSuccessful attribute is not among them. This means a JobRouter crash could lose the EC2JobSuccessful information if it isn't output by every invocation of the status hook. The deeper issue is what to send to the Schedd: you don't want multiple writers for any given attribute.
Even if EC2JobSuccessful is output on every invocation of the status hook, there may not be a guarantee that the status hook will be invoked when the JR returns.
The status hook's output will now be updated in the routed job's stats and sent to the schedd. Updates should be visible in condor_q and condor_history. Note: the status hook should return only attributes that condor doesn't normally write. Currently, only MyType and TargetType are protected from being updated; any other returned attributes are the responsibility of the hook writer. Fixed in: condor-7.2.0-4
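To illustrate the rule described above, here is a minimal Python sketch of how hook output could be filtered before it is merged into a job ad. It is not HTCondor source code: the ads are modelled as plain dicts, and the names PROTECTED_ATTRS, filter_hook_update and apply_hook_update are hypothetical. The only facts taken from this report are that MyType and TargetType are protected and that everything else the hook returns is passed through.

```python
# Hypothetical model of the behaviour described in this report; not HTCondor code.
# ClassAd attribute names are case-insensitive, so the comparison is lowercased.
PROTECTED_ATTRS = frozenset({"mytype", "targettype"})


def filter_hook_update(hook_attrs):
    """Drop the attributes that the status hook is not allowed to overwrite."""
    return {
        name: value
        for name, value in hook_attrs.items()
        if name.lower() not in PROTECTED_ATTRS
    }


def apply_hook_update(job_ad, hook_attrs):
    """Return a copy of job_ad with the permitted hook attributes merged in."""
    updated = dict(job_ad)
    updated.update(filter_hook_update(hook_attrs))
    return updated


if __name__ == "__main__":
    job_ad = {"MyType": "Job", "EC2RunAttempts": 0}
    hook_output = {
        "MyType": "Other",          # protected, ignored
        "EC2RunAttempts": 1,        # passed through
        "EC2JobSuccessful": True,   # passed through
    }
    print(apply_hook_update(job_ad, hook_output))
```

Because every non-protected attribute is passed through, a hook that returns an attribute condor also writes would create a second writer for it, which is why the choice of returned attributes is left to the hook writer.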
This can be observed by running condor_[q|history] -l <cluster.job> for the source/dest job. Attrs to look for: EC2JobSuccessful, EC2RunAttempts
I can see EC2RunAttempts in both the source and dest (routed to EC2) jobs. There is no EC2JobSuccessful though, maybe because I am not able to successfully run any job in EC2 at the moment. Is this proof sufficient to mark this bug as verified?
EC2JobSuccessful will only be in the source ad if the job completed running in EC2. EC2RunAttempts will be incremented each time an attempt is made to run the job in EC2. So, a successful test of this would be: EC2RunAttempts > 0 and/or EC2JobSuccessful = TRUE in the source job once the EC2 job has shut down.
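The check described above could be scripted roughly as follows. This is only a sketch: it assumes the text it receives is "Name = Value" lines as printed by condor_history -l, and the helper names parse_long_ad and ec2_update_verified are made up for illustration. It treats the "and/or" as a logical OR.

```python
def parse_long_ad(text):
    """Parse 'Name = Value' lines into a dict of strings; other lines are skipped."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


def ec2_update_verified(ad):
    """True if EC2RunAttempts > 0 or EC2JobSuccessful is TRUE in the given ad."""
    try:
        attempts = int(ad.get("EC2RunAttempts", "0"))
    except ValueError:
        # Value was an expression rather than a plain integer.
        attempts = 0
    successful = ad.get("EC2JobSuccessful", "").upper() == "TRUE"
    return attempts > 0 or successful


if __name__ == "__main__":
    # Illustrative input only; in practice, pass the saved output of
    # condor_history -l for the source job.
    sample = "EC2RunAttempts = 2\nEC2JobSuccessful = TRUE\n"
    print(ec2_update_verified(parse_long_ad(sample)))
```

Run it against the source job's ad after the EC2 job has shut down, since EC2JobSuccessful is only expected there once the job has completed.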
The job ran successfully (just some echo(1)es and sleep(1)). Here are parts of the "condor_history -l" outputs for source and dest:
----------------------------------------------
$ cat hist-6644 | grep EC2
PeriodicHold = EC2RunAttempts >= 5
EC2JobSuccessful = TRUE
EC2RunAttempts = 0
----------------------------------------------
$ cat hist-6645 | grep EC2
Cmd = "EC2: Amazon Small: /mnt/sharedfs/testmonkey/north-14/ec2e/jasan/multi_output.sh"
EC2RunAttempts = 1
EC2JobSuccessful = TRUE
----------------------------------------------
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0434.html