Red Hat Bugzilla – Bug 476021
Permissions issue when retrieving data from S3 in EC2E
Last modified: 2009-02-04 11:06:27 EST
Description of problem:
An EC2E job completes, but the finalize hook is unable to place the tarball contents of the results in the routed job's spool dir because the spool dir is no longer owned by the job owner, but by user condor.
Version-Release number of selected component (if applicable):

Steps to Reproduce:
Run an EC2E job and have the AMI exit before the results can be read from SQS by condor.

Relevant log output:
12/11 10:18:31 (pid:13672) Job 3273.0 is finished
12/11 10:18:31 (pid:13672) Job cleanup for 3273.0 will not block, calling jobIsFinished() directly
12/11 10:18:31 (pid:13672) jobIsFinished() completed, calling DestroyProc(3273.0)
12/11 10:19:00 JobRouter (src=3266.7,dest=3273.0,route=Amazon Small): updated job status
12/11 10:19:00 JobRouter (src=3266.7,dest=3273.0,route=Amazon Small): found target job finished
The routed job is completing BEFORE the source job. That means the EC2 job is completing and shutting down before the status message makes its way back.
When the EC2 job finishes, condor chowns its spool directory back to condor:condor.
The status message indicating the routed job has completed has not yet been read, and the JobRouter just fires off the cleanup hook, as it should, when the routed job completes.
With the fix, the finalize hook no longer attempts to write to the routed job's spool directory, and instead writes to the source job's IWD.
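A minimal sketch of the corrected behavior described above. The function name and arguments are illustrative assumptions, not the actual hook code:

```shell
# Hypothetical sketch of the fixed finalize step: unpack the results
# tarball into the source job's IWD, which stays owned by the job
# owner, rather than into the routed job's spool directory, which
# condor chowns back to condor:condor when the routed job finishes.
# finalize_results and both parameter names are illustrative only.
finalize_results() {
    results_tarball="$1"   # tarball of job results fetched from S3
    source_job_iwd="$2"    # initial working directory of the source job

    # Extraction succeeds here regardless of when the routed job's
    # spool directory changes ownership.
    tar -xzf "$results_tarball" -C "$source_job_iwd"
}
```

Writing to the IWD sidesteps the ownership race entirely, since the IWD's ownership never changes when the routed job completes.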
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.