Bug 474845
Summary: | EC2E AMIs shutting down and restarting | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Robert Rati <rrati> |
Component: | grid | Assignee: | Robert Rati <rrati> |
Status: | CLOSED ERRATA | QA Contact: | Jeff Needle <jneedle> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 1.0 | CC: | matt |
Target Milestone: | 1.1 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-02-04 16:04:58 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Robert Rati
2008-12-05 16:05:43 UTC
This is caused by a failure in the finalization process. The exit hook runs, and then condor fails to find a file it expects in the spool directory. Finalization therefore fails and condor sets JobStatus = 1 (Idle), which causes the job to be re-routed and a new AMI to be started.

The finalization process can't find the files it is looking for because the exit hook is told that the spool directory is in a different location than it should be. In the case I am seeing, a job is submitted from /home/testmonkey/ec2e and the routed job has:

SUBMIT_Iwd = "/home/testmonkey/ec2e"
Iwd = "/mnt/sharedfs/condor_ha_schedd/cluster3142.proc0.subproc0"

The spool directory the exit hook is given is "/home/testmonkey/ec2e", not "/mnt/sharedfs/condor_ha_schedd/cluster3142.proc0.subproc0". So the exit hook extracts the tarball from S3 into "/home/testmonkey/ec2e" instead of the ...cluster3142... directory, and the finalization process can't find the files it needs to complete.

The jobs are no longer forcibly spooled by the hooks. The finalize hook now extracts the files from AWS and remaps them back to the original job's iwd, instead of extracting them into the routed job's spool directory.

Fixed in:
condor-ec2-enhanced-1.0-7
condor-ec2-enhanced-hooks-1.0-8

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html
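The directory mix-up described in this report can be illustrated with a short Python sketch. This is not the actual condor-ec2-enhanced hook code. `finalize_target_dir`, `extract_results` and `make_tarball` are hypothetical helpers, and the routed job's ClassAd is reduced to a plain dict holding the `SUBMIT_Iwd` and `Iwd` attributes quoted above. An in-memory tarball stands in for the one fetched from S3.

The sketch shows where the output files have to land in each mode. While the hooks forcibly spool the job, they must go into the routed job's spool directory (`Iwd`), because that is where condor's finalization looks. Once the job is no longer spooled, they go into the original job's iwd (`SUBMIT_Iwd`).

```python
import io
import tarfile
import tempfile
from pathlib import Path


def finalize_target_dir(routed_ad: dict, spooled: bool) -> Path:
    """Pick the directory the job's output files must end up in (hypothetical helper)."""
    if spooled:
        # Spooled routed job: condor's finalization looks for the files in the
        # routed job's spool directory, which is its Iwd, not SUBMIT_Iwd.
        return Path(routed_ad["Iwd"])
    # After the fix the job is not spooled, so the files are remapped back to
    # the original job's working directory.
    return Path(routed_ad["SUBMIT_Iwd"])


def extract_results(tarball: bytes, routed_ad: dict, spooled: bool) -> list:
    """Unpack an in-memory tarball (a stand-in for the S3 download)."""
    dest = finalize_target_dir(routed_ad, spooled)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # this sketch only handles regular files
            # Keep only the base name so nothing escapes dest (a simple remap).
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            written.append(target)
    return written


def make_tarball(files: dict) -> bytes:
    """Build a gzipped tarball in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ad = {
            "SUBMIT_Iwd": f"{tmp}/home/testmonkey/ec2e",
            "Iwd": f"{tmp}/mnt/sharedfs/condor_ha_schedd/cluster3142.proc0.subproc0",
        }
        blob = make_tarball({"job.out": b"done\n"})
        print(extract_results(blob, ad, spooled=True))
        print(extract_results(blob, ad, spooled=False))
```

In terms of this sketch, the bug amounted to the spooled case writing into `SUBMIT_Iwd` while finalization searched `Iwd`. Flattening member names to their base name is a simplification that keeps extraction inside the target directory. The real hooks' remapping may differ.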