Description of problem: An EC2E job occasionally, and seemingly at random, fails to complete. The job is created and the AMI is started, but the job itself never appears to run.
When the AMI starts up, caroniad first tries to access AWS using the information provided in the user_data. If it could not access AWS, either because the information was wrong or because AWS was having problems at that moment, caroniad would exit, so the job would never run and the AMI would never be shut down. caroniad now tries to access AWS 5 times, waiting between attempts, and shuts down the AMI if it still cannot access AWS. The hooks on the schedd will then notice that the job has not been run and force condor to re-route the job.

Fixed in:
condor-7.2.0-0.13
condor-ec2-enhanced-hooks-1.0-7
condor-ec2-enhanced-1.0-6
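
The retry behaviour described in this comment can be sketched roughly as follows. This is only an illustration, not the actual caroniad code: the names connect_with_retry, check_access and shutdown are hypothetical, and the 5.0 delay is a placeholder, since the exact interval between attempts is not clearly stated here.

```python
import time


def connect_with_retry(check_access, shutdown, attempts=5, delay=5.0,
                       sleep=time.sleep):
    """Try check_access() up to `attempts` times, then give up.

    check_access and shutdown are hypothetical callables standing in for
    "contact AWS with the user_data credentials" and "shut down the AMI".
    The delay value is an assumption, not taken from the fix itself.
    Returns True on success, False after shutdown() has been called.
    """
    for attempt in range(1, attempts + 1):
        try:
            check_access()
            return True
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: shut the instance down rather than exiting
    # and leaving it running with a job that will never start.
    shutdown()
    return False
```

Passing the sleep function in as a parameter keeps the sketch easy to exercise without real waiting. The important design point is the final shutdown() call: once the AMI is gone, the schedd-side hooks have something to detect and can re-route the job.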
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0036.html