Bug 474842 - EC2E job never completes
Summary: EC2E job never completes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.1
Target Release: ---
Assignee: Robert Rati
QA Contact: Jeff Needle
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-12-05 16:01 UTC by Robert Rati
Modified: 2009-02-04 16:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-04 16:04:54 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2009:0036
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Grid 1.1 Release
Last Updated: 2009-02-04 16:03:49 UTC

Description Robert Rati 2008-12-05 16:01:07 UTC
Description of problem:
Seemingly at random, an EC2E job will fail to complete.  The job is created and the AMI is started, but the job never appears to run.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Robert Rati 2008-12-17 02:46:52 UTC
When the AMI starts up, caroniad first tries to access AWS using the information provided in the user_data.  If it could not access AWS, either because the information was wrong or because AWS was having problems at that instant, caroniad would exit, and the job would never be run, nor would the AMI be shut down.

The caronia daemon now tries to access AWS 5 times (waiting between attempts) and, if it still cannot access AWS, shuts down the AMI.  The hooks on the schedd will notice that the job has not been run and force condor to re-route the job.
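
For illustration only, the retry-then-shut-down behaviour described above could be sketched roughly as follows. This is not the caroniad source: the function name, the check_access and shutdown callables, and the wait_seconds parameter are all hypothetical, and the actual wait interval is not stated in this report.

```python
import time
from typing import Callable


def access_with_retries(
    check_access: Callable[[], bool],   # hypothetical: returns True when AWS is reachable
    shutdown: Callable[[], None],       # hypothetical: shuts down the AMI
    wait_seconds: float,                # assumption: the real interval is not given here
    attempts: int = 5,                  # the comment above says 5 attempts
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to reach AWS up to `attempts` times; shut down if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            if check_access():
                return True
        except Exception:
            # Treat any error (bad credentials, AWS outage) as a failed attempt.
            pass
        if attempt < attempts:
            # Wait before the next attempt, but not after the last one.
            sleep(wait_seconds)
    # All attempts failed: shut the instance down instead of exiting silently,
    # so the submitting side can see that the job did not run.
    shutdown()
    return False
```

The point of the sketch is the final step: exiting without shutting down is what left the AMI running with the job never executed.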

Fixed in:
condor-7.2.0-0.13
condor-ec2-enhanced-hooks-1.0-7
condor-ec2-enhanced-1.0-6

Comment 4 errata-xmlrpc 2009-02-04 16:04:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html

