474842 – EC2E job never completes

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	grid
Sub Component:
Version:	1.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	1.1
Target Release:	---
Assignee:	Robert Rati
QA Contact:	Jeff Needle
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-12-05 16:01 UTC by Robert Rati
Modified:	2009-02-04 16:04 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-02-04 16:04:54 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:0036	0	normal	SHIPPED_LIVE	Red Hat Enterprise MRG Grid 1.1 Release	2009-02-04 16:03:49 UTC

Description Robert Rati 2008-12-05 16:01:07 UTC

Description of problem:
Seemingly at random, an EC2E job won't complete.  The job is created and the AMI is started, but the job never appears to run.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Robert Rati 2008-12-17 02:46:52 UTC

When the AMI starts up, caroniad first tries to access AWS using the information provided by the user_data.  If it was unable to access AWS, either because the information was wrong or because AWS was having problems at that moment, caroniad would exit.  The job would then never be run, and the AMI would never be shut down.

The caronia daemon now tries to access AWS 5 times, waiting between attempts, and if it still can't access AWS it shuts down the AMI.  The hooks on the schedd will notice that the job has not been run and force condor to re-route the job.
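
The retry behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual caroniad code: the function name access_with_retries, the check_access and shutdown callables, and the 5-second delay are all assumptions made for the example (the comment does not state the wait unit).

```python
import time


def access_with_retries(check_access, shutdown, attempts=5,
                        delay_seconds=5.0, sleep=time.sleep):
    """Try check_access() up to `attempts` times, then give up.

    check_access and shutdown are hypothetical callables supplied by the
    caller: check_access returns True when AWS is reachable with the
    credentials from user_data, and shutdown powers off the AMI.
    delay_seconds is an assumed value; the bug report does not give a unit.
    """
    for attempt in range(1, attempts + 1):
        try:
            if check_access():
                return True
        except Exception:
            # Treat any error (bad credentials, transient AWS outage)
            # as a failed attempt and fall through to the retry logic.
            pass
        if attempt < attempts:
            sleep(delay_seconds)

    # Every attempt failed: shut the AMI down rather than leaving it
    # running with no job, so the submit side can re-route the job.
    shutdown()
    return False
```

The point of the design is that the failure path always ends in a shutdown, so an unreachable AWS no longer leaves an idle AMI running indefinitely.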

Fixed in:
condor-7.2.0-0.13
condor-ec2-enhanced-hooks-1.0-7
condor-ec2-enhanced-1.0-6

Comment 4 errata-xmlrpc 2009-02-04 16:04:54 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html