Bug 738335 - Failed ec2e job can cause instance to run indefinitely
Summary: Failed ec2e job can cause instance to run indefinitely
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-ec2-enhanced
Version: 2.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: grid-maint-list
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On: 738338
Blocks: 810324
Reported: 2011-09-14 15:05 UTC by Timothy St. Clair
Modified: 2016-05-26 19:12 UTC

Fixed In Version: condor-ec2-enhanced[-hooks]-1.3.1-1
Doc Type: Bug Fix
Doc Text:
C: An EC2 Enhanced job failed to run in the AMI
C: It was difficult to know what happened when the job ran in EC2
F: The following attributes were added to give additional insight into what occurred in the AMI: EC2JobStatus, EC2LastFailureReason, EC2HookArg
R: Additional feedback of the job status in the AMI is available
Clone Of:
Environment:
Last Closed: 2016-05-26 19:12:26 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 808506 1 None None None 2021-01-20 06:05:38 UTC

Internal Links: 808506

Description Timothy St. Clair 2011-09-14 15:05:12 UTC
Description of problem:
A bad submission file can cause an instance to run indefinitely without returning any information to the user.

Version-Release number of selected component (if applicable):
2.0.1

How reproducible:
100%

Steps to Reproduce:
1.) In your submission file, specify a custom script:
executable = /tmp/my_script.sh

2.) Set should_transfer_files = no (when it should be yes) 

  
Actual results:
Job will run forever.

Expected results:
Job should fail with some information for the user.
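
The misconfiguration described in the steps above could be caught before submission. The following is only a rough sketch of such a check, not part of condor-ec2-enhanced: the function name find_transfer_misconfig is hypothetical, and the parsing assumes simple "key = value" lines rather than the full submit description syntax.

```python
def find_transfer_misconfig(submit_text):
    """Return warnings for a submit description that names an executable
    but disables file transfer (hypothetical helper, simplified parsing)."""
    settings = {}
    for line in submit_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines that are not "key = value".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Submit commands are matched case-insensitively here.
        settings[key.strip().lower()] = value.strip()

    warnings = []
    executable = settings.get("executable")
    transfer = settings.get("should_transfer_files", "").lower()
    if executable and transfer == "no":
        warnings.append(
            "executable %s will not be transferred because "
            "should_transfer_files = no" % executable
        )
    return warnings


if __name__ == "__main__":
    bad = "executable = /tmp/my_script.sh\nshould_transfer_files = no\n"
    for warning in find_transfer_misconfig(bad):
        print(warning)
```

A warning here does not always mean the job is wrong: should_transfer_files = no is fine when the executable already exists at that path where the job runs. The check only flags the combination reported in this bug.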

Comment 5 Robert Rati 2012-04-05 14:46:49 UTC
Added the following parameters that give additional information about the job in the AMI:
EC2JobStatus
EC2LastFailureReason
EC2HookArg

EC2HookArg can be one of the following values (each followed by its meaning):
2 - Job was accepted
3 - Job was rejected
5 - Job exited normally
6 - Job was removed
7 - Job was held
8 - Job was evicted
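
For scripts that read these attributes, the value list above can be kept as a small lookup table. This is a minimal sketch: the codes and meanings are taken from the list above, while the names EC2_HOOK_ARG_MEANINGS and describe_hook_arg are hypothetical, and accepting the value as either an integer or a string is an assumption.

```python
# Codes and meanings as listed in this comment.
EC2_HOOK_ARG_MEANINGS = {
    2: "Job was accepted",
    3: "Job was rejected",
    5: "Job exited normally",
    6: "Job was removed",
    7: "Job was held",
    8: "Job was evicted",
}


def describe_hook_arg(value):
    """Translate an EC2HookArg value into its meaning (hypothetical helper)."""
    unknown = "Unknown EC2HookArg value: %r" % (value,)
    try:
        code = int(value)  # assumption: value may arrive as int or string
    except (TypeError, ValueError):
        return unknown
    return EC2_HOOK_ARG_MEANINGS.get(code, unknown)


if __name__ == "__main__":
    print(describe_hook_arg(5))
    print(describe_hook_arg("3"))
```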

In a failure case, EC2LastFailureReason is set to a string explaining the failure.

EC2JobStatus is the status of the job running under Condor in the AMI.

The above parameters can be used in policy expressions, but nothing will be done with the job automatically.  It is not possible for EC2E to place a job on hold by itself.
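
Since nothing happens to the job automatically, any reaction has to come from the user's own policy or tooling. The sketch below shows one possible decision rule over a job ad represented as a Python dict. It is an illustration only: the function needs_attention is hypothetical, and treating the rejected, held, and evicted values (3, 7, 8) as needing attention is an assumption, not behavior defined by EC2E.

```python
# Assumption: these EC2HookArg values are worth flagging to the user.
ATTENTION_HOOK_ARGS = {3, 7, 8}  # rejected, held, evicted


def needs_attention(job_ad):
    """Return (flag, reason) for a job ad dict (hypothetical helper)."""
    # A failure reason, when present, is the most direct signal.
    reason = job_ad.get("EC2LastFailureReason")
    if reason:
        return True, reason
    try:
        code = int(job_ad.get("EC2HookArg"))
    except (TypeError, ValueError):
        # Attribute missing or not numeric: nothing to report.
        return False, ""
    if code in ATTENTION_HOOK_ARGS:
        return True, "EC2HookArg = %d" % code
    return False, ""


if __name__ == "__main__":
    print(needs_attention({"EC2HookArg": 3}))
    print(needs_attention({"EC2HookArg": 5}))
```

The same rule could in principle be written as a policy expression in the submit file, but the exact expression depends on how the attributes appear in the job ad, which this comment does not specify.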

Tracking on branch:
more-job-status-in-ec2

Comment 6 Robert Rati 2012-05-03 13:58:28 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: An EC2 Enhanced job failed to run in the AMI
C: It was difficult to know what happened when the job ran in EC2
F: The following attributes were added to give additional insight into what occurred in the AMI: EC2JobStatus, EC2LastFailureReason, EC2HookArg
R: Additional feedback of the job status in the AMI is available

Comment 11 Anne-Louise Tangring 2016-05-26 19:12:26 UTC
MRG-G is in maintenance only and only customer escalations will be addressed from this point forward. This issue can be re-opened if a customer escalation associated with this issue occurs.

