Bug 738335

Summary: Failed ec2e job can cause instance to run indefinitely
Product: Red Hat Enterprise MRG
Reporter: Timothy St. Clair <tstclair>
Component: condor-ec2-enhanced
Assignee: grid-maint-list <grid-maint-list>
Status: CLOSED WONTFIX
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: low
Docs Contact:
Priority: low
Version: 2.0
CC: ltoscano, matt, rrati, tstclair
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: condor-ec2-enhanced[-hooks]-1.3.1-1
Doc Type: Bug Fix
Doc Text:
C: An EC2 Enhanced job failed to run in the AMI
C: It was difficult to know what happened when the job ran in EC2
F: The following attributes were added to give additional insight into what occurred in the AMI: EC2JobStatus, EC2LastFailureReason, EC2HookArg
R: Additional feedback of the job status in the AMI is available
Last Closed: 2016-05-26 19:12:26 UTC
Bug Depends On: 738338    
Bug Blocks: 810324    

Description Timothy St. Clair 2011-09-14 15:05:12 UTC
Description of problem:
A bad submission file can cause an EC2 instance to run indefinitely without returning any information to the user.

Version-Release number of selected component (if applicable):
2.0.1

How reproducible:
100%

Steps to Reproduce:
1.) In your submission file, specify a custom script as the executable:
executable = /tmp/my_script.sh

2.) Set should_transfer_files = no (when it should be yes) 

  
Actual results:
Job will run forever.

Expected results:
Job should fail with some information for the user.

Comment 5 Robert Rati 2012-04-05 14:46:49 UTC
Added the following parameters that give additional information about the job in the AMI:
EC2JobStatus
EC2LastFailureReason
EC2HookArg

EC2HookArg can take one of the following values (each is followed by its meaning):
2 - Job was accepted
3 - Job was rejected
5 - Job exited normally
6 - Job was removed
7 - Job was held
8 - Job was evicted
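
For anyone post-processing job attributes, the value table above can be captured as a small lookup. This is only a sketch: the names `EC2_HOOK_ARG_MEANINGS` and `describe_hook_arg` are hypothetical, and because this report does not say whether EC2HookArg is stored as an integer or a string, the helper accepts either.

```python
# Meanings of EC2HookArg as listed in this comment. Values not listed
# here (for example 4) are not documented in this report.
EC2_HOOK_ARG_MEANINGS = {
    2: "Job was accepted",
    3: "Job was rejected",
    5: "Job exited normally",
    6: "Job was removed",
    7: "Job was held",
    8: "Job was evicted",
}


def describe_hook_arg(value):
    """Return the documented meaning of an EC2HookArg value.

    Accepts an int or a numeric string (assumption: the attribute's
    type is not stated in the report). Unlisted or non-numeric values
    yield an "Unknown" message instead of raising.
    """
    unknown = "Unknown EC2HookArg value: {!r}".format(value)
    try:
        code = int(value)
    except (TypeError, ValueError):
        return unknown
    return EC2_HOOK_ARG_MEANINGS.get(code, unknown)


if __name__ == "__main__":
    for sample in (2, "3", 4):
        print(sample, "->", describe_hook_arg(sample))
```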

In a failure case, EC2LastFailureReason will be set to a string explaining the failure.

EC2JobStatus is the status of the job as reported by the condor running in the AMI.

The above parameters can be used in policy expressions, but nothing will be done with the job automatically. It is not possible for EC2E to place a job on hold by itself.
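
Since nothing is done with the job automatically, a user-side check is one way to notice a failure. The sketch below assumes the three attributes appear in the job ad as `Name = value` lines (the usual long ClassAd text form); that layout, the helper names `parse_long_ad` and `failure_summary`, and the sample reason string are all assumptions for illustration, not output from EC2E.

```python
def parse_long_ad(text):
    """Parse 'Name = value' lines into a dict of strings.

    Surrounding double quotes on string values are stripped. Lines
    without '=' are ignored.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        ad[name.strip()] = value
    return ad


def failure_summary(ad):
    """Return a one-line summary if EC2LastFailureReason is set, else None."""
    reason = ad.get("EC2LastFailureReason", "")
    if not reason:
        return None
    return "EC2 job failed: {} (EC2HookArg={}, EC2JobStatus={})".format(
        reason,
        ad.get("EC2HookArg", "unset"),
        ad.get("EC2JobStatus", "unset"),
    )


if __name__ == "__main__":
    # Made-up sample ad text; the reason string is a placeholder.
    sample = 'EC2HookArg = 3\nEC2LastFailureReason = "placeholder reason"\n'
    print(failure_summary(parse_long_ad(sample)))
```

Acting on the summary (removing the job, alerting the user) is left to the caller.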

Tracking on branch:
more-job-status-in-ec2

Comment 6 Robert Rati 2012-05-03 13:58:28 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: An EC2 Enhanced job failed to run in the AMI
C: It was difficult to know what happened when the job ran in EC2
F: The following attributes were added to give additional insight into what occurred in the AMI: EC2JobStatus, EC2LastFailureReason, EC2HookArg
R: Additional feedback of the job status in the AMI is available

Comment 11 Anne-Louise Tangring 2016-05-26 19:12:26 UTC
MRG-G is in maintenance only and only customer escalations will be addressed from this point forward. This issue can be re-opened if a customer escalation associated with this issue occurs.