Bug 709722

Summary:	Sporadic local execution of EC2/Enhanced jobs
Product:	Red Hat Enterprise MRG	Reporter:	Luigi Toscano <ltoscano>
Component:	condor-ec2-enhanced-hooks	Assignee:	grid-maint-list <grid-maint-list>
Status:	CLOSED NOTABUG	QA Contact:	MRG Quality Engineering <mrgqe-bugs>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	1.3	CC:	matt
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-06-01 14:42:39 UTC	Type:	---

Description Luigi Toscano 2011-06-01 13:39:41 UTC

Description of problem:
EC2/E jobs are sometimes executed locally instead of being correctly routed.
According to developers:
<rsquared> [...] What is happening makes sense, and from the logs looks to be a race between the JR and the Negotiator since the job has real requirements.


Version-Release number of selected component (if applicable):
Found on 2.0rc, all supported architectures (i386/x86_64, RHEL5.6/6.1):
condor-7.6.1-0.8
condor-classads-7.6.1-0.8
condor-ec2-enhanced-hooks-1.2-2
python-condorec2e-1.2-2
python-condorutils-1.5-3
but most probably it is a pre-existing issue.

Most likely it does not depend on EC2 jobs, but it could be related to the interaction between Negotiator and JobRouter.

How reproducible:
Configure a personal condor on an i686 system to support EC2 jobs, then submit many instances (10 should be enough) of a job like:
-----------
universe = vanilla
executable = /bin/sleep
arguments = 600
output = /tmp/hostname32.$ENV(USER).$(cluster).out
error = /tmp/hostname32.$ENV(USER).$(cluster).err
log = /tmp/ulog.$ENV(USER).$(cluster).log
requirements = Arch == "INTEL"
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_executable = false
+WantAWS = True
+WantArch = "INTEL"
+WantCPUs = 1
+EC2RunAttempts = 1
queue
-----------
(On a 64-bit system the issue is triggered when a 64-bit AMI is required, so replace INTEL with X86_64 in both requirements and WantArch.)

A few instances of the job will be executed locally.

Comment 1 Luigi Toscano 2011-06-01 14:42:39 UTC

It's not a bug, but an omission in the job submission file: the requirements expression is missing the clause WantAWS =!= true.
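
For illustration only, the requirements line from the description might be amended as sketched below. The comment above states only that the clause was missing; joining it to the existing Arch test with && is an assumption, as is the reading that the clause keeps the job from matching a local slot (the job ad itself sets WantAWS to True, so the =!= comparison evaluates to false) and leaves it for the JobRouter.
-----------
# Sketch, not taken from the bug report: the && combination is assumed.
requirements = (Arch == "INTEL") && (WantAWS =!= true)
-----------
The rest of the submit file would stay as in the description. On 64 bit, INTEL would again be replaced with X86_64.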