Bug 709722

Summary: Sporadic local execution of EC2/Enhanced jobs
Product: Red Hat Enterprise MRG
Reporter: Luigi Toscano <ltoscano>
Component: condor-ec2-enhanced-hooks
Assignee: grid-maint-list <grid-maint-list>
Status: CLOSED NOTABUG
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: unspecified
Priority: unspecified
Version: 1.3
CC: matt
Doc Type: Bug Fix
Last Closed: 2011-06-01 14:42:39 UTC

Description Luigi Toscano 2011-06-01 13:39:41 UTC
Description of problem:
EC2/E jobs are sometimes executed locally instead of being correctly routed.
According to developers:
<rsquared> [...] What is happening makes sense, and from the logs looks to be a race between the JR and the Negotiator since the job has real requirements.


Version-Release number of selected component (if applicable):
Found on 2.0rc, all supported architectures (i386/x86_64, RHEL5.6/6.1):
condor-7.6.1-0.8
condor-classads-7.6.1-0.8
condor-ec2-enhanced-hooks-1.2-2
python-condorec2e-1.2-2
python-condorutils-1.5-3
but most probably it is a pre-existing issue.

Most likely it does not depend on EC2 jobs, but it could be related to the interaction between Negotiator and JobRouter.

How reproducible:
Configure a personal Condor on an i686 system to support EC2 jobs and submit many instances (10 should be enough) of something like:
-----------
universe = vanilla
executable = /bin/sleep
arguments = 600
output = /tmp/hostname32.$ENV(USER).$(cluster).out
error = /tmp/hostname32.$ENV(USER).$(cluster).err
log = /tmp/ulog.$ENV(USER).$(cluster).log
requirements = Arch == "INTEL"
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_executable = false
+WantAWS = True
+WantArch = "INTEL"
+WantCPUs = 1
+EC2RunAttempts = 1
queue
-----------
(On 64-bit systems the issue is triggered when a 64-bit AMI is required, so replace INTEL with X86_64 in both requirements and WantArch.)

A few instances of the job will be executed locally.

Comment 1 Luigi Toscano 2011-06-01 14:42:39 UTC
 
It's not a bug, but an omission in the job submission file: WantAWS =!= true was missing from the requirements expression.
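
For reference, a sketch of how the requirements line from the submit file in the description might look with the missing clause added. How the clause is combined with the existing Arch test (here with && and parentheses) is an assumption; the comment only states that WantAWS =!= true must be part of the requirements. On 64-bit systems, X86_64 would replace INTEL as noted in the description.
-----------
# Assumed form: the original Arch test plus the clause named in comment 1.
requirements = Arch == "INTEL" && (WantAWS =!= True)
-----------
Because the job ad itself sets +WantAWS = True, this clause should evaluate to false for the unrouted job, which presumably keeps the Negotiator from matching it to a local slot before the JobRouter picks it up.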