Bug 709722 - Sporadic local execution of EC2/Enhanced jobs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-ec2-enhanced-hooks
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: grid-maint-list
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-06-01 13:39 UTC by Luigi Toscano
Modified: 2011-06-01 14:42 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-01 14:42:39 UTC
Target Upstream Version:



Description Luigi Toscano 2011-06-01 13:39:41 UTC
Description of problem:
EC2/E jobs are sometimes executed locally instead of being correctly routed.
According to developers:
<rsquared> [...] What is happening makes sense, and from the logs looks to be a race between the JR and the Negotiator since the job has real requirements.


Version-Release number of selected component (if applicable):
Found on 2.0rc, all supported architectures (i386/x86_64, RHEL5.6/6.1):
condor-7.6.1-0.8
condor-classads-7.6.1-0.8
condor-ec2-enhanced-hooks-1.2-2
python-condorec2e-1.2-2
python-condorutils-1.5-3
but most probably it is a pre-existing issue.

Most likely it does not depend on EC2 jobs, but it could be related to the interaction between Negotiator and JobRouter.

How reproducible:
Configure a personal condor on an i686 system to support EC2 jobs and submit many instances (10 should be enough) of something like:
-----------
universe = vanilla
executable = /bin/sleep
arguments = 600
output = /tmp/hostname32.$ENV(USER).$(cluster).out
error = /tmp/hostname32.$ENV(USER).$(cluster).err
log = /tmp/ulog.$ENV(USER).$(cluster).log
requirements = Arch == "INTEL"
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_executable = false
+WantAWS = True
+WantArch = "INTEL"
+WantCPUs = 1
+EC2RunAttempts = 1
queue
-----------
(On 64-bit systems the issue is triggered when a 64-bit AMI is required, so replace INTEL with X86_64 in both requirements and WantArch.)

A few of the job instances will be executed locally instead of being routed.

Comment 1 Luigi Toscano 2011-06-01 14:42:39 UTC
 
It's not a bug, but an omission in the job submission file: the clause
WantAWS =!= true is missing from the requirements expression.
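
A minimal sketch of the corrected submit file, assuming the missing clause is simply ANDed onto the existing requirements expression. The comment names the clause but not its exact placement, so the combined expression below is an assumption rather than a verified fix:
-----------
universe = vanilla
executable = /bin/sleep
arguments = 600
output = /tmp/hostname32.$ENV(USER).$(cluster).out
error = /tmp/hostname32.$ENV(USER).$(cluster).err
log = /tmp/ulog.$ENV(USER).$(cluster).log
# Assumed form: original Arch clause plus the clause named in the comment
requirements = Arch == "INTEL" && (WantAWS =!= true)
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_executable = false
+WantAWS = True
+WantArch = "INTEL"
+WantCPUs = 1
+EC2RunAttempts = 1
queue
-----------
Since the job ad itself sets WantAWS to True, the added clause presumably evaluates to false for the unrouted job, which would keep the Negotiator from matching it to a local slot before the JobRouter picks it up.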

