Bug 493810 - RFE: Slot/Job requirement diagnostics from Startd
Status: CLOSED NOTABUG
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.1
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: 1.3
Target Release: ---
Assigned To: Matthew Farrellee
QA Contact: Jan Sarenik
Keywords: FutureFeature
Reported: 2009-04-03 04:17 EDT by Jan Sarenik
Modified: 2010-06-04 08:01 EDT (History)
Doc Type: Enhancement
Last Closed: 2010-06-04 08:01:13 EDT


Description Jan Sarenik 2009-04-03 04:17:59 EDT
While testing https://bugzilla.redhat.com/show_bug.cgi?id=472607
I added the following lines to condor_config.local:

--------------------------------------------------
SLOT_TYPE_1 = CPUS=100%,DISK=100%,SWAP=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
--------------------------------------------------

These lines enable dynamic provisioning.

I then wanted to test a low-latency job. Although all jobs
submitted via condor_submit worked well, I could not find out
why StartLog says "Slot requirements not satisfied." for every
AMQP-submitted low-latency job. There was no helpful message
in the log, even with

ALL_DEBUG = D_FULLDEBUG D_JOB D_MACHINE D_COMMAND

After removing the four lines mentioned above, everything
worked seamlessly.

But even with those lines in place, I was able to make the
AMQP-submitted job run by adding the RequestDisk and
RequestMemory attributes to the job ad (as Matt hinted).

Versions affected (I am not sure all of them are involved, but
I am including them for reference):
  condor-7.2.2-0.9.el5
  condor-low-latency-1.0-12.el5
  condor-job-hooks-1.0-5.el5
  condor-job-hooks-common-1.0-5.el5

I would expect it either to throw a meaningful message into
the log, or to run the job without hassle (not requiring
Request* ads). I think the latter implies the former one
anyway.
Comment 1 Matthew Farrellee 2009-04-03 09:06:56 EDT
Issue at hand is the interaction of fetched jobs and partitionable slots. Slots that can be partitioned require Request* attributes on incoming jobs, which is fine. The problem is that the code path used to actually partition slots is not hit for fetched jobs. This means a slot cannot be partitioned by a fetched job.
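
To illustrate why a fetched job without Request* attributes is rejected, here is a toy model in Python. It is only a sketch and not Condor's ClassAd evaluator: the mapping, the function names, and the slot values are hypothetical, and RequestCpus is assumed alongside the RequestDisk and RequestMemory attributes named in this report. It mimics the ClassAd rule that a comparison against a missing attribute is UNDEFINED, which is not TRUE, so the slot requirements are not satisfied.

```python
from typing import Any, List, Mapping, Optional

# Hypothetical mapping from job request attributes to slot resources.
# RequestDisk and RequestMemory come from the report; RequestCpus is assumed.
REQUEST_TO_RESOURCE = {
    "RequestCpus": "Cpus",
    "RequestMemory": "Memory",
    "RequestDisk": "Disk",
}


def slot_requirements(slot: Mapping[str, int], job: Mapping[str, Any]) -> Optional[bool]:
    """Toy three-valued AND: True, False, or None (standing in for UNDEFINED)."""
    undefined = False
    for request, resource in REQUEST_TO_RESOURCE.items():
        if request not in job:
            undefined = True      # comparison against a missing attribute
        elif job[request] > slot[resource]:
            return False          # a definite False dominates the AND
    return None if undefined else True


def missing_requests(job: Mapping[str, Any]) -> List[str]:
    """List the Request* attributes the job ad does not define."""
    return sorted(r for r in REQUEST_TO_RESOURCE if r not in job)


slot = {"Cpus": 4, "Memory": 8192, "Disk": 100000}   # made-up slot resources
fetched_job = {"Cmd": "/bin/true"}                   # no Request* attributes
print(slot_requirements(slot, fetched_job), missing_requests(fetched_job))
```

A report along the lines of missing_requests() is the kind of diagnostic that would have pointed at the cause here, since the log only says that the requirements were not satisfied.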
Comment 2 Matthew Farrellee 2009-05-01 11:41:49 EDT
This is probably best handled as an RFE for information about why the job and slot are rejecting one another.

A workaround while debugging is to wrap the job's requirements or the slot's requirements in debug().
Comment 3 Jan Sarenik 2009-11-12 05:46:47 EST
Target set to 1.3
Comment 4 Matthew Farrellee 2010-01-14 22:29:49 EST
What would be useful?

Right now you can get a copy of the job and machine ads with D_JOB and D_MACHINE, and you'll get notification that their requirements were not met with either "Slot requirements not satisfied." or "Job requirements not satisfied."
Comment 5 Jan Sarenik 2010-01-21 08:26:48 EST
Is there a use-case for having partitionable slots and low-latency
enabled at the same time? If not, I see no issue other than
that I tested in a badly configured environment.

Otherwise I would like to know why jobs submitted with
condor_submit ran fine, while low-latency jobs ended with
requirements not satisfied.

Based on the answer, I may be able to suggest a useful and easy
enhancement.
Comment 6 Matthew Farrellee 2010-01-21 23:38:15 EST
There are such use-cases. The reason is condor_submit fills in a default RequestMemory and RequestDisk. The condor_submit tool is thick. In the low-latency case the writer of the message needs to replicate some of the knowledge embodied in condor_submit.

A condor_submit -> AMQP path is desirable.
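
As a rough sketch of what the message writer would have to replicate, the following Python fills in Request* attributes on a job ad before it is sent. Everything here is an assumption for illustration: the helper name, the dict representation of the job ad, and above all the default values, which are placeholders and not the values condor_submit actually computes.

```python
from typing import Any, Dict, Mapping

# Placeholder defaults, NOT the values condor_submit computes.
# RequestCpus is an assumed third attribute next to the two named above.
ASSUMED_DEFAULTS: Dict[str, Any] = {
    "RequestCpus": 1,
    "RequestMemory": 512,
    "RequestDisk": 1024,
}


def with_default_requests(
    job_ad: Mapping[str, Any],
    defaults: Mapping[str, Any] = ASSUMED_DEFAULTS,
) -> Dict[str, Any]:
    """Return a copy of job_ad with any missing Request* attribute filled in."""
    completed = dict(defaults)
    completed.update(job_ad)   # values set by the message writer win
    return completed


# Hypothetical job ad as a low-latency message writer might build it.
job_ad = {"Cmd": "/bin/true", "RequestMemory": 2048}
print(with_default_requests(job_ad))
```

Explicit values in the job ad take precedence, so a writer that already knows its memory or disk needs is not overridden.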
Comment 7 Jan Sarenik 2010-06-04 08:01:13 EDT
This may be resurrected later if somebody asks for similar
functionality, but it has been inactive and I have not worked on
testing Condor for months. Closing as NOTABUG.