Bug 672830

Summary: partitionable slot not partitioning using Requirements = cpus == x
Product: Red Hat Enterprise MRG
Component: condor
Version: 1.2
Reporter: Jon Thomas <jthomas>
Assignee: Matthew Farrellee <matt>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: matt
Status: CLOSED NOTABUG
Severity: low
Priority: low
Target Milestone: 2.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-02-01 18:37:09 UTC

Description Jon Thomas 2011-01-26 14:49:43 UTC
SLOT_TYPE_1 = cpus=8
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1

submit

executable = /bin/sleep
arguments = 6000
universe = vanilla
transfer_executable = true
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue 1

slot partitions correctly

submit

executable = /bin/sleep
arguments = 6000
universe = vanilla
transfer_executable = true
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Requirements  = cpus == 7
queue 1

The job is matched in the negotiator to the partitionable slot (slot1@); however, in the next cycle check_matches removes the match and attempts to match the job to slot1_1@ rather than slot1_2@. Slot1_1@ is already matched to the first submission, so the second submission stays in the idle state.


condor_submit jrt.job
Submitting job(s).
1 job(s) submitted to cluster 575.

(I stripped the hostnames, so the log may look odd.)

01/26/11 09:23:32       Matched 575.0 jrthomas@<192.168.122.1:56478> preempting none <192.168.122.1:34948> slot1@
01/26/11 09:23:32       Notifying the accountant
01/26/11 09:23:32 Accountant::AddMatch - CustomerName=jrthomas@, ResourceName=slot1@@<192.168.122.1:34948>
01/26/11 09:23:32 Customername jrthomas@ GroupName is: <none>
01/26/11 09:23:32 GroupWeightedResourcesUsed=2.000000 SlotWeight=1.000000
01/26/11 09:23:32 (ACCOUNTANT) Added match between customer jrthomas@ and resource slot1@@<192.168.122.1:34948>
01/26/11 09:23:32       Successfully matched with slot1@

...

01/26/11 09:23:52 Resource slot1@@<192.168.122.1:34948> was not claimed by jrthomas@ - removing match
01/26/11 09:23:52 Accountant::RemoveMatch - ResourceName=slot1@@<192.168.122.1:34948>
01/26/11 09:23:52 Customername jrthomas@ GroupName is: <none>
01/26/11 09:23:52 GroupResourcesUsed =1.000000 GroupWeightedResourcesUsed= 1.000000 SlotWeight=0.000000
01/26/11 09:23:52 (ACCOUNTANT) Removed match between customer jrthomas@ and resource slot1@@<192.168.122.1:34948>
01/26/11 09:23:52 Accountant::AddMatch - CustomerName=jrthomas@, ResourceName=slot1_1@@<192.168.122.1:34948>
01/26/11 09:23:52 Match already existed!


-- Submitter: basin.redhat.com : <192.168.122.1:56478> : basin.redhat.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
 574.0   jrthomas        1/26 09:22   0+00:04:07 R  0   0.0  sleep 6000        
 575.0   jrthomas        1/26 09:23   0+00:00:00 I  0   0.0  sleep 6000        

2 jobs; 1 idle, 1 running, 0 held

From the logs, it looks like the slot is not partitioning: there should be 3 startd ads, but only 2 are reported:

01/26/11 09:30:14 Public ads include 1 submitter, 2 startd

The cycle of matching to slot1@, not partitioning, and then having the match removed continues indefinitely. 

Partitioning works as expected when using "RequestCpus = 7" instead of "Requirements = cpus == 7".
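For reference, the working variant is the reporter's second submit file with the Requirements line replaced by an explicit resource request (a reconstruction from the description above, not a file attached to this bug):

```
executable = /bin/sleep
arguments = 6000
universe = vanilla
transfer_executable = true
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
RequestCpus = 7
queue 1
```

With RequestCpus set, the dynamic slot carved from slot1@ is sized to 7 cpus, so both the negotiator match and the startd-side claim succeed.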


$ condor_q -better-analyze

576.000:  Request is being serviced

---
578.000:  Run analysis summary.  Of 2 machines,
      1 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      1 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
	Last successful match: Wed Jan 26 09:43:38 2011

The Requirements expression for your job is:

( target.Cpus == 7 ) && ( target.Arch == "X86_64" ) &&
( target.OpSys == "LINUX" ) && ( target.Disk >= DiskUsage ) &&
( ( target.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) && ( target.HasFileTransfer )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( target.Cpus == 7 )              1                    
2   ( target.Arch == "X86_64" )       2                    
3   ( target.OpSys == "LINUX" )       2                    
4   ( target.Disk >= 30 )             2                    
5   ( ( 1024 * target.Memory ) >= 30 )  2                    
6   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,2.929687500000000E-02)) ) >= 30 )
                                      2                    
7   ( target.HasFileTransfer )        2

Comment 1 Matthew Farrellee 2011-02-01 18:37:09 UTC
The "Requirements = CPUs == 7" job is expected not to run unless it also has "RequestCPUs = 7".

The reason is that the job may match slot1 (the partitionable slot), but it still needs to be accepted by the dynamic slot created for it. Without RequestCPUs=7, the dynamic slot will have CPUs=1 and the job's Requirements will fail.

Check the StartLog to see this:

02/01/11 13:29:24 slot1: Received match <192.168.1.100:44510>#1296584763#6#...
02/01/11 13:29:24 slot1: State change: match notification protocol successful
02/01/11 13:29:24 slot1: Changing state: Unclaimed -> Matched
02/01/11 13:29:24 slot1_2: New machine resource of type -1 allocated
02/01/11 13:29:24 slot1: Changing state: Matched -> Unclaimed
***02/01/11 13:29:24 slot1_2: Job requirements not satisfied.***
02/01/11 13:29:24 slot1_2: Request to claim resource refused.
02/01/11 13:29:24 slot1_2: Claiming protocol failed
02/01/11 13:29:24 slot1_2: Changing state: Owner -> Delete
02/01/11 13:29:24 slot1_2: Resource no longer needed, deleting

The Requirements are inconsistent with the job's other requests. This cannot be easily detected. The "( ( RequestMemory * 1024 ) >= ImageSize )" expression added to the Requirements is an example of a possible way to detect this situation with respect to memory, in a way that condor_q -better-analyze can report on.

An RFE with a suggestion on how to help -better-analyze detect such a situation is welcome.
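The two-stage behavior described in this comment can be sketched as follows. This is a hypothetical illustration, not HTCondor code: stage 1 is the negotiator evaluating the job's Requirements against the partitionable slot's ad (which advertises its remaining 7 cpus after the first 1-cpu job was carved off), and stage 2 is the startd re-evaluating the Requirements against the dynamic slot it creates, sized by RequestCpus (default 1).

```python
def negotiator_match(job, pslot):
    # Stage 1: the negotiator evaluates the job's Requirements against
    # the partitionable slot ad, which advertises remaining resources.
    return job["Requirements"](pslot)

def startd_accept(job):
    # Stage 2: the startd carves out a dynamic slot sized by RequestCpus
    # (defaulting to 1 cpu) and re-evaluates the Requirements against it.
    dslot = {"Cpus": job.get("RequestCpus", 1)}
    return job["Requirements"](dslot)

pslot = {"Cpus": 7}  # slot1@ after the first 1-cpu job was carved off

bad = {"Requirements": lambda ad: ad["Cpus"] == 7}                     # no RequestCpus
good = {"Requirements": lambda ad: ad["Cpus"] == 7, "RequestCpus": 7}  # consistent request

assert negotiator_match(bad, pslot)   # negotiator hands out a match...
assert not startd_accept(bad)         # ...but the 1-cpu dynamic slot rejects it
assert negotiator_match(good, pslot)
assert startd_accept(good)            # dynamic slot has Cpus=7, Requirements hold
```

This reproduces the loop seen in the logs: the negotiator keeps matching the bad job to slot1@, the startd keeps refusing the claim, and the match is removed each cycle.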

Comment 2 Jon Thomas 2011-02-01 18:55:48 UTC
I see this differently because the job matches in the negotiator. If it matches in the negotiator, it will starve submitters whose jobs could actually run. Hypothetically, one could flood the queue with jobs carrying "Requirements = cpus == x" such that no jobs run at all, since those jobs could be matched against every partitionable slot with cpus > 0. I think this is a throughput problem.

Comment 3 Matthew Farrellee 2011-02-01 19:07:09 UTC
A submitter can certainly starve itself. When it comes to starving others, flooding the queue with sleep 365d jobs will work just as well, so long as preemption is disabled.

This is akin to a user submitting with "Requirements = random(2)". There's a chance the Negotiator will provide a match and the slot will never be properly claimed.

Does this go any deeper than how starvation can already occur between users?

Comment 4 Jon Thomas 2011-02-01 19:35:32 UTC
I was thinking along the lines of slots never being used by any submitter. 

There is a simpler case:

runs
=====
Requirements   = cpus == 8
SLOT_TYPE_1 = cpus=8
SLOT_TYPE_1_PARTITIONABLE = FALSE
NUM_SLOTS_TYPE_1 = 1

RequestCpus = 8
SLOT_TYPE_1 = cpus=8
SLOT_TYPE_1_PARTITIONABLE = FALSE
NUM_SLOTS_TYPE_1 = 1

RequestCpus = 8
SLOT_TYPE_1 = cpus=8
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1


doesn't run
===========
Requirements   = cpus == 8
SLOT_TYPE_1 = cpus=8
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1


The 8 cpu partitionable slot doesn't behave like a "normal" 8 cpu slot.

Comment 5 Matthew Farrellee 2011-02-01 19:46:47 UTC
A user who always gets to match first and submits RequestCPUs=1,Requirements=CPUs>1 is equivalent to a user who always matches first and submits sleep 365d, and is close to a user who matches first and submits Requirements=random(2).

Long term, that user's priority should decrease until they no longer match first.

Comment 6 Jon Thomas 2011-02-01 20:14:26 UTC
I understand, but the "Requirements = cpus == x" problem only occurs with SLOT_TYPE_1_PARTITIONABLE = TRUE. Using "Requirements = cpus == 8" works in one case but not in the other, and the only difference is the partitionable flag.

runs
=====
Requirements   = cpus == 8

SLOT_TYPE_1 = cpus=8
SLOT_TYPE_1_PARTITIONABLE = FALSE
NUM_SLOTS_TYPE_1 = 1

doesn't run
===========
Requirements   = cpus == 8

SLOT_TYPE_1 = cpus=8
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1

Comment 7 Matthew Farrellee 2011-02-01 20:23:10 UTC
That is true, and the proper fix is to make the Negotiator aware of the partitionable/dynamic slot behavior. Unfortunately, such a solution was blocked from inclusion upstream.