Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 629614

Summary: Unmatchable jobs throttle throughput with HFS
Product: Red Hat Enterprise MRG
Component: condor
Version: beta
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Jon Thomas <jthomas>
Assignee: Erik Erlandson <eerlands>
QA Contact: Lubos Trilety <ltrilety>
CC: eerlands, ltrilety, matt, tross
Target Milestone: 1.3
Hardware: All
OS: Linux
Doc Type: Bug Fix
Bug Blocks: 528800
Attachments:
- concurrency-v3.patch
- concurency-v4.patch
- concurrency-v5.patch
- tarball of testing config and submit files w/ README
- negotiator log of match cycle from fair-share failure in comment #9
- negotiator log of match cycle from similar submission w/out rejects
- concurrency-v6.patch
- updated test suite for HFS and rejected slot redistribution
- negotiator log for test7.jsub

Description Jon Thomas 2010-09-02 14:24:45 UTC
Concurrency limits are not considered prior to negotiation, so jobs are given slots that end up as rejections. Other users and groups are then unable to use these slots.

It appears that scheddAds do not contain concurrency limit information, so a simple filter doesn't work. The negotiator isn't aware of limit information until it negotiates with the schedd.

Comment 3 Jon Thomas 2010-09-07 15:48:10 UTC
Created attachment 443544 [details]
concurrency-v3.patch

Comment 4 Jon Thomas 2010-09-07 15:50:27 UTC
Here's the third version of the patch. 

First version was:

- begin negotiation loop
- negotiate
- tally usage (any rejections?)
- fairshare(# rejections)
- goto begin negotiation loop

This works well for flat configs, where the depth of the tree is 1 with no subgroups.

Second/third versions are:

- begin loop
- bubble up unused slots
- fairshare(unused)
- roundoff
- negotiate
- tally usage
- set #submitters to reflect usage
- goto begin loop

The second version handles fairshare for subgroups.
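The second/third loop can be sketched as follows. This is a minimal illustration of the "bubble up unused slots, then fairshare" steps, not the actual negotiator code; the Group struct and redistribute() helper are hypothetical simplifications (the real implementation walks a group tree and negotiates with schedds).

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplification: one entry per group, with the slots it was
// allotted this iteration and the slots it actually matched.
struct Group {
    double quota;    // dynamic quota fraction
    int allowed;     // slots allotted this iteration
    int used;        // slots actually matched (rejections reduce this)
};

// "Bubble up" slots left unused by rejections, then hand them out in
// proportion to quota among the groups that consumed their full share
// (a stand-in for the fairshare(unused) + roundoff steps).
void redistribute(std::vector<Group>& groups) {
    int unused = 0;
    double quotaSum = 0;
    for (const Group& g : groups) {
        unused += g.allowed - g.used;
        if (g.used == g.allowed) quotaSum += g.quota;
    }
    for (Group& g : groups) {
        if (g.used == g.allowed && quotaSum > 0)
            g.allowed += (int)(unused * g.quota / quotaSum);
    }
}
```

With three groups where the first matched only 2 of its 10 slots, the 8 leftover slots are split between the other two groups according to their quotas.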

For repro, create a group config with autoregroup and set a LIMIT in config
such as:

users_LIMIT = 1

Then change the sdf of a job to have:

concurrency_limits = users

What happens is that although the job should have gotten x slots, it gets only
one due to concurrency limits. That's correct behavior. The problem is that the
(x-1) slots don't get distributed across the other groups, and you can see this
by looking at expected and actual slots used.

To thoroughly test this, the concurrency limit needs to be used as

1) a non-group user
2) a node that has submitters and children
3) a child
4) in multiple groups
   - concurrency limits are first come, first served (I think one can configure
this issue away). The second group that has the same limit may not get any slots,
but we need to make sure its slots get used by other groups too.
5) across different users within a group

Comment 5 Jon Thomas 2010-09-07 16:14:43 UTC
Created attachment 443635 [details]
concurency-v4.patch

Yet another patch. I made a last-minute change, but to the wrong value.

Comment 6 Jon Thomas 2010-09-07 20:07:40 UTC
Created attachment 445772 [details]
concurrency-v5.patch

Another update. This only impacts the 5th test scenario. A value was initialized inside a loop when it should have been initialized outside.

Comment 7 Erik Erlandson 2010-09-08 20:33:56 UTC
Created attachment 446080 [details]
tarball of testing config and submit files w/ README

tarball of some tests derived from jrt's description, plus some more I added

Comment 8 Erik Erlandson 2010-09-08 20:42:50 UTC
I verified that the -v5 patch works for the testing scenarios jrt described above.   

There is a test I added to see if rejected slots get redistributed in a way that preserves fair-share; however, I think fair-share may not be preserved:

(following is also in the tarball I attached to this bz, under "test 7" in the readme file)

HFS config (plus CL setup)
==============================================
T1_LIMIT = 1
T2_LIMIT = 2

NUM_CPUS = 20

GROUP_NAMES = a, b, a.m, a.n, b.q, b.r

GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.4

GROUP_QUOTA_DYNAMIC_a.m = 0.5
GROUP_QUOTA_DYNAMIC_a.n = 0.4

GROUP_QUOTA_DYNAMIC_b.q = 0.5
GROUP_QUOTA_DYNAMIC_b.r = 0.25

GROUP_AUTOREGROUP_a = TRUE
GROUP_AUTOREGROUP_b = TRUE
GROUP_AUTOREGROUP_a.m = TRUE
GROUP_AUTOREGROUP_a.n = TRUE
GROUP_AUTOREGROUP_b.q = TRUE
GROUP_AUTOREGROUP_b.r = TRUE
==============================================

Submit 10 jobs against CL "T2" and against group "a.user" (two of those jobs will run, leaving 18 slots).
Submit 15 jobs against group "b.q.user" and 15 against "b.r.user". What I'm expecting is that fair-share distribution of the 18 slots gives 12 to "b.q" and 6 to "b.r" (their quota ratio is 2:1); however, the distribution I get is:

% condor_q -run -l | grep AccountingGroup | sort | uniq -c
      2 AccountingGroup = "a.user"
     14 AccountingGroup = "b.q.user"
      4 AccountingGroup = "b.r.user"
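For reference, the expected 12/6 split is just the 2:1 quota ratio applied to the 18 remaining slots. A tiny hypothetical helper (expectedShare is not a condor function) makes the arithmetic explicit:

```cpp
#include <cassert>

// Hypothetical helper: the fair share of `remaining` slots a group with
// quota `myQuota` should get, out of competing groups whose quotas sum
// to `quotaSum`.
int expectedShare(int remaining, double myQuota, double quotaSum) {
    return (int)(remaining * myQuota / quotaSum);
}
```

Here b.q has quota 0.5 and b.r has 0.25, so of the 18 slots b.q should get expectedShare(18, 0.5, 0.75) = 12 and b.r the remaining 6.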

Comment 9 Erik Erlandson 2010-09-08 21:51:48 UTC
Another data point: if I take CL out of the picture (that is, no rejects) and just submit two jobs against "a", 15 against "b.q", and 15 against "b.r", the fair-share works as expected:

% condor_q -run -l | grep AccountingGroup | sort | uniq -c
      2 AccountingGroup = "a.user"
     12 AccountingGroup = "b.q.user"
      6 AccountingGroup = "b.r.user"

Comment 10 Erik Erlandson 2010-09-08 23:47:18 UTC
Created attachment 446106 [details]
negotiator log of match cycle from fair-share failure in comment #9

Comment 11 Erik Erlandson 2010-09-08 23:48:59 UTC
Created attachment 446107 [details]
negotiator log of match cycle from similar submission w/out rejects

This is a sort of control-submission that works as expected, for comparison against previous attachment (test8.jsub.neglog), which shows failure to achieve fair share ratios when redistributing rejected slots.

Comment 12 Jon Thomas 2010-09-09 01:33:22 UTC
Created attachment 446119 [details]
concurrency-v6.patch

This seems to fix the one test case. I have not yet tried the others.  

The main fix was trimming out the groups with no submitters from the second iteration. These groups don't negotiate, but their slots were already accounted for in the first pass. This resulted in starvation.

if (groupArray[i].submitterAds.MyLength() == 0) {
    dprintf(D_ALWAYS, "Group %s - skipping, no submitters\n",
            groupArray[i].groupName.c_str());
    if (groupArray[i].child == -1) groupArray[i].maxAllowed = 0;
    else groupArray[i].nodemaxAllowed = 0;
    continue;
}

This patch also has loophalt set to 2 rather than 5.

Comment 13 Erik Erlandson 2010-09-09 05:03:01 UTC
Created attachment 446137 [details]
updated test suite for HFS and rejected slot redistribution

Rewrote test suite to be cleaner and easier.  Added a "non-CL reject" test just to sanity check that reason for rejection doesn't affect behavior of HFS.

The -v6 patch passes all these tests.

Comment 14 Erik Erlandson 2010-09-09 05:06:31 UTC
The -v6 patch is passing all my tests. We may want to consider setting loophalt higher than 2; a higher value would catch cases where slots untried in a previous iteration were rejected in the latest iteration. The previous setting of 5 is probably plenty.

Comment 15 Jon Thomas 2010-09-09 13:14:44 UTC
That was my reasoning for 5. I'm fine with leaving it in. I was trying to come up with a number that made sense but didn't impact performance too much.

Comment 16 Erik Erlandson 2010-09-09 19:20:57 UTC
Fix has been pushed to V7_4-HFS-branch.   Equivalent to patch -v6, except loophalt threshold set back to 5.

Comment 17 Matthew Farrellee 2010-09-10 04:35:26 UTC
Build in 7.4.4-0.12

Comment 18 Lubos Trilety 2010-09-29 10:24:46 UTC
Created attachment 450437 [details]
negotiator log for test7.jsub

I ran test7 from the attachment after submitting two jobs, one run as "b.user" and the second as "none.user". The result of test7 was not as expected; the problem is that the wrong number of slots goes into the second iteration of the negotiation cycle.

Tested with:
condor-7.4.4-0.16

Scenario:
$ cat submit.file
cmd = /bin/sleep
args = 2
+AccountingGroup = "b.user"
queue
+AccountingGroup = "none.user"
queue

$ condor_submit submit.file
Submitting job(s)..
2 job(s) submitted to cluster 1.

--- wait until all jobs finish
$ condor_q | grep jobs
0 jobs; 0 idle, 0 running, 0 held

$ condor_submit test7.jsub 
Submitting job(s)........................................
40 job(s) submitted to cluster 2.

--- The result was
$ condor_q -run -l | grep AccountingGroup | sort | uniq -c
      2 AccountingGroup = "a.user"
     14 AccountingGroup = "b.q.user"
      4 AccountingGroup = "b.r.user"


The result should be "a.user": 2, "b.q.user": 12, "b.r.user": 6

In the negotiator log, it can be seen that in the first iteration "a.user" gets 2 slots because of the concurrency limit, "b.q.user" gets 6 slots, and "b.r.user" gets 4 slots.
So only 8 slots went unused, but the log says: "Failed to match 12.800001 slots on iteration 1."
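The discrepancy is plain arithmetic; a hypothetical one-liner (unmatchedSlots is not condor code) spells out the count the log should have reported:

```cpp
#include <cassert>

// Slots genuinely left over after iteration 1, given the per-group match
// counts from the log (a.user, b.q.user, b.r.user) out of a 20-slot pool.
int unmatchedSlots(int pool, int a, int bq, int br) {
    return pool - (a + bq + br);
}
```

With the logged counts of 2, 6, and 4 matches out of 20 slots, only 8 slots were actually unmatched, not the 12.8 the log reports.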

Comment 19 Jon Thomas 2010-09-29 14:05:02 UTC
It looks like what happened was that the jobs for the nongroup and the jobs for group b were in the process of ending at the start of the negotiation cycle. There were classAds for these jobs, but the values from

ad->LookupInteger(ATTR_RUNNING_JOBS, numrunning);
ad->LookupInteger(ATTR_IDLE_JOBS, numidle);

were zero. This meant numsubmits was zero. Since there was a classAd, the ad was still inserted into the list for the group:

groupArray[0].submitterAds.Insert(ad);

Later, this caused a problem with

if ( groupArray[i].submitterAds.MyLength() == 0 ) {

So I think

if ( groupArray[i].submitterAds.MyLength() == 0 ) {

should be changed to

if ( groupArray[i].numsubmits == 0 ) {

We use numsubmits in a number of places, so this should be better. Negotiation is skipped in negotiatewithgroup based upon:

num_idle_jobs = 0;
schedd->LookupInteger(ATTR_IDLE_JOBS, num_idle_jobs);
if (num_idle_jobs < 0) {
    num_idle_jobs = 0;
}

This is where we get the "... skipped because no idle jobs" message. So I think the later stages of negotiation rely on these same attributes. If we make the change, negotiation will be skipped higher in the call chain and it will not add to the number of unused slots.
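The difference between the two checks can be illustrated with a hypothetical simplification (GroupEntry below is not the real negotiator struct): a group may still hold a submitter ad whose idle+running count is zero, so the ad-list length overcounts while the summed job count does not.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a group: one idle+running job count per
// submitter classAd the group currently holds.
struct GroupEntry {
    std::vector<int> submitterJobCounts;
    int numsubmits() const {
        int n = 0;
        for (int c : submitterJobCounts) n += c;
        return n;
    }
};

// Old check: skip only when the group holds no submitter ads at all
// (analogous to submitterAds.MyLength() == 0).
bool shouldSkipByAdCount(const GroupEntry& g) {
    return g.submitterJobCounts.empty();
}

// Proposed check: skip when the ads carry no idle or running jobs, which
// also covers stale ads left over from jobs that just finished.
bool shouldSkipByJobCount(const GroupEntry& g) {
    return g.numsubmits() == 0;
}
```

A group with one stale ad (zero jobs) passes the old check and so participates in the second iteration even though it has nothing to negotiate; the proposed check skips it.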

Comment 20 Lubos Trilety 2010-10-01 08:36:00 UTC
New bug 639244 was created to cover Comment 18.

Comment 21 Lubos Trilety 2010-10-01 08:40:52 UTC
Tested with (version):
condor-7.4.4-0.16

Reproduction:
All tests from the attachment were successful when run on a clean condor, i.e. no jobs had been running before them.

Tested on:
RHEL5 x86_64,i386  - passed
RHEL4 x86_64,i386  - passed

>>> VERIFIED