Bug 644904

Summary: rejections create incorrect group allocations
Product: Red Hat Enterprise MRG
Component: condor
Version: 1.3
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Jon Thomas <jthomas>
Assignee: Erik Erlandson <eerlands>
QA Contact: Lubos Trilety <ltrilety>
CC: iboverma, ltrilety, matt
Target Milestone: 1.3.2
Hardware: All
OS: Linux
Fixed In Version: condor-7.4.5-0.2
Doc Type: Bug Fix
Last Closed: 2011-02-15 13:01:38 UTC
Attachments:
patch with incremental approach to negotiation
version without the other patches
test
test2
NegotiatorLog
NegotiatorLog with new settings

Description Jon Thomas 2010-10-20 15:16:47 UTC
Created attachment 454594 [details]
patch with incremental approach to negotiation

The number of usable slots going into HFS can be bogus due to rejections (for whatever reason) during negotiation: HFS computes shares based upon that number, and the rejections then cause starvation.

Use case (and repro):

Groups a, b, a.a1, and a.a2. 10 slots, but 50% of the slots are limited to running only "b" jobs. Quota for a.a1=6, a.a2=4. The result is that a.a1 gets all 5 of the slots usable by "a" jobs, so the a.a1:a.a2 split is 5:0 rather than 3:2.
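
The arithmetic behind the repro, as a minimal standalone sketch (plain Python for illustration only, not Condor code; the greedy loop just models one group exhausting the usable slots): computing shares against all 10 slots lets a.a1 soak up every slot that can run "a" jobs, while computing them against the 5 usable slots keeps the configured 6:4 ratio.

# Minimal sketch of the repro arithmetic (illustration only, not Condor code).
# Quotas are configured against the full pool of 10 slots, but only 5 slots
# can actually run "a" jobs.
quotas = {"a.a1": 6, "a.a2": 4}
usable_for_a = 5

# What effectively happens: each group claims up to its full quota from the
# 5 usable slots, first come first served.
remaining = usable_for_a
greedy = {}
for group, quota in quotas.items():
    greedy[group] = min(quota, remaining)
    remaining -= greedy[group]
print(greedy)  # {'a.a1': 5, 'a.a2': 0}  -> a.a2 is starved

# What we would like: scale the quotas down to the usable pool so the
# configured 6:4 ratio is preserved.
total_quota = sum(quotas.values())
fair = {g: round(q * usable_for_a / total_quota) for g, q in quotas.items()}
print(fair)    # {'a.a1': 3, 'a.a2': 2}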

Comment 1 Jon Thomas 2010-10-20 15:32:41 UTC
Oops, I'll need to attach a different patch, as this one also includes the patches from:

https://bugzilla.redhat.com/show_bug.cgi?id=637281
https://bugzilla.redhat.com/show_bug.cgi?id=639244
https://bugzilla.redhat.com/show_bug.cgi?id=641418

Comment 2 Jon Thomas 2010-10-20 15:45:08 UTC
Created attachment 454598 [details]
version without the other patches

Comment 4 Erik Erlandson 2010-11-19 21:28:11 UTC
Incorporated Jon's fix here:
V7_4-BZ619557-HFS-tree-structure

Comment 5 Lubos Trilety 2010-11-30 08:40:06 UTC
Created attachment 463664 [details]
test

Successfully reproduced on:
condor-7.4.4-0.16

Reproduction scenario is described in test attachment.

Comment 7 Lubos Trilety 2011-01-07 15:13:35 UTC
Created attachment 472246 [details]
test2

The output of the first scenario is:
b   - 50
a.a1  - 40
a.a2  - 10

Expected was:
b   - 50
a.a1  - 30
a.a2  - 20

Comment 8 Lubos Trilety 2011-01-07 15:14:43 UTC
Created attachment 472247 [details]
NegotiatorLog

the negotiator log from first scenario

Comment 9 Erik Erlandson 2011-01-10 23:00:34 UTC
> the negotiator log from first scenario

From the attached log I'm seeing that once the "a" jobs show up, the HFS algorithm is numerically allocating the 100 available slots as I expect:

01/07/11 09:48:18 HFS: group= <none>  quota= 0  requested= 0  allocated= 0  unallocated= 0
01/07/11 09:48:18 HFS: group= a  quota= 0  requested= 0  allocated= 0  unallocated= 0
01/07/11 09:48:18 HFS: group= b  quota= 33.3333  requested= 50  allocated= 34  unallocated= 16
01/07/11 09:48:18 HFS: group= a.a1  quota= 40  requested= 100  allocated= 40  unallocated= 60
01/07/11 09:48:18 HFS: group= a.a2  quota= 26.6667  requested= 100  allocated= 26  unallocated= 74
01/07/11 09:48:18 HFS: groups= 5  requesting= 3  served= 3  unserved= 0  slots= 100  requested= 250  allocated= 100  surplus= 7.10543e-15


Ideally, after this round we would like to see 34 jobs for "b", 40 jobs for "a.a1", and 26 jobs for "a.a2". However, the "b" jobs had already claimed slots in previous rounds, before the "a" jobs appeared, and those slots were not given up.

One configuration question: was preemption disabled in any way for these runs?

Also, you can try configuring these:
# turn on round-robin:
HFS_ROUND_ROBIN_RATE = 1.0
# turn on multiple allocation rounds
HFS_MAX_ALLOCATION_ROUNDS = 3

Comment 10 Lubos Trilety 2011-01-12 09:49:56 UTC
Created attachment 472977 [details]
NegotiatorLog with new settings

I didn't change any preemption setting in configuration:

# condor_config_val -dump | grep -i preempt
PREEMPT = FALSE
PREEMPTION_RANK = $(UWCS_PREEMPTION_RANK)
PREEMPTION_REQUIREMENTS = $(UWCS_PREEMPTION_REQUIREMENTS)
TESTINGMODE_PREEMPT = False
TESTINGMODE_PREEMPTION_RANK = 0
TESTINGMODE_PREEMPTION_REQUIREMENTS = False
UWCS_PREEMPT = ( ((Activity == "Suspended") && ($(ActivityTimer) > $(MaxSuspendTime))) || (SUSPEND && (WANT_SUSPEND == False)) )
UWCS_PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize
UWCS_PREEMPTION_REQUIREMENTS = ( $(StateTimer) > (1 * $(HOUR)) && RemoteUserPrio > SubmitterUserPrio * 1.2 ) || (MY.NiceUser == True)


What do variables HFS_ROUND_ROBIN_RATE and HFS_MAX_ALLOCATION_ROUNDS do?

In this case, when I changed those variables, the result was the same as expected:
b - 50
a.a1 - 30
a.a2 - 20

(see attachment for negotiator log)

Comment 11 Erik Erlandson 2011-01-14 22:50:41 UTC
(In reply to comment #10)

> What do variables HFS_ROUND_ROBIN_RATE and HFS_MAX_ALLOCATION_ROUNDS do?

HFS_MAX_ALLOCATION_ROUNDS specifies how many allocation rounds are allowed. Allocation rounds are the extra iterations that may occur when a group does not get all of the slots it was allocated, due to rejection. If the negotiation loop fails to fill the quota for one or more groups, it may take the unused quota and attempt to allocate it to other groups that can use it. The variable defaults to 1, which is the traditional behavior of a single negotiation attempt; the minimum is 1 and the maximum is INT_MAX.
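
As a rough illustration of that multi-round idea (invented Python, not the negotiator source; the group names, caps, and numbers are made up): quota that goes unused because a group's matches were rejected is re-offered, in later rounds, to groups that can still use slots.

# Rough illustration of multiple allocation rounds (not Condor source code).
# "cap" stands in for the slots a group can actually match after rejections.
def multi_round_allocate(quota, cap, slots, max_rounds):
    alloc = {g: 0 for g in quota}
    remaining = slots
    for _ in range(max_rounds):
        # groups that still want slots and can still match them
        active = [g for g in quota if alloc[g] < cap[g]]
        if not active or remaining <= 0:
            break
        total_q = sum(quota[g] for g in active)
        handed_out = 0
        for g in active:
            share = int(remaining * quota[g] / total_q)
            take = min(share, cap[g] - alloc[g])
            alloc[g] += take
            handed_out += take
        if handed_out == 0:
            break
        remaining -= handed_out
    return alloc

quota = {"a.a1": 6, "a.a2": 4}
cap = {"a.a1": 3, "a.a2": 5}  # pretend a.a1's extra matches are rejected
print(multi_round_allocate(quota, cap, 10, max_rounds=1))  # {'a.a1': 3, 'a.a2': 4}
print(multi_round_allocate(quota, cap, 10, max_rounds=3))  # {'a.a1': 3, 'a.a2': 5}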

HFS_ROUND_ROBIN_RATE specifies the increment rate of the negotiation round-robin loop that can be used to solve the overlapping-effective-pool problem. It defaults to the traditional behavior of attempting to negotiate for everything in one iteration. At its minimum value (1.0) it gives maximum preservation of the allocation ratios, but may require many iterations. It can be increased to some value > 1.0 to reduce iterations (and log output), with the possible consequence of reduced adherence to the allocated ratios.
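
And a toy sketch of the ratio-preservation idea behind the round-robin rate (again invented Python, not the negotiator source; the pool size and groups are assumptions): handing out slots in small increments per group per pass, instead of a group's whole allocation at once, keeps the split proportional when groups draw on the same overlapping pool of slots.

# Toy sketch of round-robin increments (illustration only, not Condor code).
def round_robin(allocated, pool, rate=1.0):
    # allocated: target slots per group; pool: slots actually shared by
    # these groups.  Hand out at most `rate` slots per group per pass.
    got = {g: 0.0 for g in allocated}
    while pool > 0 and any(got[g] < allocated[g] for g in allocated):
        progress = False
        for g in allocated:
            step = min(rate, allocated[g] - got[g], pool)
            if step > 0:
                got[g] += step
                pool -= step
                progress = True
        if not progress:
            break
    return got

# a.a1 and a.a2 both draw on the same 5 usable slots:
print(round_robin({"a.a1": 6, "a.a2": 4}, pool=5, rate=1.0))
# -> {'a.a1': 3.0, 'a.a2': 2.0}: the 6:4 ratio is preserved
print(round_robin({"a.a1": 6, "a.a2": 4}, pool=5, rate=6.0))
# -> {'a.a1': 5.0, 'a.a2': 0.0}: a large increment lets one group take the pool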

Comment 12 Erik Erlandson 2011-01-14 22:59:41 UTC
(In reply to comment #10)

> In this case, when I changed those variables, the result was the same as
> expected:
> b - 50
> a.a1 - 30
> a.a2 - 20


Lubos, are you satisfied with the HFS behavior here?

Generally speaking, because the "a" jobs were submitted after the "b" jobs had already acquired resources, the final allocations may depend on preemption settings and/or the current priorities of the submitters.

The parameter HFS_MAX_ALLOCATION_ROUNDS is meant to control the behavior of the fix for this specific bz (644904).  I have defaulted it to traditional behavior because customers may or may not want to enable it for performance tradeoff reasons.  So it is "opt-in."

Comment 13 Lubos Trilety 2011-01-17 15:50:07 UTC
(In reply to comment #12)

I verified all tests with the new settings. They all pass, so I am quite satisfied with the HFS behaviour. However, documentation for those two parameters is missing; a new Bug 670222 was created for that.

Tested with (version):
condor-7.4.5-0.6

Tested on:
RHEL5 i386,x86_64  - passed
RHEL4 i386,x86_64  - passed

Comment 14 Lubos Trilety 2011-01-18 16:22:24 UTC
>>> VERIFIED