Created attachment 454594 [details]
patch with incremental approach to negotiation

The number of usable slots going into HFS can be bogus due to rejections (for whatever reason) during negotiation. HFS will compute shares based on this number, but the rejections will cause starvation.

Use case (and repro): groups a, b, a.a1, and a.a2. 10 slots, but 50% of the slots are limited to running only "b" jobs. Quota for a.a1=6, a.a2=4. Result is that a.a1 gets 5, so the split is 5:0 rather than the expected 3:2.
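To spell out the arithmetic, here is a minimal Python sketch (my own illustration of the repro above, not the negotiator code; the 50% "b"-only restriction and the 6:4 quotas are taken from the description):

# Hypothetical illustration of the starvation in the repro above.
total_slots = 10
slots_usable_by_a = total_slots - 5   # 5 slots are restricted to "b" jobs
quota_a1, quota_a2 = 6, 4             # configured quotas for a.a1 and a.a2

# Shares computed against the raw slot count: a.a1 asks for 6, a.a2 for 4.
# a.a1 negotiates first, its matches are rejected down to the 5 usable
# slots, and nothing is left for a.a2 -> 5:0 starvation.
naive_a1 = min(quota_a1, slots_usable_by_a)
naive_a2 = slots_usable_by_a - naive_a1
print(naive_a1, naive_a2)             # 5 0

# Scaling the quotas to the slots that are actually usable gives the
# expected 3:2 split instead.
fair_a1 = round(slots_usable_by_a * quota_a1 / (quota_a1 + quota_a2))
fair_a2 = slots_usable_by_a - fair_a1
print(fair_a1, fair_a2)               # 3 2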
Oops, I'll need to attach a different patch, as this one also includes the patches from:
https://bugzilla.redhat.com/show_bug.cgi?id=637281
https://bugzilla.redhat.com/show_bug.cgi?id=639244
https://bugzilla.redhat.com/show_bug.cgi?id=641418
Created attachment 454598 [details]
version without the other patches
Incorporated Jon's fix here: V7_4-BZ619557-HFS-tree-structure
Created attachment 463664 [details]
test

Successfully reproduced on condor-7.4.4-0.16. The reproduction scenario is described in the test attachment.
Created attachment 472246 [details]
test2

The output of the first scenario is:
b - 50
a.a1 - 40
a.a2 - 10

Expected was:
b - 50
a.a1 - 30
a.a2 - 20
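For reference, the expected numbers fall out of a simple proportional split, assuming the test keeps the 6:4 ratio between a.a1 and a.a2 from the original repro and entitles "b" to half the pool (my assumption, not taken from the test attachment):

# Rough arithmetic behind the expected result above (assumes the 6:4
# a.a1/a.a2 ratio from the original report and a 50% share for "b").
pool = 100
b = pool // 2               # 50 slots for group "b"
remaining = pool - b        # 50 slots left for group "a"
a1 = remaining * 6 // 10    # 30 for a.a1
a2 = remaining - a1         # 20 for a.a2
print(b, a1, a2)            # 50 30 20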
Created attachment 472247 [details]
NegotiatorLog

the negotiator log from first scenario
> the negotiator log from first scenario

From the attached log I'm seeing that once the "a" jobs show up, the HFS algorithm is numerically allocating the 100 available slots as I expect:

01/07/11 09:48:18 HFS: group= <none> quota= 0 requested= 0 allocated= 0 unallocated= 0
01/07/11 09:48:18 HFS: group= a quota= 0 requested= 0 allocated= 0 unallocated= 0
01/07/11 09:48:18 HFS: group= b quota= 33.3333 requested= 50 allocated= 34 unallocated= 16
01/07/11 09:48:18 HFS: group= a.a1 quota= 40 requested= 100 allocated= 40 unallocated= 60
01/07/11 09:48:18 HFS: group= a.a2 quota= 26.6667 requested= 100 allocated= 26 unallocated= 74
01/07/11 09:48:18 HFS: groups= 5 requesting= 3 served= 3 unserved= 0 slots= 100 requested= 250 allocated= 100 surplus= 7.10543e-15

Ideally, after this round we would like to see 34 jobs for "b", 40 jobs for "a.a1", and 26 jobs for "a.a2". However, the "b" jobs have already claimed slots from previous rounds, before the "a" jobs appeared, and those slots were not given up.

One configuration question: was preemption disabled in any way for these runs?

Also, you can try configuring these:

# turn on round-robin:
HFS_ROUND_ROBIN_RATE = 1.0
# turn on multiple allocation rounds
HFS_MAX_ALLOCATION_ROUNDS = 3
Created attachment 472977 [details]
NegotiatorLog with new settings

I didn't change any preemption setting in the configuration:

# condor_config_val -dump | grep -i preempt
PREEMPT = FALSE
PREEMPTION_RANK = $(UWCS_PREEMPTION_RANK)
PREEMPTION_REQUIREMENTS = $(UWCS_PREEMPTION_REQUIREMENTS)
TESTINGMODE_PREEMPT = False
TESTINGMODE_PREEMPTION_RANK = 0
TESTINGMODE_PREEMPTION_REQUIREMENTS = False
UWCS_PREEMPT = ( ((Activity == "Suspended") && ($(ActivityTimer) > $(MaxSuspendTime))) || (SUSPEND && (WANT_SUSPEND == False)) )
UWCS_PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize
UWCS_PREEMPTION_REQUIREMENTS = ( $(StateTimer) > (1 * $(HOUR)) && RemoteUserPrio > SubmitterUserPrio * 1.2 ) || (MY.NiceUser == True)

What do the variables HFS_ROUND_ROBIN_RATE and HFS_MAX_ALLOCATION_ROUNDS do?

In this case, when I changed those variables, the result was the same as expected:
b - 50
a.a1 - 30
a.a2 - 20

(see attachment for the negotiator log)
(In reply to comment #10)
> What do variables HFS_ROUND_ROBIN_RATE and HFS_MAX_ALLOCATION_ROUNDS do?

HFS_MAX_ALLOCATION_ROUNDS specifies how many allocation rounds are allowed. Allocation rounds are the iterations that may occur when a group does not get all of the slots it was allocated, due to rejection. If the negotiation loop fails to fill the quota for one or more groups, it may take the unused quota and attempt to allocate it to other groups that may be able to use it. This variable defaults to 1, which is the traditional behavior of a single negotiation attempt. The minimum is 1 and the maximum is INT_MAX.

HFS_ROUND_ROBIN_RATE specifies the increment rate of the negotiation round-robin loop, which can be used to solve the overlapping-effective-pool problem. It defaults to the traditional behavior of attempting to negotiate for everything in one iteration. At its minimum value (1.0) it gives maximum preservation of the allocation ratios, but may require many iterations. It can be increased to a value greater than 1.0 to reduce the number of iterations (and log output), at the possible cost of reduced adherence to the allocated ratios.
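To make the allocation-rounds idea more concrete, here is a rough Python sketch under my own assumed semantics (illustrative only, not the actual negotiator code): quota that a group cannot use, for example because its matches are rejected, is pooled and re-offered, in proportion to quota, to groups that still have demand, up to HFS_MAX_ALLOCATION_ROUNDS times.

# Rough sketch of the idea behind HFS_MAX_ALLOCATION_ROUNDS (illustrative
# only). 'quota' is each group's entitlement, 'demand' its requested slots,
# and 'usable' the slots it can actually match (rejections cap it).
def allocate(total_slots, quota, demand, usable, max_rounds=1):
    allocated = {g: 0 for g in quota}
    leftover = total_slots
    for _ in range(max_rounds):
        # Groups that still want more and can still use more.
        active = [g for g in quota
                  if allocated[g] < min(demand[g], usable[g])]
        if not active or leftover <= 0:
            break
        qsum = sum(quota[g] for g in active)
        used = 0
        for g in active:
            offer = int(leftover * quota[g] / qsum)
            take = min(offer, min(demand[g], usable[g]) - allocated[g])
            allocated[g] += take
            used += take
        if used == 0:
            break
        leftover -= used
    return allocated

quota  = {"x": 60, "y": 40}
demand = {"x": 100, "y": 100}
usable = {"x": 40, "y": 100}   # rejections cap "x" at 40 usable slots
print(allocate(100, quota, demand, usable, max_rounds=1))  # {'x': 40, 'y': 40}
print(allocate(100, quota, demand, usable, max_rounds=3))  # {'x': 40, 'y': 60}

With a single round, the 20 slots of quota that "x" cannot use are simply left idle; with additional rounds they flow to "y", which still has demand.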
(In reply to comment #10)
> In this case, when I changed those variables, the result was the same as expected:
> b - 50
> a.a1 - 30
> a.a2 - 20

Lubos, are you satisfied with the HFS behavior here?

Generally speaking, because the "a" jobs were submitted after the "b" jobs had acquired resources, the final allocations may depend on the preemption settings and/or the current priorities of the submitters.

The parameter HFS_MAX_ALLOCATION_ROUNDS is meant to control the behavior of the fix for this specific bz (644904). I have defaulted it to the traditional behavior because customers may or may not want to enable it, for performance trade-off reasons. So it is "opt-in."
(In reply to comment #12)
I verified all the tests with the new settings. They all pass, so I am quite satisfied with the HFS behaviour. However, I could not find any documentation for those two parameters; Bug 670222 was created to track that.

Tested with: condor-7.4.5-0.6
Tested on:
RHEL5 i386,x86_64 - passed
RHEL4 i386,x86_64 - passed
>>> VERIFIED