Created attachment 441610 [details]
patch to fix constraint problem

The value computed from GROUP_DYNAMIC_MACH_CONSTRAINT in the negotiator is never used. groupArray[0].maxAllowed is assigned before the calculation, but it should be assigned after it.
The easiest way to reproduce is to create one group with a quota, use GROUP_DYNAMIC_MACH_CONSTRAINT to trim the slot count, and observe in the NegotiatorLog that:

a) the slot count is actually reduced, such as:

08/30/10 09:00:09 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 39 to 38

b) the maxAllowed value for groupArray[0] is not set to the reduced value, such as:

08/30/10 09:00:09 negotiationtime: finished sort - slots 38 group auto true quota 1.000000 maxAllowed 39.000000 numsubmits 0 parent -1 child 2 left -1 right -1 i 0

In the above, the maxAllowed value for groupArray[0] should be 38, not 39.
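The ordering problem described above can be sketched as follows. This is only an illustration in Python, not the negotiator's actual C++ code: the Group class, count_matching, setup_buggy and setup_fixed are hypothetical names, and the constraint used is an assumed example that excludes one slot out of 39.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical stand-in for one entry of groupArray."""
    name: str
    max_allowed: float = 0.0


def count_matching(slots, constraint):
    """Count the slots that satisfy the machine constraint."""
    return sum(1 for slot in slots if constraint(slot))


def setup_buggy(slots, constraint):
    """Assign maxAllowed first, then trim: the trimmed value is never used."""
    root = Group("root")
    num_slots = len(slots)
    root.max_allowed = num_slots              # assigned too early
    num_slots = count_matching(slots, constraint)
    return num_slots, root.max_allowed


def setup_fixed(slots, constraint):
    """Trim first, then assign maxAllowed from the trimmed count."""
    root = Group("root")
    num_slots = count_matching(slots, constraint)
    root.max_allowed = num_slots              # assigned after the calculation
    return num_slots, root.max_allowed


# 39 slots, one of which is excluded by the (assumed) constraint.
slots = [{"SlotID": i, "Cpus": 1} for i in range(1, 40)]
constraint = lambda slot: slot["SlotID"] != 4 and slot["Cpus"] > 0

print(setup_buggy(slots, constraint))   # slot count trimmed, maxAllowed not
print(setup_fixed(slots, constraint))   # both reflect the constraint
```

With these inputs the buggy ordering reports 38 slots but leaves maxAllowed at 39, which is the mismatch visible in the log lines above.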
Can you simplify with GROUP_DYNAMIC_MACH_CONSTRAINT = FALSE? Would FAIL/PASS then be determined by whether jobs run or not?
That would likely work, but I tend to look at the logs so I can isolate hfs behavior from matching, preemption, etc issues.
Created attachment 445775 [details]
owner slots patch

Patch changes the order so the number of slots reflects the machine constraints. Patch implements NEG_TRIM_OWNER_STARTDS
Built in 7.4.4-0.11 sans NEG_TRIM_OWNER_STARTDS
GROUP_DYNAMIC_MACH_CONSTRAINT and/or NEG_TRIM_OWNER_STARTDS look like undocumented features. Could you please specify how to reproduce/verify this, and provide some information on how the parameter affects dynamic groups?
NEG_TRIM_OWNER_STARTDS does not exist.

GROUP_DYNAMIC_MACH_CONSTRAINT assists the Negotiator in converting dynamic group quotas (percentages) into absolute numbers.

If you have a pool of 1000 slots and two groups each with 0.5 dynamic quota (50%), the negotiator will assume it can give 500 matches to each group. If in fact 998 of those slots are in Owner state, only 2 slots are available and the split should be 1 each.

GROUP_DYNAMIC_MACH_CONSTRAINT provides a way to trim the absolute number of assumed matchable slots to make the absolute quotas more realistic.

It should probably default to:

State != "Owner" && Cpus > 0
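The arithmetic in the 1000-slot example can be worked through with a small Python sketch. The function absolute_quotas and the dictionary-based slot records are hypothetical and only model the idea; the real negotiator evaluates the constraint as a ClassAd expression against machine ads.

```python
def absolute_quotas(slots, dynamic_quotas, constraint=None):
    """Convert fractional (dynamic) quotas into absolute slot counts.

    If a constraint is given, only slots that satisfy it are counted,
    which models what GROUP_DYNAMIC_MACH_CONSTRAINT is meant to do.
    """
    usable = [s for s in slots if constraint is None or constraint(s)]
    return {group: frac * len(usable) for group, frac in dynamic_quotas.items()}


# Pool from the example: 1000 slots, 998 of them in Owner state.
pool = [{"State": "Owner", "Cpus": 1}] * 998 + [{"State": "Unclaimed", "Cpus": 1}] * 2
quotas = {"a": 0.5, "b": 0.5}

# Python equivalent of the suggested default: State != "Owner" && Cpus > 0
not_owner = lambda s: s["State"] != "Owner" and s["Cpus"] > 0

untrimmed = absolute_quotas(pool, quotas)
trimmed = absolute_quotas(pool, quotas, not_owner)

print(untrimmed)  # each group is assumed to get 500 matches
print(trimmed)    # each group gets 1 of the 2 usable slots
```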
Reproduced on RHEL5/i686 with:

# condor -v
$CondorVersion: 7.4.4 Aug 9 2010 BuildID: RH-7.4.4-0.9.el5 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL5 $

# tail -f /var/log/condor/NegotiatorLog | grep -i -E "MACH|sort"
09/30/10 06:39:04 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 06:39:04 negotiationtime:sorting
09/30/10 06:39:04 Sort : sorting group vector
09/30/10 06:39:04 Sorting : grouparray group b parent -1 child -1 left -1 right -1 i 0
09/30/10 06:39:04 Sort : stage two
09/30/10 06:39:04 midsort : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 06:39:04 midsort : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 06:39:04 midsort : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 06:39:04 Sorted : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 06:39:04 Sorted : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 06:39:04 Sorted : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 06:39:04 Sort : leaving
09/30/10 06:39:04 negotiationtime: finished sort - slots 9 group auto true quota 1.000000 maxAllowed 10.000000 numsubmits 0 parent -1 child 2 left -1 right -1 i 0
09/30/10 06:39:04 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left -1 right 2 i 1
09/30/10 06:39:04 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left 1 right -1 i 2
Retested on current condor-7.4.4-0.16 on all supported platforms - RHEL4, RHEL5 / x86, x86_64.

Configuration:

NUM_CPUS = 10
GROUP_DYNAMIC_MACH_CONSTRAINT = ((SlotID != 4) && (Cpus > 0))
GROUP_NAMES = a,b
GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.5
GROUP_AUTOREGROUP_a = TRUE
GROUP_AUTOREGROUP_b = TRUE
ALL_DEBUG = D_FULLDEBUG

==========================================================================
$CondorVersion: 7.4.4 Sep 27 2010 BuildID: RH-7.4.4-0.16.el5 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL5 $

# tail -f /var/log/condor/NegotiatorLog | grep -i -E "MACH|sort"
09/30/10 08:27:34 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 08:27:34 negotiationtime:sorting
09/30/10 08:27:34 Sort : sorting group vector
09/30/10 08:27:34 Sorting : grouparray group b parent -1 child -1 left -1 right -1 i 0
09/30/10 08:27:34 Sort : stage two
09/30/10 08:27:34 midsort : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 08:27:34 midsort : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 08:27:34 midsort : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 08:27:34 Sorted : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 08:27:34 Sorted : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 08:27:34 Sorted : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 08:27:34 Sort : leaving
09/30/10 08:27:34 negotiationtime: finished sort - slots 9 group auto true quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2 left -1 right -1 i 0
09/30/10 08:27:34 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left -1 right 2 i 1
09/30/10 08:27:34 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left 1 right -1 i 2

==========================================================================
$CondorVersion: 7.4.4 Sep 27 2010 BuildID: RH-7.4.4-0.16.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

# tail -f /var/log/condor/NegotiatorLog | grep -i -E "MACH|sort"
09/30/10 09:07:39 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 09:07:39 negotiationtime:sorting
09/30/10 09:07:39 Sort : sorting group vector
09/30/10 09:07:39 Sorting : grouparray group b parent -1 child -1 left -1 right -1 i 0
09/30/10 09:07:39 Sort : stage two
09/30/10 09:07:39 midsort : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 09:07:39 midsort : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 09:07:39 midsort : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 09:07:39 Sorted : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 09:07:39 Sorted : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 09:07:39 Sorted : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 09:07:39 Sort : leaving
09/30/10 09:07:39 negotiationtime: finished sort - slots 9 group auto true quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2 left -1 right -1 i 0
09/30/10 09:07:39 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left -1 right 2 i 1
09/30/10 09:07:39 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left 1 right -1 i 2

==========================================================================
$CondorVersion: 7.4.4 Sep 27 2010 BuildID: RH-7.4.4-0.16.el4 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL4 $

# tail -f /var/log/condor/NegotiatorLog | grep -i -E "MACH|sort"
09/30/10 09:12:27 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 09:12:27 negotiationtime:sorting
09/30/10 09:12:27 Sort : sorting group vector
09/30/10 09:12:27 Sorting : grouparray group b parent -1 child -1 left -1 right -1 i 0
09/30/10 09:12:27 Sort : stage two
09/30/10 09:12:27 midsort : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 09:12:27 midsort : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 09:12:27 midsort : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 09:12:27 Sorted : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 09:12:27 Sorted : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 09:12:27 Sorted : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 09:12:27 Sort : leaving
09/30/10 09:12:27 negotiationtime: finished sort - slots 9 group auto true quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2 left -1 right -1 i 0
09/30/10 09:12:27 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left -1 right 2 i 1
09/30/10 09:12:27 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left 1 right -1 i 2

==========================================================================
$CondorVersion: 7.4.4 Sep 27 2010 BuildID: RH-7.4.4-0.16.el4 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL4 $

# tail -f /var/log/condor/NegotiatorLog | grep -i -E "MACH|sort"
09/30/10 09:12:58 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 09:12:58 negotiationtime:sorting
09/30/10 09:12:58 Sort : sorting group vector
09/30/10 09:12:58 Sorting : grouparray group b parent -1 child -1 left -1 right -1 i 0
09/30/10 09:12:58 Sort : stage two
09/30/10 09:12:58 midsort : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 09:12:58 midsort : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 09:12:58 midsort : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 09:12:58 Sorted : grouparray group parent -1 child 2 left -1 right -1 i 0
09/30/10 09:12:58 Sorted : grouparray group a parent 0 child -1 left -1 right 2 i 1
09/30/10 09:12:58 Sorted : grouparray group b parent 0 child -1 left 1 right -1 i 2
09/30/10 09:12:58 Sort : leaving
09/30/10 09:12:58 negotiationtime: finished sort - slots 9 group auto true quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2 left -1 right -1 i 0
09/30/10 09:12:58 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left -1 right 2 i 1
09/30/10 09:12:58 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1 left 1 right -1 i 2

This seems to be fixed; it can be fully verified once documentation is available.
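The check done by eye on the logs above could be scripted roughly as follows. This is a hedged sketch: the regular expression is written against the "finished sort" lines exactly as they appear in this report (real NegotiatorLog spacing may differ), the root group is assumed to be the entry with parent -1, and root_max_allowed_matches is a hypothetical helper name.

```python
import re

# Matches the "finished sort" lines as they appear in this report.
FINISHED_SORT = re.compile(
    r"finished sort - slots (\d+) .*? maxAllowed ([\d.]+) .*? parent (-?\d+)"
)


def root_max_allowed_matches(line):
    """Return True/False for the root group's line, None for any other line.

    True means maxAllowed equals the trimmed slot count (fixed behaviour);
    False means maxAllowed still holds the untrimmed count (the bug).
    """
    match = FINISHED_SORT.search(line)
    if match is None:
        return None
    slots, max_allowed, parent = match.groups()
    if int(parent) != -1:        # assumed: only the root group has parent -1
        return None
    return float(max_allowed) == int(slots)


before = ("09/30/10 06:39:04 negotiationtime: finished sort - slots 9 group auto true "
          "quota 1.000000 maxAllowed 10.000000 numsubmits 0 parent -1 child 2 "
          "left -1 right -1 i 0")
after = ("09/30/10 08:27:34 negotiationtime: finished sort - slots 9 group auto true "
         "quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2 "
         "left -1 right -1 i 0")

print(root_max_allowed_matches(before))  # line from the 0.9 build
print(root_max_allowed_matches(after))   # line from the 0.16 build
```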
Patch included in current version of packages. New BZ639358 for documentation was created. >>> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, the value from the GROUP_DYNAMIC_MACH_CONSTRAINT calculation in the negotiator was not used, because the assignment for groupArray[0].maxAllowed was prior to the calculation. With this update, this calculation now successfully provides a way to trim the absolute number of assumed matchable slots.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html