Bug 785283 - RFE: expose accounting group negotiation-ordering to configuration
Summary: RFE: expose accounting group negotiation-ordering to configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.2
Hardware: All
OS: Linux
medium
low
Target Milestone: 2.3
: ---
Assignee: Erik Erlandson
QA Contact: Lubos Trilety
URL:
Whiteboard:
Depends On: 884629
Blocks: 785297
Reported: 2012-01-27 21:15 UTC by Erik Erlandson
Modified: 2013-03-06 18:41 UTC
5 users

Fixed In Version: condor-7.8.2-0.1
Doc Type: Enhancement
Doc Text:
Feature: The ordering of negotiation for accounting groups is now exposed to configuration. Reason: The previous hard-coded ordering of accounting group negotiation prevented customers from obtaining the job matchmaking behavior they required. Result (if any): Configurable group negotiation order enables two specific use cases: 1) starvation ordering, which allows groups with unfilled quota to negotiate first; and 2) forcing the root "<none>" group to negotiate last, to properly support the autoregroup feature. Additionally, this feature allows a Grid administrator to define custom ordering policies to meet customer-specific needs.
Clone Of:
Environment:
Last Closed: 2013-03-06 18:41:17 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Condor | 2678 | 0 | None | None | None | Never
Red Hat Bugzilla | 785297 | 0 | low | CLOSED | Document the GROUP_SORT_EXPR negotiator config macro | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHSA-2013:0564 | 0 | normal | SHIPPED_LIVE | Low: Red Hat Enterprise MRG Grid 2.3 security update | 2013-03-06 23:37:09 UTC

Internal Links: 785297

Description Erik Erlandson 2012-01-27 21:15:33 UTC
Motivation is twofold:

1) Customers have been requesting a return to the legacy behavior of the "<none>" group, in which it negotiates last (instead of its current behavior of being sorted among the other groups by starvation)

2) There have been some other requests that indicate a desire to explicitly control the negotiation order of accounting groups in customer-dependent ways

The proposed solution to both of these cases is to expose an accounting group ordering expression to configuration. The expression would evaluate to a floating-point number for each accounting group, and groups would negotiate in ascending order of that value.
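A minimal Python sketch of the proposed mechanism, assuming the negotiator computes one floating-point key per group and negotiates groups in ascending key order (the order the "sortkey" log lines in Comment 1 show). The names `negotiation_order`, `none_last` and the dict-based group ad are hypothetical illustrations, not the negotiator's actual code. The example policy is use case 1 above: send "<none>" last.

```python
from typing import Callable, Dict, List

# Single-precision FLT_MAX, which the negotiator logs as 3.40282e+38.
FLT_MAX = 3.4028234663852886e+38

# Hypothetical stand-in for a group's ClassAd: attribute name -> value.
GroupAd = Dict[str, object]


def negotiation_order(groups: List[GroupAd],
                      sort_expr: Callable[[GroupAd], float]) -> List[str]:
    """Return group names in negotiation order: lowest sort key first."""
    ranked = sorted(groups, key=sort_expr)  # Python's sort is stable: ties keep input order
    return [str(g["AccountingGroup"]) for g in ranked]


def none_last(group: GroupAd) -> float:
    """Example policy: all named groups tie at 0, "<none>" goes last."""
    return FLT_MAX if group["AccountingGroup"] == "<none>" else 0.0


groups = [{"AccountingGroup": name} for name in ("<none>", "a", "b")]
print(negotiation_order(groups, none_last))  # ['a', 'b', '<none>']
```

Passing the key as a callable mirrors the idea of the RFE: the ordering policy is data supplied by the administrator rather than logic hard-coded in the sort.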

Comment 1 Erik Erlandson 2012-01-27 21:19:23 UTC
TESTING:

Using the following configuration:

NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
NEGOTIATOR_INTERVAL = 30

SCHEDD_INTERVAL	= 15

CLAIM_WORKLIFE = 0

NUM_CPUS = 10

# turn off round robin and multiple allocation rounds
HFS_ROUND_ROBIN_RATE = 100000000
HFS_MAX_ALLOCATION_ROUNDS = 1

GROUP_NAMES = a, b

GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5

GROUP_AUTOREGROUP = TRUE

# sorts "b" before "a":
GROUP_SORT_EXPR = ifThenElse(Name=?="a", 2, ifThenElse(Name=?="b", 1, 3.40282e+38))

# this should trigger warnings and default to FLT_MAX:
#GROUP_SORT_EXPR = 2+"a"

Bring up the negotiator with the above configuration, and then observe the ordering behavior for group negotiation:

$ tail -f NegotiatorLog | grep -e WARNING -e sortkey
12/07/11 14:44:43 Group b - sortkey= 1
12/07/11 14:44:43 Group a - sortkey= 2
12/07/11 14:44:43 Group <none> - sortkey= 3.40282e+38
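The ordering in this log can be reproduced with a small Python emulation of the configured expression. It keys on AccountingGroup, per the correction in Comment 2, and approximates the ClassAd `=?=` operator as a case-sensitive comparison in which a missing attribute compares false; that approximation and the function names are assumptions of this sketch.

```python
from typing import Optional

FLT_MAX_LOGGED = 3.40282e+38  # the literal used in the config above


def meta_equals(lhs: Optional[str], rhs: str) -> bool:
    """Rough analogue of ClassAd =?=: case-sensitive, and a missing
    (None) attribute yields False rather than UNDEFINED."""
    return lhs is not None and lhs == rhs


def example_sort_expr(accounting_group: Optional[str]) -> float:
    # ifThenElse(AccountingGroup=?="a", 2,
    #            ifThenElse(AccountingGroup=?="b", 1, 3.40282e+38))
    if meta_equals(accounting_group, "a"):
        return 2.0
    if meta_equals(accounting_group, "b"):
        return 1.0
    return FLT_MAX_LOGGED


order = sorted(["<none>", "a", "b"], key=example_sort_expr)
for group in order:
    # %g-style formatting matches the negotiator's log lines.
    print(f"Group {group} - sortkey= {example_sort_expr(group):g}")
```

This prints the same three lines as the log: "b" (1), then "a" (2), then "<none>" (3.40282e+38).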



Next, enable GROUP_SORT_EXPR = 2+"a", which fails to evaluate to a number, so the negotiator logs warnings and falls back to FLT_MAX:

$ tail -f NegotiatorLog | grep -e WARNING -e sortkey
12/07/11 14:39:41 WARNING: sort expression "2+"a"" failed to evaluate to floating point for group <none> - defaulting to 3.40282e+38
12/07/11 14:39:41 WARNING: sort expression "2+"a"" failed to evaluate to floating point for group a - defaulting to 3.40282e+38
12/07/11 14:39:41 WARNING: sort expression "2+"a"" failed to evaluate to floating point for group b - defaulting to 3.40282e+38
12/07/11 14:39:41 Group <none> - sortkey= 3.40282e+38
12/07/11 14:39:41 Group a - sortkey= 3.40282e+38
12/07/11 14:39:41 Group b - sortkey= 3.40282e+38
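The fallback behavior shown above can be sketched in Python as follows. Python's `2 + "a"` raising TypeError stands in for the ClassAd expression evaluating to a non-number; the helper name `sort_key_or_default` is hypothetical, and the warning text only imitates the log format above.

```python
import logging
from typing import Callable

FLT_MAX = 3.4028234663852886e+38  # logged as 3.40282e+38

logging.basicConfig(format="%(levelname)s: %(message)s")
log = logging.getLogger("negotiator-sketch")


def sort_key_or_default(expr_text: str, group: str,
                        evaluate: Callable[[], object]) -> float:
    """Evaluate a sort expression; on error or a non-numeric result,
    warn and fall back to FLT_MAX so the group sorts last."""
    try:
        value = evaluate()
        if not isinstance(value, (int, float)):
            raise TypeError(f"non-numeric result {value!r}")
        return float(value)
    except (TypeError, ArithmeticError):
        log.warning('sort expression "%s" failed to evaluate to floating '
                    'point for group %s - defaulting to %g',
                    expr_text, group, FLT_MAX)
        return FLT_MAX


for group in ("<none>", "a", "b"):
    key = sort_key_or_default('2+"a"', group, lambda: 2 + "a")
    print(f"Group {group} - sortkey= {key:g}")
```

Defaulting to the largest key means a broken expression degrades to "everyone ties at the end" instead of stopping negotiation.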



Disable both GROUP_SORT_EXPR settings so that it takes on its default "starvation-order" expression, then submit the following file:

cmd = /bin/sleep
args = 60
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.user"
queue 5
+AccountingGroup="b.user"
queue 10

You should see something similar to the following sequence. Prior to submission, all values are FLT_MAX. Then "a" and "b" have a starvation ratio of zero, because their allocation is nonzero but none of their jobs are running yet. Their starvation ratios then become 1 once all allocation is filled. Next, "a" returns to FLT_MAX as its jobs complete first, while "b" runs its remaining queued jobs; finally, all ratios return to FLT_MAX:

$ tail -f NegotiatorLog | grep -e WARNING -e sortkey
12/07/11 15:33:31 Group <none> - sortkey= 3.40282e+38
12/07/11 15:33:31 Group a - sortkey= 3.40282e+38
12/07/11 15:33:31 Group b - sortkey= 3.40282e+38
12/07/11 15:33:51 Group a - sortkey= 0
12/07/11 15:33:52 Group b - sortkey= 0
12/07/11 15:33:52 Group <none> - sortkey= 3.40282e+38
12/07/11 15:34:23 Group a - sortkey= 1
12/07/11 15:34:23 Group b - sortkey= 1
12/07/11 15:34:23 Group <none> - sortkey= 3.40282e+38
12/07/11 15:34:53 Group a - sortkey= 1
12/07/11 15:34:53 Group b - sortkey= 1
12/07/11 15:34:53 Group <none> - sortkey= 3.40282e+38
12/07/11 15:35:23 Group b - sortkey= 0
12/07/11 15:35:24 Group <none> - sortkey= 3.40282e+38
12/07/11 15:35:24 Group a - sortkey= 3.40282e+38
12/07/11 15:35:54 Group b - sortkey= 1
12/07/11 15:35:54 Group <none> - sortkey= 3.40282e+38
12/07/11 15:35:54 Group a - sortkey= 3.40282e+38
12/07/11 15:36:24 Group b - sortkey= 1
12/07/11 15:36:24 Group <none> - sortkey= 3.40282e+38
12/07/11 15:36:24 Group a - sortkey= 3.40282e+38
12/07/11 15:36:55 Group <none> - sortkey= 3.40282e+38
12/07/11 15:36:55 Group a - sortkey= 3.40282e+38
12/07/11 15:36:55 Group b - sortkey= 3.40282e+38
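The phases in this sequence can be modeled with a small Python function. This is an assumption-laden sketch: based on Comment 8 below, it takes the original default key to be GroupResourcesInUse / GroupResourcesAllocated, with FLT_MAX substituted when nothing is allocated. The exact original expression is not quoted in this ticket, and the in-use/allocated numbers are illustrative values chosen to match a quota of 5.

```python
FLT_MAX = 3.4028234663852886e+38  # logged as 3.40282e+38


def starvation_key(in_use: float, allocated: float) -> float:
    """Assumed original starvation ratio: in-use / allocated,
    or FLT_MAX when the group has no allocation at all."""
    if allocated <= 0:
        return FLT_MAX
    return in_use / allocated


# (phase, resources in use, resources allocated) for one group,
# following the phases described for groups "a" and "b" above.
phases = [
    ("before submission", 0, 0),
    ("allocated, nothing running yet", 0, 5),
    ("allocation filled", 5, 5),
    ("all jobs finished", 0, 0),
]
for label, in_use, allocated in phases:
    print(f"{label:32} sortkey= {starvation_key(in_use, allocated):g}")
```

The printed keys go FLT_MAX, 0, 1, FLT_MAX, which is the pattern each group follows in the log. The zero-allocation case is why idle groups sit at FLT_MAX here.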

Comment 2 Erik Erlandson 2012-01-27 21:25:43 UTC
In the test above, you should use AccountingGroup instead of Name for the attribute here:
# sorts "b" before "a":
GROUP_SORT_EXPR = ifThenElse(AccountingGroup=?="a", 2, ifThenElse(AccountingGroup=?="b", 1, 3.40282e+38))

Comment 4 Lubos Trilety 2012-03-29 11:50:24 UTC
I tried the given scenario on condor-7.6.7-0.7. In starvation mode, even group '<none>' was set to zero after the jobs were submitted, and it stayed at zero for the rest of the test. According to the provided information, only groups 'a' and 'b' should be zeroed, and only for a short period before all allocation is filled.

Comment 5 Erik Erlandson 2012-04-05 22:15:44 UTC
(In reply to comment #4)
> I tried given scenario on condor-7.6.7-0.7, in the starvation mode even group
> '<none>' was set to zero after submit of the jobs. It stays to be zero for the
> rest of test. According to the provided information only groups 'a' and 'b'
> should be zeroed for small period of time before all allocation is filled.

Sorry, the problem is that the semantics of GROUP_AUTOREGROUP were altered in gt2679.

Tweak the test config above to turn off autoregroup and surplus:
GROUP_AUTOREGROUP = FALSE
GROUP_ACCEPT_SURPLUS = FALSE

That restored the expected testing behavior.

Comment 8 Erik Erlandson 2013-01-09 15:49:31 UTC
(In reply to comment #1)

> You should see something similar to the following sequence, where prior to
> submission, all values are FLT_MAX, then "a" and "b" have zero
> starvation-ratio, as their allocation is nonzero but no jobs are yet
> running. Then their starvation ratios will become 1, as all allocation is
> filled. Then "a" will empty as its jobs complete first, then finally all
> ratios will return to FLT_MAX:


Since this ticket was originally created, the default for GROUP_SORT_EXPR has been changed to:

     ifThenElse(AccountingGroup=?="<none>",3.4e+38,ifThenElse(GroupQuota>0,GroupResourcesInUse/GroupQuota,3.3e+38))

The impact of this is that the 'ground state' for group sort keys (a group with no running jobs) is now zero instead of FLT_MAX, because the denominator changed from GroupResourcesAllocated (which can be zero) to GroupQuota (which is normally nonzero).
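A direct Python transcription of the new default expression makes the ground state and the "<none>"-last rule easy to check. The function name and the (in-use, quota) snapshot values are hypothetical; only the expression itself comes from this comment. The AccountingGroup test is simplified to plain string equality.

```python
def default_sort_key(accounting_group: str, in_use: float, quota: float) -> float:
    # ifThenElse(AccountingGroup=?="<none>", 3.4e+38,
    #   ifThenElse(GroupQuota>0, GroupResourcesInUse/GroupQuota, 3.3e+38))
    if accounting_group == "<none>":
        return 3.4e+38         # root group always negotiates last
    if quota > 0:
        return in_use / quota  # idle group with quota: ground state 0
    return 3.3e+38             # zero quota: after named groups, before "<none>"


# Hypothetical snapshot: group -> (resources in use, quota)
groups = {"<none>": (0, 0), "a": (0, 5), "b": (5, 5)}
order = sorted(groups, key=lambda g: default_sort_key(g, *groups[g]))
print(order)                         # ['a', 'b', '<none>']
print(default_sort_key("a", 0, 0))   # 3.3e+38
```

Using 3.3e+38 rather than 3.4e+38 for the zero-quota case keeps such groups ahead of "<none>" even before startds have reported.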

Since the change occurred before we released this feature, no erratum is required for it.

For further explanation, see also:
https://bugzilla.redhat.com/show_bug.cgi?id=884629#c2

Note that GroupQuota *can* be zero if the negotiator cycles before any slot ads have appeared in the collector, so a near-FLT_MAX key (3.3e+38 under the default expression) may appear briefly before the startds check in with the collector.

Comment 10 Martin Kudlej 2013-02-06 08:59:10 UTC
Tested on RHEL 5.9/6.4 x i386/x86_64 with condor-7.8.8-0.4.1 and it works. -->VERIFIED

Comment 12 errata-xmlrpc 2013-03-06 18:41:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html

