Bug 785283
| Summary: | RFE: expose accounting group negotiation-ordering to configuration | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Erik Erlandson <eerlands> |
| Component: | condor | Assignee: | Erik Erlandson <eerlands> |
| Status: | CLOSED ERRATA | QA Contact: | Lubos Trilety <ltrilety> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | ||
| Version: | 2.2 | CC: | ltrilety, matt, mkudlej, rrati, tstclair |
| Target Milestone: | 2.3 | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | condor-7.8.2-0.1 | Doc Type: | Enhancement |
| Doc Text: |
Feature: The ordering of negotiation for accounting groups was exposed to configuration.
Reason: The previous hard-coded ordering of accounting group negotiation was preventing customers from obtaining the job matchmaking behavior they required.
Result (if any):
Configurable group negotiation order enables two specific use cases:
1) Starvation ordering: allowing groups with unfilled quota to negotiate first
2) Enforcing that the root "<none>" group negotiate last, to properly support the autoregroup feature.
Additionally, this feature allows a Grid administrator to define custom ordering policies to meet customer-specific needs.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2013-03-06 18:41:17 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 884629 | ||
| Bug Blocks: | 785297 | ||
|
Description
Erik Erlandson
2012-01-27 21:15:33 UTC
TESTING:

Using the following configuration:

NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15
CLAIM_WORKLIFE = 0
NUM_CPUS = 10
# turn off round robin and multiple allocation rounds
HFS_ROUND_ROBIN_RATE = 100000000
HFS_MAX_ALLOCATION_ROUNDS = 1
GROUP_NAMES = a, b
GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5
GROUP_AUTOREGROUP = TRUE
# sorts "b" before "a":
GROUP_SORT_EXPR = ifThenElse(Name=?="a", 2, ifThenElse(Name=?="b", 1, 3.40282e+38))
# this should trigger warnings and default to FLT_MAX:
#GROUP_SORT_EXPR = 2+"a"

Bring up the negotiator with the above configuration, and then observe the ordering behavior for group negotiation:

$ tail -f NegotiatorLog | grep -e WARNING -e sortkey
12/07/11 14:44:43 Group b - sortkey= 1
12/07/11 14:44:43 Group a - sortkey= 2
12/07/11 14:44:43 Group <none> - sortkey= 3.40282e+38

Next, enable GROUP_SORT_EXPR = 2+"a", which defaults with warnings:

$ tail -f NegotiatorLog | grep -e WARNING -e sortkey
12/07/11 14:39:41 WARNING: sort expression "2+"a"" failed to evaluate to floating point for group <none> - defaulting to 3.40282e+38
12/07/11 14:39:41 WARNING: sort expression "2+"a"" failed to evaluate to floating point for group a - defaulting to 3.40282e+38
12/07/11 14:39:41 WARNING: sort expression "2+"a"" failed to evaluate to floating point for group b - defaulting to 3.40282e+38
12/07/11 14:39:41 Group <none> - sortkey= 3.40282e+38
12/07/11 14:39:41 Group a - sortkey= 3.40282e+38
12/07/11 14:39:41 Group b - sortkey= 3.40282e+38

Disable the settings for GROUP_SORT_EXPR, and let it take on its default "starvation-order" expr.
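For readers who want to reason about the expected log output without a running pool, here is a rough Python sketch of the behavior the two log excerpts above suggest. It is not the negotiator's implementation: the function names are hypothetical, and the ascending sort with stable tie-breaking is an assumption inferred only from the order of the log lines.

```python
FLT_MAX = 3.40282e+38  # the value printed in the NegotiatorLog excerpts

def custom_sort_key(name):
    """Python mirror of the test GROUP_SORT_EXPR: "b" -> 1, "a" -> 2, else FLT_MAX."""
    if name == "a":
        return 2.0
    if name == "b":
        return 1.0
    return FLT_MAX

def broken_sort_key(name):
    """Stand-in for GROUP_SORT_EXPR = 2+"a", which has no numeric value."""
    return 2 + "a"  # raises TypeError in Python

def negotiation_order(groups, key_fn):
    """Return (sortkey, group) pairs in the order they would negotiate.

    Assumptions (inferred from the logs, not from condor source):
    smaller keys go first, ties keep their input order, and a key that
    cannot be evaluated as a float falls back to FLT_MAX.
    """
    keyed = []
    for group in groups:
        try:
            key = float(key_fn(group))
        except (TypeError, ValueError):
            key = FLT_MAX  # corresponds to the WARNING lines in the log
        keyed.append((key, group))
    keyed.sort(key=lambda pair: pair[0])  # list.sort is stable
    return keyed

groups = ["<none>", "a", "b"]
order_custom = negotiation_order(groups, custom_sort_key)
order_broken = negotiation_order(groups, broken_sort_key)

for key, group in order_custom:
    print(f"Group {group} - sortkey= {key:g}")
```

With the custom expression the sketch orders the groups b, a, <none>, and with the broken expression every group falls back to the same key and keeps its original order, which is the pattern seen in both excerpts.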
Submit the following file:

cmd = /bin/sleep
args = 60
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.user"
queue 5
+AccountingGroup="b.user"
queue 10

You should see something similar to the following sequence, where prior to submission, all values are FLT_MAX, then "a" and "b" have zero starvation-ratio, as their allocation is nonzero but no jobs are yet running. Then their starvation ratios will become 1, as all allocation is filled. Then "a" will empty as its jobs complete first, then finally all ratios will return to FLT_MAX:

$ tail -f NegotiatorLog | grep -e WARNING -e sortkey
12/07/11 15:33:31 Group <none> - sortkey= 3.40282e+38
12/07/11 15:33:31 Group a - sortkey= 3.40282e+38
12/07/11 15:33:31 Group b - sortkey= 3.40282e+38
12/07/11 15:33:51 Group a - sortkey= 0
12/07/11 15:33:52 Group b - sortkey= 0
12/07/11 15:33:52 Group <none> - sortkey= 3.40282e+38
12/07/11 15:34:23 Group a - sortkey= 1
12/07/11 15:34:23 Group b - sortkey= 1
12/07/11 15:34:23 Group <none> - sortkey= 3.40282e+38
12/07/11 15:34:53 Group a - sortkey= 1
12/07/11 15:34:53 Group b - sortkey= 1
12/07/11 15:34:53 Group <none> - sortkey= 3.40282e+38
12/07/11 15:35:23 Group b - sortkey= 0
12/07/11 15:35:24 Group <none> - sortkey= 3.40282e+38
12/07/11 15:35:24 Group a - sortkey= 3.40282e+38
12/07/11 15:35:54 Group b - sortkey= 1
12/07/11 15:35:54 Group <none> - sortkey= 3.40282e+38
12/07/11 15:35:54 Group a - sortkey= 3.40282e+38
12/07/11 15:36:24 Group b - sortkey= 1
12/07/11 15:36:24 Group <none> - sortkey= 3.40282e+38
12/07/11 15:36:24 Group a - sortkey= 3.40282e+38
12/07/11 15:36:55 Group <none> - sortkey= 3.40282e+38
12/07/11 15:36:55 Group a - sortkey= 3.40282e+38
12/07/11 15:36:55 Group b - sortkey= 3.40282e+38

In the test above, you should use AccountingGroup instead of Name for the attribute here:

# sorts "b" before "a":
GROUP_SORT_EXPR = ifThenElse(AccountingGroup=?="a", 2, ifThenElse(AccountingGroup=?="b", 1, 3.40282e+38))

I tried the given scenario on condor-7.6.7-0.7. In starvation mode, even group '<none>' was set to zero after the jobs were submitted, and it stayed at zero for the rest of the test. According to the provided information, only groups 'a' and 'b' should be zeroed for a short period before all allocation is filled.

(In reply to comment #4)
> I tried the given scenario on condor-7.6.7-0.7. In starvation mode, even group
> '<none>' was set to zero after the jobs were submitted, and it stayed at zero
> for the rest of the test. According to the provided information, only groups
> 'a' and 'b' should be zeroed for a short period before all allocation is filled.

Sorry, the problem is that the semantics of GROUP_AUTOREGROUP were altered on gt2679. Tweak the test config above to turn off autoregroup and surplus:

GROUP_AUTOREGROUP = FALSE
GROUP_ACCEPT_SURPLUS = FALSE

That restored the expected testing behavior.

(In reply to comment #1)
> You should see something similar to the following sequence, where prior to
> submission, all values are FLT_MAX, then "a" and "b" have zero
> starvation-ratio, as their allocation is nonzero but no jobs are yet
> running. Then their starvation ratios will become 1, as all allocation is
> filled. Then "a" will empty as its jobs complete first, then finally all
> ratios will return to FLT_MAX:

Since this ticket was originally created, the default for GROUP_SORT_EXPR has been changed to:

ifThenElse(AccountingGroup=?="<none>",3.4e+38,ifThenElse(GroupQuota>0,GroupResourcesInUse/GroupQuota,3.3e+38))

The impact of this is that the 'ground state' for group sort keys is now zero instead of FLT_MAX, due to the change in denominator from GroupResourcesAllocated (which can be zero) to GroupQuota (which is not zero).

Since the change occurred prior to our releasing this feature, there will be no errata required from the change.
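To make the 'ground state' change concrete, the revised default expression quoted above can be transcribed into Python roughly as follows. This is only an illustrative sketch: the function name and the sample quota and usage numbers are made up, and it models the expression itself, not how the negotiator computes GroupQuota or GroupResourcesInUse.

```python
def default_sort_key(accounting_group, group_quota, group_resources_in_use):
    """Direct transcription of the revised default GROUP_SORT_EXPR.

    ifThenElse(AccountingGroup=?="<none>", 3.4e+38,
               ifThenElse(GroupQuota>0, GroupResourcesInUse/GroupQuota, 3.3e+38))
    """
    if accounting_group == "<none>":
        return 3.4e+38  # root group always sorts last
    if group_quota > 0:
        return group_resources_in_use / group_quota  # starvation ratio
    return 3.3e+38  # zero-quota groups sort just ahead of "<none>"

# Hypothetical sample states for a group with a quota of 5 slots.
idle_key = default_sort_key("a", 5, 0)       # nothing running: ground state
full_key = default_sort_key("a", 5, 5)       # quota completely filled
none_key = default_sort_key("<none>", 0, 0)  # root group
zero_quota_key = default_sort_key("b", 0, 0) # quota not (yet) known

print(idle_key, full_key, zero_quota_key, none_key)
```

An idle group with a positive quota evaluates to 0 under this expression, which is why the sort keys no longer rest at FLT_MAX between jobs, and a group whose quota is zero lands at 3.3e+38, just ahead of the root group.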
For further explanation, see also:
https://bugzilla.redhat.com/show_bug.cgi?id=884629#c2

Note, GroupQuota *can* be zero if the negotiator cycles prior to any slot ads appearing in the collector, so FLT_MAX may appear briefly before startds check in with collector.

Tested on RHEL 5.9/6.4 x i386/x86_64 with condor-7.8.8-0.4.1 and it works.

-->VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html |