Bug 707081

Summary:	groups are not sorted in starvation order
Product:	Red Hat Enterprise MRG	Reporter:	Erik Erlandson <eerlands>
Component:	condor	Assignee:	Erik Erlandson <eerlands>
Status:	CLOSED ERRATA	QA Contact:	Tomas Rusnak <trusnak>
Severity:	medium	Docs Contact:
Priority:	high
Version:	2.0	CC:	claudiol, jneedle, jthomas, matt, mkudlej, trusnak, tstclair
Target Milestone:	2.0.1
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	condor-7.6.2-0.1	Doc Type:	Bug Fix
Doc Text:	Cause: Ordering of accounting groups by "starvation" (usage/allocated) was left out of new Hierarchical Accounting Groups feature. Consequence: Accounting groups that fall later in the list could be starved by groups before them, due to arbitrary ordering. Fix: Sorting of accounting groups by starvation ratio (usage/allocated) prior to negotiation was restored. Result: Accounting groups no longer starved due to arbitrary ordering.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-09-07 16:41:20 UTC	Type:	---
Bug Depends On:
Bug Blocks:	723887

Description Erik Erlandson 2011-05-23 23:04:27 UTC

Description of problem:
It used to be the case that groups were sorted by the fraction of their quota that they were using. Those starving the most were considered first. This behavior was inadvertently dropped in the introduction of hierarchical groups.

As a result, groups that happen to fall near the end of the negotiation order can be starved.
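
The intended ordering can be sketched as follows. This is a minimal Python illustration, not the negotiator's actual code: the `Group` class, its field names, and the helper functions are hypothetical. Treating a zero quota as the largest float is an assumption, chosen so that such groups sort last.

```python
import sys
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    usage: float  # slots currently used by the group
    quota: float  # slots allocated to the group


def starvation_ratio(group: Group) -> float:
    """Fraction of the quota in use; a lower value means more starved."""
    if group.quota <= 0:
        # Assumption: a group without any quota is considered last.
        return sys.float_info.max
    return group.usage / group.quota


def negotiation_order(groups):
    """Return groups most-starved first; sorted() is stable, so ties keep input order."""
    return sorted(groups, key=starvation_ratio)


groups = [Group("a", 5, 5), Group("b", 0, 5), Group("<none>", 0, 0)]
print([g.name for g in negotiation_order(groups)])  # ['b', 'a', '<none>']
```

Group "b" is using none of its quota, so it is placed ahead of group "a", which is using all of it.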

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. To-do: construct good repro example -- eje
  
Actual results:
groups can starve long term

Expected results:
most-starved groups get served first on next cycle

Additional info:

Comment 1 Erik Erlandson 2011-05-23 23:04:42 UTC

upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2186

Comment 2 Erik Erlandson 2011-05-25 21:56:49 UTC

Fixed upstream on V7_6-branch:
https://condor-wiki.cs.wisc.edu/index.cgi/chngview?cn=21979

Repro/test example is described in detail here:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2186

Comment 3 Erik Erlandson 2011-05-25 22:22:59 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Ordering of accounting groups by "starvation" (usage/allocated) was left out of new Hierarchical Accounting Groups feature.

Consequence:
Accounting groups that fall later in the list could be starved by groups before them, due to arbitrary ordering.

Fix:
Sorting of accounting groups by starvation ratio (usage/allocated) prior to negotiation was restored.

Result:
Accounting groups no longer starved due to arbitrary ordering.

Comment 4 Erik Erlandson 2011-05-25 22:46:12 UTC

Repro and test information:


Using this configuration:
$ cat 95.starvation_order.config
NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
NEGOTIATOR_INTERVAL = 30

SCHEDD_INTERVAL	= 15
CLAIM_WORKLIFE = 0

NUM_CPUS = 10

# turn off round robin and multiple allocation rounds
HFS_ROUND_ROBIN_RATE = 100000000
HFS_MAX_ALLOCATION_ROUNDS = 1

GROUP_NAMES = a, b
GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5
GROUP_AUTOREGROUP = TRUE


Submit this file, which generates jobs for groups "a" and "b" that compete for the same five slots (setting up potential starvation) and have randomized durations of 15-45 seconds:

$ cat starvation_order.submit
universe = vanilla
cmd = /bin/sleep
# random sleep durations from 15 to 45 seconds
arguments = $$([15 + random(31)])
# set up "a" and "b" to compete for sub-pool:
Requirements = (SlotID <= 5)
+AccountingGroup = "a.user"
queue 50
+AccountingGroup = "b.user"
queue 50

Before starvation order was restored, negotiation behaves as follows: group "a" always negotiates first and starves group "b":

$ tail -f NegotiatorLog | grep -e 'group quotas: Group.*allocated=.*usage= ' -e 'Round.*totals:'
05/24/11 22:28:34 group quotas: Group a  allocated= 0  usage= 0
05/24/11 22:28:34 group quotas: Group b  allocated= 0  usage= 0
05/24/11 22:28:34 Round 1 totals: allocated= 0  usage= 0
05/24/11 22:28:54 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:28:54 group quotas: Group a  allocated= 5  usage= 5
05/24/11 22:28:54 group quotas: Group b  allocated= 5  usage= 0
05/24/11 22:28:54 Round 1 totals: allocated= 10  usage= 5
05/24/11 22:29:24 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:29:24 group quotas: Group a  allocated= 5  usage= 5
05/24/11 22:29:24 group quotas: Group b  allocated= 5  usage= 0
05/24/11 22:29:24 Round 1 totals: allocated= 10  usage= 5
05/24/11 22:29:55 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:29:55 group quotas: Group a  allocated= 5  usage= 5
05/24/11 22:29:55 group quotas: Group b  allocated= 5  usage= 0
05/24/11 22:29:55 Round 1 totals: allocated= 10  usage= 5
05/24/11 22:30:26 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:30:26 group quotas: Group a  allocated= 5  usage= 5
05/24/11 22:30:26 group quotas: Group b  allocated= 5  usage= 0
05/24/11 22:30:26 Round 1 totals: allocated= 10  usage= 5
...
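
To check such an excerpt without reading it by eye, a small script can flag groups that were allocated slots but never used any. The following is a rough Python sketch: the regular expression is inferred only from the "group quotas" lines quoted above, and `starved_groups` is a hypothetical helper, not part of Condor.

```python
import re

# Pattern inferred from the "group quotas" lines in the excerpt above.
QUOTA_LINE = re.compile(
    r"group quotas: Group (?P<name>\S+)\s+"
    r"allocated=\s*(?P<allocated>\S+)\s+usage=\s*(?P<usage>\S+)"
)


def starved_groups(log_lines):
    """Return names of groups that had an allocation but never any usage."""
    allocated_seen = set()
    used_seen = set()
    for line in log_lines:
        match = QUOTA_LINE.search(line)
        if not match:
            continue
        name = match.group("name")
        if float(match.group("allocated")) > 0:
            allocated_seen.add(name)
        if float(match.group("usage")) > 0:
            used_seen.add(name)
    return sorted(allocated_seen - used_seen)


sample = """\
05/24/11 22:28:54 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:28:54 group quotas: Group a  allocated= 5  usage= 5
05/24/11 22:28:54 group quotas: Group b  allocated= 5  usage= 0
05/24/11 22:29:24 group quotas: Group a  allocated= 5  usage= 5
05/24/11 22:29:24 group quotas: Group b  allocated= 5  usage= 0
""".splitlines()

print(starved_groups(sample))  # ['b']
```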

After the patch that restores starvation ordering, the groups negotiate in an order that changes from cycle to cycle, most-starved group first, and both groups get a balanced allocation of jobs over time:

$ tail -f NegotiatorLog | grep -e 'group quotas: Group.*allocated=.*usage= ' -e 'Round.*totals:' -e 'starvation='
05/24/11 22:35:35 Group a - starvation= 0 (0/5)  prio= 0.5
05/24/11 22:35:36 Group b - starvation= 0 (0/5)  prio= 0.5
05/24/11 22:35:36 Group <none> - starvation= 1.79769e+308 (0/0)  prio= 0.5
05/24/11 22:35:36 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:35:36 group quotas: Group a  allocated= 5  usage= 5
05/24/11 22:35:36 group quotas: Group b  allocated= 5  usage= 0
05/24/11 22:35:36 Round 1 totals: allocated= 10  usage= 5
05/24/11 22:36:06 Group b - starvation= 0 (0/5)  prio= 0.5
05/24/11 22:36:07 Group a - starvation= 0.6 (3/5)  prio= 0.501091
05/24/11 22:36:07 Group <none> - starvation= 1.79769e+308 (0/0)  prio= 0.5
05/24/11 22:36:07 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:36:07 group quotas: Group a  allocated= 5  usage= 3
05/24/11 22:36:07 group quotas: Group b  allocated= 5  usage= 2
05/24/11 22:36:07 Round 1 totals: allocated= 10  usage= 5
05/24/11 22:36:37 Group a - starvation= 0.2 (1/5)  prio= 0.501712
05/24/11 22:36:38 Group b - starvation= 0.4 (2/5)  prio= 0.500365
05/24/11 22:36:38 Group <none> - starvation= 1.79769e+308 (0/0)  prio= 0.5
05/24/11 22:36:38 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:36:38 group quotas: Group a  allocated= 5  usage= 3
05/24/11 22:36:38 group quotas: Group b  allocated= 5  usage= 2
05/24/11 22:36:38 Round 1 totals: allocated= 10  usage= 5
05/24/11 22:37:08 Group b - starvation= 0.2 (1/5)  prio= 0.500738
05/24/11 22:37:09 Group a - starvation= 0.4 (2/5)  prio= 0.502325
05/24/11 22:37:09 Group <none> - starvation= 1.79769e+308 (0/0)  prio= 0.5
05/24/11 22:37:09 group quotas: Group <none>  allocated= 0  usage= 0
05/24/11 22:37:09 group quotas: Group a  allocated= 5  usage= 2
05/24/11 22:37:09 group quotas: Group b  allocated= 5  usage= 3
05/24/11 22:37:09 Round 1 totals: allocated= 10  usage= 5
...
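
The difference between the two behaviors can be reproduced with a toy model. The Python sketch below is only an illustration under strong assumptions: `simulate` is a hypothetical function, roughly half of each group's running jobs are assumed to finish in every cycle, and each group simply takes free slots up to its quota in negotiation order. It does not model the real negotiator.

```python
def simulate(cycles, use_starvation_order, total_slots=5, quota=5):
    """Return per-cycle (usage_a, usage_b) for two groups sharing total_slots."""
    usage = {"a": 0, "b": 0}
    history = []
    for _ in range(cycles):
        # Toy assumption: about half of each group's running jobs finish per cycle.
        for name in usage:
            usage[name] -= (usage[name] + 1) // 2
        free = total_slots - sum(usage.values())
        order = list(usage)  # fixed order: "a" always negotiates first
        if use_starvation_order:
            # Most-starved (lowest usage/quota) first; ties keep the fixed order.
            order.sort(key=lambda name: usage[name] / quota)
        for name in order:
            granted = min(free, quota - usage[name])
            usage[name] += granted
            free -= granted
        history.append((usage["a"], usage["b"]))
    return history


print("fixed order:     ", simulate(4, use_starvation_order=False))
print("starvation order:", simulate(4, use_starvation_order=True))
```

In the fixed-order run group "a" refills every freed slot before "b" is considered, so "b" never runs a job. With starvation ordering, "b" negotiates first as soon as it is the more starved group.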

Comment 6 Tomas Rusnak 2011-07-25 14:52:14 UTC

Reproduced on:

$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $


# tail -f /var/log/condor/NegotiatorLog | grep -e 'group quotas: Group.*allocated=.*usage= ' -e 'Round.*totals:'
07/25/11 17:35:56 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 17:35:56 group quotas: Group a  allocated= 5  usage= 5
07/25/11 17:35:56 group quotas: Group b  allocated= 5  usage= 0
07/25/11 17:35:56 Round 1 totals: allocated= 10  usage= 5
07/25/11 17:36:27 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 17:36:27 group quotas: Group a  allocated= 5  usage= 5
07/25/11 17:36:27 group quotas: Group b  allocated= 5  usage= 0
07/25/11 17:36:27 Round 1 totals: allocated= 10  usage= 5
07/25/11 17:36:58 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 17:36:58 group quotas: Group a  allocated= 5  usage= 5
07/25/11 17:36:58 group quotas: Group b  allocated= 5  usage= 0
07/25/11 17:36:58 Round 1 totals: allocated= 10  usage= 5
07/25/11 17:37:31 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 17:37:31 group quotas: Group a  allocated= 5  usage= 5
07/25/11 17:37:31 group quotas: Group b  allocated= 5  usage= 0
07/25/11 17:37:31 Round 1 totals: allocated= 10  usage= 5
07/25/11 17:38:02 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 17:38:02 group quotas: Group a  allocated= 5  usage= 5
07/25/11 17:38:02 group quotas: Group b  allocated= 5  usage= 0
07/25/11 17:38:02 Round 1 totals: allocated= 10  usage= 5
07/25/11 17:38:32 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 17:38:32 group quotas: Group a  allocated= 5  usage= 5
07/25/11 17:38:32 group quotas: Group b  allocated= 5  usage= 0
07/25/11 17:38:32 Round 1 totals: allocated= 10  usage= 5

Comment 7 Tomas Rusnak 2011-07-25 15:03:34 UTC

Retested on all supported platforms (x86 and x86_64 on RHEL5 and RHEL6) with:

condor-7.6.3-0.2

#  sudo -u test condor_submit starvation_order.submit && tail -f /var/log/condor/NegotiatorLog | grep -e 'group quotas: Group.*allocated=.*usage= ' -e 'Round.*totals:' -e 'starvation='
Submitting job(s)....................................................................................................
100 job(s) submitted to cluster 364.
07/25/11 16:19:04 Group a - starvation= 3.40282e+38 (0/0)  prio= 0.5
07/25/11 16:19:04 Group b - starvation= 3.40282e+38 (0/0)  prio= 0.5
07/25/11 16:19:04 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 16:19:04 group quotas: Group a  allocated= 0  usage= 0
07/25/11 16:19:04 group quotas: Group b  allocated= 0  usage= 0
07/25/11 16:19:04 Round 1 totals: allocated= 0  usage= 0
07/25/11 16:19:24 Group a - starvation= 0 (0/5)  prio= 0.5
07/25/11 16:19:25 Group b - starvation= 0 (0/5)  prio= 0.5
07/25/11 16:19:26 Group <none> - starvation= 3.40282e+38 (0/0)  prio= 0.5
07/25/11 16:19:26 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 16:19:26 group quotas: Group a  allocated= 5  usage= 5
07/25/11 16:19:26 group quotas: Group b  allocated= 5  usage= 0
07/25/11 16:19:26 Round 1 totals: allocated= 10  usage= 5
07/25/11 16:19:56 Group b - starvation= 0 (0/5)  prio= 0.5
07/25/11 16:19:57 Group a - starvation= 0.8 (4/5)  prio= 0.501139
07/25/11 16:19:57 Group <none> - starvation= 3.40282e+38 (0/0)  prio= 0.5
07/25/11 16:19:57 group quotas: Group <none>  allocated= 0  usage= 0
07/25/11 16:19:57 group quotas: Group a  allocated= 5  usage= 4
07/25/11 16:19:57 group quotas: Group b  allocated= 5  usage= 1
07/25/11 16:19:57 Round 1 totals: allocated= 10  usage= 5

The jobs are now balanced between the groups over time.
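
A check like this could also be scripted. The Python sketch below assumes that the "starvation=" lines are written in negotiation order, as the excerpts suggest, and that any other line ends the current cycle's run of such lines. The function name and the regular expression are hypothetical and derived only from the log lines quoted in this report.

```python
import re

# Pattern inferred from lines such as "Group a - starvation= 0.8 (4/5)  prio= 0.501139".
STARVATION_LINE = re.compile(r"Group (?P<name>\S+) - starvation=\s*(?P<ratio>\S+)")


def cycles_in_starvation_order(log_lines):
    """Return True if every run of consecutive 'starvation=' lines is non-decreasing."""
    previous = None
    for line in log_lines:
        match = STARVATION_LINE.search(line)
        if not match:
            previous = None  # any other line ends the current run
            continue
        ratio = float(match.group("ratio"))
        if previous is not None and ratio < previous:
            return False
        previous = ratio
    return True


sample = """\
07/25/11 16:19:56 Group b - starvation= 0 (0/5)  prio= 0.5
07/25/11 16:19:57 Group a - starvation= 0.8 (4/5)  prio= 0.501139
07/25/11 16:19:57 Group <none> - starvation= 3.40282e+38 (0/0)  prio= 0.5
07/25/11 16:19:57 group quotas: Group <none>  allocated= 0  usage= 0
""".splitlines()

print(cycles_in_starvation_order(sample))  # True
```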

>>> VERIFIED

Comment 8 errata-xmlrpc 2011-09-07 16:41:20 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html