Bug 712972 - negotiating groups with no submitters
Summary: negotiating groups with no submitters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 2.0.1
Target Release: ---
Assignee: Erik Erlandson
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks: 723887
Reported: 2011-06-13 18:57 UTC by Jon Thomas
Modified: 2012-11-16 10:00 UTC (History)

Fixed In Version: condor-7.6.2-0.2
Doc Type: Bug Fix
Doc Text:
Cause: The negotiator logic that removes submitters with no idle jobs interacted with multiple round robin iterations and collector update latencies. Consequence: During multiple round robin iterations, groups could be sent to negotiation with no submitters, which wasted effort. Fix: A check was added to skip groups that have no submitters. Result: Groups with no submitters are skipped and no longer waste time in negotiation.
Clone Of:
Environment:
Last Closed: 2011-09-07 16:43:07 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to skip when no submitterAds (1.40 KB, patch)
2011-06-13 21:24 UTC, Jon Thomas


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHSA-2011:1249 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update | 2011-09-07 16:40:45 UTC

Description Jon Thomas 2011-06-13 18:57:54 UTC
Artifact of bugs 678590 and 706512: when using round robin, the negotiator can enter negotiateWithGroup with no submitters attached to the group.


06/07/11 21:20:38 Group <removed> - BEGIN NEGOTIATION
06/07/11 21:20:38 Phase 3:  Sorting submitter ads by priority ...
06/07/11 21:20:38 Phase 4.1:  Negotiating with schedds ...
06/07/11 21:20:38     numSlots = 365
06/07/11 21:20:38     slotWeightTotal = 365.000000
06/07/11 21:20:38     pieLeft = 0.000
06/07/11 21:20:38     NormalFactor = 0.000000
06/07/11 21:20:38     MaxPrioValue = 0.000000
06/07/11 21:20:38     NumSubmitterAds = 0
06/07/11 21:20:38  resources used by  are 0.000000
06/07/11 21:20:38  resources used scheddused 0.000000 groupUsed 0.000000
06/07/11 21:20:38  negotiateWithGroup resources used scheddAds length 0 

upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2230
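
A minimal sketch for spotting this condition in a NegotiatorLog, assuming the line formats shown in the excerpt above ("Group ... - BEGIN NEGOTIATION" and "NumSubmitterAds = N"). The function name empty_negotiations is hypothetical and not part of Condor; other Condor versions may word these log lines differently.

```python
import re

# Patterns copied from the log excerpt in this report (assumed stable only for this version).
BEGIN_RE = re.compile(r"Group (?P<group>.+?) - BEGIN NEGOTIATION")
NUM_ADS_RE = re.compile(r"NumSubmitterAds = (?P<count>\d+)")


def empty_negotiations(lines):
    """Return names of groups that began negotiation with zero submitter ads."""
    empty = []
    current = None  # group whose negotiation block we are currently inside
    for line in lines:
        begin = BEGIN_RE.search(line)
        if begin:
            current = begin.group("group")
            continue
        ads = NUM_ADS_RE.search(line)
        if ads and current is not None:
            if int(ads.group("count")) == 0:
                empty.append(current)
            current = None  # only the first NumSubmitterAds line per block counts
    return empty


sample = [
    "06/07/11 21:20:38 Group <removed> - BEGIN NEGOTIATION",
    "06/07/11 21:20:38 Phase 3:  Sorting submitter ads by priority ...",
    "06/07/11 21:20:38     NumSubmitterAds = 0",
]
print(empty_negotiations(sample))
```

Any group name this returns entered negotiation with no submitters, which is the wasted work described here.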

Comment 1 Jon Thomas 2011-06-13 21:24:17 UTC
Created attachment 504551 [details]
patch to skip when no submitterAds

Comment 2 Erik Erlandson 2011-06-15 23:34:06 UTC
Test/Repro:

Using configuration:

CLAIM_WORKLIFE = 0
NEGOTIATOR_CONSIDER_PREEMPTION = FALSE
NEGOTIATOR_DEBUG = D_FULLDEBUG

NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15

NUM_CPUS = 10

GROUP_NAMES = a, b
GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5

GROUP_QUOTA_ROUND_ROBIN_RATE = 1

Spool up the test pool, and then follow along on negotiator log:

$ tail -f NegotiatorLog | grep -e 'RR iter' -e 'skipping, no sub' -e 'Filtering.*no idle jobs'

Submit this job:

universe = vanilla
cmd = /bin/sleep
args = 20
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.u1"
queue 2

After the fix, you should first see the negotiator go through 2 RR iterations. Then, after the job completes, you should see "a.u1" removed on the first RR iteration, and group "a" skipped on the subsequent iteration. Note that the negotiator may occasionally get correct, up-to-date values from the collector, in which case the fix will not execute. If you do not see this behavior after the job completes, try again.

[root@rorschach log]$ tail -f NegotiatorLog | grep -e 'RR iter' -e 'skipping, no sub' -e 'Filtering.*no idle jobs'
# First negotiation (two RR iterations)
06/15/11 15:24:58 group quotas: entering RR iteration n= 1
06/15/11 15:24:59 group quotas: entering RR iteration n= 2
# ...
# After job completes, you see this:
06/15/11 15:25:29 group quotas: entering RR iteration n= 1
06/15/11 15:25:29 Filtering submitter a.u1@localdomain with no idle jobs
06/15/11 15:25:29 group quotas: entering RR iteration n= 2
06/15/11 15:25:29 Group a - skipping, no submitters (usage=0)

Before the fix, you should see two RR iterations on job completion with no skipping message, unless the collector happened to have up-to-date values.
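
If reading the grep output by eye is error-prone, the same three patterns can be tallied with a short script. This is only a sketch: the function name summarize is hypothetical, and the regular expressions assume the exact message wording shown in the log excerpt above.

```python
import re

# Same three message types the grep command above selects.
RR_RE = re.compile(r"entering RR iteration n= (\d+)")
FILTER_RE = re.compile(r"Filtering submitter (\S+) with no idle jobs")
SKIP_RE = re.compile(r"Group (\S+) - skipping, no submitters")


def summarize(lines):
    """Count RR iterations and collect filtered submitters and skipped groups."""
    summary = {"rr_iterations": 0, "filtered": [], "skipped": []}
    for line in lines:
        if RR_RE.search(line):
            summary["rr_iterations"] += 1
            continue
        filtered = FILTER_RE.search(line)
        if filtered:
            summary["filtered"].append(filtered.group(1))
            continue
        skipped = SKIP_RE.search(line)
        if skipped:
            summary["skipped"].append(skipped.group(1))
    return summary


after_job_completes = [
    "06/15/11 15:25:29 group quotas: entering RR iteration n= 1",
    "06/15/11 15:25:29 Filtering submitter a.u1@localdomain with no idle jobs",
    "06/15/11 15:25:29 group quotas: entering RR iteration n= 2",
    "06/15/11 15:25:29 Group a - skipping, no submitters (usage=0)",
]
print(summarize(after_job_completes))
```

A non-empty "skipped" list after the job completes indicates the fix executed. An empty list is inconclusive, since the collector may simply have had up-to-date values on that run.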

Comment 3 Erik Erlandson 2011-06-15 23:34:06 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
The negotiator logic that removes submitters with no idle jobs interacted with multiple round robin iterations and collector update latencies.

Consequence:
During multiple round robin iterations, groups could be sent to negotiation with no submitters, which wasted effort.

Fix:
A check was added to skip groups that have no submitters.

Result:
Groups with no submitters are skipped and no longer waste time in negotiation.
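
The interaction described in this note can be illustrated with a toy model. This is not the Condor negotiator code (which is C++) and not the attached patch. The function run_round_robin, its data layout, and the simplified log messages are all assumptions made for illustration.

```python
def run_round_robin(groups, iterations):
    """Model RR iterations over groups and return the log messages produced.

    groups maps a group name to a list of (submitter_name, idle_jobs) pairs.
    A submitter with idle_jobs == 0 models a stale ad from the collector.
    """
    log = []
    for n in range(1, iterations + 1):
        log.append(f"entering RR iteration n= {n}")
        for name in list(groups):
            if not groups[name]:
                # The added check: do not negotiate for a group with no submitters.
                log.append(f"Group {name} - skipping, no submitters")
                continue
            log.append(f"Group {name} - BEGIN NEGOTIATION")
            remaining = []
            for submitter, idle_jobs in groups[name]:
                if idle_jobs > 0:
                    remaining.append((submitter, idle_jobs))
                else:
                    # Submitters with no idle jobs are dropped during negotiation.
                    log.append(f"Filtering submitter {submitter} with no idle jobs")
            groups[name] = remaining
    return log


# One group whose only submitter has no idle jobs left (job already completed).
for message in run_round_robin({"a": [("a.u1@localdomain", 0)]}, iterations=2):
    print(message)
```

In this model the first iteration filters the only submitter out of group "a", so the second iteration would otherwise negotiate for an empty group. The emptiness check at the top of the loop is what turns that into a skip.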

Comment 4 Erik Erlandson 2011-06-17 00:26:40 UTC
Fix pending upstream, targeted for 7.6.2

Comment 6 Tomas Rusnak 2011-07-26 16:20:02 UTC
Reproduced on:

$CondorVersion: 7.6.1 Jun 02 2011 BuildID: RH-7.6.1-0.10.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

NegotiatorLog:

07/26/11 18:58:25 group quotas: group= a  quota= 5  requested= 2  allocated= 2  unallocated= 0
07/26/11 18:58:25 group quotas: group= b  quota= 5  requested= 0  allocated= 0  unallocated= 0
07/26/11 18:58:25 group quotas: groups= 3  requesting= 1  served= 1  unserved= 0  slots= 10  requested= 2  allocated= 2  surplus= 8
07/26/11 18:58:25 group quotas: entering RR iteration n= 1
07/26/11 18:58:25 Group <none> - skipping, zero slots allocated
07/26/11 18:58:25 Group a - BEGIN NEGOTIATION
07/26/11 18:58:25 Phase 3:  Sorting submitter ads by priority ...
...
07/26/11 18:58:25  negotiateWithGroup resources used scheddAds length 0
07/26/11 18:58:25 Group b - skipping, zero slots allocated
07/26/11 18:58:25 group quotas: entering RR iteration n= 2
07/26/11 18:58:25 group quotas: entering RR iteration n= 2
07/26/11 18:58:25 Group <none> - skipping, zero slots allocated
07/26/11 18:58:25 Group a - BEGIN NEGOTIATION
07/26/11 18:58:25 Phase 3:  Sorting submitter ads by priority ... 
...
07/26/11 18:58:25 group quotas: group= b  quota= 5  requested= 0  allocated= 0  unallocated= 0
07/26/11 18:58:25 group quotas: groups= 3  requesting= 0  served= 0  unserved= 0  slots= 10  requested= 0  allocated= 0  surplus= 10
07/26/11 18:58:25 group quotas: entering RR iteration n= 0
07/26/11 18:58:25 Group <none> - skipping, zero slots allocated
07/26/11 18:58:25 Group a - skipping, zero slots allocated

Comment 7 Tomas Rusnak 2011-07-27 08:36:29 UTC
Retested on all supported platforms (x86 and x86_64 on RHEL5 and RHEL6) with:

condor-7.6.3-0.2

# sudo -u test condor_submit groupwosubmitter.job && tail -f /var/log/condor/NegotiatorLog | grep -e 'RR iter' -e 'skipping, no sub' -e 'Filtering.*no idle jobs'
Submitting job(s)..
2 job(s) submitted to cluster 366.
07/27/11 09:27:06 group quotas: entering RR iteration n= 1
07/27/11 09:27:06 group quotas: entering RR iteration n= 2
07/27/11 09:27:37 group quotas: entering RR iteration n= 1
07/27/11 09:27:37 group quotas: entering RR iteration n= 2
07/27/11 09:27:37 Group a - skipping, no submitters (usage=0)
07/27/11 09:27:37 group quotas: entering RR iteration n= 0

Negotiation for the group with no submitters was skipped.

>>> VERIFIED

Comment 8 errata-xmlrpc 2011-09-07 16:43:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html

