Artifact of 678590 and 706512 when using round robin that leads to entering negotiateWithGroup with no submitters attached to the group.

06/07/11 21:20:38 Group <removed> - BEGIN NEGOTIATION
06/07/11 21:20:38 Phase 3: Sorting submitter ads by priority ...
06/07/11 21:20:38 Phase 4.1: Negotiating with schedds ...
06/07/11 21:20:38 numSlots = 365
06/07/11 21:20:38 slotWeightTotal = 365.000000
06/07/11 21:20:38 pieLeft = 0.000
06/07/11 21:20:38 NormalFactor = 0.000000
06/07/11 21:20:38 MaxPrioValue = 0.000000
06/07/11 21:20:38 NumSubmitterAds = 0
06/07/11 21:20:38 resources used by are 0.000000
06/07/11 21:20:38 resources used scheddused 0.000000 groupUsed 0.000000
06/07/11 21:20:38 negotiateWithGroup resources used scheddAds length 0

upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2230
Created attachment 504551 [details] patch to skip when no submitterAds
Test/Repro:

Using configuration:

CLAIM_WORKLIFE = 0
NEGOTIATOR_CONSIDER_PREEMPTION = FALSE
NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15
NUM_CPUS = 10
GROUP_NAMES = a, b
GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5
GROUP_QUOTA_ROUND_ROBIN_RATE = 1

Spool up the test pool, and then follow along on the negotiator log:

$ tail -f NegotiatorLog | grep -e 'RR iter' -e 'skipping, no sub' -e 'Filtering.*no idle jobs'

Submit this job:

universe = vanilla
cmd = /bin/sleep
args = 20
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.u1"
queue 2

After the fix, you should first see it go through 2 RR iterations. Then, after the job completes, you see "a.u1" removed on the first RR iteration, followed by the group being skipped on the subsequent iteration. Note that the negotiator may occasionally get correct, up-to-date values from the collector, in which case the fix will not execute, so if you do not see this behavior after the job completes, try again.

[root@rorschach log]$ tail -f NegotiatorLog | grep -e 'RR iter' -e 'skipping, no sub' -e 'Filtering.*no idle jobs'
# First negotiation (two RR iterations)
06/15/11 15:24:58 group quotas: entering RR iteration n= 1
06/15/11 15:24:59 group quotas: entering RR iteration n= 2
# ...
# After job completes, you see this:
06/15/11 15:25:29 group quotas: entering RR iteration n= 1
06/15/11 15:25:29 Filtering submitter a.u1@localdomain with no idle jobs
06/15/11 15:25:29 group quotas: entering RR iteration n= 2
06/15/11 15:25:29 Group a - skipping, no submitters (usage=0)

Before the fix, you should see two RR iterations on job completion with no skipping message (unless the collector got lucky and had up-to-date values).
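The manual grep check above can also be scripted. The following is only a minimal sketch, assuming the log message formats shown in the sample output in this comment; the function name summarize_negotiator_log and the SAMPLE text are illustrative and are not part of Condor.

```python
import re

# Patterns are assumptions based on the log lines quoted in this report.
RR_RE = re.compile(r"entering RR iteration n= *(\d+)")
SKIP_RE = re.compile(r"Group (\S+) - skipping, no submitters")


def summarize_negotiator_log(lines):
    """Return (rr_iterations, skipped_groups) found in the given log lines."""
    rr_iterations = []
    skipped_groups = []
    for line in lines:
        match = RR_RE.search(line)
        if match:
            rr_iterations.append(int(match.group(1)))
            continue
        match = SKIP_RE.search(line)
        if match:
            skipped_groups.append(match.group(1))
    return rr_iterations, skipped_groups


# Sample copied from the post-fix output quoted in this comment.
SAMPLE = """\
06/15/11 15:24:58 group quotas: entering RR iteration n= 1
06/15/11 15:24:59 group quotas: entering RR iteration n= 2
06/15/11 15:25:29 group quotas: entering RR iteration n= 1
06/15/11 15:25:29 Filtering submitter a.u1@localdomain with no idle jobs
06/15/11 15:25:29 group quotas: entering RR iteration n= 2
06/15/11 15:25:29 Group a - skipping, no submitters (usage=0)
"""

if __name__ == "__main__":
    iterations, skipped = summarize_negotiator_log(SAMPLE.splitlines())
    print("RR iterations seen:", iterations)
    print("Groups skipped for having no submitters:", skipped)
```

With the fix in place, the skipped-groups list should contain "a" once the job has completed; an empty list after job completion suggests either the pre-fix behavior or a run where the collector already had up-to-date values.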
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Cause: The negotiator logic that removes submitters with no idle jobs interacts with multiple round robin iterations and collector update latencies.
Consequence: On multiple round robin iterations, groups can be sent to negotiation with no submitters, which wastes effort.
Fix: A check was added to skip groups that have no submitters.
Result: Groups with no submitters are skipped and no longer waste time in group negotiation.
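The cause and fix described in the note can be illustrated with a toy model. This is a sketch only, not the attached patch (the negotiator itself is C++); the function negotiate_round_robin and its data layout are hypothetical, and the messages only loosely mirror the real log lines.

```python
def negotiate_round_robin(groups, iterations):
    """Toy model of round robin group negotiation with the skip check.

    groups maps a group name to {submitter_name: idle_job_count}.
    Returns the list of messages produced, loosely mirroring NegotiatorLog.
    """
    log = []
    for n in range(1, iterations + 1):
        log.append(f"entering RR iteration n= {n}")
        for name, submitters in groups.items():
            # The fix: a group with no submitters left is skipped
            # instead of being sent into negotiation.
            if not submitters:
                log.append(f"Group {name} - skipping, no submitters")
                continue
            log.append(f"Group {name} - BEGIN NEGOTIATION")
            # Stale submitter ads (no idle jobs) are filtered out during
            # negotiation, which can leave the group empty for the next
            # round robin iteration.
            stale = [s for s, idle in submitters.items() if idle == 0]
            for submitter in stale:
                log.append(f"Filtering submitter {submitter} with no idle jobs")
                del submitters[submitter]
    return log


if __name__ == "__main__":
    # A stale ad for a.u1 with zero idle jobs, as after job completion.
    example = {"a": {"a.u1@localdomain": 0}}
    for message in negotiate_round_robin(example, iterations=2):
        print(message)
```

In this model the first iteration filters the stale submitter and the second iteration skips the now-empty group, which matches the sequence described in the test steps. Without the emptiness check, the second iteration would enter negotiation with zero submitter ads.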
Fix pending upstream, targeted for 7.6.2
Reproduced on:

$CondorVersion: 7.6.1 Jun 02 2011 BuildID: RH-7.6.1-0.10.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

NegotiatorLog:

07/26/11 18:58:25 group quotas: group= a quota= 5 requested= 2 allocated= 2 unallocated= 0
07/26/11 18:58:25 group quotas: group= b quota= 5 requested= 0 allocated= 0 unallocated= 0
07/26/11 18:58:25 group quotas: groups= 3 requesting= 1 served= 1 unserved= 0 slots= 10 requested= 2 allocated= 2 surplus= 8
07/26/11 18:58:25 group quotas: entering RR iteration n= 1
07/26/11 18:58:25 Group <none> - skipping, zero slots allocated
07/26/11 18:58:25 Group a - BEGIN NEGOTIATION
07/26/11 18:58:25 Phase 3: Sorting submitter ads by priority ...
...
07/26/11 18:58:25 negotiateWithGroup resources used scheddAds length 0
07/26/11 18:58:25 Group b - skipping, zero slots allocate
07/26/11 18:58:25 group quotas: entering RR iteration n= 2
07/26/11 18:58:25 group quotas: entering RR iteration n= 2
07/26/11 18:58:25 Group <none> - skipping, zero slots allocated
07/26/11 18:58:25 Group a - BEGIN NEGOTIATION
07/26/11 18:58:25 Phase 3: Sorting submitter ads by priority ...
...
07/26/11 18:58:25 group quotas: group= b quota= 5 requested= 0 allocated= 0 unallocated= 0
07/26/11 18:58:25 group quotas: groups= 3 requesting= 0 served= 0 unserved= 0 slots= 10 requested= 0 allocated= 0 surplus= 10
07/26/11 18:58:25 group quotas: entering RR iteration n= 0
07/26/11 18:58:25 Group <none> - skipping, zero slots allocated
07/26/11 18:58:25 Group a - skipping, zero slots allocated
Retested on all supported platforms (x86 and x86_64, RHEL5 and RHEL6) with condor-7.6.3-0.2:

# sudo -u test condor_submit groupwosubmitter.job && tail -f /var/log/condor/NegotiatorLog | grep -e 'RR iter' -e 'skipping, no sub' -e 'Filtering.*no idle jobs'
Submitting job(s)..
2 job(s) submitted to cluster 366.
07/27/11 09:27:06 group quotas: entering RR iteration n= 1
07/27/11 09:27:06 group quotas: entering RR iteration n= 2
07/27/11 09:27:37 group quotas: entering RR iteration n= 1
07/27/11 09:27:37 group quotas: entering RR iteration n= 2
07/27/11 09:27:37 Group a - skipping, no submitters (usage=0)
07/27/11 09:27:37 group quotas: entering RR iteration n= 0

Negotiation for the group without submitters was skipped.

>>> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html