Description of problem:
The submitter limit reported in the negotiator log is larger than it should be.

Version-Release number of selected component (if applicable):
condor-7.6.7-0.7

How reproducible:
100%

Steps to Reproduce:
Configuration:

$ more 95.bz706512.config
NEGOTIATOR_CONSIDER_PREEMPTION = FALSE
NUM_CPUS = 10
GROUP_NAMES = a, b
GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5

The test uses two submissions. The first submits a single job under submitter "a.u1":

$ more a.u1.jsub
universe = vanilla
cmd = /bin/sleep
args = 300
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.u1"
queue 1

The second submits 4 or more jobs for submitter "a.u2":

$ more a.u2.jsub
universe = vanilla
cmd = /bin/sleep
args = 300
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.u2"
queue 4

The test is to first submit "a.u1.jsub" and let it negotiate. After it has negotiated, submit "a.u2.jsub".

$ tail -f NegotiatorLog | grep -e 'Phase 4..:' -e 'Negotiating with.* at' -e 'submitterLimit *='
03/21/12 05:59:43 Phase 4.1: Negotiating with schedds ...
03/21/12 05:59:43 Negotiating with a.u1@host at <IP:57823>
03/21/12 05:59:43 submitterLimit = 1.000000
03/21/12 06:08:19 Phase 4.1: Negotiating with schedds ...
03/21/12 06:08:19 Negotiating with a.u2@host at <IP:57823>
03/21/12 06:08:19 submitterLimit = 5.000000

Actual results:
The submitter limit for the second submitter is 5.000000.

Expected results:
The log should show 'submitterLimit = 4.000000'.

Additional info:
The number of running jobs under group 'a' does not exceed the group quota when 5 jobs are submitted in the second submission. The negotiator log contains the following lines:

# cat NegotiatorLog
...
03/21/12 06:08:19 matchmakingAlgorithm: limit 5.000000 used 4.000000 pieLeft 1.000000
03/21/12 06:08:19 Attempting to use cached MatchList: Failed (MatchList length: 0, Autocluster: 1, Schedd Name: a.u2.eng.bos.redhat.com, Schedd Address: <10.16.64.239:57823>)
03/21/12 06:08:19 Rejected 2.4 a.u2.eng.bos.redhat.com <10.16.64.239:57823>: group quota exceeded
03/21/12 06:08:19 Hit submitter limit: done negotiating
03/21/12 06:08:19 This submitter hit its submitterLimit.
03/21/12 06:08:19 resources used scheddUsed= 4.000000
03/21/12 06:08:19 Group a is using its quota 5 - halting negotiation
03/21/12 06:08:19 negotiateWithGroup resources used scheddAds length 1
...
fix pushed to UPSTREAM-7.9.0-BZ805448-submitter-limits
REPRO/TEST.

Configuration:

NEGOTIATOR_CONSIDER_PREEMPTION = FALSE
NUM_CPUS = 5
GROUP_NAMES = a
GROUP_QUOTA_a = 5

Spin up pool. Watch negotiator log:

$ tail -f NegotiatorLog | grep -e 'Phase 4..:' -e 'Negotiating with.* at' -e 'submitterLimit *='

Submit the two submit files below. Allow the first to negotiate before submitting the second:

universe = vanilla
cmd = /bin/sleep
args = 300
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.u1"
queue 1

universe = vanilla
cmd = /bin/sleep
args = 300
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.u2"
queue 4

BEFORE FIX, you should see the following (2nd submitter limit is 5):

$ tail -f NegotiatorLog | grep -e 'Phase 4..:' -e 'Negotiating with.* at' -e 'submitterLimit *='
04/25/12 10:08:20 Phase 4.1: Negotiating with schedds ...
04/25/12 10:08:20 Negotiating with a.u1@localdomain at <192.168.1.2:33501>
04/25/12 10:08:20 submitterLimit = 1.000000
04/25/12 10:08:40 Phase 4.1: Negotiating with schedds ...
04/25/12 10:08:40 Negotiating with a.u2@localdomain at <192.168.1.2:33501>
04/25/12 10:08:40 submitterLimit = 5.000000

AFTER FIX, you should see (2nd submitter limit is 4):

$ tail -f NegotiatorLog | grep -e 'Phase 4..:' -e 'Negotiating with.* at' -e 'submitterLimit *='
04/25/12 10:09:47 Phase 4.1: Negotiating with schedds ...
04/25/12 10:09:47 Negotiating with a.u1@localdomain at <192.168.1.2:54831>
04/25/12 10:09:47 submitterLimit = 1.000000
04/25/12 10:10:07 Phase 4.1: Negotiating with schedds ...
04/25/12 10:10:07 Negotiating with a.u2@localdomain at <192.168.1.2:54831>
04/25/12 10:10:07 submitterLimit = 4.000000
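
For anyone scripting this check rather than reading the grep output by eye, here is a minimal Python sketch that pairs each "Negotiating with ... at" line with the "submitterLimit =" line that follows it. The function name submitter_limits and the regular expressions are my own assumptions, based only on the log lines quoted in this bug; they are not part of condor.

```python
import re

# Patterns are assumptions derived from the log excerpts in this report.
# "Negotiating with schedds ..." (the Phase 4.1 line) does not match because
# it has no " at" after the name.
NEGOTIATING = re.compile(r"Negotiating with (\S+) at")
LIMIT = re.compile(r"submitterLimit\s*=\s*([0-9.]+)")


def submitter_limits(lines):
    """Return (submitter, limit) pairs in the order they appear in the log."""
    pairs = []
    current = None
    for line in lines:
        match = NEGOTIATING.search(line)
        if match:
            current = match.group(1)
            continue
        match = LIMIT.search(line)
        if match and current is not None:
            pairs.append((current, float(match.group(1))))
            current = None
    return pairs


# The AFTER FIX excerpt from this comment.
after_fix = [
    "04/25/12 10:09:47 Phase 4.1: Negotiating with schedds ...",
    "04/25/12 10:09:47 Negotiating with a.u1@localdomain at <192.168.1.2:54831>",
    "04/25/12 10:09:47 submitterLimit = 1.000000",
    "04/25/12 10:10:07 Phase 4.1: Negotiating with schedds ...",
    "04/25/12 10:10:07 Negotiating with a.u2@localdomain at <192.168.1.2:54831>",
    "04/25/12 10:10:07 submitterLimit = 4.000000",
]

for submitter, limit in submitter_limits(after_fix):
    print(submitter, limit)
```

A test could then assert that the limit paired with the a.u2 submitter is 4.0 once the fix is installed.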
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Previously enhanced logic for computing submitter limits did not take the NEGOTIATOR_CONSIDER_PREEMPTION setting into account.
Consequence: When NEGOTIATOR_CONSIDER_PREEMPTION was set to false, submitter limits were not as tight as possible, resulting in some inefficiency.
Fix: Logic was updated to take consider-preemption settings into account.
Result: Submitter limits are tighter when the consider-preemption setting is off.
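
To make the arithmetic behind the expected value concrete, here is a simplified Python sketch. It is only a model that reproduces the numbers seen in this report (quota 5, one slot already in use by a.u1, expected limit 4 for a.u2); it is not the negotiator's actual code, and the function name expected_submitter_limit and its parameters are hypothetical.

```python
def expected_submitter_limit(group_quota, group_in_use, consider_preemption):
    """Simplified model of the limit discussed in this bug (an assumption).

    With preemption considered, slots already claimed could still be taken,
    so the model leaves the limit at the group quota. With preemption off,
    only the unused part of the quota is obtainable, so the limit shrinks
    by the number of slots the group already uses.
    """
    if consider_preemption:
        return float(group_quota)
    return float(max(0, group_quota - group_in_use))


# Numbers from this report: GROUP_QUOTA_a = 5, a.u1 already runs 1 job.
print(expected_submitter_limit(5, 1, consider_preemption=False))
print(expected_submitter_limit(5, 1, consider_preemption=True))
```

Under this model, the value logged before the fix (5) corresponds to the limit being computed as if preemption were being considered, even though NEGOTIATOR_CONSIDER_PREEMPTION was FALSE.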
Successfully reproduced on condor-7.6.7-0.7.
Tested with condor-7.8.7-0.4.

Tested on:
RHEL5 x86_64,i386
RHEL6 x86_64,i386

# tail -f NegotiatorLog | grep -e 'Phase 4..:' -e 'Negotiating with.* at' -e 'submitterLimit *='
11/12/12 19:40:15 Phase 4.1: Negotiating with schedds ...
11/12/12 19:40:15 Negotiating with a.u1@host at <IP:35649>
11/12/12 19:40:15 submitterLimit = 5.000000
11/12/12 19:40:39 Phase 4.1: Negotiating with schedds ...
11/12/12 19:40:39 Negotiating with a.u2@host at <IP:35649>
11/12/12 19:40:39 submitterLimit = 4.000000

The submitterLimit for the second submitter is now 4, as it should be, and there are no "group quota exceeded" errors in NegotiatorLog.

>>> verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0564.html