Description of problem: LastNegotiationCycleSubmittersShareLimitX currently only counts submitters which specifically hit the submitter limit. This does not necessarily include submitters which run up against the accounting group limit. The statistic would be more intuitive if it counted the union of these cases.
Pushed a branch V7_4-BZ641431-unify-share-group-limits, which updates the semantics of LastNegotiationCycleSubmittersShareLimitX to include limits imposed by acct groups as well as share limits, which makes the statistic more intuitive. Also, I included this update upstream on the latest patch for gt#1393.
What are the steps to verify this issue, please?
(In reply to comment #3)
> What are the steps to verify this issue, please?

Should be able to verify by setting up a group with quota (N) and submitting (N+1) jobs against it -- the submitter should show up in the LastNegotiationCycleSubmittersShareLimitX list.
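The check described above can be scripted. The following is a minimal sketch, not part of Condor: it parses text in the shape printed by `condor_status -subsystem negotiator -l | grep SubmittersShareLimit` and reports whether a given submitter appears in the list for a given cycle index. The function names are hypothetical, and the separator between multiple submitter names is an assumption (commas and/or whitespace).

```python
import re

# Matches lines such as:
#   LastNegotiationCycleSubmittersShareLimit0 = "a.u1@hostname"
_ATTR_RE = re.compile(
    r'^\s*LastNegotiationCycleSubmittersShareLimit(\d+)\s*=\s*"(.*)"\s*$'
)


def parse_share_limit_attrs(status_output):
    """Map each cycle index to the list of submitter names in its attribute."""
    result = {}
    for line in status_output.splitlines():
        match = _ATTR_RE.match(line)
        if match is None:
            continue  # not a SubmittersShareLimit attribute line
        index = int(match.group(1))
        # Assumption: names are separated by commas and/or whitespace.
        names = [n for n in re.split(r'[,\s]+', match.group(2)) if n]
        result[index] = names
    return result


def submitter_hit_limit(status_output, submitter, cycle=0):
    """True if `submitter` is listed for the given cycle index."""
    return submitter in parse_share_limit_attrs(status_output).get(cycle, [])


# Sample text in the same shape as the condor_status output in this report.
sample = '''LastNegotiationCycleSubmittersShareLimit0 = "a.u1@hostname"
LastNegotiationCycleSubmittersShareLimit1 = ""
'''
print(submitter_hit_limit(sample, "a.u1@hostname"))
```

To use it against a live pool, the captured output of the `condor_status` command would be passed in as `status_output`.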
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause
The LastNegotiationCycleSubmittersShareLimit<N> negotiator classad stat attribute did not include submitters hitting share limits in a group-quota scenario.

Consequence
The statistic did not count all submitters hitting limits when accounting groups were in use.

Change
Logic was added to include submitter names in the attribute when a submitter hit its limit in the context of a group quota limit.

Result
The LastNegotiationCycleSubmittersShareLimit<N> attribute now includes submitters hitting submitter limits in all cases, including group quota limits.
Successfully reproduced on:
$CondorVersion: 7.4.5 Feb 4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

Scenario:
config file:
NUM_CPUS = 2
GROUP_NAMES = a, b
GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.5

# echo -e "universe=vanilla\ncmd=/bin/sleep\nargs=1d\n+AccountingGroup=\"a.u3\"\nqueue\n+AccountingGroup=\"a.u1\"\nqueue\n+AccountingGroup=\"a.u2\"\nqueue\n" | runuser condor -s /bin/bash -c "condor_submit"
Submitting job(s)......
6 job(s) submitted to cluster 1.

# condor_q
-- Submitter: hostname : <IP:39251> : hostname
 ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     condor    5/5 17:14   0+00:00:00 I  0   0.0  sleep 1d
 1.1     condor    5/5 17:14   0+00:00:03 R  0   0.0  sleep 1d
 1.2     condor    5/5 17:14   0+00:00:00 I  0   0.0  sleep 1d
3 jobs; 2 idle, 1 running, 0 held

# condor_status -subsystem negotiator -l | grep SubmittersShareLimit
LastNegotiationCycleSubmittersShareLimit0 = ""
LastNegotiationCycleSubmittersShareLimit1 = ""
LastNegotiationCycleSubmittersShareLimit2 = ""
Tested on:
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: I686-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: I686-RedHat_6.0 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.0 $

Scenario:
config file:
NUM_CPUS = 10
GROUP_NAMES = a, b
GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.5

# echo -e "universe=vanilla\ncmd=/bin/sleep\nargs=1d\n+AccountingGroup=\"a.u1\"\nqueue 3" | runuser condor -s /bin/bash -c "condor_submit"
Submitting job(s)...
3 job(s) submitted to cluster 1.

# cat /var/log/condor/NegotiatorLog | grep -e 'using its quota' -e 'exceeded'
05/05/11 17:23:14 Rejected 1.2 a.u1@hostname <IP:50966>: group quota exceeded
05/05/11 17:23:14 Group a is using its quota 3 - halting negotiation

# condor_status -subsystem negotiator -l | grep SubmittersShareLimit
LastNegotiationCycleSubmittersShareLimit0 = "a.u1@hostname"
LastNegotiationCycleSubmittersShareLimit1 = ""

The quota was not reached, and even then this statistic item was filled.

>>> ASSIGNED
(In reply to comment #8)
> | runuser condor -s /bin/bash -c "condor_submit"
> Submitting job(s)...
> 3 job(s) submitted to cluster 1.
>
> # cat /var/log/condor/NegotiatorLog | grep -e 'using its quota' -e 'exceeded'
> 05/05/11 17:23:14 Rejected 1.2 a.u1@hostname <IP:50966>: group quota
> exceeded
> 05/05/11 17:23:14 Group a is using its quota 3 - halting negotiation
>
> # condor_status -subsystem negotiator -l | grep SubmittersShareLimit
> LastNegotiationCycleSubmittersShareLimit0 = "a.u1@hostname"
> LastNegotiationCycleSubmittersShareLimit1 = ""
>
> The quota was not reached, and even then this statistic item was filled.

The behavior you're reporting above is correct: what's happening is that the quota-assignment logic will only assign a group the quota it requests. So group "a" was only requesting 3 slots (via "a.u1"). Therefore the negotiator assigned it a group quota of 3.

In the negotiator loop, "a.u1" was assigned a submitter-limit of 3, which it hit when it got the 3 slots it asked for. So, it emitted the log message 'group quota exceeded' (a misleading log message that really means "I met my submitter limit and group-quotas are enabled") and then also tripped the *real* group quota check ("Group a is using its quota 3 - halting negotiation").

So, "a.u1" did hit its submitter limit, and it is correct that it showed up on LastNegotiationCycleSubmittersShareLimit0.
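The arithmetic in the explanation above can be illustrated with a toy model. This is only a sketch of the reasoning in that comment, not the negotiator's actual code: the function names are hypothetical, truncating the dynamic quota to a whole number of slots is an assumption, and the group is assumed to contain a single submitter whose submitter limit equals the group's effective quota.

```python
def effective_group_quota(total_slots, dynamic_fraction, slots_requested):
    """Quota actually assigned to a group: never more than it requests.

    Assumption: the configured dynamic quota is truncated to whole slots.
    """
    configured = int(total_slots * dynamic_fraction)
    return min(configured, slots_requested)


def negotiate_submitter(submitter_limit, jobs_requested):
    """Return (slots granted, whether the submitter limit was reached)."""
    granted = min(jobs_requested, submitter_limit)
    # The limit counts as "hit" as soon as the submitter holds as many
    # slots as its limit allows, even if every requested job was matched.
    hit_submitter_limit = granted >= submitter_limit
    return granted, hit_submitter_limit


# Scenario from the test: NUM_CPUS = 10, GROUP_QUOTA_DYNAMIC_a = 0.5,
# and a single submitter "a.u1" queuing 3 jobs.
quota = effective_group_quota(total_slots=10, dynamic_fraction=0.5,
                              slots_requested=3)
granted, hit = negotiate_submitter(submitter_limit=quota, jobs_requested=3)
print(quota, granted, hit)
```

In this model the configured quota of 5 slots is never reached, yet the submitter still meets its limit of 3, which is why it is listed in the statistic.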
(In reply to comment #9)
> The behavior you're reporting above is correct: what's happening is that the
> quota-assignment logic will only assign a group the quota it requests. So
> group "a" was only requesting 3 slots (via "a.u1"). Therefore the negotiator
> assigned it a group quota of 3.
>
> In the negotiator loop, "a.u1" was assigned a submitter-limit of 3, which it
> hit when it got the 3 slots it asked for. So, it emitted the log message
> 'group quota exceeded' (a misleading log message that really means "I met my
> submitter limit and group-quotas are enabled") and then also tripped the *real*
> group quota check ("Group a is using its quota 3 - halting negotiation").
>
> So, "a.u1" did hit its submitter limit, and it is correct that it showed up on
> LastNegotiationCycleSubmittersShareLimit0.

Well, I understand that the submitter/group limit is set to a lower value than the possible maximum, but I don't think there should be any rejection in the negotiator log. From my point of view, the job should not be rejected if a slot is still available, and there was one.
Consequently, the statistic should be filled only if a rejection happened; at least that is what I understood after reading Comment 4.
(In reply to comment #10)
> From my point of view the job should not be rejected if there
> is still available slot and there was one.
> Consequent the statistic should be filled only if reject happened, at least I
> thought that, after I read Comment 4.

The semantics for how the negotiator loop sets the internal "rejForSubmitterLimit" flag are not very intuitive -- the LastNegotiationCycleSubmittersShareLimit<N> stats are keyed off of this flag. I consider the negotiator statistic to be correct; however, we should consider an RFE for making the semantics of what the negotiator calls a 'rejection' more intuitive (and, closely related, improving the misleading 'group quota exceeded' message).
Tested on:
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: I686-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: I686-RedHat_6.0 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.0 $

The statistic is working correctly. A new bug, 703905, was created for the problem with the rejection.

>>> VERIFIED
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -8,4 +8,8 @@
 Logic was added to include submitter names in the attribute when a submitter hit its limit in the context of a group quota limit.

 Result
-The LastNegotiationCycleSubmittersShareLimit<N> attribute now includes submitters hitting submitter limits in all cases, including group quota limits.
+The LastNegotiationCycleSubmittersShareLimit<N> attribute now includes submitters hitting submitter limits in all cases, including group quota limits.
+
+Release Notes Entry:
+
+Previously, the LastNegotiationCycleSubmittersShareLimit<N> negotiator classad stat attribute did not account for a submitter reaching the share limits in a group-quota scenario. The negotiator now includes submitter names in the attribute when any submitter reaches the submitter limit, including group quota limits.
Technical note can be viewed in the release notes for 2.0 at the documentation stage here: http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2.0/html-single/MRG_Release_Notes/index.html#tabl-MRG_Release_Notes-GRID_Update_Notes-RHM_Known_Issues
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html