Bug 674669

Summary: RFE: include submitters hitting group limits in Negotiator classad stat LastNegotiationCycleSubmittersShareLimitX
Product: Red Hat Enterprise MRG
Reporter: Erik Erlandson <eerlands>
Component: condor
Assignee: Erik Erlandson <eerlands>
Status: CLOSED ERRATA
QA Contact: Lubos Trilety <ltrilety>
Severity: low
Docs Contact:
Priority: medium
Version: 1.3
CC: claudiol, iboverma, jthomas, ltoscano, ltrilety, matt, mhusnain, mkudlej, whenry
Target Milestone: 2.0
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version: condor-7.5.6-0.1
Doc Type: Enhancement
Doc Text:
Cause: The LastNegotiationCycleSubmittersShareLimit<N> negotiator classad stat attribute did not include submitters hitting share limits in a group-quota scenario.
Consequence: The statistic did not count all submitters hitting limits when accounting groups were in use.
Change: Logic was added to include submitter names in the attribute when a submitter hit its limit in the context of a group quota limit.
Result: The LastNegotiationCycleSubmittersShareLimit<N> attribute now includes submitters hitting submitter limits in all cases, including group quota limits.
Release Notes Entry: Previously, the LastNegotiationCycleSubmittersShareLimit<N> negotiator classad stat attribute did not account for a submitter reaching the share limits in a group-quota scenario. The negotiator now includes submitter names in the attribute when any submitter reaches the submitter limit, including group quota limits.
Last Closed: 2011-06-23 15:38:01 UTC
Bug Depends On: 641431    
Bug Blocks: 693778    
Attachments:
NegotiatorLog (Flags: none)

Description Erik Erlandson 2011-02-02 21:13:55 UTC
Description of problem:

LastNegotiationCycleSubmittersShareLimitX currently counts only submitters that specifically hit the submitter limit. This does not necessarily include submitters that run up against the accounting group limit. The statistic would be more intuitive if it counted the union of these cases.

Comment 1 Erik Erlandson 2011-02-02 21:15:38 UTC
Pushed a branch V7_4-BZ641431-unify-share-group-limits, which updates the
semantics of LastNegotiationCycleSubmittersShareLimitX to include limits imposed
by accounting groups as well as share limits, making the statistic more
intuitive.

Also, I included this update upstream in the latest patch for gt#1393.

Comment 3 Martin Kudlej 2011-03-07 13:21:22 UTC
What are the steps to verify this issue, please?

Comment 4 Erik Erlandson 2011-03-07 15:21:15 UTC
(In reply to comment #3)
> What are the steps to verify this issue, please?

You should be able to verify this by setting up a group with a quota of N and submitting N+1 jobs against it. The submitter should then show up in the LastNegotiationCycleSubmittersShareLimitX list.
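Comment 4's check comes down to reading the LastNegotiationCycleSubmittersShareLimit<N> attributes from `condor_status -subsystem negotiator -l` and looking for the submitter. A minimal Python sketch follows, assuming each attribute is printed on its own line as `Name = "value"` (as in the outputs in Comments 7 and 8), that several submitters in one value are comma-separated, and that index 0 is the most recent negotiation cycle. The helpers `share_limited_submitters` and `check_submitter` are hypothetical names, not part of HTCondor.

```python
import re
import subprocess

# Matches lines such as:  LastNegotiationCycleSubmittersShareLimit0 = "a.u1@hostname"
# Assumption: one attribute per line, value wrapped in double quotes.
ATTR_RE = re.compile(r'^LastNegotiationCycleSubmittersShareLimit(\d+)\s*=\s*"(.*)"\s*$')


def share_limited_submitters(classad_text):
    """Return {cycle index N: [submitter, ...]} parsed from negotiator ad text.

    Hypothetical helper; assumes multiple submitters are comma-separated.
    An empty value ("") yields an empty list for that cycle.
    """
    cycles = {}
    for line in classad_text.splitlines():
        m = ATTR_RE.match(line.strip())
        if m:
            names = [s.strip() for s in m.group(2).split(",") if s.strip()]
            cycles[int(m.group(1))] = names
    return cycles


def check_submitter(submitter, cycle=0):
    """Return True if `submitter` is listed for the given cycle index.

    Requires a running pool and condor_status on PATH; uses the same
    command as the comments in this report.
    """
    out = subprocess.run(
        ["condor_status", "-subsystem", "negotiator", "-l"],
        capture_output=True, text=True, check=True,
    ).stdout
    return submitter in share_limited_submitters(out).get(cycle, [])
```

On a fixed build, after submitting N+1 jobs as `a.u1`, `check_submitter("a.u1@hostname")` would be expected to return True. The parser is kept separate from the `condor_status` call so it can also be run on saved or pasted output.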

Comment 5 Erik Erlandson 2011-04-27 20:51:12 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
The LastNegotiationCycleSubmittersShareLimit<N> negotiator classad stat attribute did not include submitters hitting share limits in a group-quota scenario.

Consequence
The statistic did not count all submitters hitting limits when accounting groups were in use.

Change
Logic was added to include submitter names in the attribute when a submitter hit its limit in the context of a group quota limit.

Result
The LastNegotiationCycleSubmittersShareLimit<N> attribute now includes submitters hitting submitter limits in all cases, including group quota limits.

Comment 7 Lubos Trilety 2011-05-05 15:17:00 UTC
Successfully reproduced on:
$CondorVersion: 7.4.5 Feb  4 2011 BuildID: RH-7.4.5-0.8.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

Scenario:
config file:
NUM_CPUS = 2
GROUP_NAMES = a, b
GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.5

# echo -e "universe=vanilla\ncmd=/bin/sleep\nargs=1d\n+AccountingGroup=\"a.u3\"\nqueue\n+AccountingGroup=\"a.u1\"\nqueue\n+AccountingGroup=\"a.u2\"\nqueue\n" | runuser condor -s /bin/bash -c "condor_submit"
Submitting job(s)......
6 job(s) submitted to cluster 1.

# condor_q
-- Submitter: hostname : <IP:39251> : hostname
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   1.0   condor          5/5  17:14   0+00:00:00 I  0   0.0  sleep 1d          
   1.1   condor          5/5  17:14   0+00:00:03 R  0   0.0  sleep 1d          
   1.2   condor          5/5  17:14   0+00:00:00 I  0   0.0  sleep 1d          

3 jobs; 2 idle, 1 running, 0 held

# condor_status -subsystem negotiator -l | grep SubmittersShareLimit
LastNegotiationCycleSubmittersShareLimit0 = ""
LastNegotiationCycleSubmittersShareLimit1 = ""
LastNegotiationCycleSubmittersShareLimit2 = ""

Comment 8 Lubos Trilety 2011-05-05 15:31:16 UTC
Tested on:
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: I686-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: I686-RedHat_6.0 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.0 $

Scenario:
config file:
NUM_CPUS = 10
GROUP_NAMES = a, b
GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.5

# echo -e "universe=vanilla\ncmd=/bin/sleep\nargs=1d\n+AccountingGroup=\"a.u1\"\nqueue 3" | runuser condor -s /bin/bash -c "condor_submit"
Submitting job(s)...
3 job(s) submitted to cluster 1.

# cat /var/log/condor/NegotiatorLog | grep -e 'using its quota' -e 'exceeded'
05/05/11 17:23:14       Rejected 1.2 a.u1@hostname <IP:50966>: group quota exceeded
05/05/11 17:23:14 Group a is using its quota 3 - halting negotiation

# condor_status -subsystem negotiator -l | grep SubmittersShareLimit
LastNegotiationCycleSubmittersShareLimit0 = "a.u1@hostname"
LastNegotiationCycleSubmittersShareLimit1 = ""


The quota was not reached, yet this statistic was still populated.

>>> ASSIGNED

Comment 9 Erik Erlandson 2011-05-10 22:20:02 UTC
(In reply to comment #8)

> | runuser condor -s /bin/bash -c "condor_submit"
> Submitting job(s)...
> 3 job(s) submitted to cluster 1.
> 
> # cat /var/log/condor/NegotiatorLog | grep -e 'using its quota' -e 'exceeded'
> 05/05/11 17:23:14       Rejected 1.2 a.u1@hostname <IP:50966>: group quota exceeded
> 05/05/11 17:23:14 Group a is using its quota 3 - halting negotiation
> 
> # condor_status -subsystem negotiator -l | grep SubmittersShareLimit
> LastNegotiationCycleSubmittersShareLimit0 = "a.u1@hostname"
> LastNegotiationCycleSubmittersShareLimit1 = ""
> 
> The quota was not reached, yet this statistic was still populated.


The behavior you're reporting above is correct:  what's happening is that the quota-assignment logic will only assign a group the quota it requests.  So group "a" was only requesting 3 slots (via "a.u1").  Therefore the negotiator assigned it a group quota of 3.  

In the negotiator loop, "a.u1" was assigned a submitter-limit of 3, which it hit when it got the 3 slots it asked for.  So, it emitted the log message 'group quota exceeded'  (a misleading log message that really means "I met my submitter limit and group-quotas are enabled") and then also tripped the *real* group quota check ("Group a is using its quota 3 - halting negotiation").

So, "a.u1" did hit its submitter limit, and it is correct that it showed up on 
NegotiationCycleSubmittersShareLimit0.
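To make the reasoning above concrete, here is a toy Python model of the behavior as described in this comment, not the negotiator's actual code: a group's quota is its dynamic share of the pool but never more than it requests, and a submitter counts as having hit its submitter limit once its matches reach that limit, even if free slots remain. The function names are hypothetical; the model assumes one slot per job and a single submitter whose limit equals the group quota. The numbers come from Comment 8 (NUM_CPUS = 10, GROUP_QUOTA_DYNAMIC_a = 0.5, 3 jobs from "a.u1").

```python
from math import floor


def group_quota(total_slots, dynamic_fraction, requested_slots):
    # Toy model: the group's dynamic share of the pool, capped at the
    # number of slots the group actually requests.
    return min(floor(total_slots * dynamic_fraction), requested_slots)


def negotiate_single_submitter(total_slots, dynamic_fraction, idle_jobs):
    """Toy negotiation cycle for one submitter in one group.

    Assumes one slot per job and that, with a single submitter, the
    submitter limit equals the group quota.
    Returns (matched, hit_submitter_limit, free_slots_left).
    """
    quota = group_quota(total_slots, dynamic_fraction, idle_jobs)
    submitter_limit = quota
    matched = min(idle_jobs, submitter_limit)
    # Reaching the limit counts as a hit even though slots are still free.
    hit_limit = submitter_limit > 0 and matched >= submitter_limit
    return matched, hit_limit, total_slots - matched
```

With Comment 8's inputs, `negotiate_single_submitter(10, 0.5, 3)` returns `(3, True, 7)`: the group's quota is capped at 3 rather than its share of 5, so "a.u1" is flagged while 7 of the 10 slots are still free, which matches the situation reported above.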

Comment 10 Lubos Trilety 2011-05-11 12:35:55 UTC
(In reply to comment #9)
> The behavior you're reporting above is correct:  what's happening is that the
> quota-assignment logic will only assign a group the quota it requests.  So
> group "a" was only requesting 3 slots (via "a.u1").  Therefore the negotiator
> assigned it a group quota of 3.  
> 
> In the negotiator loop, "a.u1" was assigned a submitter-limit of 3, which it
> hit when it got the 3 slots it asked for.  So, it emitted the log message
> 'group quota exceeded'  (a misleading log message that really means "I met my
> submitter limit and group-quotas are enabled") and then also tripped the *real*
> group quota check ("Group a is using its quota 3 - halting negotiation").
> 
> So, "a.u1" did hit its submitter limit, and it is correct that it showed up on 
> NegotiationCycleSubmittersShareLimit0.

Well, I understand that the submitter/group limit is set to a lower value than the possible maximum, but I don't think there should be any rejection in the negotiator log. From my point of view, a job should not be rejected while a slot is still available, and one was available here.
Consequently, the statistic should be populated only if a rejection happened; at least, that is what I understood after reading Comment 4.

Comment 11 Erik Erlandson 2011-05-11 15:04:03 UTC
(In reply to comment #10)

> From my point of view, a job should not be rejected while a slot is still
> available, and one was available here.
> Consequently, the statistic should be populated only if a rejection happened;
> at least, that is what I understood after reading Comment 4.

The semantics of how the negotiator loop sets the internal "rejForSubmitterLimit" flag are not very intuitive, and the NegotiationCycleSubmittersShareLimit<N> stats are keyed off this flag.

I consider the negotiator statistic to be correct. However, we should consider an RFE to make the semantics of what the negotiator calls a 'rejection' more intuitive (and, closely related, to improve the misleading 'group quota exceeded' message).

Comment 13 Lubos Trilety 2011-05-11 15:59:14 UTC
Tested on:
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: I686-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: I686-RedHat_6.0 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.0 $

The statistic is working correctly. New bug 703905 was created for the
rejection problem.

>>> VERIFIED

Comment 14 Misha H. Ali 2011-06-01 00:55:41 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -8,4 +8,8 @@
 Logic was added to include submitter names in the attribute when a submitter hit its limit in the context of a group quota limit.
 
 Result
-The LastNegotiationCycleSubmittersShareLimit<N> attribute now includes submitters hitting submitter limits in all cases, including group quota limits.
+The LastNegotiationCycleSubmittersShareLimit<N> attribute now includes submitters hitting submitter limits in all cases, including group quota limits.
+
+Release Notes Entry:
+
+Previously, the LastNegotiationCycleSubmittersShareLimit<N> negotiator classad stat attribute did not account for a submitter reaching the share limits in a group-quota scenario. The negotiator now includes submitter names in the attribute when any submitter reaches the submitter limit, including group quota limits.

Comment 15 Misha H. Ali 2011-06-06 03:30:30 UTC
Technical note can be viewed in the release notes for 2.0 at the documentation stage here:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2.0/html-single/MRG_Release_Notes/index.html#tabl-MRG_Release_Notes-GRID_Update_Notes-RHM_Known_Issues

Comment 16 errata-xmlrpc 2011-06-23 15:38:01 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html