Bug 721110 - RFE: Concurrency limit default grouping
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: 2.2
Target Release: ---
Assignee: Erik Erlandson
QA Contact: Lubos Trilety
URL:
Whiteboard: done
Depends On:
Blocks: 805351 828434
 
Reported: 2011-07-13 18:34 UTC by Scott Spurrier
Modified: 2018-12-01 18:57 UTC (History)

Fixed In Version: condor-7.6.5-0.15
Doc Type: Enhancement
Doc Text:
Cause: Customer wanted to alter the available concurrency limits for jobs on a frequent basis.
Consequence: Frequent negotiator reconfigurations were required, and impacted pool performance.
Change: The negotiator accountant was enhanced to support named groups for scoping multiple concurrency limit defaults based on a limit name prefix.
Result: Concurrency limits can be defined with multiple possible default values based on name prefix, without needing to invoke frequent negotiator reconfigurations.
Clone Of:
: 805351 (view as bug list)
Environment:
Last Closed: 2012-09-19 17:41:33 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Condor | 2863 | 0 | None | None | None | Never
Red Hat Product Errata | RHSA-2012:1278 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Enterprise MRG Grid 2.2 security update | 2012-09-19 21:40:26 UTC

Internal Links: 756226

Description Scott Spurrier 2011-07-13 18:34:23 UTC
Description of problem:
We would like the ability to set a 1-tier grouping of concurrency limit defaults to assist in avoiding a high rate of negotiator reconfigs for concurrency limit adjustments. We envision consuming that as something like: 

CONCURRENCY_LIMIT_DEFAULT=150 
CONCURRENCY_LIMIT_DEFAULT_LIST = direct,batch
CONCURRENCY_LIMIT_DEFAULT_LIST_direct = 40
CONCURRENCY_LIMIT_DEFAULT_LIST_batch = 120 

A concurrency limit of "gl5000" would start with a value of 150. A concurrency limit of "direct.gl5000" would start with a value of 40. A concurrency limit of "batch.gl5000" would start with a value of 120.
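The lookup requested above can be illustrated with a small Python sketch. It models only the syntax proposed in this description, which is not necessarily the syntax that was eventually implemented. The CONFIG dictionary and the proposed_default function are hypothetical names used for illustration and are not part of Condor.

```python
# Hypothetical model of the requested settings; the dictionary mirrors the
# configuration lines proposed in the description.
CONFIG = {
    "CONCURRENCY_LIMIT_DEFAULT": "150",
    "CONCURRENCY_LIMIT_DEFAULT_LIST": "direct,batch",
    "CONCURRENCY_LIMIT_DEFAULT_LIST_direct": "40",
    "CONCURRENCY_LIMIT_DEFAULT_LIST_batch": "120",
}


def proposed_default(limit_name, config=CONFIG):
    """Return the starting value of a limit under the proposed 1-tier grouping."""
    raw_groups = config.get("CONCURRENCY_LIMIT_DEFAULT_LIST", "")
    groups = [g.strip() for g in raw_groups.split(",") if g.strip()]
    # Only the text before the first '.' is treated as a group name.
    prefix, sep, _rest = limit_name.partition(".")
    if sep and prefix in groups:
        return int(config["CONCURRENCY_LIMIT_DEFAULT_LIST_" + prefix])
    # Ungrouped names, and names with an unlisted prefix, use the global default.
    return int(config["CONCURRENCY_LIMIT_DEFAULT"])


for name in ("gl5000", "direct.gl5000", "batch.gl5000"):
    print(name, proposed_default(name))
```

With the values above, the three names resolve to 150, 40 and 120 respectively, matching the behaviour described in the request.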

Comment 8 Erik Erlandson 2012-03-08 20:03:56 UTC
Fix pushed to: UPSTREAM-7.7.6-BZ721110-scoped-default-limits

Comment 9 Erik Erlandson 2012-03-08 20:04:40 UTC
TESTING:

Begin with this configuration, which defines some scoped cc-limit defaults and a traditional cc-limit "TEST":

NEGOTIATOR_DEBUG = D_ACCOUNTANT | D_FULLDEBUG
NEGOTIATOR_INTERVAL = 20
NEGOTIATOR_CYCLE_DELAY = 5

# oversubscribe some slots to make testing easier:
NUM_CPUS = 25

# concurrency limit defaults:
CONCURRENCY_LIMIT_DEFAULT = 1
CONCURRENCY_LIMIT_DEFAULT_small = 2
CONCURRENCY_LIMIT_DEFAULT_medium = 5
CONCURRENCY_LIMIT_DEFAULT_large = 11

TEST_LIMIT = 3

Spin up a pool with this configuration, and submit this job:

universe = vanilla
cmd = /bin/sleep
args = 600
transfer_executable = false
should_transfer_files = if_needed
when_to_transfer_output = on_exit
concurrency_limits = large.test
queue 20
concurrency_limits = medium.test
queue 20
concurrency_limits = small.test
queue 20
concurrency_limits = test
queue 20
concurrency_limits = undef.test
queue 20
concurrency_limits = undef
queue 20

After submission, grep the negotiator log as follows; you should see:

$ tail -f NegotiatorLog | grep -e '-------' -e 'Limits --' -e 'Limit:'
03/06/12 16:28:05 ---------- Started Negotiation Cycle ----------
03/06/12 16:28:06 Previous Limits --
03/06/12 16:28:06 Current Limits --
03/06/12 16:28:06 ---------- Finished Negotiation Cycle ----------
03/06/12 16:28:12 ---------- Started Negotiation Cycle ----------
03/06/12 16:28:13 Previous Limits --
03/06/12 16:28:13 Current Limits --
03/06/12 16:28:13 Concurrency Limit: large.test is 0.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 1.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 2.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 3.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 4.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 5.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 6.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 7.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 8.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 9.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 10.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 11.000000
03/06/12 16:28:13 Concurrency Limit: medium.test is 0.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 1.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 2.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 3.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 4.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 5.000000
03/06/12 16:28:14 Concurrency Limit: small.test is 0.000000
03/06/12 16:28:14 Concurrency Limit: small.test is 1.000000
03/06/12 16:28:14 Concurrency Limit: small.test is 2.000000
03/06/12 16:28:14 Concurrency Limit: test is 0.000000
03/06/12 16:28:14 Concurrency Limit: test is 1.000000
03/06/12 16:28:14 Concurrency Limit: test is 2.000000
03/06/12 16:28:14 Concurrency Limit: test is 3.000000
03/06/12 16:28:14 Concurrency Limit: undef.test is 0.000000
03/06/12 16:28:14 Concurrency Limit: undef.test is 1.000000
03/06/12 16:28:14 Concurrency Limit: undef is 0.000000
03/06/12 16:28:14 Concurrency Limit: undef is 1.000000
03/06/12 16:28:15 ---------- Finished Negotiation Cycle ----------

Another verification that the concurrency limits were obeyed as defined:

$ qvhist ConcurrencyLimits -c 'JobStatus == 2'
     11 large.test
      5 medium.test
      2 small.test
      3 test
      1 undef
      1 undef.test
     23 total

Note: qvhist can be found here: https://github.com/erikerlandson/bash_condor_tools
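The expected counts in this test can be cross-checked with a short Python sketch. It is a model of the test scenario only, not the negotiator's code. The lookup order (an explicit <NAME>_LIMIT first, then the prefix default, then the global default) and the case-insensitive matching of names are assumptions inferred from the output above. CONFIG, QUEUED and effective_limit are hypothetical names.

```python
# Values copied from the test configuration; keys are upper-cased because this
# sketch assumes configuration names are matched case-insensitively.
CONFIG = {
    "CONCURRENCY_LIMIT_DEFAULT": 1,
    "CONCURRENCY_LIMIT_DEFAULT_SMALL": 2,
    "CONCURRENCY_LIMIT_DEFAULT_MEDIUM": 5,
    "CONCURRENCY_LIMIT_DEFAULT_LARGE": 11,
    "TEST_LIMIT": 3,
}

# Number of jobs queued per concurrency limit in the submit file.
QUEUED = {
    "large.test": 20,
    "medium.test": 20,
    "small.test": 20,
    "test": 20,
    "undef.test": 20,
    "undef": 20,
}


def effective_limit(name, config=CONFIG):
    """Resolve a limit name to its value under the assumed lookup order."""
    key = name.upper()
    # 1. An explicitly configured limit such as TEST_LIMIT.
    if key + "_LIMIT" in config:
        return config[key + "_LIMIT"]
    # 2. A scoped default selected by the prefix before the first '.'.
    prefix, sep, _rest = key.partition(".")
    scoped = "CONCURRENCY_LIMIT_DEFAULT_" + prefix
    if sep and scoped in config:
        return config[scoped]
    # 3. The global default.
    return config["CONCURRENCY_LIMIT_DEFAULT"]


# A limit can never have more running jobs than were queued against it.
running = {name: min(count, effective_limit(name)) for name, count in QUEUED.items()}
for name in sorted(running):
    print(running[name], name)
print(sum(running.values()), "total")
```

Under these assumptions the sketch gives 11, 5, 2, 3, 1 and 1 running jobs, 23 in total, which agrees with the qvhist output.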

Comment 10 Erik Erlandson 2012-03-08 20:12:16 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Customer wanted to alter the available concurrency limits for jobs on a frequent basis.

Consequence:
Frequent negotiator reconfigurations were required, and impacted pool performance.

Change:
The negotiator accountant was enhanced to support named groups for scoping multiple concurrency limit defaults based on a limit name prefix.

Result:
Concurrency limits can be defined with multiple possible default values based on name prefix, without needing to invoke frequent negotiator reconfigurations.

Comment 12 Lubos Trilety 2012-04-05 13:44:50 UTC
I ran the scenario from Comment 9 and then ran condor_userprio -l. condor_userprio prints only two concurrency limits, 'undef' and 'test':
# condor_userprio -l | grep -i limit
ConcurrencyLimit_undef = 1.000000
ConcurrencyLimit_test = 3.000000

I would expect all limits in use to be listed, including the new ones with a dot in the name.

Comment 13 Erik Erlandson 2012-04-05 23:37:56 UTC
(In reply to comment #12)
> I run the scenario from Comment 9 and then I run condor_userprio -l.
> condor_userprio prints only two concurrency limits 'undef' and 'test':
> # condor_userprio -l | grep -i limit
> ConcurrencyLimit_undef = 1.000000
> ConcurrencyLimit_test = 3.000000
> 
> I suppose there should be all used limits even those special with point in
> name.

The problem here is interesting: the concurrency limits are advertised by constructing an attribute name ConcurrencyLimit_<name>, but in the case of these new names, there is a '.' in them, which has special meaning in classads.  

The accountant attempts to invoke something like Assign('ConcurrencyLimit_large.test = 3'), but that fails and is ignored, because the '.' is interpreted as a selection operator on 'ConcurrencyLimit_large'.

This is not easy to work around: the classads go over the wire, which means they are unparsed and then reparsed on the receiving end (userprio). Attempting to use any other delimiting character is also going to fail, because the lexer only accepts alphanumerics and '_' in identifiers.

So, we could advertise 'ConcurrencyLimit_large_test' instead of 'ConcurrencyLimit_large.test', but I think anything else would require rethinking how we report concurrency limits to userprio, or somehow leveraging support for single-quoted identifiers from the classad standard.

Comment 14 Erik Erlandson 2012-04-09 17:13:28 UTC
An update that advertises grouped cc-limits with '.' replaced by '_' (e.g. ConcurrencyLimit_large_test) has been committed to UPSTREAM-7.7.6-BZ721110-scoped-default-limits
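The renaming described in this comment can be sketched in Python as follows. This is an illustration only, not the accountant's actual code. The identifier pattern encodes the rule stated in Comment 13 (alphanumerics and '_'); the requirement that the first character not be a digit is an added assumption. advertised_attribute is a hypothetical helper name.

```python
import re

# Assumed identifier rule: letters, digits and '_', not starting with a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def advertised_attribute(limit_name):
    """Build the attribute name advertised for a concurrency limit."""
    # Replace the group separator so the result lexes as a single identifier.
    attr = "ConcurrencyLimit_" + limit_name.replace(".", "_")
    if not IDENTIFIER.match(attr):
        raise ValueError("not a valid attribute name: %r" % attr)
    return attr


print(advertised_attribute("large.test"))
print(advertised_attribute("test"))
```

One consequence of this mapping, at least in the sketch, is that the limit names "large.test" and "large_test" produce the same attribute name, so the original limit name cannot always be recovered from the advertised attribute.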

Comment 16 Lubos Trilety 2012-05-25 06:57:08 UTC
Tested with:
condor-7.6.8-0.1

Tested on:
RHEL5 x86_64,i386  - passed
RHEL6 x86_64,i386  - passed

>>> VERIFIED

Comment 18 Lubos Trilety 2012-06-18 14:27:04 UTC
Tested with:
condor-7.6.5-0.15

Tested on:
RHEL5 x86_64,i386  - passed
RHEL6 x86_64,i386  - passed

>>> VERIFIED

Comment 20 errata-xmlrpc 2012-09-19 17:41:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1278.html

