| Summary: | RFE: Concurrency limit default grouping | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Scott Spurrier <spurrier> | |
| Component: | condor | Assignee: | Erik Erlandson <eerlands> | |
| Status: | CLOSED ERRATA | QA Contact: | Lubos Trilety <ltrilety> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | high | |||
| Version: | 1.3 | CC: | hpetty, iboverma, jneedle, jthomas, ltrilety, matt, mkudlej, tstclair | |
| Target Milestone: | 2.2 | Keywords: | FutureFeature | |
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | done | |||
| Fixed In Version: | condor-7.6.5-0.15 | Doc Type: | Enhancement | |
| Doc Text: |
Cause:
Customer wanted to alter the available concurrency limits for jobs on a frequent basis.
Consequence:
Frequent negotiator reconfigurations were required, and impacted pool performance.
Change:
The negotiator accountant was enhanced to support named groups for scoping multiple concurrency limit defaults based on a limit name prefix.
Result:
Concurrency limits can be defined with multiple possible default values based on name prefix, without needing to invoke frequent negotiator reconfigurations.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 805351 (view as bug list) | Environment: | ||
| Last Closed: | 2012-09-19 17:41:33 UTC | Type: | --- | |
| Bug Depends On: | ||||
| Bug Blocks: | 805351, 828434 | |||
|
Description
Scott Spurrier
2011-07-13 18:34:23 UTC
Fix pushed to: UPSTREAM-7.7.6-BZ721110-scoped-default-limits
TESTING:
Begin with this configuration, which defines some scoped cc-limit defaults and a traditional cc-limit "TEST":
NEGOTIATOR_DEBUG = D_ACCOUNTANT | D_FULLDEBUG
NEGOTIATOR_INTERVAL = 20
NEGOTIATOR_CYCLE_DELAY = 5
# oversubscribe some slots to make testing easier:
NUM_CPUS = 25
# concurrency limit defaults:
CONCURRENCY_LIMIT_DEFAULT = 1
CONCURRENCY_LIMIT_DEFAULT_small = 2
CONCURRENCY_LIMIT_DEFAULT_medium = 5
CONCURRENCY_LIMIT_DEFAULT_large = 11
TEST_LIMIT = 3
Spin up a pool with this configuration, and submit this job:
universe = vanilla
cmd = /bin/sleep
args = 600
transfer_executable = false
should_transfer_files = if_needed
when_to_transfer_output = on_exit
concurrency_limits = large.test
queue 20
concurrency_limits = medium.test
queue 20
concurrency_limits = small.test
queue 20
concurrency_limits = test
queue 20
concurrency_limits = undef.test
queue 20
concurrency_limits = undef
queue 20
After submission, grepping the negotiator log as follows should show:
$ tail -f NegotiatorLog | grep -e '-------' -e 'Limits --' -e 'Limit:'
03/06/12 16:28:05 ---------- Started Negotiation Cycle ----------
03/06/12 16:28:06 Previous Limits --
03/06/12 16:28:06 Current Limits --
03/06/12 16:28:06 ---------- Finished Negotiation Cycle ----------
03/06/12 16:28:12 ---------- Started Negotiation Cycle ----------
03/06/12 16:28:13 Previous Limits --
03/06/12 16:28:13 Current Limits --
03/06/12 16:28:13 Concurrency Limit: large.test is 0.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 1.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 2.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 3.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 4.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 5.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 6.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 7.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 8.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 9.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 10.000000
03/06/12 16:28:13 Concurrency Limit: large.test is 11.000000
03/06/12 16:28:13 Concurrency Limit: medium.test is 0.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 1.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 2.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 3.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 4.000000
03/06/12 16:28:14 Concurrency Limit: medium.test is 5.000000
03/06/12 16:28:14 Concurrency Limit: small.test is 0.000000
03/06/12 16:28:14 Concurrency Limit: small.test is 1.000000
03/06/12 16:28:14 Concurrency Limit: small.test is 2.000000
03/06/12 16:28:14 Concurrency Limit: test is 0.000000
03/06/12 16:28:14 Concurrency Limit: test is 1.000000
03/06/12 16:28:14 Concurrency Limit: test is 2.000000
03/06/12 16:28:14 Concurrency Limit: test is 3.000000
03/06/12 16:28:14 Concurrency Limit: undef.test is 0.000000
03/06/12 16:28:14 Concurrency Limit: undef.test is 1.000000
03/06/12 16:28:14 Concurrency Limit: undef is 0.000000
03/06/12 16:28:14 Concurrency Limit: undef is 1.000000
03/06/12 16:28:15 ---------- Finished Negotiation Cycle ----------
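The log excerpt above can be condensed into one number per limit. The following is a minimal Python sketch, assuming each "Concurrency Limit: <name> is <value>" line reports the usage count for that limit at the moment a job is checked against it, so that the largest value seen is the count in use at the end of the cycle. The function name peak_usage and the SAMPLE excerpt are illustrative only and are not part of condor.

```python
import re

# Matches the negotiator log lines shown above, e.g.
# "03/06/12 16:28:14 Concurrency Limit: small.test is 2.000000"
LIMIT_LINE = re.compile(r"Concurrency Limit: (\S+) is (\d+\.\d+)")


def peak_usage(lines):
    """Return {limit name: largest value logged} (hypothetical helper)."""
    peaks = {}
    for line in lines:
        match = LIMIT_LINE.search(line)
        if match:
            name, value = match.group(1), float(match.group(2))
            peaks[name] = max(peaks.get(name, 0.0), value)
    return peaks


# A few lines copied from the log excerpt above.
SAMPLE = """\
03/06/12 16:28:14 Concurrency Limit: small.test is 0.000000
03/06/12 16:28:14 Concurrency Limit: small.test is 1.000000
03/06/12 16:28:14 Concurrency Limit: small.test is 2.000000
03/06/12 16:28:14 Concurrency Limit: test is 0.000000
03/06/12 16:28:14 Concurrency Limit: test is 1.000000
03/06/12 16:28:14 Concurrency Limit: test is 2.000000
03/06/12 16:28:14 Concurrency Limit: test is 3.000000
03/06/12 16:28:15 ---------- Finished Negotiation Cycle ----------
"""

if __name__ == "__main__":
    for name, peak in peak_usage(SAMPLE.splitlines()).items():
        print(f"{name}: {peak:g}")
```

To run it against a real log, pass the lines of NegotiatorLog to peak_usage instead of SAMPLE.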
Another verification that the concurrency limits were obeyed as defined:
$ qvhist ConcurrencyLimits -c 'JobStatus == 2'
11 large.test
5 medium.test
2 small.test
3 test
1 undef
1 undef.test
23 total
Note: qvhist can be found here: https://github.com/erikerlandson/bash_condor_tools
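The running-job counts above are consistent with a simple lookup order for each limit name. The following is a minimal Python sketch of that order, not the negotiator's actual code. It assumes an explicit <NAME>_LIMIT setting wins, then a CONCURRENCY_LIMIT_DEFAULT_<prefix> setting where the prefix is the part of the limit name before the first '.', then the global CONCURRENCY_LIMIT_DEFAULT. It also assumes configuration names are matched case-insensitively. The helper resolve_limit is hypothetical.

```python
# Settings copied from the test configuration above.
CONFIG = {
    "CONCURRENCY_LIMIT_DEFAULT": 1,
    "CONCURRENCY_LIMIT_DEFAULT_small": 2,
    "CONCURRENCY_LIMIT_DEFAULT_medium": 5,
    "CONCURRENCY_LIMIT_DEFAULT_large": 11,
    "TEST_LIMIT": 3,
}


def resolve_limit(name, config):
    """Hypothetical helper: pick the limit value that applies to `name`."""
    # Assumption: configuration names are compared case-insensitively.
    cfg = {key.lower(): value for key, value in config.items()}

    # 1. An explicit <NAME>_LIMIT setting takes precedence.
    explicit = cfg.get(f"{name.lower()}_limit")
    if explicit is not None:
        return explicit

    # 2. Otherwise use the scoped default for the prefix before the first '.'.
    if "." in name:
        prefix = name.split(".", 1)[0].lower()
        scoped = cfg.get(f"concurrency_limit_default_{prefix}")
        if scoped is not None:
            return scoped

    # 3. Fall back to the global default.
    return cfg["concurrency_limit_default"]


NAMES = ["large.test", "medium.test", "small.test", "test", "undef.test", "undef"]
RESOLVED = {name: resolve_limit(name, CONFIG) for name in NAMES}

if __name__ == "__main__":
    for name, value in RESOLVED.items():
        print(f"{value:2d} {name}")
    print(f"{sum(RESOLVED.values()):2d} total")
```

Under these assumptions the resolved values add up to the 23 running jobs reported by qvhist.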
I ran the scenario from Comment 9 and then ran condor_userprio -l. It prints only two concurrency limits, 'undef' and 'test':
# condor_userprio -l | grep -i limit
ConcurrencyLimit_undef = 1.000000
ConcurrencyLimit_test = 3.000000
I would expect all limits in use to be listed, including the new ones with a '.' in the name.
(In reply to comment #12)
The problem here is interesting: the concurrency limits are advertised by constructing an attribute name ConcurrencyLimit_<name>, but the new names contain a '.', which has special meaning in classads. The accountant attempts to invoke something like Assign('ConcurrencyLimit_large.test = 3'), but that fails and is ignored, because the '.' is interpreted as a selection operator on 'ConcurrencyLimit_large'.
It's not easy to get around, because the classads go over the wire, which means they are unparsed and then reparsed on the receiving end (userprio). Any other delimiting character is going to fail as well, because the lexer only accepts alphanumerics and '_' in identifiers.
So, we could advertise 'ConcurrencyLimit_large_test' instead of 'ConcurrencyLimit_large.test', but I think anything else would require rethinking how we report concurrency limits to userprio, or somehow leveraging support for single-quoted identifiers from the classad standard.
Update that advertises grouped cc-limits by replacing '.' with '_' (e.g. ConcurrencyLimit_large_test), committed to UPSTREAM-7.7.6-BZ721110-scoped-default-limits
Tested with:
condor-7.6.8-0.1
Tested on:
RHEL5 x86_64,i386 - passed
RHEL6 x86_64,i386 - passed
>>> VERIFIED
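For reference, the attribute renaming described in the update above (replacing '.' with '_') can be sketched in a few lines of Python. This is an illustration only and not the accountant's code: the function advertised_attr and the IDENTIFIER pattern are hypothetical, and the pattern simply encodes the statement above that identifiers may contain only alphanumerics and '_'.

```python
import re

# Assumption taken from the discussion above: an attribute name may contain
# only alphanumerics and '_' (here also required not to start with a digit).
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def advertised_attr(limit_name):
    """Hypothetical helper: build the attribute name for a limit."""
    return "ConcurrencyLimit_" + limit_name.replace(".", "_")


if __name__ == "__main__":
    for name in ["large.test", "test", "undef.test"]:
        attr = advertised_attr(name)
        print(f"{name} -> {attr} (valid: {bool(IDENTIFIER.match(attr))})")
```

One consequence of this mapping is that two distinct limit names such as large.test and large_test would produce the same attribute name.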
Tested with:
condor-7.6.5-0.15
Tested on:
RHEL5 x86_64,i386 - passed
RHEL6 x86_64,i386 - passed
>>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-1278.html