| Summary: | RFE: advertise the accounting group that a running job matched under on the resource ad | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Erik Erlandson <eerlands> | |
| Component: | condor | Assignee: | Erik Erlandson <eerlands> | |
| Status: | CLOSED ERRATA | QA Contact: | Lubos Trilety <ltrilety> | |
| Severity: | low | Docs Contact: | ||
| Priority: | high | |||
| Version: | 2.2 | CC: | esammons, iboverma, ltoscano, ltrilety, matt, mkudlej, rrati, tstclair | |
| Target Milestone: | 2.3 | Keywords: | FutureFeature | |
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | condor-7.8.2-0.1 | Doc Type: | Enhancement | |
| Doc Text: |
Cause:
Customers were interested in configuring job preemption policies that are aware of whether a job negotiated via autoregroup, and of the particular group name a job negotiated under.
Consequence:
Negotiating job and resource ads did not provide sufficient information to allow group-aware or negotiation-aware preemption policies.
Change:
The negotiator, schedd and startd claiming process was enhanced to maintain new fields RemoteAutoregroup, RemoteNegotiatingGroup and RemoteGroup, and their Submitter counterparts.
Result:
New job and resource attributes are available for use in defining PREEMPTION_REQUIREMENTS expressions that are group-aware and negotiation-aware.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 872763 876584 (view as bug list) | Environment: | ||
| Last Closed: | 2013-03-06 18:43:00 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 850563, 876584 | |||
|
Description
Erik Erlandson
2012-03-15 21:35:17 UTC
Testing this enhancement will consist of using the new attribute in PREEMPTION_REQUIREMENTS to verify that it is available and functions as expected in a preemption policy.
Pushed to UPSTREAM-7.7.6-BZ803897-advertise-negotiated-group
First test for proposed patch: using RemoteAutoregroup
In the following, I use 'svhist', which can be found here:
https://github.com/erikerlandson/bash_condor_tools
Begin with the following configuration, which sets a preemption policy (PREEMPTION_REQUIREMENTS = RemoteAutoregroup), in other words "jobs negotiating under the autoregroup phase are considered for preemption":
NEGOTIATOR_DEBUG = D_FULLDEBUG | D_MACHINE
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15
# turn off round robin and multiple allocation rounds
HGQ_ROUND_ROBIN_RATE = 100000000
HGQ_MAX_ALLOCATION_ROUNDS = 1
# make sure preemption can occur without friction
MAXJOBRETIREMENTTIME = 0
CLAIM_WORKLIFE = 0
PREEMPT = False
RANK = 0
NEGOTIATOR_CONSIDER_PREEMPTION = TRUE
# set a preemption requirements expression to test new attributes:
PREEMPTION_REQUIREMENTS = RemoteAutoregroup
NUM_CPUS = 20
GROUP_NAMES = a, b
GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5
GROUP_AUTOREGROUP = TRUE
GROUP_ACCEPT_SURPLUS = FALSE
GROUP_SORT_EXPR = GroupResourcesInUse / (1.0 + GroupQuota)
Set the prio factor for "a.user" and "b.user" to 10, setting them up for preemption:
$ condor_userprio -setfactor a.user@localdomain 10
The priority factor of a.user@localdomain was set to 10.000000
$ condor_userprio -setfactor b.user@localdomain 10
The priority factor of b.user@localdomain was set to 10.000000
Submit the following jobs to groups "a" and "b":
universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.user"
queue 10
+AccountingGroup="b.user"
queue 10
Let those jobs begin running.
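As a rough illustration of what the policy PREEMPTION_REQUIREMENTS = RemoteAutoregroup selects, here is a minimal Python sketch. It is a toy model, not HTCondor code: the slot dictionaries and their values are hypothetical stand-ins for resource ads, and it only evaluates the expression itself, ignoring user priorities and every other factor the negotiator weighs. A missing attribute is treated as "not true", on the assumption that an undefined expression does not satisfy the requirement.

```python
# Hypothetical stand-ins for slot (resource) ads; values are made up.
slots = [
    {"AccountingGroup": "a.user@localdomain", "RemoteAutoregroup": False},
    {"AccountingGroup": "a.user@localdomain", "RemoteAutoregroup": True},
    {"AccountingGroup": "b.user@localdomain", "RemoteAutoregroup": False},
    {"AccountingGroup": "b.user@localdomain", "RemoteAutoregroup": True},
    {"AccountingGroup": "c.user@localdomain"},  # attribute not advertised
]


def preemption_requirements(slot):
    """Toy mirror of PREEMPTION_REQUIREMENTS = RemoteAutoregroup."""
    # Only an explicit True qualifies; a missing attribute does not.
    return slot.get("RemoteAutoregroup") is True


# Slots whose running job matched during the autoregroup phase.
candidates = [s["AccountingGroup"] for s in slots if preemption_requirements(s)]
print(candidates)
```

Only the slots whose running job was matched through autoregroup show up as preemption candidates; jobs matched under their own group's quota are left alone.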
Check the behavior of RemoteAutoregroup and RemoteNegotiatingGroup:
$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup
5 false | a | a.user@localdomain
5 false | b | b.user@localdomain
5 true | <none> | a.user@localdomain
5 true | <none> | b.user@localdomain
20 total
From the above we see that autoregroup allowed "a.user" and "b.user" to run their remaining jobs above quota. For those jobs, RemoteAutoregroup is true, and RemoteNegotiatingGroup is "<none>", which is the correct behavior.
Now submit the following jobs. These will negotiate under group "<none>":
universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="none.user"
queue 10
Wait for the negotiator to cycle, then check that these new jobs are able to preempt "a" and "b" jobs that negotiated under "<none>" previously (but not the jobs that negotiated under their respective groups):
$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup
5 false | a | a.user@localdomain
5 false | b | b.user@localdomain
10 true | <none> | none.user@localdomain
20 total
Test 2: demonstrating RemoteNegotiatingGroup (and also RemoteGroup):
Using the same configuration as with the first test, above, except for a different preemption policy:
# jobs negotiating in their group can preempt jobs that negotiated outside their group
PREEMPTION_REQUIREMENTS = (SubmitterNegotiatingGroup == SubmitterGroup) && (RemoteNegotiatingGroup != RemoteGroup)
This time, alter the prio factors for "b.user" and "none.user":
$ condor_userprio -setfactor b.user@localdomain 10
The priority factor of b.user@localdomain was set to 10.000000
$ condor_userprio -setfactor none.user@localdomain 10
The priority factor of none.user@localdomain was set to 10.000000
Next, submit the following jobs, where both "b.user" and "none.user" will acquire some slots by negotiating under "<none>":
universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="none.user"
queue 10
+AccountingGroup="b.user"
queue 10
Verify that b.user got 5 of its slots from "b" via quota, and another 5 via autoregroup. Observe that RemoteGroup shows that group "none" maps to "<none>".
$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup RemoteGroup
5 false | b | b.user@localdomain | b
5 true | <none> | b.user@localdomain | b
10 true | <none> | none.user@localdomain | <none>
20 total
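For readers without svhist at hand, the output above reads as a histogram: one row per distinct combination of attribute values, prefixed by the number of slots sharing it. The Python sketch below reproduces that tally on in-memory data. It is an approximation based only on the output shown here; the function name `tally`, the input dictionaries, and the "undefined" placeholder for a missing attribute are all assumptions, and svhist's actual implementation may differ.

```python
from collections import Counter


def tally(ads, attrs):
    """Count ads per distinct combination of the requested attributes."""
    return Counter(
        tuple(str(ad.get(name, "undefined")) for name in attrs) for ad in ads
    )


def make(autoregroup, negotiating_group, accounting_group, group, n):
    """Build n identical hypothetical slot ads."""
    ad = {
        "RemoteAutoregroup": autoregroup,
        "RemoteNegotiatingGroup": negotiating_group,
        "AccountingGroup": accounting_group,
        "RemoteGroup": group,
    }
    return [dict(ad) for _ in range(n)]


# Hypothetical slot ads mirroring the three rows shown above.
ads = (
    make("false", "b", "b.user@localdomain", "b", 5)
    + make("true", "<none>", "b.user@localdomain", "b", 5)
    + make("true", "<none>", "none.user@localdomain", "<none>", 10)
)

attrs = ["RemoteAutoregroup", "RemoteNegotiatingGroup", "AccountingGroup", "RemoteGroup"]
counts = tally(ads, attrs)
for values, n in counts.items():
    print(n, " | ".join(values))
print(sum(counts.values()), "total")
```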
Now submit some jobs for "a.user" - these will have to preempt to run:
universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.user"
queue 10
Let these new jobs have a chance to negotiate. Now we verify that "a.user" was able to preempt the "b.user" jobs that negotiated in autoregroup ("<none>"), since their group name ("b") is not their negotiating-group name. Jobs negotiated under "b" are not preempted. Jobs from "none.user" are also not preempted, because they negotiated under their own group "<none>":
$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup RemoteGroup
5 false | a | a.user@localdomain | a
5 false | b | b.user@localdomain | b
10 true | <none> | none.user@localdomain | <none>
20 total
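The Test 2 outcome can be checked against a small Python model of the policy expression. This is only a sketch under stated assumptions: the rows are copied from the svhist output taken before "a.user" submitted, the tuple layout and function name are invented for illustration, and the model evaluates the PREEMPTION_REQUIREMENTS expression alone, without the priority comparison or any other negotiator logic.

```python
NONE = "<none>"

# (slot count, RemoteNegotiatingGroup, RemoteGroup, owner), copied from the
# svhist output taken before a.user submitted its jobs.
running = [
    (5, "b", "b", "b.user"),
    (5, NONE, "b", "b.user"),
    (10, NONE, NONE, "none.user"),
]


def preemption_requirements(sub_neg_group, sub_group, rem_neg_group, rem_group):
    """Toy mirror of:
    (SubmitterNegotiatingGroup == SubmitterGroup) &&
    (RemoteNegotiatingGroup != RemoteGroup)
    """
    return (sub_neg_group == sub_group) and (rem_neg_group != rem_group)


# a.user is assumed to negotiate under its own group "a".
preemptable = [
    (count, owner)
    for count, neg_group, group, owner in running
    if preemption_requirements("a", "a", neg_group, group)
]
print(preemptable)  # [(5, 'b.user')]
```

Only the five "b.user" slots obtained through autoregroup satisfy the expression, which matches the five slots "a.user" ends up holding in the final svhist output.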
Tested on RHEL 5.9/6.4 x i386/x86_64 with condor-7.8.8-0.4.1 and it works.
-->VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2013-0564.html |