Description of problem: Knowing the accounting group that a running job was matched under, for example whether a job matched under its "own" group or under "none" during autoregroup, would enable preemption policies desired by some customers.
This enhancement will be tested by using the new attribute in PREEMPTION_REQUIREMENTS to verify that it is available and functions as expected in a preemption policy.
pushed to UPSTREAM-7.7.6-BZ803897-advertise-negotiated-group
First test for proposed patch: using RemoteAutoregroup

In the following, I use 'svhist' which can be found here:
https://github.com/erikerlandson/bash_condor_tools

Begin with the following configuration, which sets a preemption policy (PREEMPTION_REQUIREMENTS = RemoteAutoregroup), in other words "jobs negotiating under the autoregroup phase are considered for preemption":

NEGOTIATOR_DEBUG = D_FULLDEBUG | D_MACHINE
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15

# turn off round robin and multiple allocation rounds
HGQ_ROUND_ROBIN_RATE = 100000000
HGQ_MAX_ALLOCATION_ROUNDS = 1

# make sure preemption can occur without friction
MAXJOBRETIREMENTTIME = 0
CLAIM_WORKLIFE = 0
PREEMPT = False
RANK = 0
NEGOTIATOR_CONSIDER_PREEMPTION = TRUE

# set a preemption requirements expression to test new attributes:
PREEMPTION_REQUIREMENTS = RemoteAutoregroup

NUM_CPUS = 20
GROUP_NAMES = a, b
GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5
GROUP_AUTOREGROUP = TRUE
GROUP_ACCEPT_SURPLUS = FALSE
GROUP_SORT_EXPR = GroupResourcesInUse / (1.0 + GroupQuota)

Set the prio factor for "a.user" and "b.user" to 10, setting them up for preemption:

$ condor_userprio -setfactor a.user@localdomain 10
The priority factor of a.user@localdomain was set to 10.000000
$ condor_userprio -setfactor b.user@localdomain 10
The priority factor of b.user@localdomain was set to 10.000000

Submit the following jobs to groups "a" and "b":

universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.user"
queue 10
+AccountingGroup="b.user"
queue 10

Let those jobs begin running.
Check the behavior of RemoteAutoregroup and RemoteNegotiatingGroup:

$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup
      5 false | a | a.user@localdomain
      5 false | b | b.user@localdomain
      5 true | <none> | a.user@localdomain
      5 true | <none> | b.user@localdomain
     20 total

From the above we see that autoregroup allowed "a.user" and "b.user" to run their remaining jobs above quota. For those jobs, RemoteAutoregroup is true, and RemoteNegotiatingGroup is "<none>", which is the correct behavior.

Now submit the following jobs. These will negotiate under group "<none>":

universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="none.user"
queue 10

Wait for the negotiator to cycle, then check that these new jobs are able to preempt "a" and "b" jobs that negotiated under "<none>" previously (but not the jobs that negotiated under their respective groups):

$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup
      5 false | a | a.user@localdomain
      5 false | b | b.user@localdomain
     10 true | <none> | none.user@localdomain
     20 total
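
For readers who want to reason about the expected outcome of the first test without a running pool, here is a minimal Python sketch. It is a toy model only, not HTCondor matchmaking: it ignores user priorities, slot weights and negotiation order. The slot records, the helper names (make_slots, histogram, preemptible, preempt_with) and the assumption that svhist simply tallies distinct attribute combinations are all illustrative and do not come from the patch.

```python
from collections import Counter

ATTRS = ("RemoteAutoregroup", "RemoteNegotiatingGroup", "AccountingGroup")


def make_slots():
    """Hypothetical slot records mirroring the first svhist table above."""
    slots = []
    for group in ("a", "b"):
        user = f"{group}.user@localdomain"
        for _ in range(5):
            # matched under the group's own quota
            slots.append({"RemoteAutoregroup": False,
                          "RemoteNegotiatingGroup": group,
                          "AccountingGroup": user})
        for _ in range(5):
            # matched above quota during the autoregroup phase
            slots.append({"RemoteAutoregroup": True,
                          "RemoteNegotiatingGroup": "<none>",
                          "AccountingGroup": user})
    return slots


def histogram(slots, attrs=ATTRS):
    """Tally distinct attribute combinations (assumed svhist-like behavior)."""
    return Counter(tuple(slot[a] for a in attrs) for slot in slots)


def preemptible(slot):
    """Toy stand-in for PREEMPTION_REQUIREMENTS = RemoteAutoregroup."""
    return slot["RemoteAutoregroup"]


def preempt_with(slots, new_job, count):
    """Replace up to `count` preemptible slots with copies of `new_job`."""
    result = []
    for slot in slots:
        if count > 0 and preemptible(slot):
            result.append(dict(new_job))
            count -= 1
        else:
            result.append(slot)
    return result


def fmt(value):
    """Render booleans in lowercase, as in the svhist output."""
    return str(value).lower() if isinstance(value, bool) else value


before = make_slots()
none_job = {"RemoteAutoregroup": True,
            "RemoteNegotiatingGroup": "<none>",
            "AccountingGroup": "none.user@localdomain"}
after = preempt_with(before, none_job, 10)

for label, slots in (("before", before), ("after", after)):
    print(label)
    for key, n in sorted(histogram(slots).items()):
        print(f"{n:7d} " + " | ".join(fmt(v) for v in key))
    print(f"{len(slots):7d} total")
```

In this model only the ten slots with RemoteAutoregroup set to true are candidates, so the ten "none.user" jobs displace exactly those and leave the in-quota jobs of "a" and "b" alone.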
Test 2: demonstrating RemoteNegotiatingGroup (and also RemoteGroup):

Using the same configuration as with the first test, above, except for a different preemption policy:

# jobs negotiating in their group can preempt jobs that negotiated outside their group
PREEMPTION_REQUIREMENTS = (SubmitterNegotiatingGroup == SubmitterGroup) && (RemoteNegotiatingGroup != RemoteGroup)

This time, alter the prio factors for "b.user" and "none.user":

$ condor_userprio -setfactor b.user@localdomain 10
The priority factor of b.user@localdomain was set to 10.000000
$ condor_userprio -setfactor none.user@localdomain 10
The priority factor of none.user@localdomain was set to 10.000000

Next, submit the following jobs, where both "b.user" and "none.user" will acquire some slots by negotiating under "<none>":

universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="none.user"
queue 10
+AccountingGroup="b.user"
queue 10

Verify that b.user got 5 of its slots from "b" via quota, and another 5 via autoregroup. Observe that RemoteGroup shows that group "none" maps to "<none>".

$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup RemoteGroup
      5 false | b | b.user@localdomain | b
      5 true | <none> | b.user@localdomain | b
     10 true | <none> | none.user@localdomain | <none>
     20 total

Now submit some jobs for "a.user". These will have to preempt to run:

universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.user"
queue 10

Let these new jobs have a chance to negotiate. Now we verify that "a.user" was able to preempt the "b.user" jobs that negotiated in autoregroup ("<none>"), since their group name ("b") is not their negotiating-group name ("<none>"). Jobs negotiated under "b" are not preempted. Jobs from "none.user" are also not preempted because they negotiated under their group "<none>":

$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup RemoteGroup
      5 false | a | a.user@localdomain | a
      5 false | b | b.user@localdomain | b
     10 true | <none> | none.user@localdomain | <none>
     20 total
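
The second policy can be checked by hand with a small Python sketch. This is only an illustration of the boolean logic in the PREEMPTION_REQUIREMENTS expression from Test 2, under stated assumptions: the dictionaries are hypothetical stand-ins for the submitter and slot attributes, policy_allows is an invented helper name, the Submitter values for "a.user" are inferred from the Remote values it shows once running, and ClassAd details such as undefined values are ignored.

```python
def policy_allows(submitter, remote):
    """Toy evaluation of the Test 2 expression:
    (SubmitterNegotiatingGroup == SubmitterGroup) &&
    (RemoteNegotiatingGroup != RemoteGroup)
    """
    return (submitter["SubmitterNegotiatingGroup"] == submitter["SubmitterGroup"]
            and remote["RemoteNegotiatingGroup"] != remote["RemoteGroup"])


# Running jobs, taken from the svhist table before "a.user" submits.
running = {
    "b.user in quota": {"RemoteNegotiatingGroup": "b", "RemoteGroup": "b"},
    "b.user autoregroup": {"RemoteNegotiatingGroup": "<none>", "RemoteGroup": "b"},
    "none.user": {"RemoteNegotiatingGroup": "<none>", "RemoteGroup": "<none>"},
}

# Assumed submitter-side values for "a.user" in the two negotiation phases.
a_in_group = {"SubmitterNegotiatingGroup": "a", "SubmitterGroup": "a"}
a_autoregroup = {"SubmitterNegotiatingGroup": "<none>", "SubmitterGroup": "a"}

eligible = {name: policy_allows(a_in_group, ad) for name, ad in running.items()}
for name, ok in eligible.items():
    print(f"{name}: {'preemptible' if ok else 'protected'}")
```

Only the "b.user" jobs that matched during autoregroup satisfy both halves of the expression. If "a.user" were itself negotiating under "<none>" (the a_autoregroup case), the first half would fail and nothing would be preempted, which is consistent with "a.user" gaining only 5 slots in the final table.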
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Customers are interested in configuring job preemption policies that are aware of whether a job negotiated via autoregroup, and of which group name a job negotiated under.
Consequence: Negotiating job and resource ads did not provide sufficient information to allow group-aware or negotiation-aware preemption policies.
Change: The negotiator, schedd and startd claiming process was enhanced to maintain the new fields RemoteAutoregroup, RemoteNegotiatingGroup and RemoteGroup, and their Submitter counterparts.
Result: New job and resource attributes are available for use in defining PREEMPTION_REQUIREMENTS expressions that are group-aware and negotiation-aware.
Tested on RHEL 5.9/6.4 x i386/x86_64 with condor-7.8.8-0.4.1 and it works. -->VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0564.html