Bug 803897 - RFE: advertise the accounting group that a running job matched under on the resource ad
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.2
Hardware: All Linux
Priority: high Severity: low
Target Milestone: 2.3
Target Release: ---
Assigned To: Erik Erlandson
QA Contact: Lubos Trilety
Keywords: FutureFeature
Depends On:
Blocks: 850563 876584
Reported: 2012-03-15 17:35 EDT by Erik Erlandson
Modified: 2013-03-06 13:43 EST

See Also:
Fixed In Version: condor-7.8.2-0.1
Doc Type: Enhancement
Doc Text:
Cause: Customers were interested in configuring job preemption policies that are aware of whether a job negotiated via autoregroup, and of the particular group name a job negotiated under. Consequence: Negotiating job and resource ads did not provide sufficient information to allow group-aware or negotiation-aware preemption policies. Change: The negotiator, schedd and startd claiming process was enhanced to maintain new fields RemoteAutoregroup, RemoteNegotiatingGroup and RemoteGroup, and their Submitter counterparts. Result: New job and resource attributes are available for use in defining PREEMPTION_REQUIREMENTS expressions that are group-aware and negotiation-aware.
Story Points: ---
Clone Of:
: 872763 876584
Environment:
Last Closed: 2013-03-06 13:43:00 EST


External Trackers
Tracker ID Priority Status Summary Last Updated
Condor 2885 None None None Never

Description Erik Erlandson 2012-03-15 17:35:17 EDT
Description of problem:

Knowing the accounting group that a running job was matched under, for example whether a job matched under its "own" group or under "none" during autoregroup, would enable preemption policies desired by some customers.
Comment 2 Erik Erlandson 2012-03-19 12:52:42 EDT
To test this enhancement, use the new attribute in a PREEMPTION_REQUIREMENTS expression and verify that it is available and functions as expected in a preemption policy.
Comment 3 Erik Erlandson 2012-03-30 13:47:27 EDT
pushed to UPSTREAM-7.7.6-BZ803897-advertise-negotiated-group
Comment 4 Erik Erlandson 2012-03-30 13:49:32 EDT
First test for proposed patch: using RemoteAutoregroup

In the following, I use 'svhist' which can be found here: https://github.com/erikerlandson/bash_condor_tools
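
The implementation of svhist is not reproduced here. Judging only by the output shown later in this comment, it appears to tally slots by the values of the requested attributes. A minimal Python sketch of that kind of tally, assuming the slot ads are already available as dictionaries (the slot_value_histogram name, the dict input, and the sample ads are hypothetical and not part of bash_condor_tools):

```python
from collections import Counter

def slot_value_histogram(slot_ads, attrs):
    """Count slots by the tuple of values they hold for attrs (hypothetical helper)."""
    counts = Counter(
        " | ".join(str(ad.get(a, "undefined")) for a in attrs) for ad in slot_ads
    )
    # One line per distinct value combination, then a grand total.
    lines = [f"{n:7d} {key}" for key, n in sorted(counts.items())]
    lines.append(f"{sum(counts.values()):7d} total")
    return lines

# Hypothetical slot ads, for illustration only.
ads = (
    [{"RemoteAutoregroup": "false", "RemoteNegotiatingGroup": "a"}] * 5
    + [{"RemoteAutoregroup": "true", "RemoteNegotiatingGroup": "<none>"}] * 5
)
for line in slot_value_histogram(ads, ["RemoteAutoregroup", "RemoteNegotiatingGroup"]):
    print(line)
```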

Begin with the following configuration, which sets the preemption policy PREEMPTION_REQUIREMENTS = RemoteAutoregroup; in other words, "jobs that negotiated during the autoregroup phase are considered for preemption":

NEGOTIATOR_DEBUG = D_FULLDEBUG | D_MACHINE
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15

# turn off round robin and multiple allocation rounds
HGQ_ROUND_ROBIN_RATE = 100000000
HGQ_MAX_ALLOCATION_ROUNDS = 1

# make sure preemption can occur without friction
MAXJOBRETIREMENTTIME = 0
CLAIM_WORKLIFE = 0
PREEMPT = False
RANK = 0
NEGOTIATOR_CONSIDER_PREEMPTION = TRUE

# set a preemption requirements expression to test new attributes:
PREEMPTION_REQUIREMENTS = RemoteAutoregroup

NUM_CPUS = 20
GROUP_NAMES = a, b

GROUP_QUOTA_a = 5
GROUP_QUOTA_b = 5

GROUP_AUTOREGROUP = TRUE
GROUP_ACCEPT_SURPLUS = FALSE

GROUP_SORT_EXPR = GroupResourcesInUse / (1.0 + GroupQuota)
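
As a rough illustration of the GROUP_SORT_EXPR above, the following Python sketch computes the same ratio for two groups. It assumes that the negotiator considers groups in ascending order of this value; the usage figures are hypothetical, and only the quotas of 5 come from the configuration. The 1.0 in the denominator keeps the expression defined for a group with a quota of zero.

```python
def group_sort_key(resources_in_use, quota):
    # Mirrors GROUP_SORT_EXPR = GroupResourcesInUse / (1.0 + GroupQuota)
    return resources_in_use / (1.0 + quota)

# Hypothetical usage figures: group -> (resources in use, quota).
groups = {"a": (5, 5), "b": (2, 5)}

# Assumption: the group with the smallest value is considered first.
order = sorted(groups, key=lambda g: group_sort_key(*groups[g]))
print(order)
```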

Set the prio factor for "a.user" and "b.user" to 10, setting them up for preemption:

$ condor_userprio -setfactor a.user@localdomain 10
The priority factor of a.user@localdomain was set to 10.000000
$ condor_userprio -setfactor b.user@localdomain 10
The priority factor of b.user@localdomain was set to 10.000000

Submit the following jobs to groups "a" and "b":

universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.user"
queue 10
+AccountingGroup="b.user"
queue 10

Let those jobs begin running. Check the behavior of RemoteAutoregroup and RemoteNegotiatingGroup:

$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup
      5 false | a | a.user@localdomain
      5 false | b | b.user@localdomain
      5 true | <none> | a.user@localdomain
      5 true | <none> | b.user@localdomain
     20 total

From the above we see that autoregroup allowed "a.user" and "b.user" to run their remaining jobs above quota. For those jobs, RemoteAutoregroup is true and RemoteNegotiatingGroup is "<none>", which is the correct behavior.

Now submit the following jobs. These will negotiate under group "<none>":

universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="none.user"
queue 10

Wait for the negotiator to cycle, then check that these new jobs are able to preempt "a" and "b" jobs that negotiated under "<none>" previously (but not the jobs that negotiated under their respective groups):

$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup
      5 false | a | a.user@localdomain
      5 false | b | b.user@localdomain
     10 true | <none> | none.user@localdomain
     20 total
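
The outcome of this first test can be checked by hand with a small Python sketch. It applies a stand-in for PREEMPTION_REQUIREMENTS = RemoteAutoregroup to the slot counts from the first svhist output in this comment. This is a simplification that ignores everything else the negotiator does (priorities, rank, and so on); the function name is illustrative only.

```python
# Slot state copied from the first svhist output in this comment:
# (count, RemoteAutoregroup, RemoteNegotiatingGroup, AccountingGroup)
before = [
    (5, False, "a", "a.user@localdomain"),
    (5, False, "b", "b.user@localdomain"),
    (5, True, "<none>", "a.user@localdomain"),
    (5, True, "<none>", "b.user@localdomain"),
]

def preemption_requirements(remote_autoregroup):
    # Stand-in for: PREEMPTION_REQUIREMENTS = RemoteAutoregroup
    return remote_autoregroup

preemptable = sum(n for n, auto, _, _ in before if preemption_requirements(auto))
protected = sum(n for n, auto, _, _ in before if not preemption_requirements(auto))
print(preemptable, protected)
```

The 10 preemptable slots correspond to the 10 "none.user" jobs in the second svhist output, and the 10 protected slots to the "a" and "b" jobs that negotiated under their own groups.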
Comment 5 Erik Erlandson 2012-03-30 13:49:55 EDT
Test 2: demonstrating RemoteNegotiatingGroup (and also RemoteGroup):

Use the same configuration as in the first test above, except with a different preemption policy:

# jobs negotiating in their group can preempt jobs that negotiated outside their group
PREEMPTION_REQUIREMENTS = (SubmitterNegotiatingGroup == SubmitterGroup) && (RemoteNegotiatingGroup != RemoteGroup)

This time, alter the prio factors for "b.user" and "none.user":

$ condor_userprio -setfactor b.user@localdomain 10
The priority factor of b.user@localdomain was set to 10.000000
$ condor_userprio -setfactor none.user@localdomain 10
The priority factor of none.user@localdomain was set to 10.000000

Next, submit the following jobs, where both "b.user" and "none.user" will acquire some slots by negotiating under "<none>":

universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="none.user"
queue 10
+AccountingGroup="b.user"
queue 10

Verify that b.user got 5 of its slots from "b" via quota, and another 5 via autoregroup. Observe that RemoteGroup shows that group "none" maps to "<none>".

$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup RemoteGroup
      5 false | b | b.user@localdomain | b
      5 true | <none> | b.user@localdomain | b
     10 true | <none> | none.user@localdomain | <none>
     20 total

Now submit some jobs for "a.user" - these will have to preempt to run:

universe = vanilla
cmd = /bin/sleep
args = 600
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="a.user"
queue 10

Let these new jobs have a chance to negotiate. Now verify that "a.user" was able to preempt the "b.user" jobs that negotiated in autoregroup ("<none>"), since their group name ("b") is not their negotiating-group name. Jobs negotiated under "b" are not preempted. Jobs from "none.user" are also not preempted, because they negotiated under their own group, "<none>":

$ svhist RemoteAutoregroup RemoteNegotiatingGroup AccountingGroup RemoteGroup
      5 false | a | a.user@localdomain | a
      5 false | b | b.user@localdomain | b
     10 true | <none> | none.user@localdomain | <none>
     20 total
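
The second policy can be traced the same way. The Python sketch below is a stand-in for the PREEMPTION_REQUIREMENTS expression used in this test, applied to the running slots listed before "a.user" submitted. It assumes the new "a.user" jobs negotiate under their own group "a", which is consistent with the final svhist output; it does not model the rest of the negotiation cycle.

```python
def preemption_requirements(submitter_neg_group, submitter_group,
                            remote_neg_group, remote_group):
    # Stand-in for:
    # (SubmitterNegotiatingGroup == SubmitterGroup) && (RemoteNegotiatingGroup != RemoteGroup)
    return submitter_neg_group == submitter_group and remote_neg_group != remote_group

# Running slots before "a.user" submits, copied from the svhist output above:
# (count, RemoteNegotiatingGroup, RemoteGroup, AccountingGroup)
running = [
    (5, "b", "b", "b.user@localdomain"),
    (5, "<none>", "b", "b.user@localdomain"),
    (10, "<none>", "<none>", "none.user@localdomain"),
]

# Assumption: the candidate "a.user" jobs negotiate under their own group "a".
for count, neg_group, group, owner in running:
    ok = preemption_requirements("a", "a", neg_group, group)
    print(count, owner, neg_group, "preemptable" if ok else "protected")
```

Only the 5 "b.user" slots acquired through autoregroup satisfy the expression, which matches the 5 slots that "a.user" ends up holding.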
Comment 6 Erik Erlandson 2012-03-30 13:55:56 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Customers were interested in configuring job preemption policies that are aware of whether a job negotiated via autoregroup, and of the particular group name a job negotiated under.

Consequence:
Negotiating job and resource ads did not provide sufficient information to allow group-aware or negotiation-aware preemption policies.

Change:
The negotiator, schedd and startd claiming process was enhanced to maintain new fields RemoteAutoregroup, RemoteNegotiatingGroup and RemoteGroup, and their Submitter counterparts.

Result:
New job and resource attributes are available for use in defining PREEMPTION_REQUIREMENTS expressions that are group-aware and negotiation-aware.
Comment 11 Martin Kudlej 2013-02-06 03:57:37 EST
Tested on RHEL 5.9/6.4 x i386/x86_64 with condor-7.8.8-0.4.1 and it works. -->VERIFIED
Comment 12 Martin Kudlej 2013-02-06 06:04:57 EST
Tested on RHEL 5.9/6.4 x i386/x86_64 with condor-7.8.8-0.4.1 and it works.
-->VERIFIED
Comment 14 errata-xmlrpc 2013-03-06 13:43:00 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html
