Bug 712975 - enable additional GROUP_DYNAMIC_MACH_CONSTRAINT behavior
Summary: enable additional GROUP_DYNAMIC_MACH_CONSTRAINT behavior
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 2.0.1
Target Release: ---
Assignee: Erik Erlandson
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks: 723887
 
Reported: 2011-06-13 19:22 UTC by Jon Thomas
Modified: 2012-03-28 09:43 UTC (History)

Fixed In Version: condor-7.6.3-0.1
Doc Type: Bug Fix
Doc Text:
Cause: GROUP_DYNAMIC_MACH_CONSTRAINT did not cause startd resource ads to be removed from the ad list.
Consequence: Resulted in resource traversal overhead that could be avoided in most cases.
Fix: A new parameter NEGOTIATOR_STARTD_CONSTRAINT_REMOVE was added, which if set to TRUE will cause ads not satisfying GROUP_DYNAMIC_MACH_CONSTRAINT to be removed. Defaults to FALSE for backward compatibility.
Result: Traversal of resource ads not matching GROUP_DYNAMIC_MACH_CONSTRAINT can now be avoided if desired.
Clone Of:
Environment:
Last Closed: 2011-09-07 16:43:11 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
startdconstraint patch (9.05 KB, patch)
2011-06-13 19:22 UTC, Jon Thomas


Links
System: Red Hat Product Errata
ID: RHSA-2011:1249
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
Last Updated: 2011-09-07 16:40:45 UTC

Description Jon Thomas 2011-06-13 19:22:59 UTC
Created attachment 504525 [details]
startdconstraint patch

Currently, the GROUP_DYNAMIC_MACH_CONSTRAINT expression only counts the startdAds that match it, and group quotas are based on this count. The issue is that the negotiation loop still iterates (multiple times) through the full list of startdAds, including those that do not match. The ability to remove the non-matching startdAds would improve performance.

Adding a patch to remove startdAds not matching GROUP_DYNAMIC_MACH_CONSTRAINT. It renames GROUP_DYNAMIC_MACH_CONSTRAINT to NEG_STARTD_CONSTRAINT (keeping backward compatibility with the old name) and adds NEG_STARTD_CONSTRAINT_REMOVE to toggle the new behavior.

upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2232

Comment 1 Matthew Farrellee 2011-06-13 19:50:48 UTC
What about further reducing overhead and only querying the Collector for ads that will be used during negotiation?

Comment 2 Jon Thomas 2011-06-13 20:11:23 UTC
I thought of that too, but wasn't sure if that broke functional boundaries.

Comment 3 Erik Erlandson 2011-06-17 23:13:57 UTC
Fix upstream targeted for 7.6.2

Comment 4 Erik Erlandson 2011-06-17 23:21:13 UTC
repro/test

This fix introduces a new param NEGOTIATOR_STARTD_CONSTRAINT_REMOVE, which defaults to FALSE for backward compatibility. If set to TRUE, then any ads not satisfying GROUP_DYNAMIC_MACH_CONSTRAINT are removed from the startd ad list, as well as not being counted for purposes of quota assignment.
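
The behavior described above can be modeled with a small Python sketch. This is only an illustration of the two modes, not the negotiator's actual code: the function name apply_startd_constraint, the dict-based ads, and the slot_constraint helper are all hypothetical stand-ins. It assumes 20 slots with SlotID 1 through 20 and the constraint (SlotID <= 10) from the test config in this comment.

```python
from typing import Callable, Dict, List, Tuple

# Simplified, hypothetical stand-in for a startd ClassAd.
Ad = Dict[str, int]


def apply_startd_constraint(
    startd_ads: List[Ad],
    constraint: Callable[[Ad], bool],
    remove: bool = False,
) -> Tuple[List[Ad], int]:
    """Return (ads the negotiation loop would traverse, count used for quotas).

    remove=False models the default: only the quota count is restricted.
    remove=True models NEGOTIATOR_STARTD_CONSTRAINT_REMOVE = TRUE: ads that
    fail the constraint are also dropped from the list.
    """
    matching = [ad for ad in startd_ads if constraint(ad)]
    if remove:
        return matching, len(matching)
    return list(startd_ads), len(matching)


def slot_constraint(ad: Ad) -> bool:
    # Mirrors GROUP_DYNAMIC_MACH_CONSTRAINT = (SlotID <= 10)
    return ad["SlotID"] <= 10


ads = [{"SlotID": slot_id} for slot_id in range(1, 21)]

default_list, default_quota = apply_startd_constraint(ads, slot_constraint, remove=False)
trimmed_list, trimmed_quota = apply_startd_constraint(ads, slot_constraint, remove=True)

print(len(default_list), default_quota)  # 20 10
print(len(trimmed_list), trimmed_quota)  # 10 10
```

In both modes the quota count is 10; only the length of the list that would be traversed changes, which is the overhead this bug is about.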

Use the following config. Note that configuring no groups actually makes testing easier, since it allows the changes in startd length to show up in the negotiator ad.

NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE

NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15

NUM_CPUS = 20

GROUP_NAMES =

NEGOTIATOR_STARTD_CONSTRAINT_REMOVE = TRUE
GROUP_DYNAMIC_MACH_CONSTRAINT = (SlotID <= 10)

Amusingly, no jobs are necessary to repro or test new behavior. Just allow the negotiator to run for a few cycles.

Before the fix, observe the candidate slots remain unchanged from total:

$ condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 20
LastNegotiationCycleCandidateSlots1 = 20
LastNegotiationCycleCandidateSlots2 = 20

After the fix, we see that the length of the startd ad list actually changed, as the non-satisfying slots were removed:

$ condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 10
LastNegotiationCycleCandidateSlots1 = 10
LastNegotiationCycleCandidateSlots2 = 10
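
For automated testing, the before/after comparison above could be scripted. The following is a minimal sketch that parses the grep output shown in this comment; the helper names parse_cycle_slots and ads_were_removed are hypothetical, and the sample text is copied from the "after the fix" output rather than produced by running condor_status.

```python
import re
from typing import Dict

# Matches lines such as "LastNegotiationCycleCandidateSlots0 = 10".
_SLOT_LINE = re.compile(
    r"^LastNegotiationCycle(TotalSlots|CandidateSlots)(\d+)\s*=\s*(\d+)$"
)


def parse_cycle_slots(text: str) -> Dict[int, Dict[str, int]]:
    """Group TotalSlots / CandidateSlots values by negotiation cycle index."""
    cycles: Dict[int, Dict[str, int]] = {}
    for line in text.splitlines():
        match = _SLOT_LINE.match(line.strip())
        if match:
            kind, index, value = match.group(1), int(match.group(2)), int(match.group(3))
            cycles.setdefault(index, {})[kind] = value
    return cycles


def ads_were_removed(cycles: Dict[int, Dict[str, int]]) -> bool:
    """True if every reported cycle has fewer candidate slots than total slots."""
    if not cycles:
        return False
    return all(
        "CandidateSlots" in c and "TotalSlots" in c
        and c["CandidateSlots"] < c["TotalSlots"]
        for c in cycles.values()
    )


# Sample copied from the "after the fix" output in this comment.
after_fix = """\
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 10
LastNegotiationCycleCandidateSlots1 = 10
LastNegotiationCycleCandidateSlots2 = 10
"""

cycles = parse_cycle_slots(after_fix)
print(ads_were_removed(cycles))  # True
```

Feeding it the "before the fix" output, where candidate and total slots are both 20, makes ads_were_removed return False.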

Comment 5 Erik Erlandson 2011-06-17 23:21:14 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
GROUP_DYNAMIC_MACH_CONSTRAINT did not cause startd resource ads to be removed from the ad list

Consequence: Resulted in resource traversal overhead that could be avoided in most cases.

Fix:
A new parameter NEGOTIATOR_STARTD_CONSTRAINT_REMOVE was added, which if set to TRUE will cause ads not satisfying GROUP_DYNAMIC_MACH_CONSTRAINT to be removed. Defaults to FALSE for backward compatibility.

Result:
Traversal of resource ads not matching GROUP_DYNAMIC_MACH_CONSTRAINT can now be avoided if desired.

Comment 6 Jon Thomas 2011-06-27 19:23:04 UTC
The previous patch had some code to dprintf the fact that the startdAds were trimmed. The new one doesn't, which makes it difficult to debug. Any chance that can be added back in?

Comment 8 Timothy St. Clair 2011-07-05 16:04:10 UTC
Related tracking from review: 

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2277

Comment 12 Tomas Rusnak 2011-07-25 12:19:17 UTC
Reproduced on:

# condor -v
$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $

# condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 20
LastNegotiationCycleCandidateSlots1 = 20
LastNegotiationCycleCandidateSlots2 = 20

Comment 13 Tomas Rusnak 2011-07-25 12:33:25 UTC
Retested over all supported platforms x86,x86_64/RHEL5,RHEL6 with:

condor-7.6.3-0.2

# condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 10
LastNegotiationCycleCandidateSlots1 = 10
LastNegotiationCycleCandidateSlots2 = 10

New parameter NEGOTIATOR_STARTD_CONSTRAINT_REMOVE is working as expected.

>>> VERIFIED

Comment 14 errata-xmlrpc 2011-09-07 16:43:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html

