Bug 712975 - enable additional GROUP_DYNAMIC_MACH_CONSTRAINT behavior
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 2.0.1
Target Release: ---
Assigned To: Erik Erlandson
QA Contact: Tomas Rusnak
Docs Contact:
Depends On:
Blocks: 723887
Reported: 2011-06-13 15:22 EDT by Jon Thomas
Modified: 2012-03-28 05:43 EDT
Fixed In Version: condor-7.6.3-0.1
Doc Type: Bug Fix
Doc Text:
Cause: GROUP_DYNAMIC_MACH_CONSTRAINT did not cause startd resource ads to be removed from the ad list.
Consequence: This resulted in resource traversal overhead that could be avoided in most cases.
Fix: A new parameter, NEGOTIATOR_STARTD_CONSTRAINT_REMOVE, was added. If set to TRUE, it causes ads not satisfying GROUP_DYNAMIC_MACH_CONSTRAINT to be removed. It defaults to FALSE for backward compatibility.
Result: Traversal of resource ads not matching GROUP_DYNAMIC_MACH_CONSTRAINT can now be avoided if desired.
Last Closed: 2011-09-07 12:43:11 EDT


Attachments
startdconstraint patch (9.05 KB, patch)
2011-06-13 15:22 EDT, Jon Thomas


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1249 normal SHIPPED_LIVE Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update 2011-09-07 12:40:45 EDT

Description Jon Thomas 2011-06-13 15:22:59 EDT
Created attachment 504525
startdconstraint patch

Currently, the GROUP_DYNAMIC_MACH_CONSTRAINT expression is only used to count the startdAds that match it, and group quotas are based on this count. The issue is that the negotiation loop still iterates (multiple times) through all of the startdAds, including those that do not match. The ability to remove the non-matching startdAds would improve performance.

Adding patch to remove startdAds not matching GROUP_DYNAMIC_MACH_CONSTRAINT. Changes GROUP_DYNAMIC_MACH_CONSTRAINT to NEG_STARTD_CONSTRAINT (keeps backward compat) and adds NEG_STARTD_CONSTRAINT_REMOVE to toggle new behavior.
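
For illustration only, the difference between counting and removing can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the negotiator code: ads are plain dicts, the constraint is a Python predicate standing in for the ClassAd expression, and the names `apply_startd_constraint` and `remove` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Ad = Dict[str, int]  # simplified stand-in for a startd ClassAd (assumption)


def apply_startd_constraint(
    ads: List[Ad],
    constraint: Callable[[Ad], bool],
    remove: bool = False,
) -> Tuple[List[Ad], int]:
    """Return (ads the negotiation loop would traverse, count used for quotas).

    Hypothetical helper: `remove` plays the role of the toggle described above.
    """
    matching = [ad for ad in ads if constraint(ad)]
    if remove:
        # New behavior: non-matching ads are dropped from the list entirely.
        return matching, len(matching)
    # Existing behavior: only the count is affected; the list is left intact.
    return list(ads), len(matching)


# 20 slots with a constraint equivalent to (SlotID <= 10)
slots = [{"SlotID": i} for i in range(1, 21)]
constraint = lambda ad: ad["SlotID"] <= 10

kept, quota_count = apply_startd_constraint(slots, constraint, remove=False)
trimmed, quota_count2 = apply_startd_constraint(slots, constraint, remove=True)
print(len(kept), quota_count)      # 20 10
print(len(trimmed), quota_count2)  # 10 10
```

In both modes the quota count is the same; only the length of the list that later loops must traverse changes.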

upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2232
Comment 1 Matthew Farrellee 2011-06-13 15:50:48 EDT
What about further reducing overhead and only querying the Collector for ads that will be used during negotiation?
Comment 2 Jon Thomas 2011-06-13 16:11:23 EDT
I thought of that too, but wasn't sure if that broke functional boundaries.
Comment 3 Erik Erlandson 2011-06-17 19:13:57 EDT
Fix upstream targeted for 7.6.2
Comment 4 Erik Erlandson 2011-06-17 19:21:13 EDT
repro/test

This fix introduces a new param NEGOTIATOR_STARTD_CONSTRAINT_REMOVE, which defaults to FALSE for backward compatibility. If set to TRUE, then any ads not satisfying GROUP_DYNAMIC_MACH_CONSTRAINT are removed from the startd ad list, as well as not being counted for purposes of quota assignment.

Use the following config. Note that configuring no groups actually makes testing easier, since it allows the change in the length of the startd ad list to show up in the negotiator ad.

NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE

NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15

NUM_CPUS = 20

GROUP_NAMES =

NEGOTIATOR_STARTD_CONSTRAINT_REMOVE = TRUE
GROUP_DYNAMIC_MACH_CONSTRAINT = (SlotID <= 10)

Amusingly, no jobs are necessary to repro or test new behavior. Just allow the negotiator to run for a few cycles.

Before the fix, observe the candidate slots remain unchanged from total:

$ condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 20
LastNegotiationCycleCandidateSlots1 = 20
LastNegotiationCycleCandidateSlots2 = 20

After the fix, we see that the length of the startd ad list actually changed, as the non-satisfying slots were removed:

$ condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 10
LastNegotiationCycleCandidateSlots1 = 10
LastNegotiationCycleCandidateSlots2 = 10
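
The before/after comparison above can also be checked mechanically. The following is a minimal sketch that parses text in the format printed by the grep above and reports whether every cycle shows fewer candidate slots than total slots. It works on captured output only and does not invoke condor_status; the helper names `slots_by_cycle` and `ads_were_removed` are hypothetical.

```python
import re
from typing import Dict

# Captured output from the "after the fix" run above.
AFTER = """\
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 10
LastNegotiationCycleCandidateSlots1 = 10
LastNegotiationCycleCandidateSlots2 = 10
"""

ATTR = re.compile(
    r"^LastNegotiationCycle(TotalSlots|CandidateSlots)(\d+)\s*=\s*(\d+)$"
)


def slots_by_cycle(text: str) -> Dict[int, Dict[str, int]]:
    """Group TotalSlots/CandidateSlots values by negotiation cycle index."""
    cycles: Dict[int, Dict[str, int]] = {}
    for line in text.splitlines():
        match = ATTR.match(line.strip())
        if match:
            name, cycle, value = match.group(1), int(match.group(2)), int(match.group(3))
            cycles.setdefault(cycle, {})[name] = value
    return cycles


def ads_were_removed(text: str) -> bool:
    """True if every cycle reports both values and CandidateSlots < TotalSlots."""
    cycles = slots_by_cycle(text)
    return bool(cycles) and all(
        "TotalSlots" in c
        and "CandidateSlots" in c
        and c["CandidateSlots"] < c["TotalSlots"]
        for c in cycles.values()
    )


print(ads_were_removed(AFTER))  # True
```

Feeding it the "before the fix" output, where both values are 20 in every cycle, yields False.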
Comment 5 Erik Erlandson 2011-06-17 19:21:14 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
GROUP_DYNAMIC_MACH_CONSTRAINT did not cause startd resource ads to be removed from the ad list

Consequence: Resulted in resource traversal overhead that could be avoided in most cases.

Fix:
A new parameter NEGOTIATOR_STARTD_CONSTRAINT_REMOVE was added, which if set to TRUE will cause ads not satisfying GROUP_DYNAMIC_MACH_CONSTRAINT to be removed.  Defaults to FALSE for backward compatibility.

Result:
Traversal of resource ads not matching GROUP_DYNAMIC_MACH_CONSTRAINT can now be avoided if desired.
Comment 6 Jon Thomas 2011-06-27 15:23:04 EDT
The previous patch had some code to dprintf the fact that the startdAds were trimmed. The new one doesn't, which makes it difficult to debug. Any chance that can be added back in?
Comment 8 Timothy St. Clair 2011-07-05 12:04:10 EDT
Related tracking from review: 

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2277
Comment 12 Tomas Rusnak 2011-07-25 08:19:17 EDT
Reproduced on:

# condor -v
$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $

# condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 20
LastNegotiationCycleCandidateSlots1 = 20
LastNegotiationCycleCandidateSlots2 = 20
Comment 13 Tomas Rusnak 2011-07-25 08:33:25 EDT
Retested over all supported platforms x86,x86_64/RHEL5,RHEL6 with:

condor-7.6.3-0.2

# condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 10
LastNegotiationCycleCandidateSlots1 = 10
LastNegotiationCycleCandidateSlots2 = 10

New parameter NEGOTIATOR_STARTD_CONSTRAINT_REMOVE is working as expected.

>>> VERIFIED
Comment 14 errata-xmlrpc 2011-09-07 12:43:11 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html
