Created attachment 504525 startdconstraint patch

Currently, the GROUP_DYNAMIC_MACH_CONSTRAINT expression counts the startd ads that match the expression, and group quotas are based on this count. The issue is that the negotiation loop iterates (multiple times) through these startd ads, including the ones that do not match. Being able to remove the non-matching startd ads would improve performance.

Adding a patch to remove startd ads not matching GROUP_DYNAMIC_MACH_CONSTRAINT. It renames GROUP_DYNAMIC_MACH_CONSTRAINT to NEG_STARTD_CONSTRAINT (keeping backward compatibility) and adds NEG_STARTD_CONSTRAINT_REMOVE to toggle the new behavior.

upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2232
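A minimal Python sketch of the behavior described above, under stated assumptions: startd ads are modeled as plain dictionaries, the constraint as a Python predicate rather than a ClassAd expression, and the quota count as a simple count of matching ads (no slot weights). The function name apply_startd_constraint is hypothetical; this is not the negotiator's actual code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a startd (slot) ClassAd: attribute name -> value.
SlotAd = Dict[str, object]


def apply_startd_constraint(
    ads: List[SlotAd],
    constraint: Callable[[SlotAd], bool],
    remove: bool,
) -> Tuple[List[SlotAd], int]:
    """Return (ads the negotiation loop will iterate over, slot count for quotas).

    The quota count only includes ads satisfying the constraint, as with
    GROUP_DYNAMIC_MACH_CONSTRAINT. When `remove` is True (modeling the new
    toggle), non-matching ads are also dropped from the list, so later
    passes of the negotiation loop never traverse them.
    """
    matching = [ad for ad in ads if constraint(ad)]
    quota_count = len(matching)
    # Without removal, every ad stays in the list even though only the
    # matching ones count toward quotas.
    candidates = matching if remove else list(ads)
    return candidates, quota_count
```

For example, with 20 ads whose SlotID runs from 1 to 20 and the predicate `lambda ad: ad["SlotID"] <= 10`, both modes report a quota count of 10, but only `remove=True` shrinks the list the loop walks from 20 ads to 10. Computing the count and the filtered list in one pass means the removal costs nothing beyond the evaluation the quota logic already performs.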
What about further reducing overhead and only querying the Collector for ads that will be used during negotiation?
I thought of that too, but wasn't sure if that broke functional boundaries.
Fix upstream targeted for 7.6.2
repro/test

This fix introduces a new parameter, NEGOTIATOR_STARTD_CONSTRAINT_REMOVE, which defaults to FALSE for backward compatibility. If set to TRUE, any ads not satisfying GROUP_DYNAMIC_MACH_CONSTRAINT are removed from the startd ad list, in addition to not being counted for purposes of quota assignment.

Use the following config. Note that configuring no groups actually makes testing easier, since it allows changes in the length of the startd ad list to show up in the negotiator ad.

NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15
NUM_CPUS = 20
GROUP_NAMES =
NEGOTIATOR_STARTD_CONSTRAINT_REMOVE = TRUE
GROUP_DYNAMIC_MACH_CONSTRAINT = (SlotID <= 10)

Amusingly, no jobs are necessary to reproduce or test the new behavior; just allow the negotiator to run for a few cycles.

Before the fix, observe that the candidate slot counts remain unchanged from the totals:

$ condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 20
LastNegotiationCycleCandidateSlots1 = 20
LastNegotiationCycleCandidateSlots2 = 20

After the fix, the length of the startd ad list has actually changed, because the non-satisfying slots were removed:

$ condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 10
LastNegotiationCycleCandidateSlots1 = 10
LastNegotiationCycleCandidateSlots2 = 10
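To sanity-check the numbers expected from this repro, here is a small Python calculation. It assumes the default layout of one static slot per CPU, so NUM_CPUS = 20 yields SlotID values 1 through 20, and it models only the effect of the constraint on the candidate list. The helper expected_cycle_stats is hypothetical and not part of HTCondor.

```python
from typing import Tuple


def expected_cycle_stats(num_cpus: int, max_slot_id: int, remove: bool) -> Tuple[int, int]:
    """Return (TotalSlots, CandidateSlots) for a constraint of SlotID <= max_slot_id."""
    # Assumption: one static slot per CPU, numbered 1..num_cpus.
    slot_ids = range(1, num_cpus + 1)
    total = len(slot_ids)
    matching = sum(1 for sid in slot_ids if sid <= max_slot_id)
    # Only when removal is enabled does the candidate list shrink.
    candidates = matching if remove else total
    return total, candidates


for remove in (False, True):
    total, cand = expected_cycle_stats(20, 10, remove)
    print(f"REMOVE={remove}: TotalSlots={total} CandidateSlots={cand}")
# REMOVE=False: TotalSlots=20 CandidateSlots=20
# REMOVE=True: TotalSlots=20 CandidateSlots=10
```

These match the before/after values reported by condor_status above: TotalSlots stays at 20 in both cases, while CandidateSlots drops to 10 only with the new parameter enabled.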
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: GROUP_DYNAMIC_MACH_CONSTRAINT did not cause startd resource ads to be removed from the ad list.
Consequence: This resulted in resource traversal overhead that could be avoided in most cases.
Fix: A new parameter, NEGOTIATOR_STARTD_CONSTRAINT_REMOVE, was added; if set to TRUE, it causes ads not satisfying GROUP_DYNAMIC_MACH_CONSTRAINT to be removed. It defaults to FALSE for backward compatibility.
Result: Traversal of resource ads not matching GROUP_DYNAMIC_MACH_CONSTRAINT can now be avoided if desired.
The previous patch had some code to dprintf the fact that the startd ads were trimmed. The new one doesn't, which makes it difficult to debug. Any chance that can be added back in?
Related tracking from review: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2277
Reproduced on:

# condor -v
$CondorVersion: 7.6.0 Mar 30 2011 BuildID: RH-7.6.0-0.4.el5 PRE-RELEASE-GRID $
$CondorPlatform: X86_64-Redhat_5.6 $

# condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 20
LastNegotiationCycleCandidateSlots1 = 20
LastNegotiationCycleCandidateSlots2 = 20
Retested on all supported platforms (x86, x86_64 / RHEL5, RHEL6) with condor-7.6.3-0.2:

# condor_status -neg -l | grep -e TotalSlots -e CandidateSlots
LastNegotiationCycleTotalSlots0 = 20
LastNegotiationCycleTotalSlots1 = 20
LastNegotiationCycleTotalSlots2 = 20
LastNegotiationCycleCandidateSlots0 = 10
LastNegotiationCycleCandidateSlots1 = 10
LastNegotiationCycleCandidateSlots2 = 10

The new parameter NEGOTIATOR_STARTD_CONSTRAINT_REMOVE is working as expected.

>>> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1249.html