Bug 628086 - GROUP_DYNAMIC_MACH_CONSTRAINT unused with HFS
Summary: GROUP_DYNAMIC_MACH_CONSTRAINT unused with HFS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.3
Assignee: Matthew Farrellee
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks: 528800
Reported: 2010-08-27 20:42 UTC by Jon Thomas
Modified: 2018-11-14 18:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the value from the GROUP_DYNAMIC_MACH_CONSTRAINT calculation in the negotiator was not used, because the assignment for groupArray[0].maxAllowed was prior to the calculation. With this update, this calculation now successfully provides a way to trim the absolute number of assumed matchable slots.
Clone Of:
Environment:
Last Closed: 2010-10-14 16:09:07 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to fix constraint problem (1.60 KB, patch), 2010-08-27 20:42 UTC, Jon Thomas
owner slots patch (2.38 KB, patch), 2010-09-07 20:13 UTC, Jon Thomas


Links
System: Red Hat Product Errata
ID: RHSA-2010:0773
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 15:56:44 UTC

Description Jon Thomas 2010-08-27 20:42:37 UTC
Created attachment 441610
patch to fix constraint problem

The value from the GROUP_DYNAMIC_MACH_CONSTRAINT calculation in the negotiator is unused.

groupArray[0].maxAllowed is assigned before the constraint calculation runs, so it never sees the reduced slot count. The assignment should come after the calculation.
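
The ordering problem can be illustrated with a small sketch. It is written in Python rather than taken from the negotiator source, and the names `Group`, `count_matching_slots`, `buggy_setup` and `fixed_setup` are hypothetical; only `maxAllowed` and the constraint name come from this report.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    max_allowed: float = 0.0  # stands in for groupArray[i].maxAllowed


def count_matching_slots(slots, constraint):
    """Count the slots that satisfy the machine constraint."""
    return sum(1 for slot in slots if constraint(slot))


def buggy_setup(slots, constraint):
    root = Group("")
    num_slots = len(slots)
    root.max_allowed = num_slots  # assigned too early
    num_slots = count_matching_slots(slots, constraint)  # never reaches root
    return root, num_slots


def fixed_setup(slots, constraint):
    root = Group("")
    num_slots = count_matching_slots(slots, constraint)
    root.max_allowed = num_slots  # assigned after the trim
    return root, num_slots


# Illustrative pool: 39 slots, with a constraint that excludes one of them.
slots = [{"SlotID": i} for i in range(1, 40)]
constraint = lambda slot: slot["SlotID"] != 4

buggy_root, buggy_count = buggy_setup(slots, constraint)
fixed_root, fixed_count = fixed_setup(slots, constraint)
print(buggy_count, buggy_root.max_allowed)
print(fixed_count, fixed_root.max_allowed)
```

In the buggy ordering the trimmed count and the root group's limit disagree; in the fixed ordering they match.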

Comment 2 Jon Thomas 2010-08-30 13:08:25 UTC
The easiest way to reproduce this is to create one group with a quota, use GROUP_DYNAMIC_MACH_CONSTRAINT to trim the slot count, and observe in the NegotiatorLog that:

a) the slot count is actually reduced, for example:

08/30/10 09:00:09 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 39 to 38

b) the maxAllowed value for groupArray[0] is not set to the reduced value, for example:

08/30/10 09:00:09 negotiationtime: finished sort - slots 38 group  auto true quota 1.000000 maxAllowed 39.000000 numsubmits 0 parent -1 child 2  left -1 right -1 i 0

In the above, the maxAllowed value for groupArray[0] should be 38, not 39.
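
The check described above can be automated with a short script. This is only a sketch: the regular expressions are written against the two log lines quoted in this comment, and the function name `check_root_max_allowed` is hypothetical. Other Condor versions may format these lines differently.

```python
import re

REDUCE_RE = re.compile(r"reduces machine count from (\d+) to (\d+)")
# The root group is logged with an empty name, so "group" is followed by
# two spaces before "auto".
ROOT_RE = re.compile(
    r"finished sort - slots (\d+) group  auto .* maxAllowed ([\d.]+)"
)


def check_root_max_allowed(log_lines):
    """Return (reduced_count, root_max_allowed, ok) from NegotiatorLog lines."""
    reduced = None
    root_max = None
    for line in log_lines:
        m = REDUCE_RE.search(line)
        if m:
            reduced = int(m.group(2))
        m = ROOT_RE.search(line)
        if m:
            root_max = float(m.group(2))
    if reduced is None or root_max is None:
        raise ValueError("expected log lines not found")
    return reduced, root_max, root_max == float(reduced)


sample = [
    "08/30/10 09:00:09 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces "
    "machine count from 39 to 38",
    "08/30/10 09:00:09 negotiationtime: finished sort - slots 38 group  "
    "auto true quota 1.000000 maxAllowed 39.000000 numsubmits 0 parent -1 "
    "child 2  left -1 right -1 i 0",
]
print(check_root_max_allowed(sample))  # (38, 39.0, False)
```

A result ending in `False` indicates the bug is present; with the fix, the last element should be `True`.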

Comment 3 Matthew Farrellee 2010-08-30 13:15:36 UTC
Can you simplify with GROUP_DYNAMIC_MACH_CONSTRAINT = FALSE?

FAIL/PASS is determined by jobs running or not?

Comment 4 Jon Thomas 2010-08-30 14:00:20 UTC
That would likely work, but I tend to look at the logs so I can isolate HFS behavior from matching, preemption, and similar issues.

Comment 5 Jon Thomas 2010-09-07 20:13:51 UTC
Created attachment 445775
owner slots patch

Patch changes the order so the number of slots reflects the machine constraints.
Patch implements NEG_TRIM_OWNER_STARTDS

Comment 6 Matthew Farrellee 2010-09-09 15:49:44 UTC
Built in 7.4.4-0.11 sans NEG_TRIM_OWNER_STARTDS

Comment 8 Tomas Rusnak 2010-09-16 12:21:41 UTC
GROUP_DYNAMIC_MACH_CONSTRAINT and/or NEG_TRIM_OWNER_STARTDS look like undocumented features.

Could you please specify how to reproduce and verify this, and provide some information on how the parameter affects dynamic groups?

Comment 9 Matthew Farrellee 2010-09-16 16:14:36 UTC
NEG_TRIM_OWNER_STARTDS does not exist.

GROUP_DYNAMIC_MACH_CONSTRAINT assists the Negotiator in converting dynamic group quotas (percentages) into absolute numbers. If you have a pool of 1000 slots and two groups, each with a 0.5 dynamic quota (50%), the negotiator will assume it can give 500 matches to each group. If in fact 998 of those slots are in Owner state, only 2 slots are available and the split should be 1 each. GROUP_DYNAMIC_MACH_CONSTRAINT provides a way to trim the absolute number of assumed matchable slots to make the absolute quotas more realistic.
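
The arithmetic in this example can be written out as a short sketch. The function `absolute_quotas` is hypothetical and only multiplies each fraction by the slot count; how the negotiator rounds or redistributes fractional results is not covered here.

```python
def absolute_quotas(matchable_slots, dynamic_quotas):
    """Convert fractional (dynamic) group quotas into absolute slot counts."""
    return {
        group: fraction * matchable_slots
        for group, fraction in dynamic_quotas.items()
    }


quotas = {"a": 0.5, "b": 0.5}
total_slots = 1000
owner_slots = 998

# Without trimming, every slot in the pool is assumed to be matchable.
untrimmed = absolute_quotas(total_slots, quotas)
# With Owner-state slots excluded, only 2 slots remain.
trimmed = absolute_quotas(total_slots - owner_slots, quotas)

print(untrimmed)  # {'a': 500.0, 'b': 500.0}
print(trimmed)    # {'a': 1.0, 'b': 1.0}
```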

It should probably default to State != "Owner" && Cpus > 0

Comment 12 Tomas Rusnak 2010-09-30 10:41:55 UTC
Reproduced on RHEL5/i686 with:

# condor -v
$CondorVersion: 7.4.4 Aug  9 2010 BuildID: RH-7.4.4-0.9.el5 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL5 $

# tail -f /var/log/condor/NegotiatorLog  | grep -i -E "MACH|sort"
09/30/10 06:39:04 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 06:39:04 negotiationtime:sorting
09/30/10 06:39:04 Sort : sorting group vector
09/30/10 06:39:04 Sorting : grouparray group b parent -1 child -1  left -1 right -1 i 0
09/30/10 06:39:04 Sort : stage two
09/30/10 06:39:04 midsort : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 06:39:04 midsort : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 06:39:04 midsort : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 06:39:04 Sorted : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 06:39:04 Sorted : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 06:39:04 Sorted : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 06:39:04 Sort : leaving
09/30/10 06:39:04 negotiationtime: finished sort - slots 9 group  auto true quota 1.000000 maxAllowed 10.000000 numsubmits 0 parent -1 child 2  left -1 right -1 i 0
09/30/10 06:39:04 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left -1 right 2 i 1
09/30/10 06:39:04 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left 1 right -1 i 2

Comment 13 Tomas Rusnak 2010-09-30 13:21:57 UTC
Retested with the current condor-7.4.4-0.16 on all supported platforms: RHEL4 and RHEL5, x86 and x86_64.

Configuration:
NUM_CPUS = 10
GROUP_DYNAMIC_MACH_CONSTRAINT = ((SlotID != 4) && (Cpus > 0))
GROUP_NAMES = a,b
GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.5
GROUP_AUTOREGROUP_a = TRUE
GROUP_AUTOREGROUP_b = TRUE
ALL_DEBUG = D_FULLDEBUG
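
As a sanity check on the expected log output, the effect of this constraint can be sketched in Python. The sketch assumes one static slot per CPU, with SlotID numbered from 1 and Cpus = 1 on each slot; those assumptions are not stated in the configuration itself, and `machine_constraint` is a hypothetical Python rendering of the ClassAd expression.

```python
NUM_CPUS = 10

# Assumption: one static slot per CPU, SlotID numbered from 1, Cpus = 1 each.
slot_ads = [{"SlotID": slot_id, "Cpus": 1} for slot_id in range(1, NUM_CPUS + 1)]


def machine_constraint(ad):
    """Python rendering of ((SlotID != 4) && (Cpus > 0))."""
    return ad["SlotID"] != 4 and ad["Cpus"] > 0


before = len(slot_ads)
after = sum(1 for ad in slot_ads if machine_constraint(ad))
print(f"machine count from {before} to {after}")
```

Under these assumptions the constraint excludes exactly one slot, which is consistent with the "from 10 to 9" line in the logs.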

==========================================================================
$CondorVersion: 7.4.4 Sep 27 2010 BuildID: RH-7.4.4-0.16.el5 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL5 $

# tail -f /var/log/condor/NegotiatorLog  | grep -i -E "MACH|sort"
09/30/10 08:27:34 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 08:27:34 negotiationtime:sorting
09/30/10 08:27:34 Sort : sorting group vector
09/30/10 08:27:34 Sorting : grouparray group b parent -1 child -1  left -1 right -1 i 0
09/30/10 08:27:34 Sort : stage two
09/30/10 08:27:34 midsort : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 08:27:34 midsort : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 08:27:34 midsort : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 08:27:34 Sorted : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 08:27:34 Sorted : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 08:27:34 Sorted : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 08:27:34 Sort : leaving
09/30/10 08:27:34 negotiationtime: finished sort - slots 9 group  auto true quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2  left -1 right -1 i 0
09/30/10 08:27:34 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left -1 right 2 i 1
09/30/10 08:27:34 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left 1 right -1 i 2
==========================================================================
$CondorVersion: 7.4.4 Sep 27 2010 BuildID: RH-7.4.4-0.16.el5 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL5 $

# tail -f /var/log/condor/NegotiatorLog  | grep -i -E "MACH|sort"
09/30/10 09:07:39 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 09:07:39 negotiationtime:sorting
09/30/10 09:07:39 Sort : sorting group vector
09/30/10 09:07:39 Sorting : grouparray group b parent -1 child -1  left -1 right -1 i 0
09/30/10 09:07:39 Sort : stage two
09/30/10 09:07:39 midsort : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 09:07:39 midsort : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 09:07:39 midsort : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 09:07:39 Sorted : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 09:07:39 Sorted : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 09:07:39 Sorted : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 09:07:39 Sort : leaving
09/30/10 09:07:39 negotiationtime: finished sort - slots 9 group  auto true quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2  left -1 right -1 i 0
09/30/10 09:07:39 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left -1 right 2 i 1
09/30/10 09:07:39 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left 1 right -1 i 2
==========================================================================
$CondorVersion: 7.4.4 Sep 27 2010 BuildID: RH-7.4.4-0.16.el4 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL4 $

# tail -f /var/log/condor/NegotiatorLog  | grep -i -E "MACH|sort"
09/30/10 09:12:27 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 09:12:27 negotiationtime:sorting
09/30/10 09:12:27 Sort : sorting group vector
09/30/10 09:12:27 Sorting : grouparray group b parent -1 child -1  left -1 right -1 i 0
09/30/10 09:12:27 Sort : stage two
09/30/10 09:12:27 midsort : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 09:12:27 midsort : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 09:12:27 midsort : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 09:12:27 Sorted : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 09:12:27 Sorted : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 09:12:27 Sorted : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 09:12:27 Sort : leaving
09/30/10 09:12:27 negotiationtime: finished sort - slots 9 group  auto true quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2  left -1 right -1 i 0
09/30/10 09:12:27 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left -1 right 2 i 1
09/30/10 09:12:27 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left 1 right -1 i 2
==========================================================================
$CondorVersion: 7.4.4 Sep 27 2010 BuildID: RH-7.4.4-0.16.el4 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL4 $

# tail -f /var/log/condor/NegotiatorLog  | grep -i -E "MACH|sort"
09/30/10 09:12:58 GROUP_DYNAMIC_MACH_CONSTRAINT constraint reduces machine count from 10 to 9
09/30/10 09:12:58 negotiationtime:sorting
09/30/10 09:12:58 Sort : sorting group vector
09/30/10 09:12:58 Sorting : grouparray group b parent -1 child -1  left -1 right -1 i 0
09/30/10 09:12:58 Sort : stage two
09/30/10 09:12:58 midsort : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 09:12:58 midsort : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 09:12:58 midsort : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 09:12:58 Sorted : grouparray group  parent -1 child 2  left -1 right -1 i 0
09/30/10 09:12:58 Sorted : grouparray group a parent 0 child -1  left -1 right 2 i 1
09/30/10 09:12:58 Sorted : grouparray group b parent 0 child -1  left 1 right -1 i 2
09/30/10 09:12:58 Sort : leaving
09/30/10 09:12:58 negotiationtime: finished sort - slots 9 group  auto true quota 1.000000 maxAllowed 9.000000 numsubmits 0 parent -1 child 2  left -1 right -1 i 0
09/30/10 09:12:58 negotiationtime: finished sort - slots 9 group a auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left -1 right 2 i 1
09/30/10 09:12:58 negotiationtime: finished sort - slots 9 group b auto true quota 0.500000 maxAllowed 0.000000 numsubmits 0 parent 0 child -1  left 1 right -1 i 2

Seems to be fixed; it can be verified once documentation is available.

Comment 14 Tomas Rusnak 2010-10-01 14:33:55 UTC
The patch is included in the current version of the packages. A new bug, BZ639358, was created for the documentation.
>>> VERIFIED

Comment 15 Florian Nadge 2010-10-07 15:36:13 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, the value from the GROUP_DYNAMIC_MACH_CONSTRAINT calculation in the negotiator was not used, because the assignment for groupArray[0].maxAllowed was prior to the calculation. With this update, this calculation now successfully provides a way to trim the absolute number of assumed matchable slots.

Comment 19 errata-xmlrpc 2010-10-14 16:09:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

