Bug 519183
Summary: Matchmaker code doesn't implement fair share correctly
Product: Red Hat Enterprise MRG
Component: grid
Version: 1.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Jon Thomas <jthomas>
Assignee: Jon Thomas <jthomas>
QA Contact: Lubos Trilety <ltrilety>
CC: ltrilety, matt, tao
Target Milestone: 1.3
Hardware: All
OS: Linux
Doc Type: Bug Fix
Doc Text: Previously, when using group quotas and the 'autoregroup' function, the group quotas did not grow relative to their initial proportions. This was caused by the fact that all users (including group users) were negotiated at the same time using prio normalization based on all user prios. With this update, group quotas scale into unused slots appropriately.
Last Closed: 2010-10-14 16:12:14 UTC
Description
Jon Thomas
2009-08-25 14:50:49 UTC
Posted upstream too: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=712

Created attachment 361117 [details]
group hfs patch
This patch fixes the issue where the quota proportions are not carried over when using AUTOREGROUP.
Previously, AUTOREGROUP just meant that groups were scheduled against their quota and then had an additional chance to use an unused slot relative to the userprios of all users. This meant that a group with less quota than another group might see higher usage than the other group.
Notes on negotiation behavior:

1) If there are no groups, negotiation occurs based upon userprios.

2) Jobs from users with no accounting group or an unknown accounting group will only execute if the config provides unclaimed quota. This means that the total group quota is less than the total number of slots, or that the dynamic group quotas sum to less than 1.0.

   Example where jobs do not run:
     GROUP_QUOTA_DYNAMIC_group_a = 0.50
     GROUP_QUOTA_DYNAMIC_group_b = 0.30
     GROUP_QUOTA_DYNAMIC_group_c = 0.20

   Example where jobs do run:
     GROUP_QUOTA_DYNAMIC_group_a = 0.50
     GROUP_QUOTA_DYNAMIC_group_b = 0.30
     GROUP_QUOTA_DYNAMIC_group_c = 0.10

   In this example, 10% of slots are not designated as belonging to a group and are thus unclaimed.

3) Groups that have claimed quota, but not enough submitters to fill their quota, return their unusable quota to the unclaimed quota pool. For example, a user with a 50 slot quota and only 25 job submissions will return 25 slots to unclaimed quota.

4) Group users with autoregroup set to TRUE and with enough submitters to utilize more quota claim unclaimed quota based upon their designated quota as a percent of total slots. Hence a group with GROUP_QUOTA_DYNAMIC_group_a = 0.50 will consume 50% of unclaimed quota.

5) In the case where there are non-group users and unclaimed quota, non-group users claim unclaimed quota based upon unclaimed quota as a percent of total slots. Hence, in this scenario:
     GROUP_QUOTA_DYNAMIC_group_a = 0.50
     GROUP_QUOTA_DYNAMIC_group_b = 0.30
     GROUP_QUOTA_DYNAMIC_group_c = 0.10
   non-group users will claim 10% of unclaimed slots.

6) The claiming of unclaimed quota is iterative, so all unclaimed quota will be claimed if:
   a) there are non-group users with enough job submissions to use more quota
   b) groups with job submissions greater than their claimed quota have autoregroup set to true

7) If a group has no quota, users of that group become part of the pool of non-group users.

Created attachment 361719 [details]
new patch
new patch that preserves old behavior for static configs
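The negotiation notes above can be illustrated with a small Python sketch. This is not the code from the attached patch: the function name `distribute_slots`, the `<none>` label for non-group users, and the job counts in the usage example are all hypothetical. It also assumes fractional slots and treats all non-group users as a single claimant, whereas the real negotiator hands out whole slots per submitter. A group with no quota (which would join the non-group pool) is not modeled.

```python
NONGROUP = "<none>"  # hypothetical label for submitters outside any group


def distribute_slots(total_slots, groups, nongroup_jobs, tol=1e-9, max_rounds=10_000):
    """Sketch of the quota rules in the notes; returns {name: slots}.

    groups maps a group name to (quota_fraction, autoregroup, idle_jobs).
    """
    group_sum = sum(fraction for fraction, _, _ in groups.values())
    if group_sum > 1.0 + tol:
        raise ValueError("dynamic group quotas must not sum to more than 1.0")

    # Assumption: non-group users act as one claimant whose quota is the
    # fraction of the pool that no group has been given.
    claimants = dict(groups)
    claimants[NONGROUP] = (max(0.0, 1.0 - group_sum), True, nongroup_jobs)

    # First pass: everyone is scheduled against their own quota.
    allocated = {}
    pool = 0.0
    for name, (fraction, _, jobs) in claimants.items():
        quota = fraction * total_slots
        allocated[name] = float(min(quota, jobs))
        pool += quota - allocated[name]  # unusable quota goes back to the pool

    # Iterative pass: claimants that can still grow take their quota
    # fraction of whatever is left, until nothing usable remains.
    for _ in range(max_rounds):
        if pool <= tol:
            break
        claimed = 0.0
        for name, (fraction, may_grow, jobs) in claimants.items():
            if not may_grow:
                continue  # autoregroup is FALSE: stay within the original quota
            take = min(pool * fraction, jobs - allocated[name])
            if take > 0:
                allocated[name] += take
                claimed += take
        if claimed <= tol:
            break  # nobody can use more slots
        pool -= claimed
    return allocated


# Hypothetical load on a 100-slot pool, using the quotas from the notes.
result = distribute_slots(
    100,
    {
        "group_a": (0.50, True, 100),
        "group_b": (0.30, True, 10),
        "group_c": (0.10, False, 100),
    },
    nongroup_jobs=100,
)
for name, slots in result.items():
    print(f"{name}: {slots:.1f}")
```

Under these assumptions, group_b returns 20 unused slots, and group_a and the non-group users split them in a 0.50 : 0.10 ratio, ending near 66.7 and 13.3 slots respectively, while group_c stays at its 10-slot quota because autoregroup is off.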
Created attachment 363702 [details]
fixed a small problem created when variable names were changed
The previous patch had a small issue, introduced when I changed variable names and streamlined the number-of-jobs calculation. Behavior would have been slightly different when (non-group user job count > 0) && (non-group user job count < non-group user quota), in that the unused quota would not have been added to the unclaimed pool.
*** Bug 523495 has been marked as a duplicate of this bug. ***

Built since 7.4.0-0.5.

Configuration:
  NUM_CPUS = 100
  GROUP_NAMES = group0, group1
  GROUP_QUOTA_DYNAMIC_group0 = 0.9
  GROUP_QUOTA_DYNAMIC_group1 = 0.09
  GROUP_AUTOREGROUP_group0 = FALSE
  GROUP_AUTOREGROUP_group1 = TRUE

Reproduction scenario:

1. stop negotiator
  # condor_off -subsystem negotiator
  Sent "Kill-Daemon" command for "negotiator" to local master

2. submit two sets of jobs
  $ cat group1.submit
  cmd=/bin/sleep
  args=10
  +AccountingGroup="group1"
  queue 100
  $ condor_submit group1.submit
  Submitting job(s)...
  100 job(s) submitted to cluster 1.
  $ cat nogroup.submit
  cmd=/bin/sleep
  args=10
  queue 100
  $ condor_submit nogroup.submit
  Submitting job(s)...
  100 job(s) submitted to cluster 2.

3. wait a minute and start negotiator
  # condor_on -subsystem negotiator
  Sent "Spawn-Daemon" command for "negotiator" to local master

4. see used resources
  # condor_userprio -l
  LastUpdate = 1284454125
  Name1 = "condor_user@hostname"
  Priority1 = 0.500000
  ResourcesUsed1 = 50
  WeightedResourcesUsed1 = 50.000000
  AccumulatedUsage1 = 0.000000
  WeightedAccumulatedUsage1 = 0.000000
  BeginUsageTime1 = 0
  LastUsageTime1 = 0
  PriorityFactor1 = 1.000000
  Name2 = "group1@hostname"
  Priority2 = 0.500000
  ResourcesUsed2 = 50
  WeightedResourcesUsed2 = 50.000000
  AccumulatedUsage2 = 0.000000
  WeightedAccumulatedUsage2 = 0.000000
  BeginUsageTime2 = 0
  LastUsageTime2 = 0
  PriorityFactor2 = 1.000000
  NumSubmittors = 2

As I understood from previous comments, group1 should obtain 9% of unused slots in the first round, whilst the no-group user should get only 1%; incrementally, group1 should end up with 90% of unused slots and the no-group user only 10%. But in this scenario both of them obtain an equal number of slots. The same behaviour can be observed with more than two groups: in all cases the slots are distributed equally among all groups.

Is this expected? If not, please move this bug to assigned.

Hi,

The accountinggroup for group1 is not specified correctly. The correct format is group[.subgroup].username. "group1" means you are submitting to the nongroup pool because there is no ".". In this case, 50% went to one nongroup user and 50% went to the other. That is expected with accountinggroup specified as it is.

Tested with (version):
condor-7.4.4-0.9
Tested on:
RHEL4 i386,x86_64 - passed
RHEL5 i386,x86_64 - passed
>>> VERIFIED
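The AccountingGroup rule from the reply above (a value without a "." is treated as a non-group submitter) can be illustrated with a short Python sketch. The function `resolve_group` and the user name `alice` are hypothetical, and this is only an approximation of the rule as described in this report, not the negotiator's actual parsing code. In particular, it matches only the first component against the configured group names and does not model subgroups.

```python
def resolve_group(accounting_group, group_names):
    """Return the configured group a job is negotiated under, or None.

    None stands for the non-group pool.
    """
    if "." not in accounting_group:
        # No "." separator: the whole value is read as a plain user name.
        return None
    group = accounting_group.split(".", 1)[0]
    # An unknown group is handled like no group at all.
    return group if group in group_names else None


names = {"group0", "group1"}
for value in ("group1", "group1.alice", "unknown.alice"):
    print(value, "->", resolve_group(value, names) or "non-group pool")
```

With this reading, the submit file in the reproduction scenario would need a value such as "group1.alice" (user name hypothetical) instead of "group1" for the jobs to be counted against group1's quota.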
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, when using group quotas and the 'autoregroup' function, the group quotas did not grow relative to their initial proportions. This was caused by the fact that all users (including group users) were negotiated at the same time using prio normalization based on all user prios. With this update, group quotas scale into unused slots appropriately.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html