Description of problem:
A priority factor set explicitly for a group user is reset to GROUP_PRIO_FACTOR_<GROUP> by condor if the value set is equal to the DEFAULT_PRIO_FACTOR value.

Version-Release number of selected component (if applicable):
condor-7.4.5-0.8

How reproducible:
100%

Steps to Reproduce:
1. set the following configuration (the line 'DEFAULT_PRIO_FACTOR = 1' is not needed, because that is the default setting)
    NUM_CPUS = 100
    GROUP_NAMES = a1
    GROUP_QUOTA_DYNAMIC_a1 = 0.05
    GROUP_PRIO_FACTOR_a1 = 4.0
    GROUP_AUTOREGROUP_a1 = TRUE

2. start condor

3. set group user priority factor for a1.user2 to 1
    # condor_userprio -setfactor a1.user2@`hostname` 1
    The priority factor of a1.user2@<hostname> was set to 1.000000

4. run 'condor_userprio -l'
    # condor_userprio -l
    LastUpdate = 1297075475
    Name1 = "a1.user2@<hostname>"
    Priority1 = 0.500000
    ResourcesUsed1 = 0
    WeightedResourcesUsed1 = 0.000000
    AccumulatedUsage1 = 0.000000
    WeightedAccumulatedUsage1 = 0.000000
    BeginUsageTime1 = 0
    LastUsageTime1 = 0
    PriorityFactor1 = 1.000000
    NumSubmittors = 1

5. wait a minute, run 'condor_userprio -l' again
    # condor_userprio -l
    LastUpdate = 1297075515
    NumSubmittors = 0

6. submit 100 jobs for each of a1.user1 and a1.user2
    # su condor_user -c 'echo -e "cmd=/bin/sleep\nargs=1d\n+AccountingGroup=\"a1.user1\"\nqueue 100\n+AccountingGroup=\"a1.user2\"\nqueue 100" | condor_submit'
    Submitting job(s)........................................................................................................................................................................................................
    200 job(s) submitted to cluster 1.

7. wait until all 100 jobs start, then run 'condor_userprio -l' again
    # condor_userprio -l
    LastUpdate = 1297075555
    Name1 = "a1"
    Priority1 = 2.000000
    ResourcesUsed1 = 100
    ...
    PriorityFactor1 = 4.000000
    Name2 = "a1.user1@<hostname>"
    Priority2 = 2.000000
    ResourcesUsed2 = 50
    ...
    PriorityFactor2 = 4.000000
    Name3 = "a1.user2@<hostname>"
    Priority3 = 2.000000
    ResourcesUsed3 = 50
    ...
    PriorityFactor3 = 4.000000
    NumSubmittors = 3

Actual results:
a1.user1 - resources ... 50 - priority factor ... 4
a1.user2 - resources ... 50 - priority factor ... 4

Expected results:
a1.user1 - resources ... 20 - priority factor ... 4
a1.user2 - resources ... 80 - priority factor ... 1

Additional info:
see NegotiatorLog in attachment
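The expected 20/80 split follows from the usual fair-share rule that slots are handed out in inverse proportion to effective priority (real priority times priority factor). A minimal sketch of that arithmetic, assuming both users have the same real priority and ignoring everything else the negotiator does (quotas, surplus sharing, slot weights); the function name `expected_shares` is made up for illustration and is not part of condor:

```python
from fractions import Fraction


def expected_shares(total_slots, priority_factors):
    """Split total_slots in inverse proportion to each priority factor.

    Assumption: all submitters have the same real priority, so only the
    priority factor distinguishes them.
    """
    # A lower factor means a better priority, so the weight is 1 / factor.
    weights = {
        name: 1 / Fraction(str(factor))
        for name, factor in priority_factors.items()
    }
    total_weight = sum(weights.values())
    return {
        name: float(total_slots * weight / total_weight)
        for name, weight in weights.items()
    }


# Factors as intended by the reporter: user1 inherits 4.0, user2 is set to 1.
shares = expected_shares(100, {"a1.user1": 4.0, "a1.user2": 1.0})
print(shares)

# Factors as actually observed: user2 silently fell back to 4.0.
observed = expected_shares(100, {"a1.user1": 4.0, "a1.user2": 4.0})
print(observed)
```

With factors 4 and 1 the weights are 1/4 and 1, which gives the expected 20 and 80 slots; with both factors at 4 the split is the observed 50/50.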
Created attachment 477407 NegotiatorLog
upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2442
REPRO/TEST

Using the following configuration:

    NEGOTIATOR_DEBUG = D_FULLDEBUG
    NEGOTIATOR_INTERVAL = 30
    SCHEDD_INTERVAL = 15
    NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
    GROUP_QUOTA_MAX_ALLOCATION_ROUNDS = 1
    NUM_CPUS = 10
    GROUP_NAMES = a
    GROUP_QUOTA_a = 10
    GROUP_PRIO_FACTOR_a = 4.0
    GROUP_ACCEPT_SURPLUS_a = TRUE

To reproduce, set the prio-factor on "a.u1" to be 1, and "a.u2" to be 2. Before the fix, the entry for "a.u1" will be deleted on the next accountant update, because its prio-factor was set to DEFAULT_PRIO_FACTOR (=1):

# when the pool starts up, we have only entries for "a" and "<none>"
[eje@rorschach ~]$ condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a"
Name2 = "<none>"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000

# now set prio-factor for "a.u1" and "a.u2"
[eje@rorschach ~]$ condor_userprio -setfactor a.u1@localdomain 1
The priority factor of a.u1@localdomain was set to 1.000000
[eje@rorschach ~]$ condor_userprio -setfactor a.u2@localdomain 2
The priority factor of a.u2@localdomain was set to 2.000000

# immediately after setting, we see entries for "a.u1" and "a.u2"
[eje@rorschach ~]$ condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a"
Name2 = "<none>"
Name3 = "a.u2@localdomain"
Name4 = "a.u1@localdomain"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000
PriorityFactor3 = 2.000000
PriorityFactor4 = 1.000000

# next time the accountant updates, the entry for "a.u1" is erased, because its prio factor happened to be equal to DEFAULT_PRIO_FACTOR. "a.u2" still exists.
[eje@rorschach ~]$ condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a"
Name2 = "<none>"
Name3 = "a.u2@localdomain"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000
PriorityFactor3 = 2.000000

After the fix, we see that the entry for "a.u1" does not get removed, which is the correct behavior:

# accountant entries before any submitters:
[eje@rorschach ~]$ condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a"
Name2 = "<none>"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000

# set prio factor for "a.u1" and "a.u2" as before:
[eje@rorschach ~]$ condor_userprio -setfactor a.u1@localdomain 1
The priority factor of a.u1@localdomain was set to 1.000000
[eje@rorschach ~]$ condor_userprio -setfactor a.u2@localdomain 2
The priority factor of a.u2@localdomain was set to 2.000000

# entries immediately after setting prio factors:
[eje@rorschach ~]$ condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a"
Name2 = "<none>"
Name3 = "a.u2@localdomain"
Name4 = "a.u1@localdomain"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000
PriorityFactor3 = 2.000000
PriorityFactor4 = 1.000000

# Now we see that both "a.u1" and "a.u2" still exist after update, as expected:
[eje@rorschach ~]$ condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a"
Name2 = "<none>"
Name3 = "a.u2@localdomain"
Name4 = "a.u1@localdomain"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000
PriorityFactor3 = 2.000000
PriorityFactor4 = 1.000000
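The before/after comparison in this test can be automated by pairing each NameN with its PriorityFactorN and checking which submitters vanished between two snapshots. A rough sketch, assuming the attribute layout is exactly what the 'condor_userprio -l' output in this report shows; the helper names `parse_userprio` and `dropped_submitters` are invented for this example:

```python
import re

# Matches lines such as: Name3 = "a.u2@localdomain"  or  PriorityFactor3 = 2.000000
ATTR_RE = re.compile(r'^\s*(Name|PriorityFactor)(\d+)\s*=\s*(.+?)\s*$')


def parse_userprio(text):
    """Return {submitter name: priority factor} from 'condor_userprio -l' style text."""
    names, factors = {}, {}
    for line in text.splitlines():
        match = ATTR_RE.match(line)
        if not match:
            continue  # skip LastUpdate, NumSubmittors, and anything else
        key, index, value = match.groups()
        if key == "Name":
            names[index] = value.strip('"')
        else:
            factors[index] = float(value)
    # Join on the numeric suffix; the factor is None if no matching line exists.
    return {names[i]: factors.get(i) for i in names}


def dropped_submitters(before, after):
    """Names present in the 'before' snapshot but missing from 'after'."""
    return sorted(set(parse_userprio(before)) - set(parse_userprio(after)))


# Snapshots copied from the pre-fix run in this comment.
before = '''Name1 = "a"
Name2 = "<none>"
Name3 = "a.u2@localdomain"
Name4 = "a.u1@localdomain"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000
PriorityFactor3 = 2.000000
PriorityFactor4 = 1.000000'''

after = '''Name1 = "a"
Name2 = "<none>"
Name3 = "a.u2@localdomain"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000
PriorityFactor3 = 2.000000'''

print(dropped_submitters(before, after))
```

On the pre-fix snapshots this reports "a.u1@localdomain" as dropped; on the post-fix snapshots, where both outputs are identical, the list is empty.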
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause:
A logic error in the check that determines whether a submitter record can be safely deleted from the Accountant. The logic checked for a priority factor equal to the default priority factor DEFAULT_PRIO_FACTOR, which allowed record deletion if the submitter's factor happened to be set to the default.

Consequence:
A submitter record with an explicitly set priority factor could be deleted inappropriately if that factor was set to DEFAULT_PRIO_FACTOR.

Fix:
The deletion-checking logic was corrected so that it detects any explicitly set value for the priority factor, default or otherwise.

Result:
Submitter records are no longer improperly deleted if the user-set priority factor happens to be equal to DEFAULT_PRIO_FACTOR.
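To make the cause and fix concrete, here is an illustrative sketch of the two checks. It is not the condor source: the record type, the `factor_explicitly_set` flag, and the "no usage" condition are all assumptions chosen to mirror the wording of the note, and the real fix may detect explicitly set factors differently.

```python
from dataclasses import dataclass

DEFAULT_PRIO_FACTOR = 1.0  # default value, as stated in this report


@dataclass
class SubmitterRecord:
    """Hypothetical stand-in for an Accountant submitter record."""
    name: str
    priority_factor: float = DEFAULT_PRIO_FACTOR
    factor_explicitly_set: bool = False  # hypothetical flag, set by -setfactor
    resources_used: int = 0
    accumulated_usage: float = 0.0


def is_idle(rec):
    # Assumption: a record is only a deletion candidate when it has no usage.
    return rec.resources_used == 0 and rec.accumulated_usage == 0.0


def can_delete_buggy(rec):
    # Treats "factor equals the default" as "factor was never set".
    return is_idle(rec) and rec.priority_factor == DEFAULT_PRIO_FACTOR


def can_delete_fixed(rec):
    # Keeps any record whose factor was set explicitly, whatever the value.
    return is_idle(rec) and not rec.factor_explicitly_set


u1 = SubmitterRecord("a.u1@localdomain", priority_factor=1.0, factor_explicitly_set=True)
u2 = SubmitterRecord("a.u2@localdomain", priority_factor=2.0, factor_explicitly_set=True)

for rec in (u1, u2):
    print(rec.name, can_delete_buggy(rec), can_delete_fixed(rec))
```

Under the buggy check only "a.u1" is eligible for deletion, which matches the reproduction in this report; under the corrected check neither record is.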
Reproduced on RHEL 5.7 i386 according to #10:

# rpm -qa | grep condor
condor-classads-7.6.3-0.3.el5
condor-7.6.3-0.3.el5

Configuration:
NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
GROUP_QUOTA_MAX_ALLOCATION_ROUNDS = 1
NUM_CPUS = 10
GROUP_NAMES = a
GROUP_QUOTA_a = 10
GROUP_PRIO_FACTOR_a = 4.0
GROUP_ACCEPT_SURPLUS_a = TRUE

# condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a"
Name2 = "<none>"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000

# condor_userprio -setfactor a.u1@$(hostname) 1
The priority factor of a.u1@<hostname> was set to 1.000000
# condor_userprio -setfactor a.u2@$(hostname) 2
The priority factor of a.u2@<hostname> was set to 2.000000

# condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a.u1@<hostname>"
Name2 = "a"
Name3 = "<none>"
Name4 = "a.u2@<hostname>"
PriorityFactor1 = 1.000000
PriorityFactor2 = 4.000000
PriorityFactor3 = 1.000000
PriorityFactor4 = 2.000000

# condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "a"
Name2 = "<none>"
Name3 = "a.u2@<hostname>"
PriorityFactor1 = 4.000000
PriorityFactor2 = 1.000000
PriorityFactor3 = 2.000000

Verified on RHEL 5.7 i386 according to #10:

# rpm -qa | grep condor
condor-7.6.4-0.7.el5
condor-classads-7.6.4-0.7.el5

# condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "<none>"
Name2 = "a"
PriorityFactor1 = 1.000000
PriorityFactor2 = 4.000000

# condor_userprio -setfactor a.u1@$(hostname) 1
The priority factor of a.u1@<hostname> was set to 1.000000
# condor_userprio -setfactor a.u2@$(hostname) 2
The priority factor of a.u2@<hostname> was set to 2.000000

# condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "<none>"
Name2 = "a"
Name3 = "a.u1@<hostname>"
Name4 = "a.u2@<hostname>"
PriorityFactor1 = 1.000000
PriorityFactor2 = 4.000000
PriorityFactor3 = 1.000000
PriorityFactor4 = 2.000000

# condor_userprio -l | grep -e '^Name' -e '^PriorityFactor'
Name1 = "<none>"
Name2 = "a"
Name3 = "a.u1@<hostname>"
Name4 = "a.u2@<hostname>"
PriorityFactor1 = 1.000000
PriorityFactor2 = 4.000000
PriorityFactor3 = 1.000000
PriorityFactor4 = 2.000000

Output on platforms RHEL 5.7 x86_64, RHEL 6.1 i386 and RHEL 6.1 x86_64 is similar.

According to #0: reproduced on RHEL 5.7 i386 and verified on RHEL 5.7 i386, RHEL 5.7 x86_64, RHEL 6.1 i386 and RHEL 6.1 x86_64 => expected result.

>>> VERIFIED
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,11 +1 @@
-Cause:
+The check that determined whether a submitter record could safely be deleted from the Accountant contained a logic error. The logic checked for a priority factor equal to the default priority factor DEFAULT_PRIO_FACTOR, which allowed record deletion if the submitter's factor happened to be set to the default. The deletion-checking logic has been corrected in these updated packages so that any explicitly set value for the priority factor, default or otherwise, is now detected. As a result, submitter records are no longer improperly deleted if the user-set priority factor happens to be equal to the DEFAULT_PRIO_FACTOR value.
-A logic error in the check that determines whether a submitter record can be safely deleted from the Accountant. The logic checked for a priority factor equal to the default priority factor DEFAULT_PRIO_FACTOR, which allowed record deletion if the submitter's factor happened to be set to the default.
-
-Consequence:
-A submitter record with an explicitly set priority factor could be deleted inappropriately if that factor was set to DEFAULT_PRIO_FACTOR.
-
-Fix:
-The deletion-checking logic was corrected so that it detects any explicitly set value for the priority factor, default or otherwise.
-
-Result:
-Submitter records are no longer improperly deleted if the user-set priority factor happens to be equal to DEFAULT_PRIO_FACTOR.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0045.html