Description of problem:
If you submit a job as a user in a subgroup, condor_userprio -getreslist ends with:
failed to get classad from negotiator

Version-Release number of selected component (if applicable):
condor-7.6.1-0.10

How reproducible:
100%

Steps to Reproduce:
1. Configure condor:
GROUP_QUOTA_DYNAMIC_group1.subgroup1=0.8
GROUP_QUOTA_DYNAMIC_group1=0.5
GROUP_NAMES=group1.subgroup1,group1
NUM_CPUS=10

2. Submit new jobs:
# echo -e "cmd=/bin/sleep\nargs=20\n+AccountingGroup=\"subgroup1.group1.user1\"\nqueue 100" | runuser -s /bin/bash -l condor -c "/usr/bin/condor_submit"
Submitting job(s)..............
100 job(s) submitted to cluster 1.

# echo -e "cmd=/bin/sleep\nargs=20\n+AccountingGroup=\"group1.user1\"\nqueue 100" | runuser -s /bin/bash -l condor -c "/usr/bin/condor_submit"
Submitting job(s)..............
100 job(s) submitted to cluster 2.

3. Try to run 'condor_userprio -getreslist':
# condor_userprio -getreslist group1.user1@`hostname`
Resource Name            Start Time  Match Time
-------------            ----------  ----------
Resource.slot3@hostname  6/20 16:24  0+00:00:33
Resource.slot2@hostname  6/20 16:24  0+00:00:35
Resource.slot1@hostname  6/20 16:24  0+00:00:35
-------------            ------------ ------------
Number of Resources Used: 3

# condor_userprio -getreslist subgroup1.group1.user1@`hostname`
failed to get classad from negotiator

Actual results:
condor_userprio -getreslist fails for any user from a subgroup

Expected results:
condor_userprio -getreslist works for all subgroup users

Additional info:
(In reply to comment #0)
> # echo -e
> "cmd=/bin/sleep\nargs=20\n+AccountingGroup=\"subgroup1.group1.user1\"\nqueue
> 100" | runuser -s /bin/bash -l condor -c "/usr/bin/condor_submit"

I think the problem in the original repro may be the use of the group name "subgroup1.group1", which does not exist, instead of "group1.subgroup1".

If I submit to "group1" and "group1.subgroup1", then I get the following results (which seem correct to me):

# "group1.user1" is using 1 slot, which matches quota config:
[eje@rorschach ~]$ condor_userprio -getreslist group1.user1@localdomain
Resource Name   Start Time  Match Time
-------------   ----------  ----------
Resource.slot1  9/8 08:39   0+00:00:07
-------------   ------------ ------------
Number of Resources Used: 1

# "group1.subgroup1.user1" using 4 slots:
[eje@rorschach ~]$ condor_userprio -getreslist group1.subgroup1.user1@localdomain
Resource Name   Start Time  Match Time
-------------   ----------  ----------
Resource.slot8  9/8 08:39   0+00:00:30
Resource.slot4  9/8 08:39   0+00:00:30
Resource.slot3  9/8 08:39   0+00:00:31
Resource.slot2  9/8 08:39   0+00:00:31
-------------   ------------ ------------
Number of Resources Used: 4

# "subgroup1.group1.user1" does not exist, and shows no resources:
# (this from the original description)
[eje@rorschach ~]$ condor_userprio -getreslist subgroup1.group1.user1@localdomain
Resource Name   Start Time  Match Time
-------------   ----------  ----------
-------------   ------------ ------------
Number of Resources Used: 0
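To rule out a mistyped group name before submitting, a check along the following lines may help. This is only a sketch: group_configured is a hypothetical helper, not part of condor, and it assumes that GROUP_NAMES is a plain comma-separated list without spaces and that the last dot-separated component of AccountingGroup is the user name.

```shell
# Hypothetical helper, not a condor tool.
# $1: value of GROUP_NAMES (comma-separated, no whitespace assumed)
# $2: value of +AccountingGroup, in the form <group>.<user>
group_configured() {
    group=${2%.*}    # drop the trailing ".<user>" component
    case ",$1," in
        *",$group,"*)
            echo "configured: $group"
            ;;
        *)
            echo "not configured: $group"
            return 1
            ;;
    esac
}
```

With the configuration from comment #0, group_configured 'group1.subgroup1,group1' 'subgroup1.group1.user1' reports the group as not configured, while 'group1.subgroup1.user1' passes.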
(In reply to comment #1)
> (In reply to comment #0)
>

I re-tested it and found that the problem is caused by the long string, not by the subgroup. For example, the following also does not work for me:

# condor_userprio -getreslist 12345678901234567890@`hostname`
failed to get classad from negotiator
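Since the failure depends on the length of the full submitter string, including the @hostname suffix, it can be useful to measure that string first. The sketch below assumes the 63-character limit given in the technical note later in this report; submitter_fits is a hypothetical helper name, not a condor command.

```shell
# Hypothetical helper, not a condor tool.
# $1: full submitter name as passed to condor_userprio (user@domain)
# Assumption: names of up to 63 characters fit (limit taken from the
# technical note in this report).
submitter_fits() {
    len=${#1}
    if [ "$len" -le 63 ]; then
        echo "ok: $len chars"
    else
        echo "too long: $len chars (limit 63)"
        return 1
    fi
}
```

For example, submitter_fits "12345678901234567890@$(hostname)" shows whether a given host name pushes the string over the limit.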
Upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2445
REPRO/TEST

Consider the following configuration with a very long group name designed to trigger the hard limit on submitter name length:

NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_INTERVAL = 30
SCHEDD_INTERVAL = 15
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
GROUP_QUOTA_MAX_ALLOCATION_ROUNDS = 1
NUM_CPUS = 10
GROUP_NAMES = this_egregiously_long_group_name_will_exceed_sixty_four_character_limit
GROUP_QUOTA_DYNAMIC_this_egregiously_long_group_name_will_exceed_sixty_four_character_limit = 1.0

The accountant itself can handle arbitrary submitter lengths internally. Using this submission file will succeed:

universe = vanilla
cmd = /bin/sleep
args = 6000
should_transfer_files = if_needed
when_to_transfer_output = on_exit
+AccountingGroup="this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user"
queue 1

Submit the above job, and you can see it in the accountant:

$ condor_userprio -l | grep -e '^Name'
Name1 = "this_egregiously_long_group_name_will_exceed_sixty_four_character_limit"
Name2 = "<none>"
Name3 = "this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain"

There are eight condor_userprio commands that involve passing the submitter name.
If you invoke them after submitting the job above, like so:

$ condor_userprio -setprio this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain 2
$ condor_userprio -setfactor this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain 2
$ condor_userprio -setaccum this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain 2
$ condor_userprio -setbegin this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain 2
$ condor_userprio -setlast this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain 2
$ condor_userprio -resetusage this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain
$ condor_userprio -getreslist this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain
$ condor_userprio -delete this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain

Before the fix, you will see the following errors in the Negotiator Log (note the error messages are somewhat misleading as well).
Also, the operations themselves will not be performed, which you can see by inspecting the accountant with condor_userprio -all:

$ tail -f NegotiatorLog | grep 'Could not read'
09/08/11 12:07:55 Could not read schedd name and priority
09/08/11 12:08:38 Could not read schedd name and priority
09/08/11 12:10:52 Could not read schedd name and accumulatedUsage
09/08/11 12:12:23 Could not read schedd name and begin usage time
09/08/11 12:12:39 Could not read schedd name and last usage time
09/08/11 12:13:07 Could not read schedd name
09/08/11 12:13:16 Could not read schedd name
09/08/11 12:13:43 Could not read accountant record name

$ condor_userprio -all
Last Priority Update:  9/8  12:29
                               Effective   Real     Priority   Res  Total Usage      Usage             Last
User Name                      Priority  Priority    Factor    Used (wghted-hrs)   Start Time       Usage Time
------------------------------ --------- -------- ------------ ---- ----------- ---------------- ----------------
this_egregiously_long_group_na      0.51     0.51         1.00    1        0.43  9/08/2011 12:04  9/08/2011 12:29
this_egregiously_long_group_na      0.51     0.51         1.00    1        0.43  9/08/2011 12:04  9/08/2011 12:29
------------------------------ --------- -------- ------------ ---- ----------- ---------------- ----------------
Number of users: 1                                                1        0.43  9/08/2011 12:04  9/07/2011 12:30

After the fix, you should see that there are no log messages, and the condor_userprio operations are performed.
$ tail -f NegotiatorLog | grep 'Could not read'

After the first five commands (-setprio | -setfactor | -setaccum | -setbegin | -setlast), you should see this (note that -setlast gets updated):

$ condor_userprio -all
Last Priority Update:  9/8  12:37
                               Effective   Real     Priority   Res  Total Usage      Usage             Last
User Name                      Priority  Priority    Factor    Used (wghted-hrs)   Start Time       Usage Time
------------------------------ --------- -------- ------------ ---- ----------- ---------------- ----------------
this_egregiously_long_group_na      0.50     0.50         1.00    1        0.03  9/08/2011 12:35  9/08/2011 12:37
this_egregiously_long_group_na      4.00     2.00         2.00    1        0.02 12/31/1969 17:00  9/08/2011 12:37
------------------------------ --------- -------- ------------ ---- ----------- ---------------- ----------------
Number of users: 1                                                1        0.02 12/31/1969 17:00  9/07/2011 12:37

Invoking -resetusage will set the "total usage" column back to zero:

$ condor_userprio -all
Last Priority Update:  9/8  12:41
                               Effective   Real     Priority   Res  Total Usage      Usage             Last
User Name                      Priority  Priority    Factor    Used (wghted-hrs)   Start Time       Usage Time
------------------------------ --------- -------- ------------ ---- ----------- ---------------- ----------------
this_egregiously_long_group_na      0.50     0.50         1.00    1        0.10  9/08/2011 12:35  9/08/2011 12:41
this_egregiously_long_group_na      4.00     2.00         2.00    1        0.00  9/08/2011 12:41  9/08/2011 12:41
------------------------------ --------- -------- ------------ ---- ----------- ---------------- ----------------
Number of users: 1                                                1        0.00  9/08/2011 12:41  9/07/2011 12:41

Invoking -getreslist will give the expected slot:

$ condor_userprio -getreslist this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain
Resource Name   Start Time  Match Time
-------------   ----------  ----------
Resource.slot1  9/8 12:35   0+00:07:14
-------------   ------------ ------------
Number of Resources Used: 1

Remove the job and invoke condor_userprio -delete, and the submitter is correctly deleted:

$ condor_rm -all
All jobs marked for removal.

$ condor_userprio -delete this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain
The accountant record named this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@localdomain was deleted

$ condor_userprio -all
Last Priority Update:  9/8  12:44
                               Effective   Real     Priority   Res  Total Usage      Usage             Last
User Name                      Priority  Priority    Factor    Used (wghted-hrs)   Start Time       Usage Time
------------------------------ --------- -------- ------------ ---- ----------- ---------------- ----------------
this_egregiously_long_group_na      0.50     0.50         1.00    1        0.14  9/08/2011 12:35  9/08/2011 12:44
------------------------------ --------- -------- ------------ ---- ----------- ---------------- ----------------
Number of users: 0                                                0        0.00 12/31/1969 17:00  9/07/2011 12:44
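The eight condor_userprio invocations used in the repro above can be generated with a small dry-run helper, which is convenient when the test has to be repeated on several platforms. This is a sketch only: userprio_cmds is a hypothetical name, and the function just prints the command lines (using the options shown in this comment) rather than running them.

```shell
# Hypothetical dry-run helper: prints the eight condor_userprio commands
# that take a submitter name, without executing anything.
# $1: full submitter name (user@domain)
# $2: value passed to the five -set* options
userprio_cmds() {
    for opt in -setprio -setfactor -setaccum -setbegin -setlast; do
        echo "condor_userprio $opt $1 $2"
    done
    for opt in -resetusage -getreslist -delete; do
        echo "condor_userprio $opt $1"
    done
}
```

The output can be reviewed and then piped to sh if desired. Note that in the repro above the job is removed with condor_rm before -delete is invoked, so running all eight lines back to back is not quite the same sequence.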
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause:
Negotiator callback functions for condor_userprio used char[64] for reading submitter names from commands.

Consequence:
Any attempt to invoke a condor_userprio command with a submitter name longer than 63 chars would result in the name being truncated internally, causing command failure.

Fix:
Callback functions were updated to properly handle names of arbitrary length.

Result:
Commands from condor_userprio can now be used with submitter names of any length.
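The effect of a 63-character cut-off described in the note can be illustrated in the shell. This is not the negotiator code, only a sketch of why a truncated name can no longer match the accountant record; truncate_to_63 and name_survives are hypothetical helper names.

```shell
# Illustration only; this does not reproduce the negotiator's C++ code.
# Keep at most the first 63 characters of $1, as a char[64] buffer
# (63 characters plus the terminating NUL) would.
truncate_to_63() {
    printf '%.63s' "$1"
}

# Succeeds only if the name is unchanged by the truncation, that is,
# if a lookup by the truncated name could still find the full name.
name_survives() {
    [ "$(truncate_to_63 "$1")" = "$1" ]
}
```

For the submitter used in this report, name_survives fails because the stored record keeps the full name while the truncated string is shorter, so the two no longer compare equal.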
Reproduced on RHEL 5.7 i386: # rpm -qa | grep condor condor-7.6.1-0.10.el5 condor-classads-7.6.1-0.10.el5 # cat /tmp/bz714724.job universe = vanilla cmd = /bin/sleep args = 6000 should_transfer_files = if_needed when_to_transfer_output = on_exit +AccountingGroup="this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user" queue 1 # runuser -s /bin/bash -l condor -c "condor_submit /tmp/bz714724.job" Submitting job(s). 1 job(s) submitted to cluster 3. # condor_userprio -l | grep -e '^Name' Name1 = "this_egregiously_long_group_name_will_exceed_sixty_four_character_limit" Name2 = "this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname>" # condor_userprio -setprio this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The priority of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2.000000 # condor_userprio -setfactor this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The priority factor of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2.000000 # condor_userprio -setaccum this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The Accumulated Usage of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2.000000 # condor_userprio -setbegin this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The Begin Usage Time of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2 # condor_userprio -setlast this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The Last Usage Time of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2 # condor_userprio -all Last Priority Update: 10/12 17:58 Effective Real 
Priority Res Total Usage Usage Last User Name Priority Priority Factor Used (wghted-hrs) Start Time Usage Time ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- this_egregiously_long_group_na 0.50 0.50 1.00 1 0.08 10/12/2011 17:30 10/12/2011 17:58 this_egregiously_long_group_na 0.50 0.50 1.00 1 0.02 10/12/2011 17:57 10/12/2011 17:58 ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- Number of users: 1 1 0.02 10/12/2011 17:57 10/11/2011 17:58 # condor_userprio -resetusage this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) The accumulated usage of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was reset # condor_userprio -all Last Priority Update: 10/12 17:58 Effective Real Priority Res Total Usage Usage Last User Name Priority Priority Factor Used (wghted-hrs) Start Time Usage Time ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- this_egregiously_long_group_na 0.50 0.50 1.00 1 0.08 10/12/2011 17:30 10/12/2011 17:58 this_egregiously_long_group_na 0.50 0.50 1.00 1 0.03 10/12/2011 17:57 10/12/2011 17:58 ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- Number of users: 1 1 0.03 10/12/2011 17:57 10/11/2011 17:59 # condor_userprio -getreslist this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) failed to get classad from negotiator # condor_rm -all All jobs marked for removal. 
# condor_userprio -delete this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) The accountant record named this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was deleted # condor_userprio -all Last Priority Update: 10/12 17:59 Effective Real Priority Res Total Usage Usage Last User Name Priority Priority Factor Used (wghted-hrs) Start Time Usage Time ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- this_egregiously_long_group_na 0.50 0.50 1.00 0 0.09 10/12/2011 17:30 10/12/2011 17:59 this_egregiously_long_group_na 0.50 0.50 1.00 0 0.03 10/12/2011 17:57 10/12/2011 17:59 ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- Number of users: 1 0 0.03 10/12/2011 17:57 10/11/2011 17:59 # tail -f /var/log/condor/NegotiatorLog | grep 'Could not read' 10/12/11 17:58:39 Could not read schedd name and priority 10/12/11 17:58:43 Could not read schedd name and priority 10/12/11 17:58:46 Could not read schedd name and accumulatedUsage 10/12/11 17:58:49 Could not read schedd name and begin usage time 10/12/11 17:58:52 Could not read schedd name and last usage time 10/12/11 17:58:59 Could not read schedd name 10/12/11 17:59:09 Could not read schedd name 10/12/11 17:59:21 Could not read accountant record name Verified on RHEL 5.7 i386: # runuser -s /bin/bash -l condor -c "condor_submit /tmp/bz714724.job" Submitting job(s). 1 job(s) submitted to cluster 1. 
# condor_userprio -l | grep -e '^Name' Name1 = "<none>" Name2 = "this_egregiously_long_group_name_will_exceed_sixty_four_character_limit" Name3 = "this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname>" # condor_userprio -setprio this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The priority of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2.000000 # condor_userprio -setfactor this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The priority factor of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2.000000 # condor_userprio -setaccum this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The Accumulated Usage of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2.000000 # condor_userprio -setbegin this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The Begin Usage Time of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2 # condor_userprio -setlast this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) 2 The Last Usage Time of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was set to 2 # condor_userprio -all Last Priority Update: 10/12 17:30 Effective Real Priority Res Total Usage Usage Last User Name Priority Priority Factor Used (wghted-hrs) Start Time Usage Time ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- this_egregiously_long_group_na 0.50 0.50 1.00 1 0.01 10/12/2011 17:30 10/12/2011 17:30 this_egregiously_long_group_na 4.00 2.00 2.00 1 0.01 1/01/1970 01:00 10/12/2011 17:30 ------------------------------ --------- -------- ------------ 
---- ----------- ---------------- ---------------- Number of users: 1 1 0.01 1/01/1970 01:00 10/11/2011 17:30 # condor_userprio -resetusage this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) The accumulated usage of this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was reset # condor_userprio -all Last Priority Update: 10/12 17:30 Effective Real Priority Res Total Usage Usage Last User Name Priority Priority Factor Used (wghted-hrs) Start Time Usage Time ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- this_egregiously_long_group_na 0.50 0.50 1.00 1 0.01 10/12/2011 17:30 10/12/2011 17:30 this_egregiously_long_group_na 4.00 2.00 2.00 1 0.00 10/12/2011 17:30 10/12/2011 17:30 ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- Number of users: 1 1 0.00 10/12/2011 17:30 10/11/2011 17:30 # condor_userprio -getreslist this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) Resource Name Start Time Match Time ------------- ---------- ---------- Resource.slot1@<hostname> 10/12 17:29 0+00:00:57 ------------- ------------ ------------ Number of Resources Used: 1 # condor_rm -all All jobs marked for removal. 
# condor_userprio -delete this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@$(hostname) The accountant record named this_egregiously_long_group_name_will_exceed_sixty_four_character_limit.user@<hostname> was deleted # condor_userprio -all Last Priority Update: 10/12 17:30 Effective Real Priority Res Total Usage Usage Last User Name Priority Priority Factor Used (wghted-hrs) Start Time Usage Time ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- this_egregiously_long_group_na 0.50 0.50 1.00 1 0.02 10/12/2011 17:30 10/12/2011 17:30 ------------------------------ --------- -------- ------------ ---- ----------- ---------------- ---------------- Number of users: 0 0 0.00 1/01/1970 01:00 10/11/2011 17:30 # tail -f /var/log/condor/NegotiatorLog | grep 'Could not read' <<< empty >>> Output on platform RHEL 5.7 x86_64, RHEL 6.1 i386, RHEL 6.1 x86_64 is similar. >>> VERIFIED
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,11 +1 @@
-Cause:
+Prior to this update, the Negotiator callback functions for the condor_userprio utility were limited to reading only the first 63 characters of submitter names from commands. As a consequence, any attempt to invoke a condor_userprio command with a submitter name longer than 63 characters resulted in the name being truncated internally, causing the command to fail. Now, the callback functions have been updated to properly handle names of arbitrary length, thus fixing this bug.
-Negotiator callback functions for condor_userprio used char[64] for reading submitter names from commands.
-
-Consequence:
-Any attempt to invoke a condor_userprio command with a submitter name longer than 63 chars would result in the name being truncated internally, causing command failure.
-
-Fix:
-Callback functions were updated to properly handle names of arbitrary length.
-
-Result:
-Commands from condor_userprio can now be used with submitter names of any length.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0045.html