Description of problem:
When a user submits jobs under one group and, before they finish, another user submits jobs under another group, in some cases the first user still holds claimed slots even though that user has no jobs left to run.

Version-Release number of selected component (if applicable):
condor-7.4.4-0.14

How reproducible:
80%

Steps to Reproduce:
1. Set the configuration:
    NUM_CPUS = 100
    GROUP_NAMES = A1, A2
    GROUP_QUOTA_DYNAMIC_A1 = 0.09
    GROUP_QUOTA_DYNAMIC_A2 = 0.01
    GROUP_AUTOREGROUP_A1 = TRUE
    GROUP_AUTOREGROUP_A2 = TRUE

2. Submit 1000 short jobs with group A2:
    # su condor_user -c 'echo -e "cmd=/bin/sleep\nargs=2\n+AccountingGroup = \"A2.user\"\nqueue 1000" | condor_submit'
    Submitting job(s)....................
    1000 job(s) submitted to cluster 2.

3. Wait until about 200 jobs are left in the queue:
    # condor_q | grep jobs
    221 jobs; 121 idle, 100 running, 0 held

4. Submit 100 long jobs with group A1:
    # su condor_user -c 'echo -e "cmd=/bin/sleep\nargs=1d\n+AccountingGroup = \"A1.user\"\nqueue 100" | condor_submit'
    Submitting job(s)....................
    100 job(s) submitted to cluster 3.

5. Wait until all jobs in group A2 have finished, then check 'condor_userprio -l', 'condor_status' and 'condor_q':
    # condor_q
    -- Submitter: hostname : <ip_address:43186> : hostname
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
     3.0     condor_user     9/24 08:01   0+00:19:32 R  0   3.9  sleep 1d
    ...
    100 jobs; 43 idle, 57 running, 0 held

    # condor_status
    Name               OpSys  Arch   State    Activity LoadAv Mem ActvtyTime
    slot100@hostname   LINUX  X86_64 Claimed  Busy     0.000  20  0+00:07:03
    slot10@hostname    LINUX  X86_64 Claimed  Idle     0.000  20  0+00:07:29
    ....
                   Machines Owner Claimed Unclaimed Matched Preempting
    X86_64/LINUX        100     0     100         0       0          0
    Total               100     0     100         0       0          0

    # condor_userprio -l
    LastUpdate = 1285330225
    Name1 = "A2.user.eng.bos.redhat.com"
    Priority1 = 0.986168
    ResourcesUsed1 = 43
    ...
    Name2 = "A1"
    Priority2 = 0.718069
    ResourcesUsed2 = 57
    ...
    Name3 = "A2"
    Priority3 = 0.986168
    ResourcesUsed3 = 43
    ...
    Name4 = "A1.user.eng.bos.redhat.com"
    Priority4 = 0.718069
    ResourcesUsed4 = 57
    ...
    NumSubmittors = 4

6. Check again after a minute; the results are still the same.

7. Run 'condor_rm -all':
    # condor_rm -all
    All jobs marked for removal.

8. Check 'condor_status':
    # condor_status
    Name               OpSys  Arch   State    Activity LoadAv Mem ActvtyTime
    slot100@hostname   LINUX  X86_64 Claimed  Busy     0.000  20  0+00:07:03
    slot10@hostname    LINUX  X86_64 Claimed  Idle     0.000  20  0+00:07:29
    ....
                   Machines Owner Claimed Unclaimed Matched Preempting
    X86_64/LINUX        100     0      43        57       0          0
    Total               100     0      43        57       0          0

Actual results:
Some slots remain claimed by A2.user. They return to the Unclaimed state only after some time (5-30 minutes).

Expected results:
Slots should be released when the user who claims them has no jobs left to run.

Additional info:
The MatchLog contains entries such as:
    Rejected 3.62 A1.user@hostname <ip_address:46914>: insufficient priority
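To make the symptom easier to spot when retesting, the check in step 5 could be scripted. The following is only a rough sketch, not part of Condor: the helper names `parse_userprio` and `stale_claims` are hypothetical, the sample text is copied from the 'condor_userprio -l' output above, and the running-job counts are assumed to be entered by hand from 'condor_q'. It also assumes that names containing a dot are per-user entries and bare names such as "A1" are group totals, which is only inferred from the output in this report.

```python
import re

# Excerpt of the `condor_userprio -l` output quoted in this report.
SAMPLE = '''
LastUpdate = 1285330225
Name1 = "A2.user.eng.bos.redhat.com"
Priority1 = 0.986168
ResourcesUsed1 = 43
Name2 = "A1"
Priority2 = 0.718069
ResourcesUsed2 = 57
Name3 = "A2"
Priority3 = 0.986168
ResourcesUsed3 = 43
Name4 = "A1.user.eng.bos.redhat.com"
Priority4 = 0.718069
ResourcesUsed4 = 57
NumSubmittors = 4
'''


def parse_userprio(text):
    """Map each submitter name to its ResourcesUsed value.

    Attributes are paired by their numeric suffix (Name1 / ResourcesUsed1).
    """
    names = {m.group(1): m.group(2)
             for m in re.finditer(r'\bName(\d+) = "([^"]+)"', text)}
    used = {m.group(1): int(m.group(2))
            for m in re.finditer(r'\bResourcesUsed(\d+) = (\d+)', text)}
    return {names[i]: used[i] for i in names if i in used}


def stale_claims(resources_used, running_jobs):
    """Return per-user entries holding more slots than they have running jobs.

    Assumption: names containing a dot are users; bare names are group totals.
    """
    stale = {}
    for name, slots in resources_used.items():
        if "." not in name:
            continue  # skip group-level totals such as "A1" and "A2"
        extra = slots - running_jobs.get(name, 0)
        if extra > 0:
            stale[name] = extra
    return stale


# Running-job counts taken by hand from condor_q in step 5:
# 57 running jobs for A1.user, none left for A2.user.
running = {"A1.user.eng.bos.redhat.com": 57}
stale = stale_claims(parse_userprio(SAMPLE), running)
print(stale)  # {'A2.user.eng.bos.redhat.com': 43}
```

With the numbers from this report, the 43 slots reported for A2.user match the 43 slots that stay Claimed in step 8 after all jobs were removed.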
Please verify this is still a problem with condor 7.5.6-0.1
Will retest during validation cycle
Tested on:
$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: I686-RedHat_6.0 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el6 $
$CondorPlatform: X86_64-RedHat_6.0 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: I686-RedHat_5.6 $

$CondorVersion: 7.6.1 Apr 27 2011 BuildID: RH-7.6.1-0.4.el5 $
$CondorPlatform: X86_64-RedHat_5.6 $

>>> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html