Created attachment 423823
log files and condor_config.local

Description of problem:
I tried to submit 4,000 Windows jobs:

for i in `seq 4`; do su xxx -c 'condor_submit /root/wait.bat.sub' || service condor stop || killall condor_schedd;sleep 30;done

$ cat wait.bat.sub:
universe = vanilla
executable = /root/wait.bat
arguments = 1
requirements = ( Arch=="Intel") && ( OpSys=="WINNT51" || OpSys=="WINNT52" )
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
iwd = /tmp
queue 1000

$ cat wait.bat:
@ping 127.0.0.1 -n %1% -w 1000 > nul

The scheduler crashed after the first 1,000 jobs were submitted. I had enabled full debug, so when condor_submit exited with a return code > 0, I stopped the condor service and then cleaned up the schedd process with "killall condor_schedd".

Version-Release number of selected component (if applicable):
condor-7.4.3-0.17.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up a condor pool: CM on RHEL 5.5beta plus a Windows execute node
2. Try to submit 1000 Windows jobs
3. Wait for the crash

Actual results:
The scheduler crashes.

Expected results:
The scheduler doesn't crash.

Additional info:
$ cat ScheddLog:
06/11 15:38:51 (pid:8361) OwnerCheck retval 1 (success),no ad
06/11 15:38:51 (pid:8361) OwnerCheck retval 1 (success),no ad
06/11 15:38:51 (pid:8361) OwnerCheck retval 1 (success),no ad
06/11 15:38:51 (pid:8361) OwnerCheck retval 1 (success),no ad
Stack dump for process 8361 at timestamp 1276285165 (0 frames)

condor_config.local = Personal Condor (default settings) plus:
ALLOW_WRITE=*
ALLOW_READ=*
CREATE_CORE_FILES = True
ABORT_ON_EXCEPTION = True
ALL_DEBUG = D_FULLDEBUG
I've retested this 3 times with condor-7.4.3-0.20.el5, using:

for i in `seq 100`; do su xxx -c 'condor_submit /root/wait.bat.sub' || service condor stop || killall condor_schedd;sleep 30;condor_rm -all;sleep 10;done;

I don't see any stack dump. To verify the fix, it should be retested on all architectures and OSes.
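The stack-dump check mentioned in this comment could be scripted roughly as follows. This is only a minimal sketch and was not part of the test run: the function name count_stack_dumps is hypothetical, and the schedd log path is an assumption that the caller has to supply.

```shell
#!/bin/sh
# Hypothetical helper: count "Stack dump for process" markers in a schedd log.
# The log location is an assumption; pass the real path as the first argument.
count_stack_dumps() {
    log_file="$1"
    # grep -c prints the number of matching lines. It exits with status 1
    # when nothing matches, so "|| true" keeps the helper usable under "set -e".
    grep -c 'Stack dump for process' "$log_file" || true
}
```

Calling count_stack_dumps with the path of the schedd log after each retest run prints 0 when no stack dump was recorded, and a non-zero count otherwise.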
I've retested this as in comment #3 on RHEL 5.5/4.8 x i386/x86_64 with condor-7.4.4-0.8 and it works. --> VERIFIED