Red Hat Bugzilla – Bug 967010
Failed to open file in SPOOL on Execute node
Last modified: 2016-05-26 15:29:32 EDT
Description of problem:
I have this pool:
1st node(A=32bit RHEL5.9): CM, Scheduler, Execute node
Other nodes(B=64bit RHEL5.9, C=32bit RHEL5.9): Execute node only
I submit a job (x.sub) containing:
requirements=(FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)
from node B to the Scheduler on node A with the command:
$ condor_submit -remote A x.sub
My goal is to submit a job remotely from an Execute node to the CM+Scheduler without transferring the executable (because /bin/pwd exists on all machines), transferring std* files only if needed.
Jobs run OK on nodes A and C, but fail on node B.
There is an error in the StarterLog:
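For reference, a minimal x.sub along these lines might look as follows. Only the requirements expression is quoted from this report; the file names, transfer settings, and queue count are illustrative assumptions:

```
# Hypothetical sketch of x.sub -- only "requirements" is taken
# verbatim from the report above.
universe            = vanilla
executable          = /bin/pwd
transfer_executable = false     # /bin/pwd exists on every machine
requirements        = (FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)
output              = out$(Process).txt
error               = err$(Process).txt
log                 = job.log
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue 10
```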
05/24/13 14:30:19 Starting a VANILLA universe job with ID: 71.9
05/24/13 14:30:19 IWD: /var/lib/condor/spool/71/9/cluster71.proc9.subproc0
05/24/13 14:30:19 Failed to open '/var/lib/condor/spool/71/9/cluster71.proc9.subproc0/out9.txt' as standard output: No such file or directory (errno 2)
05/24/13 14:30:19 Failed to open some/all of the std files...
05/24/13 14:30:19 Aborting OsProc::StartJob.
05/24/13 14:30:19 Failed to start job, exiting
05/24/13 14:30:19 ShutdownFast all jobs.
05/24/13 14:30:19 condor_read() failed: recv(fd=6) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <ip:33251>.
05/24/13 14:30:19 IO: Failed to read packet header
If I add RemoteIwd to the job ClassAd, the job also works on node B.
I think it should work on node B WITHOUT RemoteIwd, because the Starter there can detect that the machine has no SPOOL directory, since it is an Execute node only.
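As a sketch of the workaround mentioned above, the RemoteIwd attribute can be injected from the submit file as a raw ClassAd attribute (the /tmp path here is only an example; any directory that exists on every execute node would do):

```
# Workaround sketch: explicitly point the job's remote working
# directory at a path that exists on every execute node, instead of
# letting it default to the schedd's SPOOL subdirectory.
+RemoteIwd = "/tmp"
```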
Note: I've tried setting up SSL authentication in the pool so that the same authenticated users exist on all machines. I've also tried setting the same UidDomain for all nodes in the pool. Neither experiment helped solve the problem.
Version-Release number of selected component (if applicable):

Actual results:
Jobs don't run on node B.

Expected results:
Jobs run on all nodes in the pool with the job ClassAds described above.
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.