Red Hat Bugzilla – Bug 967010
Failed to open file in SPOOL on Execute node
Last modified: 2016-05-26 15:29:32 EDT
Description of problem:
I have this pool:
1st node(A=32bit RHEL5.9): CM, Scheduler, Execute node
Other nodes(B=64bit RHEL5.9, C=32bit RHEL5.9): Execute node only
I submit a job (x.sub) containing:
requirements=(FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)
from node B to the Scheduler on node A with the command:
$ condor_submit -remote A x.sub
My goal is to submit a job remotely from an Execute node to the CM+Scheduler without transferring the executable (because /bin/pwd exists on all machines), transferring std* files only if needed.
Jobs run OK on nodes A and C, but fail on node B.
There is an error in the StarterLog:
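For reference, a minimal x.sub along these lines might look as follows. Only the requirements expression is quoted from this report; the file names, transfer settings, and queue count are illustrative assumptions:

```
# Hypothetical sketch of x.sub -- only "requirements" is taken
# verbatim from the report above.
universe            = vanilla
executable          = /bin/pwd
transfer_executable = false     # /bin/pwd exists on every machine
requirements        = (FileSystemDomain =!= UNDEFINED && Arch =!= UNDEFINED)
output              = out$(Process).txt
error               = err$(Process).txt
log                 = job.log
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue 10
```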
05/24/13 14:30:19 Starting a VANILLA universe job with ID: 71.9
05/24/13 14:30:19 IWD: /var/lib/condor/spool/71/9/cluster71.proc9.subproc0
05/24/13 14:30:19 Failed to open '/var/lib/condor/spool/71/9/cluster71.proc9.subproc0/out9.txt' as standard output: No such file or directory (errno 2)
05/24/13 14:30:19 Failed to open some/all of the std files...
05/24/13 14:30:19 Aborting OsProc::StartJob.
05/24/13 14:30:19 Failed to start job, exiting
05/24/13 14:30:19 ShutdownFast all jobs.
05/24/13 14:30:19 condor_read() failed: recv(fd=6) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <ip:33251>.
05/24/13 14:30:19 IO: Failed to read packet header
If I add RemoteIwd to the job ClassAd, the job also works on node B.
I think it should work on node B WITHOUT RemoteIwd, because the Starter there can detect that the machine has no SPOOL directory, since it is an Execute node only.
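As a sketch of the workaround mentioned above, the RemoteIwd attribute can be injected from the submit file as a raw ClassAd attribute (the /tmp path here is only an example; any directory that exists on every execute node would do):

```
# Workaround sketch: explicitly point the job's remote working
# directory at a path that exists on every execute node, instead of
# letting it default to the schedd's SPOOL subdirectory.
+RemoteIwd = "/tmp"
```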
Note: I've tried setting up SSL authentication in the pool so that the same authenticated users exist on all machines. I've also tried setting the same UidDomain for all nodes in the pool. Neither experiment helped solve the problem.
Version-Release number of selected component (if applicable):

Actual results:
Jobs don't run on node B.

Expected results:
Jobs run on all nodes in the pool with the job ClassAds described above.
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.